Story
Resolution: Done
Major
None
None
Value Statement
Application-Level Metrics (Model Loss & Accuracy)
In the second phase, we collect metrics directly related to the federated learning (FL) training process and to model quality.
Metric Type | Source | Collection Tool |
Accuracy per round (convergence trend) | FL server / client (training loop) | Custom OpenTelemetry exporter |
Loss (train/val) per round | FL server / client | Custom OpenTelemetry exporter |
Recall / F1 score | FL server / client (post-evaluation) | Custom OpenTelemetry exporter |
Round duration | Server timestamps | Embedded metrics or tracing |
Client participation | FL coordinator logs / events | Exported as custom metrics |
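The table rows reduce to a few simple per-round computations. The sketch below is a minimal, stdlib-only illustration that assumes binary classification (so recall and F1 come from confusion counts), server-side start/finish timestamps for round duration, and sets of client IDs for participation. `RoundSummary`, `recall_f1`, and `summarize_round` are hypothetical names, not part of any existing FL framework or of the OpenTelemetry SDK.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RoundSummary:
    """Hypothetical per-round record mirroring the metric table rows."""
    round_id: int
    accuracy: float
    recall: float
    f1: float
    duration_s: float
    participation: float  # fraction of selected clients that reported back


def recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Binary recall and F1 from confusion counts; 0.0 when a ratio is undefined."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return recall, f1


def summarize_round(
    round_id: int,
    tp: int, fp: int, fn: int, tn: int,
    started: datetime, finished: datetime,
    selected: set[str], reported: set[str],
) -> RoundSummary:
    """Derive the table's application-level metrics for one FL round."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall, f1 = recall_f1(tp, fp, fn)
    # Round duration is taken from server-side start/finish timestamps.
    duration_s = (finished - started).total_seconds()
    # Only count reporting clients that were actually selected for this round.
    participation = len(reported & selected) / len(selected) if selected else 0.0
    return RoundSummary(round_id, accuracy, recall, f1, duration_s, participation)


# Illustrative values only; not measured results.
summary = summarize_round(
    3, tp=80, fp=10, fn=20, tn=90,
    started=datetime(2024, 1, 1, 12, 0, 0),
    finished=datetime(2024, 1, 1, 12, 0, 45),
    selected={"c1", "c2", "c3", "c4"},
    reported={"c1", "c2", "c4"},
)
print(summary)
```

Each field of the resulting record would then be recorded through the exporter with the round and client attributes attached; for multi-class models, recall and F1 would instead need a per-class or averaged definition.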
Collection Setup:
- Export metrics from the training loop via the OpenTelemetry SDK (e.g., Python or Go).
- Expose a /metrics endpoint, or use OTLP exporters to send metrics to the OpenTelemetry Collector.
- Add relevant attributes to each metric: round_id, client_id, dataset, cluster_id.
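A /metrics endpoint serves the Prometheus text exposition format. The stdlib-only sketch below shows how one sample carrying the round_id, client_id, dataset, and cluster_id attributes would look in that format, including the translation of a dotted, OpenTelemetry-style name into a valid Prometheus name. The function names and the attribute values (`client-a`, `cifar10`, `cluster1`) are hypothetical. In practice the OpenTelemetry SDK's Prometheus or OTLP exporter generates this output for you, so treat it as an illustration of the data shape rather than the project's exporter.

```python
import re


def to_prom_name(name: str) -> str:
    """Map an OpenTelemetry-style dotted name to a valid Prometheus metric name."""
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    # Prometheus metric names may not start with a digit.
    return "_" + name if name[:1].isdigit() else name


def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, value: float, attrs: dict[str, str], help_text: str) -> str:
    """Render one gauge sample, with attributes as sorted labels, in exposition format."""
    prom = to_prom_name(name)
    labels = ",".join(f'{k}="{escape_label(str(v))}"' for k, v in sorted(attrs.items()))
    sample = f"{prom}{{{labels}}} {value}" if labels else f"{prom} {value}"
    return f"# HELP {prom} {help_text}\n# TYPE {prom} gauge\n{sample}\n"


# Hypothetical attribute values for one client in one round.
attrs = {"round_id": "3", "client_id": "client-a", "dataset": "cifar10", "cluster_id": "cluster1"}
print(render_gauge("fl.round.accuracy", 0.85, attrs, "Model accuracy per FL round"), end="")
```

One design point worth checking: because round_id increases every round, using it as a label creates new time series each round. That is manageable for bounded experiments, but it should be weighed against the metrics backend's series limits for long-running training.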
Definition of Done for Engineering Story Owner (Checklist)
- ...
Development Complete
- The code is complete.
- Functionality is working.
- Any required downstream Dockerfile changes are made.
Tests Automated
[ ] Unit/function tests have been automated and incorporated into the build.
[ ] 100% automated unit/function test coverage for new or changed APIs.
Secure Design
[ ] Security has been assessed and incorporated into your threat model.
Multidisciplinary Teams Readiness
[ ] Create an informative documentation issue using the [Customer Portal_doc_issue template](https://github.com/stolostron/backlog/issues/new?assignees=&labels=squad%3Adoc&template=doc_issue.md&title=), and ensure the doc acceptance criteria are met. Link the development issue to the doc issue.
[ ] Provide input to the QE team, and ensure the QE acceptance criteria (established between the story owner and the QE focal) are met.
Support Readiness
[ ] The must-gather script has been updated.
- clones: ACM-22688 Support System-Level Federated Learning Metrics in Open Cluster Management (Closed)