Red Hat Advanced Cluster Management / ACM-23005

Support Application-Level Federated Learning Metrics in Open Cluster Management

      Value Statement

      Application-Level Metrics (Model Loss & Accuracy)

      https://docs.google.com/document/d/1mWjWiJz4IxDe5fcSdjv1a-Loii27FeOqJCUtsqifIz4/edit?tab=t.0#heading=h.tuk7jg1rd7s

      In the second phase, we gather metrics directly related to the federated learning (FL) training process and model quality.

      Metric Type                              Source                                 Collection Tool
      Accuracy per round (convergence trend)   FL server / client (training loop)     Custom OpenTelemetry exporter
      Loss (train/val) per round               FL server / client                     Custom OpenTelemetry exporter
      Recall / F1 score                        FL server / client (post-evaluation)   Custom OpenTelemetry exporter
      Round duration                           Server timestamps                      Embedded metrics or tracing
      Client participation                     FL coordinator logs / events           Exported as custom metrics
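
As a rough illustration of the derived rows in the table (recall / F1 score, round duration, client participation), the sketch below computes them from per-round bookkeeping. The `RoundStats` record and all of its field names are hypothetical; the ticket does not define a data model, and a real FL framework may expose these values differently.

```python
from dataclasses import dataclass


@dataclass
class RoundStats:
    """Hypothetical per-round record kept by the FL server."""
    round_id: int
    started_at: float       # server timestamp (seconds) when the round began
    finished_at: float      # server timestamp (seconds) when aggregation ended
    selected_clients: int   # clients asked to participate in this round
    reported_clients: int   # clients that actually returned an update
    true_pos: int           # post-evaluation confusion counts
    false_pos: int
    false_neg: int


def precision(s: RoundStats) -> float:
    denom = s.true_pos + s.false_pos
    return s.true_pos / denom if denom else 0.0


def recall(s: RoundStats) -> float:
    denom = s.true_pos + s.false_neg
    return s.true_pos / denom if denom else 0.0


def f1_score(s: RoundStats) -> float:
    # Harmonic mean of precision and recall; 0.0 when both are zero.
    p, r = precision(s), recall(s)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def round_duration_seconds(s: RoundStats) -> float:
    # "Round Duration" derived from server timestamps.
    return s.finished_at - s.started_at


def participation_ratio(s: RoundStats) -> float:
    # "Client Participation" as the fraction of selected clients that reported.
    return s.reported_clients / s.selected_clients if s.selected_clients else 0.0
```

Each value could then be recorded once per round, using whichever collection tool the table lists for that row.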

      Collection Setup:

      • Export metrics from the training loop via the OpenTelemetry SDK (e.g., Python or Go)
      • Expose a /metrics endpoint, or use OTLP exporters to send metrics to a collector
      • Add relevant attributes to each metric: round_id, client_id, dataset, cluster_id
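
To make the attribute list concrete, here is a minimal standard-library sketch of how one sample might look on a /metrics endpoint. It assumes the endpoint serves the Prometheus text exposition format, which the ticket does not state. The metric name `fl_round_accuracy` and the example attribute values are hypothetical. With the OpenTelemetry SDK, the same four attributes would instead be passed as the attributes of each recorded measurement.

```python
from typing import Mapping, Union


def escape_label_value(value: str) -> str:
    # The Prometheus text format requires escaping backslash, newline
    # and double quote inside label values (backslash first).
    return (
        value.replace("\\", "\\\\")
        .replace("\n", "\\n")
        .replace('"', '\\"')
    )


def render_sample(
    name: str,
    value: float,
    attributes: Mapping[str, Union[str, int]],
) -> str:
    # Sort the labels so the output is deterministic.
    labels = ",".join(
        f'{key}="{escape_label_value(str(val))}"'
        for key, val in sorted(attributes.items())
    )
    return f"{name}{{{labels}}} {value}"


# Hypothetical attribute values for a single client in round 3.
attrs = {
    "round_id": 3,
    "client_id": "client-a",
    "dataset": "mnist",
    "cluster_id": "cluster1",
}
line = render_sample("fl_round_accuracy", 0.91, attrs)
print(line)
```

One design consideration: round_id and client_id grow without bound over a long training run, so it may be worth checking how the metrics backend handles high-cardinality labels before adopting them as-is.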

      Definition of Done for Engineering Story Owner (Checklist)

      • ...

      Development Complete

      • The code is complete.
      • Functionality is working.
      • Any required downstream Docker file changes are made.

      Tests Automated

      • [ ] Unit/function tests have been automated and incorporated into the
        build.
      • [ ] 100% automated unit/function test coverage for new or changed APIs.

      Secure Design

      • [ ] Security has been assessed and incorporated into your threat model.

      Multidisciplinary Teams Readiness

      Support Readiness

      • [ ] The must-gather script has been updated.

              rh-ee-myan Meng Yan
              yuhe@redhat.com Yuanyuan He
              Hui Chen Hui Chen