Type: Spike
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: Global Hub, Server Foundation
Labels:

Activity Type:
Product / Portfolio Work
Story Points:
3
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
2025 GSoC - Federated Learning in Open Cluster Management
Acceptance Criteria:
Provide the required acceptance criteria using this template. ...
Intelligence Requested:
Market:

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Value Statement

Through collaborative innovation with the APAC CTO and the Edge Tech CoP, we have built a prototype that onboards Federated Learning (FL) into ACM. By leveraging ACM's existing architecture and APIs, the prototype can deploy FL runtimes to multi-cluster environments, dispatch training workloads to remote clusters for local training, and aggregate the trained parameters back.
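This ticket does not say how the prototype aggregates the trained parameters. As a minimal sketch, assuming FedAvg-style weighted averaging (each cluster's update weighted by its local sample count) and parameters flattened into plain Python lists, the aggregation step could look like this. `fed_avg` is a hypothetical helper, not an ACM or OCM API; frameworks such as Flower ship their own aggregation strategies, including FedAvg.

```python
from typing import List, Sequence


def fed_avg(client_params: Sequence[Sequence[float]], num_samples: Sequence[int]) -> List[float]:
    """FedAvg-style aggregation: weight each client's parameters by its sample count."""
    if not client_params:
        raise ValueError("need at least one client update")
    if len(client_params) != len(num_samples):
        raise ValueError("one sample count per client is required")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must report parameters of the same shape")

    aggregated = [0.0] * dim
    for params, n in zip(client_params, num_samples):
        weight = n / total  # share of this client's data in the round
        for i, value in enumerate(params):
            aggregated[i] += weight * value
    return aggregated


# Hypothetical updates from two managed clusters with 100 and 300 local samples
print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # [2.5, 3.5]
```

Weighting by sample count keeps clusters with little local data from pulling the global model as strongly as clusters that trained on more data.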

As a next step, one of our goals is to find potential users who can collaborate with us to put the solution into real use and continuously enhance its features based on their feedback, in order to accelerate the solution's evolution to production level.

The first potential user is Professor Bahman from Western Sydney University. In the two use cases his lab is currently working on, FL is used to improve AI efficiency in applications for satellite Space Situational Awareness and Natural Disaster Management. The FL-related technical challenges, which center on energy awareness and low latency, include:

communication overhead (both the satellites and the drones collecting disaster data have limited time windows to transport the trained parameters back to the central side for aggregation)
footprint and energy consumption
reporting metrics back for accuracy and performance evaluation, e.g. training-related metrics such as computation time, accuracy, and training round, and resource usage such as power consumption
a demo that runs locally with clear setup guidelines, which requires supporting NodePort communication between the server and the clients
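The metrics requirement above can be made concrete with a per-round report. The sketch below is a minimal illustration under stated assumptions: the `RoundMetrics` fields, the cluster names, and the `summarize` helper are all hypothetical and not part of the prototype; energy is assumed to be measured or estimated on the client side and reported in joules.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class RoundMetrics:
    """Hypothetical per-round report a client cluster sends back to the FL server."""
    cluster: str            # managed cluster that ran local training
    train_round: int        # FL training round number
    compute_seconds: float  # wall-clock time spent on local training
    accuracy: float         # local validation accuracy in [0, 1]
    energy_joules: float    # measured or estimated energy used in this round
    upload_bytes: int       # size of the parameter update sent back


def summarize(reports: List[RoundMetrics]) -> Dict[str, float]:
    """Combine the client reports of one round into server-side evaluation metrics."""
    if not reports:
        raise ValueError("no client reports for this round")
    return {
        "clients": len(reports),
        "mean_accuracy": sum(r.accuracy for r in reports) / len(reports),
        # In a synchronous round, the slowest client delays aggregation.
        "max_compute_seconds": max(r.compute_seconds for r in reports),
        "total_energy_joules": sum(r.energy_joules for r in reports),
        # Bytes uploaded approximate the communication overhead of the round.
        "total_upload_bytes": sum(r.upload_bytes for r in reports),
    }


reports = [
    RoundMetrics("satellite-a", 1, 12.0, 0.875, 150.0, 2048),
    RoundMetrics("drone-b", 1, 8.0, 0.625, 90.0, 2048),
]
print(json.dumps(asdict(reports[0])))  # payload a client could send back
print(summarize(reports))
```

Tracking upload size alongside compute time and energy makes it possible to compare rounds against the limited contact windows of satellites and drones.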

Another use case is retail, with a franchise/dealer store management topology. Requirements for the FL platform include:

be edge ready/friendly
be easy to set up for dealers and franchises
be independent of the FL framework (Flower, OpenFL, FLARE)
support many different segmentations
support different FL participants per segmentation, updating different models
support full-scale experimentation across segmentations, which is necessary to achieve the best model accuracy/performance
allow testing different segmentations and provide metrics on model accuracy
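One way to express the segmentation requirements is to treat each cluster label key as a segmentation and each label value as one FL participant group that trains its own model. The sketch below assumes hypothetical cluster names and label keys (`region`, `tier`) and is plain Python, not an OCM API; in OCM itself, label-based selection of managed clusters is typically done through Placement.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_segmentation(clusters: Dict[str, Dict[str, str]], label_key: str) -> Dict[str, List[str]]:
    """Split clusters into participant groups, one group (and model) per label value."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, labels in sorted(clusters.items()):
        segment = labels.get(label_key)
        if segment is not None:  # clusters without the label sit out this segmentation
            groups[segment].append(name)
    return dict(groups)


# Hypothetical store clusters with hypothetical labels
clusters = {
    "store-001": {"region": "north", "tier": "dealer"},
    "store-002": {"region": "south", "tier": "franchise"},
    "store-003": {"region": "north", "tier": "franchise"},
}

for key in ("region", "tier"):
    print(key, group_by_segmentation(clusters, key))
# region {'north': ['store-001', 'store-003'], 'south': ['store-002']}
# tier {'dealer': ['store-001'], 'franchise': ['store-002', 'store-003']}
```

Running the same FL job once per segmentation and comparing per-group model accuracy is one possible way to approach the experimentation requirement.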

Definition of Done for Engineering Story Owner (Checklist)

Development Complete

The code is complete.
Functionality is working.
Any required downstream Dockerfile changes are made.

Tests Automated

~~[ ] Unit/function tests have been automated and incorporated into the build.~~
~~[ ] 100% automated unit/function test coverage for new or changed APIs.~~

Secure Design

~~[ ] Security has been assessed and incorporated into your threat model.~~

Multidisciplinary Teams Readiness

~~[ ] Create an informative documentation issue using the [Customer Portal_doc_issue template](https://github.com/stolostron/backlog/issues/new?assignees=&labels=squad%3Adoc&template=doc_issue.md&title=), and ensure the doc acceptance criteria are met. Link the development issue to the doc issue.~~
~~[ ] Provide input to the QE team, and ensure the QE acceptance criteria (established between the story owner and the QE focal) are met.~~

Support Readiness

~~[ ] The must-gather script has been updated.~~

is cloned by

ACM-22688 Support System-Level Federated Learning Metrics in Open Cluster Management

Closed

ACM-21010 Chatting with ACM - Prototype and Cases

Assignee: Meng Yan

Reporter: Yuanyuan He

QA Contact: Hui Chen

Votes: 1

Watchers: 4

Created: 2025/03/28 6:57 AM

Updated: 2025/08/11 2:03 AM

Resolved: 2025/08/11 2:03 AM