-
Epic
-
Resolution: Unresolved
-
Major
-
None
-
None
-
Annotation led refresh
-
L
-
False
-
-
False
-
-
To Do
-
GITOPS-8789 - AppSet Progressive Sync GA
-
100% To Do, 0% In Progress, 0% Done
-
-
Epic Goal
- Ensure that every Application belonging to an ApplicationSet has been reconciled before a Progressive Sync starts. This changes the internal logic of Progressive Sync; the possible end-user impact is a delay in the overall sync of Applications across the defined steps. With this change, Progressive Sync can start only after all Applications have been refreshed, so its decisions are always based on the latest state of the child Applications. This is intended to eliminate out-of-order execution caused by stale data.
Why is this important?
- Argo CD has no mechanism to ensure that all Applications of an ApplicationSet are reconciled together, so we have to assume eventual consistency.
- Progressive Sync relies on the ApplicationSet controller’s view of its child Applications. However, because Kubernetes controllers are asynchronous, the ApplicationSet controller currently makes a "Go/No-Go" sync decision without verifying the freshness of that data, and in certain scenarios it may base the decision on an outdated Application status.
- Stability blocker: this is the primary blocker for moving the Progressive Sync feature to stable, and thus to GA (issue: https://github.com/argoproj/argo-cd/issues/22558).
- Operational risk: without this, users cannot trust that a multi-stage rollout will respect the defined order (for example, Production could be synced before Staging finishes).
- This was suggested on the proposal as an alternative to changing the caching layer, which had been assumed to be the root cause of the stale data. Instead of a heavy-handed refactor of the entire caching layer, this approach uses an existing Argo CD pattern (annotations) to create a "Ready-to-Check" signal.
[Keith] Will the outcome of this epic provide that level of guarantee and assurance that the multi-stage rollout will be respected, or will the use of annotations only improve the situation to some degree? Do you think it will work, especially in more complex AppSet configurations with many Applications?
Scenarios
(extracted from the proposal - please refer to the linked proposal for more details)
- If all the apps in my appset pull from a single targetRevision, I expect that a change pushed to that targetRevision will be progressively synced through the defined stages.
- If I make a change to the AppSet's spec.template, and if that change causes two or more Apps to go OutOfSync, I expect my change to be progressively synced through the defined stages.
- If my AppSet's Applications have different sources (repos, branches, etc.), a change made to all sources at one time should be progressively synced through the defined stages.
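The ordering expectation behind these scenarios can be illustrated with a small model. This is a minimal sketch, not the controller's implementation: `next_step_to_sync` is a hypothetical helper, the step and Application names are made up, and the model assumes a step may proceed only once every Application in the earlier steps reports `Healthy` (it ignores settings such as `maxUpdate`). It also shows how a stale view of a single Application would let a later step start early.

```python
def next_step_to_sync(steps, status):
    """Return the index of the first step that still has a non-Healthy app.

    Hypothetical helper: `steps` is an ordered list of app-name lists and
    `status` maps app name -> status as seen by the ApplicationSet controller.
    Returns None when every app in every step is Healthy.
    """
    for index, step_apps in enumerate(steps):
        if any(status[name] != "Healthy" for name in step_apps):
            return index
    return None


# Illustrative rollout order: dev, then two staging apps, then prod.
steps = [["dev"], ["staging-a", "staging-b"], ["prod"]]

# Fresh view: staging-b is still rolling out, so the rollout stays on step 1.
fresh_view = {
    "dev": "Healthy",
    "staging-a": "Healthy",
    "staging-b": "Progressing",
    "prod": "Waiting",
}
step_with_fresh_view = next_step_to_sync(steps, fresh_view)

# Stale view: staging-b still shows its old Healthy status, so the same
# logic moves on to prod (step 2) before staging has actually finished.
stale_view = dict(fresh_view, **{"staging-b": "Healthy"})
step_with_stale_view = next_step_to_sync(steps, stale_view)
```

In this model the decision logic is identical in both cases; only the freshness of the input differs, which is the gap this epic targets.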
SDLC Questionnaire
| S.No | Questions | Yes/No | Sample JIRA Epic |
|---|---|---|---|
| 1 | Does this Epic address a change in the way the product is being used? (eg: Adding support for OpenShift GitOps to be used in ROSA cluster with HCP) | No | GITOPS-5223 |
| 2 | Does this Epic require a change in the application's runtime - Upgrade of operator-sdk, OLM, client-go, go-toolset ? | No | GITOPS-8104 |
| 3 | Is this Epic primarily dealing with introducing a new security related feature (eg: Introduce SSO support) | No | GITOPS-437, GITOPS-547 |
| 4 | Is this Epic primarily dealing with the modification of an existing security feature ? (Eg: Supporting of External Authentication for SSO) | No | GITOPS-8017 |
| 5 | Does this Epic require changes to any cryptographic library ( Eg: FIPS support for OpenShift GitOps) | No | |
| 6 | Does this Epic require any new or change in the existing cryptographic algorithms used in the product (Eg: Using GPG verification for manifests, Upgrading from SHA256 to SHA512) | No | |
| 7 | Does this Epic require any change in existing authentication mechanisms (eg: Argo CD Auth integration with OpenShift, Kerberos to OAuth) | No | GITOPS-437 GITOPS-547 |
| 8 | Does this Epic require any change in authorisation mechanism (Eg: Using RBAC and service accounts impersonation for App Sync) | No | |
| 9 | Does this Epic require a change in the Communication protocol ( Eg: Using TLS to encrypt data traffic to/from Redis cache) | No | GITOPS-720 |
| 10 | Does this Epic require a change in how External Data is parsed and validated ? ( Eg: Change from JSON to Protobuf) | No | |
| 11 | Does this Epic require a change in core libraries or runtime (Eg: go compiler upgrade, Changing Operator SDK, controller-runtime, client-go versions) | No | |
| 12 | Does this Epic require exposing any internal service to internet (Eg: Allow exposing Argo CD Agent principal via Route, using ArgoCD CR) | No | |
| 13 | Does this Epic require a change in any existing gRPC service APIs | No | |
| 14 | Does this Epic require a change in any new external service (Eg: Support for OCI container registry for storing manifests) | No | |
| 15 | Does this Epic require a change in the tenancy model ? (Eg: Supporting Apps/Appsets in Any namespace, cluster and repo credentials in any namespace) | No | |
| 16 | Does this Epic require any addition/modification of RBAC resources (Service Account, Role, RoleBinding, ClusterRole, ClusterRoleBinding) ? | No | |
| 17 | Does this Epic require a feature that needs to be enabled only for cluster scoped Argo CD instances ? | No | |
Other Considerations
- This is not the only thing holding Progressive Sync back from GA.
- Earlier work on the annotation approach is in the draft PR https://github.com/argoproj/argo-cd/pull/25903, but the method needs changes.
- In theory, this should help resolve the linked issue, unless we hit an unknown variable or scenario.
Definition of Ready
- The epic has been broken down into stories.
- Stories have been scoped.
- The epic has been stack ranked.
Description and Background context:
The draft PR https://github.com/argoproj/argo-cd/pull/25903 is the first implementation; it incorporates feedback from the proposal https://github.com/argoproj/argo-cd/pull/24757#pullrequestreview-3316698997
Background context: the root cause of Applications syncing out of order was believed to be the controller-runtime caching delay when the ApplicationSet controller reads the latest data from its owned Applications. Feedback on the proposal suggests that the caching delay may contribute to the problem but is not necessarily the root cause. The ApplicationSet controller has no mechanism to get its Applications in order. Thus, before taking on cache improvements, which would be a major change, adding a way for the ApplicationSet controller to know which Application to get first should help eliminate the issue.
The idea is that once the ApplicationSet controller detects that any Application has pending changes, it adds refresh annotations to all owned Applications and then waits until all of them have been reconciled before proceeding with Progressive Sync.
The Application controller already has logic that reconciles an Application when that annotation is present. Once the Application controller has reconciled an Application, it removes the annotation and updates reconciledAt on the Application.
Put simply, once the annotation has been removed from all owned Applications, that indicates they have all been reconciled and should therefore hold the latest data for Progressive Sync to act on.
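The annotate-and-wait handshake can be sketched as follows. This is an illustrative model over plain dictionaries, not the controller code: `request_refresh`, `reconcile`, and `all_refreshed` are hypothetical helpers, and the Application names are made up. The annotation key `argocd.argoproj.io/refresh` is the existing Argo CD refresh annotation; using the value `normal` here is an assumption.

```python
from datetime import datetime, timezone

# Existing Argo CD refresh annotation; "normal" is assumed for this sketch.
REFRESH_ANNOTATION = "argocd.argoproj.io/refresh"


def request_refresh(apps):
    """ApplicationSet side (hypothetical): annotate every owned Application."""
    for app in apps:
        app["metadata"].setdefault("annotations", {})[REFRESH_ANNOTATION] = "normal"


def reconcile(app, now=None):
    """Application controller side, simplified: remove the annotation and
    stamp reconciledAt."""
    app["metadata"].get("annotations", {}).pop(REFRESH_ANNOTATION, None)
    app.setdefault("status", {})["reconciledAt"] = now or datetime.now(timezone.utc)


def all_refreshed(apps):
    """Gate: True only when no owned Application still carries the annotation."""
    return not any(
        REFRESH_ANNOTATION in app["metadata"].get("annotations", {}) for app in apps
    )


apps = [{"metadata": {"name": name}} for name in ("dev", "staging", "prod")]

request_refresh(apps)
gate_before = all_refreshed(apps)   # every app is still annotated

reconcile(apps[0])
reconcile(apps[1])
gate_partial = all_refreshed(apps)  # "prod" has not been reconciled yet

reconcile(apps[2])
gate_after = all_refreshed(apps)    # all annotations removed: safe to proceed
```

The gate stays closed while even one Application still carries the annotation, which is where the possible delay in the overall sync comes from.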
First task: how does the ApplicationSet controller detect when to add the annotations? The suggestion was to look at the timestamp at which an Application changes to Waiting. But to decide the order, a reference point for comparison is needed, which can be either the earliest or the latest transition time.
The draft PR added the annotation based on the earliest transition time, which did not resolve the out-of-order syncing issue and could possibly run into an endless loop. After discussion with community members (AlexG), it should switch to LatestTransitionTime instead.
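To make the difference between the two reference points concrete, here is a minimal sketch. It rests on an assumption that is not taken from the draft PR: an Application counts as needing a refresh when its `reconciledAt` is older than the chosen reference time. The helpers `reference_time` and `needs_refresh`, the Application names, and the timestamps are all illustrative.

```python
from datetime import datetime, timezone


def reference_time(app_statuses, use_latest=True):
    """Pick the comparison timestamp from the per-application transition times."""
    times = [status["lastTransitionTime"] for status in app_statuses]
    return max(times) if use_latest else min(times)


def needs_refresh(apps, reference):
    """Names of apps whose last reconcile predates the reference.

    Assumed rule for illustration only, not the draft PR's logic.
    """
    return [app["name"] for app in apps if app["reconciledAt"] < reference]


def at_minute(minute):
    """Build a fixed UTC timestamp so the example is deterministic."""
    return datetime(2025, 1, 1, 12, minute, tzinfo=timezone.utc)


# Each app entered its current status at a different time...
statuses = [
    {"application": "dev", "lastTransitionTime": at_minute(0)},
    {"application": "staging", "lastTransitionTime": at_minute(5)},
    {"application": "prod", "lastTransitionTime": at_minute(10)},
]
# ...and was last reconciled one minute after its own transition.
apps = [
    {"name": "dev", "reconciledAt": at_minute(1)},
    {"name": "staging", "reconciledAt": at_minute(6)},
    {"name": "prod", "reconciledAt": at_minute(11)},
]

stale_vs_earliest = needs_refresh(apps, reference_time(statuses, use_latest=False))
stale_vs_latest = needs_refresh(apps, reference_time(statuses, use_latest=True))
```

Under this assumed rule, the earliest transition time treats every Application as fresh, while the latest transition time flags the two Applications that were reconciled before the most recent transition.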
Definition of Done
- Code Complete:
- All code has been written, reviewed, and approved.
- Tested:
- Unit tests have been written and passed.
- Integration tests have been completed.
- System tests have been conducted, and all critical bugs have been fixed.
- Tested on OpenShift either upstream or downstream on a local build.
- Documentation:
- User documentation or release notes have been written.
- Build:
- Code has been successfully built and integrated into the main repository / project.
- Review:
- Code has been peer-reviewed and meets coding standards.
- All acceptance criteria defined in the user story have been met.
- Tested by reviewer on OpenShift.
- Deployment:
- The feature has been deployed on OpenShift cluster for testing.
- Acceptance:
- Product Manager or stakeholder has reviewed and accepted the work.
- clones
-
GITOPS-7387 Progressive Sync MVP for beta
-
- Release Pending
-