Epic
Resolution: Unresolved
Major
Automatic Cleanup of Orphaned Roles and RoleBindings in Source Namespaces
To Do
Epic Goal
The goal of this epic is to implement a safety mechanism that identifies namespaces containing "orphaned" Operator roles and automatically re-applies the tracking label. This ensures that even if tracking labels were lost during an upgrade or due to race conditions, the Operator can regain control and perform a clean deletion of resources.
Why is this important?
When a namespace is removed from the .spec.sourceNamespaces or .spec.applicationSet.sourceNamespaces list, the Operator should clean up the associated Roles and RoleBindings in that namespace. However, these resources are currently left behind as orphans.
Technical Root Cause: Investigation reveals a race condition involving the legacy function removeUnmanagedApplicationSetSourceNamespaceResources. During an upgrade, this code incorrectly removes the tracking label (argocd.argoproj.io/applicationset-managed-by-cluster-argocd) from the namespace before the new Operator version takes over.
Because the tracking label is removed prematurely, the new Operator logic cannot identify the namespace as one that requires cleanup, leaving orphaned Roles and RoleBindings behind indefinitely.
Approach
The Operator will use a label selector to list Roles across all namespaces and filter them based on deterministic naming conventions and labels. If a match is found in a non-tracked namespace, the Operator will temporarily re-add the tracking label to the namespace to trigger a standard reconciliation cleanup.
A resource is considered an "orphan" if it meets all of the following:
- Labels: Has app.kubernetes.io/part-of: argocd AND app.kubernetes.io/managed-by: <ArgoCD-CR-Name>.
- Deterministic Name: Matches (cr.Name)(cr.Namespace)-applicationset OR (cr.Name)(target-namespace).
- Scope Check: The Role contains permissions/references specifically for the Application or ApplicationSet CRD.
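The three criteria above can be sketched as a single predicate. This is only an illustration under stated assumptions, not the Operator's actual code: the Role and PolicyRule types are simplified stand-ins for the Kubernetes rbac/v1 types, the helper names are hypothetical, and the separator between name parts (assumedSep) is an assumption because this epic does not state it. The check that the namespace is currently non-tracked is left to the caller.

```go
package main

import "fmt"

// PolicyRule and Role are simplified stand-ins for the rbac/v1 types.
type PolicyRule struct {
	APIGroups []string
	Resources []string
}

type Role struct {
	Name      string
	Namespace string
	Labels    map[string]string
	Rules     []PolicyRule
}

// assumedSep is an ASSUMPTION: the epic text does not state which character
// joins the parts of the deterministic Role name.
const assumedSep = "_"

func contains(items []string, want string) bool {
	for _, item := range items {
		if item == want {
			return true
		}
	}
	return false
}

// expectedNames builds the two candidate names (hypothetical helper).
func expectedNames(crName, crNamespace, targetNamespace string) []string {
	return []string{
		crName + assumedSep + crNamespace + "-applicationset",
		crName + assumedSep + targetNamespace,
	}
}

// referencesArgoCRDs reports whether any rule grants access to the
// Application or ApplicationSet resources in the argoproj.io API group.
func referencesArgoCRDs(rules []PolicyRule) bool {
	for _, r := range rules {
		if !contains(r.APIGroups, "argoproj.io") {
			continue
		}
		if contains(r.Resources, "applications") || contains(r.Resources, "applicationsets") {
			return true
		}
	}
	return false
}

// isOrphanedRole applies the label, name, and scope checks in order.
func isOrphanedRole(role Role, crName, crNamespace string) bool {
	if role.Labels["app.kubernetes.io/part-of"] != "argocd" {
		return false
	}
	if role.Labels["app.kubernetes.io/managed-by"] != crName {
		return false
	}
	if !contains(expectedNames(crName, crNamespace, role.Namespace), role.Name) {
		return false
	}
	return referencesArgoCRDs(role.Rules)
}

func main() {
	role := Role{
		Name:      "argocd" + assumedSep + "team-a",
		Namespace: "team-a",
		Labels: map[string]string{
			"app.kubernetes.io/part-of":    "argocd",
			"app.kubernetes.io/managed-by": "argocd",
		},
		Rules: []PolicyRule{{APIGroups: []string{"argoproj.io"}, Resources: []string{"applications"}}},
	}
	fmt.Println(isOrphanedRole(role, "argocd", "argocd-ns")) // prints "true"
}
```

The sketch does not treat wildcard ("*") API groups or resources as a match; whether the real scope check should accept them is an open design question for the stories.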
Reference PR: argocd-operator/pull/1979
Other Considerations
- Risk: Infinite Reconciliation Loop. If the discovery logic and cleanup logic conflict, the label might be added and removed repeatedly.
- Mitigation: Strict validation that the label is only re-added if valid orphaned Roles (with Application/ApplicationSet permissions) are present.
- Risk: Regression in Stable Streams.
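The mitigation for the reconciliation-loop risk can be illustrated with a small simulation. It is a sketch under assumptions, not the Operator's implementation: Namespace, needsRelabel, cleanup, and reconcile are hypothetical names, cleanup is a stand-in for the standard reconciliation cleanup, and using the Argo CD namespace as the label value is an assumption. The point it shows is that when the label is re-added only while validated orphans exist, repeated passes converge instead of toggling the label.

```go
package main

import "fmt"

// Tracking label key as named in this epic.
const trackingLabel = "argocd.argoproj.io/applicationset-managed-by-cluster-argocd"

// Namespace is a simplified model; OrphanedRoles counts Roles that already
// passed the orphan validation (labels, name, and scope checks).
type Namespace struct {
	Name          string
	Labels        map[string]string
	OrphanedRoles int
}

// needsRelabel is the guard: re-add the label only when the namespace is
// untracked AND at least one validated orphan is present.
func needsRelabel(ns Namespace) bool {
	_, tracked := ns.Labels[trackingLabel]
	return !tracked && ns.OrphanedRoles > 0
}

// cleanup stands in for the standard reconciliation cleanup: it deletes the
// orphaned resources and then drops the tracking label.
func cleanup(ns *Namespace) {
	ns.OrphanedRoles = 0
	delete(ns.Labels, trackingLabel)
}

// reconcile performs one pass and reports whether the label was re-added.
func reconcile(ns *Namespace, crNamespace string) bool {
	if !needsRelabel(*ns) {
		return false
	}
	if ns.Labels == nil {
		ns.Labels = map[string]string{}
	}
	ns.Labels[trackingLabel] = crNamespace // ASSUMPTION: label value
	cleanup(ns)
	return true
}

func main() {
	ns := &Namespace{Name: "team-a", Labels: map[string]string{}, OrphanedRoles: 2}
	for pass := 1; pass <= 3; pass++ {
		fmt.Printf("pass %d: relabeled=%v\n", pass, reconcile(ns, "argocd-ns"))
	}
}
```

Only the first pass re-adds the label; later passes find no validated orphans and leave the namespace untouched, which is the property a regression test for this risk could assert.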
Definition of Ready
- The epic has been broken down into stories.
- Stories have been scoped.
- The epic has been stack ranked.
Definition of Done
- Code Complete:
- All code has been written, reviewed, and approved.
- Tested:
- Unit tests have been written and passed.
- Integration tests have been completed.
- System tests have been conducted, and all critical bugs have been fixed.
- Tested on OpenShift either upstream or downstream on a local build.
- Documentation:
- User documentation or release notes have been written.
- Build:
- Code has been successfully built and integrated into the main repository / project.
- Review:
- Code has been peer-reviewed and meets coding standards.
- All acceptance criteria defined in the user story have been met.
- Tested by reviewer on OpenShift.
- Deployment:
- The feature has been deployed on OpenShift cluster for testing.
- Acceptance:
- Product Manager or stakeholder has reviewed and accepted the work.