Bug
Resolution: Unresolved
Major
Important
Upstream issue: https://github.com/argoproj/argo-cd/issues/23806
This is a tracking JIRA ticket for reviewing https://github.com/argoproj/argo-cd/pull/25229
Summary
When the namespaces field of a cluster secret is updated, the handleModEvent function invalidates the entire cluster cache and then warms it back up from scratch.
It would be great if this could be avoided. With sufficiently many managed namespaces and API resources, the full rebuild leads to long outages in which all applications are stuck in the "Refreshing" state until the cache has been refilled.
Instead, modifying the namespaces field of a cluster-secret should only remove cache entries relating to the namespaces which are no longer under ArgoCD management.
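To do this, the controller first has to know which namespaces left management. One way is to diff the old and new namespace lists of the cluster secret, sketched below in Go. This is an illustration only: diffNamespaces is a hypothetical helper, not a function from Argo CD or from PR 25229. It also treats both lists as explicit. In Argo CD an empty namespaces list means cluster-wide access, and this sketch does not handle that case.

```go
package main

import (
	"fmt"
	"sort"
)

// toSet turns a namespace list into a set, dropping duplicates.
func toSet(items []string) map[string]struct{} {
	set := make(map[string]struct{}, len(items))
	for _, item := range items {
		set[item] = struct{}{}
	}
	return set
}

// diffNamespaces (hypothetical helper) compares the namespace list of the old
// and new cluster secret and returns the namespaces that were added and
// removed, each sorted. Assumes both lists are explicit (non-empty) lists.
func diffNamespaces(oldNs, newNs []string) (added, removed []string) {
	oldSet, newSet := toSet(oldNs), toSet(newNs)
	for ns := range newSet {
		if _, ok := oldSet[ns]; !ok {
			added = append(added, ns)
		}
	}
	for ns := range oldSet {
		if _, ok := newSet[ns]; !ok {
			removed = append(removed, ns)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	added, removed := diffNamespaces(
		[]string{"team-a", "team-b", "team-c"},
		[]string{"team-a", "team-c", "team-d"},
	)
	fmt.Println("added:", added)     // added: [team-d]
	fmt.Println("removed:", removed) // removed: [team-b]
}
```

Only the removed list needs to reach the cache. Existing entries stay valid when a namespace is added.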
Motivation
For ArgoCD in namespaced mode, adding and removing managed namespaces should not cause any issues for users. Currently, when namespaces under ArgoCD management are dynamically added to and removed from the cluster secret multiple times per day, ArgoCD is unusable for significant periods of time.
Proposal
How do you think this should be implemented?
Instead of invalidating the entire cluster cache when a cluster secret is modified, only invalidate the parts of the cache that can no longer be used, i.e.:
- if a namespace has been removed, remove all resources from that namespace from the cluster cache
- if a namespace has been added, the cache should still be valid, so leave all cache entries intact
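The two rules above can be sketched with a toy, namespace-keyed cache. On removal, only entries whose namespace left management are dropped. On addition, nothing is touched. The names resourceKey, namespacedCache and pruneNamespaces are hypothetical and used only for illustration. Argo CD's real cluster cache has its own types, and this is not the code from PR 25229. Populating a newly added namespace is out of scope here. That would presumably still require listing and watching its resources.

```go
package main

import "fmt"

// resourceKey (hypothetical) identifies a cached object.
// Cluster-scoped objects have an empty Namespace.
type resourceKey struct {
	Group, Kind, Namespace, Name string
}

// namespacedCache (hypothetical) is a toy stand-in for a per-cluster cache.
type namespacedCache struct {
	resources map[resourceKey]string // value: placeholder for the cached object
}

// pruneNamespaces deletes every entry belonging to one of the removed
// namespaces and keeps everything else, including cluster-scoped objects.
// It returns the number of deleted entries.
func (c *namespacedCache) pruneNamespaces(removed []string) int {
	drop := make(map[string]struct{}, len(removed))
	for _, ns := range removed {
		drop[ns] = struct{}{}
	}
	deleted := 0
	for key := range c.resources {
		if key.Namespace == "" {
			continue // cluster-scoped: never tied to a managed namespace
		}
		if _, ok := drop[key.Namespace]; ok {
			delete(c.resources, key) // deleting during range is allowed in Go
			deleted++
		}
	}
	return deleted
}

func main() {
	c := &namespacedCache{resources: map[resourceKey]string{
		{Kind: "Deployment", Namespace: "team-a", Name: "web"}: "obj",
		{Kind: "ConfigMap", Namespace: "team-b", Name: "settings"}: "obj",
		{Kind: "Service", Namespace: "team-b", Name: "web"}: "obj",
		{Kind: "Namespace", Name: "team-a"}: "obj",
	}}
	// "team-b" was removed from the cluster secret; "team-a" entries survive.
	n := c.pruneNamespaces([]string{"team-b"})
	fmt.Println("deleted:", n, "remaining:", len(c.resources)) // deleted: 2 remaining: 2
}
```

The prune is a single pass over the cache, and it touches no entries when the removed list is empty.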
This would reduce the load on the k8s API server, since the entire cluster state would no longer need to be refreshed. It would also ensure that ArgoCD works as expected shortly after the list of managed namespaces is modified.
Admittedly, I'm not familiar enough with ArgoCD's caching mechanisms to judge if this approach is reasonable, but I'd be happy to get feedback on this.
The request is to review and merge https://github.com/argoproj/argo-cd/pull/25229