Bug
Resolution: Done
Critical
I've found a fairly bad OpenShift GitOps/Argo CD bug that I'm seeing on the Stonesoup Staging cluster (but can easily be reproduced without Stonesoup).
The bug is VERY similar to, but not the same as, GITOPS-2242.
When Argo CD runs in namespace-scoped mode, e.g. using the `argocd.argoproj.io/managed-by: (argo cd namespace)` feature, it is unable to deploy to any of the other Namespaces it manages (i.e. it is entirely broken for this use case) if at least one other Namespace is in 'Terminating' state.
Or said another way, a Namespace stuck in Terminating state will break all other Argo CD deployments.
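As a quick way to check whether a cluster is exposed to this, the sketch below filters the output of `kubectl get namespaces -o json` for Namespaces that are in 'Terminating' phase and still carry the managed-by label. The function name is hypothetical, and it assumes the label is still present on the stuck Namespace; it is a diagnostic aid, not part of Argo CD.

```python
import json
from typing import List

# Label used by the namespace-scoped "managed-by" feature described above.
MANAGED_BY_LABEL = "argocd.argoproj.io/managed-by"


def terminating_managed_namespaces(namespace_list_json: str, argocd_namespace: str) -> List[str]:
    """Return names of Terminating Namespaces labeled as managed by argocd_namespace.

    namespace_list_json is the raw output of `kubectl get namespaces -o json`.
    (Hypothetical helper for illustration only.)
    """
    items = json.loads(namespace_list_json).get("items", [])
    stuck = []
    for ns in items:
        labels = ns.get("metadata", {}).get("labels") or {}
        phase = ns.get("status", {}).get("phase")
        if phase == "Terminating" and labels.get(MANAGED_BY_LABEL) == argocd_namespace:
            stuck.append(ns["metadata"]["name"])
    return stuck


# Sample input mirroring the scenario in this report: 'jane' is stuck, 'john' is fine.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "jane", "labels": {MANAGED_BY_LABEL: "gitops-service-argocd"}},
         "status": {"phase": "Terminating"}},
        {"metadata": {"name": "john", "labels": {MANAGED_BY_LABEL: "gitops-service-argocd"}},
         "status": {"phase": "Active"}},
        {"metadata": {"name": "default"}, "status": {"phase": "Active"}},
    ]
})
print(terminating_managed_namespaces(sample, "gitops-service-argocd"))
```

Against a real cluster, the same function could be fed the output of `kubectl get namespaces -o json` instead of the sample data.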
Steps to reproduce:
1) Follow the 'Steps to reproduce' on https://issues.redhat.com/browse/GITOPS-2242
- The symptoms of this bug differ slightly from that bug, so next, proceed to step 2
2) 'kubectl apply' the following Argo CD Application:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: gitops-service-argocd
spec:
  destination:
    namespace: john
    server: https://kubernetes.default.svc
  project: default
  source:
    path: kustomize-guestbook
    repoURL: https://github.com/argoproj/argocd-example-apps
    targetRevision: master
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
3) Wait a few moments, then run 'kubectl get -n gitops-service-argocd application.argoproj.io/my-app -o yaml'
You will see that the Argo CD Application fails to deploy, and has the following status:
status:
  conditions:
  - lastTransitionTime: "2023-02-02T17:58:41Z"
    message: 'failed to sync cluster https://172.30.0.1:443: failed to load initial state of resource Service: services is forbidden: User "system:serviceaccount:gitops-service-argocd:gitops-service-argocd-argocd-application-controller" cannot list resource "services" in API group "" in the namespace "jane"'
    type: ComparisonError
  - lastTransitionTime: "2023-02-02T17:58:41Z"
    message: 'failed to sync cluster https://172.30.0.1:443: failed to load initial state of resource Service: services is forbidden: User "system:serviceaccount:gitops-service-argocd:gitops-service-argocd-argocd-application-controller" cannot list resource "services" in API group "" in the namespace "jane"'
    type: ComparisonError
  health:
    status: Healthy
  reconciledAt: "2023-02-02T17:58:41Z"
  sync:
    comparedTo:
      destination:
        namespace: john
        server: https://kubernetes.default.svc
      source:
        path: kustomize-guestbook
        repoURL: https://github.com/argoproj/argocd-example-apps
        targetRevision: master
    status: Unknown
You can see that Argo CD is not able to deploy to the 'john' Namespace.
This error is unexpected as:
- It is failing in the 'jane' Namespace, and we are not trying to deploy into the 'jane' Namespace.
- No existing Applications are attempting to deploy to the 'jane' Namespace.
- Argo CD DOES have a valid Role/RoleBinding in 'john', which points to the valid ServiceAccount in gitops-service-argocd.
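The mismatch in the list above can be made concrete with a small sketch that pulls the denied Namespace out of the condition message and compares it with the Application's destination. The helper name and the regular expression are assumptions for illustration; they rely only on the `in the namespace "..."` wording seen in the message quoted in this report.

```python
import re
from typing import Optional

# Matches the trailing `in the namespace "<name>"` part of the forbidden error.
FORBIDDEN_NS = re.compile(r'in the namespace "([^"]+)"')


def forbidden_namespace(message: str) -> Optional[str]:
    """Return the Namespace named in a 'forbidden' message, or None if absent.

    (Hypothetical helper for illustration only.)
    """
    match = FORBIDDEN_NS.search(message)
    return match.group(1) if match else None


# Condition message copied from the Application status in this report.
message = (
    'failed to sync cluster https://172.30.0.1:443: failed to load initial state '
    'of resource Service: services is forbidden: User '
    '"system:serviceaccount:gitops-service-argocd:gitops-service-argocd-argocd-application-controller" '
    'cannot list resource "services" in API group "" in the namespace "jane"'
)
destination_namespace = "john"  # spec.destination.namespace of my-app

denied = forbidden_namespace(message)
# The denied Namespace differs from the one the Application targets.
print(denied, denied != destination_namespace)
```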
So why is that message being printed? The initial 'kubectl delete ns jane' command deleted the Role/RoleBindings in 'jane', so Argo CD can no longer list resources in that Namespace. That alone appears to be enough to prevent Argo CD from deploying to any other Namespace.