Bug
Resolution: Done
Major
1.8.3
5
False
False
5
GITOPS Sprint 3251, GitOps Crimson - Sprint 3258, GitOps Crimson - Sprint 3259, GitOps Crimson - Sprint 3262, GitOps Crimson - Sprint 3263, GitOps Crimson - Sprint 3268
Customer Facing
Description of problem:
The Argo CD notification-controller sends a "success" notification while the application is still in the "progressing" state.
Related Upstream Issue: https://github.com/argoproj/argo-cd/issues/9070
The customer is using the Argo CD notification-controller (installed via the GitOps operator) to trigger a webhook on successful deployments. This generally works, but it also often produces a false positive: the 'app-deployed' notification is sent while the app is not yet deployed and is still in the 'progressing' state.
They are using the default 'on-deployed' trigger as shipped with GitOps and documented in the upstream Argo CD project:
trigger.on-deployed: |-
  - description: Application is synced and healthy. Triggered once per commit.
    oncePer: app.status.operationState.syncResult.revision
    send:
    - app-deployed
    when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy'
While analyzing this problem, they found that a similar issue had been reported in the upstream GitHub repository: https://github.com/argoproj/argo-cd/issues/9070.
Workaround
While trying to reproduce the issue, they noticed that false positives did not occur on every deployment, but often enough to be a problem. They therefore suspect a race condition in how the notification controller tracks the app health state. Based on this assumption, they worked around the issue by adding a minimum-age condition (effectively a sleep interval) to the 'on-deployed' trigger:
trigger.on-deployed: |-
  - description: Application is synced and healthy. Triggered once per commit.
    oncePer: app.status.operationState.syncResult.revision
    send:
    - app-deployed
    when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy' and time.Now().Sub(time.Parse(app.status.operationState.startedAt)).Seconds() >= 10
This is not elegant, but it has so far avoided the issue: since they introduced the workaround, the false positives have not reappeared. They would still appreciate a proper fix, because their use cases include large numbers of parallel deployments and the sleep interval slows every one of them down.
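The effect of the extra condition can be sketched in Python. This is only a model of the two `when` expressions above, not the notification controller's implementation: `AppStatus`, `default_when` and `guarded_when` are hypothetical names, and the "stale health" timeline is an assumption based on the suspected race condition, not a confirmed root cause.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class AppStatus:
    """Hypothetical snapshot of the fields the trigger reads."""
    phase: str            # app.status.operationState.phase
    health: str           # app.status.health.status
    started_at: datetime  # app.status.operationState.startedAt


def default_when(app: AppStatus) -> bool:
    """Models the default on-deployed condition."""
    return app.phase in ["Succeeded"] and app.health == "Healthy"


def guarded_when(app: AppStatus, now: datetime, min_age_s: float = 10) -> bool:
    """Models the workaround: the operation must also be at least min_age_s old."""
    age_s = (now - app.started_at).total_seconds()
    return default_when(app) and age_s >= min_age_s


start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Assumed race: 2 s after the sync started, the phase is already "Succeeded"
# but the health field still holds the pre-sync value "Healthy".
stale = AppStatus(phase="Succeeded", health="Healthy", started_at=start)
early = start + timedelta(seconds=2)

# 15 s after the sync started, health has been reassessed.
progressing = AppStatus(phase="Succeeded", health="Progressing", started_at=start)
healthy = AppStatus(phase="Succeeded", health="Healthy", started_at=start)
late = start + timedelta(seconds=15)

print(default_when(stale))              # True  (false positive)
print(guarded_when(stale, early))       # False (suppressed by the age guard)
print(guarded_when(progressing, late))  # False (still progressing)
print(guarded_when(healthy, late))      # True  (genuinely deployed)
```

In this model the guard does not remove the race; it only assumes that health has been reassessed within 10 seconds, and it delays every notification until the operation is at least that old, which matches the slowdown the customer describes.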
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce
Check Upstream Issue: https://github.com/argoproj/argo-cd/issues/9070
Reproducibility (Always/Intermittent/Only Once):
Always
Acceptance criteria:
Definition of Done:
Build Details:
Additional info (Such as Logs, Screenshots, etc):
Customer satisfaction is impacted since this issue causes noticeably longer wait times.
relates to: GITOPS-5572 Timestamp for Application Health Status (Closed)
links to: RHEA-2025:144480 Errata Advisory for Red Hat OpenShift GitOps v1.16.0