
Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 1.16.0
Affects Version/s: 1.8.3
Component/s: ArgoCD, Operator
Labels:

Story Points:
5
Blocked:
False
Blocked Reason:

None
Ready:
False
Release Note Text:

Before this update, the ArgoCD notification-controller's `on-deployed` trigger would send a `success` notification even when the application was still in the `progressing` state. This issue occurred because of how ArgoCD handles application status updates. To resolve this, a new time field, `status.health.lastTransitionTime`, has been introduced in the application status. This field records the timestamp of the last health status change. Using this new field, the `on-deployed` trigger has been stabilized to prevent false-positive notifications.
Git Pull Request:
https://github.com/argoproj/argo-cd/pull/21333, https://github.com/argoproj-labs/argocd-operator/pull/1633
Intelligence Requested:
Market:

Original story points:
5
Sprint:
GITOPS Sprint 3251, GitOps Crimson - Sprint 3258, GitOps Crimson - Sprint 3259, GitOps Crimson - Sprint 3262, GitOps Crimson - Sprint 3263, GitOps Crimson - Sprint 3268
Customer Impact:

Customer Facing

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:
The ArgoCD notification-controller sends a "success" notification while the application is still in the "progressing" state.
Related Upstream Issue: https://github.com/argoproj/argo-cd/issues/9070

The customer is using the ArgoCD notification-controller (installed via the GitOps operator) to trigger a webhook on successful deployments. While this generally works, they noticed that it often produces a false positive, i.e. it triggers the 'app-deployed' notification while the app is not yet deployed and is still in the 'progressing' state.

They are using the default 'on-deployed' trigger as shipped with gitops and also documented in the upstream argocd project:

trigger.on-deployed: |-
    - description: Application is synced and healthy. Triggered once per commit.
      oncePer: app.status.operationState.syncResult.revision
      send:
      - app-deployed
      when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status
          == 'Healthy'

While analyzing this problem they found a similar issue was reported to the upstream github repo: https://github.com/argoproj/argo-cd/issues/9070.

Workaround
During their attempts to reproduce this issue, they noticed that false positives occur often but not on every sync. Because of this, they suspect a race condition in how the notification controller tracks the app health state. Based on this assumption, they tried to work around the issue by adding a minimum delay to the 'on-deployed' trigger:

trigger.on-deployed: |-
    - description: Application is synced and healthy. Triggered once per commit.
      oncePer: app.status.operationState.syncResult.revision
      send:
      - app-deployed
      when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status
          == 'Healthy' and time.Now().Sub(time.Parse(app.status.operationState.startedAt)).Seconds() >= 10

While this is not elegant, it has so far helped them avoid the issue: since they introduced the workaround, the false positives have not reappeared. Still, they would like this bug to be fixed properly, because their use cases include large numbers of parallel deployments and the added delay slows them down noticeably.

Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce
Check Upstream Issue: https://github.com/argoproj/argo-cd/issues/9070

Reproducibility (Always/Intermittent/Only Once):

Always

Acceptance criteria:

Definition of Done:

Build Details:

Additional info (Such as Logs, Screenshots, etc):
Customer satisfaction is impacted since this issue causes noticeably longer wait times.

false-postive-notification.mov (57.92 MB, uploaded 2024/01/11 6:25 AM by Siddhesh Ghadi)
false-postive-notification-1.mov (57.92 MB, uploaded 2024/01/11 6:26 AM by Siddhesh Ghadi)

relates to

GITOPS-5572 Timestamp for Application Health Status

Closed

links to

openshift/openshift-docs#90047: RHDEVDOCS-6337: Content creation for GitOps 1.16 RN

RHEA-2025:144480 Errata Advisory for Red Hat OpenShift GitOps v1.16.0
