  1. OpenShift GitOps
  2. GITOPS-3699

Argo CD notification-controller sends "success" notification when application is still in "progressing" state


    • 5
    • Before this update, the Argo CD notification-controller's `on-deployed` trigger would send a `success` notification even when the application was still in the `progressing` state. This issue occurred because of how Argo CD handles application status updates. To resolve this, a new time field, `status.health.lastTransitionTime`, has been introduced in the application status. This field records the timestamp of the last health status change. Using this new field, the `on-deployed` trigger has been stabilized to prevent false-positive notifications.
    • 5
    • GITOPS Sprint 3251, GitOps Crimson - Sprint 3258, GitOps Crimson - Sprint 3259, GitOps Crimson - Sprint 3262, GitOps Crimson - Sprint 3263, GitOps Crimson - Sprint 3268
    • Customer Facing

      Description of problem:
      Argo CD notification-controller sends "success" notification when application is still in "progressing" state.
      Related Upstream Issue: https://github.com/argoproj/argo-cd/issues/9070
       
      The customer is using the Argo CD notification-controller (installed via the GitOps operator) to trigger a webhook on successful deployments. While this generally works, they noticed that it often produces a false positive: it triggers the 'app-deployed' notification while the app is in fact not yet deployed but still in the 'progressing' state.
       
      They are using the default 'on-deployed' trigger as shipped with gitops and also documented in the upstream argocd project:

      trigger.on-deployed: |-
          - description: Application is synced and healthy. Triggered once per commit.
            oncePer: app.status.operationState.syncResult.revision
            send:
            - app-deployed
            when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status
                == 'Healthy' 

      While analyzing this problem, they found that a similar issue had been reported in the upstream GitHub repository: https://github.com/argoproj/argo-cd/issues/9070.
       
      Workaround
      While trying to reproduce the issue, they noticed that false positives occur often, though not on every deployment. They therefore suspect a race condition in how the notification controller tracks application health. Based on this assumption, they worked around the issue by adding a minimum-elapsed-time condition (effectively a sleep interval) to the 'on-deployed' trigger:
       

      trigger.on-deployed: |-
          - description: Application is synced and healthy. Triggered once per commit.
            oncePer: app.status.operationState.syncResult.revision
            send:
            - app-deployed
            when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status
                == 'Healthy' and time.Now().Sub(time.Parse(app.status.operationState.startedAt)).Seconds() >= 10 

      While this is not elegant, it has so far helped them avoid the issue: since introducing the workaround, the false positives have not reappeared. Still, they would appreciate a proper fix, because their use cases involve large numbers of parallel deployments and the added delay slows them down considerably.
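
      For comparison with the workaround, a trigger that relies on the `status.health.lastTransitionTime` field mentioned in the release note might look like the sketch below. The comparison against `app.status.operationState.finishedAt` is an assumption made for illustration only; the condition actually shipped with the fix may differ.

      ```yaml
      trigger.on-deployed: |-
          - description: Application is synced and healthy. Triggered once per commit.
            oncePer: app.status.operationState.syncResult.revision
            send:
            - app-deployed
            # Assumption: only fire when the last health transition did not happen
            # before the sync operation finished, so a stale 'Healthy' status from
            # the previous revision does not satisfy the condition.
            when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status
                == 'Healthy' and !time.Parse(app.status.health.lastTransitionTime).Before(time.Parse(app.status.operationState.finishedAt))
      ```

      Unlike the elapsed-time workaround, a condition of this kind would not impose a fixed wait on every deployment, which matters for the large number of parallel deployments described above.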
       
      Prerequisites (if any, like setup, operators/versions):
      Steps to Reproduce
      Check Upstream Issue: https://github.com/argoproj/argo-cd/issues/9070
       
      Reproducibility (Always/Intermittent/Only Once):

      Always

      Acceptance criteria: 
       
      Definition of Done:
       
      Build Details:
       
      Additional info (Such as Logs, Screenshots, etc):
      Customer satisfaction is impacted since this issue causes noticeably longer wait times. 

        1. false-postive-notification.mov
          57.92 MB
          Siddhesh Ghadi
        2. false-postive-notification-1.mov
          57.92 MB
          Siddhesh Ghadi

              rh-ee-sghadi Siddhesh Ghadi
              rhn-support-gio Ginilekshmi A O (Inactive)