OpenShift Bugs / OCPBUGS-3887

[sig-arch] Check if alerts are firing during or after upgrade success --- alert TargetDown fired for x seconds

    • Bug
    • Resolution: Obsolete
    • Major
    • 4.12
    • Monitoring
    • Quality / Stability / Reliability
    • False
    • Rejected
    • MON Sprint 238
    • 1

      These alerts are causing failures in upgrade jobs across multiple platforms and CNIs (seen on both ovn and sdn). The original bug was filed in Bugzilla and can't be updated in Jira, so it is
      being closed in favor of this new issue to move tracking entirely into Jira.

      last 7 days as of 11/18/2022:

      across all 4.12 upgrade jobs (ovn and sdn, all platforms, 4.12->4.12 as well as 4.11->4.12), it's seen in 17% of all failed jobs:
      https://search.ci.openshift.org/?search=alert+TargetDown+fired+for&maxAge=168h&context=1&type=junit&name=periodic.*4.12.*-upgrade&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
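      For anyone re-running this query with a different time window or job pattern, here is a minimal sketch that rebuilds the search URL above from its query parameters. The helper name `build_search_url` is hypothetical, and the parameter names are taken only from the URL itself; their exact semantics on search.ci.openshift.org are assumed, not confirmed here.

```python
from urllib.parse import urlencode

# Base address taken from the URL quoted in this issue.
BASE_URL = "https://search.ci.openshift.org/"


def build_search_url(search, job_name_regex, max_age="168h"):
    """Build a CI search URL (hypothetical helper, not part of any tool).

    The parameter names mirror the query string in the issue; what each
    one does server-side is an assumption based on its name.
    """
    params = {
        "search": search,          # text to look for in test output
        "maxAge": max_age,         # 168h corresponds to the last 7 days
        "context": 1,
        "type": "junit",
        "name": job_name_regex,    # regex selecting which jobs to include
        "excludeName": "",
        "maxMatches": 5,
        "maxBytes": 20971520,
        "groupBy": "job",
    }
    # safe="*" keeps the regex asterisks unescaped, as in the original URL.
    return BASE_URL + "?" + urlencode(params, safe="*")


url = build_search_url("alert TargetDown fired for", "periodic.*4.12.*-upgrade")
print(url)
```

      Changing `max_age` or `job_name_regex` narrows or widens the search; any regex used to select only ovn or only sdn jobs would depend on the job naming scheme, which is not stated in this issue.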

      for just ovn jobs, it's 11% of failures and for just sdn jobs it's 22% of failures.

      two interesting observations:

      1. when looking at 4.11->4.12 jobs specifically, it's still happening at an 18% rate, but
        sdn sees it in 28% of its failures, while ovn sees it in only 8%.
      2. and... when looking at 4.12->4.12 upgrade jobs, it's also happening at that same 17%
        rate, but all are in ovn and none in sdn.

      not sure what to make of that.

              spasquie@redhat.com Simon Pasquier
              jluhrsen Jamo Luhrsen
              Anurag Saxena