OCPBUGS-24195 (OpenShift Bugs)

Auth operator capable of firing over 100 events in seconds on OpenShiftAPICheckFailed

      The failure is fairly rare overall, but some platforms appear to hit it more often. Last night it occurred in 2 of 10 Azure runs, and aggregation failed on it. However, it appears to be a longstanding issue.

      The following test catches the problem:

      [sig-arch] events should not repeat pathologically for ns/openshift-authentication-operator

      The failure output looks similar to:

      {  1 events happened too frequently
      
      event happened 70 times, something is wrong: namespace/openshift-authentication-operator deployment/authentication-operator hmsg/16eeb8c913 - reason/OpenShiftAPICheckFailed "oauth.openshift.io.v1" failed with an attempt failed with statusCode = 503, err = the server is currently unable to handle the request From: 15:46:39Z To: 15:46:40Z result=reject }
      

      Seventy occurrences within a single second (15:46:39Z to 15:46:40Z) is severe, and the intervals database shows bursts of over 100 occurrences.
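      The kind of check that flags this can be approximated as follows. This is a minimal Python sketch, not the origin test's actual implementation: the `Event` type, the `pathological_events` and `max_burst` helpers, and the `REPEAT_THRESHOLD` value of 20 are all hypothetical. It groups events that are identical apart from their timestamp, flags any group that repeats more than the threshold, and measures the densest one-second burst, which is what makes 70 repeats inside one second stand out.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical threshold: the real limit used by the origin test is not stated in this report.
REPEAT_THRESHOLD = 20


@dataclass(frozen=True)
class Event:
    namespace: str
    locator: str      # e.g. "deployment/authentication-operator"
    reason: str       # e.g. "OpenShiftAPICheckFailed"
    message: str
    timestamp: float  # seconds since an arbitrary origin


def event_key(e):
    # Events count as "the same" when everything except the timestamp matches.
    return (e.namespace, e.locator, e.reason, e.message)


def pathological_events(events, threshold=REPEAT_THRESHOLD):
    """Map each event key to its count, keeping only keys seen more than `threshold` times."""
    counts = Counter(event_key(e) for e in events)
    return {key: n for key, n in counts.items() if n > threshold}


def max_burst(timestamps, window=1.0):
    """Largest number of timestamps inside any closed interval of length `window` seconds."""
    ts = sorted(timestamps)
    best = start = 0
    for end, t in enumerate(ts):
        # Advance the left edge until the span fits within `window` seconds.
        while t - ts[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


# Illustrative data only: 70 identical events spread over 0.69 seconds.
msg = ('"oauth.openshift.io.v1" failed with an attempt failed with statusCode = 503, '
       'err = the server is currently unable to handle the request')
events = [
    Event("openshift-authentication-operator", "deployment/authentication-operator",
          "OpenShiftAPICheckFailed", msg, 0.5 + i / 100)
    for i in range(70)
]

print(max_burst([e.timestamp for e in events]))  # 70
print(pathological_events(events))
```

      Keying on the full message rather than the reason alone means that one noisy check with a stable error text is counted as a single repeating event, which matches how the failure above reports one event "happening 70 times".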

      Sippy's page for this test shows which platforms hit the problem most often and can be used to find affected job runs. The two runs from yesterday were:

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1729512594592501760

      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1729512598153465856

            Lukasz Szaszkiewicz (lszaszki@redhat.com)
            Devan Goodwin (rhn-engineering-dgoodwin)
            Xingxing Xia
            Votes: 0
            Watchers: 7
