OpenShift Bugs / OCPBUGS-54873

console operator techpreview job frequently fails due to kube-apiserver still progressing

    • Quality / Stability / Reliability
    • Likely a testing issue, but must be resolved as it is a component readiness regression, hiding signal if nothing else.
    • Important
    • Approved

      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a potential regression in the following test:

      operator conditions kube-apiserver

      No significant regressions found

      Sample (being evaluated) Release: 4.19
      Start Time: 2025-03-13T00:00:00Z
      End Time: 2025-04-10T16:00:00Z
      Success Rate: 95.89%
      Successes: 140
      Failures: 6
      Flakes: 0

      Base (historical) Release: 4.18
      Start Time: 2025-03-11T00:00:00Z
      End Time: 2025-04-10T16:00:00Z
      Success Rate: 100.00%
      Successes: 102
      Failures: 0
      Flakes: 0

      View the test details report for additional context.

      For whatever reason, the console operator job completes while the kube-apiserver operator is still reporting:

      {Operator progressing (NodeInstaller): NodeInstallerProgressing: 1 node is at revision 5; 2 nodes are at revision 7  Operator progressing (NodeInstaller): NodeInstallerProgressing: 1 node is at revision 5; 2 nodes are at revision 7}
      

      This problem surfaces on the console operator jobs far more commonly than anywhere else, and it has been going on for some time. Note also that it appears to be a techpreview job that is affected.
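      For cross-checking a stuck rollout by hand, the Progressing condition can be read straight off the ClusterOperator object. A minimal sketch, assuming JSON as produced by `oc get clusteroperator kube-apiserver -o json` (the helper name is illustrative):

```python
import json

def progressing_condition(clusteroperator_json: str) -> dict:
    """Return the Progressing condition of a single ClusterOperator, or {}."""
    co = json.loads(clusteroperator_json)
    for cond in co.get("status", {}).get("conditions", []):
        if cond.get("type") == "Progressing":
            return cond
    return {}

# Illustrative input mirroring the failure message quoted above.
sample = json.dumps({"status": {"conditions": [{
    "type": "Progressing",
    "status": "True",
    "reason": "NodeInstaller",
    "message": "NodeInstallerProgressing: 1 node is at revision 5; "
               "2 nodes are at revision 7",
}]}})

cond = progressing_condition(sample)
print(cond["status"], cond["reason"])  # → True NodeInstaller
```

      The same condition is what the operator-conditions test asserts on; a status of "True" with reason NodeInstaller means the static-pod revision rollout has not finished.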

      This job appears not to use origin, so most of the debug tools we would normally rely on (namely intervals) appear to be missing.

      Normal openshift origin test steps have code that waits for a stable cluster; it's possible these checks are missing from your job. Hooking the resourcewatch observer into the flow would also help: it creates a git repo where each update to most resources in the cluster is modelled as a commit, telling you when updates happened and what they changed. https://github.com/openshift/release/tree/e3d11049e53bdf229e4d77f72484d7cdda128dbf/ci-operator/step-registry/observers/resource-watch
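      The stable-cluster wait could also be approximated in the job itself. A rough sketch, assuming the caller supplies a `fetch` callable wrapping `oc get clusteroperators -o json` (names and timeouts are illustrative; origin's real check is more thorough, also watching Available and Degraded and requiring the state to hold for a while):

```python
import json
import time

def cluster_stable(clusteroperators_json: str) -> bool:
    """True when no ClusterOperator reports Progressing=True."""
    doc = json.loads(clusteroperators_json)
    for item in doc.get("items", []):
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Progressing" and cond.get("status") == "True":
                return False
    return True

def wait_for_stable(fetch, timeout: float = 600, interval: float = 15) -> bool:
    """Poll `fetch` (e.g. a wrapper around `oc get clusteroperators -o json`)
    until no operator is progressing or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if cluster_stable(fetch()):
            return True
        time.sleep(interval)
    return False
```

      Running such a wait at the end of the test flow would at least turn "kube-apiserver still progressing" into an explicit timeout with a clear message, rather than a failed operator-conditions assertion.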

              jhadvig@redhat.com Jakub Hadvig
              rhn-engineering-dgoodwin Devan Goodwin
              YaDan Pei YaDan Pei