OCPBUGS-52848

Component Readiness: samples operator failing installs frequently on gcp

    • Important
    • Yes
    • False

      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a regression in the following test:

      install should succeed: overall

      Extreme regression detected.
      Fisher's Exact probability of a regression: 100.00%.
      Test pass rate dropped from 98.88% to 57.89%.
      Overrode base stats using release 4.17

      Sample (being evaluated) Release: 4.19
      Start Time: 2025-03-03T00:00:00Z
      End Time: 2025-03-10T08:00:00Z
      Success Rate: 57.89%
      Successes: 22
      Failures: 16
      Flakes: 0

      Base (historical) Release: 4.17
      Start Time: 2024-09-01T00:00:00Z
      End Time: 2024-10-01T00:00:00Z
      Success Rate: 98.88%
      Successes: 88
      Failures: 1
      Flakes: 0
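The figures above can be re-derived with a short sketch. It assumes a one-sided Fisher's exact test on the 2x2 table of successes and failures, and that the reported "probability of a regression" is 1 minus that p-value; Component Readiness's actual computation is not shown in this report, and the function names are hypothetical.

```python
from math import comb


def pass_rate(successes, failures):
    """Pass rate as a percentage of all runs."""
    return 100.0 * successes / (successes + failures)


def fisher_one_sided(sample_pass, sample_fail, base_pass, base_fail):
    """One-sided Fisher's exact p-value.

    Probability, with all table margins fixed, that the sample would
    contain sample_pass or fewer passing runs by chance alone.
    ASSUMPTION: this may not be the exact variant Component Readiness uses.
    """
    n_sample = sample_pass + sample_fail
    total = n_sample + base_pass + base_fail
    total_pass = sample_pass + base_pass
    total_fail = total - total_pass
    # Smallest possible number of passes in the sample given the margins.
    lowest = max(0, n_sample - total_fail)
    tail = sum(
        comb(total_pass, k) * comb(total_fail, n_sample - k)
        for k in range(lowest, sample_pass + 1)
    )
    return tail / comb(total, n_sample)


# Counts taken from the sample (4.19) and base (4.17) blocks above.
sample_rate = pass_rate(22, 16)
base_rate = pass_rate(88, 1)
p_value = fisher_one_sided(22, 16, 88, 1)

print(f"sample pass rate: {sample_rate:.2f}%")
print(f"base pass rate:   {base_rate:.2f}%")
print(f"1 - p:            {100 * (1 - p_value):.2f}%")
```

With only 17 failures across both windows and 16 of them in the sample, the p-value is extremely small, which is consistent with the report rounding to 100.00%.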

      View the test details report for additional context.

      GCP installs seem to be failing frequently with the error:

      These cluster operators were not stable: [openshift-samples]
      

      From: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.19-e2e-gcp-ovn-techpreview/1898814955482779648

      The samples operator reports:

      status:
        conditions:
          - lastTransitionTime: "2025-03-09T19:56:05Z"
            status: "False"
            type: Degraded
          - lastTransitionTime: "2025-03-09T19:56:17Z"
            message: Samples installation successful at 4.19.0-0.nightly-2025-03-09-190956
            status: "True"
            type: Available
          - lastTransitionTime: "2025-03-09T20:43:02Z"
            message: "Samples installed at 4.19.0-0.nightly-2025-03-09-190956, with image import failures for these imagestreams: java,kube-root-ca.crt,openshift-service-ca.crt,nodejs; last import attempt 2025-03-09 19:57:39 +0000 UTC"
            reason: FailedImageImports
            status: "False"
            type: Progressing
      

      I'm confused about how this is failing the install given Available=True and Degraded=False, yet the Progressing message does report a problem (FailedImageImports). This artifact may have been collected a few minutes after the install failed; is it possible the operator stabilizes (ignores these errors) in that time? Note that not all installs fail this way, but a good chunk do.
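One way to probe this question is to compare the condition timestamps in the status block. The sketch below assumes "stable" means Available=True, Progressing=False, Degraded=False; that definition is an assumption for illustration, since the installer's actual check is not quoted in this report. It relies on `lastTransitionTime` recording when a condition's status last changed.

```python
from datetime import datetime, timezone

# Conditions copied from the status block in this report.
conditions = [
    {"type": "Degraded", "status": "False",
     "lastTransitionTime": "2025-03-09T19:56:05Z"},
    {"type": "Available", "status": "True",
     "lastTransitionTime": "2025-03-09T19:56:17Z"},
    {"type": "Progressing", "status": "False", "reason": "FailedImageImports",
     "lastTransitionTime": "2025-03-09T20:43:02Z"},
]

# ASSUMPTION: hypothetical definition of a "stable" operator.
STABLE = {"Available": "True", "Progressing": "False", "Degraded": "False"}


def parse_ts(ts):
    """Parse an RFC 3339 UTC timestamp such as 2025-03-09T19:56:05Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def looks_stable(conds):
    """True if every condition has the status the assumed definition expects."""
    by_type = {c["type"]: c["status"] for c in conds}
    return all(by_type.get(t) == s for t, s in STABLE.items())


def transition_time(conds, cond_type):
    """Time at which the named condition last changed status."""
    for c in conds:
        if c["type"] == cond_type:
            return parse_ts(c["lastTransitionTime"])
    raise KeyError(cond_type)


gap = transition_time(conditions, "Progressing") - transition_time(conditions, "Available")

print("stable in this snapshot:", looks_stable(conditions))
print("Available -> Progressing transition gap:", gap)
```

Under that assumption the snapshot looks stable, but Progressing last changed status at 20:43:02, roughly 47 minutes after Available did. That would fit the idea that the operator only settled after the install had already given up, though the report does not show when the install timed out.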

      The problem appears limited to 4.19 GCP jobs, though I do see one hit for vSphere.

      https://search.dptools.openshift.org/?search=These+cluster+operators+were+not+stable%3A.*openshift-samples&maxAge=48h&context=1&type=build-log&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
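For readers who want to adapt the search link above, this sketch decodes its query parameters and checks the `search` value against the error line quoted earlier. It assumes the `search` parameter is treated as a regular expression (the `.*` suggests so); Python's `re` module stands in for whatever engine the search service actually uses.

```python
import re
from urllib.parse import parse_qs, urlsplit

SEARCH_URL = (
    "https://search.dptools.openshift.org/"
    "?search=These+cluster+operators+were+not+stable%3A.*openshift-samples"
    "&maxAge=48h&context=1&type=build-log&name=&excludeName="
    "&maxMatches=5&maxBytes=20971520&groupBy=job"
)

# Error line quoted earlier in this report.
LOG_LINE = "These cluster operators were not stable: [openshift-samples]"

# keep_blank_values keeps empty parameters such as name= and excludeName=.
params = parse_qs(urlsplit(SEARCH_URL).query, keep_blank_values=True)
pattern = params["search"][0]

print("search pattern:", pattern)
print("max age:", params["maxAge"][0])
print("matches quoted error:", re.search(pattern, LOG_LINE) is not None)
```

Changing `maxAge` or the operator name in the pattern would widen or narrow the search in the obvious way, assuming the service accepts those values.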

              aroyo@redhat.com Antonio Carlos Royo
              rhn-engineering-dgoodwin Devan Goodwin
              Jitendar Singh Jitendar Singh