OpenShift Bugs / OCPBUGS-60551

Component Readiness: [Installer / openshift-installer] [Other] test regressed

      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a potential regression in the following test:

      install should succeed: overall

      Significant regression detected.
      Fisher's Exact probability of a regression: 99.97%.
      Test pass rate dropped from 100.00% to 89.36%.

      Sample (being evaluated) Release: 4.20
      Start Time: 2025-08-08T00:00:00Z
      End Time: 2025-08-15T12:00:00Z
      Success Rate: 89.36%
      Successes: 84
      Failures: 10
      Flakes: 0
      Base (historical) Release: 4.19
      Start Time: 2025-05-18T00:00:00Z
      End Time: 2025-06-17T23:59:59Z
      Success Rate: 100.00%
      Successes: 38
      Failures: 0
      Flakes: 0
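
      For readers who want to sanity-check the numbers above, here is a minimal sketch of a pass-rate calculation and a one-sided Fisher's exact test on the pass/fail counts. The function names are hypothetical, and the one-sided formulation is an assumption: this report does not describe how Component Readiness computes its figure, so the value produced here may not match the 99.97% quoted above.

```python
from math import comb


def pass_rate(successes: int, failures: int) -> float:
    """Fraction of runs that passed."""
    return successes / (successes + failures)


def fisher_one_sided(sample_pass: int, sample_fail: int,
                     base_pass: int, base_fail: int) -> float:
    """One-sided Fisher's exact p-value (hypothetical helper).

    With the table margins held fixed, returns the probability of seeing
    at least `sample_fail` failures in the sample if sample and base
    shared the same underlying failure rate.
    """
    n_sample = sample_pass + sample_fail
    n_base = base_pass + base_fail
    total_fail = sample_fail + base_fail
    denominator = comb(n_sample + n_base, total_fail)

    numerator = 0
    # Sum the hypergeometric tail: k failures land in the sample,
    # the remaining failures land in the base.
    for k in range(sample_fail, min(total_fail, n_sample) + 1):
        numerator += comb(n_sample, k) * comb(n_base, total_fail - k)
    return numerator / denominator


if __name__ == "__main__":
    # Counts taken from the report: 4.20 sample vs. 4.19 base.
    print(f"sample pass rate: {pass_rate(84, 10):.2%}")
    p = fisher_one_sided(84, 10, 38, 0)
    print(f"one-sided p-value: {p:.4f}")
    print(f"1 - p: {1 - p:.2%}")
```

      A small p-value means the drop in pass rate is unlikely to be chance alone. Any gap between this sketch and the reported probability would come from details of the tool's calculation that are not stated here.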

      View the test details report for additional context.

      Also hits: install should succeed: other

      This is happening far too often: metal has had three regressions on the board for the past three weeks.

      I see two primary failure patterns:

      error: unable to iterate over layer sha256:fff8760c9373285309eebc4e509e9187a3280bc264071465edccf73e27e2f4c0 from quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1f11871b1ecd03626999334bb45265c12f76211e18724e4d4f0bc662285f4a72: unexpected EOF
      

      and

       Error: Error downloading packages:
        cockpit-ws-selinux-344-1.el9.x86_64: Cannot download, all mirrors were already tried without success 
      

      The precise image/package varies from failure to failure.

      These are longstanding issues in the metal jobs. What steps can be taken to stabilize them? Several were discussed the last time this came up. Has there been any progress on either of the following?

      • maintaining more stable dnf repos
      • generating junit tests to identify specific failure patterns and how often they occur
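
      As a starting point for the second item, here is a minimal sketch of how an install log could be scanned for the two failure patterns quoted above and turned into JUnit XML, so that each pattern shows up as its own test with its own pass rate. The test names, the suite name, and the function names are all hypothetical, and the regular expressions are derived only from the two error messages in this report.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical test names mapped to patterns taken from the errors in this bug.
PATTERNS = {
    "release image layers should download without unexpected EOF": re.compile(
        r"unable to iterate over layer sha256:[0-9a-f]+ from \S+: unexpected EOF"
    ),
    "dnf package downloads should succeed": re.compile(
        r"Cannot download, all mirrors were already tried without success"
    ),
}


def classify(log_text: str) -> dict:
    """Return, for each test name, the first regex match in the log or None."""
    return {name: pattern.search(log_text) for name, pattern in PATTERNS.items()}


def to_junit(log_text: str, suite_name: str = "metal-install-failure-patterns") -> str:
    """Build a JUnit XML string with one testcase per known failure pattern."""
    matches = classify(log_text)
    failure_count = sum(1 for m in matches.values() if m is not None)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(matches)),
        failures=str(failure_count),
    )
    for name, match in matches.items():
        case = ET.SubElement(suite, "testcase", name=name)
        if match is not None:
            # A match means the pattern occurred, so the testcase fails.
            failure = ET.SubElement(
                case, "failure", message="failure pattern found in install log"
            )
            failure.text = match.group(0)
    return ET.tostring(suite, encoding="unicode")


if __name__ == "__main__":
    sample_log = (
        "Error: Error downloading packages:\n"
        "  cockpit-ws-selinux-344-1.el9.x86_64: Cannot download, "
        "all mirrors were already tried without success\n"
    )
    print(to_junit(sample_log))
```

      Emitting a passing testcase when a pattern is absent, rather than omitting it, gives each pattern a denominator, which is what makes "how often they occur" measurable over time.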

      We need creative thinking here. We cannot leave metal installs failing like this: it pollutes signal and causes problems for TRT and the installer team, who get flagged on the reduced pass rates. Nor can we turn our back on monitoring metal install success and ignore it.

      There are 3 Component Readiness regressions, so I am marking this as a release blocker.

      Filed by: dgoodwin@redhat.com

      Assignee: Unassigned
      Reporter: OpenShift Technical Release Team (openshift-trt)