OpenShift Bugs / OCPBUGS-48630

Metal upgrade jobs unable to mirror images prior to testing


    • Important
    • Yes
    • 3
    • Metal Platform 265, Metal Platform 266
    • 2
    • Proposed
    • False

      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a potential regression in the following test:

      [sig-cluster-lifecycle] Cluster completes upgrade

      Significant regression detected.
      Fisher's Exact probability of a regression: 100.00%.
      Test pass rate dropped from 99.53% to 90.48%.

      Sample (being evaluated) Release: 4.18
      Start Time: 2025-01-13T00:00:00Z
      End Time: 2025-01-20T16:00:00Z
      Success Rate: 90.48%
      Successes: 38
      Failures: 4
      Flakes: 0

      Base (historical) Release: 4.17
      Start Time: 2024-09-01T00:00:00Z
      End Time: 2024-10-01T23:59:59Z
      Success Rate: 99.53%
      Successes: 211
      Failures: 1
      Flakes: 0
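The figures above can be sanity-checked with a short script. This is a minimal sketch, not the Component Readiness implementation: it recomputes the two pass rates from the success and failure counts and runs a one-sided Fisher's exact test on the resulting 2x2 table. The function names are hypothetical, and the way Component Readiness turns a p-value into its "probability of a regression" figure is not described in this report, so the script is not expected to reproduce the 100.00% value exactly.

```python
from math import comb


def pass_rate(successes, failures):
    """Pass rate as a percentage of all runs."""
    return 100.0 * successes / (successes + failures)


def fisher_one_sided(sample_fail, sample_pass, base_fail, base_pass):
    """One-sided Fisher's exact p-value.

    Probability of seeing at least `sample_fail` failures in the sample
    if sample and base runs shared the same underlying failure rate
    (hypergeometric null with fixed table margins).
    """
    n_sample = sample_fail + sample_pass
    total = n_sample + base_fail + base_pass
    total_fail = sample_fail + base_fail
    upper = min(total_fail, n_sample)
    numerator = sum(
        comb(total_fail, k) * comb(total - total_fail, n_sample - k)
        for k in range(sample_fail, upper + 1)
    )
    return numerator / comb(total, n_sample)


# Counts taken from the report above.
sample_rate = pass_rate(38, 4)    # 4.18 sample
base_rate = pass_rate(211, 1)     # 4.17 base
p_value = fisher_one_sided(4, 38, 1, 211)

print(f"sample pass rate: {sample_rate:.2f}%")
print(f"base pass rate:   {base_rate:.2f}%")
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value here means a drop of this size is unlikely to be noise, which is consistent with the report flagging the regression as significant.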

      View the test details report for additional context.

      It seems metal IPv6 clusters are frequently failing to complete the upgrade. Pass rates on metal have clearly dropped.

      Sample job runs:

      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-upgrade-ovn-ipv6/1881184224447303680

      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-upgrade-ovn-ipv6/1879484257919832064

      In these runs, the CVO reports a failure to download the release payload from the build05 registry:

      I0115 12:59:38.191689       1 event.go:377] Event(v1.ObjectReference{Kind:"ClusterVersion", Namespace:"openshift-cluster-version", Name:"version", UID:"", APIVersion:"config.openshift.io/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'RetrievePayloadFailed' Retrieving payload failed version="" image="registry.build05.ci.openshift.org/ci-op-i76vkq77/release@sha256:d75540021d4619775d9d04ea32857b0bf2493837a62701befaf5ea3941072139" failure=Unable to download and prepare the update: deadline exceeded, reason: "DeadlineExceeded", message: "Job was active longer than specified deadline"
      

      This message shows near the end of the CVO log.
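To check other job runs for the same symptom, a CVO log can be scanned for this event. The following is a minimal sketch that assumes the event line has the same shape as the one quoted above; the function names and the regular expression are illustrative and are not part of any OpenShift tooling.

```python
import re

# Matches the single-quoted event reason, the quoted image reference,
# and everything after "failure=" on the same log line.
EVENT_RE = re.compile(
    r"reason: '(?P<reason>[^']+)'"
    r".*?image=\"(?P<image>[^\"]+)\""
    r".*?failure=(?P<failure>.*)$"
)


def parse_payload_failure(line):
    """Return details of a RetrievePayloadFailed event, or None."""
    match = EVENT_RE.search(line)
    if match is None or match.group("reason") != "RetrievePayloadFailed":
        return None
    image = match.group("image")
    return {
        "registry": image.split("/", 1)[0],  # host part of the image reference
        "image": image,
        "failure": match.group("failure").strip(),
    }


def scan(lines):
    """Collect every payload retrieval failure found in an iterable of log lines."""
    return [hit for hit in map(parse_payload_failure, lines) if hit is not None]
```

Passing the lines of a downloaded CVO log to `scan` returns one entry per matching event, so runs whose failures all point at the same registry host are easy to spot.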

      The component readiness test report linked above also includes other failures, but these two look interesting.

      The problem hits other tests as well:

      [sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]
      [sig-arch][Feature:ClusterUpgrade] Cluster should be upgradeable after finishing upgrade [Late][Suite:upgrade]
      upgrade: [sig-cluster-lifecycle] Cluster version operator acknowledges upgrade
      [Jira:"Network / ovn-kubernetes"] monitor test pod-network-avalibility collection

              rpittau@redhat.com Riccardo Pittau
              rhn-engineering-dgoodwin Devan Goodwin
              Honza Pokorny
