OpenShift Bugs / OCPBUGS-77165

Image not known errors causing mass failures across CI since cri-o 1.35.0

    • Critical
    • Approved
    • OCP Node Core Sprint 284

      Initially this was found as an install failure where the network operator complains that multus is still awaiting 1-2 nodes on GCP. However, the situation has evolved: the underlying issue is now visible across many tests, jobs, and platforms:

      Error: reading image "92c3c9025fe657a9160372a805bfd1624fb087262f2e3db326f937c298d47d70": locating image with ID "92c3c9025fe657a9160372a805bfd1624fb087262f2e3db326f937c298d47d70": image not known
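
      One quick triage step is to pull the affected image IDs out of a downloaded build log. A rough sketch (the sample log line below is copied from the error above; the path and log contents in a real run will differ):

      ```python
      import re

      # Hypothetical build-log excerpt standing in for a real job artifact.
      log = ('Error: reading image "92c3c9025fe657a9160372a805bfd1624fb087262f2e3db326f937c298d47d70": '
             'locating image with ID "92c3c9025fe657a9160372a805bfd1624fb087262f2e3db326f937c298d47d70": '
             'image not known')

      # Extract the unique 64-hex-char image IDs from "image not known" failures.
      ids = sorted(set(re.findall(r'locating image with ID "([0-9a-f]{64})": image not known', log)))
      for image_id in ids:
          print(image_id)
      ```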
      

       

      Comments below show job runs where this is being seen.

       Scanning BigQuery for tests reporting this in their output:

      SELECT modified_time, prowjob_build_id, test_name, success, test_id, branch, prowjob_name, failure_content
      FROM `openshift-gce-devel.ci_analysis_us.junit`
      WHERE success = false
        AND modified_time BETWEEN DATETIME("2026-02-10") AND DATETIME("2026-02-25")
        AND failure_content LIKE '%image not known%'
      ORDER BY modified_time ASC
      LIMIT 1000

      This again shows the explosion of the problem on Feb 19. It is happening only in 4.22 jobs and a few presubmits, which (I think) rules out infrastructure or registry problems. I will attach the results of the above query in JSON format.
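
      Once the JSON results are attached, a small sketch like this can bucket the failures per day to confirm the spike (the inline sample below is made up; a real run would load the attached file, whose rows carry the columns selected in the query above):

      ```python
      import json
      from collections import Counter
      from datetime import datetime

      # Hypothetical sample of the attached query results.
      sample = json.loads("""
      [
        {"modified_time": "2026-02-18T09:00:00", "failure_content": "unrelated failure"},
        {"modified_time": "2026-02-19T10:00:00", "failure_content": "... image not known"},
        {"modified_time": "2026-02-19T11:30:00", "failure_content": "... image not known"}
      ]
      """)

      # Count "image not known" failures per calendar day.
      per_day = Counter(
          datetime.fromisoformat(r["modified_time"]).date().isoformat()
          for r in sample
          if "image not known" in r.get("failure_content", "")
      )
      for day, count in sorted(per_day.items()):
          print(day, count)   # -> 2026-02-19 2 for the sample above
      ```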

      Sample job runs:

      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview/2025590450445881344

      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-techpreview/2025540689521020928

      Or see the test details report below.

       

      Search.ci can be used to find impacted job runs. It doesn't always load, but waiting 5-10 minutes and retrying usually works.

      Original Report

       

      Component Readiness has found a potential regression in the following test:

      install should succeed: overall

      Extreme regression detected.
      Fisher's exact probability of a regression: 100.00%.
      Test pass rate dropped from 98.81% to 69.93%.

      Sample (being evaluated) Release: 4.22
      Start Time: 2026-02-16T00:00:00Z
      End Time: 2026-02-23T12:00:00Z
      Success Rate: 69.93%
      Successes: 200
      Failures: 86
      Flakes: 0
      Base (historical) Release: 4.20
      Start Time: 2025-09-21T00:00:00Z
      End Time: 2025-10-21T00:00:00Z
      Success Rate: 98.81%
      Successes: 1000
      Failures: 12
      Flakes: 0
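
      The counts above are enough to roughly reproduce the Fisher's exact figure with the standard library. This is a sketch, not Component Readiness' actual implementation; it assumes a one-sided test on the failure counts:

      ```python
      from math import comb

      def fisher_one_sided(sample_fail, sample_pass, base_fail, base_pass):
          """P(observing >= sample_fail failures in the sample, given the margins)
          under the hypergeometric null (one-sided Fisher's exact test)."""
          n_sample = sample_fail + sample_pass
          total_fail = sample_fail + base_fail
          total = n_sample + base_fail + base_pass
          denom = comb(total, n_sample)
          return sum(
              comb(total_fail, k) * comb(total - total_fail, n_sample - k)
              for k in range(sample_fail, min(total_fail, n_sample) + 1)
          ) / denom

      # Sample (4.22): 200 successes / 86 failures; base (4.20): 1000 successes / 12 failures.
      p = fisher_one_sided(86, 200, 12, 1000)
      print(f"probability of a regression = {(1 - p) * 100:.2f}%")   # -> 100.00%
      ```

      With a p-value this small, 1 - p rounds to the 100.00% reported above.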

      View the test details report for additional context.

      Several other GCP install variant combinations are showing problems; this is just the most visible one. See the triage record link, to be added to this card in a comment shortly, for the full list of regressions, but the report linked above should already contain more than enough job runs to investigate.

      I think I see multiple causes here; it's unclear whether there is one unifying underlying cause. I see problems with storage volume mounting, the network operator awaiting one node, and possibly more.

      Consider this card a bug to get GCP install success rates back up from the current 70%.

      Filed by: dgoodwin@redhat.com

      From what I can see, almost all of these runs are also showing a failure for:

      verify operator conditions network
      
      {  Operator Progressing=True (Deploying): DaemonSet "/openshift-multus/multus-additional-cni-plugins" is not available (awaiting 2 nodes)}
      

      Sometimes it's just one node, yet all nodes seem to be fine.
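
      When bulk-processing runs, the affected DaemonSet and awaiting-node count can be scraped out of the condition message. A sketch, with the regex assumed from the message format shown above:

      ```python
      import re

      # Message format taken from the operator condition above.
      msg = ('Operator Progressing=True (Deploying): DaemonSet '
             '"/openshift-multus/multus-additional-cni-plugins" is not available '
             '(awaiting 2 nodes)')

      # "nodes?" hedges against a possible singular form for one node.
      m = re.search(r'DaemonSet "([^"]+)" is not available \(awaiting (\d+) nodes?\)', msg)
      if m:
          daemonset, awaiting = m.group(1), int(m.group(2))
          print(daemonset, awaiting)   # -> /openshift-multus/multus-additional-cni-plugins 2
      ```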

      The test appears regressed in other areas too; here's a metal regression that looks similar.

      Global test analysis shows a sharp decline, probably starting around the 19th.

      Mostly hitting GCP, but some metal is in the mix as well.

              rh-ee-atokubi Ayato Tokubi
              openshift-trt OpenShift Technical Release Team