Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Fix Version/s: None
Affects Version/s: 4.17.z, 4.16.z
Component/s: Containers
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
3
Severity:
Important
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

    On clusters running OpenShift 4.17, container creation intermittently fails because the node pulls an image for the wrong architecture (e.g., amd64 instead of arm64). CRI-O then logs an architecture mismatch and the container never starts.
The issue has occurred in multiple cases:

1. While using CrunchyData’s PGO operator (not from OperatorHub)

2. With internal images like cloud-network-ingress-operator:v1.0.3 during a node drain


Even after the correct image is pulled with podman using --arch arm64, CRI-O continues to try to pull the incorrect architecture.


The behavior became prominent after an upgrade from 4.17.23 to 4.17.24, but it is not limited to upgrades or node drains: we have also observed it during normal pod scheduling.

Version-Release number of selected component (if applicable):

How reproducible:

    Intermittent. Not deterministic, but has been reproduced across multiple nodes and clusters.

Steps to Reproduce:

1. Deploy a pod using a multi-arch image (e.g., crunchy-postgres:15.12.1).

2. On an arm64 node, observe whether the pod fails to start due to an architecture mismatch.

3. Inspect the CRI-O logs and note whether the node pulled the amd64 image instead of arm64.

Actual results:

    CRI-O pulls the image with the incorrect architecture:

 Image operating system mismatch: image uses OS "linux"+architecture "amd64"+"", expecting one of "linux+arm64+\"v8\", linux+arm64+\"\""

The container fails to start. The issue is resolved only after the incorrect image is manually deleted with crictl rmi, after which the node pulls the correct image.
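When scanning node logs for this failure, it can help to extract the actual and expected platforms from the mismatch message. The following is a minimal sketch that assumes the message always has the shape quoted above; the function name `parse_mismatch` and the regular expressions are illustrative and are not part of CRI-O.

```python
import re

# The mismatch line exactly as quoted in this report (raw string keeps the backslashes).
SAMPLE = r'Image operating system mismatch: image uses OS "linux"+architecture "amd64"+"", expecting one of "linux+arm64+\"v8\", linux+arm64+\"\""'

# "image uses ..." half: OS, architecture, and variant are each double-quoted.
_ACTUAL = re.compile(r'image uses OS "([^"]*)"\+architecture "([^"]*)"\+"([^"]*)"')
# "expecting one of ..." half: one quoted list whose inner quotes are backslash-escaped.
_EXPECTED = re.compile(r'expecting one of "(.*)"\s*$')


def parse_mismatch(line):
    """Return (found, wanted) platform tuples from a mismatch line, or None.

    found  is a single (os, architecture, variant) tuple.
    wanted is a list of (os, architecture, variant) tuples in the order logged.
    """
    actual = _ACTUAL.search(line)
    expected = _EXPECTED.search(line)
    if not actual or not expected:
        return None
    wanted = []
    for item in expected.group(1).split(", "):
        parts = item.split("+", 2)
        if len(parts) != 3:
            return None  # unexpected shape; do not guess
        os_name, arch, variant = parts
        # Drop the escaped quotes around the variant, e.g. \"v8\" becomes v8.
        wanted.append((os_name, arch, variant.replace('\\"', "")))
    return actual.groups(), wanted
```

Calling `parse_mismatch(SAMPLE)` yields the found platform (linux, amd64, empty variant) and the two accepted arm64 platforms, which makes it easy to count affected nodes or images across collected logs.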

Expected results:

    CRI-O should always select the correct image architecture (arm64 on aarch64 nodes) from multi-arch manifests.
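The expected selection can be illustrated with a small sketch. It is not CRI-O's implementation; it only assumes the standard OCI image index layout (`manifests[].platform` with `os`, `architecture`, and optional `variant`) and a preference-ordered list of wanted platforms like the one in the error message. The digests and the function name `select_manifest` are placeholders.

```python
# Placeholder digests for illustration only; they do not refer to real images.
AMD64_DIGEST = "sha256:" + "a" * 64
ARM64_DIGEST = "sha256:" + "b" * 64

# A trimmed-down image index with one entry per architecture.
SAMPLE_INDEX = {
    "schemaVersion": 2,
    "manifests": [
        {"digest": AMD64_DIGEST,
         "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": ARM64_DIGEST,
         "platform": {"os": "linux", "architecture": "arm64", "variant": "v8"}},
    ],
}

# Platforms an arm64 node would accept, most preferred first.
WANTED = [("linux", "arm64", "v8"), ("linux", "arm64", "")]


def select_manifest(index, wanted_platforms):
    """Return the digest of the first entry matching a wanted platform, else None."""
    for os_name, arch, variant in wanted_platforms:
        for entry in index.get("manifests", []):
            platform = entry.get("platform", {})
            if (platform.get("os") == os_name
                    and platform.get("architecture") == arch
                    and platform.get("variant", "") == variant):
                return entry["digest"]
    # No acceptable platform: fail rather than fall back to another architecture.
    return None
```

With the sample index, `select_manifest(SAMPLE_INDEX, WANTED)` returns the arm64 digest. For an index that carries only amd64 it returns `None`, which corresponds to refusing the image instead of storing one the node cannot run.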

Additional info:

    podman pull --arch arm64 fetches the correct image, which indicates that the registry and manifest are correct.


We suspect a regression in the logic CRI-O uses to resolve image manifests for multi-arch images.


This affects multiple images (not just PGO):


docker.bin.sbb.ch/hive/crunchy-postgres:15.12.1


paas.docker.bin.sbb.ch/paas/cloud-network-ingress-operator:v1.0.3




The issue became more frequent after OpenShift 4.17.x updates, but it is not strictly tied to upgrades.

Assignee: Miloslav Trmač

Reporter: Vishvranjan Mishra

Need Info From: None

Contributors: None

QA Contact: Min Li

Doc Contact: None

Votes: 0

Watchers: 5

Created: 2025/04/17 12:59 PM

Updated: 2025/09/13 6:26 PM

Resolved: 2025/07/10 2:51 PM
