Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.18, 4.19, 4.20, 4.21
Component/s: Bare Metal Hardware Provisioning
Labels:
- triaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Metal jobs that fail to mirror an image at the start of their run do not terminate immediately. Instead, they run all subsequent steps, including tests, and report the failure only at the very end.

This wastes a significant amount of compute resources and time. For instance, the example job [1] ran for over 5 hours and 45 minutes before finally failing, even though the critical error occurred at the beginning.

Implementing a "fail-fast" mechanism for image mirroring would save considerable resources and give developers much faster feedback on job failures, especially on k8s bump PRs.

[1] https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_kubernetes/2464/pull-ci-openshift-kubernetes-release-4.19-e2e-metal-ipi-ovn-ipv6/1967784233418100736

Version-Release number of selected component (if applicable):

All versions

How reproducible:

Whenever a new image needs mirroring, which happens often in kube bumps.

Steps to Reproduce:

    1.
    2.
    3.

Actual results:

The job logs the image mirror failure but continues to execute all subsequent steps. The job runs for its entire duration (or until another step fails) and only then reports the final "Failed" status. This can take hours, as seen in the example provided.

Expected results:

The job should detect the image mirror failure, immediately terminate, and report the error. The job status should change to "Failed" within minutes of starting.
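The expected behavior can be illustrated with a minimal sketch. It does not model the actual CI tooling: the `Step` type, the `critical` flag, the `run_job` function, and the step names are all hypothetical, chosen only to show a job runner that aborts as soon as a critical step such as image mirroring fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str                # hypothetical step name, e.g. "mirror-images"
    run: Callable[[], bool]  # returns True on success, False on failure
    critical: bool = False   # assumption: a failed critical step aborts the job


def run_job(steps: List[Step]) -> Tuple[str, List[str]]:
    """Run steps in order and stop at the first failed critical step."""
    executed: List[str] = []
    failed = False
    for step in steps:
        executed.append(step.name)
        if not step.run():
            failed = True
            if step.critical:
                # Fail fast: skip every remaining step and report now.
                return "Failed", executed
    return ("Failed" if failed else "Succeeded"), executed


if __name__ == "__main__":
    # Simulated job in which image mirroring fails at the very start.
    job = [
        Step("mirror-images", lambda: False, critical=True),
        Step("install", lambda: True),
        Step("e2e-tests", lambda: True),
    ]
    status, executed = run_job(job)
    print(status, executed)
```

In this sketch, marking the mirroring step as critical means the install and test steps never start, so the "Failed" status is available as soon as mirroring fails rather than hours later.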

Additional info:

Assignee: Tudor Domnescu

Reporter: Fabio Bertinatto

Need Info From: None

Contributors: None

QA Contact: Jad Haj Yahya

Doc Contact: None

Votes: 0

Watchers: 2

Created: 2025/09/16 1:31 PM

Updated: 2025/09/29 11:56 AM
