Loading...

Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.22
Component/s: Test Infrastructure
Labels:
- component-regression

Activity Type:
None
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

(Feel free to update this bug's summary to be more specific.)
Component Readiness has found a potential regression in the following test:

[sig-devex][Feature:Templates] templateinstance readiness test should report ready soon after all annotated objects are ready [apigroup:template.openshift.io][apigroup:build.openshift.io] [Suite:openshift/conformance/parallel]

Extreme regression detected.
Fishers Exact probability of a regression: 100.00%.
Test pass rate dropped from 100.00% to 80.00%.

Sample (being evaluated) Release: 4.22
Start Time: 2026-02-02T00:00:00Z
End Time: 2026-02-09T16:00:00Z
Success Rate: 80.00%
Successes: 20
Failures: 5
Flakes: 0
Base (historical) Release: 4.21
Start Time: 2026-01-04T00:00:00Z
End Time: 2026-02-03T23:59:59Z
Success Rate: 100.00%
Successes: 64
Failures: 0
Flakes: 0

View the test details report for additional context.

Initial suspect for the root cause:

[Monitor:legacy-test-framework-invariants-pathological][sig-arch] events should not repeat pathologically expand_less 	0s
{  2 events happened too frequently

event happened 21 times, something is wrong: namespace/openshift-must-gather-nwnfb node/master-2 pod/must-gather-h2twt hmsg/802e69c41f - reason/BackOff Back-off pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1d38119f000dff5ea7f725e17d366bab5be2280d13afa432b1228e1fdc389740" (06:35:09Z) result=reject 
event happened 21 times, something is wrong: namespace/openshift-must-gather-nwnfb node/master-2 pod/must-gather-h2twt hmsg/d5bf9afefc - reason/Failed Error: ImagePullBackOff (06:35:09Z) result=reject }

This indicates a problem with infra somewhere. I am triaging all known occurrences to this jira.

It looks like this resolved roughly a day and a half ago, so we'll have to keep an eye on it.

Filed by: dgoodwin@redhat.com

This goes beyond metal, AI analysis:

Signature Verification Failures in Payload 4.22.0-0.nightly-2026-02-08-005048

Two jobs using the same payload experienced image pull failures due to sigstore signature verification. The error in both cases is identical:

SignatureValidationFailed: unable to pull image or OCI artifact:
  pull image err: Source image rejected: A signature was required, but no signature exists;
  artifact err: provided artifact is a container image

All affected images are from quay.io/openshift-release-dev/ocp-v4.0-art-dev.

Job 1: Azure Install Failure

Job: periodic-ci-openshift-cluster-control-plane-machine-set-operator-release-4.22-periodics-e2e-azure
Platform: Azure | Cluster: build01
Impact: Complete install failure — cluster never came up

Root cause chain:

machine-config-daemon-pull.service failed to pull the MCD image (sha256:4f5f9954...) on all 3 master nodes
kubelet could not start (depends on MCD pull)
No control plane pods scheduled
bootkube timed out waiting for etcd/kube-apiserver
Installer exited with code 5 (bootstrap timeout)

Evidence location: Node journal logs ({{control-plane/10.0.0.

{4,5,6}

/journals/journal.log}}) in the log bundle at ipi-install-install/artifacts/.

Scale: ~5,784 total image pull errors across 3 master nodes (~1,845 on master-0, ~1,966 on master-1, ~1,973 on master-2). The pull retried every ~1.5 seconds from 01:11 to 01:57 UTC and never succeeded.

Job 2: Metal IPI Upgrade — Test Failures

Job: periodic-ci-openshift-release-master-nightly-4.22-upgrade-from-stable-4.21-e2e-metal-ipi-ovn-upgrade
Platform: Metal IPI | Cluster: build09
Upgrade: 4.21.0-0.nightly-2026-02-05-184824 → 4.22.0-0.nightly-2026-02-08-005048
Impact: Install and upgrade succeeded, but post-upgrade tests failed en masse

Existing running pods were unaffected (images already cached), but any new pod creation after upgrade — e2e test pods, debug pods, must-gather pods, build pods — failed to pull 4.22 images.

Scale:

Metric	Count
Affected image digests	3 (`sha256:5d52dda6...`, `sha256:1d38119f...`, `sha256:295c2081...`)
Affected nodes	5 of 6 (master-0, master-2, worker-0, worker-1, worker-2)
Affected namespaces	78
Affected pods	62
E2E test failures	104 of 3,863
Monitor test failures	112 of 2,223

Evidence locations:

Cluster events (gather-extra/artifacts/events.json):

Failed to pull image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5d52dda6d305a81830d9ddab3045adfa019241ca016e68eb9736a40206a6ebf5":
SignatureValidationFailed: ...Source image rejected: A signature was required, but no signature exists

E2E monitor tests (baremetalds-e2e-test/artifacts/junit/e2e-monitor-tests__20260208-040623.xml):

platform pods in ns/openshift-infra should not fail to start — 20 container start failures, recycler-for-nfs-b8npw on worker-1, cause SignatureValidationFailed starting 04:48 UTC
platform pods in ns/default should not fail to start — 44 container start failures, master-0-debug-p5c2p, cause SignatureValidationFailed starting 06:14 UTC
events should not repeat pathologically in e2e namespaces — 64 pathological events, ImagePullBackOff across e2e test pods, service-network-monitor repeated 684 times
should not encounter ErrImagePull in non-openshift namespace pods — 67 ErrImagePull intervals

E2E tests (baremetalds-e2e-test/artifacts/junit/junit_e2e__20260208-040623.xml):

8 oc adm must-gather tests failed directly with SignatureValidationFailed pulling sha256:1d38119f...
88 tests timed out or were interrupted — cascade failures from pods stuck in ImagePullBackOff (builds, deployments, oauth test servers, etc.)

Conclusion

This is not a registry outage — quay.io was reachable and serving image manifests. The failure is at the signature verification layer: the ClusterImagePolicy required a sigstore signature but none could be found for these image digests. This points to either a sigstore/signature store outage or a signing pipeline failure for payload 4.22.0-0.nightly-2026-02-08-005048 around 2026-02-08 01:00–07:00 UTC.

Details

Description

Job 1: Azure Install Failure

Job 2: Metal IPI Upgrade — Test Failures

Conclusion

Attachments

Easy Agile Planning Poker

Activity

People

Dates