-
Bug
-
Resolution: Unresolved
-
Critical
-
None
-
4.22
Several of the latest builds failed:
level=info msg=Cluster operator openshift-apiserver EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator openshift-controller-manager EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator service-ca EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator storage EvaluationConditionsDetected is Unknown with NoData:
level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
level=error msg=The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
level=error msg=https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
level=error msg=The 'wait-for install-complete' subcommand can then be used to continue the installation
level=error msg=failed to initialize the cluster: Could not update customresourcedefinition "metal3remediations.infrastructure.cluster.x-k8s.io" (283 of 1050): the object is invalid, possibly due to local cluster configuration: timed out waiting for the condition
Component Readiness has found a potential regression in the following test:
install should succeed: cluster creation
Significant regression detected.
Fisher's Exact probability of a regression: 100.00%.
Test pass rate dropped from 100.00% to 90.91%.
Sample (being evaluated) Release: 4.22
Start Time: 2026-02-19T00:00:00Z
End Time: 2026-02-26T04:00:00Z
Success Rate: 90.91%
Successes: 80
Failures: 8
Flakes: 0
Base (historical) Release: 4.21
Start Time: 2026-01-04T00:00:00Z
End Time: 2026-02-03T23:59:59Z
Success Rate: 100.00%
Successes: 285
Failures: 0
Flakes: 0
View the test details report for additional context.
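The figures above can be sanity-checked by hand. The sketch below recomputes the sample pass rate and a one-sided Fisher's exact p-value from the reported counts, using only the Python standard library. How Component Readiness derives its "probability of a regression" is not stated in this ticket; treating it as 1 minus the one-sided p-value is an assumption, and the function name is illustrative.

```python
from math import comb

def fisher_one_sided(sample_fail, sample_pass, base_fail, base_pass):
    """Hypergeometric tail: probability that at least `sample_fail` of all
    observed failures land in the sample if both releases share one failure rate."""
    n_sample = sample_fail + sample_pass
    n_base = base_fail + base_pass
    total_fail = sample_fail + base_fail
    total = n_sample + n_base
    denom = comb(total, n_sample)
    p = 0.0
    for k in range(sample_fail, min(total_fail, n_sample) + 1):
        p += comb(total_fail, k) * comb(total - total_fail, n_sample - k) / denom
    return p

# Counts copied from the report: 4.22 sample vs. 4.21 base.
pass_rate = 80 / (80 + 8) * 100
p_value = fisher_one_sided(sample_fail=8, sample_pass=80, base_fail=0, base_pass=285)
regression_probability = (1 - p_value) * 100  # assumed definition, see above
```

With 8 failures in 88 sample runs and none in 285 base runs, the p-value is far below 0.01%, which is consistent with the report rounding to 100.00%.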
The following is the analysis from ai-helper for https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-dualstack-techpreview/2026423090430349312
The Actual Error
From the Cluster Version Operator logs, the real error is:
CustomResourceDefinition.apiextensions.k8s.io "metal3remediations.infrastructure.cluster.x-k8s.io" is invalid:
status.storedVersions[1]: Invalid value: "v1beta2":
missing from spec.versions; v1beta2 was previously a storage version, and must remain in spec.versions
until a storage migration ensures no data remains persisted in v1beta2 and
removes v1beta2 from status.storedVersions
What This Means
This is a CRD version migration issue, NOT a webhook timeout issue as initially suggested by the error message.
The Problem:
1. The CRD currently in the cluster has:
- spec.versions: [v1beta1, v1beta2]
- status.storedVersions: [v1beta1, v1beta2]
2. The 4.22 nightly payload is trying to apply a new version of the CRD that:
- Only includes v1beta1 in spec.versions
- Removes v1beta2 from the list
3. Kubernetes API server validation prevents removing a version from spec.versions if:
- That version is still listed in status.storedVersions
- This protects against data loss from objects stored in the removed version
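The rule in step 3 can be expressed as a small check. This is a simplified model of the validation for illustration, not the actual API server code; the function name and the version lists are taken from or assumed for this ticket.

```python
def removed_stored_versions(spec_versions, stored_versions):
    """Return stored versions that the proposed spec.versions no longer lists.
    A non-empty result means the update would be rejected."""
    return [v for v in stored_versions if v not in spec_versions]

# State described above: the cluster has stored data in both versions,
# while the new payload only ships v1beta1.
current_stored = ["v1beta1", "v1beta2"]
proposed_spec = ["v1beta1"]

missing = removed_stored_versions(proposed_spec, current_stored)
if missing:
    print(f"update rejected: {missing} still listed in status.storedVersions")
```

Keeping v1beta2 in the proposed list makes the same check return an empty list, which corresponds to the update being accepted.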
Why This Causes Install Failure
The Cluster Version Operator:
- Repeatedly tries to update the CRD (resource #283 of 1051)
- Each attempt is rejected by API server validation
- The update never succeeds
- Installation gets stuck at 93% waiting for this CRD update
- Eventually times out after 1 hour
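The sequence above can be modelled with a retry loop on a simulated clock. This is only a sketch of why the surfaced message is a timeout while the underlying cause is a validation rejection; the 60-second retry interval, the exception class, and the function names are assumptions, and the CVO's real backoff behaviour is not modelled.

```python
class Rejected(Exception):
    """Stands in for the API server's validation error."""

def apply_crd():
    # Hypothetical update call: always rejected, as in the failing job.
    raise Rejected('status.storedVersions[1]: Invalid value: "v1beta2"')

def sync_with_retries(apply, timeout_s, interval_s):
    """Retry `apply` on a simulated clock until it succeeds or time runs out."""
    elapsed = 0
    attempts = 0
    last_error = None
    while elapsed < timeout_s:
        attempts += 1
        try:
            apply()
            return "ok", attempts, None
        except Rejected as err:
            last_error = err  # remembered, but not what the caller reports
        elapsed += interval_s  # simulated wait, no real sleeping
    return "timed out waiting for the condition", attempts, last_error

status, attempts, cause = sync_with_retries(apply_crd, timeout_s=3600, interval_s=60)
```

The returned status carries only the timeout text, while the validation error is kept separately in `cause`, which mirrors how the installer log hides the real reason.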
Component Status Summary
✅ Working Correctly:
- API Server: Running and enforcing validation
- capm3-webhook-service: Running, conversion webhook functional
- Metal3 controllers: Running
- Webhook communication: No timeout issues
❌ The Issue:
- CRD version migration mismatch between current cluster state and new payload
- CVO cannot complete the CRD update due to validation failure
Why This Happened
This appears to be a regression or issue in the 4.22.0-0.nightly-2026-02-24-222058 payload where:
1. The Metal3Remediation CRD removed v1beta2 from its supported versions
2. But the cluster already has data marked as stored in v1beta2
3. No storage migration was performed to convert v1beta2 objects to v1beta1
Required Fix
The payload needs one of these fixes:
1. Keep v1beta2 in spec.versions until all clusters can migrate
2. Perform storage version migration before removing v1beta2
3. Add a storage version migrator to convert existing v1beta2 objects to v1beta1 and update storedVersions
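The ordering behind fixes 2 and 3 can be shown with an in-memory model: rewrite the stored objects first, trim status.storedVersions second, and only then drop the version from spec.versions. The dictionaries, the `storedAs` field, and the object name below are hypothetical stand-ins; a real migration would go through the API server, with v1beta1 already set as the storage version.

```python
def can_drop_version(crd, version):
    """A version may leave spec.versions only once it is absent from storedVersions."""
    return version not in crd["status"]["storedVersions"]

def migrate_storage(crd, objects, target):
    """Rewrite every object at `target`, then record that only `target` remains stored."""
    for obj in objects:
        obj["storedAs"] = target  # stands in for a read plus no-op write via the API
    crd["status"]["storedVersions"] = [target]

crd = {
    "spec": {"versions": ["v1beta1", "v1beta2"]},
    "status": {"storedVersions": ["v1beta1", "v1beta2"]},
}
objects = [{"name": "remediation-a", "storedAs": "v1beta2"}]  # hypothetical object

before = can_drop_version(crd, "v1beta2")   # blocked: v1beta2 is still stored
migrate_storage(crd, objects, target="v1beta1")
after = can_drop_version(crd, "v1beta2")    # allowed once storedVersions is trimmed
```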
—
Summary
This is NOT an installation infrastructure issue - it's a payload/CRD versioning bug in the nightly 4.22 release that prevents cluster initialization from completing.
The error message "the object is invalid, possibly due to local cluster configuration: timed out waiting for the condition" was misleading - the timeout was from CVO retrying the failed update, not from a webhook timeout.
=====================
This issue was first seen on 4.22.0-0.nightly-2026-02-24-222058. From the changelog, https://github.com/openshift/cluster-api-provider-metal3/pull/63 looks suspicious.