Bug
Resolution: Done-Errata
Critical
4.16
Critical
No
Sprint 252, Sprint 253
2
Proposed
False
Release Note Not Required
Description of problem:
Agent CI jobs (compact and HA) are currently failing because the control-plane-machine-set operator reports Degraded, even though the SNO cluster is operating normally.
Version-Release number of selected component (if applicable):
4.16
How reproducible:
100%
Actual results:
level=info msg=Cluster operator control-plane-machine-set Available is False with UnavailableReplicas: Missing 3 available replica(s)
level=error msg=Cluster operator control-plane-machine-set Degraded is True with UnmanagedNodes: Found 3 unmanaged node(s)
level=info msg=Cluster operator csi-snapshot-controller EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator etcd EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator ingress EvaluationConditionsDetected is False with AsExpected:
level=info msg=Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer
level=info msg=Cluster operator insights Disabled is False with AsExpected:
level=info msg=Cluster operator insights SCAAvailable is False with Forbidden: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 403: {"code":"ACCT-MGMT-11","href":"/api/accounts_mgmt/v1/errors/11","id":"11","kind":"Error","operation_id":"dc5b9421-248f-4ac4-9135-ac5bf6bcd2ce","reason":"Account with ID 2DUeKzzTD9ngfsQ6YgkzdJn1jA4 denied access to perform create on Certificate with HTTP call POST /api/accounts_mgmt/v1/certificates"}
level=info msg=Cluster operator kube-apiserver EvaluationConditionsDetected is False with AsExpected: All is well
level=info msg=Cluster operator kube-controller-manager EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator kube-scheduler EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator network ManagementStateDegraded is False with :
level=info msg=Cluster operator openshift-controller-manager EvaluationConditionsDetected is Unknown with NoData:
level=info msg=Cluster operator storage EvaluationConditionsDetected is Unknown with NoData:
level=error msg=Cluster initialization failed because one or more operators are not functioning properly.
level=error msg= The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
level=error msg= https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR: Installation failed. Aborting execution.
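For triage, the operator lines in the output above appear to follow the pattern `level=<level> msg=Cluster operator <name> <condition> is <status> with <reason>: <message>`. The following is a minimal Python sketch, assuming that pattern holds, that pulls out the operators logged at error level. The helper names (`parse_operator_lines`, `error_level_operators`) and the `SAMPLE` excerpt are hypothetical and used only for illustration; this is not part of the installer or of any CI tooling.

```python
import re

# Assumed line shape, inferred only from the log excerpt in this report:
# level=<level> msg=Cluster operator <name> <condition> is <status> with <reason>: <message>
LINE_RE = re.compile(
    r"level=(?P<level>\w+) msg=Cluster operator (?P<name>\S+) "
    r"(?P<condition>\S+) is (?P<status>\S+) with (?P<reason>\S*): ?(?P<message>.*)"
)


def parse_operator_lines(log_text):
    """Return one dict per 'Cluster operator' line found in log_text."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def error_level_operators(log_text):
    """Return the sorted, de-duplicated names of operators logged at error level."""
    return sorted(
        {e["name"] for e in parse_operator_lines(log_text) if e["level"] == "error"}
    )


# Hypothetical excerpt built from lines in this report.
SAMPLE = "\n".join([
    "level=info msg=Cluster operator control-plane-machine-set Available is False "
    "with UnavailableReplicas: Missing 3 available replica(s)",
    "level=error msg=Cluster operator control-plane-machine-set Degraded is True "
    "with UnmanagedNodes: Found 3 unmanaged node(s)",
    "level=info msg=Cluster operator network ManagementStateDegraded is False with :",
    "level=error msg=Cluster initialization failed because one or more operators "
    "are not functioning properly.",
])

if __name__ == "__main__":
    for name in error_level_operators(SAMPLE):
        print(name)
```

Run against the full output above, a filter like this would single out control-plane-machine-set as the only operator reported at error level, which matches the failure described in this bug.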
Expected results:
Install should be successful.
Additional info:
HA must gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-vsphere-agent-ha-f14/1771068123387006976/artifacts/vsphere-agent-ha-f14/gather-must-gather/artifacts/must-gather.tar
Compact must gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/pr-logs/pull/openshift_release/50544/rehearse-50544-periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-vsphere-agent-compact-fips-f14/1775524930515898368/artifacts/vsphere-agent-compact-fips-f14/gather-must-gather/artifacts/must-gather.tar
is related to: OCPBUGS-20061 control-plane-machine-set goes Available=False with UnavailableReplicas during updates (Closed)
links to: RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update