Details
Bug
Resolution: Not a Bug
Major
None
4.16
None
Important
No
Proposed
s390x
False
Description
OCP 4.16 installation is failing: bootstrap fails because the account is denied access to perform create on Certificate via HTTP POST.
level=info msg=Cluster operator insights SCAAvailable is False with Forbidden: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 403: {"code":"ACCT-MGMT-11","href":"/api/accounts_mgmt/v1/errors/11","id":"11","kind":"Error","operation_id":"d39d6133-da19-4ac0-bdc2-9af0d255f4f3","reason":"Account with ID 2DUeKzzTD9ngfsQ6YgkzdJn1jA4 denied access to perform create on Certificate with HTTP call POST /api/accounts_mgmt/v1/certificates"}
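The 403 in the line above carries a JSON error body from the OCM accounts_mgmt API. When comparing many failed runs, it can help to pull the error code and reason out of the installer log mechanically. The following is a minimal sketch, assuming the log line format shown in this report; `extract_ocm_error` is a hypothetical helper written for illustration, not part of openshift-install or any OpenShift tooling.

```python
import json
import re

# Installer log line copied from this report (split only for line length).
LOG_LINE = (
    'level=info msg=Cluster operator insights SCAAvailable is False with Forbidden: '
    'Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: '
    'OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 403: '
    '{"code":"ACCT-MGMT-11","href":"/api/accounts_mgmt/v1/errors/11","id":"11","kind":"Error",'
    '"operation_id":"d39d6133-da19-4ac0-bdc2-9af0d255f4f3",'
    '"reason":"Account with ID 2DUeKzzTD9ngfsQ6YgkzdJn1jA4 denied access to perform create '
    'on Certificate with HTTP call POST /api/accounts_mgmt/v1/certificates"}'
)


def extract_ocm_error(line: str) -> dict:
    """Hypothetical helper: return the JSON body that follows 'returned HTTP <status>: '.

    Assumes the body is the last {...} object on the line, as in the log above.
    """
    match = re.search(r"returned HTTP (\d{3}): (\{.*\})", line)
    if match is None:
        raise ValueError("no OCM error body found in log line")
    body = json.loads(match.group(2))
    body["http_status"] = int(match.group(1))  # keep the HTTP status alongside the body
    return body


err = extract_ocm_error(LOG_LINE)
print(err["http_status"], err["code"])
print(err["reason"])
```

Grouping runs by `code` (here `ACCT-MGMT-11`) makes it easy to see whether every failure shares the same account-permission error.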
After waiting 40 minutes for the cluster to initialize, multiple errors like the ones below appeared: several cluster operators were not available, and the server was reported as down or not responding.
level=debug msg=Loading Agent Config...
level=info msg=Waiting up to 40m0s (until 2:15PM UTC) for the cluster at https://api.libvirt-s390x-4-1-17572.libvirt-s390x-4-1.ci:6443 to initialize...
level=debug msg=Still waiting for the cluster to initialize: Multiple errors are preventing progress:
level=debug msg=* Cluster operators authentication, image-registry, ingress, insights, kube-apiserver, machine-api, machine-config, monitoring, openshift-apiserver, openshift-controller-manager, openshift-samples, operator-lifecycle-manager-packageserver are not available
level=debug msg=* Could not update imagestream "openshift/driver-toolkit" (608 of 886): the server is down or not responding
level=debug msg=* Could not update oauthclient "console" (546 of 886): the server does not recognize this resource, check extension API servers
level=debug msg=* Could not update role "openshift-console-operator/prometheus-k8s" (804 of 886): resource may have been deleted
level=debug msg=* Could not update role "openshift-console/prometheus-k8s" (808 of 886): resource may have been deleted
This caused the machine-config operator to report Degraded, and the master nodes failed:
level=error msg=Cluster operator machine-config Degraded is True with RequiredPoolsFailed: Unable to apply 4.16.0-0.nightly-s390x-2024-03-03-131239: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)]]
The issue has been observed in the last 9 job runs since March 2nd.
FYI: I checked with telpelt on this VM during the same time window; there were no network glitches or infrastructure problems.