Details

Type: Bug
Resolution: Not a Bug
Priority: Major
Fix Version/s: None
Affects Version/s: 4.16
Component/s: Machine Config Operator
Labels: None
Severity: Important
Regression: No
Release Blocker: Proposed
Architecture: s390x
Blocked: False
Blocked Reason: None
SFDC Cases Counter:
SFDC Cases Links:

Description

OCP 4.16 installation is failing. Bootstrap fails with access denied to perform create on Certificate via an HTTP POST call:

level=info msg=Cluster operator insights SCAAvailable is False with Forbidden: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 403: {"code":"ACCT-MGMT-11","href":"/api/accounts_mgmt/v1/errors/11","id":"11","kind":"Error","operation_id":"d39d6133-da19-4ac0-bdc2-9af0d255f4f3","reason":"Account with ID 2DUeKzzTD9ngfsQ6YgkzdJn1jA4 denied access to perform create on Certificate with HTTP call POST /api/accounts_mgmt/v1/certificates"}
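
For triage, the JSON error body embedded in a log line like the one above can be pulled out and inspected field by field. The following is a minimal Python sketch, assuming the JSON object starts at the first `{` in the line, as it does here. The function name `extract_ocm_error` is hypothetical and not part of any OpenShift or OCM tooling.

```python
import json


def extract_ocm_error(log_line):
    """Return the JSON error object embedded in a log line, or None.

    Assumption: the payload begins at the first '{' in the line.
    """
    start = log_line.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses a single JSON value and tolerates trailing text.
        payload, _end = json.JSONDecoder().raw_decode(log_line[start:])
    except json.JSONDecodeError:
        return None
    return payload
```

Fields such as `code` and `reason` in the returned dictionary can then be compared across job runs to check whether every failure carries the same ACCT-MGMT error.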

After waiting 40 minutes for the cluster to initialize, the installer reported multiple errors: the operators listed below were not available, and the server was reported as down or not responding.

level=debug msg=Loading Agent Config...
level=info msg=Waiting up to 40m0s (until 2:15PM UTC) for the cluster at https://api.libvirt-s390x-4-1-17572.libvirt-s390x-4-1.ci:6443 to initialize...
level=debug msg=Still waiting for the cluster to initialize: Multiple errors are preventing progress:
level=debug msg=* Cluster operators authentication, image-registry, ingress, insights, kube-apiserver, machine-api, machine-config, monitoring, openshift-apiserver, openshift-controller-manager, openshift-samples, operator-lifecycle-manager-packageserver are not available
level=debug msg=* Could not update imagestream "openshift/driver-toolkit" (608 of 886): the server is down or not responding
level=debug msg=* Could not update oauthclient "console" (546 of 886): the server does not recognize this resource, check extension API servers
level=debug msg=* Could not update role "openshift-console-operator/prometheus-k8s" (804 of 886): resource may have been deleted
level=debug msg=* Could not update role "openshift-console/prometheus-k8s" (808 of 886): resource may have been deleted

This caused the machine-config operator to report Degraded and the master nodes to fail:

level=error msg=Cluster operator machine-config Degraded is True with RequiredPoolsFailed: Unable to apply 4.16.0-0.nightly-s390x-2024-03-03-131239: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)]]
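
The pool counters in the error above (total, ready, updated, unavailable) are the quickest indicator of how far the master pool got. Below is a minimal Python sketch for extracting them, assuming the status text keeps the exact shape seen in this report, including `ready 0` without a colon. The helper name `parse_pool_status` is hypothetical.

```python
import re

# Pattern assumes the status format seen in this report; "ready" may
# appear with or without a trailing colon.
_POOL_STATUS = re.compile(
    r"MachineConfigPool (?P<pool>\S+) is not ready.*?"
    r"pool degraded: (?P<degraded>true|false)\s+total: (?P<total>\d+),"
    r"\s*ready:? (?P<ready>\d+),\s*updated: (?P<updated>\d+),"
    r"\s*unavailable: (?P<unavailable>\d+)"
)


def parse_pool_status(log_line):
    """Return the MachineConfigPool counters as a dict, or None if absent."""
    match = _POOL_STATUS.search(log_line)
    if match is None:
        return None
    return {
        "pool": match.group("pool"),
        "degraded": match.group("degraded") == "true",
        "total": int(match.group("total")),
        "ready": int(match.group("ready")),
        "updated": int(match.group("updated")),
        "unavailable": int(match.group("unavailable")),
    }
```

For the line in this report, all three master nodes are counted as unavailable and none as ready or updated.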

Job Link: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-ovn-remote-libvirt-s390x/1764278211769798656

The issue has been observed in the last 9 job runs, since March 2nd.

FYI: checked with telpelt on this VM for the same time period; there were no network glitches or infrastructure problems.

People

Assignee: Sinny Kumari
Reporter: Amrut Purandare
QA Contact: Doug Slavens
Votes: 0
Watchers: 6

Dates

Created: 2024/03/05 7:34 AM
Updated: 2024/03/13 1:27 PM
Resolved: 2024/03/13 1:27 PM