Issue Type: Bug
Resolution: Unresolved
Priority: Major
Quality / Stability / Reliability
Proposed
Description of problem:
A ROSA cluster created with one of the new bare-metal instance types m6id.metal, c6id.metal, r6id.metal, or r5b.metal cannot be provisioned successfully.
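As a quick sanity check (assuming a current rosa CLI and an active ocm/rosa login), the metal types can be confirmed to be offered at all; the grep pattern below is only an illustration:

$ rosa list machine-types | grep -E 'm6id\.metal|c6id\.metal|r6id\.metal|r5b\.metal'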
Version-Release number of selected component (if applicable):
Not related to a specific version.
How reproducible:
always
Steps to Reproduce:
1. rosa create cluster -c tzhou --compute-machine-type c6id.metal
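For completeness, a minimal sketch of the reproduction plus the follow-up checks used to observe the failure (the cluster name tzhou is just the example from step 1; the other commands are standard rosa subcommands):

$ rosa create cluster -c tzhou --compute-machine-type c6id.metal
# Follow the installation logs until they stop progressing
$ rosa logs install -c tzhou --watch
# Check the reported state once provisioning times out
$ rosa describe cluster -c tzhou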
Actual results:
The cluster ends up in the error state.
Expected results:
The cluster provisions successfully and becomes ready.
Additional info:
Here are some logs and debug info:

$ ocm get cluster 266n4fjprsjs423bb1vufkt2cj8l56b9 | jq -r .status
{
  "state": "error",
  "description": "GeneralOperatorDegraded",
  "dns_ready": true,
  "oidc_ready": false,
  "provision_error_message": "",
  "provision_error_code": "",
  "configuration_mode": "full",
  "limited_support_reason_count": 0
}

$ ocm get /api/clusters_mgmt/v1/clusters/266n4fjprsjs423bb1vufkt2cj8l56b9/resources/live
...
"conditions": [
  {
    "type": "ProvisionFailed",
    "status": "True",
    "lastProbeTime": "2023-09-12T02:35:58Z",
    "lastTransitionTime": "2023-09-12T02:35:58Z",
    "reason": "GeneralOperatorDegraded",
    "message": "Timeout waiting for an operator to become ready"
  },
...

$ ocm get /api/clusters_mgmt/v1/clusters/266njkqad4ipndraac0ei4u07nuso18s/logs/install
...
time="2023-09-12T03:06:12Z" level=error msg="Cluster operator authentication Available is False with WellKnown_NotReady: WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://10.0.156.181:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)"
...

// the cluster can be logged in to, and the cluster operators look good
$ oc get co -A --kubeconfig=kubeconfig
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.31   True        False         False      48m
baremetal                                  4.12.31   True        False         False      101m
cloud-controller-manager                   4.12.31   True        False         False      105m
cloud-credential                           4.12.31   True        False         False      105m
cluster-autoscaler                         4.12.31   True        False         False      102m
config-operator                            4.12.31   True        False         False      103m
console                                    4.12.31   True        False         False      55m
control-plane-machine-set                  4.12.31   True        False         False      99m
csi-snapshot-controller                    4.12.31   True        False         False      102m
dns                                        4.12.31   True        False         False      102m
etcd                                       4.12.31   True        False         False      100m
image-registry                             4.12.31   True        False         False      60m
ingress                                    4.12.31   True        False         False      59m
insights                                   4.12.31   True        False         False      89m
kube-apiserver                             4.12.31   True        False         False      87m
kube-controller-manager                    4.12.31   True        False         False      100m
kube-scheduler                             4.12.31   True        False         False      100m
kube-storage-version-migrator              4.12.31   True        False         False      102m
machine-api                                4.12.31   True        False         False      60m
machine-approver                           4.12.31   True        False         False      102m
machine-config                             4.12.31   True        False         False      100m
marketplace                                4.12.31   True        False         False      102m
monitoring                                 4.12.31   True        False         False      58m
network                                    4.12.31   True        False         False      104m
node-tuning                                4.12.31   True        False         False      102m
openshift-apiserver                        4.12.31   True        False         False      87m
openshift-controller-manager               4.12.31   True        False         False      99m
openshift-samples                          4.12.31   True        False         False      94m
operator-lifecycle-manager                 4.12.31   True        False         False      102m
operator-lifecycle-manager-catalog         4.12.31   True        False         False      102m
operator-lifecycle-manager-packageserver   4.12.31   True        False         False      94m
service-ca                                 4.12.31   True        False         False      102m
storage                                    4.12.31   True        False         False      102m
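Since the install log above points at the kube-apiserver oauth well-known endpoint, probing that endpoint directly can show whether it is genuinely unreachable or just slow to come up. A rough check (the IP is the one from the install-log error; reaching it requires network access to the control plane, e.g. from a node or a pod, and the second command uses the same working kubeconfig as above):

$ curl -k https://10.0.156.181:6443/.well-known/oauth-authorization-server
$ oc get --raw /.well-known/oauth-authorization-server --kubeconfig=kubeconfig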
cc yunjiang-1: would you please take a look and confirm whether this bug is filed under the correct component? Thanks in advance.