Bug
Resolution: Unresolved
Major
Quality / Stability / Reliability
Proposed
Description of problem:
A ROSA cluster created with one of the new bare-metal instance types m6id.metal, c6id.metal, r6id.metal, or r5b.metal cannot be provisioned successfully.
Version-Release number of selected component (if applicable):
Not related to a specific version.
How reproducible:
always
Steps to Reproduce:
1. rosa create cluster -c tzhou --compute-machine-type c6id.metal
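Once creation has been kicked off, the resulting cluster state and install progress can be followed from the CLI (a minimal sketch, assuming the cluster name tzhou from the step above):
$ rosa describe cluster -c tzhou
$ rosa logs install -c tzhou --watch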
Actual results:
The cluster ends up in an error state.
Expected results:
The cluster is ready
Additional info:
Here are some logs and debug info:
$ ocm get cluster 266n4fjprsjs423bb1vufkt2cj8l56b9 | jq -r .status
{
"state": "error",
"description": "GeneralOperatorDegraded",
"dns_ready": true,
"oidc_ready": false,
"provision_error_message": "",
"provision_error_code": "",
"configuration_mode": "full",
"limited_support_reason_count": 0
}
$ ocm get /api/clusters_mgmt/v1/clusters/266n4fjprsjs423bb1vufkt2cj8l56b9/resources/live
...
"conditions": [
{
"type": "ProvisionFailed",
"status": "True",
"lastProbeTime": "2023-09-12T02:35:58Z",
"lastTransitionTime": "2023-09-12T02:35:58Z",
"reason": "GeneralOperatorDegraded",
"message": "Timeout waiting for an operator to become ready"
},
...
$ ocm get /api/clusters_mgmt/v1/clusters/266njkqad4ipndraac0ei4u07nuso18s/logs/install
...
time="2023-09-12T03:06:12Z" level=error msg="Cluster operator authentication Available is False with WellKnown_NotReady: WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://10.0.156.181:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)"
...
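The install log points at the kube-apiserver oauth well-known endpoint not being served. As a sanity check (a hedged sketch, assuming the API server IP reported in the log above is reachable from the debug host), the endpoint can be probed directly:
$ curl -k https://10.0.156.181:6443/.well-known/oauth-authorization-server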
// the cluster can be logged into, and the cluster operators (co) look healthy
$ oc get co -A --kubeconfig=kubeconfig
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.12.31 True False False 48m
baremetal 4.12.31 True False False 101m
cloud-controller-manager 4.12.31 True False False 105m
cloud-credential 4.12.31 True False False 105m
cluster-autoscaler 4.12.31 True False False 102m
config-operator 4.12.31 True False False 103m
console 4.12.31 True False False 55m
control-plane-machine-set 4.12.31 True False False 99m
csi-snapshot-controller 4.12.31 True False False 102m
dns 4.12.31 True False False 102m
etcd 4.12.31 True False False 100m
image-registry 4.12.31 True False False 60m
ingress 4.12.31 True False False 59m
insights 4.12.31 True False False 89m
kube-apiserver 4.12.31 True False False 87m
kube-controller-manager 4.12.31 True False False 100m
kube-scheduler 4.12.31 True False False 100m
kube-storage-version-migrator 4.12.31 True False False 102m
machine-api 4.12.31 True False False 60m
machine-approver 4.12.31 True False False 102m
machine-config 4.12.31 True False False 100m
marketplace 4.12.31 True False False 102m
monitoring 4.12.31 True False False 58m
network 4.12.31 True False False 104m
node-tuning 4.12.31 True False False 102m
openshift-apiserver 4.12.31 True False False 87m
openshift-controller-manager 4.12.31 True False False 99m
openshift-samples 4.12.31 True False False 94m
operator-lifecycle-manager 4.12.31 True False False 102m
operator-lifecycle-manager-catalog 4.12.31 True False False 102m
operator-lifecycle-manager-packageserver 4.12.31 True False False 94m
service-ca 4.12.31 True False False 102m
storage 4.12.31 True False False 102m
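Since all cluster operators report healthy after login while the install still timed out, it may also be worth checking whether the metal worker nodes and machines came up in time (a hedged suggestion, reusing the same kubeconfig as above):
$ oc get nodes -o wide --kubeconfig=kubeconfig
$ oc get machines -n openshift-machine-api --kubeconfig=kubeconfig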
cc yunjiang-1: would you please take a look and confirm whether this bug is filed under the correct component? Thanks in advance.