Bug
Resolution: Done-Errata
Priority: Critical
Severity: Critical
Version: 4.14
Sprint: OCP VE Sprint 239, OCP VE Sprint 240
8/8: testing this again now that OCPBUGS-16889 is verified.
Description of problem:
Install an IPI SNO cluster with baselineCapabilitySet set to None in install-config.yaml; the installation fails at the bootstrap-complete stage. The node is Ready, but the etcd operator is degraded:

$ oc get nodes
NAME                       STATUS   ROLES                         AGE   VERSION
jima03sno-cgqzt-master-0   Ready    control-plane,master,worker   52m   v1.27.3+ab0b8ee

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.0-0.nightly-2023-06-30-131338   False       False         True       49m     APIServicesAvailable: PreconditionNotReady...
cloud-controller-manager                   4.14.0-0.nightly-2023-06-30-131338   True        False         False      50m
cloud-credential                                                                True        False         False      58m
config-operator                            4.14.0-0.nightly-2023-06-30-131338   True        False         False      49m
dns                                        4.14.0-0.nightly-2023-06-30-131338   True        False         False      48m
etcd                                       4.14.0-0.nightly-2023-06-30-131338   False       True          True       48m     StaticPodsAvailable: 0 nodes are active; 1 nodes are at revision 0; 0 nodes have achieved new revision 2
image-registry
ingress                                                                         False       True          True       48m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)
kube-apiserver                             4.14.0-0.nightly-2023-06-30-131338   False       True          True       49m     StaticPodsAvailable: 0 nodes are active; 1 nodes are at revision 0; 0 nodes have achieved new revision 2
kube-controller-manager                    4.14.0-0.nightly-2023-06-30-131338   True        False         False      45m
kube-scheduler                             4.14.0-0.nightly-2023-06-30-131338   True        False         False      45m
kube-storage-version-migrator              4.14.0-0.nightly-2023-06-30-131338   True        False         False      49m
machine-approver                           4.14.0-0.nightly-2023-06-30-131338   True        False         False      48m
machine-config                             4.14.0-0.nightly-2023-06-30-131338   True        False         False      48m
monitoring                                                                      False       True          True       104s    reconciling Alertmanager Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io), reconciling Thanos Querier Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io), reconciling Prometheus API Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io), client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
network                                    4.14.0-0.nightly-2023-06-30-131338   True        False         False      50m
openshift-apiserver                        4.14.0-0.nightly-2023-06-30-131338   False       False         True       49m     APIServicesAvailable: PreconditionNotReady
openshift-controller-manager               4.14.0-0.nightly-2023-06-30-131338   True        False         False      42m
operator-lifecycle-manager                 4.14.0-0.nightly-2023-06-30-131338   True        False         False      48m
operator-lifecycle-manager-catalog         4.14.0-0.nightly-2023-06-30-131338   True        False         False      48m
operator-lifecycle-manager-packageserver                                        False       True          False      48m     ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install timeout
service-ca                                 4.14.0-0.nightly-2023-06-30-131338   True        False         False      49m

The node also cannot be accessed over ssh:

# ssh -i ~/.ssh/openshift-qe.pem core@10.0.0.6
The authenticity of host '10.0.0.6 (10.0.0.6)' can't be established.
ECDSA key fingerprint is SHA256:rCrEiTqPIPuRU84ierPqo0J/UAv4+yiEoLOzlakfvGs.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '10.0.0.6' (ECDSA) to the list of known hosts.
core@10.0.0.6: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

Debugging on the node shows the ssh public key was not copied into /home/core/.ssh/authorized_keys:

sh-5.1# ls -ltr /home/core/.ssh/authorized_keys.d/
total 0
-rw-------. 1 core core 0 Jul 3 00:48 ignition

The machine-api operator is disabled, but the openshift-machine-api namespace and a Service resource under it still exist:

$ oc get all -n openshift-machine-api
NAME                                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
service/cluster-autoscaler-operator   ClusterIP   172.30.129.2   <none>        443/TCP,9192/TCP   68m

A similar issue also happens on a UPI cluster with baselineCapabilitySet: None specified in install-config.yaml.

Must-gather log attached.
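For reference, a minimal install-config.yaml of the following shape reproduces the failing setup. This is a sketch, not the exact config used: everything except the capabilities stanza (base domain, cluster name, platform, credentials) is a placeholder rather than a value from the affected cluster.

apiVersion: v1
baseDomain: example.com            # placeholder
metadata:
  name: sno-test                   # placeholder cluster name
controlPlane:
  name: master
  replicas: 1                      # single-node (SNO) control plane
compute:
- name: worker
  replicas: 0
capabilities:
  baselineCapabilitySet: None      # disables all optional capabilities, including MachineAPI
platform:
  none: {}                         # placeholder; the affected IPI cluster used a cloud platform here
pullSecret: '<pull secret>'        # placeholder
sshKey: '<ssh public key>'         # placeholder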
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-06-30-131338
How reproducible:
Always, when installing an IPI SNO or UPI cluster with the MachineAPI capability disabled.
Steps to Reproduce:
1. Prepare install-config.yaml and set baselineCapabilitySet: None
2. Install an IPI SNO or UPI cluster (see the command sketch below)
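With the install-config.yaml in place, step 2 is the standard installer invocation; a sketch, assuming the config lives in ./cluster (the directory name is a placeholder):

$ openshift-install create cluster --dir ./cluster --log-level=info

After the install attempt, the capability set the cluster actually enabled can be inspected via the ClusterVersion status (available on releases that report status.capabilities):

$ oc get clusterversion version -o jsonpath='{.status.capabilities.enabledCapabilities}'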
Actual results:
Installation fails.
Expected results:
Installation is successful.
Additional info:
Installation is successful when setting baselineCapabilitySet: None plus additionalEnabledCapabilities: [MachineAPI] in install-config.yaml, as in the stanza below.
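Expressed as an install-config.yaml stanza, the working combination is:

capabilities:
  baselineCapabilitySet: None
  additionalEnabledCapabilities:
  - MachineAPI                     # re-enable only MachineAPI on top of the None baseline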
- is caused by: OCPBUGS-16889 "CEO needs to handle optional MachineAPI" (Closed)
- links to: RHEA-2023:5006 rpm