Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: 4.14.0
Affects Version/s: 4.14
Component/s: Cloud Compute / Unknown
Labels:

Severity: Critical
Regression: No
Epic Link: CNF-6318
Story Points: 2
Sprint: OCP VE Sprint 239, OCP VE Sprint 240
sprint_count: 2
Release Blocker: Rejected
Blocked: False
Blocked Reason: None
Internal Whiteboard:
Latest Status Summary: 8/8: testing this again now that OCPBUGS-16889 is verified
Target Version: 4.14.0
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

Installing an IPI SNO (single-node OpenShift) cluster with baselineCapabilitySet set to None in install-config.yaml fails at the bootstrap-complete stage.

The node is Ready, but the etcd operator (among others) is degraded:
$ oc get nodes
NAME                       STATUS   ROLES                         AGE   VERSION
jima03sno-cgqzt-master-0   Ready    control-plane,master,worker   52m   v1.27.3+ab0b8ee

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.0-0.nightly-2023-06-30-131338   False       False         True       49m     APIServicesAvailable: PreconditionNotReady...
cloud-controller-manager                   4.14.0-0.nightly-2023-06-30-131338   True        False         False      50m     
cloud-credential                                                                True        False         False      58m     
config-operator                            4.14.0-0.nightly-2023-06-30-131338   True        False         False      49m     
dns                                        4.14.0-0.nightly-2023-06-30-131338   True        False         False      48m     
etcd                                       4.14.0-0.nightly-2023-06-30-131338   False       True          True       48m     StaticPodsAvailable: 0 nodes are active; 1 nodes are at revision 0; 0 nodes have achieved new revision 2
image-registry                                                                                                               
ingress                                                                         False       True          True       48m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)
kube-apiserver                             4.14.0-0.nightly-2023-06-30-131338   False       True          True       49m     StaticPodsAvailable: 0 nodes are active; 1 nodes are at revision 0; 0 nodes have achieved new revision 2
kube-controller-manager                    4.14.0-0.nightly-2023-06-30-131338   True        False         False      45m     
kube-scheduler                             4.14.0-0.nightly-2023-06-30-131338   True        False         False      45m     
kube-storage-version-migrator              4.14.0-0.nightly-2023-06-30-131338   True        False         False      49m     
machine-approver                           4.14.0-0.nightly-2023-06-30-131338   True        False         False      48m     
machine-config                             4.14.0-0.nightly-2023-06-30-131338   True        False         False      48m     
monitoring                                                                      False       True          True       104s    reconciling Alertmanager Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io), reconciling Thanos Querier Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io), reconciling Prometheus API Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io), client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
network                                    4.14.0-0.nightly-2023-06-30-131338   True        False         False      50m     
openshift-apiserver                        4.14.0-0.nightly-2023-06-30-131338   False       False         True       49m     APIServicesAvailable: PreconditionNotReady
openshift-controller-manager               4.14.0-0.nightly-2023-06-30-131338   True        False         False      42m     
operator-lifecycle-manager                 4.14.0-0.nightly-2023-06-30-131338   True        False         False      48m     
operator-lifecycle-manager-catalog         4.14.0-0.nightly-2023-06-30-131338   True        False         False      48m     
operator-lifecycle-manager-packageserver                                        False       True          False      48m     ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install timeout
service-ca                                 4.14.0-0.nightly-2023-06-30-131338   True        False         False      49m  


I also found that the node could not be accessed over SSH; the attempt failed with the following error:
# ssh -i ~/.ssh/openshift-qe.pem core@10.0.0.6
The authenticity of host '10.0.0.6 (10.0.0.6)' can't be established.
ECDSA key fingerprint is SHA256:rCrEiTqPIPuRU84ierPqo0J/UAv4+yiEoLOzlakfvGs.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '10.0.0.6' (ECDSA) to the list of known hosts.
core@10.0.0.6: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

Debugging on the node shows that the SSH public key was not copied to /home/core/.ssh/authorized_keys; the ignition file under authorized_keys.d is empty:
sh-5.1# ls -ltr /home/core/.ssh/authorized_keys.d/        
total 0
-rw-------. 1 core core 0 Jul  3 00:48 ignition

The machine-api operator is disabled, but the openshift-machine-api namespace still exists and contains a Service resource:
$ oc get all -n openshift-machine-api
NAME                                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)            AGE
service/cluster-autoscaler-operator   ClusterIP   172.30.129.2   <none>        443/TCP,9192/TCP   68m

A similar issue also occurs on a UPI cluster when baselineCapabilitySet: None is specified in install-config.yaml.

A must-gather log is attached.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-30-131338

How reproducible:

Always, when installing an IPI SNO or UPI cluster with the MachineAPI capability disabled.

Steps to Reproduce:

1. Prepare install-config.yaml and set baselineCapabilitySet: None
2. Install an IPI SNO or UPI cluster
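
For reference, step 1 presumably corresponds to a capabilities stanza like the one below. This is only a sketch of that one stanza: the install-config.yaml actually used for this report is not shown in the ticket, and all other required fields (platform, pull secret, and so on) are omitted here.

```yaml
# Fragment of install-config.yaml (sketch; other required fields omitted).
capabilities:
  # Start from an empty baseline, so no optional capability is enabled.
  # Per this report, that leaves the MachineAPI capability disabled.
  baselineCapabilitySet: None
```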

Actual results:

Installation fails.

Expected results:

Installation is successful.

Additional info:

Installation succeeds when install-config.yaml sets baselineCapabilitySet: None together with additionalEnabledCapabilities: [MachineAPI].
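
The working combination described above would look roughly like the following stanza. It is a sketch rather than the exact file used in testing; the field name in the install-config schema is additionalEnabledCapabilities, and every other field of install-config.yaml is omitted.

```yaml
# Fragment of install-config.yaml (sketch; other required fields omitted).
capabilities:
  # Empty baseline: no optional capabilities are enabled by default.
  baselineCapabilitySet: None
  # Explicitly re-enable MachineAPI on top of the empty baseline.
  additionalEnabledCapabilities:
  - MachineAPI
```

With this stanza the MachineAPI capability is enabled explicitly while everything else in the baseline stays off, which is the configuration reported to install successfully.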

is caused by: OCPBUGS-16889 CEO needs to handle optional MachineAPI (Closed)

links to: openshift/api#1554: OCPBUGS-15654: Add MAPI to all capability sets
links to: RHEA-2023:5006 rpm

Assignee: Bulat Zamalutdinov
Reporter: Jinyun Ma
QA Contact: Jinyun Ma
Votes: 0
Watchers: 14
Created: 2023/07/03 2:40 AM
Updated: 2023/11/16 3:48 AM
Resolved: 2023/10/31 1:17 PM
