Bug
Resolution: Unresolved
4.16.z
Critical
Description of problem:
Upgrade on a three-node compact cluster (with FIPS enabled) is stuck.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.36   True        True          16h     Unable to apply 4.16.16: some cluster operators are not available

- lastTransitionTime: "2024-10-22T14:48:43Z"
  message: Cluster operators authentication, openshift-apiserver are not available
  reason: ClusterOperatorsNotAvailable
  status: "True"
  type: Failing
- lastTransitionTime: "2024-10-22T14:19:07Z"
  message: 'Unable to apply 4.16.16: some cluster operators are not available'
  reason: ClusterOperatorsNotAvailable
  status: "True"
  type: Progressing

$ oc get co authentication openshift-apiserver
NAME                  VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication        4.16.16   False       False         False      16h
openshift-apiserver   4.16.16   False       False         False      16h

$ oc get co |grep "4.15"
dns              4.15.36   True   False   False   21h
machine-config   4.15.36   True   False   False   273d
network          4.15.36   True   False   False   1y

apiserver reporting conditions:

- lastTransitionTime: "2023-02-10T18:17:54Z"
  message: All is well
  reason: AsExpected
  status: "False"
  type: Degraded
- lastTransitionTime: "2024-10-22T18:16:17Z"
  message: All is well
  reason: AsExpected
  status: "False"
  type: Progressing
- lastTransitionTime: "2024-10-22T14:47:44Z"
  message: 'APIServicesAvailable: PreconditionNotReady'
  reason: APIServices_PreconditionNotReady
  status: "False"
  type: Available
- lastTransitionTime: "2023-02-10T18:15:23Z"
  message: All is well
  reason: AsExpected
  status: "True"
  type: Upgradeable
- lastTransitionTime: "2024-10-22T14:47:41Z"
  reason: NoData
  status: Unknown
  type: EvaluationConditionsDetected

kube-apiserver, openshift-apiserver and authentication are reporting the following errors:

2024-10-23T09:39:14.208529772+02:00 W1023 07:39:14.208441 1 logging.go:59] [core] [Channel #50082 SubChannel #50085] grpc: addrConn.createTransport failed to connect to {Addr: "172.31.36.3:2379", ServerName: "172.31.36.3:2379", }. Err: connection error: desc = "error reading server preface: read tcp 10.149.0.155:39190->172.31.36.3:2379: use of closed network connection"

2024-10-23T09:42:17.705910825+02:00 W1023 07:42:17.705817 1 logging.go:59] [core] [Channel #50251 SubChannel #50252] grpc: addrConn.createTransport failed to connect to {Addr: "172.31.36.1:2379", ServerName: "172.31.36.1:2379", }. Err: connection error: desc = "transport: authentication handshake failed: context canceled"

However, testing connectivity from those pods to the etcd endpoints with the mounted certificate works. etcd performance was also verified; it is not the best, but it does not appear to be related.
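For anyone triaging a similar stuck upgrade, the two manual checks in the `oc get co` output above (which operators are not Available, and which are still on the old version) can be sketched as small filters. This is only an illustrative sketch: the function names `not_available_operators` and `operators_not_at_version` are hypothetical, and it assumes the default `oc get co` column order (NAME, VERSION, AVAILABLE, ...) with a non-empty VERSION column.

```shell
# Hypothetical helper: reads `oc get co` output on stdin and prints the
# names of operators whose AVAILABLE column (3rd field) is not "True".
# The header row, if present, is skipped by matching on "NAME".
not_available_operators() {
  awk '$1 != "NAME" && $3 != "True" { print $1 }'
}

# Hypothetical helper: prints the names of operators whose VERSION column
# (2nd field) differs from the target version passed as the first argument.
operators_not_at_version() {
  awk -v target="$1" '$1 != "NAME" && $2 != target { print $1 }'
}

# Example usage against a live cluster (not executed here):
#   oc get co | not_available_operators
#   oc get co | operators_not_at_version 4.16.16
```

Both filters only read text from stdin, so they can also be run against a saved copy of the output, for example from a must-gather or from this report.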
Version-Release number of selected component (if applicable):
4.16.16
Expected results:
openshift-apiserver and openshift-authentication become available and the update progresses.