Bug
Resolution: Done
Minor
4.11.z
False
Description of problem:
While upgrading 1241 single-node OpenShift (SNO) clusters using the Topology Aware Lifecycle Manager (TALM), 3 clusters failed to upgrade because their etcd cluster operator was degraded.
Version-Release number of selected component (if applicable):
Deployed SNO clusters on OCP 4.10.32 and attempted to upgrade them to 4.11.5.
How reproducible:
Rare: only 3 of the 1241 clusters hit this issue. Of the 26 upgrade failures overall, 3 were caused by this issue.
Steps to Reproduce:
1.
2.
3.
Actual results:
[root@e27-h01-000-r650 common-and-group]# oc --kubeconfig=/root/hv-vm/sno/manifests/sno00423/kubeconfig get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.32   True        False         41h     Error while reconciling 4.10.32: the cluster operator etcd is degraded
[root@e27-h01-000-r650 common-and-group]# oc --kubeconfig=/root/hv-vm/sno/manifests/sno00454/kubeconfig get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.32   True        False         41h     Error while reconciling 4.10.32: the cluster operator etcd is degraded
[root@e27-h01-000-r650 common-and-group]# oc --kubeconfig=/root/hv-vm/sno/manifests/sno01049/kubeconfig get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.32   True        False         39h     Error while reconciling 4.10.32: the cluster operator etcd is degraded
Expected results:
The etcd operator on the SNO clusters should not become degraded and should not prevent the upgrade.
Additional info:
etcd operator Degraded condition:
- lastTransitionTime: "2022-10-26T04:36:26Z"
  message: |-
    EtcdEndpointsDegraded: no etcd members are present
    UpgradeBackupControllerDegraded: etcdmembers.etcd.operator.openshift.io "sno01049" not found
  reason: EtcdEndpoints_ErrorUpdatingEtcdEndpoints::UpgradeBackupController_Error
  status: "True"
  type: Degraded
etcd is nonetheless running on sno00423 and listening on ports 2379, 2380 and 9978:
[root@sno00423 ~]# netstat -ntlp | egrep "2380|2379|9978"
tcp6       0      0 :::2379                 :::*                    LISTEN      9047/etcd
tcp6       0      0 :::2380                 :::*                    LISTEN      9047/etcd
tcp6       0      0 :::9978                 :::*                    LISTEN      9047/etcd
I opened this against etcd; however, it is not yet clear which component is actually responsible for the failures.
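With 1241 clusters, finding the few whose etcd operator is degraded is easier with a sweep than with per-cluster `oc get clusterversion` calls. The following is a minimal sketch, assuming each kubeconfig lives at `<base>/<cluster>/kubeconfig` as in the paths above; the function name `list_etcd_degraded` is hypothetical. It reads only the `status` of the `Degraded` condition on the `etcd` ClusterOperator via a standard JSONPath filter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every SNO cluster whose etcd
# ClusterOperator reports Degraded=True.
# Assumption: kubeconfigs are laid out as <base>/<cluster>/kubeconfig,
# e.g. /root/hv-vm/sno/manifests/sno00423/kubeconfig.
list_etcd_degraded() {
  local base="${1:-/root/hv-vm/sno/manifests}"
  local kc cluster degraded
  for kc in "$base"/*/kubeconfig; do
    [ -f "$kc" ] || continue                 # glob matched nothing
    cluster="$(basename "$(dirname "$kc")")" # directory name = cluster name
    # Select only the status field of the Degraded condition.
    degraded="$(oc --kubeconfig="$kc" get clusteroperator etcd \
      -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}' 2>/dev/null)"
    if [ "$degraded" = "True" ]; then
      echo "$cluster"
    fi
  done
}
```

Usage: `list_etcd_degraded /root/hv-vm/sno/manifests`. Unreachable clusters print nothing rather than aborting the sweep, so they would need to be checked separately.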
- is impacted by
  - OCPNODE-1418 Fix propagation status in Kubelet (Closed)
- is related to
  - OCPBUGS-6878 SNO failed to deploy because etcd is in degraded state (Closed)