-
Bug
-
Resolution: Done-Errata
-
Major
-
4.12.z
-
+
-
Critical
-
No
-
ShiftStack Sprint 244, ShiftStack Sprint 245, ShiftStack Sprint 246
-
3
-
False
-
-
-
Bug Fix
Description of problem:
During the upgrade multijob for OCP, starting from version 4.10 with the OVNKubernetes network type on OSP 16.2, the upgrade failed at the 4.11 to 4.12 step: the etcd cluster operator became unavailable. One node, ostest-ttvx4-master-2, is in SchedulingDisabled status. In the openshift-etcd namespace, the etcd-ostest-ttvx4-master-0 pod has been reporting errors. Its logs point to problems with the etcd membership and with the member's data directory.
Version-Release number of selected component (if applicable):
OCP 4.11.50 to 4.12.36 RHOS-16.2-RHEL-8-20230510.n.1
How reproducible:
Always
Steps to Reproduce:
1. Begin the OCP upgrade process starting from version 4.10
2. Upgrade from 4.10 to 4.11
3. Upgrade from 4.11 to 4.12
Actual results:
The upgrade fails between versions 4.11 and 4.12 because of the etcd operator. The operator reports Available=False and flags specific etcd members as not started or unhealthy.
Expected results:
Smooth upgrade from 4.11 to 4.12 without any issues.
Additional info:
$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.36   True        False         True       4h17m   APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
baremetal                                  4.12.36   True        False         False      8h
... ...
csi-snapshot-controller                    4.12.36   True        False         False      8h
dns                                        4.12.36   True        False         False      8h
etcd                                       4.12.36   False       True          True       4h33m   EtcdMembersAvailable: 2 of 4 members are available, NAME-PENDING-172.17.5.228 has not started, ostest-ttvx4-master-0 is unhealthy
.....
machine-approver                           4.12.36   True        False         False      8h
machine-config                             4.11.50   True        True          True       6h14m   Unable to apply 4.12.36: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 2, updated: 2, unavailable: 1)]
marketplace                                4.12.36   True        False         False      8h
monitoring                                 4.12.36   True        False         False      4h15m
network                                    4.12.36   True        False         False      8h
node-tuning                                4.12.36   True        False         False      5h15m
openshift-apiserver                        4.12.36   True        False         True       4h19m   APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver () ...
operator-lifecycle-manager-packageserver   4.12.36   True        False         False      8h
service-ca                                 4.12.36   True        False         False      8h
storage                                    4.12.36   True        False         False      8h
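A listing this long is easier to triage when it is reduced to the operators that are not healthy. The following is a minimal sketch, assuming the default `oc get co` column order shown above (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE). The function name `unhealthy_operators` is hypothetical and is not part of `oc`.

```shell
# Hypothetical helper: reads `oc get co` output on stdin and prints the name
# of every operator that is not Available or that is Degraded.
# Assumes the default column order:
#   NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE [MESSAGE]
unhealthy_operators() {
  # NR > 1  skips the header row
  # NF >= 6 skips blank or truncated lines
  awk 'NR > 1 && NF >= 6 && ($3 != "True" || $5 != "False") { print $1 }'
}

# Usage against a live cluster:
#   oc get co | unhealthy_operators
```

Among the rows shown in this report, the filter would select authentication, etcd, machine-config, and openshift-apiserver.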
$ oc get pods -n openshift-etcd
NAME                                       READY   STATUS      RESTARTS         AGE
etcd-guard-ostest-ttvx4-master-0           0/1     Running     0                4h33m
etcd-guard-ostest-ttvx4-master-1           1/1     Running     0                4h22m
etcd-guard-ostest-ttvx4-master-2           1/1     Running     0                5h40m
etcd-ostest-ttvx4-master-0                 3/4     Error       58 (5m10s ago)   4h25m
etcd-ostest-ttvx4-master-1                 4/4     Running     0                4h25m
etcd-ostest-ttvx4-master-2                 4/4     Running     2 (4h28m ago)    4h48m
installer-25-ostest-ttvx4-master-0         0/1     Completed   0                4h47m
installer-26-ostest-ttvx4-master-0         0/1     Completed   0                4h44m
installer-27-ostest-ttvx4-master-0         0/1     Completed   0                4h34m
revision-pruner-25-ostest-ttvx4-master-0   0/1     Completed   0                4h47m
revision-pruner-26-ostest-ttvx4-master-0   0/1     Completed   0                4h44m
revision-pruner-26-ostest-ttvx4-master-1   0/1     Completed   0                4h34m
revision-pruner-27-ostest-ttvx4-master-0   0/1     Completed   0                4h34m
revision-pruner-27-ostest-ttvx4-master-1   0/1     Completed   0                4h34m
$ oc logs etcd-ostest-ttvx4-master-0 -n openshift-etcd
1a4f2630e5f2296f, unstarted, , https://172.17.5.228:2380, , true
2f6c4ca331daa2de, started, ostest-ttvx4-master-2, https://10.196.2.249:2380, https://10.196.2.249:2379, false
752ca6c9953eff21, started, ostest-ttvx4-master-1, https://10.196.1.187:2380, https://10.196.1.187:2379, false
a6d1d802202a55e3, started, ostest-ttvx4-master-0, https://10.196.2.93:2380, https://10.196.2.93:2379, false
#### attempt 0
member={name="", peerURLs=[https://172.17.5.228:2380}, clientURLs=[]
member={name="ostest-ttvx4-master-2", peerURLs=[https://10.196.2.249:2380}, clientURLs=[https://10.196.2.249:2379]
member={name="ostest-ttvx4-master-1", peerURLs=[https://10.196.1.187:2380}, clientURLs=[https://10.196.1.187:2379]
member={name="ostest-ttvx4-master-0", peerURLs=[https://10.196.2.93:2380}, clientURLs=[https://10.196.2.93:2379]
target={name="ostest-ttvx4-master-0", peerURLs=[https://10.196.2.93:2380}, clientURLs=[https://10.196.2.93:2379]
member "https://10.196.2.93:2380" dataDir has been destroyed and must be removed from the cluster
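The comma-separated lines at the top of the log above appear to follow the member-list layout of ID, status, name, peer URL, client URL, and learner flag. Under that assumption, the unexpected fourth member can be singled out mechanically. This is only a sketch: the function name `unstarted_members` is hypothetical, and the six-field layout is inferred from the log in this report rather than from a documented contract.

```shell
# Hypothetical helper: reads etcd pod log text on stdin and prints
# "<member ID> <peer URL>" for every member-list line whose status
# is anything other than "started".
# Assumes six comma-separated fields per member line:
#   ID, status, name, peer URL, client URL, learner flag
unstarted_members() {
  # -F', *' splits on a comma followed by optional spaces, so the empty
  # name and client-URL fields of an unstarted member are preserved.
  # NF == 6 ignores the "member={...}" and free-text lines in the same log.
  awk -F', *' 'NF == 6 && $2 != "started" { print $1, $4 }'
}

# Usage against a live cluster:
#   oc logs etcd-ostest-ttvx4-master-0 -n openshift-etcd | unstarted_members
```

For the log in this report, the filter would print `1a4f2630e5f2296f https://172.17.5.228:2380`, the same member that the etcd operator reports as NAME-PENDING-172.17.5.228.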
- depends on
-
OCPBUGS-23190 Failure on OCP Upgrade Between 4.11 to 4.12 Due to etcd Operator Issues
- Closed
- is cloned by
-
OCPBUGS-23190 Failure on OCP Upgrade Between 4.11 to 4.12 Due to etcd Operator Issues
- Closed
- links to
-
RHBA-2023:7691 OpenShift Container Platform 4.11.z bug fix update