- Bug
- Resolution: Done
- Minor
- None
- 4.17
- Important
- None
- False
Description of problem:
The etcd-to-ephemeral procedure was applied successfully with the following considerations:
- Deploy the masters without the ephemeral attribute in the master flavors.
- Apply the machineConfig 98-var-lib-etcd.
- Change the CPMS to use a flavor that includes the ephemeral attribute set to 10G.
Rolling the procedure back then fails because etcd pods end up in CrashLoopBackOff.
With the above steps, the directory /sysroot/ostree/deploy/rhcos/var/lib/etcd/ is empty, while the etcd data is in /var/lib/etcd on the vdb disk. So apparently the rollback cannot be performed:
1. Rollback:
[stack@undercloud-0 ~]$ oc delete -f 98-var-lib-etcd.yaml
machineconfig.machineconfiguration.openshift.io "98-var-lib-etcd" deleted
2. The removal of the machineConfig started with master-1, and the volume is not mounted anymore:
[stack@undercloud-0 ~]$ oc debug node/ostest-jnkbp-master-7cbwv-1 -- chroot /host lsblk
Starting pod/ostest-jnkbp-master-7cbwv-1-debug-p8jw4 ...
To use host binaries, run `chroot /host`
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
vda    252:0    0   40G  0 disk
|-vda1 252:1    0    1M  0 part
|-vda2 252:2    0  127M  0 part
|-vda3 252:3    0  384M  0 part /boot
`-vda4 252:4    0 39.5G  0 part /var
                                /sysroot/ostree/deploy/rhcos/var
                                /sysroot
                                /usr
                                /etc
                                /
vdb    252:16   0   10G  0 disk
But the etcd pod on that node is in CrashLoopBackOff:
[stack@undercloud-0 ~]$ oc get pods -n openshift-etcd -l app=etcd
NAME                               READY   STATUS             RESTARTS         AGE
etcd-ostest-jnkbp-master-7cbwv-1   3/4     CrashLoopBackOff   22 (2m32s ago)   24h
etcd-ostest-jnkbp-master-gqbz7-2   4/4     Running            0                24h
etcd-ostest-jnkbp-master-vp9mr-0   4/4     Running            0                24h
[stack@undercloud-0 ~]$ oc logs -n openshift-etcd etcd-ostest-jnkbp-master-7cbwv-1
82789fe40c55eb75, started, ostest-jnkbp-master-gqbz7-2, https://10.196.0.107:2380, https://10.196.0.107:2379, false
8e8f484b15ae158f, started, ostest-jnkbp-master-vp9mr-0, https://10.196.0.182:2380, https://10.196.0.182:2379, false
b4d7fd333dda7cb3, started, ostest-jnkbp-master-7cbwv-1, https://10.196.1.177:2380, https://10.196.1.177:2379, false
#### attempt 0
member={name="ostest-jnkbp-master-gqbz7-2", peerURLs=[https://10.196.0.107:2380}, clientURLs=[https://10.196.0.107:2379]
member={name="ostest-jnkbp-master-vp9mr-0", peerURLs=[https://10.196.0.182:2380}, clientURLs=[https://10.196.0.182:2379]
member={name="ostest-jnkbp-master-7cbwv-1", peerURLs=[https://10.196.1.177:2380}, clientURLs=[https://10.196.1.177:2379]
target={name="ostest-jnkbp-master-7cbwv-1", peerURLs=[https://10.196.1.177:2380}, clientURLs=[https://10.196.1.177:2379]
where:
[stack@undercloud-0 ~]$ oc get nodes -o wide
NAME                          STATUS                     ROLES                  AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                KERNEL-VERSION                 CONTAINER-RUNTIME
ostest-jnkbp-master-7cbwv-1   Ready                      control-plane,master   25h     v1.30.3   10.196.1.177   <none>        Red Hat Enterprise Linux CoreOS 417.94.202408170011-0   5.14.0-427.33.1.el9_4.x86_64   cri-o://1.30.4-5.rhaos4.17.git95e494c.el9
ostest-jnkbp-master-gqbz7-2   Ready,SchedulingDisabled   control-plane,master   25h     v1.30.3   10.196.0.107   <none>        Red Hat Enterprise Linux CoreOS 417.94.202408170011-0   5.14.0-427.33.1.el9_4.x86_64   cri-o://1.30.4-5.rhaos4.17.git95e494c.el9
ostest-jnkbp-master-vp9mr-0   Ready                      control-plane,master   26h     v1.30.3   10.196.0.182   <none>        Red Hat Enterprise Linux CoreOS 417.94.202408170011-0   5.14.0-427.33.1.el9_4.x86_64   cri-o://1.30.4-5.rhaos4.17.git95e494c.el9
ostest-jnkbp-worker-0-4hl2w   Ready                      worker                 2d18h   v1.30.3   10.196.2.70    <none>        Red Hat Enterprise Linux CoreOS 417.94.202408170011-0   5.14.0-427.33.1.el9_4.x86_64   cri-o://1.30.4-5.rhaos4.17.git95e494c.el9
ostest-jnkbp-worker-0-w7759   Ready                      worker                 2d18h   v1.30.3   10.196.3.100   <none>        Red Hat Enterprise Linux CoreOS 417.94.202408170011-0   5.14.0-427.33.1.el9_4.x86_64   cri-o://1.30.4-5.rhaos4.17.git95e494c.el9
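The observation that one etcd directory is empty while the data sits on the ephemeral disk can be checked with a small helper. This is a minimal sketch and not part of the original report: `etcd_dir_state` is a hypothetical function name, and it assumes `mountpoint` from util-linux is available on the host (if it is not, the directory is reported as not-mounted).

```shell
# Report whether a directory is a separate mount point and whether it holds data.
# Hypothetical helper for illustration; usage: etcd_dir_state /var/lib/etcd
etcd_dir_state() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir: missing"
        return 1
    fi
    # mountpoint -q exits 0 when the directory is a mount point.
    if command -v mountpoint >/dev/null 2>&1 && mountpoint -q "$dir"; then
        mounted="mounted"
    else
        mounted="not-mounted"
    fi
    # ls -A lists every entry except . and .. ; no output means the directory is empty.
    if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
        content="empty"
    else
        content="populated"
    fi
    echo "$dir: $mounted, $content"
}
```

Run from a shell on the affected node (for example inside `oc debug node/<node>` after `chroot /host`), a result such as "not-mounted, empty" for /var/lib/etcd would match the state described above after the machineConfig is removed.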
Version-Release number of selected component (if applicable):
4.17.0-rc.0 RHOS-17.1-RHEL-9-20240701.n.1
How reproducible: Always
Actual results: An etcd member is missing and the cluster shows warnings.
Expected results: The procedure can be successfully rolled back.
Additional info: must-gather is available in a private comment.
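One way to check the expected result after a rollback is to confirm that every etcd pod reports all of its containers ready. The following is a sketch only: `not_ready_pods` is a hypothetical helper name, and it assumes the default `oc get pods` column layout (NAME, READY, STATUS, ...) with a header line, as in the output in the description.

```shell
# Print the names of pods whose READY column (ready/total) is not fully ready.
# Reads `oc get pods` output on stdin and skips the header line.
not_ready_pods() {
    awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}
```

For example, `oc get pods -n openshift-etcd -l app=etcd | not_ready_pods` prints nothing when all etcd pods are fully ready. With the pod listing from the description it would print etcd-ostest-jnkbp-master-7cbwv-1.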
- causes: OCPBUGS-41846 Add warning about etcd-to-ephemeral procedure rollback (Closed)