-
Bug
-
Resolution: Duplicate
-
Normal
-
None
-
4.13.z
-
None
-
Important
-
No
-
False
-
Description of problem:
The control plane machine set operator (CPMSO) fails to roll out the new control plane nodes and hangs, because the etcd pod on the new node is stuck in Init:CrashLoopBackOff. The error comes from the init container "etcd-ensure-env-vars": the environment variables it checks for are missing from the deployed static pod YAML. For example:
~~~
etcd-ensure-env-vars:
  ...
  Command:
    /bin/sh
    -c
    #!/bin/sh
    set -euo pipefail

    : "${NODE_rugouvei_cluster1_bkw7w_master_kxp8p_1_ETCD_URL_HOST?not set}"
    : "${NODE_rugouvei_cluster1_bkw7w_master_kxp8p_1_ETCD_NAME?not set}"
    : "${NODE_rugouvei_cluster1_bkw7w_master_kxp8p_1_IP?not set}"
  ...
  State:       Terminated
    Reason:    Error
    Message:   /bin/sh: line 4: NODE_rugouvei_cluster1_bkw7w_master_kxp8p_1_ETCD_URL_HOST: not set
~~~
The error is accurate. The only "NODE_*" variables defined are:
~~~
NODE_rugouvei_cluster1_bkw7w_master_1_ETCD_NAME: rugouvei-cluster1-bkw7w-master-1
NODE_rugouvei_cluster1_bkw7w_master_1_ETCD_URL_HOST: 10.44.135.201
NODE_rugouvei_cluster1_bkw7w_master_1_IP: 10.44.135.201
NODE_rugouvei_cluster1_bkw7w_master_2_ETCD_NAME: rugouvei-cluster1-bkw7w-master-2
NODE_rugouvei_cluster1_bkw7w_master_2_ETCD_URL_HOST: 10.44.135.240
NODE_rugouvei_cluster1_bkw7w_master_2_IP: 10.44.135.240
NODE_IP: (v1:status.podIP)
~~~
"rugouvei-cluster1-bkw7w-master-0" was the first machine/node to be deleted. The new node "rugouvei-cluster1-bkw7w-master-kxp8p-1" has no entry in this list, which causes the init container to fail.
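The failing check relies on a naming convention visible in the variables above: each control plane node gets NODE_<name>_ETCD_URL_HOST, NODE_<name>_ETCD_NAME and NODE_<name>_IP, where <name> appears to be the node name with hyphens replaced by underscores. The POSIX shell sketch below reproduces the check outside the pod, assuming exactly that mapping (inferred from the variable names in this report, not taken from the operator source). The function names node_env_prefix and check_node_env are hypothetical.

~~~sh
#!/bin/sh
# Sketch only: mimics the ": \"\${VAR?not set}\"" checks of the
# etcd-ensure-env-vars init container for an arbitrary node name.
# ASSUMPTION: variable names are NODE_<node name with '-' -> '_'>_<suffix>,
# as inferred from the variables listed in this report.

# Convert a node name such as rugouvei-cluster1-bkw7w-master-1 into the
# variable prefix NODE_rugouvei_cluster1_bkw7w_master_1.
node_env_prefix() {
    printf 'NODE_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

# Report each expected variable that is not exported in the current
# environment (on stderr) and return 1 if any is missing, 0 otherwise.
check_node_env() {
    prefix=$(node_env_prefix "$1")
    missing=0
    for suffix in ETCD_URL_HOST ETCD_NAME IP; do
        var="${prefix}_${suffix}"
        # printenv exits non-zero when the variable is not in the environment
        if ! printenv "$var" >/dev/null; then
            echo "$var: not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
~~~

With only the variables listed above exported, check_node_env rugouvei-cluster1-bkw7w-master-kxp8p-1 reports all three kxp8p_1 variables as not set and returns 1, while the same call for rugouvei-cluster1-bkw7w-master-1 returns 0.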
Version-Release number of selected component (if applicable):
Reproduced on a freshly installed VMware IPI cluster running 4.13.23 in a lab environment.
Steps to Reproduce:
1. Install a 4.13.23 IPI cluster on VMware.
2. Configure the CPMSO following the documentation: https://docs.openshift.com/container-platform/4.13/machine_management/control_plane_machine_management/cpmso-about.html
3. Manually delete "master-0".
Actual results:
~~~
$ ./oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master
NAME                                     PHASE      TYPE   REGION   ZONE   AGE
rugouvei-cluster1-bkw7w-master-1         Deleting                          5h9m
rugouvei-cluster1-bkw7w-master-2         Running                           5h9m
rugouvei-cluster1-bkw7w-master-gx9pd-0   Running                           34m
rugouvei-cluster1-bkw7w-master-kxp8p-1   Running                           34m

$ ./oc get nodes -l node-role.kubernetes.io/master=
NAME                                     STATUS   ROLES                  AGE    VERSION
rugouvei-cluster1-bkw7w-master-1         Ready    control-plane,master   5h7m   v1.26.9+636f2be
rugouvei-cluster1-bkw7w-master-2         Ready    control-plane,master   5h7m   v1.26.9+636f2be
rugouvei-cluster1-bkw7w-master-gx9pd-0   Ready    control-plane,master   30m    v1.26.9+636f2be
rugouvei-cluster1-bkw7w-master-kxp8p-1   Ready    control-plane,master   30m    v1.26.9+636f2be

$ ./oc -n openshift-etcd get pod -l app=etcd --show-labels
NAME                                          READY   STATUS                  RESTARTS         AGE   LABELS
etcd-rugouvei-cluster1-bkw7w-master-1         4/4     Running                 0                61m   app=etcd,etcd=true,k8s-app=etcd,revision=9
etcd-rugouvei-cluster1-bkw7w-master-2         4/4     Running                 0                63m   app=etcd,etcd=true,k8s-app=etcd,revision=9
etcd-rugouvei-cluster1-bkw7w-master-kxp8p-1   0/4     Init:CrashLoopBackOff   10 (2m42s ago)   29m   app=etcd,etcd=true,k8s-app=etcd,revision=9
~~~
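To see every control plane node that lacks etcd variables, not only the one whose pod is crashing, the node names above can be compared with the NODE_*_ETCD_NAME variable names defined in the etcd pod. The sketch below is a hypothetical helper, nodes_missing_etcd_env: it takes node names as arguments (also accepting the node/<name> form that `oc get nodes -o name` prints) and reads variable names one per line on stdin. It assumes the same hyphen-to-underscore naming seen in this report.

~~~sh
#!/bin/sh
# Hypothetical helper, sketch only.
# ASSUMPTION: the expected variable for node <name> is
# NODE_<name with '-' -> '_'>_ETCD_NAME, as inferred from this report.
# Prints each node (one per line) whose variable is absent from stdin.
nodes_missing_etcd_env() {
    names=$(cat)                      # variable names, one per line
    for node in "$@"; do
        node=${node#node/}            # strip the "node/" prefix if present
        var="NODE_$(printf '%s' "$node" | tr '-' '_')_ETCD_NAME"
        # -x: whole-line match, -F: fixed string, -q: no output
        if ! printf '%s\n' "$names" | grep -qxF "$var"; then
            printf '%s\n' "$node"
        fi
    done
}
~~~

Fed the two NODE_*_ETCD_NAME names from the description and the four node names above, it prints rugouvei-cluster1-bkw7w-master-gx9pd-0 and rugouvei-cluster1-bkw7w-master-kxp8p-1.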
Expected results:
The new control plane nodes roll out successfully.
- is duplicated by
-
OCPBUGS-23044 [4.13] CEO prevents member deletion during revision rollout
- Verified