-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.19
-
None
-
Important
-
Yes
-
Rejected
-
False
-
Description of problem:
With a custom DHCP options set on AWS, a new MAPI machine reaches Running but its node keeps the node.cloudprovider.kubernetes.io/uninitialized taint; a new CAPI machine gets stuck in Provisioned with its CSRs pending.
Version-Release number of selected component (if applicable):
4.19.0-0.nightly-2025-02-14-215306
How reproducible:
Apparently always for MAPI machines when scaling an existing machineset; frequently for CAPI machines.
Steps to Reproduce:
1. Install a 4.19 AWS cluster; we use the automated template ipi-on-aws/versioned-installer-techpreview-ci.

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.19.0-0.nightly-2025-02-14-215306   True        False         100m    Cluster version is 4.19.0-0.nightly-2025-02-14-215306

2. Create a custom DHCP options set, then switch the VPC to use it in the AWS console.

3. Scale a worker machineset. The machine reaches Running, but the node has the uninitialized taint.

liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset huliu-aws217a-lts7q-worker-us-east-2a --replicas=2
machineset.machine.openshift.io/huliu-aws217a-lts7q-worker-us-east-2a scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE     TYPE         REGION      ZONE         AGE
huliu-aws217a-lts7q-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   141m
huliu-aws217a-lts7q-master-1                  Running   m6i.xlarge   us-east-2   us-east-2b   141m
huliu-aws217a-lts7q-master-2                  Running   m6i.xlarge   us-east-2   us-east-2c   141m
huliu-aws217a-lts7q-worker-us-east-2a-cm5c9   Running   m6i.xlarge   us-east-2   us-east-2a   137m
huliu-aws217a-lts7q-worker-us-east-2a-wz82p   Running   m6i.xlarge   us-east-2   us-east-2a   16m
huliu-aws217a-lts7q-worker-us-east-2b-w2gg5   Running   m6i.xlarge   us-east-2   us-east-2b   137m
huliu-aws217a-lts7q-worker-us-east-2c-rfm65   Running   m6i.xlarge   us-east-2   us-east-2c   137m
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                                        STATUS   ROLES                  AGE    VERSION
ip-10-0-16-147.example.com                  Ready    worker                 24m    v1.32.1
ip-10-0-2-172.us-east-2.compute.internal    Ready    control-plane,master   151m   v1.32.1
ip-10-0-25-84.us-east-2.compute.internal    Ready    worker                 141m   v1.32.1
ip-10-0-35-16.us-east-2.compute.internal    Ready    control-plane,master   149m   v1.32.1
ip-10-0-38-54.us-east-2.compute.internal    Ready    worker                 141m   v1.32.1
ip-10-0-73-150.us-east-2.compute.internal   Ready    worker                 145m   v1.32.1
ip-10-0-73-232.us-east-2.compute.internal   Ready    control-plane,master   151m   v1.32.1
liuhuali@Lius-MacBook-Pro huali-test % oc get node ip-10-0-16-147.example.com -oyaml |grep -A5 taints
  taints:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
status:
  addresses:

4. Create a CAPI machine. The machine reaches Running and the node has no taints. (Earlier I had also seen a CAPI machine get stuck in Provisioned right after creation.) After scaling to 2, however, the new CAPI machine is stuck in Provisioned and its CSRs are Pending.

liuhuali@Lius-MacBook-Pro huali-test % oc get machine.c
NAME            CLUSTER               NODENAME                     PROVIDERID                              PHASE         AGE   VERSION
capi-ms-swmg9   huliu-aws217a-lts7q                                aws:///us-east-2b/i-0dc359eb8b745f27f   Provisioned   21m
capi-ms-v5vjl   huliu-aws217a-lts7q   ip-10-0-47-232.example.com   aws:///us-east-2b/i-0c7d74925b9cae64f   Running       31m
liuhuali@Lius-MacBook-Pro huali-test % oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
csr-4f6ls   17m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-45-27.example.com                                       24h                 Approved,Issued
csr-5hrxs   65m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Approved,Issued
csr-79pcj   64m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-16-147.example.com                                      24h                 Approved,Issued
csr-7xgz8   27m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-47-232.example.com                                      24h                 Approved,Issued
csr-9fqhh   64m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-16-147.example.com                                      24h                 Approved,Issued
csr-b9lvw   28m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Approved,Issued
csr-bh4bj   27m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-47-232.example.com                                      24h                 Approved,Issued
csr-grl8d   18m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Approved,Issued
csr-nfzws   64m     kubernetes.io/kubelet-serving                 system:node:ip-10-0-16-147.example.com                                      <none>              Approved,Issued
csr-p6zw7   27m     kubernetes.io/kubelet-serving                 system:node:ip-10-0-47-232.example.com                                      <none>              Approved,Issued
csr-rbn7k   17m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-45-27.example.com                                       24h                 Approved,Issued
csr-rrwdx   2m51s   kubernetes.io/kubelet-serving                 system:node:ip-10-0-45-27.example.com                                       <none>              Pending
csr-wxrpf   17m     kubernetes.io/kubelet-serving                 system:node:ip-10-0-45-27.example.com                                       <none>              Pending

5. Scale another worker machineset to 2: the new machine reaches Running and its node has the uninitialized taint. Create a new worker machineset instead: its machine reaches Running and the node does not have the uninitialized taint.

liuhuali@Lius-MacBook-Pro huali-test % oc get machine -o wide
NAME                                           PHASE     TYPE         REGION      ZONE         AGE     NODE                                        PROVIDERID                              STATE
huliu-aws217a-lts7q-master-0                   Running   m6i.xlarge   us-east-2   us-east-2a   4h15m   ip-10-0-2-172.us-east-2.compute.internal    aws:///us-east-2a/i-029a670b32f1a0ab8   running
huliu-aws217a-lts7q-master-1                   Running   m6i.xlarge   us-east-2   us-east-2b   4h15m   ip-10-0-35-16.us-east-2.compute.internal    aws:///us-east-2b/i-03b0d624a0d77296e   running
huliu-aws217a-lts7q-master-2                   Running   m6i.xlarge   us-east-2   us-east-2c   4h15m   ip-10-0-73-232.us-east-2.compute.internal   aws:///us-east-2c/i-00815b8b0af77f7d2   running
huliu-aws217a-lts7q-worker-us-east-2a-cm5c9    Running   m6i.xlarge   us-east-2   us-east-2a   4h11m   ip-10-0-25-84.us-east-2.compute.internal    aws:///us-east-2a/i-011a38e96cda97dc7   running
huliu-aws217a-lts7q-worker-us-east-2a-wz82p    Running   m6i.xlarge   us-east-2   us-east-2a   130m    ip-10-0-16-147.example.com                  aws:///us-east-2a/i-06069479b9e0f14a1   running
huliu-aws217a-lts7q-worker-us-east-2aa-w589d   Running   m6i.xlarge   us-east-2   us-east-2a   21m     ip-10-0-18-230.example.com                  aws:///us-east-2a/i-0842dcd42e4a170fa   running
huliu-aws217a-lts7q-worker-us-east-2b-w2gg5    Running   m6i.xlarge   us-east-2   us-east-2b   4h11m   ip-10-0-38-54.us-east-2.compute.internal    aws:///us-east-2b/i-0ece9bc2cd89b2e6e   running
huliu-aws217a-lts7q-worker-us-east-2c-nl2bt    Running   m6i.xlarge   us-east-2   us-east-2c   55m     ip-10-0-91-174.example.com                  aws:///us-east-2c/i-0ac4328744648f204   running
huliu-aws217a-lts7q-worker-us-east-2c-rfm65    Running   m6i.xlarge   us-east-2   us-east-2c   4h11m   ip-10-0-73-150.us-east-2.compute.internal   aws:///us-east-2c/i-0179fef8bc05db99c   running
liuhuali@Lius-MacBook-Pro huali-test % oc get node ip-10-0-91-174.example.com -oyaml |grep -A5 taints
  taints:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
status:
  addresses:
liuhuali@Lius-MacBook-Pro huali-test % oc get node ip-10-0-18-230.example.com -oyaml |grep -A5 taints
liuhuali@Lius-MacBook-Pro huali-test %
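Step 2 was performed in the AWS console. For repeatable reproduction it could probably be scripted with the AWS CLI instead. The following is a minimal sketch, not the procedure actually used: the function name `swap_vpc_dhcp` is hypothetical, the `example.com` domain is inferred from the new node names rather than stated in the report, and `AmazonProvidedDNS` as the resolver is an assumption.

```shell
# Sketch only: create a DHCP options set with a custom domain name and
# associate it with the cluster VPC. The function name, the example.com
# default and the AmazonProvidedDNS resolver are assumptions, not taken
# from the report.
swap_vpc_dhcp() {
  vpc_id=$1
  domain=${2:-example.com}

  # create-dhcp-options describes the new set; keep only its ID.
  dopt_id=$(aws ec2 create-dhcp-options \
    --dhcp-configurations \
      "Key=domain-name,Values=${domain}" \
      "Key=domain-name-servers,Values=AmazonProvidedDNS" \
    --query 'DhcpOptions.DhcpOptionsId' --output text) || return 1

  # Point the VPC at the new options set; instances launched afterwards
  # receive these options at boot.
  aws ec2 associate-dhcp-options --dhcp-options-id "$dopt_id" --vpc-id "$vpc_id"
}

# Usage (the VPC ID is a placeholder):
#   swap_vpc_dhcp vpc-0123456789abcdef0 example.com
```

Nothing runs until the function is called, so the block can be sourced safely before the VPC ID is known.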
Actual results:
The MAPI machine reaches Running but its node has the uninitialized taint; the CAPI machine is stuck in Provisioned and its CSRs stay Pending.
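To spot the stuck CSRs quickly, the table printed by `oc get csr` can be filtered on its last column. This is a small sketch; the helper name `pending_csrs` is hypothetical, and it assumes the default table output with CONDITION as the final column.

```shell
# Print the names of CSRs whose CONDITION (last column) is exactly "Pending".
# Reads the default `oc get csr` table on stdin and skips the header row.
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Usage:
#   oc get csr | pending_csrs
```

Against the output captured in step 4 this would list csr-rrwdx and csr-wxrpf, both kubelet-serving requests from ip-10-0-45-27.example.com.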
Expected results:
Machines reach Running and their nodes do not have the uninitialized taint.
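One way to check the expected state across the whole cluster, rather than node by node with `grep`, is to list every node with its taint keys and filter for the uninitialized taint. This is a sketch under assumptions: the helper name `uninitialized_nodes` is hypothetical, and its input is expected as one line per node, the node name followed by its taint keys.

```shell
# Print the names of nodes that still carry the cloud-provider
# "uninitialized" taint. Expects one line per node on stdin:
# node name first, then zero or more taint keys (whitespace separated).
uninitialized_nodes() {
  awk -v key="node.cloudprovider.kubernetes.io/uninitialized" '
    { for (i = 2; i <= NF; i++) if ($i == key) { print $1; break } }'
}

# Usage:
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}' \
#     | uninitialized_nodes
```

An empty result corresponds to the expected behaviour; any node name printed is one where the taint was never removed.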
Additional info:
Discussion on slack: https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1739437423874929