Bug
Resolution: Done-Errata
Major
4.19
Quality / Stability / Reliability
False
None
Critical
Yes
None
Rejected
CLOUD Sprint 270, CLOUD Sprint 271
2
In Progress
Bug Fix
None
None
None
None
This is a clone of issue OCPBUGS-50905. The following is the description of the original issue:
Description of problem:
When a custom DHCP options set is used on AWS, MAPI machines reach Running but their nodes keep the uninitialized taint, and CAPI machines get stuck in Provisioned with their CSRs Pending.
Version-Release number of selected component (if applicable):
4.19.0-0.nightly-2025-02-14-215306
How reproducible:
Seemingly always for MAPI machines when scaling a MachineSet; high incidence for CAPI machines.
Steps to Reproduce:
1. Install a 4.19 AWS cluster (we used the automated template ipi-on-aws/versioned-installer-techpreview-ci).
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.19.0-0.nightly-2025-02-14-215306 True False 100m Cluster version is 4.19.0-0.nightly-2025-02-14-215306
2. Create a custom DHCP options set, then switch the VPC to use it in the AWS console.
3. Scale up a worker MachineSet. The new machine reaches Running, but its node has the uninitialized taint.
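Step 2 was done in the AWS console. For a scripted reproduction, a minimal sketch using the boto3 EC2 client is shown here. The domain name `example.com` is an assumption inferred from the node names in this report; `AmazonProvidedDNS` as the name server, the helper names, and the VPC ID and region parameters are placeholders, not the reporter's actual settings.

```python
"""Sketch: create a custom DHCP options set and associate it with a VPC.

Assumptions (not from the report): boto3 is installed and has credentials,
and the custom domain is "example.com" (inferred from the node names).
"""


def build_dhcp_configurations(domain_name: str) -> list:
    # DhcpConfigurations payload for EC2 CreateDhcpOptions.
    # Using AmazonProvidedDNS as the resolver is an assumption.
    return [
        {"Key": "domain-name", "Values": [domain_name]},
        {"Key": "domain-name-servers", "Values": ["AmazonProvidedDNS"]},
    ]


def apply_custom_dhcp(vpc_id: str, domain_name: str, region: str) -> str:
    # Imported lazily so build_dhcp_configurations works without boto3.
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.create_dhcp_options(
        DhcpConfigurations=build_dhcp_configurations(domain_name)
    )
    dhcp_id = resp["DhcpOptions"]["DhcpOptionsId"]
    # Swap the VPC over to the new DHCP options set (the console step above).
    ec2.associate_dhcp_options(DhcpOptionsId=dhcp_id, VpcId=vpc_id)
    return dhcp_id
```

Only nodes launched after the swap pick up the new domain, which fits the listing below where the newly scaled worker is named ip-10-0-16-147.example.com while older nodes keep the us-east-2.compute.internal suffix.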
liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset huliu-aws217a-lts7q-worker-us-east-2a --replicas=2
machineset.machine.openshift.io/huliu-aws217a-lts7q-worker-us-east-2a scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-aws217a-lts7q-master-0 Running m6i.xlarge us-east-2 us-east-2a 141m
huliu-aws217a-lts7q-master-1 Running m6i.xlarge us-east-2 us-east-2b 141m
huliu-aws217a-lts7q-master-2 Running m6i.xlarge us-east-2 us-east-2c 141m
huliu-aws217a-lts7q-worker-us-east-2a-cm5c9 Running m6i.xlarge us-east-2 us-east-2a 137m
huliu-aws217a-lts7q-worker-us-east-2a-wz82p Running m6i.xlarge us-east-2 us-east-2a 16m
huliu-aws217a-lts7q-worker-us-east-2b-w2gg5 Running m6i.xlarge us-east-2 us-east-2b 137m
huliu-aws217a-lts7q-worker-us-east-2c-rfm65 Running m6i.xlarge us-east-2 us-east-2c 137m
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME STATUS ROLES AGE VERSION
ip-10-0-16-147.example.com Ready worker 24m v1.32.1
ip-10-0-2-172.us-east-2.compute.internal Ready control-plane,master 151m v1.32.1
ip-10-0-25-84.us-east-2.compute.internal Ready worker 141m v1.32.1
ip-10-0-35-16.us-east-2.compute.internal Ready control-plane,master 149m v1.32.1
ip-10-0-38-54.us-east-2.compute.internal Ready worker 141m v1.32.1
ip-10-0-73-150.us-east-2.compute.internal Ready worker 145m v1.32.1
ip-10-0-73-232.us-east-2.compute.internal Ready control-plane,master 151m v1.32.1
liuhuali@Lius-MacBook-Pro huali-test % oc get node ip-10-0-16-147.example.com -oyaml |grep -A5 taints
  taints:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
status:
  addresses:
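To find affected nodes without grepping YAML one node at a time, a small filter over `oc get nodes -o json` output can list every node that still carries the `node.cloudprovider.kubernetes.io/uninitialized` taint. This is a hypothetical helper, not part of any OpenShift tooling; it only assumes the standard Node `spec.taints` field.

```python
import json

UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def uninitialized_nodes(node_list_json: str) -> list:
    """Return names of nodes that still carry the uninitialized taint.

    Expects the JSON printed by `oc get nodes -o json` (a v1 NodeList).
    """
    names = []
    for node in json.loads(node_list_json)["items"]:
        taints = (node.get("spec") or {}).get("taints") or []
        # The cloud provider removes this taint once it has initialized the node.
        if any(t.get("key") == UNINITIALIZED_TAINT for t in taints):
            names.append(node["metadata"]["name"])
    return names
```

Pass it the saved output of `oc get nodes -o json`; in the cluster above, ip-10-0-16-147.example.com would be among the results.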
4. Create a CAPI machine. The machine reaches Running and its node has no taints. (In an earlier attempt I also saw a newly created CAPI machine get stuck in Provisioned.) However, when I scale it to 2, the new CAPI machine gets stuck in Provisioned and its CSRs stay Pending.
liuhuali@Lius-MacBook-Pro huali-test % oc get machine.c
NAME            CLUSTER               NODENAME                     PROVIDERID                              PHASE         AGE   VERSION
capi-ms-swmg9   huliu-aws217a-lts7q                                aws:///us-east-2b/i-0dc359eb8b745f27f   Provisioned   21m
capi-ms-v5vjl   huliu-aws217a-lts7q   ip-10-0-47-232.example.com   aws:///us-east-2b/i-0c7d74925b9cae64f   Running       31m
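The stuck machine capi-ms-swmg9 has a provider ID (the EC2 instance exists) but an empty NODENAME. A sketch that picks out such machines from `oc get machines.cluster.x-k8s.io -A -o json` is below; it assumes the Cluster API Machine fields `spec.providerID`, `status.phase` and `status.nodeRef`, and the function name is hypothetical.

```python
import json


def stuck_capi_machines(machine_list_json: str) -> list:
    """Return (name, providerID) for CAPI machines stuck in Provisioned.

    A machine counts as stuck here when status.phase is "Provisioned" and
    status.nodeRef is unset, i.e. the instance exists but no Node was linked.
    """
    stuck = []
    for machine in json.loads(machine_list_json)["items"]:
        status = machine.get("status") or {}
        if status.get("phase") == "Provisioned" and not status.get("nodeRef"):
            provider_id = (machine.get("spec") or {}).get("providerID", "")
            stuck.append((machine["metadata"]["name"], provider_id))
    return stuck
```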
liuhuali@Lius-MacBook-Pro huali-test % oc get csr
NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-4f6ls 17m kubernetes.io/kube-apiserver-client system:node:ip-10-0-45-27.example.com 24h Approved,Issued
csr-5hrxs 65m kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper <none> Approved,Issued
csr-79pcj 64m kubernetes.io/kube-apiserver-client system:node:ip-10-0-16-147.example.com 24h Approved,Issued
csr-7xgz8 27m kubernetes.io/kube-apiserver-client system:node:ip-10-0-47-232.example.com 24h Approved,Issued
csr-9fqhh 64m kubernetes.io/kube-apiserver-client system:node:ip-10-0-16-147.example.com 24h Approved,Issued
csr-b9lvw 28m kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper <none> Approved,Issued
csr-bh4bj 27m kubernetes.io/kube-apiserver-client system:node:ip-10-0-47-232.example.com 24h Approved,Issued
csr-grl8d 18m kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper <none> Approved,Issued
csr-nfzws 64m kubernetes.io/kubelet-serving system:node:ip-10-0-16-147.example.com <none> Approved,Issued
csr-p6zw7 27m kubernetes.io/kubelet-serving system:node:ip-10-0-47-232.example.com <none> Approved,Issued
csr-rbn7k 17m kubernetes.io/kube-apiserver-client system:node:ip-10-0-45-27.example.com 24h Approved,Issued
csr-rrwdx 2m51s kubernetes.io/kubelet-serving system:node:ip-10-0-45-27.example.com <none> Pending
csr-wxrpf 17m kubernetes.io/kubelet-serving system:node:ip-10-0-45-27.example.com <none> Pending
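Both Pending CSRs above are `kubernetes.io/kubelet-serving` requests from the same node, ip-10-0-45-27.example.com. A sketch for listing undecided CSRs from `oc get csr -o json` follows; it assumes the standard CertificateSigningRequest fields `spec.signerName`, `spec.username` and `status.conditions`, and the helper name is hypothetical.

```python
import json


def pending_csrs(csr_list_json: str) -> list:
    """Return (name, signerName, username) for CSRs with no decision yet.

    A CSR is treated as Pending when status.conditions contains neither an
    Approved nor a Denied condition.
    """
    pending = []
    for csr in json.loads(csr_list_json)["items"]:
        conditions = (csr.get("status") or {}).get("conditions") or []
        types = {cond.get("type") for cond in conditions}
        if not types & {"Approved", "Denied"}:
            spec = csr.get("spec") or {}
            pending.append(
                (csr["metadata"]["name"], spec.get("signerName"), spec.get("username"))
            )
    return pending
```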
5. Scale another worker MachineSet to 2 replicas: the new machine reaches Running and its node has the uninitialized taint. However, when I create a new worker MachineSet, its machine reaches Running and the node does not have the uninitialized taint.
liuhuali@Lius-MacBook-Pro huali-test % oc get machine -o wide
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
huliu-aws217a-lts7q-master-0 Running m6i.xlarge us-east-2 us-east-2a 4h15m ip-10-0-2-172.us-east-2.compute.internal aws:///us-east-2a/i-029a670b32f1a0ab8 running
huliu-aws217a-lts7q-master-1 Running m6i.xlarge us-east-2 us-east-2b 4h15m ip-10-0-35-16.us-east-2.compute.internal aws:///us-east-2b/i-03b0d624a0d77296e running
huliu-aws217a-lts7q-master-2 Running m6i.xlarge us-east-2 us-east-2c 4h15m ip-10-0-73-232.us-east-2.compute.internal aws:///us-east-2c/i-00815b8b0af77f7d2 running
huliu-aws217a-lts7q-worker-us-east-2a-cm5c9 Running m6i.xlarge us-east-2 us-east-2a 4h11m ip-10-0-25-84.us-east-2.compute.internal aws:///us-east-2a/i-011a38e96cda97dc7 running
huliu-aws217a-lts7q-worker-us-east-2a-wz82p Running m6i.xlarge us-east-2 us-east-2a 130m ip-10-0-16-147.example.com aws:///us-east-2a/i-06069479b9e0f14a1 running
huliu-aws217a-lts7q-worker-us-east-2aa-w589d Running m6i.xlarge us-east-2 us-east-2a 21m ip-10-0-18-230.example.com aws:///us-east-2a/i-0842dcd42e4a170fa running
huliu-aws217a-lts7q-worker-us-east-2b-w2gg5 Running m6i.xlarge us-east-2 us-east-2b 4h11m ip-10-0-38-54.us-east-2.compute.internal aws:///us-east-2b/i-0ece9bc2cd89b2e6e running
huliu-aws217a-lts7q-worker-us-east-2c-nl2bt Running m6i.xlarge us-east-2 us-east-2c 55m ip-10-0-91-174.example.com aws:///us-east-2c/i-0ac4328744648f204 running
huliu-aws217a-lts7q-worker-us-east-2c-rfm65 Running m6i.xlarge us-east-2 us-east-2c 4h11m ip-10-0-73-150.us-east-2.compute.internal aws:///us-east-2c/i-0179fef8bc05db99c running
liuhuali@Lius-MacBook-Pro huali-test %
liuhuali@Lius-MacBook-Pro huali-test % oc get node ip-10-0-91-174.example.com -oyaml |grep -A5 taints
  taints:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
status:
  addresses:
liuhuali@Lius-MacBook-Pro huali-test % oc get node ip-10-0-18-230.example.com -oyaml |grep -A5 taints
liuhuali@Lius-MacBook-Pro huali-test %
Actual results:
MAPI machines reach Running but their nodes keep the uninitialized taint; CAPI machines get stuck in Provisioned and their CSRs stay Pending.
Expected results:
Machines reach Running and their nodes do not have the uninitialized taint.
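The expected result can be checked mechanically by joining MAPI machines to their nodes: any machine in phase Running whose node still has the uninitialized taint violates it. This sketch assumes the Machine API fields `status.phase` and `status.nodeRef.name` and the input commands named in the docstring; the function is hypothetical, not an existing OpenShift check.

```python
import json

UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def running_machines_with_uninitialized_nodes(machines_json: str, nodes_json: str) -> list:
    """Return (machine, node) pairs that violate the expected result.

    machines_json: `oc get machines.machine.openshift.io -n openshift-machine-api -o json`
    nodes_json:    `oc get nodes -o json`
    """
    tainted = {
        node["metadata"]["name"]
        for node in json.loads(nodes_json)["items"]
        if any(t.get("key") == UNINITIALIZED_TAINT
               for t in (node.get("spec") or {}).get("taints") or [])
    }
    violations = []
    for machine in json.loads(machines_json)["items"]:
        status = machine.get("status") or {}
        node_name = (status.get("nodeRef") or {}).get("name")
        if status.get("phase") == "Running" and node_name in tainted:
            violations.append((machine["metadata"]["name"], node_name))
    return violations
```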
Additional info:
Discussion on slack: https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1739437423874929
Issue links:
- clones: OCPBUGS-50905 MAPI machine has uninitialized taints when using custom dhcp on AWS (and CAPI machine stuck in Provisioned) (Closed)
- is blocked by: OCPBUGS-50905 MAPI machine has uninitialized taints when using custom dhcp on AWS (and CAPI machine stuck in Provisioned) (Closed)
- links to: RHBA-2025:7863 OpenShift Container Platform 4.18.14 bug fix update