OCPBUGS-50905

MAPI machine's node keeps the uninitialized taint and CAPI machine is stuck in Provisioned when using a custom DHCP options set on AWS


      Description of problem:

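          When the cluster VPC uses a custom DHCP options set on AWS, a MAPI machine reaches Running but its node keeps the node.cloudprovider.kubernetes.io/uninitialized taint; a CAPI machine gets stuck in Provisioned and its kubelet-serving CSRs stay Pending.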

      Version-Release number of selected component (if applicable):

          4.19.0-0.nightly-2025-02-14-215306

      How reproducible:

          Appears to happen every time for MAPI machines when scaling an existing machineset, and frequently for CAPI machines.

      Steps to Reproduce:

          1. Install a 4.19 AWS cluster. We used the automated template ipi-on-aws/versioned-installer-techpreview-ci.
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.19.0-0.nightly-2025-02-14-215306   True        False         100m    Cluster version is 4.19.0-0.nightly-2025-02-14-215306
      
          2. Create a custom DHCP options set, then switch the cluster VPC to use it in the AWS console.
      
          3. Scale a worker machineset. The new machine reaches Running, but its node still has the uninitialized taint.
      
      liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset huliu-aws217a-lts7q-worker-us-east-2a --replicas=2
      machineset.machine.openshift.io/huliu-aws217a-lts7q-worker-us-east-2a scaled
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine                          
      NAME                                          PHASE     TYPE         REGION      ZONE         AGE
      huliu-aws217a-lts7q-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   141m
      huliu-aws217a-lts7q-master-1                  Running   m6i.xlarge   us-east-2   us-east-2b   141m
      huliu-aws217a-lts7q-master-2                  Running   m6i.xlarge   us-east-2   us-east-2c   141m
      huliu-aws217a-lts7q-worker-us-east-2a-cm5c9   Running   m6i.xlarge   us-east-2   us-east-2a   137m
      huliu-aws217a-lts7q-worker-us-east-2a-wz82p   Running   m6i.xlarge   us-east-2   us-east-2a   16m
      huliu-aws217a-lts7q-worker-us-east-2b-w2gg5   Running   m6i.xlarge   us-east-2   us-east-2b   137m
      huliu-aws217a-lts7q-worker-us-east-2c-rfm65   Running   m6i.xlarge   us-east-2   us-east-2c   137m
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                                        STATUS   ROLES                  AGE    VERSION
      ip-10-0-16-147.example.com                  Ready    worker                 24m    v1.32.1
      ip-10-0-2-172.us-east-2.compute.internal    Ready    control-plane,master   151m   v1.32.1
      ip-10-0-25-84.us-east-2.compute.internal    Ready    worker                 141m   v1.32.1
      ip-10-0-35-16.us-east-2.compute.internal    Ready    control-plane,master   149m   v1.32.1
      ip-10-0-38-54.us-east-2.compute.internal    Ready    worker                 141m   v1.32.1
      ip-10-0-73-150.us-east-2.compute.internal   Ready    worker                 145m   v1.32.1
      ip-10-0-73-232.us-east-2.compute.internal   Ready    control-plane,master   151m   v1.32.1
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get node ip-10-0-16-147.example.com  -oyaml |grep -A5 taints
        taints:
        - effect: NoSchedule
          key: node.cloudprovider.kubernetes.io/uninitialized
          value: "true"
      status:
        addresses:
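
The per-node grep above can be extended to the whole cluster. The following is a minimal sketch, assuming a logged-in oc session; the function names list_node_taints and filter_uninitialized are hypothetical helpers, not part of oc. The taint key is the one shown in the output above.

```shell
#!/usr/bin/env bash
# Taint key taken from the `oc get node ... -oyaml` output in this report.
TAINT_KEY='node.cloudprovider.kubernetes.io/uninitialized'

# Emit one line per node: "<name><TAB><space-separated taint keys>".
# Requires a logged-in `oc` session (hypothetical helper name).
list_node_taints() {
  oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'
}

# Read "<name><TAB><taint keys>" lines on stdin and print the names of
# nodes whose taint list contains TAINT_KEY (hypothetical helper name).
filter_uninitialized() {
  awk -F '\t' -v key="$TAINT_KEY" '
    {
      n = split($2, taints, " ")
      for (i = 1; i <= n; i++) {
        if (taints[i] == key) { print $1; break }
      }
    }'
}
```

Running `list_node_taints | filter_uninitialized` should print only the nodes that still carry the taint, which makes it easier to see whether every node created after the DHCP switch is affected.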
           
      4. Create a CAPI machineset. The machine reaches Running and its node has no taints. (I had also seen a CAPI machine stuck in Provisioned at creation time in an earlier attempt.) When I scale the machineset to 2, the new CAPI machine is stuck in Provisioned and its kubelet-serving CSRs are Pending.
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine.c
      NAME            CLUSTER               NODENAME                     PROVIDERID                              PHASE         AGE   VERSION
      capi-ms-swmg9   huliu-aws217a-lts7q                                aws:///us-east-2b/i-0dc359eb8b745f27f   Provisioned   21m   
      capi-ms-v5vjl   huliu-aws217a-lts7q   ip-10-0-47-232.example.com   aws:///us-east-2b/i-0c7d74925b9cae64f   Running       31m   
      liuhuali@Lius-MacBook-Pro huali-test % oc get csr
      NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
      csr-4f6ls   17m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-45-27.example.com                                       24h                 Approved,Issued
      csr-5hrxs   65m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Approved,Issued
      csr-79pcj   64m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-16-147.example.com                                      24h                 Approved,Issued
      csr-7xgz8   27m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-47-232.example.com                                      24h                 Approved,Issued
      csr-9fqhh   64m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-16-147.example.com                                      24h                 Approved,Issued
      csr-b9lvw   28m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Approved,Issued
      csr-bh4bj   27m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-47-232.example.com                                      24h                 Approved,Issued
      csr-grl8d   18m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Approved,Issued
      csr-nfzws   64m     kubernetes.io/kubelet-serving                 system:node:ip-10-0-16-147.example.com                                      <none>              Approved,Issued
      csr-p6zw7   27m     kubernetes.io/kubelet-serving                 system:node:ip-10-0-47-232.example.com                                      <none>              Approved,Issued
      csr-rbn7k   17m     kubernetes.io/kube-apiserver-client           system:node:ip-10-0-45-27.example.com                                       24h                 Approved,Issued
      csr-rrwdx   2m51s   kubernetes.io/kubelet-serving                 system:node:ip-10-0-45-27.example.com                                       <none>              Pending
      csr-wxrpf   17m     kubernetes.io/kubelet-serving                 system:node:ip-10-0-45-27.example.com                                       <none>              Pending
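
To pick the stuck requests out of a long CSR list like the one above, the output can be filtered on the CONDITION column. This is a minimal sketch; pending_csrs is a hypothetical helper name, and it assumes the six-column layout shown above (NAME, AGE, SIGNERNAME, REQUESTOR, REQUESTEDDURATION, CONDITION) with the header row suppressed.

```shell
#!/usr/bin/env bash
# Read `oc get csr --no-headers` output on stdin and print
# "<csr name> <requestor> <signer>" for every CSR whose last column
# (CONDITION) is exactly "Pending" (hypothetical helper name).
pending_csrs() {
  awk '$NF == "Pending" { print $1, $4, $3 }'
}
```

Usage against a live cluster would be `oc get csr --no-headers | pending_csrs`. For the listing above it would report csr-rrwdx and csr-wxrpf, both kubelet-serving requests from system:node:ip-10-0-45-27.example.com.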
      
      5. Scale another existing worker machineset to 2. The new machine reaches Running and its node has the uninitialized taint. However, when I create a new worker machineset, its machine reaches Running and the node does not have the uninitialized taint.
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine -o wide                  
      NAME                                           PHASE     TYPE         REGION      ZONE         AGE     NODE                                        PROVIDERID                              STATE
      huliu-aws217a-lts7q-master-0                   Running   m6i.xlarge   us-east-2   us-east-2a   4h15m   ip-10-0-2-172.us-east-2.compute.internal    aws:///us-east-2a/i-029a670b32f1a0ab8   running
      huliu-aws217a-lts7q-master-1                   Running   m6i.xlarge   us-east-2   us-east-2b   4h15m   ip-10-0-35-16.us-east-2.compute.internal    aws:///us-east-2b/i-03b0d624a0d77296e   running
      huliu-aws217a-lts7q-master-2                   Running   m6i.xlarge   us-east-2   us-east-2c   4h15m   ip-10-0-73-232.us-east-2.compute.internal   aws:///us-east-2c/i-00815b8b0af77f7d2   running
      huliu-aws217a-lts7q-worker-us-east-2a-cm5c9    Running   m6i.xlarge   us-east-2   us-east-2a   4h11m   ip-10-0-25-84.us-east-2.compute.internal    aws:///us-east-2a/i-011a38e96cda97dc7   running
      huliu-aws217a-lts7q-worker-us-east-2a-wz82p    Running   m6i.xlarge   us-east-2   us-east-2a   130m    ip-10-0-16-147.example.com                  aws:///us-east-2a/i-06069479b9e0f14a1   running
      huliu-aws217a-lts7q-worker-us-east-2aa-w589d   Running   m6i.xlarge   us-east-2   us-east-2a   21m     ip-10-0-18-230.example.com                  aws:///us-east-2a/i-0842dcd42e4a170fa   running
      huliu-aws217a-lts7q-worker-us-east-2b-w2gg5    Running   m6i.xlarge   us-east-2   us-east-2b   4h11m   ip-10-0-38-54.us-east-2.compute.internal    aws:///us-east-2b/i-0ece9bc2cd89b2e6e   running
      huliu-aws217a-lts7q-worker-us-east-2c-nl2bt    Running   m6i.xlarge   us-east-2   us-east-2c   55m     ip-10-0-91-174.example.com                  aws:///us-east-2c/i-0ac4328744648f204   running
      huliu-aws217a-lts7q-worker-us-east-2c-rfm65    Running   m6i.xlarge   us-east-2   us-east-2c   4h11m   ip-10-0-73-150.us-east-2.compute.internal   aws:///us-east-2c/i-0179fef8bc05db99c   running
      liuhuali@Lius-MacBook-Pro huali-test % 
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get node ip-10-0-91-174.example.com  -oyaml |grep -A5 taints
        taints:
        - effect: NoSchedule
          key: node.cloudprovider.kubernetes.io/uninitialized
          value: "true"
      status:
        addresses:
      liuhuali@Lius-MacBook-Pro huali-test % oc get node ip-10-0-18-230.example.com  -oyaml |grep -A5 taints
      liuhuali@Lius-MacBook-Pro huali-test % 
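
In the listing above, the nodes created after the DHCP switch are the ones whose names end in example.com rather than compute.internal. A minimal sketch for listing those machines follows; machines_with_custom_domain is a hypothetical helper name, and the sketch assumes the NODE column is field 7 of `oc get machine -o wide`, as in the output above, and that the default suffix is .compute.internal.

```shell
#!/usr/bin/env bash
# Assumed default private DNS suffix; the pre-existing nodes in this
# report use <region>.compute.internal names.
DEFAULT_SUFFIX='.compute.internal'

# Read `oc get machine -o wide --no-headers` output on stdin and print
# "<machine> <node>" for machines whose node name lacks DEFAULT_SUFFIX.
# Assumes the NODE column is field 7 (hypothetical helper name).
machines_with_custom_domain() {
  awk -v suffix="$DEFAULT_SUFFIX" '
    NF >= 7 {
      node = $7
      tail = substr(node, length(node) - length(suffix) + 1)
      if (tail != suffix) print $1, node
    }'
}
```

Usage would be `oc get machine -n openshift-machine-api -o wide --no-headers | machines_with_custom_domain`. The resulting node names can then be checked one by one for the taint, as done above for ip-10-0-91-174.example.com and ip-10-0-18-230.example.com.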
      
      

      Actual results:

          The MAPI machine reaches Running but its node keeps the uninitialized taint; the CAPI machine is stuck in Provisioned and its kubelet-serving CSRs stay Pending.

      Expected results:

          Both MAPI and CAPI machines reach Running, and their nodes do not keep the uninitialized taint.

      Additional info:

          Discussion on slack: https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1739437423874929

              joelspeed Joel Speed
              huliu@redhat.com Huali Liu
              Huali Liu Huali Liu