OpenShift Bugs / OCPBUGS-31689

CAPI machines stuck in Provisioned and CSRs not approved after switching the VPC to a custom DHCP options set


    • Quality / Stability / Reliability
    • Important
    • Rejected
    • CLOUD Sprint 253, CLOUD Sprint 254, CLOUD Sprint 255, CLOUD Sprint 256, CLOUD Sprint 257, CLOUD Sprint 258, CLOUD Sprint 259, CLOUD Sprint 260, CLOUD Sprint 261, CLOUD Sprint 263, CLOUD Sprint 264, CLOUD Sprint 262, CLOUD Sprint 265, CLOUD Sprint 266, CLOUD Sprint 267, CLOUD Sprint 268, CLOUD Sprint 269, CLOUD Sprint 270, CLOUD Sprint 271, CLOUD Sprint 272, CLOUD Sprint 273, CLOUD Sprint 274, CLOUD Sprint 275, CLOUD Sprint 276, CLOUD Sprint 277, CLOUD Sprint 278
    • 26

      Description of problem:
      CAPI machines get stuck in the Provisioned phase, and their CSRs are not approved, after the VPC is switched to a custom DHCP options set.

      Version-Release number of selected component (if applicable):
      Tested on 4.16.0-0.nightly-2024-04-02-182836 and 4.14.0-0.nightly-2024-04-02-185046; both show the same issue.

      How reproducible:
      Always

      Steps to Reproduce:
      1. Create a DHCP options set with a custom domain name:

      liuhuali@Lius-MacBook-Pro huali-test % aws ec2 create-dhcp-options --dhcp-configurations '[
      {"Key":"domain-name-servers","Values":["AmazonProvidedDNS"]},
      {"Key":"domain-name","Values":["examplehuali.com"]}]'
      
      DHCPOPTIONS dopt-05771a83747e85ab1 301721915996
      DHCPCONFIGURATIONS domain-name
      VALUES examplehuali.com
      DHCPCONFIGURATIONS domain-name-servers
      VALUES AmazonProvidedDNS
      liuhuali@Lius-MacBook-Pro huali-test %
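The step 1 command can be wrapped so that the new set's ID is captured in a variable and reused in step 5, instead of being copied from the table output. This is a minimal sketch and is not part of the original reproducer. The function name `create_dhcp_options` is hypothetical. In place of the JSON list used above, it uses the AWS CLI shorthand `Key=...,Values=...` syntax and a `--query` filter.

```shell
# Hypothetical helper (not from the report): create a DHCP options set that
# keeps AmazonProvidedDNS but sets a custom domain-name, and print only the ID.
create_dhcp_options() {
  local domain="$1"
  aws ec2 create-dhcp-options \
    --dhcp-configurations \
      "Key=domain-name-servers,Values=AmazonProvidedDNS" \
      "Key=domain-name,Values=${domain}" \
    --query 'DhcpOptions.DhcpOptionsId' \
    --output text
}
```

For example, `DOPT_ID="$(create_dhcp_options examplehuali.com)"` would store an ID of the form `dopt-05771a83747e85ab1`.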
        

      2. Create a default OCP IPI cluster, letting the installer create its own VPC. We use the automated template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_16/ipi-on-aws/versioned-installer-ovn-ci

      3. Enable the feature gate, then confirm that the cluster, machines, and nodes are healthy:

      liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.16.0-0.nightly-2024-04-02-182836   True        False         32m     Cluster version is 4.16.0-0.nightly-2024-04-02-182836
      liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
      Now using project "openshift-machine-api" on server "https://api.huliu-aws43c.qe.devcluster.openshift.com:6443".
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                         PHASE     TYPE         REGION      ZONE         AGE
      huliu-aws43c-54nxg-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   53m
      huliu-aws43c-54nxg-master-1                  Running   m6i.xlarge   us-east-2   us-east-2b   53m
      huliu-aws43c-54nxg-master-2                  Running   m6i.xlarge   us-east-2   us-east-2c   53m
      huliu-aws43c-54nxg-worker-us-east-2a-h8fmm   Running   m6i.xlarge   us-east-2   us-east-2a   50m
      huliu-aws43c-54nxg-worker-us-east-2b-mqt86   Running   m6i.xlarge   us-east-2   us-east-2b   50m
      huliu-aws43c-54nxg-worker-us-east-2c-zpw8b   Running   m6i.xlarge   us-east-2   us-east-2c   50m
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                                        STATUS   ROLES                  AGE   VERSION
      ip-10-0-1-171.us-east-2.compute.internal    Ready    control-plane,master   53m   v1.29.2+258f1d5
      ip-10-0-48-139.us-east-2.compute.internal   Ready    control-plane,master   53m   v1.29.2+258f1d5
      ip-10-0-62-71.us-east-2.compute.internal    Ready    worker                 44m   v1.29.2+258f1d5
      ip-10-0-8-140.us-east-2.compute.internal    Ready    worker                 46m   v1.29.2+258f1d5
      ip-10-0-86-243.us-east-2.compute.internal   Ready    control-plane,master   53m   v1.29.2+258f1d5
      ip-10-0-92-164.us-east-2.compute.internal   Ready    worker                 44m   v1.29.2+258f1d5
      liuhuali@Lius-MacBook-Pro huali-test % 
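The report does not say which feature gate was enabled in step 3. The helpers below assume it was the `TechPreviewNoUpgrade` feature set, because Cluster API support in 4.16 is Tech Preview. That is an assumption and is not stated in this report. The function names are hypothetical. Be aware that `TechPreviewNoUpgrade` cannot be reverted and blocks cluster upgrades.

```shell
# Hypothetical helpers, assuming step 3 means switching the cluster-wide
# FeatureGate resource to the TechPreviewNoUpgrade feature set.
enable_tech_preview() {
  oc patch featuregate cluster --type=merge \
    -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'
}

# Prints the feature set currently configured on the cluster.
current_feature_set() {
  oc get featuregate cluster -o jsonpath='{.spec.featureSet}'
}
```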

      4. Create a CAPI machine using our automated test code (https://github.com/openshift/openshift-tests-private/blob/master/test/extended/clusterinfrastructure/capi_machines.go#L298). The machine reaches Running and its node joins the cluster:

      liuhuali@Lius-MacBook-Pro huali-test % oc get machines.cluster.x-k8s.io
      NAME                          CLUSTER              NODENAME                                   PROVIDERID                              PHASE     AGE     VERSION
      capi-machineset-51071-dk62q   huliu-aws43c-54nxg   ip-10-0-88-75.us-east-2.compute.internal   aws:///us-east-2c/i-0fa63e97cab2340d3   Running   9m34s
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                                        STATUS   ROLES                  AGE     VERSION
      ip-10-0-1-171.us-east-2.compute.internal    Ready    control-plane,master   66m     v1.29.2+258f1d5
      ip-10-0-48-139.us-east-2.compute.internal   Ready    control-plane,master   66m     v1.29.2+258f1d5
      ip-10-0-62-71.us-east-2.compute.internal    Ready    worker                 57m     v1.29.2+258f1d5
      ip-10-0-8-140.us-east-2.compute.internal    Ready    worker                 59m     v1.29.2+258f1d5
      ip-10-0-86-243.us-east-2.compute.internal   Ready    control-plane,master   66m     v1.29.2+258f1d5
      ip-10-0-88-75.us-east-2.compute.internal    Ready    worker                 6m54s   v1.29.2+258f1d5
      ip-10-0-92-164.us-east-2.compute.internal   Ready    worker                 57m     v1.29.2+258f1d5
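Before moving on to step 5, you can confirm that every CAPI machine has reached Running by printing each machine's phase and filtering out the running ones. This is a minimal sketch. The function names are hypothetical, and it assumes the CAPI machines are in the current project, as they are in the transcript above.

```shell
# Hypothetical helper: print "<machine-name> <phase>" for each CAPI machine
# in the current project, one per line.
capi_machine_phases() {
  oc get machines.cluster.x-k8s.io \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}'
}

# Reads "<name> <phase>" lines on stdin and prints names whose phase is not Running.
not_running() {
  awk '$2 != "Running" { print $1 }'
}
```

At this point `capi_machine_phases | not_running` should print nothing. After step 5 it would be expected to print the stuck machine, `capi-machineset-51071-2gx76`.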

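      5. Swap the VPC's DHCP options set for the one created in step 1, then scale up the CAPI machineset. The new machine gets stuck in Provisioned, and the CAPI machine approver reports that it cannot find a machine for the node named in the CSR: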

      liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset.cluster.x-k8s.io capi-machineset-51071 --replicas=2
      machineset.cluster.x-k8s.io/capi-machineset-51071 scaled
      liuhuali@Lius-MacBook-Pro huali-test % oc get machines.cluster.x-k8s.io 
      NAME                          CLUSTER              NODENAME                                   PROVIDERID                              PHASE         AGE   VERSION
      capi-machineset-51071-2gx76   huliu-aws43c-54nxg                                              aws:///us-east-2c/i-0c60f8db945e5bae4   Provisioned   32m
      capi-machineset-51071-dk62q   huliu-aws43c-54nxg   ip-10-0-88-75.us-east-2.compute.internal   aws:///us-east-2c/i-0fa63e97cab2340d3   Running       42m
      liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-approver-capi-8b567cdb4-9588r -n openshift-cluster-machine-approver -c machine-approver-controller
      ...
      I0403 10:48:42.834794 1 controller.go:120] Reconciling CSR: csr-rg597
      E0403 10:48:42.855209 1 csr_check.go:263] csr-rg597: failed to find machine for node ip-10-0-90-0.examplehuali.com, cannot approve
      I0403 10:48:42.855224 1 controller.go:232] csr-rg597: CSR not authorized
      E0403 10:48:42.855258 1 controller.go:329] "Reconciler error" err="could not reconcile CSR: failed to find machine for node ip-10-0-90-0.examplehuali.com" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest" CertificateSigningRequest="csr-rg597" namespace="" name="csr-rg597" reconcileID="d0917465-b9d6-454f-b5ed-afec6f6a220e" 
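Step 5 says the VPC's DHCP options set was swapped, but the exact commands were not captured. Below is a minimal sketch of that swap, together with a check for CSRs that remain Pending. It rests on three assumptions: the installer-created VPC carries the `kubernetes.io/cluster/<infra-id>` tag key, the infra ID is the `infrastructureName` of the `infrastructure/cluster` resource, and CONDITION is the last column of `oc get csr` output. The function names are hypothetical.

```shell
# Hypothetical helpers for step 5 (not from the report).

# Cluster infra ID, e.g. huliu-aws43c-54nxg.
infra_id() {
  oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}'
}

# VPC ID of the installer-created VPC, found by the assumed cluster tag key.
cluster_vpc_id() {
  local id="$1"
  aws ec2 describe-vpcs \
    --filters "Name=tag-key,Values=kubernetes.io/cluster/${id}" \
    --query 'Vpcs[0].VpcId' --output text
}

# Point the VPC at a different DHCP options set.
swap_dhcp_options() {
  local dopt_id="$1" vpc_id="$2"
  aws ec2 associate-dhcp-options --dhcp-options-id "$dopt_id" --vpc-id "$vpc_id"
}

# Names of CSRs whose CONDITION (last column) is still Pending.
pending_csrs() {
  oc get csr --no-headers | awk '$NF == "Pending" { print $1 }'
}
```

For example, run `swap_dhcp_options dopt-05771a83747e85ab1 "$(cluster_vpc_id "$(infra_id)")"`, scale the machineset, and then run `pending_csrs`. In this scenario that would presumably list `csr-rg597`.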

      Actual results:

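      The new CAPI machine stays in Provisioned after the VPC is switched to the custom DHCP options set. Its CSR (csr-rg597) is not approved because the machine approver cannot find a machine for node ip-10-0-90-0.examplehuali.com.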
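In the approver log, the node is named `ip-10-0-90-0.examplehuali.com`, so its hostname now carries the custom `domain-name` from the DHCP options set. The sketch below illustrates why that could break approval. It makes two assumptions that this report does not confirm: the approver looks for a machine whose InternalDNS addresses include the node name, and the stuck machine lists only the default `ip-10-0-90-0.us-east-2.compute.internal` form. The function names and the address list are hypothetical.

```shell
# Hypothetical illustration, not the machine approver's code.

# ec2_private_hostname IP DOMAIN -> ip-a-b-c-d.DOMAIN
ec2_private_hostname() {
  local ip="$1" domain="$2"
  printf 'ip-%s.%s\n' "${ip//./-}" "$domain"
}

# machine_matches_node NODE ADDRESS...: succeeds if NODE equals any ADDRESS.
machine_matches_node() {
  local node="$1"; shift
  local addr
  for addr in "$@"; do
    [[ "$addr" == "$node" ]] && return 0
  done
  return 1
}

# Node name after the swap vs. an assumed default-domain machine address.
node="$(ec2_private_hostname 10.0.90.0 examplehuali.com)"
if ! machine_matches_node "$node" "ip-10-0-90-0.us-east-2.compute.internal"; then
  echo "no machine found for node $node"   # prints: no machine found for node ip-10-0-90-0.examplehuali.com
fi
```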

      Expected results:
      The CAPI machine should reach Running after the VPC is switched to the custom DHCP options set.

       

      Additional info:
      We have equivalent test cases for MAPI (https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-30379 and https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-51013). I ran the same scenario against CAPI today and found that it does not work.

      Must gather:
      https://drive.google.com/file/d/13vv2ccLWD66sY3C2V7HzgqxW_7NDA3_a/view?usp=sharing

              rh-ee-tbarberb Theo Barber-Bany
              huliu@redhat.com Huali Liu
              Huali Liu