OpenShift Bugs / OCPBUGS-31689

CAPI machines stuck in Provisioned and CSRs not approved after swap VPC to use custom DHCP option set


Details

    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • 4.16
    • None
    • Important
    • No
    • Proposed
    • False

    Description

      Description of problem:
      CAPI machines are stuck in Provisioned and their CSRs are not approved after the VPC is swapped to use a custom DHCP option set.

      Version-Release number of selected component (if applicable):
      Tested on 4.16.0-0.nightly-2024-04-02-182836 and 4.14.0-0.nightly-2024-04-02-185046; both show the same issue.

      How reproducible:
      Always

      Steps to Reproduce:
      1. Create a DHCP options set

      liuhuali@Lius-MacBook-Pro huali-test % aws ec2 create-dhcp-options --dhcp-configurations '[
        {"Key":"domain-name-servers","Values":["AmazonProvidedDNS"]},
        {"Key":"domain-name","Values":["examplehuali.com"]}
      ]'
      DHCPOPTIONS dopt-05771a83747e85ab1 301721915996
      DHCPCONFIGURATIONS domain-name
      VALUES examplehuali.com
      DHCPCONFIGURATIONS domain-name-servers
      VALUES AmazonProvidedDNS
      liuhuali@Lius-MacBook-Pro huali-test %

      2. Create a default OCP IPI cluster, allowing the installer to create its own VPC. We use the automated template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_16/ipi-on-aws/versioned-installer-ovn-ci

      3. Enable feature gate
      liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.16.0-0.nightly-2024-04-02-182836 True False 32m Cluster version is 4.16.0-0.nightly-2024-04-02-182836
      liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
      Now using project "openshift-machine-api" on server "https://api.huliu-aws43c.qe.devcluster.openshift.com:6443".
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME PHASE TYPE REGION ZONE AGE
      huliu-aws43c-54nxg-master-0 Running m6i.xlarge us-east-2 us-east-2a 53m
      huliu-aws43c-54nxg-master-1 Running m6i.xlarge us-east-2 us-east-2b 53m
      huliu-aws43c-54nxg-master-2 Running m6i.xlarge us-east-2 us-east-2c 53m
      huliu-aws43c-54nxg-worker-us-east-2a-h8fmm Running m6i.xlarge us-east-2 us-east-2a 50m
      huliu-aws43c-54nxg-worker-us-east-2b-mqt86 Running m6i.xlarge us-east-2 us-east-2b 50m
      huliu-aws43c-54nxg-worker-us-east-2c-zpw8b Running m6i.xlarge us-east-2 us-east-2c 50m
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME STATUS ROLES AGE VERSION
      ip-10-0-1-171.us-east-2.compute.internal Ready control-plane,master 53m v1.29.2+258f1d5
      ip-10-0-48-139.us-east-2.compute.internal Ready control-plane,master 53m v1.29.2+258f1d5
      ip-10-0-62-71.us-east-2.compute.internal Ready worker 44m v1.29.2+258f1d5
      ip-10-0-8-140.us-east-2.compute.internal Ready worker 46m v1.29.2+258f1d5
      ip-10-0-86-243.us-east-2.compute.internal Ready control-plane,master 53m v1.29.2+258f1d5
      ip-10-0-92-164.us-east-2.compute.internal Ready worker 44m v1.29.2+258f1d5
      liuhuali@Lius-MacBook-Pro huali-test %

      4. Create a CAPI machine. We use the automated code at https://github.com/openshift/openshift-tests-private/blob/master/test/extended/clusterinfrastructure/capi_machines.go#L298. The machine reaches Running.
      liuhuali@Lius-MacBook-Pro huali-test % oc get machines.cluster.x-k8s.io
      NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
      capi-machineset-51071-dk62q huliu-aws43c-54nxg ip-10-0-88-75.us-east-2.compute.internal aws:///us-east-2c/i-0fa63e97cab2340d3 Running 9m34s
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME STATUS ROLES AGE VERSION
      ip-10-0-1-171.us-east-2.compute.internal Ready control-plane,master 66m v1.29.2+258f1d5
      ip-10-0-48-139.us-east-2.compute.internal Ready control-plane,master 66m v1.29.2+258f1d5
      ip-10-0-62-71.us-east-2.compute.internal Ready worker 57m v1.29.2+258f1d5
      ip-10-0-8-140.us-east-2.compute.internal Ready worker 59m v1.29.2+258f1d5
      ip-10-0-86-243.us-east-2.compute.internal Ready control-plane,master 66m v1.29.2+258f1d5
      ip-10-0-88-75.us-east-2.compute.internal Ready worker 6m54s v1.29.2+258f1d5
      ip-10-0-92-164.us-east-2.compute.internal Ready worker 57m v1.29.2+258f1d5

      5. Swap the DHCP options set for the VPC with the one created in step 1, then scale the CAPI machineset. The new machine is stuck in Provisioned.
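The transcript does not show the swap itself. A minimal sketch of how it could be done with the AWS CLI follows, assuming `aws ec2 associate-dhcp-options` is used. The helper name `swap_dhcp_options`, the `RUN` switch, and the VPC ID `vpc-0123456789abcdef0` are hypothetical placeholders (the report does not give the VPC ID); the options set ID is the one returned in step 1.

```shell
# Hypothetical helper: associate a DHCP options set with a VPC.
# By default it only prints the command; set RUN=1 to actually call AWS.
swap_dhcp_options() {
  dopt_id="$1"   # DHCP options set ID, e.g. the one created in step 1
  vpc_id="$2"    # VPC created by the installer (placeholder below)
  if [ "${RUN:-0}" = "1" ]; then
    aws ec2 associate-dhcp-options --dhcp-options-id "$dopt_id" --vpc-id "$vpc_id"
  else
    echo "aws ec2 associate-dhcp-options --dhcp-options-id $dopt_id --vpc-id $vpc_id"
  fi
}

# Dry run with the options set from step 1 and a placeholder VPC ID.
swap_dhcp_options dopt-05771a83747e85ab1 vpc-0123456789abcdef0
```

Instances launched after the association should pick up the new domain name, which would explain why only the newly scaled machine is affected.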

      liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset.cluster.x-k8s.io capi-machineset-51071 --replicas=2
      machineset.cluster.x-k8s.io/capi-machineset-51071 scaled
      liuhuali@Lius-MacBook-Pro huali-test % oc get machines.cluster.x-k8s.io
      NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
      capi-machineset-51071-2gx76 huliu-aws43c-54nxg aws:///us-east-2c/i-0c60f8db945e5bae4 Provisioned 32m
      capi-machineset-51071-dk62q huliu-aws43c-54nxg ip-10-0-88-75.us-east-2.compute.internal aws:///us-east-2c/i-0fa63e97cab2340d3 Running 42m
      liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-approver-capi-8b567cdb4-9588r -n openshift-cluster-machine-approver -c machine-approver-controller
      ...
      I0403 10:48:42.834794 1 controller.go:120] Reconciling CSR: csr-rg597
      E0403 10:48:42.855209 1 csr_check.go:263] csr-rg597: failed to find machine for node ip-10-0-90-0.examplehuali.com, cannot approve
      I0403 10:48:42.855224 1 controller.go:232] csr-rg597: CSR not authorized
      E0403 10:48:42.855258 1 controller.go:329] "Reconciler error" err="could not reconcile CSR: failed to find machine for node ip-10-0-90-0.examplehuali.com" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest" CertificateSigningRequest="csr-rg597" namespace="" name="csr-rg597" reconcileID="d0917465-b9d6-454f-b5ed-afec6f6a220e"
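The log suggests a hostname mismatch: the CSR carries the node name `ip-10-0-90-0.examplehuali.com`, built from the custom domain, and no machine is found under that name. The sketch below only illustrates that kind of exact-match lookup. It is not the machine approver's code, and the DNS name `ip-10-0-90-0.us-east-2.compute.internal` is an assumption about what the CAPI machine would record for this instance, inferred from the default names in the earlier steps.

```shell
# Illustrative only: look for a node name among the DNS names recorded for machines.
# Usage: find_machine_for_node <node-name> <machine-dns-name>...
find_machine_for_node() {
  node="$1"
  shift
  for dns in "$@"; do
    if [ "$dns" = "$node" ]; then
      echo "match: $dns"
      return 0
    fi
  done
  echo "failed to find machine for node $node"
  return 1
}

# Node name taken from the CSR in the log above.
csr_node="ip-10-0-90-0.examplehuali.com"
# Assumed machine DNS name (default VPC domain); not taken from the report.
machine_dns="ip-10-0-90-0.us-east-2.compute.internal"

find_machine_for_node "$csr_node" "$machine_dns" || true
```

Under that assumption the lookup fails for the new node, while a node named with the default domain, such as `ip-10-0-88-75.us-east-2.compute.internal`, would still match.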

      Actual results:
      The CAPI machine is stuck in Provisioned after the VPC is swapped to use a custom DHCP option set.
      Expected results:
      The CAPI machine should reach Running after the VPC is swapped to use a custom DHCP option set.
      Additional info:
      We have equivalent test cases for MAPI (https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-30379 and https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-51013). I tested the same scenario for CAPI today and found it didn’t work.
      Must gather:
      https://drive.google.com/file/d/13vv2ccLWD66sY3C2V7HzgqxW_7NDA3_a/view?usp=sharing

          People

            rh-ee-tbarberb Theo Barber-Bany
            huliu@redhat.com Huali Liu
            Huali Liu Huali Liu