Details
Bug
Resolution: Unresolved
Undefined
None
4.16
None
Important
No
Proposed
False
Description
Description of problem:
CAPI machines get stuck in the Provisioned phase and their CSRs are not approved after the VPC is switched to a custom DHCP option set.
Version-Release number of selected component (if applicable):
Tested on 4.16.0-0.nightly-2024-04-02-182836 and 4.14.0-0.nightly-2024-04-02-185046; both show the same issue.
How reproducible:
Always
Steps to Reproduce:
1. Create a DHCP option set:
liuhuali@Lius-MacBook-Pro huali-test % aws ec2 create-dhcp-options --dhcp-configurations '[
{"Key":"domain-name-servers","Values":["AmazonProvidedDNS"]},
{"Key":"domain-name","Values":["examplehuali.com"]}]'
DHCPOPTIONS dopt-05771a83747e85ab1 301721915996
DHCPCONFIGURATIONS domain-name
VALUES examplehuali.com
DHCPCONFIGURATIONS domain-name-servers
VALUES AmazonProvidedDNS
liuhuali@Lius-MacBook-Pro huali-test %
2. Create a default OCP IPI cluster, allowing the installer to create its own VPC. We use the automated template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_16/ipi-on-aws/versioned-installer-ovn-ci
3. Enable the feature gate:
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-04-02-182836   True        False         32m     Cluster version is 4.16.0-0.nightly-2024-04-02-182836
liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.huliu-aws43c.qe.devcluster.openshift.com:6443".
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                         PHASE     TYPE         REGION      ZONE         AGE
huliu-aws43c-54nxg-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   53m
huliu-aws43c-54nxg-master-1                  Running   m6i.xlarge   us-east-2   us-east-2b   53m
huliu-aws43c-54nxg-master-2                  Running   m6i.xlarge   us-east-2   us-east-2c   53m
huliu-aws43c-54nxg-worker-us-east-2a-h8fmm   Running   m6i.xlarge   us-east-2   us-east-2a   50m
huliu-aws43c-54nxg-worker-us-east-2b-mqt86   Running   m6i.xlarge   us-east-2   us-east-2b   50m
huliu-aws43c-54nxg-worker-us-east-2c-zpw8b   Running   m6i.xlarge   us-east-2   us-east-2c   50m
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                                        STATUS   ROLES                  AGE   VERSION
ip-10-0-1-171.us-east-2.compute.internal    Ready    control-plane,master   53m   v1.29.2+258f1d5
ip-10-0-48-139.us-east-2.compute.internal   Ready    control-plane,master   53m   v1.29.2+258f1d5
ip-10-0-62-71.us-east-2.compute.internal    Ready    worker                 44m   v1.29.2+258f1d5
ip-10-0-8-140.us-east-2.compute.internal    Ready    worker                 46m   v1.29.2+258f1d5
ip-10-0-86-243.us-east-2.compute.internal   Ready    control-plane,master   53m   v1.29.2+258f1d5
ip-10-0-92-164.us-east-2.compute.internal   Ready    worker                 44m   v1.29.2+258f1d5
liuhuali@Lius-MacBook-Pro huali-test %
4. Create a CAPI machine using the automated test code at https://github.com/openshift/openshift-tests-private/blob/master/test/extended/clusterinfrastructure/capi_machines.go#L298. The machine reaches Running:
liuhuali@Lius-MacBook-Pro huali-test % oc get machines.cluster.x-k8s.io
NAME                          CLUSTER              NODENAME                                   PROVIDERID                              PHASE     AGE     VERSION
capi-machineset-51071-dk62q   huliu-aws43c-54nxg   ip-10-0-88-75.us-east-2.compute.internal   aws:///us-east-2c/i-0fa63e97cab2340d3   Running   9m34s
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                                        STATUS   ROLES                  AGE     VERSION
ip-10-0-1-171.us-east-2.compute.internal    Ready    control-plane,master   66m     v1.29.2+258f1d5
ip-10-0-48-139.us-east-2.compute.internal   Ready    control-plane,master   66m     v1.29.2+258f1d5
ip-10-0-62-71.us-east-2.compute.internal    Ready    worker                 57m     v1.29.2+258f1d5
ip-10-0-8-140.us-east-2.compute.internal    Ready    worker                 59m     v1.29.2+258f1d5
ip-10-0-86-243.us-east-2.compute.internal   Ready    control-plane,master   66m     v1.29.2+258f1d5
ip-10-0-88-75.us-east-2.compute.internal    Ready    worker                 6m54s   v1.29.2+258f1d5
ip-10-0-92-164.us-east-2.compute.internal   Ready    worker                 57m     v1.29.2+258f1d5
5. Replace the VPC's DHCP option set with the one created in step 1, then scale up the CAPI machineset. The new machine gets stuck in Provisioned:
liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset.cluster.x-k8s.io capi-machineset-51071 --replicas=2
machineset.cluster.x-k8s.io/capi-machineset-51071 scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get machines.cluster.x-k8s.io
NAME                          CLUSTER              NODENAME                                   PROVIDERID                              PHASE         AGE   VERSION
capi-machineset-51071-2gx76   huliu-aws43c-54nxg                                              aws:///us-east-2c/i-0c60f8db945e5bae4   Provisioned   32m
capi-machineset-51071-dk62q   huliu-aws43c-54nxg   ip-10-0-88-75.us-east-2.compute.internal   aws:///us-east-2c/i-0fa63e97cab2340d3   Running       42m
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-approver-capi-8b567cdb4-9588r -n openshift-cluster-machine-approver -c machine-approver-controller
...
I0403 10:48:42.834794 1 controller.go:120] Reconciling CSR: csr-rg597
E0403 10:48:42.855209 1 csr_check.go:263] csr-rg597: failed to find machine for node ip-10-0-90-0.examplehuali.com, cannot approve
I0403 10:48:42.855224 1 controller.go:232] csr-rg597: CSR not authorized
E0403 10:48:42.855258 1 controller.go:329] "Reconciler error" err="could not reconcile CSR: failed to find machine for node ip-10-0-90-0.examplehuali.com" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest" CertificateSigningRequest="csr-rg597" namespace="" name="csr-rg597" reconcileID="d0917465-b9d6-454f-b5ed-afec6f6a220e"
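A possible explanation, not confirmed from the must-gather: the CSR names the node ip-10-0-90-0.examplehuali.com, built from the custom DHCP domain-name, while the CAPI machine's status addresses may only carry the default EC2 name (ip-10-0-90-0.us-east-2.compute.internal), so the approver cannot match the two. The MAPI cases presumably pass because the MAPI AWS provider also records the DHCP-domain name. The shell sketch below scripts the step 5 swap with `aws ec2 associate-dhcp-options` and models the lookup as an exact string match. The function names (swap_vpc_dhcp_options, ip_hostname, node_matches_machine) and the VPC ID argument are hypothetical, and node_matches_machine is a simplification, not the machine-approver's actual check.

```shell
#!/bin/sh
# Hypothetical helpers for reproducing and reasoning about step 5.
# None of these names come from the original report.

# Step 5: associate a DHCP option set with a VPC.
# Requires the AWS CLI and credentials; the VPC ID is a placeholder
# that must be looked up for the cluster (not shown in the report).
swap_vpc_dhcp_options() {
  vpc_id="$1"
  dopt_id="$2"
  aws ec2 associate-dhcp-options --vpc-id "$vpc_id" --dhcp-options-id "$dopt_id"
}

# Build an "ip-a-b-c-d.<domain>" hostname from a private IPv4 address and a
# domain, matching the node name pattern seen in the approver log.
ip_hostname() {
  printf 'ip-%s.%s\n' "$(printf '%s' "$1" | tr '.' '-')" "$2"
}

# Return 0 if the node name exactly equals one of the given addresses,
# 1 otherwise. A simplified model of "find machine for node": assumption,
# not the real approver logic.
node_matches_machine() {
  node_name="$1"
  shift
  for addr in "$@"; do
    [ "$addr" = "$node_name" ] && return 0
  done
  return 1
}
```

For example, `node_matches_machine "$(ip_hostname 10.0.90.0 examplehuali.com)" 10.0.90.0 "$(ip_hostname 10.0.90.0 us-east-2.compute.internal)"` returns 1 (no match), which mirrors the "failed to find machine for node" error; adding the examplehuali.com name to the address list makes it return 0.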
Actual results:
The CAPI machine is stuck in Provisioned after the VPC is switched to the custom DHCP option set.
Expected results:
The CAPI machine should reach Running after the VPC is switched to the custom DHCP option set.
Additional info:
We have equivalent test cases for MAPI (OCP-30379: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-30379, OCP-51013: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-51013). I tested the same scenario for CAPI today and found that it does not work.
Must gather:
https://drive.google.com/file/d/13vv2ccLWD66sY3C2V7HzgqxW_7NDA3_a/view?usp=sharing