-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
4.16
-
Quality / Stability / Reliability
-
False
-
-
None
-
Important
-
No
-
None
-
None
-
Rejected
-
CLOUD Sprint 253, CLOUD Sprint 254, CLOUD Sprint 255, CLOUD Sprint 256, CLOUD Sprint 257, CLOUD Sprint 258, CLOUD Sprint 259, CLOUD Sprint 260, CLOUD Sprint 261, CLOUD Sprint 263, CLOUD Sprint 264, CLOUD Sprint 262, CLOUD Sprint 265, CLOUD Sprint 266, CLOUD Sprint 267, CLOUD Sprint 268, CLOUD Sprint 269, CLOUD Sprint 270, CLOUD Sprint 271, CLOUD Sprint 272, CLOUD Sprint 273, CLOUD Sprint 274, CLOUD Sprint 275, CLOUD Sprint 276, CLOUD Sprint 277, CLOUD Sprint 278
-
26
Description of problem:
CAPI machines are stuck in Provisioned and their CSRs are not approved after the VPC is swapped to use a custom DHCP options set.
Version-Release number of selected component (if applicable):
Tested on 4.16.0-0.nightly-2024-04-02-182836 and 4.14.0-0.nightly-2024-04-02-185046; both show the same issue.
How reproducible:
Always
Steps to Reproduce:
1. Create dhcp-options-set
liuhuali@Lius-MacBook-Pro huali-test % aws ec2 create-dhcp-options --dhcp-configurations '[ {"Key":"domain-name-servers","Values":["AmazonProvidedDNS"]}, {"Key":"domain-name","Values":["examplehuali.com"]}]'
DHCPOPTIONS dopt-05771a83747e85ab1 301721915996
DHCPCONFIGURATIONS domain-name
VALUES examplehuali.com
DHCPCONFIGURATIONS domain-name-servers
VALUES AmazonProvidedDNS
2. Create a default OCP IPI cluster, allowing the installer to create its own VPC. We use the automated template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_16/ipi-on-aws/versioned-installer-ovn-ci
3. Enable feature gate
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.16.0-0.nightly-2024-04-02-182836 True False 32m Cluster version is 4.16.0-0.nightly-2024-04-02-182836
liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.huliu-aws43c.qe.devcluster.openshift.com:6443".
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME PHASE TYPE REGION ZONE AGE
huliu-aws43c-54nxg-master-0 Running m6i.xlarge us-east-2 us-east-2a 53m
huliu-aws43c-54nxg-master-1 Running m6i.xlarge us-east-2 us-east-2b 53m
huliu-aws43c-54nxg-master-2 Running m6i.xlarge us-east-2 us-east-2c 53m
huliu-aws43c-54nxg-worker-us-east-2a-h8fmm Running m6i.xlarge us-east-2 us-east-2a 50m
huliu-aws43c-54nxg-worker-us-east-2b-mqt86 Running m6i.xlarge us-east-2 us-east-2b 50m
huliu-aws43c-54nxg-worker-us-east-2c-zpw8b Running m6i.xlarge us-east-2 us-east-2c 50m
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME STATUS ROLES AGE VERSION
ip-10-0-1-171.us-east-2.compute.internal Ready control-plane,master 53m v1.29.2+258f1d5
ip-10-0-48-139.us-east-2.compute.internal Ready control-plane,master 53m v1.29.2+258f1d5
ip-10-0-62-71.us-east-2.compute.internal Ready worker 44m v1.29.2+258f1d5
ip-10-0-8-140.us-east-2.compute.internal Ready worker 46m v1.29.2+258f1d5
ip-10-0-86-243.us-east-2.compute.internal Ready control-plane,master 53m v1.29.2+258f1d5
ip-10-0-92-164.us-east-2.compute.internal Ready worker 44m v1.29.2+258f1d5
4. Create a CAPI machine. We use the automated code at https://github.com/openshift/openshift-tests-private/blob/master/test/extended/clusterinfrastructure/capi_machines.go#L298. The machine reaches Running.
liuhuali@Lius-MacBook-Pro huali-test % oc get machines.cluster.x-k8s.io
NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
capi-machineset-51071-dk62q huliu-aws43c-54nxg ip-10-0-88-75.us-east-2.compute.internal aws:///us-east-2c/i-0fa63e97cab2340d3 Running 9m34s
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME STATUS ROLES AGE VERSION
ip-10-0-1-171.us-east-2.compute.internal Ready control-plane,master 66m v1.29.2+258f1d5
ip-10-0-48-139.us-east-2.compute.internal Ready control-plane,master 66m v1.29.2+258f1d5
ip-10-0-62-71.us-east-2.compute.internal Ready worker 57m v1.29.2+258f1d5
ip-10-0-8-140.us-east-2.compute.internal Ready worker 59m v1.29.2+258f1d5
ip-10-0-86-243.us-east-2.compute.internal Ready control-plane,master 66m v1.29.2+258f1d5
ip-10-0-88-75.us-east-2.compute.internal Ready worker 6m54s v1.29.2+258f1d5
ip-10-0-92-164.us-east-2.compute.internal Ready worker 57m v1.29.2+258f1d5
5. Swap the DHCP options set for the VPC with the one created in step 1, then scale the CAPI machineset. The new machine is stuck in Provisioned.
liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset.cluster.x-k8s.io capi-machineset-51071 --replicas=2
machineset.cluster.x-k8s.io/capi-machineset-51071 scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get machines.cluster.x-k8s.io
NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
capi-machineset-51071-2gx76 huliu-aws43c-54nxg aws:///us-east-2c/i-0c60f8db945e5bae4 Provisioned 32m
capi-machineset-51071-dk62q huliu-aws43c-54nxg ip-10-0-88-75.us-east-2.compute.internal aws:///us-east-2c/i-0fa63e97cab2340d3 Running 42m
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-approver-capi-8b567cdb4-9588r -n openshift-cluster-machine-approver -c machine-approver-controller
...
I0403 10:48:42.834794 1 controller.go:120] Reconciling CSR: csr-rg597
E0403 10:48:42.855209 1 csr_check.go:263] csr-rg597: failed to find machine for node ip-10-0-90-0.examplehuali.com, cannot approve
I0403 10:48:42.855224 1 controller.go:232] csr-rg597: CSR not authorized
E0403 10:48:42.855258 1 controller.go:329] "Reconciler error" err="could not reconcile CSR: failed to find machine for node ip-10-0-90-0.examplehuali.com" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest" CertificateSigningRequest="csr-rg597" namespace="" name="csr-rg597" reconcileID="d0917465-b9d6-454f-b5ed-afec6f6a220e"
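The report does not show the command used for the swap in step 5. As a rough sketch, the association could be done with `aws ec2 associate-dhcp-options`. The helper name `swap_dhcp_options_cmd` and the VPC ID `vpc-EXAMPLE` are assumptions made for illustration (the cluster's real VPC ID is not given in this report); the helper only prints the command so it can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the AWS CLI command that
# associates a DHCP options set with a VPC.
swap_dhcp_options_cmd() {
  local vpc_id="$1" dopt_id="$2"
  printf 'aws ec2 associate-dhcp-options --dhcp-options-id %s --vpc-id %s\n' \
    "$dopt_id" "$vpc_id"
}

# dopt-05771a83747e85ab1 is the options set created in step 1.
# vpc-EXAMPLE is a placeholder, not a value from this report.
swap_dhcp_options_cmd "vpc-EXAMPLE" "dopt-05771a83747e85ab1"
```

After running the printed command against the real VPC, `oc get csr` can be used to watch for the pending CSR of the newly scaled machine.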
Actual results:
The CAPI machine is stuck in Provisioned after the VPC is swapped to use a custom DHCP options set.
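One detail visible in the logs is that the node name in the rejected CSR carries the custom domain from step 1 (examplehuali.com), while nodes created before the swap use us-east-2.compute.internal. The sketch below only illustrates that naming difference using values taken from the output above; the helper `node_domain` is hypothetical, and this is not a statement of how the machine approver matches nodes to machines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the first label of a hostname and print the rest.
node_domain() {
  printf '%s\n' "${1#*.}"
}

csr_node="ip-10-0-90-0.examplehuali.com"                  # from the machine-approver log
running_node="ip-10-0-88-75.us-east-2.compute.internal"   # CAPI node created before the swap

if [ "$(node_domain "$csr_node")" != "$(node_domain "$running_node")" ]; then
  echo "domain suffix differs: $(node_domain "$csr_node") vs $(node_domain "$running_node")"
fi
```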
Expected results:
The CAPI machine should reach Running after the VPC is swapped to use a custom DHCP options set.
Additional info:
We have test cases for this scenario on MAPI: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-30379 and https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-51013. I tested the same scenario for CAPI today and found that it does not work.
Must gather:
https://drive.google.com/file/d/13vv2ccLWD66sY3C2V7HzgqxW_7NDA3_a/view?usp=sharing