OpenShift Bugs: OCPBUGS-9268

[AWS] 'oc get node' does not return a node that is missing the AWS DNS suffix on a cluster created with a feature gate

Details

    • Moderate
    • Unspecified

    Description

      Description of problem:
      'oc get node' does not return a node that is missing the AWS DNS suffix on a cluster created with a feature gate.

      Version-Release number of selected component (if applicable):
      4.11.0-0.nightly-2022-05-11-054135

      How reproducible:
      Always

      Steps to Reproduce:
      1. Create a dhcp-options-set:
      liuhuali@Lius-MacBook-Pro huali-test % aws ec2 create-dhcp-options --dhcp-configurations '[{"Key":"domain-name-servers","Values":["AmazonProvidedDNS"]}]'
      DHCPOPTIONS dopt-0c9dfcde919f49105 301721915996
      DHCPCONFIGURATIONS domain-name-servers
      VALUES AmazonProvidedDNS
      liuhuali@Lius-MacBook-Pro huali-test %
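
The options set created in step 1 carries only domain-name-servers and no domain-name entry, which seems to be the relevant difference from Case1. A minimal sketch for checking this from the CLI text output; has_domain_name is a hypothetical helper, and the parsing assumes the "DHCPCONFIGURATIONS <key>" line layout shown in step 1.

```shell
# has_domain_name: read aws ec2 create-dhcp-options text output on stdin and
# succeed only if a "domain-name" configuration key is present.
# (Hypothetical helper; assumes the text layout shown in step 1.)
has_domain_name() {
  awk '$1 == "DHCPCONFIGURATIONS" && $2 == "domain-name" { found = 1 }
       END { exit !found }'
}

# Sample input copied from step 1.
sample='DHCPOPTIONS dopt-0c9dfcde919f49105 301721915996
DHCPCONFIGURATIONS domain-name-servers
VALUES AmazonProvidedDNS'

if printf '%s\n' "$sample" | has_domain_name; then
  echo "domain-name is set"
else
  echo "domain-name is missing"
fi
```

For the options set from step 1 this reports that domain-name is missing.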

      2. Install a cluster with the feature gate enabled, like this:
      ./openshift-install create install-config --log-level=debug --dir=cluster1
      ./openshift-install create manifests --log-level=debug --dir=cluster1
      vi cluster1/manifests/manifest_feature_gate.yaml

      apiVersion: config.openshift.io/v1
      kind: FeatureGate
      metadata:
        annotations:
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
          release.openshift.io/create-only: "true"
        name: cluster
      spec:
        featureSet: TechPreviewNoUpgrade

      ./openshift-install create cluster --log-level=debug --dir=cluster1

      3. After installation, check that the cluster is healthy; 'oc get node' returns 6 nodes:
      liuhuali@Lius-MacBook-Pro huali-test % oc get machines.machine.openshift.io -o wide
      NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
      huliu-aws411ccm-ktgjv-master-0 Running m6i.xlarge us-east-2 us-east-2a 59m ip-10-0-142-194.us-east-2.compute.internal aws:///us-east-2a/i-05d96395caa887d8a running
      huliu-aws411ccm-ktgjv-master-1 Running m6i.xlarge us-east-2 us-east-2b 59m ip-10-0-188-250.us-east-2.compute.internal aws:///us-east-2b/i-062357b65874125d0 running
      huliu-aws411ccm-ktgjv-master-2 Running m6i.xlarge us-east-2 us-east-2c 59m ip-10-0-193-79.us-east-2.compute.internal aws:///us-east-2c/i-0a220248387b666a8 running
      huliu-aws411ccm-ktgjv-worker-us-east-2a-68lcj Running m6i.large us-east-2 us-east-2a 55m ip-10-0-137-131.us-east-2.compute.internal aws:///us-east-2a/i-07835e479d27914ea running
      huliu-aws411ccm-ktgjv-worker-us-east-2b-wsdr9 Running m6i.large us-east-2 us-east-2b 55m ip-10-0-190-236.us-east-2.compute.internal aws:///us-east-2b/i-0ff467ae0b64f5e97 running
      huliu-aws411ccm-ktgjv-worker-us-east-2c-mhf4h Running m6i.large us-east-2 us-east-2c 55m ip-10-0-193-47.us-east-2.compute.internal aws:///us-east-2c/i-0cda097a70aca5373 running
      liuhuali@Lius-MacBook-Pro huali-test % oc get machines.machine.openshift.io
      NAME PHASE TYPE REGION ZONE AGE
      huliu-aws411ccm-ktgjv-master-0 Running m6i.xlarge us-east-2 us-east-2a 60m
      huliu-aws411ccm-ktgjv-master-1 Running m6i.xlarge us-east-2 us-east-2b 60m
      huliu-aws411ccm-ktgjv-master-2 Running m6i.xlarge us-east-2 us-east-2c 60m
      huliu-aws411ccm-ktgjv-worker-us-east-2a-68lcj Running m6i.large us-east-2 us-east-2a 56m
      huliu-aws411ccm-ktgjv-worker-us-east-2b-wsdr9 Running m6i.large us-east-2 us-east-2b 56m
      huliu-aws411ccm-ktgjv-worker-us-east-2c-mhf4h Running m6i.large us-east-2 us-east-2c 56m
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME STATUS ROLES AGE VERSION
      ip-10-0-137-131.us-east-2.compute.internal Ready worker 54m v1.23.3+69213f8
      ip-10-0-142-194.us-east-2.compute.internal Ready master 59m v1.23.3+69213f8
      ip-10-0-188-250.us-east-2.compute.internal Ready master 60m v1.23.3+69213f8
      ip-10-0-190-236.us-east-2.compute.internal Ready worker 54m v1.23.3+69213f8
      ip-10-0-193-47.us-east-2.compute.internal Ready worker 54m v1.23.3+69213f8
      ip-10-0-193-79.us-east-2.compute.internal Ready master 59m v1.23.3+69213f8

      4. Swap the dhcp-options-set for the VPC with the one created in step 1.
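
The report does not show the command used for the swap. One way to do it is `aws ec2 associate-dhcp-options`; the sketch below wraps it in a hypothetical swap_dhcp_options function that only prints the command unless DRY_RUN=0 is set. The VPC ID is a placeholder, not a value from this report.

```shell
# swap_dhcp_options: associate a DHCP options set with a VPC.
# With DRY_RUN=1 (the default in this sketch) the command is printed, not run.
swap_dhcp_options() {
  dopt_id=$1
  vpc_id=$2
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "aws ec2 associate-dhcp-options --dhcp-options-id $dopt_id --vpc-id $vpc_id"
  else
    aws ec2 associate-dhcp-options --dhcp-options-id "$dopt_id" --vpc-id "$vpc_id"
  fi
}

# The dopt ID is from step 1; the VPC ID is a placeholder (assumption).
swap_dhcp_options dopt-0c9dfcde919f49105 vpc-0123456789abcdef0
```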

      5. Delete a worker machine backed by a machineset, allowing MAPI to recreate the machine:
      liuhuali@Lius-MacBook-Pro huali-test % oc delete machines.machine.openshift.io huliu-aws411ccm-ktgjv-worker-us-east-2c-mhf4h
      machine.machine.openshift.io "huliu-aws411ccm-ktgjv-worker-us-east-2c-mhf4h" deleted
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME STATUS ROLES AGE VERSION
      ip-10-0-137-131.us-east-2.compute.internal Ready worker 63m v1.23.3+69213f8
      ip-10-0-142-194.us-east-2.compute.internal Ready master 68m v1.23.3+69213f8
      ip-10-0-188-250.us-east-2.compute.internal Ready master 69m v1.23.3+69213f8
      ip-10-0-190-236.us-east-2.compute.internal Ready worker 63m v1.23.3+69213f8
      ip-10-0-193-79.us-east-2.compute.internal Ready master 68m v1.23.3+69213f8
      liuhuali@Lius-MacBook-Pro huali-test % oc get machines.machine.openshift.io -o wide
      NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
      huliu-aws411ccm-ktgjv-master-0 Running m6i.xlarge us-east-2 us-east-2a 70m ip-10-0-142-194.us-east-2.compute.internal aws:///us-east-2a/i-05d96395caa887d8a running
      huliu-aws411ccm-ktgjv-master-1 Running m6i.xlarge us-east-2 us-east-2b 70m ip-10-0-188-250.us-east-2.compute.internal aws:///us-east-2b/i-062357b65874125d0 running
      huliu-aws411ccm-ktgjv-master-2 Running m6i.xlarge us-east-2 us-east-2c 70m ip-10-0-193-79.us-east-2.compute.internal aws:///us-east-2c/i-0a220248387b666a8 running
      huliu-aws411ccm-ktgjv-worker-us-east-2a-68lcj Running m6i.large us-east-2 us-east-2a 66m ip-10-0-137-131.us-east-2.compute.internal aws:///us-east-2a/i-07835e479d27914ea running
      huliu-aws411ccm-ktgjv-worker-us-east-2b-wsdr9 Running m6i.large us-east-2 us-east-2b 66m ip-10-0-190-236.us-east-2.compute.internal aws:///us-east-2b/i-0ff467ae0b64f5e97 running
      huliu-aws411ccm-ktgjv-worker-us-east-2c-58cml Running m6i.large us-east-2 us-east-2c 8m44s ip-10-0-200-145 aws:///us-east-2c/i-00c3c1b8ac9e27704 running
      liuhuali@Lius-MacBook-Pro huali-test %
      liuhuali@Lius-MacBook-Pro huali-test % oc get pod -n openshift-cluster-machine-approver
      NAME READY STATUS RESTARTS AGE
      machine-approver-5955745c76-5z6rq 2/2 Running 0 73m
      machine-approver-capi-687b57b66d-lpv2q 2/2 Running 0 73m
      liuhuali@Lius-MacBook-Pro huali-test % oc -n openshift-cluster-machine-approver logs -f machine-approver-5955745c76-5z6rq -c machine-approver-controller

      I0512 02:23:42.526825 1 controller.go:121] Reconciling CSR: csr-9n978
      I0512 02:23:42.545659 1 csr_check.go:157] csr-9n978: CSR does not appear to be client csr
      I0512 02:23:42.552248 1 csr_check.go:545] retrieving serving cert from ip-10-0-200-145 (10.0.200.145:10250)
      I0512 02:23:42.553087 1 csr_check.go:182] Failed to retrieve current serving cert: remote error: tls: internal error
      I0512 02:23:42.553099 1 csr_check.go:202] Falling back to machine-api authorization for ip-10-0-200-145
      I0512 02:23:42.558665 1 controller.go:240] CSR csr-9n978 approved

      Actual results:
      'oc get machines.machine.openshift.io -o wide' shows that the newly created node (ip-10-0-200-145) is missing the AWS DNS suffix;
      'oc get node' returns only 5 nodes; the newly created one is missing.
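
The mismatch can be spotted mechanically by comparing the NODE column of the machine list against the node list. A minimal sketch, assuming both outputs were saved to files; machines_without_node is a hypothetical helper and it assumes NODE is the 7th column of the '-o wide' output, as in the listings above (a machine with an empty NODE column would shift the columns).

```shell
# machines_without_node: print NODE names from saved
# `oc get machines.machine.openshift.io -o wide` output (file $1)
# that do not appear in saved `oc get node` output (file $2).
machines_without_node() {
  awk 'FNR == 1 { next }                  # skip the header line of each file
       NR == FNR { seen[$1] = 1; next }   # first file read: node names
       !($7 in seen) { print $7 }' "$2" "$1"
}

# Sample data: two rows from step 5 of this report.
tmp=$(mktemp -d)
cat > "$tmp/machines.txt" <<'EOF'
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
huliu-aws411ccm-ktgjv-worker-us-east-2b-wsdr9 Running m6i.large us-east-2 us-east-2b 66m ip-10-0-190-236.us-east-2.compute.internal aws:///us-east-2b/i-0ff467ae0b64f5e97 running
huliu-aws411ccm-ktgjv-worker-us-east-2c-58cml Running m6i.large us-east-2 us-east-2c 8m44s ip-10-0-200-145 aws:///us-east-2c/i-00c3c1b8ac9e27704 running
EOF
cat > "$tmp/nodes.txt" <<'EOF'
NAME STATUS ROLES AGE VERSION
ip-10-0-190-236.us-east-2.compute.internal Ready worker 63m v1.23.3+69213f8
EOF

machines_without_node "$tmp/machines.txt" "$tmp/nodes.txt"
```

With the sample rows this prints ip-10-0-200-145, the machine whose node never shows up in 'oc get node'.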

      Expected results:
      'oc get machines.machine.openshift.io -o wide' should show all nodes with the AWS DNS suffix;
      'oc get node' should return 6 nodes.

      Additional info:
      Seems related to https://bugzilla.redhat.com/show_bug.cgi?id=2072195

      Some other cases:

      Case1:
      Repeat the above steps but change step 4 to 'Swap the dhcp-options-set for the VPC with one that has a domain-name'.
      'oc get node' returns the newly created node.

      liuhuali@Lius-MacBook-Pro huali-test % oc delete machines.machine.openshift.io huliu-aws411ccm-ktgjv-worker-us-east-2a-68lcj
      machine.machine.openshift.io "huliu-aws411ccm-ktgjv-worker-us-east-2a-68lcj" deleted
      liuhuali@Lius-MacBook-Pro huali-test % oc get machines.machine.openshift.io -o wide
      NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
      huliu-aws411ccm-ktgjv-master-0 Running m6i.xlarge us-east-2 us-east-2a 123m ip-10-0-142-194.us-east-2.compute.internal aws:///us-east-2a/i-05d96395caa887d8a running
      huliu-aws411ccm-ktgjv-master-1 Running m6i.xlarge us-east-2 us-east-2b 123m ip-10-0-188-250.us-east-2.compute.internal aws:///us-east-2b/i-062357b65874125d0 running
      huliu-aws411ccm-ktgjv-master-2 Running m6i.xlarge us-east-2 us-east-2c 123m ip-10-0-193-79.us-east-2.compute.internal aws:///us-east-2c/i-0a220248387b666a8 running
      huliu-aws411ccm-ktgjv-worker-us-east-2a-q6gwt Running m6i.large us-east-2 us-east-2a 11m ip-10-0-128-73.us-east-2.compute.internal aws:///us-east-2a/i-079457d03825b1a8e running
      huliu-aws411ccm-ktgjv-worker-us-east-2b-wsdr9 Running m6i.large us-east-2 us-east-2b 119m ip-10-0-190-236.us-east-2.compute.internal aws:///us-east-2b/i-0ff467ae0b64f5e97 running
      huliu-aws411ccm-ktgjv-worker-us-east-2c-58cml Running m6i.large us-east-2 us-east-2c 61m ip-10-0-200-145 aws:///us-east-2c/i-00c3c1b8ac9e27704 running
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME STATUS ROLES AGE VERSION
      ip-10-0-128-73.us-east-2.compute.internal Ready worker 8m25s v1.23.3+69213f8
      ip-10-0-142-194.us-east-2.compute.internal Ready master 122m v1.23.3+69213f8
      ip-10-0-188-250.us-east-2.compute.internal Ready master 123m v1.23.3+69213f8
      ip-10-0-190-236.us-east-2.compute.internal Ready worker 117m v1.23.3+69213f8
      ip-10-0-193-79.us-east-2.compute.internal Ready master 122m v1.23.3+69213f8
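
Case1 does not show how the dhcp-options-set with a domain-name was created. A minimal sketch of one way to build the --dhcp-configurations value for it; dhcp_configurations is a hypothetical helper, and the domain us-east-2.compute.internal is an assumption taken from the suffix on the existing nodes, not a value confirmed by the report.

```shell
# dhcp_configurations: print a --dhcp-configurations JSON value that sets both
# a domain-name ($1) and the Amazon-provided DNS servers.
dhcp_configurations() {
  printf '[{"Key":"domain-name","Values":["%s"]},{"Key":"domain-name-servers","Values":["AmazonProvidedDNS"]}]\n' "$1"
}

# Print the resulting command instead of running it.
# The domain value is an assumption (see above).
config=$(dhcp_configurations us-east-2.compute.internal)
echo "aws ec2 create-dhcp-options --dhcp-configurations '$config'"
```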

      Case2:
      Repeat the above steps but change step 2 to 'Install a cluster without the feature gate'.
      'oc get node' returns the newly created node;
      'oc get machine -o wide' shows all nodes with the AWS DNS suffix.

      liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-aws411org-n9znk-worker-us-east-2c-g6png
      machine.machine.openshift.io "huliu-aws411org-n9znk-worker-us-east-2c-g6png" deleted
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME STATUS ROLES AGE VERSION
      ip-10-0-143-96.us-east-2.compute.internal Ready worker 64m v1.23.3+69213f8
      ip-10-0-158-115.us-east-2.compute.internal Ready master 69m v1.23.3+69213f8
      ip-10-0-161-97.us-east-2.compute.internal Ready worker 64m v1.23.3+69213f8
      ip-10-0-183-83.us-east-2.compute.internal Ready master 67m v1.23.3+69213f8
      ip-10-0-207-171.us-east-2.compute.internal Ready master 68m v1.23.3+69213f8
      ip-10-0-211-24.us-east-2.compute.internal Ready worker 4m28s v1.23.3+69213f8
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine -o wide
      NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
      huliu-aws411org-n9znk-master-0 Running m6i.xlarge us-east-2 us-east-2a 69m ip-10-0-158-115.us-east-2.compute.internal aws:///us-east-2a/i-015848c984c27f208 running
      huliu-aws411org-n9znk-master-1 Running m6i.xlarge us-east-2 us-east-2b 69m ip-10-0-183-83.us-east-2.compute.internal aws:///us-east-2b/i-05d5e5f3928e1f0cc running
      huliu-aws411org-n9znk-master-2 Running m6i.xlarge us-east-2 us-east-2c 69m ip-10-0-207-171.us-east-2.compute.internal aws:///us-east-2c/i-0b3e2d804b47bb401 running
      huliu-aws411org-n9znk-worker-us-east-2a-6595z Running m6i.large us-east-2 us-east-2a 66m ip-10-0-143-96.us-east-2.compute.internal aws:///us-east-2a/i-0caef8be0317db87c running
      huliu-aws411org-n9znk-worker-us-east-2b-nnprl Running m6i.large us-east-2 us-east-2b 66m ip-10-0-161-97.us-east-2.compute.internal aws:///us-east-2b/i-0315216923c19c195 running
      huliu-aws411org-n9znk-worker-us-east-2c-kfpwh Running m6i.large us-east-2 us-east-2c 8m23s ip-10-0-211-24.us-east-2.compute.internal aws:///us-east-2c/i-09981802d2b381bdf running

      Then enable the feature gate:
      liuhuali@Lius-MacBook-Pro huali-test % oc edit featuregate cluster
      featuregate.config.openshift.io/cluster edited

      After waiting more than four hours, the node is still NotReady:

      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME STATUS ROLES AGE VERSION
      ip-10-0-143-96.us-east-2.compute.internal Ready worker 6h11m v1.23.3+69213f8
      ip-10-0-158-115.us-east-2.compute.internal Ready master 6h16m v1.23.3+69213f8
      ip-10-0-161-97.us-east-2.compute.internal Ready worker 6h11m v1.23.3+69213f8
      ip-10-0-183-83.us-east-2.compute.internal Ready master 6h14m v1.23.3+69213f8
      ip-10-0-207-171.us-east-2.compute.internal Ready master 6h15m v1.23.3+69213f8
      ip-10-0-211-24.us-east-2.compute.internal NotReady,SchedulingDisabled worker 5h12m v1.23.3+69213f8
      liuhuali@Lius-MacBook-Pro huali-test % oc get machines.machine.openshift.io -o wide
      NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
      huliu-aws411org-n9znk-master-0 Running m6i.xlarge us-east-2 us-east-2a 6h20m ip-10-0-158-115.us-east-2.compute.internal aws:///us-east-2a/i-015848c984c27f208 running
      huliu-aws411org-n9znk-master-1 Running m6i.xlarge us-east-2 us-east-2b 6h20m ip-10-0-183-83.us-east-2.compute.internal aws:///us-east-2b/i-05d5e5f3928e1f0cc running
      huliu-aws411org-n9znk-master-2 Running m6i.xlarge us-east-2 us-east-2c 6h20m ip-10-0-207-171.us-east-2.compute.internal aws:///us-east-2c/i-0b3e2d804b47bb401 running
      huliu-aws411org-n9znk-worker-us-east-2a-6595z Running m6i.large us-east-2 us-east-2a 6h17m ip-10-0-143-96.us-east-2.compute.internal aws:///us-east-2a/i-0caef8be0317db87c running
      huliu-aws411org-n9znk-worker-us-east-2b-nnprl Running m6i.large us-east-2 us-east-2b 6h17m ip-10-0-161-97.us-east-2.compute.internal aws:///us-east-2b/i-0315216923c19c195 running
      huliu-aws411org-n9znk-worker-us-east-2c-kfpwh Running m6i.large us-east-2 us-east-2c 5h19m ip-10-0-211-24.us-east-2.compute.internal aws:///us-east-2c/i-09981802d2b381bdf running

            People

              huliu@redhat.com Huali Liu
              Red Hat Employee
              Damiano Donati