OCPBUGS-43274 (OpenShift Bugs)

[AWS] Cluster upgrade or install fails when using a DHCP option set with certain domain names

    • Moderate
    • CLOUD Sprint 261

      Description of problem:

          [AWS] Cluster upgrade fails when the VPC uses a DHCP option set whose domain-name contains upper-case letters.

      Version-Release number of selected component (if applicable):

          I tested the following upgrade paths:
            4.13.0-0.nightly-2024-10-10-221519 -> 4.14.0-0.nightly-2024-10-11-181710
            4.13.48-x86_64 -> 4.14.36-x86_64
            4.13.0-0.nightly-2024-10-10-221519 -> 4.14.0-0.ci.test-2024-10-14-013447-ci-ln-75t1mmb-latest

      How reproducible:

          Always

      Steps to Reproduce:

          1. Create a DHCP option set with an upper-case domain-name:
      
      liuhuali@Lius-MacBook-Pro huali-test % aws ec2 create-dhcp-options --dhcp-configurations '[{"Key":"domain-name-servers","Values":["AmazonProvidedDNS"]},{"Key":"domain-name","Values":["HUALI-Qe.exampleA.com"]}]'
      DHCPOPTIONS	dopt-085f8c5f0eb6dae21	301721915996
      DHCPCONFIGURATIONS	domain-name
      VALUES	HUALI-Qe.exampleA.com
      DHCPCONFIGURATIONS	domain-name-servers
      VALUES	AmazonProvidedDNS
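      A minimal pre-flight sketch, assuming you want to flag a domain-name like the one above before associating it with a cluster VPC. The helper name has_upper_case is hypothetical; it only tests whether any upper-case letter is present and does not otherwise validate the name.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if the given domain name contains any
# upper-case letter, exits 1 otherwise. [[:upper:]] is used instead of
# [A-Z] so the match does not depend on locale collation order.
has_upper_case() {
  case "$1" in
    *[[:upper:]]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: the domain-name used in step 1 of this reproducer.
domain="HUALI-Qe.exampleA.com"
if has_upper_case "$domain"; then
  echo "domain-name '$domain' contains upper-case characters"
fi
```

      Running it as-is prints one line naming HUALI-Qe.exampleA.com; a fully lower-case name prints nothing.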
      
          2. Install an AWS cluster. I used the automated template ipi-on-aws/versioned-installer-ci:
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.13.0-0.nightly-2024-10-10-221519   True        False         26m     Cluster version is 4.13.0-0.nightly-2024-10-10-221519 
      
          3. In the AWS console, associate the cluster VPC with the DHCP option set created in step 1.
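      Step 3 was done in the AWS console. A sketch of a CLI alternative, using the aws ec2 associate-dhcp-options and describe-vpcs commands; VPC_ID is a placeholder you must supply (it is not recorded in this report), and the run wrapper is a hypothetical helper that only prints the commands unless DRY_RUN is changed.

```shell
#!/bin/sh
# DHCP_OPTIONS_ID defaults to the option set created in step 1.
# VPC_ID is a placeholder: replace it with the cluster VPC's ID.
DHCP_OPTIONS_ID="${DHCP_OPTIONS_ID:-dopt-085f8c5f0eb6dae21}"
VPC_ID="${VPC_ID:-vpc-REPLACE_ME}"

# run prints the command while DRY_RUN is 1 (the default); set DRY_RUN to
# any other value (e.g. 0) to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Associate the VPC with the new DHCP option set.
run aws ec2 associate-dhcp-options --dhcp-options-id "$DHCP_OPTIONS_ID" --vpc-id "$VPC_ID"

# Confirm which DHCP option set the VPC now uses.
run aws ec2 describe-vpcs --vpc-ids "$VPC_ID" --query 'Vpcs[0].DhcpOptionsId' --output text
```

      Defaulting to a dry run keeps the commands reviewable before they change the account's networking.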
      
          4. Create a new machineset or scale an existing machineset:
      
        liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml 
      machineset.machine.openshift.io/huliu-aws1012d-snmnz-worker-us-east-2aa created
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine
      NAME                                            PHASE     TYPE         REGION      ZONE         AGE
      huliu-aws1012d-snmnz-master-0                   Running   m6i.xlarge   us-east-2   us-east-2a   56m
      huliu-aws1012d-snmnz-master-1                   Running   m6i.xlarge   us-east-2   us-east-2b   56m
      huliu-aws1012d-snmnz-master-2                   Running   m6i.xlarge   us-east-2   us-east-2c   56m
      huliu-aws1012d-snmnz-worker-us-east-2a-ksnlj    Running   m6i.xlarge   us-east-2   us-east-2a   52m
      huliu-aws1012d-snmnz-worker-us-east-2aa-qxsgk   Running   m6i.xlarge   us-east-2   us-east-2a   10m
      huliu-aws1012d-snmnz-worker-us-east-2b-88pjf    Running   m6i.xlarge   us-east-2   us-east-2b   52m
      huliu-aws1012d-snmnz-worker-us-east-2c-prp5h    Running   m6i.xlarge   us-east-2   us-east-2c   52m
      liuhuali@Lius-MacBook-Pro huali-test % oc get node
      NAME                                         STATUS   ROLES                  AGE     VERSION
      ip-10-0-135-249.us-east-2.compute.internal   Ready    control-plane,master   55m     v1.26.15+53fd427
      ip-10-0-143-246.us-east-2.compute.internal   Ready    worker                 7m55s   v1.26.15+53fd427
      ip-10-0-146-224.us-east-2.compute.internal   Ready    worker                 49m     v1.26.15+53fd427
      ip-10-0-168-215.us-east-2.compute.internal   Ready    worker                 49m     v1.26.15+53fd427
      ip-10-0-189-48.us-east-2.compute.internal    Ready    control-plane,master   55m     v1.26.15+53fd427
      ip-10-0-197-11.us-east-2.compute.internal    Ready    control-plane,master   55m     v1.26.15+53fd427
      ip-10-0-203-123.us-east-2.compute.internal   Ready    worker                 49m     v1.26.15+53fd427
      
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-aws1012d-snmnz-worker-us-east-2aa-qxsgk  -oyaml
      ...
      status:
        addresses:
        - address: 10.0.143.246
          type: InternalIP
        - address: ip-10-0-143-246.us-east-2.compute.internal
          type: InternalDNS
        - address: ip-10-0-143-246.us-east-2.compute.internal
          type: Hostname
        - address: ip-10-0-143-246.HUALI-Qe.exampleA.com
          type: InternalDNS
      ...
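      The new machine reports an extra InternalDNS address built from the upper-case DHCP domain-name. A sketch for spotting such addresses, assuming the addresses are fed one per line (for example from the oc jsonpath query in the comment); upper_case_names is a hypothetical helper.

```shell
#!/bin/sh
# On a live cluster the input could come from (Machine API machines live
# in the openshift-machine-api namespace):
#   oc get machine huliu-aws1012d-snmnz-worker-us-east-2aa-qxsgk \
#     -n openshift-machine-api \
#     -o jsonpath='{range .status.addresses[?(@.type=="InternalDNS")]}{.address}{"\n"}{end}'

# upper_case_names reads one name per line on stdin and prints only the
# names containing an upper-case letter. grep exits 1 when nothing matches,
# so "|| true" keeps the function's exit status 0 in that case.
upper_case_names() {
  grep '[[:upper:]]' || true
}

# Offline example using the two InternalDNS addresses shown above.
printf '%s\n' \
  ip-10-0-143-246.us-east-2.compute.internal \
  ip-10-0-143-246.HUALI-Qe.exampleA.com | upper_case_names
```

      On that sample it prints only ip-10-0-143-246.HUALI-Qe.exampleA.com.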
      
          5. Upgrade the cluster. The upgrade gets stuck on the cloud-credential operator:
      
      liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.13.0-0.nightly-2024-10-10-221519   True        True          78m     Unable to apply 4.14.0-0.nightly-2024-10-11-181710: wait has exceeded 40 minutes for these operators: cloud-credential
      liuhuali@Lius-MacBook-Pro huali-test % oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.14.0-0.nightly-2024-10-11-181710   True        False         False      128m    
      baremetal                                  4.14.0-0.nightly-2024-10-11-181710   True        False         False      141m    
      cloud-controller-manager                   4.14.0-0.nightly-2024-10-11-181710   True        False         False      144m    
      cloud-credential                           4.14.0-0.nightly-2024-10-11-181710   True        True          True       144m    6 of 6 credentials requests are failing to sync.
      cluster-autoscaler                         4.14.0-0.nightly-2024-10-11-181710   True        False         False      141m    
      config-operator                            4.14.0-0.nightly-2024-10-11-181710   True        False         False      142m    
      console                                    4.14.0-0.nightly-2024-10-11-181710   True        False         False      130m    
      control-plane-machine-set                  4.14.0-0.nightly-2024-10-11-181710   True        False         False      136m    
      csi-snapshot-controller                    4.14.0-0.nightly-2024-10-11-181710   True        False         False      142m    
      dns                                        4.13.0-0.nightly-2024-10-10-221519   True        False         False      141m    
      etcd                                       4.14.0-0.nightly-2024-10-11-181710   True        False         False      141m    
      image-registry                             4.14.0-0.nightly-2024-10-11-181710   True        False         False      135m    
      ingress                                    4.14.0-0.nightly-2024-10-11-181710   True        False         False      136m    
      insights                                   4.14.0-0.nightly-2024-10-11-181710   True        False         False      136m    
      kube-apiserver                             4.14.0-0.nightly-2024-10-11-181710   True        False         False      131m    
      kube-controller-manager                    4.14.0-0.nightly-2024-10-11-181710   True        False         False      139m    
      kube-scheduler                             4.14.0-0.nightly-2024-10-11-181710   True        False         False      139m    
      kube-storage-version-migrator              4.14.0-0.nightly-2024-10-11-181710   True        False         False      142m    
      machine-api                                4.14.0-0.nightly-2024-10-11-181710   True        False         False      138m    
      machine-approver                           4.14.0-0.nightly-2024-10-11-181710   True        False         False      142m    
      machine-config                             4.13.0-0.nightly-2024-10-10-221519   True        False         False      141m    
      marketplace                                4.14.0-0.nightly-2024-10-11-181710   True        False         False      141m    
      monitoring                                 4.14.0-0.nightly-2024-10-11-181710   True        False         False      135m    
      network                                    4.13.0-0.nightly-2024-10-10-221519   True        False         False      143m    
      node-tuning                                4.14.0-0.nightly-2024-10-11-181710   True        False         False      51m     
      openshift-apiserver                        4.14.0-0.nightly-2024-10-11-181710   True        False         False      131m    
      openshift-controller-manager               4.14.0-0.nightly-2024-10-11-181710   True        False         False      138m    
      openshift-samples                          4.14.0-0.nightly-2024-10-11-181710   True        False         False      52m     
      operator-lifecycle-manager                 4.14.0-0.nightly-2024-10-11-181710   True        False         False      142m    
      operator-lifecycle-manager-catalog         4.14.0-0.nightly-2024-10-11-181710   True        False         False      142m    
      operator-lifecycle-manager-packageserver   4.14.0-0.nightly-2024-10-11-181710   True        False         False      52m     
      service-ca                                 4.14.0-0.nightly-2024-10-11-181710   True        False         False      142m    
      storage                                    4.14.0-0.nightly-2024-10-11-181710   True        False         False      142m   
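      To see at a glance which operators hold up the upgrade, a minimal sketch assuming the default oc get co column layout shown above (a header line and a non-empty VERSION column); stuck_operators is a hypothetical helper name. On the full output above it would flag cloud-credential plus the dns, machine-config and network operators that are still at 4.13.

```shell
#!/bin/sh
# Lists cluster operators that are not at the target version or report
# PROGRESSING=True or DEGRADED=True. Columns assumed:
#   NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
# NR > 1 skips the header line.
# Live usage: oc get co | stuck_operators 4.14.0-0.nightly-2024-10-11-181710
stuck_operators() {
  awk -v target="$1" 'NR > 1 && ($2 != target || $4 == "True" || $5 == "True") {
    printf "%s version=%s progressing=%s degraded=%s\n", $1, $2, $4, $5
  }'
}

# Offline example using three rows copied from the output above.
stuck_operators 4.14.0-0.nightly-2024-10-11-181710 <<'EOF'
NAME               VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
cloud-credential   4.14.0-0.nightly-2024-10-11-181710   True        True          True       144m    6 of 6 credentials requests are failing to sync.
dns                4.13.0-0.nightly-2024-10-10-221519   True        False         False      141m
etcd               4.14.0-0.nightly-2024-10-11-181710   True        False         False      141m
EOF
```

      The example prints the cloud-credential and dns rows and skips etcd.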
      
      
      liuhuali@Lius-MacBook-Pro huali-test % oc logs cloud-credential-operator-567bc97fb4-4lfq5 -n openshift-cloud-credential-operator  -c  cloud-credential-operator
      ...
      time="2024-10-12T10:42:59Z" level=error msg="RequestError: send request failed\ncaused by: Post \"https://iam.amazonaws.com/\": dial tcp 18.119.154.66:443: i/o timeout"
      time="2024-10-12T10:42:59Z" level=error msg="error determining whether a credentials update is needed" actuator=aws cr=openshift-cloud-credential-operator/openshift-machine-api-aws error="AWS Error: RequestError: send request failed\ncaused by: Post \"https://iam.amazonaws.com/\": dial tcp 18.119.154.66:443: i/o timeout"
      time="2024-10-12T10:42:59Z" level=error msg="error syncing credentials: error determining whether a credentials update is needed" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws secret=openshift-machine-api/aws-cloud-credentials
      time="2024-10-12T10:42:59Z" level=error msg="errored with condition: CredentialsProvisionFailure" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws secret=openshift-machine-api/aws-cloud-credentials
      time="2024-10-12T10:42:59Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator
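      The log excerpt shows CredentialsRequests failing after calls to https://iam.amazonaws.com/ time out. A sketch for summarizing a longer log, assuming the key=value format shown above; failing_credreqs and iam_timeouts are hypothetical helpers, and the oc logs deployment/... form in the comment assumes the operator pod belongs to a deployment named cloud-credential-operator.

```shell
#!/bin/sh
# On a live cluster the input could come from:
#   oc logs deployment/cloud-credential-operator \
#     -n openshift-cloud-credential-operator -c cloud-credential-operator

# failing_credreqs prints each distinct cr=<namespace>/<name> value found
# on level=error lines. The leading space avoids matching "...cr=" inside
# other keys.
failing_credreqs() {
  grep 'level=error' | grep -o ' cr=[^ ]*' | sed 's/^ cr=//' | sort -u
}

# iam_timeouts counts lines where a call to the IAM endpoint timed out.
# grep -c prints 0 and exits 1 on no match; "|| true" keeps status 0.
iam_timeouts() {
  grep -c 'iam\.amazonaws\.com/.*i/o timeout' || true
}

# Offline example using three lines copied from the excerpt above.
sample=$(cat <<'EOF'
time="2024-10-12T10:42:59Z" level=error msg="RequestError: send request failed\ncaused by: Post \"https://iam.amazonaws.com/\": dial tcp 18.119.154.66:443: i/o timeout"
time="2024-10-12T10:42:59Z" level=error msg="errored with condition: CredentialsProvisionFailure" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-aws secret=openshift-machine-api/aws-cloud-credentials
time="2024-10-12T10:42:59Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/aws-ebs-csi-driver-operator
EOF
)

printf '%s\n' "$sample" | failing_credreqs
printf '%s\n' "$sample" | iam_timeouts
```

      On this sample it lists openshift-cloud-credential-operator/openshift-machine-api-aws (the info line is ignored) and counts one IAM timeout.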

      Actual results:

          Upgrade fails.

      Expected results:

          Upgrade succeeds.

      Additional info:

      Must-gather: https://drive.google.com/file/d/1v--I5ghJvVBVvnW9hwW3DfAxiHo-G33G/view?usp=sharing

              Joel Speed (joelspeed)
              Huali Liu (huliu@redhat.com)
              Zhaohua Sun