OpenShift Bugs / OCPBUGS-20369

Worker CSRs are pending, so no worker nodes are available

    • Important
    • No
    • CLOUD Sprint 243, CLOUD Sprint 244
    • 2
    • Rejected
    • False
    • Previously, in certain proxied environments, the Amazon Web Services (AWS) metadata service might not have been present on initial startup, and might have only been available shortly after startup. The kubelet hostname fetching did not account for this delay and, consequently, the node would fail to boot because it would not have a valid hostname. This update ensures that the hostname fetching script retries on failure for some time. As a result, inaccessibility of the metadata service is tolerated for a short period of time. (link:https://issues.redhat.com/browse/OCPBUGS-20369[*OCPBUGS-20369*])
    • Bug Fix
    • Done
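
The retry behaviour described in the release note can be sketched roughly as follows. This is a minimal illustration, not the actual hostname fetching script: the function names, the attempt count, and the delay are all hypothetical, and the metadata URL assumes unauthenticated IMDSv1-style access, which the real script may not use.

```python
import time
import urllib.request
from typing import Callable

# Assumption: IMDSv1-style unauthenticated endpoint; the real script may differ.
METADATA_URL = "http://169.254.169.254/latest/meta-data/local-hostname"


def fetch_from_metadata(timeout: float = 2.0) -> str:
    """Fetch the instance's local hostname from the metadata service."""
    with urllib.request.urlopen(METADATA_URL, timeout=timeout) as resp:
        return resp.read().decode().strip()


def fetch_hostname_with_retry(
    fetch: Callable[[], str] = fetch_from_metadata,
    attempts: int = 10,          # hypothetical value
    delay_seconds: float = 5.0,  # hypothetical value
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch until it returns a non-empty hostname or attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            hostname = fetch()
            if hostname:
                return hostname
            last_error = ValueError("metadata service returned an empty hostname")
        except OSError as err:  # urllib.error.URLError is a subclass of OSError
            last_error = err
        if attempt < attempts:
            # Wait before the next attempt; the service may appear shortly after boot.
            sleep(delay_seconds)
    raise RuntimeError(f"no hostname after {attempts} attempts") from last_error
```

Passing the fetcher and the sleep function as parameters keeps the retry loop testable without network access. A node that never reaches the metadata service still fails, but only after the retry window has elapsed.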

      Description of problem:

      Worker CSRs are pending, so no worker nodes are available.

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-10-06-234925

      How reproducible:

      Always

      Steps to Reproduce:

      Create a cluster with the profile aws-c2s-ipi-disconnected-private-fips.

      Actual results:

      Worker CSRs are pending.

      Expected results:

      Workers should be up and running, with all CSRs approved.

      Additional info:

      “failed to find machine for node ip-10-143-1-120” appears in the logs of cluster-machine-approver.

      It seems the node name should be fully qualified, like “ip-10-143-1-120.ec2.internal”.

      The check fails here: https://github.com/openshift/cluster-machine-approver/blob/master/pkg/controller/csr_check.go#L263
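
To illustrate why a short node name would produce this error, here is a small sketch. It assumes the approver finds a machine by exact comparison of the node name against each machine's internal DNS names; that is an assumption about the linked check, not a copy of it. The function name and the machine name "worker-a" are hypothetical.

```python
from typing import Dict, List, Optional


def find_machine_for_node(
    node_name: str, machines: Dict[str, List[str]]
) -> Optional[str]:
    """Return the machine whose internal DNS names contain node_name exactly."""
    for machine_name, dns_names in machines.items():
        if node_name in dns_names:
            return machine_name
    return None  # corresponds to the "failed to find machine for node" case


# Hypothetical machine record keyed by name, with its internal DNS names.
machines = {"worker-a": ["ip-10-143-1-120.ec2.internal"]}

short_match = find_machine_for_node("ip-10-143-1-120", machines)
full_match = find_machine_for_node("ip-10-143-1-120.ec2.internal", machines)
```

Under that assumption, `short_match` is `None` because the short hostname is not an exact match, while `full_match` is `"worker-a"`. This is consistent with the CSR staying pending when the kubelet registers without the domain suffix.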

       

      Must-gather - https://drive.google.com/file/d/15tz9TLdTXrH6bSBSfhlIJ1l_nzeFE1R3/view?usp=sharing

      cluster - https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/238922/

      template for installation - https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-fips-c2s-ci

       

      cc yunjiang-1 rhn-support-zhsun 

            joelspeed Joel Speed
            rh-ee-miyadav Milind Yadav
            Yunfei Jiang Yunfei Jiang
