OpenShift Bugs: OCPBUGS-15556

[Baremetal Workers][OSP 16.2] Worker machines stuck in the 'Provisioned' phase


    • Critical
    • No
    • Sprint: ShiftStack Sprint 240, ShiftStack Sprint 241
    • 2
    • Rejected
    • False

      Description of problem:

      Installing OCP 4.10 or 4.11 on RHOS 16.2 with baremetal workers fails: the worker machines remain stuck in the 'Provisioned' phase.

      The journal logs show the baremetal nodes failing to fetch their metadata from OpenStack:

      Jun 28 07:16:49 ostest-qkqjx-master-1 hyperkube[1688]: I0628 07:16:49.187526    1688 csi_plugin.go:1021] Failed to contact API server when waiting for CSINode publishing: Get "https://api-int.ostest.shiftstack.com:6443/apis/storage.k8s.io/v1/csinodes/ostest-qkqjx-master-1": dial tcp: lookup api-int.ostest.shiftstack.com on 10.46.0.31:53: no such host
      Jun 28 07:16:49 ostest-qkqjx-master-1 hyperkube[1688]: E0628 07:16:49.313614    1688 kubelet_node_status.go:94] "Unable to register node with API server" err="Post \"https://api-int.ostest.shiftstack.com:6443/api/v1/nodes\": dial tcp: lookup api-int.ostest.shiftstack.com on 10.46.0.31:53: no such host" node="ostest-qkqjx-master-1"
      Jun 28 09:18:19 ostest-qkqjx-worker-0-q4gmn hyperkube[66773]: I0628 09:18:19.723093   66773 cloud_request_manager.go:115] "Node addresses from cloud provider for node not collected" nodeName=ostest-qkqjx-worker-0-q4gmn err="error fetching http://169.254.169.254/2009-04-04/meta-data/local-ipv4: Get \"http://169.254.169.254/2009-04-04/meta-data/local-ipv4\": dial tcp 169.254.169.254:80: connect: connection refused"
      Jun 28 09:18:19 ostest-qkqjx-worker-0-q4gmn hyperkube[66773]: E0628 09:18:19.723417   66773 kubelet.go:2375] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?"
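      The lines above mix three distinct failure classes: DNS resolution of api-int, the refused connection to the metadata service, and the missing CNI configuration. When journals from several nodes are collected, a minimal Python sketch like the one below could bucket each line by class per node. The class names and regular expressions are assumptions matched to the messages quoted in this report, not a kubelet-defined taxonomy.

```python
import re
from collections import Counter

# Hypothetical failure classes; each regex matches error text seen in the
# journal lines quoted in this report.
PATTERNS = [
    ("dns_lookup_failed", re.compile(r"lookup \S+ on \S+: no such host")),
    ("metadata_refused", re.compile(r"169\.254\.169\.254:80: connect: connection refused")),
    ("cni_not_ready", re.compile(r"No CNI configuration file")),
]


def classify(line: str) -> str:
    """Return the first matching failure class for a journal line, or 'other'."""
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return "other"


def node_of(line: str) -> str:
    """Extract the hostname from a 'Mon DD HH:MM:SS <host> <proc>[pid]: ...' journal line."""
    parts = line.split()
    return parts[3] if len(parts) > 3 else ""


def summarize(lines):
    """Count (node, failure class) pairs across an iterable of journal lines."""
    return Counter((node_of(line), classify(line)) for line in lines)
```

      Matching on the error text rather than on the emitting component (csi_plugin, kubelet_node_status, cloud_request_manager) keeps the sketch independent of which code path logged the failure.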

      Debug:

      • With the same configuration and environment, the OCP 4.12 installer (using the same network type) completes successfully.
      • The OpenStack metadata IP (169.254.169.254) is reachable from the worker via ping, but fetching the metadata with curl fails:
      [root@ostest-4ccwn-worker-0-tg2dh core]# ping -c 3 169.254.169.254
      PING 169.254.169.254 (169.254.169.254) 56(84) bytes of data.
      64 bytes from 169.254.169.254: icmp_seq=1 ttl=64 time=1.15 ms
      64 bytes from 169.254.169.254: icmp_seq=2 ttl=64 time=0.791 ms
      64 bytes from 169.254.169.254: icmp_seq=3 ttl=64 time=0.249 ms

      --- 169.254.169.254 ping statistics ---
      3 packets transmitted, 3 received, 0% packet loss, time 2033ms
      rtt min/avg/max/mdev = 0.249/0.730/1.152/0.372 ms
      
      [root@ostest-4ccwn-worker-0-tg2dh core]# curl http://169.254.169.254
      curl: (7) Failed to connect to 169.254.169.254 port 80: Connection refused
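      Because ICMP to 169.254.169.254 succeeds while curl gets "Connection refused", the failure appears to be specific to TCP port 80 on the metadata address rather than to general reachability. A minimal Python sketch of that distinction probes the port directly and separates an active refusal from a silent timeout. The function name and the 3-second default timeout are illustrative assumptions.

```python
import socket

# Metadata endpoint taken from the kubelet log and curl transcript in this report.
METADATA_HOST = "169.254.169.254"
METADATA_PORT = 80


def probe_tcp(host: str = METADATA_HOST, port: int = METADATA_PORT,
              timeout: float = 3.0) -> str:
    """Attempt a TCP connect and classify the outcome.

    Returns 'open' if the connection is accepted, 'refused' if it is actively
    refused (as in the curl error above), 'timeout' if nothing answers within
    `timeout` seconds, or 'error:<errno>' for any other socket error.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error:{exc.errno}"
```

      Run on the affected worker, `probe_tcp()` would be expected to return `'refused'`, consistent with the curl output above. A `'timeout'` would instead suggest that traffic to port 80 is being dropped somewhere on the path.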

      This behavior resembles the scenario in Bug 2213862; however, in our case we are already applying the workaround from https://bugzilla.redhat.com/show_bug.cgi?id=2213862#c11.

      Version-Release number of selected component (if applicable):

      OCP 4.10/4.11 on top of RHOS-16.2-RHEL-8-20230526.n.1 (the current OSP passed_phase2)

      How reproducible:

      Always

      Steps to Reproduce:

      Run the OpenShift 4.10 or 4.11 installer with baremetal workers on OSP 16.2.

      Actual results:

      Worker machines are stuck in the 'Provisioned' phase.

      Expected results:

      Worker machines are provisioned and move to the 'Running' phase.

      Additional info:

      * The issue does not reproduce with OCP 4.12/4.13.
      * The issue does not reproduce with OCP 4.10 or 4.11 on OSP 16.1.6.
      * FYI, installing OCP 4.12/4.13 with baremetal workers (BMWs) failed until OSP 16.2.5 due to https://bugzilla.redhat.com/show_bug.cgi?id=2007120; that bug is now in the Verified state.

            mdemaced Maysa De Macedo Souza
            rhn-support-imatza Itay Matza
            Itshak Brown