Bug
Resolution: Duplicate
Priority: Major
Severity: Critical
Affects Version(s): 4.10, 4.11
Sprint: ShiftStack Sprint 240, ShiftStack Sprint 241
Description of problem:
Installing OCP 4.10/4.11 on top of RHOS-16.2 with baremetal workers fails with worker machines stuck in the 'Provisioned' phase.
On the affected nodes, the kubelet cannot resolve the internal API endpoint and cannot pull the instance metadata from OpenStack:
Jun 28 07:16:49 ostest-qkqjx-master-1 hyperkube[1688]: I0628 07:16:49.187526 1688 csi_plugin.go:1021] Failed to contact API server when waiting for CSINode publishing: Get "https://api-int.ostest.shiftstack.com:6443/apis/storage.k8s.io/v1/csinodes/ostest-qkqjx-master-1": dial tcp: lookup api-int.ostest.shiftstack.com on 10.46.0.31:53: no such host
Jun 28 07:16:49 ostest-qkqjx-master-1 hyperkube[1688]: E0628 07:16:49.313614 1688 kubelet_node_status.go:94] "Unable to register node with API server" err="Post \"https://api-int.ostest.shiftstack.com:6443/api/v1/nodes\": dial tcp: lookup api-int.ostest.shiftstack.com on 10.46.0.31:53: no such host" node="ostest-qkqjx-master-1"
Jun 28 09:18:19 ostest-qkqjx-worker-0-q4gmn hyperkube[66773]: I0628 09:18:19.723093 66773 cloud_request_manager.go:115] "Node addresses from cloud provider for node not collected" nodeName=ostest-qkqjx-worker-0-q4gmn err="error fetching http://169.254.169.254/2009-04-04/meta-data/local-ipv4: Get \"http://169.254.169.254/2009-04-04/meta-data/local-ipv4\": dial tcp 169.254.169.254:80: connect: connection refused"
Jun 28 09:18:19 ostest-qkqjx-worker-0-q4gmn hyperkube[66773]: E0628 09:18:19.723417 66773 kubelet.go:2375] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?"
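For triage, the two failures above can be confirmed directly on an affected node (illustrative commands; journalctl and getent are assumed to be available on the RHCOS image):
[root@ostest-qkqjx-worker-0-q4gmn core]# journalctl -u kubelet | grep -E 'api-int|169.254.169.254'
[root@ostest-qkqjx-worker-0-q4gmn core]# getent hosts api-int.ostest.shiftstack.com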
Debug:
- With the same configuration and environment, the OCP 4.12 installer (same network type) completes successfully.
- The metadata service IP (169.254.169.254) is reachable from the worker, but curl to the metadata endpoint is refused:
[root@ostest-4ccwn-worker-0-tg2dh core]# ping -c 3 169.254.169.254
PING 169.254.169.254 (169.254.169.254) 56(84) bytes of data.
64 bytes from 169.254.169.254: icmp_seq=1 ttl=64 time=1.15 ms
64 bytes from 169.254.169.254: icmp_seq=2 ttl=64 time=0.791 ms
64 bytes from 169.254.169.254: icmp_seq=3 ttl=64 time=0.249 ms

--- 169.254.169.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2033ms
rtt min/avg/max/mdev = 0.249/0.730/1.152/0.372 ms
[root@ostest-4ccwn-worker-0-tg2dh core]# curl http://169.254.169.254
curl: (7) Failed to connect to 169.254.169.254 port 80: Connection refused
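To rule out a path-specific problem, the same check can be run against the standard OpenStack metadata path (illustrative command; /openstack/latest/meta_data.json is the generic OpenStack metadata endpoint):
[root@ostest-4ccwn-worker-0-tg2dh core]# curl -v http://169.254.169.254/openstack/latest/meta_data.json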
This behavior resembles the scenario in Bug 2213862, but in our case we are already applying the workaround from https://bugzilla.redhat.com/show_bug.cgi?id=2213862#c11.
Version-Release number of selected component (if applicable):
OCP 4.10/4.11 on top of RHOS-16.2-RHEL-8-20230526.n.1 (the current OSP passed_phase2 build)
How reproducible:
Always
Steps to Reproduce:
Run the 4.10/4.11 OpenShift installer with baremetal workers on top of OSP 16.2.
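For context, a minimal install-config.yaml sketch of this setup (all values are placeholders for this environment; the baremetal flavor name and external network are assumptions):

apiVersion: v1
baseDomain: shiftstack.com
metadata:
  name: ostest
compute:
- name: worker
  replicas: 2
  platform:
    openstack:
      type: <baremetal-flavor>   # Nova flavor backed by Ironic baremetal nodes
controlPlane:
  name: master
  replicas: 3
platform:
  openstack:
    cloud: <clouds.yaml entry>
    externalNetwork: <external network>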
Actual results:
Worker machines are stuck in the 'Provisioned' phase.
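The stuck phase is visible from the Machine API on the cluster side (illustrative; run from a host with the cluster kubeconfig):
$ oc get machines -n openshift-machine-api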
Expected results:
Worker machines are provisioned and move to the 'Running' phase.
Additional info:
* The issue is not reproduced with OCP 4.12/4.13.
* The issue is not reproduced when using OCP 4.10 and 4.11 on top of OSP 16.1.6.
* FYI, installation of OCP 4.12/4.13 with baremetal workers failed until OSP 16.2.5 due to https://bugzilla.redhat.com/show_bug.cgi?id=2007120. That bug is in Verified state now.