Type: Bug
Resolution: Not a Bug
Priority: Normal
Version: 4.11.0
Category: Quality / Stability / Reliability
Severity: Important
Status: Rejected
Description of problem:
Attempting to perform an IPI baremetal deployment using NMState to pre-configure the external and provisioning NICs on the cluster nodes. The master nodes deploy, but the worker nodes never do. To get the deployment to complete so I could investigate, I moved the networking services to the masters and made them double as worker nodes; the IPI deployment then finished. Looking at the deployment, the workers appear to fail because the provisioning IP fails to get configured on the master nodes:

[kni@gvs01 clusterconfigs]$ oc -n openshift-machine-api get pods
NAME                                           READY   STATUS                  RESTARTS       AGE
cluster-autoscaler-operator-66466545bd-n9q2j   2/2     Running                 0              29m
cluster-baremetal-operator-6966c498d7-j6qmq    2/2     Running                 0              29m
machine-api-controllers-777cc7c6d5-lmsmd       7/7     Running                 0              5m51s
machine-api-operator-6d8cf76747-t454f          2/2     Running                 0              29m
metal3-6647f79d64-5rvt8                        0/7     Init:CrashLoopBackOff   5 (2m1s ago)   5m25s
metal3-image-cache-5h8q6                       1/1     Running                 0              5m4s
metal3-image-cache-t5jd4                       1/1     Running                 0              5m4s
metal3-image-cache-tkkf8                       1/1     Running                 0              5m4s
metal3-image-customization-6c54bdcd96-cg295    1/1     Running                 0              4m33s

[kni@gvs01 clusterconfigs]$ oc -n openshift-machine-api logs metal3-6647f79d64-5rvt8
Defaulted container "metal3-baremetal-operator" out of: metal3-baremetal-operator, metal3-httpd, metal3-ironic, metal3-ramdisk-logs, metal3-ironic-inspector, metal3-static-ip-manager, metal3-dnsmasq, metal3-static-ip-set (init), machine-os-images (init)
Error from server (BadRequest): container "metal3-baremetal-operator" in pod "metal3-6647f79d64-5rvt8" is waiting to start: PodInitializing

[kni@gvs01 clusterconfigs]$ oc -n openshift-machine-api logs metal3-6647f79d64-5rvt8 -c metal3-static-ip-set
+ '[' -z 172.22.10.3/24 ']'
+ '[' -z eno12399 ']'
+ '[' -n eno12399 ']'
++ ip -o addr show dev eno12399 scope global
+ [[ -n 4: eno12399 inet 172.22.10.21/24 brd 172.22.10.255 scope global noprefixroute eno12399\ valid_lft forever preferred_lft forever ]]
+ ip -o addr show dev eno12399 scope global
+ grep -q 172.22.10.3/24
+ echo 'ERROR: "eno12399" is already set to ip address belong to different subset than "172.22.10.3/24"'
ERROR: "eno12399" is already set to ip address belong to different subset than "172.22.10.3/24"
+ exit 1
[kni@gvs01 clusterconfigs]$

I think the ERROR message does not match the test that is being done. The test only checks whether the exact provisioning IP appears on the interface, while the message claims the interface's address is in a different subnet:

    if ! ip -o addr show dev "${PROVISIONING_INTERFACE}" scope global | grep -q "${PROVISIONING_IP//::*/}" ; then
        echo "ERROR: \"$PROVISIONING_INTERFACE\" is already set to ip address belong to different subset than \"$PROVISIONING_IP\""
        exit 1
    fi

Logging into a master node shows the NIC was configured properly using NMState:

[core@openshift-master-0 ~]$ ip a s eno12399
4: eno12399: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b4:96:91:cc:07:9c brd ff:ff:ff:ff:ff:ff
    inet 172.22.10.20/24 brd 172.22.10.255 scope global noprefixroute eno12399
       valid_lft forever preferred_lft forever
[core@openshift-master-0 ~]$

And it is functional:

[root@openshift-master-0 ~]# ping -I eno12399 172.22.10.21
PING 172.22.10.21 (172.22.10.21) from 172.22.10.20 eno12399: 56(84) bytes of data.
64 bytes from 172.22.10.21: icmp_seq=1 ttl=64 time=0.146 ms
64 bytes from 172.22.10.21: icmp_seq=2 ttl=64 time=0.152 ms
^C
--- 172.22.10.21 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1036ms
rtt min/avg/max/mdev = 0.146/0.149/0.152/0.003 ms
[root@openshift-master-0 ~]#

And routing looks right:

[core@openshift-master-0 ~]$ ip r
default via 192.168.66.1 dev br-ex proto static metric 48
10.128.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.128.0.2
10.128.0.0/14 via 10.128.0.1 dev ovn-k8s-mp0
169.254.169.0/30 via 192.168.66.1 dev br-ex
169.254.169.3 via 10.128.0.1 dev ovn-k8s-mp0
172.22.10.0/24 dev eno12399 proto kernel scope link src 172.22.10.20 metric 100
172.30.0.0/16 via 192.168.66.1 dev br-ex mtu 1400
192.168.66.0/25 dev br-ex proto kernel scope link src 192.168.66.20 metric 48
[core@openshift-master-0 ~]$

As you can see, there is a route matching the PROVISIONING_IP subnet. A sketch of a subnet-based check that would match the error message's wording is shown below.
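For reference, a check that actually matched the error message's wording would compare subnets rather than grep for the exact address. The following is only an illustrative sketch, not the shipped metal3-static-ip-set script; the in_subnet helper is hypothetical and IPv4-only (the real script also handles IPv6, per the ${PROVISIONING_IP//::*/} expansion above):

    #!/bin/bash
    # Hypothetical helper: succeeds when the IPv4 address $1 lies inside the CIDR $2.
    in_subnet() {
        local ip=$1 net=${2%/*} len=${2#*/}
        local a b c d e f g h
        local IFS=.
        read -r a b c d <<< "$ip"
        read -r e f g h <<< "$net"
        local ipn=$((  (a << 24) | (b << 16) | (c << 8) | d ))
        local netn=$(( (e << 24) | (f << 16) | (g << 8) | h ))
        local mask=$(( (0xffffffff << (32 - len)) & 0xffffffff ))
        [ $(( ipn & mask )) -eq $(( netn & mask )) ]
    }

    PROVISIONING_IP="172.22.10.3/24"     # values taken from this report
    PROVISIONING_INTERFACE="eno12399"

    # Fail only when the interface already carries a global IPv4 address in a
    # *different* subnet than PROVISIONING_IP, which is what the message claims.
    while read -r addr; do
        if ! in_subnet "${addr%/*}" "$PROVISIONING_IP"; then
            echo "ERROR: \"$PROVISIONING_INTERFACE\" already has $addr, which is in a different subnet than \"$PROVISIONING_IP\""
            exit 1
        fi
    done < <(ip -o -4 addr show dev "$PROVISIONING_INTERFACE" scope global | awk '{print $4}')

With this logic, the NMState-assigned 172.22.10.21/24 on eno12399 would pass (same /24 as 172.22.10.3/24) and the init container would proceed to add the provisioning IP instead of crash-looping.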
Version-Release number of selected component (if applicable):
4.11.0
How reproducible:
100% of the time
Steps to Reproduce:
1. Create an install-config.yaml for an IPI deployment using a provider network.
2. Use an NMState config (networkConfig) to configure both the baremetal and provisioning NICs on the cluster nodes.
3. Provide provisioningNetworkInterface in the install-config.yaml even though bootMACAddress is provided for the nodes. (The deployment fails with a different error, "can't find suitable interfaces", if you don't provide this, which may be another issue entirely.)
4. Complete the steps detailed in 14.3.8.3. "Optional: Configuring network components to run on the control plane"; otherwise the IPI deployment does not complete at all.
5. Deploy.

When the deployment is complete, investigate the metal3 pods. An illustrative install-config.yaml excerpt for steps 1-3 is shown below.
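For reference, a minimal sketch of the install-config.yaml fragment described in steps 1-3. All host names, MACs, and addresses are placeholders rather than the actual values from this environment (only the interface name eno12399 comes from this report), and unrelated required fields such as bmc are omitted:

    platform:
      baremetal:
        provisioningNetworkInterface: eno12399   # step 3: still provided even with bootMACAddress set
        hosts:
        - name: openshift-worker-0               # placeholder host
          role: worker
          bootMACAddress: 52:54:00:00:00:01      # placeholder MAC
          networkConfig:                         # step 2: NMState config for the provisioning NIC
            interfaces:
            - name: eno12399
              type: ethernet
              state: up
              ipv4:
                enabled: true
                dhcp: false
                address:
                - ip: 172.22.10.30               # placeholder address on the provisioning subnet
                  prefix-length: 24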
Actual results:
Worker nodes never deploy.
Expected results:
Worker nodes deploy.
Additional info:
- relates to HCIDOCS-137 "Update static deploy steps to use virtual media" (Closed)