Type: Bug
Resolution: Not a Bug
Priority: Normal
Version: 4.11.0
Category: Quality / Stability / Reliability
Severity: Important
Status: Rejected
Description of problem:
Attempting to perform an IPI baremetal deployment using NMState to pre-configure the external and provisioning NICs on the cluster nodes. The master nodes deploy, but the worker nodes never do. To get the deployment to complete so I could investigate, I moved the networking services to the masters and made them double as worker nodes; the IPI deployment then finished. Looking at the deployment, the workers appear to fail because the provisioning IP fails to get configured on the master nodes:

[kni@gvs01 clusterconfigs]$ oc -n openshift-machine-api get pods
NAME                                           READY   STATUS                  RESTARTS       AGE
cluster-autoscaler-operator-66466545bd-n9q2j   2/2     Running                 0              29m
cluster-baremetal-operator-6966c498d7-j6qmq    2/2     Running                 0              29m
machine-api-controllers-777cc7c6d5-lmsmd       7/7     Running                 0              5m51s
machine-api-operator-6d8cf76747-t454f          2/2     Running                 0              29m
metal3-6647f79d64-5rvt8                        0/7     Init:CrashLoopBackOff   5 (2m1s ago)   5m25s
metal3-image-cache-5h8q6                       1/1     Running                 0              5m4s
metal3-image-cache-t5jd4                       1/1     Running                 0              5m4s
metal3-image-cache-tkkf8                       1/1     Running                 0              5m4s
metal3-image-customization-6c54bdcd96-cg295    1/1     Running                 0              4m33s

[kni@gvs01 clusterconfigs]$ oc -n openshift-machine-api logs metal3-6647f79d64-5rvt8
Defaulted container "metal3-baremetal-operator" out of: metal3-baremetal-operator, metal3-httpd, metal3-ironic, metal3-ramdisk-logs, metal3-ironic-inspector, metal3-static-ip-manager, metal3-dnsmasq, metal3-static-ip-set (init), machine-os-images (init)
Error from server (BadRequest): container "metal3-baremetal-operator" in pod "metal3-6647f79d64-5rvt8" is waiting to start: PodInitializing

[kni@gvs01 clusterconfigs]$ oc -n openshift-machine-api logs metal3-6647f79d64-5rvt8 -c metal3-static-ip-set
+ '[' -z 172.22.10.3/24 ']'
+ '[' -z eno12399 ']'
+ '[' -n eno12399 ']'
++ ip -o addr show dev eno12399 scope global
+ [[ -n 4: eno12399 inet 172.22.10.21/24 brd 172.22.10.255 scope global noprefixroute eno12399\ valid_lft forever preferred_lft forever ]]
+ ip -o addr show dev eno12399 scope global
+ grep -q 172.22.10.3/24
+ echo 'ERROR: "eno12399" is already set to ip address belong to different subset than "172.22.10.3/24"'
ERROR: "eno12399" is already set to ip address belong to different subset than "172.22.10.3/24"
+ exit 1
[kni@gvs01 clusterconfigs]$

I think the ERROR message does not match the test that is being done. The test only checks whether the exact provisioning IP appears on the interface, while the message claims the interface's address is in a different subnet:

    if ! ip -o addr show dev "${PROVISIONING_INTERFACE}" scope global | grep -q "${PROVISIONING_IP//::*/}" ; then
        echo "ERROR: \"$PROVISIONING_INTERFACE\" is already set to ip address belong to different subset than \"$PROVISIONING_IP\""
        exit 1
    fi

Logging into a master node shows the NIC was configured properly using NMState:

[core@openshift-master-0 ~]$ ip a s eno12399
4: eno12399: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b4:96:91:cc:07:9c brd ff:ff:ff:ff:ff:ff
    inet 172.22.10.20/24 brd 172.22.10.255 scope global noprefixroute eno12399
       valid_lft forever preferred_lft forever
[core@openshift-master-0 ~]$

And it is functional:

[root@openshift-master-0 ~]# ping -I eno12399 172.22.10.21
PING 172.22.10.21 (172.22.10.21) from 172.22.10.20 eno12399: 56(84) bytes of data.
64 bytes from 172.22.10.21: icmp_seq=1 ttl=64 time=0.146 ms
64 bytes from 172.22.10.21: icmp_seq=2 ttl=64 time=0.152 ms
^C
--- 172.22.10.21 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1036ms
rtt min/avg/max/mdev = 0.146/0.149/0.152/0.003 ms
[root@openshift-master-0 ~]#

And routing looks right:

[core@openshift-master-0 ~]$ ip r
default via 192.168.66.1 dev br-ex proto static metric 48
10.128.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.128.0.2
10.128.0.0/14 via 10.128.0.1 dev ovn-k8s-mp0
169.254.169.0/30 via 192.168.66.1 dev br-ex
169.254.169.3 via 10.128.0.1 dev ovn-k8s-mp0
172.22.10.0/24 dev eno12399 proto kernel scope link src 172.22.10.20 metric 100
172.30.0.0/16 via 192.168.66.1 dev br-ex mtu 1400
192.168.66.0/25 dev br-ex proto kernel scope link src 192.168.66.20 metric 48
[core@openshift-master-0 ~]$

As you can see, there is a route matching the PROVISIONING_IP subnet. A sketch of a subnet-based check that would match the error message's wording is shown below.
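For reference, a check that actually matched the error message's wording would compare subnets rather than grep for the exact address. The following is only an illustrative sketch, not the shipped metal3-static-ip-set script; the in_subnet helper is hypothetical and IPv4-only (the real script also handles IPv6, per the ${PROVISIONING_IP//::*/} expansion above):

    #!/bin/bash
    # Hypothetical helper: succeeds when the IPv4 address $1 lies inside the CIDR $2.
    in_subnet() {
        local ip=$1 net=${2%/*} len=${2#*/}
        local a b c d e f g h
        local IFS=.
        read -r a b c d <<< "$ip"
        read -r e f g h <<< "$net"
        local ipn=$((  (a << 24) | (b << 16) | (c << 8) | d ))
        local netn=$(( (e << 24) | (f << 16) | (g << 8) | h ))
        local mask=$(( (0xffffffff << (32 - len)) & 0xffffffff ))
        [ $(( ipn & mask )) -eq $(( netn & mask )) ]
    }

    PROVISIONING_IP="172.22.10.3/24"     # values taken from this report
    PROVISIONING_INTERFACE="eno12399"

    # Fail only when the interface already carries a global IPv4 address in a
    # *different* subnet than PROVISIONING_IP, which is what the message claims.
    while read -r addr; do
        if ! in_subnet "${addr%/*}" "$PROVISIONING_IP"; then
            echo "ERROR: \"$PROVISIONING_INTERFACE\" already has $addr, which is in a different subnet than \"$PROVISIONING_IP\""
            exit 1
        fi
    done < <(ip -o -4 addr show dev "$PROVISIONING_INTERFACE" scope global | awk '{print $4}')

With this logic, the NMState-assigned 172.22.10.21/24 on eno12399 would pass (same /24 as 172.22.10.3/24) and the init container would proceed to add the provisioning IP instead of crash-looping.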
Version-Release number of selected component (if applicable):
4.11.0
How reproducible:
100% of the time
Steps to Reproduce:
1. Create an install-config.yaml for an IPI deployment using a provider network.
2. Use an NMState config (networkConfig) to configure both the baremetal and provisioning NICs on the cluster nodes.
3. Provide provisioningNetworkInterface in the install-config.yaml even though bootMACAddress is provided for the nodes. (The deployment fails with a different error, "can't find suitable interfaces", if you don't provide this, which may be another issue entirely.)
4. Complete the steps detailed in 14.3.8.3. "Optional: Configuring network components to run on the control plane"; otherwise the IPI deployment does not complete at all.
5. Deploy.

When the deployment is complete, investigate the metal3 pods. An illustrative install-config.yaml excerpt for steps 1-3 is shown below.
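For reference, a minimal sketch of the install-config.yaml fragment described in steps 1-3. All host names, MACs, and addresses are placeholders rather than the actual values from this environment (only the interface name eno12399 comes from this report), and unrelated required fields such as bmc are omitted:

    platform:
      baremetal:
        provisioningNetworkInterface: eno12399   # step 3: still provided even with bootMACAddress set
        hosts:
        - name: openshift-worker-0               # placeholder host
          role: worker
          bootMACAddress: 52:54:00:00:00:01      # placeholder MAC
          networkConfig:                         # step 2: NMState config for the provisioning NIC
            interfaces:
            - name: eno12399
              type: ethernet
              state: up
              ipv4:
                enabled: true
                dhcp: false
                address:
                - ip: 172.22.10.30               # placeholder address on the provisioning subnet
                  prefix-length: 24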
Actual results:
Worker nodes never deploy.
Expected results:
Worker nodes deploy.
Additional info:
- relates to HCIDOCS-137 "Update static deploy steps to use virtual media" (Closed)