Details
-
Bug
-
Resolution: Not a Bug
-
Undefined
-
None
-
4.12.z
-
Important
-
No
-
Rejected
-
False
-
Description
Description of problem:
When trying to deploy OCP 4.12.0-0.nightly-2023-04-03-212004, the deployment does not progress because nodes fail to boot from PXE (PXE-E18: Server response timeout).
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2023-04-03-212004
How reproducible:
100%
Steps to Reproduce:
1. Start MNO dual-stack IPI PXE deployment
2. Monitor nodes over IPMI
Actual results:
Deployment eventually fails
Expected results:
Deployment completes successfully
Additional info:
I've been stuck for two days trying to understand the problem. Everything in the environment seems to be configured correctly, but for some reason PXE booting fails.

I've been digging into it from a stand-alone RHEL 9.1 installation (on master-0), and I cannot get the bootstrap VM to respond to DHCP DISCOVER (I've used the nmap scripts for broadcast DHCP and dhcp-discover); dhcpdump segfaults with no results. If an IP address from the 172.22.0.0 subnet is assigned to the ens1f0 interface, the bootstrap VM is reachable.

Provisionhost, which runs the bootstrap VM, is connected over a Mellanox CX4 NIC: port 1 (eno1) is used for the baremetal network bridge and port 2 (eno2) is used for the provisioning network bridge. The MNO cluster is using Intel XXV710 NICs: port 0 (ens1f0) is used for PXE booting and port 1 (ens1f1) is configured for the baremetal network.

On the switch side, the respective ports are configured as follows: baremetal ports are set to use routable VLAN 182 (10.1.219.0/24); provisioning ports are set to use non-routable VLAN 1000.

These are baremetal Dell servers: R740XD (masters) and R740XL (workers). Please advise what else I might be missing?
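Since dhcpdump segfaults, one way to cross-check the nmap results is a small hand-rolled DHCP DISCOVER probe. The following is a minimal sketch, not a verified diagnostic: the function names `build_dhcp_discover` and `probe` are hypothetical, the packet layout follows the standard BOOTP/DHCP format, and `probe` assumes Linux, root privileges, and that a UDP broadcast can be sent from the chosen interface (which may require an address to be assigned first).

```python
import socket
import struct

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options


def build_dhcp_discover(mac: str, xid: int) -> bytes:
    """Build a minimal DHCPDISCOVER payload for the given MAC address."""
    hw = bytes.fromhex(mac.replace(":", ""))
    if len(hw) != 6:
        raise ValueError("expected a 6-byte MAC address")
    zero_ip = b"\x00" * 4
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,        # op: BOOTREQUEST
        1,        # htype: Ethernet
        6,        # hlen: MAC length
        0,        # hops
        xid,      # transaction id, echoed back by the server
        0,        # secs
        0x8000,   # flags: ask the server to reply by broadcast
        zero_ip, zero_ip, zero_ip, zero_ip,  # ciaddr, yiaddr, siaddr, giaddr
        hw,       # chaddr (struct pads it to 16 bytes with zeros)
        b"",      # sname (64 zero bytes)
        b"",      # file (128 zero bytes)
    )
    # Option 53 (message type) = 1 (DISCOVER), then option 255 (end).
    options = bytes([53, 1, 1, 255])
    return header + MAGIC_COOKIE + options


def probe(interface: str, mac: str, xid: int = 0x1234ABCD,
          timeout: float = 5.0) -> bool:
    """Broadcast a DISCOVER on `interface`; True if a matching reply arrives.

    Assumption: Linux with root privileges (port 68 and SO_BINDTODEVICE).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                        interface.encode())
        sock.bind(("", 68))
        sock.settimeout(timeout)
        sock.sendto(build_dhcp_discover(mac, xid), ("255.255.255.255", 67))
        while True:
            try:
                data, _ = sock.recvfrom(4096)
            except socket.timeout:
                return False
            # A BOOTREPLY (op == 2) carrying our transaction id is a match.
            if len(data) >= 8 and data[0] == 2 and \
                    struct.unpack("!I", data[4:8])[0] == xid:
                return True
    finally:
        sock.close()
```

Running `probe("ens1f0", "<MAC of ens1f0>")` on master-0 while capturing traffic on the provisioning bridge of the provisionhost would show whether the DISCOVER crosses VLAN 1000 at all. A timeout here with no packet seen on the bridge would point at the switch or VLAN configuration, not at the DHCP server in the bootstrap VM.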