OCPBUGS-11367

[OCP 4.12][IPI deployment] PXE boot fails


Details

    • Important
    • No
    • Rejected
    • False

    Description

      Description of problem:

      When trying to deploy OCP 4.12.0-0.nightly-2023-04-03-212004, the deployment does not progress because the nodes fail to boot from PXE (PXE-E18: Server response timeout).

      Version-Release number of selected component (if applicable):

      4.12.0-0.nightly-2023-04-03-212004

      How reproducible:

      100%

      Steps to Reproduce:

      1. Start an MNO dual-stack IPI PXE deployment
      2. Monitor the nodes over IPMI

      Actual results:

      Deployment eventually fails

      Expected results:

      Deployment completes successfully

      Additional info:

      I've been stuck for two days trying to understand what the problem is. Everything in the environment seems to be configured correctly, but for some reason PXE booting fails.
      
      I've been digging into it from a stand-alone RHEL 9.1 installation (on master-0), and I cannot get the bootstrap VM to respond to a DHCP DISCOVER (I've used the nmap broadcast-dhcp-discover and dhcp-discover scripts). dhcpdump segfaults without producing any results.
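To take nmap and dhcpdump out of the picture, the same probe can be sketched with only the Python standard library. This is a minimal illustration, not a tool used in this report: the function names `build_dhcp_discover` and `probe_dhcp` and the transaction ID are made up for the example, the packet layout follows RFC 2131, and the probe assumes Linux (`SO_BINDTODEVICE`), root privileges, and that nothing else is bound to UDP port 68. Whether the kernel sends the broadcast from an interface that has no address configured may depend on the host setup.

```python
import socket
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"  # fixed value defined by RFC 2131


def build_dhcp_discover(mac: str, xid: int) -> bytes:
    """Return a minimal DHCPDISCOVER payload for the given client MAC."""
    chaddr = bytes.fromhex(mac.replace(":", ""))
    if len(chaddr) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,                   # op: BOOTREQUEST
        1,                   # htype: Ethernet
        6,                   # hlen: MAC address length
        0,                   # hops
        xid,                 # transaction ID chosen by the client
        0,                   # secs
        0x8000,              # flags: ask the server to reply via broadcast
        b"", b"", b"", b"",  # ciaddr, yiaddr, siaddr, giaddr (zero-padded)
        chaddr,              # struct pads this field to 16 bytes with NULs
        b"",                 # sname (unused)
        b"",                 # file (unused)
    )
    options = (
        bytes([53, 1, 1])          # option 53: message type = DISCOVER
        + bytes([55, 3, 1, 3, 6])  # option 55: request netmask, router, DNS
        + bytes([255])             # end-of-options marker
    )
    return header + DHCP_MAGIC_COOKIE + options


def probe_dhcp(iface: str, mac: str, timeout: float = 5.0) -> bool:
    """Broadcast one DISCOVER on iface; True if a matching reply arrives."""
    xid = 0x12345678  # arbitrary transaction ID for this sketch
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        # Linux-only: force traffic through the chosen interface.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, iface.encode())
        sock.bind(("", 68))
        sock.settimeout(timeout)
        sock.sendto(build_dhcp_discover(mac, xid), ("255.255.255.255", 67))
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return False
    # A reply is a BOOTREPLY (op 2) that echoes our transaction ID.
    return (
        len(data) >= 240
        and data[0] == 2
        and data[4:8] == struct.pack("!I", xid)
        and data[236:240] == DHCP_MAGIC_COOKIE
    )
```

Running `probe_dhcp("ens1f0", mac)` as root, with `mac` set to the MAC address of ens1f0, returns `False` when no DHCP server answers within the timeout, which would be consistent with the PXE-E18 symptom on the provisioning network.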
      
      If an IP address from the 172.22.0.0 subnet is assigned to the ens1f0 interface, the bootstrap VM is reachable.
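When assigning a static address for this kind of reachability test, it helps to confirm that the address really falls inside the provisioning subnet and not the baremetal one. A small sketch with Python's `ipaddress` module follows; the `/24` prefix for 172.22.0.0 is an assumption (the report does not state the prefix length), and the helper name `in_subnet` and the sample addresses are hypothetical.

```python
import ipaddress

# Assumption: the report gives only "172.22.0.0", so the /24 prefix is a guess.
PROVISIONING_CIDR = "172.22.0.0/24"
# The baremetal subnet is stated in the report (VLAN 182).
BAREMETAL_CIDR = "10.1.219.0/24"


def in_subnet(address: str, cidr: str) -> bool:
    """Return True if address belongs to the network described by cidr."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    candidate = "172.22.0.50"  # hypothetical test address for ens1f0
    print(candidate, "in provisioning subnet:", in_subnet(candidate, PROVISIONING_CIDR))
    print(candidate, "in baremetal subnet:", in_subnet(candidate, BAREMETAL_CIDR))
```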
      
      The provisioning host that runs the bootstrap VM is connected through a Mellanox CX4 NIC: port 1 (eno1) is used for the baremetal network bridge and port 2 (eno2) is used for the provisioning network bridge.
      
      The MNO cluster nodes use Intel XXV710 NICs: port 0 (ens1f0) is used for PXE booting and port 1 (ens1f1) is configured for the baremetal network.
      
      On the switch side, the respective ports are configured as follows:
      baremetal ports use routable VLAN 182 (10.1.219.0/24)
      provisioning ports use non-routable VLAN 1000
      
      These are bare-metal Dell servers: R740XD (masters) and R740XL (workers).
      
      Please advise on what else I might be missing.

       

          People

            bnemec@redhat.com Benjamin Nemec
            agurenko@redhat.com Alexander Gurenko
            Pedro Jose Amoedo Martinez
            Alexander Gurenko
