Bug
Resolution: Unresolved
Major
4.18.z
Description of problem:
During install, one or more systems have a problem assigning the correct interface name. For example, iDRAC shows interface ens2f0 on slot 1 port 1, whereas on the system it shows ens2f0 as slot 1 port 1 via MAC address. The system with the different interface assignment cannot join the cluster and eventually becomes unreachable (loses its IP address).
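One way to confirm the mismatch on an affected node is to compare the name-to-MAC mapping reported by the OS with the mapping shown in iDRAC. The following is a minimal sketch, assuming a Linux host that exposes interfaces under /sys/class/net. The function names and the expected MAC value are hypothetical placeholders, not data from this report.

```python
from pathlib import Path


def read_iface_macs(sys_net="/sys/class/net"):
    """Return {interface_name: mac} as reported by the kernel via sysfs."""
    base = Path(sys_net)
    if not base.is_dir():
        return {}
    macs = {}
    for entry in sorted(base.iterdir()):
        addr = entry / "address"
        # Skip entries that are not interfaces (no "address" attribute).
        if addr.is_file():
            macs[entry.name] = addr.read_text().strip().lower()
    return macs


def diff_mappings(expected, observed):
    """List interfaces whose observed MAC differs from the expected one.

    `expected` would be transcribed by hand from the BMC inventory.
    Names missing on the host are reported with an observed MAC of None.
    """
    mismatches = []
    for name, mac in sorted(expected.items()):
        seen = observed.get(name)
        if seen != mac.lower():
            mismatches.append((name, mac.lower(), seen))
    return mismatches


if __name__ == "__main__":
    # Placeholder value: replace with the MACs shown in the BMC inventory.
    expected = {"ens2f0": "00:00:00:00:00:01"}
    for name, want, got in diff_mappings(expected, read_iface_macs()):
        print(f"{name}: expected {want}, observed {got}")
```

Running this on each of the three nodes and comparing the results would show whether the same name lands on a different physical port on the failing host.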
Version-Release number of selected component (if applicable):
OCP 4.18.4, RHEL 4.9
How reproducible:
The cluster installation fails at least 9 out of 10 times
Steps to Reproduce:
1. ipi install initiated
2. bootstrap initially online
3. installation fails
Actual results:
Expected results:
successful ocp install
Additional info:
Customer has installed the same configuration on other systems without problems. This is the first install with Dell AMD servers. The interface assignment for the servers is not predictable/consistent. Customer tried initially with bonds but then removed the bonds to simplify the situation and ran into the same problem.

ocp version 4.18.9
redfish
3 node bmh cluster
Only one is accessible

oc get no -owide
master-0.dellamd.mavdallab.com Ready control-plane,master,worker 20m v1.31.7 10.69.26.97

bootstrap 10.69.26.219/10.69.26.95(api)
master-1.dellamd.mavdallab.com 10.69.26.93 - comes online then drops
master-0.dellamd.mavdallab.com 10.69.26.97 - online after reboot
master-2.dellamd.mavdallab.com 10.69.26.94 - never comes online

May 02 18:03:06 localhost.localdomain baremetal-operator[5558]: {"level":"info","ts":1746208986.411257,"logger":"controllers.BareMetalHost.host_config_data","msg":"PreprovisioningNetworkData networkData key is not set, returning empty data","baremetalhost":{"name":"master-1.dellamd.mavdallab.com","namespace":"openshift-machine-api"},"provisioningState":"provisioned"}

time="2025-05-01T11:36:24-05:00" level=info msg=" baremetalhost: master-0.dellamd.mavdallab.com: uninitialized"
time="2025-05-01T11:36:25-05:00" level=info msg=" baremetalhost: master-1.dellamd.mavdallab.com: uninitialized"
time="2025-05-01T11:36:25-05:00" level=info msg=" baremetalhost: master-2.dellamd.mavdallab.com: uninitialized"
time="2025-05-01T11:36:41-05:00" level=info msg=" baremetalhost: master-1.dellamd.mavdallab.com: registering"
time="2025-05-01T11:36:42-05:00" level=info msg=" baremetalhost: master-0.dellamd.mavdallab.com: registering"
time="2025-05-01T11:36:42-05:00" level=info msg=" baremetalhost: master-2.dellamd.mavdallab.com: registering"
time="2025-05-01T11:38:36-05:00" level=info msg=" baremetalhost: master-0.dellamd.mavdallab.com: inspecting"
time="2025-05-01T11:38:36-05:00" level=info msg=" baremetalhost: master-2.dellamd.mavdallab.com: inspecting"
time="2025-05-01T11:38:36-05:00" level=info msg=" baremetalhost: master-1.dellamd.mavdallab.com: inspecting"
time="2025-05-01T11:51:45-05:00" level=info msg=" baremetalhost: master-1.dellamd.mavdallab.com: preparing"
time="2025-05-01T11:51:45-05:00" level=info msg=" baremetalhost: master-1.dellamd.mavdallab.com: available"
time="2025-05-01T11:51:45-05:00" level=info msg=" baremetalhost: master-1.dellamd.mavdallab.com: provisioning"
time="2025-05-01T12:07:58-05:00" level=info msg=" baremetalhost: master-1.dellamd.mavdallab.com: provisioned"
time="2025-05-01T12:19:24-05:00" level=info msg=" baremetalhost: master-2.dellamd.mavdallab.com: preparing"
time="2025-05-01T12:19:25-05:00" level=info msg=" baremetalhost: master-2.dellamd.mavdallab.com: available"
time="2025-05-01T12:19:25-05:00" level=info msg=" baremetalhost: master-2.dellamd.mavdallab.com: provisioning"
time="2025-05-01T12:20:54-05:00" level=info msg=" baremetalhost: master-0.dellamd.mavdallab.com: preparing"
time="2025-05-01T12:20:55-05:00" level=info msg=" baremetalhost: master-0.dellamd.mavdallab.com: available"
time="2025-05-01T12:20:55-05:00" level=info msg=" baremetalhost: master-0.dellamd.mavdallab.com: provisioning"
time="2025-05-01T12:34:58-05:00" level=error msg="Cluster operator authentication Degraded is True with

omc get machines -A
NAMESPACE NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
openshift-machine-api dellamd-kzqtm-master-0 Running 1h 2025-05-02T17:14:28Z master-0.dellamd.mavdallab.com baremetalhost:///openshift-machine-api/master-0.dellamd.mavdallab.com/17653a53-d2c3-4ce8-bfcf-a5d06d311e2f
openshift-machine-api dellamd-kzqtm-master-1 Provisioned 1h 2025-05-02T17:14:29Z baremetalhost:///openshift-machine-api/master-1.dellamd.mavdallab.com/ecbd117c-a598-4a91-8305-bee503e8bbc3
openshift-machine-api dellamd-kzqtm-master-2 Provisioning 1h 2025-05-02T17:14:29Z

Looking at sosreport-master-0-04131365-2025-05-02-gdqnsui

May 02 17:52:06 localhost systemd-udevd[2928]: ens5f0: Failed to rename network interface 6 from 'eth2' to 'ens5f0': File exists
May 02 17:52:06 localhost systemd-udevd[2980]: ens2f1: Failed to rename network interface 9 from 'eth5' to 'ens2f1': File exists
May 02 17:52:06 localhost systemd-udevd[3025]: ens2f0: Failed to rename network interface 10 from 'eth6' to 'ens2f0': File exists
May 02 17:52:06 localhost systemd-udevd[3731]: ens5f1: Failed to rename network interface 7 from 'eth3' to 'ens5f1': File exists

udev is trying to rename some network interfaces to names that are already in use (ens2f* and ens5f*). This seems to match the issue description for https://access.redhat.com/solutions/7112603 and https://issues.redhat.com/browse/RHEL-44630 . From the Jira issue, I can see the lspci output ("sos_commands/pci/lspci_-tv" in the sosreport) shows the same structure with two cards hanging from the same IOMMU root complex:

-+-[0000:c0]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 14a4
 |           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 149e
 |           +-00.3  Advanced Micro Devices, Inc. [AMD] Device 14a6
 |           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 149f
 |           +-01.1-[c4]--+-00.0  Intel Corporation Ethernet Controller E810-XXV for SFP
 |           |            \-00.1  Intel Corporation Ethernet Controller E810-XXV for SFP
 |           +-01.2-[c5]--+-00.0  Intel Corporation Ethernet Controller E810-XXV for SFP
 |           |            \-00.1  Intel Corporation Ethernet Controller E810-XXV for SFP
 |           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 149f
 |           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 149f
...
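To check other sosreports for the same symptom, the udev rename failures quoted in this report can be extracted from a journal dump with a short script. This is a minimal sketch: the regular expression is written against the four log lines in this report only, and the names `RENAME_FAILED`, `rename_failures`, and `failed_targets` are illustrative.

```python
import re

# Pattern derived from the systemd-udevd lines quoted in this report.
RENAME_FAILED = re.compile(
    r"Failed to rename network interface (?P<index>\d+) "
    r"from '(?P<old>[^']+)' to '(?P<new>[^']+)': (?P<reason>.+?)\s*$"
)

SAMPLE = [
    "May 02 17:52:06 localhost systemd-udevd[2928]: ens5f0: Failed to rename network interface 6 from 'eth2' to 'ens5f0': File exists",
    "May 02 17:52:06 localhost systemd-udevd[2980]: ens2f1: Failed to rename network interface 9 from 'eth5' to 'ens2f1': File exists",
    "May 02 17:52:06 localhost systemd-udevd[3025]: ens2f0: Failed to rename network interface 10 from 'eth6' to 'ens2f0': File exists",
    "May 02 17:52:06 localhost systemd-udevd[3731]: ens5f1: Failed to rename network interface 7 from 'eth3' to 'ens5f1': File exists",
]


def rename_failures(lines):
    """Return (ifindex, old_name, wanted_name, reason) for each failed rename."""
    failures = []
    for line in lines:
        match = RENAME_FAILED.search(line)
        if match:
            failures.append(
                (int(match["index"]), match["old"], match["new"], match["reason"])
            )
    return failures


def failed_targets(failures):
    """Sorted list of the names udev wanted but could not assign."""
    return sorted({wanted for _, _, wanted, _ in failures})


if __name__ == "__main__":
    for index, old, wanted, reason in rename_failures(SAMPLE):
        print(f"ifindex {index}: {old} -> {wanted} failed ({reason})")
    print("contested names:", ", ".join(failed_targets(rename_failures(SAMPLE))))
```

Any interface listed here keeps its kernel name (eth*) instead of the expected ens* name, which is consistent with the host losing its configured IP address.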
is blocked by
OCPBUGS-62739 Need new CoreOS boot image with nmstate-2.2.50 - POST
is duplicated by
OCPBUGS-56001 attemping to install openshift ipi on 3 bmh Dell servers with AMD ethernet cards. One of the systems will not get the correct name for the interface and the install will fail - Closed
links to