Bug
Resolution: Unresolved
Normal
4.18.z
Quality / Stability / Reliability
False
In Progress
Release Note Not Required
N/A
Description of problem:
In a dual-stack OpenShift cluster, ironic-proxy pods on 2 out of 3 ACM hub master nodes only have IPv4 addresses assigned, while the third master node has both IPv4 and IPv6 addresses. This causes failures when customers attempt to discover IPv6-only BMC hosts, as requests originating from nodes without IPv6 addresses fail with error messages indicating inability to locate ISO/IMG image files.
Version-Release number of selected component (if applicable):
ACM 2.13 / OCP 4.18
How reproducible:
Consistently reproducible when:
- The cluster is configured as dual-stack
- ironic-proxy pods land on master nodes without IPv6 addresses
- Attempting to discover IPv6-only BMC hosts
Steps to Reproduce:
1. Deploy a dual-stack OpenShift cluster with ACM
2. Configure IPv6-only BMC hosts
3. Attempt to discover the BMC hosts through metal3/ironic
4. Observe failures when requests originate from nodes without IPv6 addresses
Actual results:
- ironic-proxy pods on some master nodes only have IPv4 addresses
- BMC discovery fails with errors when using IPv6 addresses
- Error messages indicate inability to locate ISO/IMG image files
- Node status shows inconsistent IP address assignments across masters
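To confirm which masters are affected, the node status can be checked for a missing IPv6 address. The following is a minimal sketch, assuming the input is the parsed output of `oc get nodes -o json`; the function name and the sample node names and addresses (documentation ranges) are hypothetical and not taken from the customer environment.

```python
import ipaddress


def nodes_missing_ipv6(node_list):
    """Return the names of nodes that report no IPv6 InternalIP in status.addresses."""
    missing = []
    for node in node_list.get("items", []):
        addresses = node.get("status", {}).get("addresses", [])
        # Collect the IP versions (4 and/or 6) seen among InternalIP entries.
        versions = {
            ipaddress.ip_address(entry["address"]).version
            for entry in addresses
            if entry.get("type") == "InternalIP"
        }
        if 6 not in versions:
            missing.append(node["metadata"]["name"])
    return missing


# Made-up sample shaped like `oc get nodes -o json`.
sample_nodes = {
    "items": [
        {
            "metadata": {"name": "master-0"},
            "status": {"addresses": [{"type": "InternalIP", "address": "192.0.2.10"}]},
        },
        {
            "metadata": {"name": "master-1"},
            "status": {"addresses": [{"type": "InternalIP", "address": "192.0.2.11"}]},
        },
        {
            "metadata": {"name": "master-2"},
            "status": {
                "addresses": [
                    {"type": "InternalIP", "address": "192.0.2.12"},
                    {"type": "InternalIP", "address": "2001:db8::12"},
                    {"type": "Hostname", "address": "master-2"},
                ]
            },
        },
    ]
}

print(nodes_missing_ipv6(sample_nodes))
```

With real data, the dictionary would come from `json.load` on the saved command output rather than from the inline sample.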
Expected results:
- All ironic-proxy pods in a dual-stack cluster should have both IPv4 and IPv6 addresses
- BMC discovery should work consistently regardless of which master node handles the request
- Cluster should maintain consistent dual-stack networking configuration across all nodes
Additional info:
Customer observations:
- IPv4 addresses are issued by DHCP (Infoblox)
- Underlying network uses Cisco ACI with RA messages and SLAAC (but using DHCPv6 due to OpenShift limitations)
- DHCPv6 service running on a dedicated server
- NodeIP configuration seems to initialize before IPv6 address is available
- Cluster was installed as dual-stack from beginning (not converted)

Diagnostic data available:
- ACM must-gather
- OCP must-gather
- sosreport from affected nodes
- Detailed network configuration information

Relevant error from logs (sensitive data redacted):
{"level":"info","ts":1751443199.2524848,"logger":"controllers.BareMetalHost","msg":"using PreprovisioningImage","baremetalhost":{"name":"test","namespace":"baremetal"},"provisioningState":"provisioning","Image":{"ImageURL":"https://assisted-image-service-multicluster-engine.apps.<redacted-domain>/byapikey/<redacted>","KernelURL":"","ExtraKernelParams":"","Format":"iso"}}
{"level":"info","ts":1751443199.2841182,"logger":"provisioner.ironic","msg":"current provision state","host":"baremetal~test","lastError":"Failed to prepare to deploy. Exception: HTTP POST https://[<redacted-ipv6>]/redfish/v1/Systems/System.Embedded.1/VirtualMedia/1/Actions/VirtualMedia.InsertMedia returned code 500. Base.1.12.GeneralError: A general error has occurred. See ExtendedInfo for more information Extended information: [{'Message': 'Unable to locate the ISO or IMG image file or folder in the network share location because the file or folder path or the user credentials entered are incorrect.', 'MessageArgs': ['https://<redacted-ipv4>:6183/redfish/boot-<redacted-uuid>.iso'], 'MessageArgs@odata.count': 1, 'MessageId': 'IDRAC.2.9.RAC0720', 'RelatedProperties': ['#/Image'], 'RelatedProperties@odata.count': 1, 'Resolution': 'Enter the correct file or folder path and credentials, and then retry the operation.', 'Severity': 'Informational'}]","current":"deploy failed","target":"active"}

Note:
- Hostnames (e.g., masterXX.cluster.example.com) replaced with generic terms
- IPv4/IPv6 addresses redacted (e.g., 159.216.9.19 → <redacted-ipv4>, 2a13:6203:... → <redacted-ipv6>)
- MAC addresses, API keys, and UUIDs removed
- Domain names anonymized
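In the error above, the BMC is addressed over IPv6 while the boot image URL handed to it uses an IPv4 address, which an IPv6-only BMC presumably cannot reach. The sketch below shows one way such a mismatch could be spotted in a `lastError` string. It is an illustration only: the helper names are hypothetical, and the sample message uses documentation addresses in place of the redacted values.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Match URLs up to the next whitespace or quote character.
URL_RE = re.compile(r"https?://[^\s'\"]+")


def url_ip_versions(message):
    """Map each URL with a literal IP host found in the message to its IP version."""
    versions = {}
    for url in URL_RE.findall(message):
        try:
            host = urlsplit(url).hostname or ""
            versions[url] = ipaddress.ip_address(host).version
        except ValueError:
            continue  # host is a DNS name or the URL is malformed
    return versions


def has_family_mismatch(message):
    """Return True when the message mixes IPv4 and IPv6 literal URL hosts."""
    return len(set(url_ip_versions(message).values())) > 1


# Shortened stand-in for the lastError text, with made-up addresses.
sample_error = (
    "HTTP POST https://[2001:db8::10]/redfish/v1/Systems/System.Embedded.1/"
    "VirtualMedia/1/Actions/VirtualMedia.InsertMedia returned code 500. "
    "'MessageArgs': ['https://192.0.2.5:6183/redfish/boot-example.iso']"
)

print(has_family_mismatch(sample_error))
```

Messages whose URLs all use host names rather than literal addresses are not flagged by this check.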
Post-investigation engineering summary:
Because of the logic at https://github.com/openshift/installer/blob/release-4.18/pkg/asset/machines/master.go#L580-L591, the "ip=dhcp,dhcp6" kernel arguments are added in dual-stack clusters only when the platform is Metal, OpenStack or vSphere. For platform "none", nothing in the installation process configures them. As a result, clusters installed with platform "none" are prone to a race condition when IPv4 address acquisition is significantly faster than IPv6 address acquisition.
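The condition described in this summary can be modelled roughly as follows. This is a paraphrase of the summary in Python, not the installer's Go code: the function names, the lowercase platform identifiers, and the way dual-stack is inferred from network CIDRs are all assumptions made for illustration.

```python
import ipaddress

# Platforms named in the summary; the identifier spelling is an assumption.
PLATFORMS_WITH_DHCP6_KARGS = {"metal", "openstack", "vsphere"}


def is_dual_stack(cidrs):
    """Treat a cluster as dual-stack when its networks include both IPv4 and IPv6."""
    versions = {ipaddress.ip_network(cidr).version for cidr in cidrs}
    return versions == {4, 6}


def gets_dhcp6_kernel_args(platform, cidrs):
    """Return True when, per the summary, "ip=dhcp,dhcp6" would be configured."""
    return is_dual_stack(cidrs) and platform.lower() in PLATFORMS_WITH_DHCP6_KARGS


# Made-up networks from the documentation ranges.
example_cidrs = ["192.0.2.0/24", "2001:db8::/64"]

for platform in ("metal", "none"):
    print(platform, gets_dhcp6_kernel_args(platform, example_cidrs))
```

Under this model, a dual-stack cluster on platform "none" falls through without the kernel arguments, which is the case this bug describes.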
As a workaround, affected clusters can apply the two MachineConfigs provided in the comment below (https://issues.redhat.com/browse/OCPBUGS-59104?focusedId=27584371&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-27584371), which add the required kernel arguments.
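For orientation, a MachineConfig that adds a kernel argument generally has the shape generated below. This is only a sketch and not a copy of the MachineConfigs in the linked comment: the object names are hypothetical, and the split into one object for the master role and one for the worker role is an assumption. The MachineConfigs from the comment should be used for the actual workaround.

```python
import json

KERNEL_ARG = "ip=dhcp,dhcp6"  # value taken from the engineering summary


def kargs_machine_config(role):
    """Build a MachineConfig dict that adds the kernel argument for one role."""
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            # Hypothetical name; any unique name would do.
            "name": f"99-{role}-dual-stack-kargs",
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {"kernelArguments": [KERNEL_ARG]},
    }


# Assumed role split: one object per machine config pool.
machine_configs = [kargs_machine_config(role) for role in ("master", "worker")]

# Emit a single List object as JSON so the result can be reviewed before use.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": machine_configs}, indent=2))
```

Changing kernel arguments through a MachineConfig causes the Machine Config Operator to reboot the nodes in the affected pool, so the change is best scheduled accordingly.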