OCPBUGS-59104

Dual-Stack Cluster Fails BMC Discovery Due to Missing IPv6 on Some Ironic-Proxy Pods

    • Quality / Stability / Reliability
    • In Progress
    • Release Note Not Required

      Description of problem:

      In a dual-stack OpenShift cluster, the ironic-proxy pods on two of the three ACM hub master nodes have only IPv4 addresses, while the third master node has both IPv4 and IPv6 addresses. As a result, discovery of IPv6-only BMC hosts fails whenever the request originates from a node without an IPv6 address; the error message reports that the ISO/IMG image file cannot be located.
      

      Version-Release number of selected component (if applicable):

      ACM 2.13 / OCP 4.18
      

      How reproducible:

      Consistently reproducible when:
      
      Cluster is configured as dual-stack
      
      Ironic-proxy pods land on master nodes without IPv6 addresses
      
      Attempting to discover IPv6-only BMC hosts
      

      Steps to Reproduce:

      1. Deploy a dual-stack OpenShift cluster with ACM
      2. Configure IPv6-only BMC hosts
      3. Attempt to discover the BMC hosts through metal3/ironic
      4. Observe failures when requests originate from nodes without IPv6 addresses
      

      Actual results:

      Ironic-proxy pods on some master nodes only have IPv4 addresses
      
      BMC discovery fails with errors when using IPv6 addresses
      
      Error messages indicate inability to locate ISO/IMG image files
      
      Node status shows inconsistent IP address assignments across masters
      

      Expected results:

      All ironic-proxy pods in a dual-stack cluster should have both IPv4 and IPv6 addresses
      
      BMC discovery should work consistently regardless of which master node handles the request
      
      Cluster should maintain consistent dual-stack networking configuration across all nodes
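
      The first expectation above can be checked mechanically. The following is a minimal sketch, assuming the input is the JSON printed by `oc get pods -o json` restricted to the ironic-proxy pods; the pod names and addresses in the sample are placeholders, not data from the affected cluster.

```python
import ipaddress


def ip_families(pod: dict) -> set[int]:
    """Return the IP versions (4 and/or 6) listed in a pod's status.podIPs."""
    pod_ips = pod.get("status", {}).get("podIPs", [])
    return {ipaddress.ip_address(entry["ip"]).version for entry in pod_ips}


def pods_missing_family(pod_list: dict) -> dict[str, set[int]]:
    """Map pod name -> missing IP versions, for pods that are not dual-stack."""
    missing = {}
    for pod in pod_list.get("items", []):
        absent = {4, 6} - ip_families(pod)
        if absent:
            missing[pod["metadata"]["name"]] = absent
    return missing


# Hypothetical sample shaped like `oc get pods -o json` output.
# Names and addresses are placeholders (documentation ranges).
sample = {
    "items": [
        {
            "metadata": {"name": "ironic-proxy-a"},
            "status": {"podIPs": [{"ip": "192.0.2.11"}, {"ip": "2001:db8::11"}]},
        },
        {
            "metadata": {"name": "ironic-proxy-b"},
            "status": {"podIPs": [{"ip": "192.0.2.12"}]},
        },
    ]
}

for name, absent in pods_missing_family(sample).items():
    print(f"{name}: missing IPv{sorted(absent)}")
```

      A pod that appears in the result is a candidate for the failure described in this report, since it has no address in one of the two families.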
      

      Additional info:

      Customer observations:
      
      IPv4 addresses are issued by DHCP (Infoblox)
      
      Underlying network uses Cisco ACI with RA messages and SLAAC, but the cluster uses DHCPv6 instead because of OpenShift limitations
      
      DHCPv6 service running on a dedicated server
      
      NodeIP configuration seems to initialize before the IPv6 address is available
      
      Cluster was installed as dual-stack from the beginning (not converted)
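
      To see whether a node has acquired a routable IPv6 address at a given moment, the JSON output of `ip -j addr show` can be inspected. The sketch below is only a diagnostic aid under that assumption; the interface names and addresses in the sample are placeholders.

```python
import json


def global_families(ip_addr_json: str) -> dict[str, set[str]]:
    """Map interface name -> address families that have a global-scope address.

    Expects the JSON text printed by `ip -j addr show`.
    """
    result = {}
    for iface in json.loads(ip_addr_json):
        families = {
            info["family"]
            for info in iface.get("addr_info", [])
            if info.get("scope") == "global"
        }
        result[iface["ifname"]] = families
    return result


# Hypothetical sample: the NIC has a global IPv4 address but only a
# link-local IPv6 address, i.e. DHCPv6 has not completed yet.
sample = json.dumps([
    {"ifname": "lo", "addr_info": [
        {"family": "inet", "local": "127.0.0.1", "scope": "host"},
    ]},
    {"ifname": "ens3", "addr_info": [
        {"family": "inet", "local": "192.0.2.10", "scope": "global"},
        {"family": "inet6", "local": "fe80::1", "scope": "link"},
    ]},
])

print(global_families(sample))
```

      An interface that reports only `inet` at the time node IP selection runs would be consistent with the customer's observation that NodeIP configuration initializes too early.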
      
      Diagnostic data available:
      
      ACM must-gather
      
      OCP must-gather
      
      sosreport from affected nodes
      
      Detailed network configuration information
      
      Relevant error from logs (sensitive data redacted):
      {"level":"info","ts":1751443199.2524848,"logger":"controllers.BareMetalHost","msg":"using PreprovisioningImage","baremetalhost":{"name":"test","namespace":"baremetal"},"provisioningState":"provisioning","Image":{"ImageURL":"https://assisted-image-service-multicluster-engine.apps.<redacted-domain>/byapikey/<redacted>","KernelURL":"","ExtraKernelParams":"","Format":"iso"}}
      {"level":"info","ts":1751443199.2841182,"logger":"provisioner.ironic","msg":"current provision state","host":"baremetal~test","lastError":"Failed to prepare to deploy. Exception: HTTP POST https://[<redacted-ipv6>]/redfish/v1/Systems/System.Embedded.1/VirtualMedia/1/Actions/VirtualMedia.InsertMedia returned code 500. Base.1.12.GeneralError: A general error has occurred. See ExtendedInfo for more information Extended information: [{'Message': 'Unable to locate the ISO or IMG image file or folder in the network share location because the file or folder path or the user credentials entered are incorrect.', 'MessageArgs': ['https://<redacted-ipv4>:6183/redfish/boot-<redacted-uuid>.iso'], 'MessageArgs@odata.count': 1, 'MessageId': 'IDRAC.2.9.RAC0720', 'RelatedProperties': ['#/Image'], 'RelatedProperties@odata.count': 1, 'Resolution': 'Enter the correct file or folder path and credentials, and then retry the operation.', 'Severity': 'Informational'}]","current":"deploy failed","target":"active"}
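
      In the second log entry, the BMC is contacted over IPv6 but the image URL in MessageArgs carries an IPv4 host, which an IPv6-only BMC presumably cannot reach. The sketch below extracts the URLs from a `lastError` field and reports the IP version of each host. The sample line is an abridged stand-in with placeholder addresses, because the real addresses are redacted.

```python
import ipaddress
import json
import re
from urllib.parse import urlsplit

# Match URLs up to the next whitespace or quote character.
URL_RE = re.compile(r"https?://[^\s'\"]+")


def url_ip_versions(log_line: str) -> list[tuple[str, int]]:
    """Return (url, ip_version) for each URL in lastError whose host is an IP literal."""
    record = json.loads(log_line)
    found = []
    for url in URL_RE.findall(record.get("lastError", "")):
        host = urlsplit(url).hostname
        if host is None:
            continue
        try:
            found.append((url, ipaddress.ip_address(host).version))
        except ValueError:
            continue  # host is a DNS name, not an IP literal
    return found


# Abridged, hypothetical log line; addresses are placeholders.
sample = json.dumps({
    "msg": "current provision state",
    "lastError": (
        "Failed to prepare to deploy. Exception: HTTP POST "
        "https://[2001:db8::10]/redfish/v1/Systems/System.Embedded.1/"
        "VirtualMedia/1/Actions/VirtualMedia.InsertMedia returned code 500. "
        "'MessageArgs': ['https://192.0.2.10:6183/redfish/boot-example.iso']"
    ),
})

for url, version in url_ip_versions(sample):
    print(f"IPv{version}: {url}")
```

      A mix of versions in the output (IPv6 for the BMC endpoint, IPv4 for the image URL) matches the pattern seen in the redacted log.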
      
      Note:
      
      Hostnames (e.g., masterXX.cluster.example.com) replaced with generic terms
      
      IPv4/IPv6 addresses redacted (e.g., 159.216.9.19 → <redacted-ipv4>, 2a13:6203:... → <redacted-ipv6>)
      
      MAC addresses, API keys, and UUIDs removed
      
      Domain names anonymized
      

      Post-investigation engineering summary:

      According to https://github.com/openshift/installer/blob/release-4.18/pkg/asset/machines/master.go#L580-L591, the installer adds the "ip=dhcp,dhcp6" kernel arguments to dual-stack clusters only when the platform is Metal, OpenStack or vSphere. For platform "none", nothing in the installation process configures them. As a result, clusters installed with platform "none" are prone to a race condition when IPv4 address acquisition is significantly faster than IPv6 address acquisition.
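
      Whether a node was booted with these arguments can be checked against its kernel command line. This is a minimal sketch that only recognizes the exact "ip=dhcp,dhcp6" form named above (in either order); other valid `ip=` syntaxes are deliberately not handled, and the sample command lines are made up.

```python
def has_dual_stack_dhcp(cmdline: str) -> bool:
    """True if an ip= kernel argument requests both dhcp and dhcp6."""
    for arg in cmdline.split():
        if arg.startswith("ip="):
            modes = set(arg[len("ip="):].split(","))
            if {"dhcp", "dhcp6"} <= modes:
                return True
    return False


# Hypothetical command lines; on a node the text would come from /proc/cmdline.
with_args = "BOOT_IMAGE=/vmlinuz root=UUID=example rw ip=dhcp,dhcp6"
without_args = "BOOT_IMAGE=/vmlinuz root=UUID=example rw"

print(has_dual_stack_dhcp(with_args))
print(has_dual_stack_dhcp(without_args))
```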

      As a workaround, the two MachineConfigs provided in a comment below (https://issues.redhat.com/browse/OCPBUGS-59104?focusedId=27584371&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-27584371) can be applied to affected clusters to add the required kernel arguments.
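
      The MachineConfigs from the linked comment are not reproduced here. For orientation only, the sketch below generates what such objects might look like, assuming one per role (master and worker) and using the standard `spec.kernelArguments` field. The object names are hypothetical, and the linked comment remains the authoritative source.

```python
import json


def kernel_args_machineconfig(role: str) -> dict:
    """Build a MachineConfig dict that adds ip=dhcp,dhcp6 for the given role."""
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            # Hypothetical name; not taken from the linked comment.
            "name": f"99-{role}-dual-stack-dhcp-kargs",
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {"kernelArguments": ["ip=dhcp,dhcp6"]},
    }


# Assumption: one object per role; `oc apply -f` accepts JSON as well as YAML.
for role in ("master", "worker"):
    print(json.dumps(kernel_args_machineconfig(role), indent=2))
```

      Applying a MachineConfig that changes kernel arguments normally causes the Machine Config Operator to reboot the nodes in the affected pool, so the change should be planned accordingly.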

              mkowalsk@redhat.com Mat Kowalski
              rhn-support-mlele Mihir Lele
              Jad Haj Yahya