  1. OpenShift Installer
  2. CORS-3614

[CAPI] Gather bootstrap using IPv6 address for IPv4 Primary Dualstack

      Description of problem:

       vSphere Dualstack (IPv4 Primary) cluster failed to install.

      The gather bootstrap logs show that we use one of the IPv6 addresses to contact the node, instead of either using the IPv4 address or, optimally, looping over all the provided host addresses in turn.
       

      platform:
        vsphere:
          apiVIPs:
            - 10.94.146.130
            - fd65:a1a8:60ad:1234::4
          ingressVIPs:
            - 10.94.146.131
            - fd65:a1a8:60ad:1234::5
      networking:
        networkType: OVNKubernetes
        machineNetwork:
        - cidr: 10.94.146.128/25
        - cidr: fd65:a1a8:60ad:1234::/64
        clusterNetwork:
        - cidr: 10.128.0.0/14
          hostPrefix: 23
        - cidr: fd65:10:128::/56
          hostPrefix: 64
        serviceNetwork:
        - 172.30.0.0/16
        - fd65:172:16::/112
      
          
      level=info msg=openshift-install gather bootstrap --help
      level=error msg=Bootstrap failed to complete: timed out waiting for the condition
      level=error msg=Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.
      level=info msg=Pulling Cluster API artifacts
      level=info msg=Skipping VM console logs gather: no gather methods registered for "vsphere"
      level=info msg=Pulling debug logs from the bootstrap machine
      level=info msg=Failed to gather bootstrap logs: failed to create SSH client: dial tcp [fd65:a1a8:60ad:1234:3d20:f47c:6df4:21e0]:22: connect: network is unreachable
      
      # yq '.status.addresses'  ./launch/ipi-install-install/artifacts/clusterapi_output/Machine-openshift-cluster-api-guests-ci-ln-dpbx1jk-c1627-mshkd-bootstrap.yaml
      - type: ExternalIP
        address: 10.94.146.143
      - type: ExternalIP
        address: fd65:a1a8:60ad:1234::22
      - type: ExternalIP
        address: fd65:a1a8:60ad:1234:3d20:f47c:6df4:21e0
      - type: InternalDNS
        address: ci-ln-dpbx1jk-c1627-mshkd-bootstrap
      

      Version-Release number of selected component (if applicable):

      4.16.0-0.nightly-2024-07-02-211018
          

      How reproducible:

      Intermittent. We have no control over DHCPv4 or DHCPv6 timings or IP address ordering.

      Steps to Reproduce:

          1.  Clusterbot: launch 4.16.0-0.nightly vsphere,dualstack
      

      Actual results:

      The cluster fails to install, and gather bootstrap cannot SSH to the bootstrap node's IPv6 address:

      level=info msg=Failed to gather bootstrap logs: failed to create SSH client: dial tcp [fd65:a1a8:60ad:1234:3d20:f47c:6df4:21e0]:22: connect: network is unreachable
          

      Expected results:

      We should probably try each of the host's ExternalIP addresses in turn until one succeeds.

      Otherwise, we could follow the dual-stack primary IP family rules and try the primary family (IPv4 or IPv6) first.
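
      The two suggestions above can be combined. The following is only a rough sketch of the idea, not the installer's actual code: the function names orderByFamily and dialFirst are hypothetical, and the addresses are the ExternalIP values from the Machine status shown earlier. It puts the primary family first, then dials every address until one connects.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/netip"
	"time"
)

// orderByFamily (hypothetical helper) returns addrs with the primary IP
// family first, keeping the original order within each family.
// Entries that are not IP literals (for example DNS names) are dropped.
func orderByFamily(addrs []string, ipv6Primary bool) []string {
	var preferred, other []string
	for _, a := range addrs {
		ip, err := netip.ParseAddr(a)
		if err != nil {
			continue
		}
		isV6 := ip.Is6() && !ip.Is4In6()
		if isV6 == ipv6Primary {
			preferred = append(preferred, a)
		} else {
			other = append(other, a)
		}
	}
	return append(preferred, other...)
}

// dialFirst (hypothetical helper) tries each address on the given port and
// returns the first connection that succeeds. If every attempt fails, the
// returned error lists the failure for each address.
func dialFirst(addrs []string, port string, timeout time.Duration) (net.Conn, error) {
	var errs []error
	for _, a := range addrs {
		conn, err := net.DialTimeout("tcp", net.JoinHostPort(a, port), timeout)
		if err == nil {
			return conn, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", a, err))
	}
	if len(errs) == 0 {
		return nil, errors.New("no addresses to try")
	}
	return nil, errors.Join(errs...)
}

func main() {
	// ExternalIP addresses in an arbitrary (DHCP-dependent) order.
	addrs := []string{
		"fd65:a1a8:60ad:1234:3d20:f47c:6df4:21e0",
		"10.94.146.143",
		"fd65:a1a8:60ad:1234::22",
	}
	// IPv4-primary dual-stack: the IPv4 address is tried first.
	fmt.Println(orderByFamily(addrs, false))
}
```

      With this approach an unreachable IPv6 address only costs one timeout before the next address is tried, rather than failing the whole gather.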

      Additional info:

      A lot of code incorrectly assumes a single-stack cluster and that a host has only one IP address.
      We should assume that every host has both an IPv4 and an IPv6 address, until IPv4 is retired.

            Unassigned Unassigned
            rbrattai@redhat.com Ross Brattain
            Gaoyun Pei Gaoyun Pei