  1. OpenShift Service Mesh
  2. OSSM-6699

Fix failures of IPv6 cluster installation


      Installation of the IPv6 cluster fails on the bootstrap VM with an `i/o timeout` during the init phase, while connecting to the DNS server to resolve the address of the service VM that hosts the bootstrap.ign file.

      [   22.836355] ignition[786]: GET error: Get "http://api.ossm8.maistra.upshift.redhat.com:8000/bootstrap.ign": dial tcp: lookup api.ossm8.maistra.upshift.redhat.com on [2620:52:0:88:f816:3eff:fecb:16e2]:53: read udp [2620:52:0:9c:f816:3eff:fe77:c38e]:51369->[2620:52:0:88:f816:3eff:fecb:16e2]:53: i/o timeout 

      The bootstrap VM cannot download the ignition file from the service VM, so it never gets initialized.
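
      When triaging console output like the line above, it can help to tell a DNS lookup timeout apart from a timeout on the direct TCP dial. The following is a minimal sketch; the function name `classify_ignition_error` and the two regular expressions are illustrative assumptions derived only from the message shapes seen in this issue, not from any Ignition API.

```python
import re

# Assumed message shapes, taken from the Go net errors quoted in this issue.
DNS_RE = re.compile(
    r"lookup (?P<name>\S+) on \[(?P<resolver>[0-9a-fA-F:]+)\]:(?P<port>\d+):"
)
DIAL_RE = re.compile(
    r"dial tcp \[(?P<target>[0-9a-fA-F:]+)\]:(?P<port>\d+): i/o timeout"
)


def classify_ignition_error(line):
    """Return (kind, subject, peer) for an ignition GET error line.

    kind is "dns" when the resolver did not answer, "tcp" when the
    direct connection timed out, and "unknown" otherwise.
    """
    if "i/o timeout" not in line:
        return ("unknown", None, None)
    dns = DNS_RE.search(line)
    if dns:
        # The name could not be resolved: peer is the DNS server address.
        return ("dns", dns.group("name"), dns.group("resolver"))
    dial = DIAL_RE.search(line)
    if dial:
        # The name was not involved: peer is the target host itself.
        return ("tcp", dial.group("port"), dial.group("target"))
    return ("unknown", None, None)
```

      Applied to the log line quoted above, this reports a `dns` failure against the resolver at `2620:52:0:88:f816:3eff:fecb:16e2`, meaning the request never reached the service VM at all.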

      Because the IPv6 OpenStack network has no DHCP, the correct kernel arguments (IP address, gateway IP, and nameserver) must be passed via GRUB at boot to the bootstrap/master/worker VMs so that each VM can download its ignition file from the service VM and initialize the system.
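
      As an illustration of what such kernel arguments might look like, the sketch below assembles a dracut-style static `ip=` argument plus a `nameserver=` argument. The helper name `build_kernel_args`, the gateway address, the /64 prefix length, the hostname and the interface name are all hypothetical; the pipeline's real values are not shown in this issue. The nameserver is emitted without brackets here, which is an assumption worth verifying against the initramfs network stack in use.

```python
import ipaddress


def build_kernel_args(address, prefix_len, gateway, nameserver,
                      hostname, interface):
    """Build static-network kernel arguments for an IPv6-only host.

    Follows the dracut field order
    ip=<client>:<peer>:<gateway>:<netmask>:<hostname>:<interface>:none
    with the peer field left empty.
    """
    # ipaddress raises ValueError if any address is malformed.
    addr = ipaddress.IPv6Address(address)
    gw = ipaddress.IPv6Address(gateway)
    ns = ipaddress.IPv6Address(nameserver)
    if not 0 <= prefix_len <= 128:
        raise ValueError("prefix_len must be between 0 and 128")
    # IPv6 addresses inside ip= are bracketed so that their colons are
    # not confused with the field separators.
    ip_arg = f"ip=[{addr}]::[{gw}]:{prefix_len}:{hostname}:{interface}:none"
    return f"{ip_arg} nameserver={ns}"


if __name__ == "__main__":
    # Client and nameserver addresses are the ones visible in the log;
    # everything else is a placeholder.
    print(build_kernel_args(
        "2620:52:0:9c:f816:3eff:fe77:c38e", 64,
        "2620:52:0:9c::1",
        "2620:52:0:88:f816:3eff:fecb:16e2",
        "bootstrap", "ens3",
    ))
```

      Generating the string from validated addresses, rather than editing it by hand, makes it less likely that a typo in the GRUB line leaves the VM without a route to the DNS server.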

      So, the connection goes:

      bootstrap VM (provider_net_ipv6_only network) -> DNS/Proxy VM (provider_net_cci_1 network) -> service VM (provider_net_cci_2 network)

      Because of this, it is impossible to connect via ssh to the bootstrap/master/worker VMs before the first initialization (to ping the DNS VM, the service VM, etc.).


      • checked the DNS/Proxy VM
        • the logs didn't contain any errors
        • the DNS port (53) was open and accessible from the service VM (provider_net_cci_2 network)
        • created a RHEL VM on the IPv6 network, set up DNS in /etc/resolv.conf, and downloaded the bootstrap.ign file via curl without any issue, so DNS is accessible from the provider_net_ipv6_only network. The question is then why the bootstrap VM, which is on the same network, cannot download the file and fails with an i/o timeout.
        • reinstalled the DNS/Proxy VM to make sure it behaves correctly, since it was 1 year old and had survived many OpenStack maintenance windows and upgrades, but no luck
      • tried to use the IP of the service VM to access it directly from the bootstrap VM and bypass the DNS VM
        • changes in the pipeline: https://gitlab.cee.redhat.com/istio/kiali-qe/kiali-qe-utils/-/commit/bbe29edef3e40283aea9e9649fbe21d1b26776d8
        • the bootstrap VM was able to download the ignition file; however, the same problem then hit the master/worker VMs, which also need to download an ignition file. What's more, their ignition files are served from a different URL (because it is managed by the OCP installer) with a self-signed certificate, which may cause problems. During the second run, however, the problem was back on the bootstrap VM as well, so bypassing the DNS/Proxy VM didn't help; the problem may instead be in the connection to the service VM
          [   32.849865] ignition[788]: GET error: Get "http://[2620:52:0:88:f816:3eff:fed4:9cb5]:8000/bootstrap.ign": dial tcp [2620:52:0:88:f816:3eff:fed4:9cb5]:8000: i/o timeout 
      • tried to install the DNS/Proxy VM on the same network where the service VMs are created during cluster installation (provider_net_cci_2 network), to minimize problems with connections between shared networks, but no luck
      • tried to install the IPv6 cluster on RHOS-01 (needed to set up all security groups and update the pipelines); the issue did not occur there, so it is related to RHOS-D only. However, I noticed that RHOS-01 has only one dual-stack network and only one IPv6 network, in comparison to RHOS-D, where `provider_net_cci_1`, `provider_net_cci_2` and `provider_net_cci_3` are also dual-stack
      • so tried to install the DNS/Proxy VM as well as the service VM on the dual-stack network `provider_net_dualstack_1` in RHOS-D during the installation. That appears to have resolved the problem, so the issue happens only when the cluster service VM is on `provider_net_cci_2`. It occurs only during the ignition phase on `rhcos` VMs, not once the VM is ready or on a manually created RHEL VM (which is able to connect to the service VM).
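
      The manual reachability checks described in the list above (curl from a RHEL VM, checking that a port is open) could be repeated from any test VM with a small script. This is only a sketch: the helper name `can_connect` and the 5-second default timeout are assumptions, and it checks plain TCP reachability, not the ignition download itself.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try one TCP connection and report whether it succeeded.

    Returns (True, "") on success, or (False, reason) when the attempt
    times out, is refused, or the host name cannot be resolved.
    """
    try:
        # create_connection resolves the host (IPv4 or IPv6) and tries
        # each returned address until one connects or all fail.
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # socket.timeout and socket.gaierror are both OSError subclasses.
        return False, str(exc)


if __name__ == "__main__":
    # Service VM address and port as they appear in the ignition log.
    ok, reason = can_connect("2620:52:0:88:f816:3eff:fed4:9cb5", 8000)
    print("reachable" if ok else f"unreachable: {reason}")
```

      Running the same check from VMs on `provider_net_ipv6_only`, `provider_net_cci_2` and `provider_net_dualstack_1` would give a like-for-like comparison of the network paths, although it cannot reproduce conditions that exist only inside the `rhcos` initramfs.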

            mkralik@redhat.com Matej Kralik
