-
Task
-
Resolution: Done
-
Blocker
-
OSSM 2.5.3, OSSM 2.6.0
Installation of an IPv6 cluster fails on the bootstrap VM with an `i/o timeout` during the init phase, while it connects to the DNS server to resolve the address of the service VM that hosts the bootstrap.ign file.
[ 22.836355] ignition[786]: GET error: Get "http://api.ossm8.maistra.upshift.redhat.com:8000/bootstrap.ign": dial tcp: lookup api.ossm8.maistra.upshift.redhat.com on [2620:52:0:88:f816:3eff:fecb:16e2]:53: read udp [2620:52:0:9c:f816:3eff:fe77:c38e]:51369->[2620:52:0:88:f816:3eff:fecb:16e2]:53: i/o timeout
The bootstrap VM cannot download the ignition file from the service VM, so it never gets initialized.
Context:
Since the IPv6 OpenStack network doesn't have DHCP, the correct kernel arguments (IP address, gateway IP and nameserver) need to be passed via GRUB at boot to each bootstrap/master/worker VM, so that the VM can download its ignition file from the service VM and initialize the system.
So, the connection goes:
bootstrap VM (provider_net_ipv6_only network) -> DNS/Proxy VM (provider_net_cci_1 network) -> service VM (provider_net_cci_2 network)
Because of this, it is impossible to connect via SSH to a bootstrap/master/worker VM before its first initialization (to ping the DNS VM, the service VM, etc.).
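As an illustration of the static network arguments described above, here is a minimal sketch that assembles a dracut-style `ip=` and `nameserver=` string for an IPv6-only VM. The helper name `build_static_ip_args`, the interface name `ens3`, the hostname, and the `2001:db8::` documentation addresses are all hypothetical and not taken from the pipeline. The bracketed form of the nameserver address is also an assumption and should be checked against the RHCOS version in use.

```python
import ipaddress


def build_static_ip_args(ip, gateway, prefix_len, hostname, interface, nameserver):
    """Build dracut-style static IPv6 kernel arguments (illustrative only)."""
    # Reject anything that is not a valid IPv6 literal before formatting.
    for addr in (ip, gateway, nameserver):
        ipaddress.IPv6Address(addr)
    # Field order: client IP, peer (left empty), gateway, prefix length,
    # hostname, interface, autoconf method. IPv6 literals go in brackets
    # so their colons are not confused with the field separators.
    ip_arg = f"ip=[{ip}]::[{gateway}]:{prefix_len}:{hostname}:{interface}:none"
    # Assumption: the nameserver is written in bracketed form as well.
    dns_arg = f"nameserver=[{nameserver}]"
    return f"{ip_arg} {dns_arg}"


if __name__ == "__main__":
    # Hypothetical values from the IPv6 documentation prefix.
    print(build_static_ip_args(
        "2001:db8::10", "2001:db8::1", 64, "bootstrap", "ens3", "2001:db8::53"
    ))
```

The resulting string would be appended to the kernel command line in GRUB. A typo in any of these fields leaves the VM with no network at all, which would also surface as a timeout in ignition.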
Investigation:
- checked the DNS/Proxy VM
- logs didn't contain any errors
- the DNS port (53) was open and accessible from the service VM (provider_net_cci_2 network)
- created a RHEL VM on the IPv6 network, set up DNS in /etc/resolv.conf, and downloaded the bootstrap.ign file via curl without any issue, so DNS is accessible from the provider_net_ipv6_only network. The question is then why the bootstrap VM, which is on the same network, cannot download the file and fails with an i/o timeout.
- reinstalled the DNS/Proxy VM to make sure it behaves correctly, since it was 1 year old and had survived many OpenStack maintenance windows and upgrades, but no luck
- tried using the IP of the service VM to access it directly from the bootstrap VM and bypass the DNS VM
- changes in the pipeline: https://gitlab.cee.redhat.com/istio/kiali-qe/kiali-qe-utils/-/commit/bbe29edef3e40283aea9e9649fbe21d1b26776d8
- the bootstrap VM was able to download the ignition file; however, the same problem then hit the master/worker VMs, which also need to download their ignition files. What's more, those ignition files are on a different URL (because it is managed by the OCP installer) with a self-signed cert, which can cause problems. During the second run, the problem was back on the bootstrap VM as well, so bypassing the DNS/Proxy VM didn't help; the problem may be in the connection to the service VM itself
[ 32.849865] ignition[788]: GET error: Get "http://[2620:52:0:88:f816:3eff:fed4:9cb5]:8000/bootstrap.ign": dial tcp [2620:52:0:88:f816:3eff:fed4:9cb5]:8000: i/o timeout
- tried installing the DNS/Proxy VM on the same network where the service VMs are created during cluster installation (provider_net_cci_2 network), to minimize connection problems between shared networks, but no luck
- tried installing the IPv6 cluster on RHOS-01 (this required setting up all security groups and updating the pipelines); the issue did not occur there, so it is specific to RHOS-D. However, I noticed that RHOS-01 has only one dual-stack network and only one IPv6 network, compared to RHOS-D, where `provider_net_cci_1`, `provider_net_cci_2` and `provider_net_cci_3` are also dual-stack
- so tried installing the DNS/Proxy VM as well as the service VM on the dual-stack network `provider_net_dualstack_1` in RHOS-D. That appears to have resolved the problem, so the issue happens only when the cluster service VM is on `provider_net_cci_2`. It also occurs only during the ignition phase on `rhcos` VMs, not once the VM is ready, nor on a manually created RHEL VM (which is able to connect to the service VM).
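The two ignition errors quoted above fail at different stages: the first times out during the DNS lookup (UDP to port 53), the second during the direct TCP connection to port 8000. A minimal sketch for telling them apart when scanning console logs could look like the following. The function name `classify_ignition_error` and the regular expressions are illustrative assumptions based only on the two log lines in this issue, not on a documented ignition log format.

```python
import re

# Patterns derived from the two log lines quoted in this issue (assumption:
# other ignition versions may word these errors differently).
LOOKUP_RE = re.compile(r"lookup (?P<host>\S+) on \[(?P<addr>[0-9a-fA-F:]+)\]:(?P<port>\d+)")
DIAL_RE = re.compile(r"dial tcp \[(?P<addr>[0-9a-fA-F:]+)\]:(?P<port>\d+): i/o timeout")


def classify_ignition_error(line):
    """Return (stage, unreachable_address, port) for an ignition GET error line."""
    match = LOOKUP_RE.search(line)
    if match:
        # The resolver itself could not be reached.
        return ("dns_lookup", match.group("addr"), int(match.group("port")))
    match = DIAL_RE.search(line)
    if match:
        # Name resolution was not involved; the TCP connection timed out.
        return ("tcp_dial", match.group("addr"), int(match.group("port")))
    return ("unknown", None, None)


if __name__ == "__main__":
    sample = (
        'GET error: Get "http://[2620:52:0:88:f816:3eff:fed4:9cb5]:8000/bootstrap.ign": '
        "dial tcp [2620:52:0:88:f816:3eff:fed4:9cb5]:8000: i/o timeout"
    )
    print(classify_ignition_error(sample))
```

In both errors the unreachable address is in the same `2620:52:0:88::/64` prefix, which is consistent with the finding that the problem follows the network hosting the target VM rather than the DNS service as such.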
- links to
-
RHSA-2024:135884 Red Hat OpenShift Service Mesh Containers for 2.6.0