Bug
Resolution: Unresolved
4.17.z
Description of problem:
`openshift-install` exits while monitoring an installation when it receives an error because the API VIP is not reachable.
Version-Release number of selected component (if applicable):
Found on latest OpenShift 4.17
How reproducible:
Most of the time when following the steps below.
Steps to Reproduce:
1. Deploy an MNO cluster (3 masters and 3 workers) using the Agent Based Installer (ABI).
2. Once the cluster is deployed, take one of the worker nodes and deploy an SNO on that node, also with the Agent Based Installer (ABI).
3. Monitor this second installation with `openshift-install --log-level=debug agent wait-for bootstrap-complete` to wait for bootstrap to complete, and with `openshift-install --log-level=debug agent wait-for install-complete` to wait for the installation to complete.
(The same happens if we first install the SNO on one of the worker nodes and later install the MNO using all nodes, including the one that was previously deployed as SNO.)
Actual results:
openshift-install execution is interrupted with a message like this (the example is from the install-complete output, but the same happens in bootstrap-complete):
level=error msg=Attempted to gather ClusterOperator status after wait failure: Listing ClusterOperator objects: Get \"https://api.cluster1.partnerci.bos2.lab:6443/apis/config.openshift.io/v1/clusteroperators\": dial tcp 192.168.21.18:6443: connect: no route to host
level=info msg=Use the following commands to gather logs from the cluster
level=info msg=openshift-install gather bootstrap --help
level=error msg=Bootstrap failed to complete: : bootstrap process returned error: failed to progress after all hosts available
The error shows that an attempt to reach the API VIP failed, probably because we are deploying a new cluster on resources that are already in use by another cluster. The installation itself continues and completes successfully, but we need to re-run openshift-install to keep monitoring it.
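As a stopgap for the manual re-run described in this report, the monitoring command can be wrapped in a small retry loop. This is only a sketch of a workaround, not a fix for the underlying behavior: `retry_wait_for` is a hypothetical helper name, and the attempt count and delay are arbitrary assumptions. The `openshift-install` invocations in the usage comments are the ones from the steps to reproduce.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-install): re-run a command
# until it exits successfully or the maximum number of attempts is reached.
#   $1 = maximum attempts, $2 = delay in seconds between attempts,
#   remaining arguments = the command to run.
retry_wait_for() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (attempt count and delay are assumptions, adjust as needed):
# retry_wait_for 5 60 openshift-install --log-level=debug agent wait-for bootstrap-complete
# retry_wait_for 5 60 openshift-install --log-level=debug agent wait-for install-complete
```

The loop keys only on the exit status of the command, so it cannot tell a transient "no route to host" from a genuine installation failure; it simply saves re-typing the command while the API VIP settles.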
Expected results:
openshift-install should tolerate that error and keep running instead of exiting.
Additional info:
Here is a link to a Distributed-CI job that reproduces this issue. The must-gather is available there: https://www.distributed-ci.io/jobs/0a3c77d5-15a2-4e3f-887f-39584af676be/files