Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.17.z
Component/s: Installer / Agent based installation
Labels:
- triaged

Regression:
None
Blocked:
False
Blocked Reason:

None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

`openshift-install` terminates when it receives an error indicating that the API VIP is not reachable.

Version-Release number of selected component (if applicable):

Found on the latest OpenShift 4.17 release.

How reproducible:

Most of the time when following the steps below.

Steps to Reproduce:

    1. Deploy an MNO cluster (3 masters and 3 workers) using the Agent Based Installer (ABI).
    2. Once the cluster is deployed, take one of the worker nodes and deploy an SNO on that node, also with the Agent Based Installer (ABI).
    3. Monitor this second installation with `openshift-install --log-level=debug agent wait-for bootstrap-complete` until bootstrap completes, and then with `openshift-install --log-level=debug agent wait-for install-complete` until the installation completes.

(The same happens if we first install the SNO on one of the worker nodes and later install the MNO using all nodes, including the one previously deployed as SNO.)

Actual results:

`openshift-install` is interrupted with a message like the following (this example is from the install-complete output, but the same happens with bootstrap-complete):

level=error msg=Attempted to gather ClusterOperator status after wait failure: Listing ClusterOperator objects: Get \"https://api.cluster1.partnerci.bos2.lab:6443/apis/config.openshift.io/v1/clusteroperators\": dial tcp 192.168.21.18:6443: connect: no route to host
level=info msg=Use the following commands to gather logs from the cluster
level=info msg=openshift-install gather bootstrap --help
level=error msg=Bootstrap failed to complete: : bootstrap process returned error: failed to progress after all hosts available

The error shows that an attempt to reach the API VIP failed, probably because we are deploying a new cluster on resources that are already in use by another cluster.

The installation itself continues and completes successfully, but we have to re-run `openshift-install` to keep monitoring it.
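Until the tool tolerates this error, the manual re-run can be scripted. The following is a minimal sketch, not part of `openshift-install`: `retry_cmd` is a hypothetical helper, and the attempt count and delay in the example are illustrative assumptions. It simply re-runs the given command until it exits successfully or the attempts run out.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-install): re-run a command
# until it succeeds or the maximum number of attempts is reached.
# Usage: retry_cmd MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_cmd() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while true; do
    # Run the command as given; stop as soon as it exits with status 0.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example (5 attempts and a 30 second delay are assumed values):
# retry_cmd 5 30 openshift-install --log-level=debug agent wait-for install-complete
```

This only works around the symptom: it cannot distinguish the transient "no route to host" failure described here from a genuine installation failure, so the attempt limit should stay small.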

Expected results:

`openshift-install` should tolerate that error and keep running instead of exiting.

Additional info:

Here is a link to a Distributed-CI job that reproduces this issue; the must-gather is available there: https://www.distributed-ci.io/jobs/0a3c77d5-15a2-4e3f-887f-39584af676be/files

Assignee: Robert Fournier

Reporter: Ramon Perez

QA Contact: Gaoyun Pei

Votes: 0

Watchers: 5

Created: 2025/01/24 3:12 PM

Updated: 2025/01/30 4:24 PM
