OpenShift Bugs / OCPBUGS-49157

openshift-install execution is killed when it receives an error because the API VIP is not reachable


      Description of problem:

      openshift-install execution is killed when it receives an error because the API VIP is not reachable.

      Version-Release number of selected component (if applicable):

      Found on the latest OpenShift 4.17 release.

      How reproducible:

      Most of the times we have followed the steps below.

      Steps to Reproduce:

          1. Deploy an MNO (multi-node OpenShift) cluster (3 masters and 3 workers) using the Agent Based Installer (ABI).
          2. Once the cluster is deployed, take one of the worker nodes and use it to deploy an SNO (single-node OpenShift), also with the Agent Based Installer (ABI).
          3. Monitor this second installation with `openshift-install --log-level=debug agent wait-for bootstrap-complete` to wait for bootstrap to complete, and with `openshift-install --log-level=debug agent wait-for install-complete` to wait for the installation to complete.
      
      (The same happens if we first install the SNO on one of the worker nodes and then install the MNO using all nodes, including the one previously deployed as an SNO.)

      Actual results:

      openshift-install execution is interrupted with a message like the following (this example is from the install-complete output, but the same happens with bootstrap-complete):
      
      level=error msg=Attempted to gather ClusterOperator status after wait failure: Listing ClusterOperator objects: Get \"https://api.cluster1.partnerci.bos2.lab:6443/apis/config.openshift.io/v1/clusteroperators\": dial tcp 192.168.21.18:6443: connect: no route to host
      level=info msg=Use the following commands to gather logs from the cluster
      level=info msg=openshift-install gather bootstrap --help
      level=error msg=Bootstrap failed to complete: : bootstrap process returned error: failed to progress after all hosts available
      
      The error shows that an attempt to reach the API VIP failed, probably because we are deploying a new cluster on resources that are already in use by another cluster.
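
      As a minimal sketch, a wrapper script could recognize this specific failure in the installer output. The regular expression, the marker strings, and the helper names parse_line and is_api_unreachable are hypothetical; they assume the logrus-style "level=... msg=..." format seen in the excerpt above, not a documented openshift-install interface.

```python
import re
from typing import Optional, Tuple

# Matches logrus-style text output such as "level=error msg=..." (an optional
# time="..." prefix is tolerated because search() is used). This pattern is an
# assumption based on the log excerpt, not a documented openshift-install format.
LOG_LINE = re.compile(r'level=(?P<level>\w+)\s+msg=(?P<msg>.*)$')

# Substrings that, in the excerpt, signal that the API VIP could not be reached.
UNREACHABLE_MARKERS = ("dial tcp", "no route to host")


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (level, msg) for a logrus-style line, or None if it does not match."""
    match = LOG_LINE.search(line.strip())
    if match is None:
        return None
    msg = match.group("msg")
    # logrus quotes messages that contain spaces; strip the outer quotes if present.
    if len(msg) >= 2 and msg.startswith('"') and msg.endswith('"'):
        msg = msg[1:-1]
    return match.group("level"), msg


def is_api_unreachable(line: str) -> bool:
    """True if the line is an error-level message reporting an unreachable endpoint."""
    parsed = parse_line(line)
    if parsed is None:
        return False
    level, msg = parsed
    return level == "error" and all(marker in msg for marker in UNREACHABLE_MARKERS)
```

      Applied to the four lines above, only the first one (the "no route to host" error) would be flagged; the final "Bootstrap failed to complete" error would not.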
      
      In fact, the installation itself continues and works fine, but we have to re-run openshift-install to keep monitoring it.
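
      Until this is fixed, the manual re-run could be automated. The sketch below is a hypothetical workaround, not part of openshift-install: it assumes the installer exits with a non-zero status when it stops, and that the "no route to host" text appears in its combined stdout/stderr. The names wait_with_retries, max_attempts and delay_seconds are illustrative.

```python
import subprocess
import time
from typing import Callable, Sequence

# The monitoring commands from the steps to reproduce.
WAIT_FOR_BOOTSTRAP = ["openshift-install", "--log-level=debug", "agent", "wait-for", "bootstrap-complete"]
WAIT_FOR_INSTALL = ["openshift-install", "--log-level=debug", "agent", "wait-for", "install-complete"]


def run_command(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run cmd and capture stdout and stderr together as text."""
    return subprocess.run(list(cmd), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)


def wait_with_retries(
    cmd: Sequence[str],
    max_attempts: int = 3,
    delay_seconds: float = 30.0,
    runner: Callable[[Sequence[str]], subprocess.CompletedProcess] = run_command,
) -> bool:
    """Re-run cmd while it fails with 'no route to host', up to max_attempts times.

    Returns True as soon as one attempt exits with status 0. Any other failure,
    or running out of attempts, returns False.
    """
    for attempt in range(1, max_attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return True
        if "no route to host" not in (result.stdout or ""):
            # Not the transient API VIP error described in this issue; stop here.
            return False
        if attempt < max_attempts:
            print(f"attempt {attempt}: API VIP unreachable, re-running {' '.join(cmd)}")
            time.sleep(delay_seconds)
    return False
```

      For example, wait_with_retries(WAIT_FOR_INSTALL) would re-run the install-complete wait up to three times while the failure is the unreachable-API error, and give up immediately on any other error.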

      Expected results:

      openshift-install should tolerate that error and keep running instead of exiting.

      Additional info:

      Here is a link to a Distributed-CI job that reproduces this issue; the must-gather is available there: https://www.distributed-ci.io/jobs/0a3c77d5-15a2-4e3f-887f-39584af676be/files

              Robert Fournier (bfournie@redhat.com)
              Ramon Perez (raperez@redhat.com)
              Gaoyun Pei