OpenShift Bugs / OCPBUGS-11160

Early network failures preventing bootstrap from completing


    • Bug
    • Resolution: Done-Errata
    • Normal
    • None
    • 4.14
    • RHCOS
    • No
    • False

      Description of problem:

      After https://github.com/openshift/installer/pull/7038 landed, we've seen an increase in bootstrap failures: the success rate dropped from about 95% to about 90%.

      This is the run we're picking apart: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-ovn-upgrade/1641269349815685120

      The node that didn't come up seems stuck on afterburn: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-ovn-upgrade/1641269349815685120/artifacts/e2e-aws-ovn-upgrade/gather-aws-console/artifacts/i-05f6db05dd0d2d553

      You can find a list of 4.14 runs that failed bootstrap here if you want more to dig into (although they may not all be caused by this problem): https://sippy.dptools.openshift.org/sippy-ng/jobs/4.14/runs?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22failed_test_names%22%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22cluster%20install.install%20should%20succeed%3A%20cluster%20bootstrap%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22never-stable%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22aggregated%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D&sortField=timestamp&sort=desc
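The Sippy link above carries its search criteria as URL-encoded JSON in the `filters` query parameter, which is hard to read in raw form. As a minimal sketch, the following decodes that parameter using only the Python standard library. The helper name `decode_filters` is hypothetical, and the interpretation of the fields (`columnField`, `operatorValue`, `not`, `linkOperator`) is inferred from the URL itself, not from any Sippy documentation.

```python
import json
from urllib.parse import parse_qs, urlsplit

# URL copied verbatim from the report, split across lines for readability.
SIPPY_URL = (
    "https://sippy.dptools.openshift.org/sippy-ng/jobs/4.14/runs"
    "?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22failed_test_names%22"
    "%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22cluster%20install"
    ".install%20should%20succeed%3A%20cluster%20bootstrap%22%7D%2C%7B%22column"
    "Field%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22"
    "contains%22%2C%22value%22%3A%22never-stable%22%7D%2C%7B%22columnField%22"
    "%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains"
    "%22%2C%22value%22%3A%22aggregated%22%7D%5D%2C%22linkOperator%22%3A%22and"
    "%22%7D&sortField=timestamp&sort=desc"
)


def decode_filters(url):
    """Return the JSON object stored in the URL's `filters` query parameter.

    Hypothetical helper for illustration; parse_qs already undoes the
    percent-encoding, so the value can be passed straight to json.loads.
    """
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["filters"][0])


filters = decode_filters(SIPPY_URL)

# Print one line per filter item, marking negated conditions.
for item in filters["items"]:
    negate = "not " if item.get("not") else ""
    print(f'{item["columnField"]}: {negate}{item["operatorValue"]} {item["value"]!r}')
print("items combined with:", filters["linkOperator"])
```

Read this way, the link appears to select runs whose failed tests include the cluster bootstrap install test, excluding the `never-stable` and `aggregated` variants, sorted by timestamp in descending order.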

      Version-Release number of selected component (if applicable):
      
      414.92.202303281555-0
      
      

      How reproducible:

      About 5% of the time
      

      Steps to Reproduce:

      1. Install OpenShift
      
      

      Actual results:

      Installation fails because a node never comes up
      

      Expected results:

      
      

      Additional info:

      
      

              travier@redhat.com Timothée Ravier
              stbenjam Stephen Benjamin
              Michael Nguyen