Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-31763

gcp install cluster creation fails after 30-40 minutes

    XMLWordPrintable

Details

    • False
    • Hide

      None

      Show
      None

    Description

      Component Readiness has found a potential regression in install should succeed: overall.  I see this on various different platforms, but I started digging into GCP failures.  No installer log bundle is created, which seriously hinders my ability to dig further.

      Bootstrap succeeds, and then 30 minutes after waiting for cluster creation, it dies.

      From https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-gcp-sdn-serial/1775871000018161664

      search.ci tells me this affects nearly 10% of jobs on GCP:

      https://search.dptools.openshift.org/?search=Attempted+to+gather+ClusterOperator+status+after+installation+failure%3A+listing+ClusterOperator+objects.*connection+refused&maxAge=168h&context=1&type=bug%2Bissue%2Bjunit&name=.*4.16.*gcp.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

       

      time="2024-04-04T13:27:50Z" level=info msg="Waiting up to 40m0s (until 2:07PM UTC) for the cluster at https://api.ci-op-n3pv5pn3-4e5f3.XXXXXXXXXXXXXXXXXXXXXX:6443 to initialize..."
      time="2024-04-04T14:07:50Z" level=error msg="Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get \"https://api.ci-op-n3pv5pn3-4e5f3.XXXXXXXXXXXXXXXXXXXXXX:6443/apis/config.openshift.io/v1/clusteroperators\": dial tcp 35.238.130.20:6443: connect: connection refused"
      time="2024-04-04T14:07:50Z" level=error msg="Cluster initialization failed because one or more operators are not functioning properly.\nThe cluster should be accessible for troubleshooting as detailed in the documentation linked below,\nhttps://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html\nThe 'wait-for install-complete' subcommand can then be used to continue the installation"
      time="2024-04-04T14:07:50Z" level=error msg="failed to initialize the cluster: timed out waiting for the condition" 

       

      Probability of significant regression: 99.44%

      Sample (being evaluated) Release: 4.16
      Start Time: 2024-03-29T00:00:00Z
      End Time: 2024-04-04T23:59:59Z
      Success Rate: 68.75%
      Successes: 11
      Failures: 5
      Flakes: 0

      Base (historical) Release: 4.15
      Start Time: 2024-02-01T00:00:00Z
      End Time: 2024-02-28T23:59:59Z
      Success Rate: 96.30%
      Successes: 52
      Failures: 2
      Flakes: 0

      View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=Other&component=Installer%20%2F%20openshift-installer&confidence=95&environment=sdn%20upgrade-micro%20amd64%20gcp%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=sdn&network=sdn&pity=5&platform=gcp&platform=gcp&sampleEndTime=2024-04-04%2023%3A59%3A59&sampleRelease=4.16&sampleStartTime=2024-03-29%2000%3A00%3A00&testId=cluster%20install%3A0cb1bb27e418491b1ffdacab58c5c8c0&testName=install%20should%20succeed%3A%20overall&upgrade=upgrade-micro&upgrade=upgrade-micro&variant=standard&variant=standard

      Attachments

        Activity

          People

            vrutkovs@redhat.com Vadim Rutkovsky
            stbenjam Stephen Benjamin
            Jianli Wei Jianli Wei
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated: