OpenShift Bugs / OCPBUGS-42014

Cluster creation failure rate increased since June 2024

    • Release Note Not Required
    • Done

      Description of problem: In Advanced Cluster Security (ACS), our CI relies on creating OCP clusters, and we have recently observed an increase in cluster creation failures. Although we were advised to retry failed creations (and we now do so, see ROX-25416), I suspect our use case is not unique and that others are affected as well.
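For anyone who needs the same workaround, the retry can be sketched roughly as below. This is only an illustration, not the ROX-25416 implementation: `create_cluster` is a hypothetical callable that raises `RuntimeError` with the installer's error text, and the choice of which messages count as transient, the attempt count, and the delays are all assumptions.

```python
import random
import time
from typing import Callable, Sequence

# Substrings copied from the errors reported in this issue. Treating them as
# retryable is an assumption of this sketch, not installer behaviour.
TRANSIENT_MARKERS: Sequence[str] = (
    "Provider produced inconsistent result after apply",
    "GCPComputeBackendTimeout",
    "timeout while waiting for state to become 'DONE'",
)


def is_transient(message: str) -> bool:
    """Return True if the error text contains one of the known markers."""
    return any(marker in message for marker in TRANSIENT_MARKERS)


def create_with_retry(
    create_cluster: Callable[[], None],  # hypothetical wrapper around the installer
    attempts: int = 3,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call create_cluster until it succeeds; return the successful attempt number."""
    for attempt in range(1, attempts + 1):
        try:
            create_cluster()
            return attempt
        except RuntimeError as err:
            # Give up on the last attempt or on errors we do not recognise.
            if attempt == attempts or not is_transient(str(err)):
                raise
            # Exponential backoff with a little jitter between attempts.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 1))
    raise ValueError("attempts must be at least 1")
```

Errors that do not match a marker are re-raised immediately, so genuine misconfiguration is not hidden behind retries.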

      We suggest upgrading Terraform and the provider in openshift-installer for 4.12+ to the latest versions available before the license changes. The underlying issue has probably already been fixed upstream and released in v5.37.0.

      Version-Release number of selected component (if applicable): TBD

      How reproducible: TBD

      Steps to Reproduce: TBD

      Actual results: TBD

      Expected results: TBD

      Additional info:

      The most common error we see in our JIRA issues is the following. We also found similar issues reported for the AWS provider, e.g. OCPBUGS-4213.

      level=error msg=Error: Provider produced inconsistent result after apply .... resource was present, but now absent
      

      Summary of errors from:

            3 failed to create cluster: failed to apply Terraform: error(GCPComputeBackendTimeout) from Infrastructure Provider: GCP is experiencing backend service interuptions, the compute instance failed to create in reasonable time."
            3 Provider produced inconsistent result after apply\n\nWhen applying changes to\nmodule.master.google_service_account.master-node-sa[0], provider\n\"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new value: Root\nresource was present, but now absent.\n\n
            6 Error waiting to create Network: Error waiting for Creating Network: timeout while waiting for state to become 'DONE' (last state: 'RUNNING', timeout: 4m0s)\n\n  with module.network.google_compute_network.cluster_network[0],\n  on network/network.tf line 1, in resource \"google_compute_network\" \"cluster_network\":\n   1: resource \"google_compute_network\" \"cluster_network\" {\n\n"
            9 error applying Terraform configs: failed to apply Terraform: error(GCPComputeBackendTimeout) from Infrastructure Provider: GCP is experiencing backend service interuptions, the compute instance failed to create in reasonable time."
           14 Provider produced inconsistent result after apply\n\nWhen applying changes to module.master.google_service_account.master-node-sa,\nprovider \"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new\nvalue: Root resource was present, but now absent.
           16 Provider produced inconsistent result after apply\n\nWhen applying changes to google_service_account_key.bootstrap, provider\n\"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new value: Root\nresource was present, but now absent.
           18 Provider produced inconsistent result after apply\n\nWhen applying changes to module.iam.google_service_account.worker-node-sa,\nprovider \"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new\nvalue: Root resource was present, but now absent.
           34 Error creating service account key: googleapi: Error 404: Service account projects/acs-san-stackroxci/serviceAccounts/XXX@acs-san-stackroxci.iam.gserviceaccount.com does not exist., notFound\n\n  with google_service_account_key.bootstrap,\n  on main.tf line 38, in resource \"google_service_account_key\" \"bootstrap\":\n  38: resource \"google_service_account_key\" \"bootstrap\" {\n\n"
           45 error applying Terraform configs: failed to apply Terraform: exit status 1\n\nError: Provider produced inconsistent result after apply\n\nWhen applying changes to\nmodule.master.google_service_account.master-node-sa[0], provider\n\"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new value: Root\nresource was present, but now absent.
           59 error applying Terraform configs: failed to apply Terraform: exit status 1\n\nError: Provider produced inconsistent result after apply\n\nWhen applying changes to module.iam.google_service_account.worker-node-sa,\nprovider \"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new\nvalue: Root resource was present, but now absent.
          100 Provider produced inconsistent result after apply\n\nWhen applying changes to google_service_account.bootstrap-node-sa, provider\n\"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new value: Root\nresource was present, but now absent.
          103 Provider produced inconsistent result after apply\n\nWhen applying changes to module.iam.google_service_account.worker-node-sa[0],\nprovider \"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new\nvalue: Root resource was present, but now absent.
          116 Provider produced inconsistent result after apply\n\nWhen applying changes to\nmodule.master.google_service_account.master-node-sa[0], provider\n\"provider[\\\"openshift/local/google\\\"]\" produced an unexpected new value: Root\nresource was present, but now absent.
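The counts above are easier to compare once the message variants are grouped by root cause. A minimal sketch of such a grouping follows, assuming each summary line has the form `<count> <message>`. The category names are hypothetical labels introduced here; the matching substrings are copied from the messages above.

```python
from collections import Counter
from typing import Iterable, Tuple

# (substring, category) pairs checked in order. The category names are
# hypothetical labels for this sketch; the substrings come from the summary.
CATEGORIES = (
    ("Provider produced inconsistent result after apply", "inconsistent-result"),
    ("GCPComputeBackendTimeout", "compute-backend-timeout"),
    ("Error waiting to create Network", "network-create-timeout"),
    ("Error creating service account key", "service-account-not-found"),
)


def categorize(message: str) -> str:
    """Map an error message to the first matching category, else 'other'."""
    for needle, category in CATEGORIES:
        if needle in message:
            return category
    return "other"


def parse_summary_line(line: str) -> Tuple[int, str]:
    """Split '<count> <message>' into its integer count and message text."""
    count, _, message = line.strip().partition(" ")
    return int(count), message


def tally(lines: Iterable[str]) -> Counter:
    """Sum the per-line counts into per-category totals."""
    totals: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the pasted summary
        count, message = parse_summary_line(line)
        totals[categorize(message)] += count
    return totals
```

Grouped this way, the thirteen summary lines attribute 474 of the 526 counted failures to "Provider produced inconsistent result after apply", 34 to the service account 404, 12 to GCPComputeBackendTimeout, and 6 to the network creation timeout.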
      

      The OpenShift installer contains a bundled Terraform and Google provider.

              rh-ee-bbarbach Brent Barbachem
              aruklets@redhat.com Alexander Rukletsov
              Manoj Hans Manoj Hans