OCPBUGS-43760

Azure installs are slow to create manifests


    • Critical
    • No
    • False

      Cause: Some Azure API calls that list available resources were using an older API version that did not allow filtering by location.

      Consequence: The API returned enormous amounts of data, which in some cases caused timeout errors in the installer.

      Fix: Updating the API version and adding a location filter reduces the number of results returned.

      Result: The timeout no longer occurs and the install proceeds.

      This is a clone of issue OCPBUGS-31546. The following is the description of the original issue:

      Description of problem:

          During an Azure install, the installer hangs noticeably while running create manifests or create cluster. It sits unresponsive for almost 2 minutes at:
      
      DEBUG OpenShift Installer unreleased-master-9741-gbc9836aa9bd3a4f10d229bb6f87981dddf2adc92 
      DEBUG Built from commit bc9836aa9bd3a4f10d229bb6f87981dddf2adc92 
      DEBUG Fetching Metadata...                         
      DEBUG Loading Metadata...                          
      DEBUG   Loading Cluster ID...                      
      DEBUG     Loading Install Config...                
      DEBUG       Loading SSH Key...                     
      DEBUG       Loading Base Domain...                 
      DEBUG         Loading Platform...                  
      DEBUG       Loading Cluster Name...                
      DEBUG         Loading Base Domain...               
      DEBUG         Loading Platform...                  
      DEBUG       Loading Pull Secret...                 
      DEBUG       Loading Platform...                    
      INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json" 
      
      This could also be related to failures we see in CI such as this:
      https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_installer/8123/pull-ci-openshift-installer-master-e2e-azure-ovn/1773611162923962368
      
      level=info msg=Consuming Worker Machines from target directory
      level=info msg=Credentials loaded from file "/var/run/secrets/ci.openshift.io/cluster-profile/osServicePrincipal.json"
      level=fatal msg=failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": error connecting to Azure client: failed to list SKUs: compute.ResourceSkusClient#List: Failure responding to request: StatusCode=200 -- Original Error: Error occurred reading http.Response#Body - Error = 'read tcp 10.128.117.2:43870->4.150.240.10:443: read: connection reset by peer' 
      
      If the call takes too long and the context is canceled when its timeout expires, we might see this error.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          Always

      Steps to Reproduce:

          1. Run an Azure install
          

      Actual results:

          

      Expected results:

          

      Additional info:

      https://github.com/openshift/installer/pull/8134 has a partial fix.

            padillon Patrick Dillon
            openshift-crt-jira-prow OpenShift Prow Bot
            Jinyun Ma Jinyun Ma