  1. OpenShift Bugs
  2. OCPBUGS-31546

Azure installs are slow to create manifests

    • Critical
    • No
    • Proposed
    • False

      Description of problem:

          When running an Azure install, the installer hangs for a long time during create manifests or create cluster. It sits unresponsive for almost 2 minutes after printing the following:
      
      DEBUG OpenShift Installer unreleased-master-9741-gbc9836aa9bd3a4f10d229bb6f87981dddf2adc92 
      DEBUG Built from commit bc9836aa9bd3a4f10d229bb6f87981dddf2adc92 
      DEBUG Fetching Metadata...                         
      DEBUG Loading Metadata...                          
      DEBUG   Loading Cluster ID...                      
      DEBUG     Loading Install Config...                
      DEBUG       Loading SSH Key...                     
      DEBUG       Loading Base Domain...                 
      DEBUG         Loading Platform...                  
      DEBUG       Loading Cluster Name...                
      DEBUG         Loading Base Domain...               
      DEBUG         Loading Platform...                  
      DEBUG       Loading Pull Secret...                 
      DEBUG       Loading Platform...                    
      INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json" 
      
      This may also be related to CI failures such as this one:
      https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_installer/8123/pull-ci-openshift-installer-master-e2e-azure-ovn/1773611162923962368
      
      level=info msg=Consuming Worker Machines from target directory
      level=info msg=Credentials loaded from file "/var/run/secrets/ci.openshift.io/cluster-profile/osServicePrincipal.json"
      level=fatal msg=failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": error connecting to Azure client: failed to list SKUs: compute.ResourceSkusClient#List: Failure responding to request: StatusCode=200 -- Original Error: Error occurred reading http.Response#Body - Error = 'read tcp 10.128.117.2:43870->4.150.240.10:443: read: connection reset by peer' 
      
      If the call takes too long and its context times out or is canceled, we might see this error.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          Always

      Steps to Reproduce:

          1. Run an Azure install (create manifests or create cluster).
          

      Actual results:

          

      Expected results:

          

      Additional info:

      https://github.com/openshift/installer/pull/8134 has a partial fix.

            padillon Patrick Dillon
            padillon Patrick Dillon
            Jinyun Ma Jinyun Ma