[OCPBUGS-24995] [azure] bootstrap failed to be provisioned when vm type is set to Standard_NP10s

    • Moderate
    • No
    • False
    • N/A
    • Release Note Not Required

      Description of problem:

      Set the VM type to Standard_NP10s, which supports only Hypervisor Generation 1 (V1), in install-config.yaml.
      --------------
      compute:
      - architecture: amd64
        hyperthreading: Enabled
        name: worker
        platform:
          azure:
            type: Standard_NP10s
        replicas: 3
      controlPlane:
        architecture: amd64
        hyperthreading: Enabled
        name: master
        platform:
          azure:
            type: Standard_NP10s
        replicas: 3
      
      Continue the installation. The installer fails while provisioning the bootstrap node.
      --------------
      ERROR                                              
      ERROR Error: creating Linux Virtual Machine: (Name "jima1211test-rqfhm-bootstrap" / Resource Group "jima1211test-rqfhm-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The selected VM size 'Standard_NP10s' cannot boot Hypervisor Generation '2'. If this was a Create operation please check that the Hypervisor Generation of the Image matches the Hypervisor Generation of the selected VM Size. If this was an Update operation please select a Hypervisor Generation '2' VM Size. For more information, see https://aka.ms/azuregen2vm" 
      ERROR                                              
      ERROR   with azurerm_linux_virtual_machine.bootstrap, 
      ERROR   on main.tf line 193, in resource "azurerm_linux_virtual_machine" "bootstrap": 
      ERROR  193: resource "azurerm_linux_virtual_machine" "bootstrap" { 
      ERROR                                              
      ERROR failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failure applying terraform for "bootstrap" stage: error applying Terraform configs: failed to apply Terraform: exit status 1 
      ERROR                                              
      ERROR Error: creating Linux Virtual Machine: (Name "jima1211test-rqfhm-bootstrap" / Resource Group "jima1211test-rqfhm-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The selected VM size 'Standard_NP10s' cannot boot Hypervisor Generation '2'. If this was a Create operation please check that the Hypervisor Generation of the Image matches the Hypervisor Generation of the selected VM Size. If this was an Update operation please select a Hypervisor Generation '2' VM Size. For more information, see https://aka.ms/azuregen2vm" 
      ERROR                                              
      ERROR   with azurerm_linux_virtual_machine.bootstrap, 
      ERROR   on main.tf line 193, in resource "azurerm_linux_virtual_machine" "bootstrap": 
      ERROR  193: resource "azurerm_linux_virtual_machine" "bootstrap" { 
      ERROR                                              
      ERROR                                              
      
      The issue seems to have been introduced by https://github.com/openshift/installer/pull/7642/
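
The mismatch reported in the error above (a V1-only VM size paired with a Generation 2 image) can be sketched as a small compatibility check. This is only an illustration under stated assumptions, not the installer's actual logic: the function names and the `SIZES` table are hypothetical, and the capability strings ("V1", "V1,V2") are assumed to follow the comma-separated form in which Azure reports supported hypervisor generations for a VM size.

```python
from typing import Iterable, Optional, Set


def parse_generations(capability: str) -> Set[str]:
    """Split a capability value such as "V1,V2" into {"V1", "V2"}."""
    return {part.strip().upper() for part in capability.split(",") if part.strip()}


def pick_generation(vm_capability: str, available_images: Iterable[str]) -> Optional[str]:
    """Return the newest image generation the VM size can boot, or None."""
    supported = parse_generations(vm_capability)
    candidates = supported & {gen.upper() for gen in available_images}
    # "V2" sorts after "V1", so max() prefers the newer generation.
    return max(candidates) if candidates else None


# Hypothetical capability table for illustration only.
# Standard_NP10s is V1-only according to this report.
SIZES = {"Standard_NP10s": "V1", "Standard_D8s_v3": "V1,V2"}

for size, capability in SIZES.items():
    print(size, "->", pick_generation(capability, ["V1", "V2"]))
```

With a check of this kind, a V1-only size such as Standard_NP10s would be paired with a Generation 1 image, and a size with no matching image generation would be rejected before any VM is created.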

      Version-Release number of selected component (if applicable):

      4.15.0-0.nightly-2023-12-09-012410

      How reproducible:

      Always

      Steps to Reproduce:

          1. Set the VM type to Standard_NP10s for the control plane in install-config.yaml.
          2. Install the cluster.
          

      Actual results:

          The installer fails while provisioning the bootstrap node.

      Expected results:

          The installation succeeds.

      Additional info:

          

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Critical: OpenShift Container Platform 4.16.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:0041

            OpenShift Jira Bot added a comment -

            Looks like this bug is far enough along in the workflow that a code fix is ready. Customers and support need to know the backport plan. Please complete the "Target Backport Versions" field to indicate which version(s) will receive the fix.

            Jinyun Ma added a comment -

            Based on the pre-merge test in the earlier comment, and because installation with this instance type passed against payload '4.16.0-0.nightly-2024-01-04-160948', moving the bug to VERIFIED.

            Jinyun Ma added a comment -

            rdossant, thanks for raising the new bug. I will follow up in that bug.

            Rafael Fonseca dos Santos added a comment -

            Marketplace-specific bug here: https://issues.redhat.com/browse/OCPBUGS-25191

            Rafael Fonseca dos Santos added a comment -

            jinyunma OK, I see the problem, and it's not related to either PR#7642 or PR#7822. And it is my fault. Do you mind if we fix it in a separate Jira issue?

            OpenShift Jira Bot added a comment -

            Hi rdossant,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

            Jinyun Ma added a comment -

            Hi rdossant, in pre-merge testing on installer PR#7822, cluster creation succeeded with a G1-only instance type (e.g. Standard_NP10s).

            However, testing with a marketplace image produced an error.

            Install-config:

            platform:
              azure:
                baseDomainResourceGroupName: os4-common
                cloudName: AzurePublicCloud
                outboundType: Loadbalancer
                region: southcentralus
                defaultMachinePlatform:
                  osImage:
                    offer: rh-ocp-worker
                    publisher: redhat
                    sku: rh-ocp-worker
                    version: 413.92.2023101700 

            Creating manifests failed:

            $ ./openshift-install create manifests --dir ipi1 --log-level debug
            DEBUG OpenShift Installer 4.16.0-0.test-2023-12-12-020559-ci-ln-xkqmlqk-latest 
            DEBUG Built from commit 456ae720a83e39dffd9918c5a71388ad873b6a38 
            DEBUG Fetching Master Machines...                  
            DEBUG Loading Master Machines...                   
            DEBUG   Loading Cluster ID...                      
            DEBUG     Loading Install Config...                
            DEBUG       Loading SSH Key...                     
            DEBUG       Loading Base Domain...                 
            DEBUG         Loading Platform...                  
            DEBUG       Loading Cluster Name...                
            DEBUG         Loading Base Domain...               
            DEBUG         Loading Platform...                  
            DEBUG       Loading Pull Secret...                 
            DEBUG       Loading Platform...                    
            INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json" 
            ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: [controlPlane.platform.azure.osImage: Invalid value: azure.OSImage{Plan:"", Publisher:"redhat", Offer:"rh-ocp-worker", SKU:"rh-ocp-worker", Version:"413.92.2023101700"}: could not get marketplace image: %!w(<nil>), compute[0].platform.azure.osImage: Invalid value: azure.OSImage{Plan:"", Publisher:"redhat", Offer:"rh-ocp-worker", SKU:"rh-ocp-worker", Version:"413.92.2023101700"}: could not get marketplace image: %!w(<nil>)]  

            The same error occurs on nightly build 4.15.0-0.nightly-2023-12-11-033133, which includes installer PR#7642.

            Creating manifests succeeds on 4.15.0-ec.3 (created from 4.15.0-0.nightly-2023-11-28-101923, which does not include installer PR#7642).


            Rafael Fonseca dos Santos added a comment -

            jinyunma Yeah, I think so too.

            Jinyun Ma added a comment -

            rdossant, if installer PR#7642 is reverted, I guess https://issues.redhat.com/browse/OCPBUGS-25007 should also be fixed.
