OpenShift Bugs / OCPBUGS-5184

[azure] Fail to create master node with vm size in standardNVSv4Family

Details

      Cause: Installation would fail if a VM size from the standardNVSv4Family family was chosen, because that VM family is Windows-only.
      Fix: The Installer validation now rejects that VM family.
      Result: The Installer fails during validation with a message saying the selected VM size "is currently only supported on Windows".
    • Done

    Description

      Description of problem:

      Deploying an IPI Azure cluster fails when the region is set to westus3 and the VM type to NV8as_v4. The master node is shown as running in the Azure portal, but SSH login to it is not possible. The serial log shows the following errors:

      [ 3009.547219] amdgpu d1ef:00:00.0: amdgpu: failed to write reg:de0
      [ 3011.982399] mlx5_core 6637:00:02.0 enP26167s1: TX timeout detected
      [ 3011.987010] mlx5_core 6637:00:02.0 enP26167s1: TX timeout on queue: 0, SQ: 0x170, CQ: 0x84d, SQ Cons: 0x823 SQ Prod: 0x840, usecs since last trans: 2418884000
      [ 3011.996946] mlx5_core 6637:00:02.0 enP26167s1: TX timeout on queue: 1, SQ: 0x175, CQ: 0x852, SQ Cons: 0x248c SQ Prod: 0x24a7, usecs since last trans: 2148366000
      [ 3012.006980] mlx5_core 6637:00:02.0 enP26167s1: TX timeout on queue: 2, SQ: 0x17a, CQ: 0x857, SQ Cons: 0x44a1 SQ Prod: 0x44c0, usecs since last trans: 2055000000
      [ 3012.016936] mlx5_core 6637:00:02.0 enP26167s1: TX timeout on queue: 3, SQ: 0x17f, CQ: 0x85c, SQ Cons: 0x405f SQ Prod: 0x4081, usecs since last trans: 1913890000
      [ 3012.026954] mlx5_core 6637:00:02.0 enP26167s1: TX timeout on queue: 4, SQ: 0x184, CQ: 0x861, SQ Cons: 0x39f2 SQ Prod: 0x3a11, usecs since last trans: 2020978000
      [ 3012.037208] mlx5_core 6637:00:02.0 enP26167s1: TX timeout on queue: 5, SQ: 0x189, CQ: 0x866, SQ Cons: 0x1784 SQ Prod: 0x17a6, usecs since last trans: 2185513000
      [ 3012.047178] mlx5_core 6637:00:02.0 enP26167s1: TX timeout on queue: 6, SQ: 0x18e, CQ: 0x86b, SQ Cons: 0x4c96 SQ Prod: 0x4cb3, usecs since last trans: 2124353000
      [ 3012.056893] mlx5_core 6637:00:02.0 enP26167s1: TX timeout on queue: 7, SQ: 0x193, CQ: 0x870, SQ Cons: 0x3bec SQ Prod: 0x3c0f, usecs since last trans: 1855857000
      [ 3021.535888] amdgpu d1ef:00:00.0: amdgpu: failed to write reg:e15
      [ 3021.545955] BUG: unable to handle kernel paging request at ffffb57b90159000
      [ 3021.550864] PGD 100145067 P4D 100145067 PUD 100146067 PMD 0 

      According to the Azure documentation at https://learn.microsoft.com/en-us/azure/virtual-machines/nvv4-series , the NVv4 series appears to support only Windows VMs.

       

      Version-Release number of selected component (if applicable):

      4.12 nightly build

      How reproducible:

      Always

      Steps to Reproduce:

      1. Prepare install-config.yaml with the region set to westus3 and the VM type set to NV8as_v4.
      2. Install the cluster.

      Actual results:

      The installation fails.

      Expected results:

      If the NVv4 series is not supported for Linux VMs, the installer should validate the VM size and report that this size is not supported.
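
      The expected behavior can be illustrated with a minimal sketch. This is not the Installer's actual code (the Installer is not written in Python); the names WINDOWS_ONLY_FAMILIES and validate_vm_family are hypothetical, the "VM size ..." prefix of the message is an assumption, and only the quoted phrase "is currently only supported on Windows" and the family name standardNVSv4Family come from this issue. The sketch assumes the VM family for a size has already been looked up from Azure.

```python
# Hypothetical sketch: none of these names come from the Installer's source.

# VM families assumed to be unusable for Linux nodes, stored case-folded.
# Only the family named in this issue is listed.
WINDOWS_ONLY_FAMILIES = {"standardnvsv4family"}


def validate_vm_family(vm_size: str, family: str) -> list[str]:
    """Return validation errors for the given VM size (empty list if OK)."""
    errors = []
    # Compare case-insensitively so differently cased family names still match.
    if family.casefold() in WINDOWS_ONLY_FAMILIES:
        errors.append(
            f"VM size {vm_size} is currently only supported on Windows"
        )
    return errors


if __name__ == "__main__":
    # The size/family pairing is taken from this issue's title and description.
    for err in validate_vm_family("NV8as_v4", "standardNVSv4Family"):
        print(err)
```

      Rejecting by family rather than by individual size means every size in the family is covered by a single entry, which matches the fix described in the Details section.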

            People

              rdossant Rafael Fonseca dos Santos
              jinyunma Jinyun Ma
              Jinyun Ma Jinyun Ma