OCPBUGS-45113: Azure: L-Series & M-Series mount failures due to missing nvme udev rules


      Description of problem:

          Azure Disk volumes fail to mount on the Standard_L8s_v4 instance type with the following error:
      $ oc describe pod mypod-test-1
      Events:
        Type     Reason                  Age                            From                     Message
        ----     ------                  ----                           ----                     -------
        Warning  FailedScheduling        0s                             default-scheduler        0/6 nodes are available: persistentvolumeclaim "mypvc-test-1" not found. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
        Normal   Scheduled               <invalid>                      default-scheduler        Successfully assigned default/mypod-test-1 to jima27a-4swfc-worker-southeastasia3-xb5lw
        Normal   SuccessfulAttachVolume  <invalid>                      attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-adff1a0b-7a93-42a6-a5ad-9fa879ff487c"
        Warning  FailedMount             <invalid> (x4 over <invalid>)  kubelet                  MountVolume.MountDevice failed for volume "pvc-adff1a0b-7a93-42a6-a5ad-9fa879ff487c" : rpc error: code = Internal desc = failed to find disk on lun 0. azureDisk - findDiskByLun(0) failed with error(failed to find disk by lun 0)
      
      
      Checking the attached node, there is no LUN information in the HCTL column for the PV's disk (nvme0n2), nor for nvme0n1 or any of the other disks; a sketch of how to check the node for the Azure udev rules follows the lsblk output below.
      sh-5.1# chroot /host
      sh-5.1# lsblk -o NAME,KNAME,MAJ:MIN,FSTYPE,SIZE,TYPE,MOUNTPOINT,HCTL
      NAME        KNAME     MAJ:MIN FSTYPE   SIZE TYPE MOUNTPOINT HCTL
      nvme0n1     nvme0n1   259:0            128G disk
      |-nvme0n1p1 nvme0n1p1 259:1              1M part
      |-nvme0n1p2 nvme0n1p2 259:2   vfat     127M part
      |-nvme0n1p3 nvme0n1p3 259:3   ext4     384M part /boot
      `-nvme0n1p4 nvme0n1p4 259:4   xfs    127.5G part /sysroot
      nvme1n1     nvme1n1   259:5            447G disk
      nvme2n1     nvme2n1   259:6            447G disk
      nvme3n1     nvme3n1   259:7            447G disk
      nvme4n1     nvme4n1   259:8            447G disk
      nvme0n2     nvme0n2   259:9            100G disk  
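
      Since the issue title attributes this to missing nvme udev rules, a quick check on the affected node is to look for the Azure storage udev rules and the /dev/disk/azure symlink tree they normally create. This is only a sketch (the exact rule file names vary between images), not the definitive diagnostic:
      sh-5.1# chroot /host
      # list any Azure- or NVMe-related udev rules shipped on the node
      sh-5.1# ls /usr/lib/udev/rules.d/ /etc/udev/rules.d/ | grep -iE 'azure|nvme'
      # with the rules in place, Azure data disks get stable symlinks under /dev/disk/azure
      sh-5.1# ls -lR /dev/disk/azure/ 2>/dev/null || echo "no /dev/disk/azure symlinks"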

      Version-Release number of selected component (if applicable):

        4.18.0-0.test-2024-11-27-021558-ci-ln-i9ih8fb-latest
      Found during pre-merge testing for https://issues.redhat.com/browse/OCPSTRAT-1729

      How reproducible:

          Always

      Steps to Reproduce:

      1. Create an Azure cluster using the Standard_L8s_v4 instance type (which is supported in OCPSTRAT-1729)
      2. Create a PVC and a pod that uses it (an illustrative manifest is sketched after this list)
      3. Check the pod status
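
      A minimal reproduction sketch of step 2, as a shell session. The storage class name managed-csi and the UBI image are assumptions (use whatever Azure Disk CSI storage class and image the cluster has); the object names simply mirror the events shown above:
      $ cat repro.yaml
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: mypvc-test-1
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: managed-csi
      ---
      apiVersion: v1
      kind: Pod
      metadata:
        name: mypod-test-1
      spec:
        containers:
          - name: test
            image: registry.access.redhat.com/ubi9/ubi-minimal
            command: ["sleep", "3600"]
            volumeMounts:
              - name: data
                mountPath: /mnt/data
        volumes:
          - name: data
            persistentVolumeClaim:
              claimName: mypvc-test-1
      $ oc apply -f repro.yaml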

      Actual results:

        The pod is not running:
      $ oc get pod
      NAME           READY   STATUS              RESTARTS   AGE
      mypod-test-1   0/1     ContainerCreating   0          25m  

      Expected results:

      The pod should be running    

      Additional info:

          

Assignee: Unassigned
Reporter: Wei Duan (wduan@redhat.com)