OpenShift Bugs / OCPBUGS-30151

LVMS on SNO: after LUN disk is created, LVMS can't provision volumes


      Description of problem:

      Found during CNV testing:
      We create a LUN disk, and it leaves LVMS dysfunctional.

      akalenyu's guess is that LVMS picks up the faulty test disks as PVs.

      The PVC stays Pending and no PV is created:

      $ oc describe pvc prime-9819eed5-e3be-4ae7-bc3e-31a4da5e5240
      ....
      Events:
        Type     Reason                Age                From                                                                                 Message
        ----     ------                ----               ----                                                                                 -------
        Normal   WaitForFirstConsumer  79s (x2 over 79s)  persistentvolume-controller                                                          waiting for first consumer to be created before binding
        Normal   WaitForPodScheduled   79s                persistentvolume-controller                                                          waiting for pod importer-prime-9819eed5-e3be-4ae7-bc3e-31a4da5e5240 to be scheduled
        Normal   ExternalProvisioning  11s (x7 over 77s)  persistentvolume-controller                                                          Waiting for a volume to be created either by the external provisioner 'topolvm.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
        Normal   Provisioning          11s (x7 over 77s)  topolvm.io_topolvm-controller-765c99856c-tk546_2c2d5fc1-2e78-4a05-a0ba-b22ac720e26f  External provisioner is provisioning volume for claim "default/prime-9819eed5-e3be-4ae7-bc3e-31a4da5e5240"
        Warning  ProvisioningFailed    10s (x7 over 77s)  topolvm.io_topolvm-controller-765c99856c-tk546_2c2d5fc1-2e78-4a05-a0ba-b22ac720e26f  failed to provision volume with StorageClass "lvms-vg1": rpc error: code = Internal desc = exit status 5 
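
      For reference, a sketch of pulling the relevant logs (assuming the default LVMS namespace openshift-storage; pod and container names may differ by version):

      # Controller-side provisioner log; pod name taken from the events above:
      oc -n openshift-storage logs topolvm-controller-765c99856c-tk546 --all-containers
      # Node-side log of the component that actually runs lvcreate
      # (daemonset/container names are assumptions and may vary by LVMS version):
      oc -n openshift-storage logs ds/topolvm-node -c lvmd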

      In the log, we see:

      {"level":"info","ts":"2024-02-28T16:19:58Z","msg":"invoking LVM command","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","LogicalVolume":{"name":"pvc-8a23930c-1d92-4c9a-844e-2b06af48ba4f"},"namespace":"","name":"pvc-8a23930c-1d92-4c9a-844e-2b06af48ba4f","reconcileID":"1809a914-4e99-4861-b538-35807bad326e","args":["lvcreate","-T","vg1/thin-pool-1","-n","041071d3-0074-45ac-b6d0-49558f97d225","-V","1073741824b","-W","y","-y"]}
        WARNING: Couldn't find device with uuid ohBUUc-R3Xh-oAbA-yGsl-rev7-Xmuv-4EeQJl.
        WARNING: Couldn't find device with uuid qfXeIs-aBc8-0eol-XPT9-BK45-GP1b-WNGusC.
        WARNING: Couldn't find device with uuid jWPS0x-N955-flpA-sjdR-ufvF-dSgg-0ajGaF.
        WARNING: Couldn't find device with uuid K7M9QZ-zEvp-hwJE-zRG2-4A2K-4lrw-l3ILE2.
        WARNING: Couldn't find device with uuid oMZ5Lm-xA03-IDe9-BIlM-JNcx-0n1M-RzTRj0.
        WARNING: Couldn't find device with uuid Zxzmcs-6SiS-tIUr-Gu4u-llk8-iG2N-qrxZ8u.
        WARNING: Couldn't find device with uuid zPV3eW-lf2M-7dWm-hyic-3CQt-f53d-FL1cG3.
        WARNING: Couldn't find device with uuid 4UAniO-KL3o-u1x5-ojsm-QVC8-tuBs-igI6ZF.
        WARNING: VG vg1 is missing PV ohBUUc-R3Xh-oAbA-yGsl-rev7-Xmuv-4EeQJl (last written to [unknown]).
        WARNING: VG vg1 is missing PV qfXeIs-aBc8-0eol-XPT9-BK45-GP1b-WNGusC (last written to [unknown]).
        WARNING: VG vg1 is missing PV jWPS0x-N955-flpA-sjdR-ufvF-dSgg-0ajGaF (last written to [unknown]).
        WARNING: VG vg1 is missing PV K7M9QZ-zEvp-hwJE-zRG2-4A2K-4lrw-l3ILE2 (last written to [unknown]).
        WARNING: VG vg1 is missing PV oMZ5Lm-xA03-IDe9-BIlM-JNcx-0n1M-RzTRj0 (last written to [unknown]).
        WARNING: VG vg1 is missing PV Zxzmcs-6SiS-tIUr-Gu4u-llk8-iG2N-qrxZ8u (last written to [unknown]).
        WARNING: VG vg1 is missing PV zPV3eW-lf2M-7dWm-hyic-3CQt-f53d-FL1cG3 (last written to [unknown]).
        WARNING: VG vg1 is missing PV 4UAniO-KL3o-u1x5-ojsm-QVC8-tuBs-igI6ZF (last written to [unknown]).
        Cannot change VG vg1 while PVs are missing.
        See vgreduce --removemissing and vgextend --restoremissing.
        Cannot process volume group vg1
      After running 'vgreduce --removemissing', the cluster is back to normal and PVCs get Bound.
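
      For reference, a minimal recovery sketch, assuming the VG is named vg1 as above and the missing PVs are test disks that are gone for good (these are the two commands the LVM error message itself points to):

      # Show which PVs the VG still references (missing ones appear as [unknown]):
      pvs -o pv_name,vg_name,pv_uuid
      # Drop the missing PVs from the VG metadata. This discards any data that
      # lived on them, so only do it when the devices are really gone for good:
      vgreduce --removemissing vg1
      # If a device was only temporarily detached, restore it instead:
      # vgextend --restoremissing vg1 /dev/sdX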

      However, the reported free storage is smaller than expected:

      sh-5.1# vgs
        Devices file sys_wwid naa.60014051bbdbe3f239940cf91bc66e8e PVID ohBUUcR3XhoAbAyGslrev7Xmuv4EeQJl last seen on /dev/sdc not found.
        Devices file sys_wwid naa.3333333000008ca0 PVID jkAlTcIBAKq69CHiK21ryYhIXU0SjUTO last seen on /dev/sdc not found.
        Devices file sys_wwid naa.6001405f04919315d434c29b851e594e PVID nyx5Vm49UAAtHXT0Cv0ydl1662naI30q last seen on /dev/sdc not found.
        VG  #PV #LV #SN Attr   VSize   VFree  
        vg1   1  15   0 wz--n- 446.62g <43.57g
      
      sh-5.1# lvs
        Devices file sys_wwid naa.60014051bbdbe3f239940cf91bc66e8e PVID ohBUUcR3XhoAbAyGslrev7Xmuv4EeQJl last seen on /dev/sdc not found.
        Devices file sys_wwid naa.3333333000008ca0 PVID jkAlTcIBAKq69CHiK21ryYhIXU0SjUTO last seen on /dev/sdc not found.
        Devices file sys_wwid naa.6001405f04919315d434c29b851e594e PVID nyx5Vm49UAAtHXT0Cv0ydl1662naI30q last seen on /dev/sdc not found.
        LV                                   VG  Attr       LSize    Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
        048f1af2-8ea3-4562-94e5-e1853c949dd6 vg1 Vwi-a-tz--   30.00g thin-pool-1        16.67                                  
        51de4f0f-e4d1-42db-8695-63f81284fd0c vg1 Vwi-a-tz--    1.00g thin-pool-1        8.13                                   
        6835a099-2d94-4639-8179-f40e90040262 vg1 Vwi-a-tz--   30.00g thin-pool-1        33.33                                  
        6eca9314-4619-454e-bfdc-de9280dd7146 vg1 Vwi-a-tz--    1.00g thin-pool-1        8.13                                   
        73db819f-e97b-458a-9707-64edc8a93a99 vg1 Vwi-a-tz--    1.00g thin-pool-1        8.13                                   
        75eaa9fe-a61d-4a67-a8c9-bd111323c5fe vg1 Vwi-a-tz--  512.00m thin-pool-1        0.00                                   
        84793682-1ed5-449a-97fd-6f157975e364 vg1 Vwi-a-tz--   30.00g thin-pool-1        33.33                                  
        863bbc33-571c-47f5-a446-f6962bcecca8 vg1 Vwi-a-tz--    1.00g thin-pool-1        8.13                                   
        88db478b-1183-4d69-8c2d-0baa6597a49c vg1 Vwi-a-tz--   30.00g thin-pool-1        33.33                                  
        a8f8cbb6-358b-4ab8-8865-60c48a9c8f5d vg1 Vwi-a-tz--   30.00g thin-pool-1        33.33                                  
        ad21e499-da7e-4f62-ad65-f9f8ab63de93 vg1 Vwi-a-tz--   30.00g thin-pool-1        33.33                                  
        c9423aa9-103d-4da7-b962-aa582907a8f0 vg1 Vwi-a-tz--   30.00g thin-pool-1        26.67                                  
        d14b3e7c-96bb-429b-90e1-b9ed4d94151d vg1 Vwi-a-tz--    1.00g thin-pool-1        8.13                                   
        db2dcf3f-73e6-419f-be9a-312ade2bc79c vg1 Vwi-a-tz--    1.00g thin-pool-1        8.13                                   
        thin-pool-1                          vg1 twi-aotz-- <402.66g                    15.77  12.35    
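
      For reference, a quick cross-check of the accounting (a sketch; after the missing PVs are removed, the thin-pool size plus VFree should roughly add up to the VG size):

      # VG-level size and free space:
      vgs -o vg_name,vg_size,vg_free vg1
      # Thin pool size and usage:
      lvs -o lv_name,lv_size,data_percent,metadata_percent vg1/thin-pool-1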

       

      Also, we see that after LUN creation the empty-lvmdevices fix is not honored, so this is a regression of https://issues.redhat.com/browse/OCPBUGS-5223:

      sh-5.1# cat /etc/lvm/devices/system.devices
      # LVM uses devices listed in this file.
      # Created by LVM command vgextend pid 3989733 at Thu Feb 29 13:11:30 2024
      VERSION=1.1.38
      IDTYPE=sys_wwid IDNAME=naa.62cea7f05051440026bda6e52583db0c DEVNAME=/dev/sdb PVID=2919x7klrcW0YR6AfZ56vYHnSzkhmmdd
      IDTYPE=sys_wwid IDNAME=naa.60014051bbdbe3f239940cf91bc66e8e DEVNAME=/dev/sdc PVID=ohBUUcR3XhoAbAyGslrev7Xmuv4EeQJl
      IDTYPE=sys_wwid IDNAME=naa.3333333000008ca0 DEVNAME=/dev/sdc PVID=jkAlTcIBAKq69CHiK21ryYhIXU0SjUTO
      IDTYPE=sys_wwid IDNAME=naa.6001405f04919315d434c29b851e594e DEVNAME=/dev/sdc PVID=nyx5Vm49UAAtHXT0Cv0ydl1662naI30q 
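
      For reference, a hedged sketch of cleaning up the stale entries with lvmdevices; the PVIDs are the ones reported as "not found" above, so double-check them before deleting:

      # Report entries in system.devices that no longer match a device:
      lvmdevices --check
      # Remove the stale LUN entries by PVID (values copied from the warnings above):
      lvmdevices --delpvid ohBUUcR3XhoAbAyGslrev7Xmuv4EeQJl
      lvmdevices --delpvid jkAlTcIBAKq69CHiK21ryYhIXU0SjUTO
      lvmdevices --delpvid nyx5Vm49UAAtHXT0Cv0ydl1662naI30q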

      LVMS team, please feel free to open another bug if these turn out to need two separate fixes.

       

      Version-Release number of selected component (if applicable):

      4.15
      
      ClusterID: f3901f31-d4e9-4a25-b90c-95d4e7a1ce88
      ClusterVersion: Stable at "4.15.0-rc.7"
      ClusterOperators:
          All healthy and stable
      

      Steps to Reproduce:

      Create a LUN disk (see the sketch below).
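
      For reference, a minimal sketch of attaching a LUN disk to a VM in CNV; every name here (VM, PVC, image) is a hypothetical placeholder, and the only relevant part is the "lun" device type:

      # Attach an existing PVC to a VM as a SCSI LUN (all names are hypothetical):
      oc apply -f - <<'EOF'
      apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        name: lun-test-vm
      spec:
        running: true
        template:
          spec:
            domain:
              devices:
                disks:
                - name: rootdisk
                  disk:
                    bus: virtio
                - name: lundisk
                  lun: {}
              resources:
                requests:
                  memory: 1Gi
            volumes:
            - name: rootdisk
              containerDisk:
                image: quay.io/containerdisks/fedora:latest
            - name: lundisk
              persistentVolumeClaim:
                claimName: lun-test-pvc
      EOF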

      Actual results:

      The LVMS provisioner breaks for no obvious reason; the steps to fix it are only discoverable in the log (which is itself quite long).

      Expected results:

      LUN disks should not break the provisioner. 

      Additional info:

       

       

       

      Assignee: Jakob Moeller (rh-ee-jmoller)
      Reporter: Jenia Peimer (jpeimer@redhat.com)
      QA Contact: Rahul Deore
      Contributors: Adam Litke, Alexander Wels, Alex Kalenyuk