OpenShift Bugs / OCPBUGS-53147

[LSO] diskmaker failed to get volume mode of path in deletePV



      Description of problem:

          In a presubmit job, the diskmaker logs show a repeating series of errors:
      
      I0314 01:50:57.882044   36668 cache.go:55] Added pv "local-pv-ea5b0e26" to cache
      I0314 01:50:57.882171   36668 reconcile.go:97] "Looking for released PVs to cleanup" namespace="openshift-local-storage" name="tentothirty-overlapping-twentytofifty-1-2"
      E0314 01:50:57.882288   36668 deleter.go:103] failed to get volume mode of path "/mnt/local-storage/tentothirty-overlapping-twentytofifty-1-2/nvme-Amazon_Elastic_Block_Store_vol0662dcb2581e2e50d": Directory check for "/mnt/local-storage/tentothirty-overlapping-twentytofifty-1-2/nvme-Amazon_Elastic_Block_Store_vol0662dcb2581e2e50d" failed: open /mnt/local-storage/tentothirty-overlapping-twentytofifty-1-2/nvme-Amazon_Elastic_Block_Store_vol0662dcb2581e2e50d: no such file or directory
      I0314 01:50:57.882306   36668 reconcile.go:101] "Looking for symlinks to cleanup" namespace="openshift-local-storage" name="tentothirty-overlapping-twentytofifty-1-2"
      2025-03-14T01:50:57.882Z	DEBUG	events	recorder/recorder.go:104	Error cleaning PV "local-pv-ea5b0e26": failed to get volume mode of path "/mnt/local-storage/tentothirty-overlapping-twentytofifty-1-2/nvme-Amazon_Elastic_Block_Store_vol0662dcb2581e2e50d": Directory check for "/mnt/local-storage/tentothirty-overlapping-twentytofifty-1-2/nvme-Amazon_Elastic_Block_Store_vol0662dcb2581e2e50d" failed: open /mnt/local-storage/tentothirty-overlapping-twentytofifty-1-2/nvme-Amazon_Elastic_Block_Store_vol0662dcb2581e2e50d: no such file or directory	{"type": "Warning", "object": {"kind":"PersistentVolume","name":"local-pv-ea5b0e26","uid":"7442678b-0860-434a-9260-1a8dbfd19d2c","apiVersion":"v1","resourceVersion":"49089"}, "reason": "VolumeFailedDelete"}
      
      presubmit job:
      https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_local-storage-operator/522/pull-ci-openshift-local-storage-operator-main-e2e-operator/1900324985017208832
      
      diskmaker log:
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_local-storage-operator/522/pull-ci-openshift-local-storage-operator-main-e2e-operator/1900324985017208832/artifacts/e2e-operator/gather-extra/artifacts/pods/openshift-local-storage_diskmaker-manager-jvpwg_diskmaker-manager.log
      
      The getVolMode call fails here in deletePV:
      https://github.com/openshift/local-storage-operator/blob/646a98497445b51ce0fc3c7455ae06cb4869ec33/vendor/sigs.k8s.io/sig-storage-local-static-provisioner/pkg/deleter/deleter.go#L150-L153
      
      The PV is released but does not have a deletionTimestamp:
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_local-storage-operator/522/pull-ci-openshift-local-storage-operator-main-e2e-operator/1900324985017208832/artifacts/e2e-operator/gather-extra/artifacts/persistentvolumes.json
      
      CleanupSymlinks() in diskmaker only processes PVs that have a deletionTimestamp. So how could the symlink have been deleted while the PV has not?
      
      I suspect this is just an ordering issue among the cleanup functions in the e2e test job; cleanupSymlinkDir should run after cleanupLVSetResources:
      https://github.com/openshift/local-storage-operator/blob/646a98497445b51ce0fc3c7455ae06cb4869ec33/test/e2e/localvolumeset_test.go#L99-L106
      
      If cleanupSymlinkDir runs before the LVSet is deleted, we could get into this situation. Strictly speaking, we shouldn't even need cleanupSymlinkDir anymore after https://github.com/openshift/local-storage-operator/pull/504, so we could try removing it from the test.
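The suspected ordering problem can be illustrated with a toy model. This is only a sketch under loud assumptions: the `cluster` struct, its two methods, and `runInOrder` are hypothetical and merely borrow the names of the test helpers; they do not reflect how the e2e test or diskmaker is implemented. In the model, PV cleanup fails if the symlink is already gone, which is the situation hypothesized above.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a toy model of the state relevant to this bug.
type cluster struct {
	symlinkPresent bool // symlink under /mnt/local-storage still exists
	pvReleased     bool // a Released PV still points at that symlink
}

// cleanupSymlinkDir models removing the symlink directory contents.
func (c *cluster) cleanupSymlinkDir() error {
	c.symlinkPresent = false
	return nil
}

// cleanupLVSetResources models deleting the LVSet and its PVs.
// Assumption: cleaning a Released PV needs its symlink to exist.
func (c *cluster) cleanupLVSetResources() error {
	if c.pvReleased && !c.symlinkPresent {
		return errors.New("failed to get volume mode: no such file or directory")
	}
	c.pvReleased = false
	return nil
}

// runInOrder runs the steps sequentially and stops at the first error.
func runInOrder(steps ...func() error) error {
	for _, step := range steps {
		if err := step(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Suspected current order: symlinks removed first.
	a := &cluster{symlinkPresent: true, pvReleased: true}
	fmt.Println("symlinks first:", runInOrder(a.cleanupSymlinkDir, a.cleanupLVSetResources))

	// Proposed order: LVSet resources first, then symlinks.
	b := &cluster{symlinkPresent: true, pvReleased: true}
	fmt.Println("LVSet first:", runInOrder(b.cleanupLVSetResources, b.cleanupSymlinkDir))
}
```

In this model only the first ordering returns an error. If the real test behaves similarly, swapping the two cleanup steps (or dropping cleanupSymlinkDir entirely) should make the repeated diskmaker errors disappear.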
      
      

      Version-Release number of selected component (if applicable):

          4.19

      How reproducible:

          Unknown

      Steps to Reproduce:

          1. Run e2e-operator presubmit job
          2. Review diskmaker logs
          

      Actual results:

          "no such file or directory" errors when trying to delete PV

      Expected results:

          PV is deleted by diskmaker successfully

      Additional info:

          When I noticed these errors, my branch was missing https://github.com/openshift/local-storage-operator/pull/522/commits/0b88cae58e65a6176b540713f7717e3971e8e810 -- just in case that's relevant to reproducing this.

              Jonathan Dobson (jdobson@redhat.com)
              Wei Duan