Data Foundation Bugs / DFBUGS-623

[2314715] [GSS] Corrupt mount points from CSI driver


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • odf-4.14
    • ceph/CephFS/x86

      Description of problem (please be as detailed as possible and provide log
      snippets):
      Users are unable to mount volumes in pods. The CSI plugin logs are filled with corrupted-mount warnings:

      ./csi-cephfsplugin-wlcpp/csi-cephfsplugin/csi-cephfsplugin/logs/rotated/0.log.20240918-163004:2024-09-16T22:59:02.180155798+00:00 stderr F W0916 22:59:02.180147 1 nodeserver.go:700] ID: 2024 corrupted mount detected in "/var/lib/kubelet/pods/7c1e6974-a1f9-4069-9129-b5e935b3f5a7/volumes/kubernetes.io~csi/pvc-abbd6725-bd49-4cd3-a9fe-3bc9320d53b0/mount": stat /var/lib/kubelet/pods/7c1e6974-a1f9-4069-9129-b5e935b3f5a7/volumes/kubernetes.io~csi/pvc-abbd6725-bd49-4cd3-a9fe-3bc9320d53b0/mount: permission denied
      ./csi-cephfsplugin-wlcpp/csi-cephfsplugin/csi-cephfsplugin/logs/rotated/0.log.20240918-163004:2024-09-16T23:00:15.587635229+00:00 stderr F W0916 23:00:15.587625 1 nodeserver.go:700] ID: 2052 corrupted mount detected in "/var/lib/kubelet/pods/7c1e6974-a1f9-4069-9129-b5e935b3f5a7/volumes/kubernetes.io~csi/pvc-abbd6725-bd49-4cd3-a9fe-3bc9320d53b0/mount": stat /var/lib/kubelet/pods/7c1e6974-a1f9-4069-9129-b5e935b3f5a7/volumes/kubernetes.io~csi/pvc-abbd6725-bd49-4cd3-a9fe-3bc9320d53b0/mount: permission denied
      ./csi-cephfsplugin-wlcpp/csi-cephfsplugin/csi-cephfsplugin/logs/rotated/0.log.20240918-163004:2024-09-16T23:02:11.816256082+00:00 stderr F W0916 23:02:11.816245 1 nodeserver.go:700] ID: 2088 corrupted mount detected in "/var/lib/kubelet/pods/7c1e6974-a1f9-4069-9129-b5e935b3f5a7/volumes/kubernetes.io~csi/pvc-abbd6725-bd49-4cd3-a9fe-3bc9320d53b0/mount": stat /var/lib/kubelet/pods/7c1e6974-a1f9-4069-9129-b5e935b3f5a7/volumes/kubernetes.io~csi/pvc-abbd6725-bd49-4cd3-a9fe-3bc9320d53b0/mount: permission denied
      ./csi-cephfsplugin-wlcpp/csi-cephfsplugin/csi-cephfsplugin/logs/rotated/0.log.20240918-163004:2024-09-16T23:04:02.943896343+00:00 stderr F W0916 23:04:02.943885 1 nodeserver.go:700] ID: 2130 corrupted mount detected in "/var/lib/kubelet/pods/7c1e6974-a1f9-4069-9129-b5e935b3f5a7/volumes/kubernetes.io~csi/pvc-abbd6725-bd49-4cd3-a9fe-3bc9320d53b0/mount": stat /var/lib/kubelet/pods/7c1e6974-a1f9-4069-9129-b5e935b3f5a7/volumes/kubernetes.io~csi/pvc-abbd6725-bd49-4cd3-a9fe-3bc9320d53b0/mount: permission denied
      ./csi-cephfsplugin-wlcpp/csi-cephfsplugin/csi-cephfsplugin/logs/rotated/0.log.20240918-163004:2024-09-16T23:08:09.885178230+00:00 stderr F W0916 23:08:09.885085 1 nodeserver.go:700] ID: 2190 corrupted mount detected in "/var/lib/kubelet/pods/345e5364-0e16-4fdd-b9de-3076c2728e79/volumes/kubernetes.io~csi/pvc-ab940a15-d40b-4f2c-9ee8-462e0b6c354d/mount": stat /var/lib/kubelet/pods/345e5364-0e16-4fdd-b9de-3076c2728e79/volumes/kubernetes.io~csi/pvc-ab940a15-d40b-4f2c-9ee8-462e0b6c354d/mount: permission denied
      ./csi-cephfsplugin-wlcpp/csi-cephfsplugin/csi-cephfsplugin/logs/rotated/0.log.20240918-163004:2024-09-16T23:08:09.885178230+00:00 stderr F W0916 23:08:09.885122 1 nodeserver.go:700] ID: 2174 corrupted mount detected in "/var/lib/kubelet/pods/db7479a2-8676-40b1-af35-f6280aa9e64f/volumes/kubernetes.io~csi/pvc-ab940a15-d40b-4f2c-9ee8-462e0b6c354d/mount": stat /var/lib/kubelet/pods/db7479a2-8676-40b1-af35-f6280aa9e64f/volumes/kubernetes.io~csi/pvc-ab940a15-d40b-4f2c-9ee8-462e0b6c354d/mount: permission denied
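
      To see which pods and PVCs are affected most often, the warnings above can be grouped by pod UID and PVC name. The following is a minimal sketch, not part of the driver or any ODF tooling: the regular expression is inferred only from the sample lines quoted in this report, and the names WARNING and summarize are hypothetical.

```python
import re
from collections import Counter

# Pattern for the nodeserver.go warning quoted above. The path layout is an
# assumption inferred from the sample log lines in this report only.
WARNING = re.compile(
    r'corrupted mount detected in "/var/lib/kubelet/pods/(?P<pod>[0-9a-f-]+)'
    r'/volumes/kubernetes\.io~csi/(?P<pvc>pvc-[0-9a-f-]+)/mount"'
)


def summarize(lines):
    """Count corrupted-mount warnings per (pod UID, PVC name) pair."""
    counts = Counter()
    for line in lines:
        match = WARNING.search(line)
        if match:
            counts[(match["pod"], match["pvc"])] += 1
    return counts
```

      Passing the lines of a rotated plugin log to summarize() and calling most_common() on the result lists the pod and PVC pairs with the most warnings first.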

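      Reviewing the CSI logs further, the driver appears to handle the corruption appropriately: the stat on the staging target path fails, the driver flags the mount as corrupted, and the subsequent unmount succeeds: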

      2024-09-16T23:08:27.576871302+00:00 stderr F I0916 23:08:27.576841 1 utils.go:206] ID: 2198 Req-ID: 0001-0011-openshift-storage-0000000000000001-b8a4d123-6136-48d6-ba2b-501daf72301d GRPC request:

      {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/421bcceb78bbcca1328fec2628259a2c79c22e8b0785327ac408431cc1bed320/globalmount","volume_id":"0001-0011-openshift-storage-0000000000000001-b8a4d123-6136-48d6-ba2b-501daf72301d"}

      2024-09-16T23:08:27.576918060+00:00 stderr F E0916 23:08:27.576909 1 nodeserver.go:619] ID: 2198 Req-ID: 0001-0011-openshift-storage-0000000000000001-b8a4d123-6136-48d6-ba2b-501daf72301d stat failed: stat /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/421bcceb78bbcca1328fec2628259a2c79c22e8b0785327ac408431cc1bed320/globalmount: permission denied
      2024-09-16T23:08:27.576924824+00:00 stderr F I0916 23:08:27.576918 1 nodeserver.go:634] ID: 2198 Req-ID: 0001-0011-openshift-storage-0000000000000001-b8a4d123-6136-48d6-ba2b-501daf72301d cephfs: detected corrupted mount in staging target path /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/421bcceb78bbcca1328fec2628259a2c79c22e8b0785327ac408431cc1bed320/globalmount, trying to unmount anyway
      2024-09-16T23:08:27.599288100+00:00 stderr F I0916 23:08:27.599214 1 cephcmds.go:105] ID: 2198 Req-ID: 0001-0011-openshift-storage-0000000000000001-b8a4d123-6136-48d6-ba2b-501daf72301d command succeeded: umount [/var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/421bcceb78bbcca1328fec2628259a2c79c22e8b0785327ac408431cc1bed320/globalmount --all-targets]
      2024-09-16T23:08:27.599422244+00:00 stderr F I0916 23:08:27.599377 1 nodeserver.go:647] ID: 2198 Req-ID: 0001-0011-openshift-storage-0000000000000001-b8a4d123-6136-48d6-ba2b-501daf72301d cephfs: successfully unmounted volume 0001-0011-openshift-storage-0000000000000001-b8a4d123-6136-48d6-ba2b-501daf72301d from /var/lib/kubelet/plugins/kubernetes.io/csi/openshift-storage.cephfs.csi.ceph.com/421bcceb78bbcca1328fec2628259a2c79c22e8b0785327ac408431cc1bed320/globalmount
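
      The sequence in these log lines (stat fails with "permission denied", the path is flagged as a corrupted mount, then an unmount is attempted) can be illustrated with a small classifier. This is a sketch under stated assumptions, not the driver's code: the set of errno values is an assumption chosen to include EACCES as seen in the log, and the names CORRUPTED_ERRNOS and is_corrupted_mount are hypothetical.

```python
import errno
import os

# Assumed set of errno values treated as a "corrupted" mount. EACCES matches
# the "permission denied" seen in the log; the rest are illustrative guesses,
# not taken from the driver source.
CORRUPTED_ERRNOS = {errno.EACCES, errno.EIO, errno.ENOTCONN, errno.ESTALE}


def is_corrupted_mount(path, stat=os.stat):
    """Return True if stat() on path fails with an errno in CORRUPTED_ERRNOS.

    The stat parameter exists only so the check can be exercised without a
    real broken mount.
    """
    try:
        stat(path)
    except OSError as exc:
        return exc.errno in CORRUPTED_ERRNOS
    return False
```

      Under this sketch, a path that simply does not exist (ENOENT) is not classified as corrupted, which keeps a missing mount point distinct from a mount that is present but inaccessible.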

      However, I don't know what is causing these corrupt mounts to occur in the first place. I'd like to get the CSI team's input to help narrow down the issue.

      Version of all relevant components (if applicable):
      ODF 4.14
      OCP 4.14

              Venky Shankar (vshankar@redhat.com)
              Kelson White (rhn-support-kelwhite)
              Elad Ben Aharon