- Bug
- Resolution: Not a Bug
- Major
- None
- odf-4.18
Description:
When a ReadWriteMany (RWX) PersistentVolumeClaim (PVC) is created using the ocs-storagecluster-cephfs storage class, it does not function correctly for pods running under a Security Context Constraint (SCC) whose SELinux strategy is RunAsAny (seLinuxContext.type: RunAsAny).
In a multi-replica deployment using such an SCC, only the last pod to start can access the shared volume. The other pods fail with "Permission denied" errors. This is because the CephFS CSI driver mounts the volume and applies the specific SELinux Multi-Category Security (MCS) label of the container process to the shared mount point. Each subsequent pod relabels the volume with its own unique MCS label, revoking access for all previous pods.
This behavior defeats the purpose of an RWX volume, as it prevents multiple pods from concurrently accessing the shared storage unless they are all forced to use the exact same MCS label (e.g., via a restricted-v2 SCC).
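The per-pod MCS label assigned under such an SCC can be observed directly from inside a running pod. The check below is illustrative; <pod-name> is a placeholder:

# Prints the SELinux context of the container's main process, e.g.
# system_u:system_r:container_t:s0:c230,c706
oc rsh <pod-name> cat /proc/1/attr/current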
Environment Details
Platform: OpenShift Container Platform 4.18
Storage: OpenShift Data Foundation
StorageClass: ocs-storagecluster-cephfs
Provisioner: openshift-storage.cephfs.csi.ceph.com
Access Mode: ReadWriteMany
Steps to Reproduce
Part A: Verification with restricted-v2 SCC (Works as Expected)
1. Create a new project. The project is automatically assigned an MCS label.
oc new-project web-ns
oc describe project web-ns | grep "openshift.io/sa.scc.mcs"
# Example output: openshift.io/sa.scc.mcs=s0:c28,c7
2. Deploy a sample application and create an RWX PVC.
oc new-app --name web-app --image quay.io/redhattraining/php-ssl:v1.0
oc set volumes deploy/web-app --add --name data \
  --mount-path /mnt/shared \
  --type pvc --claim-name data-pvc \
  --claim-class ocs-storagecluster-cephfs \
  --claim-size 1Gi \
  --claim-mode ReadWriteMany
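For reference, the oc set volumes invocation above creates a PVC roughly equivalent to this manifest (a sketch; the generated claim may carry additional labels):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
  namespace: web-ns
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ocs-storagecluster-cephfs
  resources:
    requests:
      storage: 1Gi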
3. Scale the deployment and verify access.
oc scale --replicas 3 deploy/web-app
# Wait for all pods to be in Running state
4. Check file access from any pod. All pods can read and write to the shared volume because they all run under the restricted-v2 SCC and share the same namespace-level MCS label.
# Get a pod name
POD_NAME=$(oc get pods -l app=web-app -o jsonpath='{.items[0].metadata.name}')
# Create a file from the first pod
oc rsh $POD_NAME touch /mnt/shared/testfile_from_pod1
# Verify access from another pod
OTHER_POD_NAME=$(oc get pods -l app=web-app -o jsonpath='{.items[1].metadata.name}')
oc rsh $OTHER_POD_NAME ls -lZ /mnt/shared/
# Output shows the file is accessible and has the project's MCS label:
# -rw-r--r--. 1 1000770000 1000770000 system_u:object_r:container_file_t:s0:c28,c7 0 Jun 9 16:00 testfile_from_pod1
Part B: Triggering the Bug with an SELinux RunAsAny SCC (Fails)
1. Create a custom SCC whose SELinux strategy is RunAsAny (seLinuxContext.type: RunAsAny) and verify it (a reference manifest is sketched below the output).
oc get scc custom-scc
NAME         PRIV   CAPS    SELINUX    RUNASUSER   FSGROUP    SUPGROUP   PRIORITY   READONLYROOTFS   VOLUMES
custom-scc   true   ["*"]   RunAsAny   RunAsAny    RunAsAny   RunAsAny   5          false            ["*"]
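A minimal manifest matching the output above could look like the following. This is an illustrative sketch; the field values are inferred from the oc get scc columns:

oc apply -f - <<'EOF'
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: custom-scc
# Deliberately privileged and unrestricted, to reproduce the bug
allowPrivilegedContainer: true
allowedCapabilities: ["*"]
seLinuxContext:
  type: RunAsAny          # the strategy that triggers the relabeling issue
runAsUser:
  type: RunAsAny
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
priority: 5
readOnlyRootFilesystem: false
volumes: ["*"]
EOF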
2. Grant the new SCC to the default service account in the project.
oc adm policy add-scc-to-user custom-scc -z default -n web-ns
Restart the pods to apply the new SCC.
oc delete pods --all -n web-ns
# Wait for all 3 pods to be in Running state
3. Check which SCC is being used. The pods will now use custom-scc.
oc get pods -l app=web-app -o jsonpath='{.items[0].metadata.annotations.openshift\.io/scc}'
# Output: custom-scc
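To confirm that every replica picked up the new SCC, a short loop (illustrative) can print the annotation for each pod:

for POD in $(oc get pods -l app=web-app -o name); do
  echo -n "$POD: "
  oc get "$POD" -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'
done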
4. Actual Results
Attempt to access the shared volume from different pods.
Only one pod (typically the last one to complete mounting) can access the shared volume. All other pods get a Permission denied error.
# Get the list of running pods
PODS=($(oc get pods -l app=web-app -o jsonpath='{.items[*].metadata.name}'))
# Attempt to list files from each pod
oc rsh ${PODS[0]} ls -l /mnt/shared/
# ls: cannot access '/mnt/shared/': Permission denied
# command terminated with exit code 2
oc rsh ${PODS[1]} ls -l /mnt/shared/
# ls: cannot access '/mnt/shared/': Permission denied
# command terminated with exit code 2
oc rsh ${PODS[2]} ls -l /mnt/shared/
# total 0
# -rw-r--r--. 1 1000770000 1000770000 ... 0 Jun 9 16:00 testfile_from_pod1
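The same check can be scripted across all replicas (a convenience loop; the output matches the individual commands above):

for POD in "${PODS[@]}"; do
  echo "== $POD =="
  oc rsh "$POD" ls -l /mnt/shared/ || true   # keep going past "Permission denied"
done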
Inspecting the SELinux label on the mount path from the working pod shows it has been labeled with a unique MCS label, not the shared project label.
oc rsh ${PODS[2]} ls -lZ /mnt/shared/testfile_from_pod1
# -rw-r--r--. 1 1000770000 1000770000 system_u:object_r:container_file_t:s0:c230,c706
This specific label (s0:c230,c706) prevents the other pods, which are running with different, randomly assigned MCS labels, from accessing the directory.
5. Expected Results
All pods in the deployment, regardless of the SCC under which they are running, should have simultaneous read and write access to a ReadWriteMany volume. The underlying storage implementation should not apply a restrictive, single-pod MCS label to a shared resource.
6. Analysis and Proposed Solution
The root cause is the CephFS CSI driver's behavior of relabeling the volume with the specific SELinux context of the pod mounting it. While this is a security feature for single-pod volumes, it breaks the ReadWriteMany contract in a multi-pod, multi-SCC context.
This issue appears to be specific to how the ODF CephFS CSI driver handles volume mounting. For comparison, the azurefile-csi driver does not exhibit this behavior. When an Azure File RWX volume is mounted, it receives a generic SELinux type like cifs_t:s0 without any MCS categories. This allows any pod, regardless of its SCC or MCS label, to access the shared files.
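The effective SELinux context of a mount point can be inspected from inside any pod to compare drivers. The commands below are a generic sketch; <pod-name> is a placeholder:

# Show the SELinux context of the mount point itself
oc rsh <pod-name> ls -dZ /mnt/shared
# Alternatively, with GNU coreutils stat:
oc rsh <pod-name> stat -c '%C' /mnt/shared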
Proposed Solution:
The openshift-storage.cephfs.csi.ceph.com provisioner should be modified to handle RWX volumes differently. When mounting a CephFS RWX volume, the CSI driver should force a generic SELinux context that is accessible to all containers. This can be achieved by adding the mount option context="system_u:object_r:container_file_t:s0".
This change would ensure that the mount point is not tainted with a pod-specific MCS label, restoring true ReadWriteMany functionality for all use cases and allowing customers to use flexible SCCs with their critical applications on ODF storage.
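As an interim illustration of the proposed behavior, a cluster administrator could define a custom StorageClass that injects the context option into the kernel mount. This is a sketch only: kernelMountOptions is a ceph-csi StorageClass parameter, and whether the ODF build honors it for this purpose is an assumption to verify.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs-rwx-shared-context
provisioner: openshift-storage.cephfs.csi.ceph.com
parameters:
  # clusterID, fsName, and the csi.storage.k8s.io/* secret parameters must be
  # copied from the existing ocs-storagecluster-cephfs StorageClass (omitted).
  # Assumed ceph-csi parameter: force a generic, category-free SELinux context
  # so no pod-specific MCS label is ever applied to the shared mount.
  kernelMountOptions: context="system_u:object_r:container_file_t:s0"
reclaimPolicy: Delete
allowVolumeExpansion: true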
Impact:
The impact of this bug is significant and extends across technical, operational, security, and business domains. It is not a minor inconvenience but a critical flaw that undermines the core value proposition of using ODF as a versatile, enterprise-grade storage solution on OpenShift.
1. Business and Application Impact
Blocks Adoption for Critical Workloads: Many stateful, revenue-generating applications (e.g., clustered content management systems, analytics platforms, collaborative software) are designed to run with specific user permissions and rely on a shared filesystem (ReadWriteMany). This bug creates a hard blocker for migrating or deploying such applications on ODF, forcing customers to seek alternative storage solutions.
Causes Application Instability and Downtime: The "last pod wins" behavior introduces extreme fragility. A simple pod restart or a scaling event can cause a random, working pod to lose storage access, leading to cascading failures and application downtime. This unpredictability is unacceptable for production environments.
Hinders Application Modernization: A key benefit of OpenShift is modernizing traditional applications. If a legacy application requires running as a specific user (which necessitates a flexible SCC) and also needs a shared filesystem, it cannot be modernized onto ODF. This stalls strategic IT initiatives.
2. Operational and Architectural Impact
Violates a fundamental Kubernetes primitive: ReadWriteMany (RWX) is a standard, well-understood storage access mode, and this bug breaks its fundamental contract. An RWX volume that does not reliably allow multiple pods to write is not fit for purpose and erodes trust in the storage platform.
Increases Operational Complexity: Administrators must now manage a complex and non-obvious interaction between SCCs and the storage layer. Troubleshooting becomes incredibly difficult, as "Permission denied" errors can send teams down the wrong path of debugging application code, user permissions, or network policies, when the root cause is in the storage driver.
Restricts Architectural Freedom: The bug forces a false choice on architects and developers: either adopt a restrictive security model to get shared storage, or abandon ODF's shared storage to keep the security model the application requires. This severely limits the design of resilient, scalable applications on the platform.
3. Security Impact
Reduces Platform Value: OpenShift's Security Context Constraints (SCCs) are a powerful feature for enforcing enterprise security policy. This bug effectively makes ODF incompatible with a significant portion of that security model (RunAsAny, MustRunAsRange, etc.), diminishing the value of both ODF and the OpenShift platform when used together.
In summary, this bug transforms ODF CephFS from a flexible, multi-purpose storage solution into a niche one that only works for applications that can tolerate the restricted SCC. It creates instability for critical applications, forces security compromises, and ultimately acts as a barrier to ODF adoption for a wide range of important use cases.