Description of problem:
Metric storage_operation_duration_seconds_count records failures for emptyDir volumes on any OCP cluster:
sum by(volume_plugin, status) (storage_operation_duration_seconds_count{operation_name="volume_apply_access_control"})

status        volume_plugin            Value
fail-unknown  kubernetes.io/empty-dir  337
success       kubernetes.io/configmap  3141
Version-Release number of selected component (if applicable):
Server Version: 4.13.0-0.ci.test-2023-03-23-133416
How reproducible:
always
Steps to Reproduce:
1. Install a cluster
2. Get the metric mentioned above
Actual results:
fail-unknown on empty-dir is non-zero
Expected results:
fail-unknown on empty-dir is zero
Additional info:
The empty-dir failures come from emptyDir volumes that are used as the backend (wrapped) volume for Secret, ConfigMap, Projected and DownwardAPI volumes.
I added some debug logs. They show that the SetVolumeOwnership call in the EmptyDir volume plugin fails:
fsGroup failed: lstat /var/lib/kubelet/pods/4b058507-681a-45b6-9b18-b04e10fc4c38/volumes/kubernetes.io~empty-dir/wrapped_kube-api-access-77k7c: no such file or directory
Links to:
RHBA-2024:1458 OpenShift Container Platform 4.14.z bug fix update