Feature Request
Resolution: Done
Normal
None
openshift-4.12.z, openshift-4.13.z
1. Proposed title of this feature request
Prometheus metrics to calculate containers' total filesystem usage including EmptyDir volumes
2. What is the nature and description of the request?
Prometheus metrics (especially container_fs_usage_bytes) do not report the total filesystem usage of each container or pod separately and accurately. They do not take EmptyDir volumes into account.
3. Why does the customer need this? (List the business requirements here)
When a node's filesystem is exhausted, there should be a way to identify which container is consuming most of the node's available filesystem space.
4. List any affected packages or components.
OpenShift monitoring stack - Prometheus
Steps to reproduce:
- Create a pod with two containers, and configure one of them to mount a volume of type EmptyDir.
- Write a 5 GB file to the EmptyDir mount point.
- Monitor the node, pod, and container filesystem usage using Prometheus metrics.
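The setup in the steps above can be sketched as a Deployment. This is a minimal sketch, not the manifest used in the original report: the namespace "test", the container names, the volume name, and the mount path are taken from the output below, while the Deployment name "simple" is inferred from the pod name and the image is an assumption (any image that provides sleep and dd should work). The hypothetical function repro_manifest only prints the manifest, so it can be inspected before it is applied.

```shell
#!/bin/sh
# Hypothetical helper: print a Deployment reproducing the two-container setup.
# Assumptions: namespace "test" already exists; the Deployment name "simple"
# and the UBI minimal image are illustrative choices, not from the report.
repro_manifest() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple
  template:
    metadata:
      labels:
        app: simple
    spec:
      containers:
      - name: container1
        image: registry.access.redhat.com/ubi9/ubi-minimal
        command: ["sleep", "infinity"]
        # Only container1 mounts the EmptyDir volume.
        volumeMounts:
        - name: empty-dir-volume
          mountPath: /mnt/mydata
      - name: container2
        image: registry.access.redhat.com/ubi9/ubi-minimal
        command: ["sleep", "infinity"]
      volumes:
      - name: empty-dir-volume
        emptyDir: {}
EOF
}

repro_manifest
```

To apply it and create the 5 GB file, pipe the output to "oc apply -f -", then run "oc exec -n test <pod-name> -c container1 -- dd if=/dev/zero of=/mnt/mydata/big-file bs=1M count=5120", where <pod-name> is the pod created by the Deployment.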
# oc get pods -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE                  NOMINATED NODE   READINESS GATES
simple-866f479df4-bnsqw   2/2     Running   0          17m   10.130.0.24   ipi1-p6h6d-master-0   <none>           <none>

# oc set volume pod/simple-866f479df4-bnsqw
simple-866f479df4-bnsqw
  empty directory as empty-dir-volume
    mounted at /mnt/mydata in container container1
  unknown as kube-api-access-jwz7x
    mounted at /var/run/secrets/kubernetes.io/serviceaccount in container container1
    mounted at /var/run/secrets/kubernetes.io/serviceaccount in container container2

# oc exec simple-866f479df4-bnsqw -c container1 -- ls -lh /mnt/mydata
total 5.1G
-rw-rw-rw-. 1 1000670000 1000670000 5.0G Oct 24 09:31 big-file

# oc exec simple-866f479df4-bnsqw -c container2 -- ls -lh /mnt/mydata
ls: cannot access /mnt/mydata: No such file or directory
command terminated with exit code 2
From Prometheus, run the following queries:
sum(container_fs_usage_bytes{node = "ipi1-p6h6d-master-0"})
container_fs_usage_bytes{namespace = "test", pod= "simple-866f479df4-bnsqw"}
The collected values do not change after the 5 GB file is written to the EmptyDir volume.
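Until the metrics account for EmptyDir volumes, a node-side cross-check can show which pod's EmptyDir holds the data. This is a minimal sketch, not part of the original report: it assumes the kubelet's default root directory /var/lib/kubelet, under which each EmptyDir volume lives at pods/<pod-uid>/volumes/kubernetes.io~empty-dir/<volume-name>, and it must run on the node itself (for example from "oc debug node/<node>" followed by "chroot /host"). The function name emptydir_usage is hypothetical. It prints the size in KiB, the pod UID, and the volume name, largest first.

```shell
#!/bin/sh
# Hypothetical helper: list EmptyDir volumes on this node, largest first.
# Assumption: kubelet root is /var/lib/kubelet unless another root is given as $1.
emptydir_usage() {
  root="${1:-/var/lib/kubelet}"
  for vol in "$root"/pods/*/volumes/kubernetes.io~empty-dir/*; do
    # When nothing matches, the glob stays literal; skip that case.
    [ -d "$vol" ] || continue
    # Layout: <root>/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/<volume-name>
    pod_uid=$(basename "$(dirname "$(dirname "$(dirname "$vol")")")")
    # du -sk prints "<KiB><TAB><path>"; keep only the size.
    size_kb=$(du -sk "$vol" | cut -f1)
    printf '%s\t%s\t%s\n' "$size_kb" "$pod_uid" "$(basename "$vol")"
  done | sort -rn
}

emptydir_usage "$@"
```

To map a pod UID back to a pod name, "oc get pods -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,UID:.metadata.uid" lists the UID of every pod. Note that this only measures EmptyDir volumes; container writable layers and logs are not included.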