Bug
Resolution: Not a Bug
Normal
4.15.z
Quality / Stability / Reliability
Description of problem:
A customer reported that the NodeFilesystemAlmostOutOfFiles alert fired on their cluster and opened a support case. An OLM pod was observed running on the affected node and remained there until SRE rebooted the node.
Version-Release number of selected component (if applicable):
4.15.29
How reproducible:
Unknown. This was reported by a customer. See the must-gather and logs attached to the support case.
Steps to Reproduce:
1. On the same node where OLM is running, run a process that uses up a large number of file handles.
2. OLM continues to retry mounting even when there is no space left on the device.
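To confirm that a node is in the state described in the steps above, the free inode share of a filesystem can be checked directly, since NodeFilesystemAlmostOutOfFiles is driven by free inodes rather than free bytes. The following is a minimal sketch, not the alert's actual rule: the function names and the `threshold` default are assumptions made for illustration, and the path to check depends on the node's layout.

```python
import os


def inode_usage(path="/"):
    """Return (total, free, free_fraction) inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes on the filesystem
    # Some filesystems report 0 total inodes; the fraction is undefined there.
    fraction = free / total if total else None
    return total, free, fraction


def almost_out_of_files(free_fraction, threshold=0.05):
    """True when the free inode fraction is below threshold (assumed value)."""
    if free_fraction is None:
        return False
    return free_fraction < threshold


if __name__ == "__main__":
    total, free, fraction = inode_usage("/")
    print(f"inodes total={total} free={free} low={almost_out_of_files(fraction)}")
```

A filesystem can report plenty of free bytes and still return "no space left on device" once its inodes are exhausted, so checking both is useful when triaging this alert.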
Actual results:
pod_workers.go:1300] "Error syncing pod, skipping" err="failed to ensure that the pod: 7f89b1b3-dcaa-4ac7-a6a6-e472bf0c1302 cgroups exist and are correctly applied: failed to create container for [kubepods burstable pod7f89b1b3-dcaa-4ac7-a6a6-e472bf0c1302] : No space left on device" pod="openshift-operator-lifecycle-manager/collect-profiles-28907700-tzw6x" podUID="7f89b1b3-dcaa-4ac7-a6a6-e472bf0c1302"
remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to stop infra container for pod sandbox 4616416495b4bfa8244b7e3aaaf2f8ed9d84fa3e50c14f0e723dfe9df260713f: failed to unmount container 4616416495b4bfa8244b7e3aaaf2f8ed9d84fa3e50c14f0e723dfe9df260713f: open /run/containers/storage/overlay-layers/.tmp-mountpoints.json3534497601: no space left on device" podSandboxID="4616416495b4bfa8244b7e3aaaf2f8ed9d84fa3e50c14f0e723dfe9df260713f"
kubelet.go:2023] failed to "KillPodSandbox" for "925b7599-c8b5-44db-87c2-30e0223fac82" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to stop infra container for pod sandbox 4616416495b4bfa8244b7e3aaaf2f8ed9d84fa3e50c14f0e723dfe9df260713f: failed to unmount container 4616416495b4bfa8244b7e3aaaf2f8ed9d84fa3e50c14f0e723dfe9df260713f: open /run/containers/storage/overlay-layers/.tmp-mountpoints.json3534497601: no space left on device"
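When triaging a larger kubelet log for the same failure, it can help to pull out only the lines that report "no space left on device" along with the pod UID or file path they mention. The sketch below assumes log lines shaped like the three above; the `summarize` helper and its regular expressions are hypothetical and are not part of any OpenShift tooling.

```python
import re

# The messages above use both "No space" and "no space", so match case-insensitively.
ENOSPC = re.compile(r"no space left on device", re.IGNORECASE)
POD_UID = re.compile(r'podUID="([0-9a-f-]+)"')
OPEN_PATH = re.compile(r"open (\S+): no space left on device", re.IGNORECASE)


def summarize(lines):
    """Return one dict per ENOSPC line with the pod UID and file path, when present."""
    hits = []
    for line in lines:
        if not ENOSPC.search(line):
            continue
        uid = POD_UID.search(line)
        path = OPEN_PATH.search(line)
        hits.append({
            "pod_uid": uid.group(1) if uid else None,
            "path": path.group(1) if path else None,
        })
    return hits
```

Passing the kubelet journal lines to `summarize` yields a short list that shows which pods and which paths were affected, which makes it easier to see whether the failures cluster on one filesystem such as /run.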
Expected results:
Customers expect OpenShift operators to be more resilient when a node runs out of space.
Additional info: