  1. OpenShift Bugs
  2. OCPBUGS-47468

Filesystem has no space left on device for OLM


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • None
    • 4.15.z
    • Node / Kubelet
    • Quality / Stability / Reliability
    • False

      Description of problem:

          The customer reported that the NodeFilesystemAlmostOutOfFiles alert fired on the cluster and opened a support case. An OLM pod was observed on the affected node and remained there until SRE rebooted the node.
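As far as we know, NodeFilesystemAlmostOutOfFiles is driven by free inode counts rather than free bytes, so it helps to check both on the affected mount. The following is a minimal sketch using Python's `os.statvfs`; the function name `fs_usage` and the mount points listed are illustrative assumptions, not part of any OpenShift tooling.

```python
import os


def fs_usage(path):
    """Return free-space and free-inode percentages for the filesystem holding `path`."""
    st = os.statvfs(path)
    # f_bavail / f_blocks: blocks available to unprivileged users vs. total blocks.
    bytes_free_pct = 100.0 * st.f_bavail / st.f_blocks if st.f_blocks else None
    # f_ffree / f_files: free inodes vs. total inodes.
    # Some filesystems report 0 total inodes, so guard the division.
    inodes_free_pct = 100.0 * st.f_ffree / st.f_files if st.f_files else None
    return {
        "path": path,
        "bytes_free_pct": bytes_free_pct,
        "inodes_free_pct": inodes_free_pct,
    }


if __name__ == "__main__":
    # Hypothetical mount points to inspect; adjust for the node in question.
    for mount in ("/", "/run", "/var"):
        if os.path.exists(mount):
            print(fs_usage(mount))
```

A filesystem can show plenty of free bytes while its inode percentage is near zero, which would still surface as "no space left on device".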

       

      Version-Release number of selected component (if applicable):

          4.15.29

      How reproducible:

          Unknown - this is customer reported. See must-gather and logs attached to the support case

      Steps to Reproduce:

          1. Run a process that uses up a large number of file handles on the same node that OLM is running on.
          2. OLM continues to try to mount even when there is no space left on the device.
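The first step could be approximated with a small script. This is only a sketch under the assumption that "file handles" here means inodes, which is what the alert name points to; the helper `consume_inodes` and the `filler-` file names are hypothetical, and the count is deliberately capped. Run it only against a scratch filesystem on a test node.

```python
import os
import tempfile


def consume_inodes(directory, count):
    """Create up to `count` empty files in `directory` and return the paths created.

    Each empty file consumes one inode and essentially no data blocks.
    """
    created = []
    for i in range(count):
        path = os.path.join(directory, f"filler-{i:08d}")
        try:
            # Mode "x" fails if the file already exists, so nothing is overwritten.
            with open(path, "x"):
                pass
        except OSError as exc:
            # An ENOSPC error here means the filesystem ran out of inodes or blocks.
            print(f"stopped after {len(created)} files: {exc}")
            break
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstration in a temporary directory that is cleaned up afterwards.
    with tempfile.TemporaryDirectory() as scratch:
        files = consume_inodes(scratch, 1000)
        print(f"created {len(files)} files in {scratch}")
```

To actually trigger the condition, `directory` would need to be on the same filesystem the kubelet and container runtime write to, and `count` would need to approach that filesystem's inode limit.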
          

      Actual results:

       pod_workers.go:1300] "Error syncing pod, skipping" err="failed to ensure that the pod: 7f89b1b3-dcaa-4ac7-a6a6-e472bf0c1302 cgroups exist and are correctly applied: failed to create container for [kubepods burstable pod7f89b1b3-dcaa-4ac7-a6a6-e472bf0c1302] : No space left on device" pod="openshift-operator-lifecycle-manager/collect-profiles-28907700-tzw6x" podUID="7f89b1b3-dcaa-4ac7-a6a6-e472bf0c1302"
      remote_runtime.go:222]   "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to stop infra container for pod sandbox 4616416495b4bfa8244b7e3aaaf2f8ed9d84fa3e50c14f0e723dfe9df260713f: failed to unmount container 4616416495b4bfa8244b7e3aaaf2f8ed9d84fa3e50c14f0e723dfe9df260713f: open /run/containers/storage/overlay-layers/.tmp-mountpoints.json3534497601: no space left on device" podSandboxID="4616416495b4bfa8244b7e3aaaf2f8ed9d84fa3e50c14f0e723dfe9df260713f"
      
       kubelet.go:2023] failed to "KillPodSandbox" for "925b7599-c8b5-44db-87c2-30e0223fac82" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to stop infra container for pod sandbox 4616416495b4bfa8244b7e3aaaf2f8ed9d84fa3e50c14f0e723dfe9df260713f: failed to unmount container 4616416495b4bfa8244b7e3aaaf2f8ed9d84fa3e50c14f0e723dfe9df260713f: open /run/containers/storage/overlay-layers/.tmp-mountpoints.json3534497601: no space left on device"
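When triaging a must-gather, it can help to pull the affected pods out of kubelet log lines like the ones above. The following is a rough sketch that relies only on the `pod="..."`, `podUID="..."`, and `pod sandbox <id>` patterns visible in these lines; the function name `summarize` and the returned keys are illustrative assumptions, not a kubelet log schema.

```python
import re

NO_SPACE = re.compile(r"no space left on device", re.IGNORECASE)
POD = re.compile(r'\bpod="([^"]+)"')
POD_UID = re.compile(r'\bpodUID="([^"]+)"')
SANDBOX = re.compile(r"pod sandbox ([0-9a-f]{64})")


def _first_group(pattern, line):
    """Return the first capture group of `pattern` in `line`, or None if absent."""
    match = pattern.search(line)
    return match.group(1) if match else None


def summarize(line):
    """Extract the out-of-space flag and pod identifiers from one kubelet log line."""
    return {
        "enospc": bool(NO_SPACE.search(line)),
        "pod": _first_group(POD, line),
        "pod_uid": _first_group(POD_UID, line),
        "sandbox_id": _first_group(SANDBOX, line),
    }


if __name__ == "__main__":
    # First log line from this report, copied verbatim.
    sample = (
        'pod_workers.go:1300] "Error syncing pod, skipping" err="failed to ensure '
        "that the pod: 7f89b1b3-dcaa-4ac7-a6a6-e472bf0c1302 cgroups exist and are "
        "correctly applied: failed to create container for [kubepods burstable "
        'pod7f89b1b3-dcaa-4ac7-a6a6-e472bf0c1302] : No space left on device" '
        'pod="openshift-operator-lifecycle-manager/collect-profiles-28907700-tzw6x" '
        'podUID="7f89b1b3-dcaa-4ac7-a6a6-e472bf0c1302"'
    )
    print(summarize(sample))
```

The match is case-insensitive because the lines above spell the error both as "No space left on device" and "no space left on device".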

      Expected results:

          Customers expect OpenShift operators to be more resilient when a node runs out of space.

      Additional info:

          

              aos-node@redhat.com Node Team Bot Account
              lranjbar@redhat.com Lisa Ranjbar (Inactive)
              Mallapadi Niranjan Mallapadi Niranjan