OpenShift Bugs / OCPBUGS-76453

openshift-operator-controller filling up ephemeral disk when unable to unpack the clustercatalogs

    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.18
    • OLM
    • False
    • Moderate
    • Rejected

      Description of problem:

      While the clustercatalog pods were failing to pull their images, the operator-controller-controller-manager appeared to fail to unpack the catalogs and retried repeatedly, without cleaning up the temporary folders created by previous attempts.

      Listing of the temporary folders:
      $ oc rsh -n openshift-operator-controller $(oc get pod -n openshift-operator-controller -l control-plane=operator-controller-controller-manager -o name) ls -larth /var/cache/catalogs
      total 72M
      drwxrwxrwx.    3 root root   30 Nov 19 23:08 ..
      drwx------.  152 1001 root 8.0K Feb  3 02:19 openshift-redhat-operators
      drwx------.   43 1001 root 4.0K Feb  3 02:19 openshift-redhat-marketplace
      drwx------.  241 1001 root  12K Feb  3 22:25 openshift-certified-operators
      drwx------.  284 1001 root  12K Feb  3 22:25 openshift-community-operators
      drwx------.  121 1001 root 8.0K Feb  3 22:26 .openshift-redhat-operators-4015104162
      drwx------.  121 1001 root 8.0K Feb  3 22:26 .openshift-redhat-operators-3615668944
      drwx------.  121 1001 root 8.0K Feb  3 22:26 .openshift-redhat-operators-3230349928
      drwx------.  121 1001 root 8.0K Feb  3 22:26 .openshift-redhat-operators-2381597722
      
      [...]
      
      drwx------.  121 1001 root 8.0K Feb  4 16:21 .openshift-redhat-operators-872931253
      drwx------.  121 1001 root 8.0K Feb  4 16:22 .openshift-redhat-operators-4008688996
      drwx------.  121 1001 root 8.0K Feb  4 16:22 .openshift-redhat-operators-4246779203
      drwx------.  121 1001 root 8.0K Feb  4 16:22 .openshift-redhat-operators-2345602149
      drwx------.  121 1001 root 8.0K Feb  4 16:22 .openshift-redhat-operators-565489257
      drwx------.  121 1001 root 8.0K Feb  4 16:22 .openshift-redhat-operators-3535223206
      drwx------.   17 1001 root 4.0K Feb  4 16:57 .openshift-redhat-operators-1604285883
      drwx------.    8 1001 root  163 Feb  4 17:01 .openshift-redhat-operators-2650254321
      drwx------. 6073 1001 root 340K Feb  4 17:35 .
      drwx------.    8 1001 root  163 Feb  4 17:35 .openshift-redhat-operators-1244524397
      
      Space used by the temporary folders:
      $ oc rsh -n openshift-operator-controller $(oc get pod -n openshift-operator-controller -l control-plane=operator-controller-controller-manager -o name) du -sch /var/cache/catalogs
      3.5T	/var/cache/catalogs
      3.5T	total
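
      The leftover folders in the listing above all follow a hidden ".<catalog>-<digits>" naming pattern, so they can be counted rather than read by eye. The following is a minimal sketch only: count_temp_dirs is a hypothetical helper, not part of oc or operator-controller, and the naming pattern is an assumption inferred from the listing.

```shell
# Hypothetical helper: count hidden ".<name>-<digits>" entries in a
# one-per-line directory listing read from stdin.
count_temp_dirs() {
  # tr drops carriage returns a remote shell may append;
  # grep -c prints the number of matching lines.
  # The pattern skips "." and ".." because the second character must not be a dot.
  tr -d '\r' | grep -Ec '^\.[^.].*-[0-9]+$'
}

# Possible usage against the cluster (namespace, label, and path are taken
# from the report; left commented out so the helper can be tried without oc):
# pod=$(oc get pod -n openshift-operator-controller \
#   -l control-plane=operator-controller-controller-manager -o name)
# oc rsh -n openshift-operator-controller "$pod" ls -1a /var/cache/catalogs | count_temp_dirs
```

      Tracking this count over time should show whether each failed unpack attempt leaves one more folder behind.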
      

      Version-Release number of selected component (if applicable):

      This was noticed in RHOCP 4.18.28

      How reproducible:

      Blocking network access to the clustercatalog images may be enough to reproduce the issue.

      Steps to Reproduce:

      1.
      2.
      3.

      Actual results:

      The temporary folders are not removed, which puts the node running the operator-controller-controller-manager under disk pressure once enough temporary folders accumulate.

      Expected results:

      The temporary folders from previous attempts should be removed.
      OR
      A quota should be used to prevent the cache from filling up the disk.
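
      The first expected behavior can be illustrated with a small shell sketch: unpack into a hidden temporary folder, publish it only on success, and remove it on failure. This is not operator-controller's implementation (the controller is not a shell script); unpack_catalog and its arguments are hypothetical names used only to show the cleanup pattern.

```shell
# Hypothetical sketch: unpack_catalog CACHE_DIR NAME CMD [ARGS...]
# Runs CMD with a fresh temporary folder appended as its last argument.
unpack_catalog() {
  uc_cache_dir=$1
  uc_name=$2
  shift 2

  # Create a hidden temporary folder next to the final location.
  uc_tmp=$(mktemp -d "$uc_cache_dir/.$uc_name-XXXXXXXXXX") || return 1

  if "$@" "$uc_tmp"; then
    # Success: replace the published catalog with the new content.
    rm -rf "$uc_cache_dir/$uc_name"
    mv "$uc_tmp" "$uc_cache_dir/$uc_name"
  else
    # Failure: remove the partial unpack so retries do not accumulate folders.
    rm -rf "$uc_tmp"
    return 1
  fi
}
```

      With this pattern a failing unpack command leaves the cache folder unchanged, no matter how many times it is retried.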

      Additional info:

              rh-ee-cchantse Catherine Chan-Tse
              rhn-support-vlours Vincent Lours
              Jian Zhang