Description of problem:
When the clustercatalog pods fail to pull their image, the operator-controller-controller-manager repeatedly fails to unpack it and retries again and again, never cleaning up the temporary folders created by the previous attempts.
Listing of the temporary folders:
$ oc rsh -n openshift-operator-controller $(oc get pod -n openshift-operator-controller -l control-plane=operator-controller-controller-manager -o name) ls -larth /var/cache/catalogs
total 72M
drwxrwxrwx.    3 root root   30 Nov 19 23:08 ..
drwx------.  152 1001 root 8.0K Feb  3 02:19 openshift-redhat-operators
drwx------.   43 1001 root 4.0K Feb  3 02:19 openshift-redhat-marketplace
drwx------.  241 1001 root  12K Feb  3 22:25 openshift-certified-operators
drwx------.  284 1001 root  12K Feb  3 22:25 openshift-community-operators
drwx------.  121 1001 root 8.0K Feb  3 22:26 .openshift-redhat-operators-4015104162
drwx------.  121 1001 root 8.0K Feb  3 22:26 .openshift-redhat-operators-3615668944
drwx------.  121 1001 root 8.0K Feb  3 22:26 .openshift-redhat-operators-3230349928
drwx------.  121 1001 root 8.0K Feb  3 22:26 .openshift-redhat-operators-2381597722
[...]
drwx------.  121 1001 root 8.0K Feb  4 16:21 .openshift-redhat-operators-872931253
drwx------.  121 1001 root 8.0K Feb  4 16:22 .openshift-redhat-operators-4008688996
drwx------.  121 1001 root 8.0K Feb  4 16:22 .openshift-redhat-operators-4246779203
drwx------.  121 1001 root 8.0K Feb  4 16:22 .openshift-redhat-operators-2345602149
drwx------.  121 1001 root 8.0K Feb  4 16:22 .openshift-redhat-operators-565489257
drwx------.  121 1001 root 8.0K Feb  4 16:22 .openshift-redhat-operators-3535223206
drwx------.   17 1001 root 4.0K Feb  4 16:57 .openshift-redhat-operators-1604285883
drwx------.    8 1001 root  163 Feb  4 17:01 .openshift-redhat-operators-2650254321
drwx------. 6073 1001 root 340K Feb  4 17:35 .
drwx------.    8 1001 root  163 Feb  4 17:35 .openshift-redhat-operators-1244524397
Space used by the temporary folders:
$ oc rsh -n openshift-operator-controller $(oc get pod -n openshift-operator-controller -l control-plane=operator-controller-controller-manager -o name) du -sch /var/cache/catalogs
3.5T /var/cache/catalogs
3.5T total
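To quantify how much of that space is held by the leftover attempt directories specifically, the hidden `.<catalog>-<nonce>` entries can be summed from inside the pod. A minimal sketch, assuming the naming pattern shown in the listing above (`stale_usage` is a hypothetical helper, not part of any product tooling):

```shell
# Sum the space (in KiB) held by leftover hidden attempt directories,
# i.e. top-level entries whose name starts with a dot.
stale_usage() {
  # $1: catalog cache root, e.g. /var/cache/catalogs
  find "$1" -mindepth 1 -maxdepth 1 -type d -name '.*' -exec du -sk {} + 2>/dev/null |
    awk '{sum += $1} END {print sum + 0}'
}

# Example (run inside the pod via oc rsh):
# stale_usage /var/cache/catalogs
```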
Version-Release number of selected component (if applicable):
This was noticed in RHOCP 4.18.28
How reproducible:
Blocking the network traffic used to pull the clustercatalog images should be enough to reproduce the issue.
Steps to Reproduce:
1.
2.
3.
Actual results:
The temporary folders are not removed, eventually putting the node running the operator-controller-controller-manager under disk pressure once the number of temporary folders grows too large.
Expected results:
The temporary folders from previous attempts should be removed.
OR
Some quota should be enforced to prevent the space usage from filling up the disk.
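The expected cleanup could look like the following sketch, which removes any leftover hidden attempt directories for a given catalog before (or after) an unpack attempt. This assumes the `.<catalog>-<nonce>` naming seen in the listing above; `cleanup_stale_attempts` is a hypothetical illustration, not the controller's actual code:

```shell
# Remove leftover hidden temp dirs for one catalog, keeping the
# non-hidden directory that holds the last successfully unpacked content.
cleanup_stale_attempts() {
  # $1: cache root (e.g. /var/cache/catalogs), $2: catalog name
  find "$1" -mindepth 1 -maxdepth 1 -type d -name ".$2-*" -exec rm -rf {} +
}

# Example (run inside the pod via oc rsh):
# cleanup_stale_attempts /var/cache/catalogs openshift-redhat-operators
```

Running the equivalent logic on each retry would bound the cache to one temporary directory per catalog instead of thousands.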