-
Bug
-
Resolution: Done
-
Normal
-
None
-
4.13, 4.12, 4.11
-
+
-
Moderate
-
None
-
False
-
-
-
Bug Fix
-
Done
Description of problem:
We've discovered a case where the node-exporter NodeFilesystemAlmostOutOfSpace alert fires for a customer use case that could be excluded from the checks altogether. The customer is using the [IBM Cloud Object Storage Plugin](https://github.com/IBM/ibmcloud-object-storage-plugin), which creates a tmpfs mountpoint on the cluster node to store the API key or the AccessKey + SecretKey for AWS S3. However, the plugin creates a 4k filesystem, just large enough to hold the single file containing the key. As a result, node-exporter reports the filesystem as out of, or almost out of, disk space, and the resulting alert pages the OpenShift Dedicated SREs on call.

It might be possible to exclude these mountpoints on OSD clusters with the `--collector.filesystem.mount-points-exclude` flag, but doing so for every plugin or third-party component that might behave this way would be difficult to scale. It has been suggested that node-exporter could handle this itself, perhaps by requiring a minimum filesystem size (for example 1M) before a filesystem is included in monitoring, or something along those lines. I realize this isn't a bug, per se, but we're hoping node-exporter can handle this case so that it does not require manual SRE intervention.
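To make the suggested minimum-size check concrete, here is a minimal Python sketch. It is not node-exporter code: the `Filesystem` record, the function names, the example mountpoint `/example/s3-credentials`, and the 5% free-space threshold are all assumptions made for illustration. Only the 1M minimum and the 4k tmpfs size come from the description above.

```python
from dataclasses import dataclass

# Assumed threshold, taken from the "1M minimum" idea in the report.
MIN_FS_SIZE_BYTES = 1024 * 1024


@dataclass(frozen=True)
class Filesystem:
    """Hypothetical record of one mounted filesystem."""
    mountpoint: str
    fstype: str
    size_bytes: int
    avail_bytes: int


def is_monitored(fs, min_size_bytes=MIN_FS_SIZE_BYTES):
    """Return True if the filesystem is large enough to be worth alerting on."""
    return fs.size_bytes >= min_size_bytes


def almost_out_of_space(filesystems, threshold_pct=5.0,
                        min_size_bytes=MIN_FS_SIZE_BYTES):
    """Return mountpoints whose free space is below threshold_pct.

    Filesystems smaller than min_size_bytes are skipped entirely.
    The 5% default is illustrative, not the real alert rule.
    """
    flagged = []
    for fs in filesystems:
        if not is_monitored(fs, min_size_bytes):
            continue
        if fs.size_bytes == 0:
            # Avoid dividing by zero when the size filter is disabled.
            continue
        free_pct = 100.0 * fs.avail_bytes / fs.size_bytes
        if free_pct < threshold_pct:
            flagged.append(fs.mountpoint)
    return flagged


GIB = 1024 ** 3
EXAMPLE_FILESYSTEMS = [
    # 4k tmpfs holding a single credentials file, completely full.
    Filesystem("/example/s3-credentials", "tmpfs", 4096, 0),
    # Large root filesystem with plenty of free space.
    Filesystem("/", "xfs", 100 * GIB, 50 * GIB),
    # Genuinely nearly full filesystem (2% free).
    Filesystem("/var", "xfs", 10 * GIB, GIB // 5),
]

if __name__ == "__main__":
    print(almost_out_of_space(EXAMPLE_FILESYSTEMS))
```

With the size filter in place, only the genuinely full `/var` entry is flagged. Passing `min_size_bytes=0` disables the filter and reproduces the behaviour reported here, where the tiny tmpfs is flagged as well.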
Version-Release number of selected component (if applicable):
N/A
How reproducible:
100%
Steps to Reproduce:
1. Install and set up the IBM Cloud Object Storage plugin S3 component.
2. Observe the nodeFileSystemFillingUp alert triggered by the stored passwd file.
Actual results:
nodeFileSystemFillingUp alert triggered
Expected results:
The nodeFileSystemFillingUp alert is suppressed for this filesystem.
Additional info:
The code in question is here: https://github.com/IBM/ibmcloud-object-storage-plugin/blob/157a391b710bb1a89da609d45c48eb5d2ba3e1d8/driver/driver.go#L259