OpenShift Bugs: OCPBUGS-6577

Node-exporter NodeFilesystemAlmostOutOfSpace alert exception needed


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • None
    • Versions: 4.13, 4.12, 4.11
    • Component: Monitoring
    • Moderate
    • None
    • False
    • Release note:
      * With this release, the `NodeFilesystemAlmostOutOfSpace` alert no longer fires for certain read-only `tmpfs` instances. This change fixes an issue in which the alert fired for certain `tmpfs` mount points that were full by design. (link:https://issues.redhat.com/browse/OCPBUGS-6577[*OCPBUGS-6577*])
    • Bug Fix
    • Done

      Description of problem:

      We've discovered a case where the node-exporter NodeFilesystemAlmostOutOfSpace alert fires for a customer use case that could be excluded from the checks entirely.
      
      The customer is using the [IBM Cloud Object Storage Plugin](https://github.com/IBM/ibmcloud-object-storage-plugin), which creates a tmpfs mount point on the cluster node to store the API key, or the AccessKey and SecretKey for AWS S3. However, the tmpfs it creates is only 4k, just large enough to hold a single file containing the key.
      
      Because this filesystem is full by design, node-exporter reports it as out of (or almost out of) disk space, and the resulting alert pages the on-call OpenShift Dedicated SREs.
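To make the failure mode concrete, here is a minimal sketch of the alert condition evaluated against two sample filesystems. It assumes the rule compares available space as a percentage of size against a 5% threshold and ignores read-only filesystems, which is how I understand the upstream node-mixin rule; the OpenShift rule may use different values. The mount path is hypothetical, and the rule's `for:` duration is not modelled.

```python
from dataclasses import dataclass

# Assumption: 5% mirrors the upstream node-mixin default warning threshold for
# NodeFilesystemAlmostOutOfSpace; the OpenShift rule may use different values.
AVAILABLE_PERCENT_THRESHOLD = 5.0


@dataclass
class Filesystem:
    mountpoint: str
    size_bytes: int   # stands in for node_filesystem_size_bytes
    avail_bytes: int  # stands in for node_filesystem_avail_bytes
    readonly: bool    # stands in for node_filesystem_readonly == 1


def almost_out_of_space(fs: Filesystem, threshold: float = AVAILABLE_PERCENT_THRESHOLD) -> bool:
    """Simplified alert condition; the rule's `for:` duration is ignored here."""
    if fs.size_bytes == 0:
        # PromQL would produce NaN or +Inf here, neither of which is below the threshold.
        return False
    avail_percent = fs.avail_bytes / fs.size_bytes * 100
    return avail_percent < threshold and not fs.readonly


# Hypothetical mount point for the plugin's 4k key store; the real path may differ.
key_store = Filesystem("/var/lib/hypothetical-cos-keys", 4096, 0, readonly=False)
root = Filesystem("/", 120 * 2**30, 60 * 2**30, readonly=False)

for fs in (key_store, root):
    print(fs.mountpoint, almost_out_of_space(fs))
```

A writable 4k tmpfs holding one key file has 0% available, so it crosses any percentage threshold no matter how it is tuned; only the read-only check or an explicit exclusion keeps it from firing.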
      
      It might be possible to exclude these mounts on OSD clusters with the `--collector.filesystem.mount-points-exclude` flag, but doing so for every plugin or third-party component that behaves this way would be difficult to scale.
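The sketch below shows why a per-component exclusion list grows with every new plugin. It assumes node-exporter applies the flag's value as an unanchored regular expression against each mount point (as Go's `regexp.MatchString` does, mimicked here with `re.search`). The default pattern is only an approximation of node-exporter's Linux default, and both extra mount paths are hypothetical.

```python
import re

# Assumption: approximately node-exporter's default Linux exclusion pattern;
# the exact default differs between releases.
DEFAULT_PATTERN = r"^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+|var/lib/containers/storage/.+)($|/)"

# Hypothetical per-component mount points; not real plugin paths.
PER_COMPONENT_MOUNTS = [
    "/var/lib/hypothetical-cos-keys",
    "/var/lib/hypothetical-other-plugin/secrets",
]


def build_exclude_regex(extra_mounts):
    """Append one anchored, literal alternative per extra mount point."""
    extra = [f"^{re.escape(m)}($|/)" for m in extra_mounts]
    return "|".join([DEFAULT_PATTERN, *extra])


def is_excluded(mountpoint, regex):
    # Go's regexp.MatchString is unanchored, so re.search is the closer analogue.
    return re.search(regex, mountpoint) is not None


regex = build_exclude_regex(PER_COMPONENT_MOUNTS)
print(f"--collector.filesystem.mount-points-exclude={regex}")
for mp in ("/", "/proc", "/var/lib/hypothetical-cos-keys", "/var/lib/unlisted-plugin/key"):
    print(mp, is_excluded(mp, regex))
```

Any component that is not yet in the list (here `/var/lib/unlisted-plugin/key`) still gets monitored, so each newly discovered plugin means another alternative and another configuration rollout.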
      
      It has been suggested that node-exporter could handle this itself, perhaps by requiring a minimum filesystem size (for example, 1M) before a filesystem is included in the monitored set, or something along those lines.
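A minimal sketch of the suggested minimum-size rule, assuming "1M" means 1 MiB. The constant and the `monitored` helper are hypothetical illustrations of the proposal, not an existing node-exporter option.

```python
from dataclasses import dataclass

# Hypothetical cut-off from the suggestion: "1M", interpreted here as 1 MiB.
MIN_MONITORED_SIZE_BYTES = 1 * 1024 * 1024


@dataclass
class Filesystem:
    mountpoint: str
    size_bytes: int
    avail_bytes: int


def monitored(filesystems, min_size=MIN_MONITORED_SIZE_BYTES):
    """Keep only filesystems at least `min_size` bytes large (hypothetical rule)."""
    return [fs for fs in filesystems if fs.size_bytes >= min_size]


filesystems = [
    Filesystem("/var/lib/hypothetical-cos-keys", 4 * 1024, 0),  # 4k tmpfs, full by design
    Filesystem("/", 120 * 2**30, 60 * 2**30),
]
print([fs.mountpoint for fs in monitored(filesystems)])  # → ['/']
```

The trade-off is that a size cut-off also hides genuinely small filesystems that fill up unexpectedly, so the threshold would need to sit well below any filesystem someone actually cares about.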
      
      I realize this isn't a bug per se, but we're hoping node-exporter can handle this more efficiently than manual SRE intervention.
      

      Version-Release number of selected component (if applicable):

      N/A
      

      How reproducible:

      100%
      

      Steps to Reproduce:

      1. Install and set up the S3 component of the IBM Cloud Object Storage plugin.
      2. Observe the nodeFileSystemFillingUp alert triggered by the stored passwd file.
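To spot such mounts on a node while reproducing, one option is to scan `/proc/mounts` for `tmpfs` entries with a small `size=` option. This sketch parses a hypothetical excerpt rather than the real file; the mount path is invented, the 1024 KiB cut-off echoes the 1M suggestion in the description, and it assumes the kernel reports tmpfs sizes with a `k` suffix, as it typically does.

```python
# Hypothetical /proc/mounts excerpt; the plugin's real mount path may differ.
SAMPLE_PROC_MOUNTS = """\
/dev/sda4 / xfs rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev,size=3276800k,mode=755 0 0
tmpfs /var/lib/hypothetical-cos-keys tmpfs rw,relatime,size=4k 0 0
"""


def parse_size_kib(options):
    """Return the size= option in KiB, or None if absent or not in k units."""
    for opt in options.split(","):
        if opt.startswith("size=") and opt.endswith("k"):
            return int(opt[len("size="):-1])
    return None


def small_tmpfs_mounts(proc_mounts, max_kib=1024):
    """List (mount point, size in KiB) for tmpfs mounts smaller than max_kib."""
    found = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        # Fields: device, mount point, fstype, options, dump, pass.
        if len(fields) < 4 or fields[2] != "tmpfs":
            continue
        size_kib = parse_size_kib(fields[3])
        if size_kib is not None and size_kib < max_kib:
            found.append((fields[1], size_kib))
    return found


print(small_tmpfs_mounts(SAMPLE_PROC_MOUNTS))  # → [('/var/lib/hypothetical-cos-keys', 4)]
```

On a real node, pass the contents of `/proc/mounts` (for example, `open("/proc/mounts").read()`) instead of the sample string.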
      
      

      Actual results:

      The nodeFileSystemFillingUp alert is triggered.
      

      Expected results:

      The nodeFileSystemFillingUp alert is suppressed for this mount.
      

      Additional info:

      The code in question is here: https://github.com/IBM/ibmcloud-object-storage-plugin/blob/157a391b710bb1a89da609d45c48eb5d2ba3e1d8/driver/driver.go#L259

              Jan Fajerski (jfajersk@redhat.com)
              Chris Collins (chcollin)
              Junqi Zhao