  1. OpenShift Bugs
  2. OCPBUGS-35345

Prometheus reporting no space left on device - but the alert for KubePersistentVolumeFillingUp is not firing


    • Bug
    • Resolution: Unresolved
    • Normal
    • None
    • 4.14.0
    • Monitoring
    • None
    • Quality / Stability / Reliability
    • False
    • Moderate

      Description of problem:

      Prometheus reporting no space left on device - but the alert for KubePersistentVolumeFillingUp is not firing
      
      The following alerts were firing:
      
      RULE                         SEVERITY   STATE    AGE   ALERTS   ACTIVE SINCE 
      ClusterVersionOperatorDown   critical   firing   28s   1        
      KubeControllerManagerDown    critical   firing   23s   1        
      KubeSchedulerDown            critical   firing   24s   1        
      Watchdog                     none       firing   1s    1        
      AppKubeAPIDown               critical   firing   17s   1        
      AppKubeletDown               critical   firing   11s   1        
      KubeAPIDown                  critical   firing   16s   1        
      KubeletDown                  critical   firing   5s    1        
      NoRunningOvnMaster           critical   firing   0s    1        
      
      In the must-gather provided, I can see the KubePersistentVolumeFillingUp alerts reporting `inactive`, but only for seconds. The details in those alerts do show that Prometheus was out of space:
      
      RULE                            SEVERITY   STATE      AGE   ALERTS
      KubePersistentVolumeFillingUp   critical   inactive   21s   0        
      KubePersistentVolumeFillingUp   warning    inactive   21s   0     
      
      health: err
        labels:
          severity: warning
        lastError: 'write to WAL: log samples: write /prometheus/wal/00005989: no space
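The `health: err` field above suggests the rule's last evaluation failed, so its `inactive` state may not reflect the actual volume usage. A minimal sketch of how such rules could be picked out, assuming the JSON shape of the Prometheus `/api/v1/rules` response (`data.groups[].rules[]` with `name`, `state`, `health`, `lastError`). The function name `rules_with_errors`, the group name, and the sample error text are hypothetical and not taken from the must-gather:

```python
from typing import Any


def rules_with_errors(rules_response: dict[str, Any]) -> list[tuple[str, str, str, str]]:
    """Return (group, rule, state, lastError) for each rule whose health is "err"."""
    failed = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            # A rule can be "inactive" simply because its evaluation failed.
            if rule.get("health") == "err":
                failed.append((
                    group.get("name", ""),
                    rule.get("name", ""),
                    rule.get("state", ""),
                    rule.get("lastError", ""),
                ))
    return failed


# Hypothetical sample data shaped like a rules API response.
sample = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "example-storage-group",  # assumed group name
                "rules": [
                    {
                        "name": "KubePersistentVolumeFillingUp",
                        "state": "inactive",
                        "health": "err",
                        "lastError": "example error: no space left on device",
                    },
                    {"name": "Watchdog", "state": "firing", "health": "ok"},
                ],
            }
        ],
    },
}

for group, rule, state, error in rules_with_errors(sample):
    print(f"{group}/{rule}: state={state}, lastError={error}")
```

Filtering on `health` rather than `state` is the point of the sketch: an `inactive` rule with a failed evaluation is listed, while a healthy firing rule such as `Watchdog` is not.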
            

      Version-Release number of selected component (if applicable):

       

      How reproducible:

       

      Steps to Reproduce:

      
      

      Actual results:

      Metrics are not available.

      Expected results:

      An alert indicating an issue with Prometheus after the KubeletDown alert has fired.

      Additional info:

      must-gather is attached to the linked case

              rh-ee-amrini Ayoub Mrini
              rhn-support-nigsmith Nigel Smith
              None
              None
              Junqi Zhao
              None