OpenShift Bugs / OCPBUGS-27752

KubePersistentVolumeFillingUp due to exceeded maximum retention size


    • Bug
    • Resolution: Done
    • Normal
    • 4.14.z
    • Monitoring
    • Quality / Stability / Reliability

      Description of problem:

      KubePersistentVolumeFillingUp alerts (which fire when less than 3% of the PV's space is left) are firing on several clusters that have a configured PV size of 100 GiB and a maximum retention size of 90 GB.
      
      Our CMO config: https://github.com/openshift/managed-cluster-config/blob/master/resources/cluster-monitoring-config/config.yaml#L29
        retention: 11d
        retentionSize: 90GB
        volumeClaimTemplate:
          metadata:
            name: prometheus-data
          spec:
            resources:
              requests:
                storage: 100Gi
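
      As a rough check of how much margin these settings leave, the arithmetic can be sketched as follows. This is a minimal sketch under stated assumptions, not a statement of how the alert is actually evaluated: it assumes Prometheus interprets "GB" in retentionSize as a power-of-2 unit (so 90GB = 90 * 2**30 bytes), that the usable capacity reported for the PV equals the requested 100Gi (filesystem overhead may make it smaller), and it uses the 3% threshold quoted above. The names alert_fires and ALERT_FREE_FRACTION are hypothetical helpers for illustration.

```python
GIB = 2**30

# Values taken from the CMO config above.
pv_capacity_bytes = 100 * GIB      # storage: 100Gi (assumed fully usable)
# Assumption: Prometheus parses "GB" as a power-of-2 unit (1GB = 2**30 bytes).
retention_size_bytes = 90 * GIB    # retentionSize: 90GB

# From the description above: the alert fires when less than 3% of the PV is left.
ALERT_FREE_FRACTION = 0.03


def alert_fires(used_bytes: int, capacity_bytes: int) -> bool:
    """Return True when the free fraction drops below the alert threshold."""
    free_fraction = (capacity_bytes - used_bytes) / capacity_bytes
    return free_fraction < ALERT_FREE_FRACTION


# Usage level above which the alert fires, and the margin retentionSize leaves.
alert_used_bytes = pv_capacity_bytes * (1 - ALERT_FREE_FRACTION)
headroom_gib = (alert_used_bytes - retention_size_bytes) / GIB

print(f"alert fires above {alert_used_bytes / GIB:.1f} GiB used")
print(f"headroom beyond retentionSize: {headroom_gib:.1f} GiB")
print("fires at exactly retentionSize:", alert_fires(retention_size_bytes, pv_capacity_bytes))
```

      Under these assumptions the alert should only fire once usage exceeds retentionSize by roughly 7 GiB, so any data on the volume that is not bounded by retentionSize would have to account for at least that much.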
      
      We also saw large discrepancies between the sizes of the two PVs (screenshot attached), which may point to https://access.redhat.com/solutions/7024829. We are unsure whether this discrepancy is the cause, though, since the difference is up to 80 GB.
      
      

      Version-Release number of selected component (if applicable):

          4.14.z

      How reproducible:

          Repeating pattern: alerts flap for 4 days, then no alerts fire for 4 days

      Actual results:

          KubePersistentVolumeFillingUp firing

      Expected results:

          KubePersistentVolumeFillingUp should not fire, because with the above CMO settings the data stored on the PV should never come close to the PV's capacity (https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects)

      Additional info:

      Attaching:
      - partial must gather (no logs)
      - container logs for openshift-monitoring namespace
      - screenshot of PV size difference
      
      

              Simon Pasquier (spasquie@redhat.com)
              Claudio Busse (cbusse.openshift)
              Junqi Zhao