OpenShift Monitoring / MON-2802

Alert KubePersistentVolumeInodesFillingUp

    • Bug
    • Resolution: Duplicate
    • openshift-4.11
    • NEW

      Description of problem
      ======================

      Filesystems with dynamic inode allocation (such as CephFS, XFS, or Btrfs)
      do not report a well-defined total inode capacity, and each of them behaves
      slightly differently. Their inode values therefore cannot be interpreted
      in the same way as those of "traditional" filesystems with static inode
      allocation (such as ext4).

      Because the KubePersistentVolumeInodesFillingUp alert doesn't distinguish
      between the two cases, it can fire for PVCs backed by filesystems with
      dynamic inode allocation, causing a false alarm.
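
To make the inode values discussed above concrete, here is a minimal Python sketch that reads the total and free inode counts a filesystem reports through `statvfs` and derives the free ratio. The helper names `free_inode_ratio` and `inode_stats` are hypothetical, and returning `None` when the reported total is not positive is an assumption of this sketch, not something the alert does.

```python
import os
from typing import Optional


def free_inode_ratio(total: int, free: int) -> Optional[float]:
    """Return free/total inodes, or None when the total is not usable.

    Assumption of this sketch: a non-positive total means the filesystem
    does not report a meaningful inode capacity.
    """
    if total <= 0:
        return None
    return free / total


def inode_stats(path: str) -> dict:
    """Collect the inode counters the filesystem reports for `path`."""
    st = os.statvfs(path)  # available on Unix-like systems
    return {
        "total": st.f_files,   # total inodes as reported by the filesystem
        "free": st.f_ffree,    # free inodes as reported by the filesystem
        "ratio": free_inode_ratio(st.f_files, st.f_ffree),
    }


if __name__ == "__main__":
    # Compare the output for mounts backed by different filesystems.
    print(inode_stats("/"))
```

Running this against mounts backed by different filesystems is one way to see how differently the same two counters can be populated.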

      Version-Release number of selected component
      ============================================

      OCP 4.11.0

      How reproducible
      ================

      100%

      Steps to Reproduce
      ==================

      1. Install OCP
      2. Reconfigure the OpenShift Container Platform registry to use an RWX
      CephFS volume provided by ODF
      3. Use the cluster for a while
      4. Check firing alerts

      Actual results
      ==============

      Alert KubePersistentVolumeInodesFillingUp is firing with the following
      message:

      The PersistentVolume claimed by registry-cephfs-rwx-pvc in Namespace
      openshift-image-registry only has 0% free inodes.

      In this particular case, there will be 2 such alerts, as there are 2 replicas
      of the registry.

      Expected results
      ================

      Alert KubePersistentVolumeInodesFillingUp does not fire when an RWX CephFS
      volume is used to provide persistent storage for an OCP component.

      Additional info
      ===============

      The definition of the alert looks like this:

      (
        kubelet_volume_stats_inodes_free{job="kubelet",metrics_path="/metrics",namespace=~"(openshift-.*|kube-.*|default)"}
        /
        kubelet_volume_stats_inodes{job="kubelet",metrics_path="/metrics",namespace=~"(openshift-.*|kube-.*|default)"}
      ) < 0.03
      and
      kubelet_volume_stats_inodes_used{job="kubelet",metrics_path="/metrics",namespace=~"(openshift-.*|kube-.*|default)"} > 0
      unless on (namespace, persistentvolumeclaim)
      kube_persistentvolumeclaim_access_mode{access_mode="ReadOnlyMany",namespace=~"(openshift-.*|kube-.*|default)"} == 1
      unless on (namespace, persistentvolumeclaim)
      kube_persistentvolumeclaim_labels{label_alerts_k8s_io_kube_persistent_volume_filling_up="disabled",namespace=~"(openshift-.*|kube-.*|default)"} == 1
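
The alert expression can be easier to follow when written out as plain boolean logic. The Python sketch below mirrors its structure for a single volume. The `VolumeSample` type and its field names are hypothetical, and the inode counts in the usage example are made-up values chosen to match the "0% free inodes" message. The report does not give the real numbers.

```python
import re
from dataclasses import dataclass

INODE_FREE_THRESHOLD = 0.03  # the "< 0.03" in the alert expression
NAMESPACE_RE = re.compile(r"(openshift-.*|kube-.*|default)")


@dataclass
class VolumeSample:
    """Hypothetical container for the metric values of one PVC."""
    namespace: str
    inodes: float        # kubelet_volume_stats_inodes
    inodes_free: float   # kubelet_volume_stats_inodes_free
    inodes_used: float   # kubelet_volume_stats_inodes_used
    access_modes: frozenset = frozenset()
    filling_up_label: str = ""  # label_alerts_k8s_io_kube_persistent_volume_filling_up


def alert_condition(s: VolumeSample) -> bool:
    """Evaluate the structure of the alert expression for one sample."""
    # PromQL regex matchers are fully anchored, hence fullmatch.
    if not NAMESPACE_RE.fullmatch(s.namespace):
        return False
    # In PromQL, dividing by zero gives NaN or +Inf, neither of which
    # passes "< 0.03", so such a sample never matches.
    if s.inodes == 0:
        return False
    low_free = (s.inodes_free / s.inodes) < INODE_FREE_THRESHOLD
    in_use = s.inodes_used > 0
    read_only_many = "ReadOnlyMany" in s.access_modes  # first "unless"
    opted_out = s.filling_up_label == "disabled"       # second "unless"
    return low_free and in_use and not read_only_many and not opted_out


# Made-up values that reproduce the "0% free inodes" situation.
registry = VolumeSample(
    namespace="openshift-image-registry",
    inodes=1000, inodes_free=0, inodes_used=1000,
    access_modes=frozenset({"ReadWriteMany"}),
)
print(alert_condition(registry))
```

With these values every positive condition holds and neither exclusion applies, so the sketch reports that the alert would fire, regardless of which filesystem backs the volume.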
      

      The two "unless" clauses show that there was some attempt to prevent this
      from happening. However, without reliable tracking of which filesystem
      backs a volume, and of whether its inode values should be taken seriously,
      the alert can't avoid false alarms.
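
As an illustration of what such tracking could enable, the following Python sketch gates the inode check on the filesystem type. It is only a sketch under stated assumptions: the `fs_type` input is hypothetical (the alert expression shown in this issue has no such information), the set of filesystems is taken from the examples named in this report, and treating an unknown filesystem as eligible is a design choice made here to preserve today's behaviour.

```python
from typing import Optional

# Filesystems this report names as having dynamic inode allocation.
DYNAMIC_INODE_FILESYSTEMS = frozenset({"cephfs", "xfs", "btrfs"})


def inode_alert_applicable(fs_type: Optional[str]) -> bool:
    """Decide whether inode values should be taken seriously.

    `fs_type` is a hypothetical input; unknown types stay eligible so
    that the existing behaviour is kept for them.
    """
    if fs_type is None:
        return True
    return fs_type.lower() not in DYNAMIC_INODE_FILESYSTEMS


def should_fire(fs_type: Optional[str], free_ratio: float,
                threshold: float = 0.03) -> bool:
    """Apply the low-free-inodes check only where it is meaningful."""
    return inode_alert_applicable(fs_type) and free_ratio < threshold


print(should_fire("cephfs", 0.0))  # dynamic allocation: suppressed
print(should_fire("ext4", 0.0))    # static allocation: fires
```

The same 0% free reading is suppressed for the CephFS-backed volume but still fires for an ext4-backed one, which is the distinction the current alert cannot make.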

              spasquie@redhat.com Simon Pasquier
              mbukatov@redhat.com Martin Bukatovič