OCPBUGS-10640

Recommendation to mount /var in its entirety looks like misguided advice




      Red Hat's partitioning recommendations:
      https://docs.openshift.com/container-platform/4.12/installing/installing_bare_metal/installing-bare-metal.html#installation-user-infra-machines-advanced_disk_installing-bare-metal

      There are two cases where you might want to override the default partitioning when installing RHCOS on an OpenShift Container Platform cluster node:

      Creating separate partitions: For greenfield installations on an empty disk, you might want to add separate storage to a partition. This is officially supported for mounting /var or a subdirectory of /var, such as /var/lib/etcd, on a separate partition, but not both.

      For disk sizes larger than 100GB, and especially disk sizes larger than 1TB, create a separate /var partition. See "Creating a separate /var partition" and this Red Hat Knowledgebase article for more information.

      Kubernetes supports only two file system partitions. If you add more than one partition to the original configuration, Kubernetes cannot monitor all of them.

      It is important to keep in mind the following:
      https://access.redhat.com/articles/4766521
      https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#eviction-policy

      The kubelet supports the following filesystem partitions:

      nodefs: The node's main filesystem, used for local disk volumes, emptyDir, log storage, and more. For example, nodefs contains /var/lib/kubelet/.
      imagefs: An optional filesystem that container runtimes use to store container images and container writable layers.

      Kubelet auto-discovers these filesystems and ignores other filesystems. Kubelet does not support other configurations.

      Therefore, I believe that the recommendation to mount /var in its entirety is an error: both /var/lib/kubelet and /var/lib/containers would then reside on the same filesystem.

      If all of /var were mounted on its own partition, then imagefs == nodefs == /var, and Kubernetes would monitor disk pressure only on /var. No nodefs monitoring would cover any other partition, including "/", so administrators would be flying blind: the kubelet would report no disk pressure metrics for those partitions.

      The alternative is to mount /var/lib/containers on its own partition. In that case imagefs != nodefs, imagefs == /var/lib/containers, and nodefs == /. Kubernetes will monitor disk pressure on "/" via nodefs and on "/var/lib/containers" via imagefs.

      In other words:

      • /var/lib/kubelet must be under the "/" mountpoint so that the kubelet monitors all of "/" via the nodefs counter.
      • /var/lib/containers can be mounted as a separate mountpoint so that the kubelet monitors that partition via the imagefs counter.
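The two bullet points above can be checked on a node with a small shell helper. This is a minimal sketch, not the kubelet's own detection logic: the function names `mount_of` and `fs_layout` are hypothetical, and comparing `df` mount points is only an approximation of how the kubelet discovers nodefs and imagefs.

```shell
#!/usr/bin/env bash
# Print the mount point backing a path (last column of POSIX df output).
# Prints nothing if the path does not exist.
mount_of() {
  df -P "$1" 2>/dev/null | awk 'NR==2 {print $NF}'
}

# Report whether two directories sit on the same mount point.
# Defaults are the kubelet and container storage directories named in this report.
fs_layout() {
  local kubelet_dir="${1:-/var/lib/kubelet}"
  local containers_dir="${2:-/var/lib/containers}"
  local kubelet_mount containers_mount
  kubelet_mount="$(mount_of "$kubelet_dir")" || kubelet_mount=""
  containers_mount="$(mount_of "$containers_dir")" || containers_mount=""
  if [ -z "$kubelet_mount" ] || [ -z "$containers_mount" ]; then
    echo "unknown"      # at least one path is missing on this machine
  elif [ "$kubelet_mount" = "$containers_mount" ]; then
    echo "shared"       # imagefs == nodefs
  else
    echo "separate"     # imagefs != nodefs
  fi
}

fs_layout /var/lib/kubelet /var/lib/containers
```

On a node, "separate" together with `mount_of /var/lib/kubelet` printing "/" would correspond to the layout argued for here; "shared" means a single filesystem backs both counters.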

      I would argue that, with the current state of things, the only partitioning schemes that make sense and should be supported are the following:

      • default partitioning scheme
      • default partitioning scheme + mounting /var/lib/containers on a separate volume

      These two schemes are the only ones that guarantee that the kubelet has a complete overview of the state of its node's storage.

      Any other partitioning scheme would mean that kubelet cannot fully track disk usage on the node.

      Here is how you get the metrics from the node:

      TOKEN=$(kubectl get secrets -n openshift-cluster-version -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='default')].data.token}"|base64 --decode)
      curl -k -H "Authorization: Bearer ${TOKEN}" https://<node address>:10250/stats/summary 2>/dev/null | jq '.node.fs,.node.runtime.imageFs' 
      

      > NOTE: In case of a firewall blocking port 10250, it's also possible to connect to the node itself, copy/paste the TOKEN into the terminal and run the curl against 127.0.0.1:10250.

      On a system where /var/lib/containers is mounted on its own partition:

      $ TOKEN=$(kubectl get secrets -n openshift-cluster-version -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='default')].data.token}"|base64 --decode)
      $ curl -k -H "Authorization: Bearer ${TOKEN}" https://worker01.redhat-ocp1.e5gc.bos.redhat.lab:10250/stats/summary 2>/dev/null | jq '.node.fs,.node.runtime.imageFs' 
      {
        "time": "2023-03-21T15:20:43Z",
        "availableBytes": 28379086848,
        "capacityBytes": 51880394752,
        "usedBytes": 23501307904,
        "inodesFree": 25296924,
        "inodes": 25337344,
        "inodesUsed": 40420
      }
      {
        "time": "2023-03-21T15:20:43Z",
        "availableBytes": 235965177856,
        "capacityBytes": 246890082304,
        "usedBytes": 26686946160,
        "inodesFree": 120490599,
        "inodes": 120610752,
        "inodesUsed": 120153
      }
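To relate these numbers to disk pressure, the available share of each filesystem can be compared against an eviction threshold. The sketch below assumes the upstream kubelet default hard-eviction thresholds (nodefs.available<10%, imagefs.available<15%); a cluster may be configured differently, and the function names `pct_available` and `check_threshold` are hypothetical.

```shell
#!/usr/bin/env bash
# Integer percentage of a filesystem that is still available (truncated).
pct_available() {
  local available="$1" capacity="$2"
  echo $(( available * 100 / capacity ))
}

# Compare the available percentage of one signal against a threshold in percent.
check_threshold() {
  local name="$1" available="$2" capacity="$3" threshold="$4"
  local pct
  pct="$(pct_available "$available" "$capacity")"
  if [ "$pct" -lt "$threshold" ]; then
    echo "$name: ${pct}% available, below ${threshold}% threshold"
  else
    echo "$name: ${pct}% available, ok"
  fi
}

# availableBytes and capacityBytes taken from the output above.
# Thresholds are assumed upstream defaults, not values read from this cluster.
check_threshold nodefs 28379086848 51880394752 10
check_threshold imagefs 235965177856 246890082304 15
```

With the values above, both filesystems are well clear of the assumed thresholds, and each one is evaluated separately because each has its own counter.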
      

      On a system where /var/lib/containers is not mounted on a separate partition, nodefs and imagefs are the same filesystem, so the reported capacity, available bytes, and inode counts are identical (only usedBytes differs):

      $ curl -k -H "Authorization: Bearer ${TOKEN}" https://192.168.18.22:10250/stats/summary 2>/dev/null | jq '.node.fs,.node.runtime.imageFs'
      {
        "time": "2023-03-21T15:22:53Z",
        "availableBytes": 423603036160,
        "capacityBytes": 479555555328,
        "usedBytes": 55952519168,
        "inodesFree": 233893866,
        "inodes": 234163072,
        "inodesUsed": 269206
      }
      {
        "time": "2023-03-21T15:22:53Z",
        "availableBytes": 423603036160,
        "capacityBytes": 479555555328,
        "usedBytes": 100197083100,
        "inodesFree": 233893866,
        "inodes": 234163072,
        "inodesUsed": 269206
      }
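The comparison between the two nodes can also be made from the reported numbers alone. The following sketch uses a heuristic that is an assumption of this example, not something the kubelet documents: if nodefs and imagefs report the same capacityBytes and the same total inodes, they are very likely the same filesystem. The function name `stats_layout` is hypothetical.

```shell
#!/usr/bin/env bash
# Guess from /stats/summary values whether nodefs and imagefs share a filesystem.
# Heuristic: identical capacityBytes and identical total inodes suggest one filesystem.
stats_layout() {
  local node_capacity="$1" node_inodes="$2"
  local image_capacity="$3" image_inodes="$4"
  if [ "$node_capacity" = "$image_capacity" ] && [ "$node_inodes" = "$image_inodes" ]; then
    echo "shared"
  else
    echo "separate"
  fi
}

# First node above: /var/lib/containers on its own partition.
stats_layout 51880394752 25337344 246890082304 120610752
# Second node above: no separate partition for /var/lib/containers.
stats_layout 479555555328 234163072 479555555328 234163072
```

Two distinct partitions of exactly the same size and inode count would be misreported as shared, so a check of the actual mount points on the node remains the more reliable test.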
      

              dfitzmau@redhat.com Darragh Fitzmaurice
              akaris@redhat.com Andreas Karis
              Gaoyun Pei Gaoyun Pei