Bug
Resolution: Done
Undefined
None
4.13.z, 4.12.z, 4.11.z, 4.14.z, 4.15, 4.16
Important
No
5
OSDOCS Sprint 234, OSDOCS Sprint 235, OSDOCS Sprint 237, OSDOCS Sprint 238, OSDOCS Sprint 236, OSDOCS Sprint 239, OSDOCS Sprint 241, OSDOCS Sprint 242, OSDOCS Sprint 243, OSDOCS Sprint 244, OSDOCS Sprint 245, OSDOCS Sprint 246, OSDOCS Sprint 247, OSDOCS Sprint 248, OSDOCS Sprint 249, OSDOCS Sprint 250, OSDOCS Sprint 251
17
False
N/A
Release Note Not Required
Recommendation to mount /var in its entirety looks like misguided advice
Red Hat's partitioning recommendations:
https://docs.openshift.com/container-platform/4.12/installing/installing_bare_metal/installing-bare-metal.html#installation-user-infra-machines-advanced_disk_installing-bare-metal
There are two cases where you might want to override the default partitioning when installing RHCOS on an OpenShift Container Platform cluster node:
Creating separate partitions: For greenfield installations on an empty disk, you might want to add separate storage to a partition. This is officially supported for mounting /var or a subdirectory of /var, such as /var/lib/etcd, on a separate partition, but not both.
For disk sizes larger than 100GB, and especially disk sizes larger than 1TB, create a separate /var partition. See "Creating a separate /var partition" and this Red Hat Knowledgebase article for more information.
Kubernetes supports only two file system partitions. If you add more than one partition to the original configuration, Kubernetes cannot monitor all of them.
It is important to keep in mind the following:
https://access.redhat.com/articles/4766521
https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#eviction-policy
The kubelet supports the following filesystem partitions:
nodefs: The node's main filesystem, used for local disk volumes, emptyDir, log storage, and more. For example, nodefs contains /var/lib/kubelet/.
imagefs: An optional filesystem that container runtimes use to store container images and container writable layers.
Kubelet auto-discovers these filesystems and ignores other filesystems. Kubelet does not support other configurations.
Therefore, I believe that the recommendation to mount /var in its entirety is actually an error, as both /var/lib/kubelet and /var/lib/containers reside on that filesystem.
If all of /var were mounted on an individual partition, then imagefs == nodefs == /var, and Kubernetes would only monitor disk pressure on /var. nodefs monitoring for any other partition would not work, so administrators would be flying blind, with no metrics about disk pressure reported by the kubelet for the other partitions.
The alternative is to mount /var/lib/containers on its own partition. In that case imagefs != nodefs: imagefs == /var/lib/containers and nodefs == /. Kubernetes will monitor disk pressure on "/" via nodefs and on "/var/lib/containers" via imagefs.
In other words:
- /var/lib/kubelet must be under the "/" mountpoint so that Kubernetes monitors all of "/" via the nodefs counter.
- /var/lib/containers can be mounted as a separate mountpoint so that the kubelet monitors that partition via the imagefs counter.
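To check which case a given node falls into, one can compare the backing filesystems of the two directories. A minimal sketch (the helper names are illustrative, not from any existing tool):

```shell
# Do two paths share a backing filesystem? If /var/lib/kubelet and
# /var/lib/containers share one, then imagefs == nodefs on that node.
fs_of() {
  # Print the source device of the filesystem holding $1
  df --output=source "$1" | tail -n 1
}

same_fs() {
  if [ "$(fs_of "$1")" = "$(fs_of "$2")" ]; then
    echo "same"      # imagefs == nodefs: kubelet watches one partition
  else
    echo "separate"  # imagefs != nodefs: kubelet watches both
  fi
}

# On a node, run: same_fs /var/lib/kubelet /var/lib/containers
```

`df --output=source` is GNU coreutils; `findmnt -T <path>` would work equally well on RHCOS.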
I would argue that, given the current state of things, the only partitioning schemes that make sense and should be supported are the following:
- default partitioning scheme
- default partitioning scheme + mounting /var/lib/containers on a separate volume
These two combinations are the only ways to guarantee that the kubelet has a complete overview of the state of its node's storage.
Any other partitioning scheme means that the kubelet cannot fully track disk usage on the node.
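For the second scheme, the partition can be created at install time with a Butane config along the lines of the sketch below. This is an assumption-laden illustration, not verbatim from the docs: the config name, device path, partition label, start offset, and sizes are all placeholders that depend on the node.

```yaml
variant: openshift
version: 4.12.0
metadata:
  name: 98-var-lib-containers-partition   # placeholder name
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  disks:
  - device: /dev/disk/by-id/<device_id>   # placeholder: the install disk
    partitions:
    - label: containers
      start_mib: 25000                    # placeholder offset after the root partition
      size_mib: 0                         # 0 = use the rest of the disk
  filesystems:
  - device: /dev/disk/by-partlabel/containers
    path: /var/lib/containers
    format: xfs
    mount_options: [defaults, prjquota]
    with_mount_unit: true
```

The config would be converted with `butane` into a MachineConfig and supplied as an install-time manifest.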
Here is how you get the metrics from the node:
TOKEN=$(kubectl get secrets -n openshift-cluster-version -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='default')].data.token}" | base64 --decode)
curl -k -H "Authorization: Bearer ${TOKEN}" https://<node address>:10250/stats/summary 2>/dev/null | jq '.node.fs,.node.runtime.imageFs'
> NOTE: If a firewall blocks port 10250, it is also possible to connect to the node itself, paste the TOKEN into the terminal, and run the curl against 127.0.0.1:10250.
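The two JSON objects returned by that command can also be compared mechanically to infer whether the kubelet sees a dedicated imagefs. A small jq sketch over a saved response (the function name and file argument are illustrative):

```shell
# Decide from a saved /stats/summary response whether the kubelet sees a
# dedicated imagefs. If the capacity and inode totals of .node.fs and
# .node.runtime.imageFs match, both counters point at the same partition.
check_imagefs() {
  # $1 = path to a saved /stats/summary JSON document
  jq -r 'if .node.fs.capacityBytes == .node.runtime.imageFs.capacityBytes
            and .node.fs.inodes == .node.runtime.imageFs.inodes
         then "imagefs == nodefs (single partition)"
         else "imagefs != nodefs (separate partition)"
         end' "$1"
}
```

This mirrors the two captures below: on the split-partition node the totals differ, while on the single-partition node they are identical.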
On a system where /var/lib/containers is mounted on its own partition:
$ TOKEN=$(kubectl get secrets -n openshift-cluster-version -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='default')].data.token}" | base64 --decode)
$ curl -k -H "Authorization: Bearer ${TOKEN}" https://worker01.redhat-ocp1.e5gc.bos.redhat.lab:10250/stats/summary 2>/dev/null | jq '.node.fs,.node.runtime.imageFs'
{
  "time": "2023-03-21T15:20:43Z",
  "availableBytes": 28379086848,
  "capacityBytes": 51880394752,
  "usedBytes": 23501307904,
  "inodesFree": 25296924,
  "inodes": 25337344,
  "inodesUsed": 40420
}
{
  "time": "2023-03-21T15:20:43Z",
  "availableBytes": 235965177856,
  "capacityBytes": 246890082304,
  "usedBytes": 26686946160,
  "inodesFree": 120490599,
  "inodes": 120610752,
  "inodesUsed": 120153
}
On a system where /var/lib/containers is not mounted on a separate partition, you can see that the reported capacity and inode values are exactly the same for both counters:
$ curl -k -H "Authorization: Bearer ${TOKEN}" https://192.168.18.22:10250/stats/summary 2>/dev/null | jq '.node.fs,.node.runtime.imageFs'
{
  "time": "2023-03-21T15:22:53Z",
  "availableBytes": 423603036160,
  "capacityBytes": 479555555328,
  "usedBytes": 55952519168,
  "inodesFree": 233893866,
  "inodes": 234163072,
  "inodesUsed": 269206
}
{
  "time": "2023-03-21T15:22:53Z",
  "availableBytes": 423603036160,
  "capacityBytes": 479555555328,
  "usedBytes": 100197083100,
  "inodesFree": 233893866,
  "inodes": 234163072,
  "inodesUsed": 269206
}