-
Bug
-
Resolution: Unresolved
-
4.18
-
Quality / Stability / Reliability
Description of problem:
When systemReserved is specified but reservedMemory is not in the kubeletconfig.experimental annotation of a performance profile, the NTO automatically calculates the right amount of reservedMemory [1]. However, it assigns all of that memory to NUMA node 0, which can lead to a suboptimal memory layout on machines with multiple NUMA nodes.
[1] - https://github.com/openshift/cluster-node-tuning-operator/blob/release-4.18/pkg/performanceprofile/controller/performanceprofile/components/kubeletconfig/kubeletconfig.go#L125-L147
Version-Release number of selected component (if applicable):
OCP 4.18. Judging from the code, this should also happen on 4.14 and later.
How reproducible:
Always
Steps to Reproduce:
1. Create a performance profile with the following annotation:
   kubeletconfig.experimental: |
     {"systemReserved": {"memory": "9Gi"}}
2. Run oc get kubeletconfig -o yaml
3. See how all reservedMemory is assigned to NUMA node 0:
   reservedMemory:
   - limits:
       memory: 9816Mi
     numaNode: 0
Actual results:
All reservedMemory is assigned to NUMA node 0
Expected results:
The reservedMemory should be equally distributed between all NUMA nodes.
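To illustrate the expected behavior, here is a minimal Python sketch of an even split. It is not the NTO implementation (the operator is written in Go), and the function name split_reserved_memory is hypothetical. It assumes the total is already known in Mi (9816Mi in the reproducer above) and that the number of NUMA nodes is given. Any remainder goes to the lowest-numbered nodes so that the per-node values still add up to the original total.

```python
def split_reserved_memory(total_mi: int, numa_nodes: int) -> list[dict]:
    """Split total_mi (in Mi) as evenly as possible across NUMA nodes.

    Hypothetical helper for illustration only; not part of NTO.
    The returned entries mirror the shape of the kubelet's reservedMemory list.
    """
    if numa_nodes < 1:
        raise ValueError("numa_nodes must be at least 1")
    if total_mi < 0:
        raise ValueError("total_mi must not be negative")

    # divmod gives the even share and the leftover Mi that cannot be split.
    base, remainder = divmod(total_mi, numa_nodes)

    return [
        {
            "numaNode": node,
            # The first `remainder` nodes each take one extra Mi so the sum is preserved.
            "limits": {"memory": f"{base + (1 if node < remainder else 0)}Mi"},
        }
        for node in range(numa_nodes)
    ]


if __name__ == "__main__":
    # Values from the reproducer: 9816Mi total, assuming a two-node NUMA machine.
    for entry in split_reserved_memory(9816, 2):
        print(entry)
```

On a machine with two NUMA nodes, this would yield 4908Mi on each node instead of 9816Mi on node 0.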
Additional info:
This is impacting Mavenir's deployment automation, because it requires them to manually set reserved memory and distribute it among the NUMA nodes on each cluster node. They are also impacted by OCPBUGS-51105, so there is no automated way to do this, increasing their deployment and maintenance burden. Mavenir has filed several support cases in the past that were related to a suboptimal memory reservation configuration on their nodes. Simplifying this configuration would reduce the support effort. We are currently working with them on a blueprint based on OCP 4.18 for their future RAN and Core deployments, and while there is a workaround with manual configuration, this bug is seen as a severe annoyance.