-
Feature Request
-
Resolution: Unresolved
1. Proposed title of this feature request
Add 'numactl' RPM to CoreOS image for better troubleshooting
2. What is the nature and description of the request?
One of our customers runs at least 30 clusters that I'm aware of, all bare-metal (air-gapped) on multi-processor systems with 512 GB of RAM and hugepages enabled for 25% of the system RAM. Much of the performance tuning, as well as the post-outage RCA work, requires us to examine fine-grained details before we can make recommendations.
3. Why does the customer need this? (List the business requirements here)
Having the 'numactl' RPM added to the CoreOS image would help us troubleshoot and get to a resolution quicker.
4. List any affected packages or components.
The dependencies of the RHEL 8.4 'numactl' package appear to be already satisfied by CoreOS on OCP 4.6:
[root@hyp2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.4 (Ootpa)
[root@hyp2 ~]# dnf whatprovides numactl
numactl-2.0.12-11.el8.x86_64 : Library for tuning for Non Uniform Memory Access machines
Repo : rhel-8-for-x86_64-baseos-rpms
Matched from:
Provide : numactl = 2.0.12-11.el8
[root@hyp2 ~]# dnf deplist numactl-2.0.12-11.el8.x86_64
package: numactl-2.0.12-11.el8.x86_64
dependency: /sbin/ldconfig
provider: glibc-2.28-151.el8.i686
provider: glibc-2.28-151.el8.x86_64
dependency: libc.so.6()(64bit)
provider: glibc-2.28-151.el8.x86_64
dependency: libc.so.6(GLIBC_2.14)(64bit)
provider: glibc-2.28-151.el8.x86_64
dependency: libc.so.6(GLIBC_2.17)(64bit)
provider: glibc-2.28-151.el8.x86_64
dependency: libc.so.6(GLIBC_2.2.5)(64bit)
provider: glibc-2.28-151.el8.x86_64
dependency: libc.so.6(GLIBC_2.3)(64bit)
provider: glibc-2.28-151.el8.x86_64
dependency: libc.so.6(GLIBC_2.3.4)(64bit)
provider: glibc-2.28-151.el8.x86_64
dependency: libc.so.6(GLIBC_2.4)(64bit)
provider: glibc-2.28-151.el8.x86_64
dependency: libm.so.6()(64bit)
provider: glibc-2.28-151.el8.x86_64
dependency: libnuma.so.1()(64bit)
provider: numactl-libs-2.0.12-11.el8.x86_64
dependency: libnuma.so.1(libnuma_1.1)(64bit)
provider: numactl-libs-2.0.12-11.el8.x86_64
dependency: libnuma.so.1(libnuma_1.2)(64bit)
provider: numactl-libs-2.0.12-11.el8.x86_64
dependency: libnuma.so.1(libnuma_1.3)(64bit)
provider: numactl-libs-2.0.12-11.el8.x86_64
dependency: libnuma.so.1(libnuma_1.4)(64bit)
provider: numactl-libs-2.0.12-11.el8.x86_64
dependency: librt.so.1()(64bit)
provider: glibc-2.28-151.el8.x86_64
dependency: rtld(GNU_HASH)
provider: glibc-2.28-151.el8.i686
provider: glibc-2.28-151.el8.x86_64
[root@hyp2 ~]# dnf download numactl
numactl-2.0.12-11.el8.x86_64.rpm
[root@hyp2 ~]# scp numactl-2.0.12-11.el8.x86_64.rpm core@192.168.0.33:
numactl-2.0.12-11.el8.x86_64.rpm 100% 76KB 19.1MB/s 00:00
[root@master-2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux CoreOS release 4.6
[root@master-2 ~]# mount -o remount,rw /usr/
[root@master-2 ~]# rpm -i /home/core/numactl-2.0.12-11.el8.x86_64.rpm
[root@master-2 ~]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 7971 MB
node 0 free: 293 MB
node 1 cpus: 2 3
node 1 size: 8062 MB
node 1 free: 105 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
=======================
I have a write-up in GitLab that explains how we currently work around this:
### Why do we need this?
One thing I see overlooked a lot on multi-processor systems is the amount of free RAM per NUMA node. A process can be spawned on a particular NUMA node, and it's not uncommon for that process (MySQL, for example) to balloon in memory usage on that single node over time. If that NUMA node starts running out of free RAM, it's possible the process will be OOM-killed...
...When that happens, running 'free -m' in the lead-up to the event might show a lot of available RAM, but that is deceiving if most of the free RAM belongs to a NUMA node other than the one where the OOM-killed process lived. While we may not know for sure where that process lived, we might be able to gather some tell-tale signs that point to memory exhaustion in the near future.
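One way to look for such tell-tale signs is to compare each node's free memory with its total. The following is a minimal sketch, not part of the original write-up. It assumes the per-node `meminfo` files under `/sys/devices/system/node/` use the usual `Node N MemTotal: ... kB` layout. The function name `numa_low_free`, its two arguments, and the 10% default threshold are hypothetical choices made for illustration.

```shell
# Flag NUMA nodes whose free memory is below a percentage of their total.
# numa_low_free, its arguments, and the 10% default are hypothetical.
numa_low_free() {
    threshold_pct="${1:-10}"
    node_dir="${2:-/sys/devices/system/node}"
    for meminfo in "$node_dir"/node[0-9]*/meminfo; do
        [ -r "$meminfo" ] || continue
        # Assumed line layout: "Node 0 MemTotal:  8162304 kB"
        awk -v pct="$threshold_pct" '
            $3 == "MemTotal:" { total = $4; node = $2 }
            $3 == "MemFree:"  { free = $4 }
            END {
                if (total > 0) {
                    free_pct = 100 * free / total
                    status = (free_pct < pct) ? "LOW" : "ok"
                    printf "node %s: %d MB free of %d MB (%.1f%%) %s\n",
                        node, free / 1024, total / 1024, free_pct, status
                }
            }' "$meminfo"
    done
}
```

Calling `numa_low_free 10` on a node prints one line per NUMA node. A node marked LOW next to a node with plenty of free memory is the kind of imbalance that 'free -m' alone would hide.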
### Here's how to calculate that (skip to step 7 if you don't want the explanation):
1. The default memory page size is '4096' bytes or 4 kB:
[root@master-2 ~]# getconf PAGESIZE
4096
2. Total number of cores on my system:
[root@master-2 ~]# grep processor /proc/cpuinfo
processor : 0
processor : 1
processor : 2
processor : 3
3. Total number of NUMA nodes on my system:
[root@master-2 ~]# ls /sys/devices/system/node/ | grep node
node0
node1
4. Using that logic, I checked how much 'free' RAM is in the 'free' output and checked the 'nr_free_pages' on my lab with fake NUMA:
[root@master-2 ~]# echo 'Free RAM (kB)': `free | awk '/Mem/ {print $4}'`; echo 'Free Pages (NUMA Node 0):' `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node0/vmstat`; echo 'Free Pages (NUMA Node 1):' `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node1/vmstat`
Free RAM (kB): 9819444
Free Pages (NUMA Node 0): 1271037
Free Pages (NUMA Node 1): 1183757
5. Knowing that, I wanted something more readable, so I converted the values to MB and combined the free RAM of the two NUMA nodes to compare it with the 'free' output:
[root@master-2 ~]# echo 'Free RAM (MB):' `free -m | awk '/Mem/ {print $4}'`; echo 'Free RAM (NUMA combined):' $(expr $(expr $(expr `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node0/vmstat` + `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node1/vmstat`) * 4) / 1024)
Free RAM (MB): 8446
Free RAM (NUMA combined): 8445
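6. Plugging in the numbers from step 4, the math checks out to within a small margin:
( 9819444 / 4 ) ≈ (1271037 + 1183757)
or
2454861 free pages as seen by the 'free' command ≈ 2454794 free pages across both NUMA nodes combined
The difference of 67 pages (268 kB) is most likely because the counters are read at slightly different moments.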
7. Now that we know how many Cores/NUMA Nodes and how much total Free RAM there is, we can use some math to calculate how much Free RAM there is per NUMA node.
[root@master-2 ~]# echo 'Free RAM (MB):' `free -m | awk '/Mem/ {print $4}'`; echo 'Free RAM (MB) in NUMA 0:' $(expr $(expr `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node0/vmstat` * 4) / 1024); echo 'Free RAM (MB) in NUMA 1:' $(expr $(expr `awk '/nr_free_pages/ {print $2}' /sys/devices/system/node/node1/vmstat` * 4) / 1024)
Free RAM (MB): 9433
Free RAM (MB) in NUMA 0: 4672
Free RAM (MB) in NUMA 1: 4766
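The one-liners above hard-code two NUMA nodes and a 4 kB page size. As a rough sketch of how the same calculation could be generalized, the function below loops over every `nodeN/vmstat` file it finds and takes the page size from `getconf PAGESIZE`. The function name `numa_free_mb`, its optional directory argument, and the `PAGE_KB` override are hypothetical additions for illustration, not something the original write-up uses.

```shell
# Print free RAM per NUMA node in MB, derived from nr_free_pages.
# numa_free_mb, its directory argument, and PAGE_KB are hypothetical.
numa_free_mb() {
    node_dir="${1:-/sys/devices/system/node}"
    # Page size in kB (4 on the systems shown above).
    page_kb="${PAGE_KB:-$(( $(getconf PAGESIZE) / 1024 ))}"
    total=0
    for vmstat in "$node_dir"/node[0-9]*/vmstat; do
        [ -r "$vmstat" ] || continue
        # Extract the node number from ".../nodeN/vmstat".
        node="${vmstat%/vmstat}"
        node="${node##*/node}"
        pages=$(awk '/^nr_free_pages / {print $2}' "$vmstat")
        mb=$(( ${pages:-0} * page_kb / 1024 ))
        total=$(( total + mb ))
        echo "Free RAM (MB) in NUMA $node: $mb"
    done
    echo "Free RAM (MB), all NUMA nodes: $total"
}
```

Running `numa_free_mb` with no arguments reads the live counters. Because each node's value is rounded down before it is summed, the combined figure can differ from 'free -m' by a megabyte or so, as in step 5.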