-
Bug
-
Resolution: Done-Errata
-
Blocker
-
rhos-18.0.0
-
None
-
1
-
False
-
-
False
-
?
-
No Docs Impact
-
OSPRH-811 - Red Hat OpenStack 18.0 Greenfield Deployment
-
openstack-nova-27.5.1-18.0.20240830154702.3e75b4f.el9osttrunk
-
?
-
?
-
None
-
-
Bug Fix
-
Done
-
-
-
Moderate
When CPU power management is enabled, the socket ID reported for a CPU can be 0 even though the CPU actually belongs to another socket. For example, here is the output of 'virsh capabilities' on a multi-socket host, where CPUs located on socket 1 instead report a socket ID of 0.
<cell id='3'>
  <memory unit='KiB'>66049368</memory>
  <pages unit='KiB' size='4'>3929430</pages>
  <pages unit='KiB' size='2048'>0</pages>
  <pages unit='KiB' size='1048576'>48</pages>
  <distances>
    <sibling id='0' value='32'/>
    <sibling id='1' value='32'/>
    <sibling id='2' value='12'/>
    <sibling id='3' value='10'/>
  </distances>
  <cpus num='24'>
    <cpu id='36' socket_id='1' die_id='1' cluster_id='65535' core_id='16' siblings='36,84'/>
    <cpu id='37' socket_id='1' die_id='1' cluster_id='65535' core_id='17' siblings='37,85'/>
    <cpu id='38' socket_id='1' die_id='1' cluster_id='65535' core_id='18' siblings='38,86'/>
    <cpu id='39' socket_id='1' die_id='1' cluster_id='65535' core_id='20' siblings='39,87'/>
    <cpu id='40' socket_id='1' die_id='1' cluster_id='65535' core_id='21' siblings='40,88'/>
    <cpu id='41' socket_id='0' die_id='0' cluster_id='0' core_id='0' siblings='41'/>
    <cpu id='42' socket_id='0' die_id='0' cluster_id='0' core_id='0' siblings='42'/>
    <cpu id='43' socket_id='0' die_id='0' cluster_id='0' core_id='0' siblings='43'/>
    <cpu id='44' socket_id='0' die_id='0' cluster_id='0' core_id='0' siblings='44'/>
    <cpu id='45' socket_id='0' die_id='0' cluster_id='0' core_id='0' siblings='45'/>
    <cpu id='46' socket_id='0' die_id='0' cluster_id='0' core_id='0' siblings='46'/>
    <cpu id='47' socket_id='0' die_id='0' cluster_id='0' core_id='0' siblings='47'/>
    <cpu id='84' socket_id='1' die_id='1' cluster_id='65535' core_id='16' siblings='36,84'/>
    <cpu id='85' socket_id='1' die_id='1' cluster_id='65535' core_id='17' siblings='37,85'/>
    <cpu id='86' socket_id='1' die_id='1' cluster_id='65535' core_id='18' siblings='38,86'/>
    <cpu id='87' socket_id='1' die_id='1' cluster_id='65535' core_id='20' siblings='39,87'/>
    <cpu id='88' socket_id='1' die_id='1' cluster_id='65535' core_id='21' siblings='40,88'/>
    <cpu id='89' socket_id='1' die_id='1' cluster_id='65535' core_id='22' siblings='89'/>
    <cpu id='90' socket_id='1' die_id='1' cluster_id='65535' core_id='24' siblings='90'/>
    <cpu id='91' socket_id='1' die_id='1' cluster_id='65535' core_id='25' siblings='91'/>
    <cpu id='92' socket_id='1' die_id='1' cluster_id='65535' core_id='26' siblings='92'/>
    <cpu id='93' socket_id='1' die_id='1' cluster_id='65535' core_id='28' siblings='93'/>
    <cpu id='94' socket_id='1' die_id='1' cluster_id='65535' core_id='29' siblings='94'/>
    <cpu id='95' socket_id='1' die_id='1' cluster_id='65535' core_id='30' siblings='95'/>
  </cpus>
</cell>
The offlined cpus in the above example (41-47) are all reporting a socket id of 0 instead of 1.
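A check for this mismatch can be scripted. The following is a minimal Python sketch, not part of Nova or libvirt, that parses 'virsh capabilities'-style XML and flags CPUs whose socket_id differs from the most common socket_id in their NUMA cell. The function name find_socket_outliers, the shortened sample XML, and the "majority wins" heuristic are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_socket_outliers(capabilities_xml):
    """Map NUMA cell id -> sorted CPU ids whose socket_id differs from the cell majority.

    Assumption: the socket_id reported by most CPUs in a cell is the correct one.
    """
    root = ET.fromstring(capabilities_xml)
    outliers = {}
    # iter() also yields the root itself, so a bare <cell> snippet works too.
    for cell in root.iter("cell"):
        cpus = [
            (int(cpu.get("id")), int(cpu.get("socket_id")))
            for cpu in cell.iter("cpu")
            if cpu.get("socket_id") is not None
        ]
        if not cpus:
            continue
        majority_socket, _ = Counter(sock for _, sock in cpus).most_common(1)[0]
        suspicious = sorted(cpu_id for cpu_id, sock in cpus if sock != majority_socket)
        if suspicious:
            outliers[int(cell.get("id"))] = suspicious
    return outliers


# Shortened, hypothetical sample modelled on the cell shown above.
SAMPLE = """
<cell id='3'>
  <cpus num='5'>
    <cpu id='40' socket_id='1' die_id='1' core_id='21' siblings='40,88'/>
    <cpu id='41' socket_id='0' die_id='0' core_id='0' siblings='41'/>
    <cpu id='42' socket_id='0' die_id='0' core_id='0' siblings='42'/>
    <cpu id='88' socket_id='1' die_id='1' core_id='21' siblings='40,88'/>
    <cpu id='89' socket_id='1' die_id='1' core_id='22' siblings='89'/>
  </cpus>
</cell>
"""

if __name__ == "__main__":
    print(find_socket_outliers(SAMPLE))
```

On a real host the input would be the captured output of 'virsh capabilities'. The heuristic would mislead if more than half of a cell's CPUs were offlined.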
[root@edpm-compute-0 cloud-admin]# cat /sys/bus/cpu/devices/cpu42/online
0
Unfortunately, checking the CPU topology in sysfs before and after offlining a CPU does not help: once a CPU is offline, its topology entries are no longer listed, e.g.
[root@edpm-compute-0 cloud-admin]# cat /sys/bus/cpu/devices/cpu42/online
1
[root@edpm-compute-0 cloud-admin]# grep . /sys/bus/cpu/devices/cpu*/topology/*
....REMOVED FOR BREVITY.....
/sys/bus/cpu/devices/cpu41/topology/cluster_cpus:02000000,00000200,00000000
/sys/bus/cpu/devices/cpu41/topology/cluster_cpus_list:41,89
/sys/bus/cpu/devices/cpu41/topology/cluster_id:65535
/sys/bus/cpu/devices/cpu41/topology/core_cpus:02000000,00000200,00000000
/sys/bus/cpu/devices/cpu41/topology/core_cpus_list:41,89
/sys/bus/cpu/devices/cpu41/topology/core_id:22
/sys/bus/cpu/devices/cpu41/topology/core_siblings:ffffff00,0000ffff,ff000000
/sys/bus/cpu/devices/cpu41/topology/core_siblings_list:24-47,72-95
/sys/bus/cpu/devices/cpu41/topology/die_cpus:ffffff00,0000ffff,ff000000
/sys/bus/cpu/devices/cpu41/topology/die_cpus_list:24-47,72-95
/sys/bus/cpu/devices/cpu41/topology/die_id:1
/sys/bus/cpu/devices/cpu41/topology/package_cpus:ffffff00,0000ffff,ff000000
/sys/bus/cpu/devices/cpu41/topology/package_cpus_list:24-47,72-95
/sys/bus/cpu/devices/cpu41/topology/physical_package_id:1
/sys/bus/cpu/devices/cpu41/topology/ppin:0x2b55f59dfa78030
/sys/bus/cpu/devices/cpu41/topology/thread_siblings:02000000,00000200,00000000
/sys/bus/cpu/devices/cpu41/topology/thread_siblings_list:41,89
/sys/bus/cpu/devices/cpu42/topology/cluster_cpus:04000000,00000400,00000000
/sys/bus/cpu/devices/cpu42/topology/cluster_cpus_list:42,90
/sys/bus/cpu/devices/cpu42/topology/cluster_id:65535
/sys/bus/cpu/devices/cpu42/topology/core_cpus:04000000,00000400,00000000
/sys/bus/cpu/devices/cpu42/topology/core_cpus_list:42,90
/sys/bus/cpu/devices/cpu42/topology/core_id:24
/sys/bus/cpu/devices/cpu42/topology/core_siblings:ffffff00,0000ffff,ff000000
/sys/bus/cpu/devices/cpu42/topology/core_siblings_list:24-47,72-95
/sys/bus/cpu/devices/cpu42/topology/die_cpus:ffffff00,0000ffff,ff000000
/sys/bus/cpu/devices/cpu42/topology/die_cpus_list:24-47,72-95
/sys/bus/cpu/devices/cpu42/topology/die_id:1
/sys/bus/cpu/devices/cpu42/topology/package_cpus:ffffff00,0000ffff,ff000000
/sys/bus/cpu/devices/cpu42/topology/package_cpus_list:24-47,72-95
/sys/bus/cpu/devices/cpu42/topology/physical_package_id:1
/sys/bus/cpu/devices/cpu42/topology/ppin:0x2b55f59dfa78030
/sys/bus/cpu/devices/cpu42/topology/thread_siblings:04000000,00000400,00000000
/sys/bus/cpu/devices/cpu42/topology/thread_siblings_list:42,90
/sys/bus/cpu/devices/cpu43/topology/cluster_cpus:08000000,00000800,00000000
/sys/bus/cpu/devices/cpu43/topology/cluster_cpus_list:43,91
/sys/bus/cpu/devices/cpu43/topology/cluster_id:65535
/sys/bus/cpu/devices/cpu43/topology/core_cpus:08000000,00000800,00000000
/sys/bus/cpu/devices/cpu43/topology/core_cpus_list:43,91
/sys/bus/cpu/devices/cpu43/topology/core_id:25
/sys/bus/cpu/devices/cpu43/topology/core_siblings:ffffff00,0000ffff,ff000000
/sys/bus/cpu/devices/cpu43/topology/core_siblings_list:24-47,72-95
/sys/bus/cpu/devices/cpu43/topology/die_cpus:ffffff00,0000ffff,ff000000
/sys/bus/cpu/devices/cpu43/topology/die_cpus_list:24-47,72-95
/sys/bus/cpu/devices/cpu43/topology/die_id:1
/sys/bus/cpu/devices/cpu43/topology/package_cpus:ffffff00,0000ffff,ff000000
/sys/bus/cpu/devices/cpu43/topology/package_cpus_list:24-47,72-95
/sys/bus/cpu/devices/cpu43/topology/physical_package_id:1
/sys/bus/cpu/devices/cpu43/topology/ppin:0x2b55f59dfa78030
/sys/bus/cpu/devices/cpu43/topology/thread_siblings:08000000,00000800,00000000
/sys/bus/cpu/devices/cpu43/topology/thread_siblings_list:43,91
echo 0 > /sys/bus/cpu/devices/cpu42/online
grep . /sys/bus/cpu/devices/cpu*/topology/*
.....REMOVED FOR BREVITY.....
/sys/bus/cpu/devices/cpu41/topology/cluster_cpus:02000000,00000200,00000000
/sys/bus/cpu/devices/cpu41/topology/cluster_cpus_list:41,89
/sys/bus/cpu/devices/cpu41/topology/cluster_id:65535
/sys/bus/cpu/devices/cpu41/topology/core_cpus:02000000,00000200,00000000
/sys/bus/cpu/devices/cpu41/topology/core_cpus_list:41,89
/sys/bus/cpu/devices/cpu41/topology/core_id:22
/sys/bus/cpu/devices/cpu41/topology/core_siblings:ffffff00,0000fbff,ff000000
/sys/bus/cpu/devices/cpu41/topology/core_siblings_list:24-41,43-47,72-95
/sys/bus/cpu/devices/cpu41/topology/die_cpus:ffffff00,0000fbff,ff000000
/sys/bus/cpu/devices/cpu41/topology/die_cpus_list:24-41,43-47,72-95
/sys/bus/cpu/devices/cpu41/topology/die_id:1
/sys/bus/cpu/devices/cpu41/topology/package_cpus:ffffff00,0000fbff,ff000000
/sys/bus/cpu/devices/cpu41/topology/package_cpus_list:24-41,43-47,72-95
/sys/bus/cpu/devices/cpu41/topology/physical_package_id:1
/sys/bus/cpu/devices/cpu41/topology/ppin:0x2b55f59dfa78030
/sys/bus/cpu/devices/cpu41/topology/thread_siblings:02000000,00000200,00000000
/sys/bus/cpu/devices/cpu41/topology/thread_siblings_list:41,89
/sys/bus/cpu/devices/cpu43/topology/cluster_cpus:08000000,00000800,00000000
/sys/bus/cpu/devices/cpu43/topology/cluster_cpus_list:43,91
/sys/bus/cpu/devices/cpu43/topology/cluster_id:65535
/sys/bus/cpu/devices/cpu43/topology/core_cpus:08000000,00000800,00000000
/sys/bus/cpu/devices/cpu43/topology/core_cpus_list:43,91
/sys/bus/cpu/devices/cpu43/topology/core_id:25
/sys/bus/cpu/devices/cpu43/topology/core_siblings:ffffff00,0000fbff,ff000000
/sys/bus/cpu/devices/cpu43/topology/core_siblings_list:24-41,43-47,72-95
/sys/bus/cpu/devices/cpu43/topology/die_cpus:ffffff00,0000fbff,ff000000
/sys/bus/cpu/devices/cpu43/topology/die_cpus_list:24-41,43-47,72-95
/sys/bus/cpu/devices/cpu43/topology/die_id:1
/sys/bus/cpu/devices/cpu43/topology/package_cpus:ffffff00,0000fbff,ff000000
/sys/bus/cpu/devices/cpu43/topology/package_cpus_list:24-41,43-47,72-95
/sys/bus/cpu/devices/cpu43/topology/physical_package_id:1
/sys/bus/cpu/devices/cpu43/topology/ppin:0x2b55f59dfa78030
/sys/bus/cpu/devices/cpu43/topology/thread_siblings:08000000,00000800,00000000
/sys/bus/cpu/devices/cpu43/topology/thread_siblings_list:43,91
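The same before/after comparison can be scripted rather than read from grep output. The sketch below is a rough Python illustration: it walks a sysfs-style directory and records, per CPU, whether it is online and which physical_package_id it exposes, returning None when the topology entry is absent, as it is for cpu42 above. The function name read_cpu_state and the treatment of a missing 'online' file as "online" are assumptions for this example, not behaviour taken from Nova or libvirt.

```python
from pathlib import Path


def read_cpu_state(sysfs_root="/sys/bus/cpu/devices"):
    """Return {cpu_id: (online, physical_package_id or None)} for each cpuN directory."""
    state = {}
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        cpu_id = int(cpu_dir.name[3:])  # strip the "cpu" prefix
        online_file = cpu_dir / "online"
        # Assumption: a CPU without an `online` file cannot be offlined,
        # so it is treated as online here.
        if online_file.exists():
            online = online_file.read_text().strip() == "1"
        else:
            online = True
        pkg_file = cpu_dir / "topology" / "physical_package_id"
        # The topology entry may be missing while the CPU is offline.
        package_id = int(pkg_file.read_text().strip()) if pkg_file.exists() else None
        state[cpu_id] = (online, package_id)
    return state


if __name__ == "__main__":
    for cpu_id, (online, package_id) in sorted(read_cpu_state().items()):
        print(f"cpu{cpu_id}: online={online} physical_package_id={package_id}")
```

Running it once before and once after offlining a CPU, and diffing the two results, gives the same picture as the listings above in a more compact form.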
In this state, Nova is unable to accurately read socket information.
2024-07-16 17:01:44.544 1 DEBUG nova.pci.stats [None req-e8e48005-14a0-4d63-8837-1fe477b817e8 f8fffcb972e24b40b588e9d14b76f1b6 7c9ac4ef27f14c9bab273e577ec9d47a - - default default] No socket information in host NUMA cell(s). _filter_pools_for_socket_affinity /usr/lib/python3.9/site-packages/nova/pci/stats.py:474
Confirmed workaround: disabling power management and bringing all CPUs online avoids the issue.
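For the "all CPUs online" half of the workaround, a rough Python sketch follows. It writes 1 to the 'online' file of every CPU that currently reports 0, which mirrors the 'echo 0 > .../online' step shown earlier in reverse. The function name online_all_cpus is hypothetical, the script needs root privileges on a real host, and it does not touch the Nova side of the workaround. Disabling power management in Nova is a separate configuration change (in upstream Nova this is assumed to be the [libvirt] cpu_power_management option); otherwise Nova may offline the cores again.

```python
from pathlib import Path


def online_all_cpus(sysfs_root="/sys/bus/cpu/devices"):
    """Write 1 to the `online` file of every offline CPU; return the CPU ids changed."""
    changed = []
    for online_file in Path(sysfs_root).glob("cpu[0-9]*/online"):
        if online_file.read_text().strip() == "0":
            online_file.write_text("1")
            changed.append(int(online_file.parent.name[3:]))  # strip "cpu" prefix
    return sorted(changed)


if __name__ == "__main__":
    # Requires root on a real host; prints the CPUs that were brought online.
    print(online_all_cpus())
```

CPUs that have no 'online' file are skipped, since the glob only matches existing files.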
- blocks
-
OSPRH-83 Offlining unused CPU cores for better power management
- In Progress
- links to
-
RHBA-2024:139296 Release of components for RHOSO 18.0.2
- mentioned on