-
Bug
-
Resolution: Done-Errata
-
Undefined
-
rhel-9.4
-
None
-
kernel-5.14.0-529.el9
-
No
-
Low
-
rhel-net-core
-
ssg_networking
-
12
-
14
-
3
-
False
-
False
-
-
Yes
-
Red Hat OpenShift Container Platform
-
None
-
Pass
-
-
RegressionOnly
-
Bug Fix
-
-
Done
-
-
x86_64
-
None
What were you trying to do that didn't work?
On OCP 4.16.15, running the kernel version shown below:
sh-5.1# uname -a
Linux host.example.com 5.14.0-427.37.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 13 12:41:50 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
When the RPS mask is set to the reserved CPUs on an AMD EPYC 9754 128-Core Processor (512 CPUs), neither sysctl -a nor /proc/sys/net/core/rps_default_mask displays all the bits:
sh-5.1# cat /proc/sys/net/core/rps_default_mask
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003,00000000,00000000,00000000,00000000,00000000,00000000,
The mask ends with a trailing comma and nothing follows it.
The same happens with sysctl -a:
sh-5.1# sysctl -a | grep rps
net.core.rps_default_mask = 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003,00000000,00000000,00000000,00000000,00000000,00000000,
net.core.rps_sock_flow_entries = 0
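The truncation can be checked mechanically. The sketch below is illustrative and not part of the original report: the helper names are hypothetical, and it assumes the kernel prints the mask as comma-separated 32-bit hexadecimal groups covering all 512 possible CPUs.

```python
def expected_words(nr_cpus):
    """Number of 32-bit groups needed to print a bitmap of nr_cpus bits."""
    return (nr_cpus + 31) // 32


def looks_truncated(mask, nr_cpus):
    """Return True if the mask ends in a stray comma or has the wrong group count."""
    text = mask.strip()
    if text.endswith(","):
        return True
    return len(text.split(",")) != expected_words(nr_cpus)


# Mask exactly as reported by /proc/sys/net/core/rps_default_mask above.
observed = (
    "00000000,00000000,00000000,00000000,00000000,00000000,00000000,"
    "00000003,00000000,00000000,00000000,00000000,00000000,00000000,"
)

# Count only the non-empty groups that were actually printed.
printed = [w for w in observed.split(",") if w]
print(len(printed), "of", expected_words(512), "groups printed")
print("truncated:", looks_truncated(observed, 512))
```

With the output above, only 14 of the 16 expected groups are present, so the two lowest groups (CPUs 0-63) are missing from the display.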
What is the impact of this issue to you?
On latency-sensitive Telco deployments, the RPS mask should be the CPU mask equivalent to the reserved CPUs. It is set by the Performance Addon Operator (part of the Node Tuning Operator).
Because of the above issue, the RPS mask is displayed incorrectly.
Please provide the package NVR for which the bug is seen:
sh-5.1# uname -a
Linux host.example.com 5.14.0-427.37.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 13 12:41:50 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@ocp-installer ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.15   True        False         5h44m   Cluster version is 4.16.15
How reproducible is this bug?:
Apply the performance profile below on an OCP 4.16.15 cluster running on an AMD EPYC 9754 128-Core Processor (512 CPUs):
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  cpu:
    isolated: 2-255,258-511
    reserved: 0-1,256-257
  machineConfigPoolSelector:
    machineconfiguration.openshift.io/role: worker-cnf
  net:
    userLevelNetworking: true
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numa:
    topologyPolicy: single-numa-node
  realTimeKernel:
    enabled: false
  workloadHints:
    highPowerConsumption: true
    perPodPowerManagement: false
    realTime: true
The above profile applies the following kernel parameters:
sh-5.1# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/boot/ostree/rhcos-72bef8c60def51d9d95d8c92584bf2c193cb7236f77881b3fb3c5fe15226f238/vmlinuz-5.14.0-427.37.1.el9_4.x86_64 rw ostree=/ostree/boot.0/rhcos/72bef8c60def51d9d95d8c92584bf2c193cb7236f77881b3fb3c5fe15226f238/0 ignition.platform.id=metal ip=dhcp root=UUID=6f52e30e-e608-4934-a387-dd556f32301e rw rootflags=prjquota boot=UUID=d2bd93f7-09a7-41e1-8a55-b5db882a9f6a systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=0 skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-191,194-383 tuned.non_isolcpus=00000003,00000000,00000000,00000000,00000000,00000000,00000003 systemd.cpu_affinity=0,1,193,192 intel_iommu=on iommu=pt isolcpus=managed_irq,2-191,194-383 nohz_full=2-191,194-383 tsc=reliable nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11 processor.max_cstate=1 intel_idle.max_cstate=0 idle=poll intel_pstate=disable
Steps to reproduce
- Apply the performance profile shown above on the OCP cluster.
- Log in to the worker node and check the RPS mask.
Expected results:
The RPS mask should display all of the reserved CPUs.
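For comparison, the full mask expected for this profile can be derived from the reserved CPU list. This is a minimal sketch, not taken from the report: the function names are hypothetical, and it assumes 512 possible CPUs and the kernel's comma-separated 32-bit hexadecimal bitmap format, most significant group first.

```python
def parse_cpu_list(cpu_list):
    """Expand a CPU list such as '0-1,256-257' into a set of CPU numbers."""
    cpus = set()
    for part in cpu_list.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpu_list_to_mask(cpu_list, nr_cpus):
    """Render a CPU list as comma-separated 32-bit hex groups, highest group first."""
    value = 0
    for cpu in parse_cpu_list(cpu_list):
        value |= 1 << cpu
    n_words = (nr_cpus + 31) // 32
    words = [(value >> (32 * i)) & 0xFFFFFFFF for i in range(n_words)]
    return ",".join(f"{w:08x}" for w in reversed(words))


# Reserved CPUs from the performance profile in this report.
print(cpu_list_to_mask("0-1,256-257", 512))
```

Under these assumptions the result has 16 groups, with 00000003 in the eighth group (CPUs 256-257) and again in the last group (CPUs 0-1). The last group is one of the two missing from the observed output.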
Actual results
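The RPS mask is not displayed in full: the output is truncated after roughly 127 characters (14 of the 16 expected 32-bit groups) and ends with a trailing comma.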
- links to
-
RHSA-2024:138410 kernel bug fix and enhancement update