RHEL-61203

kernel cannot represent rps_default_mask on AMD system with 512 cpus


    • Bug
    • Resolution: Done-Errata
    • Undefined
    • rhel-9.6
    • rhel-9.4
    • kernel / Networking
    • None
    • kernel-5.14.0-529.el9
    • No
    • Low
    • rhel-net-core
    • ssg_networking
    • 12
    • 14
    • 3
    • False
    • False
    • None
    • Yes
    • Red Hat OpenShift Container Platform
    • None
    • Bug Fix
    • .RHEL displays the correct number of CPUs on a system of 512 CPUs

      The `rps_default_mask` configuration setting controls the default Receive Packet Steering (`rps`) mechanism to direct incoming network packets towards specific CPUs. The `flow_limit_cpu_bitmap` parameter enables or disables flow control per CPU. With this fix, RHEL displays total CPUs along with its parameter values on the console correctly.

    • Done
    • x86_64
    • None

      What were you trying to do that didn't work?

      On OCP 4.16.15 with the kernel version shown below:

      sh-5.1# uname -a
      Linux host.example.com 5.14.0-427.37.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 13 12:41:50 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux

       

      When the RPS mask is set to the reserved CPUs on an AMD EPYC 9754 128-Core Processor system (512 CPUs), neither sysctl -a nor /proc/sys/net/core/rps_default_mask displays all the bits:

      sh-5.1# cat /proc/sys/net/core/rps_default_mask 
      00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003,00000000,00000000,00000000,00000000,00000000,00000000,

       

      The mask ends with a trailing comma, and nothing follows it.

      The same happens with sysctl -a:

      sh-5.1# sysctl -a | grep rps
      net.core.rps_default_mask = 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003,00000000,00000000,00000000,00000000,00000000,00000000,
      net.core.rps_sock_flow_entries = 0

       

      What is the impact of this issue to you?

      On latency-sensitive Telco deployments, the RPS mask should be the CPU mask equivalent to the reserved CPUs. This is set by the Performance Addon Operator (part of the Node Tuning Operator).

      The above issue causes the RPS mask to be displayed incorrectly.
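
      To compare the sysctl value against the reserved CPU set, the mask string can be decoded back into CPU numbers. The following is a minimal Python sketch and is not part of the original report; the helper name `cpus_from_mask` is hypothetical, and the sample mask is the full 512-bit value one would expect for reserved CPUs 0-1,256-257.

```python
def cpus_from_mask(mask: str) -> set[int]:
    """Decode comma-separated 32-bit hex groups (most significant first)
    into the set of CPU numbers whose bits are set."""
    # Dropping the commas leaves one big hexadecimal number.
    value = int(mask.strip().replace(",", ""), 16)
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}


# Assumed complete mask for a 512-CPU system with CPUs 0-1 and 256-257 set.
full_mask = ",".join(
    ["00000000"] * 7 + ["00000003"] + ["00000000"] * 7 + ["00000003"]
)
reserved = {0, 1, 256, 257}

decoded = cpus_from_mask(full_mask)
print(sorted(decoded))      # [0, 1, 256, 257]
print(decoded == reserved)  # True
```

      A string that is cut short decodes to different CPU numbers, because every remaining group shifts toward the low end, so the check is only meaningful on a complete mask.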

      Please provide the package NVR for which the bug is seen:

      sh-5.1# uname -a
      Linux host.example.com 5.14.0-427.37.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Sep 13 12:41:50 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux

       

      [root@ocp-installer ~]# oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.16.15   True        False         5h44m   Cluster version is 4.16.15

       

      How reproducible is this bug?:

      Apply the performance profile below on an OCP 4.16.15 cluster running on an AMD EPYC 9754 128-Core Processor system (512 CPUs):

       

      apiVersion: performance.openshift.io/v2
      kind: PerformanceProfile
      metadata:
        name: performance
      spec:
        cpu:
          isolated: 2-255,258-511
          reserved: 0-1,256-257
        machineConfigPoolSelector:
          machineconfiguration.openshift.io/role: worker-cnf
        net:
          userLevelNetworking: true
        nodeSelector:
          node-role.kubernetes.io/worker-cnf: ""
        numa:
          topologyPolicy: single-numa-node
        realTimeKernel:
          enabled: false
        workloadHints:
          highPowerConsumption: true
          perPodPowerManagement: false
          realTime: true

       

      The above profile applies the following kernel parameters:

      sh-5.1# cat /proc/cmdline

      BOOT_IMAGE=(hd0,gpt3)/boot/ostree/rhcos-72bef8c60def51d9d95d8c92584bf2c193cb7236f77881b3fb3c5fe15226f238/vmlinuz-5.14.0-427.37.1.el9_4.x86_64 rw ostree=/ostree/boot.0/rhcos/72bef8c60def51d9d95d8c92584bf2c193cb7236f77881b3fb3c5fe15226f238/0 ignition.platform.id=metal ip=dhcp root=UUID=6f52e30e-e608-4934-a387-dd556f32301e rw rootflags=prjquota boot=UUID=d2bd93f7-09a7-41e1-8a55-b5db882a9f6a systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=0 skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-191,194-383 tuned.non_isolcpus=00000003,00000000,00000000,00000000,00000000,00000000,00000003 systemd.cpu_affinity=0,1,193,192 intel_iommu=on iommu=pt isolcpus=managed_irq,2-191,194-383 nohz_full=2-191,194-383 tsc=reliable nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11 processor.max_cstate=1 intel_idle.max_cstate=0 idle=poll intel_pstate=disable

       

      Steps to reproduce

      1.  Apply the performance profile shown above on the OCP cluster.
      2.  Log in to the worker node and check the RPS mask.

      Expected results:

      The RPS mask should display the bits for all of the reserved CPUs.
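
      For reference, the expected value can be derived from the reserved CPU list in the profile. The following is a minimal Python sketch under stated assumptions: the helper names `parse_cpu_list` and `format_cpumask` are hypothetical, and the formatting (comma-separated 32-bit hexadecimal groups, most significant group first) mirrors the output shown in this report for a CPU count that is a multiple of 32.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Expand a CPU list such as "0-1,256-257" into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpumask(cpus: set[int], nr_cpus: int) -> str:
    """Render CPUs as 32-bit hex groups, most significant group first.

    Assumption: every group is padded to 8 hex digits, which matches the
    report's output when nr_cpus is a multiple of 32.
    """
    value = 0
    for cpu in cpus:
        if not 0 <= cpu < nr_cpus:
            raise ValueError(f"CPU {cpu} is outside 0..{nr_cpus - 1}")
        value |= 1 << cpu
    groups = (nr_cpus + 31) // 32
    words = [(value >> (32 * i)) & 0xFFFFFFFF for i in range(groups)]
    return ",".join(f"{word:08x}" for word in reversed(words))


# Reserved CPUs from the performance profile, on a 512-CPU system.
expected = format_cpumask(parse_cpu_list("0-1,256-257"), 512)
print(expected)
print(len(expected.split(",")))  # 16 groups of 32 bits
```

      Under these assumptions the mask has 16 groups, with 00000003 in the eighth and the last position.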

      Actual results:

      The RPS mask is not displayed in full; it is truncated after 127 bits.
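
      A truncated mask like the one above can be detected by counting its groups. The following is a minimal Python sketch; the helper name `mask_is_complete` is hypothetical, and it assumes one 8-digit hexadecimal group per 32 CPUs, as in the output shown in this report.

```python
import string


def mask_is_complete(mask: str, nr_cpus: int) -> bool:
    """Return True if the mask has one well-formed 8-digit hex group
    for every 32 CPUs, and False if it is short or malformed."""
    groups = mask.strip().split(",")
    expected_groups = (nr_cpus + 31) // 32
    if len(groups) != expected_groups:
        return False
    return all(
        len(group) == 8 and all(ch in string.hexdigits for ch in group)
        for group in groups
    )


# Value reported on the affected kernel: 14 groups and a trailing comma.
observed = (
    "00000000,00000000,00000000,00000000,00000000,00000000,00000000,"
    "00000003,00000000,00000000,00000000,00000000,00000000,00000000,"
)
print(mask_is_complete(observed, 512))  # False
```

      The observed string holds 14 complete groups (448 bits) instead of the 16 that 512 CPUs require.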

              Antoine Tenart (atenart@redhat.com)
              Mallapadi Niranjan (mniranja)
              Antoine Tenart
              Jianwen Ji
              Mayur Patil