  1. OpenShift Bugs
  2. OCPBUGS-36431

RPS settings fail to apply on AMD Genoa System


    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • 4.14.z
    • Node Tuning Operator
    • None
    • No
    • CNF Compute Sprint 258, CNF Compute Sprint 259, CNF Compute Sprint 260, CNF Compute Sprint 261
    • 4
    • False
    • None
    • An internal bug caused the CPU masks used for interrupt and network-handling CPU affinity to be computed incorrectly on machines with more than 256 CPUs.

      This prevented proper CPU isolation on such machines and showed up as systemd unit failures when the node's status was inspected.

      The bug has been fixed.

      (Note for doc folks: a 4.18 known issue until https://github.com/openshift/cluster-node-tuning-operator/pull/1131 is merged, and a 4.17 known issue until the backports land)
    • Bug Fix
    • In Progress
    • 12/08/2024: No longer blocked; an issue in the code was masked by a hidden kernel assumption. Patch posted.
      29/07/2024: Blocked by RHEL-46240. This might be moved to 4.18.

      Description of problem:

          After installing OpenShift on a new AMD Genoa system and applying a performance profile, the systemd-sysctl service fails. Further investigation shows that the sysctl setting failing to apply is net.core.rps_default_mask.

      Version-Release number of selected component (if applicable):

          OpenShift 4.14.28

      How reproducible:

          Always

      Steps to Reproduce:

          1. Install OpenShift + CNF policies on an AMD Genoa system (512 logical CPUs)
          2. Log in and check the systemd-sysctl service

      Actual results:

         Logging in to the system after deployment shows:
      
      [systemd]
      Failed Units: 1
        systemd-sysctl.service
      
      Restarting the service fails:
      # systemctl restart systemd-sysctl.service
      Job for systemd-sysctl.service failed because the control process exited with error code.
      See "systemctl status systemd-sysctl.service" and "journalctl -xeu systemd-sysctl.service" for details.
      

      Expected results:

       sysctl settings are properly applied to the system

      Additional info:

      Configuration created via node tuning operator:
      [root@amd-genoa-02 ~]# cat /etc/sysctl.d/99-default-rps-mask.conf 
      # Apply the RPS mask on the virtual interfaces of the host by default, becasue
      # from the container perspective the RPS mask the will be consulted, is the one on the RX side of the veth in the host.
      # Consider the following diagram:
      # Pod A <veth1 - veth2> host <veth3 - veth4> Pod B
      #  veth2's RPS affinity is the one determining the CPUs that are handling the packet processing when sending data from Pod A to pod B.
      # Additional common scenarios:
      # 1. Pod A = sender, host = receiver
      #  The RPS affinity of the host side should be consulted (because it’s the receiver) and it should be set to cpus not sensitive to preemption (reserved pool).
      # 2. Pod A = receiver, host = sender
      #  In case of no RPS mask on the receiver side, the sender needs to pay the price and do all the processing on its cores.
      net.core.rps_default_mask = 30000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      
      Manually attempting to set the value:
      
      [root@amd-genoa-02 ~]# sysctl -w net.core.rps_default_mask=30000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      sysctl: setting key "net.core.rps_default_mask": Invalid argument
      
      Actual/default value:
      
      [root@amd-genoa-02 ~]# sysctl net.core.rps_default_mask
      net.core.rps_default_mask = 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
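
The masks above are hard to read by eye. A minimal sketch for decoding them follows. It assumes the usual kernel cpumask text format (comma-separated 32-bit hex groups, most significant group first). The function names are made up for illustration and are not part of the Node Tuning Operator or any other tool.

```python
from typing import List


def _groups(mask: str) -> List[str]:
    """Split a mask into its hex groups, tolerating a trailing comma."""
    return [g for g in mask.strip().split(",") if g]


def decode_cpu_mask(mask: str) -> List[int]:
    """Return the sorted CPU ids whose bits are set in the mask."""
    value = 0
    for group in _groups(mask):
        # Each group carries 32 bits; earlier groups are more significant.
        value = (value << 32) | int(group, 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def mask_width_bits(mask: str) -> int:
    """Return how many CPU bits the textual mask can represent."""
    return 32 * len(_groups(mask))


# The value written by 99-default-rps-mask.conf in this report.
configured = (
    "30000000,00000000,00000000,00000000,"
    "00000000,00000000,00000000,00000000"
)
print(decode_cpu_mask(configured))
print(mask_width_bits(configured))
```

For the configured value this yields CPUs 252 and 253 in a mask that is 256 bits wide. Whether those are the intended reserved CPUs depends on the performance profile, which is not included in this report.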
      
      [core@amd-genoa-02 ~]$ lscpu
      Architecture:            x86_64
        CPU op-mode(s):        32-bit, 64-bit
        Address sizes:         52 bits physical, 57 bits virtual
        Byte Order:            Little Endian
      CPU(s):                  512
        On-line CPU(s) list:   0-511
      Vendor ID:               AuthenticAMD
        Model name:            AMD EPYC 9754 128-Core Processor
          CPU family:          25
          Model:               160
          Thread(s) per core:  2
          Core(s) per socket:  128
          Socket(s):           2
          Stepping:            1
          Frequency boost:     enabled
          CPU max MHz:         3100.3411
          CPU min MHz:         1500.0000
          BogoMIPS:            4493.62
          Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
                               rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1
                               sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw
                               ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba perfmon_v2
                               ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt
                               clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16
                               clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists
                               pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq
                               avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d
      Virtualization features: 
        Virtualization:        AMD-V
      Caches (sum of all):     
        L1d:                   8 MiB (256 instances)
        L1i:                   8 MiB (256 instances)
        L2:                    256 MiB (256 instances)
        L3:                    512 MiB (32 instances)
      NUMA:                    
        NUMA node(s):          2
        NUMA node0 CPU(s):     0-127,256-383
        NUMA node1 CPU(s):     128-255,384-511
      Vulnerabilities:         
        Gather data sampling:  Not affected
        Itlb multihit:         Not affected
        L1tf:                  Not affected
        Mds:                   Not affected
        Meltdown:              Not affected
        Mmio stale data:       Not affected
        Retbleed:              Not affected
        Spec rstack overflow:  Mitigation; Safe RET
        Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
        Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
        Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
        Srbds:                 Not affected
        Tsx async abort:       Not affected

      QE Verification steps:

      1. Get a machine with 256+ CPUs
      2. Apply a performance profile
      3. Start a guaranteed pod
      4. Check /etc/sysconfig/irqbalance and make sure the mask there matches the right CPUs (the whole machine with the assigned CPUs removed)
      5. Make sure sysctl net.core.rps_default_mask shows the right mask for the reserved CPUs
      6. If using nohz_full (workload hint realtime=true), make sure the net.core.rps_default_mask and nohz_full masks do not overlap
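
Steps 5 and 6 can be checked with a small helper. The sketch below assumes the reserved and nohz_full CPUs are available as CPU-list strings such as "0-3,256-259". The CPU sets and function names are hypothetical, picked only to illustrate a 512-CPU machine; substitute the values from the actual performance profile and kernel command line.

```python
from typing import Set


def parse_cpu_list(spec: str) -> Set[int]:
    """Expand a CPU list such as "0-3,256-259" into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def encode_cpu_mask(cpus: Set[int], total_cpus: int) -> str:
    """Render CPU ids as 32-bit hex groups, most significant group first."""
    if any(cpu < 0 or cpu >= total_cpus for cpu in cpus):
        raise ValueError("CPU id outside 0..total_cpus-1")
    value = sum(1 << cpu for cpu in cpus)
    n_groups = (total_cpus + 31) // 32
    groups = [(value >> (32 * i)) & 0xFFFFFFFF for i in reversed(range(n_groups))]
    return ",".join(f"{g:08x}" for g in groups)


# Hypothetical sets for illustration only.
reserved = parse_cpu_list("0-1,256-257")
nohz_full = parse_cpu_list("2-255,258-511")

expected_mask = encode_cpu_mask(reserved, total_cpus=512)
overlap = reserved & nohz_full

print(expected_mask)
print(sorted(overlap))  # an empty list means the two sets do not overlap
```

The number of groups the kernel prints may differ from the one assumed here, so comparing decoded CPU sets is likely more robust than comparing mask strings character by character.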

            msivak@redhat.com Martin Sivak
            dcritch1@redhat.com David Critch
            Mallapadi Niranjan