OpenShift Bugs / OCPBUGS-39377

RPS settings fail to apply on AMD Genoa System


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Undefined
    • None
    • 4.14.z
    • Node Tuning Operator
    • None
    • No
    • CNF Compute Sprint 259, CNF Compute Sprint 260
    • 2
    • False
    • None
    • Previously, due to an internal bug, the Node Tuning Operator incorrectly computed CPU masks for interrupt and network handling CPU affinity if a machine had more than 256 CPUs. This prevented proper CPU isolation on those machines and resulted in `systemd` unit failures. With this release, the Node Tuning Operator computes the masks correctly.
      (link:https://issues.redhat.com/browse/OCPBUGS-39377[*OCPBUGS-39377*])
    • Bug Fix
    • Done
    • 17/09/2024: Waiting for QE verification of the 4.17 parent bug.
      12/08/2024: No longer blocked; an issue in the code had been masked by a hidden kernel assumption. Patch posted.
      29/07/2024: Blocked by RHEL-46240. This might be moved to 4.18.
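
      The release note attributes the failure to CPU masks being computed incorrectly on machines with more than 256 CPUs. As a rough illustration of the mask format involved (not the Node Tuning Operator's actual code), the Python sketch below builds a Linux-style cpumask: comma-separated 32-bit hex words, most significant word first, with enough words to give every CPU on the machine its own bit. The helper name format_cpumask and the example CPU set are hypothetical.

```python
# Hypothetical sketch, not the Node Tuning Operator's implementation.
# Assumes the Linux cpumask text format: 32-bit hex words, most significant first.

WORD_BITS = 32


def format_cpumask(cpus: set[int], nr_cpus: int) -> str:
    """Return a cpumask selecting `cpus`, sized for a machine with `nr_cpus` CPUs."""
    if any(cpu < 0 or cpu >= nr_cpus for cpu in cpus):
        raise ValueError(f"CPU id out of range for a {nr_cpus}-CPU machine")
    bits = 0
    for cpu in cpus:
        bits |= 1 << cpu
    # Round up so every CPU, including the highest-numbered one, gets a bit.
    nr_words = (nr_cpus + WORD_BITS - 1) // WORD_BITS
    words = [(bits >> (WORD_BITS * i)) & 0xFFFFFFFF for i in range(nr_words)]
    # words[0] holds CPUs 0-31; emit the most significant word first.
    return ",".join(f"{word:08x}" for word in reversed(words))


if __name__ == "__main__":
    # The CPU set {0, 1} is hypothetical, chosen only for illustration.
    mask = format_cpumask({0, 1}, 512)
    print(len(mask.split(",")))  # prints 16
    print(mask.split(",")[-1])   # prints 00000003
```

      On a 512-CPU host a full-width mask needs 16 words; a mask limited to 256 bits (8 words) has no way to refer to CPUs 256-511.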

      This is a clone of issue OCPBUGS-39164. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-36431. The following is the description of the original issue:

      Description of problem:

          After installing OpenShift on a new AMD Genoa system and applying a performance profile, the systemd-sysctl service fails. Further investigation shows that the setting that fails to apply is net.core.rps_default_mask.

      Version-Release number of selected component (if applicable):

          OpenShift 4.14.28

      How reproducible:

          Always

      Steps to Reproduce:

          1. Install OpenShift and the CNF policies on an AMD Genoa system (512 logical CPUs: 2 sockets × 128 cores × 2 threads).
          2. Log in and check the status of systemd-sysctl.service.
          

      Actual results:

         Logging in to the system after deployment shows:
      
      [systemd]
      Failed Units: 1
        systemd-sysctl.service
      
      Restarting the service fails:
      # systemctl restart systemd-sysctl.service
      Job for systemd-sysctl.service failed because the control process exited with error code.
      See "systemctl status systemd-sysctl.service" and "journalctl -xeu systemd-sysctl.service" for details.
      

      Expected results:

       All sysctl settings, including net.core.rps_default_mask, are applied without errors and systemd-sysctl.service starts successfully.

      Additional info:

      Configuration created by the Node Tuning Operator:
      [root@amd-genoa-02 ~]# cat /etc/sysctl.d/99-default-rps-mask.conf 
      # Apply the RPS mask on the virtual interfaces of the host by default, becasue
      # from the container perspective the RPS mask the will be consulted, is the one on the RX side of the veth in the host.
      # Consider the following diagram:
      # Pod A <veth1 - veth2> host <veth3 - veth4> Pod B
      #  veth2's RPS affinity is the one determining the CPUs that are handling the packet processing when sending data from Pod A to pod B.
      # Additional common scenarios:
      # 1. Pod A = sender, host = receiver
      #  The RPS affinity of the host side should be consulted (because it’s the receiver) and it should be set to cpus not sensitive to preemption (reserved pool).
      # 2. Pod A = receiver, host = sender
      #  In case of no RPS mask on the receiver side, the sender needs to pay the price and do all the processing on its cores.
      net.core.rps_default_mask = 30000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
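
      Assuming the standard Linux cpumask layout (32-bit hex words, most significant word first), the configured value can be decoded to see which CPUs it selects and how many CPUs it can address at all. The helper parse_cpumask below is a hypothetical illustration, not part of any tool referenced in this issue.

```python
# Hypothetical helper; assumes the Linux cpumask text layout:
# comma-separated 32-bit hex words, most significant word first.


def parse_cpumask(mask: str) -> tuple[list[int], int]:
    """Return (sorted CPU ids set in `mask`, total bit width of `mask`)."""
    words = mask.strip().split(",")
    cpus = []
    # The rightmost word covers CPUs 0-31, the next one CPUs 32-63, and so on.
    for index_from_right, word in enumerate(reversed(words)):
        value = int(word, 16)
        if value >> 32:
            raise ValueError(f"word {word!r} is wider than 32 bits")
        base = 32 * index_from_right
        cpus.extend(base + bit for bit in range(32) if (value >> bit) & 1)
    return sorted(cpus), 32 * len(words)


if __name__ == "__main__":
    # Value copied from 99-default-rps-mask.conf above (8 words).
    configured = "30000000," + ",".join(["00000000"] * 7)
    cpus, width = parse_cpumask(configured)
    print(cpus, width)  # prints [252, 253] 256
```

      Decoded this way, the configured value is 8 words (256 bits) wide and selects only CPUs 252 and 253, while lscpu on this host reports 512 CPUs.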
      
      Manually attempting to set the value:
      
      [root@amd-genoa-02 ~]# sysctl -w net.core.rps_default_mask=30000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      sysctl: setting key "net.core.rps_default_mask": Invalid argument
      
      Actual/default value:
      
      [root@amd-genoa-02 ~]# sysctl net.core.rps_default_mask
      net.core.rps_default_mask = 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
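
      The default value printed by the kernel already shows at least 14 comma-separated groups before the output is cut off, compared with 8 in the configured value. A quick width check is sketched below, assuming nr_cpus is taken from lscpu (512 on this host); the helper names are hypothetical. A narrower mask is not necessarily invalid to the kernel, so the check only shows that such a mask cannot address the higher-numbered CPUs.

```python
# Hypothetical width check; nr_cpus is assumed to come from lscpu (512 here).


def words_needed(nr_cpus: int) -> int:
    """Number of 32-bit words needed to give every CPU its own bit."""
    return (nr_cpus + 31) // 32


def mask_covers_all_cpus(mask: str, nr_cpus: int) -> bool:
    """True if `mask` has at least as many 32-bit words as `nr_cpus` requires."""
    return len(mask.strip().split(",")) >= words_needed(nr_cpus)


if __name__ == "__main__":
    # Configured value from 99-default-rps-mask.conf (8 words).
    configured = "30000000," + ",".join(["00000000"] * 7)
    print(words_needed(512))                      # prints 16
    print(mask_covers_all_cpus(configured, 512))  # prints False
```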
      
      [core@amd-genoa-02 ~]$ lscpu
      Architecture:            x86_64
        CPU op-mode(s):        32-bit, 64-bit
        Address sizes:         52 bits physical, 57 bits virtual
        Byte Order:            Little Endian
      CPU(s):                  512
        On-line CPU(s) list:   0-511
      Vendor ID:               AuthenticAMD
        Model name:            AMD EPYC 9754 128-Core Processor
          CPU family:          25
          Model:               160
          Thread(s) per core:  2
          Core(s) per socket:  128
          Socket(s):           2
          Stepping:            1
          Frequency boost:     enabled
          CPU max MHz:         3100.3411
          CPU min MHz:         1500.0000
          BogoMIPS:            4493.62
          Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
                               pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid
                               sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch
                               osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba
                               perfmon_v2 ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma
                               clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
                               avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
                               decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes
                               vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d
      Virtualization features: 
        Virtualization:        AMD-V
      Caches (sum of all):     
        L1d:                   8 MiB (256 instances)
        L1i:                   8 MiB (256 instances)
        L2:                    256 MiB (256 instances)
        L3:                    512 MiB (32 instances)
      NUMA:                    
        NUMA node(s):          2
        NUMA node0 CPU(s):     0-127,256-383
        NUMA node1 CPU(s):     128-255,384-511
      Vulnerabilities:         
        Gather data sampling:  Not affected
        Itlb multihit:         Not affected
        L1tf:                  Not affected
        Mds:                   Not affected
        Meltdown:              Not affected
        Mmio stale data:       Not affected
        Retbleed:              Not affected
        Spec rstack overflow:  Mitigation; Safe RET
        Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
        Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
        Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
        Srbds:                 Not affected
        Tsx async abort:       Not affected
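
      lscpu reports the NUMA node CPU sets in the kernel's list format (for example 0-127,256-383). The sketch below, assuming that "a-b,c" range syntax and the 512 CPUs reported above, expands such a list and formats it as a full-width cpumask; both helper names are hypothetical and not taken from the Node Tuning Operator.

```python
# Hypothetical helpers; assume the "a-b,c" CPU list syntax printed by lscpu
# and the 32-bit-word, most-significant-first cpumask text format.


def parse_cpulist(cpulist: str) -> set[int]:
    """Expand a list such as "0-127,256-383" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in cpulist.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            cpus.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def cpulist_to_mask(cpulist: str, nr_cpus: int) -> str:
    """Format the CPUs in `cpulist` as 32-bit hex words, most significant first."""
    bits = 0
    for cpu in parse_cpulist(cpulist):
        if cpu >= nr_cpus:
            raise ValueError(f"CPU {cpu} does not exist on a {nr_cpus}-CPU machine")
        bits |= 1 << cpu
    nr_words = (nr_cpus + 31) // 32
    return ",".join(
        f"{(bits >> (32 * i)) & 0xFFFFFFFF:08x}" for i in reversed(range(nr_words))
    )


if __name__ == "__main__":
    # NUMA node0 CPU list as reported by lscpu above.
    node0 = cpulist_to_mask("0-127,256-383", 512)
    print(len(node0.split(",")))  # prints 16
```

      For NUMA node 0 the result has 16 words, with CPUs 0-127 in the four least significant words and CPUs 256-383 in words 8 through 11.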

              Martin Sivak (msivak@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Mallapadi Niranjan
              Votes: 0
              Watchers: 4
