OCPBUGS-57787

AMD-Vi IRQs not affined to reserved CPUs

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: 4.12, 4.14, 4.18
    • Component/s: Node Tuning Operator
    • Quality / Stability / Reliability
    • False
    • Important
    • Done
    • Known Issue
      On systems with certain AMD EPYC processors, some low-level system interrupts, for example `AMD-Vi`, might include CPUs in their CPU mask that overlap with CPU-pinned workloads. This behavior is a result of the hardware design. These specific error-reporting interrupts are generally inactive, and there is currently no known performance impact.

      https://issues.redhat.com/browse/OCPBUGS-57787

      Description of problem:

      
      After the performance profile has been applied, AMD-Vi IRQs are not affined to the CPUs listed in their smp_affinity_list.
      
      They always use the first CPU IDs, and changing smp_affinity_list manually has no effect:
      
      [root@worker-0 ~]# ls /proc/irq/26
      AMD-Vi  affinity_hint  effective_affinity  effective_affinity_list  node  smp_affinity  smp_affinity_list  spurious
      
      [root@worker-0 ~]# cat /proc/irq/26/smp_affinity_list
      88-95,184-191
      
      [root@worker-0 ~]# echo "84-87" > /proc/irq/26/smp_affinity_list
      
      [root@worker-0 ~]# cat /proc/irq/26/smp_affinity_list
      88-95,184-191
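
The check above covers a single IRQ. A minimal sketch for surveying every IRQ with the same handler name follows. It relies only on the files shown in the `ls /proc/irq/26` output; the function name `list_irq_affinity` and its optional second argument are illustrative assumptions, not part of any existing tool.

```shell
#!/bin/sh
# Print requested vs. effective affinity for every IRQ that has a handler
# with the given name. irq_root defaults to /proc/irq; it is a parameter
# only so the function can be pointed at a copy of the tree (hypothetical
# use: a tree extracted from a sosreport).
list_irq_affinity() {
  local name="$1" irq_root="${2:-/proc/irq}" dir irq
  for dir in "$irq_root"/*/; do
    # Each registered handler shows up as an entry named after it,
    # as "AMD-Vi" does in the listing of /proc/irq/26.
    [ -e "${dir}${name}" ] || continue
    irq=$(basename "$dir")
    printf 'IRQ %s requested=%s effective=%s\n' "$irq" \
      "$(cat "${dir}smp_affinity_list")" \
      "$(cat "${dir}effective_affinity_list")"
  done
}
```

Running `list_irq_affinity AMD-Vi` on the affected node should print one line per AMD-Vi IRQ, which makes it easy to compare the requested mask with the CPU the interrupt is actually routed to.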
      
          

      Version-Release number of selected component (if applicable):

      Probably all versions of OpenShift; so far reproduced on OCP 4.12, 4.14, and 4.18.
          

      How reproducible:

      100%
          

      Steps to Reproduce:

          1. Apply a performance profile.
          2. After it has been applied, query the affinity of the AMD-Vi IRQs.
          3. Try to change smp_affinity_list as described above.
          

      Actual results:

      AMD-Vi IRQs are not affined to the smp_affinity_list or reserved CPU list
          

      Expected results:

      AMD-Vi IRQs should be affined to the smp_affinity_list or reserved CPU list
          

      Additional info:

      Validated on the following hardware:
      
      System Information
              Manufacturer: Dell Inc.
              Product Name: PowerEdge R7615
      Processor Information
              Socket Designation: CPU1
              Type: Central Processor
              Family: Zen
              Manufacturer: AMD
              ID: 11 0F A1 00 FF FB 8B 17
              Signature: Family 25, Model 17, Stepping 1
              Version: AMD EPYC 9654P 96-Core Processor
              Core Count: 96
              Core Enabled: 96
              Thread Count: 192
      
      [core@worker-0 ~]$ sudo dmesg | grep AMD-Vi
      [    0.193265] AMD-Vi: Using global IVHD EFR:0x25bf732fa2295afe, EFR2:0x1d
      [    1.892561] pci 0000:c0:00.2: AMD-Vi: IOMMU performance counters supported
      [    1.892593] pci 0000:80:00.2: AMD-Vi: IOMMU performance counters supported
      [    1.892613] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
      [    1.892639] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
      [    1.895566] pci 0000:c0:00.2: AMD-Vi: Found IOMMU cap 0x40
      [    1.895572] AMD-Vi: Extended features (0x25bf732fa2295afe, 0x1d): PPR X2APIC NX GT [5] IA GA PC GA_vAPIC
      [    1.895583] pci 0000:80:00.2: AMD-Vi: Found IOMMU cap 0x40
      [    1.895586] AMD-Vi: Extended features (0x25bf732fa2295afe, 0x1d): PPR X2APIC NX GT [5] IA GA PC GA_vAPIC
      [    1.895595] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
      [    1.895597] AMD-Vi: Extended features (0x25bf732fa2295afe, 0x1d): PPR X2APIC NX GT [5] IA GA PC GA_vAPIC
      [    1.895606] pci 0000:40:00.2: AMD-Vi: Found IOMMU cap 0x40
      [    1.895608] AMD-Vi: Extended features (0x25bf732fa2295afe, 0x1d): PPR X2APIC NX GT [5] IA GA PC GA_vAPIC
      [    1.895617] AMD-Vi: Interrupt remapping enabled
      [    1.895619] AMD-Vi: X2APIC enabled
      [    1.895637] AMD-Vi: Virtual APIC enabled
      

      AMD-Vi IRQs always use the first CPU IDs, regardless of the reserved set in the performance profile:

      $ oc get performanceprofile -o yaml | head -30
      apiVersion: v1
      items:
      - apiVersion: performance.openshift.io/v2
        kind: PerformanceProfile
        metadata:
          name: blueprint-profile
        spec:
          additionalKernelArgs:
          - nohz_full=0-93,96-189
          cpu:
            isolated: 0-93,96-189
            reserved: 94-95,190-191
      
      $ CPUMAX=$(grep '^processor' /proc/cpuinfo | tail -n 1 | grep -Eo '[0-9]+$')
      $ echo "=== NAME of IRQs for every CPU ==="
      $ for C in $(seq 0 "$CPUMAX"); do
        echo -n "CPU${C}:"
        # IRQs whose effective affinity is exactly this one CPU
        IRQS=$(grep -Hx "${C}" /proc/irq/*/effective_affinity_list | cut -f 4 -d '/')
        for I in $IRQS; do
          # The last column of /proc/interrupts is the handler name
          IRQNAME=$(grep -E "^ *${I}:" /proc/interrupts | awk '{print $NF}')
          echo -n " ${IRQNAME}"
        done
        echo
      done
      === NAME of IRQs for every CPU ===
      CPU0: timer
      CPU1:
      CPU2: AMD-Vi
      CPU3: AMD-Vi
      CPU4: AMD-Vi
      CPU5: AMD-Vi
      ...
      

      This cluster reserves the last 4 CPUs of one CCX (94-95 and their SMT siblings 190-191). The idea is to leave the other CCXs (11 groups of 16 CPUs) for the workloads (isolated CPUs). I believe that is why smp_affinity_list reads 88-95,184-191: those are the 16 CPUs of the CCX that contains the reserved CPUs.

      [core@worker-0 ~]$ lscpu -e | grep "191"
      191 0    0      95   95:95:95:11   yes
      [core@worker-0 ~]$ lscpu -e | grep ":11 "
      88  0    0      88   88:88:88:11   yes
      89  0    0      89   89:89:89:11   yes
      90  0    0      90   90:90:90:11   yes
      91  0    0      91   91:91:91:11   yes
      92  0    0      92   92:92:92:11   yes
      93  0    0      93   93:93:93:11   yes
      94  0    0      94   94:94:94:11   yes
      95  0    0      95   95:95:95:11   yes
      184 0    0      88   88:88:88:11   yes
      185 0    0      89   89:89:89:11   yes
      186 0    0      90   90:90:90:11   yes
      187 0    0      91   91:91:91:11   yes
      188 0    0      92   92:92:92:11   yes
      189 0    0      93   93:93:93:11   yes
      190 0    0      94   94:94:94:11   yes
      191 0    0      95   95:95:95:11   yes
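
To see which CPUs in the IRQ mask overlap with the isolated set, the two CPU lists can be expanded and intersected. This is a minimal sketch; the helper names `expand_cpulist` and `cpulist_intersect` are hypothetical, and the lists are taken from the smp_affinity_list output and the performance profile above.

```shell
#!/bin/sh
# Expand a kernel-style CPU list such as "88-95,184-191" into one CPU per line.
expand_cpulist() {
  echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
    # A single CPU ("5") has no upper bound, so fall back to the lower one.
    seq "$lo" "${hi:-$lo}"
  done
}

# Print the CPUs that appear in both lists, comma-separated.
cpulist_intersect() {
  # Each list is de-duplicated first, so any value that appears twice in the
  # merged stream must be present in both lists.
  { expand_cpulist "$1" | sort -nu; expand_cpulist "$2" | sort -nu; } |
    sort -n | uniq -d | paste -sd, -
}

# IRQ mask vs. isolated CPUs from the performance profile
cpulist_intersect "88-95,184-191" "0-93,96-189"
```

With these inputs the overlap is 88-93 and 184-189, meaning 12 of the 16 CPUs in the IRQ mask are isolated CPUs; only 94-95 and 190-191 are reserved.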
      
      [core@worker-0 ~]$ cat /proc/cmdline 
      BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-eca7b576fcf4e3884470fb6bd0b922a280a6f872c453072f58eb164102ec2261/vmlinuz-4.18.0-372.146.1.el8_6.x86_64 ignition.platform.id=metal ostree=/ostree/boot.0/rhcos/eca7b576fcf4e3884470fb6bd0b922a280a6f872c453072f58eb164102ec2261/0 root=UUID=4c4afd3c-b829-44e0-93a1-6ee8082c2472 rw rootflags=prjquota boot=UUID=5815c856-bbea-4035-b961-5f27d65c33df skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=0-93,96-189 tuned.non_isolcpus=c0000000,00000000,00000000,c0000000,00000000,00000000 systemd.cpu_affinity=191,190,94,95 iommu=pt isolcpus=managed_irq,0-93,96-189 nohz_full=0-93,96-189 amd_pstate=passive
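
The `tuned.non_isolcpus` value in the command line above is a hex CPU mask. The sketch below decodes it into CPU IDs, assuming the same layout as `/proc/irq/*/smp_affinity` (comma-separated 32-bit groups, most significant group first). The function name `mask_to_cpus` is hypothetical.

```shell
#!/bin/sh
# Decode a comma-separated hex CPU mask into a comma-separated list of CPU IDs.
# Assumption: 32 bits per group, most significant group first.
mask_to_cpus() {
  local mask="$1" word base=0 bit cpus=""
  while [ -n "$mask" ]; do
    # The rightmost group holds the least significant 32 bits.
    word=${mask##*,}
    case $mask in
      *,*) mask=${mask%,*} ;;
      *)   mask="" ;;
    esac
    bit=0
    while [ "$bit" -lt 32 ]; do
      if [ $(( (0x$word >> bit) & 1 )) -eq 1 ]; then
        cpus="${cpus:+$cpus,}$(( base + bit ))"
      fi
      bit=$(( bit + 1 ))
    done
    base=$(( base + 32 ))
  done
  echo "$cpus"
}

mask_to_cpus "c0000000,00000000,00000000,c0000000,00000000,00000000"
```

The result, 94,95,190,191, matches both the reserved set in the performance profile and `systemd.cpu_affinity` on the same command line.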
      

      4.14 Logs:
      SOSReport: https://issues.redhat.com/secure/attachment/13439126/sosreport-worker-0-2025-06-17-oveuywj.tar.xz
      must-gather: https://issues.redhat.com/secure/attachment/13439169/must_gather.tar.gz

              msivak@redhat.com Martin Sivak
              rhn-gps-manrodri Manuel Rodriguez
              Mallapadi Niranjan Mallapadi Niranjan