OpenShift Bugs / OCPBUGS-57783

MEGASAS IRQs not affined to reserved cores

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects versions: 4.12, 4.14, 4.18
    • Component: Node Tuning Operator
    • Quality / Stability / Reliability

      Description of problem:

      
      The megasas IRQs are not affined to the reserved cores after the performance profile has been applied.
      
      In this case the reserved cores are 94-95,190-191, but IRQ 153 (megasas0-msix80) is affined to CPU 72:
      
      [root@worker-0 ~]# ls /proc/irq/153
      affinity_hint  effective_affinity  effective_affinity_list  megasas0-msix80  node  smp_affinity  smp_affinity_list  spurious
      
      [root@worker-0 ~]# cat /proc/irq/153/smp_affinity_list
      72
      
      Trying to set the affinity manually fails with the following error:
      
      [root@worker-0 ~]# echo "88-95,184-191" > /proc/irq/153/smp_affinity_list
      -bash: echo: write error: Input/output error
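
The check above can be made mechanical. The following is a minimal sketch, not part of the original report: it expands a kernel-style CPU list and tests whether an IRQ's affinity lies inside the reserved set. The function names (`expand_cpulist`, `cpulist_within`, `check_irq`) and the optional third argument for an alternative IRQ directory are hypothetical helpers introduced only for illustration.

```shell
# Expand a CPU list such as "94-95,190-191" into one CPU number per line.
expand_cpulist() {
  local list=$1 part start end
  local IFS=','
  for part in $list; do
    start=${part%-*}   # "94-95" -> 94, "72" -> 72
    end=${part#*-}     # "94-95" -> 95, "72" -> 72
    seq "$start" "$end"
  done
}

# Succeed only if every CPU in list $1 is also contained in list $2.
cpulist_within() {
  local cpu
  for cpu in $(expand_cpulist "$1"); do
    expand_cpulist "$2" | grep -qx "$cpu" || return 1
  done
  return 0
}

# Compare one IRQ's smp_affinity_list against an allowed CPU list.
# $3 defaults to /proc/irq and is a parameter only so that the sketch
# can be tried against a copy of that directory (an assumption).
check_irq() {
  local irq=$1 allowed=$2 root=${3:-/proc/irq} affinity
  affinity=$(cat "$root/$irq/smp_affinity_list") || return 2
  if cpulist_within "$affinity" "$allowed"; then
    echo "IRQ $irq: $affinity is within $allowed"
  else
    echo "IRQ $irq: $affinity is outside $allowed"
    return 1
  fi
}
```

On the affected node, `check_irq 153 "94-95,190-191"` would be expected to report that the affinity is outside the reserved set. As a side note, an I/O error on a write to smp_affinity_list is what the kernel returns for interrupts whose affinity it manages itself; whether that is the mechanism at work for this driver is an assumption and has not been confirmed here.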
      
      
          

      Version-Release number of selected component (if applicable):

      Probably all versions of OpenShift; so far reproduced on OCP 4.12, 4.14, and 4.18.
          

      How reproducible:

      100%
          

      Steps to Reproduce:

          1. Apply the performance profile.
          2. After it has been applied, query the affinity of the megasas IRQs.
          3. Try to change the smp_affinity_list as described above.
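
Step 2 can be done for all megasas IRQs at once. The sketch below is an illustration under stated assumptions, not the procedure used in this report: `list_irq_affinity` is a hypothetical helper that picks IRQs by name from an interrupts table and prints the configured affinity of each. The second and third arguments default to the live /proc files and exist only so the function can be run against saved copies.

```shell
# Print "<irq> <name> <smp_affinity_list>" for every IRQ whose name
# (last column of the interrupts table) matches the regex in $1.
list_irq_affinity() {
  local pattern=$1 interrupts=${2:-/proc/interrupts} irq_root=${3:-/proc/irq}
  local irq name
  # Keep only rows that start with "<number>:" and whose last field matches.
  awk -v pat="$pattern" \
    '$1 ~ /^[0-9]+:$/ && $NF ~ pat { sub(":", "", $1); print $1, $NF }' \
    "$interrupts" |
  while read -r irq name; do
    printf '%s %s %s\n' "$irq" "$name" \
      "$(cat "$irq_root/$irq/smp_affinity_list" 2>/dev/null)"
  done
}
```

For example, `list_irq_affinity '^megasas'` run as root on the worker node lists one line per megasas vector, which makes it easy to see how many of them sit outside the reserved cores.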
          

      Actual results:

      The megasas IRQs are not affined to the reserved cores.
          

      Expected results:

      The megasas IRQs are affined to the reserved cores.
          

      Additional info:

      
      Validated on the following hardware:
      
      System Information
              Manufacturer: Dell Inc.
              Product Name: PowerEdge R7615
      Processor Information
              Socket Designation: CPU1
              Type: Central Processor
              Family: Zen
              Manufacturer: AMD
              ID: 11 0F A1 00 FF FB 8B 17
              Signature: Family 25, Model 17, Stepping 1
              Version: AMD EPYC 9654P 96-Core Processor
              Core Count: 96
              Core Enabled: 96
              Thread Count: 192
      
      [core@worker-0 ~]$ sudo lspci -v | less
      41:00.0 RAID bus controller: Broadcom / LSI MegaRAID 12GSAS/PCIe Secure SAS39xx
              DeviceName: SL1 RAID
              Subsystem: Dell PERC H755N Front
              Flags: bus master, fast devsel, latency 0, IRQ 72, NUMA node 0, IOMMU group 17
              Memory at 90000000 (64-bit, prefetchable) [size=1M]
              Memory at 90100000 (64-bit, prefetchable) [size=1M]
              Memory at a4000000 (32-bit, non-prefetchable) [size=1M]
              I/O ports at 4000 [size=256]
              Expansion ROM at <ignored> [disabled]
              Capabilities: [40] Power Management version 3
              Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
              Capabilities: [70] Express Endpoint, MSI 00
              Capabilities: [b0] MSI-X: Enable+ Count=128 Masked-
              Capabilities: [100] Advanced Error Reporting
              Capabilities: [148] Power Budgeting <?>
              Capabilities: [158] Alternative Routing-ID Interpretation (ARI)
              Capabilities: [168] Secondary PCI Express
              Capabilities: [188] Physical Layer 16.0 GT/s <?>
              Capabilities: [1b0] Lane Margining at the Receiver <?>
              Capabilities: [248] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
              Capabilities: [348] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
              Capabilities: [380] Data Link Feature <?>
              Kernel driver in use: megaraid_sas
              Kernel modules: megaraid_sas
      

      Here we can see that the megasas IRQs are spread across many of the isolated CPUs:

      $ oc get performanceprofile -o yaml | head -30
      apiVersion: v1
      items:
      - apiVersion: performance.openshift.io/v2
        kind: PerformanceProfile
        metadata:
          name: blueprint-profile
        spec:
          additionalKernelArgs:
          - nohz_full=0-93,96-189
          cpu:
            isolated: 0-93,96-189
            reserved: 94-95,190-191
      
      $ CPUMAX=$(grep processor /proc/cpuinfo | tail -n 1 | grep -Eo '[0-9]+$')
      $ echo "=== NAME of IRQs for every CPU ==="
      $ for C in $(seq 0 "$CPUMAX") ; do
        echo -n "CPU${C}:"
        # IRQs whose effective affinity is exactly this one CPU
        IRQS=$(grep -Hx "${C}" /proc/irq/*/effective_affinity_list | cut -f 4 -d '/')
        for I in $IRQS ; do
          # the last column of /proc/interrupts is the IRQ name
          IRQNAME=$(grep " ${I}:" /proc/interrupts | awk '{print $(NF)}')
          echo -n " "${IRQNAME}
        done
        echo
      done
      === NAME of IRQs for every CPU ===
      CPU0: timer
      CPU1:
      CPU2: AMD-Vi
      CPU3: AMD-Vi
      CPU4: AMD-Vi
      CPU5: AMD-Vi
      ...
      CPU71:
      CPU72: megasas0-msix80
      CPU73: megasas0-msix81
      CPU74: megasas0-msix82
      CPU75: megasas0-msix83
      CPU76: megasas0-msix84
      CPU77: megasas0-msix85
      CPU78: megasas0-msix86
      CPU79: megasas0-msix87
      CPU80: megasas0-msix88
      CPU81: megasas0-msix89
      CPU82: megasas0-msix90
      CPU83: megasas0-msix91
      CPU84: megasas0-msix92
      CPU85: megasas0-msix93
      CPU86: megasas0-msix94
      CPU87: megasas0-msix95
      CPU88: megasas0-msix96 mlx5_comp1@pci:0000:81:00.0
      ...
      CPU96: megasas0-msix8
      CPU97: megasas0-msix9
      CPU98: megasas0-msix10
      CPU99: megasas0-msix11
      CPU100: megasas0-msix12
      CPU101: megasas0-msix13
      CPU102: megasas0-msix14
      CPU103: megasas0-msix15
      CPU104: megasas0-msix16
      CPU105: megasas0-msix17
      CPU106: megasas0-msix18
      CPU107: megasas0-msix19
      CPU108: megasas0-msix20
      CPU109: megasas0-msix21
      CPU110: megasas0-msix22
      CPU111: megasas0-msix23
      CPU112: megasas0-msix24
      CPU113: megasas0-msix25
      CPU114: megasas0-msix26
      CPU115: megasas0-msix27
      CPU116: megasas0-msix28
      CPU117: megasas0-msix29
      CPU118: megasas0-msix30
      CPU119: megasas0-msix31
      CPU120: megasas0-msix32
      CPU121: megasas0-msix33
      CPU122: megasas0-msix34
      CPU123: megasas0-msix35
      CPU124: megasas0-msix36
      CPU125: megasas0-msix37
      CPU126: megasas0-msix38
      CPU127: megasas0-msix39
      CPU128: megasas0-msix40
      CPU129: megasas0-msix41
      CPU130: megasas0-msix42
      CPU131: megasas0-msix43
      CPU132: megasas0-msix44
      CPU133: megasas0-msix45
      CPU134: megasas0-msix46
      CPU135: megasas0-msix47
      CPU136: megasas0-msix48
      CPU137: megasas0-msix49
      CPU138: megasas0-msix50
      CPU139: megasas0-msix51
      CPU140: megasas0-msix52
      CPU141: megasas0-msix53
      CPU142: megasas0-msix54
      CPU143: megasas0-msix55
      CPU144: megasas0-msix56
      CPU145: megasas0-msix57
      CPU146: megasas0-msix58
      CPU147: megasas0-msix59
      CPU148: megasas0-msix60
      CPU149: megasas0-msix61
      CPU150: megasas0-msix62
      CPU151: megasas0-msix63
      CPU152: megasas0-msix64
      CPU153: megasas0-msix65
      CPU154: megasas0-msix66
      CPU155: megasas0-msix67
      CPU156: megasas0-msix68
      CPU157: megasas0-msix69
      CPU158: megasas0-msix70
      CPU159: megasas0-msix71
      CPU160: megasas0-msix72
      CPU161: megasas0-msix73
      CPU162: megasas0-msix74
      CPU163: megasas0-msix75
      CPU164: megasas0-msix76
      CPU165: megasas0-msix77
      CPU166: megasas0-msix78
      CPU167: megasas0-msix79
      CPU168: megasas0-msix104
      CPU169: megasas0-msix105
      CPU170: megasas0-msix106
      CPU171: megasas0-msix107
      CPU172: megasas0-msix108
      CPU173: megasas0-msix109
      CPU174: megasas0-msix110
      CPU175: megasas0-msix111
      CPU176: megasas0-msix112
      CPU177: megasas0-msix113
      CPU178: megasas0-msix114
      CPU179: megasas0-msix115
      CPU180: megasas0-msix116
      CPU181: megasas0-msix117
      CPU182: megasas0-msix118
      CPU183: megasas0-msix119
      CPU184: megasas0-msix120
          

      We also tested the workaround of setting smp_affinity_enable=0, described at https://www.suse.com/support/kb/doc/?id=000021663, but obtained the same results:

      $ oc get performanceprofile blueprint-profile -o json | jq .spec.additionalKernelArgs
      [
        "nohz_full=0-93,96-189",
        "smp_affinity_enable=0"
      ]

      We found a similar bug from OCP 4.6, https://bugzilla.redhat.com/show_bug.cgi?id=1908944, and we are wondering whether this could also be hardware related.
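
One thing that may be worth ruling out for the smp_affinity_enable=0 test above: smp_affinity_enable is a parameter of the megaraid_sas module, and module parameters given on the kernel command line are normally written with the module name as a prefix (megaraid_sas.smp_affinity_enable=0). Whether the unprefixed form used in additionalKernelArgs ever reached the driver has not been verified in this report, so the following is only a sketch for checking it. `cmdline_param_form` is a hypothetical helper, and the sysfs path in the closing comment assumes that the loaded driver exposes the parameter as readable.

```shell
# Classify how smp_affinity_enable appears in a kernel command line string.
# Prints "prefixed", "unprefixed" or "absent".
cmdline_param_form() {
  local cmdline=$1 arg form=absent
  set -f                      # do not glob-expand command line words
  for arg in $cmdline; do
    case $arg in
      megaraid_sas.smp_affinity_enable=*)
        form=prefixed ;;
      smp_affinity_enable=*)
        if [ "$form" = absent ]; then form=unprefixed; fi ;;
    esac
  done
  set +f
  echo "$form"
}

# On the node (not executed here):
#   cmdline_param_form "$(cat /proc/cmdline)"
#   cat /sys/module/megaraid_sas/parameters/smp_affinity_enable   # assumed path
```

If the helper prints "unprefixed" and the sysfs value is still 1, the workaround was most likely not in effect during the test, and it would be worth repeating it with the prefixed form before drawing conclusions.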

      4.12 SOSReport can be found here: sosreport-worker-0-2025-06-17-oveuywj.tar.xz

              Martin Sivak (msivak@redhat.com)
              Manuel Rodriguez (rhn-gps-manrodri)
              Liquan Cui