OpenShift Bugs / OCPBUGS-32035

cyclictest reports several max latencies between 10us and 20us on 4.14.20 SNO with DU profile


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Undefined
    • None
    • 4.14.z
    • Telco Performance
    • None
    • Important
    • No
    • False

      Description of problem:

         cyclictest reports several max latencies between 10us and 20us on 4.14.20 SNO with DU profile

      Version-Release number of selected component (if applicable):

      local-storage-operator.v4.14.0-202403261739 
      cluster-logging.v5.9.0                      
      packageserver                               
      ptp-operator.v4.14.0-202403222237           
      sriov-network-operator.v4.14.0-202402270139 
      sriov-fec.v2.8.0                            
      

      How reproducible:

          always

      Steps to Reproduce:

          1. Deploy SNO with DU profile
          2. Run a 1-hour cyclictest with the following pod
      
      [INFO] cyclictest git hash: 7645e91f3ae164274297cc9de9825bd318037919
      [INFO] cyclictest image sha: sha256:ebe442b4c138cb0e1b8ff8612e0d76a5a6e4f0c9297bf1e40a49c80a3f9ebe37
      [INFO] Pod spec
      apiVersion: v1
      kind: Pod
      metadata:
        name: cyclictest0
        annotations:
          # Disable CPU balance with CRIO
          irq-load-balancing.crio.io: "disable"
          cpu-load-balancing.crio.io: "disable"
          cpu-quota.crio.io: "disable"
        labels:
          app: cyclictest
      spec:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: cyclictest
                  operator: Exists
              topologyKey: "kubernetes.io/hostname"
        # Map to the correct performance class
        runtimeClassName: performance-openshift-node-performance-profile
        restartPolicy: Never
        # Force to fetch latest test  image
        imagePullPolicy: Always
        containers:
        - name: container-perf-tools
          image: registry.kni-qe-23.kni.eng.rdu2.dc.redhat.com:5000/ran-test/cyclictest
          resources:
            requests:
              memory: "2Gi"
              cpu: 16
            limits:
              memory: "2Gi"
              cpu: 16
          env:
          - name: tool
            value: "cyclictest"
          - name: DURATION
            value: 1h
          - name: INTERVAL
            value: "1000"
          - name: delay
            value: "60"
          - name: rt_priority
            value: "95"
          - name: manual
            value: "n"
          - name: TRACE_THRESHOLD
            value: "20"
          - name: EXTRA_ARGS
            value: "--smi"
          # cyclictest requires privileged=true
          securityContext:
            privileged: true
          volumeMounts:
          - mountPath: /dev/cpu_dma_latency
            name: cstate
        nodeSelector:
          node-role.kubernetes.io/master: ""
        volumes:
        - name: cstate
          hostPath:
            path: /dev/cpu_dma_latency        

      Actual results:

      ########## container info ###########
      /proc/cmdline:
      BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-970f80fb32850b8caa1c4f4b156f16855114d7b18ddbccfa3aee3f99ea9fd584/vmlinuz-5.14.0-284.59.1.rt14.344.el9_2.x86_64 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/970f80fb32850b8caa1c4f4b156f16855114d7b18ddbccfa3aee3f99ea9fd584/0 root=UUID=cb6413b4-2ab4-487b-91f4-2c1b90473d00 rw rootflags=prjquota boot=UUID=8b65a357-2ee4-4aa9-9c53-1c8a6eab1104 crashkernel=512M intel_iommu=on iommu=pt skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-31,34-63 tuned.non_isolcpus=00000003,00000003 systemd.cpu_affinity=0,1,32,33 intel_iommu=on iommu=pt isolcpus=managed_irq,2-31,34-63 nohz_full=2-31,34-63 tsc=reliable nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11 default_hugepagesz=1G hugepagesz=1G hugepages=32 rcupdate.rcu_normal_after_boot=0 vfio_pci.enable_sriov=1 vfio_pci.disable_idle_d3=1 efi=runtime module_blacklist=irdma intel_pstate=disable tsc=reliable systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller=1
      #####################################
      **** uid: 0 ****
      cyclictest0 5.14.0-284.59.1.rt14.344.el9_2.x86_64
      realtime-tests-2.5-3.fc39.x86_64
      allowed cpu list: 2-9,34-41
      removing cpu34 from the cpu list because it is a sibling of cpu2 which will be the mainaffinity
      new cpu list: 3,4,5,6,7,8,9,35,36,37,38,39,40,41
      running cmd: cyclictest -q -D 1h -p 95 -t 14 -a 3,4,5,6,7,8,9,35,36,37,38,39,40,41 -h 30 -i 1000 --mainaffinity 2 -m  -b 20 --tracemark --smi
      sleep 60 before test
      # /dev/cpu_dma_latency set to 0us
      # Histogram
      000000 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000001 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000002 031956	031973	031971	031977	032082	031984	031974	269453	031915	031925	031946	031943	031961	031950
      000003 3566827	3567262	3567291	3567390	3567259	3567370	3567246	3330090	3566707	3566937	3567143	3567061	3567357	3567165
      000004 001083	000649	000647	000516	000544	000530	000662	000365	001240	001045	000790	000890	000587	000791
      000005 000130	000112	000087	000115	000107	000104	000116	000088	000133	000086	000119	000103	000092	000089
      000006 000001	000002	000002	000000	000003	000003	000000	000001	000001	000004	000001	000001	000001	000003
      000007 000002	000000	000000	000000	000001	000001	000000	000001	000001	000002	000001	000001	000000	000000
      000008 000000	000001	000000	000002	000002	000003	000000	000001	000003	000000	000000	000000	000001	000000
      000009 000001	000000	000001	000000	000002	000000	000002	000000	000000	000000	000000	000001	000000	000001
      000010 000000	000001	000001	000000	000000	000002	000000	000000	000000	000000	000000	000000	000001	000001
      000011 000000	000000	000000	000000	000000	000003	000000	000000	000000	000000	000000	000000	000000	000000
      000012 000000	000000	000000	000000	000000	000000	000000	000001	000000	000001	000000	000000	000000	000000
      000013 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000014 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000015 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000016 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000017 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000018 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000019 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000020 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000021 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000022 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000023 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000024 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000025 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000026 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000027 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000028 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      000029 000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000	000000
      # Total: 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000
      # Min Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
      # Avg Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
      # Max Latencies: 00009 00010 00010 00008 00009 00011 00009 00012 00008 00012 00007 00009 00010 00010
      # Histogram Overflows: 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000
      # Histogram Overflow at cycle number:
      # Thread 0:
      # Thread 1:
      # Thread 2:
      # Thread 3:
      # Thread 4:
      # Thread 5:
      # Thread 6:
      # Thread 7:
      # Thread 8:
      # Thread 9:
      # Thread 10:
      # Thread 11:
      # Thread 12:
      # Thread 13:
      # SMIs: 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000
      
      # Thread Ids: 00045 00046 00047 00048 00049 00050 00051 00052 00053 00054 00055 00056 00057 00058    
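
      The CPU selection logged above (allowed cpu list 2-9,34-41, cpu2 used as main affinity, cpu34 dropped as its sibling) can be reproduced with a short sketch. This is not the container's actual script. The helper names are hypothetical, and the sibling rule (cpu N pairs with cpu N+32 on a 64-CPU host) is an assumption inferred from the log and the kernel command line; on a real node the pairs would be read from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list.

```python
def parse_cpu_list(spec: str) -> list[int]:
    """Expand a kernel-style CPU list such as '2-9,34-41' into sorted ints."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def assumed_siblings(cpu: int) -> list[int]:
    # ASSUMPTION: 64 logical CPUs, hyperthread sibling at an offset of 32.
    return [(cpu + 32) % 64]


def split_main_and_workers(allowed: list[int], siblings_of) -> tuple[int, list[int]]:
    """Use the first allowed CPU as main affinity; drop it and its siblings."""
    main = allowed[0]
    excluded = {main} | set(siblings_of(main))
    workers = [cpu for cpu in allowed if cpu not in excluded]
    return main, workers


allowed = parse_cpu_list("2-9,34-41")
main, workers = split_main_and_workers(allowed, assumed_siblings)
print("mainaffinity:", main)
print("new cpu list:", ",".join(str(cpu) for cpu in workers))
print("threads:", len(workers))
```

      Under these assumptions the sketch yields main affinity 2 and 14 measurement CPUs, which matches the -t 14, -a and --mainaffinity 2 arguments in the logged command.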

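      Expected results:

      Every thread keeps its good-latency percentage at or above the allowed lower threshold of 99.9999%. Instead, the test fails with: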
      
      Error: Total good latency runs percentage :99.99986111111112% is less than allowed lower threshold:99.9999% thread_id:6
      
      min thread_id:1 value:2
      avg thread_id:1 value:2
      max thread_id:1 value:9
      availability thread_id:1 percent:100.0
      min thread_id:2 value:2
      avg thread_id:2 value:2
      max thread_id:2 value:10
      availability thread_id:2 percent:99.99997222222223
      min thread_id:3 value:2
      avg thread_id:3 value:2
      max thread_id:3 value:10
      availability thread_id:3 percent:99.99997222222223
      min thread_id:4 value:2
      avg thread_id:4 value:2
      max thread_id:4 value:8
      availability thread_id:4 percent:100.0
      min thread_id:5 value:2
      avg thread_id:5 value:2
      max thread_id:5 value:9
      availability thread_id:5 percent:100.0
      min thread_id:6 value:2
      avg thread_id:6 value:2
      max thread_id:6 value:11
      availability thread_id:6 percent:99.99986111111112
      min thread_id:7 value:2
      avg thread_id:7 value:2
      max thread_id:7 value:9
      availability thread_id:7 percent:100.0
      min thread_id:8 value:2
      avg thread_id:8 value:2
      max thread_id:8 value:12
      availability thread_id:8 percent:99.99997222222223
      min thread_id:9 value:2
      avg thread_id:9 value:2
      max thread_id:9 value:8
      availability thread_id:9 percent:100.0
      min thread_id:10 value:2
      avg thread_id:10 value:2
      max thread_id:10 value:12
      availability thread_id:10 percent:99.99997222222223
      min thread_id:11 value:2
      avg thread_id:11 value:2
      max thread_id:11 value:7
      availability thread_id:11 percent:100.0
      min thread_id:12 value:2
      avg thread_id:12 value:2
      max thread_id:12 value:9
      availability thread_id:12 percent:100.0
      min thread_id:13 value:2
      avg thread_id:13 value:2
      max thread_id:13 value:10
      availability thread_id:13 percent:99.99997222222223
      min thread_id:14 value:2
      avg thread_id:14 value:2
      max thread_id:14 value:10
      availability thread_id:14 percent:99.99997222222223
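
      The availability figures above can be cross-checked against the histogram. The sketch below is not the test harness's code, which is not shown in this report. It assumes that a sample counts as bad when its latency is 10us or more; that bound is inferred from the reported percentages, not stated anywhere in the output. The names TAIL, BAD_FROM_US and availability are hypothetical.

```python
import math

TOTAL_SAMPLES = 3_600_000  # per thread, from the "# Total" line
ALLOWED_PERCENT = 99.9999  # lower threshold quoted in the error message
BAD_FROM_US = 10           # ASSUMPTION: latencies >= 10us count as bad

# Non-zero tail of the histogram: bucket (us) -> sample count per thread.
TAIL = {
    10: [0, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 1],
    11: [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0],
    12: [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
}


def availability(tail: dict[int, list[int]], total: int, bad_from_us: int) -> list[float]:
    """Return the percentage of samples below bad_from_us for each thread."""
    threads = len(next(iter(tail.values())))
    percents = []
    for t in range(threads):
        bad = sum(counts[t] for us, counts in tail.items() if us >= bad_from_us)
        percents.append(100.0 * (total - bad) / total)
    return percents


percents = availability(TAIL, TOTAL_SAMPLES, BAD_FROM_US)
failing = [i for i, p in enumerate(percents, start=1) if p < ALLOWED_PERCENT]

for i, p in enumerate(percents, start=1):
    print(f"availability thread_id:{i} percent:{p}")
print("threads below threshold:", failing)
```

      With 3,600,000 samples per thread, a 99.9999% threshold tolerates at most 3 bad samples. Thread 6 has 5 samples in the 10us and 11us buckets, so under the assumed bound it is the only thread that falls below the threshold, consistent with the reported error.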

      Additional info:

      must-gather is available at https://s3.upshift.redhat.com/DH-PROD-OCP-EDGE-QE-CI/ocp-far-edge-vran-collect/4410/index.html
      
      Please let me know what other info I could gather.

            bwensley@redhat.com Bart Wensley
            mcornea@redhat.com Marius Cornea
            Marius Cornea Marius Cornea