Bug
Resolution: Not a Bug
Undefined
None
4.14.z
None
Important
No
False
Description of problem:
cyclictest reports several per-thread max latencies between 10us and 20us on a 4.14.20 SNO with the DU profile.
Version-Release number of selected component (if applicable):
local-storage-operator.v4.14.0-202403261739
cluster-logging.v5.9.0
packageserver
ptp-operator.v4.14.0-202403222237
sriov-network-operator.v4.14.0-202402270139
sriov-fec.v2.8.0
How reproducible:
Always
Steps to Reproduce:
1. Deploy SNO with DU profile.
2. Run a 1 hour cyclictest run with the following pod:

[INFO] cyclictest git hash: 7645e91f3ae164274297cc9de9825bd318037919
[INFO] cyclictest image sha: sha256:ebe442b4c138cb0e1b8ff8612e0d76a5a6e4f0c9297bf1e40a49c80a3f9ebe37
[INFO] Pod spec

apiVersion: v1
kind: Pod
metadata:
  name: cyclictest0
  annotations:
    # Disable CPU balance with CRIO
    irq-load-balancing.crio.io: "disable"
    cpu-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
  labels:
    app: cyclictest
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: cyclictest
            operator: Exists
        topologyKey: "kubernetes.io/hostname"
  # Map to the correct performance class
  runtimeClassName: performance-openshift-node-performance-profile
  restartPolicy: Never
  containers:
  - name: container-perf-tools
    image: registry.kni-qe-23.kni.eng.rdu2.dc.redhat.com:5000/ran-test/cyclictest
    # Force to fetch latest test image
    imagePullPolicy: Always
    resources:
      requests:
        memory: "2Gi"
        cpu: 16
      limits:
        memory: "2Gi"
        cpu: 16
    env:
    - name: tool
      value: "cyclictest"
    - name: DURATION
      value: "1h"
    - name: INTERVAL
      value: "1000"
    - name: delay
      value: "60"
    - name: rt_priority
      value: "95"
    - name: manual
      value: "n"
    - name: TRACE_THRESHOLD
      value: "20"
    - name: EXTRA_ARGS
      value: "--smi"
    # cyclictest requires privileged=true
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /dev/cpu_dma_latency
      name: cstate
  nodeSelector:
    node-role.kubernetes.io/master: ""
  volumes:
  - name: cstate
    hostPath:
      path: /dev/cpu_dma_latency
Actual results:
########## container info ###########
/proc/cmdline:
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-970f80fb32850b8caa1c4f4b156f16855114d7b18ddbccfa3aee3f99ea9fd584/vmlinuz-5.14.0-284.59.1.rt14.344.el9_2.x86_64 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/970f80fb32850b8caa1c4f4b156f16855114d7b18ddbccfa3aee3f99ea9fd584/0 root=UUID=cb6413b4-2ab4-487b-91f4-2c1b90473d00 rw rootflags=prjquota boot=UUID=8b65a357-2ee4-4aa9-9c53-1c8a6eab1104 crashkernel=512M intel_iommu=on iommu=pt skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on rcu_nocbs=2-31,34-63 tuned.non_isolcpus=00000003,00000003 systemd.cpu_affinity=0,1,32,33 intel_iommu=on iommu=pt isolcpus=managed_irq,2-31,34-63 nohz_full=2-31,34-63 tsc=reliable nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 rcutree.kthread_prio=11 default_hugepagesz=1G hugepagesz=1G hugepages=32 rcupdate.rcu_normal_after_boot=0 vfio_pci.enable_sriov=1 vfio_pci.disable_idle_d3=1 efi=runtime module_blacklist=irdma intel_pstate=disable tsc=reliable systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller=1
#####################################
**** uid: 0 ****
cyclictest0
5.14.0-284.59.1.rt14.344.el9_2.x86_64
realtime-tests-2.5-3.fc39.x86_64
allowed cpu list: 2-9,34-41
removing cpu34 from the cpu list because it is a sibling of cpu2 which will be the mainaffinity
new cpu list: 3,4,5,6,7,8,9,35,36,37,38,39,40,41
running cmd: cyclictest -q -D 1h -p 95 -t 14 -a 3,4,5,6,7,8,9,35,36,37,38,39,40,41 -h 30 -i 1000 --mainaffinity 2 -m -b 20 --tracemark --smi
sleep 60 before test
# /dev/cpu_dma_latency set to 0us
# Histogram
000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000001 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000002 031956 031973 031971 031977 032082 031984 031974 269453 031915 031925 031946 031943 031961 031950
000003 3566827 3567262 3567291 3567390 3567259 3567370 3567246 3330090 3566707 3566937 3567143 3567061 3567357 3567165
000004 001083 000649 000647 000516 000544 000530 000662 000365 001240 001045 000790 000890 000587 000791
000005 000130 000112 000087 000115 000107 000104 000116 000088 000133 000086 000119 000103 000092 000089
000006 000001 000002 000002 000000 000003 000003 000000 000001 000001 000004 000001 000001 000001 000003
000007 000002 000000 000000 000000 000001 000001 000000 000001 000001 000002 000001 000001 000000 000000
000008 000000 000001 000000 000002 000002 000003 000000 000001 000003 000000 000000 000000 000001 000000
000009 000001 000000 000001 000000 000002 000000 000002 000000 000000 000000 000000 000001 000000 000001
000010 000000 000001 000001 000000 000000 000002 000000 000000 000000 000000 000000 000000 000001 000001
000011 000000 000000 000000 000000 000000 000003 000000 000000 000000 000000 000000 000000 000000 000000
000012 000000 000000 000000 000000 000000 000000 000000 000001 000000 000001 000000 000000 000000 000000
000013 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000014 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000015 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000016 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000017 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000018 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000019 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000020 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000021 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000022 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000023 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000024 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000025 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000026 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000027 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000028 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
000029 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
# Total: 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000 003600000
# Min Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Avg Latencies: 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002 00002
# Max Latencies: 00009 00010 00010 00008 00009 00011 00009 00012 00008 00012 00007 00009 00010 00010
# Histogram Overflows: 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000
# Histogram Overflow at cycle number:
# Thread 0:
# Thread 1:
# Thread 2:
# Thread 3:
# Thread 4:
# Thread 5:
# Thread 6:
# Thread 7:
# Thread 8:
# Thread 9:
# Thread 10:
# Thread 11:
# Thread 12:
# Thread 13:
# SMIs: 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00000
# Thread Ids: 00045 00046 00047 00048 00049 00050 00051 00052 00053 00054 00055 00056 00057 00058
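For triage it can help to pull the per-thread maxima out of the summary above and check them against the 20us breaktrace threshold configured in the pod (-b 20 / TRACE_THRESHOLD=20). A minimal sketch; the quoted line is copied from the "Actual results" output and the 20us cutoff is taken from the test configuration, not from any cyclictest-provided parser:

```python
# Parse the "# Max Latencies:" summary line from the cyclictest -q output
# above and compare each per-thread maximum against the 20us breaktrace
# threshold (-b 20) used in this run.
log_line = ("# Max Latencies: 00009 00010 00010 00008 00009 00011 00009 "
            "00012 00008 00012 00007 00009 00010 00010")

# Everything after the colon is a whitespace-separated list of per-thread
# maxima in microseconds.
max_latencies = [int(tok) for tok in log_line.split(":")[1].split()]

worst = max(max_latencies)
over_threshold = [i for i, v in enumerate(max_latencies) if v >= 20]

print(f"worst-case latency: {worst}us across {len(max_latencies)} threads")
print(f"threads at/over 20us: {over_threshold}")
```

With this data the worst case is 12us on 14 threads and no thread reaches 20us, which is consistent with no breaktrace firing during the run.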
Expected results:
Test fails with:

Error: Total good latency runs percentage: 99.99986111111112% is less than allowed lower threshold: 99.9999% thread_id:6
min thread_id:1 value:2 avg thread_id:1 value:2 max thread_id:1 value:9 availability thread_id:1 percent:100.0
min thread_id:2 value:2 avg thread_id:2 value:2 max thread_id:2 value:10 availability thread_id:2 percent:99.99997222222223
min thread_id:3 value:2 avg thread_id:3 value:2 max thread_id:3 value:10 availability thread_id:3 percent:99.99997222222223
min thread_id:4 value:2 avg thread_id:4 value:2 max thread_id:4 value:8 availability thread_id:4 percent:100.0
min thread_id:5 value:2 avg thread_id:5 value:2 max thread_id:5 value:9 availability thread_id:5 percent:100.0
min thread_id:6 value:2 avg thread_id:6 value:2 max thread_id:6 value:11 availability thread_id:6 percent:99.99986111111112
min thread_id:7 value:2 avg thread_id:7 value:2 max thread_id:7 value:9 availability thread_id:7 percent:100.0
min thread_id:8 value:2 avg thread_id:8 value:2 max thread_id:8 value:12 availability thread_id:8 percent:99.99997222222223
min thread_id:9 value:2 avg thread_id:9 value:2 max thread_id:9 value:8 availability thread_id:9 percent:100.0
min thread_id:10 value:2 avg thread_id:10 value:2 max thread_id:10 value:12 availability thread_id:10 percent:99.99997222222223
min thread_id:11 value:2 avg thread_id:11 value:2 max thread_id:11 value:7 availability thread_id:11 percent:100.0
min thread_id:12 value:2 avg thread_id:12 value:2 max thread_id:12 value:9 availability thread_id:12 percent:100.0
min thread_id:13 value:2 avg thread_id:13 value:2 max thread_id:13 value:10 availability thread_id:13 percent:99.99997222222223
min thread_id:14 value:2 avg thread_id:14 value:2 max thread_id:14 value:10 availability thread_id:14 percent:99.99997222222223
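The failing percentage for thread_id:6 corresponds to 5 samples out of 3,600,000 falling outside the "good" range. A sketch of that arithmetic, assuming (the log does not state the cutoff explicitly) that a "good" sample is one below 10us; the histogram counts below are the thread-6 column from the histogram in "Actual results" (bucket = latency in us, value = sample count):

```python
# Reproduce the "good latency runs percentage" reported for thread_id:6.
# Thread-6 column of the cyclictest histogram (buckets with zero samples
# are omitted); 1 hour at a 1000us interval gives 3,600,000 samples.
histogram_thread6 = {2: 31984, 3: 3567370, 4: 530, 5: 104, 6: 3,
                     7: 1, 8: 3, 10: 2, 11: 3}

total = sum(histogram_thread6.values())
# Assumed pass criterion: latency below 10us.
good = sum(count for lat, count in histogram_thread6.items() if lat < 10)

availability = good / total * 100
print(f"total={total} good={good} availability={availability}%")
```

Under that assumption the 2 samples at 10us plus the 3 samples at 11us account for the reported 99.99986111111112%, which is just below the 99.9999% (at most ~3.6 bad samples per hour) threshold.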
Additional info:
must-gather is available at https://s3.upshift.redhat.com/DH-PROD-OCP-EDGE-QE-CI/ocp-far-edge-vran-collect/4410/index.html. Please let me know what other info I can gather.