- Bug
- Resolution: Unresolved
- Undefined
- None
- 4.14.z
- None
- Important
- No
- CNF Ran Sprint 252, CNF Ran Sprint 253, CNF Ran Sprint 254
- 3
- False
4/24: Fix is available (RHEL-9148); needs backport to 9.2.0.z.
-
Description of problem:
On an SNO spoke with the telco DU profile applied, oslat reported a 45 us latency spike during a 1 h run.
Version-Release number of selected component (if applicable):
- 4.14.20
- local-storage-operator.v4.14.0-202403261739
- cluster-logging.v5.9.0
- packageserver
- ptp-operator.v4.14.0-202403222237
- sriov-network-operator.v4.14.0-202402270139
- sriov-fec.v2.8.0
How reproducible:
Always
Steps to Reproduce:
1. Deploy DU node.
2. Run OSLAT test pod:

[INFO] oslat git hash: ea82509d664d72992068c3a1fc41f9a66e2c3f99
[INFO] oslat image sha: sha256:4b568365d42fd6198aafa6d7ac61a2a6dc842521acb739f05647d5f9b36cca40
[INFO] Pod spec

apiVersion: v1
kind: Pod
metadata:
  name: oslat0
  annotations:
    # Disable CPU balance with CRIO
    irq-load-balancing.crio.io: "disable"
    cpu-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
  labels:
    app: oslat
spec:
  restartPolicy: Never
  runtimeClassName: performance-openshift-node-performance-profile
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - oslat
          topologyKey: "kubernetes.io/hostname"
  containers:
    - args:
      name: container-perf-tools
      image: registry.kni-qe-22.kni.eng.rdu2.dc.redhat.com:5000/ran-test/oslat
      # Force to fetch latest test image
      imagePullPolicy: Always
      resources:
        limits:
          cpu: 16
          memory: 2Gi
        requests:
          cpu: 16
          memory: 2Gi
      env:
        - name: tool
          value: "oslat"
        - name: RUNTIME_SECONDS
          value: 1h
        - name: INITIAL_DELAY_SEC
          value: "30"
        - name: PRIO
          value: "1"
        - name: delay
          value: "60"
        - name: manual
          value: "n"
        - name: TRACE_THRESHOLD
          value: "20"
      securityContext:
        privileged: true
      volumeMounts:
        - mountPath: /dev/cpu_dma_latency
          name: cstate
  nodeSelector:
    node-role.kubernetes.io/master: ""
  volumes:
    - name: cstate
      hostPath:
        path: /dev/cpu_dma_latency
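A minimal pre-flight sketch for catching spec mistakes before a 1 h run, assuming that the CRI-O annotations in the pod spec above only take effect for Guaranteed-QoS pods with whole-CPU requests that also set the performance profile runtime class. The function name `check_low_latency_pod` is hypothetical, and quantities are compared as plain strings, so `1000m` vs `1` would be flagged even though Kubernetes treats them as equal.

```python
from typing import Any, Dict, List

# Hypothetical pre-flight check. Assumption: the CRI-O annotations used by the
# oslat pod only take effect for Guaranteed-QoS pods with whole-CPU requests
# that also set the performance profile runtime class.


def check_low_latency_pod(pod: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    spec = pod.get("spec", {})
    if not spec.get("runtimeClassName"):
        problems.append("runtimeClassName is not set")
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        requests = resources.get("requests", {})
        for key in ("cpu", "memory"):
            if key not in limits:
                problems.append(f"{name}: no {key} limit")
                continue
            # Kubernetes defaults a missing request to the limit.
            # Simplification: quantities are compared as strings.
            if str(requests.get(key, limits[key])) != str(limits[key]):
                problems.append(f"{name}: {key} request != limit")
        cpu = str(limits.get("cpu", ""))
        if cpu and not cpu.isdigit():
            problems.append(f"{name}: cpu {cpu!r} is not a whole number of CPUs")
    return problems


if __name__ == "__main__":
    # Subset of the oslat0 pod spec from this report.
    oslat0 = {
        "spec": {
            "runtimeClassName": "performance-openshift-node-performance-profile",
            "containers": [{
                "name": "container-perf-tools",
                "resources": {
                    "limits": {"cpu": 16, "memory": "2Gi"},
                    "requests": {"cpu": 16, "memory": "2Gi"},
                },
            }],
        }
    }
    print(check_low_latency_pod(oslat0))  # prints []
```

The oslat0 spec passes this check, which suggests the spike is not explained by a shared-CPU pod configuration.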
Actual results:
oslat: Trace threshold (20 us) triggered on cpu 41 with 45 us!
Expected results:
All samples below 20 us.
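To turn the expected result into an automated pass/fail check, one option is to scan the oslat output for the trace-threshold message shown under Actual results. This is a sketch: the regex is inferred from that single quoted message, and `parse_trace_trigger` / `exceeds_budget` are hypothetical helper names.

```python
import re
from typing import Optional, Tuple

# Matches lines such as:
#   oslat: Trace threshold (20 us) triggered on cpu 41 with 45 us!
# Assumption: the format is inferred from the one message quoted in this report.
_TRACE_RE = re.compile(
    r"Trace threshold \((\d+) us\) triggered on cpu (\d+) with (\d+) us"
)


def parse_trace_trigger(line: str) -> Optional[Tuple[int, int, int]]:
    """Return (threshold_us, cpu, latency_us), or None if the line doesn't match."""
    m = _TRACE_RE.search(line)
    if m is None:
        return None
    threshold, cpu, latency = (int(g) for g in m.groups())
    return threshold, cpu, latency


def exceeds_budget(line: str, budget_us: int = 20) -> bool:
    """True if the line reports a latency above budget_us (20 us per Expected results)."""
    parsed = parse_trace_trigger(line)
    return parsed is not None and parsed[2] > budget_us


if __name__ == "__main__":
    msg = "oslat: Trace threshold (20 us) triggered on cpu 41 with 45 us!"
    print(parse_trace_trigger(msg), exceeds_budget(msg))
```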
Additional info:
trace file: http://registry.kni-qe-22.kni.eng.rdu2.dc.redhat.com:8080/images/sno.kni-qe-12.lab.eng.rdu2.redhat.com-oslat-kernel-trace.txt
- is caused by
  - RHEL-9148 Interrupt thread not affined after interrupt reaffined (Release Pending)
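Since RHEL-9148 concerns an interrupt thread keeping its old CPU affinity after the interrupt is reaffined, a rough way to look for candidates on a node is to compare each `irq/<N>-...` kernel thread's `Cpus_allowed_list` (from `/proc/<pid>/status`) with `/proc/irq/<N>/smp_affinity_list`. This is a hypothetical diagnostic sketch, not the fix or a reproducer from RHEL-9148, and all helper names are illustrative. A mismatch may also be transient, because the kernel may apply a new thread affinity only the next time the thread runs, so treat hits as leads rather than proof.

```python
import os
import re
from typing import Iterator, Optional, Set, Tuple

# Kernel IRQ threads are named "irq/<N>-<name>" (e.g. "irq/125-eth0").
IRQ_THREAD_RE = re.compile(r"^irq/(\d+)-")


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a kernel CPU list such as '0-3,8,10-11' into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def stray_cpus(thread_cpus: Set[int], irq_cpus: Set[int]) -> Set[int]:
    """CPUs the IRQ thread may run on that the IRQ itself is not affined to."""
    return thread_cpus - irq_cpus


def _read(path: str) -> str:
    with open(path) as f:
        return f.read()


def _allowed_list(status: str) -> Optional[str]:
    """Extract the Cpus_allowed_list value from /proc/<pid>/status text."""
    for line in status.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return line.split(":", 1)[1]
    return None


def scan(proc: str = "/proc") -> Iterator[Tuple[int, int, Set[int]]]:
    """Yield (pid, irq, stray) for IRQ threads allowed outside their IRQ's affinity."""
    for entry in sorted(os.listdir(proc)):
        if not entry.isdigit():
            continue
        try:
            m = IRQ_THREAD_RE.match(_read(os.path.join(proc, entry, "comm")))
            if m is None:
                continue
            irq = int(m.group(1))
            allowed = _allowed_list(_read(os.path.join(proc, entry, "status")))
            irq_cpus = parse_cpu_list(
                _read(os.path.join(proc, "irq", str(irq), "smp_affinity_list")))
        except OSError:
            continue  # process exited or IRQ disappeared while scanning
        if allowed is None:
            continue
        stray = stray_cpus(parse_cpu_list(allowed), irq_cpus)
        if stray:
            yield int(entry), irq, stray


if __name__ == "__main__":
    for pid, irq, stray in scan():
        print(f"pid {pid} (irq {irq}) may run on CPUs {sorted(stray)} "
              "outside the IRQ's smp_affinity_list")
```

Run it on the node with access to the host's `/proc`; a hit that includes an isolated CPU such as 41 is worth correlating with the trace file above.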