Bug
Resolution: Unresolved
4.14.z
Quality / Stability / Reliability
Important
CNF RAN Sprint 277, CNF RAN Sprint 278, CNF RAN Sprint 279, CNF RAN Sprint 280, CNF RAN Sprint 281, CNF RAN Sprint 282, CNF RAN Sprint 283, CNF RAN Sprint 284
Description of problem:
On an SNO spoke cluster with the telco DU profile applied, oslat reported a 45 us latency spike during a 1 h run, exceeding the 20 us trace threshold.
Version-Release number of selected component (if applicable):
4.14.20
local-storage-operator.v4.14.0-202403261739
cluster-logging.v5.9.0
packageserver
ptp-operator.v4.14.0-202403222237
sriov-network-operator.v4.14.0-202402270139
sriov-fec.v2.8.0
How reproducible:
Always
Steps to Reproduce:
1. Deploy DU node
2. Run OSLAT test pod
[INFO] oslat git hash: ea82509d664d72992068c3a1fc41f9a66e2c3f99
[INFO] oslat image sha: sha256:4b568365d42fd6198aafa6d7ac61a2a6dc842521acb739f05647d5f9b36cca40
[INFO] Pod spec
apiVersion: v1
kind: Pod
metadata:
  name: oslat0
  annotations:
    # Disable CPU balance with CRIO
    irq-load-balancing.crio.io: "disable"
    cpu-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
  labels:
    app: oslat
spec:
  restartPolicy: Never
  runtimeClassName: performance-openshift-node-performance-profile
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - oslat
        topologyKey: "kubernetes.io/hostname"
  containers:
  - args:
    name: container-perf-tools
    image: registry.kni-qe-22.kni.eng.rdu2.dc.redhat.com:5000/ran-test/oslat
    # Force to fetch latest test image
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 16
        memory: 2Gi
      requests:
        cpu: 16
        memory: 2Gi
    env:
    - name: tool
      value: "oslat"
    - name: RUNTIME_SECONDS
      value: 1h
    - name: INITIAL_DELAY_SEC
      value: "30"
    - name: PRIO
      value: "1"
    - name: delay
      value: "60"
    - name: manual
      value: "n"
    - name: TRACE_THRESHOLD
      value: "20"
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /dev/cpu_dma_latency
      name: cstate
  nodeSelector:
    node-role.kubernetes.io/master: ""
  volumes:
  - name: cstate
    hostPath:
      path: /dev/cpu_dma_latency
Actual results:
oslat: Trace threshold (20 us) triggered on cpu 41 with 45 us!
Expected results:
All samples stay below the 20 us trace threshold for the full 1 h run.
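For triaging repeated runs, the pass/fail criterion above could be checked automatically against the pod log. The following is a minimal sketch, assuming the threshold message always has the form quoted under "Actual results"; the function names `parse_threshold_hits` and `run_passed` are hypothetical helpers, not part of oslat or the test image.

```python
import re

# Pattern derived only from the single log line quoted in this report;
# other oslat versions may word the message differently (assumption).
TRIGGER_RE = re.compile(
    r"Trace threshold \((\d+) us\) triggered on cpu (\d+) with (\d+) us"
)


def parse_threshold_hits(log_text):
    """Return one dict per threshold message found in the log text."""
    hits = []
    for match in TRIGGER_RE.finditer(log_text):
        threshold, cpu, latency = map(int, match.groups())
        hits.append(
            {"cpu": cpu, "threshold_us": threshold, "latency_us": latency}
        )
    return hits


def run_passed(log_text):
    """A run passes when no threshold message appears in the log."""
    return not parse_threshold_hits(log_text)
```

Feeding it the text saved from `oc logs oslat0` would yield the CPU and latency of each spike, which helps when comparing which isolated CPUs are affected across runs.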
Additional info:
trace file: http://registry.kni-qe-22.kni.eng.rdu2.dc.redhat.com:8080/images/sno.kni-qe-12.lab.eng.rdu2.redhat.com-oslat-kernel-trace.txt
is caused by: RHEL-9148 Interrupt thread not affined after interrupt reaffined (Closed)
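Given the linked cause, one plausible check on the affected node is to compare each IRQ thread's allowed CPUs with the affinity of the interrupt it serves. The sketch below is an illustration under stated assumptions, not the method used in RHEL-9148: it assumes IRQ threads are named `irq/<N>-<name>`, reads `Cpus_allowed_list` from `/proc/<pid>/status` and `/proc/irq/<N>/smp_affinity_list`, and treats any thread CPU outside the IRQ mask as a mismatch. All function names are hypothetical.

```python
import os
import re

IRQ_THREAD_RE = re.compile(r"^irq/(\d+)-")


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def irq_of_thread(comm):
    """Return the IRQ number encoded in a thread name, or None."""
    match = IRQ_THREAD_RE.match(comm.strip())
    return int(match.group(1)) if match else None


def thread_escapes_irq_mask(thread_cpus, irq_cpus):
    """True when the thread may run on a CPU outside the IRQ affinity."""
    return not thread_cpus <= irq_cpus


def find_mismatched_irq_threads(proc_root="/proc"):
    """Scan procfs and list (irq, pid, thread_cpus, irq_cpus) mismatches."""
    mismatches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                irq = irq_of_thread(f.read())
            if irq is None:
                continue
            with open(os.path.join(proc_root, entry, "status")) as f:
                allowed = next(
                    line.split(":", 1)[1]
                    for line in f
                    if line.startswith("Cpus_allowed_list:")
                )
            irq_path = os.path.join(
                proc_root, "irq", str(irq), "smp_affinity_list"
            )
            with open(irq_path) as f:
                irq_cpus = parse_cpu_list(f.read())
        except (OSError, StopIteration):
            # Thread exited or file unreadable; skip it.
            continue
        thread_cpus = parse_cpu_list(allowed)
        if thread_escapes_irq_mask(thread_cpus, irq_cpus):
            mismatches.append(
                (irq, int(entry), sorted(thread_cpus), sorted(irq_cpus))
            )
    return mismatches
```

Calling `find_mismatched_irq_threads()` from a privileged debug shell on the node would list candidate threads. An entry that includes an isolated CPU such as cpu 41 would be worth correlating with the kernel trace file.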