OpenShift Bugs / OCPBUGS-32031

oslat reports 45 us latency spike during 1h run on 4.14.20


    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • 4.14.z
    • Telco Performance
    • None
    • Important
    • No
    • CNF Ran Sprint 252, CNF Ran Sprint 253, CNF Ran Sprint 254, CNF RAN Sprint 255, CNF RAN Sprint 256, CNF RAN Sprint 257, CNF RAN Sprint 258, CNF RAN Sprint 259, CNF RAN Sprint 260, CNF RAN Sprint 262, CNF RAN Sprint 263, CNF RAN Sprint 264
    • 12
    • False
      2024/11/18: Fix is now available in RHEL 9 (RHEL-9148) - backports will be required to RHEL 9.4 (OCP 4.16) and RHEL 9.2 (OCP 4.14).
      2024/11/1: No change - still waiting.
      2024/9/24: Still waiting for the fix to be verified (RHEL-9148) - needs a backport to 9.2.0.z.

      Description of problem:

          On an SNO spoke cluster with the telco DU profile applied, oslat reported a 45 us latency spike during a 1h run, exceeding the 20 us trace threshold.
      

      Version-Release number of selected component (if applicable):

          4.14.20
      
      local-storage-operator.v4.14.0-202403261739 
      cluster-logging.v5.9.0                      
      packageserver                               
      ptp-operator.v4.14.0-202403222237           
      sriov-network-operator.v4.14.0-202402270139 
      sriov-fec.v2.8.0                            
      

      How reproducible:

          always

      Steps to Reproduce:

          1. Deploy a DU node (SNO spoke with the telco DU profile applied).
          2. Run the oslat test pod using the pod spec below.
      
      [INFO] oslat git hash: ea82509d664d72992068c3a1fc41f9a66e2c3f99
      [INFO] oslat image sha: sha256:4b568365d42fd6198aafa6d7ac61a2a6dc842521acb739f05647d5f9b36cca40
      [INFO] Pod spec
      apiVersion: v1
      kind: Pod
      metadata:
        name: oslat0
        annotations:
          # Disable IRQ load balancing, CPU load balancing and CPU quota via CRI-O
          irq-load-balancing.crio.io: "disable"
          cpu-load-balancing.crio.io: "disable"
          cpu-quota.crio.io: "disable"
        labels:
          app: oslat
      spec:
        restartPolicy: Never
        runtimeClassName: performance-openshift-node-performance-profile
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                   - oslat
              topologyKey: "kubernetes.io/hostname"
        containers:
        - args:
          name: container-perf-tools
          image: registry.kni-qe-22.kni.eng.rdu2.dc.redhat.com:5000/ran-test/oslat
          # Always pull so that the latest test image is used
          imagePullPolicy: Always
          resources:
            limits:
              cpu: 16
              memory: 2Gi
            requests:
              cpu: 16
              memory: 2Gi
          env:
          - name: tool
            value: "oslat"
          - name: RUNTIME_SECONDS
            value: 1h
          - name: INITIAL_DELAY_SEC
            value: "30"
          - name: PRIO
            value: "1"
          - name: delay
            value: "60"
          - name: manual
            value: "n"
          - name: TRACE_THRESHOLD
            value: "20"
          securityContext:
            privileged: true
          volumeMounts:
          - mountPath: /dev/cpu_dma_latency
            name: cstate
        nodeSelector:
          node-role.kubernetes.io/master: ""
        volumes:
        - name: cstate
          hostPath:
            path: /dev/cpu_dma_latency
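
The isolation-related settings in the pod spec above can be sanity-checked before a run. The following is a minimal Python sketch, not part of the original test harness: the `check_pod` helper and the `pod` dict (a hand-copied subset of the spec above) are hypothetical names introduced for illustration. It only checks that the three CRI-O annotations are set to "disable" and that CPU and memory requests equal the limits.

```python
# Hypothetical helper for illustration; not part of the oslat test harness.
REQUIRED_ANNOTATIONS = (
    "irq-load-balancing.crio.io",
    "cpu-load-balancing.crio.io",
    "cpu-quota.crio.io",
)


def check_pod(pod):
    """Return a list of problems found in a pod spec given as a dict."""
    problems = []
    annotations = pod.get("metadata", {}).get("annotations", {})
    for key in REQUIRED_ANNOTATIONS:
        if annotations.get(key) != "disable":
            problems.append(f"annotation {key} is not set to 'disable'")
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        requests = resources.get("requests", {})
        for resource in ("cpu", "memory"):
            # Requests must be present and identical to limits.
            if resource not in limits or limits[resource] != requests.get(resource):
                problems.append(
                    f"container {container.get('name')}: "
                    f"{resource} request and limit differ or are missing"
                )
    return problems


# Subset of the pod spec above, copied by hand for illustration.
pod = {
    "metadata": {
        "name": "oslat0",
        "annotations": {
            "irq-load-balancing.crio.io": "disable",
            "cpu-load-balancing.crio.io": "disable",
            "cpu-quota.crio.io": "disable",
        },
    },
    "spec": {
        "containers": [
            {
                "name": "container-perf-tools",
                "resources": {
                    "limits": {"cpu": 16, "memory": "2Gi"},
                    "requests": {"cpu": 16, "memory": "2Gi"},
                },
            }
        ]
    },
}

print(check_pod(pod) or "no problems found")
```

An empty result only means these two conditions hold for the given dict; it says nothing about the node-side performance profile.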

      Actual results:

          oslat: Trace threshold (20 us) triggered on cpu 41 with 45 us!
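
For automated triage, the threshold message above can be parsed into numbers. This is a minimal Python sketch, assuming the message format is exactly the one shown in the line above; other oslat versions may word it differently. The `parse_threshold_hit` function name is hypothetical.

```python
import re

# Pattern follows the message format seen in the log line above.
# Assumption: other oslat versions may format this message differently.
THRESHOLD_RE = re.compile(
    r"Trace threshold \((?P<threshold>\d+) us\) "
    r"triggered on cpu (?P<cpu>\d+) with (?P<latency>\d+) us"
)


def parse_threshold_hit(line):
    """Return threshold, cpu and latency as ints, or None if no match."""
    match = THRESHOLD_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = "oslat: Trace threshold (20 us) triggered on cpu 41 with 45 us!"
hit = parse_threshold_hit(line)
print(hit)
```

For the line reported in this bug, the parsed latency (45 us) exceeds the configured threshold (20 us) by 25 us on CPU 41.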
      

      Expected results:

          All samples below the 20 us trace threshold.
      

      Additional info:

          trace file: http://registry.kni-qe-22.kni.eng.rdu2.dc.redhat.com:8080/images/sno.kni-qe-12.lab.eng.rdu2.redhat.com-oslat-kernel-trace.txt

              rh-ee-cshulyup Costa Shulyupin
              mcornea@redhat.com Marius Cornea
              Marius Cornea Marius Cornea