OpenShift Bugs / OCPBUGS-10402

[OCP 4.13] Telco SR-IOV performance degradation


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • None
    • 4.13.0
    • Node Tuning Operator
    • None

      Description of problem:

      Our telco-tuned DUT (device under test) pods fail to start on OCP 4.13 when they carry these annotations:
      irq-load-balancing.crio.io: disable
      cpu-load-balancing.crio.io: disable
      
      31s         Normal    Scheduled        pod/sriov-testpmd   Successfully assigned default/sriov-testpmd to ostest-zh6lq-worker-0
      31s         Normal    AddedInterface   pod/sriov-testpmd   Add eth0 [10.131.0.44/23] from ovn-kubernetes
      21s         Normal    Pulled           pod/sriov-testpmd   Container image "registry.redhat.io/openshift4/dpdk-base-rhel8:v4.10.0-5" already present on machine
      21s         Normal    Created          pod/sriov-testpmd   Created container sriov-testpmd
      11s         Warning   Failed           pod/sriov-testpmd   Error: failed to run pre-start hook for container "sriov-testpmd": set CPU load balancing: timed out waiting for the condition
      
      Removing these annotations allows the pod to start, but performance is significantly degraded.
      The attached file "ocp_4.13_sriov_performance_results.txt" contains detailed information and numbers.
      For comparison, the attached file "ocp_4.12_sriov_performance_results.txt" contains the OCP 4.12 performance results from the same tuned lab environment.
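      For reference, a minimal sketch of how these annotations would sit on the test pod. Only the pod name, namespace, image, annotations, and runtime class name are taken from this report. The resource values are assumptions chosen to give the pod Guaranteed QoS with whole CPUs, which, as far as we understand the documented requirements, is needed for the annotations to take effect. The SR-IOV network attachment is omitted because it is not shown in this report.

```yaml
# Hypothetical reproduction manifest, not the manifest used in the lab.
apiVersion: v1
kind: Pod
metadata:
  name: sriov-testpmd
  namespace: default
  annotations:
    # Annotations from this report; with both present the pre-start hook times out on 4.13
    irq-load-balancing.crio.io: "disable"
    cpu-load-balancing.crio.io: "disable"
spec:
  # Runtime class reported in the PerformanceProfile status
  runtimeClassName: performance-sriov-performanceprofile
  containers:
  - name: sriov-testpmd
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.10.0-5
    resources:
      # Assumed values: equal integer CPU requests and limits give Guaranteed QoS
      requests:
        cpu: "4"
        memory: 1Gi
        hugepages-1Gi: 2Gi
      limits:
        cpu: "4"
        memory: 1Gi
        hugepages-1Gi: 2Gi
```

      Dropping the two annotations from this sketch corresponds to the degraded-performance case described above.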
      
      

      Actual results:

      Per-port Mpps (with packet loss) - 0.013
      Total Mpps - 0.026

      Expected results:

      Per-port Mpps (with 0 packet loss) - 13.75
      Total Mpps - 27.5

      Additional info:

      (shiftstack) [cloud-user@installer-host ~]$ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.13.0-0.nightly-2023-03-14-053612   True        False         5h19m   Cluster version is 4.13.0-0.nightly-2023-03-14-053612
      (shiftstack) [cloud-user@installer-host ~]$ oc get PerformanceProfile -n openshift-cluster-node-tuning-operator -o yaml sriov-performanceprofile
      apiVersion: performance.openshift.io/v2
      kind: PerformanceProfile
      metadata:
        creationTimestamp: "2023-03-16T09:27:28Z"
        finalizers:
        - foreground-deletion
        generation: 1
        name: sriov-performanceprofile
        resourceVersion: "36702"
        uid: 49fb4534-e2f5-419c-99eb-1fc03309e03b
      spec:
        additionalKernelArgs:
        - nosmt
        - tsc=reliable
        cpu:
          isolated: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
          reserved: 0,1,2,3
        hugepages:
          defaultHugepagesSize: 1G
          pages:
          - count: 7
            node: 0
            size: 1G
        nodeSelector:
          node-role.kubernetes.io/sriov: ""
        numa:
          topologyPolicy: best-effort
        realTimeKernel:
          enabled: false
      status:
        conditions:
        - lastHeartbeatTime: "2023-03-16T09:27:28Z"
          lastTransitionTime: "2023-03-16T09:27:28Z"
          status: "True"
          type: Available
        - lastHeartbeatTime: "2023-03-16T09:27:28Z"
          lastTransitionTime: "2023-03-16T09:27:28Z"
          status: "True"
          type: Upgradeable
        - lastHeartbeatTime: "2023-03-16T09:27:28Z"
          lastTransitionTime: "2023-03-16T09:27:28Z"
          status: "False"
          type: Progressing
        - lastHeartbeatTime: "2023-03-16T09:27:28Z"
          lastTransitionTime: "2023-03-16T09:27:28Z"
          status: "False"
          type: Degraded
        runtimeClass: performance-sriov-performanceprofile
        tuned: openshift-cluster-node-tuning-operator/openshift-node-performance-sriov-performanceprofile
      
      
      [core@ostest-zh6lq-worker-1 ~]$ cat /etc/os-release
      NAME="CentOS Stream CoreOS"
      ID="rhcos"
      ID_LIKE="rhel fedora"
      VERSION="413.92.202303061740-0"
      VERSION_ID="4.13"
      VARIANT="CoreOS"
      VARIANT_ID=coreos
      PLATFORM_ID="platform:el9"
      PRETTY_NAME="CentOS Stream CoreOS 413.92.202303061740-0 (Plow)"
      ANSI_COLOR="0;31"
      CPE_NAME="cpe:/o:centos:centos:9coreos"
      HOME_URL="https://centos.org/"
      DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.13/"
      BUG_REPORT_URL="https://bugzilla.redhat.com/"
      REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
      REDHAT_BUGZILLA_PRODUCT_VERSION="4.13"
      REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
      REDHAT_SUPPORT_PRODUCT_VERSION="4.13"
      OPENSHIFT_VERSION="4.13"
      RHEL_VERSION="9"
      OSTREE_VERSION="413.92.202303061740-0"
       

            yquinn@redhat.com Yanir Quinn
            zgreenbe@redhat.com Ziv Greenberg
            Mallapadi Niranjan