-
Bug
-
Resolution: Done
-
Critical
-
None
-
4.13.0
-
None
-
Critical
-
No
-
Rejected
-
False
-
Description of problem:
Our telco-tuned DUT pods do not start in 4.13 when they carry these annotations:
  irq-load-balancing.crio.io: disable
  cpu-load-balancing.crio.io: disable
Pod events:
  31s  Normal   Scheduled       pod/sriov-testpmd  Successfully assigned default/sriov-testpmd to ostest-zh6lq-worker-0
  31s  Normal   AddedInterface  pod/sriov-testpmd  Add eth0 [10.131.0.44/23] from ovn-kubernetes
  21s  Normal   Pulled          pod/sriov-testpmd  Container image "registry.redhat.io/openshift4/dpdk-base-rhel8:v4.10.0-5" already present on machine
  21s  Normal   Created         pod/sriov-testpmd  Created container sriov-testpmd
  11s  Warning  Failed          pod/sriov-testpmd  Error: failed to run pre-start hook for container "sriov-testpmd": set CPU load balancing: timed out waiting for the condition
Removing these annotations lets the pod start, but performance is significantly degraded. The attached file "ocp_4.13_sriov_performance_results.txt" contains the detailed numbers. For comparison, the attached file "ocp_4.12_sriov_performance_results.txt" contains the OCP 4.12 results from the same tuned lab environment.
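For reference, a minimal sketch of a pod that would carry these annotations. Only the pod name, namespace, image, annotation keys and values, and the runtime class name are taken from this report; the resource figures and the placeholder command are assumptions added for illustration, and the actual pod definition used in the lab is not shown here.

```yaml
# Illustrative only: resource values and command are assumed, not from the report.
apiVersion: v1
kind: Pod
metadata:
  name: sriov-testpmd
  namespace: default
  annotations:
    # The two annotations that trigger the pre-start hook failure on 4.13.
    irq-load-balancing.crio.io: "disable"
    cpu-load-balancing.crio.io: "disable"
spec:
  # Runtime class reported in the PerformanceProfile status.
  runtimeClassName: performance-sriov-performanceprofile
  containers:
  - name: sriov-testpmd
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.10.0-5
    command: ["sleep", "infinity"]   # placeholder (assumption)
    resources:
      # Equal integer CPU requests and limits give the pod Guaranteed QoS,
      # which these annotations rely on. Values below are assumptions.
      requests:
        cpu: "4"
        memory: 1Gi
        hugepages-1Gi: 2Gi
      limits:
        cpu: "4"
        memory: 1Gi
        hugepages-1Gi: 2Gi
```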
Actual results:
Per-port, with packet loss: 0.013 Mpps
Total: 0.026 Mpps
Expected results:
Per-port, with 0 packet loss: 13.75 Mpps
Total: 27.5 Mpps
Additional info:
(shiftstack) [cloud-user@installer-host ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-03-14-053612   True        False         5h19m   Cluster version is 4.13.0-0.nightly-2023-03-14-053612
(shiftstack) [cloud-user@installer-host ~]$ oc get PerformanceProfile -n openshift-cluster-node-tuning-operator -o yaml sriov-performanceprofile
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  creationTimestamp: "2023-03-16T09:27:28Z"
  finalizers:
  - foreground-deletion
  generation: 1
  name: sriov-performanceprofile
  resourceVersion: "36702"
  uid: 49fb4534-e2f5-419c-99eb-1fc03309e03b
spec:
  additionalKernelArgs:
  - nosmt
  - tsc=reliable
  cpu:
    isolated: 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
    reserved: 0,1,2,3
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 7
      node: 0
      size: 1G
  nodeSelector:
    node-role.kubernetes.io/sriov: ""
  numa:
    topologyPolicy: best-effort
  realTimeKernel:
    enabled: false
status:
  conditions:
  - lastHeartbeatTime: "2023-03-16T09:27:28Z"
    lastTransitionTime: "2023-03-16T09:27:28Z"
    status: "True"
    type: Available
  - lastHeartbeatTime: "2023-03-16T09:27:28Z"
    lastTransitionTime: "2023-03-16T09:27:28Z"
    status: "True"
    type: Upgradeable
  - lastHeartbeatTime: "2023-03-16T09:27:28Z"
    lastTransitionTime: "2023-03-16T09:27:28Z"
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2023-03-16T09:27:28Z"
    lastTransitionTime: "2023-03-16T09:27:28Z"
    status: "False"
    type: Degraded
  runtimeClass: performance-sriov-performanceprofile
  tuned: openshift-cluster-node-tuning-operator/openshift-node-performance-sriov-performanceprofile
[core@ostest-zh6lq-worker-1 ~]$ cat /etc/os-release
NAME="CentOS Stream CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="413.92.202303061740-0"
VERSION_ID="4.13"
VARIANT="CoreOS"
VARIANT_ID=coreos
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream CoreOS 413.92.202303061740-0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:9coreos"
HOME_URL="https://centos.org/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.13/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.13"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.13"
OPENSHIFT_VERSION="4.13"
RHEL_VERSION="9"
OSTREE_VERSION="413.92.202303061740-0"
is blocked by:
- OCPNODE-1538 Support cpu load balancing on cgroupv1 on RHEL 9 (Closed)
- OCPBUGS-13163 [4.13] cgroupv1 support for cpu balancing is broken for non-SNO nodes (Closed)
is related to:
- OCPBUGS-10600 [OCP 4.13] Telco DPDK performance degradation (Closed)
relates to:
- OCPBUGS-10601 [OCP 4.13] Telco HW-Offload performance degradation (Closed)