OpenShift Virtualization / CNV-28721

[2196459] [DPDK checkup] Pods are scheduled on reserved instead of isolated CPUs


    • High

      Created attachment 1963483 [details]
      DPDK checkup resources manifests

      Description of problem:
      When configuring a DPDK checkup job, the user sets (in the PerformanceProfile resource) the isolated CPUs on which the job's pods should be scheduled.
      In practice, the pods are scheduled on the reserved CPUs, which are supposed to remain untouched and left for the OS to use.

      Version-Release number of selected component (if applicable):
      CNV 4.13.0
      container-native-virtualization-kubevirt-dpdk-checkup-rhel9:v4.13.0-37

      How reproducible:
      100%

      Steps to Reproduce:
      1. Make sure the kubelet CPU manager is enabled (follow https://docs.openshift.com/container-platform/4.12/scalability_and_performance/using-cpu-manager.html#seting_up_cpu_manager_using-cpu-manager-and-topology_manager if necessary).

      2. Create a namespace for the job and switch to it.
      $ oc create ns dpdk-checkup-ns
      $ oc project dpdk-checkup-ns

      3. Label the worker nodes with the "worker-dpdk" label.

      4. Apply the resource manifests from the attached file in numeric order:
      $ oc apply -f 1-dpdk-checkup-resources.yaml
      $ oc apply -f 2-dpdk-checkup-scc.yaml
      ...
      Adjust the resources in the manifests to match your cluster.

      Please note:
      Due to https://bugzilla.redhat.com/show_bug.cgi?id=2193235, you cannot choose which nodes the VM and the traffic generator are scheduled on.
      As a workaround, either cordon 2 workers so that only one remains schedulable, or remove the "dpdk-workers" label from 2 nodes and keep it on only one node.

      5. Watch the pods and wait until the traffic generator and the VM virt-launcher pods are running:
      $ oc get pods -w
      NAME                                      READY   STATUS            RESTARTS   AGE
      dpdk-checkup-zprz7                        1/1     Running           0          11s
      kubevirt-dpdk-checkup-traffic-gen-h89m5   1/1     Running           0          7s
      virt-launcher-dpdk-vmi-rg8nl-2fnjq        0/2     Init:0/2          0          7s
      virt-launcher-dpdk-vmi-rg8nl-2fnjq        0/2     Init:1/2          0          9s
      virt-launcher-dpdk-vmi-rg8nl-2fnjq        0/2     PodInitializing   0          15s
      virt-launcher-dpdk-vmi-rg8nl-2fnjq        2/2     Running           0          21s
      virt-launcher-dpdk-vmi-rg8nl-2fnjq        2/2     Running           0          21s

      6. In each of these pods, check which CPUs are used for scheduling:
      $ oc exec -it kubevirt-dpdk-checkup-traffic-gen-h89m5 -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
      2,4,6,8,42,44,46,48
      $ oc exec -it virt-launcher-dpdk-vmi-rg8nl-2fnjq -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
      10,12,14,16,50,52,54,56

      Actual results:
      The CPUs used for scheduling each of these pods are those which are set as "reserved" in the PerformanceProfile resource:
      $ oc get performanceprofile profile-1 -ojsonpath='{.spec.cpu}' | jq
      {
        "isolated": "20,22,24,26,28,30,32,34,36,38,60,62,64,66,68,70,72,74,76,78",
        "reserved": "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,21,23,25,27,29,31,33,35,37,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,61,63,65,67,69,71,73,75,77,79"
      }

      Expected results:
      The CPUs used for scheduling each of these pods should be from the "isolated" list.
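
      The comparison between the cpuset reported in step 6 and the PerformanceProfile lists can be scripted. The following is a minimal sketch and not part of the checkup itself: the helper names expand_cpus and cpus_outside are hypothetical, and it assumes the CPU lists use the usual cpuset syntax of comma-separated ids and ranges (for example "0-3,8").

```shell
#!/bin/sh
# Hypothetical helpers for comparing a pod's cpuset with a PerformanceProfile list.

# expand_cpus LIST: print one CPU id per line for a cpuset-style list
# such as "0-3,8,10-11". Spaces and carriage returns (which `oc exec -it`
# may add to its output) are stripped first.
expand_cpus() {
    printf '%s\n' "$1" | tr -d ' \r' | tr ',' '\n' | while IFS=- read -r lo hi; do
        [ -n "$lo" ] || continue
        # A single id has no upper bound, so the range collapses to lo..lo.
        seq "$lo" "${hi:-$lo}"
    done
}

# cpus_outside USED ALLOWED: print, comma-separated, every CPU in USED that
# does not appear in ALLOWED. Empty output means USED is a subset of ALLOWED.
cpus_outside() {
    allowed=$(expand_cpus "$2")
    expand_cpus "$1" | while read -r cpu; do
        # -x matches whole lines only, so "2" does not match "20".
        printf '%s\n' "$allowed" | grep -qx "$cpu" || printf '%s\n' "$cpu"
    done | paste -sd, -
}
```

      With the values from this report, running cpus_outside "2,4,6,8,42,44,46,48" against the "isolated" list prints all eight CPUs, meaning none of the traffic generator's CPUs are isolated. Empty output would correspond to the expected result.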

            omisan@redhat.com Orel Misan
            ysegev@redhat.com Yossi Segev