OpenShift Virtualization / CNV-26820

[2177668] [DPDK latency checkup] Traffic generator cannot start due to multiple environment vars with PCIDEVICE_ prefix


    • Urgent

      Description of problem:
      When running the latency checkup job that tests DPDK, the traffic generator fails to start because it cannot locate the single `PCIDEVICE_`-prefixed environment variable it expects.

      Version-Release number of selected component (if applicable):
      CNV 4.13.0
      Latency checkup: registry.redhat.io/container-native-virtualization/vm-network-latency-checkup-rhel9

      How reproducible:
      Always

      Steps to Reproduce:
      1. On a cluster that supports SR-IOV, create the following namespace:
      $ oc create ns dpdk-checkup-ns
      namespace/dpdk-checkup-ns created

      2. Switch the current project to the new namespace:
      $ oc project dpdk-checkup-ns
      Now using project "dpdk-checkup-ns" on server "https://api.bm02-cnvqe2-rdu2.cnvqe2.lab.eng.rdu2.redhat.com:6443".

      3. Apply the following resources (attached to this issue), which are needed to run the latency checkup job that tests DPDK:
      $ oc apply -f dpdk-latency-checkup-infra.yaml
      serviceaccount/dpdk-checkup-sa created
      role.rbac.authorization.k8s.io/kiagnose-configmap-access created
      rolebinding.rbac.authorization.k8s.io/kiagnose-configmap-access created
      role.rbac.authorization.k8s.io/kubevirt-dpdk-checker created
      rolebinding.rbac.authorization.k8s.io/kubevirt-dpdk-checker created
      $
      $ oc apply -f dpdk-latency-checkup-cm.yaml
      configmap/dpdk-checkup-config created
      $

      4. Start the latency checkup job using the attached resource:
      $ oc apply -f dpdk-latency-checkup-job.yaml
      job.batch/dpdk-checkup created

      5. While the job is running, find the traffic generator pod:
      $ oc get pods -n dpdk-checkup-ns
      NAME                                      READY   STATUS             RESTARTS      AGE
      dpdk-checkup-xzcvt                        1/1     Running            0             25s
      kubevirt-dpdk-checkup-traffic-gen-tzb2h   0/1     CrashLoopBackOff   1 (12s ago)   22s
      virt-launcher-dpdk-vmi-v6l69-jd52z        0/2     PodInitializing    0             22s

      6. Check the log of the traffic generator pod:
      $ oc logs kubevirt-dpdk-checkup-traffic-gen-tzb2h --follow
      setting params to trex_cfg.yaml
      + set_pci_addresses
      ++ get_pci_device_env_var
      +++ grep PCIDEVICE_
      +++ env
      ++ local 'pci_device_env_with_value=PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK=0000:19:0a.1,0000:19:0a.0
      PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK_INFO={"0000:19:0a.0":{"generic":{"deviceID":"0000:19:0a.0"},"vfio":{"dev-mount":"/dev/vfio/186","mount":"/dev/vfio/vfio"}},"0000:19:0a.1":{"generic":{"deviceID":"0000:19:0a.1"},"vfio":{"dev-mount":"/dev/vfio/187","mount":"/dev/vfio/vfio"}}}'
      +++ wc -l
      +++ echo 'PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK=0000:19:0a.1,0000:19:0a.0
      PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK_INFO={"0000:19:0a.0":{"generic":{"deviceID":"0000:19:0a.0"},"vfio":{"dev-mount":"/dev/vfio/186","mount":"/dev/vfio/vfio"}},"0000:19:0a.1":{"generic":{"deviceID":"0000:19:0a.1"},"vfio":{"dev-mount":"/dev/vfio/187","mount":"/dev/vfio/vfio"}}}'
      ++ '[' 2 '!=' 1 ']'
      ++ echo 'error: could not find pci device env var'
      ++ exit 1
      + local 'pci_device_env_name=error: could not find pci device env var'
      + IFS=,
      + read -r -a nics_array
      /opt/scripts/set_traffic_gen_cfg_file.sh: line 73: error: could not find pci device env var: invalid variable name
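
      The failing count check can be reproduced in a plain bash shell. This sketch uses the variable names from the trace (the `_INFO` value is shortened) and assumes no other `PCIDEVICE_` variables are set in that shell. `lookup` is a hypothetical stand-in for the script's `get_pci_device_env_var`, used only to show that `exit 1` inside `$(...)` does not stop the caller.

```bash
#!/usr/bin/env bash
# Variable names taken from the trace; the _INFO value is shortened here.
export PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK=0000:19:0a.1,0000:19:0a.0
export PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK_INFO='{"0000:19:0a.0":{},"0000:19:0a.1":{}}'

# Same pipeline as in the trace: `env | grep PCIDEVICE_`, then count lines.
matches=$(env | grep PCIDEVICE_)
count=$(( $(printf '%s\n' "$matches" | wc -l) ))
# 2 when no other PCIDEVICE_ variables are set, so a "count != 1" check fails.
echo "lines matching PCIDEVICE_: $count"

# Hypothetical helper: `exit 1` inside $(...) only ends that subshell, and
# the caller keeps running with the error text as the assigned value.
lookup() {
  echo 'error: could not find pci device env var'
  exit 1
}
status=0
name=$(lookup) || status=$?
echo "status=$status name='$name'"
```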

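      The log shows that the script expects exactly one environment variable with the `PCIDEVICE_` prefix, but `env | grep PCIDEVICE_` matches two: `PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK` and `PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK_INFO`. Because the script cannot determine which one is relevant, it fails. In addition, `get_pci_device_env_var` runs inside a command substitution, so its `exit 1` only terminates the subshell: the error message is assigned to `pci_device_env_name`, and the script then fails at line 73 of /opt/scripts/set_traffic_gen_cfg_file.sh with `invalid variable name`.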
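
      One way the lookup could be made unambiguous is to match variable names rather than full `env` lines, skip the `_INFO` companion variable, and propagate failures to the caller. This is a minimal bash sketch, assuming each SR-IOV resource exposes exactly one `PCIDEVICE_<RESOURCE>` variable plus a `PCIDEVICE_<RESOURCE>_INFO` companion, as seen in the log. The function names mirror those in the trace, but the bodies are hypothetical; they are not the actual contents of /opt/scripts/set_traffic_gen_cfg_file.sh or an upstream fix.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical hardened lookup, not the upstream fix.
# Assumption: one PCIDEVICE_<RESOURCE> variable plus a PCIDEVICE_<RESOURCE>_INFO
# companion per resource, as in the traffic generator trace.

get_pci_device_env_var() {
  local names count
  # List exported variable *names* only, and drop the _INFO companions.
  names=$(compgen -e | grep '^PCIDEVICE_' | grep -v '_INFO$' || true)
  count=$(printf '%s' "$names" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "error: expected 1 PCIDEVICE_ resource variable, found $count" >&2
    return 1
  fi
  echo "$names"
}

set_pci_addresses() {
  local pci_device_env_name
  local -a nics_array
  # `|| exit 1` runs in the caller's shell, so the failure is not swallowed
  # the way an `exit` inside the command substitution is.
  pci_device_env_name=$(get_pci_device_env_var) || exit 1
  # Indirect expansion: read the value of the variable whose name was found.
  IFS=, read -r -a nics_array <<< "${!pci_device_env_name}"
  printf 'NIC PCI address: %s\n' "${nics_array[@]}"
}

# Usage with the values from the trace (the _INFO value is shortened).
export PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK=0000:19:0a.1,0000:19:0a.0
export PCIDEVICE_OPENSHIFT_IO_INTEL_NICS_DPDK_INFO='{"0000:19:0a.0":{},"0000:19:0a.1":{}}'
set_pci_addresses
```

      With the trace's values, this prints one line per VF address (0000:19:0a.1, then 0000:19:0a.0). Selecting names via `compgen -e` avoids parsing JSON values out of `env` output, and a genuinely ambiguous environment (two resources) still fails loudly instead of passing an error string on as a variable name.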

      Actual results:
      The traffic generator pod fails to start and goes into CrashLoopBackOff.

      Expected results:
      The traffic generator should start successfully and generate traffic.


            ralavi@redhat.com Ram Lavi
            ysegev@redhat.com Yossi Segev