OpenShift Bugs / OCPBUGS-23311

Intel 710 SR-IOV: Pod Initialization Time Exceeds 1 Minute

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • 4.12.z
    • Networking / SR-IOV
    • None
    • Quality / Stability / Reliability
    • False
    • None
    • Important
    • No
    • 2024-03-20 Dev - Initially, the RHEL team asked us to run this test with a fix they had for a similar bug, and the test passed. We waited some weeks for the fix to be included in the latest 4.12. Once it was, the test was run again and failed with the same initial error. This was reported back (with logs) to the RHEL team, and a reminder was sent the following week because no response had been received. The dev engineer then went on PTO for 3 weeks, during which the bug was unattended and the RHEL team still did not respond. Last week, the test was run again with the latest 4.12 to confirm the bug was still present, and the logs were once again shared with the RHEL team. If the RHEL engineer assigned to the ticket has not answered by the end of this week, we will ping another engineer who is working on another X710 bug.
    • CNF Network Sprint 245, CNF Network Sprint 246, CNF Network Sprint 247, CNF Network Sprint 248, CNF Network Sprint 249, CNF Network Sprint 250, CNF Network Sprint 251, CNF Network Sprint 252
    • 8

      Description of problem:

      Starting a pod with an SR-IOV interface takes up to 2 minutes.

      Version-Release number of selected component (if applicable):

      4.12.41
      Intel 710 NIC

      How reproducible:

      50%

      Steps to Reproduce:

      1. Create the SR-IOV configuration (SR-IOV policy and network).
      2. Create 2 pods with SR-IOV NICs.

      Actual results:

      $ oc get pods -n sriov-operator-tests 
      NAME            READY   STATUS     RESTARTS   AGE
      testpod-j4jl2   0/1     Init:0/1   0          38s
      testpod-pw6j4   2/2     Running    0          63s
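
When the reproduction is repeated many times (the issue shows up in roughly half the runs), it can help to flag stuck pods automatically instead of reading the table by eye. The following is a minimal sketch, assuming the `oc get pods` output has been captured as text; `pods_not_running` is a hypothetical helper name, and the sample is copied from the output above.

```python
# Sample copied from the "Actual results" output above.
SAMPLE = """\
NAME            READY   STATUS     RESTARTS   AGE
testpod-j4jl2   0/1     Init:0/1   0          38s
testpod-pw6j4   2/2     Running    0          63s
"""


def pods_not_running(table: str) -> list[tuple[str, str]]:
    """Return (name, status) for every pod whose STATUS is not Running."""
    rows = [line.split() for line in table.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    # Locate columns by header name rather than by fixed position.
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    return [
        (row[name_col], row[status_col])
        for row in body
        if row[status_col] != "Running"
    ]


print(pods_not_running(SAMPLE))
```

This relies on whitespace-separated columns, which holds for the default `oc get pods` table shown here; it is not meant as a general parser.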

      Expected results:

      $ oc get pods -n sriov-operator-tests 
      NAME            READY   STATUS     RESTARTS   AGE
      testpod-j4jl2   0/1     Running    0          38s
      testpod-pw6j4   2/2     Running    0          63s

      Additional info:

      Nov 13 21:17:10 worker-0 kubenswrapper[5794]: E1113 21:17:10.821142    5794 remote_runtime.go:222] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create pod network sandbox k8s_testpod-htpnt_sriov-operator-tests_828ef151-f323-4722-950e-b3f807c0164a_0(4173993d011d6c27995bdcfe96ab9a96c292361809732f0e739c4d2e6b83b0d9): error adding pod sriov-operator-tests_testpod-htpnt to CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (add): [sriov-operator-tests/testpod-htpnt/828ef151-f323-4722-950e-b3f807c0164a:test-sriov-static-jumbo-diff]: error adding container to network \"test-sriov-static-jumbo-diff\": SRIOV-CNI failed to load netconf: LoadConf(): the VF 0000:3b:0a.2 does not have a interface name or a dpdk driver"
      Nov 13 21:17:10 worker-0 kubenswrapper[5794]: E1113 21:17:10.821255    5794 kuberuntime_sandbox.go:71] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create pod network sandbox k8s_testpod-htpnt_sriov-operator-tests_828ef151-f323-4722-950e-b3f807c0164a_0(4173993d011d6c27995bdcfe96ab9a96c292361809732f0e739c4d2e6b83b0d9): error adding pod sriov-operator-tests_testpod-htpnt to CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (add): [sriov-operator-tests/testpod-htpnt/828ef151-f323-4722-950e-b3f807c0164a:test-sriov-static-jumbo-diff]: error adding container to network \"test-sriov-static-jumbo-diff\": SRIOV-CNI failed to load netconf: LoadConf(): the VF 0000:3b:0a.2 does not have a interface name or a dpdk driver" pod="sriov-operator-tests/testpod-htpnt"
      Nov 13 21:17:10 worker-0 kubenswrapper[5794]: E1113 21:17:10.821338    5794 kuberuntime_manager.go:772] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create pod network sandbox k8s_testpod-htpnt_sriov-operator-tests_828ef151-f323-4722-950e-b3f807c0164a_0(4173993d011d6c27995bdcfe96ab9a96c292361809732f0e739c4d2e6b83b0d9): error adding pod sriov-operator-tests_testpod-htpnt to CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (add): [sriov-operator-tests/testpod-htpnt/828ef151-f323-4722-950e-b3f807c0164a:test-sriov-static-jumbo-diff]: error adding container to network \"test-sriov-static-jumbo-diff\": SRIOV-CNI failed to load netconf: LoadConf(): the VF 0000:3b:0a.2 does not have a interface name or a dpdk driver" pod="sriov-operator-tests/testpod-htpnt"
      Nov 13 21:17:10 worker-0 kubenswrapper[5794]: E1113 21:17:10.821482    5794 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"testpod-htpnt_sriov-operator-tests(828ef151-f323-4722-950e-b3f807c0164a)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"testpod-htpnt_sriov-operator-tests(828ef151-f323-4722-950e-b3f807c0164a)\\\": rpc error: code = Unknown desc = failed to create pod network sandbox k8s_testpod-htpnt_sriov-operator-tests_828ef151-f323-4722-950e-b3f807c0164a_0(4173993d011d6c27995bdcfe96ab9a96c292361809732f0e739c4d2e6b83b0d9): error adding pod sriov-operator-tests_testpod-htpnt to CNI network \\\"multus-cni-network\\\": plugin type=\\\"multus\\\" name=\\\"multus-cni-network\\\" failed (add): [sriov-operator-tests/testpod-htpnt/828ef151-f323-4722-950e-b3f807c0164a:test-sriov-static-jumbo-diff]: error adding container to network \\\"test-sriov-static-jumbo-diff\\\": SRIOV-CNI failed to load netconf: LoadConf(): the VF 0000:3b:0a.2 does not have a interface name or a dpdk driver\"" pod="sriov-operator-tests/testpod-htpnt" podUID=828ef151-f323-4722-950e-b3f807c0164a
      
      
      
      
      sh-4.4# ls /var/lib/cni/sriov/pci/
      0000:3b:02.3  0000:3b:02.4  0000:3b:02.5
      
      
      
      Nov 13 21:17:49 worker-0 kubenswrapper[5794]: I1113 21:17:49.382939    5794 generic.go:296] "Generic (PLEG): container finished" podID=828ef151-f323-4722-950e-b3f807c0164a containerID="720c2247b0a46e0a97331ac7c1b0f1f074ca53daa16d9f0047444c09050b6106" exitCode=0
      Nov 13 21:17:49 worker-0 kubenswrapper[5794]: I1113 21:17:49.383025    5794 kubelet.go:2157] "SyncLoop (PLEG): event for pod" pod="sriov-operator-tests/testpod-htpnt" event=&{ID:828ef151-f323-4722-950e-b3f807c0164a Type:ContainerDied Data:720c2247b0a46e0a97331ac7c1b0f1f074ca53daa16d9f0047444c09050b6106}
      Nov 13 21:17:50 worker-0 crio[5622]: time="2023-11-13 21:17:50.387018492Z" level=info msg="Stopping pod sandbox: 8d0b0376432619710809d7330d40afd7046297b24e8e724089cbcd65c835e4c3" id=e88bfc0f-8f72-4b76-93fb-b566dd79234b name=/runtime.v1.RuntimeService/StopPodSandbox
      Nov 13 21:17:50 worker-0 crio[5622]: time="2023-11-13 21:17:50.387621262Z" level=info msg="Got pod network &{Name:testpod-htpnt Namespace:sriov-operator-tests ID:8d0b0376432619710809d7330d40afd7046297b24e8e724089cbcd65c835e4c3 UID:828ef151-f323-4722-950e-b3f807c0164a NetNS:/var/run/netns/b458c6fb-ee42-423c-b7f2-3c12de6efac0 Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[]}] Aliases:map[]}"
      Nov 13 21:17:50 worker-0 crio[5622]: time="2023-11-13 21:17:50.387962449Z" level=info msg="Deleting pod sriov-operator-tests_testpod-htpnt from CNI network \"multus-cni-network\" (type=multus)"
      Nov 13 21:17:50 worker-0 kernel: i40e 0000:3b:00.1: Setting MAC 86:36:27:f3:da:5a on VF 2
      Nov 13 21:17:50 worker-0 kernel: iavf 0000:3b:0a.2: Reset indication received from the PF
      Nov 13 21:17:50 worker-0 kernel: iavf 0000:3b:0a.2: Scheduling reset task
      Nov 13 21:17:50 worker-0 kernel: i40e 0000:3b:00.1: Bring down and up the VF interface to make this change effective.
      Nov 13 21:17:50 worker-0 kernel: iavf 0000:3b:0a.2 ens1f1v2: renamed from net1
      Nov 13 21:17:50 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:50 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:50 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:50 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:50 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:50 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:50 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:50 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:51 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:51 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:51 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:51 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:51 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:51 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:51 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:51 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:51 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:51 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:51 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:51 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:51 worker-0 ovs-vswitchd[2588]: ovs|13843|connmgr|INFO|br-ex<->unix#33819: 2 flow_mods in the last 0 s (2 adds)
      Nov 13 21:17:51 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:51 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:51 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:51 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:51 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:51 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:52 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:52 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:52 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:52 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:52 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:52 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:52 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:52 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:52 worker-0 kubenswrapper[5794]: I1113 21:17:52.390204    5794 scope.go:115] "RemoveContainer" containerID="4808e9c8021075e2203405509d5673ec1af3af66c932a9921d83ac700c589f17"
      Nov 13 21:17:52 worker-0 crio[5622]: time="2023-11-13 21:17:52.391640418Z" level=info msg="Removing container: 4808e9c8021075e2203405509d5673ec1af3af66c932a9921d83ac700c589f17" id=875453cb-af71-4a33-8cfa-7cc3e34a8378 name=/runtime.v1.RuntimeService/RemoveContainer
      Nov 13 21:17:52 worker-0 kernel: i40e 0000:3b:00.1: VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation
      Nov 13 21:17:52 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM
      Nov 13 21:17:52 worker-0 systemd[1]: var-lib-containers-storage-overlay-76a8ebdf79f026051a1cf6195a539da590d8dad445c77edc6fe63792c03aef82-merged.mount: Succeeded.
      Nov 13 21:17:52 worker-0 systemd[1]: var-lib-containers-storage-overlay-76a8ebdf79f026051a1cf6195a539da590d8dad445c77edc6fe63792c03aef82-merged.mount: Consumed 0 CPU time
      Nov 13 21:17:52 worker-0 systemd[1]: run-runc-5961ba24ef7b0c8aeb280cf52e8129661a2369be902f845ebd887ecab126ca5d-runc.hWbXm7.mount: Succeeded.
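
The excerpt above repeats the same iavf failure many times within about two seconds. To quantify that from a saved journal, a small script can count the failures per VF. This is a minimal sketch, not part of the original report: `count_mac_filter_failures` is a hypothetical helper, the regular expression is an assumption based only on the message format visible in these logs, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Three kernel lines copied from the excerpt above.
SAMPLE = [
    "Nov 13 21:17:50 worker-0 kernel: iavf 0000:3b:0a.2: Reset indication received from the PF",
    "Nov 13 21:17:50 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM",
    "Nov 13 21:17:51 worker-0 kernel: iavf 0000:3b:0a.2: Failed to add MAC filter, error IAVF_ERR_NVM",
]

# Assumed message format, taken from the log lines in this report.
MAC_FILTER_FAILURE = re.compile(
    r"kernel: iavf (?P<pci>\S+): Failed to add MAC filter, error (?P<err>\S+)"
)


def count_mac_filter_failures(lines):
    """Count 'Failed to add MAC filter' messages per (VF PCI address, error)."""
    counts = Counter()
    for line in lines:
        match = MAC_FILTER_FAILURE.search(line)
        if match:
            counts[(match.group("pci"), match.group("err"))] += 1
    return counts


for (pci, err), n in count_mac_filter_failures(SAMPLE).items():
    print(f"{pci}: {n} x {err}")
```

To run it against a real node, the lines could be read from a file holding the journal output instead of `SAMPLE`.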
      
      
      
      There is no name, MTU, or MAC on the problematic VF:
      
      - Vfs:
          - deviceID: 154c
            driver: iavf
            mac: 16:0c:59:e4:15:a5
            mtu: 1500
            name: eth12
            pciAddress: 0000:3b:0a.0
            vendor: "8086"
            vfID: 0
          - deviceID: 154c
            driver: iavf
            pciAddress: 0000:3b:0a.1
            vendor: "8086"
            vfID: 1
          - deviceID: 154c
            driver: iavf
            mac: 8e:6a:48:82:35:dd
            mtu: 1500
            name: ens1f1v10
            pciAddress: 0000:3b:0b.2
            vendor: "8086"
            vfID: 10
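
The missing fields can also be detected programmatically. The sketch below assumes the VF list has already been loaded into Python dictionaries (here hard-coded from the listing above); `incomplete_vfs` and `REQUIRED_FIELDS` are hypothetical names chosen for illustration.

```python
# VF entries copied from the listing above.
VFS = [
    {"deviceID": "154c", "driver": "iavf", "mac": "16:0c:59:e4:15:a5",
     "mtu": 1500, "name": "eth12", "pciAddress": "0000:3b:0a.0",
     "vendor": "8086", "vfID": 0},
    {"deviceID": "154c", "driver": "iavf", "pciAddress": "0000:3b:0a.1",
     "vendor": "8086", "vfID": 1},
    {"deviceID": "154c", "driver": "iavf", "mac": "8e:6a:48:82:35:dd",
     "mtu": 1500, "name": "ens1f1v10", "pciAddress": "0000:3b:0b.2",
     "vendor": "8086", "vfID": 10},
]

# Fields the report notes as absent on the problematic VF.
REQUIRED_FIELDS = ("name", "mtu", "mac")


def incomplete_vfs(vfs):
    """Return (pciAddress, missing_fields) for each VF lacking a required field."""
    result = []
    for vf in vfs:
        missing = [field for field in REQUIRED_FIELDS if field not in vf]
        if missing:
            result.append((vf["pciAddress"], missing))
    return result


for pci, missing in incomplete_vfs(VFS):
    print(f"{pci} is missing: {', '.join(missing)}")
```

For the listing above this reports only the entry with `vfID: 1`, the one VF without a name, MTU, or MAC.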
      
      

              rh-ee-marguerr Marcelo Guerrero Viveros
              rhn-cnf-elevin Evgeny Levin