OCPBUGS-38163

Error on deleting a pod using sriov netdevice


    • Bug
    • Resolution: Done-Errata
    • Critical
    • None
    • 4.13, 4.14
    • Networking / SR-IOV
    • None
    • No
    • CNF Network Sprint 257, CNF Network Sprint 258
    • 2
    • False
    • None
    • * An error might occur when deleting a pod that uses an SR-IOV network device. This error is caused by a change in {op-system-base} 9 where the previous name of a network interface is added to its alternative names list when it is renamed. As a consequence, when a pod attached to an SR-IOV virtual function (VF) is deleted, the VF returns to the pool with a new unexpected name, for example, `dev69`, instead of its original name, for example, `ensf0v2`. Although this error is non-fatal, Multus and SR-IOV logs might show the error while the system recovers on its own. Deleting the pod might take a few extra seconds due to this error. (link:https://issues.redhat.com/browse/OCPBUGS-11281[*OCPBUGS-11281*], link:https://issues.redhat.com/browse/OCPBUGS-18822[*OCPBUGS-18822*], link:https://issues.redhat.com/browse/RHEL-5988[*RHEL-5988*])
    • Known Issue
    • Done
    • 01/24: Potential PR needs to be tested; waiting for a reproducer to validate the fix.
      05/05: Dan C is looking into the upstream commit.
      05/03: Waiting on upstream commit. Requested forecast.

      This is a clone of issue OCPBUGS-11281. The following is the description of the original issue:

      Description of problem:

      It looks like there was a change in RHEL 9.2: every time we change an interface name, the old name is added to the interface's alternative names (altname).

       
      net1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
      link/ether c2:f5:9e:e2:05:a2 brd ff:ff:ff:ff:ff:ff
      altname enp216s0f1v9
      altname ens3f1v9
      When we remove a pod and sriov-cni tries to move the NIC back to the host and rename it, we get this error:
      0s Warning FailedKillPod pod/client error killing pod: failed to "KillPodSandbox" for "56fd1e57-c9f3-4aae-8ab7-3f6564e0c675" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_client_seba_56fd1e57-c9f3-4aae-8ab7-3f6564e0c675_0(8b0f6b4a6d8d639e72d0d27b41719389f002e26e6e723c019c716ff43abdb5e1): error removing pod seba_client from CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): DelegateDel: error invoking DelegateDel - \"sriov\": error in getting result from DelNetwork: failed to move interface ens3f1v6 to init netns: file exists"
      The NIC is then moved to the host with the wrong name:
      69: dev69: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
      link/ether c2:f5:9e:e2:05:a2 brd ff:ff:ff:ff:ff:ff
      altname enp216s0f1v9
      altname ens3f1v9
      Because the altname is still present, a new pod can start, but that is not the correct behavior.
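
To spot VFs that came back to the host in this state, the `ip link` output can be scanned for interfaces whose name looks like `dev<N>` but that still carry altnames. The following is a minimal sketch, not part of sriov-cni: the function names `parse_altnames` and `suspicious_interfaces` are hypothetical, and treating every `dev<N>` name as an unexpected one is an assumption based only on the `dev69` example in this report.

```python
import re

# Header line of an `ip link` entry, with or without the leading index,
# for example "69: dev69: <BROADCAST,...>" or "net1: <BROADCAST,...>".
_HEADER = re.compile(r"^(?:(\d+):\s+)?([^\s:@]+)(?:@\S+)?:\s+<")
_ALTNAME = re.compile(r"^\s*altname\s+(\S+)")
# Assumption: names of the form dev<N> are the unexpected ones.
_UNEXPECTED = re.compile(r"^dev\d+$")


def parse_altnames(ip_link_output):
    """Return {interface name: [altnames]} parsed from `ip link` text output."""
    result = {}
    current = None
    for line in ip_link_output.splitlines():
        header = _HEADER.match(line.strip())
        if header:
            current = header.group(2)
            result[current] = []
            continue
        alt = _ALTNAME.match(line)
        if alt and current is not None:
            result[current].append(alt.group(1))
    return result


def suspicious_interfaces(ip_link_output):
    """Return interfaces named like dev<N> that still have altnames."""
    return {
        name: alts
        for name, alts in parse_altnames(ip_link_output).items()
        if _UNEXPECTED.match(name) and alts
    }
```

Feeding it the host-side output quoted above would report `dev69` together with its two altnames, which is enough to recover the name the VF was expected to have.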

       

      I will work on a PR to remove the altname after we do a rename. That way, we will be able to move the interface back to its original name in the sriov-cni delete/detach path.
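
The idea of dropping the altname right after a rename can be illustrated at the command level with iproute2. This is only a sketch under stated assumptions: the actual change belongs in sriov-cni and is not shown in this issue, and the helper names `rename_commands` and `rename_and_drop_altname` are hypothetical. It relies on `ip link set dev ... name ...` and `ip link property del dev ... altname ...`, and running it requires root.

```python
import subprocess


def rename_commands(old_name, new_name):
    """Build the iproute2 commands: rename, then drop the old name from the altnames."""
    return [
        ["ip", "link", "set", "dev", old_name, "name", new_name],
        ["ip", "link", "property", "del", "dev", new_name, "altname", old_name],
    ]


def rename_and_drop_altname(old_name, new_name, runner=subprocess.run):
    """Rename an interface and remove the altname left behind by the rename.

    The interface typically has to be down for the rename to succeed.
    """
    rename, drop_altname = rename_commands(old_name, new_name)
    runner(rename, check=True)
    # On systems that do not record the old name as an altname, the delete
    # fails because there is nothing to remove, so a failure is tolerated here.
    runner(drop_altname, check=False)
```

For example, `rename_and_drop_altname("ens3f1v9", "net1")` would leave `net1` without the `ens3f1v9` altname, so the original name stays free when the interface is later renamed back. The `runner` parameter exists only so the command sequence can be checked without touching real interfaces.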

       

            sscheink@redhat.com Sebastian Scheinkman
            openshift-crt-jira-prow OpenShift Prow Bot
            Evgeny Levin