OpenShift Bugs / OCPBUGS-11281

Error on deleting a pod using sriov netdevice

Details

    • No
    • CNF Network Sprint 234, CNF Network Sprint 236, CNF Network Sprint 237, CNF Network Sprint 238, CNF Network Sprint 239, CNF Network Sprint 242, CNF Network Sprint 248, CNF Network Sprint 251, CNF Network Sprint 252
    • 9
    • False
    • An error might occur when deleting a pod that uses an SR-IOV network device. This error is caused by a change in {op-system-base} 9 where the previous name of a network interface is added to its alternative names list when it is renamed. As a consequence, when a pod attached to an SR-IOV virtual function (VF) is deleted, the VF returns to the pool with a new unexpected name, for example, `dev69`, instead of its original name, for example, `ensf0v2`. Although this error is non-fatal, Multus and SR-IOV logs might show the error while the system recovers on its own. Deleting the pod might take a few extra seconds due to this error. (link:https://issues.redhat.com/browse/OCPBUGS-11281[*OCPBUGS-11281*], link:https://issues.redhat.com/browse/OCPBUGS-18822[*OCPBUGS-18822*], link:https://issues.redhat.com/browse/RHEL-5988[*RHEL-5988*])
    • Known Issue
    • Done
    • 01/24: Potential PR needs to be tested; waiting for a reproducer to validate the fix.
      05/05: Dan C is looking into the upstream commit.
      05/03: Waiting on upstream commit. Requested forecast.

    Description

      Description of problem:

      It looks like there was a change in RHEL 9.2: every time an interface is renamed, the old name is added to its alternative names (altname).

       
      net1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
      link/ether c2:f5:9e:e2:05:a2 brd ff:ff:ff:ff:ff:ff
      altname enp216s0f1v9
      altname ens3f1v9
      When we remove a pod and sriov-cni tries to move the NIC back to the host and rename it, we get this error:
      0s Warning FailedKillPod pod/client error killing pod: failed to "KillPodSandbox" for "56fd1e57-c9f3-4aae-8ab7-3f6564e0c675" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_client_seba_56fd1e57-c9f3-4aae-8ab7-3f6564e0c675_0(8b0f6b4a6d8d639e72d0d27b41719389f002e26e6e723c019c716ff43abdb5e1): error removing pod seba_client from CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): DelegateDel: error invoking DelegateDel - \"sriov\": error in getting result from DelNetwork: failed to move interface ens3f1v6 to init netns: file exists"
      The NIC is then moved to the host with the wrong name:
      69: dev69: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
      link/ether c2:f5:9e:e2:05:a2 brd ff:ff:ff:ff:ff:ff
      altname enp216s0f1v9
      altname ens3f1v9
      Because of the altname, a new pod can still start, but relying on that is not the right approach.
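To spot affected VFs on a node, the `ip link` output shown above can be scanned for interfaces whose primary name looks like `dev69` while the original name survives only as an altname. The following is a minimal sketch, not part of sriov-cni. The helper names `parse_altnames` and `misnamed_interfaces` are hypothetical, and treating any `dev<N>` name as a misnamed VF is an assumption based on the single example in this report.

```python
import re

# Header line of an `ip link` entry: optional "69: " index, then "name:" and "<FLAGS>".
HEADER = re.compile(r"^(?:\d+:\s+)?(?P<name>[^\s:@]+)(?:@\S+)?:\s+<")
ALTNAME = re.compile(r"^altname\s+(?P<alt>\S+)$")


def parse_altnames(text):
    """Map each interface name to the list of altnames printed under it."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.group("name")
            result[current] = []
            continue
        alt = ALTNAME.match(line)
        if alt and current is not None:
            result[current].append(alt.group("alt"))
    return result


def misnamed_interfaces(text):
    """Return interfaces named `dev<N>` (assumed pattern) that still carry altnames."""
    return {
        name: alts
        for name, alts in parse_altnames(text).items()
        if re.fullmatch(r"dev\d+", name) and alts
    }


# Sample taken from the output in this report.
SAMPLE = """\
69: dev69: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c2:f5:9e:e2:05:a2 brd ff:ff:ff:ff:ff:ff
    altname enp216s0f1v9
    altname ens3f1v9
"""

if __name__ == "__main__":
    print(misnamed_interfaces(SAMPLE))
```

On a real node, the text would come from `ip link show` run in the host network namespace. The sketch only reports candidates and does not rename anything.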

       

      I will work on a PR to remove the altname after we do a rename. This way, we will be able to move the interface back to its original name in the sriov-cni delete/detach path.
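As a rough manual equivalent of the idea (this is not the planned PR, which would live inside sriov-cni itself), the stale altname can be dropped with the iproute2 `ip link property del` subcommand. The sketch below only builds the command lines and does not execute them. The helper name `altname_cleanup_commands` is hypothetical, and it is an assumption that removing only the old primary name (here `ens3f1v9`) is enough to avoid the "file exists" error.

```python
def altname_cleanup_commands(device, altnames):
    """Build one `ip link property del` command per altname (nothing is executed)."""
    return [
        ["ip", "link", "property", "del", "dev", device, "altname", alt]
        for alt in altnames
    ]


if __name__ == "__main__":
    # Device and altname taken from the output in this report.
    for cmd in altname_cleanup_commands("net1", ["ens3f1v9"]):
        print(" ".join(cmd))
```

The commands would need to run in the network namespace that currently holds the interface, with sufficient privileges, for example by passing each list to `subprocess.run`.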

       

            People

              sscheink@redhat.com Sebastian Scheinkman
              sscheink@redhat.com Sebastian Scheinkman
              Evgeny Levin Evgeny Levin