OpenShift Bugs: OCPBUGS-37087

Failed to create pod sandbox when using several NADs using the same OVN localnet bridge mapping


    • Critical
    • None
    • False
    • 07/23: NAD config change doesn't take effect. Possibly a timing issue. Workaround is in the linked KCS. BQI: Fair

      Description of problem:
      This issue happens after some NADs using the same OVN localnet bridge mapping have been reconfigured with a different vlanID or no vlanID at all.

      Initial state (all working OK):

      • NNCP with a single OVN bridge-mapping.
      • 2 NADs in different namespaces with the same vlanID configuration and using the bridge mapping.
      • 2 VMs, one in each NAD namespace, both attached to the network. The virt-launcher pods run fine.

      After that, I stop the VMs and reconfigure the NADs, removing the vlanID. When the VMs are started again, only one of them runs; the other fails with a FailedCreatePodSandBox event.

      Version-Release number of selected component (if applicable):
      OCP 4.15.20
      OCP Virt 4.15.2
      OVN-Kubernetes

      How reproducible:

      Most of the time, but not always.

      Steps to Reproduce:

      1. Create 2 namespaces: test1 and test2

      2. Create a NNCP with a bridge mapping using br-ex:

      spec:
        desiredState:
          ovn:
            bridge-mappings:
            - bridge: br-ex
              localnet: localnet-test-1
              state: present
        nodeSelector:
          node-role.kubernetes.io/worker: ""

      3. Create 2 NADs using the same vlanID and bridge mapping:

      ---
      apiVersion: k8s.cni.cncf.io/v1
      kind: NetworkAttachmentDefinition
      metadata:
        name: nad-test
        namespace: test1
      spec:
        config: '{"name":"localnet-test-1","type":"ovn-k8s-cni-overlay","cniVersion":"0.3.1","topology":"localnet","vlanID":100,"netAttachDefName":"test1/nad-test"}'
        
      ---
      apiVersion: k8s.cni.cncf.io/v1
      kind: NetworkAttachmentDefinition
      metadata:
        name: nad-test
        namespace: test2
      spec:
        config: '{"name":"localnet-test-1","type":"ovn-k8s-cni-overlay","cniVersion":"0.3.1","topology":"localnet","vlanID":100,"netAttachDefName":"test2/nad-test"}'  

      4. In each namespace, create a VM using that namespace's NAD and start it --> Works OK.
      5. Stop the VMs
      6. Recreate the NADs without the vlanID:

      ---
      apiVersion: k8s.cni.cncf.io/v1
      kind: NetworkAttachmentDefinition
      metadata:
        name: nad-test
        namespace: test1
      spec:
        config: '{"name":"localnet-test-1","type":"ovn-k8s-cni-overlay","cniVersion":"0.3.1","topology":"localnet","netAttachDefName":"test1/nad-test"}'
        
      ---
      apiVersion: k8s.cni.cncf.io/v1
      kind: NetworkAttachmentDefinition
      metadata:
        name: nad-test
        namespace: test2
      spec:
        config: '{"name":"localnet-test-1","type":"ovn-k8s-cni-overlay","cniVersion":"0.3.1","topology":"localnet","netAttachDefName":"test2/nad-test"}'  
      

      7. Start the VMs.
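
The NAD configs in steps 3 and 6 differ only in the presence of the vlanID key. As a minimal sketch for anyone scripting the reproduction, the config strings can be generated as below. The helper name `build_nad_config` is hypothetical and not part of any OpenShift tooling; the field values are copied from the NADs in this report.

```python
import json


def build_nad_config(namespace, nad_name, network_name, vlan_id=None):
    """Return the JSON string used as spec.config of a localnet NAD.

    Hypothetical helper for illustration; vlan_id=None omits the vlanID key,
    which corresponds to the reconfigured NADs in step 6.
    """
    config = {
        "name": network_name,
        "type": "ovn-k8s-cni-overlay",
        "cniVersion": "0.3.1",
        "topology": "localnet",
    }
    if vlan_id is not None:
        config["vlanID"] = vlan_id
    # netAttachDefName is <namespace>/<NAD name>, as in the NADs above.
    config["netAttachDefName"] = f"{namespace}/{nad_name}"
    # Compact separators reproduce the single-line form used in this report.
    return json.dumps(config, separators=(",", ":"))


if __name__ == "__main__":
    for ns in ("test1", "test2"):
        print(build_nad_config(ns, "nad-test", "localnet-test-1", vlan_id=100))
        print(build_nad_config(ns, "nad-test", "localnet-test-1"))
```

The printed strings can be pasted into the `config:` field of the corresponding NetworkAttachmentDefinition.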

      Actual results:
      One VM starts fine and the other one doesn't. The pod is stuck in ContainerCreating and this event is logged:

       

      {{test1                                  0s                        Warning   FailedCreatePodSandBox              Pod/virt-launcher-rhel9-purple-vole-57-4jrzd                         Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_virt-launcher-rhel9-purple-vole-57-4jrzd_test1_01b21877-0c88-4d2c-85d2-6d9d6db838ea_0(8aec0fe53b0e279abbb12dc5931c3385f0a6764dee995c0d15cd709b927fee69): error adding pod test1_virt-launcher-rhel9-purple-vole-57-4jrzd to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&

      {ContainerID:8aec0fe53b0e279abbb12dc5931c3385f0a6764dee995c0d15cd709b927fee69 Netns:/var/run/netns/3cffb6ba-f4df-4801-9430-51446e42ee40 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=test1;K8S_POD_NAME=virt-launcher-rhel9-purple-vole-57-4jrzd;K8S_POD_INFRA_CONTAINER_ID=8aec0fe53b0e279abbb12dc5931c3385f0a6764dee995c0d15cd709b927fee69;K8S_POD_UID=01b21877-0c88-4d2c-85d2-6d9d6db838ea Path: StdinData:[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 108 105 98 47 99 110 105 47 98 105 110 34 44 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 115 116 114 111 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 47 49 48 45 111 118 110 45 107 117 98 101 114 110 101 116 101 115 46 99 111 110 102 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 97 101 109 111 110 83 111 99 107 101 116 68 105 114 34 58 34 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 103 108 111 98 97 108 78 97 109 101 115 112 97 99 101 115 34 58 34 100 101 102 97 117 108 116 44 111 112 101 110 115 104 105 102 116 45 109 117 108 116 117 115 44 111 112 101 110 115 104 105 102 116 45 115 114 105 111 118 45 110 101 116 119 111 114 107 45 111 112 101 114 97 116 111 114 34 44 34 108 111 103 76 101 118 101 108 34 58 34 118 101 114 98 111 115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111 99 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97 117 116 111 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 
116 119 111 114 107 34 44 34 110 97 109 101 115 112 97 99 101 73 115 111 108 97 116 105 111 110 34 58 116 114 117 101 44 34 112 101 114 78 111 100 101 67 101 114 116 105 102 105 99 97 116 101 34 58 123 34 98 111 111 116 115 116 114 97 112 75 117 98 101 99 111 110 102 105 103 34 58 34 47 118 97 114 47 108 105 98 47 107 117 98 101 108 101 116 47 107 117 98 101 99 111 110 102 105 103 34 44 34 99 101 114 116 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 109 117 108 116 117 115 47 99 101 114 116 115 34 44 34 99 101 114 116 68 117 114 97 116 105 111 110 34 58 34 50 52 104 34 44 34 101 110 97 98 108 101 100 34 58 116 114 117 101 125 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 116 121 112 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125]}

      ContainerID:"8aec0fe53b0e279abbb12dc5931c3385f0a6764dee995c0d15cd709b927fee69" Netns:"/var/run/netns/3cffb6ba-f4df-4801-9430-51446e42ee40" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=test1;K8S_POD_NAME=virt-launcher-rhel9-purple-vole-57-4jrzd;K8S_POD_INFRA_CONTAINER_ID=8aec0fe53b0e279abbb12dc5931c3385f0a6764dee995c0d15cd709b927fee69;K8S_POD_UID=01b21877-0c88-4d2c-85d2-6d9d6db838ea" Path:"" ERRORED: error configuring pod [test1/virt-launcher-rhel9-purple-vole-57-4jrzd] networking: [test1/virt-launcher-rhel9-purple-vole-57-4jrzd/01b21877-0c88-4d2c-85d2-6d9d6db838ea:localnet-test-1]: error adding container to network "localnet-test-1": CNI request failed with status 400: '[test1/virt-launcher-rhel9-purple-vole-57-4jrzd 8aec0fe53b0e279abbb12dc5931c3385f0a6764dee995c0d15cd709b927fee69 network localnet-test-1 NAD test1/nad-test] [test1/virt-launcher-rhel9-purple-vole-57-4jrzd 8aec0fe53b0e279abbb12dc5931c3385f0a6764dee995c0d15cd709b927fee69 network localnet-test-1 NAD test1/nad-test] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded}}
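
The StdinData field in the event above is printed as a list of decimal byte values, which makes the multus configuration hard to read. A small sketch for turning such a dump back into text follows. The function name `decode_stdin_data` is hypothetical, and the sample is only the first few bytes of the dump with a closing brace (125) appended so that it parses as JSON.

```python
import json


def decode_stdin_data(dump):
    """Convert a byte dump such as '[123 34 ...]' into a UTF-8 string.

    Whitespace of any kind (including line breaks inside the dump) is
    accepted between the numbers.
    """
    numbers = dump.strip().strip("[]").split()
    return bytes(int(n) for n in numbers).decode("utf-8")


if __name__ == "__main__":
    # First bytes of the StdinData dump, plus 125 ('}') added for this sample.
    sample = (
        "[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 "
        "108 105 98 47 99 110 105 47 98 105 110 34 125]"
    )
    text = decode_stdin_data(sample)
    print(json.dumps(json.loads(text), indent=2))
```

Passing the full bracketed list from the event should yield the complete multus-shim configuration as JSON.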

       

      Expected results:
      Both pods start and use the new vlanID configuration.

      Additional info:
      This issue is similar to the one described in this bug: https://issues.redhat.com/browse/OCPBUGS-31679
      However, that bug only covers using the same bridge mapping from different NADs with different vlanIDs. In this case, all NADs use the same vlanID configuration.
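
When comparing this case with the linked bug, it can help to confirm that the NADs really do agree at a given moment, for example while they are being recreated one after another. The sketch below is a hypothetical triage helper, not OVN-Kubernetes logic: it assumes that NADs sharing a network name are expected to carry the same settings apart from netAttachDefName, and it reports the network names where that does not hold.

```python
import json
from collections import defaultdict


def find_conflicting_networks(nad_configs):
    """Return network names whose NAD configs differ from each other.

    nad_configs is an iterable of spec.config JSON strings. The keys
    'name' and 'netAttachDefName' are ignored when comparing, since the
    first is the grouping key and the second is unique per NAD.
    """
    variants_by_network = defaultdict(set)
    for raw in nad_configs:
        cfg = json.loads(raw)
        network = cfg.pop("name")
        cfg.pop("netAttachDefName", None)
        # Canonical form so that key order does not matter.
        variants_by_network[network].add(json.dumps(cfg, sort_keys=True))
    return sorted(n for n, v in variants_by_network.items() if len(v) > 1)


if __name__ == "__main__":
    old = '{"name":"localnet-test-1","type":"ovn-k8s-cni-overlay","cniVersion":"0.3.1","topology":"localnet","vlanID":100,"netAttachDefName":"test1/nad-test"}'
    new = '{"name":"localnet-test-1","type":"ovn-k8s-cni-overlay","cniVersion":"0.3.1","topology":"localnet","netAttachDefName":"test2/nad-test"}'
    print(find_conflicting_networks([old, new]))
```

An empty result means all NADs of each network agree on vlanID and the other compared fields.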

              sdn-team-bot sdn-team bot
              rhn-support-jortialc Juan Orti
              Anurag Saxena Anurag Saxena
              Jaime Caamaño Ruiz