OpenShift Bugs / OCPBUGS-50898

OVN secondary localnet network is not working well in pod

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Affects Version: 4.17.z
    • Component: Networking / multus
    • Severity: Important
      Description of problem:

      After creating an OVN localnet network tenantblue and a pod annotated with k8s.v1.cni.cncf.io/networks: tenantblue,
      the pod can reach the outside through the default localnet physnet (device eth0 in the pod), but it cannot ping a worker node or the outside through the localnet tenantblue (device net1 in the pod).

      Version-Release number of selected component (if applicable):

      Client Version: 4.16.11
      Server Version: 4.17.16 (also tested with 4.17.7)
      Kubernetes Version: v1.30.7

      How reproducible:

      100%

      Steps to Reproduce:

      1. Deploy an OCP 4.17.16 cluster on 3 bare-metal servers, each with 2 interfaces: eno8303 (br-ex) and eno8403. The eno8403 interfaces on the 3 servers are connected to the same switch and can ping each other.

      The provisioning node has interfaces eno1 (baremetal) and eno4 (provisioning).

      The provisioning node's eno4 (172.22.0.0) and the 3 worker nodes' eno8403 interfaces (172.22.0.100/101/102) are connected to the same switch and can ping each other.
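      Before applying any OVN configuration, the L2 path over eno8403 can be sanity-checked from a node debug shell; a minimal sketch using the addresses listed above:

      # oc debug node/dell-per750-43.lab.eng.pek2.redhat.com
      # ip addr show eno8403
      # ping -c 3 -I eno8403 172.22.0.101
      # ping -c 3 -I eno8403 172.22.0.102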

      2. Create a NodeNetworkConfigurationPolicy that maps the localnet tenantblue to the bridge br-withvlan (full manifest in NNCP-chhu.yaml):

      # oc apply -f NNCP-chhu.yaml

      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: br-withvlan
      spec:
        desiredState:
          interfaces:
            - name: br-withvlan
              description: ovs bridge with secondary NIC eno8403 as a port with two VLAN configuration.
              type: ovs-bridge
              state: up
              bridge:
                options:
                  stp: true
                port:
                  - name: eno8403
                    vlan:
                      mode: trunk
                      trunk-tags:
                        - id: 10
                        - id: 20
          ovn:
            bridge-mappings:
            - localnet: tenantblue
              bridge: br-withvlan
              state: present 

      Check on a worker node:

      # oc debug node/dell-per750-43.lab.eng.pek2.redhat.com
      # nmcli device|grep br-withvlan
      br-withvlan                                                   ovs-bridge     connected               br-withvlan-br
      # nmstatectl show
      ...
      ovn:
        bridge-mappings:
        - localnet: physnet
          bridge: br-ex
        - localnet: tenantblue
          bridge: br-withvlan  
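      The trunk VLANs can also be confirmed at the OVS level from the same debug shell (a sketch; the output shown is what the NNCP above should produce):

      # ovs-vsctl list-ports br-withvlan
      eno8403
      # ovs-vsctl get port eno8403 trunks
      [10, 20]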

      3. Create the NetworkAttachmentDefinition test/tenantblue with subnets: "192.168.20.0/24":

      apiVersion: k8s.cni.cncf.io/v1
      kind: NetworkAttachmentDefinition
      metadata:
        name: tenantblue
      spec:
        config: |2
          {
                  "cniVersion": "0.4.0",
                  "name": "tenantblue",
                  "type": "ovn-k8s-cni-overlay",
                  "topology":"localnet",
                  "subnets": "192.168.20.0/24",
                  "excludeSubnets": "192.168.20.1/32",
                  "vlanID": 20,
                  "netAttachDefName": "test/tenantblue"
          }     
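      Note that "vlanID": 20 must be one of the trunk-tags carried by br-withvlan (it is, per the NNCP in step 2). The applied NAD can be double-checked with, for example:

      # oc get net-attach-def tenantblue -n test -o jsonpath='{.spec.config}'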

      4. Create pods that use the localnet tenantblue (full manifests in Blue-POD1.yaml and Blue-POD2.yaml; see the sketch after the YAML below):

      kind: Pod
      apiVersion: v1
      metadata:
        name: tenantblue-pod1
        labels:
          name: tenantblue-pod1
        annotations:
          k8s.v1.cni.cncf.io/networks: tenantblue
      spec:
        securityContext:
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
        containers:
        - name: tenantblue-pod1
          image: quay.io/openshifttest/hello-sdn@sha256:c89445416459e7adea9a5a416b3365ed3d74f2491beb904d61dc8d1eb89a72a4
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]

      5. Check that the pods are running:

      # oc get pod -o wide
      NAME              READY   STATUS    RESTARTS   AGE     IP             NODE                                     NOMINATED NODE   READINESS GATES
      tenantblue-pod1   1/1     Running   0          7m16s   10.130.1.28    dell-per750-43.lab.eng.pek2.redhat.com   <none>           <none>
      tenantblue-pod2   1/1     Running   0          4m53s   10.128.1.217   dell-per750-44.lab.eng.pek2.redhat.com   <none>           <none> 
      
      # oc rsh -n openshift-ovn-kubernetes ovnkube-node-xl6pm
      sh-5.1# ovn-nbctl show
      switch 9302b2ea-e873-44c7-a711-f49df01894d7 (tenantblue_ovn_localnet_switch)
          port test.tenantblue_test_tenantblue-pod1
              addresses: ["0a:58:c0:a8:14:05 192.168.20.5"]
          port tenantblue_ovn_localnet_port
              type: localnet
              tag: 20
              addresses: ["unknown"]
      
      # oc rsh -n openshift-ovn-kubernetes ovnkube-node-cvtcc
      sh-5.1# ovn-nbctl show
      switch 39d3d476-19ff-428d-a1e6-8e71bdda191e (tenantblue_ovn_localnet_switch)
          port test.tenantblue_test_tenantblue-pod2
              addresses: ["0a:58:c0:a8:14:09 192.168.20.9"]
          port tenantblue_ovn_localnet_port
              type: localnet
              tag: 20
              addresses: ["unknown"]
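      The chassis-level bridge mapping can also be confirmed from inside the ovnkube-node pod (the expected value follows from the NNCP in step 2):

      sh-5.1# ovs-vsctl get Open_vSwitch . external-ids:ovn-bridge-mappings
      "physnet:br-ex,tenantblue:br-withvlan"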

      6. Check the IP addresses and routes in tenantblue-pod1:

       ~ $ ip r
      default via 10.129.0.1 dev eth0 
      10.128.0.0/14 via 10.129.0.1 dev eth0 
      10.129.0.0/23 dev eth0 proto kernel scope link src 10.129.1.23 
      100.64.0.0/16 via 10.129.0.1 dev eth0 
      169.254.0.5 via 10.129.0.1 dev eth0 
      172.30.0.0/16 via 10.129.0.1 dev eth0 
      192.168.20.0/24 dev net1 proto kernel scope link src 192.168.20.5 
      ~ $ ip addr show net1
      3: net1@if604: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default 
          link/ether 0a:58:c0:a8:14:05 brd ff:ff:ff:ff:ff:ff link-netnsid 0
          inet 192.168.20.5/24 brd 192.168.20.255 scope global net1
            valid_lft forever preferred_lft forever
          inet6 fe80::858:c0ff:fea8:1405/64 scope link 
             valid_lft forever preferred_lft forever

      7. From tenantblue-pod1, test connectivity through the localnet tenantblue (device net1).
         Pinging another pod, a worker node, and an outside host all fail:

      ~ $ ping -I net1 192.168.20.9 
      PING 192.168.20.9 (192.168.20.9) from 192.168.20.5 net1: 56(84) bytes of data.
      From 192.168.20.5 icmp_seq=1 Destination Host Unreachable
      ~ $ ping -I net1 10.73.72.38     
      PING 10.73.72.38 (10.73.72.38) from 192.168.20.5 net1: 56(84) bytes of data.
      From 192.168.20.5 icmp_seq=1 Destination Host Unreachable
      ~ $ ping -I net1 www.baidu.com   
      PING www.wshifen.com (45.113.192.101) from 192.168.20.5 net1: 56(84) bytes of data.
      From 192.168.20.5 (192.168.20.5) icmp_seq=1 Destination Host Unreachable 
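      To see whether the ARP requests ever leave the node tagged with VLAN 20, traffic can be captured on the worker's eno8403 while the ping runs (a diagnostic sketch, assuming tcpdump is available in the node debug shell):

      # tcpdump -i eno8403 -e -nn 'vlan 20 and (arp or icmp)'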

      8. From tenantblue-pod1, test connectivity through the localnet physnet (device eth0).
         Pinging another pod, a worker node, and an outside host all succeed:

      ~ $ ping -c 1 -I eth0 10.130.0.131
      PING 10.130.0.131 (10.130.0.131) from 10.129.1.23 eth0: 56(84) bytes of data.
      64 bytes from 10.130.0.131: icmp_seq=1 ttl=62 time=3.33 ms
      ~ $ ping -c 1 -I eth0 10.73.72.38
      PING 10.73.72.38 (10.73.72.38) from 10.129.1.23 eth0: 56(84) bytes of data.
      64 bytes from 10.73.72.38: icmp_seq=1 ttl=64 time=1.41 ms
      ~ $ ping -c 1 -I eth0 www.baidu.com
      PING www.wshifen.com (45.113.192.101) from 10.129.1.23 eth0: 56(84) bytes of data.
      64 bytes from 45.113.192.101 (45.113.192.101): icmp_seq=1 ttl=51 time=65.2 ms

      9. Configure a VLAN 20 interface on the provisioning node, run an HTTP echo server on it, and check that the connection from the pod's net1 also fails with "Host is unreachable":

      # ip link add link provisioning name provisioning.20 type vlan id 20
      # ip addr add 192.168.20.1/24 dev provisioning.20
      # ip link set provisioning.20 up
      # ip addr show provisioning.20
      13: provisioning.20@provisioning: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
          link/ether 20:04:0f:eb:0e:73 brd ff:ff:ff:ff:ff:ff
          inet 192.168.20.1/24 scope global provisioning.20
      # podman run --name ipecho -d -p 9095:80 quay.io/openshifttest/ip-echo:multiarch
      5065ba825aaac7a5f8ac246c1183beb2b0d410877c978b162b0a0de2946da261
      
      # oc exec tenantblue-pod1 -- curl --interface net1 192.168.20.1:9095
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
        0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
      curl: (7) Failed to connect to 192.168.20.1 port 9095 after 3111 ms: Host is unreachable
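      To rule out switch trunking, the same VLAN can be exercised directly from a worker host, bypassing OVN: attach a temporary internal port to br-withvlan on VLAN 20 and ping the provisioning node (a sketch; vlan20test and 192.168.20.200 are arbitrary test values):

      # ovs-vsctl add-port br-withvlan vlan20test tag=20 -- set interface vlan20test type=internal
      # ip addr add 192.168.20.200/24 dev vlan20test
      # ip link set vlan20test up
      # ping -c 3 192.168.20.1
      # ovs-vsctl del-port br-withvlan vlan20test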

      Actual results:

      In step 7, pinging another pod, a worker node, or an outside host through the localnet tenantblue (pod device net1) fails.

      Expected results:

      In step 7, pinging another pod, a worker node, or an outside host through the localnet tenantblue (pod device net1) succeeds.

      Additional info:

          

      People: Ben Bennett (bbennett@redhat.com), Chenli Hu (chhu@redhat.com), Weibin Liang