OCPBUGS-47548

OCP 4.16: ovs-bridge localnet interfaces for virtual machines fail to stay connected after provisioning


    • Quality / Stability / Reliability
    • Critical
    • Customer Escalated

      Description of problem:

      On OpenShift 4.16, provision 4 VLAN interfaces using the configs in the reproducer steps below.
      - Observe that the virtual machines are provisioned properly and come online successfully on the target networks.
      - Observe that after a day or so the VMs are offline and the VLAN interfaces can no longer reach the gateway.
      - Rebuilding the interfaces and configs restores connectivity for another day or so. It is unclear what is killing the interfaces, but this is blocking go-live for a major project.

      Version-Release number of selected component (if applicable):

          4.16.10

      How reproducible:

          Twice so far, in the customer environment.

      Steps to Reproduce:

          1. Deploy the following template config to create a NodeNetworkConfigurationPolicy (NNCP) and a NetworkAttachmentDefinition:
      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: ovs-br${VLANTAG}-vlan
      spec:
        nodeSelector:
          node-role.kubernetes.io/worker: ''
        desiredState:
          interfaces:
          - name: ovs-br-${VLANTAG}
            description: |-
              A dedicated OVS bridge with eth1 as a port
              allowing all VLANs and untagged traffic
            type: ovs-bridge
            state: up
            bridge:
              allow-extra-patch-ports: true
              options:
                stp: false
              port:
              - name: bond0
          ovn:
            bridge-mappings:
            - localnet: localnet-vlan${VLANTAG}
              bridge: ovs-br-${VLANTAG}
              state: present
      
      ...
      
      apiVersion: k8s.cni.cncf.io/v1
      kind: NetworkAttachmentDefinition
      metadata:
        name: localnet-vlan${VLANTAG}
        namespace: ${VMPROJECT}
      spec:
        config: |2
          {
                  "cniVersion": "0.4.0",
                  "name": "localnet-vlan${VLANTAG}",
                  "type": "ovn-k8s-cni-overlay",
                  "topology": "localnet",
                  "netAttachDefName": "${VMPROJECT}/localnet-vlan${VLANTAG}",
                  "vlanID": ${VLANTAG}
          }
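
The report does not say how the `${VLANTAG}` and `${VMPROJECT}` placeholders were filled in. As a minimal sketch only, assuming the template is saved to a file (the name `vlan-template.yaml` and the project `my-vms` are hypothetical) and that the two manifests are separated in a way `oc` can parse, the four VLANs mentioned in this report could be rendered like this:

```shell
# render_vlan_template TEMPLATE_FILE VLANTAG VMPROJECT
# Prints the template with both placeholders substituted.
# Assumes the values contain no "|" or "&" characters (plain VLAN IDs
# and project names are fine).
render_vlan_template() {
  sed -e "s|\${VLANTAG}|$2|g" -e "s|\${VMPROJECT}|$3|g" "$1"
}

# Hypothetical usage, one policy and one attachment definition per VLAN:
# for tag in 16 18 20 23; do
#   render_vlan_template vlan-template.yaml "$tag" my-vms | oc apply -f -
# done
```

Each rendered copy creates its own OVS bridge (`ovs-br-16`, `ovs-br-18`, and so on), and every one of them names `bond0` as its port.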

       

      Actual results:

       - VMs come online and are accessible, then fail after `n` hours and are not recoverable.

      Expected results:

          Virtual machines and their secondary network interfaces should remain stable after provisioning.
      To the customer's knowledge, no nodes, VMs, or network resources were power cycled or migrated during this timeframe (a weekend).

      Additional info:

          Observing the ovs-appctl and ovs-vsctl outputs on the nodes, we see that VLAN 23 remained accessible but VLANs 16, 18, and 20 are unavailable:
      
      
      ~~~
      $ less 0070-sosreport-w-7/sosreport/sos_commands/openvswitch/ovs-appctl_fdb.stats-show_ovs-br-23
      
      Statistics for bridge "ovs-br-23":
        Current/maximum MAC entries in the table: 1168/8192
        Current static MAC entries in the table : 0
        Total number of learned MAC entries     : 377359
        Total number of expired MAC entries     : 376191
        Total number of evicted MAC entries     : 0
        Total number of port moved MAC entries  : 0
      
      $ less 0070-sosreport-w-7/sosreport/sos_commands/openvswitch/ovs-appctl_fdb.show_ovs-br-23 | wc -l
      1169
      
      
      ~~~
      
      On the other hand, ovs-br-16 has only one MAC entry, learned on port 2. This explains why it is not communicating with the outside:
      
      ~~~
      $ cat 0070-sosreport-w-7/sosreport/sos_commands/openvswitch/ovs-appctl_fdb.show_ovs-br-16 
       port  VLAN  MAC                Age
          2    16  02:xx:xx:xx:xx:5d    0
      
      ~~~
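
To make the same comparison across every bridge in the sosreport, a small helper can count the learned entries in each saved `fdb.show` file. This is a sketch, assuming each file has a single header line followed by one line per MAC entry, as in the outputs above; the function name is made up for illustration.

```shell
# fdb_entry_count FILE
# Counts MAC entries in a saved `ovs-appctl fdb/show` output by skipping
# the " port  VLAN  MAC  Age" header and ignoring blank lines.
fdb_entry_count() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Usage against the sosreport layout shown in this report:
# for f in 0070-sosreport-w-7/sosreport/sos_commands/openvswitch/ovs-appctl_fdb.show_ovs-br-*; do
#   printf '%s %s\n' "${f##*fdb.show_}" "$(fdb_entry_count "$f")"
# done
```

A bridge that reports only the entries of its local VMs, like ovs-br-16 here, is likely not learning anything from the physical network.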
      
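      We can see differences in the ports between ovs-br-23 and ovs-br-16. In this excerpt, ovs-br-16 lists only the patch port to br-int, while ovs-br-23 lists bond0: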
      
      ~~~
      $ less 0070-sosreport-w-7/sosreport/sos_commands/openvswitch/ovs-vsctl_-t_5_show
      
          Bridge ovs-br-16
              Port patch-localnet.vlan16_ovn_localnet_port-to-br-int
                  Interface patch-localnet.vlan16_ovn_localnet_port-to-br-int
                      type: patch
                      options: {peer=patch-br-int-to-localnet.vlan16_ovn_localnet_port}
          Bridge ovs-br-23
              Port bond0
                  Interface bond0
                      type: system
      ~~~
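
The same check can be scripted against the saved `ovs-vsctl show` output to list which bridges lack a given port. This is a rough sketch that assumes the indented `Bridge` / `Port` layout shown above; the function name is hypothetical and the parsing is not hardened against other output variants.

```shell
# bridges_missing_port FILE PORT
# Prints every bridge in a saved `ovs-vsctl show` output that has no
# port named PORT. Quotes around names are stripped, since ovs-vsctl
# quotes some identifiers.
bridges_missing_port() {
  awk -v want="$2" '
    $1 == "Bridge" { b = $2; gsub(/"/, "", b); order[++n] = b; has[b] = 0 }
    $1 == "Port"   { p = $2; gsub(/"/, "", p); if (p == want) has[b] = 1 }
    END { for (i = 1; i <= n; i++) if (!has[order[i]]) print order[i] }
  ' "$1"
}

# Usage against the file shown above:
# bridges_missing_port 0070-sosreport-w-7/sosreport/sos_commands/openvswitch/ovs-vsctl_-t_5_show bond0
```

On a full node output this will also list bridges that are not expected to carry `bond0` (for example `br-int`), so the result needs to be read against the set of `ovs-br-*` bridges created by the policies.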
      
      
      

              phoracek@redhat.com Petr Horacek
              rhn-support-wrussell Will Russell
              Liangquan Li Liangquan Li