OCPBUGS-37561

Localnet topology without a subnet produces "error" logs despite functioning well and being supported


      Description of problem:

      One of the main use-cases for localnet in OCP is OpenShift Virtualization, where VMs use it to connect directly to the underlay network, with or without VLAN encapsulation. In this scenario, OVN's IPAM ("subnets") is typically not used, as the VM relies on the network's DHCP server or static IP allocation.

      Although this configuration works well, is tested by Virtualization QE, documented, and adopted by many customers, it does not seem to be properly handled in the codebase.
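
      For illustration, a minimal VirtualMachine consuming such a network through a bridge binding; a sketch, assuming the localnet-network-282 NAD from the reproduction steps below (the VM name, memory size, and disk image are placeholders):

      apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        name: vm-underlay
        namespace: default
      spec:
        running: true
        template:
          spec:
            domain:
              memory:
                guest: 2Gi
              devices:
                interfaces:
                # Bridge binding: no OVN IPAM; the guest gets its address
                # from the underlay network's DHCP or static configuration
                - name: underlay
                  bridge: {}
                disks:
                - name: containerdisk
                  disk:
                    bus: virtio
            networks:
            - name: underlay
              multus:
                networkName: default/localnet-network-282
            volumes:
            - name: containerdisk
              containerDisk:
                image: quay.io/containerdisks/fedora:latest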

      Version-Release number of selected component (if applicable):

      4.14+

      How reproducible:

      Always

      Steps to Reproduce:

      1. Configure a localnet network (an NNCP mapping an OVS bridge to the network, plus a NetworkAttachmentDefinition without a "subnets" entry):
      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: ovs-br1-multiple-networks 
      spec:
        nodeSelector:
          node-role.kubernetes.io/worker: '' 
        desiredState:
          interfaces:
          - name: ovs-br1 
            description: |-
              A dedicated OVS bridge with bond0 as a port
              allowing all VLANs and untagged traffic
            type: ovs-bridge
            state: up
            bridge:
              options:
                stp: true
              port:
              - name: bond0 
          ovn:
            bridge-mappings:
            - localnet: localnet-network-282
              bridge: ovs-br1 
              state: present 
      
      apiVersion: k8s.cni.cncf.io/v1
      kind: NetworkAttachmentDefinition
      metadata:
        name: localnet-network-282
        namespace: default
      spec:
        config: |
          {
            "cniVersion": "0.3.1",
            "name": "localnet-network-282",
            "type": "ovn-k8s-cni-overlay",
            "topology": "localnet",
            "vlanID": 282,
            "netAttachDefName": "default/localnet-network-282"
          }

      2. Request the network from a Pod or VM, for example:
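
      A minimal sketch of such a Pod, assuming the NAD above (the Pod name and image are placeholders):

      apiVersion: v1
      kind: Pod
      metadata:
        name: localnet-test
        namespace: default
        annotations:
          # Attach the secondary localnet network defined in step 1
          k8s.v1.cni.cncf.io/networks: localnet-network-282
      spec:
        containers:
        - name: test
          image: registry.access.redhat.com/ubi9/ubi-minimal:latest
          command: ["sleep", "infinity"]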

      Actual results:

      The network works as expected, but the cluster manager logs an error:

      oc logs -n openshift-ovn-kubernetes ovnkube-control-plane-6556759887-zjn6t -c ovnkube-cluster-manager
      I0725 09:22:52.906720 1 controller.go:220] Controller [cluster-manager network manager]: error found while processing localnet-network-282: [cluster-manager network manager]: failed to create network localnet-network-282: no cluster network controller to manage topology
      Expected results:

      Logs should be clean. Upstream should cover this scenario with tests to make sure we won't suffer a regression.

      Additional info:


      Affected Platforms:

      This issue has already affected several customers and misled troubleshooting efforts.

      Comments:

            Andrew Austin Byrum added a comment - OCPBUGS-43454 has been opened for the upgrade / crashing behavior

            Andrew Austin Byrum added a comment - phoracek@redhat.com Sure thing, I'll verify the reproduction steps in a clean lab and open a new bug. It's the same error message and workaround; the difference is that the control plane pod fails to start up after the upgrade due to that error instead of continuing to operate normally.

            Petr Horacek added a comment - aaustin@redhat.com Will you please open a new bug for this? This one is tracking a small issue of incorrect logging; yours sounds much, much worse.

            Andrew Austin Byrum added a comment - A customer attempted to upgrade a 4.16.15 cluster to 4.17.1 with localnet NADs that had no subnet definitions, and this caused the ovnkube-control-plane pods to crash-loop, blocking the upgrade. We added the subnet field to all NADs to temporarily resolve the issue, but this bug may need higher criticality since it causes upgrade failures.
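
            For reference, the workaround described above amounts to declaring a subnet in the NAD config so that the cluster manager allocates a network controller for it; a sketch with a placeholder CIDR, assuming the "subnets" key of ovn-k8s-cni-overlay:

            apiVersion: k8s.cni.cncf.io/v1
            kind: NetworkAttachmentDefinition
            metadata:
              name: localnet-network-282
              namespace: default
            spec:
              # Workaround sketch: 192.0.2.0/24 is a placeholder range. Declaring
              # "subnets" enables OVN IPAM for this network, which may conflict
              # with DHCP or static addressing inside the guests.
              config: |
                {
                  "cniVersion": "0.3.1",
                  "name": "localnet-network-282",
                  "type": "ovn-k8s-cni-overlay",
                  "topology": "localnet",
                  "vlanID": 282,
                  "subnets": "192.0.2.0/24",
                  "netAttachDefName": "default/localnet-network-282"
                }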

            Hi Petr Horacek!

            I am the OpenShift Networking SDN Team's bug backlog bot. This bug is currently assigned to me and is priority=Minor which means:

            1. This bug is not being actively monitored by a human.
            2. Please wait until an SDN Team engineer gets free cycles to be assigned to this bug after they get through higher priority bugs.
            3. If you need immediate attention, reach out to #forum-ocp-sdn channel on slack and tag the @sdn-bug-pre-dispatch-team user group.
            4. If you do not agree about priority=Minor, reach out to #forum-ocp-sdn channel on slack, tag the @sdn-bug-pre-dispatch-team user group and provide a valid business justification for why priority should be higher.
            5. If an engineer is unable to address this bug within 56 working days (totally dependent on the current bug queue for the team), this bug will be automatically closed with the reason "lack of cycles for the SDN team". If that happens and you are unhappy with the decision, please reach out to #forum-ocp-sdn channel on slack and tag the @sdn-bug-pre-dispatch-team user group.

            Thank you for your understanding and patience,
            With Kind Regards,
            OpenShift Networking SDN-Team.

