OCPBUGS-37561

Localnet topology without a subnet produces "error" logs despite functioning well and being supported


      Description of problem:

      One of the main use-cases for localnet in OCP is OpenShift Virtualization, where VMs use it to connect directly to the underlay network, with or without VLAN encapsulation. In this scenario, OVN's IPAM ("subnets") is typically not used, as the VM relies on the network's DHCP server or static IP allocation.

      Although this configuration works well, is tested by Virtualization QE, documented, and adopted by many customers, it does not seem to be properly handled in the codebase.
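
      For illustration, a minimal VirtualMachine consuming such a network through a bridge binding; a sketch, assuming the localnet-network-282 NAD from the reproduction steps below (the VM name, memory size, and disk image are placeholders):

      apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        name: vm-underlay
        namespace: default
      spec:
        running: true
        template:
          spec:
            domain:
              memory:
                guest: 2Gi
              devices:
                interfaces:
                # Bridge binding: no OVN IPAM; the guest gets its address
                # from the underlay network's DHCP or static configuration
                - name: underlay
                  bridge: {}
                disks:
                - name: containerdisk
                  disk:
                    bus: virtio
            networks:
            - name: underlay
              multus:
                networkName: default/localnet-network-282
            volumes:
            - name: containerdisk
              containerDisk:
                image: quay.io/containerdisks/fedora:latest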

      Version-Release number of selected component (if applicable):

      4.14+

      How reproducible:

      Always

      Steps to Reproduce:

      1. Configure a localnet network (an NNCP mapping an OVS bridge to the network, plus a NetworkAttachmentDefinition without a "subnets" entry):
      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: ovs-br1-multiple-networks 
      spec:
        nodeSelector:
          node-role.kubernetes.io/worker: '' 
        desiredState:
          interfaces:
          - name: ovs-br1 
            description: |-
              A dedicated OVS bridge with bond0 as a port
              allowing all VLANs and untagged traffic
            type: ovs-bridge
            state: up
            bridge:
              options:
                stp: true
              port:
              - name: bond0 
          ovn:
            bridge-mappings:
            - localnet: localnet-network-282
              bridge: ovs-br1 
              state: present 
      
      apiVersion: k8s.cni.cncf.io/v1
      kind: NetworkAttachmentDefinition
      metadata:
        name: localnet-network-282
        namespace: default
      spec:
        config: |
          {
            "cniVersion": "0.3.1",
            "name": "localnet-network-282",
            "type": "ovn-k8s-cni-overlay",
            "topology": "localnet",
            "vlanID": 282,
            "netAttachDefName": "default/localnet-network-282"
          }

      2. Request the network from a Pod or VM, for example:
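
      A minimal sketch of such a Pod, assuming the NAD above (the Pod name and image are placeholders):

      apiVersion: v1
      kind: Pod
      metadata:
        name: localnet-test
        namespace: default
        annotations:
          # Attach the secondary localnet network defined in step 1
          k8s.v1.cni.cncf.io/networks: localnet-network-282
      spec:
        containers:
        - name: test
          image: registry.access.redhat.com/ubi9/ubi-minimal:latest
          command: ["sleep", "infinity"]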

      Actual results:

      The network works as expected, but the cluster manager logs an error:

      oc logs -n openshift-ovn-kubernetes ovnkube-control-plane-6556759887-zjn6t -c ovnkube-cluster-manager
      I0725 09:22:52.906720 1 controller.go:220] Controller [cluster-manager network manager]: error found while processing localnet-network-282: [cluster-manager network manager]: failed to create network localnet-network-282: no cluster network controller to manage topology
      Expected results:

      Logs should be clean. Upstream should cover this scenario with tests to make sure we won't suffer a regression.

      Additional info:


      Affected Platforms:

      This issue has already affected several customers and misled troubleshooting efforts.

      Comments:

            Andrew Austin Byrum added a comment - OCPBUGS-43454 has been opened for the upgrade / crashing behavior

            Andrew Austin Byrum added a comment - phoracek@redhat.com Sure thing, I'll verify the reproduction steps in a clean lab and open a new bug. It's the same error message and workaround; the difference is that the control plane pod fails to start up after the upgrade due to that error instead of continuing to operate normally.

            Petr Horacek added a comment - aaustin@redhat.com Will you please open a new bug for this? This one is tracking a small issue of incorrect logging; yours sounds much, much worse.

            Andrew Austin Byrum added a comment - A customer attempted to upgrade a 4.16.15 cluster to 4.17.1 with localnet NADs that had no subnet definitions, and this caused the ovnkube-control-plane pods to crash-loop, blocking the upgrade. We added the subnet field to all NADs to temporarily resolve the issue, but this bug may need higher criticality since it causes upgrade failures.
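
            For reference, the workaround described above amounts to declaring a subnet in the NAD config so that the cluster manager allocates a network controller for it; a sketch with a placeholder CIDR, assuming the "subnets" key of ovn-k8s-cni-overlay:

            apiVersion: k8s.cni.cncf.io/v1
            kind: NetworkAttachmentDefinition
            metadata:
              name: localnet-network-282
              namespace: default
            spec:
              # Workaround sketch: 192.0.2.0/24 is a placeholder range. Declaring
              # "subnets" enables OVN IPAM for this network, which may conflict
              # with DHCP or static addressing inside the guests.
              config: |
                {
                  "cniVersion": "0.3.1",
                  "name": "localnet-network-282",
                  "type": "ovn-k8s-cni-overlay",
                  "topology": "localnet",
                  "vlanID": 282,
                  "subnets": "192.0.2.0/24",
                  "netAttachDefName": "default/localnet-network-282"
                }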

            Hi Petr Horacek!

            I am the OpenShift Networking SDN Team's bug backlog bot. This bug is currently assigned to me and is priority=Minor which means:

            1. This bug is not being actively monitored by a human.
            2. Please wait until an SDN Team engineer gets free cycles to be assigned to this bug after they get through higher priority bugs.
            3. If you need immediate attention, reach out to #forum-ocp-sdn channel on slack and tag the @sdn-bug-pre-dispatch-team user group.
            4. If you do not agree about priority=Minor, reach out to #forum-ocp-sdn channel on slack, tag the @sdn-bug-pre-dispatch-team user group and provide a valid business justification for why priority should be higher.
            5. If an engineer is unable to address this bug within 56 working days (totally dependent on the current bug queue for the team), this bug will be automatically closed with the reason "lack of cycles for the SDN team". If that happens and you are unhappy with the decision, please reach out to #forum-ocp-sdn channel on slack and tag the @sdn-bug-pre-dispatch-team user group.

            Thank you for your understanding and patience,
            With Kind Regards,
            OpenShift Networking SDN-Team.

