Bug
Resolution: Unresolved
Normal
4.19.z
Quality / Stability / Reliability
False
Description of problem:
The ovnkube-cluster-manager is cycling through "finished syncing NAD", "finished syncing network ovs-bridge-vlan204", and "error found while processing ovs-bridge-vlan204" messages for every NAD on the cluster.
This does not appear to impact the NADs / networks or their attached VMs.
It is a red herring that generates thousands of log lines, making real debugging harder and creating the illusion of a problem.
2025-06-24T19:38:01.736891443+00:00 stderr F I0624 19:38:01.736872 1 controller.go:132] Adding controller [clustermanager-nad-controller NAD controller] event handlers
2025-06-24T19:38:01.736946275+00:00 stderr F I0624 19:38:01.736937 1 shared_informer.go:313] Waiting for caches to sync for [clustermanager-nad-controller NAD controller]
2025-06-24T19:38:01.736950611+00:00 stderr F I0624 19:38:01.736945 1 shared_informer.go:320] Caches are synced for [clustermanager-nad-controller NAD controller]
2025-06-24T19:38:01.737199252+00:00 stderr F I0624 19:38:01.737192 1 controller.go:156] Starting controller [clustermanager-nad-controller NAD controller] with 1 workers
2025-06-24T19:38:01.737253446+00:00 stderr F I0624 19:38:01.737247 1 network_controller.go:246] [clustermanager-nad-controller network controller]: syncing all networks
2025-06-24T19:38:01.737265634+00:00 stderr F I0624 19:38:01.737258 1 network_controller.go:275] [clustermanager-nad-controller network controller]: finished syncing network ovs-bridge-vlan210, took 5.422µs
2025-06-24T19:38:01.737269503+00:00 stderr F I0624 19:38:01.737267 1 network_controller.go:275] [clustermanager-nad-controller network controller]: finished syncing network ovs-bridge-vlan204, took 1.573µs
2025-06-24T19:38:01.737273898+00:00 stderr F I0624 19:38:01.737271 1 network_controller.go:275] [clustermanager-nad-controller network controller]: finished syncing network ovs-bridge-vlan205, took 1.46µs
2025-06-24T19:38:01.737278240+00:00 stderr F I0624 19:38:01.737275 1 network_controller.go:275] [clustermanager-nad-controller network controller]: finished syncing network ovs-bridge-vlan206, took 1.398µs
2025-06-24T19:38:01.737281830+00:00 stderr F I0624 19:38:01.737279 1 network_controller.go:275] [clustermanager-nad-controller network controller]: finished syncing network ovs-bridge-vlan207, took 1.364µs
2025-06-24T19:38:01.737289253+00:00 stderr F I0624 19:38:01.737283 1 network_controller.go:275] [clustermanager-nad-controller network controller]: finished syncing network ovs-bridge-vlan208, took 2.123µs
2025-06-24T19:38:01.737304651+00:00 stderr F I0624 19:38:01.737290 1 network_controller.go:275] [clustermanager-nad-controller network controller]: finished syncing network ovs-bridge-vlan209, took 2.074µs
2025-06-24T19:38:01.737309600+00:00 stderr F I0624 19:38:01.737304 1 network_controller.go:257] [clustermanager-nad-controller network controller]: finished syncing all networks. Time taken: 56.699µs
2025-06-24T19:38:01.737313128+00:00 stderr F I0624 19:38:01.737308 1 controller.go:156] Starting controller [clustermanager-nad-controller network controller] with 1 workers
2025-06-24T19:38:01.737317467+00:00 stderr F I0624 19:38:01.737315 1 nad_controller.go:162] [clustermanager-nad-controller NAD controller]: started
2025-06-24T19:38:01.737326680+00:00 stderr F I0624 19:38:01.737321 1 network_cluster_controller.go:376] Initializing cluster manager network controller "default" ...
2025-06-24T19:38:01.737357487+00:00 stderr F I0624 19:38:01.737351 1 network_cluster_controller.go:382] Cluster manager network controller "default" initialized. Took: 32.626µs
2025-06-24T19:38:01.737357487+00:00 stderr F I0624 19:38:01.737355 1 network_cluster_controller.go:386] Cluster manager network controller "default" starting node watcher...
2025-06-24T19:38:01.737391928+00:00 stderr F I0624 19:38:01.737383 1 nad_controller.go:246] [clustermanager-nad-controller NAD controller]: finished syncing NAD default/ovs-bridge-vlan206, took 174.417µs
2025-06-24T19:38:01.737416439+00:00 stderr F I0624 19:38:01.737403 1 network_controller.go:275] [clustermanager-nad-controller network controller]: finished syncing network ovs-bridge-vlan204, took 19.002µs
2025-06-24T19:38:01.737421181+00:00 stderr F I0624 19:38:01.737416 1 controller.go:257] Controller [clustermanager-nad-controller network controller]: error found while processing ovs-bridge-vlan204: [clustermanager-nad-controller network controller]: failed to ensure network ovs-bridge-vlan204: failed to create network ovs-bridge-vlan204: no cluster network controller to manage topology
2025-06-24T19:38:01.737439932+00:00 stderr F I0624 19:38:01.737433 1 network_controller.go:275] [clustermanager-nad-controller network controller]: finished syncing network ovs-bridge-vlan205, took 3.123µs
2025-06-24T19:38:01.737443920+00:00 stderr F I0624 19:38:01.737439 1 controller.go:257] Controller [clustermanager-nad-controller network controller]: error found while processing ovs-bridge-vlan205: [clustermanager-nad-controller network controller]: failed to ensure network ovs-bridge-vlan205: failed to create network ovs-bridge-vlan205: no cluster network controller to manage topology
2025-06-24T19:38:01.737443920+00:00 stderr F I0624 19:38:01.737440 1 nad_controller.go:246] [clustermanager-nad-controller NAD controller]: finished syncing NAD default/ovs-bridge-vlan207, took 45.213µs
2025-06-24T19:38:01.737456415+00:00 stderr F I0624 19:38:01.737449 1 network_controller.go:275] [clustermanager-nad-controller network controller]: finished syncing network ovs-bridge-vlan206, took 2.899µs
2025-06-24T19:38:01.737459992+00:00 stderr F I0624 19:38:01.737455 1 controller.go:257] Controller [clustermanager-nad-controller network controller]: error found while processing ovs-bridge-vlan206: [clustermanager-nad-controller network controller]: failed to ensure network ovs-bridge-vlan206: failed to create network ovs-bridge-vlan206: no cluster network controller to manage topology
2025-06-24T19:38:01.737964590+00:00 stderr F I0624 19:38:01.737954 1 network_cluster_controller.go:391] Cluster manager network controller "default" completed watch nodes. Took: 597.124µs
2025-06-24T19:38:01.737983410+00:00 stderr F I0624 19:38:01.737978 1 zone_cluster_controller.go:217] Node qq2dsfcd27e34.exp-corp.cloud has the id 12 set
2025-06-24T19:38:01.737987080+00:00 stderr F I0624 19:38:01.737983 1 zone_cluster_controller.go:217] Node qq2dsfcd27e40.exp-corp.cloud has the id 9 set
2025-06-24T19:38:01.737987080+00:00 stderr F I0624 19:38:01.737985 1 zone_cluster_controller.go:217] Node qq2dsfcd40e36.exp-corp.cloud has the id 3 set
2025-06-24T19:38:01.737990738+00:00 stderr F I0624 19:38:01.737988 1 zone_cluster_controller.go:217] Node qq2dsfcd40e37.exp-corp.cloud has the id 4 set
2025-06-24T19:38:01.737994296+00:00 stderr F I0624 19:38:01.737990 1 zone_cluster_controller.go:217] Node qq2dsfcd40e38.exp-corp.cloud has the id 2 set
2025-06-24T19:38:01.737994296+00:00 stderr F I0624 19:38:01.737993 1 zone_cluster_controller.go:217] Node qq2dsfcc27e34.exp-corp.cloud has the id 8 set
2025-06-24T19:38:01.737997812+00:00 stderr F I0624 19:38:01.737995 1 zone_cluster_controller.go:217] Node qq2dsfcc27e36.exp-corp.cloud has the id 10 set
2025-06-24T19:38:01.738001273+00:00 stderr F I0624 19:38:01.737998 1 zone_cluster_controller.go:217] Node qq2dsfcc27e40.exp-corp.cloud has the id 6 set
2025-06-24T19:38:01.738004749+00:00 stderr F I0624 19:38:01.738000 1 zone_cluster_controller.go:217] Node qq2dsfcc27e38.exp-corp.cloud has the id 5 set
2025-06-24T19:38:01.738004749+00:00 stderr F I0624 19:38:01.738003 1 zone_cluster_controller.go:217] Node qq2dsfcd27e36.exp-corp.cloud has the id 11 set
2025-06-24T19:38:01.738008234+00:00 stderr F I0624 19:38:01.738005 1 zone_cluster_controller.go:217] Node qq2dsfcd27e38.exp-corp.cloud has the id 7 set
2025-06-24T19:38:01.738107914+00:00 stderr F I0624 19:38:01.738089 1 kube.go:133] Setting annotations map[k8s.ovn.org/node-id:2 k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.2/16"}] on node qq2dsfcd40e38.exp-corp.cloud
2025-06-24T19:38:01.738147623+00:00 stderr F I0624 19:38:01.738118 1 kube.go:133] Setting annotations map[k8s.ovn.org/node-id:8 k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.8/16"}] on node qq2dsfcc27e34.exp-corp.cloud
2025-06-24T19:38:01.738171690+00:00 stderr F I0624 19:38:01.738136 1 kube.go:133] Setting annotations map[k8s.ovn.org/node-id:12 k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.12/16"}] on node qq2dsfcd27e34.exp-corp.cloud
2025-06-24T19:38:01.738171690+00:00 stderr F I0624 19:38:01.738142 1 kube.go:133] Setting annotations map[k8s.ovn.org/node-id:3 k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.3/16"}] on node qq2dsfcd40e36.exp-corp.cloud
2025-06-24T19:38:01.738171690+00:00 stderr F I0624 19:38:01.738147 1 kube.go:133] Setting annotations map[k8s.ovn.org/node-id:7 k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.7/16"}] on node qq2dsfcd27e38.exp-corp.cloud
2025-06-24T19:38:01.738212015+00:00 stderr F I0624 19:38:01.738171 1 kube.go:133] Setting annotations map[k8s.ovn.org/node-id:9 k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.9/16"}] on node qq2dsfcd27e40.exp-corp.cloud
2025-06-24T19:38:01.738212015+00:00 stderr F I0624 19:38:01.738119 1 kube.go:133] Setting annotations map[k8s.ovn.org/node-id:5 k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.5/16"}] on node qq2dsfcc27e38.exp-corp.cloud
2025-06-24T19:38:01.738220716+00:00 stderr F I0624 19:38:01.738167 1 kube.go:133] Setting annotations map[k8s.ovn.org/node-id:11 k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.11/16"}] on node qq2dsfcd27e36.exp-corp.cloud
2025-06-24T19:38:01.738225904+00:00 stderr F I0624 19:38:01.738141 1 kube.go:133] Setting annotations map[k8s.ovn.org/node-id:10 k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.10/16"}] on node qq2dsfcc27e36.exp-corp.cloud
2025-06-24T19:38:01.738324583+00:00 stderr F I0624 19:38:01.738284 1 kube.go:133] Setting annotations map[k8s.ovn.org/node-id:6 k8s.ovn.org/node-transit-switch-port-ifaddr:{"ipv4":"100.88.0.6/16"}] on node qq2dsfcc27e40.exp-corp.cloud
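For reference, the repeating messages can be pulled from the cluster-manager logs with a filter along these lines (a sketch; the openshift-ovn-kubernetes namespace, app=ovnkube-control-plane label, and ovnkube-cluster-manager container name are the usual defaults and may need adjusting for this cluster):

# Surface the recurring NAD/network sync churn from the cluster manager
oc logs -n openshift-ovn-kubernetes -l app=ovnkube-control-plane \
  -c ovnkube-cluster-manager --since=10m --tail=-1 \
  | grep -E 'finished syncing (NAD|network)|error found while processing'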
Version-Release number of selected component (if applicable):
4.19.1
How reproducible:
Consistently; it has been happening since the upgrade from 4.18.
Steps to Reproduce:
1. NNCP used as a vlan trunk:
spec:
  desiredState:
    ovn:
      bridge-mappings:
        - bridge: br-ex
          localnet: vlan-trunk
          state: present
  nodeSelector:
    node-role.kubernetes.io/worker: ""
2. localnet NAD on a VLAN using the physicalNetworkName:
spec:
  config: |-
    {
      "cniVersion": "0.4.0",
      "name": "ovs-bridge-vlan204",
      "type": "ovn-k8s-cni-overlay",
      "mtu": 9000,
      "netAttachDefName": "default/ovs-bridge-vlan204",
      "topology": "localnet",
      "physicalNetworkName": "vlan-trunk",
      "vlanID": 204
    }
3. Observe the ovnkube-cluster-manager container logs on the ovnkube-control-plane pods (a verification sketch for steps 1 and 2 follows this list).
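Verification sketch for steps 1 and 2 (the worker node name is a placeholder; net-attach-def is the Multus short name, and the ovn-bridge-mappings key assumes a standard OVN-Kubernetes OVS configuration):

# Confirm the localnet NADs exist
oc get net-attach-def -n default

# Confirm the NNCP programmed the vlan-trunk mapping into OVS on a worker
oc debug node/<worker-node> -- chroot /host \
  ovs-vsctl get Open_vSwitch . external_ids:ovn-bridge-mappings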
Actual results:
The error above is logged roughly 4 times per minute for every NAD. The volume is significant, and it is hard to see through the noise.
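A rough way to quantify the rate per network (a sketch, assuming the same namespace, label, and container name as above):

# Count "error found while processing" occurrences per network over the last hour
oc logs -n openshift-ovn-kubernetes -l app=ovnkube-control-plane \
  -c ovnkube-cluster-manager --since=1h --tail=-1 \
  | grep -o 'error found while processing [^:]*' | sort | uniq -c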
Expected results:
The cluster manager should not repeatedly log "no cluster network controller to manage topology" errors for localnet NADs that are healthy and unchanged; the sync should settle after the NADs have been processed.
Additional info: