- Bug
- Resolution: Unresolved
- Critical
- None
- 4.17
Description of problem:
Customers using OVN-K localnet topology networks for virtualization often do not define a "subnets" field in their NetworkAttachmentDefinitions (NADs); the examples in the virtualization section of the OCP documentation do not include that field either.
When a cluster with such NADs is upgraded from 4.16 to 4.17, the ovnkube-control-plane pods crash once the Cluster Network Operator (CNO) is upgraded, and the upgrade hangs in a failing state. Once in the failing state, the cluster upgrade can be recovered by adding a "subnets" field to the localnet NADs.
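For reference, a localnet NAD of the kind described above, with no "subnets" field, looks roughly like the following sketch. The name "vlan10" matches the NAD referenced in the error message below; the namespace, VLAN ID, and CNI version are illustrative and not taken from the affected cluster:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vlan10
  namespace: default
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "vlan10",
      "type": "ovn-k8s-cni-overlay",
      "topology": "localnet",
      "netAttachDefName": "default/vlan10",
      "vlanID": 10
    }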
Version-Release number of selected component (if applicable): upgrading from 4.16.15 to 4.17.1
How reproducible:
Start with an OCP 4.16 cluster with OVN-K localnet NADs configured per the OpenShift Virtualization documentation and attempt to upgrade the cluster to 4.17.1.
Steps to Reproduce:
1. Deploy an OCP 4.16.15 cluster; the platform type shouldn't matter, but all testing has been done on bare metal (SNO and HA topologies)
2. Configure an OVS bridge with localnet bridge mappings and create one or more NetworkAttachmentDefinitions using the localnet topology without configuring the "subnets" field (as in the example NAD in the description; see also the sketches after these steps)
3. Observe that this is a working configuration in 4.16, although error-level log messages appear in the ovnkube-control-plane pod (see OCPBUGS-37561)
4. Delete the ovnkube-control-plane pod on 4.16 and observe that the log messages do not prevent you from starting ovnkube on 4.16
5. Trigger an upgrade to 4.17.1
6. Once ovnkube-control-plane is restarted as part of the upgrade, observe that the ovnkube-cluster-manager container is crashing with the following message, where "vlan10" is the name of a NetworkAttachmentDefinition created earlier:
failed to run ovnkube: failed to start cluster manager: initial sync failed: failed to sync network vlan10: [cluster-manager network manager]: failed to create network vlan10: no cluster network controller to manage topology
7. Edit all NetworkAttachmentDefinitions to include a "subnets" field (see the sketch after these steps)
8. Wait, or delete the ovnkube-control-plane pods to force a restart, and observe that the pods come up and the upgrade resumes and completes normally
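For step 2, a sketch of the bridge mapping configured via a NodeNetworkConfigurationPolicy (this requires the NMState Operator; the policy name, node selector, and bridge are illustrative, and the "localnet" value must match the "name" in the NAD config above):

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: localnet-vlan10-mapping
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ''
  desiredState:
    ovn:
      bridge-mappings:
      - localnet: vlan10
        bridge: br-ex
        state: present

For step 7, the same NAD as in the description with a "subnets" field added; the subnet value here is purely illustrative and should be replaced with a range appropriate for the network:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vlan10
  namespace: default
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "vlan10",
      "type": "ovn-k8s-cni-overlay",
      "topology": "localnet",
      "netAttachDefName": "default/vlan10",
      "vlanID": 10,
      "subnets": "192.0.2.0/24"
    }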
Actual results: The upgrade fails and ovnkube-control-plane is left in a crashing state
Expected results: The upgrade succeeds and ovnkube-control-plane is running
Additional info:
Affected Platforms: Tested on bare metal, but all platforms using OVN-K localnet networks should be impacted
Is it an internal CI failure, customer issue / SD, or internal Red Hat testing failure?
- Customer issue / SD (Case 03960269)
- Internal Red Hat testing failure: reproduction steps are based on internal testing, as the customer environment has been repaired with the workaround
If it is an internal Red Hat testing failure:
- Kubeconfig for an internet-reachable cluster currently in the failed state is available upon request from Andrew Austin Byrum until at least 25 October 2024
- depends on: OCPBUGS-44195 ovnkube-control-plane pods crash when upgrading from 4.16 to 4.17 with localnet topology networks without subnets (Verified)
- is cloned by: OCPBUGS-44195 ovnkube-control-plane pods crash when upgrading from 4.16 to 4.17 with localnet topology networks without subnets (Verified)
- is related to: SDN-5485 Impact ovnkube-control-plane pods crash when upgrading from 4.16 to 4.17 with localnet topology networks without subnets (In Progress)