OCPBUGS-48358: Solution for OCPBUGS-43740 is not enough for upgrades

    • Priority: Critical

      Description of problem:

      In OCPBUGS-43740, it was necessary to add the {{--cluster-manager-v4-transit-switch-subnet}} and {{--cluster-manager-v6-transit-switch-subnet}} startup options to the ovnkube-node daemonset, because without them the pods fail subnet overlap validation when the default transit switch subnet overlaps an existing cluster or service subnet.

      The problem is that the fix was backported to 4.14.z in a way that only works for the usual multizone OVN-K deployment (the standard one with interconnect enabled), not for the temporary singlezone deployment used during the upgrade. More concretely, PR#2607 only introduced the change in the startup scripts of the ovnkube-script-lib configmap (which are used by the multizone daemonsets), but the singlezone daemonset has its startup command embedded and does not use the library configmap, so it did not pick up the fix.
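
      A quick way to see this asymmetry on a live cluster is to grep for one of the flags in the script library and in the rendered daemonset. This is only a sketch, assuming the configmap and daemonset names mentioned above (during the upgrade, the singlezone daemonset is the one missing the flag):

      $ # the multizone startup scripts carry the flag
      $ oc -n openshift-ovn-kubernetes get cm/ovnkube-script-lib -o yaml | grep cluster-manager-v4-transit-switch-subnet
      $ # the singlezone daemonset's embedded startup command does not
      $ oc -n openshift-ovn-kubernetes get ds/ovnkube-node -o yaml | grep cluster-manager-v4-transit-switch-subnet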

      This is easy to see by following the reproduction steps below.

      Version-Release number of selected component (if applicable):

      4.13.z --> 4.14.44 upgrade

      How reproducible:

      Always, as long as one of the cluster's subnets overlaps the default transit switch subnet.

      Steps to Reproduce:

      1. Install a 4.13 cluster whose service network (or pod network) overlaps with the default transit switch subnet 100.88.0.0/16.

      2. Start the upgrade.

      3. Specify the custom transit switch subnet in the network.operator/cluster object (see the sketch below).
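
      For step 3, this is a minimal sketch of how the custom subnet could be set. The field path ({{spec.defaultNetwork.ovnKubernetesConfig.ipv4.internalTransitSwitchSubnet}}) and the example value 100.69.0.0/16 are assumptions here, so check the Network operator API of the target release for the exact field:

      $ oc patch network.operator/cluster --type merge --patch '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalTransitSwitchSubnet":"100.69.0.0/16"}}}}}'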

      Actual results:

      The upgrade stalls because one or more ovnkube-node pods are in CrashLoopBackOff with an error like this:

      illegal network configuration: transit switch subnet "100.88.0.0/16" overlaps cluster subnet "100.x.x.x/12"
      

      Expected results:

      The upgrade completes properly when the right custom transit switch subnet is specified.

      Additional info:

      A workaround is to force the direct deployment of the multizone daemonset by patching the ovn-interconnect configmap like this:

      $ oc -n openshift-ovn-kubernetes patch cm/ovn-interconnect-configuration --type merge --patch '{"data":{"zone-mode":"multizone","fast-forward-to-multizone":""}}'
      

      However, this cannot be considered a definitive solution, because it causes an expected outage in the pod network.
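
      After applying the workaround, a quick way to confirm that the multizone daemonset finished rolling out and that the previously crashlooping pods recovered (the daemonset name and label selector below are assumptions, not taken from this report):

      $ oc -n openshift-ovn-kubernetes rollout status ds/ovnkube-node
      $ oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-node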

      More information in the comments.

              bbennett@redhat.com Ben Bennett
              rhn-support-palonsor Pablo Alonso Rodriguez
              Zhanqi Zhao Zhanqi Zhao