Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.18.z
Component/s: Networking / ovn-kubernetes
Labels:
- SDN:OVNK:IPSEC
- ipsec

Activity Type:
None
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
Moderate
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

During a cluster update from 4.18.30 to 4.19.21, pod-to-pod and node-to-node connectivity is lost after the node reboots triggered by the Machine Config Operator. Although the nodes return to the Ready state and pods keep running, IPsec-encrypted traffic is dropped.

The following condition is reported:
AuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth.apps.<cluster-domain>/healthz": EOF

The likely root cause is a state desynchronization in the IPsec stack. When a node reboots, it re-initializes its IPsec Security Associations (SAs) with new Security Parameter Indexes (SPIs). Its peers, however, do not pick up the new SAs immediately and keep sending traffic with the stale SPIs. The rebooted node's kernel has no SA matching those SPIs, so it drops the encrypted packets. This surfaces as EOF errors on the OAuth route and stalls the upgrade.
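To confirm the SPI mismatch on a given node pair, one could capture `ip xfrm state` on the suspected stale peer and on the rebooted node and compare the SAs for the peer-to-node direction. The sketch below assumes the usual text layout of that command (a `src X dst Y` header line followed by an indented `proto esp spi 0x...` line). The function names and IP addresses are hypothetical, and because both ends can briefly hold several SAs during a normal rekey, a non-empty result is a hint rather than proof.

```python
import re
from typing import Optional, Set, Tuple

# One ESP Security Association keyed by (source IP, destination IP, SPI).
SAKey = Tuple[str, str, int]

# Header line of each SA block, e.g. "src 10.0.0.2 dst 10.0.0.1" (assumed layout).
_SRC_DST = re.compile(r"^src\s+(\S+)\s+dst\s+(\S+)")
# Line carrying the protocol and SPI, e.g. "proto esp spi 0x0000abcd reqid 1 mode transport".
_ESP_SPI = re.compile(r"^proto\s+esp\s+spi\s+(0x[0-9a-fA-F]+)")


def parse_xfrm_state(text: str) -> Set[SAKey]:
    """Collect (src, dst, spi) for every ESP SA in `ip xfrm state` text output."""
    sas: Set[SAKey] = set()
    src: Optional[str] = None
    dst: Optional[str] = None
    for line in text.splitlines():
        stripped = line.strip()
        header = _SRC_DST.match(stripped)
        if header:
            src, dst = header.group(1), header.group(2)
            continue
        esp = _ESP_SPI.match(stripped)
        if esp and src is not None and dst is not None:
            sas.add((src, dst, int(esp.group(1), 16)))
    return sas


def stale_sas(sender_state: str, receiver_state: str,
              sender_ip: str, receiver_ip: str) -> Set[SAKey]:
    """SAs the sender holds for sender_ip -> receiver_ip that the receiver lacks.

    The receiving kernel looks up inbound ESP packets by destination and SPI,
    so packets sent with an SPI it does not know are dropped.
    """
    def direction(state: str) -> Set[SAKey]:
        return {sa for sa in parse_xfrm_state(state)
                if sa[0] == sender_ip and sa[1] == receiver_ip}

    return direction(sender_state) - direction(receiver_state)
```

For example, `stale_sas(peer_output, rebooted_output, "10.0.0.2", "10.0.0.1")` returns the peer's outbound SAs toward the rebooted node whose SPIs the rebooted node no longer has.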

Workaround: restarting the ovn-ipsec-host and ovnkube-node pods resolves the issue.
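The restart can be scripted per node by deleting the pods and letting their DaemonSets recreate them. This is a minimal sketch, not a documented procedure: it assumes the pods live in the `openshift-ovn-kubernetes` namespace and carry the labels `app=ovn-ipsec` and `app=ovnkube-node` (verify with `oc -n openshift-ovn-kubernetes get pods --show-labels` before use). The node name in the usage note is hypothetical. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List

NAMESPACE = "openshift-ovn-kubernetes"
# Assumed pod labels, in the order the report restarts them: IPsec first, then ovnkube-node.
POD_LABELS = ("app=ovn-ipsec", "app=ovnkube-node")


def restart_commands(node: str) -> List[List[str]]:
    """Build one `oc delete pod` command per label, limited to pods on `node`."""
    return [
        ["oc", "-n", NAMESPACE, "delete", "pod", "-l", label,
         "--field-selector", f"spec.nodeName={node}"]
        for label in POD_LABELS
    ]


def restart(node: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in restart_commands(node):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

For example, `restart("worker-0")` prints the two commands for the hypothetical node `worker-0`; `restart("worker-0", dry_run=False)` runs them.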

Version-Release number of selected component (if applicable): OpenShift 4.18.30 -> 4.19.21

How reproducible:

Upgrade a cluster with IPsec enabled from 4.18.30 to 4.19.21.

Steps to Reproduce:

1. Enable IPsec encryption on an OVN-Kubernetes cluster.
2. Initiate a cluster upgrade, or apply a MachineConfig change that requires a rolling reboot of the nodes.
3. Monitor connectivity to the OAuth or Console routes during the reboot cycle.
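Step 3 can be automated with a simple poller that records whether each probe succeeded and groups consecutive failures into outage windows. This is a sketch under stated assumptions: the probe treats anything other than an HTTP 200 (including EOF, connection reset, timeout, or TLS errors) as "down", and the function names are hypothetical. Routes served with the cluster's own ingress certificate may need an `ssl.SSLContext` built from the ingress CA, otherwise every probe fails certificate verification and registers as down.

```python
import ssl
import time
import urllib.request
from typing import Callable, List, Optional, Tuple


def http_probe(url: str, timeout: float = 5.0,
               context: Optional[ssl.SSLContext] = None) -> bool:
    """True if GET url returns HTTP 200; EOF, reset, timeout, TLS or HTTP errors count as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status == 200
    except Exception:
        return False


def monitor(probe: Callable[[], bool], attempts: int, interval: float) -> List[bool]:
    """Call probe() `attempts` times, sleeping `interval` seconds between calls."""
    results: List[bool] = []
    for i in range(attempts):
        results.append(probe())
        if i < attempts - 1:
            time.sleep(interval)
    return results


def outage_windows(results: List[bool]) -> List[Tuple[int, int]]:
    """Group consecutive failed probes into inclusive (first_index, last_index) windows."""
    windows: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, ok in enumerate(results):
        if not ok and start is None:
            start = i
        elif ok and start is not None:
            windows.append((start, i - 1))
            start = None
    if start is not None:
        windows.append((start, len(results) - 1))
    return windows
```

For example, `monitor(lambda: http_probe("https://oauth.apps.<cluster-domain>/healthz", context=ssl.create_default_context(cafile="ingress-ca.crt")), attempts=120, interval=10)` polls the OAuth health endpoint every 10 seconds for about 20 minutes; `ingress-ca.crt` is a hypothetical file holding the cluster's ingress CA. An outage window that opens when a node reboots and never closes matches the behavior in this report.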

Actual results:

- Nodes reboot and return to Ready status.
- The IPsec xfrm state shows mismatched SPIs between the rebooted node and its peers.
- Encrypted traffic is dropped; Ingress and OAuth routes become unreachable.
- Reported error: AuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth.apps.<cluster-domain>/healthz": EOF
- The upgrade hangs because cluster operators cannot verify endpoint health over the network.

Additional info:

The issue was detected on a VMware UPI cluster.

Assignee: Lionel Jouin

Reporter: Simon Stumpf

Need Info From: None

Contributors: None

QA Contact: Anurag Saxena

Doc Contact: None

Votes: 0

Watchers: 4

Created: 2026/01/20 9:57 AM

Updated: 2026/02/12 3:27 PM