-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.14.z, 4.15.z, 4.17.z, 4.16.z, 4.18.z, 4.19.z, 4.20.z, 4.21.0
-
Quality / Stability / Reliability
-
True
-
-
None
-
Important
-
None
-
None
-
None
-
None
-
contract-priority
-
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Description of problem:
Summary:
The haproxy pod became unable to query DNS (UDP communication failure) after the dns-default-xxx pods were deleted and recreated.
The issue is resolved by deleting the conntrack record on the worker node where haproxy is running; this record holds the IP address of an old (already deleted) dns-default pod.
Details:
The dns pod was deleted:
quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-86409e992e7e8323656a6a25ad2aead3cd82beea55680af1fd65c00a73a6abc4/host_service_logs/masters/kubelet_service.log:Jun 02 05:29:56.984844 etcd-1.paas.tmg.local kubenswrapper[2396]: I0602 05:29:56.984780 2396 kubelet.go:2449] "SyncLoop DELETE" source="api" pods=[openshift-dns/dns-default-jvbrs]
quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-86409e992e7e8323656a6a25ad2aead3cd82beea55680af1fd65c00a73a6abc4/host_service_logs/masters/kubelet_service.log:Jun 02 05:29:56.988325 etcd-1.paas.tmg.local kubenswrapper[2396]: I0602 05:29:56.988269 2396 kubelet.go:2443] "SyncLoop REMOVE" source="api" pods=[openshift-dns/dns-default-jvbrs]
However, after that, the following conntrack record still existed:
udp 17 119 src=10.130.2.168 dst=172.30.0.10 sport=44882 dport=53 src=10.129.0.106 dst=10.130.2.168 sport=5353 dport=44882 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 zone=20 use=1
- 10.130.2.168 is the IP address of the haproxy-0 pod
- 172.30.0.10 is the DNS Service IP
- 10.129.0.106 is the IP address of the old (already deleted) DNS pod (dns-default-jvbrs)
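For reference, a way to check on the affected worker node whether such a stale entry is still present might look like the following (illustrative only; the node name is a placeholder, the IPs are the ones from the record above, and it assumes the conntrack tool can be run on the host, for example via oc debug node):

# list UDP conntrack entries toward the DNS Service IP on the node running haproxy-0
oc debug node/<worker-node> -- chroot /host conntrack -L -p udp --dst 172.30.0.10 --dport 53
# list only entries whose reply source is the old (deleted) dns-default pod IP
oc debug node/<worker-node> -- chroot /host conntrack -L -p udp --reply-src 10.129.0.106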
It seems that a stale UDP conntrack entry exists and is causing the DNS query failures. The issue is resolved after deleting the old conntrack record.
In my understanding, an old conntrack record should be removed once 180s have passed without any active communication, but in this case the old conntrack record keeps being retained.
My assumption is that, because haproxy sends DNS queries frequently, packets keep matching the old entry, so the conntrack retention time keeps being refreshed.
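As a hedged sketch of the workaround and of checking the kernel timeouts that would normally expire such an entry (the node name is a placeholder; the sysctl names are the standard netfilter ones):

# delete the stale entry that still points at the old dns-default pod IP
oc debug node/<worker-node> -- chroot /host conntrack -D -p udp --reply-src 10.129.0.106
# check the UDP conntrack timeouts; an assured entry is refreshed by every
# matching packet, which would explain why it never expires under constant queries
oc debug node/<worker-node> -- chroot /host sysctl net.netfilter.nf_conntrack_udp_timeout net.netfilter.nf_conntrack_udp_timeout_stream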
I found this Jira [1]; it seems to describe the same issue, but that Jira is for the SDN component, not for ovn-kubernetes.
In that Jira [1] it was said that "It seems the issue does not exist with OVN (OpenShift 4.16.13)".
However, with OVN it appears to occur less frequently than with SDN and is difficult to reproduce, but it does occur in reality.
The UDP conntrack cleanup logic seems to be the same in 4.14 [2] and 4.16 [3].
(It also appears to be the same in newer versions such as 4.18. If my confirmation point is not correct, I apologize.)
[1] https://issues.redhat.com/browse/OCPBUGS-42203
Version-Release number of selected component (if applicable):
OpenShift 4.14.51
openshift4/ose-ovn-kubernetes@sha256:9c1407542398da5dda6c7c335c36221ba7c78df70c3d90c182b7f8e2eb4e0c91
How reproducible:
Not always, but sometimes (about 25% of attempts when executing the reproduction steps in the customer environment).
Steps to Reproduce:
1. Create haproxy pods in advance.
2. Delete the dns-default pods and wait for them to be recreated.
3. Execute an operation that triggers DNS queries via the haproxy pods (see the verification sketch below).
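For step 3, one possible way to trigger and verify DNS resolution from the haproxy pod, and to compare the conntrack reply address with the current dns-default pod IPs, might be (pod/namespace names, the node name, and the lookup target are placeholders; nslookup may not be available in every image):

# current dns-default pod IPs after recreation
oc -n openshift-dns get pods -o wide
# trigger a DNS query from the haproxy pod against the DNS Service IP
oc -n <haproxy-namespace> exec haproxy-0 -- nslookup kubernetes.default.svc.cluster.local 172.30.0.10
# on the worker node, check whether the reply address still points at a deleted dns pod
oc debug node/<worker-node> -- chroot /host conntrack -L -p udp --dst 172.30.0.10 --dport 53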
Actual results:
The old conntrack record still existed, and DNS resolution failures occurred.
Expected results:
DNS resolution should continue to work after the dns-default pods are recreated.
Affected Platforms:
None