Bug
Resolution: Cannot Reproduce
Minor
None
4.12.z
Low
No
Rejected
False
business impact not discernable, maybe none; low-value
Description of problem:
When following the KCS https://access.redhat.com/articles/4365651 or https://access.redhat.com/solutions/4569211 to capture a pod's DNS packets for analysis, we found that tcpdump intermittently fails to capture DNS packets.
The pattern looks like this:
- app.pcap // DNS queries sent by a pod
67 2023-12-20 00:59:05.517611 10.153.0.71 172.30.0.10 DNS 84 Standard query 0x5066 A google.com
68 2023-12-20 00:59:05.517616 10.153.0.71 172.30.0.10 DNS 84 Standard query 0x186a AAAA google.com
69 2023-12-20 00:59:05.517820 172.30.0.10 10.153.0.71 DNS 269 Standard query response 0x186a AAAA google.com AAAA 2404:6800:4009:82b::200e
- coredns.pcap // captured from the same node
226 2023-12-20 00:59:05.517716 10.153.0.71 10.153.0.8 MDNS 126 Standard query 0x5066 A google.com
227 2023-12-20 00:59:05.517783 10.153.0.8 10.153.0.71 MDNS 187 Standard query response 0x186a AAAA google.com AAAA 2404:6800:4009:82b::200e
228 2023-12-20 00:59:05.517800 10.153.0.8 10.153.0.71 MDNS 124 Standard query response 0x5066 A google.com A 142.250.70.46
To summarize,
- The tcpdump capture taken in the pod that sends the DNS queries is missing the DNS A response
- The tcpdump capture taken in the coredns pod on the same node is missing the DNS AAAA query
- DNS resolution works fine
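The comparison above can be automated. The following is a minimal sketch, not part of the KCS procedure. It assumes each capture has been exported to text in the same column layout as the excerpts above. `unmatched_dns` is a hypothetical helper name. It pairs queries and responses by transaction ID and record type, and prints the ones that have no counterpart.

```shell
# Hypothetical helper: report DNS transactions that appear only as a query
# or only as a response in a text export of a capture.
# Assumed column layout (taken from the excerpts in this report):
#   No. Date Time Src Dst Proto Len "Standard query" [response] TXID TYPE NAME ...
unmatched_dns() {
  awk '
    $8 == "Standard" && $9 == "query" {
      if ($10 == "response") {
        resp[$11 " " $12] = 1      # key: transaction ID + record type
      } else {
        query[$10 " " $11] = 1
      }
    }
    END {
      for (k in query) if (!(k in resp))  print "no response: " k
      for (k in resp)  if (!(k in query)) print "no query: " k
    }
  ' "$@" | sort
}
```

For example, `unmatched_dns app.txt` (where `app.txt` is an assumed text export of app.pcap) would flag `0x5066 A` as having no response for the excerpt shown above.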
Version-Release number of selected component (if applicable):
OCP 4.12.13
OCP 4.12.39
How reproducible:
Steps to Reproduce:
1. Create a pod which can run nslookup, for example busybox
2. Run tcpdump on the same node where the app pod is running
3. Run the DNS query 1000 times:
for i in `seq 1 1000`;do nslookup google.com;done
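Since the report notes that resolution itself keeps working, it may help to record how many lookups succeeded during the run. That count can then be compared against what the capture shows. The sketch below is one possible variant of the loop above. `run_lookups` and the `LOOKUP_CMD` override are hypothetical names introduced here for illustration, and `nslookup` is assumed to be available in the pod.

```shell
# Hypothetical helper: run N lookups and count successes and failures.
# LOOKUP_CMD is an assumed override so a different resolver tool can be swapped in.
run_lookups() {
  n=$1
  name=$2
  ok=0
  fail=0
  i=0
  while [ "$i" -lt "$n" ]; do
    if ${LOOKUP_CMD:-nslookup} "$name" >/dev/null 2>&1; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1))
    fi
    i=$((i + 1))
  done
  echo "lookups=$n ok=$ok failed=$fail"
}
```

Calling `run_lookups 1000 google.com` inside the pod while tcpdump is running gives a success count to set against the number of query/response pairs in the capture.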
Actual results:
tcpdump fails to capture 300+ packets (sometimes 200+ or 400+).
Expected results:
tcpdump should capture all of the packets when packet analysis is required.
Additional info:
The issue happens on both RHCOS and RHEL nodes.
Confirmed that it does not happen in an SDN environment.
It can only be reproduced in an OVN environment.