OpenShift Bugs / OCPBUGS-25888

[ovn] tcpdump fails to capture DNS packets intermittently


    • Low
    • No
    • Rejected
    • False
    • business impact not discernable, maybe none; low-value

      Description of problem:
      When following the KCS articles https://access.redhat.com/articles/4365651 or https://access.redhat.com/solutions/4569211 to capture a pod's DNS packets for analysis, tcpdump intermittently fails to capture some of the DNS packets.

      The pattern looks like this:

      • app.pcap // DNS queries sent by a pod

      67 2023-12-20 00:59:05.517611 10.153.0.71 172.30.0.10 DNS 84 Standard query 0x5066 A google.com
      68 2023-12-20 00:59:05.517616 10.153.0.71 172.30.0.10 DNS 84 Standard query 0x186a AAAA google.com
      69 2023-12-20 00:59:05.517820 172.30.0.10 10.153.0.71 DNS 269 Standard query response 0x186a AAAA google.com AAAA 2404:6800:4009:82b::200e

      • coredns.pcap // captured in the CoreDNS pod on the same node

      226 2023-12-20 00:59:05.517716 10.153.0.71 10.153.0.8 MDNS 126 Standard query 0x5066 A google.com
      227 2023-12-20 00:59:05.517783 10.153.0.8 10.153.0.71 MDNS 187 Standard query response 0x186a AAAA google.com AAAA 2404:6800:4009:82b::200e
      228 2023-12-20 00:59:05.517800 10.153.0.8 10.153.0.71 MDNS 124 Standard query response 0x5066 A google.com A 142.250.70.46

      To summarize:

      • The capture taken in the pod sending the DNS queries is missing the DNS A response (0x5066).
      • The capture taken in the CoreDNS pod on the same node is missing the DNS AAAA query (0x186a).
      • DNS resolution itself works correctly.
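The comparison above comes down to pairing each query with its response by transaction ID and query type. A minimal sketch of that check, assuming the input is the Wireshark packet-list text exactly as pasted in this report; `pair_dns` and `unmatched` are hypothetical helper names. The coredns.pcap lines are labeled MDNS, presumably because Wireshark decodes the CoreDNS port (UDP 5353) as mDNS, so the pattern only looks at the "Standard query" text and ignores the protocol column.

```python
import re
from collections import defaultdict

# Matches the Info column of a DNS (or mDNS-decoded) summary line, for example
# "Standard query 0x5066 A google.com" or
# "Standard query response 0x186a AAAA google.com AAAA 2404:6800:4009:82b::200e".
INFO_RE = re.compile(r"Standard query (response )?(0x[0-9a-fA-F]+) (\S+) (\S+)")

# The summary lines quoted in this report.
APP_PCAP = [
    "67 2023-12-20 00:59:05.517611 10.153.0.71 172.30.0.10 DNS 84 Standard query 0x5066 A google.com",
    "68 2023-12-20 00:59:05.517616 10.153.0.71 172.30.0.10 DNS 84 Standard query 0x186a AAAA google.com",
    "69 2023-12-20 00:59:05.517820 172.30.0.10 10.153.0.71 DNS 269 Standard query response 0x186a AAAA google.com AAAA 2404:6800:4009:82b::200e",
]
COREDNS_PCAP = [
    "226 2023-12-20 00:59:05.517716 10.153.0.71 10.153.0.8 MDNS 126 Standard query 0x5066 A google.com",
    "227 2023-12-20 00:59:05.517783 10.153.0.8 10.153.0.71 MDNS 187 Standard query response 0x186a AAAA google.com AAAA 2404:6800:4009:82b::200e",
    "228 2023-12-20 00:59:05.517800 10.153.0.8 10.153.0.71 MDNS 124 Standard query response 0x5066 A google.com A 142.250.70.46",
]


def pair_dns(lines):
    """Map (txid, qtype, qname) -> {"query": bool, "response": bool}."""
    seen = defaultdict(lambda: {"query": False, "response": False})
    for line in lines:
        m = INFO_RE.search(line)
        if m is None:
            continue  # not a DNS query/response summary line
        is_response, txid, qtype, qname = m.groups()
        seen[(txid.lower(), qtype, qname)]["response" if is_response else "query"] = True
    return dict(seen)


def unmatched(pairs):
    """Return (key, missing_part) for every exchange with only one half captured."""
    return sorted(
        (key, "response" if info["query"] else "query")
        for key, info in pairs.items()
        if not (info["query"] and info["response"])
    )


if __name__ == "__main__":
    print(unmatched(pair_dns(APP_PCAP)))      # [(('0x5066', 'A', 'google.com'), 'response')]
    print(unmatched(pair_dns(COREDNS_PCAP)))  # [(('0x186a', 'AAAA', 'google.com'), 'query')]
```

Keying on the query type as well as the ID matters here because the A and AAAA lookups of one nslookup run use different IDs, and either half can go missing independently.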

      Version-Release number of selected component (if applicable):
      OCP 4.12.13
      OCP 4.12.39

      How reproducible:
      Steps to Reproduce:

      1. Create a pod that can run nslookup, for example busybox.
      2. Run tcpdump on the node where the app pod is running.
      3. Run the DNS query 1000 times:
      for i in $(seq 1 1000); do nslookup google.com; done
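nslookup chooses its own transaction IDs, so its captures can only be checked pair by pair. One alternative, sketched here under the assumption that the test pod has Python 3 available (busybox does not ship it), is to send the A/AAAA queries directly with sequential, known IDs, so that any gap in a capture can be located exactly. `build_query` and `send_queries` are hypothetical helpers, not part of any OpenShift tooling; the wire format follows RFC 1035 section 4.1.

```python
import socket
import struct

QTYPE_A = 1      # RFC 1035
QTYPE_AAAA = 28  # RFC 3596


def build_query(txid, qname, qtype):
    """Encode a single-question DNS query (RFC 1035 section 4.1) with the RD flag set."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)  # QCLASS=IN


def send_queries(server, count, qname="google.com", port=53, timeout=2.0):
    """Send `count` A/AAAA query pairs with sequential IDs.

    Returns the (txid, qtype) pairs that received no matching reply within `timeout`.
    """
    missing = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for i in range(count):
            for offset, qtype in enumerate((QTYPE_A, QTYPE_AAAA)):
                txid = (2 * i + offset) & 0xFFFF  # IDs wrap after 65536 queries
                sock.sendto(build_query(txid, qname, qtype), (server, port))
                try:
                    while True:  # skip late replies to earlier queries
                        data, _ = sock.recvfrom(4096)
                        if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == txid:
                            break
                except socket.timeout:
                    missing.append((txid, qtype))
    return missing
```

For example, `send_queries("172.30.0.10", 1000)` run inside the app pod would target the DNS service IP seen in app.pcap. Even IDs are then A queries and odd IDs AAAA queries, so a capture missing, say, ID 0x0041 is missing an AAAA query.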

      Actual results:
      Roughly 300+ packets (sometimes 200+ or 400+) are missing from the capture.

      Expected results:
      tcpdump captures all of the DNS packets, so the capture can be relied on for packet analysis.

      Additional info:
      The issue happens on both RHCOS and RHEL nodes.
      Confirmed that it does not happen in an SDN environment.
      It can only be reproduced in an OVN environment.
       

            Ben Pickard (bpickard@redhat.com)
            Ying Huang (rhn-support-yhuang)
            Anurag Saxena
            Ying Wang
            Chris Fields
            Votes: 1
            Watchers: 6
