Red Hat OpenStack Services on OpenShift / OSPRH-20303

[18.0.10][dnsmasq-dns pods restarting every 30 min due to dns queries from edpm node]


    • Bug
    • Resolution: Not a Bug
    • openstack-operator
    • Critical

      Issue:

      The dnsmasq-dns pods restart every 30 minutes, which causes connectivity failures on the EDPM nodes and disrupts any process running during that window. The customer ran tcpdump on the master node to capture all incoming requests [1] and observed the following:

      • The EDPM nodes are flooding dnsmasq-dns with DNS queries.
      • Queries are also arriving over TCP, which adds the overhead of a three-way handshake.
      • We need to know why the EDPM nodes are sending so many DNS queries and TCP requests.
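
One way to quantify the observations above is to tally, per source address, how many packets reach a DNS port over UDP versus TCP and how many new TCP connections (SYNs) each source opens. The following is a minimal sketch, not the customer's procedure. It assumes the capture has been converted to text first (for example with `tcpdump -nn -r <file>`), and the function name `tally_dns_traffic`, the sample lines, and the 192.0.2.x addresses are all hypothetical.

```python
import re
from collections import Counter

# Matches the "IP src.port > dst.port: ..." part of a tcpdump -nn text line (IPv4 only).
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): (.*)"
)


def tally_dns_traffic(lines, dns_ports=(53, 5353)):
    """Count packets sent to a DNS port, keyed by (source IP, transport).

    Also returns a per-source count of TCP SYNs, i.e. new connection attempts.
    """
    counts = Counter()
    syns = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        src, _sport, _dst, dport, rest = match.groups()
        if int(dport) not in dns_ports:
            continue  # skip replies and unrelated traffic
        # tcpdump prints "Flags [...]" for TCP segments; UDP DNS lines do not have it.
        transport = "tcp" if rest.startswith("Flags [") else "udp"
        counts[(src, transport)] += 1
        if rest.startswith("Flags [S],"):
            syns[src] += 1
    return counts, syns


# Hypothetical sample lines in tcpdump text format (not taken from the real capture).
SAMPLE = [
    "09:38:01.000001 IP 192.0.2.15.40001 > 192.0.2.81.53: 1234+ A? host.example. (30)",
    "09:38:01.000002 IP 192.0.2.15.40002 > 192.0.2.81.53: Flags [S], seq 1, win 64240, length 0",
    "09:38:01.000003 IP 192.0.2.81.53 > 192.0.2.15.40002: Flags [S.], seq 9, ack 2, win 65160, length 0",
    "09:38:01.000004 IP 192.0.2.15.40002 > 192.0.2.81.53: Flags [.], ack 10, win 502, length 0",
    "09:38:01.000005 IP 192.0.2.16.40003 > 192.0.2.81.53: 4321+ AAAA? host.example. (30)",
]

if __name__ == "__main__":
    counts, syns = tally_dns_traffic(SAMPLE)
    for (src, transport), n in sorted(counts.items()):
        print(f"{src} {transport} {n}")
    for src, n in sorted(syns.items()):
        print(f"{src} new TCP connections: {n}")
```

A source with a high SYN count relative to its UDP query count would be a candidate for closer inspection of its resolver configuration.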

       

      rw-rw-rw+ 1 yank yank 1364202 Sep 24 09:38 0130-dnsmasq-240925.pcap //tcpdump -i any -y LINUX_SLL2 -C 500 -W 5 -w /tmp/external-capture_1.pcap port 53 or port 5353 or port 32726 or host 100.64.68.81 or host 100.64.80.15 or host 100.64.80.16 or host 100.64.80.17
      rw-rw-rw+ 1 yank yank   47853 Sep 24 17:45 0140-external-capture_1.pcap0 //Compute node

       

      After applying the enhancement linked below and increasing the replica count from 3 to 6, only one pod restarted, when the connection count exceeded 20. After the further increase to 12, no pod restarts were observed.

      https://issues.redhat.com/browse/OSPRH-20039?focusedId=28118325&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-28118325
      Reference Jira:  https://issues.redhat.com/browse/OSPRH-20039
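
The replica observation above can be read as simple arithmetic: if concurrent connections are spread across the pods, more replicas means fewer connections per pod. The sketch below illustrates that reasoning only. It assumes an even spread across replicas, treats the figure of 20 from the report as a per-pod limit, and uses a made-up total of 130 concurrent connections; none of these assumptions is confirmed by the ticket.

```python
import math


def per_pod_load(total_connections, replicas):
    """Worst-case connections per pod, assuming an even spread (an assumption)."""
    if replicas <= 0:
        raise ValueError("replicas must be positive")
    return math.ceil(total_connections / replicas)


def min_replicas(total_connections, per_pod_limit=20):
    """Smallest replica count that keeps every pod at or below the limit."""
    if per_pod_limit <= 0:
        raise ValueError("per_pod_limit must be positive")
    return max(1, math.ceil(total_connections / per_pod_limit))


if __name__ == "__main__":
    total = 130  # hypothetical number of concurrent connections, for illustration only
    for replicas in (3, 6, 12):
        print(replicas, per_pod_load(total, replicas))
    print("minimum replicas:", min_replicas(total))
```

Under these assumptions, 3 and 6 replicas both leave each pod above 20 connections, while 12 replicas bring the per-pod load below it.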

              Unassigned Unassigned
              rhn-support-pgodwin Paul Godwin
              rhos-dfg-df