Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-31444

Wrong dnsPolicy is used for konnectivity-agent in data plane

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.16.0
    • 4.14, 4.15
    • HyperShift
    • Moderate
    • No
    • False
    • Hide

      None

      Show
      None
    • Hide
      *Cause*: konnectivity-agent daemonset is using dnsPolicy ClusterIP
      *Consequence*: When CoreDNS is down, konnectivity-agent pods on the data plane cannot resolve the proxy-server-address, and they may fail to konnectivity-server in the control plane.
      *Fix*: Changed the konnectivity-agent daemonset to use dnsPolicy: Default
      *Result*: konnectivity-agent is using the host system DNS service to lookup the proxy-server-address and it does not depend anymore on CoreDNS
      Show
      *Cause*: konnectivity-agent daemonset is using dnsPolicy ClusterIP *Consequence*: When CoreDNS is down, konnectivity-agent pods on the data plane cannot resolve the proxy-server-address, and they may fail to konnectivity-server in the control plane. *Fix*: Changed the konnectivity-agent daemonset to use dnsPolicy: Default *Result*: konnectivity-agent is using the host system DNS service to lookup the proxy-server-address and it does not depend anymore on CoreDNS
    • Bug Fix
    • Done

    Description

      Description of problem:

      The konnectivity-agent on the data plane needs to resolve its proxy-server-url to connect the control plane's konnectivity server. Also, the these agents are using the default dnsPolicy which is ClusterFirst.
      
      This creates a dependency with CoreDNS. If CoreDNS is misconfigured or down, agents won't able to connect to the server, and all konnectivity related traffic goes down (blocks updates, webhooks, logs, etc).
      
      The correction would to use the dnsPolicy: Default in the konnectivity-agent daemonset on the data plane, so it would use the name resolution configuration from the node.
      
      This makes sure that the konnectivity-agent's proxy-server-url can be resolved even if coreDNS is down or mis-configured
      
      The konnectivity-agent control plane deployment shall not change as it still needs to use coreDNS as in that case a ClusterIP Service is configured as proxy-server-url.   

      Version-Release number of selected component (if applicable):

      4.14, 4.15
      
          

      How reproducible:

      Break coreDNS configuration

      Steps to Reproduce:

      1. Put an invalid forwarder to the dns.operator/default to fail upstream DNS resolving
      2. Rollout restart the konnectivity-agent daemonset in kube-system

      Actual results:

      kubectl log is failing

      Expected results:

      kubectl log is working

      Additional info:

       

      Attachments

        Issue Links

          Activity

            People

              adam.mihelcsik Adam Mihelcsik
              adam.mihelcsik Adam Mihelcsik
              Jie Zhao Jie Zhao
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: