OpenShift Bugs / OCPBUGS-12049

DNS failing on 6 nodes; no external communication for pods scheduled to these hosts


    • Important
    • No
    • Rejected
    • False

      (taken from Salesforce)

      Application pods on these worker nodes could not resolve service names, ping IP addresses, or reach the API endpoint. Pods appeared to have no external communication while scheduled on these nodes. Nothing indicated that these nodes were unhealthy; such an indication could have prevented pods from being scheduled there and avoided tenant disruptions.

      How reproducible: Unknown
      
      

      Steps to Reproduce: Current state; cannot be reproduced in the lab

      Actual results: N/A

      Expected results: Pods are scheduled to nodes and DNS resolves normally
      
      

      Additional info: Albert noticed that labels present on other workers were missing. During a troubleshooting call, we tested on three nodes: the first had the workerperf and webscale labels, the second had only the workerperf label, and the third had no labels.

      Testing succeeded on the first two machines; on the third, it hung. (Note that the other two hosts can also resolve external hosts: we can query and ping google.com from them, which is not possible from our target/problem host, worker-079.)
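      The label comparison described above could be repeated across more nodes with a small helper. This is only a sketch: the node names are hypothetical placeholders, the label names are written as they appear in this ticket (the real label keys may carry a prefix), and treating workerperf plus webscale as the reference set is an assumption. Whether the missing labels are related to the DNS failure has not been established.

```python
from typing import Dict, Set


def missing_labels(reference: Set[str], nodes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, for each node, the reference labels that the node lacks.

    Nodes that carry every reference label are left out of the result.
    """
    report = {}
    for name, labels in nodes.items():
        absent = reference - labels
        if absent:
            report[name] = absent
    return report


# Hypothetical node names; the label sets mirror the three nodes tested on the call.
observed = {
    "node-1": {"workerperf", "webscale"},
    "node-2": {"workerperf"},
    "node-3": set(),
}

# Assumption: a fully labelled worker carries both labels.
report = missing_labels({"workerperf", "webscale"}, observed)
for name in sorted(report):
    print(f"{name} is missing: {', '.join(sorted(report[name]))}")
```

      In practice the observed mapping would be filled from the cluster's node list rather than typed by hand.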

      
      

            mkennell@redhat.com Martin Kennelly
            dacarpen@redhat.com Darren Carpenter
            Anurag Saxena
            Darren Carpenter
            Votes: 0
            Watchers: 9
