-
Bug
-
Resolution: Won't Do
-
Normal
-
None
-
4.8
-
Important
-
No
-
Rejected
-
False
-
-
(taken from salesForce)
application pods on these worker nodes could not resolve service names, ping IP addresses, reach api endpoint etc. Pods appeared to have no external communication ability while scheduled on these nodes. There was no indication these nodes were unhealthy, which could have prevented pod scheduling and tenant disruptions.
How reproducible: Unknown
Steps to Reproduce: Current state, can't re-produce in lab
Actual results: N/A
Expected results: Pods would be scheduled to nodes, dns would resolve normally
Additional info: Albert noticed that labels present on other workers were missing, but during a troubleshooting call, we tested on three nodes. One had the workerperf and webscale label, second had only the workerperf label, third had 0 labels.
On the first two machines, our testing was successful. On the third, it would just hang. (One thing to note, the other 2 hosts are somehow ok with resolving external hosts as well... we can query/ping google.com from those hosts, while this is not possible from our target/problem host worker-079)