Description of problem:
On 4.20, it was discovered that kubelet does not allow enough time for NSS DNS lookups to time out and fall back to myhostname. We can avoid this timeout by moving myhostname ahead of dns in the hosts entry of nsswitch.conf.
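To illustrate the proposed reordering, here is a minimal Python sketch. It is not the actual fix or how the change would be delivered. The function name `reorder_hosts_line` and the sample hosts line are hypothetical; the real nsswitch.conf contents on affected nodes may differ. The sketch keeps any bracketed action items (such as `[!UNAVAIL=return]`) attached to the module they follow.

```python
import re


def reorder_hosts_line(line: str) -> str:
    """Return the line with myhostname moved directly ahead of dns.

    Lines that are not a 'hosts:' entry, or that already have the
    desired order, are returned unchanged.
    """
    key, sep, rest = line.partition(":")
    if not sep or key.strip() != "hosts":
        return line

    # Split into tokens, keeping each bracketed action list as one token.
    tokens = re.findall(r"\[[^\]]*\]|\S+", rest)

    # Group every module with the action items that follow it.
    entries = []
    for tok in tokens:
        if tok.startswith("[") and entries:
            entries[-1].append(tok)
        else:
            entries.append([tok])

    names = [entry[0] for entry in entries]
    if "myhostname" not in names or "dns" not in names:
        return line

    src, dst = names.index("myhostname"), names.index("dns")
    if src < dst:
        return line  # myhostname is already consulted before dns

    entries.insert(dst, entries.pop(src))
    flat = [tok for entry in entries for tok in entry]
    return f"{key}: {' '.join(flat)}"


if __name__ == "__main__":
    # Hypothetical sample line, not taken from an affected node.
    print(reorder_hosts_line("hosts:      files dns myhostname"))
```

With this ordering, a lookup of the node's own hostname would be answered by myhostname without waiting for a DNS timeout, while other names still fall through to dns.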
Version-Release number of selected component (if applicable):
4.20.z
How reproducible:
Always
Steps to Reproduce:
1. Create a new worker node whose DNS server will time out
2. Observe that kubelet will never create the Node object
Actual results:
No Node object is created; instead, kubelet continually errors out and is restarted.
Expected results:
Kubelet creates the Node object and the worker joins the cluster successfully
Additional info:
This is a result of the investigation into OCPBUGS-64883. Currently ARO has declared this an upgrade risk for 4.20.
See also incident Slack channel #itn-2025-00296 https://redhat.enterprise.slack.com/archives/C09SCTRBK7Z
- is cloned by: OCPBUGS-68380 [azure] Re-order NSS modules to move myhostname ahead of dns (New)
- relates to: OCPBUGS-64883 Workers don't create their Node objects on ARO (New)
- relates to: RHEL-39537 [RHEL-10] hostname -f cannot show fqdn (Closed)