Bug
Resolution: Unresolved
4.20.z
Important
Rejected
Description of problem:
On 4.20 it was discovered that kubelet does not allow enough time for NSS DNS lookups to time out and fall back to myhostname. We can avoid this timeout by moving myhostname ahead of dns in nsswitch.conf. As one potential solution to OCPBUGS-64883, MCO would template /etc/nsswitch.conf on Azure so that the hosts line puts myhostname before dns:
hosts: files myhostname dns
Other solutions include:
Changing the kubelet startup timeout in OCPBUGS-67200. This is unlikely to be changed in the near term to resolve the 4.20 update risk, and even if it is accepted upstream it probably won't be backported.
Changing the underlying CoreOS nsswitch.conf in OCPBUGS-67317. It's unclear if changing this globally will be an accepted solution.
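The hosts-line reordering proposed above can be sketched as a small text transformation. This is only an illustration of the intended result, not how MCO would template the file; the function name reorder_hosts_line and the sample input are hypothetical, and NSS action items such as [NOTFOUND=return] are deliberately not handled.

```python
def reorder_hosts_line(nsswitch_text: str) -> str:
    """Return nsswitch.conf text with myhostname placed before dns on the hosts line.

    Hypothetical helper for illustration only. Lines other than the hosts
    line are passed through unchanged. Bracketed action items such as
    [NOTFOUND=return] are not treated specially.
    """
    out = []
    for line in nsswitch_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("hosts:"):
            sources = stripped[len("hosts:"):].split()
            if "dns" in sources:
                already_first = (
                    "myhostname" in sources
                    and sources.index("myhostname") < sources.index("dns")
                )
                if not already_first:
                    # Drop any existing myhostname entry, then re-insert it
                    # immediately before dns.
                    sources = [s for s in sources if s != "myhostname"]
                    sources.insert(sources.index("dns"), "myhostname")
            line = "hosts: " + " ".join(sources)
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed sample input; the actual default on a given node may differ.
sample = "passwd: files\nhosts: files dns myhostname\n"
print(reorder_hosts_line(sample), end="")
```

With the assumed sample input, the hosts line becomes the one proposed in the description, and a file that already lists myhostname before dns is left in that order.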
Version-Release number of selected component (if applicable):
4.20.z
How reproducible:
Always
Steps to Reproduce:
1. Create a new worker node whose DNS server will time out
2. Observe that kubelet will never create the Node object
Actual results:
No Node object is created; instead, kubelet continually errors out and is restarted.
Expected results:
Kubelet creates the Node object and the worker joins the cluster successfully
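When reproducing this, it may help to measure how long a hostname lookup through NSS takes on the affected node. The sketch below is a rough diagnostic, assuming the slow lookup is for the node's own hostname; the function name time_hostname_lookup is hypothetical, and the script is not part of kubelet or MCO. Python's socket.getaddrinfo calls the C library resolver, so it follows the hosts order configured in nsswitch.conf.

```python
import socket
import time


def time_hostname_lookup(hostname=None):
    """Resolve a hostname and report whether it resolved and how long it took.

    Hypothetical diagnostic helper. Defaults to the local hostname, on the
    assumption that this is the lookup that stalls on an affected node.
    """
    name = hostname or socket.gethostname()
    start = time.monotonic()
    try:
        socket.getaddrinfo(name, None)
        resolved = True
    except socket.gaierror:
        # Resolution failed in every configured NSS source.
        resolved = False
    elapsed = time.monotonic() - start
    return name, resolved, elapsed
```

Calling time_hostname_lookup() with no arguments on the node and printing the elapsed time gives a rough indication of whether the lookup waits on a DNS timeout before any fallback happens. A long elapsed time with dns ahead of myhostname, and a short one after reordering, would be consistent with the behavior described in this bug.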
Additional info:
This is a result of the investigation into OCPBUGS-64883. Currently ARO has declared this an upgrade risk for 4.20.
See also incident Slack channel #itn-2025-00296 https://redhat.enterprise.slack.com/archives/C09SCTRBK7Z
clones: OCPBUGS-67317 Re-order NSS modules to move myhostname ahead of dns (New)
relates to: OCPBUGS-64883 Workers don't create their Node objects on ARO (New)