Bug
Resolution: Not a Bug
Critical
4.20.z
Quality / Stability / Reliability
False
3
CORENET Sprint 280
1
Description of problem:
ARO clusters with UDR and only ARO-provided DNS are unable to create Node objects for new worker nodes. API server connectivity is confirmed working: the new node's kubelet successfully creates node status events. However, the audit logs show no attempt to create the Node object.
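The audit-log check described above can be sketched as follows. This is a minimal illustration, assuming the audit log is available as JSON lines in the standard Kubernetes audit event format (`verb`, `objectRef.resource`, `user.username`); the function name `node_create_events` and the sample node name are hypothetical and not taken from the cluster in this report.

```python
import json
from typing import Iterable, Iterator, Optional


def node_create_events(lines: Iterable[str],
                       node_name: Optional[str] = None) -> Iterator[dict]:
    """Yield audit events that are attempts to create a Node object.

    If node_name is given, keep only requests made by that node's kubelet,
    which authenticates as "system:node:<node_name>".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict):
            continue
        ref = event.get("objectRef") or {}
        # A Node registration is a "create" on the "nodes" resource itself,
        # not on a subresource such as nodes/status.
        if event.get("verb") != "create" or ref.get("resource") != "nodes":
            continue
        if ref.get("subresource"):
            continue
        if node_name is not None:
            username = (event.get("user") or {}).get("username")
            if username != f"system:node:{node_name}":
                continue
        yield event
```

An empty result for the affected node's kubelet user would be consistent with the observation that no create attempt reaches the API server.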
Version-Release number of selected component (if applicable):
4.20.0
How reproducible:
Every time on a cluster that has a) restricted outbound via UDR, and b) DNS limited to just ARO dnsmasq-provided names (i.e. vnet has an invalid DNS IP).
Steps to Reproduce:
1. Create a vnet with a DNS server that won't respond
2. Create an ARO cluster using the vnet, with --outbound-type UserDefinedRouting
3. Upgrade to 4.20.0
4. Scale up a machineset
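To confirm the DNS precondition from the steps above on a given host, a small resolution check can help. The sketch below is an assumption-laden illustration, not part of the original report: the helper name `resolve_all` is hypothetical, and the list of hostnames to test must be supplied by the reader.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve_all(names: Iterable[str],
                resolver: Callable = socket.getaddrinfo) -> Dict[str, Optional[str]]:
    """Map each hostname to its first resolved address, or None on failure.

    The resolver is injectable so the check can be exercised without
    network access; by default it uses the system resolver.
    """
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            infos = resolver(name, None)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string.
            results[name] = infos[0][4][0] if infos else None
        except OSError:
            # socket.gaierror is a subclass of OSError.
            results[name] = None
    return results
```

Calling `resolve_all([...])` with a mix of cluster-internal names and external names should show which ones resolve in this setup; any name mapped to `None` depends on the unresponsive vnet DNS server.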
Actual results:
The Machine object and VM are created successfully, but the Node object never gets created and the worker node does not fully join the cluster. MHC repeatedly re-creates the worker because it never becomes healthy.
Expected results:
The new worker node joins the cluster and becomes healthy.
Additional info:
See also Slack thread: https://redhat-internal.slack.com/archives/CK1AE4ZCK/p1762365561311179
Node sosreport: https://drive.google.com/file/d/10l1KsX564ZHwf8Fg2Gt0noZXZKBgzYzu/view?usp=drive_link
Kubelet logs set to loglevel 6: https://drive.google.com/file/d/1JZlivPQKomcCIqvkR6UhwKG5glTxt7oq/view?usp=drive_link