OpenShift Bugs / OCPBUGS-50605

K-nmstate does not modify static DNS in nmconnection file (was: NMState taking too much time to reapply the NNCE after node reboot)

      Waiting for feedback re: solution proposed on 2025/07/30 10:18 AM

      Description of problem:

      Some background about the issue:

      Let's say a user modifies `/etc/resolv.conf` on the nodes to add a search domain `example.com` through the NMState operator, using the NNCP below:

      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: worker-dns-modify
      spec:
        nodeSelector:
          node-role.kubernetes.io/worker: ""
        desiredState:
          dns-resolver:
            config:
              search:
              - example.com 
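
      To confirm the rollout, the policy and its per-node enactments can be checked with standard oc commands (node names below are placeholders; NNCE objects are named <node-name>.<policy-name>):

      $ oc apply -f worker-dns-modify.yaml
      $ oc get nncp worker-dns-modify
      $ oc get nnce
      $ oc get nnce <node-name>.worker-dns-modify -o yaml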

      The NNCP applies successfully. However, whenever the node reboots, whether due to patching, an MCO rollout, or a manual reboot, the configuration applied through the NNCP is reverted by design. Once the node reaches the READY state, the NNCP and NNCE are re-applied on the node, which is expected. Many customers have observed that the NNCE sometimes takes around 5 minutes to apply after the node becomes READY, while at other times it applies within 2 minutes.
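
      The revert window can be observed directly on the host; a quick check, assuming oc debug access to the node:

      $ oc debug node/<node-name> -- chroot /host cat /etc/resolv.conf
      # Right after the reboot, before the NNCE re-applies, the search line
      # is missing example.com; once the NNCE succeeds, it reappears.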

      The problem we are facing:
      In the test cluster, while reproducing the issue, pods start to schedule on the node as soon as it becomes READY. At that point, right after the reboot, NMState is still applying the NNCP and NNCE in parallel. However, the node still has the old DNS settings in `/etc/resolv.conf`, which is why the pods pick up the old DNS configuration from the node.

      For instance, when we `oc rsh` into the twistlock pod, we can see that this pod, which was created after the reboot but before NMState applied its changes, has no `example.com` search domain:

      $ oc get po -o wide | grep -i <node-name>
      twistlock-defender-xx-xxxxx 1/1 Running 1 8d 10.xx.xx.xx <node-name> <none> <none>
      $ oc rsh twistlock-defender-xx-xxxxx
      sh-5.1# cat /etc/resolv.conf
      search twistlock-example.svc.cluster.local svc.cluster.local cluster.local
      nameserver 10.xx.xx.xx
      options ndots:5
      sh-5.1#
      exit

      After the NNCE finishes successfully and the pod is restarted, the pod picks up the updated DNS settings from the node's /etc/resolv.conf; the new search domain `example.com` is now visible:

      $ oc delete po twistlock-defender-xx-xxxxx
      pod "twistlock-defender-xx-xxxxx" deleted
      $ oc get po -o wide | grep -i <node-name>
      twistlock-defender-xx-xxxxx 1/1 Running 0 10s 10.xx.xx.xx <node-name> <none> <none>
      $ oc rsh twistlock-defender-xx-xxxxx
      sh-5.1# cat /etc/resolv.conf
      search twistlock-example.svc.cluster.local svc.cluster.local cluster.local example.com
      nameserver 10.xx.xx.xx
      options ndots:5
      sh-5.1#
       
      

      The challenge we are facing due to this issue:

      It has been observed that after every node reboot, the NNCE is re-applied and takes nearly 3-4 minutes on average to apply the desired DNS configuration to the node's /etc/resolv.conf.

      During this window, pods have already been scheduled and are running on the node, because the node is reporting READY.

      Meanwhile, NMState is applying the NNCE in parallel, and by the time it finishes (those 3-4 minutes), the pods have already picked up the old DNS configuration. It therefore becomes extra work for users to restart the pods after every NNCE application so that the pods get the new DNS configuration from the node.

      In one customer scenario, nearly 3000 pods were running on the node, so restarting all of them is a significant additional overhead. Consider the time budget for the whole process: the node takes nearly 10 minutes to report READY after a reboot, NMState takes an additional 3-4 minutes, and restarting all 3000 pods takes another 3-4 minutes. A single node reboot therefore takes around 18 minutes on average before the node is fully operational.

      If a single reboot of one node costs 18-20 minutes, consider how big the problem becomes on large clusters with 50-60 nodes. A simple patching via MCO rollout reboots the nodes one by one; after each node becomes READY, the NNCE is re-applied on it, and the user then has to restart the pods on that node as well. This process is not well optimized.

      For customers who do not use DHCP, and instead configure static DNS at the time a node is added to the cluster, and who want to add an additional search domain such as `example.com` to the node's /etc/resolv.conf through NMState, this problem is especially troublesome.

      A possible solution to this problem:
      The NNCE is re-applied after each reboot, and the challenge is that pods are scheduled before the NNCE finishes. One way to optimize this: after a reboot, the nmstate-handler DaemonSet pods could keep the node cordoned until the NNCE is re-applied; once the NNCE applies successfully, the node is uncordoned, pods are allowed to schedule, and they automatically pick up the new DNS configuration from the node. A sketch of the idea follows. This could be a better approach, or some code-level change along these lines; it is just a suggestion from my side.
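
      Purely as an illustration of the suggestion above (this is not an existing nmstate-handler feature), the gating logic could look roughly like the following, assuming something with cluster credentials runs it on node startup; NODE and POLICY are placeholders:

      #!/bin/bash
      # Hypothetical sketch: keep the node cordoned until its enactment succeeds.
      NODE="<node-name>"
      POLICY="worker-dns-modify"

      oc adm cordon "$NODE"
      # Wait until the NNCE for this node reports Available=True.
      until oc get nnce "${NODE}.${POLICY}" \
          -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' \
          | grep -q True; do
        sleep 10
      done
      oc adm uncordon "$NODE"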

      I had a detailed discussion on this on Slack (Kubernetes-nmstate channel): https://redhat-internal.slack.com/archives/CP7329Z5Z/p1739286929445249?thread_ts=1739268087.035259&cid=CP7329Z5Z

      The solution proposed there: the customer already has an nmconnection file with a static IP and static DNS on the nodes, and wants to add more DNS search domains to the node's /etc/resolv.conf, currently using NMState for that purpose.
      If the customer instead adds the extra DNS search domains directly to the nmconnection file, the setting persists across reboots, and pods scheduled right after a reboot pick up the updated DNS settings from the node. This should solve the problem.
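
      As a sketch of that approach, the search domain can be appended to the existing connection profile with nmcli (the connection name is a placeholder; this updates the keyfile under /etc/NetworkManager/system-connections/, so it survives reboots):

      $ nmcli connection modify "<connection-name>" +ipv4.dns-search example.com
      $ nmcli connection up "<connection-name>"
      # The keyfile then carries, under [ipv4]:
      #   dns-search=example.com;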

       

      Version-Release number of selected component (if applicable):

      Kubernetes NMState operator

      How reproducible:

          Always

      Steps to Reproduce:

          1. Apply an NNCP on a node to modify or add new search domains in the node's /etc/resolv.conf.
          2. Reboot the node and observe that the NNCE is re-applied on the node (see the commands after this list).
          3. Only after the NNCE is re-applied do the new DNS search domains become visible.
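
          One way to measure the reapply delay from step 2 (a suggestion, using standard watch output; the node name is a placeholder):

          $ oc get nnce --watch
          $ oc get node <node-name> \
              -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}'
          # Compare the node's READY transition time with the time the NNCE
          # transitions back to Available.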
          

      Actual results:

          The new DNS search domains become visible only after the NNCE is re-applied.

      Expected results:

          The nodes should have a persistent way to modify /etc/resolv.conf that survives reboots.

      Additional info:

          More details are covered in the Slack thread shared above.

              Assignee: Mat Kowalski (mkowalsk@redhat.com)
              Reporter: Mridul Markandey (rhn-support-mmarkand)
              QA Contact: Ross Brattain