Bug
Resolution: Unresolved
Normal
None
rhwa-25.8
The NodeHealthCheck (NHC) controller correctly identifies the initial set of nodes matching its spec.selector when the NHC object is first created.
However, if a node that did not initially match the selector is later updated to match it (e.g., by adding a missing label), the NHC controller fails to detect the change. The node is never reflected in the .status.observedNodes count and is consequently not monitored for health.
The only way to force the controller to recognize the newly labeled node is to delete and recreate the NHC object, or to restart the NHC controller manager pod.
Steps to Reproduce
1. Prerequisites: A cluster with the NodeHealthCheck operator running and at least two worker nodes (e.g., node-1 and node-2).
2. Initial Node Labeling:
Label node-1 with both required labels:
oc label node node-1 hypershift.openshift.io/nodePool=test
oc label node node-1 fencing=true
Label node-2 with only one of the required labels:
oc label node node-2 hypershift.openshift.io/nodePool=test
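Optionally, confirm the starting state with a standard label-selector query; at this point only node-1 should match both labels:
oc get nodes -l hypershift.openshift.io/nodePool=test,fencing=true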
3. Create NodeHealthCheck Object:
Apply a NodeHealthCheck CR whose spec.selector requires both labels:
selector:
  matchLabels:
    hypershift.openshift.io/nodePool: test
    fencing: "true"
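For reference, a minimal complete CR along these lines might look like the sketch below; the metadata name and the remediationTemplate values are illustrative placeholders (the API group is the one used by the medik8s NodeHealthCheck operator):
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nhc-test                  # placeholder name
spec:
  selector:
    matchLabels:
      hypershift.openshift.io/nodePool: test
      fencing: "true"
  remediationTemplate:            # example remediator; any installed template works
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationTemplate
    name: self-node-remediation-automatic-strategy-template
    namespace: openshift-workload-availability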
4. Observe Initial State:
Check the status of the newly created NHC object:
oc get nhc <name> -n openshift-workload-availability -o yaml | grep -i observedNodes
observedNodes: 1
The count is 1 because only node-1 currently matches both labels.
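The same field can also be read directly with a standard jsonpath query:
oc get nhc <name> -n openshift-workload-availability -o jsonpath='{.status.observedNodes}'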
5. Trigger the Bug (Update Node 2):
Add the missing fencing: "true" label to node-2, making it a valid target for the NHC selector:
oc label node node-2 fencing=true
node-2 now fully matches spec.selector.matchLabels.
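The same label-selector query from the setup step now returns both nodes, confirming that node-2 is a valid target:
oc get nodes -l hypershift.openshift.io/nodePool=test,fencing=true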
6. Observe Final State:
Wait several minutes for the controller to reconcile and check the status again:
oc get nhc <name> -n openshift-workload-availability -o yaml | grep -i observedNodes
Actual Results
The status.observedNodes field remains at 1. The controller never detects that node-2 now matches the selector, and the node is not monitored.
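The controller logs (selected with the same pod label used in the workaround below) can be checked over the same window to confirm that the label update triggers no reconciliation of the NHC object:
oc logs -l app.kubernetes.io/component=controller-manager -n openshift-workload-availability --tail=100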
Expected Results
After node-2 is labeled, the NHC controller's reconciliation loop should detect the change, re-evaluate its selector, and add node-2 to its list of monitored nodes.
The status.observedNodes field should update to 2.
Workaround
Manually forcing the controller to re-initialize its list of nodes "fixes" the issue:
Workaround 1: Delete and recreate the NHC object (example commands below).
Workaround 2: Restart the NHC controller manager pod.
oc delete pods -l app.kubernetes.io/component=controller-manager -n openshift-workload-availability
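For Workaround 1, assuming the CR was saved locally (nhc.yaml is a placeholder file name):
oc delete nhc <name> -n openshift-workload-availability
oc apply -f nhc.yaml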