OpenShift Request For Enhancement
RFE-8033

[RFE] Exposing a configurable `node-monitor-grace-period` parameter to prevent workload pods running on worker nodes from terminating or unnecessarily restarting when the connection to the kube-API is restored


    Quality / Stability / Reliability

      Description of problem:

      I am raising this RFE to improve the behavior of OCP in a specific use case.
      
      The state of the worker nodes is controlled by the kube-controller-manager through `node-monitor-grace-period` (defaulting to 40s) and `pod-eviction-timeout` (defaulting to 5 minutes), and by the kubelet configuration through `node-status-update-frequency` and `node-status-report-frequency`.
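
      For reference, the kubelet-side intervals are already tunable in OCP through a KubeletConfig custom resource, per the remote-worker documentation referenced below. A minimal sketch (the resource name and the interval values are illustrative, not recommendations):

      ```yaml
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: worker-heartbeat-tuning    # hypothetical name
      spec:
        machineConfigPoolSelector:
          matchLabels:
            # default label on the "worker" MachineConfigPool
            pools.operator.machineconfiguration.openshift.io/worker: ""
        kubeletConfig:
          nodeStatusUpdateFrequency: "10s"   # how often the kubelet posts node status
          nodeStatusReportFrequency: "1m"    # how often status is reported when nothing has changed
      ```

      No equivalent supported knob exists for the controller-manager side, which is the gap this RFE addresses.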
      
      
      According to Network separation with remote workers [1], if the kube-controller-manager loses contact with a node for a configured period, the node controller on the control plane updates the node health to Unhealthy and marks the node Ready condition as Unknown.
      In response, the scheduler stops scheduling pods to that node. The on-premise node controller adds a node.kubernetes.io/unreachable taint with a NoExecute effect to the node and schedules pods on the node for eviction after five minutes, by default.
      If a workload controller, such as a Deployment object or StatefulSet object, is directing traffic to pods on the unhealthy node and other nodes can reach the cluster, OpenShift Container Platform routes the traffic away from the pods on the node. Nodes that cannot reach the cluster do not get updated with the new traffic routing. As a result, the workloads on those nodes might continue to attempt to reach the unhealthy node. 
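
      One partial mitigation described in the same document is to give pods tolerations for the taints the node controller applies, which delays their eviction from an unreachable node. A sketch of the pod-spec fragment (the 86400s / 24h value is illustrative):

      ```yaml
      # Pod-spec fragment: tolerate the taints applied to an unreachable node
      tolerations:
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 86400   # keep the pod bound for 24h instead of the 5m default
        - key: "node.kubernetes.io/not-ready"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 86400
      ```

      This only defers eviction, though; it does not prevent the node from being marked Unknown once `node-monitor-grace-period` expires.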
      
      
      The `node-status-update-frequency` parameter works together with the `node-monitor-grace-period` parameter. The `node-monitor-grace-period` parameter specifies how long OpenShift Container Platform waits, when the controller manager does not receive the node heartbeat, before a node associated with a MachineConfig object is marked Unhealthy. Workloads on the node continue to run after this time (I believe within the grace period and not after). 
      If the remote worker node rejoins the cluster after `node-monitor-grace-period` expires, pods continue to run and new pods can be scheduled to that node. The `node-monitor-grace-period` interval defaults to 40s. 
      The `node-status-update-frequency` value must be lower than the `node-monitor-grace-period` value.
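
      Upstream, this is the `--node-monitor-grace-period` flag of kube-controller-manager. In OCP today, to my knowledge it can only be reached through an unsupported override, which is exactly why a supported, configurable parameter is being requested. A sketch of that unsupported route, assuming the `extendedArguments` override pattern applies to this flag:

      ```yaml
      # UNSUPPORTED sketch only: unsupportedConfigOverrides voids support and
      # may be ignored or reverted. Shown purely to illustrate the parameter
      # this RFE asks to expose properly. The 5m value is illustrative.
      apiVersion: operator.openshift.io/v1
      kind: KubeControllerManager
      metadata:
        name: cluster
      spec:
        unsupportedConfigOverrides:
          extendedArguments:
            node-monitor-grace-period:
              - "5m"
      ```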
      
      
      This behavior is suboptimal for Hosted Control Planes with bare metal, Hosted Control Planes with remote virtualization infrastructure, and remote edge scenarios.
      
      
      In the case of hosted control planes, the impact is cascaded. Assume the control plane pods of a hosted cluster, running in a namespace on one of the management cluster's worker nodes, are operating fine, but the management cluster kube-API is for some reason isolated from the worker nodes. The control plane pods of the tenant (hosted) cluster can still communicate with their remote bare-metal worker nodes through different interfaces, yet we are faced with undesired scenarios. The first, worst-case scenario is the kube-controller terminating these pods. In the second, less severe scenario, when the connection is restored and the hosted cluster pods were still running, the kube-controller will recycle all the pods. In turn, a cascading effect might start in which the bare-metal remote workers of the hosted control plane are rebooted as well, or even lost along with their applications.
      
      
      Hosted Control Planes with remote virtualization infrastructure fares no better: although the remote infrastructure could be fully operational, this is not leveraged, and the loss of connection will cause the hosted cluster's virtualized workers to be rebooted. (I am not sure of this one, but I assume they will be descheduled.)
      
      
      [1]: https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/nodes/remote-worker-nodes-on-the-network-edge#nodes-edge-remote-workers-network_nodes-edge-remote-workers

      Version-Release number of selected component (if applicable):

       

      How reproducible:

        Inherent behavior in all OCP releases since OCP 4.0

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

          
