Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: None
Affects Version/s: None
Component/s: Node Healthcheck, Self Node Remediation
Labels:
None

Blocked:
False
Blocked Reason:
None
Ready:
False
Intelligence Requested:
Market:

Severity:
Important

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

1. The Node Health Check and Self Node Remediation operators are installed on the cluster.

2. One of the nodes was then shut down by directly unplugging its power cable.

3. After the power cable is reconnected, the node is stuck in the `Ready|SchedulingDisabled` status and the `out-of-service` taint is still applied to the node.
~~~
"masternode3.rhocpclusterdc.powergrid.in"
{
  "taints": [
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/out-of-service",
      "timeAdded": "2025-03-03T13:00:41Z",
      "value": "nodeshutdown"
    }
  ]
}
~~~
As a result, pods cannot be scheduled on that node.

4. After the node is uncordoned it returns to the Ready state, but the taint is still present, so no pods are scheduled on that node.
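
To confirm the state described in step 3 on other nodes, the taint list can be checked programmatically. The following is a minimal Python sketch, not part of the original report. It assumes the node spec has already been fetched as JSON in the shape shown above; the helper name `has_out_of_service_taint` is hypothetical.

```python
import json

OUT_OF_SERVICE_KEY = "node.kubernetes.io/out-of-service"


def has_out_of_service_taint(spec: dict) -> bool:
    """Return True if the node spec carries the out-of-service taint."""
    taints = spec.get("taints") or []  # "taints" may be absent or null
    return any(t.get("key") == OUT_OF_SERVICE_KEY for t in taints)


# The spec fragment from step 3, copied from this report.
reported_spec = json.loads("""
{
  "taints": [
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/out-of-service",
      "timeAdded": "2025-03-03T13:00:41Z",
      "value": "nodeshutdown"
    }
  ]
}
""")

print(has_out_of_service_taint(reported_spec))
```

For the fragment reported here the check returns `True`; a node without a `taints` field returns `False`.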

Version-Release number of selected component (if applicable):

How reproducible:

Yes

Steps to Reproduce:

    1. Install the node-healthcheck-operator and self-node-remediation operators.
    
    2. Reboot a node. The operator applies the `out-of-service` taint to it.
 
    3. Even after the node returns to a healthy state, the operator still considers it unhealthy and the taint remains, so no pods can be scheduled on that node.

Actual results:

Even after the node returns to a healthy state, the operator still considers it unhealthy and the taint remains, so no pods can be scheduled on that node.

Expected results:

Once the node returns to a healthy (Ready) state, the `out-of-service` taint should be removed automatically so that pods can be scheduled on the node.
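
The expectation above can be written as a small check. This is only a sketch of what the reporter expects, expressed in Python; it is not the operator's actual logic. It assumes a node object with the standard `status.conditions` and `spec.taints` fields, and the helper names and the sample node are hypothetical.

```python
OUT_OF_SERVICE_KEY = "node.kubernetes.io/out-of-service"


def is_ready(node: dict) -> bool:
    """Return True if the node reports a Ready condition with status "True"."""
    conditions = node.get("status", {}).get("conditions") or []
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


def expected_taints(node: dict) -> list:
    """Taints the reporter expects to see: none with the out-of-service key once Ready."""
    taints = node.get("spec", {}).get("taints") or []
    if not is_ready(node):
        return list(taints)  # node still unhealthy: leave the taints untouched
    return [t for t in taints if t.get("key") != OUT_OF_SERVICE_KEY]


# Hypothetical node that matches the reported situation: Ready, but still tainted.
node = {
    "spec": {
        "taints": [
            {
                "key": OUT_OF_SERVICE_KEY,
                "value": "nodeshutdown",
                "effect": "NoExecute",
            }
        ]
    },
    "status": {"conditions": [{"type": "Ready", "status": "True"}]},
}

stale = node["spec"]["taints"] != expected_taints(node)
print("stale out-of-service taint:", stale)
```

A difference between the actual and expected taint lists on a Ready node corresponds to the behaviour reported under "Actual results".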

Additional info:

The following taints are present on the affected node:
~~~
Taints:             medik8s.io/remediation=self-node-remediation:NoExecute
                    node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
                    node.kubernetes.io/unschedulable:NoSchedule
~~~
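
For comparing this output across nodes, the `key=value:effect` entries can be split into structured fields. The Python sketch below is illustrative only. It assumes each entry contains no whitespace and that the value part is optional, as in the `unschedulable` entry; the function names are hypothetical.

```python
def parse_taint(text: str) -> dict:
    """Split one "key[=value]:effect" entry into its parts."""
    key_value, _, effect = text.rpartition(":")  # the effect follows the last colon
    key, _, value = key_value.partition("=")     # the value is optional
    return {"key": key, "value": value or None, "effect": effect}


def parse_taints_field(field: str) -> list:
    """Parse the whole "Taints:" field, whether on one line or several."""
    tokens = field.split()
    if tokens and tokens[0] == "Taints:":
        tokens = tokens[1:]
    return [parse_taint(token) for token in tokens]


# The three entries from this report.
reported = """
Taints:             medik8s.io/remediation=self-node-remediation:NoExecute
                    node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
                    node.kubernetes.io/unschedulable:NoSchedule
"""

for taint in parse_taints_field(reported):
    print(taint)
```

Each entry becomes a dictionary with `key`, `value`, and `effect`; the `unschedulable` entry has a `value` of `None`.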

Also, the `apiserver` pods are stuck in the Pending state because of this taint on the affected node.

Assignee: Node Team Bot Account

Reporter: Harshal Thakare

Votes: 0

Watchers: 5

Created: 2025/03/05 9:54 AM

Updated: 2025/09/13 1:07 PM

Resolved: 2025/03/11 8:19 AM
