  1. Red Hat Workload Availability
  2. RHWA-7

The node went into SchedulingDisabled after reboot due to the taint "node.kubernetes.io/out-of-service" applied by the `node-healthcheck` operator.

    • Important

      Description of problem:

      1. The Node Health Check and Self Node Remediation operators are installed.
      
      2. One of the nodes was then shut down abruptly by unplugging its power cable.
      
      3. After the power cable was reconnected, the node is stuck in the `Ready|SchedulingDisabled` status and the `out-of-service` taint is still applied to it:
      ~~~
      "masternode3.rhocpclusterdc.powergrid.in"
      {
        "taints": [
          {
            "effect": "NoExecute",
            "key": "node.kubernetes.io/out-of-service",
            "timeAdded": "2025-03-03T13:00:41Z",
            "value": "nodeshutdown"
          }
        ]
      }
      ~~~
      Because of this taint, pods cannot be scheduled on that node.
      
      4. After the node is uncordoned, it returns to the Ready state, but the taint is still present, so no pods are scheduled on it.
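The stuck state described above (node reports Ready, yet the `out-of-service` taint is still set) can be checked from a Node object in JSON form. The following is a minimal sketch, not the operator's own logic. It assumes the standard Node layout (`spec.taints`, `status.conditions`); the node name `example-node`, the Ready condition in the sample, and the helper names `is_ready` and `lingering_taints` are hypothetical. Only the taint fields are taken from the output shown in this report.

```python
import json

OUT_OF_SERVICE = "node.kubernetes.io/out-of-service"


def is_ready(node: dict) -> bool:
    """Return True if the Node's Ready condition has status "True"."""
    conditions = node.get("status", {}).get("conditions") or []
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def lingering_taints(node: dict) -> list:
    """Return out-of-service taints still set on a node that is already Ready."""
    if not is_ready(node):
        return []  # node is not Ready, so there is nothing to flag here
    taints = node.get("spec", {}).get("taints") or []
    return [t for t in taints if t.get("key") == OUT_OF_SERVICE]


# Hypothetical Node object; only the taint fields come from the report.
sample_node = json.loads("""
{
  "metadata": {"name": "example-node"},
  "spec": {
    "taints": [
      {"effect": "NoExecute",
       "key": "node.kubernetes.io/out-of-service",
       "timeAdded": "2025-03-03T13:00:41Z",
       "value": "nodeshutdown"}
    ]
  },
  "status": {"conditions": [{"type": "Ready", "status": "True"}]}
}
""")

for taint in lingering_taints(sample_node):
    print(f"{taint['key']}={taint['value']}:{taint['effect']} is still present")
```

In practice the input would be the JSON of the affected node rather than the inline sample.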

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Yes    

      Steps to Reproduce:

          1. Install the node-healthcheck-operator and the self-node-remediation operator.
          
          2. Reboot a node; the operator then applies the `out-of-service` taint.
       
          3. Even after the node is healthy again, the operator still treats it as unhealthy and the taint stays in place, so no pods can be scheduled on that node.
          

      Actual results:

      Even after the node returns to a healthy state, the operator still treats it as unhealthy and the taint remains, so no pods can be scheduled on that node.

      Expected results:

      Once the node returns to a healthy (Ready) state, the `out-of-service` taint should be removed automatically so that pods can be scheduled on the node.

      Additional info:

      The following taints are applied to the affected node by the operator:
      ~~~
      Taints:             medik8s.io/remediation=self-node-remediation:NoExecute
                          node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
                          node.kubernetes.io/unschedulable:NoSchedule
      ~~~
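Each entry in the `Taints:` output follows the `key[=value]:effect` form. As a small illustration, the sketch below splits the three strings from this report into the same key/value/effect fields that appear in the JSON taint earlier. The helper name `parse_taint` is hypothetical and is not part of any Kubernetes tooling.

```python
def parse_taint(text: str) -> dict:
    """Split a 'key[=value]:effect' string into its three parts."""
    # The effect is everything after the last colon.
    spec, sep, effect = text.strip().rpartition(":")
    if not sep or not spec or not effect:
        raise ValueError(f"not a key[=value]:effect taint: {text!r}")
    # The value is optional; it is empty when there is no '='.
    key, _, value = spec.partition("=")
    return {"key": key, "value": value, "effect": effect}


# The three taint strings reported on the affected node.
described = [
    "medik8s.io/remediation=self-node-remediation:NoExecute",
    "node.kubernetes.io/out-of-service=nodeshutdown:NoExecute",
    "node.kubernetes.io/unschedulable:NoSchedule",
]

taints = [parse_taint(line) for line in described]
no_execute = [t["key"] for t in taints if t["effect"] == "NoExecute"]
print(no_execute)
```

Two of the three taints carry the `NoExecute` effect; the third, `node.kubernetes.io/unschedulable`, has no value and uses `NoSchedule`.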
      
      Also, the `apiserver` pods are stuck in the Pending state because of this taint on the affected node.
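A pod can only land on a tainted node if its tolerations cover every `NoSchedule` and `NoExecute` taint on that node. The sketch below mirrors the usual Kubernetes matching rule (effect and key must match when set; `Exists` ignores the value, `Equal` compares it) to show which of the reported taints would block a pod. The toleration list is hypothetical and is not taken from the actual `apiserver` pods; the function names `tolerates` and `blocking_taints` are illustrative only.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    key = toleration.get("key", "")
    if key and key != taint["key"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def blocking_taints(taints: list, tolerations: list) -> list:
    """Return the NoSchedule/NoExecute taints that no toleration covers."""
    return [
        t for t in taints
        if t["effect"] in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


# Taints reported on the affected node.
node_taints = [
    {"key": "medik8s.io/remediation", "value": "self-node-remediation", "effect": "NoExecute"},
    {"key": "node.kubernetes.io/out-of-service", "value": "nodeshutdown", "effect": "NoExecute"},
    {"key": "node.kubernetes.io/unschedulable", "value": "", "effect": "NoSchedule"},
]

# Hypothetical pod tolerations, for illustration only.
pod_tolerations = [
    {"key": "node.kubernetes.io/unschedulable", "operator": "Exists", "effect": "NoSchedule"},
]

for taint in blocking_taints(node_taints, pod_tolerations):
    print("blocked by", taint["key"])
```

With these assumed tolerations, uncordoning alone would not help: the two `NoExecute` taints remain uncovered, which is consistent with pods staying Pending until those taints are removed.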

              aos-node@redhat.com Node Team Bot Account
              rhn-support-hthakare Harshal Thakare