  1. Red Hat Workload Availability
  2. RHWA-570

Common Cordon Across Medik8s Remediators


    • cordon-medik8s-remediators
    • To Do
    • 33% To Do, 67% In Progress, 0% Done

      Background

      The Medik8s team provides a high-availability solution that automates the "healing" of Kubernetes clusters. It consists mainly of the NHC operator, which detects node health, and several remediator operators (SNR, FAR, MDR, and SBR), which fence an unhealthy node from the cluster and remediate it back to a healthy state.

      Although each remediator works differently, FAR, SNR, and SBR follow a similar flow (MDR is a unique use case): each one cordons the node at the beginning of its remediation:

      1. FAR adds a custom medik8s.io/fence-agents-remediation:NoExecute taint to the node (see RHWA-311 for why the NoSchedule effect is suggested instead).
      2. SNR (and SBR) adds a different custom taint, medik8s.io/remediation=self-node-remediation:NoExecute, and then marks the node as unschedulable in the Node's spec so that Kubernetes adds the node.kubernetes.io/unschedulable:NoSchedule taint.
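
To make the current behaviour concrete, the taints listed above can be written out as plain data. This is only an illustrative sketch: the dictionaries mirror the key/value/effect fields of a Kubernetes taint, and the names FAR_TAINT, SNR_TAINT, K8S_UNSCHEDULABLE_TAINT, format_taint, and taints_after_cordon are hypothetical and not taken from the operators' code.

```python
# Taints described in the list above, modelled as plain dicts.
# All names here are illustrative assumptions, not the operators' real code.
FAR_TAINT = {
    "key": "medik8s.io/fence-agents-remediation",
    "value": "",
    "effect": "NoExecute",
}
SNR_TAINT = {
    "key": "medik8s.io/remediation",
    "value": "self-node-remediation",
    "effect": "NoExecute",
}
# Added by Kubernetes itself once the node is marked unschedulable.
K8S_UNSCHEDULABLE_TAINT = {
    "key": "node.kubernetes.io/unschedulable",
    "value": "",
    "effect": "NoSchedule",
}


def format_taint(taint):
    """Render a taint in key[=value]:effect notation."""
    key_value = taint["key"]
    if taint["value"]:
        key_value += "=" + taint["value"]
    return key_value + ":" + taint["effect"]


def taints_after_cordon(remediator):
    """Return the taints expected on a node after the cordon step today."""
    if remediator == "FAR":
        return [FAR_TAINT]
    if remediator in ("SNR", "SBR"):
        return [SNR_TAINT, K8S_UNSCHEDULABLE_TAINT]
    raise ValueError("no common cordon flow described for " + remediator)


if __name__ == "__main__":
    for name in ("FAR", "SNR"):
        print(name, [format_taint(t) for t in taints_after_cordon(name)])
```

Written side by side, the two flows differ in taint key, in whether a value is used, and in whether the Kubernetes unschedulable taint is also present.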

      Solution

      The two custom taints currently in use have different APIs (different keys) and the wrong effect:

      • FAR should use the same taint key but change the effect (i.e., remediation.medik8s.io/fence-agents-remediation:NoSchedule)
      • SNR should change the taint key and effect (i.e., remediation.medik8s.io/self-node-remediation:NoSchedule)
      • SNR should stop marking the node as unschedulable by updating the spec (i.e., implicitly triggering the Kubernetes taint)
      • Any remaining reference to the old taint keys or effects should be found and updated as part of this epic
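
The proposed change can be sketched as a simple old-to-new mapping, which also gives a way to spot leftover references to the old taints. This is a hedged illustration, not the planned implementation: the taint strings are copied from the bullets above in key[=value]:effect notation, while LEGACY_TO_PROPOSED, find_legacy_taints, and to_proposed are hypothetical names introduced here.

```python
# Old taint -> proposed taint, using the strings given in this epic.
# The mapping and helper names are illustrative assumptions.
LEGACY_TO_PROPOSED = {
    "medik8s.io/fence-agents-remediation:NoExecute":
        "remediation.medik8s.io/fence-agents-remediation:NoSchedule",
    "medik8s.io/remediation=self-node-remediation:NoExecute":
        "remediation.medik8s.io/self-node-remediation:NoSchedule",
}


def find_legacy_taints(taints):
    """Return the taints that still use an old key/effect."""
    return [t for t in taints if t in LEGACY_TO_PROPOSED]


def to_proposed(taints):
    """Replace each old taint with its proposed form; keep others as-is."""
    return [LEGACY_TO_PROPOSED.get(t, t) for t in taints]


if __name__ == "__main__":
    node_taints = [
        "medik8s.io/remediation=self-node-remediation:NoExecute",
        "node.kubernetes.io/unschedulable:NoSchedule",
    ]
    print(find_legacy_taints(node_taints))
    print(to_proposed(node_taints))
```

The sketch deliberately leaves node.kubernetes.io/unschedulable:NoSchedule untouched, since that taint is managed by Kubernetes and can also result from a manual cordon.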

      For more details, see the design discussion at https://docs.google.com/document/d/12FMnb6RSs2iZFX7BldpY3BjlpBy0uSzv15hTxPb3iOg/edit?tab=t.0 and the two Slack threads at https://redhat-internal.slack.com/archives/C03M5GKJNBA/p1767710450933989 and https://redhat-internal.slack.com/archives/C03M5GKJNBA/p1765956370373569.

              oraz@redhat.com Or Raz