Red Hat Workload Availability / RHWA-283

[FAR] Improve HA by enforcing podAntiAffinity for controller-manager replicas

      Cause: The default scheduler may place two FAR replicas on the same node, especially when fewer nodes are available.
      Consequence: The fence-agents-controller-manager replicas may be scheduled on the same node, which reduces the effectiveness of high availability and creates a single point of failure.
      Fix: The FAR deployment includes a podAntiAffinity rule using preferredDuringSchedulingIgnoredDuringExecution with topologyKey: kubernetes.io/hostname.
      Result: The default scheduler prefers placing each FAR replica on a node that does not already run one.
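A soft anti-affinity rule of the kind described above can be sketched as the following pod-template fragment. This is a minimal illustration, not the actual FAR manifest; in particular, the matchLabels value is an assumed label, and the weight is an arbitrary example:

```yaml
# Soft anti-affinity: the scheduler prefers to place replicas on
# separate nodes, but can still co-locate them when only one node
# is schedulable (e.g. during maintenance).
# The matchLabels value below is an illustrative assumption.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: fence-agents-controller-manager
            # Treat each node (hostname) as a separate topology domain.
            topologyKey: kubernetes.io/hostname
```

Because the rule is "preferred" rather than "required", a single-node cluster can still run both replicas; the scheduler only penalizes co-location when alternatives exist.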
    • Feature
    • Proposed

      The fence-agents-controller-manager replicas may be scheduled on the same node, which reduces the effectiveness of high availability.

      This issue can occur when nodes are being brought back online one by one, such as during maintenance.
      In such cases, both replicas may be placed on a single node, creating a single point of failure.

      This patch introduces the following change:

      [Before merging this patch]
      The default scheduler may place two replicas on the same node, especially when fewer nodes are available.

      [After merging this patch]
      The fence-agents-controller-manager deployment includes a podAntiAffinity rule using requiredDuringSchedulingIgnoredDuringExecution, ensuring that replicas are scheduled on separate nodes.
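A hard rule of this kind can be sketched as the following pod-template fragment. This is an illustrative sketch under assumptions, not the actual deployment manifest; the matchLabels value in particular is assumed:

```yaml
# Hard anti-affinity: a replica is only scheduled onto a node that
# does not already run a pod matching the selector. With two replicas,
# this guarantees they land on two distinct nodes.
# The matchLabels value below is an illustrative assumption.
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: fence-agents-controller-manager
          # One replica per hostname (i.e. per node).
          topologyKey: kubernetes.io/hostname
```

Note the trade-off: with a "required" rule, if only one node is schedulable, the second replica stays Pending until another node becomes available.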

      Scheduling both replicas on the same node introduces a single point of failure and should be avoided in HA configurations.
      When only one node is available, such as during planned maintenance or recovery, this can lead to delayed or missed remediation.
      Using podAntiAffinity rules so that each replica runs on a different node improves fault tolerance and makes the remediation process more resilient.

              oraz@redhat.com Or Raz
              kkawakam@redhat.com KATSUYA KAWAKAMI