  1. Red Hat Workload Availability
  2. RHWA-366

Investigate update strategy with topologySpreadConstraints

      Cause: The RHWA operators had inconsistent deployment configurations with regard to replicas, node affinity, and update strategy.
      Consequence: Remediation could be slower if the operator pod was running on an unhealthy node.
      Fix: NHC, FAR, and SNR now run 2 replicas, use topologySpreadConstraints to keep the replicas off the same node, and set updateStrategy to avoid potential update locks in some edge cases.
      Result: Reduced chance of slower remediation.
    • Enhancement
    • Proposed

      We configured NHC, FAR, and SNR to use topologySpreadConstraints to spread replicas across nodes. This might cause problems with updates in some corner cases; see the comment on the SNR PR: https://github.com/medik8s/self-node-remediation/pull/180#discussion_r2419792014

      Investigate whether this is a real issue, and update all three operators if needed.
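
For orientation, the fields under discussion could look roughly like the Deployment fragment below. This is a minimal sketch, not the configuration shipped by NHC, FAR, or SNR: the resource name, labels, image, and all numeric values are assumptions chosen for illustration. Note also that the release note says "updateStrategy", while the corresponding field on a Kubernetes Deployment is `spec.strategy`.

```yaml
# Illustrative fragment only. Names, labels, image, and values are
# assumptions, not copied from the NHC, FAR, or SNR manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator-controller-manager      # hypothetical name
spec:
  replicas: 2                                     # two replicas, as described in the release note
  selector:
    matchLabels:
      app.kubernetes.io/name: example-operator    # hypothetical label
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1                           # assumed value
      maxSurge: 0                                 # assumed value
  template:
    metadata:
      labels:
        app.kubernetes.io/name: example-operator
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname     # spread across nodes
          whenUnsatisfiable: DoNotSchedule        # assumed; ScheduleAnyway is the softer alternative
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: example-operator
      containers:
        - name: manager
          image: example.invalid/example-operator:latest   # placeholder image
```

With `maxSurge: 0` and `maxUnavailable: 1`, a rollout removes one old pod before creating its replacement, so the replacement does not have to be scheduled while both old pods still occupy their nodes. Whether the operators need this, or whether the spread constraint alone can block an update at all, is the open question of this issue.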

              slintes Marc Sluiter