  1. Red Hat Workload Availability
  2. RHWA-689

SBR | Default number of controller-manager replicas is 1 (unlike other operators)


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • rhwa-26.1
    • Moderate

        Summary
          The Storage-Based Remediation (SBR) operator deploys its controller manager with *1 replica* by default. Other medik8s remediation operators, Node Health Check (NHC) and Fence Agents Remediation (FAR), default to *2 replicas* for high availability. With a single replica, SBR has no standby pod to take over if the controller-manager pod is evicted or fails.
        Current behavior
      • *SBR:* `replicas: 1` in the OLM bundle CSV (`bundle/manifests/sbd-operator.clusterserviceversion.yaml`).
      • *NHC:* `replicas: 2` with a RollingUpdate strategy (`maxSurge: 0`, `maxUnavailable: 1`) and topology spread constraints.
      • *FAR:* `replicas: 2` with the same rollout strategy and topology spread constraints.
        Expected behavior
          The SBR controller manager should default to *2 replicas* (with leader election), in line with NHC and FAR, so that:
      • Eviction or failure of one pod does not leave the cluster without a running operator pod.
      • Rolling updates can use `maxUnavailable: 1` without dropping to zero available replicas.
        Proposed change
      • Set the default to `replicas: 2` for the controller-manager deployment in the bundle CSV.
      • Add a RollingUpdate strategy (e.g. `maxSurge: 0`, `maxUnavailable: 1`) and `topologySpreadConstraints` (e.g. spread by `kubernetes.io/hostname` with `maxSkew: 1`) to match NHC/FAR and improve HA.
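
The proposed change could look roughly like the CSV fragment below. This is a sketch under assumptions, not the actual SBR manifest: the deployment name, the `control-plane: controller-manager` label, the container name, the image, and the `whenUnsatisfiable` policy are placeholders that do not come from this issue and would need to be checked against the real bundle and against what NHC and FAR use.

```yaml
# Illustrative fragment of a ClusterServiceVersion install strategy.
# Values marked "assumed" are hypothetical and not taken from the SBR bundle.
spec:
  install:
    strategy: deployment
    spec:
      deployments:
        - name: sbr-controller-manager              # assumed deployment name
          spec:
            replicas: 2                             # currently 1 in the SBR CSV
            strategy:
              type: RollingUpdate
              rollingUpdate:
                maxSurge: 0                         # at most replicas + 0 = 2 pods exist during a rollout
                maxUnavailable: 1                   # at least replicas - 1 = 1 pod stays available
            selector:
              matchLabels:
                control-plane: controller-manager   # assumed label
            template:
              metadata:
                labels:
                  control-plane: controller-manager # assumed label
              spec:
                topologySpreadConstraints:
                  - maxSkew: 1
                    topologyKey: kubernetes.io/hostname
                    whenUnsatisfiable: ScheduleAnyway   # assumed; DoNotSchedule is the stricter option
                    labelSelector:
                      matchLabels:
                        control-plane: controller-manager
                containers:
                  - name: manager                   # assumed container name
                    image: example.invalid/sbr-operator:latest  # placeholder image
```

With `replicas: 1`, the same `maxUnavailable: 1` setting would allow the rollout to drop to 1 - 1 = 0 available pods, which is the gap described above. Running two replicas assumes leader election is enabled in the manager so that only one pod reconciles at a time.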

              Assignee: Unassigned
              Reporter: Maxim Alter (rh-ee-malter)