-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
rhwa-26.1
-
None
-
- Summary
The Storage-Based Remediation (SBR) operator deploys its controller manager with *1 replica* by default. The other medik8s operators it is compared against here, Node Health Check (NHC) and Fence Agents Remediation (FAR), default to *2 replicas* for high availability. With a single replica, SBR has no failover if its only controller-manager pod is evicted or fails.
-
- Current behavior
- *SBR:* `replicas: 1` in the OLM bundle CSV (`bundle/manifests/sbd-operator.clusterserviceversion.yaml`).
- *NHC:* `replicas: 2` with a RollingUpdate strategy (maxSurge: 0, maxUnavailable: 1) and topology spread constraints.
- *FAR:* `replicas: 2` with the same rollout strategy and topology spread constraints.
-
- Expected behavior
The SBR controller manager should default to *2 replicas* (with leader election), aligned with NHC and FAR, so that:
- Eviction or failure of one pod does not leave the cluster without the operator.
- Rolling updates can use maxUnavailable: 1 without dropping to zero replicas.
-
- Proposed change
- Set default `replicas: 2` for the controller-manager deployment in the bundle CSV.
- Add a RollingUpdate strategy (e.g. maxSurge: 0, maxUnavailable: 1) and topologySpreadConstraints (e.g. spread by `kubernetes.io/hostname`, maxSkew: 1) to match NHC/FAR and improve HA.
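
As a rough illustration of the two items above, the deployment entry in the bundle CSV could look something like the fragment below. This is a sketch, not the actual SBR manifest: the deployment name, the `control-plane: controller-manager` label, the container name, and the `whenUnsatisfiable` value are assumptions made for the example, and unrelated fields (image, resources, RBAC) are omitted.

```yaml
# Hypothetical fragment of the ClusterServiceVersion install strategy.
# Names and labels marked "assumed" are illustrative, not taken from the SBR repo.
spec:
  install:
    strategy: deployment
    spec:
      deployments:
        - name: sbr-controller-manager              # assumed deployment name
          spec:
            replicas: 2                             # proposed default (currently 1)
            strategy:
              type: RollingUpdate
              rollingUpdate:
                maxSurge: 0                         # never run more than 2 pods during an update
                maxUnavailable: 1                   # replace one pod at a time, keeping one running
            selector:
              matchLabels:
                control-plane: controller-manager   # assumed label
            template:
              metadata:
                labels:
                  control-plane: controller-manager
              spec:
                topologySpreadConstraints:
                  - maxSkew: 1
                    topologyKey: kubernetes.io/hostname
                    whenUnsatisfiable: ScheduleAnyway   # assumed; DoNotSchedule would be stricter
                    labelSelector:
                      matchLabels:
                        control-plane: controller-manager
                containers:
                  - name: manager                   # assumed container name
                    args:
                      - --leader-elect              # already used by SBR, so only one replica is active
```

The `whenUnsatisfiable` setting is a design choice to confirm against NHC/FAR: `ScheduleAnyway` still lets both replicas schedule on a cluster with a single schedulable node, while `DoNotSchedule` would leave the second pod pending there.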
-
- References
- NHC deployment spec: https://github.com/medik8s/node-healthcheck-operator/blob/main/bundle/manifests/node-healthcheck-operator.clusterserviceversion.yaml
- FAR deployment spec: https://github.com/medik8s/fence-agents-remediation/blob/main/bundle/manifests/fence-agents-remediation.clusterserviceversion.yaml
- SBR already uses `--leader-elect`, so multiple replicas are safe; only one will be active.