  1. Red Hat Workload Availability
  2. RHWA-716

SBR | PriorityClassName mismatch: SBR controller-manager lacks system-cluster-critical priority compared to FAR and NHC


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • rhwa-26.1
    • None
    • Important

      During a baseline audit of the remediation operators in the openshift-workload-availability namespace, a discrepancy was found in the priorityClassName configuration. The Fence Agents Remediation (FAR) and Node Health Check (NHC) controllers run with system-cluster-critical, but the SBR (sbd-operator) controller-manager has no priority class defined, so its priority falls back to the default (0 unless the cluster defines a global default PriorityClass).

      In a resource-constrained environment, the SBR controller could be preempted to make room for any higher-priority pod, or evicted under node pressure ahead of higher-priority pods, which would break the node fencing mechanism exactly when it is needed most.

      Environment:

      • Namespace: openshift-workload-availability
      • Operator: Storage-Based Remediation (SBR), deployed as sbd-operator

      Observed Priority Mapping:

      Operator Component                     PriorityClassName          Risk
      fence-agents-remediation-controller    system-cluster-critical    ✅ Protected
      node-healthcheck-controller            system-cluster-critical    ✅ Protected
      sbd-operator-controller-manager        (Empty/None)               Vulnerable to Eviction

      Steps to Reproduce:

      1. Inspect the pods in the remediation namespace:
        oc get pods -n openshift-workload-availability -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.priorityClassName}{"\n"}{end}' | grep sbd-operator 
      2. Observe that the priorityClassName column for the sbd-operator controller-manager pod is empty.
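To make the check above less dependent on eyeballing an empty column, the same tab-separated output can be filtered for pods with no priority class. This is a minimal sketch: the helper name `list_unprioritized` is hypothetical, and it assumes the input has exactly the `name<TAB>priorityClassName` shape produced by the jsonpath expression in step 1.

```shell
# Hypothetical helper: reads "name<TAB>priorityClassName" lines on stdin and
# prints the names of pods whose priorityClassName field is empty.
list_unprioritized() {
  awk -F'\t' '$1 != "" && $2 == "" { print $1 }'
}

# Possible usage against a live cluster (needs oc and cluster access):
#   oc get pods -n openshift-workload-availability \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.priorityClassName}{"\n"}{end}' \
#     | list_unprioritized
```

If the report is accurate, only the sbd-operator controller-manager pod should be listed for this namespace.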

      Expected Result: The sbd-operator-controller-manager should have priorityClassName: system-cluster-critical to ensure parity with other Medik8s/remediation components and guarantee availability during node pressure events.
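For illustration, the expected state corresponds to setting `spec.template.spec.priorityClassName` on the controller's Deployment. The sketch below only builds the merge-patch body. The helper name `priority_patch` is hypothetical, and the Deployment name in the commented command is assumed from the component name in the table above. A manual patch is also unlikely to persist if the Deployment is managed by OLM, so this shows the target configuration rather than a fix.

```shell
# Hypothetical helper: prints a JSON merge patch that sets the pod template's
# priorityClassName to the class name given as the first argument.
priority_patch() {
  printf '{"spec":{"template":{"spec":{"priorityClassName":"%s"}}}}\n' "$1"
}

# Possible usage (assumed Deployment name; may be reverted by the operator's
# lifecycle manager):
#   oc patch deployment sbd-operator-controller-manager \
#     -n openshift-workload-availability --type=merge \
#     -p "$(priority_patch system-cluster-critical)"
```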

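      Actual Result: The pod runs with default priority (0), making it a candidate for preemption by the kube-scheduler when higher-priority pods cannot be scheduled, and an early candidate for kubelet node-pressure eviction during heavy cluster load.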

              Assignee: Unassigned
              Reporter: Maxim Alter (rh-ee-malter)