-
Task
-
Resolution: Done
-
Critical
MK - Sprint 230, MK - Sprint 231
Relates to OHSS-17337
WHAT
The change to elevate priority may cause existing broker / zookeeper pods to be evicted.
WHY
If the additional reserve / nodes are not immediately available, existing broker / zookeeper pods may be evicted (circumventing the drain cleaner) and must then wait for the additional nodes to spin up before they can be scheduled again.
HOW
The near-term solution is to disable elevate-priority in the strimzi bundle configmap for the affected version.
Longer term, one of the following needs to be done:
1. Confirm / update the production reserve, such that there is sufficient hot capacity to complete the rollout without causing eviction of existing brokers / zookeepers.
2. Replace the priority class with one that is non-preempting. However, this means the new pods would no longer preempt the reserve: if the cluster reached capacity, the reserve could stay on the cluster ahead of these new pods. We'd have to ensure that production has more max node capacity in its machine pools (effectively capacity + reserve). Then, once the rollout is completed and the priority class is replaced with a preempting one, another change would be required to roll that through, as the pods will not immediately pick up priority class changes.
3. Switch this feature in the FSO to be more situational: that is, detect when nodes are preventing broker placement and have the FSO delete the offending pods, which should then allow (but not guarantee) subsequent deployments to appropriately consume the node, or the auto-scaler to reclaim the node.
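For option 2, a minimal sketch of what a non-preempting priority class could look like, written as a Python helper that builds the manifest. The fields (apiVersion scheduling.k8s.io/v1, kind PriorityClass, value, preemptionPolicy, globalDefault) are standard Kubernetes PriorityClass fields; the class name and the value 1000 are hypothetical placeholders and are not taken from the actual bundle.

```python
import json


def non_preempting_priority_class(name: str, value: int) -> dict:
    """Build a PriorityClass manifest with preemptionPolicy set to Never.

    Pods using this class are placed ahead of lower-priority pods in the
    scheduling queue, but the scheduler will not evict running pods
    (including the reserve) to make room for them.
    """
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        # "Never" disables preemption; the Kubernetes default is
        # "PreemptLowerPriority".
        "preemptionPolicy": "Never",
        "globalDefault": False,
        "description": "Elevated scheduling priority without preemption.",
    }


if __name__ == "__main__":
    # Hypothetical name and value, for illustration only.
    manifest = non_preempting_priority_class("kafka-elevated-non-preempting", 1000)
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be applied like any other manifest. Because a pod's priority is resolved when the pod is created, existing pods would keep their current priority until they are rolled, which is the follow-up change option 2 describes.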
- relates to
-
MGDSTRM-10300 Create separate blast radius control mechanisms
- New
-
MGDSTRM-10160 Add "elevate-priority" flag to RHOSAK bundle
- Closed