-
Story
-
Resolution: Done
-
Major
-
None
Goals
- Use the available pod disruption primitives to harden LokiStack reliability during OCP cluster restarts
- Keep the LokiStack ingestion path working while the OCP cluster is restarting
- Keep the LokiStack query path working while the OCP cluster is restarting
Motivation
In OpenShift Container Platform 4, updates are applied per MachineConfigPool, so customers must define PodDisruptionBudgets to prevent undesired disruption while OpenShift Container Platform 4 nodes are being updated or rebooted.
LokiStack is missing a PodDisruptionBudget configuration. As a result, all OpenShift Container Platform 4 nodes hosting LokiStack components could be updated at the same time, restarting the entire service at once, which may introduce undesired service disruption.
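For illustration only, the kind of object this story asks for could look like the sketch below. It builds a `policy/v1` PodDisruptionBudget manifest for a single LokiStack component. The helper name `build_pdb`, the stack name, the namespace, the label keys used in the selector, and the `minAvailable` value of 1 are all assumptions made for this example, not the operator's actual implementation.

```python
import json


def build_pdb(stack_name, namespace, component, min_available=1):
    """Return a PodDisruptionBudget manifest (as a dict) for one component.

    Assumption: component pods carry the two labels below. The real
    LokiStack label set may differ.
    """
    labels = {
        "app.kubernetes.io/instance": stack_name,
        "app.kubernetes.io/component": component,
    }
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {
            "name": f"{stack_name}-{component}",
            "namespace": namespace,
            "labels": labels,
        },
        "spec": {
            # Voluntary evictions (e.g. node drains) are refused while they
            # would leave fewer than this many matching pods available.
            "minAvailable": min_available,
            "selector": {"matchLabels": labels},
        },
    }


if __name__ == "__main__":
    # Hypothetical stack name and namespace, chosen only for the example.
    pdb = build_pdb("lokistack-dev", "openshift-logging", "ingester")
    print(json.dumps(pdb, indent=2))
```

With a budget like this in place, a node drain during a MachineConfigPool update waits instead of evicting the last available pod of the component, provided the component runs more than one replica. For a single-replica deployment, `minAvailable: 1` would block the drain entirely, so the value has to be chosen per deployment size.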
Acceptance Criteria
- Any LokiStack deployment size supports OCP cluster restarts without manual administrator intervention.
- Any LokiStack path (ingestion/query) keeps operating within the available boundaries of node resources (CPU/Memory) during OCP cluster restarts.
Documentation Considerations
PodDisruptionBudgets are already well documented in the official OpenShift Container Platform documentation. However, our Logging docs should include a banner that explains how the LokiStack behaves during cluster restarts, e.g. explaining the effect of each PodDisruptionBudget we place.