OpenShift Logging / LOG-3839

Loki - Cluster Restart Hardening

    • 3
    • False
    • None
    • False
    • Green
    • NEW
    • Done
    • OBSDA-309 - PodDisruptionBudget for LokiStack to keep the stack up and running and under control during OpenShift Container Platform 4 - upgrades
    • VERIFIED
    • 0% To Do, 0% In Progress, 100% Done
    • With this update, the Loki Operator introduces PodDisruptionBudget configuration on LokiStack deployments to ensure normal operations during OCP cluster restarts by keeping ingestion and the query path available.
    • Enhancement

      Goals

      • Use the available pod disruption primitives to harden LokiStack reliability during OCP cluster restarts
      • Keep the LokiStack ingestion path working while the OCP cluster is restarting
      • Keep the LokiStack query path working while the OCP cluster is restarting

      Non-Goals

      • Enable user-customizable disruption configuration for each individual LokiStack component.
      • Dynamically adjust the pod disruption configuration by operator-managed automation.

      Motivation

      In OpenShift Container Platform 4, updates are applied per MachineConfigPool, so customers must apply a PodDisruptionBudget to prevent undesired disruption while OpenShift Container Platform 4 nodes are being updated or rebooted.

      LokiStack is missing PodDisruptionBudget configuration. As a result, all OpenShift Container Platform 4 nodes hosting LokiStack components could be updated at the same time, restarting the entire service at once, which may introduce undesired service disruption.
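
      The effect described above can be illustrated with a minimal sketch. It is a toy model of voluntary evictions, not the Loki Operator's implementation: the function name simulate_drain, the replica count, and the minAvailable value are all hypothetical and chosen only for illustration. It assumes a budget expressed as a minimum number of healthy pods and that no replacement pod becomes healthy while the drain is in progress.

```python
from typing import Optional


def simulate_drain(replicas: int, min_available: Optional[int]) -> int:
    """Count pods evicted when every node hosting a component is drained at once.

    min_available models a budget's minimum healthy pod count (hypothetical
    value); None models a component with no disruption budget at all.
    """
    healthy = replicas
    evicted = 0
    for _ in range(replicas):
        # With a budget, an eviction is refused once it would drop the
        # number of healthy pods below the configured minimum.
        if min_available is not None and healthy - min_available <= 0:
            break
        healthy -= 1
        evicted += 1
    return evicted


# No budget: all three replicas can be evicted together.
print(simulate_drain(3, None))
# Budget of two healthy pods: only one replica may be down at a time.
print(simulate_drain(3, 2))
```

      In this model, the remaining evictions would proceed only after a replacement pod reports healthy again, which is what keeps a path partially available during a rolling node update.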

      Alternatives

      Acceptance Criteria

      • Any LokiStack deployment size supports OCP cluster restarts without administrator intervention.
      • Any LokiStack path (ingestion/query) keeps operating within the available boundaries of node resources (CPU/Memory) during OCP cluster restarts.

      Risk and Assumptions

      Documentation Considerations

      PodDisruptionBudgets are already well documented in the official OpenShift Container Platform documentation pages (See here). However, our Logging docs should include a banner that explains how the LokiStack will behave during cluster restarts, e.g. explaining the effect of each PodDisruptionBudget we place.

      Open Questions

      Additional Notes

              ptsiraki@redhat.com Periklis Tsirakidis
              ptsiraki@redhat.com Periklis Tsirakidis
              Kabir Bharti