Observability Documentation / OBSDOCS-214

Loki - Cluster Restart Hardening


    • OBSDOCS (Sep 11 - Oct 2) #242

      Goals

      • Use available pod disruption primitives to harden the LokiStack reliability during OCP cluster restarts
      • Keep the LokiStack ingestion path working while the OCP cluster is restarting
      • Keep the LokiStack query path working while the OCP cluster is restarting

      Motivation

      In OpenShift Container Platform 4, updates are applied at the MachineConfigPool level. Customers therefore need PodDisruptionBudgets to prevent undesired disruption while OpenShift Container Platform 4 nodes are being updated or rebooted.

      LokiStack does not configure any PodDisruptionBudget. As a result, all OpenShift Container Platform 4 nodes hosting LokiStack components can be updated at the same time, which restarts the entire service at once and may introduce undesired service disruption.
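      To illustrate the protection a PodDisruptionBudget provides, the following Python sketch is a simplified model of how Kubernetes decides whether a voluntary eviction, such as a node drain during a MachineConfigPool update, may proceed. It is not the disruption controller's code. The names `PodDisruptionBudget`, `scaled_value`, and `evict_simultaneously` are hypothetical. The 3-replica component with `min_available=2` is an assumed example and does not reflect a LokiStack default.

```python
import math
from dataclasses import dataclass
from typing import Optional, Union

# An int-or-percent value, e.g. 2 or "50%" (simplified model, not the Kubernetes type).
IntOrPercent = Union[int, str]


def scaled_value(value: IntOrPercent, total: int) -> int:
    """Resolve an int-or-percent value against `total`, rounding percentages up."""
    if isinstance(value, str) and value.endswith("%"):
        return math.ceil(int(value[:-1]) * total / 100)
    return int(value)


@dataclass
class PodDisruptionBudget:
    """Hypothetical model of a PDB; exactly one of the two fields should be set."""
    expected_pods: int
    min_available: Optional[IntOrPercent] = None
    max_unavailable: Optional[IntOrPercent] = None

    def desired_healthy(self) -> int:
        # Number of pods that must stay healthy for the budget to be honored.
        if self.min_available is not None:
            return scaled_value(self.min_available, self.expected_pods)
        if self.max_unavailable is not None:
            unavailable = scaled_value(self.max_unavailable, self.expected_pods)
            return max(0, self.expected_pods - unavailable)
        raise ValueError("set min_available or max_unavailable")

    def disruptions_allowed(self, current_healthy: int) -> int:
        # How many more pods may be voluntarily evicted right now.
        return max(0, current_healthy - self.desired_healthy())


def evict_simultaneously(pdb: Optional[PodDisruptionBudget],
                         healthy: int, requested: int) -> int:
    """Attempt `requested` evictions at once (e.g. all nodes drained together)
    and return how many pods remain healthy. Without a PDB every eviction
    succeeds; with one, evictions are refused once no disruptions remain."""
    for _ in range(requested):
        if pdb is None or pdb.disruptions_allowed(healthy) > 0:
            healthy -= 1
    return healthy


if __name__ == "__main__":
    budget = PodDisruptionBudget(expected_pods=3, min_available=2)
    print("without PDB:", evict_simultaneously(None, healthy=3, requested=3))
    print("with PDB:   ", evict_simultaneously(budget, healthy=3, requested=3))
```

      In this model, draining every node at once evicts all three replicas when no budget exists, while the budget keeps two replicas running. In a real cluster, the Eviction API refuses the blocked eviction (HTTP 429) until the evicted pod is rescheduled and becomes ready again.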

      Acceptance Criteria

      • Every LokiStack deployment size supports OCP cluster restarts without requiring an administrator to intervene.
      • Both LokiStack paths (ingestion and query) keep operating within the available node resources (CPU/memory) during OCP cluster restarts.

      Documentation Considerations

      PodDisruptionBudgets are already well documented in the official OpenShift Container Platform documentation. However, the Logging documentation should include a note or banner that explains how the LokiStack behaves during cluster restarts, for example by describing the effect of each PodDisruptionBudget that we configure.

            abrennan@redhat.com Ashleigh Brennan
            rkratky@redhat.com Robert Krátký
