OpenShift Logging / LOG-5204

Loki - Re-evaluate deployment sizes


    • Epic
    • Resolution: Unresolved
    • Normal
    • Logging 6.1.0
    • None
    • Log Storage
    • None
    • Loki - Re-evaluate deployment sizes
    • False
    • None
    • False
    • Not Selected
    • NEW
    • To Do
    • NEW
    • 100% To Do, 0% In Progress, 0% Done
    • If Release Note Needed, Set a Value

      Goals

      • Provide state-of-the-union benchmarks for LokiStack deployment sizes to re-evaluate the CPU, memory and disk requests per Loki component.
      • Rescale the LokiStack deployment sizes, based on the benchmarks provided, to support HA setups with a replication factor of 2 out of the box.
      • Rescale the LokiStack deployment sizes, based on the benchmarks provided, to support PodDisruptionBudgets (PDBs) in single-replica demo setups.

      Non-Goals

      • Expose or introduce requests and limits tuning in the LokiStack CRD.

      Motivation

      The current LokiStack deployment sizes were derived from synthetic benchmarks (see loki-benchmarks) run at the very beginning of designing the OpenShift Logging migration from Elasticsearch to Loki (as early as February 2020). Although Loki's design has remained simple and largely unchanged on both the write and read paths over the years, the maintainers have added many features that significantly improved reliability and scalability (e.g., the Write-Ahead Log replacing handover replication, automatic stream sharding, TSDB index support). The Loki Operator adopted many of these improvements, always carefully and selectively for on-premises environments. However, the initial design, focused on a few large replicas per component, has persisted to this day. In contrast, the scheduling options for LokiStack components have expanded with each release (e.g., PodDisruptionBudgets, TopologySpreadConstraints for zone-aware data replication, PodAntiAffinity).

      Despite the above, several customer reports from users running LokiStack in both high-availability configurations (see LOG-4914) and small demo setups (see LOG-4824) indicate that the current LokiStack deployment sizes are not a good fit out of the box. Building on Loki's efficiency gains over the years, this epic is dedicated to re-evaluating each deployment size in favor of more but smaller replicas.

      Alternatives

      N/A

      Acceptance Criteria

      1. Given that the LokiStack administrator has installed one of the out-of-the-box deployment sizes, when the cluster restarts any of the nodes that LokiStack pods are scheduled on, then the LokiStack ingestion path continues to operate with a replication factor of 2.
      2. Given that the LokiStack administrator has adjusted any of the deployment sizes to use a single replica per component, when any of the Loki pods is evicted during a cluster restart, then the restart can progress without being blocked by PodDisruptionBudgets.
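      Why a single replica can block a cluster restart comes down to PDB arithmetic: Kubernetes permits max(0, currentHealthy - desiredHealthy) voluntary evictions, where desiredHealthy is minAvailable, or the expected pod count minus maxUnavailable. The minimal sketch below models only integer PDB values; the class and function names are hypothetical, and the example PDB settings are illustrative assumptions, not the Loki Operator's actual PDBs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodDisruptionBudget:
    """Hypothetical, simplified PDB spec: set exactly one of the two fields."""
    min_available: Optional[int] = None
    max_unavailable: Optional[int] = None


def disruptions_allowed(pdb: PodDisruptionBudget, expected_pods: int, healthy_pods: int) -> int:
    """Approximate how many voluntary evictions a PDB currently permits.

    Follows the Kubernetes rule disruptionsAllowed = max(0, currentHealthy - desiredHealthy).
    Percentage values are not modelled in this sketch.
    """
    if pdb.min_available is not None:
        desired_healthy = pdb.min_available
    elif pdb.max_unavailable is not None:
        # desiredHealthy = expected - maxUnavailable, never below zero
        desired_healthy = max(0, expected_pods - pdb.max_unavailable)
    else:
        raise ValueError("set min_available or max_unavailable")
    return max(0, healthy_pods - desired_healthy)


# Single replica guarded by minAvailable: 1 -> no eviction allowed, so a node drain stalls.
print(disruptions_allowed(PodDisruptionBudget(min_available=1), expected_pods=1, healthy_pods=1))
# Two replicas with minAvailable: 1 -> one pod may be evicted at a time.
print(disruptions_allowed(PodDisruptionBudget(min_available=1), expected_pods=2, healthy_pods=2))
```

      With a single replica, any minAvailable of 1 or more leaves zero allowed disruptions, which is the blocking behavior acceptance criterion 2 rules out; more but smaller replicas leave room for one eviction at a time.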

      Risk and Assumptions

      N/A

      Documentation Considerations

      Probably add the number of replicas per component to the deployment sizes table.

      Open Questions

      N/A

      Additional Notes

      N/A

            ptsiraki@redhat.com Periklis Tsirakidis
            Votes: 3
            Watchers: 12
