Type: Epic
Resolution: Unresolved
Priority: Normal
Fix Version/s: Logging 6.2.z, Logging 6.3.0
Affects Version/s: None
Component/s: Log Storage
Labels:
None

Epic Name:
Loki - Re-evaluate deployment sizes
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Green
Docs QE Status:
NEW
Epic Status:
To Do
QE Status:
NEW
Hierarchy Progress Bar:

100% To Do, 0% In Progress, 0% Done
Release Note Type:
If Release Note Needed, Set a Value

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Intelligence Requested:
Market:

Goals

Provide state-of-the-union benchmarks for the LokiStack deployment sizes to re-evaluate the CPU, memory, and disk requests of each Loki component.
Rescale the LokiStack deployment sizes, based on these benchmarks, to support HA setups with a replication factor of 2 out of the box.
Rescale the LokiStack deployment sizes, based on these benchmarks, to support PodDisruptionBudgets (PDBs) in single-replica demo setups.

Non-Goals

Expose or introduce tuning of resource requests and limits in the LokiStack CRD.

Motivation

The current LokiStack deployment sizes were derived from synthetic benchmarks (see loki-benchmarks) run at the very beginning of designing the OpenShift Logging migration from Elasticsearch to Loki (as early as February 2020). Although Loki's design has stayed the same and simple on both the write and read paths over all these years, the maintainers have added many features that substantially improved reliability and scalability (e.g. the Write-Ahead Log replacing hand-over replication, automatic stream sharding, and TSDB index support). The Loki Operator adopted many of these improvements, always carefully and selectively with on-premises deployments in mind. However, the initial design, which focuses on few but large replicas per component, has remained unchanged to this day. In contrast, the scheduling options for LokiStack components have expanded with every release so far (e.g. PodDisruptionBudgets, TopologySpreadConstraints for zone-aware data replication, and PodAntiAffinity).

Despite the above, several customer reports, from both high-availability configurations (see LOG-4914) and small demo setups (see LOG-4824), indicate that the current LokiStack deployment sizes are not a good fit out of the box. Building on Loki's efficiency gains over the years, this epic is dedicated to re-evaluating each deployment size in favor of more but smaller replicas.

Alternatives

N/A

Acceptance Criteria

Given the LokiStack administrator installed one of the out-of-the-box deployment sizes, when the cluster restarts any of the scheduled nodes, then the LokiStack ingestion path continues operating with a replication factor of 2.
Given the LokiStack administrator adjusts any of the deployment sizes to use a single replica per component, when any of the Loki pods gets evicted by a cluster restart, then the cluster restart can progress without being blocked by PodDisruptionBudgets.
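The second criterion depends on how a PodDisruptionBudget gates evictions during a node drain. A minimal sketch in Python, assuming integer minAvailable/maxUnavailable values only (percentages are not handled), that mirrors the arithmetic of the Kubernetes disruption controller: an eviction is allowed only while disruptionsAllowed is above zero. The class and function names are hypothetical, and which of the two fields the Loki Operator actually sets on LokiStack PDBs is left as an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PDBSpec:
    """Hypothetical, integer-only subset of a PodDisruptionBudget spec."""
    min_available: Optional[int] = None
    max_unavailable: Optional[int] = None


def disruptions_allowed(spec: PDBSpec, expected_pods: int, healthy_pods: int) -> int:
    """Return how many pods may be evicted right now (status.disruptionsAllowed)."""
    if spec.min_available is not None:
        # minAvailable: this many pods must stay healthy.
        desired_healthy = spec.min_available
    elif spec.max_unavailable is not None:
        # maxUnavailable: at most this many of the expected pods may be down.
        desired_healthy = max(0, expected_pods - spec.max_unavailable)
    else:
        raise ValueError("a PDB sets either minAvailable or maxUnavailable")
    # The eviction API refuses evictions once this reaches zero, stalling a node drain.
    return max(0, healthy_pods - desired_healthy)


if __name__ == "__main__":
    # Single replica guarded by minAvailable=1: the drain is blocked.
    print(disruptions_allowed(PDBSpec(min_available=1), expected_pods=1, healthy_pods=1))  # 0
    # Two replicas with minAvailable=1: one pod may be evicted.
    print(disruptions_allowed(PDBSpec(min_available=1), expected_pods=2, healthy_pods=2))  # 1
    # Single replica guarded by maxUnavailable=1: the drain can proceed.
    print(disruptions_allowed(PDBSpec(max_unavailable=1), expected_pods=1, healthy_pods=1))  # 1
```

The first case is the situation the criterion targets: with one replica and minAvailable=1, no eviction is ever allowed, so either a second replica or a relaxed budget is needed for the restart to progress.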

Risk and Assumptions

N/A

Documentation Considerations

Probably add the number of replicas per component to the deployment sizes table.

Open Questions

N/A

Additional Notes

N/A

links to

[KCS] Log is missing during 1 node down when using Loki Stack in RHOCP4

Assignee:: Robert Jacob

Reporter:: Periklis Tsirakidis

QA Contact:: Kabir Bharti

Votes:: 5

Watchers:: 18

Created:: 2024/03/11 3:45 PM

Updated:: 2025/01/27 1:47 PM