Type: Bug
Resolution: Not a Bug
Priority: Normal
Affects Version/s: Logging 6.0.5, Logging 6.1.3, Logging 6.2.0, Logging 6.3.0
Release Note Type: Release Note Not Required
Sprint: Log Storage - Sprint 269
Description of problem:
When ingesting logs, the distributor component returns an error stating that not enough ingesters are available to fulfill the replication factor, even though enough ingesters are actually available.
Error message:
level=warn ts=2025-03-19T19:07:59.945837176Z caller=logging.go:128 orgID=infrastructure msg="POST /loki/api/v1/push (500) 208.414µs Response: \"at least 2 live replicas required, could only find 1 - unhealthy instances: 10.128.1.107:9095\\n\" ws: false; Accept-Encoding: identity; Content-Encoding: snappy; Content-Length: 3801; Content-Type: application/x-protobuf; User-Agent: Vector/0.37.1 (x86_64-unknown-linux-gnu); X-Forwarded-For: 10.130.0.6; X-Forwarded-Prefix: /api/logs/v1/infrastructure; X-Scope-Orgid: infrastructure; "
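As far as we understand, Loki's distributor uses the hash ring from Grafana's dskit library, whose default replication strategy requires a majority, floor(RF/2)+1, of healthy replicas within each stream's replica set. The following is a simplified, hypothetical model of that check (the `instance` type and `filterHealthy` function are illustrative names, not dskit's API, and the second address is assumed); it reproduces the message format of the log line above.

```go
package main

import (
	"fmt"
	"strings"
)

// instance is a hypothetical, simplified ring entry: an ingester address
// plus whether its heartbeat is recent enough to count as healthy.
type instance struct {
	addr    string
	healthy bool
}

// filterHealthy models the quorum check applied to ONE stream's replica set:
// it keeps the healthy members and fails if fewer than floor(rf/2)+1 remain.
func filterHealthy(replicaSet []instance, rf int) ([]instance, error) {
	minSuccess := rf/2 + 1

	var live []instance
	var unhealthy []string
	for _, inst := range replicaSet {
		if inst.healthy {
			live = append(live, inst)
		} else {
			unhealthy = append(unhealthy, inst.addr)
		}
	}

	if len(live) < minSuccess {
		return nil, fmt.Errorf("at least %d live replicas required, could only find %d - unhealthy instances: %s",
			minSuccess, len(live), strings.Join(unhealthy, ","))
	}
	return live, nil
}

func main() {
	// Replica set of one stream with replication factor 2:
	// ingester-0 (unhealthy) and ingester-1 (healthy, port assumed).
	set := []instance{
		{addr: "10.128.1.107:9095", healthy: false},
		{addr: "10.131.0.35:9095", healthy: true},
	}
	if _, err := filterHealthy(set, 2); err != nil {
		fmt.Println(err)
	}
}
```

With `factor: 2`, the quorum is 2/2+1 = 2, so in this model a replica set tolerates no unhealthy member at all; with a factor of 3 the quorum is still 2, so one unhealthy replica per set would be tolerated.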
Ingester pod status:
> oc get pod -l app.kubernetes.io/component=ingester -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE                                        NOMINATED NODE   READINESS GATES
lokistack-dev-ingester-0   1/1     Running   0          9m    10.128.1.107   ip-10-0-52-251.eu-west-1.compute.internal   <none>           <none>
lokistack-dev-ingester-1   1/1     Running   0          10m   10.131.0.35    ip-10-0-32-184.eu-west-1.compute.internal   <none>           <none>
lokistack-dev-ingester-2   1/1     Running   0          12m   10.131.0.32    ip-10-0-32-184.eu-west-1.compute.internal   <none>           <none>
LokiStack:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: lokistack-dev
  namespace: openshift-logging
spec:
  limits:
    global:
      retention:
        days: 30
  managementState: Managed
  replication:
    factor: 2
  size: 1x.demo
  storage:
    schemas:
    - effectiveDate: "2024-06-01"
      version: v13
    secret:
      name: test
      type: s3
  storageClassName: gp3-csi
  template:
    ingester:
      replicas: 3
  tenants:
    mode: openshift-logging
Version-Release number of selected component (if applicable):
Loki Operator 6.3.0
Steps to Reproduce:
- Create a LokiStack with one more ingester than the replication factor requires
- Disrupt the network of a single ingester
- Wait for that ingester to become UNHEALTHY
- Observe messages in the distributor logs stating that not enough ingesters are available
Expected results:
The distributor can ingest log entries even when some ingesters are unhealthy, as long as enough healthy ingesters remain to fulfill the replication factor.
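Whether a push succeeds appears to depend on each stream's replica set rather than on the total number of healthy ingesters. The hypothetical sketch below models a token ring with one token per ingester (real ingesters register many tokens; the token values and the `replicaSet`/`accepted` functions are illustrative assumptions) and applies a majority quorum of floor(RF/2)+1 healthy replicas per set, which is our understanding of the rule behind the error message. With `factor: 2` and three ingesters, every stream whose replica set contains the unhealthy ingester-0 is rejected, even though two healthy ingesters remain.

```go
package main

import (
	"fmt"
	"sort"
)

// ringEntry is a hypothetical token owned by an ingester.
type ringEntry struct {
	token uint32
	owner string
}

// replicaSet walks the ring clockwise from key and returns the first rf
// distinct owners, wrapping around at the end of the ring.
func replicaSet(ring []ringEntry, key uint32, rf int) []string {
	// Index of the first token >= key; sort.Search returns len(ring) if none,
	// which the modulo below wraps back to the start of the ring.
	start := sort.Search(len(ring), func(i int) bool { return ring[i].token >= key })
	seen := map[string]bool{}
	var owners []string
	for i := 0; i < len(ring) && len(owners) < rf; i++ {
		e := ring[(start+i)%len(ring)]
		if !seen[e.owner] {
			seen[e.owner] = true
			owners = append(owners, e.owner)
		}
	}
	return owners
}

// accepted reports whether a push for key reaches a majority quorum
// (rf/2 + 1) of healthy replicas within its replica set.
func accepted(ring []ringEntry, key uint32, rf int, healthy map[string]bool) bool {
	live := 0
	for _, owner := range replicaSet(ring, key, rf) {
		if healthy[owner] {
			live++
		}
	}
	return live >= rf/2+1
}

func main() {
	// One token per ingester, sorted by token (illustrative values).
	ring := []ringEntry{
		{100, "ingester-0"},
		{200, "ingester-1"},
		{300, "ingester-2"},
	}
	// ingester-0 has a stale heartbeat; the other two are healthy.
	healthy := map[string]bool{"ingester-0": false, "ingester-1": true, "ingester-2": true}

	for _, key := range []uint32{50, 150, 250} {
		fmt.Println(key, replicaSet(ring, key, 2), accepted(ring, key, 2, healthy))
	}
}
```

In this model, keys 50 and 250 map to replica sets that include ingester-0 and are rejected, while key 150 (ingester-1 and ingester-2) is accepted. With a factor of 3, every set would contain all three ingesters, and two healthy replicas would meet the quorum of 2.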