OpenShift Logging / LOG-6885

Loki distributor does not accept logs even when enough ingesters are available

    • Bug
    • Resolution: Not a Bug
    • Normal
    • Logging 6.3.0
    • Logging 6.0.5, Logging 6.1.3, Logging 6.2.0, Logging 6.3.0
    • Log Storage
    • 2
    • False
    • False
    • NEW
    • NEW
    • Release Note Not Required
    • Log Storage - Sprint 269

      Description of problem:

      The distributor component rejects log pushes with an error stating that too few live ingesters are available to satisfy the replication factor, even though enough ingesters are actually available.

      Error message:

      level=warn ts=2025-03-19T19:07:59.945837176Z caller=logging.go:128 orgID=infrastructure msg="POST /loki/api/v1/push (500) 208.414µs Response: \"at least 2 live replicas required, could only find 1 - unhealthy instances: 10.128.1.107:9095\\n\" ws: false; Accept-Encoding: identity; Content-Encoding: snappy; Content-Length: 3801; Content-Type: application/x-protobuf; User-Agent: Vector/0.37.1 (x86_64-unknown-linux-gnu); X-Forwarded-For: 10.130.0.6; X-Forwarded-Prefix: /api/logs/v1/infrastructure; X-Scope-Orgid: infrastructure; "
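
For triage, the relevant numbers can be pulled out of a distributor log line like the one above. The following is a minimal sketch, not part of Loki or the operator: the function name `parse_replica_error` and the regular expression are hypothetical helpers, and the handling of several unhealthy instances as a comma-separated list is an assumption, since the report only shows a single address.

```python
import re

# Hypothetical pattern matching the wording of the error in this report.
REPLICA_ERROR = re.compile(
    r"at least (?P<required>\d+) live replicas required, "
    r"could only find (?P<found>\d+) - unhealthy instances: "
    r"(?P<unhealthy>[\d.:,\s]+)"
)


def parse_replica_error(line):
    """Return required/found replica counts and unhealthy addresses, or None."""
    match = REPLICA_ERROR.search(line)
    if match is None:
        return None
    # Assumption: multiple addresses would be separated by commas.
    unhealthy = [a.strip() for a in match.group("unhealthy").split(",") if a.strip()]
    return {
        "required": int(match.group("required")),
        "found": int(match.group("found")),
        "unhealthy": unhealthy,
    }


# Shortened copy of the log line from this report (raw string keeps the escapes).
sample = (
    r'msg="POST /loki/api/v1/push (500) Response: \"at least 2 live replicas '
    r'required, could only find 1 - unhealthy instances: 10.128.1.107:9095\\n\" ws: false"'
)
parsed = parse_replica_error(sample)
```

Here the unhealthy address matches the IP of `lokistack-dev-ingester-0` in the pod listing, even though that pod is reported as `Running` and `1/1` ready.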
      

      Ingester pod status:

      > oc get pod -l app.kubernetes.io/component=ingester -o wide
      NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE                                        NOMINATED NODE   READINESS GATES
      lokistack-dev-ingester-0   1/1     Running   0          9m    10.128.1.107   ip-10-0-52-251.eu-west-1.compute.internal   <none>           <none>
      lokistack-dev-ingester-1   1/1     Running   0          10m   10.131.0.35    ip-10-0-32-184.eu-west-1.compute.internal   <none>           <none>
      lokistack-dev-ingester-2   1/1     Running   0          12m   10.131.0.32    ip-10-0-32-184.eu-west-1.compute.internal   <none>           <none>
      

      LokiStack:

      apiVersion: loki.grafana.com/v1
      kind: LokiStack
      metadata:
        name: lokistack-dev
        namespace: openshift-logging
      spec:
        limits:
          global:
            retention:
              days: 30
        managementState: Managed
        replication:
          factor: 2
        size: 1x.demo
        storage:
          schemas:
          - effectiveDate: "2024-06-01"
            version: v13
          secret:
            name: test
            type: s3
        storageClassName: gp3-csi
        template:
          ingester:
            replicas: 3
        tenants:
          mode: openshift-logging
      

      Version-Release number of selected component (if applicable):

      Loki Operator 6.3.0

      Steps to Reproduce:

      1. Create a LokiStack with one ingester more than the replication factor requires
      2. Disrupt the network of a single ingester
      3. Wait for that ingester to become UNHEALTHY
      4. Observe messages in the distributor log reporting that not enough ingesters are available

      Expected results:

      The distributor is able to ingest log entries even with unhealthy ingesters, as long as the number of healthy ingesters is enough to fulfill the replication factor.
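
The report does not explain the "Not a Bug" resolution. One possible reading, sketched below under clearly labeled assumptions, is that the live-replica check applies to the replica set chosen for each stream rather than to the total number of healthy ingesters, and that the required count is a quorum of floor(RF / 2) + 1. That formula matches the "at least 2 live replicas required" wording for a replication factor of 2, but it is an assumption here, not something stated in the report. The ring model (consecutive ingesters starting at an index) and all function names are hypothetical simplifications.

```python
def min_live_replicas(replication_factor):
    """Assumed quorum size: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replica_set(ring, start, replication_factor):
    """Simplified placement: RF consecutive ingesters starting at index `start`."""
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]


def can_write(ring, healthy, start, replication_factor):
    """True if enough members of this stream's replica set are healthy."""
    live = [i for i in replica_set(ring, start, replication_factor) if i in healthy]
    return len(live) >= min_live_replicas(replication_factor)


ring = ["ingester-0", "ingester-1", "ingester-2"]
healthy = {"ingester-1", "ingester-2"}  # ingester-0 is unhealthy, as in the report

# Replication factor 2, as in the LokiStack above: quorum is 2 of 2.
rf2 = {start: can_write(ring, healthy, start, 2) for start in range(len(ring))}

# Replication factor 3 for comparison: quorum is 2 of 3.
rf3 = {start: can_write(ring, healthy, start, 3) for start in range(len(ring))}
```

Under these assumptions, `rf2` is only true for the replica set that avoids `ingester-0`, so some pushes fail even though two ingesters are healthy, while `rf3` is true for every replica set. If this model holds, a replication factor of 2 tolerates no unhealthy member within a stream's replica set.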

      Additional info:

              rojacob@redhat.com Robert Jacob