Network Observability / NETOBSERV-978

Kafka: commit after Loki write

    • Spike
    • Resolution: Unresolved
    • Undefined
    • Kafka, Loki
    • NetObserv - ShiftWeek 4.15

      Evaluate the pros and cons of committing Kafka messages only after a successful Loki write.

      At first glance, pros:

      • If FLP pods restart, messages still being processed are safe as long as they are committed only after the Loki write. That is not the case today: such flows are currently lost.
      • If FLP fails to write to Loki (e.g. due to a temporary load spike, or Loki being temporarily down) even after retries, the flows are still consumable from Kafka and won't be dropped.
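
      A minimal sketch of the commit-after-write idea, for discussion only. `Message`, `LokiWriter`, `OffsetCommitter` and `consumeBatch` are hypothetical names made up for this example; they are not FLP, Kafka client, or Loki client APIs. The offset is committed only when the write succeeds, so a failed or interrupted batch is delivered again.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical stand-in for a Kafka record carrying one flow.
type Message struct {
	Offset int64
	Flow   string
}

// LokiWriter and OffsetCommitter are hypothetical interfaces for this sketch.
type LokiWriter interface {
	Write(flows []string) error
}

type OffsetCommitter interface {
	Commit(offset int64) error
}

// consumeBatch writes the whole batch first and commits the last offset
// only if the write succeeded.
func consumeBatch(batch []Message, w LokiWriter, c OffsetCommitter) error {
	if len(batch) == 0 {
		return nil
	}
	flows := make([]string, 0, len(batch))
	for _, m := range batch {
		flows = append(flows, m.Flow)
	}
	if err := w.Write(flows); err != nil {
		// No commit: the batch stays consumable from Kafka.
		return fmt.Errorf("loki write failed, offset not committed: %w", err)
	}
	return c.Commit(batch[len(batch)-1].Offset)
}

// flakyWriter fails a fixed number of times, then succeeds.
type flakyWriter struct{ failures int }

func (f *flakyWriter) Write(flows []string) error {
	if f.failures > 0 {
		f.failures--
		return errors.New("loki temporarily unavailable")
	}
	return nil
}

// memCommitter remembers the last committed offset in memory.
type memCommitter struct{ last int64 }

func (m *memCommitter) Commit(offset int64) error {
	m.last = offset
	return nil
}

func main() {
	w := &flakyWriter{failures: 1}
	c := &memCommitter{last: -1}
	batch := []Message{{Offset: 10, Flow: "flow-a"}, {Offset: 11, Flow: "flow-b"}}
	for attempt := 1; attempt <= 2; attempt++ {
		err := consumeBatch(batch, w, c)
		fmt.Printf("attempt %d: err=%v, last committed offset=%d\n", attempt, err, c.last)
	}
}
```

      This gives at-least-once delivery: if the write succeeds but the commit does not happen (e.g. a restart in between), the same flows are written to Loki again.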

      Cons:

      • It might put more stress on FLP globally, since retries increase the total volume of flows to process?
      • After some time, retrying old logs might run into "too-far-in-past" kinds of issues.
      • Need to carefully consider which Loki errors deserve a retry, which is more error-prone. E.g. "too-far-in-past" should not be retried, because it will fail again and again.
      • Metrics such as Loki dropped entries will be skewed, since the Loki client doesn't know that a dropped flow will actually be consumed again from Kafka.
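
      One possible way to frame the retry question, as a sketch under stated assumptions. The `Decision` type, `decide` and `stats` are hypothetical, and the mapping from HTTP status to decision is an assumption to be validated against Loki's actual responses: here a 429 or a server/network error leaves the offset uncommitted, while any other 4xx (assumed to cover rejections such as "too-far-in-past") is dropped and committed so it is not retried forever. Counting drops only on the final decision would also keep the dropped-entries metric meaningful.

```go
package main

import "fmt"

// Decision is a hypothetical outcome for one batch after a Loki write attempt.
type Decision int

const (
	Commit        Decision = iota // written: commit the offset
	RetryLater                    // transient failure: do not commit
	DropAndCommit                 // permanent failure: give up, commit anyway
)

func (d Decision) String() string {
	switch d {
	case Commit:
		return "commit"
	case RetryLater:
		return "retry-later"
	default:
		return "drop-and-commit"
	}
}

// decide maps an HTTP status to a decision. The mapping is an assumption
// for illustration, not a documented Loki contract.
func decide(status int) Decision {
	switch {
	case status >= 200 && status < 300:
		return Commit
	case status >= 400 && status < 500 && status != 429:
		// Assumed permanent rejection: retrying would fail again and again.
		return DropAndCommit
	default:
		// 429, 5xx, or no usable status (e.g. network error).
		return RetryLater
	}
}

// stats counts flows by final decision, so that flows which will be
// consumed again from Kafka are not reported as dropped.
type stats struct{ dropped, retried int }

func (s *stats) record(d Decision, flows int) {
	switch d {
	case DropAndCommit:
		s.dropped += flows
	case RetryLater:
		s.retried += flows
	}
}

func main() {
	var s stats
	for _, status := range []int{204, 429, 503, 400} {
		d := decide(status)
		s.record(d, 100)
		fmt.Printf("%d -> %s\n", status, d)
	}
	fmt.Printf("dropped=%d retried=%d\n", s.dropped, s.retried)
}
```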

              Unassigned
              Joel Takvorian (jtakvori)