Evaluate pros & cons of committing logs only after successful loki write.
At a first glance, pros:
- If FLP pods restart, messages currently being processed are safe if they are only committed on loki write. That's not currently the case, such flows would be lost today
- If FLP fails to write to Loki (e.g. due to temporary load spike, or Loki temporarily down), after retries, flows are still consummable from Kafka and won't be dropped
Cons:
- It might put more stress on FLP globally, as the total volume of flows increases as there are more retries?
- After some time, might run in "too-far-in-past" kind of issues, when retrying to process old logs
- Need to carefully consider which Loki errors deserve a retry or not. This is more error prone. E.g. "too-far-in-past" should not be retried, bc it will fail again and again.
- Metrics such as loki dropped entries will be screwed up, since the loki client doesn't know that the dropped flow will actually be consumed again from kafka.