Task
Resolution: Done
Blocker
We recently had a production failure with Kafka: we were not polling fast enough, so our consumer (the notifications app) was kicked out of the consumer group before it could commit its latest offset. When consumption resumed, it restarted from the last committed offset, which means we almost certainly processed some Kafka messages several times and sent duplicate notifications.
Our current code processes Kafka messages at least once, but ideally we would process them exactly once. As explained in the Kafka documentation, one way of achieving this is to include a unique identifier in each Kafka message and then check on the consumer side (our app) whether that identifier has already been processed. If it has, the message should be ignored.
Including that identifier in the payload would be a breaking change and would require several schema versions to coexist in production. There is another way to carry the identifier, though: Kafka messages come with headers, which can hold arbitrary values. This is what we should use to retrieve the identifier.
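A minimal sketch of the consumer-side check, in Python for illustration only. It assumes the header is the `rh-message-id` one named in RHCLOUD-22078, that its value is a UTF-8 string, and that headers arrive as a list of `(key, bytes)` pairs, which is how common Python Kafka clients expose them. `get_message_id`, `is_duplicate` and the `seen_ids` set are hypothetical names, not existing code in the notifications app.

```python
from typing import Iterable, Optional, Set, Tuple

# Header name from RHCLOUD-22078; the UTF-8 value encoding is an assumption.
MESSAGE_ID_HEADER = "rh-message-id"

# Assumption: headers are (key, value-bytes) pairs, as exposed by common
# Python Kafka clients. Adapt to the client actually in use.
Headers = Iterable[Tuple[str, Optional[bytes]]]


def get_message_id(headers: Optional[Headers]) -> Optional[str]:
    """Return the message identifier from the headers, or None if absent."""
    for key, value in headers or []:
        if key == MESSAGE_ID_HEADER and value is not None:
            return value.decode("utf-8")
    return None


def is_duplicate(headers: Optional[Headers], seen_ids: Set[str]) -> bool:
    """True only if the message carries an identifier already processed."""
    message_id = get_message_id(headers)
    return message_id is not None and message_id in seen_ids
```

A missing header yields `None`, so such messages are never treated as duplicates and are simply processed.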
Reading the identifier from a header would allow a smooth migration, which could look like this:
- Phase 1: We try to retrieve the identifier. If it is present and already known, the message is ignored; otherwise, the message is processed. If the identifier is missing, the message is also processed.
- Phase 2: We inform the onboarded apps that they should now send that header with their messages, that the change is non-breaking, and that they have until a certain deadline to make it.
- Phase 3: The header is mandatory and messages without it are rejected.
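The three phases can be sketched as a single handler whose behavior depends on a configurable phase. Phase 2 only changes what producers send, so the consumer behaves as in Phase 1 until Phase 3 starts. This is a hypothetical Python sketch: `Phase`, `Deduplicator` and its string return values are illustrative, the `rh-message-id` header name comes from RHCLOUD-22078, and the in-memory set stands in for whatever persistent store would actually hold the processed identifiers.

```python
from enum import Enum
from typing import Callable, Iterable, Optional, Set, Tuple

MESSAGE_ID_HEADER = "rh-message-id"  # header name from RHCLOUD-22078
Headers = Iterable[Tuple[str, Optional[bytes]]]  # assumed (key, bytes) pairs


class Phase(Enum):
    OPTIONAL = 1   # phases 1 and 2: header accepted but not required
    MANDATORY = 3  # phase 3: messages without the header are rejected


class Deduplicator:
    """Hypothetical helper; the set stands in for a persistent store."""

    def __init__(self, phase: Phase) -> None:
        self.phase = phase
        self.processed_ids: Set[str] = set()

    def handle(self, headers: Optional[Headers], payload: bytes,
               process: Callable[[bytes], None]) -> str:
        message_id = None
        for key, value in headers or []:
            if key == MESSAGE_ID_HEADER and value is not None:
                message_id = value.decode("utf-8")
                break

        if message_id is None:
            if self.phase is Phase.MANDATORY:
                return "rejected"
            process(payload)  # no identifier, so no deduplication possible
            return "processed"

        if message_id in self.processed_ids:
            return "duplicate"  # already handled: ignore the message

        process(payload)
        # Record the id only after success, so a failed message can be retried.
        self.processed_ids.add(message_id)
        return "processed"
```

With several consumer instances, checking and recording the identifier would need to be atomic, for example through a unique constraint in a shared database; otherwise two instances could still process the same message concurrently.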
Relates to: RHCLOUD-22078 Ask the onboarded apps to send the new rh-message-id Kafka header (Closed)