AMQ Streams / ENTMQST-5502

Streams MM2 offset synchronization is erratic in 2.5.


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Critical
    • Affects Version: 2.5.0.GA
    • Component: kafka-broker
    • Severity: Important

      We have noticed a number of oddities in the way that offsets are synchronized by MM2 that were not present in Streams 2.4. We suspect that these oddities might be the result of the work done in KAFKA-14666. These problems can be reproduced quite easily using two Streams clusters in different namespaces on OpenShift, provided the namespaces are allowed to communicate using local services. Otherwise, routes have to be used, which means configuring TLS and certificates and complicates the set-up considerably.

      We have set up MM2 using the attached `mm2.yaml`.
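
      For context, here is a minimal sketch of the kind of `KafkaMirrorMaker2` resource involved. This is illustrative rather than the actual attachment; the names, namespaces, and bootstrap addresses are made up (AMQ Streams 2.5 ships Kafka 3.5.0):

      ```yaml
      apiVersion: kafka.strimzi.io/v1beta2
      kind: KafkaMirrorMaker2
      metadata:
        name: my-mm2
        namespace: target-ns
      spec:
        version: 3.5.0
        replicas: 1
        connectCluster: "target"
        clusters:
          - alias: "source"
            bootstrapServers: source-kafka-bootstrap.source-ns.svc:9092
          - alias: "target"
            bootstrapServers: target-kafka-bootstrap.target-ns.svc:9092
        mirrors:
          - sourceCluster: "source"
            targetCluster: "target"
            topicsPattern: ".*"
            groupsPattern: ".*"
            sourceConnector:
              config:
                replication.factor: 1
                offset-syncs.topic.replication.factor: 1
            checkpointConnector:
              config:
                checkpoints.topic.replication.factor: 1
                # Needed for MM2 to write translated consumer group
                # offsets into the target cluster
                sync.group.offsets.enabled: "true"
                sync.group.offsets.interval.seconds: 30
      ```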

      Here is one way to reproduce one of the problems, though we have seen others in our testing, as has the customer. Illustrative versions of the commands are sketched after the list.

      1. Set the number of MM2 replicas to zero, so it isn't running.

      2. Send a million messages to a specific topic using `kafka-console-producer.sh`.

      3. Start a consumer on that topic with `kafka-console-consumer.sh`, specifying a consumer group ID.

      4. At the same time, monitor the consumer group using `kafka-consumer-groups.sh`. Stop the consumer when roughly half the messages have been consumed, i.e., when the utility reports something like `log length=1000000, last offset=500000, lag=500000`.

      5. Stop the producer and the consumer.

      6. Start MM2 by increasing its replica count to 1. Wait a little while (perhaps a minute or so) for the synchronization to happen; it can be seen in the MM2 log.

      7. Run `kafka-consumer-groups.sh` on the target system for the same consumer group. In our tests we consistently saw `length=1000000, last offset=1000000, lag=0`; that is, the offset appears to be at the end of the log on the target system.

      8. Try to consume messages from the target using `kafka-console-consumer.sh`. For the consumer group ID we have been using, no messages are consumed, presumably because the offset is at the end of the log.
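
      For concreteness, the commands we ran look roughly like the following. The bootstrap addresses, topic name, and group name are illustrative, and `seq` is just a convenient way to generate a million distinct messages:

      ```sh
      # Steps 1 and 6: scale MM2 down (and later back up) by patching the resource
      oc -n target-ns patch kafkamirrormaker2 my-mm2 \
        --type merge -p '{"spec":{"replicas":0}}'   # later: "replicas":1

      # Step 2: produce a million messages to the source topic
      seq 1 1000000 | ./bin/kafka-console-producer.sh \
        --bootstrap-server source-kafka-bootstrap.source-ns.svc:9092 \
        --topic test-topic

      # Step 3: consume with a specific group ID; interrupt it about half-way
      ./bin/kafka-console-consumer.sh \
        --bootstrap-server source-kafka-bootstrap.source-ns.svc:9092 \
        --topic test-topic --group test-group --from-beginning > /dev/null

      # Steps 4 and 7: describe the group's position and lag (against the
      # source while consuming, against the target after synchronization)
      ./bin/kafka-consumer-groups.sh \
        --bootstrap-server source-kafka-bootstrap.source-ns.svc:9092 \
        --describe --group test-group

      # Step 8: try to consume from the target with the same group ID. Note
      # that, depending on the replication policy in mm2.yaml, the mirrored
      # topic may be named test-topic or source.test-topic.
      ./bin/kafka-console-consumer.sh \
        --bootstrap-server target-kafka-bootstrap.target-ns.svc:9092 \
        --topic test-topic --group test-group
      ```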

      We appreciate that the same messages will not necessarily appear at the same offsets in the mirrored Kafka cluster. However, we would expect consumers to start consuming from (more or less) the same place. In all our tests, consumers do not get the messages they expect; often the offsets are wildly wrong.

      We tried setting `offset.lag.max` to zero, but it made no difference.
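
      For the record, that setting was applied to the source connector, along the lines of this fragment of the sketch above:

      ```yaml
      sourceConnector:
        config:
          # offset.lag.max is the maximum offset drift allowed before an
          # offset sync is emitted; 0 should make offset translation as
          # precise as the mechanism allows
          offset.lag.max: 0
      ```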

      Arguably, replicating a cluster with no producers or consumers running is not a realistic operation. But we saw problems when the producers and consumers were running as well; they are just harder to quantify.

       

    • Assignee: Unassigned
    • Reporter: Kevin Boone (rhn-support-kboone)