Type: Bug
Resolution: Obsolete
Priority: Major
Affects Version: 2.1.0.GA
The customer is hitting the following KafkaConnect error after an automatic upgrade on OpenShift 4:
2022-04-07 20:47:57,406 ERROR [Worker clientId=connect-1, groupId=ircc-connect-cluster] Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [DistributedHerder-connect-1-1] org.apache.kafka.common.config.ConfigException: Topic 'connect-cluster-offsets' supplied via the 'offset.storage.topic' property is required to have 'cleanup.policy=compact' to guarantee consistency and durability of source connector offsets, but found the topic currently has 'cleanup.policy=delete'. Continuing would likely result in eventually losing source connector offsets and problems restarting this Connect cluster in the future. Change the 'offset.storage.topic' property in the Connect worker configurations to use a topic with 'cleanup.policy=compact'.
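The quickest way to confirm what the error is complaining about is to read the topic configuration directly from Kafka, bypassing the KafkaTopic resource. This is only a sketch: the broker pod name, kafka container name, and plain listener port are assumptions based on a default my-cluster deployment.
$ kubectl exec -it my-cluster-kafka-0 -c kafka -- bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name connect-cluster-offsets --describe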
Everything works fine when KafkaConnect is deployed after the Kafka cluster is up and running.
$ kubectl get kt | grep connect-cluster
connect-cluster-configs my-cluster 1 3 True
connect-cluster-offsets my-cluster 25 3 True
connect-cluster-status my-cluster 5 3 True
$ kubectl get kt connect-cluster-offsets -o yaml | yq eval ".spec" -
config:
  cleanup.policy: compact
partitions: 25
replicas: 3
topicName: connect-cluster-offsets
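For comparison, this is roughly the KafkaTopic we would expect for the offsets topic if it were declared explicitly; the apiVersion and cluster label below are assumptions based on a standard Strimzi v1beta2 setup, not taken from the customer environment.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: connect-cluster-offsets
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 25
  replicas: 3
  config:
    cleanup.policy: compact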
Instead, this is what happens when Kafka and KafkaConnect are reconciled concurrently and you manually delete all topic resources.
$ kubectl delete po --all && kubectl delete kt --all
...
$ kubectl get kt | grep connect-cluster
connect-cluster-configs my-cluster 3 3
connect-cluster-offsets my-cluster 3 3
connect-cluster-status my-cluster 3 3
$ kubectl get kt connect-cluster-offsets -o yaml | yq eval ".spec" -
config: {}
partitions: 3
replicas: 3
topicName: connect-cluster-offsets
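A naive recovery attempt would be to patch the expected config back onto the KafkaTopic; the command below is only a sketch and has not been verified to take effect while the operator is in this state.
$ kubectl patch kt connect-cluster-offsets --type merge -p '{"spec":{"config":{"cleanup.policy":"compact"}}}'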
At this point, we can check the TopicOperator log to see, for example, what happened to our connect-cluster-offsets topic.
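Assuming the default my-cluster naming, the log can be retrieved from the topic-operator container of the Entity Operator with something like:
$ kubectl logs deployment/my-cluster-entity-operator -c topic-operator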
Initially, the topic is only present in Kafka (left over from the previous deployment), so the TopicOperator needs to create the corresponding KafkaTopic resource in Kubernetes.
2022-04-29 10:52:21,07830 INFO [vert.x-eventloop-thread-1] TopicOperator:576 - Reconciliation #100(initial kafka connect-cluster-offsets) KafkaTopic(test/connect-cluster-offsets): Reconciling topic connect-cluster-offsets, k8sTopic:null, kafkaTopic:nonnull, privateTopic:nonnull
Then we have lots of invalid state store errors, which I think are responsible for the lost topic configuration.
2022-04-29 10:54:30,63425 INFO [vert.x-eventloop-thread-1] TopicOperator:576 - Reconciliation #735(periodic -connect-cluster-offsets) KafkaTopic(test/connect-cluster-offsets): Reconciling topic connect-cluster-offsets, k8sTopic:null, kafkaTopic:nonnull, privateTopic:null
2022-04-29 10:54:38,37543 ERROR [vert.x-eventloop-thread-0] K8sTopicWatcher:69 - Reconciliation #943(kube +connect-cluster-offsets) KafkaTopic(test/connect-cluster-offsets): Failure processing KafkaTopic watch event ADDED on resource connect-cluster-offsets with labels {strimzi.io/cluster=my-cluster}: The state store, topic-store, may have migrated to another instance.
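The state store mentioned in the error is backed by the TopicOperator's own internal topics in Kafka; whether they survived the mass deletion can be checked with something like the following (pod name and listener port assumed as before):
$ kubectl exec -it my-cluster-kafka-0 -c kafka -- bin/kafka-topics.sh --bootstrap-server localhost:9092 --list | grep strimzi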
Finally, the topic is created, but with the wrong configuration.
2022-04-29 10:54:38,37525 INFO [kubernetes-ops-pool-11] CrdOperator:113 - Reconciliation #926(kube +connect-cluster-offsets) KafkaTopic(test/connect-cluster-offsets): Status of KafkaTopic connect-cluster-offsets in namespace test has been updated
After that, the TopicOperator no longer works at all, as it is stuck with an invalid state store (restarting the pod does not seem to help).
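For completeness, restarting the TopicOperator amounts to rolling the Entity Operator deployment (name assumed from the default my-cluster naming), e.g. as below; as noted above, the state store error comes back as soon as the pod is up again.
$ kubectl rollout restart deployment/my-cluster-entity-operator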