Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 1.9.7.Final, 2.0.0.Final
Affects Version/s: 1.9.6.Final
Component/s: vitess-connector
Labels: None
Blocked: False
Blocked Reason: None
Ready: False
Target Release: 1.9.GA
Git Pull Request:
https://github.com/debezium/debezium-connector-vitess/pull/103

In order to make your issue reports as actionable as possible, please provide the following information, depending on the issue type.

Bug report

For bug reports, provide this information, please:

What Debezium connector do you use and what version?

Vitess Connector

What is the connector configuration?

Not relevant

What is the captured database version and mode of deployment?

(E.g. on-premises, with a specific cloud provider, etc.)

Vitess on AWS

What behaviour do you expect?

The connector should retry on this specific error.

What behaviour do you see?

The connector shuts down.

Do you see the same behaviour using the latest released Debezium version?

Yes

Do you have the connector logs, ideally from start till finish?

2022-10-12 10:00:17,440 INFO Vitess|dev|streaming Exception code: UNKNOWN and description: vttablet: rpc error: code = Unknown desc = stream (at source tablet) error @ 027c67a2-c0b0-11ec-8a34-0ed0087913a5:1-11418261,08fb1cf3-0ce5-11ed-b921-0a8939501751:1-1443715,0c4d44de-3bc1-11ed-af99-0ee57f5f6bdb:1-3035408,396a0037-51e5-11ec-b6d4-06e7d02a97f7:1-23131540,47e148fc-b58f-11ec-afdd-0a92e288553b:1-50908799,4b67241e-03d4-11ed-af16-0a82ff4428fb:1-2100765,7007b595-20d7-11ed-b227-0a96a0fa5379:1-26327638,79099da3-b59a-11ec-b407-121791d0ff3d:1-99004816,9ced45d8-3699-11ec-a94b-1290866dd717:1-122256212,b5f72b93-06cb-11ed-9102-12b3b7a9a0f3:1-46356613,c96e48e1-03da-11ed-973f-0e1efd5bebd5:1-11561105,c97f6052-b594-11ec-bc09-06fc71a256b7:1-6826162,d91ec27f-3bc8-11ed-b440-1218686a4bc1:1-21409798,dc43ce77-51df-11ec-b20f-12bb5fac9935:1-173860388,e3303cce-3698-11ec-8558-06a96d8a92e5:1-100485,efbe0dcb-3b6c-11ed-8683-0a4617a72d31:1-1077281,fe2dace5-2256-11ed-a3d4-12151ce13117:1-35314391: unexpected server EOF [io.debezium.connector.vitess.VitessErrorHandler]
2022-10-12 10:00:17,441 INFO Vitess|dev|streaming Closing replication connection [io.debezium.connector.vitess.connection.VitessReplicationConnection]
2022-10-12 10:00:17,443 INFO Vitess|dev|streaming VStream GRPC channel is shutdown in time. [io.debezium.connector.vitess.connection.VitessReplicationConnection]
2022-10-12 10:00:17,443 INFO Vitess|dev|streaming Finished streaming [io.debezium.pipeline.ChangeEventSourceCoordinator]
2022-10-12 10:00:17,443 INFO Vitess|dev|streaming Connected metrics set to 'false' [io.debezium.pipeline.ChangeEventSourceCoordinator]
2022-10-12 10:00:17,992 ERROR || WorkerSourceTask{id=vitess-connector-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted [org.apache.kafka.connect.runtime.WorkerTask]
org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:50)
at io.debezium.connector.vitess.VitessStreamingChangeEventSource.execute(VitessStreamingChangeEventSource.java:78)
at io.debezium.connector.vitess.VitessStreamingChangeEventSource.execute(VitessStreamingChangeEventSource.java:29)
at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:174)
at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:141)
at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:109)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.grpc.StatusRuntimeException: UNKNOWN: vttablet: rpc error: code = Unknown desc = stream (at source tablet) error @ 027c67a2-c0b0-11ec-8a34-0ed0087913a5:1-11418261,08fb1cf3-0ce5-11ed-b921-0a8939501751:1-1443715,0c4d44de-3bc1-11ed-af99-0ee57f5f6bdb:1-3035408,396a0037-51e5-11ec-b6d4-06e7d02a97f7:1-23131540,47e148fc-b58f-11ec-afdd-0a92e288553b:1-50908799,4b67241e-03d4-11ed-af16-0a82ff4428fb:1-2100765,7007b595-20d7-11ed-b227-0a96a0fa5379:1-26327638,79099da3-b59a-11ec-b407-121791d0ff3d:1-99004816,9ced45d8-3699-11ec-a94b-1290866dd717:1-122256212,b5f72b93-06cb-11ed-9102-12b3b7a9a0f3:1-46356613,c96e48e1-03da-11ed-973f-0e1efd5bebd5:1-11561105,c97f6052-b594-11ec-bc09-06fc71a256b7:1-6826162,d91ec27f-3bc8-11ed-b440-1218686a4bc1:1-21409798,dc43ce77-51df-11ec-b20f-12bb5fac9935:1-173860388,e3303cce-3698-11ec-8558-06a96d8a92e5:1-100485,efbe0dcb-3b6c-11ed-8683-0a4617a72d31:1-1077281,fe2dace5-2256-11ed-a3d4-12151ce13117:1-35314391: unexpected server EOF
at io.grpc.Status.asRuntimeException(Status.java:533)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:478)
at io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:463)
at io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:427)
at io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:460)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:616)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:69)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:802)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:781)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
... 3 more
2022-10-12 10:00:17,993 INFO || Stopping down connector [io.debezium.connector.common.BaseSourceTask]
2022-10-12 10:00:17,994 INFO || [Producer clientId=connector-producer-vitess-connector-0] Closing the Kafka producer with timeoutMillis = 30000 ms. [org.apache.kafka.clients.producer.KafkaProducer]
2022-10-12 10:00:17,995 INFO || Metrics scheduler closed [org.apache.kafka.common.metrics.Metrics]
2022-10-12 10:00:17,995 INFO || Closing reporter org.apache.kafka.common.metrics.JmxReporter [org.apache.kafka.common.metrics.Metrics]
2022-10-12 10:00:17,995 INFO || Metrics reporters closed [org.apache.kafka.common.metrics.Metrics]
2022-10-12 10:00:17,995 INFO || App info kafka.producer for connector-producer-vitess-connector-0 unregistered [org.apache.kafka.common.utils.AppInfoParser]

How to reproduce the issue using our tutorial deployment?

Set up two VtTablet servers behind VtGate, and make sure one of the VtTablets has truncated binlogs (e.g. it was newly bootstrapped from a backup file). Have the Debezium connector connect to VtGate using a relatively old GTID position. About 50% of the time, the request will hit the VtTablet that does not have the GTID history, and the VStream will fail with the error messages above. The connector should retry in this case so that the request can be routed to another VtTablet that has the GTID history.
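
To illustrate the kind of check being requested, here is a minimal sketch of classifying this failure as retriable. It is not the code from the linked pull request. The class name VStreamRetryClassifier, the method isRetriable and the fragment list are hypothetical, and matching on the message text "unexpected server EOF" is an assumption based only on the log above. It walks the cause chain because the gRPC error arrives wrapped in a ConnectException.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; not the connector's actual implementation. */
final class VStreamRetryClassifier {

    // Assumption: message fragments that indicate a transient VStream failure.
    // The only entry is taken from the log in this report.
    private static final List<String> RETRIABLE_FRAGMENTS = List.of("unexpected server eof");

    // Guard against pathological or cyclic cause chains.
    private static final int MAX_CAUSE_DEPTH = 16;

    private VStreamRetryClassifier() {
    }

    /** Returns true if the throwable or any of its causes carries a retriable message. */
    static boolean isRetriable(Throwable throwable) {
        int depth = 0;
        for (Throwable t = throwable; t != null && depth < MAX_CAUSE_DEPTH; t = t.getCause(), depth++) {
            String message = t.getMessage();
            if (message == null) {
                continue;
            }
            String normalized = message.toLowerCase(Locale.ROOT);
            for (String fragment : RETRIABLE_FRAGMENTS) {
                if (normalized.contains(fragment)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

In a connector, a check like this would decide whether the task restarts the stream (so VtGate can route the request to another VtTablet) or stops. Matching on message text is brittle, and a real fix may prefer a more structured signal if one is available.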

