-
Bug
-
Resolution: Done
-
Major
-
2.5.0.Final
-
None
Debezium server unable to shutdown on pubsub error
Bug report
As described in the issue:
https://issues.redhat.com/browse/DBZ-6461
When the load on PubSub becomes high, the connection is likely to fail. Debezium Server appears unable to recover from such failures and halts. A fix for this was implemented in Debezium Server PR #23.
Fix Implemented:
- The fix introduced a time-bound publish to PubSub, which helps initiate a shutdown by throwing a timeout exception.
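For illustration, the time-bound wait described above can be sketched as follows. This is a minimal sketch and not the code from the PR: it uses plain `CompletableFuture` objects as stand-ins for the Pub/Sub publish futures, and the class name `TimeBoundPublish` and method `awaitAll` are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: CompletableFuture stands in for the publish futures.
public class TimeBoundPublish {

    // Waits for all pending publish futures, but never longer than timeoutMs.
    // Throws TimeoutException if at least one future is still pending by then.
    public static void awaitAll(List<CompletableFuture<String>> pending, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]));
        all.get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> delivered = CompletableFuture.completedFuture("message-id-1");
        CompletableFuture<String> stuck = new CompletableFuture<>(); // never completes
        try {
            awaitAll(List.of(delivered, stuck), 200);
        } catch (TimeoutException e) {
            // In the scenario reported here, this is where the shutdown is initiated.
            System.out.println("Publish timed out: " + e);
        }
    }
}
```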
Current Problem:
- Although the timeout exception initiates the shutdown, the process halts due to an infinite wait during the shutdown of the managed channel by the PubSub publisher (com.google.cloud.pubsub.v1.Publisher.shutdown).
Suggested Improvement: To address this issue, we need to ensure that the shutdown process of the managed channel does not hang indefinitely. Implementing a timeout for the shutdown operation itself may help in resolving this problem.
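One possible shape for such a bounded shutdown is sketched below. It is an assumption-laden illustration, not a proposed patch: the class `BoundedShutdown` and method `shutdownWithTimeout` are hypothetical, and a plain `Runnable` stands in for the blocking `Publisher.shutdown()` call. The blocking call runs on a daemon thread so that it cannot keep the JVM alive even if it ignores interruption; whether the real publisher reacts to interruption is not confirmed here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: a Runnable stands in for the blocking publisher shutdown.
public class BoundedShutdown {

    // Runs shutdownAction on a daemon thread and waits at most timeoutMs.
    // Returns true only if the action completed normally within the timeout.
    public static boolean shutdownWithTimeout(Runnable shutdownAction, long timeoutMs)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-shutdown");
            t.setDaemon(true); // a stuck shutdown must not keep the JVM alive
            return t;
        });
        Future<?> result = executor.submit(shutdownAction);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // best effort: interrupt the waiting thread
            return false;
        } catch (ExecutionException e) {
            return false; // the shutdown action itself threw
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a shutdown call that never returns on its own.
        Runnable hanging = () -> {
            try {
                new CountDownLatch(1).await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("clean shutdown finished in time: " + shutdownWithTimeout(() -> { }, 1000));
        System.out.println("hanging shutdown finished in time: " + shutdownWithTimeout(hanging, 200));
    }
}
```

With this approach the caller can log the abandoned shutdown and continue stopping the rest of the server instead of waiting indefinitely.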
What Debezium connector do you use and what version?
2.5
What is the connector configuration?
debezium.sink.type=pubsub
What is the captured database version and mode of deployment?
MySQL and SQL Server (not relevant)
What behaviour do you expect?
Debezium Server should handle high-load situations more robustly and remain stable when interacting with PubSub.
What behaviour do you see?
The server halts after the timeout error.
Do you see the same behaviour using the latest released Debezium version?
Yes
Do you have the connector logs, ideally from start till finish?
"Stopping the task and engine"
"Stopping down connector"
"Coordinator didn't stop in the expected time, shutting down executor now"
"Producer failure"
"Finished streaming"
"Connected metrics set to 'false'"
"SignalProcessor stopped"
"Debezium ServiceRegistry stopped."
"Connection gracefully closed"
"Connection gracefully closed"
"Stopped RedisOffsetBackingStore"
"Connector completed: success = 'false', message = 'Stopping connector after error in the application's handler method: java.util.concurrent.TimeoutException: Waited 30000 milliseconds (plus 86180 nanoseconds delay) for ListFuture@360b4ac4[status=PENDING, info=[futures=[com.google.api.core.AbstractApiFuture$InternalSettableFuture@f1e1318[status=SUCCESS, result=[java.lang.String@31172d89]],.......<>
"Received request to stop the engine"
"Stopping the embedded engine"
stackTrace:
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.16/Native Method)
- waiting on <no object reference available>
at java.lang.Object.wait(java.base@11.0.16/Object.java:328)
at com.google.cloud.pubsub.v1.Waiter.waitComplete(Waiter.java:44)
- waiting to re-lock in wait() <0x00000000d26dc200> (a com.google.cloud.pubsub.v1.Waiter)
at com.google.cloud.pubsub.v1.Publisher.shutdown(Publisher.java:617)
at io.debezium.server.pubsub.PubSubChangeConsumer.lambda$close$1(PubSubChangeConsumer.java:210)
How to reproduce the issue using our tutorial deployment?
Same as here: https://issues.redhat.com/browse/DBZ-6461
Feature request or enhancement
We should ideally address this issue in the Pub/Sub publisher. Once the managed channel fails, it cannot automatically recover. For Debezium, it is crucial to be resilient to such issues.
Which use case/requirement will be addressed by the proposed feature?
Debezium will be more resilient.
Implementation ideas (optional)
Links to: RHEA-2024:139598 Red Hat build of Debezium 2.5.4 release