Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 2.7.0.Beta1
Affects Version/s: 2.5.0.Final
Component/s: debezium-server
Labels: None
Blocked: False
Blocked Reason: None
Ready: False
Git Pull Request: https://github.com/debezium/debezium-server/pull/95/files
Severity: Important

Debezium Server unable to shut down on PubSub error

Bug report

As described in the issue:
https://issues.redhat.com/browse/DBZ-6461

When the load on PubSub becomes high, the connection is likely to fail. Debezium Server appears unable to recover from such failures and halts. A fix for that issue was implemented in Debezium Server PR #23.

Fix Implemented:

The fix introduced a time-bound publish to PubSub, which helps initiate a shutdown by throwing a timeout exception.
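The idea of a time-bound publish can be sketched as follows. This is a minimal illustration, not the code from the PR: it uses the JDK's CompletableFuture as a stand-in for the futures returned by the PubSub client, and the class name TimeBoundPublish and the method awaitAll are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeBoundPublish {

    // Waits for all pending deliveries, but never longer than the given timeout.
    // Throws TimeoutException if any delivery is still pending when time runs out.
    static void awaitAll(List<CompletableFuture<String>> deliveries, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        CompletableFuture.allOf(deliveries.toArray(new CompletableFuture[0])).get(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for publish results: one acknowledged, one that never completes.
        CompletableFuture<String> acked = CompletableFuture.completedFuture("message-id-1");
        CompletableFuture<String> stuck = new CompletableFuture<>();
        try {
            awaitAll(List.of(acked, stuck), 200, TimeUnit.MILLISECONDS);
            System.out.println("all deliveries acknowledged");
        } catch (TimeoutException e) {
            // In the scenario described here, this is the point where shutdown is initiated.
            System.out.println("publish timed out, initiating shutdown");
        }
    }
}
```

The timeout turns an indefinitely pending batch into an exception that the caller can act on, which matches the TimeoutException seen in the logs of this report.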

Current Problem:

Although the timeout exception initiates the shutdown, the shutdown itself then hangs: the PubSub publisher waits indefinitely in com.google.cloud.pubsub.v1.Publisher.shutdown while shutting down the managed channel.

Suggested Improvement: To address this issue, we need to ensure that the shutdown process of the managed channel does not hang indefinitely. Implementing a timeout for the shutdown operation itself may help in resolving this problem.
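One possible shape for a bounded shutdown is sketched below. It is an assumption-laden illustration rather than the change made in the linked pull request: the class BoundedShutdown and the method shutdownWithTimeout are hypothetical, and a Runnable that never returns stands in for the blocking publisher shutdown call. The sketch assumes the blocking call may not react to interruption, so it runs on a daemon thread that cannot keep the JVM alive.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedShutdown {

    // Runs a potentially blocking shutdown action on a daemon thread and waits
    // at most the given timeout. Returns true only if the action finished in time.
    static boolean shutdownWithTimeout(Runnable shutdownAction, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-shutdown");
            t.setDaemon(true); // a stuck shutdown must not keep the JVM alive
            return t;
        });
        Future<?> result = executor.submit(shutdownAction);
        try {
            result.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // best effort; the action may ignore interruption
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } catch (ExecutionException e) {
            return false; // the shutdown action itself failed
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a shutdown call that never returns and ignores interruption.
        Runnable hangingShutdown = () -> {
            Object lock = new Object();
            synchronized (lock) {
                while (true) {
                    try {
                        lock.wait();
                    } catch (InterruptedException ignored) {
                        // keep waiting, mimicking an uninterruptible wait
                    }
                }
            }
        };
        boolean clean = shutdownWithTimeout(hangingShutdown, 200, TimeUnit.MILLISECONDS);
        System.out.println(clean ? "publisher shut down cleanly" : "shutdown timed out, continuing");
    }
}
```

The trade-off in this design is that a timed-out shutdown leaves the stuck thread behind; that is acceptable only when the process is about to exit anyway, as it is when the engine is stopping after an error.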

What Debezium connector do you use and what version?

2.5

What is the connector configuration?

debezium.sink.type=pubsub

What is the captured database version and mode of deployment?

MySQL and SQL Server (not relevant)

What behaviour do you expect?

Debezium Server should handle high-load situations more robustly and remain stable when interacting with PubSub.

What behaviour do you see?

Debezium Server halts after the timeout error.

Do you see the same behaviour using the latest released Debezium version?

Yes

Do you have the connector logs, ideally from start till finish?

"Stopping the task and engine"

"Stopping down connector"

"Coordinator didn't stop in the expected time, shutting down executor now"

"Producer failure"

"Finished streaming"

"Connected metrics set to 'false'"

"SignalProcessor stopped"

"Debezium ServiceRegistry stopped."

"Connection gracefully closed"

"Stopped RedisOffsetBackingStore"

"Connector completed: success = 'false', message = 'Stopping connector after error in the application's handler method: java.util.concurrent.TimeoutException: Waited 30000 milliseconds (plus 86180 nanoseconds delay) for ListFuture@360b4ac4[status=PENDING, info=[futures=[com.google.api.core.AbstractApiFuture$InternalSettableFuture@f1e1318[status=SUCCESS, result=[java.lang.String@31172d89]],.......<>

"Received request to stop the engine"

"Stopping the embedded engine"

stackTrace:
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@11.0.16/Native Method)
    - waiting on <no object reference available>
    at java.lang.Object.wait(java.base@11.0.16/Object.java:328)
    at com.google.cloud.pubsub.v1.Waiter.waitComplete(Waiter.java:44)
    - waiting to re-lock in wait() <0x00000000d26dc200> (a com.google.cloud.pubsub.v1.Waiter)
    at com.google.cloud.pubsub.v1.Publisher.shutdown(Publisher.java:617)
    at io.debezium.server.pubsub.PubSubChangeConsumer.lambda$close$1(PubSubChangeConsumer.java:210)

How to reproduce the issue using our tutorial deployment?

Same as here: https://issues.redhat.com/browse/DBZ-6461

Feature request or enhancement

We should ideally address this issue in the Pub/Sub publisher. Once the managed channel fails, it cannot automatically recover. For Debezium, it is crucial to be resilient to such issues.

Which use case/requirement will be addressed by the proposed feature?

Debezium will be more resilient.

Implementation ideas (optional)

https://github.com/debezium/debezium-server/pull/95/files

links to

RHEA-2024:139598 Red Hat build of Debezium 2.5.4 release

Assignee: Unassigned

Reporter: Ankur Gupta (Inactive)

Votes: 0

Watchers: 3

Created: 2024/05/28 12:58 PM

Updated: 2024/10/09 12:22 PM

Resolved: 2024/06/03 1:56 PM
