Type: Bug
Resolution: Done
Priority: Major
When the Vitess connector is started and no database changes occur that would trigger change events, the VStream gRPC connection is closed after a few minutes and the Vitess connector crashes, with the task moving to the FAILED state.
I can reproduce this consistently in both my local environment and our production environment. I have observed two different error messages, but both carry the gRPC status code UNAVAILABLE.
On a local laptop, where Vitess runs in a Docker container:
Caused by: java.lang.RuntimeException: io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
    at io.debezium.connector.vitess.VitessStreamingChangeEventSource.execute(VitessStreamingChangeEventSource.java:74)
    ... 8 more
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
In the production environment, where an Envoy proxy is used:
Caused by: java.lang.RuntimeException: io.grpc.StatusRuntimeException: UNAVAILABLE: HTTP/2 error code: NO_ERROR
Received Rst Stream
    at io.debezium.connector.vitess.VitessStreamingChangeEventSource.execute(VitessStreamingChangeEventSource.java:74)
    ... 9 more
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: HTTP/2 error code: NO_ERROR
Received Rst Stream
I'm working on the following proposed fixes:
- Enable keepalive pings on the VStream gRPC connection. In my tests, a 1-minute interval prevented the crash. The interval can be made configurable; setting it is recommended but not required.
- Implement a VitessErrorHandler that marks a StatusRuntimeException with the UNAVAILABLE status code as retriable, as recommended by the gRPC status code documentation: https://grpc.github.io/grpc/core/md_doc_statuscodes.html
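To illustrate the second fix, here is a minimal sketch of the retriable check. It is not the connector's actual code: `RetriableCheckSketch`, `Code`, and `StatusException` are hypothetical stand-ins (for the error handler, `io.grpc.Status.Code`, and `io.grpc.StatusRuntimeException` respectively) so that the example compiles without grpc-java on the classpath. The point it shows is that the check has to walk the cause chain, because in both stack traces above the gRPC exception arrives wrapped in a `java.lang.RuntimeException`.

```java
// Sketch only. Code and StatusException are hypothetical stand-ins for
// grpc-java's io.grpc.Status.Code and io.grpc.StatusRuntimeException.
final class RetriableCheckSketch {

    enum Code { OK, UNAVAILABLE, INTERNAL }

    static final class StatusException extends RuntimeException {
        private final Code code;

        StatusException(Code code, String description) {
            super(code + ": " + description);
            this.code = code;
        }

        Code getCode() {
            return code;
        }
    }

    // Returns true if any exception in the cause chain carries UNAVAILABLE.
    // The depth limit guards against a cyclic cause chain.
    static boolean isRetriable(Throwable throwable) {
        int depth = 0;
        for (Throwable t = throwable; t != null && depth < 20; t = t.getCause(), depth++) {
            if (t instanceof StatusException
                    && ((StatusException) t).getCode() == Code.UNAVAILABLE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same shape as the reported failure: RuntimeException wrapping the gRPC error.
        Throwable wrapped = new RuntimeException(
                new StatusException(Code.UNAVAILABLE, "Network closed for unknown reason"));
        System.out.println(isRetriable(wrapped)); // prints true

        Throwable other = new RuntimeException(
                new StatusException(Code.INTERNAL, "some other failure"));
        System.out.println(isRetriable(other)); // prints false
    }
}
```

With the real grpc-java types, the same decision could presumably be made by comparing `Status.fromThrowable(t).getCode()` against `Status.Code.UNAVAILABLE`; how the result is wired into the connector's error handler is left open here.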
- is related to: DBZ-4391 Unstable test for online DDL changes (Closed)