Debezium: DBZ-4389

VStream gRPC connection closed after being idle for a few minutes


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Version: 1.8.0.CR1
    • Component: vitess-connector
    Steps to Reproduce
      1. Start Kafka Connect with the Vitess connector plugin.
      2. Optionally, insert a row or two into Vitess using the mysql CLI, or make no changes at all.
      3. Wait a few minutes: typically 5+ minutes on a laptop and around 10 minutes in production.
      4. An error occurs in the Kafka Connect worker, and the connector stops working with its task in the FAILED state. The error can be found in the Kafka Connect worker log or by querying the REST API for the task status, for example: curl -s localhost:8083/connectors/sample_single_shard_connector/status
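Waiting for step 4 by hand is tedious, so the status check can be automated. The sketch below is a minimal, hypothetical helper, assuming Java 11+ (java.net.http), the REST endpoint at localhost:8083, and the connector name sample_single_shard_connector from the curl command. The class name ConnectorStatusCheck and its methods are illustrative only, and the regex match on "state":"FAILED" is a deliberate shortcut instead of a real JSON parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Pattern;

// Hypothetical helper for illustration: polls the Kafka Connect REST API until the
// connector or one of its tasks reports the FAILED state.
public final class ConnectorStatusCheck {

    // Naive match on an unescaped "state":"FAILED" pair; a real tool should parse the JSON.
    private static final Pattern FAILED_STATE = Pattern.compile("\"state\"\\s*:\\s*\"FAILED\"");

    private ConnectorStatusCheck() {
    }

    // True if the status JSON reports FAILED for the connector or any task.
    static boolean reportsFailure(String statusJson) {
        return FAILED_STATE.matcher(statusJson).find();
    }

    // Same request as: curl -s <baseUrl>/connectors/<connector>/status
    static String fetchStatus(HttpClient client, String baseUrl, String connector)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create(baseUrl + "/connectors/" + connector + "/status")).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String connector = args.length > 0 ? args[0] : "sample_single_shard_connector";
        HttpClient client = HttpClient.newHttpClient();
        // Poll once a minute; the failure was reported after roughly 5 to 10 idle minutes.
        for (int minute = 0; minute < 15; minute++) {
            String status = fetchStatus(client, "http://localhost:8083", connector);
            if (reportsFailure(status)) {
                System.out.println("FAILED after ~" + minute + " min: " + status);
                return;
            }
            Thread.sleep(Duration.ofMinutes(1).toMillis());
        }
        System.out.println("No failure observed within 15 minutes");
    }
}
```

The task's "trace" field in the same JSON carries the UNAVAILABLE stack trace once the failure occurs.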

    Description

      When the Vitess connector is started and no database changes occur that would trigger change events, the VStream gRPC connection is closed within a few minutes, and the Vitess connector crashes with its task in the FAILED state.

      I can consistently reproduce this in both my local environment and our production environment. I have observed two different error messages, but both carry the UNAVAILABLE gRPC status code.

      On a local laptop, where Vitess runs in a Docker container:

      Caused by: java.lang.RuntimeException: io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
          at io.debezium.connector.vitess.VitessStreamingChangeEventSource.execute(VitessStreamingChangeEventSource.java:74)
          ... 8 more
      Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason

      In the production environment, where an Envoy proxy is used:

      Caused by: java.lang.RuntimeException: io.grpc.StatusRuntimeException: UNAVAILABLE: HTTP/2 error code: NO_ERROR
      Received Rst Stream
          at io.debezium.connector.vitess.VitessStreamingChangeEventSource.execute(VitessStreamingChangeEventSource.java:74)
          ... 9 more
      Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: HTTP/2 error code: NO_ERROR
      Received Rst Stream
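Both traces wrap an io.grpc.StatusRuntimeException inside a java.lang.RuntimeException, so inspecting only the outer exception would miss the gRPC status. A minimal sketch, assuming grpc-java is on the classpath: Status.fromThrowable searches the cause chain for a gRPC status and falls back to UNKNOWN when none is found. The class GrpcErrors and its methods are hypothetical and not part of the connector.

```java
import io.grpc.Status;
import io.grpc.StatusRuntimeException;

// Hypothetical helper for illustration; not part of the Vitess connector.
public final class GrpcErrors {

    private GrpcErrors() {
    }

    // Status.fromThrowable walks the cause chain, so a RuntimeException wrapper
    // does not hide the underlying gRPC status; returns UNKNOWN if none is found.
    static Status.Code statusCodeOf(Throwable error) {
        return Status.fromThrowable(error).getCode();
    }

    static boolean isUnavailable(Throwable error) {
        return statusCodeOf(error) == Status.Code.UNAVAILABLE;
    }

    public static void main(String[] args) {
        // Mirrors the two observed failures: same status code, different descriptions.
        Throwable local = new RuntimeException(new StatusRuntimeException(
                Status.UNAVAILABLE.withDescription("Network closed for unknown reason")));
        Throwable envoy = new RuntimeException(new StatusRuntimeException(
                Status.UNAVAILABLE.withDescription("HTTP/2 error code: NO_ERROR\nReceived Rst Stream")));
        System.out.println(isUnavailable(local)); // true
        System.out.println(isUnavailable(envoy)); // true
    }
}
```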

      I'm working on the following proposed fixes:

      • Enable keepalive pings for the VStream gRPC connection. In my tests, a 1-minute interval prevented the crash. The interval can be made configurable; setting it is not required but recommended.
      • Implement a VitessErrorHandler that marks a StatusRuntimeException with the UNAVAILABLE status code as retriable, following the gRPC status code guidance: https://grpc.github.io/grpc/core/md_doc_statuscodes.html
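The keepalive proposal maps onto grpc-java's channel builder. A minimal sketch, assuming grpc-java with a transport (for example grpc-netty-shaded) on the classpath and a plaintext connection to vtgate; the class VStreamChannels, the method newChannel, and the rule that an interval of zero or less disables pings are hypothetical, chosen to reflect "configurable, not required but recommended". This is not the connector's actual channel setup.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; the connector's real channel setup may differ.
public final class VStreamChannels {

    private VStreamChannels() {
    }

    // Builds a plaintext channel. When keepaliveInterval > 0, the client sends an HTTP/2
    // ping after that much time without read activity, so an idle VStream still produces
    // traffic that proxies and idle timers can see. A value <= 0 leaves keepalive disabled.
    static ManagedChannel newChannel(String host, int port, long keepaliveInterval, TimeUnit unit) {
        ManagedChannelBuilder<?> builder = ManagedChannelBuilder.forAddress(host, port).usePlaintext();
        if (keepaliveInterval > 0) {
            builder.keepAliveTime(keepaliveInterval, unit);
        }
        return builder.build();
    }
}
```

For example, newChannel(host, port, 1, TimeUnit.MINUTES) matches the 1-minute interval tested above. Because VStream keeps a server-streaming RPC open, keepAliveWithoutCalls(true) should not be needed for pings to flow while the stream is idle. gRPC servers also enforce their own keepalive policy and may close connections whose clients ping more often than that policy allows, so the interval has to be compatible with vtgate's settings.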


            People

              Assignee: Unassigned
              Reporter: Shichao An (shichaoan, Inactive)
              Votes: 0
              Watchers: 6
