Debezium / DBZ-8485

Debezium MSK Connector Running but Suddenly Stalling Data Capture


      In order to make your issue reports as actionable as possible, please provide the following information, depending on the issue type.

      Bug report

      For bug reports, provide this information, please:

      What Debezium connector do you use and what version?

      We are using the Debezium MySQL connector (io.debezium.connector.mysql.MySqlConnector), version 2.7.0, running on AWS MSK.

      What is the connector configuration?

      ```
      connector.class=io.debezium.connector.mysql.MySqlConnector
      database.user=<db_user>
      database.password=<db_password>
      database.hostname=<db_hostname>
      database.port=<db_port>
      database.server.id=<server_id>
      database.server.name=<server_name>
      database.include.list=<db_include_list>
      tasks.max=1
      snapshot.mode=schema_only
      key.converter=org.apache.kafka.connect.json.JsonConverter
      key.converter.schemas.enable=false
      value.converter=org.apache.kafka.connect.json.JsonConverter
      value.converter.schemas.enable=false
      config.storage.topic=<config_storage_topic>
      offset.storage.topic=<offset_storage_topic>
      offset.storage.replication.factor=2
      topic.creation.default.cleanup.policy=delete
      topic.creation.default.partitions=2
      topic.creation.default.replication.factor=2
      decimal.handling.mode=string
      ```

      What is the captured database version and mode of deployment?

      The database is RDS MySQL version 8.0.35. Debezium runs as the Kafka connector on AWS MSK.

      What behavior do you expect?

      The connector should continuously capture data changes from the source database and write them to the Kafka topic without interruptions.

      What behavior do you see?

      The connector process remains running, but data capture suddenly stalls, and no new data is published to the Kafka topic.

      • The stalls occur unpredictably and without prior warning, even though the connector logs indicate it is actively running.

      • Memory utilization rises gradually over time in a step-like pattern, but these increases do not correlate directly with the sudden stalls.

      • Restarting the connector temporarily resolves the issue, but it recurs after about 30 days.
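
Because the connector keeps reporting a running state during a stall, one way to notice the problem sooner is to watch a throughput metric instead, such as the bytes-in rate of the offset topic. The sketch below is only an illustration of that idea: the function name `first_stall`, the sample format, and the quiet-period threshold are assumptions, and a quiet source database would also produce a zero rate, so a hit is a prompt to investigate rather than proof of a stall.

```python
from typing import Optional, Sequence, Tuple


def first_stall(
    samples: Sequence[Tuple[float, float]],
    min_quiet_seconds: float,
) -> Optional[float]:
    """Find the start of the first sustained zero-throughput period.

    samples: (unix_timestamp, bytes_in_per_sec) pairs sorted by time
             (hypothetical format, e.g. exported from a metrics system).
    Returns the timestamp at which the quiet period began, or None if no
    quiet period lasted at least min_quiet_seconds.
    """
    quiet_since = None  # timestamp of the first zero sample in the current run
    for timestamp, rate in samples:
        if rate > 0:
            quiet_since = None  # traffic seen, reset the run
            continue
        if quiet_since is None:
            quiet_since = timestamp
        if timestamp - quiet_since >= min_quiet_seconds:
            return quiet_since
    return None


# Synthetic samples, one per minute: traffic stops at t=120.
samples = [(0, 100.0), (60, 120.0), (120, 0.0), (180, 0.0), (240, 0.0), (300, 0.0)]
print(first_stall(samples, min_quiet_seconds=120))
```

The threshold should be longer than the longest idle period expected from the source database, otherwise normal quiet hours will be flagged.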

       

      Bytes in per second for the offset topic (Datadog metric)

      Logs in a normal state

      Logs in an abnormal state

      Gradual memory increase metrics

       

      Sudden connector restart (appears to be memory-related, but no OOM message observed)

      Do you see the same behaviour using the latest released Debezium version?

      Yes, the issue persists in the latest stable release. Testing with the latest Alpha/Beta/CR version is pending.

      Do you have the connector logs, ideally from start till finish?

      (You might be asked later to provide DEBUG/TRACE level log)

       

      Yes, the logs are available. Notable observations include:

      • Normal operation logs show Committing offsets → flushing outstanding messages → Finished commitOffsets successfully.

      • During a stall, the logs repeat Committing offsets → flushing 0 outstanding messages and never reach Finished commitOffsets successfully.
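
The two log patterns above can be told apart mechanically. The following is a minimal sketch, assuming the worker log lines contain the phrases quoted in this report ("Committing offsets" and "Finished commitOffsets successfully") and that a single task is logging (`tasks.max=1`). The function name `unfinished_commits` and the sample lines are made up for illustration; real log lines carry timestamps and task identifiers that are not shown here.

```python
from typing import Iterable, List

# Phrases taken from the log descriptions in this report. The exact wording
# in a given Kafka Connect version may differ (assumption).
COMMIT_START = "Committing offsets"
COMMIT_DONE = "Finished commitOffsets successfully"


def unfinished_commits(lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers of commit starts with no completion.

    A commit start counts as unfinished if another commit start, or the end
    of the input, is reached before a completion message appears.
    """
    unfinished = []
    pending = None  # line number of the latest commit start not yet completed
    for number, line in enumerate(lines, start=1):
        if COMMIT_START in line:
            if pending is not None:
                unfinished.append(pending)
            pending = number
        elif COMMIT_DONE in line:
            pending = None
    if pending is not None:
        unfinished.append(pending)
    return unfinished


# Synthetic log excerpt: one healthy cycle followed by two stalled cycles.
sample = [
    "Committing offsets",
    "flushing 3 outstanding messages",
    "Finished commitOffsets successfully",
    "Committing offsets",
    "flushing 0 outstanding messages",
    "Committing offsets",
    "flushing 0 outstanding messages",
]
print(unfinished_commits(sample))  # [4, 6]
```

Note that the last commit start in a live log may simply still be in flight, so the final entry of the result is worth re-checking against later log output.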

      How to reproduce the issue using our tutorial deployment?

      1. Deploy the Debezium MSK Connector with the provided configuration.

      2. Configure it to capture binlogs from a MySQL database with moderate to high traffic.

      3. Observe memory utilization over time in CloudWatch or equivalent monitoring tools.

      4. The memory utilization will display a step-like increase pattern after a few hours of operation.
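
To make the step-like pattern in step 4 easier to confirm than by eye, exported memory samples can be scanned for abrupt jumps. This is a rough sketch under stated assumptions: the function name `find_steps`, the use of percentage values, and the jump threshold are all illustrative and are not part of Debezium or CloudWatch.

```python
from typing import List, Sequence


def find_steps(values: Sequence[float], min_jump: float) -> List[int]:
    """Return indices where a sample exceeds the previous one by min_jump or more.

    values: memory utilization samples in time order (hypothetical export,
            e.g. one percentage value per monitoring interval).
    """
    return [
        index
        for index in range(1, len(values))
        if values[index] - values[index - 1] >= min_jump
    ]


# Synthetic series: flat, then a jump, flat again, then another jump.
memory_percent = [40.0, 40.0, 41.0, 55.0, 55.0, 56.0, 70.0]
print(find_steps(memory_percent, min_jump=10.0))  # [3, 6]
```

Comparing the reported indices with the timestamps of the stalls would show whether the jumps and the stalls are related, which the observations in this report suggest they are not.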


              Assignee: Unassigned
              Reporter: Taemin Kim (flatcoke)