DBZ-9710

Debezium Cassandra Connector Silently Stops Processing After CommitLog Move Failure

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 3.4.0.CR1
    • 3.3.0.Final
    • cassandra-connector

      Bug report

      What Debezium connector do you use and what version?

      Debezium Cassandra connector 3.3.0.Final. The same behavior occurs with 2.7.0.Final.

      What is the connector configuration?

       

      connector.name=debezium-test
      commit.log.relocation.dir=/debezium/relocation/
      http.port=8000
      
      
      cassandra.config=/etc/cassandra/cassandra.yaml
      cassandra.driver.config.file=/debezium/config/application.conf
      cassandra.hosts=127.0.0.1
      cassandra.port=9042
      
      
      kafka.producer.bootstrap.servers=<kafka-bootstrap-server>
      kafka.producer.retries=3
      kafka.producer.retry.backoff.ms=1000
      kafka.producer.buffer.memory=134217728
      kafka.producer.batch.size=131072
      kafka.producer.linger.ms=5
      kafka.producer.compression.type=lz4
      topic.prefix=test_prefix
      
      
      key.converter=org.apache.kafka.connect.json.JsonConverter
      value.converter=org.apache.kafka.connect.json.JsonConverter
      
      
      offset.flush.interval.ms=1000
      offset.backing.store.dir=/debezium/offsets
      
      
      snapshot.consistency=ONE
      snapshot.mode=NEVER
      
      
      event.order.guarantee.mode=PARTITION_VALUES
      commit.log.real.time.processing.enabled=true
      commit.log.marked.complete.poll.interval.ms=1000
      cdc.dir.poll.interval.ms=1000
      num.of.change.event.queues=1 

       

      What is the captured database version and mode of deployment?

      Deployed on k8s with a cloud provider; observed with both Cassandra 4 and Cassandra 5.

      What behavior do you expect?

      If Debezium cannot move/archive a CommitLog file (e.g., because of an AccessDeniedException), the connector should log the error and retry, since Cassandra or the k8s CSI (Container Storage Interface) may still hold a temporary lock on the file.

      The QueueProcessor loop should not exit permanently on a single file-move failure; adding retries might be enough to solve this. At the very least, the connector should surface the error so that restart/backoff logic can take over, rather than stalling silently.
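
      The retry suggested above could look roughly like the following. This is a minimal sketch, not Debezium's actual code: the class RetryingMove, the method moveWithRetry, and the parameters maxAttempts and backoffMs are hypothetical names chosen for illustration. Only java.nio.file.Files.move is a real API here. The sketch retries on any IOException (AccessDeniedException is a subclass) and rethrows the last failure so that the caller can surface it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not part of Debezium): retries a file move with linear backoff. */
final class RetryingMove {

    private RetryingMove() {
    }

    /**
     * Tries to move source to target up to maxAttempts times, sleeping
     * backoffMs * attempt between attempts. If every attempt fails, the
     * last IOException is rethrown instead of being swallowed.
     */
    static void moveWithRetry(Path source, Path target, int maxAttempts, long backoffMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
                return; // moved successfully
            } catch (IOException e) {
                // Covers AccessDeniedException, e.g. when a temporary lock is still held.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt);
                }
            }
        }
        throw last;
    }
}
```

      A call such as RetryingMove.moveWithRetry(commitLog, archiveDir.resolve(commitLog.getFileName()), 5, 1000L) would wait 1 s, 2 s, 3 s and 4 s between attempts before giving up; the attempt count and delay are assumptions and would need tuning for the storage layer in use.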

      What behavior do you see?

      When moving a commit log file fails with an AccessDeniedException, Debezium throws a RuntimeException and:

      • The QueueProcessor thread exits its processing loop.
      • The Debezium service continues running, but silently stops processing all CDC events.
      • No restart occurs because the exception is caught inside the queue thread.

      Result: the data pipeline stalls silently. Cassandra’s cdc_raw directory fills up, which eventually triggers a CDCWriteException and blocks all writes to CDC-enabled tables.
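
      One way to avoid a silent stall is to let whatever supervises the worker thread observe how that thread ended. The sketch below illustrates the general technique with java.util.concurrent. It is not how Debezium's QueueProcessor is wired: WorkerSupervisor and awaitFailure are hypothetical names, and the worker is modelled as a plain Callable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical supervisor (not part of Debezium): reports why a worker loop ended. */
final class WorkerSupervisor {

    private WorkerSupervisor() {
    }

    /**
     * Runs the worker on its own thread and blocks until it ends.
     * Returns the exception that terminated the worker, or null if the
     * worker completed normally.
     */
    static Throwable awaitFailure(Callable<Void> worker) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Void> result = executor.submit(worker);
            result.get(); // rethrows the worker's exception wrapped in ExecutionException
            return null;
        } catch (ExecutionException e) {
            return e.getCause(); // the original exception thrown inside the worker
        } finally {
            executor.shutdownNow();
        }
    }
}
```

      With something like this in place, the caller could log the returned exception and exit with a non-zero status, so that k8s restarts the pod instead of leaving a process that looks healthy but processes nothing.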

      Restarting the Debezium process resolves the issue: after the restart, Debezium moves the same file without any AccessDeniedException. The failure occurs randomly, even after Debezium has already moved other files from the same directory.

      Do you see the same behaviour using the latest released Debezium version?

      Yes, on the latest version, 3.3.0.Final.

      Do you have the connector logs, ideally from start till finish?

      After the ERROR entry, Debezium goes silent and stops emitting log events. The entries below are listed newest first.

       

      2025-11-20 22:56:51,210 ERROR  ||  Processing of event EOFEvent{file=/var/lib/cassandra/cdc_raw/CommitLog-8-1763650771500.log} was errorneous: {}   [io.debezium.connector.cassandra.QueueProcessor]
      java.lang.RuntimeException: java.nio.file.AccessDeniedException: /var/lib/cassandra/cdc_raw/CommitLog-8-1763650771500.log
      
      2025-11-20 22:56:51,210 WARN   ||  Failed to move CommitLog file CommitLog-8-1763650771500.log to /debezium/relocation/archive. Error:   [io.debezium.connector.cassandra.CommitLogUtil]
      java.nio.file.AccessDeniedException: /var/lib/cassandra/cdc_raw/CommitLog-8-1763650771500.log 
      
      2025-11-20 22:56:51,164 INFO   ||  Encountered EOF event for CommitLog-8-1763650771500.log ...   [io.debezium.connector.cassandra.QueueProcessor]
      
      2025-11-20 22:56:50,923 INFO   ||  Finished reading /var/lib/cassandra/cdc_raw/CommitLog-8-1763650771500.log   [org.apache.cassandra.db.commitlog.CommitLogReader]

       

       

      How to reproduce the issue using our tutorial deployment?

      • Deploy Debezium Cassandra connector with default QueueProcessor and CDC archiving logic.
      • Create a workload producing CDC events.
      • Induce a file-system permission error on a commit log file (e.g., by giving the Debezium process only read permission on the cdc_raw folder).
      • Observe:
        • Debezium throws RuntimeException from moveCommitLog().
        • QueueProcessor thread exits.
        • Connector becomes silent and stops processing CDC events permanently.
        • No automatic restart; service appears healthy but is not processing data.
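
      Because the service still appears healthy in the last step, the stall has to be observed indirectly while reproducing. A simple option is to watch the number of commit logs waiting in cdc_raw, which keeps growing once processing stops. The following is a rough sketch under assumptions: CdcBacklog, countCommitLogs and exceedsBacklog are hypothetical names, the file-name pattern is taken from the log excerpt above, and the threshold is arbitrary.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical check (not part of Debezium): counts commit logs waiting in cdc_raw. */
final class CdcBacklog {

    private CdcBacklog() {
    }

    /** Counts files named like CommitLog-8-1763650771500.log directly inside cdcRawDir. */
    static int countCommitLogs(Path cdcRawDir) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(cdcRawDir, "CommitLog-*.log")) {
            for (Path ignored : logs) {
                count++;
            }
        }
        return count;
    }

    /** Returns true when more than maxPendingLogs commit logs are waiting. */
    static boolean exceedsBacklog(Path cdcRawDir, int maxPendingLogs) throws IOException {
        return countCommitLogs(cdcRawDir) > maxPendingLogs;
    }
}
```

      Polling CdcBacklog.exceedsBacklog(Paths.get("/var/lib/cassandra/cdc_raw"), 50) from a liveness check is one possible use; a backlog that only grows suggests the connector has stopped, but a burst of writes can also raise it temporarily.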

              Assignee: Unassigned
              Reporter: Joan Gomez (jogomez97)