  1. Debezium
  2. DBZ-3224

Improve relocation logic for processed commitLog files


      Cassandra CDC enqueues an EOF (end-of-file) event to ChangeEventQueue after it finishes reading all mutations in a CommitLogFile. Because Cassandra CDC has only one ChangeEventQueue instance, the EOF event is guaranteed to come after all change events of that CommitLogFile. When QueueProcessor polls the EOF event of a CommitLogFile, all change events of that file have been published successfully, and Cassandra CDC moves the file from cdc_raw into the success relocation folder. However, if Cassandra CDC fails to publish the change event of a mutation, it stops, and the CommitLogFile that contains this mutation is never moved out of cdc_raw. Unprocessed files then accumulate in cdc_raw, which can potentially suspend writes into the Cassandra DB.

      To address the potential P0 issue described above, we want to make the following changes in Cassandra CDC:

      1) When Cassandra CDC fails to publish a change event, it should catch the exception and keep processing the remaining change events.
      2) However, 1) introduces a new problem: when QueueProcessor polls the EOF event of a CommitLogFile, some change events of that file may not have been published successfully, yet the file is still moved to the success relocation folder and is never re-processed.
      3) To solve the problem described in 2), we could maintain a set/map of the CommitLogFiles that had a failed change event, in CassandraConnectorContext, QueueProcessor, or CommitLogPostProcessor. When QueueProcessor polls an EOF event, it should first check whether the name of the CommitLogFile is in the set/map. If it is, the file should be moved to the error relocation folder for re-processing; otherwise, it should be moved to the success relocation folder.
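
      The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Debezium code: the class CommitLogRelocationSketch, the Event type, the Destination enum and the process method are all hypothetical names, and the real change would live in one of the classes listed in 3).

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from the Debezium code base.
class CommitLogRelocationSketch {

    /** Where a fully read commit log file should be relocated. */
    public enum Destination { SUCCESS, ERROR }

    /** Hypothetical queue entry: a change event, or an EOF marker when payload is null. */
    public static final class Event {
        final String commitLogFile;
        final String payload;

        public Event(String commitLogFile, String payload) {
            this.commitLogFile = commitLogFile;
            this.payload = payload;
        }

        boolean isEof() {
            return payload == null;
        }
    }

    // Names of commit log files with at least one change event that failed to publish.
    private final Set<String> erroneousFiles = ConcurrentHashMap.newKeySet();

    /**
     * Processes one polled event. Returns a destination only for EOF events;
     * for change events it returns Optional.empty().
     */
    public Optional<Destination> process(Event event, Consumer<String> publisher) {
        if (event.isEof()) {
            // Step 3: decide the relocation folder and forget the file name.
            boolean hadFailure = erroneousFiles.remove(event.commitLogFile);
            return Optional.of(hadFailure ? Destination.ERROR : Destination.SUCCESS);
        }
        try {
            publisher.accept(event.payload);
        } catch (RuntimeException e) {
            // Step 1: do not stop; remember the file so it is re-processed later.
            erroneousFiles.add(event.commitLogFile);
        }
        return Optional.empty();
    }
}
```

      Because the EOF event is enqueued after all change events of its file, any failure for that file has already been recorded by the time the EOF event is polled. Removing the name on EOF keeps the set from growing without bound.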

              zhou.bing@husky.neu.edu Bingqin Zhou (Inactive)