Debezium / DBZ-3452

source.timestamp.mode=commit imposes a significant performance penalty



      Currently, with source.timestamp.mode=commit (the default mode), the SQL Server connector fetches the timestamp of each individual transaction via a separate stored function call. This N+1 query issue was originally mentioned in DBZ-1065 but hasn't been addressed.
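      To make the N+1 pattern concrete, here is a minimal sketch that counts round trips. It uses an in-memory SQLite database as a stand-in for SQL Server; the tables `changes` and `lsn_time` and both function names are hypothetical and do not reflect the connector's actual code.

```python
import sqlite3


def make_db(n_transactions):
    """Build a toy database: one change row and one timestamp row per transaction."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE changes (start_lsn INTEGER, payload TEXT)")
    db.execute(
        "CREATE TABLE lsn_time (start_lsn INTEGER PRIMARY KEY, tran_end_time TEXT)"
    )
    for lsn in range(1, n_transactions + 1):
        # Made-up payloads and timestamps, for illustration only.
        db.execute("INSERT INTO changes VALUES (?, ?)", (lsn, f"update-{lsn}"))
        db.execute("INSERT INTO lsn_time VALUES (?, ?)", (lsn, f"t{lsn}"))
    return db


def read_with_per_transaction_lookup(db):
    """Return (events, query_count), issuing one extra query per transaction."""
    queries = 1  # the query that reads the change rows
    rows = db.execute(
        "SELECT start_lsn, payload FROM changes ORDER BY start_lsn"
    ).fetchall()
    timestamps = {}
    events = []
    for lsn, payload in rows:
        if lsn not in timestamps:
            # Separate lookup for each new transaction: the "+N" part.
            queries += 1
            timestamps[lsn] = db.execute(
                "SELECT tran_end_time FROM lsn_time WHERE start_lsn = ?", (lsn,)
            ).fetchone()[0]
        events.append((lsn, payload, timestamps[lsn]))
    return events, queries


events, queries = read_with_per_transaction_lookup(make_db(100))
print(len(events), "events read with", queries, "queries")
```

      With one update per transaction, as in the benchmark in this issue, the number of queries grows linearly with the number of events, so per-query latency dominates throughput.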

      According to the profiling we've done, the connector spends a significant amount of time fetching transaction timestamps (see attached screenshot).

      In our test environments, the connector with the default settings could process only 10-15 updates per second, while switching to source.timestamp.mode=processing increased that number to ~5500 updates per second (roughly 370 to 550 times higher). The updates were specially crafted to have one update per transaction to exacerbate the issue.

      The likely solution is to retrieve transaction timestamps as part of the CDC data by joining them on the server side, rather than requesting them separately.
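      A minimal sketch of the server-side join, again using in-memory SQLite as a stand-in. The tables `changes` and `lsn_time` and the function `read_with_join` are hypothetical. On SQL Server, the analogous mapping lives in the `cdc.lsn_time_mapping` table (columns `start_lsn` and `tran_end_time`); whether the connector should join against it directly is an assumption here, not a confirmed design.

```python
import sqlite3


def read_with_join(db):
    """Return (events, query_count), fetching timestamps in the same query."""
    rows = db.execute(
        "SELECT c.start_lsn, c.payload, t.tran_end_time "
        "FROM changes c "
        "JOIN lsn_time t ON t.start_lsn = c.start_lsn "
        "ORDER BY c.start_lsn"
    ).fetchall()
    return rows, 1  # a single round trip, regardless of transaction count


# Toy data: three transactions with one update each (made-up values).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE changes (start_lsn INTEGER, payload TEXT)")
db.execute("CREATE TABLE lsn_time (start_lsn INTEGER PRIMARY KEY, tran_end_time TEXT)")
for lsn in (1, 2, 3):
    db.execute("INSERT INTO changes VALUES (?, ?)", (lsn, f"update-{lsn}"))
    db.execute("INSERT INTO lsn_time VALUES (?, ?)", (lsn, f"t{lsn}"))

events, queries = read_with_join(db)
print(events, queries)
```

      Each change row arrives with its commit timestamp attached, so the commit-time semantics of source.timestamp.mode=commit could be kept without a per-transaction lookup.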

      For reference, the performance impact is mentioned in the documentation:

      [...] if you want to avoid the additional cost of Debezium querying the database to extract the LSN timestamps.

            Assignee: Unassigned
            Reporter: Sergei Morozov (sergeimorozov)