Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-4843

SQL Server - latency issue for large schemas - optimization needed

    XMLWordPrintable

Details

    • False
    • None
    • False

    Description

      We have relatively large schema consisting of many tables, around 300 are cdc-enabled.

      Regarding these 300 tables:

      • <20% of the tables are responsible for >80% of the change events
      • >80% of these are tables with sporadic updates.

      We observed poor latency because of large number of the tables to query.

      Even in a simplest case - just one change to process - single Debezium iteration took about 5800ms in our environment.

      That's because SQL Server Connector executes 300 queries to probe all 300 tables, even if 299 tables do not contain any new change since last time.

      An improvement would be to execute dynamic query to determine tables containing actual changes since last time.

      In case of SQL Server dynamic query can be implemented using stored procedure.

      We implemented such optimization and measured timings after the change

      • now in case of 1 change to process the single iteration takes about 100ms (comparing to ~5800ms previously)
      • in typical case of changes from 20 tables the single iteration takes about 600ms (comparing to ~6500ms previously)

      Note:

      • with this optimization every iteration needs to determine tables to query for given range of LSNs
      • with help of stored procedure this determination takes about 70ms, which is extra cost comparing to non-optimized version, but then significantly reduces number of queries on change tables
      • empty iterations are not affected

      The average end-to-end (commit-to-publish) time in realistic conditions in our environment was reduced from ~10 seconds to 1 second, which is very good in our opinion.

      The change in code is simple and non-intrusive, see below; moreover - it can be implemented as truly optional optimization with easy fallback to default behavior in case of problems.

      The only real drawback is necessity to create stored procedure on SQL Server - but it is required only for those who want to activate the optimization.

      Given the simplicity of this solution and benefits we recommend implementing such optimization in Debezium standard SQL Server connector.

      Details of the solution as implemented on 1.8.0 code basis

       

      • modification in SqlServerStreamingChangeEventSource.executeIteration()

       

      • modification in SqlServerStreamingChangeEventSource.getChangeTablesToQuery()

       

      • new method in SqlServerStreamingChangeEventSource

       

      • new stored procedure in SQL Server

      Attachments

        Activity

          People

            Unassigned Unassigned
            kgrzechnik Krzysztof Grzechnik
            Votes:
            2 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: