• Icon: Enhancement Enhancement
    • Resolution: Done
    • Icon: Major Major
    • 1.2.0.CR2
    • None
    • sqlserver-connector
    • None

      I use Debezium MSSQL to get data from ProductLogs and then push to kafka
      I found that the Metric MillisecondsBehindSource is high. The delay increase when the running time increase (60s --> 300s)
      The delay showed in attached file.
      I changed the BatchSize to 4096, PullInterval to 200ms but it's effective.
      Speed of data in ProductLogs arround 10/seconds.
      The size of data arround 400KB-1024KB

      ---------
      Below is my configuration
      connector.class=io.debezium.connector.sqlserver.SqlServerConnector
      database.dbname=xxx_PRODUCT
      database.user=sapo_dev_kafka_connect
      transforms.unwrap.delete.handling.mode=rewrite
      transforms.insertKey.fields=TenantId
      transforms.extractKey.field=TenantId
      tasks.max=1
      transforms.insertKey.type=org.apache.kafka.connect.transforms.ValueToKey
      database.history.kafka.bootstrap.servers=xxx
      database.history.kafka.topic=sqlserver.dbhistory.xxx
      transforms=unwrap,insertKey,extractKey
      database.server.name=xxx
      transforms.extractKey.type=org.apache.kafka.connect.transforms.ExtractField$Key
      database.port=1433
      table.whitelist=dbo.ProductLogs,dbo.Categories,dbo.ProductLogs
      database.hostname=xxx
      database.password=xxx
      transforms.unwrap.drop.tombstones=true
      transforms.unwrap.type=io.debezium.transforms.UnwrapFromEnvelope
      transforms.unwrap.operation.header=true
      snapshot.mode=initial_schema_only

            [DBZ-2120] Optimize SQLServer connector query

            Released

            Jiri Pechanec added a comment - Released

            I face the same problem, after investigate, i found it cause by get all changes query:

            SELECT * FROM cdc.[fn_cdc_get_all_changes_#](ISNULL(?,sys.fn_cdc_get_min_lsn('#')), ?, N'all update old')

            This query is very slow in production database, it take about 10 minutes to complete, but when run in SSMS, it very fast.

            I change query to SELECT * FROM cdc.[fn_cdc_get_all_changes_#](?, ?, N'all update old')  and move check null logic to java code, it run very fast.

            I think it related to SQLServer params sniffing http://www.sommarskog.se/query-plan-mysteries.html

             

             

            Linh Nguyen Hoang (Inactive) added a comment - I face the same problem, after investigate, i found it cause by get all changes query: SELECT * FROM cdc. [fn_cdc_get_all_changes_#] (ISNULL(?,sys.fn_cdc_get_min_lsn('#')), ?, N'all update old') This query is very slow in production database, it take about 10 minutes to complete, but when run in SSMS, it very fast. I change query to SELECT * FROM cdc. [fn_cdc_get_all_changes_#] (?, ?, N'all update old')  and move check null logic to java code, it run very fast. I think it related to SQLServer params sniffing  http://www.sommarskog.se/query-plan-mysteries.html    

              Unassigned Unassigned
              cuongnv84 Nguyễn Văn Cường (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated:
                Resolved: