  1. Debezium
  2. DBZ-4015

Performance Bottleneck in TableIdParser String Replacement


    • Enhancement
    • Resolution: Done
    • Minor
    • 1.7.0.CR2
    • 1.7.0.CR1
    • core-library

      After several performance optimizations,

      1. https://issues.redhat.com/browse/DBZ-3770 - https://github.com/debezium/debezium/pull/2545
      2. https://issues.redhat.com/browse/DBZ-3870 - https://github.com/debezium/debezium/pull/2603
      3. https://issues.redhat.com/browse/DBZ-3887 - https://github.com/debezium/debezium/pull/2612

      most of the performance bottlenecks, such as JSON serialization and producer-consumer delays, have been addressed.

      JVisualVM sampling of the initial incremental CDC run now shows 15% - 25% of the time (1.5s - 2.5s out of a total execution time of 8s for 10M records) being spent in TableIdParser, because it uses the regex-based `replaceAll` for plain string replacement.
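To illustrate the kind of change this points to, here is a minimal sketch of a regex-free replacement. The class `QuoteUnescaper` and its method names are hypothetical and are not taken from the Debezium code base or from the eventual fix; the sketch only assumes that the goal is to collapse doubled `'`, `"` and `` ` `` characters the way the three chained `replaceAll` calls do.

```java
public final class QuoteUnescaper {

    private QuoteUnescaper() {
    }

    /** Collapses each doubled ', " or ` into a single character in one pass. */
    public static String unescape(String part) {
        int first = firstDoubledQuote(part);
        if (first < 0) {
            // Fast path: nothing to replace, so return the input without allocating.
            return part;
        }
        StringBuilder sb = new StringBuilder(part.length() - 1);
        sb.append(part, 0, first);
        int i = first;
        while (i < part.length()) {
            char c = part.charAt(i);
            sb.append(c);
            // Skip the second character of a doubled quote; otherwise advance by one.
            if (isQuote(c) && i + 1 < part.length() && part.charAt(i + 1) == c) {
                i += 2;
            } else {
                i++;
            }
        }
        return sb.toString();
    }

    /** Returns the index of the first doubled quote character, or -1 if there is none. */
    private static int firstDoubledQuote(String s) {
        for (int i = 0; i + 1 < s.length(); i++) {
            char c = s.charAt(i);
            if (isQuote(c) && s.charAt(i + 1) == c) {
                return i;
            }
        }
        return -1;
    }

    private static boolean isQuote(char c) {
        return c == '\'' || c == '"' || c == '`';
    }
}
```

In the parsing loop this would be used as `parts.add(QuoteUnescaper.unescape(stream.consume()))`. The design choice is to avoid compiling three patterns and creating three matchers per token, and to return the original string untouched in the common case where no doubled quote is present.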

      JMH Benchmarks

      package io.debezium.relational;
      
      import io.debezium.text.TokenStream;
      import org.openjdk.jmh.annotations.*;
      
      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.TimeUnit;
      
      @Fork(1)
      @State(Scope.Thread)
      @Warmup(iterations = 5, time = 1)
      @Measurement(iterations = 5, time = 1)
      @OutputTimeUnit(TimeUnit.NANOSECONDS)
      @BenchmarkMode({Mode.AverageTime})
      public class TableIdParserPerf {
      
          @Param({
                  "table", "\"\"\"table\"\"\"",
                  "database.schema.table", "\"\"\"database\"\"\".\"\"\"schema\"\"\".\"\"\"table\"\"\""
          })
          private String value;
      
          @Benchmark
          public void benchmark_v1_replaceAll() {
              TokenStream stream = new TokenStream(value, new TableIdParser.TableIdTokenizer(value), true);
              stream.start();
              List<String> parts = new ArrayList<>();
              while (stream.hasNext()) {
                  // Each replaceAll call compiles a regex and runs a Matcher, even when nothing matches.
                  parts.add(stream.consume().replaceAll("''", "'").replaceAll("\"\"", "\"").replaceAll("``", "`"));
              }
          }
      
      }
      -- Java 8
      Benchmark                                                                  (value)  Mode  Cnt     Score     Error  Units
      TableIdParserPerf.benchmark_v1_replaceAll                                    table  avgt    5   543.088 ±  21.374  ns/op
      TableIdParserPerf.benchmark_v1_replaceAll                              """table"""  avgt    5   798.946 ± 259.624  ns/op
      TableIdParserPerf.benchmark_v1_replaceAll                    database.schema.table  avgt    5  1650.824 ±  54.234  ns/op
      TableIdParserPerf.benchmark_v1_replaceAll  """database"""."""schema"""."""table"""  avgt    5  2327.782 ±  51.957  ns/op
      
      -- Java 11
      Benchmark                                                                  (value)  Mode  Cnt     Score     Error  Units
      TableIdParserPerf.benchmark_v1_replaceAll                                    table  avgt    5   601.808 ±  14.228  ns/op
      TableIdParserPerf.benchmark_v1_replaceAll                              """table"""  avgt    5   797.075 ±  18.719  ns/op
      TableIdParserPerf.benchmark_v1_replaceAll                    database.schema.table  avgt    5  1820.684 ±  32.619  ns/op
      TableIdParserPerf.benchmark_v1_replaceAll  """database"""."""schema"""."""table"""  avgt    5  2434.282 ± 197.290  ns/op
      
      -- Java 17
      Benchmark                                                                  (value)  Mode  Cnt     Score    Error  Units
      TableIdParserPerf.benchmark_v1_replaceAll                                    table  avgt    5   569.806 ± 20.406  ns/op
      TableIdParserPerf.benchmark_v1_replaceAll                              """table"""  avgt    5   760.002 ± 18.795  ns/op
      TableIdParserPerf.benchmark_v1_replaceAll                    database.schema.table  avgt    5  1743.369 ± 28.877  ns/op
      TableIdParserPerf.benchmark_v1_replaceAll  """database"""."""schema"""."""table"""  avgt    5  2301.432 ± 36.062  ns/op

            Unassigned Unassigned
            krnaveen14 Naveen Kumar (Inactive)