DBZ-9241: Exceptionally large mining windows can lead to unintended metrics/performance issues

When the connector is otherwise gathering reasonably sized batches of changes and a high burst of changes arrives, the connector's upper-bound calculation may fall back to the database's write SCN position, which can lead to an unusually large mining window, as shown below:

    [2025-07-18 00:44:32,917] DEBUG Fetching results for SCN [13280797146901, 13280797646901] (io.debezium.connector.oracle.logminer.buffered.BufferedLogMinerStreamingChangeEventSource)
    [2025-07-18 00:45:10,009] DEBUG Fetching results for SCN [13280797271601, 13280797771601] (io.debezium.connector.oracle.logminer.buffered.BufferedLogMinerStreamingChangeEventSource)
    [2025-07-18 00:45:19,916] DEBUG Fetching results for SCN [13280797711527, 13280798211527] (io.debezium.connector.oracle.logminer.buffered.BufferedLogMinerStreamingChangeEventSource)
    [2025-07-18 00:47:19,919] DEBUG Fetching results for SCN [13280798724070, 13280875053817] (io.debezium.connector.oracle.logminer.buffered.BufferedLogMinerStreamingChangeEventSource)
    [2025-07-18 02:06:41,154] DEBUG Fetching results for SCN [13280874355097, 13280913995115] (io.debezium.connector.oracle.logminer.buffered.BufferedLogMinerStreamingChangeEventSource)

Through 00:45:19,916 we were mining at the maximum batch size of 500K. At 00:47:19,919, the connector detected a large burst of changes and, instead of continuing in 500K steps, moved the upper bound to the database's maximum write SCN, yielding a read window of nearly 76M SCNs rather than 500K.
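
Below is a minimal sketch, in plain Java, of the kind of bounded-window calculation this suggests; MiningWindowSketch, calculateUpperBound, and the raw BigInteger arithmetic are illustrative stand-ins, not Debezium's actual Scn API:

    import java.math.BigInteger;

    public class MiningWindowSketch {

        // Hypothetical cap: never let a single mining window exceed maxBatchSize
        // SCNs, even when the database's write SCN is far ahead of our position.
        static BigInteger calculateUpperBound(BigInteger lowerBound,
                                              BigInteger databaseWriteScn,
                                              long maxBatchSize) {
            BigInteger capped = lowerBound.add(BigInteger.valueOf(maxBatchSize));
            // Use whichever is smaller: the capped bound or the write SCN.
            return capped.min(databaseWriteScn);
        }

        public static void main(String[] args) {
            BigInteger lowerBound = new BigInteger("13280798724070");
            BigInteger writeScn   = new BigInteger("13280875053817"); // ~76M SCNs ahead
            BigInteger upperBound = calculateUpperBound(lowerBound, writeScn, 500_000L);
            // Prints a 500K window instead of the ~76M window seen in the log above.
            System.out.println("Mining window: [" + lowerBound + ", " + upperBound + "]");
        }
    }

With a cap like this in place, the connector would walk the 76M-SCN backlog in successive 500K windows rather than a single oversized one.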

As a result, that single batch took nearly 79 minutes to complete (the next fetch does not appear in the log until 02:06:41), rather than proceeding iteratively as it had while mining in 500K chunks.

Had we done this iteratively, 76 million SCNs would have been approximately 152 chunks at 500K each. At an average of around 20 seconds per chunk, that works out to roughly 50 to 60 minutes of lag to process those chunks, compared to 79 minutes for the single batch. That is a savings, though not an especially large one at the macro level.
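
Spelling that estimate out (the ~20-second per-chunk average is this report's estimate, not a measured constant; the division rounds up):

    public class LagEstimate {
        public static void main(String[] args) {
            long windowScns = 13280875053817L - 13280798724070L; // 76,329,747 SCNs
            long chunkSize  = 500_000L;
            long chunks     = (windowScns + chunkSize - 1) / chunkSize; // 153 (ceiling)
            double minutes  = chunks * 20.0 / 60.0; // ~51 minutes at ~20s per chunk
            System.out.printf("%d chunks, ~%.0f minutes of lag%n", chunks, minutes);
        }
    }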

The real benefit is that metrics become far more accurate, as they would be updated roughly 152 times across that 50-60 minute window rather than just once. Smaller mining windows should also benefit performance on the database side.

Assignee: Chris Cranford <ccranfor@redhat.com>
Reporter: Chris Cranford <ccranfor@redhat.com>
Votes: 0
Watchers: 2
