Debezium / DBZ-8860

The method removeTransactionEventWithRowId creates high CPU load in certain scenarios

      Given a transaction that performs the following ordered steps:

      1. Insert 1 row into a table
      2. Create a save point
      3. Insert 100k rows into the table
      4. Rollback to the save point
      5. Commit

      The removeTransactionEventWithRowId method causes high CPU usage when the transaction is consumed, because of how its for loop handles event ids.

      As events are added to the Transaction, we increment the event-id counter inside the Transaction class. When the processor observes an event with the RollbackFlag enabled, we iterate the events from the max event-id back to 0 to find the matching event with the same Oracle row-id.

      With a large batch of events rolled back to a save point, the first iteration typically finds its event at the last event-id minus one. But as this process continues and undoes more events, the for-loop's event cache lookups return null for every LogMinerEvent that was already removed in a prior call.

      Over time, this makes the for loop more expensive, because more and more of its lookups return null, which raises CPU usage. It also reduces throughput, because every call must scan backward again from the max event-id.
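The growth in lookups can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the class `RollbackScanSketch`, the method `countLookups`, and the `HashMap` standing in for the event cache are hypothetical and are not Debezium code. It also assumes the undo records arrive newest first, which matches the behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the scan described above; not Debezium's actual classes.
class RollbackScanSketch {

    /**
     * Adds (kept + rolledBack) events, then undoes the last rolledBack events
     * newest first, scanning from the max event-id down to 0 on every undo.
     * Returns the total number of cache lookups performed.
     */
    static long countLookups(int kept, int rolledBack) {
        // Stand-in for the event cache: event-id -> row-id.
        Map<Integer, String> cache = new HashMap<>();
        int nextEventId = 0;
        for (int i = 0; i < kept + rolledBack; i++) {
            cache.put(nextEventId, "ROW" + nextEventId);
            nextEventId++;
        }

        long lookups = 0;
        // Assumption: undo records arrive in reverse insertion order.
        for (int undo = nextEventId - 1; undo >= kept; undo--) {
            String rowId = "ROW" + undo;
            // The counter never shrinks, so each scan restarts at the max event-id.
            for (int id = nextEventId - 1; id >= 0; id--) {
                lookups++;
                String cached = cache.get(id);
                if (cached != null && cached.equals(rowId)) {
                    cache.remove(id);
                    break;
                }
                // A null here is an event removed by an earlier undo.
            }
        }
        return lookups;
    }

    public static void main(String[] args) {
        // 1 kept row followed by 1,000 rolled-back rows.
        System.out.println(countLookups(1, 1_000));
    }
}
```

In this model the k-th undo needs k lookups, so n rolled-back events cost n(n+1)/2 lookups in total. For the 100k rows in the scenario above, that is roughly 5 billion lookups, almost all of them returning null.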

      In Debezium 2.7, this wasn't an issue for heap-based implementations because the Transaction held the list of events, and we could iterate backward based on the list's size and index. In Debezium 3.0, the heap and non-heap implementations were unified, and the resulting API doesn't support this kind of backward iteration well.
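For contrast, a list-backed transaction of the kind described for the 2.7 heap implementation might look like the sketch below. The class `ListBackedTransactionSketch` and its members are hypothetical names chosen for illustration, and events are reduced to bare row-ids. The real Debezium 2.7 code is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list-backed transaction; an illustration, not Debezium 2.7 code.
class ListBackedTransactionSketch {

    // Events are reduced to their row-ids for brevity.
    private final List<String> rowIds = new ArrayList<>();

    // Counts row-id comparisons so the cost can be inspected.
    long comparisons = 0;

    void addEvent(String rowId) {
        rowIds.add(rowId);
    }

    /** Scans backward from the current end of the list and removes the first match. */
    boolean removeEventWithRowId(String rowId) {
        for (int i = rowIds.size() - 1; i >= 0; i--) {
            comparisons++;
            if (rowIds.get(i).equals(rowId)) {
                // Removing by index shrinks the list, so the next scan starts
                // at the new last element rather than at a stale max event-id.
                rowIds.remove(i);
                return true;
            }
        }
        return false;
    }

    int size() {
        return rowIds.size();
    }
}
```

Because removed events leave no gaps in the list, undoing events newest first costs one comparison per undo in this sketch, instead of a scan that grows with every removal.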

              ccranfor@redhat.com Chris Cranford