Debezium / DBZ-4366

Oracle Logminer: LOB truncated in streaming mode



      In Oracle 19c, set up the following:

      CREATE TABLE foo (id INT PRIMARY KEY, lobcol CLOB);
      
      -- Appends str to foo.lobcol n times for the row with id = rid.
      CREATE OR REPLACE PROCEDURE grow_proc(rid IN PLS_INTEGER, str IN VARCHAR2, n IN PLS_INTEGER) IS
              loc CLOB;
      BEGIN
              -- Lock the row and obtain a writable LOB locator.
              SELECT lobcol INTO loc FROM foo WHERE id = rid FOR UPDATE;
              DBMS_LOB.OPEN(loc, DBMS_LOB.LOB_READWRITE);
              FOR i IN 1..n LOOP
                  DBMS_LOB.WRITEAPPEND(loc, LENGTH(str), str);
              END LOOP;
              DBMS_LOB.CLOSE(loc);
      END;
      

      Then set up a Debezium connector with the LogMiner adapter and LOB support enabled to read this table. Wait until the snapshot phase is over.
      Then insert some rows into the table as follows:

      INSERT INTO foo VALUES (1, 'x');
      CALL grow_proc(1, '<some 1024-byte string>', 5000);
      INSERT INTO foo VALUES (2, 'x');
      CALL grow_proc(2, '<some 1024-byte string>', 5000);
      
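      For reference, a minimal connector registration for this scenario might look like the sketch below. This is not the exact configuration used for the report; property names follow the Debezium Oracle connector documentation (some, such as database.server.name and the database.history.* properties, differ between Debezium versions), and all connection details are placeholders:

```json
{
  "name": "oracle-lob-repro",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "database.connection.adapter": "logminer",
    "lob.enabled": "true",
    "database.hostname": "oracle-host",
    "database.port": "1521",
    "database.user": "c##dbzuser",
    "database.password": "dbz",
    "database.dbname": "ORCLCDB",
    "database.server.name": "server1",
    "table.include.list": "DEBEZIUM.FOO",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.oracle"
  }
}
```

      The relevant setting is lob.enabled=true; without it the connector does not capture LOB column changes at all, and the bug below does not manifest.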

      When I set up a connector to read the contents of a table with a LOB column (I verified this with both BLOB and CLOB), with LOB support enabled in the configuration, I observe the following behavior for medium-sized (5-10MiB) LOBs:

      • initial snapshot replication works perfectly: the LOBs are emitted onto the Kafka topic in their entirety
      • during streaming replication, the LOBs are truncated: only the last ~300KiB of each LOB appears on the Kafka topic

      I figured out what is going on: the code that processes LOBs (debezium-connector-oracle/src/main/java/io/debezium/connector/oracle/logminer/processor/TransactionCommitConsumer.java) assumes that a LOB value is represented in LogMiner as

      • a single SELECT <lob-column> INTO <variable> FOR UPDATE (SEL_LOB_LOCATOR event), followed by
      • an unbroken sequence of LOB_WRITE events

      When the LOB_WRITE event sequence is interrupted by any other event, the accumulated LOB contents are written to lastEvent's newValues array (mergeLobWriteData).

      However, in my reproduction scenario, I see the following behavior from Oracle (19c EE):

      • SEL_LOB_LOCATOR for row X, column Y
      • a (short) series of LOB_WRITE events
      • SEL_LOB_LOCATOR for row X, column Y (so same LOB as the first SEL_LOB_LOCATOR)
      • another shortish series of LOB_WRITE events
      • ... and so on

      Every time the LOB_WRITE series is interrupted by a (spurious) SEL_LOB_LOCATOR, mergeLobWriteData is called, the accumulated bytes are concatenated, and the previous accumulation is overwritten in lastEvent.newValues.

      Ultimately, what ends up on the Kafka topic are the bytes from the last unbroken run of LOB_WRITE events; everything prior to that is discarded.
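      The effect can be reproduced in isolation with a toy model of the merge logic. This is a simplified sketch, not the actual TransactionCommitConsumer code: LobMergeDemo, merge, and the string-encoded events are illustrative inventions that only mimic the overwrite-on-flush behavior described above.

```java
import java.util.List;

// Toy model of the overwrite-on-flush merge described above. Each event is a
// string: "SEL_LOB_LOCATOR", "LOB_WRITE:<chunk>", or "COMMIT".
public class LobMergeDemo {

    // Returns the column value left behind after processing the event stream.
    static String merge(List<String> events) {
        StringBuilder acc = new StringBuilder(); // accumulated LOB_WRITE chunks
        String columnValue = null;               // stands in for lastEvent.newValues
        for (String ev : events) {
            if (ev.startsWith("LOB_WRITE:")) {
                acc.append(ev.substring("LOB_WRITE:".length()));
            } else if (acc.length() > 0) {
                // mergeLobWriteData-style flush: OVERWRITES any value written
                // by an earlier run instead of appending to it.
                columnValue = acc.toString();
                acc.setLength(0);
            }
        }
        return columnValue;
    }

    public static void main(String[] args) {
        // The sequence the code assumes: one locator, one unbroken run of
        // writes. Here the overwrite is harmless.
        System.out.println(merge(List.of(
            "SEL_LOB_LOCATOR",
            "LOB_WRITE:AAAA", "LOB_WRITE:BBBB",
            "COMMIT")));                      // prints AAAABBBB -- complete

        // The sequence actually observed: a spurious SEL_LOB_LOCATOR splits
        // the writes into two runs, and only the second run survives.
        System.out.println(merge(List.of(
            "SEL_LOB_LOCATOR",
            "LOB_WRITE:AAAA", "LOB_WRITE:BBBB",
            "SEL_LOB_LOCATOR",
            "LOB_WRITE:CCCC", "LOB_WRITE:DDDD",
            "COMMIT")));                      // prints CCCCDDDD -- first half lost
    }
}
```

      In this model, appending to the existing column value when the flush targets the same row and column (rather than overwriting it) would yield the full AAAABBBBCCCCDDDD, which suggests the direction of a fix.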

              Reporter: Dominique Chanet (dominique.chanet@klarrio.com)