Debezium / DBZ-7193

Unchanged toasted array columns are substituted with unavailable.value.placeholder, even when REPLICA IDENTITY FULL is configured.


      In order to make your issue reports as actionable as possible, please provide the following information, depending on the issue type.

      Bug report

      For bug reports, provide this information, please:

      What Debezium connector do you use and what version?

      Debezium version: 2.3.4

      What is the connector configuration?

      The Debezium connector configuration is below.

       

      {
          "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
          "database.dbname": "test_application_sg",
          "database.hostname": "rds_host_name",
          "database.password": "rds_password",
          "database.port": "5432",
          "database.server.name": "test_application_sg",
          "database.user": "dwh_debezium",
          "heartbeat.action.query": "CREATE SCHEMA IF NOT EXISTS debezium;\nCREATE TABLE IF NOT EXISTS debezium.heartbeat (id INTEGER PRIMARY KEY, ts TIMESTAMP WITH TIME ZONE);\nINSERT INTO debezium.heartbeat (id, ts) VALUES (1, NOW()) ON CONFLICT(id) DO UPDATE SET ts=NOW();\n",
          "heartbeat.interval.ms": "15000",
          "max.batch.size": "2048",
          "max.queue.size": "8192",
          "name": "test_application_sg_sg_source_postgres",
          "plugin.name": "pgoutput",
          "producer.override.batch.size": "327680",
          "producer.override.buffer.memory": "16777216",
          "producer.override.compression.type": "lz4",
          "producer.override.max.request.size": "5242880",
          "producer.override.offset.flush.interval.ms": "10000",
          "producer.override.offset.flush.timeout.ms": "60000",
          "producer.override.socket.receive.buffer.bytes": "800000",
          "schema.include.list": "public,debezium",
          "slot.max.retries": "6",
          "slot.name": "test_application_sg",
          "slot.retry.delay.ms": "60000",
          "snapshot.mode": "never",
          "table.include.list": "^public\\.events$,^debezium\\.heartbeat$",
          "tasks.max": "1",
          "topic.creation.default.cleanup.policy": "compact,delete",
          "topic.creation.default.compression.type": "lz4",
          "topic.creation.default.delete.retention.ms": "86400000",
          "topic.creation.default.max.message.bytes": "8388608",
          "topic.creation.default.partitions": "1",
          "topic.creation.default.replication.factor": "3",
          "topic.creation.groups": "heartbeat",
          "topic.creation.heartbeat.cleanup.policy": "delete",
          "topic.creation.heartbeat.compression.type": "lz4",
          "topic.creation.heartbeat.delete.retention.ms": "86400000",
          "topic.creation.heartbeat.include": "(^__debezium-heartbeat\\.test_application$|^test_application\\.debezium\\.heartbeat$)",
          "topic.prefix": "test_application.sg",
          "transforms": "unwrap",
          "transforms.unwrap.add.fields": "op,table,source.ts_ms,source.db",
          "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
      } 

       

      What is the captured database version and mode of deployment?

      AWS - RDS PostgreSQL / Aurora PostgreSQL

      PostgreSQL version: 12.15

      What behaviour do you expect?

      • Create the sample events table below:
        CREATE TABLE public.events (
            id int4 NOT NULL,
            platform varchar NOT NULL,
            created_at timestamptz NULL DEFAULT now(),
            updated_at timestamptz NULL DEFAULT now(),
            shape geometry(polygon) NULL,
            message jsonb NULL,
            recurrence_days _int4 NULL
        );
      • Set the replica identity to FULL:
        ALTER TABLE public.events REPLICA IDENTITY FULL;
      • Insert a large row so that Postgres uses TOAST storage:
        INSERT INTO public.events 
        (id, platform, shape, message, recurrence_days) 
        VALUES (1, 'app_1', 'SAMPLE_LARGE_GEOMETRY_VALUE', 'SAMPLE_LARGE_MESSAGE', ARRAY[1,2,3,4,5,6]); 
      • Run the sample update below, which does not modify any of the TOASTed columns:
        UPDATE events
        set updated_at = ((updated_at) - INTERVAL '1 SECONDS')
        where id=1 and platform='app_1'; 
      • Given that the replica identity is configured as FULL, Debezium should ideally generate the following event:
        {
            "schema": { ... },
            "payload": {
                "before": {
                    "id": 1,
                    "platform": "app_1",
                    "shape": "SAMPLE_LARGE_GEOMETRY_VALUE",
                    "created_at": "2023-11-28 10:15:20",
                    "updated_at": "2023-11-28 10:15:20",
                    "message": "SAMPLE_LARGE_MESSAGE",
                    "recurrence_days": [1,2,3,4,5,6]
                },
                "after": {
                    "id": 1,
                    "platform": "app_1",
                    "shape": "SAMPLE_LARGE_GEOMETRY_VALUE",
                    "created_at": "2023-11-28 10:15:20", 
                    "updated_at": "2023-11-28 10:15:20",          
                    "message": "SAMPLE_LARGE_MESSAGE",
                    "recurrence_days": [1,2,3,4,5,6]
                }
            }
        }
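
      One way to check a captured event against this expectation is to diff its before and after images. The sketch below is a hypothetical consumer-side helper (the name changed_columns is not part of Debezium). It assumes the event has already been parsed into a Python dict and still carries the full envelope shown above, that is, it has not been flattened by ExtractNewRecordState.

```python
def changed_columns(event):
    """Return the sorted names of columns whose values differ between
    the 'before' and 'after' images of a change event envelope."""
    payload = event["payload"]
    # Either image may be missing or null (e.g. inserts have no 'before').
    before = payload.get("before") or {}
    after = payload.get("after") or {}
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

      For an update that touches only updated_at on a table with REPLICA IDENTITY FULL, this helper would be expected to report no column other than updated_at.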

      What behaviour do you see?

       

      {
          "schema": { ... },
          "payload": {
              "before": {
                  "id": 1,
                  "platform": "app_1",
                  "shape": "SAMPLE_LARGE_GEOMETRY_VALUE",
                  "message": "SAMPLE_LARGE_MESSAGE",
                  "created_at": "2023-11-28 10:15:20",
                  "updated_at": "2023-11-28 10:15:20",             
                  "recurrence_days": [1,2,3,4,5,6]
              },
              "after": {
                  "id": 1,
                  "platform": "app_1",
                  "shape": "SAMPLE_LARGE_GEOMETRY_VALUE",
                  "message": "SAMPLE_LARGE_MESSAGE",
                  "created_at": "2023-11-28 10:15:20",
                  "updated_at": "2023-11-28 10:15:20",             
                  "recurrence_days": [95, 95, 100, 101, 98, 101, 122, 105, 117, 109, 95, 117, 110, 97, 118, 97, 105, 108, 97, 98, 108, 101, 95, 118, 97, 108, 117, 101]
              }
          }
      } 
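
      The integers in the after image of recurrence_days look like character codes rather than array elements. As a minimal sketch, decoding them as ASCII yields __debezium_unavailable_value, which is the default value of unavailable.value.placeholder. The helper name decode_codes is illustrative only and is not part of Debezium.

```python
# Values copied from the "after" image of recurrence_days in the event above.
observed = [95, 95, 100, 101, 98, 101, 122, 105, 117, 109, 95, 117, 110, 97,
            118, 97, 105, 108, 97, 98, 108, 101, 95, 118, 97, 108, 117, 101]


def decode_codes(codes):
    """Interpret a list of integers as ASCII character codes."""
    return bytes(codes).decode("ascii")


print(decode_codes(observed))  # __debezium_unavailable_value
```

      In other words, the array column appears to be filled with the placeholder string, one byte per element, instead of the unchanged array value.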

       

      Do you see the same behaviour using the latest released Debezium version?

      Yes, this behaviour also exists with the latest version of Debezium.

      It works fine with Debezium 2.0.

      Do you have the connector logs, ideally from start till finish?

      Attaching the logs collected during debugging.

      debezium_logs.json

      How to reproduce the issue using our tutorial deployment?

      Explained steps above.

      Implementation ideas (optional)

      I have proposed a fix in https://github.com/debezium/debezium/pull/5049

       


              Unassigned Unassigned
              pavithrananda.sivananda@deliveryhero.com Pavithrananda Prabhu S (Inactive)