Debezium / DBZ-277

Memory leak and excessive CPU usage when using materialized views

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 0.5.1
    • Affects Version/s: 0.5
    • Component/s: postgresql-connector
    • Labels: None

      Monitor some table in a postgres database.
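
      For reference, a sketch of this setup step (not part of the original report): it registers a Debezium PostgreSQL connector with Kafka Connect's REST API so that one table is monitored. The Connect URL, credentials, server name and table.whitelist value are placeholders, and the property names are the 0.x-era ones in use at the time of this issue.

      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;

      public final class RegisterConnector {
          public static void main(String[] args) throws Exception {
              // Connector registration payload; every value below is a placeholder.
              String config = """
                      {
                        "name": "inventory-connector",
                        "config": {
                          "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
                          "database.hostname": "localhost",
                          "database.port": "5432",
                          "database.user": "postgres",
                          "database.password": "postgres",
                          "database.dbname": "postgres",
                          "database.server.name": "dbserver1",
                          "table.whitelist": "public.my_table"
                        }
                      }
                      """;
              // POST the configuration to the Kafka Connect REST API.
              HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
                      .header("Content-Type", "application/json")
                      .POST(HttpRequest.BodyPublishers.ofString(config))
                      .build();
              HttpResponse<String> response = HttpClient.newHttpClient()
                      .send(request, HttpResponse.BodyHandlers.ofString());
              System.out.println(response.statusCode() + " " + response.body());
          }
      }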

      Create a view in the same database as follows (the idea here is that the contents of the view change every time you access it, generating lots of WAL from the materialized view):
      CREATE VIEW public.generate_wal AS
      SELECT s.i,
             random() AS r1, random() AS r2, random() AS r3, random() AS r4
      FROM generate_series(1, 100000) s(i);

      Make a materialized view out of it:
      CREATE MATERIALIZED VIEW mat_wal AS SELECT * FROM generate_wal;

      Add an index so concurrent refresh is possible:
      CREATE UNIQUE INDEX ON mat_wal (i);

      Refresh the materialized view:
      REFRESH MATERIALIZED VIEW CONCURRENTLY mat_wal;

      Observe the high CPU usage in the Kafka Connect task, and enjoy how long it goes on for!

      Do it a second time:
      REFRESH MATERIALIZED VIEW CONCURRENTLY mat_wal;

      With the default Kafka Connect heap size of 256 MB this is already enough to exhaust the heap. With a 1 GB heap you just need to be patient and keep doing this - eventually you will run out of heap.

      Probably there's some connection between the excessive CPU usage/processing time and the leak.


      Doing REFRESH MATERIALIZED VIEW CONCURRENTLY on a materialized view in the same PostgreSQL database as a table being monitored by debezium results in excessive CPU usage in the record stream producer thread, and leaks java heap: even with a 1G heap, refreshing a huge materialized view with many changes results in java.lang.OutOfMemoryError being thrown.



            Duncan Sands (Inactive) added a comment - Last time I looked debezium didn't support materialized views. How then could materialized views cause a problem like in this Jira? The answer is that postgres sends WAL for everything to the decoderbufs plugin, not just for tables debezium is monitoring. Nothing useful is done with WAL from materialized views, but due to various bugs it still managed to cause trouble.
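
            A purely illustrative sketch of the point above (this is not Debezium's actual code): on the client side, events decoded from WAL can simply be dropped when their table is not one of the monitored ones, so changes coming from materialized views and other unmonitored relations never reach Kafka.

            import java.util.Set;

            final class MonitoredTableFilter {
                private final Set<String> monitoredTables; // e.g. Set.of("public.my_table")

                MonitoredTableFilter(Set<String> monitoredTables) {
                    this.monitoredTables = monitoredTables;
                }

                // Decide whether a decoded change event should be forwarded downstream.
                boolean shouldForward(String schema, String table) {
                    return monitoredTables.contains(schema + "." + table);
                }
            }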


            Deeapk Sagta (Inactive) added a comment - Duncan Sands, could you explain how to get data from a materialized view into Debezium?


            Gunnar Morling added a comment - Bulk closing issues in state "Resolved" with resolution "Done" and with a released "Fix Version".


            Gunnar Morling added a comment - About (2), where's that ever-growing list you mention? Would be great to have a follow-up issue for that one, too. Thanks!


            Randall Hauch (Inactive) added a comment - Thanks, duncan.sands. Really great description and analysis, and thanks for submitting the PR!

            Duncan Sands (Inactive) added a comment - Fix on the decoderbufs end here: https://github.com/debezium/postgres-decoderbufs/pull/7

            Duncan Sands (Inactive) added a comment - The cause of the memory leak and poor performance is kind of silly: the decoderbufs plugin is producing a warning on every delete it processes on the materialized view:
            elog(WARNING, "no information to decode from DELETE because either no PK is present or REPLICA IDENTITY NOTHING or invalid ");
            These warnings are sent to the client (the debezium postgres connector). The JDBC layer carefully preserves them in a list. Since no-one ever clears the list, it only ever gets longer. As there is one warning per row, it gets long pretty quickly! Eventually the entire heap is consumed.

            I think this needs to be tackled from both ends:
            (1) The decoder plugin should not generate any output at WARNING level or higher on a per-row basis. I've written a patch that drops the priority to DEBUG, which means they aren't sent to clients any more.
            (2) The debezium client shouldn't just totally ignore the possibility that warnings (and notifications too for that matter) may be accumulating, it should do something with them.

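            As an illustration of point (2), a minimal sketch of draining the JDBC warning chain instead of letting it grow without bound. The class name, logging and call site are assumptions made for the example; the actual Debezium change may differ.

            import java.sql.Connection;
            import java.sql.SQLException;
            import java.sql.SQLWarning;
            import java.util.logging.Logger;

            final class WarningDrainer {
                private static final Logger LOG = Logger.getLogger(WarningDrainer.class.getName());

                // Log every warning queued on the connection, then release the chain so it
                // can be garbage collected instead of growing by one entry per decoded row.
                static void drainWarnings(Connection connection) throws SQLException {
                    for (SQLWarning warning = connection.getWarnings(); warning != null; warning = warning.getNextWarning()) {
                        LOG.warning(warning.getSQLState() + ": " + warning.getMessage());
                    }
                    connection.clearWarnings();
                }
            }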

              Assignee: Unassigned
              Reporter: duncan.sands (Duncan Sands, Inactive)
              Votes: 0
              Watchers: 4