Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-277

Memory leak and excessive CPU usage when using materialized views

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major Major
    • 0.5.1
    • 0.5
    • postgresql-connector
    • None
    • Hide

      Monitor some table in a postgres database.

      Create a view in the same database as follows (the idea here is that the contents of the view changes every time you access it, generating lots of WAL from the materialized view):
      CREATE VIEW public.generate_wal AS
      SELECT s.i,
      random() r1, random() r2, random() r3, random() r4
      FROM generate_series(1, 100000) s;

      Make a materialized view out of it:
      CREATE MATERIALIZED VIEW mat_wal AS SELECT * FROM generate_wal;

      Add an index so concurrent refresh is possible:
      CREATE UNIQUE INDEX ON mat_wal;

      Refresh the materialized view:
      REFRESH MATERIALIZED VIEW CONCURRENTLY mat_wal;

      Observe the high CPU usage in the Kafka connect task, and enjoy how long it goes on for!

      Do it a second time:
      REFRESH MATERIALIZED VIEW CONCURRENTLY mat_wal;

      With the default Kafka connect heap size of 256MB this is already enough to exhaust the heap. With a 1G heap size you just need to be patient and keep doing this - eventually you will run out of heap.

      Probably there's some connection between the excessive CPU usage/processing time and the leak.

      Show
      Monitor some table in a postgres database. Create a view in the same database as follows (the idea here is that the contents of the view changes every time you access it, generating lots of WAL from the materialized view): CREATE VIEW public.generate_wal AS SELECT s.i, random() r1, random() r2, random() r3, random() r4 FROM generate_series(1, 100000) s ; Make a materialized view out of it: CREATE MATERIALIZED VIEW mat_wal AS SELECT * FROM generate_wal; Add an index so concurrent refresh is possible: CREATE UNIQUE INDEX ON mat_wal ; Refresh the materialized view: REFRESH MATERIALIZED VIEW CONCURRENTLY mat_wal; Observe the high CPU usage in the Kafka connect task, and enjoy how long it goes on for! Do it a second time: REFRESH MATERIALIZED VIEW CONCURRENTLY mat_wal; With the default Kafka connect heap size of 256MB this is already enough to exhaust the heap. With a 1G heap size you just need to be patient and keep doing this - eventually you will run out of heap. Probably there's some connection between the excessive CPU usage/processing time and the leak.

      Doing REFRESH MATERIALIZED VIEW CONCURRENTLY on a materialized view in the same PostgreSQL database as a table being monitored by debezium results in excessive CPU usage in the record stream producer thread, and leaks java heap: even with a 1G heap, refreshing a huge materialized view with many changes results in java.lang.OutOfMemoryError being thrown.

              Unassigned Unassigned
              duncan.sands Duncan Sands (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated:
                Resolved: