  1. Debezium
  2. DBZ-6092

Postgres connector stuck when replication slot does not have confirmed_flush_lsn

Details

    • False
    • None
    • False
    • Important

    Description

      In order to make your issue reports as actionable as possible, please provide the following information, depending on the issue type.

      Bug report

      For bug reports, provide this information, please:
      Sometimes, after recovery, a Postgres replication slot does not have confirmed_flush_lsn set. In our case this is caused by migrating from one node to another: after restoration it is possible to advance restart_lsn but not confirmed_flush_lsn.
      Debezium fetches the replication slots from Postgres and filters out any slot that does not have confirmed_flush_lsn set. It initially retries, waiting for potential long-running transactions to be committed as described in DBZ-862, and then fails with an exception once the retries are exhausted. After that, the whole retry cycle repeats.
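
The retry-then-fail behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Debezium's actual code: the class `SlotValidityCheck`, the `SlotState` holder, and the `waitForValidSlot` method with its parameters are hypothetical names, and the slot read is abstracted behind a `Supplier` so the sketch runs without a database.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the behaviour described in this report; not Debezium's real classes.
final class SlotValidityCheck {

    /** Hypothetical holder for two columns of one pg_replication_slots row. */
    static final class SlotState {
        final String restartLsn;
        final String confirmedFlushLsn; // null when the slot has no confirmed_flush_lsn

        SlotState(String restartLsn, String confirmedFlushLsn) {
            this.restartLsn = restartLsn;
            this.confirmedFlushLsn = confirmedFlushLsn;
        }
    }

    /**
     * Re-reads the slot up to maxRetries times and returns it once
     * confirmed_flush_lsn is present; throws when the retries are exhausted.
     */
    static SlotState waitForValidSlot(Supplier<SlotState> readSlot, int maxRetries, long delayMs) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            SlotState state = readSlot.get();
            if (state != null && state.confirmedFlushLsn != null) {
                return state;
            }
            try {
                // Wait in case a long-running transaction is still holding the slot back.
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the slot", e);
            }
        }
        throw new IllegalStateException(
                "Slot has no confirmed_flush_lsn after " + maxRetries + " attempts");
    }
}
```

With a slot whose confirmed_flush_lsn can never be populated, as in the migration case above, every call ends in the exception, and restarting the connector only repeats the same cycle.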

      What Debezium connector do you use and what version?

      Postgres connector version 1.9.5

      What is the connector configuration?

      <Your answer>

      What is the captured database version and mode of deployment?

      (E.g. on-premises, with a specific cloud provider, etc.)

      Postgres 12

      What behaviour do you expect?

      The connector falls back to restart_lsn.

      What behaviour do you see?

      Debezium fetches the replication slots from Postgres and filters out the slot because confirmed_flush_lsn is not set. It retries while waiting for potential long-running transactions to be committed (see DBZ-862), fails with an exception once the retries are exhausted, and then starts the retry cycle again.

      Do you see the same behaviour using the latest released Debezium version?

      (Ideally, also verify with latest Alpha/Beta/CR version)

      Yes.

      Do you have the connector logs, ideally from start till finish?

      (You might be asked later to provide DEBUG/TRACE level log)

      Yes

      How to reproduce the issue using our tutorial deployment?

      The issue is easily reproducible with the test from this PR: https://github.com/debezium/debezium/pull/4265

      Feature request or enhancement

      For feature requests or enhancements, provide this information, please:

      Which use case/requirement will be addressed by the proposed feature?

      <Your answer>

      Implementation ideas (optional)

      The suggested fix is to fall back to restart_lsn when the replication slot has no confirmed_flush_lsn.
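
A minimal sketch of the proposed fallback, assuming the two slot columns are available as strings (null when unset). The class `SlotLsnFallback` and its methods are hypothetical names for illustration and are not taken from the Debezium code base or from the linked PR. The parsing relies only on PostgreSQL's textual pg_lsn form, two hexadecimal numbers separated by a slash.

```java
// Hypothetical sketch of the proposed fallback; names are illustrative only.
final class SlotLsnFallback {

    /** Parses the textual pg_lsn form "hi/lo" (two hex numbers) into a 64-bit value. */
    static long parseLsn(String lsn) {
        String[] parts = lsn.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Not a pg_lsn value: " + lsn);
        }
        long hi = Long.parseLong(parts[0], 16); // upper 32 bits
        long lo = Long.parseLong(parts[1], 16); // lower 32 bits
        return (hi << 32) | lo;
    }

    /**
     * Prefers confirmed_flush_lsn and falls back to restart_lsn when it is missing.
     * Fails only when the slot reports neither value.
     */
    static long resolveStartLsn(String confirmedFlushLsn, String restartLsn) {
        if (confirmedFlushLsn != null) {
            return parseLsn(confirmedFlushLsn);
        }
        if (restartLsn != null) {
            return parseLsn(restartLsn);
        }
        throw new IllegalStateException("Slot has neither confirmed_flush_lsn nor restart_lsn");
    }
}
```

For example, `SlotLsnFallback.resolveStartLsn(null, "0/16B3748")` resolves to the restart_lsn position instead of failing, which is the behaviour requested here.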

            People

              Unassigned Unassigned
              aipopov Anatolii Popov