Debezium / DBZ-6011

Vitess: Handle the shard list difference between current db shards and persisted shards

Details

    Description

      Bug report

      What Debezium connector do you use and what version?

      Vitess

      What is the connector configuration?

      Using vitess.offset.storage.per.task=true

      What is the captured database version and mode of deployment?

      Vitess V13, AWS

      What behaviour do you expect?

      When the Debezium connector is offline for some time and a Vitess shard splits during that time (e.g. s1 splits into s10 and s11), the current shard list from v$session contains the latest shards (s10, s11), but the shard list persisted in the Kafka offset topic contains only the old shard s1 (pointing at an old position gt1).

      When the Vitess connector restarts, the current logic in VitessConnector.taskConfigs() uses the latest shard list (s10, s11) for task assignment and starts s10/s11 at "current" (i.e. the tail of the binlog queue). The correct behavior is to use the old shard (s1) and the old position (e.g. gt1) from the persisted Kafka offset storage, so that the connector subscribes at the exact point where it stopped. As vtgate continues playing the binlog events from vttablet, it eventually encounters the shard-split binlog event, at which point the tablet stream from the s1 vttablet is closed and replaced with tablet streams from s10 and s11. All of this happens transparently for the Vitess connector, which seamlessly receives the events from the new shards via vtgate (as if it had been online the whole time).
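
      The expected behavior can be illustrated with a minimal Java sketch. It is not taken from the Debezium code base: the class name ShardStartPositions, the method resolve(), and the representation of offsets as a shard-to-GTID map are assumptions made purely for illustration. The idea is that persisted offsets, when present, win over the current shard list, and "current" is used only when nothing was persisted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Debezium. */
final class ShardStartPositions {

    /** Placeholder position meaning "start at the tail of the binlog". */
    static final String CURRENT = "current";

    /**
     * Returns the shard -> start position map the connector should subscribe with.
     * Persisted offsets take precedence over the current shard list, so a shard
     * split that happened while the connector was offline is replayed from the
     * old shard and old position instead of being skipped.
     */
    static Map<String, String> resolve(List<String> currentShards,
                                       Map<String, String> persistedOffsets) {
        Map<String, String> result = new LinkedHashMap<>();
        if (persistedOffsets != null && !persistedOffsets.isEmpty()) {
            // Resume exactly where the connector stopped, e.g. {s1=gt1}.
            result.putAll(persistedOffsets);
            return result;
        }
        // No persisted state (first start): begin at the tail for every current shard.
        for (String shard : currentShards) {
            result.put(shard, CURRENT);
        }
        return result;
    }
}
```

      With current shards (s10, s11) and persisted offsets {s1=gt1}, this sketch returns {s1=gt1}, matching the scenario described above.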

      What behaviour do you see?

      Currently, when the connector detects that the current shard list from v$session differs from the shards in the persisted offset storage, it aborts.

      Do you see the same behaviour using the latest released Debezium version?

      Yes

      How to reproduce the issue using our tutorial deployment?

      Stop the Vitess connector, perform a shard split, then restart the Vitess connector.

      Feature request or enhancement

      Implementation ideas (optional)

      Modify VitessConnector.taskConfigs() to favor the shards from persisted offset storage
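
      One way this could look is sketched below in Java. Everything in it is an assumption for illustration: the class ShardTaskAssigner, the method assign(), and the round-robin distribution are hypothetical and do not describe how VitessConnector.taskConfigs() is actually implemented. The sketch only shows the proposed preference: if shards were persisted in offset storage, distribute those across tasks; otherwise fall back to the current shard list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch for illustration only; not the actual VitessConnector logic. */
final class ShardTaskAssigner {

    /**
     * Distributes shards across at most maxTasks tasks, favoring the shards
     * found in persisted offset storage over the current shard list.
     * Round-robin assignment is an assumption made for this sketch.
     */
    static List<List<String>> assign(List<String> persistedShards,
                                     List<String> currentShards,
                                     int maxTasks) {
        if (maxTasks <= 0) {
            throw new IllegalArgumentException("maxTasks must be positive");
        }
        // Favor persisted shards; use the current list only when nothing was persisted.
        List<String> shards = (persistedShards != null && !persistedShards.isEmpty())
                ? persistedShards
                : currentShards;

        // Never create more tasks than there are shards to assign.
        int taskCount = Math.min(maxTasks, shards.size());
        List<List<String>> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(new ArrayList<>());
        }
        // Round-robin: shard i goes to task (i mod taskCount).
        for (int i = 0; i < shards.size(); i++) {
            tasks.get(i % taskCount).add(shards.get(i));
        }
        return tasks;
    }
}
```

      In the split scenario from this report, assign(List.of("s1"), List.of("s10", "s11"), 2) yields a single task holding s1, so the connector resumes from the old shard rather than aborting or starting at the tail of the new shards.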

          People

            Unassigned Unassigned
            haiyingcai Henry Haiying Cai (Inactive)