Bug
Resolution: Done
Major
1.4.2.Final
Debezium incorrectly identifies the primary member of a replica set. As a result, it does not always connect to and replicate from the primary's oplog; whether it does is the luck of the draw. Furthermore, if the incorrectly selected replica set host is out of the shard for maintenance or other reasons, Debezium errors out.
ServerAddress serverAddress = serverDescriptions.get(0).getAddress();
Instead of assuming that the primary's address is at position zero in the list of servers in the replica set (obtained via the MongoClient created with the replica set members), there should be a loop that traverses all of the serverDescriptions looking for the one where isPrimary is true.
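A minimal sketch of the loop suggested above, not the actual Debezium patch. To keep it runnable without the MongoDB driver on the classpath, `ServerInfo` is a hypothetical stand-in for the driver's `ServerDescription`, modelling only `getAddress()` and `isPrimary()`; the class name `PrimaryFinder`, the method `findPrimaryAddress`, and the host names are likewise made up for illustration.

```java
import java.util.List;
import java.util.Optional;

public class PrimaryFinder {

    // Hypothetical stand-in for the driver's ServerDescription; only the two
    // properties this sketch needs are modelled.
    static final class ServerInfo {
        private final String address;
        private final boolean primary;

        ServerInfo(String address, boolean primary) {
            this.address = address;
            this.primary = primary;
        }

        String getAddress() { return address; }
        boolean isPrimary() { return primary; }
    }

    // Scan every member instead of trusting whatever happens to be at index 0.
    // Returns empty if no member currently reports itself as primary.
    static Optional<String> findPrimaryAddress(List<ServerInfo> serverDescriptions) {
        for (ServerInfo description : serverDescriptions) {
            if (description.isPrimary()) {
                return Optional.of(description.getAddress());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up hosts: the first one is not primary (e.g. still resyncing).
        List<ServerInfo> members = List.of(
                new ServerInfo("mongo-a:27017", false),
                new ServerInfo("mongo-b:27017", true),
                new ServerInfo("mongo-c:27017", false));
        System.out.println(findPrimaryAddress(members).orElse("no primary found"));
    }
}
```

Returning an `Optional` makes the "no primary right now" case (for instance, during an election) explicit, so the caller can retry or fail with a clear message rather than connect to an arbitrary member.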
This issue has led to significant downtime, preventing us from being able to perform maintenance on our sharded cluster. Debezium randomly picks from a list of replica set hosts provided during discovery against our Mongo configuration replica set.
For example, the maintenance that we're currently trying to perform on a replica set member requires a re-sync (up-sizing the disk), so the node needs to remain in the replica set, but as neither a primary nor a secondary (it is in start-up mode). The Mongo configuration replica set still reports this node to Debezium as a member of the replica set, as it should. Debezium does not check for the primary node correctly and sometimes selects this node even though it's in start-up mode. Since we have 3 nodes in a replica set, there's a 1 in 3 chance that Debezium will fail during maintenance.
This issue potentially affects 1.4 and up.