Debezium / DBZ-3836

Timeout when reading from MongoDB oplog cannot be controlled

Details

    Description

      When several connectors compete for the oplog, the oplog cursor can sometimes take a long time to fetch the next position. The read then fails with:

      com.mongodb.MongoExecutionTimeoutException: operation exceeded time limit
          at com.mongodb.internal.connection.ProtocolHelper.createSpecialException(ProtocolHelper.java:239)
          at com.mongodb.internal.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:171)
          at com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:358)
          at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:279)
          at com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:100)
          at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:490)
          at com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:71)
          at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:253)
          at com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:202)
          at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:118)
          at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:110)
          at com.mongodb.internal.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:268)
          at com.mongodb.internal.operation.QueryBatchCursor.tryHasNext(QueryBatchCursor.java:219)
          at com.mongodb.internal.operation.QueryBatchCursor.tryNext(QueryBatchCursor.java:203)
          at com.mongodb.client.internal.MongoBatchCursorAdapter.tryNext(MongoBatchCursorAdapter.java:74)
          at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.readOplog(MongoDbStreamingChangeEventSource.java:210)
          at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.lambda$streamChangesForReplicaSet$0(MongoDbStreamingChangeEventSource.java:106)
          at io.debezium.connector.mongodb.ConnectionContext$MongoPrimary.execute(ConnectionContext.java:288)
          at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.streamChangesForReplicaSet(MongoDbStreamingChangeEventSource.java:105)
          at io.debezium.connector.mongodb.MongoDbStreamingChangeEventSource.execute(MongoDbStreamingChangeEventSource.java:87)
          at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:140)
          at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:113)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          at java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.lang.Thread.run(Thread.java:834)
      

      As far as I can see, there is no way to configure the timeout for this specific cursor. When a timeout occurs, the connector does not crash; it retries, and because of the retrying it runs into the same timeout more and more often.

      Some context: we are running 13 Debezium connectors that all read the oplog. This is almost certainly creating contention, but we have no way to reorganize the setup right now. The exception above occurs about 400 times per day.

      Would it be possible to either make this timeout configurable or build a retry mechanism around the oplog read?
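To make the second option concrete, here is a minimal sketch of what a retry wrapper around the oplog read could look like. It is not Debezium's implementation: the class `RetryingRead`, the method `withRetry`, and all of its parameters are hypothetical names, and `java.util.concurrent.TimeoutException` stands in for `com.mongodb.MongoExecutionTimeoutException` so that the example runs without the MongoDB driver on the classpath.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

// Hypothetical helper, not part of Debezium or the MongoDB driver.
final class RetryingRead {

    /**
     * Calls op until it succeeds, the failure is not retriable, or
     * maxAttempts is reached. The pause between attempts doubles each
     * time and is capped at maxDelay.
     */
    public static <T> T withRetry(Callable<T> op,
                                  Predicate<Exception> retriable,
                                  int maxAttempts,
                                  Duration initialDelay,
                                  Duration maxDelay) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelay.toMillis();
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Give up on the last attempt or on a non-retriable failure.
                if (attempt >= maxAttempts || !retriable.test(e)) {
                    throw e;
                }
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelay.toMillis());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated cursor read: times out twice, then returns an event.
        // TimeoutException is a stand-in for MongoExecutionTimeoutException.
        String event = withRetry(
                () -> {
                    if (++calls[0] < 3) {
                        throw new TimeoutException("operation exceeded time limit");
                    }
                    return "next oplog event";
                },
                e -> e instanceof TimeoutException,
                5,
                Duration.ofMillis(10),
                Duration.ofMillis(100));
        System.out.println(event + " after " + calls[0] + " attempts");
    }
}
```

Backing off between attempts matters in the situation described above: with 13 connectors contending for the oplog, immediate retries would likely add to the contention rather than relieve it. The attempt limit and delay values here are placeholders and would presumably be exposed as connector configuration.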

          People

            ccranfor@redhat.com Chris Cranford
            frankkoornstra Frank Koornstra (Inactive)