Debezium / DBZ-9657

MongoDB connector: Whitespace in collection.include.list causes incorrect regex pattern in change streams

      Bug report

      When collection.include.list is configured as a multiline YAML value (using the `>-` folded block scalar) in a Strimzi KafkaConnector resource, YAML folds the line break into a single space, so the connector receives `db1.col1, db2.col2`. Debezium's MongoDB connector then generates a change stream aggregation pipeline whose regex pattern contains a space after the pipe (`|`) character. The pattern fails to match collection names listed after the first one, so change stream events for those collections are not captured.
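The failure mode can be reproduced outside the connector. The sketch below is an illustration only, not Debezium code. It assumes that the list is split on commas without trimming and that the entries are joined with `|`, which is consistent with the pattern in the logged pipeline but has not been confirmed against the connector source. Python's `re.search` with `re.IGNORECASE` stands in for MongoDB's unanchored, case-insensitive regex match.

```python
import re

# Value produced by the YAML `>-` folded scalar for the two-line entry:
# the line break between the entries is folded into a single space.
folded_value = "db1.col1, db2.col2"

# Assumed connector behavior: split on commas WITHOUT trimming, join with "|".
untrimmed_pattern = "|".join(folded_value.split(","))


def namespace_matches(pattern: str, namespace: str) -> bool:
    """Unanchored, case-insensitive match, mimicking {"options": "i"}."""
    return re.search(pattern, namespace, re.IGNORECASE) is not None


print(repr(untrimmed_pattern))                          # 'db1.col1| db2.col2'
print(namespace_matches(untrimmed_pattern, "db1.col1")) # True
print(namespace_matches(untrimmed_pattern, "db2.col2")) # False
```

The second alternative only matches a namespace that contains a literal space before `db2`, which a real namespace never does, so the collection is dropped without any error.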

      Note: We use Strimzi to manage the Debezium connector. 

      What Debezium connector do you use and what version?

      Debezium MongoDB connector 3.2.1.Final

      What is the connector configuration?

      As specified in the Strimzi YAML:

      apiVersion: kafka.strimzi.io/v1beta2
      kind: KafkaConnector
      metadata:
        name: realtime
        namespace: kafka
        labels:
          strimzi.io/cluster: stage-cluster-1
      spec:
        class: io.debezium.connector.mongodb.MongoDbConnector
        tasksMax: 1
        state: running
        config:
          topic.prefix: realtime
          database.include.list: db1
          collection.include.list: >-
            db1.col1,
            db2.col2
          key.converter: org.apache.kafka.connect.json.JsonConverter
          # other configs

      What is the captured database version and mode of deployment?

      Database: MongoDB 

      Deployment

      • Kubernetes cluster
      • Strimzi Kafka Connect operator (kafka.strimzi.io/v1beta2)
      • Connector deployed as KafkaConnector custom resource

      What behavior do you expect?

      The connector should handle whitespace in comma-separated configuration values gracefully, trimming whitespace from collection names before generating regex patterns, regardless of how the configuration is provided (multiline YAML via Strimzi, single-line with spaces, or single-line without spaces).

      • Single-line with a space after the comma: collection.include.list: db1.col1, db2.col2
      • Multiline YAML (when using Strimzi to manage the connector)
        collection.include.list: >-
           db1.col1,
           db2.col2
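The expected normalization can be sketched as follows. `build_include_pattern` is a hypothetical helper written for this report, not an existing Debezium function, and it assumes the entries are meant to be joined with `|` as in the logged pattern. It trims each comma-separated entry and drops empty ones, so all the input styles listed above yield the same pattern.

```python
def build_include_pattern(include_list: str) -> str:
    """Hypothetical helper: trim each entry, drop empty ones, join with '|'."""
    entries = [entry.strip() for entry in include_list.split(",")]
    return "|".join(entry for entry in entries if entry)


variants = [
    "db1.col1,db2.col2",        # single line, no spaces
    "db1.col1, db2.col2",       # space after the comma (also what `>-` produces)
    "db1.col1,\n  db2.col2\n",  # literal newlines and indentation
]

# Every variant collapses to the same pattern.
patterns = {build_include_pattern(value) for value in variants}
print(patterns)  # {'db1.col1|db2.col2'}
```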

      What behavior do you see?

      1. Only the first collection (db1.col1) is captured - change stream events are received for this collection
      2. The second collection (db2.col2) is NOT captured - no change stream events are received
      3. No errors are thrown - this is a silent failure
      4. The generated regex pattern includes a space after the pipe character

      Do you see the same behaviour using the latest released Debezium version?

      Checked on version 3.2.1.Final only; the issue is very likely reproducible on the latest version too.

      Do you have the connector logs, ideally from start till finish?

       

      [2025-11-09 13:05:07,342] INFO [realtime-stream-test|task-0] Effective change stream pipeline: 
      [{"$replaceRoot": {"newRoot": {"namespace": {"$concat": ["$ns.db", ".", "$ns.coll"]}, "event": "$$ROOT"}}}, 
      {"$match": {"$and": [{"$or": [{"$and": [{"event.ns.db": {"$regularExpression": {"pattern": "db1", "options": "i"}}}, 
      {"namespace": {"$regularExpression": {"pattern": "db1.col1| db2.col2", "options": "i"}}}]}, 
      {"namespace": "db1.debezium_signaling"}]}, {"event.operationType": {"$in": ["insert", "update", "replace", "delete"]}}]}}, 
      {"$replaceRoot": {"newRoot": "$event"}}]

      Note the space after `|` in the pattern:

      {"pattern": "db1.col1| db2.col2", "options": "i"}
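Because the failure is silent, one way to spot it is to scan the "Effective change stream pipeline" log line for regex alternatives padded with whitespace. The sketch below is a diagnostic aid under stated assumptions, not part of Debezium: the function names are hypothetical, the input is a trimmed-down copy of the `$match` stage logged above, and the naive split on `|` ignores escaped pipes and character classes.

```python
import json

# Trimmed-down copy of the "$match" stage from the logged pipeline.
pipeline_json = '''
[{"$match": {"$and": [
  {"event.ns.db": {"$regularExpression": {"pattern": "db1", "options": "i"}}},
  {"namespace": {"$regularExpression": {"pattern": "db1.col1| db2.col2", "options": "i"}}}
]}}]
'''


def find_regex_patterns(node):
    """Recursively yield every "$regularExpression" pattern in the pipeline."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$regularExpression" and isinstance(value, dict):
                yield value.get("pattern", "")
            else:
                yield from find_regex_patterns(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_regex_patterns(item)


def has_padded_alternative(pattern: str) -> bool:
    """True if any '|'-separated alternative starts or ends with whitespace."""
    return any(alt != alt.strip() for alt in pattern.split("|"))


suspicious = [
    pattern
    for pattern in find_regex_patterns(json.loads(pipeline_json))
    if has_padded_alternative(pattern)
]
print(suspicious)  # ['db1.col1| db2.col2']
```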

       

      Assignee: Unassigned
      Reporter: Sumit Jha (sumitjha4321@gmail.com)