-
Bug
-
Resolution: Done
-
Minor
Bug report
When `collection.include.list` is configured as a multiline YAML value (a folded block scalar, `>-`) in a Strimzi KafkaConnector resource, Debezium's MongoDB connector generates a change stream aggregation pipeline whose regex pattern contains a space after the pipe (`|`) character. YAML folds the line break between the entries into a single space, and that space is carried into the pattern. As a result, the pattern fails to match the collection names listed after the first one, and change stream events for those collections are not captured.
Note: We use Strimzi to manage the Debezium connector.
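To illustrate how the space probably ends up in the pattern, here is a minimal Python sketch. Both helpers are hypothetical: `fold_block_scalar` only approximates YAML `>-` folding for equally indented, non-empty lines, and `naive_pattern` is an assumption about the connector's behavior inferred from the logged pattern, not Debezium's actual code.

```python
def fold_block_scalar(lines):
    """Approximate YAML `>-` folding: single line breaks become one space,
    and the trailing newline is stripped (hypothetical helper)."""
    return " ".join(line.strip() for line in lines)


def naive_pattern(include_list):
    """Assumed behavior: split on commas and join with '|' without trimming."""
    return "|".join(include_list.split(","))


# The two lines under `collection.include.list: >-` in the resource.
value = fold_block_scalar(["      db1.col1,", "      db2.col2"])
pattern = naive_pattern(value)

print(repr(value))    # 'db1.col1, db2.col2'
print(repr(pattern))  # 'db1.col1| db2.col2'
```

The resulting string is the same as the pattern that appears in the connector log.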
What Debezium connector do you use and what version?
Debezium MongoDB connector 3.2.1.Final
What is the connector configuration?
As specified in Strimzi YAML
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: realtime
  namespace: kafka
  labels:
    strimzi.io/cluster: stage-cluster-1
spec:
  class: io.debezium.connector.mongodb.MongoDbConnector
  tasksMax: 1
  state: running
  config:
    topic.prefix: realtime
    database.include.list: db1
    collection.include.list: >-
      db1.col1,
      db2.col2
    key.converter: org.apache.kafka.connect.json.JsonConverter
    # other configs
What is the captured database version and mode of deployment?
Database: MongoDB
Deployment
- Kubernetes cluster
- Strimzi Kafka Connect operator (kafka.strimzi.io/v1beta2)
- Connector deployed as KafkaConnector custom resource
What behavior do you expect?
The connector should handle whitespace in comma-separated configuration values gracefully, trimming whitespace from collection names before generating regex patterns, regardless of how the configuration is provided (multiline YAML via Strimzi, single-line with spaces, or single-line without spaces).
- Single-line with a space after the comma: `collection.include.list: db1.col1, db2.col2`
- Multiline YAML (when using Strimzi to manage the connector):
  collection.include.list: >-
    db1.col1,
    db2.col2
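The expected handling could look roughly like the following Python sketch. The function name `build_namespace_pattern` is hypothetical and this is not the connector's implementation; it only shows that trimming each entry yields the same pattern for every input form.

```python
def build_namespace_pattern(include_list):
    """Split a comma-separated include list, trim each entry,
    drop empty entries, and join the rest with '|' (hypothetical helper)."""
    entries = (entry.strip() for entry in include_list.split(","))
    return "|".join(entry for entry in entries if entry)


# The same list supplied in three different ways.
inputs = [
    "db1.col1,db2.col2",    # single line, no spaces
    "db1.col1, db2.col2",   # single line with a space, or a folded `>-` value
    "db1.col1,\ndb2.col2",  # raw line break kept in the value
]
patterns = [build_namespace_pattern(value) for value in inputs]
print(patterns)
```

With trimming in place, all three inputs produce `db1.col1|db2.col2`.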
What behavior do you see?
- Only the first collection (db1.col1) is captured - change stream events are received for this collection
- The second collection (db2.col2) is NOT captured - no change stream events are received
- No errors are thrown - this is a silent failure
- The generated regex pattern includes a space after the pipe character
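The silent mismatch can be reproduced outside the connector. The sketch below uses Python's `re` module as a stand-in for MongoDB's regex engine, which is an assumption; for a pattern this simple the two should behave the same way. The `matches` helper is hypothetical.

```python
import re

# Pattern as it appears in the connector log (note the space after '|').
observed = re.compile("db1.col1| db2.col2", re.IGNORECASE)
# Pattern with the whitespace trimmed.
trimmed = re.compile("db1.col1|db2.col2", re.IGNORECASE)


def matches(pattern, namespace):
    """Unanchored match, like the $regularExpression filter in the pipeline."""
    return pattern.search(namespace) is not None


for namespace in ("db1.col1", "db2.col2"):
    print(namespace, matches(observed, namespace), matches(trimmed, namespace))
# db1.col1 True True
# db2.col2 False True
```

The second alternative in the observed pattern requires a literal leading space, so the namespace `db2.col2` never matches and its events are filtered out without any error.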
Do you see the same behaviour using the latest released Debezium version?
Checked on version 3.2.1.Final only; it is very likely reproducible on the latest version too.
Do you have the connector logs, ideally from start till finish?
[2025-11-09 13:05:07,342] INFO [realtime-stream-test|task-0] Effective change stream pipeline:
[{"$replaceRoot": {"newRoot": {"namespace": {"$concat": ["$ns.db", ".", "$ns.coll"]}, "event": "$$ROOT"}}},
{"$match": {"$and": [{"$or": [{"$and": [{"event.ns.db": {"$regularExpression": {"pattern": "db1", "options": "i"}}},
{"namespace": {"$regularExpression": {"pattern": "db1.col1| db2.col2", "options": "i"}}}]},
{"namespace": "db1.debezium_signaling"}]}, {"event.operationType": {"$in": ["insert", "update", "replace", "delete"]}}]}},
{"$replaceRoot": {"newRoot": "$event"}}]
Please notice the space after `|` in `{"pattern": "db1.col1| db2.col2", "options": "i"}`.