  1. Debezium
  2. DBZ-5920

Ingestion issues with MongoDB when empty [] or empty {} appear in the JSON feed


Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • 2.7.0.Alpha1
    • 1.9.6.Final
    • mongodb-connector
    • None
    • False
    • None
    • False
    • Moderate

    Description

      In order to make your issue reports as actionable as possible, please provide the following information, depending on the issue type.

      Bug report

      For bug reports, provide this information, please:

       

      What Debezium connector do you use and what version?

      1.9.6

      What is the connector configuration?

      {
        "name": "mongo-connector",
        "config": {
          "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
          "tasks.max": "1",
          "mongodb.hosts": "rs0/mongo:27017",
          "mongodb.name": "mongodbserver1",
          "mongodb.user": "root",
          "mongodb.password": "root",
          "mongodb.server.selection.timeout.ms": "60000000",
          "transforms": "unwrap",
          "transforms.unwrap.type": "io.debezium.connector.mongodb.transforms.ExtractNewDocumentState",
          "transforms.unwrap.sanitize.field.names": true,
          "transforms.unwrap.drop.tombstones": "false",
          "transforms.unwrap.delete.handling.mode": "drop",
          "transforms.unwrap.add.headers": "op"
        }
      }
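For anyone reproducing this, the configuration above can be submitted to Kafka Connect's REST API with a POST to `/connectors`. The sketch below only builds the request; the URL `http://localhost:8083/connectors` and the helper name `build_registration_request` are assumptions for illustration and are not taken from the report.

```python
import json
import urllib.request

# Assumption: a Kafka Connect worker exposes its REST API at this address.
CONNECT_URL = "http://localhost:8083/connectors"

# Connector definition copied from the report.
connector = {
    "name": "mongo-connector",
    "config": {
        "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
        "tasks.max": "1",
        "mongodb.hosts": "rs0/mongo:27017",
        "mongodb.name": "mongodbserver1",
        "mongodb.user": "root",
        "mongodb.password": "root",
        "mongodb.server.selection.timeout.ms": "60000000",
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.connector.mongodb.transforms.ExtractNewDocumentState",
        "transforms.unwrap.sanitize.field.names": True,
        "transforms.unwrap.drop.tombstones": "false",
        "transforms.unwrap.delete.handling.mode": "drop",
        "transforms.unwrap.add.headers": "op",
    },
}


def build_registration_request(payload, url=CONNECT_URL):
    """Build (but do not send) the POST request that registers a connector."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_registration_request(connector)

# To register the connector against a running worker, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the payload before a Connect worker is available.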
      

       

      What is the captured database version and mode of deployment?

      (E.g. on-premises, with a specific cloud provider, etc.)

      Local MongoDB 4.4, deployed as a replica set.

      What behaviour do you expect?

      Ingestion should not fail when a document contains an empty array ([]) or an empty object ({}).

      What behaviour do you see?

      When a document contains an empty [] or {}, the ExtractNewDocumentState transform throws a DataException and the connector task is killed.

      Do you see the same behaviour using the latest released Debezium version?

      (Ideally, also verify with latest Alpha/Beta/CR version)

      Didn't try it

      Do you have the connector logs, ideally from start till finish?

      (You might be asked later to provide DEBUG/TRACE level log)

      For the {} case I only have a partial snippet (I could not capture the full log):

      Caused by: org.apache.kafka.connect.errors.DataException: Failed to find field 'attribute_values' in schema mongodbserver1.mydata.release.media.tracks.recording.relations at io.debezium.connector.mongodb.transforms.MongoDataConverter.convertFieldValue(MongoDataConverter.java:144) ....

      and a more complete log for the [] case:

       

      2022-12-08 11:30:43,265 WARN   ||  [Producer clientId=connector-producer-mongo-connector-0] Error while fetching metadata with correlation id 57 : {mongodbserver1.music_brainz_api.release=LEADER_NOT_AVAILABLE}   [org.apache.kafka.clients.NetworkClient]
      2022-12-08 11:30:49,394 WARN   ||  Field 'offset-count' name potentially not safe for serialization, replaced with 'offset_count'   [io.debezium.schema.FieldNameSelector$FieldNameSanitizer]
      2022-12-08 11:30:49,395 WARN   ||  Field 'catalog-number' name potentially not safe for serialization, replaced with 'catalog_number'   [io.debezium.schema.FieldNameSelector$FieldNameSanitizer]
      2022-12-08 11:30:49,395 WARN   ||  Field 'label-code' name potentially not safe for serialization, replaced with 'label_code'   [io.debezium.schema.FieldNameSelector$FieldNameSanitizer]
      2022-12-08 11:31:00,493 INFO   ||  53 records sent during previous 00:01:18.626, last recorded offset: {sec=1670499060, ord=1, transaction_id=null, resume_token=826391CAF4000000012B022C0100296E5A100463BEA335932C4C6D83991DE596D21920463C5F6964003C34386663303665312D663861342D343963342D613736322D323438353139623238383939000004, h=null}   [io.debezium.connector.common.BaseSourceTask]
      2022-12-08 11:31:07,338 INFO   MongoDB|mongodbserver1|disc  Checking current members of replica set at rs0/mongo:27017   [io.debezium.connector.mongodb.ReplicaSetDiscovery]
      2022-12-08 11:31:16,406 ERROR  ||  WorkerSourceTask{id=mongo-connector-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted   [org.apache.kafka.connect.runtime.WorkerTask]
      org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
              at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:223)
              at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:149)
              at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
              at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:355)
              at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:258)
              at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
              at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
              at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
              at java.base/java.lang.Thread.run(Thread.java:829)
      Caused by: org.apache.kafka.connect.errors.DataException: isrcs is not a valid field name
              at org.apache.kafka.connect.data.Struct.lookupField(Struct.java:254)
              at org.apache.kafka.connect.data.Struct.put(Struct.java:202)
              at io.debezium.connector.mongodb.transforms.MongoDataConverter.convertFieldValue(MongoDataConverter.java:214)
              at io.debezium.connector.mongodb.transforms.MongoDataConverter.convertFieldValue(MongoDataConverter.java:151)
              at io.debezium.connector.mongodb.transforms.MongoDataConverter.convertFieldValue(MongoDataConverter.java:265)
              at io.debezium.connector.mongodb.transforms.MongoDataConverter.lambda$convertFieldValue$0(MongoDataConverter.java:189)
              at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
              at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
              at io.debezium.connector.mongodb.transforms.MongoDataConverter.convertFieldValue(MongoDataConverter.java:181)
              at io.debezium.connector.mongodb.transforms.MongoDataConverter.convertFieldValue(MongoDataConverter.java:265)
              at io.debezium.connector.mongodb.transforms.MongoDataConverter.lambda$convertFieldValue$0(MongoDataConverter.java:189)
              at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
              at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
              at io.debezium.connector.mongodb.transforms.MongoDataConverter.convertFieldValue(MongoDataConverter.java:181)
              at io.debezium.connector.mongodb.transforms.MongoDataConverter.convertRecord(MongoDataConverter.java:60)
              at io.debezium.connector.mongodb.transforms.ExtractNewDocumentState.newRecord(ExtractNewDocumentState.java:324)
              at io.debezium.connector.mongodb.transforms.ExtractNewDocumentState.apply(ExtractNewDocumentState.java:264)
              at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
              at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:173)
              at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:207)
              ... 11 more
      2022-12-08 11:31:16,406 INFO   ||  Stopping down connector   [io.debezium.connector.common.BaseSourceTask]
      2022-12-08 11:31:17,119 INFO   MongoDB|mongodbserver1|streaming  Closing all connections to rs0/mongo:27017   [io.debezium.connector.mongodb.ConnectionContext]
      2022-12-08 11:31:17,126 INFO   MongoDB|mongodbserver1|streaming  Finished streaming   [io.debezium.pipeline.ChangeEventSourceCoordinator]
      2022-12-08 11:31:17,127 INFO   MongoDB|mongodbserver1|streaming  Connected metrics set to 'false'   [io.debezium.pipeline.ChangeEventSourceCoordinator]
      2022-12-08 11:31:17,128 INFO   ||  [Producer clientId=connector-producer-mongo-connector-0] Closing the Kafka producer with timeoutMillis = 30000 ms.   [org.apache.kafka.clients.producer.KafkaProducer]
      2022-12-08 11:31:17,131 INFO   ||  Metrics scheduler closed   [org.apache.kafka.common.metrics.Metrics]
      2022-12-08 11:31:17,131 INFO   ||  Closing reporter org.apache.kafka.common.metrics.JmxReporter   [org.apache.kafka.common.metrics.Metrics]
      2022-12-08 11:31:17,131 INFO   ||  Metrics reporters closed   [org.apache.kafka.common.metrics.Metrics]
      2022-12-08 11:31:17,131 INFO   ||  App info kafka.producer for connector-producer-mongo-connector-0 unregistered   [org.apache.kafka.common.utils.AppInfoParser]
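The `isrcs is not a valid field name` message in the trace above comes from `Struct.put`, which rejects a value whose field name is missing from the struct's schema. The toy model below illustrates that class of failure only. It is not Debezium's or Kafka Connect's code: `Struct`, `DataException`, `infer_fields`, and `convert` are simplified stand-ins, and the idea that the schema pass skips empty containers is an unverified assumption about the cause.

```python
class DataException(Exception):
    """Stand-in for org.apache.kafka.connect.errors.DataException."""


class Struct:
    """Toy schema-checked record: only names declared in the schema may be set."""

    def __init__(self, schema_fields):
        self.schema_fields = set(schema_fields)
        self.values = {}

    def put(self, name, value):
        if name not in self.schema_fields:
            raise DataException(f"{name} is not a valid field name")
        self.values[name] = value
        return self


def is_empty_container(value):
    """True for an empty list or an empty dict."""
    return isinstance(value, (list, dict)) and len(value) == 0


def infer_fields(document):
    """Hypothetical schema pass: skips empty containers (assumed behaviour)."""
    return [key for key, value in document.items() if not is_empty_container(value)]


def convert(document):
    """Value pass: writes every field, including those the schema pass skipped."""
    struct = Struct(infer_fields(document))
    for key, value in document.items():
        struct.put(key, value)
    return struct


try:
    convert({"title": "GI Blues", "isrcs": []})
except DataException as exc:
    print(exc)
```

If the two passes disagree about which fields exist, the value pass fails on the first field the schema pass left out, which matches the shape of the reported error.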

       

       

      How to reproduce the issue using our tutorial deployment?

      A simple unit or integration test should be enough to reproduce it. For the {} case, use the following JSON (truncated at the end); the failure is tied to the empty `attribute-values` object at the bottom:

       

      {
              "_id" : "987f3e2d-22a6-4a4f-b840-c80c26b8b91a",
              "quality" : "normal",
              "date" : "2019-06-14",
              "asin" : "null",
              "status-id" : "4e304316-386d-3409-af2e-78857eec5cfe",
              "status" : "Official",
              "disambiguation" : "",
              "text-representation" : {
                      "script" : "Latn",
                      "language" : "eng"
              },
              "relations" : [ ],
              "release-events" : [
                      {
                              "date" : "2019-06-14",
                              "area" : {
                                      "name" : "[Worldwide]",
                                      "disambiguation" : "",
                                      "id" : "525d4e18-3d00-31b9-a58b-a146a916de8f",
                                      "type" : "null",
                                      "type-id" : "null",
                                      "sort-name" : "[Worldwide]",
                                      "iso-3166-1-codes" : [
                                              "XW"
                                      ]
                              }
                      }
              ],
              "packaging-id" : "119eba76-b343-3e02-a292-f0f00644bb9b",
              "country" : "XW",
              "media" : [
                      {
                              "title" : "",
                              "tracks" : [
                                      {
                                              "id" : "33781879-1dae-422c-a634-b26f89705e48",
                                              "position" : "1",
                                              "title" : "Become Desert",
                                              "length" : "2422450",
                                              "number" : "1",
                                              "recording" : {
                                                      "first-release-date" : "2019-06-14",
                                                      "title" : "Become Desert",
                                                      "length" : "2422450",
                                                      "disambiguation" : "",
                                                      "id" : "d90bc0ff-c7d9-4c09-a12b-d46f46f7281d",
                                                      "artist-credit" : [
                                                              {
                                                                      "artist" : {
                                                                              "sort-name" : "Seattle Symphony",
                                                                              "type" : "Orchestra",
                                                                              "type-id" : "a0b36c92-3eb1-3839-a4f9-4799823f54a5",
                                                                              "name" : "Seattle Symphony",
                                                                              "disambiguation" : "",
                                                                              "id" : "0b51c328-1f2b-464c-9e2c-0c2a8cce20ae"
                                                                      },
                                                                      "joinphrase" : ", ",
                                                                      "name" : "Seattle Symphony"
                                                              }
                                                      ],
                                                      "video" : "false",
                                                      "relations" : [
                                                              {
                                                                      "target-type" : "artist",
                                                                      "type-id" : "234670ce-5f22-4fd0-921b-ef1662695c5d",
                                                                      "type" : "conductor",
                                                                      "target-credit" : "",
                                                                      "attribute-values" : {
                                                                      },
                                                                      ...etc...
      

      and the following JSON snippet for the [] case, where the failure is tied to the empty `isrcs` array at the end:

      {
              "_id" : "86d82194-ade8-4822-ba20-1c37703ed19f",
              "status-id" : "4e304316-386d-3409-af2e-78857eec5cfe",
              "quality" : "normal",
              "release-group" : {
                      "first-release-date" : "",
                      "primary-type-id" : "null",
                      "primary-type" : "null",
                      "secondary-types" : [ ],
                      "title" : "The Treatment",
                      "disambiguation" : "",
                      "id" : "4634be1c-553b-3b2b-8340-c8d83f4879f9",
                      "secondary-type-ids" : [ ],
                      "artist-credit" : [
                              {
                                      "artist" : {
                                              "name" : "The Treatment",
                                              "id" : "694c8123-cae2-432b-ae4b-5c8d9c409c41",
                                              "disambiguation" : "unknown, album 'The Treatment'",
                                              "type-id" : "null",
                                              "type" : "null",
                                              "sort-name" : "The Treatment"
                                      },
                                      "joinphrase" : "",
                                      "name" : "The Treatment"
                              }
                      ]
              },
              "asin" : "null",
              "status" : "Official",
              "text-representation" : {
                      "language" : "eng",
                      "script" : "Latn"
              },
              "disambiguation" : "",
              "media" : [
                      {
                              "track-count" : "12",
                              "position" : "1",
                              "format" : "null",
                              "title" : "",
                              "tracks" : [
                                      {
                                              "artist-credit" : [
                                                      {
                                                              "artist" : {
                                                                      "name" : "The Treatment",
                                                                      "disambiguation" : "unknown, album 'The Treatment'",
                                                                      "id" : "694c8123-cae2-432b-ae4b-5c8d9c409c41",
                                                                      "sort-name" : "The Treatment",
                                                                      "type-id" : "null",
                                                                      "type" : "null"
                                                              },
                                                              "joinphrase" : "",
                                                              "name" : "The Treatment"
                                                      }
                                              ],
                                              "recording" : {
                                                      "title" : "GI Blues",
                                                      "length" : "119920",
                                                      "disambiguation" : "",
                                                      "id" : "8c51f247-33a6-42b4-ac76-59e51ef45ffe",
                                                      "video" : "false",
                                                      "artist-credit" : [
                                                              {
                                                                      "artist" : {
                                                                              "sort-name" : "The Treatment",
                                                                              "type-id" : "null",
                                                                              "type" : "null",
                                                                              "id" : "694c8123-cae2-432b-ae4b-5c8d9c409c41",
                                                                              "disambiguation" : "unknown, album 'The Treatment'",
                                                                              "name" : "The Treatment"
                                                                      },
                                                                      "joinphrase" : "",
                                                                      "name" : "The Treatment"
                                                              }
                                                      ],
                                                      "isrcs" : [ ]
                                              },
      ....etc
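To find candidate fields in larger documents, a small helper can list every empty array or object by its dotted path. This is a diagnostic sketch, not part of Debezium. The function name `empty_container_paths` is hypothetical, and the sample document is a reduced form of the second document above rather than the full payload. Array elements add no path segment, so the output lines up with the schema name seen in the error (`...media.tracks.recording.relations`).

```python
import json


def empty_container_paths(value, path=""):
    """Yield the dotted path of every empty list or dict nested inside value.

    Array elements do not add a path segment.
    """
    if isinstance(value, (dict, list)) and not value:
        yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            yield from empty_container_paths(child, child_path)
    elif isinstance(value, list):
        for item in value:
            yield from empty_container_paths(item, path)


# Reduced form of the second document in this report (most fields omitted).
document = json.loads("""
{
  "_id": "86d82194-ade8-4822-ba20-1c37703ed19f",
  "release-group": {"secondary-types": [], "secondary-type-ids": []},
  "media": [
    {"tracks": [{"recording": {"title": "GI Blues", "isrcs": []}}]}
  ]
}
""")

paths = list(empty_container_paths(document))
for found in paths:
    print(found)
```

The helper only reports where empty containers occur. Whether each one triggers the failure still has to be checked against the connector.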

       

      Feature request or enhancement

      For feature requests or enhancements, provide this information, please:

      Which use case/requirement will be addressed by the proposed feature?

      Ingestion of data from a public API via MongoDB and Debezium

      Implementation ideas (optional)

      A unit or integration test that reproduces the issue, followed by a fix, might be enough

          People

            Unassigned
            dfrancesconi Daniele Francesconi (Inactive)
            Avinash Dongre