Debezium / DBZ-7389

Oracle connector unable to find SCN after Exadata maintenance updates

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version: 2.6.0.CR1
    • Affects Versions: 2.4.0.Final, 2.5.0.Final, 2.6.0.Alpha1
    • Component: oracle-connector
    • Labels: None
    • Severity: Critical

      What Debezium connector do you use and what version?

      Debezium Server's Oracle Connector (nightly - after PR #5162 had been merged)

      What is the connector configuration?

       

      debezium.sink.type=pubsub
      debezium.sink.pubsub.project.id=bee-data-ingestion
      debezium.sink.pubsub.ordering.enabled=false
      
      debezium.source.connector.class=io.debezium.connector.oracle.OracleConnector
      debezium.source.snapshot.mode=schema_only
      debezium.source.topic.prefix=oracle-planck-ingestion
      debezium.source.tombstones.on.delete=false
      
      debezium.source.log.mining.strategy=redo_log_catalog
      debezium.source.log.mining.batch.size.min=20000
      debezium.source.log.mining.batch.size.max=500000
      debezium.source.log.mining.sleep.time.default.ms=600
      debezium.source.log.mining.transaction.retention.ms=79200000
      debezium.source.query.fetch.size=20000
      
      debezium.source.offset.storage=io.debezium.storage.redis.offset.RedisOffsetBackingStore
      debezium.source.offset.storage.redis.address=wl-data-integration-npf-redis.metaplane.cloud:6379
      debezium.source.offset.storage.redis.password=${REDIS_PASSWORD}
      debezium.source.offset.storage.redis.key=database-ingestion:oracle-planck-cdc:debezium-server:offset
      debezium.source.offset.flush.interval.ms=30000
      
      debezium.source.schema.history.internal.store.only.captured.tables.ddl=true
      debezium.source.schema.history.internal=io.debezium.storage.redis.history.RedisSchemaHistory
      debezium.source.schema.history.internal.redis.address=wl-data-integration-npf-redis.metaplane.cloud:6379
      debezium.source.schema.history.internal.redis.password=${REDIS_PASSWORD}
      debezium.source.schema.history.internal.redis.key=database-ingestion:oracle-planck-cdc:debezium-server:schema_history
      
      debezium.source.decimal.handling.mode=string
      debezium.source.key.converter.schemas.enable=false
      debezium.source.value.converter.schemas.enable=false
      
      debezium.transforms.Reroute.type=io.debezium.transforms.ByLogicalTableRouter
      debezium.transforms.Reroute.topic.regex=.*
      debezium.transforms.Reroute.topic.replacement=oracle-planck-ingestion
      debezium.transforms=Reroute
      
      quarkus.http.port=8080
      quarkus.log.level=DEBUG
      
      debezium.source.database.hostname=dbpkpr-scan.back.b2w
      debezium.source.database.port=1521
      debezium.source.database.dbname=SRV_OGG_PLK
      debezium.source.table.include.list=# (list of 132 tables)  

       

       

      What is the captured database version and mode of deployment?

      (E.g. on-premises, with a specific cloud provider, etc.)

      Oracle RAC 19c

      What behaviour do you expect?

      When the offset contains an SCN value that is available in the archived logs (that is, in a file not yet deleted from the current primary instance), the connector should be able to locate it and start a mining session from that point.

      What behaviour do you see?

      In DBZ-7345 I mentioned a few scenarios which seemed to trigger the Oracle connector's failure to locate the offset SCN. The logs for one of those cases (a database version upgrade) pointed to an issue with the removal of duplicate multi-thread sequences, which was promptly solved by ccranfor@redhat.com's PR #5162.

      However, while using a nightly image built after that fix was merged, we faced another instance of the same symptom: the "None of log files contains offset SCN" error. After a few of our databases went through an Exadata Cloud Infrastructure maintenance update, the Oracle connectors attached to them failed with this same error, even though those databases still contained the log files covering the respective SCNs and no duplicate sequences were involved.

      Again, it was only possible to avoid the error by manually replacing the offset SCNs. For one of the databases it was even possible to restart reading from a previous SCN with no further interruptions. The logs from this specific case can be found in the attached files (before and after offset repositioning).
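For reference, the manual repositioning amounts to rewriting the SCN fields inside the offset document stored under the Redis key configured above, then restarting the connector. A minimal sketch, assuming the offset value is the Oracle connector's usual JSON document with `scn` (and possibly `commit_scn`) fields; the exact serialization used by RedisOffsetBackingStore may differ:

```python
import json

def rewind_offset_scn(offset_value: str, new_scn: int) -> str:
    """Return a copy of a Debezium Oracle offset JSON with its SCN moved back.

    Assumes the offset document carries an 'scn' field (and optionally
    'commit_scn'), as the Oracle connector normally stores them.
    """
    offset = json.loads(offset_value)
    offset["scn"] = str(new_scn)
    # commit_scn can hold per-thread values; for illustration we simply
    # reset it to the same SCN.
    if "commit_scn" in offset:
        offset["commit_scn"] = str(new_scn)
    return json.dumps(offset)

# Example: move the failing offset back to the SCN that worked for us.
original = json.dumps({"scn": "10423209952132", "commit_scn": "10423209952132"})
repaired = rewind_offset_scn(original, 10423209745803)
print(repaired)
```

The field names above are assumptions drawn from typical Oracle connector offsets, not from this deployment's actual Redis contents.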

      Do you see the same behaviour using the latest released Debezium version?

      (Ideally, also verify with latest Alpha/Beta/CR version)

      Could not verify, since the issue happened between PR #5162's merge and the v2.6.0.Alpha1 release, and it is not easily reproducible. However, I believe none of the more recent PRs fix this, as they do not appear to be related.

      Do you have the connector logs, ideally from start till finish?

      (You might be asked later to provide DEBUG/TRACE level log)

      Yes, please find them attached as
      "planck_logs_nightly_scn_10423209952132_fail.txt", from when the error happened and
      "planck_logs_nightly_scn_10423209745803_success.txt", after offset SCN was manually moved back.

      In the first one, despite the error, the file that contains the offset SCN (10423209952132) was selected to be added, as shown in the following line:

      2024-01-21 12:54:10,539 DEBUG [io.deb.con.ora.log.LogMinerHelper] (debezium-oracleconnector-oracle-planck-ingestion-change-event-source-coordinator) Archive log +RECOC1/DB8PLKPR_65N_VCP/ARCHIVELOG/2024_01_21/thread_4_seq_65948.49404.1158802907 with SCN range 10423209919749 to 10423209995534 sequence 65948 to be added.

      Unlike the previous issue with inaccurate duplicate removal, sequence 65948 was not duplicated and no log files appear to have been removed.

      However, after restarting from a previous SCN (10423209745803), the exact same file ended up being added and the error did not occur.
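To make the contradiction concrete, the relevant selection criterion is a simple SCN-range containment check: a log file covers the offset SCN when first_change# <= SCN < next_change#. A quick sketch with the values from the log line above (the helper name is illustrative, not the connector's actual code):

```python
def log_contains_scn(first_change: int, next_change: int, scn: int) -> bool:
    """True if the archived log's SCN range [first_change, next_change) covers scn."""
    return first_change <= scn < next_change

# SCN range of thread_4_seq_65948 from the failing run's logs:
assert log_contains_scn(10423209919749, 10423209995534, 10423209952132)
# The offset SCN falls inside a file selected for mining, yet the
# "None of log files contains offset SCN" error was still raised.
```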

       

        1. mastersaff_archivelogs.xlsx
          37 kB
          Lucas Marques
        2. mastersaff_logs_fail.txt
          1.89 MB
          Lucas Marques
        3. planck_logs_nightly_scn_10423209745803_success.txt
          10.83 MB
          Lucas Marques
        4. planck_logs_nightly_scn_10423209952132_fail.txt
          4.07 MB
          Lucas Marques
        5. umbrella_threads_view.csv
          2 kB
          Lucas Marques

              Assignee: Chris Cranford (ccranfor@redhat.com)
              Reporter: Lucas Marques (lpmarques, Inactive)