Uploaded image for project: 'Debezium'
  1. Debezium
  2. DBZ-3479

SQL Server CDC event timestamps do not get converted to UTC

    XMLWordPrintable

Details

    • False
    • False
    • Undefined
    • Hide
      1. Start the test SQL Server container in a timezone different from UTC:
        diff --git a/debezium-connector-sqlserver/pom.xml b/debezium-connector-sqlserver/pom.xml
        index fa7fa447b..bea9ef34b 100644
        --- a/debezium-connector-sqlserver/pom.xml
        +++ b/debezium-connector-sqlserver/pom.xml
        @@ -115,6 +115,7 @@
                                             <SA_PASSWORD>${sqlserver.password}</SA_PASSWORD>
                                             <MSSQL_PID>Standard</MSSQL_PID>
                                             <MSSQL_AGENT_ENABLED>true</MSSQL_AGENT_ENABLED>
        +                                    <TZ>America/Los_Angeles</TZ>
                                         </env>
                                         <ports>
                                             <port>${sqlserver.port}:1433</port>
           
      1. Run some integration tests to populate the CDC schema.
      2. Run the following query in the test database:
      SELECT tran_begin_time,
      CURRENT_TIMEZONE() AS TZ,
      tran_begin_time AT TIME ZONE 'UTC' AS UTC
      FROM testDB.cdc.lsn_time_mapping;
      
      tran_begin_time         TZ                                      UTC
      ----------------------- --------------------------------------- ------------------------------
      2021-04-28 16:26:36.387 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 16:26:36.387 +00:00
      2021-04-28 16:26:36.360 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 16:26:36.360 +00:00
      2021-04-28 16:26:43.293 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 16:26:43.293 +00:00
      2021-04-28 16:26:45.347 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 16:26:45.347 +00:00
      2021-04-28 16:26:45.333 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 16:26:45.333 +00:00
      2021-04-28 16:26:45.567 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 16:26:45.567 +00:00
      2021-04-28 16:31:50.070 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 16:31:50.070 +00:00
      2021-04-28 16:36:55.047 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 16:36:55.047 +00:00
      

      As you can see, the values in the first and the third columns are the same with the exception of the +00:00 timezone offset added that corresponds to UTC, not PDT.

      A proper solution might be to use a conversion like the following:

      TODATETIMEOFFSET(?, DATEPART(TZOFFSET, SYSDATETIMEOFFSET())) AT TIME ZONE 'UTC'
      

      In this case, the result will look as expected:

      1> SELECT tran_begin_time,
      2> CURRENT_TIMEZONE() AS TZ,
      3> TODATETIMEOFFSET(
      4> tran_begin_time,
      5> DATEPART(TZOFFSET, SYSDATETIMEOFFSET())
      6> ) AT TIME ZONE 'UTC' AS UTC
      7> FROM testDB.cdc.lsn_time_mapping;
      8> GO
      tran_begin_time         TZ                                      UTC
      ----------------------- --------------------------------------- ------------------------------
      2021-04-28 16:26:36.387 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 23:26:36.387 +00:00
      2021-04-28 16:26:36.360 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 23:26:36.360 +00:00
      2021-04-28 16:26:43.293 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 23:26:43.293 +00:00
      2021-04-28 16:26:45.347 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 23:26:45.347 +00:00
      2021-04-28 16:26:45.333 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 23:26:45.333 +00:00
      2021-04-28 16:26:45.567 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 23:26:45.567 +00:00
      2021-04-28 16:31:50.070 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 23:31:50.070 +00:00
      2021-04-28 16:36:55.047 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 23:36:55.047 +00:00
      2021-04-28 16:42:00.057 (UTC-08:00) Pacific Time (US & Canada)  2021-04-28 23:42:00.057 +00:00
      
      (9 rows affected)
      
      Notes
      1. The proposed solution above is a result of quick googling and may need validation and/or optimization.
      2. The solution above is still error-prone since the conversion will not work as expected if the server timezone changes (e.g. due to DST changes) between a CDC record has been produced and consumed (ref).
      Show
      Start the test SQL Server container in a timezone different from UTC: diff --git a/debezium-connector-sqlserver/pom.xml b/debezium-connector-sqlserver/pom.xml index fa7fa447b..bea9ef34b 100644 --- a/debezium-connector-sqlserver/pom.xml +++ b/debezium-connector-sqlserver/pom.xml @@ -115,6 +115,7 @@ <SA_PASSWORD>${sqlserver.password}</SA_PASSWORD> <MSSQL_PID>Standard</MSSQL_PID> <MSSQL_AGENT_ENABLED> true </MSSQL_AGENT_ENABLED> + <TZ>America/Los_Angeles</TZ> </env> <ports> <port>${sqlserver.port}:1433</port> Run some integration tests to populate the CDC schema. Run the following query in the test database: SELECT tran_begin_time, CURRENT_TIMEZONE() AS TZ, tran_begin_time AT TIME ZONE 'UTC' AS UTC FROM testDB.cdc.lsn_time_mapping; tran_begin_time TZ UTC ----------------------- --------------------------------------- ------------------------------ 2021-04-28 16:26:36.387 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 16:26:36.387 +00:00 2021-04-28 16:26:36.360 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 16:26:36.360 +00:00 2021-04-28 16:26:43.293 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 16:26:43.293 +00:00 2021-04-28 16:26:45.347 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 16:26:45.347 +00:00 2021-04-28 16:26:45.333 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 16:26:45.333 +00:00 2021-04-28 16:26:45.567 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 16:26:45.567 +00:00 2021-04-28 16:31:50.070 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 16:31:50.070 +00:00 2021-04-28 16:36:55.047 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 16:36:55.047 +00:00 As you can see, the values in the first and the third columns are the same with the exception of the +00:00 timezone offset added that corresponds to UTC, not PDT. A proper solution might be to use a conversion like the following: TODATETIMEOFFSET(?, DATEPART(TZOFFSET, SYSDATETIMEOFFSET())) AT TIME ZONE 'UTC' In this case, the result will look as expected: 1> SELECT tran_begin_time, 2> CURRENT_TIMEZONE() AS TZ, 3> TODATETIMEOFFSET( 4> tran_begin_time, 5> DATEPART(TZOFFSET, SYSDATETIMEOFFSET()) 6> ) AT TIME ZONE 'UTC' AS UTC 7> FROM testDB.cdc.lsn_time_mapping; 8> GO tran_begin_time TZ UTC ----------------------- --------------------------------------- ------------------------------ 2021-04-28 16:26:36.387 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 23:26:36.387 +00:00 2021-04-28 16:26:36.360 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 23:26:36.360 +00:00 2021-04-28 16:26:43.293 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 23:26:43.293 +00:00 2021-04-28 16:26:45.347 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 23:26:45.347 +00:00 2021-04-28 16:26:45.333 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 23:26:45.333 +00:00 2021-04-28 16:26:45.567 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 23:26:45.567 +00:00 2021-04-28 16:31:50.070 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 23:31:50.070 +00:00 2021-04-28 16:36:55.047 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 23:36:55.047 +00:00 2021-04-28 16:42:00.057 (UTC-08:00) Pacific Time (US & Canada) 2021-04-28 23:42:00.057 +00:00 (9 rows affected) Notes The proposed solution above is a result of quick googling and may need validation and/or optimization. The solution above is still error-prone since the conversion will not work as expected if the server timezone changes (e.g. due to DST changes) between a CDC record has been produced and consumed ( ref ).

    Description

      As a solution to DBZ-1643, the SQL Server connector attempts to convert transaction times to UTC using the AT TIME ZONE 'UTC' expression.

      It doesn't seem to work because the lsn_time_mapping table stores only the local time of the transaction, and according to the SQL Server documentation,

      When inputdate is provided without offset information, the function applies the offset of the time zone assuming that inputdate is in the target time zone.

      As a result, if a server runs say in UTC-4, the connector metrics always show it as 4 hours behind the source.

      Attachments

        Activity

          People

            Unassigned Unassigned
            sergeimorozov Sergei Morozov
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: