• Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version: 1.0.0.CR1
    • Affects Version: 0.10.0.Beta3
    • Component: postgresql-connector
    • Labels: None

      If a new type is introduced as an alias of a built-in type, e.g. CREATE DOMAIN money2 AS NUMERIC(12,2);, then it is not correctly handled in io.debezium.connector.postgresql.PostgresValueConverter.convertBinary(Column, Field, Object).
      The issue is visible during snapshots, as the value arrives as the logical Java class and thus cannot be converted as a binary value.

      See https://issues.jboss.org/browse/DBZ-1009?focusedCommentId=13757501&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13757501 for reference.

      Key problems

      • streaming mode delivers the byte representation, but snapshot mode delivers the logical Java class
      • how to handle default values?
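The first bullet above can be illustrated with a sketch. All names here are hypothetical, not Debezium's actual converter API; the point is that a converter for a NUMERIC-based domain column must accept both the raw byte representation seen in streaming mode and the already-decoded BigDecimal seen during a snapshot:

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: normalize a domain-typed NUMERIC column value that may
// arrive either as raw bytes (streaming) or as a decoded BigDecimal (snapshot).
public class DomainValueNormalizer {
    public static BigDecimal toDecimal(Object value) {
        if (value instanceof BigDecimal) {
            // snapshot path: JDBC has already decoded the value
            return (BigDecimal) value;
        }
        if (value instanceof byte[]) {
            // streaming path: the decoder ships the textual representation as bytes
            return new BigDecimal(new String((byte[]) value, StandardCharsets.UTF_8));
        }
        throw new IllegalArgumentException("Unsupported representation: " + value);
    }

    public static void main(String[] args) {
        System.out.println(toDecimal(new BigDecimal("12.34")));                       // snapshot shape
        System.out.println(toDecimal("12.34".getBytes(StandardCharsets.UTF_8)));      // streaming shape
    }
}
```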

            [DBZ-1413] Incorrect handling of type alias

            Jiri Pechanec added a comment - Released

            jpechane directed me here from DBZ-920. Although I see that this issue is related, I have trouble assessing whether it touches the same problem. If you consider an enum to be an alias of string, with the additional constraint that its value matches one of the pre-defined values, then I guess I have the same problem as the one described in this ticket.

            I have a problem with getting Postgres enum value out. I create an enum type similar to:

            CREATE TYPE test.test_type AS ENUM (
               'Value1',
               'Value2'
            );
            

            When I stream changes from a table that uses the type, the column is ignored by default. After setting "include.unknown.datatypes": true and
            "column.propagate.source.type": ".*", I can get the value out, but it is returned as bytes (the bytes converted to a string match the enum value) and the column type is
            UNKNOWN:

            "__debezium.source.column.type": "UNKNOWN",
            "__debezium.source.column.scale": "-1"
            

            If I understood correctly, when this issue is solved, the enum columns' values would be returned as strings and __debezium.source.column.type would be
            TEST_TYPE. Did I get that right?

            Krzysztof Lesniewski (Inactive) added a comment
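As a stop-gap while the enum column arrives as raw bytes (the behaviour described in the comment above), the textual label can be recovered on the consumer side. A minimal sketch, assuming the bytes are the UTF-8 text of the enum value (which matches the observation that "the bytes converted to string match the enum value"):

```java
import java.nio.charset.StandardCharsets;

// Sketch: recover the textual enum label from the raw bytes emitted when
// include.unknown.datatypes=true passes the column through undecoded.
public class EnumBytesDecoder {
    public static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "Value1".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(raw)); // prints "Value1"
    }
}
```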

            I'd expect to get BigDecimal? Is it because the custom type does not provide the correct scale/length?

            Actually no. The PgResultSetMetaData actually reports the OID as 1700 (numeric) rather than the OID for the domain type. So during PostgresSnapshotChangeEventSource#getColumnValue, we actually end up building a SpecialValueDecimal because of that. I wonder if instead we should consider basing the switch logic off Column#getNativeType(), which actually is the OID that represents the domain type?

            Chris Cranford added a comment
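The idea in the comment above — keying the conversion switch off the column's native Postgres OID instead of the JDBC-reported type — can be sketched as follows. The OID 1700 for numeric is real; the OID 16391 for the MONEY2 domain is invented for illustration (user-defined types get installation-specific OIDs):

```java
// Sketch: dispatch on the native Postgres OID (as Column#getNativeType() would
// return it), so a domain type keeps its own identity instead of collapsing
// into the base type reported by JDBC metadata.
public class OidDispatch {
    static final int NUMERIC_OID = 1700;   // real pg_type OID for numeric
    static final int MONEY2_OID  = 16391;  // hypothetical OID for the MONEY2 domain

    public static String categorize(int nativeOid) {
        switch (nativeOid) {
            case NUMERIC_OID: return "numeric";
            case MONEY2_OID:  return "domain:money2";
            default:          return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(categorize(1700));   // prints "numeric"
        System.out.println(categorize(16391));  // prints "domain:money2"
    }
}
```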

            ccranfor@redhat.com
            I think the concurrency question has been quite alleviated by the switch to the new framework. Given the current characteristics I'd:

            • assume, require, and document non-thread-safety
            • if multithreaded access becomes necessary, I think copy-on-write semantics would be good enough, as the updates are really only very rare

            snapshot - why is that? I'd expect to get BigDecimal? Is it because the custom type does not provide the correct scale/length? If it is the case we might need to update/hook the common snapshotting code.

            Generally I believe all is solvable now.

            Jiri Pechanec added a comment
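The copy-on-write semantics suggested above can be sketched as follows (a simplified stand-in for TypeRegistry, not its real API): reads are lock-free against an immutable snapshot, and the rare addition of a newly discovered domain type replaces the whole map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch: copy-on-write type registry. Lookups never synchronize; the rare
// registration of a new type copies the map and swaps the volatile reference.
public class CopyOnWriteTypeRegistry {
    private volatile Map<Integer, String> oidToName = Collections.emptyMap();

    public String lookup(int oid) {
        return oidToName.get(oid);  // lock-free read path
    }

    public synchronized void register(int oid, String name) {
        Map<Integer, String> copy = new HashMap<>(oidToName);
        copy.put(oid, name);
        oidToName = Collections.unmodifiableMap(copy);  // publish the new snapshot
    }

    public static void main(String[] args) {
        CopyOnWriteTypeRegistry registry = new CopyOnWriteTypeRegistry();
        registry.register(16391, "money2");  // hypothetical OID for the MONEY2 domain
        System.out.println(registry.lookup(16391)); // prints "money2"
    }
}
```

This trades O(n) copies on write for zero read-side coordination, which fits the stated access pattern (updates only when a new domain type appears at runtime).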

            Chris Cranford added a comment - edited

            Hi jpechane

            The only problem is - is there concurrent access to the registry? I'd like to avoid adding synchronization to it if possible.

            I am not aware of any concurrent access. My hope is that for the most part building the TypeRegistry up-front with these domain types solves 99% of the use cases for users. The only time that we'd mutate the TypeRegistry is when a new table is added with a new domain type at runtime. If we'd rather not make it mutable, could we add that as a limitation that the connector should be restarted after adding new domain types and before adding new objects that use said domain types?

            have you checked the snapshot, e.g. JDBC? How does it behave in this case?

            In this case we get a SpecialValueDecimal with a decimalValue.

            could you please check the behaviour of having column.propagate.source.type set on one of the columns with a custom type?

            Without any changes, what I observe is that the schema parameters for the Field are populated with:

            __debezium.source.column.type=MONEY2
            __debezium.source.column.length=2147483647
            __debezium.source.column.scale=0
            

            This should be resolved automatically by making sure that the MONEY2 domain type is in the TypeRegistry and that the right attribute type modifiers are being applied to the type so that the length, precision, and scale are being resolved and therefore can be propagated downstream when this is enabled.

            Given that MONEY2 is defined as NUMERIC(12,2) in the above use case, I would expect it to then have these parameters defined as

            __debezium.source.column.type=MONEY2
            __debezium.source.column.length=12
            __debezium.source.column.scale=2
            

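The length/scale resolution described above can be made concrete: for NUMERIC-based domain types, Postgres stores precision and scale in the type modifier (pg_type.typtypmod / pg_attribute.atttypmod). A minimal sketch of decoding it, assuming the standard numeric typmod encoding of ((precision << 16) | scale) + 4, where 4 is the VARHDRSZ offset:

```java
// Sketch: decode precision and scale for NUMERIC(12,2) from a Postgres
// numeric type modifier, encoded as ((precision << 16) | scale) + 4.
public class NumericTypmod {
    public static int precision(int typmod) {
        return ((typmod - 4) >> 16) & 0xFFFF;
    }

    public static int scale(int typmod) {
        return (typmod - 4) & 0xFFFF;
    }

    public static void main(String[] args) {
        int typmod = ((12 << 16) | 2) + 4;      // what the catalog stores for NUMERIC(12,2)
        System.out.println(precision(typmod));  // prints 12
        System.out.println(scale(typmod));      // prints 2
    }
}
```

Resolving the typmod this way would yield exactly the __debezium.source.column.length=12 / __debezium.source.column.scale=2 parameters expected in the comment above.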

            Jiri Pechanec added a comment - ccranfor@redhat.com I guess Gunnar meant DBZ-808

            Chris Cranford added a comment - gunnar.morling , was your prior comment meant for something else? I'm not following exactly what you were trying to convey.

            Gunnar Morling added a comment - Please see the linked DBZ-1413 for something that'd probably be a good test case.

            ccranfor@redhat.com Cool, thanks for the investigation. The TypeRegistry plan is exactly what I had in mind: add another method that would try to find the type in case it is missing from the internal map. The only problem is - is there concurrent access to the registry? I'd like to avoid adding synchronization to it if possible.

            Also, before following this path - have you checked the snapshot, e.g. JDBC? How does it behave in this case?

            Also, last but not least, could you please check the behaviour of having column.propagate.source.type set on one of the columns with a custom type?

            Jiri Pechanec added a comment

            jpechane gunnar.morling:

            What do we get as the type name in decoder event?

            We get the domain type, money2 in all 3 decoder events.

            If the latter, can we use any Postgres metadata query to get the source type and do the correct transformation?

            We certainly can, but should we consider making TypeRegistry mutable? While we can fetch the base type for a newly added domain type as needed, I think it would make sense to cache it in TypeRegistry so that we avoid doing that lookup unnecessarily. What are your thoughts?

            Chris Cranford added a comment

              ccranfor@redhat.com Chris Cranford
              jpechane Jiri Pechanec
              Votes: 1
              Watchers: 5
