Uploaded image for project: 'Teiid'
  1. Teiid
  2. TEIID-5932

LATERAL LEFT JOIN can return duplicate rows at batch boundaries

    Details

    • Sprint:
      DV Sprint 62
    • Story Points:
      0.5
    • Steps to Reproduce:
      Hide

      1. In postgresql create the test_having table using the following script:

      CREATE TABLE public.test_having
      (
        str character varying(4000)
      );
      

      2. Add postgresql database configuration in standalone-teiid.xml:

                      <datasource jndi-name="java:/test_dwh_pg" pool-name="test_dwh_pg" enabled="true" use-java-context="true">
                          <connection-url>jdbc:postgresql://localhost:5432/test_dwh?charSet=utf8</connection-url>
                          <driver-class>org.postgresql.Driver</driver-class>
                          <driver>org.postgresql</driver>
                          <pool>
                              <min-pool-size>2</min-pool-size>
                              <max-pool-size>70</max-pool-size>
                              <prefill>false</prefill>
                              <use-strict-min>false</use-strict-min>
                              <flush-strategy>FailingConnectionOnly</flush-strategy>
                          </pool>
                          <security>
                              <user-name>XXXXX</user-name>
                              <password>XXXXX</password>
                          </security>
                          <validation>
                              <check-valid-connection-sql>select 0</check-valid-connection-sql>
                          </validation>
                          <timeout>
                              <blocking-timeout-millis>120000</blocking-timeout-millis>
                              <idle-timeout-minutes>5</idle-timeout-minutes>
                          </timeout>
                      </datasource>
      

      3. Add in test-vdb.xml java:/test_tables_pg configured in previous step as datasource:

          <model name="test_dwh_pg">
              <property name="importer.useFullSchemaName" value="false"/>
              <property name="importer.tableTypes" value="TABLE,VIEW"/>
              <property name="importer.importKeys" value="false"/>
              <source name="test_dwh_pg" translator-name="myPg" connection-jndi-name="java:/test_dwh_pg"/>
          </model>
      

      4. Run the following queries and compare results:

      --Lateral JOIN
      --Correct result
      SELECT
      	xt.event_id
      FROM (SELECT * FROM test_dwh_pg.test_having LIMIT 10000) AS d
      --here
      ,XMLTABLE(
      			'/root'
      			PASSING JSONTOXML('root', d.str)
      			COLUMNS
      				event_id string PATH 'eventid'
      		) xt
      LEFT JOIN XMLTABLE(
      			'/anything'
      			PASSING JSONTOXML('anything', d.str)
      			COLUMNS
      				some_col string PATH 'anything'
      		) xt2 
      	ON xt.event_id = 'xxx'
      	GROUP BY event_id
      	HAVING COUNT(*) > 1 ;;
       
      --INNER JOIN
      --FAILURE(returns 1023 as a result)
      SELECT
      	xt.event_id
      FROM test_dwh_pg.test_having AS d
      --here
      INNER JOIN XMLTABLE(
      			'/root'
      			PASSING JSONTOXML('root', d.str)
      			COLUMNS
      				event_id string PATH 'eventid'
      		) xt
      	ON TRUE
      LEFT JOIN XMLTABLE(
      			'/anything'
      			PASSING JSONTOXML('anything', d.str)
      			COLUMNS
      				some_col string PATH 'anything'
      		) xt2 
      	ON xt.event_id = 'xxx'
      	GROUP BY event_id
      	HAVING COUNT(*) > 1 ;;
       
      --LEFT JOIN
      --FAILURE(returns 1023 as a result)
      SELECT
      	xt.event_id
      FROM test_dwh_pg.test_having AS d
      --here
      LEFT JOIN XMLTABLE(
      			'/root'
      			PASSING JSONTOXML('root', d.str)
      			COLUMNS
      				event_id string PATH 'eventid'
      		) xt
      	ON TRUE
      LEFT JOIN XMLTABLE(
      			'/anything'
      			PASSING JSONTOXML('anything', d.str)
      			COLUMNS
      				some_col string PATH 'anything'
      		) xt2 
      	ON xt.event_id = 'xxx'
      	GROUP BY event_id
      	HAVING COUNT(*) > 1 ;;
       
      --CROSS JOIN
      --FAILURE(returns 1023 as a result)
      SELECT
      	xt.event_id
      FROM test_dwh_pg.test_having AS d
      --here
      CROSS JOIN XMLTABLE(
      			'/root'
      			PASSING JSONTOXML('root', d.str)
      			COLUMNS
      				event_id string PATH 'eventid'
      		) xt
      LEFT JOIN XMLTABLE(
      			'/anything'
      			PASSING JSONTOXML('anything', d.str)
      			COLUMNS
      				some_col string PATH 'anything'
      		) xt2 
      	ON xt.event_id = 'xxx'
      	GROUP BY event_id
      	HAVING COUNT(*) > 1 ;;
      
      Show
      1. In postgresql create the test_having table using the following script: CREATE TABLE public .test_having ( str character varying (4000) ); 2. Add postgresql database configuration in standalone-teiid.xml: <datasource jndi-name= "java:/test_dwh_pg" pool-name= "test_dwh_pg" enabled= "true" use-java-context= "true" > <connection-url> jdbc:postgresql://localhost:5432/test_dwh?charSet=utf8 </connection-url> <driver-class> org.postgresql.Driver </driver-class> <driver> org.postgresql </driver> <pool> <min-pool-size> 2 </min-pool-size> <max-pool-size> 70 </max-pool-size> <prefill> false </prefill> <use-strict-min> false </use-strict-min> <flush-strategy> FailingConnectionOnly </flush-strategy> </pool> <security> <user-name> XXXXX </user-name> <password> XXXXX </password> </security> <validation> <check-valid-connection-sql> select 0 </check-valid-connection-sql> </validation> <timeout> <blocking-timeout-millis> 120000 </blocking-timeout-millis> <idle-timeout-minutes> 5 </idle-timeout-minutes> </timeout> </datasource> 3. Add in test-vdb.xml java:/test_tables_pg configured in previous step as datasource: <model name= "test_dwh_pg" > <property name= "importer.useFullSchemaName" value= "false" /> <property name= "importer.tableTypes" value= "TABLE,VIEW" /> <property name= "importer.importKeys" value= "false" /> <source name= "test_dwh_pg" translator-name= "myPg" connection-jndi-name= "java:/test_dwh_pg" /> </model> 4. Run the following queries and compare results: -- Lateral JOIN --Correct result SELECT xt.event_id FROM ( SELECT * FROM test_dwh_pg.test_having LIMIT 10000) AS d --here ,XMLTABLE( '/root' PASSING JSONTOXML( 'root' , d.str) COLUMNS event_id string PATH 'eventid' ) xt LEFT JOIN XMLTABLE( '/anything' PASSING JSONTOXML( 'anything' , d.str) COLUMNS some_col string PATH 'anything' ) xt2 ON xt.event_id = 'xxx' GROUP BY event_id HAVING COUNT (*) > 1 ;; -- INNER JOIN --FAILURE( returns 1023 as a result ) SELECT xt.event_id FROM test_dwh_pg.test_having AS d --here INNER JOIN XMLTABLE( '/root' PASSING JSONTOXML( 'root' , d.str) COLUMNS event_id string PATH 'eventid' ) xt ON TRUE LEFT JOIN XMLTABLE( '/anything' PASSING JSONTOXML( 'anything' , d.str) COLUMNS some_col string PATH 'anything' ) xt2 ON xt.event_id = 'xxx' GROUP BY event_id HAVING COUNT (*) > 1 ;; -- LEFT JOIN --FAILURE( returns 1023 as a result ) SELECT xt.event_id FROM test_dwh_pg.test_having AS d --here LEFT JOIN XMLTABLE( '/root' PASSING JSONTOXML( 'root' , d.str) COLUMNS event_id string PATH 'eventid' ) xt ON TRUE LEFT JOIN XMLTABLE( '/anything' PASSING JSONTOXML( 'anything' , d.str) COLUMNS some_col string PATH 'anything' ) xt2 ON xt.event_id = 'xxx' GROUP BY event_id HAVING COUNT (*) > 1 ;; -- CROSS JOIN --FAILURE( returns 1023 as a result ) SELECT xt.event_id FROM test_dwh_pg.test_having AS d --here CROSS JOIN XMLTABLE( '/root' PASSING JSONTOXML( 'root' , d.str) COLUMNS event_id string PATH 'eventid' ) xt LEFT JOIN XMLTABLE( '/anything' PASSING JSONTOXML( 'anything' , d.str) COLUMNS some_col string PATH 'anything' ) xt2 ON xt.event_id = 'xxx' GROUP BY event_id HAVING COUNT (*) > 1 ;;

      Description

      HAVING in combination with LEFT JOIN of two XMLTABLE sub-queries returns an incorrect result if one of XMLTABLE sub-queries has INNER, LEFT or CROSS JOIN instead of a lateral one and amount of rows in the source is equal or more than 1024. For example, the query:

      SELECT
      	xt.event_id
      FROM test_dwh_pg.test_having AS d
      --here
      INNER JOIN XMLTABLE(
      			'/root'
      			PASSING JSONTOXML('root', d.str)
      			COLUMNS
      				event_id string PATH 'eventid'
      		) xt
      	ON TRUE
      LEFT JOIN XMLTABLE(
      			'/anything'
      			PASSING JSONTOXML('anything', d.str)
      			COLUMNS
      				some_col string PATH 'anything'
      		) xt2 
      	ON xt.event_id = 'xxx'
      	GROUP BY event_id
      	HAVING COUNT(*) > 1 ;;
      

      will return a result but it should return no results as there are no duplicates in test data.

        Gliffy Diagrams

          Attachments

            Activity

              People

              • Assignee:
                shawkins Steven Hawkins
                Reporter:
                dalex005 Dmitrii Pogorelov
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - 4 hours Original Estimate - 4 hours
                  4h
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 5 hours, 30 minutes
                  5h 30m