Uploaded image for project: 'Teiid'
  1. Teiid
  2. TEIID-5930

FULL JOIN missing data when not using LIMIT clause

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Blocker Blocker
    • 14.0, 13.0.3, 13.1.1
    • 13.1
    • Query Engine
    • None
    • DV Sprint 62
    • 0.5
    • Hide

      1. In postgresql create the calendar_m, table1_m and table2_m tables using the following script:

      select x.a as date_value, null::integer id
      into table1_m
      from generate_series('2018-06-01'::date, '2020-06-01'::date, '1 day'::interval) x(a);
       
      select x.a as date_value, null::integer id
      into table2_m
      from generate_series('2019-06-01'::date, '2021-06-01'::date, '1 day'::interval) x(a);
       
      select x.a as date_value
      into calendar_m
      from generate_series('2015-01-01'::date, '2025-01-01'::date, '1 day'::interval) x(a);
      

      2. Add postgresql database configuration in standalone-teiid.xml:

                      <datasource jndi-name="java:/test_dwh_pg" pool-name="test_dwh_pg" enabled="true" use-java-context="true">
                          <connection-url>jdbc:postgresql://localhost:5432/test_dwh?charSet=utf8</connection-url>
                          <driver-class>org.postgresql.Driver</driver-class>
                          <driver>org.postgresql</driver>
                          <pool>
                              <min-pool-size>2</min-pool-size>
                              <max-pool-size>70</max-pool-size>
                              <prefill>false</prefill>
                              <use-strict-min>false</use-strict-min>
                              <flush-strategy>FailingConnectionOnly</flush-strategy>
                          </pool>
                          <security>
                              <user-name>XXXXX</user-name>
                              <password>XXXXX</password>
                          </security>
                          <validation>
                              <check-valid-connection-sql>select 0</check-valid-connection-sql>
                          </validation>
                          <timeout>
                              <blocking-timeout-millis>120000</blocking-timeout-millis>
                              <idle-timeout-minutes>5</idle-timeout-minutes>
                          </timeout>
                      </datasource>
      

      3. Add in test-vdb.xml java:/test_tables_pg configured in previous step as datasource:

          <model name="test_dwh_pg">
              <property name="importer.useFullSchemaName" value="false"/>
              <property name="importer.tableTypes" value="TABLE,VIEW"/>
              <property name="importer.importKeys" value="false"/>
              <source name="test_dwh_pg" translator-name="myPg" connection-jndi-name="java:/test_dwh_pg"/>
          </model>
      

      4. Configure in the test-vdb.xml the following virtual views:

          <model visible = "true" type = "VIRTUAL" name = "views">
              <metadata type = "DDL"><![CDATA[
              CREATE VIEW view1_m
              AS
                SELECT
                  days.date_value as date_value1
                  ,table1_m.*
                FROM 
                  (SELECT *
                   FROM test_dwh_pg.calendar_m
                   WHERE year(date_value) BETWEEN 2018 AND year(NOW())+2) days
                   LEFT JOIN (select * from test_dwh_pg.table1_m) table1_m on days.date_value = table1_m.date_value;
              CREATE VIEW view2_m
              AS
                SELECT
                  days.date_value as date_value2
                  ,table2_m.*
                FROM 
                  (SELECT *
                   FROM test_dwh_pg.calendar_m
                   WHERE year(date_value) BETWEEN 2018 AND year(NOW())+2) days
                   LEFT JOIN (select * from test_dwh_pg.table2_m) table2_m on days.date_value = table2_m.date_value
              ]]>
              </metadata>
          </model>
      

      5. Compare the results of these two queries:

      SELECT COUNT(*) 
      FROM 
          (
              SELECT 
                  date_value1 
                  , id
              FROM views.view1_m 
              Limit 1000000000
          ) v1 
      full 
      JOIN  
          (
              SELECT 
                  date_value2 
                  , id
              FROM views.view2_m 
              LIMIT 1000000000
          ) v2 
          ON 
              v1.date_value1 = v2.date_value2 
              and v1.id = v2.id ;;
      

      and

      SELECT COUNT(*) 
      FROM 
          (
              SELECT 
                  date_value1 
                  , id
              FROM views.view1_m 
              --Limit 1000000000
          ) v1 
      full 
      JOIN  
          (
              SELECT 
                  date_value2 
                  , id
              FROM views.view2_m 
              --LIMIT 1000000000
          ) v2 
          ON 
              v1.date_value1 = v2.date_value2 
              and v1.id = v2.id ;;
      
      Show
      1. In postgresql create the calendar_m, table1_m and table2_m tables using the following script: select x. a as date_value, null :: integer id into table1_m from generate_series( '2018-06-01' :: date , '2020-06-01' :: date , '1 day ' :: interval ) x( a ); select x. a as date_value, null :: integer id into table2_m from generate_series( '2019-06-01' :: date , '2021-06-01' :: date , '1 day ' :: interval ) x( a ); select x. a as date_value into calendar_m from generate_series( '2015-01-01' :: date , '2025-01-01' :: date , '1 day ' :: interval ) x( a ); 2. Add postgresql database configuration in standalone-teiid.xml: <datasource jndi-name= "java:/test_dwh_pg" pool-name= "test_dwh_pg" enabled= "true" use-java-context= "true" > <connection-url> jdbc:postgresql://localhost:5432/test_dwh?charSet=utf8 </connection-url> <driver-class> org.postgresql.Driver </driver-class> <driver> org.postgresql </driver> <pool> <min-pool-size> 2 </min-pool-size> <max-pool-size> 70 </max-pool-size> <prefill> false </prefill> <use-strict-min> false </use-strict-min> <flush-strategy> FailingConnectionOnly </flush-strategy> </pool> <security> <user-name> XXXXX </user-name> <password> XXXXX </password> </security> <validation> <check-valid-connection-sql> select 0 </check-valid-connection-sql> </validation> <timeout> <blocking-timeout-millis> 120000 </blocking-timeout-millis> <idle-timeout-minutes> 5 </idle-timeout-minutes> </timeout> </datasource> 3. Add in test-vdb.xml java:/test_tables_pg configured in previous step as datasource: <model name= "test_dwh_pg" > <property name= "importer.useFullSchemaName" value= "false" /> <property name= "importer.tableTypes" value= "TABLE,VIEW" /> <property name= "importer.importKeys" value= "false" /> <source name= "test_dwh_pg" translator-name= "myPg" connection-jndi-name= "java:/test_dwh_pg" /> </model> 4. Configure in the test-vdb.xml the following virtual views: <model visible = "true" type = "VIRTUAL" name = "views" > <metadata type = "DDL" > <![CDATA[ CREATE VIEW view1_m AS SELECT days.date_value as date_value1 ,table1_m.* FROM (SELECT * FROM test_dwh_pg.calendar_m WHERE year(date_value) BETWEEN 2018 AND year(NOW())+2) days LEFT JOIN (select * from test_dwh_pg.table1_m) table1_m on days.date_value = table1_m.date_value; CREATE VIEW view2_m AS SELECT days.date_value as date_value2 ,table2_m.* FROM (SELECT * FROM test_dwh_pg.calendar_m WHERE year(date_value) BETWEEN 2018 AND year(NOW())+2) days LEFT JOIN (select * from test_dwh_pg.table2_m) table2_m on days.date_value = table2_m.date_value ]]> </metadata> </model> 5. Compare the results of these two queries: SELECT COUNT (*) FROM ( SELECT date_value1 , id FROM views.view1_m Limit 1000000000 ) v1 full JOIN ( SELECT date_value2 , id FROM views.view2_m LIMIT 1000000000 ) v2 ON v1.date_value1 = v2.date_value2 and v1.id = v2.id ;; and SELECT COUNT (*) FROM ( SELECT date_value1 , id FROM views.view1_m -- Limit 1000000000 ) v1 full JOIN ( SELECT date_value2 , id FROM views.view2_m -- LIMIT 1000000000 ) v2 ON v1.date_value1 = v2.date_value2 and v1.id = v2.id ;;

      When commenting out/uncommenting limit clause in the query:

      SELECT COUNT(*) 
      FROM 
          (
              SELECT 
                  date_value1 
                  , id
              FROM views.view1_m 
              Limit 1000000000
          ) v1 
      full 
      JOIN  
          (
              SELECT 
                  date_value2 
                  , id
              FROM views.view2_m 
              LIMIT 1000000000
          ) v2 
          ON 
              v1.date_value1 = v2.date_value2 
              and v1.id = v2.id ;;
      

      teiid returns different results, for example, when using limits in all subqueries Teiid will return 3652, but when commenting out all limit clauses teiid in turn will return already 0.

              rhn-engineering-shawkins Steven Hawkins
              dalex005 Dmitrii Pogorelov
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved:

                  Estimated:
                  Original Estimate - 4 hours
                  4h
                  Remaining:
                  Time Spent - 3 hours Remaining Estimate - 1 hour
                  1h
                  Logged:
                  Time Spent - 3 hours Remaining Estimate - 1 hour
                  3h