Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-5705

Execution of a Hibrid Query that involves certain specific 'OR' conditions returns incorrect results

    XMLWordPrintable

Details

    • Hide

      Create a embedded query such as
      SELECT p.ID, p.NAME FROM com.testapp.Person p WHERE p.IS_ACTIVE=1 AND (p.CITY='city1' OR p.CITY='city2') AND p.ID>1000,

      Execute the query, and verify the results

      Show
      Create a embedded query such as SELECT p.ID, p.NAME FROM com.testapp.Person p WHERE p.IS_ACTIVE=1 AND (p.CITY='city1' OR p.CITY='city2') AND p.ID>1000, Execute the query, and verify the results

    Description

      The hibrid query, mentioned below produces wrong results, upon execution.

      SELECT p.ID, p.NAME FROM com.testapp.Person p WHERE p.IS_ACTIVE=1 AND (p.CITY='city1' OR p.CITY='city2') AND p.ID>1000

      for the Data:

      ID | NAME | CITY | IS_ACTIVE
      -------------------
      2001 person1 city1 1
      2002 person1 city1 1
      2003 person1 city1 1
      2004 person1 city1 0
      2005 person1 city2 1
      2006 person1 city2 1
      2007 person1 city3 0
      2008 person1 city3 1

      Indexed fields: ID, NAME, CITY
      Non-indexed fields: IS_ACTIVE

      Query execution returned 0 number of rows, whereas expected result count is 5.

      After some analysis root cause for the problem has been found, and the details are as follows,

      As a part of separating the query that depends on indexed fields, using boole shannon algorithm, it has first extraced out the condition 'IS_ACTIVE=1' (as it deals with non-indexed fields).
      Assuming four boolean conditions as c1, c2, c3, c4, where c1 represents the condition 'IS_ACTIVE=1'
      f(c1,c2,c3,c4) = c1.f(1, c2, c3, c4) + c1`.f`(0, c2, c3, c4)

      consider
      e1=f(1, c2, c3, c4)
      e2=f`(0, c2, c3, c4)
      which have to be used for constructing the lucene query, for further transformation.

      After splitting as per the above logic, it is trying to optimize the resultant subconditions(e1, e2), so as to reduce the number of conditions to be evaluated. One such optimization that has been found is that when there is a conjuction operation between two comparison predicates dealing with same lvalues(here, same fields), but with the different rvalues(here, constants), it has been optimized to replace it with a contradiction(constant false boolean expression). As we know, this optimization is applicable only for conjuction. But, the same is applied even for the disjuction, which is causing the above mentioned problem.

      For example,
      condition p.CITY='city1' AND p.CITY='city2'
      can be replaced with CONTRADICTION(BooleanConst.FALSE)
      But, the same cannot be done, for
      condition p.CITY='city1' OR p.CITY='city2'

      Following change may correct the problem.

      File: infinispan-8.0.0.Beta3/object-filter/src/main/java/org/infinispan/objectfilter/impl/syntax/BooleShannonExpansion.java:155

      diff:

      @@ -152,7 +152,7 @@
      }
      }
      }

      • PredicateOptimisations.optimizePredicates(newChildren, true);
        + PredicateOptimisations.optimizePredicates(newChildren, false);
        if (newChildren.size() == 1) { return newChildren.get(0); }

      Attachments

        Activity

          People

            anistor Adrian Nistor
            prashanth.reddy_jira Prashanth Reddy (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: