Details

Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 4.4.0.Final
Affects Version/s: 4.1.0.Final
Component/s: Query
Labels:
None

Steps to Reproduce:

Tests in TokenStreamTest.java that demonstrate the issue:

    @Test
    public void shouldMatchUpperCaseVersionOfßCharacterWhenCaseInsensitive() {
        content = "ß";
        makeCaseInsensitive();
        tokens.consume("SS");
        assertThat(tokens.hasNext(), is(false));
    }

    @Test
    public void shouldHandleTokensAfterßCharacterWhenCaseInsensitive() {
        content = "ß and";
        makeCaseInsensitive();
        tokens.consume(TokenStream.ANY_VALUE);
        tokens.consume("AND");
        assertThat(tokens.hasNext(), is(false));
    }
Git Pull Request:
https://github.com/ModeShape/modeshape/pull/1451

Description

When a JCR-SQL2 query contains a string with the German ß character, the query is not parsed correctly.

Exception when handling request.: javax.jcr.query.InvalidQueryException: The JCR-SQL2 query "SELECT metadatanode.*, document.'jcr:created' FROM [tresorxml:element] AS metadatanode INNER JOIN [tresorxml:document] AS document ON ISDESCENDANTNODE(metadatanode, document) WHERE NAME(metadatanode) = 'xaip:metaDataSection' AND PATH(document) LIKE '/tresorxml:vault[5]/My Folders/?/%' AND DEPTH(document) = CAST(4 AS LONG) ORDER BY document.'jcr:created' DESC" is not well-formed: Unexpected token 'AND' at line 1, column 250

The cause is that the tokeniser parses queries case-insensitively by upper-casing the input, and the JVM converts ß to SS when upper-casing (see e.g. http://www.the-interweb.com/serendipity/index.php?/archives/80-Converting-strings-to-upper-case-is-tricky.html ).

As a result, the upper-case string is longer than the lower-case version, so every index after the ß is shifted. This throws the indexes out of alignment within the TokenStream class when case-insensitive tokenising is used.
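The length mismatch can be reproduced outside ModeShape with plain JDK calls. The following is a minimal sketch; the class and method names (`UpperCaseLengthDemo`, `indexShift`) are made up for illustration and are not part of ModeShape.

```java
import java.util.Locale;

public class UpperCaseLengthDemo {

    /** Upper-cases the whole input, as a case-insensitive tokeniser might. */
    public static String upper(String content) {
        return content.toUpperCase(Locale.ROOT);
    }

    /**
     * Returns how far the position of `word` moves once the whole input
     * has been upper-cased (0 means the indexes still line up).
     */
    public static int indexShift(String content, String word) {
        int original = content.indexOf(word);
        int shifted = upper(content).indexOf(upper(word));
        return shifted - original;
    }

    public static void main(String[] args) {
        String content = "\u00DF and"; // "ß and", written with an escape to avoid encoding issues

        System.out.println(content.length());           // 5
        System.out.println(upper(content).length());    // 6, because ß became SS
        System.out.println(indexShift(content, "and")); // 1
    }
}
```

Any start and end positions computed against one string are therefore off by one (per ß) when applied to the other.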

The solution is to override the match method in the CaseInsensitiveToken to convert the current token to upper-case, rather than storing an upper-case version of the entire input string, which may not have the same indexes as the lower-case version.
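The idea behind the fix can be sketched as follows. This is not the ModeShape source: the `Token` class here is a hypothetical stand-in that only shows the principle of keeping indexes tied to the original input and upper-casing a single token's text at comparison time.

```java
import java.util.Locale;

public class CaseInsensitiveTokenSketch {

    /** Hypothetical token: a [start, end) slice of the original, unmodified input. */
    public static final class Token {
        private final String content;
        private final int start;
        private final int end;

        public Token(String content, int start, int end) {
            this.content = content;
            this.start = start;
            this.end = end;
        }

        /** The token text exactly as it appears in the input. */
        public String value() {
            return content.substring(start, end);
        }

        /**
         * Case-insensitive match: only this token's text is upper-cased,
         * so a length change (ß to SS) cannot disturb other tokens' indexes.
         */
        public boolean matches(String expectedUpperCase) {
            return value().toUpperCase(Locale.ROOT).equals(expectedUpperCase);
        }
    }

    public static void main(String[] args) {
        String content = "\u00DF and"; // "ß and"
        Token first = new Token(content, 0, 1);  // "ß"
        Token second = new Token(content, 2, 5); // "and"

        System.out.println(first.matches("SS"));   // true
        System.out.println(second.matches("AND")); // true
    }
}
```

Because the indexes always refer to the original string, the token after the ß is still found at its real position.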

People

Assignee: Unassigned

Reporter: Daniel Kelleher (Inactive)

Votes: 0

Watchers: 2

Dates

Created: 2015/08/03 8:50 AM

Updated: 2015/08/28 7:31 AM

Resolved: 2015/08/04 9:26 AM