Application Server 3 4 5 and 6 / JBAS-872

Problem with non-English characters in UTF-8 encoded queries


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • No Release
    • JBossAS-3.2.6 Final
    • CMP service
    • None

      SourceForge Submitter: azzazzel.
      I'm resubmitting this issue because the last time I
      submitted it (see #887491) it was misunderstood,
      closed, and marked as "invalid" by "loubyansky" without
      even making the effort to clarify it.

      --------------------------------
      Initial Comment (see #887491 for full text):

      It seems that all characters coded with more than one
      byte (2+ bytes) in UTF-8 encoded queries are
      incorrectly parsed by [EJB/JBoss]QLParser as seen in
      this log fragment:
      [...]
      If I pass a parameter like '\u0105' instead of 'ą'
      then it works.
      [...]

      Comment By: Alexey Loubyansky (loubyansky):

      Why is '&#261' supposed to be understood?
      Either you provide unicode content as is (not the
      '&#261' form) or you use unicode escapes as defined in
      the Java
      spec, i.e. '\u'.

      --------------------------------

      I know very well that "&#261" is not supposed to be
      understood!
      What I typed in the <TEXTAREA> was the character with
      Unicode code \u0105, also called "LATIN SMALL LETTER A
      WITH OGONEK".
      I guess it was converted to "&#261" by SF, and I didn't
      even notice that it was!
      I bet that if you typed Russian characters in a <TEXTAREA>,
      they would also be displayed in &#XXX; form.

      This subject was previously discussed on the JBoss-user
      list, and Alexey Loubyansky was also answering my
      e-mails there.
      (See:
      http://www.mail-archive.com/jboss-user@lists.sourceforge.net/msg35226.html)
      I have also contacted Alexey Loubyansky and Dain
      Sundstrom, since they are listed as the authors of
      "JBossQLParser.jjt" and "EJBQLParser.jjt".
      Alexey didn't answer, while Dain stated that he no
      longer works for JBoss.
      I was asked to open a bug report by Heiko Rupp on
      the jboss-user list!

      Now, once again, to make it clear:

      I do not enter characters in my queries in the form
      "&#261", but naturally, in UTF-8 encoding (as they are
      typed)!
      It does not work! They are parsed incorrectly! I believe
      this is because the parser expects every character to be
      1 byte long (\u0105 takes two bytes in UTF-8).
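
      To illustrate the byte count mentioned above, here is a
      minimal sketch in plain Java. It is not taken from the
      JBoss parsers; the class and method names are hypothetical,
      and it uses java.nio.charset.StandardCharsets, which is
      newer than the JDKs of the JBossAS-3.2.6 era. It only shows
      that U+0105 occupies two bytes in UTF-8 and that reading
      those bytes one byte per character yields two unrelated
      characters, which is the kind of misparse suspected here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of JBoss or JavaCC.
class Utf8WidthDemo {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Simulates a consumer that treats every byte as one character:
    // the UTF-8 bytes are decoded as ISO-8859-1 (one byte per char).
    static String decodeBytePerChar(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // LATIN SMALL LETTER A WITH OGONEK, built from its code point.
        String ogonek = String.valueOf((char) 0x0105);

        System.out.println(utf8Length(ogonek));                 // 2
        System.out.println(decodeBytePerChar(ogonek).length()); // 2
    }
}
```

      Whether the parser actually consumes the query this way is
      an assumption; the sketch only demonstrates the arithmetic.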

      As I said before, setting "JAVA_UNICODE_ESCAPE = false"
      in "JBossQLParser.jjt" and "EJBQLParser.jjt" solves the
      problem!
      More specifically, it makes the parser understand
      UTF-8 input, but then it no longer understands
      Unicode-escaped characters (in the form \uXXXX).
      I don't know how to configure it so that it understands
      both!
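
      Since the report states that the escaped form '\u0105' is
      accepted by the unmodified parser, one possible workaround
      is to escape non-ASCII characters in the query text before
      it reaches the parser. The following is a minimal sketch
      under that assumption; QueryEscaper and escapeNonAscii are
      hypothetical names, not part of JBoss, and the sketch has
      not been verified against the JBoss parsers.

```java
// Hypothetical helper; not part of JBoss.
class QueryEscaper {

    // Replaces every non-ASCII UTF-16 code unit with a Java-style
    // backslash-u escape of four hex digits; ASCII is copied as-is.
    static String escapeNonAscii(String query) {
        StringBuilder out = new StringBuilder(query.length());
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal contains LATIN SMALL LETTER A WITH OGONEK.
        String query = "SELECT OBJECT(o) FROM Item o WHERE o.name = 'ą'";
        System.out.println(escapeNonAscii(query));
    }
}
```

      The sample query and its Item entity are invented for the
      example. Characters outside the Basic Multilingual Plane
      come out as two escapes (a surrogate pair), which is how
      Java-style Unicode escapes represent them.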

      May I please ask you to have another look at this?
      Please contact me if you need more information on this
      subject!

      Milen Dyankov

              olubyans@redhat.com Alexey Loubyansky
              sourceforge-user SourceForge legacy user (Inactive)