  1. JBoss Enterprise Application Platform
  2. JBEAP-20946

[GSS](7.3.z) Xalan XML to stream transformation produces wrong encoding


      An XML transformation from a DOM tree (DOMSource) to a String (StreamResult) with output encoding "ISO-8859-1" emits decimal numeric character references (for example &#252; for ü) instead of ISO-8859-1 characters when EAP runs on JDK 11.

      Encoding "ISO-8859-1" is configured by:

      lTransformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
      

      Through the service loader configuration in xalan-2.7.1.redhat-12.jar, org.apache.xalan.processor.TransformerFactoryImpl is used as the implementation of javax.xml.transform.TransformerFactory.

      Please find attached XalanTransformation.zip which can be used to reproduce the issue and also to show different behaviour between using JDK 8 and JDK 11.
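
For readers without access to the attachment, the setup described above can be sketched roughly as follows. This is not the attached reproducer; the class name EncodingSketch and the helper transform are made up for illustration. The result depends on which TransformerFactory is on the classpath, so run it with the Xalan and serializer jars from the command lines below to see the behaviour reported here.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch, not the code from XalanTransformation.zip.
final class EncodingSketch {

    // Builds <Name>text</Name> as a DOM tree and serializes it to a String
    // with the requested output encoding.
    static String transform(String text, String encoding) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element name = doc.createElement("Name");
        name.setTextContent(text);
        doc.appendChild(name);

        // Identity transformation; the factory is resolved through the
        // usual JAXP lookup (service loader entry in the Xalan jar, if present).
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);

        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Shows which TransformerFactory implementation was actually picked up.
        System.out.println(TransformerFactory.newInstance().getClass().getName());
        // \u00fc is "ü"; written as an escape to avoid source-encoding problems.
        System.out.println(transform("H\u00fcbner", "ISO-8859-1"));
    }
}
```

Printing the factory class name first makes it easy to confirm that org.apache.xalan.processor.TransformerFactoryImpl is in use rather than the JDK's built-in implementation.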

       

      c:\progra~1\java\jdk-11.0.9.0.3\bin\java -cp .;serializer-2.7.1.redhat-12.jar;xalan-2.7.1.redhat-12.jar XalanTransformation
      <?xml version="1.0" encoding="ISO-8859-1"?><Name>H&#252;bner</Name>
      
      c:\progra~1\java\jdk1.8.0_271\bin\java -cp .;serializer-2.7.1.redhat-12.jar;xalan-2.7.1.redhat-12.jar XalanTransformation
      <?xml version="1.0" encoding="ISO-8859-1"?><Name>Hübner</Name>
       
      

      The issue seems to be caused by the mechanism xalan uses to register the encodings from serializer-2.7.1.redhat-12.jar\org\apache\xml\serializer\Encodings.properties. The file contains:

       

      ISO8859-1 ISO-8859-1 0x00FF
      ISO8859_1 ISO-8859-1 0x00FF
      8859-1 ISO-8859-1 0x00FF
      8859_1 ISO-8859-1 0x00FF 
      

      and registers the Java names (left column) and the Mime names (middle column).
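
To make the column layout concrete, the following sketch splits one such line into its parts using java.util.Properties, which treats the first whitespace as the key/value separator. The class EncodingsLineSketch and its parse helper are hypothetical, and reading the third column as the highest directly representable code point is an assumption, not something stated in this report.

```java
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of Xalan.
final class EncodingsLineSketch {

    // Returns { javaName, mimeName, thirdColumn } for one line of the file.
    static String[] parse(String line) throws Exception {
        Properties props = new Properties();
        // Key = text up to the first whitespace, value = the rest of the line.
        props.load(new StringReader(line));
        String javaName = props.stringPropertyNames().iterator().next();
        // Trailing whitespace is kept by Properties, so trim before splitting.
        String[] rest = props.getProperty(javaName).trim().split("\\s+");
        return new String[] { javaName, rest[0], rest[1] };
    }

    public static void main(String[] args) throws Exception {
        String[] entry = parse("8859_1 ISO-8859-1 0x00FF ");
        System.out.println("Java name: " + entry[0]);
        System.out.println("MIME name: " + entry[1]);
        // Assumption: the hex value is an upper bound on code points.
        System.out.println("Third column as int: " + Integer.decode(entry[2]));
    }
}
```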

      The encoding used by the transformation is looked up by means of the MIME name "ISO-8859-1". Which Java encoding name is then used to encode the string apparently depends on the order in which the entries are registered in org.apache.xml.serializer.Encodings._encodingTableKeyMime. With JDK 11, 8859-1 is registered last and is therefore used for the transformation. That name causes an UnsupportedEncodingException, so Xalan falls back to writing decimal character references.

      With JDK 8, ISO8859_1 is registered last and is therefore used for the transformation. Java recognizes this name, so the output is encoded as ISO-8859-1.
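
The "last registration wins" effect can be illustrated with a simplified model. This is not Xalan's code: the class LastRegistrationWins and the method javaNameFor are invented for the example, and a plain HashMap keyed by the upper-cased MIME name stands in for _encodingTableKeyMime. The two orders used in main are assumptions chosen to end with the names mentioned above.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the registration described above; not Xalan's code.
final class LastRegistrationWins {

    // Registers every Java name under the same MIME name, in the given order,
    // and returns the Java name that the MIME lookup finally resolves to.
    static String javaNameFor(List<String> javaNamesInOrder, String mimeName) {
        Map<String, String> byMime = new HashMap<>();
        for (String javaName : javaNamesInOrder) {
            // Each put overwrites the previous entry for the same MIME key.
            byMime.put(mimeName.toUpperCase(), javaName);
        }
        return byMime.get(mimeName.toUpperCase());
    }

    public static void main(String[] args) {
        // Hypothetical enumeration orders for the four entries.
        List<String> endsWithDashName =
                Arrays.asList("ISO8859-1", "ISO8859_1", "8859_1", "8859-1");
        List<String> endsWithUnderscoreName =
                Arrays.asList("8859-1", "ISO8859-1", "8859_1", "ISO8859_1");

        for (List<String> order : Arrays.asList(endsWithDashName, endsWithUnderscoreName)) {
            String winner = javaNameFor(order, "ISO-8859-1");
            // Charset.isSupported reports whether this JVM knows the name.
            System.out.println(winner + " supported: " + Charset.isSupported(winner));
        }
    }
}
```

In this model the outcome depends only on enumeration order, which is consistent with the same jars behaving differently on JDK 8 and JDK 11.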

        1. build.log
          189 kB
          Scott Marlow

              rhn-support-ivassile Ilia Vassilev
              rhn-support-bmaxwell Brad Maxwell