Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: jbossws-1.2.1
Component/s: jbossws-jaxrpc, jbossws-native
Labels:
None

Forum Reference:
http://community.jboss.org/message/335246#335246
Workaround:

Workaround Exists
Workaround Description:

Set the system property file.encoding=utf-8.

This workaround is equally bad, since it breaks applications that rely on platform-specific reading ...


When a client request contains a non-ASCII character, such as the "ç" in "Français", and the machine's default character encoding is not UTF-8, the character is encoded incorrectly. For example, when the machine's default character set is MS1252 (Windows), the "ç" is marshalled onto the network stream as 0xC3 0x83 0xC2 0xA7 instead of the valid UTF-8 sequence 0xC3 0xA7.
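The byte sequences quoted above can be reproduced outside JBossWS. The following is only a minimal sketch of the suspected mechanism (UTF-8 bytes decoded as windows-1252 and then encoded as UTF-8 again); the class and method names are hypothetical and not part of JBossWS.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not JBossWS code.
final class DoubleEncodingDemo {

    // Decode UTF-8 bytes with the wrong charset, then encode the result as UTF-8 again.
    static byte[] misdecodeAndReencode(byte[] utf8Bytes, Charset wrongCharset) {
        String misread = new String(utf8Bytes, wrongCharset);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    // Format bytes as "0xC3 0xA7" for comparison with the values in the report.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u00E7" is "ç"; the escape keeps the example independent of the source file encoding.
        byte[] correct = "\u00E7".getBytes(StandardCharsets.UTF_8);
        byte[] corrupted = misdecodeAndReencode(correct, Charset.forName("windows-1252"));

        System.out.println(hex(correct));   // prints 0xC3 0xA7
        System.out.println(hex(corrupted)); // prints 0xC3 0x83 0xC2 0xA7
    }
}
```

The second line of output matches the sequence observed on the network stream, which is consistent with a single decode using the platform default charset somewhere in the marshalling path.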

Setting the system property file.encoding=utf-8 fixes this, but it causes as many problems elsewhere as it solves (especially in the case of legacy platform-specific file reading) ...

This forum post very likely describes the same problem: http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4030510#4030510

After several hours of stepping through the JBossWS code, I found what I believe is the culprit, in the method XMLFragment.writeSourceInternal(Writer writer):
....
if (reader == null)
    reader = new InputStreamReader(streamSource.getInputStream());

Here streamSource.getInputStream() returns a stream that is already UTF-8 encoded. However, the new InputStreamReader wrapped around it is created without an explicit charset, so it uses the machine's default character encoding. The bytes of the UTF-8 stream are therefore decoded with a different encoding, which corrupts the data.
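To illustrate the difference, here is a minimal sketch that reads the same UTF-8 bytes through an InputStreamReader with an explicit charset and with windows-1252 (standing in for a non-UTF-8 platform default). It is not the JBossWS fix; the class and method names are hypothetical, and it assumes, as described above, that the stream really is UTF-8. A real XML source may declare another encoding in its prolog.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not JBossWS code.
final class ExplicitCharsetReaderDemo {

    // Decode the given bytes through an InputStreamReader that uses the given charset.
    static String readWithCharset(byte[] bytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Fran\u00E7ais" is "Français", encoded here as UTF-8.
        byte[] utf8 = "Fran\u00E7ais".getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8: the text survives intact.
        String ok = readWithCharset(utf8, StandardCharsets.UTF_8);

        // Wrong charset (what a default-charset reader does on an MS1252 machine):
        // the two bytes of "ç" become the two characters "Ã§".
        String broken = readWithCharset(utf8, Charset.forName("windows-1252"));

        System.out.println(ok.equals("Fran\u00E7ais"));             // prints true
        System.out.println(broken.equals("Fran\u00C3\u00A7ais"));   // prints true
    }
}
```

Passing the charset explicitly removes the dependency on file.encoding for this one reader, without changing the platform default for the rest of the application.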

Each pass through the marshalling adds further corruption, so the character count drifts further from the original every time the data is sent back and forth.
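The accumulation can be simulated with a small sketch that repeats the wrong decode and the UTF-8 re-encode several times and records the byte count after each pass. This models the described behaviour only; it does not call JBossWS, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not JBossWS code.
final class RepeatedCorruptionDemo {

    // Returns the UTF-8 byte count of the text before any pass (index 0)
    // and after each of the given number of misdecode/re-encode passes.
    static int[] byteCounts(String text, Charset wrongCharset, int rounds) {
        int[] counts = new int[rounds + 1];
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        counts[0] = wire.length;
        for (int i = 1; i <= rounds; i++) {
            // One pass: bytes are read with the wrong charset, then written as UTF-8.
            String misread = new String(wire, wrongCharset);
            wire = misread.getBytes(StandardCharsets.UTF_8);
            counts[i] = wire.length;
        }
        return counts;
    }

    public static void main(String[] args) {
        // A single "ç" ("\u00E7") passed through three faulty round trips.
        int[] counts = byteCounts("\u00E7", Charset.forName("windows-1252"), 3);
        System.out.println(Arrays.toString(counts));
    }
}
```

The counts increase with every pass, because each non-ASCII byte from the previous pass is itself turned into a multi-byte UTF-8 sequence.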

I would suggest attaching a reader to the StreamSource source instance variable so that it keeps track of its encoding, but that might break things elsewhere ...


Attachments:
a-jbossws1.2.1GA-utf8-patch.jar (8 kB, 2007/06/22 4:00 AM)
JAXBSerializer.java--afterchange (6 kB, 2009/09/27 4:33 AM)
JAXBSerializer.java--beforechange (5 kB, 2009/09/27 4:34 AM)

Issue Links:
duplicates JBWS-1763 Incorrect handling of charsets when the default charset is not UTF-8 (Closed)

Assignee: Unassigned
Reporter: Wim De Muynck (Inactive)
Votes: 1
Watchers: 1
Created: 2007/06/21 1:53 PM
Updated: 2010/07/16 6:42 AM
Resolved: 2007/08/03 11:31 AM
