  1. ModeShape
  2. MODE-2615

BsonDataInput java.nio.charset.MalformedInputException


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 5.2.0.Final, 4.6.1.Final
    • Affects Version/s: 4.6.0.Final
    • None
    • None

      The test below reproduces the exception (although I'm not 100% certain it reproduces the specific issue with our content). Add it to org.infinispan.schematic.internal.document.BsonReadingAndWritingTest:

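          @Test
          public void throwsMalformedInputException() throws Exception {
              // Build a string exactly BufferCache.MINIMUM_SIZE characters long.
              char[] chars = new char[BufferCache.MINIMUM_SIZE];
              Arrays.fill(chars, 'a');
              // The last character is U+00A3 (pound sign), which takes two bytes in UTF-8.
              chars[BufferCache.MINIMUM_SIZE - 1] = '\u00A3';
              Document document = new BasicDocument("string", new String(chars));
              // Round-trip the document through BSON; the exception surfaces here.
              assertRoundtrip(document);
          }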
      

    Description

      Our application has recently started throwing java.nio.charset.MalformedInputException: Input length = 1. It has been difficult to reproduce, but after some digging I believe it is related to the position of certain Unicode characters in our content. We have some very long strings translated into French. When the exception started showing up, I found that simply adding a space (anywhere, at random) would resolve the issue.
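
      For reference, this exact message is what a strict JDK UTF-8 decoder reports when its input ends in the middle of a multi-byte character. The following is a minimal standalone sketch, not ModeShape code: the class name TruncatedUtf8Demo and the helper decodeStrict are made up for illustration, and it only shows how the message can arise, not that this is the cause in BsonDataInput.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of ModeShape or Infinispan.
public class TruncatedUtf8Demo {

    // Decodes the whole array strictly: a new CharsetDecoder reports
    // malformed input by default, so bad bytes raise an exception
    // instead of being replaced with U+FFFD.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        // 'a' is one byte (0x61); U+00A3 is two bytes (0xC2 0xA3).
        byte[] full = "a\u00A3".getBytes(StandardCharsets.UTF_8);
        System.out.println(full.length); // prints 3

        // Drop the final byte so the input ends on the lead byte 0xC2.
        byte[] cut = Arrays.copyOf(full, 2);
        try {
            decodeStrict(cut);
        } catch (CharacterCodingException e) {
            // prints java.nio.charset.MalformedInputException: Input length = 1
            System.out.println(e);
        }
    }
}
```

      A decoder that is handed one chunk at a time sees the same situation whenever a chunk boundary falls between the two bytes, unless the dangling byte is carried over into the next chunk.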

      It seems to centre on the chunking done between the byte buffer and the char buffer. If (in the unit test above) I increase the size of the string to twice the buffer's minimum size and step through the readUTF method, I can see that the char buffer ends up with one remaining slot that is never filled. This repeats until the length reaches 0, at which point the decoder flags the input as malformed.
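
      To make the chunking concern concrete, here is a rough sketch of one way to decode a length-prefixed UTF-8 payload in fixed-size chunks without losing bytes at chunk boundaries. It is not the actual readUTF implementation: the class ChunkedUtf8Reader, the method readUtf8, and its parameters are assumptions for illustration. The two points it shows are carrying undecoded bytes forward with ByteBuffer.compact() and draining the char buffer whenever the decoder reports overflow.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of ModeShape or Infinispan.
public class ChunkedUtf8Reader {

    // Reads exactly byteLength bytes from in and decodes them as UTF-8,
    // using buffers of chunkSize elements. chunkSize must be at least 4
    // so a buffer can always hold one complete UTF-8 sequence.
    static String readUtf8(InputStream in, int byteLength, int chunkSize) throws IOException {
        if (chunkSize < 4) throw new IllegalArgumentException("chunkSize must be >= 4");
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(chunkSize);
        CharBuffer chars = CharBuffer.allocate(chunkSize);
        StringBuilder out = new StringBuilder();
        int remaining = byteLength;

        while (true) {
            // Fill the free space after any bytes carried over from the last pass.
            int toRead = Math.min(remaining, bytes.remaining());
            if (toRead > 0) {
                int n = in.read(bytes.array(), bytes.position(), toRead);
                if (n < 0) throw new EOFException("stream ended before byteLength bytes");
                bytes.position(bytes.position() + n);
                remaining -= n;
            }
            bytes.flip();
            boolean endOfInput = remaining == 0;

            CoderResult result;
            do {
                result = decoder.decode(bytes, chars, endOfInput);
                if (result.isError()) result.throwException();
                // Drain the char buffer so a nearly full buffer never stalls decoding.
                chars.flip();
                out.append(chars);
                chars.clear();
            } while (result.isOverflow());

            if (endOfInput) break;
            // Keep the bytes of an incomplete character for the next chunk.
            bytes.compact();
        }

        // Let the decoder emit any buffered state (a no-op for UTF-8).
        decoder.flush(chars);
        chars.flip();
        out.append(chars);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Seven ASCII characters followed by U+00A3: with chunkSize 8 the
        // first chunk ends on the lead byte of the two-byte sequence.
        String text = "aaaaaaa\u00A3";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        String decoded = readUtf8(new ByteArrayInputStream(data), data.length, 8);
        System.out.println(decoded.equals(text)); // prints true
    }
}
```

      The inner loop matters as much as the compact() call: if the char buffer is only drained after an underflow, a character that does not fit in the last free slots is never written and the decoder makes no progress.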

      Any feedback on this would be great!

          People

            Assignee: Horia Chiorean (Inactive)
            Reporter: Adam McCormick (Inactive)