JGroups / JGRP-2135

OOM with JGroups 3.6.11.

Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 3.6.12, 3.6.20.Final
    • Affects Version/s: 3.6.11
    • None

    Description

      We are running our JVMs with: -XX:OnOutOfMemoryError="kill -9 %p"

      We have been experiencing OOMs fairly often, and they occur at:

      Object / Stack Frame                                                              |Name                                                                                             | Shallow Heap | Retained Heap |Context Class Loader                         |Is Daemon
      ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      java.lang.Thread @ 0x81bdf838                                                     |Connection.Receiver [144.77.77.53:50363 - 144.77.77.53:50363],sis-cluster.service,prodpmwsv5-6461|          120 |           456 |sun.misc.Launcher$AppClassLoader @ 0x800175a8|false
      |- at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)             |                                                                                                 |              |               |                                             |
      |- at org.jgroups.blocks.cs.TcpConnection$Receiver.run()V (TcpConnection.java:310)|                                                                                                 |              |               |                                             |
      |- at java.lang.Thread.run()V (Thread.java:745)                                   |                                                                                                 |              |               |                                             |
      ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      

      The code where it happens is in TcpConnection.java:

      while(canRun()) {
          try {
              int len=in.readInt();
              if(buffer == null || buffer.length < len)
                  buffer=new byte[len];
              in.readFully(buffer, 0, len);
              updateLastAccessed();
              server.receive(peer_addr, buffer, 0, len);
          }
          catch(OutOfMemoryError mem_ex) {
              t=mem_ex;
              break; // continue;
          }
          catch(IOException io_ex) {
              t=io_ex;
              break;
          }
          catch(Throwable e) {
          }
      }
      

      The allocation that fails is: buffer=new byte[len];

      It looks to me as if an invalid, very large length value is received, and the process runs out of memory when it tries to allocate a byte array of that size.
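
      To illustrate how such a value could arise, here is a minimal sketch. It assumes (this is not confirmed anywhere in this report) that the bytes on the socket came from something other than a JGroups peer, for example an HTTP client probing the port. The class and method names are hypothetical. The first four bytes are read as a big-endian int, exactly as DataInputStream.readInt() does:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class LengthPrefixMisread {

    /** Interprets the first four bytes as a big-endian int via DataInputStream.readInt(). */
    static int readLengthPrefix(byte[] wireBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wireBytes))) {
            return in.readInt();
        } catch (IOException e) {
            // Only reachable here if fewer than four bytes are supplied (EOFException)
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption for illustration only: stray HTTP traffic reaches the receiver
        byte[] stray = "GET / HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII);
        int len = readLengthPrefix(stray);
        // 'G' 'E' 'T' ' ' = 0x47455420 = 1195725856, i.e. more than 1 GB
        System.out.println("length prefix read as: " + len);
    }
}
```

      With a 1 GB heap, new byte[1195725856] cannot succeed, which would match the OutOfMemoryError in the trace above.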

      Running the JVMs without kill-on-OOM would make this issue "disappear", in the sense that the error is swallowed by:

                      catch(OutOfMemoryError mem_ex) {
                          t=mem_ex;
                          break; // continue;
                      }
      

      Catching OutOfMemoryError is a strange implementation choice.
      Instead, a size limit should be enforced to protect against invalid sizes.
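
      A minimal sketch of what such a size limit could look like follows. It is not the JGroups implementation and not necessarily how the issue was eventually fixed; BoundedFrameReader, readFrame and MAX_FRAME_LENGTH are hypothetical names, and the 64 MB limit is an arbitrary placeholder. The idea is to validate the length prefix before allocating, and to treat an out-of-range value as a protocol error rather than relying on OutOfMemoryError:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class BoundedFrameReader {

    // Hypothetical limit; a real value depends on the largest legitimate message
    static final int MAX_FRAME_LENGTH = 64 * 1024 * 1024;

    /**
     * Reads one length-prefixed frame. Rejects negative or oversized lengths
     * before any buffer is allocated, so a corrupt prefix cannot exhaust the heap.
     */
    static byte[] readFrame(DataInputStream in, int maxLength) throws IOException {
        int len = in.readInt();
        if (len < 0 || len > maxLength)
            throw new IOException("invalid frame length " + len + " (limit " + maxLength + ")");
        byte[] buffer = new byte[len];
        in.readFully(buffer, 0, len);
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        // A well-formed frame: length 3 followed by three payload bytes
        byte[] good = {0, 0, 0, 3, 10, 20, 30};
        byte[] payload = readFrame(new DataInputStream(new ByteArrayInputStream(good)), MAX_FRAME_LENGTH);
        System.out.println("payload bytes: " + payload.length);

        // A bogus prefix (0x47455420, about 1.1 GB) is rejected without allocating
        byte[] bad = {0x47, 0x45, 0x54, 0x20};
        try {
            readFrame(new DataInputStream(new ByteArrayInputStream(bad)), MAX_FRAME_LENGTH);
        } catch (IOException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

      Throwing IOException fits the receive loop quoted above, where an IOException already terminates the connection's receiver.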

      My heap limit is 1 GB and my heap dumps are only 50 MB, so the attempted allocation must be huge.

          People

            Assignee: Bela Ban
            Reporter: Zoltan Farkas (Inactive)