Type: Bug
Resolution: Won't Do
Priority: Blocker
Fix Version/s: None
Affects Version/s: 7.1.0.DR19
Component/s: ActiveMQ
Labels:
- eap7.1-risks-mitigation

Affects Testing: Regression
CDW blocker:
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Target Release: 7.1.0.GA
Steps to Reproduce:

This does not reproduce reliably: it was hit in 1 of 2 runs, and it took 18 hours to reach the OOME. Steps to run the soak test:

git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
cd eap-tests-hornetq/scripts/
git checkout master
groovy -DEAP_VERSION=7.1.0.DR19 PrepareServers7.groovy
export WORKSPACE=$PWD
export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
cd ../jboss-hornetq-testsuite/
mvn -Dsoak.duration=86400000 -Dgroups=category.Soak -Dmaven.test.failure.ignore=true -Deap=7x -DfailIfNoTests=false clean install | tee log

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Customer Impact: The server can crash with an OutOfMemoryError (OOME) during long-running execution. This is a regression against previous DRs and EAP 7.0.

An OOME occurs in one of the soak tests. It appears to be intermittent, as it was hit in 1 of 2 runs. The soak test runs a complex scenario with temporary queues, message selectors, core and JMS bridges, remote JCA, and topics with durable subscriptions.

See the first comment to download the heap dump, all logs, the message journal, ...

Eclipse Memory Analyzer shows 2 memory leak suspects (see attachment). The first one seems to be responsible for the OOME: there are 139,903 instances of org.apache.activemq.artemis.utils.LinkedListImpl$Node taking up 456 MB. The number of org.apache.activemq.artemis.utils.LinkedListImpl$Node instances seems to match the number of messages in the journal across all destinations, as the nodes hold references to messages.

Attaching all graphs (memory, CPU, GC statistics, ...) from the soak test. The memory graph is the most interesting one, as it suggests that the OOME happened before all heap memory was consumed.

It almost looks as if GC did not manage to free enough memory in time and the last GC cycles took longer (this is not clearly visible in the GC measurement graph, but it is there). Looking at how instances are removed in LinkedListImpl, there is one interesting detail:

private void removeAfter(Node<E> node) {
      Node<E> toRemove = node.next;
...
      //Help GC - otherwise GC potentially has to traverse a very long list to see if elements are reachable, this can result in OOM
      //https://jira.jboss.org/browse/HORNETQ-469
      toRemove.next = toRemove.prev = null;
   }

It is possible that GC had to traverse over 100k instances to check whether some object in the list was still referenced, and that this took a long time. In the meantime the server crashed with the OOME. This is only a theory, though.
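To make the unlinking step in removeAfter concrete, here is a minimal, self-contained sketch of the same idea. It is not the Artemis LinkedListImpl: the class UnlinkingList and its methods addTail, pollFirst and toList are hypothetical names used only for illustration, and the real class contains logic (elided as "..." above) that is not reproduced here. The sketch only shows that after removal the node's own next and prev are cleared, so a stale reference to a removed node no longer keeps its former neighbours reachable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified doubly linked list; NOT the Artemis LinkedListImpl. */
final class UnlinkingList<E> {

    static final class Node<E> {
        final E value;
        Node<E> prev;
        Node<E> next;

        Node(E value) {
            this.value = value;
        }
    }

    // Sentinel head: head.next is the first real element.
    private final Node<E> head = new Node<>(null);
    private Node<E> tail = head;
    private int size;

    /** Appends a value and returns the node that holds it. */
    Node<E> addTail(E value) {
        Node<E> node = new Node<>(value);
        node.prev = tail;
        tail.next = node;
        tail = node;
        size++;
        return node;
    }

    /** Removes the node that follows {@code node} and clears its own links. */
    Node<E> removeAfter(Node<E> node) {
        Node<E> toRemove = node.next;
        if (toRemove == null) {
            return null; // nothing follows this node
        }
        node.next = toRemove.next;
        if (toRemove.next != null) {
            toRemove.next.prev = node;
        } else {
            tail = node; // the last element was removed
        }
        size--;
        // Clear the removed node's links so that a stale reference to it
        // cannot keep the rest of the chain reachable.
        toRemove.next = toRemove.prev = null;
        return toRemove;
    }

    /** Removes and returns the first element's node, or null if empty. */
    Node<E> pollFirst() {
        return removeAfter(head);
    }

    int size() {
        return size;
    }

    /** Copies the remaining values in list order. */
    List<E> toList() {
        List<E> out = new ArrayList<>();
        for (Node<E> n = head.next; n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }
}
```

With three elements added, calling pollFirst() returns the first node with both links set to null, while the list itself still holds the remaining two elements in order. Whether this unlinking is related to the OOME seen here remains part of the theory above and is not demonstrated by the sketch.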


Attachments:
- jms-server_CpuLoadMeasurement.png (100 kB, 2017/05/29 8:19 AM)
- jms-server_FileMeasurement.png (59 kB, 2017/05/29 8:19 AM)
- jms-server_GarbageCollectorMeasurement.png (68 kB, 2017/05/29 8:19 AM)
- jms-server_MemoryMeasurement.png (130 kB, 2017/05/29 8:19 AM)
- jms-server_SocketMeasurement.png (55 kB, 2017/05/29 8:19 AM)
- memory-analyzer.png (118 kB, 2017/05/29 8:18 AM)

is related to:
- JBEAP-11264 [7.1] Messaging - max-size-bytes is not respected leaving server to crash on OutOfMemoryError (Closed)
- JBEAP-11518 [7.0] Messaging - max-size-bytes is not respected leaving server to crash on OutOfMemoryError (Closed)

Assignee: Martyn Taylor (Inactive)
Reporter: Miroslav Novak
Votes: 0
Watchers: 4
Created: 2017/05/29 8:17 AM
Updated: 2017/06/30 6:27 AM
Resolved: 2017/06/30 6:27 AM
