
Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: 7.0.2.CR1, 7.0.2.GA
Affects Version/s: 7.0.0.ER6
Component/s: ActiveMQ
Labels:
- downstream_dependency

Affects:

Release Notes
Bugzilla References:
https://bugzilla.redhat.com/show_bug.cgi?id=1321998
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Release Note Status:
Documented as Known Issue
Target Release:

7.0.z.GA
Steps to Reproduce:
git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
cd eap-tests-hornetq/scripts/
git checkout refactoring_modules
groovy -DEAP_VERSION=7.0.0.ER6 PrepareServers7.groovy
export WORKSPACE=$PWD
export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
export JOURNAL_DIRECTORY_A=$WORKSPACE/journal-A
export JOURNAL_DIRECTORY_B=$WORKSPACE/journal-B
export JOURNAL_DIRECTORY_C=$WORKSPACE/journal-C
export JOURNAL_DIRECTORY_D=$WORKSPACE/journal-D
cd ../jboss-hornetq-testsuite/
mvn clean test -Dtest=ReplicatedColocatedClusterFailoverTestCase#testFailbackWithMdbsShutdown -DfailIfNoTests=false -Deap=7x | tee log

Sprint:
EAP 7.0.2

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Scenario: Two nodes run in a manually created colocated replicated topology. Both nodes contain InQueue and OutQueue.

We send 2000 messages (a mix of large and normal messages) to InQueue on node 1.
On each node we deploy an MDB that resends messages from InQueue to OutQueue.
While the messages are being resent, we cleanly shut down node 2 and, after some time, start it again.
We receive messages from OutQueue on node 1 and check whether the number of received messages equals the number of sent messages.

Expectation: all messages will be resent

Actual state: some messages are not resent and they are lost

Customer impact: large messages might get lost in a colocated HA topology with a replicated journal if one of the servers is cleanly shut down.

As you can see in [1] and [2], the lost messages are stuck in the sf.my-cluster queue of node 2 and the corresponding large message files have zero length. The bodies of the lost messages are in largemessages1; see [3].

Race condition that causes the loss of messages:

Node 2 decides to redistribute message-1 to node 1.
It creates a copy of message-1 with a new messageID (message-2), and message-1 is considered delivered.
In the meantime, node 2 is shutting down, so the redistribution of message-2 to node 1 fails.
After that, the backup on node 1 becomes active and continues redistributing message-2 to the live server on node 1.
The backup knows about message-2 but does not have the body of this message, so it sends only the header packet and waits for an acknowledgement from the live server. The live server receives the header packet and waits for the chunk packets. Both servers wait for each other.
Node 2 is started again. The live server on node 2 synchronizes with the backup on node 1 and thus receives message-2 with a zero-length body.
Again, node 2 sends only the header packet and waits for an acknowledgement, while node 1 receives the header packet and waits for the chunks.
Message-2 is stuck in the sf.my-cluster queue and its body is lost.

[1]

[standalone@localhost:9990 runtime-queue=sf.my-cluster.6ef15b5a-e1f0-11e5-b678-65948414801b] :list-messages
{
    "outcome" => "success",
    "result" => [
        {
            "address" => "jms.queue.InQueue",
            "color" => "GREEN",
            "count" => 136,
            "messageID" => 1162,
            "_AMQ_ROUTE_TOsf.my-cluster.6ef15b5a-e1f0-11e5-b678-65948414801b" => [
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                20
            ],
            "counter" => 137,
            "type" => 3,
            "priority" => 4,
            "userID" => "ID:a5b8a3ea-e1f0-11e5-a3fa-7f78e6f9d09b",
            "durable" => true,
            "__AMQ_CID" => "a165c4ff-e1f0-11e5-a3fa-7f78e6f9d09b",
            "expiration" => 0,
            "_AMQ_DUPL_ID" => "d56d32d3-9678-498a-8376-da1658497cc91457085939341",
            "timestamp" => 1457085939341L,
            "_AMQ_LARGE_SIZE" => 409605
        },
        {
            "address" => "jms.queue.InQueue",
            "color" => "RED",
            "count" => 139,
            "messageID" => 1169,
            "_AMQ_ROUTE_TOsf.my-cluster.6ef15b5a-e1f0-11e5-b678-65948414801b" => [
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                20
            ],
            "counter" => 140,
            "type" => 6,
            "priority" => 4,
            "userID" => "ID:a5da83cd-e1f0-11e5-a3fa-7f78e6f9d09b",
            "durable" => true,
            "__AMQ_CID" => "a165c4ff-e1f0-11e5-a3fa-7f78e6f9d09b",
            "expiration" => 0,
            "_AMQ_DUPL_ID" => "2d2f958d-bb0e-4e2e-9c4b-413cbb4550fc1457085939563",
            "timestamp" => 1457085939563L,
            "_AMQ_LARGE_SIZE" => 409615
        },
        {
            "address" => "jms.queue.InQueue",
            "color" => "RED",
            "count" => 137,
            "messageID" => 1184,
            "_AMQ_ROUTE_TOsf.my-cluster.6ef15b5a-e1f0-11e5-b678-65948414801b" => [
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                20
            ],
            "counter" => 138,
            "type" => 2,
            "priority" => 4,
            "userID" => "ID:a5d839db-e1f0-11e5-a3fa-7f78e6f9d09b",
            "durable" => true,
            "__AMQ_CID" => "a165c4ff-e1f0-11e5-a3fa-7f78e6f9d09b",
            "expiration" => 0,
            "_AMQ_DUPL_ID" => "f58aae80-f118-46c6-a19a-c8e90fec7bc51457085939548",
            "timestamp" => 1457085939548L,
            "_AMQ_LARGE_SIZE" => 409617
        },
        {
            "address" => "jms.queue.InQueue",
            "color" => "GREEN",
            "count" => 138,
            "messageID" => 1189,
            "_AMQ_ROUTE_TOsf.my-cluster.6ef15b5a-e1f0-11e5-b678-65948414801b" => [
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                20
            ],
            "counter" => 139,
            "type" => 5,
            "priority" => 4,
            "userID" => "ID:a5d9725c-e1f0-11e5-a3fa-7f78e6f9d09b",
            "durable" => true,
            "__AMQ_CID" => "a165c4ff-e1f0-11e5-a3fa-7f78e6f9d09b",
            "expiration" => 0,
            "_AMQ_DUPL_ID" => "8dbb9b33-f654-4556-885d-9201c443f2821457085939556",
            "timestamp" => 1457085939556L,
            "_AMQ_LARGE_SIZE" => 413163
        },
        {
            "address" => "jms.queue.InQueue",
            "color" => "RED",
            "count" => 145,
            "messageID" => 1192,
            "_AMQ_ROUTE_TOsf.my-cluster.6ef15b5a-e1f0-11e5-b678-65948414801b" => [
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                20
            ],
            "counter" => 146,
            "type" => 4,
            "priority" => 4,
            "userID" => "ID:a5dc5893-e1f0-11e5-a3fa-7f78e6f9d09b",
            "durable" => true,
            "__AMQ_CID" => "a165c4ff-e1f0-11e5-a3fa-7f78e6f9d09b",
            "expiration" => 0,
            "_AMQ_DUPL_ID" => "763ec86d-010e-455a-95bd-58ca9dd7bf7f1457085939575",
            "timestamp" => 1457085939575L,
            "_AMQ_LARGE_SIZE" => 204800
        },
        {
            "address" => "jms.queue.InQueue",
            "color" => "GREEN",
            "count" => 146,
            "messageID" => 1195,
            "_AMQ_ROUTE_TOsf.my-cluster.6ef15b5a-e1f0-11e5-b678-65948414801b" => [
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                20
            ],
            "counter" => 147,
            "type" => 3,
            "priority" => 4,
            "userID" => "ID:a5fba064-e1f0-11e5-a3fa-7f78e6f9d09b",
            "durable" => true,
            "__AMQ_CID" => "a165c4ff-e1f0-11e5-a3fa-7f78e6f9d09b",
            "expiration" => 0,
            "_AMQ_DUPL_ID" => "2587ff38-91c2-4192-846c-8d796ffd84bb1457085939780",
            "timestamp" => 1457085939780L,
            "_AMQ_LARGE_SIZE" => 409605
        },
        {
            "address" => "jms.queue.InQueue",
            "color" => "RED",
            "count" => 147,
            "messageID" => 1228,
            "_AMQ_ROUTE_TOsf.my-cluster.6ef15b5a-e1f0-11e5-b678-65948414801b" => [
                0,
                0,
                0,
                0,
                0,
                0,
                0,
                20
            ],
            "counter" => 148,
            "type" => 2,
            "priority" => 4,
            "userID" => "ID:a61b3655-e1f0-11e5-a3fa-7f78e6f9d09b",
            "durable" => true,
            "__AMQ_CID" => "a165c4ff-e1f0-11e5-a3fa-7f78e6f9d09b",
            "expiration" => 0,
            "_AMQ_DUPL_ID" => "b84bebbb-94b8-426a-a848-05a0a078acec1457085939987",
            "timestamp" => 1457085939987L,
            "_AMQ_LARGE_SIZE" => 409617
        }
    ]
}

[2]

ls -l largemessages
celkom 0
-rw-rw-r--. 1 eduda eduda 0 mar  4 11:07 1162.msg
-rw-rw-r--. 1 eduda eduda 0 mar  4 11:07 1169.msg
-rw-rw-r--. 1 eduda eduda 0 mar  4 11:07 1184.msg
-rw-rw-r--. 1 eduda eduda 0 mar  4 11:07 1189.msg
-rw-rw-r--. 1 eduda eduda 0 mar  4 11:07 1192.msg
-rw-rw-r--. 1 eduda eduda 0 mar  4 11:07 1195.msg
-rw-rw-r--. 1 eduda eduda 0 mar  4 11:07 1228.msg
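
The comparison between [1] and [2] can be scripted. The following is a minimal sketch, not part of the original report: it assumes the :list-messages output from [1] has been saved to a text file, and that each large message is stored as <messageID>.msg, as the listings here suggest. The function names and the saved file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<messageID> <_AMQ_LARGE_SIZE>" pairs from a
# file holding saved :list-messages output.
list_large_message_sizes() {
    awk -F'=> ' '
        /"messageID"/       { id = $2; gsub(/[^0-9]/, "", id) }
        /"_AMQ_LARGE_SIZE"/ { sz = $2; gsub(/[^0-9]/, "", sz); print id, sz }
    ' "$1"
}

# Hypothetical helper: compare the size the broker reports for each message
# with the size of <messageID>.msg in the given large-messages directory.
check_large_message_files() {
    list_file=$1
    dir=$2
    list_large_message_sizes "$list_file" | while read -r id expected; do
        file="$dir/$id.msg"
        if [ -f "$file" ]; then
            # tr strips the padding some wc implementations add.
            actual=$(wc -c < "$file" | tr -d ' ')
        else
            actual=missing
        fi
        if [ "$actual" = "$expected" ]; then
            status=ok
        else
            status=MISMATCH
        fi
        echo "$id expected=$expected actual=$actual $status"
    done
}
```

Called as `check_large_message_files list-messages.txt largemessages`, it prints one line per message; a zero-length file shows up as a MISMATCH against the reported _AMQ_LARGE_SIZE.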

[3]

ls -l largemessages1
celkom 2624
-rw-rw-r--. 1 eduda eduda 409605 mar  4 11:07 1162.msg
-rw-rw-r--. 1 eduda eduda 409615 mar  4 11:07 1169.msg
-rw-rw-r--. 1 eduda eduda 409617 mar  4 11:07 1184.msg
-rw-rw-r--. 1 eduda eduda 413163 mar  4 11:07 1189.msg
-rw-rw-r--. 1 eduda eduda 204800 mar  4 11:07 1192.msg
-rw-rw-r--. 1 eduda eduda 409605 mar  4 11:07 1195.msg
-rw-rw-r--. 1 eduda eduda 409617 mar  4 11:07 1228.msg
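
The check behind [2] and [3] can be sketched as a small shell function. This is an illustration only, under the assumption that a lost body keeps the same file name in the second directory, as it does in the listings above. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report zero-length .msg files in directory $1 and
# whether a same-named, non-empty file exists in directory $2.
find_empty_large_messages() {
    empty_dir=$1
    body_dir=$2
    for f in "$empty_dir"/*.msg; do
        # Skip the unexpanded glob when the directory has no .msg files.
        [ -e "$f" ] || continue
        # -s is true only for files larger than zero bytes.
        if [ ! -s "$f" ]; then
            name=$(basename "$f")
            if [ -s "$body_dir/$name" ]; then
                size=$(wc -c < "$body_dir/$name" | tr -d ' ')
                echo "$name empty; body found in $body_dir ($size bytes)"
            else
                echo "$name empty; no body found in $body_dir"
            fi
        fi
    done
}
```

For the directories shown here, the call would be `find_empty_large_messages largemessages largemessages1`.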


Attachments:
server1.log.7z (9.17 MB, 2016/03/04 5:46 AM)
server2.log.7z (1.42 MB, 2016/03/04 5:46 AM)
test-suite.log.zip (1.22 MB, 2016/03/04 5:43 AM)

clones

JBEAP-5257 (7.1.0) Redistribution loses large messages when server with HA is restarted

Verified

is incorporated by

JBEAP-4679 (7.0.z) Upgrade Artemis from 1.1.0.SP17 to 1.1.0.SP18

Verified
