Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: EAP_EWP 5.1.2 ER2, EAP_EWP 5.1.2
Affects Version/s: EAP_EWP 5.1.1, EAP_EWP 5.1.2 CR1, EAP_EWP 5.1.2 CR3, EAP_EWP 5.1.2 CR4
Component/s: HornetQ
Labels:
None
Environment:

RHEL 6 x86-64 with GFS2/SAN

Release Note Text:

In a situation with clustered HornetQ instances where a cluster node has its journal disconnected - e.g. when the server loses its connection to the SAN - the other nodes did not take over in place of the failed node. This problem has now been fixed and failover from a failed HornetQ node now occurs without interruption to the client.
Release Note Status:
Documented as Resolved Issue
Docs QE Status:
NEW

Hi Clebert,

As we agreed, we've started developing tests with a disconnected journal according to the HornetQ test plan (section 10). For now all test scenarios are failing because the HornetQ architecture was not initially designed to handle such a situation. I'd like to share the current test results and some information about the testing environment.

Test Scenario - "Node is disconnected from journal" - collocated backup (corresponds to section 10.1.1):
1. Start cluster - EAP servers A and B
2. Start "live" producer and "live" consumer connected to server A and sending messages to "liveQueue" - active for the whole duration of the test
3. Start producer - send 1000 messages to "testQueue" on server A
4. Disconnect SAN from server A
5. Start consumer - read from "testQueue" on server B

Pass criteria:
After step 4 the backup node takes over the live role.
Clients are reconnected to the backup node and are able to continue their work.

Test results:
After step 4:

EAP server B does not take over - the backup does not become live
The "live" producer and consumer end with an exception (see attached logs) and do not fail over to EAP server B
In step 5, the consumer on EAP node B is able to read only half of the messages sent to "testQueue" in step 3 (load balancing)
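
A check for the last result could be sketched as below. This is only an illustration in plain Java, assuming each test message carries a numeric ID; the class `MessageAudit` and the method `missing` are hypothetical names and are not taken from the attached test clients.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of the attached clients.
final class MessageAudit {
    private MessageAudit() {}

    /** Returns the IDs that were sent but never received, in ascending order. */
    static Set<Integer> missing(Collection<Integer> sent, Collection<Integer> received) {
        Set<Integer> result = new TreeSet<>(sent);
        // Whatever remains after removing the received IDs was lost.
        result.removeAll(received);
        return result;
    }
}
```

With 1000 sent IDs and a consumer that sees only every second one, `missing` would return 500 IDs; an empty result would mean no messages were lost.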

Note about testing environment:
GFS2/SAN uses the "fenced" daemon, which powers off nodes that have failed. Disconnecting the SAN triggers this, but it takes a couple of minutes. In our test scenario, after step 4 new clients can still connect to EAP server A but fail to read or send any messages. EAP server B just delivers the messages that are in its own journal when clients connect to it.

Do we already have a solution for this?

Thank you,

Mirek

jmsClient.zip
2011/12/06 7:15 AM
127 kB
Miroslav Novak
live_consumer.log
2011/09/16 4:07 AM
5 kB
Miroslav Novak
live_producer.log
2011/09/16 4:07 AM
10 kB
Miroslav Novak
logs.zip
2012/01/04 11:51 AM
3.86 MB
Miroslav Novak
logs.zip
2011/12/06 5:58 AM
1.03 MB
Miroslav Novak
newJmsClient.zip
2012/01/11 11:56 AM
92 kB
Miroslav Novak
reproducer.zip
2011/12/06 5:58 AM
9.04 MB
Miroslav Novak
san_consumer_threaddump.txt
2012/01/11 11:56 AM
12 kB
Miroslav Novak
serverA.log
2011/09/16 4:07 AM
79 kB
Miroslav Novak
server-A-threaddump.txt
2012/01/11 11:56 AM
157 kB
Miroslav Novak
serverB.log
2011/09/16 4:07 AM
37 kB
Miroslav Novak
server-B-threaddump.txt
2012/01/11 11:56 AM
187 kB
Miroslav Novak

is related to

JBPAPP-7870 Node disconnected from journal hangs on shutdown

Closed

Assignee: Clebert Suconic
Reporter: Miroslav Novak
Writer: Russell Dickenson (Inactive)
Votes: 0
Watchers: 4
Created: 2011/09/15 9:10 AM
Updated: 2012/01/12 9:16 AM
Resolved: 2012/01/12 9:16 AM
