-
Bug
-
Resolution: Done
-
Critical
-
EAP_EWP 5.1.1, EAP_EWP 5.1.2 CR1, EAP_EWP 5.1.2 CR3, EAP_EWP 5.1.2 CR4
-
None
-
RHEL 6 x86-64 with GFS2/SAN
-
Documented as Resolved Issue
-
NEW
Hi Clebert,
As we agreed, we've started developing tests with a disconnected journal according to the HornetQ test plan (section 10). For now all test scenarios are failing, because the HornetQ architecture was not initially designed to handle such a situation. I'd like to share the current test results here, along with some information about the testing environment.
Test Scenario - "Node is disconnected from journal" - collocated backup (corresponds to section 10.1.1):
1. Start cluster - EAP servers A and B
2. Start "live" producer and "live" consumer connected to server A and sending messages to "liveQueue" - active for the whole duration of the test
3. Start producer - send 1000 messages to "testQueue" on server A
4. Disconnect SAN from server A
5. Start consumer - read from server B from "testQueue"
Pass criteria:
After step 4 the backup node takes over the live role.
Clients are reconnected to the backup node and are able to continue with their work.
Test results:
After step 4:
- EAP server B does not take over its role - the backup does not become live
- the "live" producer and consumer end with an exception (see attached logs) and do not fail over to EAP server B
In step 5 the consumer on EAP server B is able to read only half of the messages sent to "testQueue" in step 3 (load balancing).
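To make the "only half of the messages" result concrete, here is a minimal sketch of the arithmetic. It assumes the 1000 messages from step 3 are spread over the two cluster nodes in strict round-robin order. That is an assumption for illustration only, not something confirmed from the logs. The class and method names are hypothetical and this is not HornetQ code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: models two journals as in-memory lists.
public class LoadBalancingSketch {

    // Distributes message ids 0..count-1 over nodeCount journals,
    // assuming strict round-robin load balancing (an assumption).
    static List<List<Integer>> distribute(int count, int nodeCount) {
        List<List<Integer>> journals = new ArrayList<>();
        for (int n = 0; n < nodeCount; n++) {
            journals.add(new ArrayList<>());
        }
        for (int id = 0; id < count; id++) {
            journals.get(id % nodeCount).add(id);
        }
        return journals;
    }

    public static void main(String[] args) {
        // Index 0 stands for server A, index 1 for server B.
        List<List<Integer>> journals = distribute(1000, 2);

        // After step 4 the journal of server A is unreachable, so a consumer
        // connected to server B can only read what is in B's own journal.
        System.out.println("readable from B: " + journals.get(1).size());
        System.out.println("stuck on A:      " + journals.get(0).size());
    }
}
```

Under this assumption 500 messages stay in the journal of server A until its SAN connection is restored or the backup takes over.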
Note about testing environment:
GFS2/SAN uses the "fenced" daemon, which powers off nodes that have failed. Disconnecting the SAN triggers this, but it takes a couple of minutes. For our test scenario this means that after step 4 new clients can still connect to EAP server A, but fail to read or send any messages. EAP server B only delivers the messages that are already in its journal when clients connect to it.
Do we have some solution already?
Thank you,
Mirek
- is related to
-
JBPAPP-7870 Node disconnected from journal hangs on shutdown
- Closed