Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: None
Affects Version/s: 7.1.0.ER3
Component/s: ActiveMQ
Labels: None

CDW blocker:
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Target Release: 7.1.0.GA
Steps to Reproduce:

To reproduce this issue, you can use the Jenkins job [1]. We have an NFS 4.1 mount in our messaging lab.

[1]https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/job/eap7-artemis-ha-failover-dedicated-nfs41-tier-03-okalman/


SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

In the multiple failover/failback test, where the live server is killed, the backup server fails to become live on NFS 4.1.
I monitored the server.lock file during this scenario.
While the live server (pid 9001) is running and the backup (pid 12127) is waiting for the live server to fail, lsof prints:
[hudson@messaging-12 journal]$ lsof server.lock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 9001 hudson 470u REG 0,41 19 138 server.lock
java 12127 hudson 463u REG 0,41 19 138 server.lock

After the live server is killed, only the backup still has an open FD on this file:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 12127 hudson 463u REG 0,41 19 138 server.lock
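
The two lsof listings above can also be compared programmatically. This is a minimal sketch, assuming the plain-text lsof output format shown in this report; the function name open_fd_pids is hypothetical and not part of any tool used here:

```python
def open_fd_pids(lsof_output):
    """Return the set of PIDs listed in the text output of `lsof <file>`."""
    pids = set()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip blank lines, shell prompts and the header row; PID is the second column.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        pids.add(int(fields[1]))
    return pids


before_kill = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 9001 hudson 470u REG 0,41 19 138 server.lock
java 12127 hudson 463u REG 0,41 19 138 server.lock
"""
after_kill = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 12127 hudson 463u REG 0,41 19 138 server.lock
"""

# PIDs that no longer have the file open after the kill (here: the live server).
released = open_fd_pids(before_kill) - open_fd_pids(after_kill)
print(sorted(released))  # prints [9001]
```

Note that this only shows which processes have the file open; an open FD does not by itself tell which process holds a lock on the file.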

So the open file descriptors look as expected, but the backup fails to detect that the live server has died and does not become live.
We also have a thread dump from this scenario.
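
For readers unfamiliar with lock-file based failover, the detection step can be pictured as the backup repeatedly trying to take a lock on the shared file. This is a minimal sketch, assuming that failure detection works by acquiring a POSIX byte-range lock on server.lock. It is an illustration only, not the Artemis implementation; wait_for_lock, try_lock, the locked byte range and the polling interval are all hypothetical:

```python
import fcntl
import os
import time


def try_lock(fd):
    """Try to take an exclusive, non-blocking POSIX lock on the first byte of fd."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
        return True
    except OSError:
        # Another process still holds a conflicting lock (EACCES or EAGAIN).
        return False


def wait_for_lock(path, timeout_s=30.0, poll_s=0.5):
    """Poll until the lock is acquired or timeout_s elapses.

    Returns the open file descriptor on success (the caller must keep it open
    to keep the lock), or None on timeout.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    deadline = time.monotonic() + timeout_s
    while True:
        if try_lock(fd):
            return fd
        if time.monotonic() >= deadline:
            os.close(fd)
            return None
        time.sleep(poll_s)
```

Under this assumption, a backup calling wait_for_lock("server.lock") would return as soon as the process holding the lock dies and the file system releases its lock; the symptom reported here would correspond to that call never succeeding on NFS 4.1.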

I tried to reproduce the same issue on NFS 4.0, where everything seems to work fine.

Customer impact: the high availability of an HA topology cannot be relied on, because the failover mechanism is not reliable on NFS 4.1.

This is a regression against EAP 7.0, where we are not able to reproduce this issue.
We did not encounter this earlier because this scenario previously failed on https://issues.jboss.org/browse/JBEAP-10704

logs_and_dump.zip
1.69 MB
2017/08/08 9:53 AM

is related to

JBEAP-12872 Artemis is not be able to guarantee HA on NFSv4 on RHEL 7.4

Closed

Assignee: Martyn Taylor (Inactive)

Reporter: Ondřej Kalman (Inactive)

Votes: 0

Watchers: 8

Created: 2017/08/08 9:54 AM

Updated: 2021/10/24 6:41 AM

Resolved: 2018/01/04 1:48 AM
