JBoss Enterprise Application Platform / JBEAP-12630

Backup fails to become alive on NFS 4.1


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • None
    • 7.1.0.ER3
    • ActiveMQ
    • None

      To reproduce this issue, you can use the Jenkins job [1]. We have an NFS 4.1 mount in our messaging lab.

      [1] https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/job/eap7-artemis-ha-failover-dedicated-nfs41-tier-03-okalman/


      In the multiple failover/failback test, where the live server is killed, the backup server fails to become live on NFS 4.1.
      I managed to monitor the server.lock file during this scenario.
      While the live server (PID 9001) is running and the backup (PID 12127) waits for the live server to fail, lsof prints:
      [hudson@messaging-12 journal]$ lsof server.lock
      COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
      java 9001 hudson 470u REG 0,41 19 138 server.lock
      java 12127 hudson 463u REG 0,41 19 138 server.lock

      After the live server is killed, only the backup still holds an open file descriptor on this file:
      COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
      java 12127 hudson 463u REG 0,41 19 138 server.lock

      So at the file-descriptor level everything looks correct. However, the backup fails to detect that the live server has failed and does not become live.
      We also have a thread dump from this state.
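
      lsof shows only open file descriptors, not whether a byte-range lock left by the killed process has actually been released by the NFS server. A minimal sketch of an independent probe for that is given below. It is not the broker's own locking code: the class name LockProbe, the method canLock, and the choice of byte offset are assumptions made for illustration, and which offsets of server.lock the broker locks is not stated in this report. Because the probe briefly takes the lock itself, it is safer to try it first on a scratch file on the same NFS mount.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

class LockProbe {

    /**
     * Tries once, without blocking, to take an exclusive lock on a single
     * byte of the file. Returns true if the lock was granted; the lock is
     * released again before returning.
     */
    static boolean canLock(Path file, long position) throws IOException {
        // An exclusive lock requires a channel that is open for writing.
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock(position, 1, false);
            if (lock == null) {
                return false; // another process holds an overlapping lock
            }
            lock.release();
            return true;
        } catch (OverlappingFileLockException e) {
            return false; // this JVM already holds an overlapping lock
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: LockProbe <file> <byte-offset> [polls]");
            return;
        }
        Path file = Paths.get(args[0]);
        long position = Long.parseLong(args[1]);
        int polls = args.length > 2 ? Integer.parseInt(args[2]) : 10;
        // Poll once per second and report whether the byte is lockable.
        for (int i = 0; i < polls; i++) {
            System.out.println(Instant.now() + " lockable=" + canLock(file, position));
            Thread.sleep(1000);
        }
    }
}
```

      If the probe keeps reporting lockable=false for a byte that only the killed process could have held, the lock was not released on the server side, which would be consistent with the backup never noticing the failure.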

      I tried to reproduce the same issue on NFS 4.0 and everything seems to be working fine.
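
      Since the behaviour differs between NFS 4.0 and 4.1, it helps to confirm which protocol version a mount actually negotiated before comparing runs. The sketch below reads /proc/mounts on Linux and extracts the vers= mount option. The class name NfsMountVersion is hypothetical, and the handling of a separate minorversion= option is an assumption about how some older kernels report the version.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;

class NfsMountVersion {

    /** Extracts the NFS version from one /proc/mounts line, if it is an NFS mount. */
    static Optional<String> versionOf(String mountsLine) {
        String[] fields = mountsLine.trim().split("\\s+");
        // Fields: device, mount point, filesystem type, options, dump, pass.
        if (fields.length < 4 || !fields[2].startsWith("nfs")) {
            return Optional.empty();
        }
        String vers = null;
        String minor = null;
        for (String option : fields[3].split(",")) {
            if (option.startsWith("vers=")) {
                vers = option.substring("vers=".length());
            } else if (option.startsWith("minorversion=")) {
                minor = option.substring("minorversion=".length());
            }
        }
        if (vers == null) {
            return Optional.empty();
        }
        // Assumed older form: vers=4 plus a separate minorversion=1.
        if (minor != null && !vers.contains(".")) {
            vers = vers + "." + minor;
        }
        return Optional.of(vers);
    }

    public static void main(String[] args) throws IOException {
        // Print every NFS mount point together with its negotiated version.
        for (String line : Files.readAllLines(Paths.get("/proc/mounts"))) {
            versionOf(line).ifPresent(v ->
                    System.out.println(line.trim().split("\\s+")[1] + " -> NFS " + v));
        }
    }
}
```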

      Customer impact: the HA topology cannot be relied on for high availability, because the failover mechanism is not reliable on NFS 4.1.

      This is a regression against EAP 7.0, where we are not able to reproduce this issue.
      We did not encounter this earlier because this scenario previously failed due to https://issues.jboss.org/browse/JBEAP-10704

            mtaylor1@redhat.com Martyn Taylor (Inactive)
            okalman@redhat.com Ondřej Kalman (Inactive)
            Votes: 0
            Watchers: 8
