
Type: Bug
Resolution: Unresolved
Priority: Blocker
Component/s: Artemis
Labels: None

Steps to Reproduce:

Create two EC2 instances using the Red Hat JBoss EAP AMI (RHEL-7-JBEAP-7.4.0_HVM_GA-20210909-x86_64-0-Access2-GP2).

Attach the same EFS storage to both nodes.

Configure the nodes using the attached scripts.
Workaround: Workaround Exists
Workaround Description:


Restart both EAP instances that make up the HA Live/Backup pair.


Scenario:

This scenario is inspired by "High Availability - Shared Store" and attempts to replicate that setup on AWS, using AWS EFS as the shared storage:

We have two EC2 instances created from the Red Hat AMI (RHEL-7-JBEAP-7.4.0_HVM_GA-20210909-x86_64-0-Access2-GP2); both EC2 instance types must support Multi-Attach (e.g. t3.medium).
The first instance is configured as the Live node (LIVE.standalone-ec2-full-ha.xml).
The second instance is configured as the Backup node (BACKUP.standalone-ec2-full-ha.xml).
Both instances use shared storage on an external AWS EFS file system, which is mounted on both EC2 instances using the NFSv4 protocol; note this is possible since both EC2 instance types support Multi-Attach.

This scenario has two main flaws:

Startup fails
EFS is slower than other storage solutions such as EBS

Startup fails

The Slave was started at 09:50:20 and the Master at 09:52:33, so the Slave was started about 2 minutes (2 min 13 s) before the Master node.
Note that the error also occurs if the start order is reversed.

When the Live/Backup pair is first started, the nodes log the following errors and it is not possible to send messages to or receive messages from the Master node:

Slave:

2022-01-21 09:52:54,398 INFO  [org.infinispan.CLUSTER] (thread-7,ejb,ip-172-31-22-146) ISPN000094: Received new cluster view for channel ejb: [ip-172-31-22-146|1] (2) [ip-172-31-22-146, ip-172-31-18-82]
2022-01-21 09:52:54,399 INFO  [org.infinispan.CLUSTER] (thread-7,ejb,ip-172-31-22-146) ISPN100000: Node ip-172-31-18-82 joined the cluster
2022-01-21 09:52:54,544 WARN  [org.apache.activemq.artemis.core.server] (Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@3faf32ff)) AMQ222137: Unable to announce backup, retrying: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ219012: Timed out waiting to receive initial broadcast from cluster]
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.executeDiscovery(ServerLocatorImpl.java:767)
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:655)
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:549)
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:528)
    at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:267)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

Master:

2022-01-21 09:52:56,287 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 87) ISPN000094: Received new cluster view for channel ejb: [ip-172-31-22-146|1] (2) [ip-172-31-22-146, ip-172-31-18-82]
2022-01-21 09:52:56,293 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 84) ISPN000079: Channel ejb local address is ip-172-31-18-82, physical addresses are [172.31.18.82:7600]
2022-01-21 09:52:56,297 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 85) ISPN000079: Channel ejb local address is ip-172-31-18-82, physical addresses are [172.31.18.82:7600]
2022-01-21 09:52:56,307 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 87) ISPN000079: Channel ejb local address is ip-172-31-18-82, physical addresses are [172.31.18.82:7600]
2022-01-21 09:52:56,358 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 88) AMQ221034: Waiting indefinitely to obtain live lock
2022-01-21 09:53:06,358 WARN  [org.apache.activemq.artemis.core.server] (Thread-0 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@5adf17ff)) AMQ222137: Unable to announce backup, retrying: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ219012: Timed out waiting to receive initial broadcast from cluster]
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.executeDiscovery(ServerLocatorImpl.java:767)
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:655)
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:549)
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:528)
    at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:267)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

Note that the logs show a cluster being formed; nevertheless, the broker does not start.
Complete logs are in the attached MASTER-server.log and SLAVE-server.log.

Restarting the EAP instances on both the Master and Slave nodes solves the issue.
Complete logs are in the attached MASTER-AFTER_RESTART-server.log and SLAVE-AFTER_RESTART-server.log.
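The Master log stops at "AMQ221034: Waiting indefinitely to obtain live lock", which suggests the live server is blocked on the shared-store lock kept on the EFS mount. One way to check, independently of EAP, whether NFSv4 file locks on EFS are honored across the two EC2 instances is a small java.nio probe like the one below. This is a hypothetical diagnostic sketch: the class name, the default path /mnt/efs/lock-probe.lck and the hold/probe modes are assumptions, and it does not reproduce the locking logic Artemis itself uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical diagnostic: checks whether an exclusive lock taken on one EC2
 * node is seen by the other node through the shared EFS (NFSv4) mount.
 */
public class SharedLockProbe {

    /** Tries once to take an exclusive lock; returns false if another process holds it. */
    static boolean tryExclusiveLock(Path file) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // held by another process (e.g. the other node)
            }
            lock.release();
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Takes an exclusive lock (blocking) and keeps it for the given number of seconds. */
    static void holdExclusiveLock(Path file, long seconds) throws IOException, InterruptedException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            System.out.println("Holding lock on " + file + " for " + seconds + " s");
            Thread.sleep(seconds * 1000);
        } // lock is released first, then the channel is closed
    }

    public static void main(String[] args) throws Exception {
        // Usage: java SharedLockProbe hold|probe [file-on-efs]
        String mode = args.length > 0 ? args[0] : "probe";
        // Hypothetical default path; pass the real EFS mount point as the second argument.
        Path file = Paths.get(args.length > 1 ? args[1] : "/mnt/efs/lock-probe.lck");
        if (mode.equals("hold")) {
            holdExclusiveLock(file, 60);
        } else {
            System.out.println(tryExclusiveLock(file)
                    ? "Lock acquired: no other process holds it"
                    : "Lock busy: held by another process");
        }
    }
}
```

With EAP stopped on both nodes, run the probe in hold mode on the Live node and, within 60 seconds, in probe mode on the Backup node. If the Backup also reports the lock as acquired, exclusive locks are not being honored across the two mounts.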

EFS is slower than other storage solutions such as EBS

Using a Java client outside AWS, we are now able to send messages to and receive messages from the Master node.

In terms of performance, it takes 30 seconds to send 200 messages and another 34 seconds to receive them:

Fri Jan 21 13:43:59 CET 2022 - Sending 200 messages ...
Fri Jan 21 13:44:29 CET 2022 - Sent 200 messages.
Fri Jan 21 13:44:32 CET 2022 - Receiving messages ...
Fri Jan 21 13:45:06 CET 2022 - Received 200 messages.

If we use the default EC2 instance storage, EBS (not multi-attached), instead of EFS, it takes 20 seconds to send 200 messages and another 21 seconds to receive them:

Fri Jan 21 13:57:53 CET 2022 - Sending 200 messages ...
Fri Jan 21 13:58:13 CET 2022 - Sent 200 messages.
Fri Jan 21 13:58:16 CET 2022 - Receiving messages ...
Fri Jan 21 13:58:37 CET 2022 - Received 200 messages.
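The figures above can be rechecked from the client timestamps. The following is a minimal Java sketch (the class and method names are illustrative and not part of the client used for the test) that parses the HH:mm:ss values from the two runs and derives elapsed time, throughput and the EFS/EBS slowdown; it assumes each batch starts and ends on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical helper: recomputes the EFS vs EBS timings reported above. */
public class ThroughputFromLogs {

    /** Seconds between two HH:mm:ss timestamps taken on the same day. */
    static long elapsedSeconds(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).getSeconds();
    }

    /** Messages per second for a batch of the given size. */
    static double throughput(int messages, long seconds) {
        return (double) messages / seconds;
    }

    public static void main(String[] args) {
        final int batch = 200;

        // Timestamps copied from the client output of the two runs (CET).
        long efsSend = elapsedSeconds("13:43:59", "13:44:29");
        long efsReceive = elapsedSeconds("13:44:32", "13:45:06");
        long ebsSend = elapsedSeconds("13:57:53", "13:58:13");
        long ebsReceive = elapsedSeconds("13:58:16", "13:58:37");

        System.out.printf("EFS send:    %d s, %.2f msg/s%n", efsSend, throughput(batch, efsSend));
        System.out.printf("EFS receive: %d s, %.2f msg/s%n", efsReceive, throughput(batch, efsReceive));
        System.out.printf("EBS send:    %d s, %.2f msg/s%n", ebsSend, throughput(batch, ebsSend));
        System.out.printf("EBS receive: %d s, %.2f msg/s%n", ebsReceive, throughput(batch, ebsReceive));

        // How many times longer the same batch takes on EFS than on EBS.
        System.out.printf("EFS/EBS send time ratio:    %.2f%n", (double) efsSend / ebsSend);
        System.out.printf("EFS/EBS receive time ratio: %.2f%n", (double) efsReceive / ebsReceive);
    }
}
```

The computed values are 30 s (about 6.67 msg/s) and 34 s (about 5.88 msg/s) on EFS versus 20 s (10 msg/s) and 21 s (about 9.52 msg/s) on EBS, so sending takes 1.5 times and receiving about 1.62 times as long on EFS. The same elapsedSeconds helper gives 133 s for the gap between the Slave (09:50:20) and Master (09:52:33) start times.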

Questions

Is it worth fixing the startup issue and providing support for this scenario on AWS?


Attachments:
BACKUP.standalone-ec2-full-ha.xml (40 kB, uploaded 2022/01/28 8:13 AM by Tommaso Borgato)
LIVE.standalone-ec2-full-ha.xml (40 kB, uploaded 2022/01/28 8:13 AM by Tommaso Borgato)
