- Bug
- Resolution: Obsolete
- Blocker
- None
- 9.0.1.Final
I am using WildFly 9 in a cluster of 3 nodes (standalone-full-ha.xml) and use the singleton service for some of our operations. Sometimes, during heavy load/traffic, the singleton service silently dies without any error or exception. There is no exception like "Failed to get quorum..".
When the load (number of concurrent requests) on WildFly drops, the service still does not recover, i.e. the singleton is not reactivated on any node. The only way to start the singleton again is to manually restart WildFly.
Following is the Infinispan and JGroups configuration from my standalone-full-ha.xml:
<stack name="tcp">
    <transport socket-binding="jgroups-tcp" type="TCP"/>
    <protocol type="TCPPING">
        <property name="initial_hosts">10.0.1.32[7600],10.0.1.38[7600],10.0.1.39[7600]</property>
        <property name="port_range">0</property>
    </protocol>
    <protocol type="MERGE2"/>
    <protocol socket-binding="jgroups-tcp-fd" type="FD_SOCK"/>
    <protocol type="FD"/>
    <protocol type="VERIFY_SUSPECT"/>
    <protocol type="pbcast.NAKACK2"/>
    <protocol type="UNICAST3"/>
    <protocol type="pbcast.STABLE"/>
    <protocol type="pbcast.GMS">
        <property name="join_timeout">5000</property>
    </protocol>
    <protocol type="MFC"/>
    <protocol type="FRAG2"/>
    <protocol type="RSVP"/>
</stack>
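One thing worth checking: under heavy load, the FD failure-detection protocol can falsely suspect a busy node, which would cause the group membership to shrink and the singleton to be stopped. The stack above runs FD with its defaults; a possible experiment would be to relax its heartbeat timeout and retry count. The values below are illustrative, not from the source:

```xml
<protocol type="FD">
    <!-- illustrative values: wait longer and retry more before suspecting a node -->
    <property name="timeout">10000</property>
    <property name="max_tries">5</property>
</protocol>
```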
....
<subsystem xmlns="urn:jboss:domain:infinispan:3.0">
    <cache-container aliases="singleton cluster" default-cache="default" module="org.wildfly.clustering.server" name="server">
        <transport lock-timeout="120000"/>
        <replicated-cache mode="ASYNC" name="default">
            <state-transfer enabled="true" timeout="300000"/>
            <transaction locking="OPTIMISTIC" mode="BATCH"/>
        </replicated-cache>
    </cache-container>
    <cache-container default-cache="session" module="org.wildfly.clustering.web.infinispan" name="web">
        <transport lock-timeout="120000"/>
        <replicated-cache mode="ASYNC" name="session">
            <state-transfer enabled="true" timeout="300000"/>
            <locking isolation="READ_COMMITTED"/>
            <transaction locking="OPTIMISTIC" mode="BATCH"/>
        </replicated-cache>
    </cache-container>
</subsystem>
Following is the Java code snippet we use to activate and start the singleton in the cluster:
public class SingletonServiceActivator implements ServiceActivator {

    public static final ServiceName SINGLETON_SERVICE_NAME =
            ServiceName.JBOSS.append("ha", "singleton");
    private static final String CONTAINER_NAME = "server";
    private static final String CACHE_NAME = "default";

    @Override
    public void activate(ServiceActivatorContext context) throws ServiceRegistryException {
        int quorum = 2;
        InjectedValue<ServerEnvironment> env = new InjectedValue<>();
        SingletonServiceClient srv = new SingletonServiceClient(env);
        ServiceController<?> factoryService = context.getServiceRegistry().getRequiredService(
                SingletonServiceBuilderFactory.SERVICE_NAME.append(CONTAINER_NAME, CACHE_NAME));
        SingletonServiceBuilderFactory factory =
                (SingletonServiceBuilderFactory) factoryService.getValue();
        SingletonElectionPolicy policy = new SimpleSingletonElectionPolicy(0);
        factory.createSingletonServiceBuilder(SINGLETON_SERVICE_NAME, srv)
                .requireQuorum(quorum)
                .electionPolicy(policy)
                .build(new DelegatingServiceContainer(context.getServiceTarget(), context.getServiceRegistry()))
                .addDependency(ServerEnvironmentService.SERVICE_NAME, ServerEnvironment.class, env)
                .setInitialMode(ServiceController.Mode.ACTIVE)
                .install();
    }

    public static final class SingletonServiceClient extends AbstractService<Serializable> {

        private final Value<ServerEnvironment> env;

        public SingletonServiceClient(Value<ServerEnvironment> env) {
            this.env = env;
        }

        @Override
        public void start(StartContext startContext) {
            log("SingletonService started");
            // do work
        }

        @Override
        public void stop(StopContext stopContext) {
            log("SingletonService stopped"); // THIS NEVER GETS CALLED
            // stop
        }
    }
}
Is there something wrong in the config, or in the way I am activating and starting the singleton?
I suspected a connectivity issue between the cluster nodes, which would prevent the service from getting the required quorum to start the singleton. As an experiment, I changed the quorum to 1, but I still see this issue occasionally under heavy load.
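For context on the quorum experiment: SimpleSingletonElectionPolicy(0) elects the member at a fixed position in the current membership view, and requireQuorum(quorum) blocks any election while fewer than that many members are visible. A self-contained sketch of that behaviour (plain Java, not the WildFly API; names and structure are illustrative):

```java
import java.util.List;

public class ElectionSketch {

    // Mirrors SimpleSingletonElectionPolicy(position): pick the member at a
    // fixed position in the membership view, but only when the view size
    // meets the configured quorum.
    static String elect(List<String> members, int position, int quorum) {
        if (members.size() < quorum) {
            return null; // no quorum: no node runs the singleton
        }
        return members.get(position % members.size());
    }

    public static void main(String[] args) {
        // Three-node view, quorum 2: the first member is elected.
        System.out.println(elect(List.of("node1", "node2", "node3"), 0, 2));
        // One surviving member, quorum 2: nobody is elected.
        System.out.println(elect(List.of("node1"), 0, 2));
    }
}
```

With quorum 1, the second case would elect "node1", which is why lowering the quorum should have masked a pure quorum problem; since the issue persists, the election itself may never be triggered.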
I would really appreciate any help or suggestions on this issue.
Also, is there a way to monitor the state of the singleton from application code, and to trigger it from our application code?
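Regarding monitoring: in WildFly the singleton is an MSC service, so one option may be to look up its ServiceController from the service registry and inspect getState(). Whether and how that lookup is available to application code depends on the deployment, so the liveness check below is left as an abstract Supplier. A generic, self-contained poller sketch (all names here are illustrative, not a WildFly API):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class SingletonWatchdog {

    // Polls a liveness check (e.g. "is the singleton service UP on some node",
    // however that is determined) and invokes a callback whenever the check
    // reports the singleton as down. Caller owns the returned executor.
    static ScheduledExecutorService watch(Supplier<Boolean> isUp,
                                          Runnable onDown,
                                          long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> {
            if (!isUp.get()) {
                onDown.run(); // e.g. log, alert, or attempt a restart
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

The onDown callback is where an application could raise an alert or attempt a recovery action, instead of relying on noticing the silent failure manually.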