  1. JBoss Enterprise Application Platform
  2. JBEAP-23782

Clustering EJB + Servlet deployment: nodes take long to start after fail-over or emit many exceptions


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • None
    • 8.0.0.Beta
    • Clustering
    • None

      We have the following scenario:

      • we run 4 clustered EAP8 nodes
      • we have a deployment containing a servlet that invokes a Stateful SessionScoped EJB and stores a session value, which is supposed to be propagated to the cluster members
      • we have an haproxy load balancer in front of the 4 clustered EAP8 nodes
      • the 4 clustered EAP8 nodes are shut down and restarted one by one

      We noticed that some nodes take a very long time to restart, for example (see https://issues.redhat.com/secure/attachment/12735115/wildfly-run-215.zip):

      • node 2 takes about 4 minutes and starts with errors:
        2022-06-28 10:38:33,205 INFO  [org.jboss.modules] (main) JBoss Modules version 2.0.3.Final
        ...
        2022-06-28 10:42:40,755 ERROR [org.jboss.as] (Controller Boot Thread) WFLYSRV0026: JBoss EAP 8.0.0.Beta (WildFly Core 19.0.0.Final-redhat-20220628) started (with errors) in 247838ms - Started 1021 of 1270 services (12 services failed or missing dependencies, 526 services are lazy, passive or on-demand) - Server configuration file in use: standalone-ha.xml
        
      • node 3 takes about 4 minutes and starts with errors:
        2022-06-28 10:43:12,537 INFO  [org.jboss.modules] (main) JBoss Modules version 2.0.3.Final
        ...
        2022-06-28 10:47:20,475 ERROR [org.jboss.as] (Controller Boot Thread) WFLYSRV0026: JBoss EAP 8.0.0.Beta (WildFly Core 19.0.0.Final-redhat-20220628) started (with errors) in 248223ms - Started 1021 of 1270 services (12 services failed or missing dependencies, 526 services are lazy, passive or on-demand) - Server configuration file in use: standalone-ha.xml
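To spot slow restarts across many runs without reading each server.log by hand, the startup duration can be pulled out of the "started ... in Nms" line. The following is only a minimal sketch: the class name `StartupTime`, the helper names, and the 60-second threshold are assumptions made for illustration and are not part of the test suite.

```java
import java.time.Duration;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StartupTime {
    // Matches "started in 1234ms" as well as "started (with errors) in 1234ms",
    // as seen in the WFLYSRV0026 lines quoted above.
    private static final Pattern STARTED =
            Pattern.compile("started(?: \\(with errors\\))? in (\\d+)ms");

    // Returns the reported startup time in milliseconds, or empty if the line has none.
    static OptionalLong startupMillis(String logLine) {
        Matcher m = STARTED.matcher(logLine);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    // True when the line reports a startup time above the given threshold.
    static boolean isSlow(String logLine, Duration threshold) {
        OptionalLong ms = startupMillis(logLine);
        return ms.isPresent() && ms.getAsLong() > threshold.toMillis();
    }

    public static void main(String[] args) {
        // Abbreviated copy of the node 2 line from this report.
        String node2 = "2022-06-28 10:42:40,755 ERROR [org.jboss.as] (Controller Boot Thread) "
                + "WFLYSRV0026: JBoss EAP 8.0.0.Beta started (with errors) in 247838ms";
        Duration threshold = Duration.ofSeconds(60); // hypothetical cut-off for "slow"
        System.out.println("startup ms: " + startupMillis(node2).getAsLong()
                + ", slow: " + isSlow(node2, threshold));
    }
}
```

Running each node's log through `startupMillis` after every restart makes it easy to see which nodes were affected in a given run.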
        

      We repeated the test several times; the common pattern is that some nodes take very long to start, but they aren't always the same nodes (not always nodes 2 and 3).

      In some runs, the startup times weren't so bad (max 12 seconds) but we observed a huge number of the following exceptions (see https://issues.redhat.com/secure/attachment/12735119/wlf_20225329-135314-wildfly-service-1-server.zip):

      2022-06-29 14:03:16,717 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (default task-6) ISPN000136: Error executing command PrepareCommand on Cache 'clusterbench-ee8.ear.clusterbench-ee8-web.war', writing keys [SessionAttributesKey(6PtpEkA-zr2ifrmVng3BlhwMGmf6GLpA4K3yPHDA), SessionCreationMetaDataKey(6PtpEkA-zr2ifrmVng3BlhwMGmf6GLpA4K3yPHDA), SessionAccessMetaDataKey(6PtpEkA-zr2ifrmVng3BlhwMGmf6GLpA4K3yPHDA)]: org.infinispan.commons.marshall.MarshallingException
      	at org.infinispan@13.0.10.Final-redhat-00001//org.infinispan.marshall.protostream.impl.AbstractInternalProtoStreamMarshaller.objectToByteBuffer(AbstractInternalProtoStreamMarshaller.java:81)
      	at org.infinispan@13.0.10.Final-redhat-00001//org.infinispan.marshall.protostream.impl.AbstractInternalProtoStreamMarshaller.objectToByteBuffer(AbstractInternalProtoStreamMarshaller.java:87)
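To put a number on "a huge number of exceptions", the ISPN000136 lines can be counted per command and cache. This is a rough sketch only: the class name `Ispn000136Counter` and the idea of passing the server.log path as the first argument are assumptions for illustration, and the regular expression is derived solely from the message format shown above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Ispn000136Counter {
    // Captures the command name and the cache name from an ISPN000136 message.
    private static final Pattern ERROR =
            Pattern.compile("ISPN000136: Error executing command (\\w+) on Cache '([^']+)'");

    // Counts ISPN000136 occurrences, keyed by "<command> @ <cache>".
    static Map<String, Long> countByCommandAndCache(Iterable<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = ERROR.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + " @ " + m.group(2), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java Ispn000136Counter.java <path-to-server.log>");
            return;
        }
        // Assumes the log fits in memory; a large log would need streaming instead.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countByCommandAndCache(lines).forEach((key, n) -> System.out.println(n + "  " + key));
    }
}
```

Stack-trace lines do not match the pattern, so only the ERROR lines themselves are counted.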
      

              Assignee: Unassigned
              Reporter: Tommaso Borgato (tborgato@redhat.com)