JBEAP-17873

Clustering: replicated-cache sampling errors


    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • Fix Version/s: 7.3.0.GA
    • Affects Version/s: 7.3.0.CD18
    • Component/s: Clustering
    • Labels: None

      This issue concerns a replicated-cache in fail-over tests.
      EAP is started in clustered mode, using a replicated cache to replicate HTTP session data across cluster nodes; all 4 nodes in the cluster are initialized with the following CLI script:

      embed-server --server-config=standalone-ha.xml
      /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
      /subsystem=infinispan/cache-container=web/replicated-cache=testRepl:add()
      /subsystem=infinispan/cache-container=web/replicated-cache=testRepl/component=locking:write-attribute(name=isolation, value=REPEATABLE_READ)
      /subsystem=infinispan/cache-container=web/replicated-cache=testRepl/component=transaction:write-attribute(name=mode, value=BATCH)
      /subsystem=infinispan/cache-container=web/replicated-cache=testRepl/store=file:add()
      /subsystem=infinispan/cache-container=web:write-attribute(name=default-cache, value=testRepl)
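
      Note that sessions are only stored in this cache if the test application is marked distributable; a minimal web.xml sketch (the descriptor element is standard, its use by this particular test is our assumption):

      <web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="3.1">
          <!-- enables clustered HTTP sessions, backed here by the testRepl cache -->
          <distributable/>
      </web-app>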
      

      The test is run with jboss-eap-7.3.0.CD18-CR1.zip;
      the same tests run with version jboss-eap-7.2.5.CP-CR1.zip do not show any problem;
      hence this looks like a regression.

      As usual, we test that the serial value stored in the replicated cache is incremented on every call: when this is not true, we say we have a sampling error.
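
      For reference, a minimal sketch of the servlet side of such a check (class name, URL mapping, and attribute name are illustrative, not taken from the actual test suite):

      import java.io.IOException;
      import javax.servlet.ServletException;
      import javax.servlet.annotation.WebServlet;
      import javax.servlet.http.HttpServlet;
      import javax.servlet.http.HttpServletRequest;
      import javax.servlet.http.HttpServletResponse;
      import javax.servlet.http.HttpSession;

      @WebServlet("/serial")
      public class SerialServlet extends HttpServlet {
          @Override
          protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                  throws ServletException, IOException {
              HttpSession session = req.getSession(true);
              Integer serial = (Integer) session.getAttribute("serial");
              serial = (serial == null) ? 0 : serial + 1;
              // the updated value is replicated to the other nodes via the web cache-container
              session.setAttribute("serial", serial);
              resp.getWriter().print(serial);
          }
      }

      The client records the value returned by each request; after a fail-over, if a response does not carry the previous value incremented by one, that request counts as a sampling error.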

      Here are the runs that exhibit this issue (see attached screenshots):

      We also repeated the tests with a slightly different MOD_JK configuration to make sure the issue can be reproduced.

      It's worth mentioning that the same tests performed using HAPROXY also exhibit an increased fail-rate, though not as high as with MOD_JK (which probably makes the issue more evident there).
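
      For comparison, a minimal haproxy.cfg sketch of the cookie-based sticky setup such a test might use (the node IPs match the workers below; ports, cookie handling, and all other values are assumptions, not the actual test configuration):

      frontend eap-in
          bind *:80
          default_backend eap-nodes

      backend eap-nodes
          balance roundrobin
          # pin each client to one node via JSESSIONID, mirroring MOD_JK's sticky_session=1
          cookie JSESSIONID prefix nocache
          server node1 10.16.176.60:8080 cookie node1 check
          server node2 10.16.176.62:8080 cookie node2 check
          server node3 10.16.176.56:8080 cookie node3 check
          server node4 10.16.176.58:8080 cookie node4 check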

      The MOD_JK workers.properties file looks like the following:

      worker.list=loadbalancer,status
      
      worker.node1.port=8009
      worker.node1.host=10.16.176.60
      worker.node1.type=ajp13
      worker.node1.ping_mode=A
      worker.node1.lbfactor=1
      worker.node1.retries=2
      worker.node1.fail_on_status=404,503
      
      worker.node2.port=8009
      worker.node2.host=10.16.176.62
      worker.node2.type=ajp13
      worker.node2.ping_mode=A
      worker.node2.lbfactor=1
      worker.node2.retries=2
      worker.node2.fail_on_status=404,503
      
      worker.node3.port=8009
      worker.node3.host=10.16.176.56
      worker.node3.type=ajp13
      worker.node3.ping_mode=A
      worker.node3.lbfactor=1
      worker.node3.retries=2
      worker.node3.fail_on_status=404,503
      
      worker.node4.port=8009
      worker.node4.host=10.16.176.58
      worker.node4.type=ajp13
      worker.node4.ping_mode=A
      worker.node4.lbfactor=1
      worker.node4.retries=2
      worker.node4.fail_on_status=404,503
      
      worker.loadbalancer.type=lb
      worker.loadbalancer.balance_workers=node1,node2,node3,node4
      worker.loadbalancer.sticky_session=1
      
      worker.status.type=status
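
      On the httpd side, a configuration along these lines would wire mod_jk to the workers above (module path and URL mappings are illustrative, not taken from the test environment):

      # load mod_jk and point it at the workers.properties shown above
      LoadModule jk_module modules/mod_jk.so
      JkWorkersFile conf/workers.properties
      JkLogFile logs/mod_jk.log
      JkLogLevel info

      # route application traffic through the load balancer; expose the status worker
      JkMount /test/* loadbalancer
      JkMount /jk-status status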
      

              Assignee: Paul Ferraro (pferraro@redhat.com)
              Reporter: Tommaso Borgato (tborgato@redhat.com)