  1. Red Hat Data Grid
  2. JDG-2518

Cache startup failure with server hinting and insufficient segments


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: JDG 7.3.1 ER2
    • Affects Version/s: JDG 7.2.3 GA, JDG 7.3 ER3
    • Component/s: Clustering
    • Labels: None
    • Release Notes
    • Workaround Exists
    • Workaround description:

      Increase the number of segments.
    • Steps to reproduce:

      1. Set segments to 1 on the distributed cache and add the machine, rack, and site settings to the transport.

      <distributed-cache name="default" segments="1" />
      ...
      <stack name="udp">
          <transport type="UDP" socket-binding="jgroups-udp" machine="${jboss.jgroups.transport.machine:machine1}" rack="${jboss.jgroups.transport.rack:rack1}" site="${jboss.jgroups.transport.site:site1}" />
      </stack>
      

      2. Start 3 nodes.
      3. The 3rd node fails to start with a replication timeout caused by the state transfer timeout.

      The logs and clustered.xml are attached as logs.zip.

    • JDG Sprint #25

      When a cache is configured with a small number of segments and server hinting is used, a node cannot start and fails with the following error [1].
      It can be reproduced with RHDG 7.2.3 and 7.3 ER2.

      [1]

      ERROR [org.jboss.msc.service.fail] (MSC service thread 1-4) MSC000001: Failed to start service jboss.datagrid-infinispan.clustered.test: org.jboss.msc.service.StartException in service jboss.datagrid-infinispan.clustered.test: Failed to start service
      ...
      Caused by: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start() throws java.lang.Exception on object of type StateTransferManagerImpl
      ...
      Caused by: org.infinispan.util.concurrent.TimeoutException: Replication timeout for svr01 (flags=0), site-id=site1, rack-id=rack1, machine-id=machine1)
      at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:916)
      ...
      

      Nodes seem unable to finish the initial state transfer, and startup fails, when the number of segments is too small for the number of nodes.
      When segments is set to 20 (the 6.6.2 default), the 6th node fails to start with the above timeout.
      For example, with the following settings in a 3-node cluster, the 3rd node fails to start:

      <distributed-cache name="default" segments="1" />
      ...
      <stack name="udp">
          <transport type="UDP" socket-binding="jgroups-udp" machine="${jboss.jgroups.transport.machine:machine1}" rack="${jboss.jgroups.transport.rack:rack1}" site="${jboss.jgroups.transport.site:site1}" />
      </stack>
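
      A minimal sketch of the workaround noted on this issue (increasing the segment count), assuming the same cache name and transport settings as in the configuration above. The value 256 is only an illustrative assumption; the issue does not state which segment count is sufficient for a given number of nodes.

      ```xml
      <!-- Assumption: 256 is an example value, not a figure taken from this issue -->
      <distributed-cache name="default" segments="256" />
      ```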
      

        1. logs.zip (16 kB)
        2. reproducer.zip (150 kB)

              dberinde@redhat.com Dan Berindei (Inactive)
              rhn-support-hdaicho Hiroki Daicho (Inactive)