Infinispan / ISPN-7332

XSite trying to replicate to site after site has been shutdown


    • Type: Bug
    • Resolution: Obsolete
    • Priority: Major
    • 9.0.0.Final

      In a two-site setup with one server per site, when the server in one site goes down, and the whole site is gone as a result, the surviving site may still try to replicate to the site that no longer exists. Example:

      Sites: EARTH and MOON
      Servers: server-earth-one and server-moon-one respectively

      server-moon-one stops:

      2017-01-04 12:16:38,666 INFO  [org.jboss.as] (MSC service thread 1-4) WFLYSRV0050: 
      Infinispan Server 9.0.0.Beta1 (WildFly Core 2.2.0.Final) stopped in 102ms
      

      server-earth-one notices this and installs the correct view:

      2017-01-04 12:16:38,649 TRACE [org.jgroups.protocols.relay.RELAY2] (jgroups-3,_master:server-earth-one:EARTH) 
      [Relayer _master:server-earth-one:EARTH] view: [_master:server-earth-one:EARTH|4] (1) [_master:server-earth-one:EARTH]
      

      server-earth-one then receives a put invocation:

      2017-01-04 12:16:38,709 TRACE [org.infinispan.interceptors.impl.InvocationContextInterceptor] (HotRodServerHandler-8-1) 
      Invoked with command PutKeyValueCommand{key=org.infinispan.commons.marshall.WrappedByteArray@a3b01a15, 
      value=org.infinispan.commons.marshall.WrappedByteArray@b68d6067, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, 
      valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=NumericVersion{version=4294967297}}, 
      successful=true} and InvocationContext [org.infinispan.context.SingleKeyNonTxInvocationContext@3d9b13cf]
      

      But for some reason, server-earth-one still tries to send it to the MOON site:

      2017-01-04 12:16:38,713 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (HotRodServerHandler-8-1) 
      About to send to backups [MOON (sync, timeout=10000)], command SingleXSiteRpcCommand{command=PutKeyValueCommand{key=org.infinispan.commons.marshall.WrappedByteArray@a3b01a15, 
      value=org.infinispan.commons.marshall.WrappedByteArray@b68d6067, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, 
      valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=NumericVersion{version=4294967297}}, 
      successful=true}}
      

      ^ That should not happen.
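
One way to picture the expected behaviour is a guard that drops backup sites missing from the current cross-site view before anything is sent. The sketch below is illustrative only and is not Infinispan code: `BackupSiteFilter`, `reachableBackups`, and the idea of passing the view as a plain set of site names are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Infinispan or JGroups.
public class BackupSiteFilter {

    /**
     * Returns the configured backup sites that are present in the current
     * cross-site view, preserving the configured order.
     * Assumption: the view is available as a set of site names.
     */
    public static List<String> reachableBackups(List<String> configuredBackups,
                                                Set<String> sitesInView) {
        List<String> result = new ArrayList<>();
        for (String site : configuredBackups) {
            if (sitesInView.contains(site)) {
                result.add(site);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> backups = Collections.singletonList("MOON");

        // Both sites up: MOON is a valid backup target.
        Set<String> before = new HashSet<>(Arrays.asList("EARTH", "MOON"));
        System.out.println(reachableBackups(backups, before)); // prints [MOON]

        // After server-moon-one stops, the view only contains EARTH.
        Set<String> after = new HashSet<>(Collections.singletonList("EARTH"));
        System.out.println(reachableBackups(backups, after)); // prints []
    }
}
```

With an empty result, the put in this example would have nothing to send cross-site and could complete locally without waiting.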

      Moreover, the JGroups layer has already detected that there is no route to the site:

      2017-01-04 12:16:38,717 ERROR [org.jgroups.protocols.relay.RELAY2] (HotRodServerHandler-8-1) 
      master:server-earth-one: no route to MOON: dropping message
      

      But the put only completes after the backup timeout has expired:

      2017-01-04 12:16:48,721 WARN  [org.infinispan.xsite.BackupSenderImpl] (HotRodServerHandler-8-1) 
      ISPN000202: Problems backing up data for cache xsiteCache to site MOON: org.infinispan.util.concurrent.TimeoutException: 
      Timed out after 10 seconds waiting for a response from MOON (sync, timeout=10000)
      ...
      2017-01-04 12:16:48,726 TRACE [org.infinispan.server.hotrod.HotRodEncoder] (HotRodServerWorker-7-1) 
      Encode msg EmptyResponse{version=25, messageId=21, cacheName='xsiteCache', clientIntel=3, operation=PUT, status=Success, topologyId=1}
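
The logs above show roughly 10 seconds between the send at 12:16:38,713 and the warning at 12:16:48,721, which matches the configured `timeout=10000`. The following sketch models why: if a dropped message never completes the pending reply, a synchronous caller waits for the full timeout, whereas failing the reply as soon as "no route" is known returns immediately. All names here (`SyncBackupWait`, `send`, `awaitReply`, `NoRouteException`) are hypothetical, and the timeout is scaled down to 200 ms; this is not how Infinispan or JGroups is implemented.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of a synchronous backup call; not Infinispan code.
public class SyncBackupWait {

    public enum Outcome { ACKED, TIMED_OUT, NO_ROUTE, FAILED }

    public static class NoRouteException extends RuntimeException {
        public NoRouteException(String message) {
            super(message);
        }
    }

    /**
     * Models sending to a backup site.
     * - route exists: the reply is acknowledged right away (simplification).
     * - no route, failFast: the reply is failed immediately.
     * - no route, no failFast: the message is dropped and the reply never completes.
     */
    public static CompletableFuture<Void> send(boolean routeExists, boolean failFast) {
        CompletableFuture<Void> reply = new CompletableFuture<>();
        if (routeExists) {
            reply.complete(null);
        } else if (failFast) {
            reply.completeExceptionally(new NoRouteException("no route to site"));
        }
        return reply;
    }

    /** Blocks for at most timeoutMillis, like a sync backup waiting for its response. */
    public static Outcome awaitReply(CompletableFuture<Void> reply, long timeoutMillis) {
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.ACKED;
        } catch (TimeoutException e) {
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return e.getCause() instanceof NoRouteException ? Outcome.NO_ROUTE : Outcome.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        long timeoutMillis = 200; // scaled down from the 10000 ms in the report

        // Behaviour described in the report: message dropped, caller waits out the timeout.
        System.out.println(awaitReply(send(false, false), timeoutMillis)); // prints TIMED_OUT

        // Alternative: surface "no route" to the caller without waiting.
        System.out.println(awaitReply(send(false, true), timeoutMillis)); // prints NO_ROUTE
    }
}
```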
      

      I'm attaching full TRACE logs.

              Assignee: Unassigned
              Reporter: Galder Zamarreño
              Archiver: Amol Dongare
