  1. Red Hat Data Grid
  2. JDG-2070

Unable to failover to remote Site in cross-site replication in JDG

Details

    • CR1

      The setup and the detailed steps to reproduce the issue are given below.

      Environment:
      1> JDG 7.1.0
      2> There are two sites, site1 and site2.
      3> Site1 contains node1 and node2, running with port offsets 0 and 100.
      4> Site2 contains node3 and node4, running with port offsets 200 and 300.
      5> All four nodes run on 127.0.0.1.
      6> A cache named "Pranab" is defined on all four JDG nodes.
      7> node1 and node2 form a cluster over UDP; likewise, node3 and node4 form a cluster over UDP.
      8> Site1 and site2 are connected to each other using TCP clustering.
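
For reference, the port offsets above map onto the Hot Rod ports used in the client configuration later in this report. The following is a minimal sketch of that arithmetic, assuming the Hot Rod endpoint base port of 11222 (the value the client uses for the node with offset 0); the class and method names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HotRodPorts {
    // Hot Rod endpoint port before any port offset is applied (assumed default).
    static final int BASE_HOT_ROD_PORT = 11222;

    // Effective port = base port + the node's port offset.
    static int hotRodPort(int portOffset) {
        return BASE_HOT_ROD_PORT + portOffset;
    }

    // Node label -> effective Hot Rod port for the four nodes in this report.
    static Map<String, Integer> reportedTopology() {
        Map<String, Integer> ports = new LinkedHashMap<>();
        ports.put("node1 (site1)", hotRodPort(0));
        ports.put("node2 (site1)", hotRodPort(100));
        ports.put("node3 (site2)", hotRodPort(200));
        ports.put("node4 (site2)", hotRodPort(300));
        return ports;
    }

    public static void main(String[] args) {
        reportedTopology().forEach(
            (node, port) -> System.out.println(node + " -> 127.0.0.1:" + port));
    }
}
```

The resulting ports (11222, 11322, 11422, 11522) match the ones passed to `addServer()` and `addClusterNode()` in the client code.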

      Test 1:
      =====
      > Started three nodes: node2, node3, and node4 (node1 of site1 was not started).
      > Started the client and accessed the application at "http://10.10.10.10:8080/application" (the client is deployed in EAP 7 running on 10.10.10.10).
      > Because the client code contains the configuration below, node2 initially serves the requests.
      ~~~
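      import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

      // Default cluster (site1): node1, node2. Backup cluster "site2": node3, node4.
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222)
             .addServer().host("127.0.0.1").port(11322)
             .addCluster("site2")
                 .addClusterNode("127.0.0.1", 11422)
                 .addClusterNode("127.0.0.1", 11522);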
      ~~~
      > Next, shut down node2 and access the application. Node3 and node4 now serve the requests, which can be confirmed from hotrod-access.log (i.e. the failover succeeds).
      > Next, start node2 and access the application. Node3 and node4 still serve the requests, as expected.
      > Next, shut down node3 and node4. Node2 now serves the requests (failback scenario).
      > Next, start node3 and node4 and, once they are up, shut down node2 and access the application. This is where the error occurs; it can be seen in the attached test1.log.
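
In addition to checking hotrod-access.log, it can help to confirm which endpoints are actually listening before each step. The sketch below is a plain TCP probe against the four ports from the environment above; `EndpointProbe` and `isReachable` are hypothetical names, and a successful connect only shows that the port accepts connections, not that the Hot Rod endpoint is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class EndpointProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unreachable host.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hot Rod ports of node1..node4 as listed in the environment.
        int[] ports = {11222, 11322, 11422, 11522};
        for (int port : ports) {
            System.out.println("127.0.0.1:" + port
                + " reachable=" + isReachable("127.0.0.1", port, 500));
        }
    }
}
```

Running it between steps gives a quick picture of which site has a live node at the moment the client request is made.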

      Test 2:
      ======

      > Started all four nodes: node1, node2, node3, and node4.
      > Started the client and accessed the application at "http://10.10.10.10:8080/application".
      > Because the client code contains the configuration below, node1 and node2 initially serve the requests.
      ~~~
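      import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

      // Default cluster (site1): node1, node2. Backup cluster "site2": node3, node4.
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer().host("127.0.0.1").port(11222)
             .addServer().host("127.0.0.1").port(11322)
             .addCluster("site2")
                 .addClusterNode("127.0.0.1", 11422)
                 .addClusterNode("127.0.0.1", 11522);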
      ~~~
      > Next, shut down node1 and node2 and access the application. Node3 and node4 now serve the requests, which can be confirmed from hotrod-access.log.
      > Next, start only node2 and access the application. Node3 and node4 still serve the requests, as expected.
      > Next, shut down node3 and node4. Node2 now serves the requests.
      > Next, start node3 and node4. Once they are up, node2 continues to serve the requests.
      > Now shut down node2 and access the application. This results in the error shown in test2.log.

      These are the tests I performed locally, where I received an error similar to the one the customer received. The customer, however, cannot fail over to the other site at all: the application keeps trying to access the nodes of the first site, which have been shut down.
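
The behaviour the steps above expect can be written down as a small model: the client stays on its current cluster while that cluster has at least one live node, and otherwise switches to any cluster that does. This is only a toy model of the expected outcome, not the Hot Rod client's actual switching logic; `FailoverModel` and all of its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FailoverModel {
    private final Map<String, List<String>> clusters = new LinkedHashMap<>();
    private final Set<String> liveNodes = new HashSet<>();
    private String current;

    FailoverModel(String initialCluster) {
        this.current = initialCluster;
    }

    void addCluster(String name, String... nodes) {
        clusters.put(name, Arrays.asList(nodes));
    }

    void start(String... nodes) {
        liveNodes.addAll(Arrays.asList(nodes));
    }

    void stop(String... nodes) {
        liveNodes.removeAll(Arrays.asList(nodes));
    }

    private boolean hasLiveNode(String cluster) {
        List<String> nodes = clusters.get(cluster);
        if (nodes == null) {
            return false;
        }
        for (String node : nodes) {
            if (liveNodes.contains(node)) {
                return true;
            }
        }
        return false;
    }

    // Returns the cluster expected to serve the next request, or null if none can.
    String serve() {
        if (hasLiveNode(current)) {
            return current; // stay on the current cluster while it is usable
        }
        for (String candidate : clusters.keySet()) {
            if (hasLiveNode(candidate)) {
                current = candidate; // switch to the first cluster with a live node
                return current;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Replays Test 1 and prints the cluster expected to serve each request.
        FailoverModel model = new FailoverModel("site1");
        model.addCluster("site1", "node1", "node2");
        model.addCluster("site2", "node3", "node4");
        model.start("node2", "node3", "node4");
        System.out.println("initial: " + model.serve());
        model.stop("node2");
        System.out.println("node2 down: " + model.serve());
        model.start("node2");
        System.out.println("node2 restarted: " + model.serve());
        model.stop("node3", "node4");
        System.out.println("site2 down: " + model.serve());
        model.start("node3", "node4");
        model.stop("node2");
        System.out.println("site2 restarted, node2 down: " + model.serve());
    }
}
```

In this model the last step of Test 1 resolves to site2, which is the step where the real client reports the error instead.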

    • Workaround Exists

      Test 1: start node1

    • Compatibility/Configuration, User Experience
    • JDG Sprint #17, JDG Sprint #18

    Description

      Unable to fail over from one site to the other.

      When the primary site (data center) goes down, the client does not connect to the secondary site and fails with the error below, which can be seen in the client logs:

      ~~~
      11:44:54,241 WARNING [javax.enterprise.resource.webcontainer.jsf.lifecycle] (default task-43) #{cacheBean.put(input1.value, input2.value)}: org.infinispan.client.hotrod.exceptions.TransportException:: Could not fetch transport: javax.faces.FacesException: #{cacheBean.put(input1.value, input2.value)}: org.infinispan.client.hotrod.exceptions.TransportException:: Could not fetch transport
      at com.sun.faces.application.ActionListenerImpl.processAction(ActionListenerImpl.java:118)
      at javax.faces.component.UICommand.broadcast(UICommand.java:315)
      at javax.faces.component.UIViewRoot.broadcastEvents(UIViewRoot.java:790)
      at javax.faces.component.UIViewRoot.processApplication(UIViewRoot.java:1282)
      at com.sun.faces.lifecycle.InvokeApplicationPhase.execute(InvokeApplicationPhase.java:81)
      at com.sun.faces.lifecycle.Phase.doPhase(Phase.java:101)
      at com.sun.faces.lifecycle.LifecycleImpl.execute(LifecycleImpl.java:198)
      at javax.faces.webapp.FacesServlet.service(FacesServlet.java:658)
      at io.undertow.servlet.handlers.ServletHandler.handleRequest(ServletHandler.java:85)
      at io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(ServletSecurityRoleHandler.java:62)
      at io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(ServletDispatchingHandler.java:36)
      at org.wildfly.extension.undertow.security.SecurityContextAssociationHandler.handleRequest(SecurityContextAssociationHandler.java:78)
      at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
      at io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(SSLInformationAssociationHandler.java:131)
      at io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(ServletAuthenticationCallHandler.java:57)
      at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
      at io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(AbstractConfidentialityHandler.java:46)
      at io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(ServletConfidentialityConstraintHandler.java:64)
      at io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(AuthenticationMechanismsHandler.java:60)
      at io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(CachedAuthenticatedSessionHandler.java:77)
      at io.undertow.security.handlers.NotificationReceiverHandler.handleRequest(NotificationReceiverHandler.java:50)
      at io.undertow.security.handlers.AbstractSecurityContextAssociationHandler.handleRequest(AbstractSecurityContextAssociationHandler.java:43)
      at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
      at org.wildfly.extension.undertow.security.jacc.JACCContextIdHandler.handleRequest(JACCContextIdHandler.java:61)
      at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
      at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
      at io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(ServletInitialHandler.java:285)
      at io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(ServletInitialHandler.java:264)
      at io.undertow.servlet.handlers.ServletInitialHandler.access$000(ServletInitialHandler.java:81)
      at io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(ServletInitialHandler.java:175)
      at io.undertow.server.Connectors.executeRootHandler(Connectors.java:207)
      at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:802)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: javax.faces.el.EvaluationException: org.infinispan.client.hotrod.exceptions.TransportException:: Could not fetch transport
      at javax.faces.component.MethodBindingMethodExpressionAdapter.invoke(MethodBindingMethodExpressionAdapter.java:101)
      at com.sun.faces.application.ActionListenerImpl.processAction(ActionListenerImpl.java:102)
      ... 34 more
      Caused by: org.infinispan.client.hotrod.exceptions.TransportException:: Could not fetch transport
      at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory.borrowTransportFromPool(TcpTransportFactory.java:414)
      at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory.getTransport(TcpTransportFactory.java:248)
      at org.infinispan.client.hotrod.impl.operations.AbstractKeyOperation.getTransport(AbstractKeyOperation.java:40)
      at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:55)
      at org.infinispan.client.hotrod.impl.RemoteCacheImpl.put(RemoteCacheImpl.java:269)
      at org.infinispan.client.hotrod.impl.RemoteCacheSupport.put(RemoteCacheSupport.java:79)
      at com.clustering.chapter7.ActOnCache.putCache(ActOnCache.java:28)
      at com.clustering.chapter7.HelloBean.put(HelloBean.java:39)
      at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at javax.el.ELUtil.invokeMethod(ELUtil.java:300)
      at javax.el.BeanELResolver.invoke(BeanELResolver.java:415)
      at javax.el.CompositeELResolver.invoke(CompositeELResolver.java:256)
      at com.sun.el.parser.AstValue.invoke(AstValue.java:285)
      at com.sun.el.MethodExpressionImpl.invoke(MethodExpressionImpl.java:304)
      at org.jboss.weld.util.el.ForwardingMethodExpression.invoke(ForwardingMethodExpression.java:40)
      at org.jboss.weld.el.WeldMethodExpression.invoke(WeldMethodExpression.java:50)
      at org.jboss.weld.util.el.ForwardingMethodExpression.invoke(ForwardingMethodExpression.java:40)
      at org.jboss.weld.el.WeldMethodExpression.invoke(WeldMethodExpression.java:50)
      at com.sun.faces.facelets.el.TagMethodExpression.invoke(TagMethodExpression.java:105)
      at javax.faces.component.MethodBindingMethodExpressionAdapter.invoke(MethodBindingMethodExpressionAdapter.java:87)
      ... 35 more
      Caused by: org.infinispan.client.hotrod.exceptions.TransportException:: Could not connect to server: /127.0.0.1:11222
      at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.<init>(TcpTransport.java:85)
      at org.infinispan.client.hotrod.impl.transport.tcp.TransportObjectFactory.makeObject(TransportObjectFactory.java:38)
      at org.infinispan.client.hotrod.impl.transport.tcp.TransportObjectFactory.makeObject(TransportObjectFactory.java:17)
      at infinispan.org.apache.commons.pool.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:1220)
      at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory.borrowTransportFromPool(TcpTransportFactory.java:409)
      ... 56 more
      Caused by: java.net.ConnectException: Connection refused
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
      at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
      at org.infinispan.client.hotrod.impl.transport.tcp.TcpTransport.<init>(TcpTransp
      ~~~
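
The trace above is several "Caused by" levels deep, and the innermost cause (`java.net.ConnectException: Connection refused`) is the informative one. A minimal sketch of unwrapping such a chain follows; `RootCause` is a hypothetical helper, and the exceptions built in `main` are stand-ins that only mirror the nesting seen in the trace.

```java
import java.net.ConnectException;

final class RootCause {
    // Follows getCause() links until the innermost exception is reached.
    static Throwable of(Throwable error) {
        Throwable current = error;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // True when the chain ends in a refused or failed TCP connection attempt.
    static boolean isConnectionFailure(Throwable error) {
        return of(error) instanceof ConnectException;
    }

    public static void main(String[] args) {
        // Stand-in exceptions that mirror the nesting in the reported trace.
        Throwable nested = new RuntimeException("Could not fetch transport",
            new RuntimeException("Could not connect to server: /127.0.0.1:11222",
                new ConnectException("Connection refused")));
        System.out.println(of(nested));
        System.out.println("connection failure: " + isConnectionFailure(nested));
    }
}
```

Logging the root cause together with the server address from the enclosing message makes it easier to see that the client is still targeting a site1 port (11222) after that site has been shut down.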

      Attachments

        1. Configuration_local.zip (14 kB)
        2. test1.log (230 kB)
        3. test2.log (24 kB)

            People

              gzamarre Galder Zamarreño
              rhn-support-plohia Pranab Lohia (Inactive)