-
Bug
-
Resolution: Done
-
Major
-
jboss-fuse-6.2.1
-
%
-
-
Setting zookeeper timeout and retry options lead to a failure of the zookeeper session.
fabric:profile-edit --pid io.fabric8.zookeeper/zookeeper.retry.interval=11000 default 1.0 fabric:profile-edit --pid io.fabric8.zookeeper/zookeeper.retry.max=340 default 1.0 fabric:profile-edit --pid io.fabric8.zookeeper/zookeeper.session.timeout=310000 default 1.0 fabric:profile-edit --pid io.fabric8.zookeeper/zookeeper.connection.time.out=65000 default 1.0
When the profile edit is done, the connection is recreated followed by a number of client exceptions.
2016-07-06 18:19:56,161 | INFO | MCF-1-thread-1 | CuratorFrameworkImpl | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Starting 2016-07-06 18:19:56,162 | INFO | MCF-1-thread-1 | ZooKeeper | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Initiating client connection, connectString=fusefabric1.gsslab.rdu2.redhat.com:2182,fusefabric2.gsslab.rdu2.redhat.com:2182,fusefabric3.gsslab.rdu2.redhat.com:2182 sessionTimeout=310000 watcher=org.apache.curator.ConnectionState@694f15db 2016-07-06 18:19:56,226 | INFO | ad-1-EventThread | ConnectionStateManager | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | State change: CONNECTED 2016-07-06 18:01:33,838 | ERROR | ZooKeeperGroup-0 | ZooKeeperGroup | 143 - io.fabric8.fabric-groups - 1.2.0.redhat-621107 | java.lang.IllegalStateException: Client is not started at com.google.common.base.Preconditions.checkState(Preconditions.java:176)[138:com.google.guava:17.0.0] at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:113)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:214)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:203)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:199)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:191)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curaf at io.fabric8.groups.internal.ZooKeeperGroup$4.run(ZooKeeperGroup.java:157)[143:io.fabric8.fabric-groups:1.2.0.redhat-621107] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_60] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_60] at java.lang.Thread.run(Thread.java:745)[:1.7.0_60]
And
2016-07-06 18:01:33,852 | WARN | MCF-1-thread-1 | ServletEventDispatcher | 92 - org.ops4j.pax.web.pax-web-runtime - 3.2.9 | Listener caused an exception, will be ignored java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Client is not started at java.util.concurrent.FutureTask.report(FutureTask.java:122)[:1.7.0_60] at java.util.concurrent.FutureTask.get(FutureTask.java:188)[:1.7.0_60] at java.util.concurrent.AbstractExecutorService.doInvokeAny(AbstractExecutorService.java:193)[:1.7.0_60] at java.util.concurrent.AbstractExecutorService.invokeAny(AbstractExecutorService.java:225)[:1.7.0_60] at org.ops4j.pax.web.service.internal.ServletEventDispatcher.callListener(ServletEventDispatcher.java:185) at org.ops4j.pax.web.service.internal.ServletEventDispatcher.callListeners(ServletEventDispatcher.java:174) at org.ops4j.pax.web.service.internal.ServletEventDispatcher.servletEvent(ServletEventDispatcher.java:135) at org.ops4j.pax.web.service.internal.HttpServiceStarted.servletEvent(HttpServiceStarted.java:193) at org.ops4j.pax.web.service.internal.HttpServiceStarted.unregister(HttpServiceStarted.java:270) at org.ops4j.pax.web.service.internal.HttpServiceProxy.unregister(HttpServiceProxy.java:77) at io.fabric8.maven.proxy.impl.MavenProxyRegistrationHandler.destroy(MavenProxyRegistrationHandler.java:145) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)[:1.7.0_60] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)[:1.7.0_60] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)[:1.7.0_60] at java.lang.reflect.Method.invoke(Method.java:606)[:1.7.0_60]
If you shutdown one of the ensemble servers, state changes to suspended, even though there are still 2 or 4 up and only 1 down.
2016-07-06 18:21:03,076 | WARN | SendWorker:3 | QuorumCnxManager | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Send worker leaving thread 2016-07-06 18:21:03,978 | INFO | ad-1-EventThread | ConnectionStateManager | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | State change: SUSPENDED
Bring up the shutdown node and it reconnects to the other 2 fabric nodes, but they do not recognize it. Fabric commands do not work.
18:22:04,793 | INFO | redhat.com:2182) | ClientCnxn | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Session establishment complete on server fusefabric1.gsslab.rdu2.redhat.com/10.10.183.203:2182, sessionid = 0x155c23b4fbd0007, negotiated timeout = 40000
18:22:04,798 | INFO | ad-1-EventThread | ConnectionStateManager | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | State change: CONNECTED
18:22:05,124 | INFO | 0:0:0:0:0:0:2182 | Learner | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | FOLLOWING - LEADER ELECTION TOOK - 90
18:22:05,139 | INFO | 0:0:0:0:0:0:2182 | Learner | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Getting a diff from the leader 0x2000008b2
18:22:05,143 | WARN | 0:0:0:0:0:0:2182 | Learner | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Got zxid 0x2000008ab expected 0x1
18:22:05,158 | INFO | 0:0:0:0:0:0:2182 | FileTxnSnapLog | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Snapshotting: 0x2000008b2 to /opt/fuse/jboss-fuse-6.2.1.redhat-107/data/zookeeper/0001/version-2/snapshot.2000008b2
18:22:05,206 | INFO | SyncThread:3 | FileTxnLog | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Creating new log file: log.2000008b3
The other server known it got the message, but continues in a connection failure state
2016-07-06 18:22:14,175 | INFO | .10.183.203:3888 | QuorumCnxManager | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Received connection request /10.10.183.205:50322 2016-07-06 18:22:14,192 | INFO | Receiver[myid=1] | FastLeaderElection | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Notification: 1 (message format version), 3 (n.leader), 0x2000008aa (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) FOLLOWING (my state) 2016-07-06 18:23:27,003 | ERROR | ZooKeeperGroup-0 | ConnectionState | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Connection timed out for connection string (fusefabric1.gsslab.rdu2.redhat.com:2182,fusefabric2.gsslab.rdu2.redhat.com:2182,fusefabric3.gsslab.rdu2.redhat.com:2182) and timeout (66000) / elapsed (143025) org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:302)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:291)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:287)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:279)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:41)[149:io.fabric8.fabric-zookeeper:1.2.0.redhat-621107] at io.fabric8.groups.internal.ZooKeeperGroup.getDataAndStat(ZooKeeperGroup.java:400)[143:io.fabric8.fabric-groups:1.2.0.redhat-621107] at io.fabric8.groups.internal.ZooKeeperGroup.processChildren(ZooKeeperGroup.java:476)[143:io.fabric8.fabric-groups:1.2.0.redhat-621107] at io.fabric8.groups.internal.ZooKeeperGroup.refresh(ZooKeeperGroup.java:378)[143:io.fabric8.fabric-groups:1.2.0.redhat-621107] at io.fabric8.groups.internal.RefreshOperation.invoke(RefreshOperation.java:32)[143:io.fabric8.fabric-groups:1.2.0.redhat-621107] at io.fabric8.groups.internal.ZooKeeperGroup.mainLoop(ZooKeeperGroup.java:496)[143:io.fabric8.fabric-groups:1.2.0.redhat-621107] at io.fabric8.groups.internal.ZooKeeperGroup.access$200(ZooKeeperGroup.java:66)[143:io.fabric8.fabric-groups:1.2.0.redhat-621107] at io.fabric8.groups.internal.ZooKeeperGroup$4.run(ZooKeeperGroup.java:157)[143:io.fabric8.fabric-groups:1.2.0.redhat-621107] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_60] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_60] at java.lang.Thread.run(Thread.java:745)[:1.7.0_60]
Eventually, when the session timeout happens, the session is destroyed and things reform.
2016-07-06 18:26:45,854 | WARN | l Console Thread | ConnectionState | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Connection attempt unsuccessful after 341876 (greater than max timeout of 310000). Resetting connection and trying again with a new connection. 2016-07-06 18:26:45,864 | INFO | l Console Thread | ZooKeeper | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Session: 0x355c23be86f0002 closed 2016-07-06 18:26:45,864 | INFO | l Console Thread | ZooKeeper | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | Initiating client connection, connectString=fusefabric1.gsslab.rdu2.redhat.com:2182,fusefabric2.gsslab.rdu2.redhat.com:2182,fusefabric3.gsslab.rdu2.redhat.com:2182 sessionTimeout=310000 watcher=org.apache.curator.ConnectionState@694f15db 2016-07-06 18:26:45,881 | INFO | read-EventThread | ConnectionStateManager | 149 - io.fabric8.fabric-zookeeper - 1.2.0.redhat-621107 | State change: RECONNECTED 2016-07-06 18:26:45,884 | INFO | onStateManager-0 | GitDataStoreImpl | 144 - io.fabric8.fabric-git - 1.2.0.redhat-621107 | Shared Counter (Re)connected, doing a pull
- is related to
-
ENTESB-5206 "Client is not started" Exception
- Closed
-
CURATOR-126 Loading...
- relates to
-
ENTESB-5206 "Client is not started" Exception
- Closed
-
ENTMQ-1753 [mq fabric discovery] mq fabric discovery stops working when ensemble size change
- Closed