Details
Bug | Resolution: Unresolved | Major | None | 7.0.0.Alpha1 | None
Description
Moving over the issue from GitHub:
https://github.com/infinispan/cpp-client/issues/153
—
Hi,
I occasionally get a Linux arithmetic exception (crash) in my client application when testing the cpp-client library, for example:
Start cpp-client…
Program terminated with signal 8, Arithmetic exception.
(gdb) bt
#0 0x00000000004309f7 in infinispan::hotrod::consistenthash::ConsistentHashV1::getServer(infinispan::hotrod::hrbytes const&) ()
#1 0x0000000000426388 in infinispan::hotrod::transport::TcpTransportFactory::getTransport(infinispan::hotrod::hrbytes const&) ()
#2 0x00000000004164c9 in infinispan::hotrod::operations::AbstractKeyOperation<infinispan::hotrod::hrbytes>::getTransport(int) ()
#3 0x0000000000416fe9 in infinispan::hotrod::operations::RetryOnFailureOperation<infinispan::hotrod::hrbytes>::execute() ()
#4 0x00000000004119c7 in infinispan::hotrod::RemoteCacheImpl::put(infinispan::hotrod::RemoteCacheBase&, void const*, void const*, unsigned long, unsigned long) ()
#5 0x000000000040b0ca in infinispan::hotrod::RemoteCacheBase::base_put(void const*, void const*, long, long) ()
#6 0x0000000000408cf7 in infinispan::hotrod::RemoteCache<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >::put (this=0x7ffff2fc8ca0, key="12345", val="a value", lifespan=0, lifespanUnit=SECONDS, maxIdle=0, maxIdleUnit=SECONDS) at /home/sensus/installs/exp/cpp-client/include/infinispan/hotrod/RemoteCache.h:162
#7 0x0000000000408d43 in infinispan::hotrod::RemoteCache<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >::put (this=0x7ffff2fc8ca0, key="12345", val="a value", lifespan=0, maxIdle=0) at /home/sensus/installs/exp/cpp-client/include/infinispan/hotrod/RemoteCache.h:115
#8 0x0000000000403d8e in main (argc=4, args=0x7ffff2fc8f48) at loop3.cpp:99
The following is reported by the client just before the crash:
ERROR [RetryOnFailureOperation.h:68] Exception encountered, retry 7 of 8: Request for message id[21386] returned org.infinispan.remoting.transport.jgroups.SuspectException: One or more nodes have left the cluster while replicating command SingleRpcCommand{cacheName='cacheTest', command=PutKeyValueCommand{key=[B0x3132333435, value=[B@414dd5ae, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=NumericVersion{version=11259003363409760}}, successful=true}}
Floating point exception (core dumped)
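For illustration only: on x86 Linux, an integer division or modulo by zero is reported as signal 8 (SIGFPE), which the shell prints as "Floating point exception". The report does not confirm that this is what happens inside ConsistentHashV1::getServer; it is just one common way a hash lookup can raise that signal, for example if the server list is briefly empty during a topology change. The sketch below is not the cpp-client code. The function pickServer and its parameters are hypothetical names that show the kind of guard that avoids the trap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a consistent-hash lookup; NOT the cpp-client code.
// Picks a server for a key hash. Returns false (and leaves `chosen` untouched)
// when no servers are known, instead of dividing by zero.
bool pickServer(const std::vector<std::string>& servers,
                std::uint32_t keyHash,
                std::string& chosen) {
    if (servers.empty()) {
        // Without this check, `keyHash % 0` below would be an integer
        // division by zero, which x86 Linux reports as SIGFPE (signal 8).
        return false;
    }
    chosen = servers[keyHash % servers.size()];
    return true;
}
```

A caller would treat a false return as "no topology available yet" and retry or fail the operation cleanly rather than crash.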
Here is info on the configuration and how testing was done (same as my previous "hang" report).
The test environment is an Infinispan cluster on four virtual machines (RHEL 5.10, 2.6.18-371.9.1.el5). The cluster is configured to communicate using TCP:
cluster="flexnet" stack="tcp" />
default-stack="${jboss.default.jgroups.stack:tcp}">
The machines are statically configured to know about each other:
<property name="initial_hosts">infinispan-test1a[7800],infinispan-test2a[7800],infinispan-test3a[7800],infinispan-test4a[7800],</property>
On each server VM, I run a simple client that uses the cpp-client library. The client runs a loop that does a put 10 times a second:
key = "12345";
value = "a value";
pprev = cache.put(key, value);
All four clients use the same code (same key, same value).
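As a rough sketch of the pacing described above, the loop can be written so that one put is issued per fixed period (100 ms for 10 puts a second). This is not the original loop3.cpp, which is not shown in the report. The helper runPutLoop and its parameters are hypothetical, the put callback stands in for cache.put(key, value), and the iteration count is bounded here only so the sketch terminates.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of cpp-client: calls put() `iterations`
// times, scheduling one call per `period`. Returns the number of calls made.
int runPutLoop(const std::function<void()>& put,
               int iterations,
               std::chrono::milliseconds period) {
    int completed = 0;
    auto next = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        put();  // stand-in for: cache.put(key, value);
        ++completed;
        // Sleep until the next scheduled tick so that the time spent
        // inside put() does not accumulate as drift.
        next += period;
        std::this_thread::sleep_until(next);
    }
    return completed;
}
```

In the test client this would be called with a lambda that wraps the real put and a period of std::chrono::milliseconds(100).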
Each client is started with the IP address and TCP port of its local server and the cache name:
ConfigurationBuilder builder;
builder.addServer().host(args[1]).port(atoi(args[2]));
StringCache cache = manager.getCache<std::string, std::string>(args[3], true);
For example:
./loop3 192.168.136.131 11222 cacheTest
Where cacheTest is defined as:
<distributed-cache name="cacheTest" mode="SYNC" segments="20" owners="2" remote-timeout="30000" start="EAGER" />
This test configuration will run forever with no reported errors if I leave it alone…
High-availability testing:
The test is to gracefully stop the Infinispan service on one of the hosts. In my configuration, one way of doing that is just:
sudo /etc/init.d/infinispan stop
On any one of the servers the following sequence is run until the crash occurs:
while true; do sudo /etc/init.d/flexnet-infinispan stop; sleep 30; sudo /etc/init.d/flexnet-infinispan start; sleep 60; done
It may take hours for the crash to occur.
Alternatively, the following steps can be done manually on each server platform:
[Server1]# service infinispan stop
Wait 30 seconds
[Server1]# service infinispan start
Wait 60 seconds
[Server2]# service infinispan stop
Wait 30 seconds
[Server2]# service infinispan start
Wait 60 seconds
[Server3]# service infinispan stop
Wait 30 seconds
[Server3]# service infinispan start
Wait 60 seconds
[Server4]# service infinispan stop
Wait 30 seconds
[Server4]# service infinispan start
Wait 60 seconds
Repeat the above until the crash in the hotrod client occurs…
I’ll try to get more info as to what is actually causing the crash…