Hot Rod Native client / HRCPP-176

cpp-client arithmetic exception (crash)


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • 7.0.0.Alpha1
    • Core
    • None

    Description

      Migrated from the GitHub issue:
      https://github.com/infinispan/cpp-client/issues/153

      Hi,
      My client application occasionally crashes on Linux with an arithmetic exception (SIGFPE, signal 8) when I test the cpp-client library, for example:
      Start cpp-client…

      Program terminated with signal 8, Arithmetic exception.
      (gdb) bt
      #0  0x00000000004309f7 in infinispan::hotrod::consistenthash::ConsistentHashV1::getServer(infinispan::hotrod::hrbytes const&) ()
      #1  0x0000000000426388 in infinispan::hotrod::transport::TcpTransportFactory::getTransport(infinispan::hotrod::hrbytes const&) ()
      #2  0x00000000004164c9 in infinispan::hotrod::operations::AbstractKeyOperation<infinispan::hotrod::hrbytes>::getTransport(int) ()
      #3  0x0000000000416fe9 in infinispan::hotrod::operations::RetryOnFailureOperation<infinispan::hotrod::hrbytes>::execute() ()
      #4  0x00000000004119c7 in infinispan::hotrod::RemoteCacheImpl::put(infinispan::hotrod::RemoteCacheBase&, void const*, void const*, unsigned long, unsigned long) ()
      #5  0x000000000040b0ca in infinispan::hotrod::RemoteCacheBase::base_put(void const*, void const*, long, long) ()
      #6  0x0000000000408cf7 in infinispan::hotrod::RemoteCache<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >::put (this=0x7ffff2fc8ca0, key="12345", val="a value", lifespan=0, lifespanUnit=SECONDS, maxIdle=0, 
          maxIdleUnit=SECONDS) at /home/sensus/installs/exp/cpp-client/include/infinispan/hotrod/RemoteCache.h:162
      #7  0x0000000000408d43 in infinispan::hotrod::RemoteCache<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >::put (this=0x7ffff2fc8ca0, key="12345", val="a value", lifespan=0, maxIdle=0)
          at /home/sensus/installs/exp/cpp-client/include/infinispan/hotrod/RemoteCache.h:115
      #8  0x0000000000403d8e in main (argc=4, args=0x7ffff2fc8f48) at loop3.cpp:99
      

      The following is reported by the client just before the crash:

      ERROR [RetryOnFailureOperation.h:68] Exception encountered, retry 7 of 8: Request for message id[21386] returned org.infinispan.remoting.transport.jgroups.SuspectException: One or more nodes have left the cluster while replicating command SingleRpcCommand{cacheName='cacheTest', command=PutKeyValueCommand{key=[B0x3132333435, value=[B@414dd5ae, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=NumericVersion{version=11259003363409760}}, successful=true}}
      
      Floating point exception (core dumped)
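      A hedged note on the signal: on x86-64 Linux, signal 8 (SIGFPE) is raised by integer division or modulo by zero; floating-point division by zero does not trap by default, despite the shell's "Floating point exception" wording. One plausible but unconfirmed reading of the backtrace is that ConsistentHashV1::getServer() reduced a key hash modulo a hash-space size or server count that was zero while a node was leaving. The sketch below is hypothetical: `HashWheel`, its members and its wrap-around lookup are illustrative, not the cpp-client's actual implementation. It only shows a ring lookup that refuses to evaluate the modulo when the ring is empty.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical hash wheel for illustration only; this is NOT the cpp-client's
// ConsistentHashV1, whose internals may differ.
class HashWheel {
public:
    explicit HashWheel(std::uint32_t hashSpace) : hashSpace_(hashSpace) {}

    // Places a server at a position on the wheel.
    void addServer(std::uint32_t position, const std::string& server) {
        positions_[position] = server;
    }

    // Returns the first server at or after the key's position, wrapping to
    // the start of the wheel. Returns std::nullopt instead of evaluating
    // `keyHash % 0`, which is undefined behaviour and raises SIGFPE on
    // x86-64 Linux.
    std::optional<std::string> getServer(std::uint32_t keyHash) const {
        if (hashSpace_ == 0 || positions_.empty()) {
            return std::nullopt;
        }
        const std::uint32_t point = keyHash % hashSpace_;
        auto it = positions_.lower_bound(point);
        if (it == positions_.end()) {
            it = positions_.begin();  // wrap around the wheel
        }
        return it->second;
    }

private:
    std::uint32_t hashSpace_;
    std::map<std::uint32_t, std::string> positions_;
};
```

      Returning std::nullopt lets a caller treat an empty topology as "no owner known" (for example, by falling back to some other server selection) rather than crashing; how the real client should behave in that case is an assumption, not something this report establishes.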
      

      Here is information on the configuration and the test procedure (the same as in my previous "hang" report).

      The test environment is an Infinispan cluster on four virtual machines (RHEL 5.10, kernel 2.6.18-371.9.1.el5). The cluster is configured to communicate over TCP:

            cluster="flexnet" stack="tcp" />
            default-stack="${jboss.default.jgroups.stack:tcp}">
      

      The machines are statically configured to know about each other:

            <property name="initial_hosts">infinispan-test1a[7800],infinispan-test2a[7800],infinispan-test3a[7800],infinispan-test4a[7800],</property>
      

      On each server VM I run a simple client built on the cpp-client library. The client runs a loop that performs a single put 10 times per second:
      key = "12345";
      value = "a value";
      pprev = cache.put(key, value);

      All four clients use the same code (same key, same value).
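      The client loop described above can be sketched as follows, assuming the put is wrapped in a callback. `runPutLoop` is a hypothetical helper, and its bounded `iterations` parameter replaces the report's endless loop so the sketch terminates. Scheduling against absolute deadlines with `std::this_thread::sleep_until` keeps the rate close to 10 puts per second even when an individual put is slow.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: invokes `put` `iterations` times, one call per
// `period` (a 100 ms period gives the 10 puts per second used in the test).
void runPutLoop(int iterations, std::chrono::milliseconds period,
                const std::function<void()>& put) {
    // Absolute deadlines avoid drift: a slow put shortens the next sleep
    // instead of pushing every later put back.
    auto next = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        put();
        next += period;
        std::this_thread::sleep_until(next);
    }
}
```

      In the reporter's client the callback would wrap the existing `cache.put(key, value)` call, with a period of `std::chrono::milliseconds(100)`.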

      Each client is started with the IP address and TCP port of its local server and the cache name:

      ConfigurationBuilder builder;
      builder.addServer().host(args[1]).port(atoi(args[2]));
      StringCache cache = manager.getCache<std::string, std::string>(args[3], true);
      

      For example:

      	./loop3 192.168.136.131 11222 cacheTest
      

      Where cacheTest is defined as:

      <distributed-cache name="cacheTest" mode="SYNC"
              segments="20" owners="2" remote-timeout="30000" start="EAGER" />
      

      This test configuration will run forever with no reported errors if I leave it alone…

      High-availability testing:
      The test is to gracefully stop the Infinispan service on one of the hosts. In my configuration, one way of doing that is just:
      sudo /etc/init.d/infinispan stop

      On any one of the servers the following sequence is run until the crash occurs:

      while true; do sudo /etc/init.d/flexnet-infinispan stop; sleep 30; sudo /etc/init.d/flexnet-infinispan start; sleep 60; done
      

      The crash may take hours to occur.

      Alternatively, the following steps can be performed manually on each server in turn:

      [Server1]#  service infinispan stop
      Wait 30 seconds
      [Server1]#  service infinispan start
      Wait 60 seconds
      [Server2]#  service infinispan stop
      Wait 30 seconds
      [Server2]#  service infinispan start
      Wait 60 seconds
      [Server3]#  service infinispan stop
      Wait 30 seconds
      [Server3]#  service infinispan start
      Wait 60 seconds
      [Server4]#  service infinispan stop
      Wait 30 seconds
      [Server4]#  service infinispan start
      Wait 60 seconds
      

      Repeat the above until the Hot Rod client crashes…

      I’ll try to get more info as to what is actually causing the crash…

          People

            isavin_jira Ion Savin (Inactive)
            isavin_jira Ion Savin (Inactive)
            Votes: 0
            Watchers: 1
