Infinispan / ISPN-12997

Crash due to JGroups error

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Version: 12.0.1.Final

      We're using an Infinispan cluster with 3 nodes to back our Keycloak instances. In addition, RocksDB is used to persist the cache content for disaster recovery.

      One Infinispan node suddenly crashed with a SIGSEGV. The JVM's fatal error output names a JGroups method as the problematic frame:

      ... nothing notable logged before; the node had been running without problems for weeks
      Apr 29 09:30:21 prod-ispn-1 bash[807]: #
      Apr 29 09:30:21 prod-ispn-1 bash[807]: # A fatal error has been detected by the Java Runtime Environment:
      Apr 29 09:30:21 prod-ispn-1 bash[807]: #
      Apr 29 09:30:21 prod-ispn-1 bash[807]: #  SIGSEGV (0xb) at pc=0x00007f5d7297aae7, pid=807, tid=142488
      Apr 29 09:30:21 prod-ispn-1 bash[807]: #
      Apr 29 09:30:21 prod-ispn-1 bash[807]: # JRE version: OpenJDK Runtime Environment (11.0.10+9) (build 11.0.10+9-Ubuntu-0ubuntu1.20.04)
      Apr 29 09:30:21 prod-ispn-1 bash[807]: # Java VM: OpenJDK 64-Bit Server VM (11.0.10+9-Ubuntu-0ubuntu1.20.04, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
      Apr 29 09:30:21 prod-ispn-1 bash[807]: # Problematic frame:
      Apr 29 09:30:21 prod-ispn-1 bash[807]: # J 15613 c2 org.jgroups.protocols.UNICAST3.triggerXmit()V (421 bytes) @ 0x00007f5d7297aae7 [0x00007f5d72979ca0+0x0000000000000e47]
      Apr 29 09:30:21 prod-ispn-1 bash[807]: #
      Apr 29 09:30:21 prod-ispn-1 bash[807]: # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to //core.807)
      Apr 29 09:30:21 prod-ispn-1 bash[807]: #
      Apr 29 09:30:21 prod-ispn-1 bash[807]: # An error report file with more information is saved as:
      Apr 29 09:30:21 prod-ispn-1 bash[807]: # /tmp/hs_err_pid807.log
      Apr 29 09:30:21 prod-ispn-1 bash[807]: Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
      Apr 29 09:30:21 prod-ispn-1 bash[807]: #
      Apr 29 09:30:21 prod-ispn-1 bash[807]: # If you would like to submit a bug report, please visit:
      Apr 29 09:30:21 prod-ispn-1 bash[807]: #   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
      Apr 29 09:30:21 prod-ispn-1 bash[807]: #
      Apr 29 09:30:22 prod-ispn-1 bash[631]: *** Server process (807) received ABRT signal ***
      

      The JVM error report hs_err_pid807.log is attached.

      Failover to nodes 2 and 3 worked as expected.

      However, after Infinispan was started on node 1 again, the RocksDB store used as the persistence layer turned out to be corrupted:

      ...
      Apr 29 09:31:02 prod-ispn-1 bash[142605]: 09:31:02,330 INFO  (main) [org.infinispan.CLUSTER] ISPN000079: Channel cluster local address is prod-ispn-1, physical addresses are [10.126.2.16:7800]
      Apr 29 09:31:03 prod-ispn-1 bash[142605]: 09:31:03,161 INFO  (main) [org.jboss.threads] JBoss Threads version 2.3.3.Final
      Apr 29 09:31:03 prod-ispn-1 bash[142605]: 09:31:03,343 WARN  (main) [org.infinispan.encoding.impl.StorageConfigurationManager] ISPN000599: Configuration for cache 'sessions' does not define the encoding for keys or values. If you use operations that require data conversion or queries, you should configure the cache with a specific MediaType for keys or values.
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: # A fatal error has been detected by the Java Runtime Environment:
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: #  SIGSEGV (0xb) at pc=0x00007fb47597ecd9, pid=142605, tid=143913
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: # JRE version: OpenJDK Runtime Environment (11.0.11+9) (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: # Java VM: OpenJDK 64-Bit Server VM (11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: # Problematic frame:
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: # C  [librocksdbjni10804339340371980225.so+0x4e2cd9]  rocksdb::BlockBasedTableIterator::NextAndGetResult(rocksdb::IterateResult*)+0x19
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to //core.142605)
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: # An error report file with more information is saved as:
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: # /tmp/hs_err_pid142605.log
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: # If you would like to submit a bug report, please visit:
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: #   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: # The crash happened outside the Java Virtual Machine in native code.
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: # See problematic frame for where to report the bug.
      Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
      Apr 29 09:32:12 prod-ispn-1 bash[142511]: *** Server process (142605) received ABRT signal ***
      

      The JVM error report hs_err_pid142605.log is attached.

      Unfortunately, this second crash led to the other two Infinispan instances running into continuous timeouts and eventually becoming completely unresponsive. All instances had to be shut down to recover the cluster. The RocksDB persistence of all instances was damaged afterwards, leading to complete data loss.
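
      For triage, the two crash headers quoted above can be reduced to their key facts (signal, pid, problematic frame) with a small script. The following is only a sketch: the function name `summarize_crash` and the regular expressions are illustrative assumptions written against the journal lines shown in this report, not part of Infinispan, JGroups, or the JDK. The frame-type legend is the one HotSpot prints in its hs_err files.

```python
import re

# Assumption: input lines look like the journal excerpts in this report,
# i.e. a syslog prefix followed by the "# ..." HotSpot crash header.
SIGNAL_RE = re.compile(
    r"#\s+(SIG[A-Z]+) \((0x[0-9a-f]+)\) at pc=(0x[0-9a-f]+), pid=(\d+), tid=(\d+)"
)
FRAME_RE = re.compile(r"# ([A-Za-z]) +(.*\S)\s*$")

# Frame-type letters as listed in the legend of HotSpot hs_err files.
FRAME_KINDS = {
    "J": "compiled Java code",
    "j": "interpreted Java code",
    "V": "VM code",
    "v": "VM code",
    "C": "native code",
}


def summarize_crash(lines):
    """Extract signal, pid and problematic frame from HotSpot crash output."""
    summary = {}
    expect_frame = False
    for line in lines:
        match = SIGNAL_RE.search(line)
        if match:
            summary["signal"] = match.group(1)
            summary["pid"] = int(match.group(4))
            continue
        if "# Problematic frame:" in line:
            # The frame itself is printed on the following line.
            expect_frame = True
            continue
        if expect_frame:
            match = FRAME_RE.search(line)
            if match:
                summary["frame_type"] = match.group(1)
                summary["frame_kind"] = FRAME_KINDS.get(match.group(1), "unknown")
                summary["frame"] = match.group(2)
            expect_frame = False
    return summary
```

      Fed the first excerpt, this reports a `J` frame (JIT-compiled Java code, `org.jgroups.protocols.UNICAST3.triggerXmit`). Fed the second, it reports a `C` frame (native code inside `librocksdbjni`). That distinction is worth stating when forwarding the attached hs_err files.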

        1. hs_err_pid807.log
          421 kB
        2. hs_err_pid142605.log
          383 kB
        3. infinispan.xml
          4 kB

              Assignee: Unassigned
              Reporter: Georg F (georgpace)