Bug
Resolution: Unresolved
Major
12.0.1.Final
We're using an Infinispan cluster with 3 nodes to back our Keycloak instances. In addition, RocksDB is used to persist the cache content for disaster recovery.
One Infinispan node suddenly crashed with a core dump indicating an issue in JGroups:
.. nothing interesting before, happily running for weeks
Apr 29 09:30:21 prod-ispn-1 bash[807]: #
Apr 29 09:30:21 prod-ispn-1 bash[807]: # A fatal error has been detected by the Java Runtime Environment:
Apr 29 09:30:21 prod-ispn-1 bash[807]: #
Apr 29 09:30:21 prod-ispn-1 bash[807]: # SIGSEGV (0xb) at pc=0x00007f5d7297aae7, pid=807, tid=142488
Apr 29 09:30:21 prod-ispn-1 bash[807]: #
Apr 29 09:30:21 prod-ispn-1 bash[807]: # JRE version: OpenJDK Runtime Environment (11.0.10+9) (build 11.0.10+9-Ubuntu-0ubuntu1.20.04)
Apr 29 09:30:21 prod-ispn-1 bash[807]: # Java VM: OpenJDK 64-Bit Server VM (11.0.10+9-Ubuntu-0ubuntu1.20.04, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
Apr 29 09:30:21 prod-ispn-1 bash[807]: # Problematic frame:
Apr 29 09:30:21 prod-ispn-1 bash[807]: # J 15613 c2 org.jgroups.protocols.UNICAST3.triggerXmit()V (421 bytes) @ 0x00007f5d7297aae7 [0x00007f5d72979ca0+0x0000000000000e47]
Apr 29 09:30:21 prod-ispn-1 bash[807]: #
Apr 29 09:30:21 prod-ispn-1 bash[807]: # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to //core.807)
Apr 29 09:30:21 prod-ispn-1 bash[807]: #
Apr 29 09:30:21 prod-ispn-1 bash[807]: # An error report file with more information is saved as:
Apr 29 09:30:21 prod-ispn-1 bash[807]: # /tmp/hs_err_pid807.log
Apr 29 09:30:21 prod-ispn-1 bash[807]: Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
Apr 29 09:30:21 prod-ispn-1 bash[807]: #
Apr 29 09:30:21 prod-ispn-1 bash[807]: # If you would like to submit a bug report, please visit:
Apr 29 09:30:21 prod-ispn-1 bash[807]: # https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
Apr 29 09:30:21 prod-ispn-1 bash[807]: #
Apr 29 09:30:22 prod-ispn-1 bash[631]: *** Server process (807) received ABRT signal ***
Core dump hs_err_pid807.log is attached.
Failover to nodes 2 and 3 worked as expected without issues.
However, after starting Infinispan on node 1 again, the RocksDB used as the persistence layer is corrupted:
...
Apr 29 09:31:02 prod-ispn-1 bash[142605]: #033[0m09:31:02,330 INFO (main) [org.infinispan.CLUSTER] ISPN000079: Channel cluster local address is prod-ispn-1, physical addresses are [10.126.2.16:7800]#033[m
Apr 29 09:31:03 prod-ispn-1 bash[142605]: #033[0m09:31:03,161 INFO (main) [org.jboss.threads] JBoss Threads version 2.3.3.Final#033[m
Apr 29 09:31:03 prod-ispn-1 bash[142605]: #033[33m09:31:03,343 WARN (main) [org.infinispan.encoding.impl.StorageConfigurationManager] ISPN000599: Configuration for cache 'sessions' does not define the encoding for keys or values. If you use operations that require data conversion or queries, you should configure the cache with a specific MediaType for keys or values.#033[m
Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # A fatal error has been detected by the Java Runtime Environment:
Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # SIGSEGV (0xb) at pc=0x00007fb47597ecd9, pid=142605, tid=143913
Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # JRE version: OpenJDK Runtime Environment (11.0.11+9) (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # Java VM: OpenJDK 64-Bit Server VM (11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # Problematic frame:
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # C [librocksdbjni10804339340371980225.so+0x4e2cd9] rocksdb::BlockBasedTableIterator::NextAndGetResult(rocksdb::IterateResult*)+0x19
Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to //core.142605)
Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # An error report file with more information is saved as:
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # /tmp/hs_err_pid142605.log
Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # If you would like to submit a bug report, please visit:
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # The crash happened outside the Java Virtual Machine in native code.
Apr 29 09:31:18 prod-ispn-1 bash[142605]: # See problematic frame for where to report the bug.
Apr 29 09:31:18 prod-ispn-1 bash[142605]: #
Apr 29 09:32:12 prod-ispn-1 bash[142511]: *** Server process (142605) received ABRT signal ***
Core dump hs_err_pid142605.log is attached.
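To compare the two crashes side by side, the header fields can be pulled out of the journal output (or the hs_err files) with a few lines of Python. This is only a minimal sketch: the function name `summarize_crash` and the sample constant are hypothetical, and the regular expressions assume the header layout shown in the excerpts above.

```python
import re

# Patterns assume the JVM fatal-error header layout seen in the logs above.
SIGNAL_RE = re.compile(r"#\s+(SIG[A-Z]+) \(0x[0-9a-f]+\) at pc=0x[0-9a-f]+, pid=(\d+), tid=(\d+)")
BUILD_RE = re.compile(r"JRE version: .*?\(build ([^)]+)\)")

# Frame-type letters as listed in the legend of hs_err files.
FRAME_KINDS = {
    "J": "compiled Java code",
    "j": "interpreted",
    "V": "VM code",
    "v": "VM code",
    "C": "native code",
}


def summarize_crash(text):
    """Extract signal, pid, JRE build and problematic frame from a crash header."""
    summary = {}
    m = SIGNAL_RE.search(text)
    if m:
        summary["signal"] = m.group(1)
        summary["pid"] = int(m.group(2))
    m = BUILD_RE.search(text)
    if m:
        summary["jre_build"] = m.group(1)
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # The frame itself is on the line following "Problematic frame:".
        if "Problematic frame:" in line and i + 1 < len(lines):
            frame = lines[i + 1].split("#", 1)[-1].strip()
            summary["frame"] = frame
            summary["frame_kind"] = FRAME_KINDS.get(frame[:1], "unknown")
            break
    return summary


# Sample lines copied from the first crash above.
FIRST_CRASH = """\
Apr 29 09:30:21 prod-ispn-1 bash[807]: # SIGSEGV (0xb) at pc=0x00007f5d7297aae7, pid=807, tid=142488
Apr 29 09:30:21 prod-ispn-1 bash[807]: # JRE version: OpenJDK Runtime Environment (11.0.10+9) (build 11.0.10+9-Ubuntu-0ubuntu1.20.04)
Apr 29 09:30:21 prod-ispn-1 bash[807]: # Problematic frame:
Apr 29 09:30:21 prod-ispn-1 bash[807]: # J 15613 c2 org.jgroups.protocols.UNICAST3.triggerXmit()V (421 bytes) @ 0x00007f5d7297aae7 [0x00007f5d72979ca0+0x0000000000000e47]
"""

if __name__ == "__main__":
    for key, value in summarize_crash(FIRST_CRASH).items():
        print(f"{key}: {value}")
```

Run against both excerpts, this reports a compiled Java frame on JRE build 11.0.10+9-Ubuntu-0ubuntu1.20.04 for the first crash and a native frame in librocksdbjni on build 11.0.11+9-Ubuntu-0ubuntu2.20.04 for the second.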
Unfortunately, this second crash led to the other two Infinispan instances running into continuous timeouts and eventually becoming completely unresponsive. All instances had to be shut down to recover the cluster. The RocksDB persistence of all instances was damaged afterwards, leading to complete data loss.