Infinispan / ISPN-16100

Embedded Distributed Infinispan Cluster: Cache Event Listener Issue After Network Disconnection



    • Bug
    • Resolution: Obsolete
    • Major
    • None
    • 13.0.22.Final
    • None
    • None

      I'm encountering an issue with an embedded Infinispan clustered cache in my application. The scenario is described below.

      Setup Overview

      I have a distributed setup with 2 nodes running Infinispan clustered cache embedded in a Spring Boot application. The nodes communicate over a network to form a cluster and share cache data.

      Problem Description

      When the network connection between the nodes is lost, the cluster splits and each node forms its own single-node cluster.

      {{2024-05-07 03:39:29,677 INFO o.i.r.t.j.JGroupsTransport [VERIFY_SUSPECT.TimerThread-92,node2] ISPN000094: Received new cluster view for channel my-cluster: [node2|2] (1) [node2]
      2024-05-07 03:39:29,678 INFO o.i.u.l.e.i.BasicEventLogger [VERIFY_SUSPECT.TimerThread-92,node2] ISPN100001: Node node1 left the cluster}}
      

      After some time, when the network connection is restored, the two single-node subgroups merge back into one cluster.

      {{2024-05-07 03:40:19,272 INFO o.i.r.t.j.JGroupsTransport [jgroups-82,node2] ISPN000093: Received new, MERGED cluster view for channel my-cluster: MergeView::[node1|3] (2) [node1, node2], 2 subgroups: [node2|2] (1) [node2], [node1|2] (1) [node1]
      2024-05-07 03:40:19,273 INFO o.i.u.l.e.i.BasicEventLogger [jgroups-82,node2] ISPN100000: Node node1 joined the cluster}}
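
For reference, the time node2 spent alone in its own view can be read off the two log excerpts above. The snippet below is only a small worked calculation: the class name `PartitionDuration` is hypothetical, and the timestamps mark when node2 installed each view, not when the network actually dropped or came back.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the reported application.
public class PartitionDuration {
    // Matches the timestamp prefix of the log lines, e.g. "2024-05-07 03:39:29,677".
    private static final DateTimeFormatter LOG_TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the time elapsed between two log timestamps.
    public static Duration between(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, LOG_TIMESTAMP),
                LocalDateTime.parse(later, LOG_TIMESTAMP));
    }

    public static void main(String[] args) {
        // ISPN000094 (view with node2 only) and ISPN000093 (merged view) on node2.
        Duration alone = between("2024-05-07 03:39:29,677", "2024-05-07 03:40:19,272");
        System.out.println("node2 was alone in its view for " + alone.toMillis() + " ms");
    }
}
```

With the timestamps from the logs this comes to 49595 ms, so node2 was in a single-node view for roughly 50 seconds before the merge.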
      

      However, after the merge, the Infinispan cache event listener on one of the nodes stops working without logging any errors. Which node is affected varies between occurrences (sometimes node1, sometimes node2); the listener on the other node keeps working.

      Additional Information

      The cache event listener implementation is shown below. It worked correctly before the network disconnection.

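      {{// Imports added for completeness; the logging imports assume Log4j 2.
      import org.apache.logging.log4j.LogManager;
      import org.apache.logging.log4j.Logger;
      import org.infinispan.notifications.Listener;
      import org.infinispan.notifications.cachelistener.annotation.CacheEntryCreated;
      import org.infinispan.notifications.cachelistener.annotation.CacheEntryModified;
      import org.infinispan.notifications.cachelistener.event.CacheEntryCreatedEvent;
      import org.infinispan.notifications.cachelistener.event.CacheEntryModifiedEvent;

      // Clustered listener: logs the cache name for every created or modified entry.
      @Listener(clustered = true)
      public class CacheEventListener {
          private static final Logger LOGGER = LogManager.getLogger(CacheEventListener.class);
      
          @CacheEntryCreated
          public void entryCreated(CacheEntryCreatedEvent<?, ?> event) {
              LOGGER.info("Cache created event for {}", event.getCache().getName());
          }
      
          @CacheEntryModified
          public void entryModified(CacheEntryModifiedEvent<?, ?> event) {
              LOGGER.info("Cache modified event for {}", event.getCache().getName());
          }
      }}}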
      

      There are no logged errors or exceptions related to the cache event listener or Infinispan configuration.

      My JGroups configuration is as follows.

      {{<config xmlns="urn:org:jgroups"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.2.xsd">
          <!-- jgroups.tcp.address is deprecated and will be removed, see ISPN-11867 -->
          <TCP bind_addr="${jgroups.bind.address,jgroups.tcp.address:SITE_LOCAL}"
               bind_port="${jgroups.bind.port,jgroups.tcp.port:7805}"
               client_bind_port="7004"
               enable_diagnostics="false"
               thread_naming_pattern="pl"
               send_buf_size="640k"
               sock_conn_timeout="300"
               bundler_type="no-bundler"
      
               thread_pool.min_threads="${jgroups.thread_pool.min_threads:2}"
               thread_pool.max_threads="${jgroups.thread_pool.max_threads:200}"
               thread_pool.keep_alive_time="60000"
      
               thread_dumps_threshold="${jgroups.thread_dumps_threshold:10000}"
          />
          <TCPPING async_discovery="true"
                   initial_hosts="192.168.1.7[7805]"
                   port_range="0"
          />
          <MERGE3 min_interval="10000"
                  max_interval="30000"
          />
          <FD_SOCK/>
          <!-- Suspect node `timeout` to `timeout + timeout_check_interval` millis after the last heartbeat -->
          <FD_ALL timeout="10000"
                  interval="2000"
                  timeout_check_interval="1000"
          />
          <VERIFY_SUSPECT timeout="1000"/>
          <pbcast.NAKACK2 use_mcast_xmit="false"
                          xmit_interval="100"
                          xmit_table_num_rows="50"
                          xmit_table_msgs_per_row="1024"
                          xmit_table_max_compaction_time="30000"
                          resend_last_seqno="true"
          />
          <UNICAST3 xmit_interval="100"
                    xmit_table_num_rows="50"
                    xmit_table_msgs_per_row="1024"
                    xmit_table_max_compaction_time="30000"
          />
          <pbcast.STABLE stability_delay="500"
                         desired_avg_gossip="5000"
                         max_bytes="1M"
          />
          <pbcast.GMS print_local_addr="false"
                      join_timeout="${jgroups.join_timeout:2000}"
          />
          <UFC max_credits="4m"
               min_threshold="0.40"
          />
          <MFC max_credits="4m"
               min_threshold="0.40"
          />
          <FRAG3/>
      </config>}}
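
The comment on FD_ALL above describes the window in which a silent node becomes suspected. As a rough worked example, the sketch below adds up the values from this configuration. It assumes the failure is noticed through FD_ALL heartbeats only (FD_SOCK may react sooner if a socket is closed) and that VERIFY_SUSPECT waits its full timeout for an unreachable node; installing the new view takes some additional time on top. The class name `FailureDetectionWindow` is hypothetical.

```java
import java.time.Duration;

// Hypothetical helper that adds up the timeouts from the configuration above.
public class FailureDetectionWindow {
    // Values copied from the FD_ALL and VERIFY_SUSPECT elements.
    public static final Duration FD_ALL_TIMEOUT = Duration.ofMillis(10_000);
    public static final Duration FD_ALL_TIMEOUT_CHECK_INTERVAL = Duration.ofMillis(1_000);
    public static final Duration VERIFY_SUSPECT_TIMEOUT = Duration.ofMillis(1_000);

    // Earliest point after the last heartbeat at which FD_ALL suspects the node.
    public static Duration earliestSuspicion() {
        return FD_ALL_TIMEOUT;
    }

    // Latest point: the timeout plus one full check interval.
    public static Duration latestSuspicion() {
        return FD_ALL_TIMEOUT.plus(FD_ALL_TIMEOUT_CHECK_INTERVAL);
    }

    // Assumption: VERIFY_SUSPECT waits its whole timeout before confirming.
    public static Duration earliestConfirmation() {
        return earliestSuspicion().plus(VERIFY_SUSPECT_TIMEOUT);
    }

    public static Duration latestConfirmation() {
        return latestSuspicion().plus(VERIFY_SUSPECT_TIMEOUT);
    }

    public static void main(String[] args) {
        System.out.println("Suspected after " + earliestSuspicion().toMillis()
                + " to " + latestSuspicion().toMillis() + " ms without a heartbeat");
        System.out.println("Suspicion confirmed after " + earliestConfirmation().toMillis()
                + " to " + latestConfirmation().toMillis() + " ms without a heartbeat");
    }
}
```

Under these assumptions a node that stops sending heartbeats is suspected after 10 to 11 seconds and confirmed as suspect after 11 to 12 seconds, so a network outage shorter than that may not produce a split at all.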
      
      • JGroups version: 4.2.18.Final
      • Infinispan version: 13.0.22.Final
      • Spring Boot version: 2.2.6.RELEASE
      • JDK version: 1.8

              wburns@redhat.com Will Burns
              shakya_rajindi Shakya Wanigarathna (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare
