Infinispan / ISPN-4787

FD_SOCK timeout causing random test failures


      When a test doesn't require failure detection, we remove the FD protocol from the JGroups stack but keep FD_SOCK. Normally this isn't a problem, but on rare occasions FD_SOCK fails to open the ping socket and the cluster doesn't form:

      22:51:45,978 DEBUG (testng-GlobalKeySetTaskTest:) [FD_SOCK] NodeA-60950: VIEW_CHANGE received: [NodeA-60950]
      22:51:46,401 DEBUG (Incoming-1,NodeA-60950:) [FD_SOCK] NodeA-60950: VIEW_CHANGE received: [NodeA-60950, NodeB-24360]
      22:51:46,675 DEBUG (FD_SOCK pinger,NodeA-60950:) [FD_SOCK] NodeA-60950: ping_dest is NodeB-24360, pingable_mbrs=[NodeA-60950, NodeB-24360]
      22:51:46,803 DEBUG (testng-GlobalKeySetTaskTest:) [FD_SOCK] NodeB-24360: VIEW_CHANGE received: [NodeA-60950, NodeB-24360]
      22:51:47,149 DEBUG (FD_SOCK pinger,NodeB-24360:) [FD_SOCK] NodeB-24360: ping_dest is NodeA-60950, pingable_mbrs=[NodeA-60950, NodeB-24360]
      22:51:49,113 WARN  (FD_SOCK pinger,NodeB-24360:) [FD_SOCK] NodeB-24360: creating the client socket failed: java.net.SocketTimeoutException
      22:51:49,116 DEBUG (FD_SOCK pinger,NodeB-24360:) [FD_SOCK] NodeB-24360: could not create socket to NodeA-60950 (pinger thread is running)
      22:51:49,116 DEBUG (FD_SOCK pinger,NodeB-24360:) [FD_SOCK] NodeB-24360: suspecting NodeA-60950
      22:51:49,117 DEBUG (FD_SOCK pinger,NodeB-24360:) [FD_SOCK] NodeB-24360: ping_dest is null, pingable_mbrs=[NodeB-24360]
      22:51:49,117 DEBUG (INT-2,NodeB-24360:) [FD_SOCK] NodeB-24360: suspecting [NodeA-60950]
      22:51:49,262 DEBUG (Incoming-1,NodeB-24360:) [FD_SOCK] NodeB-24360: VIEW_CHANGE received: [NodeB-24360]
      22:55:49,387 DEBUG (FD_SOCK pinger,NodeA-60950:) [FD_SOCK] 89fe2d3e-0b0a-dae8-a63a-6272ea5b7372: socket to NodeB-24360 was closed gracefully

      We should increase FD_SOCK.sock_conn_timeout and remove FD_SOCK from the stack unless the test uses TransportFlags.withMerge().
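
      The removal rule proposed above could be sketched roughly as follows. This is only an illustration: StackTrimmer, protocolsFor and the two boolean parameters are hypothetical names, not Infinispan or JGroups API, and the protocol list in main is an example stack rather than the one the test suite actually uses. Keeping FD_SOCK when failure detection is requested is an assumption on top of the proposal.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Infinispan or JGroups. */
public final class StackTrimmer {
    private StackTrimmer() {}

    /**
     * Returns the protocol names to keep for a test.
     * FD is kept only when the test asks for failure detection.
     * FD_SOCK is kept when the test asks for failure detection (assumption)
     * or for merging (the rule proposed in this issue).
     */
    public static List<String> protocolsFor(List<String> base, boolean withFD, boolean withMerge) {
        List<String> kept = new ArrayList<>();
        for (String protocol : base) {
            if (protocol.equals("FD") && !withFD) {
                continue; // no failure detection requested: drop FD
            }
            if (protocol.equals("FD_SOCK") && !withFD && !withMerge) {
                continue; // neither flag set: drop FD_SOCK as well
            }
            kept.add(protocol);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example stack only; the real test stack may differ.
        List<String> base = List.of("UDP", "PING", "MERGE3", "FD_SOCK", "FD", "VERIFY_SUSPECT");
        System.out.println(protocolsFor(base, false, false)); // plain test: FD and FD_SOCK dropped
        System.out.println(protocolsFor(base, false, true));  // merge test: FD_SOCK kept
    }
}
```

      With a rule like this, a test that needs neither failure detection nor merging never starts the FD_SOCK pinger, so a slow socket connect cannot cause the spurious suspicion seen in the log.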

            dberinde@redhat.com Dan Berindei (Inactive)