-
Bug
-
Resolution: Unresolved
-
Major
Our product uses 2-3 JGroups channels. Typically, if one works then they all work. However, we have a customer where two of the channels work fine but the third consistently fails to establish a connection on several of the customer's clusters. Some clusters work, but several fail, always on the same JGroups channel. The channel property configuration is identical across the working and nonworking clusters. In the packet trace from a nonworking cluster, the connection is closed by the JGroups side that is listening on the port (10060):
55 2024-08-06 16:19:09.052508 130.172.253.82 130.172.253.83 TCP 74 46397→10060 [SYN] Seq=0 Win=32120 Len=0 MSS=1460 SACK_PERM=1 TSval=1364139487 TSecr=0 WS=128
56 2024-08-06 16:19:09.052527 130.172.253.83 130.172.253.82 TCP 74 10060→46397 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=455360557 TSecr=1364139487 WS=2
57 2024-08-06 16:19:09.052698 130.172.253.82 130.172.253.83 TCP 66 46397→10060 [ACK] Seq=1 Ack=1 Win=32128 Len=0 TSval=1364139487 TSecr=455360557
58 2024-08-06 16:19:09.052839 130.172.253.82 130.172.253.83 TCP 81 46397→10060 [PSH, ACK] Seq=1 Ack=1 Win=32128 Len=15 TSval=1364139488 TSecr=455360557
59 2024-08-06 16:19:09.052858 130.172.253.83 130.172.253.82 TCP 66 10060→46397 [ACK] Seq=1 Ack=16 Win=65536 Len=0 TSval=455360558 TSecr=1364139488
60 2024-08-06 16:19:09.053295 130.172.253.82 130.172.253.83 TCP 227 46397→10060 [PSH, ACK] Seq=16 Ack=1 Win=32128 Len=161 TSval=1364139488 TSecr=455360558
61 2024-08-06 16:19:09.053299 130.172.253.83 130.172.253.82 TCP 66 10060→46397 [ACK] Seq=1 Ack=177 Win=66608 Len=0 TSval=455360558 TSecr=1364139488
62 2024-08-06 16:19:09.053419 130.172.253.83 130.172.253.82 TCP 73 10060→46397 [PSH, ACK] Seq=1 Ack=177 Win=66608 Len=7 TSval=455360558 TSecr=1364139488 ---------------- Length 7 followed by FIN in next packet
63 2024-08-06 16:19:09.053458 130.172.253.83 130.172.253.82 TCP 66 10060→46397 [FIN, ACK] Seq=8 Ack=177 Win=66608 Len=0 TSval=455360558 TSecr=1364139488
64 2024-08-06 16:19:09.053513 130.172.253.82 130.172.253.83 TCP 66 46397→10060 [ACK] Seq=177 Ack=8 Win=32128 Len=0 TSval=1364139488 TSecr=455360558
65 2024-08-06 16:19:09.053795 130.172.253.82 130.172.253.83 TCP 66 46397→10060 [FIN, ACK] Seq=177 Ack=9 Win=32128 Len=0 TSval=1364139489 TSecr=455360558
67 2024-08-06 16:19:09.700572 130.172.253.83 130.172.253.82 TCP 74 46395→10060 [SYN] Seq=0 Win=32120 Len=0 MSS=1460 SACK_PERM=1 TSval=455361206 TSecr=0 WS=128
68 2024-08-06 16:19:09.700703 130.172.253.82 130.172.253.83 TCP 74 10060→46395 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=1364140135 TSecr=455361206 WS=2
On the working cluster, the corresponding PSH, ACK from the listening side is larger (length 124) and is followed by more PSH, ACK segments:
66 2024-08-06 16:24:54.587026 130.172.252.94 130.172.252.95 TCP 190 10060→54833 [PSH, ACK] Seq=1 Ack=177 Win=65024 Len=124 TSval=3278739087 TSecr=1856588722 --------------- Length 124 followed by PSH, ACK, length 141
67 2024-08-06 16:24:54.587065 130.172.252.94 130.172.252.95 TCP 207 10060→54833 [PSH, ACK] Seq=125 Ack=177 Win=65024 Len=141 TSval=3278739087 TSecr=1856588722
68 2024-08-06 16:24:54.587130 130.172.252.95 130.172.252.94 TCP 66 54833→10060 [ACK] Seq=177 Ack=125 Win=64256 Len=0 TSval=1856588723 TSecr=3278739087
69 2024-08-06 16:24:54.587156 130.172.252.95 130.172.252.94 TCP 66 54833→10060 [ACK] Seq=177 Ack=266 Win=64128 Len=0 TSval=1856588723 TSecr=3278739087
70 2024-08-06 16:24:54.590620 130.172.252.95 130.172.252.94 TCP 169 54833→10060 [PSH, ACK] Seq=177 Ack=266 Win=64128 Len=103 TSval=1856588726 TSecr=3278739087
71 2024-08-06 16:24:54.591537 130.172.252.94 130.172.252.95 TCP 162 10060→54833 [PSH, ACK] Seq=266 Ack=280 Win=65024 Len=96 TSval=3278739092 TSecr=1856588726
72 2024-08-06 16:24:54.591577 130.172.252.94 130.172.252.95 TCP 239 10060→54833 [PSH, ACK] Seq=362 Ack=280 Win=65024 Len=173 TSval=3278739092 TSecr=1856588726
73 2024-08-06 16:24:54.591643 130.172.252.94 130.172.252.95 TCP 204 10060→54833 [PSH, ACK] Seq=535 Ack=280 Win=65024 Len=138 TSval=3278739092 TSecr=1856588726
74 2024-08-06 16:24:54.591664 130.172.252.94 130.172.252.95 TCP 218 10060→54833 [PSH, ACK] Seq=673 Ack=280 Win=65024 Len=152 TSval=3278739092 TSecr=1856588726
75 2024-08-06 16:24:54.591682 130.172.252.95 130.172.252.94 TCP 66 54833→10060 [ACK] Seq=280 Ack=535 Win=64128 Len=0 TSval=1856588727 TSecr=3278739092
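To compare captures from several clusters, the failing pattern in the traces above (a short payload from the listening port immediately followed by a FIN from that same port) can be checked with a small script. This is only a minimal Python sketch, not a JGroups tool. It assumes the capture is exported as Wireshark-style one-line summaries like the ones above. The function names and the 16-byte threshold are hypothetical choices made for illustration.

```python
import re

# Matches "<src port>→<dst port> [FLAGS] ... Len=<n>" in a one-line packet summary.
# Both the "→" arrow used above and an ASCII "->" are accepted.
PACKET_RE = re.compile(
    r"(?P<sport>\d+)\s*(?:→|->)\s*(?P<dport>\d+)\s+"
    r"\[(?P<flags>[^\]]+)\].*?\bLen=(?P<len>\d+)"
)


def parse_packets(lines):
    """Extract ports, TCP flags, and payload length from each summary line."""
    packets = []
    for line in lines:
        match = PACKET_RE.search(line)
        if match is None:
            continue  # not a TCP summary line
        packets.append({
            "sport": int(match.group("sport")),
            "dport": int(match.group("dport")),
            "flags": {flag.strip() for flag in match.group("flags").split(",")},
            "len": int(match.group("len")),
        })
    return packets


def find_early_closes(lines, listen_port, max_payload=16):
    """Return (peer_port, payload_len) for each FIN sent from listen_port
    directly after a data segment of at most max_payload bytes on the
    same connection. max_payload is an arbitrary illustrative threshold."""
    packets = parse_packets(lines)
    hits = []
    for prev, cur in zip(packets, packets[1:]):
        same_connection = (
            prev["sport"] == cur["sport"] == listen_port
            and prev["dport"] == cur["dport"]
        )
        if same_connection and "FIN" in cur["flags"] and 0 < prev["len"] <= max_payload:
            hits.append((prev["dport"], prev["len"]))
    return hits
```

Run against the nonworking trace with `listen_port=10060`, this should flag the 7-byte segment that precedes the FIN. The working trace has no such pair in the excerpt shown.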
Are there any tools or tips for diagnosing why the non-working channel's connection is getting terminated?
Thanks,
Ted