• Type: Bug
    • Resolution: Unresolved
    • Priority: Major

      Our product uses 2-3 JGroups channels.  Typically, if one works, they all work.  However, we have a customer where two of the channels work fine but the third consistently fails to establish a connection on several of the customer's clusters.  Some clusters work, but several fail, always on the same JGroups channel.  The channel property configuration is consistent across clusters, both working and nonworking.  Looking at the packet trace, the nonworking cluster's connections are being closed by the JGroups instance that is listening on the port:

      55    2024-08-06 16:19:09.052508    130.172.253.82    130.172.253.83    TCP    74    46397→10060 [SYN] Seq=0 Win=32120 Len=0 MSS=1460 SACK_PERM=1 TSval=1364139487 TSecr=0 WS=128
      56    2024-08-06 16:19:09.052527    130.172.253.83    130.172.253.82    TCP    74    10060→46397 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=455360557 TSecr=1364139487 WS=2    
      57    2024-08-06 16:19:09.052698    130.172.253.82    130.172.253.83    TCP    66    46397→10060 [ACK] Seq=1 Ack=1 Win=32128 Len=0 TSval=1364139487 TSecr=455360557
      58    2024-08-06 16:19:09.052839    130.172.253.82    130.172.253.83    TCP    81    46397→10060 [PSH, ACK] Seq=1 Ack=1 Win=32128 Len=15 TSval=1364139488 TSecr=455360557
      59    2024-08-06 16:19:09.052858    130.172.253.83    130.172.253.82    TCP    66    10060→46397 [ACK] Seq=1 Ack=16 Win=65536 Len=0 TSval=455360558 TSecr=1364139488        
      60    2024-08-06 16:19:09.053295    130.172.253.82    130.172.253.83    TCP    227    46397→10060 [PSH, ACK] Seq=16 Ack=1 Win=32128 Len=161 TSval=1364139488 TSecr=455360558
      61    2024-08-06 16:19:09.053299    130.172.253.83    130.172.253.82    TCP    66    10060→46397 [ACK] Seq=1 Ack=177 Win=66608 Len=0 TSval=455360558 TSecr=1364139488        
      62    2024-08-06 16:19:09.053419    130.172.253.83    130.172.253.82    TCP    73    10060→46397 [PSH, ACK] Seq=1 Ack=177 Win=66608 Len=7 TSval=455360558 TSecr=1364139488    ---------------- Length 7 followed by FIN in next packet    
      63    2024-08-06 16:19:09.053458    130.172.253.83    130.172.253.82    TCP    66    10060→46397 [FIN, ACK] Seq=8 Ack=177 Win=66608 Len=0 TSval=455360558 TSecr=1364139488        
      64    2024-08-06 16:19:09.053513    130.172.253.82    130.172.253.83    TCP    66    46397→10060 [ACK] Seq=177 Ack=8 Win=32128 Len=0 TSval=1364139488 TSecr=455360558
      65    2024-08-06 16:19:09.053795    130.172.253.82    130.172.253.83    TCP    66    46397→10060 [FIN, ACK] Seq=177 Ack=9 Win=32128 Len=0 TSval=1364139489 TSecr=455360558
      67    2024-08-06 16:19:09.700572    130.172.253.83    130.172.253.82    TCP    74    46395→10060 [SYN] Seq=0 Win=32120 Len=0 MSS=1460 SACK_PERM=1 TSval=455361206 TSecr=0 WS=128
      68    2024-08-06 16:19:09.700703    130.172.253.82    130.172.253.83    TCP    74    10060→46395 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=1364140135 TSecr=455361206 WS=2

      On the working cluster, the first PSH, ACK from the listening side is larger (length 124) and is followed by more PSH, ACK packets:

      66    2024-08-06 16:24:54.587026    130.172.252.94    130.172.252.95    TCP    190    10060→54833 [PSH, ACK] Seq=1 Ack=177 Win=65024 Len=124 TSval=3278739087 TSecr=1856588722  --------------- Length 124, followed by PSH, ACK of length 141
      67    2024-08-06 16:24:54.587065    130.172.252.94    130.172.252.95    TCP    207    10060→54833 [PSH, ACK] Seq=125 Ack=177 Win=65024 Len=141 TSval=3278739087 TSecr=1856588722
      68    2024-08-06 16:24:54.587130    130.172.252.95    130.172.252.94    TCP    66    54833→10060 [ACK] Seq=177 Ack=125 Win=64256 Len=0 TSval=1856588723 TSecr=3278739087
      69    2024-08-06 16:24:54.587156    130.172.252.95    130.172.252.94    TCP    66    54833→10060 [ACK] Seq=177 Ack=266 Win=64128 Len=0 TSval=1856588723 TSecr=3278739087
      70    2024-08-06 16:24:54.590620    130.172.252.95    130.172.252.94    TCP    169    54833→10060 [PSH, ACK] Seq=177 Ack=266 Win=64128 Len=103 TSval=1856588726 TSecr=3278739087
      71    2024-08-06 16:24:54.591537    130.172.252.94    130.172.252.95    TCP    162    10060→54833 [PSH, ACK] Seq=266 Ack=280 Win=65024 Len=96 TSval=3278739092 TSecr=1856588726
      72    2024-08-06 16:24:54.591577    130.172.252.94    130.172.252.95    TCP    239    10060→54833 [PSH, ACK] Seq=362 Ack=280 Win=65024 Len=173 TSval=3278739092 TSecr=1856588726
      73    2024-08-06 16:24:54.591643    130.172.252.94    130.172.252.95    TCP    204    10060→54833 [PSH, ACK] Seq=535 Ack=280 Win=65024 Len=138 TSval=3278739092 TSecr=1856588726
      74    2024-08-06 16:24:54.591664    130.172.252.94    130.172.252.95    TCP    218    10060→54833 [PSH, ACK] Seq=673 Ack=280 Win=65024 Len=152 TSval=3278739092 TSecr=1856588726
      75    2024-08-06 16:24:54.591682    130.172.252.95    130.172.252.94    TCP    66    54833→10060 [ACK] Seq=280 Ack=535 Win=64128 Len=0 TSval=1856588727 TSecr=3278739092
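
      To check other captures for the same pattern (a short reply from the listening side followed immediately by a FIN), something like the following could be run over a text export of the trace. This is only a sketch: it assumes the export uses the same column layout as the excerpts above, and the names `find_abrupt_closes` and `max_len` (and its default of 16 bytes) are made up for illustration, not taken from JGroups or Wireshark.

```python
import re

# Assumed layout: packet no., date, time, source IP, destination IP, "TCP",
# frame length, "sport→dport", "[FLAGS]", ... "Len=N" ...
LINE_RE = re.compile(
    r"^\s*(?P<no>\d+)\s+\S+\s+\S+\s+"
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\s+(?P<dst>\d+\.\d+\.\d+\.\d+)\s+TCP\s+\d+\s+"
    r"(?P<sport>\d+)→(?P<dport>\d+)\s+\[(?P<flags>[^\]]+)\].*?Len=(?P<len>\d+)"
)


def parse(line):
    """Return a dict for one trace line, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "no": int(m.group("no")),
        "key": (m.group("src"), m.group("sport"), m.group("dst"), m.group("dport")),
        "flags": [f.strip() for f in m.group("flags").split(",")],
        "len": int(m.group("len")),
    }


def find_abrupt_closes(lines, max_len=16):
    """List (data_packet_no, fin_packet_no, payload_len) for every FIN whose
    sender's most recent data packet on that connection carried at most
    max_len bytes. max_len is an arbitrary illustrative threshold."""
    last_data = {}  # (src, sport, dst, dport) -> last packet with a payload
    hits = []
    for line in lines:
        pkt = parse(line)
        if pkt is None:
            continue
        if "FIN" in pkt["flags"]:
            prev = last_data.get(pkt["key"])
            if prev is not None and prev["len"] <= max_len:
                hits.append((prev["no"], pkt["no"], prev["len"]))
        elif pkt["len"] > 0:
            last_data[pkt["key"]] = pkt
    return hits
```

      Passing the lines of the nonworking trace to `find_abrupt_closes` should flag the 7-byte packet 62 and the FIN in packet 63, while the working excerpt contains no FIN and so yields nothing. It only locates the pattern; it does not decode what the 7 bytes mean.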

      Are there any tools or tips for diagnosing why the nonworking channel's connection is getting terminated?

      Thanks,
      Ted

              rhn-engineering-bban Bela Ban
              ted.carlson@syncsort.com Ted Carlson (Inactive)