Before the ISPN-12598 fix, each client operation could decide to switch to another cluster or back to the initial server list after max-retries + 1 transport errors (e.g. a connection or socket timeout). This meant a client with max-retries == 0 would attempt to switch after every transport error, causing a pseudo-infinite cycle of back-and-forth switching.
After the ISPN-12598 fix, a client operation only tries to switch to another cluster or to the initial server list after it has marked all the servers as failed. Now we have the opposite problem: if a client has max-retries < cluster size, a single operation can never mark all the servers as failed, so it will never switch.
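The arithmetic behind the second problem can be made concrete. This is a hypothetical helper, not Infinispan code; the name `canMarkAllServersFailed` is illustrative:

```java
// Hypothetical helper illustrating the retry arithmetic; not Infinispan code.
public class RetrySwitchMath {
    // An operation contacts one server initially plus one per retry,
    // so it can mark at most maxRetries + 1 distinct servers as failed.
    static boolean canMarkAllServersFailed(int maxRetries, int clusterSize) {
        return maxRetries + 1 >= clusterSize;
    }

    public static void main(String[] args) {
        // With max-retries 2 and a 4-server cluster, an operation reaches
        // at most 3 servers, so the "all servers failed" switch never fires.
        System.out.println(canMarkAllServersFailed(2, 4)); // prints false
    }
}
```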
The solution is to move the tracking of failed servers from the individual operation level (RetryOnFailureOperation) to the remote cache manager level (e.g. into ChannelFactory), and to decide globally when to switch:
- Log an error when the initial connection to a server fails (e.g. it times out because the server requires encryption and the client doesn't have it)
- Define a server connection as failed, and close it, when at least one operation is waiting for a server response on that connection and no response has arrived for more than socketTimeout millis
- When a server drops to 0 connections, start counting connection attempts against max-retries
- When the count of failed connection attempts reaches max-retries, mark the server as failed
- Only attempt to re-connect to a failed server when there's a new topology update that includes it
- Or at least prevent the client from trying to open more than one connection to the same server at a time
- When all servers are marked as failed, try to switch to another cluster or to the initial server list
- Again, prevent any new connection attempts while a switch is in progress
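The rules above could be sketched at the remote cache manager level roughly as follows. This is a minimal sketch under stated assumptions: `ServerFailureTracker`, `onConnectionFailed`, and `onTopologyUpdate` are hypothetical names, not the actual Infinispan API.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cache-manager-level failure tracking (e.g. held by
// ChannelFactory); names and shapes are illustrative only.
public class ServerFailureTracker {
    private final int maxRetries;
    private final Map<String, AtomicInteger> failedAttempts = new ConcurrentHashMap<>();
    private final Set<String> failedServers = ConcurrentHashMap.newKeySet();
    private volatile boolean switchInProgress;

    public ServerFailureTracker(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Called when a connection attempt to a server with 0 live connections
    // fails. Returns true when every known server is now failed, i.e. the
    // caller should switch to another cluster or to the initial server list.
    public synchronized boolean onConnectionFailed(String server, Set<String> allServers) {
        if (switchInProgress) {
            return false; // no new connection attempts while a switch is running
        }
        int attempts = failedAttempts
                .computeIfAbsent(server, s -> new AtomicInteger())
                .incrementAndGet();
        if (attempts >= maxRetries) {
            failedServers.add(server); // count reached max-retries: server failed
        }
        if (!allServers.isEmpty() && failedServers.containsAll(allServers)) {
            switchInProgress = true;
            return true;
        }
        return false;
    }

    // A topology update that includes a failed server makes it eligible
    // for reconnection again, and clears any in-progress switch flag.
    public synchronized void onTopologyUpdate(Set<String> serversInTopology) {
        for (String server : serversInTopology) {
            if (failedServers.remove(server)) {
                failedAttempts.remove(server);
            }
        }
        switchInProgress = false;
    }
}
```

Keeping the counters in one shared object, rather than per operation, is what lets a client with small max-retries still accumulate enough failures across operations to mark every server failed and trigger the switch.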