Type: Enhancement
Resolution: Won't Do
Priority: Major
Fix Version/s: None
Affects Version/s: JDG 7.1.0 GA
Component/s: None
Labels:
- Keycloak

Forum Reference:
https://developer.jboss.org/thread/276274

I have a setup of two JDG 7.1 servers configured for cross-site replication. They are connected through the RELAY2 protocol, and their caches use the SYNC backup strategy. The setup closely follows the one in the documentation: https://access.redhat.com/documentation/en-us/red_hat_jboss_data_grid/7.1/html/administration_and_configuration_guide/set_up_cross_datacenter_replication#configure_cross_datacenter_replication_remote_client_server_mode

I also have a simple Java application that connects to the Infinispan server over Hot Rod (RemoteCache). I see a deadlock when both sites attempt to write a record to the same key "123" concurrently. The server.log of both servers contains exceptions like this one:

20:30:15,461 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (HotRodServerHandler-8-32) ISPN000136: Error executing command ReplaceCommand, writing keys [[B0x033E03313233]: The local cache sessions failed to backup data to the remote sites:
LON: org.infinispan.util.concurrent.TimeoutException: Timed out after 10 seconds waiting for a response from LON (sync, timeout=10000)

	at org.infinispan.xsite.BackupSenderImpl.processFailedResponses(BackupSenderImpl.java:227)
	at org.infinispan.xsite.BackupSenderImpl.processResponses(BackupSenderImpl.java:132)
	at org.infinispan.xsite.BackupSenderImpl.processResponses(BackupSenderImpl.java:124)
	at org.infinispan.interceptors.xsite.NonTransactionalBackupInterceptor.lambda$handleSingleKeyWriteCommand$0(NonTransactionalBackupInterceptor.java:58)
	at org.infinispan.interceptors.xsite.NonTransactionalBackupInterceptor$$Lambda$303/1579852903.accept(Unknown Source)
	at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenAccept(BaseAsyncInterceptor.java:108)

I am also attaching thread dumps from both servers.

If I am reading the thread dumps correctly, this is what happened:

Site1 transaction1: cache.replace("123", val)

Site1 transaction1: lockManager.lock("123", ...) called from AbstractLockingInterceptor. Acquired "site1-lock".

Site1 transaction1: BackupSender.backupWrite called for "123" and sending backup to Site2

Concurrently with it, I have on site2:

Site2 transaction2: cache.replace("123", val);

Site2 transaction2: lockManager.lock("123", ...) called from AbstractLockingInterceptor. Acquired "site2-lock".

Site2 transaction2: BackupSender.backupWrite called for "123" and sending backup to Site1

In the meantime, Site2 receives the backup from Site1 (triggered by Site1 transaction1). The BaseBackupReceiver on Site2 has to wait for the site2-lock, which is held by Site2 transaction2, so it cannot continue. Site1 transaction1 is waiting for the response from that BaseBackupReceiver, so it cannot continue either.

At the same time, Site1 receives the backup from Site2 (triggered by Site2 transaction2). The BaseBackupReceiver on Site1 has to wait for the site1-lock, which is held by Site1 transaction1, so it cannot continue. Site2 transaction2 is waiting for the response from that BaseBackupReceiver, so it cannot continue either.

So we have a deadlock here, which is "unblocked" only after 10 seconds by the BackupSender timeout.
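The interleaving described above can be reproduced outside of JDG with plain Java locks. The following is only a minimal sketch under stated assumptions, not Infinispan code: the `Site` class, `receiveBackup`, `replace` and `simulate` are hypothetical stand-ins for the per-key lock, BaseBackupReceiver, the replace-with-SYNC-backup path and the two concurrent clients. The two latches are also an assumption of the sketch: they force the problematic interleaving and keep the outcome deterministic, and the timeout is shortened from 10 seconds to 200 ms.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class XSiteDeadlockSketch {

    /** Hypothetical stand-in for one site: a single lock guarding key "123". */
    static final class Site {
        final String name;
        final ReentrantLock keyLock = new ReentrantLock();

        Site(String name) {
            this.name = name;
        }

        /** Stand-in for the backup receiver: it needs the local key lock to apply the write. */
        boolean receiveBackup(long timeoutMs) throws InterruptedException {
            if (!keyLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // the local writer still holds the lock: the sender times out
            }
            try {
                return true; // backup applied and acknowledged
            } finally {
                keyLock.unlock();
            }
        }
    }

    /** Stand-in for cache.replace("123", val) with a SYNC backup to the other site. */
    static boolean replace(Site local, Site remote, CountDownLatch bothLocked,
                           CountDownLatch bothDone, long timeoutMs) throws InterruptedException {
        local.keyLock.lock(); // local lock acquired first
        try {
            bothLocked.countDown();
            bothLocked.await(); // wait until the other site also holds its own lock
            boolean acked = remote.receiveBackup(timeoutMs); // synchronous backup under the lock
            bothDone.countDown();
            bothDone.await(); // sketch only: release nothing until both attempts have finished
            return acked;
        } finally {
            local.keyLock.unlock();
        }
    }

    /** Runs one concurrent write per site; returns {site1 acknowledged, site2 acknowledged}. */
    static boolean[] simulate(long timeoutMs) throws Exception {
        Site site1 = new Site("site1");
        Site site2 = new Site("site2");
        CountDownLatch bothLocked = new CountDownLatch(2);
        CountDownLatch bothDone = new CountDownLatch(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Boolean> tx1 = pool.submit(() -> replace(site1, site2, bothLocked, bothDone, timeoutMs));
            Future<Boolean> tx2 = pool.submit(() -> replace(site2, site1, bothLocked, bothDone, timeoutMs));
            return new boolean[] { tx1.get(), tx2.get() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] acked = simulate(200);
        System.out.println("site1 write backed up to site2: " + acked[0]);
        System.out.println("site2 write backed up to site1: " + acked[1]);
    }
}
```

In this sketch both writes report `false`: each backup attempt waits for a lock that the other writer releases only after its own backup attempt has returned, so the only way out is the timeout. That mirrors the 10 second BackupSender timeout seen in the server.log above.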

Attachments:
- jdg1-thread_dump.txt (10 kB, 2017/10/18 3:17 PM, Marek Posolda)
- jdg2-thread_dump.txt (10 kB, 2017/10/18 3:17 PM, Marek Posolda)

Assignee: Pedro Ruivo
Reporter: Marek Posolda
Votes: 0
Watchers: 10
Created: 2017/10/18 3:17 PM
Updated: 2023/02/16 9:36 AM
Resolved: 2023/02/16 9:36 AM
