Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: 5.2.5.Final
Component/s: Core, Server
Labels: None

When the cluster is under heavy write load from a Hot Rod client with 64+ threads and a new node is started, the new node sometimes fails to start: it eventually reports state transfer timeouts and finally terminates. While it waits to time out (~10 minutes), the Hot Rod client is completely blocked.

Setup is as follows:
3 servers, 1 client

dl380x2385, 10.64.106.21, client
dl380x2384, 10.64.106.20, first node
dl380x2383, 10.64.106.19, second node
dl380x2382, 10.64.106.18, third node

Two caches, initial state transfer off, transactions on; the config is attached.
A small app that triggers the problem is also attached.

Steps to reproduce:
1. Start first node
2. Start client, wait for counter to reach 50000 (in client)
3. Start second node. 10% chance it'll fail.
4. Wait for counter to reach 100000 in client.
5. Start third node, 50% chance it'll fail.
If it doesn't fail, terminate everything and start over.

I realize this may be hard to reproduce, so if any more logs or tests are needed, let me know.

I've been unable to reproduce it on a single physical machine, and it only occurs with more than 64 client threads. Changing the ratio of writes between the two caches also seems to make it go away. I couldn't reproduce it with the TRACE log level enabled (too slow), but if you can tell me which packages you want traced, that might work.
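The attached client app remains the actual reproducer. As a rough, hypothetical sketch of the load pattern described in the steps (many client threads writing to two caches, a shared counter that decides when to start the next node, and a tunable write ratio between the caches), the `WriteLoad` class uses two `ConcurrentHashMap`s as stand-ins for Hot Rod `RemoteCache` instances. The class name, key format, default thread count, and parameters are assumptions, not taken from the attached app.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the client-side load pattern; not the attached app.
// The two maps stand in for two Hot Rod RemoteCache instances (assumption).
public class WriteLoad {
    static final Map<String, String> CACHE_A = new ConcurrentHashMap<>();
    static final Map<String, String> CACHE_B = new ConcurrentHashMap<>();
    // Counter values at which the reproduction steps start the next node.
    static final long[] MILESTONES = {50_000L, 100_000L};

    /**
     * Runs `threads` writers until `totalWrites` writes have completed.
     * `ratioA` is the fraction of writes sent to cache A (the rest go to B).
     * Returns {writes to A, writes to B}.
     */
    static long[] run(int threads, long totalWrites, double ratioA) {
        AtomicLong nextKey = new AtomicLong();   // hands out unique key numbers
        AtomicLong completed = new AtomicLong(); // the "counter" the steps watch
        AtomicLong writesA = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                long n;
                while ((n = nextKey.incrementAndGet()) <= totalWrites) {
                    String key = "key-" + n;
                    if (ThreadLocalRandom.current().nextDouble() < ratioA) {
                        CACHE_A.put(key, "value");
                        writesA.incrementAndGet();
                    } else {
                        CACHE_B.put(key, "value");
                    }
                    long done = completed.incrementAndGet();
                    for (long m : MILESTONES) {
                        if (done == m) {
                            System.out.println("counter reached " + m + ", start the next node");
                        }
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        long a = writesA.get();
        return new long[] {a, completed.get() - a};
    }

    public static void main(String[] args) {
        // The report says the failure only shows up with more than 64 client threads.
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 128;
        long[] split = run(threads, 150_000L, 0.5);
        System.out.println("writes to A: " + split[0] + ", writes to B: " + split[1]);
    }
}
```

Swapping the maps for real remote caches and varying `threads` and `ratioA` independently would let the thread-count and write-ratio observations be checked one at a time.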

Turning transactions off makes it worse: a 90% chance of failure on the second node. Oddly enough, disabling the concurrent GC lowers the failure rate on the third node to 10%. My guess is a race condition somewhere, possibly similar to ~~ISPN-2982~~.

Attachments:
test-jgroups.xml, 3 kB, 2013/04/02 4:23 AM
test-infinispan.xml, 5 kB, 2013/04/02 4:23 AM
test.infinispan.zip, 8 kB, 2013/04/02 4:23 AM
logs.zip, 310 kB, 2013/04/02 4:23 AM

Is incorporated by: ISPN-2849 Don't keep threads blocked when waiting for locks to be released (Closed)

Assignee: Tristan Tarrant
Reporter: Marc Bridner (Inactive)
Archiver: Amol Dongare
Created: 2013/04/02 4:21 AM
Updated: 2014/01/24 8:08 AM
Resolved: 2013/04/08 8:58 AM
Archived: 2024/11/28 6:21 AM