This issue belongs to an archived project.

Type: Bug
Resolution: Done
Priority: Blocker
Fix Version/s: 13.0.0.Final
Affects Version/s: 9.4.14.Final
Component/s: None
Labels:
None

Steps to Reproduce:
1. Create a cluster of 3 nodes (the number is just an example).
2. Kill one node ungracefully (without a clean shutdown).
Forum Reference:
https://groups.google.com/u/1/g/wildfly/c/Ot8X-vzm7h0/m/ds_OztdfAQAJ
Release Note Text:
Undefined

We have 3 nodes in the cluster: app1, app2 and app3. Node app1 was shut down ungracefully because of a hardware issue. After that, app2 and app3 started to fail with errors like:

{noformat}
ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p23-t1) ISPN000136: Error executing command RemoveCommand on Cache 'fs.war', writing keys [SessionCreationMetaDataKey(PGARVVdjGKfifzrVfyd7HAllbrwaRG7wLhKha1On)]: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 422657 from app1
    at org.infinispan@9.4.14.Final//org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
    at org.infinispan@9.4.14.Final//org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
    at org.infinispan@9.4.14.Final//org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}
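The log shows a multi-target request that kept waiting for app1 until it timed out. The minimal Java sketch below illustrates that failure mode under a simplifying assumption: a synchronous write must be acknowledged by every target in the current view, and app1, although crashed, is still in that view. The class and method names (SyncReplicationSketch, send, replicate) are hypothetical and are not Infinispan APIs. The sketch only models why one silent member makes every such request fail with a timeout even when the remaining members answer.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch (not Infinispan code): a synchronous write that needs an
// acknowledgement from every target. A crashed target never replies, so the
// combined request can only end by timing out.
class SyncReplicationSketch {

    // Simulates sending a command to one member; a dead member never completes the reply.
    static CompletableFuture<String> send(String member, boolean alive) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        if (alive) {
            reply.complete("ack from " + member);
        }
        return reply; // left incomplete for a crashed member
    }

    // Returns true if all targets acknowledged within timeoutMs, false on timeout.
    static boolean replicate(List<String> targets, String deadMember, long timeoutMs) {
        CompletableFuture<?>[] replies = targets.stream()
                .map(m -> send(m, !m.equals(deadMember)))
                .toArray(CompletableFuture[]::new);
        try {
            // Wait for every reply, but give up after timeoutMs (Java 9+ orTimeout).
            CompletableFuture.allOf(replies).orTimeout(timeoutMs, TimeUnit.MILLISECONDS).join();
            return true;
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return false; // analogous to ISPN000476 for the unresponsive target
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        List<String> view = List.of("app1", "app2", "app3"); // app1 still in the view
        System.out.println("all alive: " + replicate(view, null, 200));
        System.out.println("app1 dead: " + replicate(view, "app1", 200));
    }
}
```

In this model, the requests keep failing for as long as app1 stays in the list of targets, which matches the symptom that app2 and app3 only recovered once app1 came back.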

As a result, app2 and app3 could not serve user requests until app1 recovered. Is this expected behavior? Shouldn't Infinispan detect that one of the nodes is down, remove it from the cluster, and notify app2 and app3? I know there is a JGroups protocol called VERIFY_SUSPECT, but that detection did not happen here.
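The behavior the report expects (detect the dead node, remove it, and tell the survivors) can be sketched as heartbeat-based failure detection followed by a verification window before removal. This is a hedged illustration loosely inspired by the idea behind JGroups failure detection plus VERIFY_SUSPECT; the class FailureDetectorSketch, its parameters and the timeout values are hypothetical, and the real JGroups protocols differ in detail. Time is passed in explicitly so the sketch is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: suspect a member after heartbeatTimeoutMs of silence,
// then remove it from the view if it stays silent for verifyTimeoutMs more.
class FailureDetectorSketch {
    private final long heartbeatTimeoutMs; // silence longer than this -> suspect
    private final long verifyTimeoutMs;    // suspect silent this long -> remove
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Long> suspectedSince = new HashMap<>();
    private final Set<String> view = new LinkedHashSet<>();

    FailureDetectorSketch(List<String> members, long now, long heartbeatTimeoutMs, long verifyTimeoutMs) {
        this.heartbeatTimeoutMs = heartbeatTimeoutMs;
        this.verifyTimeoutMs = verifyTimeoutMs;
        for (String m : members) {
            view.add(m);
            lastHeartbeat.put(m, now);
        }
    }

    // A heartbeat proves the member is alive and clears any pending suspicion.
    void onHeartbeat(String member, long now) {
        lastHeartbeat.put(member, now);
        suspectedSince.remove(member);
    }

    // Periodic check; returns the members removed from the view at time `now`.
    List<String> check(long now) {
        List<String> removed = new ArrayList<>();
        for (String m : new ArrayList<>(view)) {
            if (now - lastHeartbeat.get(m) < heartbeatTimeoutMs) {
                continue; // heard from recently enough
            }
            Long since = suspectedSince.putIfAbsent(m, now); // start suspicion once
            if (since != null && now - since >= verifyTimeoutMs) {
                view.remove(m); // survivors would now stop sending requests to m
                suspectedSince.remove(m);
                removed.add(m);
            }
        }
        return removed;
    }

    Set<String> view() {
        return view;
    }

    public static void main(String[] args) {
        FailureDetectorSketch fd = new FailureDetectorSketch(List.of("app1", "app2", "app3"), 0, 3000, 1500);
        for (long t = 1000; t <= 6000; t += 1000) {
            fd.onHeartbeat("app2", t); // app1 crashed at t=0 and never sends again
            fd.onHeartbeat("app3", t);
            List<String> removed = fd.check(t);
            if (!removed.isEmpty()) {
                System.out.println("t=" + t + " removed " + removed + ", new view " + fd.view());
            }
        }
    }
}
```

With these example values, app1 becomes a suspect at t=3000 and is removed at t=5000, after which the view contains only app2 and app3. The two-stage design trades a slower reaction for fewer false removals caused by short pauses such as GC stalls.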

Assignee: Pedro Ruivo

Reporter: Dmitry Kruglikov (Inactive)

Archiver: Amol Dongare

Created: 2021/01/25 5:43 PM

Updated: 2022/07/13 6:44 PM

Resolved: 2021/11/30 12:58 PM

Archived: 2024/11/28 6:21 AM
