-
Bug
-
Resolution: Unresolved
-
Critical
-
None
-
8.0.0.GA-CR3
-
False
-
None
-
False
-
Known Issue
We noticed a new condition through an EAP 8 + RHDG interoperability test (EJB distributed timers) on OpenShift.
The issue now reproduces consistently on AWS ARM-based clusters, in the following scenario:
1. A 2-member (A, B) EAP 8 cluster has EJB timers configured to be persisted by Infinispan (provided by the EAP cluster itself, not a remote one)
2. A timer is created, and B starts executing it
3. During the timer execution, B is non-gracefully terminated (the pod is deleted)
We would expect A to take over the timer execution while OpenShift starts a replacement pod (C) to compensate for the deleted one, but instead the takeover only happens once C is ready. Is this expected? A is a ready survivor, so why does it not take over the timer execution immediately?
As stated, this only happens on a cluster where our EAP application service instances take a long time to process topology updates.
By looking at the logs, we could see that:
- a message tracing the removal (i.e. {{ member has left the cluster}}) is logged only ~40 seconds after the pod has been deleted
- the newly started pod boots up and immediately begins to output traces like:
09:51:09,653 TRACE [org.jgroups.protocols.TCP] (TQ-Bundler-7,ee,eap-distributed-ejb-timers-app-1-kvvmw) 10.131.0.69:7600: failed connecting to 10.131.0.68:7600: java.net.SocketTimeoutException: Connect timed out
09:51:09,653 TRACE [org.jgroups.protocols.TCP] (TQ-Bundler-7,ee,eap-distributed-ejb-timers-app-1-kvvmw) 10.131.0.69:7600: removed connection to 10.131.0.68:7600
09:51:09,653 TRACE [org.jgroups.protocols.TCP] (TQ-Bundler-7,ee,eap-distributed-ejb-timers-app-1-kvvmw) JGRP000036: eap-distributed-ejb-timers-app-1-kvvmw: exception sending bundled msgs: java.net.SocketTimeoutException: Connect timed out
09:51:09,653 TRACE [org.jgroups.protocols.TCP] (TQ-Bundler-7,ee,eap-distributed-ejb-timers-app-1-kvvmw) 10.131.0.69:7600: connecting to 10.131.0.68:7600
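To narrow down how long the new pod keeps retrying the deleted member's address, the "failed connecting to" traces above can be pulled out of the server log with a small scanner. The following is only a sketch: `ConnectFailureScanner` and its `Failure` type are hypothetical helper names (not part of EAP or JGroups), and the regular expression assumes the exact TRACE line layout shown in the excerpt above.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts failed JGroups TCP connection attempts from log text. */
final class ConnectFailureScanner {

    /** One "failed connecting to" event: when it was logged, by which address, towards which peer. */
    static final class Failure {
        final LocalTime time;
        final String local;
        final String peer;

        Failure(LocalTime time, String local, String peer) {
            this.time = time;
            this.local = local;
            this.peer = peer;
        }
    }

    // Timestamp layout used in the log excerpt, e.g. 09:51:09,653
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Assumed layout: <time> TRACE [org.jgroups.protocols.TCP] (<thread>) <local>: failed connecting to <peer>
    private static final Pattern FAILED = Pattern.compile(
            "(\\d{2}:\\d{2}:\\d{2},\\d{3})\\s+TRACE\\s+\\[org\\.jgroups\\.protocols\\.TCP\\]\\s+"
            + "\\([^)]*\\)\\s+([0-9.]+:\\d+): failed connecting to ([0-9.]+:\\d+)");

    /** Returns every failed connection attempt found in the given log text, in order of appearance. */
    static List<Failure> scan(String logText) {
        List<Failure> failures = new ArrayList<>();
        // find() is used instead of matches() so that leading debris (e.g. ANSI colour codes)
        // and several entries on one physical line are both tolerated.
        Matcher m = FAILED.matcher(logText);
        while (m.find()) {
            failures.add(new Failure(LocalTime.parse(m.group(1), TIME), m.group(2), m.group(3)));
        }
        return failures;
    }
}
```

Comparing the first and last `Failure.time` for the deleted pod's address against the time of the {{ member has left the cluster}} message should show whether the retries stop once the view is updated.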
This affects EAP 8 CR3 and is filed as Critical, targeting EAP 8.0.z GA, since it does not block the timers feature per se or violate the Jakarta EJB timers specification.
- clones: JBEAP-25790 (8.0.z) HotRod calls to remote caches use outdated topology information (Closed)
- is triggered by: JBEAP-25247 Test EAP 8 GA images with GA CR bits for ARM (Closed)