
Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: OP-2.3.10.GA
Component/s: OpenShift, Operator
Labels: None

Blocked: False
Blocked Reason: None
Ready: False
Target Release: OP-3.0.0.GA

I am trying to run the application at https://github.com/kabir/eap-operator-tx-recovery-demo/tree/JBEAP-24814. I've also attached an archive of it.

I deploy everything as described in the application README, scale the application to three pods, and then make sure that the second and third pods (eap7-app-1 and eap7-app-2) each hold an open transaction (steps below).

After almost an hour (5 hours when I tried yesterday) I still have three pods, with eap7-app-1 and eap7-app-2 in the SCALING_DOWN_RECOVERY_INVESTIGATION state:

% oc get wfly eap7-app --template={{.status}} -w
map[hosts:[eap7-app-route-myproject.apps.cluster-l4fgj.l4fgj.sandbox309.opentlc.com] pods:[map[name:eap7-app-0 podIP:10.129.2.23 state:ACTIVE] map[name:eap7-app-1 podIP:10.128.2.13 state:ACTIVE] map[name:eap7-app-2 podIP:10.131.0.35 state:ACTIVE]] replicas:3 scalingdownPods:0 selector:app.kubernetes.io/name=eap7-app]
map[hosts:[eap7-app-route-myproject.apps.cluster-l4fgj.l4fgj.sandbox309.opentlc.com] pods:[map[name:eap7-app-0 podIP:10.129.2.23 state:ACTIVE] map[name:eap7-app-1 podIP: state:ACTIVE] map[name:eap7-app-2 podIP:10.131.0.35 state:ACTIVE]] replicas:3 scalingdownPods:0 selector:app.kubernetes.io/name=eap7-app]
map[hosts:[eap7-app-route-myproject.apps.cluster-l4fgj.l4fgj.sandbox309.opentlc.com] pods:[map[name:eap7-app-0 podIP:10.129.2.23 state:ACTIVE] map[name:eap7-app-1 podIP:10.128.2.14 state:ACTIVE] map[name:eap7-app-2 podIP:10.131.0.35 state:ACTIVE]] replicas:3 scalingdownPods:0 selector:app.kubernetes.io/name=eap7-app]


--- SCALING TO 1 ----

map[hosts:[eap7-app-route-myproject.apps.cluster-l4fgj.l4fgj.sandbox309.opentlc.com] pods:[map[name:eap7-app-0 podIP:10.129.2.23 state:ACTIVE] map[name:eap7-app-1 podIP:10.128.2.14 state:ACTIVE] map[name:eap7-app-2 podIP:10.131.0.35 state:ACTIVE]] replicas:3 scalingdownPods:0 selector:app.kubernetes.io/name=eap7-app]
map[hosts:[eap7-app-route-myproject.apps.cluster-l4fgj.l4fgj.sandbox309.opentlc.com] pods:[map[name:eap7-app-0 podIP:10.129.2.23 state:ACTIVE] map[name:eap7-app-1 podIP:10.128.2.14 state:SCALING_DOWN_RECOVERY_INVESTIGATION] map[name:eap7-app-2 podIP:10.131.0.35 state:SCALING_DOWN_RECOVERY_INVESTIGATION]] replicas:3 scalingdownPods:2 selector:app.kubernetes.io/name=eap7-app]


map[hosts:[eap7-app-route-myproject.apps.cluster-l4fgj.l4fgj.sandbox309.opentlc.com] pods:[map[name:eap7-app-0 podIP:10.129.2.23 state:ACTIVE] map[name:eap7-app-1 podIP:10.128.2.40 state:SCALING_DOWN_RECOVERY_INVESTIGATION] map[name:eap7-app-2 podIP:10.128.2.41 state:SCALING_DOWN_RECOVERY_INVESTIGATION]] replicas:3 scalingdownPods:2 selector:app.kubernetes.io/name=eap7-app]

As I understand it, this should take only a minute.
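The raw status dumps above are hard to scan by eye. Below is a minimal sketch of a filter that reduces one dump to a pod name and state per line. The function name `pod_states` is hypothetical (it is not part of the demo or the operator), and it assumes the `map[name:... podIP:... state:...]` layout seen in the output above.

```shell
# Hypothetical helper: read a status dump on stdin and print "<pod> <state>" per pod.
# Assumes each pod entry looks like: map[name:<pod> podIP:<ip-or-empty> state:<STATE>]
pod_states() {
  # Pull out each "name:... podIP:... state:..." triple (podIP may be empty),
  # then keep only the pod name and the state.
  grep -o 'name:[^] ]* podIP:[^] ]* state:[A-Z_]*' |
    sed -E 's/^name:([^ ]*) podIP:[^ ]* state:(.*)$/\1 \2/'
}
```

It could be used as `oc get wfly eap7-app --template={{.status}} | pod_states`, without `-w`, since the watch output is not newline-terminated between updates.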

The commands to get to this stage are:

[~/sourcecontrol/eap-operator-tx-recovery-demo] 
% ./demo.sh add one
--- SNIP ---
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 202 Accepted
--- SNIP ---
eap7-app-1%

The above was added on pod 1. Now we try another add:

[~/sourcecontrol/eap-operator-tx-recovery-demo] 
% ./demo.sh add two
--- SNIP ---
< HTTP/1.1 202 Accepted
--- SNIP ---
eap7-app-2%

The above was added on pod 2. Now we try another add:

% ./demo.sh add three
---- SNIP ----
* < HTTP/1.1 409 Conflict
---- SNIP ----

The above hit one of the pods that already holds a transaction (1 or 2), hence the 409 Conflict. So we try again:

% ./demo.sh add three
--- SNIP ---
< HTTP/1.1 202 Accepted
--- SNIP ---
eap7-app-0%

This worked and was added on pod 0. Now we release pod 0's transaction (the transactions on pods 1 and 2 are still hanging):

% ./demo.sh release 0

Now I try to scale the pods to 1 (from 3).
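The exact scale-down command is not shown in this report. One way to request it is sketched below, assuming the WildFlyServer custom resource exposes `spec.replicas` and that `wfly` is its short name (as used in the `oc get` call earlier). The function names `replicas_patch` and `scale_wfly` are hypothetical.

```shell
# Hypothetical helper: build a JSON merge patch that sets spec.replicas.
replicas_patch() {
  printf '{"spec":{"replicas":%d}}' "$1"
}

# Hypothetical helper: apply the patch to a WildFlyServer resource.
# $1 = resource name, $2 = desired replica count
# Assumption: the resource is reachable under the "wfly" short name.
scale_wfly() {
  oc patch wfly "$1" --type=merge -p "$(replicas_patch "$2")"
}
```

For the scenario here this would be invoked as `scale_wfly eap7-app 1`.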

Looking at the logs for pod 1, it appears that the pod is terminated before the EAP instance has a chance to come up. The attached log below contains a few attempts at running oc logs -f eap7-app-1 (it seems to disconnect when the pod is terminated). Look for

ERROR *** WildFly wrapper process (1) received TERM signal ***

to see where OpenShift stops the pod.
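A small sketch for locating those termination points in a saved log. The function name `find_term_signal` is hypothetical, and the file name passed to it is whatever the log was saved as locally.

```shell
# Hypothetical helper: print each line (with its line number) where the
# WildFly wrapper process reports receiving a TERM signal.
# $1 = path to a saved pod log
find_term_signal() {
  grep -n 'received TERM signal' "$1"
}
```

For the attachment on this issue this would be `find_term_signal "pod 1.log"`; each hit marks one point where the pod was stopped.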


Attachments:
application-source.zip (405 kB, 2023/04/25 4:23 PM)
pod 1.log (84 kB, 2023/04/25 4:22 PM)

Is related to: JBEAP-24448 Operator TX recovery facility does not work with KitchenSink quickstart (Closed)
