
Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: EAP72 1.0.BETA, EAPCD 13.0.GA
Component/s: EAP7, EAP_CD
Labels: None

CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Target Release: EAPCD 14.0.GA

Sprint: Cloud Sprint 32, Cloud Sprint 33, Cloud Sprint 34, Cloud Sprint 35, Cloud Sprint 36, Cloud Sprint 37

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

We are seeing problems when scaling an EAP cluster under load down from 3 to 2 pods with SYM_ENCRYPT enabled. We see this when testing an EAP on OpenShift 7.2 Beta test image, and when testing test images that incorporate a build of the current WildFly master. We believe these problems probably go back to CD 12; see, for example, the discussion in CLOUD-2417.

A characteristic of the issue is lots of messages like these in the server logs:

16:02:04,996 ERROR [org.jgroups.protocols.SYM_ENCRYPT] (thread-9,ee,hsc-1-z9f7g) hsc-1-z9f7g: received message without encrypt header from hsc-1-8dfqx; dropping it

The condition results in failed requests, so it's not just log noise.
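
To gauge how widespread the condition is in a given log, something like the following may help. It is only a sketch: the function name count_dropped_by_sender is made up for illustration, and it assumes the log lines have the same shape as the sample above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any image or script referenced here).
# Reads server log text on stdin and prints "<count> <sender>" for every
# pod whose messages were dropped by SYM_ENCRYPT, busiest sender first.
count_dropped_by_sender() {
    grep 'received message without encrypt header from' \
        | sed -e 's/.*without encrypt header from \([^;]*\);.*/\1/' \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'
}
```

For example, `oc logs hsc-1-z9f7g | count_dropped_by_sender` would summarize the drops seen by that pod, and the attached log files can be piped in the same way.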

Following are instructions from kwills@redhat.com on how to reproduce this:

"I've pushed a WFLY image to the internal registry if anyone wants to give it a go:

docker pull docker-registry.engineering.redhat.com/kwills/eap-cd-openshift:WFLY

To reproduce the problem:

docker pull docker-registry.engineering.redhat.com/kwills/eap-cd-openshift:WFLY
docker tag docker-registry.engineering.redhat.com/kwills/eap-cd-openshift:WFLY jboss-eap-7-tech-preview/eap-cd-openshift:13.0
oc cluster up
run setup-ocp.sh from https://github.com/luck3y/openshift-util-scripts (clone the repo, then run it from inside the repo dir, this will set up your local env)
run:
oc -n myproject new-app eap-cd-https-s2i \
-p APPLICATION_NAME=eap-clustering-test-1 \
-p JGROUPS_ENCRYPT_SECRET=eap7-app-secret \
-p JGROUPS_ENCRYPT_NAME="secret-key" \
-p JGROUPS_ENCRYPT_PASSWORD="password"

This will build and deploy an image using kitchensink, I had some issues getting the S2I builds working from CEE, so I just built the artifact and deployed it manually (attached)

Scale up application to 3
$ oc scale --replicas=3 dc/eap-clustering-test-1
Deploy ROOT.war (it's in a directory called deployments locally)
for i in `oc get pods | grep -v build | grep -v NAME | awk '{print $1}'`
do
echo $i
oc rsync ./deployments/ $i:/deployments/
done

Wait for deployment to complete, then start making requests:
for i in `seq 9999`
do
curl -c cookies -b cookies "http://eap-clustering-test-1-myproject.127.0.0.1.nip.io/Counter?requestId=$i";
done

While requests are executing, scale down to 2:
$ oc scale --replicas=2 dc/eap-clustering-test-1
(Just remember if you use this method, you'll need to redeploy with rsync if you bring up new pods.)

One pod will terminate, and exceptions referencing the terminated pod will begin appearing in the others. Requests are either blocked or return a 503 until the application is scaled all the way down, then back up again."
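
When reproducing, it can be easier to spot the failure window by recording only the HTTP status of each request and tallying the results, rather than reading the raw curl output. The following is a rough sketch along those lines, not part of Ken's instructions: the function names, the BASE_URL override, and the 5-second timeout are assumptions chosen for illustration.

```shell
#!/bin/sh
# Tally HTTP status codes read from stdin (one per line).
# Prints "<count> <status>" lines, ordered by status code.
tally_statuses() {
    sort | uniq -c | awk '{print $1, $2}'
}

# Issue the same Counter requests as in the reproduction steps, but print
# only the HTTP status of each response. curl reports 000 when no response
# arrives (for example, when the 5-second timeout assumed here is hit).
# BASE_URL is a hypothetical override; the default is the route used above.
probe_requests() {
    count=${1:-9999}
    base=${BASE_URL:-http://eap-clustering-test-1-myproject.127.0.0.1.nip.io}
    for i in $(seq "$count"); do
        curl -s -o /dev/null -m 5 -w '%{http_code}\n' \
            -c cookies -b cookies "$base/Counter?requestId=$i"
    done
}
```

Running `probe_requests 500 | tally_statuses` while scaling down should then show how many requests returned 503 or timed out (000) compared with 200.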

The image Ken refers to above is a test image that packages current WildFly master. To try a test image containing EAP 7.2 Beta, use docker-registry.engineering.redhat.com/bstansbe/eap72-beta-openshift:CLOUD-2694.

I'll attach the deployment Ken referred to. I'll also attach test output (log files, etc.) from tests run against the 7.2 Beta test image and against the WF master test image, the latter with and without SYM_ENCRYPT.


Attachments:
CLOUD-2694-7.2.beta-scale-down-3-to-2.tar.gz (1.27 MB, 2018/08/09 4:19 PM)
eap-clustering-2-1-97v2q.log (66 kB, 2018/09/07 3:29 PM)
eap-clustering-2-1-rrpnk.log (122 kB, 2018/09/07 3:29 PM)
jgroups.jceks (0.5 kB, 2018/09/07 3:29 PM)
ROOT.war (7 kB, 2018/08/09 4:19 PM)
standalone-openshift-97v2q.xml (37 kB, 2018/09/07 3:29 PM)
standalone-openshift-rrpnk.xml (37 kB, 2018/09/07 3:29 PM)
wf-image-logs.zip (1.52 MB, 2018/08/09 4:19 PM)
wf-image-no-jgroups-encrypt.zip (497 kB, 2018/08/09 4:19 PM)

Issue Links:

blocks
CLOUD-2417 [EAP CD] Clustering with openshift.KUBE_PING doesn't work correctly (Closed)

is caused by
WFLY-10464 ISPN000482: Cannot create remote transaction X, already completed in ASYM_ENCRYPT scenario (following "received message without encrypt header from perf21; dropping it") (Closed)

relates to
CLOUD-2694 Images and templates for EAP on OpenShift 7.2 Beta (Closed)
JGRP-2297 Coordinator with ASYM_ENCRYPT in the stack does not leave gracefully (Resolved)
JGRP-2293 Graceful concurrent leaving of coordinator(s) leaves the cluster with stale views (Resolved)
CLOUD-2417 [EAP CD] Clustering with openshift.KUBE_PING doesn't work correctly (Closed)
