- Bug
- Resolution: Unresolved
- Major
- None
- EAP72 1.0.BETA, EAPCD 13.0.GA
- None
- Cloud Sprint 32, Cloud Sprint 33, Cloud Sprint 34, Cloud Sprint 35, Cloud Sprint 36, Cloud Sprint 37
We are seeing problems scaling an EAP cluster down from 3 to 2 pods under load when SYM_ENCRYPT is enabled. We see this when testing an EAP on OS 7.2 Beta test image and when testing test images that incorporate a build of the current WildFly master. We believe these problems probably go back to CD 12; see, for example, the discussion of CLOUD-2417.
A characteristic of the issue is lots of messages like these in the server logs:
16:02:04,996 ERROR [org.jgroups.protocols.SYM_ENCRYPT] (thread-9,ee,hsc-1-z9f7g) hsc-1-z9f7g: received message without encrypt header from hsc-1-8dfqx; dropping it
The condition results in failed requests, so it's not just log noise.
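To gauge how widespread the condition is on a given pod, the log message above can be counted and grouped by sender. The following is a minimal sketch; the function names `count_dropped` and `dropped_by_sender` are hypothetical helpers, not part of any EAP or OpenShift tooling, and the match string is taken from the log line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count SYM_ENCRYPT "dropped message" errors.
# Reads the files given as arguments, or stdin if none are given.
count_dropped() {
    grep -c 'received message without encrypt header' "$@"
}

# Hypothetical helper: print "sender count" for each sender whose
# messages were dropped, sorted by sender name.
dropped_by_sender() {
    grep 'received message without encrypt header' "$@" |
        sed 's/.*encrypt header from \([^;]*\);.*/\1/' |
        awk '{ n[$1]++ } END { for (s in n) print s, n[s] }' |
        sort
}
```

Assuming the pod logs are available through `oc logs`, something like `oc logs hsc-1-z9f7g | dropped_by_sender` would show which members the pod is rejecting messages from.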
Following are instructions from kwills@redhat.com on how to reproduce this:
"I've pushed a WFLY image to the internal registry if anyone wants to give it a go:
docker pull docker-registry.engineering.redhat.com/kwills/eap-cd-openshift:WFLY
To reproduce the problem:
docker pull docker-registry.engineering.redhat.com/kwills/eap-cd-openshift:WFLY
docker tag docker-registry.engineering.redhat.com/kwills/eap-cd-openshift:WFLY jboss-eap-7-tech-preview/eap-cd-openshift:13.0
oc cluster up
run setup-ocp.sh from https://github.com/luck3y/openshift-util-scripts (clone the repo, then run the script from inside the repo dir; this will set up your local env)
run:
oc -n myproject new-app eap-cd-https-s2i \
-p APPLICATION_NAME=eap-clustering-test-1 \
-p JGROUPS_ENCRYPT_SECRET=eap7-app-secret \
-p JGROUPS_ENCRYPT_NAME="secret-key" \
-p JGROUPS_ENCRYPT_PASSWORD="password"
This will build and deploy an image using kitchensink. I had some issues getting the S2I builds working from CEE, so I just built the artifact and deployed it manually (attached).
Scale up application to 3
$ oc scale --replicas=3 dc/eap-clustering-test-1
Deploy ROOT.war (it's in a directory called deployments locally)
for i in `oc get pods | grep -v build | grep -v NAME | awk '{print $1}'`
do
echo $i
oc rsync ./deployments/ $i:/deployments/
done
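The pod selection in the loop above can also be done in a single filter. This is a sketch only: `app_pods` is a hypothetical name, and it assumes the default `oc get pods` table layout with a header row and the pod name in the first column. Unlike the `grep -v build` in the loop, it checks only the name column for "build".

```shell
#!/bin/sh
# Hypothetical filter: read `oc get pods` output on stdin and print the
# names of non-build pods, skipping the header row.
app_pods() {
    awk 'NR > 1 && $1 !~ /build/ { print $1 }'
}
```

It would be used as `for i in $(oc get pods | app_pods); do oc rsync ./deployments/ "$i":/deployments/; done`.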
Wait for deployment to complete, then start making requests:
for i in `seq 9999`
do
curl -c cookies -b cookies "http://eap-clustering-test-1-myproject.127.0.0.1.nip.io/Counter?requestId=$i";
done
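Because the failure shows up as blocked requests or 503 responses, it can help to record the HTTP status of each request rather than reading the response bodies. The following is a rough sketch of the same request loop with a status tally; `request_statuses` and `tally_statuses` are hypothetical names, and the 10 second timeout is an arbitrary choice so that blocked requests do not stall the loop indefinitely.

```shell
#!/bin/sh
# Print one HTTP status code per request (curl reports 000 when it got
# no HTTP response, e.g. on a timeout).
# $1 = base URL, $2 = number of requests.
request_statuses() {
    i=1
    while [ "$i" -le "$2" ]; do
        curl -s -o /dev/null -w '%{http_code}\n' --max-time 10 \
            -c cookies -b cookies "$1?requestId=$i"
        i=$((i + 1))
    done
}

# Tally status codes read from stdin (one per line) as "code count".
tally_statuses() {
    awk 'NF { n[$1]++ } END { for (c in n) print c, n[c] }' | sort
}
```

With the route from the steps above, this would be run as `request_statuses "http://eap-clustering-test-1-myproject.127.0.0.1.nip.io/Counter" 9999 | tally_statuses`. A healthy run should show only 200s; after the scale-down, 503 or 000 entries would indicate failed or blocked requests.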
While requests are executing, scale down to 2:
$ oc scale --replicas=2 dc/eap-clustering-test-1
(Just remember if you use this method, you'll need to redeploy with rsync if you bring up new pods.)
One pod will terminate, and the other pods will begin logging exceptions that reference the terminated pod. Requests are either blocked or return a 503 until the application is scaled all the way down and then back up again."
The image Ken refers to there is a test image that packages current WildFly master. To try a test image containing EAP 7.2 Beta, use docker-registry.engineering.redhat.com/bstansbe/eap72-beta-openshift:CLOUD-2694.
I'll attach the deployment Ken referred to. I'll also attach test output (e.g. log files etc) of tests run against the 7.2 Beta test image and against the WF master test image, the latter with and without SYM_ENCRYPT.
- blocks
  - CLOUD-2417 [EAP CD] Clustering with openshift.KUBE_PING doesn't work correctly (Closed)
- is caused by
  - WFLY-10464 ISPN000482: Cannot create remote transaction X, already completed in ASYM_ENCRYPT scenario (following "received message without encrypt header from perf21; dropping it") (Closed)
- relates to
  - CLOUD-2694 Images and templates for EAP on OpenShift 7.2 Beta (Closed)
  - JGRP-2297 Coordinator with ASYM_ENCRYPT in the stack does not leave gracefully (Resolved)
  - JGRP-2293 Graceful concurrent leaving of coordinator(s) leaves the cluster with stale views (Resolved)
  - CLOUD-2417 [EAP CD] Clustering with openshift.KUBE_PING doesn't work correctly (Closed)