Status: Closed
Affects Version/s: 5.0.0
Fix Version/s: None
Steps to Reproduce:
Follow the instruction in https://github.com/bsingr/keycloak-docker-jdbcping-cluster-example
Docs QE Status: NEW
(I posted this on keycloak-user; firstname.lastname@example.org asked me to put it here)
We have a Keycloak instance running as a Docker container in our AWS ECS environment.
With a single instance this setup works well, but we have not managed to add a second instance for HA.
Problem: As soon as more than one Keycloak instance runs behind the load balancer, we can no longer authenticate.
- Keycloak v5.0.0 (docker image quay.io/keycloak/keycloak:5.0.0)
- Containers are behind AWS ALB load balancers with round-robin but without sticky sessions (the latter is important for our setup)
- JGroups is configured with JDBC_PING, and instances properly add/remove themselves from the configured MySQL table
- Containers run on separate EC2 hosts; TCP communication between containers is possible (port 7600 is also exposed on the hosts)
- Cache owners for all distributed caches are set to 2 (we also tested with 1, with the same result)
Startup logs from infinispan look fine:
- On startup we see a log message showing that the cluster nodes discover each other
"ISPN000094: Received new cluster view for channel ejb: [ip-10-129-2-31.eu-central-1.compute.internal|1] (2) [ip-10-129-2-31.eu-central-1.compute.internal, ip-10-129-2-54.eu-central-1.compute.internal]"
- After that, Infinispan rebalancing also happens
"[Context=offlineClientSessions] ISPN100010: Finished rebalance with members [ip-10-129-2-31.eu-central-1.compute.internal, ip-10-129-2-54.eu-central-1.compute.internal]"
Analysis (so far):
- The cause appears to be that authentication starts on node 1. Because of round-robin, it then continues on node 2, which fails because node 2 does not know about the authentication session started on node 1.
- According to the documentation, node 2 should look up the started authentication session in the cluster. This does not seem to happen, and we cannot find any log entries related to it.
- Regular sessions are also not distributed in the cache. We tested this by authenticating with only one node running, then starting a second node and failing over to it. Afterwards the regular session was gone (we were logged out).
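
To make the suspected failure mode concrete, here is a minimal toy model in Python. It is only a sketch of the reasoning above and does not use any Keycloak, Infinispan, or JGroups API: the `Node` class, the `login_via_round_robin` helper, and the session id `auth-1` are all hypothetical names chosen for illustration. A private dict stands in for a cache that is effectively node-local, and a shared dict stands in for a cache whose entries are visible across the cluster.

```python
from itertools import cycle


class Node:
    """Toy stand-in for one Keycloak instance (hypothetical, not a Keycloak API)."""

    def __init__(self, name, store=None):
        self.name = name
        # A private dict models a node-local cache; passing a shared dict
        # models a cache whose entries are visible cluster-wide.
        self.auth_sessions = {} if store is None else store

    def start_login(self, session_id):
        # First request of the login flow: remember the authentication session.
        self.auth_sessions[session_id] = "started"

    def continue_login(self, session_id):
        # Second request succeeds only if this node can see the session.
        return session_id in self.auth_sessions


def login_via_round_robin(nodes, session_id="auth-1"):
    """Send the two requests of one login through a round-robin balancer."""
    balancer = cycle(nodes)  # no sticky sessions: each request goes to the next node
    next(balancer).start_login(session_id)
    return next(balancer).continue_login(session_id)


# Node-local caches: request 2 reaches node 2, which has never seen the session.
isolated = [Node("node1"), Node("node2")]
print(login_via_round_robin(isolated))

# Shared view of the cache: request 2 finds the session started on node 1.
shared = {}
clustered = [Node("node1", shared), Node("node2", shared)]
print(login_via_round_robin(clustered))
```

In this model the first call fails and the second succeeds, which matches what we observe (the first case) versus what we expect from a working distributed cache (the second case). It says nothing about why the real caches behave as if they were node-local.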