Details

      Description

      (I posted this on keycloak-user; vramik@redhat.com asked me to put it here)

      We have a Keycloak instance running as a Docker container in our AWS ECS environment.

      With a single instance this setup works great, but we failed to extend it with a second instance for HA.

      Problem: As soon as more than one Keycloak instance runs behind the load balancer, we can no longer authenticate.

      Cluster setup:

      • Keycloak v5.0.0 (docker image quay.io/keycloak/keycloak:5.0.0)
      • Containers are behind an AWS ALB using round-robin routing without sticky sessions (the latter is important for our setup)
      • JGroups is configured with JDBC_PING, and instances properly add/remove themselves from the configured MySQL table
      • Containers run on separate EC2 hosts; TCP communication between containers is possible (port 7600 is also exposed on the hosts)
      • Cache owners for all distributed caches are set to 2 (we also tested with 1, with no difference in results)
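
For anyone reproducing this, the TCP reachability between hosts can be double-checked with a small probe run from one EC2 host against the other. This is only a sketch: the helper name `can_connect` and the timeout are assumptions made for illustration; port 7600 is the one mentioned in the setup above.

```python
import socket


def can_connect(host: str, port: int = 7600, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name resolution failures
        return False
```

Calling `can_connect("ip-10-129-2-54.eu-central-1.compute.internal")` from the other host should return `True` if the JGroups port is reachable. A successful connection only shows that the port is open; it does not prove that the caches are replicating.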

      Startup logs from Infinispan look fine:

      • On startup we see a log message showing that the cluster nodes discover each other
        "ISPN000094: Received new cluster view for channel ejb: [ip-10-129-2-31.eu-central-1.compute.internal|1] (2) [ip-10-129-2-31.eu-central-1.compute.internal, ip-10-129-2-54.eu-central-1.compute.internal]"
      • After that, Infinispan rebalancing also happens
        "[Context=offlineClientSessions] ISPN100010: Finished rebalance with members [ip-10-129-2-31.eu-central-1.compute.internal, ip-10-129-2-54.eu-central-1.compute.internal]"

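To compare cluster views across both nodes without reading the logs by eye, the ISPN000094 line quoted above can be parsed with a short script. This is a sketch: the regular expression is derived only from the single log line shown here, and the names `VIEW_RE` and `parse_cluster_view` are hypothetical.

```python
import re

# Shape of the message quoted above:
# "... for channel <name>: [<coordinator>|<view id>] (<size>) [<member>, <member>, ...]"
VIEW_RE = re.compile(
    r"ISPN000094: Received new cluster view for channel (?P<channel>\S+): "
    r"\[(?P<coordinator>[^|\]]+)\|(?P<view_id>\d+)\] "
    r"\((?P<size>\d+)\) "
    r"\[(?P<members>[^\]]*)\]"
)


def parse_cluster_view(line: str):
    """Return the cluster view found in a log line, or None if there is none."""
    match = VIEW_RE.search(line)
    if match is None:
        return None
    members = [m.strip() for m in match.group("members").split(",") if m.strip()]
    return {
        "channel": match.group("channel"),
        "coordinator": match.group("coordinator"),
        "view_id": int(match.group("view_id")),
        "size": int(match.group("size")),
        "members": members,
    }
```

If both nodes report the same member list and a size of 2, discovery itself is working, which matches what we see.
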
      Analysis (so far):

      • The cause appears to be that authentication starts on node 1 and, because of round-robin routing, continues on node 2. This fails because node 2 does not know about the authentication session started on node 1.
      • According to the documentation, node 2 should look up the started authentication session in the cluster. This does not seem to happen, and we cannot see any log output related to it.
      • Regular sessions are not distributed in the cache either. We tested this by authenticating with only one node running, then spinning up a second node and failing over to it. Afterwards the regular session was gone (we were logged out).
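
The failure mode described above can be illustrated with a toy model. This is not Keycloak or Infinispan code: the `Node` class, the `login_flow` function and the plain dictionaries standing in for the authentication session cache are all assumptions made for illustration. It only shows why round-robin routing without sticky sessions needs a session store that every node can see.

```python
from itertools import cycle


class Node:
    """A stand-in for one Keycloak instance with some session store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # dict holding authentication sessions

    def start_login(self, session_id):
        # first request of the login flow creates the authentication session
        self.store[session_id] = {"started_on": self.name}

    def continue_login(self, session_id):
        # follow-up request succeeds only if this node can find the session
        return session_id in self.store


def login_flow(nodes, session_id="s1"):
    """Send two consecutive requests of one login through a round-robin balancer."""
    round_robin = cycle(nodes)
    next(round_robin).start_login(session_id)
    return next(round_robin).continue_login(session_id)


# Case 1: each node only sees its own sessions (what we observe)
isolated = [Node("node1", {}), Node("node2", {})]

# Case 2: both nodes see the same sessions (what we expect from the cluster)
shared_store = {}
clustered = [Node("node1", shared_store), Node("node2", shared_store)]

print(login_flow(isolated))   # False
print(login_flow(clustered))  # True
```

Our cluster currently behaves like the first case, even though the startup logs suggest it should behave like the second.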

                People

                • Assignee: Sebastian Laskawiec (sebastian.laskawiec)
                • Reporter: j b (bsingr)