-
Bug
-
Resolution: Done
Before reporting an issue
[x] I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.
Area
admin/api
Describe the bug
We recently updated our Keycloak from version 25.0.6 to 26.4.1. After the update, we noticed increased heap memory usage and OutOfMemoryErrors in Keycloak.
Version
26.4.1
Regression
[x] The issue is a regression
Expected behavior
No OOM errors.
Actual behavior
<img width="3732" height="1660" alt="Image" src="https://github.com/user-attachments/assets/37219936-dae2-4b8f-9acd-99607fddf558" />
In this screenshot you can see the heap memory of our Keycloak cluster with 3 replicas over a time span of 30 days. Each replica runs in a Docker container with a memory limit of 1.5 G. We also use the parameter -XX:MaxRAMPercentage=70, so the max heap is 1.05 G.
I marked some interesting points in the screenshot:
1. The update was deployed. The heap looks good before it but starts to increase afterwards.
2. Heap memory reaches the max limit and OOM errors occur. Users might experience long request times or can't access our website at all. After some time the Docker container becomes unhealthy and is restarted automatically.
3. Not all replicas are affected. Sometimes they even recover and the memory usage drops (the blue line).
4. We decided to test an increase of the container limit to 2 G, which gives 1.4 G max heap memory.
5. The OOM still occurs.
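For reference, the heap limits quoted above follow from the container limit and the percentage. This is only a minimal sketch of that arithmetic, assuming the JVM derives its max heap purely from the container memory limit and -XX:MaxRAMPercentage; the function name `max_heap_gb` is made up for illustration and is not part of Keycloak or the JVM:

```python
def max_heap_gb(container_limit_gb: float, max_ram_percentage: float) -> float:
    """Approximate max heap when it is sized as a percentage of the container limit.

    Assumption: no explicit -Xmx is set, so the percentage alone decides the heap size.
    """
    return container_limit_gb * max_ram_percentage / 100.0


# The two container limits we tried, both with -XX:MaxRAMPercentage=70.
for limit in (1.5, 2.0):
    print(f"{limit} G container limit -> {max_heap_gb(limit, 70):.2f} G max heap")
```

The remaining 30% of the container limit is what is left for everything outside the heap.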
For the last 3 days we have had a cronjob that restarts Keycloak every day to avoid the OOM, but of course this is not a valid solution.
We also noticed that the heap memory mostly increases during the night. We have some cronjobs running every night that synchronize user data, roles etc. between Keycloak and our app using the Keycloak API. I should also mention that we have ca. 170 realms in this cluster.
How to Reproduce?
We can only reproduce it in our production environment. In our test environment, which has fewer realms / users / activity, the problem does not occur.
Anything else?
At some point I also took a heap dump to get some insight into the memory usage. It seems like the memory is mostly used for Cache and QuarkusKeycloakSession:
<img width="3658" height="1605" alt="Image" src="https://github.com/user-attachments/assets/283671b2-6f68-44e9-89e5-65722b7a6675" />
I also noticed one especially large QuarkusKeycloakSession:
<img width="3648" height="1723" alt="Image" src="https://github.com/user-attachments/assets/65ca790f-bbd1-443a-99fc-991890dde010" />
<img width="3666" height="897" alt="Image" src="https://github.com/user-attachments/assets/bdb96a67-2e0b-4a78-98e3-d36aa2e06f39" />
Please let me know if you need any more information to investigate the issue.