-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
12.0.1.Final
-
None
-
None
-
- set up Infinispan with distributed caches and RocksDB persistence
- add several thousand entries to the caches
- call /metrics endpoint
-
Undefined
We're using an Infinispan cluster with 3 nodes to back our Keycloak instances. In addition, RocksDB is used to persist the cache content for disaster recovery.
When RocksDB is used for cache persistence, metrics collection on the "/metrics" endpoint takes a very long time and also causes timeouts in Keycloak when it tries to access the Infinispan cluster.
The issue becomes more pronounced as the number of cache entries grows. With 6 distributed caches (2 owners, 3 nodes) and a total of 60000 entries, we observe the following metrics collection durations:
- no RocksDB: <1s
- RocksDB (not segmented): 124s
- RocksDB (segmented: 256): 49s
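Durations like the ones above could be measured with a small script along the following lines. This is only a sketch: the URL (host, port 11222, no authentication) is an assumption about a default local server, and the names `fetch_metrics` and `time_scrape` are hypothetical helpers, not part of Infinispan.

```python
import time
import urllib.request

# Assumption: a server reachable locally on the default port without
# authentication; adjust the URL to match the actual deployment.
METRICS_URL = "http://localhost:11222/metrics"


def fetch_metrics(url, timeout=300.0):
    """Download the metrics payload and return it as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def time_scrape(url, fetch=fetch_metrics):
    """Return (elapsed_seconds, payload_size_in_bytes) for one scrape."""
    start = time.perf_counter()
    payload = fetch(url)
    elapsed = time.perf_counter() - start
    return elapsed, len(payload)
```

Calling `time_scrape(METRICS_URL)` against each configuration (no RocksDB, non-segmented, segmented) gives comparable numbers. The generous timeout is chosen so that a slow scrape is measured rather than aborted.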
With even more cache entries (300000), the Infinispan cluster becomes almost unusable when the /metrics endpoint is scraped by Prometheus every 2 minutes.
Interestingly, the statistics shown in the Infinispan UI load without problems even when RocksDB is enabled. Only the /metrics endpoint causes trouble.
Please see the attached infinispan.xml file for details about the setup.
- is related to: ISPN-12607 Metrics degrade cluster performance (Closed)