
Metrics degrade cluster performance when RocksDB is used

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • 12.0.1.Final
    • None
    • None
      • Set up Infinispan with distributed caches and RocksDB persistence.
      • Add several thousand entries to the caches.
      • Call the /metrics endpoint.
    • Undefined

      We're using an Infinispan cluster with 3 nodes to back our Keycloak instances. In addition, RocksDB is used to persist the cache content for disaster recovery.

      When RocksDB is used for cache persistence, metrics collection on the "/metrics" endpoint takes a very long time and also causes timeouts in Keycloak, which tries to access the Infinispan cluster while the collection is running.

      The issue becomes more pronounced as the number of cache entries grows. With 6 distributed caches (2 owners, 3 nodes) and a total of 60000 entries, we observe the following metrics collection durations:

      • no RocksDB: <1s
      • RocksDB (not segmented): 124s
      • RocksDB (segmented: 256): 49s
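
Durations like the ones above can be reproduced by timing a single request to the endpoint. The following is a minimal sketch, not the method used for the reported numbers. The URL is an assumption (host `localhost` and port `11222`), and `fetch_metrics` and `time_call` are hypothetical helper names. A secured server would additionally need credentials.

```python
import time
import urllib.request

# Assumption: the server listens on localhost:11222; adjust host, port and
# authentication to match the actual deployment.
METRICS_URL = "http://localhost:11222/metrics"


def fetch_metrics(url=METRICS_URL, timeout_s=300):
    """Download the full metrics payload and return it as bytes."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.read()


def time_call(fetch):
    """Call fetch() once and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fetch()
    return time.perf_counter() - start, result


# Example usage against a running node:
#   elapsed, body = time_call(fetch_metrics)
#   print(f"/metrics returned {len(body)} bytes in {elapsed:.1f}s")
```

The generous timeout is deliberate: with RocksDB enabled, a single collection took well over a minute in the measurements above, so a short client timeout would abort before the response arrives.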
         

      With even more cache entries (300000), the Infinispan cluster becomes nearly unusable when the /metrics endpoint is scraped by Prometheus every 2 minutes.
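
To put the 2-minute scrape interval in perspective, the sketch below relates it to the collection durations reported for 60000 entries. It is plain arithmetic on the numbers in this report. No durations were measured for 300000 entries, so those are not included, and `busy_fraction` is a hypothetical helper name.

```python
SCRAPE_INTERVAL_S = 120  # Prometheus scrapes every 2 minutes (from the report)

# Collection durations reported for 60000 entries, in seconds.
durations_s = {
    "RocksDB (not segmented)": 124,
    "RocksDB (segmented: 256)": 49,
}


def busy_fraction(duration_s, interval_s=SCRAPE_INTERVAL_S):
    """Fraction of one scrape interval that a single collection occupies."""
    return duration_s / interval_s


for setup, duration in durations_s.items():
    print(f"{setup}: {busy_fraction(duration):.0%} of each scrape interval")
# prints:
# RocksDB (not segmented): 103% of each scrape interval
# RocksDB (segmented: 256): 41% of each scrape interval
```

A value above 100% means that, if the server runs each collection to completion, the next scrape arrives before the previous one has finished, even at 60000 entries.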

      Interestingly, the statistics shown in the Infinispan UI load without problems even when RocksDB is enabled. Only the /metrics endpoint causes trouble.

      Please see the attached infinispan.xml file for details about the setup.

              remerson@redhat.com Ryan Emerson
              georgpace Georg F (Inactive)