Infinispan / ISPN-16077

Hot Rod client degraded performance with auth enabled


    • Type: Bug
    • Priority: Major
    • Resolution: Done
    • Fix Version: 15.0.4.Final
    • Affects Versions: 14.0.28.Final, 15.0.3.Final
    • Component: Hot Rod
    • Labels: None

      Found while running the Ansible benchmark with Hyperfoil. I am attaching a reproducer to run locally. The issue boils down to the SASL challenge evaluation. Each time this operation runs, it allocates 1.28 MB of byte arrays: internally, it iterates 20_000 times over the HMAC, and each HMAC invocation allocates a 64-byte array.
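The allocation pattern can be illustrated with a JDK-only sketch. This is not the actual Elytron/JDK SASL code path, just a simplified model of a SCRAM-style iterated HMAC loop (it omits the XOR accumulation of the real `Hi()` function): with `HmacSHA512`, each `doFinal()` returns a fresh 64-byte array, so 20_000 iterations produce 1.28 MB of short-lived garbage per challenge evaluation.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class HmacAllocationDemo {
    public static void main(String[] args) throws Exception {
        // Simplified SCRAM-style salted-password loop: one HMAC per iteration.
        Mac mac = Mac.getInstance("HmacSHA512");
        mac.init(new SecretKeySpec("password".getBytes(), "HmacSHA512"));

        int iterations = 20_000;
        byte[] u = mac.doFinal("salt".getBytes()); // fresh 64-byte array
        long allocated = u.length;
        for (int i = 1; i < iterations; i++) {
            u = mac.doFinal(u); // a new 64-byte array on every call
            allocated += u.length;
        }
        // 20_000 iterations * 64 bytes = 1_280_000 bytes = 1.28 MB
        System.out.println(allocated);
    }
}
```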


      The authentication runs for each connection in the Hot Rod client. On a single core, this causes even more connections to be opened, which creates more garbage, which in turn makes the application and the GC compete for resources. The same happens on multi-core machines, only at a smaller scale.


      The user can mitigate the allocation problem by capping the maximum number of connections in the pool with an exhausted action of WAIT (or EXCEPTION). However, this might hinder throughput a bit.
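For example, the pool cap described above can be set through `hotrod-client.properties` (property names per the Infinispan client configuration; the value `4` is an arbitrary illustration, tune it to the workload):

```properties
infinispan.client.hotrod.connection_pool.max_active = 4
infinispan.client.hotrod.connection_pool.exhausted_action = WAIT
```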


      For comparison, the attached reproducer keeps issuing batches of 1_000 operations; when one batch finishes, it starts another. I am running for 30 seconds on a single core.


                    Byte allocation   Number of operations       Time to finish   Created connections
      With Auth     2.32 GB           {GET=600, PUT=400}         43 s             953
      No Auth       477 MB            {GET=320166, PUT=213444}   30 s             288



      To run the reproducer, first decompress it. It is a Maven project: build everything with `mvn clean package` and run with:


      java -cp "target/classes:target/libs/*" -Dlog4j.configurationFile=log4j2.xml -Xmx800m -XX:+FlightRecorder -XX:StartFlightRecording=compress=false,delay=10s,duration=24h,settings=profile,filename=hfa-hotrod.jfr -ea io.jbolina.reproducer.Reproducer 30 false
      

      The first parameter (`30`) is the time in seconds to run, and the second (`false`) is whether to use authentication.

      The reproducer needs an Infinispan server running locally on port 11222. A single server is enough to reproduce the issue. To run with auth enabled, create the user and update the reproducer code with the credentials. The default user is `admin` with password `password`.
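For example, assuming a default server layout, the default credentials can be created with the server CLI (command sketch; adjust the path to your install):

```shell
# From the server installation directory, create the user the reproducer expects
bin/cli.sh user create admin -p password
```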


      To reproduce the issue, run with `taskset` pinning the JVM to a single core. Analyzing the JFR recording, the allocations show up in the stack traces and flame graph under `AuthHandler` in the `evaluateChallenge` method. Internally, these invocations go through Elytron and eventually reach the JDK, which performs all the allocations.
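Concretely, the pinned run is the same command as above prefixed with `taskset` (Linux only; CPU 0 is an arbitrary choice, and the second argument is `true` to enable authentication):

```shell
taskset -c 0 java -cp "target/classes:target/libs/*" -Dlog4j.configurationFile=log4j2.xml -Xmx800m -XX:+FlightRecorder -XX:StartFlightRecording=compress=false,delay=10s,duration=24h,settings=profile,filename=hfa-hotrod.jfr -ea io.jbolina.reproducer.Reproducer 30 true
```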

        1. auth-garbage.tar.xz
          6 kB
          Jose Bolina
        2. hyperfoil-report.png
          231 kB
          Jose Bolina

              Assignee: rh-ee-jbolina Jose Bolina
              Reporter: rh-ee-jbolina Jose Bolina