Type: Bug
Resolution: Done
Priority: Major
Affects Version/s: 14.0.28.Final, 15.0.3.Final
Labels: None
Found while running the Ansible benchmark with Hyperfoil. I am attaching a reproducer to run locally. The issue boils down to the SASL challenge evaluation. Each time this operation runs, it allocates 1.28 MB in byte arrays: internally, it iterates 20_000 times over the HMAC, and each HMAC execution allocates a 64-byte array.
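For context, a minimal sketch of the iterated-HMAC shape behind that cost, modeled on SCRAM's `Hi` function from RFC 5802. This is illustrative only, not the Elytron implementation: each `Mac.doFinal` call returns a fresh array (64 bytes for HMAC-SHA-512), so 20_000 iterations produce about 1.28 MB of garbage per challenge.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of SCRAM's Hi(str, salt, i) from RFC 5802 -- the iterated HMAC behind
// the challenge evaluation. Illustrative only; not the Elytron implementation.
public final class ScramHiSketch {
    static byte[] hi(byte[] password, byte[] salt, int iterations) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA512");
        mac.init(new SecretKeySpec(password, "HmacSHA512"));
        mac.update(salt);
        mac.update(new byte[] {0, 0, 0, 1});   // INT(1) suffix per the RFC
        byte[] u = mac.doFinal();              // allocates a fresh 64-byte array
        byte[] result = u.clone();
        for (int i = 1; i < iterations; i++) {
            u = mac.doFinal(u);                // each iteration allocates another 64 bytes
            for (int j = 0; j < result.length; j++) {
                result[j] ^= u[j];
            }
        }
        return result; // 20_000 iterations * 64 B per iteration = 1.28 MB of garbage
    }
}
```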
The authentication runs for each connection in the Hot Rod client. On a single core, the extra allocation work slows operations down, which causes even more connections to be opened, which creates more garbage, so the application and the GC end up competing for resources. The same scenario happens on multi-core machines, only at a smaller scale.
The user can mitigate the allocation problem by setting a maximum number of connections in the pool with an exhausted action of WAIT (or EXCEPTION). However, this might hinder throughput a bit.
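For illustration, a minimal sketch of that mitigation using the Hot Rod client's `ConfigurationBuilder`; the server address and the cap of 16 connections are arbitrary assumptions, not values from the reproducer:

```java
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;
import org.infinispan.client.hotrod.configuration.ExhaustedAction;

public final class PoolMitigation {
    public static void main(String[] args) {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.addServer().host("127.0.0.1").port(11222);
        // Cap the pool and make callers wait for a free connection instead of
        // opening new ones, so authentication runs a bounded number of times.
        builder.connectionPool()
               .maxActive(16)                          // arbitrary cap, for illustration
               .exhaustedAction(ExhaustedAction.WAIT); // or ExhaustedAction.EXCEPTION
        try (RemoteCacheManager rcm = new RemoteCacheManager(builder.build())) {
            rcm.getCache().put("k", "v");
        }
    }
}
```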
For comparison, the attached reproducer keeps issuing batches of 1_000 operations; when one batch finishes, it starts another. The numbers below come from running for 30 seconds on a single core.
| | byte allocation | number of operations | time to finish | created connections |
|---|---|---|---|---|
| With Auth | 2.32 GB | {GET=600, PUT=400} | 43 s | 953 |
| No Auth | 477 MB | {GET=320166, PUT=213444} | 30 s | 288 |
To run the reproducer, first decompress it; it is a Maven project. Build everything with `mvn clean package` and run with:

```
java -cp "target/classes:target/libs/*" -Dlog4j.configurationFile=log4j2.xml -Xmx800m -XX:+FlightRecorder -XX:StartFlightRecording=compress=false,delay=10s,duration=24h,settings=profile,filename=hfa-hotrod.jfr -ea io.jbolina.reproducer.Reproducer 30 false
```
The first parameter (`30`) is the time in seconds to run, and the second (`false`) is whether to use authentication.
The reproducer needs an Infinispan server running locally on port 11222. A single server is enough to reproduce the issue. To run with auth enabled, create the user and update the reproducer code to match the credentials. The default user is `admin` with password `password`.
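For reference, this is roughly how the client side of that authentication is configured with the Hot Rod `ConfigurationBuilder`. The SASL mechanism shown and the inline credentials are illustrative assumptions, not the reproducer's actual code:

```java
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public final class AuthSetup {
    public static void main(String[] args) {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.addServer().host("127.0.0.1").port(11222);
        // Default credentials from the description; adjust to match your server.
        // SCRAM-SHA-512 is an assumption here; its iterated HMAC is the
        // allocation hot spot described above.
        builder.security().authentication()
               .saslMechanism("SCRAM-SHA-512")
               .username("admin")
               .password("password");
        // Every new connection in the pool performs this SASL handshake.
        try (RemoteCacheManager rcm = new RemoteCacheManager(builder.build())) {
            rcm.getCache().put("k", "v");
        }
    }
}
```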
To reproduce the issue, run with `taskset` pinned to a single core (e.g., `taskset -c 0`). Analyzing the JFR recording, the allocations show up in the stack traces and flamegraph under the `AuthHandler` class's `evaluateChallenge` method. Internally, these invocations go through Elytron and eventually reach the JDK, which is where the allocations happen.
Relates to:
- ISPN-16063 Expand the Hot Rod client connection/pool metrics available (New)
- ISPN-14868 Remove and refactor Client Connection Pool (Resolved)
- ISPN-16093 Hot Rod client should PING servers added on topology update (Resolved)