-
Bug
-
Resolution: Done
-
Blocker
-
None
-
21.0.6 GA
-
The method call at the following line replaces the heap region we allocate from (the "alloc region") with a free one: https://github.com/openjdk/jdk21u/blob/jdk-21.0.6%2B6/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L441
When an alloc region is replaced it's crucial to add the old one to the set of regions that will be processed during the next garbage collection cycle (that set is called "collection set").
Adding a region to the collection set is called "retirement".
The call at line 441 does not retire anything itself, but it only ever runs after the call at line 430 (the previous allocation attempt), which does.
This construction is sound until NUMA support comes into play. For the code discussed here, NUMA means that each node has its own alloc region: each thread is associated with a node (through G1Allocator::current_node_index), and each node is associated with an alloc region.
It works well as long as the association between threads and nodes doesn't change, but under our load it sometimes does.
When the association changes between lines 430 and 441, a region is never retired: line 441 doesn't retire any region, and line 430 has retired the region associated with a different node.
A region that is not retired is lost, and a lost region causes the JVM to crash after some time.
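The sequence described above can be illustrated with a toy model. This is only a sketch and not HotSpot code: ToyHeap, failed_attempt_with_retire, force_new_region and simulate are hypothetical names, a region is reduced to an integer id, and -1 stands in for "no real alloc region installed".

```cpp
#include <array>
#include <set>

// Toy model, not HotSpot code: a "region" is just an integer id.
struct ToyHeap {
    static constexpr int kNoRegion = -1;       // stands in for "nothing installed"
    std::array<int, 2> alloc_region{{0, 1}};   // one alloc region per NUMA node
    std::set<int> collection_set;              // regions that have been retired
    int next_free = 2;                         // id of the next free region

    // Models the failed attempt at line 430: it retires the alloc region
    // of the node the thread is associated with at that moment.
    void failed_attempt_with_retire(int node) {
        if (alloc_region[node] != kNoRegion) {
            collection_set.insert(alloc_region[node]);
            alloc_region[node] = kNoRegion;
        }
    }

    // Models the forced attempt at line 441: it installs a fresh region for
    // the node without retiring whatever was installed before. Returns the id
    // of the region that was overwritten, or kNoRegion if the slot was empty.
    int force_new_region(int node) {
        int dropped = alloc_region[node];
        alloc_region[node] = next_free++;
        return dropped;
    }
};

// Returns the id of the region that was replaced without being retired,
// or ToyHeap::kNoRegion if no region was lost.
int simulate(int node_at_430, int node_at_441) {
    ToyHeap heap;
    heap.failed_attempt_with_retire(node_at_430);
    return heap.force_new_region(node_at_441);
}
```

In this model, simulate(0, 0) loses nothing because the region replaced at line 441 is the one retired at line 430, while simulate(0, 1) drops node 1's region without it ever reaching the collection set.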
This is the assert that fails when the issue happens: https://github.com/openjdk/jdk21u/blob/jdk-21.0.6%2B6/src/hotspot/share/gc/g1/g1AllocRegion.cpp#L135
This is our naive fix:
--- a/src/hotspot/share/gc/g1/g1AllocRegion.inline.hpp
+++ b/src/hotspot/share/gc/g1/g1AllocRegion.inline.hpp
@@ -117,6 +117,7 @@ inline HeapWord* G1AllocRegion::attempt_allocation_force(size_t word_size) {
   assert_alloc_region(_alloc_region != nullptr, "not initialized properly");
   trace("forcing alloc", word_size, word_size);
+  retire(true);
   HeapWord* result = new_alloc_region_and_allocate(word_size, true /* force */);
   if (result != nullptr) {
     trace("alloc forced", word_size, word_size, word_size, result);
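The effect of the added retire call can be sketched in a toy model. This is not HotSpot code: PatchedToyAllocator and retired_after are hypothetical names, regions are integer ids, and the model assumes that retiring a slot that holds no real region is a no-op, so the extra call is harmless when the node has not changed.

```cpp
#include <vector>

// Toy model of the patched behaviour, not HotSpot code.
struct PatchedToyAllocator {
    static constexpr int kDummy = -1;   // stands in for "no real region installed"
    std::vector<int> alloc_region;      // one alloc region per NUMA node
    std::vector<int> retired;           // regions retired so far, in order
    int next_free;                      // id of the next free region

    explicit PatchedToyAllocator(int nodes) : next_free(nodes) {
        for (int i = 0; i < nodes; ++i) alloc_region.push_back(i);
    }

    // Assumption of this model: retiring an empty slot does nothing.
    void retire(int node) {
        if (alloc_region[node] == kDummy) return;
        retired.push_back(alloc_region[node]);
        alloc_region[node] = kDummy;
    }

    // Models the failed attempt at line 430, which retires the region.
    void attempt_allocation_locked_fails(int node) { retire(node); }

    // Models the patched forced attempt: retire first, then install a new region.
    void attempt_allocation_force(int node) {
        retire(node);                   // the line added by the fix
        alloc_region[node] = next_free++;
    }
};

// Returns the regions retired when the thread is on node_first during the
// failed attempt and on node_force during the forced attempt.
std::vector<int> retired_after(int node_first, int node_force) {
    PatchedToyAllocator allocator(2);
    allocator.attempt_allocation_locked_fails(node_first);
    allocator.attempt_allocation_force(node_force);
    return allocator.retired;
}
```

With the thread staying on node 0, only region 0 is retired and it is retired once. If the thread moves from node 0 to node 1 between the two calls, both region 0 and region 1 end up retired, so neither is lost.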
- links to
-
RHBA-2025:149968
OpenJDK 21 G1 and NUMA migrations Bugfix Update for Portable Linux Builds