Infinispan / ISPN-15191

Cache startup failures on individual nodes can cause other caches to enter DEGRADED mode on restart


      If a cache encounters a fatal exception on startup, the server stops all other caches and terminates with a FATAL error. However, those other caches are stopped with a plain call to EmbeddedCacheManager#stop, which means the caches' state is never persisted. Consequently, if another cache manages to form a cluster before the exception is thrown and it is configured with PartitionHandling.DENY_READ_WRITES, then after the node restarts it is never possible for the cluster to become AVAILABLE again, as the UUID of the restarted node differs from the original.
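
      A minimal embedded-API sketch of that distinction, using a hypothetical cache name and state location: Cache#shutdown performs a clustered, graceful shutdown that persists the cache state (provided global state is enabled), whereas EmbeddedCacheManager#stop on its own does not.

      import org.infinispan.Cache;
      import org.infinispan.configuration.cache.CacheMode;
      import org.infinispan.configuration.cache.ConfigurationBuilder;
      import org.infinispan.configuration.global.GlobalConfigurationBuilder;
      import org.infinispan.manager.DefaultCacheManager;
      import org.infinispan.manager.EmbeddedCacheManager;
      import org.infinispan.partitionhandling.PartitionHandling;

      public class GracefulShutdownSketch {
         public static void main(String[] args) {
            // Global state must be enabled for cache topology to survive a restart
            GlobalConfigurationBuilder global = GlobalConfigurationBuilder.defaultClusteredBuilder();
            global.globalState().enable().persistentLocation("/tmp/ispn-state"); // hypothetical location

            ConfigurationBuilder cfg = new ConfigurationBuilder();
            cfg.clustering().cacheMode(CacheMode.DIST_SYNC)
               .partitionHandling().whenSplit(PartitionHandling.DENY_READ_WRITES);

            EmbeddedCacheManager cm = new DefaultCacheManager(global.build());
            cm.defineConfiguration("example", cfg.build()); // hypothetical cache name
            Cache<String, String> cache = cm.getCache("example");

            // Graceful: persists the cache state so the same topology can be
            // restored when the node comes back with a new UUID.
            cache.shutdown();

            // Not graceful on its own: stopping the manager does not persist
            // per-cache state, which is what leaves DENY_READ_WRITES caches
            // permanently DEGRADED after the restart described above.
            cm.stop();
         }
      }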

      The internal org.infinispan.LOCKS cache utilises PartitionHandling.DENY_READ_WRITES, therefore any code attempting to utilise a ClusteredLock will fail even if the server starts up correctly after the restart.
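
      As a sketch of how this surfaces, assuming a hypothetical lock name: every ClusteredLock operation reads and writes the org.infinispan.LOCKS cache, so while that cache is DEGRADED the returned future completes exceptionally with an AvailabilityException like the ISPN000306 error shown below.

      import java.util.concurrent.TimeUnit;

      import org.infinispan.lock.EmbeddedClusteredLockManagerFactory;
      import org.infinispan.lock.api.ClusteredLock;
      import org.infinispan.lock.api.ClusteredLockManager;
      import org.infinispan.manager.EmbeddedCacheManager;

      public class ClusteredLockSketch {
         static void acquire(EmbeddedCacheManager cm) throws Exception {
            // Lock state lives in the internal org.infinispan.LOCKS cache
            ClusteredLockManager lockManager = EmbeddedClusteredLockManagerFactory.from(cm);
            lockManager.defineLock("example-lock"); // hypothetical lock name
            ClusteredLock lock = lockManager.get("example-lock");

            // While org.infinispan.LOCKS is DEGRADED, this get() throws an
            // ExecutionException wrapping an AvailabilityException (ISPN000306),
            // even though the server itself restarted successfully.
            if (lock.tryLock(1, TimeUnit.SECONDS).get()) {
               try {
                  // ... guarded work ...
               } finally {
                  lock.unlock().get();
               }
            }
         }
      }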

      This issue was encountered in the Operator testsuite: a single node was failing due to ISPN-15089, and k8s automatically restarts the server pod on failure. Once the cluster successfully forms, attempts to perform a Backup restore fail with:

      ISPN000136: Error executing command GetKeyValueCommand on Cache 'org.infinispan.LOCKS', writing keys [] org.infinispan.partitionhandling.AvailabilityException: ISPN000306: Key 'ClusteredLockKey{name=BackupManagerImpl-restore}' is not available. Not all owners are in this partition
      

      As a cache needs to lose at least half of its members, or all owners of a segment, a cluster is only affected by this issue if it meets one of the following (see the configuration sketch after this list):

      • The cluster only has 2 nodes
      • The cluster has > 2 nodes, but a cache has num_owners=1
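
      For the second case, a minimal sketch of a vulnerable cache configuration: with num_owners=1, the loss of any single node loses all owners of that node's segments, so even a larger cluster is pushed into DEGRADED mode.

      import org.infinispan.configuration.cache.CacheMode;
      import org.infinispan.configuration.cache.Configuration;
      import org.infinispan.configuration.cache.ConfigurationBuilder;
      import org.infinispan.partitionhandling.PartitionHandling;

      public class SingleOwnerConfigSketch {
         // Each segment has exactly one owner, so a single crashed node is enough
         // to lose all owners of its segments and degrade the cache
         static Configuration singleOwnerConfig() {
            return new ConfigurationBuilder()
                  .clustering().cacheMode(CacheMode.DIST_SYNC)
                  .hash().numOwners(1)
                  .partitionHandling().whenSplit(PartitionHandling.DENY_READ_WRITES)
                  .build();
         }
      }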

      Assignee: Jose Bolina
      Reporter: Ryan Emerson
