- Bug
- Resolution: Done
- Major
- 12.1.1.Final
- None
In the operator we use zero-capacity nodes to create and restore backups.
1. Source cluster is created.
2. Backup pod is started with zero-capacity=true, the backup is created, and the node leaves.
3. Source cluster is shut down.
4. Target cluster is created (pod test-backup-restore-data-grid-target-0).
5. Restore pod is started with zero-capacity=true, the backup is restored, and the node leaves.
In our test suite, step 5 fails frequently because the Restore pod times out when trying to join the cluster:
WARN (timeout-thread--p4-t1) [org.infinispan.CLUSTER] ISPN000071: Caught exception when handling command TopologyJoinCommand{cacheName='someCache', origin=restore-25099, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.impl.SyncConsistentHashFactory@ffffd8e9, numSegments=256, numOwners=2, timeout=240000, cacheMode=DIST_SYNC, persistentUUID=ccc7e49d-fec1-4131-a783-0435c303edfc, persistentStateChecksum=Optional.empty}, viewId=1} org.infinispan.util.concurrent.TimeoutException
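For triaging repeated failures across test runs, it can help to pull the key fields (joining node, cache name, join timeout) out of ISPN000071 lines like the one above. The following is a minimal sketch, not part of Infinispan or the operator: the class name `JoinTimeoutLogParser` and its `parse` method are hypothetical, and the regular expression assumes the `TopologyJoinCommand{...}` layout shown in this log line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for log triage; assumes the TopologyJoinCommand
// toString() layout seen in the warning above.
class JoinTimeoutLogParser {
    // Matches simple key=value pairs such as cacheName='someCache' or timeout=240000.
    // Values end at a comma, quote, closing brace, or whitespace.
    private static final Pattern FIELD = Pattern.compile(
        "\\b(cacheName|origin|numSegments|numOwners|timeout|cacheMode|viewId)='?([^,'}\\s]+)'?");

    // Returns the fields found in the line, in order of first appearance.
    static Map<String, String> parse(String logLine) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(logLine);
        while (m.find()) {
            // Keep the first occurrence of each key.
            fields.putIfAbsent(m.group(1), m.group(2));
        }
        return fields;
    }
}
```

Calling `JoinTimeoutLogParser.parse(line)` on the warning above would yield `origin` as `restore-25099` and `timeout` as `240000`, which makes it easier to confirm that the failing joiner is always the zero-capacity Restore pod.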
We have also seen issues with the zero-capacity pods at step 2, but I don't have logs for that yet.
Logs are attached.