Infinispan / ISPN-12953

Zero Capacity Node TopologyJoinCommand frequently timing out

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 12.1.2.Final, 13.0.0.Final
    • Affects Version/s: 12.1.1.Final
    • Component/s: Core
    • Labels: None

      In the operator we use zero-capacity nodes to create and restore backups.

      1. Source cluster created
      2. Backup pod started with zero-capacity=true, backup created, node leaves
      3. Source cluster shut down
      4. Target cluster created (pod test-backup-restore-data-grid-target-0)
      5. Restore pod started with zero-capacity=true, backup restored, node leaves

      In our test suite, step 5 frequently fails because the Restore pod times out when trying to join the cluster:

       WARN  (timeout-thread--p4-t1) [org.infinispan.CLUSTER] ISPN000071: Caught exception when handling command TopologyJoinCommand{cacheName='someCache', origin=restore-25099, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.impl.SyncConsistentHashFactory@ffffd8e9, numSegments=256, numOwners=2, timeout=240000, cacheMode=DIST_SYNC, persistentUUID=ccc7e49d-fec1-4131-a783-0435c303edfc, persistentStateChecksum=Optional.empty}, viewId=1} org.infinispan.util.concurrent.TimeoutException
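
      When scanning the attached logs for further occurrences, a small script can pull the cache name, joining node, and join timeout out of each ISPN000071 line. The following is a minimal sketch, assuming every such line has the same shape as the single line quoted above; the function name `find_join_timeouts` and the regular expression are illustrative and not part of Infinispan, and other versions may format the message differently.

```python
import re

# Sample line copied verbatim from the report above.
LINE = (
    "WARN  (timeout-thread--p4-t1) [org.infinispan.CLUSTER] ISPN000071: "
    "Caught exception when handling command TopologyJoinCommand{"
    "cacheName='someCache', origin=restore-25099, joinInfo=CacheJoinInfo{"
    "consistentHashFactory=org.infinispan.distribution.ch.impl."
    "SyncConsistentHashFactory@ffffd8e9, numSegments=256, numOwners=2, "
    "timeout=240000, cacheMode=DIST_SYNC, "
    "persistentUUID=ccc7e49d-fec1-4131-a783-0435c303edfc, "
    "persistentStateChecksum=Optional.empty}, viewId=1} "
    "org.infinispan.util.concurrent.TimeoutException"
)

# Assumption: the field order matches the quoted line
# (cacheName, origin, ..., timeout, ..., viewId).
JOIN_TIMEOUT = re.compile(
    r"ISPN000071:.*?TopologyJoinCommand\{cacheName='(?P<cache>[^']*)', "
    r"origin=(?P<origin>[^,]+),.*?timeout=(?P<timeout_ms>\d+),.*?"
    r"viewId=(?P<view_id>\d+)\}.*TimeoutException"
)


def find_join_timeouts(lines):
    """Return one dict per line that reports a timed-out TopologyJoinCommand."""
    hits = []
    for line in lines:
        match = JOIN_TIMEOUT.search(line)
        if match:
            hits.append({
                "cache": match.group("cache"),
                "origin": match.group("origin"),
                "timeout_ms": int(match.group("timeout_ms")),
                "view_id": int(match.group("view_id")),
            })
    return hits


if __name__ == "__main__":
    for hit in find_join_timeouts([LINE]):
        print(hit)
```

      For the quoted line this reports the cache `someCache`, the joining node `restore-25099`, and a join timeout of 240000 ms in view 1. To scan a whole log, pass an open file object instead of the one-element list.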
      

      We have also seen issues with the zero-capacity pods at step 2, but I don't have logs for that yet.

      Logs are attached.

        1. test-backup-restore-data-grid-target-0
          463 kB
        2. backup
          445 kB
        3. restore
          485 kB

              remerson@redhat.com Ryan Emerson