Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-4484

Outbound transfers can be cancelled by old CANCEL_STATE_TRANSFER command

This issue belongs to an archived project. You can view it, but you can't modify it. Learn more

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Critical Critical
    • 7.0.0.Alpha5
    • 6.0.2.Final
    • Core, State Transfer
    • None

      This appeared during the 32-nodes elasticity test in the Hyperion environment.

      Just as apex947 left, it started a rebalance, which apex948 dutifully cancelled as it became the new coordinator. apex949 had already requested segments from apex959, so it sent a StateRequestCommand(CANCEL_STATE_TRANSFER) asynchronously to apex959. Then apex948 started a new rebalance, and apex949 asked apex959 for the same segments. When apex959 finally received the cancel request, it didn't check the topology id and it incorrectly cancelled the outbound transfer to apex949.

      The solution would be to verify the topology id in the CANCEL_STATE_TRANSFER command before cancelling the transfer. I also think we can avoid sending the cancel command completely in this case, and only send it as we are about to stop.

              dberinde@redhat.com Dan Berindei (Inactive)
              dberinde@redhat.com Dan Berindei (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare

                Created:
                Updated:
                Resolved:
                Archived: