AMQ Broker / ENTMQBR-9883

Follower pod took more than 5 minutes to acquire the lock on the shared storage PVC after powering off the worker node running the leader pod


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • AMQ 7.13.0.GA
    • high-availability
    • Important

      With a leader-follower HA configuration, we observed that the follower pod waits for 5 minutes before it acquires the file lock when the worker node hosting the leader pod is powered off.

      We use ODF for storage, which is a 3-node Ceph cluster running on infra nodes. We do not touch the storage nodes during the Broker HA tests; we only power off the worker node where the Broker leader is running. The PVC used as shared storage is:

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: shared-volume
      spec:
        storageClassName: ocs-storagecluster-cephfs
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: 10Gi
      

      The delayed lock release only occurs in the scenario where the worker node hosting the leader pod is powered off. In another scenario, where the leader pod was deleted with the command "oc delete <leader_pod>", the follower pod acquired the lock immediately and HA failover worked as expected.
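
To help separate storage-level lock behavior from broker behavior, the two scenarios above could be compared with a small lock probe that runs independently of the broker. The following is a minimal sketch, not part of the original report. It assumes the shared CephFS volume is mounted in two test pods on different worker nodes, that an advisory POSIX lock on a scratch file behaves like the lock the broker takes (an assumption, not confirmed here), and that `wait_for_lock` and the file name `probe.lock` are hypothetical names chosen for illustration. It should be pointed at a scratch file, never at the broker's own lock file.

```python
import fcntl
import os
import time


def wait_for_lock(path, poll_interval=1.0, timeout=600.0):
    """Poll for an exclusive advisory lock on `path`.

    Returns (fd, seconds_waited). The lock stays held until the
    returned descriptor is closed or the process exits.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    start = time.monotonic()
    while True:
        try:
            # Non-blocking attempt; raises OSError while another
            # process (possibly on another node) holds the lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd, time.monotonic() - start
        except OSError:
            if time.monotonic() - start >= timeout:
                os.close(fd)
                raise TimeoutError(f"no lock on {path} after {timeout}s")
            time.sleep(poll_interval)
```

One possible use: run the probe in a pod on the worker that will be powered off so that it holds the lock, start the same probe in a pod on another worker, then power off the first worker and note the `seconds_waited` value reported by the second probe. A delay similar to the one seen with the broker would suggest the wait comes from how the storage layer releases locks held by an unreachable client rather than from the broker itself.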

              Assignee: Unassigned
              Reporter: Joe Luo (rhn-support-qluo)
              Votes: 1
              Watchers: 5
