Red Hat OpenStack Services on OpenShift / OSPRH-19006

Galera node sometimes fails to fully synchronize after joining the cluster.


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • rhos-16.2.z
    • mariadb-operator
    • 13
    • rhos-ops-platform-services-pidone
    • Moderate

      Galera node sometimes fails to fully synchronize after joining the cluster.

      This is a follow-up to https://issues.redhat.com/browse/OSPRH-12518

      This is another attempt at reaching a conclusive RCA for the issue
      reported in OSPRH-12518. In a heavily loaded environment, a Galera
      node joining an existing cluster would correctly integrate the data
      it received from the SST it requested over rsync, but would then
      fail to behave correctly after integrating the remaining write sets
      over IST to catch up with the state of the cluster.
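
      When trying to reproduce this, one basic check is whether the joiner
      at least reports itself as caught up once SST and IST are done. The
      sketch below assumes the wsrep status variables have already been
      collected from the joiner (for example from the output of
      SHOW GLOBAL STATUS LIKE 'wsrep_%') into a plain dictionary. The
      helper name sync_problems and the EXPECTED table are illustrative and
      not part of any existing tooling. Note that it is not established
      that the failure tracked here is visible in these variables, so a
      clean result does not rule the problem out.

```python
from typing import Dict, List

# Values a caught-up Galera node is normally expected to report.
# The variable names are standard wsrep status variables; grouping them
# in this table is an assumption made for this sketch.
EXPECTED = {
    "wsrep_local_state_comment": "Synced",
    "wsrep_cluster_status": "Primary",
    "wsrep_ready": "ON",
    "wsrep_connected": "ON",
}


def sync_problems(status: Dict[str, str]) -> List[str]:
    """Return a description of every status variable that deviates.

    `status` maps wsrep status variable names to their string values.
    A missing variable is reported as None.
    """
    problems = []
    for name, expected in EXPECTED.items():
        actual = status.get(name)
        if actual != expected:
            problems.append(f"{name}={actual!r} (expected {expected!r})")
    return problems
```

      An empty list means the node reports the expected state; any entry
      in the list names the variable that deviates and its actual value.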

      As additional information, although SST can be implemented with
      rsync or mariabackup, there is currently no evidence that the latter
      would prevent this problem from occurring. Moreover, we have seen
      a handful of cases in other environments where the error message
      "WSREP: Failed to apply trx" reported in this Jira appeared. The goal
      of this Jira is therefore to track the similarities and hopefully
      come up with a definitive explanation of why this problem can occur,
      and whether it is still an issue in OSP 17 and beyond.
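
      To help track similar cases across environments, the logs from each
      one can be scanned for the error message quoted above. The following
      is a minimal sketch: only the message string comes from this Jira,
      while the function names and the environment labels passed to
      summarize are hypothetical.

```python
from typing import Dict, Iterable

# The error message quoted in this Jira; matched as a plain substring.
FAILED_APPLY = "WSREP: Failed to apply trx"


def count_failed_apply(lines: Iterable[str]) -> int:
    """Count the log lines that contain the quoted WSREP error message."""
    return sum(1 for line in lines if FAILED_APPLY in line)


def summarize(logs: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Map each environment label to its hit count.

    `logs` maps a free-form environment label (hypothetical, chosen by
    the caller) to an iterable of log lines. Environments without any
    hit are left out of the result.
    """
    counts = {}
    for env, lines in logs.items():
        hits = count_failed_apply(lines)
        if hits:
            counts[env] = hits
    return counts
```

      Each value in logs can be an open file object, since iterating over
      a text file yields its lines one at a time.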

      For reference, after an initial discussion, we could not determine
      from the logs of the original case that the rsync SST misbehaved in
      any way. While this is not proof, we think that the issue may lie
      in the way the IST is integrated after the SST.

      This Jira is a tracker to log the actions needed to try to
      reproduce the issue under the right environment conditions.

              rhn-engineering-dciabrin Damien Ciabrini
              rhos-dfg-pidone