Red Hat Workload Availability / RHWA-382

SNR pods crash on remaining hosts when a node is put into maintenance mode and shut down

      Cause: Gathering the IP addresses of peers sometimes fails temporarily for individual nodes. The resulting error was propagated up to the SNR agent startup.
      Consequence: The SNR agent failed to start when such an error occurred.
      Fix: The error is now caught, and gathering the IP addresses of peers is retried.
      Result: Temporary errors no longer prevent the SNR agent from starting.
    • Bug Fix
    • Proposed

      When a host machine is gracefully placed into maintenance mode (with NMO) and subsequently turned off, the other SNR pods running on the remaining, operational cluster hosts begin to crash.

      This issue seems to be related to a sync failure with the powered-off peer. The pods continue to crash until the offline node is brought back online and its SNR pod is operational again.

              slintes Marc Sluiter
              rh-ee-clobrano Carlo Lobrano