RHEL / RHEL-126083

podman-etcd should coordinate recovery with its peer in case of a local etcd container failure


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • rhel-10.2
    • rhel-9.6
    • resource-agents
    • None
    • resource-agents-4.16.0-44.el10
    • None
    • Moderate
    • OtherQA, ZStream
    • rhel-ha
    • 13
    • 26
    • 3
    • False
    • False
    • No
    • None
    • Regression Exception
    • Unspecified Release Note Type - Unknown
    • Unspecified
    • Unspecified
    • Unspecified
    • None

      Currently, if the etcd container managed by podman-etcd is abruptly terminated, the monitor operation returns OCF_NOT_RUNNING. Pacemaker interprets this as though the resource was simply not running, which triggers an immediate local restart of the agent.
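The return-code problem can be modelled in a few lines. This is a minimal Python sketch, not the agent's actual code: the state labels `running`, `stopped_by_agent` and `killed` are hypothetical, while the numeric values are the standard OCF exit codes. It shows why an abrupt kill and a clean stop look identical to Pacemaker.

```python
# Standard OCF resource-agent exit codes.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def monitor(container_state: str) -> int:
    """Model of the monitor result described in this issue.

    container_state is a hypothetical label: "running", "stopped_by_agent"
    (a clean stop performed by the agent) or "killed" (abrupt termination).
    """
    if container_state == "running":
        return OCF_SUCCESS
    # Both a clean stop and an abrupt kill end up here, so Pacemaker
    # receives the same code and cannot tell them apart.
    return OCF_NOT_RUNNING


for state in ("running", "stopped_by_agent", "killed"):
    print(state, monitor(state))
```

Both `stopped_by_agent` and `killed` map to 7, so nothing in the monitor result tells Pacemaker that the peer may still consider the local member active.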

      This restart happens too quickly and is not coordinated with the peer node. The agent attempts to rejoin a cluster that has not yet recognized the failure, which leads to inconsistent state detection (for example, still seeing two active nodes) and causes the start operation to fail or deadlock.
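The coordination the title asks for can be sketched as a start-time gate: before rejoining, wait until the peer no longer counts the local member as active, and give up after a timeout instead of starting against an inconsistent view. This is a hypothetical Python sketch, not the agent's implementation or the eventual fix; `wait_for_peer_to_drop_member`, `peer_active_members` and the timeout values are illustrative assumptions, and how the peer's view would actually be queried is left open.

```python
import time
from typing import Callable


def wait_for_peer_to_drop_member(
    local_member: str,
    peer_active_members: Callable[[], set[str]],  # hypothetical query of the peer's view
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the peer no longer reports local_member as active,
    or False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while True:
        # Only proceed once the peer has recognized the local failure.
        if local_member not in peer_active_members():
            return True
        # Give up rather than block the start operation forever.
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting, and returning False on timeout lets the caller fail the start explicitly instead of deadlocking.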

              Oyvind Albrigtsen (rhn-engineering-oalbrigt)
              Carlo Lobrano (rh-ee-clobrano)
              Oyvind Albrigtsen
              Douglas Hensel
              Votes: 0
              Watchers: 7
