  1. OpenShift Bugs
  2. OCPBUGS-59238

[TNF] Quick podman-etcd restart results in failure to start


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.19.0, 4.20
    • Two Node Fencing
    • Quality / Stability / Reliability
    • False
    • 0
    • Rejected
    • OCPEDGE Sprint 278
    • 1

      Description of problem:

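      A rapid restart of podman-etcd fails at the start operation, probably because the clone notification environment variables [1], which the agent uses to count active and inactive resource instances, are out of step with the actual state. In the log below, the stop operation succeeds at 09:26:59, yet the start one second later reports an active resource count of 2.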

      Jul 11 09:26:59 master-0 pacemaker-controld[1885]:  notice: Result of stop operation for etcd on master-0: ok
      Jul 11 09:26:59 master-0 pacemaker-controld[1885]:  notice: Requesting local execution of start operation for etcd on master-0
      Jul 11 09:27:00 master-0 podman-etcd(etcd)[9729]: NOTICE: podman-etcd start
      Jul 11 09:27:00 master-0 podman-etcd(etcd)[9762]: INFO: ensure etcd pod is not running (retries: 60, interval: 10)
      Jul 11 09:27:00 master-0 podman-etcd(etcd)[9896]: ERROR: Unexpected active resource count: 2
      Jul 11 09:27:00 master-0 pacemaker-controld[1885]:  notice: Result of start operation for etcd on master-0: error
      

      [1]: https://clusterlabs.org/projects/pacemaker/doc/2.1/Pacemaker_Administration/html/agents.html#clone-notifications
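
The suspected mechanism can be illustrated with a minimal sketch. The variable names come from the Pacemaker clone notification documentation [1]; the way the count is derived, the "at most one active instance" rule, and the instance names `etcd:0` and `etcd:1` are assumptions made for illustration and are not taken from the podman-etcd agent source.

```python
# Variable names as documented for Pacemaker clone notifications [1].
ACTIVE_VAR = "OCF_RESKEY_CRM_meta_notify_active_resource"
INACTIVE_VAR = "OCF_RESKEY_CRM_meta_notify_inactive_resource"


def count_instances(env, var):
    """Count whitespace-separated resource instance names in a notify variable."""
    return len(env.get(var, "").split())


def check_start(env):
    """Hypothetical start-time check: more than one active instance is unexpected."""
    active = count_instances(env, ACTIVE_VAR)
    if active > 1:
        return f"ERROR: Unexpected active resource count: {active}"
    return f"INFO: active resource count: {active}"


if __name__ == "__main__":
    # Hypothetical environment in which the instance that was just stopped
    # is still listed as active when the start operation runs.
    stale_env = {ACTIVE_VAR: "etcd:0 etcd:1", INACTIVE_VAR: ""}
    print(check_start(stale_env))  # ERROR: Unexpected active resource count: 2
```

If the hypothesis in the description holds, comparing the values of these variables across the stop and the immediate start on master-0 should show whether the local instance is still being counted as active.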

              rh-ee-pfontani Pablo Fontanilla
              rh-ee-clobrano Carlo Lobrano
              Douglas Hensel