Red Hat OpenStack Services on OpenShift / OSPRH-12950

BZ#2319438 HAProxy container failures during the minor upgrade procedure are ignored by TripleO, but block cluster operations later

    • Moderate

      Description of problem:
      Two different customers recently reported the same problem blocking minor upgrades of RHOSP 16.2 deployments. HAProxy failed to start after the pacemaker service was stopped and started. Unused podman images were then removed, and the haproxy image was removed along with them. Pacemaker then tried to start haproxy again when another controller node was updated. As a result, some of the haproxy services are unable to start because the required image is missing, and the cluster is in an unhealthy state:

      haproxy-bundle-podman-1_start_0 on controller-0 'error' (1): call=140, status='complete', exitreason='failed to pull image cluster.common.tag/openstack-haproxy:pcmklatest', last-rc-change='2024-08-27 20:17:10Z', queued=0ms, exec=420ms
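
      For triage, the failed-action line above can be split into its fields so that image pull failures are easy to tell apart from other startup errors. The following is a minimal sketch, assuming the line keeps the format quoted above; the names parse_failed_action and is_image_pull_failure are hypothetical helpers, not part of Pacemaker or TripleO, and other Pacemaker versions may format this line differently.

```python
import re

# Leading part of the line: "<operation> on <node> '<result>' (<rc>): <details>".
# The pattern is derived only from the single line quoted in this report.
FAILED_ACTION = re.compile(
    r"^\s*(?P<operation>\S+) on (?P<node>\S+) "
    r"'(?P<result>[^']*)' \((?P<rc>\d+)\): (?P<details>.*)$"
)

# key=value pairs in the details part; values are either 'quoted' or bare.
KEY_VALUE = re.compile(
    r"(?P<key>[\w-]+)=(?:'(?P<quoted>[^']*)'|(?P<bare>[^,\s]+))"
)


def parse_failed_action(line):
    """Return the fields of a failed-action line as a dict, or None if it does not match."""
    match = FAILED_ACTION.match(line)
    if match is None:
        return None
    fields = {
        "operation": match["operation"],
        "node": match["node"],
        "result": match["result"],
        "rc": int(match["rc"]),
    }
    for pair in KEY_VALUE.finditer(match["details"]):
        value = pair["quoted"] if pair["quoted"] is not None else pair["bare"]
        fields[pair["key"]] = value
    return fields


def is_image_pull_failure(fields):
    """True when the exit reason reports that the container image could not be pulled."""
    return "failed to pull image" in fields.get("exitreason", "")
```

      Applied to the line quoted above, parse_failed_action yields the node, the operation name, and the exitreason text, and is_image_pull_failure returns True for it.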

      The HAProxy startup failures differed between the two cases, so IMO it is right to focus on DF here. The problem can also be investigated from the HAProxy perspective.

      Version-Release number of selected component (if applicable): RHOSP 16.2

      How reproducible: sporadically, during upgrade

      Actual results: the upgrade is not interrupted if haproxy fails to start properly, and the subsequent sequence of calls breaks the haproxy bundle by removing the image it needs
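
      To make the missing check concrete, here is a minimal sketch of a guard that could run before the image cleanup step. It is an illustration only, not TripleO's implementation and not a proposed fix: the function name image_cleanup_blockers and its three inputs are hypothetical, and how the failed resources and image lists would be collected on a real controller is deliberately left out.

```python
def image_cleanup_blockers(failed_resources, protected_images, images_to_remove):
    """Return human-readable reasons why image cleanup should not run yet.

    failed_resources:  names of cluster resources currently in a failed state
    protected_images:  images that cluster resources still need in order to start
    images_to_remove:  images the cleanup step is about to delete
    An empty result means no blocker was found by this (hypothetical) check.
    """
    blockers = []
    # A failed resource means the earlier restart did not succeed, so the
    # procedure should stop instead of continuing to the cleanup.
    for resource in sorted(failed_resources):
        blockers.append(f"resource {resource} is in a failed state")
    # Removing an image that a resource still needs leaves that resource
    # unable to start on the next attempt.
    for image in sorted(set(protected_images) & set(images_to_remove)):
        blockers.append(f"image {image} is still required by a cluster resource")
    return blockers
```

      In the scenario described above, both conditions would have been reported: the haproxy bundle replica was already failed, and the haproxy image was among the images about to be removed.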

      Expected results: engineering knows better how to handle this

              rhn-engineering-lbezdick Lukas Bezdicka
              jira-bugzilla-migration RH Bugzilla Integration
              Archana Singh Archana Singh
              rhos-dfg-upgrades