-
Bug
-
Resolution: Unresolved
-
Normal
-
rhos-16.2.z
Description of problem:
Two different customers recently reported same problem blocking minor upgrade of RHOSP 16.2 deployments: haproxy failed to start after pacemaker service was stopped and started, then unused podman images were removed and haproxy image was removed as well, then pacemaker tried to start haproxy again when another controller node was updated. As a result, some of haproxy services are unable to start because there is no proper image and cluster is unhealthy state:
haproxy-bundle-podman-1_start_0 on controller-0 'error' (1): call=140, status='complete', exitreason='failed to pull image cluster.common.tag/openstack-haproxy:pcmklatest', last-rc-change='2024-08-27 20:17:10Z', queued=0ms, exec=420ms
HAProxy startup failures were different in two cases, so IMO it is right to focus on DF here. But this problem also can be investigated from HAProxy perspective.
Version-Release number of selected component (if applicable): RHOSP 16.2
How reproducible: sporadically reproduced during upgrade
Actual results: upgrade is not interrupted if haproxy fails to start properly, and next sequence of calls breaks haproxy bundle after removing proper image
Expected results: engineering knows better how to handle this
- external trackers