Task
Resolution: Done
Major
rhos-ops-day1day2-upgrades
Known Issue
RHOS Upgrades 2025 Sprint 18, RHOS Upgrades 2025 Sprint 19
Goal:
Debug the failure during the 16.2 -> 17.1 overcloud upgrade:
FATAL | Tag cluster.common.tag/cinder-backup:pcmklatest to latest cluster.common.tag/openstack-cinder-backup:pcmklatest image | controller-2 | error={"changed": true, "cmd": "podman tag cluster.common.tag/openstack-cinder-backup:pcmklatest cluster.common.tag/cinder-backup:pcmklatest", "delta": "0:00:00.090809", "end": "2025-11-16 22:36:25.397058", "msg": "non-zero return code", "rc": 125, "start": "2025-11-16 22:36:25.306249", "stderr": "Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known", "stderr_lines": ["Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known"], "stdout": "", "stdout_lines": []}
Acceptance Criteria:
- The overcloud upgrade completes without failing on the image tagging step.
Known Issue: Image Tagging Failure During OpenStack Upgrade
Observed Error
2025-11-27 09:56:32 | 2025-11-27 09:56:32.158531 | 52540032-4698-d560-1726-00000000024a | FATAL | Tag cluster.common.tag/cinder-backup:pcmklatest to latest cluster.common.tag/openstack-cinder-backup:pcmklatest image | controller-1 | error={"changed": true, "cmd": "podman tag cluster.common.tag/openstack-cinder-backup:pcmklatest cluster.common.tag/cinder-backup:pcmklatest", "delta": "0:00:00.082326", "end": "2025-11-27 09:56:32.129833", "msg": "non-zero return code", "rc": 125, "start": "2025-11-27 09:56:32.047507", "stderr": "Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known", "stderr_lines": ["Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known"], "stdout": "", "stdout_lines": []}
Root Cause
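This happens because the Pacemaker (PCS) resources cinder-volume and cinder-backup run on only one node at a time. Pacemaker starts them on another node only when there is a failover. As a result, only one node has a running container that uses the pcmklatest-tagged image. When an upgrade or update runs, a post-operation step cleans up container images. On the nodes where the service is not running, the cluster.common.tag/openstack-cinder-*:pcmklatest image is not used by any container, so it is removed. The later "podman tag" task then fails on those nodes with "image not known".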
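The failure sequence can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not the upgrade tooling's code: the real cleanup is only described as "cleanup of container images", and the model assumes it keeps only images used by running containers. The node name controller-0 and the single-source setup are illustrative.

```python
"""Toy model of the root cause (assumed behaviour, not the real tooling)."""
from dataclasses import dataclass, field
from typing import Dict, Set

SOURCE = "cluster.common.tag/openstack-cinder-backup:pcmklatest"
TARGET = "cluster.common.tag/cinder-backup:pcmklatest"


@dataclass
class Node:
    name: str
    images: Set[str] = field(default_factory=set)
    running: Set[str] = field(default_factory=set)  # images used by running containers

    def cleanup_images(self) -> None:
        # Assumption: the post-upgrade/update cleanup drops every image
        # that no running container uses.
        self.images &= self.running

    def tag(self, source: str, target: str) -> None:
        # Mimics `podman tag` failing when the source image is missing.
        if source not in self.images:
            raise LookupError(f"{source}: image not known")
        self.images.add(target)


def simulate(active: str,
             names=("controller-0", "controller-1", "controller-2")) -> Dict[str, str]:
    """Run cleanup then the tag task on every node; return per-node outcome."""
    nodes = [Node(n, images={SOURCE}) for n in names]
    for node in nodes:
        if node.name == active:  # Pacemaker runs cinder-backup on one node only
            node.running.add(SOURCE)
        node.cleanup_images()
    results = {}
    for node in nodes:
        try:
            node.tag(SOURCE, TARGET)
            results[node.name] = "ok"
        except LookupError as err:
            results[node.name] = f"FATAL: {err}"
    return results


for name, outcome in simulate("controller-0").items():
    print(name, outcome)
```

With cinder-backup active on controller-0, the model reports "ok" for controller-0 and the same "image not known" error as the logs above for controller-1 and controller-2.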
Workaround
1. Find out which node is running the cinder-volume and cinder-backup services.
pcs status
2. On the node that is running cinder-volume, list the local images.
podman image list
3. In the output, the cluster.common.tag/openstack-cinder-volume image with the pcmklatest tag points to the same image ID as a versioned image, for example:
undercloud-0.ctlplane.redhat.local:8787/rhosp-rhel8/openstack-cinder-volume with tag 16.2_20251118.1
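4. On the remaining nodes, create the cluster.common.tag/openstack-cinder-volume:pcmklatest tag.
podman tag undercloud-0.ctlplane.redhat.local:8787/rhosp-rhel8/openstack-cinder-volume:16.2_20251118.1 cluster.common.tag/openstack-cinder-volume:pcmklatest
5. Repeat steps 2 to 4 for cinder-backup, using the node that runs cinder-backup and the openstack-cinder-backup image. The observed error is for cluster.common.tag/openstack-cinder-backup:pcmklatest.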
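Step 4 can be wrapped in a small helper that runs on each remaining controller. This is a hypothetical sketch, not part of any shipped tooling. It assumes that the versioned image found in step 3 is still in local storage on that node and that the script name retag_pcmklatest.py is your choice. It only uses `podman image exists` and `podman tag`, defaults to a dry run, and avoids Python 3.8+ features so it also runs on RHEL 8 nodes.

```python
#!/usr/bin/env python3
"""Workaround sketch: re-create a missing pcmklatest tag on one controller.

Hypothetical helper. Assumptions:
  * it runs on a controller that is NOT running the service;
  * SOURCE_IMAGE is the versioned reference found in step 3 and is present
    in local storage on this node;
  * it is a dry run unless --apply is given.
"""
import shlex
import subprocess
import sys
from typing import List

TAG_PREFIX = "cluster.common.tag/openstack-"


def pcmklatest_ref(service: str) -> str:
    """Tag the Pacemaker bundle expects, e.g. for 'cinder-volume'."""
    return TAG_PREFIX + service + ":pcmklatest"


def image_exists(ref: str) -> bool:
    """`podman image exists` exits 0 if the image is in local storage."""
    return subprocess.run(["podman", "image", "exists", ref]).returncode == 0


def tag_command(service: str, source_image: str) -> List[str]:
    """Build the `podman tag` command from step 4."""
    return ["podman", "tag", source_image, pcmklatest_ref(service)]


def main(argv: List[str]) -> int:
    apply = "--apply" in argv
    args = [a for a in argv if a != "--apply"]
    if len(args) != 2:
        print("usage: retag_pcmklatest.py [--apply] SERVICE SOURCE_IMAGE",
              file=sys.stderr)
        return 2
    service, source_image = args
    target = pcmklatest_ref(service)
    if image_exists(target):
        print(target + " already exists, nothing to do")
        return 0
    if not image_exists(source_image):
        print(source_image + " is not in local storage; pull it first",
              file=sys.stderr)
        return 1
    cmd = tag_command(service, source_image)
    print(("running: " if apply else "dry run: ")
          + " ".join(shlex.quote(c) for c in cmd))
    return subprocess.run(cmd).returncode if apply else 0


# Act only when invoked with arguments; with none, the script does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

For example, `python3 retag_pcmklatest.py cinder-volume undercloud-0.ctlplane.redhat.local:8787/rhosp-rhel8/openstack-cinder-volume:16.2_20251118.1` prints the command it would run, and adding `--apply` runs it. Use `cinder-backup` and the matching openstack-cinder-backup image for the backup service.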
Related Jira Issue
- causes: OSPRH-23127 Document known issue and it's remedy (Closed)