Type: Task
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Story Points: 3
Epic Link: Finalize sequential major upgrade testing procedure up to 17.1
Blocked: False
Blocked Reason: None
Ready: False
Color Status: Not Selected
AssignedTeam: rhos-ops-day1day2-upgrades
Release Note Text:

# Known Issue: Image Tagging Failure During OpenStack Upgrade

## Observed Error
```
2025-11-27 09:56:32 | 2025-11-27 09:56:32.158531 | 52540032-4698-d560-1726-00000000024a | FATAL | Tag cluster.common.tag/cinder-backup:pcmklatest to latest cluster.common.tag/openstack-cinder-backup:pcmklatest image | controller-1 | error={"changed": true, "cmd": "podman tag cluster.common.tag/openstack-cinder-backup:pcmklatest cluster.common.tag/cinder-backup:pcmklatest", "delta": "0:00:00.082326", "end": "2025-11-27 09:56:32.129833", "msg": "non-zero return code", "rc": 125, "start": "2025-11-27 09:56:32.047507", "stderr": "Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known", "stderr_lines": ["Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known"], "stdout": "", "stdout_lines": []}
```

## Root Cause
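The Pacemaker (PCS) resources cinder-volume and cinder-backup run on only one node at a time, and Pacemaker starts them on another node only after a failover. As a result, only one node has a running container that uses the `pcmklatest`-tagged image. An upgrade or update ends with a post-operation that cleans up container images, and on the other nodes that cleanup appears to remove the tagged image because no running container references it. The subsequent `podman tag` task then fails on those nodes with `image not known`.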
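
One way to confirm this on a given node is to check whether the `pcmklatest` tags are still present locally. The sketch below assumes the `cluster.common.tag/openstack-<service>:pcmklatest` naming seen in the error above. `missing_pcmk_tags` is a hypothetical helper name, not part of podman or TripleO. It relies only on `podman image exists`, which exits non-zero when the image is not in local storage.

```shell
#!/usr/bin/env bash
# Sketch only: missing_pcmk_tags is a hypothetical helper for illustration.
# Prints every cluster.common.tag/openstack-<service>:pcmklatest reference
# that is absent from local storage; returns 1 if any is missing.
missing_pcmk_tags() {
    local svc ref rc=0
    for svc in "$@"; do
        # Assumed naming, taken from the error message in this issue.
        ref="cluster.common.tag/openstack-${svc}:pcmklatest"
        if ! podman image exists "$ref"; then
            echo "$ref"
            rc=1
        fi
    done
    return "$rc"
}

# Example call, to be run on each controller:
#   missing_pcmk_tags cinder-volume cinder-backup
```

A node that prints a reference here would be expected to hit the failure above when the tagging task runs.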

## Workaround
1. Identify the node on which the cinder-volume and cinder-backup services are currently running.
   ```
   pcs status
   ```

2. On the node that is running cinder-volume, look at the list of images.
   ```
   podman image list
   ```

3. In the output, the `cluster.common.tag/openstack-cinder-volume` repository with tag `pcmklatest` has the same image ID as the original image, for example:
   `undercloud-0.ctlplane.redhat.local:8787/rhosp-rhel8/openstack-cinder-volume` with example tag `16.2_20251118.1`

4. On each of the remaining nodes, create the `cluster.common.tag/openstack-cinder-volume:pcmklatest` tag so that it points to the same image.
   ```
   podman tag undercloud-0.ctlplane.redhat.local:8787/rhosp-rhel8/openstack-cinder-volume:16.2_20251118.1 cluster.common.tag/openstack-cinder-volume:pcmklatest
   ```
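
Steps 2 to 4 can be sketched as a small script. This is a minimal sketch, not a supported tool: `resolve_source_image` is a hypothetical helper that reads lines of `<image-id> <repository>:<tag>` and prints the first reference outside `cluster.common.tag/` that shares an image ID with the given `pcmklatest` reference. It assumes the original image is still listed under its registry name on the node that runs the service.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_source_image is a hypothetical helper for illustration.
# stdin : lines of "<image-id> <repository>:<tag>", for example from
#         podman image list --format '{{.ID}} {{.Repository}}:{{.Tag}}'
# $1    : the pcmklatest reference to resolve
# stdout: the first reference outside cluster.common.tag/ with the same ID
# Returns 1 if the pcmklatest reference or a matching source is not found.
resolve_source_image() {
    local pcmk_ref="$1"
    awk -v ref="$pcmk_ref" '
        { ids[NR] = $1; refs[NR] = $2 }
        $2 == ref { want = $1 }
        END {
            if (want == "") exit 1
            for (i = 1; i <= NR; i++) {
                if (ids[i] == want && refs[i] !~ /^cluster\.common\.tag\//) {
                    print refs[i]
                    exit 0
                }
            }
            exit 1
        }
    '
}

# On the node that runs cinder-volume:
#   src=$(podman image list --format '{{.ID}} {{.Repository}}:{{.Tag}}' \
#         | resolve_source_image cluster.common.tag/openstack-cinder-volume:pcmklatest)
# On each remaining node, with $src copied over:
#   podman tag "$src" cluster.common.tag/openstack-cinder-volume:pcmklatest
```

Keeping the parsing in a function that reads standard input means it can be checked against saved `podman image list` output before it is run on a controller.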

## Related Jira Issue
[OSPRH-22122](https://issues.redhat.com/browse/OSPRH-22122)

Release Note Type: Known Issue
Sprint: RHOS Upgrades 2025 Sprint 18, RHOS Upgrades 2025 Sprint 19
sprint_count: 2

Goal:

Debug the failure during 16.2->17.1 overcloud upgrade:

https://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-upgrades-ffu-17.1-from-16.2-latest_cdn-3cont_3db_3msg_2net_3hci-ipv6-ovs_dvr/48/undercloud-0/home/stack/overcloud_upgrade_run-computehci-0,computehci-1,computehci-2,controller-0,controller-1,controller-2,database-0,database-1,database-2,messaging-0,messaging-1,messaging-2,networker-0,networker-1,undercloud.log.gz

FATAL | Tag cluster.common.tag/cinder-backup:pcmklatest to latest cluster.common.tag/openstack-cinder-backup:pcmklatest image | controller-2 | error={"changed": true, "cmd": "podman tag cluster.common.tag/openstack-cinder-backup:pcmklatest cluster.common.tag/cinder-backup:pcmklatest", "delta": "0:00:00.090809", "end": "2025-11-16 22:36:25.397058", "msg": "non-zero return code", "rc": 125, "start": "2025-11-16 22:36:25.306249", "stderr": "Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known", "stderr_lines": ["Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known"], "stdout": "", "stdout_lines": []}

Acceptance Criteria:

The overcloud upgrade completes without the intermittent image tag failures.

causes

OSPRH-23127 Document known issue and it's remedy

Closed

links to

openstack-k8s-operators/edpm-ansible#1087: edpm-podman: Improve image cleanup logic to preserve deployment images

tripleo-podman: Improve image cleanup logic to preserve deployment images

Assignee: Lukas Bezdicka
Reporter: Archana Singh
Team: rhos-dfg-upgrades
Votes: 0
Watchers: 2
Created: 2025/11/18 10:47 AM
Updated: 2026/01/12 11:28 AM
Resolved: 2026/01/12 11:28 AM
