Red Hat OpenStack Services on OpenShift / OSPRH-22122

Debug the image tagging failure for cinder-backup during OC upgrade


    • Task
    • Resolution: Done
    • Major
    • 3
    • False
    • False
    • Not Selected
    • rhos-ops-day1day2-upgrades
      # Known Issue: Image Tagging Failure During OpenStack Upgrade

      ## Observed Error
      ```
      2025-11-27 09:56:32 | 2025-11-27 09:56:32.158531 | 52540032-4698-d560-1726-00000000024a | FATAL | Tag cluster.common.tag/cinder-backup:pcmklatest to latest cluster.common.tag/openstack-cinder-backup:pcmklatest image | controller-1 | error={"changed": true, "cmd": "podman tag cluster.common.tag/openstack-cinder-backup:pcmklatest cluster.common.tag/cinder-backup:pcmklatest", "delta": "0:00:00.082326", "end": "2025-11-27 09:56:32.129833", "msg": "non-zero return code", "rc": 125, "start": "2025-11-27 09:56:32.047507", "stderr": "Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known", "stderr_lines": ["Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known"], "stdout": "", "stdout_lines": []}
      ```

      ## Root Cause
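      The Pacemaker (PCS) resources cinder-volume and cinder-backup run on only one node at a time; Pacemaker starts them on another node only after a failover. As a result, only one node has a running container that uses the tagged image. An upgrade or update ends with a post-operation that cleans up container images. On nodes where no container is using the image, that cleanup can apparently remove it, so a later `podman tag` of the `pcmklatest` image fails there with `image not known`.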
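
      To confirm which nodes are affected, a check along the following lines can be run on each controller. This is a minimal sketch: the function name `check_pcmklatest_tags` is hypothetical, and the image naming pattern `cluster.common.tag/openstack-<service>:pcmklatest` is assumed from the error message above.

      ```shell
      # Report whether the pcmklatest tag of each given service is present in
      # local container storage on this node. Returns non-zero if any is missing.
      # Assumption: images are named cluster.common.tag/openstack-<service>:pcmklatest.
      check_pcmklatest_tags() {
          local svc ref missing=0
          for svc in "$@"; do
              ref="cluster.common.tag/openstack-${svc}:pcmklatest"
              # `podman image exists` exits 0 when the image is in local storage.
              if podman image exists "$ref"; then
                  echo "present: $ref"
              else
                  echo "missing: $ref"
                  missing=1
              fi
          done
          return "$missing"
      }
      ```

      For example, `check_pcmklatest_tags cinder-volume cinder-backup` prints one line per image; a node that reports `missing` is a candidate for the workaround.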

      ## Workaround
      1. Identify the node on which the cinder-volume and cinder-backup services are running.
         ```
         pcs status
         ```

      2. On the node that is running cinder-volume, list the container images.
         ```
         podman image list
         ```

      3. The output shows `cluster.common.tag/openstack-cinder-volume` with the tag `pcmklatest`, which points to the same image as a registry name, for example:
         `undercloud-0.ctlplane.redhat.local:8787/rhosp-rhel8/openstack-cinder-volume` with the tag `16.2_20251118.1`.

      4. On the remaining nodes, create the `cluster.common.tag/openstack-cinder-volume:pcmklatest` tag.
         ```
         podman tag undercloud-0.ctlplane.redhat.local:8787/rhosp-rhel8/openstack-cinder-volume:16.2_20251118.1 cluster.common.tag/openstack-cinder-volume:pcmklatest
         ```
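
      Steps 3 and 4 can be sketched as two small helpers. The function names are hypothetical and the sketch has not been validated against a real overcloud. The service name is a parameter, so the same approach might be applied to cinder-backup, the image named in the observed error, on the assumption that its tag is handled the same way.

      ```shell
      # Step 3 (run on the node where the service is active): print the other
      # names of the image that the pcmklatest tag points to.
      resolve_pcmklatest_source() {
          local ref="cluster.common.tag/openstack-$1:pcmklatest"
          # List every tag of the image, one per line, then drop the
          # cluster.common.tag alias itself and any blank line.
          podman image inspect --format '{{range .RepoTags}}{{println .}}{{end}}' "$ref" \
              | grep -v -e '^cluster\.common\.tag/' -e '^$'
      }

      # Step 4 (run on each remaining node): create the pcmklatest tag from the
      # given source image unless the tag already exists locally.
      ensure_pcmklatest_tag() {
          local source_ref="$1"
          local target_ref="cluster.common.tag/openstack-$2:pcmklatest"
          if podman image exists "$target_ref"; then
              echo "already tagged: $target_ref"
              return 0
          fi
          podman tag "$source_ref" "$target_ref" \
              && echo "tagged: $source_ref -> $target_ref"
      }
      ```

      With the example values from step 3, the call on a remaining node would be `ensure_pcmklatest_tag undercloud-0.ctlplane.redhat.local:8787/rhosp-rhel8/openstack-cinder-volume:16.2_20251118.1 cinder-volume`. The existence check makes the helper safe to run more than once on the same node.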

      ## Related Jira Issue
      [OSPRH-22122](https://issues.redhat.com/browse/OSPRH-22122)
    • Known Issue
    • RHOS Upgrades 2025 Sprint 18, RHOS Upgrades 2025 Sprint 19
    • 2

      Goal:

      Debug the failure during 16.2->17.1 overcloud upgrade:

      https://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/staging/DFG-upgrades-ffu-17.1-from-16.2-latest_cdn-3cont_3db_3msg_2net_3hci-ipv6-ovs_dvr/48/undercloud-0/home/stack/overcloud_upgrade_run-computehci-0,computehci-1,computehci-2,controller-0,controller-1,controller-2,database-0,database-1,database-2,messaging-0,messaging-1,messaging-2,networker-0,networker-1,undercloud.log.gz

      FATAL | Tag cluster.common.tag/cinder-backup:pcmklatest to latest cluster.common.tag/openstack-cinder-backup:pcmklatest image | controller-2 | error={"changed": true, "cmd": "podman tag cluster.common.tag/openstack-cinder-backup:pcmklatest cluster.common.tag/cinder-backup:pcmklatest", "delta": "0:00:00.090809", "end": "2025-11-16 22:36:25.397058", "msg": "non-zero return code", "rc": 125, "start": "2025-11-16 22:36:25.306249", "stderr": "Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known", "stderr_lines": ["Error: cluster.common.tag/openstack-cinder-backup:pcmklatest: image not known"], "stdout": "", "stdout_lines": []}
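
      Because the failure hits different controllers on different runs, it can help to list every occurrence in an upgrade log. The following is a minimal sketch that assumes the ` | `-separated line layout shown above; `list_tag_failures` is a hypothetical helper name, and the log is read from standard input (for a compressed log, pipe it through `zcat` first).

      ```shell
      # Print "<node> <target tag>" for each failed image tagging task found
      # in an upgrade log read from standard input.
      list_tag_failures() {
          awk -F' [|] ' '/FATAL/ && /image not known/ {
              for (i = 1; i < NF; i++) {
                  if ($i ~ /^Tag /) {
                      split($i, words, " ")   # words[2] is the tag being created
                      print $(i + 1), words[2]
                  }
              }
          }'
      }
      ```

      For the line above, the output would be `controller-2 cluster.common.tag/cinder-backup:pcmklatest`.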

      Acceptance Criteria:

      • The overcloud upgrade passes without the random image tag failures.


              rhn-engineering-lbezdick Lukas Bezdicka
              arcsingh@redhat.com Archana Singh
              rhos-dfg-upgrades