Type: Bug
Resolution: Unresolved
Priority: Normal
Severity: Moderate
Target release: rhos-17.1.z
Version: openstack-tripleo-validations-14.3.2-17.1.20250120160809.2b526f8.el9osttrunk
In support case https://access.redhat.com/support/cases/#/case/04021756 (see also [1]), a customer got stuck upgrading from RHCSv4 to RHCSv5. The workaround was for them to `podman pull` the ceph container image on their undercloud.
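For context, that workaround amounts to pre-pulling the image onto the undercloud host before re-running the upgrade step. A minimal sketch, where the registry, image name, and tag are placeholders (the real values come from the environment's container image configuration):

(undercloud) [stack@lab-sp-director ~]$ sudo podman pull satellite.example.com/rhceph/rhceph-5-rhel8:latest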
In step 3 of the following chapter, a specific validation playbook task fails like this:
(undercloud) [stack@lab-sp-director ~]$ openstack overcloud external-upgrade run --skip-tags ceph_ansible_remote_tmp --stack overcloud --tags cephadm_adopt 2>&1 -y
### output omitted ###
2024-12-03 18:01:48.581362 | 005056a3-3ff7-c9b1-f534-0000000000b3 | TASK | Check for valid ceph version during FFU
2024-12-03 18:01:48.669910 | 005056a3-3ff7-c9b1-f534-0000000000b3 | FATAL | Check for valid ceph version during FFU | undercloud | error={"changed": false, "msg": "Target ceph version cannot be for FFU."}
### output omitted ###
That Ansible task was introduced by this change:
https://code.engineering.redhat.com/gerrit/c/openstack-tripleo-validations/+/450495
It was put in place to avoid this scenario:
https://bugzilla.redhat.com/show_bug.cgi?id=2259286
The tasks file `roles/ceph/tasks/ceph-upgrade-version-check.yaml` assumes that the first task below will always succeed:
- name: Get Ceph version
  shell: "{{ container_client | default('podman') }} run --rm --entrypoint=ceph {{ ceph_container }} -v | awk '{print $5}'"
  register: ceph_version
  become: true
  vars:
    ceph_container: "{{ tripleo_cephadm_container_ns }}/{{ tripleo_cephadm_container_image }}:{{ tripleo_cephadm_container_tag }}"

- name: Check for valid ceph version during FFU
  fail:
    msg: "Target ceph version cannot be {{ ceph_version.stdout }} for FFU."
  when:
    - ceph_version.stdout != 'pacific'
The code above was tested, but only in a scenario where the ceph container image is already present on the undercloud, which is our default scenario. However, if the customer is using Satellite to host the image, the first task above does not populate ceph_version.stdout with a Ceph version string: the `podman run` produces no version output, and because it is piped into awk the pipeline still exits 0, so the task "succeeds" with an empty stdout.
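A quick way to confirm that scenario is to check on the undercloud whether the image the validation expects is actually present locally; a hedged example, with the image reference as a placeholder (exit status 0 means the image is local, 1 means it is not):

(undercloud) [stack@lab-sp-director ~]$ sudo podman image exists <namespace>/<image>:<tag>; echo $?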
This tasks file should be able to handle that case: specifically, it should handle the case where ceph_version.stdout, with all whitespace removed, is empty. If it had at least output an error like
"Unable to determine ceph version by running $CMD on $HOST",
that alone would have given the customer insight into why the validation had failed.
It's up to the person fixing this bug how to handle that. Perhaps start with the above, but also add something which tries to `podman pull` the container (see the sketch below)?
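A minimal sketch of what that could look like, reusing the variables from the existing tasks file; the task names, the `failed_when: false` pull attempt, and the wording of the error message are only illustrative assumptions, not the actual fix:

# Best-effort attempt to pull the image so the version check below has something to run against
- name: Try to pull the ceph container image onto the undercloud
  shell: "{{ container_client | default('podman') }} pull {{ ceph_container }}"
  register: ceph_pull
  failed_when: false
  become: true
  vars:
    ceph_container: "{{ tripleo_cephadm_container_ns }}/{{ tripleo_cephadm_container_image }}:{{ tripleo_cephadm_container_tag }}"

# ... existing "Get Ceph version" task runs here, unchanged ...

# Fail loudly when no version string could be determined, instead of printing an empty version
- name: Fail with a clear message when no ceph version could be determined
  fail:
    msg: >-
      Unable to determine ceph version by running
      '{{ container_client | default('podman') }} run --rm --entrypoint=ceph {{ ceph_container }} -v'
      on {{ inventory_hostname }}. Make sure the ceph container image is available on the undercloud.
  when: ceph_version.stdout | trim | length == 0
  vars:
    ceph_container: "{{ tripleo_cephadm_container_ns }}/{{ tripleo_cephadm_container_image }}:{{ tripleo_cephadm_container_tag }}"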
Also, the docs impact of this bug is that the documentation referenced above should point out that when Satellite is used to host the container image, a copy of the ceph container still needs to be on the undercloud.
[1] https://groups.google.com/a/redhat.com/g/rhos-tech/c/NaAG9OKL4f4/m/G-QK8mDOBwAJ
(rhos-tech@ thread with subject "RHOSP 16.2 to 17.1 upgrade with Ceph deployed through Director failing")