  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-12739

FFU tripleo-validation for ceph version always fails when Satellite hosts container


    • Bug
    • Resolution: Unresolved
    • Normal
    • None
    • rhos-17.1.z
    • validations-common
    • None
    • 5
    • Moderate

      In support case https://access.redhat.com/support/cases/#/case/04021756 (see also [1]), a customer got stuck upgrading from RHCS 4 to RHCS 5. The workaround was for them to `podman pull` the ceph container from their undercloud.

      In step 3 of the following chapter:

      https://docs.redhat.com/en/documentation/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index#upgrading-to-ceph-storage-5-upgrading-ceph

      A specific validation playbook task fails like this:

      (undercloud) [stack@lab-sp-director ~]$ openstack overcloud external-upgrade run --skip-tags ceph_ansible_remote_tmp --stack overcloud --tags cephadm_adopt  2>&1 -y
      
      ### output omitted ###
      2024-12-03 18:01:48.581362 | 005056a3-3ff7-c9b1-f534-0000000000b3 |       TASK | Check for valid ceph version during FFU                                                                                          
      2024-12-03 18:01:48.669910 | 005056a3-3ff7-c9b1-f534-0000000000b3 |      FATAL | Check for valid ceph version during FFU | undercloud | error={"changed": false, "msg": "Target ceph version cannot be  for FFU."}
      ### output omitted ###
      

      That ansible task was introduced with this:

      https://code.engineering.redhat.com/gerrit/c/openstack-tripleo-validations/+/450495

      It was put in place to avoid this scenario:

      https://bugzilla.redhat.com/show_bug.cgi?id=2259286

      The check in `roles/ceph/tasks/ceph-upgrade-version-check.yaml` assumes that the first task below always succeeds.

      - name: Get Ceph version
        shell: "{{ container_client | default('podman') }} run --rm --entrypoint=ceph {{ ceph_container }} -v | awk '{print $5}'"
        register: ceph_version
        become: true
        vars: 
          ceph_container: "{{ tripleo_cephadm_container_ns }}/{{ tripleo_cephadm_container_image }}:{{ tripleo_cephadm_container_tag }}"
      
      - name: Check for valid ceph version during FFU
        fail: 
          msg: "Target ceph version cannot be {{ ceph_version.stdout }} for FFU."
        when: 
          - ceph_version.stdout != 'pacific'
      

      The code above was tested, but only in the scenario where the ceph container is already on the undercloud, which is our default. However, if the customer uses Satellite to host the image, the first task above does not populate `ceph_version.stdout` with a ceph version string.

      This tasks file should handle that case. In particular, it should detect when `ceph_version.stdout` is empty once all whitespace is removed. At a minimum, it could output an error like this:

      "Unable to determine ceph version by running $CMD on $HOST".

      That alone would have given the customer insight into why the validation had failed.
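
      A minimal sketch of such a guard, assuming it is placed between the two existing tasks. The task name and message wording are illustrative, not the actual fix. Note that because the `shell` pipeline ends in `awk`, the first task's return code is that of `awk`, so a failing `podman run` does not fail the task by itself; the sketch therefore tests the registered output instead.

      ```yaml
      # Hypothetical guard: runs after "Get Ceph version" and before
      # "Check for valid ceph version during FFU".
      - name: Fail early if the ceph version could not be determined
        fail:
          msg: >-
            Unable to determine ceph version by running
            '{{ ceph_version.cmd | default("the version command") }}'
            on {{ inventory_hostname }}.
            stderr: {{ ceph_version.stderr | default('') }}
        when:
          # Empty or whitespace-only stdout means no version string was produced.
          - ceph_version.stdout | default('') | trim | length == 0
      ```

      With this in place, the existing "cannot be ... for FFU" message would only be reached when a version string was actually returned.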

      It's up to the person fixing this bug how to handle that. Perhaps start with the above but also add something which tries to `podman pull` the container?
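
      One possible shape for the pull attempt, as a sketch only. It assumes the task is placed before "Get Ceph version" and that the undercloud can authenticate to the registry serving the image, which may not hold for every Satellite setup. The task name is hypothetical; the variables are the ones already used in the tasks file.

      ```yaml
      # Hypothetical pre-step: make sure the image is present locally
      # before trying to run `ceph -v` inside it.
      - name: Pull the ceph container onto the undercloud
        command: "{{ container_client | default('podman') }} pull {{ ceph_container }}"
        become: true
        vars:
          ceph_container: "{{ tripleo_cephadm_container_ns }}/{{ tripleo_cephadm_container_image }}:{{ tripleo_cephadm_container_tag }}"
      ```

      If the pull fails, Ansible reports the command's stderr, which already tells the operator more than the current message does. Whether a failed pull should be fatal is left to whoever implements the fix.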

      The docs impact of this bug is that the documentation referenced above should point out that, when Satellite is used to host the container image, a copy of the ceph container still needs to be on the undercloud.

      [1] https://groups.google.com/a/redhat.com/g/rhos-tech/c/NaAG9OKL4f4/m/G-QK8mDOBwAJ

      rhos-tech@ subject "RHOSP 16.2 to 17.1 upgrade with Ceph deployed through Director failing"

              rh-ee-mkatari Manoj Katari
              rhn-support-johfulto John Fulton
              Alexon Ferreira de Oliveira, Erin Peterson
              rhos-dfg-storage-squad-ceph