Red Hat OpenStack Services on OpenShift / OSPRH-13230

FFU upgrade from 16.2.6 to 17.1.4 failed on a wrong running version of the galera container (16.2 instead of 17.1)


      The command "openstack overcloud upgrade run --yes --stack overcloud --debug --limit allovercloud,undercloud --playbook all" fails with the following error:

      ~~~
       FATAL | List all DB users that match the DB users to be dropped | control0001-cdm | error=

      {"changed": true, "cmd": "for u in cinder glance heat keystone neutron nova placement; do podman exec -u root -it \"d7e342f142e0\" mysql -sNe \"select concat('\\`',user,'\\`@\\`',host,'\\`') from mysql.user where user = '$u' and host != '%';\"; done", "delta": "0:00:01.923160", "end": "2025-01-20 14:24:15.165200", "msg": "non-zero return code", "rc": 1, "start": "2025-01-20 14:24:13.242040", "stderr": "", "stderr_lines": [], "stdout": "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)\r\nERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)\r\nERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)\r\nERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)\r\nERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)\r\nERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)\r\nERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)", "stdout_lines": ["ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)", "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)", "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)", "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)", "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)", "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)", "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)"]}

      ~~~

      On the controller nodes, we have multiple problems:

      MySQL (Galera) is not running:
        * Container bundle set: galera-bundle [cluster.common.tag/mariadb:pcmklatest]:
          * galera-bundle-0   (ocf::heartbeat:galera):         FAILED Master control0001-naz91 (blocked)
          * galera-bundle-1   (ocf::heartbeat:galera):         Slave control0001-lb
          * galera-bundle-2   (ocf::heartbeat:galera):         Slave control0001-cdm

      This is the mysql log from control0001-naz91:
      2025-01-20 14:40:09 0 [ERROR] InnoDB: Unsupported redo log format. The redo log was created with MariaDB 10.5.22.
      2025-01-20 14:40:09 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
      2025-01-20 14:40:09 0 [Note] InnoDB: Starting shutdown...
      2025-01-20 14:40:10 0 [ERROR] Plugin 'InnoDB' init function returned error.
      2025-01-20 14:40:10 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
      2025-01-20 14:40:10 0 [Note] Plugin 'FEEDBACK' is disabled.
      2025-01-20 14:40:10 0 [ERROR] Unknown/unsupported storage engine: innodb
      2025-01-20 14:40:10 0 [ERROR] Aborting
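
The redo log error suggests that the data directory was last written by a newer MariaDB than the one now trying to start. A minimal Python sketch of that check follows. It assumes both the InnoDB message and a mysqld startup banner (like the one quoted in the sosreport analysis) are present in the text it is given. The function and constant names are illustrative and not part of any Red Hat tooling.

```python
import re

# InnoDB message naming the version that wrote the redo log.
REDO_RE = re.compile(r"redo log was created with MariaDB (\d+(?:\.\d+)*)")
# mysqld startup banner, e.g. "(mysqld 10.3.32-MariaDB)".
BANNER_RE = re.compile(r"\(mysqld (\d+(?:\.\d+)*)-MariaDB\)")


def parse_version(text):
    """Turn '10.5.22' into (10, 5, 22) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_downgrade(log_text):
    """Return (redo_log_version, server_version) when the starting server
    is older than the version that wrote the redo log, else None."""
    redo = REDO_RE.search(log_text)
    banner = BANNER_RE.search(log_text)
    if redo is None or banner is None:
        return None  # not enough information in this log excerpt
    redo_version = parse_version(redo.group(1))
    server_version = parse_version(banner.group(1))
    if server_version < redo_version:
        return redo_version, server_version
    return None


# Two lines taken from the logs in this ticket.
SAMPLE_LOG = """\
2025-01-20 14:40:09 0 [ERROR] InnoDB: Unsupported redo log format. The redo log was created with MariaDB 10.5.22.
2025-01-20 14:40:09 0 [Note] /usr/libexec/mysqld (mysqld 10.3.32-MariaDB) starting as process 33089 ...
"""

mismatch = find_downgrade(SAMPLE_LOG)
if mismatch:
    print("server %s is older than redo log writer %s" % (mismatch[1], mismatch[0]))
```

Comparing versions as integer tuples avoids the string-comparison trap where "10.10" would sort before "10.5".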

      The internal analysis showed that the issue was caused by the galera container still running the old image version:

      ~~~
      The error shows that the redo log was written by MariaDB 10.5.22:

      2025-01-20 14:40:09 0 [ERROR] InnoDB: Unsupported redo log format. The redo log was created with MariaDB 10.5.22.

      But the MariaDB server that is starting is version 10.3.32-MariaDB:

      250120 14:40:10 mysqld_safe WSREP: Failed to recover position: '2025-01-20 14:40:09 0 [Note] /usr/libexec/mysqld (mysqld 10.3.32-MariaDB) starting as process 33089 ...

      From your sosreport, we see the running galera container is: 9f83d5650a0f

      control0001-naz91]$ less sos_commands/podman/podman_ps |grep galera
      9f83d5650a0f  cluster.common.tag/mariadb:pcmklatest                                                                                                                                                          /bin/bash /usr/lo...  3 hours ago  Up 3 hours ago              galera-bundle-podman-0

      The container is still running the 16.2.6 image:

      control0001-naz91]$ grep url sos_commands/podman/containers/podman_inspect_9f83d5650a0f 
                      "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp16/openstack-mariadb/images/16.2.6-18.1730685219",

      However, the tag cluster.common.tag/mariadb:pcmklatest should resolve to the 17.1 image. Image ID 86f0ac31f2e6 is the 17.1 image:

       cat sos_commands/podman/podman_images|grep maria
      wefali28.webfarm.bancaditalia.it:443/banca_d_italia_sddc-lab-openstack_17_1_composite-osp17_containers-mariadb                       17.1        86f0ac31f2e6  5 weeks ago    595 MB
      cluster.common.tag/mariadb                                                                                                           pcmklatest  86f0ac31f2e6  5 weeks ago    595 MB
      cluster.common.tag/banca_d_italia_sddc-lab-openstack_16_2-osp16_containers-mariadb                                                   pcmklatest  935c87814e32  2 months ago   781 MB
      sefaly05.utenze.bankit.it:443/banca_d_italia_sddc-lab-openstack_16_2-osp16_containers-mariadb                                        16.2.6      935c87814e32  2 months ago   781 MB
      ~~~
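
The image listing can be checked mechanically: every name that shares an image ID with cluster.common.tag/mariadb:pcmklatest shows what the tag currently resolves to. The following is a minimal Python sketch, assuming the listing has the repository, tag and image ID as its first three whitespace-separated columns, as in the sosreport output quoted here. The helper names are hypothetical.

```python
def images_by_id(listing):
    """Map image ID -> list of 'repository:tag' names parsed from
    `podman images` style text (first three columns only)."""
    result = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "REPOSITORY":
            continue  # skip blank lines and an optional header row
        repository, tag, image_id = fields[:3]
        result.setdefault(image_id, []).append("%s:%s" % (repository, tag))
    return result


def aliases_of(listing, name):
    """Return the other names that point at the same image ID as `name`."""
    for names in images_by_id(listing).values():
        if name in names:
            return [other for other in names if other != name]
    return []


# Rows copied from the sosreport listing, with column padding collapsed.
SAMPLE_LISTING = """\
wefali28.webfarm.bancaditalia.it:443/banca_d_italia_sddc-lab-openstack_17_1_composite-osp17_containers-mariadb 17.1 86f0ac31f2e6 5 weeks ago 595 MB
cluster.common.tag/mariadb pcmklatest 86f0ac31f2e6 5 weeks ago 595 MB
cluster.common.tag/banca_d_italia_sddc-lab-openstack_16_2-osp16_containers-mariadb pcmklatest 935c87814e32 2 months ago 781 MB
sefaly05.utenze.bankit.it:443/banca_d_italia_sddc-lab-openstack_16_2-osp16_containers-mariadb 16.2.6 935c87814e32 2 months ago 781 MB
"""

for alias in aliases_of(SAMPLE_LISTING, "cluster.common.tag/mariadb:pcmklatest"):
    print(alias)
```

On this data the tag resolves to the 17.1 image, which supports the conclusion that the tag itself was correct and that the running container had simply not been recreated from it.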

      This issue was resolved by disabling and then re-enabling galera.
      The MySQL containers then started from the new image, aligned with 17.1.
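
The ticket does not record the exact commands used for the workaround. As a sketch only, the Python below builds a disable/enable pair for the Pacemaker resource, assuming the `pcs resource disable` and `pcs resource enable` subcommands and the resource name galera-bundle shown in the cluster status. It defaults to a dry run that prints the commands instead of running them, and the function names are hypothetical.

```python
import subprocess


def restart_commands(resource="galera-bundle", wait_seconds=None):
    """Build the pcs command lines for a disable/enable cycle.

    Assumption: the workaround maps to `pcs resource disable` followed by
    `pcs resource enable`; the ticket only says galera was disabled and
    enabled."""
    commands = []
    for action in ("disable", "enable"):
        command = ["pcs", "resource", action, resource]
        if wait_seconds is not None:
            # Ask pcs to wait for the resource to settle before returning.
            command.append("--wait=%d" % wait_seconds)
        commands.append(command)
    return commands


def restart_resource(resource="galera-bundle", wait_seconds=None, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    commands = restart_commands(resource, wait_seconds)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the sequence if a step fails.
            subprocess.run(command, check=True)
    return commands


restart_resource(wait_seconds=300)
```

Keeping the dry run as the default is deliberate: restarting the database cluster in the middle of an upgrade is disruptive, so the commands should be reviewed before they are executed on a controller.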

      Although the FFU documentation [1] lists several checks and known issues, it does not cover this problem.

      [1] https://docs.redhat.com/en/documentation/red_hat_openstack_platform/17.1/html-single/framework_for_upgrades_16.2_to_17.1/index

      Note: the attached logs also show a second problem, related to OVN.
      This second issue is covered in the ticket OSPRH-13228 (FFU upgrade from 16.2.6 to 17.1.4 failed on service "OS::TripleO::Services"::OVNDBs not more required in the neutron-ovn-dvr-ha.yaml)

       

              jbadiapa@redhat.com Juan Payno
              rhn-support-rbruzzon Riccardo Bruzzone
              rhos-dfg-upgrades