  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-14135

"Get the mysql container id when galera is enabled" play is error-prone


    • Bug
    • Resolution: Done-Errata
    • Normal
    • rhos-16.2.9
    • rhos-16.2.z
    • tripleo-ansible
    • None
    • 2
    • False
    • None
    • False
    • ?
    • tripleo-ansible-0.8.1-2.20250429125109.123ce73.el8ost
    • rhos-ops-day1day2-upgrades
    • None
    • Summary:
      The Galera database cannot be backed up when more than one galera container is running on the server.

      Cause -
      When several galera containers are running on the server, the procedure cannot determine the correct container ID on which to execute the command.

      Consequence -
      The command is executed against the wrong container and fails.

      Workaround -
      Before starting the backup, clean up all orphaned containers so that only the Pacemaker-managed galera container is running.

      Result -
    • Known Issue
    • Moderate

      To Reproduce
      A customer had an orphaned galera container on a controller node during backup. As a result, the "Get the mysql container id when galera is enabled" play returned two container IDs. This blocked the backup process: the "Galera desync the MySQL node" play failed with the following output:

      fatal: [controller02]: FAILED! => {"attempts": 300, "changed": true, "cmd": "set -o pipefail\npodman exec 8862d8d227b8\n1c9d9dd69dd3 bash -c \"mysql -p -u root \\\n-pPASSWORD --execute 'SET GLOBAL wsrep_desync = ON'\"\n", "delta": "0:00:00.145099", "end": "TS", "msg": "non-zero return code", "rc": 127, "start": "TS0", "stderr": "Error: must provide a non-empty command to start an exec session: invalid argument\n/bin/sh: line 2: 1c9d9dd69dd3: command not found", "stderr_lines": ["Error: must provide a non-empty command to start an exec session: invalid argument", "/bin/sh: line 2: 1c9d9dd69dd3: command not found"], "stdout": "", "stdout_lines": []}
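
The `cmd` string in the output above suggests that the lookup result, two IDs separated by a newline, was substituted as-is into the `podman exec` command line. The following Python sketch reproduces only that string effect. The helper names and the simplified command template are hypothetical and merely mirror the logged `cmd`; they are not taken from the tripleo-ansible task.

```python
def build_desync_command(container_id_output):
    """Substitute the raw lookup output into a shell script, as the logged cmd suggests."""
    return (
        "set -o pipefail\n"
        f"podman exec {container_id_output} bash -c "
        "\"mysql -u root -pPASSWORD --execute 'SET GLOBAL wsrep_desync = ON'\"\n"
    )


def shell_lines(script):
    """Return the non-empty lines that the shell would run one after another."""
    return [line for line in script.splitlines() if line.strip()]


# One ID: a single, well-formed podman exec line.
single = shell_lines(build_desync_command("8862d8d227b8"))

# Two IDs separated by a newline: the command is split across two shell lines.
double = shell_lines(build_desync_command("8862d8d227b8\n1c9d9dd69dd3"))

for line in double:
    print(repr(line))
```

With two IDs, `podman exec 8862d8d227b8` ends up on a line of its own with no command to run, and the next line begins with `1c9d9dd69dd3`, which the shell tries to execute as a program. This is consistent with the two `stderr` messages and with `rc` 127 in the log.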
      

      Expected behavior
      tripleo-ansible is expected either to fail gracefully when it cannot determine a single, consistent container ID, or to use a more precise search filter when looking up the container ID.
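
The "fail gracefully" option can be sketched as a check that the lookup produced exactly one ID before any command is built from it. This is a minimal Python illustration under that assumption. The function name `pick_single_container_id` is hypothetical, and the sketch does not reflect how the shipped fix in tripleo-ansible was implemented.

```python
def pick_single_container_id(lookup_output):
    """Return the only container ID in the lookup output, or raise a clear error."""
    ids = [line.strip() for line in lookup_output.splitlines() if line.strip()]
    if len(ids) != 1:
        raise ValueError(
            f"expected exactly one galera container ID, got {len(ids)}: {ids}"
        )
    return ids[0]


# Exactly one ID is accepted and returned without surrounding whitespace.
print(pick_single_container_id("8862d8d227b8\n"))

# Two IDs (for example, an orphaned container next to the managed one) are rejected early.
try:
    pick_single_container_id("8862d8d227b8\n1c9d9dd69dd3\n")
except ValueError as err:
    print(f"lookup failed: {err}")
```

Rejecting the ambiguous result at lookup time would surface the orphaned container in the error message, instead of a `command not found` failure after 300 retries of the desync step.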

      Bug impact
      The ReaR procedure is blocked until the orphaned container is removed.

      Known workaround
      Clean up the orphaned containers.

      P.S. I understand that this is unlikely to get fixed, but I am reporting it in case I have missed something and we do want to fix it.

              jbadiapa@redhat.com Juan Payno
              rhn-support-astupnik Alex Stupnikov
              Archana Singh
              rhos-dfg-upgrades