
Add support for Galera's safe_to_bootstrap to improve bootstrap

    • support galera's safe_to_bootstrap
    • 5
    • False
    • False
    • Not Selected
    • Proposed
    • No Docs Impact
    • To Do
    • mariadb-operator-container-1.0.4-4

      When a Galera cluster is stopped cleanly (i.e., through a sequential shutdown), the last node to stop is marked as "safe to bootstrap": it should be the first node restarted the next time the cluster is bootstrapped.

       

      The mariadb operator should take advantage of that flag when bootstrapping a Galera cluster. The flag also helps when not all pods can be probed before choosing the node that bootstraps the cluster, because the "safe_to_bootstrap" attribute alone is enough to decide.
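
The decision described above can be sketched as follows. This is a minimal illustration, not the operator's actual code: `NodeState`, `pick_bootstrap_node`, and the `cluster_size` parameter are hypothetical names, and the fallback to the highest seqno is the usual Galera recovery rule, assumed here rather than taken from this issue. Nodes that crashed (seqno of -1) would need a recovery step that the sketch leaves out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NodeState:
    """State read from one node's grastate.dat (hypothetical container type)."""
    name: str
    uuid: str
    seqno: int
    safe_to_bootstrap: bool


def pick_bootstrap_node(nodes: Sequence[NodeState], cluster_size: int) -> Optional[str]:
    """Return the node that should bootstrap, or None if no decision is possible yet."""
    # A node flagged safe_to_bootstrap was the last one stopped in a clean
    # shutdown, so it can be chosen without hearing from the other nodes.
    for node in nodes:
        if node.safe_to_bootstrap:
            return node.name
    # Without the flag, compare seqno, but only once every node has reported:
    # an unprobed node could hold a more recent state.
    if not nodes or len(nodes) < cluster_size:
        return None
    return max(nodes, key=lambda n: n.seqno).name


# Only two of three nodes have been probed, yet the flag settles the choice.
UUID = "ab1e0cdf-9a40-11ef-9309-e647ae7fb46f"
states = [
    NodeState("openstack-galera-1", UUID, 878951, False),
    NodeState("openstack-galera-0", UUID, 878953, True),
]
print(pick_bootstrap_node(states, cluster_size=3))
```

Checking the flag first is what allows an early decision: the seqno comparison is only meaningful once all nodes have been probed.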

            [OSPRH-10195] Add support for Galera's safe_to_bootstrap to improve bootstrap

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: Control plane Operators for RHOSO 18.0.3 (Feature Release 1) security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:9485


            Verified:

            [zuul@controller-0 ~]$ oc get openstackversion
            NAME           TARGET VERSION            AVAILABLE VERSION         DEPLOYED VERSION
            controlplane   18.0.3-trunk-20241025.1   18.0.3-trunk-20241025.1   18.0.3-trunk-20241025.1

            [zuul@controller-0 ~]$ oc get -n openstack-operators clusterserviceversions.operators.coreos.com |grep mariadb
            mariadb-operator.v1.0.4 MariaDB Operator 1.0.4

            1. Initial state:
            Galera pods are running:
            [zuul@controller-0 ~]$ date; oc get pod -l app=galera -o wide -w
            Tue Nov 12 16:36:22 EST 2024
            NAME                       READY   STATUS    RESTARTS       AGE     IP               NODE       NOMINATED NODE   READINESS GATES
            openstack-cell1-galera-0   1/1     Running   4 (29h ago)    43h     192.168.24.27    master-0   <none>           <none>
            openstack-cell1-galera-1   1/1     Running   4 (29h ago)    3d23h   192.168.20.34    master-2   <none>           <none>
            openstack-cell1-galera-2   1/1     Running   20 (29h ago)   8d      192.168.16.74    master-1   <none>           <none>
            openstack-galera-0         1/1     Running   4 (20h ago)    43h     192.168.24.30    master-0   <none>           <none>
            openstack-galera-1         1/1     Running   6 (20h ago)    3d23h   192.168.19.172   master-1   <none>           <none>
            openstack-galera-2         1/1     Running   6 (20h ago)    3d23h   192.168.20.54    master-2   <none>           <none>

            Mariadb-operator related resources are running:
            [zuul@controller-0 ~]$ oc get all -n openstack-operators -l app=mariadb
            Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
            No resources found in openstack-operators namespace.
            [zuul@controller-0 ~]$ oc get all -n openstack-operators -l openstack.org/operator-name=mariadb
            Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
            NAME                                                       READY   STATUS    RESTARTS   AGE
            pod/mariadb-operator-controller-manager-58459897c4-jpcbl   2/2     Running   0          5h54m

            NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
            deployment.apps/mariadb-operator-controller-manager   1/1     1            1           8d

            NAME                                                             DESIRED   CURRENT   READY   AGE
            replicaset.apps/mariadb-operator-controller-manager-58459897c4   1         1         1       8d
            [zuul@controller-0 ~]$

            2. Disable the mariadb operator:
            [zuul@controller-0 ~]$ date; oc -n openstack-operators patch deployment mariadb-operator-controller-manager -p '{"spec":{"replicas":0}}'
            Tue Nov 12 16:36:16 EST 2024
            deployment.apps/mariadb-operator-controller-manager

            [zuul@controller-0 ~]$ date; oc get all -n openstack-operators -l openstack.org/operator-name=mariadb
            Tue Nov 12 16:36:38 EST 2024
            Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
            NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
            deployment.apps/mariadb-operator-controller-manager   0/0     0            0           8d

            NAME                                                             DESIRED   CURRENT   READY   AGE
            replicaset.apps/mariadb-operator-controller-manager-58459897c4   0         0         0       8d
            [zuul@controller-0 ~]$

            3. Stop all openstack galera pods:
            $ for i in 2 1 0; do oc rsh openstack-galera-$i /bin/bash -c 'mysqladmin -uroot -p${DB_ROOT_PASSWORD} shutdown'; done
            OCP restarted the pods, but mysqld is not running inside them, so the containers are not ready (0/1):
            openstack-galera-2   0/1   Completed   6 (20h ago)   3d23h   192.168.20.54    master-2   <none>   <none>
            openstack-galera-2   0/1   Running     7 (2s ago)    3d23h   192.168.20.54    master-2   <none>   <none>
            openstack-galera-1   0/1   Completed   6 (20h ago)   3d23h   192.168.19.172   master-1   <none>   <none>
            openstack-galera-1   0/1   Running     7 (1s ago)    3d23h   192.168.19.172   master-1   <none>   <none>
            openstack-galera-0   0/1   Completed   4 (20h ago)   43h     192.168.24.30    master-0   <none>   <none>
            openstack-galera-0   0/1   Running     5 (2s ago)    43h     192.168.24.30    master-0   <none>   <none>

            4. openstack-galera-0 was shut down last. Check its grastate.dat to confirm that it is marked safe to bootstrap:

            [zuul@controller-0 ~]$ oc rsh openstack-galera-0 cat /var/lib/mysql/grastate.dat
            Defaulted container "galera" out of: galera, mysql-bootstrap (init)

            # GALERA saved state
            version: 2.1
            uuid: ab1e0cdf-9a40-11ef-9309-e647ae7fb46f
            seqno: 878953
            safe_to_bootstrap: 1
            [zuul@controller-0 ~]$
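
To make the check in this step scriptable, the file can be read as simple `key: value` pairs. The sketch below is illustrative only: `parse_grastate` is a hypothetical helper, and the sample text is the grastate.dat content shown above (with its `# GALERA saved state` comment header).

```python
from typing import Dict


def parse_grastate(text: str) -> Dict[str, str]:
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '# GALERA saved state' comment header.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


sample = """\
# GALERA saved state
version: 2.1
uuid: ab1e0cdf-9a40-11ef-9309-e647ae7fb46f
seqno: 878953
safe_to_bootstrap: 1
"""

state = parse_grastate(sample)
# The flag is stored as 0 or 1; treat "1" as safe to bootstrap.
is_safe = state.get("safe_to_bootstrap") == "1"
print(state["seqno"], is_safe)
```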

            5. Restart mariadb operator:
            [zuul@controller-0 ~]$ date; oc -n openstack-operators patch deployment mariadb-operator-controller-manager -p '{"spec":{"replicas":1}}'
            Tue Nov 12 17:08:41 EST 2024
            deployment.apps/mariadb-operator-controller-manager patched
            Mariadb-operator related resources came back:
            [zuul@controller-0 ~]$ date; oc get all -n openstack-operators -l openstack.org/operator-name=mariadb
            Tue Nov 12 17:09:05 EST 2024
            Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
            NAME                                                       READY   STATUS    RESTARTS   AGE
            pod/mariadb-operator-controller-manager-58459897c4-7dbkh   2/2     Running   0          24s

            NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
            deployment.apps/mariadb-operator-controller-manager   1/1     1            1           8d

            NAME                                                             DESIRED   CURRENT   READY   AGE
            replicaset.apps/mariadb-operator-controller-manager-58459897c4   1         1         1       8d
            [zuul@controller-0 ~]$
            Galera pods recovered:
            openstack-galera-0   0/1   Running   8 (4m10s ago)    43h   192.168.24.30    master-0   <none>   <none>
            openstack-galera-0   1/1   Running   8 (4m10s ago)    43h   192.168.24.30    master-0   <none>   <none>
            openstack-galera-2   0/1   Running   10 (4m30s ago)   4d    192.168.20.54    master-2   <none>   <none>
            openstack-galera-2   1/1   Running   10 (4m30s ago)   4d    192.168.20.54    master-2   <none>   <none>
            openstack-galera-1   0/1   Running   10 (4m30s ago)   4d    192.168.19.172   master-1   <none>   <none>
            openstack-galera-1   1/1   Running   10 (4m30s ago)   4d    192.168.19.172   master-1   <none>   <none>

            6. Verify that the mariadb-operator bootstraps the cluster from the node whose "SafeToBootstrap" flag is set (here, openstack-galera-0):
            [zuul@controller-0 ~]$ oc -n openstack-operators logs -f $(oc -n openstack-operators get pod -o name | grep mariadb) | grep -e Attributes -e Push -e Started
            2024-11-12T22:09:00Z INFO Controllers.galera Attributes retrieved for openstack-galera-1 {"controller": "galera", "controllerGroup": "mariadb.openstack.org", "controllerKind": "Galera", "Galera": {"name":"openstack","namespace":"openstack"}, "namespace": "openstack", "name": "openstack", "reconcileID": "81983e41-3692-4f6f-9079-2cc7d736138f", "UUID": "ab1e0cdf-9a40-11ef-9309-e647ae7fb46f", "Seqno": "878951", "SafeToBootstrap": false}
            2024-11-12T22:09:00Z INFO Controllers.galera Attributes retrieved for openstack-galera-2 {"controller": "galera", "controllerGroup": "mariadb.openstack.org", "controllerKind": "Galera", "Galera": {"name":"openstack","namespace":"openstack"}, "namespace": "openstack", "name": "openstack", "reconcileID": "81983e41-3692-4f6f-9079-2cc7d736138f", "UUID": "ab1e0cdf-9a40-11ef-9309-e647ae7fb46f", "Seqno": "878948", "SafeToBootstrap": false}
            2024-11-12T22:09:00Z INFO Controllers.galera Attributes retrieved for openstack-galera-0 {"controller": "galera", "controllerGroup": "mariadb.openstack.org", "controllerKind": "Galera", "Galera": {"name":"openstack","namespace":"openstack"}, "namespace": "openstack", "name": "openstack", "reconcileID": "81983e41-3692-4f6f-9079-2cc7d736138f", "UUID": "ab1e0cdf-9a40-11ef-9309-e647ae7fb46f", "Seqno": "878953", "SafeToBootstrap": true}
            2024-11-12T22:09:00Z INFO Controllers.galera Pushing gcomm URI to bootstrap {"controller": "galera", "controllerGroup": "mariadb.openstack.org", "controllerKind": "Galera", "Galera": {"name":"openstack","namespace":"openstack"}, "namespace": "openstack", "name": "openstack", "reconcileID": "81983e41-3692-4f6f-9079-2cc7d736138f", "pod": "openstack-galera-0"}
            2024-11-12T22:09:09Z INFO Controllers.galera Pushing gcomm URI to joiner {"controller": "galera", "controllerGroup": "mariadb.openstack.org", "controllerKind": "Galera", "Galera": {"name":"openstack","namespace":"openstack"}, "namespace": "openstack", "name": "openstack", "reconcileID": "0fee0c15-2b00-4c9e-a529-0e2ceb495e70", "pod": "openstack-galera-1"}
            2024-11-12T22:09:09Z INFO Controllers.galera Pushing gcomm URI to joiner {"controller": "galera", "controllerGroup": "mariadb.openstack.org", "controllerKind": "Galera", "Galera": {"name":"openstack","namespace":"openstack"}, "namespace": "openstack", "name": "openstack", "reconcileID": "0fee0c15-2b00-4c9e-a529-0e2ceb495e70", "pod": "openstack-galera-2"}
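
The same check can be automated rather than read by eye. The sketch below assumes each log entry is a single line made of a text prefix followed by one JSON object, as in the operator output above. `parse_log_line` and `bootstrap_matches_flag` are hypothetical helper names, and the sample lines are trimmed to the fields the check uses.

```python
import json
from typing import Dict, List, Tuple


def parse_log_line(line: str) -> Tuple[str, Dict]:
    """Split one log line into (text prefix, decoded JSON fields)."""
    head, brace, tail = line.partition("{")
    return head.strip(), json.loads(brace + tail)


def bootstrap_matches_flag(lines: List[str]) -> bool:
    """True if the pod chosen to bootstrap reported SafeToBootstrap: true."""
    safe_pods = set()
    bootstrap_pod = None
    for line in lines:
        head, fields = parse_log_line(line)
        if "Attributes retrieved for" in head and fields.get("SafeToBootstrap"):
            # The pod name is the last word of the message prefix.
            safe_pods.add(head.rsplit(" ", 1)[-1])
        elif "Pushing gcomm URI to bootstrap" in head:
            bootstrap_pod = fields.get("pod")
    return bootstrap_pod is not None and bootstrap_pod in safe_pods


# Sample lines with the JSON payload trimmed for brevity.
sample_lines = [
    '2024-11-12T22:09:00Z INFO Controllers.galera Attributes retrieved for '
    'openstack-galera-1 {"Seqno": "878951", "SafeToBootstrap": false}',
    '2024-11-12T22:09:00Z INFO Controllers.galera Attributes retrieved for '
    'openstack-galera-0 {"Seqno": "878953", "SafeToBootstrap": true}',
    '2024-11-12T22:09:00Z INFO Controllers.galera Pushing gcomm URI to bootstrap '
    '{"Galera": {"name": "openstack"}, "pod": "openstack-galera-0"}',
]
print(bootstrap_matches_flag(sample_lines))
```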

            Verification comment above added by Julia Marciano.

              rhn-engineering-dciabrin Damien Ciabrini
              rhn-engineering-dciabrin Damien Ciabrini
              rhos-dfg-pidone