OSPRH-24856 (Red Hat OpenStack Services on OpenShift)

Galera pods fail to start in dual-stack IPv4/IPv6 OpenStack deployments

    • Type: Task
    • Resolution: Unresolved
    • mariadb-operator
    • rhos-ops-platform-services-pidone

      Summary: Galera pods fail to start in dual-stack IPv4/IPv6 OpenStack deployments

      Description: When deploying OpenStack with dual-stack IPv4/IPv6 networking, the Galera database pods enter CrashLoopBackOff state and fail to form a proper cluster quorum. This affects both the main Galera cluster (openstack-galera) and cell clusters (openstack-cell1-galera).

       

      Environment:

      • OpenStack Platform: RHOSO 18 / OSP 18
      • Network Configuration: Dual-stack IPv4/IPv6
      • Deployment Topology: NFV OVS-DPDK-SRIOV with IPv6
      • Kubernetes/OpenShift version: v1.31.6

      Steps to Reproduce:

      1. Deploy OpenStack control plane with dual-stack IPv4/IPv6 configuration
      2. Deploy Galera database clusters as part of the control plane
      3. Observe pod status with oc get pods -n openstack | grep galera
      openstack-cell1-galera-0                                                 0/1     CrashLoopBackOff   1280 (4m51s ago)   2d19h
      openstack-cell1-galera-1                                                 0/1     CrashLoopBackOff   1279 (46s ago)     2d19h
      openstack-cell1-galera-2                                                 1/1     Running            6 (28h ago)        2d19h
      openstack-galera-0                                                       0/1     CrashLoopBackOff   1286 (4m5s ago)    2d19h
      openstack-galera-1                                                       0/1     CrashLoopBackOff   1278 (4m3s ago)    2d19h
      openstack-galera-2                                                       1/1     Running            6 (12h ago)        2d19h 

      Actual Results:

      • Galera pods remain in CrashLoopBackOff state for the full 2d19h pod age, with more than 1,270 restarts each
      • Only 1 of 3 pods in each cluster starts successfully
      • Cluster quorum cannot be established (a 3-node cluster needs at least 2 nodes)
      • Pods affected:
        • In CrashLoopBackOff: openstack-galera-0, openstack-galera-1, openstack-cell1-galera-0, openstack-cell1-galera-1
        • Running after 6 restarts: openstack-galera-2, openstack-cell1-galera-2
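
The pod listing above can be summarized per cluster with a small helper. This is only a sketch: the function name galera_quorum_report is hypothetical, it assumes pod names of the form <cluster>-<ordinal>, and it counts the pods present in the listing rather than reading the StatefulSet replica count. A Ready pod is also not proof of Galera cluster membership, so the result is labeled "quorum-possible" rather than "quorum".

```shell
#!/bin/sh
# Summarize Galera pod readiness per cluster from `oc get pods` output on stdin.
# Assumption: pod names look like <cluster>-<ordinal>, e.g. openstack-galera-0.
galera_quorum_report() {
    awk '
        $1 ~ /galera-[0-9]+$/ {
            cluster = $1
            sub(/-[0-9]+$/, "", cluster)      # strip the StatefulSet ordinal
            total[cluster]++
            split($2, ready, "/")             # READY column, e.g. "1/1"
            if ($3 == "Running" && ready[1] == ready[2]) ok[cluster]++
        }
        END {
            for (c in total) {
                need = int(total[c] / 2) + 1  # strict majority of listed pods
                state = (ok[c] >= need) ? "quorum-possible" : "no-quorum"
                printf "%s ready=%d/%d %s\n", c, ok[c], total[c], state
            }
        }
    ' | sort
}
```

Possible usage: oc get pods -n openstack | galera_quorum_report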

      Expected Results: All Galera pods should start successfully and form a healthy 3-node cluster with proper quorum.

       

      Root Cause: The Galera StatefulSet/Service resources are not configured with the proper ipFamilyPolicy for dual-stack environments. The services (openstack-galera and openstack-cell1-galera) default to single-stack mode, preventing proper cluster communication in IPv4/IPv6 environments.

       

      Workaround: Manually patch the Galera service resources to add dual-stack configuration:

      # Patch the main Galera service
      oc -n openstack patch service openstack-galera --type='merge' -p '{
        "spec": { "ipFamilyPolicy": "PreferDualStack", "ipFamilies": ["IPv6", "IPv4"] }
      }'

      # Patch the cell1 Galera service
      oc -n openstack patch service openstack-cell1-galera --type='merge' -p '{
        "spec": { "ipFamilyPolicy": "PreferDualStack", "ipFamilies": ["IPv6", "IPv4"] }
      }'

      # Verify the patches were applied
      oc -n openstack get service openstack-galera -o jsonpath='{.spec.ipFamilyPolicy}{"\n"}{.spec.ipFamilies}{"\n"}'
      oc -n openstack get service openstack-cell1-galera -o jsonpath='{.spec.ipFamilyPolicy}{"\n"}{.spec.ipFamilies}{"\n"}'

      # Monitor pod recovery (may need to force restart)
      oc -n openstack get pods | grep galera

      # If pods don't recover automatically, force restart by deleting them
      oc -n openstack delete pod openstack-galera-0 openstack-galera-1 openstack-galera-2
      oc -n openstack delete pod openstack-cell1-galera-0 openstack-cell1-galera-1 openstack-cell1-galera-2
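
Because the workaround has to be repeated for every deployment, the two patch commands could be wrapped in a small helper. The following is a minimal sketch, not part of the operator or of any shipped tooling: the function names and the OC and NAMESPACE variables are hypothetical, and the family order (IPv6 first) simply mirrors the workaround in this ticket.

```shell
#!/bin/sh
# Assumption: OC and NAMESPACE are illustrative overrides, not standard variables.
# Setting OC=echo previews the commands without touching the cluster.
OC="${OC:-oc}"
NAMESPACE="${NAMESPACE:-openstack}"

# JSON merge patch that requests dual-stack, with the same family order
# as the manual workaround described in this issue.
dual_stack_patch() {
    printf '%s' '{"spec":{"ipFamilyPolicy":"PreferDualStack","ipFamilies":["IPv6","IPv4"]}}'
}

# Apply the patch to every Service named on the command line;
# stop at the first failure so a partial rollout is noticed.
patch_galera_services() {
    for svc in "$@"; do
        "$OC" -n "$NAMESPACE" patch service "$svc" --type=merge \
            -p "$(dual_stack_patch)" || return 1
    done
}
```

Possible usage: patch_galera_services openstack-galera openstack-cell1-galera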

      Proposed Solution: The OpenStack operator should support configuring ipFamilyPolicy through the Galera template configuration in the OpenStackControlPlane CR. This would allow users to properly configure dual-stack networking without requiring manual patches to the generated service resources.

       

      Example of desired template configuration:

      galera:
        templates:
          openstack:
            ipFamilyPolicy: PreferDualStack
            ipFamilies:
              - IPv6
              - IPv4
          openstack-cell1:
            ipFamilyPolicy: PreferDualStack
            ipFamilies:
              - IPv6
              - IPv4

      Impact:
      • Severity: High/Critical
      • Priority: High
      • Blocks deployment of OpenStack in dual-stack IPv4/IPv6 environments
      • Affects all database-dependent services (Nova, Neutron, Glance, Cinder, Heat, etc.)
      • No functional control plane without working Galera clusters
      • Manual workaround is required for every deployment and may not persist through operator reconciliation
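
Since the manual patch may be reverted by operator reconciliation, a periodic check can flag when that happens. The sketch below is an illustration under assumptions: the function name check_dual_stack_drift and the OC and NAMESPACE variables are hypothetical, and it only compares spec.ipFamilyPolicy against PreferDualStack.

```shell
#!/bin/sh
# Assumption: OC and NAMESPACE are illustrative overrides, not standard variables.
OC="${OC:-oc}"
NAMESPACE="${NAMESPACE:-openstack}"

# Print one line per Service whose ipFamilyPolicy is no longer PreferDualStack.
# Returns 1 if at least one Service has drifted, 0 otherwise.
check_dual_stack_drift() {
    drifted=0
    for svc in "$@"; do
        policy=$("$OC" -n "$NAMESPACE" get service "$svc" \
            -o jsonpath='{.spec.ipFamilyPolicy}')
        if [ "$policy" != "PreferDualStack" ]; then
            echo "drift: $svc ipFamilyPolicy=${policy:-<unset>}"
            drifted=1
        fi
    done
    return "$drifted"
}
```

Possible usage: check_dual_stack_drift openstack-galera openstack-cell1-galera || echo "re-apply the workaround"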

      Additional Information:

      • This configuration follows standard Kubernetes dual-stack networking practices
      • The ipFamilyPolicy: PreferDualStack setting allows the service to work with both IP families
      • Currently not exposed through the operator's template interface for Galera resources
      • The operator should propagate this configuration from the OpenStackControlPlane CR to the generated Service resources

              Unassigned
              mnietoji Miguel Angel Nieto Jimenez
              rhos-dfg-pidone