Task
Resolution: Unresolved
rhos-ops-platform-services-pidone
Summary: Galera pods fail to start in dual-stack IPv4/IPv6 OpenStack deployments
Description: When deploying OpenStack with dual-stack IPv4/IPv6 networking, the Galera database pods enter CrashLoopBackOff state and fail to form a proper cluster quorum. This affects both the main Galera cluster (openstack-galera) and cell clusters (openstack-cell1-galera).
Environment:
- OpenStack Platform: RHOSO 18 / OSP 18
- Network Configuration: Dual-stack IPv4/IPv6
- Deployment Topology: NFV OVS-DPDK-SRIOV with IPv6
- Kubernetes/OpenShift version: v1.31.6
Steps to Reproduce:
- Deploy OpenStack control plane with dual-stack IPv4/IPv6 configuration
- Deploy Galera database clusters as part of the control plane
- Observe pod status with {{oc get pods -n openstack | grep galera}}. Observed output:
{{NAME                       READY   STATUS             RESTARTS           AGE
openstack-cell1-galera-0   0/1     CrashLoopBackOff   1280 (4m51s ago)   2d19h
openstack-cell1-galera-1   0/1     CrashLoopBackOff   1279 (46s ago)     2d19h
openstack-cell1-galera-2   1/1     Running            6 (28h ago)        2d19h
openstack-galera-0         0/1     CrashLoopBackOff   1286 (4m5s ago)    2d19h
openstack-galera-1         0/1     CrashLoopBackOff   1278 (4m3s ago)    2d19h
openstack-galera-2         1/1     Running            6 (12h ago)        2d19h}}
Actual Results:
- Galera pods remain in CrashLoopBackOff (roughly 1,280 restarts each for the failing pods, with a pod age of 2d19h)
- Only 1 out of 3 pods in each cluster successfully starts
- Cluster quorum cannot be established (a 3-node cluster requires at least 2 of 3 nodes)
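- Pods affected:
- openstack-galera-0 and openstack-galera-1 are in CrashLoopBackOff; openstack-galera-2 is Running but cannot form a quorum alone
- openstack-cell1-galera-0 and openstack-cell1-galera-1 are in CrashLoopBackOff; openstack-cell1-galera-2 is Running but cannot form a quorum alone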
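The quorum arithmetic above (a 3-node cluster needs a majority, floor(3/2) + 1 = 2 members) can be checked against live {{oc get pods}} output with a small filter. This is a minimal sketch, not part of any operator tooling: the function name galera_quorum_report is hypothetical, it assumes pod names end in -<ordinal> and that the second column is READY, and it treats a fully Ready pod as a cluster member, which only approximates Galera's own view of the primary component.

```shell
# Hypothetical helper: summarize per-cluster Galera quorum from `oc get pods` output on stdin.
# Assumptions: pod names look like <cluster>-<ordinal> (e.g. openstack-galera-0),
# column 2 is READY ("ready/total"), and a fully Ready pod counts as a member.
galera_quorum_report() {
  awk '
    $1 ~ /galera-[0-9]+$/ {
      cluster = $1
      sub(/-[0-9]+$/, "", cluster)      # strip the ordinal suffix
      total[cluster]++
      split($2, r, "/")                 # READY column, e.g. "1/1"
      if (r[1] == r[2] && r[2] > 0) ready[cluster]++
    }
    END {
      for (c in total) {
        need = int(total[c] / 2) + 1    # strict majority of the members
        status = (ready[c] + 0 >= need) ? "quorum" : "no-quorum"
        printf "%s ready=%d/%d need=%d %s\n", c, ready[c] + 0, total[c], need, status
      }
    }'
}
# Usage: oc -n openstack get pods --no-headers | galera_quorum_report | sort
```

Fed the output from the reproduction steps, it reports ready=1/3 need=2 no-quorum for both openstack-galera and openstack-cell1-galera.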
Expected Results: All Galera pods should start successfully and form a healthy 3-node cluster with proper quorum.
Root Cause: The Service resources generated for Galera are not configured with an ipFamilyPolicy suitable for dual-stack environments (ipFamilyPolicy is a Service field; the StatefulSet has no equivalent setting). The services (openstack-galera and openstack-cell1-galera) fall back to the Kubernetes default of SingleStack, which prevents proper cluster communication in IPv4/IPv6 environments.
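To confirm this root cause on a given deployment, the Galera Services still using a single-stack policy can be listed from a kubectl/oc JSONPath range query. A minimal sketch: the function name galera_single_stack_services is hypothetical, and it simply flags any Service whose name contains "galera" and whose policy is neither PreferDualStack nor RequireDualStack.

```shell
# Hypothetical helper: print Galera Services that are not using a dual-stack policy.
# Input (stdin): one "name policy" pair per line, as produced by the query below.
galera_single_stack_services() {
  awk '$1 ~ /galera/ && $2 != "PreferDualStack" && $2 != "RequireDualStack" { print $1 }'
}
# Usage:
#   oc -n openstack get service \
#     -o jsonpath='{range .items[*]}{.metadata.name} {.spec.ipFamilyPolicy}{"\n"}{end}' \
#     | galera_single_stack_services
```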
Workaround: Manually patch the Galera service resources to add dual-stack configuration:
{{# Patch the main Galera service to request dual-stack addressing
oc -n openstack patch service openstack-galera --type='merge' -p '{
  "spec": {"ipFamilyPolicy": "PreferDualStack", "ipFamilies": ["IPv6", "IPv4"]}
}'

# Patch the cell1 Galera service the same way
oc -n openstack patch service openstack-cell1-galera --type='merge' -p '{
  "spec": {"ipFamilyPolicy": "PreferDualStack", "ipFamilies": ["IPv6", "IPv4"]}
}'

# Note: Kubernetes does not allow changing the primary (first) IP family of an existing
# Service; IPv6 is listed first here on the assumption that the cluster is IPv6-primary.

# Verify the patches were applied (prints the policy, then the families)
oc -n openstack get service openstack-galera -o jsonpath='{.spec.ipFamilyPolicy}{"\n"}{.spec.ipFamilies}{"\n"}'
oc -n openstack get service openstack-cell1-galera -o jsonpath='{.spec.ipFamilyPolicy}{"\n"}{.spec.ipFamilies}{"\n"}'

# Monitor pod recovery (a forced restart may be needed)
oc -n openstack get pods | grep galera

# If the pods don't recover automatically, force a restart by deleting them
oc -n openstack delete pod openstack-galera-0 openstack-galera-1 openstack-galera-2
oc -n openstack delete pod openstack-cell1-galera-0 openstack-cell1-galera-1 openstack-cell1-galera-2}}
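Checking ipFamilyPolicy alone does not prove the API server actually allocated an address per family; the Service's .spec.clusterIPs list shows that. The sketch below assumes the space-separated output of the JSONPath expression {.spec.clusterIPs[*]}; the function name has_dual_stack_ips is hypothetical, and the check does not apply to a headless Service (clusterIP: None), whose clusterIPs holds only "None".

```shell
# Hypothetical helper: succeed only if the given cluster IP list has both an IPv4 and an IPv6 address.
# Input: space-separated IPs, e.g. from
#   oc -n openstack get service <name> -o jsonpath='{.spec.clusterIPs[*]}'
has_dual_stack_ips() {
  v4=0; v6=0
  for ip in $1; do
    case "$ip" in
      *:*) v6=1 ;;   # IPv6 literals contain colons
      *.*) v4=1 ;;   # IPv4 dotted-quad
    esac
  done
  [ "$v4" -eq 1 ] && [ "$v6" -eq 1 ]
}
# Usage:
#   for svc in openstack-galera openstack-cell1-galera; do
#     ips=$(oc -n openstack get service "$svc" -o jsonpath='{.spec.clusterIPs[*]}')
#     if has_dual_stack_ips "$ips"; then echo "$svc: dual-stack ($ips)"; else echo "$svc: single-stack ($ips)"; fi
#   done
```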
Proposed Solution: The OpenStack operator should support configuring ipFamilyPolicy and ipFamilies through the Galera template configuration in the OpenStackControlPlane CR. This would allow users to configure dual-stack networking without manually patching the generated Service resources.
Example of desired template configuration:
{{galera:
  templates:
    openstack:
      ipFamilyPolicy: PreferDualStack
      ipFamilies:
        - IPv6
        - IPv4
    openstack-cell1:
      ipFamilyPolicy: PreferDualStack
      ipFamilies:
        - IPv6
        - IPv4}}
Impact:
- Severity: High/Critical
- Priority: High
- Blocks deployment of OpenStack in dual-stack IPv4/IPv6 environments
- Affects all database-dependent services (Nova, Neutron, Glance, Cinder, Heat, etc.)
- No functional control plane without working Galera clusters
- Manual workaround is required for every deployment and may not persist through operator reconciliation
Additional Information:
- This configuration follows standard Kubernetes dual-stack networking practices
- The ipFamilyPolicy: PreferDualStack setting requests both an IPv4 and an IPv6 cluster IP when the cluster supports dual-stack, and falls back to a single family otherwise
- Currently not exposed through the operator's template interface for Galera resources
- The operator should propagate this configuration from the OpenStackControlPlane CR to the generated Service resources