Red Hat OpenStack Services on OpenShift / OSPRH-20280

openstack-galera-2 pod not ready causes issues in other services

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • None
    • rhos-18.0.12
    • openstack-operator
    • None
    • 5
    • False
    • False
    • ?
    • None
    • Sprint 7
    • 1
    • Moderate

      To Reproduce

      Steps to reproduce the behavior:

      1. Compute nodes appear to be down. `openstack hypervisor list` fails with:
        Failed to discover available identity versions when contacting https://keystone-public-openstack.apps.x.x.com. Attempting to parse version from URL.
        Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Gateway Timeout (HTTP 504)
      2. The logs show that keystone is not able to reach MySQL (pods/keystone-645d67cbd4-5z4nx/logs/keystone-api.log):
        [Tue Sep 23 16:14:44.882793 2025] [wsgi:error] [pid 19:tid 53] [remote 10.131.0.81:60594] 2025-09-23 16:14:44.882 19 WARNING oslo_db.sqlalchemy.engines [None req-a1f1afd6-3001-4cf6-b5ed-5c4ab099e448 - - - - - -] SQL connection failed. -13 attempts left.: oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'openstack.openstack.svc' ([Errno 111] Connection refused)")
      3. One of the pods was not ready (openstack-galera-2-describe):

      Conditions:
        Type                        Status
        PodReadyToStartContainers   True 
        Initialized                 True 
        Ready                       False  <=======================
        ContainersReady             False  <=======================
        PodScheduled                True 
      Events:
        Type     Reason     Age                      From     Message
        ----     ------     ----                     ----     -------
        Warning  Unhealthy  2m13s (x2490 over 6h8m)  kubelet  Readiness probe failed: wsrep_local_state_comment (Donor/Desynced) differs from Synced

      4. After deleting the galera pods with `oc delete pods -l app=galera`, the services started working again
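
The two signatures quoted in steps 2 and 3 can be matched mechanically when triaging a must-gather or log bundle. The following is a minimal Python sketch, not part of the operator or of any OpenStack tooling: the function name `classify` and the returned dictionary keys are hypothetical, and the regular expressions are derived only from the two messages quoted in this report, so other wordings of the same errors will not match.

```python
import re

# Kubelet readiness-probe event from step 3, e.g.
# "wsrep_local_state_comment (Donor/Desynced) differs from Synced"
PROBE_RE = re.compile(
    r"wsrep_local_state_comment \((?P<state>[^)]+)\) differs from Synced"
)

# PyMySQL connection failure reported through oslo.db in step 2.
DB_RE = re.compile(
    r"Can't connect to MySQL server on '(?P<host>[^']+)' "
    r"\(\[Errno (?P<errno>\d+)\] (?P<reason>[^)]+)\)"
)

# Sample lines copied from this report (timestamps and prefixes trimmed).
PROBE_EVENT = (
    "Readiness probe failed: wsrep_local_state_comment "
    "(Donor/Desynced) differs from Synced"
)
KEYSTONE_LINE = (
    "SQL connection failed. -13 attempts left.: "
    "oslo_db.exception.DBConnectionError: (pymysql.err.OperationalError) "
    "(2003, \"Can't connect to MySQL server on "
    "'openstack.openstack.svc' ([Errno 111] Connection refused)\")"
)


def classify(line):
    """Return a dict describing the line, or None if neither signature matches.

    Hypothetical helper: the 'kind' labels are illustrative only.
    """
    match = PROBE_RE.search(line)
    if match:
        return {"kind": "galera_not_synced", "state": match.group("state")}
    match = DB_RE.search(line)
    if match:
        return {
            "kind": "db_unreachable",
            "host": match.group("host"),
            "errno": int(match.group("errno")),
            "reason": match.group("reason"),
        }
    return None


if __name__ == "__main__":
    for sample in (PROBE_EVENT, KEYSTONE_LINE):
        print(classify(sample))
```

Seeing both kinds in the same time window matches what this report describes: clients get connection refused on the database service while a galera pod is held out of Ready by its probe.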

      Expected behavior

      • Galera should be able to recover, or at least not impact other services, if one of the pods is not ready

      Device Info (please complete the following information):

      • operator version: 1.0.14
      • OPENSTACK_RELEASE_VERSION: 18.0.12-20250902.2

      Bug impact

      • No direct impact now

      Known workaround

      • Delete the galera pods (`oc delete pods -l app=galera`)
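
The workaround can be scripted so that the deletion is followed by a readiness check. This is a rough sketch under stated assumptions: the `openstack` namespace is inferred from the `openstack.openstack.svc` host in the keystone log, the 600s timeout is arbitrary, and the helper names `workaround_commands` and `apply_workaround` are hypothetical. It only wraps `oc delete` and `oc wait`. Depending on timing, `oc wait` may need to be retried if the replacement pods have not been created yet.

```python
import shlex
import subprocess


def workaround_commands(namespace="openstack", selector="app=galera", timeout="600s"):
    """Build the oc commands: delete the galera pods, then wait for Ready.

    namespace and timeout are assumptions, not values taken from the report.
    """
    return [
        ["oc", "delete", "pods", "-n", namespace, "-l", selector],
        [
            "oc", "wait", "--for=condition=Ready", "pod",
            "-n", namespace, "-l", selector, f"--timeout={timeout}",
        ],
    ]


def apply_workaround(dry_run=True, **kwargs):
    """Print the commands (dry run) or execute them, stopping on first failure."""
    for cmd in workaround_commands(**kwargs):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so nothing is deleted by accident.
    apply_workaround(dry_run=True)
```

The dry-run default prints the commands for review. Passing `dry_run=False` runs them against whichever cluster `oc` is currently logged in to.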

              rhn-engineering-dciabrin Damien Ciabrini
              rhn-support-ltamagno Luigi Dino Tamagnone
              rhos-dfg-pidone