Red Hat OpenStack Services on OpenShift / OSPRH-26269

Reconciliation issue when deleting one rabbitmq-notifications-server pod


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • rhos-18.0.16
    • rhos-18.0.15
    • nova-operator
    • None
    • 0
    • False
    • None
    • False
    • ?
    • rhos-workloads-compute
    • None
      .`notificationsBusInstance` configuration for RabbitMqCluster CR causes service downtime

      Due to a bug in nova-operator, if `notificationsBusInstance` is configured in the `nova` section of the `OpenStackControlPlane` custom resource (CR) to point to a RabbitMqCluster CR, and a pod in that cluster is deleted or restarted, then nova-operator reconfigures all Compute services twice. This results in unnecessary service downtime.

      *Temporary workaround:*

      1. Remove the `notificationsBusInstance` configuration from the `OpenStackControlPlane` CR. Do not remove the notification RabbitMqCluster definition from the `OpenStackControlPlane` CR.

      2. Build the `transport_url` from the default user secret of the RabbitMqCluster:
      +
      ----
      $ oc get secret <name-rabbitmq-notifications>-default-user -o json | jq -r '.data | map_values(@base64d) | "rabbit://\(.username):\(.password)@\(.host):\(.port)/?ssl=1"'
      ----
      +
      * Replace `<name-rabbitmq-notifications>` with the name of the RabbitMqCluster for notifications.

      3. For each `nova` service in the `OpenStackControlPlane` CR, add the following to the `customServiceConfig` field:
      +
      ----
      [oslo_messaging_notifications]
      transport_url = <transport_url value from the previous step>
      driver = messagingv2

      [notifications]
      notify_on_state_change = vm_and_task_state
      notification_format = both
      ----

      4. For each `nova` `OpenStackDataPlaneService` CR, add the same configuration snippet to the related nova extra config map, and then create an `OpenStackDataPlaneDeployment` CR to apply the configuration changes on the data plane nodes.
      +
      [NOTE]
      This workaround makes the notification message bus configuration in nova static. If the RabbitMqCluster changes in a way that affects its effective `transport_url`, for example a new password, host, or port, you must repeat steps 2 to 4.
      +
      [WARNING]
      The `customServiceConfig` field stores the configuration in plain text, and the `transport_url` contains the user name and password of the RabbitMqCluster. Applying this workaround therefore decreases the security of the notification RabbitMQ cluster, because its credentials become readable by anyone who can read the CR.
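The workaround steps can be sketched in Python as follows. This is a minimal, unofficial sketch. The function names are hypothetical. It assumes that the secret carries the `username`, `password`, `host`, and `port` keys that the `jq` command in step 2 reads. The `nodeSets` and `servicesOverride` fields of `OpenStackDataPlaneDeployment` are also assumptions, so verify them against your CRD. Unlike the `jq` command, `build_transport_url` percent-encodes the credentials. This assumes the consumer decodes them again, as the oslo.messaging transport URL parser does.

```python
import base64
import configparser
import io
from urllib.parse import quote


def build_transport_url(secret_data: dict[str, str], use_ssl: bool = True) -> str:
    """Step 2: build a rabbit:// URL from the base64-encoded `.data` map of
    the RabbitMqCluster default-user secret."""
    # Kubernetes stores secret values base64-encoded; decode the fields we need.
    user, password, host, port = (
        base64.b64decode(secret_data[key]).decode("utf-8")
        for key in ("username", "password", "host", "port")
    )
    # Percent-encode the credentials so that characters such as '/', '?', ','
    # or '$' cannot break URL parsing (or, we assume, oslo.config '$' substitution).
    url = f"rabbit://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/"
    return url + "?ssl=1" if use_ssl else url


def render_notification_config(transport_url: str) -> str:
    """Step 3: render both nova sections as one INI snippet."""
    # interpolation=None: a percent-encoded password contains '%', which the
    # default BasicInterpolation would reject as invalid interpolation syntax.
    config = configparser.ConfigParser(interpolation=None)
    config["oslo_messaging_notifications"] = {
        "transport_url": transport_url,
        "driver": "messagingv2",
    }
    config["notifications"] = {
        "notify_on_state_change": "vm_and_task_state",
        "notification_format": "both",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


def dataplane_deployment(name: str, node_sets: list[str], nova_services: list[str]) -> dict:
    """Step 4: a hypothetical OpenStackDataPlaneDeployment manifest (as a dict,
    serializable with json.dumps) that re-runs only the listed nova services."""
    return {
        "apiVersion": "dataplane.openstack.org/v1beta1",
        "kind": "OpenStackDataPlaneDeployment",
        "metadata": {"name": name},
        # Assumed fields: nodeSets selects the node sets to deploy, and
        # servicesOverride restricts the run to the listed services.
        "spec": {"nodeSets": node_sets, "servicesOverride": nova_services},
    }


def needs_reconfiguration(custom_service_config: str, current_url: str) -> bool:
    """Note: True when the URL pinned in customServiceConfig no longer matches
    the URL derived from the current secret, so steps 2 to 4 must be repeated."""
    # strict=False tolerates repeated sections or keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(custom_service_config)
    pinned = parser.get("oslo_messaging_notifications", "transport_url", fallback=None)
    return pinned != current_url
```

For example, pass the `data` map from `oc get secret <name-rabbitmq-notifications>-default-user -o json` to `build_transport_url`. Paste the output of `render_notification_config` into `customServiceConfig` and into the nova extra config map. After any RabbitMqCluster change, call `needs_reconfiguration` to decide whether steps 2 to 4 must be repeated. The rendered snippet contains the password, so the warning above applies to it as well.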
    • Known Issue
    • Done
    • Important

      We are experiencing reconciliation issues with all Nova API, scheduler, and conductor pods whenever any pod in the RabbitMQ cluster used for Nova notifications fails or restarts. This behavior makes the Nova services unstable. This is a critical issue because the failure or restart of a single rabbitmq_notification_server pod directly impacts the Nova services.

      The customer can reproduce the issue on both of their clusters.

      To reproduce it, they delete a rabbit_notification_server pod.

      """
      We have 2 environments with notificationsBusInstance configured in the nova service. In both, when any of the rabbitmq pods from the notification cluster is deleted, it causes a redeployment of Nova. It does not happen for the cell/global rabbitmq cluster pods.

      To answer your question:

      - lastTransitionTime: "2026-01-26T14:49:21Z"
        message: OpenStackControlPlane Nova completed
        reason: Ready
        status: "True"
        type: OpenStackControlPlaneNovaReady

      This transition happened after the rabbitmq pod was deleted.
      """

              Assignee: Unassigned
              Reporter: Eduard Barrera Casas (rhn-support-ebarrera)
              rhos-workloads-compute
              Votes: 0
              Watchers: 7