Red Hat OpenStack Services on OpenShift / OSPRH-21916

OpenStack Control Plane disruption during update from FR2 to FR4


    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • None
    • openstack-operator
    • None
    • Moderate

      Summary

      During the update of RHOSO18 uni03gamma (HCI setup) from FR2 (v1.0.7) to the latest pre-release, we observe a disruption of the OpenStack API lasting roughly 10 minutes.

      Wed Nov 12 15:39:43 UTC 2025 (1762961983) 169s FAILED (1)
      Wed Nov 12 15:42:33 UTC 2025 (1762962153) 74s FAILED (1)
      + openstack compute service list --service nova-compute
      Internal Server Error (HTTP 500)
      + openstack network agent list
      Internal Server Error (HTTP 500)
      
      HttpException: 503: Server Error for url: https://nova-public-openstack.apps.ocp.openstack.lab/v2.1/servers/detail?deleted=False, The server is currently unavailable. Please try again at a later time.<br /><br />
      The Keystone service is temporarily unavailable.
      
      Wed Nov 12 15:43:48 UTC 2025 (1762962228) 332s SUCCESS (0)
      Wed Nov 12 15:49:21 UTC 2025 (1762962561) 110s FAILED (1)
      Unable to establish connection to https://keystone-public-openstack.apps.ocp.openstack.lab/v3/auth/tokens: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
      
      Wed Nov 12 15:51:12 UTC 2025 (1762962672) 246s SUCCESS (0)
      Wed Nov 12 15:55:19 UTC 2025 (1762962919) 186s SUCCESS (0)
      

      Note that the API eventually recovers: from 15:51:12 onward every check succeeds and the control plane stays healthy.
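The length of the disruption can be checked directly from the test result lines above. The sketch below is a minimal illustration, not part of the attached tooling: it assumes that the number in parentheses is the epoch at which a test iteration started and that the `NNNs` field is how long that iteration ran (the excerpt is consistent with this, since each start epoch plus its duration lands one second before the next start). The names `RESULT_RE` and `failed_windows` are hypothetical.

```python
import re

# Result lines copied from this report. Assumed layout (not confirmed by the
# test script itself): "<date> (<start epoch>) <duration>s <STATUS> (<exit code>)".
LOG = """\
Wed Nov 12 15:39:43 UTC 2025 (1762961983) 169s FAILED (1)
Wed Nov 12 15:42:33 UTC 2025 (1762962153) 74s FAILED (1)
Wed Nov 12 15:43:48 UTC 2025 (1762962228) 332s SUCCESS (0)
Wed Nov 12 15:49:21 UTC 2025 (1762962561) 110s FAILED (1)
Wed Nov 12 15:51:12 UTC 2025 (1762962672) 246s SUCCESS (0)
Wed Nov 12 15:55:19 UTC 2025 (1762962919) 186s SUCCESS (0)
"""

RESULT_RE = re.compile(r"\((\d+)\)\s+(\d+)s\s+(SUCCESS|FAILED)\s+\(\d+\)\s*$")


def failed_windows(text):
    """Merge consecutive FAILED iterations into (start_epoch, end_epoch) windows."""
    windows = []
    prev_failed = False
    for line in text.splitlines():
        match = RESULT_RE.search(line)
        if not match:
            # Error output interleaved with the result lines is skipped.
            continue
        start, duration = int(match.group(1)), int(match.group(2))
        failed = match.group(3) == "FAILED"
        if failed:
            end = start + duration
            if prev_failed:
                # Extend the window opened by the previous failed iteration.
                windows[-1] = (windows[-1][0], end)
            else:
                windows.append((start, end))
        prev_failed = failed
    return windows


windows = failed_windows(LOG)
# Seconds from the start of the first failed iteration to the end of the last one.
span = windows[-1][1] - windows[0][0]
for start, end in windows:
    print(f"failed window: {start} -> {end} ({end - start}s)")
print(f"first failure start to last failure end: {span}s")
```

Under these assumptions the excerpt yields two failed windows separated by one successful iteration, and the overall span from the first failed start to the last failed end is a little over 11 minutes, in line with the rough 10 minute figure in the summary.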

      Environment

      • Starting Version: FR2 (v1.0.7), corresponding to deployedVersion: 18.0.6-20250317.1
      • Target Version: Latest pre-release, corresponding to targetVersion: 18.0.14-20251103.185748
      • We continuously create and destroy an OpenStack VM during the entire update.

      Timeline

      
      $ grep 'UPDATE EVENT' update_timeline.log
      
      2025-11-12T15:37:35,180835603+00:00 [UPDATE EVENT] Update Role
      2025-11-12T15:40:01,262034355+00:00 [UPDATE EVENT] Wait for successful deployment of the openstack operator
      2025-11-12T15:40:04,518043690+00:00 [UPDATE EVENT] About to get a new version
      2025-11-12T15:41:43,843602359+00:00 [UPDATE EVENT] Got new version 18.0.14-20251103.185748 (18.0.6-20250317.1)
      2025-11-12T15:41:46,240704688+00:00 [UPDATE EVENT] Starting the update sequence
      2025-11-12T15:41:51,361811053+00:00 [UPDATE EVENT] Patching the Openstack Version
      2025-11-12T15:44:55,546213307+00:00 [UPDATE EVENT] MinorUpdateOVNControlplane Completed
      2025-11-12T15:44:57,768695605+00:00 [UPDATE EVENT] Applying the OVN CRD
      2025-11-12T15:47:59,612033815+00:00 [UPDATE EVENT] MinorUpdateOVNDataplane Completed
      

      The update was then interrupted because of https://issues.redhat.com/browse/OSPRH-21858.

      But this seems to indicate that the disruption begins very soon after we accept the install plan, as the "Wait for successful deployment of the openstack operator" event comes from this line of code (i.e. just after the install plan is accepted).
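To make the correlation concrete, the timeline can be matched against the failed test iterations. The following is a rough sketch under stated assumptions: the two failure windows are derived from the test result lines in the summary (start epoch plus duration, with consecutive failures merged), timestamps are truncated to whole seconds, and the `+00:00` offset is taken at face value. `parse_events` and `events_between` are hypothetical helper names, not part of the attached scripts.

```python
from datetime import datetime, timezone

# Timeline copied from this report.
TIMELINE = """\
2025-11-12T15:37:35,180835603+00:00 [UPDATE EVENT] Update Role
2025-11-12T15:40:01,262034355+00:00 [UPDATE EVENT] Wait for successful deployment of the openstack operator
2025-11-12T15:40:04,518043690+00:00 [UPDATE EVENT] About to get a new version
2025-11-12T15:41:43,843602359+00:00 [UPDATE EVENT] Got new version 18.0.14-20251103.185748 (18.0.6-20250317.1)
2025-11-12T15:41:46,240704688+00:00 [UPDATE EVENT] Starting the update sequence
2025-11-12T15:41:51,361811053+00:00 [UPDATE EVENT] Patching the Openstack Version
2025-11-12T15:44:55,546213307+00:00 [UPDATE EVENT] MinorUpdateOVNControlplane Completed
2025-11-12T15:44:57,768695605+00:00 [UPDATE EVENT] Applying the OVN CRD
2025-11-12T15:47:59,612033815+00:00 [UPDATE EVENT] MinorUpdateOVNDataplane Completed
"""

# Assumption: failed windows as (start epoch, end epoch), derived from the
# control plane test results (15:39:43-15:43:47 and 15:49:21-15:51:11 UTC).
FAILED_WINDOWS = [(1762961983, 1762962227), (1762962561, 1762962671)]


def parse_events(text):
    """Return (epoch_seconds, event_name) pairs for each UPDATE EVENT line."""
    events = []
    for line in text.splitlines():
        stamp, sep, name = line.partition(" [UPDATE EVENT] ")
        if not sep:
            continue
        # Keep only "YYYY-MM-DDTHH:MM:SS"; the nanosecond fraction is dropped
        # and the timestamp is treated as UTC.
        when = datetime.strptime(stamp[:19], "%Y-%m-%dT%H:%M:%S")
        when = when.replace(tzinfo=timezone.utc)
        events.append((int(when.timestamp()), name.strip()))
    return events


def events_between(events, start, end):
    """Names of the events whose timestamp falls inside [start, end]."""
    return [name for ts, name in events if start <= ts <= end]


events = parse_events(TIMELINE)
per_window = [events_between(events, s, e) for s, e in FAILED_WINDOWS]
for (start, end), names in zip(FAILED_WINDOWS, per_window):
    print(f"{start} -> {end}:")
    for name in names:
        print(f"  {name}")
```

With the data above, the first failed window contains every event from the operator deployment wait through the OpenStack version patch, while the second window falls after the last recorded event. The first failed iteration starts at 15:39:43, 18 seconds before the wait event is logged at 15:40:01.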

      Attachment and reproducer

      I'll attach the full log of the VM creations and errors, along with the complete series of events during the update. A reproducer can be run using that documentation, but you will need to apply the workaround mentioned in https://issues.redhat.com/browse/OSPRH-21858 during the update.

      In the update_archive.tar.gz we have:

      update/workload_launch.sh                 <- script used to create the VM
      update/update_timeline.log                <- all events during the update (OpenShift events, update events, podman events on the compute node)
      update/control-plane-test-1062106.log     <- the control plane test results
      update/ct-1062106/
      update/ct-1062106/1762961983.log          <- associated individual VM creation logs
      update/ct-1062106/1762962153.log
      ...
      

        1. update_archive.tar.gz
          573 kB
          Sofer Athlan Guyot

              Unassigned Unassigned
              sathlang@redhat.com Sofer Athlan Guyot
              Sofer Athlan Guyot
              rhos-conplat-core-operators