Bug
Resolution: Unresolved
rhos-conplat-core-operators
Moderate
Summary
During the update of RHOSO18 uni03gamma (HCI setup) from FR2 (v1.0.7) to the latest pre-release, we observe a roughly 10-minute disruption of the OpenStack API.
Wed Nov 12 15:39:43 UTC 2025 (1762961983) 169s FAILED (1)
Wed Nov 12 15:42:33 UTC 2025 (1762962153) 74s FAILED (1)
+ openstack compute service list --service nova-compute
Internal Server Error (HTTP 500)
+ openstack network agent list
Internal Server Error (HTTP 500)
HttpException: 503: Server Error for url: https://nova-public-openstack.apps.ocp.openstack.lab/v2.1/servers/detail?deleted=False, The server is currently unavailable. Please try again at a later time.<br /><br /> The Keystone service is temporarily unavailable.
Wed Nov 12 15:43:48 UTC 2025 (1762962228) 332s SUCCESS (0)
Wed Nov 12 15:49:21 UTC 2025 (1762962561) 110s FAILED (1)
Unable to establish connection to https://keystone-public-openstack.apps.ocp.openstack.lab/v3/auth/tokens: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Wed Nov 12 15:51:12 UTC 2025 (1762962672) 246s SUCCESS (0)
Wed Nov 12 15:55:19 UTC 2025 (1762962919) 186s SUCCESS (0)
Note that the API eventually recovers: every test run starting from 15:51:12 succeeds.
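To make the disruption window easier to check against the attached test log, here is a minimal Python sketch that extracts the result lines and computes the time between the first failed run and the first successful run after the last failure. It assumes the result lines keep the format shown in the excerpt above (epoch in parentheses, duration in seconds, status, exit code); the names `parse_results` and `disruption_window` are hypothetical and not part of the attached scripts.

```python
import re
from datetime import datetime, timezone

# Matches result lines such as:
#   Wed Nov 12 15:39:43 UTC 2025 (1762961983) 169s FAILED (1)
# Assumption: the epoch is a 10-digit number and the format is stable.
RESULT_RE = re.compile(r"\((\d{10})\)\s+(\d+)s\s+(SUCCESS|FAILED)\s+\((\d+)\)")

SAMPLE_LOG = """\
Wed Nov 12 15:39:43 UTC 2025 (1762961983) 169s FAILED (1)
Wed Nov 12 15:42:33 UTC 2025 (1762962153) 74s FAILED (1)
Internal Server Error (HTTP 500)
Wed Nov 12 15:43:48 UTC 2025 (1762962228) 332s SUCCESS (0)
Wed Nov 12 15:49:21 UTC 2025 (1762962561) 110s FAILED (1)
Wed Nov 12 15:51:12 UTC 2025 (1762962672) 246s SUCCESS (0)
Wed Nov 12 15:55:19 UTC 2025 (1762962919) 186s SUCCESS (0)
"""


def parse_results(text):
    """Return a list of (start_epoch, duration_seconds, status) tuples."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3))
        for m in RESULT_RE.finditer(text)
    ]


def disruption_window(results):
    """Return (first_failure_epoch, recovery_epoch), or None if nothing failed.

    recovery_epoch is the start of the first successful run after the last
    failure, or None if the log ends before any such run.
    """
    failures = [start for start, _, status in results if status == "FAILED"]
    if not failures:
        return None
    last_failure = max(failures)
    later_ok = [
        start for start, _, status in results
        if status == "SUCCESS" and start > last_failure
    ]
    return min(failures), (min(later_ok) if later_ok else None)


def hms(epoch):
    """Format an epoch as HH:MM:SS in UTC."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%H:%M:%S")


if __name__ == "__main__":
    first, recovered = disruption_window(parse_results(SAMPLE_LOG))
    print(f"first failure {hms(first)}, recovered {hms(recovered)}, "
          f"{recovered - first}s")
```

On the excerpt this reports a first failure at 15:39:43 and recovery at 15:51:12, i.e. 689 seconds. The window is not continuous: one run starting at 15:43:48 succeeded in between.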
Environment
• Starting Version: FR2 (v1.0.7), corresponding to deployedVersion: 18.0.6-20250317.1
• Target Version: Latest pre-release, corresponding to targetVersion: 18.0.14-20251103.185748
• We create and destroy an OpenStack VM continuously during the entire update.
Timeline
$ grep 'UPDATE EVENT' update_timeline.log
2025-11-12T15:37:35,180835603+00:00 [UPDATE EVENT] Update Role
2025-11-12T15:40:01,262034355+00:00 [UPDATE EVENT] Wait for successful deployment of the openstack operator
2025-11-12T15:40:04,518043690+00:00 [UPDATE EVENT] About to get a new version
2025-11-12T15:41:43,843602359+00:00 [UPDATE EVENT] Got new version 18.0.14-20251103.185748 (18.0.6-20250317.1)
2025-11-12T15:41:46,240704688+00:00 [UPDATE EVENT] Starting the update sequence
2025-11-12T15:41:51,361811053+00:00 [UPDATE EVENT] Patching the Openstack Version
2025-11-12T15:44:55,546213307+00:00 [UPDATE EVENT] MinorUpdateOVNControlplane Completed
2025-11-12T15:44:57,768695605+00:00 [UPDATE EVENT] Applying the OVN CRD
2025-11-12T15:47:59,612033815+00:00 [UPDATE EVENT] MinorUpdateOVNDataplane Completed
The update was then interrupted because of https://issues.redhat.com/browse/OSPRH-21858.
Even so, the timeline suggests that the API disruption starts very soon after we accept the install plan: the "Wait for successful deployment of the openstack operator" event is emitted from this line of code, i.e. just after the install plan is accepted.
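To line up each failed test run with the update step that preceded it, the following Python sketch looks up the last `[UPDATE EVENT]` logged before a given epoch. It assumes the timeline lines keep the format shown above (comma-separated fractional seconds, `+00:00` offset); `parse_events` and `last_event_before` are hypothetical helper names, and the sample lines are a subset of the grep output.

```python
import bisect
import re
from datetime import datetime, timezone

# Matches lines such as:
#   2025-11-12T15:37:35,180835603+00:00 [UPDATE EVENT] Update Role
# Assumption: timestamps are always UTC (+00:00); sub-second digits are dropped.
EVENT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}),\d+\+00:00 \[UPDATE EVENT\] (.+)"
)

SAMPLE_TIMELINE = [
    "2025-11-12T15:37:35,180835603+00:00 [UPDATE EVENT] Update Role",
    "2025-11-12T15:40:01,262034355+00:00 [UPDATE EVENT] Wait for successful deployment of the openstack operator",
    "2025-11-12T15:41:51,361811053+00:00 [UPDATE EVENT] Patching the Openstack Version",
    "2025-11-12T15:44:55,546213307+00:00 [UPDATE EVENT] MinorUpdateOVNControlplane Completed",
    "2025-11-12T15:47:59,612033815+00:00 [UPDATE EVENT] MinorUpdateOVNDataplane Completed",
]


def parse_events(lines):
    """Return a time-sorted list of (epoch_seconds, event_name) tuples."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
            ts = ts.replace(tzinfo=timezone.utc)
            events.append((int(ts.timestamp()), m.group(2).strip()))
    return sorted(events)


def last_event_before(events, epoch):
    """Return the name of the latest event at or before epoch, else None."""
    times = [t for t, _ in events]
    i = bisect.bisect_right(times, epoch)
    return events[i - 1][1] if i else None


if __name__ == "__main__":
    events = parse_events(SAMPLE_TIMELINE)
    # Start epochs of the three failed runs in the control plane test log.
    for failed_start in (1762961983, 1762962153, 1762962561):
        print(failed_start, "->", last_event_before(events, failed_start))
```

Note that a test run spans its whole duration, so an event logged shortly after a run starts (such as the 15:40:01 event during the run that started at 15:39:43) can still be relevant to that run's failure.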
Attachment and reproducer
I'll attach the full log of the VM creation and errors, along with the complete series of events during the update. A reproducer can be run using that documentation, but you will need to apply the workaround mentioned in https://issues.redhat.com/browse/OSPRH-21858 during the update.
The update_archive.tar.gz contains:
update/workload_launch.sh <- script used to create the VM
update/update_timeline.log <- all events during the update (OpenShift events, update events, podman events on the compute node)
update/control-plane-test-1062106.log <- the control plane test result
update/ct-1062106/
update/ct-1062106/1762961983.log <- associated individual VM creation logs
update/ct-1062106/1762962153.log
...