Type: Bug
Resolution: Duplicate
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: data-plane-adoption
Labels:
None

Story Points:
0
Epic Link:
RHOSP 17.1 to 18 Adoption in High Density node situation (+250 compute nodes)
Blocked:
False
Blocked Reason:

None
Ready:
False
Docs Approval:
?
AssignedTeam:
rhos-ops-day1day2-upgrades
Regression:
None
Intelligence Requested:
Market:
PX Impact Score:

Severity:
Critical

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

We are performing a large-scale OpenStack adoption from RHOSP 17.1 to RHOSO 18.0 in an environment with 250+ nodes and 10k+ VMs.

During data plane adoption, while the tripleo-cleanup service was running across the compute nodes, a few running instances were unexpectedly terminated. The affected VMs were not part of the cleanup operation and were in ACTIVE state at the time.

On the compute nodes, libvirt logs show the corresponding qemu-kvm processes receiving SIGTERM (signal 15), after which the instances were forcefully shut down. No user-initiated actions were performed on these instances.

This behavior occurs during execution of the tripleo-cleanup service and results in unexpected workload disruption during adoption. We observed that ~7 VMs out of 10k+ were terminated during this time.

Logs for one of the terminated instances:

// vm got a signal 15 from pid 93929
[root@computer660-63 ~]# head -20 /var/log/containers/libvirt/qemu/instance-0000109a.log
2026-01-11T16:50:57.629336Z qemu-kvm: terminating on signal 15 from pid 93929 (<unknown process>)
2026-01-11 16:50:57.881+0000: shutting down, reason=shutdown


// pid 93929 has SELinux context "container_runtime_t" (the tcontext of the AVC logged when qemu-kvm, pid 93649, was denied a lookup of /proc/93929)
[root@computer660-63 ~]# grep "93929" /var/log/audit/audit.log | head -20
type=AVC msg=audit(1768150257.628:1250469): avc:  denied  { search } for  pid=93649 comm="qemu-kvm" name="93929" dev="proc" ino=489139156 scontext=system_u:system_r:svirt_t:s0:c83,c933 tcontext=unconfined_u:unconfined_r:container_runtime_t:s0 tclass=dir permissive=0


// more logs
[root@computer660-63 ~]# sudo grep "instance-0000109a" /var/log/containers/libvirt/virtqemud.log 
2026-01-11 16:50:57.629+0000: 93655: debug : qemuProcessHandleShutdown:590 : Transitioned guest instance-0000109a to shutdown state
2026-01-11 16:50:57.630+0000: 93655: debug : qemuProcessKill:8247 : vm=0x7fe088022af0 name=instance-0000109a pid=93649 flags=0x2
2026-01-11 16:50:57.680+0000: 93655: debug : qemuMonitorIO:541 : Error on monitor <null> mon=0x7fe0a40285d0 vm=0x7fe088022af0 name=instance-0000109a
2026-01-11 16:50:57.680+0000: 93655: debug : qemuMonitorIO:563 : Triggering EOF callback mon=0x7fe0a40285d0 vm=0x7fe088022af0 name=instance-0000109a
2026-01-11 16:50:57.680+0000: 93655: debug : qemuProcessHandleMonitorEOF:310 : Received EOF on 0x7fe088022af0 'instance-0000109a'
2026-01-11 16:50:57.680+0000: 920512: debug : qemuProcessKill:8247 : vm=0x7fe088022af0 name=instance-0000109a pid=93649 flags=0x1
2026-01-11 16:50:57.881+0000: 920512: debug : qemuProcessStop:8331 : Shutting down vm=0x7fe088022af0 name=instance-0000109a id=6 pid=93649, reason=shutdown, asyncJob=none, flags=0x0
2026-01-11 16:50:57.881+0000: 920512: debug : qemuDomainLogAppendMessage:7108 : Append log message (vm='instance-0000109a' message='2026-01-11 16:50:57.881+0000: shutting down, reason=shutdown
2026-01-11 16:50:57.883+0000: 920512: debug : qemuProcessKill:8247 : vm=0x7fe088022af0 name=instance-0000109a pid=93649 flags=0x5
2026-01-11 16:50:57.883+0000: 920512: debug : qemuDomainCleanupRun:7558 : driver=0x7fe0680212d0, vm=instance-0000109a
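
The correlation between the qemu log and the audit log above can be checked mechanically. The following is a minimal Python sketch, not a tool used in this investigation: the function names and regular expressions are illustrative assumptions, and the two input lines are copied from the excerpts above. It extracts the sender pid from the qemu-kvm termination line, the SELinux type of the target from the AVC record, and compares the two timestamps.

```python
import re
from datetime import datetime, timezone

# Lines copied verbatim from the log excerpts in this report.
QEMU_LINE = (
    "2026-01-11T16:50:57.629336Z qemu-kvm: terminating on signal 15 "
    "from pid 93929 (<unknown process>)"
)
AVC_LINE = (
    'type=AVC msg=audit(1768150257.628:1250469): avc:  denied  { search } for  '
    'pid=93649 comm="qemu-kvm" name="93929" dev="proc" ino=489139156 '
    "scontext=system_u:system_r:svirt_t:s0:c83,c933 "
    "tcontext=unconfined_u:unconfined_r:container_runtime_t:s0 "
    "tclass=dir permissive=0"
)


def parse_qemu_termination(line):
    """Return time, signal number and sender pid from a qemu-kvm termination line."""
    m = re.search(r"^(\S+) qemu-kvm: terminating on signal (\d+) from pid (\d+)", line)
    if m is None:
        return None
    when = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "time": when.replace(tzinfo=timezone.utc),
        "signal": int(m.group(2)),
        "sender_pid": int(m.group(3)),
    }


def parse_avc(line):
    """Return time, source pid, /proc entry name and target SELinux type from an AVC record."""
    stamp = re.search(r"audit\((\d+)\.(\d+):\d+\)", line)
    pid = re.search(r"\bpid=(\d+)", line)
    name = re.search(r'name="(\d+)"', line)
    tcontext = re.search(r"tcontext=(\S+)", line)
    if not (stamp and pid and name and tcontext):
        return None
    # Build the timestamp from integer parts to avoid float rounding.
    when = datetime.fromtimestamp(int(stamp.group(1)), tz=timezone.utc)
    when = when.replace(microsecond=int(stamp.group(2).ljust(6, "0")[:6]))
    return {
        "time": when,
        "source_pid": int(pid.group(1)),
        "target_pid": int(name.group(1)),
        # SELinux context layout is user:role:type:level
        "target_type": tcontext.group(1).split(":")[2],
    }


qemu = parse_qemu_termination(QEMU_LINE)
avc = parse_avc(AVC_LINE)
delta = abs((qemu["time"] - avc["time"]).total_seconds())

print("signal sender pid:", qemu["sender_pid"])
print("AVC target pid:   ", avc["target_pid"], "type:", avc["target_type"])
print("time difference:  ", delta, "seconds")
```

For these two lines the pid that sent signal 15 matches the /proc entry named in the AVC record, and the two events are about one millisecond apart, which supports reading the `container_runtime_t` context as belonging to the signal sender.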

Versions:

[root@computer660-63 ~]# rpm -qa podman
podman-4.4.1-22.el9_2.4.x86_64
[root@computer660-63 ~]# rpm -qa conmon
conmon-2.1.7-1.el9_2.x86_64
[root@computer660-63 ~]# rpm -qa crun
crun-1.8.4-1.el9_2.x86_64 

[tripleo-admin@computer660-63 ~]$ uname -r
5.14.0-284.144.1.el9_2.x86_64

Actual results:
Intermittent VM terminations during tripleo-cleanup service execution

Expected results:
No disruption to the workload.

Assignee: Lukas Bezdicka

Reporter: Rajesh Pulapakula

Team: rhos-dfg-upgrades

Votes: 0

Watchers: 2

Created: 2026/01/31 12:12 PM

Updated: 2026/02/11 2:25 PM

Resolved: 2026/02/11 2:25 PM
