Red Hat OpenStack Services on OpenShift / OSPRH-26015

Intermittent termination of running VMs during tripleo-cleanup service execution in RHOSP 17.1 to RHOSO 18.0 adoption


    • Type: Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • None
    • data-plane-adoption
    • None
    • Critical

      We are performing a large-scale OpenStack adoption from RHOSP 17.1 to RHOSO 18.0 in an environment with 250+ nodes and 10k+ VMs.

      During data plane adoption, while the tripleo-cleanup service was running across the compute nodes, a few running instances were unexpectedly terminated. The affected VMs were not part of the cleanup operation and were in ACTIVE state at the time.

      On the compute nodes, the libvirt logs show the corresponding qemu-kvm processes receiving SIGTERM (signal 15), after which the instances were forcefully shut down. No user-initiated actions were performed on these instances.

      This behavior occurs while the tripleo-cleanup service is running and causes unexpected workload disruption during adoption. Approximately 7 VMs out of 10k were terminated during this window.
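A minimal Python sketch for listing every affected instance on a compute node instead of checking instances one at a time. It assumes each instance's qemu-kvm log is stored as instance-*.log under /var/log/containers/libvirt/qemu/ (the path used in the logs below) and that the termination line has the form `<timestamp> qemu-kvm: terminating on signal 15 from pid <pid> (<process>)`, as in the quoted log. The names `parse_terminations` and `scan_qemu_logs` are hypothetical helpers, not part of any OpenStack, TripleO, or libvirt tooling.

```python
import re
from pathlib import Path

# Matches the qemu-kvm termination line quoted in this report, e.g.
# "2026-01-11T16:50:57.629336Z qemu-kvm: terminating on signal 15 from pid 93929 (<unknown process>)"
# Assumption: the line starts at the beginning of a log line (re.MULTILINE).
SIGNAL_RE = re.compile(
    r"^(?P<ts>\S+) qemu-kvm: terminating on signal (?P<sig>\d+) "
    r"from pid (?P<pid>\d+) \((?P<proc>[^)]*)\)",
    re.MULTILINE,
)


def parse_terminations(instance, text, signum=15):
    """Return one record per 'terminating on signal <signum>' line in a qemu-kvm log."""
    return [
        {
            "instance": instance,
            "timestamp": m["ts"],
            "sender_pid": int(m["pid"]),
            "sender": m["proc"],
        }
        for m in SIGNAL_RE.finditer(text)
        if int(m["sig"]) == signum
    ]


def scan_qemu_logs(log_dir="/var/log/containers/libvirt/qemu", signum=15):
    """Scan every instance-*.log in log_dir (hypothetical helper; needs root to read the logs)."""
    hits = []
    for log in sorted(Path(log_dir).glob("instance-*.log")):
        # The file stem (e.g. "instance-0000109a") is the libvirt domain name.
        hits.extend(parse_terminations(log.stem, log.read_text(errors="replace"), signum))
    return hits
```

Running `scan_qemu_logs()` on each compute node returns one record per instance that qemu-kvm reports as terminated by SIGTERM; the `sender_pid` values are the PIDs to look up in /var/log/audit/audit.log.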

      Logs for one of the terminated instances:

      // VM received signal 15 (SIGTERM) from pid 93929
      [root@computer660-63 ~]# head -20 /var/log/containers/libvirt/qemu/instance-0000109a.log
      2026-01-11T16:50:57.629336Z qemu-kvm: terminating on signal 15 from pid 93929 (<unknown process>)
      2026-01-11 16:50:57.881+0000: shutting down, reason=shutdown
      
      
      // pid 93929 has SELinux context "container_runtime_t"
      [root@computer660-63 ~]# grep "93929" /var/log/audit/audit.log | head -20
      type=AVC msg=audit(1768150257.628:1250469): avc:  denied  { search } for  pid=93649 comm="qemu-kvm" name="93929" dev="proc" ino=489139156 scontext=system_u:system_r:svirt_t:s0:c83,c933 tcontext=unconfined_u:unconfined_r:container_runtime_t:s0 tclass=dir permissive=0
      
      
      // more logs
      [root@computer660-63 ~]# sudo grep "instance-0000109a" /var/log/containers/libvirt/virtqemud.log 
      2026-01-11 16:50:57.629+0000: 93655: debug : qemuProcessHandleShutdown:590 : Transitioned guest instance-0000109a to shutdown state
      2026-01-11 16:50:57.630+0000: 93655: debug : qemuProcessKill:8247 : vm=0x7fe088022af0 name=instance-0000109a pid=93649 flags=0x2
      2026-01-11 16:50:57.680+0000: 93655: debug : qemuMonitorIO:541 : Error on monitor <null> mon=0x7fe0a40285d0 vm=0x7fe088022af0 name=instance-0000109a
      2026-01-11 16:50:57.680+0000: 93655: debug : qemuMonitorIO:563 : Triggering EOF callback mon=0x7fe0a40285d0 vm=0x7fe088022af0 name=instance-0000109a
      2026-01-11 16:50:57.680+0000: 93655: debug : qemuProcessHandleMonitorEOF:310 : Received EOF on 0x7fe088022af0 'instance-0000109a'
      2026-01-11 16:50:57.680+0000: 920512: debug : qemuProcessKill:8247 : vm=0x7fe088022af0 name=instance-0000109a pid=93649 flags=0x1
      2026-01-11 16:50:57.881+0000: 920512: debug : qemuProcessStop:8331 : Shutting down vm=0x7fe088022af0 name=instance-0000109a id=6 pid=93649, reason=shutdown, asyncJob=none, flags=0x0
      2026-01-11 16:50:57.881+0000: 920512: debug : qemuDomainLogAppendMessage:7108 : Append log message (vm='instance-0000109a' message='2026-01-11 16:50:57.881+0000: shutting down, reason=shutdown
      2026-01-11 16:50:57.883+0000: 920512: debug : qemuProcessKill:8247 : vm=0x7fe088022af0 name=instance-0000109a pid=93649 flags=0x5
      2026-01-11 16:50:57.883+0000: 920512: debug : qemuDomainCleanupRun:7558 : driver=0x7fe0680212d0, vm=instance-0000109a 
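The AVC record above is what ties the signal to the container runtime: qemu-kvm (pid 93649, scontext `svirt_t`) was denied `search` on the /proc directory named 93929, so the record's tcontext is the SELinux label of the sending process, and the denial is presumably why qemu-kvm logged the sender as `<unknown process>`. The audit timestamp 1768150257.628 is 2026-01-11 16:50:57.628 UTC, about 1 ms before the qemu-kvm termination line. A minimal Python sketch of that correlation follows, assuming raw audit.log lines (not `ausearch -i` interpreted output); `parse_avc`, `sender_contexts` and `selinux_type` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta, timezone

# key=value pairs in a raw audit record; quoted values may not contain '"'.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')
# "msg=audit(<seconds>.<milliseconds>:<serial>)"
STAMP_RE = re.compile(r"msg=audit\((\d+)\.(\d{3}):\d+\)")


def parse_avc(line):
    """Split one raw audit record into a dict of its key=value fields."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def sender_contexts(audit_lines, sender_pid):
    """Return (utc_time, comm, tcontext) for AVC denials whose target is /proc/<sender_pid>."""
    results = []
    for line in audit_lines:
        if "type=AVC" not in line:
            continue
        fields = parse_avc(line)
        # The /proc/<pid> directory carries the SELinux label of process <pid>.
        if fields.get("dev") == "proc" and fields.get("name") == str(sender_pid):
            stamp = STAMP_RE.search(line)
            when = None
            if stamp:
                when = datetime.fromtimestamp(int(stamp.group(1)), tz=timezone.utc)
                when += timedelta(milliseconds=int(stamp.group(2)))
            results.append((when, fields.get("comm"), fields.get("tcontext")))
    return results


def selinux_type(context):
    """Return the type field of a 'user:role:type:level' SELinux context."""
    return context.split(":")[2]
```

Passing the lines of /var/log/audit/audit.log and a `sender_pid` from the qemu-kvm log returns the label of the process that sent the signal; for the instance above, the type is `container_runtime_t`, the SELinux domain used by container runtimes such as podman.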

       

      Versions:

      [root@computer660-63 ~]# rpm -qa podman
      podman-4.4.1-22.el9_2.4.x86_64
      [root@computer660-63 ~]# rpm -qa conmon
      conmon-2.1.7-1.el9_2.x86_64
      [root@computer660-63 ~]# rpm -qa crun
      crun-1.8.4-1.el9_2.x86_64 
      
      [tripleo-admin@computer660-63 ~]$ uname -r
      5.14.0-284.144.1.el9_2.x86_64
      

      Actual results:
      Intermittent VM terminations during tripleo-cleanup service execution

      Expected results:
      No disruption to the workload. 

              Assignee: Unassigned
              Reporter: Rajesh Pulapakula (rpulapak@redhat.com)
              rhos-dfg-upgrades