-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
None
-
None
-
0
-
False
-
-
False
-
?
-
rhos-connectivity-neutron
-
None
-
-
-
-
Priority Bugs
-
1
-
Important
We are currently holding a RHOSO cluster with 250+ nodes with ~32k VMs and ~32k tenant Networks(each VM takes a network). When we create a new VM it takes ~(1min 15s - 4min) to boot which is very slow. But usually it takes less than 30s(using cirros image with basic flavors).
One of the main reasons for slow VM boot times is OVN Northd(https://issues.redhat.com/browse/OSPRH-25563). But apart from that, ovn-controller on compute is also too slow, adding ~(44 - 60) seconds to VM boot time. And ovn poll cycles are taking around ~18-27 seconds.
Let us look at a sample boot time for a new VM on a compute:
sh-5.1$ time openstack server create test-debug-vm-4 --flavor m1.tiny-cirros --image cirro5 --network test-debug-net --security-group workload-allow-all --availability-zone nova:computer660-93.redhat.local --wait
+-------------------------------------+----------------------------------------------------------+
| Field | Value |
+-------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | computer660-93.redhat.local |
| OS-EXT-SRV-ATTR:hypervisor_hostname | computer660-93.redhat.local |
| OS-EXT-SRV-ATTR:instance_name | instance-0001ff5a |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2026-01-22T07:29:25.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | test-debug-net=10.0.0.113 |
| adminPass | P8qRJFCsCsTm |
| config_drive | True |
| created | 2026-01-22T07:27:44Z |
| flavor | m1.tiny-cirros (4d29d86d-1470-483a-b638-0e56216e2cd1) |
| hostId | 876b92bca437a5f98d1d604b59077a3b4a8c8c408fb6f3cd9865a5b9 |
| id | ed8e85ab-85a2-40db-88fb-e0a741da0638 |
| image | cirro5 (27b56258-e5ad-4f8f-8708-df3158c74c7f) |
| key_name | None |
| name | test-debug-vm-4 |
| progress | 0 |
| project_id | 1a29671bc0d54d25aaae8425642a2c02 |
| properties | |
| security_groups | name='workload-allow-all' |
| status | ACTIVE |
| updated | 2026-01-22T07:29:25Z |
| user_id | b51d25480a3f45b8bb71a46fc26749aa |
| volumes_attached | |
+-------------------------------------+----------------------------------------------------------+
real 1m46.909s
user 0m0.862s
sys 0m0.107s
It is to be noted that VM took 1m46s to boot.
Nova compute and ovn-controller logs for the VM "ed8e85ab-85a2-40db-88fb-e0a741da0638":
// nova compute [root@computer660-93 tripleo-admin]# sudo journalctl -u edpm_nova_compute --since '2026-01-22 07:27:00' | grep 'ed8e85ab-85a2-40db-88fb-e0a741da0638' | grep -E '(Starting|vif-plugged|Took|spawned)' Jan 22 07:27:44 computer660-93.redhat.local nova_compute[209538]: 2026-01-22 07:27:44.856 2 DEBUG nova.compute.manager [None req-df8f26bd-d996-4831-bdab-a17849b43f71 b51d25480a3f45b8bb71a46fc26749aa 1a29671bc0d54d25aaae8425642a2c02 - - default default] [instance: ed8e85ab-85a2-40db-88fb-e0a741da0638] Starting instance... _do_build_and_run_instance /usr/lib/python3.9/site-packages/nova/compute/manager.py:2424 Jan 22 07:27:52 computer660-93.redhat.local nova_compute[209538]: 2026-01-22 07:27:52.846 2 DEBUG nova.compute.manager [None req-df8f26bd-d996-4831-bdab-a17849b43f71 b51d25480a3f45b8bb71a46fc26749aa 1a29671bc0d54d25aaae8425642a2c02 - - default default] [instance: ed8e85ab-85a2-40db-88fb-e0a741da0638] Preparing to wait for external event network-vif-plugged-0fb816d3-4594-4370-8967-87f3127c67f7 prepare_for_instance_event /usr/lib/python3.9/site-packages/nova/compute/manager.py:283 Jan 22 07:29:25 computer660-93.redhat.local nova_compute[209538]: 2026-01-22 07:29:25.285 2 DEBUG nova.compute.manager [req-d0498f96-c9c6-4adf-ab77-b52b2e1ad115 req-7796f1aa-e997-4640-b81f-82b521022c96 7e09b58d16134ee8a60f398d219ab4f5 5fbf206b93004e4cb6003b344c5bf773 - - default default] [instance: ed8e85ab-85a2-40db-88fb-e0a741da0638] Received event network-vif-plugged-0fb816d3-4594-4370-8967-87f3127c67f7 external_instance_event /usr/lib/python3.9/site-packages/nova/compute/manager.py:11116 Jan 22 07:29:25 computer660-93.redhat.local nova_compute[209538]: 2026-01-22 07:29:25.287 2 DEBUG nova.compute.manager [req-d0498f96-c9c6-4adf-ab77-b52b2e1ad115 req-7796f1aa-e997-4640-b81f-82b521022c96 7e09b58d16134ee8a60f398d219ab4f5 5fbf206b93004e4cb6003b344c5bf773 - - default default] [instance: ed8e85ab-85a2-40db-88fb-e0a741da0638] Processing event network-vif-plugged-0fb816d3-4594-4370-8967-87f3127c67f7 _process_instance_event /usr/lib/python3.9/site-packages/nova/compute/manager.py:10876 Jan 22 07:29:25 computer660-93.redhat.local nova_compute[209538]: 2026-01-22 07:29:25.288 2 DEBUG nova.compute.manager [None req-df8f26bd-d996-4831-bdab-a17849b43f71 b51d25480a3f45b8bb71a46fc26749aa 1a29671bc0d54d25aaae8425642a2c02 - - default default] [instance: ed8e85ab-85a2-40db-88fb-e0a741da0638] Instance event wait completed in 90 seconds for network-vif-plugged wait_for_instance_event /usr/lib/python3.9/site-packages/nova/compute/manager.py:577 Jan 22 07:29:25 computer660-93.redhat.local nova_compute[209538]: 2026-01-22 07:29:25.305 2 INFO nova.virt.libvirt.driver [-] [instance: ed8e85ab-85a2-40db-88fb-e0a741da0638] Instance spawned successfully. Jan 22 07:29:25 computer660-93.redhat.local nova_compute[209538]: 2026-01-22 07:29:25.359 2 INFO nova.compute.manager [None req-df8f26bd-d996-4831-bdab-a17849b43f71 b51d25480a3f45b8bb71a46fc26749aa 1a29671bc0d54d25aaae8425642a2c02 - - default default] [instance: ed8e85ab-85a2-40db-88fb-e0a741da0638] Took 97.12 seconds to spawn the instance on the hypervisor. Jan 22 07:29:25 computer660-93.redhat.local nova_compute[209538]: 2026-01-22 07:29:25.399 2 INFO nova.compute.manager [None req-df8f26bd-d996-4831-bdab-a17849b43f71 b51d25480a3f45b8bb71a46fc26749aa 1a29671bc0d54d25aaae8425642a2c02 - - default default] [instance: ed8e85ab-85a2-40db-88fb-e0a741da0638] Took 100.51 seconds to build instance. Jan 22 07:29:28 computer660-93.redhat.local nova_compute[209538]: 2026-01-22 07:29:28.129 2 DEBUG nova.compute.manager [req-adf8164e-f4d6-490f-9515-ac1e3b45ebd8 req-cf67c7ac-a10c-4c49-8962-968e73544ebb 7e09b58d16134ee8a60f398d219ab4f5 5fbf206b93004e4cb6003b344c5bf773 - - default default] [instance: ed8e85ab-85a2-40db-88fb-e0a741da0638] Received event network-vif-plugged-0fb816d3-4594-4370-8967-87f3127c67f7 external_instance_event /usr/lib/python3.9/site-packages/nova/compute/manager.py:11116 Jan 22 07:29:28 computer660-93.redhat.local nova_compute[209538]: 2026-01-22 07:29:28.130 2 DEBUG nova.compute.manager [req-adf8164e-f4d6-490f-9515-ac1e3b45ebd8 req-cf67c7ac-a10c-4c49-8962-968e73544ebb 7e09b58d16134ee8a60f398d219ab4f5 5fbf206b93004e4cb6003b344c5bf773 - - default default] [instance: ed8e85ab-85a2-40db-88fb-e0a741da0638] No waiting events found dispatching network-vif-plugged-0fb816d3-4594-4370-8967-87f3127c67f7 pop_instance_event /usr/lib/python3.9/site-packages/nova/compute/manager.py:320 Jan 22 07:29:28 computer660-93.redhat.local nova_compute[209538]: 2026-01-22 07:29:28.130 2 WARNING nova.compute.manager [req-adf8164e-f4d6-490f-9515-ac1e3b45ebd8 req-cf67c7ac-a10c-4c49-8962-968e73544ebb 7e09b58d16134ee8a60f398d219ab4f5 5fbf206b93004e4cb6003b344c5bf773 - - default default] [instance: ed8e85ab-85a2-40db-88fb-e0a741da0638] Received unexpected event network-vif-plugged-0fb816d3-4594-4370-8967-87f3127c67f7 for instance with vm_state active and task_state None. // ovn controller [root@computer660-93 tripleo-admin]# sudo journalctl -u edpm_ovn_controller --since '2026-01-22 07:27:00' --until '2026-01-22 07:30:00' | grep -E '(Claiming|Unreason|0fb816d3|Setting.*up)' Jan 22 07:27:02 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:27:02Z|392415|timeval|WARN|Unreasonably long 18275ms poll interval (18182ms user, 11ms system) Jan 22 07:27:20 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:27:20Z|392669|binding|INFO|Setting lport 041e62b4-34ce-44de-9003-e4baaea1f194 up in Southbound Jan 22 07:27:20 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:27:20Z|392670|timeval|WARN|Unreasonably long 18129ms poll interval (17960ms user, 83ms system) Jan 22 07:27:39 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:27:39Z|392922|timeval|WARN|Unreasonably long 18539ms poll interval (18438ms user, 9ms system) Jan 22 07:27:57 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:27:57Z|392931|timeval|WARN|Unreasonably long 18453ms poll interval (18335ms user, 26ms system) Jan 22 07:28:03 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:28:03Z|392939|timeval|WARN|Unreasonably long 4470ms poll interval (4319ms user, 129ms system) Jan 22 07:28:10 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:28:10Z|392944|binding|INFO|Claiming lport 0fb816d3-4594-4370-8967-87f3127c67f7 for this chassis. Jan 22 07:28:10 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:28:10Z|392945|binding|INFO|0fb816d3-4594-4370-8967-87f3127c67f7: Claiming fa:16:3e:a3:ac:a7 10.0.0.113 Jan 22 07:28:35 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:28:35Z|392947|timeval|WARN|Unreasonably long 25078ms poll interval (24685ms user, 189ms system) Jan 22 07:28:35 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:28:35Z|392951|binding|INFO|Setting lport 0fb816d3-4594-4370-8967-87f3127c67f7 ovn-installed in OVS Jan 22 07:28:54 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:28:54Z|393201|binding|INFO|Setting lport 0fb816d3-4594-4370-8967-87f3127c67f7 up in Southbound Jan 22 07:28:54 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:28:54Z|393202|timeval|WARN|Unreasonably long 18277ms poll interval (18125ms user, 68ms system) Jan 22 07:29:12 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:29:12Z|393457|timeval|WARN|Unreasonably long 18102ms poll interval (17991ms user, 27ms system) Jan 22 07:29:57 computer660-93.redhat.local ovn_controller[5045]: 2026-01-22T07:29:57Z|393464|timeval|WARN|Unreasonably long 27648ms poll interval (27184ms user, 279ms system)
From the above nova-compute logs, its to be observed that..
1) 07:27:44 - Nova starting instance
2) 07:27:52 - Preparing to wait for external event network-vif-plugged-0fb816d3-4594-4370-8967-87f3127c67f7
3) 07:29:25 - Instance event wait completed in 90 seconds for network-vif-plugged
4) 07:29:25 - Took 100.51 seconds to build instance
From ovn-controller:
1) 07:28:10 - Claiming lport 0fb816d3-4594-4370-8967-87f3127c67f7 for this chassis
2) 07:28:54 - Setting lport 0fb816d3-4594-4370-8967-87f3127c67f7 up in Southbound
So the ovn-controller alone was taking ~44 seconds from claiming the port to setting it 'up' in SBDB per VM. And the Instance waited for 'network-vif-plugged' event for 90s, where ~90% of the VM boot time is spent.
Number of existing VMs on the computer-660-93.redhat.local: 249
[root@computer660-93 tripleo-admin]# sudo virsh list | grep -i instance | wc -l 249
Number of openflow rules on br-int currently: ~5.5 million
cookie=0x66f7224c, duration=149032.237s, table=12, n_packets=0, n_bytes=0, priority=110,icmp6,metadata=0x349d,nw_ttl=255,icmp_type=134,icmp_code=0 actions=resubmit(,13) cookie=0x66f7224c, duration=149032.234s, table=12, n_packets=0, n_bytes=0, priority=110,icmp6,metadata=0x349b,nw_ttl=255,icmp_type=134,icmp_code=0 actions=resubmit(,13) cookie=0x66f7224c, duration=149032.188s, table=12, n_packets=0, n_bytes=0, priority=110,icmp6,metadata=0x349d,nw_ttl=255,icmp_type=135,icmp_code=0 actions=resubmit(,13) 5514942 [1]- Done sudo ovs-ofctl dump-flows br-int [2]+ Done sudo ovs-ofctl dump-flows br-int
Versions:
[root@computer660-93 tripleo-admin]# cat /etc/rhoso-release
Red Hat OpenStack Services on OpenShift release 18.0.15 (Antelope)