Bug
Resolution: Unresolved
rhos-18.0 Feature Release 1 (Nov 2024)
Critical
Having a workload like this:
sh-5.1$ openstack server list --all --long
+--------------------------------------+-----------+--------+------------+-------------+----------------------------------------------------------+---------------------------------+--------------------------------------+----------------------------+-------------------+--------------------------------+------------+-------------+
| ID                                   | Name      | Status | Task State | Power State | Networks                                                 | Image Name                      | Image ID                             | Flavor                     | Availability Zone | Host                           | Properties | Host Status |
+--------------------------------------+-----------+--------+------------+-------------+----------------------------------------------------------+---------------------------------+--------------------------------------+----------------------------+-------------------+--------------------------------+------------+-------------+
| 9da89876-b12b-44ce-998a-e15f3bd2c10c | instance6 | ACTIVE | None       | Running     | data=10.10.166.117; dpdkmgmt=10.10.10.139, 10.46.141.161 | rhel-guest-image-9.5-20241009.2 | 44cdcd14-8101-4653-8cf0-29a7bd7d218e | m1_medium_huge_pages_host1 | nova              | compute-1.ctlplane.example.com |            | UP          |
| 118a85e1-fec4-409d-b445-b94d6a7f6f2e | instance5 | ACTIVE | None       | Running     | data=10.10.166.118; dpdkmgmt=10.10.10.181, 10.46.141.165 | rhel-guest-image-9.5-20241009.2 | 44cdcd14-8101-4653-8cf0-29a7bd7d218e | m1_medium_huge_pages_host0 | nova              | compute-0.ctlplane.example.com |            | UP          |
| bbf172cb-a2d8-4ce1-8f42-185c450aff62 | instance4 | ACTIVE | None       | Running     | data=10.10.166.125; dpdkmgmt=10.10.10.200, 10.46.141.170 | rhel-guest-image-9.5-20241009.2 | 44cdcd14-8101-4653-8cf0-29a7bd7d218e | m1_medium_huge_pages_host1 | nova              | compute-1.ctlplane.example.com |            | UP          |
| 6ea17395-14d7-4eed-89e7-525460990ee8 | instance3 | ACTIVE | None       | Running     | data=10.10.166.144; dpdkmgmt=10.10.10.148, 10.46.141.167 | rhel-guest-image-9.5-20241009.2 | 44cdcd14-8101-4653-8cf0-29a7bd7d218e | m1_medium_huge_pages_host0 | nova              | compute-0.ctlplane.example.com |            | UP          |
| f3d82b52-01c2-4d03-9bc2-e7bdd372b832 | instance2 | ACTIVE | None       | Running     | data=10.10.166.171; dpdkmgmt=10.10.10.124, 10.46.141.162 | rhel-guest-image-9.5-20241009.2 | 44cdcd14-8101-4653-8cf0-29a7bd7d218e | m1_medium_huge_pages_host1 | nova              | compute-1.ctlplane.example.com |            | UP          |
| 98f17950-dd1e-4861-9877-2ad6d42c8110 | instance1 | ACTIVE | None       | Running     | data=10.10.166.170; dpdkmgmt=10.10.10.159, 10.46.141.169 | rhel-guest-image-9.5-20241009.2 | 44cdcd14-8101-4653-8cf0-29a7bd7d218e | m1_medium_huge_pages_host0 | nova              | compute-0.ctlplane.example.com |            | UP          |
+--------------------------------------+-----------+--------+------------+-------------+----------------------------------------------------------+---------------------------------+--------------------------------------+----------------------------+-------------------+--------------------------------+------------+-------------+
Ping works normally for VMs:
[zuul@controller-0 ~]$ ping 10.46.141.169
PING 10.46.141.169 (10.46.141.169) 56(84) bytes of data.
64 bytes from 10.46.141.169: icmp_seq=1 ttl=61 time=0.973 ms
64 bytes from 10.46.141.169: icmp_seq=2 ttl=61 time=0.460 ms
...
Then we restart openvswitch.service on one of the compute nodes (compute-0 in this case):
[zuul@panther06 ~]$ ssh -ostricthostkeychecking=no -ouserknownhostsfile=/dev/null -i /tmp/k cloud-admin@192.168.122.101
Warning: Permanently added '192.168.122.101' (ED25519) to the list of known hosts.
Register this system with Red Hat Insights: insights-client --register
Create an account or view all your systems at https://red.ht/insights-dashboard
Last login: Mon Nov 18 08:24:22 2024 from 192.168.122.1
[cloud-admin@compute-0 ~]$ sudo ovs-vsctl show
d49151e9-54fb-448a-a157-a719a460cabe
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-link0
        fail_mode: standalone
        datapath_type: netdev
        Port br-link0
            tag: 164
            Interface br-link0
                type: internal
        Port dpdkbond0
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:06:00.0", n_rxq="2"}
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:06:00.1", n_rxq="2"}
    Bridge br-dpdk1
        fail_mode: standalone
        datapath_type: netdev
        Port dpdk4
            Interface dpdk4
                type: dpdk
                options: {dpdk-devargs="0000:82:00.1", n_rxq="3"}
        Port br-dpdk1
            Interface br-dpdk1
                type: internal
    Bridge br-int
        fail_mode: secure
        datapath_type: netdev
        Port patch-br-int-to-provnet-55a983c0-4995-4d06-95bd-efcd7b6077a7
            Interface patch-br-int-to-provnet-55a983c0-4995-4d06-95bd-efcd7b6077a7
                type: patch
                options: {peer=patch-provnet-55a983c0-4995-4d06-95bd-efcd7b6077a7-to-br-int}
        Port ovn-71cf59-0
            Interface ovn-71cf59-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="172.19.0.31", tos="0"}
                bfd_status: {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="Control Detection Time Expired", remote_state=up, state=up}
        Port tap10c517b9-c0
            Interface tap10c517b9-c0
        Port vhuca97fc52-8a
            Interface vhuca97fc52-8a
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_sockets/vhuca97fc52-8a"}
        Port vhu430dd602-36
            Interface vhu430dd602-36
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_sockets/vhu430dd602-36"}
        Port br-int
            Interface br-int
                type: internal
        Port vhu63995776-9f
            Interface vhu63995776-9f
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_sockets/vhu63995776-9f"}
        Port vhuc4181b43-96
            Interface vhuc4181b43-96
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_sockets/vhuc4181b43-96"}
        Port tap3bc86f39-c0
            Interface tap3bc86f39-c0
        Port vhu0996cf4b-fe
            Interface vhu0996cf4b-fe
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_sockets/vhu0996cf4b-fe"}
        Port ovn-a4fb3c-0
            Interface ovn-a4fb3c-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="172.19.0.100", tos="0"}
        Port vhu7c1125d6-c9
            Interface vhu7c1125d6-c9
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_sockets/vhu7c1125d6-c9"}
        Port ovn-64b940-0
            Interface ovn-64b940-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="172.19.0.32", tos="0"}
                bfd_status: {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="Control Detection Time Expired", remote_state=up, state=up}
        Port ovn-07eed2-0
            Interface ovn-07eed2-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="172.19.0.30", tos="0"}
                bfd_status: {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="Control Detection Time Expired", remote_state=up, state=up}
    Bridge br-dpdk0
        fail_mode: standalone
        datapath_type: netdev
        Port dpdkbond1
            Interface dpdk2
                type: dpdk
                options: {dpdk-devargs="0000:82:00.2", n_rxq="3"}
            Interface dpdk3
                type: dpdk
                options: {dpdk-devargs="0000:82:00.3", n_rxq="3"}
        Port br-dpdk0
            Interface br-dpdk0
                type: internal
        Port patch-provnet-55a983c0-4995-4d06-95bd-efcd7b6077a7-to-br-int
            Interface patch-provnet-55a983c0-4995-4d06-95bd-efcd7b6077a7-to-br-int
                type: patch
                options: {peer=patch-br-int-to-provnet-55a983c0-4995-4d06-95bd-efcd7b6077a7}
    ovs_version: "3.3.3-49.el9fdp"
[cloud-admin@compute-0 ~]$ systemctl -a |grep openvs
  openvswitch.service loaded active exited Open vSwitch
[cloud-admin@compute-0 ~]$ systemctl status openvswitch.service
● openvswitch.service - Open vSwitch
     Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; enabled; preset: disabled)
     Active: active (exited) since Mon 2024-11-18 11:52:54 UTC; 2h 9min ago
    Process: 245106 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 245106 (code=exited, status=0/SUCCESS)
        CPU: 2ms
[cloud-admin@compute-0 ~]$
[cloud-admin@compute-0 ~]$ sudo systemctl restart openvswitch.service
[cloud-admin@compute-0 ~]$ rpm -qi openvswitch3.3
Name        : openvswitch3.3
Version     : 3.3.0
Release     : 49.el9fdp
Architecture: x86_64
Install Date: Thu 14 Nov 2024 08:33:15 AM UTC
Group       : System Environment/Daemons daemon/database/utilities
Size        : 24895143
License     : ASL 2.0 and LGPLv2+ and SISSL
Signature   : RSA/SHA256, Mon 16 Sep 2024 04:17:54 PM UTC, Key ID 199e2f91fd431d51
Source RPM  : openvswitch3.3-3.3.0-49.el9fdp.src.rpm
Build Date  : Mon 16 Sep 2024 08:40:03 AM UTC
Build Host  : x86-64-04.build.eng.rdu2.redhat.com
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : http://www.openvswitch.org/
Summary     : Open vSwitch
Description :
Open vSwitch provides standard network bridging functions and support for the OpenFlow protocol for remote per-flow control of traffic.
After restarting the openvswitch service, we lose connectivity to the VMs hosted on that compute node (the ping loss begins at the moment the openvswitch service was restarted):
[zuul@controller-0 ~]$ ping 10.46.141.169
PING 10.46.141.169 (10.46.141.169) 56(84) bytes of data.
64 bytes from 10.46.141.169: icmp_seq=1 ttl=61 time=0.973 ms
64 bytes from 10.46.141.169: icmp_seq=2 ttl=61 time=0.460 ms
64 bytes from 10.46.141.169: icmp_seq=3 ttl=61 time=0.492 ms
64 bytes from 10.46.141.169: icmp_seq=4 ttl=61 time=0.415 ms
64 bytes from 10.46.141.169: icmp_seq=5 ttl=61 time=0.448 ms
64 bytes from 10.46.141.169: icmp_seq=6 ttl=61 time=0.457 ms
64 bytes from 10.46.141.169: icmp_seq=7 ttl=61 time=0.449 ms
64 bytes from 10.46.141.169: icmp_seq=8 ttl=61 time=0.519 ms
64 bytes from 10.46.141.169: icmp_seq=9 ttl=61 time=0.460 ms
64 bytes from 10.46.141.169: icmp_seq=10 ttl=61 time=0.457 ms
64 bytes from 10.46.141.169: icmp_seq=11 ttl=61 time=0.507 ms
64 bytes from 10.46.141.169: icmp_seq=12 ttl=61 time=0.489 ms
64 bytes from 10.46.141.169: icmp_seq=13 ttl=61 time=0.532 ms
64 bytes from 10.46.141.169: icmp_seq=14 ttl=61 time=0.593 ms
64 bytes from 10.46.141.169: icmp_seq=15 ttl=61 time=0.482 ms
64 bytes from 10.46.141.169: icmp_seq=16 ttl=61 time=0.485 ms
64 bytes from 10.46.141.169: icmp_seq=17 ttl=61 time=0.600 ms
64 bytes from 10.46.141.169: icmp_seq=18 ttl=61 time=0.569 ms
64 bytes from 10.46.141.169: icmp_seq=19 ttl=61 time=0.556 ms
64 bytes from 10.46.141.169: icmp_seq=20 ttl=61 time=0.553 ms
64 bytes from 10.46.141.169: icmp_seq=21 ttl=61 time=0.503 ms
64 bytes from 10.46.141.169: icmp_seq=22 ttl=61 time=0.495 ms
64 bytes from 10.46.141.169: icmp_seq=23 ttl=61 time=27.4 ms
64 bytes from 10.46.141.169: icmp_seq=39 ttl=61 time=1.28 ms
^C
--- 10.46.141.169 ping statistics ---
65 packets transmitted, 24 received, 63.0769% packet loss, time 65486ms
rtt min/avg/max/mdev = 0.415/1.673/27.380/5.363 ms
[zuul@controller-0 ~]$ ping -c1 10.46.141.169
PING 10.46.141.169 (10.46.141.169) 56(84) bytes of data.

--- 10.46.141.169 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
We then need to reboot the VM (via virsh console) to get ping working again (here the ping loss corresponds to the time the VM takes to boot; once it is up, ping replies resume):
[zuul@controller-0 ~]$ ping 10.46.141.169
PING 10.46.141.169 (10.46.141.169) 56(84) bytes of data.
64 bytes from 10.46.141.169: icmp_seq=21 ttl=61 time=1.63 ms
64 bytes from 10.46.141.169: icmp_seq=22 ttl=61 time=0.953 ms
64 bytes from 10.46.141.169: icmp_seq=23 ttl=61 time=0.522 ms
^C
--- 10.46.141.169 ping statistics ---
23 packets transmitted, 3 received, 86.9565% packet loss, time 22504ms
rtt min/avg/max/mdev = 0.522/1.035/1.632/0.456 ms
However, ping to the other VMs hosted on compute-0 still does not work:
[zuul@controller-0 ~]$ ping -c1 10.46.141.167
PING 10.46.141.167 (10.46.141.167) 56(84) bytes of data.

--- 10.46.141.167 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
[zuul@controller-0 ~]$ ping -c1 10.46.141.165
PING 10.46.141.165 (10.46.141.165) 56(84) bytes of data.

--- 10.46.141.165 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
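To check all VMs on the affected compute in one pass rather than pinging each by hand, something like the following could be used. It is only a sketch: the helper names ping_loss and check_vms and the PING_CMD override are hypothetical, and it assumes the iputils ping summary format shown in the outputs above.

```shell
#!/usr/bin/env bash
# Sketch only: ping_loss, check_vms and PING_CMD are hypothetical names,
# not part of any OpenStack or Open vSwitch tooling.

# Read ping output on stdin and print the packet-loss percentage taken from
# the "N packets transmitted, ..." summary line.
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Send one ICMP echo to each address given as an argument and report whether
# it answered. PING_CMD lets the loop be exercised without a network.
check_vms() {
    local ip loss
    for ip in "$@"; do
        loss=$(${PING_CMD:-ping} -c1 -W1 "$ip" 2>/dev/null | ping_loss)
        if [ "$loss" = "0" ]; then
            echo "$ip reachable"
        else
            echo "$ip UNREACHABLE (loss=${loss:-unknown}%)"
        fi
    done
}
```

For the compute-0 instances listed above, this would be invoked as check_vms 10.46.141.169 10.46.141.167 10.46.141.165.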
This issue has also been reproduced with RHEL 8.4 as guest.
Find attached:
ovs-vswitchd-post_reboot_vm.log: ovs-vswitchd.log before restarting the service
ovs-vswitchd-pre_reboot_vm.log: ovs-vswitchd.log before restarting the service and rebooting the VM
instance1_messages: messages of instance1
Reproduction procedure:
1 Deploy VMs
2 Ping works
3 Restart openvswitch service in compute
4 Ping stops working
5 Reboot the VM
6 Ping works again
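
The steps above can be sketched as a small script, using the addresses from this report as defaults. COMPUTE, VM_IP, RUN and the function name reproduce are hypothetical knobs introduced for illustration. By default the script only prints the commands; running it with RUN set to an empty value executes them. The VM reboot in step 5 was done by hand from the virsh console and is left as a manual step here.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction procedure. COMPUTE, VM_IP and RUN are
# hypothetical variables; the defaults are the values seen in this report.
COMPUTE=${COMPUTE-cloud-admin@192.168.122.101}
VM_IP=${VM_IP-10.46.141.169}
RUN=${RUN-echo}   # dry run by default; export RUN= (empty) to really execute

reproduce() {
    # Step 2: confirm the VM answers before touching the compute node.
    $RUN ping -c3 "$VM_IP"
    # Step 3: restart Open vSwitch on the compute node.
    $RUN ssh "$COMPUTE" sudo systemctl restart openvswitch.service
    # Step 4: ping is expected to fail from here on.
    $RUN ping -c3 "$VM_IP"
    # Steps 5-6 (manual): reboot the guest from its virsh console, then
    # ping again to confirm connectivity returns for that VM only.
}
```

Calling reproduce in the default dry-run mode prints the three commands in order, which makes it easy to review them before running against a real deployment.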