Task
Resolution: Unresolved
rhel-9
This ticket is tracking the QE verification effort for the solution to the problem described below.
Description of problem:
The customer is experiencing intermittent connectivity issues with multiple virtual machines; one Windows VM shows the behavior consistently.
The issue has been observed since the cluster was upgraded from 4.17 to 4.18.18.
It is characterized by frequent disconnections, which sometimes resolve after migrating the VM to another node, after restarting the ovnkube-node pod on the affected node, or on their own.
The issue can currently be seen on a Windows VM; in the past it was also observed on Linux VMs spread across different nodes of the cluster.
How the VMs are connected to the physical interface:
bond1 --> enbd-ex(OVS Bridge)-Localnet NAD
Recent troubleshooting isolated the problem to traffic between two VMs in the same VLAN running on two different nodes:
Source:
Node: lben203vpm017u
VM : lvenaacaac601u(10.119.134.31)
VLAN: 2434
# ovn-nbctl show nad.2434_ovn_localnet_switch
switch ea5a5b0f-4697-4450-997d-a6d197e3a291 (nad.2434_ovn_localnet_switch)
    port aac.574.nad.2434_aac-574_virt-launcher-lvenaacaac601u-ct74v
        addresses: ["00:50:56:97:79:64"]
    port cxb.9.nad.2434_cxb-9_virt-launcher-lvencxbapp601u-jpgp2
        addresses: ["02:8e:62:00:00:9f"]
    port nad.2434_ovn_localnet_port
        type: localnet
        tag: 2434
        addresses: ["unknown"]
    port adc.31.nad.2434_adc-31_virt-launcher-wvenadcadc403u-hv69w
        addresses: ["00:50:56:97:67:b2"]
A packet capture on the source node (lben203vpm017u) confirms that the ARP requests from the source VM (10.119.134.31) never leave through the physical interface (bond1/enbd-ex); the capture shows them only on interfaces 0c25ea895989a_3, 1c9e4579fcddb_3 and 6c6fd1c37c900_3. As a result, the destination node lben213vpm007u never receives these packets, and so neither does the destination VM.
# tcpdump -i any host 10.119.134.58
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
09:35:14.751059 0c25ea895989a_3 B ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:14.751076 1c9e4579fcddb_3 Out ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:14.751080 6c6fd1c37c900_3 Out ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:15.775934 0c25ea895989a_3 B ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:15.775951 1c9e4579fcddb_3 Out ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:15.775954 6c6fd1c37c900_3 Out ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:16.798941 0c25ea895989a_3 B ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:16.798945 1c9e4579fcddb_3 Out ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:16.798947 6c6fd1c37c900_3 Out ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:17.823152 0c25ea895989a_3 B ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:17.823156 1c9e4579fcddb_3 Out ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 2
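The check made by eye on the capture above (does the ARP request ever show up on the uplink?) can be sketched as a small script. This is only an illustrative sketch: the function names `interfaces_seen` and `missing_uplinks` are hypothetical, and it assumes the uplink would be listed by `tcpdump -i any` under the names bond1 or enbd-ex if the packet reached it.

```python
import re
from collections import Counter

# Matches "tcpdump -i any" lines such as:
# 09:35:14.751059 0c25ea895989a_3 B   ARP, Request who-has ...
# The third field is the direction label (In, Out, B, M or P).
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+)\s+(\S+)\s+(In|Out|B|M|P)\s+(.*)$")


def interfaces_seen(capture_text):
    """Count captured packets per (interface, direction) pair."""
    counts = Counter()
    for line in capture_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # tcpdump banner lines do not match and are skipped
            _, iface, direction, _ = match.groups()
            counts[(iface, direction)] += 1
    return counts


def missing_uplinks(capture_text, uplinks=("bond1", "enbd-ex")):
    """Return the expected uplink interfaces that never appear in the capture.

    The default uplink names are an assumption taken from this ticket.
    """
    seen = {iface for iface, _ in interfaces_seen(capture_text)}
    return [name for name in uplinks if name not in seen]


sample = """\
09:35:14.751059 0c25ea895989a_3 B   ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:14.751076 1c9e4579fcddb_3 Out ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
09:35:14.751080 6c6fd1c37c900_3 Out ARP, Request who-has 10.119.134.58 tell 10.119.134.31, length 28
"""

print(interfaces_seen(sample))
print(missing_uplinks(sample))
```

For the sample lines, neither uplink name appears, which matches the observation that the request never reaches bond1/enbd-ex. The same function can be run against the reverse-direction capture from the destination node.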
Destination:
Node: lben213vpm007u
VM : wvenaacaac304u(10.119.134.58)
VLAN: 2434
# ovn-nbctl show nad.2434_ovn_localnet_switch
switch b400ae4b-1810-4ea2-aff0-872cb5b5d164 (nad.2434_ovn_localnet_switch)
    port nad.2434_ovn_localnet_port
        type: localnet
        tag: 2434
        addresses: ["unknown"]
    port aac.574.nad.2434_aac-574_virt-launcher-wvenaacaac304u-mcwsz
        addresses: ["00:50:56:97:06:40"]
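To cross-check the logical switch ports and MAC addresses against the packet captures, the `ovn-nbctl show` output can be reduced to a port-to-addresses map. The following is a rough sketch, not a supported tool: `ports_and_macs` is a hypothetical helper, and it assumes each port entry starts with the keyword `port` and carries an `addresses: [...]` field, as in the output pasted in this ticket.

```python
import re


def ports_and_macs(show_output):
    """Map each logical switch port name to its list of addresses."""
    result = {}
    # Split on the standalone "port" keyword so that both multi-line and
    # single-line (flattened) pastes of the output are handled.
    for chunk in re.split(r"\bport\s+", show_output)[1:]:
        name = chunk.split()[0]
        match = re.search(r"addresses:\s*\[(.*?)\]", chunk)
        result[name] = re.findall(r'"([^"]*)"', match.group(1)) if match else []
    return result


sample = (
    'switch b400ae4b-1810-4ea2-aff0-872cb5b5d164 (nad.2434_ovn_localnet_switch) '
    'port nad.2434_ovn_localnet_port type: localnet tag: 2434 addresses: ["unknown"] '
    'port aac.574.nad.2434_aac-574_virt-launcher-wvenaacaac304u-mcwsz '
    'addresses: ["00:50:56:97:06:40"]'
)

for port, addresses in ports_and_macs(sample).items():
    print(port, addresses)
```

Running the same helper on the source node output gives the MAC addresses of the three VM ports there, which can then be compared with the interfaces on which the ARP requests were captured.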
When we pinged in the reverse direction, from 10.119.134.58 (dst) to 10.119.134.31 (src), the behavior was the same: the ARP request never reached the physical interface bond1/enbd-ex.
[root@lben213vpm007u /]# tcpdump -i any host 10.119.134.31 -nnn
tcpdump: data link type LINUX_SLL2
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
11:24:17.023481 a826ad88d47d6_3 B ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:17.025672 12b1dfc23c7ec_3 Out ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:18.012001 a826ad88d47d6_3 B ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:18.012009 12b1dfc23c7ec_3 Out ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:19.012978 a826ad88d47d6_3 B ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:19.012987 12b1dfc23c7ec_3 Out ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:20.014509 a826ad88d47d6_3 B ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:20.014517 12b1dfc23c7ec_3 Out ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:21.020162 a826ad88d47d6_3 B ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:21.020181 12b1dfc23c7ec_3 Out ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:22.020808 a826ad88d47d6_3 B ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:22.020816 12b1dfc23c7ec_3 Out ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:23.022912 a826ad88d47d6_3 B ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:23.022932 12b1dfc23c7ec_3 Out ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:24.010992 a826ad88d47d6_3 B ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:24.010999 12b1dfc23c7ec_3 Out ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:25.011607 a826ad88d47d6_3 B ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
11:24:25.011615 12b1dfc23c7ec_3 Out ARP, Request who-has 10.119.134.31 tell 10.119.134.58, length 28
When we migrated another VM in VLAN 2434 to the lben213vpm007u node, ping from that VM to 10.119.134.58 succeeded.
------------------ same node 58 to 98 --------------
[root@lben213vpm007u /]# tcpdump -i any host 10.119.134.98 and host 10.119.134.58 -nnn
tcpdump: data link type LINUX_SLL2
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
11:27:16.710204 a826ad88d47d6_3 P IP 10.119.134.58 > 10.119.134.98: ICMP echo request, id 1, seq 57, length 40
11:27:16.710559 12b1dfc23c7ec_3 Out IP 10.119.134.58 > 10.119.134.98: ICMP echo request, id 1, seq 57, length 40
11:27:16.710711 12b1dfc23c7ec_3 P IP 10.119.134.98 > 10.119.134.58: ICMP echo reply, id 1, seq 57, length 40
11:27:16.711012 a826ad88d47d6_3 Out IP 10.119.134.98 > 10.119.134.58: ICMP echo reply, id 1, seq 57, length 40
11:27:17.714190 a826ad88d47d6_3 P IP 10.119.134.58 > 10.119.134.98: ICMP echo request, id 1, seq 58, length 40
11:27:17.714202 12b1dfc23c7ec_3 Out IP 10.119.134.58 > 10.119.134.98: ICMP echo request, id 1, seq 58, length 40
11:27:17.714335 12b1dfc23c7ec_3 P IP 10.119.134.98 > 10.119.134.58: ICMP echo reply, id 1, seq 58, length 40
11:27:17.714338 a826ad88d47d6_3 Out IP 10.119.134.98 > 10.119.134.58: ICMP echo reply, id 1, seq 58, length 40
11:27:18.732977 a826ad88d47d6_3 P IP 10.119.134.58 > 10.119.134.98: ICMP echo request, id 1, seq 59, length 40
11:27:18.732985 12b1dfc23c7ec_3 Out IP 10.119.134.58 > 10.119.134.98: ICMP echo request, id 1, seq 59, length 40
11:27:18.733194 12b1dfc23c7ec_3 P IP 10.119.134.98 > 10.119.134.58: ICMP echo reply, id 1, seq 59, length 40
11:27:18.733198 a826ad88d47d6_3 Out IP 10.119.134.98 > 10.119.134.58: ICMP echo reply, id 1, seq 59, length 40
11:27:19.746997 a826ad88d47d6_3 P IP 10.119.134.58 > 10.119.134.98: ICMP echo request, id 1, seq 60, length 40
11:27:19.747006 12b1dfc23c7ec_3 Out IP 10.119.134.58 > 10.119.134.98: ICMP echo request, id 1, seq 60, length 40
11:27:19.747186 12b1dfc23c7ec_3 P IP 10.119.134.98 > 10.119.134.58: ICMP echo reply, id 1, seq 60, length 40
11:27:19.747192 a826ad88d47d6_3 Out IP 10.119.134.98 > 10.119.134.58: ICMP echo reply, id 1, seq 60, length 40
11:27:21.516034 a826ad88d47d6_3 P ARP, Request who-has 10.119.134.98 (02:8e:62:00:01:42) tell 10.119.134.58, length 28
11:27:21.518149 12b1dfc23c7ec_3 Out ARP, Request who-has 10.119.134.98 (02:8e:62:00:01:42) tell 10.119.134.58, length 28
11:27:21.518250 12b1dfc23c7ec_3 P ARP, Reply 10.119.134.98 is-at 02:8e:62:00:01:42, length 28
11:27:21.518582 a826ad88d47d6_3 Out ARP, Reply 10.119.134.98 is-at 02:8e:62:00:01:42, length 28
Version-Release number of selected component (if applicable):
4.18.8
Actual results:
- Packet loss is observed in the ping command run from the bastion.
- PING fails from
Expected results:
- Ping shouldn't be failing.
Additional info:
- Old must-gather - 0110-must-gather.local.9025741996085704360.tar.gz
- Raw troubleshooting text file - 0340-issue_vm.txt
- Source: Packet capture, OVN DB, OVS DB - 0270-source-134.31.tar.gz
- Destination: Packet capture, OVN DB, OVS DB - 0280-dest-134.58.tar.gz
- SOS of source node - 0300-sosreport-lben203vpm017u-2026-01-07-oafvpri.tar.xz
- OVS commands from source: 0330-lben203vpm017u-ovs_command_logs.tar.gz
- OVS commands from destination: 0320-lben213vpm007u-ovs_command_logs.tar.gz
- The customer has yet to upload the latest must-gather.