Bug
Resolution: Duplicate
Minor
Description of problem:
Because CU configured each DPDK NIC with an MTU of 9000, and all of these NICs belong to the same NUMA node (0), CU assumed, based on the document [1], that a dpdk-socket-mem value of 3072 was sufficient, and set that value in their OSP environment.
However, when CU attached several ports to an instance, the error messages [2] appeared in /var/log/messages on a compute node. The message [2] reports "vhuf2c19278-e7, with MTU 1500", even though the MTU of every DPDK NIC is configured as 9000.
After CU changed dpdk-socket-mem from 3072 to 4096, the error stopped occurring and everything worked as expected.
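As a rough cross-check of the two values, the sketch below estimates dpdk-socket-mem for one NUMA node. It assumes the per-MTU sizing rule from the Red Hat OSP 13 NFV planning guide: each distinct MTU on the node needs one mempool of (MTU rounded up to a multiple of 1024, plus 800 bytes of overhead) × 4096 × 64 bytes, then a 512 MB buffer is added and the total is rounded up to a multiple of 1024 MB. These constants and the function name `socket_mem_mb` are assumptions for illustration; please check them against [1]. Under this rule, counting the MTU 1500 pool for the vhost-user port in addition to the MTU 9000 pool raises the estimate from 3072 to 4096, which would be consistent with what CU observed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate dpdk-socket-mem (in MB) for one NUMA node.
# Assumed rule: for each distinct MTU on the node,
#   (MTU rounded up to 1024 + 800 bytes overhead) * (4096 * 64 mbufs),
# then add a 512 MB buffer and round the total up to a multiple of 1024 MB.
socket_mem_mb() {
  local total_mb=512            # assumed fixed buffer
  local mtu rounded
  for mtu in "$@"; do
    rounded=$(( (mtu + 1023) / 1024 * 1024 ))
    # (rounded + 800) * 4096 * 64 bytes == (rounded + 800) / 4 MB exactly,
    # since 4096 * 64 / 1048576 == 1/4 and rounded + 800 is divisible by 4.
    total_mb=$(( total_mb + (rounded + 800) / 4 ))
  done
  echo $(( (total_mb + 1023) / 1024 * 1024 ))
}

socket_mem_mb 9000        # prints 3072 (MTU 9000 pool only)
socket_mem_mb 9000 1500   # prints 4096 (MTU 9000 and MTU 1500 pools)
```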
The procedure CU executed when the error occurred is shown in [3].
According to CU, this problem occurred around Dec 23 15:21 (UTC) (Dec 24 00:21 JST) on the compute node "cpdh002a".
You can check the related logs around that time using the following information:
- Support case number: 03112074
Please run `yank 03112074` in supportshell to see the sosreports.
- ID of one of the affected instances: 1eefaef2-0bad-4b44-9cb9-4df9a096e45f
- One of the affected vhostuser ports: vhuf2c19278-e7
- sosreport of the compute node on which the instance ran:
0020-sosreport-cpdh002a-lab3a-vxf-2021-12-24-otohkun.tar.xz
+ /var/log/messages contains the error message [2].
+ /var/log/containers/nova/nova-compute.log.1 may show related messages
- lspci shows 10 physical interfaces [4], of which the 4 "Mellanox Technologies MT27710 [ConnectX-4 Lx]" ports are the DPDK NICs.
The DPDK NIC names are ens1f0, ens2f0, ens1f1 and ens2f1.
(According to lspci, all 10 of these interfaces belong to the same NUMA node (0).)
- Unfortunately, CU deleted the instance before collecting the sosreports, so neither its configuration in /etc/libvirt/qemu/ nor any information about its ports in sos_commands/openvswitch/ is available.
If you need this information, please let me know, and I will ask CU to reproduce the problem and collect new sosreports.
You can also check CU's T-H-T environment files in the following attachment in supportshell:
- 0110-custom_latest.zip
From my review of the T-H-T files, CU has already configured the following settings:
- network-environment.yaml includes the following:
~~~
parameter_defaults:
  :
  # MTU global configuration
  NeutronGlobalPhysnetMtu: 9000
~~~
- The MTUs of the related physical interfaces and virtual bridges are set to 9000 in the following file:
custom-latest-custom-nics/compute-dpdk-dl360gen10.yaml
[2] /var/log/messages of the compute node:
~~~
Dec 23 15:21:35 cpdh002a-lab3a-vxf ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -- --may-exist add-br br-int -- set Bridge br-int datapath_type=netdev
Dec 23 15:21:35 cpdh002a-lab3a-vxf ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=120 -- --may-exist add-port br-int vhuf2c19278-e7 -- set Interface vhuf2c19278-e7 external-ids:iface-id=f2c19278-e78c-4ab2-946a-fba3ce2b3264 external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:e8:d9:6a external-ids:vm-uuid=1eefaef2-0bad-4b44-9cb9-4df9a096e45f type=dpdkvhostuserclient options:vhost-server-path=/var/lib/vhost_sockets/vhuf2c19278-e7
Dec 23 15:21:35 cpdh002a-lab3a-vxf ovs-vswitchd[238324]: ovs|3779302|netdev_dpdk|ERR|Failed to create memory pool for netdev vhuf2c19278-e7, with MTU 1500 on socket 0: Invalid argument
Dec 23 15:21:35 cpdh002a-lab3a-vxf ovs-vswitchd[238324]: ovs|3779303|dpif_netdev|ERR|Failed to set interface vhuf2c19278-e7 new configuration
~~~
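To see which MTU / NUMA socket pairs ovs-vswitchd failed to allocate a memory pool for (for example after a reproduction attempt), a small filter over /var/log/messages may help. The function name `failed_mempools` is hypothetical, and the match pattern is written only against the message format shown in [2].

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each distinct (MTU, NUMA socket) pair for which
# ovs-vswitchd logged "Failed to create memory pool" in the given log file.
failed_mempools() {
  local log="${1:-/var/log/messages}"
  grep -o 'Failed to create memory pool for netdev [^,]*, with MTU [0-9]* on socket [0-9]*' "$log" \
    | sed -E 's/.*with MTU ([0-9]+) on socket ([0-9]+)/mtu=\1 socket=\2/' \
    | sort -u
}

# Usage on the compute node:
#   failed_mempools /var/log/messages
# For the log excerpt in [2], this prints: mtu=1500 socket=0
```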
[3] procedure which CU executed
~~~network create
curl -g -i -X POST http://10.3.58.68:9696/v2.0/networks -H "User-Agent: osc-lib/1.9.0 keystoneauth1/3.4.0 python-requests/2.14.2 CPython/2.7.5" -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d '
{
"network":
}'
~~~
~~~port create
curl -g -i -X POST http://10.3.58.68:9696/v2.0/ports -H "User-Agent: osc-lib/1.9.0 keystoneauth1/3.4.0 python-requests/2.14.2 CPython/2.7.5" -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d '
{
"port":
}'
~~~
~~~server create
curl -g -i -X POST http://10.3.58.68:8774/v2.1/servers -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '{
"server": {
"name": "$server_name",
"imageRef": "",
"availability_zone": "$az_name",
"block_device_mapping_v2": [
],
"flavorRef": "$flavor_id",
"max_count": 1,
"min_count": 1,
"networks": [
]
}
}'
~~~
~~~server stop
curl -g -i -X POST http://10.3.58.68:8774/v2.1/servers/$server_id/action -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '
'
~~~
~~~server add port (loop this 20 times)
curl -g -i -X POST http://10.3.58.68:8774/v2.1/servers/$server_id/os-interface -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '
{
"interfaceAttachment":
}'
~~~
~~~server start
curl -g -i -X POST http://10.3.58.68:8774/v2.1/servers/$server_id/action -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '
'
~~~
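For reproduction, the REST calls in [3] can be approximated with the `openstack` CLI. This sketch assumes that each of the 20 attachments uses a newly created port on the same network; the request bodies in [3] are elided, so that part is a guess. The function name, the port names, and the `OPENSTACK` override variable (useful for a dry run with `OPENSTACK=echo`) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction sketch of procedure [3] using the openstack CLI.
# Set OPENSTACK=echo to print the commands instead of running them.
OPENSTACK="${OPENSTACK:-openstack}"

reproduce_port_attach() {
  local server="$1" network="$2" count="${3:-20}"
  local i port_id
  # Stop the instance. The stop API call is asynchronous, so a real run may
  # need to wait until the server reports SHUTOFF before attaching ports.
  "$OPENSTACK" server stop "$server"
  for i in $(seq 1 "$count"); do
    # Create a new port on the network and attach it to the stopped server.
    port_id=$("$OPENSTACK" port create --network "$network" -f value -c id "repro-port-$i")
    "$OPENSTACK" server add port "$server" "$port_id"
  done
  # Start the instance again.
  "$OPENSTACK" server start "$server"
}

# Usage:
#   reproduce_port_attach "$server_id" "$network_id" 20
```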
[4] Physical NICs from lspci:
02:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
02:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
02:00.2 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
02:00.3 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
12:00.0 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
12:00.1 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
37:00.0 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
37:00.1 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
5d:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)
5d:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)
Version-Release number of selected component (if applicable):
- Red Hat OpenStack Platform 13.0.8
- openvswitch-2.9.0-110.el7fdp.x86_64
- Interface: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
How reproducible:
Steps to Reproduce:
Please see [3]
Actual results:
When CU set dpdk-socket-mem to 3072, the error[2] occurred. However, when CU set dpdk-socket-mem to 4096, the error[2] stopped occurring and everything worked well.
Expected results:
The error should not occur even when dpdk-socket-mem is set to 3072.
Additional info:
- The output of `xsos -i` is shown in [5]; it lists the MTU of each interface and bridge.
- sosreports of the controller nodes, collected when this problem occurred:
0030-sosreport-ctrf002a-lab3a-vxf-2021-12-24-ekfgneu.tar.xz
0040-sosreport-ctrf001a-lab3a-vxf-2021-12-24-gadoewz.tar.xz
0050-sosreport-ctrf003a-lab3a-vxf-2021-12-24-lvsoxnl.tar.xz
[5] `xsos -i` of the compute node:
IP4
Interface          Master IF    MAC Address        MTU    State  IPv4 Address
=================  ===========  =================  =====  =====  ==================
lo                 -            -                  65536  up     127.0.0.1/8
eno1               -            94:40:c9:3f:62:54  1500   up     -
eno2               -            94:40:c9:3f:62:55  9000   up     10.3.60.13/23
eno3               -            94:40:c9:3f:62:56  1500   up     -
eno4               -            94:40:c9:3f:62:57  1500   up     -
eno5               bond-intpub  d4:f5:ef:0f:b1:30  9000   up     -
eno6               bond-intpub  d4:f5:ef:0f:b1:30  9000   up     -
ens1f1             -            94:40:c9:a2:58:9d  9000   up     10.3.69.4/23
ens2f1             -            94:40:c9:a8:9e:d9  9000   up     10.3.71.4/23
ovs-netdev         -            06:3b:af:68:c9:c7  1500   DOWN   -
br-int             -            26:16:e2:18:86:49  9000   DOWN   -
br-ex              -            f2:d9:99:9b:28:4f  1500   DOWN   -
bond0              -            1e:2b:79:2a:ad:2e  1500   DOWN   -
bond-intpub        -            d4:f5:ef:0f:b1:30  9000   up     -
br-nfv             -            94:40:c9:a2:58:9c  9000   up     -
vlan4@bond-intpub  -            d4:f5:ef:0f:b1:30  9000   up     10.3.51.4/23
vlan5@bond-intpub  -            d4:f5:ef:0f:b1:30  9000   up     10.3.55.4/23
docker0            -            02:42:5c:18:4c:b7  1500   up     172.31.0.1/24
ens1f0             -            94:40:c9:a2:58:9c  9000   up     -
ens2f0             -            94:40:c9:a8:9e:d8  9000   up     -