OSPRH-3165 (Red Hat OpenStack Services on OpenShift)

BZ#2040554 netdev_dpdk logged "Failed to create memory pool for netdev with MTU 1500" even though MTUs of all DPDK NICs were configured as 9000.



      Description of problem:

      The customer (CU) configured every DPDK NIC with an MTU of 9000, and all of these NICs belong to the same NUMA node (0). Based on the documentation [1], they concluded that a dpdk-socket-mem value of 3072 was sufficient and set it in their OSP environment.

      However, CU encountered the error messages [2] in /var/log/messages on a compute node when they attached several ports to an instance. The message [2] reports "vhuf2c19278-e7, with MTU 1500", even though the MTUs of all DPDK NICs are configured as 9000.

      After CU changed dpdk-socket-mem from 3072 to 4096, the error stopped occurring and everything worked well.
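
The two values above (3072 and 4096) are consistent with the sizing rule in the documentation [1], as I understand it: per MTU, round the MTU up to a multiple of 1024, add 800 bytes of overhead, multiply by 4096 * 64 mempool packets, then sum over the distinct MTUs on the socket, add a 512 MB buffer, and round up to a multiple of 1024 MB. The sketch below works through that arithmetic. The function names are hypothetical, and the assumption that OVS needs an extra MTU 1500 mempool on socket 0 is inferred from the error message [2], not a confirmed root cause.

```python
# Constants as described in the OVS-DPDK memory sizing section of [1]
# (restated from memory; verify against the guide before relying on them).
OVERHEAD_BYTES = 800          # per-packet overhead
MEMPOOL_PACKETS = 4096 * 64   # packets per mempool
BUFFER_MB = 512               # extra buffer per socket
MB = 1024 * 1024


def round_up(value, multiple):
    """Round value up to the next multiple (integer arithmetic)."""
    return -(-value // multiple) * multiple


def memory_per_mtu_mb(mtu):
    """Memory in MB needed for one mempool serving the given MTU."""
    rounded_mtu = round_up(mtu, 1024)
    return (rounded_mtu + OVERHEAD_BYTES) * MEMPOOL_PACKETS // MB


def socket_mem_mb(mtus):
    """dpdk-socket-mem in MB for one NUMA node hosting the given MTUs."""
    total = sum(memory_per_mtu_mb(m) for m in set(mtus)) + BUFFER_MB
    return round_up(total, 1024)


if __name__ == "__main__":
    print(socket_mem_mb([9000]))        # only MTU 9000 on the socket
    print(socket_mem_mb([9000, 1500]))  # MTU 9000 plus an MTU 1500 mempool
```

Under these assumptions, a socket that only serves MTU 9000 needs 2504 + 512 = 3016 MB, which rounds up to 3072. If a mempool for MTU 1500 is also required on the same socket, the total becomes 2504 + 712 + 512 = 3728 MB, which rounds up to 4096.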

      The procedure CU executed when the error occurred is shown in [3].

      According to CU, this problem occurred around Dec 23 15:21 (UTC) (Dec 24 00:21 (JST)) on the compute node "cpdh002a".

      You can check related logs around the above time by referring to the following information:

      • Support case number: 03112074
        Please run `yank 03112074` in supportshell to see the sosreports.
      • ID of one of the affected instances: 1eefaef2-0bad-4b44-9cb9-4df9a096e45f
      • One of the affected vhostuser ports: vhuf2c19278-e7
      • sosreport of the compute node on which the instance ran:

      0020-sosreport-cpdh002a-lab3a-vxf-2021-12-24-otohkun.tar.xz

      + /var/log/messages has the error message[2].
      + /var/log/containers/nova/nova-compute.log.1 may show related messages

      • lspci shows 10 physical interfaces [4], of which 4 are DPDK NICs ("Mellanox Technologies MT27710 [ConnectX-4 Lx]").
        The DPDK NICs are named ens1f0, ens2f0, ens1f1 and ens2f1.

      (As far as I checked lspci, these 10 interfaces belong to the same NUMA node (0).)

      • Unfortunately, CU had deleted the instance before they collected the sosreports, so we can't find its configuration in /etc/libvirt/qemu/ or any information about the ports in sos_commands/openvswitch/.

      If you need the above information, please let me know. I will ask CU to reproduce the problem and collect new sosreports.

      You can also check CU's T-H-T environment files in the following attachment file in supportshell.

      • 0110-custom_latest.zip

      As far as I checked the T-H-T files, they have already configured the following settings:

      • network-environment.yaml includes the following:

      ~~~
      parameter_defaults:
        :
        # MTU global configuration
        NeutronGlobalPhysnetMtu: 9000
      ~~~
      • MTUs of the related physical interfaces and virtual bridges are set to 9000 in the following file:

      custom-latest-custom-nics/compute-dpdk-dl360gen10.yaml

      [1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/network_functions_virtualization_planning_and_configuration_guide/index#c_ovsdpdk-memory-params

      [2] /var/log/messages of the compute node:
      ~~~
      Dec 23 15:21:35 cpdh002a-lab3a-vxf ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -- --may-exist add-br br-int -- set Bridge br-int datapath_type=netdev
      Dec 23 15:21:35 cpdh002a-lab3a-vxf ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=120 -- --may-exist add-port br-int vhuf2c19278-e7 -- set Interface vhuf2c19278-e7 external-ids:iface-id=f2c19278-e78c-4ab2-946a-fba3ce2b3264 external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:e8:d9:6a external-ids:vm-uuid=1eefaef2-0bad-4b44-9cb9-4df9a096e45f type=dpdkvhostuserclient options:vhost-server-path=/var/lib/vhost_sockets/vhuf2c19278-e7
      Dec 23 15:21:35 cpdh002a-lab3a-vxf ovs-vswitchd[238324]: ovs|3779302|netdev_dpdk|ERR|Failed to create memory pool for netdev vhuf2c19278-e7, with MTU 1500 on socket 0: Invalid argument
      Dec 23 15:21:35 cpdh002a-lab3a-vxf ovs-vswitchd[238324]: ovs|3779303|dpif_netdev|ERR|Failed to set interface vhuf2c19278-e7 new configuration
      ~~~
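
To find every affected port in a sosreport's /var/log/messages, the netdev_dpdk error shown in [2] can be matched with a regular expression. The following is a minimal sketch that assumes the message format is exactly as in the log excerpt above; the names `MempoolError` and `parse_mempool_error` are hypothetical helpers, not part of any OVS tooling.

```python
import re
from typing import NamedTuple, Optional

# Pattern modelled on the netdev_dpdk ERR line in [2]; other OVS versions
# may word the message differently.
MEMPOOL_ERR = re.compile(
    r"netdev_dpdk\|ERR\|Failed to create memory pool for netdev "
    r"(?P<port>[^,\s]+), with MTU (?P<mtu>\d+) "
    r"on socket (?P<socket>\d+): (?P<reason>.+)$"
)


class MempoolError(NamedTuple):
    port: str
    mtu: int
    socket: int
    reason: str


def parse_mempool_error(line: str) -> Optional[MempoolError]:
    """Return the parsed error, or None if the line is a different message."""
    match = MEMPOOL_ERR.search(line)
    if match is None:
        return None
    return MempoolError(
        port=match["port"],
        mtu=int(match["mtu"]),
        socket=int(match["socket"]),
        reason=match["reason"].strip(),
    )


if __name__ == "__main__":
    sample = (
        "Dec 23 15:21:35 cpdh002a-lab3a-vxf ovs-vswitchd[238324]: "
        "ovs|3779302|netdev_dpdk|ERR|Failed to create memory pool for netdev "
        "vhuf2c19278-e7, with MTU 1500 on socket 0: Invalid argument"
    )
    print(parse_mempool_error(sample))
```

Running the parser over each line of the log and collecting the non-None results gives the list of vhost-user ports, MTUs and NUMA sockets for which mempool creation failed.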

      [3] procedure which CU executed
      ~~~network create
      curl -g -i -X POST http://10.3.58.68:9696/v2.0/networks -H "User-Agent: osc-lib/1.9.0 keystoneauth1/3.4.0 python-requests/2.14.2 CPython/2.7.5" -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d '
      {
      "network":

      { "name": $network_name, "provider:physical_network": "nfv-plane", "admin_state_up": true, "provider:network_type": "vlan", "shared": true, "port_security_enabled": false, "mtu": "9000", "provider:segmentation_id": $vlan_id }

      }'

      ~~~

      ~~~port create
      curl -g -i -X POST http://10.3.58.68:9696/v2.0/ports -H "User-Agent: osc-lib/1.9.0 keystoneauth1/3.4.0 python-requests/2.14.2 CPython/2.7.5" -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN" -d '
      {
      "port":

      { "network_id": $network_id, "port_security_enabled": false, "name": $port_name, "admin_state_up": true }

      }'

      ~~~

      ~~~server create
      curl -g -i -X POST http://10.3.58.68:8774/v2.1/servers -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '{
      "server": {
      "name": "$server_name",
      "imageRef": "",
      "availability_zone": "$az_name",
      "block_device_mapping_v2": [

      { "source_type": "volume", "boot_index": "0", "uuid": "$volume_id", "destination_type": "volume" }

      ],
      "flavorRef": "$flavor_id",
      "max_count": 1,
      "min_count": 1,
      "networks": [

      { "port": "$port_id" }

      ]
      }
      }'

      ~~~

      ~~~server stop
      curl -g -i -X POST http://10.3.58.68:8774/v2.1/servers/$server_id/action -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '

      { "os-stop": null }

      '

      ~~~

      ~~~server add port (loop this 20 times)
      curl -g -i -X POST http://10.3.58.68:8774/v2.1/servers/$server_id/os-interface -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '
      {
      "interfaceAttachment":

      { "port_id": "$port_id" }

      }'

      ~~~

      ~~~server start
      curl -g -i -X POST http://10.3.58.68:8774/v2.1/servers/$server_id/action -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '

      { "os-start": null }

      '

      ~~~

      [4] Physical NICs from lspci:

      02:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
      02:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
      02:00.2 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
      02:00.3 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
      12:00.0 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
      12:00.1 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
      37:00.0 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
      37:00.1 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
      5d:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)
      5d:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)

      Version-Release number of selected component (if applicable):

      • Red Hat OpenStack Platform 13.0.8
      • openvswitch-2.9.0-110.el7fdp.x86_64
      • Interface: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

      How reproducible:

      Steps to Reproduce:

      Please see [3]

      Actual results:

      When CU set dpdk-socket-mem to 3072, the error[2] occurred. However, when CU set dpdk-socket-mem to 4096, the error[2] stopped occurring and everything worked well.

      Expected results:

      Even when dpdk-socket-mem is set to 3072, the error should not occur.

      Additional info:

      • The output of xsos -i is shown in [5]. It lists the MTU of each interface and bridge.
      • sosreports of the controller nodes from when this problem occurred:

      0030-sosreport-ctrf002a-lab3a-vxf-2021-12-24-ekfgneu.tar.xz
      0040-sosreport-ctrf001a-lab3a-vxf-2021-12-24-gadoewz.tar.xz
      0050-sosreport-ctrf003a-lab3a-vxf-2021-12-24-lvsoxnl.tar.xz

      [5] `xsos -i` of the compute node:
      IP4
      Interface          Master IF    MAC Address        MTU    State  IPv4 Address
      =================  ===========  =================  =====  =====  ==================
      lo                 -            -                  65536  up     127.0.0.1/8
      eno1               -            94:40:c9:3f:62:54  1500   up     -
      eno2               -            94:40:c9:3f:62:55  9000   up     10.3.60.13/23
      eno3               -            94:40:c9:3f:62:56  1500   up     -
      eno4               -            94:40:c9:3f:62:57  1500   up     -
      eno5               bond-intpub  d4:f5:ef:0f:b1:30  9000   up     -
      eno6               bond-intpub  d4:f5:ef:0f:b1:30  9000   up     -
      ens1f1             -            94:40:c9:a2:58:9d  9000   up     10.3.69.4/23
      ens2f1             -            94:40:c9:a8:9e:d9  9000   up     10.3.71.4/23
      ovs-netdev         -            06:3b:af:68:c9:c7  1500   DOWN   -
      br-int             -            26:16:e2:18:86:49  9000   DOWN   -
      br-ex              -            f2:d9:99:9b:28:4f  1500   DOWN   -
      bond0              -            1e:2b:79:2a:ad:2e  1500   DOWN   -
      bond-intpub        -            d4:f5:ef:0f:b1:30  9000   up     -
      br-nfv             -            94:40:c9:a2:58:9c  9000   up     -
      vlan4@bond-intpub  -            d4:f5:ef:0f:b1:30  9000   up     10.3.51.4/23
      vlan5@bond-intpub  -            d4:f5:ef:0f:b1:30  9000   up     10.3.55.4/23
      docker0            -            02:42:5c:18:4c:b7  1500   up     172.31.0.1/24
      ens1f0             -            94:40:c9:a2:58:9c  9000   up     -
      ens2f0             -            94:40:c9:a8:9e:d8  9000   up     -

              ratailor_jira Rajesh Tailor (Inactive)
              jira-bugzilla-migration RH Bugzilla Integration
              rhos-dfg-compute