Bug
Resolution: Cannot Reproduce
Description of problem:
Removing a VM and its ports (VFs) triggers a kernel crash when the compute nodes run an RT (real-time) kernel image:
[ 8184.214770] IPv4: martian source 10.35.74.8 from 10.35.74.126, on dev eno1
[ 8184.214773] ll header: 00000000: ff ff ff ff ff ff 9c cc 83 58 1c 60 08 06 .........X.`..
[ 8192.714949] i40e 0000:05:00.2: Setting MAC b6:e2:14:b6:d6:4e on VF 8
[ 8192.800714] i40e 0000:05:00.2: Bring down and up the VF interface to make this change effective.
[ 8192.811921] iavf 0000:05:0b.0: enabling device (0000 -> 0002)
[ 8192.874279] iavf 0000:05:0b.0: Multiqueue Enabled: Queue pair count = 4
[ 8192.878943] iavf 0000:05:0b.0: MAC address: b6:e2:14:b6:d6:4e
[ 8192.878945] iavf 0000:05:0b.0: GRO is enabled
[ 8192.893759] iavf 0000:05:0b.0 enp5s0f2v8: renamed from eth0
[ 8192.999646] iavf 0000:05:0b.0: Reset warning received from the PF
[ 8192.999649] iavf 0000:05:0b.0: Scheduling reset task
[ 8193.105429] i40e 0000:05:00.2: VF 8 is now untrusted
[ 8193.108240] IPv6: ADDRCONF(NETDEV_UP): enp5s0f2v8: link is not ready
[ 8193.121854] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 8193.121856] PGD 0 P4D 0
[ 8193.121860] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 8193.121863] CPU: 21 PID: 5689 Comm: NetworkManager Kdump: loaded Not tainted 4.18.0-193.28.1.rt13.77.el8_2.x86_64 #1
[ 8193.121864] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.8.0 005/17/2018
[ 8193.121872] RIP: 0010:iavf_alloc_rx_buffers+0x4f/0x250 [iavf]
[ 8193.121874] Code: 0f 85 df 00 00 00 0f b7 47 48 41 89 f7 48 89 fb 49 89 c4 48 8d 14 40 49 89 c5 48 8b 47 20 49 c1 e4 05 4c 03 67 08 48 8d 2c d0 <48> 83 7d 08 00 0f b7 4b 46 0f 84 c1 00 00 00 48 83 83 80 00 00 00
[ 8193.121875] RSP: 0018:ffffc16857923558 EFLAGS: 00010246
[ 8193.121877] RAX: 0000000000000000 RBX: ffff9b72e22e1000 RCX: 0000000000000200
[ 8193.121878] RDX: 0000000000000000 RSI: 00000000000001ff RDI: ffff9b72e22e1000
[ 8193.121879] RBP: 0000000000000000 R08: 0000000000000600 R09: ffff9b7b220a0ec0
[ 8193.121880] R10: 0000000092492480 R11: 0000000000000000 R12: 0000000000000000
[ 8193.121881] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000001ff
[ 8193.121882] FS: 00007f7fff96d200(0000) GS:ffff9b7b3f880000(0000) knlGS:0000000000000000
[ 8193.121883] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8193.121884] CR2: 0000000000000008 CR3: 0000003fe9bce001 CR4: 00000000003626e0
[ 8193.121886] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8193.121887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8193.121887] Call Trace:
[ 8193.121896] iavf_configure+0x124/0x180 [iavf]
[ 8193.121901] iavf_open+0x100/0x180 [iavf]
[ 8193.121905] __dev_open+0xcd/0x160
[ 8193.121908] __dev_change_flags+0x1ad/0x220
[ 8193.121912] dev_change_flags+0x21/0x60
[ 8193.121916] do_setlink+0x314/0xed0
[ 8193.121920] ? preempt_count_add+0x79/0xb0
[ 8193.121922] ? preempt_count_add+0x79/0xb0
[ 8193.121926] ? __nla_validate_parse+0x51/0x840
It can be reproduced by running the following test case:
python -m testtools.run nfv_tempest_plugin.tests.scenario.test_nfv_sriov_usecases.TestSriovScenarios.test_sriov_free_resource
with the following deployment templates:
https://gitlab.cee.redhat.com/mnietoji/deployment_templates/-/tree/460218cb433959a6b73597a437882966391b1417/tht/panther08/ospd-16.1-geneve-ovn-dpdk-sriov-ctlplane-dataplane-bonding-rt-hybrid-performance-panther08
The test case does roughly the following:
#!/usr/bin/env bash
#networks
openstack network create --provider-network-type geneve mgmt
openstack subnet create --gateway 10.10.10.254 --network mgmt --subnet-range 10.10.10.0/24 --dhcp --dns-nameserver 10.46.0.31 --dns-nameserver 8.8.8.8 --allocation-pool start=10.10.10.100,end=10.10.10.200 mgmt_subnet
openstack network create --provider-physical-network sriov-1 --provider-network-type vlan sriov_vf
openstack subnet create --gateway 40.0.0.254 --network sriov_vf --subnet-range 40.0.0.0/24 --dhcp --dns-nameserver 10.46.0.31 --dns-nameserver 8.8.8.8 --allocation-pool start=40.0.0.100,end=40.0.0.200 sriov_vf_subnet
#ports
openstack port create --network mgmt --vnic-type normal mgmt_1
openstack port create --network mgmt --vnic-type normal mgmt_2
openstack port create --network mgmt --vnic-type normal mgmt_3
openstack port create --network mgmt --vnic-type normal mgmt_4
openstack port create --network sriov_vf --vnic-type direct sriov_vf_1
openstack port create --network sriov_vf --vnic-type direct sriov_vf_2
openstack port create --network sriov_vf --vnic-type direct sriov_vf_3
openstack port create --network sriov_vf --vnic-type direct sriov_vf_4
#flavor
openstack flavor create --ram 8192 --disk 20 --vcpus 6 nfv_qe_base_flavor
openstack flavor set nfv_qe_base_flavor --property hw:mem_page_size=large --property hw:cpu_policy=dedicated --property hw:cpu_realtime=yes --property hw:cpu_emulator_threads=isolate --property hw:cpu_realtime_mask=^0-1
#image
curl -o rhel-guest-image-7-6-210-x86-64-qcow2 http://rhos-qe-mirror-tlv.usersys.redhat.com/brewroot/packages/rhel-guest-image/7.6/210/images/rhel-guest-image-7.6-210.x86_64.qcow2
openstack image create --disk-format qcow2 --container-format bare --public --file ./rhel-guest-image-7-6-210-x86-64-qcow2 rhel-guest-image-7-6-210-x86-64-qcow2
#keypair
openstack keypair create --public-key /home/stack/.ssh/id_rsa.pub mykeypair
#vms
openstack server create --key-name mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_1 --port sriov_vf_1 myinstance1
openstack server create --key-name mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_2 --port sriov_vf_2 myinstance2
openstack server create --key-name mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_3 --port sriov_vf_3 myinstance3
openstack server create --key-name mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_4 --port sriov_vf_4 myinstance4
#destroy ports and vms
ips=$(openstack server list --a -c Networks -f value | sed 's/[=,;]/ /g' | awk '
')
ips=$(echo $ips | sed 's/ /|/g')
ports=$(openstack port list -f value | egrep $ips | awk '
')
servers=$(openstack server list --a -c ID -f value)
for server in $servers; do
    openstack server delete "$server"
done
for port in $ports; do
    openstack port delete "$port"
done
The crash does not occur on every run of the test case, but I have reproduced it several times.
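Because the crash is intermittent, it helps to check each saved kernel log for the same signature rather than for any crash at all. The helper below is a minimal sketch: the function name `has_iavf_oops` and the idea of passing a saved log file as an argument are my assumptions, not part of the test case. It only matches the two lines seen in the trace above (the NULL pointer dereference message and the RIP in `iavf_alloc_rx_buffers`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the given kernel log file contains the
# crash signature from this report, non-zero otherwise.
has_iavf_oops() {
    local log_file="$1"
    # Both lines must be present: the NULL dereference and the iavf RIP.
    grep -q 'BUG: unable to handle kernel NULL pointer dereference' "$log_file" &&
        grep -q 'RIP: .*iavf_alloc_rx_buffers' "$log_file"
}
```

For example, `has_iavf_oops saved-dmesg.txt && echo "crash reproduced"`, where `saved-dmesg.txt` stands for whatever kernel log was collected from the compute node after the run.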
Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20210323.n.0
Red Hat Enterprise Linux release 8.2 (Ootpa)
Linux computeovndpdksriovrt-1 4.18.0-193.28.1.rt13.77.el8_2.x86_64 #1 SMP PREEMPT RT Fri Oct 16 14:11:07 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
How reproducible:
Intermittent; see the description above.
Actual results:
Kernel crash (NULL pointer dereference in iavf_alloc_rx_buffers).
Expected results:
No kernel crash should occur.
Additional info:
I will upload sos reports and kernel crash dumps.