  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-14874

BZ#1950268 Removing a VM and its ports (VFs) produces a kernel crash when using a RT image in computes

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Undefined
    • python-networking-ovn

      Description of problem:
      Removing a VM and its ports (VFs) produces a kernel crash when using a RT image in computes

      [ 8184.214770] IPv4: martian source 10.35.74.8 from 10.35.74.126, on dev eno1
      [ 8184.214773] ll header: 00000000: ff ff ff ff ff ff 9c cc 83 58 1c 60 08 06 .........X.`..
      [ 8192.714949] i40e 0000:05:00.2: Setting MAC b6:e2:14:b6:d6:4e on VF 8
      [ 8192.800714] i40e 0000:05:00.2: Bring down and up the VF interface to make this change effective.
      [ 8192.811921] iavf 0000:05:0b.0: enabling device (0000 -> 0002)
      [ 8192.874279] iavf 0000:05:0b.0: Multiqueue Enabled: Queue pair count = 4
      [ 8192.878943] iavf 0000:05:0b.0: MAC address: b6:e2:14:b6:d6:4e
      [ 8192.878945] iavf 0000:05:0b.0: GRO is enabled
      [ 8192.893759] iavf 0000:05:0b.0 enp5s0f2v8: renamed from eth0
      [ 8192.999646] iavf 0000:05:0b.0: Reset warning received from the PF
      [ 8192.999649] iavf 0000:05:0b.0: Scheduling reset task
      [ 8193.105429] i40e 0000:05:00.2: VF 8 is now untrusted
      [ 8193.108240] IPv6: ADDRCONF(NETDEV_UP): enp5s0f2v8: link is not ready
      [ 8193.121854] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [ 8193.121856] PGD 0 P4D 0
      [ 8193.121860] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ 8193.121863] CPU: 21 PID: 5689 Comm: NetworkManager Kdump: loaded Not tainted 4.18.0-193.28.1.rt13.77.el8_2.x86_64 #1
      [ 8193.121864] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.8.0 005/17/2018
      [ 8193.121872] RIP: 0010:iavf_alloc_rx_buffers+0x4f/0x250 [iavf]
      [ 8193.121874] Code: 0f 85 df 00 00 00 0f b7 47 48 41 89 f7 48 89 fb 49 89 c4 48 8d 14 40 49 89 c5 48 8b 47 20 49 c1 e4 05 4c 03 67 08 48 8d 2c d0 <48> 83 7d 08 00 0f b7 4b 46 0f 84 c1 00 00 00 48 83 83 80 00 00 00
      [ 8193.121875] RSP: 0018:ffffc16857923558 EFLAGS: 00010246
      [ 8193.121877] RAX: 0000000000000000 RBX: ffff9b72e22e1000 RCX: 0000000000000200
      [ 8193.121878] RDX: 0000000000000000 RSI: 00000000000001ff RDI: ffff9b72e22e1000
      [ 8193.121879] RBP: 0000000000000000 R08: 0000000000000600 R09: ffff9b7b220a0ec0
      [ 8193.121880] R10: 0000000092492480 R11: 0000000000000000 R12: 0000000000000000
      [ 8193.121881] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000001ff
      [ 8193.121882] FS: 00007f7fff96d200(0000) GS:ffff9b7b3f880000(0000) knlGS:0000000000000000
      [ 8193.121883] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 8193.121884] CR2: 0000000000000008 CR3: 0000003fe9bce001 CR4: 00000000003626e0
      [ 8193.121886] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 8193.121887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 8193.121887] Call Trace:
      [ 8193.121896] iavf_configure+0x124/0x180 [iavf]
      [ 8193.121901] iavf_open+0x100/0x180 [iavf]
      [ 8193.121905] __dev_open+0xcd/0x160
      [ 8193.121908] __dev_change_flags+0x1ad/0x220
      [ 8193.121912] dev_change_flags+0x21/0x60
      [ 8193.121916] do_setlink+0x314/0xed0
      [ 8193.121920] ? preempt_count_add+0x79/0xb0
      [ 8193.121922] ? preempt_count_add+0x79/0xb0
      [ 8193.121926] ? __nla_validate_parse+0x51/0x840
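      For triage, a small helper along these lines can pull the faulting symbol out of a saved kernel log. This is only a sketch: the function name `oops_rip_symbol` is made up for illustration, and it assumes the log uses the `RIP: 0010:symbol+offset/length [module]` layout seen in the trace above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any existing tool): read a kernel log on
# stdin and print the symbol named on the first "RIP: 0010:" line.
oops_rip_symbol() {
    # In a basic regular expression the "+" is a literal plus sign, so the
    # capture group stops right before the "+0x..." offset.
    sed -n 's/.*RIP: 0010:\([A-Za-z0-9_.]*\)+.*/\1/p' | head -n 1
}

# Example usage on a live system or a saved log:
#   dmesg | oops_rip_symbol
#   oops_rip_symbol < vmcore-dmesg.txt
```

      Fed the log excerpt from this report, the helper should print the symbol from the RIP line, which makes it easier to compare crashes across runs.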

      It can be reproduced by running the following test case:
      python -m testtools.run nfv_tempest_plugin.tests.scenario.test_nfv_sriov_usecases.TestSriovScenarios.test_sriov_free_resource
      And the following templates:
      https://gitlab.cee.redhat.com/mnietoji/deployment_templates/-/tree/460218cb433959a6b73597a437882966391b1417/tht/panther08/ospd-16.1-geneve-ovn-dpdk-sriov-ctlplane-dataplane-bonding-rt-hybrid-performance-panther08

      The testcase does something similar to the following:
      #!/usr/bin/env bash

      #networks
      openstack network create --provider-network-type geneve mgmt
      openstack subnet create --gateway 10.10.10.254 --network mgmt --subnet-range 10.10.10.0/24 --dhcp --dns-nameserver 10.46.0.31 --dns-nameserver 8.8.8.8 --allocation-pool start=10.10.10.100,end=10.10.10.200 mgmt_subnet
      openstack network create --provider-physical-network sriov-1 --provider-network-type vlan sriov_vf
      openstack subnet create --gateway 40.0.0.254 --network sriov_vf --subnet-range 40.0.0.0/24 --dhcp --dns-nameserver 10.46.0.31 --dns-nameserver 8.8.8.8 --allocation-pool start=40.0.0.100,end=40.0.0.200 sriov_vf_subnet

      #ports
      openstack port create --network mgmt --vnic-type normal mgmt_1
      openstack port create --network mgmt --vnic-type normal mgmt_2
      openstack port create --network mgmt --vnic-type normal mgmt_3
      openstack port create --network mgmt --vnic-type normal mgmt_4
      openstack port create --network sriov_vf --vnic-type direct sriov_vf_1
      openstack port create --network sriov_vf --vnic-type direct sriov_vf_2
      openstack port create --network sriov_vf --vnic-type direct sriov_vf_3
      openstack port create --network sriov_vf --vnic-type direct sriov_vf_4
      #flavor
      openstack flavor create --ram 8192 --disk 20 --vcpus 6 nfv_qe_base_flavor
      openstack flavor set nfv_qe_base_flavor --property hw:mem_page_size=large --property hw:cpu_policy=dedicated --property hw:cpu_realtime=yes --property hw:cpu_emulator_threads=isolate --property hw:cpu_realtime_mask=^0-1

      #image
      curl -o rhel-guest-image-7-6-210-x86-64-qcow2 http://rhos-qe-mirror-tlv.usersys.redhat.com/brewroot/packages/rhel-guest-image/7.6/210/images/rhel-guest-image-7.6-210.x86_64.qcow2
      openstack image create --disk-format qcow2 --container-format bare --public --file ./rhel-guest-image-7-6-210-x86-64-qcow2 rhel-guest-image-7-6-210-x86-64-qcow2

      #keypair
      openstack keypair create --public-key /home/stack/.ssh/id_rsa.pub mykeypair

      #vms
      openstack server create --key-name mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_1 --port sriov_vf_1 myinstance1
      openstack server create --key-name mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_2 --port sriov_vf_2 myinstance2
      openstack server create --key-name mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_3 --port sriov_vf_3 myinstance3
      openstack server create --key-name mykeypair --flavor nfv_qe_base_flavor --image rhel-guest-image-7-6-210-x86-64-qcow2 --security-group default --port mgmt_4 --port sriov_vf_4 myinstance4
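      #destroy ports and vms
      # Collect the fixed IPs of all servers; after splitting on '=', ',' and ';'
      # the addresses are in fields 2 and 4
      ips=$(openstack server list --all-projects -c Networks -f value | sed 's/[=,;]/ /g' | awk '{print $2,$4}')
      # Join the IPs into one alternation pattern (echo is left unquoted on
      # purpose so that newlines collapse into spaces first)
      ips=$(echo $ips | sed 's/ /|/g')
      ports=$(openstack port list -f value | grep -E "$ips" | awk '{print $1}')
      servers=$(openstack server list --all-projects -c ID -f value)
      for server in $servers; do
          openstack server delete "$server"
      done
      for port in $ports; do
          openstack port delete "$port"
      done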

      The crash does not occur on every run of the test case, but I have reproduced it several times.
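      Since the failure is intermittent, the test case can be looped until a run fails. The following is a minimal sketch, not something taken from the test plugin: `repeat_until_failure` is a hypothetical helper name, the run count is arbitrary, and it assumes that a crash on the compute node shows up as a non-zero exit status of the test command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to max_runs times and stop at the
# first failing run. Prints the number of the failing run and returns 1,
# or returns 0 if every run passed.
repeat_until_failure() {
    local max_runs=$1
    shift
    local i
    for ((i = 1; i <= max_runs; i++)); do
        if ! "$@"; then
            echo "run $i failed"
            return 1
        fi
    done
    return 0
}

# Example (test case name taken from this report; 20 is an arbitrary count):
# repeat_until_failure 20 python -m testtools.run \
#     nfv_tempest_plugin.tests.scenario.test_nfv_sriov_usecases.TestSriovScenarios.test_sriov_free_resource
```

      After a failing run, the compute node console or kdump output still has to be checked by hand to confirm that the failure was this crash and not an unrelated test error.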

      Version-Release number of selected component (if applicable):
      RHOS-16.1-RHEL-8-20210323.n.0
      Red Hat Enterprise Linux release 8.2 (Ootpa)
      Linux computeovndpdksriovrt-1 4.18.0-193.28.1.rt13.77.el8_2.x86_64 #1 SMP PREEMPT RT Fri Oct 16 14:11:07 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux

      How reproducible:
      Intermittent. See the test case and reproduction notes above.

      Actual results:
      Kernel crash on the compute node (NULL pointer dereference in iavf_alloc_rx_buffers).

      Expected results:
      No kernel crash should occur when the VM and its ports are removed.

      Additional info:
      I will upload sos reports and kernel crash dumps

              jira-bugzilla-migration RH Bugzilla Integration
              rhos-dfg-networking-squad-neutron