Uploaded image for project: 'Red Hat OpenStack Services on OpenShift'
  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-12891

BZ#2214289 Neutron port is not set to up if worker is disconnected while LSP is created and set to up in Northbound

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Normal Normal
    • None
    • None
    • openstack-neutron
    • None
    • False
    • Hide

      None

      Show
      None
    • False
    • None

      Description of problem:
      Noticed the issue in ovn perf job[1]. Fails as below:-
      Traceback (most recent call last):
      File "/home/stack/browbeat/.rally-venv/lib64/python3.9/site-packages/rally/task/runner.py", line 69, in _run_scenario_once
      getattr(scenario_inst, method_name)(**scenario_kwargs)
      File "rally/rally-plugins/netcreate-boot/netcreate_nova_boot_fip_ping.py", line 49, in run
      guest = self._boot_server_with_fip(image, flavor, True,
      File "/home/stack/browbeat/.rally-venv/lib64/python3.9/site-packages/rally_openstack/task/scenarios/vm/utils.py", line 144, in _boot_server_with_fip
      server = self._boot_server(image, flavor, **kwargs)
      File "/home/stack/browbeat/.rally-venv/lib64/python3.9/site-packages/rally_openstack/task/scenarios/nova/utils.py", line 107, in _boot_server
      server = utils.wait_for_status(
      File "/home/stack/browbeat/.rally-venv/lib64/python3.9/site-packages/rally/task/utils.py", line 245, in wait_for_status
      raise exceptions.TimeoutException(
      rally.exceptions.TimeoutException: Rally tired waiting 300.00 seconds for Server s_rally_9dd8b1b4_8pNZItaq:56f8ecc8-5d71-4664-a20b-9e832319af51 to become ('ACTIVE') current status BUILD

      It timed out waiting for network-vif-plugged.

      network-vif-plugged is not triggered as neutron port is not transitioned from down to up.

      Ideally Event LogicalSwitchPortUpdateUpEvent is responsible for setting it up, but that only works for Update Event[2].
      But since the DB is reconnected it get's the Create Event(with up=True) instead of the Create(without up set) followed by Update(up=True).

      There also exist another event LogicalSwitchPortCreateUpEvent which works for the case LSP Create(with up=True)[3] but that only get's triggered during startup after that it's disabled[4]

      Port created in Northbound(checked ovsdb-tool show-log):-
      record 591:
      term: 4
      index: 17274
      eid: 3c41
      2023-06-09 00:11:20.877
      ...
      table Logical_Switch_Port insert row "abad941a-6790-4cc2-b89b-67901247e684" (f71b000d):
      name="abad941a-6790-4cc2-b89b-67901247e684"
      port_security=["fa:16:3e:f1:77:d4 10.2.5.211"]
      addresses=["fa:16:3e:f1:77:d4 10.2.5.211"]
      options=

      {mcast_flood_reports="true", requested-chassis=""}

      external_ids=

      {"neutron:cidrs"="10.2.5.211/24", "neutron:device_id"="56f8ecc8-5d71-4664-a20b-9e832319af51", "neutron:device_owner"="", "neutron:network_name"=neutron-7812be3a-bbaa-4e94-b992-e1f51a518663, "neutron:port_capabilities"="", "neutron:port_name"="", "neutron:project_id"="920a1eca08984537a1c8e5d79778db4b", "neutron:revision_number"="1", "neutron:security_group_ids"="81a5fbaf-935a-4352-9bfc-05d99c0e8457", "neutron:subnet_pool_addr_scope4"="", "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}

      Port is up in Southbound:-
      2023-06-09T00:11:22.445Z|01066|binding|INFO|Setting lport abad941a-6790-4cc2-b89b-67901247e684 up in Southbound

      Port is up in Northbound(checked ovsdb-tool show-log):-
      record 605:
      term: 4
      index: 17281
      eid: b3f8
      2023-06-09 00:11:22.458 "ovn-northd"
      table Logical_Switch_Port row "abad941a-6790-4cc2-b89b-67901247e684" (f71b000d) diff:
      up=true

      And for the duration(LSP create and LSP up in Northboud) worker is disconnected with db
      $ $ grep -nr :6641 ct1|grep "2023-06-09 00:11:2"|grep req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048
      ct1/server.log.1:13047:2023-06-09 00:11:20.856 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.75:6641: connecting...
      ct1/server.log.1:13049:2023-06-09 00:11:20.857 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.75:6641: connected
      ct1/server.log.1:13060:2023-06-09 00:11:20.861 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.75:6641: clustered database server is not cluster leader; trying another server
      ct1/server.log.1:13063:2023-06-09 00:11:20.862 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.75:6641: connection closed by client
      ct1/server.log.1:13064:2023-06-09 00:11:20.862 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.75:6641: waiting 2 seconds before reconnect
      ct1/server.log.1:13130:2023-06-09 00:11:22.863 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.134:6641: connecting...
      ct1/server.log.1:13131:2023-06-09 00:11:22.864 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.134:6641: connected

      1. Events for the LSP record(all happened after db is reconnected)
        $ grep -nr f71b000d ct1
        ct1/server.log.1:13138:2023-06-09 00:11:22.879 24 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] Hash Ring: Node aadce024-cc88-4641-a943-b058e228cf07 (host: controller-1.redhat.local) handling event "create" for row f71b000d-7a8c-45d5-8247-6c723e8ecfe1 (table: Logical_Switch_Port) notify /usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:769
        ct1/server.log.1:13751:2023-06-09 00:12:35.175 24 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] Hash Ring: Node aadce024-cc88-4641-a943-b058e228cf07 (host: controller-1.redhat.local) handling event "update" for row f71b000d-7a8c-45d5-8247-6c723e8ecfe1 (table: Logical_Switch_Port) notify /usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:769
        ct1/server.log.1:14693:2023-06-09 00:16:23.110 24 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] Hash Ring: Node aadce024-cc88-4641-a943-b058e228cf07 (host: controller-1.redhat.local) handling event "update" for row f71b000d-7a8c-45d5-8247-6c723e8ecfe1 (table: Logical_Switch_Port) notify /usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:769
        ct1/server.log.1:14707:2023-06-09 00:16:24.035 24 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] Hash Ring: Node aadce024-cc88-4641-a943-b058e228cf07 (host: controller-1.redhat.local) handling event "delete" for row f71b000d-7a8c-45d5-8247-6c723e8ecfe1 (table: Logical_Switch_Port) notify /usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:769

      [1] https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-perfscale-PerfCI-OSP17.1-neutron-ovn/56/artifact/test_results/rally-results.html#/BrowbeatPlugin.create_network_nova_boot_ping-2/failures
      [2] https://github.com/openstack/neutron/blob/e941180b68659a53c3c68705915a92552d3fe25c/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L443
      [3] https://github.com/openstack/neutron/blob/e941180b68659a53c3c68705915a92552d3fe25c/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L394-L411
      [4] https://github.com/openstack/neutron/blob/e941180b68659a53c3c68705915a92552d3fe25c/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L748-L763

      How reproducible: Random issue, can happen if event watcher get's disconnected(due to raft leadership transfer) for the time between lsp is created and marked up in NorthBound(via northd).

      Actual results:
      Port is not up in Neutron(Even it's up in OVN DB) and cause vm creation failure

      Expected results:
      VM Creation should succeed

              twilson@redhat.com Terry Wilson
              jira-bugzilla-migration RH Bugzilla Integration
              Eran Kuris Eran Kuris
              rhos-dfg-networking-squad-neutron
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Created:
                Updated: