Red Hat OpenStack Services on OpenShift / OSPRH-12896

BZ#2327993 [OVS][17.1] [SRIOV ] Network Agent: Open vSwitch agent is not alive for compute-0 - after FFU


    • 3
    • False
    • False
    • Committed
    • python-os-ken-1.4.1-17.1.20241205090937.018d755.el9osttrunk
    • Committed
    • Committed
    • None
    • Important

      Description of problem:
      Running the SRIOV FFU OVS job [1][2], upgrading from 16.2 to 17.1.
      In this scenario:
      computesriov-0 has RHEL 9.2
      computesriov-1 has RHEL 8.4

      The Open vSwitch agent on computesriov-0 is reported as not alive,
      which causes many test failures.

      This appears to have started after the FFU process; see the agent log [3]:

      2024-11-20 18:24:38.797 6144 WARNING neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-a8faba15-3478-4761-8c73-1a8bd95b8685 - - - - -] OVS is dead. OVSNeutronAgent will keep running and checking OVS status periodically.
      2024-11-20 18:24:38.797 6144 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-a8faba15-3478-4761-8c73-1a8bd95b8685 - - - - -] Agent rpc_loop - iteration:6 completed. Processed ports statistics: {'regular': {'added': 0, 'updated': 0, 'removed': 0}}. Elapsed:300.002
      2024-11-20 18:24:38.798 6144 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-a8faba15-3478-4761-8c73-1a8bd95b8685 - - - - -] Agent rpc_loop - iteration:7 started
      2024-11-20 18:29:37.481 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] OVS is down, not reporting state to server
      2024-11-20 18:29:38.798 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [req-a8faba15-3478-4761-8c73-1a8bd95b8685 - - - - -] Switch connection timeout
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int [req-a8faba15-3478-4761-8c73-1a8bd95b8685 - - - - -] Failed to communicate with the switch: RuntimeError: Switch connection timeout
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int Traceback (most recent call last):
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int File "/usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/br_int.py", line 66, in check_canary_table
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int flows = self.dump_flows(constants.CANARY_TABLE)
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int File "/usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 156, in dump_flows
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int (dp, ofp, ofpp) = self._get_dp()
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int File "/usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_bridge.py", line 71, in _get_dp
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int self._cached_dpid = new_dpid
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int self.force_reraise()
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int File "/usr/lib/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int raise self.value
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int File "/usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_bridge.py", line 54, in _get_dp
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int dp = self._get_dp_by_dpid(self._cached_dpid)
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int File "/usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 79, in _get_dp_by_dpid
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int raise RuntimeError(m)
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int RuntimeError: Switch connection timeout
      2024-11-20 18:29:38.799 6144 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int
      2024-11-20 18:29:38.799 6144 WARNING neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-a8faba15-3478-4761-8c73-1a8bd95b8685 - - - - -] OVS is dead. OVSNeutronAgent will keep running and checking OVS status periodically.
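To see how often the agent lost contact with OVS across the whole log [3], the lines above can be summarized with a small script. This is a minimal sketch, not part of the original report: the function name `summarize_ovs_health` and the `slow_threshold` parameter are hypothetical, and the regular expression assumes every line follows the `timestamp pid LEVEL logger message` layout seen in this excerpt.

```python
import re
from datetime import datetime

# Assumed layout, based on the excerpt: "<timestamp> <pid> <LEVEL> <logger> <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) (?P<msg>.*)$"
)
ELAPSED = re.compile(r"Elapsed:(?P<secs>\d+(?:\.\d+)?)")


def summarize_ovs_health(lines, slow_threshold=60.0):
    """Collect timestamps of OVS-dead warnings, switch timeouts, and slow rpc_loop iterations.

    slow_threshold (seconds) is a hypothetical cut-off chosen for illustration.
    """
    summary = {"ovs_dead": [], "switch_timeouts": [], "slow_iterations": []}
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if not match:
            continue  # skip blank or unparseable lines
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
        msg = match["msg"]
        if match["level"] == "WARNING" and "OVS is dead" in msg:
            summary["ovs_dead"].append(ts)
        elif match["logger"].endswith(".ofswitch") and "Switch connection timeout" in msg:
            summary["switch_timeouts"].append(ts)
        else:
            elapsed = ELAPSED.search(msg)
            if elapsed and float(elapsed["secs"]) >= slow_threshold:
                summary["slow_iterations"].append((ts, float(elapsed["secs"])))
    return summary


# Three lines copied from the excerpt above.
SAMPLE = [
    "2024-11-20 18:24:38.797 6144 WARNING "
    "neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent "
    "[req-a8faba15-3478-4761-8c73-1a8bd95b8685 - - - - -] OVS is dead. "
    "OVSNeutronAgent will keep running and checking OVS status periodically.",
    "2024-11-20 18:24:38.797 6144 INFO "
    "neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent "
    "[req-a8faba15-3478-4761-8c73-1a8bd95b8685 - - - - -] Agent rpc_loop - iteration:6 completed. "
    "Processed ports statistics: {'regular': {'added': 0, 'updated': 0, 'removed': 0}}. "
    "Elapsed:300.002",
    "2024-11-20 18:29:38.798 6144 ERROR "
    "neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch "
    "[req-a8faba15-3478-4761-8c73-1a8bd95b8685 - - - - -] Switch connection timeout",
]

if __name__ == "__main__":
    for key, events in summarize_ovs_health(SAMPLE).items():
        print(key, len(events))
```

Only the `ofswitch` logger is counted for timeouts, so the traceback lines that repeat the same message under `br_int` are not double counted.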

      Version-Release number of selected component (if applicable):
      RHOS-17.1-RHEL-9-20241030.n.1

      How reproducible:
      Running the job

      Actual results:
      The Open vSwitch agent on computesriov-0 is not alive.

      Expected results:
      All network agents are reported as alive.

      Slack thread: https://redhat-internal.slack.com/archives/C046JULBVJ7/p1732186604009099

      Additional info:

      (overcloud) [stack@undercloud-0 ~]$ openstack network agent list
      +--------------------------------------+--------------------+-----------------------------+-------------------+-------+-------+---------------------------+
      | ID                                   | Agent Type         | Host                        | Availability Zone | Alive | State | Binary                    |
      +--------------------------------------+--------------------+-----------------------------+-------------------+-------+-------+---------------------------+
      | 0ebf230c-968a-4476-819a-3ccabc5db378 | Open vSwitch agent | computesriov-0.redhat.local | None              | XXX   | UP    | neutron-openvswitch-agent |
      | 30a0dc8c-e411-4fc1-825d-313588d7e875 | Metadata agent     | controller-0.redhat.local   | None              | :-)   | UP    | neutron-metadata-agent    |
      | 3a621f5a-1660-47d0-8d3e-70306800e1f3 | DHCP agent         | controller-1.redhat.local   | nova              | :-)   | UP    | neutron-dhcp-agent        |
      | 6a0f4d6d-162e-4d87-849d-43839f02327f | Metadata agent     | controller-1.redhat.local   | None              | :-)   | UP    | neutron-metadata-agent    |
      | 7afb0a4c-debb-49e1-899b-f83c6a89bda0 | L3 agent           | controller-0.redhat.local   | nova              | :-)   | UP    | neutron-l3-agent          |
      | 82a50c8c-e85c-4179-8da4-62953f058426 | Open vSwitch agent | controller-1.redhat.local   | None              | :-)   | UP    | neutron-openvswitch-agent |
      | 8427d9da-cd01-488c-9dcc-6a8151d001a5 | DHCP agent         | controller-0.redhat.local   | nova              | :-)   | UP    | neutron-dhcp-agent        |
      | 9575d2cb-5294-45cc-b5d6-2b21c60e76c7 | NIC Switch agent   | computesriov-0.redhat.local | None              | :-)   | UP    | neutron-sriov-nic-agent   |
      | 9a041508-671a-4db0-bd30-478613d7e63a | L3 agent           | controller-2.redhat.local   | nova              | :-)   | UP    | neutron-l3-agent          |
      | 9e324516-bb5a-4690-9483-166fc25d1bd7 | NIC Switch agent   | computesriov-1.redhat.local | None              | :-)   | UP    | neutron-sriov-nic-agent   |
      | b23b902e-dca0-4986-81fb-540ced78fc59 | Open vSwitch agent | controller-2.redhat.local   | None              | :-)   | UP    | neutron-openvswitch-agent |
      | b9749684-022c-42ab-bf46-603da9fa4d09 | Open vSwitch agent | computesriov-1.redhat.local | None              | :-)   | UP    | neutron-openvswitch-agent |
      | c39682c0-c71e-4d31-8e1f-09c017d9328a | Open vSwitch agent | controller-0.redhat.local   | None              | :-)   | UP    | neutron-openvswitch-agent |
      | cf05f4f2-3d23-4eb2-8d97-2fcd351a3fed | L3 agent           | controller-1.redhat.local   | nova              | :-)   | UP    | neutron-l3-agent          |
      | d363ff5d-21a6-4288-b0b3-0c848c214eb8 | DHCP agent         | controller-2.redhat.local   | nova              | :-)   | UP    | neutron-dhcp-agent        |
      | e2c5cfd5-0055-42e9-a6a5-95cd0256a2cc | Metadata agent     | controller-2.redhat.local   | None              | :-)   | UP    | neutron-metadata-agent    |
      +--------------------------------------+--------------------+-----------------------------+-------------------+-------+-------+---------------------------+
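The "all agents alive" expectation can be checked automatically after the upgrade instead of reading the listing by eye. The following is a rough sketch under stated assumptions: the helper name `dead_agents` is hypothetical, the rows are assumed to be dictionaries keyed by the column names of the listing (as one would get from `openstack network agent list -f json`), and the `Alive` value is assumed to be either a boolean or the `:-)` / `XXX` marker that the CLI prints in table form.

```python
import json

# Values treated as "alive": a JSON boolean, or the marker the CLI prints in table output.
ALIVE_VALUES = {True, ":-)"}


def dead_agents(agents):
    """Return the agent rows whose Alive value does not indicate a live agent."""
    return [agent for agent in agents if agent.get("Alive") not in ALIVE_VALUES]


# Two rows taken from the listing above; the ":-)" marker for the healthy
# agent is an assumption about how the CLI rendered that cell.
ROWS = json.loads("""
[
  {"ID": "0ebf230c-968a-4476-819a-3ccabc5db378", "Agent Type": "Open vSwitch agent",
   "Host": "computesriov-0.redhat.local", "Alive": "XXX", "State": "UP",
   "Binary": "neutron-openvswitch-agent"},
  {"ID": "b9749684-022c-42ab-bf46-603da9fa4d09", "Agent Type": "Open vSwitch agent",
   "Host": "computesriov-1.redhat.local", "Alive": ":-)", "State": "UP",
   "Binary": "neutron-openvswitch-agent"}
]
""")

if __name__ == "__main__":
    for agent in dead_agents(ROWS):
        print(f'{agent["Agent Type"]} on {agent["Host"]} is not alive')
```

A CI job could fail the run whenever `dead_agents` returns a non-empty list, which would flag this issue before the test suite starts.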

      [1] https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-all-unified-ffu-upgrade-16.2-17.1_director-rhel-virthost-3cont_2comp-ipv4-vlan-ml2ovs-sriov-multirhel/
      [2] https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/job/DFG-all-unified-ffu-upgrade-16.2-17.1_director-rhel-virthost-3cont_2comp-ipv4-vlan-ml2ovs-sriov-multirhel-fyanac/
      [3] https://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-all-unified-ffu-upgrade-16.2-17.1_director-rhel-virthost-3cont_2comp-ipv4-vlan-ml2ovs-sriov-multirhel/9/computesriov-0/var/log/containers/neutron/openvswitch-agent.log.gz

              rodolfo_alonso Rodolfo Alonso
              jira-bugzilla-migration RH Bugzilla Integration
              rhos-dfg-networking-squad-neutron