Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: ovn-operator
Labels:
None

Story Points:
8
Blocked:
False
Blocked Reason:

None
Ready:
False
Docs Approval:
?
Regression:
None
Intelligence Requested:
Market:
Target Version:

rhos-18.0.4

Severity:
Important

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Got this error when running a minor update in RHOSO:

2024-10-18 12:09:19,342 p=42199 u=zuul n=ansible | TASK [update : Stop l3 agent connectivity check _raw_params={{ cifmw_update_artifacts_basedir }}/l3_agent_stop_ping.sh    {{ cifmw_update_ping_loss_second }}    {{ cifmw_update_ping_loss_percent }}
] ***
2024-10-18 12:09:19,342 p=42199 u=zuul n=ansible | Friday 18 October 2024  12:09:19 -0400 (0:00:00.061)       0:13:02.704 ********
2024-10-18 12:09:19,617 p=42199 u=zuul n=ansible | fatal: [localhost]: FAILED! => changed=true
  cmd: |-
    /home/zuul/ci-framework-data/tests/update/l3_agent_stop_ping.sh    0    0
  delta: '0:00:00.057042'
  end: '2024-10-18 12:09:19.593356'
  msg: non-zero return code
  rc: 1
  start: '2024-10-18 12:09:19.536314'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |-
    521 packets transmitted, 517 received, 0.767754% packet loss, time 529721ms
    rtt min/avg/max/mdev = 0.497/0.909/18.589/1.087 ms
    Ping loss higher than 0 seconds detected (4 seconds)
  stdout_lines: <omitted>

That's the result of Ansible running l3_agent_stop_ping.sh [0] with LOSS_THRESHOLD and LOSS_THRESHOLD_PERCENT both set to 0.
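To make the failure condition concrete, here is a minimal Python sketch of the kind of check the log output implies. It is not the actual l3_agent_stop_ping.sh logic: the function name, the one-packet-per-second interval, and the rule that both thresholds must hold are assumptions for illustration. Only the ping summary line comes from the log above.

```python
import re
from typing import NamedTuple

# Matches the summary line printed by ping, as seen in the log above.
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) received, ([\d.]+)% packet loss"
)


class PingLoss(NamedTuple):
    lost_seconds: float
    lost_percent: float
    passed: bool


def check_ping_loss(summary, threshold_seconds=0.0, threshold_percent=0.0,
                    interval_seconds=1.0):
    """Hypothetical helper: evaluate a ping summary against loss thresholds.

    Assumption: one packet is sent every `interval_seconds`, so each lost
    packet counts as that many seconds of outage.
    Assumption: the check passes only if neither threshold is exceeded.
    """
    match = SUMMARY_RE.search(summary)
    if match is None:
        raise ValueError("no ping summary line found")
    transmitted = int(match.group(1))
    received = int(match.group(2))
    lost_seconds = (transmitted - received) * interval_seconds
    lost_percent = float(match.group(3))
    passed = (lost_seconds <= threshold_seconds
              and lost_percent <= threshold_percent)
    return PingLoss(lost_seconds, lost_percent, passed)


if __name__ == "__main__":
    line = ("521 packets transmitted, 517 received, "
            "0.767754% packet loss, time 529721ms")
    print(check_ping_loss(line))
```

Under these assumptions, the 4 lost packets out of 521 correspond to the 4 seconds reported in the log, so any threshold of 0 fails the task.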

The workload consists of a VM with a DPDK interface and a floating IP (FIP) assigned:

sh-5.1$ openstack server list --all --long
+--------------------------------------+---------------------+--------+------------+-------------+----------------------------------------------------+-----------------------------+--------------------------------------+-------------------------+-------------------+--------------------------------+------------+-------------+
| ID                                   | Name                | Status | Task State | Power State | Networks                                           | Image Name                  | Image ID                             | Flavor                  | Availability Zone | Host                           | Properties | Host Status |
+--------------------------------------+---------------------+--------+------------+-------------+----------------------------------------------------+-----------------------------+--------------------------------------+-------------------------+-------------------+--------------------------------+------------+-------------+
| d42e296a-a815-49c1-8a74-cc4cb5248b2b | instance_4778f6d126 | ACTIVE | None       | Running     | internal_net_4778f6d126=<FIP>, 192.168.0.51 | upgrade_workload_4778f6d126 | ef04f4a3-3b94-4eba-963a-ea6e4def1d52 | v1-8192M-10G-4778f6d126 | nova              | compute-0.ctlplane.example.com |            | UP          |
+--------------------------------------+---------------------+--------+------------+-------------+----------------------------------------------------+-----------------------------+--------------------------------------+-------------------------+-------------------+--------------------------------+------------+-------------+

I was able to reproduce this packet loss twice.

[0] https://github.com/openstack-k8s-operators/ci-framework/blob/main/roles/update/templates/l3_agent_stop_ping.sh.j2

is triggering

OSPRH-11636 Check on how to handle dataplane outage with OVS pods restarts

Closed

OSPRH-11002 Root Cause and Refine OSPRH-10821

Closed

links to

openstack-k8s-operators/ovn-operator#368: Restart ovn-controller gracefully on PreStop

openstack-k8s-operators/ovn-operator#423: [ovn-controller] Change startup mechanism of ovs pods

openstack-k8s-operators/ovn-operator#434: Modify startup scripts for ovn-controller-ovs


Assignee: Arnau Verdaguer Puigdollers

Reporter: Ricardo Diaz Campos

Team: rhos-dfg-networking-squad-neutron

Votes: 0

Watchers: 8

Created: 2024/10/21 9:29 AM

Updated: 2025/06/06 1:38 AM
