Red Hat OpenStack Services on OpenShift / OSPRH-10821

Ping loss higher than 0 seconds after minor update


    • Bug
    • Resolution: Unresolved
    • Critical
    • ovn-operator
    • Important

      I got this error when running a minor update in RHOSO:

      2024-10-18 12:09:19,342 p=42199 u=zuul n=ansible | TASK [update : Stop l3 agent connectivity check _raw_params={{ cifmw_update_artifacts_basedir }}/l3_agent_stop_ping.sh    {{ cifmw_update_ping_loss_second }}    {{ cifmw_update_ping_loss_percent }}
      ] ***
      2024-10-18 12:09:19,342 p=42199 u=zuul n=ansible | Friday 18 October 2024  12:09:19 -0400 (0:00:00.061)       0:13:02.704 ********
      2024-10-18 12:09:19,617 p=42199 u=zuul n=ansible | fatal: [localhost]: FAILED! => changed=true
        cmd: |-
          /home/zuul/ci-framework-data/tests/update/l3_agent_stop_ping.sh    0    0
        delta: '0:00:00.057042'
        end: '2024-10-18 12:09:19.593356'
        msg: non-zero return code
        rc: 1
        start: '2024-10-18 12:09:19.536314'
        stderr: ''
        stderr_lines: <omitted>
        stdout: |-
          521 packets transmitted, 517 received, 0.767754% packet loss, time 529721ms
          rtt min/avg/max/mdev = 0.497/0.909/18.589/1.087 ms
          Ping loss higher than 0 seconds detected (4 seconds)
        stdout_lines: <omitted>

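      That is the result of Ansible running l3_agent_stop_ping.sh [0] with LOSS_THRESHOLD and LOSS_THRESHOLD_PERCENT both set to 0. With both thresholds at 0, any lost packet fails the check; in this run 4 of 521 packets were lost.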
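To make the failure condition concrete, here is a minimal Python sketch of a threshold check of this kind. It is not the code of l3_agent_stop_ping.sh: the function name, the result type, and the rule that both thresholds must be satisfied are assumptions made for illustration. It also assumes ping sends one packet per second, so that each lost packet counts as one second of lost connectivity, which is consistent with the "4 seconds" reported for 4 lost packets above.

```python
import re
from dataclasses import dataclass

# Matches the summary line printed by ping, for example:
# "521 packets transmitted, 517 received, 0.767754% packet loss, time 529721ms"
SUMMARY_RE = re.compile(
    r"(?P<tx>\d+) packets transmitted, (?P<rx>\d+) received, "
    r"(?P<pct>[\d.]+)% packet loss"
)


@dataclass
class PingCheck:
    """Hypothetical result type, not part of the real script."""
    lost_seconds: int
    lost_percent: float
    passed: bool


def check_ping_loss(summary, loss_threshold=0, loss_threshold_percent=0.0):
    """Compare the loss in a ping summary against two thresholds.

    Assumption: one packet per second, so lost packets == lost seconds.
    Assumption: the check passes only if both thresholds are respected;
    the real script may combine them differently.
    """
    match = SUMMARY_RE.search(summary)
    if match is None:
        raise ValueError("no ping summary line found")
    lost_seconds = int(match["tx"]) - int(match["rx"])
    lost_percent = float(match["pct"])
    passed = (
        lost_seconds <= loss_threshold
        and lost_percent <= loss_threshold_percent
    )
    return PingCheck(lost_seconds, lost_percent, passed)


if __name__ == "__main__":
    line = ("521 packets transmitted, 517 received, "
            "0.767754% packet loss, time 529721ms")
    print(check_ping_loss(line))
```

Run against the summary line from the log, this reports 4 lost seconds and a failed check. Under these assumptions a threshold of 0 tolerates no loss at all, so a single dropped packet during the update is enough to fail the task.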

      The workload consists of a VM with a DPDK interface and a floating IP (FIP) assigned:

      sh-5.1$ openstack server list --all --long
      +--------------------------------------+---------------------+--------+------------+-------------+----------------------------------------------------+-----------------------------+--------------------------------------+-------------------------+-------------------+--------------------------------+------------+-------------+
      | ID                                   | Name                | Status | Task State | Power State | Networks                                           | Image Name                  | Image ID                             | Flavor                  | Availability Zone | Host                           | Properties | Host Status |
      +--------------------------------------+---------------------+--------+------------+-------------+----------------------------------------------------+-----------------------------+--------------------------------------+-------------------------+-------------------+--------------------------------+------------+-------------+
      | d42e296a-a815-49c1-8a74-cc4cb5248b2b | instance_4778f6d126 | ACTIVE | None       | Running     | internal_net_4778f6d126=<FIP>, 192.168.0.51 | upgrade_workload_4778f6d126 | ef04f4a3-3b94-4eba-963a-ea6e4def1d52 | v1-8192M-10G-4778f6d126 | nova              | compute-0.ctlplane.example.com |            | UP          |
      +--------------------------------------+---------------------+--------+------------+-------------+----------------------------------------------------+-----------------------------+--------------------------------------+-------------------------+-------------------+--------------------------------+------------+-------------+

      I was able to reproduce this packet loss twice.

       [0] https://github.com/openstack-k8s-operators/ci-framework/blob/main/roles/update/templates/l3_agent_stop_ping.sh.j2

              averdagu@redhat.com Arnau Verdaguer Puigdollers
              rdiazcam@redhat.com Ricardo Diaz Campos
              rhos-dfg-networking-squad-neutron