Red Hat OpenStack Services on OpenShift / OSPRH-12861

BZ#2262654 [17.1][OVN][DVR][SNAT] Two 20-second ping losses for SNAT ports when a controller comes back up after a crash

      Description of problem:
      When a controller node crash is triggered, the SNAT IP first moves to a surviving controller. When the crashed controller comes back up, the IP moves back from the surviving controller to the recovered one. We expect only a few packets to be lost in both cases, but when the IP moves back, considerably more packets are lost.

      Example:

      • Crash of a controller node at 02/02/2024 10:46:14 CET
        -> from a laptop that can reach the SNAT IP assigned to the controller node:
        64 bytes from 10.x.10.198: icmp_seq=2091 ttl=252 time=22.0 ms
        3 packets lost
        64 bytes from 10.x.10.198: icmp_seq=2095 ttl=252 time=27.6 ms
        -> from the instance (no FIP) to an external IP:
        64 bytes from 10.x.11.254: icmp_seq=1245 ttl=63 time=1.09 ms
        3 packets lost
        64 bytes from 10.x.11.254: icmp_seq=1249 ttl=63 time=2.69 ms

      So when the crash is triggered, only 3 packets are lost, which is acceptable.

      • Controller node comes back up at 02/02/2024 10:54:56 CET
        -> from a laptop that can reach the SNAT IP assigned to the controller node:
        64 bytes from 10.x.10.198: icmp_seq=2602 ttl=252 time=22.1 ms
        19 packets lost
        64 bytes from 10.x.10.198: icmp_seq=2622 ttl=252 time=21.7 ms
        ....
        64 bytes from 10.x.10.198: icmp_seq=2647 ttl=252 time=21.6 ms
        6 packets lost
        64 bytes from 10.x.10.198: icmp_seq=2655 ttl=252 time=22.8 ms
        -> from the instance (no FIP) to an external IP:
        64 bytes from 10.x.11.254: icmp_seq=1755 ttl=63 time=1.14 ms
        20 packets lost
        64 bytes from 10.x.11.254: icmp_seq=1776 ttl=63 time=4.26 ms

      When the node comes back, around 20 consecutive packets are lost (19 toward the SNAT IP, 20 from the instance), and for the SNAT IP a second loss of 6 packets follows shortly after.
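      The loss counts above correspond to gaps in `icmp_seq` between consecutive replies. As a rough sketch (not part of the original report; `find_gaps` and `outage_seconds` are hypothetical helper names), the Python below finds those gaps in unabridged Linux ping output and converts each one to an approximate outage length, assuming ping's default 1-second interval. It only gives correct counts on complete output: elided traces (such as the `....` above) would be miscounted.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical helper, not from the report: matches the sequence number in a
# Linux ping reply such as
# "64 bytes from 10.x.11.254: icmp_seq=1755 ttl=63 time=1.14 ms".
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def find_gaps(lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (last seq before gap, first seq after gap, lost count) per gap.

    Expects unabridged ping output. Lines without icmp_seq (summaries,
    annotations like "3 packets lost") are ignored. Sequence wrap-around
    at 65535 is not handled.
    """
    gaps: List[Tuple[int, int, int]] = []
    highest: Optional[int] = None
    for line in lines:
        match = SEQ_RE.search(line)
        if match is None:
            continue
        seq = int(match.group(1))
        if highest is not None and seq > highest + 1:
            gaps.append((highest, seq, seq - highest - 1))
        # Track the highest seq seen so duplicate or reordered replies
        # do not create false gaps.
        highest = seq if highest is None else max(highest, seq)
    return gaps


def outage_seconds(gaps: List[Tuple[int, int, int]], interval_s: float = 1.0) -> List[float]:
    """Approximate each outage as lost packets x ping interval (assumed 1 s)."""
    return [lost * interval_s for _, _, lost in gaps]


if __name__ == "__main__":
    # Instance-to-external-IP replies from the report, after the controller came back.
    trace = [
        "64 bytes from 10.x.11.254: icmp_seq=1755 ttl=63 time=1.14 ms",
        "64 bytes from 10.x.11.254: icmp_seq=1776 ttl=63 time=4.26 ms",
    ]
    gaps = find_gaps(trace)
    for (before, after, lost), secs in zip(gaps, outage_seconds(gaps)):
        print(f"icmp_seq {before} -> {after}: {lost} lost (~{secs:.0f} s)")
```

      On the instance trace this reports 20 lost packets, roughly 20 seconds at a 1-second interval, which matches the outage length in the title. Tracking the highest sequence number rather than the previous one keeps duplicate (DUP!) or late replies from being counted as losses.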

      Version-Release number of selected component (if applicable):
      Red Hat OpenStack Platform 17.1 (RHOSP 17.1)

      Steps to Reproduce:
      1. Trigger a controller crash with `echo c > /proc/sysrq-trigger`.
      2. Start pinging an external IP from the VM, or ping the SNAT IP from a host outside RHOSP.
      3. When the controller node comes back up, pings are lost for a sustained interval.

      Actual results:
      Pings are lost for about 20 seconds when the controller comes back up (twice for the SNAT IP).

      Expected results:
      Only 1 to 3 pings lost, as when the crash is triggered.

      Additional info:

              mtomaska@redhat.com Miro Tomaska
              jira-bugzilla-migration RH Bugzilla Integration
              Eran Kuris
              rhos-dfg-networking-squad-neutron