OCPBUGS-19813

[4.14] Significant 12 minute pod-to-host disruption detected on aws ovn minor upgrades


    • Important
    • No
    • Approved
    • False



      DISCLAIMER: The code for measuring in-cluster disruption is extremely new, so we cannot be 100% confident that what we're seeing is real. However, the bug below shows a problem that occurs only in one very specific configuration while all others are unaffected, which gives us some confidence that the signal is real.


      • affects pod-to-host-new-connections
      • affects aws minor upgrades, which are seeing over 14000s of disruption at the P50
      • does not affect pod-to-host-reused-connections
      • does not affect any other clouds
      • does not affect micro upgrades
      • does not affect pod-to-service or pod-to-pod backends
      • does not affect sdn

      The total disruption is the sum across a number of pods, so the actual duration of the disruption is roughly the total / 14. The actual disruption appears to be about 12 minutes and hits all pods doing pod-to-host monitoring simultaneously.
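
As a rough sanity check of that arithmetic, here is a minimal sketch in Python. The divisor of 14 comes from the description above; treating it as the number of pods whose disruption is summed is an assumption, and the function name is hypothetical.

```python
def per_pod_disruption_seconds(total_seconds: float, divisor: int = 14) -> float:
    """Estimate the wall-clock disruption from the summed total.

    Assumption: the reported total is the sum over `divisor` sources that
    were all disrupted at the same time, so dividing recovers the duration.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return total_seconds / divisor


# The P50 is reported as "over 14000s", so 14000 is used as a lower bound.
estimate = per_pod_disruption_seconds(14000)
print(f"{estimate:.0f}s, or about {estimate / 60:.1f} minutes")
```

With the reported P50 this lands in the low tens of minutes, the same order of magnitude as the roughly 12 minute window seen in the sample job.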

      Sample job: (taken from expanding the "Most Recent Runs" panel in grafana)


      In the first spyglass chart for upgrade you can see the batch of disruption: 7:28:19 - 7:40:03
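
The length of that window can be checked with a short Python sketch. It assumes both timestamps fall on the same day and in the same timezone; the helper name is hypothetical.

```python
from datetime import datetime


def window_seconds(start: str, end: str) -> int:
    """Return the length in seconds of an H:M:S window.

    Assumption: start and end are on the same day (no midnight rollover).
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


seconds = window_seconds("7:28:19", "7:40:03")
minutes, remainder = divmod(seconds, 60)
print(f"{seconds}s = {minutes}m{remainder:02d}s")  # prints "704s = 11m44s"
```

That is just under 12 minutes, matching the duration quoted in the title.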

      We do not have data prior to ovn interconnect landing, so we cannot say if this started at that time or not.

            jtanenba@redhat.com Jacob Tanenbaum
            rhn-engineering-dgoodwin Devan Goodwin
            Anurag Saxena Anurag Saxena