Type: Bug
Status: In Progress
Resolution: Done-Errata
Priority: Critical
Severity: Important
Target Version: 4.14.0
Release Note Type: Bug Fix
DISCLAIMER: The code for measuring in-cluster disruption is extremely new, so we cannot be 100% confident that what we're seeing is real. However, the bug below shows a problem occurring in one very specific configuration while all others are unaffected, which gives us some confidence that the signal is real.
- affects pod-to-host-new-connections (see the sketch after this list)
- affects aws minor upgrades, where the P50 shows over 14,000s of total disruption
- does not affect pod-to-host-reused-connections
- does not affect any other clouds
- does not affect micro upgrades
- does not affect pod-to-service or pod-to-pod backends
- does not affect sdn
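For intuition on why new-connection probes can fail while reused-connection probes stay healthy, here is a minimal sketch of the two probe styles. This is not the actual openshift/origin disruption monitor; the target address is a hypothetical placeholder, and the poll loop is simplified.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// probe issues a single GET and reports whether it succeeded.
func probe(client *http.Client, label, url string) {
	resp, err := client.Get(url)
	if err != nil {
		fmt.Printf("%s probe failed: %v\n", label, err)
		return
	}
	resp.Body.Close()
	fmt.Printf("%s probe ok\n", label)
}

func main() {
	// Hypothetical pod-to-host target; the real monitor's endpoints differ.
	target := "http://10.0.0.1:9100/healthz"

	// Every request opens a fresh TCP connection, so this client is
	// sensitive to anything that breaks *new* connection setup
	// (e.g. flow/conntrack programming during an upgrade).
	newConn := &http.Client{
		Timeout:   5 * time.Second,
		Transport: &http.Transport{DisableKeepAlives: true},
	}

	// This client reuses an established keep-alive connection, so it
	// can stay healthy while new connections fail.
	reusedConn := &http.Client{Timeout: 5 * time.Second}

	for i := 0; i < 3; i++ {
		probe(newConn, "new-connection", target)
		probe(reusedConn, "reused-connection", target)
		time.Sleep(time.Second)
	}
}
```

The distinction matters here because only the new-connections backend is affected, which suggests the problem is in connection setup rather than in already-established flows.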
The total disruption is summed across a number of pods, so the actual wall-clock duration is roughly the total divided by 14. The actual disruption appears to be about 12 minutes and hits all pods doing pod-to-host monitoring simultaneously (a back-of-the-envelope check follows the sample job below).
Sample job: (taken from expanding the "Most Recent Runs" panel in Grafana)
In the first spyglass chart for the upgrade you can see the batch of disruption: 7:28:19 - 7:40:03.
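As a sanity check on the arithmetic above, a quick back-of-the-envelope sketch; the divisor of 14 is the rough pod count from the description, not a measured constant.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// P50 aggregate disruption summed across pods (from the report above).
	totalDisruption := 14000 * time.Second
	// Approximate number of pods doing pod-to-host monitoring; this is
	// the rough divisor from the description, not a measured value.
	pods := 14.0

	perPod := time.Duration(float64(totalDisruption) / pods)
	fmt.Printf("estimated per-pod disruption: ~%v\n", perPod) // ~16m40s

	// Observed outage window from the spyglass chart: 7:28:19 - 7:40:03.
	start, _ := time.Parse("15:04:05", "07:28:19")
	end, _ := time.Parse("15:04:05", "07:40:03")
	fmt.Printf("observed window: %v\n", end.Sub(start)) // 11m44s
}
```

The ~16m40s estimate and the observed ~12 minute window agree to the same order of magnitude, which is consistent with one simultaneous outage being counted once per monitoring pod.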
We do not have data from before OVN interconnect landed, so we cannot say whether this started at that time.
- is cloned by: OCPBUGS-19813 "[4.14] Significant 12 minute pod-to-host disruption detected on aws ovn minor upgrades" (Closed)
- is depended on by: OCPBUGS-19813 "[4.14] Significant 12 minute pod-to-host disruption detected on aws ovn minor upgrades" (Closed)
- links to: RHEA-2023:7198 (rpm)