- Bug
- Resolution: Done
- Undefined
- 4.12
- None
- SDN Sprint 227, SDN Sprint 228
- 2
- Rejected
- False
Description of problem:
When the ovnkube-master leader container is disrupted, reinitialized, restarted, or goes through leader election, the pod annotation latency spikes from ~1.5 seconds to ~27.5 seconds as it reprocesses all the pods after it comes back. The creation latency for pods that are already configured and working fine gets bumped to span from the time they were scheduled until the time the restart happened, which, as Dan Williams suggested, shouldn't be the case: https://coreos.slack.com/archives/C01G7T6SYSD/p1663607960049459.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Install an OpenShift or HyperShift cluster using the latest nightly (4.11.0-0.nightly-2022-09-15-164423 was used for this run).
2. Use Kraken container scenarios to disrupt the ovnkube-master container: https://github.com/redhat-chaos/krkn/blob/main/docs/container_scenarios.md
3. Observe the 99th percentile of the ovnkube_master_pod_creation_latency_seconds_bucket metric via histogram_quantile (see the example query below).
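The query in step 3 was truncated in the original report; a plausible full form, assuming the standard Prometheus histogram "le" label and an arbitrary 2m rate window, is:

    histogram_quantile(0.99, sum(rate(ovnkube_master_pod_creation_latency_seconds_bucket[2m])) by (le))

This returns the p99 pod creation (annotation) latency; the spike from ~1.5 seconds to ~27.5 seconds should be visible after the ovnkube-master disruption.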
Actual results:
Expected results:
Additional info:
Logs: http://dell-r510-01.perf.lab.eng.rdu2.redhat.com/chaos/hypershift/ovn-disruption/
- clones: OCPBUGS-1486 Avoid re-metric'ing the pods that are already setup when ovnkube-master disrupts/reinitializes/restarts/goes through leader election (Closed)
- is blocked by: OCPBUGS-1486 Avoid re-metric'ing the pods that are already setup when ovnkube-master disrupts/reinitializes/restarts/goes through leader election (Closed)
- links to