OpenShift Bugs: OCPBUGS-1486

Avoid re-metric'ing pods that are already set up when ovnkube-master is disrupted, reinitializes, restarts, or goes through leader election


    • SDN Sprint 227
    • 1
    • Rejected
    • False
    • None
    • N/A
    • Bug Fix
    • Done

      Description of problem:

      When the ovnkube-master leader container is disrupted, reinitializes, restarts, or goes through leader election, the pod annotation latency spikes from about 1.5 seconds to about 27.5 seconds as it reprocesses all pods after coming back. As Dan Williams suggested, the creation latency of pods that were already configured and working fine is being inflated: it is measured from the time they were scheduled until the time the restart happened, which shouldn't be the case: https://coreos.slack.com/archives/C01G7T6SYSD/p1663607960049459.

      Version-Release number of selected component (if applicable):
      
       

      How reproducible:

      Always

      Steps to Reproduce:

      1. Install an OpenShift or HyperShift cluster using the latest nightly build (4.11.0-0.nightly-2022-09-15-164423 was used for this run).
      2. Use the Kraken container scenarios to disrupt the ovnkube-master container: https://github.com/redhat-chaos/krkn/blob/main/docs/container_scenarios.md
      3. Observe the 99th-percentile pod creation latency from the ovnkube_master_pod_creation_latency_seconds_bucket metric, using a histogram_quantile(0.99, sum(rate(ovnkube_master_pod_creation_latency_seconds_bucket ... query.
      
      

      Actual results:

       

      Expected results:

       

      Additional info:

      Logs: http://dell-r510-01.perf.lab.eng.rdu2.redhat.com/chaos/hypershift/ovn-disruption/

            Ben Pickard (bpickard@redhat.com)
            Naga Ravi Chaitanya Elluri (nelluri)
            Anurag Saxena
            Votes: 0
            Watchers: 5

              Created:
              Updated:
              Resolved: