  1. OpenShift Bugs
  2. OCPBUGS-33730

[4.13z] slow ovnkube-node initialization on large number of services with externalIps


    • +
    • No
    • SDN Sprint 254
    • 1
    • False
    • *Cause*: What actions or circumstances cause this bug to present.
      The ovnkube-node pod is started while many services with external IPs or load balancer IPs exist.
      *Consequence*: What happens when the bug presents.
      ovnkube-node takes a very long time to start because it programs iptables rules one at a time for all services. Iterating over external IPs or load balancer IPs is especially costly. The startup time depends on the number of services and external IPs/load balancer IPs. We have seen it range from several minutes to almost an hour.
      *Fix*: What was done to fix the bug.
      Optimizations in the service parsing logic reduce duplicate calls to create iptables rules. iptables rules are now created atomically using iptables-restore rather than with a separate call to iptables for each external IP.
      *Result*: Bug doesn’t present anymore.
      Startup is much faster; even at high scale it should take on the order of seconds rather than minutes.
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-32426. The following is the description of the original issue:

      On clusters with a large number of services with externalIPs or services of type LoadBalancer, ovnkube-node initialization can take up to 50 minutes.

      The problem is that after a node reboot performed by the MCO, the unschedulable taint is removed from the node, so pods are scheduled onto that node and get stuck in ContainerCreating. Meanwhile, other nodes continue to go down for reboot, making the workloads unavailable (if no PDB exists to protect the workload).
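
The fix noted for this issue replaces one iptables call per external IP with a single iptables-restore transaction. The following is a minimal sketch of that batching idea, not the actual ovn-kubernetes change: the `dnatRule` type, the `buildRestoreInput` helper, the chain name `EXAMPLE-EXTERNALIP`, and the IPv4-only `/32` match are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// dnatRule is a hypothetical description of one external-IP DNAT entry.
type dnatRule struct {
	ExternalIP string
	Protocol   string
	Port       int
	TargetIP   string
	TargetPort int
}

// buildRestoreInput renders every rule for one nat-table chain as a single
// iptables-restore payload, so all rules can be submitted in one transaction
// instead of one iptables invocation per rule. IPv4 only (assumption).
func buildRestoreInput(chain string, rules []dnatRule) string {
	var b strings.Builder
	b.WriteString("*nat\n")
	// Declare the user-defined chain used by the rules below.
	fmt.Fprintf(&b, ":%s - [0:0]\n", chain)
	for _, r := range rules {
		fmt.Fprintf(&b, "-A %s -d %s/32 -p %s --dport %d -j DNAT --to-destination %s:%d\n",
			chain, r.ExternalIP, r.Protocol, r.Port, r.TargetIP, r.TargetPort)
	}
	b.WriteString("COMMIT\n")
	return b.String()
}

func main() {
	// Example addresses are from documentation ranges, not from this issue.
	rules := []dnatRule{
		{"192.0.2.10", "tcp", 80, "10.96.0.15", 8080},
		{"192.0.2.11", "tcp", 443, "10.96.0.16", 8443},
	}
	fmt.Print(buildRestoreInput("EXAMPLE-EXTERNALIP", rules))
}
```

The resulting text could be written to the standard input of `iptables-restore --noflush`, which applies the whole table update in one commit. How `--noflush` treats chains that are declared in the input should be checked against the iptables-restore man page before relying on this sketch.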

              trozet@redhat.com Tim Rozet
              openshift-crt-jira-prow OpenShift Prow Bot
              Jean Chen Jean Chen
              Votes: 0
              Watchers: 7

                Created:
                Updated:
                Resolved: