OpenShift Bugs / OCPBUGS-15803

TuneD reverts node level profiles on termination



      This is a clone of issue OCPBUGS-15736. The following is the description of the original issue:

      Description of problem:

      When the tuned daemonset is rolled out, it first terminates the old daemonset and then starts the new one. Upon receiving the termination signal, the TuneD daemon rolls back all changes it has made; when the new TuneD daemon starts, it re-applies them.
      
      When the daemonset rolls back its changes, the value of net.ipv4.neigh.default.gc_thresh3 drops from the OpenShift default of 65536 to the CoreOS default of 1024. If the cluster node has more than 1024 entries in its ARP cache, the cache instantly overflows and the kernel reports the error "neighbour: arp_cache: neighbor table overflow!". Because the ARP cache has overflowed, IP addresses cannot be resolved to MAC addresses, and the node's network calls fail until kernel garbage collection reduces the ARP cache to below 1024 entries. While the network is unavailable, the new tuned daemonset cannot be rolled out because the node cannot pull the image.
      
      For clusters with a large amount of network traffic, the rollback of tuned profiles causes a complete network outage lasting from a few seconds up to a few minutes (possibly longer, depending on load and ARP cache size). This mostly impacts nodes running ingress routers, but the error "neighbour: arp_cache: neighbor table overflow!" can also be observed on control plane nodes.
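      The overflow condition is simple arithmetic. A minimal Python sketch of the threshold check (the entry counts below are hypothetical, not taken from a live node):

```python
# OpenShift raises gc_thresh3 via a tuned profile; CoreOS ships a lower default.
OPENSHIFT_GC_THRESH3 = 65536
COREOS_GC_THRESH3 = 1024

def cache_overflows(arp_entries: int, gc_thresh3: int) -> bool:
    """gc_thresh3 is the hard maximum: above it the kernel refuses new
    neighbour entries and logs "neighbor table overflow!"."""
    return arp_entries > gc_thresh3

# A busy ingress node with ~5000 neighbours is fine under the OpenShift value...
print(cache_overflows(5000, OPENSHIFT_GC_THRESH3))  # False
# ...but overflows the moment the rollback restores the CoreOS default.
print(cache_overflows(5000, COREOS_GC_THRESH3))     # True
```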

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      It is fairly simple to observe the gc_thresh3 value being reverted. Reproducing the network outage and the kernel error "neighbour: arp_cache: neighbor table overflow!" is more involved: it requires a large number of services/pods exposed via ingress, plus artificial network traffic to grow the ARP cache large enough.

      Steps to Reproduce:

      1. Start a debug pod on any node, `chroot /host`, and run `watch -n 1 sysctl net.ipv4.neigh.default.gc_thresh3`; observe that the value is 65536
      2. Delete the tuned pod running on the node the debug session is on: `oc delete pod tuned-xxxxx -n openshift-cluster-node-tuning-operator`
      3. Observe that the value of gc_thresh3 is momentarily reverted to 1024. Shortly afterwards it is set back to 65536.

      Actual results:

      Upon deleting the tuned pod, net.ipv4.neigh.default.gc_thresh3 is reverted to 1024. Shortly afterwards it is set back to 65536.

      Expected results:

      Upon deleting the tuned pod, net.ipv4.neigh.default.gc_thresh3 remains at 65536.

      Additional info:

      Here is the line of code that performs the rollback of changes; it can be observed in the logs when terminating the pod: https://github.com/redhat-performance/tuned/blob/b5363687e06409287217c3a0378ab6adf2e3f50b/tuned/daemon/daemon.py#L241
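      The apply-on-start / roll-back-on-termination lifecycle described above can be sketched in a few lines of Python. This is not TuneD's actual code; the in-memory store and the profile values are stand-ins for /proc/sys and the OpenShift profile:

```python
import signal

class SysctlProfile:
    """Apply a set of values on start and restore the saved originals on
    termination, mirroring the apply/rollback lifecycle described in the bug."""

    def __init__(self, store, profile):
        self.store = store      # stands in for /proc/sys
        self.profile = profile
        self.saved = {}

    def apply(self):
        for key, value in self.profile.items():
            self.saved[key] = self.store[key]  # remember the pre-profile value
            self.store[key] = value

    def rollback(self, *_signal_args):
        # On SIGTERM the daemon restores every saved value -- including
        # gc_thresh3, which is what momentarily overflows the ARP cache.
        for key, value in self.saved.items():
            self.store[key] = value

store = {"net.ipv4.neigh.default.gc_thresh3": 1024}  # CoreOS default
daemon = SysctlProfile(store, {"net.ipv4.neigh.default.gc_thresh3": 65536})
daemon.apply()                                  # node now runs with 65536
signal.signal(signal.SIGTERM, daemon.rollback)  # rollback is wired to termination
```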

            jmencak Jiri Mencak
            openshift-crt-jira-prow OpenShift Prow Bot
            Liquan Cui Liquan Cui