Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: None
Affects Version/s: 4.11
Component/s: Node Tuning Operator
Labels:

Severity: Critical
Regression: No
Release Blocker: Rejected
Blocked: False
Blocked Reason: None
Target Version: 4.14.0

SFDC Cases Counter:
SFDC Cases Links:

Description of problem:

When the tuned daemonset is rolled out, it first terminates the old daemonset and then starts the new one. Upon receiving the termination signal, the TuneD daemon rolls back all changes it has made. When the new TuneD daemon starts, it applies all changes again.

When the daemonset rolls back changes, the value of net.ipv4.neigh.default.gc_thresh3 is changed from the OpenShift default of 65536 to the CoreOS default of 1024. If the cluster node has more than 1024 entries in its ARP cache, the cache overflows immediately and the kernel reports the error "neighbour: arp_cache: neighbor table overflow!" Because the ARP cache has overflowed, IP addresses cannot be resolved to MAC addresses, and the node's network calls fail until kernel garbage collection reduces the ARP cache to below 1024 entries. While the network is unavailable, the new tuned daemonset cannot be rolled out because the node cannot pull the image.
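To estimate whether a given node is exposed, the current number of ARP entries can be compared against the rolled-back value of 1024. The following is a minimal sketch, not part of the operator or TuneD: the function names are hypothetical, it assumes the text format of /proc/net/arp (one header line followed by one line per entry), and it only counts IPv4 entries visible in that file.

```python
from pathlib import Path

# Value gc_thresh3 falls back to during the rollback, per this report.
ROLLBACK_GC_THRESH3 = 1024


def count_arp_entries(arp_table_text: str) -> int:
    """Count entries in text formatted like /proc/net/arp.

    The first non-empty line is assumed to be the column header.
    """
    lines = [line for line in arp_table_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def exceeds_threshold(entry_count: int, gc_thresh3: int = ROLLBACK_GC_THRESH3) -> bool:
    """Return True if the table holds more entries than gc_thresh3 allows."""
    return entry_count > gc_thresh3


if __name__ == "__main__":
    arp_file = Path("/proc/net/arp")
    if arp_file.exists():
        entries = count_arp_entries(arp_file.read_text())
        print(f"ARP entries: {entries}")
        print(f"Exceeds {ROLLBACK_GC_THRESH3}: {exceeds_threshold(entries)}")
```

Run from a debug pod after `chroot /host`, this gives a rough indication of whether a rollback to 1024 would leave the node over the limit.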

For clusters with a high volume of network traffic, the rollback of tuned profiles causes a complete network outage lasting from a few seconds to a few minutes (possibly longer, depending on load and ARP cache size). This mostly impacts nodes running ingress routers, but the error "neighbour: arp_cache: neighbor table overflow!" can also be observed on the control plane nodes.

Version-Release number of selected component (if applicable):

How reproducible:

It is fairly simple to observe the gc_thresh3 value being reverted. Reproducing the network outage and the kernel error "neighbour: arp_cache: neighbor table overflow!" is more complicated, as it requires a large number of services/pods exposed via ingress, plus artificial network traffic to generate a large enough ARP cache.

Steps to Reproduce:

1. Start a debug pod on any node, run `chroot /host`, then run `watch -n 1 sysctl net.ipv4.neigh.default.gc_thresh3` and observe that the value is 65536.
2. Delete the tuned pod running on the same node as the debug session: `oc delete pod tuned-xxxxx -n openshift-cluster-node-tuning-operator`
3. Observe that the value of gc_thresh3 is momentarily reverted to 1024. Shortly afterwards it is set back to 65536.
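
Because the revert is momentary, a one-second `watch` interval may miss it. As an alternative to step 1, the value can be sampled more frequently and only the changes recorded. The sketch below is an illustration only: the function names and the default interval are assumptions, and it reads the standard procfs path that corresponds to the sysctl key.

```python
import time
from pathlib import Path

# procfs path corresponding to net.ipv4.neigh.default.gc_thresh3
SYSCTL_PATH = Path("/proc/sys/net/ipv4/neigh/default/gc_thresh3")


def read_threshold(path: Path = SYSCTL_PATH) -> int:
    """Read the current gc_thresh3 value as an integer."""
    return int(path.read_text().strip())


def find_transitions(samples):
    """Return (index, old_value, new_value) for every change between samples."""
    transitions = []
    for index in range(1, len(samples)):
        if samples[index] != samples[index - 1]:
            transitions.append((index, samples[index - 1], samples[index]))
    return transitions


def watch(duration_s: float = 60.0, interval_s: float = 0.1, path: Path = SYSCTL_PATH):
    """Sample the value for duration_s seconds and print each change as it happens."""
    samples = [read_threshold(path)]
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        time.sleep(interval_s)
        value = read_threshold(path)
        if value != samples[-1]:
            print(f"{time.strftime('%H:%M:%S')} gc_thresh3: {samples[-1]} -> {value}")
        samples.append(value)
    return find_transitions(samples)
```

Calling `watch()` in the debug session before deleting the tuned pod in step 2 should print one line when the value drops and another when it is restored, if the behavior described here occurs.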

Actual results:

It is observed that upon deleting the tuned pod, net.ipv4.neigh.default.gc_thresh3 is reverted to 1024. Shortly after it is set back to 65536.

Expected results:

Upon deleting the tuned pod, net.ipv4.neigh.default.gc_thresh3 remains at 65536.

Additional info:

Here is the line of code that indicates the rollback of changes; it can be observed in the logs when terminating the pod: https://github.com/redhat-performance/tuned/blob/b5363687e06409287217c3a0378ab6adf2e3f50b/tuned/daemon/daemon.py#L241

is cloned by

OCPBUGS-15736 TuneD reverts node level profiles on termination

Closed

is depended on by

OCPBUGS-15736 TuneD reverts node level profiles on termination

Closed

links to

openshift/cluster-node-tuning-operator#699: OCPBUGS-13065: Do not rollback settings on TuneD exit

RHEA-2023:5006 rpm

Assignee: Jiri Mencak

Reporter: Tony Schneider

QA Contact: Liquan Cui

Votes: 0

Watchers: 10

Created: 2023/05/03 7:19 PM

Updated: 2023/10/31 1:35 PM

Resolved: 2023/10/31 1:35 PM
