Feature Request
Resolution: Unresolved
Blocker
Product / Portfolio Work
1. Proposed title of this feature request
Mitigate the risk of kernel lock contention caused by iptables-alerter
2. What is the nature and description of the request?
In several recent escalations, we found that iptables-alerter can cause heavy kernel lock contention. Among other undesired effects, this makes keepalived check scripts fail even on an otherwise idle node.
This solution explains it in greater detail but, in short, the more pods there are on the node, the longer kernel locks can be held and the more kernel blocking they cause. The CPU limit on iptables-alerter made things worse, but simply removing or relaxing that limit may not be enough of a mitigation.
We would like to explore different options with engineering to mitigate this risk. Possible options include:
- Add a warning recommending that iptables-alerter be disabled on nodes with high pod density.
- Disable iptables-alerter by default, so that users enable it only for occasional checks.
- Fully remove iptables-alerter.
- ...
These are only suggestions. We would like to open the discussion with engineering, show them the problem we encountered, and take their perspective into account so we can agree on the best way forward.
3. Why does the customer need this? (List the business requirements here)
We have had a major escalation caused by this component. It took significant effort and very low-level troubleshooting to find the root cause. We are also starting to see the issue at other customers, so it risks becoming a widespread, high-impact problem.
4. List any affected packages or components.
Cluster Network Operator, iptables-alerter.