Bug | Resolution: Unresolved | Normal | Target: 4.19.z | Important
Sprints: Node Green Sprint 280, OCP Node Core Sprint 282
Description of problem:
I am investigating an odd issue with the kubelet that appears to have been introduced in v4.19 between specific z-streams (v4.19.14 --> v4.19.18). The issue affects only bare-metal nodes with very large capacity (120+ CPUs and a lot of RAM). Whenever the customer deploys ~700 pods simultaneously, the kubelet tries to mount 2500+ secrets/configmaps at the same time, which creates such a high CPU load that the node becomes unusable. At first we thought this was a kernel issue, but collaboration with the kernel team shows that some change is probably causing a huge number of processes to stay in D state, saturating the CPUs and leaving the node unresponsive. The kernel team's bottom-line comment is below:
This shows that the pods together account for 2345 mounts (mostly tmpfs secret/projected volumes), which is a primary factor inducing the shrinker_rwsem contention. With hundreds of pods and their thousands of tmpfs mounts, it is quite natural that shrinker_rwsem becomes a hot contention point. The issue is more likely a workload and scaling problem in the OCP environment, rather than a kernel bug.
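For reference, the mount count quoted by the kernel team can be cross-checked on an affected node by scanning /proc/mounts for the kubelet's per-pod tmpfs volumes. The sketch below is a minimal example, assuming the default kubelet root directory /var/lib/kubelet; it only counts mounts, it does not reproduce the contention itself.

```go
// countmounts.go: count tmpfs secret/projected volume mounts per pod by
// parsing /proc/mounts on the node. Assumes the default kubelet root
// directory /var/lib/kubelet (mount points would differ if --root-dir is customized).
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	total := 0
	perPod := map[string]int{} // pod UID -> number of secret/projected mounts

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 || fields[2] != "tmpfs" {
			continue
		}
		mountPoint := fields[1]
		if !strings.Contains(mountPoint, "/kubernetes.io~secret/") &&
			!strings.Contains(mountPoint, "/kubernetes.io~projected/") {
			continue
		}
		total++
		// Mount points look like:
		// /var/lib/kubelet/pods/<uid>/volumes/kubernetes.io~secret/<name>
		parts := strings.Split(mountPoint, "/")
		for i, p := range parts {
			if p == "pods" && i+1 < len(parts) {
				perPod[parts[i+1]]++
				break
			}
		}
	}
	fmt.Printf("pods with secret/projected tmpfs mounts: %d\n", len(perPod))
	fmt.Printf("total secret/projected tmpfs mounts:     %d\n", total)
}
```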
The openshift-kubelet package version that changed between these two z-streams is:
openshift-kubelet 4.19.0-202509122308.p2.g335be3a.assembly.stream.el9 → 4.19.0-202510101528.p2.gf94ad89.assembly.stream.el9
Important Notes:
- We were able to mitigate the issue by downgrading these workers' CoreOS image to v4.19.14 with a MachineConfig.
- Extensive analysis of the issue from the kernel team is in the attached case, along with vmcores and sosreports from these nodes.
- The same issue is not visible on the VM nodes that are also part of the cluster, but I am not yet sure about their capacity or the deployment volume on them; I can ask if required.
Version-Release number of selected component (if applicable):
4.19.0-202510101528.p2.gf94ad89.assembly.stream.el9
How reproducible:
- Upgrade to v4.19.18
- Deploy ~700 pods simultaneously on the node (see the sketch below)
- Watch the node load rise until the node becomes completely unresponsive
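For completeness, here is a rough client-go sketch of the pod fan-out step. The namespace, image, secret names, and per-pod secret count are placeholders and are not taken from the customer environment; the secrets are assumed to exist beforehand. The point is simply that ~700 pods, each mounting a handful of Secrets, produces a few thousand tmpfs mounts on the target node.

```go
// podflood.go: create N pods, each mounting a few Secret volumes, to
// approximate the customer's simultaneous deployment. All names and counts
// here are illustrative placeholders.
package main

import (
	"context"
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

const (
	namespace     = "mount-repro" // placeholder namespace
	podCount      = 700           // ~700 pods, as in the report
	secretsPerPod = 4             // a few secrets each -> thousands of tmpfs mounts
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	for i := 0; i < podCount; i++ {
		pod := &corev1.Pod{
			ObjectMeta: metav1.ObjectMeta{
				Name:      fmt.Sprintf("repro-%d", i),
				Namespace: namespace,
			},
			Spec: corev1.PodSpec{
				NodeName: os.Getenv("TARGET_NODE"), // pin to the affected bare-metal node
				Containers: []corev1.Container{{
					Name:    "sleep",
					Image:   "registry.access.redhat.com/ubi9/ubi-minimal",
					Command: []string{"sleep", "infinity"},
				}},
			},
		}
		// Each secret volume becomes one tmpfs mount on the node.
		for s := 0; s < secretsPerPod; s++ {
			name := fmt.Sprintf("repro-secret-%d", s) // pre-created placeholder Secrets
			pod.Spec.Volumes = append(pod.Spec.Volumes, corev1.Volume{
				Name: name,
				VolumeSource: corev1.VolumeSource{
					Secret: &corev1.SecretVolumeSource{SecretName: name},
				},
			})
			pod.Spec.Containers[0].VolumeMounts = append(pod.Spec.Containers[0].VolumeMounts,
				corev1.VolumeMount{Name: name, MountPath: "/etc/repro/" + name})
		}
		if _, err := client.CoreV1().Pods(namespace).Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
			fmt.Fprintf(os.Stderr, "pod %d: %v\n", i, err)
		}
	}
}
```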
Actual results:
- The node becomes completely unresponsive
Expected results:
- The node should not become unresponsive
Additional info:
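As a quick way to quantify the "processes stuck in D state" observation while the load climbs, the following minimal sketch (not part of the kernel team's tooling) counts uninterruptible threads by scanning /proc:

```go
// dstate.go: count threads currently in uninterruptible sleep (D state) by
// scanning /proc/<pid>/task/<tid>/stat on the node.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	// One stat file per thread; processes may exit between Glob and ReadFile.
	statFiles, _ := filepath.Glob("/proc/[0-9]*/task/[0-9]*/stat")
	inD := 0
	for _, p := range statFiles {
		data, err := os.ReadFile(p)
		if err != nil {
			continue
		}
		// Field 3 of the stat file is the state; the comm field (2) is in
		// parentheses and may contain spaces, so parse after the closing ')'.
		s := string(data)
		idx := strings.LastIndex(s, ")")
		if idx < 0 || idx+2 >= len(s) {
			continue
		}
		fields := strings.Fields(s[idx+1:])
		if len(fields) > 0 && fields[0] == "D" {
			inD++
		}
	}
	fmt.Printf("threads in D (uninterruptible) state: %d\n", inD)
}
```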