Type: Bug
Resolution: Unresolved
Priority: Major
Target Version: 4.16.z
Description of problem:
Some OpenShift Dedicated clusters were found to be leaking veth* interfaces.
These clusters run 4.16 with OpenShift SDN.
On one worker node of such a cluster, there are 5000+ veth interfaces while only 70 containers are running:
$ ip link | grep veth | wc -l
5433
$ crictl ps | wc -l
70
We found this issue because the node-exporter pod was consuming high CPU; the high CPU turned out to be caused by node-exporter having to read information for this large number of veth* interfaces. Ref OCPBUGS-44100.
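For diagnosis, the rough sketch below lists host veth interfaces whose peer is not eth0 of any pod sandbox on the node. This is only a heuristic, not a verified procedure: it assumes it is run from a root shell on the affected node, that crictl, jq and nsenter are available there, and that the jq field path into the crictl inspectp output matches the installed CRI-O version (that path is an assumption).

#!/bin/bash
# Collect the host-side ifindex of every veth peer still referenced by a pod sandbox.
declare -A in_use
for pod in $(crictl pods -q); do
  # Assumed field path; adjust to the actual 'crictl inspectp' JSON layout if needed.
  netns=$(crictl inspectp "$pod" 2>/dev/null \
    | jq -r '.info.runtimeSpec.linux.namespaces[] | select(.type=="network") | .path' 2>/dev/null)
  [ -n "$netns" ] || continue
  # "eth0@ifN" inside the pod netns: N is the ifindex of the host-side veth peer.
  peer=$(nsenter --net="$netns" ip -o link show eth0 2>/dev/null \
    | sed -n 's/.*eth0@if\([0-9]*\).*/\1/p')
  [ -n "$peer" ] && in_use[$peer]=1
done
# Any host veth whose ifindex is not referenced by a running pod is a leak candidate.
ip -o link show type veth | while read -r idx name _; do
  idx=${idx%:}
  name=${name%%@*}
  [ -z "${in_use[$idx]}" ] && echo "possibly leaked: $name (ifindex $idx)"
done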
Version-Release number of selected component (if applicable):
4.16
How reproducible:
The issue only happens on some clusters, and only on some nodes of those clusters. Replacing or rebooting the node works around the issue, but the issue may recur on some of the replaced/rebooted nodes. A sketch of the workaround commands follows.
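For reference, a typical node-level workaround sequence (a sketch only; <node> is a placeholder and the drain flags depend on the workloads running on the node):

oc adm cordon <node>
oc adm drain <node> --ignore-daemonsets --delete-emptydir-data
oc debug node/<node> -- chroot /host systemctl reboot
oc adm uncordon <node>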
Steps to Reproduce:
We haven't figured out what triggers the problem or how to reproduce it.
Actual results:
The number of veth* interfaces on a node keeps growing.
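A simple way to confirm the growth on a suspect node (a rough sketch; the 10-minute sampling interval is arbitrary):

while true; do
  # Log a timestamped sample of veth count vs. running container count.
  echo "$(date -Is) veths=$(ip -o link show type veth | wc -l) containers=$(crictl ps -q | wc -l)"
  sleep 600
done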
Expected results:
The number of veth* interfaces is similar to the number of running containers.
Additional info:
Affected Platforms:
Is it an SD issue? Yes, it impacts some of the OSD clusters.
Must-gather and sosreport will be attached in the next comment.
Related ticket: OHSS-38706.