OCPBUGS-48043

Some nodes are leaking veth* interfaces

      Description of problem:

      Some OpenShift Dedicated clusters were found to be leaking veth* interfaces.

      The affected clusters run 4.16 with OpenShift SDN.

      On one worker node of such a cluster, we can see 5000+ veth interfaces, while only 70 containers are running.

       

      $ ip link | grep veth | wc -l
      5433
      $ crictl ps | wc -l
      70

       

      We found this issue because the node-exporter pod was consuming high CPU; this turned out to be caused by it reading a large amount of veth* interface information. Ref OCPBUGS-44100.
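As a quick node-side check, one could compare the veth count against the running-container count. The sketch below is hypothetical and not from this report; in particular, the "2 per container plus slack" threshold is an assumed heuristic, not an established rule.

```shell
#!/bin/sh
# Hedged sketch: flag a node whose veth count far exceeds its container count.
# The "2 veths per container plus slack" threshold is a hypothetical heuristic.
veth_leak_suspected() {
  veths=$1        # number of veth* interfaces, e.g. $(ip link | grep -c veth)
  containers=$2   # number of running containers, e.g. $(crictl ps -q | wc -l)
  [ "$veths" -gt $(( containers * 2 + 50 )) ]
}

# On a live node one might call:
#   veth_leak_suspected "$(ip link | grep -c veth)" "$(crictl ps -q | wc -l)"
```

With the numbers from this report (5433 veths, 70 containers) the check would flag the node; a healthy node with, say, 140 veths and 70 containers would pass.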

       

      Version-Release number of selected component (if applicable):

      4.16

       

      How reproducible:

      The issue only happens on some clusters, and only on some nodes of those clusters. Replacing or rebooting a node works around the issue, but the issue may recur on some of the replaced/rebooted nodes.

       

      Steps to Reproduce:

      We haven't figured out what triggers the problem or how to reproduce it.

       

      Actual results:

      The number of veth* interfaces in a node keeps growing.

       

      Expected results:

      The number of veth* interfaces is similar to the number of running containers.

       

      Additional info:

      Affected Platforms:

      This is an SD issue; it impacts some of the OSD clusters.

      Must-gather and sosreport will be attached in the next comment.
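For offline analysis of the attached sosreport, the veth count can be recovered from a captured `ip -o link` dump. This is a sketch; the sosreport file path shown in the comment is illustrative and varies by sos version.

```shell
#!/bin/sh
# Sketch: count veth interfaces in a saved `ip -o link` dump.
# In `ip -o link` output each interface is one line, e.g.:
#   2: veth1a2b@if3: <BROADCAST,MULTICAST,UP,...> ...
count_veths() {
  grep -cE '^[0-9]+: *veth' "$1"
}

# Example (path is illustrative, not the exact sosreport layout):
#   count_veths sosreport/sos_commands/networking/ip_-o_link
```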

      Related ticket: OHSS-38706.

              npinaeva@redhat.com Nadia Pinaeva
              siwu.openshift Siu Wa Wu
              Zhanqi Zhao Zhanqi Zhao