OpenShift Bugs: OCPBUGS-1341

Node churn leaks PodNetworkConnectivityChecks

Details

    • Moderate
    • 3
    • OCP VE Sprint 225, OCP VE Sprint 226, OCP VE Sprint 227, OCP VE Sprint 228, OCP VE Sprint 229, OCP VE Sprint 230, OCP VE Sprint 231, OCP VE Sprint 232, OCP VE Sprint 233, OCP VE Sprint 234, OCP VE Sprint 235
    • 11
    • Rejected
    • False
    • N/A
    • Release Note Not Required

    Description

      I haven't gone back to pin down all affected versions, but I wouldn't be surprised if we've had this exposure for a while. On a 4.12.0-ec.2 cluster, we have:

      cluster:usage:resources:sum{resource="podnetworkconnectivitychecks.controlplane.operator.openshift.io"}
      

      currently clocking in around 67983. I've gathered a dump with:

      $ oc --as system:admin -n openshift-network-diagnostics get podnetworkconnectivitychecks.controlplane.operator.openshift.io | gzip >checks.gz
      

      Many of these checks reference nodes that no longer exist (the cluster is aggressively autoscaled, with nodes coming and going all the time). We should fix garbage collection for this resource so that the Kube API server and etcd do not consume excessive amounts of memory when they list such a large resource set.
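To get a rough count of how many checks are leaked, something like the following Python sketch could compare check names against the current node list. It is an illustration only, not the operator's garbage-collection logic. It assumes, without confirmation from this report, that check names follow the pattern `network-check-source-<source node>-to-<target>`, and it inspects only the source side of each check. The helper names (`item_names`, `source_node`, `stale_checks`) are hypothetical.

```python
import json
from typing import Iterable, List, Optional

# ASSUMPTION (not confirmed by this report): check names look like
#   network-check-source-<source node>-to-<target>
SOURCE_PREFIX = "network-check-source-"
SEPARATOR = "-to-"


def item_names(list_json: str) -> List[str]:
    """Return metadata.name for every item in a Kubernetes list serialized as JSON."""
    return [item["metadata"]["name"] for item in json.loads(list_json)["items"]]


def source_node(check_name: str) -> Optional[str]:
    """Extract the source node from a check name.

    Returns None when the name does not match the assumed convention.
    A node name that itself contains "-to-" would be truncated here.
    """
    if not check_name.startswith(SOURCE_PREFIX) or SEPARATOR not in check_name:
        return None
    return check_name[len(SOURCE_PREFIX):].split(SEPARATOR, 1)[0]


def stale_checks(check_names: Iterable[str], live_nodes: Iterable[str]) -> List[str]:
    """Return the checks whose source node is not among the live nodes."""
    live = set(live_nodes)
    stale = []
    for name in check_names:
        node = source_node(name)
        # Names that do not match the assumed pattern are skipped, not flagged.
        if node is not None and node not in live:
            stale.append(name)
    return stale
```

The two inputs could come from `oc get nodes -o json` and from the `oc get podnetworkconnectivitychecks.controlplane.operator.openshift.io` command above with `-o json` added. Checks whose target node has disappeared are not detected by this sketch, so the result is a lower bound on the leak.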

            People

              pepalani@redhat.com Periyasamy Palanichamy
              trking W. Trevor King
              Michael Fiedler Michael Fiedler
              Votes: 0
              Watchers: 10
