Red Hat Advanced Cluster Security / ROX-33379

pauseReconcileInstances Prometheus GaugeVec accumulates orphaned label sets

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fleet Management
    • Quality / Stability / Reliability

      Files:

      pauseReconcileInstances is a prometheus.GaugeVec keyed by Central instance ID. A new time series is created whenever a Central is seen for the first time. When Centrals are deleted, deleteStaleReconcilers cleans up the reconciler registry but has no matching call to remove the metric. The label set for every deleted tenant accumulates in memory indefinitely.

      The pattern for correct cleanup already exists in the same file: CertificatesExpiry has DeleteCertMetric, DeleteCertNamespaceMetric, and DeleteKeyCertMetric, all using DeletePartialMatch. pauseReconcileInstances is missing the equivalent.

      Network amplification: Not directly affected. The leak rate is determined by tenant churn, not network conditions.

      Fix: Add a deletion method (e.g. DeletePauseReconcileMetric(instance string)) that calls pauseReconcileInstances.DeletePartialMatch(...), and call it from deleteStaleReconcilers when removing a key.
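
The proposed fix can be illustrated with a small dependency-free model. This is a sketch under assumptions, not the fleet manager code: gaugeVec is a hypothetical stdlib-only stand-in for prometheus.GaugeVec, and the registry type, the observe helper, and the single instance label are invented for illustration. Only the names pauseReconcileInstances and deleteStaleReconcilers come from this issue.

```go
package main

import "sync"

// gaugeVec is a hypothetical stand-in for a labeled gauge vector:
// it holds one series per distinct instance label value.
type gaugeVec struct {
	mu     sync.Mutex
	series map[string]float64 // instance ID -> gauge value
}

func newGaugeVec() *gaugeVec {
	return &gaugeVec{series: make(map[string]float64)}
}

// Set creates the series on first use and updates it afterwards.
func (g *gaugeVec) Set(instance string, v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.series[instance] = v
}

// Delete drops the series for one instance and returns how many were removed.
func (g *gaugeVec) Delete(instance string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if _, ok := g.series[instance]; !ok {
		return 0
	}
	delete(g.series, instance)
	return 1
}

// Len returns the number of live series.
func (g *gaugeVec) Len() int {
	g.mu.Lock()
	defer g.mu.Unlock()
	return len(g.series)
}

// registry models the reconciler registry (assumed shape).
type registry struct {
	reconcilers             map[string]struct{}
	pauseReconcileInstances *gaugeVec
}

func newRegistry() *registry {
	return &registry{
		reconcilers:             make(map[string]struct{}),
		pauseReconcileInstances: newGaugeVec(),
	}
}

// observe registers a Central and publishes its pause state as 0 or 1.
func (r *registry) observe(instance string, paused bool) {
	r.reconcilers[instance] = struct{}{}
	v := 0.0
	if paused {
		v = 1.0
	}
	r.pauseReconcileInstances.Set(instance, v)
}

// deleteStaleReconcilers removes every reconciler whose ID is not in live
// and deletes the matching metric series in the same step, so the two
// collections cannot drift apart. It returns the number of removed keys.
func (r *registry) deleteStaleReconcilers(live map[string]struct{}) int {
	removed := 0
	for id := range r.reconcilers {
		if _, ok := live[id]; ok {
			continue
		}
		delete(r.reconcilers, id) // deleting during range is safe in Go
		r.pauseReconcileInstances.Delete(id)
		removed++
	}
	return removed
}
```

Without the Delete call inside the loop, the registry shrinks while the series map keeps one entry per Central ever seen, which is the leak described above. With the real client library, the stand-in Delete would presumably become a call to DeletePartialMatch with the instance label, as the Fix line suggests; the actual label name is not stated in this issue.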

              Assignee: Unassigned
              Reporter: Michael Hess (rh-ee-mhess)
              ACS Cloud Service