OpenShift Bugs / OCPBUGS-30145

Custom metrics operator memory leak when invalid scaledObject is defined

    • Bug
    • Resolution: Done-Errata
    • Normal
    • None
    • 4.14.z
    • Pod Autoscaler
    • None
    • Moderate
    • No
    • 3
    • PODAUTO - Sprint 251
    • 1
    • False
    • None
    • Previously, if invalid values (for example, nonexistent namespaces) were specified in scaledObject metadata, the underlying scaler clients did not free or close their client descriptors, resulting in a slow memory leak. This release properly closes the underlying client descriptors when errors occur, preventing memory from leaking.
    • Bug Fix
    • In Progress

      Description of problem:

          The keda-operator pod's memory usage grows indefinitely, until the pod is OOM-killed, when an invalid namespace is defined in a ScaledObject.
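
The release note attributes the leak to scaler clients that are not closed on the error path. The following is a minimal Go sketch of that general pattern, not KEDA's actual code: the `tracker`, `client`, `validate`, `buildLeaky`, and `buildFixed` names are all hypothetical, and a simple counter stands in for real client descriptors.

```go
package main

import "fmt"

// tracker counts clients that were created but not yet closed.
// It stands in for real resources such as connections (hypothetical).
type tracker struct {
	open int
}

// client is a stand-in for a scaler's underlying client (hypothetical).
type client struct {
	t      *tracker
	closed bool
}

func (t *tracker) newClient() *client {
	t.open++
	return &client{t: t}
}

// Close releases the client; calling it twice is harmless.
func (c *client) Close() {
	if !c.closed {
		c.closed = true
		c.t.open--
	}
}

// validate fails when the namespace is not in the known set.
func validate(ns string, known map[string]bool) error {
	if !known[ns] {
		return fmt.Errorf("namespace %q not found", ns)
	}
	return nil
}

// buildLeaky creates a client and drops it without closing on error.
func buildLeaky(t *tracker, ns string, known map[string]bool) (*client, error) {
	c := t.newClient()
	if err := validate(ns, known); err != nil {
		return nil, err // c is never closed: one leaked client per call
	}
	return c, nil
}

// buildFixed closes the client before returning the error.
func buildFixed(t *tracker, ns string, known map[string]bool) (*client, error) {
	c := t.newClient()
	if err := validate(ns, known); err != nil {
		c.Close()
		return nil, err
	}
	return c, nil
}

func main() {
	known := map[string]bool{"ns1": true}
	leaky, fixed := &tracker{}, &tracker{}
	// Simulate a scale loop that retries with the invalid namespace.
	for i := 0; i < 100; i++ {
		buildLeaky(leaky, "ns2", known)
		buildFixed(fixed, "ns2", known)
	}
	fmt.Println("leaky open clients:", leaky.open)
	fmt.Println("fixed open clients:", fixed.open)
}
```

Because the scale loop retries on every polling interval, each failed attempt in the leaky variant leaves one more client open, which matches the slow, steady growth described in this report.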

      Version-Release number of selected component (if applicable):

          2.11.2-322

      How reproducible:

          always

      Steps to Reproduce:

          1. Define a ScaledObject of type prometheus with an invalid namespace in triggers.metadata, for example:
          triggers:
          - authenticationRef:
              name: keda-trigger-auth-prometheus
            metadata:
              authModes: bearer
              metricName: http_requests_total
              namespace: ns2                      # this namespace does not exist
              query: sum(rate(http_requests_total{job="prometheus-example-app"}[1m]))
              serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
              threshold: "3"
            type: prometheus
      
          2. Check the keda-operator logs for stack traces like the following:
            oc logs keda-operator-59865dbdc4-dksnk
      ...
      2024-03-01T19:47:46Z    ERROR    scale_handler    error getting metric for scaler    {"scaledObject.Namespace": "ns1", "scaledObject.Name": "prom-scaledobject", "scaler": "prometheusScaler", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics
          /remote-source/keda/app/pkg/scaling/scale_handler.go:483
      github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics
          /remote-source/keda/app/pkg/metricsservice/server.go:47
      github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler
          /remote-source/keda/app/pkg/metricsservice/api/metrics_grpc.pb.go:99
      google.golang.org/grpc.(*Server).processUnaryRPC
          /remote-source/keda/app/vendor/google.golang.org/grpc/server.go:1337
      google.golang.org/grpc.(*Server).handleStream
          /remote-source/keda/app/vendor/google.golang.org/grpc/server.go:1714
      google.golang.org/grpc.(*Server).serveStreams.func1.1
          /remote-source/keda/app/vendor/google.golang.org/grpc/server.go:959
      2024-03-01T19:47:58Z    ERROR    prometheus_scaler    prometheus query api returned error    {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
      github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).ExecutePromQuery
          /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:310
      github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity
          /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:365
      github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
          /remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:130
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState
          /remote-source/keda/app/pkg/scaling/scale_handler.go:572
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
          /remote-source/keda/app/pkg/scaling/scale_handler.go:236
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
          /remote-source/keda/app/pkg/scaling/scale_handler.go:175
      2024-03-01T19:47:58Z    ERROR    prometheus_scaler    error executing prometheus query    {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
      github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity
          /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:367
      github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
          /remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:130
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState
          /remote-source/keda/app/pkg/scaling/scale_handler.go:572
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
          /remote-source/keda/app/pkg/scaling/scale_handler.go:236
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
          /remote-source/keda/app/pkg/scaling/scale_handler.go:175
      2024-03-01T19:47:58Z    ERROR    prometheus_scaler    prometheus query api returned error    {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
      github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).ExecutePromQuery
          /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:310
      github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity
          /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:365
      github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
          /remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:140
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState
          /remote-source/keda/app/pkg/scaling/scale_handler.go:572
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
          /remote-source/keda/app/pkg/scaling/scale_handler.go:236
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
          /remote-source/keda/app/pkg/scaling/scale_handler.go:175
      2024-03-01T19:47:58Z    ERROR    prometheus_scaler    error executing prometheus query    {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
      github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity
          /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:367
      github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
          /remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:140
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState
          /remote-source/keda/app/pkg/scaling/scale_handler.go:572
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
          /remote-source/keda/app/pkg/scaling/scale_handler.go:236
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
          /remote-source/keda/app/pkg/scaling/scale_handler.go:175
      2024-03-01T19:47:58Z    ERROR    scale_handler    error getting scale decision    {"scaledObject.Namespace": "ns1", "scaledObject.Name": "prom-scaledobject", "scaler": "prometheusScaler", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState
          /remote-source/keda/app/pkg/scaling/scale_handler.go:588
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
          /remote-source/keda/app/pkg/scaling/scale_handler.go:236
      github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
          /remote-source/keda/app/pkg/scaling/scale_handler.go:175
      .....
      
      
          3. In Observe > Metrics, run the following query for the openshift-keda namespace:
      sum(container_memory_working_set_bytes{job="kubelet", metrics_path="/metrics/cadvisor", cluster="", namespace="openshift-keda", container!="", image!=""}) by (pod)          

      Actual results:

          The keda-operator pod's memory usage increases very quickly.
          I have seen it grow from ~70 MB to 100 MB in 30 minutes.

         

      Expected results:

          

      Additional info:

          

            jkyros@redhat.com John Kyros
            rhn-support-gparente German Parente
            Sunil Choudhary Sunil Choudhary