Bug
Resolution: Done-Errata
Normal
4.14.z
Description of problem:
The keda-operator pod's memory usage increases indefinitely, until OOM, when an invalid namespace is defined in a ScaledObject trigger.
Version-Release number of selected component (if applicable):
2.11.2-322
How reproducible:
Always
Steps to Reproduce:
1. Define a ScaledObject with a prometheus trigger whose triggers[].metadata.namespace points to a namespace that does not exist, for instance (a full manifest sketch follows the snippet):
triggers:
- authenticationRef:
    name: keda-trigger-auth-prometheus
  metadata:
    authModes: bearer
    metricName: http_requests_total
    namespace: ns2                # ===> this namespace does not exist
    query: sum(rate(http_requests_total{job="prometheus-example-app"}[1m]))
    serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
    threshold: "3"
  type: prometheus
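For convenience, a complete manifest built around this trigger might look like the sketch below. The ScaledObject name and namespace are taken from the operator logs in step 2 (prom-scaledobject in ns1); the scaleTargetRef Deployment name prometheus-example-app is only an assumption for illustration.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prom-scaledobject
  namespace: ns1
spec:
  scaleTargetRef:
    name: prometheus-example-app    # hypothetical workload to scale
  triggers:
  - authenticationRef:
      name: keda-trigger-auth-prometheus
    metadata:
      authModes: bearer
      metricName: http_requests_total
      namespace: ns2                # invalid namespace that reproduces the leak
      query: sum(rate(http_requests_total{job="prometheus-example-app"}[1m]))
      serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
      threshold: "3"
    type: prometheus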
2. Check the keda-operator logs for stack traces of this sort:
oc logs keda-operator-59865dbdc4-dksnk
...
2024-03-01T19:47:46Z ERROR scale_handler error getting metric for scaler {"scaledObject.Namespace": "ns1", "scaledObject.Name": "prom-scaledobject", "scaler": "prometheusScaler", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics
/remote-source/keda/app/pkg/scaling/scale_handler.go:483
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics
/remote-source/keda/app/pkg/metricsservice/server.go:47
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler
/remote-source/keda/app/pkg/metricsservice/api/metrics_grpc.pb.go:99
google.golang.org/grpc.(*Server).processUnaryRPC
/remote-source/keda/app/vendor/google.golang.org/grpc/server.go:1337
google.golang.org/grpc.(*Server).handleStream
/remote-source/keda/app/vendor/google.golang.org/grpc/server.go:1714
google.golang.org/grpc.(*Server).serveStreams.func1.1
/remote-source/keda/app/vendor/google.golang.org/grpc/server.go:959
2024-03-01T19:47:58Z ERROR prometheus_scaler prometheus query api returned error {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).ExecutePromQuery
/remote-source/keda/app/pkg/scalers/prometheus_scaler.go:310
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity
/remote-source/keda/app/pkg/scalers/prometheus_scaler.go:365
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
/remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:130
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState
/remote-source/keda/app/pkg/scaling/scale_handler.go:572
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/remote-source/keda/app/pkg/scaling/scale_handler.go:236
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/remote-source/keda/app/pkg/scaling/scale_handler.go:175
2024-03-01T19:47:58Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity
/remote-source/keda/app/pkg/scalers/prometheus_scaler.go:367
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
/remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:130
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState
/remote-source/keda/app/pkg/scaling/scale_handler.go:572
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/remote-source/keda/app/pkg/scaling/scale_handler.go:236
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/remote-source/keda/app/pkg/scaling/scale_handler.go:175
2024-03-01T19:47:58Z ERROR prometheus_scaler prometheus query api returned error {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).ExecutePromQuery
/remote-source/keda/app/pkg/scalers/prometheus_scaler.go:310
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity
/remote-source/keda/app/pkg/scalers/prometheus_scaler.go:365
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
/remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:140
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState
/remote-source/keda/app/pkg/scaling/scale_handler.go:572
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/remote-source/keda/app/pkg/scaling/scale_handler.go:236
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/remote-source/keda/app/pkg/scaling/scale_handler.go:175
2024-03-01T19:47:58Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity
/remote-source/keda/app/pkg/scalers/prometheus_scaler.go:367
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
/remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:140
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState
/remote-source/keda/app/pkg/scaling/scale_handler.go:572
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/remote-source/keda/app/pkg/scaling/scale_handler.go:236
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/remote-source/keda/app/pkg/scaling/scale_handler.go:175
2024-03-01T19:47:58Z ERROR scale_handler error getting scale decision {"scaledObject.Namespace": "ns1", "scaledObject.Name": "prom-scaledobject", "scaler": "prometheusScaler", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState
/remote-source/keda/app/pkg/scaling/scale_handler.go:588
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/remote-source/keda/app/pkg/scaling/scale_handler.go:236
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/remote-source/keda/app/pkg/scaling/scale_handler.go:175
.....
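These errors repeat on every polling interval. One way to gauge how fast they accumulate is to count them over a time window (a sketch, assuming the operator runs as the keda-operator deployment in the openshift-keda namespace):

oc -n openshift-keda logs deploy/keda-operator --since=30m | grep -c "prometheus query api returned error"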
3. In the OpenShift console, check Observe -> Metrics for the openshift-keda namespace with this query:
sum(container_memory_working_set_bytes{job="kubelet", metrics_path="/metrics/cadvisor", cluster="", namespace="openshift-keda", container!="", image!=""}) by (pod)
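The same trend can also be followed from the CLI by sampling the pod's working set a few minutes apart and comparing the MEMORY column (pod name taken from the logs above; substitute the current pod name):

oc -n openshift-keda adm top pod keda-operator-59865dbdc4-dksnk --containers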
Actual results:
The keda-operator pod's memory usage grows quickly; I have seen it increase from ~70 MB to ~100 MB within 30 minutes.
Expected results:
Memory usage of the keda-operator pod stays stable; the invalid namespace should only result in a logged error, not a memory leak.
Additional info:
- links to: RHSA-2024:129656 (Custom Metrics Autoscaler Operator for Red Hat OpenShift security/bugfix update)