- Bug
- Resolution: Done-Errata
- Normal
- None
- 4.14.z
- None
Description of problem:
The keda-operator pod's memory usage increases indefinitely, until it is OOM-killed, when an invalid namespace is defined in a ScaledObject.
Version-Release number of selected component (if applicable):
2.11.2-322
How reproducible:
always
Steps to Reproduce:
1. define a ScaledObject of type prometheus with an invalid namespace in triggers.metadata, for instance (a complete ScaledObject manifest is sketched after the steps below):
triggers:
- authenticationRef:
    name: keda-trigger-auth-prometheus
  metadata:
    authModes: bearer
    metricName: http_requests_total
    namespace: ns2   # ===> this namespace does not exist
    query: sum(rate(http_requests_total{job="prometheus-example-app"}[1m]))
    serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
    threshold: "3"
  type: prometheus
2. check the keda-operator logs for stack traces of this sort:
oc logs keda-operator-59865dbdc4-dksnk
...
2024-03-01T19:47:46Z ERROR scale_handler error getting metric for scaler {"scaledObject.Namespace": "ns1", "scaledObject.Name": "prom-scaledobject", "scaler": "prometheusScaler", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).GetScaledObjectMetrics /remote-source/keda/app/pkg/scaling/scale_handler.go:483
github.com/kedacore/keda/v2/pkg/metricsservice.(*GrpcServer).GetMetrics /remote-source/keda/app/pkg/metricsservice/server.go:47
github.com/kedacore/keda/v2/pkg/metricsservice/api._MetricsService_GetMetrics_Handler /remote-source/keda/app/pkg/metricsservice/api/metrics_grpc.pb.go:99
google.golang.org/grpc.(*Server).processUnaryRPC /remote-source/keda/app/vendor/google.golang.org/grpc/server.go:1337
google.golang.org/grpc.(*Server).handleStream /remote-source/keda/app/vendor/google.golang.org/grpc/server.go:1714
google.golang.org/grpc.(*Server).serveStreams.func1.1 /remote-source/keda/app/vendor/google.golang.org/grpc/server.go:959
2024-03-01T19:47:58Z ERROR prometheus_scaler prometheus query api returned error {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).ExecutePromQuery /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:310
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:365
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler /remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:130
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState /remote-source/keda/app/pkg/scaling/scale_handler.go:572
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers /remote-source/keda/app/pkg/scaling/scale_handler.go:236
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop /remote-source/keda/app/pkg/scaling/scale_handler.go:175
2024-03-01T19:47:58Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:367
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler /remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:130
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState /remote-source/keda/app/pkg/scaling/scale_handler.go:572
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers /remote-source/keda/app/pkg/scaling/scale_handler.go:236
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop /remote-source/keda/app/pkg/scaling/scale_handler.go:175
2024-03-01T19:47:58Z ERROR prometheus_scaler prometheus query api returned error {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).ExecutePromQuery /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:310
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:365
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler /remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:140
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState /remote-source/keda/app/pkg/scaling/scale_handler.go:572
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers /remote-source/keda/app/pkg/scaling/scale_handler.go:236
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop /remote-source/keda/app/pkg/scaling/scale_handler.go:175
2024-03-01T19:47:58Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "ns1", "name": "prom-scaledobject", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity /remote-source/keda/app/pkg/scalers/prometheus_scaler.go:367
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler /remote-source/keda/app/pkg/scaling/cache/scalers_cache.go:140
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState /remote-source/keda/app/pkg/scaling/scale_handler.go:572
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers /remote-source/keda/app/pkg/scaling/scale_handler.go:236
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop /remote-source/keda/app/pkg/scaling/scale_handler.go:175
2024-03-01T19:47:58Z ERROR scale_handler error getting scale decision {"scaledObject.Namespace": "ns1", "scaledObject.Name": "prom-scaledobject", "scaler": "prometheusScaler", "error": "prometheus query api returned error. status: 403 response: Forbidden (user=system:serviceaccount:ns1:thanos, verb=get, resource=pods, subresource=)\n"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState /remote-source/keda/app/pkg/scaling/scale_handler.go:588
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers /remote-source/keda/app/pkg/scaling/scale_handler.go:236
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop /remote-source/keda/app/pkg/scaling/scale_handler.go:175
.....
3. check in Observe/Metrics in the openshift-keda namespace this query:
sum(container_memory_working_set_bytes{job="kubelet", metrics_path="/metrics/cadvisor", cluster="", namespace="openshift-keda", container!="", image!=""}) by (pod)
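For reference, below is a sketch of a complete ScaledObject carrying the trigger from step 1. The ScaledObject name (prom-scaledobject) and namespace (ns1) are taken from the operator logs above; the scale target Deployment name, the replica bounds, and the pre-existing keda-trigger-auth-prometheus TriggerAuthentication are assumptions for illustration only:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prom-scaledobject
  namespace: ns1
spec:
  scaleTargetRef:
    name: prometheus-example-app      # assumed Deployment to scale
  minReplicaCount: 1                  # assumed replica bounds
  maxReplicaCount: 5
  triggers:
  - authenticationRef:
      name: keda-trigger-auth-prometheus
    metadata:
      authModes: bearer
      metricName: http_requests_total
      namespace: ns2                  # this namespace does not exist
      query: sum(rate(http_requests_total{job="prometheus-example-app"}[1m]))
      serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
      threshold: "3"
    type: prometheus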
Actual results:
The keda-operator pod's memory usage increases very quickly. I have seen it grow from ~70 MB to ~100 MB in 30 minutes.
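To quantify the growth over a fixed window instead of reading it off the graph, a delta over the same working-set metric can be used; this is a sketch reusing the label selectors from the query in step 3, with a 30m window matching the observation above:

sum(delta(container_memory_working_set_bytes{job="kubelet", metrics_path="/metrics/cadvisor", cluster="", namespace="openshift-keda", container!="", image!=""}[30m])) by (pod)

While the ScaledObject with the invalid namespace exists, this should show roughly +30 MB per 30-minute window for the keda-operator pod.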
Expected results:
Memory usage of the keda-operator pod stays stable; repeated scaler errors should not cause it to grow until the pod is OOM-killed.
Additional info:
- links to: RHSA-2024:129656 Custom Metrics Autoscaler Operator for Red Hat OpenShift security/bugfix update