Red Hat Developer Hub Bugs / RHDHBUGS-503

Operator reverts the number of replicas set automatically by a Horizontal Pod Autoscaler (HPA)

    • Release Note Text:
      = The Operator reverts the number of replicas set automatically by a Horizontal Pod Autoscaler (HPA)

      Previously, the {product-very-short} Operator enforced `replicas: 1` from its default deployment configuration, which overrode scaling decisions made by an HPA.

      With this update, `replicas: 1` is removed from the default deployment profile and from the DB StatefulSet manifest.

      This update allows the HPA to control scaling as expected, so that {product-very-short} instances can now scale dynamically without being reset by the Operator.
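
      As an illustration of the change (a sketch following the standard apps/v1 Deployment schema, not the literal operator source), the fix amounts to no longer setting an explicit replica count in the default Deployment fragment, so the HPA keeps ownership of that field:

        apiVersion: apps/v1
        kind: Deployment
        spec:
          # replicas: 1   <- removed by this update; when the Operator does
          #                  not set this field, the HPA-managed value stays
          template:
            ...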
    • Release Note Type: Bug Fix
    • Release Note Status: Done
    • Sprint: RHDH Install 3276

      Description of problem:

      If we create a Horizontal Pod Autoscaler (HPA) resource to scale the RHDH pods based on application usage, the RHDH Operator always reverts the number of replicas back down to 1 (its default).

      This issue has been reported in RHIDP-4089 as well.

      Prerequisites (if any, like setup, operators/versions):

      • RHDH Operator 1.6.1 (or from the main branch of the rhdh-operator repo)
      • Tested on a ROSA (OCP 4.18) cluster and also on a local Kind cluster with a metrics server installed (a quick way to confirm metrics are being served is shown below)
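
      Before reproducing, confirm that pod metrics are actually being served, since the HPA needs them to compute CPU utilization. For example:

        $ oc adm top pods

      If this returns CPU and memory figures for the running pods, the metrics pipeline is working.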

      Steps to Reproduce:

      • Install the RHDH Operator 1.6.1 (also tested with `make deploy` from the rhdh-operator repo main branch)
      • Create a very simple Backstage CR, like so:
        cat << EOF | oc apply -f -                       
        apiVersion: rhdh.redhat.com/v1alpha3
        kind: Backstage
        metadata:
          name: bs1
        EOF
        
      • Wait until the RHDH pods are fully up and running
      • Create an HPA resource tied to the RHDH Deployment, either declaratively (an equivalent manifest is shown after this command) or imperatively like this:
      oc autoscale deployment backstage-bs1 \
        --cpu-percent=50 \
        --min=1 \
        --max=3
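
      For the declarative route, an equivalent manifest using the standard autoscaling/v2 API would be:

        apiVersion: autoscaling/v2
        kind: HorizontalPodAutoscaler
        metadata:
          name: backstage-bs1
        spec:
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: backstage-bs1
          minReplicas: 1
          maxReplicas: 3
          metrics:
            - type: Resource
              resource:
                name: cpu
                target:
                  type: Utilization
                  averageUtilization: 50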
      
      • Check that there is an HPA resource created and that CPU usage is being tracked:
      $ oc get hpa
      NAME            REFERENCE                  TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
      backstage-bs1   Deployment/backstage-bs1   cpu: 5%/50%   1         3         1          19s
      
      • Generate some high CPU load on the RHDH pod with this command as an example:
      oc exec -it deploy/backstage-bs1 -- /bin/sh -c "openssl speed -multi $(nproc --all)"
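
      Note that with the double quotes above, `$(nproc --all)` is expanded by the local shell before `oc exec` runs, so the parallelism reflects the local machine rather than the pod. To size the load by the pod's own CPU count, single quotes keep the substitution inside the container:

        oc exec -it deploy/backstage-bs1 -- /bin/sh -c 'openssl speed -multi $(nproc --all)'

      Either variant generates enough CPU load for this reproduction.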
      
      • In a separate tab, watch the HPA and notice the CPU usage increasing:
        $ oc get hpa
        NAME            REFERENCE                  TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
        backstage-bs1   Deployment/backstage-bs1   cpu: 399%/50%   1         3         1          4m36s
        
      • Describe the HPA and notice that it tried to scale up based on CPU usage:
      $ oc describe hpa backstage-bs1                     
      Name:                                                  backstage-bs1
      Namespace:                                             my-ns
      Labels:                                                <none>
      Annotations:                                           <none>
      CreationTimestamp:                                     Mon, 16 Jun 2025 17:56:29 +0200
      Reference:                                             Deployment/backstage-bs1
      Metrics:                                               ( current / target )
        resource cpu on pods  (as a percentage of request):  399% (998m) / 50%
      Min replicas:                                          1
      Max replicas:                                          3
      Deployment pods:                                       1 current / 3 desired
      Conditions:
        Type            Status  Reason            Message
        ----            ------  ------            -------
        AbleToScale     True    SucceededRescale  the HPA controller was able to update the target scale to 3
        ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
        ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
      Events:
        Type    Reason             Age                   From                       Message
        ----    ------             ----                  ----                       -------
        Normal  SuccessfulRescale  13s (x12 over 2m58s)  horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
      
      • Check the RHDH Deployment and notice that it was scaled up (via the HPA), then scaled down (by the Operator):
      $ oc describe deployment backstage-bs1
      
      [...]
      Conditions:
        Type           Status  Reason
        ----           ------  ------
        Progressing    True    NewReplicaSetAvailable
        Available      True    MinimumReplicasAvailable
      OldReplicaSets:  <none>
      NewReplicaSet:   backstage-bs1-65999bf47b (1/1 replicas created)
      Events:
        Type    Reason             Age                  From                   Message
        ----    ------             ----                 ----                   -------
        Normal  ScalingReplicaSet  10m                  deployment-controller  Scaled up replica set backstage-bs1-65999bf47b to 1
        Normal  ScalingReplicaSet  3s (x10 over 2m18s)  deployment-controller  Scaled up replica set backstage-bs1-65999bf47b to 3 from 1
        Normal  ScalingReplicaSet  2s (x10 over 2m18s)  deployment-controller  Scaled down replica set backstage-bs1-65999bf47b to 1 from 3
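
      The scale-up/scale-down tug-of-war is also easy to watch live while the load is running, for example:

        oc get deployment backstage-bs1 -w

      The replica count will keep oscillating between 3 (set by the HPA) and 1 (reset back by the Operator).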
      
      • The Operator logs confirm the behavior seen here:
      [...]
      2025-06-16T16:04:45Z    DEBUG   enqueuing reconcile on Deployment change        {"Deployment": "backstage-bs1", "namespace: ": "my-ns"}
      2025-06-16T16:04:45Z    DEBUG   apply object    {"controller": "backstage", "controllerGroup": "rhdh.redhat.com", "controllerKind": "Backstage", "Backstage": {"name":"bs1","namespace":"my-ns"}, "namespace": "my-ns", "name": "bs1", "reconcileID": "7a84983a-566b-4e76-8b0c-9c4d4fc9a91e", "/v1, Kind=ConfigMap": "backstage-appconfig-bs1"}
      [...]
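
      To follow these Operator logs live, tail the controller Deployment. The exact resource name and namespace depend on the install method; assuming here a Deployment named `rhdh-operator` running in the `rhdh-operator` namespace:

        oc logs -f deployment/rhdh-operator -n rhdh-operator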
      

      Actual results:

      The Deployment is automatically scaled up by the HPA based on application usage, but is then reverted back to 1 replica by the Operator.

      Expected results:

      The Operator should respect the autoscaling constraints defined by the HPA attached to the RHDH Deployment.
      This would help users adapt their RHDH instance to their usage; see RHIDP-4089.

      Reproducibility (Always/Intermittent/Only Once):

      Always

      Build Details:

      Additional info (Such as Logs, Screenshots, etc):

      Operator Logs attached.

              Assignee: rh-ee-fndlovu Fortune Ndlovu
              Reporter: rh-ee-asoro Armel Soro
              Team: RHIDP - Install