Red Hat Developer Hub Bugs / RHDHBUGS-503

Operator reverts the number of replicas set automatically by a Horizontal Pod Autoscaler (HPA)

    • Release Note Text:
      = The Operator reverts the number of replicas set automatically by a Horizontal Pod Autoscaler (HPA)

      Previously, the {product-very-short} Operator enforced `replicas: 1` from its default deployment configuration, which overrode scaling decisions made by an HPA.

      With this update, `replicas: 1` is removed from the default deployment profile and from the DB StatefulSet manifest.

      This update allows the HPA to control scaling as expected, so that {product-very-short} instances can now scale dynamically without being reset by the Operator.
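
      As an illustration of the change (a sketch following the standard apps/v1 Deployment schema, not the literal operator source), the fix amounts to no longer setting an explicit replica count in the default Deployment fragment, so the HPA keeps ownership of that field:

        apiVersion: apps/v1
        kind: Deployment
        spec:
          # replicas: 1   <- removed by this update; when the Operator does
          #                  not set this field, the HPA-managed value stays
          template:
            ...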
    • Release Note Type: Bug Fix
    • Release Note Status: Done
    • Sprint: RHDH Install 3276

      Description of problem:

      If we create a Horizontal Pod Autoscaler (HPA) resource to scale the RHDH pods based on application usage, the RHDH Operator always reverts the number of replicas back down to 1 (its default).

      This issue has been reported in RHIDP-4089 as well.

      Prerequisites (if any, like setup, operators/versions):

      • RHDH Operator 1.6.1 (or from the main branch of the rhdh-operator repo)
      • Tested on a ROSA (OCP 4.18) cluster and also on a local Kind cluster with a metrics server installed (a quick way to confirm metrics are being served is shown below)
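
      Before reproducing, confirm that pod metrics are actually being served, since the HPA needs them to compute CPU utilization. For example:

        $ oc adm top pods

      If this returns CPU and memory figures for the running pods, the metrics pipeline is working.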

      Steps to Reproduce:

      • Install the RHDH Operator 1.6.1 (also tested with `make deploy` from the rhdh-operator repo main branch)
      • Create a very simple Backstage CR, like so:
        cat << EOF | oc apply -f -                       
        apiVersion: rhdh.redhat.com/v1alpha3
        kind: Backstage
        metadata:
          name: bs1
        EOF
        
      • Wait until the RHDH pods are fully up and running
      • Create an HPA resource tied to the RHDH Deployment, either declaratively (an equivalent manifest is shown after this command) or imperatively like this:
      oc autoscale deployment backstage-bs1 \
        --cpu-percent=50 \
        --min=1 \
        --max=3
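
      For the declarative route, an equivalent manifest using the standard autoscaling/v2 API would be:

        apiVersion: autoscaling/v2
        kind: HorizontalPodAutoscaler
        metadata:
          name: backstage-bs1
        spec:
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: backstage-bs1
          minReplicas: 1
          maxReplicas: 3
          metrics:
            - type: Resource
              resource:
                name: cpu
                target:
                  type: Utilization
                  averageUtilization: 50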
      
      • Check that there is an HPA resource created and that CPU usage is being tracked:
      $ oc get hpa
      NAME            REFERENCE                  TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
      backstage-bs1   Deployment/backstage-bs1   cpu: 5%/50%   1         3         1          19s
      
      • Generate some high CPU load on the RHDH pod with this command as an example:
      oc exec -it deploy/backstage-bs1 -- /bin/sh -c "openssl speed -multi $(nproc --all)"
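
      Note that with the double quotes above, `$(nproc --all)` is expanded by the local shell before `oc exec` runs, so the parallelism reflects the local machine rather than the pod. To size the load by the pod's own CPU count, single quotes keep the substitution inside the container:

        oc exec -it deploy/backstage-bs1 -- /bin/sh -c 'openssl speed -multi $(nproc --all)'

      Either variant generates enough CPU load for this reproduction.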
      
      • In a separate tab, watch the HPA and notice the CPU usage increasing:
        $ oc get hpa
        NAME            REFERENCE                  TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
        backstage-bs1   Deployment/backstage-bs1   cpu: 399%/50%   1         3         1          4m36s
        
      • Describe the HPA and notice that it tried to scale up based on CPU usage:
      $ oc describe hpa backstage-bs1                     
      Name:                                                  backstage-bs1
      Namespace:                                             my-ns
      Labels:                                                <none>
      Annotations:                                           <none>
      CreationTimestamp:                                     Mon, 16 Jun 2025 17:56:29 +0200
      Reference:                                             Deployment/backstage-bs1
      Metrics:                                               ( current / target )
        resource cpu on pods  (as a percentage of request):  399% (998m) / 50%
      Min replicas:                                          1
      Max replicas:                                          3
      Deployment pods:                                       1 current / 3 desired
      Conditions:
        Type            Status  Reason            Message
        ----            ------  ------            -------
        AbleToScale     True    SucceededRescale  the HPA controller was able to update the target scale to 3
        ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
        ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
      Events:
        Type    Reason             Age                   From                       Message
        ----    ------             ----                  ----                       -------
        Normal  SuccessfulRescale  13s (x12 over 2m58s)  horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
      
      • Check the RHDH Deployment and notice that it was scaled up (via the HPA), then scaled down (by the Operator):
      $ oc describe deployment backstage-bs1
      
      [...]
      Conditions:
        Type           Status  Reason
        ----           ------  ------
        Progressing    True    NewReplicaSetAvailable
        Available      True    MinimumReplicasAvailable
      OldReplicaSets:  <none>
      NewReplicaSet:   backstage-bs1-65999bf47b (1/1 replicas created)
      Events:
        Type    Reason             Age                  From                   Message
        ----    ------             ----                 ----                   -------
        Normal  ScalingReplicaSet  10m                  deployment-controller  Scaled up replica set backstage-bs1-65999bf47b to 1
        Normal  ScalingReplicaSet  3s (x10 over 2m18s)  deployment-controller  Scaled up replica set backstage-bs1-65999bf47b to 3 from 1
        Normal  ScalingReplicaSet  2s (x10 over 2m18s)  deployment-controller  Scaled down replica set backstage-bs1-65999bf47b to 1 from 3
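
      The scale-up/scale-down tug-of-war is also easy to watch live while the load is running, for example:

        oc get deployment backstage-bs1 -w

      The replica count will keep oscillating between 3 (set by the HPA) and 1 (reset back by the Operator).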
      
      • The Operator logs confirm the behavior seen here:
      [...]
      2025-06-16T16:04:45Z    DEBUG   enqueuing reconcile on Deployment change        {"Deployment": "backstage-bs1", "namespace: ": "my-ns"}
      2025-06-16T16:04:45Z    DEBUG   apply object    {"controller": "backstage", "controllerGroup": "rhdh.redhat.com", "controllerKind": "Backstage", "Backstage": {"name":"bs1","namespace":"my-ns"}, "namespace": "my-ns", "name": "bs1", "reconcileID": "7a84983a-566b-4e76-8b0c-9c4d4fc9a91e", "/v1, Kind=ConfigMap": "backstage-appconfig-bs1"}
      [...]
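
      To follow these Operator logs live, tail the controller Deployment. The exact resource name and namespace depend on the install method; assuming here a Deployment named `rhdh-operator` running in the `rhdh-operator` namespace:

        oc logs -f deployment/rhdh-operator -n rhdh-operator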
      

      Actual results:

      The Deployment is automatically scaled up by the HPA based on application usage, but is then reverted back to 1 replica by the Operator.

      Expected results:

      The Operator should respect the autoscaling constraints defined by the HPA attached to the RHDH Deployment.
      This would help users adapt their RHDH instance to their usage; see RHIDP-4089.

      Reproducibility (Always/Intermittent/Only Once):

      Always

      Build Details:

      Additional info (Such as Logs, Screenshots, etc):

      Operator Logs attached.

              Assignee: rh-ee-fndlovu Fortune Ndlovu
              Reporter: rh-ee-asoro Armel Soro
              Team: RHIDP - Install