  Project Quay / PROJQUAY-6474

Conflict in replicas management between Quay Operator and unmanaged HPA


      In order to reproduce the bug:

      1. Edit the Quay CRD and set the HPA as "managed: false"
      2. Edit the HPA resource and increase the value of the minReplicas field

      In this case, the default number of replicas for the deployment is 2, so increase minReplicas in the HPA to a higher value (e.g. 4).

      An override of the number of replicas for the Quay deployment can also be added in the Quay CRD, as mentioned in the bug description.
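
      A rough sketch of what the QuayRegistry side of these edits could look like (the registry name, namespace, and replica count below are placeholder values, not taken from this issue; the minReplicas change is made on the HPA resource itself, as shown in the comments below):

      apiVersion: quay.redhat.com/v1
      kind: QuayRegistry
      metadata:
        name: example-registry
        namespace: example-namespace
      spec:
        components:
          # HPA left unmanaged so a custom HPA resource can be used
          - kind: horizontalpodautoscaler
            managed: false
          # optional override of the quay app deployment replica count (placeholder value)
          - kind: quay
            managed: true
            overrides:
              replicas: 4
        configBundleSecret: config-bundle-secret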


      Editing the Quay CRD to set the HPA as unmanaged and then increasing minReplicas in the HPA resource causes a conflict between the Operator and the HPA in the management of replicas.

      As a consequence, new pods are continuously created by the HPA and removed by the Operator, so the HPA effectively stops working.

      This behaviour can be observed in the following scenarios:

      • HPA set as unmanaged in Quay CRD, keeping the deployment as managed, and minReplicas of HPA increased
      • HPA set as unmanaged in Quay CRD, keeping the deployment as managed with an override to increase the number of replicas, and minReplicas of HPA increased to the same value
        • In this case, continuous pod creation and deletion starts when the HPA tries to scale them

      It seems there is no way to increase pod replicas while keeping the HPA scaling them properly.

            [PROJQUAY-6474] Conflict in replicas management between Quay Operator and unmanaged HPA

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Red Hat Quay v3.13.2 bug fix release), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHBA-2024:10967

            GitLab CEE Bot added a comment -

            CPaaS Service Account mentioned this issue in merge request !447 of quay-midstream / quay-operator-cpaas on branch quay-3.9-rhel-8_upstream_42dfd0f31da572522ec1cf8687847382:

            Updated US source to: d3f4376 overrides: Allow nullable replicas (PROJQUAY-6474)

            Sean Zhao added a comment -

            Verified on quay-operator-bundle-container-v3.13.2-7, issue fixed.

            The prerequisite is adding "overrides: {replicas: null}" for the quay component in the registry CR.
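
            For illustration, a minimal sketch of that prerequisite in the QuayRegistry CR (component list trimmed to the relevant entries; field layout follows the overrides mechanism discussed in this issue):

            spec:
              components:
                # custom HPA is used instead of the operator-managed one
                - kind: horizontalpodautoscaler
                  managed: false
                # null override so the operator does not reconcile the replica count
                - kind: quay
                  managed: true
                  overrides:
                    replicas: null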


            Giovanni Luca Izzi added a comment -

            rh-ee-shudeshp thank you for testing the fix, I think the output you are getting is correct. When running, the operator starts processing the QuayRegistry object, and since it is not managing the replicas this value is assumed to be null: the replicas are scaled to 1 with a new ReplicaSet, and after that the HPA scales them back out to 6.

            bcaton@redhat.com rh-ee-shudeshp the 3.13 branch includes the fix developed by jonathankingfc, right? From previous comments I understand it was required to set replicas to null in the QuayRegistry CRD to prevent the Quay Operator from managing the replicas of the quay app deployment. However, in the test performed I do not see this detail in the QuayRegistry object.

            Shubhra Jayant Deshpande added a comment - edited

            This is the experience I am getting when I try to reproduce the steps above on the 3.13 branch. Can you please confirm whether this is the expected behavior?

            (see the attached screenshot for details):

            • Quay pods scale up to the expected minReplicas count as soon as I apply the custom HPA resource
            • Once the operator is running, the pods get scaled down
            • The pods are eventually scaled back up to the expected minReplicas count

            Steps I followed:
            1. Create CRD with HPA marked as unmanaged
            2. Apply my CRD -> quay app deployment is scaled to 2 at this point
            3. Create a custom HPA resource with minReplicas set to 6
            4. Apply the custom HPA resource -> quay app deployment is scaled from 2 to 6
            5. Run the operator -> quay app deployment is scaled down to 0 and then eventually scaled back up to 6

             

            This is how my CRD looked:

            apiVersion: quay.redhat.com/v1
            kind: QuayRegistry
            metadata:
              name: shubhra-registry
              namespace: shudeshp
            spec:
              components:
                - kind: objectstorage
                  managed: false
                - kind: monitoring
                  managed: true
                - kind: mirror
                  managed: false
                - kind: horizontalpodautoscaler
                  managed: false
                - kind: clairpostgres
                  managed: false
                - kind: clair
                  managed: false
                - kind: horizontalpodautoscaler
                  managed: false
              configBundleSecret: config-bundle-secret

            The HPA resource looks like this:

            apiVersion: autoscaling/v2
            kind: HorizontalPodAutoscaler
            metadata:
              name: shubhra-registry-quay-app
              namespace: shudeshp
            spec:
              scaleTargetRef:
                apiVersion: apps/v1
                kind: Deployment
                name: shubhra-registry-quay-app
              minReplicas: 6
              maxReplicas: 20
              metrics:
              - type: Resource
                resource:
                  name: cpu
                  target:
                    type: Utilization
                    averageUtilization: 90
              - type: Resource
                resource:
                  name: memory
                  target:
                    type: Utilization
                    averageUtilization: 90

             


            GitLab CEE Bot added a comment -

            CPaaS Service Account mentioned this issue in merge request !388 of quay-midstream / quay-operator-cpaas on branch quay-3.12-rhel-8_upstream_25a3d1a982148a4836299e556a88c31d:

            Updated US source to: 0e87715 hpa: Do not set replicas to 2 when override set to null (PROJQUAY-6474)

            GitLab CEE Bot added a comment -

            CPaaS Service Account mentioned this issue in merge request !387 of quay-midstream / quay-operator-cpaas on branch quay-3.11-rhel-8_upstream_6d723d1438302a59fec680dbf5dc1250:

            Updated US source to: 9193f6e hpa: Do not set replicas to 2 when override set to null (PROJQUAY-6474)

            Brandon Caton added a comment -

            jonathankingfc I've opened another Jira for 3.11.6 and updated the target version for this to 3.12.3.

            Jonathan King added a comment -

            bcaton@redhat.com We are only cherry-picking this PR into 3.10, 3.11, and 3.12. The 3.11 and 3.12 cherry-picks have been merged; 3.10 needs a manual cherry-pick since there is a merge conflict. I will open a PR for this by EOD.

            rhn-support-gizzi Are we able to get the customer to upgrade to one of these versions? 3.8 is no longer supported and 3.9 is in maintenance mode.

            Brandon Caton added a comment -

            jonathankingfc does this just need to be cherry-picked into 3.10, 3.11, and 3.12?
