[release-4.16] Openshift Vertical Pod Autoscaler: fix checkpoint gc of unknown recommenders

    • Bug
    • Resolution: Done
    • Normal
    • 4.16.z
    • 4.16, 4.17, 4.18
    • Pod Autoscaler
    • Quality / Stability / Reliability
    • False
    • 0
    • AUTOSCALE - Sprint 276, AUTOSCALE - Sprint 277
    • 2
    • Done
    • Bug Fix
      * Before this update, when you used multiple recommenders for the Vertical Pod Autoscaler (VPA), the default VPA recommender would erroneously garbage collect `VPACheckpoint` objects that belonged to a VPA that was associated with a non-default recommender. With this release, the default recommender is prevented from garbage collecting the `VPACheckpoint` objects for non-default recommenders. (link:https://issues.redhat.com/browse/OCPBUGS-60609[OCPBUGS-60609])

      Description of problem:

      We have configured custom recommenders for the OpenShift Vertical Pod Autoscaler (VPA), as explained in https://www.redhat.com/en/blog/how-to-enable-a-customized-vpa-recommender-on-openshift
      
      However, due to an upstream VPA bug (https://github.com/kubernetes/autoscaler/issues/6387), the OpenShift VPA constantly removes the checkpoints it does not track (via its garbage collector). As a result, the custom recommender keeps recreating those checkpoints, and the VPA updates are not stable over time (pods are rescheduled again and again).
      
      The upstream bug has been fixed in the latest VPA release: https://github.com/kubernetes/autoscaler/pull/6767
      However, OpenShift does not yet include this version.
      
      As per a discussion with the engineering team on Slack, the OpenShift 4.20 VPA release will include this fix automatically. We are requesting a backport of this fix to 4.15+.
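
      The garbage-collection behaviour described above can be sketched roughly as follows. This is a simplified model for illustration only, not the actual VPA code: the `Vpa` and `Checkpoint` classes, the `buggy_gc` and `fixed_gc` functions, and the recommender name "default" are all hypothetical stand-ins. It assumes the faulty collector deletes every checkpoint whose VPA is not tracked by the running recommender, while the corrected collector deletes only checkpoints whose VPA no longer exists.

```python
from dataclasses import dataclass

# Assumed name of the default recommender, used here for illustration only.
DEFAULT_RECOMMENDER = "default"


@dataclass(frozen=True)
class Vpa:
    """Hypothetical stand-in for a VPA object and the recommender it uses."""
    namespace: str
    name: str
    recommender: str = DEFAULT_RECOMMENDER


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical stand-in for a VPACheckpoint that belongs to a VPA."""
    namespace: str
    vpa_name: str


def buggy_gc(checkpoints, all_vpas, my_recommender):
    """Return checkpoints to delete: any whose VPA this recommender does not track."""
    tracked = {
        (v.namespace, v.name) for v in all_vpas if v.recommender == my_recommender
    }
    return [c for c in checkpoints if (c.namespace, c.vpa_name) not in tracked]


def fixed_gc(checkpoints, all_vpas):
    """Return checkpoints to delete: only those whose VPA does not exist at all."""
    existing = {(v.namespace, v.name) for v in all_vpas}
    return [c for c in checkpoints if (c.namespace, c.vpa_name) not in existing]


# One VPA on the default recommender, one on a custom recommender,
# and one orphaned checkpoint whose VPA has been deleted.
vpas = [Vpa("prod", "web"), Vpa("prod", "batch", recommender="custom")]
checkpoints = [
    Checkpoint("prod", "web"),
    Checkpoint("prod", "batch"),
    Checkpoint("prod", "gone"),
]

deleted_by_buggy = buggy_gc(checkpoints, vpas, DEFAULT_RECOMMENDER)
deleted_by_fixed = fixed_gc(checkpoints, vpas)
```

      In this model, the faulty collector running as the default recommender also selects the checkpoint of the "batch" VPA, which belongs to the custom recommender, whereas the corrected collector selects only the orphaned "gone" checkpoint.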

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

          Slack discussion: https://redhat-internal.slack.com/archives/C02F1J9UJJD/p1747920002223329

              rh-ee-macao Max Cao
              rhn-support-aksjadha Akshata Jadhav
              Paul Rozehnal Paul Rozehnal