  1. OpenShift Bugs
  2. OCPBUGS-59638

OpenShift Vertical Pod Autoscaler: fix checkpoint GC of unknown recommenders


    • Bug
    • Resolution: Done
    • Normal
    • 4.19.z
    • 4.16, 4.17, 4.18
    • Pod Autoscaler
    • None
    • Quality / Stability / Reliability
    • False
    • 3
    • AUTOSCALE - Sprint 275
    • 1
    • Done
    • Bug Fix
    • When using multiple recommenders for the VPA, the default VPA recommender would erroneously garbage-collect VPACheckpoint objects belonging to a VPA associated with a non-default recommender.

      This bug fix prevents the default recommender from garbage-collecting checkpoints "owned" by other recommenders.

      Description of problem:

      We have configured OpenShift Vertical Pod Autoscaler (VPA) custom recommenders as explained in https://www.redhat.com/en/blog/how-to-enable-a-customized-vpa-recommender-on-openshift

      However, due to an upstream VPA bug (https://github.com/kubernetes/autoscaler/issues/6387), the OpenShift VPA garbage collector constantly removes checkpoints for VPAs it does not track. The custom recommender therefore keeps recreating the checkpoints, and VPA updates are not stable over time (pods are constantly rescheduled).

      The upstream bug has been fixed in the latest VPA release (https://github.com/kubernetes/autoscaler/pull/6767), but OpenShift does not include that version.

      As per discussion with the engineering team on Slack, the OpenShift 4.20 VPA release will include this fix automatically. We are requesting a backport of this fix to 4.15+.
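
For illustration only, the garbage-collection behaviour described above can be sketched roughly as follows. This is a simplified model, not the upstream code: the `VPA` and `Checkpoint` types, the `staleCheckpoints` function, and the `onlyTracked` switch are hypothetical names, and the default recommender name "default" is an assumption. The sketch contrasts a GC pass that only knows about VPAs handled by the running recommender with one that consults every VPA in the cluster before deleting a checkpoint.

```go
package main

import "fmt"

// VPA is a hypothetical, simplified stand-in for a VerticalPodAutoscaler object.
type VPA struct {
	Namespace, Name string
	Recommender     string // empty means the default recommender
}

// Checkpoint is a hypothetical, simplified stand-in for a VPA checkpoint object.
type Checkpoint struct {
	Namespace, Name string
	VPAName         string // name of the VPA this checkpoint belongs to
}

// defaultRecommender is assumed to be the name of the default recommender.
const defaultRecommender = "default"

func key(namespace, name string) string { return namespace + "/" + name }

// staleCheckpoints returns the keys of checkpoints a GC pass would delete.
// With onlyTracked set, only VPAs handled by the recommender `self` count as
// existing, which models the reported bug. Without it, every VPA in the
// cluster counts, which models the behaviour after the fix.
func staleCheckpoints(cps []Checkpoint, allVPAs []VPA, self string, onlyTracked bool) []string {
	existing := make(map[string]bool)
	for _, v := range allVPAs {
		rec := v.Recommender
		if rec == "" {
			rec = defaultRecommender
		}
		if onlyTracked && rec != self {
			continue // this VPA is invisible to the running recommender
		}
		existing[key(v.Namespace, v.Name)] = true
	}

	var stale []string
	for _, c := range cps {
		if !existing[key(c.Namespace, c.VPAName)] {
			stale = append(stale, key(c.Namespace, c.Name))
		}
	}
	return stale
}

func main() {
	vpas := []VPA{
		{Namespace: "app", Name: "web"},                          // default recommender
		{Namespace: "app", Name: "batch", Recommender: "custom"}, // custom recommender
	}
	cps := []Checkpoint{
		{Namespace: "app", Name: "web-c", VPAName: "web"},
		{Namespace: "app", Name: "batch-c", VPAName: "batch"},
		{Namespace: "app", Name: "gone-c", VPAName: "gone"}, // VPA no longer exists
	}

	fmt.Println("tracked VPAs only:", staleCheckpoints(cps, vpas, defaultRecommender, true))
	fmt.Println("all VPAs:         ", staleCheckpoints(cps, vpas, defaultRecommender, false))
}
```

In this model, the pass limited to tracked VPAs also selects the custom recommender's checkpoint (`app/batch-c`) for deletion, while the pass that checks all VPAs selects only the genuinely orphaned one (`app/gone-c`).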


      Additional info:

          Slack discussion: https://redhat-internal.slack.com/archives/C02F1J9UJJD/p1747920002223329

              rh-ee-macao Max Cao
              rhn-support-aksjadha Akshata Jadhav
              Paul Rozehnal