  1. Red Hat Advanced Cluster Management
  2. ACM-27363

OSDFM dormant Service Cluster Autoscaling


    • Type: Epic
    • Resolution: Done
    • Priority: Major
    • None
    • None
    • Fleet Manager
    • None
    • OSDFM dormant Service Cluster Autoscaling
    • Product / Portfolio Work
    • 10
    • False

      This epic delivers the full Phase of OSDFM’s Service Cluster autoscaling system. It improves SC upscaling behavior, introduces robust dormant-SC lifecycle management, and strengthens region-level autoscaling controls. The work adds missing metrics and alerts required for new-region readiness, enhances provisioning shard lifecycle handling, standardizes SC creation behavior, and enables autoscaling features across Stage and Production.

    • False

       

      1. SC autoscaler logic – SC upscaling behavior, safe SC creation, and saturation calculation.
      2. Dormant SC logic – auto-marking, activation thresholds, and integration with watcher/autoscaler.
      3. New-region autoscaling metric improvements – added the missing dormant-SC metrics to avoid false alerts, and ensured the Prod dormant-SC-creation alert is triggered only when region saturation reaches 73% and no dormant SC exists. Also added a dormant-SC activation alert that is triggered when region saturation reaches 90% and no dormant SC has been set to Ready.
      4. Provisioning shard lifecycle enhancements – shard-full detection and maintenance handling.
      5. Standardized SC creation behavior – first SC created in maintenance, subsequent SCs in ready.
      6. Region-level autoscaling opt-out – via sc-autoscale-ignore label.
      7. Dormant SC feature enabled in Stage.
      8. Dormant SC feature enabled in Production.
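
The rules in the list above can be sketched in a few lines of Python. This is only an illustrative model, not the OSDFM implementation. The 73% and 90% thresholds, the sc-autoscale-ignore label name, and the maintenance/ready creation rule come from the epic text. Everything else is an assumption made for the example: the class and function names, the alert names, the idea that "first SC" means the first SC in a region, the idea that the label's presence alone is enough to opt out, and the reading that an activated dormant SC keeps its dormant marker while its status becomes ready. How saturation is calculated is not described here, so it is passed in as a plain percentage.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Thresholds quoted in the epic; the saturation formula itself is not specified there.
DORMANT_CREATION_THRESHOLD = 73.0
DORMANT_ACTIVATION_THRESHOLD = 90.0
# Label name from the epic; treating mere presence as opt-out is an assumption.
IGNORE_LABEL = "sc-autoscale-ignore"


@dataclass
class ServiceCluster:
    name: str
    status: str            # hypothetical values: "maintenance" or "ready"
    dormant: bool = False  # hypothetical marker for a dormant SC


@dataclass
class Region:
    name: str
    saturation_pct: float
    clusters: List[ServiceCluster] = field(default_factory=list)
    labels: Dict[str, str] = field(default_factory=dict)


def autoscaling_enabled(region: Region) -> bool:
    """A region opts out of autoscaling when it carries the ignore label."""
    return IGNORE_LABEL not in region.labels


def initial_status(existing_sc_count: int) -> str:
    """First SC is created in maintenance, later SCs in ready (assumed per region)."""
    return "maintenance" if existing_sc_count == 0 else "ready"


def pending_alerts(region: Region) -> List[str]:
    """Return the hypothetical alert names that would fire for this region."""
    dormant = [sc for sc in region.clusters if sc.dormant]
    alerts = []
    # Creation alert: saturation at or above 73% and no dormant SC exists.
    if region.saturation_pct >= DORMANT_CREATION_THRESHOLD and not dormant:
        alerts.append("dormant-sc-creation")
    # Activation alert: saturation at or above 90% and no dormant SC is ready.
    if region.saturation_pct >= DORMANT_ACTIVATION_THRESHOLD and not any(
        sc.status == "ready" for sc in dormant
    ):
        alerts.append("dormant-sc-activation")
    return alerts
```

Under this reading, a region at or above 90% with no dormant SC at all would raise both alerts. Whether the real alerts overlap that way is not stated in the epic.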

       

    • Not Selected
    • Done
    • 0% To Do, 0% In Progress, 100% Done
    • 10
    • OSDFM Sprint 2, OSDFM Sprint 3, OSDFM Sprint 4


              chuluo@redhat.com Chunxi Luo
              chuluo@redhat.com Chunxi Luo
              Anna Francis Anna Francis