  1. Red Hat Advanced Cluster Management
  2. ACM-7332

Inefficient policy processing with selective policy enforcement in large environment

    • GRC Sprint 2023-16, GRC Sprint 2023-17, GRC Sprint 2023-18, GRC Sprint 2023-19
    • Important
    • No

      Description of problem:

      During root policy reconciliation, the policy propagator controller currently processes every target cluster bound to the root policy, including clusters not affected by the change: it compares each replicated policy with the desired policy and creates or updates the replicated policy as needed. In a large environment (e.g., 3500+ SNOs) with a progressive policy rollout using selective policy enforcement (e.g., enforcing 100 clusters at a time), each reconciliation can take minutes longer than necessary because of the time spent on the non-affected clusters (e.g., 3400+ SNOs). In particular, this adds significant delay when enforcing clusters with multiple policies that have dependencies.

      Currently only 1 worker processes the root policies. It’s necessary to increase the number of workers to enable several root policies to be processed in parallel.
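      The effect of adding workers can be illustrated with a minimal, standard-library-only sketch. It is not the propagator's code: `reconcileRootPolicy` and `processAll` are hypothetical names, and the sleep merely stands in for real reconciliation work. In a controller-runtime based controller, the worker count typically corresponds to the `MaxConcurrentReconciles` field of `controller.Options`; how the propagator would expose it is not stated in this ticket.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// reconcileRootPolicy is a hypothetical stand-in for the per-root-policy
// work; it only sleeps to simulate processing time.
func reconcileRootPolicy(name string) {
	time.Sleep(50 * time.Millisecond)
}

// processAll drains the given root policies with the requested number of
// workers and returns how many policies were processed.
func processAll(policies []string, workers int) int {
	queue := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	processed := 0

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range queue {
				reconcileRootPolicy(name)
				mu.Lock()
				processed++
				mu.Unlock()
			}
		}()
	}
	for _, p := range policies {
		queue <- p
	}
	close(queue)
	wg.Wait()
	return processed
}

func main() {
	policies := []string{"policy-a", "policy-b", "policy-c", "policy-d"}
	for _, workers := range []int{1, 4} {
		start := time.Now()
		n := processAll(policies, workers)
		fmt.Printf("workers=%d processed=%d elapsed=%s\n",
			workers, n, time.Since(start).Round(10*time.Millisecond))
	}
}
```

      With one worker the simulated policies are handled one after another; with four workers they overlap, so the elapsed time should drop accordingly.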

      Additionally, we should consider eliminating inefficiencies in root policy processing by processing only the impacted clusters.

      For example,

      • Root policy spec/template resources update -> processing all target clusters is required
      • Child policy spec update -> processing only the impacted cluster
      • Placement/PlacementRule/PlacementBinding update -> processing only the newly added clusters and the clusters whose remediationAction has changed
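
      The three cases above could be sketched roughly as follows. This is an illustration under assumptions, not the propagator's implementation: the `Trigger` type, the `Decision` map (cluster name to desired remediationAction), and `clustersToProcess` are all hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"sort"
)

// Trigger is a hypothetical classification of what caused the reconcile.
type Trigger int

const (
	RootPolicyChanged  Trigger = iota // root policy spec or template resources updated
	ChildPolicyChanged                // one replicated (child) policy updated
	PlacementChanged                  // Placement/PlacementRule/PlacementBinding updated
)

// Decision maps a cluster name to its desired remediationAction.
type Decision map[string]string

// clustersToProcess returns the sorted list of clusters that need work.
// Clusters removed from the placement are not covered here; the ticket
// does not list that case.
func clustersToProcess(t Trigger, childCluster string, oldD, newD Decision) []string {
	var out []string
	switch t {
	case RootPolicyChanged:
		// Every target cluster must receive the new spec.
		for c := range newD {
			out = append(out, c)
		}
	case ChildPolicyChanged:
		// Only the cluster owning the changed replicated policy.
		out = append(out, childCluster)
	case PlacementChanged:
		// Newly added clusters and clusters whose remediationAction changed.
		for c, action := range newD {
			if prev, ok := oldD[c]; !ok || prev != action {
				out = append(out, c)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	oldD := Decision{"sno1": "inform", "sno2": "inform", "sno3": "inform"}
	newD := Decision{"sno1": "inform", "sno2": "enforce", "sno3": "inform", "sno4": "inform"}

	fmt.Println("root policy change:", clustersToProcess(RootPolicyChanged, "", oldD, newD))
	fmt.Println("child policy change:", clustersToProcess(ChildPolicyChanged, "sno3", oldD, newD))
	fmt.Println("placement change:", clustersToProcess(PlacementChanged, "", oldD, newD))
}
```

      In this example only `sno2` (switched to enforce) and `sno4` (newly added) are selected for the placement change, while the root policy change still selects all four clusters.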

      Version-Release number of selected component (if applicable):

      2.9

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create 3500+ managed clusters and bind them to an inform policy
      2. Enforce 200 clusters by adding them to the placement/placementRule that is referenced by the "override" placementBinding
      3. ...

      Actual results:

      Expected results:

      Additional info:

      See the attached photo: the completion time for a cluster increases linearly as the number of clusters bound to that policy increases.

       

      Processing time test for one root policy on KIND cluster with 104 CPUs allocated

        #   Cluster count   No-op (seconds)   Enforce 200 clusters (seconds)
        1   3600            142               142
        2   200             6                 6

      No-op: time spent processing all clusters when none of them needs an update

      Enforce 200 clusters: time spent processing all clusters when 200 of them need an update

      Compared with row 2, processing the 3400 non-affected clusters takes roughly 2 more minutes (142 seconds versus 6 seconds).
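
      The arithmetic behind that comparison can be spelled out in a few lines. This is only a worked calculation on the two measurements reported above; the `overhead` helper is a hypothetical name, and treating row 2 as the baseline for the 200 affected clusters follows the comparison made in this ticket.

```go
package main

import "fmt"

// overhead returns the extra seconds spent when all clusters are processed
// instead of only the affected ones.
func overhead(allSeconds, affectedSeconds float64) float64 {
	return allSeconds - affectedSeconds
}

func main() {
	const (
		allClusters     = 3600  // row 1: clusters bound to the policy
		affected        = 200   // row 2: clusters actually being enforced
		allSeconds      = 142.0 // row 1: measured processing time
		affectedSeconds = 6.0   // row 2: measured processing time
	)

	extra := overhead(allSeconds, affectedSeconds)
	perCluster := extra / float64(allClusters-affected) * 1000

	fmt.Printf("extra time: %.0f s (%.1f min)\n", extra, extra/60)
	fmt.Printf("per non-affected cluster: %.0f ms\n", perCluster)
}
```

      The difference is 136 seconds, about 2.3 minutes, which works out to about 40 ms for each of the 3400 non-affected clusters.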

              yikim@redhat.com Yi Rae Kim
              angwang@redhat.com Angie Wang
              Matthew Prahl
              Derek Ho