Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: ACM 2.9.0
Affects Version/s: ACM 2.9.0
Component/s: GRC
Labels:

Story Points:
3
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
💪 Policy Propagator Performance Improvements 💪
Intelligence Requested:
Market:
RH Private Keywords:

Sprint:
GRC Sprint 2023-16, GRC Sprint 2023-17, GRC Sprint 2023-18, GRC Sprint 2023-19
Severity:
Important

Regression:
No

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

In the current root policy reconciliation process, the policy propagator controller processes every target cluster bound to the root policy, including non-affected clusters: it compares each replicated policy with the desired policy and creates or updates the replicated policy as needed. In a large environment (e.g., 3500+ SNOs) with progressive policy rollout using selective policy enforcement (e.g., enforcing 100 clusters at a time), each reconciliation can take minutes longer than necessary because of the work spent on the non-affected clusters (e.g., 3400+ SNOs). In particular, this significantly delays enforcement on clusters with multiple policies that have dependencies.

Currently, only one worker processes the root policies. The number of workers needs to be increased so that several root policies can be processed in parallel.
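
To illustrate the idea of processing several root policies in parallel, here is a minimal sketch using a bounded worker pool. It is not the propagator's code: `reconcile_root_policy` and `reconcile_all` are hypothetical names, and the reconciliation itself is stubbed out. In controllers built on controller-runtime, a setting of this kind is usually exposed as `MaxConcurrentReconciles`; whether the propagator would use that option is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def reconcile_root_policy(name):
    """Hypothetical stand-in for reconciling a single root policy."""
    return f"reconciled {name}"


def reconcile_all(root_policies, workers=4):
    """Reconcile root policies with a bounded pool of workers.

    workers=1 mirrors the single-worker behaviour described above;
    a larger value lets several root policies be handled at once.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(reconcile_root_policy, root_policies))
```

With `workers=1` the policies are handled one after another, so a slow root policy delays every policy queued behind it; a larger pool removes that head-of-line blocking.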

Additionally, we should consider eliminating inefficiencies in root policy processing by processing only the impacted clusters.

For example,

Root policy spec/template resources update -> all target clusters must be processed
Child policy spec update -> only the impacted cluster needs to be processed
Placement/PlacementRule/PlacementBinding update -> only the newly added clusters and the clusters whose remediationAction has changed need to be processed
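
The three cases above can be sketched as a small selection function. This is only an illustration under stated assumptions: the `Change` enum, the `clusters_to_process` function, and the representation of bindings as a mapping from cluster name to remediationAction are all hypothetical and not taken from the propagator. Clusters removed from a placement are not covered by the cases above and are out of scope here.

```python
from enum import Enum


class Change(Enum):
    ROOT_SPEC = "root policy spec/template update"
    CHILD_SPEC = "child policy spec update"
    PLACEMENT = "Placement/PlacementRule/PlacementBinding update"


def clusters_to_process(change, current, previous=None, changed_cluster=None):
    """Return the set of cluster names that need processing.

    current / previous map cluster name -> remediationAction for the
    clusters bound to the root policy after / before the change
    (a hypothetical representation chosen for this sketch).
    """
    if change is Change.ROOT_SPEC:
        # The desired policy changed, so every target cluster is affected.
        return set(current)
    if change is Change.CHILD_SPEC:
        # Only the cluster whose replicated policy changed is affected.
        return {changed_cluster} if changed_cluster in current else set()
    # Placement change: newly added clusters have no previous entry, and
    # clusters with a changed remediationAction have a different one.
    previous = previous or {}
    return {
        name
        for name, action in current.items()
        if previous.get(name) != action
    }
```

For a placement update that switches one cluster from inform to enforce and adds one new cluster, only those two clusters are returned, regardless of how many other clusters remain bound to the policy.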

Version-Release number of selected component (if applicable):

2.9

How reproducible:

100%

Steps to Reproduce:

Create 3500+ managed clusters and bind them to an inform policy
Enforce 200 clusters by adding the clusters to the placement/placementRule that is referenced by the "override" placementBinding
...

Actual results:

Expected results:

Additional info:

See the attached image: the completion time for a cluster increases linearly as the number of clusters bound to the policy increases.

Processing time test for one root policy on a KIND cluster with 104 CPUs allocated:

Row   Cluster count   No-op (seconds)   Enforce 200 clusters (seconds)
1     3600            142               142
2     200             6                 6

No-op: time spent processing all clusters when none of them needs an update

Enforce 200 clusters: time spent processing all clusters when 200 of them need an update

Compared with row 2, processing the 3400 non-affected clusters takes about 2 more minutes (142 seconds versus 6 seconds).
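
The arithmetic behind that comparison can be checked with a few lines. The figures are the ones from the table above; the variable names are only illustrative.

```python
# Cluster count -> processing time in seconds, taken from the table above.
measurements = {3600: 142, 200: 6}

# Average processing time per cluster for each run.
per_cluster = {count: seconds / count for count, seconds in measurements.items()}

# Extra time attributable to the 3400 clusters that did not need an update.
overhead_seconds = measurements[3600] - measurements[200]
overhead_minutes = overhead_seconds / 60
```

Both runs work out to roughly 0.03 to 0.04 seconds per cluster, which is consistent with linear growth in the cluster count, and the overhead is 136 seconds, a little over 2 minutes.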


TALM_with_SPE.png
140 kB
2023/09/05 11:53 PM
TALM-SPE-With-Fixes (Run52).png
184 kB
2023/12/03 6:32 PM

is duplicated by

ACM-7403 Placement and PlacementBinding changes should not cause all policies to be regenerated

Closed

is related to

ACM-7403 Placement and PlacementBinding changes should not cause all policies to be regenerated

Closed

Assignee: Yi Rae Kim

Reporter: Angie Wang

Contributors: Matthew Prahl

QA Contact: Derek Ho

Votes: 0

Watchers: 9

Created: 2023/09/05 11:54 PM

Updated: 2023/12/04 3:38 PM

Resolved: 2023/12/04 3:37 PM
