Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: ACM 2.6.7
Affects Version/s: ACM 2.6.7
Component/s: Observability
Labels:
- Obs-Core
- QE

Story Points:
1
Blocked:
False
Blocked Reason:
None
Ready:
False
Intelligence Requested:
Market:

Sprint:
Observability Sprint 2023-11
Severity:
Important

Regression:
No

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Description of problem:

The ACM - Resource Optimization dashboard and several other ACM dashboards currently use the query label_values(kube_pod_info{clusterType!="ocp3"},cluster) to obtain the list of clusters that populates their cluster selection lists. The kube_pod_info metric carries information about every pod in every cluster, which makes it very expensive in a large-scale environment: the total number of time series is the number of clusters multiplied by the number of pods on each cluster, so the query can easily end up processing a million time series. This results in timeouts in Grafana.

We could instead use the cluster_version metric, which has only 3 time series per cluster, to get the same results. This metric is also supported on all ACM versions 2.5 and above, so the fix should be fully compatible when we backport it.

The metric cluster_version does not exist in OCP 3.11 so this fix will be applied to OCP 4 dashboards only.
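
To make the cost difference concrete, here is a rough sketch of the arithmetic described above. The cluster and pod counts are hypothetical and are not measurements from the scale lab. The replacement query string is also an assumption: the description only says to use cluster_version, so the exact label_values expression may differ in the actual fix.

```python
# Rough cardinality estimate for the two cluster-list queries.
# All numbers below are illustrative assumptions, not measured values.

# Query currently used by the dashboards (from the description above).
CURRENT_QUERY = 'label_values(kube_pod_info{clusterType!="ocp3"},cluster)'

# Assumed shape of the replacement query; the ticket only names the metric.
PROPOSED_QUERY = "label_values(cluster_version,cluster)"


def kube_pod_info_series(clusters: int, pods_per_cluster: int) -> int:
    """Total series = number of clusters x number of pods on each cluster."""
    return clusters * pods_per_cluster


def cluster_version_series(clusters: int, series_per_cluster: int = 3) -> int:
    """cluster_version uses about 3 time series per cluster (per the ticket)."""
    return clusters * series_per_cluster


if __name__ == "__main__":
    clusters = 2000          # hypothetical fleet size
    pods_per_cluster = 500   # hypothetical average pod count

    before = kube_pod_info_series(clusters, pods_per_cluster)
    after = cluster_version_series(clusters)

    print(f"{CURRENT_QUERY}: ~{before:,} series")
    print(f"{PROPOSED_QUERY}: ~{after:,} series")
    print(f"reduction factor: ~{before // after}x")
```

Under these assumed numbers the variable query would scan on the order of a million series with kube_pod_info and only a few thousand with cluster_version, which is consistent with the timeouts reported here.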

dbennett@redhat.com jbanerje@redhat.com akrzos@redhat.com rhn-support-xiyin FYA

Version-Release number of selected component (if applicable):

How reproducible: Always

Steps to Reproduce: Access ACM - Resource Optimization dashboard in scale lab environment

Actual results: Dashboard times out with 504 error

Expected results: Dashboard should load quickly with a list of managed clusters

Additional info:

clones

ACM-5057 ACM Resource Optimization dashboard times out in large scale env [2.7]

Closed

Assignee: Disaiah Bennett

Reporter: Subbarao Meduri

Votes: 0

Watchers: 2

Created: 2023/04/20 12:37 PM

Updated: 2023/08/30 2:41 AM

Resolved: 2023/08/30 2:41 AM
