Type: Story
Resolution: Done
Priority: Blocker
Fix Version/s: ACM 2.10.0, Global Hub 1.1.0
Affects Version/s: Global Hub 1.0.0
Component/s: Global Hub, Observability, QE
Labels:

Blocked:
False
Blocked Reason:
None
Ready:
False
Acceptance Criteria:

When observability is configured (regardless of whether hub self-management is turned ON or OFF)
- Metrics from hub cluster are propagated to ACM Thanos
- Alerts from hub cluster are propagated to ACM Alert Manager
- Observability add-on is not present on the hub clusters
- Observability components (endpoint-observability-operator and metrics collector) are deployed in open-cluster-management-observability namespace on the hub
- The hub node selectors and tolerations are applied for observability components
- Hub cluster is listed as "local-cluster" in the list of managed clusters in ACM Grafana dashboards
- Managed cluster labels are available in ACM Overview dashboard
- mTLS certificates used for propagating metrics from the collector to the Observatorium API are automatically refreshed before they expire.

The observability stack is not initialized by the MCO operator in a global hub environment, even if an Observability CR is configured
- A global hub environment is reliably detected by the presence of the Multicluster-global-hub CR or by other means.

When Observability is disabled (MCO CR removed), or ACM is uninstalled,
- observability components are removed from hub cluster

When ACM is upgraded from prior releases,
- Observability add-on is removed (which in turn removes endpoint-observability-operator and metrics-collector)
- Observability components are redeployed in the open-cluster-management-observability namespace

When custom metrics are applied (add/remove metrics, recording rules, dynamic metrics)
- metrics are automatically propagated to hub cluster, just like any other spoke cluster

When the metrics collector fails to propagate metrics for 2 scrape cycles (2 x 5 min = 10 min)
- a local Prometheus alert is generated to indicate that metrics propagation is not working
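
The threshold in this criterion can be illustrated with a small counter. This is only a sketch: it assumes the two failed cycles must be consecutive and that a successful cycle resets the count, neither of which the criterion states explicitly. The class and method names are hypothetical, and the real alert is presumably defined as a Prometheus rule rather than in application code.

```python
from dataclasses import dataclass

SCRAPE_INTERVAL_MIN = 5     # from the criterion: one scrape cycle = 5 min
FAILED_CYCLES_TO_ALERT = 2  # from the criterion: 2 cycles = 10 min


@dataclass
class PropagationMonitor:
    """Hypothetical helper that tracks failed metric-forwarding cycles."""

    consecutive_failures: int = 0

    def record_cycle(self, succeeded: bool) -> bool:
        """Record one scrape cycle and return True if the alert should fire."""
        if succeeded:
            # Assumption: a successful cycle clears the failure streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= FAILED_CYCLES_TO_ALERT

    def minutes_failing(self) -> int:
        """Length of the current failure streak, in minutes."""
        return self.consecutive_failures * SCRAPE_INTERVAL_MIN
```

With these constants, the first failed cycle does not raise the alert; the second one does, at which point `minutes_failing()` reports the 10 minutes named in the criterion.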

The value of "observability" label on "local-cluster" managed cluster object
- has no effect on the metrics propagation behavior on the hub

- When alert propagation is disabled via MCO annotation "mco-disable-alerting: true", hub alerts are not propagated to ACM alert manager.

There is no change in behavior on regular managed spoke clusters.
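
Taken together, the criteria for the hub describe a small decision table. The sketch below restates it in Python for review purposes only. `HubPlan` and `plan_hub_observability` are hypothetical names, not part of the MCO operator, and the exact string comparison on the annotation value is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HubPlan:
    deploy_collector: bool  # endpoint-observability-operator and metrics collector on the hub
    forward_alerts: bool    # hub alerts propagated to ACM Alert Manager


def plan_hub_observability(
    mco_cr_present: bool,
    global_hub_detected: bool,
    mco_annotations: Mapping[str, str],
    self_management_enabled: bool = True,
    local_cluster_observability_label: Optional[str] = None,
) -> HubPlan:
    """Hypothetical restatement of the acceptance criteria for the hub cluster."""
    # Accepted but deliberately ignored: per the criteria, neither hub
    # self-management nor the "observability" label on local-cluster
    # changes metrics propagation on the hub.
    del self_management_enabled, local_cluster_observability_label

    # Nothing is deployed without an MCO CR or in a global hub environment.
    if not mco_cr_present or global_hub_detected:
        return HubPlan(deploy_collector=False, forward_alerts=False)

    # Assumption: the annotation disables alerting only when its value is "true".
    alerting_disabled = mco_annotations.get("mco-disable-alerting") == "true"
    return HubPlan(deploy_collector=True, forward_alerts=not alerting_disabled)
```

For example, `plan_hub_observability(True, False, {"mco-disable-alerting": "true"})` yields a plan that still deploys the collector but does not forward alerts.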

Git Pull Request:
https://github.com/stolostron/multicluster-observability-operator/pull/1340
Intelligence Requested:
Market:

Regression:
No

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Value Statement

The observability dashboard "ACM Cluster Overview" is empty when a hub cluster is imported as a managed hub cluster. The reason is that self-management is disabled, so the metrics collector does not run in local-cluster, and the data is lost.

Three potential solutions:

- Because the data relates to the hub cluster itself rather than to local-cluster, we should still collect it even if self-management is disabled. Run the metrics collector instance in the open-cluster-management-observability namespace, triggered by the MCO CR.
- Import the hub cluster in hosted mode. We only support this option, and we cannot enable other add-ons in the global hub. Only the registration and work agents are installed for the managed hub clusters, and they can be installed into a different namespace. In this case, self-management does not need to be disabled. It disables the capability to use policy to deploy ACM.
- Enable observability in the global hub cluster so that the related metrics can be collected in the global hub Thanos. We can then add the ACM Cluster Overview dashboard to the global hub Grafana (this may have a permission issue).

Definition of Done for Engineering Story Owner (Checklist)

Development Complete

The code is complete.
Functionality is working.
Any required downstream Docker file changes are made.

Tests Automated

[ ] Unit/function tests have been automated and incorporated into the build.
[ ] 100% automated unit/function test coverage for new or changed APIs.

Secure Design

[ ] Security has been assessed and incorporated into your threat model.

Multidisciplinary Teams Readiness

[ ] Create an informative documentation issue using the [Customer Portal_doc_issue template](https://github.com/stolostron/backlog/issues/new?assignees=&labels=squad%3Adoc&template=doc_issue.md&title=), and ensure doc acceptance criteria is met. Link the development issue to the doc issue.
[ ] Provide input to the QE team, and ensure QE acceptance criteria (established between story owner and QE focal) are met.

Support Readiness

[ ] The must-gather script has been updated.

depends on

ACM-10122 Observability is not available in the Global hub

Closed

ACM-10032 [doc] ACM Hub Metrics Collection

Closed

impacts account

ACM-10212 [QE Automation] --- Fix automation failures caused by ACM-8509

Closed

links to

openshift/release#51466: Update ACM release version for MCO and add managed cluster for e2e

Assignee: Subbarao Meduri

Reporter: Chunlin Yang

QA Contact: Xiang Yin

Votes: 2

Watchers: 21

Created: 2023/11/07 12:11 PM

Updated: 2024/05/28 2:14 AM

Resolved: 2024/05/28 2:14 AM
