Type: Task
Resolution: Done
Priority: Undefined
Fix Version/s: ACM 2.9.0, ACM 2.8.0, ACM 2.7.0
Affects Version/s: ACM 2.9.3, ACM 2.8.5, ACM 2.7.12
Component/s: Documentation, Observability
Labels:
- doc-ack

Blocked: False
Blocked Reason: None
Ready: False
Release Note Type: Known Issue
Release Note Status: Proposed
Regression: No
Create an informative issue (See each section, incomplete templates/issues won't be triaged)

Using the current documentation as a model, please complete the issue template.

Note: Doc team updates the current version and the two previous versions (n-2). For earlier versions, we will address only high-priority, customer-reported issues for releases in support.

Prerequisite: Start with what we have

Always look at the current documentation to describe the change that is needed. Use the source or portal link for Step 4:

- Use the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes

- Use the GitHub link to find the staged docs in the repository: https://github.com/stolostron/rhacm-docs

Describe the changes in the doc and link to your dev story

Provide info for the following steps:

1. - [x] Mandatory Add the required version to the Fix version/s field.
All versions in the field

2. - [ ] Mandatory Choose the type of documentation change.

- [ ] New topic in an existing section or new section
- [x] Update to an existing topic

3. - [ ] Mandatory for GA content:

- [ ] Add steps and/or other important conceptual information here:

- [x] Add Required access level for the user to complete the task here:
ACM Administrator

- [ ] Add verification at the end of the task, how does the user verify success (a command to run or a result to see?)

- [x] Add link to dev story here:
https://issues.redhat.com/browse/ACM-8543

4. - [ ] Mandatory for bugs: What is the diff? Clearly define what the problem is, what the change is, and link to the current documentation:

After a backup and restore procedure, the Observatorium API gateway pods in a restored hub cluster might have stale tenant data. This is due to a Kubernetes limitation described at https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#mounted-configmaps-are-updated-automatically

As a result, the Observatorium API and Thanos gateway reject metrics from the collectors, and the ACM Grafana dashboards do not show data.

The problem is evidenced by errors like the following in the Observatorium API gateway pod logs:
level=error name=observatorium caller=logchannel.go:129 msg="failed to forward metrics" returncode="500 Internal Server Error" response="no matching hashring to handle tenant\n" url=http://observability-thanos-receive.open-cluster-management-observability.svc.cluster.local:19291/api/v1/receive

The Thanos receive pod logs contain errors like the following:
caller=handler.go:551 level=error component=receive component=receive-handler tenant=xxxx err="no matching hashring to handle tenant" msg="internal server error"
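
To check captured pod logs for this symptom, a small filter along the following lines might help. This is only a sketch: the function name `find_hashring_errors` is hypothetical, and it assumes the log text has already been saved or piped in from the Observatorium API gateway or Thanos receive pods.

```python
# Message that appears in both the Observatorium API gateway and
# Thanos receive pod logs when tenant data is stale.
HASHRING_ERROR = "no matching hashring to handle tenant"


def find_hashring_errors(log_text):
    """Return the log lines that contain the stale-tenant error message.

    Hypothetical helper for illustration; it only does a substring match.
    """
    return [line for line in log_text.splitlines() if HASHRING_ERROR in line]
```

If the returned list is not empty for recent log output, the restored hub is likely affected by this issue.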
To resolve the problem, complete the following steps:
1. Scale the Observatorium API gateway deployment down to zero replicas.
2. Scale it back up to 2 replicas (or to N, if you use a custom replica count).
This restarts all Observatorium API gateway pods with the correct tenant information. Data from the collectors starts to appear in Grafana within 5-10 minutes.
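
The two scaling steps could be scripted roughly as follows. This is a minimal sketch, not a verified procedure: the namespace is taken from the service URL in the log line above, while the deployment name `observability-observatorium-api` is an assumption that should be confirmed with `oc get deployments` in that namespace. The helper names `scale_command` and `restart_gateway` are hypothetical.

```python
import subprocess

# Namespace as it appears in the service URL in the gateway log line.
NAMESPACE = "open-cluster-management-observability"
# Assumed deployment name; confirm it on the hub cluster before use.
DEPLOYMENT = "observability-observatorium-api"


def scale_command(replicas, deployment=DEPLOYMENT, namespace=NAMESPACE):
    """Build the `oc scale` command for the given replica count."""
    return ["oc", "scale", "deployment", deployment,
            "-n", namespace, f"--replicas={replicas}"]


def restart_gateway(replicas=2, runner=subprocess.run):
    """Scale the gateway down to zero, then back up to `replicas`."""
    for count in (0, replicas):
        # check=True raises an error if the oc command fails.
        runner(scale_command(count), check=True)
```

Calling `restart_gateway()` requires a logged-in `oc` session with ACM Administrator access; pass a different `replicas` value for a custom deployment.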

Issue Links:
is documented by ACM-8543 [RDR] ACM Observability doesn't work on passive hub post hub recovery (Closed)

Assignee: Brandi Swope
Reporter: Subbarao Meduri
QA Contact: Xiang Yin
Votes: 0
Watchers: 3
Created: 2024/01/29 1:38 PM
Updated: 2024/02/09 10:05 PM
Resolved: 2024/02/09 10:05 PM
