-
Task
-
Resolution: Done
-
Undefined
-
ACM 2.9.3, ACM 2.8.5, ACM 2.7.12
-
False
-
None
-
False
-
Known Issue
-
Proposed
-
-
-
No
Create an informative issue (See each section, incomplete templates/issues won't be triaged)
Using the current documentation as a model, please complete the issue template.
Note: Doc team updates the current version and the two previous versions (n-2). For earlier versions, we will address only high-priority, customer-reported issues for releases in support.
Prerequisite: Start with what we have
Always look at the current documentation to describe the change that is needed. Use the source or portal link for Step 4:
- Use the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes
- Use the GitHub link to find the staged docs in the repository: https://github.com/stolostron/rhacm-docs
Describe the changes in the doc and link to your dev story
Provide info for the following steps:
1. - [x] Mandatory Add the required version to the Fix version/s field.
All versions in the field
2. - [ ] Mandatory Choose the type of documentation change.
- [ ] New topic in an existing section or new section
- [x] Update to an existing topic
3. - [ ] Mandatory for GA content:
- [ ] Add steps and/or other important conceptual information here:
- [x] Add Required access level for the user to complete the task here:
ACM Administrator
- [ ] Add verification at the end of the task, how does the user verify success (a command to run or a result to see?)
- [x] Add link to dev story here:
https://issues.redhat.com/browse/ACM-8543
4. - [ ] Mandatory for bugs: What is the diff? Clearly define what the problem is, what the change is, and link to the current documentation:
The Observatorium API gateway pods in a restored hub cluster might have stale tenant data after a backup and restore procedure. This is caused by a Kubernetes limitation described at https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#mounted-configmaps-are-updated-automatically
As a result, the Observatorium API and Thanos gateway reject metrics from the collectors, and the ACM Grafana dashboards do not show data.
The problem is evidenced by errors like the following in the Observatorium API gateway pod logs:
level=error name=observatorium caller=logchannel.go:129 msg="failed to forward metrics" returncode="500 Internal Server Error" response="no matching hashring to handle tenant\n" url=http://observability-thanos-receive.open-cluster-management-observability.svc.cluster.local:19291/api/v1/receive
The Thanos receive pod logs contain errors that look like the following:
caller=handler.go:551 level=error component=receive component=receive-handler tenant=xxxx err="no matching hashring to handle tenant" msg="internal server error"
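One way to check whether a hub is affected is to count occurrences of the error string shown above in the pod logs. The following is a minimal sketch, not a documented procedure: the helper name `count_hashring_errors` is hypothetical, and the deployment name in the usage comment is an assumption that does not appear in this issue.

```shell
# Hypothetical helper: reads log lines on stdin and prints how many contain
# the "no matching hashring" error quoted in this issue.
count_hashring_errors() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the helper from failing in that case.
  grep -c "no matching hashring to handle tenant" || true
}

# Usage (assumed deployment name; namespace taken from the log URL above):
#   oc logs deployment/observability-observatorium-api \
#     -n open-cluster-management-observability | count_hashring_errors
```

A non-zero count after a restore suggests the gateway pods still hold stale tenant data.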
To resolve the problem, complete the following steps:
1. Scale the Observatorium API gateway deployment down to zero replicas.
2. Scale the deployment back up to 2 replicas (or N, if you use a custom deployment).
This restarts all Observatorium API gateway pods with the correct tenant information, and data from the collectors starts to appear in Grafana within 5-10 minutes.
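The two steps above can be sketched as shell commands. This is a minimal sketch under stated assumptions: the deployment name `observability-observatorium-api` is an assumption not given in this issue, the namespace is taken from the service URL in the log excerpt, and the function name and the `KUBE_CLI`, `NAMESPACE`, `DEPLOYMENT`, and `REPLICAS` variables are illustrative. Verify the names on your hub before running anything.

```shell
# Illustrative defaults; override them in the environment if your hub differs.
KUBE_CLI="${KUBE_CLI:-oc}"    # or kubectl
NAMESPACE="${NAMESPACE:-open-cluster-management-observability}"
DEPLOYMENT="${DEPLOYMENT:-observability-observatorium-api}"   # assumed name
REPLICAS="${REPLICAS:-2}"     # use N for a custom deployment

restart_observatorium_api() {
  # Step 1: scale the gateway deployment down to zero replicas.
  "$KUBE_CLI" scale deployment "$DEPLOYMENT" -n "$NAMESPACE" --replicas=0
  # Step 2: scale it back up so new pods start with current tenant data.
  "$KUBE_CLI" scale deployment "$DEPLOYMENT" -n "$NAMESPACE" --replicas="$REPLICAS"
  # Wait until the new pods are rolled out.
  "$KUBE_CLI" rollout status deployment "$DEPLOYMENT" -n "$NAMESPACE"
}
```

Calling `restart_observatorium_api` as a user with the required access runs the three commands in order. Setting `KUBE_CLI=echo` first prints the commands instead of executing them, which is a cheap way to review them.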
- is documented by
-
ACM-8543 [RDR] ACM Observability doesn't work on passive hub post hub recovery
- Closed