  1. Red Hat Advanced Cluster Management
  2. ACM-9681

Known issue: ACM Grafana does not show data after a hub restore procedure


    • Known Issue
    • Proposed

      Create an informative issue (See each section, incomplete templates/issues won't be triaged)

      Using the current documentation as a model, please complete the issue template. 

      Note: Doc team updates the current version and the two previous versions (n-2). For earlier versions, we will address only high-priority, customer-reported issues for releases in support.

      Prerequisite: Start with what we have

      Always look at the current documentation to describe the change that is needed. Use the source or portal link for Step 4:

       - Use the Customer Portal: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes

       - Use the GitHub link to find the staged docs in the repository: https://github.com/stolostron/rhacm-docs 

      Describe the changes in the doc and link to your dev story

      Provide info for the following steps:

      1. - [x] Mandatory Add the required version to the Fix version/s field.
      All versions in the field

      2. - [ ] Mandatory Choose the type of documentation change.

            - [ ] New topic in an existing section or new section
            - [x] Update to an existing topic

      3. - [ ] Mandatory for GA content:
             - [ ] Add steps and/or other important conceptual information here: 
             - [x] Add Required access level for the user to complete the task here:
             ACM Administrator

             - [ ] Add verification at the end of the task, how does the user verify success (a command to run or a result to see?)
             - [x] Add link to dev story here:

      4. - [ ] Mandatory for bugs: What is the diff? Clearly define what the problem is, what the change is, and link to the current documentation:

      The Observatorium API gateway pods in a restored hub cluster might have stale tenant data after a backup and restore procedure. This is due to a Kubernetes limitation; see https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#mounted-configmaps-are-updated-automatically

      As a result, the Observatorium API gateway and Thanos receive reject metrics from the collectors, and the ACM Grafana dashboards do not show data.

      This is evidenced by errors like the following in the Observatorium API gateway pod logs:
      level=error name=observatorium caller=logchannel.go:129 msg="failed to forward metrics" returncode="500 Internal Server Error" response="no matching hashring to handle tenant\n" url=http://observability-thanos-receive.open-cluster-management-observability.svc.cluster.local:19291/api/v1/receive

      The Thanos receive pod logs contain errors that look like this:
      caller=handler.go:551 level=error component=receive component=receive-handler tenant=xxxx err="no matching hashring to handle tenant" msg="internal server error"
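
      Both log excerpts above share the message "no matching hashring to handle tenant". As a minimal sketch, a saved log dump can be scanned for that message to confirm the symptom. The function name and the sample variable are hypothetical and used only for illustration; the sample line is copied from the Thanos receive excerpt above.

```python
# Error text that appears in both the Observatorium API gateway and
# Thanos receive log excerpts quoted in this issue.
STALE_TENANT_MARKER = "no matching hashring to handle tenant"


def count_stale_tenant_errors(log_text: str) -> int:
    """Count the log lines that contain the stale-tenant error message."""
    return sum(1 for line in log_text.splitlines() if STALE_TENANT_MARKER in line)


# Sample line copied from the Thanos receive excerpt in this issue.
SAMPLE_LOG = (
    'caller=handler.go:551 level=error component=receive '
    'component=receive-handler tenant=xxxx '
    'err="no matching hashring to handle tenant" msg="internal server error"'
)
```

      A nonzero count for logs collected after the restore suggests the pods are affected by this issue.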

      To resolve the problem, complete the following steps:
      1. Scale the Observatorium API gateway deployment down to zero replicas.
      2. Scale it back up to 2 replicas (or N, if you use a custom deployment).
      This restarts all Observatorium API gateway pods with the correct tenant information. The data from the collectors starts showing up in Grafana within 5-10 minutes.
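
      The two scaling steps above can be sketched as a small script. This is an illustration under stated assumptions, not the documented procedure: the deployment name `observability-observatorium-api` is an assumption (this issue does not name the deployment), the namespace is taken from the service URL in the log excerpt, and the helper names are hypothetical. It also assumes the `oc` CLI is installed and logged in to the restored hub cluster.

```python
import subprocess

# Assumption: the deployment name is not stated in this issue; verify it
# on the hub cluster before running anything.
DEPLOYMENT = "observability-observatorium-api"
# Namespace taken from the service URL in the log excerpt in this issue.
NAMESPACE = "open-cluster-management-observability"


def build_scale_commands(replicas=2, deployment=DEPLOYMENT,
                         namespace=NAMESPACE, cli="oc"):
    """Return two CLI invocations: scale to zero, then back to `replicas`."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [
        [cli, "scale", f"deployment/{deployment}",
         "-n", namespace, f"--replicas={count}"]
        for count in (0, replicas)
    ]


def restart_gateway(replicas=2):
    """Run both scale commands in order, stopping at the first failure."""
    for command in build_scale_commands(replicas):
        subprocess.run(command, check=True)
```

      Calling `restart_gateway()` uses the default of 2 replicas; pass a different value for a custom deployment. `build_scale_commands()` only builds the argument lists, so the commands can be reviewed before anything is run against the cluster.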

            bswope@redhat.com Brandi Swope
            smeduri1@redhat.com Subbarao Meduri
            Xiang Yin Xiang Yin
