Red Hat Advanced Cluster Management / ACM-7661

Observability - thanos-store-shard is not running after MCO deployment on BM with HCP

    • Important

      Description of problem: Because the bare-metal (BM) cluster has no available PVs, I customized some components to run with 1 replica so that each creates only 1 PVC. However, thanos-store-shard is not running after the MCO deployment on this BM cluster with HCP.

      I think setting the replicas to 1 is reasonable for testing, so is this a genuine product issue or is it caused by the environment?
      ```

      observability-thanos-store-shard-0-0                       0/1     Running   2 (65m ago)

      ```

      The pod log contains errors such as the following:
      ```

      ts=2023-09-25T11:17:29.378170112Z caller=memcached_client.go:622 level=error name=index-cache msg="failed to resolve addresses for memcached" addresses=dnssrv+_client._tcp.observability-thanos-store-memcached.open-cluster-management-observability.svc err="lookup IP addresses \"_client._tcp.observability-thanos-store-memcached.open-cluster-management-observability.svc\": could not resolve \"observability-thanos-store-memcached-0.observability-thanos-store-memcached.open-cluster-management-observability.svc.cluster.local.\": all servers responded with errors to at least one search domain. Errs ;could not resolve observability-thanos-store-memcached-0.observability-thanos-store-memcached.open-cluster-management-observability.svc.cluster.local.: no servers returned a viable answer. Errs ;resolution against server 172.30.0.10 for observability-thanos-store-memcached-0.observability-thanos-store-memcached.open-cluster-management-observability.svc.cluster.local.: exchange: read udp 10.129.0.120:37845->172.30.0.10:53: i/o timeout"

      ```
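
The error above is long, so here is a small Python sketch that pulls out the parts that matter: the host being resolved, the DNS server queried, and the UDP exchange that timed out. It is only a reading aid, not part of the product; the function name `summarize_dns_timeout` and the regular expression are my own assumptions about the message layout, based on this single log line. Read this way, the failure looks like a timeout talking to the DNS server at 172.30.0.10:53, which may point at pod-to-DNS networking rather than at the replica setting, but that is not confirmed.

```python
import re

# Assumed layout of the tail of the error, derived from the one log line above:
# "resolution against server <ip> for <host>.: exchange: read udp <src>-><dst>: i/o timeout"
DNS_TIMEOUT_RE = re.compile(
    r"resolution against server (?P<server>[\d.]+) for (?P<host>\S+?)\.?: "
    r"exchange: read udp (?P<src>[\d.]+:\d+)->(?P<dst>[\d.]+:\d+): i/o timeout"
)


def summarize_dns_timeout(log_line):
    """Return the DNS timeout fields found in a log line, or None if absent."""
    match = DNS_TIMEOUT_RE.search(log_line)
    if match is None:
        return None
    return match.groupdict()


# Tail of the error message reported by the store-shard pod.
sample = (
    "resolution against server 172.30.0.10 for "
    "observability-thanos-store-memcached-0.observability-thanos-store-memcached."
    "open-cluster-management-observability.svc.cluster.local.: "
    "exchange: read udp 10.129.0.120:37845->172.30.0.10:53: i/o timeout"
)

info = summarize_dns_timeout(sample)
for key, value in info.items():
    print(f"{key}: {value}")
```

The full log line can be passed in unchanged, because the pattern is searched for anywhere in the string.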

      Version-Release number of selected component (if applicable): 

      How reproducible:

      Steps to Reproduce:

      1. Create the MCO CR with some components customized to 1 replica, such as alertmanager, store, receive, and rule.
      2. The store-shard pod is not running.
      3. The pod reports the errors shown above.
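
For reference, step 1 can be sketched roughly as below. This is not the exact CR used in this report. It assumes the `observability.open-cluster-management.io/v1beta2` API with per-component `spec.advanced.<component>.replicas` fields; the CR name `observability`, the object storage secret name and key, and the helper `build_mco_cr` are placeholders for illustration.

```python
import json


def build_mco_cr(components, replicas=1):
    """Build a MultiClusterObservability CR as a dict with reduced replicas."""
    # One {"replicas": N} entry per customized component under spec.advanced.
    advanced = {name: {"replicas": replicas} for name in components}
    return {
        "apiVersion": "observability.open-cluster-management.io/v1beta2",
        "kind": "MultiClusterObservability",
        "metadata": {"name": "observability"},  # placeholder name
        "spec": {
            "observabilityAddonSpec": {},
            "storageConfig": {
                # Placeholder secret reference; adjust to the real environment.
                "metricObjectStorage": {
                    "name": "thanos-object-storage",
                    "key": "thanos.yaml",
                }
            },
            "advanced": advanced,
        },
    }


# Components set to 1 replica in this report.
cr = build_mco_cr(["alertmanager", "store", "receive", "rule"])
print(json.dumps(cr, indent=2))
```

The printed JSON could then be applied with something like `oc apply -f -`, assuming the object storage secret already exists in the observability namespace.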

      Actual results:

      Expected results:

      Additional info:

              smeduri1@redhat.com Subbarao Meduri
              cquredhat ChangLiang Qu
              Xiang Yin Xiang Yin
              ACM QE Team