Red Hat Advanced Cluster Management / ACM-8335

Observability - store pod failed to run on SNO on KVM/Bare-Metal



      Description of problem: This issue is similar to https://issues.redhat.com/browse/ACM-7661, but the pod error is not the same.

      ```

      ts=2023-10-25T07:27:37.975637218Z caller=store.go:418 level=info msg="initializing bucket store"
      ts=2023-10-25T07:28:07.980618437Z caller=intrumentation.go:67 level=warn msg="changing probe status" status=not-ready reason="bucket store initial sync: sync block: BaseFetcher: iter bucket: Get \"http://obs-acm29.s3.dualstack.us-east-1.amazonaws.com/?delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=\": dial tcp 52.217.84.48:80: i/o timeout"
      ts=2023-10-25T07:28:07.980686207Z caller=http.go:91 level=info service=http/server component=store msg="internal server is shutting down" err="bucket store initial sync: sync block: BaseFetcher: iter bucket: Get \"http://obs-acm29.s3.dualstack.us-east-1.amazonaws.com/?delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=\": dial tcp 52.217.84.48:80: i/o timeout"
      ts=2023-10-25T07:28:07.980765223Z caller=intrumentation.go:56 level=info msg="changing probe status" status=ready
      ts=2023-10-25T07:28:07.980826009Z caller=grpc.go:131 level=info service=gRPC/server component=store msg="listening for serving gRPC" address=0.0.0.0:10901
      ts=2023-10-25T07:28:07.9809763Z caller=http.go:110 level=info service=http/server component=store msg="internal server is shutdown gracefully" err="bucket store initial sync: sync block: BaseFetcher: iter bucket: Get \"http://obs-acm29.s3.dualstack.us-east-1.amazonaws.com/?delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=\": dial tcp 52.217.84.48:80: i/o timeout"
      ts=2023-10-25T07:28:07.981016962Z caller=intrumentation.go:81 level=info msg="changing probe status" status=not-healthy reason="bucket store initial sync: sync block: BaseFetcher: iter bucket: Get \"http://obs-acm29.s3.dualstack.us-east-1.amazonaws.com/?delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=\": dial tcp 52.217.84.48:80: i/o timeout"
      ts=2023-10-25T07:28:07.981151498Z caller=intrumentation.go:67 level=warn msg="changing probe status" status=not-ready reason="bucket store initial sync: sync block: BaseFetcher: iter bucket: Get \"http://obs-acm29.s3.dualstack.us-east-1.amazonaws.com/?delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=\": dial tcp 52.217.84.48:80: i/o timeout"
      ts=2023-10-25T07:28:07.981182126Z caller=grpc.go:138 level=info service=gRPC/server component=store msg="internal server is shutting down" err="bucket store initial sync: sync block: BaseFetcher: iter bucket: Get \"http://obs-acm29.s3.dualstack.us-east-1.amazonaws.com/?delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=\": dial tcp 52.217.84.48:80: i/o timeout"
      ts=2023-10-25T07:28:07.98120577Z caller=grpc.go:151 level=info service=gRPC/server component=store msg="gracefully stopping internal server"
      ts=2023-10-25T07:28:07.98125267Z caller=grpc.go:164 level=info service=gRPC/server component=store msg="internal server is shutdown gracefully" err="bucket store initial sync: sync block: BaseFetcher: iter bucket: Get \"http://obs-acm29.s3.dualstack.us-east-1.amazonaws.com/?delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=\": dial tcp 52.217.84.48:80: i/o timeout"
      ts=2023-10-25T07:28:07.981406558Z caller=main.go:161 level=error err="Get \"http://obs-acm29.s3.dualstack.us-east-1.amazonaws.com/?delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=\": dial tcp 52.217.84.48:80: i/o timeout\nBaseFetcher: iter bucket\ngithub.com/thanos-io/thanos/pkg/block.(*BaseFetcher).fetchMetadata\n\t/remote-source/thanos/app/pkg/block/fetcher.go:383\ngithub.com/thanos-io/thanos/pkg/block.(*BaseFetcher).fetch.func2\n\t/remote-source/thanos/app/pkg/block/fetcher.go:456\ngithub.com/golang/groupcache/singleflight.(*Group).Do\n\t/remote-source/thanos/deps/gomod/pkg/mod/github.com/golang/groupcache@v0.0.0-20210331224755-41bb18bfe9da/singleflight/singleflight.go:56\ngithub.com/thanos-io/thanos/pkg/block.(*BaseFetcher).fetch\n\t/remote-source/thanos/app/pkg/block/fetcher.go:454\ngithub.com/thanos-io/thanos/pkg/block.(*MetaFetcher).Fetch\n\t/remote-source/thanos/app/pkg/block/fetcher.go:514\ngithub.com/thanos-io/thanos/pkg/store.(*BucketStore).SyncBlocks\n\t/remote-source/thanos/app/pkg/store/bucket.go:561\ngithub.com/thanos-io/thanos/pkg/store.(*BucketStore).InitialSync\n\t/remote-source/thanos/app/pkg/store/bucket.go:630\nmain.runStore.func5.1\n\t/remote-source/thanos/app/cmd/thanos/store.go:427\ngithub.com/thanos-io/thanos/pkg/runutil.RetryWithLog\n\t/remote-source/thanos/app/pkg/runutil/runutil.go:97\ngithub.com/thanos-io/thanos/pkg/runutil.Retry\n\t/remote-source/thanos/app/pkg/runutil/runutil.go:87\nmain.runStore.func5\n\t/remote-source/thanos/app/cmd/thanos/store.go:426\ngithub.com/oklog/run.(*Group).Run.func1\n\t/remote-source/thanos/deps/gomod/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598\nsync block\ngithub.com/thanos-io/thanos/pkg/store.(*BucketStore).InitialSync\n\t/remote-source/thanos/app/pkg/store/bucket.go:631\nmain.runStore.func5.1\n\t/remote-source/thanos/app/cmd/thanos/store.go:427\ngithub.com/thanos-io/thanos/pkg/runutil.RetryWithLog\n\t/remote-source/thanos/app/pkg/runutil/runutil.go:97\ngithub.com/thanos-io/thanos/pkg/runutil.Retry\n\t/remote-source/thanos/app/pkg/runutil/runutil.go:87\nmain.runStore.func5\n\t/remote-source/thanos/app/cmd/thanos/store.go:426\ngithub.com/oklog/run.(*Group).Run.func1\n\t/remote-source/thanos/deps/gomod/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598\nbucket store initial sync\nmain.runStore.func5\n\t/remote-source/thanos/app/cmd/thanos/store.go:432\ngithub.com/oklog/run.(*Group).Run.func1\n\t/remote-source/thanos/deps/gomod/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598\nstore command failed\nmain.main\n\t/remote-source/thanos/app/cmd/thanos/main.go:161\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598"

      ```
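
      The error above shows the store container timing out while dialing the S3 endpoint over plain HTTP (port 80), so a first check is whether that endpoint is reachable from the cluster at all. A minimal sketch of such a check is below; the namespace and secret name are the ACM Observability defaults as I understand them (open-cluster-management-observability / thanos-object-storage), and the bucket host comes from the log above, so adjust for your environment:

      ```
      # Inspect the object storage configuration used by the store pod
      # (default ACM Observability namespace and secret name assumed).
      oc -n open-cluster-management-observability get secret thanos-object-storage \
        -o jsonpath='{.data.thanos\.yaml}' | base64 -d

      # Try to reach the bucket endpoint from inside the cluster; host and port
      # are taken from the error message in the store pod log.
      oc -n open-cluster-management-observability run s3-check --rm -it --restart=Never \
        --image=registry.access.redhat.com/ubi9/ubi -- \
        curl -sv --max-time 10 http://obs-acm29.s3.dualstack.us-east-1.amazonaws.com/
      ```

      If the curl also times out, the failure is network egress between the SNO node and S3 rather than anything specific to the store pod. It is also worth confirming in the decoded thanos.yaml whether the bucket is intentionally addressed over plain HTTP (the S3 `insecure` setting) instead of HTTPS.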

      Version-Release number of selected component (if applicable):

      How reproducible:

      Steps to Reproduce:

      1. Because there are not enough PVs on this SNO cluster, I customized each component to run with 1 replica, so each component creates only one PVC (see the configuration sketch after these steps):
        ```
        cqu-mac:bucket changliangqu$ oc get pvc
        NAME                                           STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
        alertmanager-db-observability-alertmanager-0   Bound    nfs-pv5    30Gi       RWO            standard       71m
        data-observability-thanos-compact-0            Bound    nfs-pv2    30Gi       RWO            standard       70m
        data-observability-thanos-receive-default-0    Bound    nfs-pv10   30Gi       RWO            standard       70m
        data-observability-thanos-rule-0               Bound    nfs-pv7    30Gi       RWO            standard       70m
        data-observability-thanos-store-shard-0-0      Bound    nfs-pv9    30Gi       RWO            standard       70m
        ```
      2. All pods are running except the store pod, which is in CrashLoopBackOff (see the diagnostic commands after these steps):
        ```
        observability-thanos-store-shard-0-0                       0/1     CrashLoopBackOff   15 (63s ago)   71m
        ```
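      For reference, the single-replica customization from step 1 can be expressed as a merge patch on the MultiClusterObservability resource. This is only a sketch: the per-component `replicas` fields under `spec.advanced` are assumed from the observability.open-cluster-management.io/v1beta2 schema, and the resource name `observability` is the usual default, not necessarily the exact CR used here:

      ```
      # Run one replica of each stateful Observability component (sketch; the
      # spec.advanced.<component>.replicas fields are assumed from the v1beta2 MCO schema).
      oc patch multiclusterobservability observability --type merge -p '{
        "spec": {
          "advanced": {
            "alertmanager": { "replicas": 1 },
            "receive":      { "replicas": 1 },
            "rule":         { "replicas": 1 },
            "store":        { "replicas": 1 }
          }
        }
      }'
      ```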
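      To see why only the store pod is failing (step 2), the usual next step is to look at its previous-container logs and events; a quick sketch using the default ACM Observability namespace and the pod name from the listing above:

      ```
      # Logs from the last crashed container of the store pod
      oc -n open-cluster-management-observability logs observability-thanos-store-shard-0-0 --previous

      # Restart reasons and events for the pod
      oc -n open-cluster-management-observability describe pod observability-thanos-store-shard-0-0

      # Recent events in the namespace, most recent last
      oc -n open-cluster-management-observability get events --sort-by=.lastTimestamp
      ```

      In this case the previous-container logs are the ones quoted in the description: the container exits because the bucket store initial sync cannot reach the S3 endpoint.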

      Actual results:

      Expected results:

      Additional info:
