TRACING-3545

RHOSDT 2.9 Tempo 2.1.1 does not work on IBM Z (s390x) architecture

    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • rhosdt-3.4
    • Tracing Sprint # 242, Tracing Sprint # 260
    • Customer Escalated

      The Tempo ingester panics with the following error when the querier tries to read data from it (the "live" data):

       

       

      level=info ts=2023-08-30T16:32:31.563913106Z caller=main.go:215 msg="initialising OpenTracing tracer"
      level=info ts=2023-08-30T16:32:31.565248933Z caller=main.go:102 msg="Starting Tempo" version="(version=2.1.1, branch=HEAD, revision=4157d7620)"
      level=info ts=2023-08-30T16:32:31.574473679Z caller=server.go:334 http=[::]:3200 grpc=[::]:9095 msg="server listening on addresses"
      level=info ts=2023-08-30T16:32:31.575350273Z caller=memberlist_client.go:437 msg="Using memberlist cluster label and node name" cluster_label= node=tempo-simplest36-ingester-0-e7855eb6
      level=info ts=2023-08-30T16:32:31.575454147Z caller=module_service.go:82 msg=initialising module=internal-server
      level=info ts=2023-08-30T16:32:31.575501301Z caller=module_service.go:82 msg=initialising module=store
      level=info ts=2023-08-30T16:32:31.575529005Z caller=module_service.go:82 msg=initialising module=server
      level=info ts=2023-08-30T16:32:31.575616997Z caller=module_service.go:82 msg=initialising module=memberlist-kv
      level=info ts=2023-08-30T16:32:31.575637887Z caller=module_service.go:82 msg=initialising module=overrides
      level=info ts=2023-08-30T16:32:31.575692647Z caller=module_service.go:82 msg=initialising module=ingester
      level=info ts=2023-08-30T16:32:31.575713835Z caller=ingester.go:327 msg="beginning wal replay"
      level=warn ts=2023-08-30T16:32:31.575991304Z caller=wal.go:112 msg="unowned file entry ignored during wal replay" file=blocks err=null
      level=info ts=2023-08-30T16:32:31.576022441Z caller=ingester.go:365 msg="wal replay complete"
      level=info ts=2023-08-30T16:32:31.576395076Z caller=ingester.go:379 msg="reloading local blocks" tenants=0
      level=info ts=2023-08-30T16:32:31.576473854Z caller=app.go:188 msg="Tempo started"
      level=info ts=2023-08-30T16:32:31.576693103Z caller=memberlist_client.go:543 msg="memberlist fast-join starting" nodes_found=1 to_join=2
      level=info ts=2023-08-30T16:32:31.580934342Z caller=memberlist_client.go:563 msg="memberlist fast-join finished" joined_nodes=2 elapsed_time=4.245506ms
      level=info ts=2023-08-30T16:32:31.580970147Z caller=memberlist_client.go:576 msg="joining memberlist cluster" join_members=tempo-simplest36-gossip-ring
      level=info ts=2023-08-30T16:32:31.580996341Z caller=lifecycler.go:576 msg="instance not found in ring, adding with no tokens" ring=ingester
      level=info ts=2023-08-30T16:32:31.581205744Z caller=lifecycler.go:416 msg="auto-joining cluster after timeout" ring=ingester
      level=info ts=2023-08-30T16:32:31.583029353Z caller=memberlist_client.go:595 msg="joining memberlist cluster succeeded" reached_nodes=2 elapsed_time=2.05928ms
      panic: runtime error: index out of range [7] with length 7

      goroutine 1272 [running]:
      github.com/segmentio/parquet-go.(*byteArrayPage).index(...)
          /home/ploffay/projects/grafana/tempo/vendor/github.com/segmentio/parquet-go/page.go:982
      github.com/segmentio/parquet-go.(*byteArrayDictionary).lookupString(0xc0001f15e0, {0xc000ffe000, 0x33, 0x400}, {{0xc0011e0000, 0x33, 0x18}})
          /home/ploffay/projects/grafana/tempo/vendor/github.com/segmentio/parquet-go/dictionary_purego.go:42 +0x162
      github.com/segmentio/parquet-go.(*byteArrayDictionary).Lookup(0xc0001f15e0, {0xc000ffe000, 0x33, 0x400}, {0xc0011e0000, 0x33, 0x3e8})
          /home/ploffay/projects/grafana/tempo/vendor/github.com/segmentio/parquet-go/dictionary.go:748 +0x142
      github.com/segmentio/parquet-go.(*indexedPageValues).ReadValues(0xc0011321c0, {0xc0011e0000, 0x33, 0x3e8})
          /home/ploffay/projects/grafana/tempo/vendor/github.com/segmentio/parquet-go/dictionary.go:1332 +0xd6
      github.com/segmentio/parquet-go.(*repeatedPageValues).ReadValues(0xc000c8a060, {0xc0011e0000, 0x3e8, 0x3e8})
          /home/ploffay/projects/grafana/tempo/vendor/github.com/segmentio/parquet-go/page_values.go:98 +0x192
      github.com/grafana/tempo/pkg/parquetquery.(*ColumnIterator).iterate.func3.2(0xc0011dff48, 0xc0011dfec8, 0xc00021c9c0, {0xc0011e0000, 0x3e8, 0x3e8}, 0x3e8, {0x283dc58, 0xc0006261b0}, {0x2851920, ...})
          /home/ploffay/projects/grafana/tempo/pkg/parquetquery/iters.go:406 +0x26c
      github.com/grafana/tempo/pkg/parquetquery.(*ColumnIterator).iterate.func3(0xc00021c9c0, 0xc0011dff48, 0xc0011dfec8, {0xc0011e0000, 0x3e8, 0x3e8}, 0x3e8, {0x283dc58, 0xc0006261b0}, {0x2849b48, ...})
          /home/ploffay/projects/grafana/tempo/pkg/parquetquery/iters.go:459 +0x1e4
      github.com/grafana/tempo/pkg/parquetquery.(*ColumnIterator).iterate(0xc00021c9c0, {0x283dc58, 0xc0006261b0}, 0x3e8)
          /home/ploffay/projects/grafana/tempo/pkg/parquetquery/iters.go:464 +0x676
      github.com/grafana/tempo/pkg/parquetquery.NewColumnIterator.func1()
          /home/ploffay/projects/grafana/tempo/pkg/parquetquery/iters.go:299 +0x42
      created by github.com/grafana/tempo/pkg/parquetquery.(*ColumnIterator).next
          /home/ploffay/projects/grafana/tempo/pkg/parquetquery/iters.go:485 +0x60
       

       

      The following error appears in the querier:

      level=warn ts=2023-08-30T16:35:40.234230788Z caller=logging.go:111 traceID=0d4b5479201902d0 msg="GET /querier/api/search?end=1693413323&limit=20&maxDuration=0s&minDuration=0s&start=1693411540&tags=service.name%3Dmysql (500) 676.589µs Response: \"error querying ingesters in Querier.Search: failed to get response from ingesters: failed to execute f() for 10.120.3.23:9095: rpc error: code = Unavailable desc = connection error: desc = \\\"transport: Error while dialing dial tcp 10.120.3.23:9095: connect: connection refused\\\"\\n\" ws: false; Accept-Encoding: gzip; User-Agent: Go-http-client/1.1; X-Scope-Orgid: single-tenant; " 
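To re-issue the failing search while debugging, the request from the log line above can be rebuilt with a small helper. This is only a sketch: the helper name `buildSearchURL` and the base URL `http://localhost:3200` (for example a port-forward to the query frontend) are assumptions, and it targets the public `/api/search` path rather than the internal `/querier/api/search` path seen in the log. The query parameters are copied from the logged request.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildSearchURL assembles a Tempo search URL. The helper name and the
// base URL passed to it are hypothetical; the parameter names and values
// mirror the failing request in the querier log.
func buildSearchURL(base string, start, end int64, tags string, limit int) string {
	q := url.Values{}
	q.Set("start", strconv.FormatInt(start, 10))
	q.Set("end", strconv.FormatInt(end, 10))
	q.Set("tags", tags)
	q.Set("limit", strconv.Itoa(limit))
	// Encode sorts parameters by key and percent-encodes the values.
	return base + "/api/search?" + q.Encode()
}

func main() {
	// Assumed base URL; adjust to wherever the query frontend is reachable.
	u := buildSearchURL("http://localhost:3200", 1693411540, 1693413323, "service.name=mysql", 20)
	fmt.Println(u)
}
```

Fetching the printed URL (for instance with `curl`) shortly after spans have been ingested should exercise the ingester read path that panics.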

      The issue does not appear on other architectures (e.g. IBM P).

       

      The issue has been reported upstream: https://github.com/parquet-go/parquet-go/issues/56

       

      I tried rebuilding Tempo with -tags purego (a build tag used by the parquet-go library) and got the same result:

      # First add -tags purego in the Makefile, then build:
      GOOS=linux GOARCH=s390x make tempo
      docker buildx build --push --platform linux/s390x --tag pavolloffay/tempo-ibmz-go120:2.1.1 --build-arg=TARGETARCH=s390x -f ./cmd/tempo/Dockerfile .

      Tempo CR:

      kubectl apply -f - <<EOF
      apiVersion: tempo.grafana.com/v1alpha1
      kind: TempoStack
      metadata:
        name: simplest36
      spec:
        images:
          tempo: pavolloffay/tempo-ibmz-go120:2.1.1
        resources:
          total:
            limits:
              cpu: '2'
              memory: 2Gi
        search:
          defaultResultLimit: 20
          maxDuration: 0s
        managementState: Managed
        limits:
          global:
            ingestion: {}
            query:
              maxSearchDuration: 0s
        template:
          compactor: {}
          distributor:
            replicas: 1
          gateway:
            component: {}
            enabled: false
            ingress:
              route: {}
          ingester:
            replicas: 1
          querier: {}
          queryFrontend:
            component: {}
            jaegerQuery:
              enabled: true
              ingress:
                route:
                  termination: edge
                type: route
        replicationFactor: 1
        storage:
          secret:
            name: minio-test
            type: s3
        storageSize: 200M
        retention:
          global:
            traces: 48h0m0s
      EOF

       

        1. distributor-metrics.txt
          70 kB
          Pavol Loffay
        2. ingester-metrics.txt
          60 kB
          Pavol Loffay
        3. Screenshot of Jaeger UI (8).jpg
          247 kB
          Pavol Loffay

              ploffay@redhat.com Pavol Loffay