Distributed Tracing / TRACING-4238

For larger traces, the Jaeger UI doesn't show the traces.


    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • Fix Version: rhosdt-3.4
    • Component: Jaeger
    • Sprint: Tracing Sprint # 260
    • Severity: Moderate

      Description of problem:

      The customer is not able to see larger traces in the Jaeger UI; the following error appears:
      There was an error querying for traces:
      HTTP Error: 502 - Bad Gateway
      Status    502
      Status text    Bad Gateway
      URL    /api/traces
      [1] The customer has a namespace "cdi-performance" where large traces are observed; querying those traces causes the timeout error.
      [2] The customer has also tried to fetch the response via curl, but even that results in a "502 Bad Gateway" error (a sketch for bypassing the route follows item [9]):
      curl 'https://jaeger-streaming-tracing-system.apps.1fs-cdi-dev.resbank.co.za/api/traces?end=1712936347443000&limit=20&lookback=1h&maxDuration&minDuration&service=kafka-bridge-performance&start=1712932747443000' \
        -H 'sec-ch-ua: "Google Chrome";v="123", "Not:A-Brand";v="8", "Chromium";v="123"' \
        -H 'Referer: https://jaeger-streaming-tracing-system.apps.1fs-cdi-dev.resbank.co.za/search?end=1712936334695000&limit=20&lookback=5m&maxDuration&minDuration&service=kafka-bridge-performance&start=1712936034695000' \
        -H 'sec-ch-ua-mobile: ?0' \
        -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36' \
        -H 'sec-ch-ua-platform: "Windows"'
      Result: 502 Bad Gateway
      [3] The customer is using standalone Jaeger (without Service Mesh) in the tracing-system namespace.
      [4] Elasticsearch resources (CPU and memory) do not seem to be the problem; we have verified that CPU and memory usage do not peak while querying from the UI.
      [5] The following pods are deployed in the tracing-system namespace:
      $ omc get po
      NAME                                                              READY   STATUS      RESTARTS   AGE
      elasticsearch-cdm-tracingsystemjaegerstreaming-1-6cdd9f4dcmvf2g   2/2     Running     0          7d
      elasticsearch-cdm-tracingsystemjaegerstreaming-2-59d766446b5bmv   2/2     Running     0          7d
      elasticsearch-cdm-tracingsystemjaegerstreaming-3-9854894d7k5qhx   2/2     Running     0          7d
      jaeger-streaming-collector-6b684b7c46-qfn2t                       1/1     Running     0          4d
      jaeger-streaming-collector-6b684b7c46-rs6n2                       1/1     Running     0          4d
      jaeger-streaming-collector-6b684b7c46-vrnfm                       1/1     Running     1          4d
      jaeger-streaming-es-index-cleaner-28545115-s7xvs                  0/1     Completed   0          2d
      jaeger-streaming-es-index-cleaner-28546555-pstdc                  0/1     Completed   0          1d
      jaeger-streaming-es-index-cleaner-28547995-gw5cf                  0/1     Completed   0          15h
      jaeger-streaming-ingester-7db9cdd85-drzgs                         1/1     Running     0          4d
      jaeger-streaming-ingester-7db9cdd85-qxpsl                         1/1     Running     0          4d
      jaeger-streaming-ingester-7db9cdd85-zmq52                         1/1     Running     0          4d
      jaeger-streaming-query-5d7bf4bc4f-d6t25                           3/3     Running     0          4d
      jaeger-streaming-query-5d7bf4bc4f-k9z6c                           3/3     Running     0          4d
      jaeger-streaming-query-5d7bf4bc4f-mwjml                           3/3     Running     0          4d
      [6] The following log is generated in the jaeger-query container of the jaeger-streaming-query pods when querying the Jaeger UI for traces:
      $ oc logs jaeger-streaming-query-5d7bf4bc4f-d6t25 -c jaeger-query
      2024-04-08T18:47:18.045992632Z {"level":"error","ts":1712602038.0457873,"caller":"app/http_handler.go:492","msg":"HTTP handler, Internal Server Error","error":"Get \"https://elasticsearch.tracing-system.svc.cluster.local:9200/_msearch\": context canceled","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleError\n\t/remote-source/jaeger/app/cmd/query/app/http_handler.go:492\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).search\n\t/remote-source/jaeger/app/cmd/query/app/http_handler.go:241\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2136\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleFunc.traceResponseHandler.func2\n\t/remote-source/jaeger/app/cmd/query/app/http_handler.go:536\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2136\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleFunc.WithRouteTag.func3\n\t/remote-source/jaeger/deps/gomod/pkg/mod/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.46.1/handler.go:285\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2136\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP\n\t/remote-source/jaeger/deps/gomod/pkg/mod/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.46.1/handler.go:229\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1\n\t/remote-source/jaeger/deps/gomod/pkg/mod/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.46.1/handler.go:81\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2136\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2136\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\t/remote-source/jaeger/deps/gomod/pkg/mod/github.com/gorilla/mux@v1.8.1/mux.go:212\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.additionalHeadersHandler.func4\n\t/remote-source/jaeger/app/cmd/query/app/additional_headers_handler.go:28\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2136\ngithub.com/jaegertracing/jaeger/cmd/query/app.createHTTPServer.CompressHandler.CompressHandlerLevel.func6\n\t/remote-source/jaeger/deps/gomod/pkg/mod/github.com/gorilla/handlers@v1.5.1/compress.go:141\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2136\ngithub.com/gorilla/handlers.recoveryHandler.ServeHTTP\n\t/remote-source/jaeger/deps/gomod/pkg/mod/github.com/gorilla/handlers@v1.5.1/recovery.go:78\nnet/http.serverHandler.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/lib/golang/src/net/http/server.go:2009"}
      [7] This issue happens specifically for the "cdi-performance" namespace.
      [8] We have also performed basic checks on Elasticsearch, such as the indices and shard health of the ES cluster; everything seems fine and does not point to any problem.
      [9] The ES logs constantly show the following messages (see the heap check sketch after this list):
      2024-04-09T04:14:59.827398673Z [2024-04-09T04:14:59,827][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-cdm-tracingsystemjaegerstreaming-3] [gc][312377] overhead, spent [362ms] collecting in the last [1s]
      2024-04-10T08:51:13.397458816Z [2024-04-10T08:51:13,396][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-cdm-tracingsystemjaegerstreaming-3] [gc][415273] overhead, spent [391ms] collecting in the last [1.2s]
      2024-04-10T10:34:06.305059922Z [2024-04-10T10:34:06,304][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-cdm-tracingsystemjaegerstreaming-3] [gc][421443] overhead, spent [293ms] collecting in the last [1s] 
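
      Referenced from [2]: to narrow down whether the 502 is produced by the OpenShift route / oauth-proxy in front of jaeger-query or by the query service itself, one option is to bypass the route and hit the query container's HTTP port directly. A minimal diagnostic sketch, assuming the standard jaeger-query port 16686 and the pod names listed in [5] (adjust the pod name and query parameters to the failing search):

      # Port-forward straight to one of the query pods, bypassing the route and the oauth-proxy sidecar.
      $ oc port-forward -n tracing-system pod/jaeger-streaming-query-5d7bf4bc4f-d6t25 16686:16686 &

      # Re-run the same service and time window as the curl in [2] against the local port;
      # -w prints only the status code and the total request time.
      $ curl -s -o /dev/null -w 'HTTP %{http_code} in %{time_total}s\n' \
        'http://localhost:16686/api/traces?service=kafka-bridge-performance&limit=20&start=1712932747443000&end=1712936347443000'

      If the direct query succeeds (or only fails after noticeably longer than the request through the route), that points at a proxy or router timeout rather than at jaeger-query or Elasticsearch.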
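
      Referenced from [4] and [9]: the JvmGcMonitorService messages only report GC overhead, not whether the nodes are actually short on heap. A rough sketch for checking heap pressure while a large query runs, assuming the es_util helper shipped in the OpenShift Elasticsearch image is available in the elasticsearch container (pod names as in [5]):

      # Live CPU/memory of the pods in the namespace while the slow query is running.
      $ oc adm top pods -n tracing-system

      # Heap and load per ES node; es_util wraps curl with the cluster's client certificates.
      $ oc exec -n tracing-system -c elasticsearch \
          elasticsearch-cdm-tracingsystemjaegerstreaming-1-6cdd9f4dcmvf2g -- \
          es_util --query="_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m"

      Sustained heap.percent values near the GC ceiling would explain slow _msearch responses even when the pod-level CPU/memory checks in [4] look fine.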

       

      Version-Release number of selected component (if applicable):

      4.14.20    

      How reproducible:

          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:
      Discussion in #forum-ocp-tracing Slack channel: https://redhat-internal.slack.com/archives/C013N9P9R6F/p1714578959835839 
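
      Since the jaeger-query log in [6] shows the /_msearch call to Elasticsearch being cancelled ("context canceled") while the client sees a 502, a timeout somewhere in the query path that is shorter than the time needed to assemble a very large trace is a plausible contributing factor. As a hedged mitigation sketch only (not the confirmed fix for this issue), the route timeout can be raised with the standard haproxy.router.openshift.io/timeout annotation and the Elasticsearch client timeout with the es.timeout storage option in the Jaeger CR; the route and CR names below are assumed from the jaeger-streaming instance in tracing-system:

      # Let the OpenShift router wait longer for jaeger-query before giving up on the request.
      $ oc annotate route jaeger-streaming -n tracing-system \
          haproxy.router.openshift.io/timeout=120s --overwrite

      # Raise the Elasticsearch client timeout used by the query service (example value).
      $ oc patch jaeger jaeger-streaming -n tracing-system --type=merge \
          -p '{"spec":{"storage":{"options":{"es.timeout":"120s"}}}}'

      Note that the operator may reconcile manual route changes, so the annotation is at best a temporary diagnostic tweak.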

              Pavol Loffay (ploffay@redhat.com)
              Ankit Mahajan (rhn-support-ankimaha)