Distributed Tracing / TRACING-5416

504 gateway timeout while querying traces from Tempo

    • Bug
    • Resolution: Done
    • Undefined
    • None
    • rhosdt-3.5
    • Tempo
    • None
    • Tracing Sprint # 272 - Release
    • Important

      Description of problem:

      When traces from multiple applications are captured in a single TempoStack instance, querying those traces over a longer time range (for example, 6 hours) with a result limit of 20 or more traces returns a 504 Gateway Timeout.

      Version-Release number of selected component (if applicable):

      Tempo Operator 0.15.4-1 

      How reproducible:

      100%

      Steps to Reproduce:

      1. Install and configure a multi-tenant TempoStack.
      2. Set up an OpenTelemetryCollector to collect the traces.
      3. Create multiple namespaces (I created 25).
      4. Deploy Instrumentation and applications in the namespaces created above.
      5. Wait for some time and let TempoStack capture traces (for example, 3-4 hours).
      6. Query traces for one service over a time range of 6 hours or more; the query fails with a 504 Gateway Timeout.
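
The query in step 6 can be sketched as a URL builder. This is a minimal sketch, not the exact request used in the test: the tenant `dev` and the path `/api/traces/v1/dev/api/traces` are taken from the gateway log in this report, while the gateway host, the service name, and the Jaeger-style parameters (`service`, `start`, `end` in microseconds, `limit`) are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def build_trace_query_url(gateway, tenant, service, end, lookback_hours=6, limit=20):
    """Build a search URL for one service over a fixed lookback window.

    Assumption: the gateway exposes a Jaeger-style query API under
    /api/traces/v1/<tenant>/api/traces and expects start/end in microseconds.
    """
    start = end - timedelta(hours=lookback_hours)
    params = {
        "service": service,
        "start": int(start.timestamp()) * 1_000_000,  # whole seconds to microseconds
        "end": int(end.timestamp()) * 1_000_000,
        "limit": limit,
    }
    return f"{gateway}/api/traces/v1/{tenant}/api/traces?{urlencode(params)}"


# Hypothetical host and service name; only the tenant and path come from the report.
url = build_trace_query_url(
    gateway="https://tempo-gateway.example.com",
    tenant="dev",
    service="example-service",
    end=datetime(2025, 5, 29, 20, 0, 0, tzinfo=timezone.utc),
)
print(url)
```

Changing `lookback_hours` and `limit` makes it easy to find the time range at which the query starts to time out.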

      Actual results:

      A 504 Gateway Timeout is returned when querying traces over a longer time range with the result limit set to 20.

      Expected results:

      Tempo should return the traces without timing out.

      Additional info:

      The test was carried out with AWS S3 as object storage.
      The querier and query-frontend pods were configured with 2 vCPU and 5 Gi of memory.
      Neither the querier nor the query-frontend pods were OOMKilled.
      
      ---
      The gateway logs the following errors when the query fails:
      
      level=warn name=observatorium ts=2025-05-29T19:57:37.387386306Z caller=stdlib.go:105 caller=reverseproxy.go:661 msg="http: proxy error: context deadline exceeded"
      level=warn name=observatorium ts=2025-05-29T19:57:37.387439501Z caller=instrumentation.go:33 request=tempo-simplest-gateway-7d4c694d98-l4ft4/lKPOKWFDP3-000164 proto=HTTP/1.1 method=GET status=502 content= path=/api/traces/v1/dev/api/traces duration=30.000367897s bytes=0
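
The second log line above can be broken into fields to make the failure easier to read. The following is a rough sketch of a `key=value` parser, assuming the gateway emits logfmt-style lines like the ones quoted here; `parse_logfmt` is a hypothetical helper, not part of the gateway or Tempo. The extracted duration of about 30 seconds, together with "context deadline exceeded", is consistent with a deadline of roughly 30 seconds being hit somewhere in the request path.

```python
import re

# Matches key=value pairs; a value is either double-quoted or a run of non-spaces
# (possibly empty, as in "content=").
LOGFMT_PAIR = re.compile(r'(\w+)=("[^"]*"|\S*)')


def parse_logfmt(line):
    """Parse a logfmt-style line into a dict (a later duplicate key wins)."""
    fields = {}
    for key, value in LOGFMT_PAIR.findall(line):
        fields[key] = value.strip('"')
    return fields


# Log line copied from the gateway output in this report.
line = (
    "level=warn name=observatorium ts=2025-05-29T19:57:37.387439501Z "
    "caller=instrumentation.go:33 "
    "request=tempo-simplest-gateway-7d4c694d98-l4ft4/lKPOKWFDP3-000164 "
    "proto=HTTP/1.1 method=GET status=502 content= "
    "path=/api/traces/v1/dev/api/traces duration=30.000367897s bytes=0"
)

fields = parse_logfmt(line)
duration_s = float(fields["duration"].rstrip("s"))  # strip the trailing unit
print(fields["status"], fields["path"], duration_s)
```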
      
      ---
      The issue appears to be on either the query-frontend or the querier side.

              ploffay@redhat.com Pavol Loffay
              rhn-support-dgautam Dhruv Gautam