Type: Bug
Resolution: Done
Fix Version: rhosdt-3.5
Component: Quality / Stability / Reliability
Sprint: Tracing Sprint # 272 - Release
Severity: Important
Description of problem:
When traces from multiple applications are captured in a single TempoStack instance, querying those traces over a longer duration (say 6 hours) with a trace count limit of 20 or more results in a 504 Gateway Timeout.
Version-Release number of selected component (if applicable):
Tempo Operator 0.15.4-1
How reproducible:
100%
Steps to Reproduce:
1. Install and configure a multi-tenant TempoStack (see the sketch after these steps).
2. Set up an OpenTelemetryCollector to collect the traces.
3. Create multiple namespaces (I created 25).
4. Deploy Instrumentation and applications in the namespaces created above.
5. Wait for some time and let TempoStack capture traces (say 3-4 hours).
6. Query traces (for one service) over a duration of 6 or more hours; this results in a 504 Gateway Timeout.
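For reference, a minimal sketch of the kind of multi-tenant TempoStack used in this setup, assuming OpenShift tenancy mode with a single "dev" tenant (the stack name and tenant match the gateway logs in Additional info; the tenant ID and storage secret name are illustrative assumptions, not values from the actual test cluster):

apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: simplest
spec:
  storage:
    secret:
      name: object-storage        # S3 credentials secret (AWS S3, per Additional info)
      type: s3
  storageSize: 10Gi
  tenants:
    mode: openshift
    authentication:
      - tenantName: dev           # tenant visible in the gateway request path in the logs below
        tenantId: 1610b0c3-c509-4592-a256-a1871353dbfa   # illustrative UUID
  template:
    gateway:
      enabled: true               # serves /api/traces/v1/<tenant>/...
    queryFrontend:
      jaegerQuery:
        enabled: true             # Jaeger query API used for the failing trace searches

In openshift tenancy mode the tenant additionally needs cluster RBAC granting read/write access to traces; that part is omitted from the sketch.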
Actual results:
A 504 Gateway Timeout is returned when querying traces over a longer duration with the result limit set to 20.
Expected results:
Tempo should return the traces without timing out.
Additional info:
The test was carried out with AWS S3 as object storage. The querier and query-frontend pods were configured with 2 vCPU and 5 Gi of memory; neither pod was OOMKilled.

The gateway logs the following errors when the query fails:

level=warn name=observatorium ts=2025-05-29T19:57:37.387386306Z caller=stdlib.go:105 caller=reverseproxy.go:661 msg="http: proxy error: context deadline exceeded"
level=warn name=observatorium ts=2025-05-29T19:57:37.387439501Z caller=instrumentation.go:33 request=tempo-simplest-gateway-7d4c694d98-l4ft4/lKPOKWFDP3-000164 proto=HTTP/1.1 method=GET status=502 content= path=/api/traces/v1/dev/api/traces duration=30.000367897s bytes=0

The issue appears to be on the query-frontend or querier side.
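For completeness, a sketch of how the AWS S3 object storage is typically wired to the stack: a secret referenced from spec.storage.secret in the TempoStack, using the key names the Tempo Operator documents for an s3-type secret. All values below are placeholders, and the secret name is an assumption, not taken from the test environment.

apiVersion: v1
kind: Secret
metadata:
  name: object-storage              # referenced by spec.storage.secret.name in the TempoStack sketch
type: Opaque
stringData:
  bucket: tempo-traces-bucket                      # placeholder bucket name
  endpoint: https://s3.us-east-2.amazonaws.com     # placeholder endpoint/region
  access_key_id: AKIAXXXXXXXXXXXXXXXX              # placeholder
  access_key_secret: "<redacted>"                  # placeholder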
Relates to: TRACING-5247 Add network policies for RHOSDT Operators (Closed)