Type: Bug
Resolution: Done
Priority: Major
Labels: netobserv-1.5-candidate
Sprint: NetObserv - Sprint 248
Severity: Important
NOTE: This is a different 429 error from the one described in NETOBSERV-975
Description of problem:
When running our large-scale PerfScale scenario with NetObserv 1.5, we are seeing a large number of dropped flows due to a Loki 429 error
Steps to Reproduce:
1. Deploy an OCP 4.14 cluster and scale it to 120 nodes.
2. Install NetObserv 1.5, the Loki Operator with a 1x.medium LokiStack, and the AMQ Streams Operator.
3. Run the cluster-density-v2 workload with a variable of 480 (see the sketch after this list).
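For reference, a minimal sketch of how step 3 is typically driven, assuming the kube-burner "ocp" wrapper is used and that "a variable of 480" maps to the workload's iterations parameter (the exact PerfScale invocation and flags are not recorded in this report):

  kube-burner ocp cluster-density-v2 --iterations=480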
Actual results:
Flows are dropped due to the following error (seen on various FLP pods):

  time=2024-01-23T19:31:21Z level=info component=client error=server returned HTTP status 429 Too Many Requests (429): Maximum active stream limit exceeded, reduce the number of active streams (reduce labels or reduce label values), or contact your Loki administrator to see if the limit can be increased, user: 'network' fields.level=warn fields.msg=error sending batch, will retry host=lokistack-gateway-http.netobserv.svc:8080 module=export/loki status=429
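This message means Loki's per-tenant active stream limit was hit, i.e. the label combinations written by FLP created more concurrently active streams than the 'network' tenant allows. As the error itself suggests, the options are to reduce label cardinality or raise the limit. A minimal workaround sketch, assuming the LokiStack is named "lokistack" in the "netobserv" namespace (consistent with the gateway host in the log above) and that the Loki Operator exposes maxGlobalStreamsPerTenant under spec.limits.global.ingestion; the value 25000 is illustrative only, not the fix adopted for this bug:

  oc patch lokistack lokistack -n netobserv --type=merge \
    -p '{"spec":{"limits":{"global":{"ingestion":{"maxGlobalStreamsPerTenant":25000}}}}}'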
Expected results:
No flows should be dropped
Additional Info:
This was seen in performance runs 0dc5303c-301d-4d1a-8c4c-0d7ef100b5dc and 911c279c-5c58-49b9-82ac-a61508262c44. The environment details from the latter are below, along with an attached must-gather; additional data from those runs can be found here.
OCP: 4.14.0-0.nightly-2024-01-18-061723
NetObserv operator: v1.5.0
Loki: v5.8.2
eBPF-agent: v1.5.0-76
FLP: v1.5.0-76
ConsolePlugin: v1.5.0-76
must-gather: https://drive.google.com/file/d/1kTxe4dElC_FJ5ipL_QINngNaNag_IuRU/view?usp=drive_link