- Bug
- Resolution: Unresolved
Long-running clusters were created with the load specified in PR https://github.com/stackrox/stackrox/pull/15886. The berserker load was increased and the berserker namespaces were churned. Sensor began to OOM roughly once a day. The ACS version used in that test was from a commit between 4.8.2 and 4.8.3.

The test was then repeated with a more recent version of ACS, and Sensor OOMed even more frequently (about once every 8 hours). A third test was run with a recent build of master, with churn for the berserker namespaces but without the increased load; in that case Sensor memory usage was higher than in the tests without churn, but it stabilized.
In one of the tests, the Sensor deployment was edited with
ks set env deploy/sensor ROX_GRPC_MAX_MESSAGE_SIZE=201326592
This stopped the memory from continuously increasing. The problem seems to have been that some of the messages Sensor sent to Central were too large; when a message failed to send, Sensor retained the data.
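For reference, a hedged sketch of the same change in explicit kubectl form, assuming ks is a shell alias for kubectl -n stackrox (the stackrox namespace is an assumption based on a default install):
# Assumption: "ks" above is an alias for "kubectl -n stackrox".
kubectl -n stackrox set env deploy/sensor ROX_GRPC_MAX_MESSAGE_SIZE=201326592
# 201326592 bytes = 192 * 1024 * 1024, i.e. a 192 MiB per-message limit.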
Another thing that might reduce Sensor memory usage is setting ROX_NETFLOW_USE_LEGACY_UPDATE_COMPUTER=true.
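A similar hedged sketch for applying that flag and checking which overrides are currently set on the Sensor deployment (kubectl set env --list prints the container environment):
kubectl -n stackrox set env deploy/sensor ROX_NETFLOW_USE_LEGACY_UPDATE_COMPUTER=true
# List the environment variables currently set on the Sensor deployment:
kubectl -n stackrox set env deploy/sensor --list | grep ROX_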
A Colab notebook for the first test: https://colab.research.google.com/drive/1XI6lXT-fpyMpjVA6dqxaBi5n4KCCmjQ8?usp=sharing
A Colab notebook for the second test: https://colab.research.google.com/drive/1maT18BLluhXrI2JiEoAiJYgCGVDlL1EB?usp=sharing
A Colab notebook for the third test: https://colab.research.google.com/drive/1VMQMWC-muy_F3JC1OlrUkbgxQFsBZhqt?usp=sharing
- is caused by: ROX-30941 ProcessesListening deduper state may grow indefinitely (Closed)