• Type: Bug
    • Resolution: Unresolved

      Long-running clusters were created with the load specified in this PR: https://github.com/stackrox/stackrox/pull/15886. The berserker load was increased and the berserker namespaces were churned. Sensor began to OOM about once a day. The ACS version used in that test was built from a commit between 4.8.2 and 4.8.3.

      The test was then rerun with a more recent version of ACS, and Sensor OOMed even more frequently (about once every 8 hours).

      Finally, a test was run with a recent version of master, with churn for the berserker namespaces but without the increased load. In this case Sensor memory usage was higher than in tests without churn, but it stabilized.

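      In one of the tests, the Sensor deployment was edited to set ROX_GRPC_MAX_MESSAGE_SIZE to 201326592 (192 MiB, assuming the value is in bytes):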

      ks set env deploy/sensor ROX_GRPC_MAX_MESSAGE_SIZE=201326592

      This stopped the memory from increasing continuously. The problem seems to have been that some messages sent by Sensor to Central exceeded the maximum gRPC message size, and when those messages failed to send, Sensor retained their data.
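
      The suspected failure mode can be illustrated with a toy model. This is only a sketch of the hypothesis above, not Sensor's actual code: the ToySender class, the simulate helper, and the sizes used are all hypothetical. It shows how payloads that exceed a size limit and are kept after a failed send accumulate, while a higher limit leaves nothing retained.

```python
from collections import deque


class ToySender:
    """Hypothetical sender that keeps payloads it failed to deliver."""

    def __init__(self, max_message_size: int) -> None:
        self.max_message_size = max_message_size
        # Payloads that were rejected for size and never released.
        self.retained: deque[bytes] = deque()

    def send(self, payload: bytes) -> bool:
        """Return True if the payload fits the limit; otherwise retain it."""
        if len(payload) > self.max_message_size:
            self.retained.append(payload)
            return False
        return True

    def retained_bytes(self) -> int:
        return sum(len(p) for p in self.retained)


def simulate(max_message_size: int, message_sizes: list[int]) -> int:
    """Send one payload per size and report how many bytes stay retained."""
    sender = ToySender(max_message_size)
    for size in message_sizes:
        sender.send(bytes(size))
    return sender.retained_bytes()


if __name__ == "__main__":
    sizes = [10, 50, 200, 50, 300]  # arbitrary illustrative sizes
    print(simulate(100, sizes))  # low limit: oversized payloads pile up
    print(simulate(400, sizes))  # higher limit: nothing is retained
```

      In this model, raising the limit removes the growth only because no payload exceeds the new limit. If the real cause is as suspected, a sufficiently large message could still trigger retention even with the larger ROX_GRPC_MAX_MESSAGE_SIZE.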

      Another thing that might reduce Sensor memory usage is setting ROX_NETFLOW_USE_LEGACY_UPDATE_COMPUTER=true.

      A Colab notebook for the first test https://colab.research.google.com/drive/1XI6lXT-fpyMpjVA6dqxaBi5n4KCCmjQ8?usp=sharing

      A Colab notebook for the second test https://colab.research.google.com/drive/1maT18BLluhXrI2JiEoAiJYgCGVDlL1EB?usp=sharing

      A Colab notebook for the third test https://colab.research.google.com/drive/1VMQMWC-muy_F3JC1OlrUkbgxQFsBZhqt?usp=sharing

              Assignee: Unassigned
              Reporter: Jouko Virtanen (jvirtane@redhat.com)
              ACS Sensor & Ecosystem