Details
Story
Resolution: Done
Critical
Description
Today, tally memory grows non-linearly with the number of systems being re-tallied in an account, which causes us to use more memory than we are able to allocate. Until SWATCH-688 is completed, we need to put a guard rail in place to prevent a tally pod from going into a crash loop when an account has too many systems. Such a crash loop currently blocks processing of whichever Kafka queue was assigned to that pod and prevents other customer accounts from being processed.
To mitigate this problem, we will do the following:
Implement a threshold above which tally processing is skipped for an account. If an account's HBI host count is above the threshold, we will log a warning and move on to the next message. The threshold must be configurable and must be possible to turn off with a config flag once the memory issue has been addressed.
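The skip rule can be sketched as follows. This is a minimal illustration, assuming the threshold is held as an `OptionalLong` where "empty" means the guard is disabled. The class and method names (`AccountSizeGuard`, `shouldSkip`, `processOrSkip`) are hypothetical, and `java.util.logging` stands in for whatever logger the tally service actually uses.

```java
import java.util.OptionalLong;
import java.util.logging.Logger;

// Hypothetical helper: names are illustrative, not taken from the tally codebase.
class AccountSizeGuard {
    private static final Logger LOG = Logger.getLogger(AccountSizeGuard.class.getName());

    /**
     * Returns true when tally should skip the account: a threshold is configured
     * and the HBI host count is strictly above it. An empty threshold disables the guard.
     */
    static boolean shouldSkip(long hbiHostCount, OptionalLong maxAccountSize) {
        return maxAccountSize.isPresent() && hbiHostCount > maxAccountSize.getAsLong();
    }

    /** Returns false (after logging a warning) if the account is skipped, true if it would be tallied. */
    static boolean processOrSkip(String accountNumber, long hbiHostCount, OptionalLong maxAccountSize) {
        if (shouldSkip(hbiHostCount, maxAccountSize)) {
            LOG.warning(String.format(
                "Skipping tally for account %s: %d HBI instances exceeds threshold %d",
                accountNumber, hbiHostCount, maxAccountSize.getAsLong()));
            return false; // the caller moves on to the next message instead of crashing the pod
        }
        // ... the normal tally for this account would run here ...
        return true;
    }

    public static void main(String[] args) {
        OptionalLong threshold = OptionalLong.of(200_000L);
        System.out.println(processOrSkip("12345", 150_000L, threshold));             // true
        System.out.println(processOrSkip("67890", 250_000L, threshold));             // false, plus a WARNING
        System.out.println(processOrSkip("67890", 250_000L, OptionalLong.empty()));  // true: guard disabled
    }
}
```

Checking the count before any per-host work is loaded keeps the guard itself cheap, which matters because the point is to avoid the memory spike entirely.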
Done:
Tally config value name: TALLY_MAX_HBI_ACCOUNT_SIZE
Initial value: 200000
WARNING logged if the threshold is exceeded and an account is skipped. The warning will include both the account number and the total number of instances in HBI.
If TALLY_MAX_HBI_ACCOUNT_SIZE is not set in the environment, no threshold will be used.
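Reading the setting could look roughly like this. It is a sketch, assuming the value is read directly from environment variables; only the name TALLY_MAX_HBI_ACCOUNT_SIZE comes from this ticket, while the class `TallyThresholdConfig` and method `maxHbiAccountSize` are hypothetical. Treating a blank value the same as an unset one is also an assumption made here.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical config reader; only the variable name comes from this ticket.
class TallyThresholdConfig {
    static final String ENV_VAR = "TALLY_MAX_HBI_ACCOUNT_SIZE";

    /**
     * Returns the configured threshold, or an empty OptionalLong when the variable
     * is unset (assumed: also when blank), meaning no threshold is applied.
     * A non-numeric value throws NumberFormatException so misconfiguration fails fast.
     */
    static OptionalLong maxHbiAccountSize(Map<String, String> env) {
        String raw = env.get(ENV_VAR);
        if (raw == null || raw.isBlank()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(raw.trim()));
    }

    public static void main(String[] args) {
        System.out.println(maxHbiAccountSize(System.getenv()));            // depends on the real environment
        System.out.println(maxHbiAccountSize(Map.of(ENV_VAR, "200000")));  // OptionalLong[200000]
        System.out.println(maxHbiAccountSize(Map.of()));                   // OptionalLong.empty
    }
}
```

Taking the environment as a `Map` rather than calling `System.getenv()` inside the method keeps the "unset means disabled" rule easy to test.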