Bug
Resolution: Unresolved
Description of Problem
When chrome-svc reaches its memory limit, it keeps accepting more and more requests until it is OOM-killed. Other replicas then try to take over the load and get OOM-killed as well. The result is a crash loop that cannot recover on its own: the load has to be removed, all replicas have to spin up again, and only then can the load be resumed.
More details: https://docs.google.com/document/d/1De-b1bGO_6SdAMJ–UNJNadBJ1_k56T1afoi9ECFkAU/edit#heading=h.urv167tq25sq
How reproducible
Reproducible whenever enough HTTP requests are sent to chrome-svc to push its memory usage too high. I used Locust; this file makes it work: locustfile.py
Steps to Reproduce
- Send a large number of HTTP requests to chrome-svc
- Monitor chrome-svc health and observe that it is failing
- To recover: stop the HTTP load
- Wait for all chrome-svc replicas to reach the Ready state
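The original reproduction used Locust with locustfile.py, whose contents are not shown in this ticket. As a rough stand-in for the first step, the sketch below fires concurrent GET requests using only the Python standard library and tallies the outcomes. The function names, the request counts, and the target URL are all assumptions made for illustration; they are not taken from the actual locustfile.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import http.client
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for one GET, or an error label."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return exc.code
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # No usable answer: connection refused, reset, timeout, ...
        return type(exc).__name__


def generate_load(url, total=1000, concurrency=50, fetch=fetch_status):
    """Send `total` GETs using `concurrency` worker threads.

    Returns a Counter mapping each outcome (status code or error
    label) to how often it occurred. `fetch` is injectable so the
    function can be exercised without a live service.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fetch, [url] * total))
```

Calling `generate_load("http://localhost:8080/", total=5000, concurrency=200)` against a test instance (the URL is a placeholder) should show the tally shifting from 200s to errors once the service starts getting killed.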
Actual Behavior
The service is OOM-killed and enters a crash loop that persists until the HTTP load is removed.
Expected Behavior
The service should not be OOM-killed under load; it should have some way to protect itself instead of crashing.
I'd suggest refusing incoming requests that would cause the service to crash.
This could be hard to implement, because the service can show high memory usage while still having a good amount of resources available, presumably thanks to its internal garbage collection.
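One way to picture the suggestion is a simple admission gate. Because memory usage is a noisy signal here, this sketch caps the number of in-flight requests as a proxy and rejects the excess immediately rather than queueing it. That choice of proxy is an assumption, as are the `AdmissionGate` class, the `max_in_flight` parameter, and the 503 response; none of this reflects how chrome-svc is actually built.

```python
import threading


class AdmissionGate:
    """Reject work beyond a fixed number of in-flight requests."""

    def __init__(self, max_in_flight):
        # One semaphore slot per request allowed to run concurrently.
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, work):
        """Run `work()` if a slot is free, otherwise shed the request.

        Returns a (status, body) tuple: 200 with the result of
        `work()`, or 503 when the gate is full.
        """
        # Non-blocking acquire: a full gate refuses instead of queueing,
        # so rejected requests do not pile up in memory.
        if not self._slots.acquire(blocking=False):
            return 503, "overloaded, retry later"
        try:
            return 200, work()
        finally:
            self._slots.release()
```

A rejected request costs almost nothing, so the replica stays alive and keeps serving the requests it did admit. The right value for `max_in_flight` would have to come from load testing against the memory limit.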
Business Impact / Additional info
The UI relies on chrome-svc to provide some important files, so its health is important.