Hybrid Cloud Console / RHCLOUD-37539

[Chrome-svc] Too many HTTP users result in OutOfMem crash

      Description of Problem

      When chrome-svc reaches its limit, it keeps accepting and trying to process more requests until it is OOM-killed. The other replicas then try to take over the load and are OOM-killed as well. The result is a crash loop that cannot recover on its own: the load has to be removed, all replicas have to spin up again, and only then can the load be resumed.
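
To make the failure mode above concrete, here is a toy model of the cascade. It is only a sketch under loud assumptions: load is spread evenly across ready replicas, and a replica is OOM-killed as soon as its share exceeds a fixed capacity. The function name and all numbers are hypothetical and are not taken from chrome-svc.

```python
def simulate_cascade(total_rps: float, replicas: int, capacity_rps: float) -> int:
    """Return how many replicas are still alive once the system settles.

    Assumptions (illustrative only): load is split evenly across live
    replicas, and a replica whose share exceeds capacity_rps is OOM-killed.
    """
    alive = replicas
    while alive > 0:
        per_replica = total_rps / alive
        if per_replica <= capacity_rps:
            return alive  # every live replica can carry its share
        alive -= 1  # one replica is killed; the survivors inherit its load
    return 0


def restart_survives(total_rps: float, ready_replicas: int, capacity_rps: float) -> bool:
    """Would a freshly restarted replica survive joining the ready ones?"""
    return total_rps / (ready_replicas + 1) <= capacity_rps
```

In this model, `simulate_cascade(1000, 4, 300)` leaves all 4 replicas alive, while `simulate_cascade(1500, 4, 300)` collapses to 0. With no replicas ready, `restart_survives(1500, 0, 300)` is `False`, which matches the observation that the service only recovers after the load is removed.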

      more details: https://docs.google.com/document/d/1De-b1bGO_6SdAMJ–UNJNadBJ1_k56T1afoi9ECFkAU/edit#heading=h.urv167tq25sq

      How reproducible

      Send enough HTTP requests to chrome-svc to push its memory usage past its limit. I used Locust; the attached file makes it work: locustfile.py

      Steps to Reproduce

      1. Send a large number of HTTP requests to chrome-svc.
      2. Monitor chrome-svc health and observe that it is broken.
      3. To recover, stop the HTTP load.
      4. Wait for all chrome-svc replicas to spin up and reach the Ready state.
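
Step 1 can also be approximated without Locust. The following is a minimal standard-library sketch, not the attached locustfile.py; the function names are hypothetical, and the target URL, request count, and concurrency are left to the caller because the ticket does not state them.

```python
import concurrent.futures
import http.client
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url: str, timeout: float = 5.0) -> str:
    """Issue one GET and return the status code, or an error label on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as exc:
        return str(exc.code)  # 4xx/5xx responses still carry a status code
    except (OSError, http.client.HTTPException) as exc:
        # Connection refused, resets, timeouts, malformed responses.
        return type(exc).__name__


def run_load(url: str, total_requests: int, concurrency: int) -> Counter:
    """Send total_requests GETs with up to `concurrency` in flight; tally outcomes."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fetch_status, [url] * total_requests))
```

Calling `run_load(url, 10000, 200)` against a test deployment returns a tally of status codes and error labels. A growing share of connection errors in that tally is one way to notice step 2 from the client side.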

      Actual Behavior

      The service is OOM-killed and enters a crash loop that persists until the HTTP load is removed.

      Expected Behavior

      The service should not be OOM-killed under heavy load; it should handle overload in some way that keeps it running.

      I'd suggest refusing incoming requests that would otherwise cause the service to crash.
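
One common way to refuse requests is load shedding with a cap on in-flight requests. The sketch below is an illustration of that idea, not chrome-svc code: `LoadShedder`, `handle`, and the 503 response body are hypothetical names, and the right cap would have to be found by measurement.

```python
import threading


class LoadShedder:
    """Admit at most max_in_flight concurrent requests; reject the rest."""

    def __init__(self, max_in_flight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False immediately when every slot is taken.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle(shedder: LoadShedder, work):
    """Run work() if a slot is free; otherwise answer 503 without doing the work."""
    if not shedder.try_acquire():
        return 503, "overloaded, retry later"
    try:
        return 200, work()
    finally:
        shedder.release()  # free the slot even if work() raises
```

A rejected request costs almost no memory, so the process stays alive and keeps serving the requests it did admit instead of being killed along with all of them.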

      This could be hard to implement, because the service can show high memory usage while still having plenty of resources available, presumably thanks to its internal garbage collection.
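
One possible way to soften that difficulty is to gate on memory with two watermarks instead of a single threshold, so that a brief spike of reclaimable garbage does not flip the service back and forth. This is only a sketch: `MemoryGate` and the 0.9 / 0.7 ratios are hypothetical, and where `used_bytes` comes from (ideally live heap after a collection rather than raw process memory) is left open.

```python
class MemoryGate:
    """Reject above high_ratio; resume only once usage drops below low_ratio."""

    def __init__(self, limit_bytes: int, high_ratio: float = 0.9,
                 low_ratio: float = 0.7) -> None:
        self.limit_bytes = limit_bytes
        self.high_ratio = high_ratio
        self.low_ratio = low_ratio
        self.shedding = False

    def admit(self, used_bytes: int) -> bool:
        """Return True if a new request may be accepted at this memory usage."""
        ratio = used_bytes / self.limit_bytes
        if self.shedding and ratio < self.low_ratio:
            self.shedding = False  # memory has been reclaimed; resume
        elif not self.shedding and ratio > self.high_ratio:
            self.shedding = True  # too close to the limit; start refusing
        return not self.shedding
```

The gap between the two ratios gives the garbage collector time to reclaim memory before traffic is admitted again. Between the watermarks the gate simply keeps its previous decision.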

      Business Impact / Additional info

      The UI relies on chrome-svc to provide some important files, so keeping the service healthy matters.

      Assignee: Unassigned
      Reporter: Jan Smejkal (rhn-engineering-jsmejkal)