-
Bug
-
Resolution: Done
-
Blocker
-
None
-
2.8 GA
-
Not Started
-
Engineering
WHAT
The apicast-production container must terminate when any of its processes is OOM killed, so that k8s handles OOM kills predictably. When a container runs multiple processes and a process other than the init/main process is OOM killed by the kernel of the underlying OSD node, the container keeps running. k8s therefore does not know that an OOM kill occurred, and the pod running the container is not restarted. CSSRE does not detect this either, because the pod and container still appear to be running.
HOW
The options I can think of are:
1. Use one process per container
2. Ensure the whole container exits when one of its processes gets OOM killed
3. Use a liveness probe that checks for OOM killed processes other than the main/init process.
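For illustration, option 3 could look something like the sketch below. It is not APIcast's actual probe. It assumes the container runs under cgroup v2 with the memory controller's `memory.events` file visible at `/sys/fs/cgroup/memory.events`, and that the `oom_kill` counter starts at zero for a freshly created container. The names `oom_kill_count` and `probe` are hypothetical.

```python
from pathlib import Path

# Assumption: cgroup v2, mounted at the usual location inside the container.
MEMORY_EVENTS = Path("/sys/fs/cgroup/memory.events")


def oom_kill_count(memory_events_text: str) -> int:
    """Return the oom_kill counter from the contents of a memory.events file."""
    for line in memory_events_text.splitlines():
        parts = line.split()
        # Each line has the form "<event name> <count>".
        if len(parts) == 2 and parts[0] == "oom_kill":
            return int(parts[1])
    return 0


def probe(path: Path = MEMORY_EVENTS) -> int:
    """Exit status for an exec probe: 0 = healthy, 1 = a process was OOM killed."""
    try:
        text = path.read_text()
    except OSError:
        # Counter not readable (for example cgroup v1): do not fail the probe.
        return 0
    return 1 if oom_kill_count(text) > 0 else 0
```

A probe script would end with `sys.exit(probe())`, so that a non-zero counter fails the liveness check and k8s restarts the container.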
DONE
The container exits when one of its child processes is OOM killed. This lets k8s automatically apply the pod's restart policy and prevents OOM-killed processes from going undetected inside a container.
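As a rough sketch of the behavior described above (not the change actually made to APIcast), a small supervising parent process could exit as soon as any child dies, so that the container's main process ends and k8s sees the container terminate. The function name `supervise` and the polling approach are assumptions for illustration. The kernel's OOM killer terminates processes with SIGKILL, which this sketch maps to the conventional exit code 137.

```python
import subprocess
import time


def supervise(commands, poll_interval=0.1):
    """Start every command; when the first child exits, stop the rest.

    Returns the exit code the supervisor itself should exit with.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    # A negative code means "terminated by signal -code".
                    # Map it to the shell convention 128 + signal number,
                    # so SIGKILL (9) becomes 137.
                    return 128 - code if code < 0 else code
            time.sleep(poll_interval)
    finally:
        # Stop the surviving children so the whole container can exit.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()
```

The container entrypoint would then call `sys.exit(supervise([...]))` with the commands it needs to run, so a killed child ends the container with a non-zero status.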
- is related to
-
THREESCALE-6047 Add new Prometheus alert on Operator
- Closed
-
THREESCALE-5965 Worker process metric
- Closed