Bug
Resolution: Done
High
None
OSC 1.5.0
False
None
False
Bug Fix
Done
Kata Sprint #250, Kata Sprint #252
0
0
Description
The kata shim gathers metrics for the OSC process.
One of these metrics, `kata_shim_netdev`, is known to retrieve too much information compared to what's actually useful.
This is described in upstream issue: https://github.com/kata-containers/kata-containers/issues/5738
This causes Prometheus containers to increase their memory usage: as the network interfaces change (whenever containers are created/deleted), new metric series are added, and over time this can lead the Prometheus containers to fail due to lack of memory.
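For illustration, each network interface reported by the shim ends up as its own kata_shim_netdev time series, roughly along these lines (the label set is trimmed to the interface label used in the queries below; interface names and values are made up, and the real metric likely carries additional labels):

kata_shim_netdev{interface="lo"} 1
kata_shim_netdev{interface="eth0"} 1
kata_shim_netdev{interface="veth1a2b3c4"} 1
kata_shim_netdev{interface="veth5d6e7f8"} 1

Every time the set of interfaces on the node changes, new label combinations appear, each one a new series that Prometheus has to keep track of.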
Steps to reproduce
I'm using the following deployment and script to create and delete containers in a loop.
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-kata-deployment
  # namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        run: nginx
    spec:
      # runtimeClassName: kata
      containers:
      - name: nginx-kata
        image: bitnami/nginx
Note that the runtimeClassName is commented out: I don't actually need the containers to be running kata, as long as there is at least one kata container running elsewhere on the same node.
The problem comes from the running kata container gathering data and keeping a history of it. Since the deployment's containers are deleted regularly, they won't be the cause of the issue themselves: their data is lost whenever they're scaled down.
Script:
#!/bin/bash

function wait_for_scaling() {
    sleep 5
    deployed=$(oc get deployment nginx-kata-deployment | tail -n1 | awk '{print $2}')
    while [ ! "$deployed" = "$1/$1" ]; do
        echo "Deployed $deployed - waiting..."
        sleep 1
        deployed=$(oc get deployment nginx-kata-deployment | tail -n1 | awk '{print $2}')
    done
}

oc apply -f nginx_deployment.yaml

while true; do
    oc scale deployment nginx-kata-deployment --replicas=10
    wait_for_scaling 10
    oc scale deployment nginx-kata-deployment --replicas=1
    wait_for_scaling 1
done
This will scale the deployment up to 10 containers, then back down to 1, in a loop.
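While the loop runs, you can optionally confirm that the set of network interfaces on the node keeps changing as pods come and go, since that churn is what produces the new metric series. A minimal sketch (the node name is a placeholder; pick a worker node where the deployment's pods are scheduled):

oc debug node/<node-name> -- chroot /host sh -c 'ip -o link | wc -l'

Running it a few times during the loop should show the interface count going up and down.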
Expected result
The Prometheus pods' memory usage should not grow over time, and they should not be OOM-killed.
How to check the fix
The Prometheus pods I've been looking at are named "prometheus-k8s-[number]". You can check their memory usage, and/or whether they are OOM-killed, but the OOM-kill can take a long time to happen.
Here are the queries I used in the "Observe/Metrics" panel of the OpenShift console:
sum(container_memory_working_set_bytes{pod='prometheus-k8s-0',namespace='openshift-monitoring',container='',}) BY (pod, namespace)
sum(container_memory_working_set_bytes{pod='prometheus-k8s-1',namespace='openshift-monitoring',container='',}) BY (pod, namespace)
Alternatively, you can check that the "kata_shim_netdev" metric is no longer visible after the patch is applied:
count(group(kata_shim_netdev) by (interface))
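If you would rather run these queries from the command line than from the console, here is a minimal sketch that polls the in-cluster Thanos querier for the series count once a minute. It assumes a default OpenShift monitoring setup (thanos-querier route in openshift-monitoring), a token with permission to query cluster monitoring (oc whoami -t), and jq installed; adjust as needed.

#!/bin/bash
# Poll the number of kata_shim_netdev series, grouped by interface, once a minute.
TOKEN=$(oc whoami -t)
HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
QUERY='count(group(kata_shim_netdev) by (interface))'
while true; do
    COUNT=$(curl -sk -H "Authorization: Bearer ${TOKEN}" \
        "https://${HOST}/api/v1/query" \
        --data-urlencode "query=${QUERY}" \
        | jq -r '.data.result[0].value[1] // "0"')
    echo "$(date -Is) kata_shim_netdev interface count: ${COUNT}"
    sleep 60
done

Before the fix, the reported count should keep growing while the scale loop runs; after the fix, the query should return no data and the script prints 0.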
Actual result
With the above script running, I can see the number of kata_shim_netdev series grow in Prometheus. The longer the test runs, the higher the count.
I can also see a growth in memory usage for the Prometheus pods that is linked to this. The growth is not as strong when no kata container is running.
I did not reproduce the OOM-kill; I would probably need to run the test longer, and it also depends on the cluster's memory limits. But the mechanism is there.
Env
The first occurrence of this problem was found with OCP 4.12, using OSC 1.4.1.
I've been running the test above with OCP 4.14 and OSC 1.5.1.
1. downstream: gathering too many metrics makes Prometheus containers OOM killed | Closed | Unassigned