Project: OpenShift Bugs
Issue: OCPBUGS-61725

[4.14] kubelet podresources API incorrectly reports memory assignments of terminated pods

    • Type: Bug
    • Priority: Critical
    • Severity: Critical
    • Resolution: Won't Do
    • Affects Version: 4.14.z
    • Component: Node / Kubelet
    • Quality / Stability / Reliability
    • Status: In Progress
    • Release Note Not Required

      This is a clone of issue OCPBUGS-56785. The following is the description of the original issue:

      Description of problem:

      The kubelet podresources endpoint is meant to report exclusive resources allocated to active pods.
      The endpoint incorrectly also returns resources allocated to terminated pods.
      
      Two factors combine to create the bug:
      
      1. The podresources API depends on kubelet internals to retrieve the list of currently active pods. The function it relied on incorrectly returned both active and terminated pods.
      2. If the podresources API incorrectly considers a terminated pod, we run into a second issue, in the memory manager. The memory manager garbage-collects stale resources (assignments to terminated pods) only in the allocation flow. Thus, if no new pod gets admitted, the kubelet, through the podresources API, keeps reporting memory resources as assigned to a terminated pod. This report is bogus because those resources are no longer reserved, but the podresources API cannot know that. This does NOT affect the allocation flow (the first thing it does is clean up stale assignments), but it does affect the reporting, and this behavior is not fixed upstream.
      
      
      Why does this affect only memory?
      
      1. Device assignments are explicitly cleaned up by the podresources API endpoint.
      2. CPU assignments are not (and they should be), but they are automatically cleaned up every cpuManagerReconcilePeriod seconds, so the CPU report recovers on its own.
      
      This breaks NUMA-aware scheduling in an unrecoverable way.
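
      To make the report concrete, here is a minimal Go sketch that dumps the per-container memory assignments returned by the endpoint, assuming the standard podresources v1 gRPC API (k8s.io/kubelet/pkg/apis/podresources/v1) and the kubelet's default socket path /var/lib/kubelet/pod-resources/kubelet.sock; it must run as root on the node:
      ```
      // Dump per-container memory assignments reported by the kubelet
      // podresources endpoint. Minimal sketch; error handling is terse.
      package main

      import (
          "context"
          "fmt"
          "log"
          "time"

          "google.golang.org/grpc"
          "google.golang.org/grpc/credentials/insecure"
          podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
      )

      func main() {
          // Default socket path; adjust if the kubelet is configured differently.
          conn, err := grpc.Dial("unix:///var/lib/kubelet/pod-resources/kubelet.sock",
              grpc.WithTransportCredentials(insecure.NewCredentials()))
          if err != nil {
              log.Fatalf("cannot connect to podresources socket: %v", err)
          }
          defer conn.Close()

          client := podresourcesv1.NewPodResourcesListerClient(conn)
          ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
          defer cancel()

          resp, err := client.List(ctx, &podresourcesv1.ListPodResourcesRequest{})
          if err != nil {
              log.Fatalf("List failed: %v", err)
          }

          for _, pod := range resp.GetPodResources() {
              for _, cnt := range pod.GetContainers() {
                  // With the bug, terminated pods can still show up here with memory blocks.
                  for _, mem := range cnt.GetMemory() {
                      fmt.Printf("%s/%s container=%s type=%s size=%d\n",
                          pod.GetNamespace(), pod.GetName(), cnt.GetName(),
                          mem.GetMemoryType(), mem.GetSize_())
                  }
              }
          }
      }
      ```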

      Version-Release number of selected component (if applicable):

      4.18.z (any)
      Actually reproduced in Server Version: 4.18.0-0.nightly-2025-04-13-142946

      How reproducible:

      100%    

      Steps to Reproduce:

          1. Configure the kubelet with memory manager policy = Static.
          2. Run a job whose pods qualify for memory pinning (see the example manifest below).
          3. Query the podresources endpoint on the node. The endpoint is node-local, exposed through a unix domain socket, so it has to be queried programmatically. Probably the simplest option is to download the `knit` tool from https://github.com/openshift-kni/debug-tools/releases/tag/v0.2.1 and run `knit podres` with root privileges; a Go sketch that performs this check programmatically follows the example manifest below.
      
      
      example manifest:
      ```
      apiVersion: batch/v1
      kind: Job
      metadata:
        labels:
          app: idle-gu-job-sched-stall
        generateName: generic-pause-
      spec:
        backoffLimit: 6
        completionMode: NonIndexed
        completions: 2
        manualSelector: false
        parallelism: 2
        podReplacementPolicy: TerminatingOrFailed
        suspend: false
        template:
          metadata:
            labels:
              app: idle-gu-job-sched-stall
          spec:
            containers:
            - args:
              - 1s
              command:
              - /bin/sleep
              image: quay.io/openshift-kni/pause:test-ci
              imagePullPolicy: IfNotPresent
              name: generic-job-idle
              resources:
                limits:
                  cpu: 100m
                  memory: 256Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
            dnsPolicy: ClusterFirst
            restartPolicy: Never
            schedulerName: default-scheduler
            terminationGracePeriodSeconds: 30
            topologySpreadConstraints:
            - labelSelector:
                matchLabels:
                  app: idle-gu-job-sched-stall
              matchLabelKeys:
              - pod-template-hash
              maxSkew: 1
              topologyKey: kubernetes.io/hostname
              whenUnsatisfiable: DoNotSchedule
      
      ```    
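
      The check in step 3 can also be scripted. Below is a minimal Go sketch that flags memory still reported for pods that are no longer active on the node, assuming client-go access to the cluster in addition to the podresources socket; the NODE_NAME and KUBECONFIG environment variables are illustrative assumptions, not part of the original reproducer:
      ```
      // Flag podresources memory entries that belong to pods no longer active
      // on this node. Sketch only; assumes root on the node plus a kubeconfig
      // with permission to list pods.
      package main

      import (
          "context"
          "fmt"
          "log"
          "os"
          "time"

          "google.golang.org/grpc"
          "google.golang.org/grpc/credentials/insecure"
          corev1 "k8s.io/api/core/v1"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
          "k8s.io/client-go/tools/clientcmd"
          podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
      )

      func main() {
          nodeName := os.Getenv("NODE_NAME") // illustrative: the node being checked

          cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
          if err != nil {
              log.Fatal(err)
          }
          cs := kubernetes.NewForConfigOrDie(cfg)

          // Build the set of pods on this node that are still active (not Succeeded/Failed).
          pods, err := cs.CoreV1().Pods("").List(context.Background(), metav1.ListOptions{
              FieldSelector: "spec.nodeName=" + nodeName,
          })
          if err != nil {
              log.Fatal(err)
          }
          active := map[string]bool{}
          for _, p := range pods.Items {
              if p.Status.Phase != corev1.PodSucceeded && p.Status.Phase != corev1.PodFailed {
                  active[p.Namespace+"/"+p.Name] = true
              }
          }

          conn, err := grpc.Dial("unix:///var/lib/kubelet/pod-resources/kubelet.sock",
              grpc.WithTransportCredentials(insecure.NewCredentials()))
          if err != nil {
              log.Fatal(err)
          }
          defer conn.Close()

          ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
          defer cancel()
          resp, err := podresourcesv1.NewPodResourcesListerClient(conn).List(ctx,
              &podresourcesv1.ListPodResourcesRequest{})
          if err != nil {
              log.Fatal(err)
          }

          for _, pod := range resp.GetPodResources() {
              key := pod.GetNamespace() + "/" + pod.GetName()
              for _, cnt := range pod.GetContainers() {
                  // Expected: terminated pods are absent, or carry no memory blocks.
                  if len(cnt.GetMemory()) > 0 && !active[key] {
                      fmt.Printf("STALE: %s container=%s still has %d memory block(s)\n",
                          key, cnt.GetName(), len(cnt.GetMemory()))
                  }
              }
          }
      }
      ```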

      Actual results:

      The kubelet returns memory resources assigned to a terminated pod.

      Expected results:

      Either:
      1. the kubelet does not return the terminated pod, or
      2. the kubelet returns the terminated pod, but without any resources assigned to it.

      Additional info:

      Possibly affects older versions of OpenShift.
      Solved upstream in Kubernetes by the pod workers refactoring: the podresources endpoint (correctly) ignores terminated pods and only lists active pods.

       

              Assignee: Node Team Bot Account (aos-node@redhat.com)
              Reporter: Francesco Romani (fromani@redhat.com)
              QA Contact: Bhargavi Gudi