OpenShift Bugs / OCPBUGS-63138

Master node in MC cluster reports insufficient memory because the MAPI controller uses too much memory


    • Quality / Stability / Reliability
    • Moderate
    • CLOUD Sprint 278
    • Done
    • Bug Fix
    • Before this update, the controller created and deleted a file with a random name when setting up a session to {aws-first}, which caused the controller to continuously allocate more memory to cache the session. With this release, the controller now uses the same file name instead of a random one, allowing the kernel to re-use the `dentry` instead of requesting a new one for each session. As a result, excessive memory allocation is resolved. (link:https://issues.redhat.com/browse/OCPBUGS-63138[OCPBUGS-63138])
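      The release note above pins the root cause on `dentry` churn: every session setup created and deleted a file with a unique random name, the kernel caches one `dentry` per unique name, and that kernel memory is charged to the pod. Below is a minimal Go sketch of the two patterns; the function names and paths are illustrative assumptions, not the actual MAPI controller code:

      package main

      import (
          "os"
          "path/filepath"
      )

      // Before the fix: a new random file name per session. Each unique
      // name leaves a cached dentry behind after the file is deleted, so
      // kernel memory charged to the pod grows with every session.
      func setupSessionRandomName() error {
          f, err := os.CreateTemp("", "aws-session-*")
          if err != nil {
              return err
          }
          defer os.Remove(f.Name())
          return f.Close()
      }

      // After the fix: one fixed file name. The kernel re-uses the same
      // dentry for every session, so the cache no longer grows.
      func setupSessionFixedName() error {
          path := filepath.Join(os.TempDir(), "aws-session")
          f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0o600)
          if err != nil {
              return err
          }
          defer os.Remove(path)
          return f.Close()
      }

      func main() {
          // Simulate repeated session setups; only the random-name
          // variant keeps adding new dentries to the kernel cache.
          for i := 0; i < 1000; i++ {
              _ = setupSessionRandomName()
              _ = setupSessionFixedName()
          }
      }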

      This is a clone of issue OCPBUGS-63137. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-63136. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-38759. The following is the description of the original issue:

      Description of problem:
      Primary received an ExtremelyHighIndividualControlPlaneMemory
      alert (https://redhat.pagerduty.com/incidents/Q2U3B5WD4300DY) on the HyperShift MC cluster hs-mc-i0npt9ce0.

      Cluster name: hs-mc-i0npt9ce0
      Cluster ID: 32a39ea3-1c2c-4786-b991-b04742ad5fdf

      One master node is experiencing extremely high memory usage:

        ip-10-0-0-227.ap-southeast-4.compute.internal    hs-mc-i0npt9ce0-qqqtk-master-bg7s8-0    🏛  master  20d    1487m (9%)   13572Mi (23%)
        ip-10-0-1-151.ap-southeast-4.compute.internal    hs-mc-i0npt9ce0-qqqtk-master-899lq-1    🏛  master  20d    654m (4%)    57859Mi (99%)🔥
        ip-10-0-2-126.ap-southeast-4.compute.internal    hs-mc-i0npt9ce0-qqqtk-master-xh7tl-2    🏛  master  20d    1162m (7%)   17166Mi (29%)
      

      The MAPI controller was using over 46 GiB of memory on this node.
      The problematic pod in Dynatrace:
      https://zwz85475.apps.dynatrace.com/ui/apps/dynatrace.classic.technologies/#processdetails;gtf=2024-08-21T11:00:00+12:00%20to%202024-08-21T13:00:00+12:00;gf=all;id=PROCESS_GROUP_INSTANCE-E552062AFBF57B4A

      No obvious errors were found in the pod logs. A memory leak in the MAPI controller is suspected, because the memory usage grows gradually over time.

      I had to restart the problematic pod to lower the memory usage. There is no known way to reproduce the issue.
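      Worth noting for diagnosis: per the release note above, the extra memory lived in the kernel's dentry cache, which is charged to the pod's cgroup, so an in-process Go heap profile of the controller would look clean. A minimal sketch for checking whether the growth is kernel slab memory rather than process heap, assuming cgroup v2 is mounted at /sys/fs/cgroup and the program runs inside the container:

      package main

      import (
          "bufio"
          "fmt"
          "os"
          "strings"
      )

      func main() {
          // memory.stat breaks the cgroup's memory charge down by type:
          // "anon" is the process's own heap/stack, while
          // "slab_reclaimable" covers kernel caches such as dentries
          // and inodes.
          f, err := os.Open("/sys/fs/cgroup/memory.stat")
          if err != nil {
              fmt.Fprintln(os.Stderr, "cannot read memory.stat:", err)
              os.Exit(1)
          }
          defer f.Close()

          wanted := map[string]bool{
              "anon":               true,
              "file":               true,
              "slab_reclaimable":   true,
              "slab_unreclaimable": true,
          }
          sc := bufio.NewScanner(f)
          for sc.Scan() {
              // Each line is "<key> <bytes>"; print only the keys above.
              fields := strings.Fields(sc.Text())
              if len(fields) == 2 && wanted[fields[0]] {
                  fmt.Printf("%-20s %s bytes\n", fields[0], fields[1])
              }
          }
      }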

              rh-ee-cschlott Christian Schlotter
              tkong-ocm Tony Kong
              Huali Liu
