OpenShift Bugs / OCPBUGS-63136

Master node in MC cluster reports insufficient memory because the MAPI controller uses too much memory


    • Quality / Stability / Reliability
    • Moderate
    • CLOUD Sprint 278
    • In Progress
    • Bug Fix
      * Cause - The controller creates and deletes a file with a random name when setting up authentication to AWS.
      * Consequence - The controller continuously allocates more memory.
      * Fix - Use the same file name instead of a random one (see the sketch below).
      * Result - The kernel reuses the dentry instead of requesting a new one for each file.
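
      A minimal Go sketch of the before/after pattern described above. The function names, the "aws-creds" file name, and the directory handling are hypothetical illustrations, not the actual MAPI controller code.

        package main

        import (
            "log"
            "os"
            "path/filepath"
        )

        // writeCredsRandom mimics the old behavior: a file with a random
        // name is created and deleted on every call. Each unique name adds
        // a dentry to the kernel's dentry cache, and that slab memory is
        // charged to the pod's cgroup, so repeated calls grow the pod's
        // apparent memory usage.
        func writeCredsRandom(dir string, creds []byte) error {
            f, err := os.CreateTemp(dir, "aws-creds-*") // random suffix per call
            if err != nil {
                return err
            }
            defer os.Remove(f.Name())
            defer f.Close()
            _, err = f.Write(creds)
            return err
        }

        // writeCredsFixed mimics the fix: the same file name is reused on
        // every call, so the kernel reuses a single dentry instead of
        // requesting a new one each time.
        func writeCredsFixed(dir string, creds []byte) error {
            return os.WriteFile(filepath.Join(dir, "aws-creds"), creds, 0o600)
        }

        func main() {
            dir := os.TempDir()
            if err := writeCredsRandom(dir, []byte("example")); err != nil {
                log.Fatal(err)
            }
            if err := writeCredsFixed(dir, []byte("example")); err != nil {
                log.Fatal(err)
            }
        }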

      This is a clone of issue OCPBUGS-38759. The following is the description of the original issue:

      Description of problem:
      Primary on-call received an ExtremelyHighIndividualControlPlaneMemory
      alert (https://redhat.pagerduty.com/incidents/Q2U3B5WD4300DY) on the HyperShift MC cluster hs-mc-i0npt9ce0.

      Cluster name: hs-mc-i0npt9ce0
      Cluster ID: 32a39ea3-1c2c-4786-b991-b04742ad5fdf

      A master node is experiencing extremely high memory usage:

        ip-10-0-0-227.ap-southeast-4.compute.internal    hs-mc-i0npt9ce0-qqqtk-master-bg7s8-0    🏛  master  20d    1487m (9%)   13572Mi (23%)
        ip-10-0-1-151.ap-southeast-4.compute.internal    hs-mc-i0npt9ce0-qqqtk-master-899lq-1    🏛  master  20d    654m (4%)    57859Mi (99%) 🔥
        ip-10-0-2-126.ap-southeast-4.compute.internal    hs-mc-i0npt9ce0-qqqtk-master-xh7tl-2    🏛  master  20d    1162m (7%)   17166Mi (29%)

      The MAPI controller took over 46G of memory on this node.
      The Dynatrace link for the problematic pod:
      https://zwz85475.apps.dynatrace.com/ui/apps/dynatrace.classic.technologies/#processdetails;gtf=2024-08-21T11:00:00+12:00%20to%202024-08-21T13:00:00+12:00;gf=all;id=PROCESS_GROUP_INSTANCE-E552062AFBF57B4A

      No obvious error was found in the pod. I suspect a memory leak in the MAPI controller, since the memory usage goes up gradually.
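
      If dentry-cache growth is suspected (the root cause later identified in the release note above), it can be checked on the node by watching the kernel's dentry counters while the controller runs. A minimal sketch, assuming a Linux node with /proc mounted:

        package main

        import (
            "fmt"
            "log"
            "os"
            "strings"
        )

        // Reads /proc/sys/fs/dentry-state, whose first two fields are the
        // total and unused dentry counts. A total that climbs steadily
        // while the controller runs points at dentry-cache growth rather
        // than a heap leak inside the process.
        func main() {
            data, err := os.ReadFile("/proc/sys/fs/dentry-state")
            if err != nil {
                log.Fatal(err)
            }
            fields := strings.Fields(string(data))
            if len(fields) < 2 {
                log.Fatal("unexpected dentry-state format")
            }
            fmt.Printf("total dentries: %s, unused: %s\n", fields[0], fields[1])
        }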

      I had to restart the problematic pod to lower the memory usage. I have no idea how to reproduce the issue.

              rh-ee-cschlott Christian Schlotter
              tkong-ocm Tony Kong
              Zhaohua Sun