Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 4.15.0
Affects Version/s: 4.14, 4.15, 4.16
Component/s: HyperShift
Labels: triaged
Severity: Important
Regression: No
Release Blocker: Proposed
Blocked: False
Blocked Reason: None
Release Note Text:

Cause - Kube API Server pods are OOMKilled when a Hosted Cluster with a large number of worker nodes is upgraded.
Consequence - ROSA HCP clusters can’t be upgraded if the total number of worker nodes is over 51.
Fix - Exposed GOMEMLIMIT and GOGC for kube-apiserver via annotations in HostedCluster.
Result - ROSA HCP clusters with more than 51 worker nodes can now be upgraded.
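
To illustrate the fix described above, here is a minimal sketch in Go of how annotation values on a HostedCluster could be turned into the GOGC and GOMEMLIMIT environment variables for the kube-apiserver container. The annotation keys and the function name kasGCEnv are hypothetical placeholders chosen for this example, not the names used by HyperShift; only the GOGC and GOMEMLIMIT variable names come from the release note. The example values ("50", "8GiB") are illustrative, not recommended settings.

```go
package main

import "fmt"

// Hypothetical annotation keys, used only for illustration.
// Check the HyperShift API for the real annotation names.
const (
	gogcAnnotation       = "example.invalid/kube-apiserver-gogc"
	gomemlimitAnnotation = "example.invalid/kube-apiserver-gomemlimit"
)

// kasGCEnv maps HostedCluster-style annotations to "NAME=value" entries for
// the Go runtime variables GOGC and GOMEMLIMIT. An absent or empty annotation
// produces no entry, so the Go runtime default stays in effect for that knob.
func kasGCEnv(annotations map[string]string) []string {
	var env []string
	if v := annotations[gogcAnnotation]; v != "" {
		env = append(env, "GOGC="+v)
	}
	if v := annotations[gomemlimitAnnotation]; v != "" {
		env = append(env, "GOMEMLIMIT="+v)
	}
	return env
}

func main() {
	// Illustrative values only: a lower GOGC trades CPU for a smaller heap,
	// and GOMEMLIMIT sets a soft memory limit for the Go runtime.
	env := kasGCEnv(map[string]string{
		gogcAnnotation:       "50",
		gomemlimitAnnotation: "8GiB",
	})
	for _, e := range env {
		fmt.Println(e)
	}
}
```

Leaving an annotation unset keeps the corresponding runtime default, so under this sketch clusters that do not need tuning are unaffected.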

Target Version: 4.15.0
Target Backport Versions: 4.14.z, 4.15.z


This is a clone of issue ~~OCPBUGS-27817~~. The following is the description of the original issue:
---
Description of problem:

When a ROSA HCP cluster with a large number of worker nodes (> 51) is upgraded, the cluster's Kube APIServer pods consume more memory than their nodes can provide and are OOMKilled.

Version-Release number of selected component (if applicable):

   4.14, 4.15

How reproducible:

    always

Steps to Reproduce:

    1. Create a ROSA HCP cluster.
    2. Add 100 workers to the cluster.
    3. Upgrade the cluster.

Actual results:

    Kube APIServer pods are OOMKilled

Expected results:

    Upgrade completes successfully

Additional info:

blocks: OCPBUGS-29206 High memory usage by Kube APIServer on HostedCluster upgrades (Closed)
clones: OCPBUGS-27817 High memory usage by Kube APIServer on HostedCluster upgrades (Closed)
is blocked by: OCPBUGS-27817 High memory usage by Kube APIServer on HostedCluster upgrades (Closed)
is cloned by: OCPBUGS-29206 High memory usage by Kube APIServer on HostedCluster upgrades (Closed)
links to:
- openshift/hypershift#3457: [release-4.15] OCPBUGS-27818: Add GC knobs for KAS
- RHSA-2023:7198 OpenShift Container Platform 4.15 security update

Assignee: Alberto Garcia Lamela
Reporter: OpenShift Prow Bot
QA Contact: He Liu
Votes: 0
Watchers: 6
Created: 2024/01/23 9:05 PM
Updated: 2024/02/27 9:08 PM
Resolved: 2024/02/13 2:26 PM
