OCPBUGS-35503

[4.16] metal3 pod produces too many logs and eats up the node disk space


    • Critical
    • Release Note Not Required
    • In Progress

      This is a clone of bug OCPBUGS-35211, so that the fix can be backported to 4.16.
      ------
      Description of problem:

      The ACM perf/scale hub OCP cluster has 3 bare-metal nodes, each with a 480GB installation disk. The metal3 pod uses too much disk space for logs, which puts the node under disk pressure and starts evicting pods, which in turn makes ACM stop provisioning clusters.
      Below are the log sizes of the metal3 pod:
      # du -h -d 1 /sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83
      4.0K	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/machine-os-images
      276M	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/metal3-httpd
      181M	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/metal3-ironic
      384G	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/metal3-ramdisk-logs
      77M	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/metal3-ironic-inspector
      385G	/sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83
      
      # ls -l -h /sysroot/ostree/deploy/rhcos/var/log/pods/openshift-machine-api_metal3-9df7c7576-9t7dd_7c72c6d6-168d-4c8e-a3c3-3ce8c0518b83/metal3-ramdisk-logs
      total 384G
      -rw-------. 1 root root 203G Jun 10 12:44 0.log
      -rw-r--r--. 1 root root 6.5G Jun 10 09:05 0.log.20240610-084807.gz
      -rw-r--r--. 1 root root 8.1G Jun 10 09:27 0.log.20240610-090606.gz
      -rw-------. 1 root root 167G Jun 10 09:27 0.log.20240610-092755

      The logs are too large to attach. Please contact me if you need access to the cluster to investigate.
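
      For reference, a minimal way to spot the runaway log files on a hub node (a sketch only; <node-name> is a placeholder and the path layout is assumed to match the output above):

      # oc debug node/<node-name>
      # chroot /host
      # find /var/log/pods/openshift-machine-api_metal3-* -type f -size +1G -exec ls -lh {} \;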


      Version-Release number of selected component (if applicable):

      The issue occurs on 4.16.0-rc4; 4.16.0-rc3 does not have the issue.

      How reproducible:

       

      Steps to Reproduce:

      1. Install the latest ACM 2.11.0 build on OCP 4.16.0-rc4 and deploy 3500 SNOs on bare-metal hosts.
      

      Actual results:

      ACM stops deploying the remaining SNOs after 1913 SNOs are deployed because the ACM pods are being evicted.
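
      For context, a quick way to confirm the disk-pressure evictions on the hub (a minimal sketch; assumes cluster-admin access to the hub cluster, <node-name> is a placeholder):

      # oc get events -A --field-selector reason=Evicted
      # oc describe node <node-name> | grep -i pressure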

      Expected results:

      All 3500 SNOs are deployed.

      Additional info:

       

            rh-ee-masghar Mahnoor Asghar
            rhn-support-txue Ting Xue
            Jad Haj Yahya