Type: Feature Request
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: must-gather
Labels:
- 4.17-candidate
- rfe-accepted-to-approved-unresolve

Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected

1. Proposed title of this feature request

Prevent must-gathers from growing excessively large because of "rotated" pod logs.

2. What is the nature and description of the request?

The must-gathers from the clusters of one of our partners take up nearly 100 GiB each.

They have been analyzed in depth. The conclusions below use one of them as a reference.

These are the projects using the most disk space:

37G    openshift-sdn
20G    openshift-image-registry
15G    openshift-multus
2.6G    openshift-dns
1.4G    openshift-monitoring
1.1G    openshift-ingress

A closer look shows that the rotated logs are the cause of that size. For example, these are the sizes of the pods in openshift-sdn:

307M    sdn-lhqcn
285M    sdn-dtc5k
283M    sdn-9sssm
282M    sdn-b7n5g
277M    sdn-fmb5n
274M    sdn-v7fpd
274M    sdn-62nhf
273M    sdn-mj7w9
273M    sdn-2kqpv
[...]

The previous.log and current.log files both consume a lot of space, but the directory holding the rotated logs consumes even more. The following example is for the pod sdn-lhqcn:

228M    rotated
44M    previous.log
35M    current.log

Content of directory rotated listed above:

51M    3.log.20230718-110020
51M    3.log.20230622-145341
51M    2.log.20230303-025003
51M    1.log.20230216-031211
7.1M    3.log.20230515-043944.gz
7.0M    3.log.20230622-145341.gz
7.0M    3.log.20230605-082442.gz
7.0M    3.log.20230426-183927.gz
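
The breakdown above can be reproduced on an extracted must-gather with a short script. This is only a sketch: the function name log_usage_by_kind is made up for illustration, and it assumes that rotated logs live in a directory named "rotated" next to current.log and previous.log, as in the listing above.

```python
import os
from collections import defaultdict


def log_usage_by_kind(root):
    """Sum file sizes in bytes under root, grouped by kind of pod log.

    Kinds: "rotated" (anything below a directory named "rotated"),
    "previous.log", "current.log", and "other" for everything else.
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        # Assumption: rotated logs sit under a directory literally named "rotated".
        rel_parts = os.path.relpath(dirpath, root).split(os.sep)
        in_rotated = "rotated" in rel_parts
        for name in filenames:
            # lstat does not follow symlinks, so broken links do not raise.
            size = os.lstat(os.path.join(dirpath, name)).st_size
            if in_rotated:
                kind = "rotated"
            elif name in ("previous.log", "current.log"):
                kind = name
            else:
                kind = "other"
            totals[kind] += size
    return dict(totals)


if __name__ == "__main__":
    import sys

    # Usage: python log_usage.py <path-to-extracted-must-gather>
    for kind, size in sorted(log_usage_by_kind(sys.argv[1]).items()):
        print(f"{size / 2**20:10.1f} MiB  {kind}")
```

Running it against a single pod directory or a whole project directory gives the same kind of split as shown for sdn-lhqcn.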

Across a cluster like this one, which has more than 200 nodes, this adds up to a huge amount of disk space.
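
To put the sdn-lhqcn figures listed above in proportion, the arithmetic can be written out as below. It uses only the three sizes reported for that pod and says nothing about other pods or projects.

```python
# Sizes reported above for pod sdn-lhqcn, in MiB (as printed by du).
sizes_mib = {"rotated": 228, "previous.log": 44, "current.log": 35}

total = sum(sizes_mib.values())                    # matches the 307M shown for the pod
rotated_share = sizes_mib["rotated"] / total       # fraction taken by rotated logs
without_rotated = total - sizes_mib["rotated"]     # previous.log + current.log
only_current = sizes_mib["current.log"]            # current.log alone

print(f"total:                {total} MiB")
print(f"rotated share:        {rotated_share:.0%}")
print(f"without rotated logs: {without_rotated} MiB")
print(f"current.log only:     {only_current} MiB")
```

For this pod, the rotated directory accounts for roughly three quarters of the collected log data.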

The request is to prevent this from happening. Below are some additional must-gather options that might help (they are not mutually exclusive):

--exclude-rotated-logs: rotated logs shall not be included in the must-gather.
--exclude-projects=: the OpenShift projects given in a comma-separated list shall not be included in the must-gather.
--max-pod-log-size=: the logs of every pod shall not be larger than the specified number of KiB. If they are, their oldest log lines shall be dropped.
--exclude-previous-logs: previous.log files shall be excluded. This option should probably imply --exclude-rotated-logs.
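
The intended behavior of these options can be sketched as follows. None of them exist in must-gather today; the sketch only illustrates the proposed semantics. The function names select_log_files and truncate_oldest are hypothetical, and the example paths are simplified.

```python
def select_log_files(paths, exclude_rotated=False, exclude_previous=False):
    """Return the log paths that would be kept under the proposed options.

    paths are "/"-separated relative paths. Rotated logs are assumed to sit
    under a directory named "rotated".
    """
    if exclude_previous:
        # The proposal suggests that excluding previous logs implies
        # excluding rotated logs as well.
        exclude_rotated = True
    kept = []
    for path in paths:
        parts = path.split("/")
        if exclude_rotated and "rotated" in parts:
            continue
        if exclude_previous and parts[-1] == "previous.log":
            continue
        kept.append(path)
    return kept


def truncate_oldest(data: bytes, max_kib: int) -> bytes:
    """Keep only the newest complete lines that fit in max_kib KiB."""
    limit = max_kib * 1024
    if len(data) <= limit:
        return data
    if limit <= 0:
        return b""
    cut = len(data) - limit
    tail = data[cut:]
    # If the cut falls in the middle of a line, drop that partial line.
    # A single line longer than the limit is dropped entirely.
    if data[cut - 1:cut] != b"\n":
        newline = tail.find(b"\n")
        tail = tail[newline + 1:] if newline != -1 else b""
    return tail


if __name__ == "__main__":
    example = [
        "sdn/logs/current.log",
        "sdn/logs/previous.log",
        "sdn/logs/rotated/3.log.20230718-110020",
    ]
    print(select_log_files(example, exclude_rotated=True))
    print(select_log_files(example, exclude_previous=True))
```

Dropping whole lines rather than cutting at an exact byte offset keeps the remaining log parseable, at the cost of the result being slightly smaller than the limit.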

Some of the suggested options might not be suitable. They are meant only as starting points for discussion, and alternative ideas are welcome too.

3. Why does the customer need this? (List the business requirements here)

The fact that the must-gathers are so huge is significantly limiting our capacity to provide support to our partner.

Every must-gather takes hours to create and compress. It also takes a long time to upload or share.

is blocked by

WRKLDS-950 must-gather: pass envs into must-gather images to enhance control over data collection

Closed

is related to

WRKLDS-994 must-gather: pass envs into must-gather images to enhance control over data collection

Closed

OCPSTRAT-1340 Provide option to collect must-gather based on time stamp: promote to GA

Closed

relates to

OCPSTRAT-791 Prevent must-gather from filling up master node

Closed

OCPSTRAT-1040 Experimental in 4.16-->Provide option to collect must-gather based on time stamp (--since and --until)

Closed

split to

WRKLDS-859 Define configurable default limit to emptydir volume in must gather pod

Closed

links to

How to reduce the size of must-gathers

KCS 5459251: Creating must-gather with more details for specific components in OCP 4

openshift/enhancements#1487: Implementation details for `--all-images`
