Feature Request
Resolution: Done
Major
1. Proposed title of this feature request
Prevent must-gathers from growing excessively large because of rotated pod logs.
2. What is the nature and description of the request?
The must-gathers collected from the clusters of one of our partners are nearly 100 GiB each.
They have been analyzed in depth, and the conclusions below use one of them as a reference.
These are the projects using the most disk space:
37G openshift-sdn
20G openshift-image-registry
15G openshift-multus
2.6G openshift-dns
1.4G openshift-monitoring
1.1G openshift-ingress
A closer look showed that the main cause of this size is the rotated logs. For example, these are the sizes of the pod directories in openshift-sdn:
307M sdn-lhqcn
285M sdn-dtc5k
283M sdn-9sssm
282M sdn-b7n5g
277M sdn-fmb5n
274M sdn-v7fpd
274M sdn-62nhf
273M sdn-mj7w9
273M sdn-2kqpv
[...]
Both previous.log and current.log consume a lot of space, but the directory holding the rotated logs consumes even more. The following example is for pod sdn-lhqcn:
228M rotated
44M previous.log
35M current.log
Contents of the rotated directory listed above:
51M 3.log.20230718-110020
51M 3.log.20230622-145341
51M 2.log.20230303-025003
51M 1.log.20230216-031211
7.1M 3.log.20230515-043944.gz
7.0M 3.log.20230622-145341.gz
7.0M 3.log.20230605-082442.gz
7.0M 3.log.20230426-183927.gz
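To reproduce this kind of breakdown across a whole must-gather, the following minimal Python sketch sums file sizes per project and per log type (rotated, previous.log, current.log, other). It assumes, hypothetically, that project directories sit under a directory named `namespaces` and that rotated logs are any files inside a directory named `rotated`, as in the sdn-lhqcn example. The function names are illustrative only and are not part of oc or must-gather.

```python
import os
from collections import defaultdict


def classify(rel_path):
    """Return 'rotated', 'previous', 'current' or 'other' for a relative path."""
    parts = rel_path.split(os.sep)
    if "rotated" in parts[:-1]:  # any file inside a 'rotated' directory, .gz included
        return "rotated"
    if parts[-1] == "previous.log":
        return "previous"
    if parts[-1] == "current.log":
        return "current"
    return "other"


def namespace_of(rel_path):
    """Return the directory following 'namespaces' (assumed layout), or '' if none."""
    parts = rel_path.split(os.sep)[:-1]  # directory components only
    if "namespaces" in parts:
        idx = parts.index("namespaces")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return ""


def summarize(entries):
    """Sum (rel_path, size_in_bytes) pairs by (namespace, category)."""
    totals = defaultdict(int)
    for rel_path, size in entries:
        totals[(namespace_of(rel_path), classify(rel_path))] += size
    return dict(totals)


def walk_sizes(root):
    """Yield (path relative to root, size in bytes) for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):  # skips broken symlinks
                yield os.path.relpath(full, root), os.path.getsize(full)
```

Usage: `summarize(walk_sizes("path/to/must-gather"))` returns a dictionary keyed by `(project, category)`; sorting its items by value shows which projects are dominated by rotated logs. Keeping the pure `summarize` step separate from the filesystem walk makes the classification easy to check on synthetic paths.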
For a cluster like this one, which hosts more than 200 nodes, this adds up to a huge amount of disk space.
The request is to prevent this. The following additional must-gather options are suggested as possible remedies (they are not mutually exclusive):
- --exclude-rotated-logs: rotated logs shall not be included in the must-gather.
- --exclude-projects=: the OpenShift projects given as a comma-separated list shall not be included in the must-gather.
- --max-pod-log-size=: no pod's logs shall exceed the specified size in KiB. If they do, their oldest lines shall be truncated.
- --exclude-previous-logs: previous.log files shall be excluded. This option should probably imply --exclude-rotated-logs.
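To make the proposed semantics concrete for discussion, the following Python sketch shows one possible interpretation: which files each exclusion option would drop, and how --max-pod-log-size could remove the oldest whole lines first. `GatherOptions`, `should_collect`, `truncate_oldest` and `apply_size_limit` are hypothetical names; the `namespaces/<project>/...` layout and the `rotated` directory name are assumptions based on the example above, and the sketch does not correspond to any existing oc code.

```python
import os
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass
class GatherOptions:
    """Hypothetical counterparts of the flags proposed in this request."""
    exclude_rotated_logs: bool = False
    exclude_previous_logs: bool = False  # also implies excluding rotated logs
    exclude_projects: FrozenSet[str] = field(default_factory=frozenset)
    max_pod_log_size_kib: Optional[int] = None


def should_collect(rel_path, opts):
    """Decide whether a file (path relative to the must-gather root) is kept."""
    parts = rel_path.split(os.sep)
    dirs = parts[:-1]
    if "namespaces" in dirs:
        idx = dirs.index("namespaces")
        if idx + 1 < len(dirs) and dirs[idx + 1] in opts.exclude_projects:
            return False
    if "rotated" in dirs and (opts.exclude_rotated_logs or opts.exclude_previous_logs):
        return False
    if parts[-1] == "previous.log" and opts.exclude_previous_logs:
        return False
    return True


def truncate_oldest(data, max_bytes):
    """Keep the newest lines of a log so that it fits in max_bytes.

    Whole lines are dropped from the start. If the newest line alone is
    longer than the limit, only its last max_bytes are kept.
    """
    if max_bytes <= 0:
        return b""
    if len(data) <= max_bytes:
        return data
    tail = data[-max_bytes:]
    starts_at_line = data[-max_bytes - 1:-max_bytes] == b"\n"
    nl = tail.find(b"\n")
    # Drop the partial first line, unless that would leave nothing behind.
    if not starts_at_line and nl != -1 and nl + 1 < len(tail):
        tail = tail[nl + 1:]
    return tail


def apply_size_limit(data, opts):
    """Apply the proposed --max-pod-log-size (given in KiB) to one log."""
    if opts.max_pod_log_size_kib is None:
        return data
    return truncate_oldest(data, opts.max_pod_log_size_kib * 1024)
```

Cutting on line boundaries avoids leaving a garbled first line in a truncated log, and making exclude_previous_logs also drop rotated files reflects the suggestion that --exclude-previous-logs imply --exclude-rotated-logs.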
Some of these options might not be suitable; they are only suggestions to put on the table for discussion. Alternative ideas are welcome too.
3. Why does the customer need this? (List the business requirements here)
The size of these must-gathers significantly limits our ability to support our partner.
Each must-gather takes hours to create and compress, and more time to upload or share.
- is blocked by
  - WRKLDS-950 must-gather: pass envs into must-gather images to enhance control over data collection (Closed)
- is related to
  - WRKLDS-994 must-gather: pass envs into must-gather images to enhance control over data collection (Closed)
  - OCPSTRAT-1340 Provide option to collect must-gather based on time stamp: promote to GA (Closed)
- relates to
  - OCPSTRAT-791 Prevent must-gather from filling up master node (Closed)
  - OCPSTRAT-1040 Experimental in 4.16-->Provide option to collect must-gather based on time stamp (--since and --until) (Closed)
- split to
  - WRKLDS-859 Define configurable default limit to emptydir volume in must gather pod (Closed)