-
Bug
-
Resolution: Duplicate
Description of problem:
On ROSA 4.14.20 running cluster-logging v5.9.5, SRE observed disk pressure on master nodes because the on-node vector process was not releasing files that had been deleted.
The issue surfaced as a disk pressure taint on one master node, with a second master node reporting nearly the same utilization (slightly under the diskPressure threshold). When SRE accessed the node to determine which component or directory was consuming the most space, df -h indicated that the root disk was nearly full, while du -sh reported substantially less utilization across all directories.
The reason for the discrepancy is that du does not count files that have been deleted on disk but whose file handles remain open, whereas df reports filesystem-level usage, which still includes them. Running lsof | grep '(deleted)' lists these files along with the process keeping each one open.
In both cases, a process called vector was keeping /var/log/kube-apiserver/audit*.log files open long after the files had been deleted on disk. Deleting the corresponding collector pod closed the files and resolved the disk pressure issue.
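For triage on a node where lsof is unavailable or too slow, the same check can be approximated by reading the /proc file-descriptor symlinks directly. The following is a minimal sketch and not part of the original investigation: the function names find_deleted_open_files and bytes_held_by_pid are hypothetical, and it assumes a Linux host where the kernel appends " (deleted)" to the link target of an unlinked file.

```python
import os

# Linux appends this suffix to the /proc/<pid>/fd/<n> link target
# once the underlying file has been unlinked.
DELETED_SUFFIX = " (deleted)"


def find_deleted_open_files(proc_root="/proc"):
    """Return (pid, path, size_bytes) for each open descriptor whose file was deleted."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
            except OSError:
                continue  # descriptor closed while we were scanning
            if not target.endswith(DELETED_SUFFIX):
                continue
            try:
                # Following the link stats the still-open file, so the size
                # reflects the space the filesystem cannot yet reclaim.
                size = os.stat(link).st_size
            except OSError:
                size = 0  # size unknown; still report the descriptor
            path = target[: -len(DELETED_SUFFIX)]
            results.append((int(entry), path, size))
    return results


def bytes_held_by_pid(records):
    """Sum the sizes of deleted-but-open files per process ID."""
    totals = {}
    for pid, _path, size in records:
        totals[pid] = totals.get(pid, 0) + size
    return totals
```

Run as root on the affected node, bytes_held_by_pid(find_deleted_open_files()) gives a rough per-process total that can be compared against the gap between df and du. Matches can also include non-disk entries such as deleted memfd objects, so the paths should be reviewed before drawing conclusions.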
Version-Release number of selected component (if applicable):
cluster logging v5.9.5
How reproducible:
Unknown, potentially always
Steps to Reproduce:
- ...
Actual results:
Node disk utilization is unexpectedly high, causing a disk-pressure taint to be applied to the node and disrupting cluster operation.
Expected results:
Disk utilization does not increase substantially as a result of deploying cluster-logging.
Additional info:
- duplicates LOG-5866 Vector not releasing deleted file handles (Closed)