Description of problem:
Listing events through the Kube API takes extremely long to return. We suspect that a high event count is the reason for this delay.

The customer is using the default event TTL (3 hours) in kube-apiserver; however, events older than 3 hours are not being purged automatically. To keep the etcd size relatively small, the customer has to delete events manually whenever a large number of events accumulates in a project.

Below are a few events, 20 hours old, that are still present in the project:

~~~
20h Normal Scheduled pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e Successfully assigned notebooks-research/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e to lvshdchpc04.lvs.paypalinc.com
20h Warning FailedMount pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e MountVolume.SetUp failed for volume "pvc-721bad8f-05ee-41fb-8066-192a98c6f1d1" : kubernetes.io/csi: mounter.SetUpAt failed to check for STAGE_UNSTAGE_VOLUME capability: rpc error: code = Aborted desc = requests pending
20h Normal AddedInterface pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e Add eth0 [192.253.5.26/23] from ovn-kubernetes
20h Normal Pulled pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e Container image "artifactory.paypalinc.com/core-data-platform/git-sync:v3.2.2" already present on machine
20h Normal Created pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e Created container dags-git-clone
20h Normal Started pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e Started container dags-git-clone
~~~

With the default TTL of 3 hours, events should be deleted once they reach the TTL. The expectation from this bug is to determine why events are not being deleted as per their TTL. The customer needs an RCA and a permanent fix.
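
To help quantify how many events have outlived the TTL, the check described above can be sketched as a small Python helper. This is only an illustrative sketch, not part of the customer's tooling: the function names `age_to_seconds` and `events_past_ttl` are hypothetical, and it assumes the first column of each listing line is an age in the compact form shown above (for example `20h` or `3h5m`), compared against the 3-hour default TTL mentioned in this report.

```python
import re

# Units assumed to appear in the age column of the event listing (e.g. "20h", "3h5m").
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_AGE_TOKEN = re.compile(r"(\d+)([smhd])")

# Default event TTL of 3 hours, as stated in this report.
DEFAULT_EVENT_TTL_SECONDS = 3 * 3600


def age_to_seconds(age: str) -> int:
    """Convert an age string such as '20h' or '3h5m' to seconds."""
    tokens = _AGE_TOKEN.findall(age)
    # Reject strings that are not made up entirely of <number><unit> tokens.
    if not tokens or "".join(n + u for n, u in tokens) != age:
        raise ValueError(f"unrecognised age: {age!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in tokens)


def events_past_ttl(lines, ttl_seconds=DEFAULT_EVENT_TTL_SECONDS):
    """Return the listing lines whose first column (age) exceeds the TTL."""
    stale = []
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        try:
            age = age_to_seconds(fields[0])
        except ValueError:
            continue  # header row or a line in an unexpected format
        if age > ttl_seconds:
            stale.append(line)
    return stale
```

Feeding the saved event listing for a project into `events_past_ttl` returns only the lines older than the TTL, which gives a quick count of events that should already have been purged.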
Version-Release number of selected component (if applicable):
4.14.6
Actual results:
Events older than 3 hours are not getting deleted automatically.
Expected results:
Events older than 3 hours should get deleted automatically.
Additional info:
Urgent assistance is needed, as the customer has reported this as an ongoing issue that is impacting their cluster.