  1. OpenShift Bugs
  2. OCPBUGS-35550

Events older than 3 hr are not getting purged automatically in RHOCP4


    • Bug
    • Resolution: Won't Do
    • Critical
    • 4.14.z
    • Etcd
    • Quality / Stability / Reliability
    • False
    • Critical
    • No

      Description of problem:

      Listing events through the Kubernetes API takes an extremely long time to return. A high event count is the suspected cause of this delay.
      
      The customer is using the default event TTL (3 hours) in kube-apiserver; however, events older than 3 hours are not being purged automatically.
      To keep the etcd database relatively small, the customer has to delete events manually every time a large number of events accumulates in a project.
      
      Below are a few events, each about 20 hours old, that are still present in the project:
      ~~~
      20h         Normal    Scheduled                pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e                                                            Successfully assigned notebooks-research/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e to lvshdchpc04.lvs.paypalinc.com
      20h         Warning   FailedMount              pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e                                                            MountVolume.SetUp failed for volume "pvc-721bad8f-05ee-41fb-8066-192a98c6f1d1" : kubernetes.io/csi: mounter.SetUpAt failed to check for STAGE_UNSTAGE_VOLUME capability: rpc error: code = Aborted desc = requests pending
      20h         Normal    AddedInterface           pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e                                                            Add eth0 [192.253.5.26/23] from ovn-kubernetes
      20h         Normal    Pulled                   pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e                                                            Container image "artifactory.paypalinc.com/core-data-platform/git-sync:v3.2.2" already present on machine
      20h         Normal    Created                  pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e                                                            Created container dags-git-clone
      20h         Normal    Started                  pod/vmardavmardavishalresearchemailtask.082003b2e46f44e4a305bcd947f80c0e                                                            Started container dags-git-clone
      ~~~
      
      With the default TTL of 3 hours, each event should be deleted once it reaches that TTL. The customer needs an RCA and a permanent solution for this.
      
      The expectation from this bug is to determine why events are not being deleted according to their TTL, and to provide an RCA and a permanent fix.
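
To help quantify the problem, here is a minimal sketch (not part of the original report) that flags events last seen longer ago than the TTL. It assumes the event list was exported as JSON, for example with `oc get events -n notebooks-research -o json`, and that each item carries the usual `lastTimestamp`, `eventTime`, or `metadata.creationTimestamp` fields. The helper names `parse_ts`, `last_seen`, and `stale_events` are hypothetical, and the 3-hour constant is simply the default quoted above. The newest timestamp on an object is only an approximation of when it was last written to etcd.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the default event TTL quoted in this report.
EVENT_TTL = timedelta(hours=3)


def parse_ts(value):
    """Parse an RFC 3339 timestamp such as '2024-06-13T10:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def last_seen(event):
    """Return the newest timestamp found on an Event object, or None."""
    candidates = [
        event.get("lastTimestamp"),
        event.get("eventTime"),
        event.get("metadata", {}).get("creationTimestamp"),
    ]
    # Fields may be missing or null, so skip anything empty.
    stamps = [parse_ts(c) for c in candidates if c]
    return max(stamps) if stamps else None


def stale_events(event_list, now, ttl=EVENT_TTL):
    """Return (namespace, name, age) for events last seen more than ttl ago."""
    stale = []
    for event in event_list.get("items", []):
        seen = last_seen(event)
        if seen is not None and now - seen > ttl:
            meta = event.get("metadata", {})
            stale.append((meta.get("namespace"), meta.get("name"), now - seen))
    return stale
```

Loading the exported file with `json.load` and calling `stale_events(data, datetime.now(timezone.utc))` would list the events that have outlived the TTL, which gives a concrete count to attach to the RCA request.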

      Version-Release number of selected component (if applicable):

      4.14.6
      

      Actual results:

      Events older than 3 hours are not getting deleted automatically.

      Expected results:

      Events older than 3 hours should get deleted automatically.

      Additional info:

      Urgent assistance is needed: the customer reports this as an ongoing issue that is impacting their cluster.

              jchaloup@redhat.com Jan Chaloupka
              rhn-support-sdharma Suruchi Dharma
              None
              None
              Ke Wang
              None