OpenShift Request For Enhancement / RFE-6365

The customer faces cluster outages due to excessive events in Tekton/build namespaces. They request a feature to control event generation, allowing quotas or suppression to prevent etcd overload, maintain stability, and optimize resource usage.


    • Type: Feature Request
    • Resolution: Unresolved

      1. Proposed title of this feature request 
      "Implement Event Control and Quota Mechanism in OpenShift for Special Use-Case Clusters"

      2. What is the nature and description of the request?

      The customer requests a feature that gives better control over the number of Kubernetes events generated in OpenShift. Specifically, they want to set a quota or limit on the number of events each namespace can produce, so that events cannot exhaust etcd. This is particularly important for special-purpose clusters (e.g., clusters dedicated to Tekton/builds), where many pods are created in rapid succession and generate an excessive number of events that fill etcd storage unnecessarily. The goal is a mechanism that discards or suppresses non-essential events in such clusters without affecting cluster stability.
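
      To make the requested per-namespace limit concrete, the following is a minimal sketch of a token-bucket limiter keyed by namespace. It is not an existing OpenShift API: the class name `NamespaceEventLimiter`, the `qps` and `burst` parameters, and the namespace name `tekton-builds` are all hypothetical and chosen only for illustration. Each namespace may emit up to `burst` events at once, refilled at `qps` events per second; anything beyond that would be discarded before it reaches etcd.

```python
class NamespaceEventLimiter:
    """Illustrative per-namespace token bucket (hypothetical, not an OpenShift API)."""

    def __init__(self, qps: float, burst: int):
        self.qps = qps          # tokens added per second, per namespace
        self.burst = burst      # maximum tokens a namespace can accumulate
        self._buckets = {}      # namespace -> (tokens, time of last decision)

    def allow(self, namespace: str, now: float) -> bool:
        """Return True if an event from `namespace` at time `now` should be stored."""
        # A namespace seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(namespace, (float(self.burst), now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.qps)
        if tokens >= 1.0:
            self._buckets[namespace] = (tokens - 1.0, now)
            return True
        self._buckets[namespace] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = NamespaceEventLimiter(qps=1.0, burst=2)
    # Four events arriving at the same instant: the first two fit the burst.
    print([limiter.allow("tekton-builds", now=0.0) for _ in range(4)])
    # prints [True, True, False, False]
```

      Because the buckets are tracked per namespace, a noisy build namespace would exhaust only its own budget and leave events from other namespaces unaffected.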

      3. Why does the customer need this? (List the business requirements here)

      • Prevent Cluster Downtime: The customer has experienced frequent cluster downtime because etcd reached its capacity after handling too many events (e.g., 800K+ events) in a short period. These events, caused by noisy apps or pipeline runs, are not application-related and add unnecessary load to etcd.
      • Maintain Stability in Special-Purpose Clusters: The customer runs special-purpose clusters dedicated to CI/CD (Tekton/builds), where the generation of large numbers of events is common. Reducing the impact of these events on etcd would improve cluster reliability.
      • Optimize Resource Usage: Allowing users to limit or suppress event generation will reduce resource consumption, leading to more efficient use of etcd storage and cluster resources.
      • Avoid Unwanted Events: Since these events are not essential for normal operation, being able to suppress them will improve cluster performance and prevent issues related to event overload.

      4. List any affected packages or components.

      • etcd: The etcd component becomes overwhelmed by the large number of events.
      • Event Controller: OpenShift's event controller will need an enhancement to support event quotas or suppression policies.
      • Tekton/Build Pipelines: Pipelines are one of the primary generators of high-volume events, so any changes may also affect their components.
      • Kubernetes API Server: The API server might be impacted if the proposed solution involves filtering or suppressing events at the API level.
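
      If suppression were applied at the API level, the policy would need to decide which events are non-essential. The sketch below shows one possible rule under stated assumptions: events of type `Warning` are always kept, and only `Normal` events with selected reasons are dropped, and only in namespaces marked as noisy. The `Event` class, the `should_store` function, and the namespace and reason sets are hypothetical; the request itself does not specify a policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Simplified stand-in for a Kubernetes Event (only the fields used here)."""
    namespace: str
    type: str    # "Normal" or "Warning"
    reason: str  # e.g. "Scheduled", "Pulled", "Created", "Started"


def should_store(event: Event, noisy_namespaces: set, suppressed_reasons: set) -> bool:
    """Return True if the event should be persisted, False if it may be dropped."""
    if event.namespace not in noisy_namespaces:
        return True   # policy applies only to namespaces opted in as noisy
    if event.type == "Warning":
        return True   # never drop warnings, which may signal real problems
    return event.reason not in suppressed_reasons


if __name__ == "__main__":
    noisy = {"tekton-builds"}                  # hypothetical namespace name
    suppressed = {"Pulled", "Created", "Started"}
    events = [
        Event("tekton-builds", "Normal", "Started"),
        Event("tekton-builds", "Warning", "FailedScheduling"),
        Event("payments", "Normal", "Started"),
    ]
    for e in events:
        print(e.namespace, e.reason, should_store(e, noisy, suppressed))
```

      Keeping warnings unconditionally is a deliberate design choice in this sketch: it limits the risk that suppression hides the very failures an operator needs to see.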

              Assignee: Unassigned
              Reporter: Vishvranjan Mishra (rhn-support-vismishr)
              Votes: 4
              Watchers: 5
