Operator Runtime / OPRUN-4390

Avoid unconditional CatalogSource pod creation during catalog polling


    • Type: Story
    • Resolution: Unresolved
    • Priority: Normal

      Description:

      OLMv0 periodically polls CatalogSources to check for catalog upgrades. Currently,
      this polling mechanism always creates a CatalogSource pod, regardless of whether
      the catalog image has actually changed.

      OLMv0 performs the following actions on every polling interval (10 minutes historically, 15 minutes by default today):

      • Pull the CatalogSource image
      • Create a CatalogSource pod
      • Compare it with the existing catalog pod using the pod hash
      • Switch traffic if a real update is detected
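      As a rough illustration, the per-interval flow above can be modeled in a few lines of Go.
      The `Action` constants and the `currentPoll` function are hypothetical names, not code
      from OLMv0, and `imageChanged` stands in for the outcome of the pod-hash comparison. The
      no-change branch ends with the DELETE Pod step listed for that case in this story. The
      sketch only makes the ordering explicit: pod creation precedes the comparison, so it
      happens on every poll.

```go
package main

import "fmt"

// Action names one step of the (hypothetical) current poll model.
type Action string

const (
	PullImage     Action = "pull image"
	CreatePod     Action = "create pod"
	CompareHash   Action = "compare pod hash"
	SwitchTraffic Action = "switch traffic"
	DeletePod     Action = "delete new pod"
)

// currentPoll models the unconditional flow: a pod is created on every
// interval and only afterwards compared with the existing catalog pod.
// imageChanged stands in for the result of the pod-hash comparison.
func currentPoll(imageChanged bool) []Action {
	actions := []Action{PullImage, CreatePod, CompareHash}
	if imageChanged {
		actions = append(actions, SwitchTraffic)
	} else {
		// Nothing changed, so the freshly created pod is thrown away.
		actions = append(actions, DeletePod)
	}
	return actions
}

func main() {
	fmt.Println(currentPoll(false))
	fmt.Println(currentPoll(true))
}
```

      In both branches `CreatePod` appears, which is the cost this story aims to remove.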

      When there is no upgrade (i.e., no catalog image change), OLMv0 still performs
      the following operations:

      • GET ServiceAccount
      • GET NetworkPolicy (x2)
      • GET Service
      • LIST Pods
      • CREATE Pod (writes to etcd)
      • Wait for the Pod to become Ready
      • DELETE Pod (writes to etcd)
      • PATCH CatalogSource/status (writes to etcd)

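      With 4 default CatalogSources polled at the historical 10-minute interval (3 etcd
      writes per poll: CREATE Pod, DELETE Pod, and PATCH CatalogSource/status), this
      behavior results in approximately: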

      • 72 etcd writes per hour
      • 1,728 etcd writes per day
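      The write counts above follow from simple arithmetic, sketched below in Go. The operation
      list is copied from the no-change case in this story; the function names are illustrative
      only, and the "Wait for the Pod to become Ready" step is left out because it is not itself
      a write. The sketch assumes the interval divides 60 minutes evenly and also shows the
      result at the 15-minute default.

```go
package main

import "fmt"

// Operations issued on a poll where the catalog image did not change,
// copied from this story; writes marks calls that persist to etcd.
// "Wait for the Pod to become Ready" is omitted: it is not a write.
var noChangeOps = []struct {
	name   string
	writes bool
}{
	{"GET ServiceAccount", false},
	{"GET NetworkPolicy", false},
	{"GET NetworkPolicy", false},
	{"GET Service", false},
	{"LIST Pods", false},
	{"CREATE Pod", true},
	{"DELETE Pod", true},
	{"PATCH CatalogSource/status", true},
}

// writesPerPoll counts etcd writes in one no-change poll of one CatalogSource.
func writesPerPoll() int {
	n := 0
	for _, op := range noChangeOps {
		if op.writes {
			n++
		}
	}
	return n
}

// etcdWrites returns writes per hour and per day for the given number of
// CatalogSources and polling interval in minutes (assumed to divide 60).
func etcdWrites(catalogs, intervalMinutes int) (perHour, perDay int) {
	pollsPerHour := 60 / intervalMinutes
	perHour = writesPerPoll() * catalogs * pollsPerHour
	return perHour, perHour * 24
}

func main() {
	for _, interval := range []int{10, 15} {
		h, d := etcdWrites(4, interval)
		fmt.Printf("%d min: %d writes/hour, %d writes/day\n", interval, h, d)
	}
}
```

      At 10 minutes this gives 72 writes per hour and 1,728 per day; at 15 minutes it gives
      48 per hour and 1,152 per day. A longer interval shrinks the numbers but never reaches
      zero, which is why the story treats this as a design issue.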

      As part of OCPBUGS-69441, the default polling interval is being increased to
      mitigate the impact of this behavior. However, this is fundamentally a design
      issue rather than a tuning problem.

      A potential long-term improvement would be to query the container registry
      (e.g., via manifest digest) to determine whether the catalog image has changed
      before creating a CatalogSource pod, and only proceed with pod creation when a
      real update is detected.

      Given that OLMv0 is in maintenance mode, this work is explicitly captured as
      tech debt and is out of scope for the current bug fix.

      For now, a short-term workaround has landed in operator-marketplace via PR https://github.com/operator-framework/operator-marketplace/pull/695.


      Acceptance Criteria: 

      • A design is identified that allows OLMv0 to determine whether a catalog image
          has changed prior to creating a CatalogSource pod.
      • When no catalog image change is detected, OLMv0 avoids pod creation and the
          associated etcd writes.
      • Any proposed change preserves existing CatalogSource upgrade behavior and
          failure modes.

    • Assignee: Unassigned
    • Reporter: Rashmi Gottipati (rashmigottipati)
    • Votes: 0
    • Watchers: 1