OpenShift Bugs / OCPBUGS-69441

10m catalog sync interval contributes to unbounded etcd growth


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: 4.18.z, 4.19.z, 4.20.z, 4.21.z, 4.22.0
    • Component/s: OLM
    • False
    • Critical
    • Rejected
    • V* Sprint 292
    • 1

      Description of problem:

      It has been observed that the catalog sync causes high I/O on the control plane (master) nodes where etcd runs. This in turn triggers an etcd leader election, which resets the TTL counters on keys; in particular, etcd events never expire and are never cleared.
      
      It seems unlikely that a 10-minute catalog update interval factors critically into anyone's operational plans. We should therefore increase the catalog source sync interval to four hours. This avoids the knock-on effects on etcd in the local cluster and also reduces the load on quay.io, or on customers' local mirrors, by roughly 95%.

      Version-Release number of selected component (if applicable):

      All, but let's only bother with 4.18 through 4.22.

      How reproducible:

      100% 

      Steps to Reproduce:

      Observe the interval between catalog updates.
          

      Actual results:

      Happens every 10 minutes

      Expected results:

      Happens every 240 minutes

      Additional info:

      While I suspect the backend load on our infrastructure or the customer's infrastructure isn't horrible, it would be good to ensure that an appropriate jitter is added so that we avoid thundering herd effects after a mass reboot, such as one following a datacenter outage. A random sleep of up to 10 minutes is probably sufficient. We should consider whether an admin who wants to update the catalog immediately needs a way to skip the jitter, but "restart the pod and wait up to 10 minutes" is probably not horrible.
      
      We should also make sure that our release notes mention this change and that we document the preferred way to force an immediate catalog source update.

              Catherine Chan-Tse (rh-ee-cchantse)
              Scott Dodson (rhn-support-sdodson)
              Jian Zhang
              Votes: 0
              Watchers: 3
