Description of problem:
It has been observed that the catalog source sync triggers high I/O on the masters where etcd runs. That I/O can trigger an etcd leader election, which resets TTL counters on keys and in particular results in etcd events never clearing. It seems unlikely that a 10-minute catalog update interval factors critically into anyone's operational plans, so we should increase the catalog source sync interval to four hours (240 minutes). This avoids the etcd knock-on effects in the local cluster while also reducing load on quay.io or the customer's local mirrors by roughly 95%.
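For reference, the poll interval can already be overridden per CatalogSource through the documented spec.updateStrategy.registryPoll.interval field. The Go sketch below shows one way an admin could patch a catalog to a 240m interval today; the catalog name "certified-operators" and the "openshift-marketplace" namespace are examples only, and because the default catalogs are managed by the marketplace operator such an override may be reconciled away.

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/dynamic"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Load the local kubeconfig; in-cluster config would work just as well.
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client, err := dynamic.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }

    // CatalogSource is served by operators.coreos.com/v1alpha1.
    gvr := schema.GroupVersionResource{
        Group:    "operators.coreos.com",
        Version:  "v1alpha1",
        Resource: "catalogsources",
    }

    // Merge-patch the documented registryPoll interval up to 240m.
    patch := []byte(`{"spec":{"updateStrategy":{"registryPoll":{"interval":"240m"}}}}`)
    _, err = client.Resource(gvr).Namespace("openshift-marketplace").Patch(
        context.TODO(), "certified-operators", types.MergePatchType, patch, metav1.PatchOptions{})
    if err != nil {
        panic(err)
    }
    fmt.Println("certified-operators poll interval set to 240m")
}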
Version-Release number of selected component (if applicable):
All, but let's limit the fix to 4.18-4.22
How reproducible:
100%
Steps to Reproduce:
1. Observe the catalog source sync interval on any cluster
Actual results:
Catalog sources sync every 10 minutes
Expected results:
Catalog sources sync every 240 minutes (4 hours)
Additional info:
While I suspect the backend load on our infrastructure or the customer's infrastructure isn't severe, we should ensure an appropriate jitter is added so that we avoid any thundering herd effects from a mass reboot such as a datacenter outage. A random delay of up to 10 minutes is probably sufficient. We should consider whether an admin who wants to update the catalog immediately needs a method to skip the jitter, but "restart the pod and wait up to 10 minutes" is probably acceptable. We should also make sure that our release notes mention this change and that we document the preferred path for updating the catalog source on demand.
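A minimal sketch of the proposed jitter, assuming a 240-minute poll interval and a random startup delay of up to 10 minutes; pollCatalogWithJitter and syncCatalog are hypothetical names, not the actual OLM code.

package main

import (
    "fmt"
    "math/rand"
    "time"
)

// pollCatalogWithJitter delays the first sync by a random amount of up to
// maxJitter, then syncs every interval. Names and intervals here are
// illustrative only.
func pollCatalogWithJitter(interval, maxJitter time.Duration, syncCatalog func()) {
    jitter := time.Duration(rand.Int63n(int64(maxJitter)))
    fmt.Printf("delaying first catalog sync by %s to avoid a thundering herd\n", jitter.Round(time.Second))
    time.Sleep(jitter)

    syncCatalog() // first sync after the jittered delay

    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for range ticker.C {
        syncCatalog()
    }
}

func main() {
    pollCatalogWithJitter(240*time.Minute, 10*time.Minute, func() {
        fmt.Println("syncing catalog source at", time.Now().Format(time.RFC3339))
    })
}

Under this sketch, an admin restarting the pod would wait at most 10 minutes for the first sync, which matches the "restart the pod and wait up to 10 minutes" path described above.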
Relates to: OCPBUGS-57118 (New) - certified-operators are failing regularly due to startup probe timing out frequently and generating alert for KubePodCrashLooping