Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.15.z, 4.16.z
Component/s: OLM
Labels:
- olmv0
- triaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Critical
Regression:
None

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
Ivysaur OLM Sprint 269
sprint_count:
1

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

    ARO SRE found that the presence of openshift-marketplace pods on a master node causes, by a mechanism not yet identified, overall disk write bandwidth high enough to exceed the Azure disk bandwidth limit. Azure then throttles disk operations on that master node, which results in very high etcd write latency (~5s).

After further investigation, we narrowed down the problem by stopping all pods in the openshift-marketplace namespace. Doing so stopped the very high disk write bandwidth and brought etcd latencies back down to a normal level.

Version-Release number of selected component (if applicable):

How reproducible:

    100%

Steps to Reproduce:

    1. Install an ARO cluster with version 4.15.35 or 4.16.30.
    2. In the OpenShift console, go to the alerts page and observe the etcdGRPCRequestsSlow alert.
    3. The etcdGRPCRequestsSlow alert should be flipping between pending and inactive.
    4. Grab the alert's query, run it, and observe the etcd latencies.

Actual results:

    etcd latencies are > 1s, and can even reach 5s or 9s

Expected results:

    etcd latencies should be < 1s, and the alert should be neither pending nor firing
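
The pass/fail criterion above can be written as a small check. This is only a sketch: the 1s threshold comes from this report, while the function name `slow_samples` and the sample values are hypothetical and not taken from the alert definition or from any cluster output.

```python
from typing import Iterable, List

# Threshold taken from this report: etcd latencies are expected to stay below 1s.
THRESHOLD_SECONDS = 1.0


def slow_samples(latencies_s: Iterable[float],
                 threshold_s: float = THRESHOLD_SECONDS) -> List[float]:
    """Return the latency samples (in seconds) that are at or above the threshold."""
    return [value for value in latencies_s if value >= threshold_s]


if __name__ == "__main__":
    # Hypothetical samples for illustration; 5s and 9s mirror the peaks
    # mentioned under "Actual results".
    samples = [0.2, 0.4, 5.0, 0.3, 9.0]
    offenders = slow_samples(samples)
    print(f"{len(offenders)} of {len(samples)} samples violate the expectation: {offenders}")
```

An empty result means every sample met the expected result; any returned value is a latency that would count against it.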

Additional info:

duplicates

OCPBUGS-48697 OLMv0: excessive catalog source snapshots cause severe performance regression [openshift-4.15.z]

Closed

is cloned by

OCPBUGS-58070 High latency etcd disk writes due to openshift-marketplace pods/OLM

Closed

Assignee: Jordan Keister

Reporter: Jose Gavine Cueto

Need Info From: None

Contributors: None

QA Contact: Kui Wang

Doc Contact: None

Votes: 0

Watchers: 4

Created: 2025/04/14 5:49 AM

Updated: 2025/09/13 9:42 AM

Resolved: 2025/04/18 6:14 PM
