-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
4.14.z
-
None
-
Quality / Stability / Reliability
-
False
-
-
None
-
Critical
-
None
-
None
-
None
-
Rejected
-
Mewtwo Sprint 273
-
1
Description of problem:
platform: ARO
OCP Version: 4.16.37
ARO SRE found that openshift-marketplace pods running on a master node appear to cause disk I/O contention. The following symptoms were observed on master-2 in one customer cluster:
- A few openshift-marketplace pods were spiking in CPU usage, above kube-apiserver and the other usual top CPU consumers.
- etcd latencies ranged from over 1s up to about 9s.
- VM disk queue length and I/O bandwidth were high relative to the average.
We suspect this may be a regression of the fix for OCPBUGS-48697 (OLMv0: excessive catalog source snapshots cause severe performance regression [openshift-4.15.z]). Please investigate or help us rule this out. The SRE team needs OLM expertise to confirm whether this bug is present in the customer's cluster.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Install an ARO cluster with version 4.16.37.
2. Wait for some time, perhaps install operators and put an average load on etcd, or do anything else that simulates realistic cluster and OLM usage.
3. In the OpenShift console, go to the alerts page and find the etcdGRPCRequestsSlow alert.
4. The etcdGRPCRequestsSlow alert should be flipping between pending and inactive, or firing.
5. Copy the alert's query, run it, and observe the etcd latencies.
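The last step can also be scripted. Below is a minimal Python sketch that takes the JSON body returned by a Prometheus instant query and lists the series whose latency exceeds 1s. The query string is an assumption modeled on the upstream etcd alerting rules, not copied from this cluster; use the expression shown on the alert itself. The function name slow_series and the 1s default threshold are illustrative choices.

```python
import json
import math

# ASSUMPTION: modeled on the upstream etcd alerting rules. Copy the real
# expression from the etcdGRPCRequestsSlow alert in the console instead.
ASSUMED_QUERY = (
    "histogram_quantile(0.99, sum without(grpc_type) ("
    'rate(grpc_server_handling_seconds_bucket{job=~".*etcd.*", '
    'grpc_type="unary"}[5m])))'
)

# Illustrative threshold: the report expects etcd latencies to stay below 1s.
THRESHOLD_SECONDS = 1.0


def slow_series(response_text, threshold=THRESHOLD_SECONDS):
    """Return (labels, seconds) pairs above the threshold, slowest first.

    response_text is the JSON body of a Prometheus instant query
    (GET /api/v1/query) whose result type is "vector".
    """
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error", "unknown"))

    slow = []
    for series in payload["data"]["result"]:
        _timestamp, raw_value = series["value"]
        seconds = float(raw_value)  # Prometheus encodes sample values as strings
        if math.isnan(seconds):
            continue  # histogram_quantile yields NaN when there are no samples
        if seconds > threshold:
            slow.append((series["metric"], seconds))
    return sorted(slow, key=lambda pair: pair[1], reverse=True)
```

To use it, save the response of the query to a file and pass its contents to slow_series; an empty list means no series is above the threshold at that instant.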
Actual results:
etcd latencies exceed 1s and can reach 5s or even 9s.
Expected results:
etcd latencies should be < 1s, and the alert should be neither pending nor firing.
Additional info:
MG link: https://attachments.access.redhat.com/hydra/rest/cases/04179961/attachments/08b1bb49-3b20-4cbc-b212-94fd3facb1f5?usePresignedUrl=true
- clones
-
OCPBUGS-54936 High latency etcd disk writes due to openshift-marketplace pods/OLM
-
- Closed
-
- duplicates
-
OCPBUGS-43966 high snapshot rate on redhat-operators, OLM operator install hangs. RPC DeadlineExceeded while listing bundles.
-
- Verified
-
- is duplicated by
-
OCPBUGS-61307 High CPU Usage Observed in OpenShift Marketplace and API Server on ARO 4.16.z
-
- Closed
-
- is related to
-
OCPBUGS-43966 high snapshot rate on redhat-operators, OLM operator install hangs. RPC DeadlineExceeded while listing bundles.
-
- Verified
-
- relates to
-
OCPBUGS-48696 OLMv0: excessive catalog source snapshots cause severe performance regression [openshift-4.16.z]
-
- Closed
-