Type: Bug
Resolution: Cannot Reproduce
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.13
Component/s: OLM
Labels:
- kube-apiserver
- olm

Severity: Important
Regression: No
Release Blocker: Rejected
Blocked: False
Blocked Reason: None


Description of problem:

OLM caused a kube-apiserver outage, and consequently an etcd outage, by flooding the API server with requests.

Version-Release number of selected component (if applicable):

4.13.34

Actual results (using audit.log.tar.gz from 3/12/2024 2:27 PM):

$ zcat audit.log.tar.gz  | jq -Rr 'fromjson? | select(.requestReceivedTimestamp | contains("2024-03-11")) | (.responseStatus.code|tostring) + " " +  " [" + .user.username + "] " + (.verb|ascii_upcase) + " " + .requestURI'  | sort | uniq -c | sort -n  | tail -5 
36827 200  [system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount] LIST /apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations?labelSelector=olm.owner%3Dcryostat-operator.v2.4.0-3%2Colm.owner.kind%3DClusterServiceVersion%2Colm.owner.namespace%3Dcryostat
36829 200  [system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount] LIST /apis/admissionregistration.k8s.io/v1/validatingwebhookconfigurations?labelSelector=olm.owner%3Dcryostat-operator.v2.4.0-3%2Colm.owner.kind%3DClusterServiceVersion%2Colm.owner.namespace%3Dcryostat
36884 200  [system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount] LIST /apis/operators.coreos.com/v1/namespaces/cryostat/operatorgroups
37052 404  [system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount] GET /apis/operators.coreos.com/v1alpha1/namespaces/cryostat/clusterserviceversions/cryostat-operator.v2.4.0
39703 201  [system:serviceaccount:openshift-apiserver:openshift-apiserver-sa] CREATE /apis/authorization.k8s.io/v1/subjectaccessreviews?timeout=10s

$ zcat audit.log.tar.gz  | jq -Rr 'fromjson? | select(.requestReceivedTimestamp | contains("2024-03-11") ) | select(.requestURI=="/apis/operators.coreos.com/v1alpha1/namespaces/cryostat/clusterserviceversions/cryostat-operator.v2.4.0")' | wc -l                                                   
 37052

First and last timestamps (to the minute) of requests made by the OLM service account, to get the time range:

$ zcat audit.log.tar.gz  | jq -Rr 'fromjson? | select(.requestReceivedTimestamp | contains("2024-03-11") ) | select(.user.username=="system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount") | .requestReceivedTimestamp' -r | egrep -o "2024-03-11T.*:.*.:" | uniq | sed -e 1b -e '$!d'
2024-03-11T11:39:
2024-03-11T16:36:

From this output, OLM issued about 2 requests per second (37052 requests between 11:39 and 16:36) to the nonexistent endpoint /apis/operators.coreos.com/v1alpha1/namespaces/cryostat/clusterserviceversions/cryostat-operator.v2.4.0, each of which returned 404, and about 10 requests per second when all cryostat-related requests are summed.
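The rates can be rechecked with plain arithmetic. This is only a sketch: the counts are copied by hand from the uniq -c output above, the window is taken from the two minute-resolution timestamps, and the variable names and the rate helper are made up for illustration.

```shell
#!/bin/sh
# Observation window at minute resolution: 11:39 to 16:36 on 2024-03-11.
window_s=$(( ((16 * 60 + 36) - (11 * 60 + 39)) * 60 ))

# Counts copied from the audit-log summary (assumption: no other filtering applied).
csv_404=37052                                        # GET on the missing CSV
cryostat_top4=$(( 36827 + 36829 + 36884 + 37052 ))   # the four cryostat lines shown

# Hypothetical helper: requests per second, one decimal place.
rate() { awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f", n / s }'; }

echo "window: ${window_s}s"
echo "404 GET rate: $(rate "$csv_404" "$window_s") req/s"
echo "top-4 cryostat rate: $(rate "$cryostat_top4" "$window_s") req/s"
```

This gives a 17820 s window, about 2.1 req/s for the 404 GET alone, and about 8.3 req/s for the four cryostat lines listed. The ~10 req/s figure presumably also counts cryostat-related requests that fall outside the top five shown.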

Expected results:

OLM should poll these resources via the API at more relaxed intervals to avoid overloading kube-apiserver.

Is related to: OCPBUGS-38751 openshift-marketplace pods affect etcd sync latencies (Closed)

Assignee: Kevin Rizza
Reporter: Gabriel Meghnagi
QA Contact: Jian Zhang
Votes: 0
Watchers: 8
Created: 2024/03/15 4:19 PM
Updated: 2024/12/19 11:03 AM
Resolved: 2024/06/03 9:59 AM
