Bug
Resolution: Duplicate
Normal
4.9
Quality / Stability / Reliability
Moderate
All
Description of problem:
In larger clusters, we are seeing VPA webhook admission times increase. The timeout changes in 4.9 help pod creation not fail, but the client-side throttling is still slowing everything down.
Graphing `sum(rate(apiserver_admission_webhook_admission_duration_seconds_count{operation="CREATE",rejected="false"}[1m])) by (name)` we can see that the VPA admission durations spike to between 11 and 31 seconds.
Looking at both the vpa-admission-plugin and the vpa-updater pods shows significant throttling:
2022-05-26T06:09:47.567143786Z I0526 06:09:47.567078 1 trace.go:116] Trace[911902081]: "Reflector ListAndWatch" name:k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:95 (started: 2022-05-26 06:09:37.002034365 +0000 UTC m=+1.813803928) (total time: 10.564999592s):
2022-05-26T06:09:47.567143786Z Trace[911902081]: [10.54507419s] [10.54507419s] Objects listed
2022-05-26T06:09:47.603041102Z I0526 06:09:47.602902 1 fetcher.go:100] Initial sync of ReplicationController completed
2022-05-26T06:09:48.104000169Z I0526 06:09:48.103942 1 fetcher.go:100] Initial sync of Job completed
2022-05-26T06:09:48.206281230Z I0526 06:09:48.204097 1 fetcher.go:100] Initial sync of CronJob completed
2022-05-26T06:09:48.304442123Z I0526 06:09:48.304371 1 fetcher.go:100] Initial sync of DaemonSet completed
2022-05-26T06:09:58.891853970Z I0526 06:09:58.891786 1 trace.go:116] Trace[607811211]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:135 (started: 2022-05-26 06:09:48.304811597 +0000 UTC m=+13.116581178) (total time: 10.586932694s):
2022-05-26T06:09:58.891853970Z Trace[607811211]: [10.561769516s] [10.561769516s] Objects listed
2022-05-26T06:45:59.185521417Z I0526 06:45:59.185414 1 request.go:621] Throttling request took 1.198226163s, request: GET:https://10.98.0.1:443/apis/certificates.k8s.io/v1beta1?timeout=32s
2022-05-26T06:46:09.385012980Z I0526 06:46:09.384965 1 request.go:621] Throttling request took 11.397666118s, request: GET:https://10.98.0.1:443/apis/apiserver.openshift.io/v1?timeout=32s
2022-05-26T06:46:19.584660745Z I0526 06:46:19.584606 1 request.go:621] Throttling request took 21.597286629s, request: GET:https://10.98.0.1:443/apis/operators.coreos.com/v2?timeout=32s
2022-05-26T11:49:39.261982690Z I0526 11:49:39.261923 1 request.go:621] Throttling request took 1.198861744s, request: GET:https://10.98.0.1:443/apis/authentication.k8s.io/v1?timeout=32s
2022-05-26T11:49:49.461548564Z I0526 11:49:49.461475 1 request.go:621] Throttling request took 11.39831947s, request: GET:https://10.98.0.1:443/apis/project.openshift.io/v1?timeout=32s
2022-05-26T11:49:59.661101936Z I0526 11:49:59.661037 1 request.go:621] Throttling request took 21.597846712s, request: GET:https://10.98.0.1:443/apis/utils.devops.gov.bc.ca/v1?timeout=32s
2022-05-26T12:00:01.798488449Z I0526 12:00:01.798404 1 request.go:621] Throttling request took 1.198039866s, request: GET:https://10.98.0.1:443/apis/rbac.authorization.k8s.io/v1beta1?timeout=32s
2022-05-26T12:00:11.799177479Z I0526 12:00:11.799070 1 request.go:621] Throttling request took 11.198579542s, request: GET:https://10.98.0.1:443/apis/integreatly.org/v1alpha1?timeout=32s
2022-05-26T12:00:21.997400272Z I0526 12:00:21.997333 1 request.go:621] Throttling request took 21.396652936s, request: GET:https://10.98.0.1:443/apis/imageregistry.operator.openshift.io/v1?timeout=32s
Right now VPA seems to use the client-go defaults of 5/10 for QPS/burst, which could be the limiting factor in larger clusters.
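For reference, those limits live on client-go's rest.Config, so raising them is straightforward once the VPA components expose a knob for it. A minimal Go sketch, assuming in-cluster config and purely illustrative 30/60 values (not what VPA ships today):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// client-go leaves QPS/Burst at 0 on the rest.Config, which means the
	// built-in defaults of 5 QPS / 10 burst are applied when the client is built.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}

	// Raise the client-side rate limit so list/discovery calls in large
	// clusters are not throttled for tens of seconds (values illustrative only).
	cfg.QPS = 30
	cfg.Burst = 60

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Any call made through this clientset now uses the higher limits.
	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("nodes:", len(nodes.Items))
}
```

The sketch only shows where the 5/10 defaults come from and what overriding them would look like; the VPA binaries/operator would still need to expose a way to set these values.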
Version-Release number of selected component (if applicable):
4.8/4.9
How reproducible:
Pretty reliable in larger clusters
Steps to Reproduce:
1. Install the VPA operator on a large cluster.
2. Graph the admission webhook duration metric above while pods are being created.
3. Check the vpa-admission-plugin and vpa-updater logs for "Throttling request took" messages.
Actual results:
VPA admission is quite slow and, with the pre-4.9 timeout values, could cause pod startup to fail.
Expected results:
Admission should be much faster
Additional info: