- Bug
- Resolution: Done
- Critical
- 4.10.z
- Quality / Stability / Reliability
Description of problem:
The metrics API does not respond, the kube-apiserver logs errors like the ones below, and oc commands take a very long time (>30s) to respond from the bastion.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-08-31T13:28:14.786189711+06:00 E0831 07:28:14.786058 18 available_controller.go:546] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1: Get "https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1": dial tcp 10.128.10.197:6443: connect: connection refused
2022-08-31T13:28:14.792678415+06:00 I0831 07:28:14.792496 18 available_controller.go:496] "changing APIService availability" name="v1beta1.metrics.k8s.io" oldStatus=False newStatus=False message="failing or missing response from https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1: Get \"https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1\": dial tcp 10.128.10.197:6443: connect: connection refused" reason="FailedDiscoveryCheck"
2022-08-31T13:28:14.800577820+06:00 E0831 07:28:14.800509 18 available_controller.go:546] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1: Get "https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1": dial tcp 10.128.10.197:6443: connect: connection refused
2022-08-31T13:28:14.801286534+06:00 E0831 07:28:14.801214 18 available_controller.go:546] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1: Get "https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1": dial tcp 10.128.10.197:6443: connect: connection refused
2022-08-31T13:28:15.660473185+06:00 E0831 07:28:15.660371 18 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
2022-08-31T13:28:15.660473185+06:00 I0831 07:28:15.660385 18 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
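For reference, one way to check whether the aggregated metrics API has a healthy backend is to inspect the APIService object and the prometheus-adapter service behind it. The commands below are an illustrative sketch that assumes the default openshift-monitoring namespace, service name, and label selector; adjust them to the cluster.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Availability condition reported by the kube-apiserver aggregator
oc get apiservice v1beta1.metrics.k8s.io -o yaml

# prometheus-adapter serves metrics.k8s.io by default; check its pods and endpoints
# (label selector may differ by release; "oc -n openshift-monitoring get pods | grep prometheus-adapter" also works)
oc -n openshift-monitoring get pods -l app.kubernetes.io/name=prometheus-adapter -o wide
oc -n openshift-monitoring get endpoints prometheus-adapter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When the APIService reports Available=False with reason FailedDiscoveryCheck, the pod IP the aggregator is dialing (10.128.10.197 in the log above) would normally correspond to one of those endpoints.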
The following is repeated in the verbose output of oc commands (oc get node -v=10):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I0831 13:14:40.047925 932175 round_trippers.go:466] curl -v -XGET -H "Authorization: Bearer <masked>" -H "Accept: application/json, */*" -H "User-Agent: oc/4.10.0 (linux/amd64) kubernetes/45460a5" 'https://api.prod.banglalinkgsm.com:6443/apis/metrics.k8s.io/v1beta1?timeout=32s'
I0831 13:14:40.054588 932175 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 5 ms Duration 6 ms
I0831 13:14:40.054629 932175 round_trippers.go:577] Response Headers:
I0831 13:14:40.054648 932175 round_trippers.go:580] Audit-Id: 67b54a4d-b314-4a99-9f51-040bd3728aee
I0831 13:14:40.054674 932175 round_trippers.go:580] Audit-Id: 67b54a4d-b314-4a99-9f51-040bd3728aee
I0831 13:14:40.054695 932175 round_trippers.go:580] Cache-Control: no-cache, private
I0831 13:14:40.054713 932175 round_trippers.go:580] Cache-Control: no-cache, private
I0831 13:14:40.054732 932175 round_trippers.go:580] Content-Type: text/plain; charset=utf-8
I0831 13:14:40.054772 932175 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: d08ae54f-6aa2-4f0b-8188-05518c09d563
I0831 13:14:40.054795 932175 round_trippers.go:580] Content-Length: 43
I0831 13:14:40.054814 932175 round_trippers.go:580] Date: Wed, 31 Aug 2022 07:14:13 GMT
I0831 13:14:40.054833 932175 round_trippers.go:580] Retry-After: 1
I0831 13:14:40.054852 932175 round_trippers.go:580] X-Content-Type-Options: nosniff
I0831 13:14:40.054871 932175 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: 373372fa-869f-4ae0-976f-33a6c0da1286
I0831 13:14:40.054958 932175 with_retry.go:171] Got a Retry-After 1s response for attempt 9 to https://api.prod.banglalinkgsm.com:6443/apis/metrics.k8s.io/v1beta1?timeout=32s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
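This shows the client retrying API discovery against the broken metrics.k8s.io group and backing off on each Retry-After: 1 response, which is what makes every oc command slow. An illustrative way to observe the same thing from the bastion (plain oc calls, nothing cluster-specific assumed):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Hit the aggregated discovery endpoint the client keeps retrying; while the backend
# is down this returns a 503 "service unavailable" instead of the API resource list
oc get --raw /apis/metrics.k8s.io/v1beta1

# Lower verbosity is enough to see the retries without the full header dump
oc get nodes -v=6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~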
The following errors are seen in the kube-controller-manager logs:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-08-31T13:28:03.622075963+06:00 E0831 07:28:03.622017 1 horizontal.go:226] failed to compute desired number of replicas based on listed metrics for Deployment/sdp-prod-apps/renewal-datasync: invalid metrics (2 invalid out of 2), first error is: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
2022-08-31T13:28:03.622100808+06:00 I0831 07:28:03.622078 1 event.go:294] "Event occurred" object="sdp-prod-apps/renewal-datasync" kind="HorizontalPodAutoscaler" apiVersion="autoscaling/v2" type="Warning" reason="FailedGetResourceMetric" message="failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)"
2022-08-31T13:28:03.622117106+06:00 I0831 07:28:03.622104 1 event.go:294] "Event occurred" object="sdp-prod-apps/renewal-datasync" kind="HorizontalPodAutoscaler" apiVersion="autoscaling/v2" type="Warning" reason="FailedComputeMetricsReplicas" message="invalid metrics (2 invalid out of 2), first error is: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)"
2022-08-31T13:28:09.707854155+06:00 E0831 07:28:09.707746 1 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2022-08-31T13:43:14.941377920+06:00 I0831 07:43:14.941290 1 event.go:294] "Event occurred" object="sdp-prod-apps/cdruploader" kind="HorizontalPodAutoscaler" apiVersion="autoscaling/v2" type="Warning" reason="FailedGetResourceMetric" message="failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)"
2022-08-31T13:52:40.358398516+06:00 I0831 07:52:40.358321 1 event.go:294] "Event occurred" object="sdp-prod-apps/consent-service" kind="HorizontalPodAutoscaler" apiVersion="autoscaling/v2" type="Warning" reason="FailedGetResourceMetric" message="failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Everything started working fine after restarting the prometheus-adapter and thanos-querier pods, and the oc commands also started responding quickly again.
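The impact on autoscaling can be confirmed while the issue is present. The commands below are a sketch; the namespace and HPA names are taken from the log excerpt above and will differ in other clusters.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# HPAs based on resource metrics show <unknown> targets while metrics.k8s.io is unavailable
oc get hpa -A
oc -n sdp-prod-apps describe hpa renewal-datasync

# The FailedGetResourceMetric / FailedComputeMetricsReplicas warnings also appear as events
oc -n sdp-prod-apps get events --field-selector type=Warning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~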
Version-Release number of selected component (if applicable):
4.10.z (the client reports oc/4.10.0 in the verbose output above)
How reproducible:
Medium
Steps to Reproduce:
Not known; the issue happens randomly on its own, and a restart of the prometheus-adapter and thanos-querier pods fixes it temporarily (see the sketch below).
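The temporary workaround can be applied as a rolling restart rather than deleting pods by hand. This is a sketch that assumes the default openshift-monitoring namespace and deployment names (prometheus-adapter, thanos-querier); verify the names first.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Confirm the deployment names before restarting (they can vary between releases)
oc -n openshift-monitoring get deploy

# Temporary workaround: restart the metrics adapter and the querier it talks to
oc -n openshift-monitoring rollout restart deployment/prometheus-adapter
oc -n openshift-monitoring rollout restart deployment/thanos-querier
oc -n openshift-monitoring rollout status deployment/prometheus-adapter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~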
Actual results:
oc commands respond very slowly and the metrics API returns errors
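Both symptoms can be demonstrated with a couple of illustrative checks (standard oc commands, nothing cluster-specific assumed):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Resource-metrics queries fail while the APIService is unavailable
oc adm top nodes
oc adm top pods -A

# Ordinary requests are slowed by the repeated Retry-After responses during discovery
time oc get nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~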
Expected results:
The metrics API should not return such errors, and oc commands should not be slowed down as a result
Additional info:
Must-gather: https://attachments.access.redhat.com/hydra/rest/cases/03302475/attachments/8be4122c-5626-4b8d-a138-f3042e6f3765?usePresignedUrl=true