OpenShift Bugs / OCPBUGS-798

loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • Affects Version/s: 4.10.z
    • Component/s: Monitoring
    • Quality / Stability / Reliability

      Description of problem:

The metrics API does not respond, errors like the ones below appear in the kube-apiserver logs, and oc commands take a very long time (>30s) to respond from the bastion.
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      2022-08-31T13:28:14.786189711+06:00 E0831 07:28:14.786058      18 available_controller.go:546] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1: Get "https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1": dial tcp 10.128.10.197:6443: connect: connection refused
      2022-08-31T13:28:14.792678415+06:00 I0831 07:28:14.792496      18 available_controller.go:496] "changing APIService availability" name="v1beta1.metrics.k8s.io" oldStatus=False newStatus=False message="failing or missing response from https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1: Get \"https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1\": dial tcp 10.128.10.197:6443: connect: connection refused" reason="FailedDiscoveryCheck"
      2022-08-31T13:28:14.800577820+06:00 E0831 07:28:14.800509      18 available_controller.go:546] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1: Get "https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1": dial tcp 10.128.10.197:6443: connect: connection refused
      2022-08-31T13:28:14.801286534+06:00 E0831 07:28:14.801214      18 available_controller.go:546] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1: Get "https://10.128.10.197:6443/apis/metrics.k8s.io/v1beta1": dial tcp 10.128.10.197:6443: connect: connection refused
      2022-08-31T13:28:15.660473185+06:00 E0831 07:28:15.660371      18 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
      2022-08-31T13:28:15.660473185+06:00 I0831 07:28:15.660385      18 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
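
      For context (not part of the original report), the state of the aggregated metrics API can be checked with standard oc commands. A minimal sketch, assuming the default OpenShift monitoring layout where v1beta1.metrics.k8s.io is backed by the prometheus-adapter Service in openshift-monitoring:
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      # Availability of the aggregated metrics API; while the bug is active the
      # APIService reports Available=False with reason FailedDiscoveryCheck,
      # matching the kube-apiserver errors above.
      oc get apiservice v1beta1.metrics.k8s.io
      oc get apiservice v1beta1.metrics.k8s.io -o yaml

      # The backing Service and its endpoints (assumed default names).
      oc -n openshift-monitoring get svc,endpoints prometheus-adapter
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~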
      
The following is repeated in the verbose output of oc commands (oc get node -v=10).
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      I0831 13:14:40.047925  932175 round_trippers.go:466] curl -v -XGET  -H "Authorization: Bearer <masked>" -H "Accept: application/json, */*" -H "User-Agent: oc/4.10.0 (linux/amd64) kubernetes/45460a5" 'https://api.prod.banglalinkgsm.com:6443/apis/metrics.k8s.io/v1beta1?timeout=32s'
      I0831 13:14:40.054588  932175 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 5 ms Duration 6 ms
      I0831 13:14:40.054629  932175 round_trippers.go:577] Response Headers:
      I0831 13:14:40.054648  932175 round_trippers.go:580]     Audit-Id: 67b54a4d-b314-4a99-9f51-040bd3728aee
      I0831 13:14:40.054674  932175 round_trippers.go:580]     Audit-Id: 67b54a4d-b314-4a99-9f51-040bd3728aee
      I0831 13:14:40.054695  932175 round_trippers.go:580]     Cache-Control: no-cache, private
      I0831 13:14:40.054713  932175 round_trippers.go:580]     Cache-Control: no-cache, private
      I0831 13:14:40.054732  932175 round_trippers.go:580]     Content-Type: text/plain; charset=utf-8
      I0831 13:14:40.054772  932175 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: d08ae54f-6aa2-4f0b-8188-05518c09d563
      I0831 13:14:40.054795  932175 round_trippers.go:580]     Content-Length: 43
      I0831 13:14:40.054814  932175 round_trippers.go:580]     Date: Wed, 31 Aug 2022 07:14:13 GMT
      I0831 13:14:40.054833  932175 round_trippers.go:580]     Retry-After: 1
      I0831 13:14:40.054852  932175 round_trippers.go:580]     X-Content-Type-Options: nosniff
      I0831 13:14:40.054871  932175 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: 373372fa-869f-4ae0-976f-33a6c0da1286
      I0831 13:14:40.054958  932175 with_retry.go:171] Got a Retry-After 1s response for attempt 9 to https://api.prod.banglalinkgsm.com:6443/apis/metrics.k8s.io/v1beta1?timeout=32s
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
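
      To separate the server-side failure from the client-side retries (an illustrative check, not taken from the case data), the aggregated endpoint can be queried directly:
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      # Hits the aggregated discovery endpoint directly; while the bug is active
      # this returns the same 503 "service unavailable" instead of the expected
      # APIResourceList JSON. The >30s oc latency is the client retrying that
      # 503 (the "Got a Retry-After 1s response for attempt 9" line above).
      oc get --raw /apis/metrics.k8s.io/v1beta1
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~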
      
The following errors are seen in the kube-controller-manager logs.
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-08-31T13:28:03.622075963+06:00 E0831 07:28:03.622017       1 horizontal.go:226] failed to compute desired number of replicas based on listed metrics for Deployment/sdp-prod-apps/renewal-datasync: invalid metrics (2 invalid out of 2), first error is: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
      2022-08-31T13:28:03.622100808+06:00 I0831 07:28:03.622078       1 event.go:294] "Event occurred" object="sdp-prod-apps/renewal-datasync" kind="HorizontalPodAutoscaler" apiVersion="autoscaling/v2" type="Warning" reason="FailedGetResourceMetric" message="failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)"
      2022-08-31T13:28:03.622117106+06:00 I0831 07:28:03.622104       1 event.go:294] "Event occurred" object="sdp-prod-apps/renewal-datasync" kind="HorizontalPodAutoscaler" apiVersion="autoscaling/v2" type="Warning" reason="FailedComputeMetricsReplicas" message="invalid metrics (2 invalid out of 2), first error is: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)"
      2022-08-31T13:28:09.707854155+06:00 E0831 07:28:09.707746       1 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
      2022-08-31T13:43:14.941377920+06:00 I0831 07:43:14.941290       1 event.go:294] "Event occurred" object="sdp-prod-apps/cdruploader" kind="HorizontalPodAutoscaler" apiVersion="autoscaling/v2" type="Warning" reason="FailedGetResourceMetric" message="failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)"
      2022-08-31T13:52:40.358398516+06:00 I0831 07:52:40.358321       1 event.go:294] "Event occurred" object="sdp-prod-apps/consent-service" kind="HorizontalPodAutoscaler" apiVersion="autoscaling/v2" type="Warning" reason="FailedGetResourceMetric" message="failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)"
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
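
      The same outage is visible from the HPA side; a hedged example using the namespace and HPA names that appear in the log above:
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      # While metrics.k8s.io is unavailable, HPAs show <unknown> targets and emit
      # FailedGetResourceMetric events (names taken from the controller-manager log).
      oc -n sdp-prod-apps get hpa
      oc -n sdp-prod-apps describe hpa renewal-datasync
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~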
      
Everything started working fine after restarting the prometheus-adapter and the thanos-querier pods, and the oc commands also started responding quickly.
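
      A minimal sketch of that temporary workaround, assuming the default Deployment names in openshift-monitoring (prometheus-adapter and thanos-querier); this is the generic restart, not a transcript of what was run on the affected cluster:
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      # Restart the metrics API backend and its Thanos querier (assumed default
      # Deployment names), then confirm the APIService becomes Available again.
      oc -n openshift-monitoring rollout restart deployment/prometheus-adapter
      oc -n openshift-monitoring rollout restart deployment/thanos-querier
      oc get apiservice v1beta1.metrics.k8s.io
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~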
      

      Version-Release number of selected component (if applicable):

      
      

      How reproducible:

      Medium
      

      Steps to Reproduce:

Not known, as the issue happens randomly on its own; restarting the prometheus-adapter and thanos-querier pods fixes it temporarily.
      

      Actual results:

oc commands respond very slowly and the metrics API returns errors.
      

      Expected results:

The metrics API should not return such errors, and oc commands should not be slowed down by them.
      

      Additional info:

      Must gather - https://attachments.access.redhat.com/hydra/rest/cases/03302475/attachments/8be4122c-5626-4b8d-a138-f3042e6f3765?usePresignedUrl=true
      

              Assignee: Simon Pasquier (spasquie@redhat.com)
              Reporter: Alok Singh (rhn-support-alosingh)
              QA Contact: Junqi Zhao