OpenShift Bugs / OCPBUGS-7694

prometheus adapter crashlooping


Details

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • Fix Version: 4.14.0
    • Affects Version: 4.12
    • Component: Monitoring
    • Labels: None
    • Severity: Moderate
    • Sprint: MON Sprint 232
    • Release Note Text:
      Before this update, the lack of a startup probe prevented the Prometheus Adapter pods from starting when the Kubernetes API had many custom resource definitions installed, because program initialization took longer than the liveness probe allowed. With this update, the Prometheus Adapter pods are now configured with a startup probe that waits five minutes before failing, thereby resolving the issue. link:https://issues.redhat.com/browse/OCPBUGS-7694[OCPBUGS-7694]
    • Release Note Type: Bug Fix
    • Status: Done
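
      For context, the fix described in the Release Note Text above is an additional startup probe on the
      prometheus-adapter container, so that the liveness probe only starts counting once the adapter has
      finished its slow initialization. Below is a minimal sketch of what such a probe can look like,
      written against the Kubernetes Go API types; the health path, port name, and exact timings are
      illustrative assumptions, not necessarily the values shipped by the cluster-monitoring-operator:

      package main

      import (
          "fmt"

          corev1 "k8s.io/api/core/v1"
          "k8s.io/apimachinery/pkg/util/intstr"
      )

      // startupProbe builds a probe that tolerates a slow start: the kubelet retries
      // the check every 10s and only gives up after 30 consecutive failures,
      // i.e. roughly five minutes, matching the behaviour described in the release note.
      func startupProbe() *corev1.Probe {
          return &corev1.Probe{
              ProbeHandler: corev1.ProbeHandler{
                  HTTPGet: &corev1.HTTPGetAction{
                      Path:   "/livez",                   // assumed health endpoint
                      Port:   intstr.FromString("https"), // assumed container port name
                      Scheme: corev1.URISchemeHTTPS,
                  },
              },
              PeriodSeconds:    10,
              FailureThreshold: 30, // 30 * 10s = 300s before the container is restarted
          }
      }

      func main() {
          p := startupProbe()
          fmt.Printf("startup probe allows up to %ds for initialization\n",
              p.PeriodSeconds*p.FailureThreshold)
      }

      Until the startup probe has succeeded once, the kubelet does not run the liveness probe at all, which
      is what keeps the pod from being killed while API discovery is still in progress.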

    Description

      I tried to update my cluster from 4.12.0 to 4.12.2, and this left both prometheus-adapter pods in a crashlooping state. I then tried to downgrade back to 4.12.0 and to upgrade to 4.12.4, but neither approach resolved the situation.

       

      What I can see in the logs of the adapters is the following:

       

      I0216 15:24:59.144559 1 adapter.go:114] successfully using in-cluster auth
      I0216 15:25:00.345620 1 request.go:601] Waited for 1.180640418s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v1alpha1?timeout=32s
      I0216 15:25:10.345634 1 request.go:601] Waited for 11.180149045s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/triggers.tekton.dev/v1beta1?timeout=32s
      I0216 15:25:20.346048 1 request.go:601] Waited for 2.597453714s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/apiextensions.k8s.io/v1?timeout=32s
      I0216 15:25:30.347435 1 request.go:601] Waited for 12.598768922s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/machine.openshift.io/v1beta1?timeout=32s
      I0216 15:25:40.545767 1 request.go:601] Waited for 22.797001115s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/samples.operator.openshift.io/v1?timeout=32s
      I0216 15:25:50.546588 1 request.go:601] Waited for 32.797748538s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/metrics.k8s.io/v1beta1?timeout=32s
      I0216 15:25:56.041594 1 secure_serving.go:210] Serving securely on [::]:6443
      I0216 15:25:56.042265 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/etc/tls/private/tls.crt::/etc/tls/private/tls.key"
      I0216 15:25:56.042971 1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/etc/tls/private/requestheader-client-ca-file"
      I0216 15:25:56.043309 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
      I0216 15:25:56.043310 1 object_count_tracker.go:84] "StorageObjectCountTracker pruner is exiting"
      I0216 15:25:56.043398 1 dynamic_serving_content.go:146] "Shutting down controller" name="serving-cert::/etc/tls/private/tls.crt::/etc/tls/private/tls.key"
      I0216 15:25:56.043562 1 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
      I0216 15:25:56.043606 1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/tls/private/client-ca-file"
      I0216 15:25:56.043614 1 secure_serving.go:255] Stopped listening on [::]:6443
      I0216 15:25:56.043621 1 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/tls/private/client-ca-file"
      I0216 15:25:56.043635 1 dynamic_cafile_content.go:171] "Shutting down controller" name="request-header::/etc/tls/private/requestheader-client-ca-file"
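
      The repeated "client-side throttling" lines above show the adapter walking API discovery during
      start-up: every group/version the API server advertises (operators.coreos.com, triggers.tekton.dev,
      and so on) is a separate GET, so a cluster with many custom resource definitions can spend several
      minutes in discovery alone, longer than the liveness probe tolerated before the fix. As a rough,
      standalone way to gauge how much discovery work a client faces on such a cluster, here is a sketch
      using client-go with in-cluster credentials; this is not code taken from prometheus-adapter:

      package main

      import (
          "fmt"
          "log"

          "k8s.io/client-go/discovery"
          "k8s.io/client-go/rest"
      )

      func main() {
          // Assumes it runs inside a pod with a service account allowed to list API groups.
          cfg, err := rest.InClusterConfig()
          if err != nil {
              log.Fatalf("building in-cluster config: %v", err)
          }
          dc, err := discovery.NewDiscoveryClientForConfig(cfg)
          if err != nil {
              log.Fatalf("creating discovery client: %v", err)
          }
          // Each group/version resource list corresponds to one discovery request,
          // the same kind of request being throttled in the adapter log above.
          groups, resourceLists, err := dc.ServerGroupsAndResources()
          if err != nil {
              log.Printf("discovery returned an error (results may be partial): %v", err)
          }
          fmt.Printf("API groups: %d, group/version resource lists: %d\n", len(groups), len(resourceLists))
      }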

      I also searched online for known issues and bugs and found this one, which might be related:

      https://github.com/kubernetes-sigs/metrics-server/issues/983

      I also tried rebooting the server, but it didn't help.

      At a minimum I need a workaround, because at the moment the cluster is still stuck in a pending state.

    Attachments

    Activity

    People

      Assignee: Simon Pasquier (spasquie@redhat.com)
      Reporter: Luca Mattia Ferrari (lucamaf)
      Brian Burt
      Votes: 1
      Watchers: 17

    Dates

      Created:
      Updated:
      Resolved: