OpenShift Bugs: OCPBUGS-22361

Custom Metrics Autoscaler restarting every 15 minutes

Details

    • Bug
    • Resolution: Done-Errata
    • Critical
    • None
    • 4.13.z
    • Pod Autoscaler
    • Critical
    • No
    • 3
    • PODAUTO - Sprint 244
    • 1
    • Proposed
    • False
    • Cause: Custom Metrics Autoscaler version 2.11.2-311 was released without a required volumeMount in the operator deployment. This caused the Custom Metrics Autoscaler operator pod to restart every 15 minutes.
      This version adds the required volumeMount to the operator deployment. The operator no longer restarts every 15 minutes.
    • Bug Fix
    • Proposed
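
The cause recorded in the details above is a volume that the operator expects but that the deployment never mounts. That condition can be checked mechanically. The following is a minimal sketch, not the operator's own logic: it takes a Deployment manifest already parsed into a Python dict and lists the containers that have no volumeMount for a given volume. The volume name `certificates`, the mount path `/certs`, and the container name `operator` are assumptions chosen for illustration; this report does not state them.

```python
import copy
from typing import Any


def containers_missing_mount(deployment: dict[str, Any], volume_name: str) -> list[str]:
    """Return the names of containers that do not mount `volume_name`."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    missing = []
    for container in pod_spec.get("containers", []):
        mounts = container.get("volumeMounts") or []
        if not any(mount.get("name") == volume_name for mount in mounts):
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Hypothetical manifest fragment: the volume is declared on the pod,
# but the container has no matching volumeMount.
broken = {
    "spec": {
        "template": {
            "spec": {
                "volumes": [{"name": "certificates"}],
                "containers": [{"name": "operator", "volumeMounts": []}],
            }
        }
    }
}

# Same fragment with the mount added (the mount path is an assumption).
fixed = copy.deepcopy(broken)
fixed["spec"]["template"]["spec"]["containers"][0]["volumeMounts"].append(
    {"name": "certificates", "mountPath": "/certs"}
)

print(containers_missing_mount(broken, "certificates"))
print(containers_missing_mount(fixed, "certificates"))
```

To run this against a live cluster, the dict could come from the JSON output of `oc get deployment <name> -o json`, parsed with `json.loads`.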

    Description

      Description of problem:

      The custom-metrics-autoscaler-operator pod restarts every 15 minutes. The operator log shows:
      2023-10-24T09:41:27Z    ERROR   cert-rotation   max retries for checking certs existence        {"error": "timed out waiting for the condition"}
      github.com/open-policy-agent/cert-controller/pkg/rotator.(*CertRotator).ensureCertsMounted
              /remote-source/keda-operator/app/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:853
      2023-10-24T09:41:27Z    INFO    cert-rotation   stopping cert rotator controller
      2023-10-24T09:41:27Z    INFO    Stopping and waiting for non leader election runnables
      2023-10-24T09:41:27Z    INFO    Stopping and waiting for leader election runnables
      2023-10-24T09:41:27Z    INFO    Shutdown signal received, waiting for all workers to finish     {"controller": "kedacontroller", "controllerGroup": "keda.sh", "controllerKind": "KedaController"}
      2023-10-24T09:41:27Z    INFO    Shutdown signal received, waiting for all workers to finish     {"controller": "secret", "controllerGroup": "", "controllerKind": "Secret"}
      2023-10-24T09:41:27Z    INFO    Shutdown signal received, waiting for all workers to finish     {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap"}
      2023-10-24T09:41:27Z    INFO    All workers finished    {"controller": "secret", "controllerGroup": "", "controllerKind": "Secret"}
      2023-10-24T09:41:27Z    INFO    All workers finished    {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap"}
      2023-10-24T09:41:27Z    INFO    All workers finished    {"controller": "kedacontroller", "controllerGroup": "keda.sh", "controllerKind": "KedaController"}
      2023-10-24T09:41:27Z    INFO    Shutdown signal received, waiting for all workers to finish     {"controller": "cert-rotator"}
      2023-10-24T09:41:27Z    INFO    All workers finished    {"controller": "cert-rotator"}
      2023-10-24T09:41:27Z    INFO    Stopping and waiting for caches
      2023-10-24T09:41:27Z    INFO    Stopping and waiting for webhooks
      2023-10-24T09:41:27Z    INFO    Stopping and waiting for HTTP servers
      2023-10-24T09:41:27Z    INFO    controller-runtime.metrics      Shutting down metrics server with timeout of 1 minute
      2023-10-24T09:41:27Z    INFO    shutting down server    {"kind": "health probe", "addr": "[::]:8081"}
      2023-10-24T09:41:27Z    INFO    Wait completed, proceeding to shutdown the manager
      2023-10-24T09:41:27Z    ERROR   setup   problem running manager {"error": "could not mount certs", "errorVerbose": "could not mount certs\ngithub.com/open-policy-agent/cert-controller/pkg/rotator.(*CertRotator).Start\n\t/remote-source/keda-operator/app/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:286\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\t/remote-source/keda-operator/app/vendor/sigs.k8s.io/controller-runtime/pkg/manager/runnable_group.go:223\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598"}
      main.main
              /remote-source/keda-operator/app/main.go:144
      runtime.main
              /usr/lib/golang/src/runtime/proc.go:250
      2023-10-24T09:41:27Z    ERROR   error received after stop sequence was engaged  {"error": "leader election lost"}
      sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1
              /remote-source/keda-operator/app/vendor/sigs.k8s.io/controller-runtime/pkg/manager/internal.go:490
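
The log above shows two ERROR messages that identify this failure: `max retries for checking certs existence` and `could not mount certs`. As a rough triage aid, the sketch below scans saved operator log text for those two messages. It assumes the line layout seen in this report (timestamp, level, then the rest of the line, separated by whitespace); the function name and the sample selection are illustrative only.

```python
import re

# Timestamp, upper-case level, then the remainder of the line.
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<rest>.*)$")

# Messages taken from the log in this report.
SIGNATURES = (
    "max retries for checking certs existence",
    "could not mount certs",
)


def find_cert_mount_errors(log_text: str) -> list[tuple[str, str]]:
    """Return (timestamp, signature) for each ERROR line matching a signature."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        # Stack-trace lines have no level field and are skipped here.
        if not match or match.group("level") != "ERROR":
            continue
        for signature in SIGNATURES:
            if signature in match.group("rest"):
                hits.append((match.group("ts"), signature))
                break
    return hits


# A few lines copied from the log in this report.
SAMPLE = """\
2023-10-24T09:41:27Z    ERROR   cert-rotation   max retries for checking certs existence        {"error": "timed out waiting for the condition"}
2023-10-24T09:41:27Z    INFO    cert-rotation   stopping cert rotator controller
2023-10-24T09:41:27Z    INFO    Stopping and waiting for caches
2023-10-24T09:41:27Z    ERROR   error received after stop sequence was engaged  {"error": "leader election lost"}
"""

print(find_cert_mount_errors(SAMPLE))
```

Only the first sample line matches. The `leader election lost` error is logged during the shutdown that follows the cert failure, so it is left out of the signatures.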

      Version-Release number of selected component (if applicable):

      2.11.2-311

      How reproducible:

      100%

      Steps to Reproduce:

      1. Install Custom Metrics Autoscaler
      2. Create a kedacontroller object
      3. Wait for 30 minutes, and you should see 2 restarts of the CMA operator
      

      Actual results:

      3 or 4 restarts within an hour

      Expected results:

      0 or 1 restart within an hour

      Additional info:

       

            People

              joelsmith.redhat Joel Smith
              rhn-support-rauferna Raul Fernandez
              Weinan Liu Weinan Liu
              Raul Fernandez