OpenShift Bugs / OCPBUGS-36611

prometheus-operator needs to sync ServiceMonitor secrets+configmaps


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Version: 4.17
    • Component: Monitoring

      Description of problem:

      During runs of the etcd certificate rotation tests, which rotate the metrics client and signer certificates and the CA bundle, etcd metrics take a long time (15-20 minutes) to recover, and sometimes never recover.
      
      This is similar to https://github.com/prometheus-operator/prometheus-operator/issues/6018
      
      I'm setting this to Critical because automatic rotation is a feature we're delivering in 4.17, and this issue causes etcd metrics to go dark for up to 15-20 minutes.
      
      Source Slack conversation:
      https://redhat-internal.slack.com/archives/C0VMT03S5/p1719834061510629
      
      Reference run:
      https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_clus[…]rator-master-e2e-aws-etcd-certrotation/1808773983189864448
      
      Search:
      https://search.dptools.openshift.org/?search=.*&maxAge=48h&context=1&type=junit&name=.*aws-etcd-certrotation.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

      Version-Release number of selected component (if applicable):

      4.17, but this has most likely happened in earlier releases as well.

      How reproducible:

      Always.

      Steps to Reproduce:

        1. Run the origin test suite with: openshift-tests run "openshift/etcd/certrotation"
        2. ???
        3. Observe that either the metrics signer rotation test fails, or one of the invariants that check the alerts/metrics fails.

      Actual results:

      The metrics signer test takes a long time to execute, or the test suite fails entirely because the metrics do not recover within the test's runtime.

      Expected results:

      I would expect both the CA bundle and the new client certificate to be picked up immediately when they change.
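For a change to a Secret or ConfigMap to be picked up immediately, the operator has to know which ServiceMonitors reference that object. The following is a minimal Python sketch of such a reverse lookup, assuming ServiceMonitors are plain dicts shaped like the `monitoring.coreos.com/v1` API and covering only the endpoint `tlsConfig` references (`ca`, `cert`, `keySecret`). The function names and all object names in the example are hypothetical; this is not prometheus-operator's implementation.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

ObjKey = Tuple[str, str, str]  # (kind, namespace, name) of a Secret/ConfigMap
SmKey = Tuple[str, str]        # (namespace, name) of a ServiceMonitor


def referenced_objects(sm: dict) -> Iterator[ObjKey]:
    """Yield every Secret/ConfigMap referenced by the endpoints' tlsConfig."""
    ns = sm["metadata"]["namespace"]
    for ep in sm.get("spec", {}).get("endpoints", []):
        tls = ep.get("tlsConfig") or {}
        # `ca` and `cert` may point at either a Secret or a ConfigMap key.
        for field in ("ca", "cert"):
            ref = tls.get(field) or {}
            if "secret" in ref:
                yield ("Secret", ns, ref["secret"]["name"])
            if "configMap" in ref:
                yield ("ConfigMap", ns, ref["configMap"]["name"])
        # `keySecret` is always a Secret key selector.
        if "keySecret" in tls:
            yield ("Secret", ns, tls["keySecret"]["name"])


def build_index(service_monitors: Iterable[dict]) -> Dict[ObjKey, Set[SmKey]]:
    """Map each referenced Secret/ConfigMap to the ServiceMonitors using it."""
    index: Dict[ObjKey, Set[SmKey]] = {}
    for sm in service_monitors:
        sm_key = (sm["metadata"]["namespace"], sm["metadata"]["name"])
        for obj in referenced_objects(sm):
            index.setdefault(obj, set()).add(sm_key)
    return index


def affected_service_monitors(index: Dict[ObjKey, Set[SmKey]],
                              kind: str, namespace: str, name: str) -> List[SmKey]:
    """ServiceMonitors that must be resynced when the given object changes."""
    return sorted(index.get((kind, namespace, name), set()))


# Hypothetical ServiceMonitor using a client cert/key Secret and a CA ConfigMap.
example_sm = {
    "metadata": {"namespace": "example-ns", "name": "example-etcd"},
    "spec": {"endpoints": [{
        "port": "metrics",
        "scheme": "https",
        "tlsConfig": {
            "ca": {"configMap": {"name": "metrics-ca-bundle", "key": "ca-bundle.crt"}},
            "cert": {"secret": {"name": "metrics-client-cert", "key": "tls.crt"}},
            "keySecret": {"name": "metrics-client-cert", "key": "tls.key"},
        },
    }]},
}

if __name__ == "__main__":
    idx = build_index([example_sm])
    print(affected_service_monitors(idx, "Secret", "example-ns", "metrics-client-cert"))
    # [('example-ns', 'example-etcd')]
```

With an index like this, an update event for a rotated Secret or CA bundle ConfigMap can be mapped directly to the ServiceMonitors (and from there to the Prometheus instances) whose configuration needs to be regenerated.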

      Additional info:

      A workaround is to bump the respective etcd ServiceMonitors so that the secret and bundles are reloaded.
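The workaround can be scripted. The Python sketch below only builds an `oc patch` command that sets a timestamp annotation on one ServiceMonitor; it assumes that any update to the ServiceMonitor object is enough for prometheus-operator to regenerate the scrape configuration and re-read the referenced Secret and ConfigMap. The annotation key `example.com/bumped-at` and the namespace/name values are hypothetical, since the actual etcd ServiceMonitor names are not given in this report.

```python
import json
import shlex
from datetime import datetime, timezone

# Hypothetical annotation key: its only purpose is to change the object so
# that the operator receives an update event for the ServiceMonitor.
BUMP_ANNOTATION = "example.com/bumped-at"


def bump_patch(now: datetime) -> str:
    """Return a JSON merge patch that sets the bump annotation to `now`."""
    patch = {"metadata": {"annotations": {BUMP_ANNOTATION: now.isoformat()}}}
    return json.dumps(patch, separators=(",", ":"))


def bump_command(namespace: str, name: str, now: datetime) -> str:
    """Build a shell-quoted `oc patch` invocation for one ServiceMonitor."""
    args = ["oc", "patch", "servicemonitor", name, "-n", namespace,
            "--type=merge", "-p", bump_patch(now)]
    return shlex.join(args)


if __name__ == "__main__":
    print(bump_command("example-ns", "example-etcd", datetime.now(timezone.utc)))
```

Using a timestamp as the annotation value guarantees that every run produces an actual change, so repeated bumps are not no-ops.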

              Mario Fernandez Herrero (mariofer@redhat.com)
              Thomas Jungblut (tjungblu@redhat.com)
              Junqi Zhao