-
Bug
-
Resolution: Unresolved
-
Critical
-
None
-
4.17
-
None
-
No
-
Rejected
-
False
-
Description of problem:
During a run of the etcd certificate rotation tests, which rotate the metrics client/signer certs and bundle, we see the etcd metrics take a long time (15-20 minutes) to recover, and sometimes they never recover. This is similar to https://github.com/prometheus-operator/prometheus-operator/issues/6018
I'm setting this to critical because this auto rotation is a feature we're delivering in 4.17, and the issue causes etcd metrics to go dark, sometimes for 15-20 minutes.
source slack convo: https://redhat-internal.slack.com/archives/C0VMT03S5/p1719834061510629
reference run: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_clus[…]rator-master-e2e-aws-etcd-certrotation/1808773983189864448
search: https://search.dptools.openshift.org/?search=.*&maxAge=48h&context=1&type=junit&name=.*aws-etcd-certrotation.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Version-Release number of selected component (if applicable):
4.17, but this almost certainly happened in earlier versions as well
How reproducible:
always
Steps to Reproduce:
1. Run the origin test suite with: openshift-tests run "openshift/etcd/certrotation"
2. ???
3. See that the metric signer rotation test fails, or that one of the invariants that test the alerts/metrics fails
Actual results:
The metric signer test takes a long time to execute, or the test suite fails entirely because the metrics don't recover during its runtime.
Expected results:
I would expect both the CA bundle and the new client cert to be picked up immediately when they change.
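To illustrate the expected behaviour, here is a minimal sketch of content-based change detection for a set of TLS files. It is not how Prometheus or the prometheus-operator implement reloading; the class name `ReloadingTLSFiles` and the idea of polling a digest are assumptions made purely for illustration.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Return a SHA-256 digest over the contents of all given files."""
    digest = hashlib.sha256()
    for path in sorted(str(p) for p in paths):
        data = Path(path).read_bytes()
        # Length-prefix each file so bytes moving between files change the digest.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


class ReloadingTLSFiles:
    """Hypothetical helper: tracks TLS files and reports content changes."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.current = fingerprint(self.paths)

    def changed(self):
        """Return True, and remember the new state, if any file changed."""
        latest = fingerprint(self.paths)
        if latest == self.current:
            return False
        self.current = latest
        return True
```

A scraper built this way would call `changed()` before each scrape (or on a file-watch event) and rebuild its TLS configuration whenever it returns True, so a rotated CA bundle or client cert would be in use on the very next scrape.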
Additional info:
A workaround would be to bump the respective etcd service monitors so that the secret and bundles are reloaded.
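One way to "bump" a service monitor is to patch a changing annotation onto it. The sketch below only builds the merge patch and the corresponding `oc patch` command line; it does not run anything. The annotation key, the ServiceMonitor name `etcd`, and the namespace `openshift-etcd-operator` are assumptions for illustration and are not taken from this report, and whether an annotation change alone triggers the reload has not been verified here.

```python
import json
from datetime import datetime, timezone

# Hypothetical annotation key; any value that changes on each bump would do.
BUMP_ANNOTATION = "example.com/cert-rotation-bump"


def build_bump_patch(now=None):
    """Build a JSON merge patch that sets a timestamp annotation."""
    now = now or datetime.now(timezone.utc)
    return {"metadata": {"annotations": {BUMP_ANNOTATION: now.isoformat()}}}


def oc_patch_command(name, namespace, patch):
    """Return the argv for an `oc patch` call that applies the merge patch."""
    return [
        "oc", "patch", "servicemonitor", name,
        "-n", namespace,
        "--type=merge",
        "-p", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Name and namespace are assumed, adjust to the actual service monitors.
    cmd = oc_patch_command("etcd", "openshift-etcd-operator", build_bump_patch())
    print(" ".join(cmd))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; the output can be reviewed and then pasted into a shell with cluster access.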
- blocks: ETCD-585 Auto-rotation of etcd signer certs (Closed)