Feature
Resolution: Done
0% To Do, 0% In Progress, 100% Done
Goal
- Improve etcd quality so we have fewer bugs
- Improve documentation and blog post coverage so etcd information is self-service
- Help customers troubleshoot cluster outages faster
- Aid with automated backup and disaster recovery use cases
- Improve telemetry for proactive cluster health assessment.
Why is this important?
- A number of customers have reported problems recovering from production outages
https://docs.google.com/document/d/1ULGQARWdxjujWpSyncY0pKrUG9OcT0PlhEmYVwrPEAE/edit?ts=5eb18ea3
https://issues.redhat.com/browse/RFE-178
https://bugzilla.redhat.com/show_bug.cgi?id=1656302
- In general, we need to improve the documentation to demonstrate easy backup/restore scenarios. We need to provide a "fully tested" procedure. Enterprise customers need concrete steps for backup and restore. The docs are not clear about what admins should back up and what users should back up. We would like this clarified in the docs.
- is related to: OCPPLAN-7574 Tune etcd leader elections to minimize interruptions (New)