Resolution: Done
0% To Do, 0% In Progress, 100% Done
- Improve etcd quality so we have fewer bugs
- Improve documentation and blog post coverage so etcd information is self-service
- Help customers troubleshoot cluster outages faster
- Aid with automated backup and disaster recovery use cases
- Improve telemetry for proactive cluster health assessment
Why is this important?
- A number of customers have reported problems recovering from production outages
https://bugzilla.redhat.com/show_bug.cgi?id=1656302 - In general, we need to improve the documentation to demonstrate easy backup/restore scenarios. We need to provide a "fully tested" procedure. Enterprise customers need concrete steps for backup and restore. The docs are not clear about what admins should back up and what users should back up. We would like this clarified in the docs.
- is related to OCPPLAN-7574 Tune etcd leader elections to minimize interruptions (New)