1. Proposed title of this feature request
Provide a way to stop an OpenShift cluster in a safer manner
2. What is the nature and description of the request?
OpenShift currently provides shutdown and restart procedures, but it still warns of possible cluster failure and recommends taking an etcd backup.
This RFE requests a safer method to shut down and restart OpenShift clusters without the need to take backups.
3. Why does the customer need this? (List the business requirements here)
Our customers want to stop OpenShift clusters in cloud-hosted development environments during late-night hours and holidays in order to reduce cloud costs.
However, they want to avoid the following burdens:
taking a cluster backup every time (i.e., every evening) they stop the cluster
cluster recovery and/or recreation when the cluster fails, since these procedures typically require an hour or more
4. How would the customer like to achieve this? (List the functional requirements here)
It seems that our customers do not care how OpenShift handles this as long as the above-mentioned business requirements are met.
However, here are our proposals to achieve this:
The functionality to automate the shutdown and restart procedure
The functionality to automatically back up etcd data to external storage (e.g., AWS S3) AND to restore it automatically (or with a less complicated procedure) when the cluster fails
This would reduce the backup burden and shorten the cluster recovery time
The functionality to rotate certificates at any time in order to prevent certificates from expiring during the shutdown
The functionality to temporarily disable Machine Health Check (without deleting related resources), and to re-enable it (automatically) after the cluster restarts
This would reduce the chance of manual intervention after the cluster restarts
Again, we understand that these are not the only ways to achieve the business requirements. Other ideas are also appreciated.
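To make the proposals above more concrete, here is a minimal Python sketch of what an automated wrapper around today's manual steps might look like. It only builds the list of `oc` commands and executes nothing. The function names, the backup directory default, and the one-day safety margin are our own assumptions for illustration. The commands follow the manual procedures as we understand them (the `cluster.x-k8s.io/paused` annotation on MachineHealthCheck resources, `cluster-backup.sh` on a control plane node, and `shutdown -h 1` on each node) and should be checked against the documentation for the target release. Uploading the backup to external storage and rotating certificates are not covered.

```python
import shlex
from datetime import datetime, timedelta, timezone

# Assumed values; verify against the documentation for the target release.
MHC_NAMESPACE = "openshift-machine-api"
PAUSE_ANNOTATION = "cluster.x-k8s.io/paused"


def certs_outlive_downtime(not_after, shutdown_at, downtime,
                           margin=timedelta(days=1)):
    """Return True if a certificate remains valid for the whole planned
    downtime plus a safety margin (the margin is a hypothetical default)."""
    return not_after >= shutdown_at + downtime + margin


def build_shutdown_plan(mhc_names, backup_node, nodes,
                        backup_dir="/home/core/assets/backup"):
    """Return the ordered oc commands for a planned shutdown.

    Nothing is executed; the caller decides whether to run or print them.
    """
    plan = []
    # 1. Pause every MachineHealthCheck without deleting it.
    for name in mhc_names:
        plan.append(["oc", "-n", MHC_NAMESPACE, "annotate", "mhc", name,
                     f"{PAUSE_ANNOTATION}="])
    # 2. Take an etcd backup on one control plane node.
    plan.append(["oc", "debug", f"node/{backup_node}", "--",
                 "chroot", "/host",
                 "/usr/local/bin/cluster-backup.sh", backup_dir])
    # 3. Schedule a shutdown on every node, one minute from now.
    for node in nodes:
        plan.append(["oc", "debug", f"node/{node}", "--",
                     "chroot", "/host", "shutdown", "-h", "1"])
    return plan


def build_resume_plan(mhc_names):
    """Return the oc commands that remove the pause annotation after restart."""
    return [["oc", "-n", MHC_NAMESPACE, "annotate", "mhc", name,
             f"{PAUSE_ANNOTATION}-"]
            for name in mhc_names]


if __name__ == "__main__":
    # Dry run with made-up resource and node names.
    now = datetime.now(timezone.utc)
    if certs_outlive_downtime(now + timedelta(days=30), now, timedelta(days=3)):
        for cmd in build_shutdown_plan(["example-mhc"], "master-0",
                                       ["master-0", "worker-0"]):
            print(shlex.join(cmd))
```

Keeping the plan as plain data makes it easy to review or log before anything touches the cluster, which seems appropriate for an operation that can render a cluster unrecoverable.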
5. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.
OpenShift does not require a manual etcd backup before the cluster shutdown
A failed OpenShift cluster recovers in less time than recreating the cluster takes (i.e., in less than 10 minutes)
6. Does the customer have any specific timeline dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?
OpenShift 4.x
7. List any affected packages or components.
OpenShift 4.x
8. Would the customer be able to assist in testing this functionality if implemented?
We and our customers would love to test this feature if implemented.
Is incorporated by: OCPSTRAT-1483 Capability to trigger full shutdown of an OpenShift cluster (status: New)