  1. Red Hat Advanced Cluster Management
  2. ACM-8541

DOC caveats for Hibernate 2.9 refresh and 2.10



      Describe the changes in the doc and link to your dev story

      Provide info for the following steps:

      1. - [X] Mandatory: Add the required version to the Fix version/s field.

      2. - [ ] Mandatory: Choose the type of documentation change.

            - [ ] New topic in an existing section or new section
            - [X] Update to an existing topic

      4. - [ ] Mandatory for bugs: What is the diff? Clearly define what the problem is, what the change is, and link to the current documentation:

       

      Please see https://issues.redhat.com/browse/ACM-8428

      As part of the effort to make Hibernate GA, we also need to document best practices so that customers do not crash a cluster when hibernating it.

       

      Background
      Hibernating your OCP clusters in the cloud can reduce cloud costs. This feature is currently implemented through the Hive API, which builds on our public OCP documentation (https://docs.openshift.com/container-platform/4.12/backup_and_restore/graceful-cluster-shutdown.html) for graceful cluster shutdown and restart. For more information about the Hive API and the Hibernate feature, see https://github.com/openshift/hive/blob/master/docs/hibernating-clusters.md

      To hibernate a cluster, Hive effectively shuts down all of its VM instances. To resume the cluster, Hive starts the instances back up and approves any pending CSRs.
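
As a rough illustration of how hibernate and resume are requested, the sketch below builds the merge patch that, per the Hive documentation linked above, toggles `spec.powerState` on a ClusterDeployment between `Hibernating` and `Running`. The helper names (`power_state_patch`, `oc_patch_command`) and the cluster name and namespace in the usage note are hypothetical; the code only assembles the command and does not run it.

```python
import json

# Power states described in the Hive hibernation docs.
VALID_POWER_STATES = ("Hibernating", "Running")


def power_state_patch(state: str) -> str:
    """Return a JSON merge patch that sets spec.powerState (hypothetical helper)."""
    if state not in VALID_POWER_STATES:
        raise ValueError(f"unsupported power state: {state!r}")
    return json.dumps({"spec": {"powerState": state}})


def oc_patch_command(name: str, namespace: str, state: str) -> list:
    """Build, but do not execute, an `oc patch` argument list (hypothetical helper)."""
    return [
        "oc", "patch", "clusterdeployment", name,
        "-n", namespace,
        "--type=merge",
        "-p", power_state_patch(state),
    ]
```

For example, `oc_patch_command("my-cluster", "my-namespace", "Hibernating")` yields an argument list that could be passed to `subprocess.run` on a hub cluster where Hive is installed; passing `"Running"` instead builds the resume request.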

      There are some limitations and best practices to be aware of:

      • Do not hibernate an OCP cluster within 24 hours of provisioning it.
      • Do not hibernate an OCP cluster for longer than 60 days.
      • Do not hibernate an OCP cluster while any MachineConfigPool is in the updating status.
      • After a cluster resumes, do not hibernate it again until you have confirmed that all cluster operations are running in a normal state:
        • Check the output of "oc get co" and "oc get clusterversion" to confirm that nothing reports pending progress or updates in flight.
      • A cluster resume can take roughly 5 to 45 minutes, depending on the cloud services and on how long OCP cluster operations take to settle.
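
The guidelines above could be turned into a pre-hibernate check along the following lines. This is a minimal sketch, not part of Hive or ACM: it assumes the caller has already loaded the `items` arrays from `oc get co -o json`, `oc get clusterversion -o json`, and `oc get mcp -o json`, and it assumes that "pending progress" corresponds to a `Progressing` condition with status `True` and that an updating pool reports an `Updating` condition with status `True`. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MIN_AGE_BEFORE_HIBERNATE = timedelta(hours=24)  # from the 24-hour guideline
MAX_HIBERNATION = timedelta(days=60)            # from the 60-day guideline


def condition_is_true(resource: dict, condition_type: str) -> bool:
    """True if the resource has the given status condition set to "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == condition_type and c.get("status") == "True"
        for c in conditions
    )


def name_of(resource: dict) -> str:
    return resource.get("metadata", {}).get("name", "<unnamed>")


def hibernate_blockers(provisioned_at, now, cluster_operators,
                       cluster_versions, machine_config_pools) -> list:
    """Return human-readable reasons not to hibernate yet (empty list if none)."""
    blockers = []
    if now - provisioned_at < MIN_AGE_BEFORE_HIBERNATE:
        blockers.append("cluster was provisioned less than 24 hours ago")
    for pool in machine_config_pools:
        if condition_is_true(pool, "Updating"):
            blockers.append(f"MachineConfigPool {name_of(pool)} is updating")
    for operator in cluster_operators:
        if condition_is_true(operator, "Progressing"):
            blockers.append(f"ClusterOperator {name_of(operator)} is progressing")
    for version in cluster_versions:
        if condition_is_true(version, "Progressing"):
            blockers.append(f"ClusterVersion {name_of(version)} is progressing")
    return blockers


def latest_resume_time(hibernated_at: datetime) -> datetime:
    """Latest time by which the cluster should be resumed under the 60-day guideline."""
    return hibernated_at + MAX_HIBERNATION
```

An empty list from `hibernate_blockers` means none of the listed conditions were detected; it does not guarantee that the cluster is healthy, so the manual checks above still apply.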

            bswope@redhat.com Brandi Swope
            sberens@redhat.com Scott Berens
            Balachandran Chandrasekaran, Bradd Weidenbenner, Eric Fried, Jeffrey Brent (Inactive)