- Type: Feature
- Resolution: Unresolved
- Priority: Major
Epic Goal
- Make the hibernation feature able to stop GCP virtual machines that have Local SSD(s) attached (e.g., a2-ultragpu-4g, which ships with NVIDIA A100 GPUs), as sketched below.
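For background, GCP refuses to stop an instance with Local SSDs unless the Local SSD data is explicitly discarded (or, where supported, preserved). The following is a minimal Go sketch of the underlying call, assuming the Go Compute client (cloud.google.com/go/compute/apiv1) exposes the instances.stop discardLocalSsd parameter as a DiscardLocalSsd field; project, zone, and instance names are placeholders.

```go
package main

import (
	"context"
	"fmt"

	compute "cloud.google.com/go/compute/apiv1"
	computepb "cloud.google.com/go/compute/apiv1/computepb"
	"google.golang.org/protobuf/proto"
)

func main() {
	ctx := context.Background()

	// Client for the GCP Compute "instances" API.
	instances, err := compute.NewInstancesRESTClient(ctx)
	if err != nil {
		panic(err)
	}
	defer instances.Close()

	// Stop the VM and discard Local SSD contents. Without an explicit
	// decision about the Local SSD data, GCP will not stop instances
	// that have Local SSDs attached.
	op, err := instances.Stop(ctx, &computepb.StopInstanceRequest{
		Project:         "my-project",    // placeholder
		Zone:            "us-central1-a", // placeholder
		Instance:        "gpu-worker-0",  // placeholder
		DiscardLocalSsd: proto.Bool(true), // assumed field name for the discardLocalSsd parameter
	})
	if err != nil {
		panic(err)
	}
	if err := op.Wait(ctx); err != nil {
		panic(err)
	}
	fmt.Println("instance stopped, Local SSD data discarded")
}
```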
Why is this important?
- Clusters with GPU nodes become only partially hibernated: every node except the GPU ones is stopped, so the GPU instances keep being billed.
- Users have no indication this is happening until they check the Google Cloud console or read the Hive ClusterDeployment status.
- It affects both managed (OSD) and self-managed clusters.
Scenarios
- Create a cluster with Hive on GCP
- Add a GPU worker node to the cluster, using the a2-ultragpu-4g machine type for example
- Trigger cluster hibernation via Hive (see the sketch after this list)
- Check the VM status in the GCP console
- Check the Hive ClusterDeployment conditions
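For the "trigger cluster hibernation via Hive" step, a minimal Go sketch is below, assuming the Hive v1 API module (github.com/openshift/hive/apis/hive/v1) and a controller-runtime client; hibernation is requested by setting the ClusterDeployment's spec.powerState to Hibernating. Namespace and name are placeholders.

```go
package main

import (
	"context"
	"fmt"

	hivev1 "github.com/openshift/hive/apis/hive/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	// Register the Hive types so the client can decode ClusterDeployments.
	scheme := runtime.NewScheme()
	_ = hivev1.AddToScheme(scheme)

	cfg := ctrl.GetConfigOrDie()
	c, err := client.New(cfg, client.Options{Scheme: scheme})
	if err != nil {
		panic(err)
	}

	ctx := context.Background()
	cd := &hivev1.ClusterDeployment{}
	// Placeholder namespace/name of the target ClusterDeployment.
	key := types.NamespacedName{Namespace: "my-cluster-ns", Name: "my-cluster"}
	if err := c.Get(ctx, key, cd); err != nil {
		panic(err)
	}

	// Setting spec.powerState to "Hibernating" asks Hive to stop the
	// cluster's VMs (constant name in the Hive API may differ).
	cd.Spec.PowerState = hivev1.ClusterPowerState("Hibernating")
	if err := c.Update(ctx, cd); err != nil {
		panic(err)
	}
	fmt.Println("hibernation requested for", key)
}
```

After the update, the remaining scenario steps amount to watching the VM states in the GCP console and the hibernation-related conditions in the ClusterDeployment status.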
Acceptance Criteria
- GPU nodes get hibernated just like the other worker/master nodes
- Hive exposes an option corresponding to gcloud's --discard-local-ssd flag (https://cloud.google.com/compute/docs/disks/local-ssd#stop_instance); one possible API shape is sketched after this list
- Others TBD
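Purely as an illustration of the second acceptance criterion (not the actual Hive API, which this epic will define), a hypothetical spec field could look like the following; the field name DiscardLocalSsdOnHibernate and the surrounding struct are invented for this sketch.

```go
// Hypothetical sketch only; the real Hive API change is what this epic decides.
package v1gcp

// Platform loosely mirrors Hive's GCP platform settings on the ClusterDeployment.
type Platform struct {
	// Region the cluster runs in (existing kind of field).
	Region string `json:"region"`

	// DiscardLocalSsdOnHibernate (hypothetical name) would map to the
	// Compute API's discardLocalSsd stop parameter / gcloud's
	// --discard-local-ssd flag, so instances with Local SSDs can be
	// stopped during hibernation. Defaulting to false would keep
	// today's behavior, where such instances are left running.
	// +optional
	DiscardLocalSsdOnHibernate *bool `json:"discardLocalSsdOnHibernate,omitempty"`
}
```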
Dependencies (internal and external)
- Does OCM have a dependency on Hive for exposing this option?
Previous Work (Optional):
- …
Open questions:
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
Issue links
- Clones HIVE-2693 "[GCP] Handle hibernation for VMs with Local SSDs" (Closed)