Type: Feature
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: OCP 4.19.0
Component/s: None
Labels:
- CY25
- CY26
- dkgap
- dkgap-ops
- shortlist25

Activity Type:
Product / Portfolio Work
Story Points:
8
Blocked:
False
Blocked Reason:

dkgap-team: ECOSYSTEM Dragonfly
Ready:
False
Color Status:
Green
Hierarchy Progress Bar:

25% To Do, 75% In Progress, 0% Done
Status Summary:

2026-02-25: first steps towards CNV-69736 - Validate the storage based (SBD) health check from RHWA-454 works with VMs...

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

PX Impact Score:

Market:

Feature Overview

The customer is experiencing long VM restart times of ~15 minutes when a host fails.

Goals

The expected user outcome is for the VM to restart within seconds on another node in case of a host failure, allowing applications/servers to have minimal downtime.

Requirements

Even with the improvements in FAR (90 seconds to host recovery), this is a small-footprint critical system: a failed VM needs to start on another host as quickly as possible, and the failed host needs to be rebooted.

Requirement	Notes	isMvp?
CI - MUST be running successfully with test automation	This is a requirement for ALL features.	YES
Release Technical Enablement	Provide necessary release enablement details and documents.	YES

Questions to answer

What system checks or health checks are performed on an IPI installation with the MachineHealthCheck controller?
With the MachineHealthCheck controller, how much time or range of time will the VM take to restart?
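One way to frame the second question is as a sum of sequential phases between the host failing and the VM running again. The sketch below is illustrative only: the phase names and every number except the 90-second FAR host-recovery figure quoted under Requirements are hypothetical placeholders, not measurements, and the assumption that the phases run strictly one after another may not hold on a real cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One stage between host failure and the VM running again."""
    name: str
    min_seconds: float
    max_seconds: float


def restart_window(phases):
    """Return (best_case, worst_case) total seconds, assuming sequential phases."""
    best = sum(p.min_seconds for p in phases)
    worst = sum(p.max_seconds for p in phases)
    return best, worst


# Hypothetical placeholder values; replace with numbers measured on the cluster.
# Only the 90 s upper bound for host remediation comes from this feature request.
phases = [
    Phase("failure detection", 30, 60),
    Phase("host remediation (fencing)", 60, 90),
    Phase("VM rescheduling and boot", 20, 45),
]

best, worst = restart_window(phases)
print(f"estimated VM restart window: {best:.0f}s to {worst:.0f}s")
```

Breaking the total down this way shows which phase dominates the restart time and therefore where tuning or documentation would matter most.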

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Documentation Considerations

Questions to be addressed:

What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
- The customer needs reference material showing which health checks are performed when a host fails and which best practices to follow to minimize downtime
Does this feature have doc impact?
- Yes
What concepts do customers need to understand to be successful in [action]?
- The customer needs answers to the questions listed under "Questions to answer" above
How do we expect customers will use the feature? For what purpose(s)?
- They will use this feature to minimize VM downtime due to a host failure
What reference material might a customer want/need to complete [action]?
- Documentation listing the steps needed to remediate a host failure and how long the VM may take to restart on another node
Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
- N/A.
What is the doc impact (New Content, Updates to existing content, or Release Note)?
- Release Note

is blocked by

CNV-36134 Reduce time to redeploy VM scheduled on unhealthy node on 4.15.1

Closed

is related to

CNV-30903 Knowledge base article on VM recovery time on 4.14

Closed

CNV-58935 spike: Research how to decrease time to node failure detection

Closed

relates to

CNV-60410 Faster remediation start with baremetal events

In Progress

split from

VIRTSTRAT-545 Faster remediation time

Closed

links to

#tmp-virtstrat-77

brain storming in #fourm-openshift-virtualizaiton

Preview : AWS Windows License Included for ROSA w/ HCP hosts


Assignee: Martin Tessun

Reporter: Faith Bravo (Inactive)

Architect: Dominik Holler

Votes: 1

Watchers: 22

Created: 2023/08/08 9:15 PM

Updated: 2026/02/25 2:08 PM
