Type: Epic
Resolution: Done
Priority: Critical
Fix Version/s: CNV v4.15.0
Affects Version/s: None
Component/s: CNV Infrastructure
Labels:
- CY24
- no-docs
- no-ui
- no-ux
- reviewed-by-eng

Epic Name:
NHC times KCS
Activity Type:
Product / Portfolio Work
Acceptance Criteria:
- Internally document how long a VM takes to recover when its node dies, for each approach (SNR and FAR).
- Publish a knowledge base article on recovery time tuning that states which configurations are possible and tested.
- Find opportunities or configuration changes to optimize eviction from the node and recovery time.
- Provide a table describing the recovery times of the different approaches, specifically SNR and FAR.
Current Status:
Green
Epic Status:
To Do
Feature Link:
VIRTSTRAT-305 - Fencing - Compact & FAR
Parent Link:
VIRTSTRAT-305 - Fencing - Compact & FAR
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done
Ready-Ready:

dev-ready, doc-ready, po-ready, qe-ready, ux-ready
Status Summary:

2024-02-19: on track...


Sprint:
CNV Infra 243, CNV Infra Next

Target Version:

CNV v4.15.0

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Goal

Internally document how much time it takes a VM to recover if the node dies.
Document in a knowledge base article which configurations are possible and tested.
Find opportunities/config change to optimize the eviction from the node / recovery time.

User Stories

As a VM owner, I want my VM to be recovered as quickly as possible if the node running it dies, so that my application has only a little downtime, even if the node reboots frequently.

Non-Requirements

We will start with NHC; MHC is a possible follow-up.

Notes

Maybe the SAP cluster could be used?

- Geetika, is the cluster good enough for the performance questions?
6-node or 3-node cluster?
- Ronen
On which remediator should the scenario be focused? SNR, FAR or Metal3?
- Ronen
Help from the virt team might be required to tune the VM.
Help from NHC might be required for the tuning.
A matrix of combinations influences the recovery time:
- Remediators (SNR + FAR)
- Health Check
- Cluster size (3 node, 6 node)
  -> we have to start with one combination, and can then extend to other scenarios
Start with CNV 4.13; the article might refer to CNV 4.14.
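
To make the size of the combination matrix in these notes concrete, here is a minimal sketch that enumerates it. The names `REMEDIATORS`, `HEALTH_CHECKS`, `CLUSTER_SIZES` and `build_matrix` are hypothetical and exist only for this illustration. The health check list assumes NHC only, per the Non-Requirements, with MHC as a possible later addition.

```python
from itertools import product

# Dimensions taken from the notes; the variable names are illustrative only.
REMEDIATORS = ["SNR", "FAR"]
HEALTH_CHECKS = ["NHC"]   # assumption: MHC would be appended here as a follow-up
CLUSTER_SIZES = [3, 6]    # number of nodes


def build_matrix(remediators=REMEDIATORS,
                 health_checks=HEALTH_CHECKS,
                 cluster_sizes=CLUSTER_SIZES):
    """Return every (remediator, health check, cluster size) combination."""
    return [
        {"remediator": r, "health_check": h, "nodes": n}
        for r, h, n in product(remediators, health_checks, cluster_sizes)
    ]


if __name__ == "__main__":
    for combo in build_matrix():
        print(f"{combo['remediator']} + {combo['health_check']} on {combo['nodes']} nodes")
```

With the values above this yields four scenarios; adding MHC would double that, which is one reason to start with a single combination and extend later.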

is cloned by

CNV-39369 Update Knowledge base article an VM recovery time on CNV 4.16

Closed

is depended on by

CNV-36134 Reduce time to redeploy VM scheduled on unhealthy node on 4.15.1

Closed

relates to

CNV-25645 Target additional testing for NHC and SNR with compact clusters

Closed

VIRTSTRAT-77 Fencing: Additional out-of-band health checks for faster remediation

In Progress

links to

article: OpenShift Virtualization – VM High Availability Guide

discussion about node adjustment

Node recovery delays - Dragonfly

Shorten Recovery Delays


Assignee: Javier Cano Cano

Reporter: Dominik Holler

QA Contact: Geetika Kapoor

Votes: 0

Watchers: 9

Created: 2023/07/13 11:41 AM

Updated: 2025/08/04 9:14 PM

Resolved: 2024/02/28 5:16 PM
