Type: Feature
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:

Activity Type:
Product / Portfolio Work
Parent Link:
None
Hierarchy Progress Bar:

0% To Do, 100% In Progress, 0% Done
Blocked:
False
Blocked Reason:

None
Ready:
False
Size:
None

Target Version:

openshift-4.22
Release Blocker:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
None
PX Priority Data:
None
PX Impact Score:
PX Technical Impact:
None
PX Impact Range:
None
PX Scheduling Request:
None
PX Technical Impact Notes:
None

Intelligence Requested:
Market:

Feature Overview

Implement a high-performance backup mechanism for the HyperShift OADP plugin that ensures data consistency while maintaining cluster availability. By replacing the long-duration reconciliation freeze with a targeted etcd snapshot, we will keep hosted cluster controllers operating almost continuously. The solution decouples the backup process from the global reconciliation loop, so that critical day-two operations such as autoscaling and node provisioning remain functional throughout the backup window.
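The targeted-pause idea can be sketched as follows. This is a minimal Python sketch, assuming the plugin can express "pause", "resume", "take snapshot", and "upload snapshot" as separate steps; `reconciliation_paused`, `run_backup`, and all four callbacks are hypothetical names, not the hypershift-oadp-plugin API. It shows the variant where a short pause is kept around the snapshot; in a real plugin, `pause`/`resume` might toggle a field such as the HostedCluster's `spec.pausedUntil`.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def reconciliation_paused(pause: Callable[[], None],
                          resume: Callable[[], None]) -> Iterator[None]:
    """Hold the reconciliation pause only for the body of the `with` block.

    `pause` and `resume` are hypothetical callbacks supplied by the caller.
    """
    pause()
    try:
        yield
    finally:
        # Resume even if the snapshot step raises, so a failed backup
        # never leaves hosted cluster controllers frozen.
        resume()


def run_backup(pause: Callable[[], None],
               resume: Callable[[], None],
               take_snapshot: Callable[[], str],
               upload_snapshot: Callable[[str], None]) -> str:
    """Pause only while the etcd snapshot is written to disk; the slower
    volume upload runs after reconciliation has resumed."""
    with reconciliation_paused(pause, resume):
        snapshot_path = take_snapshot()
    upload_snapshot(snapshot_path)
    return snapshot_path


if __name__ == "__main__":
    log = []
    run_backup(
        pause=lambda: log.append("pause"),
        resume=lambda: log.append("resume"),
        # list.append returns None, so `or` yields the snapshot path.
        take_snapshot=lambda: log.append("snapshot") or "/tmp/etcd-snapshot.db",
        upload_snapshot=lambda p: log.append(f"upload {p}"),
    )
    print(log)  # ['pause', 'snapshot', 'resume', 'upload /tmp/etcd-snapshot.db']
```

The context manager makes the pause window structurally bounded: the upload cannot start until `resume` has run, and `resume` runs on both success and failure.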

Why is this important?

To maintain a high-quality user experience and meet Recovery Point Objective (RPO) targets, we must reduce or eliminate the reconciliation pause during backups. In the current hypershift-oadp-plugin implementation, this pause blocks critical day-two operations, such as autoscaling, for roughly 6 to 8 minutes per snapshot (often 20-30 minutes in total per backup cycle on Azure). For customers requiring hourly backups, this adds up to more than two hours per day during which cluster operations are blocked, which is unacceptable for production workloads.
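The daily figure follows from simple multiplication. This sketch assumes one pause per hourly backup, i.e. 24 pauses per day; `daily_paused_minutes` is an illustrative helper, not part of any plugin.

```python
def daily_paused_minutes(pause_minutes_per_backup: float,
                         backups_per_day: int = 24) -> float:
    """Minutes per day during which reconciliation is paused."""
    return pause_minutes_per_backup * backups_per_day


# Assumes one pause per hourly backup (24 backups per day).
for minutes in (6, 8):
    total = daily_paused_minutes(minutes)
    print(f"{minutes} min per backup -> {total:.0f} min/day ({total / 60:.1f} h)")
# prints:
# 6 min per backup -> 144 min/day (2.4 h)
# 8 min per backup -> 192 min/day (3.2 h)
```

With the 20-30 minute per-cycle figure quoted for Azure, the same calculation gives 480 to 720 minutes, i.e. 8 to 12 hours per day.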

Proposed solution

Implement Pre-Backup Hooks: Use OADP pre-backup job hooks to run an etcdctl snapshot directly to disk.
Minimize Pause Duration: Use the disk-based etcd snapshot for the Persistent Volume (PV) backup, so that the reconciliation pause can be either eliminated or limited to the brief period required for the initial etcdctl command.
Ensure Consistency: This approach captures a concrete, consistent etcd state without freezing cluster controllers for long periods.
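The first two items can be made concrete with a Python sketch that builds the `etcdctl snapshot save` invocation and wraps it in a Velero backup pre-hook. It uses a Velero exec hook (`spec.hooks.resources[].pre[].exec`), which runs a command inside an existing container; this is a stand-in for the OADP "job hook" named above, which may differ. The endpoint, certificate paths, snapshot path, `app: etcd` label, `etcd` container name, and namespace are all assumptions, not confirmed HyperShift values.

```python
import json
from typing import Any, Dict, List

# Hypothetical placeholders: not confirmed HyperShift endpoints or paths.
ETCD_ENDPOINT = "https://localhost:2379"
CA_CERT = "/etc/etcd/tls/ca.crt"
CLIENT_CERT = "/etc/etcd/tls/client.crt"
CLIENT_KEY = "/etc/etcd/tls/client.key"
# Written onto the etcd data volume so the later PV backup includes it.
SNAPSHOT_PATH = "/var/lib/data/etcd-backup.db"


def etcd_snapshot_command(snapshot_path: str = SNAPSHOT_PATH) -> List[str]:
    """Argv for `etcdctl snapshot save` using TLS client credentials."""
    return [
        "etcdctl",
        f"--endpoints={ETCD_ENDPOINT}",
        f"--cacert={CA_CERT}",
        f"--cert={CLIENT_CERT}",
        f"--key={CLIENT_KEY}",
        "snapshot", "save", snapshot_path,
    ]


def pre_backup_hooks(namespace: str) -> Dict[str, Any]:
    """A Velero Backup `spec.hooks` fragment that runs the snapshot in the
    etcd pod before that pod's volumes are backed up."""
    return {
        "resources": [{
            "name": "etcd-snapshot",
            "includedNamespaces": [namespace],
            # Hypothetical label and container name for the etcd pod.
            "labelSelector": {"matchLabels": {"app": "etcd"}},
            "pre": [{
                "exec": {
                    "container": "etcd",
                    "command": etcd_snapshot_command(),
                    "onError": "Fail",
                    "timeout": "2m",
                }
            }],
        }]
    }


if __name__ == "__main__":
    # "clusters-example" is a hypothetical hosted control plane namespace.
    print(json.dumps(pre_backup_hooks("clusters-example"), indent=2))
```

`onError: Fail` is chosen so that a failed snapshot is surfaced instead of silently skipped, and writing the snapshot file onto the etcd volume means the PV backup captures a single consistent file rather than a live data directory.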

Design

Design doc: https://docs.google.com/document/d/1zIor04wVvZukruaHr3iz9XJBbKh0akzkGEq86aHyPRo/edit?tab=t.ekw0ujiqg0xs#heading=h.mccc0wnss45f

links to

openshift/enhancements#1945: Add EtcdBackup CRD enhancement for OADP integration

Assignee: Yu Li

Reporter: Ramon Acedo

Need Info From: None

Contributors: Liangquan Li, Martin Gencur, Salvatore Dario Minonne

Architect: Juan Manuel Parrilla Madrid

QA Contact: Ge Liu

Doc Contact: Matthew Werner

Product Operations Engineering Contact: None

Votes: 0

Watchers: 4

Created: 2025/12/18 9:56 AM

Updated: 2026/02/19 6:25 PM
