Type: Epic
Resolution: Unresolved
Priority: Normal
Fix Version/s: rhos-19.0.0 GA
Affects Version/s: rhos-18.0.17 FR 5
Component/s: infra-operator
Labels:
- triaged

Epic Name:
Implement PodRemediator controller for local PVC remediation in infra-operator
Activity Type:
Feature Tracking
Blocked:
False
Blocked Reason:

None
Ready:
False
Parent Link:
RHOSSTRAT-1214 - Implement PodRemediator for stateful PVC remediation
Color Status:
Not Selected
Dev Approval:
?
Docs Approval:
?
Epic Status:
To Do
Feature Link:
RHOSSTRAT-1214 - Implement PodRemediator for stateful PVC remediation
PM Approval:
?
AssignedTeam:
rhos-ops-platform-services-pidone
QE Approval:
?
Hierarchy Progress Bar:

50% To Do, 50% In Progress, 0% Done
Intelligence Requested:
Market:
PX Impact Score:

Severity:
Moderate

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Goal:

Implement the PodRemediator controller in the infra-operator to remediate workloads using local PVCs when worker nodes become unhealthy. The controller must integrate with Node Health Check (NHC) and Self Node Remediation (SNR), expose a PodRemediator CRD, and run in the openstack-operators namespace. The goal is to improve resilience for stateful workloads (e.g. Galera, RabbitMQ) without requiring changes inside application operators.

Acceptance Criteria:

PodRemediator controller deployment runs in the openstack-operators namespace, aligned with other infra-operator controllers.
RBAC is in place and grants the controller the permissions it requires.
When NHC and SNR are installed and the supplemental RBAC is applied, the PodRemediator CR reaches status.conditions[type=Ready].status=True.
A from-scratch POC workflow exists and is automated:
- build/use a custom infra-operator image that includes PodRemediator
- apply CRD and RBAC, deploy PodRemediator CR
- install and configure NHC/SNR
- run a smoke test that validates PodRemediator is Ready and no remediation is triggered while nodes are healthy
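The smoke-test check above could be sketched as follows. This is a minimal illustration, not the epic's actual test: it assumes the PodRemediator CR, the Node objects, and any remediation CRs have already been fetched as JSON (for example with `oc get ... -o json`) and parsed into dictionaries. The function names and the idea of counting existing remediation objects as the signal for "remediation was triggered" are assumptions made for this sketch.

```python
from typing import Any, Mapping, Optional, Sequence


def condition_status(obj: Mapping[str, Any], cond_type: str) -> Optional[str]:
    """Return the status string of the named condition, or None if absent."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def smoke_test(
    remediator: Mapping[str, Any],
    nodes: Sequence[Mapping[str, Any]],
    remediations: Sequence[Mapping[str, Any]],
) -> list:
    """Return a list of failure messages; an empty list means the test passed.

    `remediations` is assumed to hold whatever remediation objects exist
    in the cluster at check time (hypothetical input for this sketch).
    """
    failures = []
    if condition_status(remediator, "Ready") != "True":
        failures.append("PodRemediator is not Ready")

    unhealthy = [
        n["metadata"]["name"]
        for n in nodes
        if condition_status(n, "Ready") != "True"
    ]
    if unhealthy:
        # The smoke test is only meaningful while all nodes are healthy.
        failures.append("precondition failed, unhealthy nodes: %s" % unhealthy)
    elif remediations:
        failures.append("remediation triggered while all nodes are healthy")
    return failures
```

Returning a list of messages rather than a boolean keeps the result usable both as a pass/fail signal and as diagnostic output in an automated job.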

An E2E remediation scenario is automated in the repo (e.g. Ansible playbook + helper script):
- create a stateful workload with a local PVC
- simulate node failure (e.g. virsh destroy of the worker VM) so the node becomes NotReady
- verify that PodRemediator deletes the PVC bound to that node
- verify that the workload is recreated on another node with a new PVC
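The two verification steps of the E2E scenario could be sketched like this. It is an illustration under stated assumptions, not the controller's or the playbook's actual logic: it assumes local PersistentVolumes are pinned to a node through `spec.nodeAffinity` on the `kubernetes.io/hostname` label, that objects are parsed from JSON into dictionaries, and that a changed PVC UID is an acceptable signal that the old claim was deleted and a new one created. The function names are hypothetical.

```python
from typing import Any, Mapping, Optional, Sequence

HOSTNAME_KEY = "kubernetes.io/hostname"


def pv_nodes(pv: Mapping[str, Any]) -> set:
    """Hostnames a PV is pinned to through its required node affinity."""
    terms = (
        pv.get("spec", {})
        .get("nodeAffinity", {})
        .get("required", {})
        .get("nodeSelectorTerms", [])
    )
    hosts = set()
    for term in terms:
        for expr in term.get("matchExpressions", []):
            if expr.get("key") == HOSTNAME_KEY and expr.get("operator") == "In":
                hosts.update(expr.get("values", []))
    return hosts


def claims_on_node(pvs: Sequence[Mapping[str, Any]], node: str) -> set:
    """Return (namespace, name) of every PVC bound to a PV pinned to `node`."""
    claims = set()
    for pv in pvs:
        ref = pv.get("spec", {}).get("claimRef")
        if ref and node in pv_nodes(pv):
            claims.add((ref["namespace"], ref["name"]))
    return claims


def verify_remediation(
    old_pvc: Mapping[str, Any],
    new_pvc: Optional[Mapping[str, Any]],
    pod: Optional[Mapping[str, Any]],
    failed_node: str,
) -> list:
    """Return failure messages; an empty list means remediation looks complete."""
    failures = []
    if new_pvc is None:
        failures.append("PVC was not recreated")
    elif new_pvc["metadata"]["uid"] == old_pvc["metadata"]["uid"]:
        # Same UID means the original claim is still in place.
        failures.append("PVC bound to the failed node was not deleted")
    node_name = pod.get("spec", {}).get("nodeName") if pod else None
    if node_name is None or node_name == failed_node:
        failures.append("workload was not rescheduled to another node")
    return failures
```

A test harness would record the PVC before the node failure, then poll and call `verify_remediation` until it returns an empty list or a timeout expires.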


Assignee: Antonio Romito

Reporter: Antonio Romito

Team: rhos-dfg-pidone

Votes: 0

Watchers: 2

Created: 2026/03/06 9:23 AM

Updated: 2026/03/09 11:59 AM
