Implement PodRemediator controller for local PVC remediation in infra-operator
Epic
Resolution: Unresolved
Normal
rhos-18.0.17 FR 5
Feature Tracking
To Do
RHOSSTRAT-1214 - Implement PodRemediator for stateful PVC remediation
rhos-ops-platform-services-pidone
50% To Do, 50% In Progress, 0% Done
Moderate
Goal:
Implement the PodRemediator controller in the infra-operator to remediate workloads using local PVCs when worker nodes become unhealthy. The controller must integrate with Node Health Check (NHC) and Self Node Remediation (SNR), expose a PodRemediator CRD, and run in the openstack-operators namespace. The goal is to improve resilience for stateful workloads (e.g. Galera, RabbitMQ) without requiring changes inside application operators.
Acceptance Criteria:
- PodRemediator controller deployment runs in the openstack-operators namespace, aligned with other infra-operator controllers.
- RBAC is in place, granting the controller the permissions it requires.
- When NHC and SNR are installed and the supplemental RBAC is applied, the PodRemediator CR reaches status.conditions[type=Ready].status=True.
- A from-scratch POC workflow exists and is automated:
  - build/use a custom infra-operator image that includes PodRemediator
  - apply CRD and RBAC, deploy PodRemediator CR
  - install and configure NHC/SNR
  - run a smoke test that validates PodRemediator is Ready and no remediation is triggered while nodes are healthy
- An E2E remediation scenario is automated in the repo (e.g. Ansible playbook + helper script):
  - create a stateful workload with a local PVC
  - simulate node failure (e.g. virsh destroy of the worker VM) so the node becomes NotReady
  - verify that PodRemediator deletes the PVC bound to that node
  - verify that the workload is recreated on another node with a new PVC
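
The two checks the criteria rely on (a Ready condition with status True, and selecting the PVCs bound to a NotReady node) can be sketched in Go as below. This is only an illustration under stated assumptions, not the infra-operator implementation: the `Condition`, `Node`, and `PVC` types and the `IsReady` and `PVCsToDelete` functions are hypothetical, stdlib-only stand-ins for the real Kubernetes API objects, and the sketch ignores the NHC/SNR handshake that would normally gate any deletion.

```go
package main

import "fmt"

// Condition is a hypothetical stand-in for a Kubernetes status condition
// (a type/status pair such as Ready/True).
type Condition struct {
	Type   string
	Status string
}

// Node is a hypothetical, trimmed-down view of a worker node.
type Node struct {
	Name       string
	Conditions []Condition
}

// PVC is a hypothetical, trimmed-down view of a claim backed by node-local storage.
type PVC struct {
	Name      string
	BoundNode string // assumption: the node whose local volume backs this claim
}

// IsReady reports whether conds contains type=Ready with status=True.
// A missing Ready condition is treated as not ready.
func IsReady(conds []Condition) bool {
	for _, c := range conds {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// PVCsToDelete returns the names of claims bound to nodes that are not Ready.
// Claims bound to nodes absent from the list are left alone.
func PVCsToDelete(nodes []Node, pvcs []PVC) []string {
	unhealthy := make(map[string]bool)
	for _, n := range nodes {
		if !IsReady(n.Conditions) {
			unhealthy[n.Name] = true
		}
	}
	var out []string
	for _, p := range pvcs {
		if unhealthy[p.BoundNode] {
			out = append(out, p.Name)
		}
	}
	return out
}

func main() {
	nodes := []Node{
		{Name: "worker-0", Conditions: []Condition{{Type: "Ready", Status: "True"}}},
		{Name: "worker-1", Conditions: []Condition{{Type: "Ready", Status: "Unknown"}}},
	}
	pvcs := []PVC{
		{Name: "data-galera-0", BoundNode: "worker-0"},
		{Name: "data-galera-1", BoundNode: "worker-1"},
	}
	// With all nodes healthy the result is empty, which is the smoke-test expectation.
	fmt.Println(PVCsToDelete(nodes, pvcs))
}
```

Keeping the selection logic as a pure function of node and PVC state makes both the smoke test (no remediation while nodes are healthy) and the E2E scenario (only the failed node's PVC is removed) straightforward to assert on without a cluster.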