Red Hat OpenStack Services on OpenShift / OSPRH-27279

Implement PodRemediator controller for local PVC remediation in infra-operator

    • Feature Tracking
    • RHOSSTRAT-1214 Implement PodRemediator for stateful PVC remediation
    • To Do
    • rhos-ops-platform-services-pidone
    • 50% To Do, 50% In Progress, 0% Done
    • Moderate

      Goal:

      Implement the PodRemediator controller in the infra-operator to remediate workloads using local PVCs when worker nodes become unhealthy. The controller must integrate with Node Health Check (NHC) and Self Node Remediation (SNR), expose a PodRemediator CRD, and run in the openstack-operators namespace. The goal is to improve resilience for stateful workloads (e.g. Galera, RabbitMQ) without requiring changes inside application operators.
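
As a rough illustration of the remediation decision described in the goal, the sketch below picks out the claims that sit on local volumes pinned to a node that is not Ready. It is not the controller's actual implementation (which lives in the infra-operator): it works on plain dicts shaped like Kubernetes Node and PersistentVolume objects, the function names are hypothetical, and it assumes local PVs are pinned through a `kubernetes.io/hostname` node-affinity expression.

```python
from __future__ import annotations

from typing import Any

HOSTNAME_LABEL = "kubernetes.io/hostname"


def node_is_ready(node: dict[str, Any]) -> bool:
    """Return True only if the node reports condition Ready with status "True"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition reported: treat the node as unhealthy


def pv_pinned_hosts(pv: dict[str, Any]) -> set[str]:
    """Collect the hostnames a PV is pinned to through its required node affinity."""
    hosts: set[str] = set()
    required = pv.get("spec", {}).get("nodeAffinity", {}).get("required", {})
    for term in required.get("nodeSelectorTerms", []):
        for expr in term.get("matchExpressions", []):
            # Assumption: only hostname-based "In" expressions are considered.
            if expr.get("key") == HOSTNAME_LABEL and expr.get("operator") == "In":
                hosts.update(expr.get("values", []))
    return hosts


def claims_to_remediate(
    node: dict[str, Any], pvs: list[dict[str, Any]]
) -> list[tuple[str, str]]:
    """List (namespace, name) of PVCs bound to PVs pinned to an unhealthy node.

    Returns an empty list while the node is Ready, so nothing is remediated
    on a healthy cluster.
    """
    if node_is_ready(node):
        return []
    meta = node.get("metadata", {})
    hostname = meta.get("labels", {}).get(HOSTNAME_LABEL, meta.get("name"))
    claims: list[tuple[str, str]] = []
    for pv in pvs:
        ref = pv.get("spec", {}).get("claimRef")  # set only when the PV is bound
        if ref and hostname in pv_pinned_hosts(pv):
            claims.append((ref["namespace"], ref["name"]))
    return claims
```

Keeping the selection as a pure function makes the "healthy nodes trigger nothing" property easy to check in isolation, before any API calls that delete PVCs are involved.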

      Acceptance Criteria:

      • PodRemediator controller deployment runs in the openstack-operators namespace, aligned with other infra-operator controllers.
      • RBAC is in place, granting the controller the permissions it requires.
      • When NHC and SNR are installed and the supplemental RBAC is applied, the PodRemediator CR reaches status.conditions[type=Ready].status=True.
      • A from-scratch POC workflow exists and is automated:
        • build/use a custom infra-operator image that includes PodRemediator
        • apply CRD and RBAC, deploy PodRemediator CR
        • install and configure NHC/SNR
        • run a smoke test that validates PodRemediator is Ready and no remediation is triggered while nodes are healthy
      • An E2E remediation scenario is automated in the repo (e.g. Ansible playbook + helper script):
        • create a stateful workload with a local PVC
        • simulate node failure (e.g. virsh destroy of the worker VM) so the node becomes NotReady
        • verify that PodRemediator deletes the PVC bound to that node
        • verify that the workload is recreated on another node with a new PVC
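
The readiness criterion above (`status.conditions[type=Ready].status=True`) could be checked in a smoke test along the following lines. This is a minimal sketch with hypothetical helper names; it assumes the PodRemediator CR has already been fetched as JSON (for example with `oc get ... -o json`; the exact resource name is not given in this issue) and it does not cover the "no remediation is triggered" check, since the relevant status fields are not specified here.

```python
from __future__ import annotations

import json
from typing import Any, Optional


def condition_status(resource: dict[str, Any], cond_type: str) -> Optional[str]:
    """Return the status string of the named condition, or None if it is absent."""
    for cond in resource.get("status", {}).get("conditions") or []:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def is_ready(resource: dict[str, Any]) -> bool:
    """True only when the Ready condition exists and its status is "True"."""
    return condition_status(resource, "Ready") == "True"


def ready_from_json(text: str) -> bool:
    """Parse a single resource serialized as JSON and report whether it is Ready."""
    return is_ready(json.loads(text))
```

A resource with no `status` block yet (for instance, just after creation) is reported as not ready rather than raising, which suits a polling loop in an Ansible task or helper script.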

              rhn-support-aromito Antonio Romito
              rhos-dfg-pidone
