  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-27280

Implement PodRemediator controller and CRD in infra-operator (development phase)


    • Sprint 17
    • 1

      Description:

      Implement the PodRemediator feature in infra-operator. 

      This includes:

      • CRD and API: Add the PodRemediator custom resource under remediation.openstack.org (e.g. v1beta1): types for Spec (e.g. namespaces to watch, EnablePVCRemediation) and Status (conditions), OpenAPI validation, defaulting, and CRD manifests.
      • Controller logic: Implement the reconciler in internal/controller/remediation/ so that it:
        • Checks that Node Health Check (NHC) and Self Node Remediation (SNR) are present (e.g. via dynamic client on medik8s APIs); sets Ready=False with a clear message when they are missing.
        • Lists cluster nodes and determines which are unhealthy (e.g. NodeReady not True).
        • For each configured namespace, lists PVCs, resolves the bound PV, and determines if the PV is “local” (e.g. node affinity + Local / HostPath / CSI) and which node it is on (e.g. kubernetes.io/hostname, LVMS/TopoLVM topology keys).
        • For PVCs bound to a local PV on an unhealthy node: optionally annotates the PVC with a “pending-deletion” intent (so a restarted controller can resume), then deletes the PVC so the workload can be rescheduled.
        • Sets Ready=True when NHC/SNR are present, EnablePVCRemediation (if applicable) is true, and there are no blocking errors; otherwise sets Ready=False with an appropriate reason/message.
      • Wiring: Register the controller in main.go (scheme, client, dynamic client, manager), set up watches for PodRemediator, Node, PVC, and optionally Pod so reconciliation is triggered on relevant changes.
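
The PVC selection steps in the controller logic above can be sketched as a pure function, which keeps the decision testable without a cluster. This is a minimal sketch under assumptions, not the planned implementation: the Node, PV and PVC types are simplified stand-ins for the Kubernetes API types, the PV node affinity is flattened to a key/value map, and the topology key list (including the TopoLVM key) is an assumed default.

```go
package main

import "sort"

// Node is a hypothetical stand-in for a cluster node.
// Ready is true only when the NodeReady condition is True.
type Node struct {
	Name  string
	Ready bool
}

// PV is a hypothetical stand-in for a PersistentVolume.
// LocalSource is true for Local, HostPath or CSI volume sources.
// NodeAffinity is the required node affinity flattened to key -> value.
type PV struct {
	Name         string
	LocalSource  bool
	NodeAffinity map[string]string
}

// PVC is a hypothetical stand-in for a PersistentVolumeClaim.
// VolumeName is empty while the claim is unbound.
type PVC struct {
	Namespace  string
	Name       string
	VolumeName string
}

// topologyKeys lists the node-identifying keys that are checked, in order.
// Assumption: the hostname label plus the TopoLVM node topology key.
var topologyKeys = []string{
	"kubernetes.io/hostname",
	"topology.topolvm.io/node",
}

// unhealthyNodes returns the set of node names whose Ready flag is false.
func unhealthyNodes(nodes []Node) map[string]bool {
	bad := make(map[string]bool)
	for _, n := range nodes {
		if !n.Ready {
			bad[n.Name] = true
		}
	}
	return bad
}

// localPVNode reports the node a local PV is pinned to, if any.
func localPVNode(pv PV) (string, bool) {
	if !pv.LocalSource {
		return "", false
	}
	for _, key := range topologyKeys {
		if node := pv.NodeAffinity[key]; node != "" {
			return node, true
		}
	}
	return "", false
}

// pvcsToDelete returns sorted "namespace/name" keys of PVCs in the watched
// namespaces that are bound to a local PV on an unhealthy node.
func pvcsToDelete(nodes []Node, pvs map[string]PV, pvcs []PVC, namespaces []string) []string {
	watched := make(map[string]bool)
	for _, ns := range namespaces {
		watched[ns] = true
	}
	bad := unhealthyNodes(nodes)
	var out []string
	for _, pvc := range pvcs {
		pv, bound := pvs[pvc.VolumeName]
		if !watched[pvc.Namespace] || !bound {
			continue
		}
		if node, local := localPVNode(pv); local && bad[node] {
			out = append(out, pvc.Namespace+"/"+pvc.Name)
		}
	}
	sort.Strings(out)
	return out
}
```

In a real reconciler the returned keys would feed the optional pending-deletion annotation and then the delete call; keeping selection separate from the client calls lets unit tests cover local PV detection directly.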

      Acceptance criteria:

      • PodRemediator CRD and API types are implemented and generated (e.g. make generate); CRD can be applied to the cluster.
      • Controller runs in the infra-operator manager in openstack-operators and reconciles PodRemediator CRs.
      • When NHC and SNR are installed and supplemental RBAC is applied, the PodRemediator CR reaches status.conditions[type=Ready].status=True; when NHC/SNR are missing or the check fails, Ready is False with a clear message.
      • When a node hosting a local PV becomes NotReady, the controller deletes PVCs bound to that node’s local PVs (with optional pending-deletion annotation for resume); when nodes are healthy, no PVCs are deleted.
      • Local PV detection supports at least node-affinity-based local volumes (e.g. hostPath, local, CSI with node affinity) and common topology keys (e.g. hostname, LVMS/TopoLVM).
      • RBAC is in place for nodes, PVs, PVCs, NHC/SNR APIs, and PodRemediator CRs; controller has no broader permissions than needed.
      • Unit or functional tests cover the main behaviour (e.g. NHC/SNR check, local PV detection, or reconciliation logic) as per project standards.
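
The Ready-condition criterion above lends itself to a small pure function that unit tests can exercise directly. The following is a sketch only: the Condition struct is a simplified stand-in for a Kubernetes status condition, and the function name, reason strings and message wording are hypothetical. How EnablePVCRemediation affects readiness is left out because the description does not pin it down.

```go
package main

import "strings"

// Condition is a hypothetical, simplified stand-in for a status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// readyCondition derives the Ready condition from the NHC/SNR presence
// checks and any blocking error hit during reconciliation.
// Reason values are illustrative, not taken from the operator.
func readyCondition(nhcPresent, snrPresent bool, reconcileErr error) Condition {
	var missing []string
	if !nhcPresent {
		missing = append(missing, "Node Health Check (NHC)")
	}
	if !snrPresent {
		missing = append(missing, "Self Node Remediation (SNR)")
	}

	switch {
	case len(missing) > 0:
		// Name exactly which dependency is absent so the message is actionable.
		return Condition{
			Type:    "Ready",
			Status:  "False",
			Reason:  "DependenciesMissing",
			Message: "required components not found: " + strings.Join(missing, ", "),
		}
	case reconcileErr != nil:
		return Condition{
			Type:    "Ready",
			Status:  "False",
			Reason:  "ReconcileError",
			Message: reconcileErr.Error(),
		}
	default:
		return Condition{
			Type:    "Ready",
			Status:  "True",
			Reason:  "DependenciesPresent",
			Message: "NHC and SNR are present",
		}
	}
}
```

A table-driven test over the presence and error combinations would then cover the "Ready is False with a clear message" cases without needing NHC or SNR installed.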

              rhn-support-aromito Antonio Romito
              rhos-dfg-pidone