RHEL-151839

[RFE] Add autoheal/state control to the IPA server deployment


    • Epic
    • Resolution: Unresolved
    • Major
    • rhel-10.1
    • ipa
    • IPA state control
    • FutureFeature
    • rhel-idm-ipa
    • Red Hat Enterprise Linux

      Description

      As a system administrator, I want the FreeIPA deployment to be highly available and operationally robust through built-in health awareness and automated recovery. Specifically, the system should meet the following goals:

      • Be aware of the real-time health state of all IPA replicas
        Continuously monitor and expose the operational status of each replica (healthy / degraded / unhealthy / maintenance / hidden).
      • Automatically remove unhealthy or maintenance replicas from client traffic pools
        Withdraw replicas from DNS SRV pools (e.g., via dynamic DNS updates or health-check-based removal of LDAP/Kerberos service records) when they enter a bad health state or are placed in maintenance mode.
      • Automatically reintroduce healed replicas into service
        Re-add replicas to client-facing pools once they return to a healthy state, i.e. once the issues they diagnosed in themselves are resolved.
      • Implement dependency-aware health checks
        Tie a replica's reported health to the availability and correct functioning of its critical dependencies.
        Example: A KDC should be marked unhealthy and taken out of rotation if its local LDAP backend is unavailable or responding incorrectly. This allows clients to fail over automatically to healthy replicas instead of repeatedly retrying a broken instance.
      • Support extensible / pluggable health evaluation logic
        Provide an architectural framework that makes it easy to add new health triggers and conditions in the future without major refactoring.
        Examples of future extensions:
          • React to self-state changes (e.g., CA certificate list change, shared certificate change, replica list change, replication lag exceeding a threshold)
          • Ideally, integrate external signals (e.g., monitoring alerts, node-level metrics such as memory leaks or network issues)
          • Allow custom scripts for site-specific checks
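
      To make the goals above more concrete, here is a minimal sketch of how dependency-aware, pluggable health evaluation could fit together. It is not an existing FreeIPA API: the names HealthRegistry, register, service_ok, replica_state and advertised_services are hypothetical, the checks are stand-in lambdas rather than real LDAP/KDC probes, and treating a "degraded" replica as still serving its healthy services is an assumption the request does not settle.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class State(Enum):
    # The five states named in the request.
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"
    MAINTENANCE = "maintenance"
    HIDDEN = "hidden"


@dataclass
class Service:
    name: str
    check: Callable[[], bool]          # returns True when the service is fine
    depends_on: List[str] = field(default_factory=list)


class HealthRegistry:
    """Hypothetical pluggable registry: new checks need no evaluator changes."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name, check, depends_on=()):
        self._services[name] = Service(name, check, list(depends_on))

    def service_ok(self, name, _seen=frozenset()):
        """OK only if every dependency is OK and the service's own check passes."""
        if name in _seen:              # dependency cycle: treat as failed
            return False
        svc = self._services[name]
        seen = _seen | {name}
        # Dependencies are re-probed on each call; a real system would cache.
        return all(self.service_ok(d, seen) for d in svc.depends_on) and svc.check()

    def replica_state(self, maintenance=False, hidden=False):
        if hidden:
            return State.HIDDEN
        if maintenance:
            return State.MAINTENANCE
        results = [self.service_ok(n) for n in self._services]
        if all(results):
            return State.HEALTHY
        return State.DEGRADED if any(results) else State.UNHEALTHY

    def advertised_services(self, maintenance=False, hidden=False):
        """Services that should stay in client-facing pools (e.g. SRV records)."""
        if maintenance or hidden:
            return []
        return sorted(n for n in self._services if self.service_ok(n))


# Usage: the KDC depends on the local LDAP backend; "http" is independent.
ldap_up = {"value": True}              # stand-in for a real LDAP probe
reg = HealthRegistry()
reg.register("ldap", lambda: ldap_up["value"])
reg.register("kdc", lambda: True, depends_on=["ldap"])
reg.register("http", lambda: True)

print(reg.replica_state().value)       # healthy
ldap_up["value"] = False               # simulate an LDAP outage
print(reg.replica_state().value)       # degraded
print(reg.advertised_services())       # ['http']
```

      In this sketch the KDC drops out of the advertised set as soon as LDAP fails, even though its own check still passes, and it reappears without any extra logic once the LDAP check succeeds again. A custom site-specific check would simply be another register call.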

      These capabilities should work together to achieve the following outcomes:

      • Minimize client-perceived downtime during replica failures or maintenance
      • Reduce manual intervention for common failure modes
      • Improve overall cluster resilience and observability

      What SSTs and Layered Product teams should review this?

      FreeIPA dev team.

              frenaud@redhat.com Florence Renaud
              rhn-support-asharov Aleksandr Sharov
              Florence Renaud
              Sudhir Menon