Ansible Automation Platform RFEs / AAPRFE-2395

Enhancement Request for Active-Active Deployment Support in Ansible Automation Platform (AAP)


    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: Normal
    • None
    • 2.5
    • topologies
    • False
    • False

      1] What is the nature and description of the request?
      The customer is requesting product enhancements to Ansible Automation Platform (AAP) to enable Active-Active deployments across multiple datacenters. Their goal is to eliminate single points of failure, achieve high availability (HA), and meet stringent disaster recovery (DR) requirements. The request includes architectural, operational, and functional capabilities that allow multiple AAP instances to run concurrently across different sites with seamless workload distribution, failover, and unified user/API experience.


      2] Why does the customer need this? (Business Requirements)

      • High Availability & Disaster Recovery: Ensure uninterrupted automation services during site or regional outages by removing single points of failure.
      • Stringent SLOs: Meet a Recovery Time Objective (RTO) of ≤ 5 minutes, a Recovery Point Objective (RPO) of ≤ 1 minute, and a 99.99% uptime target.
      • Business Continuity: AAP underpins provisioning, patching, compliance enforcement, and configuration management. Outages would directly impact production systems and revenue-generating services.
      • Regulatory Compliance: Operating under SOX, HIPAA, and other frameworks requires strong resiliency, auditability, and data protection mechanisms.
      • Workload Criticality: Tier-1 jobs (security patches, compliance enforcement, production changes) must meet strict recovery and availability requirements.
      • Unified User Experience: The customer requires a single pane of glass for management and operations to ensure consistent access and visibility across sites.
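
      To put the stated SLOs in perspective, the arithmetic below works out the yearly downtime budget implied by a 99.99% uptime target and how many failovers that budget would absorb if each one took the full 5-minute RTO. This is only an illustrative calculation, assuming a 365-day year and that every minute of failover counts as downtime; the function names are hypothetical and not part of AAP.

```python
# Illustrative arithmetic only; function names are hypothetical, not AAP APIs.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def max_full_rto_failovers(availability_pct: float, rto_minutes: float) -> int:
    """How many failovers fit in the yearly budget if each uses the whole RTO."""
    return int(downtime_budget_minutes(availability_pct) // rto_minutes)


budget = downtime_budget_minutes(99.99)
print(f"Downtime budget: {budget:.2f} minutes/year")  # about 52.56 minutes
print(f"Failovers at full RTO: {max_full_rto_failovers(99.99, 5)}")  # 10
```

      Under these assumptions, roughly ten worst-case failovers per year would consume the whole availability budget, which is why the request pairs the uptime target with automated failover.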

      3] How would you like to achieve this? (Functional Requirements)

      • Active-Active Sites:
        • Two concurrently active datacenters with intra-site HA and cross-site resiliency.
        • Each site provisioned for 100% workload capacity to enable seamless failover.
      • Load Balancing & Distribution:
        • Global Server Load Balancer (GSLB) with health checks and proximity-based routing.
        • Configurable session stickiness for UI, stateless routing for API.
        • Policy-based workload distribution (weighted or failover-based) configurable via UI and API.
        • Unified dashboard to display aggregated and site-specific job metrics.
      • Database & Data Consistency:
        • PostgreSQL multi-site support with synchronous or near-real-time asynchronous replication.
        • Unified or reconciled job history across sites.
        • Secrets stored in active-active CyberArk Vault clusters.
        • Execution Environments and collections synchronized across datacenters.
      • Failure Handling:
        • Automated failure detection via GSLB and monitoring.
        • Automated or semi-automated failover with incident alerts.
        • Requeuing of failed/in-flight jobs where possible.
        • Split-brain prevention through quorum and consensus mechanisms.
      • User & API Experience:
        • Single UI/API endpoint, routed intelligently via GSLB.
        • Failover should be seamless or minimally disruptive (≤ 1 minute).
        • CI/CD pipelines must tolerate retries and redirects.
      • Operational Considerations:
        • Rolling upgrades across sites with controlled traffic rebalancing.
        • Centralized monitoring for replication lag, controller health, error rates, EE sync, and traffic distribution.
        • Point-in-time backups for DB, Vault, job history, and metadata.
        • Configuration as Code (CasC) to support backup and recovery.
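
      Two of the requirements above, policy-based workload distribution (weighted or failover-based) and split-brain prevention through quorum, can be sketched in a few lines. The sketch below is a hypothetical illustration, not an existing AAP feature or API: the Site class, the policy names, and the site names are all assumptions made for the example. Note that with only two sites, neither side can hold a strict majority on its own, so a quorum scheme would need a third voter (for example a witness in another location).

```python
# Hypothetical sketch; none of these names exist in AAP.
import random
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool   # result of a health check, e.g. reported by the GSLB
    weight: int     # share of new jobs under the "weighted" policy (must be > 0)
    priority: int   # lower value is preferred under the "failover" policy


def pick_site(sites, policy, rng=random):
    """Choose a site for a new job; unhealthy sites are never chosen."""
    candidates = [s for s in sites if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy site available")
    if policy == "failover":
        # Always use the most preferred healthy site.
        return min(candidates, key=lambda s: s.priority)
    if policy == "weighted":
        # Spread jobs across healthy sites in proportion to their weights.
        weights = [s.weight for s in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0]
    raise ValueError(f"unknown policy: {policy}")


def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """A partition may stay active only if it sees a strict majority of voters."""
    return reachable_voters > total_voters // 2


# Example with made-up site names.
sites = [Site("dc-east", True, 70, 1), Site("dc-west", True, 30, 2)]
print(pick_site(sites, "failover").name)   # dc-east (lowest priority value)
sites[0].healthy = False
print(pick_site(sites, "weighted").name)   # dc-west (only healthy site left)
print(has_quorum(1, 2), has_quorum(2, 3))  # False True
```

      Filtering on health before applying either policy means a site outage degrades both policies to "send everything to the surviving site", which matches the requirement that each site be provisioned for 100% of the workload.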

              dysilva Dylan Silva
              rhn-support-apaygavh Abhishek Paygavhan