Ansible Automation Platform RFEs: AAPRFE-2159

Add the ability to monitor the number of pending jobs per instance group.


      1. What is the nature and description of the request?

      The customer would like job metrics, specifically pending and running job counts, exposed per container group in AAP. Currently, the /api/v2/metrics endpoint provides only global metrics:
      awx_running_jobs_total  
      awx_pending_jobs_total

      This aggregation makes it difficult to monitor the load on individual container groups. Each group in the customer's deployment has a different concurrency limit and operational role.

      For example:
      The customer's uat-jobs container group has a concurrency limit of 50 jobs. It often receives large batches (hundreds or thousands of jobs), which can leave a high number of jobs pending without affecting other workloads. However, because the current metrics are global, the customer's monitoring system interprets these spikes as system-wide issues and triggers unnecessary P2 alerts.
      The customer wants to monitor job counts (pending/running) per container group so that alerting thresholds can be tailored to each group's purpose and SLA sensitivity. This would also allow the customer to explore future enhancements such as per-organization job limits via dedicated container groups.
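
The kind of per-group alerting described above could be sketched roughly as follows. This is only an illustration: the threshold values, the default, and the helper name groups_over_threshold are assumptions made for the example, not part of AAP or of the customer's monitoring system.

```python
from typing import Dict, List

# Hypothetical pending-job thresholds per container group (illustrative values only).
# A batch-oriented group tolerates a deep queue; a latency-sensitive one does not.
PENDING_THRESHOLDS: Dict[str, float] = {
    "uat-jobs": 2000.0,
    "aap-jobs": 25.0,
}

# Assumed fallback for groups without an explicit threshold.
DEFAULT_THRESHOLD = 100.0


def groups_over_threshold(pending: Dict[str, float]) -> List[str]:
    """Return the names of groups whose pending-job count exceeds their threshold."""
    return sorted(
        group
        for group, count in pending.items()
        if count > PENDING_THRESHOLDS.get(group, DEFAULT_THRESHOLD)
    )
```

With thresholds like these, a backlog of several hundred pending jobs in uat-jobs would not raise an alert, while a much smaller backlog in a more sensitive group would.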

       

      2. Why does the customer need this? (List the business requirements here)

      To enable more granular and accurate monitoring of job queues.
      To reduce false alerts caused by global metrics that don’t reflect group-specific behavior.
      To lay the groundwork for future scaling strategies, such as per-org container groups.

       

      3. How would you like to achieve this? (List the functional requirements here)

      The customer proposes extending the /metrics endpoint to include metrics labeled by instance group, such as:
      awx_pending_jobs{instance_group="uat-jobs"} 532.0  
      awx_running_jobs{instance_group="uat-jobs"} 50.0  
      awx_pending_jobs{instance_group="aap-jobs"} 3.0  
      awx_running_jobs{instance_group="aap-jobs"} 612.0

      This would allow the customer to build dashboards and alerts that reflect the actual state of each group.
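
As a rough sketch of how a monitoring script might consume the proposed output, the snippet below parses lines in the format shown above into per-group values. The metric names and the instance_group label are the ones proposed in this request and do not exist in the endpoint today; the function name parse_group_metrics and the sample text are assumptions for illustration.

```python
import re
from typing import Dict

# Matches lines such as: awx_pending_jobs{instance_group="uat-jobs"} 532.0
# Only the single-label form proposed in this request is handled.
_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'\{instance_group="(?P<group>[^"]*)"\}\s+(?P<value>\S+)\s*$'
)


def parse_group_metrics(text: str) -> Dict[str, Dict[str, float]]:
    """Map metric name -> instance group -> value for the proposed labeled metrics."""
    result: Dict[str, Dict[str, float]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = _LINE.match(line)
        if match is None:
            continue  # unlabeled or differently labeled metric
        try:
            value = float(match.group("value"))
        except ValueError:
            continue  # value is not a number; ignore the line
        result.setdefault(match.group("name"), {})[match.group("group")] = value
    return result


# Sample text copied from the proposal above, plus one existing global metric.
SAMPLE = """\
awx_pending_jobs_total 535.0
awx_pending_jobs{instance_group="uat-jobs"} 532.0
awx_running_jobs{instance_group="uat-jobs"} 50.0
awx_pending_jobs{instance_group="aap-jobs"} 3.0
awx_running_jobs{instance_group="aap-jobs"} 612.0
"""

metrics = parse_group_metrics(SAMPLE)
```

Here metrics["awx_pending_jobs"] would hold one entry per instance group, which a dashboard or alert rule could then evaluate independently.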

              bcoursen@redhat.com Brian Coursen
              rhn-support-snarveka Swati Narvekar