Ansible Automation Platform RFEs / AAPRFE-2275

[EDA][containerized][rpm] Over-dimensioned number of workers


    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects Version: 2.5
    • Component: platform-installer

      Description

      Currently, both the Containerized and RPM installers set the number of workers incorrectly. Whether configured dynamically or statically, the number of workers often exceeds the node's capacity, rendering horizontal scaling useless for solving throughput issues and severely impacting the application's performance.

      For many services, the installer configures the number of workers based on the formula: cpu_cores * 2 + 1.

      Examples:
      https://gitlab.cee.redhat.com/ansible/aap-containerized-installer/-/blob/main/roles/automationeda/defaults/main.yml?ref_type=heads#L28

      https://github.com/ansible/automation-platform-collection/blob/devel/roles/automationedacontroller/defaults/main.yml#L60 
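The heuristic in those defaults can be sketched as a one-line function (the function name here is illustrative, not the variable name used by the installer roles):

```python
import os

def workers_for_service(cpu_cores: int) -> int:
    """The gunicorn-style heuristic applied per service: cores * 2 + 1."""
    return cpu_cores * 2 + 1

# The installer evaluates this against the node's core count, e.g.:
cores = os.cpu_count() or 1
print(workers_for_service(4))  # a 4-core node gets 9 workers for ONE service
```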

      While this formula is generally correct, it should be applied at the host level to the total number of workers. However, it is currently applied per service (e.g., API workers, activation workers, etc.), resulting in a node with too many processes competing for resources. For example, a 4-core hybrid node can end up with more than 30 Python processes, far exceeding the recommended amount.
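The arithmetic behind that example: because each service applies the formula independently, the totals multiply with the number of worker-spawning services (the service count of 4 below is illustrative):

```python
def total_processes(cpu_cores: int, num_services: int) -> int:
    """Each service independently applies cores * 2 + 1, so the
    host-wide process count scales with the number of services."""
    return (cpu_cores * 2 + 1) * num_services

# A 4-core node running four worker-spawning services ends up with
# 9 * 4 = 36 processes, versus the 9 the formula actually recommends:
print(total_processes(4, 4))  # → 36
```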

      Suggestion

      Ideally, the installer should dynamically set the total number of workers, taking into account the node's core count and type to ensure the global number of processes stays within the limits recommended by the formula.
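A minimal sketch of what that host-level allocation could look like, assuming the host-wide budget is cores * 2 + 1 and is split across the services (service names are hypothetical):

```python
def host_level_workers(cpu_cores: int, services: list[str]) -> dict[str, int]:
    """Distribute the recommended host-wide budget (cores * 2 + 1)
    across services, instead of granting each service the full budget."""
    budget = cpu_cores * 2 + 1
    base, extra = divmod(budget, len(services))
    # Hand the remainder out one worker at a time to the first services.
    return {s: base + (1 if i < extra else 0) for i, s in enumerate(services)}

allocation = host_level_workers(4, ["api", "activation", "scheduler", "dispatcher"])
print(allocation)  # 9 workers total across all services, not 9 per service
```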

      If this is too difficult to implement, setting a fixed total number of workers based on the recommended specs would still be preferable to the current approach, and customers with more powerful nodes could override the defaults for finer tuning. For safety, the default settings should at least err on the side of not overloading the VM, rather than the opposite. However, given how widely deployment sizes vary (even in our internal test matrix), the dynamic approach is preferable, as it avoids forcing CI pipelines or customers to customize the worker count for every node size.

              Assignee: Unassigned
              Reporter: rhn-support-aizquier Alejandro Izquierdo
              Votes: 0
              Watchers: 4