On OCP, `galaxy-importer` runs `ansible-test` inside a container with restricted resources. `ansible-test` calls `pylint` with `jobs` set to zero (https://github.com/ansible/ansible/blob/devel/test/lib/ansible_test/_internal/sanity/pylint.py#L234). This makes `pylint` launch one process per core it can see (8 cores in OCP testing), while all of those processes share the container's CPU and memory limits, so each one gets roughly 1/8th of the CPU and memory. The result is slow `pylint` execution; local testing showed the processes waiting on IO.
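To see this mismatch from inside a job container, a minimal sketch like the following compares the cores Python reports with the CPU quota the container is actually allowed. It assumes the container exposes its CFS quota at the standard cgroup v1 or v2 paths under `/sys/fs/cgroup`. The helper names are hypothetical, and `os.cpu_count()` is only an approximation of what `pylint` auto-detects when `jobs` is zero.

```python
import math
import os
from pathlib import Path
from typing import Optional, Tuple


def cpus_from_quota(quota_us: int, period_us: int, visible_cpus: int) -> int:
    """Translate a CFS quota/period pair into a whole number of CPUs.

    A non-positive quota (e.g. -1 in cgroup v1) means "no limit",
    so fall back to the number of visible CPUs.
    """
    if quota_us <= 0 or period_us <= 0:
        return visible_cpus
    # e.g. a 1000m limit is quota=100000, period=100000 -> 1 CPU.
    # Round up so there is always at least one worker, and never
    # report more CPUs than are visible.
    return max(1, min(visible_cpus, math.ceil(quota_us / period_us)))


def read_cfs_quota() -> Optional[Tuple[int, int]]:
    """Best-effort read of the container CPU quota (paths are assumptions)."""
    v2 = Path("/sys/fs/cgroup/cpu.max")  # cgroup v2 format: "<quota|max> <period>"
    if v2.exists():
        quota, period = v2.read_text().split()
        return (-1 if quota == "max" else int(quota)), int(period)
    v1_quota = Path("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
    v1_period = Path("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
    if v1_quota.exists() and v1_period.exists():
        return int(v1_quota.read_text()), int(v1_period.read_text())
    return None  # no quota found: assume the visible cores are usable


if __name__ == "__main__":
    visible = os.cpu_count() or 1  # roughly what an auto-detected job count fans out to
    quota = read_cfs_quota()
    effective = cpus_from_quota(*quota, visible) if quota else visible
    print(f"visible cores: {visible}, cores allowed by quota: {effective}")
```

In the OCP case described above, the first number would be 8 while the second reflects the container's CPU limit.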
Earlier problems with our container and `pylint` (a broken pipe caused by [too small resources](https://github.com/ansible/galaxy_ng/issues/64) and an [intermittent OOM](https://github.com/ansible/galaxy_ng/issues/230) error) were both resolved by increasing the container's resources.
For the current issue, where slow execution eventually hits the container timeout, possible solutions are:
- Restrict the number of processes `pylint` can spawn by setting `jobs` to a positive number.
- Increase the CPU resources of the `ansible-test` job container (e.g. to `1000m`). This cannot be set too high, however, since a high CPU limit can make the job take longer to schedule or keep it from starting at all.
- Increase the timeout. Note that even with `IMPORTER_JOB_TIMEOUT` set to 15 minutes (https://github.com/ansible/galaxy-importer/blob/master/galaxy_importer/ansible_test/runners/openshift_job.py#L239), the timeout still occurs at ~10 minutes, so another setting may be needed to raise the default 10-minute timeout.
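The first two options interact: if the container's CPU limit is raised, a positive `jobs` value can be derived from that limit instead of from the visible cores. The following is a minimal sketch, assuming the limit is available as a Kubernetes CPU quantity string such as `1000m`. The helpers are hypothetical, the quantity parser handles only plain numbers and the `m` (millicore) suffix, and the only real `pylint` detail relied on is its `--jobs` option.

```python
import math
from typing import List


def parse_cpu_quantity(quantity: str) -> float:
    """Convert a simple Kubernetes CPU quantity ("1000m", "500m", "2") to cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000  # millicores -> cores
    return float(quantity)


def pylint_jobs_for_limit(cpu_limit: str) -> int:
    """Pick a positive pylint job count that fits inside the CPU limit."""
    # Round down so workers do not exceed the quota, but never return 0:
    # for pylint, 0 means "auto-detect", which is what fans out to every
    # visible core in the first place.
    return max(1, math.floor(parse_cpu_quantity(cpu_limit)))


def build_pylint_command(cpu_limit: str, paths: List[str]) -> List[str]:
    """Hypothetical helper: a pylint invocation with an explicit --jobs value."""
    return ["pylint", "--jobs", str(pylint_jobs_for_limit(cpu_limit)), *paths]


if __name__ == "__main__":
    print(build_pylint_command("1000m", ["plugins/"]))
```

With a `1000m` limit this yields a single `pylint` worker, and a `500m` limit also yields one worker rather than falling back to auto-detection.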
Linked issue: is triggering AAH-263 "Make ansible-test jobs logs show in kibana" (Closed).