Bug
Resolution: Unresolved
Normal
None
6.13.0
4
False
False
CLOSED
0
Endeavour
Moderate
No Coverage
No
Description of problem:
When a task is triggered/created during a dynflow orchestrator restart, the task can end up hung. Several combinations of foreman task state/result and dynflow state/result have been observed, for example:
- foreman task planning/pending, dynflow planning/pending
- foreman task planned/pending, dynflow planned/pending
- foreman task running/pending, dynflow paused/success
- foreman task running/error, dynflow paused/success (with error "Could not transition step from pending to running, step already in running.")
- foreman task running/pending, dynflow running/error (stuck forever; seen at a customer on 6.13 with error "Could not transition step from pending to running, step already in running."; this might be fixed in 6.15)
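The combinations listed above can be matched mechanically when triaging many tasks. The following is a minimal Ruby sketch that assumes the four values have already been read from a foreman task and its dynflow execution plan. The `HUNG_COMBINATIONS` table and the `hung_combination?` helper are hypothetical. They only encode the list above and are not part of Foreman or Dynflow.

```ruby
# Hypothetical table: the hung combinations observed in this report.
# Each entry is [foreman_state, foreman_result, dynflow_state, dynflow_result].
HUNG_COMBINATIONS = [
  %w[planning pending planning pending],
  %w[planned pending planned pending],
  %w[running pending paused success],
  %w[running error paused success],
  %w[running pending running error]
].freeze

# Returns true when the foreman/dynflow state pair matches one of the
# combinations observed to hang after an orchestrator restart.
def hung_combination?(foreman_state, foreman_result, dynflow_state, dynflow_result)
  HUNG_COMBINATIONS.include?([foreman_state, foreman_result, dynflow_state, dynflow_result])
end

puts hung_combination?("running", "pending", "paused", "success")  # prints true
puts hung_combination?("stopped", "success", "stopped", "success") # prints false
```

A match is only meaningful once the task has stopped changing. During normal execution, a healthy task also passes through planning/pending and planned/pending.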
Version-Release number of selected component (if applicable):
Sat 6.13 and 6.15 (the versions tested)
How reproducible:
About 25% per test round
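With a hit rate of about 25% per round, several rounds may be needed before a hang shows up. If rounds are independent (an assumption the report does not state), the chance of at least one hang in n rounds is 1 - 0.75^n. This means roughly 11 rounds give about 95% confidence. The following is a small Ruby sketch of that arithmetic. `rounds_needed` is a hypothetical helper.

```ruby
# Assumption: each test round independently hits the bug with probability p.
# Returns the smallest n with P(at least one hang in n rounds) = 1 - (1 - p)**n >= target.
def rounds_needed(p, target)
  (Math.log(1 - target) / Math.log(1 - p)).ceil
end

puts rounds_needed(0.25, 0.95) # prints 11
```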
Steps to Reproduce:
Basic idea: trigger Actions::Katello::Applicability::Hosts::BulkGenerate tasks in a tight loop and restart the dynflow orchestrator service in the meantime.
1. Have a few dozen Content Hosts. To populate fake Content Hosts, register one host and then run the following a few dozen times:
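# Placeholders: fill in for your environment
SHORTNAME=fill-current-shortname
DOMAINNAME=fill-current-domain-name
AK=fill-your-activation-key
ORG=fill-your-organization
# Give this registration a fresh DMI system UUID via a custom RHSM fact
uuid=$(uuidgen)
echo "{\"dmi.system.uuid\": \"${uuid}\"}" > /etc/rhsm/facts/uuid.facts
# Derive a unique hostname from the first segment of the UUID
hostnamectl set-hostname "${SHORTNAME}.${uuid%%-*}.${DOMAINNAME}"
# Drop the previous identity and register as a new Content Host
subscription-manager clean
subscription-manager register --activationkey "${AK}" --org "${ORG}"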
2. Once you have 20-30 Content Hosts (enough that one BulkGenerate over all of them takes 1-3 seconds), run the following in 2-3 shells concurrently:
while true; do echo "ForemanTasks.async_task(::Actions::Katello::Applicability::Hosts::BulkGenerate, host_ids: Host.pluck(:id))"; sleep 0.02; done | foreman-rake console
3. Once the foreman-rake consoles are firing new ForemanTasks, restart the orchestrator service:
systemctl restart dynflow-sidekiq@orchestrator.service
4. Wait until the service is restarted, then stop the foreman-rake commands.
5. Monitor tasks in the WebUI. A few hundred "Bulk generate applicability for hosts" tasks will appear, moving from planning to planned to running to stopped. Wait until no task has changed state for some time.
6. Check whether any BulkGenerate task is stuck in one of the wrong states listed above. If so, compare its state/result with the state/result of its dynflow task.
7. If no such task exists, go to step 2.
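Steps 5 and 6 come down to two actions. First, wait for a quiet period. Then look for tasks that are still not stopped. The following minimal Ruby sketch implements that filter. It assumes that snapshots with each task's state, result and last state-change time have already been collected, for example by reading them off the WebUI. `TaskSnapshot`, `stuck_tasks` and the 300-second quiet period are hypothetical choices and are not part of Foreman.

```ruby
# Hypothetical snapshot of one task, collected by whatever means is convenient.
TaskSnapshot = Struct.new(:id, :label, :state, :result, :state_updated_at)

# Returns tasks that are not stopped and whose state has not changed for at
# least quiet_seconds; these are candidates for the hung tasks in step 6.
def stuck_tasks(tasks, now:, quiet_seconds: 300)
  tasks.select do |t|
    t.state != "stopped" && (now - t.state_updated_at) >= quiet_seconds
  end
end

now = Time.at(1_000_000)
label = "Bulk generate applicability for hosts"
tasks = [
  TaskSnapshot.new(1, label, "stopped", "success", now - 900),
  TaskSnapshot.new(2, label, "running", "pending", now - 900),
  TaskSnapshot.new(3, label, "running", "pending", now - 10)
]
stuck_tasks(tasks, now: now).each { |t| puts "#{t.id} #{t.state}/#{t.result}" }
# prints "2 running/pending"
```

Each flagged task still needs its dynflow state/result compared by hand, as step 6 describes. The filter only narrows down the few hundred tasks to the ones worth inspecting.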
Actual results:
6. With some patience (and proper tuning of the reproducer, e.g. ensuring the rake console fires new tasks at a high cadence, and possibly running multiple --katello-hosts-queue-workers), you will see a BulkGenerate task stuck in running forever.
Expected results:
6. No hung task.
Additional info: