Red Hat Advanced Cluster Management / ACM-8538

Too many ansiblejobs trigger for the number of managed clusters

    • Moderate
    • No

      Description of problem:

      While testing ZTP day-2 Ansible automation, in which an ansiblejob is triggered for each provisioned, ztp-done managedcluster, far more ansiblejobs ran than there are managed clusters. In an ideal run the count of ansiblejobs would not exceed the count of ztp-done clusters. Better still would be some level of concurrency, where several managedclusters are handled by the same job, since the playbook can handle concurrency.

       

      As an example:

      # oc get subscriptions -n ztp-day2-automation monitor-ztp-done-subscription -o json | jq '.status.ansiblejobs.prehookjobshistory | length '
      1189
      # oc get ansiblejobs -n ztp-day2-automation --no-headers | wc -l
      777
      # awx -k --conf.host "https://automationcontroller-ansible-automation-platform.apps.acm-lta.rdu2.scalelab.redhat.com" --conf.username admin --conf.password JNNBIMaYFkQXRrvotu1Y3JsTlnmn1zmS jobs list | jq ".count"
      2693

      The count of Ansible jobs differs depending on which part of the stack you look at: there are 777 ansiblejob objects, 1189 entries in the ACM application prehook history, and 2693 playbook runs in AAP. The number of managedclusters provisioned in this case was 324, with 318 becoming ztp-done. As a result, the playbook was applied to the same managedclusters more than once. The attached graph shows clusters moving back and forth between the labels ztp-ansible=running and ztp-ansible=completed as more jobs are triggered.
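To put the mismatch in proportion, the counts above can be divided by the number of ztp-done clusters. The following is only an illustrative sketch: the variable names and the `ratio` helper are hypothetical, and the numbers are copied from the command output in this report rather than queried live.

```shell
#!/bin/sh
# Counts copied from the command output in this report (not queried live).
ZTP_DONE=318             # managedclusters that reached ztp-done
ANSIBLEJOB_OBJECTS=777   # oc get ansiblejobs ... | wc -l
PREHOOK_HISTORY=1189     # .status.ansiblejobs.prehookjobshistory | length
AAP_PLAYBOOK_RUNS=2693   # awx jobs list | jq ".count"

# ratio NUMERATOR DENOMINATOR: print the quotient rounded to two decimals.
# (Hypothetical helper, used only for this illustration.)
ratio() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f", n / d }'
}

echo "ansiblejob objects per ztp-done cluster: $(ratio "$ANSIBLEJOB_OBJECTS" "$ZTP_DONE")"
echo "prehook history entries per ztp-done cluster: $(ratio "$PREHOOK_HISTORY" "$ZTP_DONE")"
echo "AAP playbook runs per ztp-done cluster: $(ratio "$AAP_PLAYBOOK_RUNS" "$ZTP_DONE")"
```

With these numbers the ratios come out to roughly 2.44, 3.74, and 8.47 respectively, against an expected upper bound of 1 job per ztp-done cluster.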

      Version-Release number of selected component (if applicable):

      OCP 4.14.1 (Hub and deployed clusters)

      ACM - 2.9.0-DOWNSTREAM-2023-11-03-14-27-40

      How reproducible:

      Every time I have run with the application hook, I have observed many more ansiblejobs than provisioned clusters. Perhaps no state is being tracked and ACM's application hook is triggered far too often.

      Steps to Reproduce:

      1.  
      2.  
      3. ...

      Actual results:

      Expected results:

      Additional info:

              xiangli@redhat.com Xiangjing Li
              akrzos@redhat.com Alex Krzos
              Ruici Hong Ruici Hong