  1. Satellite
  2. SAT-16931

re-enabling sync plans [FAIL] Could not update the sync plan: ERF28-1357 [ForemanTasks::RecurringLogicCancelledException]: Cannot update a cancelled Recurring Logic.

    • Important

      +++ This bug was initially created as a clone of Bug #2131839 +++

      Description of problem: When updating from Satellite 6.11.2 to 6.11.3 I get the following message:

      re-enabling sync plans [FAIL]
      Could not update the sync plan:
      ERF28-1357 [ForemanTasks::RecurringLogicCancelledException]: Cannot update a cancelled Recurring Logic.
      --------------------------------------------------------------------------------
      Scenario [Procedures after migrating to Satellite 6.11.z] failed.

      The following steps ended up in failing state:

      [sync-plans-enable]

      Resolve the failed steps and rerun the command.

      If the situation persists and, you are unclear what to do next,
      contact Red Hat Technical Support.

      In case the failures are false positives, use
      --whitelist="sync-plans-enable"

      Version-Release number of selected component (if applicable):

      How reproducible:

      Steps to Reproduce:
      1. satellite-maintain upgrade run --target-version 6.11.z

      Actual results:
      \ All services started [OK]
      --------------------------------------------------------------------------------
      re-enable sync plans:

      re-enabling sync plans [FAIL]
      Could not update the sync plan:
      ERF28-1357 [ForemanTasks::RecurringLogicCancelledException]: Cannot update a cancelled Recurring Logic.
      --------------------------------------------------------------------------------
      Scenario [Procedures after migrating to Satellite 6.11.z] failed.

      The following steps ended up in failing state:

      [sync-plans-enable]

      Resolve the failed steps and rerun the command.

      If the situation persists and, you are unclear what to do next,
      contact Red Hat Technical Support.

      In case the failures are false positives, use
      --whitelist="sync-plans-enable"

      Expected results:
      Run satellite-maintain upgrade run --target-version 6.11.z and have it finish successfully.

      Additional info:

      — Additional comment from on 2022-10-06T14:27:35Z

      Hi William,

      Thanks for raising the bugzilla with your finding.

      Do you already have a case open on this with support? If not, I'd recommend that we begin there.

      — Additional comment from on 2022-10-11T10:37:06Z

      I have already opened a case with support.

      — Additional comment from on 2022-12-02T19:44:05Z

      I also experienced this problem for which I opened the following case record:

      https://access.redhat.com/support/cases/#/case/03373216

      I suspect it was caused by the upgrade of our Satellite from 6.10 to 6.11.
      I worked around the problem by deleting the offending sync plan and re-creating an identical plan (which obviously has a new and unique recurring logic ID).

      — Additional comment from on 2022-12-12T13:27:40Z

      I hit it on my own Satellite as well. I think the sufficient reproducer is:

      • have Sat 6.11.[0-3]
      • have a sync plan
      • disable it
      • run upgrade to 6.11.4

      — Additional comment from on 2023-01-26T13:52:06Z

      Adam,

      Was this fixed in foreman-tasks?

      — Additional comment from on 2023-01-30T10:03:08Z

      Hard to say. The only thing I'm aware of is https://bugzilla.redhat.com/show_bug.cgi?id=1887511 which reads almost exactly the same, but that should have been fixed a long time ago.

      — Additional comment from on 2023-02-02T12:12:52Z

      Hello,

      I have Hilti AG reporting a similar issue with the upgrade to 6.12.1 and with the Recurring Logic for Inventory sync action.

      So I tested with Sync Plan stuff ( on 6.12.0 and 6.12.1 ):

      • Created a sync plan with custom cron to be executed just after 2 mins
      • Stop all services
      • Bring up the services after 3 mins
      • Check the sync plan and RL: the next sync date/time is still the OLD one
      • Disable the RL and try to re-enable it; this fails with "ERF28-1357 [ForemanTasks::RecurringLogicCancelledException]: Cannot update a cancelled Recurring Logic."

      The same was reproducible for Inventory Scheduled Sync Recurring Logic as well.

      For a sync plan this is easy to fix from the UI, but for the Inventory Scheduled Sync action it is not, since the cronline of the Recurring Logic is not editable (yet). So we have to clear that cancelled logic and create a new one from the rake console.

      Because Satellite cannot gracefully handle the RL execution and status when the "Next Run" falls into a downtime (when services are down), it is nearly impossible for users to identify the source of the issue and then fix it.
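To illustrate what handling a "Next Run" that fell into a downtime could look like, here is a minimal sketch in Python. It is not Satellite or foreman-tasks code: the function name `next_occurrence`, the fixed-interval schedule, and the example times are all assumptions made for illustration. It only shows the idea of rolling a stale next-run time forward to the first slot in the future.

```python
from datetime import datetime, timedelta


def next_occurrence(last_planned, interval, now):
    """Return the first slot of a fixed-interval schedule strictly after `now`.

    Hypothetical helper: `last_planned` is the stale "Next Run" value,
    `interval` the repetition period, `now` the time services came back.
    """
    if last_planned > now:
        # Nothing was missed; the planned run is still in the future.
        return last_planned
    # Number of whole intervals to skip so the result lands after `now`.
    skipped = (now - last_planned) // interval + 1
    return last_planned + skipped * interval


# Illustrative values only: a 2-minute schedule whose next run was planned
# for 20:58, with services returning at 21:03:10.
stale = datetime(2023, 2, 2, 20, 58)
back_up = datetime(2023, 2, 2, 21, 3, 10)
rescheduled = next_occurrence(stale, timedelta(minutes=2), back_up)
```

With these assumed inputs the stale slot is moved to the next 2-minute boundary after the restart, instead of staying in the past.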

      Let me know if a new BZ is needed for this.

      — Additional comment from on 2023-02-02T15:47:26Z

      Observations for 6.13 ( snap 8 )

      Same reproducer but

      • Start date 8:54 PM, running every 2 mins
      • I let it run at least once, at 8:56 PM, and the Next Sync then showed "8:58 PM"
      • Stopped Satellite services at 8:57:20 PM
      • Started them back at 9:03:10 PM
      • Checked Monitor --> Tasks and saw that another task was executed at 9:05 PM
      • Checked the sync plan, but the Next Sync still shows "8:58 PM"
      • Waited till 9:11 PM and no further tasks were executed.

      CLI:

      # su - postgres -c "psql -d foreman -c 'select label,count(label),state,result from foreman_tasks_tasks where state <> '\''stopped'\'' group by label,state,result ORDER BY label;'"
                                label                         | count |   state   | result
        ------------------------------------------------------+-------+-----------+---------
         CreateExpiredManifestNotifications                   |     1 | scheduled | pending
         CreatePulpDiskSpaceNotifications                     |     1 | scheduled | pending
         CreateRssNotifications                               |     1 | scheduled | pending
         ForemanInventoryUpload::Async::GenerateAllReportsJob |     1 | scheduled | pending
         InsightsCloud::Async::InsightsClientStatusAging      |     1 | scheduled | pending
         InsightsCloud::Async::InsightsScheduledSync          |     1 | scheduled | pending
         InventorySync::Async::InventoryScheduledSync         |     1 | scheduled | pending
         SendExpireSoonNotifications                          |     1 | scheduled | pending
         StoredValuesCleanupJob                               |     1 | scheduled | pending
        (9 rows)
      • Checked the recurring logic:
        • It shows that the logic is active and the next run is still at "8:58 PM"
      • Disabled the RL from Recurring Logics page
      • Tried to enable the Logic and it says "ERF28-1357 [ForemanTasks::RecurringLogicCancelledException]: Cannot update a cancelled Recurring Logic."
      • Tried to enable the sync plan and that also says the same thing.

      IP of my Sat 6.13: 10.74.130.196 ( root\redhat ) ( admin \ RedHat1! ) [ It's running on low resources, so it will be a bit slow ]

      — Additional comment from on 2023-02-02T18:27:13Z

      After enabling the debug for dynflow stuff on the same instance:

      ( TZ = IST )

      • Same old sync plan that was broken earlier: I changed its start date to something else and enabled it.
        • It created a new RL ( RL ID 8 )
        • Next Sync: 10:12:00 PM
      • RL executed fine ( 5 occurrences at intervals of 2 mins )
        Run Sync Plan: Test stopped success February 02, 2023 at 10:20:07 PM 0 seconds
        Run Sync Plan: Test stopped success February 02, 2023 at 10:18:07 PM 0 seconds
        Run Sync Plan: Test stopped success February 02, 2023 at 10:16:07 PM 0 seconds
        Run Sync Plan: Test stopped success February 02, 2023 at 10:14:07 PM 0 seconds
        Run Sync Plan: Test stopped success February 02, 2023 at 10:12:07 PM 0 seconds
      • Next occurrence: 10:22 PM
      • Stopping all satellite services at "10:21:12 PM"
      • Will wait for 3 mins here
      • Initiated service startup at 10:24 PM and it completed at 10:26 PM
      • Surprisingly I see a Sync Plan task scheduled this time:
      # su - postgres -c "psql -d foreman -c 'select label,count(label),state,result from foreman_tasks_tasks where state <> '\''stopped'\'' group by label,state,result ORDER BY label;'"
                      label              | count |   state   | result
        ---------------------------------+-------+-----------+---------
         Actions::Katello::SyncPlan::Run |     1 | scheduled | pending
      • and the sync plan continues to work now and RL also looks fine:

      Run Sync Plan: Test stopped success February 02, 2023 at 10:30:07 PM 1 second
      Run Sync Plan: Test stopped success February 02, 2023 at 10:28:07 PM 0 seconds
      Run Sync Plan: Test stopped success February 02, 2023 at 10:26:13 PM 0 seconds

      • Disabled that old sync plan
      • Created a new sync plan by name "Test_New" and same old logic of 2 mins interval :

      Start 10:33 PM
      Next Sync: 10:34 PM

      • That scheduled task executed fine at 10:34 PM and then scheduled the next sync for 10:36 PM
      • The CLI shows two scheduled plans as expected ( the old one is disabled and this new one is active )
      # su - postgres -c "psql -d foreman -c 'select label,count(label),state,result from foreman_tasks_tasks where state <> '\''stopped'\'' group by label,state,result ORDER BY label;'"
                      label              | count |   state   | result
        ---------------------------------+-------+-----------+---------
         Actions::Katello::SyncPlan::Run |     2 | scheduled | pending
      • I stopped all satellite services at 10:35 PM
      • Started back the services at 10:37 PM and all are started by 10:38 PM.
      • Now, I see the same issue again. Just one sync plan shows scheduled ( and it's probably the one that I had disabled earlier )
      # su - postgres -c "psql -d foreman -c 'select label,count(label),state,result from foreman_tasks_tasks where state <> '\''stopped'\'' group by label,state,result ORDER BY label;'"
                      label              | count |   state   | result
        ---------------------------------+-------+-----------+---------
         Actions::Katello::SyncPlan::Run |     1 | scheduled | pending

      The scheduled task for the new one has disappeared.

      • Waited till 10:43 PM and I can only see just one new occurrence of the Sync Plan, executed after the services restart at 10:38 PM

      Run Sync Plan: Test_New stopped success February 02, 2023 at 10:38:57 PM 0 seconds
      Run Sync Plan: Test_New stopped success February 02, 2023 at 10:34:07 PM 1 second

      • At this point, I am not touching the new Sync Plan or its Recurring Logic anymore, so that you can investigate it personally if needed.
      • /root/activity.log file contains a "tail -f" on syslog and production.log ( was running during this entire time )
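The missed 10:36 PM occurrence of Test_New can also be spotted mechanically from the task history. The following is a rough sketch in Python, assuming the history lines are copied as plain text in the format shown above; the helper names `run_times` and `gaps_longer_than` and the 3-minute threshold are my own choices for illustration, not part of any Satellite tooling.

```python
import re
from datetime import datetime, timedelta

# Task history lines as they appear in the comment above.
HISTORY = """\
Run Sync Plan: Test_New stopped success February 02, 2023 at 10:38:57 PM 0 seconds
Run Sync Plan: Test_New stopped success February 02, 2023 at 10:34:07 PM 1 second
"""

# Matches timestamps such as "February 02, 2023 at 10:38:57 PM".
STAMP = re.compile(r"(\w+ \d{2}, \d{4} at \d{1,2}:\d{2}:\d{2} [AP]M)")


def run_times(text):
    """Extract the run timestamps from pasted task history, oldest first."""
    times = []
    for line in text.splitlines():
        match = STAMP.search(line)
        if match:
            times.append(
                datetime.strptime(match.group(1), "%B %d, %Y at %I:%M:%S %p")
            )
    return sorted(times)


def gaps_longer_than(times, limit):
    """Return (previous, next) pairs of runs that are more than `limit` apart."""
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > limit]


# The plan runs every 2 minutes; 3 minutes leaves some slack (assumed value).
missed = gaps_longer_than(run_times(HISTORY), timedelta(minutes=3))
```

For the two Test_New runs this reports a single gap between 10:34:07 PM and 10:38:57 PM, which is where the 10:36 PM run should have been.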

      — Additional comment from on 2023-02-03T12:15:20Z

      Dynflow has a built-in subcomponent which runs inside the orchestrator and periodically dispatches delayed execution plans scheduled for the future. Once the delayed plan is properly planned, the delay record (the thing saying "an execution plan $X should be executed at time $T") is destroyed. There are some safeguards in place to ensure a single delayed plan does not get planned multiple times.

      Issue 1:
      In sidekiq-based deployments, the delayed plan dispatching subcomponent is started too early, while the rest of the orchestrator is still doing world validity checks. This can lead to a situation where the subcomponent dispatches a single delayed plan multiple times.

      Issue 2:
      When delayed plans get dispatched multiple times, the safeguards are not handling it properly. The safeguards essentially act as an early return in case the plan in question is already being planned, however, as soon as the early return happens, the delayed record is removed. This breaks planning of the next repetition, which relies on data from it.
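The interaction of the two issues can be shown with a toy model. This is a simplified sketch in Python, not Dynflow code: the classes `DelayedPlan` and `Dispatcher`, their methods, and the `buggy` switch are hypothetical names invented for the illustration. It only models the described behaviour, where a duplicate dispatch hits the early return and the delay record is removed anyway.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedPlan:
    """Delay record: 'execution plan plan_id should be executed at run_at'."""
    plan_id: str
    run_at: int    # minutes, kept as a plain integer for simplicity
    interval: int  # repetition interval needed to plan the next run


@dataclass
class Dispatcher:
    """Toy stand-in for the delayed plan dispatcher (hypothetical)."""
    delayed: dict = field(default_factory=dict)   # plan_id -> DelayedPlan
    in_planning: set = field(default_factory=set)
    counter: int = 0

    def schedule(self, run_at, interval):
        self.counter += 1
        plan_id = f"plan-{self.counter}"
        self.delayed[plan_id] = DelayedPlan(plan_id, run_at, interval)
        return plan_id

    def dispatch(self, plan_id, buggy):
        if plan_id in self.in_planning:
            # Safeguard: the plan is already being planned, return early.
            if buggy:
                # Issue 2 as described: the delay record is removed even
                # though this dispatch did not plan anything.
                self.delayed.pop(plan_id, None)
            return
        self.in_planning.add(plan_id)

    def finish_planning(self, plan_id):
        record = self.delayed.pop(plan_id, None)
        self.in_planning.discard(plan_id)
        if record is None:
            # The data for the next repetition is gone; nothing is scheduled.
            return None
        return self.schedule(record.run_at + record.interval, record.interval)


def simulate(buggy):
    """Dispatch one delayed plan twice (Issue 1) and finish planning it."""
    dispatcher = Dispatcher()
    first = dispatcher.schedule(run_at=0, interval=2)
    dispatcher.dispatch(first, buggy)  # first dispatch starts planning
    dispatcher.dispatch(first, buggy)  # duplicate dispatch of the same plan
    next_id = dispatcher.finish_planning(first)
    return next_id, sorted(dispatcher.delayed)
```

In this model, `simulate(buggy=True)` ends with no delay record left, so the repetition chain stops, while `simulate(buggy=False)` leaves a record for the next run in place.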

      — Additional comment from on 2023-02-23T14:11:16Z

      Please keep this for 6.13

      — Additional comment from on 2023-03-22T13:37:24Z

      Verified with Sat 6.13.0 snap 15.0 and upgrade path 6.12.3 snap 2.0 -> 6.13.0 snap 15.0.

      1) Create a recurring task (All hosts -> <host> -> Schedule a job) with cron "*/2 * * * *"
      2) Create a sync plan (Content -> Sync plans -> Create sync plan) on some repo with the same cron
      3) Run the upgrade to 6.13
      4) Go to Recurring logics and disable both
      5) Enable them again

      The sync plan created in 2) was rescheduled to the future during the upgrade.

      The recurring task created in 1) was NOT rescheduled to the future during the upgrade. After disabling it manually, it can't be enabled again. It doesn't run at the specified time.

      This BZ specifically mentions sync plans so I'm verifying it and filing a followup BZ for general recurring tasks, like running hosts jobs.

            aruzicka@redhat.com Adam Ruzicka
            jira-bugzilla-migration RH Bugzilla Integration
            Lukas Hellebrandt Lukas Hellebrandt