Bug
Resolution: Unresolved
Normal
4.20.z
Description of problem:
When 10 or more restore waves are used during an IBU upgrade, the execution order does not match the lca.openshift.io/apply-wave annotation values. During the pre-pivot phase, the LCA correctly sorts the Restore CRs and exports them into numbered subdirectories (restore1, restore2, ..., restore11). After the reboot, however, it reads them back with os.ReadDir, which returns entries sorted by filename, that is, lexicographically rather than numerically. As a result, restore10 and restore11 are read before restore2, breaking the intended order.
Relevant code:
* ExportRestoresToDir: https://github.com/openshift-kni/lifecycle-agent/blob/release-4.20/internal/backuprestore/restore.go#L185-L192
* LoadGroupedManifestsFromPath: https://github.com/openshift-kni/lifecycle-agent/blob/release-4.20/utils/utils.go#L457
Version-Release number of selected component (if applicable):
lifecycle-agent v4.20.1 (verified on release-4.20 branch, but the issue exists in all versions as the code has not changed)
How reproducible:
Always, when using 10 or more distinct {{lca.openshift.io/apply-wave}} values in Restore CRs.
Steps to Reproduce:
1. Define 11 Restore CRs with lca.openshift.io/apply-wave values from 1 to 11
2. Trigger an IBU upgrade
3. After the reboot, observe the order in which restores are created in the LCA logs
Actual results:
The restores are executed in the following order (based on alphabetical sorting of directory names):
restore1 → apply-wave 1
restore10 → apply-wave 10
restore11 → apply-wave 11
restore2 → apply-wave 2
restore3 → apply-wave 3
...
restore9 → apply-wave 9
Waves 10 and 11 are executed before wave 2.
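The ordering reported above can be reproduced outside the LCA with a few lines of Go. This is a minimal standalone sketch, not LCA code: the helper name listRestoreDirs and the temporary directory layout are assumptions made for illustration. It only relies on the documented behavior of os.ReadDir, which returns directory entries sorted by filename.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// listRestoreDirs is a hypothetical helper (not part of the LCA). It creates
// restore1..restoreN under a temporary directory and returns the directory
// names in the order os.ReadDir reports them.
func listRestoreDirs(n int) ([]string, error) {
	root, err := os.MkdirTemp("", "restores")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	for i := 1; i <= n; i++ {
		dir := filepath.Join(root, fmt.Sprintf("restore%d", i))
		if err := os.Mkdir(dir, 0o755); err != nil {
			return nil, err
		}
	}

	// os.ReadDir sorts entries by filename, so "restore10" precedes "restore2".
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

func main() {
	names, err := listRestoreDirs(11)
	if err != nil {
		panic(err)
	}
	// The first four names are restore1, restore10, restore11, restore2.
	fmt.Println(names)
}
```

With 9 or fewer directories the lexicographic and numeric orders coincide, which is consistent with the problem only appearing at 10 or more waves.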
Expected results:
Restores should be executed in the order defined by the lca.openshift.io/apply-wave annotation: 1, 2, 3, ..., 10, 11.
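One way to obtain the expected order would be to sort the directory names by their numeric suffix after reading them. The sketch below illustrates that idea only; sortByNumericSuffix is a hypothetical helper, not an existing LCA function, and it assumes every directory name is the prefix "restore" followed by a decimal integer.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// sortByNumericSuffix is a hypothetical helper (not part of the LCA). It
// returns a copy of names ordered by the integer that follows prefix, so
// that "restore2" sorts before "restore10".
func sortByNumericSuffix(names []string, prefix string) ([]string, error) {
	type item struct {
		name string
		wave int
	}
	items := make([]item, 0, len(names))
	for _, n := range names {
		if !strings.HasPrefix(n, prefix) {
			return nil, fmt.Errorf("name %q does not start with %q", n, prefix)
		}
		wave, err := strconv.Atoi(strings.TrimPrefix(n, prefix))
		if err != nil {
			return nil, fmt.Errorf("name %q has no numeric suffix: %w", n, err)
		}
		items = append(items, item{name: n, wave: wave})
	}

	// Compare the parsed integers instead of the raw strings.
	sort.SliceStable(items, func(i, j int) bool { return items[i].wave < items[j].wave })

	sorted := make([]string, 0, len(items))
	for _, it := range items {
		sorted = append(sorted, it.name)
	}
	return sorted, nil
}

func main() {
	// Input in the order os.ReadDir would return it.
	names := []string{"restore1", "restore10", "restore11", "restore2", "restore3", "restore9"}
	sorted, err := sortByNumericSuffix(names, "restore")
	if err != nil {
		panic(err)
	}
	fmt.Println(sorted) // [restore1 restore2 restore3 restore9 restore10 restore11]
}
```

An alternative that may be worth considering is zero-padding the directory names at export time (for example restore01, restore02), so that the filename order matches the numeric order without changes on the read side. Which approach fits the LCA better is left to the maintainers.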
Additional info:
This issue was discovered during IBU upgrade testing (OCP 4.18 to 4.20) with 11 restore waves. In this case it did not cause functional issues because the restores were independent, but it could lead to failures if there are dependencies between waves that rely on the execution order.