-
Story
-
Resolution: Done
Quality / Stability / Reliability
OCP Node Kueue Sprint 284
This story tracks upgrade testing.
Steps:
- Create an OCP cluster (version to be upgraded)
- Install cert-manager
- Install Kueue Operator and Operand (w/ Jobs, Pods, Deployments and StatefulSets)
- Create the Resources (ResourceFlavor, ClusterQueue and LocalQueue)
- Create some pods and jobs
- Check that workloads were created
- Upgrade OCP
- Check the Resources
- Pods and Jobs probably won't be kept, since seamless upgrade is not supported for now
- Create some pods and jobs (using the Resources created before the upgrade)
- Check that workloads are still being accepted
- Run make test-e2e
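The resource-creation and acceptance-check steps above can be sketched in Python as follows. This is only an illustrative sketch, not the procedure actually used in this story: the object names (`default-flavor`, `cluster-queue`, `user-queue`), the namespace, the quotas, and the `kueue.x-k8s.io/v1beta1` API version are all assumptions. The script prints a `List` of manifests that could be piped to `oc apply -f -`, and `not_admitted` inspects the parsed output of `oc get workloads -A -o json`.

```python
import json

# Assumption: the installed Kueue serves this API version.
API = "kueue.x-k8s.io/v1beta1"


def kueue_resources(namespace="upgrade-test"):
    """Return ResourceFlavor, ClusterQueue and LocalQueue manifests as dicts.

    All names and quotas are hypothetical placeholders.
    """
    flavor = {
        "apiVersion": API,
        "kind": "ResourceFlavor",
        "metadata": {"name": "default-flavor"},
    }
    cluster_queue = {
        "apiVersion": API,
        "kind": "ClusterQueue",
        "metadata": {"name": "cluster-queue"},
        "spec": {
            "namespaceSelector": {},  # empty selector matches every namespace
            "resourceGroups": [{
                "coveredResources": ["cpu", "memory"],
                "flavors": [{
                    "name": "default-flavor",
                    "resources": [
                        {"name": "cpu", "nominalQuota": "4"},
                        {"name": "memory", "nominalQuota": "8Gi"},
                    ],
                }],
            }],
        },
    }
    local_queue = {
        "apiVersion": API,
        "kind": "LocalQueue",
        "metadata": {"name": "user-queue", "namespace": namespace},
        "spec": {"clusterQueue": "cluster-queue"},
    }
    return [flavor, cluster_queue, local_queue]


def not_admitted(workload_list):
    """Return names of Workloads that lack an Admitted=True condition.

    workload_list is the parsed JSON of `oc get workloads -A -o json`.
    """
    pending = []
    for wl in workload_list.get("items", []):
        conditions = wl.get("status", {}).get("conditions", [])
        admitted = any(
            c.get("type") == "Admitted" and c.get("status") == "True"
            for c in conditions
        )
        if not admitted:
            pending.append(wl["metadata"]["name"])
    return pending


if __name__ == "__main__":
    # Emit a v1 List that can be piped to `oc apply -f -`.
    manifest = {"apiVersion": "v1", "kind": "List", "items": kueue_resources()}
    print(json.dumps(manifest, indent=2))
```

Running the same `not_admitted` check before and after the OCP upgrade gives a quick signal on whether workloads are still being accepted, independently of the e2e suite.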
Tests:
- OCP 4.18 Kueue 1.3 -> OCP 4.19
- OCP 4.19 Kueue 1.3 -> OCP 4.20
- 419_upgrade_tests.txt
- 419_upgrade_tests-rerun_failed.txt
- The initial run had 4 failures. All tests passed on the second attempt, confirming that no functional regressions were introduced.
- Besides the proposed tests, I also manually verified VisibilityOnDemand (using both versions of the API) and LocalQueue defaulting after upgrading. Both tests passed.
- OCP 4.20 Kueue 1.3 -> OCP 4.21

- LocalQueueDefaulting and VisibilityOnDemand - tested
- 420_upgrade_tests.txt

- 420_upgrade_tests-rerun_failed.txt

- OCP 4.21 Kueue upgrade from 1.2 to 1.3
- Partial Uninstall
- FBC created
- Kueue 1.2 installed
- Note: Kueue 1.2 tests are currently passing on OCP 4.21 - here
- ResourceFlavor, ClusterQueue, LocalQueue, Pods and Jobs were created.
- Kueue was partially uninstalled
- "Delete all operand instances for this operator" option - not checked
- All resources were kept (ResourceFlavor, ClusterQueue, LocalQueue, Workloads and Kueue/Cluster)
- Kueue 1.3 installed
- Installation finished successfully (Operand is the same)
- New pods/jobs were created using the same resources
- Ran e2e tests to check that everything still works: partial_uninstall.txt

- Full Uninstall
- FBC created
- Kueue 1.2 installed
- All resources were kept (ResourceFlavor, ClusterQueue, LocalQueue and Workloads)
- Kueue/Cluster was deleted as expected
- Kueue 1.3 installed
- New pods and jobs created
- Ran e2e tests
- OCP 4.21 Kueue 1.3 Full Uninstall
- The only difference between Partial and Full Uninstall is now the Operand.
- When we uninstall Kueue partially and install it again, the Operand is available (with the previous config) once installation ends.
- When we uninstall Kueue fully and install it again, the Operand is not available. We need to create it again.
- In both cases, the previously created resources (ResourceFlavor, ClusterQueue, LocalQueue and Workloads) are kept.
- 421_full_uninstall.txt

- 421_full_uninstall-rerun_failed.txt
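The Partial versus Full Uninstall findings above can be written down as an expectation table that a post-reinstall check compares against. The following Python is a sketch under assumptions: the function and class names are hypothetical, and the observed values would have to be gathered separately (for example from `oc get` output); only the expectations themselves come from the results recorded in this story.

```python
from dataclasses import dataclass

# Kinds reported as kept after both partial and full uninstall.
USER_KINDS = ("ResourceFlavor", "ClusterQueue", "LocalQueue", "Workload")


@dataclass(frozen=True)
class ExpectedState:
    kept_kinds: tuple       # resource kinds that must still exist
    operand_present: bool   # whether the Kueue/Cluster operand must exist


def expected_after_reinstall(mode):
    """Expected cluster state after uninstalling and reinstalling Kueue.

    mode is "partial" (operand instances not deleted) or "full".
    """
    if mode not in ("partial", "full"):
        raise ValueError(f"unknown uninstall mode: {mode!r}")
    # Only the operand differs: it survives a partial uninstall only.
    return ExpectedState(USER_KINDS, operand_present=(mode == "partial"))


def mismatches(mode, observed_kinds, operand_present):
    """List deviations between the observed state and the expectation."""
    expected = expected_after_reinstall(mode)
    problems = [
        f"missing {kind}"
        for kind in expected.kept_kinds
        if kind not in observed_kinds
    ]
    if operand_present != expected.operand_present:
        problems.append(
            f"operand present={operand_present}, "
            f"expected {expected.operand_present}"
        )
    return problems
```

An empty list from `mismatches` means the observed state matches what was seen in these runs; after a full uninstall the operand still has to be recreated before new workloads are submitted.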


- is related to
-
OCPBUGS-77245 Kueue-managed pods with explicit runAsUser get permanently stuck in SchedulingGated
-
- Closed
-