-
Story
-
Resolution: Obsolete
-
Normal
-
None
-
None
-
None
-
3
-
False
-
False
-
Developer story
As a WMCO developer, I want to ensure the workload tester jobs are accurate and fail if the Windows workloads become inaccessible during the upgrade process, so that the e2e-upgrade test can validate the upgrade process and verify the availability of the workloads.
Description
The existing methodology in the e2e-upgrade test for checking the workloads is flawed. This is a follow-up to a GH discussion. The primary deficiency is in the API call that fetches the failed tester pods: the applied label selector does not capture the correct value for `job-name`.
Engineering Details
// Test if there was any downtime for Windows workloads by checking the failure on the Job pods.
pods, err := testCtx.client.K8s.CoreV1().Pods(testCtx.workloadNamespace).List(context.TODO(),
    metav1.ListOptions{FieldSelector: "status.phase=Failed", LabelSelector: "job-name=" + windowsWorkloadTesterJob + "-job"})
The logic above only checks that the number of failed pods found is 0. But if the label selector does not actually match any pods from the jobs being created, the resulting number of pods will always be 0, whether or not a workload failed.
To make this clear with an example, let's say you are creating two jobs:
- windows-workload-tester-job-one
- windows-workload-tester-job-two
The selector is looking for failed pods with the label job-name=windows-workload-tester-job. It won't find any failed pods, because no pod carries that label at all. Instead, it needs to make two separate calls: one for failed pods with the label job-name=windows-workload-tester-job-one and one for job-name=windows-workload-tester-job-two.
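The vacuous pass described above can be reproduced without a cluster. The following is a minimal sketch using only the standard library: the `pod` type and `failedPodsForJob` helper are hypothetical stand-ins for `corev1.Pod` and the `List` call with its field and label selectors, not WMCO code. It assumes an equality selector only matches a label whose value is exactly the requested string.

```go
package main

import "fmt"

// pod is a simplified, hypothetical stand-in for corev1.Pod that keeps
// only the fields the selectors inspect.
type pod struct {
	name   string
	labels map[string]string
	phase  string
}

// failedPodsForJob models a List call with FieldSelector "status.phase=Failed"
// and LabelSelector "job-name=<jobName>". Both conditions must match exactly;
// a shared prefix in the label value is not a match.
func failedPodsForJob(pods []pod, jobName string) []pod {
	var out []pod
	for _, p := range pods {
		if p.labels["job-name"] == jobName && p.phase == "Failed" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []pod{
		{name: "tester-one-pod", labels: map[string]string{"job-name": "windows-workload-tester-job-one"}, phase: "Failed"},
		{name: "tester-two-pod", labels: map[string]string{"job-name": "windows-workload-tester-job-two"}, phase: "Running"},
	}

	// The selector used today: no pod has this label value, so the failure is missed.
	fmt.Println(len(failedPodsForJob(pods, "windows-workload-tester-job")))

	// A selector built from the actual job name finds the failed pod.
	fmt.Println(len(failedPodsForJob(pods, "windows-workload-tester-job-one")))
}
```

With the current selector the count is 0 even though one tester pod has failed; querying by the real job name reports it.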
Instead, you could keep a slice that tracks the jobs:
var workloadTesterJobs []*batch.Job
Add the jobs to it as you create them, and move the check for failed pods to this function, where you can use the generated job name, job.GetName(), to search for the pods.
But you can't just search for failed pods. You need to list all the pods for each job, ensure there is at least one pod, and then check whether any of them are in the Failed state.
require.NotEmpty(t, pods.Items)
// then assert that no pod has status.Phase == Failed
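The two-step check (require at least one pod per tracked job, then reject any pod in the Failed phase) can be sketched as follows. This is a standard-library-only model under stated assumptions: the `pod` type, `checkJobPods`, and `checkAllJobs` are hypothetical names introduced for illustration. In the real test, the pod list would come from a `List` call with `LabelSelector: "job-name=" + job.GetName()` and no field selector, and the job names from the tracked slice.

```go
package main

import "fmt"

// pod is a simplified, hypothetical stand-in for corev1.Pod.
type pod struct {
	name    string
	jobName string // value of the "job-name" label
	phase   string // e.g. "Running", "Succeeded", "Failed"
}

// checkJobPods returns an error if the job has no pods at all, or if any
// of its pods is in the Failed phase.
func checkJobPods(jobName string, all []pod) error {
	var matched []pod
	for _, p := range all {
		if p.jobName == jobName {
			matched = append(matched, p)
		}
	}
	// An empty result means the selector matched nothing, so the check
	// would otherwise pass vacuously.
	if len(matched) == 0 {
		return fmt.Errorf("no pods found for job %q", jobName)
	}
	for _, p := range matched {
		if p.phase == "Failed" {
			return fmt.Errorf("pod %q of job %q is in phase Failed", p.name, jobName)
		}
	}
	return nil
}

// checkAllJobs runs checkJobPods for every tracked job name and returns
// the first error encountered.
func checkAllJobs(jobNames []string, all []pod) error {
	for _, name := range jobNames {
		if err := checkJobPods(name, all); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	jobs := []string{"windows-workload-tester-job-one", "windows-workload-tester-job-two"}
	pods := []pod{
		{name: "one-pod", jobName: "windows-workload-tester-job-one", phase: "Succeeded"},
		{name: "two-pod", jobName: "windows-workload-tester-job-two", phase: "Failed"},
	}
	fmt.Println(checkAllJobs(jobs, pods))
}
```

Treating "no pods found" as an error is the design choice that matters here: it turns a mismatched selector into a test failure instead of a silent pass.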
Acceptance Criteria
- e2e-upgrade test suite fails if Windows workloads are interrupted.
- The function testUpgradeVersion(...) validates the workloads using the tester jobs.