-
Story
-
Resolution: Unresolved
-
Major
If a cronjob fails in stage, it may indicate a regression. Automated testing in stage should include checks of the status of jobs, so that we can quickly detect regressions in scheduled logic.
Acceptance criteria:
- stage health check job updated to include test
- test verified to fail when a job is returned having status "Failed"
Using the Kubernetes API, the tests should:
- Loop through every job in the stage namespace, filtering to jobs in the last 24 hours.
- Record the status of the job.
- Fail if any job has "Failed" status.
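The 24-hour filter in the first step could be sketched as follows. This is a minimal sketch, not the actual test code: it assumes each job is the decoded JSON object returned by the Kubernetes API, that `status.startTime` is the timestamp to compare, and that jobs with no `startTime` are skipped. The helper names `parse_k8s_time` and `recent_jobs` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value):
    """Parse an RFC 3339 timestamp such as "2024-05-01T12:00:00Z"."""
    # Older Python versions reject a trailing "Z", so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def recent_jobs(jobs, now=None, window=timedelta(hours=24)):
    """Return the jobs whose status.startTime falls within the window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    recent = []
    for job in jobs:
        start = job.get("status", {}).get("startTime")
        if start is None:
            # Assumption: a job that never started is out of scope.
            continue
        if parse_k8s_time(start) >= cutoff:
            recent.append(job)
    return recent
```

Passing `now` explicitly keeps the filter deterministic, which makes it easier to verify with fixed sample data.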
The easiest way to verify that the latest job run was successful is to inspect the `status` section of each job returned by the Kubernetes API, e.g.
GET apis/batch/v1/namespaces/rhsm-stage/jobs
(equivalent to `oc get jobs`).
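For illustration, the request above could be issued from Python using only the standard library. This is a sketch under assumptions: the API server URL, bearer token, and CA file are placeholders that the test environment would have to supply, and the function names `jobs_url` and `list_jobs` are hypothetical. The response is a `JobList`, whose `items` field holds the jobs.

```python
import json
import ssl
import urllib.request


def jobs_url(api_server, namespace):
    """Build the batch/v1 jobs listing URL for a namespace."""
    return f"{api_server.rstrip('/')}/apis/batch/v1/namespaces/{namespace}/jobs"


def list_jobs(api_server, namespace, token, ca_file=None):
    """Fetch all jobs in the namespace and return them as a list of dicts."""
    request = urllib.request.Request(
        jobs_url(api_server, namespace),
        headers={
            # Assumption: the caller authenticates with a service account token.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )
    # ca_file is the cluster CA bundle; None falls back to the system trust store.
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)["items"]
```

A call such as `list_jobs(api_server, "rhsm-stage", token)` would then return the same jobs that `oc get jobs` shows.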
For each job whose `.status.startTime` is within the last 24 hours, look at the most recent object in `.status.conditions`. The job failed if that condition has "type" of "Failed" (with the condition's own "status" field set to "True"). The related cronjob/clowdjobinvocation for any failure can be identified from the `.metadata.ownerReferences` objects having kind "CronJob" or "ClowdJobInvocation" (the reference's "name" value is the cronjob/clowdjobinvocation that failed).
Any job without a terminal status (no "Complete" or "Failed" condition yet) should be skipped.
The test should log all recent jobs' owner objects and their statuses, and fail if any of them failed.
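The classification and reporting described above might look like the following sketch. It assumes the jobs have already been fetched and limited to the last 24 hours, that conditions are appended in chronological order (so the list is walked backwards to find the most recent one), and that a job with no matching owner is reported under its own name. All function names here are hypothetical.

```python
from collections import defaultdict

TERMINAL_TYPES = {"Complete", "Failed"}
OWNER_KINDS = {"CronJob", "ClowdJobInvocation"}


def terminal_status(job):
    """Return "Complete" or "Failed", or None if the job has not finished."""
    conditions = job.get("status", {}).get("conditions") or []
    for cond in reversed(conditions):
        if cond.get("type") in TERMINAL_TYPES and cond.get("status") == "True":
            return cond["type"]
    return None


def owner_label(job):
    """Return "Kind/name" for the owning CronJob or ClowdJobInvocation."""
    meta = job.get("metadata", {})
    for ref in meta.get("ownerReferences") or []:
        if ref.get("kind") in OWNER_KINDS:
            return f"{ref['kind']}/{ref['name']}"
    # Fallback for jobs without a matching owner (an assumption, not in the ticket).
    return f"Job/{meta.get('name', '<unknown>')}"


def summarize(jobs):
    """Group owner labels by terminal status, skipping unfinished jobs."""
    groups = defaultdict(set)
    for job in jobs:
        status = terminal_status(job)
        if status is not None:
            groups[status].add(owner_label(job))
    return {status: sorted(owners) for status, owners in groups.items()}


def format_report(summary):
    """Render one line per status, listing the owners with that status."""
    lines = []
    for label, key in (("Complete jobs", "Complete"), ("Failed jobs", "Failed")):
        lines.append(f"{label}: {' '.join(summary.get(key, []))}".rstrip())
    return "\n".join(lines)


def check_jobs(jobs):
    """Log every finished job's owner and fail if any job failed."""
    summary = summarize(jobs)
    print(format_report(summary))
    if summary.get("Failed"):
        raise AssertionError("Failed jobs: " + " ".join(summary["Failed"]))
```

Raising `AssertionError` only after printing the full report means a failing run still logs every owner and status, which is what the acceptance criteria ask the test to record.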
Example output:
Complete jobs: CronJob/floorist-swatch-tally-exporter CronJob/rhsm-subscriptions-egress CronJob/swatch-billable-usage-purge-remittances CronJob/swatch-billable-usage-retry-remittances CronJob/swatch-billable-usage-sync CronJob/swatch-contracts-offering-sync CronJob/swatch-contracts-subscription-sync CronJob/swatch-metrics-rhel-sync CronJob/swatch-metrics-sync CronJob/swatch-system-conduit-sync CronJob/swatch-tally-hourly CronJob/swatch-tally-purge CronJob/swatch-tally-purge-events CronJob/swatch-tally-tally
Failed jobs: ClowdJobInvocation/db-changelog-cleanup-2
References:
https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/job-v1/#JobStatus
Relates to: SWATCH-2820 Move the database creation out of the monolith (Closed)