Bug
Resolution: Done
Major
Found on Jan 12 '23: the disruption uploader pod had been started over a day and a half earlier and appeared to be hung.
Last log lines were:
uploading prowjob.yaml: jobrun/periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-upgrade/1612783363347714048
uploading content: jobrun/periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-upgrade/1612783363347714048
uploading backend disruption results: "periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-upgrade"/"1612783363347714048"
It is unclear whether this job run was the problem or something that followed it.
We need to decide whether to add timeouts to each request, to the overall job, or to both.
Logs should also be timestamped. Suggest logrus.
Marking this as major because data silently stops uploading, which is a problem for us.
Job is defined here: https://github.com/openshift/continuous-release-jobs/blob/master/config/clusters/dpcr/services/dpcr-ci-job-aggregation/disruption-cronjob.yaml