OCP Technical Release Team / TRT-772

Disruption uploader can hang


    • Type: Bug
    • Priority: Major
    • Resolution: Done

      Found on Jan 12 '23: the disruption uploader pod had been started over a day and a half earlier and was still running.

      The last log lines were:

      uploading prowjob.yaml: jobrun/periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-upgrade/1612783363347714048
      uploading content: jobrun/periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-upgrade/1612783363347714048
      uploading backend disruption results: "periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-upgrade"/"1612783363347714048"

      It is unclear whether this job run caused the hang or whether something processed after it did.

      We need to decide whether there should be timeouts on each request, on the overall job, or both.

      Logs should also be timestamped; suggest using logrus.

      Marking this as Major because data silently stops uploading, which is a problem for us.

      Job is defined here: https://github.com/openshift/continuous-release-jobs/blob/master/config/clusters/dpcr/services/dpcr-ci-job-aggregation/disruption-cronjob.yaml

              Assignee: Devan Goodwin (rhn-engineering-dgoodwin)
              Reporter: Devan Goodwin (rhn-engineering-dgoodwin)