  1. OCP Technical Release Team
  2. TRT-1445

Resource Watch on 4.16 payload jobs takes one extra hour to finish

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • SDN Sprint 249

      In 4.15, resource watch completes as soon as it is interrupted. In 4.16 jobs it does not: it keeps running until the 1h grace period expires and ci-operator terminates the job. This means:

      • All 4.16 jobs take 1h longer than normal.
      • If an upgrade pushes the overall time close to the limit, sub-jobs from the aggregator will fail at the 4h mark.
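
The timing impact described above can be sketched with a little arithmetic. This is only an illustrative model, not code from ci-operator or resource watch: the function names are hypothetical, and it assumes the 1h grace period is simply added to the job's run time and that the 4h mark is a hard limit.

```python
from datetime import timedelta

# Values taken from the issue description.
GRACE_PERIOD = timedelta(hours=1)  # wait before ci-operator terminates the job
JOB_LIMIT = timedelta(hours=4)     # the "4h mark" at which aggregator sub-jobs fail


def projected_duration(run_time: timedelta, exits_on_interrupt: bool) -> timedelta:
    """Hypothetical helper: total job time, given whether resource watch
    exits when interrupted (4.15 behavior) or waits out the grace period
    (the 4.16 behavior reported here)."""
    if exits_on_interrupt:
        return run_time
    return run_time + GRACE_PERIOD


def exceeds_limit(run_time: timedelta, exits_on_interrupt: bool) -> bool:
    """Hypothetical helper: True if the job would reach the 4h mark."""
    return projected_duration(run_time, exits_on_interrupt) >= JOB_LIMIT


# An upgrade job whose real work takes 3h10m (an assumed figure):
work = timedelta(hours=3, minutes=10)
print(exceeds_limit(work, exits_on_interrupt=True))   # finishes under the limit
print(exceeds_limit(work, exits_on_interrupt=False))  # the extra hour pushes it over
```

Under these assumptions, any job whose real work takes 3h or more reaches the limit only because of the extra hour.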

       

      This was discovered while investigating this Slack thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1705578623724259

       



            Jamo Luhrsen added a comment - looking back through recent jobs, I cannot find any examples of a 4.16 job timing out on the resource-watch step. marking this as done. Thanks for the fix kenzhang@redhat.com


            Jamo Luhrsen added a comment - looks like we still hit this 1h0m timeout for resource-watch in upgrade jobs semi-frequently (4-5%), but that is across other releases too (4.14, 4.15), so it is not a regression like the one you fixed last week. I can't totally figure it out, but looking at search.ci with a 14d window the percentage jumps to 9%, so I assume your fix helped cut these in half-ish at least.

              jluhrsen Jamo Luhrsen
              kenzhang@redhat.com Ken Zhang