OCP Technical Release Team / TRT-601

Risk Analysis of CI Job Run Failures to Help Engineers Spot Signal


Details

    • Epic
    • Resolution: Unresolved
    • Major
    • Risk Analysis of CI Job Run Failures
    • In Progress
    • 78%

    Description

      We have a recurring problem: developers miss or ignore job failures on presubmits because they don't trust the jobs, some of which have overall pass rates under 30%.

      We have a systemic problem with flaky tests and jobs. Few pay attention anymore, and even fewer people know how to distinguish serious failures from the noise.

      Simply fixing the tests and jobs is infeasible. Piece by piece, maybe, but we do not have the time to invest in what would be a massive effort.

      Sippy now has presubmit data throughout the history of a PR.

      Could Sippy analyze the presubmits for every PR, check test failures against their current pass rate, filter out noise from ongoing incidents, and then comment on PRs to let developers know what's really going on?

      As an example:

      job foo - failure severity: LOW

      • test a failed x times, current pass rate 40%, flake rate 20%

      job bar - failure severity: HIGH

      • test b failed 2 times, current pass rate 99%

      job zoo - failure severity: UNKNOWN

      • ongoing incident: Azure Install Failures (TRT-XXX)
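
A minimal Python sketch of how the classification in the example might work. Everything here is an illustrative assumption rather than Sippy's actual design: the names `FailedTest`, `JobResult`, `severity`, and `render_comment` are hypothetical, the 95% cutoff is arbitrary, and the failure count of 3 for test a stands in for the unspecified "x" above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed cutoff: a failure in a test that normally passes at least this
# often is treated as real signal. The epic does not specify a threshold.
HIGH_SIGNAL_PASS_RATE = 95.0


@dataclass
class FailedTest:
    name: str
    failures: int                        # times the test failed on this PR
    pass_rate: float                     # current pass rate, in percent
    flake_rate: Optional[float] = None   # current flake rate, in percent, if known


@dataclass
class JobResult:
    job: str
    failures: List[FailedTest] = field(default_factory=list)
    incident: Optional[str] = None       # ongoing incident affecting the job, if any


def severity(job: JobResult) -> str:
    """Classify a failed job as HIGH, LOW, or UNKNOWN."""
    # An ongoing incident (or no test-level data) means we cannot judge the failure.
    if job.incident or not job.failures:
        return "UNKNOWN"
    # A failure in a normally reliable test is likely caused by the PR.
    if any(t.pass_rate >= HIGH_SIGNAL_PASS_RATE for t in job.failures):
        return "HIGH"
    return "LOW"


def render_comment(jobs: List[JobResult]) -> str:
    """Build the text of a PR comment summarizing each failed job."""
    lines = []
    for job in jobs:
        lines.append(f"job {job.job} - failure severity: {severity(job)}")
        if job.incident:
            lines.append(f"  • ongoing incident: {job.incident}")
            continue
        for t in job.failures:
            line = (f"  • test {t.name} failed {t.failures} times, "
                    f"current pass rate {t.pass_rate:g}%")
            if t.flake_rate is not None:
                line += f", flake rate {t.flake_rate:g}%"
            lines.append(line)
    return "\n".join(lines)


# Hypothetical data mirroring the example above.
example_jobs = [
    JobResult("foo", [FailedTest("a", 3, 40.0, 20.0)]),
    JobResult("bar", [FailedTest("b", 2, 99.0)]),
    JobResult("zoo", incident="Azure Install Failures (TRT-XXX)"),
]

if __name__ == "__main__":
    print(render_comment(example_jobs))
```

The incident check comes first so that known outages are never reported as high severity, which matches the intent of filtering out incident noise before judging individual tests.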

      David requests that this be published in the job as a Spyglass panel, which would provide a historical artifact. We'd likely do both, so that we know developers see the comments.

      This epic will cover TRT's project to enhance Sippy to categorize the likely severity of test failures in a bad job run, store this as a historical artifact on the job run, and communicate it directly to developers in their PRs via a comment.

      Attachments

        1. Docs Tracker Sub-task Closed Undefined Unassigned
        2. PX Tracker Sub-task Closed Undefined Unassigned
        3. QE Tracker Sub-task Closed Undefined Unassigned
        4. TE Tracker Sub-task Closed Undefined Unassigned

          People

            rhn-engineering-dgoodwin Devan Goodwin