Story
Resolution: Unresolved
Major
Future Sustainability
Our current aggregation logic is too sensitive, causing payloads to be rejected for non-regressions. Analysis shows that a significant portion of rejected payloads fail due to infrastructure noise or existing flakes rather than genuine code regressions.
I suggest we try using a "pity factor" and setting the minimum to 3 across the board (a rough sketch of this rule follows the query below). We have Component Readiness (CR) as a backstop to identify regressions with larger sample sizes.
Analysis shows that roughly 15% of aggregated job runs fail with only 2 failures on every failed test. This is a simple change we can implement immediately while we figure out alternatives that achieve our goal: blocking CI jobs should indicate the payload is likely good enough, with more subtle regression detection shifted to CR.
Aggregation should still detect significant problems.
Results:
 total_job_runs_analyzed | job_runs_meeting_criteria | job_runs_not_meeting_criteria | percentage_meeting_criteria
-------------------------+---------------------------+-------------------------------+-----------------------------
                    2359 |                       344 |                          2015 |                       14.58
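That is, 344 of the 2,359 aggregated job runs from the last 3 months (344 / 2359 ≈ 14.58%) failed only on tests whose recorded output reported exactly 2 failures.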
Query:
SELECT
    sub.total_job_runs_analyzed,
    sub.job_runs_meeting_criteria,
    sub.job_runs_not_meeting_criteria,
    sub.percentage_meeting_criteria
FROM (
    -- Collect every failed test from aggregated job runs over the last 3
    -- months, along with its recorded output, excluding sig-sippy synthetic
    -- tests.
    WITH RelevantFailedTests AS (
        SELECT
            pjr.id AS prow_job_run_id,
            pj.name AS job_name,
            t.name AS test_name,
            pjrt_out.output AS test_output,
            pjrt.id AS prow_job_run_test_id
        FROM prow_job_runs pjr
        JOIN prow_jobs pj ON pjr.prow_job_id = pj.id
        JOIN prow_job_run_tests pjrt ON pjr.id = pjrt.prow_job_run_id
        JOIN tests t ON pjrt.test_id = t.id
        LEFT JOIN prow_job_run_test_outputs pjrt_out ON pjrt.id = pjrt_out.prow_job_run_test_id
        WHERE pj.name LIKE 'aggregated-%'
          AND pjr.timestamp > NOW() - INTERVAL '3 months'
          AND pjrt.status = 12              -- failed test results only
          AND t.name NOT ILIKE '%sig-sippy%'
    ),
    -- For each aggregated job run, true only when every failed test's output
    -- reports the test "failed 2 times".
    JobRunOutputCheck AS (
        SELECT
            prow_job_run_id,
            job_name,
            BOOL_AND(test_output ILIKE '%failed 2 times%') AS all_failed_tests_match_pattern
        FROM RelevantFailedTests
        GROUP BY prow_job_run_id, job_name
    )
    -- Summarize how many job runs meet the "only 2 failures" criteria.
    SELECT
        COUNT(prow_job_run_id) AS total_job_runs_analyzed,
        COUNT(prow_job_run_id) FILTER (WHERE all_failed_tests_match_pattern = TRUE) AS job_runs_meeting_criteria,
        COUNT(prow_job_run_id) FILTER (WHERE all_failed_tests_match_pattern = FALSE) AS job_runs_not_meeting_criteria,
        CASE
            WHEN COUNT(prow_job_run_id) = 0 THEN 0.0
            ELSE ROUND((COUNT(prow_job_run_id) FILTER (WHERE all_failed_tests_match_pattern = TRUE))::NUMERIC * 100 / COUNT(prow_job_run_id), 2)
        END AS percentage_meeting_criteria
    FROM JobRunOutputCheck
) AS sub;
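To make the pity-factor suggestion above concrete, here is a minimal sketch in Go, assuming "the minimum" means the smallest number of passing runs a test needs across the ~10 aggregated job runs before it can reject the payload on its own. The names (pityMinimumPasses, testRejectsPayload) are hypothetical and are not the aggregator's actual code.

package main

import "fmt"

// Hypothetical "pity" floor: a test only needs to pass in at least this many
// of the ~10 aggregated job runs, regardless of its historical pass rate.
const pityMinimumPasses = 3

// testRejectsPayload reports whether a single test should reject the payload
// under the proposed rule, given how many aggregated runs it passed in.
func testRejectsPayload(passes int) bool {
	return passes < pityMinimumPasses
}

func main() {
	// A test that failed 2 of 10 runs (8 passes), the ~15% case measured by
	// the query above, would no longer reject the payload on its own.
	fmt.Println(testRejectsPayload(8)) // false: payload still viable
	// A test that passed only 2 of 10 runs would still reject it.
	fmt.Println(testRejectsPayload(2)) // true: payload rejected
}

If that reading is right, the 344 job runs counted above would have passed aggregation, regressions severe enough to drop a test below 3 passes would still be caught, and anything subtler would be left to Component Readiness.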
relates to: SHIPSTRAT-3 A successful nightly most nights
Refinement