OCP Technical Release Team / TRT-2487

Increase failure threshold to 3 for aggregation


    • Type: Story
    • Resolution: Unresolved
    • Priority: Major
    • Future Sustainability

      Our current aggregation logic is too sensitive, leading to payload rejections for issues that are not genuine regressions. Analysis shows that a significant portion of rejected payloads fail due to infrastructure noise or existing flakes rather than real code regressions.

      I suggest we try using a "pity factor" and set the minimum failure count to 3 across the board. We have Component Readiness as a backstop to identify regressions with larger sample sizes.

      Analysis shows that 15% of aggregated job runs are failing where every failed test failed only 2 times. This is a simple change we can implement immediately while we work out alternatives for the underlying goal: our blocking CI jobs should indicate that the payload is likely good enough, with more subtle regression detection shifted to CR.

      Aggregation should still detect significant problems.
      Results:

        total_job_runs_analyzed | job_runs_meeting_criteria | job_runs_not_meeting_criteria | percentage_meeting_criteria
      -------------------------+---------------------------+-------------------------------+-----------------------------
                          2359 |                       344 |                          2015 |                       14.58
      

      Query:

      -- Of the aggregated job runs with failed tests in the last 3 months, how many
      -- failed only on tests that failed exactly 2 times (i.e. runs that would pass
      -- with a minimum failure threshold of 3)?
      SELECT
          sub.total_job_runs_analyzed,
          sub.job_runs_meeting_criteria,
          sub.job_runs_not_meeting_criteria,
          sub.percentage_meeting_criteria
      FROM (
          WITH RelevantFailedTests AS (
              -- Failed tests (status 12) from aggregated job runs in the last
              -- 3 months, excluding sippy's own synthetic tests.
              SELECT
                  pjr.id AS prow_job_run_id,
                  pj.name AS job_name,
                  t.name AS test_name,
                  pjrt_out.output AS test_output,
                  pjrt.id AS prow_job_run_test_id
              FROM prow_job_runs pjr
              JOIN prow_jobs pj ON pjr.prow_job_id = pj.id
              JOIN prow_job_run_tests pjrt ON pjr.id = pjrt.prow_job_run_id
              JOIN tests t ON pjrt.test_id = t.id
              LEFT JOIN prow_job_run_test_outputs pjrt_out ON pjrt.id = pjrt_out.prow_job_run_test_id
              WHERE pj.name LIKE 'aggregated-%'
                  AND pjr.timestamp > NOW() - INTERVAL '3 months'
                  AND pjrt.status = 12
                  AND t.name NOT ILIKE '%sig-sippy%'
          ),
          JobRunOutputCheck AS (
              -- TRUE only when every failed test in the run reports "failed 2 times".
              SELECT
                  prow_job_run_id,
                  job_name,
                  BOOL_AND(test_output ILIKE '%failed 2 times%') AS all_failed_tests_match_pattern
              FROM RelevantFailedTests
              GROUP BY prow_job_run_id, job_name
          )
          SELECT
              COUNT(prow_job_run_id) AS total_job_runs_analyzed,
              COUNT(prow_job_run_id) FILTER (WHERE all_failed_tests_match_pattern = TRUE) AS job_runs_meeting_criteria,
              COUNT(prow_job_run_id) FILTER (WHERE all_failed_tests_match_pattern = FALSE) AS job_runs_not_meeting_criteria,
              CASE
                  WHEN COUNT(prow_job_run_id) = 0 THEN 0.0
                  ELSE ROUND((COUNT(prow_job_run_id) FILTER (WHERE all_failed_tests_match_pattern = TRUE))::NUMERIC * 100 / COUNT(prow_job_run_id), 2)
              END AS percentage_meeting_criteria
          FROM JobRunOutputCheck
      ) AS sub;
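
      A minimal Go sketch of the proposed count-based "pity factor" gate, for illustration only: the constant, type, and function names below are hypothetical and this is not the actual job-run aggregation code; it only demonstrates the per-test minimum-failure idea described above.

      package main

      import "fmt"

      // minimumFailuresToReject is the proposed "pity factor": a test only counts
      // against the payload once it has failed at least this many times across the
      // aggregated runs of a job. (Hypothetical name for this sketch.)
      const minimumFailuresToReject = 3

      // testResult summarizes one test across the aggregated runs of a job.
      type testResult struct {
          Name     string
          Failures int
      }

      // shouldRejectPayload reports whether any test failed often enough to be
      // treated as a likely regression rather than infrastructure noise or a flake.
      func shouldRejectPayload(results []testResult) bool {
          for _, r := range results {
              if r.Failures >= minimumFailuresToReject {
                  return true
              }
          }
          return false
      }

      func main() {
          // Mirrors the analysis above: every failed test failed only twice, so with
          // the threshold raised to 3 this job run no longer rejects the payload.
          // (Test names are made up for the example.)
          results := []testResult{
              {Name: "example test A", Failures: 2},
              {Name: "example test B", Failures: 2},
          }
          fmt.Println("reject payload:", shouldRejectPayload(results)) // prints: reject payload: false
      }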
      

              afri@afri.cz Petr Muller
              stbenjam Stephen Benjamin