Story
Resolution: Unresolved
Normal
Postgres is struggling with query speed, and it can also be quite delayed in importing data, by up to a couple of hours.
For pull request risk analysis we would want to query:
- all test results from PR jobs for the latest commit, both presubmit AND /payload jobs (sketch below)
- historical pass rates from periodics
- recent pass rates from periodics as well as presubmits OUTSIDE this PR (we may want to make sure there is no overlap between these results and the historical ones)
This data will help us do PR risk analysis against new criteria we want to use, to see if incidents are recently underway in other PRs/periodics.
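As a rough sketch of the first bullet, this is what pulling per-test results for a PR's latest commit out of BigQuery could look like from Go. The `ci_analysis.junit` table, every column name, and the package name are assumptions for illustration, not the real schema, and this only gestures at the presubmit half; /payload jobs would presumably need their own predicate:

```go
package riskanalysis

import (
	"context"

	"cloud.google.com/go/bigquery"
)

// prTestResultsQuery aggregates pass/fail counts per job and test for a single
// PR at a single commit. Table and column names are placeholders.
const prTestResultsQuery = `
SELECT
  prowjob_name,
  test_name,
  COUNTIF(success) AS passes,
  COUNTIF(NOT success) AS failures
FROM ci_analysis.junit
WHERE org = @org
  AND repo = @repo
  AND pr_number = @pr
  AND pr_sha = @sha  -- latest commit only
GROUP BY prowjob_name, test_name`

// queryPRTestResults runs the parameterized query and returns an iterator over
// the aggregated rows.
func queryPRTestResults(ctx context.Context, client *bigquery.Client, org, repo string, pr int64, sha string) (*bigquery.RowIterator, error) {
	q := client.Query(prTestResultsQuery)
	q.Parameters = []bigquery.QueryParameter{
		{Name: "org", Value: org},
		{Name: "repo", Value: repo},
		{Name: "pr", Value: pr},
		{Name: "sha", Value: sha},
	}
	return q.Read(ctx)
}
```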
This needs to be cost effective, so we have to optimize carefully. Materialized views cannot be used in BigQuery as they auto-refresh as soon as new data is available, so we'd get no benefit from them. (Verify, but this is what I saw when I last looked.)
The first question would be: how much would it cost to run these queries directly?
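BigQuery dry runs can help answer that cheaply before we commit to anything: a dry run reports the bytes a query would scan without executing it, and on-demand pricing is charged per TiB scanned. A minimal sketch using the Go client; the $/TiB constant is a placeholder, so check current pricing:

```go
package riskanalysis

import (
	"context"

	"cloud.google.com/go/bigquery"
)

// estimateQueryCost performs a BigQuery dry run and returns the bytes the
// query would scan plus a rough on-demand cost estimate in dollars.
func estimateQueryCost(ctx context.Context, client *bigquery.Client, sql string) (int64, float64, error) {
	q := client.Query(sql)
	q.DryRun = true
	job, err := q.Run(ctx)
	if err != nil {
		return 0, 0, err
	}
	// Dry runs are synchronous; statistics are already populated on the job.
	bytes := job.LastStatus().Statistics.TotalBytesProcessed
	const dollarsPerTiB = 6.25 // placeholder on-demand rate, verify current pricing
	cost := float64(bytes) / float64(1<<40) * dollarsPerTiB
	return bytes, cost, nil
}
```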
If that proves extremely expensive, could we instead use caching, or a scheduled query that overwrites a table plus a live query for the last X hours? How could we make this affordable?
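A sketch of the overwrite half of that idea: run the heavy historical aggregation on a schedule and write the result into a summary table with WriteTruncate, so each run replaces the previous snapshot. Dataset and table names are placeholders, and the aggregation SQL is passed in rather than shown:

```go
package riskanalysis

import (
	"context"

	"cloud.google.com/go/bigquery"
)

// refreshHistoricalPassRates runs the (expensive) historical aggregation and
// overwrites the summary table with the result. historicalPassRatesQuery is a
// placeholder for the real aggregation SQL.
func refreshHistoricalPassRates(ctx context.Context, client *bigquery.Client, historicalPassRatesQuery string) error {
	q := client.Query(historicalPassRatesQuery)
	q.Dst = client.Dataset("sippy").Table("test_pass_rates_historical") // placeholder names
	q.WriteDisposition = bigquery.WriteTruncate                         // replace the previous snapshot
	q.CreateDisposition = bigquery.CreateIfNeeded

	job, err := q.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}
	return status.Err()
}
```

A native BigQuery scheduled query writing to the same destination table with WRITE_TRUNCATE would accomplish the same thing without us running our own cron.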
This is particularly interesting as well because it parallels our long-term goal of moving Sippy classic to BigQuery. This is effectively writing the query behind a huge portion of the test pass rate queries in Postgres.
There is some prior art here: dgoodwin recently moved one of the matviews to a BigQuery query run once a day. It does a one-day query for test analysis, broken out by job: https://github.com/openshift/sippy/pull/2183/files#diff-41f56b31034ff2c88b75ea81dd32631694108f75cacedb667052743fb96272ccR379 We then store the results in Postgres. Note that because we have it broken out by job, this data can then be used to break down by variant. This query may be a good starting point for the above.
But could we go further here and convert our test analysis tables to BigQuery somehow?
It would be really nice to use the real testID here instead of the test name, which is what we use in most places in Sippy classic.
Possible approach to making this cost effective (sketch below):
- a daily scheduled query for historical data, overwriting a table
- a live query for the last 12 hours
- a third query for the PR
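A sketch of how the first two pieces could be stitched together without double counting: the scheduled query records the cutoff timestamp it aggregated up to, and the live query only scans rows newer than that cutoff before the two are unioned. All table, column, and parameter names here are placeholders for illustration:

```go
package riskanalysis

// combinedPassRatesQuery unions the pre-aggregated historical snapshot with a
// live aggregation over only the rows newer than the snapshot's cutoff, so no
// result is counted twice. Names are placeholders.
const combinedPassRatesQuery = `
WITH recent AS (
  SELECT
    prowjob_name,
    test_name,
    COUNTIF(success) AS passes,
    COUNTIF(NOT success) AS failures
  FROM ci_analysis.junit
  WHERE modified_time >= @historical_cutoff  -- only what the snapshot has not seen
  GROUP BY prowjob_name, test_name
)
SELECT
  prowjob_name,
  test_name,
  SUM(passes) AS passes,
  SUM(failures) AS failures
FROM (
  SELECT prowjob_name, test_name, passes, failures FROM recent
  UNION ALL
  SELECT prowjob_name, test_name, passes, failures
  FROM sippy.test_pass_rates_historical
)
GROUP BY prowjob_name, test_name`
```

Keeping the breakdown by job mirrors the prior-art query above and preserves the ability to roll up by variant later.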
relates to: TRT-1984 Reconsider Test Report / Analysis Design (In Progress)