Currently, when deciding which solver config is the winner in a particular benchmark, the one with the best average score is picked. I believe this to be the wrong approach.
When one of the numbers being averaged is significantly larger or smaller than the others, the average is not a good summary of the data. (10, 10, 10, 20 and 1000 give you 210 as the average - is that really the best metric available to describe the data set?)
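To make the arithmetic concrete, here is a small sketch in Python (chosen only for illustration) that compares the average with the median for the five numbers above. The variable names are my own.

```python
from statistics import mean, median

# The five example scores from the text above.
scores = [10, 10, 10, 20, 1000]

average = mean(scores)   # pulled up by the single 1000 outlier
middle = median(scores)  # the value in the middle of the sorted list

print("average:", average)
print("median:", middle)
```

Four of the five values are 20 or below, yet the average lands at 210, while the median stays at 10.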
For this reason, I would like the winner-picking algorithm to work differently:
1) For each input file in the benchmark, rank the solvers by their score. (Basically 1st to Nth place.)
2) Then take the median of each solver's "places" across all input files. The solver with the best median place wins.
This way, you don't compare the solver results themselves. You compare how the solvers did in relation to the other solvers - which is something I consider much more important.
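The two steps above could be sketched roughly as follows. This is only an illustration in Python, not a proposal for the actual implementation: the function name `pick_winner`, the `higher_is_better` flag, the input layout (one list of scores per solver, in the same input-file order) and the sample numbers are all assumptions of mine. Tied scores share a place, and a tie on the median place is not resolved here (the first solver in the dict simply wins).

```python
from statistics import median

def pick_winner(scores, higher_is_better=True):
    """Pick the solver with the best median place.

    scores: dict mapping solver name -> list of scores, one per input file,
            with every list in the same input-file order (assumed layout).
    Returns (winner, median_places).
    """
    solvers = list(scores)
    n_files = len(scores[solvers[0]])
    places = {s: [] for s in solvers}

    # Step 1: for each input file, give every solver a place (1 = best).
    # A solver's place is 1 + the number of solvers that scored strictly
    # better on that file, so tied solvers share the same place.
    for i in range(n_files):
        for s in solvers:
            if higher_is_better:
                better = sum(1 for o in solvers if scores[o][i] > scores[s][i])
            else:
                better = sum(1 for o in solvers if scores[o][i] < scores[s][i])
            places[s].append(better + 1)

    # Step 2: take the median place per solver; the lowest median wins.
    median_places = {s: median(p) for s, p in places.items()}
    winner = min(median_places, key=median_places.get)
    return winner, median_places

# Made-up scores for two hypothetical solver configs on five input files.
example = {
    "A": [10, 10, 10, 20, 1000],
    "B": [12, 11, 13, 25, 100],
}
winner, median_places = pick_winner(example)
print(winner, median_places)
```

In this made-up data set, A has the higher average (210 vs. 32.2) purely because of one input file, while B places first on four of the five files and therefore wins on median place.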