[Termtools] A proposal for the next competition - uniqueness rankings

Johannes Waldmann waldmann at imn.htwk-leipzig.de
Thu Feb 23 11:49:28 CET 2006


There are two issues here:

(a) how to run the competition
(b) how to compute a score (and determine the winners)

I think we are approaching a consensus on (a), namely:
(1) run the first round as last year,
(2) possibly have extra round(s).

Of course, Claude has to decide whether
he has enough computing resources.
I think the aim of his proposal was to avoid part (1)?


(b) is more difficult: do we want power, or speed, etc.?
Perhaps we can have several categories (and thus several winners);
the SAT2005 competition had 27 winners, which seems a bit much for us.

What are the interesting categories?
Let me try to list what has been (implicitly) proposed so far:

(0) completeness: my name for what we measured in 2004 and 2005,
    namely the number of YES answers

(1) speed: how do we measure this? Is a tool that solves
    just one (trivial) problem in 0.001 seconds
    the speed winner because it has the lowest average time?

(2) uniqueness: the "divided purse" idea; each problem carries a purse of 1
    that is split equally among the tools that solved it, so for each solved
    problem a tool earns the reciprocal of the number of tools that solved it
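A minimal sketch of the divided-purse score, assuming the results are available as a map from each tool to the set of problems it answered YES on. The function name `divided_purse`, the data layout, and the tool and problem names in the example are hypothetical, not taken from the competition software.

```python
from collections import Counter


def divided_purse(solved: dict[str, set[str]]) -> dict[str, float]:
    """Uniqueness score: each solved problem pays 1/k to each of the
    k tools that solved it."""
    # solvers[p] = number of tools that answered YES on problem p
    solvers = Counter(p for problems in solved.values() for p in problems)
    # sum the tool's share of every purse it collected
    return {
        tool: sum(1.0 / solvers[p] for p in problems)
        for tool, problems in solved.items()
    }


# hypothetical toy data: tool -> problems answered YES
example = {
    "A": {"p1", "p2", "p3"},
    "B": {"p1", "p2"},
    "C": {"p1"},
}
print(divided_purse(example))
```

Problems nobody solved never appear in `solvers`, so there is no division by zero, and the scores always add up to the number of problems solved by at least one tool (3 in the toy example).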



Here is some data for the "divided purse" proposal
(in short, it would not change much).

I computed the scores for the 2005 competition
and scaled them so that the winner has 100 percent.
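The scaling step can be sketched as follows, assuming raw scores per tool are already computed, at least one tool scored above zero, and percentages are rounded to whole numbers (the message does not say how rounding was done). `normalize_to_winner` is a hypothetical helper that produces lists in the same shape as the ones below.

```python
def normalize_to_winner(scores: dict[str, float]) -> list[tuple[int, str]]:
    """Scale scores so the best tool gets 100, sorted best first."""
    best = max(scores.values())  # assumes best > 0
    # rounding to whole percent is an assumption, not the original procedure
    scaled = [(round(100 * s / best), tool) for tool, s in scores.items()]
    # highest percentage first; ties broken alphabetically by tool name
    return sorted(scaled, key=lambda pair: (-pair[0], pair[1]))


print(normalize_to_winner({"A": 4.0, "B": 3.0, "C": 1.0}))
```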

SRS category

completeness:
[(100,Torpa),(90,Aprove),(80,Jambox),(58,Teparla),(52,Matchbox),(51,Ttt),(29,Cime)]

uniqueness:
[(100,Torpa),(83,Aprove),(78,Jambox),(42,Teparla),(39,Matchbox),(36,Ttt),(22,Cime)]

TRS category

completeness:
[(100,Aprove),(88,Ttt),(70,Tpa),(60,Teparla),(53,Cime),(28,Matchbox)]

uniqueness:
[(100,Aprove),(79,Ttt),(52,Tpa),(43,Teparla),(39,Cime),(18,Matchbox)]


In both categories, the ranking by completeness (number of problems solved)
and the ranking by uniqueness (most money collected from "problem purses") coincide.
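This can be checked mechanically from the percentages listed above; the sketch below only reorders those reported numbers and compares the resulting tool orders. The helper name `ranking` is illustrative.

```python
# Reported 2005 percentages (winner = 100), copied from the lists above.
srs_completeness = [(100, "Torpa"), (90, "Aprove"), (80, "Jambox"), (58, "Teparla"),
                    (52, "Matchbox"), (51, "Ttt"), (29, "Cime")]
srs_uniqueness = [(100, "Torpa"), (83, "Aprove"), (78, "Jambox"), (42, "Teparla"),
                  (39, "Matchbox"), (36, "Ttt"), (22, "Cime")]
trs_completeness = [(100, "Aprove"), (88, "Ttt"), (70, "Tpa"), (60, "Teparla"),
                    (53, "Cime"), (28, "Matchbox")]
trs_uniqueness = [(100, "Aprove"), (79, "Ttt"), (52, "Tpa"), (43, "Teparla"),
                  (39, "Cime"), (18, "Matchbox")]


def ranking(scores: list[tuple[int, str]]) -> list[str]:
    """Tool names ordered from highest to lowest score."""
    return [tool for _, tool in sorted(scores, key=lambda pair: -pair[0])]


for name, comp, uniq in [("SRS", srs_completeness, srs_uniqueness),
                         ("TRS", trs_completeness, trs_uniqueness)]:
    print(name, ranking(comp) == ranking(uniq))
```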

Looking more closely, we could find some meaning in the differences.
E.g., Jambox is "quite unique": in the SRS category, it is the only
non-winner whose uniqueness score is about equal to its completeness
score (78 vs. 80), etc.
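One way to quantify this (a sketch, not part of the original proposal) is the ratio uniqueness/completeness. Since both percentages are relative to the winner, the ratio compares a tool's average purse share per solved problem with the winner's; a value near 1 means its solved problems are about as "rare" as the winner's. Because the reported percentages are rounded, the ratios are only approximate.

```python
# SRS percentages from above: tool -> (completeness, uniqueness), winner = 100
srs = {
    "Torpa": (100, 100), "Aprove": (90, 83), "Jambox": (80, 78),
    "Teparla": (58, 42), "Matchbox": (52, 39), "Ttt": (51, 36), "Cime": (29, 22),
}

# uniqueness / completeness for every non-winner
ratios = {tool: u / c for tool, (c, u) in srs.items() if tool != "Torpa"}

# print the non-winners from highest to lowest ratio
for tool, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{tool:9} {r:.2f}")
```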


To me this indicates that we could easily switch
to the "divided purse" idea (uniqueness ranking).

Best regards,
-- 
-- Johannes Waldmann -- Tel/Fax (0341) 3076 6479/80 --
---- http://www.imn.htwk-leipzig.de/~waldmann/ -------

_______________________________________________
Termtools mailing list
Termtools at serveur-listes.lri.fr
http://serveur-listes.lri.fr/mailman/listinfo/termtools