[Termtools] A proposal for the next competition - uniqueness rankings
Johannes Waldmann
waldmann at imn.htwk-leipzig.de
Thu Feb 23 11:49:28 CET 2006

There are two issues here:
(a) how to run the competition
(b) how to compute a score (and determine the winners)
I think we are approaching a consensus about (a), namely:
(1) run the first round as last year,
(2) possibly have extra round(s).
Of course Claude has to decide whether
he has enough computing resources.
I think the aim of his proposal was to avoid part (1)?
Issue (b) is more difficult (do we want power, or speed, etc.?).
Perhaps we can have several categories (and thus several winners).
(The SAT2005 competition had 27 winners - that seems a bit much for us.)
What are the interesting categories?
I will try to list what has been (implicitly) proposed so far:
(0) completeness -- that's what I would call what we measured
    in 2004 and 2005: count the number of YES answers
(1) speed -- how do you measure this? Is a tool that solves
    just one (trivial) problem in 0.001 seconds
    the speed winner because it has the lowest average?
(2) uniqueness -- the "divided purse" idea: for each solved problem,
    add the reciprocal of the number of tools that solved it
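
A minimal sketch of how criteria (0) and (2) could be computed, assuming
the competition results are available as a map from each tool to the set
of problems it solved. The tool names, problem names, and function names
below are made up for illustration; they are not competition data.

```python
from collections import Counter
from fractions import Fraction


def completeness(results):
    """Criterion (0): number of problems each tool solved."""
    return {tool: len(solved) for tool, solved in results.items()}


def uniqueness(results):
    """Criterion (2), "divided purse": every solved problem is worth 1,
    split equally among all tools that solved it."""
    # How many tools solved each problem.
    solvers = Counter(p for solved in results.values() for p in solved)
    return {
        tool: sum(Fraction(1, solvers[p]) for p in solved)
        for tool, solved in results.items()
    }


# Hypothetical results: tool -> set of solved problems.
results = {
    "A": {"p1", "p2", "p3"},
    "B": {"p1", "p2"},
    "C": {"p3", "p4"},
}

print(completeness(results))
print(uniqueness(results))
```

In this toy example tool C solves fewer problems than A, but it is the
only one to solve p4, so both collect 3/2 from the purses. Exact
fractions are used so that the purses always add up to the number of
solved problems.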
Here is some data for the "divided purse" proposal
(in short, it would not change much).
I computed the scores for the 2005 competition
and set the winner to 100 percent.
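
The normalisation can be sketched as follows. The rounding mode is an
assumption (it is not stated here whether the percentages were rounded
or truncated), and the raw scores in the example are made up.

```python
def percent_of_winner(scores):
    """Scale raw scores so that the best tool gets 100, and return
    (percent, tool) pairs sorted from best to worst."""
    best = max(scores.values())
    if best == 0:
        raise ValueError("no tool scored anything")
    # Assumption: round to the nearest integer percent.
    pairs = [(round(100 * s / best), tool) for tool, s in scores.items()]
    return sorted(pairs, reverse=True)


# Hypothetical raw scores, not competition data.
raw = {"A": 40, "B": 36, "C": 10}
print(percent_of_winner(raw))
```

The output has the same shape as the lists given for the SRS and TRS
categories: pairs of percentage and tool name, winner first.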
SRS category
completeness:
[(100,Torpa),(90,Aprove),(80,Jambox),(58,Teparla),(52,Matchbox),(51,Ttt),(29,Cime)]
uniqueness:
[(100,Torpa),(83,Aprove),(78,Jambox),(42,Teparla),(39,Matchbox),(36,Ttt),(22,Cime)]
TRS category
completeness:
[(100,Aprove),(88,Ttt),(70,Tpa),(60,Teparla),(53,Cime),(28,Matchbox)]
uniqueness:
[(100,Aprove),(79,Ttt),(52,Tpa),(43,Teparla),(39,Cime),(18,Matchbox)]
In both cases, the rankings for completeness (number of problems solved)
and uniqueness (most money collected from "problem purses") coincide.
If we're being subtle, we could find some meaning in the differences.
E.g. Jambox is "quite unique" (among the non-winners, it is the
only tool whose uniqueness is about equal to its completeness) etc.
To me this indicates that we could easily switch
to the "divided purse" idea (uniqueness ranking).
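
Both observations (identical rankings, and Jambox losing least among the
non-winners) can be checked mechanically from the percentages listed
above. The helper names are only illustrative.

```python
srs_completeness = [(100, "Torpa"), (90, "Aprove"), (80, "Jambox"),
                    (58, "Teparla"), (52, "Matchbox"), (51, "Ttt"),
                    (29, "Cime")]
srs_uniqueness = [(100, "Torpa"), (83, "Aprove"), (78, "Jambox"),
                  (42, "Teparla"), (39, "Matchbox"), (36, "Ttt"),
                  (22, "Cime")]
trs_completeness = [(100, "Aprove"), (88, "Ttt"), (70, "Tpa"),
                    (60, "Teparla"), (53, "Cime"), (28, "Matchbox")]
trs_uniqueness = [(100, "Aprove"), (79, "Ttt"), (52, "Tpa"),
                  (43, "Teparla"), (39, "Cime"), (18, "Matchbox")]


def ranking(scores):
    """Tool names ordered from highest to lowest percentage."""
    return [tool for _, tool in sorted(scores, reverse=True)]


def drop(completeness, uniqueness):
    """Per tool: completeness percentage minus uniqueness percentage."""
    uniq = {tool: pct for pct, tool in uniqueness}
    return {tool: pct - uniq[tool] for pct, tool in completeness}


print(ranking(srs_completeness) == ranking(srs_uniqueness))
print(ranking(trs_completeness) == ranking(trs_uniqueness))
print(drop(srs_completeness, srs_uniqueness))
print(drop(trs_completeness, trs_uniqueness))
```

For SRS, Jambox drops by only 2 points (80 to 78), while every other
non-winner drops by at least 7.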
Best regards,
-- 
-- Johannes Waldmann -- Tel/Fax (0341) 3076 6479/80 --
---- http://www.imn.htwk-leipzig.de/~waldmann/ -------