[Termtools] termination competition bylaws

"Jürgen Giesl (giesl)" giesl at informatik.rwth-aachen.de
Tue Aug 17 20:56:06 CEST 2010


Dear all,

thanks to the Innbruco group for pushing this discussion. Indeed, this is a question
that comes up at every competition and thus, it would be good to decide
on some uniform treatment.

> [First unsoundness, etc. must be defined, not exclusive]
> 
> (u) A tool is unsound if one of its proofs contains a wrong deduction step.
> (v) A tool is unsound if it gives a wrong answer (the answer is the
>      first line of the output).
> (w) Unsoundness of a tool is determined per category.

I'm in favor of (u) and (w) (so (u) should already be enough to
call a tool unsound).

> 
> [We list the following statements as proposals, they are not exclusive]
> 
> (a) If a tool is unsound then it is removed from the corresponding category.
> (b) If a tool is unsound then it is marked as buggy in the
>      corresponding category.
> (c) If a tool is unsound then a fixed version of it can be submitted and
>      is rerun for the corresponding category. It is only allowed to fix the
>      bug in the tool while no new features may be implemented.
>      (c1) The results for the unsound tool stay in the table.
>      (c2) The results for the unsound tool are removed from the table.
> (d) If a tool is unsound, for each wrong answer a penalty is enforced,
>      e.g., -8 points penalty for every wrong answer.
>      (d1) The penalty is the same for all categories, although the scoring
>           in the categories may be different.
>      (d2) Each category selects its own penalty.
>      (d3) There is a maximal number of wrong answers allowed (for example 3)
>           and if a tool gives more than 3 wrong answers it is disqualified.
> (e) The tool's authors can select one of (a)-(d).

I'm in favor of (b) and (c1). In this way, all information on the bug is
maintained in the table, but at the same time, the tool authors have the chance
to show what their corrected tool can do. I think having the bug made public is
enough penalty (so I would not be in favor of (d), which would also be hard to
check/implement, as pointed out by the Innsbruck team). I also share the opinion
of the Innbruck team that (e) is not a good option, since this would not result
in a uniform treatment.

Best Regards
Juergen





More information about the Termtools mailing list