[Termtools] Benchmarks for TPDB

Tue Jun 9 09:51:05 CEST 2009

Dear Rene (and all),

> 1) The TPDB should be split into benchmarks (which may form some kind  
> of logical partition).
>    E.g., the pure logic programs might be separated from logic  
> programs with cuts and predefined predicates, or one might separate  
> the applicative TRSs from the other TRSs.
>    In this way, a tool does not have to implement that many features  
> to be competitive in certain benchmarks, making the outcome of the  
> competition more open and therefore,
>    interesting.

Concerning (1), I think one should distinguish between two cases.

I agree that problems concerning different languages should
be separated into different categories. For example, LPs with cuts
really are a different language than LPs without cuts. If an LP-tool
cannot handle LPs with cuts, then it really cannot run on programs
with cuts. So here, it might indeed make sense to separate these into
different categories.

But this should not be confused with the case where there are
problems from the same language, but from
different kinds of sources. For example, the TRS category contains
many TRSs, but they come from different sources and have different
characteristics. Examples are TRSs from transformations of CSR,
TRSs from imperative programs, applicative TRSs, randomly generated
TRSs, etc. If a tool can handle TRSs, then it can handle all of these
TRSs. Therefore, it does not make sense to split them up into different
categories. We would end up with many categories, each with only very
few participants, which would be very bad for publicity.

Nevertheless, it is still a good idea to identify the
different sources or characteristics for the many examples in a category
like TRSs. The reason is that this can be used for the selection
algorithm in (2). For the competition, my proposal would be to select a
fixed number n of TRSs from each of the subgroups of TRSs that have the
same characteristic. The choice of the n TRSs should then be performed
randomly.
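
To make the proposal concrete, here is a rough sketch of such a selection
in Python. It is only an illustration, not an agreed procedure: the
function name select_benchmarks, the (name, subgroup) input format, the
example problem names, and the rule of taking the whole subgroup when it
has fewer than n members are all my assumptions.

```python
import random
from collections import defaultdict


def select_benchmarks(problems, n, seed=None):
    """Pick up to n problems at random from each subgroup.

    problems: iterable of (name, subgroup) pairs (hypothetical format).
    n:        number of problems to draw per subgroup.
    seed:     optional seed, so that a selection can be reproduced.
    """
    rng = random.Random(seed)

    # Group the problem names by their source or characteristic.
    by_group = defaultdict(list)
    for name, group in problems:
        by_group[group].append(name)

    selection = {}
    for group in sorted(by_group):
        # Sort first so the result depends only on the seed,
        # not on the order in which the problems were listed.
        members = sorted(by_group[group])
        # Assumption: a subgroup smaller than n is taken completely.
        k = min(n, len(members))
        selection[group] = rng.sample(members, k)
    return selection


if __name__ == "__main__":
    # Made-up example names, for illustration only.
    problems = [
        ("app01", "applicative"), ("app02", "applicative"),
        ("app03", "applicative"), ("csr01", "from CSR"),
        ("rnd01", "random"), ("rnd02", "random"),
    ]
    for group, names in select_benchmarks(problems, n=2, seed=2009).items():
        print(group, names)
```

With a fixed n per subgroup, a source that contributes very many examples
does not dominate the selection, while the draw within each subgroup
remains random.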

Best Regards
Juergen