[Termtools] competition results - some comments
Frederic Blanqui
frederic.blanqui at inria.fr
Thu Jan 21 08:46:02 CET 2010
Hello!
I made a few calculations about the results on TRS Standard Certifying
(I don't think that the rerun of CeTA on the results of AProVE-CeTA will
change many things).
Certifiers
Prover Proofs CeTA Rainbow Cime3
TTT2Cert 263 263 53 2
AProve-CeTA 259 256 67 13
AProVE-A3PAT 165 96 96 105
AProVE-COLOR 70 62 61 20
Cime3finder 9 9 9 9
TOTAL 686 286 149
Remarks:
1) I am suprised by the score of AProVE-COLOR. It does not reflect the
experiments I did before the competition. And, indeed, with the version
I had of AProVE before the competition, I get on my computer 119 proofs
(instead of 70 only!) among the 263 problems solved by TTT2Cert. So, I
wonder what's going wrong with AProVE-COLOR.
2) As a consequence, Rainbow-coq gets only 70 proofs over a total of
757, ie. 9.2%, generated by a tool with which it could be tested before
the competition. And the other tools use techniques not supported by
Rainbow: usable rules and extended polynomial interpretations. It
naturally follows that the results of Rainbow-coq are not very good in
absolute value. Moreover, I don't think that counting the number of
proofs certified is very sensible because a given proof may be counted
several times. We should count the number of DISTINCT proofs certified
instead. Otherwise, one could artificially increase its score by
duplicating its tools.
3) On the 70 proofs generated by AProVE-COLOR, Rainbow has a score
comparable to CeTA (61 and 62 respectively). Similarly, on the 165
proofs generated for Cime3, all certifiers perform equivalently (Cime3:
105, CeTA and Rainbow: 96). Here, Cime3 gets no big advantage. This is
not the case for CeTA which can certify (almost) all proofs generated
for it. On these proofs, Cime3 performs very badly (2 and 13) and
Rainbow a little bit better (53 and 67).
4) Conclusion: CeTA is clearly the best certifier, mainly because it
supports usable rules I think, a technique that is supported by it only.
But we can see that the results of the certifiers heavily depend on the
tool that has generated the proof, putting the results quit in perspective.
5) Although Rainbow supports (non-)termination proofs for relative and
modulo AC TRSs, this does not appear here since these systems have been
excluded from the competition. :-(
6) About the speed: Without suprise, CeTA is faster than Cime3 and
Rainbow since CeTA is a compiled program while the others generate
scripts that are in turn interpreted and checked in Coq. Hence, CeTA is
about 50 times faster than Rainbow which, more interestingly, is about
twice faster than Cime3. The average time for Rainbow is however quite
reasonable I think: 5s. Note also that Cime3 has 28 timeouts while
Rainbow and CeTA have no timeout. Finally, I would like to mention that
a compiled version of Rainbow is in preparation (actually, there is
already one but it handles polynomial interpretations only).
Frederic.
More information about the Termtools
mailing list