[Termtools] competition results - some comments

Thu Jan 21 08:46:02 CET 2010

Hello!

I made a few calculations about the results on TRS Standard Certifying 
(I don't think that the rerun of CeTA on the results of AProVE-CeTA will 
change many things).

				Certifiers
Prover		Proofs	CeTA	Rainbow	Cime3
TTT2Cert	263	263	53	2	
AProve-CeTA	259	256	67	13	
AProVE-A3PAT	165	96	96	105	
AProVE-COLOR	70	62	61	20	
Cime3finder	9	9	9	9
TOTAL			686	286	149

Remarks:

1) I am suprised by the score of AProVE-COLOR. It does not reflect the 
experiments I did before the competition. And, indeed, with the version 
I had of AProVE before the competition, I get on my computer 119 proofs 
(instead of 70 only!) among the 263 problems solved by TTT2Cert. So, I 
wonder what's going wrong with AProVE-COLOR.

2) As a consequence, Rainbow-coq gets only 70 proofs over a total of 
757, ie. 9.2%, generated by a tool with which it could be tested before 
the competition. And the other tools use techniques not supported by 
Rainbow: usable rules and extended polynomial interpretations. It 
naturally follows that the results of Rainbow-coq are not very good in 
absolute value. Moreover, I don't think that counting the number of 
proofs certified is very sensible because a given proof may be counted 
several times. We should count the number of DISTINCT proofs certified 
instead. Otherwise, one could artificially increase its score by 
duplicating its tools.

3) On the 70 proofs generated by AProVE-COLOR, Rainbow has a score 
comparable to CeTA (61 and 62 respectively). Similarly, on the 165 
proofs generated for Cime3, all certifiers perform equivalently (Cime3: 
105, CeTA and Rainbow: 96). Here, Cime3 gets no big advantage. This is 
not the case for CeTA which can certify (almost) all proofs generated 
for it. On these proofs, Cime3 performs very badly (2 and 13) and 
Rainbow a little bit better (53 and 67).

4) Conclusion: CeTA is clearly the best certifier, mainly because it 
supports usable rules I think, a technique that is supported by it only. 
But we can see that the results of the certifiers heavily depend on the 
tool that has generated the proof, putting the results quit in perspective.

5) Although Rainbow supports (non-)termination proofs for relative and 
modulo AC TRSs, this does not appear here since these systems have been 
excluded from the competition. :-(

6) About the speed: Without suprise, CeTA is faster than Cime3 and 
Rainbow since CeTA is a compiled program while the others generate 
scripts that are in turn interpreted and checked in Coq. Hence, CeTA is 
about 50 times faster than Rainbow which, more interestingly, is about 
twice faster than Cime3. The average time for Rainbow is however quite 
reasonable I think: 5s. Note also that Cime3 has 28 timeouts while 
Rainbow and CeTA have no timeout. Finally, I would like to mention that 
a compiled version of Rainbow is in preparation (actually, there is 
already one but it handles polynomial interpretations only).

Frederic.