Experimental Evaluation of "Better termination proving through cooperation"

Implementation

Details on the implementation, and the sources of the tool, can be found on the T2 website.

Tools

We compare the following tools:

Examples

During our evaluation we ran tools on a set of 449 termination proving benchmarks drawn from a variety of application (e.g. Windows device drivers, the Apache web server, the PostgreSQL server, integer approximations of numerical programs from a book on numerical recipes, integer approximations of benchmarks from LLBMC and other tool evaluations). The examples are available in T2s internal integer transition system format and in an equivalent translation to C. Note that these C examples are obtained from an automatic translation from transition systems and are hence not comparable to normal C programs. Tools relying on syntactic heuristics are hence at a disadvantage here.

We also have some coarse statistics about our examples:

Average number Mean number Max number
Locations 24 9 468
Cutpoints 3 2 120
Transitions 37 11 807

Detailed Results

In our experiments, we used a timeout of 300 seconds for each example. We ran Terminator, T2 and Cooperating-T2 on a system using a quadcore 2.26GHz E5520 with 8GB of RAM and running Windows 7. The remaining tests were performed on a quadcore 3.07GHz Core i7 system with 4GB of RAM and running Debian Linux 6.

An overview of our results:

Cooperating-T2 Terminator T2 ARMC AProVE AProVE+Interproc Size-Change/MCNP (AProVE) KITTeL
Number TERM 245 177 189 138 197 185 156 196
Avg. time TERM (s) 3.42 4.99 5.15 16.16 2.21 1.53 17.50 4.65
Number NONTERM 174 181 180 0 0 0 0 0
Avg. time NONTERM (s) 1.68 5.88 6.67 0 0 0 0 0
Avg. time total (s) 13.77 52.60 47.61 68.61 47.23 81.30 82.81 129.30

In the following tables, results of our evaluation are detailed. Examples with differing results (i.e., TERMINATING vs. TIMEOUT) are highlighted with a yellow background: