We are committed to reproducible research and to making our tools freely available. Since our tools, techniques and benchmark suites evolve and improve over time, there is no single all-inclusive package. Instead, for major papers, we try to make available a snapshot of the programs and benchmarks used in that paper, so that others can reproduce or extend those experiments.
There are several sources of code used in our publications:
ManyBugs and IntroClass
The RepairBenchmarks website contains
detailed information on the ManyBugs and IntroClass benchmarks (described
in TSE 2015), including baseline
experimental results for GenProg, AE, and TRPAutoRepair.
105 GenProg ICSE 2012 Program Bugs
These scenarios and results were used for the systematic study on program repair published in ICSE 2012
(Paper), and the study of representation and
operator choices in GECCO 2012 (Paper).
Note: these benchmarks are deprecated. We include these results for completeness,
but we discourage their use in future work. Instead, the TSE 2015 benchmarks release
(above) includes important corrections.
Virtual Machine Images
Buggy Programs
TSE 2012 Bugs
These programs were used in the experiments in the TSE 2012
paper; they are a superset of the programs and bugs used
in ICSE 2009 and GECCO 2009. The virtual machine image demonstrates the wu-ftpd
repair described in the TSE 2012 article. Instructions assume GenProg v1.0.
Virtual Machine Images
VM Instructions
Buggy Programs
Workloads
GECCO 2010
In GECCO 2010, we investigated alternative fitness functions for test-guided
automated program repair (APR). Instructions assume GenProg v1.0.
Buggy Programs
2009 Buggy Programs
These experiments cover the GenProg publications in both ICSE 2009 and GECCO 2009.
Instructions assume GenProg v1.0.
Buggy Programs
SIR
The ASPLOS 2013 paper includes results on the Software-artifact Infrastructure
Repository.
PARSEC
The ASPLOS 2014 paper makes use of the PARSEC benchmark.
ASE 2013 (Paper)
These experiments relate to the Adaptive Equality repair algorithm, which uses an
approximation to program equivalence to reduce the search and introduces on-line
learning strategies to order test cases and repairs.
Code
Experimental Results
ICSE 2012 (Paper)
A systematic study of program repair. These experiments were conducted on AWS, using images
that we have converted to VirtualBox format. The READMEs also point to a
publicly-available AMI. Please use ManyBugs for all
future experiments.
Code
Virtual Machine Images
Experimental Results
TSE 2012 (Paper)
These experiments used GenProg 1.0. The virtual machine image demonstrates the wu-ftpd
repair described in that article.
Virtual Machine Images
VM Instructions
Results
ICSE/GECCO 2009
These results cover the GenProg publications in both ICSE 2009 and GECCO 2009.
Experimental Results
GECCO 2012
These include repair results for various genetic algorithm parameter values.
Experimental Results
GECCO 2010
In GECCO 2010, we investigated alternative fitness functions for test-guided
automated program repair (APR).
Experimental Results
ISSTA 2012
This dataset includes the subject code and questions presented to humans,
as well as the human responses.
Dataset
GPEM 2013
These results relate to neutral mutants and software mutational
robustness. Experimental results for higher order neutral mutants are
also available.
Benchmarks
Sorting Programs Experimental Results
Pacific Graphics 2015
These experiments use GenProg-like approaches to automatically generate band-limited procedural shaders.
Dataset
ASPLOS 2014
These experiments use GenProg-like approaches to reduce the power consumption of
software.
Experimental Results
Code
ASPLOS 2013
These experiments relate to the automated repair of assembly and binaries in
embedded systems.
Benchmarks
Experimental Results