
Experiments and data sets used for GenProg publications

We are committed to reproducible research and to making our tools freely available. Since our tools, techniques and benchmark suites evolve and improve over time, there is no single all-inclusive package. Instead, for major papers, we try to make available a snapshot of the programs and benchmarks used in that paper, so that others can reproduce or extend those experiments.


There are several sources of code used in our publications:

  • GenProg Source Code (for C). We tag releases for experiments in various papers, described on the releases page.
  • Software Evolution Library. This common interface abstracts over multiple types of software objects including abstract syntax trees parsed from source code, LLVM IR, compiled assembler, and linked ELF binaries. Mutation and evaluation methods are implemented on top of this interface. This code was used for experiments appearing in ASPLOS 2014.
  • Genetic Optimization Algorithm. This GOA implementation accepts an assembly program with a workload and a fitness function (e.g., power consumption) and optimizes that fitness function. This source code was used for experiments appearing in ASPLOS 2014.
  • The experiments in ICSE 2009, GECCO 2009, GECCO 2010 and TSE 2012 used an older version of GenProg. A snapshot of that codebase is available as GenProg v1.0.
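
To make the two ideas in the list above concrete (mutation operators written against a common interface, and a search that optimizes a user-supplied fitness function over programs that still pass a workload), here is a minimal Python sketch. It is an illustration only: the names `SoftwareObject`, `ListObject`, `mutate`, and `optimize` are hypothetical and are not the API of the Software Evolution Library or of the GOA implementation, a list of instruction strings stands in for a real program representation, program size stands in for a measured property such as power consumption, and a simple hill climber stands in for the actual search.

```python
import random
from abc import ABC, abstractmethod
from typing import Callable, List


class SoftwareObject(ABC):
    """Hypothetical common interface over program representations."""

    @abstractmethod
    def size(self) -> int: ...

    @abstractmethod
    def delete(self, i: int) -> "SoftwareObject": ...

    @abstractmethod
    def swap(self, i: int, j: int) -> "SoftwareObject": ...


class ListObject(SoftwareObject):
    """Toy representation: a program is a list of instruction strings."""

    def __init__(self, instrs: List[str]):
        self.instrs = list(instrs)

    def size(self) -> int:
        return len(self.instrs)

    def delete(self, i: int) -> "ListObject":
        return ListObject(self.instrs[:i] + self.instrs[i + 1:])

    def swap(self, i: int, j: int) -> "ListObject":
        new = list(self.instrs)
        new[i], new[j] = new[j], new[i]
        return ListObject(new)


def mutate(obj: SoftwareObject, rng: random.Random) -> SoftwareObject:
    """Apply one random mutation using only the interface methods."""
    if obj.size() < 2:
        return obj
    if rng.random() < 0.5:
        return obj.delete(rng.randrange(obj.size()))
    i, j = rng.sample(range(obj.size()), 2)
    return obj.swap(i, j)


def optimize(obj: SoftwareObject,
             passes_workload: Callable[[SoftwareObject], bool],
             fitness: Callable[[SoftwareObject], float],
             steps: int = 200,
             seed: int = 0) -> SoftwareObject:
    """Hill-climb: keep a mutant only if it still passes the workload
    and its fitness (lower is better) does not get worse."""
    rng = random.Random(seed)
    best = obj
    for _ in range(steps):
        cand = mutate(best, rng)
        if passes_workload(cand) and fitness(cand) <= fitness(best):
            best = cand
    return best


def passes_workload(o: ListObject) -> bool:
    # Toy "workload": the essential instructions must survive, in order.
    return [x for x in o.instrs if x != "nop"] == ["load", "add", "ret"]


prog = ListObject(["load", "nop", "add", "nop", "nop", "ret"])
out = optimize(prog, passes_workload, fitness=lambda o: o.size())
```

Because `mutate` touches the program only through `SoftwareObject`, the same search loop would work unchanged for any other representation that implements the three interface methods.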

Subject Programs

ManyBugs and IntroClass
The RepairBenchmarks website documents the ManyBugs and IntroClass benchmarks, which are described in detail in TSE 2015, and includes the baseline experimental results for GenProg, AE, and TRPAutoRepair.

105 GenProg ICSE 2012 Program Bugs
These scenarios and results were used for the systematic study of program repair published in ICSE 2012 (Paper), and for the study of representation and operator choices in GECCO 2012 (Paper). Note: these benchmarks are deprecated. We include them for completeness, but we discourage their use in future work. Use the TSE 2015 benchmarks release (above) instead, which includes important corrections.
Virtual Machine Images Buggy Programs

TSE 2012 Bugs
These programs were used in the experiments in the TSE 2012 paper; they are a superset of the programs/bugs used in ICSE 2009 and GECCO 2009. The virtual machine image demonstrates the wu-ftpd repair described in that article. Instructions assume GenProg v1.0.
Virtual Machine Images VM Instructions Buggy Programs Workloads

GECCO 2010
In GECCO 2010, we investigated alternative fitness functions for test-guided automated program repair (APR). Instructions assume GenProg v1.0.
Buggy Programs

2009 Buggy Programs
These experiments cover the GenProg publications in both ICSE 2009 and GECCO 2009. Instructions assume GenProg v1.0.
Buggy Programs

The ASPLOS 2013 paper includes results on programs from the Software-artifact Infrastructure Repository.

The ASPLOS 2014 paper uses the PARSEC benchmark suite.

Experimental results

Automatic program repair

ASE 2013 (Paper)
These experiments relate to the Adaptive Equality (AE) repair algorithm, which uses an approximation to program equivalence to reduce the search space and introduces on-line learning strategies to order test cases and candidate repairs.
Code Experimental Results
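
The two ideas named above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the AE implementation: candidates are hypothetical `(op, a, b)` tuples, `canonical` is a stand-in for the approximation to program equivalence (it only normalizes operand order for commutative operators), and "run the test that has failed most often first" is one simple on-line ordering strategy chosen for illustration, not necessarily the one used in the paper.

```python
import operator
from collections import Counter

OPS = {"+": operator.add, "*": operator.mul}


def run(cand, x):
    """Evaluate a toy candidate (op, a, b), where "x" denotes the input."""
    op, a, b = cand
    value = lambda t: x if t == "x" else t
    return OPS[op](value(a), value(b))


def canonical(cand):
    """Stand-in equivalence check: commutative operands are sorted, so
    ("+", "x", 1) and ("+", 1, "x") map to the same key."""
    op, a, b = cand
    return (op,) + tuple(sorted((a, b), key=str))


def search(candidates, tests):
    """Return (first candidate passing all tests, number of test runs)."""
    fail_counts = Counter()   # learned on-line: how often each test failed
    seen = set()              # canonical forms already evaluated
    evaluations = 0
    for cand in candidates:
        key = canonical(cand)
        if key in seen:
            continue          # treated as equivalent to an earlier candidate
        seen.add(key)
        # Run the tests that have rejected the most candidates first.
        order = sorted(tests, key=lambda name: -fail_counts[name])
        passed_all = True
        for name in order:
            evaluations += 1
            if not tests[name](cand):
                fail_counts[name] += 1
                passed_all = False
                break         # stop at the first failing test
        if passed_all:
            return cand, evaluations
    return None, evaluations


# Hypothetical repair target: an expression equal to 2 * x.
tests = {
    "t_zero": lambda c: run(c, 0) == 0,
    "t_three": lambda c: run(c, 3) == 6,
}
candidates = [("+", "x", 1), ("+", 1, "x"), ("*", "x", 3),
              ("*", 3, "x"), ("*", 2, "x")]
repair, evaluations = search(candidates, tests)
```

In this toy run, skipping the two candidates whose canonical form was already seen avoids their test runs entirely, which is the sense in which an equivalence approximation shrinks the search.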

ICSE 2012 (Paper)
A systematic study of program repair. These experiments were conducted on AWS, using images that we have converted to VirtualBox format. The READMEs also point to a publicly available AMI. Please use ManyBugs for all future experiments.
Code Virtual Machine Images Experimental Results

TSE 2012 (Paper)
These experiments used GenProg 1.0. The virtual machine image demonstrates the wu-ftpd repair described in that article.
Virtual Machine Images VM Instructions Results

These results cover the GenProg publications in both ICSE 2009 and GECCO 2009.
Experimental Results

Search specifics

GECCO 2012
These include repair results for various genetic algorithm parameter values.
Experimental Results

GECCO 2010
In GECCO 2010, we investigated alternative fitness functions for test-guided automated program repair (APR).
Experimental Results

Patch quality, software robustness

ISSTA 2012
This dataset includes the subject code and questions presented to humans, as well as the human responses.

GPEM 2013
These results relate to neutral mutants and software mutational robustness. Experimental results for higher order neutral mutants are also available.
Benchmarks Sorting Programs Experimental Results
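
As a small worked example of these terms, the Python sketch below treats a mutant as neutral when it still passes every test in the suite, and takes mutational robustness to be the fraction of mutants that are neutral. It is a toy under stated assumptions: the "program" is a hypothetical comparator network for sorting three values, and only single-deletion mutants are enumerated, rather than the random mutation operators applied to real programs in the experiments.

```python
from itertools import permutations


def run_network(network, values):
    """Apply each compare-and-swap (i, j) in order and return the result."""
    vals = list(values)
    for i, j in network:
        if vals[i] > vals[j]:
            vals[i], vals[j] = vals[j], vals[i]
    return vals


def passes_tests(network, tests):
    """A network passes if it sorts every test input."""
    return all(run_network(network, t) == sorted(t) for t in tests)


def single_deletion_mutants(network):
    """All variants obtained by deleting exactly one comparator."""
    return [network[:k] + network[k + 1:] for k in range(len(network))]


def mutational_robustness(network, tests):
    """Fraction of mutants that are neutral with respect to the tests."""
    mutants = single_deletion_mutants(network)
    neutral = [m for m in mutants if passes_tests(m, tests)]
    return len(neutral) / len(mutants)


# A 3-input sorting network with one redundant comparator.
network = [(0, 1), (1, 2), (0, 1), (1, 2)]
tests = list(permutations(range(3)))
robustness = mutational_robustness(network, tests)  # 0.5 for this network
```

Deleting the first or the last comparator leaves a network that still sorts all six test inputs, while deleting either middle comparator does not, so two of the four mutants are neutral.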

Non-functional or quality properties

Pacific Graphics 2015

These experiments use GenProg-like approaches to automatically generate band-limited procedural shaders.
Dataset

These experiments use GenProg-like approaches to reduce the power consumption of software.
Experimental Results Code

These experiments relate to the automated repair of assembly and binaries in embedded systems.
Benchmarks Experimental Results