Reproducible experiments are an important pillar of well-founded research. Having benchmarks that are publicly available and representative of real-world applications is an important step towards that goal; it allows us to measure the results of a tool in terms of its precision, recall, and overall accuracy. Having such a benchmark is different from having a corpus of programs: a benchmark needs labelled data that can serve as ground truth when measuring precision and recall.
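
For illustration only (a minimal sketch, not part of the workshop material): once a benchmark ships labelled ground truth, a tool's output can be scored for precision and recall directly. The finding identifiers below are hypothetical.

    # Hypothetical example: score a tool's reported findings against a
    # benchmark's labelled ground truth.
    ground_truth = {"bug-1", "bug-2", "bug-3", "bug-4"}  # labelled true issues
    reported = {"bug-1", "bug-2", "bug-5"}               # findings reported by the tool

    true_positives = len(reported & ground_truth)
    false_positives = len(reported - ground_truth)
    false_negatives = len(ground_truth - reported)

    precision = true_positives / (true_positives + false_positives)  # 2/3
    recall = true_positives / (true_positives + false_negatives)     # 2/4

    print(f"precision = {precision:.2f}, recall = {recall:.2f}")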

With Artifact Evaluation Committees now established at most PL/SE conferences, reproducibility studies are making their way into the calls for papers of top venues such as ECOOP and ISSTA. In some domains there are established benchmarks used by the community; in other domains, however, the lack of a benchmark prevents researchers from measuring the true value of a newly developed technique.

This workshop aims to provide a platform for researchers and practitioners to share their experiences and thoughts, drawing on key lessons from the PL and SE communities, in order to improve the benchmarks that are already available, to start or continue discussions on developing new benchmarks, and to reflect on the role benchmarks play in research and industry.

Invited Speakers

  • A Benchmark for Understanding Data Science Software (Hridesh Rajan, Iowa State University)
  • A Central and Evolving Benchmark (Abhishek Tiwari and Christian Hammer, University of Potsdam)
  • Android Taint-Analysis Benchmarks: Past, Present and Future (Felix Pauck, Paderborn University)
  • A Renaissance for Optimizing Compilers
  • Creating and Managing Benchmark Suites with ABM (Lisa Nguyen Quang Do, Paderborn University)
  • Dependability Benchmarking by Injecting Software Bugs (Roberto Natella, Federico II University of Naples)
  • Hermes: Towards Representative Benchmarks (Michael Eichberg, TU Darmstadt)

Call for Talks

We welcome contributions in the form of talk abstracts on (but not limited to) the following topics:

  • Experiences with benchmarking in the area of program analysis (e.g., finding bugs, measuring points-to sets)
  • Experiences with benchmarking in the area of software engineering (e.g., clone detection, testing techniques)
  • Infrastructure for supporting a benchmark over time, across different versions of the relevant programs
  • Metrics that are valuable in the context of incomplete programs
  • Support for dynamic analysis, where the benchmark programs need to be run
  • Automating the creation of benchmarks
  • What types of programs should be included in program-analysis benchmarks?

We would also like to discuss questions such as:

  • What type of analysis do you perform?
  • What build systems does your tool support?
  • What program-analysis benchmarks do you typically use? What are their pros and cons?
  • What are the useful metrics to consider when creating program-analysis benchmarks?
  • How can we handle incomplete code in benchmarks?
  • How can program-analysis benchmarks provide good support for dynamic analyses?
  • How can we automate the creation of program-analysis benchmarks?

Program

Tue 16 Jul

Displayed time zone: Belfast

10:45 - 12:15  Benchmark Suites (BenchWork) at Bouzy

  10:45  15m  Day opening: A Word From the Chairs
              Kim Herzig (Tools for Software Engineers, Microsoft), Ben Hermann (Paderborn University)
  11:00  30m  Talk: Dependability Benchmarking by Injecting Software Bugs
              Roberto Natella (Federico II University of Naples)
  11:30  30m  Talk: A Renaissance for Optimizing Compilers

13:30 - 15:00  Benchmark Creation (BenchWork) at Bouzy

  13:30  30m  Talk: A Central and Evolving Benchmark
              Abhishek Tiwari (University of Potsdam), Christian Hammer (University of Potsdam)
  14:00  30m  Talk: Creating and Managing Benchmark Suites with ABM
              Lisa Nguyen Quang Do (Paderborn University)
  14:30  30m  Talk: Hermes: Towards Representative Benchmarks
              Michael Eichberg (TU Darmstadt, Germany)

15:30 - 17:00  Specialized Benchmarks and Future (BenchWork) at Bouzy

  15:30  30m  Talk: A Benchmark for Understanding Data Science Software
              Hridesh Rajan (Iowa State University)
  16:00  30m  Talk: Android Taint-Analysis Benchmarks: Past, Present and Future
              Felix Pauck (Paderborn University, Germany)
  16:30  30m  Day closing: Discussion and Closing
              Kim Herzig (Tools for Software Engineers, Microsoft), Ben Hermann (Paderborn University)

17:30 - 19:30  Social Hour (Catering) at Socials