Hermes: Towards Representative Benchmarks (BenchWork 2019, 2nd edition)

Mon 15 - Fri 19 July 2019 Hammersmith, London, United Kingdom

Track

BenchWork 2019

When

Tue 16 Jul 2019 14:30 - 15:00 at Bouzy - Benchmark Creation

Abstract

A comprehensive evaluation is an integral part of modern software engineering research and is done either using established benchmarks (e.g., DaCapo, the Qualitas corpus, or the XCorpus) or using ad-hoc benchmarks. In a few cases, a specifically created test suite is used for evaluation purposes. In all cases, the representativeness with respect to answering the research questions is questionable. The established corpora mentioned above contain a large share of outdated software and, as recent studies have shown, that code is structurally very different from modern Java code as found on, e.g., Maven Central. A second issue of (at least) the established benchmarks is that their usage scenarios are only defined at a very high level of abstraction (e.g., “general software engineering research”), which makes their use in specific contexts questionable. Tailored benchmarks or custom test suites are often created without any substantial argument regarding their representativeness, which makes evaluations built on top of them even more questionable. The naive approach to solving these problems, namely taking as many projects or collecting as much code as possible, simply doesn’t scale: analyzing an extremely large code base, such as all non-trivial Java projects found on GitHub, is prohibitively expensive even for simple analyses. This immediately leads to the question of how to build (reasonably) representative benchmarks. In this talk, we will discuss the representativeness of benchmarks before presenting Hermes, a tool that is a first step towards the creation of representative and minimal benchmarks.

Opal Hermes - towards representative benchmarks

Session Program

Tue 16 Jul

13:30 - 15:00	Benchmark Creation (BenchWork) at Bouzy

13:30 30m Talk		A Central and Evolving Benchmark. Abhishek Tiwari (University of Potsdam), Christian Hammer (University of Potsdam)
14:00 30m Talk		Creating and Managing Benchmark Suites with ABM. Lisa Nguyen Quang Do (Paderborn University)
14:30 30m Talk		Hermes: Towards Representative Benchmarks. Michael Eichberg (TU Darmstadt, Germany)

Hermes: Towards Representative Benchmarks

Michael Eichberg

TU Darmstadt, Germany

Tracks

Workshops
