State-of-the-art Android taint-analysis tools are frequently upgraded, and novel tools steadily become available. Benchmarks are the first choice for evaluating such upgraded or novel tools. However, as mature as some of these tools are, even the most widely used benchmarks remain immature: most lack a precisely defined ground truth, are generally outdated, and have never been updated since their on-demand creation.
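To make the notion of a precisely defined ground truth concrete, the following is a minimal sketch of the kind of micro-benchmark case such benchmarks contain, annotated with its expected result. All names (`getDeviceId`, `leak`, `TaintCase`) are illustrative assumptions, not taken from any particular benchmark suite.

```java
// Hypothetical micro-benchmark case for a taint-analysis tool.
// The ground truth states which source-to-sink flows must be reported.
public class TaintCase {

    // Taint source: returns sensitive data (stands in for, e.g., a device ID).
    static String getDeviceId() {
        return "secret-device-id";
    }

    // Taint sink: any tainted data reaching this method counts as a leak.
    static String leak(String data) {
        return "leaked: " + data;
    }

    public static void main(String[] args) {
        String id = getDeviceId();     // tainted value created at the source
        String out = leak(id);         // ground truth: flow getDeviceId -> leak EXISTS
        System.out.println(out);

        String constant = "no secret"; // untainted value
        System.out.println(constant);  // ground truth: no flow reaches this statement
    }
}
```

A tool evaluated against this case is expected to report exactly one flow (from `getDeviceId` to `leak`); reporting a flow into the second `println` would be a false positive.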
To overcome these drawbacks, we developed ReproDroid, a framework to create, refine, and automatically execute reproducible benchmarks for Android app analysis tools. With ReproDroid we were able to automatically compare analysis tools, thereby identifying false promises made in benchmark evaluations and speeding up benchmarking processes.
On this basis, benchmarks could become something more, namely challenges. Other communities already use challenges to compare existing and novel analysis tools competitively and regularly. Thus, it is time to also boost the development of Android taint-analysis benchmarks and their associated tools by launching competitive challenges.