Lifting the Burden of Static Analysis Tool Configuration with Rule Graphs (ECOOP 2019 - Posters)

Who

Lisa Nguyen Quang Do, Eric Bodden

Track

ECOOP 2019 Posters

Time Zone

Times are shown in (GMT+01:00) Belfast, the conference time zone.

When

Wed 17 Jul 2019 18:00 - 19:30 at Mancy - Poster session

Abstract

Static program analysis is known to yield many false positives, for example due to over-approximations (e.g., if a single array cell contains potentially malicious data, the entire array is considered dangerous). As analyses grow more complex, their sets of internal rules become more intricate. To ensure that analysis tools perform well, dedicated developer teams typically configure them before they are deployed in a company. These teams set up how analysis results are displayed (e.g., how to group warnings together, or which warnings are more important than others), and edit the analysis rules to customize them for their codebases. With this poster, we explore how to assist developers in configuring a static analysis, in particular in the tasks of (1) understanding and (2) classifying warnings, and of finding (3) weak or (4) missing analysis rules. We argue that explainability is a core notion for these tasks: an analysis interprets the source code and builds its own understanding of how it works. Sometimes, this understanding may not match the developer's, which results in uncertainty, mishandling of critical warnings, wrong tool configurations, or even tool abandonment. Traditional analysis tools typically improve warning explainability by post-processing warnings using information that is external to the analysis rules, such as the warning type (e.g., SQL injection) or its location in the code. To help developers understand the analysis' reasoning, we instead propose to use internal information: how the internal rules of an analysis handle the analyzed code. Focusing on data-flow analysis, one of the most complex types of static analysis used in practice, we introduce the concept of rule graphs, which characterize analysis warnings and expose information about the internal rules of data-flow analyses. We use rule graphs to support our four configuration tasks.
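To make the array over-approximation mentioned above concrete, here is a minimal toy sketch in Python. It is not the authors' analysis: the class `ArrayAbstraction` and the array name `args` are hypothetical, and the model simply assumes an index-insensitive taint analysis that treats a whole array as one abstract location.

```python
# Hypothetical toy model, NOT the poster's analysis: an index-insensitive
# taint analysis that abstracts an entire array as a single memory location.


class ArrayAbstraction:
    """Tracks taint per array, ignoring indices (an over-approximation)."""

    def __init__(self) -> None:
        self.tainted_arrays: set[str] = set()

    def store(self, array: str, index: int, value_tainted: bool) -> None:
        # Writing tainted data into ANY cell marks the WHOLE array tainted;
        # the index is ignored on purpose.
        if value_tainted:
            self.tainted_arrays.add(array)

    def load(self, array: str, index: int) -> bool:
        # Reading ANY cell of a tainted array is reported as tainted,
        # regardless of which cell actually received the tainted value.
        return array in self.tainted_arrays


def analyze() -> list[str]:
    """Simulates: args[0] = userInput(); args[1] = "constant"; sink(args[1])."""
    state = ArrayAbstraction()
    warnings = []
    state.store("args", 0, value_tainted=True)   # args[0] = userInput()
    state.store("args", 1, value_tainted=False)  # args[1] = "constant"
    if state.load("args", 1):                    # sink(args[1])
        warnings.append("tainted data may reach sink via args[1]")
    return warnings


print(analyze())
```

Calling `analyze()` reports one warning even though `args[1]` only ever holds a constant: a false positive caused purely by the array abstraction, i.e. the kind of weak rule the poster aims to expose.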
In a user study with 22 participants using our IntelliJ plugin Mudarri, we observe that rule graphs can significantly improve the understandability of analysis warnings. A complementary empirical evaluation on 986 Android applications shows that, in combination with machine learning, rule graphs can be used to classify analysis warnings (e.g., we are able to differentiate true from false positives in Android with a precision of 0.712 and a recall of 0.733) and to discover weak analysis rules such as array over-approximations. The empirical evaluation also shows that similarities between rule graphs can help developers discover missing analysis rules that can cause false positives.
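The idea of comparing rule graphs can be illustrated with a small sketch. The poster does not specify its rule-graph format or learning setup here, so everything below is an assumption: a rule graph is modeled as a set of edges between hypothetical rule names (`Source`, `Assign`, `ArrayStore`, and so on), similarity is plain Jaccard similarity over edges, and a 1-nearest-neighbor lookup stands in for the machine-learning classifier.

```python
from dataclasses import dataclass

# Hypothetical representation: the poster's actual rule-graph format is not
# given here. Each edge (rule_a, rule_b) records that analysis rule rule_a
# passed a data-flow fact on to rule_b along the path behind a warning.


@dataclass(frozen=True)
class RuleGraph:
    edges: frozenset


def jaccard(a: RuleGraph, b: RuleGraph) -> float:
    """Jaccard similarity of the two edge sets (1.0 for two empty graphs)."""
    if not a.edges and not b.edges:
        return 1.0
    return len(a.edges & b.edges) / len(a.edges | b.edges)


def most_similar(query: RuleGraph, labeled: list) -> tuple:
    """1-nearest-neighbor stand-in for a classifier: returns the
    (RuleGraph, label) pair whose graph is most similar to `query`."""
    return max(labeled, key=lambda pair: jaccard(query, pair[0]))


# Hypothetical, previously classified warnings.
known_fp = RuleGraph(frozenset({("Source", "ArrayStore"),
                                ("ArrayStore", "ArrayLoad"),
                                ("ArrayLoad", "Sink")}))
known_tp = RuleGraph(frozenset({("Source", "Assign"),
                                ("Assign", "Call"),
                                ("Call", "Sink")}))
labeled = [(known_fp, "false positive"), (known_tp, "true positive")]

# A new warning whose path goes through the array rules.
new_warning = RuleGraph(frozenset({("Source", "Assign"),
                                   ("Assign", "ArrayStore"),
                                   ("ArrayStore", "ArrayLoad"),
                                   ("ArrayLoad", "Sink")}))

graph, label = most_similar(new_warning, labeled)
print(label, jaccard(new_warning, known_fp), jaccard(new_warning, known_tp))
```

Here the new warning shares the `ArrayStore` to `ArrayLoad` to `Sink` edges with the known false positive (similarity 2/5) but only one edge with the true positive (1/6), so it is grouped with the false positive. Edge-set overlap is only one possible similarity measure; a real implementation might weight rules or compare graph structure more precisely.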

File attachments

Poster and abstract (poster.pdf)	1.61MiB

Lisa Nguyen Quang Do

Paderborn University

Germany

Eric Bodden

Heinz Nixdorf Institut, Paderborn University and Fraunhofer IEM