Lifting the Burden of Static Analysis Tool Configuration with Rule Graphs
Static program analysis is known to yield many false positives, for example because of over-approximations (e.g., if a single array cell contains potentially malicious data, the entire array is considered dangerous). As analyses grow more complex over time, their sets of internal rules become more intricate. To ensure that analysis tools perform well, dedicated developer teams typically configure them before they are deployed in a company. Such teams set up how analysis results are displayed (e.g., how to group warnings, or which warnings to rank as more important than others), and edit the analysis rules to customize them for their codebases. With this poster, we explore how to assist developers when configuring a static analysis, in particular in the tasks of (1) understanding and (2) classifying warnings, and of finding (3) weak or (4) missing analysis rules. We argue that explainability is a core notion to that end: an analysis interprets the source code and builds its own understanding of how the code works. Sometimes, this understanding does not match the developer’s, which results in uncertainty, incorrect handling of critical warnings, wrong tool configurations, or even tool abandonment. Traditional analysis tools typically improve the explainability of warnings by post-processing them using information that is external to the analysis rules, such as the warning type (e.g., SQL injection) or its location in the code. In an effort to help developers understand the analysis’ reasoning, we instead propose to make use of internal information: how the internal rules of an analysis handle the analyzed code. Focusing on data-flow analysis, one of the most complex types of static analysis used in practice, we introduce the concept of rule graphs, which characterize analysis warnings and expose information about the internal rules of data-flow analyses, and which we use to support our four configuration tasks.
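The array over-approximation mentioned above can be illustrated with a toy taint analysis. The sketch below is not the implementation of Mudarri or of any real analysis tool; the statement classes, the `analyze` function, and the `precise_arrays` flag are hypothetical names introduced only for illustration. It runs the same small program twice: once with a rule that taints the whole array when any cell is tainted, and once with a rule that tracks each cell separately.

```python
from dataclasses import dataclass

# Hypothetical toy statement forms (not taken from any real analysis tool).
@dataclass(frozen=True)
class Assign:          # target = source()  if tainted, else a constant
    target: str
    tainted: bool

@dataclass(frozen=True)
class ArrayStore:      # array[index] = value
    array: str
    index: int
    value: str

@dataclass(frozen=True)
class ArrayLoad:       # target = array[index]
    target: str
    array: str
    index: int

@dataclass(frozen=True)
class Sink:            # sink(arg): reaching this with tainted data is a warning
    arg: str


def analyze(program, precise_arrays):
    """Return the Sink statements reached by tainted data."""
    tainted = set()    # holds variable names, array names, or (array, index) cells
    warnings = []
    for stmt in program:
        if isinstance(stmt, Assign):
            if stmt.tainted:
                tainted.add(stmt.target)
            else:
                tainted.discard(stmt.target)
        elif isinstance(stmt, ArrayStore):
            # Coarse rule: one abstract location for the whole array.
            cell = (stmt.array, stmt.index) if precise_arrays else stmt.array
            if stmt.value in tainted:
                tainted.add(cell)
            elif precise_arrays:
                tainted.discard(cell)   # overwriting a single known cell is safe
        elif isinstance(stmt, ArrayLoad):
            cell = (stmt.array, stmt.index) if precise_arrays else stmt.array
            if cell in tainted:
                tainted.add(stmt.target)
            else:
                tainted.discard(stmt.target)
        elif isinstance(stmt, Sink):
            if stmt.arg in tainted:
                warnings.append(stmt)
    return warnings


program = [
    Assign("secret", tainted=True),     # secret = source()
    Assign("greeting", tainted=False),  # greeting = "hello"
    ArrayStore("data", 0, "secret"),    # data[0] = secret
    ArrayStore("data", 1, "greeting"),  # data[1] = greeting
    ArrayLoad("out", "data", 1),        # out = data[1]
    Sink("out"),                        # sink(out)
]

coarse = analyze(program, precise_arrays=False)
precise = analyze(program, precise_arrays=True)
print(len(coarse), len(precise))
```

Under the coarse rule, the sink is reported even though only the harmless `data[1]` reaches it, which is the kind of false positive the abstract describes. The per-cell rule avoids it here, but only because the indices in this toy program are constants.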
In a user study with 22 participants using our IntelliJ plugin Mudarri, we observe that rule graphs can significantly improve the understandability of an analysis warning. A complementary empirical evaluation on 986 Android applications shows that, in combination with machine learning, rule graphs can be used to classify analysis warnings (e.g., we are able to differentiate true from false positives in Android with a precision of 0.712 and a recall of 0.733) and to discover weak analysis rules such as array over-approximations. The empirical evaluation also shows that similarities between rule graphs can help developers discover missing analysis rules that can cause false positives.
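For readers less familiar with the metrics, the precision and recall values reported above are presumably the standard binary-classification measures. The abstract does not say which class is treated as positive, so the symbols below are generic: TP, FP, and FN denote the numbers of true positives, false positives, and false negatives of the classifier, not of the static analysis itself.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```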
Poster and abstract (poster.pdf)