Using Static and Runtime Analysis to Improve Developer Productivity and Product Quality

Bill Graham and Paul N. Leroux, QNX Software Systems
Todd Landry, Klocwork

April 2008

Static and runtime analysis QNX Software Systems

Abstract

Static analysis can discover a variety of defects and weaknesses in system source code, even before the code is ready to run. Runtime analysis, on the other hand, looks at running software to detect problems as they occur, usually through sophisticated instrumentation. Some may argue that one form of analysis precludes the other, but developers can combine both techniques to achieve faster development and testing as well as higher product quality.

The paper begins with static analysis, which prevents problems from entering the main code stream and ensures that any new code is up to standard. Using techniques such as abstract syntax tree (AST) validation and code path analysis, static analysis tools can uncover security vulnerabilities, logic errors, implementation defects, and other problems, both at the developer’s desktop and at system build time.

The paper then explores runtime analysis, which developers can perform during module development and system integration to catch any problems missed by static analysis. Runtime analysis not only detects pointer errors and other violations, but also helps optimize the use of CPU cycles, RAM, flash memory, and other resources.

Finally, the paper discusses how developers can combine static and runtime analysis to prevent regressions as a product matures. This two-pronged approach helps eliminate most problems early in the development cycle, when they cost the least to fix.

Combining the best of both worlds

Static analysis tools find bugs early in the coding phase of a project, usually before the executable code is built.
This early detection is particularly useful in large embedded projects, where developers cannot use runtime analysis tools until the software is complete enough to run on the target system.

Static analysis detects and describes areas of weakness in source code, including security vulnerabilities, logic errors, implementation defects, concurrency violations, rare boundary conditions, and many other problems. For instance, static analysis tools such as Klocwork Insight perform an in-depth analysis of the source code at the syntactic and semantic levels; they also perform sophisticated interprocedural control- and data-flow analysis and use advanced techniques to prune false paths, estimate the values that variables will assume, and simulate potential runtime behavior.

Developers can perform static analysis at any time during development, even when only portions of the project are coded; however, the more complete the code, the better the results. Static analysis can analyze all potential paths through the code, something conventional testing rarely does unless the project requires 100% code coverage. For instance, static analysis can uncover bugs hidden in “edge cases” or in error paths that were never exercised during development testing.

Because static analysis attempts to predict behavior based on models of the source code, it will sometimes report an “error” when, in fact, none exists; such a report is called a false positive. Many modern static analysis tools implement advanced techniques to avoid this problem and to perform highly accurate analyses.

Static analysis pros:
• Starts early in the software lifecycle, before the code is ready to run and before testing begins.
• Can analyze existing code bases that have already been tested.
• Can integrate into the development environment, as part of nightly builds and as part of each developer’s desktop toolset.
• Low labor costs: no need to generate test cases or stubs; developers can run their own analyses.

Static analysis cons:
• Can find bugs and vulnerabilities that don’t necessarily cause a crash or impact runtime behavior.
• Non-zero false positive rate.

Table 1: Static analysis pros and cons.

Runtime analysis tools detect bugs in running code. They allow the developer to monitor or diagnose an application’s behavior at runtime, ideally in the application’s target environment. In many cases, the runtime analysis tool modifies the source or binaries of the application to provide hooks for instrumentation; these hooks detect runtime bugs and track memory usage, code coverage, and other conditions. Runtime analysis tools can also generate accurate stack trace information that helps developers find the cause of an error. Because the tool observes the error as it actually occurs, a bug it reports is most likely a real error that the programmer can quickly identify and fix. That said, the bug is detected only if the exact runtime conditions that trigger it actually occur. Consequently, developers must create a test case for that particular scenario.

Runtime analysis pros:
• Generates few false positives, giving a high productivity rate for errors found.
• Can capture a full stack trace and the execution environment to track the cause of an error.
• Catches errors in the context of the running system, either simulated or real.

Runtime analysis cons:
• Instrumentation impairs realtime behavior; the degree of impact depends on the amount of instrumentation. This is not always an issue, but it needs to be considered for time-critical code.
• Completeness of error analysis depends on code coverage. Thus, the code path containing the error must be executed, and the test case must create the conditions required to trigger the error.

Table 2: Runtime analysis pros and cons.

Early detection for lower development costs

The earlier that bugs are found, the faster and cheaper it is to correct them.
Thus, static and runtime analysis tools offer real value by finding bugs early in the software development lifecycle. Various industry studies indicate that fixing an issue during system test (QA) or once the product has shipped is orders of magnitude more expensive than finding and fixing the same issue while the software is still being developed. Many organizations have specific cost-of-defect metrics; Figure 1 shows the numbers reported in a widely cited reference, Applied Software Measurement by Capers Jones.

Figure 1: As a development project progresses, the cost of fixing software defects can increase exponentially. Static and runtime analysis tools help prevent these costs by finding bugs early in the development lifecycle.

Static analysis

Static analysis has been around almost as long as modern software development practices. In its first form, it included tools such as lint, which developers ran on their desktops, within their local sandboxes. When it came to bug detection, these early tools focused on low-hanging fruit, such as coding style and common syntactic mistakes. For example, even the most basic static analysis tools can detect the following bug:

    int foo(int x, int* ptr)
    {
        if( x & 1 );    /* stray semicolon: the if statement has an empty body */
        {
            *ptr = x;
            return;
        }
        ...
    }

Here, the erroneous extra semicolon leads to potentially disastrous results: the if statement ends at the semicolon, so the block that follows is no longer governed by the test. Whether or not the tested condition is met, the incoming pointer is always dereferenced.

Because these early tools focused largely on syntactic mistakes, most of the problems they uncovered were relatively trivial, even though they could also find serious bugs. The tools also had too small a code context to produce accurate results, because they operated during a developer’s typical compile/link cycle, and the code on a developer’s desktop tends to represent only a small fraction of the code available within the entire code stream.
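To make the limited-context problem concrete, here is a minimal C sketch. The functions lookup(), set_entry(), and update() are hypothetical and do not come from the paper; imagine each one living in a different source file. A tool that sees all three can follow the NULL returned by lookup() into update() and confirm that the error path intercepts it before set_entry() ever runs.

```c
#include <stddef.h>

/* Hypothetical helper, imagined as living in a separate source file:
   returns a pointer to table[key], or NULL when key is out of range. */
static int *lookup(int *table, size_t len, size_t key)
{
    return key < len ? &table[key] : NULL;
}

/* Dereferences its argument unconditionally. Seen in isolation, a tool
   cannot tell whether any caller might pass NULL. */
static void set_entry(int *slot, int value)
{
    *slot = value;
}

/* Caller: only with lookup(), set_entry(), and this function all in view
   can an interprocedural analysis confirm that NULL never reaches
   set_entry(). */
static int update(int *table, size_t len, size_t key, int value)
{
    int *slot = lookup(table, len, key);
    if (slot == NULL)
        return -1;            /* error path: key out of range */
    set_entry(slot, value);
    return 0;
}
```

Seen only from a sandbox that contains set_entry(), a tool must guess. It can warn about a possible NULL dereference and risk a false positive, or it can stay silent and risk missing a real defect if some other caller omits the check.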
This shortcoming forced the analysis tools to make estimates or inferences about what happens outside the developer’s sandbox, leading to an excess of false positive reports.

Subsequent generations of static analysis tools addressed these shortcomings and expanded beyond syntactic and semantic analyses. These newer tools build a rich representation, or model, of the provided code (akin to a compilation phase) and then simulate all possible execution paths through that model. They map out the flow of logic on those paths, together with how and where data objects are created, used, and destroyed. The analysis can span program modules to include interprocedural control- and data-flow analysis. It can also minimize false positives through new approaches for pruning false paths, estimating the values that variables will assume, and simulating potential runtime behavior. To achieve this level of analysis, static analysis tools must analyze the entire code base and integrate with the system build, rather than simply operate within the sandbox on a developer’s desktop.

To perform this more comprehensive form of analysis, static analysis tools employ two major types of code checking:

• Abstract syntax tree (AST) validation: for validating the basic syntax and structure of code.
• Code path analysis: for performing more complete types of analysis that depend on understanding the state of a program’s data objects at any particular point on a code execution path.

Abstract syntax trees

An abstract syntax tree, or AST, is simply a tree-structured representation of the source code, as might be generated by the preliminary parsing stages of a compiler. This tree contains a rich, unambiguous breakdown of the structure of the code, allowing the tool to perform simple searches for anomalous syntax.
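As a rough illustration of such a search, the following C sketch defines a deliberately tiny, hypothetical AST. The node kinds and field names are assumptions and are far simpler than any real compiler’s representation. The sketch walks the tree looking for if statements whose body is an empty statement, which is the shape a parser produces for the stray-semicolon bug in foo().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node kinds; a real C front end distinguishes many more. */
enum node_kind { NODE_BLOCK, NODE_IF, NODE_EMPTY_STMT, NODE_ASSIGN, NODE_RETURN };

struct node {
    enum node_kind kind;
    int line;                  /* source line, used in the report */
    struct node *then_branch;  /* NODE_IF only: statement run when true */
    struct node **children;    /* NODE_BLOCK only: statements in order */
    size_t n_children;
};

/* Recursively searches the tree for "if (cond);" patterns, printing one
   warning per match and returning the number of matches found. */
static size_t find_empty_if_bodies(const struct node *n)
{
    size_t hits = 0;

    if (n == NULL)
        return 0;
    if (n->kind == NODE_IF && n->then_branch != NULL &&
        n->then_branch->kind == NODE_EMPTY_STMT) {
        printf("line %d: if statement has an empty body\n", n->line);
        hits++;
    }
    hits += find_empty_if_bodies(n->then_branch);
    for (size_t i = 0; i < n->n_children; i++)
        hits += find_empty_if_bodies(n->children[i]);
    return hits;
}
```

In a tree for foo(), the parser attaches the empty statement to the if node and treats the braced block as a separate, unconditional statement. The search reports that if node as a match. The check needs no knowledge of runtime values and relies purely on the shape of the tree, which is what makes AST checks cheap. Modern compilers include similar checks; GCC, for example, warns about this pattern under -Wempty-body, which -Wextra enables.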
It’s easy to construct AST checkers that enforce standards around naming conventions and function-call restrictions, such as checks for calls to unsafe library functions. Anything that can be inferred from the code without knowledge of its runtime behavior is typically a target for AST checking. Many tools offer AST checking for a variety of languages, including open source tools such as PMD for Java. Several tools use XPath, or an XPath-derived grammar, to define the conditions that the checkers look for, and some provide extensibility mechanisms that let users create their own AST checkers.
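A checker for function-call restrictions needs little more than a walk over call nodes. In this minimal C sketch, the node layout and the policy table are hypothetical. The table bans gets(), strcpy(), and sprintf() and suggests bounded alternatives, in the spirit of common secure-coding guidelines; each project would choose its own list. The rule is purely structural, so it can run without executing any code.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, minimal AST node: just enough to find function calls. */
enum node_kind { NODE_CALL, NODE_OTHER };

struct node {
    enum node_kind kind;
    const char *name;       /* NODE_CALL: name of the called function */
    int line;               /* source line, used in the report */
    struct node **children;
    size_t n_children;
};

/* Example policy (hypothetical): banned functions and suggested fixes. */
struct banned_call {
    const char *name;
    const char *suggestion;
};

static const struct banned_call banned[] = {
    { "gets",    "fgets" },
    { "strcpy",  "a bounded copy, such as snprintf" },
    { "sprintf", "snprintf" },
};

/* Walks the tree and reports each call to a banned function. The rule
   depends only on the structure of the code, not on runtime values. */
static size_t check_banned_calls(const struct node *n)
{
    size_t hits = 0;

    if (n == NULL)
        return 0;
    if (n->kind == NODE_CALL && n->name != NULL) {
        for (size_t i = 0; i < sizeof banned / sizeof banned[0]; i++) {
            if (strcmp(n->name, banned[i].name) == 0) {
                printf("line %d: call to %s(); consider %s\n",
                       n->line, n->name, banned[i].suggestion);
                hits++;
            }
        }
    }
    for (size_t i = 0; i < n->n_children; i++)
        hits += check_banned_calls(n->children[i]);
    return hits;
}
```

A naming-convention rule would follow the same pattern, matching declaration nodes against a pattern instead of matching call nodes against a table. Tools such as PMD let users express rules like this declaratively, as XPath queries over the AST, rather than as hand-written tree walks.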