Effective Source Code Analysis with Minimization

Geet Tapan Telang (Research Engineer), Kotaro Hashimoto (Software Engineer), Krishnaji Desai (Researcher)
Hitachi India Pvt. Ltd., Bangalore, India

Abstract—During embedded software development using open source, a substantial amount of the code is ineffective, which reduces debugging efficiency and readability for human inspection and enlarges the search space for analysis. In domains such as real-time embedded systems and mission-critical systems, this may result in inefficiency and inconsistencies that lead to lower quality of service, reduced readability, and increased verification and validation effort. To mitigate these shortcomings, we propose the method of minimization, with easy-to-use tool support that leverages preprocessor directives with GCC to cut out ifdef blocks. Case studies of the Linux kernel tree, Busybox and industry-strength OSS projects are evaluated, indicating an average reduction in lines of code of roughly 5%-22% in the base kernel using the minimization technique. This results in increased efficiency for analysis, testing and human inspection, which may help in assuring the dependability of systems.

I. INTRODUCTION

The outgrowth of the Linux operating system and other real-time embedded systems in dependable computing has raised concerns in areas such as mission-critical systems. The size of the code has grown to approximately 20 million lines (Linux), and at this scale and complexity it becomes impossible to follow traditional methods for meeting safety and real-time requirements.

Since tools for analysis and testing are becoming more advanced, safety and timing requirements can be verified and validated by capturing evidence and justifications. Most of these tools aim to meet coverage expectations in terms of code, execution time, resources, throughput and fault tolerance, which together define the overall dependability of systems.

Code coverage with different analysis and test tools becomes the major part of verification and validation. To perform this effectively, we propose a method of minimization, devised with functional safety and real-timeliness in mind, for narrowed search space verification, false positive reduction, easier human inspection and shorter verification time. The term minimization, as shown in Fig. 1, signifies the removal of unused pieces of code enclosed in #ifdef and #if blocks. The target code, together with its configuration file, is run through the minimization process to produce compilable code without #ifdef and #if blocks and other unused lines of code. This differs from a regular build with the GCC preprocessor, where the code being compiled still contains the #ifdefs.

The evaluation is carried out on targets such as the Linux kernel source tree and the BusyBox [1] tree, with similar quantification for other OSS projects.

Fig. 1: Overview of minimization.

II. BACKGROUND

In the OSS domain, many developers contribute towards common goals such as real-time behavior, safety mitigation, etc., because of which the source code developed lacks strict guidelines. Although checks are made with semantic patches [2] and other utilities before source code is committed, no clear coding guidelines are followed that would make the source code easy to inspect and analyze.

The primary problem with OSS code in the embedded domain is the use of pre-processor directives and conditional compilation, which arise from varying configuration options. In the Linux kernel alone, there are more than 10,000 different configuration flags that have to be enabled or disabled and are thereby used as part of pre-processor directives in the source code [3].

The configurations of OSS code are easy to enumerate and apply through the configuration flags and the configure command. However, the source code still contains all of the conditional compilation code with its pre-processor directives. Too much configuration-dependent conditional compilation code is difficult to inspect and analyze with static analysis tools. As the number of configuration options increases, their use in the code also increases significantly, resulting in analysis complexity.

To solve this problem, a few tools are built with awareness of pre-processor directives, so that during source code analysis they read the system configuration and directives to selectively analyze relevant code and automatically skip disabled code. Some of these tools are GNU cflow, Coccinelle, etc. The problem is that dead code elimination is not the main purpose of these tools. Pre-processor directive handling is best done with its corresponding compiler in place. Standalone tools cannot do a good job of this, as they do not have the required constructs configured for effective application of conditional compilation. Hence, a standardized method needs to be made available that provides minimization agnostic to other static analysis tools and human inspection. The proposed method, however, needs to leverage compiler technology that can best apply the pre-processor directives.

For this purpose, the GCC [4] compiler-based pre-processor is selected for realizing the minimization technique. GCC is chosen because of its immense usage in the OSS community, including Linux, Busybox, etc.

A. Linux kernel configuration

Configuration plays a very important role in building any software. In the case of OSS, it becomes an essential prerequisite, as the software is developed by different developers with multiple configurations concurrently. The Linux kernel is a classic example of the varied configurations that a project can support.

The Linux kernel configurations are available in arch/*/configs. To alter the configuration, integrated options such as make config, make menuconfig and make xconfig are available. Once configuration is done, it is saved in the .config file. In the .config file, an option set to =y indicates that the driver is built into the kernel and =m that it is built as a module; otherwise the option is not selected [5]. The .config file appears as below:

    #General setup
    CONFIG_INIT_ENV_ARG_LIMIT=32
    CONFIG_CROSS_COMPILE=""
    # CONFIG_COMPILE_TEST is not set
    CONFIG_LOCALVERSION=""
    # CONFIG_LOCALVERSION_AUTO is not set
    CONFIG_HAVE_KERNEL_GZIP=y

B. GCC preprocessor configuration

GCC [4] is the compiler of choice, used from utilities up to the operating system level, and it has been rigorously tested for several years using tools such as CSMITH [6]. Hence, configuration-based pre-processing is best applied with GCC, and GCC is our choice for realizing our methodology.

The GCC preprocessor implements a macro language that is used to transform C programs before they are compiled. The output is similar to the input; however, the preprocessor directive lines are replaced with blank lines and comments are replaced with spaces, based on the configuration. At times some directives may be duplicated in the output of the preprocessor; the majority of these are #define and #undef directives that contain certain debugging options [7].

[...] used to tweak pre-processor directives such that conditional compilation code is stripped as per the enabled configurations. This helps in generating .c code that has conditional compilation applied to remove the redundant code. The limitation is that the approach is not generalized to a complete source code tree.

One alternative is Cflow [9], [10], a GNU-based tool that can preprocess input files before analyzing them and is integrated with the pre-processing option itself. However, it is difficult to use, because all the required preprocessing options need to be copied and pasted for the execution of Cflow, which leads to incorrectness.

Another alternative is to take the GCC compilation log and then tweak the logged options for each file with the required pre-processor options using GREP. Based on this, the required .c and .h files, with code stripped according to the pre-processor options, can be generated. In subsequent sections, an approach is proposed with Makefile integration and subsequent post-processing for easier code inspection and a narrowed search space.

IV. MINIMIZATION APPROACH

A. Minimization Process

The minimization approach emphasizes a collection of processes that tweak integrated Makefile options to produce compilable minimized code.

1) Definition: The term minimization signifies an efficient way to get a set of stripped source code, in which all code that is not required according to the .config file is left out. It is often observed that code becomes hard to debug, maintain and verify because of macros and preprocessor directives that are expanded during compilation through GCC. This difficulty is subdued with our approach, in which the target source code is free from the selected preprocessor options and macro expansion, thereby reducing the source code. The approach used for minimization of source code relies on the GREP command. This filters the GCC commands (the compiler commands used) and generates, as dedicated output, a source tree consisting of the limited, useful internal components that are selected in the .config file.

The GCC compiler: In the general scenario, source code modules are compiled with certain GCC options to make them work [11]. A typical GCC invocation looks like the following snippet from the kernel modules:

    gcc -Wp,-MD,arch/x86/tools/.relocs_32.o.d \
        -Wall -Wmissing-prototypes \
        -Wstrict-prototypes -O2 \
        -fomit-frame-pointer -std=gnu89 \
        -I./tools/include -c -o \
        arch/x86/tools/relocs_32.o arch/x86/tools/relocs_32.c
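The idea of filtering the logged GCC commands and adjusting their options can be illustrated with a small sketch. The Python snippet below is not the authors' tool. It only assumes that a build log contains full gcc command lines like the one above, and shows how such a line could be picked out (the GREP step) and rewritten so that GCC preprocesses the file (-E) instead of compiling it (-c). The helper names gcc_compile_lines and to_preprocess_command, the .i output suffix, and the decision to drop the -Wp,-MD dependency option are all illustrative assumptions.

```python
import shlex

# The compile command from the snippet above, joined into one logged line.
SAMPLE_CMD = (
    "gcc -Wp,-MD,arch/x86/tools/.relocs_32.o.d -Wall -Wmissing-prototypes "
    "-Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 "
    "-I./tools/include -c -o arch/x86/tools/relocs_32.o "
    "arch/x86/tools/relocs_32.c"
)


def gcc_compile_lines(build_log):
    """Grep-like filter: keep only logged gcc commands that compile a file (-c)."""
    lines = []
    for raw in build_log.splitlines():
        line = raw.strip()
        if line.startswith("gcc ") and "-c" in shlex.split(line):
            lines.append(line)
    return lines


def to_preprocess_command(compile_cmd):
    """Rewrite 'gcc ... -c -o foo.o foo.c' into 'gcc ... -E -o foo.i foo.c'."""
    args = shlex.split(compile_cmd)
    out = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-c":
            # Preprocess only, instead of compiling to an object file.
            out.append("-E")
        elif arg.startswith("-Wp,-MD,"):
            # Assumption: dependency-file generation is not needed here.
            pass
        elif arg == "-o" and i + 1 < len(args):
            # Assumption: write foo.i next to where foo.o would have gone.
            target = args[i + 1]
            if target.endswith(".o"):
                target = target[:-2] + ".i"
            out.extend(["-o", target])
            i += 1  # the output path has been consumed as well
        else:
            out.append(arg)
        i += 1
    return out


if __name__ == "__main__":
    log = "  CC      arch/x86/tools/relocs_32.o\n" + SAMPLE_CMD + "\n"
    for cmd in gcc_compile_lines(log):
        print(" ".join(to_preprocess_command(cmd)))
```

All include paths and other flags are carried over unchanged, so the preprocessor sees the same configuration as the original compile step. Any further post-processing of the preprocessed output is outside the scope of this sketch.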
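Since minimization is driven by the .config file, it may help to see how its entries can be read mechanically. The following Python sketch is an illustration only and is not part of the proposed tool. It assumes the line formats shown in the .config excerpt (CONFIG_X=value and "# CONFIG_X is not set"), and the function names parse_config and enabled_options are hypothetical.

```python
import re

# Line formats taken from the .config excerpt in this paper.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")

SAMPLE_CONFIG = '''#General setup
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
'''


def parse_config(text):
    """Map each option name to its value; options marked 'is not set' map to None."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            # Surrounding double quotes of string values are removed.
            options[match.group(1)] = match.group(2).strip('"')
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = None
        # Any other line (plain comments, blank lines) is ignored.
    return options


def enabled_options(options):
    """Names of options that are built in (=y) or built as a module (=m)."""
    return sorted(name for name, value in options.items() if value in ("y", "m"))


if __name__ == "__main__":
    print(enabled_options(parse_config(SAMPLE_CONFIG)))
```

The set of enabled names corresponds to the macros that conditional blocks in the source test for, which is why disabled options translate directly into code that can be left out.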
