Cslim: Automated Extraction of IoT Functionalities from Legacy C Codebases

Hyogi Sim†,∗, Arnab K. Paul∗, Eli Tilevich∗, Ali R. Butt∗, Muhammad Shahzad‡
Oak Ridge National Laboratory†, Virginia Tech∗, North Carolina State University‡
[email protected], {akpaul,tilevich,butta}@vt.edu, [email protected]

ABSTRACT

Many Internet of Things (IoT) devices are resource-poor, possessing limited memory, disk space, and processor capacity. To accommodate such resource scarcity, IoT software cannot include any extraneous functionalities not used in operating the underlying device. Although legacy systems software contains numerous functionalities that can be reused in IoT applications, these functionalities are exposed as part of a larger codebase with multiple complex dependencies and a heavy runtime footprint. To enable programmers to effectively reuse extant systems software in IoT applications, this paper presents Cslim, a cross-package function extraction tool for C. Cslim extracts programmer-specified functions from a source package and generates new source files for a target package, thereby enabling the reuse of systems software in resource-poor execution environments, such as IoT devices. Cslim resolves all dependencies by recursively extracting required functions, while bypassing the complexities of preprocessor macro variabilities by operating on preprocessed source files. Furthermore, Cslim efficiently traverses and resolves the calling dependencies by maintaining an in-memory relational database. Finally, Cslim is easy to use, as it requires neither manual intervention nor source code modifications. Our prototype implementation of Cslim has successfully extracted a set of functions from SQLite and GlusterFS, producing slimmed-down executables that can be deployed on IoT devices.

CCS CONCEPTS

• Software and its engineering → Embedded software; Maintaining software; Software usability;

KEYWORDS

Software Engineering, IoT

ACM Reference format:
Hyogi Sim†,∗, Arnab K. Paul∗, Eli Tilevich∗, Ali R. Butt∗, Muhammad Shahzad‡. 2019. Cslim: Automated Extraction of IoT Functionalities from Legacy C Codebases. In Proceedings of International Conference on Distributed Computing and Networking, Bangalore, India, January 4–7, 2019 (ICDCN '19), 6 pages.
DOI: 10.1145/3288599.3296013

ACM acknowledges that this contribution was authored or co-authored by an employee, or contractor of the national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. Permission to make digital or hard copies for personal or classroom use is granted. Copies must bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. To copy otherwise, distribute, republish, or post, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICDCN '19, Bangalore, India
© 2019 ACM. 978-1-4503-6094-4/19/01...$15.00
DOI: 10.1145/3288599.3296013

1 INTRODUCTION

The C language is the lingua franca of systems software. This language naturally fits the implementation requirements of various low-level system components, such as operating systems and firmware, which put an emphasis on increasing execution efficiency and reducing runtime footprint. C programs can be compiled into a minimal number of machine instructions that can be deployed in compact binaries. In particular, minimizing the executable binary code size becomes crucial in restricted hardware and software environments (e.g., memory and storage capacity, available shared libraries, etc.). Notably, an emerging trend of Internet of Things (or IoT) [14] envisions intelligent and connected physical objects, from small hand-held devices to vehicles and large buildings. Undoubtedly, the binary code size can negatively impact the runtime performance, power usage, and building cost of small-scale IoT devices.

One of the peculiarities of C programming is a lack of standard libraries that represent common data containers and algorithms to facilitate the development process [6]. In C, even a string is simply a null-terminated array of characters that must be explicitly allocated. Higher-level libraries, such as glib [5] and qt [8], introduce a space deployment overhead, as they require the shipment of an entire shared library to be able to use only a single module, such as a hash table; this overhead can be prohibitive for deployments in small or restricted environments. To eliminate the overhead, programmers manually extract functions of interest from other software packages, and modify them accordingly to fit a target software package. Oftentimes, however, such desired functions are chained in a complex calling dependency in the package, rendering manual extraction tasks tedious and error-prone. Software refactoring, that is, behavior-preserving code transformations [19], can automatically extract these desired functions. Refactoring can address the needs of the IoT community by reusing the already stable and tested infrastructure to build applications for IoT devices.

In this paper, we present Cslim, a cross-package function extraction tool for C. As an input, programmers only need to specify a list of functions to be extracted from the source package. Once the function list is provided, Cslim first scans the source package, analyzes the calling dependencies, and creates a reference database. It then recursively resolves the calling dependencies, and calculates the final list of functions in the correct order to appear in the new source files. Finally, Cslim generates new .h and .c files that are self-contained and thus can be embedded in other software packages. Cslim sidesteps the complexity of handling C preprocessor macros by operating on the output files of the C preprocessor. As stated above, Cslim targets restricted environments, such as IoT devices. Consequently, injecting static package configurations eliminates variabilities, without compromising the efficacy and practicability of Cslim. Furthermore, Cslim only manipulates the package source code, thus being architecture independent.

To evaluate our prototype of Cslim, we used SQLite [9] as a test case because of its popularity in Android application development. SQLite is preferred in small devices, due to its lightweight nature and single-tier database architecture. SQLite was designed to provide local data storage for individual devices and applications. Therefore, SQLite databases require little administration, making them particularly well suited for devices that need to operate without expert human support, such as those used in the "internet of things." In addition to SQLite, we also evaluate Cslim by successfully extracting functions from GlusterFS [10], another open-source C package. GlusterFS is a scalable, distributed file system that is well-suited for data-intensive tasks, such as media streaming and cloud storage.

The remainder of the paper is organized as follows. We explain our design (§ 2), and implementation (§ 3) of Cslim, followed by our initial experience and findings from

2 DESIGN OF CSLIM

Cslim has been designed to conform to the following criteria, to ensure its practicality.

No manual source code modification. Requiring modifications to existing source may hurt ongoing development productivity and code maturity. Moreover, requiring source code modification prevents the inclusion of new source packages, decreasing the tool's adaptability.

No manual processing. Semi-automated refactoring can unreasonably burden developers, particularly for larger codebases. Therefore, it is essential to obviate the need for human intervention to ensure practicality.

Ease of maintenance. Individual source packages can always be updated (e.g., to fix bugs, to add new features, etc.) after the needed functions have been extracted. To accommodate such updates, the framework should support incremental updates that free programmers from hand-operated and error-prone manual patching.

Figure 1 shows the overview of Cslim. The user specifies the list of functions to extract from the source package, which we refer to as target functions. Cslim first bootstraps the source package, primarily to avoid complications of preprocessor macro variabilities. After the bootstrapping, the source files in the source package are scanned and all calling dependencies between functions are analyzed. Cslim stores the analysis results in the reference database. Next, any calling dependencies in target functions are resolved by consulting the reference database. Finally, Cslim generates the self-contained target source files ready to be embedded in the target package. We next detail each step of Cslim in turn.

2.1 Bootstrapping the source package

To start extracting functions from a source package, we first bootstrap the source software package in two phases: configuring the source package and running the C preprocessor. In most cases, configuring a source package requires running its configure script, which takes input flags that specify the target device architecture and other building options, and generates appropriate build scripts. The output build scripts contain all necessary information to resolve the variabilities of preprocessor macros. Then, the bootstrapping is completed by running the build scripts (e.g., Makefile), but only up to the C preprocessor, as Cslim works at the source code level. The output source files, with all preprocessor macros
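
The dependency-resolution step from the overview (store calling dependencies in a reference database, then recursively resolve the target functions and order them for the new source files) can be illustrated with a small sketch. This is not Cslim's actual implementation. The single-table schema, the helper names build_reference_db and resolve, and the sample call graph are all hypothetical, and Python's built-in sqlite3 module merely stands in for the in-memory relational database mentioned in the abstract.

```python
import sqlite3


def build_reference_db(call_edges):
    """Create an in-memory database holding (caller, callee) pairs.

    The one-table schema is an assumption made for this sketch.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE calls (caller TEXT NOT NULL, callee TEXT NOT NULL)")
    db.executemany("INSERT INTO calls VALUES (?, ?)", call_edges)
    return db


def resolve(db, targets):
    """Return the targets plus all transitive callees, callees first."""
    ordered = []
    visited = set()

    def visit(func):
        if func in visited:
            return
        # Marking before recursing also stops the walk on call cycles.
        visited.add(func)
        rows = db.execute(
            "SELECT DISTINCT callee FROM calls WHERE caller = ? ORDER BY callee",
            (func,),
        ).fetchall()
        for (callee,) in rows:
            visit(callee)
        # Post-order: a function is emitted only after everything it calls.
        ordered.append(func)

    for target in targets:
        visit(target)
    return ordered


# Hypothetical call graph of a source package (illustrative names only).
edges = [
    ("hash_insert", "hash_key"),
    ("hash_insert", "mem_alloc"),
    ("hash_lookup", "hash_key"),
    ("mem_alloc", "log_error"),
    ("unused_func", "log_error"),
]

db = build_reference_db(edges)
extraction_order = resolve(db, ["hash_insert", "hash_lookup"])
print(extraction_order)
```

The post-order walk places every callee ahead of its callers, which is one plausible reading of "the correct order to appear in the new source files"; functions that no target reaches (unused_func here) are left out. Mutually recursive functions would additionally need forward declarations in the generated header, which this sketch does not attempt.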
