Evaluating the Ratio of Alive Code in Java Third-Party Libraries

DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2019

A Comparison between a Static and a Dynamic Approach

ANDREAS BROMMUND

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Master in Computer Science
Date: July 11, 2019
Supervisor: Pontus Johnson
Examiner: Elena Troubitsyna
School of Electrical Engineering and Computer Science
Host company: Omegapoint Stockholm AB
Swedish title: Analysera andelen använd kod i tredjepartsbibliotek – en jämförelse mellan ett statiskt och ett dynamiskt tillvägagångssätt

Abstract

Today's software development relies heavily on third-party libraries. However, some libraries offer a rich set of functionality of which only a small part is used. This leads to an unnecessarily complex codebase that needs maintenance.

This thesis compares two methods for calculating the ratio of used code in third-party libraries. The first method uses the existing tool JTombstone, which analyses the code statically. This static approach always examines the whole program; however, it overestimates the result. The second method uses a dynamic approach. This method always underestimates the result, because only the part of the program that is executed is examined. The dynamic code analyser modifies all classes contained in the third-party library: at the beginning of every method, it adds a print statement that prints the signature of the current method. In this way, a list of all executed methods is generated.
The findings of the thesis are that the first approach always yields a higher value and that the difference between the two methods decreases as the code coverage increases. The thesis cannot state which method is best; however, a good solution is to combine both methods to generate an interval that bounds the correct value.

Summary

Today's software development relies heavily on the use of third-party libraries. However, many of the libraries contain a lot of functionality, but only a small part of it is used. This creates unnecessarily complex software that must be maintained.

This thesis compares two different methods used to calculate the ratio of used code in third-party libraries. The first method uses JTombstone, a tool that analyses the code statically. Because it analyses the code statically, the whole project is always analysed; however, the tool computes a value that is too high. The second method is instead based on a dynamic evaluation of the code. With a dynamic approach, only the part of the code that was executed is evaluated, which causes the program to generate a result that is too low. The tool that analyses the code dynamically modifies all classes that belong to the third-party library. At the beginning of each method, the tool adds a print statement that prints the signature of that specific method. In this way, a list of the methods that have been invoked is obtained.

The thesis concluded that the first method always generates a larger value. The results also show that the difference between the two methods decreases when the tests exercise a larger part of the code. With the results generated, it is not possible to determine which of the two methods is best. A good solution is to combine the methods and use the two results to create an upper and a lower bound for the correct value.

Contents

1 Introduction
  1.1 Background
  1.2 Research Question
  1.3 Hypothesis
  1.4 Delimitations
  1.5 Contribution
2 Theory
  2.1 Third-Party Libraries
  2.2 Dead Code
  2.3 Dynamic and Static Dispatch
  2.4 Call Graph
    2.4.1 Sound and Precise Call Graph
  2.5 Code Coverage
  2.6 Static Code Analysis
    2.6.1 Class Hierarchy Analysis
    2.6.2 Rapid Type Analysis
  2.7 Dynamic Analysis
3 Related Work
  3.1 Call Graph Construction for Java Libraries
  3.2 DUM-Tool
  3.3 Dead Code Elimination for Web Systems Written in PHP: Lessons Learned from an Industry Case
4 Methods
  4.1 Test Data
  4.2 Dead Code Granularity
  4.3 Tools Used
    4.3.1 Java
    4.3.2 Javap
    4.3.3 JTombstone
    4.3.4 Java Agent
  4.4 Experiment Process
    4.4.1 Initialisation
    4.4.2 Dynamic Analysis
    4.4.3 Static Analysis
    4.4.4 Calculating Code Coverage
    4.4.5 Calculating and Validating the Result
5 Results
6 Discussions
  6.1 Result
  6.2 Methodology
  6.3 Sources of Error
  6.4 Future Work
    6.4.1 Improve the Code Coverage
    6.4.2 Improve the Static Code Analysis
    6.4.3 Analyse the Functionality of the Code
  6.5 Ethical Considerations
7 Conclusions
Bibliography

Glossary

API   Application Programming Interface
CHA   Class Hierarchy Analysis
CPA   Closed-package assumption
HTTP  Hypertext Transfer Protocol
JVM   Java Virtual Machine
OPA   Open-package assumption
RTA   Rapid Type Analysis

Chapter 1
Introduction

This chapter gives a brief background to the problem and presents the aim of the thesis and the research question.

1.1 Background

Today's software development relies heavily on third-party code libraries, and the reuse of code is a necessary part of modern development. It is an easy way to include functionality and speed up the development process. However, a risk analysis is usually omitted when deciding to include a library.
Most third-party libraries have a rich set of functions of which only a few are used by the software. This leads to an increased codebase with a high ratio of dead code [1]. An increased codebase leads to a bigger attack surface, and more resources are needed to maintain the software. To avoid being exposed to known vulnerabilities, the maintainer must always stay informed about new patches and actively apply them to the libraries. The burden of patching grows with the number of libraries and with the size of the codebase. Therefore, it is necessary to keep control of the dependency tree to reduce unnecessary code.

A first step towards controlling the dependency problem is to know which dependencies are included. Second, it is necessary to find which functions are essential for the project and which are not. One way of doing this is to measure the ratio of used and unused code in the dependencies. This thesis focuses on the second problem and investigates two different approaches to solving it.

1.2 Research Question

Is a dynamic approach a better technique than a static approach for measuring the amount of unused code in third-party libraries included in Java open-source projects with high code coverage?

1.3 Hypothesis

1. The number of methods categorised as alive is higher for the static analysis approach than for the dynamic approach.
2. The difference between the two methods decreases when the code coverage increases.

1.4 Delimitations

The test data contains only a few projects, all of which must be written in the programming language Java. Only static code analysis and dynamic analysis are investigated.

1.5 Contribution

In this thesis, a new dead code analyser is developed. The analyser measures the number of used methods in the third-party libraries. It is classified as a dynamic approach and is compared with the existing tool JTombstone¹.
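The dynamic analyser is described above only in prose. The following Java sketch imitates its effect at a toy scale and is not the thesis's tool, which rewrites the real library classes. The class `ExternalMath` is hypothetical and stands in for an instrumented third-party class. The helper `enter(...)`, also hypothetical, plays the role of the print statement injected at the start of every method. Unlike a plain print, the sketch also records each signature in a set, so a method called several times is counted as alive only once.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the dynamic approach: every external method starts
// with a statement that reports its own signature, so running the test suite
// yields the set of executed (alive) external methods.
public class DynamicAliveSketch {

    // Signatures observed at runtime, kept in a set so they can be counted once.
    static final Set<String> executed = new LinkedHashSet<>();

    // Stands in for the statement injected at the start of each external method.
    static void enter(String signature) {
        System.out.println(signature);
        executed.add(signature);
    }

    // Hypothetical stand-in for an instrumented third-party class.
    static class ExternalMath {
        static final int DECLARED_METHODS = 4;

        int add(int a, int b) { enter("ExternalMath.add(int,int)"); return a + b; }
        int sub(int a, int b) { enter("ExternalMath.sub(int,int)"); return a - b; }
        int mul(int a, int b) { enter("ExternalMath.mul(int,int)"); return a * b; }
        int div(int a, int b) { enter("ExternalMath.div(int,int)"); return a / b; }
    }

    // Hypothetical stand-in for the project's test suite: only add and mul run.
    static void runTestSuite() {
        ExternalMath m = new ExternalMath();
        m.add(1, 2);
        m.mul(3, 4);
        m.add(5, 6);
    }

    // Ratio of alive code = executed external methods / declared external methods.
    static double aliveRatio(int alive, int total) {
        return (double) alive / total;
    }

    public static void main(String[] args) {
        executed.clear();
        runTestSuite();
        double ratio = aliveRatio(executed.size(), ExternalMath.DECLARED_METHODS);
        System.out.println("Dynamic alive ratio: " + ratio);
    }
}
```

Running `main` prints three recorded calls. `add` appears twice because the statement runs on every call. The last line is `Dynamic alive ratio: 0.5`, since two of the four declared methods were executed. `sub` and `div` would be reported as dead even though a different test suite might reach them, which is why this approach underestimates the result.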
¹ http://jtombstone.sourceforge.net

Chapter 2
Theory

The relevant theory is presented in this chapter.

2.1 Third-Party Libraries

Third-party libraries are a vital part of this thesis and must be defined more precisely. The definition is taken from Heinemann et al. [1], although they use the term software reuse. All code not written by the developers themselves is categorised as third-party code. Furthermore, code provided by the operating system or the programming language is not included in the definition. Therefore, the Java API is not classified as a third-party library.

To distinguish between third-party code and self-written code in the rest of this chapter and throughout the thesis, the following six definitions, taken from Romano et al. [2], are used:

• internal code is the code written specifically for the software,
• external code is the code in the third-party libraries,
• internal classes are the classes in the internal code,
• external classes are the classes in the external code,
• internal methods are the methods in the internal classes and
• external methods are the methods in the external classes.

2.2 Dead Code

Dead code is one of the key concepts in this thesis, and it is crucial to understand its meaning. In computer science, dead code or unreachable code has different meanings in different fields. Software engineers define dead code [3] as the part of the program that is never executed.

    1 public int add(int a, int b){
    2     int c = a;
    3     int d = 3;
    4     if(false){
    5         System.out.print("Dead");
    6     }
    7     return c + b;
    8 }

Listing 2.1: This code snippet exemplifies dead code. For example, line 5 is dead because the condition of the if statement is always false.
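Under this definition, the exact set of alive methods is unknown. A sound static analysis should, however, report a superset of the methods that can execute, and a dynamic run reports a subset. This is the reasoning behind the abstract's suggestion to combine both results into an interval. The Java sketch below turns two such sets into a lower and an upper bound on the alive ratio and checks the expected subset relation. The method signatures, the static result and the total count are hypothetical illustration data, not output of JTombstone. In practice, behaviour that a static tool misses, such as reflective calls, could break the subset relation.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: combine a static and a dynamic result into an interval
// that is expected to bound the true ratio of alive external methods.
public class AliveIntervalSketch {

    // Lower and upper bound on the ratio of alive external methods.
    static final class Interval {
        final double lower;
        final double upper;

        Interval(double lower, double upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    static Interval bound(Set<String> dynamicAlive, Set<String> staticAlive, int totalMethods) {
        // Every method seen at runtime should also be reachable in a sound call graph.
        if (!staticAlive.containsAll(dynamicAlive)) {
            Set<String> missed = new TreeSet<>(dynamicAlive);
            missed.removeAll(staticAlive);
            throw new IllegalStateException("Executed but not statically reachable: " + missed);
        }
        double lower = (double) dynamicAlive.size() / totalMethods; // dynamic underestimates
        double upper = (double) staticAlive.size() / totalMethods;  // static overestimates
        return new Interval(lower, upper);
    }

    public static void main(String[] args) {
        int totalMethods = 4; // hypothetical number of declared external methods
        Set<String> dynamicAlive = Set.of("ExternalMath.add(int,int)", "ExternalMath.mul(int,int)");
        Set<String> staticAlive = Set.of("ExternalMath.add(int,int)", "ExternalMath.mul(int,int)",
                "ExternalMath.div(int,int)");
        Interval interval = bound(dynamicAlive, staticAlive, totalMethods);
        System.out.println("Alive ratio lies in [" + interval.lower + ", " + interval.upper + "]");
    }
}
```

In this example, two of four methods were executed and three were judged reachable, so `main` prints `Alive ratio lies in [0.5, 0.75]`. As the code coverage grows, the dynamic set approaches the static one and the interval narrows, in line with the second hypothesis.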