Software Fingerprinting for Automated Assembly Code Analysis Project Overview

Total Page:16

File Type:pdf, Size:1020Kb

Software Fingerprinting for Automated Assembly Code Analysis Project Overview Software fingerprinting for automated assembly code analysis Project overview P. Charland DRDC – Valcartier Research Centre Defence Research and Development Canada Scientific Report DRDC-RDDC-2015-R027 March 2015 © Her Majesty the Queen in Right of Canada, as represented by the Minister of National Defence, 2015 © Sa Majesté la Reine (en droit du Canada), telle que représentée par le ministre de la Défense nationale, 2015 Abstract …….. With the revolution in information technology, the dependence of the Canadian Armed Forces (CAF) on their information systems continues to grow. While information systems-based assets confer a distinct advantage, they also make the CAF vulnerable if adversaries interfere with those. Unfortunately, the technology required to disrupt and damage an information system through malicious software (malware) is far less sophisticated and expensive than the amount of investment required to create the system. To understand and mitigate this threat, reverse engineering has to be performed to analyze malware. However, software reverse engineering is a manually intensive and time-consuming process. The learning curve to master it is quite steep and once mastered, the process is hindered when anti-reverse engineering techniques are used. This results in the very few available reverse engineers being quickly saturated. This Scientific Report describes new approaches to accelerate the reverse engineering process of malware. The goal is to reduce redundant analysis efforts by automating the identification of code fragments which reuse (i) previously analyzed assembly code or (ii) open source code publicly available. Significance to defence and security With today’s proliferation of malware, new analysis techniques which go beyond existing best practices are needed to accelerate the reverse engineering process. The intended audience for this report are thus software reverse engineers and malware analysts, as well as managers and decision makers of organizations which have to reverse engineer third party code. The techniques presented in this report will contribute to provide the CAF with a core capability to respond more effectively and rapidly to cyber threats and lead to a change in their defence capabilities. To defend against cyber attacks, the CAF currently rely on antivirus and intrusion detection systems (IDS). These approaches consist of looking for symptoms such as increased network traffic or anomalous signals, and try to fight the problems that arise. Although they can be effective in some cases, they do not help in understanding what the real problem is and are ineffective when attacks morph. Providing a new capability for malware reverse engineering, as it is proposed in this report, will allow the CAF to uncover vulnerabilities that malware exploit. This will also give them the required knowledge to generate defences against future malicious attacks to pre-empt them, even when they morph. This new capability will introduce a paradigm shift in the practices of the CAF in the cyber environment. They will move from a strictly defensive approach, which has shown its limits, to a proactive approach. This proposed approach has also security potential for the Royal Canadian Mounted Police (RCMP), the Communications Security Establishment Canada (CSEC), the Canadian Security Intelligence Service (CSIS), and Public Safety (PS) Canada. DRDC-RDDC-2015-R027 i This page intentionally left blank. ii DRDC-RDDC-2015-R027 Résumé …….. Avec la révolution des technologies de l'information, la dépendance des Forces armées canadiennes (FAC) sur leurs systèmes d'information ne cesse de croître. Bien que les systèmes d’information sont des actifs qui confèrent un net avantage aux FAC, ils les rendent également vulnérables si des adversaires interfèrent avec ceux-ci. Malheureusement, la technologie nécessaire pour perturber et endommager un système d'information par l’entremise de logiciels malveillants est beaucoup moins sophistiquée et onéreuse que le montant de l'investissement nécessaire à la création du système. Pour comprendre et atténuer cette menace, la rétro-ingénierie doit être effectuée afin d'analyser ces logiciels malveillants. Cependant, la rétro-ingénierie logicielle est un processus manuel et fastidieux. La courbe d'apprentissage est assez abrupte et une fois maîtrisé, le processus est entravé lorsque des techniques pour contrer la rétro-ingénierie sont utilisées. Cela se traduit par le fait que les très rares rétro-ingénieurs disponibles sont rapidement saturés. Le présent rapport scientifique décrit de nouvelles approches pour accélérer le processus de rétro-ingénierie de logiciels malveillants. L'objectif est de réduire les efforts d'analyse redondants en automatisant l'identification de fragments de code qui réutilisent (i) du code assembleur analysé précédemment ou (ii) du code source ouvert accessible à tous. Importance pour la défense et la sécurité Avec la prolifération actuelle des logiciels malveillants, de nouvelles techniques d'analyse allant au-delà des meilleures pratiques existantes de rétro-ingénierie sont nécessaires pour accélérer le processus. L’audience visée par ce rapport sont donc les rétro-ingénieurs logiciels et analystes de maliciels, ainsi que les gestionnaires et décideurs d’organisations qui ont à faire la rétro- ingénierie de code tiers. Les techniques présentées dans ce rapport contribueront à fournir aux FAC une capacité de base pour répondre plus efficacement et rapidement aux menaces cybernétiques et mènera à un changement dans leurs capacités de défense. Pour se prémunir contre les cyber-attaques, les FAC comptent actuellement sur des antivirus et des systèmes de détection d'intrusions. Ces approches consistent à chercher des symptômes, tels qu’une augmentation du trafic réseau ou des signes anormaux, et essayer de lutter contre les problèmes qui surgissent. Bien qu'elles puissent être efficaces dans certains cas, elles n'aident pas à comprendre ce qu’est le véritable problème et sont inefficaces lorsque les attaques se transforment. Fournir une nouvelle capacité de rétro-ingénierie de logiciels malveillants, tel que proposée dans le présent rapport, permettra aux FAC de découvrir les vulnérabilités que les logiciels malveillants exploitent. Ceci leur fournira aussi les connaissances requises afin de générer des mesures défensives contre de futures attaques malveillantes afin de les contrer, même lorsqu’elles se transforment. Cette nouvelle capacité introduira un changement de paradigme dans les pratiques des FAC dans le cyber-environnement. Celles-ci passeront d'une approche strictement défensive, dont les limites ont été exposées, à une approche proactive. L’approche proposée a aussi un potentiel pour la sécurité, notamment la Gendarmerie royale du Canada (GRC), le Centre de la sécurité des télécommunications Canada (CSTC), le Service canadien du renseignement de sécurité (SCRS) et la Sécurité publique (SP) Canada. DRDC-RDDC-2015-R027 iii This page intentionally left blank. iv DRDC-RDDC-2015-R027 Table of contents Abstract …….. ................................................................................................................................. i Significance to defence and security ................................................................................................ i Résumé …….. ................................................................................................................................ iii Importance pour la défense et la sécurité ....................................................................................... iii Table of contents ............................................................................................................................. v List of figures ................................................................................................................................. vi List of tables .................................................................................................................................. vii 1 Introduction ............................................................................................................................... 1 2 Assembly to source code matching ........................................................................................... 3 2.1 RE-Google ....................................................................................................................... 3 2.2 Open source code repositories ......................................................................................... 3 2.2.1 Black Duck Open Hub Code Search ................................................................... 4 2.2.2 GitHub ................................................................................................................. 4 2.3 Exact matching ................................................................................................................ 5 2.4 Close matching .............................................................................................................. 10 2.5 Contextual matching ...................................................................................................... 12 3 Code clone detection ............................................................................................................... 14 3.1 Assembly code clone detection ..................................................................................... 14 3.2 Assembly code clone search .........................................................................................
Recommended publications
  • Inequalities in Open Source Software Development: Analysis of Contributor’S Commits in Apache Software Foundation Projects
    RESEARCH ARTICLE Inequalities in Open Source Software Development: Analysis of Contributor’s Commits in Apache Software Foundation Projects Tadeusz Chełkowski1☯, Peter Gloor2☯*, Dariusz Jemielniak3☯ 1 Kozminski University, Warsaw, Poland, 2 Massachusetts Institute of Technology, Center for Cognitive Intelligence, Cambridge, Massachusetts, United States of America, 3 Kozminski University, New Research on Digital Societies (NeRDS) group, Warsaw, Poland ☯ These authors contributed equally to this work. * [email protected] a11111 Abstract While researchers are becoming increasingly interested in studying OSS phenomenon, there is still a small number of studies analyzing larger samples of projects investigating the structure of activities among OSS developers. The significant amount of information that OPEN ACCESS has been gathered in the publicly available open-source software repositories and mailing- list archives offers an opportunity to analyze projects structures and participant involve- Citation: Chełkowski T, Gloor P, Jemielniak D (2016) Inequalities in Open Source Software Development: ment. In this article, using on commits data from 263 Apache projects repositories (nearly Analysis of Contributor’s Commits in Apache all), we show that although OSS development is often described as collaborative, but it in Software Foundation Projects. PLoS ONE 11(4): fact predominantly relies on radically solitary input and individual, non-collaborative contri- e0152976. doi:10.1371/journal.pone.0152976 butions. We also show, in the first published study of this magnitude, that the engagement Editor: Christophe Antoniewski, CNRS UMR7622 & of contributors is based on a power-law distribution. University Paris 6 Pierre-et-Marie-Curie, FRANCE Received: December 15, 2015 Accepted: March 22, 2016 Published: April 20, 2016 Copyright: © 2016 Chełkowski et al.
    [Show full text]
  • Exploring Factors and Measures to Select Open Source Software
    Exploring Factors and Measures to Select Open Source Software Xiaozhou Li*a, Sergio Moreschini*a, Zheying Zhanga, Davide Taibia aTampere University, Tampere (Finland) ∗ the two authors equally contributed to the paper Abstract [Context] Open Source Software (OSS) is nowadays used and integrated in most of the commercial products. However, the selection of OSS projects for integration is not a simple process, mainly due to a of lack of clear selection models and lack of information from the OSS portals. [Objective] We investigated the current factors and measures that prac- titioners are currently considering when selecting OSS, the source of infor- mation and portals that can be used to assess the factors, and the possibility to automatically get this information with APIs. [Method] We elicited the factors and the measures adopted to assess and compare OSS performing a survey among 23 experienced developers who often integrate OSS in the software they develop. Moreover, we investigated the APIs of the portals adopted to assess OSS extracting information for the most starred 100K projects in GitHub. [Result] We identified a set consisting of 8 main factors and 74 sub- factors, together with 170 related metrics that companies can use to select OSS to be integrated in their software projects. Unexpectedly, only a small part of the factors can be evaluated automatically, and out of 170 metrics, only 40 are available, of which only 22 returned information for all the 100K projects. [Conclusion.] OSS selection can be partially automated, by extracting arXiv:2102.09977v1 [cs.SE] 19 Feb 2021 the information needed for the selection from portal APIs.
    [Show full text]
  • Publishing Research Software As Open Source on Github
    Publishing Research Software as Open Source on GitHub Table of Contents 1. Introduction 2. Scope & Goals 3. Science & Software i. Reproducibility ii. Software Quality iii. Software Development iv. Software Documentation v. Guide 4. Open Source Basics i. Mindset ii. Arguments against open source... and how to disprove them iii. Success Stories iv. Legal Stuff v. People vi. Guide 5. GitHub i. Basics: Accounts & Repositories ii. Fork & Pull Workflow iii. Social Coding iv. GitHub for Education 6. Software Communities i. Community Building and Openness ii. Marketing and Public Relations iii. Types of Contributors and Tasks iv. Open Source in Your Domain 7. Scientific Publishing of Data and Software 8. Contribute 9. Glossary 2 Publishing Research Software as Open Source on GitHub Introduction How can you publish research software as open source? And do so without too much overhead and actually gain impact by leveraging the open source approach? These questions are answered in this best practice "Publishing Research Software as Open Source on GitHub". It is published by the GLUES project's SDI team. LICENSE This work is licensed under a Creative Commons Attribution 4.0 International License. About this best practice The "source code" of this document is hosted on GitHub and the book was written and published using GitBook. The text is designed to be read in the web view, but PDF and other formats, e.g. for e- readers, area available as well. Version: 0.1 Contributors Thanks to these people for providing contents, giving valuable feedback, reporting errors, ... Daniel Nüst Simon Jirka Ann Hitchcock Want to become a contributor? Check our contribution guidelines.
    [Show full text]
  • Repositories
    Exploring methods for dependency management in multi- repositories Design science research at Saab Training and simulation Main Subject area: Computer Engineering Author: Oskar Persson, Samuel Svensson Supervisor: Ragnar Nohre JÖNKÖPING 2021-07-15 This final thesis has been carried out at the School of Engineering at Jönköping University within Computer Engineering. The authors are responsible for the presented opinions, conclusions, and results. Examiner: Florian Westphal Supervisor: Ragnar Nohre Scope: 15 hp (first-cycle education) Date: 2021-07-15 i Abstract Dependency problems for developers are like sneezing for people with pollen allergies during the spring, an everyday problem. This is especially true when working in multi-repositories. The dependency problems that occur do so as a byproduct of enabling developers to work on different components of a project in smaller teams, where everything is version controlled. Nearly all developers use version control systems, such as Git, Mercurial, or Subversion. While version control systems have helped developers for nearly 40 years and are constantly getting updated, there are still functionalities that do not exist. One example of that is having a good way of managing dependencies and allowing developers to download projects without having to handle dependency problems manually. The solutions that version control systems offer to help manage dependencies (e.g., Git’s submodules or Mercurial’s subrepositories), do not enable developers a fail-safe download or build the project if it contains dependency problems. In this study, a case study was conducted at Saab Training and Simulation to explore methods for dependency management as well as discuss and highlight some of the problems that emerge when working with dependencies in multi-repositories.
    [Show full text]
  • ESIP Software Guidelines: Bibliography and Resources
    ESIP Software Guidelines: Bibliography and Resources References “26.4. Unittest — Unit Testing Framework — Python 3.5.2 Documentation.” Accessed November 23, 2016. https://docs.python.org/3.5/library/unittest.html. “Accessibility - W3C.” Accessed June 15, 2016. https://www.w3.org/standards/webdesign/accessibility#wai. “Agile Project Management.” Accessed October 20, 2016. https://www.pivotaltracker.com/. “A JavaScript Library for Building User Interfaces - React.” Accessed October 26, 2016. https://facebook.github.io/react/. Alter, George, George C Banks, Denny Borsboom, Sara D Bowman, Steven J Breckler, Stuart Buck, Chris Chambers, et al. Transparency and Openness Promotion (TOP) Guidelines. Open Science Framework, 2016. osf.io/9f6gx. “Apache Subversion.” Accessed October 19, 2016. https://subversion.apache.org/. “API Blueprint | API Blueprint.” Accessed October 20, 2016. https://apiblueprint.org/. Atlassian. “Bitbucket | The Git Solution for Professional Teams.” Bitbucket. Accessed October 20, 2016. https://bitbucket.org/. “Backbone.js.” Accessed October 26, 2016. http://backbonejs.org/. “Best Practice Library | Section508.gov.” Accessed May 24, 2016. http://section508.gov/content/learn/best-practice-library. “Bootstrap · The World’s Most Popular Mobile-First and Responsive Front-End Framework.” Accessed October 20, 2016. http://getbootstrap.com/. Brutlag, Jake. “Speed Matters.” Google Research Blog, June 23, 2009. https://research.googleblog.com/2009/06/speed-matters.html. Burger, Matthias, Klaus Juenemann, and Thomas Koenig. RUnit: R Unit Test Framework (version 0.4.31), 2015. https://cran.r-project.org/web/packages/RUnit/index.html. Burgess, Annie. “2015 AIST Evaluations Overview.” Federation of Earth Science Information Partners, 2016. http://testbed.esipfed.org/sites/default/files/2015_AIST_Evaluations_Overview.pdf. Car, Nicholas. “Data Reuse Fitness Assessment Using Provenance.” Denver, CO, 2016.
    [Show full text]
  • A Platform for Building and Sharing Mining Software Repositories Tools As Apps Nitin Mukesh Tiwari Iowa State University
    Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2017 The design and implementation of Candoia: A platform for building and sharing mining software repositories tools as apps Nitin Mukesh Tiwari Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Computer Sciences Commons Recommended Citation Tiwari, Nitin Mukesh, "The design and implementation of Candoia: A platform for building and sharing mining software repositories tools as apps" (2017). Graduate Theses and Dissertations. 15439. https://lib.dr.iastate.edu/etd/15439 This Thesis is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. The design and implementation of Candoia: A platform for building and sharing mining software repositories tools as apps by Nitin Mukesh Tiwari A thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Major: Computer Science Program of Study Committee: Hridesh Rajan, Major Professor Gurpur Prabhu Steven M. Kautz Iowa State University Ames, Iowa 2017 Copyright c Nitin Mukesh Tiwari, 2017. All rights reserved. ii DEDICATION To my teachers, family and friends, who made me realize the real purpose of education. iii TABLE OF CONTENTS LIST OF FIGURES . iv ACKNOWLEDGEMENTS . v ABSTRACT . vi CHAPTER 1. INTRODUCTION . 1 CHAPTER 2. MOTIVATION . 4 CHAPTER 3. CANDOIA PLATFORM & ECOSYSTEM .
    [Show full text]
  • Open Source Softwareprojekte Zwischen Passion Und Kalkül
    STUTTGARTER BEITRÄGE ZUR ORGANISATIONS- UND INNOVATIONSFORSCHUNG SOI Discussion Paper 2015-02 Open Source Softwareprojekte zwischen Passion und Kalkül Jan-Felix Schrape Institut für Sozialwissenschaften Organisations- und Innovationssoziologie Jan-Felix Schrape Open Source Softwareprojekte zwischen Passion und Kalkül SOI Discussion Paper 2015-02 Universität Stuttgart Institut für Sozialwissenschaften Abteilung für Organisations- und Innovationssoziologie (SOWI VI) Seidenstr. 36 D-70174 Stuttgart http://www.uni-stuttgart.de/soz/oi/ Herausgeber Prof. Dr. Ulrich Dolata Tel.: 0711 / 685-81001 [email protected] Redaktion Dr. Jan-Felix Schrape Tel.: 0711 / 685-81004 [email protected] Stuttgarter Beiträge zur Organisations- und Innovationsforschung (SOI) Discussion Paper 2015-02 (11/2015) ISSN 2191-4990 © 2015 by the author(s) Jan-Felix Schrape ist wissenschaftlicher Mitarbeiter der Abteilung Innovations- und Organisationssoziologie am Institut für Sozialwissenschaften der Universität Stuttgart. [email protected] Weitere Downloads der Abteilung für Organisations- und Innovationssoziologie am Institut für Sozialwissenschaften der Universität Stuttgart finden sich unter: http://www.uni-stuttgart.de/soz/oi/publikationen/ Zusammenfassung Dieses Papier entwickelt auf der Grundlage von aggregierten Marktdaten, Dokumen- tenanalysen sowie Literaturauswertungen einen systematisierenden Überblick über Open Source Software Communities und ihre sozioökonomischen Kontexte. Nach einer Rekonstruktion der
    [Show full text]
  • Open Source Systems: Towards Robust Practices
    IFIP AICT 496 Federico Balaguer Roberto Di Cosmo Alejandra Garrido Fabio Kon Gregorio Robles Stefano Zacchiroli (Eds.) Open Source Systems: Towards Robust Practices 13th IFIP WG 2.13 International Conference, OSS 2017 Buenos Aires, Argentina, May 22–23, 2017 Proceedings IFIP Advances in Information and Communication Technology 496 Editor-in-Chief Kai Rannenberg, Goethe University Frankfurt, Germany Editorial Board TC 1 – Foundations of Computer Science Jacques Sakarovitch, Télécom ParisTech, France TC 2 – Software: Theory and Practice Michael Goedicke, University of Duisburg-Essen, Germany TC 3 – Education Arthur Tatnall, Victoria University, Melbourne, Australia TC 5 – Information Technology Applications Erich J. Neuhold, University of Vienna, Austria TC 6 – Communication Systems Aiko Pras, University of Twente, Enschede, The Netherlands TC 7 – System Modeling and Optimization Fredi Tröltzsch, TU Berlin, Germany TC 8 – Information Systems Jan Pries-Heje, Roskilde University, Denmark TC 9 – ICT and Society Diane Whitehouse, The Castlegate Consultancy, Malton, UK TC 10 – Computer Systems Technology Ricardo Reis, Federal University of Rio Grande do Sul, Porto Alegre, Brazil TC 11 – Security and Privacy Protection in Information Processing Systems Steven Furnell, Plymouth University, UK TC 12 – Artificial Intelligence Ulrich Furbach, University of Koblenz-Landau, Germany TC 13 – Human-Computer Interaction Marco Winckler, University Paul Sabatier, Toulouse, France TC 14 – Entertainment Computing Matthias Rauterberg, Eindhoven University of Technology, The Netherlands IFIP – The International Federation for Information Processing IFIP was founded in 1960 under the auspices of UNESCO, following the first World Computer Congress held in Paris the previous year. A federation for societies working in information processing, IFIP’s aim is two-fold: to support information processing in the countries of its members and to encourage technology transfer to developing na- tions.
    [Show full text]
  • Desarrollo Para Hacer Llegar Perceval a Las Masas ______
    Participación en el proyecto de código abierto Perceval: Desarrollo para hacer llegar Perceval a las masas __________________________________________ Desarrollo de aplicaciones Autor: José Miguel Cañibano Iglesias Consultor: Gregorio Robles Martínez Tutor externo: Santiago Dueñas Domínguez 8 de Enero de 2.017 LICENCIA La licencia de todo el contenido del proyecto, tanto de la memoria como del código, así como de cualquier otro contenido, está ligada a todos los efectos a la misma que la del proyecto Perceval [1], y que en el momento de hacer el presente trabajo está basada en GPU v3 (5.4 Licencia GNU v3, 29 de Junio de 2007, página. 65) RESUMEN DEL PROYECTO El proyecto gira en torno a la aplicación de recuperación y recolección de datos de repositorios Perceval [2]. Perceval puede manejar distintos tipos de repositorios como pueden ser: Bugzilla, Gerrit, Git, Jenkins, ReMo, etc. Debido a que Perceval es un proyecto colaborativo que evoluciona constantemente, adolece de determinadas características, y en este proyecto se ha intentado realizar la colaboración aportando al mismo esas características que lo harán una herramienta mucho más completa. La idea básica ha sido intentar llegar a un más amplio grupo de usuarios, haciendo el hincapié en dos puntos: por un lado el sistema operativo que use; por otro en qué formato de salida dé el resultado. Una de las bases ha sido usar herramientas de software libre en detrimento de las propietarias, haciendo un estudio en cada caso de las distintas posibilidades. En cualquier caso, en la realización del trabajo no sólo se basa en la realización de un documento o de unas líneas de código, sino en el trabajo continuo en una comunidad de Open Source, colaborando y participando con ella, pues, al fin y al cabo, parte esencial del Software Libre son las distintas comunidades que existen y que dan una razón de ser a este máster.
    [Show full text]
  • Git As an Encrypted Distributed Version Control System Russell G
    Air Force Institute of Technology AFIT Scholar Theses and Dissertations Student Graduate Works 3-26-2015 Git as an Encrypted Distributed Version Control System Russell G. Shirey Follow this and additional works at: https://scholar.afit.edu/etd Part of the Computer Engineering Commons Recommended Citation Shirey, Russell G., "Git as an Encrypted Distributed Version Control System" (2015). Theses and Dissertations. 57. https://scholar.afit.edu/etd/57 This Thesis is brought to you for free and open access by the Student Graduate Works at AFIT Scholar. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of AFIT Scholar. For more information, please contact [email protected]. GIT AS AN ENCRYPTED DISTRIBUTED VERSION CONTROL SYSTEM THESIS Russell G. Shirey, Captain, USAF AFIT-ENG-MS-15-M-022 DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR FORCE INSTITUTE OF TECHNOLOGY Wright-Patterson Air Force Base, Ohio DISTRIBUTION STATEMENT A. APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. The views expressed in this thesis are those of the author and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the United States Government. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States. AFIT-ENG-MS-15-M-022 GIT AS AN ENCRYPTED DISTRIBUTED VERSION CONTROL SYSTEM THESIS Presented to the Faculty Department of Electrical and Computer Engineering Graduate School of Engineering and Management Air Force Institute of Technology Air University Air Education and Training Command In Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Engineering Russell G.
    [Show full text]
  • Ultra-Large-Scale Software Repository and Source Code Mining
    1 Boa: Ultra-Large-Scale Software Repository and Source Code Mining ROBERT DYER, Bowling Green State University HOAN ANH NGUYEN, Iowa State University HRIDESH RAJAN, Iowa State University TIEN N. NGUYEN, Iowa State University In today’s software-centric world, ultra-large-scale software repositories, e.g. SourceForge, GitHub, and Google Code, are the new library of Alexandria. They contain an enormous corpus of software and related information. Scientists and engineers alike are interested in analyzing this wealth of information. However, systematic extraction and analysis of relevant data from these repositories for testing hypotheses is hard, and best left for mining software repository (MSR) experts! Specifically, mining source code yields significant insights into software development artifacts and processes. Unfortunately, mining source code at a large-scale remains a difficult task. Previous approaches had to either limit the scope of the projects studied, limit the scope of the mining task to be more coarse-grained, or sacrifice studying the history of the code. In this paper we address mining source code: a) at a very large scale; b) at a fine-grained level of detail; and c) with full history information. To address these challenges, we present domain-specific language features for source code mining in our language and infrastructure called Boa. The goal of Boa is to ease testing MSR-related hypotheses. Our evaluation demonstrates that Boa substantially reduces programming efforts, thus lowering the barrier to entry. We also show drastic improvements in scalability. CCS Concepts: •Software and its engineering ! Patterns; Concurrent programming structures; Additional Key Words and Phrases: Boa; mining software repositories; domain-specific language; scalable; ease of use; lower barrier to entry ACM Reference Format: Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan, and Tien N.
    [Show full text]
  • Monzur Murshed
    An investigation of software vulnerabilities in open source software projects using data from publicly-available online sources S. M. Monzur Murshed A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Master of Applied Science in Technology Innovation Management Carleton University Ottawa, Ontario Copyright © 2017 S. M. Monzur Murshed An investigation of software vulnerabilities in open source software projects using data from publicly-available sources Copyright © 2017 S. M. Monzur Murshed _______________________________________________________________________________________________________________________________________________ Abstract Software vulnerabilities is an active area of research, but little is known about how publicly-observable properties of open source software projects and developer communities relate to the time taken to discover and fix vulnerabilities in the projects’ software. This thesis examines that relationship using data harvested from online sources about a sample of 60 open source content management system (CMS) projects and 1268 vulnerabilities affecting the software produced by those projects. Combining project release histories with metrics from two online databases provided reliable proxy dates for vulnerability introduction and fix, but not discovery. Higher commit density (a proxy for project activity) was associated with shorter time of exposure. The lifecycle model, data collection workflow, and software scripts will enable
    [Show full text]