Graph Databases and Their Application to the Italian Business Register for Efficient Search of Relationships Among Companies

Total Page:16

File Type:pdf, Size:1020Kb

Graph Databases and Their Application to the Italian Business Register for Efficient Search of Relationships Among Companies Universit`adegli Studi di Padova Dipartimento di Ingegneria dell'Informazione Corso di Laurea Magistrale in Ingegneria Informatica Tesi di Laurea Magistrale Graph databases and their application to the Italian Business Register for efficient search of relationships among companies Basi di dati a grafo e loro applicazione al Registro Imprese Italiano per la ricerca efficiente di relazioni tra le imprese 4 aprile 2017 Laureando: Relatore: Sinico Luca Prof. Ferro Nicola 1108637 Anno Accademico 2016/2017 Universit`adegli Studi di Padova Dipartimento di Ingegneria dell'Informazione Corso di Laurea Magistrale in Ingegneria Informatica Tesi di Laurea Magistrale Graph databases and their application to the Italian Business Register for efficient search of relationships among companies Basi di dati a grafo e loro applicazione al Registro Imprese Italiano per la ricerca efficiente di relazioni tra le imprese 4 aprile 2017 Laureando: Relatore: Sinico Luca Prof. Ferro Nicola 1108637 Anno Accademico 2016/2017 Abstract \What are the main characteristics of graph databases? In what do they differ from relational ones, and in what situations should we use them?" These are the main questions of our work. We studied and tested three of the major graph databases of the moment (ArangoDB, Neo4j, OrientDB), and we compared them with a relational database, implemented with PostgreSQL. We worked on a dataset representing equity participations among companies, and we found out that the strong points of graph databases are: the purposely designed storage techniques, which let them have good performance on graph datasets; and the purposely designed query languages, which go beyond the standard SQL and manage the typical problems that arise when graphs are explored. However, we have seen that the main performance increments have been obtained when heavy graph situations are queried; for simpler situations and queries, a relational database performs equally well. i ii Dedicato ad Anna. Dedicato alla mia famiglia. iii iv Contents Abstract i 1 Introduction1 2 Graph Databases3 2.1 Introduction to Graph Databases..........................3 2.2 General definitions..................................4 2.3 Graph database definition.............................. 10 2.4 Graph data models.................................. 12 2.4.1 Property graph................................ 12 2.4.2 RDF graph.................................. 14 2.4.3 Hypergraph.................................. 23 2.5 Graph database characteristics............................ 25 2.5.1 Motivations for the adoption of graph databases.............. 27 2.5.2 Differences from graph computing engines................. 29 2.5.3 Differences from the relational database.................. 31 2.5.4 Differences from document stores...................... 34 v vi CONTENTS 2.6 General approach to process graph queries..................... 35 3 DBMSs comparison 39 3.1 Which are the major Graph DBMSs........................ 39 3.2 The compared DBMSs................................ 43 3.2.1 ArangoDB................................... 43 3.2.2 Neo4j..................................... 54 3.2.3 OrientDB................................... 65 3.2.4 PostgreSQL.................................. 76 3.3 Feature matrix.................................... 82 4 Use case 85 4.1 The domain of interest................................ 85 4.2 Entity-Relationship representation.......................... 88 4.2.1 The existing database solution........................ 89 4.3 Graph representation................................. 91 4.3.1 Design Best Practices............................. 92 4.3.2 Logical design................................. 94 4.3.3 Some graph description metrics....................... 95 4.4 Data behaviour.................................... 96 4.5 Applications that could benefit from a graph database usage........... 97 4.6 Rationale for the choice of the graph DBMSs.................... 98 CONTENTS vii 5 Evaluation and comparison of Graph Databases 101 5.1 Use case........................................ 101 5.2 Experimental setup.................................. 102 5.3 Data import...................................... 104 5.3.1 Summary................................... 110 5.4 Cache warm-up.................................... 110 5.5 Performance comparison............................... 113 5.5.1 single match by cf .............................. 116 5.5.2 get direct fathers ............................... 118 5.5.3 get direct fathers only cfs ........................... 122 5.5.4 get direct neighbors .............................. 124 5.5.5 get distinct descendants and level ...................... 128 5.5.6 count distinct ancestors per level ...................... 140 5.5.7 get common descendants ........................... 146 5.5.8 get common ancestors ............................ 152 5.5.9 get shortest directed path ........................... 152 5.5.10 Summary................................... 158 6 Displaying the resulting subgraph of a query 161 7 Conclusions 171 7.1 DBMSs assessment.................................. 171 7.2 Main conclusions................................... 174 7.3 Final considerations.................................. 175 7.4 Future developments................................. 176 Bibliography 179 viii List of Tables 3.1 Feature matrix.................................... 83 5.1 Data import comparison................................ 110 5.2 Result example for query count distinct ancestors per level............ 141 6.1 Constraints combinations on node `00102681419'.................. 166 7.1 Assessment table of the compared DBMSs...................... 174 ix x List of Figures 2.1 Popularity changes of database technologies during the last four years [1].....4 2.2 A directed graph [31]..................................5 2.3 Breadth-First Search pseudocode [122]........................8 2.4 Breadth-First Search exploration order [153].....................8 2.5 Depth-First Search pseudocode [122].........................9 2.6 Depth-First Search exploration order [154]......................9 2.7 Visualization of a property graph [62]........................ 13 2.8 TinkerPop system architecture [125]......................... 13 2.9 Visualization of a RDF graph [141].......................... 15 2.10 RDF classification based on storage technique [38]................. 16 2.11 Linked Data publishing options and workflows [131]................. 20 2.12 Linking Open Data cloud diagram in May 2007 [58]................. 21 2.13 Linking Open Data cloud diagram in August 2014 [58]............... 21 2.14 Directed hypergraph example [48]........................... 24 2.15 Hypergraph translated in a property graph [48]................... 24 xi xii LIST OF FIGURES 2.16 The graph database space [48]............................ 26 2.17 A high-level view of a typical graph analytics engine deployment [48]....... 29 2.18 Modeling friends and friends-of-friends in a relational database [48]........ 33 2.19 Social network example encoded in a document store [48]............. 35 2.20 Graph query phases.................................. 36 3.1 Ranking list of some of the DBMSs tracked by DB-ENGINES.com [1]....... 40 3.2 Trending chart of some of the DBMSs tracked by DB-ENGINES.com [1]..... 41 3.3 Ranking list of the graph DBMSs tracked by DB-ENGINES.com [1]........ 42 3.4 Trending chart of the top 5 graph DBMSs tracked by DB-ENGINES.com [1]... 42 3.5 ArangoDB web interface............................... 53 3.6 ArangoDB subscription levels [25].......................... 53 3.7 Neo4j architecture layers [48]............................. 55 3.8 Neo4j node and relationship store file record structure [48]............. 56 3.9 Example Neo4j social network data structure [48].................. 57 3.10 A simple Neo4j example graph [48].......................... 61 3.11 Neo4j web interface.................................. 64 3.12 Neo4j editions comparison [61]............................ 65 3.13 OrientDB caching architecture [121]......................... 73 3.14 OrientDB web interface................................ 74 3.15 OrientDB editions comparison [99].......................... 75 3.16 OrientDB support subscriptions comparison [83].................. 75 LIST OF FIGURES xiii 4.1 Italian Chamber system map [50]........................... 86 4.2 ER schema before restructuration.......................... 89 4.3 ER schema...................................... 90 4.4 Logical schema for a relational database...................... 91 4.5 Graph data model................................... 95 5.1 Warm-up cache, times chart............................. 112 5.2 Warm-up quick study................................. 112 5.3 Warm-up cache, memory chart............................ 113 5.4 Query single match by cf charts........................... 117 5.5 Fathers of the node with codice fiscale = `00102681419'.............. 119 5.6 Query get direct fathers charts............................ 121 5.7 Query get direct fathers only cfs charts....................... 123 5.8 Neighbors of the node with codice fiscale = `00102681419'............. 125 5.9 Query get direct neighbors charts.......................... 127 5.10 Descendants of the node with codice fiscale = `00102681419'........... 128 5.11 Query get distinct descendants and level charts.................. 137 5.12 Quick study of path uniqueness vs. global uniqueness................ 138 5.13 Query get distinct descendants and level only cfs charts.............. 139 5.14 Ancestors of the
Recommended publications
  • The Sad State of Web Development Random Thoughts on Web Development
    The Sad State of Web Development Random thoughts on web development Going to shit 2015 is when web development went to shit. Web development used to be nice. You could fire up a text editor and start creating JS and CSS files. You can absolutely still do this. That has not changed. So yes, everything I’m about to say can be invalidated by saying that. The web (specifically the Javascript/Node community) has created some of the most complicated, convoluted, over engineered tools ever conceived. Node.js/NPM At times, I think where web development is at this point is some cruel joke played on us by Ryan Dahl. You see, to get into why web development is so terrible, you have to start at Node. By definition I was a magpie developer, so undoubtedly I used Node, just as everyone should. At universities they should make every developer write an app with Node.js, deploy it to production, then try to update the dependencies 3 months later. The only downside is we would have zero new developers coming out of computer science programs. You see the Node.js philosophy is to take the worst fucking language ever designed and put it on the server. Combine that with all the magpies that were using Ruby at the time, and you have the perfect fucking storm. Lets take everything that was great in Ruby and re write it in Javascript, I think was the official motto. Most of the smart magpies have moved on to Go at this point, but the people who have stayed in the Node community have undoubtedly created the most over engineered eco system that has ever appeared.
    [Show full text]
  • 16 Inspiring Women Engineers to Watch
    Hackbright Academy Hackbright Academy is the leading software engineering school for women founded in San Francisco in 2012. The academy graduates more female engineers than UC Berkeley and Stanford each year. https://hackbrightacademy.com 16 Inspiring Women Engineers To Watch Women's engineering school Hackbright Academy is excited to share some updates from graduates of the software engineering fellowship. Check out what these 16 women are doing now at their companies - and what languages, frameworks, databases and other technologies these engineers use on the job! Software Engineer, Aclima Tiffany Williams is a software engineer at Aclima, where she builds software tools to ingest, process and manage city-scale environmental data sets enabled by Aclima’s sensor networks. Follow her on Twitter at @twilliamsphd. Technologies: Python, SQL, Cassandra, MariaDB, Docker, Kubernetes, Google Cloud Software Engineer, Eventbrite 1 / 16 Hackbright Academy Hackbright Academy is the leading software engineering school for women founded in San Francisco in 2012. The academy graduates more female engineers than UC Berkeley and Stanford each year. https://hackbrightacademy.com Maggie Shine works on backend and frontend application development to make buying a ticket on Eventbrite a great experience. In 2014, she helped build a WiFi-enabled basal body temperature fertility tracking device at a hardware hackathon. Follow her on Twitter at @magksh. Technologies: Python, Django, Celery, MySQL, Redis, Backbone, Marionette, React, Sass User Experience Engineer, GoDaddy 2 / 16 Hackbright Academy Hackbright Academy is the leading software engineering school for women founded in San Francisco in 2012. The academy graduates more female engineers than UC Berkeley and Stanford each year.
    [Show full text]
  • CROP: Linking Code Reviews to Source Code Changes
    CROP: Linking Code Reviews to Source Code Changes Matheus Paixao Jens Krinke University College London University College London London, United Kingdom London, United Kingdom [email protected] [email protected] Donggyun Han Mark Harman University College London Facebook and University College London London, United Kingdom London, United Kingdom [email protected] [email protected] ABSTRACT both industrial and open source software development communities. Code review has been widely adopted by both industrial and open For example, large organisations such as Google and Facebook use source software development communities. Research in code re- code review systems on a daily basis [5, 9]. view is highly dependant on real-world data, and although existing In addition to its increasing popularity among practitioners, researchers have attempted to provide code review datasets, there code review has also drawn the attention of software engineering is still no dataset that links code reviews with complete versions of researchers. There have been empirical studies on the effect of code the system’s code base mainly because reviewed versions are not review on many aspects of software engineering, including software kept in the system’s version control repository. Thus, we present quality [11, 12], review automation [2], and automated reviewer CROP, the Code Review Open Platform, the first curated code recommendation [20]. Recently, other research areas in software review repository that links review data with isolated complete engineering have leveraged the data generated during code review versions (snapshots) of the source code at the time of review. CROP to expand previously limited datasets and to perform empirical currently provides data for 8 software systems, 48,975 reviews and studies.
    [Show full text]
  • Crawling Code Review Data from Phabricator
    Friedrich-Alexander-Universit¨atErlangen-N¨urnberg Technische Fakult¨at,Department Informatik DUMITRU COTET MASTER THESIS CRAWLING CODE REVIEW DATA FROM PHABRICATOR Submitted on 4 June 2019 Supervisors: Michael Dorner, M. Sc. Prof. Dr. Dirk Riehle, M.B.A. Professur f¨urOpen-Source-Software Department Informatik, Technische Fakult¨at Friedrich-Alexander-Universit¨atErlangen-N¨urnberg Versicherung Ich versichere, dass ich die Arbeit ohne fremde Hilfe und ohne Benutzung anderer als der angegebenen Quellen angefertigt habe und dass die Arbeit in gleicher oder ¨ahnlicherForm noch keiner anderen Pr¨ufungsbeh¨ordevorgelegen hat und von dieser als Teil einer Pr¨ufungsleistung angenommen wurde. Alle Ausf¨uhrungen,die w¨ortlich oder sinngem¨aߨubernommenwurden, sind als solche gekennzeichnet. Nuremberg, 4 June 2019 License This work is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0), see https://creativecommons.org/licenses/by/4.0/ Nuremberg, 4 June 2019 i Abstract Modern code review is typically supported by software tools. Researchers use data tracked by these tools to study code review practices. A popular tool in open-source and closed-source projects is Phabricator. However, there is no tool to crawl all the available code review data from Phabricator hosts. In this thesis, we develop a Python crawler named Phabry, for crawling code review data from Phabricator instances using its REST API. The tool produces minimal server and client load, reproducible crawling runs, and stores complete and genuine review data. The new tool is used to crawl the Phabricator instances of the open source projects FreeBSD, KDE and LLVM. The resulting data sets can be used by researchers.
    [Show full text]
  • Domestic Biometric Data Operator
    Domestic Biometric Data Operator Sector: IT–ITeS Class XI Domestic Biometric Data Operator (Job Role) Qualification Pack: Ref. Id. SSC/Q2213 Sector: Information Technology and Information Technology enabled Services (IT-ITeS) 171105 NCERT ISBN 978-81-949859-6-9 Textbook for Class XI 2021-22 Cover I_IV.indd All Pages 19-Mar-21 12:56:47 PM ` ` 100/pp.112 ` 90/pp.100 125/pp.136 Code - 17918 Code - 17903 Code - 17913 ISBN - 978-93-5292-075-4 ISBN - 978-93-5292-074-7 ISBN - 978-93-5292-083-9 ` 100/pp.112 ` 120/pp. 128 ` 85/pp. 92 Code - 17914 Code - 17946 Code - 171163 ISBN - 978-93-5292-082-2 ISBN - 978-93-5292-068-6 ISBN - 978-93-5292-080-8 ` 65/pp.68 ` 85/pp.92 ` 100/pp.112 Code - 17920 Code - 171111 Code - 17945 ISBN - 978-93-5292-087-7 ISBN - 978-93-5292-086-0 ISBN - 978-93-5292-077-8 For further enquiries, please visit www.ncert.nic.in or contact the Business Managers at the addresses of the regional centres given on the copyright page. 2021-22 Cover II_III.indd 3 10-Mar-21 4:37:56 PM Domestic Biometric Data Operator (Job Role) Qualification Pack: Ref. Id. SSC/Q2213 Sector: Information Technology and Information Technology enabled Services (IT-ITeS) Textbook for Class XI 2021-22 Prelims.indd 1 19-Mar-21 2:56:01 PM 171105 – Domestic Biometric Data Operator ISBN 978-81-949859-6-9 Vocational Textbook for Class XI First Edition ALL RIGHTS RESERVED March 2021 Phalguna 1942 No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without the prior PD 5T BS permission of the publisher.
    [Show full text]
  • Letter, If Not the Spirit, of One Or the Other Definition
    Producing Open Source Software How to Run a Successful Free Software Project Karl Fogel Producing Open Source Software: How to Run a Successful Free Software Project by Karl Fogel Copyright © 2005-2021 Karl Fogel, under the CreativeCommons Attribution-ShareAlike (4.0) license. Version: 2.3214 Home site: https://producingoss.com/ Dedication This book is dedicated to two dear friends without whom it would not have been possible: Karen Under- hill and Jim Blandy. i Table of Contents Preface ............................................................................................................................. vi Why Write This Book? ............................................................................................... vi Who Should Read This Book? ..................................................................................... vi Sources ................................................................................................................... vii Acknowledgements ................................................................................................... viii For the first edition (2005) ................................................................................ viii For the second edition (2021) .............................................................................. ix Disclaimer .............................................................................................................. xiii 1. Introduction ...................................................................................................................
    [Show full text]
  • Jetbrains Upsource Comparison Upsource Is a Powerful Tool for Teams Wish- Key Benefits Ing to Improve Their Code, Projects and Pro- Cesses
    JetBrains Upsource Comparison Upsource is a powerful tool for teams wish- Key benefits ing to improve their code, projects and pro- cesses. It serves as a polyglot code review How Upsource Compares to Other Code Review Tools tool, a source of data-driven project ana- lytics, an intelligent repository browser and Accuracy of Comparison a team collaboration center. Upsource boasts in-depth knowledge of Java, PHP, JavaScript, Integration with JetBrains Tools Python, and Kotlin to increase the efcien- cy of code reviews. It continuously analyzes Sales Contacts the repository activity providing a valuable insight into potential design problems and project risks. On top of that Upsource makes team collaboration easy and enjoyable. Key benefits IDE-level code insight to help developers Automated workflow, to minimize manual tasks. Powerful search engine. understand and review code changes more efectively. Smart suggestion of suitable reviewers, revi- IDE plugins that allow developers to partici- sions, etc. based on historical data and intel- pate in code reviews right from their IDEs. Data-driven project analytics highlighting ligent progress tracking. potential design flaws such as hotspots, abandoned files and more. Unified access to all your Git, Mercurial, Secure, and scalable. Perforce or Subversion projects. To learn more about Upsource, please visit our website at jetbrains.com/upsource. How Upsource Compares to Other Code Review Tools JetBrains has extensively researched various As all the products mentioned in the docu- tools to come up with a useful comparison ment are being actively developed and their table. We tried to make it as comprehensive functionality changes on a regular basis, this and neutral as we possibly could.
    [Show full text]
  • Phabricator 538D8f2... Overview
    Institute of Computational Science Phabricator 538d8f2... overview Dmitry Mikushin (for the Bugs Course) . October 17, 2013 Dmitry Mikushin Test-drive at http://devel.kernelgen.org October 17, 2013 1 / 14 . What is Phabricator? The LAMP-based web-server + command-line client for: peer code review task management project communication And it’s open-source Dmitry Mikushin Test-drive at http://devel.kernelgen.org October 17, 2013 2 / 14 . Test drive! Browse to http://devel.kernelgen.org, login and look around Dmitry Mikushin Test-drive at http://devel.kernelgen.org October 17, 2013 3 / 14 . Key features Peer code review Integrated environment for request reviewing, tasks/bugs and versioning system - in web environment - in command line Ergonomic task interface Dmitry Mikushin Test-drive at http://devel.kernelgen.org October 17, 2013 4 / 14 . Key features Peer code review Integrated environment for request reviewing, tasks/bugs and versioning system - in web environment - in command line Ergonomic task interface Dmitry Mikushin Test-drive at http://devel.kernelgen.org October 17, 2013 5 / 14 . Peer code review: traditional 1 RFC: Developer publishes a patch with the source and tests 2 Other developers comment on the patch issues/improvements 3 After all issues are addressed, reviewers ACK patch for commit 4 Developer himself or someone with rw rights commits the patch Everything is over email Relationship with Bugs: fixes for PRs are also reviewed this way Dmitry Mikushin Test-drive at http://devel.kernelgen.org October 17, 2013 6 / 14 . Peer code review: Phabricator approach 1 Developer check-ins the patch to the Phabricator over the command line (passing lint/unit tests, if any) $ arc diff .
    [Show full text]
  • Development and Deployment at Facebook
    Development and Deployment at Facebook Dror G. Feitelson Eitan Frachtenberg Kent L. Beck Hebrew University Facebook Facebook Abstract More than one billion users log in to Facebook at least once a month to connect and share content with each other. Among other activities, these users upload over 2.5 billion content items every day. In this article we describe the development and deployment of the software that supports all this activity, focusing on the site’s primary codebase for the Web front-end. Information on Facebook’s architecture and other software components is available elsewhere. Keywords D.2.10.i Rapid prototyping; D.2.18 Software Engineering Process; D.2.19 Software Quality/SQA; D.2.2.c Distributed/Internet based software engineering tools and techniques; D.2.5.r Testing tools; D.2.7.e Evolving Internet applications. Facebook’s main development characteristics are speed and growth. The front-end is under continuous development by hundreds of software engineers. These engineers commit code to the version control system up to 500 times a day, recording changes in some 3,000 files. Naturally, codebase size unique developers by week commits per month 14 800 10 700 12 600 10 8 500 8 6 400 6 300 4 4 200 LoC [millions] 2 100 2 active developers 0 0 0 ’05 ’06 ’07 ’08 ’09 ’10 ’11 ’12 ’05 ’06 ’07 ’08 ’09 ’10 ’11 ’12 number of commits [1000s] ’05 ’06 ’07 ’08 ’09 ’10 ’11 ’12 Figure 1: Different aspects of Facebook growth: growth of the number of engineers working on the code, growth in the total activity of these engineers, and growth of the codebase itself.
    [Show full text]
  • Crystal Reports Activex Designer
    Quiz List—Reading Practice Page 1 Printed Wednesday, March 18, 2009 2:36:33PM School: Churchland Academy Elementary School Reading Practice Quizzes Quiz Word Number Lang. Title Author IL ATOS BL Points Count F/NF 9318 EN Ice Is...Whee! Greene, Carol LG 0.3 0.5 59 F 9340 EN Snow Joe Greene, Carol LG 0.3 0.5 59 F 36573 EN Big Egg Coxe, Molly LG 0.4 0.5 99 F 9306 EN Bugs! McKissack, Patricia C. LG 0.4 0.5 69 F 86010 EN Cat Traps Coxe, Molly LG 0.4 0.5 95 F 9329 EN Oh No, Otis! Frankel, Julie LG 0.4 0.5 97 F 9333 EN Pet for Pat, A Snow, Pegeen LG 0.4 0.5 71 F 9334 EN Please, Wind? Greene, Carol LG 0.4 0.5 55 F 9336 EN Rain! Rain! Greene, Carol LG 0.4 0.5 63 F 9338 EN Shine, Sun! Greene, Carol LG 0.4 0.5 66 F 9353 EN Birthday Car, The Hillert, Margaret LG 0.5 0.5 171 F 9305 EN Bonk! Goes the Ball Stevens, Philippa LG 0.5 0.5 100 F 7255 EN Can You Play? Ziefert, Harriet LG 0.5 0.5 144 F 9314 EN Hi, Clouds Greene, Carol LG 0.5 0.5 58 F 9382 EN Little Runaway, The Hillert, Margaret LG 0.5 0.5 196 F 7282 EN Lucky Bear Phillips, Joan LG 0.5 0.5 150 F 31542 EN Mine's the Best Bonsall, Crosby LG 0.5 0.5 106 F 901618 EN Night Watch (SF Edition) Fear, Sharon LG 0.5 0.5 51 F 9349 EN Whisper Is Quiet, A Lunn, Carolyn LG 0.5 0.5 63 NF 74854 EN Cooking with the Cat Worth, Bonnie LG 0.6 0.5 135 F 42150 EN Don't Cut My Hair! Wilhelm, Hans LG 0.6 0.5 74 F 9018 EN Foot Book, The Seuss, Dr.
    [Show full text]
  • Zerorpc) by Jérôme Petazzoni from Dot- Cloud 73 10.1 Introduction
    Marc’s PyCon 2012 Notes Documentation Release 1.0 Marc Abramowitz January 27, 2014 Contents 1 Stop Mocking, Start Testing by Augie Fackler and Nathaniel Manista from Google Code3 1.1 Modern Mocking.............................................4 1.2 Testing Today..............................................4 1.3 Injected dependencies..........................................5 1.4 Separate state from behavior.......................................5 1.5 Define interfaces between components.................................5 1.6 Decline to write a test when there’s no clear interface..........................5 1.7 Thank you................................................5 1.8 Questions.................................................6 2 Fast test, slow test by Gary Bernhardt from destroyallsoftware.com7 2.1 Goals of tests...............................................7 2.2 How To Fail...............................................9 2.3 Unit tests................................................. 10 2.4 The End................................................. 11 2.5 Questions................................................. 11 3 Speedily Practical Large-Scale Tests with Erik Rose from Votizen 13 3.1 Die, setUp(), die............................................. 13 3.2 Die, fixtures, die............................................. 13 4 Fake It Til You Make It: Unit Testing Patterns With Mocks And Fakes by Brian K. Jones 17 4.1 Your Speaker............................................... 17 4.2 What’s covered.............................................
    [Show full text]
  • Sapfix: Automated End-To-End Repair at Scale
    SapFix: Automated End-to-End Repair at Scale A. Marginean, J. Bader, S. Chandra, M. Harman, Y. Jia, K. Mao, A. Mols, A. Scott Facebook Inc. Abstract—We report our experience with SAPFIX: the first In order to deploy such a fully automated end-to-end detect- deployment of automated end-to-end fault fixing, from test case and-fix process we naturally needed to combine a number of design through to deployed repairs in production code1. We have different techniques. Nevertheless the SAPFIX core algorithm used SAPFIX at Facebook to repair 6 production systems, each consisting of tens of millions of lines of code, and which are is a simple one. Specifically, it combines straightforward collectively used by hundreds of millions of people worldwide. approaches to mutation testing [8], [9], search-based software testing [6], [10], [11], and fault localisation [12] as well as INTRODUCTION existing developer-designed test cases. We also needed to Automated program repair seeks to find small changes to deploy many practical engineering techniques and develop software systems that patch known bugs [1], [2]. One widely new engineering solutions in order to ensure scalability. studied approach uses software testing to guide the repair SAPFIX combines a mutation-based technique, augmented by process, as typified by the GenProg approach to search-based patterns inferred from previous human fixes, with a reversion-as- program repair [3]. last resort strategy for high-firing crashes (that would otherwise Recently, the automated test case design system, Sapienz block further testing, if not fixed or removed). This core fixing [4], has been deployed at scale [5], [6].
    [Show full text]