DEFIne: A Fluent Interface DSL for Deep Learning Applications

Nina Dethlefs
The Digital Centre, School of Engineering and Computer Science, University of Hull, UK
[email protected]

Ken Hawick
The Digital Centre, School of Engineering and Computer Science, University of Hull, UK
[email protected]

ABSTRACT
Recent years have seen a surge of interest in deep learning models that outperform other machine learning algorithms on benchmarks across many disciplines. Most existing deep learning libraries facilitate the development of neural nets by providing a mathematical framework that helps users implement their models more efficiently. This still represents a substantial investment of time and effort, however, when the intention is to compare a range of competing models quickly for a specific task. We present DEFIne, a fluent interface DSL for the specification, optimisation and evaluation of deep learning models. The fluent interface is implemented through method chaining. DEFIne is embedded in Python and is built on top of its most popular deep learning libraries, Keras and Theano. It extends these with common operations for data pre-processing and representation as well as visualisation of datasets and results. We test our framework on three benchmark tasks from different domains: heart disease diagnosis, hand-written digit recognition and weather forecast generation. Results in terms of accuracy, runtime and lines of code show that our DSL achieves equivalent accuracy and runtime to state-of-the-art models, while requiring only about 10 lines of code per application.

CCS Concepts
• Software engineering → Reusable software; • Artificial Intelligence → Learning;

Keywords
Domain-specific languages, deep learning

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
RWDSL '17, February 04 2017, Austin, TX, USA
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-4845-4/17/02. $15.00
DOI: http://dx.doi.org/10.1145/3039895.3039898

1. INTRODUCTION
Deep learning has received a lot of interest in recent years in both academic and industrial contexts. It is increasingly used in applications such as stock market prediction [17] or medical imaging [23], natural language processing [42], computer vision [18], and artificial intelligence in general [5], in some tasks even exceeding human performance [53].

Despite the great success of deep learning and an increasing number of available libraries and frameworks, a substantial amount of parameter tuning is required to develop an effective model for a new domain; see e.g. the random search solution in [8]. There are few guidelines on how many layers a deep learning model should use, or what activation, loss function and optimiser will lead to an adequate representation of the input data, as these details depend crucially on the target data [6]. Developers will typically start from a basic set of rules of thumb and then experiment to find the best setup.

In this paper, we present a fluent interface DSL that aims to facilitate the process of developing deep learning models for new domains. A fluent interface [19] is one where syntactic features of the hosting language are used to good effect to construct a DSL that is embedded in a host language and that captures the jargon, the commands and other notions of the requisite application domain [20, 31]. Our fluent interface is implemented through method chaining as explained in Section 4. The general idea is to abstract away from implementational details and integrate standard operations in data representation or model definition into a DSL that is embedded in Python and built directly on top of the most popular deep learning libraries, Keras [10] and Theano [58]. We present experiments in three different domains that are relevant to real-world applications: heart disease diagnosis, hand-written digit recognition and weather forecast generation. Our results in terms of accuracy, runtime and lines of code suggest that our DSL is able to provide sufficient flexibility to programmers to express the representational and mathematical peculiarities of individual domains, while at the same time enhancing the readability and maintainability of code. The research contributions of this paper are:

1. A DSL framework that performs automatic data analysis, pre-processing and choice of hyper-parameters for deep learning applications; see Section 4.

2. An evaluation in three different application domains to demonstrate the flexibility of the DSL across datasets; see Section 5 for the datasets and Section 6 for results.

3. A reduction in code size by up to a factor of 5 at equivalent performance to recent state-of-the-art results; see Section 6.

2. BACKGROUND
Computer programming language designers have a major goal of helping programmers express ideas and algorithms concisely and clearly. Even the most elegant programming languages, however, sometimes falter in this goal when programs become bigger and more complex. The standard computer science "divide and conquer" approach is to abstract ideas and component parts of a large program into a framework or software library that can be separately developed, tested and hidden away, and invoked only when needed. This can considerably lower the amount of source code, and hence the number of concepts, that the programmer needs to hold on their screen or in their mind all at once and is a key to managing large-scale complex software development [62].

Many different techniques and tools have been introduced in modern programming languages to help this abstraction and lowering of code complexity, including: subroutines and functions; modules and packages; and classes and objects where data structures and operational code are combined together to form abstract data types. While these help, application developers are still often easily overwhelmed by the size and sheer complexity of programs [36]. These distractions take away effort from addressing the application problems, particularly in computational experiments, where each experiment must be carefully and repeatably programmed.

Different programming languages [52] have varied relative strengths, advantages and specialities appropriate to various sorts of applications. General-purpose programming languages often have a lot of legacy operational and idiomatic features that can obscure the essence of an application and sometimes make it unnecessarily complex for developers.

A relatively new technique is to create a domain-specific programming language (DSL) [19, 63, 21, 14] that provides the programmer with a very high-level vehicle to formulate application ideas. The goal is for the language to focus concisely on only those concepts and aspects that are directly relevant to the particular application problem domain and to hide away any "boiler plate" source code that is just an artefact of the underpinning programming language. The application domain is often a business problem using business jargon and language, or for the discussion below it could use the scientific terminology particular to complex deep learning systems.

A DSL can be implemented as either a full-blown language using all the necessary compiler [2] and language builder environment apparatus to aid the programmer [35]. [...] together into a back-end framework or library and allow them to be invoked by the application user through appropriate compact programming language features in a spanning DSL. DSLs are particularly effective when a whole family of problems [4] can be identified. Once a subset of special cases has been solved, it is often feasible and efficient to inductively design a DSL framework [32] that then addresses the whole family of problems rather than continuing to solve each individually. This is a very productive approach.

Spinellis [54] described some well-known usage patterns of DSLs, but although ideas such as language-oriented programming [66] have been reported in the programming literature since 1994, the development and deployment of DSLs is still a relatively new area with most activity reported only over the last 15 years [63, 19, 43, 39]. Many application domains, including simulations [25], business applications [51], image processing [55], database systems [41], materials physics problems [26] and other complex systems [27], could be supported in this manner [63, 54]. In the present paper we develop a DSL for deep learning applications.

DSLs are used in generating programming languages and tools themselves [16, 59], but other areas of reported successful DSL use to date include: communications and telephony [50, 15]; real-time embedded systems [24] and field programmable gate array device deployment applications [13]; distributed and computational grid applications [33]; and mathematical [9] and equation-based problem formulation [40, 28, 57]. Parallel computing [38] is also a promising area for use of DSLs, whereby multiple versions of a program suited to different parallel architectures could be generated by a single DSL specification.

DSL approaches are thus now being employed in many application areas [12]. At the time of writing, their use is still not completely widespread, although this appears to be accelerating as better development tools become available and as more positive user and programmer experiences are reported.

3. DEEP LEARNING FRAMEWORKS
Deep learning can be approached in a number of ways. In this section we present some background on the principal approaches, followed by an explanation of how existing libraries can be used.

3.1 Overview of deep learning models
Two aspects that are particularly relevant to our present