PiCo: A Domain-Specific Language for Data Analytics Pipelines

University of Torino
Doctoral School on Science and High Technology
Computer Science Department

Doctoral Thesis

Author: Claudia Misale
Supervisor: Prof. Marco Aldinucci
Co-Supervisor: Prof. Guy Tremblay
Cycle XVIII

A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy

"Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better."
Edsger W. Dijkstra, "On the nature of Computing Science", Marktoberdorf, 10 August 1984

Abstract

In the world of Big Data analytics, there is a series of tools that aim to simplify the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as is often seen in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that a clearer and simpler semantics is preferable, together with a strong separation of concerns.
For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, which sits on top of a stack of layers that form a prototypical framework for Big Data analytics.

The contribution of this thesis is twofold. First, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit in each level. Second, we propose a programming environment based on this layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. This model is intended to give the user a unique interface for both stream and batch processing, completely hiding data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world.

Acknowledgements

These four years have been really short. I barely noticed them going by. I have learned a lot, and there are still many things I do not understand how to do (for instance, I am not able to review papers in a polite and elegant way, I did not learn to speak in public, I am not able to summarize meetings...), and I still make my supervisor, Professor Marco Aldinucci, very angry with me. It is not enough to simply say "thank you" to him, especially for the patience he has had with me, from all points of view.

I arrived in Turin without knowing that much about Prof. Aldinucci, and I asked him to be my supervisor. I could not have made a better choice. Everything I know I owe to him. The many hours I spent listening to him explain all the aspects of this research area (even if, from his point of view, there are always only two aspects) opened my mind and made me grow a lot, even if I still have much to learn. Four years are not enough, actually. He always pushed me to do my best, to look inside things, and I will always try to bring his teachings with me, wherever I go. Furthermore, his pleasantness and his way of doing things have been perfect company during these years. Yes, even the insults were! Thank you, theboss. Thank you so much.

There are a few people I want to thank, because they gave me so much and I will always be in debt to them.

I would like to start with my co-supervisor, Professor Guy Tremblay. Thank you so much for following my work and for helping me so much while writing this thesis, and thank you for helping me understand topics that were new to both of us: your great experience has been enlightening, and you have been a very important guide. Thank you even more for all the advice about the presentation; I tried to follow all of it as best I could, and I hope I lived up to it.

I would like to thank the reviewers for the great job they did, their impressive knowledge and experience, and the time they dedicated to my work: their comments have been precious and very detailed. Their advice allowed me to improve the manuscript and, above all, to review many aspects that my inexperience does not yet allow me to see.

My deepest thanks go to my colleagues: a gang of unique people with whom I have shared so much and who have given me so much.
Our "we won't stay out late" evenings that ended at 4 or 5 in the morning, the big meals, the outings, the games, the thousands of conversations, the Gossip Girlz... Thank you for everything you have given me.

Thanks to Caterina, my age-old friend. You know everything about me, what else should I tell you. We grew up together during the five years of university at UniCal, where we got up to all sorts of things, and you have supported me during these years: you too are a pillar of this doctorate and a pillar of my life.

I spent almost all the lunch breaks of my doctorate at the gym "beating people up", as my colleagues say. The kick-boxing/muay-thai classes were absolutely a drug, and what made them so was, first of all, the people I met there. Thanks to Maestro Beppe and the guys, I learned to appreciate this magnificent sport made of discipline, respect and self-control, which allowed me to push myself beyond what I thought were my physical and mental limits. Thank you, Maestro, for making me discover a side of myself that was unknown to me, and thank you for the person and friend that you are. Thanks to all my "fighting friends": thank you for the blows you gave me, for the perfect company you are both inside and outside the gym, and for helping me release the anxiety accumulated every day.

Thanks to Roberta, my absolute number one fighting friend. We shared so much at the gym and we grew together; you made me grow and you taught me so much. Above all, you listened to and put up with all my outbursts and my frustrations as a PhD student. I will never thank you enough.