Reactively Querying an Immutable Log of Relational Facts. Design And

Total Page:16

File Type:pdf, Size:1020Kb

Reactively Querying an Immutable Log of Relational Facts. Design And Reactively Querying an Immutable Log of Relational Facts Design and Implementation of a Bitemporal EAV Database with a Subscription Query Language Master Thesis Submitted in partial fulfillment of the requirements for the degree of Master of Science in Engineering to the University of Applied Sciences FH Campus Wien Master Degree Program Software Design and Engineering Author: Albert Zak Student identification number: 1810838004 Supervisor: Priv.-Doz. Mag.rer.soc.oec. Dipl.-Ing. Dipl.-Ing. Dr.techn. Karl Michael Göschka Date: 2020-05-31 Declaration of authorship: I declare that this Master Thesis has been written by myself. I have not used any other than the listed sources, nor have I received any unauthorized help. I hereby certify that I have not submitted this Master Thesis in any form (to a reviewer for assessment) either in Austria or abroad. Furthermore, I assure that the (printed and electronic) copies I have submitted are identical. Date: Signature: Abstract Growing customer demands lead to increased incidental complexity of data-intensive distributed applications. Relational Database Management Systems (RDBMS), es- pecially those implementing Structured Query Language (SQL) possess performant but destructive-by-default write semantics. Modeling domains such as healthcare in classic RDBMS leads to an explosion in the number of columns and tables. Their structure has to be known in advance, discouraging an explorative development process. Requirements of real time collaboration and auditability of changes push the limit of established paradigms. The core of the data layer presented in this thesis is a simple relational model based on facts in the form of Entity-Attribute-Value (EAV) triples. A central append- only immutable log accretes these facts via assertions and retractions. Transactions guarantee atomicity/consistency/isolation/durability (ACID) and are themselves first class queryable entities carrying arbitrary meta facts, realizing strict bitemporal auditability of all changes by keeping two timestamps: transaction time tx and valid time tv. Changes are replicated to clients which subscribe to the result of a query. Multiple incrementally maintained indices (EAVT, AEVT, AVET, VAET) grant efficient direct access to tuples, regarding the database analogous to a combined graph, column, and document store. The database itself is an immutable value which can be passed within the program and queried locally. This work demonstrates the feasibility of implementing various desirable features in less than 400 lines of Clojure: bitemporality, audit logging, transactions, server- client reactivity, consistency criteria, derived facts, and a simple relational query language based on Datalog. i Key Terms EAV Database Bitemporal Immutable Datalog Lisp ii List of Abbreviations tv valid time tx transaction time 6NF 6th normal form ACID Atomicity, Consistency, Isolation, Durability API Application Programming Interface CAP Consistency, Availability, Partition tolerance CQRS Command Query Responsibility Segregation CRDT Conflict-free Replicated Data Type CRUD Create, Read, Update, Delete DDD Domain-Driven Design DDP Decentralized Data Protocol DRP Distributed Reactive Programming DSL Domain-specific language edn Extensible Data Notation ES Event Sourcing FRP Functional-Reactive Programming IETF Internet Engineering Task Force JSON JavaScript Object Notation lvar logic variable MUMPS Massachusetts General Hospital Utility Multi- Programming System ORM Object-Relational Mapping OT Operational Transformation PACELC Partitions: Availability or Consistency, Else Latency or Consistency iii RAD Rapid Application Development RDBMS Relational Database Management Systems RDF Resource Description Framework REST Representational State Transfer RPC Remote Procedure Call SaaS Software-as-a-Service SPA Single Page Application SQL Structured Query Language UI User Interface UUID Universally Unique Identifier XML Extensible Markup Language iv Contents 1 Introduction 1 1.1 Problem . .3 1.2 Contribution . .3 1.3 State of the art . .4 2 Related Work 8 3 Design 12 3.1 Problems . 12 3.2 Goals . 14 3.3 Non-goals . 15 3.4 Conceptual model . 15 3.5 Query language . 17 4 Implementation 19 4.1 Data model . 21 4.2 Writing . 23 4.3 Querying . 27 4.4 Publishing and subscribing . 30 5 Discussion 32 5.1 Advantages . 32 5.2 Limitations . 33 6 Future Work 35 7 Conclusion 37 v 1 Introduction This thesis describes a possible design and a proof-of-concept implementation of a data layer for near real time distributed collaborative applications targeted at operational workloads in small to medium sized businesses with strict auditability requirements. Context. The ideas presented in this work are based on the author’s experience running a Software-as-a-Service (SaaS) business developing and operating a highly available information system in the medical sector for the past five years (see figure 1). The deployed system covers all of the daily operational needs of medium-sized practices, handling medical records, personal data, and associated imagery for more than 80.000 patients a year, scheduling appointments, employees, departments, and rooms subject to complex business rules constraining available treatments, forecasting, planning and reporting clinical workload, sending appointment reminders, handling inbound calls and messages from patients, recording internal referrals and escalations, along with support for distributed staff (e.g. external call centers, telemedicine) owing to fine-grained roles and authorization controls, thorough audit logging of all changes, and compliance with relevant regulations. Figure 1: The author’s medical information system 1 Background. The system is built on top of the open source Meteor framework (presented in section 2) and MongoDB, making heavy use of their "live query" mechanisms, along with Redux, React and Minimongo in the front end, packaged with Electron and React Native. To increase availability, the back end is deployed over three data centers using primary-backup replication. Currently, the tracked code base comprises about 42.000 lines of JavaScript. Being a system designed, developed, and operated by a single person, the primary challenge by far has been to keep runaway incidental complexity at bay. While the abstractions around "live" publication/subscription provided by the Meteor framework turned out to be very expressive, other design decisions of the underlying framework and database caused unnecessary complexity in the application code. Drawbacks include pervasive mutability, the need to denormalize data, the lack of a relational query language, no server-side joins, and no (bi-)temporality or notions of time or memory for auditing purposes. Outline. Section 2 presents related work. Section 3 exhibits common problems experienced in data-intensive business applications and shows a different approach towards solving some of them in a functional, immutable, and reactive way. A proof- of-concept implementation is described in section 4 and its merits and deficiencies are discussed in section 5. 2 1.1 Problem Classic Relational Database Management Systems (RDBMS) are not suitable for modeling sparse, hierarchical or multi-valued data in domains where evolution and dynamism are hard requirements. They also lack a notion of memory or change over time because writes are destructive by default. Attempts to add concepts of temporality to Structured Query Language (SQL) are complicated and, as a result, not widely used. Auditing changes to a mutable black box with no history is hard. Developers fear the networking aspects of sending queries on a round trip "over there" [Hic12a] to the database, as opposed to having the data in memory and being able to query it directly. On top of all, customers demand increasingly interactive distributed collaboration environments, which pushes the limits of established request-response mechanisms. 1.2 Contribution This thesis explains a possible design of a data layer for business applications and describes its prototypal implementation. The core is a simple relational data model based on facts in the form of Entity-Attribute-Value (EAV) tuples similar to the 6th normal form (6NF) or the Resource Description Framework (RDF) as seen in Semantic Web literature. Such EAV tuples are accreted in an append-only (immutable) log as assertions and retractions together with two timestamps: transaction time (tx) at which the fact was added to the log, and valid time (tv) at which the fact became true in the wider context of the system [Sno92] i.e., the real world. Goal. The original goal of this work was to demonstrate the feasibility of an im- plementation of various desirable data layer features in less than 1000 lines total of readily comprehensible Lisp (Clojure, more specifically ClojureScript [Hic08]): Bitemporality, practically immutable audit logging, transaction metadata, server-client reactivity, schemalessness, consistency criteria, derived facts, versatile indexing, transactions with guarantees of Atomicity, Consistency, Isolation, Durability (ACID), and a simple relational query language based on Datalog. Method. The thesis explains the decisions made while iterating on the design and its implementation and includes a discussion on limitations and advantages. Results. Embracing the EAV+tx+tv data model together with the expressive pro- gramming language ClojureScript reveals that it is possible to implement a proof- of-concept data layer and query language for business applications with various
Recommended publications
  • Specification for JSON Abstract Data Notation Version
    Standards Track Work Product Specification for JSON Abstract Data Notation (JADN) Version 1.0 Committee Specification 01 17 August 2021 This stage: https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/jadn-v1.0-cs01.md (Authoritative) https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/jadn-v1.0-cs01.html https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/jadn-v1.0-cs01.pdf Previous stage: https://docs.oasis-open.org/openc2/jadn/v1.0/csd02/jadn-v1.0-csd02.md (Authoritative) https://docs.oasis-open.org/openc2/jadn/v1.0/csd02/jadn-v1.0-csd02.html https://docs.oasis-open.org/openc2/jadn/v1.0/csd02/jadn-v1.0-csd02.pdf Latest stage: https://docs.oasis-open.org/openc2/jadn/v1.0/jadn-v1.0.md (Authoritative) https://docs.oasis-open.org/openc2/jadn/v1.0/jadn-v1.0.html https://docs.oasis-open.org/openc2/jadn/v1.0/jadn-v1.0.pdf Technical Committee: OASIS Open Command and Control (OpenC2) TC Chair: Duncan Sparrell ([email protected]), sFractal Consulting LLC Editor: David Kemp ([email protected]), National Security Agency Additional artifacts: This prose specification is one component of a Work Product that also includes: JSON schema for JADN documents: https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/schemas/jadn-v1.0.json JADN schema for JADN documents: https://docs.oasis-open.org/openc2/jadn/v1.0/cs01/schemas/jadn-v1.0.jadn Abstract: JSON Abstract Data Notation (JADN) is a UML-based information modeling language that defines data structure independently of data format.
    [Show full text]
  • Performance Comparison of Data Serialization Schemes for ETSI ITS Car-To-X Communication Systems
    International Journal on Advances in Telecommunications, vol 8 no 1 & 2, year 2015, http://www.iariajournals.org/telecommunications/ 48 Performance Comparison of Data Serialization Schemes for ETSI ITS Car-to-X Communication Systems Sebastian Bittl, Arturo A. Gonzalez, Michael Spahn,¨ and Wolf A. Heidrich Fraunhofer ESK Munich, Germany Email: fsebastian.bittl, arturo.gonzalez, michael.spaehn, [email protected] Abstract—Wireless Car-to-X communication is about to enter sense collision avoidance mechanism (CSMA-CA) for multi- the mass market in upcoming years. The non-licensed, but user channel access, longer packets imply longer transmission reserved bandwidth for such communications is relatively narrow times and hence a higher channel busy ratio. This means that in terms of the few usable channels and the expected number nodes near a transmitter, which are willing to transmit as well, of communicating entities. Among other communication aspects, must wait longer until the channel is free to use. Moreover, data serialization schemes that allow compact encoding as well as the wireless channel has a limited capacity allowing only a fast encoding and decoding are required. Compact representation of data helps in keeping a minimal bandwidth overhead in limited number of transmitted bits per unit of time. The lower the wireless channel. Moreover, since delay is an important the number of transmitted messages, the higher the number of performance parameter in Car-to-X communication, the pro- users that could use the given frequency spectrum in parallel. cessing time of the serialization/deserialization schemes shall be kept to a minimum. So far, no detailed analysis of different The data serialization schemes also influence processing data serialization schemes for Car-to-X communication regarding power requirements in both ETSI Intelligent Transport Sys- important properties such as runtime, memory consumption tems (ITS) in Europe [2] and Wireless Access in Vehicular and encoded output length has been published.
    [Show full text]
  • Clojure Cookbook
    Clojure Cookbook Luke VanderHart and Ryan Neufeld Clojure Cookbook by Luke VanderHart and Ryan Neufeld Copyright © 2014 Cognitect, Inc. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/ institutional sales department: 800-998-9938 or [email protected]. Editor: Meghan Blanchette Indexer: Judith McConville Production Editor: Kristen Brown Cover Designer: Karen Montgomery Copyeditor: Amanda Kersey Interior Designer: David Futato Proofreader: Rachel Head Illustrator: Rebecca Demarest March 2014: First Edition Revision History for the First Edition: 2014-03-04: First release See http://oreilly.com/catalog/errata.csp?isbn=9781449366179 for release details. Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Clojure Cookbook, the image of an aardwolf, and related trade dress are trademarks of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. ISBN: 978-1-449-36617-9 [LSI] Table of Contents Preface.
    [Show full text]
  • Concise Binary Object Representation See Also: IETF94 CBOR Lightning Tutorial Carsten Bormann, 2015-11-01
    CBOR (RFC 7049) Concise Binary Object Representation See also: IETF94 CBOR lightning tutorial Carsten Bormann, 2015-11-01 http://www.tzi.de/~cabo/CBOR-2015-11-01.pdf 1 Slide stolen from Douglas Crockford History of Data Formats • Ad Hoc • Database Model • Document Model • Programming Language Model 2 Box notation TLV 3 XML XSD 4 JSON data model Primitive: Container: • null • “object” (map, with text • false, true string keys only) • numbers (decimal float) • array • text string (UTF-8) 5 CBOR data model Primitive: Container: • null (+ other “simple”) • map (any key) • false, true • array • numbers: • Integer • Tag (extension point) • Float16, 32, 64 • text string (UTF-8) • byte string 6 JSON limitations • No binary data (byte strings) • Numbers are in decimal, some parsing required • Format requires copying: • Escaping for strings • Base64 for binary • No extensibility (e.g., date format?) • Interoperability issues • I-JSON further reduces functionality (RFC 7493) 7 Data Formats Character- Concise based Binary Document- Oriented XML EXI Data- Oriented JSON ??? 8 Prof. Carsten Bormann, [email protected] BSON and friends • Lots of “binary JSON” proposals • Often optimized for data at rest, not protocol use (BSON ➔ MongoDB) • Most are more complex than JSON 9 Why a new binary object format? • Different design goals from current formats – stated up front in the document • Extremely small code size – for work on constrained node networks • Reasonably compact data size – but no compression or even bit-fiddling • Useful to any protocol or application that likes the design goals 10 Concise Binary Object Representation (CBOR) 11 “Sea Boar” “Sea Boar” 12 Graphics: Stefanie Gerdes Data Formats Character- Concise based Binary Document- Oriented XML EXI Data- Oriented JSON CBOR 13 Prof.
    [Show full text]
  • CBOR (RFC 7049) Concise Binary Object Representation IETF94 CBOR Lightning Tutorial Carsten Bormann, 2015-11-01
    CBOR (RFC 7049) Concise Binary Object Representation IETF94 CBOR lightning tutorial Carsten Bormann, 2015-11-01 1 CBOR: Agenda • What is it, and when might I want it? • How does it work? • How do I work with it? 2 CBOR: Agenda • What is it, and when might I want it? • How does it work? • How do I work with it? 3 Slide stolen from Douglas Crockford History of Data Formats • Ad Hoc • Database Model • Document Model • Programming Language Model Box notation TLV 5 XML XSD 6 Slide stolen from Douglas Crockford JSON • JavaScript Object Notation • Minimal • Textual • Subset of JavaScript Values • Strings • Numbers • Booleans • Objects • Arrays • null Array ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"] [ [0, -1, 0], [1, 0, 0], [0, 0, 1] ] Object { "name": "Jack B. Nimble", "at large": true, "grade": "A", "format": { "type": "rect", "width": 1920, "height": 1080, "interlace": false, "framerate": 24 } } Object Map { "name": "Jack B. Nimble", "at large": true, "grade": "A", "format": { "type": "rect", "width": 1920, "height": 1080, "interlace": false, "framerate": 24 } } Application usage of JSON • No schema needed for parsing • Representation types: bool, number, string, null; map, array • Semantics: • struct: key names in map (vs. table use of maps) • record: position in array (vs. vector use of arrays) • Extensibility: new keys in a key/value struct • Ignore what you don’t understand 12 JSON limitations • No binary data (byte strings) • Numbers are in decimal, some parsing required • Format requires copying: • Escaping for strings • Base64 for binary • No extensibility (e.g., date format?) • Interoperability issues • I-JSON further reduces functionality (RFC 7493) 13 Data Formats Character- Concise based Binary Document- Oriented XML EXI Data- Oriented JSON ??? 14 Prof.
    [Show full text]
  • FRED: Um Formato Para Representação E Troca De Dados
    Universidade de Brasília – UnB Faculdade UnB Gama – FGA Engenharia de Software FRED: Um formato para representação e troca de dados Autor: Matheus de Sousa Bernardo Orientador: Prof. Dr. Fábio Macêdo Mendes Brasília, DF 2019 Matheus de Sousa Bernardo FRED: Um formato para representação e troca de dados Monografia submetida ao curso de graduação em Engenharia de Software da Universidade de Brasília, como requisito parcial para ob- tenção do Título de Bacharel em Engenharia de Software. Universidade de Brasília – UnB Faculdade UnB Gama – FGA Orientador: Prof. Dr. Fábio Macêdo Mendes Brasília, DF 2019 Matheus de Sousa Bernardo FRED: Um formato para representação e troca de dados/ Matheus de Sousa Bernardo. – Brasília, DF, 2019- 79 p. : il. (algumas color.) ; 30 cm. Orientador: Prof. Dr. Fábio Macêdo Mendes Trabalho de Conclusão de Curso – Universidade de Brasília – UnB Faculdade UnB Gama – FGA , 2019. 1. representação de dados. 2. troca de dados. I. Prof. Dr. Fábio Macêdo Mendes. II. Universidade de Brasília. III. Faculdade UnB Gama. IV. FRED: Um formato para representação e troca de dados CDU 02:141:005.6 Matheus de Sousa Bernardo FRED: Um formato para representação e troca de dados Monografia submetida ao curso de graduação em Engenharia de Software da Universidade de Brasília, como requisito parcial para ob- tenção do Título de Bacharel em Engenharia de Software. Trabalho aprovado. Brasília, DF, 12 de Julho de 2019 – Data da aprovação do trabalho: Prof. Dr. Fábio Macêdo Mendes Orientador Prof. Dr. Renato Coral Sampaio Convidado 1 Prof. Dr. Bruno César Ribas Convidado 2 Brasília, DF 2019 Resumo Este projeto propõe a especificação e a implementação de um formato para troca ere- presentação de dados chamado FRED.
    [Show full text]
  • Clojure Applied: from Practice to Practitioner
    www.allitebooks.com www.allitebooks.com Early praise for Clojure Applied Starting your first Clojure job? Kicking off your first big open source Clojure project? This is the book you need to get started building Clojure projects. You may worry, “I don’t know what I don’t know.” This book points you to the things you need to know. ➤ Bridget Hillyer Software developer, LivingSocial While other Clojure books enumerate the beautiful, pristine qualities derived from immutability and functional programming, this book is about getting your hands very dirty. Clojure Applied takes you step by step through the process of designing, writing, and shipping robust, high-performance production systems. ➤ David Nolen Software developer, Cognitect It’s refreshing to see experts such as Ben and Alex distill their knowledge into a tome for the masses. Clojure Applied covers a myriad of subjects that perplexed me about Clojure on my own journey. I’m hopeful the book will short-circuit the journey for many. ➤ Ryan Neufeld Software consultant, founder, Homegrown Labs www.allitebooks.com Clojure Applied is the book I wish I had when I started writing Clojure. This book is packed with advice and tips based on years of real-world experience. I learned something new on what seemed like every page. Buy this book and take your Clojure code to the next level. ➤ Larry Staton, Jr. Senior software engineer, Hendrick Automotive Group I had a solid foundation and collection of recipes. When I was ready for the next level, Clojure Applied gave me application patterns and instructed me on how to apply them in an idiomatic way, quickly.
    [Show full text]
  • Les Fichiers De Texte Et Leurs Applications
    EPAI, Jérôme Frossard (2017) LES FICHIERS DE TEXTE ET LEURS APPLICATIONS Caractériser, mettre en forme et analyser des données Texte et informatique 1 En informatique on utilise beaucoup de texte, mais pas seulement du texte dans le langage ordinaire. On utilise du texte également pour : Communiquer avec une machine à travers un terminal avec un interpréteur de commandes (Bash, Cmd, PowerShell, etc.). Écrire le code source d’un programme (Java, JavaScript, etc.). Stocker des données structurées (XML, JSON, etc.). Représenter l’apparence, la structure et la mise en page du texte dans un document de texte (HTML, Open XML, ODF, TeX, etc.). Échanger des données entre des programmes à travers le réseau ou le système de fichiers (HTTP, CSV, XML, JSON, etc.). EPAI, Jérôme Frossard (2017) Fichier de texte 2 Contient une suite de caractères codés sous forme numérique selon une certaine norme (ASCII, ANSI, UTF-8, etc.). Ne contient pas d’information sur la norme utilisée pour coder les caractères. L’émetteur et le récepteur du fichier doivent s’accorder d’une manière ou d’une autre sur la norme à utiliser. Certains éditeurs de texte peuvent détecter la norme d’encodage, mais des erreurs de détection sont possibles. EPAI, Jérôme Frossard (2017) Éditeur de texte 3 Un éditeur de texte est un programme permettant de créer et de modifier des fichiers de textes. Ne doit pas être confondu avec un logiciel de traitement de texte (pas de structuration ou de mise en page du texte). Peut avoir des fonctions avancées comme l’autocomplétion ou la coloration syntaxique pour faciliter l’écriture de code (Java, HTML, TeX, Bash, etc.).
    [Show full text]
  • ICBO2015 International Conference on Biomedical Ontology 2005
    ICBO2015 International Conference on Biomedical Ontology 2005 Proceedings of the Main Conference July, 27-30 Lisbon, Portugal Preface ICBO2015 is the 6th International Conference on Biomedical Ontology held on July 26-30, 2015 in Lisboa. It is a forum for presentation and discussion of current work and new advances in Biomedical Ontologies. This conference explores directions of future research and facilitates collaboration between researchers, developers, practitioners and students. After a careful revision, the program committee has selected 16 regular papers, 5 early career papers, 14 posters and 9 demos for presentation and discussion in the main confer- ence. This volume contains all these papers. The main conference includes also two invited presentations, one by Helen Parkinson, entitled Using Ontologies in the Wild, and another by Egon Willighagen, entitled The role of ontologies in chem- and bioinformatics. The main conference was preceded by 2 workshops, 1 hackathon and 4 tutorials: • 4th International Workshop on Vaccine and Drug Ontology Studies (VDOS) • Third International Workshop on Definitions in Ontologies (IWOOD) • Genomics data standards - hackathon • Tawny-OWL: re-purposing software engineering for ontology building • Basic Formal Ontology (BFO) • OBO Tutorial: Best Practices for Ontology Use • Tutorial on the Biological Pathway Exchange (BioPAX) Ontology and Pathway Com- mons Database We would like to acknowledge all the organizing committee, the program committee, the additional reviewers, the volunteers, authors
    [Show full text]