PIOTR FULMAŃSKI NoSQL Theory and examples

EARLY ACCESS VERSION EDITION 1.0, JANUARY 2021

SIMPLE INTRODUCTION SERIES NoSQL. Theory and examples

Copyright © 2021, Piotr Fulmański All rights reserved www: https://fulmanski.pl email: [email protected] GitHub: https://github.com/fulmanp/NoSQL-Theory-and-examples

Edition: 1
First published: 1.0, January 12, 2021 (planned)
This edition: 1.0, January 2021 (early access)
Build number: 202101302359

ISBN-13: 978-83-957405-0-3 (eBook) While the author has used good faith efforts to ensure that the information and instructions contained in this work are accurate, the author disclaims all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. The author makes no warranty, express or implied, with respect to the material contained herein.

If any code samples, software or any other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

All trademarks and registered trademarks appearing in this book are the property of their respective owners.

Release Notes

(N – new, U – update)

Edition 1.0 release: February 2021 • Chapter 9: Time series (N)

Edition 1.0 release: January 2021 • Chapter 10: Apache Pig (N) • Change text highlighting (U)

Edition 1.0 release: October 2020, November 2020 • Fix layout

Edition 1.0 release: September 2020 • Migrate from iBooks Author to Pages, as iBooks Author has not been updated since July 1, 2020 • Preface (U) • Chapter 1: SQL, NoSQL, NewSQL (U) • Chapter 7: Graph stores (N) • Appendix D: SQL recursive queries (N)

Edition 1.0 release: May 2020 • Chapter 1: SQL, NoSQL, NewSQL (U) • Chapter 3: NoSQL (N) • Chapter 4: Column family stores (N) • Chapter 6: Document store (N) • Appendix A: JSON (N) • Appendix B: XML (N) • Appendix C: HTTP (N)

Edition 1.0 release: April 2020 • Preface (N) • Chapter 1: SQL, NoSQL, NewSQL (N) • Chapter 2: SQL (N) • Chapter 3: Key-value stores (N)

Table of contents

Preface

SQL, NoSQL, NewSQL
    Data and database
    SQL
    Big Data – big problem with data
    NoSQL
    NewSQL
    Summary

SQL. Relational model
    Toward relational supremacy
    Relational theory key concepts
    Normal forms
    Transactional model and ACID
    Codd's relational rules
    Keep SQL in the mainstream
    Summary

NoSQL
    Motivations
    BASE
    CAP theorem
    Consistency
    Summary

Column family stores
    The origins
    Hadoop
    HBase
    Summary
    Working with HBase

Key-value stores
    Basic ideas
    Key-value stores vs. relational databases
    Essential features of key-value databases
    Key is the key
    Values
    Summary
    Working with Riak

Document stores
    Basic ideas
    Document stores vs. relational databases
    Summary
    Working with Apache CouchDB – CRUD basics
    Working with Apache CouchDB – querying

Graph stores
    Do we really need another database type?
    Basic ideas
    Graph stores design
    Summary
    Working with …
    Working with Neo4j

Column databases
    Basic ideas

Time series databases
    Everything changes. New trends of the XXI century
    Do we really need time series databases?
    Data model
    Working with OpenTSDB

Apache Pig
    What Apache Pig is
    Pig Latin
    Working examples – basic information
    Working examples – beyond basics
    Summary

JSON
    Overview
    Syntax

XML
    Overview
    Tools

HTTP
    Overview
    Tools

SQL recursive queries
    Common table expressions
    Recursive queries

Bibliography
Tables and figures

Preface

Who this book is for

This book is addressed to all the people who want to understand what NoSQL is and what the reasons behind its rise were. It should be good both for someone with no computer science background and for those who have some IT experience but want to understand why the tools they work with every day look the way they do. I don't want to dive deep into the details of specific technologies or solutions. Instead, I want to explain why things are as they are. Wherever possible, a general way of thinking is presented. Just as you can't eat a cookie only by reading about it, it would be difficult to understand NoSQL without practical examples. Therefore, examples of the basic usage of specific databases are an integral part of the presented contents. I will skip all the details related to database features or installation processes that can be found on dedicated web pages. All examples are intended to give a basic understanding of how to Create, Read, Update and Delete data (the so-called CRUD set of operations). After reading this book you should be able to:

• use selected databases from the NoSQL family at a basic level;

• choose the right database depending on the task to be solved;

• know the strengths and weaknesses of the proposed solutions, both SQL-based and NoSQL-based.

Early access

This book is a work in progress, presented in an early access version. Early access allows me to publish and share some ideas before the final version appears. This way, by participating in early access, you may influence how the final version will look. English is not my native language and I know that I make a lot of mistakes, but I hope that the text is more than readable and at least a little bit understandable. I believe that everything can be better and there is always space for improvement. I can say that the Toyota Way is the way I live and work, focusing on continuous improvement and respect for people. That is why I would be very grateful if you somehow contribute to improving this book. Any comments, corrections and suggestions are more than welcome. I write this book not for myself but to share what I know with others, so help me make its contents better.

If this book is buggy or not complete yet, why isn't it free? Simply because I need money to finish it and to make it better. Everything costs. The most precious thing is time. I, like all other people, have to work to live and to support my family. And this consumes most of my days. Then I have a choice: play with my kids or write a book. I choose playing with my kids. I don't want to reduce the time spent with my family because I don't want to be a virtual parent. So I have just a little time for the book. If I could reduce my job engagement (from full-time to half-time) I could spend more time on the book.

I believe that there is no book like this on the market and I want to make it better and better. I can do this. I don't have to ask a publisher whether they agree to prepare another version. If something deserves improvement, I simply do it and publish right after that. By paying for this book you allow me to spend more time on it without sacrificing my family life. Having money, I can pay for professional translation and text correction, or simply buy better images.

What will you learn in this book?

I don't want to write this book and forget about it. My idea is to keep it as accurate and up to date as possible, so you can expect updates in the future even if I reach a stable "final" version. As I wrote above, there is always something to improve. For now, the book covers the following topics (chapter titles highlighted in red indicate finished or, if stated explicitly, in-progress content).

Preface This is what you are reading right now. Here I explain what you can expect in this book. I also try to convince you to actively participate in shaping its contents.

Chapter 1 SQL, NoSQL, NewSQL In this chapter a general overview of the huge *SQL database landscape is given. We will try to answer the question of why there are so many different families of databases. Are they really needed, or is it just a mindless rush? If we need them, what factors play a crucial role? What were the main reasons behind the invention of new database types? What do we gain by replacing one database with another? Is it probable that one type will replace another? This chapter introduces concepts that will be discussed in more detail in the following chapters (chapters 2 and 3).

Chapter 2 SQL. Relational model Although this book is about NoSQL databases, it is good to have some reference point. The relational (SQL) model seems to be a perfect choice because it is quite natural to consider every novelty in the context of a well-known model used for many years. In this chapter we will point out the key features of the SQL model, its strengths and weaknesses.

Chapter 3 NoSQL In this chapter we will discuss when and why SQL is not enough for business and IT enterprises, and what factors play a crucial role in today's system development. We will try to justify that the rise of NoSQL was a natural consequence of the changes taking place in the world.

Chapter 4 Column family stores In this chapter the foundations of column family stores will be given.

Chapter 5 Key-value stores In this chapter the foundations of key-value stores will be given.

Chapter 6 Document stores In this chapter the foundations of document stores will be given.

Chapter 7 Graph stores In this chapter the foundations of graph stores will be given.

Chapter 8 Column databases In this chapter the foundations of column databases will be given. (IN PROGRESS)

Chapter 9 Time series databases In this chapter the foundations of time series databases will be given. (IN PROGRESS)

Chapter 10 Apache Pig In all previous chapters I talk about "classic" NoSQL or, more precisely, about NoSQL stores/databases. But what do the words database or store mean in our everyday practice? Do we care about a system's internals, about its logical and/or physical data storage model? Or do we rather care about what we can do and how we can do it? Truth be told, as long as we have an SQL-like language we can use to operate on "something", we will call this "something" a (relational) database, even if, technically speaking, it is something very different from a (relational) database. A system doesn't have to be a database to be used as a database. And this is the main thought in this and the following chapters: I will show you some systems which behave like databases even though they are not databases. In most cases, no one calls them NoSQL systems, but this seems to be the future of NoSQL systems: an SQL-like

interface created over an underlying sophisticated and versatile storage system to work with any kind of data.

Appendix A: JSON

Appendix B: XML

Appendix C: HTTP

Appendix D: SQL recursive queries

What will you NOT learn in this book?

This book is not intended to be a user guide for any given NoSQL database. Even when I use some of them, I do so to show the key features of a whole class of databases, their pros and cons. Today everything changes so fast that it may happen that a database used in the examples is no longer supported. Writing a book is a time-consuming task and, especially in the IT area, some facts described at the beginning of the book may be outdated by the time the author completes the final chapters. This does not change the general nature of the considerations contained in this book, as I made an effort to present basic, universal and essential functionality which is quite common among databases of a given type. That is why you will not find chapters devoted to installation or configuration (except for some simple cases when necessary).

The number of topics covered in this book is quite big, and it shouldn't surprise you that some of them are only announced. There is no way to fully describe all of them. I don't think it's possible to know everything. There's a tremendous amount of information we get every day because technology is evolving so fast. Let me know if you think that some parts should be described in more detail or in a completely different way.

Give this book a try, and please let me know what you think. Any feedback is very much encouraged and welcome! If you think that my time is worth this effort, you can support what I'm doing now and help me finalize this project. Please use email ([email protected]) or GitHub (https://github.com/fulmanp/NoSQL-Theory-and-examples) to give your positive or negative, but in all cases constructive, feedback.

Thank you for your engagement.

Piotr Fulmański

Conventions used in this book

For your convenience I will use the following typographical conventions:

Italic Indicates new terms.

Italic Indicates terms which, for some reason, I want to distinguish from the normal text flow, as well as definitions and citations.

Constant width Indicates source code, filenames, file extensions, variables, parameters, etc.

Constant width Indicates commands or any other text that you should type literally (as it is given).

Constant width Indicates parts of scripts or commands which you need to pay special attention to.

Bold Indicates statements you need to pay special attention to. Sometimes it is used in combination with the previous styles, for example: Constant width with bolded part This way I will mark, for example, a crucial parameter in some important command.

This is how source code is displayed

This is how terminal text is displayed, with a bolded command prompt.

Something worth remembering or just a one-sentence summary of some part of a section or chapter.

NOTE

Note block

I use this block to give you some additional explanation or information, possibly loosely related to the main text.

CHAPTER 1 SQL, NoSQL, NewSQL

General overview of *SQL databases

• Why are there so many different families of databases?

• What factors were the main reason for NoSQL to appear?

• Is NoSQL going to replace SQL?

SECTION 1 Data and database

We can define a database as an organized collection of data stored in accordance with specific rules. In this sense a postage stamp collection, books stored on a shelf (in some systematic or chaotic way) or even a kid's toy car collection are examples of databases. In practice we tend to think about databases in a much narrower and thus more useful sense.

First, we think about purely immaterial data – we store numbers, texts, images and songs but not real objects. Probably because real objects are much harder to manipulate in an automatic way than sequences of characters.

We define data as a set of values of qualitative or quantitative variables (properties) describing some object or phenomenon.

Although the terms data, information and knowledge are often used interchangeably, each of these terms has a distinct meaning. Data is a dumb set of values. Nothing more. When the data is processed and transformed in such a way that it becomes useful to the users, it is known as information. So when data starts to "speak", when something valueless is turned into something priceless, we have information. Going further with other data "transformations" we reach the DIKW (data, information, knowledge, wisdom) pyramid. The DIKW pyramid shows that data, produced by events, can be enriched with context to create information, information can be supplied with meaning to create knowledge, and knowledge can be integrated to form wisdom, which is at the top. There is a nice saying (by Miles Kington):

Knowledge is knowing a tomato is a fruit. Wisdom is not putting it in a fruit salad.

And this is the true essence of the problem we are discussing.

NOTE

One data, many data...

The Latin word data is the plural of datum (en. "(thing) given"), the neuter past participle of dare (en. "to give"). In consequence, datum should be used in the singular and data for the plural, though in non-specialist, everyday writing data is most commonly used in the singular, as a mass noun (like information, sand or rain), and this is becoming more and more popular. The first English use of the word data is from the 1640s. The word data was first used to mean transmittable and storable computer information in 1946. The expression data processing was first used in 1954. [D]

Second, we pay great attention to automatic ways of processing. The best known tools allowing us to do so nowadays are computers. That is why immaterial data is so useful for us – we can turn it into digital data and feed it to computer systems to make them do things we wouldn't do ourselves.

This should explain why nowadays we define a database as digital data collected in accordance with the rules adopted for a given computer program specialized in collecting, storing and processing this data. Such a program (often a package of various programs) is called a database management system (DBMS).

The database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze the data. It serves as an intermediate layer isolating end users from all "unnecessary" technical details. In common language we use the term database to loosely refer to any of the DBMS, the database system or an application associated with the database. A system doesn't have to be a database to be used as a database.

Making one step forward, we can say that a system doesn't have to be a database to be used as a database. And this is the main thought in the last chapters of this book. I show there some systems which behave like databases even though they are not databases. In most cases, no one calls them database systems, but from the user's perspective, thanks to an SQL-like interface created over an underlying sophisticated and versatile storage system allowing us to work with any kind of data, we perceive them as databases.

SECTION 2 SQL

We can classify database management systems according to the database models that they support. Not going far into the past, we can say that the first model used on a large scale, dominant in the market for more than 20 years, was the relational model, which arose in the 1970s. We refer to such systems as SQL databases because the Structured Query Language (pronounced S-Q-L or sequel) was used by the vast majority of them for writing and querying data. SQL (in the sense of SQL databases) utilizes Edgar F. Codd's relational model.

FIGURE 1.1 Database model used by SQL

The database model used by SQL (see figure 1.1) assumes that data is represented in terms of tuples grouped into relations. We can think of relations as spreadsheet tables and of tuples as rows of those tables.

Every tuple in turn consists of one or more attributes, which resemble a spreadsheet's columns. The main properties of this model are:

1. We may have any number of tables in our database.

2. In every table we can have any number of columns, but that number is precisely defined.

3. Using keys (which are unique identifiers for every row within a given table) we can define relationships between tables.

Having some data organized this way, we can do some basic operations on it. Taking for example the data from the Customer and Order tables

Customer table

ID  Name
1   Alice
2   Betty
3   Carolina
4   Diana
5   Emma
6   Fiona

Order table

ID  CustomerID  Total
1   1           100.00
2   1           33.00
3   4           5.00
4   2           250.00
5   3           64.00
6   6           172.00

(notice that the CustomerID column in the Order table refers to the ID column in the Customer table) we can:

• Retrieve data stored in database:

Query

SELECT name FROM Customer;

Result

Alice Betty Carolina Diana Emma Fiona

• Retrieve data stored in database imposing some conditions:

Query

SELECT CustomerID FROM Order WHERE Total > 100;

Result

2 6

• Retrieve joined data stored in database imposing some conditions:

Query

SELECT Name FROM Customer, Order
WHERE Total > 100 AND Customer.ID = Order.CustomerID;

or

SELECT Name FROM Customer INNER JOIN Order ON Customer.ID = Order.CustomerID WHERE Total > 100;

Result

Betty Fiona

• Insert data into database:

Query

INSERT INTO Customer(ID, Name) VALUES (7, 'Helen');

Result

Customer table

ID  Name
1   Alice
2   Betty
3   Carolina
4   Diana
5   Emma
6   Fiona
7   Helen

• Update existing data:

Query

UPDATE Customer SET name = 'Grace' WHERE ID = 7;

Result

Customer table

ID  Name
1   Alice
2   Betty
3   Carolina
4   Diana
5   Emma
6   Fiona
7   Grace

• Delete existing data:

Query

DELETE FROM Customer WHERE name = 'Grace';

Result

Customer table

ID  Name
1   Alice
2   Betty
3   Carolina
4   Diana
5   Emma
6   Fiona

An inseparable part of this system is a set of rules known as the normal forms. Interestingly, the relational model defines several levels of conformance specifying how data should be organized into tables. The main goal of all normal forms is to force the user to keep data in a form that limits data redundancy and helps avoid trouble when data is inserted, updated or deleted. Normalization guidelines are cumulative. For a database to be in 2NF (second normal form), it must first fulfill all the criteria of a 1NF (first normal form) database; to be in 3NF, it must be in 2NF, etc.

The main goal of all normal forms is to force the user to keep data in a form that limits data redundancy and helps avoid trouble when data is inserted, updated or deleted.

The way of data organization imposed by normal forms influences the way we think about real objects. A relational database is like a garage that forces you to take your car apart and store the pieces in little drawers, every time you drive into it.

A relational database is like a garage that forces you to take your car apart and store the pieces in little drawers, every time you drive into it.
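To see what this "taking apart" looks like in practice, here is a minimal sketch using Python's built-in sqlite3 module; the table and column names are hypothetical, chosen only for illustration. The first, unnormalized table repeats the customer's name in every order row; the normalized pair of tables stores each fact exactly once and links the tables with a key.

import sqlite3

con = sqlite3.connect(":memory:")

# Unnormalized: the customer's name is repeated in every order row,
# so renaming one customer means updating many rows (an update anomaly).
con.execute("CREATE TABLE OrderFlat (OrderID INTEGER, CustomerName TEXT, Total REAL)")
con.executemany("INSERT INTO OrderFlat VALUES (?, ?, ?)",
                [(1, "Alice", 100.00), (2, "Alice", 33.00), (3, "Diana", 5.00)])

# Normalized: each fact lives in exactly one place; orders reference
# customers through a key -- the "pieces in little drawers".
con.execute("CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT)")
con.execute("CREATE TABLE CustomerOrder (ID INTEGER PRIMARY KEY, "
            "CustomerID INTEGER REFERENCES Customer(ID), Total REAL)")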

NOTE

Transaction model and ACID

The relational model does not itself define the way in which the database handles concurrent data change requests, named transactions. To ensure the consistency and integrity of data, the ACID transaction model is used; it became the de facto standard for all serious relational database implementations. An ACID transaction should be:

• Atomic. The transaction cannot be divided – either all the statements in the transaction are applied to the database or none are.

• Consistent. The database remains in a consistent state before and after transaction execution.

• Isolated. While multiple transactions can be executed by one or more users simultaneously, one transaction should not see the effects of other in-progress transactions.

• Durable. Once a transaction is saved (committed) to the database, its changes are expected to persist even if there is a failure of the operating system or hardware.
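A minimal sketch of atomicity in action, using Python's built-in sqlite3 module (the Account table is hypothetical): a connection used as a context manager commits the transaction on success and rolls it back on any exception, so a failure between the two updates leaves the balances untouched.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Account (ID INTEGER PRIMARY KEY, Balance REAL)")
con.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100.0), (2, 0.0)])
con.commit()

try:
    with con:  # one atomic transaction
        con.execute("UPDATE Account SET Balance = Balance - 40 WHERE ID = 1")
        raise RuntimeError("simulated crash between the two statements")
        # never reached:
        # con.execute("UPDATE Account SET Balance = Balance + 40 WHERE ID = 2")
except RuntimeError:
    pass

# The partial debit was rolled back; both balances are unchanged.
print(con.execute("SELECT Balance FROM Account ORDER BY ID").fetchall())
# [(100.0,), (0.0,)]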

On one side, ACID, along with relations, is the source of the power of relational databases. On the other hand, it is a source of serious and very difficult to overcome problems.

The mismatch between the relational model (the way we store data) and the real objects used in object-oriented programming languages (the way we use data) had serious consequences. Whenever an object was stored into or retrieved from a relational database, multiple SQL operations were required to convert between the object-oriented representation and the relational representation. This was cumbersome for the programmer and could lead to performance or reliability issues.
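A small hypothetical sketch of this mismatch in Python: one application-level object is persisted with several coordinated SQL statements (the class and table names are made up for illustration).

import sqlite3

class Order:
    def __init__(self, order_id, lines):
        self.order_id = order_id
        self.lines = lines  # list of (product, quantity) pairs

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CustomerOrder (ID INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE OrderLine (OrderID INTEGER, Product TEXT, Quantity INTEGER)")

def save(order):
    # One object in memory becomes rows in two tables on disk.
    con.execute("INSERT INTO CustomerOrder (ID) VALUES (?)", (order.order_id,))
    con.executemany("INSERT INTO OrderLine VALUES (?, ?, ?)",
                    [(order.order_id, p, q) for p, q in order.lines])

save(Order(1, [("stamp", 10), ("ink", 2)]))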

This led to the first attempt to replace relational databases with something else. This is why, in the late 1980s and early 1990s, object-oriented DBMSs were developed. These object-oriented DBMSs (OODBMS), however, never saw widespread market adoption. The main reason for this state of affairs was that they lacked a standard, universal interface like SQL. People are so used to SQL that other interfaces seemed awkward or useless. This is true even now – every modern database technology offers an SQL-like interface even if internally it is not a relational system. OODBMS offered advantages to the application developer but forgot about those who wished to consume information for business purposes. This could be the reason why OODBMS systems completely failed to gain market share. We have to remember that databases don't exist to make programmers' lives simpler. They represent significant assets that must be accessible to those who want to mine the information for decision making and business intelligence. By implementing a data model that was understandable and usable only by the programmer, and ignoring the business user's need for a usable

SQL interface, the OODBMS failed to gain support outside the programmers' world.

Databases don't exist to make programmers' lives simpler. They represent significant assets that must be accessible to those who want to mine the information for decision making and business intelligence.

This way, SQL databases defended their dominant position in the market. The relational model is undoubtedly characterized by the following set of positive features.

• ACID transactions at the database level make development and usage easier.

• Most SQL code is portable to other SQL databases.

• Typed columns and constraints help validate data before it is added to the database, which increases the consistency of the stored data.

• Built-in mechanisms like views or roles prevent data from being changed or viewed by unauthorized users.

To be honest, one cannot forget about the negative side of the relational model. Most of the following issues weren't a problem at that time, but they exist.

• ACID transactions may block the system for a short time, which may be unacceptable.

• Object-relational mapping is possible but can be complex and adds one more intermediate layer.

• RDBMSs don't scale out. Sharding over many servers can be done, but it requires new or tuned application code and will be operationally inefficient.

• It is difficult to store high-variability data in tables.

• It is difficult to store data in real time and process it in real time.

Despite their drawbacks, relational databases were very well established in IT, seemed to be perfectly crafted for all needs, and nothing suggested that a new era was coming. Unexpectedly, in the 2000s, non-relational databases became popular. We refer to them as NoSQL because at the very beginning they used query languages (or methods) different from the well-known SQL. Today we can observe a tendency to introduce SQL-like query languages in many NoSQL-based systems.

Summary

The relational model is undoubtedly characterized by the following set of positive features.

• ACID transactions at the database level make development and usage easier.

• Most SQL code is portable to other SQL databases.

• Typed columns and constraints help validate data before it is added to the database, which increases the consistency of the stored data.

• Built-in mechanisms like views or roles prevent data from being changed or viewed by unauthorized users.

To be honest, one cannot forget about the negative side of the relational model.

• ACID transactions may block the system for a short time, which may be unacceptable.

• Object-relational mapping is possible but can be complex and adds one more intermediate layer.

• RDBMSs don’t scale out. Sharding over many servers can be done but requires new or tuned application code and will be operationally inefficient.

• It is difficult to store high-variability data in tables.

• It is difficult to store data in real time and process it in real time.

• There is too much overhead in using a full-featured DBMS as a "dumb" data store for many web-based applications.

• SQL is overkill for simple look-up queries.

• Consistency and correctness come at the price of availability and performance.

SECTION 3 Big Data – big problem with data

Why was NoSQL created? You could say SQL was getting old. Nothing could be more wrong. Information technology is one of the few areas in which the system components do not change because of their age. The only impulse for change is usefulness. Never age. An unknown factor had to appear, forcing us to abandon previously known technologies.

When does data become a problem?

To find an answer to this question, let's perform a very simple thought experiment on N sheets of paper.

Stamping task Having a stamp, mark every sheet of paper with it. It is a boring but feasible task. Importantly, to complete it in less time (to increase performance), we may take one of the following actions.

• Increase the speed of stamping a single sheet by employing a mega man – a man who can do this much faster than any other known man. Despite having a mega man, this increase in speed has its natural physical limitations – the mega man can't work faster than himself.

• We can divide the stack into smaller (sub)stacks and assign each smaller stack to another person. An increase in stamping speed can then be achieved by assigning more people to this task. What's more, despite the fact that each of these people will be much less efficient compared to the mega man from the first option, the task considered as a whole will be solved much faster.

Numbering task Number all N cards with the N natural numbers from 1 to N. Numbered pages should be stored as a pile of pages, preserving increasing order from the bottom to the top. In this case there is no possibility of dividing the task into smaller (sub)tasks. The only chance to increase performance is to increase the processing power of one single processing unit (employ a mega man). But this, as we know, has some unbreakable physical limitations.
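A toy Python sketch of the difference between the two tasks (the functions are made up for illustration): stamping parallelizes across workers, while numbering is inherently sequential.

from multiprocessing import Pool

def stamp(sheet):
    # Each sheet is independent of all the others, so this work
    # can be divided among many workers.
    return (sheet, "stamped")

if __name__ == "__main__":
    # Stamping task: four workers stamp 1000 sheets in parallel.
    with Pool(processes=4) as pool:
        stamped = pool.map(stamp, range(1000))

    # Numbering task: page k may be put on the pile only after page k-1,
    # so extra workers cannot speed this loop up.
    pile = []
    for k in range(1, 1001):
        pile.append(k)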

Trying to solve both tasks, we face the problem of task parallelization and system scaling. Generally speaking, scalability is the property of a system to handle a growing amount of work by adding resources to the system. In our case, there are two options to scale our system:

• either by increasing the power of a single execution unit (the mega man) – this is called vertical scaling,

• or by increasing the number of execution units – this is called horizontal scaling. Horizontal scaling seems to be more promising. There is "only" one small but important detail: our task should be divisible into independent subtasks. Unfortunately, this is not always the case, as we have seen in the numbering task. We will say that the stamping task is a parallelizable task, while the numbering task is a non-parallelizable task.

When there is relatively little data, non-scalable (or at most vertically scalable) systems are sufficient.

Task parallelization is about the possibility of dividing one task into independent subtasks. Scaling a system vertically, we can effectively solve non-parallelizable tasks, while horizontal scaling works for parallelizable tasks. As it is not possible to scale a system vertically indefinitely, horizontal scaling is a much more desirable feature. Whenever in this book we use the term scalable task, we mean a task we can effectively solve by scaling our system horizontally, thanks to the ability to divide that task into subtasks that can be processed in parallel. When does a task become a problem for us? Notice that the need for scaling occurs only in the case of voluminous data sets (we often say: large data or big data). For small N there should be no problem completing both the stamping task and the numbering task by only one man in reasonable time. When there is relatively little data, non-scalable (or at most vertically scalable) systems are sufficient. So data may be a problem when there is a lot of it. But what does a lot mean? How is it today? Do we have a lot of data today?

Data may be a problem when there is a lot of it.

Volume

It is assumed that today organizations and users worldwide create over 2.5 EiB (an EiB is 2^60 bytes; that is more than 10^18 bytes, or about 5 billion DVDs) of data a day. As a point of comparison, the Library of Congress currently holds more than 300 TiB (a TiB is 2^40 bytes, more than 10^12; about 65,000 DVDs) of data. Almost any aspect of our lives is, or can be, a source of data (another question is whether we really need it all), and data can be generated by humans, machines, and the environment. The most common sources are (H is used to denote humans, M – machines, E – the environment):

• (H) social media, such as Facebook and Twitter,

• (H) scientific and research experiments, such as physical simulation,

• (H) online transactions, such as point-of-sale and banking,

• (M) sensors, such as GPS sensors, RFIDs, smart meters and telematics,

• (M) any working device we monitor for our safety, such as planes or cars,

• (E) weather conditions, cosmic radiation.

NOTE

How much data?

Every minute (in 2018):

• 4,166,667 likes are given by Facebook users,

• 347,222 tweets are posted on Twitter,

• 100,040 calls are made on Skype,

• 77,166 hours of movies are streamed from Netflix,

• 694 Uber users take a ride,

• 51,000 applications are downloaded from the App Store. [HMD]

Today, we collect all the data we can get, regardless of whether we really need it or not. The term data lake was created for this type of data "collection": it acts as a lake with lots of different things inside, not always easily accessible and visible. We do this (collect data) not necessarily because we need it but because we can. Maybe one day we will need it – we think.

Over time, large amounts of data hide what we previously stored.

Over time, large amounts of data can hide what we previously stored. This is like my basement: I know that some elements I need are somewhere in it. Unfortunately, because there are so many elements, it's cheaper, easier and faster to buy a new one than to try to find it in my basement element lake.

What is worse, volume is not the only problem with data. Along with it, there are more factors playing a very important role: velocity and variety. All of them together constitute something we nowadays call big data.

NOTE

More Vs

According to some sources, today big data features may be summed up in more than three different Vs. Today, for a dataset to be considered Big Data, it must possess characteristics commonly referred to as the five Vs: volume, velocity, variety, veracity and value. The more we use Big Data, the greater the impact it has on our lives. Going further, we might consider up to 10 factors: besides volume, velocity, variety, veracity and value defined so far, we should also consider validity, volatility, variability, viscosity and vulnerability. There are also some attempts to distinguish more Vs: 17 or even 51. [BDC]

This growth of data size is outpacing the growth of storage capacity, leading to the emergence of information management systems where data is stored in a distributed way, but accessed and analyzed as if it resides on a single machine. Unfortunately, an increase in processing

power does not directly translate to faster data access, triggering a rethink of existing database systems.

SQL solutions were not able to cope with the current needs of working with large data sets in real time.

This explains why we've heard about NoSQL so "recently" (relational (SQL) databases have been known since the 1970s, non-relational databases (NoSQL) since the 2000s). Until the data explosion, the existing solutions (SQL) proved to be sufficient. Now they are not able to cope with the current needs of working with large data sets in real time.

SECTION 4 NoSQL

One of the most important negative features of SQL databases is their relational nature, forced by normal forms. Every time we have to get some information, we have to combine data distributed among many tables into something united. Going back to the previously given example with a car: every time we want to take a ride, we have to collect all the pieces from the little drawers and rebuild our car. In real database life such an operation of combining, called a join, is repeated many times a minute or even a second. It takes time and memory and utilizes CPU power. The natural way of thinking is to get rid of this. Let's try to think about our data as if it didn't need joins. Let's organize it in some different way.

In a way, it turned out that a change in the approach to the method of data storage resulted in easily implementable distribution and scalability.

NoSQL shouldn't be taken too literally: it is not against the SQL language. SQL, as well as other query languages, is used with NoSQL databases. For our purposes, we will use the NoSQL definition I found in [McCreKel] and liked very much:

NoSQL is a set of concepts that allows the rapid and efficient processing of data sets with a focus on performance, reliability, and agility.

As we can see, we don't discredit any existing database system, including the well-deserved RDBMS systems. In NoSQL solutions, we focus primarily on processing which is:

• fast,

• efficient,

• reliable,

• agile.

Motivations

Let's pay attention to the factors that were taken into consideration during NoSQL development.

Flexibility One drawback of working with a relational database is that we have to know many things (or all of them) in advance, before we start using the database. We are expected to know all the tables and columns that will be needed to support an application. It is also commonly assumed that most of the columns in a table will be needed by most of the rows. In practice, sometimes we have no idea how many and what types of columns we will need, or we add a column to our database just to support an option which happens only once in a million cases.

Moreover, our choices change frequently as we develop our business or IT infrastructure, identify important factors or redefine those already known. This may be called an agile approach to data – the flexibility to collect, store and process any type of data any time it appears.

Flexibility also allows us to reduce costs, which is highlighted in one of the following items.

An agile approach to data is the flexibility to collect, store and process any type of data any time it appears.

Availability Today's world is full of information in the sense that we can find the information we are looking for in many different sources. We know it, so if one of the sources is unavailable we will not wait for it but will more likely shift to another one. Think about the way you use a web browser to find a webpage with the information you need. In just a second you have thousands of results. In consequence, if you click on the first link and wait more than three seconds to see the resulting webpage, you become irritated and click the second link in the hope of getting a result in less than one second.

NOTE

Availability

User requirements for database availability are becoming more and more rigorous. First of all, users expect that in the event of a system failure they will not lose any data, or only a strictly defined amount of it – the database system must be reliable. However, it may take a long time to fix a failure and restore the database. Therefore, meeting the second expectation of users, namely ensuring the continuous availability of database systems even in the event of damage to their individual components or a sudden increase in load, requires the use of appropriate continuous availability technology (High Availability – HA).

Availability is usually represented by the number of 9's:

Availability   Unavailability within a month   Unavailability within a year
95%            36 hours                        18 days
99%            7 hours                         3.5 days
99.5%          3.5 hours                       2 days
99.9%          43m 12s                         8h 45m
99.99%         4m 19s                          52m 36s
99.999%        25s                             5m 15s

Amazon S3 is designed for 99.999999999% (11 9's) of durability. [HA]
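These numbers are simple arithmetic; a quick sketch in Python reproduces the yearly column:

def downtime_per_year(availability_percent):
    # Fraction of time the system may be down, applied to one year.
    seconds_per_year = 365 * 24 * 3600
    return (1 - availability_percent / 100) * seconds_per_year

for nines in (99.0, 99.9, 99.999):
    hours = downtime_per_year(nines) / 3600
    print(f"{nines}% -> {hours:.2f} hours of downtime per year")
# 99.0%   -> 87.60 hours (about 3.65 days)
# 99.9%   -> 8.76 hours  (about 8h 45m)
# 99.999% -> 0.09 hours  (about 5m 15s)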

Speed From the customer's point of view, availability seems to be the most important factor. The relational architecture, with lots of constraints set on data (tables), additional rules (e.g. database triggers) and the need to split every object into the smallest pieces (normal forms), does not support fast data saving. It would be much more useful to be able to immediately store anything we want at the time we want, deferring all rule checking to some later point in time.

Moreover, very often relational complexity is not needed. We don't need sophisticated logic to handle all the articles of a blog. This logic can easily be implemented in application code in favor of data availability.

Cost One option is to use one expensive server and a second one as its backup. At the beginning, most of its resources will be unused. The load will increase through time and reach its limit in five years, but we have to pay for all this capacity now – it's a waste of money. The other option is to buy low-cost machine(s) sufficient for current needs. When the load increases a little bit, we can add one more node to our cluster. We can take such small steps every time the load goes up. Thus we can save some money, which is especially important when we are starting our business.

Scalability Scaling is the only way we can meet the above requirements. It is important that this be as easy and involve as few additional costs as possible.

Depending on the needs of a particular application, some of these factors may be more important than others – this explains why the NoSQL family is so big.

NoSQL family

The family of NoSQL stores is broad, but we can distinguish a few main types:

• column family stores,

• key-value stores,

• document stores,

• graph stores,

• column databases,

• time series databases.

A separate chapter is devoted to each type later in the book. For now, only a short characteristic of each will be given.

NOTE

Database or store?

A store (sometimes: datastore) is, as the name indicates, a place where data is stored. We can store data, for example, on a hard disk simply using a file system, or in a database, in which the data is stored physically in files, but those files are managed by some, very often sophisticated, management system. Viewed from this perspective, a database is a special type of datastore. Not all NoSQL databases have a built-in "manager", or their functionality is very limited, so the management is done at the application level. That is why we may see them as just another storage system. Simply speaking, simple NoSQL databases (key-value, for example) are very often referred to as stores, while the more complicated ones (graph, for example) are referred to as databases, but this is not a rule of thumb.

In this book, to avoid word repetition, both terms are used interchangeably.

A column family store, although it uses the well-known concepts of tables, rows, and columns, behaves differently than a relational database. For every table we define one or more column families – groups of related data. The key difference is that the names and format of the columns can vary from row to row within the same table. We can have as many columns within every family as we want, but the database tries to keep all the columns from one family together when the data is distributed.

To get access to data we have to specify a row key, a column family name, a column name and, very often, a timestamp.
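A minimal sketch of this addressing scheme in Python (the data is hypothetical); the nested dictionaries mirror row key -> column family -> column -> timestamped versions:

table = {
    "row-1": {
        "info": {                                  # column family "info"
            "name": {1609459200: "Alice"},         # column -> {timestamp: value}
        },
        "orders": {                                # column family "orders"
            "total": {1609459200: 100.0, 1612137600: 133.0},
        },
    },
}

# Read the most recent version of row-1 / orders:total.
versions = table["row-1"]["orders"]["total"]
latest = versions[max(versions)]   # 133.0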

A key-value store uses, as its primary data structure, an associative array, today more commonly called a dictionary or hash table. The term dictionary best describes the ideas behind this type of database. Generally speaking, dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. This is exactly how a real paper dictionary – or better, an encyclopedia – contains a collection of information (text, images, sounds, videos). All these records are stored and retrieved using a key that uniquely identifies the record. This is exactly how we access information in a real paper dictionary – with a key. And this is the only option. There is no way to get the information we are looking for without a key. Key-value systems treat the data (identified by a key) as a single opaque collection, or better say object (value). There is no way to search by an object's properties, for the simple reason that every object can be different. It's up to the user to handle this properly.
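In Python terms, a key-value store behaves like a plain dictionary whose values are opaque blobs (a toy sketch with made-up keys):

store = {}

store["user:1001"] = b'{"name": "Alice", "likes": ["tea"]}'   # put
value = store["user:1001"]                                    # get -- only by key
del store["user:1001"]                                        # delete

# There is no "find all users who like tea" operation here: the store
# never looks inside the value, so any such search is the application's job.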

A document store. First of all, let's clarify what a document is – it is a semi-structured object, that is, an object which has some non-standardized structure. We may think of a document as a collection of different key-value pairs where every value may be a different object.

Document-oriented databases can be seen as an "intelligent" version of key-value stores. The key difference lies in the way the stored data is processed. In a key-value store, the data is considered to be inherently opaque to the database, whereas a document-oriented system is able to extract and use values stored in every document. This is clearly visible in how both systems treat the key. While in key-value stores the key is the only way to get data, so the key is very informative and very often has meaning, in a document store the key is meaningless (very often it is a UUID) and we get data by querying document contents.

Document databases contrast with the traditional relational database, where a single object may be spread across several tables, and joins may be needed to get it. Document databases store all information for a given object as a single instance (object), and every stored object can be different from every other. An object must be "self-contained" as there is no option to join different documents.
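A toy Python sketch of the contrast (the documents are hypothetical): keys are meaningless identifiers, and the query inspects document contents instead.

documents = {
    "2e9a6f1c": {"type": "post", "author": "Alice", "tags": ["nosql", "riak"]},
    "7b41d0aa": {"type": "post", "author": "Betty", "tags": ["sql"]},
}

# The store can look inside the values, so we can query by content...
nosql_posts = [d for d in documents.values() if "nosql" in d["tags"]]

# ...which is impossible in a pure key-value store, where the same data
# would be an opaque blob reachable only through its key.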

A graph store uses a graph as its primary data structure. This makes it the only NoSQL database type whose theoretical foundations are established by the mathematical theory of graphs. A graph is a set of nodes connected by edges. In a graph store, nodes represent things like real-life objects (persons, cars, dishes, etc.) or concepts (type of cuisine, geographical coordinates, weather type, etc.), and the edges between them are relations. Both nodes and edges may have properties. The key concept is not how we store data but how we can query it. We can make semantic queries. For example, we can ask for all the people (objects) who hate (relation) hot-dogs (object). We can make even more complicated queries, like listing all the relatives (object and relation) of the people (objects) over 18 (property) who hate (relation) hot-dogs (object) with ketchup (property). If used correctly, this is a really powerful tool for data mining.
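A minimal in-memory sketch in Python of the first semantic query (nodes, edges and properties are all made up):

nodes = {
    "alice":   {"kind": "person", "age": 27},
    "bob":     {"kind": "person", "age": 17},
    "hot-dog": {"kind": "dish"},
}
edges = [  # (source, relation, target)
    ("alice", "hates", "hot-dog"),
    ("bob",   "likes", "hot-dog"),
]

# "All the people who hate hot-dogs": follow "hates" edges to the dish.
haters = [src for (src, rel, dst) in edges
          if rel == "hates" and dst == "hot-dog" and nodes[src]["kind"] == "person"]
print(haters)  # ['alice']

A real graph store answers such questions with a dedicated query language and index structures rather than a linear scan, but the shape of the query is the same.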

A column (column-oriented, also columnar) database stores data, organized into tables, by column rather than by row as in the relational model. Paradoxically, we can say that the more columns we have, the less probable it is that we will process all the columns of a single row. Rather, very often we want to process the values of a single column (or a limited number of columns) across all the rows. This requirement is addressed by column databases, which store all data belonging to the same column (or family of columns) physically together on disk.
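The two layouts can be sketched in Python as follows (toy data); summing one column touches one contiguous list in the columnar layout instead of every row:

rows = [  # row-oriented: all values of one row are adjacent
    {"ID": 1, "Name": "Alice", "Total": 100.0},
    {"ID": 2, "Name": "Betty", "Total": 250.0},
]

columns = {  # column-oriented: all values of one column are adjacent
    "ID":    [1, 2],
    "Name":  ["Alice", "Betty"],
    "Total": [100.0, 250.0],
}

print(sum(r["Total"] for r in rows))  # touches every whole row
print(sum(columns["Total"]))          # touches a single column only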

A time series database is optimized for storing and processing time series data, which is simply measurements or events stored as associated pairs of times and values. This could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data. The key feature of time series databases is the ability to efficiently perform actions on timed series of data, like tracking, monitoring, downsampling, and aggregating over time. With a time series database, it is common to request a summary of data over a large time period, or to keep high-precision data around for a short period of time and then aggregate and downsample it into longer-term trend data. In the second case this means that every data point that goes into the database will have to be deleted after its period of time is up, once some derivative values (like the average, mean, etc.) have been calculated. What is important is that this kind of data lifecycle management is provided by the database and performed not on the application side but on the database side.
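Downsampling itself is easy to sketch in Python (toy data, five-minute buckets); what a time series database adds is doing this automatically, at scale, inside the database:

from collections import defaultdict

points = [(0, 1.0), (60, 2.0), (299, 3.0), (300, 10.0), (540, 20.0)]  # (seconds, value)

buckets = defaultdict(list)
for ts, value in points:
    buckets[ts // 300].append(value)   # 300 s = one 5-minute window

downsampled = {bucket * 300: sum(vs) / len(vs) for bucket, vs in sorted(buckets.items())}
print(downsampled)  # {0: 2.0, 300: 15.0}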

NOTE

BASE set of rules

Relational database systems have their ACID set of rules. By chemical analogy, NoSQL systems have their BASE set of rules. While ACID systems focus on high data integrity, NoSQL systems focus on something significantly different: availability. BASE stands for these concepts:

• Basic Availability means that the database appears to work most of the time. It allows systems to be temporarily inconsistent so that transactions are manageable. In BASE systems, the information and service capability are basically available. This means that there can be a partial failure in some parts of the distributed system but the rest of the system continues to function.

• Soft state means that stores don't have to be write-consistent, nor do different replicas have to be mutually consistent all the time. Some inaccuracy is temporarily allowed and data may change while being used. The state of the system may change over time, even without input. This is because of eventual consistency.

• Eventual consistency means that there may be times when the database is in an inconsistent state. Eventually, when all service logic is executed, the system is left in a consistent state.
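A toy sketch in Python of the soft-state/eventual-consistency idea: a write lands on one replica first, the replicas disagree for a while, and they agree again once the change propagates (the propagation step below stands in for whatever anti-entropy mechanism a real store uses):

replica_a = {"x": 1}
replica_b = {"x": 1}

replica_a["x"] = 2                   # a write is accepted by one node only
assert replica_a != replica_b        # soft state: replicas temporarily disagree

replica_b["x"] = replica_a["x"]      # asynchronous propagation of the change
assert replica_a == replica_b        # eventually consistent again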

Summary

NoSQL is undoubtedly characterized by the following set of positive features.

• Relaxed ACID rules in favor of BASE, which in many cases is a price we can pay.

• Schema-less design increases processing flexibility.

• It's easy to store high volumes of high-variability data arriving with high velocity.

• In many cases modular architecture allows components to be exchanged.

• Possible linear scaling as new processing nodes are added to the cluster.

• Possible low(er) operational costs.

• The advantage of using a NoSQL system was that developers could focus on the aspects of their application that were more beneficial to their business or organization, rather than having to worry about how to scale the DBMS.

To be honest, one cannot forget about the negative side of NoSQL.

• ACID transactions are not supported, which demands a different way of thinking compared to the relational model.

• Lack of common query language (like SQL in relational databases).

• Lack of built-in security mechanisms like views or roles.

After a few years, the distributed, easily scalable nature of NoSQL databases proved to be something we must have – not because we want it but because we need it; it is the only option for processing enormous amounts of data. On the other hand, the lack of the ACID model, replaced by BASE, is unacceptable in many business cases. As usually happens in real life, the best solution lies somewhere in the middle, between the SQL and NoSQL worlds. This is how NewSQL databases arose.

SECTION 5 NewSQL

The term NewSQL was first used in 2011 [NewSQL]. Like NoSQL, NewSQL is not to be taken too literally: the new thing about NewSQL is not a new SQL itself but rather some design goals we want to achieve.

We may say that, like NoSQL, NewSQL is used to describe a loosely-affiliated group of companies developing a certain type of software. Whether it is really new enough to justify the new prefix before the SQL term is questionable. All of them try to develop (new) relational database products and services designed to bring the benefits of the relational model to distributed architectures, or to improve the performance of relational databases to the extent that horizontal scalability is no longer a necessity.

The first goal seems to be difficult – this was one of the causes of the creation of NoSQL databases: the inability to scale SQL databases.

The second seems (to me) to be impossible – this was another cause of the creation of NoSQL databases: the inability to increase (or decrease, if needed) the processing power of a single processing unit with the flexibility required by today's business world on the one hand, and without paying a lot on the other. If you recall the example with the mega man from the section When does data become a problem, you know that it is physically impossible to infinitely increase the processing power of a single unit.

Of course, as always when a new technology emerges, there were people who thought that NoSQL would be a perfect solution for all IT problems. For those who know the NoSQL background (which we tried to present in the previous section) it shouldn't be surprising that NoSQL couldn't be such an answer. In spite of the technological maturity proved by NoSQL solutions, the RDBMS users in enterprises are reluctant to switch. We may ask: why?

The greatest deployment of traditional RDBMSs is primarily in enterprises, which seem to be a perfect area of application. Probably every enterprise falls into the big data rush, so all big-data issues (discussed in the Big Data – big problem with data section) are relevant to them. So why not NoSQL? The IT world is very, very practical and nothing happens without evident need. Even though there is a variety of NoSQL offerings, they are typically characterized by a lack of SQL support and non-adherence to the ACID properties, replaced by the unpredictable BASE. Although NoSQL could help enterprises manage large distributed data, enterprises cannot afford to lose the ACID properties, which were the key reasons for choosing an RDBMS in the first place. Also, since NoSQL solutions don't provide the SQL support which most current enterprise applications require, enterprises are pushed away from NoSQL.

The IT world is very, very practical and nothing happens without evident need.

Once again in database history, omission of compliance with well-known and highly adopted standards (SQL) met with rejection by the IT environment (the first time it concerned object-oriented databases, as we described in the SQL section).

To address big-data transactional business scenarios that neither traditional relational systems (SQL) nor NoSQL systems address, alternative database systems have evolved, collectively named NewSQL systems. NewSQL is a shorthand for the various (new) scalable and high-performance SQL databases.

The technical characteristics of NewSQL solutions, according to Michael Stonebraker, are as follows [NewSQL:4 (in comments)]:

• SQL as the primary mechanism for application interaction.

• ACID support for transactions.

• A non-locking concurrency control mechanism so real-time reads will not conflict with writes, and thereby cause them to stall.

• An architecture providing much higher per-node performance than available from the traditional RDBMS (SQL) solutions.

• A scale-out, shared-nothing architecture, capable of running on a large number of nodes without suffering bottlenecks.

So we can think about NewSQL as a Holy Grail unifying the SQL and NoSQL approaches into one universal solution, where the specific use (as SQL, NoSQL or mixed) is the result of existing needs. Unfortunately, as experience teaches, if something seems to be applicable to all cases, it usually doesn't really work.

Given that relational DBMSs have been around for over four decades, it is justifiable to ask whether the claim of NewSQL's superiority is actually true or whether it is simply a marketing term. If they are indeed able to get better performance, then the next question is whether there is anything scientifically new about them that enables them to achieve these gains, or is it just that hardware has advanced so much that the bottlenecks from earlier years are no longer a problem [NewSQL:3]. If NewSQL is only a marketing term, do we really need it? Wouldn't our time be better invested in trying to understand what the fundamental issues are and how to overcome them? [NewSQL:4]

Moreover, despite there being plenty of NewSQL systems, they have had a relatively slow rate of adoption, especially compared to the developer-driven NoSQL stream. This is probably because NewSQL DBMSs are designed to support the transactional workloads that are mostly found in enterprise applications. Decisions regarding database choices for these enterprise applications are likely to be more conservative than for new Web application workloads. This is also evident from the fact that NewSQL DBMSs are used to complement or replace existing RDBMS deployments, whereas NoSQL databases are being deployed for new application workloads.

Summary

NewSQL is undoubtedly characterized by the following set of positive features.

• ACID support for transactions.

• SQL as the primary mechanism for application interaction.

• Much higher per-node performance than available from the traditional RDBMS (SQL) solutions.

• A non-locking concurrency control mechanism.

• Scalability.

To be honest, one cannot forget about the negative side of NewSQL.

• If NewSQL can provide much higher per-node performance than available so far from traditional RDBMS (SQL) solutions, the same techniques can also be adopted by existing SQL systems, reducing the NewSQL advantage in this aspect.

• The slow adoption of NewSQL compared to NoSQL is a significant sign that should be taken into consideration – no matter what NewSQL vendors say, something prevents adoption in the market. It might be a sign of the irrelevance of this direction to real needs.

SECTION 6 Summary

In this chapter a short overview of different database families was given. We briefly pointed out their main features as well as some premises justifying their existence.

Why are there so many different families of databases?

As it was stated, the IT world is very, very practical and nothing happens without an evident need. In databases there are two factors which play an important role:

• ability to handle data we want to store;

• the way we store data should help, not hinder, our business.

These two seemingly simple requirements caused, and still cause, constant database change and development. We may say that every novelty in databases appears when one of them is violated. In other words, when existing databases are unable to satisfy both requirements, something new emerges.

What factors were the main reason for NoSQL to appear?

As stated above, every novelty has some very practical reasons behind it.

DBMS – the mid 1960s

The first DBMSs were invented to help collect data. This way, instead of a database carefully crafted for one job but unsuitable for any other, we could use a general database management system (DBMS) to handle different data in exactly the same way. Thanks to this, we could use one common interface to communicate with data related to various projects or enterprises.

There were two popular data models in this decade: the network model (also called CODASYL) and the hierarchical model (also called IMS). What is worth stressing, both models were based on physical data organization. Navigational links or pointer structures were used in isolation from the data. In other words, two records (or perhaps better: nodes with embedded data) were connected not because they had something in common but because that physical connection was created. We can even imagine a situation where two empty records/nodes (without any data) exist and are linked together.

IBM’s IMS was built to keep track of the supplies and parts inventory for the Saturn V and Apollo space exploration projects.

Reasons: Need to handle different data in exactly the same way regardless of their type or origin.

SQL – the early 1970s

Although DBMSs were a step in the right direction, they suffered from a high degree of complexity. Depending on the model, data were operated on in different, nontrivial ways. The concept of making a logical representation of data abstracted from its physical representation was something natural, but it was impossible to realize without the right model/theory. This became possible after the introduction of the relational model.

The key feature of this model was the use of data (records) to link themselves to other data (records). In this model data, not physical links between nodes, forms relationships. The way data is stored is not relevant anymore. Regardless of whether it is a hierarchy, a network or something else, connections are determined not by the data structure used to store the data but by the data itself. This was made possible by the concept of the key – the part of data which programmatically points to other data.

This, followed by the declarative Structured Query Language (SQL), which eliminates the need to specify how to reach a record and is used across all relational database systems, allowed relational databases to be widely adopted by the business and IT world. SQL became a de facto standard of relational database communication, so all such systems are known under the common name of SQL databases. SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce as a companion for the relational model developed by Edgar F. Codd in the early 1970s. Initially called SEQUEL (Structured English Query Language), it was designed to manipulate and retrieve data stored in IBM's database management system, System R.

What should be stressed: the relational model is neither the fastest model nor the most reliable model – it is just the most productive one.

IBM’s System R and the University of California’s INGRES are examples of the first relational DBMSs.

Reasons: Need to abstract logical representation from physical representation.

Object-oriented DBMSs – the late 1980s and early 1990s

The late 1980s and early 1990s brought a new class of programming languages based on a new paradigm – orientation not only to data but to whole objects combining data with related functionality (so-called object-oriented programming). This caused a big mismatch between the way we store data and the way we use it. Object-oriented DBMSs were introduced to overcome these difficulties.

Good as an idea but bad in practice, these DBMSs never saw widespread market adoption. The main reason for this failure was the lack of a standard interface like SQL. Designers forgot that databases are used mostly not by programmers but by those who wish to consume information for business purposes. In most cases databases don't exist to make programmers' lives simpler. They represent significant assets that must be accessible to those who want to mine the information for decision making and business intelligence.

Reasons: The way we stored data was totally different from the way we used it.

NoSQL – the 2000s

The new century brought the arrival of Internet applications with more challenging resource requirements than applications from previous years. They needed to scale to support large numbers of concurrent users and had to be on-line all the time. Later one more factor appeared – big data. Relational databases were found to be a bottleneck in those cases.

Foremost was that relational DBMSs at that time were focused on consistency and correctness at the expense of availability and performance. This was inappropriate for web-based applications that need to be on-line all the time and have to support a large number of concurrent operations without any delays.

Secondly, there was too much overhead in using a full-featured relational DBMS as a "dumb" data store to keep, say, information about articles on a blog. In that case SQL was overkill for simple look-up queries.

Thirdly, relational databases prevent an agile approach to application development and data storage. To fully benefit from the relational model we have to use normal forms along with a predefined data model (all relations, data types, constraints etc.). In that case we are unable to store data of variable types or structures, we can't simply change the database's data schema to follow upcoming data changes, we can't store data at high speed, and we can't easily scale the system according to the overall load.

These problems turned out to be the impetus for changes in the mid to late 2000s. The key aspect of these new systems (called NoSQL, which should be understood as Not only SQL) is that they forgo strong transactional guarantees and the relational model of SQL DBMSs in favor of eventual consistency and alternative data models (e.g., key-value, graphs, documents).

The advantage of using a NoSQL system was that developers could focus on the aspects of their application that were more beneficial to their business or organization, rather than having to worry about how to scale the DBMS.

So NoSQL in some sense was a tool invented for programmers, which resembles the aforementioned object-oriented databases movement. What may surprise us is that in this case the lack of standardization wasn't an obstacle to wide adoption on the market. The reason was that all the weird NoSQL models brought business more benefit than loss. They were cheap, effective, easily adaptable and very flexible. Moreover, in most cases they used a very simple data model acceptable to nontechnical users.

Reasons: Need to scale in order to support large numbers of concurrent users, to be on-line all the time, and to handle big data.

NewSQL – the early 2010s

It turns out, which shouldn't surprise us, that many organizations are unable to use NoSQL systems because they cannot give up the strong transactional and consistency requirements of their applications.

The only options available to them were either to purchase a more powerful single-node machine and scale the DBMS vertically (which cannot be done infinitely, as we saw in the case of the numbering task described in the Big Data – big problem with data section), or to develop their own custom sharding middleware that supports transactions (which makes developers less productive and more expensive, as they spend too much time writing code to handle inconsistent data).

NewSQL was supposed to be the answer to these needs. We may say it was supposed to be the Holy Grail of the database world. DBMSs of this type should provide the same scalable performance as NoSQL for OLTP read-write workloads while still maintaining ACID guarantees for transactions. In other words, these systems want to achieve the scalability of NoSQL DBMSs from the 2000s, but still keep the relational model (with SQL) and transaction support of the legacy DBMSs from the 1970s–80s.

Sounds good, but it seems impossible to achieve as a general solution. This seems to be confirmed by the fact that to this day there is no well-known and widely adopted solution of this type.

Making some additional assumptions, we may simplify the case to make it more realizable. For example, we may assume read-write transactions that

• are short-lived (i.e., no user stalls),

• touch a small subset of data (i.e., no full table scans or large distributed joins),

• and are repetitive (i.e., executing the same queries with different inputs).

With these assumptions a marriage of SQL and NoSQL is not so impossible, but the resulting system wouldn't be universal. On the other hand, there is no truly universal system either among SQL systems or among NoSQL systems.

Reasons: Need to scale while keeping strong transactions and consistency.

Is NoSQL going to replace SQL?

Recall the last sentences from the previous paragraph:

[...] a marriage of SQL and NoSQL is not so impossible, but the resulting system wouldn't be universal. On the other hand, there is no truly universal system either among SQL systems or among NoSQL systems.

There is no single, universal solution capable of replacing all other existing ones. All of them have their pros and cons and are well suited to some types of applications. Most probably all of them will coexist in the foreseeable future, rather than one of them taking over the market at the expense of the others.

*SQL databases examples

• SQL: Oracle, MySQL, Microsoft SQL Server, PostgreSQL, IBM Db2, Microsoft Access, SQLite.

• NoSQL:

• Column family stores: SimpleDB, Accumulo, Cassandra, Druid, HBase, Hypertable, Vertica.

• Key-value stores: Apache Ignite, ArangoDB, BerkeleyDB, Dynamo, FoundationDB, InfinityDB, LevelDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Project Voldemort, Redis, Riak.

• Document stores: Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB.

• Graph stores: MarkLogic, Neo4j, OrientDB, Virtuoso.

• Column databases: Druid, Vertica.

• Time series databases: Druid, InfluxDB, OpenTSDB, Riak-TS, RedisTimeSeries.

• NewSQL: Clustrix, GenieDB, ScalArc, Schooner, VoltDB, RethinkDB, ScaleDB, Akiban, CodeFutures, ScaleBase, Translattice, NimbusDB, Drizzle, MySQL Cluster with NDB, MySQL with HandlerSocket, Tokutek, JustOne DB, Amazon Relational Database Service, Microsoft SQL Azure, Xeround, Database.com, FathomDB.

CHAPTER 5 Key-value stores

General overview of key-value stores

• Basic ideas and features

• Working with Riak

SECTION 1 Basic ideas

The array is one of the first (if not the first) data structures taught to computer science students. After numbers (like integers and floats), characters, and boolean variables, the array is the simplest, the most universal and present (at least in some equivalent form) in almost any programming language. In its most basic form, we can say that an array is an ordered collection of values (objects) of the same type. Each value in the array is associated with an integer number called an index. Indexes are taken from a given interval with no gaps – each number from this interval corresponds to exactly one index, which in turn corresponds to exactly one value.

Although arrays are good, they are not perfect, mostly because of the restriction to integers as indexes (very often from a language-specific range – a range always starting from 0 (zero) is the best known) and because values are limited to the same type. A generalization of an array is an associative array, where arbitrary values are allowed both for identifiers and for the array's values. Depending on the programming language, associative arrays go under a number of different names, including dictionary, map, hash map, hash table, and symbol table. Regardless of the name, the key feature is the ability to store any value pointed to by almost any other value acting as the key.
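To make the difference concrete, here is a tiny Python illustration (the variable names are, of course, only illustrative):

# An array indexes same-type values by consecutive integers.
scores = [10, 20, 30]
print(scores[1])                   # 20

# An associative array (a dict in Python) maps almost arbitrary
# keys to values of any type.
ages = {"Luke": 23, "C3PO": 112}
print(ages["Luke"])                # 23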

In its simplest form, we can say that a key-value store is a dictionary. We will use the term dictionary instead of any other because, in my opinion, it best describes all the related concepts. A book called a dictionary has a list of words (keys), and each word (key) has one definition. Definitions may be simple, or compound and consisting of many sub-definitions, depending on their complexity. The paper dictionary is a simple (analog, non-computer) key-value store where word entries represent keys and definitions (sometimes very elaborate) represent values. As long as dictionary entries (words) are sorted alphabetically, definition retrieval is fast. There is no need to scan the entire dictionary item by item, key by key, to find what we are looking for. On the other hand, there is no option to find something by scanning the contents (definitions) – we could do this, but it would take too much time.

A key-value store is a simple database that, when presented with a simple string (the key), returns an arbitrarily large BLOB (value).

Like the dictionary, a key-value store is also indexed by the key. The key points directly to the value, which we can get without any searching, regardless of the number of items in our store; access is almost instantaneous. A key-value store is a simple database that, when presented with a simple string (the key), returns an arbitrarily large BLOB (value). Sometimes the key may also be a BLOB. Because the database itself is very simple, its query language is also very simple. To be more precise, there is no query language, because the set of operations (queries) is limited to adding and removing key-value pairs to/from the database.

NOTE

BLOB

BLOB (binary large object, or sometimes: basic large object) is a sequence of bytes stored as a single entity. The most basic examples of BLOBs are images, audio and other multimedia objects, but also binary executable code. In general, a BLOB is anything stored in a computer system which is not represented as a basic or "standard" data type such as a number, character, string, timestamp or UUID. In the case of BLOBs we don't care what the BLOB really is. We simply treat it as a sequence of bytes.

The story of the term blob is quite interesting. Blobs were originally just big amorphous chunks of data invented by Jim Starkey at DEC, who describes them as "the thing that ate Cincinnati, Cleveland, or whatever", referring to The Blob movie (1958). Later, Terry McKiever, a marketing person for Apollo, felt that it needed to be an acronym and invented the backronym Basic Large Object. Then Informix invented an alternative backronym, Binary Large Object. [BLOB]

SECTION 2 Key-value stores vs. relational databases

Simplicity is the key word associated with key-value databases, where everything is simple. Unlike in relational databases, there are no tables, so there are no features associated with tables, such as columns and constraints on columns. If there are no tables, there is no need for joins. In consequence, foreign keys do not exist, and key-value databases do not support a rich query language such as SQL. Truth be told, their query language is very primitive.

The only extra feature supported by some key-value databases are buckets, or collections. We use them to create separate namespaces within a database. Keys from one namespace do not collide with keys from another, so we can use the same keys in more than one namespace.

Contrary to relational databases, where meaningless keys are used, the keys in key-value databases are meaningful.

Contrary to relational databases, where meaningless keys are used, the keys in key-value databases are meaningful – see the Key is the key section for more details.

While in relational databases we avoid duplicating data, in key-value databases (and in NoSQL in general) it is a common practice.

Limitations of key-value databases

There are a lot of key-value databases. Below are some general limitations which are in line with the general idea of this database type. For a given database, some of them may not hold.

• The only way to look up values is by key. If we do not think over the strategy of creating key names well, we may face the problem of being unable to fetch the data stored in the database. This is really very important, so a separate section, Key is the key, is devoted to this topic.

• Range queries are not supported out of the box. Because keys have the form of an arbitrary string or, more generally, a BLOB, there is no method to define a range. It would be possible with, for example, regular expressions executed on the database side, but as for now no key-value database supports this feature. Instead, based on a proper key naming strategy, we can generate a sequence of keys on the application side and use them to retrieve values.

• Queries from one key-value database may not be portable to another. This is generally true of NoSQL systems, as there is no standard query language comparable to the SQL of relational databases.

SECTION 3 Essential features of key-value databases

Despite the huge variety of key-value databases, there exists a set of features common to all of them:

• simplicity,

• speed,

• scalability.

Simplicity

As it was stated in a previous section, simplicity is a key word describing key-value databases.

Ask yourself: how many times do you really need a relational database? Is it really indispensable when developing a simple application with persons (company staff) and the skills they have? We spend our time developing a relational database with all of its requirements (do you remember normal forms?). For what? In the end our application retrieves aggregates from the database to display person by person, with their skills, on a simple web page.

If we follow one of the agile methods, we need a flexible tool to rapidly test our changing ideas. With key-value stores, if we would like to track additional attributes, or remove some of them after our program is ready, we can simply add or change code in our program to take care of those attributes. There is no need to change database code to tell the database about the new attribute set.

If we follow one of the agile methods, we need a flexible tool to rapidly test our changing ideas.

In key-value databases, we work with a very simple data model which resembles a dictionary. The syntax for manipulating data is simple. Regardless of the type of operation, we specify a namespace and a key to indicate that we want to perform an action on a key-value pair. Which action it is depends on our call. There are three operations performed on a key-value store – put, get, and delete – as the sketch after this list illustrates.

• put adds a new key-value pair to the store or updates the value if the key is already present. What should be stressed here: an update means replacing the existing value with a new one. This is obvious when the value is a "real" BLOB like a media file. If our BLOB is a JSON document, we might tend to think about each of its attribute-value pairs separately. The problem is that our BLOB's internal structure is not visible to the database, so even if we want to change one of several hundred values, we have to resend almost the same JSON.

• get returns the value for a given key.

• delete removes a key and its value from the store.
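A minimal sketch of this interface in Python may look as follows. This is a toy in-memory store written only for illustration, not the API of any particular product; the namespace and key names are assumptions:

class KeyValueStore:
    """A toy in-memory key-value store with namespaces (buckets)."""

    def __init__(self):
        self._data = {}   # namespace -> {key: value}

    def put(self, namespace, key, value):
        # put adds a pair or replaces the whole value if the key exists
        self._data.setdefault(namespace, {})[key] = value

    def get(self, namespace, key):
        # get returns the value for a given key (or None if absent)
        return self._data.get(namespace, {}).get(key)

    def delete(self, namespace, key):
        # delete removes the key and its value from the store
        self._data.get(namespace, {}).pop(key, None)

store = KeyValueStore()
store.put("Shop", "Customer:10:name", "Dart Vader")
print(store.get("Shop", "Customer:10:name"))   # Dart Vader
store.delete("Shop", "Customer:10:name")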

Another feature which simplifies the programmer's fate is typelessness, a consequence of the fact that values are, generally speaking, BLOBs, so we can put in anything we want. It's up to the application to determine what type of data is being used, such as an integer, string, JSON, XML file, or even binary data like an image. This feature is especially useful when the data type changes or we need to support two or more data types for the same attribute. Imagine, for example, a network of sensors where some of them return an integer value, others a logical state, an enumeration or even a string. There is no problem with this in a key-value database.

Speed

In this case speed is a consequence of simplicity. There is no need for complicated query resolving logic. Every query directly specifies the key, and it is always exactly one key. The only job is to find the value corresponding to it. Supported by internal design features optimizing performance, key-value databases deliver high throughput for applications with data-intensive operations.

Scalability

As we mentioned in the NoSQL: motivations section, scalability is another most-wanted feature all databases would like to have. Working with key-value databases we have no relational dependencies; all write and read requests are independent, and this seems to be a perfect state for scaling.

SECTION 4 Key is the key

The key is the key to effective key-value database usage. It is so important that it is worth devoting a separate part only to this topic.

As already stated, keys are used to index, or we may say uniquely identify, a value within a namespace in a key-value database. This makes keys sound pretty simple, and sometimes they look so. On the other hand, keys are the only way we can get the value we are looking for. In key-value databases, generally speaking, there is no method to scan or search values, so the right key naming strategy is crucial. I think that the term strategy is better in this context than any other, because correct key names allow us to win the information war and are the factor which makes some applications much faster and more responsive than others.

Although it is not a strict rule, while working with relational databases counters or sequences are very often used to generate keys. Working with numbers is the easiest way to ensure that every new call for a new key returns a value (a number in this case) which is unique and unused so far. That's why application designers use these numbers somewhat routinely to make keys (primary keys) for rows of data stored in a table. In relational databases keys are used to connect (join) data stored in one table with other tables' data. Storing a primary key to a row in another table is known as a foreign key. This is the main purpose, and because of the way relational databases work, it makes sense (sometimes it is even considered good practice) to have such meaningless keys in this case.

Keys in relational databases are used to join data. Their numeral values are the easiest way to guarantee their uniqueness.

In key-value databases the rules are different. Although we may think about key-value databases as built on the basis of a very simple table with many rows and just two columns (the first for the key and the second for the value), they do not have a built-in table structure. If there are no tables, there are no rows and columns, so the question arises: how to "join", combine or somehow collect all information related to a given object? Use the right aggregation and key names – that is the answer.

Let's go back to our Star Wars based example from the Example subsection in Relational model: normal forms. In third normal form we have three distinct tables. Now imagine that we want to keep the customer data shown below in a key-value database.

customer table

number  name            location
10      Dart Vader      Star Destroyer
20      Luke Skywalker  Naboo
30      C3PO            Tatooine

A first attempt may look like this:

CustomerDetails[10] = 'Dart Vader'

The drawbacks of this are obvious. First, we have no information about which customer detail we have under index 10: her/his name, or maybe age, or maybe... Second: how can we store other information related to this customer?

Using another namespace might be a solution:

CustomerName[10] = 'Dart Vader'
CustomerLocation[10] = 'Star Destroyer'

but this approach leads to a potentially huge set of namespaces, which is not easy to maintain and use. Some of us can live with this. But will it still work if we want to store information about invoice details?

invoice details table

invoice number  invoice item number  item name    item quantity  item price
1               1                    lightsaber   1              100
1               2                    black cloak  2              50
1               3                    air filter   10             2
2               1                    battery      1              25
3               1                    lightsaber   5              75
3               2                    belt         1              5
4               1                    wires        1              10

Of course we can use all of the data as values and put them into our key-value database, for example in the following JSON document

InvoiceDetails[1] = {
  "Invoice details" : [
    {"Item name" : "lightsaber", "Item quantity" : 1, "Item price" : 100},
    {"Item name" : "black cloak", "Item quantity" : 2, "Item price" : 50},
    {"Item name" : "air filter", "Item quantity" : 10, "Item price" : 2}
  ]
}

or we can even write it as

InvoiceDetails[1] = {
  "Invoice number" : 1,
  "Invoice details" : [
    {"Item name" : "lightsaber", "Item quantity" : 1, "Item price" : 100},
    {"Item name" : "black cloak", "Item quantity" : 2, "Item price" : 50},
    {"Item name" : "air filter", "Item quantity" : 10, "Item price" : 2}
  ],
  "Customer details" : {
    "Customer name" : "Dart Vader",
    "Customer location" : "Star Destroyer"
  }
}

Hmm... It's not bad, but not as good as it could be. Notice that we have to parse this JSON every time to get even a very basic piece of information like the customer name.

Avoid using many namespaces. Remember: the key is the key.

I hope that with these examples I was able to convince you that something should be changed in our approach. Because key-value databases are simple, we do not have too many possibilities. As mentioned earlier, we can construct meaningful key names that carry the needed information. For example:

Shop[Customer:10:name] = 'Dart Vader'
Shop[Customer:10:location] = 'Star Destroyer'

Do not follow the relational pattern

Never, ever copy the relational data model. With this example we face another important issue related to keys. Let's say that we now want to store information related to invoice details. We can do this in many different ways. Following the relational pattern, for C3PO we have

Shop[customer:30:invoice:2] = ...
Shop[customer:30:invoice:4] = ...

And this is what we should avoid. In this particular case we obtain a discontinuous invoice range, which makes queries almost impossible. If the range is continuous, we can start from 1 and increase the counter until we obtain a nonexistent key, which is interpreted as "there are no more items". If the range is discontinuous, every time we obtain a nonexistent key we have no idea whether there are no more items, or whether it is only a gap and we should continue increasing the counter to get all remaining data. More adequate would be enumerating invoices independently per customer

Shop[customer:30:invoice:1] = ...
Shop[customer:30:invoice:2] = ...

so we could iterate over all invoices related to the customer identified by the number 30.
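A hedged sketch of this iteration in Python, with the namespace modeled as a plain dictionary (the key format follows the convention above; the sample values are only illustrative):

def invoices_for_customer(shop, customer_id):
    """Yield invoice values for one customer, stopping at the first gap."""
    n = 1
    while True:
        value = shop.get(f"customer:{customer_id}:invoice:{n}")
        if value is None:      # nonexistent key -> assume no more invoices
            return
        yield value
        n += 1

shop = {
    "customer:30:invoice:1": "battery ...",
    "customer:30:invoice:2": "lightsaber ...",
}
print(list(invoices_for_customer(shop, 30)))   # both invoices, in order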

Mind the aggregation you expect to use

On the other hand, if we suppose that we will most often use the data for order processing, another key naming convention might be more relevant:

Shop[invoice:1:customerDetails] = ...
Shop[invoice:1:details] = ...
Shop[invoice:2:customerDetails] = ...
Shop[invoice:2:details] = ...
Shop[invoice:3:customerDetails] = ...
Shop[invoice:3:details] = ...
Shop[invoice:4:customerDetails] = ...
Shop[invoice:4:details] = ...

As we can see, in this case following the relational pattern in numbering invoices sounds good.

Again, this is a sign for us that correct key naming is a strategy and should be chosen very carefully, with respect both to the aggregation boundaries we discussed in a previous part (NoSQL chapter, Motivations section, Aggregations subsection) and to the application's (developers') future needs.

Mind the range queries you expect to use

Dealing with ranges of values is another thing which should be considered. If we expect that in the future we will need to process our invoices by date or date range, the following naming convention

Shop[invoice:20171009:1:customerDetails] = ...
Shop[invoice:20171009:1:details] = ...
Shop[invoice:20171010:1:customerDetails] = ...
Shop[invoice:20171010:1:details] = ...
Shop[invoice:20171010:2:customerDetails] = ...
Shop[invoice:20171010:2:details] = ...
Shop[invoice:20171013:1:customerDetails] = ...
Shop[invoice:20171013:1:details] = ...

would be better than

Shop[invoice:1:customerDetails] = ...
Shop[invoice:1:details] = ...
Shop[invoice:1:date] = "20171009"
Shop[invoice:2:customerDetails] = ...
Shop[invoice:2:details] = ...
Shop[invoice:2:date] = "20171010"
Shop[invoice:3:customerDetails] = ...
Shop[invoice:3:details] = ...
Shop[invoice:3:date] = "20171010"
Shop[invoice:4:customerDetails] = ...
Shop[invoice:4:details] = ...
Shop[invoice:4:date] = "20171013"

With the pattern invoice:<date>:<number>:[details | customerDetails] we can generate a sequence of keys for a given date, starting from an invoice number of 1 and continuing until, for some number n, the key turns out to be invalid.
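A sketch of such key generation in Python (again the store is modeled as a plain dictionary and the key layout follows the pattern above; everything else is an assumption):

def invoice_keys_for_date(shop, date):
    """Generate invoice key pairs for one date until a key is missing."""
    n = 1
    while f"invoice:{date}:{n}:details" in shop:
        yield (f"invoice:{date}:{n}:details",
               f"invoice:{date}:{n}:customerDetails")
        n += 1

shop = {
    "invoice:20171010:1:details": "...",
    "invoice:20171010:1:customerDetails": "...",
    "invoice:20171010:2:details": "...",
    "invoice:20171010:2:customerDetails": "...",
}
for details_key, customer_key in invoice_keys_for_date(shop, "20171010"):
    print(details_key, customer_key)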

If you have developed relational data models, you might have noticed parallels between the key-naming convention we have just presented and table names, column names and primary keys. Concatenating a table name with a primary key and a column name to get the key Customer:10:name is equivalent to a relational table customer, with a column called name, and a row identified by the primary key ID of 10. It is worth stressing that we should avoid key naming conventions which merely mimic a relational database schema. If that is the case, it is worth considering a database replacement. Using a key-value database as a relational one does not seem reasonable, although in some specific cases (for example, when the total number of tables is low) it could be effective.

SECTION 5 Values

Working with a key-value database, we have to carefully select a key naming strategy. Similarly, we have to balance aggregation boundaries for values, to make writes and reads more efficient as well as to reduce latency.

Below are some strategies. Whether they are good or bad depends on you – we will try to highlight their pros and cons.

Values which are big aggregates

The following aggregate may be an example of this strategy:

{ "Invoice number": 1, "Invoice details": [ {"Item name": "lightsaber", "Item quantity": 1, "Item price": 100},

{"Item name": "black cloak", "Item quantity": 2, "Item price": 50},

{"Item name": "air filter", "Item quantity": 10, "Item price": 2} ],

197 "Customer details": { "Customer name": "Dart Vader", "Customer location": "Star Destroyer" } }

The advantage of using a structure like this is that much of the information about the invoice is available with a single key lookup. By storing all the information together, we may reduce the number of disk seeks that must be performed to read all the needed data.

On the other hand, when an additional item is added, or even an existing one is edited (changed), the whole structure has to be written to disk. As the structure grows in size, the time required to read and write the data increases.

Another drawback is that we have to read such a big structure even if we need only a small piece of information – this way we waste time on reading it and memory on storing it.

Keep together values commonly used

Another approach is to store together only those values which are commonly used, for example under the separate keys invoice and customer details. Now we have more seeks and more read operations, but we spend less time reading particular data.

Small values support caching

Assume that our database keeps data we have read before in a memory buffer (cache), so that if we want it again, the database can serve it much faster than from disk. Of course, the size of the cache is limited, so we may be able to store, say, 2 big structures or 10 smaller ones. Now if we want to print the third customer's name, we have to remove one big structure and replace it with a new one or, in the second case, remove one small structure (for example the one with the second customer's name) and replace it with the third customer's name. In the second case, if we then need the second customer's items, there is a high chance that all of them are still in memory, while in the first case the probability that we don't have to reload them is much lower.
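The effect can be simulated with a toy cache bounded by a byte budget; this is only an illustration of the eviction trade-off, not how any real database implements its cache:

from collections import OrderedDict

class ByteBudgetCache:
    """A toy LRU cache bounded by the total size of its values."""

    def __init__(self, budget):
        self.budget = budget
        self.items = OrderedDict()

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)          # mark as most recently used
        while sum(len(v) for v in self.items.values()) > self.budget:
            self.items.popitem(last=False)   # evict least recently used

big = ByteBudgetCache(budget=100)
big.put("aggregate:1", b"x" * 60)
big.put("aggregate:2", b"x" * 60)            # evicts aggregate:1
print(len(big.items))                        # 1 item resident

small = ByteBudgetCache(budget=100)
for i in range(10):
    small.put(f"name:{i}", b"x" * 10)        # all ten small values fit
print(len(small.items))                      # 10 items resident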

SECTION 6 Summary

• No tables, so there are no features associated with tables, such as column types or constraints on columns.

• There are no tables, so there is no need for joins. In consequence, foreign keys do not exist.

• No support for a rich query language such as SQL. Truth be told, the query language is very primitive, limited to simple equivalents of select, insert and delete commands.

• Contrary to relational databases, where meaningless keys are used, the keys in key-value databases are meaningful and play a crucial role.

• Although key-value databases don't have any schema, we have to very carefully balance aggregation boundaries for values, to make writes and reads more efficient as well as to reduce latency.

SECTION 7 Working with Riak

Creating an object

To facilitate key handling, objects in Riak are collected in buckets. Buckets are essentially a flat namespace; they may also be seen as a common prefix for a set of keys or, if we want some reference to the relational model, as a table name. The basic form of writes (object creation) is (in this chapter we will use the HTTP protocol, so if you need some explanation of how it works, please read Appendix C: HTTP):

PUT /types/<type>/buckets/<bucket>/keys/<key>

There is no need to intentionally create buckets. They pop into existence when keys are added to them, and disappear when all keys have been removed from them. If we don’t specify a bucket’s type, the type default will be applied. If we're using HTTP, POST can be used instead of PUT. The only difference between POST and PUT is that we should POST in cases where we want Riak to auto-generate a key.

Here is an example of storing an object (the short text Test string 01) under the key key001 in the bucket bucket001, which bears the type type001:

nosql@nosql:~$ curl -X PUT \
> -H "Content-Type: text/plain" \
> -d "Test string 01" \
> http://localhost:8098/types/type001/buckets/bucket001/keys/key001

Riak replies with

Unknown bucket type: type001

Notice that although we don't have to create the bucket in advance, we do have to create and activate the type we want to use – the above command will only work if the type001 bucket type has been created and activated. The step below creates the bucket type

nosql@nosql:~$ sudo riak-admin bucket-type create type001 '{"props":{}}'
type001 created

and then activates it

nosql@nosql:~$ sudo riak-admin bucket-type activate type001
type001 has been activated

Now we can try again to add an object

nosql@nosql:~$ curl -X PUT \
> -H "Content-Type: text/plain" \
> -d "Test string 01" \
> http://localhost:8098/types/type001/buckets/bucket001/keys/key001

Hmmm... Riak replies with no errors and no other messages... It's time to get something from our database.

Reading an object

We can think of writes in Riak as analogous to HTTP PUT (or POST) requests. Similarly, reads in Riak correspond to HTTP GET requests. We specify a bucket type, bucket, and key, and Riak either returns the object that's stored there – including its siblings (more on that later) – or it returns not found (the equivalent of an HTTP 404 Not Found).

Here is the basic command form for retrieving a specific key from a bucket:

GET /types/<type>/buckets/<bucket>/keys/<key>

and here is how we can use it

nosql@nosql:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Test string 01

If there's no object stored in the location where we attempt a read, we will get the not found response.

nosql@nosql:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key002
not found

205 Updating objects

If an object already exists under a certain key and we want to write a new object to that key, Riak needs to know what to do, especially if multiple writes are happening at the same time. Which of the objects being written should be deemed correct? Such questions arise quite frequently in distributed, eventually consistent systems.

Riak decides which object to choose in case of conflict using causal context. Causal context tracks the causal history of an object. It is attached to every Riak object as metadata, and it is not readable by humans. Using causal context in an update involves the following steps:

• Fetch the object.

• Modify the object’s value (without modifying the fetched context object).

• Write the new object to Riak.

The most important thing to bear in mind when updating objects is this: we should always read an object prior to updating it, unless we are certain that no object is stored there. If we are storing sensor data in Riak and using timestamps as keys, for example, then we can be sure that keys are not repeated. In that case, making writes to Riak without first reading the object is fine. If we're not certain, however, then it is recommended to always read the object first.

When using curl, the context object is attached to the X-Riak-Vclock header:

nosql@nosql:~$ curl -i http://localhost:8098/types/type001/buckets/bucket001/keys/key001
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frbujHgFZDNlMCUy5rEy9LiyXOXLAgA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.9 (cafe not found)
Link: ; rel="up"
Last-Modified: Thu, 06 Sep 2018 10:26:38 GMT
ETag: "1KTnww1Q52P9d666EtbP41"
Date: Thu, 06 Sep 2018 10:28:56 GMT
Content-Type: text/plain
Content-Length: 14

As we can see, for HTTP we can get the header information shown before the data by using the -i / --include option. curl also understands the -D / --dump-header option when getting files over both FTP and HTTP, and it will then store the headers in the specified file.

nosql@nosql:~$ curl -D headers.txt http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Test string 01

nosql@nosql:~$ cat headers.txt
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frbujHgFZDNlMCUy5rEy9LiyXOXLAgA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.9 (cafe not found)
Link: ; rel="up"
Last-Modified: Thu, 06 Sep 2018 10:26:38 GMT
ETag: "1KTnww1Q52P9d666EtbP41"
Date: Thu, 06 Sep 2018 10:29:30 GMT
Content-Type: text/plain
Content-Length: 14

When performing a write to the same key, that same header needs to accompany the write for Riak to be able to use the context object. Before we do this, let's check what happens if we ignore X-Riak-Vclock:

nosql@nosql:~$ curl -X PUT \
> -H "Content-Type: text/plain" -d "Test string 01_updated" \
> http://localhost:8098/types/type001/buckets/bucket001/keys/key001

nosql@nosql:~$ curl -D headers.txt http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Siblings:
1KTnww1Q52P9d666EtbP41
38C20HjUtlR8syP3CMGlVW

Something went wrong – we will come back to this case a little later. Let's create another object and update it according to the steps above (using the X-Riak-Vclock header).

nosql@nosql:~$ curl -X PUT -H "Content-Type: text/plain" -d "Test string 02" http://localhost:8098/types/type001/buckets/bucket001/keys/key002

nosql@nosql:~$ curl -D headers.txt http://localhost:8098/types/type001/buckets/bucket001/keys/key002
Test string 02

nosql@nosql:~$ cat headers.txt
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frZuctCDCCUy5rEyzN/McpUvCwA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.9 (cafe not found)
Link: ; rel="up"
Last-Modified: Thu, 06 Sep 2018 10:30:30 GMT
ETag: "4j7mSGlt44VnZkh8eaCltP"
Date: Thu, 06 Sep 2018 10:30:55 GMT
Content-Type: text/plain
Content-Length: 14

Having a new object (the string Test string 02), we can try to modify it:

nosql@nosql:~$ curl -X PUT \
> -H "Content-Type: text/plain" \
> -H "X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frZuctCDCCUy5rEyzN/McpUvCwA=" \
> -d "Test string 02 updated" \
> http://localhost:8098/types/type001/buckets/bucket001/keys/key002

nosql@nosql:~$ curl -D headers.txt http://localhost:8098/types/type001/buckets/bucket001/keys/key002
Test string 02 updated

nosql@nosql:~$ cat headers.txt
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frZuctCDCCUy5bEy8G5lucqXBQA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.9 (cafe not found)
Link: ; rel="up"
Last-Modified: Thu, 06 Sep 2018 10:31:55 GMT
ETag: "5RUinoCMbFaMyGdfofm5bZ"
Date: Thu, 06 Sep 2018 10:32:00 GMT
Content-Type: text/plain
Content-Length: 22

Notice that after the update our X-Riak-Vclock has changed from

a85hYGBgzGDKBVI8ypz/frZuctCDCCUy5rEyzN/McpUvCwA=

to

a85hYGBgzGDKBVI8ypz/frZuctCDCCUy5bEy8G5lucqXBQA=
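The same fetch–modify–write cycle can be scripted. Here is a minimal sketch using Python's requests library against the same HTTP interface as in the curl examples (host, port, type, bucket and key are taken from those examples; treat it as an illustration of the steps, not a full client):

import requests

URL = ("http://localhost:8098/types/type001"
       "/buckets/bucket001/keys/key002")

# 1. Fetch the object together with its causal context.
response = requests.get(URL)
vclock = response.headers["X-Riak-Vclock"]

# 2. Modify the value; the fetched context itself stays untouched.
new_value = response.text + " (modified again)"

# 3. Write the new value back, passing the causal context along.
requests.put(URL,
             data=new_value,
             headers={"Content-Type": "text/plain",
                      "X-Riak-Vclock": vclock})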

Conflict Resolution. Siblings

A sibling is created when Riak is unable to resolve the canonical version of an object being stored, i.e. when Riak is presented with multiple possible values for an object and can't figure out which one is most causally recent. In our case we have such a conflict: there are two objects (the strings Test string 01 and Test string 01_updated) under the same key key001 in the bucket bucket001, which bears the type type001.

nosql@nosql:~$ curl -D headers.txt http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Siblings:
1KTnww1Q52P9d666EtbP41
38C20HjUtlR8syP3CMGlVW

As we can see, reading an object with sibling values results in some form of "multiple choices" response (e.g., 300 Multiple Choices in HTTP). If we're using the HTTP interface and want to view all sibling values, we can attach an Accept: multipart/mixed header to our request to get all siblings in one request:

nosql@nosql:~$ curl -X GET \
> -H "Accept: multipart/mixed" \
> -D headers.txt \
> http://localhost:8098/types/type001/buckets/bucket001/keys/key001

--LJ8Xy5UHU8z3L3OtKvMwM7BX9DM
Content-Type: text/plain
Link: ; rel="up"
Etag: 1KTnww1Q52P9d666EtbP41
Last-Modified: Thu, 06 Sep 2018 11:26:38 GMT

Test string 01
--LJ8Xy5UHU8z3L3OtKvMwM7BX9DM
Content-Type: text/plain
Link: ; rel="up"
Etag: 38C20HjUtlR8syP3CMGlVW
Last-Modified: Thu, 06 Sep 2018 11:27:10 GMT

Test string 01 updated
--LJ8Xy5UHU8z3L3OtKvMwM7BX9DM--

nosql@nosql:~$ cat headers.txt
HTTP/1.1 300 Multiple Choices
X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frbujHgFZDNnMCUy5bEy/KthvcqXBQA=
Vary: Accept, Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.10.9 (cafe not found)
Last-Modified: Thu, 06 Sep 2018 10:27:10 GMT
ETag: "6nZEerBvuMlMlcQITGUY7C"
Date: Thu, 06 Sep 2018 21:28:56 GMT
Content-Type: multipart/mixed; boundary=LJ8Xy5UHU8z3L3OtKvMwM7BX9DM
Content-Length: 421

We can request individual siblings by adding the vtag query parameter specifying which sibling to retrieve.

nosql@nosql:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Siblings:
1KTnww1Q52P9d666EtbP41
38C20HjUtlR8syP3CMGlVW

nosql@nosql:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key001?vtag=1KTnww1Q52P9d666EtbP41
Test string 01

nosql@nosql:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key001?vtag=38C20HjUtlR8syP3CMGlVW
Test string 01 updated

To resolve the conflict, we store the resolved version with the X-Riak-Vclock given in the response – in our case:

X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frbujHgFZDNnMCUy5bEy/KthvcqXBQA=

nosql@nosql:~$ curl -X PUT \
> -H "Content-Type: text/plain" \
> -H "X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/frbujHgFZDNnMCUy5bEy/KthvcqXBQA=" \
> -d "Test string 01 updated" \
> http://localhost:8098/types/type001/buckets/bucket001/keys/key001

nosql@nosql:~$ curl -X GET http://localhost:8098/types/type001/buckets/bucket001/keys/key001
Test string 01 updated

Deleting objects

The delete command looks like this

DELETE /types/<type>/buckets/<bucket>/keys/<key>

and we can use it as shown below

nosql@nosql:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key002
Test string 02 updated

nosql@nosql:~$ curl -X DELETE http://localhost:8098/types/type001/buckets/bucket001/keys/key002

nosql@nosql:~$ curl http://localhost:8098/types/type001/buckets/bucket001/keys/key002
not found
