Seven Databases in Seven Weeks, Second Edition

Total Page:16

File Type:pdf, Size:1020Kb

Seven Databases in Seven Weeks, Second Edition Prepared exclusively for Shohei Tanaka Prepared exclusively for Shohei Tanaka What Readers Are Saying About Seven Databases in Seven Weeks, Second Edition Choosing a database is perhaps one of the most important architectural decisions a developer can make. Seven Databases in Seven Weeks provides a fantastic tour of different technologies and makes it easy to add each to your engineering toolbox. ➤ Dave Parfitt Senior Site Reliability Engineer, Mozilla By comparing each database technology to a tool you’d find in any workshop, the authors of Seven Databases in Seven Weeks provide a practical and well-balanced survey of a very diverse and highly varied datastore landscape. Anyone looking to get a handle on the database options available to them as a data platform should read this book and consider the trade-offs presented for each option. ➤ Matthew Oldham Director of Data Architecture, Graphium Health Reading this book felt like some of my best pair-programming experiences. It showed me how to get started, kept me engaged, and encouraged me to experiment on my own. ➤ Jesse Hallett Open Source Mentor This book will really give you an overview of what’s out there so you can choose the best tool for the job. ➤ Jesse Anderson Managing Director, Big Data Institute Prepared exclusively for Shohei Tanaka We've left this page blank to make the page numbers the same in the electronic and paper books. We tried just leaving it out, but then people wrote us to ask about the missing pages. Anyway, Eddy the Gerbil wanted to say “hello.” Prepared exclusively for Shohei Tanaka Seven Databases in Seven Weeks, Second Edition A Guide to Modern Databases and the NoSQL Movement Luc Perkins with Eric Redmond and Jim R. Wilson The Pragmatic Bookshelf Raleigh, North Carolina Prepared exclusively for Shohei Tanaka Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and The Pragmatic Programmers, LLC was aware of a trademark claim, the designations have been printed in initial capital letters or in all capitals. The Pragmatic Starter Kit, The Pragmatic Programmer, Pragmatic Programming, Pragmatic Bookshelf, PragProg and the linking g device are trade- marks of The Pragmatic Programmers, LLC. Every precaution was taken in the preparation of this book. However, the publisher assumes no responsibility for errors or omissions, or for damages that may result from the use of information (including program listings) contained herein. Our Pragmatic books, screencasts, and audio books can help you and your team create better software and have more fun. Visit us at https://pragprog.com. The team that produced this book includes: Publisher: Andy Hunt VP of Operations: Janet Furlow Managing Editor: Brian MacDonald Supervising Editor: Jacquelyn Carter Series Editor: Bruce A. Tate Copy Editor: Nancy Rapoport Indexing: Potomac Indexing, LLC Layout: Gilson Graphics For sales, volume licensing, and support, please contact [email protected]. For international rights, please contact [email protected]. Copyright © 2018 The Pragmatic Programmers, LLC. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. Printed in the United States of America. ISBN-13: 978-1-68050-253-4 Encoded using the finest acid-free high-entropy binary digits. Book version: P1.0—April 2018 Prepared exclusively for Shohei Tanaka Contents Acknowledgments . vii Preface . ix 1. Introduction . 1 It Starts with a Question 2 The Genres 3 Onward and Upward 8 2. PostgreSQL . 9 That’s Post-greS-Q-L 9 Day 1: Relations, CRUD, and Joins 10 Day 2: Advanced Queries, Code, and Rules 21 Day 3: Full Text and Multidimensions 36 Wrap-Up 50 3. HBase . 53 Introducing HBase 54 Day 1: CRUD and Table Administration 55 Day 2: Working with Big Data 67 Day 3: Taking It to the Cloud 82 Wrap-Up 88 4. MongoDB . 93 Hu(mongo)us 93 Day 1: CRUD and Nesting 94 Day 2: Indexing, Aggregating, Mapreduce 110 Day 3: Replica Sets, Sharding, GeoSpatial, and GridFS 124 Wrap-Up 132 5. CouchDB . 135 Relaxing on the Couch 135 Prepared exclusively for Shohei Tanaka Contents • vi Day 1: CRUD, Fauxton, and cURL Redux 137 Day 2: Creating and Querying Views 145 Day 3: Advanced Views, Changes API, and Replicating Data 158 Wrap-Up 174 6. Neo4J . 177 Neo4j Is Whiteboard Friendly 177 Day 1: Graphs, Cypher, and CRUD 179 Day 2: REST, Indexes, and Algorithms 189 Day 3: Distributed High Availability 202 Wrap-Up 207 7. DynamoDB . 211 DynamoDB: The “Big Easy” of NoSQL 211 Day 1: Let’s Go Shopping! 216 Day 2: Building a Streaming Data Pipeline 233 Day 3: Building an “Internet of Things” System Around DynamoDB 246 Wrap-Up 255 8. Redis . 259 Data Structure Server Store 259 Day 1: CRUD and Datatypes 260 Day 2: Advanced Usage, Distribution 274 Day 3: Playing with Other Databases 289 Wrap-Up 303 9. Wrapping Up . 305 Genres Redux 305 Making a Choice 309 Where Do We Go from Here? 309 A1. Database Overview Tables . 311 A2. The CAP Theorem . 315 Eventual Consistency 316 CAP in the Wild 317 The Latency Trade-Off 317 Bibliography . 319 Index . 321 Prepared exclusively for Shohei Tanaka Acknowledgments A book with the size and scope of this one is never the work of just the authors, even if there are three of them. It requires the effort of many very smart people with superhuman eyes spotting as many mistakes as possible and providing valuable insights into the details of these technologies. We’d like to thank, in no particular order, all of the folks who provided their time and expertise: Dave Parfitt Jerry Sievert Jesse Hallett Matthew Oldham Ben Rady Nick Capito Jesse Anderson Sean Moubry Finally, thanks to Bruce Tate for his experience and guidance. We’d also like to sincerely thank the entire team at the Pragmatic Bookshelf. Thanks for entertaining this audacious project and seeing us through it. We’re especially grateful to our editor, Jackie Carter. Your patient feedback made this book what it is today. Thanks to the whole team who worked so hard to polish this book and find all of our mistakes. For anyone we missed, we hope you’ll accept our apologies. Any omissions were certainly not intentional. From Eric: Dear Noelle, you’re not special; you’re unique, and that’s so much better. Thanks for living through another book. Thanks also to the database creators and committers for providing us something to write about and make a living at. From Luc: First, I have to thank my wonderful family and friends for making my life a charmed one from the very beginning. Second, I have to thank a handful of people who believed in me and gave me a chance in the tech industry at different stages of my career: Lucas Carlson, Marko and Saša Gargenta, Troy Howard, and my co-author Eric Redmond for inviting me on board to Prepared exclusively for Shohei Tanaka report erratum • discuss Acknowledgments • viii prepare the most recent edition of this book. My journey in this industry has changed my life and I thank all of you for crucial breakthroughs. From Jim: First, I want to thank my family: Ruthy, your boundless patience and encouragement have been heartwarming. Emma and Jimmy, you’re two smart cookies, and your daddy loves you always. Also, a special thanks to all the unsung heroes who monitor IRC, message boards, mailing lists, and bug systems ready to help anyone who needs you. Your dedication to open source keeps these projects kicking. Prepared exclusively for Shohei Tanaka report erratum • discuss Preface If we use oil extraction as a metaphor for understanding data in the contem- porary world, then databases flat-out constitute—or are deeply intertwined with—all aspects of the extraction chain, from the fields to the refineries, drills, and pumps. If you want to harness the potential of data—which has perhaps become as vital to our way of life as oil—then you need to understand databases because they are quite simply the most important piece of modern data equipment. Databases are tools, a means to an end. But like any complex tool, databases also harbor their own stories and embody their own ways of looking at the world. The better you understand databases, the more capable you’ll be of tapping into the ever-growing corpus of data at our disposal. That enhanced understanding could lead to anything from undertaking fun side projects to embarking on a career change or starting your own data-driven company. Why a NoSQL Book What exactly does the term NoSQL even mean? Which types of systems does the term include? How will NoSQL impact the practice of making great soft- ware? These were questions we wanted to answer—as much for ourselves as for others. Looking back more than a half-decade later, the rise of NoSQL isn’t so much buzzworthy as it is an accepted fact. You can still read plenty of articles about NoSQL technologies on HackerNews, TechCrunch, or even WIRED, but the tenor of those articles has changed from starry-eyed prophecy (“NoSQL will change everything!”) to more standard reporting (“check out this new Redis feature!”). NoSQL is now a mainstay and a steadily maturing one at that. But don’t write a eulogy for relational databases—the “SQL” in “NoSQL”—just yet. Although NoSQL databases have gained significant traction in the tech- nological landscape, it’s still far too early to declare “traditional” relational database models as dead or even dying.
Recommended publications
  • Preview HSQLDB Tutorial (PDF Version)
    About the Tutorial HyperSQL Database is a modern relational database manager that conforms closely to the SQL:2011 standard and JDBC 4 specifications. It supports all core features and RDBMS. HSQLDB is used for the development, testing, and deployment of database applications. In this tutorial, we will look closely at HSQLDB, which is one of the best open-source, multi-model, next generation NoSQL product. Audience This tutorial is designed for Software Professionals who are willing to learn HSQL Database in simple and easy steps. It will give you a great understanding on HSQLDB concepts. Prerequisites Before you start practicing the various types of examples given in this tutorial, we assume you are already aware of the concepts of database, especially RDBMS. Disclaimer & Copyright Copyright 2016 by Tutorials Point (I) Pvt. Ltd. All the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt. Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute or republish any contents or a part of contents of this e-book in any manner without written consent of the publisher. We strive to update the contents of our website and tutorials as timely and as precisely as possible, however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our website or its contents including this tutorial. If you discover any errors on our website or in this tutorial, please notify us at [email protected]. i Table of Contents About the Tutorial ...................................................................................................................................
    [Show full text]
  • Oracle Metadata Management V12.2.1.3.0 New Features Overview
    An Oracle White Paper October 12 th , 2018 Oracle Metadata Management v12.2.1.3.0 New Features Overview Oracle Metadata Management version 12.2.1.3.0 – October 12 th , 2018 New Features Overview Disclaimer This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in this document remains at the sole discretion of Oracle. This document in any form, software or printed matter, contains proprietary information that is the exclusive property of Oracle. This document and information contained herein may not be disclosed, copied, reproduced, or distributed to anyone outside Oracle without prior written consent of Oracle. This document is not part of your license agreement nor can it be incorporated into any contractual agreement with Oracle or its subsidiaries or affiliates. 1 Oracle Metadata Management version 12.2.1.3.0 – October 12 th , 2018 New Features Overview Table of Contents Executive Overview ............................................................................ 3 Oracle Metadata Management 12.2.1.3.0 .......................................... 4 METADATA MANAGER VS METADATA EXPLORER UI .............. 4 METADATA HOME PAGES ........................................................... 5 METADATA QUICK ACCESS ........................................................ 6 METADATA REPORTING .............................................................
    [Show full text]
  • Apache Log4j 2 V
    ...................................................................................................................................... Apache Log4j 2 v. 2.4.1 User's Guide ...................................................................................................................................... The Apache Software Foundation 2015-10-08 T a b l e o f C o n t e n t s i Table of Contents ....................................................................................................................................... 1. Table of Contents . i 2. Introduction . 1 3. Architecture . 3 4. Log4j 1.x Migration . 10 5. API . 16 6. Configuration . 19 7. Web Applications and JSPs . 50 8. Plugins . 58 9. Lookups . 62 10. Appenders . 70 11. Layouts . 128 12. Filters . 154 13. Async Loggers . 167 14. JMX . 181 15. Logging Separation . 188 16. Extending Log4j . 190 17. Programmatic Log4j Configuration . 198 18. Custom Log Levels . 204 © 2 0 1 5 , T h e A p a c h e S o f t w a r e F o u n d a t i o n • A L L R I G H T S R E S E R V E D . T a b l e o f C o n t e n t s ii © 2 0 1 5 , T h e A p a c h e S o f t w a r e F o u n d a t i o n • A L L R I G H T S R E S E R V E D . 1 I n t r o d u c t i o n 1 1 Introduction ....................................................................................................................................... 1.1 Welcome to Log4j 2! 1.1.1 Introduction Almost every large application includes its own logging or tracing API. In conformance with this rule, the E.U.
    [Show full text]
  • Base Handbook Copyright
    Version 4.0 Base Handbook Copyright This document is Copyright © 2013 by its contributors as listed below. You may distribute it and/or modify it under the terms of either the GNU General Public License (http://www.gnu.org/licenses/gpl.html), version 3 or later, or the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), version 3.0 or later. All trademarks within this guide belong to their legitimate owners. Contributors Jochen Schiffers Robert Großkopf Jost Lange Hazel Russman Martin Fox Andrew Pitonyak Dan Lewis Jean Hollis Weber Acknowledgments This book is based on an original German document, which was translated by Hazel Russman and Martin Fox. Feedback Please direct any comments or suggestions about this document to: [email protected] Publication date and software version Published 3 July 2013. Based on LibreOffice 4.0. Documentation for LibreOffice is available at http://www.libreoffice.org/get-help/documentation Contents Copyright..................................................................................................................................... 2 Contributors.............................................................................................................................2 Feedback................................................................................................................................ 2 Acknowledgments................................................................................................................... 2 Publication
    [Show full text]
  • Chainsys-Platform-Technical Architecture-Bots
    Technical Architecture Objectives ChainSys’ Smart Data Platform enables the business to achieve these critical needs. 1. Empower the organization to be data-driven 2. All your data management problems solved 3. World class innovation at an accessible price Subash Chandar Elango Chief Product Officer ChainSys Corporation Subash's expertise in the data management sphere is unparalleled. As the creative & technical brain behind ChainSys' products, no problem is too big for Subash, and he has been part of hundreds of data projects worldwide. Introduction This document describes the Technical Architecture of the Chainsys Platform Purpose The purpose of this Technical Architecture is to define the technologies, products, and techniques necessary to develop and support the system and to ensure that the system components are compatible and comply with the enterprise-wide standards and direction defined by the Agency. Scope The document's scope is to identify and explain the advantages and risks inherent in this Technical Architecture. This document is not intended to address the installation and configuration details of the actual implementation. Installation and configuration details are provided in technology guides produced during the project. Audience The intended audience for this document is Project Stakeholders, technical architects, and deployment architects The system's overall architecture goals are to provide a highly available, scalable, & flexible data management platform Architecture Goals A key Architectural goal is to leverage industry best practices to design and develop a scalable, enterprise-wide J2EE application and follow the industry-standard development guidelines. All aspects of Security must be developed and built within the application and be based on Best Practices.
    [Show full text]
  • D3.2 - Transport and Spatial Data Warehouse
    Holistic Approach for Providing Spatial & Transport Planning Tools and Evidence to Metropolitan and Regional Authorities to Lead a Sustainable Transition to a New Mobility Era D3.2 - Transport and Spatial Data Warehouse Technical Design @Harmony_H2020 #harmony-h2020 D3.2 - Transport and SpatialData Warehouse Technical Design SUMMARY SHEET PROJECT Project Acronym: HARMONY Project Full Title: Holistic Approach for Providing Spatial & Transport Planning Tools and Evidence to Metropolitan and Regional Authorities to Lead a Sustainable Transition to a New Mobility Era Grant Agreement No. 815269 (H2020 – LC-MG-1-2-2018) Project Coordinator: University College London (UCL) Website www.harmony-h2020.eu Starting date June 2019 Duration 42 months DELIVERABLE Deliverable No. - Title D3.2 - Transport and Spatial Data Warehouse Technical Design Dissemination level: Public Deliverable type: Demonstrator Work Package No. & Title: WP3 - Data collection tools, data fusion and warehousing Deliverable Leader: ICCS Responsible Author(s): Efthimios Bothos, Babis Magoutas, Nikos Papageorgiou, Gregoris Mentzas (ICCS) Responsible Co-Author(s): Panagiotis Georgakis (UoW), Ilias Gerostathopoulos, Shakur Al Islam, Athina Tsirimpa (MOBY X SOFTWARE) Peer Review: Panagiotis Georgakis (UoW), Ilias Gerostathopoulos (MOBY X SOFTWARE) Quality Assurance Committee Maria Kamargianni, Lampros Yfantis (UCL) Review: DOCUMENT HISTORY Version Date Released by Nature of Change 0.1 02/12/2019 ICCS ToC defined 0.3 15/01/2020 ICCS Conceptual approach described 0.5 10/03/2020 ICCS Data descriptions updated 0.7 15/04/2020 ICCS Added sections 3 and 4 0.9 12/05/2020 ICCS Ready for internal review 1.0 01/06/2020 ICCS Final version 1 D3.2 - Transport and SpatialData Warehouse Technical Design TABLE OF CONTENTS EXECUTIVE SUMMARY ....................................................................................................................
    [Show full text]
  • Full-Graph-Limited-Mvn-Deps.Pdf
    org.jboss.cl.jboss-cl-2.0.9.GA org.jboss.cl.jboss-cl-parent-2.2.1.GA org.jboss.cl.jboss-classloader-N/A org.jboss.cl.jboss-classloading-vfs-N/A org.jboss.cl.jboss-classloading-N/A org.primefaces.extensions.master-pom-1.0.0 org.sonatype.mercury.mercury-mp3-1.0-alpha-1 org.primefaces.themes.overcast-${primefaces.theme.version} org.primefaces.themes.dark-hive-${primefaces.theme.version}org.primefaces.themes.humanity-${primefaces.theme.version}org.primefaces.themes.le-frog-${primefaces.theme.version} org.primefaces.themes.south-street-${primefaces.theme.version}org.primefaces.themes.sunny-${primefaces.theme.version}org.primefaces.themes.hot-sneaks-${primefaces.theme.version}org.primefaces.themes.cupertino-${primefaces.theme.version} org.primefaces.themes.trontastic-${primefaces.theme.version}org.primefaces.themes.excite-bike-${primefaces.theme.version} org.apache.maven.mercury.mercury-external-N/A org.primefaces.themes.redmond-${primefaces.theme.version}org.primefaces.themes.afterwork-${primefaces.theme.version}org.primefaces.themes.glass-x-${primefaces.theme.version}org.primefaces.themes.home-${primefaces.theme.version} org.primefaces.themes.black-tie-${primefaces.theme.version}org.primefaces.themes.eggplant-${primefaces.theme.version} org.apache.maven.mercury.mercury-repo-remote-m2-N/Aorg.apache.maven.mercury.mercury-md-sat-N/A org.primefaces.themes.ui-lightness-${primefaces.theme.version}org.primefaces.themes.midnight-${primefaces.theme.version}org.primefaces.themes.mint-choc-${primefaces.theme.version}org.primefaces.themes.afternoon-${primefaces.theme.version}org.primefaces.themes.dot-luv-${primefaces.theme.version}org.primefaces.themes.smoothness-${primefaces.theme.version}org.primefaces.themes.swanky-purse-${primefaces.theme.version}
    [Show full text]
  • Main Page 1 Main Page
    Main Page 1 Main Page FLOSSMETRICS/ OpenTTT guides FLOSS (Free/Libre open source software) is one of the most important trends in IT since the advent of the PC and commodity software, but despite the potential impact on European firms, its adoption is still hampered by limited knowledge, especially among SMEs that could potentially benefit the most from it. This guide (developed in the context of the FLOSSMETRICS and OpenTTT projects) present a set of guidelines and suggestions for the adoption of open source software within SMEs, using a ladder model that will guide companies from the initial selection and adoption of FLOSS within the IT infrastructure up to the creation of suitable business models based on open source software. The guide is split into an introduction to FLOSS and a catalog of open source applications, selected to fulfill the requests that were gathered in the interviews and audit in the OpenTTT project. The application areas are infrastructural software (ranging from network and system management to security), ERP and CRM applications, groupware, document management, content management systems (CMS), VoIP, graphics/CAD/GIS systems, desktop applications, engineering and manufacturing, vertical business applications and eLearning. This is the third edition of the guide; the guide is distributed under a CC-attribution-sharealike 3.0 license. The author is Carlo Daffara ([email protected]). The complete guide in PDF format is avalaible here [1] Free/ Libre Open Source Software catalog Software: a guide for SMEs • Software Catalog Introduction • SME Guide Introduction • 1. What's Free/Libre/Open Source Software? • Security • 2. Ten myths about free/libre open source software • Data protection and recovery • 3.
    [Show full text]
  • Apache Couchdb Tutorial Apache Couchdb Tutorial
    Apache CouchDB Tutorial Apache CouchDB Tutorial Welcome to CouchDB Tutorial. In this CouchDB Tutorial, we will learn how to install CouchDB, create database in CouchDB, create documents in a database, replication between CouchDBs, configure databases, and many other concepts. What is CouchDB? CouchDB is a NoSQL Database that uses JSON for documents. CouchDB uses JavaScript for MapReduce indexes. CouchDB uses HTTP for the REST API. CouchDB Installation To install CouchDB, visit [https://couchdb.apache.org/] and click on the download button as shown below. When you click on the download button, it scrolls to the section, where based on your Operating System, you can download the installer. In this tutorial, we have downloaded for Windows (x64), and it should not make any difference if you download for macOs or Debian/Ubuntu/RHEL/CentOS. Double click the downloaded installer and follow through the steps. Once the installation is complete, you can check if CouchDB is installed successfully by requesting the URL http://127.0.0.1:5984/ in your browser. This is CouchDB saying welcome to you, along with information about CouchDB version, GIT hash, UUID, features and vendor. Fauxton Fauxton is a web based interface built into CouchDB. You can do actions like creating and deleting databases, CRUD operations on documents, user management, running MapReduce on indexex, replication between CouchDB instances. You can access CouchDB through Fauxton available at the URL http://127.0.0.1:5984/_utils/. Here you can access the following tabs in the left menu. All Databases Setup Active Tasks Configuration Replication Documentation Verify Create Database in CouchDB To create a CouchDB Database, click on Databases tab in the left menu and then click on Create Database.
    [Show full text]
  • Performance Evaluation of Relational Embedded Databases: an Empirical
    Performance evaluation of relational Check for updates embedded databases: an empirical study Evaluación del rendimiento de bases de datos embebida: un estudio empírico Author: ABSTRACT 1 Hassan B. Hassan Introduction: With the rapid deployment of embedded databases Qusay I. Sarhan2 across a wide range of embedded devices such as mobile devices, Internet of Things (IoT) devices, etc., the amount of data generat- ed by such devices is also growing increasingly. For this reason, the SCIENTIFIC RESEARCH performance is considered as a crucial criterion in the process of selecting the most suitable embedded database management system How to cite this paper: to be used to store/retrieve data of these devices. Currently, many Hassan, B. H., and Sarhan, Q. I., Performance embedded databases are available to be utilized in this context. Ma- evaluation of relational embedded databases: an empirical study, Kurdistan, Irak. Innovaciencia. terials and Methods: In this paper, four popular open-source rela- 2018; 6(1): 1-9. tional embedded databases; namely, H2, HSQLDB, Apache Derby, http://dx.doi.org/10.15649/2346075X.468 and SQLite have been compared experimentally with each other to evaluate their operational performance in terms of creating data- Reception date: base tables, retrieving data, inserting data, updating data, deleting Received: 22 September 2018 Accepted: 10 December 2018 data. Results and Discussion: The experimental results of this Published: 28 December 2018 paper have been illustrated in Table 4. Conclusions: The experi- mental results and analysis showed that HSQLDB outperformed other databases in most evaluation scenarios. Keywords: Embedded devices, Embedded databases, Performance evaluation, Database operational performance, Test methodology.
    [Show full text]
  • Review Supported Technologies
    Review Supported Technologies This document supports Pentaho Business Analytics Suite 5.0 GA and Pentaho Data Integration 5.0 GA, documentation revision August 28, 2013, copyright © 2013 Pentaho Corporation. No part may be reprinted without written permission from Pentaho Corporation. All trademarks are the property of their respective owners. Help and Support Resources If you do not find answers to your quesions here, please contact your Pentaho technical support representative. Support-related questions should be submitted through the Pentaho Customer Support Portal at http://support.pentaho.com. For information about how to purchase support or enable an additional named support contact, please contact your sales representative, or send an email to [email protected]. For information about instructor-led training, visit http://www.pentaho.com/training. Liability Limits and Warranty Disclaimer The author(s) of this document have used their best efforts in preparing the content and the programs contained in it. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher make no warranty of any kind, express or implied, with regard to these programs or the documentation contained in this book. The author(s) and Pentaho shall not be liable in the event of incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of the programs, associated instructions, and/or claims. Trademarks Pentaho (TM) and the Pentaho logo are registered trademarks of Pentaho Corporation. All other trademarks are the property of their respective owners. Trademarked names may appear throughout this document.
    [Show full text]
  • HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack
    HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack Geoffrey C. Fox, Judy Qiu, Supun Kamburugamuve Shantenu Jha, Andre Luckow School of Informatics and Computing RADICAL Indiana University Rutgers University Bloomington, IN 47408, USA Piscataway, NJ 08854, USA fgcf, xqiu, [email protected] [email protected], [email protected] Abstract—We review the High Performance Computing En- systems as they illustrate key capabilities and often motivate hanced Apache Big Data Stack HPC-ABDS and summarize open source equivalents. the capabilities in 21 identified architecture layers. These The software is broken up into layers so that one can dis- cover Message and Data Protocols, Distributed Coordination, Security & Privacy, Monitoring, Infrastructure Management, cuss software systems in smaller groups. The layers where DevOps, Interoperability, File Systems, Cluster & Resource there is especial opportunity to integrate HPC are colored management, Data Transport, File management, NoSQL, SQL green in figure. We note that data systems that we construct (NewSQL), Extraction Tools, Object-relational mapping, In- from this software can run interoperably on virtualized or memory caching and databases, Inter-process Communication, non-virtualized environments aimed at key scientific data Batch Programming model and Runtime, Stream Processing, High-level Programming, Application Hosting and PaaS, Li- analysis problems. Most of ABDS emphasizes scalability braries and Applications, Workflow and Orchestration. We but not performance and one of our goals is to produce summarize status of these layers focusing on issues of impor- high performance environments. Here there is clear need tance for data analytics. We highlight areas where HPC and for better node performance and support of accelerators like ABDS have good opportunities for integration.
    [Show full text]