HBase: The Definitive Guide

Total Pages: 16

File Type: PDF, Size: 1020 KB

HBase: The Definitive Guide
by Lars George

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

Copyright © 2011 Lars George. All rights reserved. Printed in the United States of America. Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or [email protected].

Editors: Mike Loukides and Julie Steele
Production Editor: Jasmine Perez
Copyeditor: Audrey Doyle
Proofreader: Jasmine Perez
Indexer: Angela Howard
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History: September 2011: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. HBase: The Definitive Guide, the image of a Clydesdale horse, and related trade dress are trademarks of O'Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps. While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-39610-7

For my wife Katja, my daughter Laura, and son Leon. I love you!

Table of Contents

Foreword (xv)
Preface (xix)

1. Introduction (1)
   The Dawn of Big Data (1); The Problem with Relational Database Systems (5); Nonrelational Database Systems, Not-Only SQL or NoSQL? (8); Dimensions (10); Scalability (12); Database (De-)Normalization (13); Building Blocks (16); Backdrop (16); Tables, Rows, Columns, and Cells (17); Auto-Sharding (21); Storage API (22); Implementation (23); Summary (27); HBase: The Hadoop Database (27); History (27); Nomenclature (29); Summary (29)

2. Installation (31)
   Quick-Start Guide (31); Requirements (34); Hardware (34); Software (40); Filesystems for HBase (52); Local (54); HDFS (54); S3 (54); Other Filesystems (55); Installation Choices (55); Apache Binary Release (55); Building from Source (58); Run Modes (58); Standalone Mode (59); Distributed Mode (59); Configuration (63); hbase-site.xml and hbase-default.xml (64); hbase-env.sh (65); regionserver (65); log4j.properties (65); Example Configuration (65); Client Configuration (67); Deployment (68); Script-Based (68); Apache Whirr (69); Puppet and Chef (70); Operating a Cluster (71); Running and Confirming Your Installation (71); Web-based UI Introduction (71); Shell Introduction (73); Stopping the Cluster (73)

3. Client API: The Basics (75)
   General Notes (75); CRUD Operations (76); Put Method (76); Get Method (95); Delete Method (105); Batch Operations (114); Row Locks (118); Scans (122); Introduction (122); The ResultScanner Class (124); Caching Versus Batching (127); Miscellaneous Features (133); The HTable Utility Methods (133); The Bytes Class (134)

4. Client API: Advanced Features (137)
   Filters (137); Introduction to Filters (137); Comparison Filters (140); Dedicated Filters (147); Decorating Filters (155); FilterList (159); Custom Filters (160); Filters Summary (167); Counters (168); Introduction to Counters (168); Single Counters (171); Multiple Counters (172); Coprocessors (175); Introduction to Coprocessors (175); The Coprocessor Class (176); Coprocessor Loading (179); The RegionObserver Class (182); The MasterObserver Class (190); Endpoints (193); HTablePool (199); Connection Handling (203)

5. Client API: Administrative Features (207)
   Schema Definition (207); Tables (207); Table Properties (210); Column Families (212); HBaseAdmin (218); Basic Operations (219); Table Operations (220); Schema Operations (228); Cluster Operations (230); Cluster Status Information (233)

6. Available Clients (241)
   Introduction to REST, Thrift, and Avro (241); Interactive Clients (244); Native Java (244); REST (244); Thrift (251); Avro (255); Other Clients (256); Batch Clients (257); MapReduce (257); Hive (258); Pig (263); Cascading (267); Shell (268); Basics (269); Commands (271); Scripting (274); Web-based UI (277); Master UI (277); Region Server UI (283); Shared Pages (283)

7. MapReduce Integration (289)
   Framework (289); MapReduce Introduction (289); Classes (290); Supporting Classes (293); MapReduce Locality (293); Table Splits (294); MapReduce over HBase (295); Preparation (295); Data Sink (301); Data Source (306); Data Source and Sink (308); Custom Processing (311)

8. Architecture (315)
   Seek Versus Transfer (315); B+ Trees (315); Log-Structured Merge-Trees (316); Storage (319); Overview (319); Write Path (320); Files (321); HFile Format (329); KeyValue Format (332); Write-Ahead Log (333); Overview (333); HLog Class (335); HLogKey Class (336); WALEdit Class (336); LogSyncer Class (337); LogRoller Class (338); Replay (338); Durability (341); Read Path (342); Region Lookups (345); The Region Life Cycle (348); ZooKeeper (348); Replication (351); Life of a Log Edit (352); Internals (353)

9. Advanced Usage (357)
   Key Design (357); Concepts (357); Tall-Narrow Versus Flat-Wide Tables (359); Partial Key Scans (360); Pagination (362); Time Series Data (363); Time-Ordered Relations (367); Advanced Schemas (369); Secondary Indexes (370); Search Integration (373); Transactions (376); Bloom Filters (377); Versioning (381); Implicit Versioning (381); Custom Versioning (384)

10. Cluster Monitoring (387)
    Introduction (387); The Metrics Framework (388); Contexts, Records, and Metrics (389); Master Metrics (394); Region Server Metrics (394); RPC Metrics (396); JVM Metrics (397); Info Metrics (399); Ganglia (400); Installation (401); Usage (405); JMX (408); JConsole (410); JMX Remote API (413); Nagios (417)

11. Performance Tuning (419)
    Garbage Collection Tuning (419); Memstore-Local Allocation Buffer (422); Compression (424); Available Codecs (424); Verifying Installation (426); Enabling Compression (427); Optimizing Splits and Compactions (429); Managed Splitting (429); Region Hotspotting (430); Presplitting Regions (430); Load Balancing (432); Merging Regions (433); Client API: Best Practices (434); Configuration (436); Load Tests (439); Performance Evaluation (439); YCSB (440)

12. Cluster Administration (445)
    Operational Tasks (445); Node Decommissioning (445); Rolling Restarts (447); Adding Servers (447); Data Tasks (452); Import and Export Tools (452); CopyTable Tool (457); Bulk Import (459); Replication (462); Additional Tasks (464); Coexisting Clusters (464); Required Ports (466); Changing Logging Levels (466); Troubleshooting (467); HBase Fsck (467); Analyzing the Logs (468); Common Issues (471)

A. HBase Configuration Properties (475)
B. Road Map (489)
C. Upgrade from Previous Releases (491)
D. Distributions (493)
E. Hush SQL Schema (495)
F. HBase Versus Bigtable (497)
Index (501)

Foreword

The HBase story begins in 2006, when the San Francisco-based startup Powerset was trying to build a natural language search engine for the Web. Their indexing pipeline was an involved multistep process that produced an index about two orders of magnitude larger, on average, than your standard term-based index. The datastore that they'd built on top of the then nascent Amazon Web Services to hold the index intermediaries and the webcrawl was buckling under the load (Ring. Ring. "Hello! This is AWS. Whatever you are running, please turn it off!"). They were looking for an alternative. The Google BigTable paper* had just been published.

Chad Walters, Powerset's head of engineering at the time, reflects back on the experience as follows:

Building an open source system to run on top of Hadoop's Distributed Filesystem (HDFS) in much the same way that BigTable ran on top of the Google File System seemed like a good approach because: 1) it was a proven scalable architecture; 2) we could leverage existing work on Hadoop's HDFS; and 3) we could both contribute to and get additional leverage from the growing Hadoop ecosystem.

After the publication of the Google BigTable paper, there were on-again, off-again discussions around what a BigTable-like system on top of Hadoop might look like. Then, in early 2007, out of the blue, Mike Cafarella dropped a tarball of thirty-odd Java files into the Hadoop issue tracker: "I've written some code for HBase, a BigTable-like file store. It's not perfect, but it's ready for other people to play with and examine." Mike had been working with Doug Cutting on Nutch, an open source search engine. He'd done similar drive-by code dumps there to add features such as a Google File System
Recommended publications
  • Introduction to HBase Schema Design
    Introduction to HBase Schema Design
    Amandeep Khurana

    Amandeep Khurana is a Solutions Architect at Cloudera and works on building solutions using the Hadoop stack. He is also a co-author of HBase in Action. Prior to Cloudera, Amandeep worked at Amazon Web Services, where he was part of the Elastic MapReduce team and built the initial versions of their hosted HBase product. [email protected]

    The number of applications that are being developed to work with large amounts of data has been growing rapidly in the recent past. To support this new breed of applications, as well as scaling up old applications, several new data management systems have been developed. Some call this the big data revolution. A lot of these new systems that are being developed are open source and community driven, deployed at several large companies. Apache HBase [2] is one such system. It is an open source distributed database, modeled around Google Bigtable [5] and is becoming an increasingly popular database choice for applications that need fast random access to large amounts of data. It is built atop Apache Hadoop [1] and is tightly integrated with it.

    HBase is very different from traditional relational databases like MySQL, PostgreSQL, Oracle, etc. in how it's architected and the features that it provides to the applications using it. HBase trades off some of these features for scalability and a flexible schema. This also translates into HBase having a very different data model. Designing HBase tables is a different ballgame as compared to relational database systems. I will introduce you to the basics of HBase table design by explaining the data model and build on that by going into the various concepts at play in designing HBase tables through an example.
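    To make the data model the article describes concrete, here is a minimal, hedged sketch in Java. It uses the HBase 1.x client API (ConnectionFactory/Admin/Table) rather than the 0.90-era HBaseAdmin/HTable API of the book above, and the table name "users", family "info", and qualifier "email" are hypothetical examples, not taken from the article:

        import java.io.IOException;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.HColumnDescriptor;
        import org.apache.hadoop.hbase.HTableDescriptor;
        import org.apache.hadoop.hbase.TableName;
        import org.apache.hadoop.hbase.client.Admin;
        import org.apache.hadoop.hbase.client.Connection;
        import org.apache.hadoop.hbase.client.ConnectionFactory;
        import org.apache.hadoop.hbase.client.Get;
        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.client.Table;
        import org.apache.hadoop.hbase.util.Bytes;

        public class SchemaSketch {
          public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
              // Only column families are declared up front; individual
              // columns (qualifiers) are created implicitly on write.
              TableName users = TableName.valueOf("users");   // hypothetical table name
              HTableDescriptor desc = new HTableDescriptor(users);
              desc.addFamily(new HColumnDescriptor("info"));  // hypothetical family
              admin.createTable(desc);

              try (Table table = conn.getTable(users)) {
                // A cell is addressed by row key + family:qualifier (+ timestamp).
                Put put = new Put(Bytes.toBytes("user-0001"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
                    Bytes.toBytes("user-0001@example.com"));
                table.put(put);

                Get get = new Get(Bytes.toBytes("user-0001"));
                Result result = table.get(get);
                System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"))));
              }
            }
          }
        }

    Note that only the column family is part of the fixed schema; each row may carry arbitrary qualifiers, which is the schema flexibility the article's design concepts build on.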
  • Apache HBase, the Scaling Machine. Jean-Daniel Cryans, Software Engineer at Cloudera (@jdcryans)
    Apache HBase, the Scaling Machine
    Jean-Daniel Cryans, Software Engineer at Cloudera (@jdcryans)

    Agenda: • Introduction to Apache HBase • HBase at StumbleUpon • Overview of other use cases

    About le Moi: • At Cloudera since October 2012. • At StumbleUpon for 3 years before that. • Committer and PMC member for Apache HBase since 2008. • Living in San Francisco. • From Québec, Canada.

    What is Apache HBase: Apache HBase is an open source, distributed, scalable, consistent, low latency, random access, non-relational database built on Apache Hadoop.

    Inspiration: Google BigTable (2006). • Goal: Low latency, consistent, random read/write access to massive amounts of structured data. • It was the data store for Google's crawler web table, gmail, analytics, earth, blogger, …

    HBase is in Production: • Inbox • Storage • Web • Search • Analytics • Monitoring

    HBase is Open Source: • Apache 2.0 License. • A Community project with committers and contributors from diverse organizations: Facebook, Cloudera, Salesforce.com, Huawei, eBay, HortonWorks, Intel, Twitter … • Code license means anyone can modify and use the code.

    So why use HBase?

    Old School Scaling => • Find a scaling problem.
  • Rubabel: Wrapping Open Babel with Ruby. Rob Smith, Ryan Williamson, Dan Ventura, and John T Prince
    Smith et al. Journal of Cheminformatics 2013, 5:35
    http://www.jcheminf.com/content/5/1/35

    SOFTWARE (Open Access)
    Rubabel: wrapping Open Babel with Ruby
    Rob Smith, Ryan Williamson, Dan Ventura and John T Prince

    Abstract
    Background: The number and diversity of wrappers for chemoinformatic toolkits suggests the diverse needs of the chemoinformatic community. While existing chemoinformatics libraries provide a broad range of utilities, many chemoinformaticians find compiled language libraries intimidating, time-consuming, arcane, and verbose. Although high-level language wrappers have been implemented, more can be done to leverage the intuitiveness of object-orientation, the paradigms of high-level languages, and the extensibility of languages such as Ruby. We introduce Rubabel, an intuitive, object-oriented suite of functionality that substantially increases the accessibility of the tools in the Open Babel chemoinformatics library.
    Results: Rubabel requires fewer lines of code than any other actively developed wrapper, providing better object organization and navigation, and more intuitive object behavior than extant solutions. Moreover, Rubabel provides a convenient interface to the many extensions currently available in Ruby, greatly streamlining otherwise onerous tasks such as creating web applications that serve up Rubabel functionality.
    Conclusions: Rubabel is powerful, intuitive, concise, freely available, cross-platform, and easy to install. We expect it to be a platform of choice for new users, Ruby users, and some users of current solutions.
    Keywords: Chemoinformatics, Open Babel, Ruby

    Background
    Despite the fact that chemoinformatics tools have been developed since the late 1990s [1], the field has yet to rally in support of a single library. […] tasks. Though it allows the user to access the functionality of the component libraries from one Python script, Cinfony does not automatically manage underlying data types
  • Final Report CS 5604: Information Storage and Retrieval
    Final Report, CS 5604: Information Storage and Retrieval
    Solr Team: Abhinav Kumar, Anand Bangad, Jeff Robertson, Mohit Garg, Shreyas Ramesh, Siyu Mi, Xinyue Wang, Yu Wang
    January 16, 2018
    Instructed by Professor Edward A. Fox
    Virginia Polytechnic Institute and State University, Blacksburg, VA 24061

    Abstract
    The Digital Library Research Laboratory (DLRL) has collected over 1.5 billion tweets and millions of webpages for the Integrated Digital Event Archiving and Library (IDEAL) and Global Event Trend Archive Research (GETAR) projects [6]. We are using a 21 node Cloudera Hadoop cluster to store and retrieve this information. One goal of this project is to expand the data collection to include more web archives and geospatial data beyond what previously had been collected. Another important part in this project is optimizing the current system to analyze and allow access to the new data. To accomplish these goals, this project is separated into 6 parts with corresponding teams: Classification (CLA), Collection Management Tweets (CMT), Collection Management Webpages (CMW), Clustering and Topic Analysis (CTA), Front-end (FE), and SOLR.

    This report describes the work completed by the SOLR team which improves the current searching and storage system. We include the general architecture and an overview of the current system. We present the part that Solr plays within the whole system with more detail. We talk about our goals, procedures, and conclusions on the improvements we made to the current Solr system. This report also describes how we coordinate with other teams to accomplish the project at a higher level. Additionally, we provide manuals for future readers who might need to replicate our experiments.
  • Apache HBase: the Hadoop Database. Yuanru Qian, Andrew Sharp, Jiuling Wang
    Apache HBase: the Hadoop Database
    Yuanru Qian, Andrew Sharp, Jiuling Wang

    Agenda: ● Motivation ● Data Model ● The HBase Distributed System ● Data Operations ● Access APIs ● Architecture

    Motivation: ● Hadoop is a framework that supports operations on a large amount of data. ● Hadoop includes the Hadoop Distributed File System (HDFS). ● HDFS does a good job of storing large amounts of data, but lacks quick random read/write capability. ● That's where Apache HBase comes in.

    Introduction: ● HBase is an open source, sparse, consistent, distributed, sorted map modeled after Google's BigTable. ● Began as a project by Powerset to process massive amounts of data for natural language processing. ● Developed as part of Apache's Hadoop project and runs on top of the Hadoop Distributed File System.

    Big Picture (architecture diagram): MapReduce, a Thrift/REST Gateway, Your Java Application, and Hive/Pig all reach HBase through the Java client; HBase coordinates via ZooKeeper and stores its data in HDFS.

    An Example Operation. The Job: a MapReduce job needs to operate on a series of webpages matching *.cnn.com. The Table:

        row key         | column 1 | column 2
        "com.cnn.world" | 13       | 4.5
        "com.cnn.tech"  | 46       | 7.8
        "com.cnn.money" | 44       | 1.2

    The HBase Data Model. Groups of tables: an RDBMS "database" corresponds to an HBase "namespace", and a table to a table.

    Data Model, Single Table (RDBMS): a table is a flat grid of columns (col1..col4) by rows (row1..row3).

    Data Model, Single Table (Apache HBase): columns are grouped into column families, e.g. fam1:col1, fam1:col2, fam2:col1, fam2:col2, across rows row1..row3.

    Sparse example: row key "com.cnn.www" with columns fam1:contents and fam1:anchor, e.g. contents:html = "<html>..."
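    As a hedged illustration of the example operation above, the sketch below scans exactly the *.cnn.com rows using the HBase 1.x Java client. The reversed-domain row keys (com.cnn.*) come from the slides; the table name "webtable" and the specific client calls are assumptions for illustration:

        import java.io.IOException;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.TableName;
        import org.apache.hadoop.hbase.client.Connection;
        import org.apache.hadoop.hbase.client.ConnectionFactory;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.client.ResultScanner;
        import org.apache.hadoop.hbase.client.Scan;
        import org.apache.hadoop.hbase.client.Table;
        import org.apache.hadoop.hbase.filter.PrefixFilter;
        import org.apache.hadoop.hbase.util.Bytes;

        public class CnnPrefixScan {
          public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            byte[] prefix = Bytes.toBytes("com.cnn.");
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("webtable"))) {  // hypothetical name
              // Reversed-domain keys keep all cnn.com pages adjacent in the
              // sorted key space, so the scan covers one contiguous range.
              Scan scan = new Scan();
              scan.setStartRow(prefix);                  // jump to the first matching row
              scan.setFilter(new PrefixFilter(prefix));  // stop returning rows past the prefix
              try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                  System.out.println(Bytes.toString(row.getRow()));
                }
              }
            }
          }
        }

    Because row keys are stored sorted, the reversed-domain convention keeps all pages of one site adjacent, so the prefix scan reads one contiguous range instead of filtering the whole table.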
  • HDP 3.1.4 Release Notes (Date of Publish: 2019-08-26)
    HDP 3.1.4 Release Notes
    Date of Publish: 2019-08-26
    https://docs.hortonworks.com

    Contents:
    HDP 3.1.4 Release Notes (4)
    Component Versions (4)
    Descriptions of New Features (5)
    Deprecation Notices (6): Terminology (6); Removed Components and Product Capabilities (6)
    Testing Unsupported Features (6): Descriptions of the Latest Technical Preview Features (7)
    Upgrading to HDP 3.1.4 (7)
    Behavioral Changes (7)
    Apache Patch Information (11): Accumulo
  • Apache Hadoop Goes Realtime at Facebook
    Apache Hadoop Goes Realtime at Facebook
    Dhruba Borthakur, Joydeep Sen Sarma, Jonathan Gray, Kannan Muthukkaruppan, Nicolas Spiegelberg, Hairong Kuang, Karthik Ranganathan, Dmytro Molkov, Aravind Menon, Samuel Rash, Rodrigo Schmidt, Amitanand Aiyer
    Facebook
    {dhruba,jssarma,jgray,kannan, nicolas,hairong,kranganathan,dms, aravind.menon,rash,rodrigo, amitanand.s}@fb.com

    ABSTRACT
    Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop designed to support billions of messages per day. This paper describes the reasons why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort and discusses the application's requirements for consistency, availability, partition tolerance, data model and scalability. We explore the enhancements made to Hadoop to make it a more effective realtime system, the tradeoffs we made while configuring the system, and how this solution has significant advantages over the sharded MySQL database scheme used in other applications at Facebook and many other web-scale companies.

    1. INTRODUCTION
    Apache Hadoop [1] is a top-level Apache project that includes open source implementations of a distributed file system [2] and MapReduce that were inspired by Google's GFS [5] and MapReduce [6] projects. The Hadoop ecosystem also includes projects like Apache HBase [4] which is inspired by Google's BigTable, Apache Hive [3], a data warehouse built on top of Hadoop, and Apache ZooKeeper [8], a coordination service for distributed systems.

    At Facebook, Hadoop has traditionally been used in conjunction with Hive for storage and analysis of large data sets. Most of this analysis occurs in offline batch jobs and the emphasis has been on
  • TR-4744: Secure Hadoop Using Apache Ranger with NetApp In-Place Analytics Module
    Technical Report
    Secure Hadoop using Apache Ranger with NetApp In-Place Analytics Module
    Deployment Guide
    Karthikeyan Nagalingam, NetApp
    February 2019 | TR-4744

    Abstract
    This document introduces the NetApp® In-Place Analytics Module for Apache Hadoop and Spark with Ranger. The topics covered in this report include the Ranger configuration, underlying architecture, integration with Hadoop, and benefits of Ranger with NetApp In-Place Analytics Module using Hadoop with NetApp ONTAP® data management software.

    TABLE OF CONTENTS
    1 Introduction (4): 1.1 Overview (4); 1.2 Deployment Options (5); 1.3 NetApp In-Place Analytics Module 3.0.1 Features (5)
    2 Ranger (6): 2.1 Components Validated with Ranger (6)
    3 NetApp In-Place Analytics Module Design with Ranger
  • Specialising Dynamic Techniques for Implementing the Ruby Programming Language
    SPECIALISING DYNAMIC TECHNIQUES FOR IMPLEMENTING THE RUBY PROGRAMMING LANGUAGE

    A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences

    2015

    By Chris Seaton
    School of Computer Science

    This published copy of the thesis contains a couple of minor typographical corrections from the version deposited in the University of Manchester Library. [email protected] chrisseaton.com/phd

    Contents
    List of Listings (7); List of Tables (9); List of Figures (11); Abstract (15); Declaration (17); Copyright (19); Acknowledgements (21)
    1 Introduction (23): 1.1 Dynamic Programming Languages (23); 1.2 Idiomatic Ruby (25); 1.3 Research Questions (27); 1.4 Implementation Work (27); 1.5 Contributions (28); 1.6 Publications (29); 1.7 Thesis Structure (31)
    2 Characteristics of Dynamic Languages (35): 2.1 Ruby (35); 2.2 Ruby on Rails (36); 2.3 Case Study: Idiomatic Ruby (37); 2.4 Summary (49)
    3 Implementation of Dynamic Languages (51): 3.1 Foundational Techniques (51); 3.2 Applied Techniques (59); 3.3 Implementations of Ruby (65); 3.4 Parallelism and Concurrency (72); 3.5 Summary (73)
    4 Evaluation Methodology (75): 4.1 Evaluation Philosophy
  • Ruby Programming Certification
    Ruby Programming Certification
    Duration: 5 Days

    What is the course about?
    This course will introduce the fundamental concepts of the Ruby programming language. Ruby is an easy programming language to learn. It's popularly used by startups in developing web applications and for administering systems. Ruby was developed to make programmers happy and is popularly known for the Rails web application framework. Ruby is also the language used for developing Puppet, a systems automation software, and Metasploit, a security assessment framework. This course is designed for both programmers and developers who want to get the fundamentals of programming in Ruby.

    Duration
    The course is full time and runs over 5 days. This course is primarily offered as a private course to a team, group or a company.

    Programming Experience
    This is a beginner's course and no prior programming experience is required. A good command of computer skills, the command line or the terminal is required.

    Technical Skill
    We primarily use Linux or Mac during the course. A Windows machine can also be used but some things might not work. So, technical knowledge of the Linux environment is required.

    Private Training
    This course is primarily offered as a private course. A minimum of 4 delegates is required to schedule the course. The course can be run on your premises or our premises. When run on your premises, the course costs R9 500 per delegate with a minimum of 4 delegates to schedule the course. The cost is R12 500 to run the course on our premises, and training can be done in either Johannesburg, Durban or Cape Town.
  • Beyond Macrobenchmarks: Microbenchmark-Based Graph Database Evaluation
    Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation
    Matteo Lissandrini (Aalborg University), Martin Brugnara (University of Trento), Yannis Velegrakis (University of Trento)
    [email protected] [email protected] [email protected]

    ABSTRACT
    Despite the increasing interest in graph databases, their requirements and specifications are not yet fully understood by everyone, leading to a great deal of variation in the supported functionalities and the achieved performances. In this work, we provide a comprehensive study of the existing graph database systems. We introduce a novel micro-benchmarking framework that provides insights on their performance that go beyond what macro-benchmarks can offer. The framework includes the largest set of queries and operators so far considered. The graph database systems are evaluated on synthetic and real data, from different domains, and at scales much larger than any previous work. The framework is materialized as an open-source suite and

    There are two categories of graph management systems that address two complementary yet distinct sets of functionalities. The first is that of graph processing systems [27, 42, 44], which analyze graphs to discover characteristic properties, e.g., average connectivity degree, density, and modularity. They also perform batch analytics at large scale, implementing computationally expensive graph algorithms, such as PageRank [54], SVD [23], strongly connected components identification [63], and core identification [12, 20]. Those are systems like GraphLab, Giraph, Graph Engine, and GraphX [67]. The second category is that of graph databases (GDB for short) [5]. Their focus is on storage and querying tasks where the priority is on high-throughput and transactional operations.
  • Pentaho EMR46 SHIM 7.1.0.0 Open Source Software Packages
    Pentaho EMR46 SHIM 7.1.0.0 Open Source Software Packages

    Contact Information:
    Project Manager Pentaho EMR46 SHIM
    Hitachi Vantara Corporation
    2535 Augustine Drive
    Santa Clara, California 95054

    Name of Product/Product Component                       | Version   | License
    An open source Java toolkit for Amazon S3               | 0.9.0     | Apache License Version 2.0
    AOP Alliance (Java/J2EE AOP standard)                   | 1.0       | Public Domain
    Apache Commons BeanUtils                                | 1.9.3     | Apache License Version 2.0
    Apache Commons CLI                                      | 1.2       | Apache License Version 2.0
    Apache Commons Daemon                                   | 1.0.13    | Apache License Version 2.0
    Apache Commons Exec                                     | 1.2       | Apache License Version 2.0
    Apache Commons Lang                                     | 2.6       | Apache License Version 2.0
    Apache Directory API ASN.1 API                          | 1.0.0-M20 | Apache License Version 2.0
    Apache Directory LDAP API Utilities                     | 1.0.0-M20 | Apache License Version 2.0
    Apache Hadoop Amazon Web Services support               | 2.7.2     | Apache License Version 2.0
    Apache Hadoop Annotations                               | 2.7.2     | Apache License Version 2.0
    Apache Hadoop Auth                                      | 2.7.2     | Apache License Version 2.0
    Apache Hadoop Common - org.apache.hadoop:hadoop-common  | 2.7.2     | Apache License Version 2.0
    Apache Hadoop HDFS                                      | 2.7.2     | Apache License Version 2.0
    Apache HBase - Client                                   | 1.2.0     | Apache License Version 2.0
    Apache HBase - Common                                   | 1.2.0     | Apache License Version 2.0
    Apache HBase - Hadoop Compatibility                     | 1.2.0     | Apache License Version 2.0
    Apache HBase - Protocol                                 | 1.2.0     | Apache License Version 2.0
    Apache HBase - Server                                   | 1.2.0     | Apache License Version 2.0
    Apache HBase - Thrift - org.apache.hbase:hbase-thrift   | 1.2.0     | Apache License Version 2.0
    Apache HttpComponents Core