SQL Engines on Hadoop


A Market Report Paper by Bloor
Author: Philip Howard
Publish date: December 2017

"It is clear that Impala, LLAP, Hive, Spark and so on, perform significantly worse than products from vendors with a history in database technology." – Philip Howard

Executive summary

Hadoop is used for a lot of different purposes, and one major subset of the overall Hadoop market is to run SQL against Hadoop. This might seem contrary to Hadoop's NoSQL roots, but the truth is that there are lots of existing investments in SQL applications that companies want to preserve; all the leading business intelligence and analytics platforms run using SQL; and SQL skills, capabilities and developers are readily available, which is often not the case for other languages.

However, the market for SQL engines on Hadoop is not mono-cultural. There are multiple use cases for deploying SQL on Hadoop and there are more than twenty different SQL on Hadoop platforms. Mapping the latter to the former is not a trivial task, as different offerings are optimised for some purposes but not others.

The key differentiators between products are the use cases they support, their performance and the level of SQL they offer. While all of these are discussed in detail in this paper, it is worth briefly explaining that SQL support has two aspects: the version supported (ANSI standard 1992, 1999, 2003, 2011 and so on) plus the robustness of the engine at supporting SQL queries running with multiple concurrent threads and at scale.
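To make the first of these aspects concrete, the sketch below contrasts the same per-customer calculation written to the SQL-92 level and to the SQL:2003 level. It is illustrative only, using a hypothetical orders table rather than anything from our research: the second form relies on window functions, standardised in SQL:2003, which engines offering only a basic level of SQL support may reject outright.

    -- Illustrative only: orders is a hypothetical table.
    -- SQL-92 level: a per-customer total needs a correlated subquery.
    SELECT o.customer, o.amount,
           (SELECT SUM(o2.amount)
              FROM orders o2
             WHERE o2.customer = o.customer) AS customer_total
      FROM orders o;

    -- SQL:2003 level: the same result via a window function, which an
    -- engine with limited SQL support may not accept.
    SELECT customer, amount,
           SUM(amount) OVER (PARTITION BY customer) AS customer_total
      FROM orders;

The second aspect – robustness – is not visible in any single query: it is about whether statements like these still complete correctly when run by many concurrent users against data at scale.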
Figure 1 illustrates an abbreviated version of the results of our research. This shows various leading vendors, and our estimates of their products' positioning relative to performance and SQL support. Use cases are shown by the colour of each bubble but, for practical reasons, this means that no vendor/product is shown for more than two use cases, which is why we describe Figure 1 as abbreviated. Thus, for example, we are using "EDW" as shorthand for products that support both transactional look-ups and complex analytics, which are otherwise individual use cases. Also, it excludes vendors targeting OLAP, as the leaders in this market – Jethro Data and Kyvos Insights – have distinct approaches that are not easily compared.

[Figure 1 – Use cases by performance and SQL support. Use cases include Hybrid Transactional and Analytic Processing (HTAP), a merger of transactional look-ups and complex analytics (EDW: enterprise data warehouse), combined batch and real-time/streaming analytics (Lambda architectures), and machine learning (ML). OLAP and some other use cases are omitted. The chart plots SQL support against performance for IBM, Presto, Kognitio, Pivotal, MapR, Esgyn, Hortonworks, Actian, Cloudera, Splice Machine and Spark.]

Use cases

We have identified six different use cases for SQL on Hadoop. Some of these overlap one another and there will also be instances where a user wants more than one of these use cases running on the same cluster. However, we believe that the examples detailed provide the bedrock for making decisions about potential solutions.

The main use cases we have identified, in no particular order, are:

1. Transactional look-ups. This will often be combined with other use cases.

2. Hybrid transactional analytic processing (HTAP).

3. Complex queries against large datasets, typically involving many users. We might describe this as "traditional data warehousing" and, certainly, there are vendors aiming to replace enterprise data warehouses (EDW) via this use case. Often combined with transactional look-ups.

4. Online analytic processing (OLAP). May be either multi-dimensional OLAP (MOLAP) or relational OLAP (ROLAP).

5. To support machine (and deep) learning.

6. A "collapsed" lambda (or kappa) architecture designed to support both batch and real-time (streaming) analytics. Will often be combined with either or both of OLAP and machine learning.

There are several other use cases where you might want to use SQL on Hadoop but, often enough, Hadoop on its own will be enough. These include extract, load and transform (ELT) and archival, as well as (ad hoc) data preparation. The last of these was identified as a use case by one of the vendors, although none of the suppliers we have spoken to – including the identifier – has claimed to target it. The same applies to data discovery and similar use cases, where you would probably be better off relying on an information/data catalogue running on your data lake. One vendor also suggested an operational data store as a use case.

Offerings

Products in this market tend to fall into one of six categories and, in the following lists, we have highlighted those products we examine in more detail in this report. The groupings consist of:

• Pure-play open source projects. This category includes Hive, HBase, Tajo, Phoenix, Ignite and Spark. See also the OLAP-based projects below. All of these are Apache projects. Of the less well-known offerings, Phoenix supports on-line transaction processing (OLTP) running against HBase (see the sketch after this list); Ignite is an in-memory computing platform that is commercially supported (and was originally developed) by GridGain, and is typically used either as a Hadoop accelerator and/or to provide immediate consistency; Tajo is a big data warehouse. There have been no new releases of Tajo for 18 months, so we suspect that it is defunct.

• Vendor-supported open source projects. This group includes Drill (supported by MapR), Presto (Teradata/Starburst Data), HAWQ (Pivotal) and Trafodion (Esgyn). All of these, again, are Apache projects. Also in this category are Impala (Cloudera) and Hive + LLAP (Hortonworks – "live long and process", previously known as Stinger). Note that Drill does not have to run on Hadoop.

• Traditional data warehousing products that have been used as the basis for SQL on Hadoop platforms. These include IBM Db2 (Big SQL), Oracle, Vertica, Pivotal HDB (HAWQ: effectively a port of Greenplum), Kognitio (which is free-to-use) and Actian VectorH. VectorH is the odd one out here because Actian Vector is a symmetric multi-processing (SMP) solution that has been developed into a massively parallel processing (MPP) environment; all the other products were MPP-based originally.

• Other MPP-based solutions. This category consists of Transwarp and Esgyn. The latter is a descendant of Tandem NonStop, HP Neoview and other HPE-based warehousing developments.

• Specialist offerings. Mostly these are targeted at OLAP environments. In this category are Apache Kylin (MOLAP) and Apache Lens (ROLAP), as well as Kyvos Insights and Jethro Data. Splice Machine is also in this category but has rather broader capabilities (see later). AtScale will compete with products in this category but is a "BI on Hadoop" engine rather than a SQL on Hadoop platform: as such it is not discussed further here.

• Others that are often referred to as SQL on Hadoop engines, but which are not. Included in this category are Splout SQL, which is really about data serving, and Concurrent Lingual, which is used for application development. Druid, which started life as an MDX engine (and which now has limited SQL support), is another data serving product with OLAP capabilities. Apache Calcite is a general-purpose SQL optimiser but not an engine per se. None of the products in this group are discussed in this report.
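To illustrate the Phoenix point made in the first grouping above, the sketch below shows the flavour of OLTP-style SQL that Phoenix accepts against HBase. The table and values are hypothetical and not drawn from this report; Phoenix maps the declared primary key to the HBase row key and uses UPSERT in place of separate INSERT and UPDATE statements.

    -- Hypothetical sketch of Phoenix OLTP running against HBase.
    -- The declared primary key becomes the HBase row key.
    CREATE TABLE IF NOT EXISTS orders (
        order_id BIGINT NOT NULL PRIMARY KEY,
        customer VARCHAR,
        amount   DECIMAL(10,2)
    );

    -- Phoenix has no INSERT or UPDATE; UPSERT covers both cases.
    UPSERT INTO orders VALUES (1001, 'Acme', 250.00);

    -- A transactional look-up: a point read resolved via the row key.
    SELECT customer, amount FROM orders WHERE order_id = 1001;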
In the vendor/product section of this report we include short descriptions of many, though not all, of the proprietary products (open source or otherwise), with the exception of Oracle, Vertica and Transwarp, none of which responded to our requests for information. While the omission of Oracle and Vertica is no great loss (a straight line can be drawn across from their traditional products), we would have liked to include details about Transwarp.

Performance benchmarks

A great many vendors in this space have conducted and published benchmarks. Some of these have been validated by third parties, some of them have been conducted by third parties, but the majority have not involved any independent authorities. Although TPC (Transaction Processing Performance Council) tests have typically been the basis for these benchmarks, none of them has been authenticated by the TPC. The individual product descriptions that follow outline the results of the various benchmarks that have been performed by different vendors. We will therefore confine ourselves here to general comments. The first point that we would like to note is that TPC-DS (Decision …

To conclude this section – while not all products have been benchmarked, and some have been benchmarked against different standards – it is clear that Impala, LLAP, Hive, Spark and so on perform significantly worse than products from vendors with a history in database technology. Moreover, it is much more likely that companies in the latter category will be able to support all of your queries and run them successfully: the level of SQL support from the pure-play, Cloudera or Hortonworks products tends to be limited.

While on the subject of SQL support, it is worth commenting that the level of support for ANSI standard SQL varies widely. IBM – not just in Big SQL and Db2, but across its product …