Apache Kylin Cloud-Native Architecture: Thoughts and Roadmap (OS2ATC)


Speaker: Shaofeng Shi, Chief Architect at Kyligence, Apache Kylin PMC member & Committer
Effective Cloud User Group, www.ecug.org

About Apache Kylin: Extreme OLAP Engine for Big Data

Apache Kylin™ is an open-source distributed analytical engine that provides sub-second interactive analysis over extremely large datasets on Hadoop and other large distributed data platforms, through standard SQL queries and multidimensional analysis (OLAP).
Website: https://kylin.apache.org

Project History

• Sep 2013: project started
• Oct 2014: open-sourced and accepted into the Apache Incubator
• Sep 2015: InfoWorld Best Open Source Big Data Tools award
• Nov 2015: graduated to an Apache top-level project
• Sep 2016: second InfoWorld Best Open Source Big Data Tools award
• Apr 2017: Kylin 2.0 released, with snowflake-schema and Spark support
• Dec 2019: Kylin 3.0 released, with real-time analytics

Apache Kylin Architecture

• Builds OLAP cubes on Hadoop: data sources (Hive / Kafka / RDBMS) feed a build engine (Hadoop MapReduce / Spark), and cubes land in HBase or Parquet
• Supports TB- to PB-scale data with sub-second query latency
• ANSI SQL, exposed over JDBC / ODBC / REST API
• BI tool integration, web GUI, LDAP/SSO
• Serves interactive data analytics, reporting, and dashboards (OLAP / data mart)

OLAP and the OLAP Cube

"Online analytical processing (OLAP) is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing." – Wikipedia

The cube is the core data structure of OLAP. Its basic operations:
• Roll-up
• Drill-down
• Slice and dice
• Pivot

Theoretical Basis: Trading Space for Time

• Cuboid: one combination of dimensions
• Cube: the set of all dimension combinations
• Each cuboid can be computed by aggregating a cuboid in the layer above it
Kylin answers each query from the smallest cuboid that satisfies it, as sketched below.
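To make the space-for-time trade concrete, here is a minimal Scala sketch (not Kylin code; the dimension names are borrowed from the query on the next slide) that enumerates the 2^N cuboids of a cube and picks the smallest one able to answer a query:

    object CuboidDemo {
      // A cuboid is identified by the subset of dimensions it retains.
      type Cuboid = Set[String]

      // The cube is the set of all 2^N dimension combinations.
      def allCuboids(dims: Set[String]): Seq[Cuboid] =
        dims.subsets().toSeq

      // Answer a query from the smallest cuboid that contains every
      // dimension the query groups or filters on.
      def pickCuboid(cube: Seq[Cuboid], queryDims: Set[String]): Option[Cuboid] =
        cube.filter(queryDims.subsetOf(_)).sortBy(_.size).headOption

      def main(args: Array[String]): Unit = {
        val dims = Set("l_returnflag", "o_orderstatus", "l_shipdate")
        val cube = allCuboids(dims) // 2^3 = 8 cuboids
        // A query grouping by two dimensions hits the two-dimension cuboid,
        // not the base cuboid: less data scanned, lower latency.
        println(pickCuboid(cube, Set("l_returnflag", "o_orderstatus")))
      }
    }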
SQL Execution Without a Cube

    select l_returnflag, o_orderstatus,
           sum(l_quantity) as sum_qty,
           sum(l_extendedprice) as sum_base_price
    from v_lineitem
    inner join v_orders on l_orderkey = o_orderkey
    where l_shipdate <= '1998-09-16'
    group by l_returnflag, o_orderstatus
    order by l_returnflag, o_orderstatus;

With no precomputation, everything is computed at query time: Table → Join → Filter → Aggregate → Sort. Time complexity: O(N).

SQL Execution With a Cube

With precomputation, the plan reads pre-aggregated data straight from the cube: Cube → Filter → Aggregate → Sort. Less I/O, less computation, lower latency. Time complexity: O(1).

Performance Comparison

Sub-second queries on PB-scale data.
[Figure: two latency charts comparing Kylin with SQL-on-Hadoop: latency in seconds per SSB query (1.1 through 4.3), and latency as the data scale grows.]

Seamless BI Integration

Works with a wide range of open-source and commercial BI tools.

Global Adoption

1000+ users worldwide.

Limitations of the Hadoop/HBase Architecture

Hadoop is hard to operate:
• Compute and storage are tightly coupled, so scaling out or in is difficult
• Many components and a complex architecture make for a steep learning curve
• Troubleshooting is hard, and failure recovery is slow
• High total cost of ownership

HBase is a poor fit for OLAP:
• Designed for write-heavy, read-light workloads: suited to small reads and writes, not large range scans
• Simple indexing, unsuited to complex retrieval
• Not true columnar storage, so I/O cost is high
• Complex computation is hard to push down, and single points of failure remain
• Failure recovery and upgrades are difficult

The Cloud Is Eating the World, Including Hadoop

The foundation of cloud big-data processing is the separation of storage and compute: object storage replaces HDFS on local disks, and containers replace YARN for resource management.

Object storage offers:
• More flexible scaling, with no concern for data persistence
• Near-infinite capacity
• Higher reliability and durability
• Pay only for actual usage

Containers offer:
• Better application isolation
• Higher resource utilization
• Comprehensive operational monitoring

Cloud-Native Architecture Is Becoming Mainstream

Continuous integration / continuous delivery, agile development, container orchestration.

How Does Apache Kylin Adapt to This Trend?

Step 1: Refactor Toward Pluggable Core Components [done]

The data source, execution engine, and storage structure are abstracted into interfaces, so more data sources can be plugged in and other technologies can serve as the build and storage engines. Third-party apps (web, mobile) call the REST API; SQL-based tools (Tableau and other BI tools) connect over JDBC/ODBC. SQL flows into the REST server and query engine, whose routing layer chooses between low-latency cube access and a slower fallback engine. The abstractions cover the data source (Hive, Kafka), the cube builder (MapReduce, ...), metadata, and key-value storage (HBase).

What the Pluggable Architecture Enables

• Data sources: Hive adaptor, JDBC adaptor, ... (load source data)
• Compute engines: MapReduce adaptor, Spark adaptor, ... (build cubes)
• Storage: HBase adaptor, Parquet adaptor, ... (save cubes)
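A minimal Scala sketch of this plug-in layering (illustrative only: the trait and method names here are invented, not Kylin's actual interfaces):

    // Illustrative shapes only; not Kylin's real SPI.
    trait DataSource  { def readTable(table: String): Iterator[Seq[Any]] }
    trait BuildEngine { def buildCube(source: DataSource, cube: String): Unit }
    trait CubeStorage {
      def save(cube: String, rows: Iterator[Seq[Any]]): Unit
      def scan(cube: String): Iterator[Seq[Any]]
    }

    // The core wires the abstractions together; adaptors (Hive/JDBC sources,
    // MapReduce/Spark engines, HBase/Parquet storage) are swapped by configuration.
    class KylinCore(source: DataSource, engine: BuildEngine, storage: CubeStorage) {
      def build(cube: String): Unit = engine.buildCube(source, cube)
      def query(cube: String): Iterator[Seq[Any]] = storage.scan(cube)
    }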
Step 2: Spark Replaces MapReduce for Cube Builds [done]

Apache Spark is replacing Hadoop MapReduce as the core technology for big-data processing. Kylin uses Spark for cube computation as follows (sketched in code below):
• Each layer of cuboids is modeled as a Spark resilient distributed dataset (RDD-1 through RDD-5, one per layer)
• A child RDD is generated by aggregating its cached parent RDD
• Once a child RDD is generated, it is written out and the parent RDD is released
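A minimal Scala sketch of this by-layer cubing (simplified: dimensions are plain strings, the measure is a single sum, and dropping one trailing dimension per layer stands in for the full cuboid tree):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.SparkSession

    object LayeredCubing {
      // A row: the dimension values it retains, plus one aggregated measure.
      type Row = (Seq[String], Long)

      // Build a child layer by dropping the dimension at dropIdx and
      // summing the measure over the rows that collapse together.
      def nextLayer(parent: RDD[Row], dropIdx: Int): RDD[Row] =
        parent
          .map { case (dims, m) => (dims.patch(dropIdx, Nil, 1), m) }
          .reduceByKey(_ + _)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("layered-cubing").master("local[*]").getOrCreate()
        val sc = spark.sparkContext

        // Base cuboid: three dimensions (flag, status, date); measure = quantity.
        var layer: RDD[Row] = sc.parallelize(Seq(
          (Seq("A", "O", "1998-01-01"), 10L),
          (Seq("A", "F", "1998-01-01"), 20L),
          (Seq("R", "O", "1998-02-01"), 5L))).cache()

        // Generate each child layer from its cached parent, write it out
        // (printed here), then release the parent -- as on the slide.
        while (layer.first()._1.nonEmpty) {
          val child = nextLayer(layer, dropIdx = layer.first()._1.size - 1).cache()
          child.collect().foreach(println) // Kylin writes this layer to storage
          layer.unpersist()
          layer = child
        }
        spark.stop()
      }
    }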
The Full-on-Spark Engine

Every build step runs as a Spark job, which makes building without Hadoop possible:
• Join: data source → flat table
• Distinct + Encode: dimension dictionaries → encoded flat table
• Aggregate: base cuboid, then layered aggregation
• Cleanup: write to storage, clean up intermediates

Benefits of Building with Spark

• Build efficiency more than doubles, cutting build time by over half
• Kylin's build logic is simplified, with functional programming trimming the code
• A major step toward cloud native: Spark can be deployed standalone and run in containers, shedding the dependency on a Hadoop platform

Step 3: Running Kylin in Docker [done]

The Kylin query service is stateless, so it is a natural fit for Docker; ZooKeeper registers the nodes and coordinates their role assignment.

Running Kylin with Docker (commands sketched below):
1. Pull the image
2. Run the container
3. Enjoy Kylin
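In concrete terms that is two commands; the image name, tag, and port below follow the community Docker image at the time of the talk and should be checked against the current Kylin docs:

    docker pull apachekylin/apache-kylin-standalone:3.0.1
    docker run -d --name kylin -p 7070:7070 \
        apachekylin/apache-kylin-standalone:3.0.1
    # then browse to http://localhost:7070/kylin (default login ADMIN / KYLIN)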
Step 4: Deploying Kylin on Kubernetes [in development]

The cluster separates node roles: job (build) nodes and query nodes, each scaled independently.

Deploying a Kylin cluster with Kubernetes (a skeletal manifest follows):
1. Create a ConfigMap to hold the Kylin configuration
2. Create a Service and a StatefulSet
3. Enjoy the Kylin cluster
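A skeletal manifest of that shape (a sketch only: the names, image, mount path, and replica count are placeholders, not an official Kylin chart):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kylin-config
    data:
      kylin.properties: |
        kylin.server.mode=query        # job nodes would set "job"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: kylin-query
    spec:
      clusterIP: None                  # headless, for stable pod identities
      selector: { app: kylin-query }
      ports: [{ port: 7070 }]
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: kylin-query
    spec:
      serviceName: kylin-query
      replicas: 2
      selector: { matchLabels: { app: kylin-query } }
      template:
        metadata: { labels: { app: kylin-query } }
        spec:
          containers:
            - name: kylin
              image: apachekylin/apache-kylin-standalone:3.0.1   # placeholder image
              ports: [{ containerPort: 7070 }]
              volumeMounts:
                - { name: conf, mountPath: /etc/kylin }
          volumes:
            - { name: conf, configMap: { name: kylin-config } }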
Step 5: Parquet Replaces HBase [in development]

Advantages of Parquet storage:
• Cubes are stored in the native Apache Parquet format, which carries its own schema and encoding and so integrates easily with the big-data ecosystem
• True columnar storage: high I/O efficiency and high compression ratios
• Pairs well with Spark, whose parallelism and vectorization speed up scans
• Can live on object storage, making the Kylin service itself stateless
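From Spark's side, querying such storage is a plain Parquet scan; a sketch (the bucket and cuboid path here are invented for illustration):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    object ReadCuboid {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("read-cuboid").getOrCreate()

        // Parquet carries its own schema, and object storage keeps the
        // service stateless. The path is a made-up example, not Kylin's layout.
        val cuboid = spark.read.parquet("s3a://my-kylin-bucket/cube1/cuboid_110/")

        cuboid.filter(cuboid("l_shipdate") <= "1998-09-16")
          .groupBy("l_returnflag", "o_orderstatus")
          .agg(sum("sum_qty").as("sum_qty"))
          .show()
      }
    }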
Step 6: Distributed Query Execution [in development]

From single-point execution on Apache Calcite to distributed execution on Spark DataFrames:
• The Calcite-based query engine pulls cube data from each node back to the query node for post-processing; on complex queries over large data it can stall or run out of memory
• The Spark-DataFrame-based executor runs every step distributed and in parallel, eliminating the single point and improving stability
Each relational operator maps to a DataFrame stage: Cube → CuboidDF, Filter → FilterDF, Project → ProjectDF, Aggregate → AggDF, Sort → SortDF. The Calcite path is distributed only in the storage layer (HBase coprocessors) and single-machine above it; the Spark DF path is fully distributed.
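As a sketch, that operator-by-operator mapping is just a chain of DataFrame transformations (the names mirror the slide, not Kylin's internal classes; the columns assume the cuboid from the earlier SSB-style query):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.sum

    object DistributedQuery {
      // Each logical operator becomes one distributed DataFrame stage,
      // mirroring the slide's Cube -> Filter -> Project -> Agg -> Sort chain.
      def executePlan(cuboidDF: DataFrame): DataFrame = {
        val filterDF  = cuboidDF.filter(cuboidDF("l_shipdate") <= "1998-09-16")
        val projectDF = filterDF.select("l_returnflag", "o_orderstatus", "sum_qty")
        val aggDF     = projectDF.groupBy("l_returnflag", "o_orderstatus")
                                 .agg(sum("sum_qty").as("sum_qty"))
        val sortDF    = aggDF.orderBy("l_returnflag", "o_orderstatus")
        sortDF // every stage runs in parallel across executors; no single point
      }
    }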
The Goal for Apache Kylin NG: a Cloud-Native Big-Data OLAP Engine

• Data lake as the source: files, streams, and Parquet on object storage (S3, ADLS)
• Containers (Kubernetes, Docker) for resource orchestration
• Kylin on top, with metadata and security, serving interactive data analytics, reporting, and dashboards (OLAP / data mart)

Thanks

Effective Cloud User Group, www.ecug.org