Desarrollo De Una Solución Business Intelligence Mediante Un Paradigma De Data Lake


Desarrollo de una solución Business Intelligence mediante un paradigma de Data Lake
(Development of a Business Intelligence solution following a Data Lake paradigm)

José María Tagarro Martí
Degree in Computer Engineering
Advisor: Humberto Andrés Sanz
Professor: Atanasi Daradoumis Haralabus
13 January 2019

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Spain license.

FINAL PROJECT RECORD
Project title: Desarrollo de una solución Business Intelligence mediante un paradigma de Data Lake
Author: José María Tagarro Martí
Advisor: Humberto Andrés Sanz
Submission date (mm/yyyy): 01/2019
Final project area: Business Intelligence
Degree: Degree in Computer Engineering

Project abstract (250 words maximum):
This project implements a Business Intelligence solution following a Data Lake paradigm on the Apache Hadoop Big Data platform, with the aim of illustrating the platform's technological capabilities for this purpose. Traditional data warehouses need to transform incoming data before it is stored so that it fits a predefined model, in a process known as ETL (Extract, Transform, Load). The Data Lake paradigm, by contrast, proposes storing the data generated across the organization in its original source format, so that it can later be transformed and consumed by different ad hoc technologies according to the various business needs. By way of conclusion, the advantages and drawbacks of deploying a unified platform for both Big Data analytics and Business Intelligence tasks are outlined, together with the need to rely on solutions based on open source code and open standards.
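The traditional ETL flow described in the abstract (transform every incoming record to a fixed target model before it is stored) can be sketched in a few lines of Python. This is a minimal schema-on-write illustration, not code from the project; the field names and the semicolon-separated export format are hypothetical:

```python
from datetime import date

# Schema-on-write / ETL: the warehouse model is fixed *before* any data is
# stored, and every incoming record must be conformed to it at load time.
TARGET_COLUMNS = ("day", "store", "sku", "quantity", "amount")

def extract(raw_lines):
    """Extract: split each line of the source export into fields."""
    return (line.split(";") for line in raw_lines)

def transform(records):
    """Transform: cast each field to the type required by the target model.
    Records that do not fit the model fail here, before loading."""
    for day, store, sku, qty, amount in records:
        yield (date.fromisoformat(day), store, sku, int(qty), float(amount))

def load(rows, warehouse):
    """Load: append the conformed rows to the warehouse table."""
    warehouse.extend(rows)

warehouse = []
load(transform(extract(["2019-01-03;store-12;SKU-001;2;3.98"])), warehouse)
print(warehouse[0])
```

The key point the thesis makes about this flow is that the transformation (and therefore the modelling decision) happens once, up front, for all future consumers of the data.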
Abstract (in English, 250 words or less):
This paper implements a Business Intelligence solution following the Data Lake paradigm on Hadoop's Big Data platform, with the aim of showcasing the technology for this purpose. Traditional data warehouses require incoming data to be transformed before being stored, by means of an ETL (Extract, Transform, Load) process, so that it adopts a predefined data model. The Data Lake paradigm, however, proposes first storing the organization's data in its original format, so that it can later be transformed and consumed by ad hoc processes based on the business needs. Finally, there is a review of the advantages and disadvantages of deploying a unified data platform for both Big Data and traditional Business Intelligence use cases, as well as of the importance of using solutions based on open source and open standards.

Keywords (between 4 and 8): Business Intelligence, Data Lake, Data Warehouse, Big Data, Hadoop, ETL, Schema-on-read, Hive

Contents
1. Introduction
  1.1. Context and rationale of the project
  1.2. Objectives of the project
  1.3. Approach and method followed
  1.4. Project planning
    Task list and project Gantt chart
    Gantt chart
  1.5. Summary of the products obtained
  1.6. Core chapters of the report
2. Description of the model company
  2.1. Starting information management model
  2.2. Case description: Supermercados Sierra
    Business model
    Difficulties of the model
    Supply chain
  2.3. Information systems architecture
    Hardware
    Software
    Topology
  2.4. Primary data sources
    POS application data model
    E-commerce solution data model
    Web server access log data model
3. Simulation datasets
  3.1. Introduction
  3.2. "The complete journey" dataset
    Description
    Data model
    Detailed description of the relationships
    MySQL implementation
  3.3. "Let's get kind of real" dataset
    Data model
    Data storage
  3.4. EDGAR dataset
    Data structure
  3.5. Data selection
    Estimated volume of operations
    Implementation of the datasets
4. Solution design and typical use cases
  4.1. Introduction to Apache Hadoop
  4.2. Data Marts
    Introduction to Apache Hive
    Schema-on-read and the Data Lake
    Data Marts at Supermercados Sierra
  4.3. OLAP cube
    Star schema
    Cube dimensions and measures
  4.4. Executive dashboard
  4.5. Management dashboard
  4.6. Operational dashboard
  4.7. Big Data Analytics
  4.8. Predictive model
5. Implementation of the solution
  5.1. Design
    Architecture
    Components
    Processes
  5.2. Hardware and base software environment
    Installing Hortonworks HDP
    Configuring HDP
  5.3. ETL
    Data model
    Batch data loading
    Near-real-time data loading
  5.4. Exploratory SQL queries
  5.5. OLAP cube in Apache Kylin
  5.6. Advanced predictive model: market basket analysis
    RStudio notebook
6. Conclusions
  6.1. Description of the conclusions
  6.2. Achievement of the objectives
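The schema-on-read idea at the heart of the Data Lake paradigm (contrasted with ETL in the abstract, and developed in chapter 4.2) can also be sketched: the raw export is landed untouched, and a schema is applied only when a consumer reads the data, so different consumers may apply different schemas to the same file. A minimal stdlib-only Python sketch, with hypothetical field names mirroring the ETL example:

```python
import csv
import io
from datetime import date

# Data Lake principle: land the raw data exactly as produced by the source
# system (here, a point-of-sale export), with no upfront transformation.
RAW_POS_EXPORT = """\
2019-01-03;store-12;SKU-001;2;3.98
2019-01-03;store-12;SKU-007;1;0.99
2019-01-04;store-04;SKU-001;5;9.95
"""

def read_with_schema(raw_text):
    """Schema-on-read: parse and type the raw records only at query time.

    A different consumer could apply a different schema to the same file.
    """
    reader = csv.reader(io.StringIO(raw_text), delimiter=";")
    for day, store, sku, qty, amount in reader:
        yield {
            "day": date.fromisoformat(day),
            "store": store,
            "sku": sku,
            "quantity": int(qty),
            "amount": float(amount),
        }

# Ad hoc consumption: total revenue per SKU, computed at read time.
revenue = {}
for row in read_with_schema(RAW_POS_EXPORT):
    revenue[row["sku"]] = revenue.get(row["sku"], 0.0) + row["amount"]

print(revenue)
```

This is, in miniature, what a Hive external table over raw HDFS files does in the solution described in chapter 4: the data stays in its source format, and the table definition imposes a schema only when queries run against it.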