A Review on Big Data Analytics Tools for Telecommunication Industry

Total Page:16

File Type:pdf, Size:1020Kb

A Review on Big Data Analytics Tools for Telecommunication Industry I J C T A, 9(27), 2016, pp. 185-193 ISSN: 0974-5572 © International Science Press A Review on Big Data Analytics Tools for Telecommunication Industry M. Durairaj * and T.S. Poornappriya** ABSTRACT Big data is a major buzz phrase and new bend for IT today. Big data is driven data with high velocity, volume, variety, veracity and value. It originates from various sources like cell phones, web, online networking, sensors, geospatial gadgets and other machine-produced information. Data transfers organizations have a one of a kind favorable position in the present day commercial center: by controlling the interchanges base, they have more data than some other industry on where their clients are, the way they cooperate, and how they execute business. In any case, utilizing that client data obliges telecom to change a nineteenth century plan of action into one that takes care of today’s demand for constant business and customer knowledge. This paper means to dissect the distinctive big data examination apparatuses and client churn, up sell, cross sell software tools. Keywords: Big Data analytics, platforms, Database, customer churn tools. 1. INTRODUCTION Envision a world without data storage; a place where everything about an organization or person, each transaction performed, or each viewpoint which can be documented is lost directly after use. Organizations would thus lose the ability to extract gainful data and knowledge, perform itemized analyses, and also give new open doors and advantages. Anything extending from client names and addresses, to products available, to purchases made, to representatives hired, etc. has become necessary for everyday continuity. Data is the building obstruct whereupon any organization flourishes. Gadgets and individuals are always producing information. In IT industry big data is a biggest buzz stage. The data has expanded step by step from most recent a quarter century; truths about data are 2 million seeking questions on Google, 277,000 tweets, 100 million messages, and 350 GB data preparing on facebook consistently. Big data [1] is another open doors for big business to extricate gigantic volume of data continuous and judicious and non-sane data sorts. New advances and individual correspondence delivering the Big data drifts, the worldwide web populace developed by 6.5percent from 2010 to 2011. In 2013, gauges achieved 4 Zettabyte of data produced around the world. Big data depicts any voluminous organized (cluster, documents, records, table, tree), unstructured can be literary (PowerPoint, Word archive, Email messages, texts) and non printed (JPEG, MP3 sound records, streak video records) and semi-organized (weblogs, online networking sustains) [2]. In 2014, evaluated overall data at an amazing 7ZB. Today 2 billion individuals are associated together and producing gigantic measure of data consistently and IDC study found that, by 2020 data volumes are relied upon to expand 50 times. Distributed computing empowered with Big data [3] by elements, for example, pay-per-use, versatility, low time to market, transaction of dangers and low forthright speculation. Distributed computing gives little to medium estimated business to execute Big data innovations and lessen preparing expense and equipment cost. data is produced through organization with online networking, informing, portable applications, mechanized procedures, sensors and PCs. Ongoing utilizations of Big data in various commercial ventures like human services, * Assistant Professor, School of Computer Science Engineering and Applications, Bharathidasan University, Trichy, Tamilnadu, India ** Research Scholar, School of Computer Science, Engineering and Applications, Bharathidasan University, Trichy, Tamilnadu, India 186 M. Durairaj and T.S. Poornappriya system security, business sector and business, sports, instruction frameworks, gaming Industry, telecom. Big data technologies prepare high-assortment, high-volume and high-speed to concentrate data esteem and guarantee high-veracity of unique data [4]. Volume, velocity and variety as the greatest difficulties of data administration. Big data definition having the accompanying 5 V’s (volume, velocity, variety, veracity and quality) properties: 1. Volume: Big data comes in one size: XXL (range 30-50 terabytes TBs) through ventures [5]. The accessible storage can’t deal with structure and unstructured information; this is a major issue for ventures. 2. Velocity: Velocity characterizes the pace of data that enters the endeavor and after that examined to build the benefit of business before the estimation of the data lost. 3. Variety: Data can be organized, unstructured, semi organized or blend of three [6]. It comes in numerous structures like logs documents, tweets, pictures, recordings, sound, content, PDF documents, click streams and so on. 4. Veracity: Veracity signifies “similarity with truth or reality”. Data sources (even in same space) are of various qualities with contrasts precision, scope and timelineness. 5. Esteem: It alludes to the handling of the data and delivered it amid examination. Estimation of data is not one time utilize and reused for future by joined with another data sets. 2. BIG DATA ANALYSIS AND CUSTOMER CHURN IN TELECOMMUNICATION The expression “Huge Data” has as of late been connected to datasets that develop so substantial that they get to be ungainly to work with utilizing customary database administration frameworks. They are data sets whose size is past the capacity of ordinarily utilized software tools and capacity frameworks to catch, store, oversee, and additionally handle the data inside of a bearable slipped by time [7]. Big data sizes are always expanding, at present extending from a couple of dozen terabytes (TB) to numerous petabytes (PB) of data in a solitary data set. Thus, a portion of the challenges identified with Big data incorporate catch, storage, seek, sharing, analytics, and imagining [7]. Today, ventures are investigating vast volumes of exceedingly point by point data in order to find actualities they didn’t know some time recently. Henceforth, Big data analytics is the place progressed diagnostic methods are connected on Big data sets. Analytics in view of substantial data tests reveals and influences business change. In any case, the bigger the arrangement of information, the more troublesome it gets to be to oversee. In this area, we will begin by examining the attributes of Big data, and also its significance. Normally, business advantage can ordinarily be gotten from dissecting bigger and more mind boggling data sets that require constant or close continuous capacities; be that as it may, this prompts a requirement for new data structures, systematic techniques, and apparatuses. In this way the progressive segment will expound the big data analytics devices and techniques, specifically, beginning with the big data storage and administration, then proceeding onward to the Big data systematic handling. It then closes with a percentage of the different Big data examinations which have developed in utilization with Big data. These days, individuals would prefer just not to gather information, they need to comprehend the significance and significance of the information, and use it to help them in deciding. data examination is the procedure of applying calculations keeping in mind the end goal to investigate sets of data and concentrate valuable and obscure examples, connections, and data [8]. Besides, data examination are utilized to remove already obscure, valuable, substantial and concealed examples and data from expansive datasets, and in addition to recognize vital connections among the put away variables. In this manner, analytics have significantly affected exploration and technologies, since chiefs have turned out to be more inspired by gaining from past information, subsequently increasing upper hand [7] [8]. Expectation of clients who are at danger of leaving an organization is called as churn forecast in telecom [9]. The organization ought to concentrate on such clients and endeavor to hold them. This application is critical in light A Review on Big Data Analytics Tools for Telecommunication Industry 187 of the fact that it is less costly to hold a client than gain another. Significant worry in client relationship administration in data transfers organizations is the simplicity with which clients can move to a contender, a procedure called “stirring”. Agitating is an expensive procedure for the organization, as it is much less expensive to hold a client than to procure another one. 3. BIG DATA ANALYTICS TOOLS There are five key ways to deal with examining Big data and producing understanding: 1. Disclosure apparatuses are helpful all through the data lifecycle for fast, instinctive analytics and examination of data from any mix of organized and unstructured sources. These devices license analytics close by conventional BI [10] source frameworks. Since there is no requirement for in advance displaying, clients can draw new experiences, reach important conclusions, and settle on educated choices rapidly. 2. BI tools are essential for reporting, analytics and execution administration, basically with value-based data from data distribution centers and creation data frameworks. BI Tools [11] give far reaching capacities to business knowledge and execution administration, including endeavor reporting, dashboards, specially appointed examination, scorecards, and consider the possibility that situation analytics on a coordinated, undertaking
Recommended publications
  • The Network Data Storage Model Based on the HTML
    Proceedings of the 2nd International Conference on Computer Science and Electronics Engineering (ICCSEE 2013) The network data storage model based on the HTML Lei Li1, Jianping Zhou2 Runhua Wang3, Yi Tang4 1Network Information Center 3Teaching Quality and Assessment Office Chongqing University of Science and Technology Chongqing University of Science and Technology Chongqing, China Chongqing, China [email protected] [email protected] 2Network Information Center 4School of Electronic and Information Engineering Chongqing University of Science and Technology Chongqing University of Science and Technology Chongqing, China Chongqing, China [email protected] [email protected] Abstract—in this paper, the mass oriented HTML data refers popular NoSQL for HTML database. This article describes to those large enough for HTML data, that can no longer use the SMAQ stack model and those who today may be traditional methods of treatment. In the past, has been the included in the model under mass oriented HTML data Web search engine creators have to face this problem be the processing tools. first to bear the brunt. Today, various social networks, mobile applications and various sensors and scientific fields are II. MAPREDUCE created daily with PB for HTML data. In order to deal with MapReduce is a Google for creating web webpage index the massive data processing challenges for HTML, Google created MapReduce. Google and Yahoo created Hadoop created. MapReduce framework has become today most hatching out of an intact mass oriented HTML data processing mass oriented HTML data processing plant. MapReduce is tools for ecological system. the key, will be oriented in the HTML data collection of a query division, then at multiple nodes to execute in parallel.
    [Show full text]
  • Projects – Other Than Hadoop! Created By:-Samarjit Mahapatra [email protected]
    Projects – other than Hadoop! Created By:-Samarjit Mahapatra [email protected] Mostly compatible with Hadoop/HDFS Apache Drill - provides low latency ad-hoc queries to many different data sources, including nested data. Inspired by Google's Dremel, Drill is designed to scale to 10,000 servers and query petabytes of data in seconds. Apache Hama - is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS for massive scientific computations such as matrix, graph and network algorithms. Akka - a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event-driven applications on the JVM. ML-Hadoop - Hadoop implementation of Machine learning algorithms Shark - is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can execute Hive QL queries up to 100 times faster than Hive without any modification to the existing data or queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones. Apache Crunch - Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run Azkaban - batch workflow job scheduler created at LinkedIn to run their Hadoop Jobs Apache Mesos - is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark, and other applications on a dynamically shared pool of nodes.
    [Show full text]
  • Database Software Market: Billy Fitzsimmons +1 312 364 5112
    Equity Research Technology, Media, & Communications | Enterprise and Cloud Infrastructure March 22, 2019 Industry Report Jason Ader +1 617 235 7519 [email protected] Database Software Market: Billy Fitzsimmons +1 312 364 5112 The Long-Awaited Shake-up [email protected] Naji +1 212 245 6508 [email protected] Please refer to important disclosures on pages 70 and 71. Analyst certification is on page 70. William Blair or an affiliate does and seeks to do business with companies covered in its research reports. As a result, investors should be aware that the firm may have a conflict of interest that could affect the objectivity of this report. This report is not intended to provide personal investment advice. The opinions and recommendations here- in do not take into account individual client circumstances, objectives, or needs and are not intended as recommen- dations of particular securities, financial instruments, or strategies to particular clients. The recipient of this report must make its own independent decisions regarding any securities or financial instruments mentioned herein. William Blair Contents Key Findings ......................................................................................................................3 Introduction .......................................................................................................................5 Database Market History ...................................................................................................7 Market Definitions
    [Show full text]
  • Big Data: a Survey the New Paradigms, Methodologies and Tools
    Big Data: A Survey The New Paradigms, Methodologies and Tools Enrico Giacinto Caldarola1,2 and Antonio Maria Rinaldi1,3 1Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Napoli, Italy 2Institute of Industrial Technologies and Automation, National Research Council, Bari, Italy 3IKNOS-LAB Intelligent and Knowledge Systems, University of Naples Federico II, LUPT 80134, via Toledo, 402-Napoli, Italy Keywords: Big Data, NoSQL, NewSQL, Big Data Analytics, Data Management, Map-reduce. Abstract: For several years we are living in the era of information. Since any human activity is carried out by means of information technologies and tends to be digitized, it produces a humongous stack of data that becomes more and more attractive to different stakeholders such as data scientists, entrepreneurs or just privates. All of them are interested in the possibility to gain a deep understanding about people and things, by accurately and wisely analyzing the gold mine of data they produce. The reason for such interest derives from the competitive advantage and the increase in revenues expected from this deep understanding. In order to help analysts in revealing the insights hidden behind data, new paradigms, methodologies and tools have emerged in the last years. There has been a great explosion of technological solutions that arises the need for a review of the current state of the art in the Big Data technologies scenario. Thus, after a characterization of the new paradigm under study, this work aims at surveying the most spread technologies under the Big Data umbrella, throughout a qualitative analysis of their characterizing features.
    [Show full text]
  • Big Data Solutions and Applications
    P. Bellini, M. Di Claudio, P. Nesi, N. Rauch Slides from: M. Di Claudio, N. Rauch Big Data Solutions and applications Slide del corso: Sistemi Collaborativi e di Protezione Corso di Laurea Magistrale in Ingegneria 2013/2014 http://www.disit.dinfo.unifi.it Distributed Data Intelligence and Technology Lab 1 Related to the following chapter in the book: • P. Bellini, M. Di Claudio, P. Nesi, N. Rauch, "Tassonomy and Review of Big Data Solutions Navigation", in "Big Data Computing", Ed. Rajendra Akerkar, Western Norway Research Institute, Norway, Chapman and Hall/CRC press, ISBN 978‐1‐ 46‐657837‐1, eBook: 978‐1‐ 46‐657838‐8, july 2013, in press. http://www.tmrfindia.org/b igdata.html 2 Index 1. The Big Data Problem • 5V’s of Big Data • Big Data Problem • CAP Principle • Big Data Analysis Pipeline 2. Big Data Application Fields 3. Overview of Big Data Solutions 4. Data Management Aspects • NoSQL Databases • MongoDB 5. Architectural aspects • Riak 3 Index 6. Access/Data Rendering aspects 7. Data Analysis and Mining/Ingestion Aspects 8. Comparison and Analysis of Architectural Features • Data Management • Data Access and Visual Rendering • Mining/Ingestion • Data Analysis • Architecture 4 Part 1 The Big Data Problem. 5 Index 1. The Big Data Problem • 5V’s of Big Data • Big Data Problem • CAP Principle • Big Data Analysis Pipeline 2. Big Data Application Fields 3. Overview of Big Data Solutions 4. Data Management Aspects • NoSQL Databases • MongoDB 5. Architectural aspects • Riak 6 What is Big Data? Variety Volume Value 5V’s of Big Data Velocity Variability 7 5V’s of Big Data (1/2) • Volume: Companies amassing terabytes/petabytes of information, and they always look for faster, more efficient, and lower‐ cost solutions for data management.
    [Show full text]
  • Building Open Source ETL Solutions with Pentaho Data Integration
    Pentaho® Kettle Solutions Pentaho® Kettle Solutions Building Open Source ETL Solutions with Pentaho Data Integration Matt Casters Roland Bouman Jos van Dongen Pentaho® Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration Published by Wiley Publishing, Inc. 10475 Crosspoint Boulevard Indianapolis, IN 46256 lll#l^aZn#Xdb Copyright © 2010 by Wiley Publishing, Inc., Indianapolis, Indiana Published simultaneously in Canada ISBN: 978-0-470-63517-9 ISBN: 9780470942420 (ebk) ISBN: 9780470947524 (ebk) ISBN: 9780470947531 (ebk) Manufactured in the United States of America 10 9 8 7 6 5 4 3 2 1 No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appro- priate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at ]iie/$$lll#l^aZn#Xdb$\d$eZgb^hh^dch. Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representa- tions or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose.
    [Show full text]
  • How to Estimate and Measure “The Cloud”
    Presented at the 2013 ICEAA Professional Development & Training Workshop - www.iceaaonline.com How to estimate and measure “the Cloud”. And Make COCOMO Cloud Enabled? ICEAA June 2013 John W. Rosbrugg&h & David P. Seaver OPS Consulting Flawless Implementation of the Fundamentals Presented at the 2013 ICEAA Professional Development & Training Workshop - www.iceaaonline.com Some Perspective • The DoD Cloud Computing Strategy introduces an approach to move the Department from the current state of a duplicative, cumbersome, and costly set of application silos to an end state which is an agile, secure, and cost effective service environment that can rapidly respond to changing mission needs. Flawless Implementation of the Fundamentals 2 Presented at the 2013 ICEAA Professional Development & Training Workshop - www.iceaaonline.com Flawless Implementation of the Fundamentals 3 Presented at the 2013 ICEAA Professional Development & Training Workshop - www.iceaaonline.com Flawless Implementation of the Fundamentals 4 Presented at the 2013 ICEAA Professional Development & Training Workshop - www.iceaaonline.com CLOUD COMPUTING DEFINED PART 1 Flawless Implementation of the Fundamentals Presented at the 2013 ICEAA Professional Development & Training Workshop - www.iceaaonline.com Cloud Computing Defined • The National Institute of Standards and Technology ( NIST) defines cloud computing as: – “A model for enabling ubiquitous, convenient, on‐demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage,
    [Show full text]
  • Big Data Fundamentals
    BigBig DataData FundamentalsFundamentals . Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 [email protected] These slides and audio/video recordings of this class lecture are at: http://www.cse.wustl.edu/~jain/cse570-15/ Washington University in St. Louis http://www.cse.wustl.edu/~jain/cse570-15/ ©2015 Raj Jain 18-1 OverviewOverview 1. Why Big Data? 2. Terminology 3. Key Technologies: Google File System, MapReduce, Hadoop 4. Hadoop and other database tools 5. Types of Databases Ref: J. Hurwitz, et al., “Big Data for Dummies,” Wiley, 2013, ISBN:978-1-118-50422-2 Washington University in St. Louis http://www.cse.wustl.edu/~jain/cse570-15/ ©2015 Raj Jain 18-2 BigBig DataData Data is measured by 3V's: Volume: TB Velocity: TB/sec. Speed of creation or change Variety: Type (Text, audio, video, images, geospatial, ...) Increasing processing power, storage capacity, and networking have caused data to grow in all 3 dimensions. Volume, Location, Velocity, Churn, Variety, Veracity (accuracy, correctness, applicability) Examples: social network data, sensor networks, Internet Search, Genomics, astronomy, … Washington University in St. Louis http://www.cse.wustl.edu/~jain/cse570-15/ ©2015 Raj Jain 18-3 WhyWhy BigBig DataData Now?Now? 1. Low cost storage to store data that was discarded earlier 2. Powerful multi-core processors 3. Low latency possible by distributed computing: Compute clusters and grids connected via high-speed networks 4. Virtualization Partition, Aggregate, isolate resources in any size and dynamically change it Minimize latency for any scale 5. Affordable storage and computing with minimal man power via clouds Possible because of advances in Networking Washington University in St.
    [Show full text]
  • Performance Analysis of Hbase Neseeba P.B, Dr
    International Journal of Latest Technology in Engineering, Management & Applied Science (IJLTEMAS) Volume VI, Issue X, October 2017 | ISSN 2278-2540 Performance Analysis of Hbase Neseeba P.B, Dr. Zahid Ansari Department of Computer Science & Engineering, P. A. College of Engineering, Mangalore, 574153, India Abstract— Hbase is a distributed column-oriented database built 3) Map Reduce: A programming model and an on top of HDFS. Hbase is the Hadoop application to use when associated implementation for processing and you require real-time random access to very large datasets. generating large data set. It was not too long after Hbase is a scalable data store targeted at random read and Google published these documents that we started writes access of fully structured data. It's invented after Google's seeing open source implementations of them, and in big table and targeted to support large tables, on the order of 2007, Mike Cafarella released code for an open billions of rows and millions of columns. This paper includes step by step information to the HBase, Detailed architecture of source Big Table implementation that he called HBase. Illustration of differences between apache Hbase and a Hbase. traditional RDBMS, The drawbacks of Relational Database Systems, Relationship between the Hadoop and Hbase, storage of II. DATA MODEL Hbase in physical memory. This paper also includes review of the Other cloud databases. Various problems, limitations, Hbase actually defines a four-dimensional data model and the advantages and applications of HBase. Brief introduction is given following four coordinates define each cell (see Figure 1): in the following section.
    [Show full text]
  • SIGMOD Officers, Committees, and Awardees
    SIGMOD Officers, Committees, and Awardees Chair Vice-Chair Secretary/Treasurer Yannis Ioannidis Christian S. Jensen Alexandros Labrinidis University of Athens Department of Computer Science Department of Computer Science Department of Informatics Aalborg University University of Pittsburgh Panepistimioupolis, Informatics Bldg Selma Lagerlöfs Vej 300 Pittsburgh, PA 15260-9161 157 84 Ilissia, Athens DK-9220 Aalborg Øst HELLAS DENMARK USA +30 210 727 5224 +45 99 40 89 00 +1 412 624 8843 <yannis AT di.uoa.gr> <csj AT cs.aau.dk > <labrinid AT cs.pitt.edu> SIGMOD Executive Committee: Sihem Amer-Yahia, Curtis Dyreson, Christian S. Jensen, Yannis Ioannidis, Alexandros Labrinidis, Maurizio Lenzerini, Ioana Manolescu, Lisa Singh, Raghu Ramakrishnan, and Jeffrey Xu Yu. Advisory Board: Raghu Ramakrishnan (Chair), Yahoo! Research, <First8CharsOfLastName AT yahoo-inc.com>, Rakesh Agrawal, Phil Bernstein, Peter Buneman, David DeWitt, Hector Garcia-Molina, Masaru Kitsuregawa, Jiawei Han, Alberto Laender, Tamer Özsu, Krithi Ramamritham, Hans-Jörg Schek, Rick Snodgrass, and Gerhard Weikum. Information Director: Jeffrey Xu Yu, The Chinese University of Hong Kong, <yu AT se.cuhk.edu.hk> Associate Information Directors: Marcelo Arenas, Denilson Barbosa, Ugur Cetintemel, Manfred Jeusfeld, Dongwon Lee, Michael Ley, Rachel Pottinger, Altigran Soares da Silva, and Jun Yang. SIGMOD Record Editor: Ioana Manolescu, INRIA Saclay, <ioana.manolescu AT inria.fr> SIGMOD Record Associate Editors: Magdalena Balazinska, Denilson Barbosa, Chee Yong Chan, Ugur Çetintemel, Brian Cooper, Cesar Galindo-Legaria, Leonid Libkin, and Marianne Winslett. SIGMOD DiSC and SIGMOD Anthology Editor: Curtis Dyreson, Washington State University, <cdyreson AT eecs.wsu.edu> SIGMOD Conference Coordinators: Sihem Amer-Yahia, Yahoo! Research, <sihemameryahia AT acm.org> and Lisa Singh, Georgetown University, <singh AT cs.georgetown.edu> PODS Executive: Maurizio Lenzerini (Chair), University of Roma 1, <lenzerini AT dis.uniroma1.it>, Georg Gottlob, Phokion G.
    [Show full text]
  • Understanding Big Data: a Management Study
    Understanding Big Data: A Management Study Special Research Reprint Courtesy of Progress Software September 22, 2011 Understanding Big Data: A Management Study 951 SMS TABLE OF CONTENTS Summary 1 So What? 1 Perspective 1 The Problem with Big Data 2 Big Data NOSQL Databases 4 Big Data Analytics 5 Big Data and the Cloud 7 Current Trends 9 Net Impacts 9 TABLE OF FIGURES Figure 1: Big Data Complexities 3 Figure 2: Big Data, Analytics, and Business Information 5 Figure 3: Cloud-Based Analytics Can Envelope, Adapt to, and Contain Big Data 8 About this Report This Big Data management report takes a broad look at how the market for Big Data infrastructure, technologies and solutions are evolving in response to the ex- plosion in both structured and non-structure content of all types. The Cloud is emerging as a particularly useful platform for Big Data solutions for both infra- structure and analytics, as it is an ideal platform for massive data crunching and analysis at affordable prices. Progress Software has been granted the right to reprint and electronically distribute this article through its website, through April 10, 2012. Additional related research of interest is available to clients via our Research Library located at www.saugatucktechnology.com/browse-research/ (registration required). The following Saugatuck staff were instrumental in the development and publication of this report: Lead author: Brian Dooley. Contributing Author: Bruce Guptill. About Saugatuck Technology Saugatuck Technology, Inc., provides subscription research and management con- sulting services focused on the key market trends and disruptive technologies driv- ing change in enterprise IT, including Software-as-a-Service (SaaS), Cloud Infra- structure, Social Computing, Mobility and Advanced Analytics, among others.
    [Show full text]
  • A Case Study on Apache Hbase
    A case study on Apache HBase A Master’s Project Presented to Department of Computer and Information Sciences SUNY Polytechnic Institute Utica, New York In Partial Fulfillment Of the Requirements for the Master of Science Degree by Rohit Reddy Nalla May 2015 © Rohit Reddy Nalla 2015 Abstract Introduction: Apache HBase is an open-source, non-relational and a distributed data base system built on top of HDFS (Hadoop Distributed File system). HBase was designed post Google’s Big table and it is written in Java. It was developed as a part of Apache’s Hadoop Project. It provides a kind of fault – tolerant mechanism to store minor amounts of non-zero items caught within large amounts of empty items. HBase is used when we require real-time read/write access to huge data bases. HBase project was started by the end of 2006 by Chad Walters and Jim Kellerman at Powerset.[2] The main purpose of HBase is to process large amounts of data. Mike Cafarella worked on code of the working system initially and later Jim Kellerman carried it to the next stage. HBase was first released as a part of Hadoop 0.15.0 in October 2007[2]. The project goal was holding of very large tables like billions of rows X millions of columns. In May 2010, HBase advanced to a major project and it became an Apache Top Level Project. Several applications like Adobe, Twitter, Yahoo, Trend Micro etc. use this data base. Social networking sites like Facebook have implemented its messenger application using HBase. This document helps us to understand how HBase works and how is it different from other data bases.
    [Show full text]