Building Big Data and Analytics Solutions in Cloud

Total Page:16

File Type:pdf, Size:1020Kb

Building Big Data and Analytics Solutions in Cloud Front cover Building Big Data and Analytics Solutions in the Cloud Characteristics of big data and key technical challenges in taking advantage of it Impact of big data on cloud computing and implications on data centers Implementation patterns that solve the most common big data use cases Wei-Dong Zhu Manav Gupta Ven Kumar Sujatha Perepa Arvind Sathi Craig Statchuk ibm.com/redbooks Redpaper International Technical Support Organization Building Big Data and Analytics Solutions in the Cloud December 2014 REDP-5085-00 Note: Before using this information and the product it supports, read the information in “Notices” on page v. First Edition (December 2014) This edition applies to: IBM InfoSphere® BigInsights™ IBM PureData™ System for Hadoop IBM PureData System for Analytics IBM PureData System for Operational Analytics IBM InfoSphere Warehouse IBM InfoSphere Streams IBM InfoSphere Data Explorer (Watson™ Explorer) IBM InfoSphere Data Architect IBM InfoSphere Information Analyzer IBM InfoSphere Information Server IBM InfoSphere Information Server for Data Quality IBM InfoSphere Master Data Management Family IBM InfoSphere Optim™ Family IBM InfoSphere Guardium® Family © Copyright International Business Machines Corporation 2014. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Notices . .v Trademarks . vi Executive summary . vii Authors. viii Now you can become a published author, too! . ix Comments welcome. ix Stay connected to IBM Redbooks . .x Chapter 1. Big data and analytics overview . 1 1.1 Examples of big data and analytics. 3 1.2 IBM Big Data and Analytics platform. 4 1.3 IBM Cloud Computing Reference Architecture . 8 1.4 Big data and analytics on the cloud: Complementary technologies . 9 1.5 A large difference . 10 1.6 Implications for the cloud . 11 1.7 Big data and analytics in the cloud: Bringing the two together . 12 Chapter 2. Big data and analytics business drivers and technical challenges. 15 2.1 Big data business drivers . 16 2.2 Technical challenges. 18 Chapter 3. Big data and analytics taxonomy . 21 Chapter 4. Big data and analytics architecture . 25 4.1 Maturity models. 26 4.2 Functional architecture . 27 4.3 Big data and analytics infrastructure architecture . 31 Chapter 5. Key big data and analytics use cases . 33 5.1 Use case details . 35 5.1.1 Big data and analytics exploration . 35 5.1.2 Enhanced 360-degree view of the customer. 36 5.1.3 Operations analysis . 38 5.1.4 Data warehouse augmentation . 39 5.1.5 Security intelligence (security, fraud, and risk analysis) . 41 Chapter 6. Big data and analytics patterns. 43 6.1 Zones . 44 6.1.1 Data refinement value. 45 6.1.2 Normalization . 46 6.1.3 Maximizing the number of correlations . 46 6.1.4 Quality, relevance, and flexibility. 46 6.1.5 Solution pattern descriptions. 47 6.1.6 Component patterns . 55 Chapter 7. Picking a solution pattern . 69 7.1 Data ingestion . 70 7.2 Data preparation and data alignment . 71 7.3 Data streaming . 72 © Copyright IBM Corp. 2014. All rights reserved. iii 7.4 Search-driven analysis . 73 7.5 Dimensional analysis . 74 7.6 High-volume analysis . 75 Chapter 8. NoSQL technology . 77 8.1 Definition and history of NoSQL . 78 8.1.1 The CAP theorem . 79 8.2 Types of NoSQL databases . 80 8.2.1 Key-Value stores. 81 8.2.2 Document databases . 81 8.2.3 Columnar and column-family databases. 81 8.2.4 Graph databases . 82 8.2.5 Hadoop and the Hadoop Distributed File System. 82 8.2.6 Choosing the right distributed data store . 83 Chapter 9. Big data and analytics implications for the data center . 85 9.1 The hybrid data center . 86 9.2 Big data and analytics service ecosystem on cloud . 86 9.3 IBM SoftLayer for big data and analytics . 89 Chapter 10. Big data and analytics on the cloud: Architectural factors and guiding principles. 91 10.1 Architectural principles . 92 10.1.1 Security . 92 10.1.2 Speed . 92 10.1.3 Scalability . 92 10.1.4 Reliability and Availability . 93 10.2 Architectural decisions . 93 10.2.1 Performance . 93 10.2.2 Scalability . 94 10.2.3 Financial . 95 10.2.4 Reliability. ..
Recommended publications
  • Title, Modify
    Sentiment Analysis of Twitter Data for a Tourism Recommender System in Bangladesh Najeefa Nikhat Choudhury School of Science Thesis submitted for examination for the degree of Master of Science in Technology. Otaniemi 28.11.2016 Thesis supervisor: Assoc. Prof. Keijo Heljanko aalto university abstract of the school of science master’s thesis Author: Najeefa Nikhat Choudhury Title: Sentiment Analysis of Twitter Data for a Tourism Recommender System in Bangladesh Date: 28.11.2016 Language: English Number of pages: 8+65 Department of Computer Science Master’s Program in ICT Innovation Supervisor and advisor: Assoc. Prof. Keijo Heljanko The exponentially expanding Digital Universe is generating huge amount of data containing valuable information. The tourism industry, which is one of the fastest growing economic sectors, can benefit from the myriad of digital data travelers generate in every phase of their travel- planning, booking, traveling, feedback etc. One application of tourism related data can be to provide personalized destination recommendations. The primary objective of this research is to facilitate the business development of a tourism recommendation system for Bangladesh called “JatraLog”. Sentiment based recommendation is one of the features that will be employed in the recommendation system. This thesis aims to address two research goals: firstly, to study Sentiment Analysis as a tourism recommendation tool and secondly, to investigate twitter as a potential source of valuable tourism related data for providing recommendations for different countries, specifically Bangladesh. Sentiment Analysis can be defined as a Text Classification problem, where a document or text is classified into two groups: positive or negative, and in some cases a third group, i.e.
    [Show full text]
  • ROBRA – the Snake Robot
    International Journal of Computer Science Trends and Technology (IJCST) – Volume 7 Issue 3, May - Jun 2019 RESEARCH ARTICLE OPEN ACCESS ROBRA – The Snake Robot James M.V, Nandukrishna S, Noyal Jose, Ebin Biju Department of Computer Science and Engineering Toc H Institute of Science & Technology, Ernakulam Kerala – India ABSTRACT Robra - A snake-like robot which, provides the locomotion of a real snake. The shape and size of the robot depends on application. Robra can avoid obstacles by receiving signals from sensors and move with flexibility on terrain surfaces. Multiple joints increases its degree of freedom. It can quickly explore and navigate foreign environments and detect potential dangers for direct human involvement. Robra uses PIR() to detect presence of humans in an environment like civilians trapped in collapsed buildings during natural disasters. It can move through crevices and uneven surfaces. It detects obstacles by using Ultrasonic range movement sensors. This provides noncontact distance between the obstacles to detect and avoid collision with them. Electro chemical sensors are used for detection of poisonous gases. Robra has an onboard camera which records and send the video data in real time via WiFi. Robra has microphones for the survivors to communicate and inform their status incase of rescue aid. Thus Robra covers wide real time applications like surveillance, rescue aid etc. Keywords:- Robra, Snake, act have a wingspan almost 10 feet, giving it an ape-like I. INTRODUCTION appearance. In the past two decades it is estimated that disasters are Momaro - This robot has been specifically designed by the responsible for about 3 million deaths worldwide, 800million team NimbRo Rescue from the University of Bonn in people adversely affected.
    [Show full text]
  • Mogućnosti Primjene Twitterovog Aplikacijskog Progamskog Sučelja
    Mogućnosti primjene Twitterovog aplikacijskog progamskog sučelja Mance, Josip Master's thesis / Diplomski rad 2016 Degree Grantor / Ustanova koja je dodijelila akademski / stručni stupanj: Josip Juraj Strossmayer University of Osijek, Faculty of Electrical Engineering, Computer Science and Information Technology Osijek / Sveučilište Josipa Jurja Strossmayera u Osijeku, Fakultet elektrotehnike, računarstva i informacijskih tehnologija Osijek Permanent link / Trajna poveznica: https://urn.nsk.hr/urn:nbn:hr:200:039315 Rights / Prava: In copyright Download date / Datum preuzimanja: 2021-09-29 Repository / Repozitorij: Faculty of Electrical Engineering, Computer Science and Information Technology Osijek SVEUČILIŠTE JOSIPA JURJA STROSSMAYERA U OSIJEKU ELEKTROTEHNIČKI FAKULTET Sveučilišni studij MOGUĆNOSTI PRIMJENE TWITTEROVOG APLIKACIJSKOG PROGRAMSKOG SUČELJA Diplomski rad Josip Mance Osijek, 2016. godina Obrazac D1 Izjava o originalnosti Sadržaj 1. Uvod ............................................................................................................................................ 1 2. Aplikacijsko programsko sučelje ................................................................................................ 2 3. Društvene mreže.......................................................................................................................... 4 3.1. Twitter .................................................................................................................................. 6 3.1.1. Upotreba Twittera ........................................................................................................
    [Show full text]
  • Table of Content
    TABLE OF CONTENT 1. What is Joomla………………………………………………………………. 2 2. Download and Install Joomla………………………………………………. 3 2.1 Pre-installation Check………………………………………………….... 3 2.2 Configuration …………………………………………………………… 4 3. Creating Content……………………………………………………………....9 3.1 Create a Category…………………………………………………………9 3.2 Create Article………………………………………………………………10 4. Modules……………………………………………………………………….. 12 5. How to Show Position for Template?....................................................... 14 6. Menu……………………………………………………………………………16 6.1 How to Create Menu?.........................................................................16 7. User……………………………………………………………………………. 18 7.1 Users Groups…………………………………………………………….. 18 7.2 Add New User……………………………………………………………..19 1 1. What is Joomla? Joomla is a free system for creating websites. It is an open source project, which, like most open source projects, is constantly in motion. It has been extremely successful for seven years now and is popular with millions of users worldwide. The word Joomla is a derivative of the word Joomla from the African language Swahili and means "all together". The project Joomla is the result of a heated discussion between the Mambo Foundation, which was founded in August 2005, and its then-development team. Joomla is a development of the successful system Mambo. Joomla is used all over the world for simple homepages and for complex corporate websites as well. It is easy to install, easy to manage and very reliable. The Joomla team has organized and reorganized itself throughout the last seven years to better meet the user demands. 2 2. Download and Install Joomla “Where and what to download?” “How to install?” In order to install Joomla! on your local computer, it is necessary to set up your "own internet", for which you'll need a browser, a web server, a PHP environment and as well a Joomla supported database system.
    [Show full text]
  • What Is Nosql? the Only Thing That All Nosql Solutions Providers Generally Agree on Is That the Term “Nosql” Isn’T Perfect, but It Is Catchy
    NoSQL GREG SYSADMINBURD Greg Burd is a Developer Choosing between databases used to boil down to examining the differences Advocate for Basho between the available commercial and open source relational databases . The term Technologies, makers of Riak. “database” had become synonymous with SQL, and for a while not much else came Before Basho, Greg spent close to being a viable solution for data storage . But recently there has been a shift nearly ten years as the product manager for in the database landscape . When considering options for data storage, there is a Berkeley DB at Sleepycat Software and then new game in town: NoSQL databases . In this article I’ll introduce this new cat- at Oracle. Previously, Greg worked for NeXT egory of databases, examine where they came from and what they are good for, and Computer, Sun Microsystems, and KnowNow. help you understand whether you, too, should be considering a NoSQL solution in Greg has long been an avid supporter of open place of, or in addition to, your RDBMS database . source software. [email protected] What Is NoSQL? The only thing that all NoSQL solutions providers generally agree on is that the term “NoSQL” isn’t perfect, but it is catchy . Most agree that the “no” stands for “not only”—an admission that the goal is not to reject SQL but, rather, to compensate for the technical limitations shared by the majority of relational database implemen- tations . In fact, NoSQL is more a rejection of a particular software and hardware architecture for databases than of any single technology, language, or product .
    [Show full text]
  • Beyond Agility: How Cloud Is Driving Enterprise Innovation
    Beyond agility How cloud is driving enterprise innovation IBM Institute for Business Value Executive Report Cloud How IBM can help IBM Cloud enables seamless integration into public and private cloud environments. The infrastructure is secure, scalable and flexible, providing tailored enterprise solutions that have made IBM Cloud the hybrid cloud market leader. For more information, please visit ibm.com/cloud-computing. 1 The cloud revolution Executive summary has advanced Five years ago, enterprises were implementing cloud What if your organization could sharpen its competitive edge by becoming more nimble? primarily to streamline IT infrastructure and cut costs. What if your enterprise could shift industry economics in its favor? What if your company Today, organizations are unleashing the power of could foresee a new customer need and dominate the market? cloud as business optimizers, innovators and The cloud revolution is here now, delivering true business value to organizations. In disruptors. Where companies choose to direct their enterprises around the world, cloud adoption has moved beyond the stage of acquiring cloud initiatives depends on a variety of factors. technological agility and is now powering business innovation. These factors include their goals and strategies, how much risk they are willing to assume, their current In our 2012 cloud study, only a third of the senior executives we spoke with when developing competitive context and their customer needs. In this our report, “The power of cloud,” said they were planning,
    [Show full text]
  • Cloud Computing and Enterprise Data Reliability Luan Gashi University for Business and Technology, [email protected]
    University of Business and Technology in Kosovo UBT Knowledge Center UBT International Conference 2016 UBT International Conference Oct 28th, 9:00 AM - Oct 30th, 5:00 PM Cloud Computing and Enterprise Data Reliability Luan Gashi University for Business and Technology, [email protected] Follow this and additional works at: https://knowledgecenter.ubt-uni.net/conference Part of the Communication Commons, and the Computer Sciences Commons Recommended Citation Gashi, Luan, "Cloud Computing and Enterprise Data Reliability" (2016). UBT International Conference. 56. https://knowledgecenter.ubt-uni.net/conference/2016/all-events/56 This Event is brought to you for free and open access by the Publication and Journals at UBT Knowledge Center. It has been accepted for inclusion in UBT International Conference by an authorized administrator of UBT Knowledge Center. For more information, please contact [email protected]. Cloud Computing and Enterprise Data Reliability Cloud Computing and Enterprise Data Reliability Luan Gashi UBT – Higher Education Institution, Lagjja Kalabria, 10000 p.n., Prishtine, Kosovo [email protected] Abstract. Cloud services offer many benefits from information and communication technology that to be credible must first be secured. To use the potential of cloud computing, data is transferred, processed and stored in the infrastructures of these service providers. This indicates that the owners of data, particularly enterprises, have puzzled when storing their data is done outside the scope of their control. Research conducted on this topic show how this should be addressed unequivocally. The provided information on the organization of cloud computing models, services and standards, with a focus on security aspects in protecting enterprise data where emphasis shows how data access is treated with reliability from providers of these services.
    [Show full text]
  • Hitachi Solution for Databases in Enterprise Data Warehouse Offload Package for Oracle Database with Mapr Distribution of Apache
    Hitachi Solution for Databases in an Enterprise Data Warehouse Offload Package for Oracle Database with MapR Distribution of Apache Hadoop Reference Architecture Guide By Shashikant Gaikwad, Subhash Shinde December 2018 Feedback Hitachi Data Systems welcomes your feedback. Please share your thoughts by sending an email message to [email protected]. To assist the routing of this message, use the paper number in the subject and the title of this white paper in the text. Revision History Revision Changes Date MK-SL-131-00 Initial release December 27, 2018 Table of Contents Solution Overview 2 Business Benefits 2 High Level Infrastructure 3 Key Solution Components 4 Pentaho 6 Hitachi Advanced Server DS120 7 Hitachi Virtual Storage Platform Gx00 Models 7 Hitachi Virtual Storage Platform Fx00 Models 7 Brocade Switches 7 Cisco Nexus Data Center Switches 7 MapR Converged Data Platform 8 Red Hat Enterprise Linux 10 Solution Design 10 Server Architecture 11 Storage Architecture 13 Network Architecture 14 Data Analytics and Performance Monitoring Using Hitachi Storage Advisor 17 Oracle Enterprise Data Workflow Offload 17 Engineering Validation 29 Test Methodology 29 Test Results 30 1 Hitachi Solution for Databases in an Enterprise Data Warehouse Offload Package for Oracle Database with MapR Distribution of Apache Hadoop Reference Architecture Guide Use this reference architecture guide to implement Hitachi Solution for Databases in an enterprise data warehouse offload package for Oracle Database. This Oracle converged infrastructure provides a high performance, integrated, solution for advanced analytics using the following big data applications: . Hitachi Advanced Server DS120 with Intel Xeon Silver 4110 processors . Pentaho Data Integration . MapR distribution for Apache Hadoop This converged infrastructure establishes best practices for environments where you can copy data in an enterprise data warehouse to an Apache Hive database on top of Hadoop Distributed File System (HDFS).
    [Show full text]
  • Cloud Computing Bible Is a Wide-Ranging and Complete Reference
    A thorough, down-to-earth look Barrie Sosinsky Cloud Computing Barrie Sosinsky is a veteran computer book writer at cloud computing specializing in network systems, databases, design, development, The chance to lower IT costs makes cloud computing a and testing. Among his 35 technical books have been Wiley’s Networking hot topic, and it’s getting hotter all the time. If you want Bible and many others on operating a terra firma take on everything you should know about systems, Web topics, storage, and the cloud, this book is it. Starting with a clear definition of application software. He has written nearly 500 articles for computer what cloud computing is, why it is, and its pros and cons, magazines and Web sites. Cloud Cloud Computing Bible is a wide-ranging and complete reference. You’ll get thoroughly up to speed on cloud platforms, infrastructure, services and applications, security, and much more. Computing • Learn what cloud computing is and what it is not • Assess the value of cloud computing, including licensing models, ROI, and more • Understand abstraction, partitioning, virtualization, capacity planning, and various programming solutions • See how to use Google®, Amazon®, and Microsoft® Web services effectively ® ™ • Explore cloud communication methods — IM, Twitter , Google Buzz , Explore the cloud with Facebook®, and others • Discover how cloud services are changing mobile phones — and vice versa this complete guide Understand all platforms and technologies www.wiley.com/compbooks Shelving Category: Use Google, Amazon, or
    [Show full text]
  • 1- BRUNEL: Welcome to Another Episode of Getting the Most
    BRUNEL: Welcome to another episode of Getting the Most Out of IBM U2. I'm Kenny Brunel, and I'm your host for today's episode, and today we're going to talk about IBM U2's latest technology, U2.NET. First of all, what is .NET? So I'm going to turn my attention, first of all, to a guest that I have in the studio with me today in Denver, Dave Peters. Dave is the product manager for the U2 data servers and the client tools, and Dave has been with IBM for over 12 years. What can you tell us about U2.NET? But first of all could you give us a brief rundown or a description of what .NET is itself. PETERS: Sure, Kenny. .NET is really a development framework architected by Microsoft. This is the framework that Microsoft hopes that will be the development choice for any new development on Windows. It consists of user-defined interfaces, data access, database connectivity, cryptography, Web application development and communications. IBM has three options for U2 developers that want to access the U2 data servers using .NET. The first I'll mention is [UniObjects] for .NET or UO.NET. UO.NET is a MultiValue API for use in the .NET applications. It's very familiar within -1- the constructs and was introduced to help Basic programmers get to .NET very easily. And one good thing is it's simple to get. UO .NET is available on the client CD that comes with either UniVerse or UniData. The second option is, the long name is IBM database add-ins for Visual Studio or as we refer to it as IBM .NET.
    [Show full text]
  • Introduction to Managing Mobile Devices Using Linux on System Z
    Introduction to Managing Mobile Devices using Linux on System z SHARE Pittsburgh – Session 15692 Romney White ([email protected]) System z Architecture and Technology © 2014 IBM Corporation Mobile devices are 80% of devices sold to access the Internet Worldwide Shipment of Internet Access Devices 2013 2017 PC (Desktop & Notebook) PC (Ultrabook) Tablet Phone Worldwide Devices Shipments by Segment (Thousands of Units) Device Type 2012 2013 2014 2017 PC (Desk-Based and Notebook) 341,263 315,229 302,315 271,612 PC (Ultrabooks) 9,822 23,592 38,687 96,350 Tablet 116,113 197,202 265,731 467,951 Mobile Phone 1,746,176 1,875,774 1,949,722 2,128,871 Total 2,213,373 2,411,796 2,556,455 2,964,783 2 Source: Gartner (April 2013) © 2014 IBM Corporation Mobile Internet users will surpass PC internet users by 2015 The number of people accessing the Internet from smartphones, tablets and other mobile devices will surpass the number of users connecting from a home or office computer by 2015, according to a September 2013 study by market analyst firm IDC. PC is the new Legacy! 3 © 2014 IBM Corporation Five mobile trends with significant implications for the enterprise Mobile enables the Mobile is primary Internet of Things Mobile is primary 91% of mobile users keep Global Machine-to-machine 91% of mobile users keep their device within arm’s connections will increase their device within arm’s reach 100% of the time from 2 billion in 2011 to 18 reach 100% of the time billion at the end of 2022 Mobile must create a continuous brand Insights from mobile experience
    [Show full text]
  • System Tools Reference Manual for Filenet Image Services
    IBM FileNet Image Services Version 4.2 System Tools Reference Manual SC19-3326-00 IBM FileNet Image Services Version 4.2 System Tools Reference Manual SC19-3326-00 Note Before using this information and the product it supports, read the information in “Notices” on page 1439. This edition applies to version 4.2 of IBM FileNet Image Services (product number 5724-R95) and to all subsequent releases and modifications until otherwise indicated in new editions. © Copyright IBM Corporation 1984, 2019. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents About this manual 17 Manual Organization 18 Document revision history 18 What to Read First 19 Related Documents 19 Accessing IBM FileNet Documentation 20 IBM FileNet Education 20 Feedback 20 Documentation feedback 20 Product consumability feedback 21 Introduction 22 Tools Overview 22 Subsection Descriptions 35 Description 35 Use 35 Syntax 35 Flags and Options 35 Commands 35 Examples or Sample Output 36 Checklist 36 Procedure 36 May 2011 FileNet Image Services System Tools Reference Manual, Version 4.2 5 Contents Related Topics 36 Running Image Services Tools Remotely 37 How an Image Services Server can hang 37 Best Practices 37 Why an intermediate server works 38 Cross Reference 39 Backup Preparation and Analysis 39 Batches 39 Cache 40 Configuration 41 Core Files 41 Databases 42 Data Dictionary 43 Document Committal 43 Document Deletion 43 Document Services 44 Document Retrieval 44 Enterprise Backup/Restore (EBR)
    [Show full text]