Building Big Data and Analytics Solutions in Cloud
Total Page:16
File Type:pdf, Size:1020Kb
Front cover Building Big Data and Analytics Solutions in the Cloud Characteristics of big data and key technical challenges in taking advantage of it Impact of big data on cloud computing and implications on data centers Implementation patterns that solve the most common big data use cases Wei-Dong Zhu Manav Gupta Ven Kumar Sujatha Perepa Arvind Sathi Craig Statchuk ibm.com/redbooks Redpaper International Technical Support Organization Building Big Data and Analytics Solutions in the Cloud December 2014 REDP-5085-00 Note: Before using this information and the product it supports, read the information in “Notices” on page v. First Edition (December 2014) This edition applies to: IBM InfoSphere® BigInsights™ IBM PureData™ System for Hadoop IBM PureData System for Analytics IBM PureData System for Operational Analytics IBM InfoSphere Warehouse IBM InfoSphere Streams IBM InfoSphere Data Explorer (Watson™ Explorer) IBM InfoSphere Data Architect IBM InfoSphere Information Analyzer IBM InfoSphere Information Server IBM InfoSphere Information Server for Data Quality IBM InfoSphere Master Data Management Family IBM InfoSphere Optim™ Family IBM InfoSphere Guardium® Family © Copyright International Business Machines Corporation 2014. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Notices . .v Trademarks . vi Executive summary . vii Authors. viii Now you can become a published author, too! . ix Comments welcome. ix Stay connected to IBM Redbooks . .x Chapter 1. Big data and analytics overview . 1 1.1 Examples of big data and analytics. 3 1.2 IBM Big Data and Analytics platform. 4 1.3 IBM Cloud Computing Reference Architecture . 8 1.4 Big data and analytics on the cloud: Complementary technologies . 9 1.5 A large difference . 10 1.6 Implications for the cloud . 11 1.7 Big data and analytics in the cloud: Bringing the two together . 12 Chapter 2. Big data and analytics business drivers and technical challenges. 15 2.1 Big data business drivers . 16 2.2 Technical challenges. 18 Chapter 3. Big data and analytics taxonomy . 21 Chapter 4. Big data and analytics architecture . 25 4.1 Maturity models. 26 4.2 Functional architecture . 27 4.3 Big data and analytics infrastructure architecture . 31 Chapter 5. Key big data and analytics use cases . 33 5.1 Use case details . 35 5.1.1 Big data and analytics exploration . 35 5.1.2 Enhanced 360-degree view of the customer. 36 5.1.3 Operations analysis . 38 5.1.4 Data warehouse augmentation . 39 5.1.5 Security intelligence (security, fraud, and risk analysis) . 41 Chapter 6. Big data and analytics patterns. 43 6.1 Zones . 44 6.1.1 Data refinement value. 45 6.1.2 Normalization . 46 6.1.3 Maximizing the number of correlations . 46 6.1.4 Quality, relevance, and flexibility. 46 6.1.5 Solution pattern descriptions. 47 6.1.6 Component patterns . 55 Chapter 7. Picking a solution pattern . 69 7.1 Data ingestion . 70 7.2 Data preparation and data alignment . 71 7.3 Data streaming . 72 © Copyright IBM Corp. 2014. All rights reserved. iii 7.4 Search-driven analysis . 73 7.5 Dimensional analysis . 74 7.6 High-volume analysis . 75 Chapter 8. NoSQL technology . 77 8.1 Definition and history of NoSQL . 78 8.1.1 The CAP theorem . 79 8.2 Types of NoSQL databases . 80 8.2.1 Key-Value stores. 81 8.2.2 Document databases . 81 8.2.3 Columnar and column-family databases. 81 8.2.4 Graph databases . 82 8.2.5 Hadoop and the Hadoop Distributed File System. 82 8.2.6 Choosing the right distributed data store . 83 Chapter 9. Big data and analytics implications for the data center . 85 9.1 The hybrid data center . 86 9.2 Big data and analytics service ecosystem on cloud . 86 9.3 IBM SoftLayer for big data and analytics . 89 Chapter 10. Big data and analytics on the cloud: Architectural factors and guiding principles. 91 10.1 Architectural principles . 92 10.1.1 Security . 92 10.1.2 Speed . 92 10.1.3 Scalability . 92 10.1.4 Reliability and Availability . 93 10.2 Architectural decisions . 93 10.2.1 Performance . 93 10.2.2 Scalability . 94 10.2.3 Financial . 95 10.2.4 Reliability. ..