IBM Analytics

Hadoop in the cloud

Leverage big data analytics easily and cost-effectively with IBM BigInsights for Hadoop in the cloud

1 Introduction
2 Cloud and analytics: The new growth engine for business
3 Enhancing Hadoop in the cloud with BigInsights
4 IBM Watson Foundations: Complete cloud analytics capabilities
5 Resources

Introduction

One of the hottest technologies in the big data space is Apache Hadoop, an open source software framework used to reliably manage large volumes of data. Designed to scale from a single server to thousands of machines with a high degree of fault tolerance, Hadoop enables organizations to extract valuable insight from large volumes of structured, unstructured and semi-structured data.

The need for large upfront investments and concerns about flexibility, coupled with special challenges involved in evaluating the technology and developing Hadoop skills, often prevent organizations from adopting and deploying Hadoop across the enterprise. It also becomes impractical to use Hadoop on an occasional basis for high-impact projects that do not have a need for continuous processing.

There is good news, though: you can overcome these capital requirements barriers through cloud computing. The cloud model of paying only for the resources that you need, and only when you need them, supports experimentation and evaluation and is ideal for building up skills. It is also a great solution for short-term or occasional-use projects where an investment in a dedicated cluster is cost prohibitive.

What’s so cool about cloud?

Cloud computing is gaining momentum for good reason. Some consider it for overall IT cost savings. Others are drawn to the promise of reduced capital expense. Still others are looking to solve pressing issues such as a chronic shortage of space or overly long cycles for provisioning resources (see Figure 1).

Cloud computing is the delivery of on-demand computing resources (everything from applications to data centers) over the Internet. At its root, this “as-a-service” concept is simple: users can focus on their business needs and not have to worry about maintaining, improving and caring for a complex IT system. Mutually beneficial for business and IT, the cloud delivers:

• Elastic resources for quickly scaling up or down to meet spikes and lulls in demand
• Multiple payment options, from pay-as-you-go to hourly or monthly licenses
• Self-service access to technology resources
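The pay-only-for-what-you-use argument can be made concrete with a little arithmetic. The following is a minimal sketch, not IBM pricing: the node count, hourly rate and cluster purchase price are hypothetical figures chosen only for illustration, and the comparison deliberately ignores operating costs such as power, space and staff.

```python
def on_demand_cost(hours_used: float, nodes: int, hourly_rate_per_node: float) -> float:
    """Cost of renting `nodes` cloud nodes only for the hours actually used."""
    return hours_used * nodes * hourly_rate_per_node


def break_even_hours(cluster_capex: float, nodes: int, hourly_rate_per_node: float) -> float:
    """Hours of use at which renting would cost as much as buying the cluster."""
    return cluster_capex / (nodes * hourly_rate_per_node)


# Hypothetical figures (assumptions, not taken from any price list).
NODES = 10
HOURLY_RATE = 0.50        # currency units per node-hour
CLUSTER_CAPEX = 100_000   # upfront price of an equivalent dedicated cluster

occasional_project = on_demand_cost(hours_used=200, nodes=NODES,
                                    hourly_rate_per_node=HOURLY_RATE)
threshold = break_even_hours(CLUSTER_CAPEX, NODES, HOURLY_RATE)

print(f"Occasional 200-hour project on demand: {occasional_project:.2f}")
print(f"Break-even usage versus buying: {threshold:.0f} hours")
```

Under these assumed numbers, a 200-hour project costs a small fraction of the dedicated cluster, and renting only becomes the more expensive option after many thousands of hours of use. Real comparisons would need actual rates and the operating costs left out here.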

This e-book explores how you can use IBM-enhanced Hadoop capabilities in the cloud to cost-effectively deploy deep analytics for all users in your organization, opening up the benefits of big data to everyone.

Figure 1: Drivers of cloud adoption

• We need more space in the data center
• We need to reduce IT costs
• We need to free up highly skilled resources
• We need to support remote teams
• We need to reduce our CAPEX
• We need to improve business agility
• We need to deliver better resiliency
• We need to integrate web-based data

Cloud and analytics: The new growth engine for business

In the view of IBM, cloud is more than just a way to manage costs or get services faster; it is a critical path to business growth. Cloud and big data analytics offer the potential to place information, insights and decision making at people’s fingertips, at the right time and place.

Because cloud computing offers access to virtually unlimited computing power and easier ways to process large amounts of data, it’s an ideal deployment model for big data analytics. IBM has combined cloud computing and big data analytics in a wide range of industries, resulting in benefits such as:

• The discovery of life-changing medicines
• More accurate prediction of weather patterns
• Innovative energy-saving techniques
• Insight into security anomalies
• More efficient ways to use and conserve water
• Deeper insight into customer preferences and trends
• Real-time feedback on marketing campaigns

The shift toward cloud computing is also a response to the realization that big data and analytics must take a more central role in today’s business world, becoming an engine that helps drive the business forward. Organizations need to transition from passive, siloed “systems of record” designed around discrete pieces of information to “systems of engagement,” which are more decentralized, incorporate technologies that encourage peer interactions and often leverage cloud technologies to enable those interactions.

The differences between the two systems, and the ramifications of those differences, are significant. For example, in order to integrate systems and support enhanced collaboration (a central tenet of systems of engagement), a company needs to deploy appropriate platform technologies. Formats include Software as a Service (SaaS), Platform as a Service (PaaS) or Infrastructure as a Service (IaaS) offerings, and can be deployed on the public cloud, a private cloud or a hybrid model.

Enhancing Hadoop in the cloud with IBM BigInsights

Cloud computing enables you to overcome the capital requirements of Hadoop. But open source Hadoop lacks enterprise-grade management technology and performance and may require organizations to learn or acquire new skills. Organizations can overcome these barriers by using IBM® BigInsights™ for Apache Hadoop, an enterprise-ready distribution of Hadoop. In addition to the Hadoop technology, BigInsights delivers unique value that is designed to address the challenges of modern enterprise IT:

• Extended Hadoop: BigInsights is based on 100 percent open source Hadoop. It extends Hadoop with enterprise-grade technology including administration and integration capabilities, visualization and discovery tools as well as security, audit history and performance management.
• Increased performance: An average 4 times performance gain over open source Hadoop.¹
• Usability: BigInsights is optimized for a wide range of roles, including integration developers, administrators, data scientists, analysts and line-of-business contacts.
• Integrated with IBM Watson™ Foundations big data platform: BigInsights comes bundled with search and streaming analytics capabilities.
• Analytics: Built-in Hadoop analytics capabilities for machine data, social data, text and Big R enable you to locate actionable insights from data in the Hadoop cluster rather than having to move the data around.

This combination of open source Hadoop and the value-add enterprise features from IBM can be deployed on the cloud. IBM provides pre-built images and/or templates for rapid deployment of Hadoop clusters in the cloud, and multi-cloud templates for BigInsights are also available through RightScale. Deploying a BigInsights cluster on the cloud also frees users from sourcing (and paying for) extra equipment and racks.

The many roles of analytics in the cloud

The IaaS and PaaS options will be used by developers, but business leaders will see a significant impact too. For example, clients deploying BigInsights on the cloud have reduced hardware and software costs, avoided future expansion costs, simplified development and management processes, and realized dramatic performance improvement. Check out the table for more examples of how cloud analytics are delivering real-world benefits.

Industry: Retail
Cloud analytics use case: Support growing data volumes and analyze customer data in more efficient ways to acquire deeper, more valuable insights.
Benefits: Reduced costs and improved marketing effectiveness.

Industry: Healthcare
Cloud analytics use case: Analyze millions of patient records and perform statistically guided decision support to lower diagnostic errors and improve the quality of care.
Benefits: Improved outcomes and better insights into treatment trends and relationships.

Industry: Energy and utilities
Cloud analytics use case: Implement proactive resource optimization and allocation, perform asset management and maintenance optimization.
Benefits: More efficient resource utilization and potentially less downtime or capacity shortfalls.

Industry: Banking
Cloud analytics use case: Identify customer patterns from log data to improve customer insight and provide better-targeted offers and services. Create a one-stop shop for data discovery.
Benefits: Increased up-sell and cross-sell opportunities; more granular customer demographic segmentation.

IBM Watson Foundations: Complete cloud analytics capabilities

BigInsights draws additional strength from its lineage as part of the IBM Watson Foundations big data portfolio. IBM Watson Foundations delivers a full range of capabilities to help you meet your big data and analytics goals:

• Real-time analytics: Dynamically update business rules and processes based on what’s happening right now. Analyze data in motion for real-time insights.
• Real-time application development: Quickly ingest, analyze and correlate data as it arrives from thousands of real-time sources. Easily build applications with drag-and-drop operators, visual editors and performance monitoring. Dynamically add new data sources. Create, edit, visualize, test, debug and run applications in the cloud.
• Analytic toolkits and accelerators: Deploy deep analytics developed by IBM Research, such as geospatial, time series, R analysis, text analytics and much more.

Performance, scalability and deep analytics make IBM big data solutions exceptional. They deliver:

• More volume: Handles up to 10 times more records per second on the same hardware compared to other open source and complex event processing (CEP) vendors²
• More data variety: Analytics and powerful modeling on any and all data types
• More velocity: One-tenth to one-thousandth the latency compared to other open source and CEP vendors³

IBM Watson Foundations also contributes key big data and analytics capabilities optimized for the cloud (see Figure 2).

Cloud-ready technology
• Manage applications in the cloud

Cloud-ready pricing
• Bring your own application license
• Pay as you go

Risk-free deployment path
• Quick-to-start solutions
• Established developer communities
• Centers of competencies for expert help

Cloud-open
• Solutions ready for any cloud model: private, public or hybrid

Figure 2. Cloud capabilities provided by the IBM Watson Foundations big data platform.

A portfolio of solutions that work together

As part of IBM Watson Foundations, BigInsights works with other IBM big data solutions such as IBM InfoSphere® Streams to deliver exceptional results.

For example, Wimbledon Championships served up an ace performance using InfoSphere Streams and BigInsights, breaking new ground with 433 million page views serving up a total of 155 TB of data, equivalent to over 35 years’ worth of CD-quality audio recording. More detailed and comprehensive analysis of current and past data enabled the tournament organizer to create more interesting and engaging content for the 19.7 million unique users.

The IBM system gathered large volumes of data in real time from on-court sensors and scorers plus social media from off-court analysts and fans around the globe, and then integrated it with other sources of structured and unstructured data for distribution to analytical tools, websites, mobile apps and broadcasters. The real-time analysis of match data revealed winning patterns; analytics were also used to predict demand, enabling the cloud infrastructure to automatically adjust resources.
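The idea of predicting demand so that infrastructure can adjust its resources can be illustrated in a few lines. This is a deliberately naive sketch and not a description of how the IBM system at Wimbledon worked: the moving-average forecast, the function names, the per-node capacity and the traffic figures are all assumptions made for illustration.

```python
import math
from typing import Sequence


def forecast_next(requests_per_min: Sequence[float], window: int = 3) -> float:
    """Naive forecast: the mean of the last `window` observations."""
    if not requests_per_min:
        raise ValueError("at least one observation is required")
    recent = list(requests_per_min)[-window:]
    return sum(recent) / len(recent)


def nodes_needed(forecast: float, capacity_per_node: float, min_nodes: int = 1) -> int:
    """Smallest node count that covers the forecast, never below `min_nodes`."""
    return max(min_nodes, math.ceil(forecast / capacity_per_node))


# Hypothetical traffic history (requests per minute) and per-node capacity.
history = [1200, 1500, 1800, 2400, 3000]
expected = forecast_next(history, window=3)
target = nodes_needed(expected, capacity_per_node=500)

print(f"Forecast: {expected:.0f} requests/min, target cluster size: {target} nodes")
```

A production system would use a far richer demand model and would also decide when to scale back down; the sketch only shows the basic loop of forecasting first and sizing the cluster second.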

BigInsights for the cloud allows organizations to respond faster to changing business environments by analyzing larger volumes of data more cost-effectively. It enables organizations to analyze all data in its native format to add real-world information to decision processes. Organizations can use it to scale to petabytes of data and thousands of users with near-linear processor scalability, all on a reliable and secure platform.

Resources

In this era of big data, you need solutions that allow you to easily and cost-effectively unlock the value of enterprise data. Many analytics solutions leave users frustrated or disappointed because they can’t handle today’s big data volumes, take too long to deploy or require huge new upfront investments.

It’s time for a new approach. Analytics in the cloud with BigInsights allows you to easily and economically tap into the power of Hadoop and big data. To learn more about how you can take advantage of BigInsights and IBM cloud offerings, visit these resources:

• IBM BigInsights for Apache Hadoop overview
• IBM BigInsights Quick Start Edition
• IBM Watson Foundations
• IBM Cloud Computing

© Copyright IBM Corporation 2015

IBM Analytics Route 100 Somers, NY 10589

Produced in the United States of America June 2015

IBM, the IBM logo, ibm.com, BigInsights, IBM Watson, and InfoSphere are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

¹ 4 times is an approximate value. Testing involved the SWIM benchmark (https://github.com/SWIMProjectUCB/SWIM) and jobs derived from production workload traces. Testing was conducted in controlled laboratory conditions. See “STAC Report: Comparison of IBM InfoSphere BigInsights Enterprise Edition with Apache Hadoop using SWIM.” www.stacresearch.com/node/15370

² IBM InfoSphere Streams v3.0 Performance Report. February 2013. https://www14.software.ibm.com/webapp/iwm/web/signup.do?source=sw-infomgt&S_PKG=500012717&S_CMP=is_dwwp14_ppo

³ Ibid.

IMM14153-USEN-02