Why Cloudera: The Platform for Production Success


© Cloudera, Inc. All rights reserved.

Why Cloudera?

We deliver long-term production success with enterprise Hadoop.

• Open Source Innovation: No one knows Hadoop better than Cloudera. Cloudera leads development of enterprise Hadoop and offers the best support, training, and services.
• Enterprise Security: Meet compliance requirements and reduce risk exposure from storing sensitive data.
• Data Governance: Enable compliance and maximize analyst productivity.
• Powerful Enterprise Tools: Cloudera extends open source Hadoop with capabilities required by the largest enterprises.
• Complete Management: Deliver optimum system utilization and meet SLA commitments, on-premises or in the cloud, with minimum effort.
• Ecosystem: Cloudera partners with industry leaders to ensure Hadoop works with the platforms, tools, and integrators our customers rely on.

Our Platform

Cloudera is Built for Production Success

A modern data platform plus what the enterprise requires.

Hadoop delivers:
• One place for unlimited data
• Unified, multi-framework data access

Cloudera delivers:
• Enterprise Security
• Data Governance
• Complete Management
• And more…

[Diagram: Process, Discover, Model, Serve; Security and Administration; Unlimited Storage. Deployment Flexibility: On-Premises, Public Cloud, Private Cloud, Hybrid Cloud, Appliances, Engineered Systems.]

Industrial Multi-Workload Performance

Multiple big data opportunities in one optimized, high-performance, multi-tenant platform.

Batch, Interactive, and Real-Time. Leading performance and usability in one platform.

• End-to-end analytic workflows
• Access more data
• Work with data in new ways
• Enable new users

Platform components:
• Process: Ingest (Sqoop, Flume); Transform (MapReduce, Hive, Pig, Spark)
• Discover: Analytic Database (Impala); Search (Solr)
• Model: Machine Learning (SAS, R, Spark, Mahout)
• Serve: NoSQL Database (HBase); Streaming (Spark Streaming)
• Security and Administration: YARN, Cloudera Manager, Cloudera Navigator
• Unlimited Storage: HDFS, HBase

Latest SQL Performance

[Chart: Single User vs 10 User Response Time & Impala Times Faster. Y-axis: time in seconds (lower bars = better). Systems compared: Impala, Spark SQL, Presto, Hive-on-Tez. Speedup labels on the chart range from 5.0x to 27.4x.]

Independent validation by IBM Research SQL-on-Hadoop VLDB paper: “Impala’s database architecture provides significant performance gains”

Hadoop Security is Different

Hadoop Benefit: A single platform for all the data
Security Side Effect: Combining data and audiences that used to be securely silo’d

Hadoop Benefit: A rich, flexible ecosystem of tools & utilities
Security Side Effect: Security method proliferation can increase costs/introduce coverage gaps

Hadoop Benefit: Ingest data of any type
Security Side Effect: Sensitive fields added without review

Hadoop Benefit: Active Archive provides lower cost storage than legacy systems
Security Side Effect: Lose the built-in compliance controls that legacy systems provided

The Only Comprehensively Secure Hadoop Platform

Meet compliance requirements and reduce risk exposure from storing sensitive data.

1. Perimeter: Standards-based Authentication
2. Access: Unified Role-based Authorization
3. Visibility: Auditing & Governance
4. Data: Encryption & Key Management

Cloudera is the leader in Hadoop security.

Unique Capabilities:
• Comprehensive and Unified
• Secure at the core
• No Performance Impact
• Jointly engineered with Intel
• Compliance-Ready
• Only distribution to pass PCI audit

The Only Hadoop Data Governance Solution

Enable compliance and maximize analyst productivity.

Cloudera Navigator: Minimize risk and maintain compliance with the only native end-to-end data governance solution for Apache Hadoop.

Unique Capabilities:
• Auditing
• Lineage
• Metadata Tagging and Discovery
• Lifecycle Management

MasterCard

Cloudera: The first PCI-Certified Hadoop Platform

Challenge: All applications, databases, or file systems that have the potential to handle personal account-related data must undergo full PCI certification

Solution: MasterCard’s Cloudera environment fully conforms to the PCI-DSS V 2.0 security standards so it can host PCI datasets and potentially integrate with other internal systems

“Data privacy and protection is a top priority for MasterCard. As we maximize the most advanced technologies from partners and vendors, they must meet the rigorous security standards we’ve set. With Cloudera’s commitment to the same standards, we now have additional options in how we manage our data center.”
Gary VonderHaar, Chief Technology Officer, Architecture, MasterCard

Security and Governance

Cloudera: Unified, Compliance-Ready, Transparent
Hortonworks: Fragmented, Incomplete, Complex

Perimeter (protecting access to the cluster)
• Cloudera ● Kerberos with Cloudera Manager: Automated, industry-standard authentication integrated with existing systems
• Hortonworks ◐ Kerberos: Manual configuration and integration

Access (securing access to data)
• Cloudera ● Apache Sentry: Working within the community to deliver centralized, granular RBAC across frameworks
• Hortonworks ◐ Hive ATZ-NG, Ranger: RBAC configuration silos, GUI “Band-Aid”

Visibility (reporting on data access and lineage)
• Cloudera ● Cloudera Navigator: Transparent end-to-end data and metadata visibility
• Hortonworks ◔ Apache Falcon, Knox, Ranger: Manual and limited auditing through a single workflow framework, and multiple tools

Data (protecting data at rest or in transmission)
• Cloudera ● Cloudera Navigator: Transparent, comprehensive, high-performance, compliance-ready encryption and key management
• Hortonworks ○ N/A

The Only Complete Hadoop Management Suite

Deliver optimum system utilization and meet SLA commitments.

Cloudera Manager: Focus on the solution, not the cluster, with the only complete, zero-downtime administration tool for Apache Hadoop.

Unique Capabilities:
• Unified configuration, management and monitoring across all services
• Online installation and upgrades
• Direct connection to Cloudera Support
• 3rd Party Extensibility

Cloudera Manager vs. Ambari

Cloudera: Unified, Directed, Streamlined
Ambari: Federated, Chaotic, Disjointed

Manage (deploying and configuring services)
• Cloudera ● Parcels and Workflows: Holistic, service-oriented components enable streamlined, comprehensive, and straightforward operations
• Ambari ◐ YUM and Shell Commands: Manual configuration and time-consuming, error-prone integration

Monitor (system health and QoS and SLA notification)
• Cloudera ● Integrated Charting and SNMP Alerts: Catalog of chart metrics and visualization with easy-to-build, easy-to-share dashboards and common alerts
• Ambari ◐ Nagios, Ganglia: Manual configuration, limited native visualization, and manual integration of separate, disparate systems and services

Diagnose (root cause discovery, analysis, and solution)
• Cloudera ● Time Control and Log Collection: Centralized log aggregation of all services with integrated faceted search and visual timeframe controls
• Ambari ◔ SSH/SCP to /var/log: Manual log collection via CLI tools from diverse locations with limited, service-specific search and no historical views

Integrate (extending security policies, adding 3rd party services)
• Cloudera ● Enterprise Kerberos Integration: Automated, industry-standard authentication with integration to existing enterprise systems
• Ambari ◐ Kerberos: Assisted CLI configuration, manual deployment, and limited integration

The Only Portable Cloud Experience for Hadoop

Maximize flexibility in Hadoop deployment architectures.

Cloudera Director: The first portable, self-service solution for deploying and managing enterprise-grade Hadoop in the Cloud.

Unique Capabilities:
• Dynamic cluster lifecycle management
• Cloud blueprints
• Multi-cluster health visibility
• Usage reporting for billing models

Our Approach

Focusing on Open Standards, not just Open Source

Open Standards are just as important as Open Source.

Why does it matter?
• Diverse engineering is more sustainable.
• Broad support ensures vendor portability.
• Project utility depends on ecosystem compatibility, which depends on standards.

Cloudera leads in defining the de facto open standards adopted by the market.

Vendor Support

Component (Founder)          Cloudera  Pivotal  MapR  Amazon  IBM  Hortonworks
Impala (Cloudera)            ✔         ✖        ✔     ✔       ✖    ✖
Spark (UC Berkeley)          ✔         ✔        ✔     ✔       ✔    ✔
Hue (Cloudera)               ✔         ✔        ✔     ✔       ✖    ✔
Sentry (Cloudera)            ✔         ✔        ✔     ✖       ✔    ✖
Flume (Cloudera)             ✔         ✔        ✔     ✖       ✔    ✔
Parquet (Cloudera/Twitter)   ✔         ✔        ✔     ✔       ✔    ✖
Sqoop (Cloudera)             ✔         ✔        ✔     ✔       ✔    ✔
Falcon (Hortonworks)         ✖         ✖        ✖     ✖       ✖    ✔
Knox (Hortonworks)           ✖         ✖        ✖     ✖       ✔    ✔
Tez (Hortonworks)            ✖         ✖        ✔     ✖       ✖    ✔
Ranger (Hortonworks)         ✖         ✖        ✖     ✖       ✖    ✔
ORCfile (Hortonworks)        ✖         ✖        ✖     ✖       ✖    ✔

Sustainable Innovation

A Hybrid Open Source Model combining the power of open source with the enterprise capabilities customers need.

Open Platform: 100% Open Source & Open Standards

• Deep open source commitment
• 2/3 of engineering on open source
• 19 Hadoop ecosystem projects founded
• 90 ASF committer seats, 67 PMC seats
• Enterprise-ready extensions
• Security, governance, and system management
• Comprehensive partner