Pentaho Data Integration's Key Component: the Adaptive Big Data

Pentaho Data Integration’s Key Component: The Adaptive Big Data Layer Driving the Flexibility and Power of Pentaho Data Integration (PDI) to Future-Proof You Against Big Data Technology Advancements The Not-So-Hidden Intricacies of Transforming Big Data into Business Insights As any IT or developer department knows, there are growing demands to turn data into business insights for multiple business departments, all with heterogeneous data sources. In order to fulfill these demands with high-impact results, it is critical that data projects are executed in a timely and accurate fashion. With the multitude of ever-changing big data sources, such as Hadoop, building analytic solutions that can blend data from disparate sources is a daunting task. Further, many companies do not have the highly skilled resources for difficult-to-architect big data projects. The Adaptive Big Data Layer: A Protective Buffer Between Technical Teams and Big Data Complexity PDI’s Adaptive Big Data Layer (ABDL) gives Pentaho users the ability to work with any big data source, providing a fully functioning buffer to insulate IT analysts and developers from data complexity. Created with common data integration challenges in mind, the tool enables technical teams to operationalize existing and emerging big data implementations in an agile manner. The ABDL accounts for the specific and unique intricacies of each data source to provide flexible, fast, and accurate access to all data sources. TRANSFORMATION MONITORING JOB Embedded Pentaho Data Integration Step Plugins Adaptive Big Data Layer Perspective Plugins Data Sort MongoDB Hadoop and More Schedule Integration and More Cloudera Amazon EMR Hortonworks MapR and More Copyright ©2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at pentaho.com. How it Works As a core component within Pentaho Data Integration, the ABDL insulates the data developer from the shifting sands of big data and enables “create once, run anywhere” transformation processes to work against any big data store (Hadoop, analytic database, NoSQL store, and others) allowing immediate communication with minimal effort. Enriching and blending data across big data sources with the assistance of PDI’s ABDL becomes a simplified process of drag and drop. Benefits of ABDL ABDL gives IT/Developers access to the broadest, A Pentaho MapReduce Case Study deepest, and easiest to use big data integration in Major Financial Institution the industry, allowing for: > Insulation from changing big data sources: versions, vendors, data stores Business Challenge > Reduced risk from proven and tested PDI > Gain competitive advantage through intraday transformations balance reporting for commercial customers > Rapid time to value with real-time updated > Technical proof of concept transformation options based on vendor updates Job 1: Extract, clean/manipulate, load in > Native integration into the big data ecosystem, Hadoop HDFS (10 lookups to HDFS) whether it’s an analytic database (HP Vertica, Job 2: Load proceeded results into RDBMS Amazon Redshift), Hadoop distribution (Amazon EMR, Cloudera, Hortonworks, MapR), NoSQL (MongoDB, Cassandra), or specialized technology (Splunk, Spark) Java Pentaho > Simplified blending of data across multiple big MapReduce MapReduce data sources to provide the richest insights 2 Java developers 1 ETL developer > Unparalleled flexibility to work with any big data Staff source or combination of sources from major SI (50% the resource) Proven ROI From Pentaho Customers 1+ month Time 2 days (15x faster) The Adaptive Big Data Layer has achieved significant returns with: Job 1: 5 min. 43 sec Job 1: 20 sec > Faster development: Drag-and-drop ETL through Pentaho MapReduce is up to 15X faster than Job 2: 7 min. 45 sec Job 2: 27 sec manual coding, meaning major time and cost savings on development 13 min 45 sec Total 47 sec (17x faster) > Faster execution: Run queries up to 40x faster resulting in lower project turnaround time Incorrect data due Results Accurate Results > Minimal Training: Easy-to-use ETL tools allow to coding errors users to get up to speed quickly, with little retraining required, equating to less staff overhead > Deployment flexibility: Simple movement and integration of Big Data deployed on-premise or in the cloud To learn more about Pentaho software and services, contact Pentaho at pentaho.com/contact or +1 (866) 660-7555 Be social with Pentaho: Copyright ©2015 Pentaho Corporation. All rights reserved. 015-529.

Pentaho Data Integration's Key Component: the Adaptive Big Data

The Pentaho Big Data Guide This Document Supports Pentaho Business Analytics Suite 4.8 GA and Pentaho Data Integration 4.4 GA, Documentation Revision October 31, 2012

Star Schema Modeling with Pentaho Data Integration

A Plan for an Early Childhood Integrated Data System in Oklahoma

Base Handbook Copyright

Open Source ETL on the Mainframe

Chapter 6 Reports Copyright

Pentaho Machine Learning Orchestration

Pentaho MAPR510 SHIM 7.1.0.0 Open Source Software Packages

Upgrading to Pentaho Business Analytics 4.8 This Document Is Copyright © 2012 Pentaho Corporation

Pentaho and Jaspersoft: a Comparative Study of Business Intelligence Open Source Tools Processing Big Data to Evaluate Performances

Deliver Performance and Scalability with Hitachi Vantara's Pentaho

The HPCC Cluster Computing Paradigm and an Efficient Data-Centric Programming Language Are Key Factors in Our Company's Success