Stream data processing made fast and easy
Cloudera & Big Data Ecosystem 2018
Miguel Gaspar Solutions Architect, Data Analytics and IoT at Hitachi Vantara
© Hitachi Vantara Corporation 2018. All Rights Reserved Advanced X as a Data Analytics Service Scientists OT/IT Software Expertise
Hitachi DataHitachi VantaraHitachi Insight Systems Pentaho Group
Solutions Infrastructure and Services Machine Partner Vertical Learning Ecosystem Expertise
© Hitachi Vantara Corporation 2018. All Rights Reserved Hitachi’s Unique Value
Only Hitachi Vantara combines CITY CLOUD deep IT, OT and domain expertise INFRASTRUCTURE
INDUSTRIAL OT IoT IT ANALYTICS IS IN 107 OUR 58 APPLICATIONS BUSINESS YEARS DNA YEARS
CONSUMER BIG DATA
© Hitachi Vantara Corporation 2018. All Rights Reserved Top 100 on Fortune’s Global 500
▪ Streaming ▪ IoT Platform ▪ Content platforms ▪ COE & Best practices ▪ Hitachi Consulting ▪ Smart Cities, Industry and Energy
▪ Hitachi Labs ▪ Deep bench of data scientists ▪ Vertical and domain expertise © Hitachi Vantara Corporation 2018. All Rights Reserved “Digital”, how hard and fast can it be?
Companies are spending $B to take the most out of digital
Digital transformation becomes mandatory and a priority
Digital disruptors or predator are changing the rules of businesses
© Hitachi Vantara Corporation 2018. All Rights Reserved Big data + cloud = exponential change
Business are Technology is thinking linearly changing nonlinearly
Source: Move Your Big Data Into The Cloud Forrester report © Hitachi Vantara Corporation 2018. All Rights Reserved A Spectrum of Big Data Use Cases What the Market is Deploying today and Planning for Tomorrow
Next Generation Use Cases
360 Degree Monetize My View Data
Transform Harnessing Machine & Big Data Sensor Data Streamlined Predictive Data Analytics Refinery Filling the Data Lake Internal Big Data Data as a On-Demand Warehouse Business Impact Business Service Big Data Optimization Blending
Big Data
Exploration Optimize
Use Case Complexity Entry Advanced
© Hitachi Vantara Corporation 2018. All Rights Reserved Big data + cloud = exponential change
Flexibility, agility, and speed
Source: Move Your Big Data Into The Cloud Forrester report © Hitachi Vantara Corporation 2018. All Rights Reserved But Big Data is HARD ?
"Through 2018, 70% of Hadoop ① New skills necessary deployments will not meet cost ② High effort and risk savings and revenue generation objectives due to skills and ③ Continuous change integration challenges.” -- Gartner1
1) Gartner Analyst, Nick Heudecker; infoworld.com, Sept 2015
© Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho to removing tactical barriers and delivering strategic business value Those use cases are transformative – what do they have in common?
An end-to-end approach to driving big data value
Data integration that is intuitive and powerful
Flexible onboarding that A platform that drives accelerates time to value real business insights
Solutions: People + Technology + Flexibility
© Hitachi Vantara Corporation 2018. All Rights Reserved What Does Value Look Like?
BT reduces cyber FINRA detected Caterpillar saves threat detection time $96.2 million in $750k per ship with from weeks to fraud. predictive seconds. maintenance.
© Hitachi Vantara Corporation 2018. All Rights Reserved A bit of history and releases...
© Hitachi Vantara Corporation 2018. All Rights Reserved … 5 years ago
▪ More or less 4 to 5 years ago, ‒ Pentaho Data Integration was just a regular ETL tool, it was probably the best, even so, it was a traditional ETL tool
▪ Architect + Pentaho Founder: ‒ How about to integrate Pentaho on Hadoop
© Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho 6.0
▪ September 29, 2015, New York City (Strata Conference + Hadoop World) Pentaho 6.0 was announced
▪ Among many of the announced feature: ‒ Data shaping on the fly with visual data transformations over Map Reduce ‒ So by definition we’ve integrated Pentaho with Hadoop and Big Data technologies ‒ Indirectly Maybe by accident we’ve made development on Big Data easier and faster
© Hitachi Vantara Corporation 2018. All Rights Reserved Data integration that is intuitive and powerful
Pentaho MapReduce – From visual design to powerful native Hadoop execution
Data Source Data Target
Read Transform Write
Read Transform Write Hadoop
Read Transform Write Distributed Cache Distributed
Complex transform logic developed and deployed much, much faster, extracting value much, much faster Rapid time to value and high performance when your business needs to scale and be more agile © Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho 7.0
▪ Mid-November 2016, Pentaho 7.0 was released
▪ Among other features and improvements ‒ Coordinate and schedule Spark applications that use a wider variety of libraries, including ‒ Spark Streaming and ‒ and Spark SQL, ‒ as well as SparkML and Spark MLlib for machine learning ‒ Support Spark 1.6 ‒ But, was only Spark Submit in Pentaho 7.0
© Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho 7.1
▪ Mid 2017 Pentaho 7.1 was released
▪ Adaptive Execution on Big Data, with adaptive execution on Spark in a visual environment ‒ No need to know Java or other programming language for Spark, development skills – with Pentaho you only need to design your logic once, regardless of execution engine. ‒ Support Spark 2.1+ ‒ Supported: Cloudera CDH 5.9+ ‒ Kerberos Impersonation for AEL-Spark
© Hitachi Vantara Corporation 2018. All Rights Reserved Powerful and Intuitive Data Integration
Adaptive Execution Layer - Spark
Build Once, Execute on Any Engine Rapid time to value and high performance when your business needs to scale © Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho 8.0
▪ November 2017 was released Pentaho 8.0 ‒ Worker nodes for better scalability ‒ Stream Processing with Spark: With its adaptive execution capabilities, Pentaho 8.0 fully enables real-time data ingestion from Kafka using Spark Streaming ‒ Connect to Kafka streams: Pentaho 8.0 enables real-time processing, monitoring and aggregation with specialized steps that connect Pentaho Data Integration to Kafka ‒ Big Data Security with Knox: Building on existing enterprise-level security for major Hadoop distribution ‒ Native support for Avro and Parquet
© Hitachi Vantara Corporation 2018. All Rights Reserved NEW Stream Data Processing with Pentaho
▪ Ingest and produce data from/to Kafka using the NEW steps
▪ Process micro-batch chunks of data using either a time-based or a message size-based window
▪ Switch processing engines between Spark (Streaming) or Native Kettle
▪ Hardened stream processing libraries and steps to process data from traditional message queues
• Benefits: ‒ Lower the bar to build streaming applications
‒ Enable combining batch and stream data processing
© Hitachi Vantara Corporation 2018. All Rights Reserved Combined Data Processing Using Pentaho
DATA SOURCES
HADOOP/SPARK CLUSTER
IoT Data Micro Services Kafka RT Data Batch Data Cluster Processors Processors Hadoop MR Web Clickstream and Other Logs Data Data Analytical Pentaho Collector Publishe Databases Analytic Data Store r s HDFS Pentaho DI Pentaho DI Traditional DB/DW PDI collects data from PDI can retrieve processed and NoSQL sources including or blended data from Datastores Kafka Clusters Hadoop/ Spark and publish Kafka to Kafka clusters or external Pentaho DI Cluster PDI can process streaming data using databases Spark & Spark Streaming or Kettle engine in a completely visual way Traditional Message Bus Ingest Process Publish Reporting
© Hitachi Vantara Corporation 2018. All Rights Reserved NEW Stream Data Processing with Pentaho
© Hitachi Vantara Corporation 2018. All Rights Reserved Traditionally …
Data Engineering Data Prep Analytics
Data Discovery Analysis Ingestion Processing Blending Data Delivery / Analysis & Dashboards
© Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho vision …
Data Engineering Data Prep Analytics
Data Discovery Analysis Ingestion Processing Blending Data Delivery / Analysis & Dashboards
Lifecycle Data Dynamic Data Administration Security Management Provenance Pipeline Monitoring Automation
© Hitachi Vantara Corporation 2018. All Rights Reserved © Hitachi Vantara Corporation 2018. All Rights Reserved Thank You
© Hitachi Vantara Corporation 2018. All Rights Reserved © Hitachi Vantara Corporation 2018. All Rights Reserved