<<

Stream data processing made fast and easy

Cloudera & Ecosystem 2018

Miguel Gaspar Solutions Architect, Data Analytics and IoT at Hitachi Vantara

© Hitachi Vantara Corporation 2018. All Rights Reserved Advanced X as a Data Analytics Service Scientists OT/IT Software Expertise

Hitachi DataHitachi VantaraHitachi Insight Systems Group

Solutions Infrastructure and Services Machine Partner Vertical Learning Ecosystem Expertise

© Hitachi Vantara Corporation 2018. All Rights Reserved Hitachi’s Unique Value

Only Hitachi Vantara combines CITY CLOUD deep IT, OT and domain expertise INFRASTRUCTURE

INDUSTRIAL OT IoT IT ANALYTICS IS IN 107 OUR 58 APPLICATIONS BUSINESS YEARS DNA YEARS

CONSUMER BIG DATA

© Hitachi Vantara Corporation 2018. All Rights Reserved Top 100 on Fortune’s Global 500

▪ Streaming ▪ IoT Platform ▪ Content platforms ▪ COE & Best practices ▪ Hitachi Consulting ▪ Smart Cities, Industry and Energy

▪ Hitachi Labs ▪ Deep bench of data scientists ▪ Vertical and domain expertise © Hitachi Vantara Corporation 2018. All Rights Reserved “Digital”, how hard and fast can it be?

Companies are spending $B to take the most out of digital

Digital transformation becomes mandatory and a priority

Digital disruptors or predator are changing the rules of businesses

© Hitachi Vantara Corporation 2018. All Rights Reserved Big data + cloud = exponential change

Business are Technology is thinking linearly changing nonlinearly

Source: Move Your Big Data Into The Cloud Forrester report © Hitachi Vantara Corporation 2018. All Rights Reserved A Spectrum of Big Data Use Cases What the Market is Deploying today and Planning for Tomorrow

Next Generation Use Cases

360 Degree Monetize My View Data

Transform Harnessing Machine & Big Data Sensor Data Streamlined Predictive Data Analytics Refinery Filling the Internal Big Data Data as a On-Demand Warehouse Business Impact Business Service Big Data Optimization Blending

Big Data

Exploration Optimize

Use Case Complexity Entry Advanced

© Hitachi Vantara Corporation 2018. All Rights Reserved Big data + cloud = exponential change

Flexibility, agility, and speed

Source: Move Your Big Data Into The Cloud Forrester report © Hitachi Vantara Corporation 2018. All Rights Reserved But Big Data is HARD ?

"Through 2018, 70% of Hadoop ① New skills necessary deployments will not meet cost ② High effort and risk savings and revenue generation objectives due to skills and ③ Continuous change integration challenges.” -- Gartner1

1) Gartner Analyst, Nick Heudecker; infoworld.com, Sept 2015

© Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho to removing tactical barriers and delivering strategic business value Those use cases are transformative – what do they have in common?

An end-to-end approach to driving big data value

Data integration that is intuitive and powerful

Flexible onboarding that A platform that drives accelerates time to value real business insights

Solutions: People + Technology + Flexibility

© Hitachi Vantara Corporation 2018. All Rights Reserved What Does Value Look Like?

BT reduces cyber FINRA detected Caterpillar saves threat detection time $96.2 million in $750k per ship with from weeks to fraud. predictive seconds. maintenance.

© Hitachi Vantara Corporation 2018. All Rights Reserved A bit of history and releases...

© Hitachi Vantara Corporation 2018. All Rights Reserved … 5 years ago

▪ More or less 4 to 5 years ago, ‒ Pentaho was just a regular ETL tool, it was probably the best, even so, it was a traditional ETL tool

▪ Architect + Pentaho Founder: ‒ How about to integrate Pentaho on Hadoop

© Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho 6.0

▪ September 29, 2015, New York City (Strata Conference + Hadoop World) Pentaho 6.0 was announced

▪ Among many of the announced feature: ‒ Data shaping on the fly with visual data transformations over Map Reduce ‒ So by definition we’ve integrated Pentaho with Hadoop and Big Data technologies ‒ Indirectly Maybe by accident we’ve made development on Big Data easier and faster

© Hitachi Vantara Corporation 2018. All Rights Reserved Data integration that is intuitive and powerful

Pentaho MapReduce – From visual design to powerful native Hadoop execution

Data Source Data Target

Read Transform Write

Read Transform Write Hadoop

Read Transform Write Distributed Cache Distributed

Complex transform logic developed and deployed much, much faster, extracting value much, much faster Rapid time to value and high performance when your business needs to scale and be more agile © Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho 7.0

▪ Mid-November 2016, Pentaho 7.0 was released

▪ Among other features and improvements ‒ Coordinate and schedule Spark applications that use a wider variety of libraries, including ‒ Spark Streaming and ‒ and Spark SQL, ‒ as well as SparkML and Spark MLlib for machine learning ‒ Support Spark 1.6 ‒ But, was only Spark Submit in Pentaho 7.0

© Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho 7.1

▪ Mid 2017 Pentaho 7.1 was released

▪ Adaptive Execution on Big Data, with adaptive execution on Spark in a visual environment ‒ No need to know or other programming language for Spark, development skills – with Pentaho you only need to design your logic once, regardless of execution engine. ‒ Support Spark 2.1+ ‒ Supported: Cloudera CDH 5.9+ ‒ Kerberos Impersonation for AEL-Spark

© Hitachi Vantara Corporation 2018. All Rights Reserved Powerful and Intuitive Data Integration

Adaptive Execution Layer - Spark

Build Once, Execute on Any Engine Rapid time to value and high performance when your business needs to scale © Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho 8.0

▪ November 2017 was released Pentaho 8.0 ‒ Worker nodes for better scalability ‒ Stream Processing with Spark: With its adaptive execution capabilities, Pentaho 8.0 fully enables real-time data ingestion from Kafka using Spark Streaming ‒ Connect to Kafka streams: Pentaho 8.0 enables real-time processing, monitoring and aggregation with specialized steps that connect Pentaho Data Integration to Kafka ‒ Big Data Security with Knox: Building on existing enterprise-level security for major Hadoop distribution ‒ Native support for Avro and Parquet

© Hitachi Vantara Corporation 2018. All Rights Reserved NEW Stream Data Processing with Pentaho

▪ Ingest and produce data from/to Kafka using the NEW steps

▪ Process micro-batch chunks of data using either a time-based or a message size-based window

▪ Switch processing engines between Spark (Streaming) or Native Kettle

▪ Hardened stream processing libraries and steps to process data from traditional message queues

• Benefits: ‒ Lower the bar to build streaming applications

‒ Enable combining batch and stream data processing

© Hitachi Vantara Corporation 2018. All Rights Reserved Combined Data Processing Using Pentaho

DATA SOURCES

HADOOP/SPARK CLUSTER

IoT Data Micro Services Kafka RT Data Batch Data Cluster Processors Processors Hadoop MR Web Clickstream and Other Logs Data Data Analytical Pentaho Collector Publishe Databases Analytic Data Store s HDFS Pentaho DI Pentaho DI Traditional DB/DW PDI collects data from PDI can retrieve processed and NoSQL sources including or blended data from Datastores Kafka Clusters Hadoop/ Spark and publish Kafka to Kafka clusters or external Pentaho DI Cluster PDI can process streaming data using databases Spark & Spark Streaming or Kettle engine in a completely visual way Traditional Message Bus Ingest Process Publish Reporting

© Hitachi Vantara Corporation 2018. All Rights Reserved NEW Stream Data Processing with Pentaho

© Hitachi Vantara Corporation 2018. All Rights Reserved Traditionally …

Data Engineering Data Prep Analytics

Data Discovery Analysis Ingestion Processing Blending Data Delivery / Analysis & Dashboards

© Hitachi Vantara Corporation 2018. All Rights Reserved Pentaho vision …

Data Engineering Data Prep Analytics

Data Discovery Analysis Ingestion Processing Blending Data Delivery / Analysis & Dashboards

Lifecycle Data Dynamic Data Administration Security Management Provenance Pipeline Monitoring Automation

© Hitachi Vantara Corporation 2018. All Rights Reserved © Hitachi Vantara Corporation 2018. All Rights Reserved Thank You

© Hitachi Vantara Corporation 2018. All Rights Reserved © Hitachi Vantara Corporation 2018. All Rights Reserved