DataWorks Summit Berlin 2018 Breathing New Life into Apache Oozie with Apache Ambari Workflow Manager April 19, 2018
Artem Ervits – Hortonworks Clay Baenziger – Bloomberg
© 2018 Bloomberg Finance L.P. All rights reserved. Poll:
• Who here uses Oozie?
  — In production? With Kerberos?
  — Do you use Hue with Oozie?
  — How many workflows do you have in production? 1-10? 10-50? 50+?
  — How many actions does the largest workflow contain? 1-10? 10-50? 50+?
  — Do you use (or want to use) Oozie with HBase? Spark? Python? Deployment automation?
• Do you like XML?
  — Do you have a favorite editor for Oozie workflows?
Open Source Workflow Managers
• Apache Airflow (Incubating)
• Luigi by Spotify
• Azkaban by LinkedIn
• (And of course) Apache Oozie
Introduction to Oozie
• Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
• Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
• Oozie coordinator jobs are recurrent Oozie workflow jobs triggered by time and data availability.
• Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs as well as system specific jobs out of the box.
• Oozie is a scalable, reliable and extensible system.
- Paraphrased from http://oozie.apache.org
Actions: • Map/Reduce • Java • E-Mail • Hive • Shell • Decision • Pig • Spark • Fork • HDFS • Sub-Workflow • Join
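To make "a DAG of actions" concrete, here is a minimal sketch (not taken from the slides) that generates the XML for a one-action workflow: start, a shell action, then end, with a kill node on error. The function name `build_workflow`, the node names, and the schema versions (`uri:oozie:workflow:0.5`, `uri:oozie:shell-action:0.3`) are assumptions chosen for illustration; check them against the schemas shipped with your Oozie release.

```python
import xml.etree.ElementTree as ET

# Assumed schema versions; adjust to what your Oozie server supports.
WF_NS = "uri:oozie:workflow:0.5"
SHELL_NS = "uri:oozie:shell-action:0.3"


def build_workflow(name: str, command: str) -> str:
    """Return workflow XML for: start -> shell action -> end (kill on error)."""
    app = ET.Element("workflow-app", {"xmlns": WF_NS, "name": name})
    ET.SubElement(app, "start", {"to": "run-shell"})

    # A single action node; "ok" and "error" are the outgoing edges of the DAG.
    action = ET.SubElement(app, "action", {"name": "run-shell"})
    shell = ET.SubElement(action, "shell", {"xmlns": SHELL_NS})
    ET.SubElement(shell, "job-tracker").text = "${jobTracker}"
    ET.SubElement(shell, "name-node").text = "${nameNode}"
    ET.SubElement(shell, "exec").text = command
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Shell action failed"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")


if __name__ == "__main__":
    print(build_workflow("demo-wf", "echo"))
```

Even this smallest workflow needs about a dozen XML elements, which is the verbosity the later slides refer to.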
Oozie Release Timeline
• 1.x released in 2010. Yahoo! project with two GitHub releases. Added support for workflow jobs.
• 2.x released in 2011. Still with Yahoo! with nine GitHub releases. Added support for coordinator jobs.
• 3.x released in 2013. Project under Apache. Added support for bundle jobs and HBase credentials.
• 4.x released in 2014. Added support for Hive/HCatalog, Spark integration and Oozie server high availability.
• 5.0 released April 2018. Removes support for Hadoop 1, adds support for Hadoop 3, replaces the MR launcher with a YARN Application Master, adds new actions, and cleans up code.
- Adapted from: Apache Oozie by Mohammad Kamrul Islam and Aravind Srinivasan
Oozie Complaints
• Launcher jobs as map tasks
• Dated UI
• Confusing object model – workflows, coordinators, bundles
• Complicated setup
• XML
• DAG visualization
• SLA alerting
• Fine-grained authorization
• Easy access to log files
Oozie Complaints: Improvements
• Launcher jobs as map tasks – solved by Oozie 5.0.0, OOZIE-1770
• Dated UI – OOZIE-2683, targeted for Oozie 5.X (Hue and Workflow Manager today)
• Confusing object model – jobs API, patch available, targeted for 5.X, OOZIE-2339
• Complicated setup – can deploy with embedded Jetty in Oozie 5.0.0, OOZIE-2666
• XML – fluent job API, patch available, targeted for 5.X, OOZIE-2339
• DAG visualization – solved by Oozie 5.0.0, OOZIE-2406
• SLA alerting – since Oozie 4.0.0, OOZIE-1294
• Fine-grained authorization – targeted for Oozie 5.X, OOZIE-3196
• Easy access to log files – solved by Oozie 5.0.0, OOZIE-2296
Oozie Launcher – Prior to Release 5.0
• MR launcher job
Oozie Launcher – Release 5.0
• OYA: OOZIE-1770: Create Oozie Application Master for YARN
  — Removes MR launcher job
• Design Doc
Oozie Documentation – Before Release 5.0 and After
Documentation redesign
OOZIE-3163: Improve documentation rendering: use fluido skin and better config
Oozie Workflow Visualization – Prior to 5.0 and After
Prior to 5.0: JUNG; 5.0 and after: GraphViz
OOZIE-2406: Completely rewrite Graph Generator code
Oozie Fluent Job API – Apache Oozie 5.X (Preview)
OOZIE-2339: Provide an API for writing jobs based on the XSD schemas
Apache Ambari
Ambari Provides:
• Provisioning of a Hadoop Cluster
• Management of a Hadoop Cluster
• Monitoring of a Hadoop Cluster
  — A Metrics System for metrics collection
  — An Alert Framework
  — A dashboard for monitoring the Hadoop cluster
- Paraphrased from http://ambari.apache.org
Ambari Views
• Ambari Views “offer a systematic way to plug-in UI capabilities to surface custom visualization, management and monitoring features in Ambari Web. A ‘view’ is a way of extending Ambari that allows 3rd parties to plug in new resource types along with the APIs, providers and UI to support them. In other words, a view is an application that is deployed into the Ambari container.”
• Key takeaways:
  — One does not need an Ambari-managed (administered) cluster
  — Third parties can build view packages to run in the Ambari framework too
  — Major views available: (YARN) Capacity Scheduler, (HDFS) Files, HAWQ, Hive, Pig, Storm, Tez, (YARN ATS) Jobs, (Oozie) Workflow Manager
• Alternatives: Cloudera Hue, bespoke applications
Workflow Manager – Motivation
• Oozie workflows are defined in XML – too verbose
  — Provide GUI workflow builder and editor
  — Reduce possibility of user-introduced errors
  — Provide browser-based workflow manager
• Integration with File Browser
  — Includes S3 support
  — Can replace existing Oozie web UI
• Oozie is hard-coded to display only 25 actions
  — WFM doesn’t have this limit; tested with 300+ action nodes
• Oozie is scalable
  — Can scale WFM by standing up multiple Ambari Views servers
Workflow Manager – Workflow Editor Example
Workflow Manager:
• Available as an Ambari View
• Enables visual editing of Oozie workflows
• Integrated with file browser
• Reduces user input errors
• Minimal input required
Workflow Manager – Execution View Example
• Integrated Dashboard with Workflow Manager View
• Manage Oozie jobs
• Drill down to logs
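Listing jobs and drilling down to logs are also available outside the dashboard through Oozie's Web Services API. The sketch below only builds the request URLs. Whether Workflow Manager issues exactly these calls is not stated in the slides, and the host name, the example user, and the helper names `jobs_url` and `job_url` are hypothetical.

```python
from urllib.parse import quote, urlencode

# Values the Oozie job endpoint accepts for "show" that this sketch covers.
ALLOWED_SHOW = {"info", "definition", "log"}


def jobs_url(base: str, user: str, length: int = 50, offset: int = 1) -> str:
    """URL that lists workflow jobs for one user, `length` jobs per page."""
    query = urlencode(
        {"jobtype": "wf", "filter": f"user={user}", "offset": offset, "len": length}
    )
    return f"{base}/v1/jobs?{query}"


def job_url(base: str, job_id: str, show: str = "info") -> str:
    """URL for a single job: its status (info), its XML definition, or its log."""
    if show not in ALLOWED_SHOW:
        raise ValueError(f"unsupported show value: {show}")
    return f"{base}/v1/job/{quote(job_id)}?{urlencode({'show': show})}"


if __name__ == "__main__":
    base = "http://oozie-host:11000/oozie"  # hypothetical host, default Oozie port
    print(jobs_url(base, "alice"))
    print(job_url(base, "0000001-180419000000000-oozie-oozi-W", "log"))
```

On a Kerberos-enabled cluster these endpoints need SPNEGO authentication, which the sketch leaves out.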
Workflow Manager – Workflow Design Component
Workflow Manager – Workflow Dashboard Component
Good Documentation: HDP 2.6 – Workflow Manager Basics
Workflow Manager Examples with HBase
• Setup Oozie – Server and Workflows • Data Definition – Tables, ACLs • Compactions – Operational
HBase – Setup
Oozie needs HBase Configuration:
• Oozie Server Code (to support HBase delegation tokens)
  — In libexec (see Server JARs list)
  — In oozie-site.xml
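The slide does not reproduce the oozie-site.xml change itself. One setting commonly used for HBase delegation tokens registers Oozie's HBase credential class under `oozie.credentials.credentialclasses`. The sketch below renders only that one property, on the assumption that this is the setting the slide refers to; the helper name `credential_property` is hypothetical.

```python
import xml.etree.ElementTree as ET


def credential_property(classes: dict) -> str:
    """Render the oozie-site.xml property mapping credential types to classes."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "oozie.credentials.credentialclasses"
    # Value format: comma-separated "type=fully.qualified.ClassName" pairs.
    ET.SubElement(prop, "value").text = ",".join(
        f"{cred_type}={cls}" for cred_type, cls in classes.items()
    )
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Assumption: only the HBase credential type is being registered.
    print(credential_property(
        {"hbase": "org.apache.oozie.action.hadoop.HbaseCredentials"}
    ))
```

The HBase client JARs still have to be on the Oozie server classpath for the class to load, which is the libexec item above.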
HBase – Data Definition
HBase Shell:
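The screenshot for this slide is not reproduced here. As a hypothetical illustration of the kind of data definition the section outline mentions (tables and ACLs), the sketch below composes a small HBase shell script that a workflow shell action could hand to `hbase shell`. The namespace, table, column family, and user names are invented for the example, and `hbase_ddl_script` is not part of any library.

```python
def hbase_ddl_script(namespace: str, table: str, family: str,
                     user: str, perms: str = "RW") -> str:
    """Return HBase shell commands that create a table and grant access to it."""
    qualified = f"{namespace}:{table}"
    lines = [
        f"create_namespace '{namespace}'",
        f"create '{qualified}', '{family}'",         # one column family
        f"grant '{user}', '{perms}', '{qualified}'",  # table-level ACL
        "exit",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    print(hbase_ddl_script("demo_ns", "events", "d", "etl_user"), end="")
```

Generating the script from parameters keeps table names and grants in one place, so the same workflow can be reused per tenant by changing only its inputs.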
HBase – Compactions
HBASE-19528: Major Compaction Tool
• Automatically scales compaction to selected number of servers
• Requires read ability to /hbase
usage: MajorCompactor [-cf
Usage instructions -cf
More Resources
• Apache Oozie Mailing Lists: http://oozie.apache.org/mail-lists.html
• Artem’s Oozie Resources:
  — 12 Part Series on WFM: http://bit.ly/2syKUIh
  — Oozie Examples: https://github.com/dbist/oozie-examples
• Clay’s Past Oozie Presentations:
  — Code Deployment via Oozie: Apache BigData http://bit.ly/2sP2qbj
  — HBase Multi-Tenancy with Oozie: DataWorks Summit http://bit.ly/2rw7FIR
Demo!
Questions?