Course Title

"Charting the Course ... to Your Success!"

Real World Hadoop in the Enterprise

Course Summary

Description

Apache Hadoop is an open-source framework for building reliable, distributed compute clusters. Credited with a part in IBM Watson's 2011 Jeopardy! win, Hadoop can be used (with other related frameworks) to process large unstructured or semi-structured data sets from multiple sources in order to dissect, classify, learn from, and make suggestions for business analytics, decision support, and other advanced forms of machine intelligence.

This class is targeted at the Java developer and assumes working knowledge of Java programming in Eclipse and comfort in a Unix shell environment. We will go well beyond the "Hello World" word-count example into practical, applied uses of Hadoop in large-scale real-world scenarios, including fraud detection, algorithmic trading, and data mining. Students will develop in an environment architected for a dynamically changing, business-rule-driven infrastructure with multiple disparate data sources and large-scale datasets on a real Hadoop/Drools cluster.

Topics

  • Overview
  • Hadoop Architecture
  • Retrieving and Localizing Data
  • Feeding Hadoop in the Enterprise
  • Machine Learning with Mahout
  • Applying Business Rules with Drools
  • Pig and Pig Pipelines
  • Working with the Hive
  • Testing, Performance and Troubleshooting
  • Other Optional Overview Topics

Audience

This class is designed for Java developers.

Prerequisites

This class assumes working knowledge of Java programming in Eclipse and comfort in a Unix shell environment.

  • Introduction to Java (IJSEP) - Experience developing Java with Eclipse
  • Introduction to Unix (UNIXI) - Exposure to bash or tcsh shell use
  • Data Persistence with JPA 2 - Experience using JPA and data access

Duration

Five days

Course Outline

I. Overview
    A. Map/Reduce
    B. Hadoop
    C. NoSQL
    D. Mahout
    E. Alternate Frameworks

II. Hadoop Architecture
    A. Hadoop Map/Reduce
    B. HDFS
    C. Cassandra
    D. HBase
    E. Hive
    F. Pig

III. Retrieving and Localizing Data
    A. Using JPA in Map/Reduce: Pros and Cons
    B. HDFS
    C. NoSQL
    D. HBase
    E. Cassandra
    F. Neo4J
    G. Sqoop
    H. Flume
    I. Caching with JBoss Infinispan
    J. Caching with OpenTerracotta
    K. Using Spring Data

IV. Feeding Hadoop in the Enterprise
    A. Apache UIMA
    B. Spring Integration
    C. Apache Camel
    D. Spring Batch

V. Machine Learning with Mahout
    A. Artificial Intelligence Overview
    B. Fuzzy Logic
    C. K-Means
    D. Pattern Mining
    E. Bayesian Classifiers
    F. Analytics
    G. Random Forests
    H. Decision Support with Mahout and Hadoop

VI. Applying Business Rules with Drools
    A. Drools Overview
    B. Integrating Rules-based approach with Hadoop
    C. Decision Making with Drools and Hadoop
    D. Integrating Drools, Mahout, and Hadoop

VII. Pig and Pig Pipelines
    A. Pig Latin
    B. Pig Pipelines
    C. Pig UDFs (User Defined Functions)

VIII. Working with the Hive
    A. Hive and HDFS
    B. Meta-data and indexing
    C. Hive UDFs (User Defined Functions)
    D. Hive and Amazon S3
    E. HQL

IX. Testing, Performance and Troubleshooting
    A. TDD with MRUnit
    B. TDD with other Unit Testing Frameworks
    C. Bottleneck discovery
    D. Monitoring
    E. Join Framework Optimization
    F. Troubleshooting
    G. Hadoop and Virtualization
    H. Hadoop in the Cloud
    I. Hadoop and Amazon EC2

X. Other Optional Overview Topics
    A. Storm Project
    B. Apache Kafka
    C. Cassandra Bolt

Due to the nature of this material, this document refers to numerous hardware and software products by their trade names. References to other companies and their products are for informational purposes only, and all trademarks are the properties of their respective companies. It is not the intent of ProTech Professional Technical Services, Inc. to use any of these names generically.

PT0756_REALWORLDHADOOPINTHEENTERPRISE.DOC
