
"Charting the Course ... to Your Success!"

Real World Hadoop in the Enterprise

Course Summary

Description

Apache Hadoop is an open-source framework for building reliable, distributed compute clusters. IBM used Hadoop in developing the Watson system that won Jeopardy! in 2011. Together with related frameworks, Hadoop can process large unstructured or semi-structured data sets from multiple sources. It can dissect, classify, and learn from that data, and generate suggestions for business analytics, decision support, and other advanced forms of machine intelligence.

This class is targeted at Java developers and assumes working knowledge of Java programming in Eclipse and comfort in a Unix shell environment. We will go well beyond the "Hello World" word-count example into practical, applied uses of Hadoop in large-scale, real-world scenarios, including fraud detection, algorithmic trading, and data mining. Students will develop on a real Hadoop/Drools cluster, in an environment architected for a dynamically changing, business-rule-driven infrastructure with multiple disparate data sources and large-scale datasets.

Topics Overview
Applying Business Rules with Drools
Hadoop Architecture
Pig and Pig Pipelines
Retrieving and Localizing Data
Working with Hive
Feeding Hadoop in the Enterprise
Testing, Performance and Troubleshooting
Machine Learning with Mahout
Other Optional Overview Topics

Audience
This class is designed for Java developers.

Prerequisites
This class assumes working knowledge of Java programming in Eclipse and comfort in a Unix shell environment.
Introduction to Java (IJSEP) - Experience developing Java with Eclipse
Introduction to Unix (UNIXI) - Exposure to bash or tcsh shell use
Data Persistence with JPA 2 - Experience using JPA and data access

Duration
Five days

Due to the nature of this material, this document refers to numerous hardware and software products by their trade names.
References to other companies and their products are for informational purposes only, and all trademarks are the properties of their respective companies. It is not the intent of ProTech Professional Technical Services, Inc. to use any of these names generically.

PT0756_REALWORLDHADOOPINTHEENTERPRISE.DOC

Course Outline

I. Overview
   A. Map/Reduce
   B. Hadoop
   C. NoSQL
   D. Mahout
   E. Alternate Frameworks

II. Hadoop Architecture
   A. Hadoop Map/Reduce
   B. HDFS
   C. Cassandra
   D. HBase
   E. Hive
   F. Pig

III. Retrieving and Localizing Data
   A. Using JPA in Map/Reduce: Pros and Cons
   B. HDFS
   C. NoSQL
   D. HBase
   E. Cassandra
   F. Neo4J
   G. Sqoop
   H. Flume
   I. Caching with JBoss Infinispan
   J. Caching with OpenTerracotta
   K. Using Spring Data

IV. Feeding Hadoop in the Enterprise
   A. Apache UIMA
   B. Spring Integration
   C. Apache Camel
   D. Spring Batch

V. Machine Learning with Mahout
   A. Artificial Intelligence Overview
   B. Fuzzy Logic
   C. K-Means
   D. Pattern Mining
   E. Bayesian Classifiers
   F. Analytics
   G. Random Forests
   H. Decision Support with Mahout and Hadoop

VI. Applying Business Rules with Drools
   A. Drools Overview
   B. Integrating a Rules-Based Approach with Hadoop
   C. Decision Making with Drools and Hadoop
   D. Integrating Drools, Mahout, and Hadoop

VII. Pig and Pig Pipelines
   A. Pig Latin
   B. Pig Pipelines
   C. Pig UDFs (User Defined Functions)

VIII. Working with Hive
   A. Hive and HDFS
   B. Metadata and Indexing
   C. Hive UDFs (User Defined Functions)
   D. Hive and Amazon S3
   E. HiveQL (HQL)

IX. Testing, Performance and Troubleshooting
   A. TDD with MRUnit
   B. TDD with Other Unit Testing Frameworks
   C. Bottleneck Discovery
   D. Monitoring
   E. Join Framework Optimization
   F. Troubleshooting
   G. Hadoop and Virtualization
   H. Hadoop in the Cloud
   I. Hadoop and Amazon EC2

X. Other Optional Overview Topics
   A. Storm Project
   B. Apache Kafka
   C. Cassandra Bolt