Lecture 14.1 Google's Map/Reduce

Lecture 14.1 Google's Map/Reduce

Lecture 14.1 Google’s Map/Reduce EN 600.320/420 Instructor: Randal Burns 13 March 2018 Department of Computer Science, Johns Hopkins University Overview Map/Reduce – A structured programming model for data parallelism – Geared toward data intensive applications The Google File System – Large scale parallel file system for low $$ – Failures are the normal case – M/R requires such a global data service Open-source re-implementation – HADOOP: Map/Reduce – HDFS: Google File System What is it? – Data-parallel, distributed-memory, data-intensive Lecture 14: Google’s Map/Reduce Map/Reduce Model Map – Process key/value pairs – Produce intermediate key/value pairs Reduce – Merge intermediate pairs on key value Map and reduce were LISP primitives – Hence the functional style – Actually write procedural code under functional constraints Lecture 14: Google’s Map/Reduce An M/R Schematic Lecture 14: Google’s Map/Reduce Example: Count all Words This is pseudocode – Need a tokenizer, etc. Lecture 14: Google’s Map/Reduce Example: Count all Words Lecture 14: Google’s Map/Reduce Word Count Visualized Splits: by files, by lines – Like most things, customizable within the framework Output is deceptive: not a file, but many files (one per reducer) Lecture 14: Google’s Map/Reduce What can you do? Seems like a limited model But, many string processing problems fit naturally Can be used iteratively Examples – Grep – Web-log processing (e.g. URL hit counts) – Reverse Web-link graph – Terms vector per host – Build an inverted index – Distributed sort (naturally) Lecture 14: Google’s Map/Reduce.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    8 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us