
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat

Presenter: Aditya Devarakonda

September 1, 2015

… for a rewrite of our production indexing system. Section 7 discusses related and future work.

2 Programming Model

The computation takes a set of input key/value pairs, and produces a set of output key/value pairs. The user of the MapReduce library expresses the computation as two functions: Map and Reduce.

Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.

The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation. The intermediate values are supplied to the user's reduce function via an iterator. This allows us to handle lists of values that are too large to fit in memory.

2.1 Example

Consider the problem of counting the number of occurrences of each word in a large collection of documents. The user would write code similar to the following pseudo-code:

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));

The map function emits each word plus an associated count of occurrences (just '1' in this simple example). The reduce function sums together all counts emitted for a particular word.

In addition, the user writes code to fill in a mapreduce specification object with the names of the input and output files, and optional tuning parameters. The user then invokes the MapReduce function, passing it the specification object. The user's code is linked together with the MapReduce library (implemented in C++). Appendix A contains the full program text for this example.

2.2 Types

Even though the previous pseudo-code is written in terms of string inputs and outputs, conceptually the map and reduce functions supplied by the user have associated types:

map (k1, v1) → list(k2, v2)
reduce (k2, list(v2)) → list(v2)

I.e., the input keys and values are drawn from a different domain than the output keys and values. Furthermore, the intermediate keys and values are from the same domain as the output keys and values.

Our C++ implementation passes strings to and from the user-defined functions and leaves it to the user code to convert between strings and appropriate types.

2.3 More Examples

Here are a few simple examples of interesting programs that can be easily expressed as MapReduce computations.

Distributed Grep: The map function emits a line if it matches a supplied pattern. The reduce function is an identity function that just copies the supplied intermediate data to the output.

Count of URL Access Frequency: The map function processes logs of web page requests and outputs ⟨URL, 1⟩. The reduce function adds together all values for the same URL and emits a ⟨URL, total count⟩ pair.

Reverse Web-Link Graph: The map function outputs ⟨target, source⟩ pairs for each link to a target URL found in a page named source. The reduce function concatenates the list of all source URLs associated with a given target URL and emits the pair: ⟨target, list(source)⟩.

Term-Vector per Host: A term vector summarizes the most important words that occur in a document or a set of documents as a list of ⟨word, frequency⟩ pairs. The map function emits a ⟨hostname, term vector⟩ pair for each input document (where the hostname is extracted from the URL of the document). The reduce function is passed all per-document term vectors for a given host. It adds these term vectors together, throwing away infrequent terms, and then emits a final ⟨hostname, term vector⟩ pair.
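The word-count pseudo-code in Section 2.1 translates almost line for line into ordinary code. The following is a minimal, self-contained C++ sketch (my own illustration, not the Google MapReduce library API shown in Appendix A): the map step emits (word, "1") pairs, an in-memory group-by-key stands in for the library's shuffle, and the reduce step sums the counts.

#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Map: emit an intermediate (word, "1") pair for every word in the document.
void MapWordCount(const std::string& doc,
                  std::vector<std::pair<std::string, std::string>>* out) {
  std::istringstream words(doc);
  std::string w;
  while (words >> w) out->emplace_back(w, "1");
}

// Reduce: sum all counts emitted for a single word.
std::string ReduceWordCount(const std::vector<std::string>& values) {
  int result = 0;
  for (const std::string& v : values) result += std::stoi(v);
  return std::to_string(result);
}

int main() {
  std::vector<std::string> documents = {"the quick brown fox",
                                        "the lazy dog and the fox"};

  // Map phase.
  std::vector<std::pair<std::string, std::string>> intermediate;
  for (const std::string& doc : documents) MapWordCount(doc, &intermediate);

  // Stand-in for the shuffle: group all values by intermediate key.
  std::map<std::string, std::vector<std::string>> grouped;
  for (const auto& kv : intermediate) grouped[kv.first].push_back(kv.second);

  // Reduce phase: one invocation per distinct word.
  for (const auto& kv : grouped)
    std::cout << kv.first << "\t" << ReduceWordCount(kv.second) << "\n";
  return 0;
}

In the real system the shuffle happens across machines and the reduce output goes to files, but the per-key grouping contract is the same as in this single-process sketch.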

Execution Model

Figure 1: Execution overview. The user program forks a master and a set of worker processes (1). The master assigns map tasks and reduce tasks to idle workers (2). A map worker reads its assigned input split (3) and writes intermediate key/value pairs to its local disk (4); reduce workers read that intermediate data remotely (5) and write the final output files (6). Reading left to right, the figure runs from the input files through the map phase, the intermediate files (on local disks), and the reduce phase to the output files.

Inverted Index: The map function parses each document, and emits a sequence of ⟨word, document ID⟩ pairs. The reduce function accepts all pairs for a given word, sorts the corresponding document IDs and emits a ⟨word, list(document ID)⟩ pair. The set of all output pairs forms a simple inverted index. It is easy to augment this computation to keep track of word positions.

Distributed Sort: The map function extracts the key from each record, and emits a ⟨key, record⟩ pair. The reduce function emits all pairs unchanged. This computation depends on the partitioning facilities described in Section 4.1 and the ordering properties described in Section 4.2.

3 Implementation

Many different implementations of the MapReduce interface are possible. The right choice depends on the environment. For example, one implementation may be suitable for a small shared-memory machine, another for a large NUMA multi-processor, and yet another for an even larger collection of networked machines.

This section describes an implementation targeted to the computing environment in wide use at Google: large clusters of commodity PCs connected together with switched Ethernet [4]. In our environment:

(1) Machines are typically dual-processor x86 processors running Linux, with 2-4 GB of memory per machine.

(2) Commodity networking hardware is used – typically either 100 megabits/second or 1 gigabit/second at the machine level, but averaging considerably less in overall bisection bandwidth.

(3) A cluster consists of hundreds or thousands of machines, and therefore machine failures are common.

(4) Storage is provided by inexpensive IDE disks attached directly to individual machines. A distributed file system [8] developed in-house is used to manage the data stored on these disks. The file system uses replication to provide availability and reliability on top of unreliable hardware.

(5) Users submit jobs to a scheduling system. Each job consists of a set of tasks, and is mapped by the scheduler to a set of available machines within a cluster.

3.1 Execution Overview

The Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits.
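The reduce side is distributed by splitting the intermediate key space into R pieces; the paper's default partitioning function (Section 4.1) is hash(key) mod R. A minimal C++ sketch of that idea follows, with std::hash standing in for whatever hash function the library actually uses.

#include <cstddef>
#include <functional>
#include <iostream>
#include <string>

// Default-style partitioner: assign an intermediate key to one of R reduce
// tasks by hashing the key. std::hash is only a stand-in here.
std::size_t Partition(const std::string& key, std::size_t R) {
  return std::hash<std::string>{}(key) % R;
}

int main() {
  const std::size_t R = 4;  // number of reduce tasks / output files
  for (const std::string& key : {"http://a.com", "http://b.org", "apple"})
    std::cout << key << " -> reduce task " << Partition(key, R) << "\n";
  return 0;
}

A hash partitioner balances load well but ignores key order; the Distributed Sort example above instead relies on a user-supplied range partitioner so that output files come out in key order.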

Prior Art

Programming models: OpenMP, MPI, UPC, Charm++.
Resource management: SLURM, HTCondor.
Machine setup: Supercomputer, network of computers.

Key Players in MapReduce

Master
Workers (Mappers and Reducers)
Combiners
Partitioners
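Of these, the combiner is the easiest to show in a few lines: when the reduction is commutative and associative (as in word count), each map worker can partially merge its own output before it is shuffled, cutting the amount of data that crosses the network. A hypothetical C++ sketch (the name CombineCounts and the in-memory data are mine, not the library's):

#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Combiner: collapse repeated (word, count) pairs produced by one map task
// into a single pair per word before the shuffle.
std::vector<std::pair<std::string, int>> CombineCounts(
    const std::vector<std::pair<std::string, int>>& map_output) {
  std::map<std::string, int> partial;
  for (const auto& kv : map_output) partial[kv.first] += kv.second;
  return std::vector<std::pair<std::string, int>>(partial.begin(),
                                                  partial.end());
}

int main() {
  // Raw map output: "the" appears three times in this task's input split.
  std::vector<std::pair<std::string, int>> map_output = {
      {"the", 1}, {"fox", 1}, {"the", 1}, {"the", 1}};
  for (const auto& kv : CombineCounts(map_output))
    std::cout << kv.first << "\t" << kv.second << "\n";  // prints: fox 1, the 3
  return 0;
}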

Fault Tolerance

Worker failure
Master failure
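For worker failure, the paper's master pings every worker periodically, marks a non-responding worker as failed, and reschedules both its in-progress tasks and its completed map tasks (whose intermediate output lived on the failed machine's local disk); completed reduce output sits in the global file system and is safe. Master failure is rare enough that the described implementation simply aborts the job. Below is a simplified C++ sketch of the worker-failure bookkeeping, with invented types and no actual RPC or timers:

#include <iostream>
#include <vector>

enum class TaskState { kIdle, kInProgress, kCompleted };

struct Task {
  TaskState state;
  int worker;   // index of the worker the task was assigned to (-1 if none)
  bool is_map;  // completed map output lives on the worker's local disk
};

// Invoked when the master's periodic ping to `failed_worker` times out.
// In-progress tasks on that worker go back to idle; completed map tasks are
// also reset, because their intermediate output is now unreachable.
void HandleWorkerFailure(int failed_worker, std::vector<Task>& tasks) {
  for (Task& t : tasks) {
    if (t.worker != failed_worker) continue;
    if (t.state == TaskState::kInProgress ||
        (t.state == TaskState::kCompleted && t.is_map)) {
      t.state = TaskState::kIdle;
      t.worker = -1;
    }
  }
}

int main() {
  std::vector<Task> tasks = {
      {TaskState::kCompleted, 7, true},    // map task finished on worker 7
      {TaskState::kInProgress, 7, false},  // reduce task running on worker 7
      {TaskState::kCompleted, 3, true},    // map task on a healthy worker
  };
  HandleWorkerFailure(7, tasks);
  for (const Task& t : tasks)
    std::cout << "state=" << static_cast<int>(t.state)
              << " worker=" << t.worker << "\n";
  return 0;
}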

Parallel Computing Issues

Locality
Load balancing
Communication
Invariants (what assumptions does MapReduce make?)
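Locality is the most concrete of these in the paper: the master tries to schedule a map task on a machine that already stores a replica of that task's input split, or failing that near one. A rough C++ sketch of that preference, with invented data structures standing in for the master's state and a simplified fallback (any pending task) instead of the paper's "nearby replica" step:

#include <iostream>
#include <set>
#include <string>
#include <vector>

struct MapTask {
  std::string split;                    // name of the input split
  std::set<std::string> replica_hosts;  // machines holding a replica of it
};

// Prefer a task whose input split is stored on worker_host; otherwise fall
// back to any remaining task (which will read its split over the network).
int PickTaskFor(const std::string& worker_host,
                const std::vector<MapTask>& pending) {
  for (std::size_t i = 0; i < pending.size(); ++i)
    if (pending[i].replica_hosts.count(worker_host))
      return static_cast<int>(i);
  return pending.empty() ? -1 : 0;
}

int main() {
  std::vector<MapTask> pending = {
      {"split-0", {"hostA", "hostB"}},
      {"split-1", {"hostC", "hostD"}},
  };
  std::cout << "hostC gets task " << PickTaskFor("hostC", pending) << "\n";  // local read
  std::cout << "hostZ gets task " << PickTaskFor("hostZ", pending) << "\n";  // remote read
  return 0;
}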

Performance on 1TB Sort

Figure 3: Data transfer rates over time for different executions of the sort program. Each column shows one execution: (a) normal execution, (b) no backup tasks, (c) 200 tasks killed. The three rows plot the input, shuffle, and output rates (MB/s) against elapsed time (seconds).

… original text line as the intermediate key/value pair. We used a built-in Identity function as the Reduce operator. This function passes the intermediate key/value pair unchanged as the output key/value pair. The final sorted output is written to a set of 2-way replicated GFS files (i.e., 2 terabytes are written as the output of the program).

As before, the input data is split into 64MB pieces (M = 15000). We partition the sorted output into 4000 files (R = 4000). The partitioning function uses the initial bytes of the key to segregate it into one of R pieces.

Our partitioning function for this benchmark has built-in knowledge of the distribution of keys. In a general sorting program, we would add a pre-pass MapReduce operation that would collect a sample of the keys and use the distribution of the sampled keys to compute split-points for the final sorting pass.

Figure 3 (a) shows the progress of a normal execution of the sort program. The top-left graph shows the rate at which input is read. The rate peaks at about 13 GB/s and dies off fairly quickly since all map tasks finish before 200 seconds have elapsed. Note that the input rate is less than for grep. This is because the sort map tasks spend about half their time and I/O bandwidth writing intermediate output to their local disks. The corresponding intermediate output for grep had negligible size.

The middle-left graph shows the rate at which data is sent over the network from the map tasks to the reduce tasks. This shuffling starts as soon as the first map task completes. The first hump in the graph is for the first batch of approximately 1700 reduce tasks (the entire MapReduce was assigned about 1700 machines, and each machine executes at most one reduce task at a time). Roughly 300 seconds into the computation, some of these first batch of reduce tasks finish and we start shuffling data for the remaining reduce tasks. All of the shuffling is done about 600 seconds into the computation.

The bottom-left graph shows the rate at which sorted data is written to the final output files by the reduce tasks. There is a delay between the end of the first shuffling period and the start of the writing period because the machines are busy sorting the intermediate data. The writes continue at a rate of about 2-4 GB/s for a while. All of the writes finish about 850 seconds into the computation. Including startup overhead, the entire computation takes 891 seconds. This is similar to the current best reported result of 1057 seconds for the TeraSort benchmark [18].

A few things to note: the input rate is higher than the shuffle rate and the output rate because of our locality optimization – most data is read from a local disk and bypasses our relatively bandwidth constrained network. The shuffle rate is higher than the output rate because the output phase writes two copies of the sorted data (we make two replicas of the output for reliability and availability reasons). We write two replicas because that is the mechanism for reliability and availability provided by our underlying file system. Network bandwidth requirements for writing data would be reduced if the underlying file system used erasure coding [14] rather than replication.
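The pre-pass the authors describe, sampling keys and computing split points so that the R sorted output files receive roughly equal shares, can be sketched in a few lines of C++. This is an illustration under my own assumptions, not the benchmark's actual code:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Given a sample of keys, compute R-1 split points that divide the sample
// (and, approximately, the full key distribution) into R equal-sized ranges.
std::vector<std::string> ComputeSplitPoints(std::vector<std::string> sample,
                                            std::size_t R) {
  std::sort(sample.begin(), sample.end());
  std::vector<std::string> splits;
  for (std::size_t i = 1; i < R; ++i)
    splits.push_back(sample[i * sample.size() / R]);
  return splits;
}

// Range partitioner: the index of the first split point greater than the key.
std::size_t Partition(const std::string& key,
                      const std::vector<std::string>& splits) {
  return static_cast<std::size_t>(
      std::upper_bound(splits.begin(), splits.end(), key) - splits.begin());
}

int main() {
  std::vector<std::string> sample = {"pear", "apple", "fig",  "kiwi",
                                     "grape", "mango", "plum", "date"};
  std::vector<std::string> splits = ComputeSplitPoints(sample, 4);  // R = 4
  for (const std::string& key : {"banana", "lemon", "quince"})
    std::cout << key << " -> output file " << Partition(key, splits) << "\n";
  return 0;
}

Because the partition boundaries are ordered, concatenating output files 0 through R-1 yields a globally sorted result once each partition is sorted, which is what the sort benchmark relies on.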

5.4 Effect of Backup Tasks

In Figure 3 (b), we show an execution of the sort program with backup tasks disabled. The execution flow is similar to that shown in Figure 3 (a), except that there is a very long tail where hardly any write activity occurs. After 960 seconds, all except 5 of the reduce tasks are completed. However these last few stragglers don't finish until 300 seconds later. The entire computation takes 1283 seconds, an increase of 44% in elapsed time.

5.5 Machine Failures

In Figure 3 (c), we show an execution of the sort program where we intentionally killed 200 out of 1746 worker processes several minutes into the computation. The underlying cluster scheduler immediately restarted new worker processes on these machines (since only the processes were killed, the machines were still functioning properly).

The worker deaths show up as a negative input rate since some previously completed map work disappears (since the corresponding map workers were killed) and needs to be redone. The re-execution of this map work happens relatively quickly. The entire computation finishes in 933 seconds including startup overhead (just an increase of 5% over the normal execution time).

Usage at Google

6 Experience

We wrote the first version of the MapReduce library in February of 2003, and made significant enhancements to it in August of 2003, including the locality optimization, dynamic load balancing of task execution across worker machines, etc. Since that time, we have been pleasantly surprised at how broadly applicable the MapReduce library has been for the kinds of problems we work on. It has been used across a wide range of domains within Google, including:

• large-scale machine learning problems,
• clustering problems for the Google News and Froogle products,
• extraction of data used to produce reports of popular queries (e.g. Google Zeitgeist),
• extraction of properties of web pages for new experiments and products (e.g. extraction of geographical locations from a large corpus of web pages for localized search), and
• large-scale graph computations.

Figure 4: MapReduce instances over time (number of instances in the source tree, 2003/03 through 2004/09)

Figure 4 shows the significant growth in the number of separate MapReduce programs checked into our primary source code management system over time, from 0 in early 2003 to almost 900 separate instances as of late September 2004. MapReduce has been so successful because it makes it possible to write a simple program and run it efficiently on a thousand machines in the course of half an hour, greatly speeding up the development and prototyping cycle. Furthermore, it allows programmers who have no experience with distributed and/or parallel systems to exploit large amounts of resources easily.

At the end of each job, the MapReduce library logs statistics about the computational resources used by the job. In Table 1, we show some statistics for a subset of MapReduce jobs run at Google in August 2004.

Table 1: MapReduce jobs run in August 2004

Number of jobs                    29,423
Average job completion time      634 secs
Machine days used             79,186 days
Input data read                 3,288 TB
Intermediate data produced        758 TB
Output data written               193 TB
Average worker machines per job      157
Average worker deaths per job        1.2
Average map tasks per job          3,351
Average reduce tasks per job          55
Unique map implementations           395
Unique reduce implementations        269
Unique map/reduce combinations       426

6.1 Large-Scale Indexing

One of our most significant uses of MapReduce to date has been a complete rewrite of the production indexing system.

Conclusion

"[MapReduce] allows us to express the simple computations we were trying to perform but hides the messy details of parallelization, fault-tolerance, data distribution and load balancing in a library." [Introduction, DG04]

My takeaway: simplicity at the cost of generality and performance. However, it is a novel, easy-to-use, quick-to-production system for *simple* data-parallel programs and applications on *commodity* clusters where fault tolerance is a primary goal.
