Prediction API and Distributed Machine Learning

Max Lin Software Engineer | Google

HPC and Cloud Computing Workshop | LBNL & CITRIS UC Berkeley | June 22nd, 2011

Wednesday, June 22, 2011

Outline

• What is Machine Learning?
• Why Machine Learning on the Cloud?
• What is the Google Prediction API?
• How to scale up machine learning algorithms?

“Machine Learning is a study of computer algorithms that improve automatically through experience.”

Training: Input X → Output Y

• The quick brown fox jumped over the lazy dog. → English
• To err is human, but to really foul things up you need a computer. → English
• No hay mal que por bien no venga. → Spanish
• La tercera es la vencida. → Spanish

Model: f(x)

Testing: f(x’) = y’

• To be or not to be -- that is the question → ?
• La fe mueve montañas. → ?

Linear Classifier

The quick brown fox jumped over the lazy dog.

    ‘a’  ...  ‘aardvark’  ...  ‘dog’  ...  ‘the’  ...  ‘montañas’  ...
x [ 0,   ...  0,          ...  1,     ...  1,     ...  0,          ... ]
w [ 0.1, ...  132,        ...  150,   ...  200,   ...  -153,       ... ]

$f(x) = w \cdot x = \sum_{p=1}^{P} w_p * x_p$
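As a minimal sketch of the linear classifier slide, the Python below turns a sentence into a binary word-presence vector x and scores it with f(x) = w · x. The five-word vocabulary and the function names are hypothetical; only the five weight values come from the slide, and reading a positive score as "English" is an assumption.

```python
import re

# Hypothetical five-word vocabulary; a real one would have millions of entries.
VOCAB = ["a", "aardvark", "dog", "the", "montañas"]
# Weight values from the slide; positive values are assumed to vote for English.
WEIGHTS = [0.1, 132.0, 150.0, 200.0, -153.0]


def featurize(sentence):
    """Return a binary vector: 1 if the vocabulary word occurs in the sentence."""
    words = set(re.findall(r"\w+", sentence.lower()))
    return [1 if term in words else 0 for term in VOCAB]


def score(x, w):
    """Linear classifier score f(x) = sum over p of w_p * x_p."""
    return sum(w_p * x_p for w_p, x_p in zip(w, x))


x = featurize("The quick brown fox jumped over the lazy dog.")
print(x)                  # [0, 0, 1, 1, 0]
print(score(x, WEIGHTS))  # 350.0
```

Only 'dog' and 'the' are present, so the score is 150 + 200.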

Training Data

• Input X: N rows (examples) by P columns (features)
• Output Y: N rows, one output per example

Typical machine learning data at Google

• N: 100 billion / 1 billion (mean / median)
• P: 1 billion / 10 million (mean / median)

http://www.flickr.com/photos/mr_t_in_dc/5469563053/

Classifier Training

• Training: Given {(x, y)} and f, minimize the following objective function

$\arg\min_{w} \sum_{i=1}^{N} L(y_i, f(x_i; w)) + R(w)$
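The slide leaves the loss L and the regularizer R unspecified. Purely as an illustration, the sketch below assumes a logistic loss with labels in {-1, +1} and an L2 regularizer with a hypothetical strength `lam`, and evaluates the objective for a given w.

```python
import math


def predict(x, w):
    """Linear model f(x; w) = w . x."""
    return sum(w_p * x_p for w_p, x_p in zip(w, x))


def logistic_loss(y, fx):
    """Assumed loss L(y, f(x)) = log(1 + exp(-y * f(x))) for y in {-1, +1}."""
    return math.log1p(math.exp(-y * fx))


def l2_penalty(w, lam):
    """Assumed regularizer R(w) = (lam / 2) * ||w||^2."""
    return 0.5 * lam * sum(w_p * w_p for w_p in w)


def objective(data, w, lam=0.1):
    """Sum of per-example losses plus the regularizer."""
    return sum(logistic_loss(y, predict(x, w)) for x, y in data) + l2_penalty(w, lam)


data = [([1.0, 0.0], +1), ([0.0, 1.0], -1)]
print(objective(data, [0.0, 0.0]))   # 2 * log(2), since every score is 0
print(objective(data, [1.0, -1.0]))  # smaller: this w separates the two examples
```

Training then amounts to searching for the w that makes this number as small as possible.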

Use Newton’s method?

$w^{t+1} \leftarrow w^t - H(w^t)^{-1} \nabla J(w^t)$
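For intuition, here is a sketch of the Newton update in the simplest possible setting: a single weight (P = 1), so the Hessian is a scalar. The objective (an L2-regularized logistic loss), the toy data, and all names are illustrative assumptions. With P features the Hessian has P × P entries, which is what makes the method costly when P is in the millions or billions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_and_hess(w, data, lam):
    """First and second derivative of J(w) = sum log(1 + exp(-y*x*w)) + lam/2 * w^2."""
    g, h = lam * w, lam
    for x, y in data:
        s = sigmoid(-y * x * w)
        g += -y * x * s
        h += x * x * s * (1.0 - s)
    return g, h


def newton(data, lam=1.0, steps=10):
    w = 0.0
    for _ in range(steps):
        g, h = grad_and_hess(w, data, lam)
        w = w - g / h  # w <- w - H(w)^-1 * grad J(w), with a 1x1 Hessian
    return w


data = [(1.0, +1), (2.0, +1), (-1.0, -1), (0.5, -1)]
w_star = newton(data)
print(w_star, grad_and_hess(w_star, data, 1.0)[0])  # the gradient is ~0 at the optimum
```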

http://www.flickr.com/photos/visitfinland/5424369765/

Why Machine Learning in the Cloud?

http://www.flickr.com/photos/turtlemom_nancy/2046347762/sizes/l/in/photostream/

Google Prediction API

Prediction API

• Machine learning as a web service
• Use Google’s machine learning algorithms
• Use Google’s computing infrastructure
• Beta in 2010, generally available in 2011

Step 1: Upload

• Training data: inputs to outputs
• Data format: Comma-Separated Values
    english, "to err is human, but to really ..."
    spanish, "no hay mal que por bien no venga."
• Upload to Google Storage
    $ gsutil cp my_data gs://my/data
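A file in this label-then-text layout can be produced with Python's standard `csv` module, as in the sketch below. The example rows reuse sentences from earlier slides. The helper name `to_csv` is hypothetical, and whether the service expects a particular quoting convention is not stated on the slide, so the default minimal quoting is an assumption.

```python
import csv
import io

# Label first, then the text, matching the "english, ..." rows on the slide.
examples = [
    ("english", "to err is human, but to really foul things up you need a computer."),
    ("spanish", "no hay mal que por bien no venga."),
]


def to_csv(rows):
    """Serialize (label, text) pairs; csv quotes fields that contain commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(examples))
```

The resulting text can be saved as `my_data` and uploaded with the `gsutil cp` command shown on the slide.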

Step 2: Train

• To train a model
    POST prediction/v1.2/training
    {id: 'my/data'}
• To see if a training job is finished
    GET prediction/v1.2/training/my%2Fdata

Step 3: Predict

• Make a prediction
    POST prediction/v1.2/training/my%2Fdata
    {input: {csvInstance: ["To be or not to be ..."]}}
• Example response
    {outputLabel: "english",
     outputMulti: [{"label": "english", "score": 0.92},
                   {"label": "spanish", "score": 0.08}]}

Step 4: Adapt

• Stream new data to update the model
    PUT prediction/v1.2/training/my%2Fdata
    {classLabel: "french", csvInstance: ["J’aime X! C’est le meilleur"]}
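The request bodies in Steps 2 to 4 are plain JSON, and the `my%2Fdata` in the paths is the percent-encoded form of `my/data`. The sketch below only builds the method, path, and body for each call, using the paths and field names from the slides. The helper names are hypothetical, and the host, authentication, and headers are not given in the slides, so nothing is actually sent.

```python
import json
from urllib.parse import quote

BASE = "prediction/v1.2/training"  # path prefix as written on the slides


def model_path(data_id):
    """Percent-encode the bucket/object id so the slash survives in a URL path."""
    return f"{BASE}/{quote(data_id, safe='')}"


def train_request(data_id):
    return ("POST", BASE, json.dumps({"id": data_id}))


def predict_request(data_id, text):
    body = {"input": {"csvInstance": [text]}}
    return ("POST", model_path(data_id), json.dumps(body))


def adapt_request(data_id, label, text):
    body = {"classLabel": label, "csvInstance": [text]}
    return ("PUT", model_path(data_id), json.dumps(body))


for request in (
    train_request("my/data"),
    predict_request("my/data", "To be or not to be ..."),
    adapt_request("my/data", "french", "J'aime X! C'est le meilleur"),
):
    print(*request)
```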

• Sign up for the Google Prediction API in the Google APIs Console
    http://code.google.com/apis/console
• More info about the Prediction API
    http://code.google.com/apis/predict

Scaling Up

• Why big data?
• Parallelize machine learning algorithms
    • Embarrassingly parallel
    • Parallelize sub-routines
    • Distributed learning

Subsampling

• Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M
• Reduce N: keep only Shard 1
• Train on a single Machine using Shard 1
• Output: Model
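A sketch of subsampling in Python: split the data into M shards and keep only the first, so that N shrinks to roughly N / M. Shuffling before sharding is an assumption (the slide does not say how shards are formed), and the function name is hypothetical.

```python
import random


def subsample(examples, num_shards, seed=0):
    """Shuffle, split into num_shards roughly equal shards, and keep only the first."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    shards = [shuffled[m::num_shards] for m in range(num_shards)]
    return shards[0]


big_data = list(range(1000))          # stand-in for N = 1000 training examples
shard_1 = subsample(big_data, num_shards=10)
print(len(shard_1))                   # 100 examples, small enough for one machine
```

The remaining M - 1 shards are simply discarded.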

Why not Small Data?

[Banko and Brill, 2001]

Parallelize Optimization

• Maximum Entropy Classifiers

$\arg\min_{w} \prod_{i=1}^{N} \frac{\exp\left(\sum_{p=1}^{P} w_p * x_p^i\right)^{y_i}}{1 + \exp\left(\sum_{p=1}^{P} w_p * x_p^i\right)}$

• Good: J(w) is concave
• Bad: no closed-form solution, unlike Naive Bayes (NB)
• Ugly: Large N

Gradient Descent

http://www.cs.cmu.edu/~epxing/Class/10701/Lecture/lecture7.pdf

• w is initialized as zero
• for t in 1 to T
    • Calculate gradients $\nabla J(w)$
    • $w^{t+1} \leftarrow w^t - \eta \nabla J(w)$

$\nabla J(w) = \sum_{i=1}^{N} P(w, x_i, y_i)$
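The loop above can be sketched in plain Python for a binary maximum entropy (logistic) classifier. Here J is taken to be the negative log-likelihood, so that descending the gradient improves the fit. That sign convention, the toy data, the step size `eta`, and the fixed iteration count `T` are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def example_gradient(w, x, y):
    """Per-example term: gradient of the negative log-likelihood, (sigmoid(w.x) - y) * x."""
    p = sigmoid(sum(w_p * x_p for w_p, x_p in zip(w, x)))
    return [(p - y) * x_p for x_p in x]


def gradient_descent(data, num_features, eta=0.5, T=200):
    w = [0.0] * num_features                      # w is initialized as zero
    for _ in range(T):                            # for t in 1 to T
        grad = [0.0] * num_features
        for x, y in data:                         # sum of per-example gradients
            g = example_gradient(w, x, y)
            grad = [a + b for a, b in zip(grad, g)]
        w = [w_p - eta * g_p for w_p, g_p in zip(w, grad)]  # w <- w - eta * grad
    return w


# Toy data: feature 0 is a bias term, feature 1 decides the label (y in {0, 1}).
data = [([1.0, 2.0], 1), ([1.0, 1.0], 1), ([1.0, -1.0], 0), ([1.0, -2.0], 0)]
w = gradient_descent(data, num_features=2)
print([round(sigmoid(w[0] + w[1] * x1), 3) for x1 in (2.0, -2.0)])
```

Each iteration touches all N examples and all P features, which is where the O(TPN) training cost comes from.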

Distribute Gradient

• w is initialized as zero
• for t in 1 to T
    • Calculate gradients in parallel
    • $w^{t+1} \leftarrow w^t - \eta \nabla J(w)$
• Training CPU: O(TPN) to O(TPN / M)

• Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M
• Map: Machine m processes Shard m and emits (dummy key, partial gradient sum)
• Reduce: Sum and update w
• Repeat M/R until convergence
• Output: Model
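The Map and Reduce roles can be simulated in a single Python process, as sketched below. The mappers run sequentially here, whereas a real job would run them on M machines. The logistic gradient, the toy data, and the fixed number of rounds `T` (in place of a convergence test) are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def map_shard(shard, w):
    """Map: one machine sums the gradients of its shard and emits one record."""
    partial = [0.0] * len(w)
    for x, y in shard:
        p = sigmoid(sum(w_p * x_p for w_p, x_p in zip(w, x)))
        partial = [g + (p - y) * x_p for g, x_p in zip(partial, x)]
    return ("dummy", partial)


def reduce_and_update(records, w, eta):
    """Reduce: sum the partial gradients under the single key, then update w."""
    total = [sum(column) for column in zip(*(partial for _, partial in records))]
    return [w_p - eta * g_p for w_p, g_p in zip(w, total)]


def train(shards, num_features, eta=0.5, T=200):
    w = [0.0] * num_features
    for _ in range(T):  # one Map/Reduce round per iteration
        records = [map_shard(shard, w) for shard in shards]  # parallel in a real job
        w = reduce_and_update(records, w, eta)
    return w


data = [([1.0, 2.0], 1), ([1.0, 1.0], 1), ([1.0, -1.0], 0), ([1.0, -2.0], 0)]
shards = [data[:2], data[2:]]  # M = 2 shards
print(train(shards, num_features=2))
```

Because the gradient is a sum over examples, sharding changes only where the work happens, not the result: training on two shards gives the same w as training on one, up to floating-point rounding.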

Parallelize Subroutines

• Support Vector Machines

$\arg\min_{w,b,\zeta} \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \zeta_i$
$\text{s.t. } 1 - y_i (w \cdot \phi(x_i) + b) \le \zeta_i, \quad \zeta_i \ge 0$

• Solve the dual problem

$\arg\min_{\alpha} \frac{1}{2}\alpha^T Q \alpha - \alpha^T \mathbf{1}$
$\text{s.t. } 0 \le \alpha \le C, \quad y^T \alpha = 0$

The computational cost for the Primal-Dual Interior Point Method is O(n^3) in time and O(n^2) in memory

http://www.flickr.com/photos/sea-turtle/198445204/

Parallel SVM [Chang et al, 2007]

• Parallel, row-wise incomplete Cholesky Factorization for Q
• Parallel interior point method
    • Time O(n^3) becomes O(n^2 / M)
    • Memory O(n^2) becomes O(n √N / M)
• Parallel Support Vector Machines (psvm) http://code.google.com/p/psvm/
    • Implemented in MPI

Majority Vote

• Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M
• Map: Machine m trains Model m on Shard m

• Train individual classifiers independently
• Predict by taking majority votes
• Training CPU: O(TPN) to O(TPN / M)
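A minimal sketch of majority voting follows. The slides do not name the base learner, so a small perceptron stands in for it as an assumption. The shards are toy data, and M = 3 is chosen odd so that a two-class vote cannot tie.

```python
from collections import Counter


def train_perceptron(shard, num_features, epochs=10):
    """Stand-in base learner (an assumption); labels are -1 or +1."""
    w = [0.0] * num_features
    for _ in range(epochs):
        for x, y in shard:
            if y * sum(w_p * x_p for w_p, x_p in zip(w, x)) <= 0:
                w = [w_p + y * x_p for w_p, x_p in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(w_p * x_p for w_p, x_p in zip(w, x)) > 0 else -1


def majority_vote(models, x):
    """Each model casts one vote; the most common label wins."""
    votes = Counter(predict(w, x) for w in models)
    return votes.most_common(1)[0][0]


shards = [
    [([1.0, 2.0], 1), ([1.0, -1.0], -1)],
    [([1.0, 1.5], 1), ([1.0, -2.0], -1)],
    [([1.0, 3.0], 1), ([1.0, -0.5], -1)],
]
models = [train_perceptron(shard, num_features=2) for shard in shards]  # independent
print(majority_vote(models, [1.0, 2.5]), majority_vote(models, [1.0, -2.5]))  # 1 -1
```

No communication is needed during training, but every prediction has to consult all M models.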

Parameter Mixture [Mann et al, 2009]

• Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M
• Map: each machine trains on its own shard and emits (dummy key, w1), (dummy key, w2), ...
• Reduce: Average w
• Output: Model

Much less network usage than distributed gradient descent: O(MN) vs. O(MNT)
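The sketch below simulates a parameter mixture in one process: each shard is trained to completion on its own, and the weight vectors are averaged once at the end. The uniform average, the logistic base learner, and the toy shards are assumptions; the cited paper should be consulted for the exact mixture weights.

```python
import math


def train_shard(shard, num_features, eta=0.5, T=100):
    """Map: fit a logistic model on one shard with plain gradient descent."""
    w = [0.0] * num_features
    for _ in range(T):
        grad = [0.0] * num_features
        for x, y in shard:
            p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(w, x))))
            grad = [g + (p - y) * x_p for g, x_p in zip(grad, x)]
        w = [w_p - eta * g_p for w_p, g_p in zip(w, grad)]
    return ("dummy", w)


def average_weights(records):
    """Reduce: uniform average of the M weight vectors."""
    vectors = [w for _, w in records]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


shards = [
    [([1.0, 2.0], 1), ([1.0, -1.0], 0)],
    [([1.0, 1.0], 1), ([1.0, -2.0], 0)],
]
records = [train_shard(shard, num_features=2) for shard in shards]  # one pass, no rounds
w = average_weights(records)
print(w)
```

The single Reduce at the end is why the network cost has no factor of T.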

http://www.flickr.com/photos/annamatic3000/127945652/

Iterative Param Mixture [McDonald et al., 2010]

• Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M
• Map: each machine trains on its own shard and emits (dummy key, w1), (dummy key, w2), ...
• Reduce after each epoch: Average w
• Output: Model
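In the iterative variant, the average is taken after every epoch and fed back to all machines as the starting point for the next one. The sketch below assumes a perceptron as the base learner, a uniform average, and a fixed number of epochs; the data and names are illustrative.

```python
def perceptron_epoch(shard, w):
    """Map: one pass of perceptron updates over a shard, starting from the shared w."""
    w = list(w)
    for x, y in shard:  # labels are -1 or +1
        if y * sum(w_p * x_p for w_p, x_p in zip(w, x)) <= 0:
            w = [w_p + y * x_p for w_p, x_p in zip(w, x)]
    return ("dummy", w)


def average_weights(records):
    """Reduce: uniform average of the M weight vectors."""
    vectors = [w for _, w in records]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def iterative_parameter_mixture(shards, num_features, epochs=5):
    w = [0.0] * num_features
    for _ in range(epochs):
        records = [perceptron_epoch(shard, w) for shard in shards]  # parallel Map
        w = average_weights(records)  # Reduce after each epoch, then redistribute w
    return w


shards = [
    [([1.0, 2.0], 1), ([1.0, -1.0], -1)],
    [([1.0, 1.5], 1), ([1.0, -2.0], -1)],
    [([1.0, 3.0], 1), ([1.0, -0.5], -1)],
]
w = iterative_parameter_mixture(shards, num_features=2)
print(w)
```

Compared with a single final average, this costs one Reduce per epoch but lets every shard benefit from what the others have learned.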

“Machine Learning is a study of computer algorithms that improve automatically through experience.”

Google Prediction API

Machine Learning as a Web Service

Parallelize ML Algorithms

• Embarrassingly parallel
• Parallelize sub-routines
• Distributed learning

Google APIs

• Prediction API
    • machine learning service on the cloud
    • http://code.google.com/apis/predict
• BigQuery
    • interactive analysis of massive data on the cloud
    • http://code.google.com/apis/bigquery