Google Prediction API and Distributed Machine Learning
Max Lin Software Engineer | Google
HPC and Cloud Computing Workshop | LBNL & CITRIS UC Berkeley | June 22nd, 2011
Wednesday, June 22, 2011
Outline
• What is Machine Learning?
• Why Machine Learning on the Cloud?
• What is Google Prediction API?
• How to scale up machine learning algorithms?
“Machine Learning is a study of computer algorithms that improve automatically through experience.”
Training (Input X → Output Y)
• The quick brown fox jumped over the lazy dog. → English
• To err is human, but to really foul things up you need a computer. → English
• No hay mal que por bien no venga. → Spanish
• La tercera es la vencida. → Spanish
Model f(x)
Testing: f(x’) = y’
• To be or not to be -- that is the question → ?
• La fe mueve montañas. → ?
Linear Classifier
The quick brown fox jumped over the lazy dog.
    ‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ...
x = [ 0, ... 0, ... 1, ... 1, ... 0, ... ]
w = [ 0.1, ... 132, ... 150, ... 200, ... -153, ... ]
$$f(x) = w \cdot x = \sum_{p=1}^{P} w_p x_p$$
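A minimal sketch of the scoring step above, assuming binary word-presence features and only the five vocabulary words and weights visible on the slide. The helper names (`tokenize`, `featurize`, `score`) and the reading that a positive score means English are illustrative assumptions, not something the talk specifies.

```python
import re

# Vocabulary and weights copied from the slide; the real vectors are far longer.
VOCAB = ["a", "aardvark", "dog", "the", "montañas"]
W = [0.1, 132, 150, 200, -153]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def featurize(text, vocab=VOCAB):
    """Build x: 1 if the vocabulary word occurs in the text, else 0."""
    words = set(tokenize(text))
    return [1 if term in words else 0 for term in vocab]

def score(w, x):
    """f(x) = w . x = sum over p of w_p * x_p."""
    return sum(wp * xp for wp, xp in zip(w, x))

x_en = featurize("The quick brown fox jumped over the lazy dog.")
x_es = featurize("La fe mueve montañas.")
f_en = score(W, x_en)  # only 'dog' and 'the' fire
f_es = score(W, x_es)  # only 'montañas' fires
```

With these weights the English sentence gets a large positive score and the Spanish one a negative score, which is one plausible way to read the sign of f(x) as the predicted language.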
Training Data
Input X: N rows (examples) by P columns (features)
Output Y: one label per row
Typical machine learning data at Google (mean / median)
• N: 100 billion / 1 billion
• P: 1 billion / 10 million
http://www.flickr.com/photos/mr_t_in_dc/5469563053/
Classifier Training
• Training: Given {(x, y)} and f, minimize the following objective function
$$\arg\min_{w} \sum_{i=1}^{N} L(y_i, f(x_i; w)) + R(w)$$
Use Newton’s method?
$$w^{t+1} \leftarrow w^{t} - H(w^{t})^{-1} \nabla J(w^{t})$$
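One way to read the question mark: the Hessian H has P × P entries, so at the feature counts quoted for typical Google data it cannot even be stored densely. The arithmetic below is a rough sketch that assumes a dense matrix of 8-byte floats; the function name and the byte size are assumptions made for illustration.

```python
def dense_hessian_bytes(num_features, bytes_per_entry=8):
    """Memory needed to store a dense P x P Hessian."""
    return num_features ** 2 * bytes_per_entry

MEDIAN_P = 10_000_000      # 10 million features (median, from the slide)
MEAN_P = 1_000_000_000     # 1 billion features (mean, from the slide)

median_tb = dense_hessian_bytes(MEDIAN_P) / 1e12  # terabytes
mean_tb = dense_hessian_bytes(MEAN_P) / 1e12      # terabytes
print(f"median P: {median_tb:,.0f} TB, mean P: {mean_tb:,.0f} TB")
```

Under these assumptions the median case already needs about 800 TB for H alone, before any inversion cost is counted.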
http://www.flickr.com/photos/visitfinland/5424369765/
Why Machine Learning in the Cloud?
http://www.flickr.com/photos/turtlemom_nancy/2046347762/sizes/l/in/photostream/
Google Prediction API
Prediction API
• Machine learning as a web service
• Use Google’s machine learning algorithms
• Use Google’s computing infrastructure
• Beta in 2010, generally available in 2011
Step 1: Upload
• Training data: inputs to outputs
• Data format: Comma-Separated Values
  english, “to err is human, but to really ...”
  spanish, “no hay mal que por bien no venga.”
• Upload to Google Storage
  $ gsutil cp my_data gs://my/data
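A small sketch of producing a training file in the label-first CSV layout shown above, using Python's standard `csv` module. The example rows come from the earlier slides; the function name `write_training_csv` and the exact quoting rules are assumptions, since the slides do not spell out the full format the service expects.

```python
import csv
import io

def write_training_csv(examples, stream):
    """Write one row per example: label first, then the input text."""
    writer = csv.writer(stream, lineterminator="\n")
    for label, text in examples:
        # csv.writer quotes a field automatically when it contains a comma.
        writer.writerow([label, text])

examples = [
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
]

buffer = io.StringIO()
write_training_csv(examples, buffer)
training_csv = buffer.getvalue()
print(training_csv)
```

Writing to a file object instead of `io.StringIO` would give the `my_data` file that the `gsutil cp` step uploads.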
Step 2: Train
• To train a model
  POST prediction/v1.2/training
  {id: ‘my/data’}
• To see if a training job is finished
  GET prediction/v1.2/training/my%2Fdata
Step 3: Predict
• Make a prediction
  POST prediction/v1.2/training/my%2Fdata
  {input: {csvInstance: [“To be or not to be ...”]}}
• {outputLabel: “english”, outputMulti: [{“label”: “english”, “score”: 0.92}, {“label”: “spanish”, “score”: 0.08}]}
Step 4: Adapt
• Stream new data to new model
  PUT prediction/v1.2/training/my%2Fdata
  {classLabel: “french”, csvInstance: [“J’aime X! C’est le meilleur”]}
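The four steps can be sketched as plain request builders. This is only an illustration of the paths and JSON bodies shown on the slides: it sends nothing over the network, the function names are hypothetical, and the host name and authentication are deliberately left out because the slides do not show them.

```python
import json
from urllib.parse import quote

API_ROOT = "prediction/v1.2/training"  # relative path as written on the slides

def model_path(model_id):
    """URL-encode the model id so 'my/data' becomes 'my%2Fdata'."""
    return f"{API_ROOT}/{quote(model_id, safe='')}"

def train_request(model_id):
    return ("POST", API_ROOT, json.dumps({"id": model_id}))

def status_request(model_id):
    return ("GET", model_path(model_id), None)

def predict_request(model_id, text):
    body = {"input": {"csvInstance": [text]}}
    return ("POST", model_path(model_id), json.dumps(body))

def adapt_request(model_id, label, text):
    body = {"classLabel": label, "csvInstance": [text]}
    return ("PUT", model_path(model_id), json.dumps(body))

def top_label(response_json):
    """Pick the highest-scoring label from an outputMulti list."""
    scores = json.loads(response_json)["outputMulti"]
    return max(scores, key=lambda entry: entry["score"])["label"]

sample_response = json.dumps({
    "outputLabel": "english",
    "outputMulti": [{"label": "english", "score": 0.92},
                    {"label": "spanish", "score": 0.08}],
})
print(predict_request("my/data", "To be or not to be ..."))
print(top_label(sample_response))
```

The `safe=''` argument matters: by default `quote` leaves `/` unescaped, which would not produce the `my%2Fdata` form used on the slides.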
• Sign up for the Google Prediction API in the Google APIs Console: http://code.google.com/apis/console
• More info about the Prediction API: http://code.google.com/apis/predict
Scaling Up
• Why big data?
• Parallelize machine learning algorithms
  • Embarrassingly parallel
  • Parallelize sub-routines
  • Distributed learning
Subsampling
Big Data → Shard 1, Shard 2, Shard 3, ..., Shard M
Reduce N: keep only Shard 1 → Machine → Model
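A toy sketch of the subsampling idea, assuming round-robin sharding of an in-memory list; the `shard` helper and the choice of sharding scheme are illustrative assumptions, since the slide only shows that a single shard is kept.

```python
def shard(data, num_shards):
    """Split data into num_shards round-robin shards."""
    return [data[i::num_shards] for i in range(num_shards)]

big_data = list(range(1000))   # stand-in for N training examples
shards = shard(big_data, 10)   # M = 10 shards
subsample = shards[0]          # reduce N: train on Shard 1 only
print(len(big_data), "->", len(subsample))
```

Training then runs on one machine with N / M examples, at the price of discarding the rest of the data.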
Why not Small Data?
[Banko and Brill, 2001]
Parallelize Optimization
• Maximum Entropy Classifiers
$$\arg\min_{w} \prod_{i=1}^{N} \frac{\exp\left(\sum_{p=1}^{P} w_p x_p^i\right)^{y_i}}{1 + \exp\left(\sum_{p=1}^{P} w_p x_p^i\right)}$$
• Good: J(w) is concave
• Bad: no closed-form solution like Naive Bayes (NB)
• Ugly: Large N
Gradient Descent
http://www.cs.cmu.edu/~epxing/Class/10701/Lecture/lecture7.pdf
• w is initialized as zero
• for t in 1 to T
  • Calculate gradients $\nabla J(w)$
  • $w^{t+1} \leftarrow w^{t} - \eta \nabla J(w)$
$$\nabla J(w) = \sum_{i=1}^{N} P(w, x_i, y_i)$$
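The loop above can be sketched in plain Python for a two-class logistic model. This sketch assumes J(w) is the negative log-likelihood with labels y in {0, 1}, so that a descent step is the right direction; the toy features (bias, contains "the", contains "la"), the step size, and the function names are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def example_gradient(w, x, y):
    """Gradient of the negative log-likelihood for one example."""
    p = sigmoid(sum(wp * xp for wp, xp in zip(w, x)))
    return [(p - y) * xp for xp in x]

def gradient_descent(data, num_features, eta=0.5, T=200):
    w = [0.0] * num_features                  # w is initialized as zero
    for _ in range(T):                        # for t in 1 to T
        grad = [0.0] * num_features
        for x, y in data:                     # gradient is a sum over examples
            g = example_gradient(w, x, y)
            grad = [a + b for a, b in zip(grad, g)]
        w = [wp - eta * gp for wp, gp in zip(w, grad)]  # update step
    return w

# Toy data: x = [bias, contains 'the', contains 'la'], y = 1 for English.
data = [
    ([1.0, 1.0, 0.0], 1), ([1.0, 1.0, 0.0], 1),
    ([1.0, 0.0, 1.0], 0), ([1.0, 0.0, 1.0], 0),
]
w = gradient_descent(data, num_features=3)
p_english = sigmoid(sum(wp * xp for wp, xp in zip(w, [1.0, 1.0, 0.0])))
p_spanish = sigmoid(sum(wp * xp for wp, xp in zip(w, [1.0, 0.0, 1.0])))
```

Each pass touches all N examples and all P features, which is where the O(TPN) training cost quoted in the talk comes from.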
Distribute Gradient
• w is initialized as zero
• for t in 1 to T
  • Calculate gradients in parallel
  • $w^{t+1} \leftarrow w^{t} - \eta \nabla J(w)$
• Training CPU: from O(TPN) to O(TPN / M) with M machines
Distribute Gradient
Big Data → Shard 1, Shard 2, Shard 3, ..., Shard M
Map: Machine 1, Machine 2, Machine 3, ..., Machine M each process one shard and emit (dummy key, partial gradient sum)
Reduce: Sum and Update w
Repeat Map/Reduce until convergence → Model
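The Map/Reduce loop above can be simulated on one machine. In this sketch the map step is an ordinary function call per shard rather than a real MapReduce job, J(w) is again assumed to be the logistic negative log-likelihood, and all names and toy data are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def map_partial_gradient(w, shard):
    """Map: one machine sums the gradients of its own shard."""
    partial = [0.0] * len(w)
    for x, y in shard:
        err = sigmoid(sum(wp * xp for wp, xp in zip(w, x))) - y
        for p, xp in enumerate(x):
            partial[p] += err * xp
    return ("dummy", partial)

def reduce_and_update(w, mapped, eta):
    """Reduce: sum the partial gradients under the dummy key, then update w."""
    total = [0.0] * len(w)
    for _key, partial in mapped:
        for p, g in enumerate(partial):
            total[p] += g
    return [wp - eta * gp for wp, gp in zip(w, total)]

def distributed_gradient_descent(shards, num_features, eta=0.5, T=200):
    w = [0.0] * num_features
    for _ in range(T):  # one Map/Reduce round per iteration
        mapped = [map_partial_gradient(w, s) for s in shards]  # M machines in practice
        w = reduce_and_update(w, mapped, eta)
    return w

english = ([1.0, 1.0, 0.0], 1)   # [bias, contains 'the', contains 'la']
spanish = ([1.0, 0.0, 1.0], 0)
shards = [[english, spanish], [english, spanish]]
w_distributed = distributed_gradient_descent(shards, num_features=3)
w_single = distributed_gradient_descent([[english, spanish, english, spanish]], 3)
```

Because the gradient is a sum over examples, sharding changes only where the partial sums are computed, so the sharded run matches the single-machine run up to floating-point rounding.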
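Parallelize Subroutines
• Support Vector Machines
$$\arg\min_{w, b, \zeta} \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \zeta_i \quad \text{s.t. } 1 - y_i (w \cdot \phi(x_i) + b) \le \zeta_i,\ \zeta_i \ge 0$$
• Solve the dual problem
$$\arg\min_{\alpha} \frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1} \quad \text{s.t. } 0 \le \alpha \le C,\ y^T \alpha = 0$$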
The computational cost for the Primal-Dual Interior Point Method is O(n^3) in time and O(n^2) in memory
http://www.flickr.com/photos/sea-turtle/198445204/
Parallel SVM [Chang et al, 2007]
• Parallel, row-wise incomplete Cholesky Factorization for Q
• Parallel interior point method
  • Time O(n^3) becomes O(n^2 / M)
  • Memory O(n^2) becomes O(n √N / M)
• Parallel Support Vector Machines (psvm) http://code.google.com/p/psvm/
  • Implemented in MPI
Majority Vote
Big Data → Shard 1, Shard 2, Shard 3, ..., Shard M
Map: Machine 1, Machine 2, Machine 3, ..., Machine M each train on one shard → Model 1, Model 2, Model 3, ..., Model M
Majority Vote
• Train individual classifiers independently
• Predict by taking majority votes
• Training CPU: from O(TPN) to O(TPN / M)
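A compact sketch of the two bullets above: one classifier per shard, then a vote at prediction time. A simple perceptron stands in for the per-shard learner, which is an assumption since the slides do not name one; the toy features and all function names are likewise illustrative.

```python
from collections import Counter

def dot(w, x):
    return sum(wp * xp for wp, xp in zip(w, x))

def train_perceptron(shard, num_features, epochs=10):
    """Train one independent model on a single shard (labels are +1 / -1)."""
    w = [0.0] * num_features
    for _ in range(epochs):
        for x, y in shard:
            if y * dot(w, x) <= 0:  # misclassified: move w toward y * x
                w = [wp + y * xp for wp, xp in zip(w, x)]
    return w

def predict(w, x):
    return 1 if dot(w, x) > 0 else -1

def majority_vote(models, x):
    """Each model votes; the most common label wins."""
    votes = Counter(predict(w, x) for w in models)
    return votes.most_common(1)[0][0]

english = ([1.0, 1.0, 0.0], 1)    # [bias, contains 'the', contains 'la']
spanish = ([1.0, 0.0, 1.0], -1)
shards = [
    [english, spanish],
    [spanish, english],
    [english, english, spanish],
]
models = [train_perceptron(s, num_features=3) for s in shards]  # no communication
label_en = majority_vote(models, [1.0, 1.0, 0.0])
label_es = majority_vote(models, [1.0, 0.0, 1.0])
```

An odd number of models avoids ties in the two-class case. Note that all M models must be kept and evaluated at prediction time.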
Parameter Mixture [Mann et al, 2009]
Big Data → Shard 1, Shard 2, Shard 3, ..., Shard M
Map: Machine 1, Machine 2, Machine 3, ..., Machine M each train on one shard and emit (dummy key, w1), (dummy key, w2), ...
Reduce: Average w → Model
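A sketch of the one-shot parameter mixture in the diagram: each shard trains its own weight vector to completion, and a single reduce step averages them. The per-shard learner here is batch gradient descent on a logistic model and the average is uniform; both are simplifying assumptions for illustration, as are the names and toy data.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wp * xp for wp, xp in zip(w, x))

def train_on_shard(shard, num_features, eta=0.5, T=100):
    """Map: fit a full model on one shard, then emit (dummy key, w)."""
    w = [0.0] * num_features
    for _ in range(T):
        grad = [0.0] * num_features
        for x, y in shard:
            err = sigmoid(dot(w, x)) - y
            for p, xp in enumerate(x):
                grad[p] += err * xp
        w = [wp - eta * gp for wp, gp in zip(w, grad)]
    return ("dummy", w)

def average_weights(mapped):
    """Reduce: uniform average of the per-shard weight vectors."""
    vectors = [w for _key, w in mapped]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

english = ([1.0, 1.0, 0.0], 1)   # [bias, contains 'the', contains 'la']
spanish = ([1.0, 0.0, 1.0], 0)
shards = [
    [english, spanish],
    [english, english, spanish],
    [english, spanish, spanish],
]
mapped = [train_on_shard(s, num_features=3) for s in shards]  # runs once per shard
w_mixed = average_weights(mapped)
```

Unlike majority vote, the result is a single model, and the weights cross the network only once, after local training has finished.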
Much less network usage than distributed gradient descent: O(MN) vs. O(MNT)
http://www.flickr.com/photos/annamatic3000/127945652/
Iterative Param Mixture [McDonald et al., 2010]
Big Data → Shard 1, Shard 2, Shard 3, ..., Shard M
Map: Machine 1, Machine 2, Machine 3, ..., Machine M each train on one shard and emit (dummy key, w1), (dummy key, w2), ...
Reduce after each epoch: Average w → Model
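A sketch of the iterative variant in the diagram: every epoch, each shard starts from the current averaged weights, makes one local pass, and the results are averaged again. A plain binary perceptron is used as the local learner and the average is uniform; these choices, the function names, and the toy data are assumptions for illustration.

```python
def dot(w, x):
    return sum(wp * xp for wp, xp in zip(w, x))

def perceptron_epoch(w, shard):
    """Map: one pass over a shard, starting from the shared weights w."""
    w = list(w)  # work on a local copy
    for x, y in shard:  # labels are +1 / -1
        if y * dot(w, x) <= 0:
            w = [wp + y * xp for wp, xp in zip(w, x)]
    return ("dummy", w)

def average_weights(mapped):
    """Reduce: uniform average of the per-shard weight vectors."""
    vectors = [w for _key, w in mapped]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def iterative_parameter_mixture(shards, num_features, epochs=5):
    w = [0.0] * num_features
    for _ in range(epochs):
        mapped = [perceptron_epoch(w, s) for s in shards]  # M machines in practice
        w = average_weights(mapped)                        # reduce after each epoch
    return w

english = ([1.0, 1.0, 0.0], 1)    # [bias, contains 'the', contains 'la']
spanish = ([1.0, 0.0, 1.0], -1)
shards = [[english, spanish], [spanish, english]]
w_final = iterative_parameter_mixture(shards, num_features=3)
```

Compared with the one-shot mixture, the weights are exchanged once per epoch, which costs more network traffic than a single average but far less than one exchange per gradient step.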
“Machine Learning is a study of computer algorithms that improve automatically through experience.”
Google Prediction API
Machine Learning as a Web Service
Parallelize ML Algorithms
• Embarrassingly parallel
• Parallelize sub-routines
• Distributed learning
Google APIs
• Prediction API
  • machine learning service on the cloud
  • http://code.google.com/apis/predict
• BigQuery
  • interactive analysis of massive data on the cloud
  • http://code.google.com/apis/bigquery