
RNN and Autoencoder

CS273B lecture 5: RNN and Autoencoder

James Zou October 10, 2016

Recap: feedforward and convnets

Main take-aways:

• Composition. Units/layers of a NN are modular and can be composed to form complex architectures.

• Weight-sharing. Enforcing that the weights be equal across a set of units can dramatically decrease the # of parameters.

What are limitations of convnets?

• Fixed input length.

• Unclear how to adapt to time-series data.

• The convolutional structure corresponds to a strong prior, not appropriate for many biological settings.

• Could require many labeled training examples.

Recurrent neural network

[Diagram: input x_t feeds the hidden units h_t, which feed the output y_t; the hidden units also receive the previous hidden state h_{t-1} through a recurrent connection.]

h_t = f(W_h · h_{t-1} + W_x · x_t + b)
y_t = W_y · h_t + c
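A minimal numpy sketch of the recurrence above, assuming tanh as the nonlinearity f; the names W_h, W_x, W_y, b, c mirror the reconstructed equations and the toy dimensions are illustrative.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_h, W_x, b, W_y, c):
    """One step of a vanilla RNN: update the hidden state, then emit an output."""
    h_t = np.tanh(W_h @ h_prev + W_x @ x_t + b)   # h_t = f(W_h h_{t-1} + W_x x_t + b)
    y_t = W_y @ h_t + c                           # y_t = W_y h_t + c
    return h_t, y_t

# Toy dimensions (illustrative): 4-dim input, 8 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W_h, W_x = rng.normal(size=(8, 8)), rng.normal(size=(8, 4))
W_y, b, c = rng.normal(size=(2, 8)), np.zeros(8), np.zeros(2)

h = np.zeros(8)                          # initial hidden state
for x_t in rng.normal(size=(5, 4)):      # unroll over a length-5 sequence
    h, y = rnn_step(x_t, h, W_h, W_x, b, W_y, c)
```

The same weights are reused at every time step, so the parameter count does not grow with sequence length (weight-sharing across time).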

What does RNN remind you of?

Vanilla RNN: lacks long-term memory

LSTM network

Hochreiter, Schmidhuber 1997

LSTM: inside the hood

[Diagram: the cell state C_t acts as a memory running through the chain, modified by gates.]

f_t = σ(W_f [h_{t-1}, x_t] + b_f)        (forget gate)
i_t = σ(W_i [h_{t-1}, x_t] + b_i)        (input gate)
C̃_t = tanh(W_C [h_{t-1}, x_t] + b_C)     (new candidate memory)
C_t = f_t · C_{t-1} + i_t · C̃_t          (memory update)
o_t = σ(W_o [h_{t-1}, x_t] + b_o)        (output gate)
h_t = o_t · tanh(C_t)                    (output)

Figure adapted from Olah blog.
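A minimal numpy sketch of one LSTM step following the equations above; each weight matrix acts on the concatenation [h_{t-1}, x_t], and the gate names mirror the slide. All dimensions and initializations are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One LSTM step (sketch). Each W acts on the concatenation [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)          # forget gate
    i_t = sigmoid(W_i @ z + b_i)          # input gate (weight of new memory)
    C_tilde = np.tanh(W_C @ z + b_C)      # new candidate memory
    C_t = f_t * C_prev + i_t * C_tilde    # updated cell memory
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * np.tanh(C_t)              # new hidden state / output
    return h_t, C_t
```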

LSTM summary

• LSTM is a variant of RNN that makes it easier to retain long-range interactions.

• Parameters of LSTM:

W_f, b_f: forget gate

W_C, b_C: new memory

W_i, b_i: weight of new memory (input gate)

W_o, b_o: output gate

LSTM application: enhancer/TF prediction

Output: 919-dimensional binary vector for the presence of TF/chromatin marks

Bi-directional LSTM

Convolutional architecture similar to the one before

Input: 200bp sequence

Quang and Xie. DanQ. 2016
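A hedged PyTorch sketch of this kind of hybrid architecture: a convolution scans the one-hot DNA sequence, max-pooling shortens it, a bi-directional LSTM reads the pooled features, and a sigmoid layer emits the 919 binary predictions. The class name ConvBiLSTM, the layer widths, and the kernel/pool sizes are illustrative placeholders, not the published DanQ hyperparameters.

```python
import torch
import torch.nn as nn

class ConvBiLSTM(nn.Module):
    """Conv + bi-directional LSTM over one-hot DNA (illustrative sizes)."""
    def __init__(self, seq_len=200, n_targets=919):
        super().__init__()
        self.conv = nn.Conv1d(4, 128, kernel_size=26)   # 4 input channels: A, C, G, T
        self.pool = nn.MaxPool1d(13)
        self.lstm = nn.LSTM(128, 64, batch_first=True, bidirectional=True)
        pooled_len = (seq_len - 26 + 1) // 13
        self.fc = nn.Linear(pooled_len * 2 * 64, n_targets)

    def forward(self, x):                               # x: (batch, 4, seq_len) one-hot
        h = self.pool(torch.relu(self.conv(x)))
        h, _ = self.lstm(h.transpose(1, 2))             # (batch, time, 2 * 64)
        return torch.sigmoid(self.fc(h.flatten(1)))     # 919 probabilities per sequence

model = ConvBiLSTM()
probs = model(torch.zeros(1, 4, 200))                   # dummy one-hot 200bp input
```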

Deep supervised learning

• Feedforward
• Convnets
• RNN, LSTM

Learning a nonlinear mapping from inputs to outputs. Predicting: TF binding, gene expression, disease status from images, risk from SNPs, protein structure …

Deep unsupervised learning

• Nonlinear dimensionality reduction and pattern mining.

• In many settings, have more unlabeled examples than labeled.

• Learn useful representations from unlabeled data.

• Better representation may improve prediction accuracy.

Low dimensional structure

What is the latent dimensionality of each row of images?

Urtasun and Zemel.

Autoencoder

[Diagram: input x → encoding f(x) → code h → decoding g(h) → reconstruction x̂]

h = f(x) = s(W x + b)
x̂ = g(h) = s(W′ h + b′)

W, W′ = argmin_{W, W′} Σ ||x − x̂||²

Train with backprop as before.

If encoding and decoding are linear, then W, W′ = argmin_{W, W′} Σ ||x − W′ W x||². What does this remind you of? Linear autoencoder is basically just PCA! General f and g correspond to nonlinear dimensionality reduction.
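A minimal PyTorch sketch of a one-hidden-layer autoencoder trained by backprop on the reconstruction error; the sigmoid nonlinearity, dimensions, and random placeholder data are illustrative. Removing the nonlinearities recovers the linear case, whose optimal code spans the same subspace as PCA.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, d_in=784, d_hidden=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hidden), nn.Sigmoid())  # h = f(x)
        self.dec = nn.Sequential(nn.Linear(d_hidden, d_in), nn.Sigmoid())  # x_hat = g(h)

    def forward(self, x):
        return self.dec(self.enc(x))

X = torch.rand(256, 784)                      # placeholder data
model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                          # train with backprop as before
    loss = ((model(X) - X) ** 2).mean()       # reconstruction error ||x - x_hat||^2
    opt.zero_grad(); loss.backward(); opt.step()
```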

What is wrong with this picture?

h(x) can just copy x exactly!

Overcomplete. Need to impose sparsity on h.
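One standard remedy, sketched here as an assumption rather than taken from the slides, is to add an L1 penalty on the code h so that an overcomplete representation cannot simply copy x; the layer sizes and the penalty weight lam are illustrative.

```python
import torch
import torch.nn as nn

enc = nn.Linear(784, 2000)                 # overcomplete: 2000 code units for a 784-dim input
dec = nn.Linear(2000, 784)
X = torch.rand(256, 784)                   # placeholder data

lam = 1e-3                                 # illustrative sparsity weight
h = torch.sigmoid(enc(X))
x_hat = torch.sigmoid(dec(h))
loss = ((x_hat - X) ** 2).mean() + lam * h.abs().mean()   # reconstruction + L1 on the code
```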

Denoising autoencoder

[Diagram: independently set some entries of x to 0 (independent noise), encode the corrupted input, and reconstruct the clean x̂.]
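A hedged sketch of the corruption step: independently zero out a fraction of the input entries (masking noise) and train the network to reconstruct the clean x. The function name corrupt and the corruption rate p are illustrative.

```python
import torch

def corrupt(x, p=0.3):
    """Independently set each entry of x to 0 with probability p (masking noise)."""
    mask = (torch.rand_like(x) > p).float()
    return x * mask

X = torch.rand(256, 784)          # placeholder clean data
X_tilde = corrupt(X)              # corrupted input fed to the encoder
# The loss still compares the reconstruction against the *clean* x:
# loss = ((decoder(encoder(X_tilde)) - X) ** 2).mean()
```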

Illustration of denoising autoencoder

Figure from Hugo Larochelle.

Filters from denoising autoencoder

Basis learned by denoising autoencoder vs. basis learned by weight-decay autoencoder.

Deep autoencoder


Deep autoencoder example

[Figure: original images alongside deep autoencoder (DAE) and PCA reconstructions.]

Hinton and Salakhutdinov. Science. 2006

Deep autoencoder example

[Figure: low-dimensional codes produced by PCA vs. the deep autoencoder.]

Hinton and Salakhutdinov. Science. 2006

Application: deep patient

Each patient = vector of 41k clinical descriptors

Stack of 3 denoising autoencoders

500-dim representation of each patient

Random forest to predict future disease

Miotto et al. DeepPatient. 2016
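A hedged end-to-end sketch of this kind of pipeline in PyTorch and scikit-learn: a three-layer denoising encoder produces a 500-dim code per patient, then a random forest is trained on the codes to predict future disease. The 41k-to-500 layer sizes follow the slide; everything else (joint rather than greedy layer-wise training, random placeholder data and labels, the noise rate) is an illustrative assumption, not the published Deep Patient implementation.

```python
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: rows = patients, columns = 41k clinical descriptors (shrink for a quick test).
X = torch.rand(200, 41000)
future_disease = torch.randint(0, 2, (200,)).numpy()      # placeholder binary labels

# Stack of 3 denoising autoencoder layers -> 500-dim representation of each patient.
encoder = nn.Sequential(
    nn.Linear(41000, 2000), nn.Sigmoid(),
    nn.Linear(2000, 1000), nn.Sigmoid(),
    nn.Linear(1000, 500), nn.Sigmoid(),
)
decoder = nn.Linear(500, 41000)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(10):                                       # unsupervised pre-training
    X_tilde = X * (torch.rand_like(X) > 0.3).float()      # masking noise on the input
    loss = ((decoder(encoder(X_tilde)) - X) ** 2).mean()  # reconstruct the clean descriptors
    opt.zero_grad(); loss.backward(); opt.step()

# Random forest on the 500-dim codes to predict future disease.
codes = encoder(X).detach().numpy()
clf = RandomForestClassifier(n_estimators=100).fit(codes, future_disease)
```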