Neural Turing Machine

Neural Turing Machine

Neural Turing Machine Author: Alex Graves, Greg Wayne, Ivo Danihelka Presented By: Tinghui Wang (Steve) Introduction • Neural Turning Machine: Couple a Neural Network with external memory resources • The combined system is analogous to TM, but differentiable from end to end • NTM Can infer simple algorithms! 4/6/2015 Tinghui Wang, EECS, WSU 2 In Computation Theory… Finite State Machine 횺, 푺, 풔ퟎ, 휹, 푭 • 횺: Alphabet • 푺: Finite set of states • 풔ퟎ: Initial state • 휹: state transition function: 푺 × 횺 → 푷 푺 • 푭: set of Final States 4/6/2015 Tinghui Wang, EECS, WSU 3 In Computation Theory… Push Down Automata 푸, 횺, 횪, 휹, 풒ퟎ, 풁, 푭 • 횺: Input Alphabet • 횪: Stack Alphabet • 푸: Finite set of states • 풒ퟎ: Initial state • 휹: state transition function: 푸 × 횺 × 횪 → 푷 푸 × 횪 • 푭: set of Final States 4/6/2015 Tinghui Wang, EECS, WSU 4 Turing Machine 푸, 횪, 풃, 횺, 휹, 풒ퟎ, 푭 • 푸: Finite set of States • 풒ퟎ: Initial State • 횪: Tape alphabet (Finite) • 풃: Blank Symbol (occurs infinitely on the tape) • 횺: Input Alphabet (Σ = 횪\ 푏 ) • 휹: Transition Function 푄\F × Γ → 푄 × Γ × 퐿, 푅 • 푭: Final State (Accepted) 4/6/2015 Tinghui Wang, EECS, WSU 5 Neural Turing Machine - Add Learning to TM!! Finite State Machine (Program) Neural Network (I Can Learn!!) 4/6/2015 Tinghui Wang, EECS, WSU 6 A Little History on Neural Network • 1950s: Frank Rosenblatt, Perceptron – classification based on linear predictor • 1969: Minsky – Proof that Perceptron Sucks! – Not able to learn XOR • 1980s: Back propagation • 1990s: Recurrent Neural Network, Long Short-Term Memory • Late 2000s – now: Deep Learning, Fast Computer 4/6/2015 Tinghui Wang, EECS, WSU 7 Feed-Forward Neural Net and Back Propagation • Total Input of unit 푗: 푥푗 = 푦푖푤푗푖 푖 • Output of unit 푗 (logistic function) 1 푦 = 푗 1 + 푒−푥푗 푤푗푖 푖 푗 • Total Error (mean square) 1 2 퐸 = 푑 − 푦 푡표푡푎푙 2 푐,푗 푐,푗 푐 푗 • Gradient Descent 휕퐸 휕퐸 휕푦푗 휕푥푗 = 휕푤푗푖 휕푦푗 휕푥푗 휕푤푗푖 Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (8 October 1986). "Learning representations by back-propagating errors". Nature 323 (6088): 533–536. 4/6/2015 Tinghui Wang, EECS, WSU 8 Recurrent Neural Network • 푦 푡 : n-tuple of outputs at time t • 푥푛푒푡 푡 : m-tuple of external inputs at time t • 푈: set of indices 푘 that 푥푘 is output of unit in network • 퐼: set of indices 푘 that 푥푘 is external input 푛푒푡 푥푘 푡 , 푖푓 푘 ∈ 퐼 • 푥푘 = 푦푘 푡 , 푖푓 푘 ∈ 푅 • 푦푘 푡 + 1 = 푓푘 푠푘 푡 + 1 where 푓푘 is a differential squashing function R. J. Williams and D. Zipser. Gradient-based learning algorithms for recurrent networks and their computational • 푠푘 푡 + 1 = 푖∈퐼∪푈 푤푘푖푥푖 푡 complexity. In Back-propagation: Theory, Architectures and Applications. Hillsdale, NJ: Erlbaum, 1994. 4/6/2015 Tinghui Wang, EECS, WSU 9 Recurrent Neural Network – Time Unrolling • Error Function: 1 1 퐽 푡 = − 푒 푡 2 = − 푑 푡 − 푦 푡 2 2 푘 2 푘 푘 푘∈푈 푘∈푈 푡 퐽푇표푡푎푙 푡′, 푡 = 퐽 휏 휏=푡′+1 • Back Propagation through Time: 휕퐽 푡 휀푘 푡 = = 푒푘 푡 = 푑푘 푡 − 푦푘 푡 휕푦푘 푡 휕퐽 푡 휕퐽 푡 휕푦푘 휏 ′ 훿푘 휏 = = = 푓푘 푠푘 휏 휀푘 휏 휕푠푘 휏 휕푦푘 휏 휕푠푘 휏 휕퐽 푡 휕퐽 푡 휕푠푙 휏 휀푘 휏 − 1 = = = 푤푙푘훿푙 휏 휕푦푘 휏 − 1 휕푠푙 휏 휕푦푘 휏 − 1 푙∈푈 푙∈푈 푇표푡푎푙 푡 휕퐽 휕퐽 푡 휕푠푗 휏 = 휕푤푗푖 휏=푡′+1 휕푠푗 휏 휕푤푗푖 4/6/2015 Tinghui Wang, EECS, WSU 10 RNN & Long Short-Term Memory • RNN is Turing Complete . With Proper Configuration, it can simulate any sequence generated by a Turing Machine • Error Vanishing and Exploding . Consider single unit self loop: 휕퐽 푡 휕퐽 푡 휕푦푘 휏 ′ ′ 훿푘 휏 = = = 푓푘 푠푘 휏 휀푘 휏 = 푓푘 푠푘 휏 푤푘푘훿푘 휏 + 1 휕푠푘 휏 휕푦푘 휏 휕푠푘 휏 • Long Short-term Memory Hochreiter, S. and Schmidhuber, J. (1997). Long short- term memory. Neural computation, 9(8):1735–1780. 4/6/2015 Tinghui Wang, EECS, WSU 11 Purpose of Neural Turing Machine • Enrich the capabilities of standard RNN to simplify the solution of algorithmic tasks 4/6/2015 Tinghui Wang, EECS, WSU 12 NTM Read/Write Operation 1 2 3 ... N Erase: 푀푡 푖 = 푀푡−1 푖 1 − 푤푡 푖 푒푡 푟푡 = 푤푡 푖 푀푡 푖 푖 Add: SizeVector SizeVector 푀 푖 += 푤 푖 푎 - - 푡 푡 푡 M M w1 w2 w3 ... wN 푒 : Erase Vector, emitted by the header 푤 푖 : vector of weights over the N locations 푡 푡 푎 : Add Vector, emitted by the header emitted by a read head at time t 푡 푤푡: weights vector 4/6/2015 Tinghui Wang, EECS, WSU 13 Attention & Focus • Attention and Focus is adjusted by weights vector across whole memory bank! 4/6/2015 Tinghui Wang, EECS, WSU 14 Content Based Addressing • Find Similar Data in memory 풌푡: Key Vector 훽푡: Key Strength exp 훽푡퐾 풌푡, 푴푡 푖 푐 푤푡 푖 = 푗 exp 훽푡퐾 풌푡, 푴푡 푗 풖 ∙ 풗 퐾 풖, 풗 = 풖 ∙ 풗 4/6/2015 Tinghui Wang, EECS, WSU 15 May not depend on content only… • Gating Content Addressing 푔푡: Interpolation gate in the range of (0,1) 푔 푐 풘푡 = 푔푡풘푡 + 1 − 푔푡 풘푡−1 4/6/2015 Tinghui Wang, EECS, WSU 16 Convolutional Shift – Location Addressing 풔푡: Shift weighting – Convolutional Vector (size N) 푁−1 푔 풘 푡 푖 = 풘푡 푗 풔푡 푖 − 푗 푗=0 4/6/2015 Tinghui Wang, EECS, WSU 17 Convolutional Shift – Location Addressing 훾푡: sharpening factor 훾 풘 푡 푖 푡 풘푡 푖 = 훾푡 푗 풘 푡 푗 4/6/2015 Tinghui Wang, EECS, WSU 18 Neural Turing Machine - Experiments • Goal: to demonstrate NTM is . Able to solve the problems . By learning compact internal programs • Three architectures: . NTM with a feedforward controller . NTM with an LSTM controller . Standard LSTM controller • Applications . Copy . Repeat Copy . Associative Recall . Dynamic N-Grams . Priority Sort 4/6/2015 Tinghui Wang, EECS, WSU 19 NTM Experiments: Copy NTM • Task: Store and recall a long sequence of arbitrary information • Training: 8-bit random vectors with length 1-20 • No inputs while it generates the targets LSTM 4/6/2015 Tinghui Wang, EECS, WSU 20 NTM Experiments: Copy – Learning Curve 4/6/2015 Tinghui Wang, EECS, WSU 21 NTM Experiments: Repeated Copy • Task: Repeat a sequence a specified number of times • Training: random length sequences of random binary vectors followed by a scalar value indicating the desired number of copies 4/6/2015 Tinghui Wang, EECS, WSU 22 NTM Experiments: Repeated Copy – Learning Curve 4/6/2015 Tinghui Wang, EECS, WSU 23 NTM Experiments: Associative Recall • Task: ask the network to produce the next item, given current item after propagating a sequence to network • Training: each item is composed of three six-bit binary vectors, 2-8 items every episode 4/6/2015 Tinghui Wang, EECS, WSU 24 NTM Experiments: Dynamic N-Grams • Task: Learn N-Gram Model – rapidly adapt to new predictive distribution • Training: 6-Gram distributions over binary sequences. 200 successive bits using look-up table by drawing 32 probabilities from Beta(.5,.5) distribution • Compare to Optimal Estimator: 푁1 + 0.5 푃 퐵 = 1 푁1, 푁2, 풄 = 푁1 + 푁0 + 1 C=01111 C=00010 4/6/2015 Tinghui Wang, EECS, WSU 25 NTM Experiments: Priority Sorting • Task: Sort Binary Vector based on priority • Training: 20 binary vectors with corresponding priorities, output 16 highest-priority vectors 4/6/2015 Tinghui Wang, EECS, WSU 26 NTM Experiments: Learning Curve 4/6/2015 Tinghui Wang, EECS, WSU 27 Some Detailed Parameters NTM with Feed Forward Neural Network NTM with LSTM LSTM 4/6/2015 Tinghui Wang, EECS, WSU 28 Additional Information • Literature Reference: . Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (8 October 1986). "Learning representations by back-propagating errors". Nature 323 (6088): 533–536 . R. J. Williams and D. Zipser. Gradient-based learning algorithms for recurrent networks and their computational complexity. In Back-propagation: Theory, Architectures and Applications. Hillsdale, NJ: Erlbaum, 1994. Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8):1735–1780 . Y. Bengio and Y. LeCun, Scaling Learning Algorithms towards AI, Large-Scale Kernel Machines, Bottou, ., Chapelle, O., DeCoste, D., and Weston, J., Eds., MIT Press, 2007. • NTM Implementation . https://github.com/shawntan/neural-turing-machines • NTM Reddit Discussion . https://www.reddit.com/r/MachineLearning/comments/2m9mga/implementation_of_neural_turing_ma chines/ 4/6/2015 Tinghui Wang, EECS, WSU 29 4/6/2015 Tinghui Wang, EECS, WSU 30.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    30 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us