Gated Recurrent Unit, Long Short-Term Memory
Asst.Prof.Dr.Supakit Nootyaskool, IT-KMITL

Topics
• Concept of the Gated Recurrent Unit
• Concept of the Long Short-Term Memory unit
• Modify trainr() for GRU and LSTM
• Development of trainr()
• Example LSTM in C language

REVIEW RECURRENT NEURAL NETWORK

Define vectors
[Figure: a weekly "Driving Schedule" example. Days such as Monday, Thursday, Yesterday and Today are encoded as one-hot feature vectors and fed into a neural network step by step to answer "Driving on Thursday?".]

Notation
y = label output
h = hidden activation
x = feature vector

At each step the RNN combines the current input with the previous hidden state:
h1 = tanh(Wh [x1; h0] + bh)
y1 = softmax(Wy h1 + by)

Softmax (R example):
z = seq(0, 5, by = 0.2)
softmax <- exp(z) / sum(exp(z))
plot(z, softmax, type = 'l')

RNN usages
• Mapping an input sequence X to an output sequence Y (e.g. "1", "2", "3" -> "a", "b", "c")
• Sentiment analysis, e.g. "the product is good quality, but the delivery process takes a long time"
• Machine translation

RNN advantages
• An RNN can learn the differences between elements of the input sequence.
• The model size does not depend on the length of the input sequence.
• Weights and other parameters are shared across time steps, so the model uses fewer variables than a feed-forward network.

RNN disadvantages
• Output from the future cannot be fed back to train the model; only past context is available.
• The model cannot carry data from the distant past directly into the current step (no skipping over time).
• Most papers report that RNNs are hard to train and need long computation time.
• RNNs suffer from the vanishing gradient problem.

Equation in the Recurrent Neural Network
[Figure: the RNN cell unrolled over time, with input xt, previous hidden state h(t-1), a tanh activation, hidden state ht and output ot.]
Weights: W, U, V; biases: b, c
ht = tanh(b + W h(t-1) + U xt)
ot = c + V ht

Vanishing gradient problem
• The vanishing gradient problem is caused by the multiplicative nature of the backpropagation algorithm.
• Ways to reduce it:
  • careful weight initialization
  • LSTM (the research paper reports that LSTM can reduce transcription error by 49%)
  • echo state networks

GATED RECURRENT UNIT

What is a GRU?
• The GRU gates work like water taps (doors) that control the flow of information:
  • UPDATE gate: how much data from the past is carried directly into the current step.
  • RESET gate: how much of the past data is discarded so that only current data is processed.
• The research claims that the GRU reduces the vanishing gradient problem.

Data flow in the GRU (see the sketch after this list)
Step 1: Update gate (open / close)
Step 2: Reset gate (open / close)
Step 3: Current (candidate) memory
Step 4: Final memory at the current time

Weight locations: Wr at the reset gate, Wz at the update gate, W at the candidate memory.
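The four steps above can be collected into one small function. Below is a minimal, illustrative sketch of a single GRU step in R; it assumes bias terms are omitted and introduces extra matrices Uz, Ur and U (not named on the slide) that act on the previous hidden state. It is a sketch of the idea, not the internal code of the rnn package.

# Minimal GRU step (illustrative sketch only)
sigmoid <- function(a) 1 / (1 + exp(-a))

gru_step <- function(x, h_prev, Wz, Uz, Wr, Ur, W, U) {
  z <- sigmoid(Wz %*% x + Uz %*% h_prev)           # Step 1: update gate
  r <- sigmoid(Wr %*% x + Ur %*% h_prev)           # Step 2: reset gate
  h_tilde <- tanh(W %*% x + U %*% (r * h_prev))    # Step 3: current (candidate) memory
  (1 - z) * h_prev + z * h_tilde                   # Step 4: final memory at the current time
}

# Tiny usage example with random weights (input size 2, hidden size 3)
set.seed(1)
rand <- function(r, k) matrix(rnorm(r * k), r, k)
Wz <- rand(3, 2); Uz <- rand(3, 3)
Wr <- rand(3, 2); Ur <- rand(3, 3)
W  <- rand(3, 2); U  <- rand(3, 3)
h <- matrix(0, 3, 1)
x <- matrix(c(0.5, -1.0), 2, 1)
h <- gru_step(x, h, Wz, Uz, Wr, Ur, W, U)
print(h)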
LONG SHORT-TERM MEMORY

Problems the RNN suffers from
• The vanishing gradient problem is unstable behaviour when training a deep neural network.
• The information used to update the RNN parameters becomes smaller and smaller after running for a long time.

"A problem with training networks with many layers (e.g. deep neural networks) is that the gradient diminishes dramatically as it is propagated backward through the network. The error may be so small by the time it reaches layers close to the input of the model that it may have very little effect. As such, this problem is referred to as the 'vanishing gradients' problem."
Source: https://machinelearningmastery.com/how-to-fix-vanishing-gradients-using-the-rectified-linear-activation-function/

Long Short-Term Memory (LSTM)
• Hochreiter & Schmidhuber (1997) solved the problem of getting an RNN to remember things for a long time (like hundreds of time steps).
• The LSTM is designed as a memory cell:
  • Information gets into the cell whenever its "write" gate is on.
  • The information stays in the cell as long as its "keep" gate is on.
  • Information can be read from the cell by turning on its "read" gate.

Each part of the LSTM
• Cell state C: the slow (long-term) path
• Hidden state H: the fast (short-term) path
• Forget gate
• Input gate
• Output gate
[Figures: the cell state and each gate highlighted in turn, and the locations of their weights on the input xt.]

Bidirectional LSTM
• A bidirectional LSTM (biLSTM) is a sequence-processing model that consists of two LSTMs:
  1) a forward direction (left to right)
  2) a backward direction (right to left)
• BiLSTMs effectively increase the amount of information available to the network by giving the algorithm context from both sides of the current position.
[Figure: a biLSTM reading the sentence "Heart is not enlarged" in both directions.]

trainr() in the rnn package
The function trainr(Y, X, network_type = "..") has a network_type parameter that changes the model:
• "rnn"  = recurrent neural network
• "gru"  = gated recurrent unit
• "lstm" = long short-term memory
(Both gru and lstm are experimental and under development.)

Activity 9.1: Compare RNN, GRU and LSTM with trainr()

# Activity 9.1 Test RNN, GRU and LSTM
# BY [email protected]
rm(list=ls())
#install.packages("rnn")
library("rnn")
packageVersion("rnn")   #latest 1.4.0

# Build two input sequences and one target sequence
x1 = 1:(5*4*3)
x2 = 101:(100+5*4*3)
y  = 201:(200+5*4*3)
mx1 = matrix(x1, ncol=4)
mx2 = matrix(x2, ncol=4)
my  = matrix(y,  ncol=4)

X = array( c(mx1,mx2), dim=c(dim(mx1),2) )
Y = array( c(my), dim=c(dim(my),1) )
dim(X); X
dim(Y); Y

# Scale the data into [0, 1]
Xt = X/261
Yt = Y/261

model = NULL
model = trainr(Y=Yt, X=Xt, model = model,
               network_type = "rnn",   #rnn gru lstm
               sigmoid = "logistic",   #logistic tanh
               use_bias = FALSE,       #TRUE FALSE
               learningrate = 0.1,
               hidden_dim = 30,
               numepochs = 100)

# Plot the training error and compare actual vs predicted values
maxiter = model$numepochs
Yp = predictr(model, Xt)
par(mfrow=c(2,1))
strConf = sprintf("%s lr:%1.2f hdim:%d sig:%s",
                  model$network_type, model$learningrate,
                  model$hidden_dim, model$sigmoid)
plot(colMeans(model$error[,1:maxiter]), type='l',
     xlab='Epoch', ylab='Errors', main = strConf)
plot(as.vector(Yp), col = 'red', type='l',
     main = "Actual vs Predicted on Training Data Set",
     ylab = "Yt,Yp")
lines(as.vector(Yt), type = 'l', col = 'black')

DEVELOPMENT OF TRAINR()

trainr()?
• Source code: https://rdrr.io/cran/rnn/src/R/trainr.R

Activity 9.2
• Open NNDL_activity9_2.r

EXAMPLE LSTM IN C LANGUAGE

Activity 9.3
• Compile and run the code for activity 9.3 in Visual Studio.

Summary
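As a companion to the gate description above and to the C implementation compiled in Activity 9.3, here is a minimal, illustrative sketch of a single LSTM forward step in R. The weight names (Wf, Wi, Wo, Wc and the U* matrices), the omission of bias terms and the standard textbook gate equations are assumptions made for this sketch; it is not the rnn package's internal code nor the Activity 9.3 C source.

# Minimal LSTM step (illustrative sketch only)
# c_prev = cell state (slow / long-term path), h_prev = hidden state (fast / short-term path)
sigmoid <- function(a) 1 / (1 + exp(-a))

lstm_step <- function(x, h_prev, c_prev, Wf, Uf, Wi, Ui, Wo, Uo, Wc, Uc) {
  f <- sigmoid(Wf %*% x + Uf %*% h_prev)      # forget gate: how much of c_prev to keep
  i <- sigmoid(Wi %*% x + Ui %*% h_prev)      # input gate: how much new content to write
  o <- sigmoid(Wo %*% x + Uo %*% h_prev)      # output gate: how much of the cell to read out
  c_tilde <- tanh(Wc %*% x + Uc %*% h_prev)   # candidate cell content
  c_new <- f * c_prev + i * c_tilde           # new cell state (long-term memory)
  h_new <- o * tanh(c_new)                    # new hidden state (short-term output)
  list(h = h_new, c = c_new)
}

# Tiny usage example with random weights (input size 2, hidden size 3)
set.seed(1)
rand <- function(r, k) matrix(rnorm(r * k), r, k)
Wf <- rand(3, 2); Uf <- rand(3, 3)
Wi <- rand(3, 2); Ui <- rand(3, 3)
Wo <- rand(3, 2); Uo <- rand(3, 3)
Wc <- rand(3, 2); Uc <- rand(3, 3)
state <- list(h = matrix(0, 3, 1), c = matrix(0, 3, 1))
x <- matrix(c(0.5, -1.0), 2, 1)
state <- lstm_step(x, state$h, state$c, Wf, Uf, Wi, Ui, Wo, Uo, Wc, Uc)
print(state$h); print(state$c)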