DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020

Performance Analysis of Various Activation Functions Using LSTM Neural Network For Movie Recommendation Systems

ANDRÉ BROGÄRD

PHILIP SONG

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Performance Analysis of Various Activation Functions Using LSTM Neural Network For Movie Recommendation Systems

ANDRÉ BROGÄRD, PHILIP SONG

Degree Project in Computer Science, DD142X
Date: June 8, 2020
Supervisor: Erik Fransén
Examiner: Pawel Herman
School of Electrical Engineering and Computer Science
Swedish title: Prestandaanalys av olika aktiveringsfunktioner i LSTM neurala nätverk applicerat på rekommendationssystem för filmer


Abstract

Recommendation systems have grown in importance and popularity in many different areas. This thesis focuses on recommendation systems for movies. Recurrent neural networks using LSTM blocks have shown some success for movie recommendation systems. Research has indicated that changing the activation functions in LSTM blocks can improve performance, measured as prediction accuracy. In this study we compare four activation functions (the hyperbolic tangent, sigmoid, ELU and SELU functions) used in LSTM blocks and how they impact the prediction accuracy of the neural network. Specifically, they are applied to the block input and the block output of the LSTM blocks. Our results indicate that the hyperbolic tangent, which is the default, and the sigmoid perform about the same, whereas the ELU and SELU functions perform worse. Further research is needed to identify other activation functions that could improve the prediction accuracy and to improve certain aspects of our methodology.

Sammanfattning

Recommendation systems have grown in importance and popularity in many different areas. This thesis focuses on recommendation systems for movies. Recurrent neural networks with LSTM blocks have shown some success for movie recommendation systems. Previous research has indicated that changing the activation functions has resulted in improved predictions. In this study we compare four different activation functions (hyperbolic tangent, sigmoid, ELU and SELU) applied in LSTM blocks and how they affect the predictions of the neural network. They are applied specifically to the block input and block output of the LSTM blocks. Our results indicate that the hyperbolic tangent function, which is the default choice, and the sigmoid function perform equally well, while ELU and SELU both perform worse. Further research is needed to identify other activation functions and to improve several parts of the methodology.

Contents

1 Introduction
  1.1 Problem Statement
  1.2 Scope

2 Background
  2.1 Artificial Neural Networks
  2.2 Multilayer Perceptron ANN
  2.3 Recurrent Neural Network
  2.4 Long Short-Term Memory
    2.4.1 LSTM Architecture
    2.4.2 Activation Functions
  2.5 Metrics
  2.6 Related work

3 Methods
  3.1 Dataset
  3.2 Implementation
  3.3 Evaluation

4 Results

5 Discussion
  5.1 Result
  5.2 Improvements

6 Conclusions

Bibliography


Chapter 1

Introduction

With more online movie platforms becoming available, people have a lot of movie content to choose from. According to a study from Ericsson, people spend up to one hour per day searching for movie content [1]. Seeking to minimize this time, movie recommendation systems have been developed using artificial intelligence [2].

Recommendation systems aim to solve the problem of information overload, which denies access to interesting items, by filtering information [3]. One such way is through collaborative filtering (CF), where similar users' interests are considered [3]. Popular approaches to CF include the use of neural networks, and in [4] it is demonstrated that CF can be converted to a sequence prediction problem with the use of recurrent neural networks (RNNs).

Long Short-Term Memory (LSTM), an RNN with LSTM blocks, was designed to solve a problem with RNNs and has shown an improvement in performance [5]. LSTM has been applied in several recommendation systems [6] targeted at both entertainment (movies, music, videos) and e-commerce settings and has outperformed state-of-the-art models in many cases.

In [4] an LSTM neural network was applied to the top-N recommendation problem, using the default choice of activation functions, recommending 10 movies the user would be interested in seeing next. The rating of a movie was ignored; only the sequence of watched movies was considered. It was observed that extra features such as age, rating or sex did not lead to an increase in accuracy. Both the Movielens and Netflix datasets were used, and LSTM outperformed all baseline models in nearly all metrics.

This study will use the same framework as in [4]. Since there has been success in switching activation functions [7], the study will compare different choices of activation functions in LSTM blocks and their impact on prediction


accuracy in the context of movie recommendations.

1.1 Problem Statement

The most important functionality of a movie recommendation system is the ability to predict a user's movie preferences. Therefore, in this project we investigate the performance, measured as accuracy in movie predictions, of LSTM networks using various activation functions, applied to the top-N recommendation problem in movie recommendation. To this end we pose the question: How does applying different activation functions to LSTM blocks affect the accuracy of predicting movies for users?

1.2 Scope

The implementation of LSTM is the same as in [4], with small modifications. This study therefore only considers this type of LSTM applied to the top-N recommendation problem. In [4] the features are limited to three (user id, movie id and timestamp), and it is further concluded that additional features such as sex or age do not improve the accuracy of the models unless they are all put together. We limit the features identically. Only the Movielens 1M dataset is used in this study because of limited computational resources. Additionally, only the hyperbolic tangent, sigmoid, ELU and SELU activation functions are tested, as they have shown promising results in previous work.

Chapter 2

Background

2.1 Artificial Neural Networks

Artificial Neural Networks (ANNs) are a type of computing system inspired by the biological neural networks in human brains [8]. There are many different types of networks, all characterized by the following components: a set of nodes, in our case artificial neurons, and connections between these nodes called weights. Like the synapses in a biological brain, each connection between nodes can transmit a signal to other nodes. A neuron receives inputs, performs some processing and computation, and produces an output which can be signaled to other neurons connected to it. The weight of each connection determines the strength of one node's influence on another [9]. Figure 2.1 shows how an artificial neuron receives inputs, which are multiplied by weights; a mathematical function, the activation function, then determines the activation of the neuron. Activation functions are discussed more thoroughly in section 2.4.2.

Figure 2.1: An artificial neuron
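To make figure 2.1 concrete, the following is a minimal sketch of a single artificial neuron in Python/NumPy. It is illustrative only; the weights, bias and the choice of sigmoid here are arbitrary assumptions, not taken from the thesis code.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron(inputs, weights, bias, activation=sigmoid):
    """A single artificial neuron: weighted sum of inputs plus bias,
    passed through an activation function."""
    return activation(np.dot(weights, inputs) + bias)

# Example: three inputs with arbitrary weights and bias.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
print(neuron(x, w, bias=0.2))
```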


2.2 Multilayer Perceptron ANN

Multilayer perceptrons (MLPs) are comprised of one or more layers of neurons. The number of neurons in the input and output layers depends on the problem, whereas the number of neurons in the hidden layers is arbitrary. The goal of an MLP is to approximate a function f∗. For example, a classifier y = f∗(x) maps an input x to a category y. MLPs are also called feedforward neural networks because information flows through the function being evaluated from x, through the intermediate computations used to define f, and finally to the output y. There are no feedback connections in which outputs of the model are fed back into itself. If an MLP were extended with feedback connections, it would be a recurrent neural network (RNN) [10].
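As an illustration of the feedforward idea (not code from the thesis), here is a minimal sketch of a forward pass through a two-layer MLP; the layer sizes and random weights are arbitrary.

```python
import numpy as np

def mlp_forward(x, params):
    """Feedforward pass: each layer computes tanh(W @ h + b).
    `params` is a list of (W, b) pairs, one per layer; no feedback connections."""
    h = x
    for W, b in params:
        h = np.tanh(W @ h + b)
    return h

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 3]  # input, hidden, output (arbitrary choices)
params = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
print(mlp_forward(rng.normal(size=4), params))
```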

2.3 Recurrent Neural Network

A weakness of MLPs is that they lack the ability to learn and efficiently store temporal dependencies [10]. A recurrent neural network is specialized for processing a sequence of values, and it can scale to much longer sequences than networks without sequence-based specialization. Another advantage of RNNs over MLPs is the ability to share parameters across different parts of a model. For example, consider the two sentences "I went to Nepal in 2009" and "In 2009, I went to Nepal". When extracting the year the narrator went to Nepal, an MLP, which processes sentences of fixed length, would have separate parameters for each input feature, which means it would need to learn the rules of the language separately at each position in the sentence. An RNN, in contrast, shares the same weights across several time steps.

However, RNNs have a problem with long-term memory, meaning they lack the ability to connect present information to old information in order to establish the correct context [10]. For example, consider trying to predict the last word in the sentence "I grew up in France... I speak fluent French". The most recent information suggests that the word is a language, but to tell which specific language it is, context from further back in the text, about France, is needed. The gap between the recent information and the information further back can become very large, and as this gap grows RNNs become unable to use the past information as context for the recent information. Fortunately, the Long Short-Term Memory neural network is explicitly designed to solve the long-term dependency problem [11].
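For comparison with the MLP sketch above, here is a minimal sketch of a vanilla RNN (illustrative only): the same weights are reused at every time step, which is the parameter sharing described above.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b, h0=None):
    """Vanilla RNN: the same weights (W_x, W_h, b) are shared across all
    time steps; each step mixes the current input with the previous state."""
    h = np.zeros(W_h.shape[0]) if h0 is None else h0
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return states

rng = np.random.default_rng(1)
seq = [rng.normal(size=3) for _ in range(5)]   # a toy sequence of 5 inputs
W_x, W_h, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)
print(rnn_forward(seq, W_x, W_h, b)[-1])       # final hidden state
```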

2.4 Long Short-Term Memory

As discussed in the previous section, RNNs have a problem with long-term memory. Long Short-Term Memory (LSTM), a special kind of RNN, is capable of learning long-term dependencies using LSTM blocks [11]. The network was designed to solve this problem and has shown an improvement in performance [5]. Each LSTM block consists of one or more self-connected memory cells along with input, forget, and output gates. The memory cells are able to store and access information over longer periods of time, improving performance.

2.4.1 LSTM Architecture

The main concept of the LSTM is the cell state, the round circle "Cell" in figure 2.2. The cell state holds information which flows in and out between LSTM blocks. The output of a cell is called the hidden state; in figure 2.2 the hidden state is the output of the cell combined with the pointwise operation from the output gate [11]. With regulating structures called gates, the LSTM has the ability to remove or add information to the cell state and hidden state. A gate consists of a sigmoid neural net layer and a pointwise multiplication operation. The sigmoid layer, the round circle with σ in the figure, outputs numbers between zero and one. The number represents how much information will flow through the gate: if zero is returned, nothing flows through, whereas one means all information flows through. The function determining the output value between zero and one is called the activation function and can be switched in the network [11]. The three gates (input, forget and output gates) and the block input and block output activation functions are displayed in the figure. The ⊙ sign denotes pointwise multiplication of two vectors. The activation functions are σ and tanh [7].

f_t = σ(W_f x_t + U_f h_{t−1} + b_f)    (2.1)
i_t = σ(W_i x_t + U_i h_{t−1} + b_i)    (2.2)
o_t = σ(W_o x_t + U_o h_{t−1} + b_o)    (2.3)
C̃_t = tanh(W_C x_t + U_C h_{t−1} + b_C)    (2.4)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t    (2.5)
h_t = o_t ⊙ tanh(C_t)    (2.6)

Figure 2.2: Architecture of a single LSTM block, where σ denotes the sigmoidal gates. From [12].

The forget, input and output gates of each LSTM block are defined by equations 2.1–2.3 respectively. C̃_t, defined in equation 2.4, is the block input at time t, which consists of a tanh layer together with the input gate. Together they decide what information will be stored in the cell state, C_t. The cell state is updated from the old cell state at time t (equation 2.5). W and U are weight matrices and b is a bias vector. Finally, the hidden state h_t is the block output at time t (equation 2.6).
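To tie the equations together, the following is a minimal NumPy sketch of a single LSTM step following equations 2.1–2.6. It is illustrative only; the experiments in this thesis use the framework of [4], not this code. The dictionary keys mirror the symbols above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following equations 2.1-2.6.
    `p` holds the weight matrices W_*, U_* and bias vectors b_*."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])      # (2.1) forget gate
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])      # (2.2) input gate
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])      # (2.3) output gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # (2.4) block input
    c_t = f_t * c_prev + i_t * c_tilde                                # (2.5) new cell state
    h_t = o_t * np.tanh(c_t)                                          # (2.6) block output
    return h_t, c_t

# Toy usage with random parameters (3 inputs, 4 hidden units).
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
p = {}
for g in ["f", "i", "o", "c"]:
    p[f"W_{g}"] = rng.normal(size=(n_hid, n_in))
    p[f"U_{g}"] = rng.normal(size=(n_hid, n_hid))
    p[f"b_{g}"] = np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), p)
```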

2.4.2 Activation Functions

A node in a neural network takes N inputs, which are combined and passed through a nonlinearity to produce an output. These nonlinearities are called activation functions, as illustrated in figure 2.1. A bad choice of activation function can lead to loss of input data or to vanishing or exploding gradients in the neural network [13].

Sigmoid function

The sigmoid function has a range of [0, 1] and is illustrated in figure 2.3. Its formula is given by:

σ(x) = 1 / (1 + e^(−x))

Figure 2.3: Sigmoid activation function

Hyperbolic tangent function

The hyperbolic tangent function, further referred to as the hyperbolic function, is defined by:

tanh(x) = sinh(x) / cosh(x)

It has a range of [−1, 1] and is illustrated in figure 2.4.

Exponential linear unit

The ELU was introduced in [14] and made the deep neural network of that study learn faster and more accurately. Its formula is given by:

ELU(x) = x for x > 0, and ELU(x) = α(e^x − 1) for x ≤ 0

In figure 2.5 the α parameter is set to 1, giving a range of (−1, ∞).

Self-normalizing exponential linear unit

The SELU was introduced in [15]. It is similar to the ELU but with additional, specific parameters. It has properties that should eliminate the possibility of vanishing or exploding gradients. The function is illustrated in figure 2.6 and is defined by:

SELU(x) = λx for x > 0, and SELU(x) = λα(e^x − 1) for x ≤ 0
λ = 1.0507009873554804934193349852946
α = 1.6732632423543772848170429916717

Figure 2.4: Hyperbolic activation function

Figure 2.5: ELU activation function

Figure 2.6: SELU activation function
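The four activation functions compared in this study can be written in a few lines. The following NumPy sketch is illustrative (not the framework's own code) and uses the λ and α constants given above.

```python
import numpy as np

def sigmoid(x):
    # Range (0, 1); the default gate activation in LSTM blocks.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Range (-1, 1); the default block input/output activation.
    return np.tanh(x)

def elu(x, alpha=1.0):
    # Identity for positive inputs, saturating towards -alpha for negative inputs.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def selu(x):
    # Scaled ELU with the fixed constants from [15].
    lam = 1.0507009873554804934193349852946
    alpha = 1.6732632423543772848170429916717
    return lam * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

xs = np.linspace(-3, 3, 7)
for f in (sigmoid, tanh, elu, selu):
    print(f.__name__, np.round(f(xs), 3))
```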

2.5 Metrics

These are the same metrics as used in [4] and are thus identically defined. They are used to evaluate different qualities of recommendation systems.

• Sps. The Short-term Prediction Success captures the ability of the method to predict the next item. It is 1 if the next item is present in the recommendations, and 0 otherwise.

• Recall. The usual metric for top-N recommendation; it captures the ability of the method to make long-term predictions.

• User coverage. The fraction of users who received at least one correct recommendation. Average recall (and precision) hides the distribution of success among users; a high recall could still mean that many users do not receive any good recommendation. This metric captures the generality of the method.

• Item coverage. The number of distinct items that were correctly recommended. It captures the capacity of the method to make diverse, successful recommendations.

Observe that these metrics are all computed for a recommendation system that always produces ten recommendations for each user.
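As an illustration of how these metrics could be computed, here is a sketch that assumes each user receives a list of ten recommended item ids and has a held-out, chronologically ordered list of future items; this reflects our reading of the metric definitions, not the evaluation code of [4].

```python
def evaluate(recommendations, future_items):
    """`recommendations`: dict user -> list of 10 recommended item ids.
    `future_items`: dict user -> chronologically ordered held-out items
    (first element is the next item the user actually consumed)."""
    sps_hits, recall_sum, covered_users, covered_items = 0, 0.0, 0, set()
    for user, recs in recommendations.items():
        future = future_items[user]
        hits = [item for item in future if item in recs]
        sps_hits += 1 if future and future[0] in recs else 0   # short-term success
        recall_sum += len(hits) / max(len(future), 1)          # long-term recall
        covered_users += 1 if hits else 0                      # at least one correct rec
        covered_items.update(hits)                             # distinct correct items
    n = len(recommendations)
    return {"sps": sps_hits / n,
            "recall": recall_sum / n,
            "user_coverage": covered_users / n,
            "item_coverage": len(covered_items)}

# Toy example with two users and fabricated item ids.
recs = {"u1": list(range(1, 11)), "u2": list(range(11, 21))}
future = {"u1": [3, 42, 7], "u2": [99, 100]}
print(evaluate(recs, future))
```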

2.6 Related work

Applying different activation functions

In previous work, [7] and [12] conducted comparative studies analysing the performance of an LSTM network when switching between different activation functions. Both papers concluded that switching activation functions impacts the performance of the network. Although the standard activation function in the sigmoidal gates, the sigmoid function, gives high performance, some other, less-recognized activation functions that were tested could result in more accurate performance. Furthermore, [7] compared exactly 23 different activation functions, where the three gates (the input, output and forget gate) change activation functions while the block input and block output activation functions are held constant as the hyperbolic tangent (tanh). Additionally, the authors encourage further research on other parts of an LSTM network, such as the effect of changing the hyperbolic tangent function on the block input and block output instead of changing the activation functions in the three gates.

Different activation functions have also been applied to more complex LSTM-based neural networks in areas other than recommendation systems [16]. Several activation functions have been tested in LSTM blocks [16], in the context of the spatiotemporal convolutional LSTM (convLSTM) network introduced by [17], applied to the MNIST dataset. That study showed strong performance for the ELU and SELU activation functions, which outperformed traditional and popular choices such as the hyperbolic and sigmoid activation functions.

Applying LSTM in movie recommender systems

The authors' experiments in [4], where they tested LSTM in movie recommendation systems, showed that "...the LSTM produces very good results on the Movielens and Netflix datasets, and is especially good in terms of short term prediction and item coverage". Furthermore, the authors mention that it is possible to achieve better performance by adjusting the RNNs to specifically handle collaborative filtering problems.

Chapter 3

Methods

3.1 Dataset

The dataset used is Movielens 1M. The dataset contains many possible features that are not considered in the model; only the user id, movie id and timestamp are treated as features. Preprocessing is included in the LSTM implementation by [4].
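For illustration, here is a sketch of the kind of preprocessing involved: each user's ratings are reduced to a chronologically ordered sequence of movie ids, keeping only user id, movie id and timestamp. The file name and the `::`-separated format follow the standard Movielens 1M layout; the actual preprocessing is part of the framework of [4], so treat this as an assumption rather than its exact code.

```python
from collections import defaultdict

def load_sequences(path="ratings.dat"):
    """Build one chronologically ordered movie-id sequence per user from
    Movielens 1M ratings (format: UserID::MovieID::Rating::Timestamp).
    Ratings themselves are ignored, matching the feature choice in [4]."""
    events = defaultdict(list)
    with open(path, encoding="latin-1") as f:
        for line in f:
            user_id, movie_id, _rating, timestamp = line.strip().split("::")
            events[user_id].append((int(timestamp), int(movie_id)))
    return {user: [movie for _, movie in sorted(items)]
            for user, items in events.items()}

# sequences = load_sequences()
# print(next(iter(sequences.items())))
```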

3.2 Implementation

The modifications to the original code by [4] can be found in the authors' fork of the original repository on GitHub: github.com/andrebrogard/sequence-based-recommendations. The only modification made is the option to specify which activation functions to apply to the individual gates of the LSTM blocks when training and testing the model.

Neural network parameters

The authors of [4] observed comparable performance across layer sizes, and the fastest training, with a layer size of 20 neurons. Common to all layer sizes tested was that performance did not seem to improve beyond 100 epochs. Therefore all our tests use a layer size of 20 neurons and run for just over 100 epochs. One epoch is a unit of measurement indicating that the model has been trained on the entire dataset once.

Switching activation functions

The hyperbolic activation function is the default for the cell and hidden state, which are referred to as the block input and block output. The sigmoid function is the default


for the input, output and forget gates. In our tests, we compare four different activation functions applied identically to the block input and block output, namely the hyperbolic, sigmoid, ELU and SELU functions.
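Conceptually, the modification makes the block input (equation 2.4) and block output (equation 2.6) nonlinearities selectable, while the three gates keep the sigmoid. The following is a minimal sketch of that idea, not the fork's actual code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def selu(x):
    lam, alpha = 1.0507009873554805, 1.6732632423543772
    return lam * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def lstm_step(x_t, h_prev, c_prev, p, act=np.tanh):
    """LSTM step where `act` replaces tanh in both the block input (eq. 2.4)
    and the block output (eq. 2.6); the three gates keep the sigmoid."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])
    c_t = f_t * c_prev + i_t * act(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])
    return o_t * act(c_t), c_t

# The choice can then be exposed as a named option, e.g. on the command line.
ACTIVATIONS = {"tanh": np.tanh, "sigmoid": sigmoid, "elu": elu, "selu": selu}
block_activation = ACTIVATIONS["elu"]
```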

3.3 Evaluation

Metrics

The metrics used are identical to those of [4] and capture the same properties in order to make the results comparable. They are all calculated in the setting where the recommendation system makes ten recommendations. See section 2.5 for their definitions.

Test Data Set and Validation Data Set

The validation set is used during training to assess the accuracy of each model produced. The test set is used afterwards and has never been seen by the model before. All results reported in this study are obtained on the test data set. The test data size and validation data size have both been chosen as 500 to maintain comparability with [4].
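A sketch of how such a hold-out split could be made, under the assumption that whole users are held out for validation and testing; the figure of 6040 users corresponds to Movielens 1M.

```python
import random

def split_users(user_ids, val_size=500, test_size=500, seed=0):
    """Randomly hold out `val_size` users for validation and `test_size`
    users for testing; the rest are used for training."""
    users = list(user_ids)
    random.Random(seed).shuffle(users)
    val = set(users[:val_size])
    test = set(users[val_size:val_size + test_size])
    train = set(users[val_size + test_size:])
    return train, val, test

# Toy usage: Movielens 1M has 6040 users.
train, val, test = split_users(range(6040), val_size=500, test_size=500)
print(len(train), len(val), len(test))
```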

Number of tests

Training is conducted 15 times for each activation function on the dataset in order to capture variance and obtain a fair result. The models are then evaluated according to the metrics above.

Chapter 4

Results

Figures 4.1–4.4 show the mean sps, recall, user coverage and item coverage, respectively, across intermediate epochs from 1 to 102. All results are evaluated on the test data using models saved at each intermediate epoch. Each activation function was, as described, used to train a model 15 times, and the mean of each metric was computed over these runs. Table 4.1 shows the mean and the standard deviation of the results over the 15 models.

Both ELU and SELU perform worse than the sigmoid and hyperbolic functions across all metrics. Additionally, ELU always performs worse than SELU. The hyperbolic and sigmoid functions are similar in performance, with a slight advantage for the hyperbolic only in the recall metric. An observation shared among most activation functions and metrics is that the models do not seem to improve significantly beyond around 20 epochs. In the recall and sps metrics, all activation functions instead decrease. The SELU function decreases in all metrics after around 50 epochs, whereas the ELU function decreases after around 20 epochs.

Activation function    SPS (%)       Recall (%)     User Coverage (%)   Item Coverage
Hyperbolic             26.0 ± 1.4    7.05 ± 0.16    85.2 ± 1.0          595 ± 11
Sigmoid                26.6 ± 1.1    6.91 ± 0.17    84.7 ± 1.5          610 ± 14
SELU                   22.9 ± 1.5    6.08 ± 0.19    74.4 ± 1.5          507 ± 15
ELU                    16.1 ± 2.5    4.94 ± 0.43    78.9 ± 3.2          413 ± 27

Table 4.1: Comparison of activation functions and their metrics.


Figure 4.1: The mean sps across intermediate epochs. Evaluated on the test data.

Figure 4.2: The mean recall across intermediate epochs. Evaluated on the test data.

Figure 4.3: The mean user coverage across intermediate epochs. Evaluated on the test data.

Figure 4.4: The mean item coverage across intermediate epochs. Evaluated on the test data.

Chapter 5

Discussion

5.1 Result

The ELU and SELU seem to have had a negative impact on the models, as they did not achieve the same accuracy as the hyperbolic and sigmoid functions. Both functions were less accurate for short-term and long-term recommendations, fewer users received a correct recommendation, and fewer items were ever recommended. Interestingly, the sigmoid and hyperbolic functions displayed no significant difference in any metric, and the SELU function achieved, at around 50 epochs, the highest mean sps value of all the activation functions, before it started decreasing.

The ELU displayed the lowest mean and the highest standard deviation in almost all metrics. This further indicates that ELU was not a good choice of activation function. Moreover, SELU had a lower mean but a standard deviation similar to the sigmoid and hyperbolic functions. We believe this is a promising property of the SELU function, as it appears to be as stable as the sigmoid and hyperbolic functions.

The sigmoid function yields better results than the hyperbolic in sps and item coverage. Additionally, its standard deviation is slightly lower in those two metrics. Thus, according to our results, the sigmoid function could be a substitute for the default function.

The metrics associated with the hyperbolic function should be comparable with the results of [4], because the same framework is used and similar tests were performed. For a layer size of 20 neurons, as used here, they presented better results: their mean sps for the hyperbolic function, on the same dataset, was well over 30% at around 100 epochs. Furthermore, their model did not stop improving until around 100 epochs. Our results show that most activation functions had already attained their maximum sps at around 20 epochs. Had we observed smoother learning, we would have had more convincing results for the SELU and ELU functions.

5.2 Improvements

The choice of neural network parameters may explain the difference in results compared to [4]; in particular, the learning rate could affect the models. It could contribute to the fact that our models reach their maximum value more quickly, and it may hinder them from achieving similar results. We use the framework's default learning-rate parameters for the RNN, which uses Adam; this might explain the difference compared to [4]. Furthermore, the layer size, which was 20 neurons in this study, should have been varied as in [4] to better observe possible differences in learning. What neural network parameters to use should be considered more carefully in future work.

In each LSTM block, the block input and block output activation functions were the only ones changed, while the three gates (input, forget and output gates) kept the sigmoid function, which is the default. In [7], by contrast, 23 activation functions were applied to the three gates. The activation functions that showed the best performance in that study were not tested here because of time constraints. We did not observe a significant advantage for any activation function compared to the default. For future work, more comprehensive experiments evaluating more activation functions should be performed.

The study in [4] uses two datasets: Movielens 1M and Netflix. In this study, only Movielens 1M is used because of time constraints. Our results could therefore be strongly tied to the structure of this specific dataset. In future work, more datasets need to be considered.

The performance of each activation function is evaluated strictly on accuracy using each metric; the temporal aspect was overlooked. Because our tests did not record the duration for which the network was trained, whether and how an activation function achieves better accuracy in a shorter time was not evaluated. To better evaluate an activation function, future work should not overlook the temporal aspect.

Chapter 6

Conclusions

In this study, we have demonstrated that changing the activation functions in LSTM neural networks alters the prediction accuracy of movie recommendation systems. We have compared the performance of four different activation functions in LSTM neural networks (the hyperbolic tangent, sigmoid, ELU and SELU activation functions). Our results show that the hyperbolic tangent and sigmoid functions yielded higher prediction accuracy for movie recommendation systems than the ELU and SELU.

We have only compared four activation functions and trained the neural network on a single dataset. More research is needed to search for other activation functions that might perform better than the default hyperbolic tangent function. Furthermore, only one dataset was used and the temporal aspect was not considered. More and larger datasets should be employed in the search for higher-performing activation functions.

Bibliography

[1] Ericsson Consumer Lab. TV and Media – a consumer driven future of media. 2017.
[2] Song Tang, Zhiyong Wu, and Kang Chen. "Movie Recommendation via BLSTM". In: MultiMedia Modeling. Ed. by Laurent Amsaleg et al. Cham: Springer International Publishing, 2017, pp. 269–279. isbn: 978-3-319-51814-5.
[3] F.O. Isinkaye, Y.O. Folajimi, and B.A. Ojokoh. "Recommendation systems: Principles, methods and evaluation". In: Egyptian Informatics Journal 16.3 (2015), pp. 261–273. issn: 1110-8665. doi: 10.1016/j.eij.2015.06.005. url: http://www.sciencedirect.com/science/article/pii/S1110866515000341.
[4] Robin Devooght and Hugues Bersini. Collaborative Filtering with Recurrent Neural Networks. 2016. arXiv: 1608.07400 [cs.IR].
[5] Sepp Hochreiter and Jürgen Schmidhuber. "Long short-term memory". In: Neural Computation 9.8 (1997), pp. 1735–1780.
[6] Ayush Singhal, Pradeep Sinha, and Rakesh Pant. "Use of Deep Learning in Modern Recommendation System: A Summary of Recent Works". In: International Journal of Computer Applications 180.7 (Dec. 2017), pp. 17–22. issn: 0975-8887. doi: 10.5120/ijca2017916055. url: http://dx.doi.org/10.5120/ijca2017916055.
[7] Amir Farzad, Hoda Mashayekhi, and Hamid Hassanpour. "A comparative performance analysis of different activation functions in LSTM networks for classification". In: Neural Computing and Applications 31.7 (2019), pp. 2507–2521. issn: 1433-3058. doi: 10.1007/s00521-017-3210-6. url: https://doi.org/10.1007/s00521-017-3210-6.


[8] Yung-Yao Chen et al. "Design and Implementation of Cloud Analytics-Assisted Smart Power Meters Considering Advanced Artificial Intelligence as Edge Analytics in Demand-Side Management for Smart Homes". In: Sensors (Basel, Switzerland) 19.9 (May 2019), p. 2047. issn: 1424-8220. doi: 10.3390/s19092047. url: https://pubmed.ncbi.nlm.nih.gov/31052502.
[9] Patrick Henry Winston. Artificial Intelligence (3rd Ed.). USA: Addison-Wesley Longman Publishing Co., Inc., 1992. isbn: 0201533774.
[10] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. http://www.deeplearningbook.org. MIT Press, 2016.
[11] Christopher Olah. Understanding LSTM Networks. Aug. 2017. url: http://colah.github.io/posts/2015-08-Understanding-LSTMs/#fn1.
[12] Gecynalda S. da S. Gomes, Teresa B. Ludermir, and Leyla M. M. R. Lima. "Comparison of new activation functions in neural network for forecasting financial time series". In: Neural Computing and Applications 20.3 (2011), pp. 417–439. issn: 1433-3058. doi: 10.1007/s00521-010-0407-3. url: https://doi.org/10.1007/s00521-010-0407-3.
[13] Soufiane Hayou, Arnaud Doucet, and Judith Rousseau. "On the Impact of the Activation function on Deep Neural Networks Training". In: Proceedings of the 36th International Conference on Machine Learning. Ed. by Kamalika Chaudhuri and Ruslan Salakhutdinov. Vol. 97. Proceedings of Machine Learning Research. Long Beach, California, USA: PMLR, 2019, pp. 2672–2680. url: http://proceedings.mlr.press/v97/hayou19a.html.
[14] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). 2015. arXiv: 1511.07289 [cs.LG].
[15] Günter Klambauer et al. "Self-Normalizing Neural Networks". In: CoRR abs/1706.02515 (2017). arXiv: 1706.02515. url: http://arxiv.org/abs/1706.02515.
[16] Nelly Elsayed, Anthony Maida, and Magdy Bayoumi. "Effects of Different Activation Functions for Unsupervised Convolutional LSTM Spatiotemporal Learning". In: Advances in Science, Technology and Engineering Systems Journal 4 (Apr. 2019). doi: 10.25046/aj040234.

[17] Xingjian SHI et al. "Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting". In: Advances in Neural Information Processing Systems 28. Ed. by C. Cortes et al. Curran Associates, Inc., 2015, pp. 802–810. url: http://papers.nips.cc/paper/5955-convolutional-lstm-network-a-machine-learning-approach-for-precipitation-nowcasting.pdf.

TRITA-EECS-EX-2020:414

www.kth.se