Exploration of Deep Autoencoders on Cooking Recipes

Ghent University
Master Thesis

Exploration of deep autoencoders on cooking recipes

Author: Lander Bodyn
Promoter: Prof. Dr. Christophe Ley
Co-promoter: Prof. Dr. Willem Waegeman
Tutor: Ir. Michiel Stock

A thesis submitted in partial fulfilment of the requirements for the degree of Master of Science in Computational Statistics

Department of Applied Mathematics, Computer Science and Statistics
January 2017

Abstract

Deep autoencoders are a form of deep neural network that can be used to reduce the dimensionality of datasets. These networks can be very hard to train [1]. In this thesis, the gradient descent algorithm was explored for training deep autoencoders on a dataset of cooking recipes. Minibatches, momentum and pretraining were added as extensions to improve the gradient descent algorithm. The performance of the deep autoencoders for data reduction to two dimensions was compared to that of singular value decomposition. The best deep autoencoder model obtained a cross entropy loss of 0.048, much lower than the cross entropy loss of 0.066 for singular value decomposition. From the two reduced dimensions, the regions of the recipes were predicted using the KNN and QDA algorithms. For the deep autoencoder models, the best prediction accuracy was 65.4%, outperforming the best prediction accuracy of singular value decomposition, 55.4%. The best prediction accuracy on the raw dataset was 69.8%, suggesting that the deep autoencoders maintain the structure of the regions very well in two dimensions. Using a deep autoencoder with data reduction to 100 dimensions, the prediction accuracy was 72.0%, suggesting that deep autoencoders might have some usefulness for representation learning on this dataset.

Dimensionality reduction techniques can also be used as recommender systems, using collaborative filtering. Deep autoencoder models were optimized to have the best retrieval rank of an ingredient that was either removed from or added to an existing recipe. De Clercq et al. [2] have built two similar recommender models on the same dataset: a non-negative matrix factorization and a two-step kernel ridge regression model. The deep autoencoder (mean rank = 25.2) outperforms the non-negative matrix factorization (mean rank = 33.0) and comes close in performance to the two-step kernel ridge regression (mean rank = 23.6).

Acknowledgements

I would first like to thank my promoter Prof. Dr. Christophe Ley from the Faculty of Sciences at Ghent University. He instantly accepted my plan to do a thesis in deep learning and helped me by proofreading many parts of the thesis, suggesting which parts I should explain more clearly. Although my promoter is not accustomed to the field of deep learning, he helped me find a co-promoter who could guide me in the practical parts of the thesis. This brought me to my co-promoter Prof. Dr. Willem Waegeman and supervisor Ir. Michiel Stock from the Faculty of Bioengineering at Ghent University. I would like to thank both for coming up with a very interesting thesis subject, proofreading several parts of the thesis and continually guiding me during the thesis.

I also want to thank my friend Giancarlo Kerg, who inspired me to start my master in Computational Statistics as a foundation to move towards the field of machine learning, and deep learning in particular. I also want to thank the company Yazzoom, where I could do an internship in deep learning during my thesis. Some of the skills I learned at Yazzoom helped me to make progress in my thesis.

Finally, I want to thank my family, who have supported me throughout the whole process. Special thanks to my grandma, who made my lunch and dinner every day, and to my parents, who endured all the fluctuations in my mood during the writing of my thesis.
Contents

Abstract
Acknowledgements
Contents

1 Introduction
  1.1 Theoretical background
  1.2 Overview of the thesis
  1.3 The cooking recipes dataset
2 Methods
  2.1 From artificial intelligence to deep learning
    2.1.1 Machine learning
    2.1.2 Representation learning
    2.1.3 Deep learning
  2.2 Deep autoencoders
    2.2.1 Network architecture
    2.2.2 Singular value decomposition for dimension reduction
    2.2.3 Deep autoencoders for dimension reduction
    2.2.4 Deep autoencoders for representation learning
    2.2.5 Deep autoencoders for collaborative filtering
  2.3 Training the network with gradient descent
    2.3.1 Local minima
    2.3.2 The vanishing gradient problem
    2.3.3 Initialisation of the network parameters
    2.3.4 Minibatch gradient descent
    2.3.5 Momentum
  2.4 Optimisation of the hyperparameters
    2.4.1 The gradient descent hyperparameters
      2.4.1.1 Learning rate δ
      2.4.1.2 Batchsize
      2.4.1.3 Inertia α
      2.4.1.4 Initialisation range
    2.4.2 The network architecture
  2.5 Python and the Theano package
    2.5.1 Backward propagation of the gradient
    2.5.2 Other packages
3 Results
  3.1 Training the autoencoders
    3.1.1 Adding extensions to the gradient descent algorithm
    3.1.2 Plateaus
  3.2 Comparing data reduction methods
    3.2.1 Singular value decomposition
    3.2.2 Autoencoders
  3.3 Prediction of the regions
  3.4 Collaborative filtering for recipe creation
    3.4.1 Reconstruction of the removed ingredient
    3.4.2 Elimination of the added ingredient
4 Conclusion and discussion
  4.1 Conclusion
  4.2 Discussion
A Admission for circulating the work
Bibliography

Chapter 1
Introduction

1.1 Theoretical background

An autoencoder is a type of artificial neural network. When a neural network has several hidden layers, the network is called a deep network. The gradient descent algorithm is currently the dominant way of training neural networks. It can, however, be difficult to train neural networks using the gradient descent algorithm; this is especially true for deep autoencoders [1].

Autoencoders are designed to reduce the dimensionality of the dataset while minimizing a reconstruction error. They can be seen as a non-linear extension of singular value decomposition, which is a linear data reduction method. Data reduction methods are useful to obtain visualisations of the data in two or three dimensions. Dimensionality reduction also has other applications. For example, the reduced features can be more suitable for a machine learning task than the original features.

With the growing availability of online datasets of cooking recipes, machine learning is starting to play a prominent role in tasks such as food preference modelling. An algorithm that could combine leftover ingredients into a good recipe would be a useful application. De Clercq et al. [2] built two such recommender systems on a dataset containing the ingredients of recipes. For the recommender systems, the authors used a non-negative matrix factorization model and a two-step kernel ridge regression model.
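As an aside on the linear baseline mentioned earlier in this section, the sketch below reduces a small binary recipe-by-ingredient matrix to two dimensions with a truncated singular value decomposition and reconstructs it again. The toy matrix, the function name svd_reduce and the use of a mean squared reconstruction error are assumptions made for illustration only; they are not taken from the thesis, which reports a cross entropy loss on the real dataset.

```python
import numpy as np


def svd_reduce(X, k):
    """Reduce the rows of X to k dimensions with a truncated SVD.

    Hypothetical helper for illustration; returns the reduced
    coordinates and the rank-k reconstruction of X.
    """
    # Thin SVD: X = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * s[:k]    # reduced coordinates, shape (n_recipes, k)
    X_hat = Z @ Vt[:k, :]   # rank-k reconstruction, same shape as X
    return Z, X_hat


# Made-up toy data: 6 recipes x 5 ingredients, 1 = ingredient present.
X = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

Z, X_hat = svd_reduce(X, k=2)

# Mean squared reconstruction error (an assumed, simple error measure).
mse = np.mean((X - X_hat) ** 2)
print("reduced shape:", Z.shape)
print("reconstruction MSE:", mse)
```

The two columns of Z play the role of the two reduced dimensions used for visualisation. An autoencoder replaces the linear projection and reconstruction steps with non-linear encoder and decoder networks.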
Deep autoencoders can also be used as a recommender system: to compress the ingredients of the recipes into fewer dimensions, the network has to learn meaningful features of the recipes. A selection of ingredients can be reconstructed by the autoencoder, after which the selection will resemble the recipes from which the autoencoder has learned its parameters.

1.2 Overview of the thesis

This thesis explores how deep autoencoders can be optimally trained with the gradient descent algorithm on a dataset of cooking recipes. To speed up the gradient descent algorithm and improve its performance, two extensions were added to the algorithm: minibatches and momentum. Aside from improving the gradient descent algorithm itself, pretraining of the network parameters was implemented as another tool to facilitate convergence to a good solution.

The deep autoencoder models were compared to singular value decomposition (SVD) for the purpose of data reduction. The performance of the models was measured using a reconstruction error. It was also examined how well both methods maintained the structure of the data, by visually checking whether recipes with similar regions of origin lay close together on the biplots of the reduced features. The regions of the recipes were then predicted using the KNN and QDA algorithms on the reduced features. These predictions might even be better than those of models using the original dataset, since data reduction algorithms can make the data more suitable for the prediction task.

The thesis also explored the use of deep autoencoder models as recommender systems. The same dataset and performance measures as De Clercq et al. [2] were used, to enable comparison with their recommender models.

1.3 The cooking recipes dataset
