

Universidad Politécnica de Madrid

Movie recommender based on visual content analysis using deep learning techniques

MÁSTER UNIVERSITARIO EN INGENIERÍA DE TELECOMUNICACIÓN

TRABAJO FIN DE MÁSTER

Lucía Castañeda González

2019

MÁSTER UNIVERSITARIO EN INGENIERÍA DE TELECOMUNICACIÓN

TRABAJO FIN DE MÁSTER

Título: Movie recommender based on visual content analysis using deep learning techniques.

Autor: Lucía Castañeda González

Tutor: Alberto Belmonte Hernández

Ponente: Federico Álvarez García

Departamento: Señales, Sistemas y Radiocomunicaciones (SSR)

MIEMBROS DEL TRIBUNAL

Presidente:

Vocal:

Secretario:

Suplente:

Los miembros del tribunal arriba nombrados acuerdan otorgar la calificación de:

......

Madrid, a de de 2019


Summary

Nowadays there is a growing interest in the artificial intelligence sector and its varied applications, which allow solving problems that are very intuitive and nearly automatic for humans but very complicated for machines. One of these problems is the automatic recommendation of multimedia content.

In this context, the proposed work exploits Computer Vision and Deep Learning techniques for content analysis in video. Based on the extracted intermediate information, a recommendation engine will be developed that incorporates learning algorithms and uses film trailers as its base data.

This project is divided into two main parts. After obtaining the dataset of movie trailers, the first part of the project consists of the extraction of characteristics from the different trailers. For this purpose, computer vision techniques and deep learning architectures will be used. The set of algorithms ranges from computer vision tasks, such as the analysis of colour histograms and optical flow, to complex action analysis and object detectors based on Deep Learning algorithms.

The second part of the project is the recommender engine. For the recommender, different machine learning and Deep Learning methods will be put into practice in order to learn efficiently the correlations between the data. This recommender will be trained using neural networks over the selected dataset.

Three options with three different architectures will be implemented for the recommender engine. The first is a simple sequential neural network, the second an autoencoder and the third a double autoencoder. To compare the results of the three options, objective metrics (MSE, MAE, precision) and subjective metrics (polls) will be used.

The final output of the project is, for one input trailer, the ten best matches based only on the content analysis and the trained recommender.

Resumen

Hoy en día existe un interés creciente en el sector de la inteligencia artificial y sus variadas aplicaciones, que permiten resolver problemas que para los humanos son muy intuitivos y casi automáticos, pero para las máquinas son muy complicados. Uno de estos problemas es la recomendación automática de contenido multimedia.

En este contexto, el trabajo propuesto trata de explotar las técnicas de visión artificial y Deep Learning para el análisis de contenido en vídeo. Basándose en la información extraída, se desarrollará un motor de recomendación que permite la inclusión de algoritmos de aprendizaje que utilizan como base de datos tráileres de películas.

Este proyecto se divide en dos partes principales. Tras obtener el conjunto de datos de tráileres de películas, la primera parte del proyecto consiste en la extracción de características de dichos tráileres. Para este propósito, se utilizarán técnicas de visión artificial y arquitecturas de aprendizaje profundo. El conjunto de algoritmos va desde tareas de procesamiento de imágenes, como el análisis de histogramas de color y flujo óptico, hasta análisis complejos de acciones o detectores de objetos basados en algoritmos de Deep Learning.

La segunda parte del proyecto es el motor de recomendación. Para el recomendador, se pondrán en práctica diferentes métodos de aprendizaje automático y aprendizaje profundo para aprender de manera eficiente las correlaciones entre los datos. Este recomendador se entrenará utilizando redes neuronales sobre el conjunto de datos seleccionado.

Se realizarán tres opciones diferentes con tres arquitecturas distintas para el motor de recomendación. La primera será una red neuronal secuencial simple, la segunda un autoencoder y la tercera un doble autoencoder. Para comparar los resultados de las tres opciones, se utilizarán métricas objetivas (MSE, MAE y precisión) y métricas subjetivas (encuestas).

El resultado final del proyecto proporciona, a partir de un tráiler de entrada, las diez mejores coincidencias basadas únicamente en el análisis de contenido y el recomendador entrenado.

Keywords

Machine learning, deep learning, recommender, neural network, autoencoder, image processing, computer vision, Python, TensorFlow, Keras, PyTorch.

Palabras clave

‘Machine Learning’, aprendizaje profundo, recomendador, red neuronal, autoencoder, procesamiento de imágenes, visión artificial, Python, TensorFlow, Keras, PyTorch.

Gracias a mi familia, por el apoyo incondicional a una hija que, cuando les contaba sobre su TFM, parecía hablar en klingon. Y a mi tutor, por su ayuda inagotable y por contagiarme su entusiasmo.

Index

1 Introduction and objectives
1.1 Introduction
1.2 Objectives
2 State of the art
2.1 Recommendation systems
2.1.1 Deep learning basics
2.1.2 Deep learning and recommendation systems
2.2 Deep learning and visual content based recommendation systems
2.2.1 Computer vision
2.2.2 Action recognition
2.2.3 Object detector
3 Development
3.1 Machine Learning and Deep Learning process chain
3.2 Proposed architecture
3.3 Feature extraction
3.3.1 Dataset
3.3.2 Features
3.4 Embedding
3.5 Distances
3.6 Deep Learning Recommender System Architectures
3.6.1 Deep Neural Network
3.6.2 Autoencoder
3.6.3 Double autoencoder
4 Results
4.1 Feature extraction
4.1.1 Action recognition
4.1.2 RGB Histogram Feature
4.1.3 Object detector
4.1.4 Optical flow
4.1.5 Joined Feature
4.2 Embedding
4.2.1 Embedding training
4.2.2 Embedding prediction
4.2.3 Comparison between using or not embedding
4.3 Distances
4.3.1 Euclidean distances
4.3.2 Cosine distances
4.4 Recommender Objective Evaluation Metrics
4.5 Deep Neural Network Recommender
4.5.1 Neural Network training
4.5.2 Deep Neural Network prediction
4.6 Autoencoder
4.6.1 Autoencoder training
4.6.2 Autoencoder prediction
4.7 Double autoencoder
4.7.1 Double autoencoder training
4.7.2 Double autoencoder prediction
4.8 Subjective comparison between solutions
4.8.1 Surveys
5 Conclusions and future lines
5.1 Conclusions
5.2 Future lines
References
Appendices
A Ethical, social, economic and environmental aspects
A.1 Introduction
A.2 Description of relevant impacts related to the project
A.2.1 Ethic impact
A.2.2 Social impact
A.2.3 Economic impact
A.2.4 Environmental impact
A.3 Conclusions
B Economic budget
C Survey results
C.1 Euclidean distance recommendations
C.2 Artificial Neural Network recommendations
C.3 Autoencoder recommendations
C.4 Double Autoencoder recommendations
D Survey template
E Detectable classes by object detector
F Detectable classes by the action recogniser

Index of figures

2.1 Youtube Machine
2.2 LRCN architecture [1]
2.3 3D CNN example from [2]
2.4 Faster R-CNN architecture
2.5 YOLO working scheme [3]
2.6 SSD working scheme [4]
2.7 RetinaNet working scheme [5]
2.8 Mask R-CNN working scheme [6]
3.1 Proposed architecture
3.2 Gradient descent function
3.3 Classification overfitting
3.4 Classification underfitting
3.5 Classification compromise between underfitting and overfitting
3.6 Proposed architecture
3.7 Multi-genres distribution
3.8 Action recognition training
3.9 ResNet50
3.10 Action recognition prediction
3.11 Histogram process chain
3.12 Action film colour histogram
3.13 Action film colour histogram
3.14 Object detector training
3.15 YOLO architecture [3]
3.16 Object detection architectures comparison
3.17 Object prediction
3.18 Object prediction example
3.19 Optical flow extraction process
3.20 Embedding training
3.21 Embedding prediction
3.22 Artificial neural network
3.23 Autoencoder
3.24 Double autoencoder
4.1 Example outside image
4.2 Example inside image
4.3 Outside example results
4.4 Inside example results
4.5 Example day image
4.6 Example night image
4.7 Day example results
4.8 Night example results
4.9 Example mountain image
4.10 Example sea image
4.11 Mountain example results
4.12 Sea example results
4.13 "Batman & Robin" histogram results
4.14 "Someone Marry Barry" histogram results
4.15 "17 Again" histogram results
4.16 "Night at the Museum" histogram results
4.17 "A Resurrection" histogram results
4.18 "Say It Isn't So" histogram results
4.19 "It's Complicated" histogram results
4.20 Animals detection
4.21 Vehicle detection
4.22 Sport equipment detection
4.23 Weapon detection
4.24 Not all objects detected
4.25 Wrong detection
4.26 Dark place person
4.27 Cartoon person
4.28 Blurry image person
4.29 Semi-transparent person
4.30 Person detection
4.31 Not all objects detected
4.32 Human face detection
4.33 Burning car
4.34 Boat
4.35 Sci-Fi ship
4.36 "Harry Potter" clothes detection
4.37 "Star Wars" clothes detection
4.38 Object detections in cartoons
4.39 Dancing optical flow representation
4.40 Dancing optical flow HSV representation
4.41 Talking optical flow
4.42 Talking optical flow HSV representation
4.43 Fighting optical flow representation
4.44 Fighting optical flow HSV representation
4.45 Joined feature with PCA scatter
4.46 Joined feature with two-dimensional TSNE
4.47 Embedding loss
4.48 Embedding accuracy
4.49 Embedding feature representations
4.50 Without embedding
4.51 Embedded
4.52 PCA action representation with 1 (red), 2 (yellow) and 3 (green) for action genre
4.53 Without embedding
4.54 Embedded
4.55 TSNE action representation with 1 (red), 2 (yellow) and 3 (green)
4.56 Without embedding
4.57 Embedded
4.58 PCA three genres representation: action (red), science-fiction (yellow) and horror (green)
4.59 Without embedding
4.60 Embedded
4.61 TSNE three genres representation: adventure (red), crime (yellow) and thriller (green)
4.62 Evolution along epochs
4.63 Evolution from epoch 100000
4.64 Neural Network RMSE
4.65 Evolution along epochs
4.66 100000-300000 epochs
4.67 Autoencoder RMSE
4.68 Evolution along epochs
4.69 100000-300000 epochs
4.70 Double autoencoder, first autoencoder RMSE
4.71 Evolution along epochs
4.72 100000-300000 epochs
4.73 Double autoencoder, second autoencoder RMSE
B.1 TFM budget
D.1 Survey Template

Glossary

ML – Machine Learning

DL – Deep Learning

CV – Computer Vision

RGB - Red, Green, Blue

HSV - Hue, Saturation, Value

NMS – Non-Maximum Suppression

NN – Neural Network

ANN – Artificial Neural Network

CNN – Convolutional Neural Network

RPN - Region Proposal Network

LR - Learning Rate

SGD - Stochastic Gradient Descent

ReLU - Rectified Linear Unit

ResNet - Residual Neural Network

Faster R-CNN – Faster Region Based Convolutional Neural Network

YOLO – You Only Look Once

KNN - K-Nearest-Neighbours

GMM - Gaussian Mixture Models

PCA - Principal Component Analysis

TSNE - t-Distributed Stochastic Neighbour Embedding


Chapter 1

Introduction and objectives

1.1 Introduction

Nowadays, multimedia content recommenders are in high demand, and there are many advantages in using them in on-demand multimedia services. They directly influence the users' evaluation of the service and, therefore, their permanence in it and their purchases.

The most innovative recommenders are based on deep learning, a technology that has also boomed in recent years and that is key to the future of Artificial Intelligence and Big Data.

This project makes use of these trending technologies to create a film recommender, combining image processing, computer vision and machine/deep learning techniques. This work describes the development and results of a movie recommender based on visual content analysis using deep learning techniques.

The work is divided into three large blocks: the extraction of content, an embedding and the artificial networks. Each block produces a solution that can be applied in different areas.

The extraction of content generates information from a movie. This information can be used for multiple purposes: in this case it is used to recommend, but it could also be used to classify the content or to find peculiarities of the films. What feature extraction does is create a database with visual information about the movies.

The embedding block trains a model that essentially allows two things. The first, which is the one used in this project, is to project the extracted content into another subspace that facilitates its subsequent training for recommendation. The second possible use is as a classifier of film genres.

Finally, there is the block of artificial networks that generate a recommendation. In this block, three different deep learning architectures have been tested to find the best way to recommend a movie from a dataset. Each network can be trained for any movie dataset, after first going through the previous blocks. In addition, it not only generates a recommendation but also indicates how recommendable each film in the dataset is with respect to the film for which the recommendation is sought.

1.2 Objectives

The main objectives of this work are presented in the following list.

• Learn the use of deep learning techniques and programming tools.

• The creation of a method for extracting visual content from a movie set.

• Use the concept of embedding to project the data into a different subspace.

• Generation of a movie recommender using different deep learning architectures with the same purpose.

• Evaluation with objective and subjective metrics.

The work includes different types of techniques, starting from computer vision tools and finishing with state-of-the-art deep learning techniques to extract hidden knowledge from the films. Different deep learning architectures of increasing complexity have been tested.

The results include both general metrics that measure the performance of the final trained algorithms and subjective tests carried out with real people to gauge how the proposed recommendations are perceived.

Chapter 2

State of the art

2.1 Recommendation systems

Nowadays, due to several reasons, among which are the increase in broadband Internet access and the proliferation of smartphones, multimedia content is growing rapidly. As a result, both traffic and multimedia consumption in the network have grown exponentially in recent years. This rise is the key to the success of multimedia platforms such as YouTube, Netflix or Spotify. However, this rapid growth of multimedia information in our daily lives has created an information overload and a greater complexity in decision making. Therefore, due to the large amount of multimedia content that exists, it is very important to filter it, with two main objectives.

The first objective is to provide users with a specialised service that allows them to easily access the contents that interest them and enables a better user experience. The second follows from the first: offering users content according to their interests leads to greater consumption and, therefore, increases the profits of the company.

A recommender system is a technology that filters content in order to improve access and proactively recommends relevant items to users by considering the content information and/or the users' preferences and behaviours.

In order to implement a recommender with machine learning, it is necessary to use technology based on algorithms.

The algorithms used for recommendation are usually divided into two categories, content-based methods and collaborative filtering methods, or a combination of both. Content-based methods do not involve other users; they only need the user's likes to find a recommendation. They are based on analysing the content characteristics using different techniques such as NLP, computer vision or audio processing. Once the content has been analysed, the recommender suggests multimedia material whose content resembles the items that the user has indicated they like.

Collaborative filtering bases its recommendations on users' past behaviour and on the idea that similar users will have similar interests.
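To make the content-based idea concrete, the following minimal sketch (illustrative only, with hypothetical titles and feature vectors, not the pipeline developed later in this work) ranks items by cosine similarity of their content features:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical content features: one row per movie, one column per visual descriptor.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
features = np.array([
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.4],
    [0.1, 0.9, 0.7],
    [0.2, 0.8, 0.6],
])

def recommend(query_index, top_n=2):
    # Cosine similarity between the query movie and every movie in the catalogue.
    sims = cosine_similarity(features[query_index:query_index + 1], features)[0]
    sims[query_index] = -1.0            # exclude the query itself
    ranked = np.argsort(sims)[::-1]     # most similar first
    return [(titles[i], float(sims[i])) for i in ranked[:top_n]]

print(recommend(0))   # movies whose content is closest to "Movie A"
```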

Nowadays there are several companies working exclusively on recommenders with machine learning, such as Think Analytics [7], Gravity R&D [8] and Recombee [9]. There are also many companies dedicated to the broadcasting of multimedia content that are improving their recommendations with machine learning algorithms, such as Netflix [10] or Spotify [11].

This recommendation technique has been in high demand in recent years, since it has proven to be a powerful tool to satisfy user expectations.

The best algorithms and methods have been implemented by private companies, so their code has not been released. But there are some datasets and public code that are also interesting.

Regarding open code, for each of the datasets already mentioned there is a lot of open code already developed. In the GitHub repository of the LMTD dataset [12] there are two notebooks with examples of how to use the dataset. Apart from these dataset examples, a lot of open source code can also be found on GitHub and Kaggle. However, while many projects exist for music recommendation, complete projects for movie recommendation are not so easy to find.

Although open source code is not abundant, there are two examples of the scheme that important video broadcasting companies follow. The two examples described below are the YouTube and Netflix mechanisms:

The YouTube system [13] consists of two neural networks: one for candidate generation and one for ranking.

The first network, candidate generation, takes events from the user's YouTube activity history as input and reduces the whole set of videos that make up the corpus to only a small set of candidate videos. This first neural network provides a personalised recommendation per user through collaborative filtering.

The ranking network is responsible for scoring each video based on an objective function established over a series of parameters; this score allows presenting to the user those videos considered the best recommendations.

Figure 2.1: Youtube Machine

The Netflix recommendation system [10] starts from the first moment: when a user creates a Netflix account, or adds a new profile to the account, the user is asked to choose some titles they like. These titles are used to start the recommendations and connect with the user's preferences. If the user skips this step, the first recommendations provided will be content that is popular and relevant among most Netflix users, and later the content will become more personalised.

In the second step, the Netflix recommendation system observes the user's interactions with the service, other members with similar tastes and preferences (collaborative filtering), and information about the titles, such as genre, actors, etc. In addition to knowing what the user has watched on Netflix, to best personalise the recommendations it also looks at things like the time of day they watch, how long they watch and the devices on which Netflix is watched.

The system chooses which titles to include in the rows of the homepage; in addition, it also ranks each title within a row and then ranks the rows themselves (neural networks).

2.1.1 Deep learning basics

These days, deep learning provides the state-of-the-art solutions in a wide range of fields. Artificial neural networks (ANN) were the start of this field, changing the way machine learning techniques learn by introducing non-linear layers that break the linearity between data.

Several types of deep learning networks have appeared, surpassing the results obtained with traditional techniques in areas such as feature extraction and selection, computer vision, or time series analysis. Artificial neural networks are able to learn complex behaviours from feature vectors in a different way than classical machine learning techniques. Convolutional Neural Networks (CNN) are one of the most important advances in computer vision due to their ability to automatically extract feature vectors learned during training, avoiding manual feature selection. Finally, Recurrent Neural Networks (RNN) can work with temporal data, learning complex patterns to predict future behaviours in time.

2.1.1.1 Deep learning advantages and disadvantages

Below are presented the most significant advantages and disadvantages of the use of deep learning algorithms. Due to the intrinsic complexity of this type of algorithm, some tasks can be performed efficiently but others suffer from heavy computational requirements. Deep learning is only a new tool, and traditional algorithms are able to solve some concrete problems efficiently without introducing this complexity into the system. Some advantages and disadvantages of deep learning techniques in real-world applications, in contrast to traditional techniques (computer vision, well-known algorithms for concrete operations, among others), are enumerated below.

Deep learning advantages:

• Non-linearity: Unlike classical models, which are basically linear, deep learning models are non-linear. Using non-linear activations (ReLU, sigmoid, tanh, etc.), deep neural networks are able to model non-linearity in data. With this property, deep learning algorithms can find complex and intricate interaction patterns in the data.

• Representation learning: This advantage is due to the fact that deep neural networks are effective in learning helpful representations of data. In the case of recommendations, there is a large amount of data available with information regarding the relationship between items and users. Making use of these data can expand the knowledge we have about items and users, which improves the recommender. For this reason, using deep neural networks to learn representations is a good choice to improve recommendations. Using representation learning presents two main advantages:

– The difficulty of hand-crafted feature design decreases. Deep neural networks make it possible to treat feature engineering as an automatic activity, with supervised or unsupervised approaches.

– Representation learning makes it possible to include different content such as text, audio, images or video. It has been shown that deep learning improves when representations are learned from different sources.

• Sequential modelling: Sequential models can deal with the temporal dynamics of users' behaviour with good results. Both RNNs and CNNs are deep learning techniques that can be applied to sequential modelling.

• Flexibility: In general, not only in the recommendation field, deep learning techniques have high flexibility, especially when working with the most popular frameworks such as TensorFlow, Keras, PyTorch, Theano, etc. These frameworks work in a modular way and have good support from a very active community of professionals. Their modularity makes development efficient; one example is the ease of combining different neural networks into hybrid models, or of replacing a module. This ease reduces the complexity of capturing different characteristics and factors simultaneously.

Deep learning disadvantages:

• Interpretability: One of the main problems of deep learning is that it acts as a black box. Not providing explanations of the predictions is a significant disadvantage, since the hidden weights and activations are non-interpretable. Nevertheless, models are nowadays starting to offer some interpretability, which makes explainable recommendations possible.

• Hyperparameter tuning: This disadvantage arises because, in order to obtain good results, the correct choice of hyperparameters is essential. But this choice is very complex, since there is no exact way to calculate them, there are many hyperparameters and a very large tuning range. This problem is not exclusive to deep learning; it already appears in machine learning, although deep learning usually adds more hyperparameters. Much research has been done to find the ideal selection of hyperparameter values, but an optimal solution has not yet been found. Other investigations pursue working with a single hyperparameter instead of several, to facilitate the tuning.

• Data need: This last disadvantage is associated with the fact that deep learning in general, not only for recommendations, is data hungry. That is, datasets large enough are needed for it to work correctly. On the other hand, in the field of recommendations there is plenty of data, so this problem is less of a concern.

2.1.2 Deep learning and recommendation systems

At this moment, deep learning (DL) enjoys great popularity. In the last few decades it has achieved considerable success in many domains, like speech recognition or computer vision, and both academia and industry are in constant search to improve deep learning techniques, investigating how to apply them to a wider range of applications in which this discipline can help thanks to its ability to solve complex tasks.

Recommendation architectures have changed drastically in recent times since deep learning has been applied to them. Deep learning provides more opportunities to enhance recommendation efficiency. The interest in the latest advances in recommendation systems based on deep learning has increased considerably because it has overcome obstacles that conventional models were not able to solve, obtaining recommendations of great quality.

For the industry, a recommendation system is very important to improve the user experience, which promotes sales. Some interesting examples are the recommendations of Netflix and YouTube. In the case of Netflix, 80 percent of the movies that users watch come from recommendations. For YouTube, 60 percent of the videos that are clicked come from recommendations.

In the YouTube case, the paper [13] explains how a recommendation algorithm based on deep neural networks has been used for video recommendation. In [14] the Google Play recommender system, which uses a wide and deep model, can be seen. The last example is the Yahoo News recommender, which uses a recommender system based on RNNs, as explained in [15]. All these examples have shown an important improvement over traditional models. Another sign of the rise of deep learning in recommendation systems is that since 2016 RecSys, the leading international conference on recommender systems, has run a regular workshop on deep learning for recommender systems.

Deep learning is a subfield of machine learning that makes use of artificial neural networks. Deep learning learns deep representations, meaning multiple levels of abstractions and representations from data. The different deep learning algorithms are based on techniques such as Convolutional Neural Networks, Recurrent Neural Networks, Multilayer Perceptrons, Autoencoders, Restricted Boltzmann Machines, Neural Autoregressive Distribution Estimation, Adversarial Networks, Attentional Models and deep reinforcement learning, among others.

Deep learning has many advantages for recommendations. One of the most interesting properties for this field is that these models are end-to-end differentiable and provide adequate inductive biases for the class of data; that is, if some kind of inherent structure can be found in the data, deep neural networks will be adequate for that case. In addition, in the case of content-based recommendation, deep learning has the advantage of being compositional: multiple neural building blocks can be combined into a single differentiable function and trained end-to-end. An example of this is that, to work with textual or image data, CNNs and RNNs are practically indispensable building blocks.

2.2 Deep learning and visual content based recommendation systems

In order to build a content-based recommender in the visual field, we will analyse the classification of visual concepts that must be performed. This classification is complicated due to its complexity and the variability of appearance. In [16] it is proposed, for example, to analyse objects, sites, scenes, personalities, events or activities as visual concepts. In another article [17], a standardisation, which the authors call a lexicon, is proposed when looking for these concepts in order to avoid a semantic gap. It consists in categorising a series of general concepts into five categories: "who", "what", "where", "when" and "how", and a definition is proposed for each category. "Who" corresponds to the number of people or animals that appear in the scene, "what" indicates the actions or events, "where" the location or places, "when" indicates whether it is day or night, and finally "how" gives information about shot sizes, since they strongly correlate with specific actions. The article [16] also indicates the importance of defining a minimum number of positive samples per concept.

Generally, in classic image recognition only a single concept per image has been considered ("single label"). But nowadays there are other proposals, such as [16], in which "multilabels" are used, extending the CNN architecture with a sigmoid layer.

Next, the state of the art of the most interesting features to analyse for a content-based recommendation system will be detailed. For two of them deep learning is used, and for the other two computer vision techniques are applied.

2.2.1 Computer vision

Computer vision is a field that acquires, processes, analyses and tries to understand images or sequences of images. This discipline seeks to quantify and produce information from images that a computer is able to understand and deal with. In order to acquire such information, a huge variety of techniques exist.

Computer vision is closely linked with artificial intelligence, as the computer must interpret what it sees, and then perform appropriate analysis or act accordingly.

But there are important challenges in computer vision. Initially, it was believed to be a trivially simple problem that could be solved by a student connecting a camera to a computer. After decades of research, computer vision remains unsolved, at least in terms of meeting the capabilities of human vision.

One reason is that we don’t have a strong grasp of how human vision works.

Studying biological vision requires an understanding of the perception organs like the eyes, as well as the interpretation of the perception within the brain. Much progress has been made, both in charting the process and in terms of discovering the tricks and shortcuts used by the system, although like any study that involves the brain, there is a long way to go.

Another reason why it is such a challenging problem is because of the complexity inherent in the visual world.

A given object may be seen from any orientation, in any lighting conditions, with any type of occlusion from other objects, and so on. A true vision system must be able to “see” in any of an infinite number of scenes and still extract something meaningful.

Computers work well for tightly constrained problems, not open unbounded problems like visual perception.

Some examples of techniques used in computer vision are colour histograms, background extraction, optical flow, surface and shape estimation, and depth maps. All these techniques have been very useful to extract knowledge from the image in order to perform other tasks such as classification, regression, or detection and recognition. One of the most important parts is feature extraction. This task consists in extracting different vectors that can accurately represent different situations, scenes or parts of the images.

These features or image descriptors can be obtained by applying several different techniques. For example, the Histogram of Oriented Gradients (HOG) is used to extract knowledge about the size and shape of objects in images. Local Binary Patterns (LBP) is a descriptor that is very useful to detect different textures. Several other techniques, such as keypoint extractors, exist in the literature (Harris detector, Sobel mask, FAST, SURF, BRIEF, ORB, among others).
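As an illustrative sketch of classical descriptor extraction (assuming OpenCV is installed and using a placeholder frame path), ORB keypoints and a HOG vector can be computed from a single frame as follows:

```python
import cv2

# Load one video frame in grayscale ("frame.jpg" is a placeholder path).
img = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)

# ORB: a fast binary keypoint detector and descriptor.
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(img, None)
print(len(keypoints), "keypoints,", None if descriptors is None else descriptors.shape)

# HOG: captures the distribution of local gradient orientations (size/shape cues).
hog = cv2.HOGDescriptor()
hog_vector = hog.compute(cv2.resize(img, (64, 128)))  # default HOG window is 64x128
print("HOG feature length:", hog_vector.shape[0])
```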

Combining all these techniques with machine and deep learning algorithms, more complex solutions can be proposed, and nowadays Convolutional Neural Networks are replacing these techniques due to their ability to automatically extract rich feature vectors learned during training. But computer vision techniques remain a good choice in several works due to their availability, easy implementation and low time consumption.

Many tasks take computer vision as a fundamental part of their development. Here are just a handful of them:

• Face recognition: Face-detection algorithms are applied and, in combination with filters, it is possible to recognise people in pictures.

• Image retrieval: Content-based queries are used to search for relevant images. The algorithms analyse the content in the query image and return results based on best-matched content.

• Gaming and controls: Several commercial gaming products exist that use stereo vision or other types of cameras.

• Surveillance: Surveillance cameras are ubiquitous at public locations and are used to detect suspicious behaviours.

• Biometrics: Fingerprint, iris and face matching remain common methods in biometric identification.

• Smart cars: Vision remains the main source of information to detect traffic signs and lights and other visual features.

It may be helpful to zoom in on some of the simpler computer vision tasks that are of interest to solve, given the vast number of digital images and videos publicly available in datasets.

Many popular computer vision applications involve trying to recognise things in images, for example:

• Object Classification: What broad category of object is in this image?

• Object Identification: Which type of a given object is in this image?

• Object Verification: Is the object in the image?

• Object Detection: Where are the objects in the image?

• Object Landmark Detection: What are the key points for the object in the image?

• Object Segmentation: What pixels belong to the object in the image?

• Object Recognition: What objects are in this image and where are they?

Other common examples are related to information retrieval, for example, finding images similar to a given image, or images that contain a certain object.

2.2.2 Action recognition

Action recognition is a complex task. It requires identifying the different actions that happen in a video clip, where such actions may or may not develop throughout the entire video. The video also needs to be analysed entirely in context, not just frame by frame.

The biggest challenges that the action recogniser must overcome are the following:

• Computational cost: Large architectures and probable overfitting

• Long context: In order to recognise actions, it is necessary to capture a certain spatiotemporal context throughout the frames. Another problem also appears: the movement of the camera has to be compensated for.

• High complexity architectures: The architectures needed to capture the spatiotemporal information are highly complex. In them, a series of parameters must be chosen that are complicated to select and evaluate and are computationally expensive.

• Non-standardized datasets: There is a lack of standardization in action datasets.

The current basis of action recognition comes from two studies, [18] and [2]. In [18], multiple ways to join the temporal information of consecutive frames using pre-trained 2D convolutions are attempted. In the case of [2], instead of using a single network, the architecture is separated into two networks: one of them, pre-trained, for the spatial context, and the other for the context of the movement.

Based on these two studies arise those that are currently the most novel. These new studies are LRCN [1], C3D [19], Conv3D & Attention [20], TwoStreamFusion [21], TSN [22], ActionVLAD [23], HiddenTwoStream [24], I3D [25] and T3D [26].

LRCN [1] uses LSTM networks after applying convolutions to the images, with end-to-end training of the entire architecture. The use of LSTM networks is interesting for this type of data since they are recurrent neural networks with feedback connections; in this way the input data are processed separately but also considered as data sequences. The network architecture is presented in Figure 2.2, where the initial convolutional part that extracts features and the recurrent network that learns over time are drawn.

Figure 2.2: LRCN architecture [1]
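A minimal LRCN-style sketch in Keras is shown below; it is an illustrative architecture under assumed frame size, sequence length and number of classes, not the exact model of [1]. A small CNN is applied to every frame through TimeDistributed and an LSTM aggregates the per-frame features over time:

```python
from tensorflow.keras import layers, models

SEQ_LEN, H, W, C, N_ACTIONS = 16, 112, 112, 3, 101  # assumed dimensions

# Per-frame convolutional feature extractor.
frame_cnn = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(H, W, C)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])

# LRCN idea: the same CNN is applied to each frame, then an LSTM models the sequence.
model = models.Sequential([
    layers.TimeDistributed(frame_cnn, input_shape=(SEQ_LEN, H, W, C)),
    layers.LSTM(256),
    layers.Dense(N_ACTIONS, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```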

C3D [19], Conv3D & Attention [20], I3D [25] and T3D [26] all use 3D convolutions. The use of 3D convolution in action recognition is very widespread since this technique allows finding patterns in data with three dimensions. In the case of the action recogniser, these dimensions are time, height and width. The architecture of this type of network is drawn in Figure 2.3.

Figure 2.3: 3D CNN example from [2]

TwoStreamFusion [21], TSN [22] and ActionVLAD [23] are modifications of the two-stream architecture. With this architecture, the frame input is considered by two different streams. The first one analyses only the frame (spatial stream net), and the second one analyses the frame in the context of a sequence of frames (temporal stream net). As in the case of 3D convolution, this technique is very successful at recognising actions, since both the image and the movement along several images are analysed.

HiddenTwoStream [24] analyses the optical flow of the video. The optical flow can be used to measure the quantity of movement. In the case of recognising activities, optical flow is useful to relate the amount of movement to the different activities.
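As a hedged sketch of how the amount of movement can be quantified (placeholder frame paths, Farnebäck dense optical flow from OpenCV), the flow field between two consecutive frames can be summarised by its mean magnitude:

```python
import cv2
import numpy as np

# Two consecutive grayscale frames (placeholder paths).
prev = cv2.imread("frame_t0.jpg", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_t1.jpg", cv2.IMREAD_GRAYSCALE)

# Dense optical flow with the Farneback method: one (dx, dy) vector per pixel.
# Positional arguments: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)

# Convert the flow vectors to magnitude/angle and summarise the motion.
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
print("mean motion magnitude:", float(np.mean(magnitude)))
```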

Table 2.1 shows a comparative summary of all the studies described above with different configurations.

Network              Score   Score note
LRCN                 82.92   With flow and RGB inputs
                     71.1    Only with RGB
C3D                  82.3    C3D (1 net) + linear SVM
                     85.2    C3D (3 nets) + linear SVM
                     90.4    C3D (3 nets) + iDT + linear SVM
Conv3D & Attention   -       For video description prediction
Two Stream Fusion    92.5    TwoStreamFusion
                     94.2    TwoStreamFusion + iDT
TSN                  94.0    TSN (input RGB + Flow)
                     94.2    TSN (input RGB + Flow + Warped flow)
ActionVLAD           92.7    ActionVLAD
                     93.6    ActionVLAD + iDT
Hidden Two Stream    89.8    Hidden Two Stream
                     92.5    Hidden Two Stream + TSN
I3D                  93.4    Two Stream I3D
                     98.0    ImageNet + Kinetics pre-training
T3D                  90.3    T3D
                     91.7    T3D + Transfer
                     93.2    T3D + TSN

Table 2.1: Action recognition state-of-the-art comparison

2.2.3 Object detector

Object detection is an area that is improving very quickly. The most important reason for this improvement is the application of deep learning to object detection. Each year new algorithms appear that considerably improve on the previous ones. There are many object detection algorithms with high efficiency, and many of them are already pre-trained on well-known datasets, so it is not necessary to train them to start detecting objects.

Among the most famous models, the most interesting ones are detailed below. These algorithms are listed in order of effectiveness, from the least to the most effective (usually the newest ones).

• R-CNN (Region-based Convolutional Neural Networks) [27]. The methodology of this network begins with a given image, from which a series of regions of interest are generated. For each region, a neural network extracts characteristics, and each region is classified according to a series of classes. Among the disadvantages of R-CNN, the computational cost of training is worth mentioning.

• Fast R-CNN emerged as a direct improvement of R-CNN. In [28] Ross Girshick describes the disadvantages of R-CNN and proposes a new methodology to reduce them. Fast R-CNN performs training in a single stage, improving detection rates. But its biggest disadvantage is the cost of generating regions of interest, which is very high.

• Faster R-CNN was created to mitigate the cost of generating the regions of interest in Fast R-CNN [29] [30]. It simultaneously provides regions of interest and classification results. The Faster R-CNN architecture uses the Region Proposal Network (RPN). The RPN is a fully convolutional network that simultaneously predicts object bounding boxes and objectness scores at each position. The detection network shares full-image convolutional features with the RPN, which yields nearly cost-free region proposals. RPN networks are trained end-to-end to achieve high quality region proposals, which are then used by Fast R-CNN for detection. RPN and Fast R-CNN can also be trained to share convolutional features. The architecture of Faster R-CNN can be found in Figure 2.4.

Figure 2.4: Faster R-CNN architecture

• OHEM (Online Hard Example Mining) is an algorithm for training region-based ConvNet detectors [31]. It emerged as a proposal to solve the problem of the great imbalance between the number of annotated objects and the background examples. Shrivastava proposes an online mining algorithm for automatic selection of the hard examples. This new method increases the effectiveness and efficiency of training.

• YOLO v1 (You Only Look Once) is Redmon's proposal for an object detector [3]. Previous proposals repurpose classifiers to perform the detection. YOLO proposes to address object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. To carry this out with one neural network, bounding boxes and class probabilities are predicted directly from the complete image to evaluate. The detection is optimised end-to-end thanks to the fact that only one neural network is used. Image 2.5 presents the working methodology of YOLO. The architecture of YOLO consists of 24 convolutional layers and 2 fully connected layers. The performance is an improvement over previous proposals: images can be processed at 45 frames per second, which means they can be processed in real time using the proposed image sizes. Another version with a smaller network exists, called Fast YOLO, which can process up to 155 frames per second while losing accuracy in the predicted bounding boxes. It also has another improvement: it produces fewer false positives in the background.

Figure 2.5: YOLO working scheme [3]

• SSD (Single Shot MultiBox Detector) emerged as an improvement over YOLO, since YOLO had certain problems when detecting small objects in a group, due to the strong spatial constraints imposed on bounding box predictions. To solve this problem, SSD is proposed in [4]. From a given feature map, SSD takes advantage of a set of default anchor boxes with different aspect ratios and scales that allows discretising the output space of bounding boxes. In order to detect objects with different sizes, the network fuses the predictions of several feature maps that have different sizes. The architecture of an SSD network can be found in Figure 2.6.

Figure 2.6: SSD working scheme [4]

• R-FCN (Region-based Fully Convolutional Networks) [32] is a fully convolutional network that attempts to improve the accuracy and efficiency of prior region-based object detectors, such as Fast R-CNN and Faster R-CNN. While those other detectors ran a costly per-region sub-network hundreds of times, this new approach is fully convolutional, with practically all computation shared on the entire image. To carry out the detection, it uses position-sensitive score maps to address the tension between translation invariance in image classification and translation variance in object detection. It resembles ResNet in that it can naturally adopt fully convolutional image classifier backbones. The results show that it takes 170 ms to process an image, which is 2.5-20x faster than Faster R-CNN.

• YOLO v2 [33] is an improvement of YOLO v1. It develops new strategies such as batch normalization (now used on all convolutional layers), convolution with anchor boxes (removing all fully connected layers and using anchor boxes to predict bounding boxes), dimension clusters, direct location prediction and multi-scale training. In [34] a more exhaustive comparison of the YOLOv2 improvements can be found.

• FPN (Feature Pyramid Network) [35] exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to create feature pyramids with marginal extra cost. A top-down architecture with lateral connections is built to construct high-level semantic feature maps at all scales. FPN can work together with other object detection architectures, improving their results. It allows processing images at a speed of 5 FPS on a GPU.

• RetinaNet [5] is composed of a backbone network and two task-specific subnets, forming a single unified network. The backbone network, an independent convolutional network, is used to compute a convolutional feature map over the complete input image. The first subnet performs classification on the output of the backbone network, and the second subnet carries out the bounding box regression. The proposed loss function shows an improvement in training and in the final estimated bounding boxes. Figure 2.7 shows the RetinaNet architecture.

• YOLO v3 [36] is the latest version of YOLO, which presents many small improvements. The network is somewhat larger and therefore slower, but more effective. For example, a 320x320 image on YOLOv3 runs in 22 ms at 28.2 mAP, with an accuracy similar to SSD under the same conditions, but SSD would be much slower.

Figure 2.7: RetinaNet working scheme [5]

• Mask R-CNN [6] allows detecting objects efficiently in an image while generating a high-quality segmentation mask for each instance. Mask R-CNN is an extension of Faster R-CNN: a branch is added that predicts an object mask in parallel to the branch that recognises the bounding boxes. Mask R-CNN only adds a small overhead to Faster R-CNN, with an image processing speed of 5 fps. The new branch added to Faster R-CNN to create the Mask R-CNN network is shown in Figure 2.8.

Figure 2.8: Mask R-CNN working scheme [6]

• RefineDet [37] is made up of two interconnected modules: the anchor refinement module and the object detection module. The first module filters out negative anchors, so that the search space for the classifier is reduced, coarsely adjusting the locations and sizes of anchors; this improves the initialisation for the subsequent regressor. The second module uses the anchors found in the first module as input to improve the regression and predict multi-class labels. The whole network is trained in an end-to-end way thanks to a multi-task loss function.

A comparison of the technologies explained above can be found in reference [38].
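As a hedged illustration of how a pre-trained detector can be used without any training, the sketch below loads YOLOv3 weights through OpenCV's DNN module (file paths are placeholders and the post-processing is deliberately simplified, without non-maximum suppression):

```python
import cv2
import numpy as np

# Pre-trained YOLOv3 files (placeholder paths; obtainable from the official YOLO site).
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
classes = open("coco.names").read().splitlines()

img = cv2.imread("frame.jpg")                     # placeholder input frame
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), (0, 0, 0),
                             swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

# Keep the most confident class per predicted box (simplified: no NMS here).
detected = []
for out in outputs:
    for det in out:                               # det = [cx, cy, bw, bh, obj, class scores...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        if scores[class_id] > 0.5:
            detected.append(classes[class_id])

print("objects seen in the frame:", detected)
```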

Chapter 3

Development

This chapter describes all the steps taken to obtain the recommendation engine, starting with the description of the architecture of this project and continuing with a detailed description of each of the blocks involved in the system. Each block consists of a series of algorithms: at the beginning the purpose is to extract visual features, and in the middle and last parts to adapt to and learn from the data. The Deep Learning paradigm is used in order to learn complex representations of the data and provide a more accurate output. The aim of the project is to provide, from a set of input features (extracted from a movie trailer), the ten best recommendations that exist in the database.

3.1 Machine Learning and Deep Learning process chain

When implementing a 'Machine Learning' or 'Deep Learning' project, the chain shown in Figure 3.1 is usually followed.

In this project, Supervised Learning has been applied because the categories/labels are available for the data used. In the last part of the project another type of Supervised Learning is applied, using regression optimisation to fit label values during training. First, it is necessary to acquire data for the creation of three datasets: one for training ('training set'), another for model validation ('validation set') and another to test the model ('test set').

Figure 3.1: Proposed architecture

Secondly, it is sometimes advisable to pre-process the data before it is fed to the algorithm for training. In the case of this project, pre-processing is very important since features are extracted from the trailers, and these features will be the input to the proposed neural networks. These features also need a standardisation process.
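A common choice for this standardisation (assumed here as an illustration: zero mean and unit variance, fitted on the training split only) is scikit-learn's StandardScaler:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.random.rand(100, 32)        # placeholder feature vectors (100 trailers, 32 features)
X_val = np.random.rand(20, 32)

scaler = StandardScaler().fit(X_train)   # statistics learned from training data only
X_train_std = scaler.transform(X_train)
X_val_std = scaler.transform(X_val)      # validation/test reuse the same statistics
```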

The third step is to train the model. For this, samples of the training set are introduced in batches in order to adjust the parameters that define the 'Machine Learning' and 'Deep Learning' models/architectures. The parameters are adjusted trying to minimise a cost/loss function that measures how well the model predicts the category to which the entered data belong, compared to the true label.

To adjust the parameters, two steps are carried out [39]: 'forward propagation' and 'backward propagation'. The first consists in feeding the training samples forward to compute the output and compare it with the true value of the label; the difference between both values is the error. The second step propagates in the opposite direction, applying the backpropagation algorithm to calculate the parameter values that bring us towards the minimum of the loss function. To do this, an optimisation algorithm calculates the slope at each point and steps are taken proportional to the negative gradient, as shown in Figure 3.2. The gradient update can be seen in equation 3.1.

$$f(x+1) = f(x) - \alpha \cdot \frac{\partial f(x)}{\partial x} \tag{3.1}$$

Figure 3.2: Gradient descent function
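Equation 3.1 can be illustrated with a tiny toy example of gradient descent on a one-dimensional quadratic function (not the project's actual optimiser):

```python
# Toy gradient descent: minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
alpha = 0.1          # learning rate
x = 10.0             # initial value

for step in range(50):
    gradient = 2.0 * (x - 3.0)
    x = x - alpha * gradient     # step proportional to the negative gradient (eq. 3.1)

print(x)             # converges towards the minimum at x = 3
```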

Through the validation set, some interesting metrics [40, 41] of the behaviour of the models can be obtained: confusion matrix, precision, recall, F1 score, accuracy, loss, etc.

The precision metric measures the success of the algorithm, that is, the number of samples of a class that have been correctly identified out of the total number of samples classified as belonging to that class. The equation to calculate the precision is presented in equation 3.2.

$$\text{precision} = \frac{\text{True positive}}{\text{True positive} + \text{False positive}} \tag{3.2}$$

The recall metric measures completeness, that is, the number of samples of a class that the algorithm has been able to identify out of the total number of samples of that class. The equation to calculate the recall can be found in equation 3.3.

$$\text{recall} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}} \tag{3.3}$$

A well-functioning algorithm is one that finds a balance between recall and precision, that is, it detects all the samples of a class without confusing them with other classes.

The F1 score metric is the harmonic mean of precision and recall, which aims to provide with a single value an intuition of the behaviour of the algorithm, showing the balance between recall and precision that must exist. As its formula in equation 3.4 shows, a high value of precision is not desirable if it is linked to a low value of recall (and vice versa); ideally both metrics are as high as possible.

$$F_1 = \frac{2}{\frac{1}{\text{precision}} + \frac{1}{\text{recall}}} = 2 \cdot \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{3.4}$$

Finally, other metrics used to evaluate the model are the accuracy and the loss. The accuracy is the fraction of predictions that the model made correctly with respect to the total, and the loss is the sum of the errors committed on the training and validation sets. The accuracy can be formulated as shown in equation 3.5, where tp is true positive, tn is true negative, fp is false positive and fn is false negative.

$$\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \tag{3.5}$$
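These metrics can be computed directly from true and predicted labels, for example with scikit-learn on toy data:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # toy ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # toy model predictions

print("precision:", precision_score(y_true, y_pred))  # eq. 3.2
print("recall:   ", recall_score(y_true, y_pred))     # eq. 3.3
print("F1 score: ", f1_score(y_true, y_pred))         # eq. 3.4
print("accuracy: ", accuracy_score(y_true, y_pred))   # eq. 3.5
```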

These metrics are very important in order to detect a very common phenomenon in Machine Learning and Deep Learning, overfitting, which occurs when the built model is excessively complex and captures the noise of the data instead of its trend. As a consequence, the model does not generalise sufficiently and, with new information, will behave inappropriately and misclassify samples with high probability. An example of this behaviour can be seen in Figure 3.3.

Figure 3.3: Classification overfitting

A very simple way to identify this phenomenon, besides using the previous metrics, is to compare the behaviour of the model in terms of accuracy (in Machine/Deep Learning) and loss (Deep Learning) on the training set and on the validation set, hence the importance of differentiating between both datasets. If the loss on the training set is very low (the model commits very few errors on known information) while the error on the validation set is very high (it commits many errors on new information), the model is overfitting. A compromise must be reached between both errors. The loss curves should decrease in a similar way in both sets along the epochs, while the accuracy curves should grow in a similar way in both sets too.

To prevent this overfitting problem, it is convenient to reduce the complexity of the model or modify the value or number of parameters. For example, in a Deep Learning neural network this would mean reducing the number of hidden layers or hidden neurons. Another aspect to take into account when avoiding overfitting is that the size of the dataset necessary to train an algorithm grows exponentially with the size of the model: more complex models require more samples for their correct operation. As it is sometimes expensive to obtain that much training information, it is necessary to simplify the models.
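Besides shrinking the model, two common practical remedies are dropout and early stopping on the validation loss; the following Keras sketch (illustrative layer sizes, not necessarily the exact configuration used in this work) shows both:

```python
from tensorflow.keras import layers, models, callbacks

model = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(64,)),
    layers.Dropout(0.5),                 # randomly disables neurons to reduce overfitting
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Stop training when the validation loss stops improving and keep the best weights.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=500, callbacks=[early_stop])   # placeholder data, shown for context
```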

The opposite phenomenon, underfitting, can also happen, as shown in Figure 3.4. It occurs when the model is too simple and is not able to capture the trend that the data follow; therefore, it will behave badly both on the training set and on the validation set.

Figure 3.4: Classification underfitting

The ideal is to find a compromise such that the loss curves on the training set and validation set decrease in a similar way on both sets, or that the accuracy behaves similarly on both sets. An example of this good behaviour can be found in Figure 3.5.

Figure 3.5: Classification compromise between underfitting and overfitting

The selected model will depend on the algorithm used to train. In Machine Learning there are numerous classification algorithms, growing in complexity or adapting better to certain conditions depending on the data used. Some examples are Logistic Regression, K-Nearest Neighbours, Decision Trees and Random Forests, among others. In this work, K-Means Clustering, Hierarchical Clustering and Gaussian Mixture Models have been used in order to get some intermediate results in the feature selection block.

Regarding the Deep Learning paradigm, the model depends on the creation of a network architecture using a series of available layers that perform different functions on the input data. Deep learning algorithms are now improving on the results presented by machine learning solutions and will be used in this work to create the final recommendation engine.

3.2 Proposed architecture

Figure 3.6: Proposed architecture

The architecture that has been proposed to perform the recommendation system is represented in Figure 3.6. The architecture consists of four differentiated blocks in which different image processing and machine/deep learning techniques are used.

The first block is the feature extraction (Section 4.1). In it, an analysis of the trailers from the selected dataset (Section 3.3.1) is carried out using both Computer Vision and Deep Learning techniques. Four different features have been extracted from each trailer. These features have been selected with the criterion of capturing the most relevant characteristics to describe each movie trailer successfully. The feature extractors considered in this work have been a deep learning action recogniser, colour histograms, a deep learning object detector and optical flow. Each characteristic is processed to obtain a vector of values per feature. These four vectors are joined together forming the final vector of characteristics of each film, which will be the input to the second block of the architecture.
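A minimal sketch of this concatenation step, assuming the four per-trailer vectors are already available as NumPy arrays (the variable names are illustrative):

```python
import numpy as np

def build_feature_vector(actions, histogram, objects, optical_flow):
    """Join the four per-trailer feature vectors into the final characteristics vector."""
    return np.concatenate([actions, histogram, objects, optical_flow])
```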

The next block is an embedding of the feature vectors. The embedding allows finding another sub-dimension to represent the feature vectors in a different sub-space that can separate vectors in the space and fit the values better to the final purpose. To perform the embedding, a neural network was used; this network has as input the feature vectors and as labels the labelled genres of the film trailers. The output of this block is the prediction of all the films once the model is trained, giving a vector per film with a dimension equal to the layer before the classification layer of the network. The network was trained as a multi-label classification problem since each film is categorised by more than one genre.

The next block solves an optimisation problem where a distance function is optimised to output the final distances between the embeddings. This block calculates a distance value of each film trailer with the rest of the films in the dataset. The distances have been calculated with different algorithms to check which one best fits the problem. The output of this block is a vector per film with a length equal to the number of film trailers.

In the last block a final training over all the trailers is carried out. Different network architectures were used in order to compare the performance between them. Firstly, an Artificial Neural Network (ANN) was used, taking as input the embeddings and as labels the distances. This is solved as a regression problem in order to learn the distances between film trailers. The second approach takes advantage of deep learning autoencoder architectures. A first autoencoder was used to learn the distances between films in order to reproduce the input in the output. A second autoencoder takes the decoder part from the previous autoencoder and includes a new encoder part that takes as input the embedding vectors. After the training, a model is generated that allows making recommendations. When a prediction is performed, the output is a vector with a length equal to the number of movies, where values lie in the 0-1 range, 1 representing the most similar film trailer and 0 the least similar one. So, the positions with the 10 highest values are the final recommendations from the proposed recommender system engine.
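As a sketch of how the final recommendation could be read from a predicted vector, assuming it is a NumPy array with one similarity score per film in the dataset (the helper below is illustrative, not the exact code of this work):

```python
import numpy as np

def top_k_recommendations(predicted_scores, film_titles, k=10):
    """Return the k films with the highest predicted similarity (1 = most similar)."""
    best = np.argsort(predicted_scores)[::-1][:k]  # indices of the k highest values
    return [film_titles[i] for i in best]
```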

Through the complete architecture we obtain a final model that allows us to carry out a content-based recommendation of a movie trailer.

3.3 Feature extraction

As the previous section explains, the first step carried out in the proposed recommendation engine system is the feature extraction process. To achieve this, firstly it is necessary to choose the dataset of movie trailers used (Section 3.3.1). Next, an analysis of the features to be extracted is carried out. Different feature extractors were used (Section 3.3.2.1) to get the input values, which were normalised in order to get a common representation for each movie trailer.

Dataset | Domain | Content Feature | Number of items | Number of users | Number of ratings
LDOS-CoMoDa dataset [42] | movie | M+context | 1K | 1K | 2K
Million Song Dataset [43] | music | A,M | 1M (track) | 1M | 48M
Million Musical Tweets [44] | music | A,M | 134K (track), 25K (artist) | 214K | 1M
LFM-1b [45] | music | M | 32M (track), 3M (artist) | 120K | 1.1B
MovieLens 20M (ML-20M) [46] | movie | M,A,V | 26.7K | 138.5K | 20M
MMTF-14K [47] | movie | M,A,V | 13.6K | 138.5K | 12.4M
Labeled Movie Trailer Dataset [12] | movie | M,A,V | 4K | IMDB | 4K

Table 3.1: Comparison between datasets, based on [47] research

3.3.1 Dataset

In order to begin the process of the recommendation system, the trailers and their information are needed. The database or dataset is an essential part of any machine learning and deep learning project. It is necessary to have a good set of information that can represent the major part of the possible cases that can appear in the problem. It is of vital importance that the data are appropriate for each problem and also that the information they provide is reliable (the labels they offer are correct, the information has good quality...).

To select the dataset that has been used, a deep search of the datasets available as open source has been made. This comparison can be checked in Table 3.1. The type of content feature is denoted as M (metadata), V (video) and A (audio).

The most outstanding datasets are the last three (MovieLens, Multifaceted Movie Trailer Feature Dataset and Labeled Movie Trailer Dataset), since they include video information. A description of what they offer, the quality of their information and the dimensions of each one is given below.

The first dataset is MovieLens [46]. This is a set of different dataset parts which differ in the number of movies. For each set they offer a list of movies and their ids on YouTube, which facilitates the download of the dataset. The recommended dataset for research is called MovieLens 20M. The 20M indicates the number of ratings it has in metadata. The database contains 27,000 movies. The dataset is not updated, so most of the links to YouTube are out of date.

Another dataset of movie trailers of great interest is the Multifaceted Movie Trailer Feature Dataset [47]. This dataset provides 14,000 movie trailers, in addition to a series of audio and video descriptors, metadata and ratings. The visual descriptors include aesthetic features and AlexNet features, and the audio descriptors include block-level features and i-vector features.

Finally, another common dataset is the Labeled Movie Trailer Dataset [12]. It is oriented to multi-label movie genre classification providing 9 different classes. This dataset offers 4021 trailers of tagged movies. In addition to the multi-genres of the films, one of its biggest advantages is its metadata, which offers all the data stored in IMDB for those films. That includes the genres indicated by IMDB, name, director, film awards, main actors, plot summary, image URL of the film cover, among much other information.

After this exploration, the dataset used in this work is the Labeled Movie Trailer Dataset. This dataset has been selected taking into account the importance of the genre labels and the quality and quantity of the trailers available to download from YouTube, since in our work only the genres and the video trailers will be used from all the provided data.

3.3.1.1 Genres

In a deep learning project, an exploratory data analysis of the dataset used is usually carried out. But in this case, the only metadata that must be checked are the genres. For all the films it is verified that genre data exist. Other relevant information about the genres of the dataset is shown next.

The genres per film offered by this database are of two different types. On one hand, IMDB provides 24 different genres (Table 3.2). On the other hand, the genres of the LMTD dataset are divided into 9 classes (Table 3.3).

Genre | Number of movies
Action | 856
Drama | 2032
Thriller | 693
SciFi | 313
Comedy | 1562
Romance | 651
Crime | 659
Adventure | 593
Horror | 436
Mystery | 296
Biography | 226
History | 97
Fantasy | 300
Musical | 41
Documentary | 62
Music | 121
War | 62
Sport | 110
Family | 209
 | 144
Western | 26
Short | 60
GameShow | 1
News | 2

Table 3.2: Genres with IMDB

The number of genres in the metadata information from IMDB is much higher than the number of genres offered by LMTD. As the genres are going to be used as labels for the embedding, we will use the LMTD ones, since they summarise the IMDB genres in a simpler way, increasing the number of data per label.

From the number of films per genre it can be observed that half of the films have the drama genre. It is the genre with the largest number of films. This must be taken into account, since the model will tend to train drama films better. Comedy films are behind the drama films, also with a large number of films. From there, the rest of the genres have a balanced number of films, except for the Sci-Fi films.

Genre | Number of movies
Action | 856
Adventure | 593
Comedy | 1562
Crime | 659
Drama | 2032
Horror | 436
Romance | 651
Sci-Fi | 313
Thriller | 693
Total | 4021

Table 3.3: Genres of LMTD

As mentioned before, the films have been considered multi-genre. Figure 3.7 shows the number of movies with different numbers of labelled genres. The figure shows that the maximum number of genres assigned to a film is three.
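Since each film carries between one and three genres, the labels can be encoded as binary vectors over the 9 LMTD classes. A possible sketch using scikit-learn's MultiLabelBinarizer (the tool choice is an assumption, the thesis does not state which one was used):

```python
from sklearn.preprocessing import MultiLabelBinarizer

lmtd_genres = ['Action', 'Adventure', 'Comedy', 'Crime', 'Drama',
               'Horror', 'Romance', 'Sci-Fi', 'Thriller']
encoder = MultiLabelBinarizer(classes=lmtd_genres)

# Each film keeps between one and three genres.
labels = encoder.fit_transform([('Drama', 'Romance'), ('Action', 'Sci-Fi', 'Thriller')])
print(labels)  # one binary row of length 9 per film
```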

3.3.2 Features

In this section, the selection of the features is carried out, along with their description, their extraction and finally the normalisation of the feature vectors. Feature selection is one of the most important parts (and one of the most complex) of a learning problem, since the selected data will be the input of the algorithms that will learn hidden patterns between these values. A careful analysis should be done in order to select features that are a good representation of the input data and can be separable in the multidimensional space during the training process.

3.3.2.1 Features selection

The first step is to perform an analysis of the possible features to be extracted. In Table 3.4 the characteristics that were first considered are presented. Once we found the features that could be interesting to characterise a trailer, the necessary tools to extract them properly were analysed.

Figure 3.7: Multi-genres distribution

It can be observed that there is a pattern in the tools used for the extraction of these characteristics. Computer Vision and Machine/Deep Learning (ML/DL) techniques are the most common tools currently used to extract these types of features. The analysis in time can also include, inside the complete algorithms, some Computer Vision and ML/DL techniques such as scene change and shot detectors. With these tools the vast majority of characteristics that define a film can be covered.

The RGB image colour histogram (extracted with computer vision techniques) will allow us to differentiate between a wide range of characteristics such as the most outstanding colours, whether the action takes place in an exterior or an interior location, and to distinguish between landscapes and even weather conditions. The RGB colour histogram has been used for several applications; for example, [48] explains how to build a simple classifier using only the colour histograms, and another example uses the colour histogram for video retrieval [49].

The RGB colour histogram has also been used in other movie classification solutions such as [50], or taking colour into account as in [51]; this last one uses the colour histogram and the colour variance.

On the other hand, the object detector allows us to detect any object for which it has been trained, with highly accurate performance with respect to the current state-of-the-art algorithms [36]. Therefore we would be covering all those characteristics that account for the number of objects, animals, people or meals. Using an object detector as a feature for movie classification or recommendation has already been researched, as in [52], which detects objects in movie posters to classify those movies.

Besides these tools, it is also observed that the use of metadata from the selected database/dataset can be very convenient, but in this work this information has been reserved to be used as labels in one block of the recommendation system engine. In cases like [53], the genre metadata correlation is used for movie recommendations. In [54] a wide range of metadata information is used, such as genre, rating, actor names, directors, producers and writers. In this particular case they use IMDB metadata, like the one given by the Labeled Movie Trailer Dataset used in this TFM.

In this way, the remaining characteristics are emotions, activities and optical flow. For the emotions, different novel deep learning algorithms were tried. The biggest problem was that the facial expressions had to be forced for the success rate to be considered appropriate. Also, the generated vector contained only 7 values (one probability value per emotion), which compared to the length of the other features meant that it would not be significant. Therefore, in this work, this option was rejected due to the need to combine an accurate face detector with the short time that faces appear in each shot of the trailer. However, the research in [55] used emotion extraction and its output as a feature to improve movie recommendations.

Furthermore, for action recognition, the state of the art shows quite good results when recognising a wide range of activities with a high level of accuracy, as for example in [24] and [21]. For this reason, and because some researchers have used action recognition for movies with good results [56] [50], it was selected as one of the features to be analysed.

Lastly, the optical flow. This feature has been used for many applications, for example for face tracking [57], for autonomous vehicles [58] or even for action recognition [2]. It is a good descriptor of a film, since it indicates the pattern of the apparent movement. Also, for its calculation there are many computer vision algorithms that allow an accurate implementation. There are even some deep learning techniques such as FlowNet [59]. The optical flow as a movie feature has been used in research such as [51] and [50]. For these reasons, the optical flow was included in the list of features of our recommendation engine system.

ID | Characteristics | Tool
1 | Shot duration | Shot detector
2 | Time of inside and outside shot | Shot detector, computer vision
3 | Colours | Computer vision
4 | Colours of each scene | Scene detector, computer vision
5 | Landscape | Computer vision, ML/DL
6 | Meteorological condition | Computer vision, ML/DL
7 | Objects | ML/DL object detector
8 | Animals | ML/DL object detector
9 | Number of humans per scene | ML/DL object detector, scene detector
10 | Meals | ML/DL object detector
11 | Emotions | ML/DL emotions detector
12 | Activities | ML/DL action recogniser
13 | Film date | Database metadata
14 | Rate | Database metadata
15 | Film genre | Database metadata
16 | Optical flow | Computer Vision, ML/DL

Table 3.4: Features analysis

3.3.2.2 Action recognition

The action recognition feature is a vector that indicates the probability that a series of activities are happening during the film trailer. In this work, this vector has the length of the number of activities that can be detected by the algorithm used, in this case 339 values. Each position of the vector corresponds to one of those activities, so each position indicates the probability of that activity happening in the trailer.

To perform the action recognition, the research in [60] has been used. In order to perform the recognition of the movie dataset activities, three different tasks are required. First, train a model, then make the prediction for all movies with that model and finally perform the normalisation.

To create this vector, a detector of actions is necessary. The selected action detector is based on deep learning (Figure 3.8). In this work, a dataset of activities is fed through a ResNet50 network.

The dataset used for training is the Moments in Time Dataset [60]. The dataset contains one million videos. Those videos have a length of less than 3 seconds. This dataset has 339 action classes (Appendix F). The actions are performed by humans, objects and animals, and there are also natural phenomena. The components of the dataset are visual and auditory events.

The labels are vectors with the length of all the actions for which the model is trained. This length is 339. Each position of the vector represents an action, and the value in it is the percentage of visualisation time of that action in the video. For each input video there is a label vector. The output of the network is a layer with the number of activities. With this methodology a model that allows detecting actions is generated.

Figure 3.8: Action recognition Training

Before this training the ResNet50 network was initialised with ImageNet (ResNet50-ImageNet) [61].

The network is formed by eight layers with weights. The first five layers are convolutional layers and the last three are fully connected. The output of the last layer ends with a Softmax of length 1000 that is responsible for producing the distribution over the 1000 classes per label. The loss function uses a logarithmic function, seeking to maximise, over the training cases, the average log-probability of the correct label under the prediction distribution.

The dimensions of the architecture are the following:

• First layer filters (convolutional): 224x224x3 input image, 96 kernels of 11x11x3 with stride 4 pixels.

• Normalization and pooling layers.

• Second layer (convolutional): 256 kernels of 5x5x48.

• Normalization and pooling layers.

• Third layer (convolutional): 384 kernels of 3x3x256.

• Fourth layer (convolutional): 384 kernels of 3x3x192.

• Fifth layer (convolutional): 256 kernels of 3x3x192.

• Last three layers (fully-connected): 4096 neurons per layer.

The architecture of the ResNet50 is shown in Figure 3.9.

Figure 3.9: ResNet50

The second task is to predict each film (Figure 3.10). Using the action prediction model we obtain a vector with the length of the number of activities, 339 values, with a probability in each position. This tells us the probability of each activity happening in the video that has been introduced.

Figure 3.10: Action recognition prediction

For example, if a prediction is made on the trailer of the movie 'Dirty Dancing', a vector of length 339 is obtained. In each position of the vector we will have the probability of the activity that represents that position. Using a vector that tells us the name of the activity of each position, we can analyse which activities have been detected and are the most likely to be happening. Table 3.5 presents the results for the 'Dirty Dancing' film trailer.

Action | Probability
Dancing | 0.073 %
Climbing | 0.073 %
Kissing | 0.066 %
Rocking | 0.022 %
Adult+female+singing | 0.020 %
Spinning | 0.019 %
Performing | 0.018 %
Smoking | 0.014 %
Hugging | 0.014 %
Smiling | 0.013 %

Table 3.5: Action recognition prediction example

The results of the prediction for the 'Dirty Dancing' trailer are very interesting, since, with the exception of climbing, all the activities detected by the model happen in the trailer. In addition, these activities are likely to be a good summary of what the film is. So using this feature we would have a good first description of the movie. The exception of climbing may be due to the fact that in many frames of the trailer the characters are observed practising portés in a forest, in nature.

The last task is the normalisation. The normalisation of the action recognition feature vector consists of using the probability vector as it comes out of the prediction but rounding the probabilities to 8 decimals.
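A minimal sketch of this prediction and normalisation step, assuming the trained model returns one probability per class and that the list with the 339 class names is available (names are illustrative):

```python
import numpy as np

def action_feature(probabilities, class_names, top=10, decimals=8):
    """Round the 339 class probabilities and list the most likely actions."""
    feature = np.round(probabilities, decimals)      # normalised action feature vector
    best = np.argsort(feature)[::-1][:top]           # indices of the most likely actions
    return feature, [(class_names[i], feature[i]) for i in best]
```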

3.3.2.3 RGB Color Histogram

The histogram feature vector is a vector that contains three colour histograms, Red, Green and Blue, consecutively. A histogram is a representation of the amount of colour in an image: the number of pixels is calculated and displayed for each of the fixed colour ranges. The final length of the histogram feature vector in this work is 768, which corresponds to the three colour histograms of 256 bins per histogram.

The histogram can be calculated for any type of colour space, commonly HSV and RGB. It can also be calculated for monochromatic images by calculating the intensity. In the case of this project, the RGB colour space was the selected one.

It is also possible to calculate the histogram by separating the image into each of the colour channels and calculating the intensity of each one. This is what was done in the histogram feature extraction.

To perform the extraction of the histogram feature, the average histogram of each scene is calculated. In this way the scene-change characteristic is being integrated, as commented in Section 3.3.2.1. To differentiate between different scenes, a scene change detector is necessary.

As explained in section 3.3.2.1, scene changes can be a descriptive feature of a trailer. There- fore it is necessary to develop a tool that is able to find the scene changes.

The scene detector analyses a video looking for scene changes and cuts. To perform scene recognition, another tool is needed to divide the video into frames from the same scene, to be able to make a frame-by-frame analysis of the video that makes it possible to detect patterns of a specific video.

The process of extracting the histogram, which can be seen in Figure 3.11, starts with the scene detection. This detection gives the first and last frames of each scene. Once the scenes are detected, the video is captured frame by frame and the histogram of each colour (Red, Green and Blue) of each frame is calculated. When all the frames of a scene have been read, the average of the histograms of the scene is stored. Each scene histogram is normalised to the 0-1 range. The normalisation of the histograms is done independently, not with respect to the others, capturing the trend of colours but not the difference in values between histograms of different movies.

Figure 3.11: Histogram process chain

Finally, the average of all histograms per colour is computed. So, the final histogram feature will have three histograms, one per colour, consecutively.
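A possible sketch of this per-scene averaging with OpenCV, assuming the frames of a single scene are already available as images loaded by OpenCV (which stores channels in BGR order); it is an illustration of the chain in Figure 3.11, not the exact implementation:

```python
import cv2
import numpy as np

def scene_rgb_histogram(frames):
    """Average the 256-bin histograms of the three colour channels over one scene,
    normalised independently to the 0-1 range."""
    per_frame = []
    for frame in frames:  # frames of a single scene (BGR images)
        channels = [cv2.calcHist([frame], [c], None, [256], [0, 256]).flatten()
                    for c in range(3)]
        per_frame.append(np.concatenate(channels))   # length 768 per frame
    scene = np.mean(per_frame, axis=0)
    return scene / scene.max()                       # independent 0-1 normalisation
```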

For example, performing the histogram process on two film trailers with different genres gives different results. An example of this are the films 'The Delta Force' and '#Horror'. The genres of the first film are action, adventure and drama, and the genres of the second film are horror and drama. The results of plotting the histograms for each colour for the movie 'The Delta Force' can be seen in Figure 3.12, and those of the film '#Horror' in Figure 3.13. At first glance it can be observed how the action and adventure movie has a wider colour range, while the horror movie has most of its colours in the range of dark colours.

Figure 3.12: Action film colour histogram

Figure 3.13: Horror film colour histogram

3.3.2.4 Object detector

Object detection is one of the main tasks in computer vision problems. Many problems in the image field start by applying computer vision or machine/deep learning techniques in order to detect and recognise objects of interest in images. In this work a deep learning object detector has been used to extract a binary feature vector, where a position with value 1 represents that the object appears in the trailer and 0 represents an object not present during the trailer.

Object feature extraction uses a methodology similar to the action recogniser. First it is necessary to train a deep learning model and then predict the objects per film. To do this, instead of using all the frames in the trailer (redundant information), key frames were detected and the object detector is applied over these frames.

For the first part it is necessary to train a network with a dataset of different object classes (Figure 3.14). The network used is YOLOv3 [36] and the dataset is the Open Images Dataset [62].

Figure 3.14: Object detector training

The Open Images Dataset is composed of 9.2M images, of which 1.2M have bounding boxes. Each image has an average of 8 objects. The Open Images Dataset provides 600 different categories of objects, so the trained model will be able to detect whether any of those 600 types of objects (Appendix E) are present in an image, given a proper algorithm training. These 600 classes form a hierarchy. This means that, for example, an object can belong to the class apple at one level, to fruit at a higher level and to food at the top level.

For the creation of this dataset, images with a Creative Commons Attribution (CC-BY) license were collected. As the number of images and the number of classes was very high, the technique used to label all the images was to make a prediction of the possible objects. This gave a first intuition, which then had to be checked manually by people.

There are other datasets that have been used on many occasions, such as the PASCAL VOC Dataset [63] or the COCO dataset [64]. But the PASCAL dataset only has 20 classes and COCO 80. Although these are many classes, not all the objects that were considered important if present in a trailer were among them. That is why we searched for a dataset with a large number of classes that would cover a large number of objects.

The YOLOv3 [36] architecture is the evolution of YOLO. YOLO ('You Only Look Once') is based on CNNs and was proposed in [3]. This architecture is able to provide bounding boxes as well as class identifiers directly, for each image, in a single evaluation. The sliding window technique, a rectangle that goes through the image in search of objects, was a widely used technique, but it consumes too much time and does not always produce the most precise bounding boxes. YOLO is committed to the creation of a grid with cells that are narrow enough to obtain a good coverage of the image and applies the classification and location algorithm on each of the cells. In each cell of the grid, some rectangles (anchor boxes) of different sizes and orientations are defined. This technique differs from the sliding window in that it has a convolutional implementation that helps to make this algorithm efficient.

As loss function, YOLO uses the sum of three loss functions: the classification loss, the localisation loss and the confidence loss, resulting in equation 3.6.

\[
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left( \sqrt{w_i} - \sqrt{\hat{w}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
\tag{3.6}
\]

Despite its advantages, YOLO is not without limitations: it has difficulties discerning between very close objects, or detecting small objects that appear in groups, such as birds. However, it is one of the most promising algorithms in that it achieves high reliability and real-time processing. YOLOv2 [33] proposes an improved version of YOLO, and YOLOv3 proposes a series of further improvements with respect to YOLOv2, among which the increase in detection speed stands out.

The YOLO architecture is shown in Figure 3.15.

Figure 3.15: YOLO architecture [3]

In order to choose this architecture, the different possible architectures that nowadays present results similar to those of YOLOv3 were analysed. In Figure 3.16 a comparison among architectures can be found. Mask R-CNN [6] and RetinaNet [5] achieve better results, but they are much slower than YOLOv3. Taking into account that a prediction has to be made over a dataset of 4021 trailers with 600 categories, YOLOv3 was the most reasonable option: it gives good results in an acceptable time.

Figure 3.16: Object detection architectures comparison

Once the model is trained, it is necessary to predict for each film the objects that can be seen in the trailer (Figure 3.17). To achieve this, the I-frames (the frames with all the information among I, P and B frames) of all the films have been obtained. The reason for using I-frames is that the computational cost of making a detection for all the frames of all the films was very high, and the results did not vary significantly with respect to only using the I-frames (high redundancy between nearby frames, no new information). Using YOLOv3, each I-frame passes through the network and the objects detected by the algorithm are obtained. This generates a vector of 600 positions per frame (the number of classes in the dataset).

In the vector it can be observed that there are a series of objects that are not usually present in the trailers. Therefore, these objects were removed from the vectors, now with a length of 96 positions, acting as a film object filter that deletes objects not used in the film trailers of the dataset. This has been done to avoid having a very large number of 0 values in the joint feature vector, since the vector of objects would otherwise have a length of 600, where 504 positions would be 0. This could worsen the training considerably.

There was one last change. In the first analysis of the objects, the usual detection threshold was used and the detected objects were mostly correct. However, in this case the number of detectable object classes is much greater than usual, and when training this type of dataset it is very difficult for the probabilities to be high for all objects. Therefore, due to the complex learning process, lowering the threshold makes the detector more permissive and more objects are found. The threshold went down to 0.3 and, as expected, the number of detected objects increased. In addition, the range of categories of detected objects increased, and the vector remained at 138 classes of objects, with a length of 130 after removing the always absent objects.
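A hedged sketch of how the binary object vector could be built and filtered, assuming one list of (class id, confidence) detections per I-frame; the function names and data layout are illustrative:

```python
import numpy as np

def object_feature(detections_per_frame, num_classes=600, threshold=0.3):
    """Binary vector: 1 if an object class appears in any I-frame above the threshold."""
    vector = np.zeros(num_classes, dtype=int)
    for detections in detections_per_frame:          # one list per I-frame
        for class_id, confidence in detections:
            if confidence >= threshold:
                vector[class_id] = 1
    return vector

def remove_absent_classes(feature_matrix):
    """Drop object classes that never appear in any trailer of the dataset."""
    present = feature_matrix.sum(axis=0) > 0         # feature_matrix: films x classes
    return feature_matrix[:, present]
```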

Figure 3.17: Object prediction

An example of the object detection can be seen in Figure 3.18. It shows a frame from a 'Harry Potter' film where the shot contains different boys sitting in class. The detector finds all the people present in the image.

Figure 3.18: Object prediction example

3.3.2.5 Optical flow

Optical flow allows estimating the amount of movement from one frame to another in a sequence of images. Normally the optical flow describes the movement by means of vectors that indicate which pixel has moved, in what quantity and in what direction.

The applications are many and very diverse, such as image segmentation, object classification, time to collision and driver assistance. In video recommendations it can be useful to indicate the quantity of motion.

The process for extracting the optical flow can be seen in Figure 3.19. Firstly, it is necessary to find the scene changes; secondly, to perform the calculation of the optical flow; and finally, to calculate the flow representation and find the values with the most movement among those calculated from the trailer.

Figure 3.19: Optical flow extraction process

To calculate the optical flow, the scene change detector was also used. This must be taken into account since, at each scene change, the optical flow values increase even though there is no real movement, because the changes from one frame to another are counted to find the movement of the scene. Therefore, it is necessary to know in which frames the scene changes occur so as not to take them into account when calculating the optical flow.

Once the scene changes have been detected, the optical flow between the frames of the entire trailer is calculated. The Farneback method [65] has been used to calculate the optical flow. The Farneback method is a two-frame motion estimation algorithm. To perform the calculation, it uses polynomial expansion, approximating a neighbourhood of each image pixel by a polynomial. The equation of the Farneback method can be found in equation 3.7.

\[ \text{prev}(y, x) \sim \text{next}(y + \text{flow}(y, x)[1],\; x + \text{flow}(y, x)[0]) \tag{3.7} \]

Once the optical flow is calculated, the HSV representation, usually used to visualise the optical flow of a video, is computed. This HSV image is transformed to black and white and the sum of the pixel intensities is counted per frame. The hue and the value (HSV) depend on the apparent movement: the hue describes the angle of the optical flow vector, while the value determines the magnitude of the optical flow vector.

After making these calculations, a vector with a length equal to the number of scenes of the trailer is obtained, and it is necessary to normalise it so all films have a vector of the same length. Per film, a vector with the 200 highest values of pixel intensity per frame is stored as the final feature vector.
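A simplified sketch of this extraction with OpenCV, assuming grayscale frames and a set of frame indices where a scene change occurs; it sums the flow magnitude directly instead of going through the full HSV representation described above:

```python
import cv2
import numpy as np

def optical_flow_feature(gray_frames, scene_change_frames, top=200):
    """Per-frame sum of Farneback optical-flow magnitudes, skipping scene changes,
    keeping the `top` highest values as the final feature vector."""
    motion = []
    for i in range(1, len(gray_frames)):
        if i in scene_change_frames:                 # do not measure flow across a cut
            continue
        flow = cv2.calcOpticalFlowFarneback(gray_frames[i - 1], gray_frames[i], None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitude, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        motion.append(magnitude.sum())
    return np.sort(motion)[::-1][:top]
```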

For example, applying the optical flow to the movie 'Titanic', a vector of 200 positions is obtained in which the first 3 values are 2256408, 1969240 and 1904171. For the film 'The Tribe', a vector of 200 positions is obtained in which the first 3 values are 2653206, 2398636 and 1953803. The genres of the movie 'Titanic' are drama and romance, while for the film 'The Tribe' it is comedy. The optical flow results are higher for the comedy film than for the romantic film, and they also remain higher throughout the whole vector.

3.4 Embedding

The use of deep neural networks has increased significantly in recent years, expanding the range of applications where they can be applied, such as image and audio analysis, natural language processing or time-series forecasting. One of the most successful applications of neural networks is embedding, used to represent discrete variables as continuous vectors. It allows mapping the input information into another subspace where the samples are as separated as possible, which allows a better characterisation of the information. Through this technique, great advances have been made in word embedding for machine translation and entity embedding for categorical variables.

To perform the embedding of the trailers dataset it is necessary to have the feature vector and the genres of each film. First it is necessary to train the network and then to predict the embedded feature vectors. Film genres have been selected as the labels for training because similar films should have similar genres in order to make a recommendation. This assumption is not always true, but it is interesting because the films are labelled with one, two or three labels and it is possible to train a network as a multi-label classification problem, mixing labels and learning inter-dependencies between genres.

For the first step, the process that has been carried out can be observed in Figure 3.20. A deep neural network is used, taking as input the feature vectors, with a multi-label classification purpose. The network configuration was the following:

• Two fully connected layers of size 2048 neurons.

• ReLU activation function between neuron layers.

• Final 9-neuron layer with Softmax output representing the genres in the labels.

• Input size of 1445 values (concatenation of all independent features extracted).

• One hot representation of the labels (binary vector with ones in true labels).

• Dropout during training to prevent overfitting at the beginning.

• Categorical Cross Entropy Loss Function.

Figure 3.20: Embedding training

Once the trained model is obtained, the prediction of all the embedding vectors is carried out. But this time, to make the prediction, the last neurons and Softmax layers are removed. The main idea is to obtain, not the prediction of the genres, but the mapping of the original features to a new subspace that facilitates the subsequent training, taken as the output of the second fully connected layer. In this way, the output obtained by predicting is a vector with the dimension of that last layer, that is, 2048 values. This will be the embedded feature vector used to perform the final recommendation using a different deep learning network trained for this purpose. This process is represented in Figure 3.21.
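A minimal Keras sketch of this embedding step, under the assumption that Keras is the framework; the layer sizes, activations and loss follow the configuration above, while the dropout rate, its placement and the optimiser are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Training network: multi-label classification over the 9 genres.
inputs = keras.Input(shape=(1445,))                  # concatenated feature vector
x = layers.Dense(2048, activation='relu')(inputs)
x = layers.Dropout(0.5)(x)                           # rate and placement are assumptions
x = layers.Dense(2048, activation='relu')(x)
outputs = layers.Dense(9, activation='softmax')(x)

classifier = keras.Model(inputs, outputs)
classifier.compile(optimizer='sgd', loss='categorical_crossentropy')
# classifier.fit(feature_vectors, one_hot_genres, ...)

# Embedding extractor: the same network without the final classification layer.
embedder = keras.Model(inputs, x)                    # output of the second 2048-neuron layer
# embeddings = embedder.predict(feature_vectors)     # one 2048-value vector per film
```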

Figure 3.21: Embedding prediction

3.5 Distances

In this section, the process to get the labels used to train a network for film recommendation, taking as input the embedded vectors, is presented. This block receives the embedded feature vectors and outputs a matrix of distances.

The distance matrix consists of one vector per film, in which the vector of each film contains the distances from that film to the rest of the films. In this way, each vector has a length equal to the number of films in the dataset, that is 4021, and the matrix is made up of one of these vectors per film. The values of the distances are normalised from 0 to 1 along each vector, with 0 being the minimum distance, that of the film with itself. The normalisation has been performed vector by vector instead of using the entire set of films, to maintain closer distance values between films independently. In order to be more coherent with the numbers, the normalised distances have been changed to 1 − distance, so that 1 is the best recommendation (the film trailer with the smallest distance) and 0 the worst one (the farthest film trailer in the dataset).

An example of the final label distance matrix can be seen in Table 3.6. The example represents the distance vector of each film, where D(a, b) denotes the function calculating the distance between film a and film b. The diagonal of this matrix will be full of ones, since the distance between a film and itself is zero.

Film 1 | Film 2 | Film 3 | ... | Film Nf
1 | 1 − D(film1, film2) | 1 − D(film1, film3) | ... | 1 − D(film1, filmNf)
1 − D(film2, film1) | 1 | 1 − D(film2, film3) | ... | 1 − D(film2, filmNf)
1 − D(film3, film1) | 1 − D(film3, film2) | 1 | ... | 1 − D(film3, filmNf)
... | ... | ... | ... | ...
1 − D(filmNf, film1) | 1 − D(filmNf, film2) | 1 − D(filmNf, film3) | ... | 1

Table 3.6: Labels Distance Matrix

Two methods have been used to calculate the distances: the cosine similarity and the euclidean distance. Two distance matrices have been generated, one per method, to later compare the results and choose which one to use for the recommendation engine.

Cosine similarity is a metric used to check how similar two vectors are. For the calculation of the distance, it measures the cosine of the angle between the two vectors projected in a multi-dimensional space. The smaller this angle is, the greater the cosine similarity will be (Equation 3.8).

\[ \cos(p, q) = \frac{p \cdot q}{\lVert p \rVert \, \lVert q \rVert} = \frac{\sum_{i=1}^{N} p_i q_i}{\sqrt{\sum_{i=1}^{N} p_i^2} \, \sqrt{\sum_{i=1}^{N} q_i^2}} \tag{3.8} \]

A high cosine value indicates that an entry is closely related to another. The parameters p and q are the vectors under comparison and N is the length of these vectors. The final distance is calculated as follows (Equation 3.9):

d(p, q) = 1 − cos(p, q) (3.9)

The euclidean distance is the straight-line distance between two points in Euclidean space. Deriving the Euclidean distance between two data points involves computing the square root of the sum of the squares of the differences between corresponding values (Equation 3.10):

\[ d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \ldots + (p_N - q_N)^2} = \sqrt{\sum_{i=1}^{N} (p_i - q_i)^2} \tag{3.10} \]

Where p_i and q_i represent the values of the vectors at the same position and N is the total length of the vectors under comparison. In summary, the cosine similarity captures the orientation of the angle and the euclidean distance the magnitude. The final matrices are obtained by applying the distance metric to the embedding matrix.
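A possible sketch of how the two label matrices could be computed, assuming SciPy and NumPy; it follows the row-by-row normalisation and the 1 − distance flip described above:

```python
import numpy as np
from scipy.spatial.distance import cdist

def label_distance_matrix(embeddings, metric='cosine'):
    """Per-film label vectors: distances row-normalised to 0-1 and flipped to 1 - distance,
    so 1 marks the closest film (the film itself) and 0 the farthest one."""
    distances = cdist(embeddings, embeddings, metric=metric)  # 'cosine' or 'euclidean'
    row_max = distances.max(axis=1, keepdims=True)
    return 1.0 - distances / row_max                          # vector-by-vector normalisation
```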

3.6 Deep Learning Recommender System Architectures

This section describes the last block of the recommendation system engine. This block is a deep neural network, trained with the embedded feature vectors as input and the distance matrix as labels for each film. The problem is solved as a regression in order to learn the distance values and finally perform the recommendation using the top K predicted values. Three main architectures have been used to compare the performance between them.

Firstly, an Artificial Neural Network is used to learn directly from the embeddings and predict the label distance values to get the final recommendation. Secondly, a very interesting approach is the use of autoencoder architectures. This type of network tries to encode the information and decode the encoded vector in order to learn the input representation or a different type of modality. The first architecture is an autoencoder that takes as input the embedding vectors and tries to reproduce the distances directly in the output. This network tries to mix two different modalities in the same network. The second idea was to create a double autoencoder. In this case, a first autoencoder tries to reproduce the same input in the output. This learning is easier than the learning between different modalities and the network can achieve better performance. As a second step, the encoder part of the first autoencoder is removed and a new encoder part is added. This second part is trained using as input the embedding vectors and fixing the weights of the previous decoder so that they do not learn during this training phase.

3.6.1 Deep Neural Network

One of the initial common solutions is the use of Artificial Neural Networks with several neuron layers, turning the network into a "deep" one. These networks have demonstrated that they can be used for several purposes, like classification problems and also regression. In this work, a regression problem is solved in order to have a neural network able to take the embedding vectors as inputs and learn to predict directly the distances between film trailers.

The proposed network architecture is represented in Figure 3.22. Through this neural network a model is generated, which is the one that will be used to make the recommendations.

Figure 3.22: Artificial neural network

The input of the neural network is the feature embedding vectors, and the labels are the generated distances. The input has a dimension of 2048 (embedding length) and the labels 4021 (number of film trailers in the dataset).

During training, for each embedded vector of a movie, the network tries to learn what distances it should generate, so that once the prediction of a film is produced, it is a vector of 4021 positions with the distances from that film to all the films in the dataset, 1 being the best film to recommend and 0 the worst. The configuration of the network was as follows:

1. Two fully connected layers with 4098 neurons.
2. ReLU activation function after each of the previous layers.
3. Final neuron layer with size 4021 (number of films in the dataset).
4. Sigmoid function at the end to get values in the 0-1 range.

The hyperparameter selection for this network configuration will be presented in the next section. With this final configuration, a deep neural network will learn from the data to create a model that predicts the distances, which will be the output used to take the final decisions about the recommendations.
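A minimal Keras sketch of this regression network, assuming Keras as the framework; the layer sizes, activations, loss, optimiser, learning rate and batch size follow the configuration and hyperparameters reported in this chapter:

```python
from tensorflow import keras
from tensorflow.keras import layers

embedding_dim, num_films = 2048, 4021

recommender = keras.Sequential([
    keras.Input(shape=(embedding_dim,)),
    layers.Dense(4098, activation='relu'),
    layers.Dense(4098, activation='relu'),
    layers.Dense(num_films, activation='sigmoid'),   # distances mapped to the 0-1 range
])
recommender.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss='mse')
# recommender.fit(embeddings, distance_labels, batch_size=256, epochs=50000)
```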

3.6.1.1 Neuronal network hyperparameters

One of the most important parts in the world of deep neural networks is the hyperparameter selection. There exists a trade-off between these parameters and the network performance during learning. The number of layers, the number of neurons, the optimiser, the learning rate and the batch size (when high computational capabilities are available) are very important parameters during the training process.

• Neural network dimension

The selection of the dimensions of a neural network is done by making a first estimation and checking its results. There exist some techniques in the literature to select the number of layers and the number of neurons in each layer [66]. The method used here was trial and error, modifying the number of layers and neurons in each layer in an incremental way.

In the first attempts, networks with the same number of layers but with a lower dimension in the first and second layers were used. After several tests it was found that a larger dimension, such as 4096 neurons, learned more efficiently but with an increase in the computational cost. Beyond these experiments, further increasing the number of neurons did not improve the performance of the network.

The rest of the dimensions are given by the data that has been generated. That is, the input of the first layer must have the dimension of the embedded feature vectors, and the output layer must have the dimension of the number of movies in the dataset.

• Epochs

The selection of the number of epochs is done by pursuing the accuracy the project wants to reach. In the case that concerns us, with 50000 epochs the loss value was already sufficiently low for the purposes of the project. The metric values obtained during training will be presented in Section 4.

• Learning rate

For the selection of the learning rate it is necessary to observe the progress of the training. As long as the loss continues to fall, the learning rate is adequate. But if the loss does not fall, or falls slowly, the learning rate will have to be reduced, either manually or using some of the metrics extracted during training.

Another method is to use a learning rate that decreases gradually from the beginning, so that at the start it has a high value, which is the most convenient in the first epochs since the general traits are learned very quickly, and as the epochs increase, the learning rate goes down progressively to find the details that are more difficult to discover.

In the case of this project, it was not necessary to modify the learning rate since the loss continued to fall until reaching the desired value. Therefore a fixed learning rate was used.

The selected learning rate, after several tests, was 0.01. The tested learning rates were in a range from 0.1 to 0.0001. With 0.1 the network got stuck at the beginning, because such a high value does not allow the optimiser to reach a good performance during the optimisation process. Very small learning rates, in the range 10^-3 to 10^-4, present a good behaviour but take more time to train the network to get the same results as those achieved with the selected learning rate, at the cost of increasing the number of epochs by more than 1.5 times.

• Batch size

For the batch size, as happens with almost all parameters, it is necessary to vary it to find the value that best fits the problem. The batch size has another peculiarity: depending on the computational power of the system executing the training, it can be increased or not, since the batch size is the number of inputs that are trained at the same time. Batch size is a parameter that is very useful when high computational capabilities are available because, depending on its size, the variance during training can be reduced, also reducing the training time.

To find the right value, it must be noted that, at each epoch, if the learning rate is adequate, the loss falls adequately and the accuracy rises. If the epochs are too slow and the loss takes too many epochs to show any decrease, it means that the batch size can be increased, if possible, to speed up the process.

Finally, the batch size used in this training was 256. The tested batch sizes were in a range from 16 to 512.

• Optimiser

The optimiser used is SGD. Although at the beginning the Adam optimiser was tried, after a few epochs the loss was stagnating. This may be because the Adam optimiser takes bigger jumps, so that it arrives very fast near the optimum loss but, if it does not hit it first, it is not able to approach it, since it keeps taking big jumps from one side of the minimum to the other. Therefore it was found that the SGD optimiser did improve the loss continuously, and did not stall as it did with the Adam optimiser.

• Loss Function

The mean squared error function has been used as the loss function. Through the loss function, the result after the last layer of the network, in an epoch, is compared with the labels of the films trained in that epoch.

The mean squared error measures the average of the squared errors, that is, the difference between the estimation and what is estimated. This measurement fits what this neural network is intended to learn and therefore it was selected as the loss function.

3.6.2 Autoencoder

The architecture of the autoencoder network can be seen in Figure 3.23. It is composed of two main parts, the encoder and the decoder. The encoder takes the input of the network and produces an embedded representation used by the second part, the decoder, to reproduce the input or change between modalities. Autoencoders are used in several tasks like data generation [67], data compression [68] or modality conversion [69]. The configuration normally reduces the input size sequentially in the encoder down to the desired encoded/embedded size, and the decoder performs the opposite operations to recover the initial input size. Through this neural network a model is generated, which is the one used to make the recommendations.

Figure 3.23: Autoencoder

This network has as input the embedding vectors and tries to reproduce the distances directly in the output. As previously said, this network tries to mix two different modalities in the same network. It is related to the artificial neural network proposed in the last section, since it has the features as input and the labels as the comparison vector for the loss function, but the architecture of an autoencoder first compacts the input information by encoding the data and then a decoder decodes the information to recover it. Mixing both models seeks better results than those of the Artificial Neural Network.

The network is divided into encoder and decoder. The encoder consists of three fully connected layers, and after each layer of neurons a ReLU function is applied. The decoder has the output of the encoder as input (the encoded vector). It has three other fully connected layers; the ReLU function is used after the first two layers and the Sigmoid function after the last layer to fix the output values to the 0-1 range (the same as the values in the distance matrix).
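A hedged Keras sketch of this autoencoder, assuming Keras as the framework and taking the layer sizes from Section 3.6.2.1; the size of the very last layer is an assumption here, set to the length of the distance vector so the MSE loss can compare it with the labels:

```python
from tensorflow import keras
from tensorflow.keras import layers

embedding_dim, num_films = 2048, 4021

inputs = keras.Input(shape=(embedding_dim,))
# Encoder: progressively compress the embedded feature vector.
x = layers.Dense(2048, activation='relu')(inputs)
x = layers.Dense(1024, activation='relu')(x)
encoded = layers.Dense(512, activation='relu')(x)
# Decoder: expand the code towards the distance vector (0-1 values).
x = layers.Dense(1024, activation='relu')(encoded)
x = layers.Dense(2048, activation='relu')(x)
outputs = layers.Dense(num_films, activation='sigmoid')(x)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss='mse')
# autoencoder.fit(embeddings, distance_labels, batch_size=256, epochs=60000)
```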

3.6.2.1 Autoencoder hyperparameters

As mentioned above, the selection of hyperparameters is one of the most important parts in the world of neural networks. In this section the selection of these parameters for the autoencoder is explained.

• Autoencoder dimension

The autoencoder dimensions selected are those of a simple architecture. The architecture of an autoencoder already implies a complexity that allows training a small dataset easily, as is the case here. Therefore, it does not need a large number of layers or neurons per layer.

The number of neurons of the encoder is 2048 in the first layer, 1024 in the second layer and 512 in the third layer. For the decoder it is the opposite, the number of neurons increases: 512 in the first layer, 1024 in the second layer and 2048 in the third layer.

Readjusting the dimensions of the autoencoder was not necessary, since the training process and the results obtained worked properly, as will be explained in Section 4. The network did not stop learning and the losses kept falling throughout the training. These results are explained in more detail in Section 4.6.

• Epochs

As for the artificial neural network, the ideal number of epochs is one in which the network has a very low loss and cannot learn anything else (preventing overfitting). In this case 60000 epochs have been used. Although the loss at epoch 60000 is not zero, it is very close to 0, giving a good performance of the network.

• Learning rate

For the selection of the learning rate, the procedure followed for the artificial neural network is repeated: observing the behaviour of the training and the loss at each epoch, and checking that it continues to fall without the network stopping learning.

In this case, the method of gradually decreasing the learning rate could also be used, so that as the epochs increase, and more level of detail is needed in the learning, a learning rate that facilitates this new learning would be applied.

The learning rate used for the autoencoder training was 0.01. A range of learning rates was tested, from 0.1, with which the network gets stuck, down to 0.0001, which gave good performance but slowed down the learning process, taking longer to achieve the same result as with 0.01.

• Batch size

The batch size was selected by varying it to find the value that best fits the problem. The tests were carried out using batch sizes in a range from 16 to 512, and the value finally selected was 256. To choose it, it was checked that at each epoch the loss fell properly and the accuracy increased. If an epoch took a long time and the loss needed many epochs to decrease by a significant value, it meant that the batch size was too low.

• Optimiser

As with the previous network, the SGD optimiser has been used. The Adam optimiser was also tested, but the training stalled after a few epochs. On the other hand, the SGD optimiser, even if slower, is safer and it allowed the training to keep learning.

• Loss Function

The same loss function has been used as for the artificial neural network, the mean squared error, which minimises the difference between the predicted output of the network and the calculated distance vectors for each film.

3.6.3 Double autoencoder

The last proposal is a double autoencoder. This architecture fuses two different autoencoders, one that learns the initial data and another that learns a different purpose. The first autoencoder takes as input the distance vectors and learns to reproduce these vectors in the output, since it is trained to reproduce data of the same modality. This process is easier than the process used in the other autoencoder, where a change between different modalities (different input and output values) is carried out. This first autoencoder therefore learns to reproduce the distances at the output, achieving a good performance due to the low complexity of the problem. The second autoencoder follows this procedure:

• Firstly, the decoder part of the first autoencoder is combined with a new encoder part with the same dimensions as the previous encoder but with a different input size.

• Secondly, the weights learned by the first autoencoder are copied into the decoder part and frozen so that they do not change during training.

• Finally, the autoencoder takes as input the embedding vectors and learns during training to encode the information properly to fit the decoder input learned in the first autoencoder.

Figure 3.24: Double autoencoder

In this way, the decoder part of the second autoencoder learns more efficiently to reproduce the distance vector at the output and the encoder part learns only to fit the input of the decoder in order to obtain these distance vectors. When this training is done, two models are obtained, one per autoencoder. The second autoencoder model allows predicting the distances of a film whose embedded feature vector has been used as input.
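A hedged Keras sketch of the two training stages, under the same assumptions as the previous sketches (Keras as framework, decoder output sized to the distance vector); the key point it illustrates is freezing the decoder weights before training the second autoencoder:

```python
from tensorflow import keras
from tensorflow.keras import layers

embedding_dim, num_films = 2048, 4021

def build_encoder(input_dim):
    """Encoder 1 and encoder 2 share the same layer sizes but not the input size."""
    inputs = keras.Input(shape=(input_dim,))
    x = layers.Dense(2048, activation='relu')(inputs)
    x = layers.Dense(1024, activation='relu')(x)
    return keras.Model(inputs, layers.Dense(512, activation='relu')(x))

def build_decoder():
    inputs = keras.Input(shape=(512,))
    x = layers.Dense(1024, activation='relu')(inputs)
    x = layers.Dense(2048, activation='relu')(x)
    return keras.Model(inputs, layers.Dense(num_films, activation='sigmoid')(x))

decoder = build_decoder()

# Autoencoder 1: distances -> distances (same modality), trained with RMSprop.
encoder1 = build_encoder(num_films)
auto1 = keras.Model(encoder1.input, decoder(encoder1.output))
auto1.compile(optimizer='rmsprop', loss='mse')
# auto1.fit(distance_labels, distance_labels, batch_size=256, epochs=60000)

# Autoencoder 2: embeddings -> distances, reusing the decoder with frozen weights.
decoder.trainable = False                            # keep the learned decoder fixed
encoder2 = build_encoder(embedding_dim)
auto2 = keras.Model(encoder2.input, decoder(encoder2.output))
auto2.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss='mse')
# auto2.fit(embeddings, distance_labels, batch_size=256, epochs=60000)
```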

The architecture of the double autoencoder network is presented in Figure 3.24. Through this neural network a model is generated, which is the one that will be used to make the recommendations.

3.6.3.1 Double autoencoder hyperparameters

In this section the hyperparameters of the double autoencoder are defined, chosen to obtain a proper result during the training process.

• Double autoencoder dimension

The dimensions of encoder 1 and encoder 2 are the same except for the input size, and those of the decoder are the same but in the opposite order. These dimensions have been chosen as for the simple autoencoder presented previously, verifying that the network was capable of learning. The bigger the network is, the longer it takes to train and the more computationally costly it is; moreover, beyond a certain size it does not matter how much the dimensions are increased, the result is the same. Therefore the dimensions should be as small as possible as long as they provide the same result as a higher-dimensional network.

The number of neurons of encoder 1 and encoder 2 is 2048 in the first layer, 1024 in the second layer and 512 in the third layer. For the decoder it is the opposite, with the number of neurons increasing: 512 in the first layer, 1024 in the second layer and 2048 in the third layer.

As was done for the previous networks, the behaviour of the network is observed to decide whether the layers need resizing. The network did not stop learning and the loss kept falling throughout the training. These results are explained in more detail in Section 4.6.

• Epochs

The number of epochs in each autoencoder is 60000. With this number of epochs, in both autoencoders, a very small loss is reached. In fact, as will be seen in the results, the loss values are much smaller than in all previously trained neural networks. With 60000 epochs the loss is considered practically null, so this is a sufficient number of epochs.

• Learning rate

Repeating the same procedure and tests as with the artificial neural network and with the simple autoencoder, a learning rate of 0.01 has been selected for both autoencoder 1 and autoencoder 2. This learning rate allows the networks to learn throughout all the epochs without getting stuck.

• Batch size

The batch size in both autoencoders has been selected by running the same tests as with the previous networks. That is, batch sizes in a range from 16 to 512 have been tested, finding that the most appropriate value is 256. A detailed explanation can be found in Section 3.6.1.1.

• Optimiser

The optimiser used in autoencoder 1 is RMSprop. The RMSprop optimiser is a method similar to the gradient descent algorithms, but with momentum. This optimiser seeks to restrict oscillations in the vertical direction, which makes it possible to increase the learning speed and helps the algorithm converge faster. The equations used for RMSprop are:

v_{dw} = \beta \cdot v_{dw} + (1 - \beta) \cdot dw^2 \qquad (3.11)

v_{db} = \beta \cdot v_{db} + (1 - \beta) \cdot db^2 \qquad (3.12)

W = W - \alpha \cdot \frac{dw}{\sqrt{v_{dw}} + \epsilon} \qquad (3.13)

b = b - \alpha \cdot \frac{db}{\sqrt{v_{db}} + \epsilon} \qquad (3.14)

Beta represents the momentum value, normally set to 0.9. Using this optimiser the network reaches a low loss value quickly, faster than with SGD. Again, the Adam optimiser fails to converge due to the nature of this problem, which requires fitting very low values with a regression solution.
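
As a worked illustration of Equations 3.11 to 3.14, the snippet below applies one RMSprop update to a toy weight matrix and bias using NumPy; the gradients are random placeholders.

```python
import numpy as np

beta, alpha, eps = 0.9, 0.01, 1e-8            # momentum, learning rate, stabilising epsilon

# Toy parameters and gradients for a single layer.
W, b = np.random.randn(4, 3), np.random.randn(3)
dW, db = np.random.randn(4, 3), np.random.randn(3)

# Running averages of the squared gradients (Eq. 3.11 and 3.12).
v_dW = np.zeros_like(W)
v_db = np.zeros_like(b)
v_dW = beta * v_dW + (1 - beta) * dW ** 2
v_db = beta * v_db + (1 - beta) * db ** 2

# Parameter updates damped by the root of the running averages (Eq. 3.13 and 3.14).
W = W - alpha * dW / (np.sqrt(v_dW) + eps)
b = b - alpha * db / (np.sqrt(v_db) + eps)
```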

The SGD optimiser has been used again for autoencoder 2, since it provides good results, although it is slower. More information can be found in Section 3.6.1.1.

• Loss function

As for the previous networks, the mean squared error has been used as the loss function, minimising the distance between the predicted values and the labels in both autoencoders.

Chapter 4

Results

In this section all the results obtained from each block of the proposed architecture are presented. Firstly, several examples and results over the dataset for each feature extractor selected during the feature selection stage are shown. Secondly, representations of the embedding block are given, using dimensionality reduction techniques to project the multidimensional data into 2D space and visualise how the features are distributed; results of the training process used to learn the embedding are also included. Finally, training results for the networks selected to create the recommendation engine and the final recommendations produced by the system are shown.

4.1 Feature extraction

This section collects several relevant results about the performance of the selected feature extractors, which produce the information that will be the input of the recommendation engine. The feature extractors have been tested on several cases to compare the differences between films of different genres and to check whether the detections based on deep learning networks work properly.

4.1.1 Action recognition

Several tests have been carried out to check the accuracy of the action recogniser. A first test with short videos of predominant actions shows interesting results and high detection accuracy for various types of actions. A second wave of tests with movie trailers was carried out in order to evaluate the network in a more complex, real scenario with more than one action happening at the same time.

4.1.1.1 Short videos action recognition

This first test consists of introducing as input a very short video or a GIF file in which only one action is observed. To carry out this first test, videos from the GIPHY [70] web page have been used: an action is searched for on the GIPHY web page, the video is downloaded, and finally a prediction per video is made. Table 4.1 shows the results of the first four predictions. In order to access the videos used, their identifiers have been included; they should be inserted into the URL https://media.giphy.com/media/IDENTIFIER/giphy.mp4.

These results show how, for a video of very few frames, the detector is able to find the essence of what is being done in it. The percentages represent the amount of each activity in the video, so the sum of all the percentages is 1. Thus, if a video has one clearly outstanding action and few others, that action will have a very high percentage and the rest very low ones, while if the video contains many main actions each will be predicted with a lower percentage. For example, for the bowling action in video 6 the percentage is 82.4%, which is very high; analysing the video, it can be seen that there is no action other than a person bowling. In the rocking video, on the other hand, the percentage of rocking is 28.9%, much lower than in the bowling video, because many other actions appear in the video.

Now the predicted actions are analysed one by one. In the first video two people can be seen practising a martial art. The main prediction, with a probability that clearly stands out from the following actions, is fighting. The fight is the main action of the video, so this can be considered a good result. In addition, the activities that follow are also quite accurate: within this fight one of the people is attacking the other, which is the second most likely action, followed by the kicking action, which is the way the person is attacking. Finally, the fourth prediction is boxing, which, although it is not the discipline they are practising, is a melee fight similar to what can be seen in the video.

Action | Main action | Predictions output | Identifier
1 | Fighting | 56.9% Fighting, 8.8% Attacking, 5.0% Kicking, 2.1% Boxing | 13NGnjve8XVTC8
2 | Rocking | 28.9% Rocking, 10.8% Dancing, 10.4% Performing, 4.0% Adult+male+singing | MC3df27bJs80REF8rn
3 | Singing | 37.8% Adult+female+singing, 13.7% Performing, 9.5% Adult+female+speaking, 7.2% Child+singing | Bj4YPTkrTfYoE
4 | Chatting | 30.4% Smiling, 9.0% Adult+female+speaking, 7.2% Interviewing, 6.4% Grinning | iOe7vzW2UibR0u9dS3
5 | Crying | 38.6% Crying, 5.6% Frowning, 4.7% Coughing, 3.2% Pointing | 2WxWfiavndgcM
6 | Bowling | 82.4% Bowling, 1.5% Bowing, 0.9% Walking, 0.6% Playing+videogames | 3o6Ztj7flGHKPbaeME
7 | Running | 85.4% Sprinting, 9.8% Running, 2.8% Racing, 0.7% Competing | ctOePPWSU91mM

Table 4.1: Action recognition results in short videos

Video 2 shows a performance by the Beatles. The most likely predicted action is rocking, which can be considered the predominant action of the video, so this prediction is also accurate. The following actions are also important for describing the video: first dancing, which they are clearly doing; then performing, which is basically the global action of the video; and the fourth action is adult male singing, which is what the Beatles are doing on stage. In this case, as mentioned above, the probabilities are lower, since the video has several fundamental actions.

Video 3 shows Lady Gaga singing in a close shot. The first predicted action is adult female singing, which is basically the description of the video, and therefore a good prediction. The following prediction, which also has a high probability, is performing, the general activity taking place in the video. With much lower probabilities there are two actions that are not entirely correct. The first one is adult female speaking; this confusion makes sense, since she sings very slowly and vocalises a lot, so she could be mistaken for speaking even by a person. The problem is the fourth action, child singing: although the action is correct, it does not describe the singer. This is an error to be careful with, because although its probability is not very high, it appears in a high position.

Video 4 shows two women, facing the camera, talking. The first predicted action is smiling. Although it is an action being carried out, it is not the action that defines the video. Even so, the action that does define the video, adult female speaking, is the second prediction, so it can be considered a reasonably good prediction, although not as good as the previous ones. The third action is interviewing, an action that at first sight does not seem to be happening in the video; however, the fact that both women look at the camera instead of at each other makes the scene resemble an interview, which could explain the confusion. Finally there is the prediction of grinning, which is indeed happening. In general, the prediction is close to what happens in the video, although it is not the best prediction that could be made.

Video 5 shows a man up close, in dismay and crying. The first action found in the prediction is crying, which can be considered very accurate since it is the description of the video. The second action is frowning; if the video is analysed carefully, it can be seen that the man is indeed frowning as a gesture of pain. The next predicted action is coughing, which does not happen in the video; the prediction was probably confused by the man's gestures, since his constricted expression and his frowning could resemble a gesture that precedes coughing. The confusion may also arise from the fact that the man brings his closed fist to his mouth, a gesture usually made before coughing. Finally, the fourth predicted action is pointing, which may be due to the movement of the hand, but it is the least accurate prediction for this video.

In video 6 a man can be seen bowling, making a run-up. This case is more particular, since the first prediction has a very high percentage, which leaves the other predictions with lower percentages. The prediction considers bowling the predominant action, with a very high percentage. The second and fourth actions, bowing and playing video games, have nothing to do with the video, but their percentage is very low; therefore, when used as a feature in the vector, they will be treated as unimportant. Nevertheless, the third action, walking, is related to what is happening in the video, because before throwing the ball the man walks a couple of steps.

In the last video, video 7, a group of runners can be seen leaving their starting marks and beginning a sprint. This case is similar to the previous one in that the first action has a very high percentage, so the rest of the actions barely have any weight in the feature vector. However, in this case the first four actions are all correct. The first action is sprinting, which is essentially what happens in the video. The second predicted action is running, which also describes a sprint. The third predicted action is racing, and considering they are in a race the prediction is quite accurate. Finally, the fourth action is competing, and in the video they are clearly competing. The predictions for this video turn out to be very good and very descriptive of what happens in it.

In conclusion, this first test shows that the action recogniser generally makes fairly accurate predictions that describe the predicted video very well. On certain occasions, however, as in video 3, although the action itself is correct, it confuses the type of person who performs it.

4.1.1.2 Trailers action recognition

In this section the action recogniser is tested on trailers. For this, four films have been analysed. In this test, the ten actions with the highest predicted percentage are examined, because while in the short videos the four highest-ranked actions already defined the video, in the trailers the complexity increases and interesting actions can appear beyond the first four.

The first film analysed was ”When Harry Met Sally”. The genres of this film are romance, drama and comedy, and its plot summary is: ”Harry and Sally have known each other for years, and they are very good friends, but they fear sex would ruin the friendship.” The prediction of actions for the trailer can be seen in Table 4.2.

Percentage | Action
6.1% | Cuddling
5.3% | Driving
5.0% | Laughing
4.3% | Walking
4.2% | Kissing
3.0% | Adult+female+speaking
2.5% | Adult+male+speaking
1.9% | Smiling
1.7% | Slapping
1.6% | Hugging

Table 4.2: Action recognition results in film ”When Harry Met Sally”

All predicted actions appear in the trailer except slapping, which may have been predicted because of a scene in which two characters hit a ball with a baseball bat. In addition, the percentage of appearance of these actions seems to be quite correct. Moreover, these actions are typical of a film of these genres. Therefore, not only is the prediction correct, but knowing the actions that appear most can be a good description of a film, and therefore an accurate feature.

The second film analysed was ”Captain America: Civil War”, whose genres are action, adventure and Sci-Fi, and whose plot summary is: ”Political interference in the Avengers’ activities causes a rift between former allies Captain America and Iron Man”. The prediction of actions is shown in Table 4.3.

As happened with the previous film, the recognised actions are characteristic of an action or adventure movie, through predicted actions like jumping, fighting or attacking, and the dropping and adult male speaking actions are predicted with a correct appearance percentage.

But other predicted activities decrease the accuracy of the prediction. The raining action, although it appears during a short scene, is not the predominant action of the trailer.

Percentage | Action
13.3% | Raining
3.6% | Jumping
3.6% | Playing+videogames
2.9% | Building
2.2% | Adult+male+speaking
2.0% | Fighting
1.5% | Dropping
1.5% | Attacking
1.4% | Tattooing
1.4% | Mopping

Table 4.3: Action recognition results in film ”Captain America: Civil War”

The actions of playing video games and mopping do not happen during the trailer. The action of building does not appear either, but there could be confusion because the action recogniser can perceive scenes of destroyed buildings, which could be mistaken for a building under construction; as will be seen in the other predictions, building tends to appear as a false positive. The tattooing action can be related to the action of exhibiting military art; this prediction is partly justified since during the trailer a group of people is shown fighting with weapons, which closely resembles a military group using weapons.

Finally, it should be noted that several explosions happen during the trailer that the action detector could have labelled as a burning action, which would be very symbolic of an action or adventure movie, but it does not. In this case, however, the fire and explosions are reflected in the colour histogram. In general, it has been observed that for action or adventure films the action recognition is not as successful as for the rest of the films.

The third film analysed is ”Dealing with Idiots”, whose genre is comedy and whose plot summary is: ”Faced with the absurd competitiveness surrounding his son’s youth league baseball team, Max Morris, a famous comedian, decides to get to know the colourful parents and coaches of the team”. The prediction of actions for this film can be observed in Table 4.4.

In this case the predicted actions and their percentages are very accurate. All the detected actions, except fencing, are present in the trailer. The predominant action of the film is adult men speaking, which is the predicted action with the highest percentage of presence.

Percentage | Action
5.9% | Adult+male+speaking
5.8% | Fencing
5.3% | Punting
3.6% | Pitching
3.1% | Kicking
3.0% | Sitting
3.0% | Hitting
2.2% | Discussing
2.1% | Playing+sports
1.8% | Throwing

Table 4.4: Action recognition results in film ”Dealing with Idiots”

The rest of the actions are related to the baseball games that appear in the trailer, except sitting, which is the second action, after talking, that the protagonist performs.

Furthermore, pointing is an action that occurs several times in the trailer, when the parents point at the players while commenting on them. Therefore, the prediction of the actions is highly accurate. In addition to being accurate, the actions present in the trailer summarise what the film is about in a very successful way.

The fourth film analysed is ”Sex and the City”, whose genres are comedy, drama and romance. The plot summary of this movie is: ”A New York writer on sex and love is finally getting married to her Mr. Big. But her three best girlfriends must console her after one of them inadvertently leads Mr. Big to jilt her”. The prediction is shown in Table 4.5.

This prediction is also very accurate, since all the detected actions appear in the trailer and the percentages of appearance are very precise, although for the prediction to be perfect the first and second actions should be swapped; in any case, their percentages do not differ much. The rest of the actions, except building and playing video games, do appear in the trailer. Building and playing video games reappear here and, like fencing, show up as false positives in many predictions. Among the actions that do appear, speaking, in all its variants, and kissing stand out, and they seem to be repeated in romantic films. The prediction for this trailer is accurate and descriptive.

Percentage | Action
13.2% | Adult+female+singing
10.0% | Adult+female+speaking
3.4% | Talking
3.2% | Adult+male+speaking
2.8% | Kissing
2.7% | Adult+male+singing
2.4% | Playing+videogames
2.4% | Standing
2.0% | Building
1.9% | Sitting

Table 4.5: Action recognition results in film ”Sex and the City”

From these results, several conclusions can be drawn. To begin with, it can be seen that the percentages are much lower than when the short videos were analysed. This is because, in longer videos, more actions are executed, and therefore the percentage of appearance of any one action in the trailer is lower.

It is also observed that the defining activities are not only among the first ones, but go beyond them. Therefore it is convenient, as described in Section 3.3.2.2, to introduce into the action recognition feature the complete array of activities and their percentage of presence in the trailer. Since these are percentages, the feature has the same length for the different trailers, which facilitates the normalisation of the vector.
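
As a hedged sketch of how such a fixed-length action feature could be assembled, the snippet below maps the predicted percentages of one trailer onto a common action vocabulary and renormalises the vector so it sums to one; the vocabulary and the example percentages are illustrative placeholders, not the real label set.

```python
import numpy as np

# Illustrative subset of the global action vocabulary shared by all trailers.
ACTIONS = ["cuddling", "driving", "laughing", "kissing", "adult+female+speaking"]
ACTION_INDEX = {name: i for i, name in enumerate(ACTIONS)}

def action_feature(predictions):
    """Build a fixed-length vector of action presence percentages for one trailer.

    predictions: dict mapping action name -> predicted percentage.
    """
    vec = np.zeros(len(ACTIONS), dtype=np.float32)
    for name, pct in predictions.items():
        if name in ACTION_INDEX:
            vec[ACTION_INDEX[name]] = pct
    total = vec.sum()
    return vec / total if total > 0 else vec    # re-normalise so the entries sum to 1

# Example with some of the top actions of Table 4.2.
feature = action_feature({"cuddling": 6.1, "driving": 5.3, "laughing": 5.0})
```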

Another advantage of this feature is that the actions that happen in the trailers are very representative of the type of film that it is.

Finally, the reliability of the detector should be highlighted: it finds the actions that appear in the videos, with only certain errors in low-percentage actions. Examples are the erroneous prediction of the fencing action, which tends to appear as a false positive in movies where people perform fast-moving actions, and the building and playing video games actions, which arise as false positives on many occasions. The cause may be due to several circumstances: the number of training videos for these actions may have been insufficient, the actions may be very complex, or certain habitual actions in the trailers may be systematically confused with fencing, building and playing video games. A possible improvement would be to remove those actions from the predictions, since they are not necessary to describe a movie but can introduce errors into the predictions.

4.1.2 RGB Histogram Feature

In order to analyse the results of the histograms, the tests have been divided into two. A first test makes use of symbolic images with very concrete situations, easy to analyse and to check visually whether the histograms are working properly. A second test with movie trailers has been performed to check how the proposed histogram extraction works in the real scenario.

4.1.2.1 Symbolic images

In this section, the colour histograms of a series of very symbolic images with different characteristics have been computed to analyse the histogram behaviour. The aim is to differentiate between inside and outside, day and night, and between landscape scenarios. The results of this test are shown below.
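
The per-channel histograms analysed below can be reproduced with a short OpenCV routine. The following is a minimal sketch; the 256-bin resolution follows the plots, while normalising by the maximum bin (so the highest peak equals 1) is an assumption made to match the values discussed in the text, and the file names are placeholders.

```python
import cv2
import numpy as np

def rgb_histogram(image_path, bins=256):
    """Return one normalised histogram per colour channel (OpenCV loads images as BGR)."""
    image = cv2.imread(image_path)
    histograms = []
    for channel in range(3):
        hist = cv2.calcHist([image], [channel], None, [bins], [0, 256]).ravel()
        histograms.append(hist / hist.max())      # highest peak scaled to 1
    return np.stack(histograms)                   # shape: (3, bins)

# Example: compare a bright outdoor frame with a darker indoor one.
# outside = rgb_histogram("outside_example.jpg")
# inside = rgb_histogram("inside_example.jpg")
```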

Inside and outside

The image used for the outside case is Image 4.1, and for the inside case it is Image 4.2.

Figure 4.1: Example outside image

Figure 4.2: Example inside image

The resulting histograms for the outside scenario are shown in Figure 4.3. As can be seen in the results, for the blue and green colours there is a peak that almost reaches the maximum at the brightest intensity, 255. For the red colour there is also a peak at the brightest intensity, but it stays below the value of 0.8. This means that there is a lot of light, which characterises an outside image.

In addition, although the histogram has values all along it, most values tend to be on the bright side of the histogram. There is no peak at the lowest, dark values: below the intensity value 70, the histogram values do not exceed the normalised value of 0.3, except for the red colour, which has a peak of 0.6 in the first dark values.

Figure 4.3: Outside example results.

The results for the inside image can be seen in Figure 4.4. The result for the interior image is somewhat more diffuse, since although it is an inside scene it is a well illuminated area. Even so, the histogram values of the three colours tend to be in the left, dark area; the largest amount of colour is below the intensity of 120. Peaks can also be found in the bright values of the three colours, with values lower than 0.75 for blue and green but a very high value, of almost 1, for the red colour.

Figure 4.4: Inside example results.

In general, the results for differentiating inside and outside images are very good, since with the histogram it can easily be distinguished whether the image was taken in an indoor area or outside. The values of the colour histogram for the outside image are concentrated above the value 70, while for the interior they are concentrated below the value of 120. In the outside image the peaks at the brightest intensity are very high, except for the red colour, while in the interior image only the red peak is high.

Day and night

The image used for the day case is Image 4.5, and for the night case it is Image 4.6. These images represent a sunny day in a city and a normal midnight in the same city.

Figure 4.5: Example day image

Figure 4.6: Example night image

The results for the day image are drawn in Figure 4.7 and present relevant characteristics for differentiating between day and night. Although there are certain values along the histograms, they are not significant, since they do not even reach the intensity value of 0.05. And there is a peak, in the three histograms, on the right side of each one. This indicates that a large amount of brightness has been perceived, which was the desired result.

It is a simple but very effective result. It is clearly recognised, both visually and from the single peak at high values, that the image is one in which luminosity predominates.

Figure 4.7: Day example results.

The results for the night image can be seen in Figure 4.8. In this case, the opposite situation appears in the histograms: there is a series of very low values, which do not reach the intensity value of 0.05, but the peak is on the left, dark side. So clearly a situation in which darkness is predominant is being described.

Figure 4.8: Night example results.

The comparison between the histogram of a day image and the histogram of a night image shows important differences between them, which are clearly observable and easily computable. It does not give rise to confusion and can be used to tell a day image from a night one. In addition, the compared images are taken from a similar perspective of the Eiffel Tower.

Mountain and sea

The images used in this test are Image 4.9 and Image 4.10. Both present very concrete colour patterns that can be very useful to differentiate films related to the sea from films more related to war, sports and similar settings. The analysis is presented in the next paragraphs.

Figure 4.9: Example mountain image

Figure 4.10: Example sea image

The results for the mountain image are shown in Figure 4.11 and the results for the sea image in Figure 4.12.

For the mountain image the histograms show that the blue colour appears in a dark range, practically as a single peak, without much luminosity spread towards the highest intensity values, while the green and red colours appear in an intermediate range of luminosity, spanning a large range of brightness with high intensity values.

Figure 4.11: Mountain example results.

The extracted histograms for the sea image show, for the blue and green colours, a peak of intensity at a centred luminosity value tending to the right, while for the red colour there is hardly any intensity.

Figure 4.12: Sea example results.

The comparison between the mountain and sea histograms is very interesting, since each image shows a very different colour histogram.

In general, the results of the histograms for symbolic images are very interesting. They show how the histograms help to differentiate between situations, which can be applied to describe settings and scenarios, and that is what is needed to create a good feature vector.

4.1.2.2 Trailers colour histogram

This section presents and analyses the results obtained after applying the colour histogram feature extraction process used to obtain the histogram feature, which is explained in Section 3.3.2.3.

The first film analysed is ”Batman & Robin”, whose genres are action and Sci-Fi. Its colour histograms are represented in Figure 4.13.

Figure 4.13: ”Batman & Robin” histogram results.

The result shows a very high intensity peak at a very dark value of the luminosity spectrum. In addition, from that peak towards the brighter values the histogram presents null or practically null values. Batman movies are usually recognised for a very dark environment, which is why this result is very indicative of the type of film being analysed.

The next film analysed is ”Someone Marry Barry”, a comedy. The results are shown in Figure 4.14.

Figure 4.14: ”Someone Marry Barry” histogram results.

The results show a very broad luminosity range with significant intensity values. Although there is still an intensity peak in the dark values, there are many values tending towards brightness. All these values are significant, since many frames of the trailer have been analysed. It is important to note that there are intensity values, although low, up to the lightest luminosity, which was not the case with ”Batman & Robin”. The trailer is bright and many colours can be appreciated.

The third trailer analysed is ”17 Again”, whose genres are comedy and drama. The results of the colour histogram of this film can be seen in Figure 4.15.

Figure 4.15: ”17 again” histogram results.

The results for this movie behave similarly to the previous one: the range of intensities is relatively large but the intensity peak at the dark luminosity value is maintained. Even with a behaviour similar to ”Someone Marry Barry”, the values are easily distinguishable. It can be seen that the green colour has more intensity at high luminosity values. It is also observed that there is some intensity, even if small, throughout all the luminosity values, which seems to be the mark of a luminous film. The trailer is bright, with a wide range of colours.

The fourth film is ”Night at the Museum”, an action and adventure comedy. The results of the histogram are shown in Figure 4.16.

Figure 4.16: ”Night at the Museum” histogram results.

In this film the same behaviour appears as for the previous two. It can be observed that luminosity values somewhat higher than the dark values have been detected. In addition, it still maintains a certain level of intensity throughout the entire luminosity range, low but not 0. In this case, the higher intensity range is concentrated at low luminosity values, which may be due to the fact that the film takes place indoors. For these reasons the histogram gives a good result, since it reveals a film with light but shot in interiors.

The fifth film analysed is ”A Resurrection”, a horror and thriller movie. The results for this movie can be seen in Figure 4.17.

Figure 4.17: ”A resurrection” histogram results.

In this case the results are somewhat similar to those of ”Batman & Robin”, but with an important difference: although a dark film is perceived, many different values can be seen in the low luminosity range, indicating that although it is dark it can have many colours and details. In addition, there are intensity values throughout the whole luminosity range which, although nearly null, do not reach 0. This is a big difference with respect to films like ”Batman & Robin”, where almost the entire luminosity range has intensity equal to 0. The key difference between both films is that ”Batman & Robin” is an exterior, night film, while this one takes place in indoor environments at night.

The sixth film analysed is ”Say It Isn’t So”, a romantic comedy. The results can be seen in Figure 4.18.

Figure 4.18: ”Say It Isn’t So” histogram results.

This film is a clear example that not all comedy films are bright. The graph shows an intensity peak at a low brightness value, and the rest of the intensity values are null, except for a couple of luminosity values. The result corresponds to a dark film, as can be confirmed by watching the trailer.

The last example is the movie ”It’s Complicated”, a romantic comedy. This movie has been selected to explain the intensity peaks at dark values that appear in all movies. The results for this movie are shown in Figure 4.19.

Figure 4.19: ”It’s Complicated” histogram results.

In general it behaves like a luminous film, with values throughout the luminosity spectrum. But in this last case the appearance of a peak at a high luminosity value stands out. By analysing the trailer of the movie it can quickly be found where that peak comes from: between certain scenes of the trailer there are transitions with complementary or explanatory texts. The background of these transitions is usually black, but in this case it is white, which makes these peaks appear.

In conclusion, the results are very interesting and show that the histograms are very useful as a feature vector. The histograms are almost like an identifier of the movie, and they show certain patterns. In addition, Section 4.1.2.1 has shown how the histogram is able to differentiate between certain situations. It is also important to emphasise that every value of the histogram, however small, is of great importance as long as it is not 0, since it carries information from many frames.

Another important aspect is the systematic appearance of peaks at dark values of the luminosity spectrum. Both the transitions with explanatory text and the fade-out or fade-in transitions explain those black peaks in all movies.

From the results it can also be concluded that, in general, the comedy, adventure and action genres are colourful, although there are many cases in which they are dark, while the horror and thriller genres are almost always dark, with only some movies that are not purely dark but have nuances.

4.1.3 Object detector

To analyse the results of the object detector, the tests have been divided into three. Firstly, detection is run on a series of images containing the different types of objects that the detector is able to find. Secondly, the result of processing the intraframes of the trailers with the object detector is analysed. Finally, the final object features are examined in order to check that the objects are correctly represented and detected.

4.1.3.1 Symbolic images

This section shows the result of applying the object detector to symbolic movie images, that is, images containing objects that usually appear in films and that the detector should therefore be able to find.

To begin with, the detection on a series of images containing animals has been verified. A sample of some of the tested images can be seen in Figure 4.20.

Figure 4.20: Animals detection

The detection of animals is very accurate, with a high probability that the detected object is the correct one. The deep learning object detector is capable of detecting a wide range of animals; even when they are only partially visible, as in the image of the horses, the detector finds the objects accurately.

The following objects to test are vehicles. The detection on a vehicle sample can be seen in Image 4.21.

Figure 4.21: Vehicle detection

As with animals, the detection of vehicles is very accurate. The detector is capable of recognising a wide range of vehicles of all kinds, even, as will be shown in Section 4.1.3.2, science-fiction spaceships; in this result the detection of old ships can also be observed.

Below is a sample of the tested images with items of sports equipment (Figure 4.22).

Figure 4.22: Sport equipment detection

The trained object detector is also able to detect many types of sports equipment, from helmets to balls of all kinds or rackets. Other objects of interest to detect are weapons and other objects related to war, crime and murders. In Figure 4.23 two examples of weapon detection are shown.

Figure 4.23: Weapon detection

The object detector is capable of detecting guns, which will be very convenient for action or crime movies.

But the object detector also makes mistakes. For example, in the image of two picture frames shown in Figure 4.24 it can be seen that, although it has detected one of the frames, the other is not found. This problem does not make a difference in the object feature vector, since the vector only indicates whether an object is present during the trailer or not. But the error of false positives, like the one seen in Figure 4.25, is a problem for the feature vector, since it indicates the presence of an object that is not there. This last type of error appears infrequently and should not be a problem in a vector of such large dimensions, but it is important to check that it does not distort the feature vectors excessively.
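
A hedged sketch of the object feature referred to here is shown below: a binary presence vector over the detector's label set, where duplicates change nothing and a false positive wrongly switches one position on. The label list is an illustrative placeholder, not the real detector vocabulary.

```python
import numpy as np

# Placeholder subset of the detector's label vocabulary.
OBJECT_LABELS = ["person", "human face", "vehicle", "animal", "clothing", "picture frame"]
LABEL_INDEX = {name: i for i, name in enumerate(OBJECT_LABELS)}

def object_presence_vector(detections):
    """Mark with 1 every object class detected at least once in a trailer."""
    vec = np.zeros(len(OBJECT_LABELS), dtype=np.float32)
    for label in detections:
        if label in LABEL_INDEX:
            vec[LABEL_INDEX[label]] = 1.0     # a missed second instance changes nothing
    return vec

vector = object_presence_vector(["person", "person", "picture frame"])
```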

Figure 4.24: Not all objects detected

Figure 4.25: Wrong detection

4.1.3.2 Trailer object test

To test the application of the object detector to the intraframes of the trailers, it has been applied to a series of trailers and the most notable findings have been analysed.

To begin with, the detector's great ability to find people stands out. Figure 4.30 shows a series of frames where the figure of a person could give rise to confusion and the detector still finds them. Figure 4.26 shows a silhouette in a very dark place. Figure 4.27 shows a cartoon person. Figure 4.28 shows a face in a very blurry image. Finally, Figure 4.29 shows a group of people in a meeting in which one of them appears semi-transparent. In all these cases the detector is able to find the people on the screen and identify them as such.

Figure 4.26: Dark place person

Figure 4.27: Cartoon person

Figure 4.28: Blurry image person

Figure 4.29: Semi-transparent person

Figure 4.30: Person detection

Another curiosity observed in the trailers is the detection pattern of the object called human face. When a human face is detected it is usually because it is a fairly close-up shot. Two examples of this can be seen in Figure 4.32.

Figure 4.31: Not all objects detected

Figure 4.32: Human face detection

This feature is very interesting, since the feature vector will mark whether or not there are close-up shots in a movie.

The trailers also show the detector's great ability to find vehicles. In Figure 4.33 a burning car is observed; it is not fully visible and it is still detected. In Figure 4.34 a ship is observed from a perspective in which it is not fully seen either from the front or from the side, and the detector still discovers the vehicle without problems. Finally, Figure 4.35 shows a science-fiction spaceship that the detector is able to recognise.

Figure 4.33: Burning car

Figure 4.34: Boat

Figure 4.35: Sci-Fi ship

Another interesting detection is the clothing of the people. In general, the detector only finds clothes that are very sharp, but when applied to the trailers it finds clothes when they differ from everyday garments. For example, it recognises clothes such as robes or outfits typical of science-fiction movies. Examples of this are the ”Harry Potter” and ”Star Wars” films. Figure 4.36 shows two examples from a ”Harry Potter” movie where the detector puts the clothing label on the boys dressed in the film's own outfit, and Figure 4.37 shows the same behaviour in two frames of a movie from the ”Star Wars” saga. Through this peculiarity of the object detector it is easier to find science-fiction movies.

Finally, two peculiar cases are shown in which the object detector is able to find objects in cartoons, as was already shown in Figure 4.27, where it detected a person. Figure 4.38 shows two images: in the first the object detector finds the animal that appears in the frame and in the second it finds the toy in the frame. In the case of the toy, on certain occasions it is labelled as an animal; when analysing the trailer, both the toy object and the animal object would be indicated in the vector.

Figure 4.36: ”Harry Potter” clothes detection

Figure 4.37: ”Star Wars” clothes detection

Figure 4.38: Object detections in cartoons

4.1.3.3 Trailer object results

Once the extraction of objects from all the trailers is done, this section offers a summary of the results.

To begin with, the 10 most frequently found objects in the movies have been analysed. Table 4.6 shows the most detected objects and the number of movies in which they appear.

Objects detected | Number of films
Person | 4020
Clothing | 4007
Human Face | 4004
Vehicle | 3900
Plant | 3419
Poster | 3350
Building | 3269
Human Head | 3035
Animal | 2550
Personal care | 2408

Table 4.6: Most detected objects

As can be seen, there is only one movie in which no people are detected. That movie is ”Misery”, in whose trailer no person appears.

It is also observed that clothing has been detected in a large number of films, so the previously mentioned characteristic of detecting clothing only in films with very characteristic garments does not seem to hold.

The object labelled ”Human face” appears very often, and it has previously been shown to be detected in close-up shots. This makes sense, since it is a type of shot that usually appears in movies and trailers.

The objects that appear least, without counting the objects that never appear, are shown in Table 4.7. These objects appear in only one movie each, which is also indicated in the table.

The appearance of these objects in the corresponding films has been checked, and they do appear in the trailers.

Objects detected | Film
Ski | Everest
Office building | Hardcore Henry
Swimwear | Couples Retreat
Rocket | Around the World in 80 Days
Wine | The Heavy
Luggage and bags | The Tourist
Tablet computer | Paranoia
Drums | Coffee Town
Tie | Limitless
Shelf | This Is the End

Table 4.7: Least detected objects

For example, the trailer for the film ”Hardcore Henry” largely takes place in a scientific facility, which can be considered an office building.

In the movie ”The Heavy” a bottle of wine appears. It is important to note that it is a bottle and not a glass, since, as seen in Section 4.1.3.1, wine glasses are labelled as drink and appear more often across different movies.

In the trailer of the movie ”The Tourist” suitcases appear. In this case the object is descriptive of much of the trailer's settings, which take place on a train and in a hotel.

The fact that the objects describe a setting of the trailer also happens in the movie ”This Is the End”, where a shelf appears in a supermarket.

Other data of interest: the number of objects that appear in at least one movie is 138; the average number of objects per film is 15; the maximum number of objects in a movie is 31, in ”Resident Evil”; and the minimum is 3, which happens in ”Jaws: The Revenge” and ”Mozart’s Sister”.

4.1.4 Optical flow

In this section the results of applying optical flow to the trailers are examined. First, a section with examples in short videos is presented, and then a section with tests showing the behaviour of the optical flow on the trailers, together with data from the optical flow already computed for all the trailers.

4.1.4.1 Short video test

This section shows the visual results of applying the optical flow designed for the trailers to short videos.

The first example is the Beatles dancing. The result of the three stages of the optical flow can be seen in Figures 4.39 and 4.40. The first figure is the representation of the optical flow calculated with Farneback, the second is the HSV representation, and the last is that HSV representation converted to black and white. This last representation is the one that will be used to count the intensity of the pixels in a frame.

Figure 4.39: Dancing optical flow representation

Figure 4.40: Dancing optical flow HSV representation

This is an example of intermediate activity. In the first figure it can be seen how the arrows indicate the direction of the movement and their length its amount. In the second, the colours indicate the amount of movement and their position where it is happening; this representation does not indicate the direction of the movement. The third figure is equal to the second but in black and white, giving the intensity of the movement in a scene.
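
The three representations described above can be produced with OpenCV's dense Farneback optical flow. The following is a minimal sketch for a pair of consecutive frames; the Farneback parameters are the common OpenCV example values, not necessarily those used in the thesis.

```python
import cv2
import numpy as np

def flow_representations(prev_frame, next_frame):
    """Return the HSV visualisation and the greyscale magnitude of the dense optical flow."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    hsv = np.zeros_like(prev_frame)
    hsv[..., 0] = angle * 180 / np.pi / 2                                   # hue: direction
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX)   # value: amount
    hsv_view = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

    gray_view = hsv[..., 2]      # black and white image used to count pixel intensities
    return hsv_view, gray_view
```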

The second example shows two women talking. In this case there is hardly any movement, so some movement can only be appreciated in the arrow-based Farneback optical flow representation, in Figure 4.41. The movement in the other two representations is so minimal that it is not appreciable; their output is shown in Figure 4.42.

Figure 4.41: Talking optical flow

Figure 4.42: Talking optical flow HSV representation

As explained, in this case the activity in the video is very low and the optical flow captures it perfectly. The visual result of the HSV representation and of the same representation in black and white is an image that appears black.

The last example contains a large amount of movement: two people practising martial arts. As for the dancing action, the Farneback result is presented first in Figure 4.43, and then the HSV representation and the same representation in black and white are presented in Figure 4.44.

Figure 4.43: Fighting optical flow representation

Figure 4.44: Fighting optical flow HSV representation

It can be observed that in the last two representations much more movement is appreciated than for the dancing action. This result is very descriptive: although the dancing video has movement, the movement of the fight is much bigger and more abrupt, and the optical flow captures and represents it as such.

4.1.4.2 Trailers optical flow test

To evaluate the optical flow, two parameters are taken into account: the maximum value of the normalised optical flow vector and the sum of the whole vector. To compare two vectors, the difference between them is computed. The maximum value indicates the strongest movement in the trailer, but it can also happen that the maximum value is not very high while there is a certain amount of movement throughout the trailer, with many frames showing high values; the sum of all the positions evaluates this. The difference between vectors is also used to check which films have the farthest optical flow vectors and which the closest.
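
A small sketch of these three measures follows, with random placeholder vectors standing in for the 200-position normalised optical flow features; the absolute difference used here is one plausible choice, since the text does not fix the exact distance.

```python
import numpy as np

# Placeholder normalised optical flow vectors for two trailers (200 positions each).
flow_a = np.random.rand(200); flow_a /= flow_a.sum()
flow_b = np.random.rand(200); flow_b /= flow_b.sum()

maximum_a = flow_a.max()                     # strongest single burst of movement
addition_a = flow_a.sum()                    # overall amount of movement in the trailer
difference = np.abs(flow_a - flow_b).sum()   # how far apart the two whole vectors are
```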

Table 4.8 shows the maximum and addition values for 10 movies. The values of the second column of this table (addition) are not normalised; normalisation is done by putting all the vector features together, and once normalised the sum of all the values of a vector must be 1. To show the difference between the sums of the vectors in the table, it was decided to show only the non-normalised values in the addition column.

Film | Maximum value | Addition
Behind Enemy Lines II: Axis of Evil | 0.0729 | 3.1471
Akeelah and the Bee | 0.0528 | 2.3665
Abandon | 0.0743 | 4.7191
Birthday Girl | 0.0429 | 3.9232
Awful Nice | 0.0404 | 3.9514
The Longest Ride | 0.0426 | 3.1023
Dear White People | 0.0476 | 3.8509
Jimmy P. | 0.0352 | 2.8429
Colors | 0.0536 | 5.7143
Half of a Yellow Sun | 0.0390 | 3.4867

Table 4.8: Optical flow comparison

The highest results correspond to movies with a lot of movement, and the lowest to films with little movement.

There are also cases like the trailer of ”Birthday Girl”, in which the maximum optical flow value is not very high but the sum is, because the trailer has no scenes of excessive movement yet maintains a medium level of movement throughout. There are also examples like ”Colors”, which has a high maximum value and a very high sum: it maintains high values along almost the entire vector, which produces a very high addition.

A high maximum but a low addition would indicate that there are one or a few scenes with a lot of movement while the rest has little movement, but this situation is hard to find.

From these films, an analysis has been made focusing on three of them: ”Behind Enemy Lines II: Axis of Evil” (action and thriller), ”Akeelah and the Bee” (drama) and ”Abandon” (drama), to show the importance of using the whole vector and not just the maximum value and the sum.

The difference between the vectors of ”Behind Enemy Lines II: Axis of Evil” (an action movie with a lot of movement) and ”Abandon” (a drama with a lot of movement) is 0.1495. The difference between the vectors of ”Akeelah and the Bee” (a drama with little movement) and ”Abandon” (a drama with a lot of movement) is 0.2005. Finally, the difference between the vectors of ”Akeelah and the Bee” (a drama with little movement) and ”Behind Enemy Lines II: Axis of Evil” (an action movie with a lot of movement) is 0.0728.

It can be observed that there is less difference between the action film and the slow drama than between the action film and the fast drama, or even than between the two dramas. This is because, while ”Abandon” maintains very high movement values across its 200 positions, the values of ”Behind Enemy Lines II: Axis of Evil” drop sharply from position 10 onwards, resembling the vector of the slow drama more closely. These values appear because in the action movie there are a few frames with a lot of movement and the rest with only some movement, while in the drama with a lot of movement there is high movement in many frames. This shows that it is important to include in the vector not only the maximum or the sum of values, but all the values. It can also be seen that action movies are not always the ones with the most movement.

In conclusion, optical flow does not have to be related to the genre: a comedy can have a lot of movement, just like a horror or an action film. But it is interesting that, if someone likes a film with a lot of movement, the recommender suggests a film with a similar amount of movement. And it is important to keep the whole vector in order to have a good optical flow feature.

4.1.4.2.1 Trailer optical flow results

This section shows the results of analysing the optical flow of all the trailers. The film with the largest sum is ”The Repo Man”, with a sum of 11.0297. The film with the lowest is ”A Very Harold & Kumar 3D Christmas”, with a sum of 0.1414.

This result is very interesting, since it shows the effectiveness of the scene change detector. In this trailer there is hardly any movement: three people are talking, but there are many scene changes, since each person is filmed with a separate camera that is cut to each time he speaks. Despite those scene changes, which without the detector would produce peaks of optical flow, the optical flow correctly reflects that there is very little movement in the trailer.

The average value of the sum of optical flows over all films is 3.2510. The film with the highest maximum optical flow is ”The Repo Man”, the same one that has the highest sum, with a maximum of 0.2245. The average of the highest maximum of each film is 0.04091.

4.1.5 Joined Feature

Once the extraction of the four features of all the films is normalised, a vector per film is created. In this section, the joined features are analysed. To this end, a series of machine learning tools have been applied to visualise the data, and clusters have been computed that offer a first classification of the films. This step has only been done to check the features, since after joining the features they are passed directly to the embedding, without intermediate steps.

In order to represent the data in a way accessible to human understanding, it has been processed with PCA, which projects the data into another subspace where the principal components of the multidimensional data are ordered in a new vector representation. In this case, a reduction from the 1445 dimensions of the original train set to 512 has been used. To represent the data in a graph, the first two components of the PCA have been used. The results are drawn in Figure 4.45.

Figure 4.45: Join feature with PCA scatter

The two main components of the PCA give an idea of the separation and disposition of the data. This is not a realistic scenario, but it makes it possible to see how the data is structured. Although PCA would allow working with the data, it would not be enough.

Finally, the TSNE tool has been used to display the data in a two-dimensional subspace. It is a dimensionality reduction technique particularly well suited to the visualisation of high-dimensional datasets. The technique can be implemented via Barnes–Hut approximations, allowing it to be applied to large real-world datasets. The results of using TSNE on our feature dataset are shown in Figure 4.46.
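
Both projections can be obtained with scikit-learn; the following is a minimal sketch in which a random matrix stands in for the joined feature matrix (the 1445 input dimensions and the 512 PCA components are taken from the text, the number of films is a placeholder).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Placeholder joined feature matrix: one 1445-dimensional vector per film.
features = np.random.rand(1000, 1445)

# PCA: project onto 512 ordered principal components, then plot the first two.
pca_features = PCA(n_components=512).fit_transform(features)
pca_2d = pca_features[:, :2]

# TSNE with the Barnes-Hut approximation for a 2D visualisation.
tsne_2d = TSNE(n_components=2, method="barnes_hut").fit_transform(features)
```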

Figure 4.46: Join feature with two dimensional TSNE

The TSNE results show a significant differentiation between films, but the films are still very close together and mixed with respect to genres. To improve the separation taking the genres into account, the trained embedding is the proposed solution to improve the data separation for training.

4.2 Embedding

To help find a better representation of the data in a subspace that facilitates the differentiation of the films, an embedding is used. This section presents the results of using deep learning networks to create an embedding of the data extracted from the trailers, learning a new representation based on the genres of the films.

4.2.1 Embedding training

Using the deep neural network described in Section 3.4, and taking as input the normalised feature vector, the network was trained as a multi-label classification problem using the genres of the films. During the training process, the loss and accuracy values and their evolution can be monitored. The binary cross-entropy function was used as the loss function, since it is a good choice for classification problems where labels can be converted to one-hot vectors. For accuracy, two metrics have been defined (a sketch of one possible implementation follows the list):

1. The first accuracy metric measures whether at least one of the nine predictions (nine genres) is among the correct labels.

2. The second accuracy metric measures whether all the labels (if more than one exists in the multi-label case) have been predicted correctly.
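
A minimal sketch of one possible implementation of both metrics is given below, assuming multi-hot label vectors and sigmoid-like scores thresholded at 0.5; the threshold, shapes and example data are illustrative, and the exact-match reading of the second metric is one interpretation of the definition above.

```python
import numpy as np

def accuracy_at_least_one(y_true, y_pred, threshold=0.5):
    """Fraction of films where at least one predicted genre is among the true genres."""
    hits = ((y_pred >= threshold) & (y_true == 1)).any(axis=1)
    return hits.mean()

def accuracy_all_labels(y_true, y_pred, threshold=0.5):
    """Fraction of films where the thresholded prediction matches every genre label."""
    exact = ((y_pred >= threshold) == (y_true == 1)).all(axis=1)
    return exact.mean()

# Toy example: 3 films, 9 genres, multi-hot ground truth and random scores.
y_true = np.zeros((3, 9))
y_true[0, [0, 2]] = 1
y_true[1, 4] = 1
y_true[2, [1, 5, 8]] = 1
y_pred = np.random.rand(3, 9)
print(accuracy_at_least_one(y_true, y_pred), accuracy_all_labels(y_true, y_pred))
```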

The training loss and the two accuracy metrics defined above can be seen in Figure 4.47 and Figure 4.48 respectively. In the second figure, the accuracy over a single genre (in orange) and the accuracy when trying to predict all the labels correctly are presented.

Figure 4.47: Embedding loss

Figure 4.48: Embedding accuracy

It can be seen in Figures 4.47 and 4.48 that the loss reaches a value of 0 around epoch 300 and that the accuracy is 100%. This means that the network has learnt to predict all the possible situations correctly, that is, to predict all genres correctly.

At this point of the project a practical result is already available: by means of the embedding prediction, the network can find the genres of a film. Once the network was trained, new films could be introduced, their features extracted and the prediction of their genres made.

100 film trailers have been downloaded and used as a validation set. These films are completely different from the films in the dataset. The accuracy of the genre prediction has been tested using this validation set: the Rank 3 accuracy (at least one correct prediction among the correct labels) was 85% and the Rank 1 accuracy (all the correct labels obtained by the predictions) was 76%. These results show that 100% accuracy in training can be a good result as long as no overfitting is present.

In this case, some overfitting appears when training beyond 500 epochs. To reduce this problem, the model at epoch 300 has been used. The results with this model are 88% Rank 3 accuracy and 79% Rank 1 accuracy. Since these results are better, this model is the one used to extract the final embeddings fed to the recommendation engine networks.

4.2.2 Embedding prediction

As explained in Section 3.4, after training the network the last layer is removed and a prediction is made for all the features, obtaining them in a new subspace of 2048 dimensions (the number of neurons in that network layer).
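
A minimal sketch of this step is shown below, assuming a Keras model: a second model is built that stops at the 2048-neuron layer preceding the genre output, and its predictions are used as the embedded features. The toy classifier, layer names and data are placeholders standing in for the network of Section 3.4.

```python
import numpy as np
from tensorflow.keras import layers, models

# Toy stand-in for the trained genre classifier (1445 joined features -> 9 genres).
classifier = models.Sequential([
    layers.Input(shape=(1445,)),
    layers.Dense(2048, activation="relu", name="embedding_layer"),
    layers.Dense(9, activation="sigmoid", name="genre_output"),
])
# classifier.fit(...) would be run here with the multi-hot genre labels.

# Remove the last layer: predict only up to the 2048-dimensional hidden representation.
embedding_model = models.Model(inputs=classifier.input,
                               outputs=classifier.get_layer("embedding_layer").output)

joined_features = np.random.rand(10, 1445).astype("float32")   # placeholder input
embeddings = embedding_model.predict(joined_features)          # shape: (10, 2048)
```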

To get an intuitive representation, PCA and TSNE were used to project the data into 2D space. Figure 4.49 presents the embedded features using PCA on the left and using TSNE on the right.

Figure 4.49: Embedding feature representations
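For reference, these 2D projections can be obtained with scikit-learn; this is a minimal sketch assuming the 2048-dimensional vectors are stored in an array called embeddings, with a purely illustrative perplexity value:

    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    pca_2d = PCA(n_components=2).fit_transform(embeddings)
    tsne_2d = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)

    fig, (ax_pca, ax_tsne) = plt.subplots(1, 2, figsize=(12, 5))
    ax_pca.scatter(pca_2d[:, 0], pca_2d[:, 1], s=4)
    ax_pca.set_title("PCA")
    ax_tsne.scatter(tsne_2d[:, 0], tsne_2d[:, 1], s=4)
    ax_tsne.set_title("TSNE")
    plt.show()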

4.2.3 Comparison between using and not using the embedding

This section presents some representations of both the joined feature vectors and the embedded feature vectors, in order to find out whether the embedding results in an improvement in data separation.

The first representation made for the comparison shows where in the dataset the films with a specific genre lie: the films that have only that genre, the films with that genre plus one more, and finally the films with that genre plus two more, since, as previously mentioned, the maximum combination of genres is three.

The test has been made for action-genre films. The results are shown in Figure 4.52 and Figure 4.55: the films that only have the action genre are shown in red, the films with the action genre plus one more in yellow, and the films with the action genre plus two more in green.

Figure 4.50: Without embedding Figure 4.51: Embedded

Figure 4.52: PCA action representation with 1 (red), 2 (yellow) and 3 (green) for action genre

Figure 4.53: Without embedding Figure 4.54: Embedded

Figure 4.55: TSNE action representation with 1 (red), 2 (yellow) and 3 (green)

The results for both PCA and TSNE show that the films are now more spread out. The optimal result would be all the red points grouped together, surrounded by the yellow ones and finally by the green ones, meaning that action-only movies are very close, then action plus one other genre, and finally action plus two other genres. This does not happen in such a clear way because of the complexity of the different genre combinations. As can be observed, the films with two or three genres that include action are distributed throughout the space, since their position also depends on the other genres that compose them. The same effect happens for the rest of the nine genres.

The second test represents, in three dimensions, movies that have only one genre, taking three genres at a time. Only three genres are used simultaneously to better visualise the result, since plotting all the single-genre films at once was not clear enough. The first test uses single-genre films from action, science fiction and horror. The result can be seen in Figure 4.58.

Figure 4.56: Without embedding Figure 4.57: Embedded

Figure 4.58: PCA three genres representation action (red), science-fiction (yellow) and horror (green)

These results are not much better. It can be observed that without the embedding the single-genre films of different genres were close together, so much so that in Figure 4.56 they appear to lie almost in two dimensions, except for a couple of horror films that move away. On the other hand, in Figure 4.57 the films are distributed across the three dimensions, and differentiation between them can be appreciated, which is what was sought with the embedding.

The test has been repeated for three different genres: thriller, crime and adventure. To better see the result, the 3D perspective has been changed. The representation can be seen in Figure 4.61.

Figure 4.59: Without embedding Figure 4.60: Embedded

Figure 4.61: TSNE three genres representation adventure (red), crime (yellow) and thriller (green)

The results shown by these new graphs are a little clearer, although still difficult to interpret. It is observed again that without the embedding, in Figure 4.59, the films are much closer together and more clustered than with the embedding in Figure 4.60.

From these results it can be concluded that the embedding has improved the layout of the feature vectors and is able to separate features of the same nature in the space. It can also be stated that representing the features in a 2D or 3D space is very complex, since a lot of information is lost during the projection and the values used for the representation may not be very representative of the entire feature vector.

4.3 Distances

After extracting the distances, a vector of length 4021 is obtained per film, that is, a 4021 x 4021 matrix. In this matrix the diagonal is composed of ones, indicating that each movie is closest to itself. The rest of the values indicate the distance between that film and the film at each position of the vector, so the closer the value is to 1, the more similar the movies are.

Two types of distances have been considered: the cosine and the Euclidean distances. Both matrices would serve for training; the difference between the two is that the cosine distances have somewhat lower values, but the distribution is very similar.
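A sketch of how both matrices can be built with scikit-learn follows; the per-row rescaling of the Euclidean distances (so that a film compared with itself scores 1 and the farthest film in that row scores 0) is an assumption, since the exact normalisation is not detailed here, while the cosine similarity has ones on the diagonal by construction:

    import numpy as np
    from sklearn.metrics.pairwise import euclidean_distances, cosine_similarity

    # embeddings: (4021, 2048) embedded feature vectors of the dataset films
    d_euc = euclidean_distances(embeddings)                    # raw distances, diagonal = 0
    sim_euc = 1.0 - d_euc / d_euc.max(axis=1, keepdims=True)   # assumed per-row rescaling to [0, 1]
    sim_cos = cosine_similarity(embeddings)                    # diagonal = 1 by construction

    # Ten most similar films to film i, skipping the film itself (first position after sorting)
    i = 0
    top10 = np.argsort(sim_euc[i])[::-1][1:11]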

From these matrices a series of peculiarities can be drawn, the most important being which movies differ most from the rest of the dataset.

4.3.1 Euclidean distances

In the matrix of Euclidean distances, a series of films can be observed that appear in many distance vectors with a value of 0, that is, they are as different as possible from the movie for which they obtained that 0.

There are five such films. The films that are furthest from the whole dataset are "Citizen Gangster" (distance 0 on 2049 occasions), "Death Becomes Her" (distance 0 on 885 occasions), "Paul Blart: Mall Cop 2" (distance 0 on 44 occasions), "Storm Surfers 3D" (distance 0 on 1015 occasions) and "All Roads Lead to Rome" (distance 0 on 26 occasions).

In general, this abrupt distance is probably because the characterisation that has been made does not fit these five films well. It could also be because they are very unusual movies that are difficult to recommend. However, the movie "Death Becomes Her", for example, is a well-known film, with a 6.4 rating and 76307 votes on IMDb; it should be possible to reach this movie from others, yet in the feature space it is very distant.

It is also interesting to analyse the films that are recommended according to the distances, that is, the film with the second-highest value after the 1 associated with the movie itself. After analysing these movies, the opposite of what happens with the most different films is observed: there is no pattern, nor a short list of films that repeat themselves. The most repeated film appears only 8 times. This is very interesting, because it means that the first recommendations are not drawn from a small range of films, so the problem of hiding movies that are never recommended is avoided. More than half of the films would be recommended at least once according to the distance vectors.

As a first subjective comparison, the same film was used to obtain 10 recommendations from each of the proposed solutions. The film is "E.T. the Extra-Terrestrial" and the recommendations made using only the distances can be seen in Table 4.9.

The recommendations include many adventure and comedy films; considering that most movies in the dataset are dramas, this shows that the embedding has worked properly.

Position | Name | Genres
1 | La mujer de mi hermano | Drama
2 | Space Cowboys | Action, Adventure, Thriller
3 | Layover | Drama, Romance
4 | Sing Street | Drama, Music
5 | White Vengeance | Action, Drama, History
6 | Hail, Caesar! | Comedy, Mystery
7 | Cat People | Drama, Fantasy, Horror
8 | Half Baked | Comedy, Crime
9 | She-Devil | Comedy
10 | Apartment 1303 3D | Horror

Table 4.9: Euclidean distances E.T recommendations

Despite this, there are still recommendations of genres that have nothing to do with the film. Therefore, genre is not taken into account sufficiently.

Finally, a survey was conducted to check the response of several people to the recommendations. The surveys are analysed in Section 4.8, which compares the results of the surveys for all the proposed solutions.

4.3.2 Cosine distances

As with the matrix of Euclidean distances, in the cosine matrix there is a series of films that appear in many distance vectors with a value of 0, that is, they are as different as possible from the movie for which they obtained that 0.

There are five such films. The films that are furthest from the whole dataset with the cosine matrix are the same as with the Euclidean matrix: "Citizen Gangster" (distance 0 on 2049 occasions), "Death Becomes Her" (distance 0 on 885 occasions), "Paul Blart: Mall Cop 2" (distance 0 on 44 occasions), "Storm Surfers 3D" (distance 0 on 1015 occasions) and "All Roads Lead to Rome" (distance 0 on 26 occasions).

As explained for the Euclidean distances, this probably happens because the extracted characteristics are not the most appropriate for these films, or because they are unusual and difficult-to-recommend movies, or both.

Just as the results for the furthest films are the same for both distance matrices, the same happens for the closest films. The movies that are closest to each other have been analysed, and these films do not follow a pattern; they are not repeated as much as the most distant ones. The number of distinct closest movies is again 2372, exactly the same value as for the Euclidean distances. The conclusions are the same as with the Euclidean distances: there is not a small range of films that dominate the first recommendation, and the recommendations are varied.

As was done with the Euclidean distances, the 10 closest films to "E.T." are presented below, in Table 4.10.

Position | Name | Genres
1 | La mujer de mi hermano | Drama
2 | The Mummy: Tomb of the Dragon Emperor | Action, Adventure, Fantasy
3 | Space Cowboys | Action, Adventure, Thriller
4 | A Serious Man | Comedy, Drama
5 | Saw IV | Horror, Mystery
6 | White Vengeance | Action, Drama, History
7 | Edtv | Comedy, Drama
8 | Layover | Drama, Romance
9 | Batman | Action, Adventure
10 | Deadfall | Crime, Drama, Thriller

Table 4.10: Cosine distances E.T. recommendations

The results of this test show that the first recommendation is the same as the first recommendation with the Euclidean distances. The film "Space Cowboys" is also repeated, in a different but very close position, and the films "Layover" and "White Vengeance" appear in both recommendations as well. The recommendations include some comedy and adventure films, which could be due to the embedding bringing films of similar genres closer together. Even so, the genres are not being taken into account enough, since there are many recommendations of genres that have nothing to do with the movie "E.T.", such as drama, crime, thriller, horror or romance.

These results show that the cosine and Euclidean distances are very similar, with only minimal differences. For that reason, only the Euclidean distances have been used in the training process of the recommendation engine.

4.4 Recommender Objective Evaluation Metrics

To properly evaluate the performance of the recommender, different objective metrics have been used. Firstly, the precision for different numbers of recommended elements has been calculated. The equation used to obtain this precision is the following:

precision = \frac{|TopKRecommended \cap PredictedRecommendations|}{K} \qquad (4.1)

where K is the number of recommendations taken into account, "TopKRecommended" is the set of indices of the K largest values in the distance labels and "PredictedRecommendations" is the set of indices of the top K predicted values.
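A direct translation of Equation 4.1, assuming the label and predicted similarity vectors of a film are available as NumPy arrays, could look as follows:

    import numpy as np

    def precision_at_k(true_similarities, predicted_similarities, k):
        # Indices of the K films with the highest label similarity and the highest predicted similarity
        top_true = set(np.argsort(true_similarities)[::-1][:k])
        top_pred = set(np.argsort(predicted_similarities)[::-1][:k])
        return len(top_true & top_pred) / k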

The Root Mean Square Error (RMSE) is another objective metric, used to get an intuition of how far the predicted distances are from the real labels. This metric is calculated as follows:

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (d_i - l_i)^2} \qquad (4.2)

where N is the total number of elements, d_i represents, in the case of this work, the distances predicted by the neural networks and l_i are the true distance values.

The second error metric used is the Mean Absolute Error (MAE), a measure of the difference between two continuous variables. Like the RMSE, it measures the difference between the predicted distances and the real labels, and it can be calculated using the following equation:

MAE = \frac{1}{N} \sum_{i=1}^{N} |d_i - l_i| \qquad (4.3)

After training, all the networks have been tested using these metrics on both the training set and the validation set. Precision was applied over the final predicted distances using the top K values of the predicted vector.
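For completeness, Equations 4.2 and 4.3 reduce to two short NumPy functions, where d are the predicted distances and l the true labels:

    import numpy as np

    def rmse(d, l):
        # Root mean square error between predicted distances and labels
        return float(np.sqrt(np.mean((np.asarray(d) - np.asarray(l)) ** 2)))

    def mae(d, l):
        # Mean absolute error between predicted distances and labels
        return float(np.mean(np.abs(np.asarray(d) - np.asarray(l))))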

4.5 Deep Neural Network Recommender

To analyse the results of the neural network, both the training phase and the prediction after training have been examined.

In the training part, the RMSE value is analysed as a metric of how successfully the network is learning. In the prediction part, the precision value is used as an objective measure and a recommendation example as a subjective test.

The use of MAE and MSE metrics to check training results is a common practice, as explained in [71]. This reference also justifies the use of surveys to check the results, as will be done in Section 4.8. On the other hand, the use of precision as an evaluation measure is explained in reference [72].

4.5.1 Neural Network training

In the training process the loss function is the MSE, and the RMSE gives an intuitive value of how the model is improving: it measures how far, on average, the predicted values are from the real values. The smaller, the better. Figure 4.62 shows the evolution of the RMSE for each epoch.

Figure 4.62: Evolution along epochs Figure 4.63: Evolution from epoch 100000

Figure 4.64: Neural Network RMSE

To see the evolution of the RMSE more clearly after the initial drop, Figure 4.63 shows the curve of Figure 4.62 from epoch 100000 onwards, between RMSE values of 0.0045 and 0.008.

The RMSE values are close to zero, but this does not by itself mean that the result is good. The curve descends steeply until epoch 8000 and from there continues decreasing, but at a much lower rate. Even so, the important fact is that it does not get stuck and the loss keeps falling.

Although from epoch 100000 the curve seems flat, it keeps decreasing continuously until the final epoch: at epoch 100000 the RMSE is 0.0074, at epoch 200000 it is 0.0059 and at epoch 300000 it is 0.0052, so the lowest RMSE value is 0.0052. Analysing these values, it can be concluded that this is not a very good final error. Taking two of the top films to recommend, it can be observed that the difference between them is approximately 0.01 on average. This means that the achieved RMSE is not small enough to distinguish between the top films and can confuse films in the range 0.993-1. Instead of recommending the top film, the recommender may therefore recommend other films that are not very far from the top ones.

After making the prediction on the validation set, a series of checks are made. First, the RMSE and the MAE are calculated; the results can be seen in Table 4.11.

Metric | Value (training set) | Value (validation set)
RMSE | 0.0052 | 0.0083
MAE | 0.0043 | 0.0077

Table 4.11: NN metrics results

These values will be compared with the rest of the solutions in the comparison section, but at first glance they are low MAE and MSE values. They are quite good, considering that 4021 values are being predicted, which is a complicated task.

The precision results for different values of K can be seen in Table 4.12. The results for the validation set are also in the table; since its size is 100, there is no value for K = 500.

The results show that the larger the range of the vector taken into account, the higher the percentage of coincidence between the prediction and the labels, since by taking more movies it is more likely to have the top recommendations among them. It also happens that the farther a recommendation is, the better it is predicted.

Number of elements | Precision (%) | Precision Validation Set (%)
10 | 2.2345 | 1.3643
30 | 5.7731 | 3.8979
50 | 8.4078 | 5.5763
100 | 16.649 | 10.0936
500 | 24.2458 | NA

Table 4.12: NN precision results

It can also be observed that the percentage of matching films among the 10 closest according to the labels and to the prediction is 2.1345%. This value may seem low, but it is very complicated for the network to predict numbers that are so close together (the top K distances differ by about 0.01 on average). The closest movies have very similar values, which means that even for the film in position 50 the distance to the predicted film is still very small; therefore, if the prediction proposes a film that is among the 50 closest labels, it will still be a good prediction. The precision for 50 movies is already 5.6763%, a better result. Even so, the following sections aim to improve these results using more complex deep learning network architectures.

4.5.2 Deep Neural Network prediction

Once the prediction and the metrics have been obtained, several prediction examples with trailers outside the dataset are checked. It has been observed that the furthest films, the least recommended, are the ones that are best predicted. For this reason, it has been checked how good the 30 closest ones are. To perform this check, the 30 nearest films according to the labels are compared with the 30 closest predictions of each film, and the number of coincidences between those two sets is counted, so a film does not need to be in the same position, only within that vector of 30 values.

This procedure has been carried out on all the films and the average number of hits has been computed. On average, 1.92 films appear among the first 30 films both in the distance labels and in the prediction.

Among the predictions, going back to the saga examples, for the movie "Harry Potter and the Deathly Hallows: Part 2" the film "Harry Potter and the Order of the Phoenix" is recommended in position 28, and for "Star Wars: Episode I - The Phantom Menace" the film "Star Wars: Episode VII - The Force Awakens" is recommended in position 9. For the "Star Trek" saga there was no case in which a film of the saga led to another film of the same saga being recommended.

Finally, the same prediction has been made with the three models so that their outputs can be subjectively compared. The prediction was made with the film "E.T. the Extra-Terrestrial" and the results for the NN are presented in Table 4.13.

Position | Name | Genres
1 | Florence Foster Jenkins | Biography, Comedy, Drama
2 | Gorillas in the Mist | Biography, Drama
3 | The Mummy: Tomb of the Dragon Emperor | Action, Adventure, Fantasy
4 | Planes: Fire & Rescue | Animation, Adventure, Comedy
5 | The We and the I | Drama
6 | The Greatest Song | Romance
7 | Dracula 2000 | Action, Fantasy, Horror
8 | Lara Croft Tomb Raider: The Cradle of Life | Action, Adventure, Fantasy
9 | Welcome to the Jungle | Action, Adventure, Comedy
10 | Saw IV | Horror, Mystery

Table 4.13: Neural Network E.T. recommendations

These recommendations do not seem very appropriate for a family movie such as "E.T.", although it can be guessed that the inclination has been towards adventure films, a genre that "E.T." also has. Even so, these recommendations should be improved, since the recommender engine suggests horror films for this movie, which are far from appropriate. This is a subjective analysis and depends on the personal tastes of the users, so, as for the distances, a poll has been conducted for the Deep Neural Network model. The results of these tests are presented in Section 4.8.

4.6 Autoencoder

As for the Deep Neural Network, the results are divided into training results and prediction results. For training, the evolution of the RMSE through the epochs is shown; for prediction, the precision is shown as an objective result and a recommendation with the trained model as a subjective result.

4.6.1 Autoencoder training

The RMSE gives an intuitive value of how the model is improving; the smaller, the better. Figure 4.65 shows the evolution of the RMSE for each epoch of the autoencoder training.

Figure 4.65: Evolution along epochs Figure 4.66: 100000-300000 epochs

Figure 4.67: Autoencoder RMSE

The training RMSE values are lower than those of the Deep Neural Network model: the lowest RMSE value for the Deep Neural Network was 0.00523375, while for the autoencoder it is 0.00007266.

In Figure 4.65 it seems as if the RMSE gets stuck around epoch 20000, so a zoom has been made to see what happens from there (Figure 4.66). In this graph it can be seen that, although the rate of decline is lower, the RMSE value continues to fall.

The final RMSE and MAE results of this training, for the training and validation sets, are shown in Table 4.14.

Compared with the Deep Neural Network, however, these final RMSE and MAE values are somewhat higher, so in terms of prediction error alone the autoencoder does not improve on it.

Next, the precision values of the predictions are checked against the labels. These results are shown in Table 4.15.

Metric | Value (training set) | Value (validation set)
RMSE | 0.008524 | 0.01203
MAE | 0.006154 | 0.00923

Table 4.14: Autoencoder metrics results

Number of elements | Precision (%) | Precision Validation Set (%)
10 | 3.6193 | 1.843
30 | 7.0569 | 4.8965
50 | 15.4523 | 10.5631
100 | 21.209 | 15.908
500 | 36.9785 | NA

Table 4.15: Autoencoder precision results

The precision results are better for both the training and validation sets. The improvement over the deep neural network is considerable, showing that this solution learns to predict the distances from the data more efficiently.

The average number of films that appear among the first 30 both in the distance labels and in the prediction is 0.94, lower than the 1.92 obtained by the simple neural network.

These objective results, the higher errors and the lower number of top-30 coincidences, suggest that on these measures the autoencoder is a worse solution than the simple neural network, even though its precision values are higher.

4.6.2 Autoencoder prediction

In the saga test, no case was found in which a movie from the "Harry Potter", "Star Wars" or "Star Trek" sagas was recommended when a movie of the same saga was introduced.

The film "E.T." is then tested, and the results are shown in Table 4.16.

Position | Name | Genres
1 | Planes: Fire & Rescue | Animation, Adventure, Comedy
2 | Tenderness | Crime, Drama, Thriller
3 | Back in the Day | Drama
4 | Dennis the Menace | Comedy, Family
5 | Lust for Love | Comedy, Romance
6 | Alice Through the Looking Glass | Adventure, Family, Fantasy
7 | The Killing Fields | Drama, History, War
8 | Florence Foster Jenkins | Biography, Comedy, Drama
9 | Bad Influence | Crime, Drama, Thriller
10 | Lara Croft Tomb Raider: The Cradle of Life | Action, Adventure, Fantasy

Table 4.16: Autoencoder E.T. recommendations

Despite the worse results in the objective analysis, this recommendation seems more accurate than the one made by the NN. The NN leaned towards adventure films, but those films, subjectively, were not good recommendations for a film like "E.T.". In this autoencoder recommendation, on the other hand, three of the films shown are clearly of interest to someone who has liked "E.T.": the film in position 1 ("Planes: Fire & Rescue"), the film in position 4 ("Dennis the Menace") and the film in position 6 ("Alice Through the Looking Glass"). The movie in position 10 ("Lara Croft Tomb Raider: The Cradle of Life") might also be of interest, but this one is less clear and more subjective.

4.7 Double autoencoder

In this section the results of the training and the prediction of the double autoencoder are shown.

4.7.1 Double autoencoder training

As in the previous sections, the evolution of the RMSE throughout the epochs is shown as a result of the training. In the double autoencoder, however, two models have been used, so the results of both models are shown: Figure 4.68 shows the RMSE of autoencoder 1 and Figure 4.71 the RMSE of autoencoder 2.

Figure 4.68: Evolution along epochs Figure 4.69: 100000-300000 epochs

Figure 4.70: Double autoencoder, first autoencoder RMSE

Figure 4.71: Evolution along epochs Figure 4.72: 100000-300000 epochs

Figure 4.73: Double autoencoder, second autoencoder RMSE

These two graphs do not allow the evolution of the RMSE to be seen clearly, although it can already be intuited that it is very low. Figure 4.69 and Figure 4.72 show the graphs from epoch 100000 to epoch 300000, as an example of how the RMSE continues to go down. For the first autoencoder it can be seen that the RMSE shows peaks that rise considerably, but globally the loss decreases. This may be because the optimisation algorithm is in a local minimum of the function and keeps jumping from one side to the other. In the second autoencoder the RMSE reaches very low values and does not get stuck. The RMSE values of the first autoencoder are even lower than those of the second: the lowest RMSE value for the first autoencoder is 2.6877e-06 and for the second autoencoder it is 9.5616e-05.

Table 4.17 shows the RMSE and MAE results for the training and validation sets. These results are lower than for the other networks, which makes sense because this network seeks to improve the result, and the decreasing losses are a sign of that improvement.

Metric | Value (training set) | Value (validation set)
RMSE | 0.00977 | 0.00756
MAE | 0.004951 | 0.008109

Table 4.17: Double autoencoder metrics results

The precision for different numbers of recommended elements has also been calculated. The results can be observed in Table 4.18.

Number of elements | Precision (%) | Precision Validation Set (%)
10 | 5.1907 | 3.929
30 | 9.1530 | 7.8451
50 | 18.7545 | 16.9081
100 | 29.0433 | 25.0904
500 | 57.3077 | NA

Table 4.18: Double autoencoder precision results

The precision values increase again, which represents an improvement in the results. The first autoencoder learnt to predict the labels more efficiently than the previously proposed solutions, and the second autoencoder is able to learn the encoded version of the data, improving the final results by more than 50% with respect to the previous autoencoder.
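The layer sizes, activations and epoch counts below are not taken from this work; they are illustrative values in a minimal Keras-style sketch of the two-stage idea just described, in which the first autoencoder reconstructs the distance labels and the second network maps the embeddings onto the codes produced by the first encoder:

    from tensorflow.keras import layers, models

    # distance_labels: (4021, 4021) similarity matrix used as labels
    # embeddings:      (4021, 2048) trailer embeddings
    EMB_DIM, DIST_DIM, CODE_DIM = 2048, 4021, 512   # CODE_DIM is an assumption

    # First autoencoder: learns to reconstruct the distance label vectors.
    dist_in = layers.Input(shape=(DIST_DIM,))
    code = layers.Dense(CODE_DIM, activation="relu")(dist_in)
    dist_out = layers.Dense(DIST_DIM, activation="sigmoid")(code)
    ae1 = models.Model(dist_in, dist_out)
    ae1.compile(optimizer="adam", loss="mse")
    ae1.fit(distance_labels, distance_labels, epochs=1000, batch_size=64)  # epoch count illustrative

    # Targets for the second stage: the encoded version of the labels.
    encoder1 = models.Model(dist_in, code)
    codes = encoder1.predict(distance_labels)

    # Second network: maps the embeddings to those codes.
    emb_in = layers.Input(shape=(EMB_DIM,))
    hidden = layers.Dense(1024, activation="relu")(emb_in)
    code_pred = layers.Dense(CODE_DIM, activation="relu")(hidden)
    ae2 = models.Model(emb_in, code_pred)
    ae2.compile(optimizer="adam", loss="mse")
    ae2.fit(embeddings, codes, epochs=1000, batch_size=64)

    # Prediction: embedding -> code -> decoder of the first autoencoder -> distances.
    decoder_in = layers.Input(shape=(CODE_DIM,))
    decoder1 = models.Model(decoder_in, ae1.layers[-1](decoder_in))
    predicted_distances = decoder1.predict(ae2.predict(embeddings))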

The average number of films that appear among the first 30 both in the distance labels and in the prediction is 2.45.

4.7.2 Double autoencoder prediction

Predictions are then made for several films as a first subjective test, returning to the saga technique.

On this occasion the model recommends movies from the "Star Trek" saga when one of them is given as input, which represents an improvement in the recommendations. For the movie "Star Trek: Nemesis", "Star Trek III: The Search for Spock" is recommended in position 22, and for "Star Trek Beyond" it also recommends "Star Trek III: The Search for Spock". The model is now able to recommend films of this saga for at least two of its movies, a clear improvement in the recommendations.

As has been done with the other networks, the recommendations for the film "E.T. the Extra-Terrestrial" are shown in Table 4.19.

Position | Name | Genres
1 | He's Way More Famous Than You | Comedy
2 | Flawless | Comedy, Crime, Drama
3 | Electrick Children | Drama
4 | Grown Ups 2 | Comedy
5 | Robots | Animation, Adventure, Comedy
6 | Kill List | Crime, Horror, Thriller
7 | The Spy Next Door | Action, Comedy, Family
8 | The Thaw | Horror, SciFi, Thriller
9 | Black Robe | Adventure, Drama, History
10 | Ask the Dust | Drama, Romance

Table 4.19: Double autoencoder E.T recommendations

It can be seen that the recommendations have improved greatly compared with those of the NN; the engine now offers films more in keeping with the "E.T." movie.

It can also be considered that the recommendation has improved with respect to the autoencoder: the autoencoder produced good recommendations, but only three of the ten. In the double autoencoder, on the other hand, the bad recommendations are the exceptions, such as the 6th ("Kill List"), the 8th ("The Thaw") and the 10th ("Ask the Dust"). They are considered bad for not being family movies, but perhaps the person receiving the recommendations did not like "E.T." only because it is a family movie, in which case a film like "Ask the Dust" could still be of interest. It is necessary to analyse the surveys to see whether the users share this opinion.

4.8 Subjective comparison between solutions

The comparison of the "E.T." recommendations shows a progressive improvement from using only the distances to using the double autoencoder.

The recommendation based only on the distances is not very accurate: there is no pattern in it, the films are not subjectively recommendable and genre does not seem to be taken into account enough.

The recommendation of the Deep Neural Network is not very adequate either. It slightly improves on just using the distances, since it includes more movies sharing a genre with "E.T.", but the selection can still be improved.

The autoencoder recommendation is already a great improvement over the previous ones: of the ten films, four can be considered good recommendations. This network has learnt some complex patterns in the data that relate the films better and assign a high similarity value to these films.

Finally, the recommendation of the double autoencoder is the best of the four: seven of the ten films are clearly good recommendations and the other three will depend on the user's tastes, because they are not so accurate.

In order to evaluate user behaviour and tastes regarding the recommended films, a subjective test using surveys has been conducted. In this way, very interesting information about the trained models can be collected and analysed.

4.8.1 Surveys

In this section the recommendations for five films have been extracted using four of the techniques presented in this work: the Euclidean distances, the Artificial Neural Network, the autoencoder and the double autoencoder. The cosine distances have not been used, since their results are very similar to those of the Euclidean distances, as previously verified. For each of these techniques, first the predictions and then the results of the surveys are shown.

The five films fed into the four recommender engines were "Contratiempo", "Dolor y Gloria", "La Tribu", "Star Wars" and "Titanic". The top 10 recommended films are presented in Appendix C and the survey template in Appendix D. Twenty participants completed the survey for the four models used.

4.8.1.1 Euclidean distance surveys

The recommendations for the film "Contratiempo" can be seen in Table C.1, for "Dolor y Gloria" in Table C.2, for "La Tribu" in Table C.3, for "Star Wars" in Table C.4 and for "Titanic" in Table C.5.

The average score of the film recommendations for each film is:

• ”Contratiempo”: 2.1

• ”Dolor y Gloria”: 1.5

• ”La Tribu”: 1.8

• ”Star Wars”: 1.5

• ”Titanic”:1.1

For the first question, "Assuming you liked the input movie, would you decide to see any of the 10 recommended ones?", the average score is 2.

For the second question, "Would you use the recommendation system again?", the average score is 1.8.

For the third question, "Would you recommend the recommendation system to a friend or family member?", the average score is 1.5.

And the average of the overall score answers is 1.8.

4.8.1.2 Artificial Neural Network surveys

The recommendations for the film "Contratiempo" can be seen in Table C.6, for "Dolor y Gloria" in Table C.7, for "La Tribu" in Table C.8, for "Star Wars" in Table C.9 and for "Titanic" in Table C.10.

The average score of the film recommendations for each film is:

• ”Contratiempo”: 1.8

• ”Dolor y Gloria”: 1.4

• ”La Tribu”: 1.7

• ”Star Wars”: 1.2

• ”Titanic”: 1.6

For the first question, "Assuming you liked the input movie, would you decide to see any of the 10 recommended ones?", the average score is 1.8.

For the second question, "Would you use the recommendation system again?", the average score is 1.7.

For the third question, "Would you recommend the recommendation system to a friend or family member?", the average score is 1.3.

And the average of the overall score answers is 1.6.

4.8.1.3 Autoencoder surveys

The recommendations for the film "Contratiempo" can be seen in Table C.11, for "Dolor y Gloria" in Table C.12, for "La Tribu" in Table C.13, for "Star Wars" in Table C.14 and for "Titanic" in Table C.15.

The average score of the film recommendations for each film is:

• ”Contratiempo”: 2.9

• ”Dolor y Gloria”: 2.2

• ”La Tribu”: 1.8

• ”Star Wars”: 1.5

• ”Titanic”: 2

For the first question ”Assuming you liked the input movie, would you decide to see any of the 10 recommended ones?” the average score is 2.3.

For the second question ”Would you use the recommendation system again?” the average score is 2.3.

For the third question, "Would you recommend the recommendation system to a friend or family member?", the average score is 1.7.

And the average of the overall score answers is 2.5.

4.8.1.4 Double Autoencoder surveys

The recommendations for the film "Contratiempo" can be seen in Table C.16, for "Dolor y Gloria" in Table C.17, for "La Tribu" in Table C.18, for "Star Wars" in Table C.19 and for "Titanic" in Table C.20.

The average score of the film recommendations for each film is:

• ”Contratiempo”: 3.3

• ”Dolor y Gloria”: 2.5

• ”La Tribu”: 2

• ”Star Wars”: 3

• "Titanic": 1.8

For the first question ”Assuming you liked the input movie, would you decide to see any of the 10 recommended ones?” the average score is 2.7.

For the second question ”Would you use the recommendation system again?” the average score is 2.5.

For the third question, "Would you recommend the recommendation system to a friend or family member?", the average score is 2.3.

And the average of the overall score answers is 2.8.

4.8.1.5 Surveys analysis

Regarding the results of the surveys, the average overall scores are analysed. The recommender using the Euclidean distances directly obtains a score of 1.8, the neural network recommender achieves 1.6, the autoencoder reaches 2.5 and, finally, the double autoencoder obtains 2.8.

It is very interesting to analyse these results. The first surprising point is that the distance-based recommender obtains a higher score than the deep neural network: users prefer the recommendations of the distance-based recommender, although the overall result is not good enough for either of them.

The autoencoder's overall score is 2.5, half of the maximum score. This result shows that this recommender probably works, but not all of the recommended films are completely interesting for the users.

Finally, the double autoencoder achieves the highest value, with an overall score of 2.8. This model predicts better distances, so closer films are recommended. As with the simple autoencoder, probably not all the films are interesting enough for the users, but this recommender breaks away from relying on the rating scores of the films and the number of user votes.

Chapter 5

Conclusions and future lines

5.1 Conclusions

In this work a movie recommender based on visual content analysis has been built, using mainly image processing, computer vision, machine learning and deep learning techniques. Throughout this document, all the processes developed to create the recommender have been described.

In the first phase, four features have been extracted to describe the films in the dataset and the validity of all four has been demonstrated. In particular, the analysis of the results has focused on demonstrating not only that the extraction of the features was correct but also that they were descriptive of the films. For example, the action detector is shown to find the actions of a trailer precisely, together with the percentage of time each appears, and it also shows that similar films tend to share a series of recurring actions, such as kissing in romantic movies. This happens for all four features: they are extracted with high accuracy and they are also able to describe the movies, so their validity is demonstrated by analysing their appearance in the films.

The embedding results are also very interesting. After 500 epochs, the network reaches the maximum accuracy and the minimum loss; with this result, a film genre classifier based on trailers with 100% training accuracy has been obtained. This embedding is able to translate the input data into another subspace that efficiently relates these features to the genres of the films.

The recommender has three versions, each using a different deep learning network. The first is a sequential neural network with three layers, the second is an autoencoder and the third is a double autoencoder. The first proposed network tries to use basic deep learning building blocks to perform a regression over the data proposed as labels. The effectiveness of this architecture when working with proper input data has been demonstrated, but it is also shown that more complex networks learn more efficiently than a simple artificial neural network. Autoencoder architectures are a common solution for problems related to generating data of the same nature or of different modalities. In this work, a first approach takes advantage of an autoencoder that encodes the embedding vectors and decodes the data into a different modality; the results show a great improvement in comparison with the artificial neural network. Finally, a double autoencoder takes advantage of the combination of two separate networks: the first one learns to reproduce the labels used as inputs, and the second one fits the embeddings to the encoded input of the first autoencoder's decoder. The success of this approach is shown in the results, which indicate that combining differentiated learning processes is better for learning across modalities than learning directly from the two types of data.

Both the objective and the subjective results of the recommendations show that the best option is the double autoencoder: the MSE and MAE values are lower, the precision is higher and the recommendations are subjectively better.

It is also important to note that the most widely used technique for recommending multimedia content is collaborative filtering, whereas content-based filtering has been used in this work; the results show that recommendations made without using user feedback can still yield an interesting final result. Collaborative filtering tends to recommend popular movies or very "obvious" items; these recommendations encourage the consumption of such content, which pushes their rankings even further, so popular movies are boosted and unpopular movies are hidden. The content-based film recommender developed here does not use the user experience to make the recommendation, only the content of the films. This method is not widely used and could be a good option to incorporate into the recommendation systems of video-on-demand services, improving their recommendations and making it possible to recommend any film in the catalogue.

5.2 Future lines

As future lines, a series of possible improvements to the recommender are proposed. The first is to train the recommender with another movie dataset; this dataset needs to have genre labels, otherwise they will have to be created manually. The second possible improvement is to incorporate emotion analysis into the feature vector, creating an emotion analyser with higher accuracy that is able to detect emotions without them having to be exaggerated. Another option is to try to improve the results of the simple neural network, although the results have already been improved with the double autoencoder network.

Other approaches could be the combination of several datasets to build a much larger film database, and a direct comparison with a collaborative filtering algorithm to see the potential improvements of combining both. Furthermore, more features could be used to describe the films properly. Finally, convolutional or recurrent neural networks could be applied directly to the content in order to extract features automatically and learn potential relations between the data over time.

Bibliography

[1] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell, “Long-term recurrent convolutional networks for visual recognition and description,” IEEE Trans. Pattern Anal. Mach. Intell., 2017.

[2] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recogni- tion in videos,” in Advances in Neural Information Processing Systems 27, 2014.

[3] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” CoRR, 2015.

[4] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C. Berg, “SSD: single shot multibox detector,” 2015.

[5] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Doll´ar,“Focal loss for dense object detection,” 2017.

[6] K. He, G. Gkioxari, P. Doll´ar, and R. B. Girshick, “Mask R-CNN,” CoRR, vol. abs/1703.06870, 2017.

[7] VVAA, “Think analytics.” [Web; accessed 13-05-2019].

[8] VVAA, “Gravity r&d.” [Web; accessed 13-05-2019].

[9] VVAA, “Recombee.” [Web; accessed 13-05-2019].

[10] “Netflix.” https://help.netflix.com/en/node/100639.

[11] S. Ciocca, “How does spotify know you so well?.” [Web; accessed 13-05-2019].


[12] G. S. Simões, J. Wehrmann, R. C. Barros, and D. D. Ruiz, “Labeled movie trailer dataset.” https://github.com/jwehrmann/lmtd, 2018.

[13] P. Covington, J. Adams, and E. Sargin, “Deep neural networks for youtube recommen- dations,” 2016.

[14] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, R. Anil, Z. Haque, L. Hong, V. Jain, X. Liu, and H. Shah, “Wide & deep learning for recommender systems,” in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–10, ACM, 2016.

[15] S. Okura, Y. Tagami, S. Ono, and A. Tajima, “Embedding-based news recommendation for millions of users,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 7–10, ACM, 2017.

[16] M. M¨uhling, M. Meister, N. Korfhage, J. Wehling, A. H¨orth, R. Ewerth, and B. Freisleben, “Content-based video retrieval in historical collections of the german broad- casting archive,” International Journal on Digital Libraries, 2018.

[17] M. M¨uhling,N. Korfhage, E. M¨uller,C. Otto, M. Springstein, T. Langelage, U. Veith, R. Ewerth, and B. Freisleben, “Deep learning for content-based video retrieval in film and television production,” Multimedia Tools and Applications, 2017.

[18] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large- scale video classification with convolutional neural networks,” in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.

[19] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3d convolutional networks,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), 2015.

[20] L. Yao, A. Torabi, K. Cho, N. Ballas, C. J. Pal, H. Larochelle, and A. C. Courville, “De- scribing videos by exploiting temporal structure,” 2015 IEEE International Conference on Computer Vision (ICCV), 2015.

[21] C. Feichtenhofer, A. Pinz, and A. Zisserman, “Convolutional two-stream network fusion for video action recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[22] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. V. Gool, “Temporal segment networks: Towards good practices for deep action recognition,” in ECCV, 2016.

[23] R. Girdhar, D. Ramanan, A. Gupta, J. Sivic, and B. C. Russell, “Actionvlad: Learning spatio-temporal aggregation for action classification,” CoRR, 2017.

[24] Y. Zhu, Z. Lan, S. Newsam, and A. Hauptmann, “Hidden two-stream convolutional networks for action recognition,” 04 2017.

[25] J. Carreira and A. Zisserman, “Quo vadis, action recognition? a new model and the kinetics dataset,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4724–4733, 2017.

[26] A. Diba, M. Fayyaz, V. Sharma, A. Hossein Karami, M. Mahdi Arzani, L. Van Gool, and R. Yousefzadeh, “Temporal 3d convnets: New architecture and transfer learning for video classification,” 2017.

[27] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accu- rate object detection and semantic segmentation,” 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587, 2014.

[28] R. Girshick, “Fast r-cnn,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448, IEEE Computer Society, 2015.

[29] F. Bu, Y. Cai, and Y. Yang, “Multiple object tracking based on faster-rcnn detector and kcf tracker,” 2016.

[30] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems 28, 2015.

[31] A. Shrivastava, A. Gupta, and R. B. Girshick, “Training region-based object detectors with online hard example mining,” CoRR, 2016.

[32] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: object detection via region-based fully convo- lutional networks,” 2016.

[33] J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” CoRR, vol. abs/1612.08242, 2016.

[34] S.-H. Tsang, “Review: Yolov2 & yolo9000.” https://towardsdatascience.com/review- yolov2-yolo9000-you-only-look-once-object-detection-7883d2b02a65.

[35] T. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, “Feature pyramid networks for object detection,” 2016.

[36] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” 2018.

[37] S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li, “Single-shot refinement neural network for object detection,” CoRR, vol. abs/1711.06897, 2017.

[38] Z. Zhao, P. Zheng, S. Xu, and X. Wu, “Object detection with deep learning: A review,” CoRR, vol. abs/1807.05511, 2018.

[39] P. Mehta, “What is the difference between back-propagation and forward-propagation?.” [Web; accessed 09-06-2019].

[40] P. Goyal, “What is the difference between precision and recall?.” [Web; accessed 10-06- 2019].

[41] W. Koehrsen, “Beyond accuracy: Precision and recall.” [Web; accessed 10-06-2019].

[42] A. Kosir, A. Odi, M. Kunaver, M. Tkalˇciˇc,and J. Tasic, “Database for contextual per- sonalization,” 2011.

[43] T. Bertin-Mahieux, D. P. W. Ellis, B. Whitman, and P. Lamere, “The million song dataset.,” 2011.

[44] D. Hauger, M. Schedl, A. Kosir, and M. Tkalcic, “The million musical tweet dataset - what we can learn from microblogs,” 2013.

[45] M. Schedl, “The lfm-1b dataset for music retrieval and recommendation,” in Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016.

[46] F. M. Harper and J. A. Konstan, “The movielens datasets: History and context,” 2015.

[47] Y. Deldjoo, M. G. Constantin, B. Ionescu, M. Schedl, and P. Cremonesi, “Mmtf-14k: A multifaceted movie trailer feature dataset for recommendation and retrieval,” in Proceed- ings of the 9th ACM Multimedia Systems Conference, 2018.

[48] M. Patacchiola, “The simplest classifier: Histogram comparison.”

[49] A. Sharma and A. K. Singh, “Color difference histogram for feature extraction in video retrieval,” 2015.

[50] K. J. K. Sivaraman and G. Somappa, “Moviescope: Movie trailer classification using deep neural networks,” 2017.

[51] Z. Rasheed, Y. Sheikh, and M. Shah, “On the use of computable features for film classi- fication,” IEEE Transactions on Circuits and Systems for Video Technology, 2005.

[52] W.-T. Chu and H.-J. Guo, “Movie genre classification based on poster images with deep neural networks,” in Proceedings of the Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes, 2017.

[53] S.-M. Choi, S.-K. Ko, and Y.-S. Han, “A movie recommendation algorithm based on genre correlations,” Expert Systems with Applications, 2012.

[54] C. Basu, H. Hirsh, and W. Cohen, “Recommendation as classification: Using social and content-based information in recommendation,” Proceedings of AAAI-98, 2000.

[55] K. Wakil, R. Bakhtyar, K. Ali, and K. Alaadin, “Improving web movie recommender system based on emotions,” International Journal of Advanced Computer Science and Applications, 2015.

[56] A. Ullah, J. Ahmad, K. Muhammad, I. Mehmood, M. Lee, J. Ryeol Park, and S. Baik, “Action recognition in movie scenes using deep features of keyframes,” Journal of the Korean Institute of Next Generation Computing, 2017.

[57] D. Decarlo and D. Metaxas, “Optical flow constraints on deformable models with appli- cations to face tracking,” International Journal of Computer Vision, 2000.

[58] M. Menze and A. Geiger, “Object scene flow for autonomous vehicles,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.

[59] A. Dosovitskiy, P. Fischer, E. Ilg, P. H¨ausser,C. Hazırba¸s,V. Golkov, P. v.d. Smagt, D. Cremers, and T. Brox, “Flownet: Learning optical flow with convolutional networks,” in IEEE International Conference on Computer Vision (ICCV), 2015.

[60] M. Monfort, A. Andonian, B. Zhou, K. Ramakrishnan, S. A. Bargal, T. Yan, L. Brown, Q. Fan, D. Gutfruend, C. Vondrick, et al., “Moments in time dataset: one million videos for event understanding,” IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 2019.

[61] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, 2012.

[62] A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, S. Kamali, S. Popov, M. Malloci, T. Duerig, and V. Ferrari, “The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale,” 2018.

[63] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisser- man, “The pascal visual object classes challenge: A retrospective,” International Journal of Computer Vision, 2015.

[64] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll´ar,and C. L. Zitnick, “Microsoft coco: Common objects in context,” in European conference on computer vision, 2014.

[65] G. Farneb¨ack, “Two-frame motion estimation based on polynomial expansion,” in Pro- ceedings of the 13th Scandinavian Conference on Image Analysis, Springer-Verlag, 2003.

[66] L. Hardinata, B. Warsito, and Suparti, “Bankruptcy prediction based on financial ratios using jordan recurrent neural networks: a case study in polish companies,” Journal of Physics: Conference Series, vol. 1025, 2018.

[67] X. Chen, D. P. Kingma, T. Salimans, Y. Duan, P. Dhariwal, J. Schulman, I. Sutskever, and P. Abbeel, “Variational lossy autoencoder,” arXiv preprint arXiv:1611.02731, 2016.

[68] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” 2006.

[69] C. Hong, J. Yu, J. Wan, D. Tao, and M. Wang, “Multimodal deep autoencoder for human pose recovery,” IEEE Transactions on Image Processing, 2015.

[70] “Giphy.” [Web; accessed 09-06-2019].

[71] G. Shani and A. Gunawardana, “Evaluating recommender systems,” tech. rep., 2009.

[72] C. Pinela, “How to evaluate recommender systems.”

[73] E. D. Portal, “Analytical report 3: Open data and privacy.” [Web; accessed 09-06-2019].

Appendices

Appendix A

Ethical, social, economic and environmental aspects

A.1 Introduction

The recommendation of multimedia content is a very popular topic today. For services that offer multimedia content on demand, using a good recommender rather than a bad one makes a big difference. However, these recommendation systems entail a series of risks related to the collection of user information, which are explained in this annex. It is also important to take into consideration the economic impact of a recommendation system, both in its development and in its implementation and use. Finally, the environmental aspects involved in implementing this type of system are analysed.

A.2 Description of relevant impacts related to the project

A.2.1 Ethical impact

The main controversy and debate that this type of system can generate concerns privacy, since personal and private information is gathered from individuals. For the treatment and use of these data, the European Data Portal [73] has published a report that provides some guidelines to follow when working with sensitive data:

• Understand the information to be able to consider potential risks.

• Treat the information anonymously.

• Do not publish the information without consent.

• Collect non-discriminatory information. The information collected from people should not have a bias that depends on sex, race, clothing, disabilities, etc.

The advantage of the proposed recommendation system is that it does not need much information from the user, only the movies for which the user wants recommendations.

Another ethical aspect to take into consideration is that the algorithms absorb and learn what the people who program them teach them. In fact, they can promote the prejudices established in society because they learn the information that humans show them. Therefore, biased training datasets can lead to biased results and thus perpetuate a set of biases.

On the other hand, a recommendation system could be used maliciously to induce users to watch the movies that its operator wants them to see. For example, an advertising company could pay for a service in which the recommender steers people towards movies related to its product: if a soda brand paid for its product to appear in certain movies, it could contract the service to push users towards those movies. Similarly, by inducing users to watch films unrelated to what would genuinely be recommended, the system could be used for political propaganda.

Finally, it should be noted that users may feel discomfort or mistrust upon becoming aware that they are being induced to watch certain movies, possibly for malicious purposes.

A.2.2 Social impact

The social impact of a film recommender is not very large. It should be noted, however, that with a good recommender people spend more of their leisure time actually doing activities they like, instead of wasting time searching for a movie to watch, since they already know which one.

It also makes it easier for users to discover new tastes, since a recommendation may be both new and surprising, that is, something the user did not expect but ends up liking.

A.2.3 Economic impact

The economic cost of the recommendation system is very low, while the economic benefits it brings are high. A video-on-demand service can have a very extensive catalogue, and selecting a film would be very laborious unless the user knew exactly what to watch; users would then be unhappy with the service and unable to exploit its full potential. Multimedia recommenders are a basic instrument for allowing users to find the content they want to watch, and if users find that content, the video-on-demand service becomes a useful service for them.

A.2.4 Environmental impact

Analysing the environmental impact, this project has hardly any environmental consequences, since it does not require a large amount of material or energy. The only requirements are a computer and, if processes are to be sped up, a server, together with the energy needed to run those tools. Therefore, from the point of view of pollution, it does not significantly harm the environment.

A.3 Conclusions

Taking into account the ethical, social, economic and environmental repercussions of using movie recommendations based on visual content analysis with deep learning techniques, it can be concluded that, when used correctly, they have a positive impact on improving people's quality of life. It is also a low-cost technology that does not negatively impact the environment.

Appendix B

Economic budget

Figure B.1: TFM budget


The cost of labour for the completion of the project by a qualified technician has been estimated at €26.75 gross per hour. This cost is evaluated according to the market rates currently being paid to this type of personnel. To obtain the net cost, the industrial benefit applied to the direct cost must be deducted, which represents 15% plus 6% applied twice, also on the direct cost. Deducting a total of 27% from €26.75/hour, a direct labour cost of €19.53/hour is obtained.
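Interpreting these deductions additively (15% plus 6% applied twice), the net rate follows directly:

    26.75 \times \bigl(1 - (0.15 + 0.06 + 0.06)\bigr) = 26.75 \times 0.73 \approx 19.53 \text{ €/hour}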

Survey results

C.1 Euclidean distance recommendations

Position | Name | Genres
1 | Road Trip | [Comedy]
2 | Two Night Stand | [Comedy, Romance]
3 | State of Play | [Crime, Drama, Mystery]
4 | Bad Girls from Valley High | [Comedy]
5 | Desperately Seeking Susan | [Comedy, Drama]
6 | In Secret | [Crime, Drama, Thriller]
7 | Red Tails | [Action, Adventure, Drama]
8 | The Yards | [Crime, Drama, Romance]
9 | Daddy's Home | [Comedy]
10 | Bubba Ho-Tep | [Comedy, Fantasy, Mystery]

Table C.1: Euclidean distances ”Contratiempo” recommendations


Position | Name | Genres
1 | Walking with Dinosaurs 3D | [Animation, Adventure, Family]
2 | Ghoulies | [Comedy, Fantasy, Horror]
3 | Night Shift | [Comedy]
4 | G-Force | [Action, Adventure, Comedy]
5 | Iron Man 2 | [Action, Adventure, SciFi]
6 | Greetings from Tim Buckley | [Drama]
7 | A Royal Night Out | [Comedy, Drama, Romance]
8 | Last Action Hero | [Action, Adventure, Comedy]
9 | Flawless | [Comedy, Crime, Drama]
10 | Back in the Day | [Drama]

Table C.2: Euclidean distances "Dolor y Gloria" recommendations

Position | Name | Genres
1 | Million Dollar Baby | [Drama, Sport]
2 | Signs | [Drama, SciFi, Thriller]
3 | Love & Mercy | [Biography, Drama, Music]
4 | The Hot Chick | [Comedy, Fantasy]
5 | Inglourious Basterds | [Adventure, Drama, War]
6 | The Wedding Ringer | [Comedy]
7 | The Greatest Game Ever Played | [Drama, History, Sport]
8 | Eat Your Heart Out | [Comedy, Drama]
9 | Return to Me | [Comedy, Drama, Romance]
10 | Mental | [Comedy, Drama]

Table C.3: Euclidean distances "La Tribu" recommendations

Position | Name | Genres
1 | Enchanted April | [Drama]
2 | Jack the Giant Slayer | [Adventure, Fantasy]
3 | A Dennis the Menace Christmas | [Comedy, Family, Fantasy]
4 | A Million Ways to Die in the West | [Comedy, Western]
5 | Child’s Play | [Fantasy, Horror]
6 | The Calling | [Thriller]
7 | The Big Green | [Comedy, Family, Sport]
8 | Wanderlust | [Comedy]
9 | Bob Roberts | [Comedy]
10 | Minions | [Animation, Comedy, Family]

Table C.4: Euclidean distances "Star Wars" recommendations

Position | Name | Genres
1 | A Time to Kill | [Crime, Drama, Thriller]
2 | Chinese Zodiac | [Action, Adventure]
3 | Balto | [Animation, Adventure, Drama]
4 | White Noise 2: The Light | [Drama, Horror, Thriller]
5 | 2 | [Animation, Adventure, Family]
6 | Rush Hour | [Action, Comedy, Crime]
7 | The Breakfast Club | [Comedy, Drama]
8 | The Three Stooges | [Comedy]
9 | United 93 | [Action, Crime, Drama]
10 | Terminus | [SciFi]

Table C.5: Euclidean distances "Titanic" recommendations

C.2 Artificial Neural Network recommendations

Position | Name | Genres
1 | Ernest & Celestine | [Animation, Comedy, Crime]
2 | Pathology | [Crime, Horror, Thriller]
3 | Freejack | [Action, Crime, SciFi]
4 | Pride & Prejudice | [Drama, Romance]
5 | Bella | [Drama, Romance]
6 | Bandidas | [Action, Comedy, Crime]
7 | Awakened | [Drama, Mystery, Thriller]
8 | Asunder | [Thriller]
9 | The Pact | [Horror, Mystery, Thriller]
10 | Cherish | [Comedy, Drama, Thriller]

Table C.6: Artificial Neural Network "Contratiempo" recommendations

Position | Name | Genres
1 | Cinderella | [Drama, Family, Fantasy]
2 | The Replacements | [Comedy, Sport]
3 | The Golden Compass | [Adventure, Family, Fantasy]
4 | Freejack | [Action, Crime, SciFi]
5 | Warrior | [Drama, Sport]
6 | The Lovers | [Action, Adventure, Romance]
7 | Pleasantville | [Comedy, Drama, Fantasy]
8 | Valley | [Drama]
9 | Gimme Shelter | [Drama]
10 | Cherish | [Comedy, Drama, Thriller]

Table C.7: Artificial Neural Network "Dolor y Gloria" recommendations

Position | Name | Genres
1 | Ernest & Celestine | [Animation, Comedy, Crime]
2 | The Replacements | [Comedy, Sport]
3 | Pleasantville | [Comedy, Drama, Fantasy]
4 | Bandidas | [Action, Comedy, Crime]
5 | The Golden Compass | [Adventure, Family, Fantasy]
6 | Before Night Falls | [Biography, Drama]
7 | Cherish | [Comedy, Drama, Thriller]
8 | Syrup | [Comedy, Drama, Romance]
9 | Crazy Eyes | [Comedy]
10 | Saving Private Perez | [Adventure, Comedy, Western]

Table C.8: Artificial Neural Network "La Tribu" recommendations

Position | Name | Genres
1 | Pathology | [Crime, Horror, Thriller]
2 | Cinderella | [Drama, Family, Fantasy]
3 | Freejack | [Action, Crime, SciFi]
4 | Bandidas | [Action, Comedy, Crime]
5 | Before Night Falls | [Biography, Drama]
6 | Detention | [Comedy, Horror, SciFi]
7 | Cherish | [Comedy, Drama, Thriller]
8 | Saving Private Perez | [Adventure, Comedy, Western]
9 | The Pact | [Horror, Mystery, Thriller]
10 | The Eye of the Storm | [Drama]

Table C.9: Artificial Neural Network "Star Wars" recommendations

Position | Name | Genres
1 | Cinderella | [Drama, Family, Fantasy]
2 | Bella | [Drama, Romance]
3 | Pride & Prejudice | [Drama, Romance]
4 | Detention | [Comedy, Horror, SciFi]
5 | Awakened | [Drama, Mystery, Thriller]
6 | Cherish | [Comedy, Drama, Thriller]
7 | The Hole | [Short, Drama]
8 | Syrup | [Comedy, Drama, Romance]
9 | Pleasantville | [Comedy, Drama, Fantasy]
10 | The Pact | [Horror, Mystery, Thriller]

Table C.10: Artificial Neural Network "Titanic" recommendations

C.3 Autoencoder recommendations

Position | Name | Genres
1 | Red Corner | [Crime, Drama, Thriller]
2 | The Story of Luke | [Comedy, Drama]
3 | El Gringo | [Action, Drama]
4 | Blitz | [Action, Crime, Thriller]
5 | Hard Target | [Action, Thriller]
6 | Abraham Lincoln: Vampire Hunter | [Action, Fantasy, Horror]
7 | Snowden | [Biography, Drama, Thriller]
8 | Paul Blart: Mall Cop | [Action, Comedy, Crime]
9 | Signs | [Drama, SciFi, Thriller]
10 | Knights of Badassdom | [Adventure, Comedy, Fantasy]

Table C.11: Autoencoder "Contratiempo" recommendations

Position | Name | Genres
1 | A Perfect Day | [Drama]
2 | Anna Karenina | [Drama, Romance]
3 | Charlie Wilson’s War | [Biography, Comedy, Drama]
4 | Sex & Drugs & Rock & Roll | [Biography, Drama, Music]
5 | The Divergent Series: Allegiant | [Action, Adventure, SciFi]
6 | War Story | [Drama]
7 | Gimme Shelter | [Drama]
8 | Love Affair | [Comedy, Drama]
9 | Barb Wire | [Action, SciFi]
10 | American Pie 2 | [Comedy, Romance]

Table C.12: Autoencoder "Dolor y Gloria" recommendations

Position | Name | Genres
1 | The Story of Luke | [Comedy, Drama]
2 | El Gringo | [Action, Drama]
3 | Knights of Badassdom | [Adventure, Comedy, Fantasy]
4 | A Guy Thing | [Comedy, Romance]
5 | College Road Trip | [Adventure, Comedy, Drama]
6 | Hard Target | [Action, Thriller]
7 | Paul | [Adventure, Comedy, SciFi]
8 | The Amazing Panda Adventure | [Family, Adventure, Drama]
9 | Paul Blart: Mall Cop | [Action, Comedy, Crime]
10 | Semi-Pro | [Comedy, Sport]

Table C.13: Autoencoder "La Tribu" recommendations

Position | Name | Genres
1 | El Gringo | [Action, Drama]
2 | Knights of Badassdom | [Adventure, Comedy, Fantasy]
3 | College Road Trip | [Adventure, Comedy, Drama]
4 | Hard Target | [Action, Thriller]
5 | The Hunger | [Fantasy, Horror, Romance]
6 | The Amazing Panda Adventure | [Family, Adventure, Drama]
7 | Paul | [Adventure, Comedy, SciFi]
8 | Paul Blart: Mall Cop | [Action, Comedy, Crime]
9 | Blitz | [Action, Crime, Thriller]
10 | Hamlet | [Drama, Romance, Thriller]

Table C.14: Autoencoder "Star Wars" recommendations

Position | Name | Genres
1 | Magic Valley | [Drama]
2 | The End of Love | [Drama]
3 | College Road Trip | [Adventure, Comedy, Drama]
4 | The Story of Luke | [Comedy, Drama]
5 | For a Good Time, Call... | [Comedy]
6 | Starbuck | [Comedy, Drama]
7 | Frank | [Comedy, Drama, Music]
8 | Cast Away | [Adventure, Drama, Romance]
9 | Generation Iron | [Documentary, Drama, Sport]
10 | Ben-Hur | [Adventure, Drama, History]

Table C.15: Autoencoder "Titanic" recommendations

C.4 Double Autoencoder recommendations

Position | Name | Genres
1 | The Crazies | [Horror, Mystery, Thriller]
2 | By the Gun | [Crime, Drama, Thriller]
3 | The Hateful Eight | [Crime, Drama, Mystery]
4 | Haunt | [Horror, Mystery]
5 | Chicago | [Comedy, Crime, Musical]
6 | Thrashin’ | [Action, Drama]
7 | Saw III | [Horror, Mystery]
8 | Interstellar | [Adventure, Drama, SciFi]
9 | Lincoln | [Biography, Drama, History]
10 | Liberty Heights | [Drama, Music, Romance]

Table C.16: Double Autoencoder "Contratiempo" recommendations

Position | Name | Genres
1 | Broken | [Drama, Romance]
2 | Dust to Glory | [Documentary, Action, Adventure]
3 | Greetings from Tim Buckley | [Drama]
4 | Magic Trip | [Short, Drama]
5 | Notting Hill | [Comedy, Drama, Romance]
6 | Jimi: All Is by My Side | [Biography, Drama, Music]
7 | Black Nativity | [Drama, Family, Music]
8 | Upstream Color | [Drama, SciFi]
9 | The Life Before Her Eyes | [Drama, Mystery, Thriller]
10 | Naomi and Ely’s No Kiss List | [Comedy, Drama, Romance]

Table C.17: Double Autoencoder "Dolor y Gloria" recommendations

Position | Name | Genres
1 | Ira & Abby | [Comedy, Romance]
2 | Chicago | [Comedy, Crime, Musical]
3 | Daddy’s Home | [Comedy]
4 | The Identical | [Drama, Music]
5 | Megamind | [Animation, Action, Comedy]
6 | Wrong Cops | [Comedy, Crime]
7 | The Voices | [Comedy, Crime, Horror]
8 | Flesh+Blood | [Adventure, Drama]
9 | Redemption | [Action, Crime, Drama]
10 | Godzilla | [Action, Adventure, SciFi]

Table C.18: Double Autoencoder "La Tribu" recommendations

Position | Name | Genres
1 | Interstellar | [Adventure, Drama, SciFi]
2 | Driven | [Action, Drama, Sport]
3 | Jinn | [Thriller]
4 | A Scanner Darkly | [Animation, SciFi, Thriller]
5 | The Expendables 2 | [Action, Adventure, Thriller]
6 | Saving Shiloh | [Drama, Family]
7 | Batman Begins | [Action, Adventure]
8 | College Road Trip | [Adventure, Comedy, Drama]
9 | Godzilla | [Action, Adventure, SciFi]
10 | Flesh+Blood | [Adventure, Drama]

Table C.19: Double Autoencoder "Star Wars" recommendations

Position | Name | Genres
1 | Saving Shiloh | [Drama, Family]
2 | Shallow Grave | [Crime, Thriller]
3 | Billy Bates | [Drama]
4 | The Lucky One | [Drama, Romance]
5 | Leatherheads | [Comedy, Drama, Romance]
6 | L.A. Story | [Comedy, Drama, Fantasy]
7 | Spring Breakers | [Action, Crime, Drama]
8 | Bad Milo | [Comedy, Horror]
9 | One True Thing | [Drama]
10 | The Identical | [Drama, Music]

Table C.20: Double Autoencoder "Titanic" recommendations

Appendix D

Survey template


Figure D.1: Survey Template

Appendix E

Detectable classes by object detector

Tortoise, Organ, Parking meter, Tick, Surfboard, Container, Cassette deck, Traffic light, Belt, Boot,
Magpie, Apple, Croissant, Sunglasses, Headphones, Sea turtle, Human eye, Cucumber, Banjo, Hot dog,
Football, Cosmetics, Radish, Cart, Shorts, Ambulance, Paddle, Towel, Ball, Fast food,
Ladder, Snowman, Doll, Backpack, Bus, Toothbrush, Beer, Skull, Bicycle, Boy,
Syringe, Chopsticks, Washing machine, Home appliance, Screwdriver, Sink, Human beard, Centipede, Bicycle wheel, Glove,
Toy, Bird, Boat, Barge


Laptop, Starfish, Traffic sign, Cello, Box, Miniskirt, Popcorn, Chair, Jet ski, Stapler,
Drill, Burrito, Shirt, Camel, Christmas tree, Dress, Chainsaw, Poster, Coat, Cowboy hat,
Bear, Balloon, Cheese, Suit, Hiking equipment, Waffle, Wrench, Sock, Desk, Studio couch,
Pancake, Tent, Fire hydrant, Cat, Drum, Brown bear, Vehicle registration plate, Land vehicle, Bronze sculpture, Dessert,
Woodpecker, Earrings, Juice, Lantern, Wine rack, Blue jay, Tie, Gondola, Toaster, Drink,
Pretzel, Watercraft, Beetle, Flashlight, Zucchini, Bagel, Cabinetry, Cannon, Billboard, Ladle,
Tower, Suitcase, Computer mouse, Tiara, Human mouth, Teapot, Muffin, Cookie, Limousine, Dairy,
Person, Bidet, Office building, Necklace, Dice, Bow and arrow, Snack, Fountain, Carnivore, Oven,
Swimwear, Snowmobile, Coin, Scissors, Dinosaur, Beehive, Clock, Calculator, Stairs, Ratchet,
Brassiere, Medical equipment, Cocktail, Computer keyboard, Couch, Bee, Computer monitor, Cattle, Cricket ball, Bat,
Printer, Winter melon, Human body, Gas stove, Street light, Marine invertebrates, Spatula, Roller skates, Salt and pepper shakers, Guitar,
Kitchen utensil, Whiteboard, Coffee cup, Pillow, Mechanical fan, Light switch, Pencil sharpener, Cutting board, Human leg, Face powder,
House, Door, Blender, Isopod, Fax, Horse, Hat, Plumbing fixture, Grape, Fruit,
Stationary bicycle, Shower, Stop sign, Human ear, French fries, Eraser, Office supplies, Power plugs and sockets, Hammer, Nightstand,
Fedora, Volleyball, Ceiling fan, Barrel, Panda, Guacamole, Vase, Sofa bed, Kite, Giraffe,
Dagger, Slow cooker, Adhesive tape, Tart, Woman, Scarf, Wardrobe, Harp, Treadmill, Door handle,
Dolphin, Coffee, Sandal, Fox, Rhinoceros, Sombrero, Whisk, Bicycle helmet, Flag, Bathtub,
Tin can, Paper towel, Saucer, Horn, Goldfish, Mug, Personal care, Harpsichord, Window blind, Houseplant,
Tap, Food, Human hair, Human foot, Goat, Harbor seal, Sun hat, Heater, Golf cart, Baseball bat,
Stretcher, Tree house, Harmonica, Jacket, Baseball glove, Can opener, Flying disc, Hamster, Egg, Mixing bowl,
Goggles, Skirt, Curtain

Bed, Alarm clock, Mouse, Lifejacket, Pasta, Kettle, Filing cabinet, Motorcycle, Table tennis racket, Penguin,
Fireplace, Artichoke, Musical instrument, Pumpkin, Pencil case, Scale, Table, Pear, Swim cap, Musical keyboard,
Drinking straw, Tableware, Infant bed, Frying pan, Insect, Kangaroo, Scoreboard, Polar bear, Snowplow, Hair dryer,
Koala, Briefcase, Mixer, Bathroom cabinet, Kitchenware, Knife, Kitchen knife, Cupboard

Indoor rower, Bottle, Missile, Nail, Jacuzzi, Invertebrate, Bottle opener, Bust, Tennis ball, Pizza,
Food processor, Lynx, Man, Plastic bag, Digital clock, Bookcase, Lavender, Waffle iron, Oboe, Pig,
Refrigerator, Lighthouse, Milk, Chest of drawers, Reptile, Wood-burning stove, Dumbbell, Ring binder, Ostrich, Rifle,
Human head, Plate, Piano, Lipstick, Punching bag, Bowl, Mobile phone, Girl, Skateboard, Common fig,
Humidifier, Baked goods, Plant, Raven, Cocktail shaker, Porch, Mushroom, Potato, High heels, Jaguar,
Lizard, Crutch, Hair spray, Red panda, Golf ball, Billiard table, Pitcher, Sports equipment, Rose, Fashion accessory,
Mammal, Mirror, Rabbit

Sculpture, Skyscraper, Football helmet, Bread, Wine glass, Saxophone, Sheep, Truck, Platter, Countertop,
Shotgun, Television, Measuring cup, Chicken, Tablet computer, Seafood, Trombone, Coffeemaker, Eagle, Waste container,
Submarine sandwich, Tea, Violin, Helicopter, Swimming pool, Tank, Vehicle, Owl, Dog, Snowboard,
Taco, Handbag, Duck, Book, Sword, Telephone, Paper cutter, Turtle, Elephant, Picture frame,
Torch, Wine, Hippopotamus, Shark, Sushi, Tiger, Weapon, Crocodile, Candle, Loveseat,
Strawberry, Wheel, Toilet, Leopard, Ski, Trumpet, Worm, Toilet paper, Axe, Squirrel,
Tree, Wok, Squid, Hand dryer, Tripod, Tomato, Whale, Clothing, Soap dispenser, Stethoscope,
Train, Zebra, Footwear, Porcupine, Submarine, Tool, Auto part, Lemon, Flower, Scorpion,
Picnic basket, Jug, Spider, Canary, Segway, Cooking spray, Pizza cutter, Deer, Cheetah, Training bench,
Trousers, Cream, Frog, Palm tree, Snake, Bowling equipment, Monkey, Banana, Hamburger, Coffee table,
Lion, Rocket, Maple, Building, Antelope, Human face, Microwave oven, Kitchen & dining room table, Fish, Beaker,
Human arm, Honeycomb, Dog bed, Lobster, Moths and butterflies, Vegetable, Marine mammal, Cake stand, Asparagus, Diaper,
Sea lion, Window, Cat furniture, Furniture, Unicycle, Ladybug, Closet, Bathroom accessory, Hedgehog, Falcon,
Shelf, Castle, Airplane, Chime, Watch, Facial tissue holder, Jellyfish, Spoon, Snail, Candy,
Goose, Pressure cooker, Otter, Shellfish, Salad, Mule, Kitchen appliance, Bull, Cabbage, Parrot,
Swan, Oyster, Carrot, Handgun, Tire, Peach, Horizontal bar, Mango, Sparrow, Ruler,
Coconut, Convenience store, Jeans, Van, Luggage and bags, Seat belt, Flowerpot, Grinder, Bomb, Raccoon,
Microphone, Pineapple, Spice rack, Bench, Chisel, Broccoli, Drawer, Light bulb, Ice cream, Fork,
Umbrella, Stool, Corded phone, Caterpillar, Lamp, Pastry, Envelope, Sports uniform, Butterfly, Camera,
Grapefruit, Cake, Tennis racket, Parachute, Squash, Band-aid, Dragonfly, Wall clock, Orange, Racket,
Animal, Sunflower, Serving tray

Bell pepper, Ant, Dishwasher, Ipod, Taxi, Turkey, Car, Flute, Accordion, Canoe,
Lily, Aircraft, Balance beam, Willow, Remote control, Pomegranate, Human hand, Sandwich, Crab, Wheelchair,
Doughnut, Skunk, Shrimp, Crown, Rugby ball, Glasses, Teddy bear, Sewing machine, Seahorse, Armadillo,
Human nose, Watermelon, Binoculars, Perfume, Maracas, Pen, Cantaloupe, Rays and skates, Alpaca, Helmet

Appendix F

Detectable classes by the action recogniser

clapping, preaching, destroying, pulling, sneezing, praying, raining, competing, squatting, flipping,
dropping, stitching, giggling, aiming, sewing, burying, spraying, shoveling, crouching, clipping,
covering, twisting, chasing, tapping, working

flooding, coaching, flicking, skipping, rocking, leaping, submerging, pouring, washing, asking,
drinking, breaking, buttoning, winking, playing + fun, slapping, tuning, hammering, queuing, camping,
cuddling, boarding, carrying, locking, plugging, sleeping, running, surfing, stopping, pedaling

constructing, sliding, leaning, crushing, sowing, slipping, filming, sailing, rubbing, dripping,
sweeping, driving, singing, punting, writing, screwing, handwriting, playing, watering, clawing,
shrugging, steering, hitting, playing + music, bending, hitchhiking, filling, bubbling, removing, boxing,
cracking, crashing, joining, tearing, mopping, scratching, stealing, bathing, imitating, gripping,
trimming, pressing, raising, teaching, flowing, selling, shouting, sitting, cooking, digging,
marching, hiking, drawing, reaching, tripping, stirring, vacuuming, protesting, studying, cheering,
kissing, pointing, rinsing, serving, buying, jumping, giving, coughing, bulldozing, bicycling,
starting, diving, smashing, shaking, feeding, clinging, hugging, slicing, discussing, emptying,
socializing, building, balancing, dragging, unpacking, picking, swerving, rafting, gardening, sketching,
splashing, dining, kneeling, performing, standing, licking, floating, dunking, officiating, weeding,
kicking, cheerleading, brushing, photographing, stacking, drying, dressing, spitting, packing, telephoning,
crying, inflating, dipping, descending, crafting, spinning, climbing, riding, falling, knocking,
frying, shredding, chopping, entering, playing + videogames, cutting, reading, extinguishing, pushing, storming,
paying, sanding, applauding, sawing, placing, eating, frowning, calling, smelling, turning,
lecturing, closing, talking, overflowing, barking, dancing, hunting, adult + male + speaking, fighting, child + singing,
adult + female + speaking, clearing, waking, snowing, opening, launching, barbecuing, boiling, shaving, waxing,
packaging, skating, peeling, marrying, juggling, fishing, painting, wrapping, rising, mowing,
spilling, drilling, wetting, laughing, shooting, leaking, punching, attacking, crawling, sniffing,
knitting, tying, welding, flying, interviewing, boating, manicuring, putting, assembling, stomping,
sprinkling, plunging, swinging, injecting, chewing, baptizing, grilling, carving, landing, arresting,
playing + sports, pitching, walking, operating, grooming, rolling, towing, rowing, blowing, stroking,
draining, sprinting, bowing, cleaning, snapping, massaging, hanging, gambling, combing, biting,
scrubbing, planting, saluting, spreading, roaring, handcuffing, speaking, fueling, racing, guarding,
celebrating, ascending, autographing, combusting, unloading, jogging, yawning, throwing, adult + female + singing, lifting,
colliding, cramming, drenching, instructing, bowling, burning, fencing, waving, folding, resting,
wrestling, swimming, signing, measuring, blocking, poking, adult + male + singing, repairing, whistling, smiling,
tickling, baking, snuggling, exiting, tattooing, exercising, smoking, shopping, stretching, erupting,
loading, skiing, bouncing, taping, howling, piloting, drumming, dusting, squinting, parading,
typing, child + speaking, catching, grinning