Universidad Politécnica de Madrid

Movie recommender based on visual content analysis using deep learning techniques

MÁSTER UNIVERSITARIO EN INGENIERÍA DE TELECOMUNICACIÓN
TRABAJO FIN DE MÁSTER

Lucía Castañeda González
2019

MÁSTER UNIVERSITARIO EN INGENIERÍA DE TELECOMUNICACIÓN
TRABAJO FIN DE MÁSTER

Title: Movie recommender based on visual content analysis using deep learning techniques.
Author: Lucía Castañeda González
Tutor: Alberto Belmonte Hernández
Advisor (Ponente): Federico Álvarez García
Department: Señales, Sistemas y Radiocomunicaciones (SSR)

MEMBERS OF THE EXAMINATION BOARD
President:
Member (Vocal):
Secretary:
Substitute:

The members of the examination board named above agree to award the grade of: .........

Madrid, on __ of __, 2019

Summary

Nowadays there is a growing interest in the artificial intelligence sector and its many applications, which make it possible to solve problems that are intuitive and nearly automatic for humans but very hard for machines. One of these problems is the automatic recommendation of multimedia content.

In this context, the proposed work exploits Computer Vision and Deep Learning techniques for video content analysis. Based on the extracted intermediate information, a recommendation engine will be developed that incorporates learning algorithms and uses film trailers as its base data.

This project is divided into two main parts. After obtaining the dataset of movie trailers, the first part of the project consists of extracting features from the different trailers. For this purpose, computer vision techniques and deep learning architectures will be used. The set of algorithms ranges from classic computer vision tasks, such as the analysis of color histograms and optical flow, to complex action recognition and object detection based on Deep Learning.

The second part of the project is the recommendation engine. Different machine learning and Deep Learning methods will be put into practice so that the recommender learns the correlations between the data efficiently. This recommender will be trained using neural networks over the selected dataset. Three alternatives will be built, each with a different architecture for the recommendation engine: the first a simple sequential neural network, the second an autoencoder and the third a double autoencoder. To compare the results of the three alternatives, objective metrics (MSE, MAE, precision) and subjective metrics (surveys) will be used.

Given a single input trailer, the final output of the project is the ten best matches, based solely on the content analysis and the trained recommender.
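The ten-best-matches output described above comes down to ranking trailer feature vectors by a distance measure, as the thesis later does with Euclidean and cosine distances. The following is a minimal sketch of that retrieval step, assuming each trailer has already been reduced to a fixed-length descriptor by the feature-extraction and embedding stages; the function name, array shapes and random stand-in data are illustrative assumptions, not the thesis code.

import numpy as np

def top_ten_matches(query_vec, feature_matrix, titles):
    """Rank every trailer against the query descriptor and keep the ten nearest."""
    # Normalise so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    m = feature_matrix / np.linalg.norm(feature_matrix, axis=1, keepdims=True)
    cosine_distance = 1.0 - m @ q        # smaller distance = more similar trailer
    order = np.argsort(cosine_distance)  # ascending distance
    # In practice the query trailer itself (distance 0) would be filtered out.
    return [(titles[i], float(cosine_distance[i])) for i in order[:10]]

# Toy usage with random stand-ins for 100 trailer descriptors of dimension 512.
rng = np.random.default_rng(0)
features = rng.random((100, 512))
names = [f"trailer_{i}" for i in range(100)]
print(top_ten_matches(features[0], features, names))

The same ranking can be done with Euclidean distance by replacing the cosine computation with np.linalg.norm(feature_matrix - query_vec, axis=1); the thesis compares both options.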
Resumen

Hoy en día existe un interés creciente en el sector de la inteligencia artificial y sus variadas aplicaciones, que permiten resolver problemas que para los humanos son muy intuitivos y casi automáticos, pero que para las máquinas son muy complicados. Uno de estos problemas es la recomendación automática de contenido multimedia.

En este contexto, el trabajo propuesto trata de explotar las técnicas de visión artificial y Deep Learning para el análisis de contenido en vídeo. Basándose en la información extraída, se desarrollará un motor de recomendación que permite la inclusión de algoritmos de aprendizaje que utilizan como base de datos tráileres de películas.

Este proyecto se divide en dos partes principales. Tras obtener el conjunto de datos de tráileres de películas, la primera parte del proyecto consiste en la extracción de características de dichos tráileres. Para este propósito se utilizarán técnicas de visión artificial y arquitecturas de aprendizaje profundo. El conjunto de algoritmos va desde tareas de procesamiento de imágenes, como el análisis de histogramas de color y flujo óptico, hasta análisis complejos de acciones o detectores de objetos basados en algoritmos de Deep Learning.

La segunda parte del proyecto es el motor de recomendación. Para el recomendador se pondrán en práctica diferentes métodos de aprendizaje automático y aprendizaje profundo para aprender de manera eficiente las correlaciones entre los datos. Este recomendador se entrenará utilizando redes neuronales sobre el conjunto de datos seleccionado. Se realizarán tres opciones diferentes con tres arquitecturas distintas para el motor de recomendación: la primera será una red neuronal secuencial simple, la segunda un autoencoder y la tercera un doble autoencoder. Para comparar los resultados de las tres opciones se utilizarán métricas objetivas (MSE, MAE y precisión) y métricas subjetivas (encuestas).

El resultado final del proyecto proporciona, a partir de un tráiler de entrada, las diez mejores coincidencias basándose únicamente en el análisis de contenido y el recomendador entrenado.

Keywords

Machine learning, deep learning, recommender, neural network, autoencoder, image processing, computer vision, Python, TensorFlow, Keras, PyTorch.

Palabras clave

'Machine learning', aprendizaje profundo, recomendador, red neuronal, autoencoder, procesamiento de imágenes, visión artificial, Python, TensorFlow, Keras, PyTorch.

Thanks to my family, for their unconditional support of a daughter who, whenever she told them about her Master's thesis, seemed to be speaking Klingon. And to my tutor, for his inexhaustible help and for passing on his enthusiasm.

Index

1 Introduction and objectives
  1.1 Introduction
  1.2 Objectives
2 State of the art
  2.1 Recommendation systems
    2.1.1 Deep learning basics
    2.1.2 Deep learning and recommendation systems
  2.2 Deep learning and visual content based recommendation systems
    2.2.1 Computer vision
    2.2.2 Action recognition
    2.2.3 Object detector
3 Development
  3.1 Machine Learning and Deep Learning process chain
  3.2 Proposed architecture
  3.3 Feature extraction
    3.3.1 Dataset
    3.3.2 Features
  3.4 Embedding
  3.5 Distances
  3.6 Deep Learning Recommender System Architectures
    3.6.1 Deep Neural Network
    3.6.2 Autoencoder
    3.6.3 Double autoencoder
4 Results
  4.1 Feature extraction
    4.1.1 Action recognition
    4.1.2 RGB Histogram Feature
    4.1.3 Object detector
    4.1.4 Optical flow
    4.1.5 Joined Feature
  4.2 Embedding
    4.2.1 Embedding training
    4.2.2 Embedding prediction
    4.2.3 Comparison between using or not embedding
  4.3 Distances
    4.3.1 Euclidean distances
    4.3.2 Cosine distances
  4.4 Recommender Objective Evaluation Metrics
  4.5 Deep Neural Network Recommender
    4.5.1 Neural Network training
    4.5.2 Deep Neural Network prediction
  4.6 Autoencoder
    4.6.1 Autoencoder training
    4.6.2 Autoencoder prediction
  4.7 Double autoencoder
    4.7.1 Double autoencoder training
    4.7.2 Double autoencoder prediction
  4.8 Subjective comparison between solutions
    4.8.1 Surveys
5 Conclusions and future lines
  5.1 Conclusions
  5.2 Future lines
References
Appendices
A Ethical, social, economic and environmental aspects
  A.1 Introduction
  A.2 Description of relevant impacts related to the project
    A.2.1 Ethical impact
    A.2.2 Social impact
    A.2.3 Economic impact
    A.2.4 Environmental impact
  A.3 Conclusions
B Economic budget
C Survey results
  C.1 Euclidean distance recommendations
  C.2 Artificial Neural Network recommendations
  C.3 Autoencoder recommendations
  C.4 Double Autoencoder recommendations
D Survey template
E Detectable classes by object detector
F Detectable classes by the action recogniser

Index of figures

2.1 YouTube Machine
2.2 LRCN architecture [1]
2.3 3D CNN example from [2]
2.4 Faster R-CNN architecture
2.5 YOLO working scheme [3]
2.6 SSD working scheme [4]
2.7 RetinaNet working scheme [5]
2.8 Mask R-CNN working scheme [6]
3.1 Proposed architecture
3.2 Gradient descent function
3.3 Classification overfitting
3.4 Classification underfitting
3.5 Classification compromise between underfitting and overfitting
3.6 Proposed architecture
3.7 Multi-genre distribution
3.8 Action recognition training
3.9 ResNet50
3.10 Action recognition prediction
3.11 Histogram process chain
3.12 Action film colour histogram
3.13 Action film colour histogram
3.14 Object detector training
3.15 YOLO architecture [3]
3.16 Object detection architectures comparison