Workshop track - ICLR 2017

GATED MULTIMODAL UNITS FOR INFORMATION FUSION

Arevalo, John
Dept. of Computing Systems and Industrial Engineering
Universidad Nacional de Colombia
Cra 30 No 45 03-Ciudad Universitaria

Solorio, Thamar
Dept. of Computer Science
University of Houston
Houston, TX 77204-3010
[email protected] [email protected]

Montes-y-Gómez, Manuel
Instituto Nacional de Astrofísica, Óptica y Electrónica
Computer Science Department
Luis Enrique Erro No. 1, Sta. Ma. Tonantzintla
C.P. 72840 Puebla, Mexico

González, Fabio A.
Dept. of Computing Systems and Industrial Engineering
Universidad Nacional de Colombia
Cra 30 No 45 03-Ciudad Universitaria
[email protected] [email protected]

ABSTRACT

This paper presents a novel model for multimodal learning based on gated neural networks. The Gated Multimodal Unit (GMU) model is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU uses multiplicative gates to learn how much each modality influences the activation of the unit. It was evaluated in a multilabel scenario: genre classification of movies using the plot and the poster. The GMU improved the macro F-score over single-modality approaches and outperformed other fusion strategies, including mixture-of-experts models. Along with this work, we release the MM-IMDb dataset, which, to the best of our knowledge, is the largest publicly available multimodal dataset for genre prediction on movies.

1 INTRODUCTION

Representation learning methods have received a lot of attention from researchers and practitioners because of their successful application to complex problems in areas such as computer vision, speech recognition and text processing (LeCun et al., 2015).