MoNet: Moments Embedding Network Mengran Gou1 Fei Xiong2 Octavia Camps1 Mario Sznaier1 1Electrical and Computer Engineering, Northeastern University, Boston, MA, US 2Information Sciences Institute, USC, CA, US mengran, camps, msznaier @coe.neu.edu { }
[email protected] Abstract Table 1. Comparison of 2nd order statistic information based neu- ral networks. Bilinear CNN (BCNN) only has 2nd order infor- mation and does not use matrix normalization. Both improved Bilinear pooling has been recently proposed as a feature BCNN (iBCNN) and G2DeNet take advantage of matrix normal- encoding layer, which can be used after the convolutional ization but suffer from large dimensionality because they use the layers of a deep network, to improve performance in mul- square-root of a large pooled matrix. Our proposed MoNet, with tiple vision tasks. Instead of conventional global average the help of a novel sub-matrix square-root layer, can normalize the pooling or fully connected layer, bilinear pooling gathers local features directly and reduce the final representation dimen- 2nd order information in a translation invariant fashion. sion significantly by substituting the fully bilinear pooling with However, a serious drawback of this family of pooling lay- compact pooling. ers is their dimensionality explosion. Approximate pooling 1st order Matrix Compact methods with compact property have been explored towards moment normalization capacity resolving this weakness. Additionally, recent results have BCNN [22,9] shown that significant performance gains can be achieved by using matrix normalization to regularize unstable higher iBCNN [21] order information. However, combining compact pooling G2DeNet [33] with matrix normalization has not been explored until now. In this paper, we unify the bilinear pooling layer and MoNet the global Gaussian embedding layer through the empir- ical moment matrix.