Multi-view Vector-valued Manifold Regularization for Multi-label Image Classification

Yong Luo, Dacheng Tao, Senior Member, IEEE, Chang Xu, Chao Xu, Hong Liu, Yonggang Wen, Member, IEEE

Abstract—In computer vision, image datasets used for classification are naturally associated with multiple labels and comprised of multiple views, because each image may contain several objects (e.g. pedestrian, bicycle and tree) and is properly characterized by multiple visual features (e.g. color, texture and shape). Currently available tools ignore either the label relationship or the complementarity of views. Motivated by the success of the vector-valued function that constructs matrix-valued kernels to explore the multi-label structure in the output space, we introduce multi-view vector-valued manifold regularization (MV3MR) to integrate multiple features. MV3MR exploits the complementary property of different features and discovers the intrinsic local geometry of the compact support shared by different features under the theme of manifold regularization. We conducted extensive experiments on two challenging, but popular datasets, PASCAL VOC'07 (VOC) and MIR Flickr (MIR), and validated the effectiveness of the proposed MV3MR for image classification.

Index Terms—Image classification, semi-supervised, multi-label, multi-view, manifold

I. INTRODUCTION

A natural image can be summarized by several keywords (or labels). To conduct image classification by directly using binary classification methods [1], [2], it is necessary to assume that labels are independent, although most labels appearing in one image are related to one another. Examples are given in Fig. 1, where A1-A3 show a "person" riding a "motorbike", B1-B3 indicate that "sea" usually co-occurs with "sky", and C1-C3 show some "clouds" in the "sky". This multi-label nature makes image classification intrinsically different from simple binary classification.

Moreover, different labels cannot be properly characterized by a single feature representation. For example, the color information (e.g. color histogram), shape cue (encoded in SIFT [3]) and global structure (e.g. GIST [4]) can effectively represent natural substances (e.g. sky, cloud and plant life), man-made objects (e.g. aeroplane, motorbike and TV-monitor) and scenes (e.g. seaside and indoor), respectively, but cannot simultaneously illustrate all these concepts in an effective way. Each visual feature encodes a particular property of images and characterizes a particular concept (label), so we treat each feature representation as a particular view for characterizing images. Fig. 1 (a)-(c) indicate that the SIFT representation is effective in describing a motorbike and GIST can capture the global structure of a person on the motorbike. Fig. 1 (d)-(f) show that GIST performs well in recognizing seaside scenes, while the color information can be used as a complementary aid for recognizing the blue sea water. From Fig. 1 (g)-(i) we can see that RGB usually represents clouds well and GIST is helpful when RGB fails. For example, the RGB representations of C1 and C3 are not very similar, while their GIST distance (0.22) is very small due to the sky scene structure. This multi-view nature distinguishes image classification from single-view tasks, such as texture segmentation [5] and face recognition [6].

The vector-valued function [7] has recently been introduced to resolve multi-label classification [8] and has been demonstrated to be effective in semantic scene annotation. This method naturally incorporates the label dependencies into the classification model by first computing the graph Laplacian [9] of the output similarity graph, and then using this graph to construct a vector-valued kernel. This model is superior to most of the existing multi-label learning methods [10], [11], [12] because it naturally considers the label correlations and efficiently outputs all the predicted labels at one time.
Although the vector-valued function is effective for general multi-label classification tasks, it cannot directly handle image classification problems in which images are represented by multi-view features. A popular solution is to concatenate all the features into a long vector. This concatenation strategy, however, not only ignores the physical interpretations of different features, but also encounters the over-fitting problem given limited training samples.

We thus introduce multi-kernel learning (MKL) into the vector-valued function and present a multi-view vector-valued manifold regularization (MV3MR) framework for handling the multi-view features in multi-label image classification. MV3MR associates each view with a particular kernel, assigns a higher weight to the view/kernel carrying more discriminative information, and explores the complementary nature of different views.

Y. Luo, C. Xu, and C. Xu are with the Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China.
D. Tao is with the Centre for Quantum Computation and Intelligent Systems, University of Technology, Sydney, Jones Street, Ultimo, NSW 2007, Sydney, Australia.
H. Liu is with the Engineering Lab on Intelligent Perception for Internet of Things, Shenzhen Graduate School, Peking University, China (Email: [email protected]).
Y. Wen is with the Division of Networks and Distributed Systems, School of Computer Engineering, Nanyang Technological University, Singapore (Email: [email protected]).
(c) 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

[Fig. 1 graphic: panels (a)-(i) show, for each image group A1-A3, B1-B3 and C1-C3, the SIFT, GIST and RGB feature histograms together with the corresponding pairwise distance matrices; see the caption below.]

Fig. 1. A1-A3, B1-B3 and C1-C3 are images of a person riding a motorbike, the seaside, and clouds in the sky, respectively. Each of (a)-(i) contains the feature representations and a distance matrix of the samples in a particular view. All the distances have been normalized.

In particular, MV3MR assembles the multi-view information through a large number of unlabeled images to discover the intrinsic geometry embedded in the high-dimensional ambient space of the compact support of the marginal distribution. The local geometry, approximated by the adjacency graphs induced from multiple kernels of all the corresponding views, is more reliable than that approximated by the adjacency graph induced from the kernel of any particular view. In this way, MV3MR essentially improves the vector-valued function for multi-label image classification.

Because the hinge loss is more suitable for classification than the least squares loss [13], we derive an SVM (support vector machine) formulation of MV3MR, which results in a multi-view vector-valued Laplacian SVM (MV3LSVM). We carefully design the MV3LSVM algorithm so that it determines the set of kernel weights in the learning process of the vector-valued function.

We thoroughly evaluate the proposed MV3LSVM algorithm on two challenging datasets, PASCAL VOC'07 [14] and MIR Flickr [15], by comparing it with a popular MKL algorithm [16], a recently proposed MKL method [17], and competitive multi-label learning algorithms for image classification, such as multi-label compressed sensing [18], canonical correlation analysis [12] and vector-valued manifold regularization [8], in terms of mean average precision (mAP), mean area under the ROC curve (mAUC) and ranking loss (RL). The experimental results suggest the effectiveness of MV3LSVM.

The rest of the paper is organized as follows. Section II summarizes the recent work in multi-label learning, multi-kernel learning and image classification. In Section III, we introduce manifold regularization and its vector-valued generalization. We depict the proposed MV3MR framework and its SVM formulation in Section IV. Extensive experiments are presented in Section V and we conclude this paper in Section VI.

II. RELATED WORK

A. Multi-label learning

Multi-label classification has received intensive attention in recent years. Some methods extend traditional multi-class algorithms to cope with the multi-label problem. AdaBoost.MH [19] adds the label value to the feature vector and then applies AdaBoost on weak classifiers. A ranking algorithm is presented in [20] by adopting the ranking loss as the cost function in SVM. ML-KNN [21] is an extension of the k-nearest neighbor (KNN) algorithm to deal with multi-label data, and canonical correlation analysis (CCA) has also recently been extended to the multi-label case by formulating it as a least-squares problem [12].

Other works concentrate on preprocessing the data so that standard binary or multi-class techniques can be utilized. For example, the multiple labels of a sample belong to a subset of the whole label set, and we can view this subset as a new class [22]. This may lead to a large number of classes, and a more common strategy is to learn a binary classifier for each label [1], [2]. Considering that the labels are often sparse, a compressed sensing method has been proposed for multi-label prediction [18].

Various approaches have been proposed to improve prediction accuracy by exploiting label correlations [23], [24], [11], [8]. Sun et al. [23] proposed the construction of a hypergraph to exploit the label dependencies. In [24], a common subspace is assumed to be shared among all labels, and the correlation information contained in different labels can be captured by learning this low-dimensional subspace. A max-margin method is proposed in [11], where the prior knowledge of the label correlations is incorporated explicitly in the multi-label classification model.

None of the approaches mentioned above consider the features to be used; however, an image with multiple labels usually indicates that it contains multiple objects. As far as we know, there is no single kind of feature that can describe a variety of objects very well. Therefore, how to combine different features is a critical issue in multi-label image classification, and we consider MKL for this purpose in this paper.

B. Multi-kernel learning

Classical kernel methods are usually based on a single kernel [25]. MKL [26], in which a kernel-based classifier and a convex combination of the kernels are learned simultaneously, has attracted much attention.

Lanckriet et al. [26] introduce MKL for binary classification and solve it with semi-definite programming (SDP) techniques. The MKL problem is further developed by Sonnenburg et al. [27] through the presentation of a semi-infinite linear program (SILP). In [16], MKL is reformulated by using a weighted L2-norm regularization to replace the mixed-norm regularization and adding an L1-norm constraint on the kernel weights. All of these MKL formulations are based on SVM and are not naturally designed for multi-label classification. The proposed MV3MR framework extends MKL to handle the multi-label problem and model label inter-dependencies.

C. Image classification

Image classification has been widely used in many computer vision-related applications such as image retrieval and web content browsing. In recent years, more than a dozen methods have been proposed, and representative works can be grouped into three categories.

• Single-view learning for image classification: This category contains many recent image classification schemes, e.g. dictionary learning [28] and spatial pyramid matching [29]. For example, Labusch et al. [30] proposed to integrate sparse coding and a local maximum operation to extract local features for handwritten digit recognition. In [31], a non-linear coding scheme was introduced for local descriptors such as SIFT. Yang et al. [32] explored the local co-occurrences of visual words over the spatial pyramid.

• Multi-view learning for image classification: Schemes in this category utilize the features from different views (or multi-view features) to boost image classification performance. In this paper, the concept of "views" used for learning refers to different features or attributes that depict the objects to be classified. It should be noted that for some other applications in vision and graphics, "views" mean different spatial viewpoints [33], [34], [35]. A semi-supervised boosting algorithm is proposed in [36], in which images measured by different views are used to construct a prior and formulate a regularization term. Guillaumin et al. [2] combined 15 visual representations (e.g. SIFT, GIST and HSV) with the tag feature for semi-supervised image classification. Combining visual and textual information has also been utilized for clustering [37] and web page classification [38].

• Multi-label learning for image classification: This category is motivated by the success of multi-label learning and has demonstrated promising image classification performance. For example, Bucak et al. [39] proposed a ranking-based algorithm to tackle the multi-label problem with incompletely labeled data by introducing a group lasso regularizer in the optimization. Unlike traditional multi-label methods that always consider positive label correlations, a novel approach is presented in [40] to make use of the negative relationships between categories.

Although it has been widely acknowledged that both multi-view representation and label inter-dependencies are important for multi-label image classification, most of the existing approaches do not take both of them into consideration. Most existing multi-view approaches assume that different views (features) contribute equally to label prediction. In contrast to these approaches, the proposed MV3MR naturally explores both the complementary property of multi-view features and the correlations of different labels under the manifold regularization scheme.

III. MANIFOLD REGULARIZATION AND VECTOR-VALUED GENERALIZATION

This section briefly introduces the manifold regularization framework [9] and its vector-valued generalization [8]. Given a set of l labeled examples D_l = {(x_i, y_i)}_{i=1}^l and a relatively large set of u unlabeled examples D_u = {x_i}_{i=l+1}^{N=l+u}, we consider a non-parametric estimation of a vector-valued function f : X → Y, where Y = R^n and n is the number of labels. This setting includes Y = R as a special case for regression and classification.

A. Manifold regularization

Manifold learning has been widely used for capturing the local geometry [41] and conducting low-dimensional embedding [42], [43]. In manifold regularization, the data manifold is characterized by a nearest neighbor graph W, which explores the geometric structure of the compact support of the marginal distribution. The Laplacian L of W and the prediction f = [f(x_1), ..., f(x_N)] are then formulated as a smoothness constraint ‖f‖_I^2 = f^T L f, where L = D − W and the diagonal matrix D is given by D_ii = Σ_{j=1}^N W_ij. The manifold regularization framework minimizes the regularized loss

argmin_{f ∈ H_k} (1/l) Σ_{i=1}^l L(f, x_i, y_i) + γ_A ‖f‖_k^2 + γ_I ‖f‖_I^2,   (1)

where L is a predefined loss function, k is a standard scalar-valued kernel, i.e. k : X × X → R, and H_k is the associated reproducing kernel Hilbert space (RKHS). Here, γ_A and γ_I are trade-off parameters that control the complexities of f in the ambient space and on the compact support of the marginal distribution. The Representer theorem [9] ensures that the solution of problem (1) takes the form f*(x) = Σ_{i=1}^N α_i k(x, x_i), where α_i ∈ R is a coefficient. Since a pair of close samples means that the corresponding conditional distributions are similar, the manifold regularization ‖f‖_I^2 helps the function learning.

B. Vector-valued manifold regularization

In the vector-valued RKHS, where a kernel function K is defined and the corresponding Y-valued RKHS is denoted by H_K, the optimization problem of the vector-valued manifold regularization (VVMR) is given by

argmin_{f ∈ H_K} (1/l) Σ_{i=1}^l L(f, x_i, y_i) + γ_A ‖f‖_K^2 + γ_I ⟨f, Mf⟩_{Y^{u+l}},   (2)

where Y^{u+l} is the (u+l)-fold direct product of Y and the inner product takes the form

⟨(y_1, ..., y_{u+l}), (w_1, ..., w_{u+l})⟩_{Y^{u+l}} = Σ_{i=1}^{u+l} ⟨y_i, w_i⟩_Y.
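For concreteness, the smoothness term in (1)-(2) can be illustrated with a small numpy sketch that builds a k-nearest-neighbor graph, its Laplacian L = D − W, and the vector-valued operator L ⊗ I_n used as M below. This is only an illustration under simple binary edge weights, not the authors' implementation.

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Build a symmetric k-NN adjacency W and its graph Laplacian L = D - W.

    X: (N, d) feature matrix for one view. Binary edge weights are used here for
    simplicity; heat-kernel weights are a common alternative.
    """
    N = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    W = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(d2[i])[1:k + 1]          # skip the point itself
        W[i, nn] = 1.0
    W = np.maximum(W, W.T)                        # symmetrize
    L = np.diag(W.sum(axis=1)) - W                # L = D - W
    return W, L

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 10))                 # toy single-view features
    f = rng.normal(size=20)                       # toy scalar prediction vector
    _, L = knn_graph_laplacian(X)
    n_labels = 3
    M = np.kron(L, np.eye(n_labels))              # vector-valued Laplacian L ⊗ I_n used in (2)
    print("smoothness f^T L f =", f @ L @ f)
```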

[Fig. 2 and Fig. 3 graphics: Gram-matrix and pipeline diagrams; only the captions are reproduced here.]

Fig. 2. Different views contribute to the classification differently. G^k_1 is a Gram matrix constructed from SIFT [3]. G^k_2 is obtained from the RGB color histogram. G^k_1 and G^k_2 are complementary to each other. The learned linear combination G^k_comb of the two Gram matrices is closer to the optimal Gram matrix G^k_opt than the mean Gram matrix G^k_avg obtained by simply averaging the two kernels.

Fig. 3. The diagram of the proposed MV3MR algorithm. The given labels are used to construct an output similarity graph, which encodes the label correlations. Features from different views of the labeled and unlabeled data are used to construct different Gram matrices (with label correlations incorporated) G_v, v = 1, ..., V, as well as the different graph Laplacians M_v, v = 1, ..., V. We learn the weight β_v for G_v and θ_v for M_v respectively. The combined Gram matrix G is used for classification while preserving locality on the integrated manifold M.

The function prediction f = [f(x_1), ..., f(x_{u+l})] ∈ Y^{u+l}. The matrix M is a symmetric positive operator that satisfies ⟨y, My⟩ ≥ 0 for all y ∈ Y^{u+l} and is chosen to be L ⊗ I_n. Here, L is the graph Laplacian, I_n is the n×n identity matrix and ⊗ denotes the Kronecker (tensor) matrix product.

For Y = R^n, an entry K(x_i, x_j) of the n×n vector-valued kernel matrix is defined by

K(x_i, x_j) = k(x_i, x_j) (γ_O L_out^† + (1 − γ_O) I_n),   (3)

where k(·,·) is a scalar-valued kernel and γ_O ∈ [0, 1] is a parameter. Here, L_out^† is the pseudo-inverse of the output labels' graph Laplacian. The output graph can be estimated by viewing each label as a vertex and using the nearest neighbors method. The representation of the j-th label is the j-th column of the label matrix Y ∈ R^{N×n}, in which Y_ij = 1 if the j-th label is manually assigned to the i-th sample, and −1 otherwise. For the unlabeled samples, Y_ij = 0.

It has been proved in [8] that the solution of the minimization problem (2) takes the form f*(x) = Σ_{i=1}^N K(x, x_i) a_i. By choosing the regularized least squares (RLS) loss L(f, x_i, y_i) = (f(x_i) − y_i)^2, we can estimate the column vector a = {a_1, ..., a_{u+l}} ∈ R^{n(u+l)}, with each a_i ∈ Y, by solving a Sylvester equation:

−(1/(lγ_A)) (J_l^N G^k + lγ_I L G^k) A Q − A + (1/(lγ_A)) Y = 0,   (4)

where a = vec(A^T), Q = γ_O L_out^† + (1 − γ_O) I_n, and J_l^N is a diagonal matrix with the first l entries 1 and the others 0. Here, G^k is the Gram matrix of the scalar-valued kernel k over the labeled and unlabeled data. We refer to [8] for a detailed description of the vector-valued Laplacian RLS.
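The output operator Q = γ_O L_out^† + (1 − γ_O) I_n in (3) and the resulting vector-valued Gram matrix G = G^k ⊗ Q can be written down directly; the following numpy sketch also solves the linear system behind (4) by vectorization, which is only practical for small N and n. The rearrangement C A Q + A = Y/(lγ_A), with C = (J_l^N G^k + lγ_I L G^k)/(lγ_A), is ours, and the code is illustrative rather than the authors' implementation.

```python
import numpy as np

def output_operator(L_out, gamma_o=1.0):
    """Q = gamma_o * pinv(L_out) + (1 - gamma_o) * I_n, as in Eq. (3)."""
    n = L_out.shape[0]
    return gamma_o * np.linalg.pinv(L_out) + (1.0 - gamma_o) * np.eye(n)

def vector_valued_gram(G_scalar, Q):
    """Vector-valued Gram matrix with blocks K(x_i, x_j) = k(x_i, x_j) * Q, i.e. G = G_scalar kron Q."""
    return np.kron(G_scalar, Q)

def solve_vvrls(G_scalar, L_in, L_out, Y, l, gamma_a, gamma_i, gamma_o=1.0):
    """Solve C A Q + A = Y / (l * gamma_a), a rearranged form of Eq. (4), by vectorization.

    G_scalar: (N, N) scalar Gram matrix, L_in: (N, N) input graph Laplacian,
    L_out: (n, n) output graph Laplacian, Y: (N, n) label matrix.
    """
    N, n = Y.shape
    Q = output_operator(L_out, gamma_o)
    J = np.diag(np.r_[np.ones(l), np.zeros(N - l)])            # J_l^N
    C = (J @ G_scalar + l * gamma_i * L_in @ G_scalar) / (l * gamma_a)
    D = Y / (l * gamma_a)
    lhs = np.kron(Q.T, C) + np.eye(N * n)                       # (Q^T kron C + I) vec(A) = vec(D)
    A = np.linalg.solve(lhs, D.flatten(order="F")).reshape(N, n, order="F")
    return A
```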

IV. MV3MR: MULTI-VIEW VECTOR-VALUED MANIFOLD REGULARIZATION

To handle multi-view multi-label image classification, we generalize VVMR and present multi-view vector-valued manifold regularization (MV3MR). In contrast to [2], which assumes that different views contribute equally to the classification, MV3MR assumes that different views contribute to the classification differently and learns the combination coefficients to integrate different views.

Fig. 2 gives an illustrative example which suggests that different views contribute to the classification differently, and that learning the combination coefficients to integrate different views benefits the classification. Given five images from two classes, namely three cars of different colors (silvery white, blue and red) and two different sky images, the optimal Gram matrix G^k_opt for separating these images into two classes is shown on the right side. On the left, there are four Gram matrices: two single Gram matrices G^k_1 and G^k_2 obtained from two different views, their mean G^k_avg, and their linear combination G^k_comb with the learned coefficients. The figure indicates that G^k_comb is closer to the optimal Gram matrix G^k_opt than G^k_avg.

Given a small number of labeled samples and a relatively large number of unlabeled samples, MV3MR first computes an output similarity graph by using the label information of the labeled samples. The Laplacian of the label graph is incorporated in the scalar-valued Gram matrix G^k_v over the labeled and unlabeled data to enforce label correlations on each view, and the vector-valued Gram matrices G_v = G^k_v ⊗ Q, v = 1, ..., V, can be obtained. Meanwhile, we also compute the vector-valued graph Laplacians M_v, v = 1, ..., V, by using the features of the input data from different views. MV3MR then learns the kernel combination coefficient β_v for G_v as well as the graph weight θ_v for M_v by the use of alternating optimization. Finally, the combined Gram matrix G, together with the regularization on the combined manifold M, is used for classification. Fig. 3 summarizes the above procedure. Technical details are given below.
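As a sketch of the per-view construction just described, the following numpy fragment forms G_v = G^k_v ⊗ Q and M_v = L_v ⊗ I_n for every view and combines them with the weights β and θ. It assumes the scalar Gram matrices, per-view input Laplacians and the output operator Q are already available; the function names are ours.

```python
import numpy as np

def build_view_matrices(G_scalar_list, L_in_list, Q, n_labels):
    """For every view v, form the vector-valued Gram matrix G_v = G^k_v kron Q and
    the vector-valued graph Laplacian M_v = L_v kron I_n."""
    G_views = [np.kron(Gk, Q) for Gk in G_scalar_list]
    M_views = [np.kron(Lv, np.eye(n_labels)) for Lv in L_in_list]
    return G_views, M_views

def combine_views(G_views, M_views, beta, theta):
    """Combined Gram matrix G = sum_v beta_v G_v and integrated Laplacian M = sum_v theta_v M_v,
    with beta and theta lying on the simplex (uniform weights 1/V are used for initialization)."""
    G = sum(b * Gv for b, Gv in zip(beta, G_views))
    M = sum(t * Mv for t, Mv in zip(theta, M_views))
    return G, M
```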

A. Rationality

Let V be the number of views and v be the view index. On the feature space of each view, we define the corresponding positive definite scalar-valued kernel k_v, which is associated with an RKHS H_{k_v}. It follows from the functional framework [16] that, by introducing a non-negative coefficient β_v, the Hilbert space H'_{k_v} = {f | f ∈ H_{k_v} : ‖f‖_{H_{k_v}} / β_v < ∞} is an RKHS with kernel k'(x, x') = β_v k_v(x, x'). If we define H_k as the direct sum of the spaces H'_{k_v}, i.e. H_k = ⊕_{v=1}^V H'_{k_v}, then H_k is an RKHS associated with the kernel

k(x, x') = Σ_{v=1}^V β_v k_v(x, x').   (5)

Thus, any function in H_k is a sum of functions belonging to the H'_{k_v}. The vector-valued kernel K(x, x') = k(x, x') ⊗ Q = Σ_{v=1}^V β_v K_v(x, x'), where we have used the bilinearity of the Kronecker product. Each K_v(x, x') = k_v(x, x') ⊗ Q corresponds to an RKHS according to the study of RKHSs for vector-valued functions [8]. Thus, the kernel K is associated with an RKHS H_K. This functional framework motivates the MV3MR framework. We will jointly learn the linear combination coefficients {β_v}, which integrate the kernels characterizing the different views, and the classifier coefficients {a_i} in a single optimization problem. Moreover, to effectively utilize the unlabeled data, we construct graph Laplacians for the different views and learn to combine all of them.

B. Problem formulation

Under the multi-view setting and the theme of manifold regularization, we propose to learn the vector-valued function f by linearly combining the kernels and graphs from different views. The optimization problem is given by

argmin_{f ∈ H_K} (1/l) Σ_{i=1}^l L(f, x_i, y_i) + γ_A ‖f‖_K^2 + γ_I ⟨f, Mf⟩_{Y^{u+l}} + γ_B ‖β‖_2^2 + γ_C ‖θ‖_2^2
s.t. Σ_{v=1}^V β_v = 1, β_v ≥ 0, and Σ_{v=1}^V θ_v = 1, θ_v ≥ 0, v = 1, ..., V,   (6)

where β = [β_1, ..., β_V]^T and θ = [θ_1, ..., θ_V]^T. Both γ_B > 0 and γ_C > 0 are trade-off parameters. The decision function takes the form f(x) + b = Σ_v f^v(x) + b and belongs to an RKHS H_K associated with the kernel K(x, x') = Σ_v β_v K_v(x, x'). We define M = Σ_v θ_v M_v, where each M_v is a vector-valued graph Laplacian constructed on H_{K_v}. It can be demonstrated that M is still a graph Laplacian.

Lemma 1: M ∈ S^+_{Nn} is a vector-valued graph Laplacian.

The notation S^+_n denotes the set of n×n symmetric positive semi-definite matrices, and we use S^*_n to denote the set of positive definite matrices. We then have the following version of the Representer Theorem.

Theorem 1: For fixed sets of {β_v} and {θ_v}, the minimizer of problem (6) admits an expansion

f*(x) = Σ_{i=1}^{u+l} K(x, x_i) a_i,   (7)

where a_i ∈ Y, 1 ≤ i ≤ N = u + l, are vectors to be estimated and K(x, x_i) = Σ_{v=1}^V β_v K_v(x, x_i). The proofs of Lemma 1 and Theorem 1 are detailed in the appendix.

The hinge loss L(f, x_i, y_i) = (1 − y_i f(x_i))_+ is more suitable for classification than the least squares loss, since the hinge loss results in a better convergence rate and usually higher classification accuracy; we refer to [13] for a comparison of different popular loss functions. We adopt the hinge loss in MV3MR and derive MV3LSVM as follows.

C. Multi-view vector-valued Laplacian SVM

Under the SVM formulation, the minimization problem of MV3MR is

argmin_{f ∈ H_K, β, θ} (1/(nl)) Σ_{i=1}^l Σ_{j=1}^n (1 − y_ij f_j(x_i))_+ + γ_A ‖f‖_K^2 + γ_I ⟨f, Mf⟩_{Y^{u+l}} + γ_B ‖β‖_2^2 + γ_C ‖θ‖_2^2
s.t. Σ_{v=1}^V β_v = 1, β_v ≥ 0, and Σ_{v=1}^V θ_v = 1, θ_v ≥ 0, ∀v.   (8)

An unregularized bias b_j is often added to the solution f_j(x) = Σ_{i=1}^N K_j(x, x_i) a_i in the SVM formulation. By substituting (7) into the above formulation, we obtain the primal problem as follows:

argmin_{a, b, ξ, β, θ} (1/(nl)) Σ_{i=1}^l Σ_{j=1}^n ξ_ij + γ_A a^T G a + γ_I a^T GMG a + γ_B ‖β‖_2^2 + γ_C ‖θ‖_2^2
s.t. y_ij (Σ_{z=1}^{l+u} K_j(x_i, x_z) a_z + b_j) ≥ 1 − ξ_ij, ξ_ij ≥ 0, ∀i, j,
Σ_{v=1}^V β_v = 1, β_v ≥ 0, and Σ_{v=1}^V θ_v = 1, θ_v ≥ 0, ∀v,   (9)

where G = Σ_{v=1}^V β_v G_v is the combined vector-valued Gram matrix over the labeled and unlabeled samples defined on the kernel K, and M = Σ_{v=1}^V θ_v M_v is the integrated vector-valued graph Laplacian. Here, K_j(·,·) is the j-th row of the vector-valued kernel K. We have three variables, i.e., a, β and θ, to be optimized in (9). To solve this problem, we consider the following constrained optimization problem:

min F(β, θ)
s.t. Σ_{v=1}^V β_v = 1, β_v ≥ 0, and Σ_{v=1}^V θ_v = 1, θ_v ≥ 0, ∀v,   (10)

where F(β, θ) equals

argmin_{a, b, ξ} (1/(nl)) Σ_{i=1}^l Σ_{j=1}^n ξ_ij + γ_A a^T G a + γ_I a^T GMG a + γ_B ‖β‖_2^2 + γ_C ‖θ‖_2^2
s.t. y_ij (Σ_{z=1}^{l+u} K_j(x_i, x_z) a_z + b_j) ≥ 1 − ξ_ij, ξ_ij ≥ 0, i = 1, ..., l, j = 1, ..., n.   (11)

Here, G and M take the form given in (9). We can omit the terms γ_B ‖β‖_2^2 and γ_C ‖θ‖_2^2 in (11) since β and θ are fixed. By introducing the Lagrange multipliers µ_ij and η_ij in (11), we have

W(a, ξ, b, µ, η) = (1/(nl)) Σ_{i=1}^l Σ_{j=1}^n ξ_ij + (1/2) a^T (2γ_A G + 2γ_I GMG) a − Σ_{i=1}^l Σ_{j=1}^n η_ij ξ_ij − Σ_{i=1}^l Σ_{j=1}^n µ_ij [ y_ij (Σ_{z=1}^{l+u} K_j(x_i, x_z) a_z + b_j) − 1 + ξ_ij ].   (12)

By taking the partial derivatives w.r.t. ξ_ij and b_j, and setting them to zero, we obtain

∂W/∂b_j = 0 ⇒ Σ_{i=1}^l µ_ij y_ij = 0, j = 1, ..., n,
∂W/∂ξ_ij = 0 ⇒ 1/(nl) − µ_ij − η_ij = 0 ⇒ 0 ≤ µ_ij ≤ 1/(nl).

A reduced Lagrangian can be obtained by substituting the above equalities back into (12), which leads to

W^R(a, µ) = (1/2) a^T (2γ_A G + 2γ_I GMG) a − a^T G J^T Y_d µ + µ^T 1
s.t. Σ_{i=1}^l µ_ij y_ij = 0, j = 1, ..., n,  0 ≤ µ_ij ≤ 1/(nl), i = 1, ..., l, j = 1, ..., n,   (13)

where J = [I_{nl} 0] ∈ R^{(nl)×(nl+nu)} and I_{nl} is an nl×nl identity matrix. Here, µ = {µ_1, ..., µ_l} ∈ R^{nl} is a column vector with each µ_i = [µ_{i1}, ..., µ_{in}]^T, Y_d = diag(y_11, ..., y_1n, ..., y_l1, ..., y_ln), and 1 is an all-ones column vector. Taking the partial derivative of W^R w.r.t. a and setting it to zero leads to

a* = (2γ_A I + 2γ_I MG)^{-1} J^T Y_d µ*.   (14)

Substituting it back into (13), we get

µ* = argmax_{µ ∈ R^{nl}} µ^T 1 − (1/2) µ^T S µ
s.t. Σ_{i=1}^l µ_ij y_ij = 0, j = 1, ..., n,  0 ≤ µ_ij ≤ 1/(nl), i = 1, ..., l, j = 1, ..., n,   (15)

where the matrix S = Y_d J G (2γ_A I + 2γ_I MG)^{-1} J^T Y_d. Again, the combined Gram matrix G = Σ_{v=1}^V β_v G_v and the integrated graph Laplacian M = Σ_{v=1}^V θ_v M_v. Because of the strong duality, the objective value of problem (11) is also the objective value of (13), which is W^R(a*, µ*). Therefore, we can rewrite (10) as

W(β, θ) = W^R(a*, µ*) + γ_B ‖β‖_2^2 + γ_C ‖θ‖_2^2
s.t. Σ_{v=1}^V β_v = 1, β_v ≥ 0; Σ_{v=1}^V θ_v = 1, θ_v ≥ 0, ∀v.   (16)

For fixed θ, the above problem can be rewritten with respect to β as

W(β) = β^T H β + γ_B ‖β‖_2^2 − h^T β
s.t. Σ_v β_v = 1, β_v ≥ 0, v = 1, ..., V,   (17)

where h = [h_1, ..., h_V]^T with each h_v = (a*)^T G_v J^T Y_d µ* − γ_A (a*)^T G_v a*, and H is a V×V matrix with entries H_ij = γ_I (a*)^T G_i M G_j a*. We can simply set the derivative of W(β) to zero and obtain β = (H + H^T + 2γ_B I)^{-1} h; the computed β is then projected onto the positive simplex to satisfy the summation and positivity constraints. However, such an approach lacks convergence guarantees and may lead to numerical problems. A coordinate descent algorithm is therefore used to solve (17). In each iteration round of the coordinate descent procedure, two elements β_i and β_j are selected to be updated while the others are fixed. By using the Lagrangian of problem (17), and considering that β_i + β_j does not change due to the constraint Σ_{v=1}^V β_v = 1, we have the following solution for updating β_i and β_j:

β_i* = [2γ_B (β_i + β_j) + (h_i − h_j) + 2t_ij] / [2(H_ii − H_ji − H_ij + H_jj) + 4γ_B],
β_j* = β_i + β_j − β_i*,   (18)

where t_ij = (H_ii − H_ji − H_ij + H_jj) β_i − Σ_k (H_ik − H_jk) β_k. The obtained β_i* or β_j* may violate the constraint β_v ≥ 0. Thus, we set

β_i* = 0, β_j* = β_i + β_j, if 2γ_B (β_i + β_j) + (h_i − h_j) + 2t_ij ≤ 0,
β_j* = 0, β_i* = β_i + β_j, if 2γ_B (β_i + β_j) + (h_j − h_i) + 2t_ji ≤ 0.

From the solution (18), we can see that the update criterion tends to assign a larger value β_i to a larger h_i and a smaller H_ii. Here, h_i = (a*)^T G_i J^T Y_d µ* − γ_A (a*)^T G_i a* and H_ii = γ_I (a*)^T G_i M G_i a* measure the discriminative ability and the performance of the i-th view. Let (a_i*, µ_i*) be the solution of the optimization problem of the i-th view, which is W^R(a, µ) with G = G_i. If all the solutions are the same, i.e., (a_1*, µ_1*) = ... = (a_V*, µ_V*) = (a*, µ*), then the objective value W^R(a_i*, µ_i*) of a discriminative view tends to be smaller than that of a non-discriminative view (we assume that all Gram matrices have been normalized). A smaller W^R(a_i*, µ_i*) corresponds to a larger h_i and a smaller H_ii, and thus our algorithm prefers discriminative views. However, the solutions (a_1*, µ_1*), ..., (a_V*, µ_V*) may not be exactly the same as (a*, µ*). Thus the learned β_i is in general, but not strictly, consistent with the performance of the i-th single view. We can see this in the experiments.

For fixed β, the problem (16) can be simplified as

W(θ) = s^T θ + γ_C ‖θ‖_2^2
s.t. Σ_v θ_v = 1, θ_v ≥ 0, v = 1, ..., V,   (19)

where s = [s_1, ..., s_V]^T with each s_v = γ_I (a*)^T G M_v G a*. Similarly, the solution of (19) can be obtained by using coordinate descent, and the criteria for updating θ_i and θ_j in an iteration round are given by

θ_i* = 0, θ_j* = θ_i + θ_j, if 2γ_C (θ_i + θ_j) + (s_j − s_i) ≤ 0,
θ_j* = 0, θ_i* = θ_i + θ_j, if 2γ_C (θ_i + θ_j) + (s_i − s_j) ≤ 0,
θ_i* = [2γ_C (θ_i + θ_j) + (s_j − s_i)] / (4γ_C), θ_j* = θ_i + θ_j − θ_i*, otherwise.   (20)
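A minimal sketch of the pairwise coordinate-descent step (18) for β is given below; the two boundary cases stated above are handled here by clipping the updated weight to [0, β_i + β_j]. The helper name and the clipping shortcut are ours.

```python
import numpy as np

def update_beta_pair(beta, h, H, i, j, gamma_b):
    """One pairwise coordinate-descent step of Eq. (18) on the simplex:
    only beta_i and beta_j change, and beta_i + beta_j is preserved."""
    s = beta[i] + beta[j]
    t_ij = (H[i, i] - H[j, i] - H[i, j] + H[j, j]) * beta[i] \
           - np.dot(H[i] - H[j], beta)                       # sum_k (H_ik - H_jk) beta_k
    num = 2 * gamma_b * s + (h[i] - h[j]) + 2 * t_ij
    den = 2 * (H[i, i] - H[j, i] - H[i, j] + H[j, j]) + 4 * gamma_b
    bi = min(max(num / den, 0.0), s)                          # clip to keep both weights non-negative
    beta = beta.copy()
    beta[i], beta[j] = bi, s - bi
    return beta
```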

We now summarize the learning procedure of the proposed multi-view vector-valued Laplacian SVM (MV3LSVM) in Algorithm 1.

Algorithm 1 The optimization procedure of the proposed MV3LSVM algorithm
Input: labeled data D_l = {(x_i^v, y_i)}_{i=1}^l and unlabeled data D_u = {x_i^v}_{i=l+1}^{N=u+l} from different views, v = 1, ..., V, where v is the view index
Algorithm parameters: γ_A, γ_I, γ_B and γ_C
Output: classifier variable a, the kernel combination coefficients {β_v}, and the graph Laplacian weights {θ_v}
1: Construct the Gram matrix G_v and the vector-valued graph Laplacian M_v for each view; set β_v = θ_v = 1/V, v = 1, ..., V; compute G = Σ_{v=1}^V β_v G_v and M = Σ_{v=1}^V θ_v M_v.
2: Iterate
3: Approximately solve for a through (14) with fixed G and M
4: Compute β by solving (17) and update the Gram matrix G
5: Compute θ by solving (19) and update the graph Laplacian M
6: Until convergence

The stopping criterion for terminating the algorithm can be the difference of the objective value W^R(a, µ) + γ_B ‖β‖_2^2 + γ_C ‖θ‖_2^2 between two consecutive steps. Alternatively, we can stop the iterations when the variations of β and θ are both smaller than a pre-defined threshold. Our implementation is based on the difference of the objective value, i.e., if the value |O_k − O_{k−1}| / |O_k − O_0| is smaller than a predefined threshold, then the iteration stops, where O_k is the objective value at the k-th iteration step.

D. Convergence analysis

In this section, we discuss the convergence of the proposed MV3LSVM algorithm. We first prove the convexity of the problems (11), (17) and (19) as follows.

Proof: The Hessian matrix of the objective function of (11) is He(a) = γ_A G + γ_I GMG. The Gram matrix G ∈ S^+_{Nn}, and we assume that G is positive definite in this paper (to enforce this property a small ridge is added to the diagonal of G). The second term is positive semi-definite since x^T GMG x = z^T M z ≥ 0 for any x and z = Gx. Here, we have used the property of the graph Laplacian M ∈ S^+_{Nn}. Then He(a) ∈ S^*_{Nn} for γ_A > 0 and problem (11) is strictly convex.

For the problem (17), the Hessian matrix is He(β) = H + γ_B I. The matrix H is symmetric since the element H_ij = γ_I a^T G_j M G_i a = H_ji. In addition, the Cholesky decomposition M = P^T P exists since M ∈ S^+_{Nn}. Let z_i = P G_i a; we then have H_ij = γ_I z_i^T z_j. Thus, H ∈ S^+_V and He(β) ∈ S^*_V for γ_B > 0. This means that (17) is also strictly convex.

Finally, it is straightforward to verify that the problem (19) is strictly convex for γ_C > 0. This completes the proof.

Now we discuss the convergence of our algorithm. Let the objective function of problem (9) be R(a, b, ξ, β, θ) and the initialized value be R(a^k, b^k, ξ^k, β^k, θ^k). Since the problem (11) is convex, we have R(a^{k+1}, b^{k+1}, ξ^{k+1}, β^k, θ^k) ≤ R(a^k, b^k, ξ^k, β^k, θ^k). We suppose that problem (9) is exactly solved, which means that the duality gap is zero. Then R(a^{k+1}, b^{k+1}, ξ^{k+1}, β^k, θ^k) = W(β^k, θ^k). For fixed θ^k, we obtain the convex problem (17), and thus we have R(a^{k+1}, b^{k+1}, ξ^{k+1}, β^{k+1}, θ^k) ≤ R(a^{k+1}, b^{k+1}, ξ^{k+1}, β^k, θ^k). Similarly, due to the convexity of problem (19), we have R(a^{k+1}, b^{k+1}, ξ^{k+1}, β^{k+1}, θ^{k+1}) ≤ R(a^{k+1}, b^{k+1}, ξ^{k+1}, β^{k+1}, θ^k). Therefore, the convergence of our algorithm is guaranteed.
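The outer loop of Algorithm 1 can be sketched as follows. The three inner solvers are passed in as callables and are not implemented here; the stopping rule uses the relative change of the objective value, as described above. This is a skeleton under our own naming, not the authors' code.

```python
import numpy as np

def mv3lsvm_outline(G_views, M_views, solve_dual, update_beta, update_theta,
                    max_iter=20, tol=1e-4):
    """Skeleton of Algorithm 1: alternate between the SVM variables and the view weights.

    solve_dual / update_beta / update_theta are placeholders for Eqs. (14)-(15),
    (17)-(18) and (19)-(20) respectively; they must be supplied by the caller.
    """
    V = len(G_views)
    beta = np.full(V, 1.0 / V)                        # step 1: uniform initialization
    theta = np.full(V, 1.0 / V)
    prev_obj = None
    for _ in range(max_iter):                         # step 2: iterate
        G = sum(b * Gv for b, Gv in zip(beta, G_views))
        M = sum(t * Mv for t, Mv in zip(theta, M_views))
        a, mu, obj = solve_dual(G, M)                 # step 3: a via (14)-(15) with G, M fixed
        beta = update_beta(a, mu, beta, G_views, M)   # step 4: solve (17), then refresh G
        theta = update_theta(a, G, M_views, theta)    # step 5: solve (19), then refresh M
        if prev_obj is not None and abs(obj - prev_obj) <= tol * abs(prev_obj):
            break                                     # step 6: stop on small objective change
        prev_obj = obj
    return a, beta, theta
```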

E. Complexity analysis

For the proposed MV3LSVM, the complexity is dominated by the time cost of computing a* in each iteration, where the computation of the matrix S in (15) involves an inversion and several multiplications of nN × nN matrices, and the time complexity is O(n^2.8 N^2.8) using the Strassen algorithm [44]. Problem (15) can be solved using a standard SVM solver with the time complexity O(n^2.3 l^2.3) according to sequential minimal optimization (SMO) [45]. The computations of β and θ are quite efficient since their dimensionality is V, which is usually very small (e.g., V = 7 in our experiments). Suppose the number of iterations is k; then the total cost of MV3LSVM is O(k(n^2.8 N^2.8 + n^2.3 l^2.3)). Considering that l < N, the time cost is O(k n^2.8 N^2.8), which is k times that of the case in which no combination coefficients (β and θ) are learned. From the experimental results shown in Section V-B, we find that k is very small, since our algorithm only needs a few iterations (around five) to converge. There is thus a balance between the time complexity and the classification accuracy. If only a limited number of unlabeled samples are selected to construct the input graph Laplacians, i.e., N = u + l is small, then the time complexity can be reduced with an acceptable performance sacrifice. In our experiments, we obtain satisfactory accuracy by setting N = 1000, and the time cost is acceptable.
V. EXPERIMENT

We validate the effectiveness of MV3LSVM on two challenging datasets, PASCAL VOC'07 (VOC) [14] and MIR Flickr (MIR) [15]. The VOC dataset contains 10,000 images labeled with 20 categories. The MIR dataset consists of 25,000 images of 38 concepts. For the PASCAL VOC'07 dataset [14], we use the standard train/test partition [14], which splits the 9,963 images into a training set of 5,011 images and a test set of 4,952 images. For the MIR Flickr dataset [15], images are randomly split into equally sized training and test sets. For both datasets, we randomly select twenty percent of the test images for validation and use the rest for testing. The parameters of all the algorithms compared in our experiments are tuned on the validation set. This means that the parameters corresponding to the best performance on the validation set are used for the transductive inference and the inductive test. From the training examples, 10 random choices of l ∈ {100, 200, 500} labeled samples are used in our experiments.

We use several visual views and the tag feature according to [2]. The visual views include SIFT features [3], local hue histograms [46], the global GIST descriptor [4] and some color histograms (RGB, HSV and LAB). The local descriptors (SIFT and hue) are computed densely on a multi-scale grid and quantized using k-means, which results in a visual word histogram for each image. Therefore, we have 7 different representations in total. We pre-compute a scalar-valued Gram matrix for each view and normalize it to unit trace. For the visual representations, the kernel is defined by k(x_i, x_j) = exp(−λ^{-1} d(x_i, x_j)), where d(x_i, x_j) denotes the distance between x_i and x_j and λ = max_{i,j} d(x_i, x_j). Following [2], we choose the L1 distance for the color histogram representations (RGB, HSV and LAB), L2 for GIST, and χ2 for the visual word histograms (SIFT and hue). For the tag features, a linear kernel k(x_i, x_j) = x_i^T x_j is constructed.
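A small sketch of this per-view kernel construction (exponential kernel on a precomputed distance matrix, normalized to unit trace); it assumes the pairwise distance matrix for a view has already been computed with the appropriate metric, and is an illustration rather than the original code.

```python
import numpy as np

def view_gram_matrix(D):
    """Per-view kernel k(x_i, x_j) = exp(-d(x_i, x_j) / lambda) with lambda = max_{i,j} d(x_i, x_j),
    followed by unit-trace normalization of the Gram matrix.

    D: (N, N) precomputed distance matrix (L1, L2 or chi-square, depending on the view).
    """
    lam = D.max()
    G = np.exp(-D / lam)
    return G / np.trace(G)
```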

Fig. 4. The mAP, mAUC and RL (left to right) performance enhancement obtained by learning the weights (β and θ) for different views: PASCAL VOC'07 (top) and MIR Flickr (bottom). MV3LSVM1: only learn β; MV3LSVM2: learn both β and θ.

A. Evaluation metrics

We use three kinds of evaluation criteria. The average precision (AP) and area under the ROC curve (AUC) are utilized to evaluate the ranking performance under each label. We also use the ranking loss (RL) to study the performance of label set prediction for each instance.

• Average Precision (AP) evaluates the fraction of samples ranked above a particular positive sample [47]. For each label, there is a ranked sequence of samples returned by the classifier. A good classifier will rank most of the positive samples higher than the negative ones. The traditional AP is defined as

AP = Σ_k P(k) / #{positive samples},

where k is a rank index of a positive sample and P(k) is the precision at the cut-off k. In this paper, we choose to use the computing method of the PASCAL VOC [14] challenge evaluation, i.e.,

AP = (1/11) Σ_r P(r),

where P(r) is the maximum precision over all recalls larger than r ∈ {0, 0.1, 0.2, ..., 1.0}. A larger value means a higher performance. In this paper, the mean AP over all labels, i.e. mAP, is reported to save space.

• Area Under the ROC Curve (AUC) evaluates the probability that a positive sample will be ranked higher than a negative one by a classifier [48]. It is computed from an ROC curve, which depicts the relative trade-offs between true positives (benefits) and false positives (costs). The AUC of a realistic classifier should be larger than 0.5. We refer to [48] for a detailed description. A larger value means a higher performance. Similar to AP, the mean AUC over all labels, i.e. mAUC, is reported.

• Ranking Loss (RL) evaluates the fraction of label pairs that are incorrectly ranked [19], [21]. Given a sample x_i and its label set Y_i, a successful classifier f(x, y) should have a larger value for y ∈ Y_i than for y ∉ Y_i. The ranking loss for the i-th sample is then defined as

RL(f, x_i) = (1 / (|Y_i| (P − |Y_i|))) |{(y_1, y_2) | f(x_i, y_1) ≤ f(x_i, y_2), y_1 ∈ Y_i, y_2 ∉ Y_i}|,

where P is the total number of labels and |·| denotes the cardinality of a set. The smaller the value, the higher the performance. The mean value over all samples is computed for evaluation.
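For reference, the two criteria with explicit formulas above (the 11-point AP and the ranking loss) can be computed as in the following sketch; mAP and the mean RL are then averages over labels and over samples respectively. This is a straightforward reading of the definitions, not the authors' evaluation code.

```python
import numpy as np

def pascal_ap(scores, labels):
    """11-point interpolated AP as in the PASCAL VOC evaluation:
    AP = (1/11) * sum over r in {0, 0.1, ..., 1.0} of the max precision at recall >= r."""
    order = np.argsort(-scores)
    tp = labels[order] > 0
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)
    recall = cum_tp / max(tp.sum(), 1)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recall >= r
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / 11.0

def ranking_loss(scores_row, y_row):
    """Fraction of incorrectly ordered (positive, negative) label pairs for one sample.
    scores_row: scores for all P labels of one sample; y_row: +1/-1 ground truth."""
    pos, neg = scores_row[y_row > 0], scores_row[y_row < 0]
    if len(pos) == 0 or len(neg) == 0:
        return 0.0
    bad = sum(p <= q for p in pos for q in neg)
    return bad / (len(pos) * len(neg))
```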


Fig. 5. Behavior of the objective values with increasing iteration number on the two datasets. Left: PASCAL VOC'07; Right: MIR Flickr.

Fig. 6. Performance in mAP, mAUC, and RL vs. randomly initialized (β, θ). The proposed model is insensitive to different initializations of (β, θ). Left: PASCAL VOC'07; Right: MIR Flickr.

B. Performance enhancement with multi-view learning

It has been shown in [8] that VVMR performs well for transductive semi-supervised multi-label classification and can provide a high-quality out-of-sample generalization. The proposed MV3MR framework is a multi-view generalization of VVMR that incorporates the advantage of MKL for handling multi-view data. Therefore, we first evaluate the effectiveness of learning the view combination weights using the proposed multi-view learning algorithm for transductive semi-supervised multi-label classification. An out-of-sample evaluation will be presented in the next subsection. The experimental setup of the two compared methods is given as follows.

• VVLSVM: the vector-valued Laplacian SVM, which is an SVM implementation of the vector-valued manifold regularization framework that exploits both the geometry of the input data and the label correlations. We do not use the vector-valued Laplacian RLS presented in [8] for comparison because the hinge loss is more suitable for classification. The parameters γ_A and γ_I in (2) are both optimized over the set {10^i | i = −8, −7, ..., −2, −1}. We set the parameter γ_O in (3) to 1.0 since it has been demonstrated empirically in [8] that with a larger γ_O the performance is usually better. The mean of the multiple Gram matrices and of the input graph Laplacians is pre-computed for the experiments. The numbers of nearest neighbors for constructing the input and output graph Laplacians are tuned on the sets {10, 20, ..., 100} and {2, 4, ..., 20} respectively.

• MV3LSVM: an SVM implementation of the proposed MV3MR framework that combines multiple views by constructing kernels for all views and learning their weights. We tune the parameters γ_A and γ_I as in VVLSVM, and γ_O is set to 1.0. The additional parameters γ_B and γ_C are optimized over {10^i | i = −8, −7, ..., −2, −1}. We first learn only the kernel combination weights β and set the graph weights θ to be uniform (MV3LSVM1 in Fig. 4). Then we learn both β and θ in MV3LSVM2. We use 20 and 6 nearest neighbor graphs to construct the input and output normalized graph Laplacians respectively for the VOC dataset, while 30 and 8 nearest neighbor graphs are used in the experiments on MIR. We set these hyperparameters to be the same as those in VVLSVM and no further optimization was attempted.

The experimental results on the two datasets are shown in Fig. 4. We can see that learning the combination weights using our algorithm is always superior to simply using uniform weights for the different views. We also find that when the number of labeled samples increases, the improvement becomes small. This is because the multi-view learning actually helps to approximate the underlying data distribution. This approximation can be steadily improved with the increase of the number of labeled samples, and thus the significance of the multi-view learning to the approximation gradually decreases. Besides, we observe that β has more influence on the final performance overall.

We show the behavior of the objective values with increasing iteration number in Fig. 5. From the figure, we can see that only a few iterations (about five) are necessary to obtain a satisfactory solution. Thus the time complexity is only a little more than that of the VVLSVM algorithm, which justifies the performance enhancement.

Finally, our algorithm is not sensitive to different initializations, as shown in Fig. 6. In particular, we run our algorithm with 10 random choices of β and θ. We show the performance in terms of mAP, mAUC and RL on the two datasets in Fig. 6. It can be observed that the performance curves do not vary a lot with different initializations.

Fig. 7. Transductive and inductive APs across outputs on the VOC (top) and MIR (bottom) datasets.

C. Out-of-sample generalization

The second set of experiments evaluates the out-of-sample extension quality of the MV3MR framework, and the SVM implementation is utilized. Fig. 7 compares the transductive performance to the inductive performance when using l = 200 labeled samples. We show a scatter plot of the AP scores for each label on the two datasets by using 10 random choices of labeled data. We can see that our algorithm generalizes well from the unlabeled set to the unseen set. The MV3MR framework inherits a strong natural out-of-sample generalization ability that many semi-supervised multi-label methods do not naturally have [8]. Besides, most graph-based semi-supervised learning algorithms are transductive, and additional induction schemes are necessary to handle new points [49].
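Out-of-sample (inductive) prediction only requires evaluating the expansion f(x) = Σ_i K(x, x_i) a_i at a new image, with K(x, x_i) = (Σ_v β_v k_v(x, x_i)) Q. A sketch, with the array shapes stated as assumptions:

```python
import numpy as np

def predict_out_of_sample(k_new, beta, Q, A, b=None):
    """Inductive prediction f(x) = sum_i K(x, x_i) a_i for one unseen image x.

    k_new: (V, N) scalar kernel values between x and the N training points, one row per view;
    beta:  (V,) learned kernel weights; Q: (n, n) output operator from Eq. (3);
    A:     (N, n) matrix whose rows are the learned coefficients a_i; b: optional (n,) bias.
    """
    k_comb = beta @ k_new                 # combined scalar kernel values, shape (N,)
    f = Q @ (A.T @ k_comb)                # Q * sum_i k(x, x_i) a_i, shape (n,)
    return f if b is None else f + b
```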

D. Analysis of the combination coefficients in multi-view learning

[Fig. 8 graphic: bar charts of the learned weights over the seven views (DenseHue, DenseSift, Gist, Hsv, Lab, Rgb, Tag); only the caption is reproduced here.]

Fig. 8. The view combination weights β and θ learned by MV3MR, as well as the mAP of using VVLSVM for each view; Top: PASCAL VOC'07; Bottom: MIR Flickr.

In the following, we present empirical analyses of the multi-view learning procedure. In Fig. 8, we select l = 200 and present the view combination coefficients β and θ learned by MV3LSVM, together with the mAP obtained by using VVLSVM for each single view. From the results, we find that the tendencies of the kernel and graph weights are both consistent with the corresponding mAP in general, i.e., the views with a higher classification performance tend to be assigned larger weights, taking the DenseSift visual view (the 2nd view) and the tag (the last view) for example. However, a larger weight may sometimes be assigned to a less discriminative view; for example, the weight of Hsv (the 4th view) is larger than the weight of DenseSift (the 2nd view). This is mainly because the coefficient a is not optimal for every single view, for which only G_v and M_v are utilized. The learned a minimizes the optimization problem (8) by using the combined Gram matrix G and the integrated graph Laplacian M, which means that the learned vector-valued function is smooth along the combined RKHS and the integrated manifold. In this way, the proposed algorithm effectively exploits the complementary property of different views.

E. Comparisons with multi-label and multi-kernel learning algorithms

Our last set of experiments compares MV3LSVM with several competitive multi-label methods as well as a well-known MKL algorithm in predicting the unknown labels of the unlabeled data. The out-of-sample generalization ability of our method has been verified in our second set of experiments. We specifically compare MV3LSVM with the following methods on the challenging VOC and MIR datasets:

• SVM CAT: concatenating the features of all views and then running a standard SVM. The parameter C is tuned on the set {10^i | i = −1, 0, ..., 7, 8}. The time complexity is O(n l^2.3) [45].

• SVM UNI: combining the different kernels with uniform weights and then running a standard SVM. The parameter C is tuned on the set {10^i | i = −1, 0, ..., 7, 8}. The time complexity is O(n l^2.3) [45].

• MLCS [18]: a multi-label compressed sensing algorithm that takes advantage of the sparsity of the labels. We choose the label compression ratio to be 1.0 since the number of labels n is not very large. The mean of the multiple kernels from different views is pre-computed for the experiments. Suppose the length of the compressed label vector (for each sample) is r ≤ n. The training cost is then O(n l^3) if we choose the regression algorithm to be least squares [50], and the reconstruction complexity is O(l(n^3 + r n^2)) if the least angle regression (LARS) algorithm [51] is utilized. Considering that r ≤ n ≤ l in this paper, the time complexity of MLCS is O(n l^3).

• KLS CCA [12]: a least-squares formulation of kernelized canonical correlation analysis for multi-label classification. The ridge parameter is chosen from the candidate set {0, 10^i | i = −3, −2, ..., 2, 3}. The mean of the multiple Gram matrices is pre-computed to run the algorithm. According to the discussion presented in [12], the time complexity is O(n^2 l + kn(3l + 5d + 2dl)), where d is the feature dimensionality and k is the number of iterations.

• SimpleMKL [16]: a popular SVM-based MKL algorithm that determines the combination of multiple kernels by a reduced gradient descent algorithm. The penalty factor C is tuned on the set {10^i | i = −1, 0, ..., 7, 8}. We apply SimpleMKL to multi-label classification by learning a binary classifier for each label. According to Algorithm 1 presented in [16], there is an outer loop for updating the kernel weights, as well as an inner loop to determine the maximal admissible step size in the reduced gradient descent. Suppose the numbers of outer and inner iterations are k_1 and k_2 respectively; the time complexity of SimpleMKL is then approximately O(n k_1 k_2 l^2.3), where we have ignored the time cost of the SVM solver in the inner loop since it has a warm start and can be very fast [16].

• LpMKL [17]: a recently proposed MKL algorithm that extends MKL to the l_p-norm with p ≥ 1. The penalty factor C is tuned on the set {10^i | i = −1, 0, ..., 7, 8}, and we choose the p norm from the set {1, 8/7, 4/3, 2, 4, 8, 16, ∞}. According to Algorithm 1 presented in [17], the time complexity is O(n k l^2.3), where k is the number of iterations, since the kernel combination coefficients can be computed analytically.

The performance of the compared methods on the VOC and MIR datasets is reported in Table I. The values in the last column of Table I are average ranks. From the results, we first observe that the performance keeps improving with the increasing number of labeled samples. Second, the performance of the SimpleMKL algorithm, which learns the kernel weights for SVM, can be inferior to the multi-label algorithms with the mean kernel in many cases. MV3LSVM is superior to the multi-view (SimpleMKL and LpMKL) and multi-label algorithms in general and consistently outperforms the other methods in terms of mAP. The average rank of our algorithm is smaller than that of all the other methods in terms of all three criteria. According to the Friedman test [52], the statistics F_F of mAP, mAUC and RL are 56.05, 3.03, and 5.69 respectively. All of them are larger than the critical value F(6, 30) = 2.42, so we reject the null hypothesis (that the compared algorithms perform equally well). In particular, by comparing with SimpleMKL, we obtain a significant 8.1% mAP improvement on VOC when using 100 labeled samples. The level of improvement drops when more labeled samples are available, for the same reason described in our first set of experiments.
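The F_F values reported above can be computed from the per-setting rank columns of Table I using the Friedman statistic with the Iman-Davenport correction; with 7 methods and 6 dataset/label-budget settings this is compared against F(6, 30). The following sketch follows the standard procedure for such comparisons and is our reading rather than the authors' script.

```python
import numpy as np

def friedman_ff(ranks):
    """Friedman statistic with the Iman-Davenport correction (F_F).

    ranks: (N_settings, k_methods) array of per-setting ranks, as in Table I.
    """
    N, k = ranks.shape
    R = ranks.mean(axis=0)                                     # average rank of each method
    chi2 = 12.0 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
    return (N - 1) * chi2 / (N * (k - 1) - chi2)               # compare against F(k-1, (k-1)(N-1))
```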

TABLE I PERFORMANCE EVALUATION ON THE TWO DATASETS

mAP ↑ vs. #{labeled samples}
Methods | VOC 100 | VOC 200 | VOC 500 | MIR 100 | MIR 200 | MIR 500 | Ranks
SVM CAT | 0.241±0.011 (7) | 0.288±0.013 (7) | 0.371±0.007 (7) | 0.281±0.009 (7) | 0.306±0.007 (7) | 0.352±0.008 (7) | 7
SVM UNI | 0.347±0.018 (4.5) | 0.424±0.014 (5) | 0.529±0.006 (5) | 0.302±0.011 (5) | 0.336±0.013 (6) | 0.400±0.009 (6) | 5.25
MLCS [18] | 0.332±0.017 (6) | 0.412±0.016 (6) | 0.525±0.007 (6) | 0.289±0.010 (6) | 0.342±0.011 (5) | 0.424±0.010 (5) | 5.67
KLS CCA [12] | 0.347±0.019 (4.5) | 0.432±0.014 (4) | 0.536±0.007 (4) | 0.321±0.009 (3.5) | 0.369±0.017 (2) | 0.445±0.009 (2.5) | 3.42
MV3LSVM | 0.412±0.025 (1) | 0.476±0.015 (1) | 0.555±0.006 (1) | 0.332±0.013 (1) | 0.376±0.017 (1) | 0.449±0.008 (1) | 1
SimpleMKL [16] | 0.381±0.024 (3) | 0.453±0.020 (3) | 0.538±0.011 (3) | 0.321±0.014 (3.5) | 0.365±0.017 (4) | 0.444±0.011 (2.5) | 3.17
LpMKL [17] | 0.391±0.024 (2) | 0.462±0.012 (2) | 0.540±0.006 (2) | 0.327±0.010 (2) | 0.367±0.014 (3) | 0.436±0.008 (4) | 2.5

mAUC ↑ vs. #{labeled samples}
Methods | VOC 100 | VOC 200 | VOC 500 | MIR 100 | MIR 200 | MIR 500 | Ranks
SVM CAT | 0.744±0.013 (7) | 0.785±0.006 (7) | 0.832±0.003 (7) | 0.722±0.008 (4) | 0.745±0.004 (6) | 0.783±0.004 (6) | 6.17
SVM UNI | 0.783±0.008 (3) | 0.824±0.009 (2.5) | 0.870±0.003 (2.5) | 0.718±0.011 (5) | 0.742±0.011 (7) | 0.782±0.006 (7) | 4.5
MLCS [18] | 0.773±0.010 (5) | 0.819±0.010 (6) | 0.869±0.004 (4) | 0.701±0.012 (7) | 0.749±0.010 (5) | 0.805±0.005 (1.5) | 4.42
KLS CCA [12] | 0.781±0.009 (4) | 0.824±0.008 (2.5) | 0.866±0.003 (5) | 0.737±0.009 (2) | 0.769±0.010 (1.5) | 0.805±0.005 (1.5) | 2.75
MV3LSVM | 0.801±0.011 (1) | 0.835±0.011 (1) | 0.875±0.004 (1) | 0.741±0.014 (1) | 0.769±0.012 (1.5) | 0.802±0.005 (3.5) | 1.5
SimpleMKL [16] | 0.769±0.017 (6) | 0.822±0.013 (4.5) | 0.870±0.006 (2.5) | 0.717±0.013 (6) | 0.753±0.010 (4) | 0.802±0.005 (3.5) | 4.42
LpMKL [17] | 0.786±0.008 (2) | 0.822±0.008 (4.5) | 0.862±0.005 (6) | 0.732±0.010 (3) | 0.756±0.010 (3) | 0.795±0.007 (5) | 3.92

RL ↓ vs. #{labeled samples}
Methods | VOC 100 | VOC 200 | VOC 500 | MIR 100 | MIR 200 | MIR 500 | Ranks
SVM CAT | 0.220±0.008 (7) | 0.183±0.006 (7) | 0.142±0.003 (7) | 0.165±0.005 (3) | 0.146±0.004 (5) | 0.126±0.002 (5) | 5.67
SVM UNI | 0.178±0.008 (2) | 0.143±0.006 (3) | 0.106±0.003 (1.5) | 0.549±0.040 (7) | 0.437±0.022 (7) | 0.177±0.011 (7) | 4.58
MLCS [18] | 0.195±0.007 (5) | 0.155±0.007 (5) | 0.112±0.004 (4) | 0.173±0.006 (5) | 0.145±0.005 (4) | 0.115±0.003 (2.5) | 4.25
KLS CCA [12] | 0.183±0.008 (3) | 0.149±0.006 (4) | 0.122±0.005 (5) | 0.168±0.013 (4) | 0.143±0.005 (3) | 0.121±0.004 (4) | 3.83
MV3LSVM | 0.170±0.007 (1) | 0.140±0.007 (1) | 0.108±0.003 (3) | 0.150±0.006 (1) | 0.130±0.007 (1) | 0.111±0.003 (1) | 1.33
SimpleMKL [16] | 0.214±0.017 (6) | 0.142±0.009 (2) | 0.106±0.004 (1.5) | 0.155±0.005 (2) | 0.136±0.006 (2) | 0.115±0.003 (2.5) | 2.67
LpMKL [17] | 0.186±0.011 (4) | 0.164±0.010 (6) | 0.137±0.007 (6) | 0.199±0.014 (6) | 0.181±0.007 (6) | 0.141±0.004 (6) | 5.67

(↑ indicates "the larger the better"; ↓ indicates "the smaller the better". Mean and std. are reported. The best result is highlighted in boldface.)

VI. CONCLUSION AND DISCUSSION

Most of the existing works on multi-label image classification use only a single feature representation, and the multiple-feature methods usually assume that a single label is assigned to an image. However, an image is usually associated with multiple labels, and different kinds of features are necessary to describe the image properly. Therefore, we have developed a multi-view vector-valued manifold regularization (MV3MR) framework for multi-label image classification in which images are naturally characterized by multiple views. MV3MR combines different kinds of features in the learning process of the vector-valued function for multi-label classification. We also derived an SVM formulation of MV3MR, which results in MV3LSVM. The new algorithm effectively exploits the label correlations and learns the view weights to integrate the consistency and complementary properties of different views. We evaluated the proposed algorithm in terms of three popular criteria, i.e. mAP, mAUC and RL. Intensive experiments on two challenging datasets, PASCAL VOC'07 and MIR Flickr, show that the support vector machine based implementation under MV3MR outperforms the traditional multi-label algorithms as well as a well-known multiple kernel learning method. Furthermore, our method provides a strategy for learning from multiple views in multi-label classification and can be extended to other multi-label algorithms.

APPENDIX A
PROOF OF LEMMA 1

Proof: The matrix M = Σ_v θ_v M_v = Σ_v θ_v (L_v ⊗ I_n) = L ⊗ I_n, where L = Σ_v θ_v L_v is defined as a convex combination of the scalar-valued graph Laplacians constructed from different views. L ∈ S^+_N since each L_v ∈ S^+_N, and thus we have M ∈ S^+_{Nn} according to the positive semi-definiteness property of the Kronecker product. Here, L = Σ_v θ_v (D_v − W_v) can be computed by using the following adjacency graph:

W_ij = Σ_v θ_v W_{vij}, if x_i ∈ N(x_j) or x_j ∈ N(x_i), and W_ij = 0 otherwise,

where N(x) denotes the set that contains the k nearest neighbors of x and W_{vij} is the similarity between the i-th and j-th points in the v-th view. Thus L is a graph Laplacian and M is the corresponding vector-valued graph Laplacian.

APPENDIX B
PROOF OF THE REPRESENTER THEOREM

Proof: It has been presented in Section IV-A that there is an RKHS H_K associated with the vector-valued kernel K. The probability distribution is assumed to be supported on a manifold M in the manifold regularization framework. We now denote S = {Σ_i K(x_i, ·) a_i | x_i ∈ M, a_i ∈ Y} as the linear space spanned by the kernels centered at the points on M. Any function f ∈ H_K can be decomposed as f = f_k + f_⊥, with f_k ∈ S and f_⊥ ∈ S^⊥. It has been proved in Lemma 1 that M is a graph Laplacian. Thus we can use M to induce an intrinsic norm ‖·‖_I, which satisfies ‖f‖_I = ‖g‖_I for any f, g ∈ H_K with (f − g)|_M ≡ 0. According to the reproducing property, it follows that f_⊥ vanishes on M [9]. This means that for any x ∈ M, we have f(x) = f_k(x), and then ‖f‖_I = ‖f_k‖_I. Besides, ‖f‖_K^2 = ‖f_k‖_K^2 + ‖f_⊥‖_K^2 ≥ ‖f_k‖_K^2, and thus we conclude that the minimizer of problem (6) must lie in S for fixed β and θ. Furthermore, because M is approximated by the Laplacian of the graph constructed by the labeled and unlabeled samples, we have S = {Σ_{i=1}^{l+u} K(x_i, ·) a_i | a_i ∈ Y}. This completes the proof.
APPENDIX B
PROOF OF THE REPRESENTER THEOREM

Proof: It has been shown in Section IV-A that there is an RKHS $\mathcal{H}_K$ associated with the vector-valued kernel $K$. In the manifold regularization framework, the probability distribution is assumed to be supported on a manifold $\mathcal{M}$. We now denote by $S = \{\sum_i K(x_i, \cdot)\, a_i \,|\, x_i \in \mathcal{M},\, a_i \in \mathcal{Y}\}$ the linear space spanned by the kernels centered at points on $\mathcal{M}$. Any function $f \in \mathcal{H}_K$ can be decomposed as $f = f_k + f_\perp$, with $f_k \in S$ and $f_\perp \in S^{\perp}$. It has been proved in Lemma 1 that $M$ is a graph Laplacian. Thus we can use $M$ to induce an intrinsic norm $\|\cdot\|_I$, which satisfies $\|f\|_I = \|g\|_I$ for any $f, g \in \mathcal{H}_K$ with $(f - g)|_{\mathcal{M}} \equiv 0$. According to the reproducing property, we conclude that $f_\perp$ vanishes on $\mathcal{M}$ [9]. This means that for any $x \in \mathcal{M}$ we have $f(x) = f_k(x)$, and hence $\|f\|_I = \|f_k\|_I$. Besides, $\|f\|_K^2 = \|f_k\|_K^2 + \|f_\perp\|_K^2 \geq \|f_k\|_K^2$, and thus the minimizer of problem (6) must lie in $S$ for fixed $\beta$ and $\theta$. Furthermore, because $\mathcal{M}$ is approximated by the Laplacian of the graph constructed from the labeled and unlabeled samples, we have $S = \{\sum_{i=1}^{l+u} K(x_i, \cdot)\, a_i \,|\, a_i \in \mathcal{Y}\}$. This completes the proof.
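Spelled out, the consequence of the theorem (writing the minimizer as $f^{*}$, a symbol introduced here only for illustration) is that the solution of problem (6) admits the finite expansion

$$
f^{*}(\cdot) = \sum_{i=1}^{l+u} K(x_i, \cdot)\, a_i, \qquad a_i \in \mathcal{Y},
$$

so training reduces to estimating the $l+u$ coefficient vectors $a_i$ over the labeled and unlabeled samples rather than searching over the whole RKHS $\mathcal{H}_K$.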
REFERENCES

[1] M. Boutell, J. Luo, X. Shen, and C. Brown, "Learning multi-label scene classification," Pattern Recognition, vol. 37, no. 9, pp. 1757–1771, 2004.
[2] M. Guillaumin, J. Verbeek, and C. Schmid, "Multimodal semi-supervised learning for image classification," in IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 902–909.
[3] D. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[4] A. Oliva and A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope," International Journal of Computer Vision, vol. 42, no. 3, pp. 145–175, 2001.
[5] E. Çesmeli and D. Wang, "Texture segmentation using Gaussian-Markov random fields and neural oscillator networks," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 394–404, 2001.
[6] D. Masip and J. Vitrià, "Shared feature extraction for nearest neighbor face recognition," IEEE Transactions on Neural Networks, vol. 19, no. 4, pp. 586–595, 2008.
[7] C. Micchelli and M. Pontil, "On learning vector-valued functions," Neural Computation, vol. 17, no. 1, pp. 177–204, 2005.
[8] H. Minh and V. Sindhwani, "Vector-valued manifold regularization," in Proceedings of the 28th International Conference on Machine Learning, 2011, pp. 57–64.
[9] M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," The Journal of Machine Learning Research, vol. 7, pp. 2399–2434, 2006.
[10] G. Chen, Y. Song, F. Wang, and C. Zhang, "Semi-supervised multi-label learning by solving a Sylvester equation," in SIAM International Conference on Data Mining, 2008, pp. 410–419.
[11] B. Hariharan, L. Zelnik-Manor, S. Vishwanathan, and M. Varma, "Large scale max-margin multi-label classification with priors," in Proceedings of the 27th International Conference on Machine Learning, 2010, pp. 423–430.
[12] L. Sun, S. Ji, and J. Ye, "Canonical correlation analysis for multilabel classification: A least-squares formulation, extensions, and analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 194–200, 2011.
[13] L. Rosasco, E. Vito, A. Caponnetto, M. Piana, and A. Verri, "Are loss functions all the same?" Neural Computation, vol. 16, no. 5, pp. 1063–1076, 2004.
[14] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results."
[15] M. J. Huiskes and M. S. Lew, "The MIR Flickr retrieval evaluation," in Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, 2008, pp. 39–43.
[16] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, "SimpleMKL," Journal of Machine Learning Research, vol. 9, pp. 2491–2521, 2008.
[17] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien, "Lp-norm multiple kernel learning," The Journal of Machine Learning Research, vol. 12, pp. 953–997, 2011.
[18] D. Hsu, S. Kakade, J. Langford, and T. Zhang, "Multi-label prediction via compressed sensing," in Advances in Neural Information Processing Systems, 2009, pp. 772–780.
[19] R. Schapire and Y. Singer, "BoosTexter: A boosting-based system for text categorization," Machine Learning, vol. 39, no. 2, pp. 135–168, 2000.
[20] A. Elisseeff and J. Weston, "A kernel method for multi-labelled classification," in Advances in Neural Information Processing Systems, 2001, pp. 681–687.
[21] M. Zhang and Z. Zhou, "ML-KNN: A lazy learning approach to multi-label learning," Pattern Recognition, vol. 40, no. 7, pp. 2038–2048, 2007.
[22] A. McCallum, "Multi-label text classification with a mixture model trained by EM," in AAAI Workshop on Text Learning, 1999.
[23] L. Sun, S. Ji, and J. Ye, "Hypergraph spectral learning for multi-label classification," in Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining, 2008, pp. 668–676.
[24] S. Ji, L. Tang, S. Yu, and J. Ye, "A shared-subspace learning framework for multi-label classification," ACM Transactions on Knowledge Discovery from Data, vol. 4, no. 2, pp. 1–29, 2010.
[25] S. Zafeiriou, G. Tzimiropoulos, M. Petrou, and T. Stathaki, "Regularized kernel discriminant analysis with a robust kernel for face recognition and verification," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 3, pp. 526–534, 2012.
[26] G. Lanckriet, N. Cristianini, P. Bartlett, L. Ghaoui, and M. Jordan, "Learning the kernel matrix with semidefinite programming," in Proceedings of the 19th International Conference on Machine Learning, 2002, pp. 323–330.
[27] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large scale multiple kernel learning," The Journal of Machine Learning Research, vol. 7, pp. 1531–1565, 2006.
[28] K. Kreutz-Delgado, J. Murray, B. Rao, K. Engan, T. Lee, and T. Sejnowski, "Dictionary learning algorithms for sparse representation," Neural Computation, vol. 15, no. 2, pp. 349–396, 2003.
[29] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in IEEE Conference on Computer Vision and Pattern Recognition, 2006, pp. 2169–2178.
[30] K. Labusch, E. Barth, and T. Martinetz, "Simple method for high-performance digit recognition based on sparse coding," IEEE Transactions on Neural Networks, vol. 19, no. 11, pp. 1985–1989, 2008.
[31] X. Zhou, K. Yu, T. Zhang, and T. Huang, "Image classification using super-vector coding of local image descriptors," in Proceedings of the 11th European Conference on Computer Vision, 2010, pp. 141–154.
[32] Y. Yang and S. Newsam, "Spatial pyramid co-occurrence for image classification," in IEEE International Conference on Computer Vision, 2011, pp. 1465–1472.
[33] H. Su, M. Sun, L. Fei-Fei, and S. Savarese, "Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories," in IEEE International Conference on Computer Vision, 2009, pp. 213–220.
[34] Y. Fu, Y. Guo, Y. Zhu, F. Liu, C. Song, and Z. Zhou, "Multi-view video summarization," IEEE Transactions on Multimedia, vol. 12, no. 7, pp. 717–729, 2010.
[35] A. Iosifidis, A. Tefas, and I. Pitas, "View-invariant action recognition based on artificial neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 3, pp. 412–424, 2012.
[36] A. Saffari, C. Leistner, M. Godec, and H. Bischof, "Robust multi-view boosting with priors," in Proceedings of the 11th European Conference on Computer Vision, 2010, pp. 776–789.
[37] D. Zhang, F. Wang, L. Si, and T. Li, "Maximum margin multiple instance clustering with its applications to image and text clustering," IEEE Transactions on Neural Networks, vol. 22, no. 5, pp. 739–751, 2011.
[38] H. Zhang, G. Liu, T. Chow, and W. Liu, "Textual and visual content-based anti-phishing: A Bayesian approach," IEEE Transactions on Neural Networks, vol. 22, no. 10, pp. 1532–1546, 2011.
[39] S. Bucak, R. Jin, and A. Jain, "Multi-label learning with incomplete class assignments," in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2801–2808.
[40] X. Chen, X. Yuan, Q. Chen, S. Yan, and T. Chua, "Multi-label visual classification with label exclusive context," in IEEE International Conference on Computer Vision, 2011, pp. 834–841.
[41] Z. Fan, Y. Xu, and D. Zhang, "Local linear discriminant analysis framework using sample neighbors," IEEE Transactions on Neural Networks, vol. 22, no. 7, pp. 1119–1132, 2011.
[42] D. Bouchaffra, "Mapping dynamic Bayesian networks to α-shapes: Application to human faces identification across ages," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 8, pp. 1229–1241, 2012.

[43] L. Chen, I. Tsang, and D. Xu, "Laplacian embedded regression for scalable manifold regularization," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 902–915, 2012.
[44] V. Strassen, "Gaussian elimination is not optimal," Numerische Mathematik, vol. 13, no. 4, pp. 354–356, 1969.
[45] J. Platt, "Fast training of support vector machines using sequential minimal optimization," in Advances in Kernel Methods, 1999, pp. 185–208.
[46] J. Van De Weijer and C. Schmid, "Coloring local feature extraction," in European Conference on Computer Vision, 2006, pp. 334–348.
[47] M. Zhu, "Recall, precision and average precision," University of Waterloo, Tech. Rep., 2004.
[48] T. Fawcett, "ROC graphs: Notes and practical considerations for researchers," Machine Learning, vol. 31, pp. 1–38, 2004.
[49] X. Zhu, "Semi-supervised learning literature survey," University of Wisconsin–Madison, Tech. Rep., 2006.
[50] M. Gori, "On the time complexity of regularized least square," in WIRN, 2011, pp. 85–96.
[51] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," The Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.
[52] J. Demšar, "Statistical comparisons of classifiers over multiple data sets," The Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.