Correlated Logistic Model with Elastic Net Regularization for Multilabel Image Classification

Qiang Li, Bo Xie, Jane You, Wei Bian, Member, IEEE, and Dacheng Tao, Fellow, IEEE
Abstract—In this paper, we present the correlated logistic model (CorrLog) for multilabel image classification. CorrLog extends the conventional logistic regression model to multilabel cases by explicitly modelling the pairwise correlations between labels. In addition, we propose to learn the model parameters of CorrLog with elastic net regularization, which helps exploit the sparsity in feature selection and label correlations and thus further boosts the performance of multilabel classification. CorrLog can be efficiently learned, though approximately, by regularized maximum pseudo likelihood estimation (MPLE), and it enjoys a satisfying generalization bound that is independent of the number of labels. CorrLog performs competitively for multilabel image classification on the benchmark datasets MULAN scene, MIT outdoor scene, PASCAL VOC 2007 and PASCAL VOC 2012, compared to state-of-the-art multilabel classification algorithms.

Index Terms—Correlated logistic model, elastic net, multilabel classification.

This research was supported in part by Australian Research Council Projects FT-130101457, DP-140102164 and LE-140100061. The funding support from the Hong Kong government under its General Research Fund (GRF) scheme (Ref. no. 152202/14E) and the Hong Kong Polytechnic University Central Research Grant is greatly appreciated.
Q. Li is with the Centre for Quantum Computation and Intelligent Systems, Faculty of Engineering and Information Technology, University of Technology Sydney, 81 Broadway, Ultimo, NSW 2007, Australia, and also with the Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong (e-mail: [email protected]).
W. Bian and D. Tao are with the Centre for Quantum Computation and Intelligent Systems, Faculty of Engineering and Information Technology, University of Technology Sydney, 81 Broadway, Ultimo, NSW 2007, Australia (e-mail: [email protected], [email protected]).
B. Xie is with the College of Computing, Georgia Institute of Technology, Atlanta, GA 30345, USA (e-mail: [email protected]).
J. You is with the Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong (e-mail: [email protected]).

I. INTRODUCTION

Multilabel classification (MLC) extends conventional single-label classification (SLC) by allowing an instance to be assigned multiple labels from a label set. It arises naturally in a wide range of practical problems, such as document categorization, image classification, music annotation, web-page classification and bioinformatics applications, where each instance can be simultaneously described by several class labels out of a candidate label set. MLC is also closely related to many other research areas, such as subspace learning [1], nonnegative matrix factorization [2], multi-view learning [3] and multi-task learning [4]. Because of its great generality and wide applicability, MLC has received increasing attention in recent years from the machine learning, data mining and computer vision communities, and has developed rapidly with both algorithmic and theoretical achievements [5], [6], [7], [8], [9], [10].

The key feature of MLC that makes it distinct from SLC is label correlation, without which classifiers can be trained independently for each individual label and MLC degenerates to SLC. The correlation between different labels can be verified by calculating statistics of their distributions, e.g., the χ² test or Pearson's correlation coefficient. According to [11], there are two types of label correlations (or dependence), i.e., conditional correlations and unconditional correlations: the former describes label correlations conditioned on a given instance, while the latter summarizes global correlations of the label distribution alone, obtained by marginalizing out the instance. From a classification point of view, modelling conditional label correlations is preferable since they are directly related to prediction; proper utilization of unconditional correlations is also helpful, but only in an average sense because of the marginalization. Accordingly, quite a number of MLC algorithms have been proposed in the past few years¹ by exploiting either of the two types of label correlations, and below we give a brief review of representative ones. As the literature is very large, we cannot cover all algorithms; the recent surveys [8], [9] contain many references omitted from this paper.

¹Studies on MLC from perspectives other than label correlations also exist in the literature, e.g., those defining different loss functions, dimension reduction and classifier ensemble methods, but they are not in the scope of this paper.

• By exploiting unconditional label correlations: A large class of MLC algorithms that utilize unconditional label correlations are built upon label transformation.
The key idea is to find a new representation for the label vector (one dimension corresponds to an individual label), so that the transformed labels or responses are uncorrelated and thus can be predicted independently; the original label vector is then recovered after prediction. MLC algorithms using label transformation include [12], which utilizes low-dimensional embedding, and [7] and [13], which use random projections. Another strategy for using unconditional label correlations, e.g., as used in the stacking method [6] and the "Curds and Whey" procedure [14], is to first predict each individual label independently and then correct/adjust the predictions by proper post-processing. Algorithms have also been proposed based on co-occurrence or structure information extracted from the label set,
which include random k-label sets (RAKEL) [15], pruned problem transformation (PPT) [16], hierarchical binary relevance (HBR) [17] and the hierarchy of multilabel classifiers (HOMER) [8]. Regression-based models, including reduced-rank regression and multitask learning, can also be used for MLC, with an interpretation of utilizing unconditional label correlations [11].

• By exploiting conditional label correlations: MLC algorithms in this category are diverse and often developed from specific heuristics. For example, multilabel K-nearest neighbour (MLkNN) [5] extends kNN to the multilabel setting, applying maximum a posteriori (MAP) label prediction with a prior label distribution obtained from the K nearest neighbours of an instance. Instance-based logistic regression (IBLR) [6] is also a localized algorithm, which modifies logistic regression by using label information from the neighbourhood as features. The classifier chain (CC) [18], as well as its ensemble and probabilistic variants [19], incorporates label correlations into a chain of binary classifiers, where the prediction of a label uses the previous labels as features. Channel coding based MLC techniques such as principal label space transformation (PLST) [20] and maximum margin output coding (MMOC) [21] select codes that exploit conditional label correlations. Graphical models, e.g., conditional random fields (CRFs) [22], have also been applied to MLC and provide a richer framework for handling conditional label correlations.

A. Multilabel Image Classification

Multilabel image classification belongs to the generic scope of MLC, but handles the specific problem of predicting the presence or absence of multiple object categories in an image.
Like many related high-level vision tasks, such as object recognition [23], [24], visual tracking [25], image annotation [26], [27], [28] and scene classification [29], [30], [31], multilabel image classification [32], [33], [34], [35], [36], [37] is very challenging due to large intra-class variation. In general, the variation is caused by viewpoint, scale, occlusion, illumination, semantic context, etc.

On the one hand, many effective image representation schemes have been developed to handle this high-level vision task. Most classical approaches derive from handcrafted image features, such as GIST [38], dense SIFT [39], VLAD [40] and object bank [41]. In contrast, very recent deep learning techniques have also been developed for image feature learning, such as deep CNN features [42], [43]. These techniques are more powerful than classical methods when learning from a very large amount of unlabeled images.

On the other hand, label correlations have also been exploited to significantly improve image classification performance. Most current multilabel image classification algorithms are motivated by considering label correlations conditioned on image features, and thus intrinsically fall into the CRFs framework. For example, the probabilistic label enhancement model (PLEM) [44] exploits image label co-occurrence pairs based on a maximum spanning tree construction, and a piecewise procedure is utilized to train the pairwise CRFs model. More recently, the clique generating machine (CGM) [45] proposed to learn the image label graph structure and parameters by iteratively activating a set of cliques. It also belongs to the CRFs framework, but the labels are not constrained to be all connected, which may result in isolated cliques.

B. Motivation and Organization

The correlated logistic model (CorrLog) provides a more principled way to handle conditional label correlations and enjoys several favourable properties: 1) built upon independent logistic regressions (ILRs), it offers an explicit way to model pairwise (second order) label correlations; 2) by using the pseudo likelihood technique, the parameters of CorrLog can be learned approximately with a computational complexity linear with respect to the label number; 3) the learning of CorrLog is stable, and the empirically learned model enjoys a generalization error bound that is independent of the label number. In addition, the results presented in this paper extend our previous study [46] in the following aspects: 1) we introduce elastic net regularization to CorrLog, which facilitates the utilization of sparsity in both feature selection and label correlations; 2) a learning algorithm for CorrLog based on soft thresholding is derived to handle the nonsmoothness of the elastic net regularization; 3) the proof of the generalization bound is extended to the new regularization; 4) we apply CorrLog to multilabel image classification, and achieve results competitive with the state-of-the-art methods in this area.

To ease the presentation, we first summarize the important notations in Table I. The rest of this paper is organized as follows. Section II introduces the CorrLog model with elastic net regularization. Section III presents algorithms for learning CorrLog by regularized maximum pseudo likelihood estimation, and for prediction with CorrLog by message passing. A generalization analysis of CorrLog based on the concept of algorithmic stability is presented in Section IV. Section V to Section VII report results of empirical evaluations, including experiments on a synthetic dataset and on benchmark multilabel image classification datasets.

II. CORRELATED LOGISTIC MODEL

We study the problem of learning a joint prediction y = d(x), with d : X → Y, where the instance space X = {x ∈ R^D : ‖x‖ ≤ 1} and the label space Y = {−1, 1}^m. By assuming conditional independence among the labels, we can model MLC by a set of independent logistic regressions (ILRs). Specifically, the conditional probability p_lr(y|x) of ILRs is given by

    p_{lr}(y|x) = \prod_{i=1}^m p_{lr}(y_i|x) = \prod_{i=1}^m \frac{\exp(y_i \beta_i^T x)}{\exp(\beta_i^T x) + \exp(-\beta_i^T x)},    (1)

where β_i ∈ R^D are the coefficients of the i-th logistic regression (LR) in ILRs. For the convenience of expression,
TABLE I
SUMMARY OF IMPORTANT NOTATIONS THROUGHOUT THIS PAPER.

    Notation                 Description
    D = {x^(l), y^(l)}       training dataset with n examples, 1 ≤ l ≤ n

In this paper, we choose q(y) to be the following quadratic form,

    q(y) = \exp\Big( \sum_{i<j} \alpha_{ij} y_i y_j \Big).    (3)
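To make the roles of the per-label linear terms in (1) and the pairwise term q(y) in (3) concrete, the following minimal Python sketch evaluates the unnormalized joint score exp(Σ_i y_i β_i^T x + Σ_{i<j} α_ij y_i y_j) for a candidate labelling. The function and variable names (corrlog_score, beta, alpha) are our own illustrative choices, not notation from the paper, and normalization over all 2^m labellings is omitted.

    import numpy as np

    def corrlog_score(y, x, beta, alpha):
        """Unnormalized CorrLog score for one labelling y in {-1, +1}^m.

        beta:  (m, D) array of per-label coefficients (unary terms, cf. Eq. (1)).
        alpha: (m, m) array; only the upper triangle (i < j) is used (cf. Eq. (3)).
        """
        unary = np.sum(y * (beta @ x))              # sum_i y_i * beta_i^T x
        iu, ju = np.triu_indices(len(y), k=1)       # index pairs with i < j
        pairwise = np.sum(alpha[iu, ju] * y[iu] * y[ju])
        return np.exp(unary + pairwise)

    # Tiny usage example with random parameters (illustrative only).
    rng = np.random.default_rng(0)
    m, D = 4, 8
    x = rng.normal(size=D)
    beta = rng.normal(size=(m, D))
    alpha = np.triu(rng.normal(scale=0.1, size=(m, m)), k=1)
    y = np.array([1, -1, 1, 1])
    print(corrlog_score(y, x, beta, alpha))

Setting all α_ij to zero reduces this score to the ILR model (1) up to normalization, which is one way to see that CorrLog strictly generalizes independent logistic regressions.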
It has been shown in previous studies that ℓ1 regularized algorithms are inherently unstable, that is, a slight change of the training data set can lead to substantially different prediction models. Based on the above consideration, we propose to use the elastic net regularizer [49], which is a combination of ℓ2 and ℓ1 regularizers and inherits their individual advantages, i.e., learning stability and model sparsity,

    R_{en}(\Theta; \lambda_1, \lambda_2, \varepsilon) = \lambda_1 \sum_{i=1}^m \big( \|\beta_i\|_2^2 + \varepsilon \|\beta_i\|_1 \big) + \lambda_2 \sum_{i<j} \big( \alpha_{ij}^2 + \varepsilon |\alpha_{ij}| \big).    (8)

A. Approximate Learning via Pseudo Likelihood

Maximum pseudo likelihood estimation (MPLE) [50] provides an alternative approach for estimating model parameters, especially when the partition function of the likelihood cannot be evaluated efficiently. It was developed in the field of spatial dependence analysis and has been widely applied to the estimation of various statistical models, from the Ising model [47] to CRFs [51]. Here, we apply MPLE to the learning of the parameters Θ of CorrLog.

The pseudo likelihood of a model over m jointly distributed random variables is defined as the product of the conditional probabilities of each individual random variable conditioned on all the rest. For CorrLog (4), its pseudo likelihood is given by

    \tilde{p}(y|x; \Theta) = \prod_{i=1}^m p(y_i | y_{-i}, x; \Theta),    (10)

where y_{-i} = [y_1, ..., y_{i-1}, y_{i+1}, ..., y_m] and the conditional probability is

    p(y_i | y_{-i}, x; \Theta) = \frac{1}{1 + \exp\big\{ -2 y_i \big( \beta_i^T x + \sum_{j=i+1}^m \alpha_{ij} y_j + \sum_{j=1}^{i-1} \alpha_{ji} y_j \big) \big\}}.    (11)

Accordingly, the negative log pseudo likelihood over the training data D is given by

    \tilde{L}(\Theta) = -\frac{1}{n} \sum_{l=1}^n \sum_{i=1}^m \log p\big( y_i^{(l)} | y_{-i}^{(l)}, x^{(l)}; \Theta \big).    (12)

To this end, the optimal model parameters Θ̃ = {β̃, α̃} of CorrLog can be learned approximately by the elastic net regularized MPLE,

    \tilde{\Theta} = \arg\min_\Theta \tilde{L}_r(\Theta),    (13)

where L̃_r(Θ) denotes the regularized objective, i.e., the negative log pseudo likelihood (12) plus the elastic net regularizer (8). Let J_s(Θ) denote the smooth part of L̃_r(Θ), i.e., the negative log pseudo likelihood together with the ℓ2 terms of (8); its partial gradients at the current estimate Θ^(k) are

    \nabla_{\beta_i} J_s(\Theta^{(k)}) = \frac{1}{n} \sum_{l=1}^n \xi_{li}\, x^{(l)} + 2\lambda_1 \beta_i^{(k)},
    \nabla_{\alpha_{ij}} J_s(\Theta^{(k)}) = \frac{1}{n} \sum_{l=1}^n \big( \xi_{li}\, y_j^{(l)} + \xi_{lj}\, y_i^{(l)} \big) + 2\lambda_2 \alpha_{ij}^{(k)},    (15)

with

    \xi_{li} = \frac{ -2 y_i^{(l)} }{ 1 + \exp\big\{ 2 y_i^{(l)} \big( \beta_i^{(k)T} x^{(l)} + \sum_{j=i+1}^m \alpha_{ij}^{(k)} y_j^{(l)} + \sum_{j=1}^{i-1} \alpha_{ji}^{(k)} y_j^{(l)} \big) \big\} }.    (16)
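As a concreteness check on (15)–(16), the short Python sketch below computes ξ_li and the two partial gradients of the smooth part J_s over a dataset. The helper names (xi_values, smooth_gradients) are our own illustrative choices, not notation from the paper, and the ℓ1 part of the regularizer is deliberately left out here because it is handled separately by the soft thresholding step described next.

    import numpy as np

    def xi_values(x, y, beta, alpha):
        """xi_{li} of Eq. (16) for a single example (x, y), for all labels i at once."""
        A = np.triu(alpha, k=1)                       # alpha_ij, i < j
        a = beta @ x + A @ y + A.T @ y                # beta_i^T x + pairwise field, per label i
        return -2.0 * y / (1.0 + np.exp(2.0 * y * a))

    def smooth_gradients(X, Y, beta, alpha, lam1, lam2):
        """Partial gradients (15) of the smooth objective J_s (pseudo likelihood + l2 terms)."""
        n, m = Y.shape
        g_beta = np.zeros_like(beta)
        g_alpha = np.zeros_like(alpha)
        for x, y in zip(X, Y):
            xi = xi_values(x, y, beta, alpha)
            g_beta += np.outer(xi, x) / n             # (1/n) sum_l xi_{li} x^{(l)}
            g_alpha += (np.outer(xi, y) + np.outer(y, xi)) / n   # xi_{li} y_j + xi_{lj} y_i
        g_beta += 2.0 * lam1 * beta                   # from lambda1 * ||beta_i||_2^2
        g_alpha = np.triu(g_alpha, k=1) + 2.0 * lam2 * np.triu(alpha, k=1)  # keep i < j only
        return g_beta, g_alpha

Note that each gradient touches every training example once, so the cost per label is the same as for an ordinary logistic regression, consistent with the linear complexity discussed in Remark 1 below.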
Then, a surrogate J(Θ) of the objective function L̃_r(Θ) in (13) can be obtained by using ∇J_s(Θ^(k)), i.e.,

    J(\Theta; \Theta^{(k)}) = J_s(\Theta^{(k)})
      + \sum_{i=1}^m \Big( \langle \nabla_{\beta_i} J_s(\Theta^{(k)}),\, \beta_i - \beta_i^{(k)} \rangle + \frac{1}{2\eta} \|\beta_i - \beta_i^{(k)}\|_2^2 + \lambda_1 \|\beta_i\|_1 \Big)
      + \sum_{i<j} \Big( \langle \nabla_{\alpha_{ij}} J_s(\Theta^{(k)}),\, \alpha_{ij} - \alpha_{ij}^{(k)} \rangle + \frac{1}{2\eta} (\alpha_{ij} - \alpha_{ij}^{(k)})^2 + \lambda_2 |\alpha_{ij}| \Big).    (17)

The parameter η in (17) serves a role similar to the step size in gradient descent methods, and it is set such that 1/η is larger than the Lipschitz constant of ∇J_s(Θ^(k)). For such η, it can be shown that J(Θ) ≥ L̃_r(Θ) and J(Θ^(k)) = L̃_r(Θ^(k)). Therefore, the update of Θ can be realized by the minimization

    \Theta^{(k+1)} = \arg\min_\Theta J(\Theta; \Theta^{(k)}),    (18)

which is solved by the soft thresholding function S(·), i.e.,

    \beta_i^{(k+1)} = S\big( \beta_i^{(k)} - \eta\, \nabla_{\beta_i} J_s(\Theta^{(k)});\, \lambda_1 \big),
    \alpha_{ij}^{(k+1)} = S\big( \alpha_{ij}^{(k)} - \eta\, \nabla_{\alpha_{ij}} J_s(\Theta^{(k)});\, \lambda_2 \big),    (19)

where

    S(u; \rho) = \begin{cases} u - 0.5\rho, & \text{if } u > 0.5\rho \\ u + 0.5\rho, & \text{if } u < -0.5\rho \\ 0, & \text{otherwise.} \end{cases}    (20)

Iteratively applying (19) until convergence provides a first-order method for solving (13). Algorithm 1 presents the pseudo code for this procedure.

Algorithm 1 Learning CorrLog by Maximum Pseudo Likelihood Estimation with Elastic Net Regularization
    Input: Training data D, initialization β^(0) = 0, α^(0) = 0, and learning rate η, where 1/η is set larger than the Lipschitz constant of ∇J_s(Θ) in (17).
    Output: Model parameters Θ̃ = (β̃^(t), α̃^(t)).
    repeat
        update β^(k+1) and α^(k+1) by the soft thresholding steps in (19)
    until convergence
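The update (19)–(20) has the familiar form of a proximal gradient (ISTA-style) step: a gradient step on the smooth part J_s followed by soft thresholding. A minimal Python sketch of one such iteration is given below; it reuses the hypothetical smooth_gradients helper from the previous sketch, the thresholds follow the form shown in (19), and the function names are ours rather than the paper's.

    import numpy as np

    def soft_threshold(u, rho):
        """Elementwise S(u; rho) of Eq. (20)."""
        return np.sign(u) * np.maximum(np.abs(u) - 0.5 * rho, 0.0)

    def corrlog_update(beta, alpha, X, Y, eta, lam1, lam2):
        """One iteration of (19): gradient step on J_s, then soft thresholding."""
        g_beta, g_alpha = smooth_gradients(X, Y, beta, alpha, lam1, lam2)  # Eq. (15)
        beta_new = soft_threshold(beta - eta * g_beta, lam1)
        alpha_new = soft_threshold(alpha - eta * g_alpha, lam2)
        return beta_new, np.triu(alpha_new, k=1)      # keep only the i < j entries

    # Iterating corrlog_update until the parameters stop changing implements Algorithm 1.

The soft thresholding sets small coordinates of β and α exactly to zero, which is how the ℓ1 part of the elastic net produces sparse feature weights and sparse label correlations.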
Remark 1. From the above derivation, especially equations (15) and (19), the computational complexity of our learning algorithm is linear with respect to the label number m. Therefore, learning CorrLog is no more expensive than learning m independent logistic regressions, which makes CorrLog scalable to the case of large label numbers.

Remark 2. It is possible to further speed up the learning algorithm. In particular, Algorithm 1 can be modified to have the optimal convergence rate in the sense of Nemirovsky

Given the model parameters Θ̃ learned in the last subsection, the prediction ŷ for a new instance x can be obtained by

    \hat{y} = \arg\max_{y \in \mathcal{Y}} p(y|x; \tilde{\Theta}) = \arg\max_{y \in \mathcal{Y}} \exp\Big( \sum_{i=1}^m y_i \tilde{\beta}_i^T x + \sum_{i<j} \tilde{\alpha}_{ij} y_i y_j \Big).    (21)

IV. GENERALIZATION ANALYSIS

An important issue in designing a machine learning algorithm is generalization, i.e., how the algorithm will perform on test data compared to on the training data. In this section, we present a generalization analysis for CorrLog by using the concept of algorithmic stability [55]. Our analysis follows two steps. First, we show that the learning of CorrLog by MPLE is stable, i.e., the learned model parameter Θ̃ does not vary much given a slight change of the training data set D. Then, we prove that the generalization error of CorrLog can be bounded by the empirical error, plus a term related to the stability but independent of the label number m.

A. The Stability of MPLE

The stability of a learning algorithm indicates how much the learned model changes according to a small change of the training data set. Denote by D^k a modified training data set that is the same as D but with the k-th training example (x^(k), y^(k)) replaced by another independent example (x′, y′). Suppose Θ̃ and Θ̃^k are the model parameters learned by MPLE (13) on D and D^k, respectively. We intend to show that the difference between these two models, defined as

    \|\tilde{\Theta}^k - \tilde{\Theta}\| \triangleq \sum_{i=1}^m \|\tilde{\beta}_i^k - \tilde{\beta}_i\| + \sum_{i<j} |\tilde{\alpha}_{ij}^k - \tilde{\alpha}_{ij}|, \quad \forall\, 1 \le k \le n,

Lemma 1. Given L̃_r(·) and Θ̃ defined in (13), and Θ̃^{\k} defined in (23), it holds for ∀ 1 ≤ k ≤ n,

    \tilde{L}_r(\tilde{\Theta}^{\setminus k}) - \tilde{L}_r(\tilde{\Theta}) \le \frac{1}{n}\Big( \sum_{i=1}^m \log p\big( y_i^{(k)} | y_{-i}^{(k)}, x^{(k)}; \tilde{\Theta}^{\setminus k} \big) - \sum_{i=1}^m \log p\big( y_i^{(k)} | y_{-i}^{(k)}, x^{(k)}; \tilde{\Theta} \big) \Big).    (25)

Proof. Denote by RHS the right-hand side of (25); we have

    \mathrm{RHS} = \big( \tilde{L}_r(\tilde{\Theta}^{\setminus k}) - \tilde{L}_r^{\setminus k}(\tilde{\Theta}^{\setminus k}) \big) - \big( \tilde{L}_r(\tilde{\Theta}) - \tilde{L}_r^{\setminus k}(\tilde{\Theta}) \big).

Furthermore, the definition of Θ̃^{\k} implies L̃_r^{\k}(Θ̃^{\k}) ≤ L̃_r^{\k}(Θ̃), which, combined with the identity above, gives (25).

Lemma 2. With Θ̃ and Θ̃^{\k} as above, it holds that

    \tilde{L}_r(\tilde{\Theta}^{\setminus k}) - \tilde{L}_r(\tilde{\Theta}) \ge \lambda_1 \|\tilde{\beta}^{\setminus k} - \tilde{\beta}\|^2 + \lambda_2 \|\tilde{\alpha}^{\setminus k} - \tilde{\alpha}\|^2.    (26)

Proof. We define the following function

    f(\Theta) = \tilde{L}_r(\Theta) - \lambda_1 \|\beta - \tilde{\beta}\|^2 - \lambda_2 \|\alpha - \tilde{\alpha}\|^2.

Then, for (26), it is sufficient to show that f(Θ̃^{\k}) ≥ f(Θ̃). Since Θ̃ minimizes L̃_r(Θ), we have 0 ∈ ∂L̃_r(Θ̃) and thus 0 ∈ ∂f(Θ̃), which implies that Θ̃ also minimizes f(Θ). Therefore f(Θ̃) ≤ f(Θ̃^{\k}).

In addition, by checking the Lipschitz continuity of log p(y_i|y_{-i}, x; Θ), we have the following Lemma 3.

Lemma 3. Given Θ̃ defined in (13) and Θ̃^{\k} defined in (23), it holds for ∀ (x, y) ∈ X × Y and ∀ 1 ≤ k ≤ n,

    \sum_{i=1}^m \log p(y_i | y_{-i}, x; \tilde{\Theta}) - \sum_{i=1}^m \log p(y_i | y_{-i}, x; \tilde{\Theta}^{\setminus k}) \le 2 \sum_{i=1}^m \|\tilde{\beta}_i - \tilde{\beta}_i^{\setminus k}\| + 4 \sum_{i<j} |\tilde{\alpha}_{ij} - \tilde{\alpha}_{ij}^{\setminus k}|.    (29)

That is, log p(y_i|y_{-i}, x; Θ) is Lipschitz continuous with respect to β_i and α_ij, with constants 2 and 4, respectively. Therefore, (29) holds.

By combining the above three Lemmas, we have the following Theorem 1 that shows the stability of CorrLog.

Theorem 1. Given model parameters Θ̃ = {β̃, α̃} and Θ̃^k = {β̃^k, α̃^k} learned on training datasets D and D^k, respectively, both by (13), it holds that

    \sum_{i=1}^m \|\tilde{\beta}_i^k - \tilde{\beta}_i\| + \sum_{i<j} |\tilde{\alpha}_{ij}^k - \tilde{\alpha}_{ij}| \le \frac{16}{\min(\lambda_1, \lambda_2)\, n}.    (30)

Further, by using

    \|\tilde{\beta}^{\setminus k} - \tilde{\beta}\|_2^2 + \|\tilde{\alpha}^{\setminus k} - \tilde{\alpha}\|_2^2 \ge \frac{1}{2} \Big( \sum_{i=1}^m \|\tilde{\beta}_i - \tilde{\beta}_i^{\setminus k}\| + \sum_{i<j} |\tilde{\alpha}_{ij} - \tilde{\alpha}_{ij}^{\setminus k}| \Big)    (32)

B. Generalization Bound

We first define a loss function to measure the generalization error. Considering that CorrLog predicts labels by MAP estimation, we define the loss function by using the log probability:

    \ell(x, y; \Theta) = \begin{cases} 1, & f(x, y, \Theta) < 0 \\ 1 - f(x, y, \Theta)/\gamma, & 0 \le f(x, y, \Theta) < \gamma \\ 0, & f(x, y, \Theta) \ge \gamma, \end{cases}    (35)

where the constant γ > 0 and

    f(x, y, \Theta) = \log p(y|x; \Theta) - \max_{y' \ne y} \log p(y'|x; \Theta).

The loss function (35) is defined analogously to the loss function used in binary classification, where f(x, y, Θ) is replaced by the margin y w^T x if a linear classifier w is used. Besides, (35) gives a 0 loss only if all dimensions of y are correctly predicted, which emphasizes the joint prediction in MLC. By using this loss function, the generalization error and the empirical error are given by

    R(\tilde{\Theta}) = \mathbb{E}_{xy}\, \ell(x, y; \tilde{\Theta}),    (37)

and

    \tilde{R}(\tilde{\Theta}) = \frac{1}{n} \sum_{l=1}^n \ell(x^{(l)}, y^{(l)}; \tilde{\Theta}).    (38)

According to [55], an exponential bound exists for R(Θ̃) if CorrLog has a uniform stability with respect to the loss function (35). The following Theorem 2 shows that this condition holds.

Theorem 2. Given model parameters Θ̃ = {β̃, α̃} and Θ̃^k = {β̃^k, α̃^k} learned on training datasets D and D^k, respectively, both by (13), it holds for ∀ (x, y) ∈ X × Y,

    |\ell(x, y; \tilde{\Theta}) - \ell(x, y; \tilde{\Theta}^k)| \le \frac{32}{\gamma \min(\lambda_1, \lambda_2)\, n}.    (39)

Proof. First, we have the following inequality from (35),

    \gamma\, |\ell(x, y; \tilde{\Theta}) - \ell(x, y; \tilde{\Theta}^k)| \le |f(x, y, \tilde{\Theta}) - f(x, y, \tilde{\Theta}^k)|.    (40)

Then, by introducing the notation

    A(x, y, \beta, \alpha) = \sum_{i=1}^m y_i \beta_i^T x + \sum_{i<j} \alpha_{ij} y_i y_j,    (41)

and bounding |f(x, y, Θ̃) − f(x, y, Θ̃^k)| in terms of the parameter differences, the proof is completed by applying Theorem 1.

Now, we are ready to present the main theorem on the generalization ability of CorrLog.

Theorem 3. Given the model parameter Θ̃ learned by (13) with i.i.d. training data D = {(x^(l), y^(l)) ∈ X × Y, l = 1, 2, ..., n} and regularization parameters λ1, λ2, it holds with probability at least 1 − δ that

    R(\tilde{\Theta}) \le \tilde{R}(\tilde{\Theta}) + \frac{32}{\gamma \min(\lambda_1, \lambda_2)\, n} + \Big( \frac{64}{\gamma \min(\lambda_1, \lambda_2)} + 1 \Big) \sqrt{\frac{\log(1/\delta)}{2n}}.    (46)

Proof. Given Theorem 2, the generalization bound (46) is a direct result of Theorem 12 in [55] (please refer to the reference for details).

Remark 3. A notable observation from Theorem 3 is that the generalization bound (46) of CorrLog is independent of the label number m. Therefore, CorrLog is preferable for MLC with a large number of labels, for which the generalization error can still be bounded with high confidence.

Remark 4. While the learning of CorrLog (13) utilizes the elastic net regularization R_en(Θ; λ1, λ2, ε), where ε is the weighting parameter on the ℓ1 regularization that encourages sparsity, the generalization bound (46) is independent of the parameter ε. The reason is that ℓ1 regularization does not lead to stable learning algorithms [56], and only the ℓ2 regularization in R_en(Θ; λ1, λ2, ε) contributes to the stability of CorrLog.
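As an illustration of how the quantities in (21) and (35)–(38) can be evaluated, the brute-force Python sketch below enumerates all 2^m labellings to obtain the MAP prediction and the margin f(x, y, Θ), and then averages the ramp loss over a dataset to obtain the empirical error R̃(Θ̃). This exhaustive enumeration is only practical for small m (the paper itself uses message passing for prediction), and all function names here are illustrative rather than taken from the paper.

    import itertools
    import numpy as np

    def log_score(y, x, beta, alpha):
        """Unnormalized log probability: sum_i y_i beta_i^T x + sum_{i<j} alpha_ij y_i y_j."""
        iu, ju = np.triu_indices(len(y), k=1)
        return float(np.sum(y * (beta @ x)) + np.sum(alpha[iu, ju] * y[iu] * y[ju]))

    def map_predict(x, beta, alpha):
        """Eq. (21) by exhaustive search over {-1, +1}^m (small m only)."""
        m = beta.shape[0]
        candidates = [np.array(c) for c in itertools.product([-1, 1], repeat=m)]
        return max(candidates, key=lambda y: log_score(y, x, beta, alpha))

    def margin(x, y, beta, alpha):
        """f(x, y, Theta): the partition function cancels in the difference of log scores."""
        m = len(y)
        best_other = max(log_score(np.array(c), x, beta, alpha)
                         for c in itertools.product([-1, 1], repeat=m)
                         if not np.array_equal(np.array(c), y))
        return log_score(y, x, beta, alpha) - best_other

    def ramp_loss(x, y, beta, alpha, gamma):
        """Eq. (35)."""
        f = margin(x, y, beta, alpha)
        return 1.0 if f < 0 else (1.0 - f / gamma if f < gamma else 0.0)

    def empirical_error(X, Y, beta, alpha, gamma):
        """Eq. (38): average ramp loss over the training data."""
        return np.mean([ramp_loss(x, y, beta, alpha, gamma) for x, y in zip(X, Y)])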