PRIL: Perceptron Ranking Using Interval Labeled Data

Naresh Manwani [email protected] IIIT Hyderabad, India

Abstract

In this paper, we propose an online learning algorithm, PRIL, for learning ranking classifiers using interval labeled data and show its correctness. We show that it converges in a finite number of steps if there exists an ideal classifier such that the rank it assigns to an example always lies in the example's label interval. We then generalize this mistake bound result to the general case. We also provide a regret bound for the proposed algorithm. We further propose a multiplicative update algorithm for PRIL, called M-PRIL, and provide its correctness and convergence results. We show the effectiveness of PRIL through its performance on various datasets.

1. Introduction

Ranking (also called ordinal classification) is an important problem in machine learning. Ranking differs from multi-class classification in the sense that there is an ordering among the class labels. Consider, for example, product ratings on online retail stores, which are based on customer reviews, product quality, price and many other factors. Usually these ratings are numbered between 1-5. While these numbers can be thought of as class labels, there is also an ordering among them that has to be taken into account. This problem has been well studied in machine learning and is referred to as ordinal classification or ranking.

In general, an ordinal classifier can be completely defined by a linear function and a set of K − 1 thresholds (K being the number of classes). Each threshold corresponds to a class; thus, the thresholds should have the same order as their corresponding classes. The classifier decides the rank (class) based on the relative position of the linear function value with respect to the different thresholds. One can also learn a nonlinear classifier by using an appropriate nonlinear transformation. A number of discriminative approaches based on the risk minimization principle have been proposed for learning ordinal classifiers. Variants of large margin frameworks for learning ordinal classifiers are proposed in Shashua & Levin (2002) and Chu & Keerthi (2005). The order of the thresholds can be maintained either explicitly or implicitly: in the explicit approach, the ordering is posed as a constraint in the optimization problem itself, while the implicit approach captures the ordering by posing separability conditions between every pair of classes. Li & Lin (2006) propose a generic method which converts learning an ordinal classifier into learning a binary classifier with weighted examples. A classical online algorithm for learning linear classifiers is proposed in Rosenblatt (1958). Crammer & Singer (2001b) extended the Perceptron learning algorithm to ordinal classifiers.

In the approaches discussed so far, the training data has the correct class label for each feature vector. However, in many cases we may not know the exact label; instead, we may have an interval in which the true label lies. Such a scenario is discussed in Antoniuk et al. (2015; 2016). In this setting, an interval label is provided for each example, and it is assumed that the true label of the example lies in this interval. In Antoniuk et al. (2016), a large margin framework for batch learning is proposed using an interval insensitive loss function.

In this paper, we propose an online algorithm for learning an ordinal classifier using interval labeled data. We name the proposed approach PRIL (Perceptron ranking using interval labeled data). Our approach is based on the interval insensitive loss function. To the best of our knowledge, this is the first online ranking algorithm using interval labeled data. We show the correctness of the algorithm by showing that after each iteration it maintains the ordering of the thresholds. We derive mistake bounds for the proposed algorithm in both the ideal and the general setting. In the ideal setting, we show that the algorithm stops after making a finite number of mistakes. We also derive a regret bound for the algorithm. We further propose a multiplicative update algorithm for PRIL (called M-PRIL), show its correctness and find its mistake bound.

The rest of the paper is organized as follows. In Section 2, we describe the problem of learning an ordinal classifier using interval labeled data. In Section 3, we discuss the proposed online algorithm for learning an ordinal classifier using interval labeled data, and we derive the mistake bounds and the regret bound in Section 3.2. We present the experimental results in Section 5 and conclude with some remarks on future work in Section 6.

2. Ordinal Classification using Interval Labeled Data

Let X ⊆ R^d be the instance space and Y = {1, . . . , K} be the label space. Our objective is to learn an ordinal classifier h : X → Y of the following form:

h(x) = 1 + Σ_{k=1}^{K−1} I{w·x > θ_k} = min_{i∈[K]} {i : w·x − θ_i < 0},

where w ∈ R^d and θ ∈ R^{K−1} are the parameters to be optimized, and we take θ_K = ∞ by convention. The parameters θ = [θ_1 . . . θ_{K−1}] should be such that θ_1 ≤ θ_2 ≤ . . . ≤ θ_{K−1}. The classifier splits the real line into K consecutive intervals using the thresholds θ_1, . . . , θ_{K−1} and then decides the class label based on the interval in which w·x falls.
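For concreteness, a minimal Python sketch of this prediction rule is given below (an illustration under our own naming conventions; the function and variable names are not from the paper).

import numpy as np

def predict_rank(w, thresholds, x):
    # thresholds must satisfy theta_1 <= ... <= theta_{K-1};
    # the predicted rank is the first i with w.x - theta_i < 0,
    # or K if the score exceeds every threshold.
    score = np.dot(w, x)
    K = len(thresholds) + 1
    for i, theta in enumerate(thresholds, start=1):
        if score - theta < 0:
            return i
    return K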

Here, we assume that for each example x, the annotator provides an interval [y_l, y_r] ∈ Y × Y (y_l ≤ y_r). The interval annotation means that the true label y of example x lies in the interval [y_l, y_r]. Let S = {(x^1, y_l^1, y_r^1), . . . , (x^T, y_l^T, y_r^T)} be the training set.

The discrepancy between the predicted label and the corresponding label interval can be measured using the interval insensitive loss (Antoniuk et al., 2016):

L_I^{MAE}(f(x), θ, y_l, y_r) = Σ_{i=1}^{y_l−1} I{f(x) < θ_i} + Σ_{i=y_r}^{K−1} I{f(x) ≥ θ_i},    (1)

where the subscript I stands for interval. This loss function takes value 0 whenever θ_{y_l−1} ≤ f(x) ≤ θ_{y_r}. However, it is discontinuous. A convex surrogate of this loss function is as follows (Antoniuk et al., 2016):

L_I^{IMC}(f(x), y_l, y_r, θ) = Σ_{i=1}^{y_l−1} max(0, −f(x) + θ_i) + Σ_{i=y_r}^{K−1} max(0, f(x) − θ_i).    (2)

Here IMC stands for the implicit constraints for ordering of the thresholds θ_i. For a given example-interval pair {x, (y_l, y_r)}, the loss L_I^{IMC}(f(x), y_l, y_r, θ) becomes zero only when

f(x) − θ_i ≥ 0  ∀i ∈ {1, . . . , y_l − 1}
f(x) − θ_i ≤ 0  ∀i ∈ {y_r, . . . , K − 1}.
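As an illustration, the two losses can be computed as in the following Python sketch (a minimal rendering of eqs. (1) and (2); names are ours, not from the paper).

def interval_mae_loss(score, thresholds, y_l, y_r):
    # Eq. (1): count thresholds below the interval that the score fails to exceed
    # and thresholds at or above the interval that the score reaches.
    left = sum(score < thresholds[i] for i in range(y_l - 1))
    right = sum(score >= thresholds[i] for i in range(y_r - 1, len(thresholds)))
    return left + right

def interval_imc_loss(score, thresholds, y_l, y_r):
    # Eq. (2): convex hinge-like surrogate of the interval insensitive loss.
    left = sum(max(0.0, thresholds[i] - score) for i in range(y_l - 1))
    right = sum(max(0.0, score - thresholds[i]) for i in range(y_r - 1, len(thresholds)))
    return left + right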

Note that if f(x) < θ_i for some i ∈ {1, . . . , y_l − 1}, then f(x) < θ_j, ∀j = y_r, . . . , K − 1, because θ_1 ≤ θ_2 ≤ . . . ≤ θ_{K−1}. Similarly, if f(x) > θ_i for some i ∈ {y_r, . . . , K − 1}, then f(x) > θ_j, ∀j = 1, . . . , y_l − 1. Let Ī = {1, . . . , y_l − 1} ∪ {y_r, . . . , K − 1}. Then, we define z_i, ∀i ∈ Ī, as follows:

z_i = +1  ∀i ∈ {1, . . . , y_l − 1}
z_i = −1  ∀i ∈ {y_r, . . . , K − 1}.    (3)

Thus, L_I^{IMC}(f(x), y_l, y_r, θ) = 0 requires that z_i(f(x) − θ_i) ≥ 0, ∀i ∈ Ī, and L_I^{IMC} can be re-written as

L_I^{IMC}(f(x), y_l, y_r, θ) = Σ_{i∈Ī} max(0, −z_i(f(x) − θ_i)).

3. Perceptron Ranking using Interval Labeled Data

In this section, we propose an online algorithm for ranking using the interval insensitive loss described in eq. (2). Our algorithm is based on stochastic gradient descent on L_I^{IMC}. We derive the algorithm for a linear classifier, which means f(x) = w·x; thus, the parameters to be estimated are w and θ. We initialize with w^1 = 0 and θ^1 = 0. Let w^t, θ^t be the estimates of the parameters at the beginning of trial t, let x^t be the example observed at trial t, and let [y_l^t, y_r^t] be its label interval. Taking the learning rate η = 1, w^{t+1} and θ^{t+1} are found as follows:

w^{t+1} = w^t − η ∇_w L_I^{IMC}(w·x^t, y_l^t, y_r^t, θ) |_{w^t, θ^t}
        = w^t + Σ_{i∈Ī^t} z_i^t x^t I{z_i^t(w^t·x^t − θ_i^t) < 0}

θ_i^{t+1} = θ_i^t − η ∂L_I^{IMC}(w·x^t, y_l^t, y_r^t, θ)/∂θ_i |_{w^t, θ^t}
         = θ_i^t − z_i^t I{z_i^t(w^t·x^t − θ_i^t) < 0}   ∀i ∈ Ī^t
θ_i^{t+1} = θ_i^t                                         ∀i ∉ Ī^t.

Thus, only those constraints which are not satisfied participate in the update. The violation of the i-th constraint contributes z_i^t x^t to w^{t+1} and −z_i^t to θ_i^{t+1}; the thresholds θ_i^t, i ∉ Ī^t, are not updated in trial t. The complete approach is described in Algorithm 1. It is important to note that when exact labels are given to Algorithm 1 instead of partial labels, it becomes the same as the algorithm proposed in (Crammer & Singer, 2001b). PRIL can be easily extended to learn nonlinear classifiers using kernel methods.

Algorithm 1 Perceptron Ranking using Interval Labeled Data (PRIL)

Input: Training dataset S
Initialize: w^1 = 0, θ_1^1 = θ_2^1 = . . . = θ_{K−1}^1 = 0
for t ← 1 to T do
    Get example x^t and its interval (y_l^t, y_r^t)
    for i ← 1 to y_l^t − 1 do: z_i^t = +1
    for i ← y_r^t to K − 1 do: z_i^t = −1
    Initialize τ_i^t = 0, ∀i ∈ [K − 1]
    for i ∈ Ī^t do
        if z_i^t (w^t·x^t − θ_i^t) ≤ 0 then τ_i^t = z_i^t
    end for
    w^{t+1} = w^t + Σ_{i=1}^{K−1} τ_i^t x^t
    θ_i^{t+1} = θ_i^t − τ_i^t, i = 1 . . . K − 1
end for
Output: h(x) = min_{i∈[K]} {i : w^{T+1}·x − θ_i^{T+1} < 0}

Algorithm 2 Kernel PRIL

Input: Training dataset S
Output: τ_1^t, . . . , τ_{K−1}^t, t = 1 . . . T
Initialize: f^1(·) = 0, θ_1^1 = . . . = θ_{K−1}^1 = 0
for t ← 1 to T do
    Get example x^t and its interval (y_l^t, y_r^t)
    for i ← 1 to y_l^t − 1 do: z_i^t = +1
    for i ← y_r^t to K − 1 do: z_i^t = −1
    Initialize τ_i^t = 0, ∀i ∈ [K − 1]
    for i ∈ Ī^t do
        if z_i^t (f^t(x^t) − θ_i^t) ≤ 0 then τ_i^t = z_i^t
    end for
    f^{t+1}(·) = f^t(·) + Σ_{i∈Ī^t} τ_i^t κ(x^t, ·)
    θ_i^{t+1} = θ_i^t − τ_i^t, ∀i ∈ [K − 1]
end for
Output: h(x) = min_{i∈[K]} {i : f^{T+1}(x) − θ_i^{T+1} < 0}
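To make the update concrete, here is a minimal Python sketch of Algorithm 1 under our own naming conventions (an illustration of the update rule, not the authors' reference implementation).

import numpy as np

def pril_train(examples, K, dim):
    """examples: iterable of (x, y_l, y_r) with x a numpy array of length dim."""
    w = np.zeros(dim)
    theta = np.zeros(K - 1)          # theta[i] stores threshold theta_{i+1}
    for x, y_l, y_r in examples:
        score = np.dot(w, x)
        tau = np.zeros(K - 1)
        # Constraints below the interval (z_i = +1) and above it (z_i = -1).
        for i in range(y_l - 1):
            if score - theta[i] <= 0:
                tau[i] = +1.0
        for i in range(y_r - 1, K - 1):
            if score - theta[i] >= 0:
                tau[i] = -1.0
        # Additive PRIL update: violated constraints move w and the thresholds.
        w = w + tau.sum() * x
        theta = theta - tau
    return w, theta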

3.1. Kernel PRIL

We can easily extend the proposed algorithm PRIL to learn nonlinear classifiers using kernel functions. We see that the classifier learnt after t trials by PRIL is completely determined by the coefficients τ_i^s, i ∈ {1, . . . , K − 1}, s ∈ [t], as w^{t+1} = Σ_{s=1}^{t} Σ_{i∈Ī^s} τ_i^s x^s and θ_i^{t+1} = −Σ_{s=1}^{t} τ_i^s, i = 1 . . . K − 1. Also, f^{t+1}(x) can be written as f^{t+1}(x) = Σ_{s=1}^{t} Σ_{i∈Ī^s} τ_i^s (x^s·x). Thus, we can replace the inner product with a suitable kernel function κ : X × X → R and represent f^{t+1}(x) as

f^{t+1}(x) = Σ_{s=1}^{t} Σ_{i∈Ī^s} τ_i^s κ(x^s, x).

The ordinal classifier learnt after T trials is

h(x) = min_{i∈[K]} { i : Σ_{s=1}^{T} ( Σ_{j∈Ī^s} τ_j^s κ(x^s, x) + τ_i^s ) < 0 }.

The complete description of kernel PRIL is provided in Algorithm 2.

3.2. Analysis

We first show that PRIL preserves the ordering of the thresholds θ_1, . . . , θ_{K−1}.

Lemma 1 (Order Preservation). Let w^t and θ^t = [θ_1^t . . . θ_{K−1}^t]^T be the current parameters of the ranking classifier, where θ_1^t ≤ θ_2^t ≤ . . . ≤ θ_{K−1}^t. Let x^t be the instance fed to PRIL at trial t and [y_l^t, y_r^t] be its corresponding rank interval. Let w^{t+1} and θ^{t+1} = [θ_1^{t+1} . . . θ_{K−1}^{t+1}]^T be the resulting parameters after the PRIL update. Then θ_1^{t+1} ≤ θ_2^{t+1} ≤ . . . ≤ θ_{K−1}^{t+1}.

Proof: Note that θ_i^t ∈ Z, ∀i ∈ {1, . . . , K − 1}, ∀t ∈ {1, . . . , T}, as PRIL initializes θ_i^1 = 0, ∀i ∈ {1, . . . , K − 1}, and every update changes θ_i by −1, 0 or +1. To show that PRIL preserves the ordering of the thresholds, we consider the following cases.

1. i ∈ {1, . . . , y_l^t − 2}: We see that

θ_{i+1}^{t+1} − θ_i^{t+1} = θ_{i+1}^t − θ_i^t − z_{i+1}^t I{z_{i+1}^t(w^t·x^t − θ_{i+1}^t) ≤ 0} + z_i^t I{z_i^t(w^t·x^t − θ_i^t) ≤ 0}
                        = θ_{i+1}^t − θ_i^t + I{w^t·x^t − θ_i^t ≤ 0} − I{w^t·x^t − θ_{i+1}^t ≤ 0},

where we used the fact that z_i^t = +1, ∀i ∈ {1, . . . , y_l^t − 1}. There can be only two cases.

(a) θ_{i+1}^t = θ_i^t: In this case, we simply get θ_{i+1}^{t+1} = θ_i^{t+1}.

(b) θ_{i+1}^t > θ_i^t: Since θ_i^t ∈ Z ∀i, ∀t, we get θ_{i+1}^t ≥ θ_i^t + 1. This means θ_{i+1}^{t+1} − θ_i^{t+1} ≥ 1 + I{w^t·x^t − θ_i^t ≤ 0} − I{w^t·x^t − θ_{i+1}^t ≤ 0}. But I{w^t·x^t − θ_i^t ≤ 0} − I{w^t·x^t − θ_{i+1}^t ≤ 0} ∈ {−1, 0}. Thus, θ_{i+1}^{t+1} − θ_i^{t+1} ≥ 0.

2. i = y_l^t − 1: In this case θ_{i+1}^{t+1} = θ_{i+1}^t as per the update rule. Also, z_i^t = +1. Thus, using the fact that θ_{i+1}^t − θ_i^t ≥ 0, we get

θ_{i+1}^{t+1} − θ_i^{t+1} = θ_{i+1}^t − θ_i^t + z_i^t I{z_i^t(w^t·x^t − θ_i^t) ≤ 0} = θ_{i+1}^t − θ_i^t + I{w^t·x^t − θ_i^t ≤ 0} ≥ 0.

3. i ∈ {y_l^t, . . . , y_r^t − 2}: PRIL does not update thresholds in this range, i.e., θ_i^{t+1} = θ_i^t, ∀i ∈ {y_l^t, . . . , y_r^t − 1}. Thus, θ_{i+1}^{t+1} = θ_{i+1}^t ≥ θ_i^t = θ_i^{t+1}, ∀i ∈ {y_l^t, . . . , y_r^t − 2}.

4. i = y_r^t − 1: In this case θ_i^{t+1} = θ_i^t as per the update rule. Also, z_{i+1}^t = −1. Thus, using the fact that θ_{i+1}^t − θ_i^t ≥ 0, we get

θ_{i+1}^{t+1} − θ_i^{t+1} = θ_{i+1}^t − θ_i^t − z_{i+1}^t I{z_{i+1}^t(w^t·x^t − θ_{i+1}^t) ≤ 0} = θ_{i+1}^t − θ_i^t + I{w^t·x^t − θ_{i+1}^t ≥ 0} ≥ 0.

5. i ∈ {y_r^t, . . . , K − 2}: We see that

θ_{i+1}^{t+1} − θ_i^{t+1} = θ_{i+1}^t − θ_i^t − z_{i+1}^t I{z_{i+1}^t(w^t·x^t − θ_{i+1}^t) ≤ 0} + z_i^t I{z_i^t(w^t·x^t − θ_i^t) ≤ 0}
                        = θ_{i+1}^t − θ_i^t − I{w^t·x^t − θ_i^t ≥ 0} + I{w^t·x^t − θ_{i+1}^t ≥ 0},

where we used the fact that z_i^t = −1, ∀i ∈ {y_r^t, . . . , K − 1}. There can be only two cases.

(a) θ_{i+1}^t = θ_i^t: In this case, we simply get θ_{i+1}^{t+1} = θ_i^{t+1}.

(b) θ_{i+1}^t > θ_i^t: Since θ_i^t ∈ Z ∀i, ∀t, we get θ_{i+1}^t ≥ θ_i^t + 1. This means θ_{i+1}^{t+1} − θ_i^{t+1} ≥ 1 + I{w^t·x^t − θ_{i+1}^t ≥ 0} − I{w^t·x^t − θ_i^t ≥ 0}. But I{w^t·x^t − θ_{i+1}^t ≥ 0} − I{w^t·x^t − θ_i^t ≥ 0} ∈ {0, −1}. Thus, θ_{i+1}^{t+1} − θ_i^{t+1} ≥ 0.

This completes the proof.

We now show that PRIL makes a finite number of mistakes if there exists an ideal interval ranking classifier.

Theorem 1. Let S = {(x^1, y_l^1, y_r^1), . . . , (x^T, y_l^T, y_r^T)} be an input sequence. Let R_2 = max_{t∈[T]} ||x^t||_2 and c = min_{t∈[T]}(y_r^t − y_l^t). Let there exist γ > 0, w* ∈ R^d and θ_1*, . . . , θ_{K−1}* ∈ R such that ||w*||_2^2 + Σ_{i=1}^{K−1}(θ_i*)^2 = 1 and min_{i∈Ī^t} z_i^t(w*·x^t − θ_i*) ≥ γ, ∀t ∈ [T]. Then

Σ_{t=1}^{T} L_I^{MAE}(f^t(x^t), θ^t, y_l^t, y_r^t) ≤ (R_2^2 + 1)(K − c − 1) / γ^2,

where f^t(x^t) = w^t·x^t.

Proof: Let v* = [w*^T θ_1* θ_2* . . . θ_{K−1}*]^T and v^t = [(w^t)^T θ_1^t θ_2^t . . . θ_{K−1}^t]^T. Suppose Algorithm 1 makes a mistake at trial t, and let M^t = {i ∈ Ī^t | z_i^t(w^t·x^t − θ_i^t) ≤ 0} be the set of indices of the constraints which are not satisfied at trial t. Let m^t = |M^t| be the number of such constraints.

Thus,

v^{t+1}·v* = (w^t + Σ_{i=1}^{K−1} τ_i^t x^t)·w* + Σ_{i=1}^{K−1}(θ_i^t − τ_i^t) θ_i*
          = v^t·v* + Σ_{i=1}^{K−1} τ_i^t (w*·x^t − θ_i*)
          = v^t·v* + Σ_{i∈M^t} z_i^t (w*·x^t − θ_i*)
          ≥ v^t·v* + Σ_{i∈M^t} γ = v^t·v* + γ m^t,

where we have used the fact that z_i^t(w*·x^t − θ_i*) ≥ γ. Summing both sides from t = 1 to T and using v^1 = 0, we get

v^{T+1}·v* ≥ v^1·v* + γ Σ_{t=1}^{T} m^t = γ Σ_{t=1}^{T} m^t.    (4)

Now, we upper bound ||v^{T+1}||_2^2:

||v^{t+1}||_2^2 = ||v^t||_2^2 + 2 Σ_{i=1}^{K−1} τ_i^t w^t·x^t + (Σ_{i=1}^{K−1} τ_i^t)^2 ||x^t||_2^2 − 2 Σ_{i=1}^{K−1} τ_i^t θ_i^t + Σ_{i=1}^{K−1}(τ_i^t)^2
              = ||v^t||_2^2 + 2 Σ_{i∈M^t} τ_i^t (w^t·x^t − θ_i^t) + (Σ_{i∈M^t} τ_i^t)^2 ||x^t||_2^2 + Σ_{i∈M^t}(τ_i^t)^2.

We know that ||x^t||_2^2 ≤ R_2^2 ∀t, Σ_{i∈M^t}(τ_i^t)^2 = m^t, (Σ_{i∈M^t} τ_i^t)^2 ≤ (m^t)^2 and τ_i^t(w^t·x^t − θ_i^t) = z_i^t(w^t·x^t − θ_i^t) ≤ 0, ∀i ∈ M^t. Thus, ||v^{t+1}||_2^2 − ||v^t||_2^2 ≤ (m^t)^2 R_2^2 + m^t. Summing both sides from t = 1 to T and using v^1 = 0, we get

||v^{T+1}||_2^2 ≤ R_2^2 Σ_{t=1}^{T} (m^t)^2 + Σ_{t=1}^{T} m^t.    (5)

Now, using the Cauchy-Schwarz inequality, v^{T+1}·v* ≤ ||v^{T+1}||_2 ||v*||_2 = ||v^{T+1}||_2. Using eqs. (4) and (5), we get

γ^2 (Σ_{t=1}^{T} m^t)^2 ≤ ||v^{T+1}||_2^2 ≤ R_2^2 Σ_{t=1}^{T} (m^t)^2 + Σ_{t=1}^{T} m^t
⇒ Σ_{t=1}^{T} m^t ≤ (1/γ^2) ( R_2^2 (Σ_{t=1}^{T} (m^t)^2) / (Σ_{t=1}^{T} m^t) + 1 ).

But m^t ≤ K − (y_r^t − y_l^t) − 1 ≤ K − c − 1, so Σ_{t=1}^{T} (m^t)^2 ≤ (K − c − 1) Σ_{t=1}^{T} m^t, which gives

Σ_{t=1}^{T} m^t ≤ (R_2^2 (K − c − 1) + 1) / γ^2 ≤ (R_2^2 + 1)(K − c − 1) / γ^2.

Since L_I^{MAE}(f^t(x^t), θ^t, y_l^t, y_r^t) ≤ m^t, this means

Σ_{t=1}^{T} L_I^{MAE}(f^t(x^t), θ^t, y_l^t, y_r^t) ≤ (R_2^2 + 1)(K − c − 1) / γ^2.

In Theorem 1, we assumed that there exists an ideal classifier defined by w* ∈ R^d and θ*. Let v* = [w*' θ*']' and f*(x^t) = w*·x^t. Thus,

L_I^{MAE}(f*(x^t), θ*, y_l^t, y_r^t) = 0, ∀t ∈ [T],

which means z_i^t(w*·x^t − θ_i*) ≥ 0, ∀i ∈ Ī^t, ∀t ∈ [T], where z_i^t, i ∈ Ī^t, are as described in eq. (3). Now we define x̂_i^t ∈ R^{d+K−1}, ∀i ∈ Ī^t, as follows:

x̂_i^t = [(x^t)' 0 . . . 0 −1 0 . . . 0]', i ∈ Ī^t,    (6)

where the component values at locations d + 1, . . . , d + K − 1 are all set to 0, except for location d + i, which is set to −1. Thus, we have z_i^t(v*·x̂_i^t) ≥ 0, ∀i ∈ Ī^t, ∀t ∈ [T]; that is, v* correctly classifies all the x̂_i^t, ∀i ∈ Ī^t, ∀t ∈ [T]. However, in general, for a given dataset, we may not know whether such an ideal classifier exists. Next, we derive the mistake bound for this general setting.

Theorem 2. Let S = {(x^1, y_l^1, y_r^1), . . . , (x^T, y_l^T, y_r^T)} be an input sequence. Let γ > 0, R_2 = max_{t∈[T]} ||x^t||_2 and c = min_{t∈[T]}(y_r^t − y_l^t). Then, for any w ∈ R^d and θ ∈ R^{K−1} such that ||w||_2^2 + ||θ||_2^2 = 1, we get

Σ_{t=1}^{T} L_I^{MAE}(f^t(x^t), θ^t, y_l^t, y_r^t) ≤ (D + √(R_2^2 + 1))^2 (K − c − 1) / γ^2,

where f^t(x^t) = w^t·x^t, D^2 = Σ_{t=1}^{T} Σ_{i∈Ī^t} (d_i^t)^2 and d_i^t = max[0, γ − z_i^t(w·x^t − θ_i)], i ∈ Ī^t, ∀t ∈ [T].

Proof: We proceed by constructing x̃_i^t ∈ R^{d+(T+1)(K−1)}, ∀i ∈ Ī^t, corresponding to every x^t, t ∈ [T], as follows:

x̃_i^t = [(x̂_i^t)' 0 . . . 0 ∆ 0 . . . 0]',

where x̂_i^t is as described in eq. (6). The first d + K − 1 components of x̃_i^t are the same as x̂_i^t and all remaining elements are set to 0, except for location d + (K − 1)t + i, which is set to ∆. Let u^t ∈ R^{K−1} be defined as

u^t = (1/(∆Z)) [z_1^t d_1^t . . . z_{y_l^t−1}^t d_{y_l^t−1}^t 0 . . . 0 z_{y_r^t}^t d_{y_r^t}^t . . . z_{K−1}^t d_{K−1}^t],

where d_i^t = max[0, γ − z_i^t(w·x^t − θ_i)], i ∈ Ī^t, ∀t ∈ [T]. We now construct ṽ ∈ R^{d+(T+1)(K−1)} as

ṽ = [w'/Z  θ'/Z  u^1 u^2 . . . u^T]',

where Z is chosen such that ||ṽ||_2 = 1. Thus,

Z^2 = 1 + Σ_{t=1}^{T} Σ_{i∈Ī^t} (d_i^t)^2 / ∆^2 = 1 + D^2/∆^2.

We also see that ||x̃_i^t||_2^2 = ||x^t||_2^2 + 1 + ∆^2 ≤ R_2^2 + ∆^2 + 1, ∀i ∈ Ī^t, ∀t ∈ [T]. Moreover,

z_i^t ṽ·x̃_i^t = z_i^t(w·x^t − θ_i)/Z + d_i^t/Z ≥ z_i^t(w·x^t − θ_i)/Z + (γ − z_i^t(w·x^t − θ_i))/Z = γ/Z.

Thus, ṽ correctly classifies all the examples x̃_i^t with margin at least γ/Z. Therefore, by using the mistake bound given in Theorem 1, we get

Σ_{t=1}^{T} L_I^{MAE}(f^t(x^t), θ^t, y_l^t, y_r^t) ≤ Z^2 (R_2^2 + ∆^2 + 1) K_1 / γ^2,

where K_1 = K − c − 1. To find the least upper bound, we minimize the RHS with respect to ∆; the minimum is attained at ∆* = (D^2(R_2^2 + 1))^{1/4}. Replacing ∆ by ∆*, we get

Σ_{t=1}^{T} L_I^{MAE}(f^t(x^t), θ^t, y_l^t, y_r^t) ≤ (D + √(R_2^2 + 1))^2 (K − c − 1) / γ^2.

REGRET ANALYSIS

Let S = {(x^1, y_l^1, y_r^1), . . . , (x^T, y_l^T, y_r^T)} be the input sequence and let {w^t, θ^t}_{t=1}^{T} be the sequence of parameter vectors generated by an online ranking algorithm A. The regret of algorithm A is defined as

R_T(A) = Σ_{t=1}^{T} L_I^{IMC}(w^t·x^t, θ^t, y_l^t, y_r^t) − min_{(w,θ)∈Ω} Σ_{t=1}^{T} L_I^{IMC}(w·x^t, θ, y_l^t, y_r^t).

For online gradient descent applied to convex cost functions, the regret bound analysis is given by Zinkevich (2003). The objective function of PRIL is also convex; motivated by this, we now derive a regret bound for PRIL.

Theorem 3. Let S = {(x^1, y_l^1, y_r^1), . . . , (x^T, y_l^T, y_r^T)} be an input sequence such that x^t ∈ R^d, ∀t ∈ [T]. Let R_2 = max_{t∈[T]} ||x^t||_2 and c = min_{t∈[T]}(y_r^t − y_l^t). Let Ω = {v ∈ R^{d+K−1} : ||v||_2 ≤ Λ}. Let v^t = (w^t, θ^t), t = 1 . . . T + 1, be the sequence of vectors such that ∀t ≥ 1, v^{t+1} = v^t − ∇_t, where ∇_t belongs to the sub-gradient set of L_I^{IMC}(w·x^t, θ, y_l^t, y_r^t) at v^t. Then

R_T(PRIL) ≤ (1/2) Λ^2 + (T/2)(R_2^2 + 1)(K − c − 1).

Proof: Let v = (w, θ) ∈ Ω. Using the convexity of L_I^{IMC}, we get

L_I^{IMC}(w^t·x^t, θ^t, y_l^t, y_r^t) − L_I^{IMC}(w·x^t, θ, y_l^t, y_r^t)
  ≤ ∇L_I^{IMC}(w^t·x^t, θ^t, y_l^t, y_r^t)·(v^t − v)
  = (v^t − v^{t+1})·(v^t − v)    (using the update in Algorithm 1)
  = (1/2) [ ||v − v^t||^2 − ||v − v^{t+1}||^2 + ||v^t − v^{t+1}||^2 ]
  = (1/2) [ ||v − v^t||^2 − ||v − v^{t+1}||^2 ] + (1/2) ||∇L_I^{IMC}(w^t·x^t, θ^t, y_l^t, y_r^t)||^2.

But

||∇L_I^{IMC}(w^t·x^t, θ^t, y_l^t, y_r^t)||^2 = ||Σ_{i∈Ī^t} τ_i^t x^t||^2 + Σ_{i∈Ī^t} (τ_i^t)^2
  ≤ (||x^t||^2 + 1) Σ_{i∈Ī^t} |τ_i^t|
  ≤ (R_2^2 + 1)(K − y_r^t + y_l^t − 1) ≤ (R_2^2 + 1)(K − c − 1).

Thus,

L_I^{IMC}(w^t·x^t, θ^t, y_l^t, y_r^t) − L_I^{IMC}(w·x^t, θ, y_l^t, y_r^t) ≤ (1/2)[ ||v − v^t||^2 − ||v − v^{t+1}||^2 ] + (1/2)(R_2^2 + 1)(K − c − 1).    (7)

Summing eq. (7) on both sides from t = 1 to T, we get

Σ_{t=1}^{T} L_I^{IMC}(w^t·x^t, θ^t, y_l^t, y_r^t) − Σ_{t=1}^{T} L_I^{IMC}(w·x^t, θ, y_l^t, y_r^t)
  ≤ (1/2) [ ||v − v^1||^2 − ||v − v^{T+1}||^2 ] + (T/2)(R_2^2 + 1)(K − c − 1)
  ≤ (1/2) ||v||^2 + (T/2)(R_2^2 + 1)(K − c − 1),

where we have used the facts that ||v − v^{T+1}||^2 ≥ 0 and v^1 = 0. Since ||v||_2^2 ≤ Λ^2 for every v ∈ Ω, and the above holds for any v = (w, θ) ∈ Ω, we have

R_T(PRIL) ≤ (1/2) Λ^2 + (T/2)(R_2^2 + 1)(K − c − 1).

4. Multiplicative PRIL

In this section, we propose a multiplicative algorithm for PRIL called M-PRIL, in which the weight vector w and the thresholds θ are modified in a multiplicative manner. The algorithm is inspired by the Winnow algorithm proposed for learning linear predictors (Kivinen & Warmuth, 1997). In M-PRIL, at every iteration t, the weight vector w^t and the thresholds θ^t are maintained such that ||w^t||_1 + ||θ^t||_1 = 1. The complete algorithm is described in Algorithm 3.

Algorithm 3 M-PRIL

Input: Training dataset S
Initialize: w_i^1 = 1/(K + d − 1), ∀i ∈ [d]; θ_k^1 = 1/(K + d − 1), ∀k ∈ [K − 1]
for t ← 1 to T do
    Get example x^t and its interval (y_l^t, y_r^t)
    for i ← 1 to y_l^t − 1 do: z_i^t = +1
    for i ← y_r^t to K − 1 do: z_i^t = −1
    Initialize τ_i^t = 0, ∀i ∈ [K − 1]
    for i ∈ Ī^t do
        if z_i^t (w^t·x^t − θ_i^t) ≤ 0 then τ_i^t = z_i^t
    end for
    Z^t = Σ_{i=1}^{d} w_i^t exp(η x_i^t Σ_{k=1}^{K−1} τ_k^t) + Σ_{k=1}^{K−1} θ_k^t exp(−η τ_k^t)
    for i ∈ [d] do: w_i^{t+1} = (1/Z^t) w_i^t exp(η x_i^t Σ_{k=1}^{K−1} τ_k^t)
    for k ∈ [K − 1] do: θ_k^{t+1} = (1/Z^t) θ_k^t exp(−η τ_k^t)
end for
Output: h(x) = min_{i∈[K]} {i : w^{T+1}·x − θ_i^{T+1} < 0}
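The multiplicative update can be sketched in Python as follows (our own illustrative rendering of a single step of Algorithm 3, with a hypothetical learning-rate argument eta; not the authors' code).

import numpy as np

def mpril_update(w, theta, x, y_l, y_r, eta=0.1):
    """One M-PRIL step. w (length d) and theta (length K-1) are positive and
    jointly sum to one; they are re-normalized after the exponentiated update."""
    K = len(theta) + 1
    score = np.dot(w, x)
    tau = np.zeros(K - 1)
    for i in range(y_l - 1):                 # z_i = +1 constraints
        if score - theta[i] <= 0:
            tau[i] = +1.0
    for i in range(y_r - 1, K - 1):          # z_i = -1 constraints
        if score - theta[i] >= 0:
            tau[i] = -1.0
    w_new = w * np.exp(eta * x * tau.sum())
    theta_new = theta * np.exp(-eta * tau)
    Z = w_new.sum() + theta_new.sum()        # normalization keeps the 1-norm at 1
    return w_new / Z, theta_new / Z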

Lemma 2 (Order Preservation). Let w^t ∈ R^d and θ^t ∈ R^{K−1} denote the current parameters of the ranking classifier, and assume that θ_1^t ≤ θ_2^t ≤ . . . ≤ θ_{K−1}^t. Let w^{t+1} and θ^{t+1} be the new parameters generated by the M-PRIL algorithm after observing (x^t, y_l^t, y_r^t). Then θ_1^{t+1} ≤ θ_2^{t+1} ≤ . . . ≤ θ_{K−1}^{t+1}.

Proof: Note that θ_i^t > 0, ∀i ∈ {1, . . . , K − 1}, ∀t ∈ {1, . . . , T}, as M-PRIL initializes θ_i^1 = 1/(K + d − 1), ∀i ∈ {1, . . . , K − 1}, and the multiplicative updates preserve positivity. To show that M-PRIL preserves the ordering of the thresholds, we consider the following cases.

1. i ∈ {1, . . . , y_l^t − 2}: We see that

θ_{i+1}^{t+1} − θ_i^{t+1} = (1/Z^t) ( θ_{i+1}^t e^{−η τ_{i+1}^t} − θ_i^t e^{−η τ_i^t} ).

We know that τ_i^t = z_i^t I{z_i^t(w^t·x^t − θ_i^t) ≤ 0} and z_i^t = +1, ∀i ∈ {1, . . . , y_l^t − 1}. There can be only two cases.

(a) θ_{i+1}^t = θ_i^t: In this case, we simply get θ_{i+1}^{t+1} = θ_i^{t+1}.

(b) θ_{i+1}^t > θ_i^t: We see that I{w^t·x^t − θ_i^t ≤ 0} ≤ I{w^t·x^t − θ_{i+1}^t ≤ 0}, from which it follows that θ_{i+1}^t e^{−η τ_{i+1}^t} ≥ θ_i^t e^{−η τ_i^t}. Thus, θ_{i+1}^{t+1} − θ_i^{t+1} ≥ 0.

2. i = y_l^t − 1: In this case τ_{i+1}^t = 0. Thus, using the fact that θ_{i+1}^t − θ_i^t ≥ 0, we get

θ_{i+1}^{t+1} − θ_i^{t+1} = (1/Z^t)( θ_{i+1}^t − θ_i^t e^{−η τ_i^t} ) ≥ (1/Z^t)( θ_{i+1}^t − θ_i^t ) ≥ 0,    ∵ τ_i^t ∈ {0, 1}.

3. i ∈ {y_l^t, . . . , y_r^t − 2}: In this case τ_i^t = 0, ∀i ∈ {y_l^t, . . . , y_r^t − 1}. Thus, θ_i^{t+1} = (1/Z^t) θ_i^t for all i in this range, and therefore θ_{i+1}^{t+1} = (1/Z^t) θ_{i+1}^t ≥ (1/Z^t) θ_i^t = θ_i^{t+1}, ∀i ∈ {y_l^t, . . . , y_r^t − 2}.

4. i = y_r^t − 1: In this case τ_i^t = 0 and z_{i+1}^t = −1. Thus, using the fact that θ_{i+1}^t − θ_i^t ≥ 0, we get

θ_{i+1}^{t+1} − θ_i^{t+1} = (1/Z^t)( θ_{i+1}^t e^{−η τ_{i+1}^t} − θ_i^t ) ≥ (1/Z^t)( θ_{i+1}^t − θ_i^t ) ≥ 0,    ∵ τ_{i+1}^t ∈ {−1, 0}.

5. i ∈ {y_r^t, . . . , K − 2}: We see that

θ_{i+1}^{t+1} − θ_i^{t+1} = (1/Z^t)( θ_{i+1}^t e^{−η τ_{i+1}^t} − θ_i^t e^{−η τ_i^t} ),

where z_i^t = −1, ∀i ∈ {y_r^t, . . . , K − 1}. There can be only two cases.

(a) θ_{i+1}^t = θ_i^t: In this case τ_{i+1}^t = τ_i^t, and we simply get θ_{i+1}^{t+1} − θ_i^{t+1} = (1/Z^t)( θ_{i+1}^t e^{−η τ_{i+1}^t} − θ_i^t e^{−η τ_i^t} ) = 0.

(b) θ_{i+1}^t > θ_i^t: We see that

I{w^t·x^t − θ_i^t ≤ 0} ≤ I{w^t·x^t − θ_{i+1}^t ≤ 0}
⇒ I{z_i^t(w^t·x^t − θ_i^t) ≤ 0} ≤ I{z_{i+1}^t(w^t·x^t − θ_{i+1}^t) ≤ 0}
⇒ τ_{i+1}^t ≤ τ_i^t
⇒ −η τ_{i+1}^t ≥ −η τ_i^t
⇒ e^{−η τ_{i+1}^t} ≥ e^{−η τ_i^t}
⇒ θ_{i+1}^t e^{−η τ_{i+1}^t} ≥ θ_i^t e^{−η τ_i^t}.

Thus, θ_{i+1}^{t+1} − θ_i^{t+1} ≥ 0.

Thus, M-PRIL preserves the order of the thresholds in consecutive rounds. We now derive the mistake bound for M-PRIL.

Theorem 4. Let S = {(x^1, y_l^1, y_r^1), . . . , (x^T, y_l^T, y_r^T)} be an input sequence given to the M-PRIL algorithm. Let ||x^t||_∞ ≤ 1, ∀t ∈ [T], and c = min_{t∈[T]}(y_r^t − y_l^t). Let there exist γ > 0, w* ∈ R^d and θ_1*, . . . , θ_{K−1}* ∈ R such that ||w*||_1 + ||θ*||_1 = 1 and min_{i∈Ī^t} z_i^t(w*·x^t − θ_i*) ≥ γ, ∀t ∈ [T]. Then the ranking loss of M-PRIL satisfies

Σ_{t=1}^{T} L_I^{MAE}(f^t(x^t), θ^t, y_l^t, y_r^t) ≤ (K − c − 1)^2 log(K + d − 1) / γ^2,

where f^t(x^t) = w^t·x^t.

Proof: Let v^t = (w^t, θ^t). We start by analyzing the change in the KL-divergence between v* and v^t. Let

∆_t = D_KL(v* || v^{t+1}) − D_KL(v* || v^t).

The logic is to bound Σ_{t∈[T]} ∆_t from above and below. We first derive the upper bound:

∆_t = Σ_{i=1}^{d} w_i* log(w_i^t / w_i^{t+1}) + Σ_{k=1}^{K−1} θ_k* log(θ_k^t / θ_k^{t+1})
    = Σ_{i=1}^{d} w_i* log( Z^t / e^{η x_i^t Σ_{k=1}^{K−1} τ_k^t} ) + Σ_{k=1}^{K−1} θ_k* log( Z^t / e^{−η τ_k^t} )
    = log Z^t [ Σ_{i=1}^{d} w_i* + Σ_{k=1}^{K−1} θ_k* ] − η Σ_{k=1}^{K−1} τ_k^t (w*·x^t − θ_k*)
    = log Z^t − η Σ_{k∈M^t} z_k^t (w*·x^t − θ_k*)
    ≤ log Z^t − η γ m^t,    ∵ z_k^t(w*·x^t − θ_k*) ≥ γ, ∀k ∈ Ī^t.

Let c_1 = K − c − 1 and note that |x_i^t Σ_{k=1}^{K−1} τ_k^t| ≤ c_1. We now bound log Z^t. Using the convexity of the exponential on [−c_1, c_1],

Z^t = Σ_{i=1}^{d} w_i^t e^{η x_i^t Σ_{k=1}^{K−1} τ_k^t} + Σ_{k=1}^{K−1} θ_k^t e^{−η τ_k^t}
   ≤ Σ_{i=1}^{d} w_i^t [ ((1 + x_i^t Σ_k τ_k^t / c_1)/2) e^{η c_1} + ((1 − x_i^t Σ_k τ_k^t / c_1)/2) e^{−η c_1} ]
     + Σ_{k=1}^{K−1} θ_k^t [ ((1 − τ_k^t / c_1)/2) e^{η c_1} + ((1 + τ_k^t / c_1)/2) e^{−η c_1} ]
   = (1/2)( e^{η c_1} + e^{−η c_1} ) [ Σ_{i=1}^{d} w_i^t + Σ_{k=1}^{K−1} θ_k^t ] + ((e^{η c_1} − e^{−η c_1})/(2 c_1)) Σ_{k=1}^{K−1} τ_k^t ( w^t·x^t − θ_k^t )
   ≤ (1/2)( e^{η c_1} + e^{−η c_1} ),    ∵ τ_k^t (w^t·x^t − θ_k^t) ≤ 0, ∀k, and Σ_{i} w_i^t + Σ_{k} θ_k^t = 1.

Thus, ∆_t becomes

∆_t ≤ log( (1/2)( e^{η(K−c−1)} + e^{−η(K−c−1)} ) ) − η γ m^t
    ≤ m^t [ log( (1/2)( e^{η(K−c−1)} + e^{−η(K−c−1)} ) ) − η γ ],

where we used the fact that m^t ≥ 1 on mistake trials.

Summing ∆_t over t = 1 to T, we get

Σ_{t=1}^{T} ∆_t ≤ log( (1/2)( e^{η(K−c−1)} + e^{−η(K−c−1)} ) ) Σ_{t=1}^{T} m^t − η γ Σ_{t=1}^{T} m^t.

Now, we find a lower bound on Σ_{t=1}^{T} ∆_t as follows:

Σ_{t=1}^{T} ∆_t = Σ_{t=1}^{T} [ D_KL(v* || v^{t+1}) − D_KL(v* || v^t) ] = D_KL(v* || v^{T+1}) − D_KL(v* || v^1)
              ≥ −D_KL(v* || v^1)    ∵ D_KL(v* || v^{T+1}) ≥ 0
              ≥ −log(K + d − 1).

Comparing the lower and the upper bound on Σ_{t=1}^{T} ∆_t, we get

Σ_{t=1}^{T} m^t ≤ log(K + d − 1) / [ η γ − log( (1/2)( e^{η(K−c−1)} + e^{−η(K−c−1)} ) ) ].    (8)

The RHS above is minimized by taking

η = (1/(2(K − c − 1))) log( ((K − c − 1) + γ) / ((K − c − 1) − γ) ).

By substituting this value of η in eq. (8), we get

Σ_{t=1}^{T} m^t ≤ log(K + d − 1) / g( γ/(K − c − 1) ),

where g(t) = ((1 + t)/2) log(1 + t) + ((1 − t)/2) log(1 − t). It can be shown that g(t) ≥ t^2/2 for 0 ≤ t ≤ 1. We know that

γ ≤ z_k^t(w*·x^t − θ_k*)    (by the definition of (w*, θ*))
  = z_k^t v*·x̂_k^t    (x̂_k^t is defined in eq. (6))
  ≤ ||v*||_1 ||x̂_k^t||_∞    (using Hölder's inequality)
  = max(||x^t||_∞, 1) = 1,    ∵ ||x^t||_∞ ≤ 1.

Thus, γ/(K − c − 1) ≤ 1. Using g(t) ≥ t^2/2 in the above bound, we get

Σ_{t=1}^{T} L_I^{MAE}(f^t(x^t), θ^t, y_l^t, y_r^t) ≤ Σ_{t=1}^{T} m^t ≤ (K − c − 1)^2 log(K + d − 1) / γ^2.

Thus, M-PRIL converges after making a finite number of mistakes if there exists an ideal interval ranking classifier for the given training data.
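As a quick numerical sanity check (not part of the formal argument), the inequality g(t) ≥ t^2/2 used above can be verified on a grid with a few lines of Python:

import numpy as np

def g(t):
    # g(t) = ((1+t)/2) log(1+t) + ((1-t)/2) log(1-t)
    return 0.5 * (1 + t) * np.log(1 + t) + 0.5 * (1 - t) * np.log(1 - t)

ts = np.linspace(0.0, 0.999, 1000)
assert np.all(g(ts) >= ts ** 2 / 2 - 1e-12)   # holds on the sampled grid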

5. Experiments

We now present experimental results showing the effectiveness of the proposed approach. We first describe the datasets used.

5.1. Dataset Description

We show simulation results on 3 datasets, described below (a small generator sketch for the synthetic data follows the list).

1. Synthetic Dataset: We generate points x ∈ R^2 uniformly at random from the unit square [0, 1]^2. For each point, the rank is assigned from the set {1, . . . , 5} as y = max_r {r : 10(x_1 − 0.5)(x_2 − 0.5) + ξ > b_r}, where b = (−∞, −1, −0.1, 0.25, 1) and ξ ∼ N(0, 0.125) (normally distributed with zero mean and a standard deviation of 0.125). A visualization of the synthetic dataset is provided in Figure 1. We generated 100 sequences of instance-rank pairs, each of length 10000.

2. Parkinson's Telemonitoring Dataset: This dataset (Lichman, 2013) contains biomedical voice measurements for 42 patients in various stages of Parkinson's disease. There are 5875 observations in total, with 20 features; the target variable is total-UPDRS. In the dataset, total-UPDRS spans the range 7-55, with higher values representing more severe disability. We divide the range of total-UPDRS into 10 parts. We normalize each feature independently to zero mean and unit standard deviation.

3. Abalone Dataset: The age of an abalone (Lichman, 2013) is determined by counting the number of rings. There are 8 features and 4177 observations. The number of rings varies from 1 to 29; however, the distribution is very skewed, so we divide the whole range into 4 parts, namely 1-7, 8-9, 10-12 and 13-29.
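A minimal Python sketch of the synthetic data generator described in item 1 above (our own rendering of that construction; names and the seed are ours) is given below.

import numpy as np

def generate_synthetic(n=10000, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    noise = rng.normal(0.0, 0.125, size=n)
    b = np.array([-np.inf, -1.0, -0.1, 0.25, 1.0])   # b_1, ..., b_5
    scores = 10.0 * (X[:, 0] - 0.5) * (X[:, 1] - 0.5) + noise
    # Rank = largest r such that the noisy score exceeds b_r.
    y = np.array([np.max(np.where(s > b)[0]) + 1 for s in scores])
    return X, y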

The training data is comprised of two parts: one part contains the absolute (exact) labels for the feature vectors and the other contains partial labels. In our experiments, we keep 25% of the examples with the correct label and the remaining 75% with partial labels.

Generating Interval Labels: Let there be K categories. We consider two different methods of generating interval labels (a small code sketch follows the list).

1. For y ∈ {2, . . . , K − 1}, the interval label is chosen randomly between [y − 1, y] and [y, y + 1], where y is the true label. For class label 1, the interval label is set to [1, 2]; for class label K, it is set to [K − 1, K].

2. For y ∈ {2, . . . , K − 1}, the interval label is set to [y − 1, y + 1], where y is the true label. For class label 1, the interval label is set to [1, 2]; for class label K, it is set to [K − 1, K].
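For illustration, the two interval-labeling schemes above can be implemented as follows (a sketch under our own naming; "scheme 1" and "scheme 2" refer to the two methods just listed).

import random

def interval_label(y, K, scheme=1, rng=random):
    """Return (y_l, y_r) containing the true label y under the given scheme."""
    if y == 1:
        return (1, 2)
    if y == K:
        return (K - 1, K)
    if scheme == 1:
        # Type 1: randomly pick [y-1, y] or [y, y+1].
        return (y - 1, y) if rng.random() < 0.5 else (y, y + 1)
    # Type 2: always the symmetric interval [y-1, y+1].
    return (y - 1, y + 1)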

Figure 1. Synthetic Dataset.

5.2. Experimental Setup

Kernel Functions Used: We used the following kernel functions for the different datasets (simple implementations are sketched after the list).

• Synthetic: κ(x_1, x_2) = (x_1·x_2 + 1)^2
• Parkinson's Telemonitoring: κ(x_1, x_2) = x_1·x_2
• Abalone: κ(x_1, x_2) = (x_1·x_2 + 1)^3
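These kernels correspond to the following simple Python functions (an illustrative sketch; x1 and x2 are feature vectors).

import numpy as np

def linear_kernel(x1, x2):
    return np.dot(x1, x2)                 # Parkinson's Telemonitoring

def poly2_kernel(x1, x2):
    return (np.dot(x1, x2) + 1.0) ** 2    # Synthetic

def poly3_kernel(x1, x2):
    return (np.dot(x1, x2) + 1.0) ** 3    # Abalone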
5.3. Varying the Fraction of Partial Labels in PRIL

We now discuss the performance of PRIL when we vary the fraction of partial labels in the training set. We train with 60%, 70%, 80%, 90% and 100% of the examples carrying partial labels. We compute the loss at each round with the same partial label used for updating the hypothesis; for PRIL, at each time step, we compute the average of L^{MAE} (defined in eq. (1)). The results are shown in Figure 2. We see that for all the datasets the average mean absolute error (MAE) decreases as the number of rounds (T) increases. Also, the average MAE decreases as the fraction of examples with partial labels increases; we observe this for all 3 datasets and both types of partial labels. This happens because the allowed range of predicted values is larger for partial labels than for an exact label. Consider an example x^t with partial label [y_l^t, y_r^t] (y_r^t > y_l^t) and exact label y^t. The MAE for this example with a partial label is a sum of K − y_r^t + y_l^t − 1 losses, whereas it becomes a sum of K − 1 losses if the label for x^t is exact. Thus, as we increase the fraction of partial labels in the training set, the average MAE decreases. We see that training with no partial labels gives the largest average MAE values for all the datasets.

5.4. Comparisons With Other Approaches

We compare the proposed algorithm with (a) PRank (Crammer & Singer, 2001b) using the exact labels, (b) Widrow-Hoff (Widrow & Hoff, 1988), posing the problem as regression, and (c) multi-class Perceptron (Crammer & Singer, 2001a). For PRIL, at each time step, we compute the average of L^{MAE} (defined in eq. (1)). For PRank, Widrow-Hoff and MC-Perceptron, we compute the average absolute error (1/T) Σ_{t=1}^{T} |ŷ_t − y_t|. We repeat the process 100 times and average the instantaneous losses across the 100 runs.

We train PRIL with 100% partially labeled examples (Type 1 or Type 2), but when we compute the MAE, we use the exact labels of the examples. That is, we use the model trained by PRIL with all partial labels and measure its performance with respect to the exact labels. This sets a harder evaluation criterion for PRIL. Moreover, in practice, we want a single predicted label and we want to see how close it is to the original label. Figure 3 shows the comparison of PRIL with PRank, Widrow-Hoff (WH) and Multi-Class Perceptron (MCP).

We see that PRIL does better than Widrow-Hoff. This happens because Widrow-Hoff does not consider the categorical nature of the labels even though it respects the ordering of the labels.

Figure 2. Experiment: Varying the fraction of examples with partial labels. Variation in average MAE performance as the fraction of partial labels in the training set is varied. The loss is computed using the partial labels.

On the other hand, MCP is a complex model for solving a ranking problem and it also does not consider the ordering of the labels. We see that for the Synthetic and Parkinson's datasets PRIL performs better than MCP.

We observe that the performance of PRIL is comparable to PRank (Crammer & Singer, 2001b); on the Abalone and Parkinson's datasets, it performs similarly to PRank. This means that PRIL is able to recover the underlying classifier even when we have only partial labels. This is a very interesting finding, as we do not need to provide exact labels; all we need is a range around the exact label. Thus, PRIL appears to be a better way to deal with ordinal classification when there is uncertainty in the labels.

6. Conclusions

We proposed a new online algorithm called PRIL for learning ordinal classifiers when we only have partial labels for the examples. We showed the correctness of the proposed algorithm and that PRIL converges after making a finite number of mistakes whenever there exists an ideal partial labeling for every example. We also provided the mistake bound for the general case and a regret bound for PRIL. We further proposed a multiplicative update algorithm for PRIL (M-PRIL), showed its correctness and found its mistake bound. We experimentally showed that PRIL is a very effective algorithm when we have partial labels.

References

Antoniuk, Kostiantyn, Franc, Vojtech, and Hlavac, Vaclav. Interval insensitive loss for ordinal classification. In Proceedings of the Sixth Asian Conference on Machine Learning, volume 39 of Proceedings of Machine Learning Research, pp. 189-204, Nha Trang City, Vietnam, Nov 2015.

Antoniuk, Kostiantyn, Franc, Vojtech, and Hlavac, Vaclav. V-shaped interval insensitive loss for ordinal classification. Machine Learning, 103(2):261-283, May 2016.

Chu, Wei and Keerthi, S. Sathiya. New approaches to support vector ordinal regression. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp. 145-152, 2005.

Crammer, Koby and Singer, Yoram. Ultraconservative online algorithms for multiclass problems. In 14th Annual Conference on Computational Learning Theory, COLT 2001, pp. 99-115, Amsterdam, The Netherlands, July 2001a.

Crammer, Koby and Singer, Yoram. Pranking with ranking. In Proceedings of the 14th International Conference on Neural Information Processing Systems, NIPS'01, pp. 641-647, 2001b.

Kivinen, Jyrki and Warmuth, Manfred K. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1-63, 1997.

Li, Ling and Lin, Hsuan-Tien. Ordinal regression by extended binary classification. In Proceedings of the 19th International Conference on Neural Information Processing Systems (NIPS), pp. 865-872, 2006.

Lichman, M. UCI machine learning repository, 2013. URL http://archive.ics.uci.edu/ml.

Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386-408, 1958.

Shashua, Amnon and Levin, Anat. Ranking with large margin principle: Two approaches. In Proceedings of the 15th International Conference on Neural Information Processing Systems, NIPS'02, pp. 961-968, Vancouver, British Columbia, Canada, 2002.

Widrow, Bernard and Hoff, Marcian E. Adaptive switching circuits. In Neurocomputing: Foundations of Research, pp. 123-134. MIT Press, 1988.

Zinkevich, Martin. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, ICML'03, pp. 928-935, 2003.

Figure 3. PRIL vs. PRank, Widrow-Hoff and Multi-Class Perceptron (MCP). PRIL trained using partial labels and evaluated on exact labels.