PRIL: Perceptron Ranking Using Interval Labeled Data

Total Pages: 16

File Type: PDF, Size: 1020 KB

PRIL: Perceptron Ranking Using Interval Labeled Data

Naresh Manwani ([email protected])
IIIT Hyderabad, India

arXiv:1802.03873v1 [cs.LG] 12 Feb 2018

Abstract

In this paper, we propose an online learning algorithm, PRIL, for learning ranking classifiers using interval labeled data and show its correctness. We show its convergence in a finite number of steps if there exists an ideal classifier such that the rank given by it for an example always lies in its label interval. We then generalize this mistake bound result to the general case. We also provide a regret bound for the proposed algorithm. We propose a multiplicative update algorithm for PRIL, called M-PRIL, and provide its correctness and convergence results. We show the effectiveness of PRIL by reporting its performance on various datasets.

1. Introduction

Ranking (also called ordinal classification) is an important problem in machine learning. Ranking differs from multi-class classification in that there is an ordering among the class labels. An example is product ratings on online retail stores, which are based on customer reviews, product quality, price and many other factors. Usually these ratings are numbered between 1 and 5. While these numbers can be thought of as class labels, there is also an ordering which has to be taken care of. This problem has been well studied in machine learning and is referred to as ordinal classification or ranking.

In general, an ordinal classifier can be completely defined by a linear function and a set of K − 1 thresholds (K being the number of classes). Each threshold corresponds to a class; thus, the thresholds should have the same order as their corresponding classes. The classifier decides the rank (class) based on the relative position of the linear function value with respect to the different thresholds. One can also learn a nonlinear classifier by using an appropriate nonlinear transformation. Many discriminative approaches based on the risk minimization principle have been proposed for learning ordinal classifiers. Variants of large margin frameworks for learning ordinal classifiers are proposed in Shashua & Levin (2002) and Chu & Keerthi (2005). One can maintain the order of the thresholds implicitly or explicitly: the explicit approach poses the ordering as a constraint in the optimization problem itself, while the implicit approach captures the ordering by posing separability conditions between every pair of classes. Li & Lin (2006) propose a generic method which converts learning an ordinal classifier into learning a binary classifier with weighted examples. A classical online algorithm for learning linear classifiers is proposed in Rosenblatt (1958). Crammer & Singer (2001b) extended the Perceptron learning algorithm to ordinal classifiers.

In the approaches discussed so far, the training data has the correct class label for each feature vector. However, in many cases we may not know the exact label. Instead, we may have an interval in which the true label lies. Such a scenario is discussed in Antoniuk et al. (2015; 2016). In this setting, an interval label is provided for each example and it is assumed that the true label of the example lies in this interval. In Antoniuk et al. (2016), a large margin framework for batch learning is proposed using an interval insensitive loss function.

In this paper, we propose an online algorithm for learning an ordinal classifier using interval labeled data. We name the proposed approach PRIL (Perceptron ranking using interval labeled data). Our approach is based on the interval insensitive loss function. To our knowledge, this is the first online ranking algorithm using interval labeled data. We show the correctness of the algorithm by showing that after each iteration, the algorithm maintains the ordering of the thresholds. We derive mistake bounds for the proposed algorithm in both the ideal and the general setting. In the ideal setting, we show that the algorithm stops after making a finite number of mistakes. We also derive a regret bound for the algorithm. We further propose a multiplicative update algorithm for PRIL (called M-PRIL), show its correctness and find its mistake bound.

The rest of the paper is organized as follows. In Section 2, we describe the problem of learning an ordinal classifier using interval labeled data. In Section 3, we discuss the proposed online algorithm for learning an ordinal classifier using interval labeled data. We derive the mistake bounds and the regret bound in Section 3.2. We present the experimental results in Section 5 and conclude with some remarks on future work in Section 6.
2. Ordinal Classification using Interval Labeled Data

Let X ⊆ R^d be the instance space and let Y = {1, ..., K} be the label space. Our objective is to learn an ordinal classifier h : X → Y of the following form:

h(x) = 1 + \sum_{k=1}^{K-1} \mathbb{I}_{\{w \cdot x > \theta_k\}} = \min_{i \in [K-1]} \{ i : w \cdot x - \theta_i \le 0 \}

where w ∈ R^d and θ ∈ R^{K−1} are the parameters to be optimized. The parameters θ = [θ_1 ... θ_{K−1}] should be such that θ_1 ≤ θ_2 ≤ ... ≤ θ_{K−1}. The classifier splits the real line into K consecutive intervals using the thresholds θ_1, ..., θ_{K−1} and then decides the class label based on the interval in which w · x falls.

Here, we assume that for each example x, the annotator provides an interval [y_l, y_r] ∈ Y × Y (y_l ≤ y_r). The interval annotation means that the true label y for example x lies in the interval [y_l, y_r]. Let S = {(x^1, y_l^1, y_r^1), ..., (x^T, y_l^T, y_r^T)} be the training set.

The discrepancy between the predicted label and the corresponding label interval can be measured using the interval insensitive loss (Antoniuk et al., 2016):

L_I^{\mathrm{MAE}}(f(x), \theta, y_l, y_r) = \sum_{i=1}^{y_l-1} \mathbb{I}_{\{f(x) < \theta_i\}} + \sum_{i=y_r}^{K-1} \mathbb{I}_{\{f(x) \ge \theta_i\}}    (1)

where the subscript I stands for interval. This loss function takes value 0 whenever θ_{y_l−1} ≤ f(x) ≤ θ_{y_r}. However, it is discontinuous. A convex surrogate of this loss function is as follows (Antoniuk et al., 2016):

L_I^{\mathrm{IMC}}(f(x), y_l, y_r, \theta) = \sum_{i=1}^{y_l-1} \max(0, -f(x) + \theta_i) + \sum_{i=y_r}^{K-1} \max(0, f(x) - \theta_i)    (2)

Here IMC stands for the implicit constraints for ordering of the thresholds θ_i. For a given example-interval pair {x, (y_l, y_r)}, the loss L_I^{\mathrm{IMC}}(f(x), θ, y_l, y_r) becomes zero only when

f(x) - \theta_i \ge 0 \quad \forall i \in \{1, \ldots, y_l - 1\}
f(x) - \theta_i \le 0 \quad \forall i \in \{y_r, \ldots, K - 1\}

Note that if f(x) < θ_i for any i ∈ {1, ..., y_l − 1}, then f(x) < θ_j for all j = y_r, ..., K − 1, because θ_1 ≤ θ_2 ≤ ... ≤ θ_{K−1}. Similarly, if f(x) > θ_i for any i ∈ {y_r, ..., K − 1}, then f(x) > θ_j for all j = 1, ..., y_l − 1. Let Ī = {1, ..., y_l − 1} ∪ {y_r, ..., K − 1}. Then, we define z_i for all i ∈ Ī as follows:

z_i = \begin{cases} +1 & \forall i \in \{1, \ldots, y_l - 1\} \\ -1 & \forall i \in \{y_r, \ldots, K - 1\} \end{cases}    (3)

Thus, L_I^{\mathrm{IMC}}(f(x), y_l, y_r, θ) = 0 requires that z_i (f(x) − θ_i) ≥ 0 for all i ∈ Ī, and L_I^{\mathrm{IMC}} can be re-written as:

L_I^{\mathrm{IMC}}(f(x), y_l, y_r, \theta) = \sum_{i \in \bar{I}} \max(0, -z_i (f(x) - \theta_i))
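As a concrete illustration of these definitions, the following Python sketch (ours, not from the paper; the function names are illustrative only) evaluates the prediction rule h(x) and the surrogate loss of eq. (2) for a linear scorer f(x) = w · x, with the K − 1 thresholds stored in a 0-indexed array.

import numpy as np

def predict_rank(w, theta, x):
    # h(x) = 1 + number of thresholds theta_k with w . x > theta_k
    return 1 + int(np.sum(np.dot(w, x) > theta))

def interval_insensitive_loss(w, theta, x, y_l, y_r):
    # Convex surrogate L_I^IMC of eq. (2) for the interval label [y_l, y_r].
    # theta[i-1] holds theta_i for i = 1, ..., K-1.
    score = np.dot(w, x)
    loss = 0.0
    for i in range(1, y_l):                  # require f(x) >= theta_i, i = 1, ..., y_l - 1
        loss += max(0.0, theta[i - 1] - score)
    for i in range(y_r, len(theta) + 1):     # require f(x) <= theta_i, i = y_r, ..., K - 1
        loss += max(0.0, score - theta[i - 1])
    return loss

For instance, with K = 5 and an interval label [2, 4], the loss penalizes the scorer only when w · x falls below θ_1 or above θ_4, matching the zero-loss conditions above.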
3. Perceptron Ranking using Interval Labeled Data

In this section, we propose an online algorithm for ranking using the interval insensitive loss described in eq. (2). Our algorithm is based on stochastic gradient descent on L_I^{\mathrm{IMC}}. We derive the algorithm for a linear classifier, which means f(x) = w · x. Thus, the parameters to be estimated are w and θ. We initialize with w^0 = 0 and θ^0 = 0. Let w^t, θ^t be the estimates of the parameters at the beginning of trial t. Let x^t be the example observed at trial t and [y_l^t, y_r^t] be its label interval. Then w^{t+1} and θ^{t+1} are found as follows:

w^{t+1} = w^t - \eta \nabla_w L_I^{\mathrm{IMC}}(w \cdot x^t, y_l^t, y_r^t, \theta) \big|_{w^t, \theta^t} = w^t + \sum_{i \in \bar{I}^t} z_i^t x^t \, \mathbb{I}_{\{z_i^t (w^t \cdot x^t - \theta_i^t) < 0\}}

\theta_i^{t+1} = \theta_i^t - \eta \frac{\partial L_I^{\mathrm{IMC}}(w \cdot x^t, y_l^t, y_r^t, \theta)}{\partial \theta_i} \Big|_{w^t, \theta^t} = \begin{cases} \theta_i^t - z_i^t \, \mathbb{I}_{\{z_i^t (w^t \cdot x^t - \theta_i^t) < 0\}} & \forall i \in \bar{I}^t \\ \theta_i^t & \forall i \notin \bar{I}^t \end{cases}

Thus, only those constraints which are not satisfied participate in the update. The violation of the i-th constraint contributes z_i^t x^t to the update of w^{t+1} and z_i^t to the update of θ_i^{t+1}. The thresholds θ_i^t with i ∉ Ī^t are not updated in trial t. The complete approach is described in Algorithm 1. It is important to note that when exact labels instead of partial labels are given to Algorithm 1, it becomes the same as the algorithm proposed in (Crammer & Singer, 2001b). PRIL can be easily extended to learning nonlinear classifiers using kernel methods.

Algorithm 1: Perceptron Ranking using Interval Labeled Data (PRIL)
  Input: Training dataset S
  Initialize: Set t = 1, w^1 = 0, θ_1^1 = θ_2^1 = ... = θ_{K−1}^1 = 0, m = 1
  for t = 1 to T do
    Get example x^t and its (y_l^t, y_r^t)
    for i = 1 to y_l^t − 1 do
      z_i^t = +1
    end for
    for i = y_r^t to K − 1 do
      z_i^t = −1
    end for
    Initialize τ_i^t = 0, i ∈ [K − 1]
    for i ∈ Ī^t do
      if z_i^t (w^t · x^t − θ_i^t) ≤ 0 then
        τ_i^t = z_i^t
      end if
    end for
    w^{t+1} = w^t + \sum_{i=1}^{K-1} τ_i^t x^t
    θ_i^{t+1} = θ_i^t − τ_i^t, i = 1, ..., K − 1
  end for
  Output: h(x) = \min_{i \in [K]} \{ i : w^{T+1} \cdot x - \theta_i^{T+1} < 0 \}

3.1. Kernel PRIL

We can easily extend the proposed algorithm PRIL to learning nonlinear classifiers using kernel functions. We see that the classifier learnt after t trials of PRIL can be completely determined by τ_i^s, i ∈ {1, ..., K − 1}, s ∈ [t], as follows:

w^{t+1} = \sum_{s=1}^{t} \sum_{i \in \bar{I}^s} \tau_i^s x^s, \qquad \theta_i^{t+1} = -\sum_{s=1}^{t} \tau_i^s, \quad i = 1, \ldots, K - 1

The kernelized version is given in Algorithm 2.

Algorithm 2: Kernel PRIL
  Input: Training dataset S
  Output: τ_1^t, ..., τ_{K−1}^t, t = 1, ..., T
  Initialize: Set τ_1^0 = ... = τ_{K−1}^0 = 0, f^0(·) = 0 and t = 0
  for t = 1 to N do
    Get example x^t and its (y_l^t, y_r^t)
    for i = 1 to y_l^t − 1 do
      z_i^t = +1
    end for
    for i = y_r^t to K − 1 do
      z_i^t = −1
    end for
    Initialize τ_i^t = 0, ∀i ∈ [K]
    for i ∈ Ī^t do
      if z_i^t (f^t(x^t) − θ_i^t) ≤ 0 then
        τ_i^t = z_i^t
      end if
    end for
    f^{t+1}(·) = f^t(·) + \sum_{i \in \bar{I}^t} τ_i^t κ(x^t, ·)
    θ_i^{t+1} = θ_i^t − τ_i^t, ∀i ∈ [K]
  end for
  Output: h(x) = \min_{i \in [K]} \{ i : f^{T+1}(x) - \theta_i^{T+1} < 0 \}

Proof: Note that θ_i^t ∈ Z, ∀i ∈ {1, ..., K − 1}, ∀t
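To make the per-trial update of Algorithm 1 concrete, here is a small Python sketch (ours, not the authors' reference implementation; the name pril_update and the 0-indexed threshold array are our own conventions).

import numpy as np

def pril_update(w, theta, x, y_l, y_r):
    # One PRIL trial (Algorithm 1, linear case): update (w, theta) on example x
    # with interval label [y_l, y_r]. theta[i-1] holds theta_i for i = 1, ..., K-1.
    K = len(theta) + 1
    score = np.dot(w, x)                        # f(x) = w . x
    tau = np.zeros(K - 1)
    # I_bar = {1,...,y_l-1} with z_i = +1 and {y_r,...,K-1} with z_i = -1
    constraints = [(i, +1) for i in range(1, y_l)] + [(i, -1) for i in range(y_r, K)]
    for i, z in constraints:
        if z * (score - theta[i - 1]) <= 0:     # violated constraint
            tau[i - 1] = z
    w_new = w + tau.sum() * x                   # w^{t+1} = w^t + sum_i tau_i^t x^t
    theta_new = theta - tau                     # theta_i^{t+1} = theta_i^t - tau_i^t
    return w_new, theta_new

Iterating this over the stream of interval-labeled examples and predicting with the threshold rule h(x) from Section 2 mirrors the structure of Algorithm 1; the kernel variant (Algorithm 2) replaces the explicit weight vector with the expansion f^{t+1}(·) = f^t(·) + Σ_{i∈Ī^t} τ_i^t κ(x^t, ·).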