Deep Learning and Weak Supervision for Image Classification

Matthieu Cord
Joint work with Thibaut Durand, Nicolas Thome
Sorbonne Universités - Université Pierre et Marie Curie (UPMC)
Laboratoire d'Informatique de Paris 6 (LIP6) - MLIA Team, UMR CNRS
June 09, 2016

Outline

Context: Visual classification
1. MANTRA: Latent variable model to boost classification performance
2. WELDON: extension to deep CNNs

Motivations

• Working on datasets with complex scenes (large and cluttered background), non-centered objects, variable size, ...
[Figure: dataset examples: VOC07/12, MIT67, 15 Scene, COCO, VOC12 Action]
• Select relevant regions → better prediction
• ImageNet: centered objects
  - Efficient transfer: needs bounding boxes [Oquab, CVPR14]
• Full annotations are expensive ⇒ training with weak supervision

Motivations

How to learn without bounding boxes?
• Multiple-Instance Learning / latent variables for missing information [Felzenszwalb, PAMI10]
• Latent SVM and extensions ⇒ MANTRA

How to learn deep without bounding boxes?
• Learning invariance with input image transformations
  - Spatial Transformer Networks [Jaderberg, NIPS15]
• Attention models: to select relevant regions
  - Stacked Attention Networks for Image Question Answering [Yang, CVPR16]
• Parts model
  - Automatic discovery and optimization of parts for image classification [Parizi, ICLR15]
• Deep MIL
  - Is object localization for free? [Oquab, CVPR15]
  - Deep extension of MANTRA: WELDON

Notations

Variable   Notation   Space   Train        Test         Example
Input      x          X       observed     observed     image
Output     y          Y       observed     unobserved   label
Latent     h          H       unobserved   unobserved   region

• Model missing information with latent variables h
• Most popular approach in Computer Vision: Latent SVM [Felzenszwalb, PAMI10] [Yu, ICML09]

Latent Structural SVM [Yu, ICML09]

• Prediction function:
\[
(\hat{y}, \hat{h}) = \arg\max_{(y,h) \in \mathcal{Y} \times \mathcal{H}} \langle w, \Psi(x, y, h) \rangle \tag{1}
\]
  - $\Psi(x, y, h)$: joint feature map
  - Joint inference in the $(\mathcal{Y} \times \mathcal{H})$ space
• Training: a set of N labeled training pairs $(x_i, y_i)$
  - Objective function: upper bound of $\Delta(y_i, \hat{y}_i)$
\[
\frac{1}{2}\|w\|^2 + \frac{C}{N} \sum_{i=1}^{N} \underbrace{\Big[ \max_{(y,h) \in \mathcal{Y} \times \mathcal{H}} \big[ \Delta(y_i, y) + \langle w, \Psi(x_i, y, h) \rangle \big] - \max_{h \in \mathcal{H}} \langle w, \Psi(x_i, y_i, h) \rangle \Big]}_{\geq\, \Delta(y_i, \hat{y}_i)}
\]
  - Difference of convex functions, solved with CCCP
  - LAI (loss-augmented inference): $\max_{(y,h) \in \mathcal{Y} \times \mathcal{H}} \big[ \Delta(y_i, y) + \langle w, \Psi(x_i, y, h) \rangle \big]$
  - Challenge exacerbated in the latent case, $(\mathcal{Y} \times \mathcal{H})$ space

MANTRA: Minimum Maximum Latent Structural SVM

Classifying only with the max scoring latent value is not always relevant.

MANTRA model:
• Pair of latent variables $(h^+_{i,y}, h^-_{i,y})$
  - max scoring latent value: $h^+_{i,y} = \arg\max_{h \in \mathcal{H}} \langle w, \Psi(x_i, y, h) \rangle$
  - min scoring latent value: $h^-_{i,y} = \arg\min_{h \in \mathcal{H}} \langle w, \Psi(x_i, y, h) \rangle$
• New scoring function:
\[
D_w(x_i, y) = \langle w, \Psi(x_i, y, h^+_{i,y}) \rangle + \langle w, \Psi(x_i, y, h^-_{i,y}) \rangle \tag{2}
\]
• Prediction function ⇒ find the output with maximum score
\[
\hat{y} = \arg\max_{y \in \mathcal{Y}} D_w(x_i, y) \tag{3}
\]
• MANTRA: max+min vs max for LSSVM ⇒ negative evidence

MANTRA: Model & Training Rationale

Intuition of the max+min prediction function
• x image, h image region, y image class
• $\langle w, \Psi(x, y, h) \rangle$: region h score for class y
• $D_w(x, y) = \langle w, \Psi(x, y, h^+_y) \rangle + \langle w, \Psi(x, y, h^-_y) \rangle$
  - $h^+_y$: presence of class y ⇒ large for $y_i$
  - $h^-_y$: localized evidence of the absence of class y
    - Not too low for $y_i$ ⇒ latent space regularization
    - Low for $y \neq y_i$ ⇒ tracking negative evidence [Parizi, ICLR15]

Example (street image x): $D_w(x, \text{street}) = 2$, $D_w(x, \text{highway}) = 0.7$, $D_w(x, \text{coast}) = -1.5$

MANTRA: Model Training

Learning formulation
• Loss function: $\ell_w(x_i, y_i) = \max_{y \in \mathcal{Y}} \big[ \Delta(y_i, y) + D_w(x_i, y) \big] - D_w(x_i, y_i)$
  - (Margin rescaling) upper bound of $\Delta(y_i, \hat{y})$, constraints:
\[
\forall y \neq y_i, \quad \underbrace{D_w(x_i, y_i)}_{\text{score for ground truth output}} \geq \underbrace{\Delta(y_i, y)}_{\text{margin}} + \underbrace{D_w(x_i, y)}_{\text{score for other output}}
\]
• Non-convex optimization problem
\[
\min_w \frac{1}{2}\|w\|^2 + \frac{C}{N} \sum_{i=1}^{N} \ell_w(x_i, y_i) \tag{4}
\]
• Solver: non-convex one-slack cutting plane [Do, JMLR12]
  - Fast convergence
  - Direct optimization ≠ CCCP for LSSVM
  - Still needs to solve LAI: $\max_y \big[ \Delta(y_i, y) + D_w(x_i, y) \big]$

MANTRA: Optimization

• MANTRA instantiation: define $(x, y, h)$, $\Psi(x, y, h)$, $\Delta(y_i, y)$
• Instantiations: binary & multi-class classification, AP ranking

                  Binary                 Multi-class                                              AP Ranking
x                 bag (set of regions)   bag (set of regions)                                     set of bags (of regions)
y                 ±1                     {1, ..., K}                                              ranking matrix
h                 instance (region)      region                                                   regions
Ψ(x, y, h)        y · Φ(x, h)            [1(y=1) Φ(x, h), ..., 1(y=K) Φ(x, h)]                    joint latent ranking feature map
Δ(y_i, y)         0/1 loss               0/1 loss                                                 AP loss
LAI               exhaustive             exhaustive                                               exact and efficient

• Solve inference $\max_y D_w(x_i, y)$ & LAI $\max_y \big[ \Delta(y_i, y) + D_w(x_i, y) \big]$
  - Exhaustive for binary/multi-class classification
  - Exact and efficient solutions for ranking

WELDON: Weakly supErvised Learning of Deep cOnvolutional Nets

• MANTRA extension for training deep CNNs
• Learning $\Psi(x, y, h)$: end-to-end learning of deep CNNs with structured prediction and latent variables
  - Incorporating multiple positive & negative evidence
  - Training deep CNNs with structured loss

Standard deep CNN architecture: VGG16

[Figure: VGG16 architecture]
Simonyan et al. Very deep convolutional networks for large-scale image recognition. ICLR 2015

MANTRA adaptation for deep CNN

Problem
• Fixed-size image as input

Adapt architecture to weakly supervised learning
1. Fully connected layers → convolution layers
  - sliding window approach
2. Spatial aggregation
  - Perform object localization prediction

WELDON: deep architecture

[Figure: WELDON deep architecture]
• C: number of classes

Aggregation function [Oquab, 2015]

• Region aggregation = max
• Select the highest-scoring window
[Figure: original image; motorbike feature map; max prediction]
Oquab, Bottou, Laptev, Sivic. Is object localization for free? Weakly-supervised learning with convolutional neural networks. CVPR 2015

WELDON: region aggregation

Aggregation strategy:
• max+min pooling (MANTRA prediction function)
• k-instances
  - Single region to multiple high scoring regions:
\[
\max \;\to\; \frac{1}{k} \sum_{i=1}^{k} i\text{-th max}, \qquad \min \;\to\; \frac{1}{k} \sum_{i=1}^{k} i\text{-th min}
\]
  - More robust region selection [Vasconcelos CVPR15]
[Figure panels: max; max + min; 3 max + 3 min]

WELDON: architecture

[Figure: WELDON architecture]

WELDON: learning

• Objective function for multi-class task and k = 1:
\[
\min_w \; \mathcal{R}(w) + \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_w(x_i), y_i^{gt}\big)
\]
\[
f_w(x_i) = \arg\max_y \Big[ \max_h L^w_{conv}(x_i, y, h) + \min_{h'} L^w_{conv}(x_i, y, h') \Big]
\]

How to learn the deep architecture?
• Stochastic gradient descent training.
• Back-propagation of the error through the selected windows.

WELDON: learning

Class is present
• Increase score of selected windows.
Figure: Car map

WELDON: learning

Class is absent
• Decrease score of selected windows.
Figure: Boat map

Experiments

• VGG16 pre-trained on ImageNet
• Torch7 implementation

Datasets
• Object recognition: Pascal VOC 2007, Pascal VOC 2012
• Scene recognition: MIT67, 15 Scene
• Visual recognition, where context plays an important role: COCO, Pascal VOC 2012 Action
[Figure: dataset examples: VOC07/12, MIT67, 15 Scene, COCO, VOC12 Action]

Experiments

Dataset        Train      Test       Classes   Classification
VOC07          ∼5,000     ∼5,000     20        multi-label
VOC12          ∼5,700     ∼5,800     20        multi-label
15 Scene       1,500      2,985      15        multi-class
MIT67          5,360      1,340      67        multi-class
VOC12 Action   ∼2,000     ∼2,000     10        multi-label
COCO           ∼80,000    ∼40,000    80        multi-label

Experiments

• Multi-scale: 8 scales (combination with Object Bank strategy)

Object recognition

                          VOC 2007   VOC 2012
VGG16 (online code) [1]   84.5       82.8
SPP net [2]               82.4       -
Deep WSL MIL [3]          -          81.8
WELDON                    90.2       88.5

Table: mAP results on object recognition datasets.
[1] Simonyan et al. Very deep convolutional networks. ICLR 2015
[2] He et al. Spatial pyramid pooling in deep convolutional networks. ECCV 2014
[3] Oquab et al. Is object localization for free? CVPR 2015

Scene recognition

                          15 Scene   MIT67
VGG16 (online code) [1]   91.2       69.9
MOP CNN [2]               -          68.9
Negative parts [3]        -          77.1
WELDON                    94.3       78.0

Table: Multi-class accuracy results on scene categorization datasets.
[1] Simonyan et al. Very deep convolutional networks. ICLR 2015
[2] Gong et al. Multi-scale Orderless Pooling of Deep Convolutional Activation Features. ECCV 2014
[3] Parizi et al. Automatic discovery and optimization of parts. ICLR 2015

Context datasets

                          VOC 2012 action   COCO
VGG16 (online code) [1]   67.1              59.7
Deep WSL MIL [2]          -                 62.8
Our WSL deep CNN          75.0              68.8

Table: mAP results on context datasets.
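
The max+min region aggregation and the margin-rescaled loss from the MANTRA and WELDON slides can be sketched in a few lines of NumPy. This is only an illustrative forward pass, not the authors' Torch7 implementation: the function names (`max_min_pool`, `predict`, `margin_rescaled_loss`), the `(C, R)` layout of the region score matrix (C classes, R regions), and the use of a 0/1 loss for Δ are assumptions made for this example.

```python
import numpy as np


def max_min_pool(region_scores, k=1):
    """Aggregate per-region scores into one score per class.

    region_scores: array of shape (C, R), one row of region scores per class
    (hypothetical layout chosen for this sketch).
    Returns an array of shape (C,): mean of the k highest region scores
    plus mean of the k lowest region scores, for each class.
    """
    scores = np.asarray(region_scores, dtype=float)
    if not 1 <= k <= scores.shape[1]:
        raise ValueError("k must be between 1 and the number of regions")
    ordered = np.sort(scores, axis=1)        # ascending along the region axis
    top_k = ordered[:, -k:].mean(axis=1)     # positive evidence (k best regions)
    low_k = ordered[:, :k].mean(axis=1)      # negative evidence (k worst regions)
    return top_k + low_k


def predict(region_scores, k=1):
    """Return the index of the class with the highest aggregated score."""
    return int(np.argmax(max_min_pool(region_scores, k)))


def margin_rescaled_loss(class_scores, y_true):
    """max_y [Delta(y_true, y) + D(x, y)] - D(x, y_true), with a 0/1 Delta."""
    d = np.asarray(class_scores, dtype=float)
    delta = np.ones_like(d)
    delta[y_true] = 0.0                      # no penalty for the correct class
    return float(np.max(delta + d) - d[y_true])


# Toy example: class 1 has the single best region, but also a strongly
# negative region, so max-only pooling picks class 1 and max+min picks class 0.
scores = np.array([[1.5, 0.2, 0.1],
                   [2.0, 0.0, -3.0]])
print("max-only prediction:", int(np.argmax(scores.max(axis=1))))
print("max+min prediction: ", predict(scores, k=1))
print("loss if class 0 is the label:", margin_rescaled_loss(max_min_pool(scores), 0))
```

With `k=1` the pooling reduces to the MANTRA scoring function of equation (2); larger `k` corresponds to the k-instances variant, which averages over several selected regions. In an actual network, gradients would flow only through the selected regions, which this forward-only sketch does not show.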