Self-Supervised Face Representation Learning
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Generating Multi-Agent Trajectories Using Programmatic Weak
Published as a conference paper at ICLR 2019 GENERATING MULTI-AGENT TRAJECTORIES USING PROGRAMMATIC WEAK SUPERVISION Eric Zhan Stephan Zheng∗ Caltech Salesforce [email protected] [email protected] Yisong Yue Long Sha & Patrick Lucey Caltech STATS [email protected] flsha,[email protected] ABSTRACT We study the problem of training sequential generative models for capturing co- ordinated multi-agent trajectory behavior, such as offensive basketball gameplay. When modeling such settings, it is often beneficial to design hierarchical mod- els that can capture long-term coordination using intermediate variables. Further- more, these intermediate variables should capture interesting high-level behavioral semantics in an interpretable and manipulatable way. We present a hierarchical framework that can effectively learn such sequential generative models. Our ap- proach is inspired by recent work on leveraging programmatically produced weak labels, which we extend to the spatiotemporal regime. In addition to synthetic settings, we show how to instantiate our framework to effectively model complex interactions between basketball players and generate realistic multi-agent trajec- tories of basketball gameplay over long time periods. We validate our approach using both quantitative and qualitative evaluations, including a user study compar- ison conducted with professional sports analysts.1 1 INTRODUCTION The ongoing explosion of recorded tracking data is enabling the study of fine-grained behavior in many domains: sports (Miller et al., 2014; Yue et al., 2014; Zheng et al., 2016; Le et al., 2017), video games (Ross et al., 2011), video & motion capture (Suwajanakorn et al., 2017; Taylor et al., 2017; Xue et al., 2016), navigation & driving (Ziebart et al., 2009; Zhang & Cho, 2017; Li et al., 2017), lab- oratory animal behaviors (Johnson et al., 2016; Eyjolfsdottir et al., 2017), and tele-operated robotics (Abbeel & Ng, 2004; Lin et al., 2006). -
Constrained Labeling for Weakly Supervised Learning
Constrained Labeling for Weakly Supervised Learning Chidubem Arachie1 Bert Huang2 1Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA 2Department of Computer Science, Data Intensive Studies Center, Tufts University, Medford, Massachusetts, USA Abstract sion. Weakly supervised learning involves training models using noisy labels. Using multiple sources or forms of weak supervision is common, as it provides diverse information Curation of large fully supervised datasets has be- to the model. However, each source of weak supervision has come one of the major roadblocks for machine its own bias that can be transmitted to the model. Different learning. Weak supervision provides an alterna- weak supervision signals can also conflict, overlap, or—in tive to supervised learning by training with cheap, the worst case—make dependent errors. Thus, a naive com- noisy, and possibly correlated labeling functions bination of these weak signals would hurt the quality of from varying sources. The key challenge in weakly a learned model. The key problem then is how to reliably supervised learning is combining the different combine various sources of weak signals to train an accurate weak supervision signals while navigating mislead- model. ing correlations in their errors. In this paper, we propose a simple data-free approach for combin- To solve this problem, we propose constrained label learn- ing weak supervision signals by defining a con- ing (CLL), a method that processes various weak supervi- strained space for the possible labels of the weak sion signals and combines them to produce high-quality signals and training with a random labeling within training labels. The idea behind CLL is that, given the weak this constrained space. -
Multi-Task Weak Supervision Enables Automated Abnormality Localization in Whole-Body FDG-PET/CT
Multi-task weak supervision enables automated abnormality localization in whole-body FDG-PET/CT Sabri Eyuboglu∗1, Geoffrey Angus∗1, Bhavik N. Patel2, Anuj Pareek2, Guido Davidzon2, Jared Dunnmon,∗∗1 Matthew P. Lungren ∗∗2 1 Department of Computer Science, Stanford University, Stanford, CA 94305, USA 2 Department of Radiology, Stanford University, Stanford, CA 94305, USA ∗ ∗∗ Equal contribution; Computational decision support systems could provide clinical value in whole-body FDG- PET/CT workflows. However, the lack of labeled data combined with the sheer size of each scan make it challenging to apply existing supervised machine learning systems. Leveraging recent advancements in natural language processing, we develop a weak supervision frame- work that extracts imperfect, yet highly granular regional abnormality labels from the radi- ologist reports associated with each scan. Our framework automatically labels each region in a custom ontology of anatomical regions, providing a structured profile of the pathologies in each scan. We design an attention-based, multi-task CNN architecture that can be trained using these generated labels to localize abnormalities in whole-body scans. We show that our multi-task representation is critical for strong performance on rare abnormalities for which we have little data and for accurately predicting mortality from imaging data alone (AUROC 80%), a critical challenge for palliative care teams. 1 Introduction The availability of large, labeled datasets has fueled recent progress in machine learning for med- ical imaging. Breakthrough studies in chest radiograph diagnosis, skin and mammographic lesion classification, and diabetic retinopathy detection have relied on hundreds of thousands of labeled training examples [1–4]. -
Low-Rank Tucker Approximation of a Tensor from Streaming Data : ; \ \ } Yiming Sun , Yang Guo , Charlene Luo , Joel Tropp , and Madeleine Udell
SIAM J. MATH.DATA SCI. © 2020 Society for Industrial and Applied Mathematics Vol. 2, No. 4, pp. 1123{1150 Low-Rank Tucker Approximation of a Tensor from Streaming Data˚ : ; x { } Yiming Sun , Yang Guo , Charlene Luo , Joel Tropp , and Madeleine Udell Abstract. This paper describes a new algorithm for computing a low-Tucker-rank approximation of a tensor. The method applies a randomized linear map to the tensor to obtain a sketch that captures the important directions within each mode, as well as the interactions among the modes. The sketch can be extracted from streaming or distributed data or with a single pass over the tensor, and it uses storage proportional to the degrees of freedom in the output Tucker approximation. The algorithm does not require a second pass over the tensor, although it can exploit another view to compute a superior approximation. The paper provides a rigorous theoretical guarantee on the approximation error. Extensive numerical experiments show that the algorithm produces useful results that improve on the state-of-the-art for streaming Tucker decomposition. Key words. Tucker decomposition, tensor compression, dimension reduction, sketching method, randomized algorithm, streaming algorithm AMS subject classifcations. 68Q25, 68R10, 68U05 DOI. 10.1137/19M1257718 1. Introduction. Large-scale datasets with natural tensor (multidimensional array) struc- ture arise in a wide variety of applications, including computer vision [39], neuroscience [9], scientifc simulation [3], sensor networks [31], and data mining [21]. In many cases, these tensors are too large to manipulate, to transmit, or even to store in a single machine. Luckily, tensors often exhibit a low-rank structure and can be approximated by a low-rank tensor factorization, such as CANDECOMP/PARAFAC (CP), tensor train, or Tucker factorization [20]. -
Visual Learning with Weak Supervision
Visual Learning with Weak Supervision Safa Cicek Committee: Prof. Stefano Soatto, Chair Prof. Lieven Vandenberghe Prof. Paulo Tabuada Prof. Guy Van den Broeck PHD Defense January 2021 UCLA ELECTRICAL AND COMPUTER ENGINEERING Table of Contents Introduction SaaS: Speed as a Supervisor for Semi-supervised Learning [1] Input and Weight Space Smoothing for Semi-supervised Learning [2] Unsupervised Domain Adaptation via Regularized Conditional Alignment [3] Disentangled Image Generation for Unsupervised Domain Adaptation [4] Spatial Class Distribution Shift in Unsupervised Domain Adaptation [5] Learning Topology from Synthetic Data for Unsupervised Depth Completion [6] Targeted Adversarial Perturbations for Monocular Depth Prediction [7] Concluding Remarks [1] Cicek, Safa, Alhussein Fawzi, and Stefano Soatto. Saas: Speed as a supervisor for semi-supervised learning. Proceedings of the European Conference on Computer Vision (ECCV). 2018. [2] Cicek, Safa, and Stefano Soatto. Input and Weight Space Smoothing for Semi-supervised Learning. Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops. 2019. [3] Cicek, Safa, and Stefano Soatto. Unsupervised domain adaptation via regularized conditional alignment. Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2019. [4] Cicek, Safa, Zhaowen Wang, Hailin Jin, Stefano Soatto, Generative Feature Disentangling for Unsupervised Domain Adaptation. Proceedings of the European Conference on Computer Vision (ECCV) Workshops. (2020). [5] Cicek, Safa, Ning Xu, Zhaowen Wang, Hailin Jin, Stefano Soatto, Spatial Class Distribution Shift in Unsupervised Domain Adaptation. Asian Conference on Computer Vision (ACCV). 2020. [6] Wong Alex, Safa Cicek, Stefano Soatto, Learning Topology from Synthetic Data for Unsupervised Depth Completion, IEEE Robotics and Automation Letters (RAL). 2021. [7] Wong Alex, Safa Cicek, Stefano Soatto, Targeted Adversarial Perturbations for Monocular Depth Prediction. -
Polynomial Tensor Sketch for Element-Wise Function of Low-Rank Matrix
Polynomial Tensor Sketch for Element-wise Function of Low-Rank Matrix Insu Han 1 Haim Avron 2 Jinwoo Shin 3 1 Abstract side, we obtain a o(n2)-time approximation scheme of f (A)x ≈ T T >x for an arbitrary vector x 2 n due This paper studies how to sketch element-wise U V R to the associative property of matrix multiplication, where functions of low-rank matrices. Formally, given the exact computation of f (A)x requires the complexity low-rank matrix A = [A ] and scalar non-linear ij of Θ(n2). function f, we aim for finding an approximated low-rank representation of the (possibly high- The matrix-vector multiplication f (A)x or the low-rank > rank) matrix [f(Aij)]. To this end, we propose an decomposition f (A) ≈ TU TV is useful in many machine efficient sketching-based algorithm whose com- learning algorithms. For example, the Gram matrices of plexity is significantly lower than the number of certain kernel functions, e.g., polynomial and radial basis entries of A, i.e., it runs without accessing all function (RBF), are element-wise matrix functions where entries of [f(Aij)] explicitly. The main idea un- the rank is the dimension of the underlying data. Such derlying our method is to combine a polynomial matrices are the cornerstone of so-called kernel methods approximation of f with the existing tensor sketch and the ability to multiply the Gram matrix to a vector scheme for approximating monomials of entries suffices for most kernel learning. The Sinkhorn-Knopp of A. -
Natural Language Processing and Weak Supervision
Natural language processing and weak supervision L´eonBottou COS 424 { 4/27/2010 Introduction Natural language processing \from scratch" { Natural language processing systems are heavily engineered. { How much engineering can we avoid by using more data ? { Work by Ronan Collobert, Jason Weston, and the NEC team. Summary { Natural language processing { Embeddings and models { Lots of unlabeled data { Task dependent hacks 2/43 COS 424 { 4/27/2010 I. Natural language processing The Goal We want to have a conversation with our computer . still a long way before HAL 9000 . Convert a piece of English into a computer-friendly data structure How to measure if the computer \understands" something? 4/43 COS 424 { 4/27/2010 Natural Language Processing Tasks Intermediate steps to reach the goal? Part-Of-Speech Tagging (POS): syntactic roles (noun, adverb...) Chunking (CHUNK): syntactic constituents (noun phrase, verb phrase...) Name Entity Recognition (NER): person/company/location... Semantic Role Labeling (SRL): semantic role [John]ARG0 [ate]REL [the apple]ARG1 [in the garden]ARGM LOC − 5/43 COS 424 { 4/27/2010 NLP Benchmarks Datasets: ? POS, CHUNK, SRL: WSJ( up to 1M labeled words) ? NER: Reuters( 200K labeled≈ words) ≈ System Accuracy System F1 Shen, 2007 97.33% Shen, 2005 95.23% Toutanova, 2003 97.24% Sha, 2003 94.29% Gimenez, 2004 97.16% Kudoh, 2001 93.91% (a) POS: As in (Toutanova, 2003) (b) CHUNK: CoNLL 2000 System F1 System F1 Ando, 2005 89.31% Koomen, 2005 77.92% Florian, 2003 88.76% Pradhan, 2005 77.30% Kudoh, 2001 88.31% Haghighi, 2005 -
Towards a Perceptually Relevant Control Space
Music sound synthesis using machine learning: Towards a perceptually relevant control space Fanny ROCHE PhD Defense 29 September 2020 Supervisor: Laurent GIRIN Co-supervisors: Thomas HUEBER Co-supervisors: Maëva GARNIER Co-supervisors: Samuel LIMIER Fanny ROCHE - PhD Defense - 29 September 2020 1 / 48 Context and Objectives Context and Objectives Fanny ROCHE - PhD Defense - 29 September 2020 2 / 48 Context and Objectives Context and Objectives Fanny ROCHE - PhD Defense - 29 September 2020 2 / 48 Context and Objectives Context and Objectives Fanny ROCHE - PhD Defense - 29 September 2020 2 / 48 Processed recordings ! e.g. sampling, wavetable, granular, ... + computationally efficient − very memory consuming Spectral modeling ! e.g. additive, subtractive, ... + close to human sound perception − numerous & very specific Physical modeling ! e.g. solving wave equations, modal synthesis, ... + physically meaningful controls − very specific sounds Context and Objectives Context and Objectives Music sound synthesis Abstract algorithms ! e.g. FM, waveshaping, ... + rich sounds − complex parameters Fanny ROCHE - PhD Defense - 29 September 2020 3 / 48 Spectral modeling ! e.g. additive, subtractive, ... + close to human sound perception − numerous & very specific Physical modeling ! e.g. solving wave equations, modal synthesis, ... + physically meaningful controls − very specific sounds Context and Objectives Context and Objectives Music sound synthesis Abstract algorithms ! e.g. FM, waveshaping, ... + rich sounds − complex parameters Processed recordings ! e.g. sampling, wavetable, granular, ... + computationally efficient − very memory consuming Fanny ROCHE - PhD Defense - 29 September 2020 3 / 48 Physical modeling ! e.g. solving wave equations, modal synthesis, ... + physically meaningful controls − very specific sounds Context and Objectives Context and Objectives Music sound synthesis Abstract algorithms ! e.g. FM, waveshaping, ... + rich sounds − complex parameters Processed recordings ! e.g. -
Low-Rank Pairwise Alignment Bilinear Network for Few-Shot Fine
JOURNAL OF LATEX CLASS FILES, VOL. XX, NO. X, JUNE 2019 1 Low-Rank Pairwise Alignment Bilinear Network For Few-Shot Fine-Grained Image Classification Huaxi Huang, Student Member, IEEE, Junjie Zhang, Jian Zhang, Senior Member, IEEE, Jingsong Xu, Qiang Wu, Senior Member, IEEE Abstract—Deep neural networks have demonstrated advanced abilities on various visual classification tasks, which heavily rely on the large-scale training samples with annotated ground-truth. Slot Alaskan Dog However, it is unrealistic always to require such annotation in h real-world applications. Recently, Few-Shot learning (FS), as an attempt to address the shortage of training samples, has made significant progress in generic classification tasks. Nonetheless, it Insect Husky Dog is still challenging for current FS models to distinguish the subtle differences between fine-grained categories given limited training Easy to Recognise A Little Hard to Classify data. To filling the classification gap, in this paper, we address Lion Audi A5 the Few-Shot Fine-Grained (FSFG) classification problem, which focuses on tackling the fine-grained classification under the challenging few-shot learning setting. A novel low-rank pairwise Piano Audi S5 bilinear pooling operation is proposed to capture the nuanced General Object Recognition with Single Sample Fine-grained Object Recognition with Single Sample differences between the support and query images for learning an effective distance metric. Moreover, a feature alignment layer Fig. 1. An example of general one-shot learning (Left) and fine-grained one- is designed to match the support image features with query shot learning (Right). For general one-shot learning, it is easy to learn the ones before the comparison. -
Self-Supervised Collaborative Learning for Medical Image Synthesis
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) Auto-GAN: Self-Supervised Collaborative Learning for Medical Image Synthesis Bing Cao,1 Han Zhang,2∗ Nannan Wang,1∗ Xinbo Gao,1 Dinggang Shen2∗ 1State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, China 2Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, USA Abstract tors desire as many different modalities as possible to make a joint and more accurate decision, certain limitations (e.g., In various clinical scenarios, medical image is crucial in dis- restricted medical conditions, inadequate scanning time, and ease diagnosis and treatment. Different modalities of medical images provide complementary information and jointly helps cost/spending/resource control) could result in sacrificing doctors to make accurate clinical decision. However, due to some imaging modalities to have the most important one clinical and practical restrictions, certain imaging modalities done. For examples, low-resolution (e.g., thick-sliced) imag- may be unavailable nor complete. To impute missing data ing modalities are always combined with another single with adequate clinical accuracy, here we propose a frame- high-resolution imaging modality to save scanning time and work called self-supervised collaborative learning to syn- the low-resolution images can be misleading due to its par- thesize missing modality for medical images. The proposed tial coverage. CT is notorious for poor soft-tissue contrast method comprehensively utilize all available information cor- and radiation exposure, but can better quantify tissue’s phys- related to the target modality from multi-source-modality im- ical density. MRI is non-invasive and safer with better con- ages to generate any missing modality in a single model. -
Fast and Guaranteed Tensor Decomposition Via Sketching
Fast and Guaranteed Tensor Decomposition via Sketching Yining Wang, Hsiao-Yu Tung, Alex Smola Anima Anandkumar Machine Learning Department Department of EECS Carnegie Mellon University, Pittsburgh, PA 15213 University of California Irvine fyiningwa,[email protected] Irvine, CA 92697 [email protected] [email protected] Abstract Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized com- putation of tensor contractions via FFTs, without explicitly forming the tensors. Such tensor contractions are encountered in decomposition methods such as ten- sor power iterations and alternating least squares. We also design novel colliding hashes for symmetric tensors to further save time in computing the sketches. We then combine these sketching ideas with existing whitening and tensor power iter- ative techniques to obtain the fastest algorithm on both sparse and dense tensors. The quality of approximation under our method does not depend on properties such as sparsity, uniformity of elements, etc. We apply the method for topic mod- eling and obtain competitive results. Keywords: Tensor CP decomposition, count sketch, randomized methods, spec- tral methods, topic modeling 1 Introduction In many data-rich domains such as computer vision, neuroscience and social networks consisting of multi-modal and multi-relational data, tensors have emerged as a powerful paradigm for han- dling the data deluge. An important operation with tensor data is its decomposition, where the input tensor is decomposed into a succinct form. -
Weakly Supervised Text Classification with Naturally Annotated Resource
Automated Knowledge Base Construction (2021) Conference paper NatCat: Weakly Supervised Text Classification with Naturally Annotated Resources Zewei Chu [email protected] University of Chicago, Chicago, IL 60637, USA Karl Stratos [email protected] Rutgers University, Piscataway, NJ 08854, USA Kevin Gimpel [email protected] Toyota Technological Institute at Chicago, Chicago, IL 60637, USA Abstract We describe NatCat, a large-scale resource for text classification constructed from three data sources: Wikipedia, Stack Exchange, and Reddit. NatCat consists of document- category pairs derived from manual curation that occurs naturally within online communities. To demonstrate its usefulness, we build general purpose text classifiers by training on NatCat and evaluate them on a suite of 11 text classification tasks (CatEval), reporting large improvements compared to prior work. We benchmark different modeling choices and resource combinations and show how tasks benefit from particular NatCat data sources.1 1. Introduction Websites with community contributed content contain ample knowledge for natural lan- guage processing. In this paper, we seek to improve text classification by leveraging this knowledge in the form of texts paired with natural category annotations. In particular, we present NatCat, a large-scale resource constructed automatically from three data sources: Wikipedia, Stack Exchange,2 and Reddit.3 With rich knowledge about text and categories, we show that NatCat is a useful resource for text classification. To demonstrate the usefulness of NatCat, we use it to train models that compute a score for any document-category pair. As a result, we get weakly supervised (i.e., \dataless"; Chang et al., 2008) text classifiers that can be used off-the-shelf to produce interpretable and relevant topics for any document.