Beyond Supervised Learning: A Computer Vision Perspective

Lovish Chum, Anbumani Subramanian, Vineeth N. Balasubramanian & C. V. Jawahar

Journal of the Indian Institute of Science: A Multidisciplinary Reviews Journal

ISSN 0970-4140

J Indian Inst Sci DOI 10.1007/s41745-019-0099-3



© Indian Institute of Science 2019.


REVIEW ARTICLE

Lovish Chum1*, Anbumani Subramanian2, Vineeth N. Balasubramanian3 and C. V. Jawahar1

Abstract | Fully supervised deep learning-based methods have created a profound impact in various fields of computer science. Compared to classical methods, supervised deep learning-based techniques face scalability issues as they require huge amounts of labeled data and, more significantly, are unable to generalize to multiple domains and tasks. In recent years, a lot of research has been targeted towards addressing these issues within the deep learning community. Although there have been extensive surveys on learning paradigms such as semi-supervised and unsupervised learning, there are few timely reviews written after the emergence of deep learning. In this paper, we provide an overview of the contemporary literature surrounding alternatives to fully supervised learning in the deep learning context. First, we summarize the relevant techniques that fall between the paradigms of supervised and unsupervised learning. Second, we take autonomous navigation as a running example to explain and compare different models. Finally, we highlight some shortcomings of current methods and suggest future directions.

Keywords: Deep learning, Synthetic data, Domain adaptation, Weakly supervised learning, Few-shot learning, Self-supervised learning

This article belongs to the Special Issue: Recent Advances in Machine Learning.

1 CVIT, IIIT Hyderabad, Hyderabad, India. 2 Intel, Bangalore, India. 3 IIT Hyderabad, Hyderabad, India. *[email protected]

1 Introduction

Distilling useful information from prior experience is one of the primary research problems in computer science. In machine learning, past information contained in the training data is extracted as a model and used to predict future outcomes. In the past few years, the advent of deep learning techniques has greatly benefited the areas of computer vision, speech, and Natural Language Processing (NLP). However, supervised deep learning-based techniques require a large amount of human-annotated training data to learn an adequate model. Although data have been painstakingly collected and annotated for problems such as image classification120,186, image captioning115, instance segmentation134, visual question answering81, and other tasks, it is not viable to do so for every domain and task. Particularly, for problems in health care and autonomous navigation, collecting an exhaustive data set is either very expensive or all but impossible.

Even though supervised methods excel at learning from a large quantity of data, results show that they are particularly poor at generalizing the learned knowledge to a new task or domain221. This is because a majority of learning techniques assume that both the train and test data are sampled from the same distribution. However, when the distributions of the train and test data are different, the performance of the model is known to degrade significantly201,221. For instance, take the example of autonomous driving. The roadside environment for a city in Europe is significantly different from a city in South Asia. Hence, a model trained with input video frames from the former suffers a significant degradation in performance when tested on the latter.


This is in direct contrast to living organisms, which perform a wide variety of tasks in different settings without receiving direct supervision168,237.

This survey is targeted towards summarizing the recent literature that addresses two bottlenecks of fully supervised deep learning methods: (1) lack of labeled data in a particular domain; (2) unavailability of direct supervision for a particular task in a given domain. Broadly, we can categorize the methods which aim to tackle these problems into three sets: (1) data-centric techniques, which solve the problem by generating a large amount of data similar to the one present in the original data set; (2) algorithm-centric techniques, which tweak the learning method to harness the limited data efficiently through various techniques like on-demand human intervention, exploiting the inherent structure of data, capitalizing on freely available data on the web, or solving for an easier but related surrogate task; (3) hybrid techniques, which combine ideas from both the data and algorithm-centric methods.

Data-centric techniques include data augmentation, which involves tweaking the data samples with some pre-defined transformations to increase the overall size of the data set. For images, this involves affine transformations such as shifting, rotation, shearing, flipping, and distortion of the original image116. Some recent papers also advocate adding Gaussian noise to augment the images in the data set. Ratner et al.171 recommend learning these transforms instead of hard-coding them before training. Another method is to use techniques borrowed from computer graphics to generate synthetic data, which are used along with the original data to train the model. In the case when data are in the form of time series, window slicing and window warping can be used for augmentation purposes126.

Algorithm-centric techniques try to relax the need for perfectly labeled data by altering the model requirements to acquire supervision through inexact248, inaccurate148, and incomplete labels24. For most tasks, these labels are cheaper and relatively easier to obtain than full-fledged task-pertinent annotations. Techniques involving on-demand human supervision have also been used to label selective instances from the data set220. Another set of methods exploits the knowledge gained while learning from a related domain or task by efficiently transferring it to the test environment189.

Hybrid methods incorporate techniques which focus on improving the performance of the model at both the data and algorithm level. For instance, in the urban scene understanding task, researchers often use a synthetically generated data set along with the real data for training. This proves to be greatly beneficial, as a real-world data set may not cover all the variations encountered during test time, i.e., different lighting conditions, seasons, camera angles, etc. However, a model trained using synthetic images suffers a significant decrease in performance when tested on real images due to domain shift. This issue is algorithmically addressed by making the model "adapt" to the real-world scenario259. Most of the methods discussed in this survey fall under this category.

In this paper, we discuss some of these methods along with describing their qualitative results. We use tasks associated with autonomous navigation as a case study to explain each paradigm. As a preliminary step, we introduce some common notations used in the paper. We follow this by briefly mentioning the radical improvement brought by supervised deep learning methods in computer vision tasks in Sect. 1.2. Section 2 contains an overview of work which involves the use of synthetic data for training. Various techniques for transfer learning are compared in Sect. 3. Methods for weak and self-supervision are discussed in Sects. 4 and 6, respectively. Methods which address the task of learning an adequate model from a few instances are discussed in Sect. 5. Finally, we conclude the paper by discussing the promises, challenges, and open research frontiers beyond supervised learning in Sect. 7. Figure 1 gives a brief overview of the survey in the context of the semantic segmentation task for autonomous navigation.

1.1 Notations and Definitions

In this section, we introduce some notations which aid the explanation of the paradigms surveyed in the paper. Let X and Y be the input and label space, respectively. In any machine learning problem, we assume to have N objects from which we wish to learn the representation of the data set. We extract features from these objects, X = (x1, x2, ..., xN), to train our model. Let P(X) be the marginal probability over X. In a fully supervised setting, we also assume to have labels Y = (y1, y2, ..., yN) corresponding to each of these feature sets. A learning algorithm seeks to find a function f : X → Y in the hypothesis space F. To measure the suitability of the function f, a loss function l : Y × Y → R≥0 is defined over the space L. A machine learning algorithm tries to minimize the risk R associated with wrong predictions:

R = (1/N) ∑_{i=1}^{N} l(yi, f(xi)).
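To make this objective concrete, the following minimal sketch computes the empirical risk directly from the definition above; the `model` callable and the 0-1 loss are hypothetical stand-ins for f and l.

```python
import numpy as np

def empirical_risk(model, X, Y, loss):
    """Empirical risk R = (1/N) * sum_i loss(y_i, model(x_i))."""
    return float(np.mean([loss(y, model(x)) for x, y in zip(X, Y)]))

# Hypothetical usage with a 0-1 loss for classification:
zero_one = lambda y, y_hat: float(y != y_hat)
# risk = empirical_risk(trained_model, features, labels, zero_one)
```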


Figure 1: Learning paradigms arranged in decreasing order of supervision signal. Semantic segmentation of an outdoor scene is taken as the example task. (1) Fully supervised learning requires a lot of annotated data to learn a viable model35. (2) Synthetically generated instances can be used to compensate for the lack of real-world data179. (3) Knowledge from one real-world data set can be transferred to another data set which does not contain a sufficient number of instances; for instance, a model trained on Cityscapes can be fine-tuned with data from the Indian Driving Data set (IDD)231. (4) In case pixel-level labels are expensive to obtain, inexact supervision from polygon labels can be exploited to accomplish the task. (5) If only a few instances are available along with their labels, few-shot learning techniques can be employed to learn a generalizable model. (6) Finally, unsupervised learning exploits the inherent structure of the unlabelled data instances.

Use of synthetic data has become mainstream in computer vision literature. Note that even though synthetic data may appear to contain the same entities, we cannot assume that they have been generated from the same distribution. Hence, we denote its input space as Xsynth instead of X. However, the label space remains the same. To elaborate, we have a new domain Dsynth = {Xsynth, P(Xsynth)} which is different from the real domain D = {X, P(X)}, as both their input feature spaces and marginal distributions are different. Hence, we cannot use the objective predicting function fsynth : Xsynth → Y for mapping X to Y.

Transfer learning, a term used interchangeably with domain adaptation (DA), aims to solve this problem. However, the term is used to transfer knowledge not only between different domains but also between distinct tasks. We define a task as containing the label space Y and the conditional distribution P(Y|X), i.e., T = {Y, P(Y|X)}. Building on the above notations, we define domain shift (Ds ≠ Dt) and label space shift (Ts ≠ Tt), where Ds and Dt are the source and target domains, respectively. As an example, using synthetic data and then adapting the learned objective to the real domain falls under domain shift, as D ≠ Dsynth. Within the domain adaptation literature, methods have been categorized into homogeneous and heterogeneous settings. Homogeneous domain adaptation methods assume that the input feature space for both the source and target input distributions is the same, i.e., Xs = Xt. Heterogeneous domain adaptation techniques relax this assumption. As a result, heterogeneous DA is considered a more challenging problem than homogeneous DA.

Although supervised learning assumes that all the feature sets xi have a corresponding label yi available at the time of training, the labels can be inaccurate, inexact, or incomplete in a real-world scenario. These scenarios collectively fall under the paradigm of weakly supervised learning. These conditions are particularly common if the training data have been obtained from the web. Formally, we define the feature set for the incomplete-label scenario as X = (x1, x2, ..., xl, xl+1, ..., xn), where Xlabeled = (x1, x2, ..., xl) have corresponding labels Ylabeled = (y1, y2, ..., yl) available while training, but the rest of the feature sets Xunlabeled = (xl+1, ..., xn) do not have any labels associated with them.

Other interesting weakly supervised models encompass cases where each instance has multiple labels or a bag of instances has a single label assigned to it. To formalize the multiple-instance single-label scenario, we assume that each feature set xi is composed of many sub-feature sets (xi,1, xi,2, ..., xi,m). Here, xi is called a "bag" of features, and the paradigm is known as multiple-instance learning. A bag is labeled positive if at least one item xi,j is positive, otherwise negative.
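The multiple-instance labeling rule above reduces to a max over instance-level predictions. Below is a minimal sketch, assuming each bag is given as a list of instance scores from some hypothetical instance-level classifier:

```python
import numpy as np

def bag_label(instance_scores, threshold=0.5):
    """A bag is positive if at least one instance x_{i,j} is predicted positive."""
    return int(np.max(instance_scores) >= threshold)

# Example: one positive instance makes the whole bag positive.
print(bag_label([0.1, 0.2, 0.9]))  # -> 1
print(bag_label([0.1, 0.2, 0.3]))  # -> 0
```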


Although the above paradigms correspond to a varied amount of supervision, they always assume a huge number of instances X available at the time of training the model. This assumption breaks down when some classes do not have sufficient instances. Few-shot learning entails the scenario when only a few (usually not more than 10) instances per class are available at the time of training. Zero-shot learning (ZSL) is an extreme scenario which arises when no instance is available for some classes during training. Given the training set with features X = (x1, x2, ..., xn) and labels Ytrain = (y1, y2, ..., yn), the test instances belong to previously unseen classes Ytest = (yn+1, yn+2, ..., ym). Recently, some papers address a generalized ZSL scenario where the test classes have both seen and unseen labels.

When no supervision signal is available, the inherent structure of the instances is utilized to train the model. Let X and Y be the feature and label set, respectively; as we do not have P(Y|X), we cannot define the task T = {Y, P(Y|X)}. Instead, we define a proxy task Tproxy = {Z, P(Z|X)} whose label set Z can be extracted from within the data itself. For computer vision problems, proxy tasks have been defined based on spatial and temporal alignment, color, and motion cues.

1.2 Success of Supervised Learning

Over the past few years, supervised learning methods have enabled computer vision researchers to train more and more accurate models. For several tasks, these models have achieved state-of-the-art performance which is comparable to humans. In the visual domain, accuracy for both structured and unstructured prediction tasks such as image classification91,96,116,203,214, object detection75,76,136,174,178, semantic segmentation12,27,92,133,138,182,260, pose estimation23,222, action recognition46,58,74,104,223, video classification110, and optical flow estimation47 has consistently increased, allowing for their large-scale deployment. Apart from computer vision, problems in other domains such as speech recognition82,83,190, speech synthesis229, machine translation13,84,213,244, and machine reading170 have also seen a significant improvement in their performance metrics.

Despite their success, supervised learning-based models have a fair share of issues. First of all, they are data hungry, requiring a huge number of instance-label pairs. To add, a majority of the large data sets required to train these models are proprietary, as they provide an advantage to the owner in training a supervised model for a particular task and domain. Second, when applying a machine learning model in the wild, it encounters a multitude of conditions which are not observed in the training data. In these situations, fully supervised methods, despite super-human-level performance on a particular domain, suffer drastic degradation in performance on a real-world test set as they are biased towards the training data set.

2 Effectiveness of Synthetic Data

A much better degree of photo-realism, easy-to-use graphics tools such as game engines, large libraries of 3D models, and appropriate hardware have made it possible to simulate virtual visual environments which can be used to construct synthetic data sets that are exponentially larger than real-world data sets. One primary advantage of using synthetic data is that precise ground truth is often available for free. On the other hand, collecting and annotating data for a large number of problems is not only a tedious process but also prone to human errors. To add, one can easily vary factors such as viewpoint, lighting, and material properties, gaining full control over the configurations and visual challenges to be introduced in the data set. This presents a major advantage for computer vision researchers, as real-world data sets tend to be non-exhaustive, redundant, heavily biased, and only partly representative of the complexity of natural images221. Moreover, some situations are not possible to arrange in a real-world setting because of safety issues, e.g., a head-on collision in an urban scene understanding data set. Last but not least, having a few high-profile real-world data sets biases the research community towards the tasks for which annotations have been provided with these data sets. Thus, graphically generated synthetic data sets have become a norm in the computer vision community, particularly for tasks such as medical imaging and autonomous navigation.

In the visual domain, synthetic data have been used mainly for two purposes: (1) evaluation of the generalizability of the model due to the large variability of synthetic test examples, and (2) aiding the training through data augmentation for tasks where it is difficult to obtain ground truth, e.g., optical flow or depth perception. A virtual test bed for the design and evaluation of surveillance systems is proposed in Taylor et al.217. The works in108 and Aubry and Russell10 use synthetic data to evaluate hand-crafted and deep features, respectively.

Butler et al.22 propose the MPI Sintel Flow data set, a synthetic benchmark for optical flow estimation. Handa et al.88 introduce ICL-NUIM, a data set for the evaluation of visual odometry.

More significantly, synthetic data are utilized for gathering additional training instances, which is mainly beneficial due to the availability of precise ground truth. There are various data generation strategies, from real-world images combined with 3D models to full rendering of dynamic visual scenes. Figure 2 illustrates two common methods for synthetic data generation. Vázquez et al.232 learn the appearance models of pedestrians in a virtual world and use the learned model for detection in the real-world scenario. A similar technique is described for pose estimation11,162, indoor scene understanding89, action recognition41, and a variety of other tasks. Instead of rendering the entire scene, Gupta et al.85 overlay text on natural images consistent with the local 3D scene geometry to generate data for the text localization task. A similar method is used for object detection52 and semantic segmentation177, where real images of both the objects and backgrounds are composed to synthetically generate a new scene. One drawback of using synthetic data for training a model is that it gives rise to a "sim2real" domain gap. Recently, a stream of works in domain randomization188,219,224 claims to generate synthetic data with sufficient variations, such that the model views real data as just another variation of the synthetic data set.
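To illustrate the domain randomization idea just described, the sketch below samples a fresh rendering configuration per generated frame; the parameter names and the `render` call are placeholders for whatever game engine or simulator is actually used.

```python
import random

def randomized_scene_config():
    """Sample a random rendering configuration so that real images end up
    looking like just another draw from the synthetic distribution."""
    return {
        "sun_elevation_deg": random.uniform(5, 85),
        "fog_density": random.uniform(0.0, 0.4),
        "camera_height_m": random.uniform(1.2, 2.0),
        "texture_id": random.randrange(1000),  # random object/road textures
    }

# Hypothetical generation loop:
# for _ in range(100000):
#     frame, labels = render(randomized_scene_config())
```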

Figure 2: Data collected in a real-world setting may not have sufficient diversity in terms of illumination, viewpoints, etc. Synthetic data produced through virtual visual models help to get around this bottleneck. Another way to create additional data for training is to paste real or virtual objects into real scenes. One advantage of this approach is that the domain gap between real and synthetically generated data is smaller, leading to better performance on the real data set.
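Below is a minimal sketch of the cut-and-paste style of data generation shown in Figure 2, using Pillow; the file names, the RGBA alpha-mask convention, and the uniform random placement are illustrative assumptions rather than the exact procedure of the cited works.

```python
import random
from PIL import Image

def compose_scene(background_path, object_path):
    """Paste a segmented object (RGBA with transparent background) onto a real
    scene at a random location. The alpha channel doubles as the paste mask and
    as free pixel-level ground truth for the pasted object. Assumes the object
    image is smaller than the background."""
    scene = Image.open(background_path).convert("RGB")
    obj = Image.open(object_path).convert("RGBA")
    x = random.randint(0, scene.width - obj.width)
    y = random.randint(0, scene.height - obj.height)
    scene.paste(obj, (x, y), mask=obj)  # alpha channel masks the paste
    return scene, (x, y)

# composed, location = compose_scene("road.jpg", "pedestrian.png")  # hypothetical files
```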


Modern game engines are a popular method to extract synthetic data along with the annotation due to their photo-realism and realistic physics simulation. Gaidon et al.64 present the Virtual KITTI data set and conduct experiments on multi-object tracking. SYNTHIA183 and GTA179 provide urban scene understanding data along with semantic segmentation benchmarks. UnrealCV167 provides a simple interface for researchers to build a virtual world without worrying about the game's API.

Synthetic Data for Autonomous Navigation
Autonomous navigation has greatly benefited from the use of synthetic data sets, as pixel-level ground truth can be obtained easily and cheaply using label propagation from frame to frame. As a result, several synthetic data sets have been curated particularly for visual tasks pertaining to autonomous navigation64,129,179,180,183,191. Alhaija et al.5 propose a method to augment virtual objects to real road scenes for creating additional data to be used during training the model. Apart from training the models, racing simulators have also been used to evaluate the performance of different approaches to autonomous navigation26,48. Janai et al.102 offer a comprehensive survey of literature pertinent to autonomous driving.

One of the major challenges in using synthetic data for training is the domain gap between real and synthetic data sets. Transfer learning, discussed in Sect. 3, offers a solution to this problem. Eventually, through the use of synthetic data, we would like to turn the expensive data acquisition and manual ground truth labeling process into a generic problem of training with unlimited computer-generated data and testing in the real-world scenario without any degradation in performance.

3 Domain Adaptation and Transfer Learning

As stated in Sect. 2, a model trained on a source domain does not perform well on a target domain with a different distribution. Domain adaptation (DA) is a technique which addresses this issue by reusing the knowledge gained through the source domain for the target domain. DA techniques have been categorized according to three criteria: (1) distance between domains; (2) presence of supervision in the source and target domain; (3) type of domain divergences. Most of the DA techniques assume that the source and target domain are "nearer" to each other, in the sense that the instances are directly related. In these cases, single-step adaptation is sufficient to align both the domains. However, if this assumption does not hold true, multi-step adaptation is used, where a set of intermediate domains is used to align the source and target domains. Prevalent literature also classifies DA into supervised, semi-supervised, and unsupervised settings according to the presence of labels in the source and target domain. Nevertheless, there are inconsistencies in the definition within the literature; while some papers refer to the absence of target labels as unsupervised DA, others define it as an absence of both the source and target labels. Hence, in this section, we categorize the DA techniques with respect to the type of domain divergences. Section 1.1 gives the formal notation and formulations for the DA setting.

Earlier works categorized the domain adaptation problem into homogeneous and heterogeneous settings. Homogeneous domain adaptation deals with the situation when both the source and target domains share a common feature space X but have different data distributions P(X) or P(Y|X). Some traditional methods for homogeneous domain adaptation include instance re-weighting25, feature transformations39,97, or kernel-based techniques that learn an explicit transform from source to target domain50,78,154. Figure 3 pictorially presents the traditional domain adaptation methods. All the techniques addressing this problem aim to correct the differences between conditional and marginal distributions between the source and target domain.

Figure 3: Conventional techniques for domain adaptation: (a) instance re-weighting; (b) modifying the model with pseudo-level information; (c) feature projection to a common space. The original model is trained to classify source-domain instances; it is able to classify target-domain instances only after applying appropriate DA techniques.

Heterogeneous domain adaptation pertains to the condition when the source and target domains are represented in different feature spaces. This is particularly important for problems in the visual domain such as image recognition80,117,263, object detection, semantic segmentation128, and face recognition, as different environments, background, illumination, viewpoint, sensor, or post-processing can cause a shift between the train and test distributions. Moreover, a difference between the tasks also demands the model to be adapted to the target domain task. Manifold alignment238 and feature augmentation49,130 are some of the techniques used for aligning feature spaces in heterogeneous adaptation. A detailed survey of traditional adaptation techniques is out of the scope of this survey. We direct readers to Ben-David et al.16 and Pan et al.153 for a summary of homogeneous techniques, and to Day and Khoshgoftaar40 and Weiss et al.242 for a detailed overview of heterogeneous adaptation techniques. Patel et al.158, Shao et al.199, and Csurka37 provide an overview of shallow domain adaptation methods on visual tasks. In this paper, we briefly state recent advances in deep domain adaptation techniques pertaining to computer vision tasks.

Taking a cue from the success of deep neural networks in learning feature representations, recent DA methods use them to learn representations invariant to the domain, thus inserting the DA framework within the deep learning pipeline. Earlier work using deep neural networks only used the features extracted from the deep network for feature augmentation149 or subspace alignment of two distinct visual domains139,169. Although these methods perform better than state-of-the-art traditional DA techniques, they do not leverage neural networks to directly learn a semantically meaningful and domain-invariant representation.

Contemporary methods use discrepancy-based or adversarial approaches for domain adaptation. Discrepancy-based methods posit that fine-tuning a deep network with target domain data can alleviate the shift between domain distributions45,151,253. Labels or attribute information70,227, Maximum Mean Discrepancy (MMD)226,249, correlation alignment212, statistical associations87, and batch normalization131 are some of the criteria used while fine-tuning the model.

Adversarial methods encompass a framework which consists of a label classifier trained adversarially to a domain classifier. This formulation aids the network in learning features which are discriminative with respect to the learning task but indiscriminate with respect to the domain. Ganin et al.68 introduced the DANN architecture, which uses a gradient reversal layer to ensure that the feature distributions over the two domains are aligned. Liu and Tuzel136 introduce a GAN-based framework in which the generator tries to convert the source domain instances to those from the target domain and the discriminator tries to distinguish between transformed source and target domain instances. Bousmalis et al.20, Hoffman et al.95, Shrivastava et al.202 and Yoo et al.252 also focus on generating synthetic target data using an adversarial loss, albeit using it in pixel space instead of embedding space. Sankaranarayanan et al.193 use a GAN only to obtain the gradient information for learning a domain-invariant embedding, noting that successful domain alignment does not strictly depend on image generation. Tzeng et al.228 propose a unified framework for adversarial methods summarizing the type of adversary, loss function, and weight sharing constraint to be used during training.

Generative Adversarial Network (GAN)
A GAN79 consists of two neural networks: a generator that creates samples using noise and a discriminator which receives samples from both the generator and the real data set and classifies them. The two networks are trained simultaneously with the intention that the generated samples are indistinguishable from real data at equilibrium. Apart from producing images, text, sound, and other forms of structured data, GANs have been instrumental in driving research in machine learning, particularly in cases where data availability is limited. Data augmentation7,62 using GANs has resulted in higher performing models than those which use affine transformations. Adversarial adaptation, a paradigm inspired by the GAN framework, is used to transfer the data from the source to the target domain. Other notable applications of GANs include data manipulation140, adversarial training119, anomaly detection195, and adversarial cryptography1.

Reconstruction-based techniques try to construct a shared representation between the source and target domains while keeping the individual characteristics of both the domains intact. Ghifary et al.72 use an encoder which is trained simultaneously to accomplish source-label prediction along with target data reconstruction. Bousmalis et al.19 train separate encoders to account for domain-specific and domain-invariant features. In addition, their model uses domain-invariant features for classification while using both kinds of features for reconstruction. Methods based on adversarial reconstruction are proposed in Kim et al.112, Russo et al.187, Yi et al.251, and Zhu et al.262, which use a cyclic consistency loss as the reconstruction loss along with the adversarial loss to align two different domains.
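As a concrete illustration of the adversarial approach discussed above, the gradient reversal layer used in DANN-style training can be written in a few lines of PyTorch: the forward pass is the identity, while the backward pass flips and scales the gradient flowing from the domain classifier into the feature extractor. This is a minimal sketch; the surrounding feature extractor and domain classifier modules are assumed to exist.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambd backwards."""

    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The reversed gradient makes the feature extractor *maximize* the
        # domain classifier's loss, aligning the two feature distributions.
        return -ctx.lambd * grad_output, None

def grad_reverse(features, lambd=1.0):
    return GradReverse.apply(features, lambd)

# Hypothetical usage inside a training step:
# domain_logits = domain_classifier(grad_reverse(features))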


Optimal transport is yet another technique used for deep DA38,173. Courty et al.36 assign pseudo-labels to the target data using the source classifier. Furthermore, they transport the source data points to the target distribution, minimizing the distance traveled and the changes in labels while moving the points.

Visual adaptation has been studied for problems such as cross-modal face recognition137,207, object detection31,94, semantic segmentation32,225,259, person re-identification42, and image captioning29. Although deep DA has achieved considerable improvement over the traditional techniques, much of the work in the visual domain has focused on addressing homogeneous DA problems. Recently, heterogeneous domain adaptation problems such as face-to-emoji215 and text-to-image synthesis176,254 have also been addressed using adversarial adaptation techniques. Another interesting direction of work pertains to open set DA21,23,255, which loosens the assumption that the output sets of both the source and target class must exactly be the same. Tan et al.216 address the problem of distant domain supervision, transferring the knowledge from source to target via intermediate domains. An in-depth survey of deep domain adaptation techniques is presented in Wang and Deng239.

4 Weakly Supervised Learning

Weakly supervised learning is an umbrella term covering the predictive models which are trained under incomplete, inexact, or inaccurate labels. Incomplete supervision encompasses the situation when the annotation is only available for a subset of the training data. As an example, take the problem of image classification with the ground truth being provided through human annotation. Although it is possible to get a huge number of images from the internet, only a subset of these images can be annotated due to the cost associated with labeling. Inexact supervision pertains to the use of related, often coarse-level annotations. For instance, fully supervised object localization requires delineating the bounding boxes; however, usually, we only have image-level labels. Finally, noisy or non-ground truth labels can be categorized as inaccurate supervision. Collaborative image tags on social media websites can be considered as noisy supervision. Apart from saving annotation cost and time, weakly supervised methods have proven to be robust to a change in the domain during testing.

4.1 Incomplete Supervision

Weakly supervised techniques pertaining to incomplete labels make use of either semi-supervised or active learning methods. The conventional semi-supervised approaches include self-training, co-training18,165, and graph-based methods51. A discussion on these is out of the scope of this survey. Interested readers are directed to Chapelle et al.24 for a detailed overview of semi-supervised learning.

Active learning methods are used in computer vision to reduce labeling efforts in problems such as image annotation109, recognition235, object detection250, segmentation234, and pose estimation135. In this paradigm, unlabeled observations are optimally selected from the data set to query at training time. For instance, localizing a car occluded by a tree is more difficult than another, non-occluded car. Thus, the human annotator could be asked to assign ground truth for the former case, which may lead to improved performance for the latter case. A typical active learning pipeline alternates between picking the most relevant unlabeled examples as queries to the oracle and updating the prior on the data distribution with the response34. Some common query formulation strategies include maximizing the label change61, maximizing the diversity of selected samples53, reducing the expected error of the classifier184, or uncertainty sampling194. A survey by Settles198 gives insight into various active learning techniques.

Although both semi-supervised and active learning techniques have been used to address different problems in the visual domain, there has been an increased interest towards the latter after the emergence of deep learning-based methods. Sener and Savarese197 and Gal et al.65 present effective methods to train a CNN using active learning heuristics. An approach to synthesize query examples using a GAN is proposed by Zhu and Bento261. Fang et al.55 reframe active learning as a reinforcement learning problem. In addition, deep active learning methods have been used to address vision tasks such as object detection in Roy et al.185.
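As an illustration of the uncertainty sampling strategy mentioned above, here is a minimal sketch that queries the unlabeled samples whose predicted class distribution has the highest entropy; the `predict_proba` interface is an assumption about the current model.

```python
import numpy as np

def select_queries(model, unlabeled_X, budget=10):
    """Return indices of the `budget` most uncertain samples (highest predictive entropy)."""
    probs = model.predict_proba(unlabeled_X)             # shape: (n_samples, n_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-budget:]                 # indices to send to the oracle

# query_ids = select_queries(classifier, pool)  # then ask a human to label them
```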
4.2 Inexact Supervision

Apart from dealing with partially labeled data sets, weakly supervised techniques also help relax the degree of annotation needed to solve a structured prediction problem. Full annotation is a tedious and time-consuming process, and contemporary vision data sets reflect this fact. For example, in Imagenet186, while 14 million images are provided with image-level labels and 500,000 are annotated with bounding boxes, only 4460 images have pixel-level object category labels. Thus, the development of training regimes which learn complex concepts from light labels is instrumental in improving the performance of several tasks.

A popular approach to harnessing inexact labels is to formulate the problem in the multiple-instance learning (MIL) framework. In MIL, the image is interpreted as a bag of patches. If one of the patches within the image contains the object of interest, the image is labeled as a positive instance, otherwise negative. The learning scheme alternates between estimating the object appearance model and predicting the patches within positive images. As this setup results in a non-convex optimization objective, several works suggest initialization209, regularization208, and curriculum learning118 techniques to alleviate the issue. Recent works100,243 embed the MIL framework within a deep neural network to exploit the weak supervision signal.

Structured prediction problems such as weakly supervised object detection (WSOD) and semantic segmentation have garnered a lot of attention in recent years. Bilen and Vedaldi17 propose an end-to-end WSOD framework for object detection using image-level labels. Several other techniques have been employed as supervision signals for WSOD, such as object size200 and count69, click supervision156,157, and human verification155. Similar methods have also been proposed for weakly supervised semantic segmentation problems15,98,111,132,142,163. Figure 4 depicts some weak supervision signals used for the semantic segmentation problem.

4.3 Inaccurate Supervision

As curating large-scale data sets is an expensive process, building a machine learning model which uses web data sets such as YouTube8m2, YFCC100M218, and Sports-1M110 is one of the pragmatic ways to leverage the almost infinite amount of visual data. However, labels in these data sets are noisy and pose a challenge for the learning algorithm. Several studies have investigated the effect of noisy instances or labels on the performance of the machine learning algorithm. Broadly, we categorize the techniques into two sets. The first approach resorts to treating the noisy instances as outliers and discarding them during training54,211. Nevertheless, noisy instances may not be outliers and can occupy a significant portion of the training data.

(a) Original image. (b) Pixel-level supervision. (c) Polygon around the instances.

(d) Scribbles. (e) Image-level supervision. (f) Supervision via collaborative image tags.

Figure 4: An example of the varying degree of supervision for the semantic segmentation problem. Although pixel-level labels provide strong supervision, they are relatively expensive to obtain. Thus, the recent literature suggests techniques which exploit polygon labels, scribbles, image-level labels, or even collaborative image tags from social media platforms (note that hashtags are not only an inexact but also an inaccurate form of supervision).


Moreover, algorithms pursuing this approach find it difficult to distinguish between noisily labeled and hard training examples. Hence, methods in this set often use a small set of perfectly labeled data. Another stream of methods focuses on building algorithms robust to noise107,146,175,230 by devising noise-tolerant loss functions73 or adding appropriate regularization terms9. For a comprehensive overview of learning algorithms robust to noise, we refer to Frénay and Verleysen60.

Consequently, a plethora of techniques have been proposed to harness deep neural networks in a "webly" supervised scenario. As most of the data on the web is contributed by non-experts, it is bound to be inaccurately labeled. Hence, techniques used to address noisy annotations can be applied if the training data are collected from the web. Chen and Gupta30 propose a two-stage curriculum learning technique which trains on easier examples before adapting to web images. Xiao et al.247 predict the type of noise in each of the instances and attempt to remove it. Webly supervised methods have been proposed for many tasks in the visual domain such as learning visual concepts43,67, image classification233, video recognition66, and object localization264.

5 k-Shot Learning

One of the distinguishing characteristics of human visual intelligence is the ability to acquire an understanding of novel concepts from very few examples. Conversely, a majority of current machine learning techniques show a precipitous decrease in performance if there is an insufficient number of labeled examples pertaining to a certain class. Few-shot learning techniques attempt to adapt the current machine learning methods to perform well under a scenario where only a few training instances are available per class. This is of immense practical importance. For instance, collecting a traffic data set might result in only a few instances of auto-rickshaws. However, during testing, we would like the model to recognize auto-rickshaws at various scales, angles, and other variations which might not be present in the training set. Earlier methods such as Fei-Fei et al.57 use a Bayesian learning-based generative framework with the assumption that the prior built from previously learned classes can be used to bootstrap learning for novel categories. Lake et al.121 built a Hierarchical Bayesian model which performs similarly to humans on few-shot alphabet recognition tasks. However, their method is shown to work only for simple data sets such as Omniglot122. Wang and Hebert241 learn to regress from the parameters of a classifier trained on a few images to the parameters of a classifier trained on a large number of images. More recent efforts in few-shot learning can be broadly categorized into metric-learning and meta-learning-based methods.

Metric learning aims to design techniques for embedding the input instances into a feature space beneficial to few-shot settings. A common approach is to find a good similarity metric in the new feature space applicable to novel categories. Koch et al.114 use a deep learning model based on computing the pairwise distance between samples using Siamese networks, following which the learned distance is used to solve few-shot problems through k-nearest-neighbor classification. Vinyals et al.236 propose an end-to-end trainable one-shot learning technique based on cosine distance. Other loss functions used for deep metric learning include the triplet loss of Schroff et al.196 and the adaptive density estimation of Rippel et al.181. Mehrotra and Dukkipati143 approximate the pairwise distance by training a deep residual network in conjunction with a generative model.

Meta-learning entails a class of approaches which quickly adapt to a new task using only a few data instances and training iterations. To achieve this, the model is trained on a set of tasks such that it transfers the "learning ability" to a novel task. In other words, meta-learners treat the tasks as training examples. Finn et al.59 propose a model-agnostic meta-learning technique which uses gradient descent to train a classification model such that it is able to generalize well on any novel task given very few instances and training steps. Ravi and Larochelle172 also introduce a meta-learning framework employing LSTM updates for a given episode. Recently, a method proposed by Mishra et al.145 also exploits contextual information within the tasks using temporal convolutions.
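Here is a minimal sketch of the metric-learning recipe described above: embed the few labeled (support) examples and the query with a shared feature extractor, average the support embeddings into class prototypes, and assign the query the label of the most similar prototype under cosine similarity. The `embed` function is an assumed pre-trained feature extractor.

```python
import numpy as np

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def few_shot_predict(embed, support_images, support_labels, query_image):
    """Classify a query by its nearest class prototype in embedding space."""
    embeddings = np.stack([embed(img) for img in support_images])
    classes = sorted(set(support_labels))
    # Average the (few) support embeddings per class into a prototype.
    prototypes = {c: embeddings[[l == c for l in support_labels]].mean(axis=0)
                  for c in classes}
    query = embed(query_image)
    return max(classes, key=lambda c: cosine_similarity(prototypes[c], query))
```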
Another set of methods for few-shot learning relies on efficient regularization techniques to avoid over-fitting on the small number of instances. Hariharan and Girshick90 suggest a gradient magnitude regularization technique for training a classifier along with a method to hallucinate additional examples for few-shot classes. Yoo et al.252 also regularize the dimensionality of the parameter search space by efficiently clustering the parameters, ensuring intra-cluster similarity and inter-cluster diversity.

Literature pertaining to Zero-Shot Learning (ZSL) focuses on finding the representation of a novel category without any instance. Although it has a strong semblance to the few-shot learning paradigm, methods used to address ZSL are distinct from few-shot learning.

A major assumption taken in this setting is that the classes observed by the model during training are semantically related to the unseen classes encountered during testing. This semantic relationship is often captured through class attributes describing the shape, color, pose, etc., of the object, which are either labeled by experts or obtained through knowledge sources such as Wikipedia, Flickr, etc. Lampert et al.123 were the first to propose a zero-shot recognition model which assumes independence between different attributes and estimates the test class by combining the attribute prediction probabilities. However, most of the subsequent work takes attributes as the semantic embedding of classes and tackles ZSL as a visual semantic embedding problem4,56,124,245. More recently, word embeddings206,256 and image captions176 have also been used in place of attributes as a semantic space. Figure 5 compares the two common approaches to ZSL with supervised learning.

In ZSL, a joint embedding space is learned during training, where both the visual features and semantic vectors are projected. During testing on unseen classes, a nearest-neighbor search is performed in this embedding space to match the projection of the visual feature vector against a novel object type. A pairwise ranking formulation is used to learn the parameters of a bi-linear model in Akata et al.4 and Frome et al.63. Recently, Zhang et al.256 argue for using the visual space as the embedding space to alleviate the hubness problem encountered when performing nearest-neighbor search in semantic space. We refer the readers to Xian et al.246 for a detailed evaluation and comparison of contemporary ZSL methods. Some other tasks which have shown promising results in a zero-shot setting are video event detection86, object detection14, action recognition166, and machine translation106.
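Below is a minimal sketch of the visual-semantic embedding approach to ZSL described above; the projection matrix W (learned on seen classes) and the per-class semantic vectors are assumed to be given.

```python
import numpy as np

def zsl_classify(visual_feature, W, class_embeddings):
    """Project the image into semantic space and return the nearest unseen class.

    W: (semantic_dim, visual_dim) projection, assumed learned on seen classes.
    class_embeddings: dict mapping unseen class name -> semantic vector
    (attributes or word embeddings)."""
    projected = W @ visual_feature
    def distance(c):
        return np.linalg.norm(class_embeddings[c] - projected)
    return min(class_embeddings, key=distance)  # nearest-neighbor search
```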
6 Self-supervised Learning

In self-supervised learning, we obtain feature representations for semantic understanding tasks such as classification, detection, and segmentation without any external supervision. Explicit annotation pertaining to the main task is avoided by defining an auxiliary task that provides a supervisory signal. The assumption is that successful training of the model on the auxiliary task will inherently make it learn semantic concepts such as object classes and boundaries. This makes it possible to share knowledge between the two tasks. Self-supervision has a semblance to transfer learning, where knowledge is shared between two different but related domains. However, unlike transfer learning, it does not require a large amount of annotated data from another domain or task. Figure 6 illustrates the difference between both paradigms in the context of vehicle detection.

Before the advent of deep learning-driven self-supervision models, significant work was carried out in unsupervised learning of image representations using hand-crafted205 or mid-level features204. This was followed by deep learning-based methods like autoencoders93, Boltzmann machines192, and variational methods113, which learn by estimating latent parameters that help to reconstruct the data.

The existing literature pertaining to self-supervision relies on using the spatial and temporal context of an entity as a "free" supervision signal. A prime example of this is Word2Vec144, which predicts the semantic embedding of a particular word based on the surrounding words. In the visual domain, context is efficiently used by Doersch et al.44 to predict the relative location of two image patches as a pretext task. The same notion is extended in Noroozi and Favaro150 by predicting the order of shuffled image patches.
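As a concrete example of a spatial-context pretext task in the spirit of Doersch et al.44, the sketch below crops a patch and one of its eight neighbors from an unlabeled image and emits the relative direction as a free label; the patch size and gap are illustrative choices.

```python
import random

# Eight possible positions of the neighbor patch relative to the center patch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def relative_patch_pair(image, patch=64, gap=8):
    """Sample (center_patch, neighbor_patch, label) from an HxWxC array.
    The label (0-7) is the direction of the neighbor; no annotation is needed.
    Assumes the image is at least 3 * (patch + gap) pixels on each side."""
    step = patch + gap
    y = random.randint(step, image.shape[0] - 2 * step)
    x = random.randint(step, image.shape[1] - 2 * step)
    label = random.randrange(8)
    dy, dx = OFFSETS[label]
    center = image[y:y + patch, x:x + patch]
    neighbor = image[y + dy * step:y + dy * step + patch,
                     x + dx * step:x + dx * step + patch]
    return center, neighbor, label
```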

(a) Supervised learning. (b) Zero-shot case. (c) Direct attribute prediction. (d) Visual-semantic embedding based.

Figure 5: Comparison of supervised learning with ZSL. Features are not available for classes C3 and C4 at the time of training. However, the availability of attributes or semantic embeddings for both the train and test classes aids the training of the ZSL framework.


(a) Supervised learning needs the bounding box information corresponding to the object at the time of training.

(b) Weakly supervised learning using image-level captions uses semantic embeddings of the sentence to pre-train the neural network.

(c) Self-supervised learning uses a pretext task to learn the representation of the objects. In this case, the task is to learn to generate the luminance value of each pixel in the image given the intensity values.

Figure 6: Strong supervision vs. weak supervision vs. self-supervision. The two types of blocks depict fully connected and convolutional layers, respectively.

Apart from spatial-context-based auxiliary tasks, predicting color channels from luminance values125,257 and regressing a missing patch in an image using generative models159 have also been used to learn useful semantic information in images. Other modalities used for feature learning in images include text77, motion160,164, and cross-channel prediction258. Recently, Huh et al.99 take advantage of the EXIF metadata embedded in an image as a supervisory signal to determine if it has been formed by splicing different images.

For videos, temporal coherence serves as an intrinsic underlying structure: two consecutive image frames are likely to contain semantically similar content. Each object within a frame is expected to undergo some transformation in the subsequent frames. Wang and Gupta240 use relationships between triplets of image patches obtained from tracking. Misra et al.147 train a network to guess whether a given sequence of frames from a video is in chronological order. Lee et al.127 make the network predict the correct sequence of frames given a shuffled set. Apart from temporal context, estimating camera motion103, ego-motion3, and predicting the statistics of ambient sound8,152 have also been used as proxy tasks for video representation learning.

Self-supervision for Urban Scene Understanding
As solving autonomous navigation takes centre stage in both the vision and robotics communities, urban scene understanding has become a problem of utmost interest. More often than not, annotating each frame for training is a tedious job. As self-supervision gives the flexibility to define an implicit proxy task which may or may not require annotation, it is one of the preferred methods for addressing problems such as urban scene understanding. Earlier work in this area includes Stavens and Thrun210, where the authors estimate terrain roughness based on the "shocks" which the vehicle receives while passing over it. Jiang et al.105 show that predicting relative depth is an effective proxy task for learning visual representations. Ma et al.141 propose a multimodal self-supervised algorithm for depth completion using LiDAR data along with a monocular camera.
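Here is a minimal sketch of the temporal-order pretext task described above, in the spirit of Misra et al.147: sample a few frames from an unlabeled clip, shuffle them half of the time, and use the binary "chronological or not" decision as a free label.

```python
import random

def temporal_order_sample(video_frames, n=3):
    """Return (frames, label): label 1 if frames are kept in chronological
    order, 0 if they were shuffled. No annotation is required."""
    indices = sorted(random.sample(range(len(video_frames)), n))
    label = random.randint(0, 1)
    if label == 0:
        shuffled = indices[:]
        while shuffled == indices:      # make sure the order actually changes
            random.shuffle(shuffled)
        indices = shuffled
    return [video_frames[i] for i in indices], label
```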


7 Conclusion and Discussions

In the past decade, computer vision has benefited greatly from the fact that neural networks act as universal approximators of functions. Integrating these networks into the pre-existing machine learning paradigms and optimizing them through backpropagation have consistently improved performance on different visual tasks. In this survey paper, we reviewed recent work pertaining to the paradigms which fall between fully supervised and unsupervised learning. Although most of our references lie in the visual domain, the same paradigms have been prevalent in related fields such as NLP, speech, and robotics.

The space between fully supervised and unsupervised learning can be qualitatively divided on the basis of the degree of supervision needed to learn the model. While synthetic data are a cost-effective and flexible alternative to real-world data sets, the models learned using them still need to be adapted to the real-world setting. Transfer learning techniques address this issue by explicitly aligning different domains through discrepancy-based or adversarial approaches. However, both of these techniques require "strict" annotation pertaining to the task, which hinders the generalization capability of the model. Weakly supervised algorithms relax the need for exact supervision by making the learning model tolerant of incomplete, inexact, and inaccurate supervision. This helps the model to harness the huge amount of data available on the web. Even when a particular domain contains an insufficient number of instances, methods in k-shot learning try to build a reasonable model using parameter regularization or meta-learning techniques. Finally, self-supervised techniques completely eliminate the need for annotation as they define a proxy task for which the annotation is implicit within the data instances.

These techniques have been successfully applied in both structured and unstructured computer vision applications such as image classification, object localization, semantic segmentation, action recognition, image super-resolution, image caption generation, and visual question answering. Despite their success, recent models weigh heavily on deep neural networks for their performance. Hence, they carry both the pros and cons of using these models, the cons being lack of interpretability and outcomes which largely depend on hyperparameters. Addressing these topics may attract increasingly more attention in the future.

Some very recent work combines ideas from two or more paradigms to obtain results in a very specialized setting. Peng et al.161 address the domain adaptation problem when no task-relevant data are present in the target domain. Inoue et al.101 leverage the full supervision in the source and inaccurate supervision in the target domain to perform transfer learning for the object localization task.

In the coming years, other learning paradigms inspired by human reasoning and abstraction, such as meta-learning6,59, lifelong learning33, and evolutionary methods, may also provide interesting avenues for research. We hope that this survey helps researchers by easing the understanding of the field and encourages research in the field.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 2 December 2018. Accepted: 16 January 2019.

References
1. Abadi M, Andersen DG (2016) Learning to protect communications with adversarial neural cryptography. CoRR. arXiv:1610.06918
2. Abu-El-Haija S, Kothari N, Lee J, Natsev AP, Toderici G, Varadarajan B, Vijayanarasimhan S (2016) YouTube-8M: a large-scale video classification benchmark. arXiv:1609.08675
3. Agrawal P, Carreira J, Malik J (2015) Learning to see by moving. In: International conference on computer vision (ICCV), Santiago, Chile
4. Akata Z, Perronnin F, Harchaoui Z, Schmid C (2013) Label-embedding for attribute-based classification. In: Computer vision and pattern recognition (CVPR), Portland, OR, USA
5. Alhaija H, Mustikovela S, Mescheder L, Geiger A, Rother C (2018) Augmented reality meets computer vision: efficient data generation for urban driving scenes. Int J Comput Vis 126(9):961–972
6. Andrychowicz M, Denil M, Gomez S, Hoffman MW, Pfau D, Schaul T, Shillingford B, De Freitas N (2016) Learning to learn by gradient descent by gradient descent. In: Advances in neural information processing systems (NIPS), Barcelona, Spain
7. Antoniou A, Storkey A, Edwards H (2017) Data augmentation generative adversarial networks. CoRR. arXiv:1711.04340
8. Arandjelovic R, Zisserman A (2017) Look, listen and learn. In: International conference on computer vision (ICCV), Venice, Italy
9. Arpit D, Jastrzębski S, Ballas N, Krueger D, Bengio E, Kanwal MS, Maharaj T, Fischer A, Courville A, Bengio Y, et al (2017) A closer look at memorization in deep networks. In: International conference on machine learning (ICML), Sydney, Australia


10. Aubry M, Russell BC (2015) Understanding deep features with computer-generated imagery. In: International conference on computer vision (ICCV), Santiago, Chile
11. Aubry M, Maturana D, Efros AA, Russell BC, Sivic J (2014) Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of CAD models. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USA
12. Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. Trans Pattern Anal Mach Intell 39(12):2481–2495
13. Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: International conference on learning representations (ICLR), San Diego, CA, USA
14. Bansal A, Sikka K, Sharma G, Chellappa R, Divakaran A (2018) Zero-shot object detection. In: European conference on computer vision (ECCV), Munich, Germany
15. Bearman A, Russakovsky O, Ferrari V, Fei-Fei L (2016) What's the point: semantic segmentation with point supervision. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
16. Ben-David S, Blitzer J, Crammer K, Kulesza A, Pereira F, Vaughan JW (2010) A theory of learning from different domains. Mach Learn 79(1–2):151–175
17. Bilen H, Vedaldi A (2016) Weakly supervised deep detection networks. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
18. Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Computational learning theory (COLT), Madison, WI, USA
19. Bousmalis K, Trigeorgis G, Silberman N, Krishnan D, Erhan D (2016) Domain separation networks. In: Advances in neural information processing systems (NIPS), Barcelona, Spain
20. Bousmalis K, Silberman N, Dohan D, Erhan D, Krishnan D (2017) Unsupervised pixel-level domain adaptation with generative adversarial networks. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
21. Busto PP, Gall J (2017) Open set domain adaptation. In: International conference on computer vision (ICCV), Venice, Italy
22. Butler DJ, Wulff J, Stanley GB, Black MJ (2012) A naturalistic open source movie for optical flow evaluation. In: European conference on computer vision (ECCV), Firenze, Italy
23. Cao Z, Simon T, Wei SE, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
24. Chapelle O, Scholkopf B, Zien A (2009) Semi-supervised learning (Chapelle O. et al., eds.; 2006) [book reviews]. IEEE Trans Neural Netw 20(3):542
25. Chattopadhyay R, Sun Q, Fan W, Davidson I, Panchanathan S, Ye J (2012) Multi-source domain adaptation and its application to early detection of fatigue. Trans Knowl Discov Data (TKDD) 6(4):18
26. Chen C, Seff A, Kornhauser A, Xiao J (2015) DeepDriving: learning affordance for direct perception in autonomous driving. In: International conference on computer vision (ICCV), Santiago, Chile
27. Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. CoRR. arXiv:1706.05587
28. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. Trans Pattern Anal Mach Intell 40(4):834–848
29. Chen TH, Liao YH, Chuang CY, Hsu WT, Fu J, Sun M (2017) Show, adapt and tell: adversarial training of cross-domain image captioner. In: International conference on computer vision (ICCV), Venice, Italy
30. Chen X, Gupta A (2015) Webly supervised learning of convolutional networks. In: International conference on computer vision (ICCV), Santiago, Chile
31. Chen Y, Li W, Sakaridis C, Dai D, Van Gool L (2018) Domain adaptive faster R-CNN for object detection in the wild. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
32. Chen YH, Chen WY, Chen YT, Tsai BC, Wang YCF, Sun M (2017) No more discrimination: cross city adaptation of road scene segmenters. In: International conference on computer vision (ICCV), Venice, Italy
33. Chen Z, Liu B (2016) Lifelong machine learning. Synth Lect Artif Intell Mach Learn 10(3):1–145
34. Cohn DA, Ghahramani Z, Jordan MI (1996) Active learning with statistical models. J Artif Intell Res 4:129–145
35. Cordts M, Omran M, Ramos S, Scharwächter T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2015) The cityscapes dataset. In: CVPR workshop on the future of datasets in vision (CVPRW), Boston, MA, USA
36. Courty N, Flamary R, Habrard A, Rakotomamonjy A (2017) Joint distribution optimal transportation for domain adaptation. In: Advances in neural information processing systems (NIPS), Long Beach, CA, USA
37. Csurka G (2017) Domain adaptation for visual applications: a comprehensive survey. CoRR. arXiv:1702.05374
38. Damodaran BB, Kellenberger B, Flamary R, Tuia D, Courty N (2018) DeepJDOT: deep joint distribution optimal transport for unsupervised domain adaptation. In: European conference on computer vision (ECCV), Munich, Germany


39. Daumé III H (2007) Frustratingly easy domain adaptation. In: Association of computational linguistics (ACL), Prague, Czech Republic
40. Day O, Khoshgoftaar TM (2017) A survey on heterogeneous transfer learning. J Big Data 4(1):29
41. De Souza CR, Gaidon A, Cabon Y, Peña AML (2017) Procedural generation of videos to train deep action recognition networks. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
42. Deng W, Zheng L, Kang G, Yang Y, Ye Q, Jiao J (2018) Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person reidentification. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
43. Divvala SK, Farhadi A, Guestrin C (2014) Learning everything about anything: webly-supervised visual concept learning. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USA
44. Doersch C, Gupta A, Efros AA (2015) Unsupervised visual representation learning by context prediction. In: International conference on computer vision (ICCV), Santiago, Chile
45. Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T (2014) DeCAF: a deep convolutional activation feature for generic visual recognition. In: International conference on machine learning (ICML), Beijing, China
46. Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
47. Dosovitskiy A, Fischer P, Ilg E, Hausser P, Hazirbas C, Golkov V, Van Der Smagt P, Cremers D, Brox T (2015) FlowNet: learning optical flow with convolutional networks. In: International conference on computer vision (ICCV), Santiago, Chile
48. Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) CARLA: an open urban driving simulator. In: Conference on robot learning (CoRL), Mountain View, CA, USA
49. Duan L, Xu D, Tsang I (2011) Learning with augmented features for heterogeneous domain adaptation. In: International conference on machine learning (ICML), Edinburgh, Scotland
50. Duan L, Tsang IW, Xu D (2012) Domain transfer multiple kernel learning. Trans Pattern Anal Mach Intell 34(3):465–479
51. Duchenne O, Audibert JY, Keriven R, Ponce J, Ségonne F (2008) Segmentation by transduction. In: Computer vision and pattern recognition (CVPR), Anchorage, AK, USA
52. Dwibedi D, Misra I, Hebert M (2017) Cut, paste and learn: surprisingly easy synthesis for instance detection. In: International conference on computer vision (ICCV), Venice, Italy
53. Elhamifar E, Sapiro G, Yang A, Shankar Sastry S (2013) A convex optimization framework for active learning. In: International conference on computer vision (ICCV), Sydney, Australia
54. Fan J, Shen Y, Zhou N, Gao Y (2010) Harvesting large-scale weakly-tagged image databases from the web. In: Computer vision and pattern recognition (CVPR), San Francisco, CA, USA
55. Fang M, Li Y, Cohn T (2017) Learning how to active learn: a deep reinforcement learning approach. In: Association of computational linguistics (ACL), Vancouver, Canada
56. Farhadi A, Endres I, Hoiem D, Forsyth D (2009) Describing objects by their attributes. In: Computer vision and pattern recognition (CVPR), Miami, FL, USA
57. Fei-Fei L, Fergus R, Perona P (2006) One-shot learning of object categories. Trans Pattern Anal Mach Intell 28(4):594–611
58. Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional two-stream network fusion for video action recognition. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
59. Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: International conference on machine learning (ICML), Sydney, Australia
60. Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. Trans Neural Netw Learn Syst 25(5):845–869
61. Freytag A, Rodner E, Denzler J (2014) Selecting influential examples: active learning with expected model output changes. In: European conference on computer vision (ECCV), Zurich, Switzerland
62. Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J, Greenspan H (2018) GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. CoRR. arXiv:1803.01229
63. Frome A, Corrado GS, Shlens J, Bengio S, Dean J, Mikolov T et al (2013) DeViSE: a deep visual-semantic embedding model. In: Advances in neural information processing systems (NIPS), Stateline, NV, USA
64. Gaidon A, Wang Q, Cabon Y, Vig E (2016) Virtual worlds as proxy for multi-object tracking analysis. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
65. Gal Y, Islam R, Ghahramani Z (2017) Deep Bayesian active learning with image data. In: Advances in neural information processing systems workshops, Long Beach, CA, USA
66. Gan C, Sun C, Duan L, Gong B (2016) Webly-supervised video recognition by mutually voting for relevant web images and web video frames. In: European conference on computer vision (ECCV), Amsterdam, Netherlands


67. Gan C, Yao T, Yang K, Yang Y, Mei T (2016) You lead, we exceed: labor-free video concept learning by jointly exploiting web videos and images. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
68. Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, Marchand M, Lempitsky V (2016) Domain-adversarial training of neural networks. J Mach Learn Res 17(1):2096–2030
69. Gao M, Li A, Yu R, Morariu VI, Davis LS (2018) C-WSL: count-guided weakly supervised localization. In: European conference on computer vision (ECCV), Munich, Germany
70. Gebru T, Hoffman J, Fei-Fei L (2017) Fine-grained recognition in the wild: a multi-task domain adaptation approach. In: International conference on computer vision (ICCV), Venice, Italy
71. Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the KITTI dataset. Int J Robot Res 32(11):1231–1237
72. Ghifary M, Kleijn WB, Zhang M, Balduzzi D, Li W (2016) Deep reconstruction-classification networks for unsupervised domain adaptation. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
73. Ghosh A, Kumar H, Sastry P (2017) Robust loss functions under label noise for deep neural networks. In: AAAI, San Francisco, CA, USA
74. Girdhar R, Ramanan D, Gupta A, Sivic J, Russell B (2017) ActionVLAD: learning spatio-temporal aggregation for action classification. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
75. Girshick R (2015) Fast R-CNN. In: International conference on computer vision (ICCV), Santiago, Chile
76. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USA
77. Gomez L, Patel Y, Rusiñol M, Karatzas D, Jawahar C (2017) Self-supervised learning of visual features through embedding images into text topic spaces. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
78. Gong B, Shi Y, Sha F, Grauman K (2012) Geodesic flow kernel for unsupervised domain adaptation. In: Computer vision and pattern recognition (CVPR), Providence, RI, USA
79. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems (NIPS), Montreal, Canada
80. Gopalan R, Li R, Chellappa R (2011) Domain adaptation for object recognition: an unsupervised approach. In: International conference on computer vision (ICCV), Barcelona, Spain
81. Goyal Y, Khot T, Summers-Stay D, Batra D, Parikh D (2017) Making the V in VQA matter: elevating the role of image understanding in Visual Question Answering. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
82. Graves A (2013) Generating sequences with recurrent neural networks. CoRR. arXiv:1308.0850
83. Graves A, Jaitly N (2014) Towards end-to-end speech recognition with recurrent neural networks. In: International conference on machine learning (ICML), Beijing, China
84. Gu J, Neubig G, Cho K, Li VO (2017) Learning to translate in real-time with neural machine translation. In: Association of computational linguistics (ACL), Vancouver, Canada
85. Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
86. Habibian A, Mensink T, Snoek CG (2014) Composite concept discovery for zero-shot video event detection. In: International conference on multimedia retrieval (ICMR), Glasgow, UK
87. Haeusser P, Frerix T, Mordvintsev A, Cremers D (2017) Associative domain adaptation. In: International conference on computer vision (ICCV), Venice, Italy
88. Handa A, Whelan T, McDonald J, Davison AJ (2014) A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In: International conference on robotics and automation (ICRA), Hong Kong
89. Handa A, Patraucean V, Badrinarayanan V, Stent S, Cipolla R (2016) Understanding real world indoor scenes with synthetic data. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
90. Hariharan B, Girshick RB (2017) Low-shot visual recognition by shrinking and hallucinating features. In: International conference on computer vision (ICCV), Venice, Italy
91. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
92. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: International conference on computer vision (ICCV), Venice, Italy
93. Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
94. Hoffman J, Gupta S, Leong J, Guadarrama S, Darrell T (2016) Cross-modal adaptation for RGB-D detection. In: International conference on robotics and automation (ICRA), Stockholm, Sweden
95. Hoffman J, Wang D, Yu F, Darrell T (2016) FCNs in the wild: pixel-level adversarial and constraint-based adaptation. CoRR. arXiv:1612.02649
96. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA


97. Huang J, Gretton A, Borgwardt KM, Schölkopf B, Smola AJ (2007) Correcting sample selection bias by unlabeled data. In: Advances in neural information processing systems (NIPS), Vancouver, Canada
98. Huang Z, Wang X, Wang J, Liu W, Wang J (2018) Weakly-supervised semantic segmentation network with deep seeded region growing. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
99. Huh M, Liu A, Owens A, Efros AA (2018) Fighting fake news: image splice detection via learned self-consistency. In: European conference on computer vision (ECCV), Munich, Germany
100. Ilse M, Tomczak JM, Welling M (2018) Attention-based deep multiple instance learning. In: International conference on machine learning (ICML), Stockholm, Sweden
101. Inoue N, Furuta R, Yamasaki T, Aizawa K (2018) Cross-domain weakly-supervised object detection through progressive domain adaptation. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
102. Janai J, Güney F, Behl A, Geiger A (2017) Computer vision for autonomous vehicles: problems, datasets and state-of-the-art. CoRR. arXiv:1704.05519
103. Jayaraman D, Grauman K (2015) Learning image representations tied to ego-motion. In: International conference on computer vision (ICCV), Santiago, Chile
104. Ji S, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. Trans Pattern Anal Mach Intell 35(1):221–231
105. Jiang H, Larsson G, Maire M, Shakhnarovich G, Learned-Miller E (2018) Self-supervised relative depth learning for urban scene understanding. In: European conference on computer vision (ECCV), Munich, Germany
106. Johnson M, Schuster M, Le QV, Krikun M, Wu Y, Chen Z, Thorat N, Viégas F, Wattenberg M, Corrado G et al (2017) Google's multilingual neural machine translation system: enabling zero-shot translation. In: Association of computational linguistics (ACL), Vancouver, Canada
107. Joulin A, van der Maaten L, Jabri A, Vasilache N (2016) Learning visual features from large weakly supervised data. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
108. Kaneva B, Torralba A, Freeman WT (2011) Evaluation of image features using a photorealistic virtual world. In: International conference on computer vision (ICCV), Barcelona, Spain
109. Kapoor A, Hua G, Akbarzadeh A, Baker S (2009) Which faces to tag: adding prior constraints into active learning. In: International conference on computer vision (ICCV), Kyoto, Japan
110. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USA
111. Khoreva A, Benenson R, Hosang JH, Hein M, Schiele B (2017) Simple does it: weakly supervised instance and semantic segmentation. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
112. Kim T, Cha M, Kim H, Lee JK, Kim J (2017) Learning to discover cross-domain relations with generative adversarial networks. In: International conference on machine learning (ICML), Sydney, Australia
113. Kingma DP, Welling M (2013) Auto-encoding variational Bayes. In: International conference on learning representations (ICLR), Scottsdale, AZ, USA
114. Koch G, Zemel R, Salakhutdinov R (2015) Siamese neural networks for one-shot image recognition. In: ICML deep learning workshop, Lille, France
115. Krishna R, Zhu Y, Groth O, Johnson J, Hata K, Kravitz J, Chen S, Kalantidis Y, Li LJ, Shamma DA et al (2017) Visual genome: connecting language and vision using crowdsourced dense image annotations. Int J Comput Vis 123(1):32–73
116. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems (NIPS), Stateline, NV, USA
117. Kulis B, Saenko K, Darrell T (2011) What you saw is not what you get: domain adaptation using asymmetric kernel transforms. In: Computer vision and pattern recognition (CVPR), Colorado Springs, CO, USA
118. Kumar MP, Packer B, Koller D (2010) Self-paced learning for latent variable models. In: Advances in neural information processing systems (NIPS), Vancouver, Canada
119. Kurakin A, Goodfellow I, Bengio S (2015) Adversarial examples in the physical world. In: International conference on learning representations (ICLR), San Diego, CA, USA
120. Kuznetsova A, Rom H, Alldrin N, Uijlings J, Krasin I, Pont-Tuset J, Kamali S, Popov S, Malloci M, Duerig T, Ferrari V (2018) The open images dataset v4: unified image classification, object detection, and visual relationship detection at scale. CoRR. arXiv:1811.00982
121. Lake BM, Salakhutdinov RR, Tenenbaum J (2013) One-shot learning by inverting a compositional causal process. In: Advances in neural information processing systems (NIPS), Stateline, NV, USA
122. Lake BM, Salakhutdinov R, Tenenbaum JB (2015) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–1338
123. Lampert CH, Nickisch H, Harmeling S (2009) Learning to detect unseen object classes by between-class attribute transfer. In: Computer vision and pattern recognition (CVPR), Miami, FL, USA
124. Lampert CH, Nickisch H, Harmeling S (2014) Attribute-based classification for zero-shot visual object categorization. Trans Pattern Anal Mach Intell 36(3):453–465


125. Larsson G, Maire M, Shakhnarovich G (2017) Colorization as a proxy task for visual understanding. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
126. Le Guennec A, Malinowski S, Tavenard R (2016) Data augmentation for time series classification using convolutional neural networks. In: ECML/PKDD workshop on advanced analytics and learning on temporal data, Riva del Garda, Italy
127. Lee HY, Huang JB, Singh M, Yang MH (2017) Unsupervised representation learning by sorting sequences. In: International conference on computer vision (ICCV), Venice, Italy
128. Levinkov E, Fritz M (2013) Sequential Bayesian model update under structured scene prior for semantic road scenes labeling. In: International conference on computer vision (ICCV), Sydney, Australia
129. Li K, Li Y, You S, Barnes N (2017) Photo-realistic simulation of road scene for data-driven methods in bad weather. In: Conference on computer vision and pattern recognition workshop (CVPRW), Honolulu, HI, USA
130. Li W, Duan L, Xu D, Tsang IW (2014) Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation. Trans Pattern Anal Mach Intell 36(6):1134–1148
131. Li Y, Wang N, Shi J, Liu J, Hou X (2016) Revisiting batch normalization for practical domain adaptation. In: International conference on learning representations workshops, Toulon, France
132. Lin D, Dai J, Jia J, He K, Sun J (2016) ScribbleSup: scribble-supervised convolutional networks for semantic segmentation. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
133. Lin G, Milan A, Shen C, Reid ID (2017) RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
134. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: European conference on computer vision (ECCV), Zurich, Switzerland
135. Liu B, Ferrari V (2017) Active learning for human pose estimation. In: International conference on computer vision (ICCV), Venice, Italy
136. Liu MY, Tuzel O (2016) Coupled generative adversarial networks. In: Advances in neural information processing systems (NIPS), Barcelona, Spain
137. Liu X, Song L, Wu X, Tan T (2016) Transferring deep representation for NIR-VIS heterogeneous face recognition. In: International conference on biometrics (ICB), Halmstad, Sweden
138. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
139. Lu H, Zhang L, Cao Z, Wei W, Xian K, Shen C, van den Hengel A (2017) When unsupervised domain adaptation meets tensor representations. In: International conference on computer vision (ICCV), Venice, Italy
140. Lu Y, Tai YW, Tang CK (2018) Attribute-guided face generation using conditional CycleGAN. In: European conference on computer vision (ECCV), Munich, Germany
141. Ma F, Cavalheiro GV, Karaman S (2018) Self-supervised sparse-to-dense: self-supervised depth completion from LiDAR and monocular camera. In: International conference on robotics and automation (ICRA), Brisbane, Australia
142. Maninis KK, Caelles S, Pont-Tuset J, Van Gool L (2017) Deep extreme cut: from extreme points to object segmentation. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
143. Mehrotra A, Dukkipati A (2017) Generative adversarial residual pairwise networks for one shot learning. CoRR. arXiv:1703.08033
144. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Advances in neural information processing systems (NIPS), Stateline, NV, USA
145. Mishra N, Rohaninejad M, Chen X, Abbeel P (2018) A simple neural attentive meta-learner. In: International conference on learning representations (ICLR), Vancouver, Canada
146. Misra I, Lawrence Zitnick C, Mitchell M, Girshick R (2016a) Seeing through the human reporting bias: visual classifiers from noisy human-centric labels. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
147. Misra I, Zitnick CL, Hebert M (2016b) Shuffle and learn: unsupervised learning using temporal order verification. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
148. Natarajan N, Dhillon IS, Ravikumar PK, Tewari A (2013) Learning with noisy labels. In: Advances in neural information processing systems (NIPS), Stateline, NV, USA
149. Nguyen HV, Ho HT, Patel VM, Chellappa R (2015) DASH-N: joint hierarchical domain adaptation and feature learning. IEEE Trans Image Process 24(12):5479–5491
150. Noroozi M, Favaro P (2016) Unsupervised learning of visual representations by solving jigsaw puzzles. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
151. Oquab M, Bottou L, Laptev I, Sivic J (2014) Learning and transferring mid-level image representations using convolutional neural networks. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USA


152. Owens A, Wu J, McDermott JH, Freeman WT, Torralba A (2016) Ambient sound provides supervision for visual learning. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
153. Pan SJ, Yang Q (2010) A survey on transfer learning. Trans Knowl Data Eng 22(10):1345–1359
154. Pan SJ, Tsang IW, Kwok JT, Yang Q (2011) Domain adaptation via transfer component analysis. IEEE Trans Neural Netw 22(2):199–210
155. Papadopoulos DP, Uijlings JR, Keller F, Ferrari V (2016) We don't need no bounding-boxes: training object class detectors using only human verification. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
156. Papadopoulos DP, Uijlings JR, Keller F, Ferrari V (2017) Extreme clicking for efficient object annotation. In: International conference on computer vision (ICCV), Venice, Italy
157. Papadopoulos DP, Uijlings JR, Keller F, Ferrari V (2017) Training object class detectors with click supervision. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
158. Patel VM, Gopalan R, Li R, Chellappa R (2015) Visual domain adaptation: a survey of recent advances. Signal Process Mag 32(3):53–69
159. Pathak D, Krahenbuhl P, Donahue J, Darrell T, Efros AA (2016) Context encoders: feature learning by inpainting. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
160. Pathak D, Girshick RB, Dollár P, Darrell T, Hariharan B (2017) Learning features by watching objects move. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
161. Peng KC, Wu Z, Ernst J (2018) Zero-shot deep domain adaptation. In: European conference on computer vision (ECCV), Munich, Germany
162. Peng X, Sun B, Ali K, Saenko K (2015) Learning deep object detectors from 3D models. In: International conference on computer vision (ICCV), Santiago, Chile
163. Pinheiro PO, Collobert R (2015) From image-level to pixel-level labeling with convolutional networks. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
164. Pinto L, Gandhi D, Han Y, Park YL, Gupta A (2016) The curious robot: learning visual representations via physical interactions. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
165. Qiao S, Shen W, Zhang Z, Wang B, Yuille A (2018) Deep co-training for semi-supervised image recognition. In: European conference on computer vision (ECCV), Munich, Germany
166. Qin J, Liu L, Shao L, Shen F, Ni B, Chen J, Wang Y (2017) Zero-shot action recognition with error-correcting output codes. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
167. Qiu W, Yuille A (2016) UnrealCV: connecting computer vision to Unreal Engine. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
168. Rader N, Bausano M, Richards JE (1980) On the nature of the visual-cliff-avoidance response in human infants. Child Dev 51(1):61–68
169. Raj A, Namboodiri VP, Tuytelaars T (2015) Subspace alignment based domain adaptation for RCNN detector. In: British machine vision conference (BMVC), Swansea, UK
170. Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) SQuAD: 100,000+ questions for machine comprehension of text. In: Conference on empirical methods in natural language processing (EMNLP), Austin, TX, USA
171. Ratner AJ, Ehrenberg H, Hussain Z, Dunnmon J, Ré C (2017) Learning to compose domain-specific transformations for data augmentation. In: Advances in neural information processing systems (NIPS), Long Beach, CA, USA
172. Ravi S, Larochelle H (2017) Optimization as a model for few-shot learning. In: International conference on learning representations (ICLR), Toulon, France
173. Redko I, Habrard A, Sebban M (2017) Theoretical analysis of domain adaptation with optimal transport. In: Joint European conference on machine learning and knowledge discovery in databases (ECML PKDD), Skopje, Macedonia
174. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
175. Reed S, Lee H, Anguelov D, Szegedy C, Erhan D, Rabinovich A (2014) Training deep neural networks on noisy labels with bootstrapping. In: International conference on learning representations workshops, Banff, Canada
176. Reed S, Akata Z, Lee H, Schiele B (2016) Learning deep representations of fine-grained visual descriptions. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
177. Remez T, Huang J, Brown M (2018) Learning to segment via cut-and-paste. In: European conference on computer vision (ECCV), Munich, Germany
178. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems (NIPS), Montreal, Canada
179. Richter SR, Vineet V, Roth S, Koltun V (2016) Playing for data: ground truth from computer games. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
180. Richter SR, Hayder Z, Koltun V (2017) Playing for benchmarks. In: International conference on computer vision (ICCV), Venice, Italy


181. Rippel O, Paluri M, Dollar P, Bourdev L (2016) Metric learning with adaptive density discrimination. In: International conference on learning representations (ICLR), San Juan, Puerto Rico
182. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention (MICCAI), Munich, Germany
183. Ros G, Sellart L, Materzynska J, Vazquez D, Lopez AM (2016) The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
184. Roy N, McCallum A (2001) Toward optimal active learning through Monte Carlo estimation of error reduction. In: International conference on machine learning (ICML), Williamstown, MA, USA
185. Roy S, Unmesh A, Namboodiri VP (2018) Deep active learning for object detection. In: British machine vision conference (BMVC), Newcastle, UK
186. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
187. Russo P, Carlucci FM, Tommasi T, Caputo B (2018) From source to target and back: symmetric bi-directional adaptive GAN. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
188. Sadeghi F, Levine S (2017) CAD2RL: real single-image flight without a single real image. In: Robotics science and systems (RSS), Boston, MA, USA
189. Saenko K, Kulis B, Fritz M, Darrell T (2010) Adapting visual category models to new domains. In: European conference on computer vision (ECCV), Crete, Greece
190. Sak H, Senior A, Beaufays F (2014) Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Conference of the international speech communication association (INTERSPEECH), Singapore
191. Sakaridis C, Dai D, Van Gool L (2018) Semantic foggy scene understanding with synthetic data. Int J Comput Vis 126:973–992
192. Salakhutdinov R, Larochelle H (2010) Efficient learning of deep Boltzmann machines. In: International conference on artificial intelligence and statistics (ICAIS), San Diego, CA, USA
193. Sankaranarayanan S, Balaji Y, Castillo CD, Chellappa R (2018) Generate to adapt: aligning domains using generative adversarial networks. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
194. Scheffer T, Decomain C, Wrobel S (2001) Active hidden Markov models for information extraction. In: International symposium on intelligent data analysis, Berlin, Heidelberg
195. Schlegl T, Seeböck P, Waldstein SM, Schmidt-Erfurth U, Langs G (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: International conference on information processing in medical imaging (IPMI), Boone, NC, USA
196. Schroff F, Kalenichenko D, Philbin J (2015) FaceNet: a unified embedding for face recognition and clustering. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
197. Sener O, Savarese S (2018) Active learning for convolutional neural networks: a core-set approach. In: International conference on learning representations (ICLR), Vancouver, Canada
198. Settles B (2009) Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison
199. Shao L, Zhu F, Li X (2015) Transfer learning for visual categorization: a survey. IEEE Trans Neural Netw Learn Syst 26(5):1019–1034
200. Shi M, Ferrari V (2016) Weakly supervised object localization using size estimates. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
201. Shimodaira H (2000) Improving predictive inference under covariate shift by weighting the log-likelihood function. J Stat Plan Inference 90(2):227–244
202. Shrivastava A, Pfister T, Tuzel O, Susskind J, Wang W, Webb R (2017) Learning from simulated and unsupervised images through adversarial training. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
203. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR), San Diego, CA, USA
204. Singh S, Gupta A, Efros AA (2012) Unsupervised discovery of mid-level discriminative patches. In: European conference on computer vision (ECCV), Firenze, Italy
205. Sivic J, Russell BC, Efros AA, Zisserman A, Freeman WT (2005) Discovering objects and their location in images. In: International conference on computer vision (ICCV), Beijing, China
206. Socher R, Ganjoo M, Manning CD, Ng A (2013) Zero-shot learning through cross-modal transfer. In: Advances in neural information processing systems (NIPS), Stateline, NV, USA
207. Sohn K, Liu S, Zhong G, Yu X, Yang MH, Chandraker M (2017) Unsupervised domain adaptation for face recognition in unlabeled videos. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
208. Song HO, Girshick R, Jegelka S, Mairal J, Harchaoui Z, Darrell T (2014) On learning to localize objects with minimal supervision. In: International conference on machine learning (ICML), Beijing, China
209. Song HO, Lee YJ, Jegelka S, Darrell T (2014) Weakly-supervised discovery of visual pattern configurations. In: Advances in neural information processing systems (NIPS), Montreal, Canada


210. Stavens D, Thrun S (2006) A self-supervised terrain roughness estimator for off-road autonomous driving. In: Uncertainty in artificial intelligence (UAI), Cambridge, MA, USA
211. Sukhbaatar S, Bruna J, Paluri M, Bourdev L, Fergus R (2014) Training convolutional networks with noisy labels. In: International conference on learning representations workshops, Banff, Canada
212. Sun B, Saenko K (2016) Deep CORAL: correlation alignment for deep domain adaptation. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
213. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Advances in neural information processing systems (NIPS), Montreal, Canada
214. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
215. Taigman Y, Polyak A, Wolf L (2017) Unsupervised cross-domain image generation. In: International conference on learning representations (ICLR), Toulon, France
216. Tan B, Zhang Y, Pan SJ, Yang Q (2017) Distant domain transfer learning. In: AAAI, San Francisco, CA, USA
217. Taylor GR, Chosak AJ, Brewer PC (2007) OVVV: using virtual worlds to design and evaluate surveillance systems. In: Computer vision and pattern recognition (CVPR), Minneapolis, MN, USA
218. Thomee B, Shamma DA, Friedland G, Elizalde B, Ni K, Poland D, Borth D, Li L (2016) YFCC100M: the new data in multimedia research. Commun ACM 59:64–73
219. Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In: International conference on intelligent robots and systems (IROS), Vancouver, Canada
220. Tong S, Chang E (2001) Support vector machine active learning for image retrieval. In: ACM international conference on multimedia (MM), Ottawa, Canada
221. Torralba A, Efros AA (2011) Unbiased look at dataset bias. In: Computer vision and pattern recognition (CVPR), Colorado Springs, CO, USA
222. Toshev A, Szegedy C (2014) DeepPose: human pose estimation via deep neural networks. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USA
223. Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In: International conference on computer vision (ICCV), Santiago, Chile
224. Tremblay J, Prakash A, Acuna D, Brophy M, Jampani V, Anil C, To T, Cameracci E, Boochoon S, Birchfield S (2018) Training deep networks with synthetic data: bridging the reality gap by domain randomization. In: Computer vision and pattern recognition workshops (CVPRW), Salt Lake City, UT, USA
225. Tsai YH, Hung WC, Schulter S, Sohn K, Yang MH, Chandraker M (2018) Learning to adapt structured output space for semantic segmentation. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
226. Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T (2014) Deep domain confusion: maximizing for domain invariance. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USA
227. Tzeng E, Hoffman J, Darrell T, Saenko K (2015) Simultaneous deep transfer across domains and tasks. In: International conference on computer vision (ICCV), Santiago, Chile
228. Tzeng E, Hoffman J, Saenko K, Darrell T (2017) Adversarial discriminative domain adaptation. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
229. Van Den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior AW, Kavukcuoglu K (2016) WaveNet: a generative model for raw audio. CoRR. arXiv:1609.03499
230. Van Horn G, Branson S, Farrell R, Haber S, Barry J, Ipeirotis P, Perona P, Belongie S (2015) Building a bird recognition app and large scale dataset with citizen scientists: the fine print in fine-grained dataset collection. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
231. Varma G, Subramanian A, Namboodiri A, Chandraker M, Jawahar CV (2019) IDD: a dataset for exploring problems of autonomous navigation in unconstrained environments. In: IEEE winter conference on applications of computer vision (WACV), Waikoloa, HI, USA
232. Vazquez D, Lopez AM, Marin J, Ponsa D, Geronimo D (2014) Virtual and real world adaptation for pedestrian detection. Trans Pattern Anal Mach Intell 36(4):797–809
233. Veit A, Alldrin N, Chechik G, Krasin I, Gupta A, Belongie SJ (2017) Learning from noisy large-scale datasets with minimal supervision. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
234. Vezhnevets A, Buhmann JM, Ferrari V (2012) Active learning for semantic segmentation with expected change. In: Computer vision and pattern recognition (CVPR), Providence, RI, USA
235. Vijayanarasimhan S, Grauman K (2014) Large-scale live active learning: training object detectors with crawled data and crowds. Int J Comput Vis 108(1–2):97–114
236. Vinyals O, Blundell C, Lillicrap T, Wierstra D et al (2016) Matching networks for one shot learning. In: Advances in neural information processing systems (NIPS), Barcelona, Spain


237. Vogt P, Smith ADM (2005) Learning color words is slow: a cross-situational learning account. Behav Brain Sci 28(4):509–510
238. Wang C, Mahadevan S (2011) Heterogeneous domain adaptation using manifold alignment. In: International joint conference on artificial intelligence (IJCAI), Barcelona, Spain
239. Wang M, Deng W (2018) Deep visual domain adaptation: a survey. Neurocomputing 312:135–153
240. Wang X, Gupta A (2015) Unsupervised learning of visual representations using videos. In: International conference on computer vision (ICCV), Santiago, Chile
241. Wang YX, Hebert M (2016) Learning to learn: model regression networks for easy small sample learning. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
242. Weiss K, Khoshgoftaar TM, Wang D (2016) A survey of transfer learning. J Big Data 3(1):9
243. Wu J, Yu Y, Huang C, Yu K (2015) Deep multiple instance learning for image classification and auto-annotation. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
244. Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, Klingner J, Shah A, Johnson M, Liu X, Kaiser Ł, Gouws S, Kato Y, Kudo T, Kazawa H, Stevens K, Kurian G, Patil N, Wang W, Young C, Smith J, Riesa J, Rudnick A, Vinyals O, Corrado G, Hughes M, Dean J (2016) Google's neural machine translation system: bridging the gap between human and machine translation. CoRR. arXiv:1609.08144
245. Xian Y, Akata Z, Sharma G, Nguyen Q, Hein M, Schiele B (2016) Latent embeddings for zero-shot classification. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
246. Xian Y, Schiele B, Akata Z (2017) Zero-shot learning - the good, the bad and the ugly. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
247. Xiao T, Xia T, Yang Y, Huang C, Wang X (2015) Learning from massive noisy labeled data for image classification. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
248. Xu J, Schwing AG, Urtasun R (2015) Learning to segment under various forms of weak supervision. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
249. Yan H, Ding Y, Li P, Wang Q, Xu Y, Zuo W (2017) Mind the class weight bias: weighted maximum mean discrepancy for unsupervised domain adaptation. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
250. Yao A, Gall J, Leistner C, Van Gool L (2012) Interactive object detection. In: Computer vision and pattern recognition (CVPR), Providence, RI, USA
251. Yi Z, Zhang HR, Tan P, Gong M (2017) DualGAN: unsupervised dual learning for image-to-image translation. In: International conference on computer vision (ICCV), Venice, Italy
252. Yoo D, Fan H, Boddeti VN, Kitani KM (2018) Efficient k-shot learning with regularized deep networks. In: AAAI, New Orleans, LA, USA
253. Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? In: Advances in neural information processing systems (NIPS), Montreal, Canada
254. Zhang H, Xu T, Li H, Zhang S, Huang X, Wang X, Metaxas D (2017a) StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In: International conference on computer vision (ICCV), Venice, Italy
255. Zhang J, Ding Z, Li W, Ogunbona P (2018) Importance weighted adversarial nets for partial domain adaptation. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
256. Zhang L, Xiang T, Gong S (2017b) Learning a deep embedding model for zero-shot learning. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
257. Zhang R, Isola P, Efros AA (2016) Colorful image colorization. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
258. Zhang R, Isola P, Efros AA (2017c) Split-brain autoencoders: unsupervised learning by cross-channel prediction. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
259. Zhang Y, David P, Gong B (2017d) Curriculum domain adaptation for semantic segmentation of urban scenes. In: International conference on computer vision (ICCV), Venice, Italy
260. Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
261. Zhu JJ, Bento J (2017) Generative adversarial active learning. In: Advances in neural information processing systems workshops, Long Beach, CA, USA
262. Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: International conference on computer vision (ICCV), Venice, Italy
263. Zhu Y, Chen Y, Lu Z, Pan SJ, Xue GR, Yu Y, Yang Q (2011) Heterogeneous transfer learning for image classification. In: AAAI, San Francisco, CA, USA
264. Zhuang B, Liu L, Li Y, Shen C, Reid ID (2017) Attend in groups: a weakly-supervised deep learning framework for learning from web data. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA


Lovish Chum is currently a research fellow at the Center for Visual Information Technology (CVIT) at IIIT Hyderabad. Earlier, he graduated from the Indian Institute of Technology Kanpur. He is interested in exploring self-supervision for problems in computer vision.

Anbumani Subramanian is presently a Lead Architect at Intel in India. Prior to that, he was a Senior Research Scientist at Hewlett-Packard Labs. He was a postdoctoral researcher at Virginia Tech, where he also received his Ph.D. degree. Anbu began his career as a Software Engineer at IBM. He is a Senior Member of IEEE, has several peer-reviewed publications, and has nine patents granted. His expertise and interests include computer vision, machine learning, and human–computer interaction.

Vineeth N. Balasubramanian is an Associate Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology, Hyderabad. His research interests include deep learning, machine learning, computer vision, non-convex optimization, and real-world applications in these areas. He has over 60 research publications in these areas, including premier peer-reviewed venues such as CVPR, ICCV, KDD, ICDM, IEEE TPAMI, and ACM MM, 5 patent applications, and an edited book on a recent development in machine learning called Conformal Prediction. His Ph.D. dissertation at Arizona State University (completed in 2010) on the Conformal Predictions framework was nominated for the Outstanding Ph.D. Dissertation at the Department of Computer Science. He was also awarded the Gold Medals for Academic Excellence in the Bachelors program in Math in 1999, and for his Masters program in Computer Science in 2003. He is an active reviewer/contributor at many conferences such as ICCV, IJCAI, ACM MM, and ACCV, as well as journals including IEEE TPAMI, IEEE TNNLS, Machine Learning, and Pattern Recognition. He is a member of the IEEE and ACM, and currently serves as the Secretary of the AAAI India Chapter.

C. V. Jawahar is a professor at IIIT Hyderabad, India. He received his Ph.D. from IIT Kharagpur and has been with IIIT Hyderabad since Dec. 2000. At IIIT Hyderabad, Jawahar leads a group focusing on computer vision, machine learning, and multimedia systems. In recent years, he has been looking into a set of problems that overlap with vision, language, and text. He is also interested in large-scale multimedia systems with a special focus on retrieval. He has more than 50 publications in top-tier conferences in computer vision, robotics, and document image processing. He has served as a chair for previous editions of ACCV, WACV, IJCAI, and ICVGIP. Presently, he is an area editor of CVIU and an associate editor of IEEE PAMI.
