Overcoming Limitations of Categorical Language Modeling

Shiran Dudy

Advisor: Steven Bedrick

A thesis presented for the degree of Doctor of Philosophy

Center for Spoken Language Understanding
Oregon Health & Science University
November 2020

“Education means teaching a child to be curious, to wonder, to reflect, to enquire. The child who asks becomes a partner in the learning process, an active recipient. To ask is to grow”.
Jonathan Sacks

Acknowledgements

There are many who I would like to thank, and who accompanied me throughout my journey.

First, I am very grateful to have had Steven Bedrick as my advisor. I learned a lot from him: starting from the basics of how to ask a research question, to considering how and in what ways, in the grand scheme of things, our work adds to the general knowledge of our community. He also taught me how to attend to details more carefully, and to rigorously examine my steps and outcomes in a methodological fashion. He always removed any roadblocks, and provided me with whatever assistance or advice I needed about my work. Steven was always there when I asked (and I asked a lot). I am most appreciative of how he let me discover myself, and of his trust and support in me to follow my passion. He was everything I could ask for in a mentor.

I also would like to thank Melanie Fried-Oken, who accepted me to her group and who exposed me to the world of assistive technology. Her dedication to relentlessly developing means to find the voices of the people who have lost their basic ability to communicate was inspiring to me. She taught me what it takes to run an interdisciplinary group. Most importantly, she has supported me throughout, and I am very fortunate for that.

I also wanted to thank Peter Heeman, who made it possible for me to graduate on time. Throughout the last year he has done everything he could to ensure I am provided with the resources and the knowledge to graduate and make a smooth transition onward. I do not take it for granted and appreciate him a lot for helping me make my first steps following my graduation.

I wanted to thank Pat as well, as throughout the program she smoothly took care of every administrative issue. She was the first person who I saw when I arrived here and was always very friendly and welcoming.

I am also grateful for Brian Roark, who I learned from and consulted with whenever I got stuck. He always had the time and patience to listen, and offer good advice (which he always had). Finally, I would like to thank David Smith, who from time to time helped us in brainstorming over directions and ideas that I had in mind, and encouraged me to continue asking.

I wanted to thank my mother, who was (and is) there for me throughout it all, who initially thought that pursuing a PhD so far from home was a crazy idea. Nonetheless, she still supported me from afar throughout. Ronen, my love, who I could not imagine going through the last stretch without. I am very fortunate to have him in my life. My sister, who taught me that I can shape my reality in my own hands. To my dad, who supported me and is very proud now. To my adopting mothers here, Dvora M., Karen, and Dvora T. And especially to Dvora M., who was there for me whenever I needed a fresh perspective on life. To Naomi, Dudi, Yael, Ori and Cleiton, who also became my close family.

I wanted to thank my committee members Brian Roark, Peter Heeman, Meysam Asgari and Xubo Song for providing me feedback on this work and helping me strengthen my argument.

Abstract

Neural language models typically employ a categorical approach to prediction and training, leading to several well-known computational and numerical limitations.
These limitations are particularly evident in applied settings where language models are employed as a means of communication. From speller systems employed as assistive technology to texting applications on smartphones, all language models revolve around category-based prediction. Research shows that neural-category approaches to language modeling are questionable for predicting low-frequency words that are essential for user personalization. It is also challenging to adapt these architectures to a changing vocabulary due to the initially learned vocabulary constraints, which limit predictions of relevant categories (i.e., words) a user can type. Recently, such categorical models were shown to be relatively complex with long inference times, which may be detrimental for user engagement. In this thesis, I reevaluate neural-category approaches and propose an alternative: continuous output prediction.

Continuous output prediction is an underexplored alternative approach to language modeling that performs prediction directly against a continuous word embedding space. This approach splits the inference phase into two steps: a vector prediction followed by a vector decoding (mapping the vector to a category). Predicting a vector in an embedding space opens the door to a theoretically unlimited number of categories that can be represented and decoded using this technique. Technically, I show how, given a trained model, adapting a continuous model to a new vocabulary requires minimal architectural modifications compared to categorical alternatives. I also explore another important trait of continuous output prediction models: such models reach low-frequency vocabulary words that are often ignored by categorical models. I discuss the computational aspects of continuous output prediction, showing its promising results, especially in multiple-user settings and settings in which short inferences are required.
Finally, to evaluate the diversity of categories predicted, including low-frequency words, I propose a simple metric based on the unique types predicted.

Contents

1 Introduction
  1.1 Problem Statement
  1.2 Thesis Contributions
    1.2.1 Retrieval-based language model
    1.2.2 Prediction diversity evaluation metric
    1.2.3 Adaptation in retrieval-based approaches
  1.3 Organization of the Thesis
2 Preliminaries and Background
  2.1 On the Roles of Language
  2.2 Augmentative and Alternative Communication (AAC)
    2.2.1 BCI systems
    2.2.2 Icons
  2.3 Language Models
    2.3.1 Language models’ application
    2.3.2 Statistical language models
    2.3.3 Evaluation metrics
    2.3.4 Neural network language models
    2.3.5 Neural models compared to count-based approaches
  2.4 Word-Embedding Spaces
    2.4.1 Static embeddings
    2.4.2 Contextualized embeddings
    2.4.3 Hot representation
  2.5 Limitations of Neural-Categorical-Based Prediction
    2.5.1 Complexity limitations
    2.5.2 Decoding limitations
    2.5.3 Architectural limitations
    2.5.4 Evaluation limitations
3 Towards Continuous-Output prediction of Language Models
  3.1 Introduction
    3.1.1 Motivation for using a continuous approach
  3.2 Related Work
    3.2.1 Predictive language models
    3.2.2 Adversarial language model training
    3.2.3 Rare words
  3.3 Methods
    3.3.1 Datasets
    3.3.2 Models
    3.3.3 Embeddings
    3.3.4 Decoding
    3.3.5 Process
    3.3.6 Metrics
    3.3.7 Baselines
  3.4 Results
    3.4.1 High-level analysis
    3.4.2 Proposing an adversarial continuous output model (GAN)
    3.4.3 GAN’s model performance
    3.4.4 Long-tail analysis
    3.4.5 An improved categorical model: The unlikelihood loss function
    3.4.6 An improved categorical model: Employing subword unit tokenization
    3.4.7 Overall performance across experiments
    3.4.8 Computational costs
  3.5 Future Directions
  3.6 Conclusion
4 Incremental Domain Adaptation in Language Models
  4.1 Introduction
  4.2 Related Work
    4.2.1 Domain adaptation in NLP
    4.2.2 Continual learning
  4.3 Continual Learning of Language Models
    4.3.1 Problem definition
    4.3.2 Problem formalization
  4.4 Methods
    4.4.1 Datasets
    4.4.2 Models
    4.4.3 Embeddings
    4.4.4 Decoding
    4.4.5 Experiments
    4.4.6 Metrics
  4.5 Results
    4.5.1 Performance following the second training
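
The abstract describes inference in a continuous-output model as two steps (predict a vector, then decode it to a word) and proposes a diversity metric based on the unique types predicted. The sketch below illustrates both ideas on toy data. It is not the thesis implementation: the cosine-similarity nearest-neighbour decoder, the function names (cosine, decode, unique_types), the two-dimensional embeddings, and the reading of the metric as a plain count of distinct predicted words are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def decode(predicted, embeddings):
    """Decoding step: map a predicted vector to the word with the most similar embedding.

    Nearest-neighbour search by cosine similarity is an assumed decoder,
    chosen here only because it is simple to show.
    """
    return max(embeddings, key=lambda word: cosine(predicted, embeddings[word]))


def unique_types(predictions):
    """Diversity of a run of predictions, taken here as the number of distinct words."""
    return len(set(predictions))


# Hypothetical two-dimensional embedding table; real embeddings have far more dimensions.
embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.8, 0.6],
    "car": [0.0, 1.0],
}

# Stand-ins for the vectors a trained model would output in the prediction step.
predicted_vectors = [[0.9, 0.1], [0.7, 0.7], [0.1, 0.9], [1.0, 0.05]]

words = [decode(vector, embeddings) for vector in predicted_vectors]
print(words)                # ['cat', 'dog', 'car', 'cat']
print(unique_types(words))  # 3

# Extending the vocabulary only adds a row to the embedding table;
# the decoder itself is unchanged.
embeddings["truck"] = [-1.0, 0.0]
print(decode([-0.9, 0.1], embeddings))  # truck
```

In this toy setup the output layer never has to be resized when "truck" is added, which is the kind of property the abstract attributes to continuous-output models; how the thesis actually decodes vectors and adapts to new vocabularies is covered in its Chapters 3 and 4, not here.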