Cognitive Science in the Era of Artificial Intelligence: A Roadmap for Reverse-Engineering the Infant Language-Learner

Emmanuel Dupoux
EHESS, ENS, PSL Research University, CNRS, INRIA, Paris, France
[email protected], www.syntheticlearner.net


Author's reprint of: Dupoux, E. (2018). Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language learner. Cognition, 173, 43-59. doi:10.1016/j.cognition.2017.11.008. HAL Id: hal-01888694, https://hal.archives-ouvertes.fr/hal-01888694 (submitted 7 Dec 2018).

Abstract

Spectacular progress in the information processing sciences (machine learning, wearable sensors) promises to revolutionize the study of cognitive development. Here, we analyse the conditions under which "reverse engineering" language development, i.e., building an effective system that mimics infants' achievements, can contribute to our scientific understanding of early language development. We argue that, on the computational side, it is important to move from toy problems to the full complexity of the learning situation, and to take as input reconstructions of the sensory signals available to infants that are as faithful as possible. On the data side, accessible but privacy-preserving repositories of home data have to be set up. On the psycholinguistic side, specific tests have to be constructed to benchmark humans and machines at different linguistic levels. We discuss the feasibility of this approach and present an overview of current results.

Keywords

Artificial intelligence, speech, psycholinguistics, computational modeling, corpus analysis, early language acquisition, infant development, language bootstrapping, machine learning.

Highlights

• Key theoretical puzzles of infant language development are still unsolved.
• A roadmap for reverse engineering infant language learning using AI is proposed.
• AI algorithms should use realistic input (little or no supervision, raw data).
• Realistic input should be obtained by large-scale recording in ecological environments.
• Machine/human comparisons should be run on a benchmark of psycholinguistic tests.

1 Introduction

In recent years, artificial intelligence (AI) has been hitting the headlines with impressive achievements at matching or even beating humans in complex cognitive tasks (playing go or video games: Mnih et al., 2015; Silver et al., 2016; processing speech and natural language: Amodei et al., 2016; Ferrucci, 2012; recognizing objects and faces: He, Zhang, Ren, & Sun, 2015; Lu & Tang, 2014), and promising a revolution in manufacturing processes and human society at large. These successes show that with statistical learning techniques, powerful computers and large amounts of data, it is possible to mimic important components of human cognition. Strikingly, some of these achievements have been reached by throwing out some of the classical theories in linguistics and psychology, and by training relatively unstructured neural network systems on large amounts of data. What does this tell us about the underlying psychological and/or neural processes that humans use to solve these tasks? Can AI provide us with scientific insights about human learning and processing?

Here, we argue that developmental psychology, and in particular the study of language acquisition, is one area where AI and machine learning advances can indeed be transformational, provided that the fields involved make significant adjustments in their practices in order to adopt what we call the reverse engineering approach. Specifically:

    The reverse engineering approach to the study of infant language acquisition consists in constructing scalable computational systems that can, when fed with realistic input data, mimic language acquisition as it is observed in infants.

The three italicised terms (scalable, realistic, mimic) will be discussed at length in subsequent sections of the paper. For now, an intuitive understanding of these terms will suffice.
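The last highlight above calls for benchmarking humans and machines on the same psycholinguistic tests. As a purely illustrative sketch of what such a comparison could look like (our code, not the paper's; the function names, stand-in stimuli, and choice of cosine distance are invented), the snippet below implements a machine version of an ABX discrimination test: the model is scored correct on a triplet (A, B, X), with X a token of the same category as A, when its representation of X is closer to A than to B.

```python
import itertools

import numpy as np


def cosine(u, v):
    """Cosine distance between two vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


def abx_error(embed, category_a, category_b, distance=cosine):
    """Machine ABX discrimination error.

    For every triplet (a, b, x) with a, x distinct tokens of category A
    and b a token of category B, the model is correct when the
    representation of x is closer to a than to b. Chance level is 0.5.
    """
    errors, total = 0, 0
    for a, x in itertools.permutations(category_a, 2):
        for b in category_b:
            d_ax = distance(embed(a), embed(x))
            d_bx = distance(embed(b), embed(x))
            errors += d_ax >= d_bx  # ties count as errors
            total += 1
    return errors / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in stimuli: noisy points around two centroids. A real benchmark
    # would use recorded tokens of, e.g., "light" vs "right", and the model's
    # error on the contrast would be set against infants' discrimination of
    # the same stimuli.
    light_tokens = list(rng.normal(3.0, 1.0, size=(10, 16)))
    right_tokens = list(rng.normal(-3.0, 1.0, size=(10, 16)))
    embed = lambda stimulus: stimulus  # a trained model would go here
    print(f"ABX error: {abx_error(embed, light_tokens, right_tokens):.3f}")
```

Any representation learner that maps a raw stimulus to a vector can be dropped in for `embed`, which is what makes test batteries of this kind usable as a common yardstick for models and infants.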
The idea of using machine learning or AI techniques as a means to study the child's language learning is actually not new (to name a few: Kelley, 1967; Anderson, 1975; Berwick, 1985; Rumelhart & McClelland, 1987; Langley & Carbonell, 1987), although relatively few studies have concentrated on the early phases of language learning (see Brent, 1996b, for a pioneering collection of essays). What is new, however, is that whereas previous AI approaches were limited to proofs of principle on toy or miniature languages, modern AI techniques have scaled up so much that end-to-end language processing systems working with real inputs are now deployed commercially. This paper examines whether and how such an unprecedented change in scale could be put to use to address lingering scientific questions in the field of language development.

The structure of the paper is as follows: In Section 2, we present two deep scientific puzzles that large-scale modeling approaches could in principle address: solving the bootstrapping problem and accounting for developmental trajectories. In Section 3, we review past theoretical and modeling work, showing that these puzzles have not, so far, received an adequate answer. In Section 4, we argue that to answer them with reverse engineering, three requirements have to be addressed: (1) modeling should be computationally scalable, (2) it should be done on realistic data, and (3) model performance should be compared with that of humans. In Section 5, recent progress in AI is reviewed in light of these three requirements. In Section 6, we assess the feasibility of the reverse engineering approach and lay out the road map that has to be followed to reach its objectives, and we conclude in Section 7.

2 Two deep puzzles of early language development

Language development is a theoretically important subfield within the study of human cognitive development for the following three reasons:

First, the linguistic system is uniquely complex: mastering a language implies mastering a combinatorial sound system (phonetics and phonology), an open-ended morphologically structured lexicon, and a compositional syntax and semantics (e.g., Jackendoff, 1997). No other animal communication system uses such a complex multilayered organization (Hauser, Chomsky, & Fitch, 2002). On this basis, it has been claimed that humans have evolved (or acquired through a mutation) an innately specified computational architecture [...]

Third, the human language capacity can be viewed as a finite computational system with the ability to generate a (virtual) infinity of utterances. This turns into a learnability problem for infants: on the basis of finite evidence, they have to induce the (virtual) infinity corresponding to their language. As has been discussed since Aristotle, such induction problems do not have a generally valid solution. Therefore, language is simultaneously a human-specific biological trait, a highly variable cultural production, and an apparently intractable learning problem.
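To see why such induction problems have no generally valid solution, consider a standard textbook illustration (our example, not the paper's): any finite sample of utterances is consistent with infinitely many candidate languages, so the data alone cannot decide among them.

```latex
% Illustration: finite evidence underdetermines the target language.
% Suppose the learner has observed the finite sample
%   D = {ab, aabb, aaabbb}.
% Each of the following hypotheses is consistent with D:
\begin{align*}
L_1 &= \{\, a^n b^n \mid 1 \le n \le 3 \,\} && \text{(memorize the sample)} \\
L_2 &= \{\, a^n b^n \mid n \ge 1 \,\}       && \text{(the ``intended'' infinite language)} \\
L_3 &= \{\, a^m b^n \mid m, n \ge 1 \,\}    && \text{(an even larger language)}
\end{align*}
% Since D \subset L_1 \subset L_2 \subset L_3, positive evidence alone can
% never rule out the larger hypotheses; the learner needs an inductive bias.
\end{document-fragment}
```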
Despite these complexities, most infants spontaneously learn their native language(s) in a matter of a few years of immersion in a linguistic environment. The more we know about this simple fact, the more puzzling it appears. Specifically, we outline two deep scientific puzzles that a reverse engineering approach could, in principle, help to solve: solving the bootstrapping problem and accounting for developmental trajectories. The first puzzle relates to the ultimate outcome of language learning: the so-called stable state, defined here as the stabilized language competence in the adult. The second puzzle relates to what we know of the intermediate steps in the acquisition process, and their variations as a function of language input.

2.1 Solving the bootstrapping problem

The stable state is the operational knowledge which enables adults to process a virtual infinity of utterances in their native language. The most articulated description of this stable state has been offered by theoretical linguistics; there, it is viewed as a grammar comprising several components: phonetics, phonology, morphology, syntax, semantics, pragmatics.

The bootstrapping problem arises from the fact that these different components appear interdependent from a learning point of view. For instance, the phoneme inventory of a language is defined through pairs of words that differ minimally in sounds (e.g., "light" vs "right"). This would suggest that, to learn phonemes, infants need to first learn words. However, from a processing viewpoint, words are recognized through their phonological [...]
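The circularity can be made concrete with a toy sketch (ours, not the paper's, and it assumes phonemically transcribed input, which is precisely what the infant does not have): extracting phonemic contrasts from minimal pairs presupposes a segmented lexicon, while segmenting words out of the input presupposes the phoneme inventory.

```python
from itertools import combinations


def minimal_pairs(lexicon):
    """Find phonemic contrasts via minimal pairs: two words of equal
    length that differ in exactly one segment ('light'/'right' -> l/r).
    Note the precondition: a lexicon of already-segmented words."""
    contrasts = set()
    for w1, w2 in combinations(lexicon, 2):
        if len(w1) == len(w2):
            diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
            if len(diffs) == 1:
                contrasts.add(frozenset(diffs[0]))
    return contrasts


def segment_into_words(phone_stream, lexicon):
    """Greedy word recognition over a stream of phones.
    Note the precondition: the stream is already phonemic, and the
    lexicon is known -- which is what minimal_pairs() was meant to yield."""
    words, i = [], 0
    while i < len(phone_stream):
        for w in sorted(lexicon, key=len, reverse=True):
            if phone_stream.startswith(w, i):
                words.append(w)
                i += len(w)
                break
        else:
            i += 1  # no known word here: skip a phone
    return words


toy_lexicon = {"light", "right", "pad"}             # hypothetical toy lexicon
print(minimal_pairs(toy_lexicon))                   # {frozenset({'l', 'r'})}
print(segment_into_words("lightpad", toy_lexicon))  # ['light', 'pad']
```

Each function's precondition is the other's output: run in either order, one of them starts with unearned knowledge, which is one way of stating the bootstrapping problem.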