Generative Grammar, Neural Networks, and the Implementational Mapping Problem: Response to Pater

Ewan Dunbar


CNRS, Université Paris Diderot, and Université Sorbonne Paris Cité

The target article (Pater 2019) proposes to use neural networks to model learning within existing grammatical frameworks. This is easier said than done. There is a fundamental gap to be bridged that does not receive attention in the article: how can we use neural networks to examine whether it is possible to learn some linguistic representation (a tree, for example) when, after learning is finished, we cannot even tell if this is the type of representation that has been learned (all we see is a sequence of numbers)? Drawing a correspondence between an abstract linguistic representational system and an opaque parameter vector that can (or perhaps cannot) be seen as an instance of such a representation is an implementational mapping problem. Rather than relying on existing frameworks that propose partial solutions to this problem, such as harmonic grammar, I suggest that fusional research of the kind proposed needs to directly address how to 'find' linguistic representations in neural network representations.*

Keywords: neural networks, grammatical formalisms, connectionism, implementational mapping, generative grammar

* This work was supported by the French Agence Nationale de la Recherche grants GEOMPHON (ANR-17-CE28-0009) and Labex-EFL (ANR-10-LABX-0083).

1. Introduction. On the same list (Sanz 2008) that ranked Syntactic structures (Chomsky 1957) as the number one most influential twentieth-century work in cognitive science, the number two work is David Marr's Vision (Marr 1982). Marr proposed that the only way to achieve an understanding of complex information-processing systems (like those found in the brain) is to simultaneously analyze them at three levels of analysis: a computational level, where one formally defines the problem the system is solving, specifying which inputs must map to which outputs; an algorithmic-representational level, where one spells out a method for arriving at an output, given a particular input—a formal hypothesis about how the system encodes information, and how it manipulates it; and an implementational level, which details the physical system itself. Generative grammar standardly proposes to analyze language cognition at the first two levels, leaving the problem of how such a system would actually be implemented in the brain open—leaving open, thus, the nature of the link between the algorithmic-representational theory and the physical implementation. Specifying and evaluating any such link is a difficult problem in itself—call it the implementational mapping problem.

Pater's (2019) proposal for closer interdisciplinary integration between generative grammar and neural network research immediately runs up against the implementational mapping problem. Neural network models—while not brains, or even brain models—are complex systems whose behavior cannot be understood just by inspecting their internal state. By contrast, representations proposed in linguistic theories are designed so that human researchers can write and read them—hypotheses of the kind that Marr had in mind for his intermediate level, explicit enough to be programmed on a computer, yet high-level enough to be understood. They are not directly comparable with the internal states of neural networks, which are simply long sequences of numerical parameters. Pater writes: 'When neural network modeling is integrated with grammatical formalisms … , we may be able to go further in assessing the extent to which grammatical representations can be learned from experience' (p. e64, emphasis mine). The sketch Pater outlines is missing a component critical to an integration of this kind. If the network is seen as the implementation, then a fundamental part of the work of 'integration' with grammatical formalisms consists in solving the mapping problem.

In principle, feedforward networks can implement any grammar, or any formal representation, or approximate it with arbitrary precision—with the basic units of representation implemented as vectors in some numerical parameter space, and with operations manipulating these representations corresponding to information flowing through the network, changing its state (Hornik 1991, Leshno et al. 1993, Lin et al. 2017). This makes them attractive for simulating grammatical learning. But doing so first requires answering some fundamental questions:

• Given a trained network, does it make use of representations consistent with a linguistic theory T? (Can we, for example, see its representations as trees: Smolensky 1988b, Pollack 1988?) Or is such an interpretation provably untenable?
• … or has the network approximately arrived at a solution within the scope of T—for example, one for which a subset of the representations can be seen as trees?
• Can all possible parameter settings learnable by a given class of network N (constrained to have a certain architecture, with a certain type of units) be seen as licit grammars in theory T? In other words, is N constrained to learn within theory T (or come close)?
• Conversely, can all grammars in T be implemented as networks in N?

Only when we know how to answer these questions can we address the question the target article hopes to be able to answer by fusing the two research programs: given a class of networks N that has the capacity to go beyond some theory of possible grammars T, is an arbitrary network of this class guaranteed to learn a grammar in T if we give it realistic data? Or does it require special evidence, inaccessible to the child, in order to be constrained to grammars in T? This would allow us to conclude something useful: that the learner must have mechanisms to constrain it to T (N is too broad), or that the theory of learning from data is wrong, or that the theory of grammar is wrong (and should maybe allow the alternative solutions available within N). But to arrive at these answers, we must have an implementational mapping between network and formalism.

2. The implementational mapping problem. When a human brain is cut open, trees do not come fanning out, but this does not mean that syntactic theory is wrong. This kind of observation is the essence of Marr's methodological program. Even if we had access to the physical state of every cell in the brain, we would need to do work to understand whether their activity could be seen as building trees, or as something rather different. Before we can validate an abstract theory of how the system might work, we need some systematic theory of how the abstract elements and operations map to the physical implementation. Equally, if we wish to simulate learning of generative grammars with neural networks as 'hardware', particularly in the case where we do not force the networks to learn specific kinds of representations, we need some system for linking an abstract formal theory to the networks' representations.
This implies making the formal theory explicit—and noting which elements of it are there to describe how the system does its work, rather than just characterizing what it does—and then articulating how we would recognize what the network is doing.

To see the general problem, take the artificial example of addition of natural numbers. Addition is a function that applies to two natural numbers and results in some other natural number. Physical implementations of addition include the proper use of an abacus, standard pencil-and-paper column-based arithmetic with carrying, or a spreadsheet taking inputs with a keyboard and giving outputs on a computer monitor. For each of these very different systems, we can say that addition is what they do. We can only say this with certainty, however, once we characterize exactly what addition is. A computational-level description of a system implementing addition would be any complete, accurate description of the behavior of the two-place function add(·, ·)—what kinds of inputs does it take, and what is the relation between its inputs and its outputs? One set of axioms for arithmetic is due to Giuseppe Peano (given in Mendelson 1997) and asserts, for example, that the set of manipulated elements must contain a special element called 0, that there be a function called successor(x) which always yields an element distinct from x, that add(x, y) yields x when y = 0—and, in all other cases, add(x, successor(y)) = successor(add(x, y))—along with several other details. When these axioms are put together, they have as logical consequences that the inputs must be a system equivalent to natural numbers, and they also specify uniquely what the output of any addition must be.

The physical system itself will be quite far removed from any computational-level description. A partial understanding of how the system works on a purely physical level might be enough to verify whether a hypothesis about what the system is doing is correct, but Marr proposes that, to meaningfully link between a computational-level description and a physical implementation, a detailed, abstract hypothesis is necessary, which goes beyond describing what a system does to making a claim about how it does it. Marr calls this an 'algorithmic-representational' description. Taking the example of addition: how are numbers represented? Possibilities might be decimal (successor(4) = 5), binary (successor(10) = 11), unary (successor(||||) = |||||), or Roman numerals (successor(IV) = V). Representational systems can be spelled out in formal detail. A unary encoding, for example, contains a basic element that we write as |, is such that every representation consists only of some number of |'s, and (drawing the link to the computational-level description) is such that longer strings always represent larger numbers and conversely (contrast this with a decimal system, in which '41' is not longer than '14'). In concert with formal definitions of 'sequence' and of 'longer', this is enough to specify that a representation is unary. Then we need an algorithm: a set of implementation-neutral steps for manipulating representations that yield results consistent with the computational-level description of the system. The familiar column-wise addition with carrying that we learn in school can be spelled out explicitly, for example.¹ The characterization of a representational system and an algorithm must provably give results consistent with the computational-level constraints.
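To make the difference between these descriptions concrete, here is a minimal sketch in Python (my illustration, not part of the original discussion): one computational-level specification of addition in the style of the Peano axioms, and two algorithmic-representational hypotheses, concatenation over unary strings (see n. 1 below) and column-wise addition with carrying over decimal digit strings.

```python
# Three descriptions of addition (my illustrations). add_peano spells out
# the computational-level characterization; add_unary and add_columns are
# two different algorithmic-representational hypotheses.

def add_peano(x: int, y: int) -> int:
    """Computational level: add(x, 0) = x, and
    add(x, successor(y)) = successor(add(x, y))."""
    if y == 0:
        return x
    return add_peano(x, y - 1) + 1  # successor(add(x, predecessor-of-y))

def add_unary(x: str, y: str) -> str:
    """Unary representations ('|||' is three); the algorithm is simple
    concatenation (see n. 1)."""
    return x + y

def add_columns(x: str, y: str) -> str:
    """Decimal digit strings; the algorithm is school-style column-wise
    addition with carrying."""
    dx, dy = [int(d) for d in x], [int(d) for d in y]
    out, carry = [], 0
    while dx or dy or carry:
        column = (dx.pop() if dx else 0) + (dy.pop() if dy else 0) + carry
        carry, digit = divmod(column, 10)
        out.append(str(digit))
    return ''.join(reversed(out))

# The same input-output function, over different representations,
# computed by different steps:
assert add_peano(4, 3) == 7
assert add_unary('||||', '|||') == '|||||||'
assert add_columns('14', '41') == '55'
```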
Thus, given a hypothesis about what the system does and how, the implementational mapping is a theory of how I recognize what the system is doing when I observe it operating.² If you have been to school in the West, you can probably quickly tell when addition is what is being done in a sequence of scribbles on paper. This background allows you to map numbers (the basic elements that form the inputs and outputs of addition) onto marks on paper, and to recognize the steps of a method for adding numbers together using these marks. Other systems implementing addition might be completely opaque—impossible to recognize as implementing addition—even to an adult with a solid knowledge of how addition works: a person verbally describing the steps of adding two numbers in a foreign language; or an undeciphered clay tablet showing calculations (perhaps using letters of the alphabet as numbers); or a machine that takes two sounds as input and, if both are pure tones, outputs a new pure tone with a frequency equal to the sum of the two. Some hypothesis is therefore needed about how the elements of the algorithmic-representational description map to the physical reality. A system implementing unary encoding (fingers, sticks, coins) would not need to contain actual physical lines written as |, but it would need to at least be coherent with the representational constraint that there be a one-to-one mapping between the 'size' of the representation and the number represented—the relevant 'size' would not need to be the physical volume, but there would need to be some such coherently definable physical quantity—and this would not exist for all systems: it exists in the addition-of-pure-tones system (the frequency) but not in the abacus (there is no simple physical measure by which the beads representing 14 on a base-ten abacus have 'less' of it than the beads representing 41). The steps in the algorithm, too, must be mapped to changes of state in the physical system, in some way. Drawing these links explicitly is what is meant by an implementational mapping.

¹ The choice of algorithm and of representation are not independent: column-wise addition will work with decimal, binary, or (with some contortions) unary representations; it will not work with Roman numerals. And, although it can be made to work with unary representations, it is not the algorithm one typically uses. The usual method is simple concatenation: |||| + ||| = |||||||. But, much to the frustration of young children everywhere, concatenation does not work as a method for adding numbers in decimal representation: 4 + 3 does not equal 43.

² This should be kept distinct from a theory of the physical system itself. A complete understanding of the system implies having such a theory—and constructing an implementational mapping does too—but one could come to fully understand where the beads in an abacus can go without understanding why they go there.

Marr's approach thus gives us two insights that are critical to Pater's proposed project of fusion. First, 'how' is not 'what'. The assertions that are meant to be taken at the algorithmic-representational level must be cleanly distinguished from the computational-level description. The 'what' theory of addition contains 'elements' and 'operations', too, but the point of having two levels of description is that only the algorithmic-representational description needs to correspond to the physical reality. For example, the function successor(x) exists at the computational level only: there does not need to be a basic physical operation 'add one' in order for a system to be correctly described by these axioms. But a system implementing the algorithm of adding in columns would need to have a physical change of state corresponding to carrying.
The implication of this is that observing that a network has the same behavior as what is predicted by some linguistic analysis is not the same as asking whether it can learn grammatical representations.

In this context, it is worth explicitly drawing the comparison between Marr's program and the reflection about the cognitive reality of grammars that happens in generative linguistics. We usually tell students that the series of steps carried out on paper to derive a sentence, or a phonological surface form, are not supposed to be a theory of the steps carried out in the mind—no theory of the algorithm is implied by the theory of the grammar—but that the representations (trees, feature bundles, and so on) are supposed to be cognitively interpreted; the derivational steps are interpreted as a means of stating which representations are licit structural descriptions. Chomsky (1995) best articulates this interpretation: in his criteria for descriptive adequacy of grammars characterizing competence, he includes the criterion that the grammars yield psychologically correct structural descriptions (representations)—not merely the (computational) criterion that they pick out all and only the grammatical sentences of the language. He does not, however, require that a grammar give us any notion of how parsing is done on-line (an algorithm).³ Under this view, the 'operation' merge(·, ·) is an implicational relation in a system of representations (if X and Y are representations, then so is merge(X, Y)), not a theory of a real-time processing step. The theory, under this view, is not purely computational-level (this would only be concerned with defining the correct set of utterances) nor a complete theory at the algorithmic-representational level (because this would also include proposals of algorithms for generation and recognition).

³ See pp. 17–18 of Chomsky 1995.

Some researchers have, however, interpreted derivational steps as making empirically testable assertions about how structure is built up in time (Miller & Chomsky 1963, Berwick & Weinberg 1984, Phillips 1996), while other authors have suggested that the only reasonable place to situate formal grammars is at the computational level, specifying the function computed by the mind and nothing more (Matthews 1991). Under that view, two theories that predict the same set of possible words, sentences, or sound-meaning pairs would be equivalent; their rules, representations, and derivations are purely instrumental. Regardless of how we use generative grammars, the spelled-out reflection is what is important. For doing everyday linguistics, it may not be all that important to regularly draw up the list of which elements of the formal theory are meant to be interpreted as 'real' and which are not. This exercise is critical when simulating learning in neural networks to see if the networks learn 'the same thing' as we have proposed in some theory, because it serves to explicitly delimit the success criterion.

Second, of course, Marr's program asks us to spell out the details of the 'how do I recognize it' theory, making it clear that this is not trivial. The opacity of neural network representations poses a central challenge to any attempt to use them to implement grammars.

3. Implementational mapping and the project of fusion. Pater (2019) points to two existing points of convergence between neural network research and generative grammar. First, harmonic grammar, a fully developed linguistic theoretical framework based on constraint interaction, and an existing connectionist-generativist 'fusion'; second, a set of studies testing recurrent neural networks trained on corpora to see whether they yield human-like judgments. The first is an implementational mapping for grammatical computations that is problematic in that it is very limited; the second does not evaluate anything at the algorithmic-representational level.

Harmonic grammar was born out of a desire to come up with an abstract formal theory for understanding the operations of neural networks: an implementational mapping, with neural networks seen as the implementation of an abstract formal theory. The theory could then be used by linguists, with the knowledge that there exists a way of translating linguistic analyses into learned network parameters. The system rests on two formal mechanisms. The first, tensor product representations, provides a way of mapping between formal representations and numerical vectors. The second, harmony theory, provides a way of formally translating between soft constraint optimization and certain kinds of transformations that neural networks can do over tensor product representations.

Harmony theory maps a network with two layers to a grammatical computation. The first layer is a representation vector, linked to the second layer, which might be called a constraint satisfaction vector (Smolensky 1986, Legendre et al. 1990a,b).
Each unit in the second layer represents a different constraint and has a value calculated from the first layer in the usual way in multilayer perceptrons: by taking a linear combination of values and passing them through an activation function. The constraint units in turn have weights, which (by again applying the standard linear calculation) allow us to arrive at a value ('harmony') interpretable as a degree of acceptability. The constraint satisfaction vector is interpretable and manipulable by a linguist, giving a simple implementational mapping for a part of the grammatical calculation. By 'calculation' is meant the deduction of unknown structure: calculating interpretations from surface forms (via phonological and morphosyntactic structure), or calculating surface pronunciations in the other direction. However, harmony theory requires a rather uncommon type of network. The network needs to do computation by 'optimization', rather than using a series of feedforward multiplication/addition steps. Both the known and unknown representations must form part of the first layer, in which case grammatical computation can be done by holding the known elements constant and doing optimization over the unknown representation so as to maximize harmony. There is a class of networks (including Hopfield nets and restricted Boltzmann machines) with this 'fill in the blank' architecture, in which a single layer contains both observed and hidden quantities, but such architectures are very rare in practice. In reality, virtually any representation one can find in a modern neural network is the result of doing a series of linear transformations followed by various types of nonlinearities, using weights that are learned and fixed once during training. Harmony theory does not yield an implementational mapping to a constraint-based grammar in such a case.⁴

This major architectural difference means that harmonic grammar per se currently has little to offer in terms of practical fusion. Indeed, as Pater (2019) points out, in practice, HG learning is almost always modeled in the case 'when the structure of the learning data is supplied in whole—when all of the constraint violations of each learning datum are known' (p. e58)—that is to say, when only the final constraint layer is being modeled, and the weights that make the link between representation and constraint violations are abstracted away from. In this case, the model has been simplified past the point where it can even be called a neural network. It is no longer a multilayer perceptron, but simply a single-layer perceptron—a type of generalized linear model—which is what allows for the use of the wide range of off-the-shelf learning algorithms alluded to in Pater's target article. Omitting the representation from consideration means that little rests on the big, fusional questions put forward in the introduction. All that is being learned by the 'network' is a set of weights on constraints; no questions about the learnability of representations, or the constraints themselves, are being addressed by a neural network.
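To make the two-layer computation described above concrete, here is a minimal numerical sketch (my illustration: the representation vector, the constraints, the weights, and the choice of activation function are all invented, not drawn from any published analysis):

```python
# A minimal sketch of the two-layer harmony computation (all numbers
# invented for illustration).
import numpy as np

x = np.array([1.0, 0.0, 1.0])         # first layer: representation vector

# Each row of W computes one constraint unit's value: a linear
# combination of the representation, passed through an activation
# function (a rectifier here, purely as an example).
W = np.array([[ 1.0, -1.0,  0.5],
              [-0.5,  1.0,  1.0]])
constraints = np.maximum(W @ x, 0.0)  # constraint satisfaction vector

# Weights on the constraint units combine their values into a single
# harmony score, interpretable as a degree of acceptability.
w = np.array([-2.0, -1.0])            # violations lower harmony
harmony = w @ constraints             # -3.5 for this example

# The practice Pater describes amounts to skipping the first layer:
# constraint violations are given directly, and only the weights w are
# learned: a single-layer generalized linear model, not a multilayer
# network.
known_violations = np.array([1.0, 2.0])
harmony_hg = w @ known_violations     # -4.0 for this example
```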
Tensor product representations, a family of implementational mappings for representations (rather than for grammars and grammatical computation), are more generally applicable. After decomposing the representational system axiomatically into its essential elements ('roles') and the possible values these elements can take on ('fillers'), the mapping hypothesis is that fillers and roles each correspond to possible vectors in the representational space of the network, that fillers are matched to roles using a simple algebraic multiplication operation, and that the resulting elements are combined using vector addition. Complex hierarchical structures can be represented in this way, because a filler vector need not be an atomic element of the representational system (Smolensky 1986, 1988a, Smolensky et al. 2014). This implementational mapping scheme is one of the things that ensures the smooth operation of harmony theory.
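As an illustration of the scheme, here is a minimal sketch of tensor product binding and unbinding (my example; the dimensionalities and the random choice of vectors are arbitrary):

```python
# A minimal sketch of a tensor product representation (invented vectors).
# The string 'ba' is encoded by binding each filler (a segment) to its
# role (a string position) with an outer product, then superposing the
# bindings by vector addition.
import numpy as np

rng = np.random.default_rng(0)
roles = {1: rng.standard_normal(4), 2: rng.standard_normal(4)}
fillers = {'a': rng.standard_normal(3), 'b': rng.standard_normal(3)}

encoding = (np.outer(fillers['b'], roles[1]) +   # b in position 1
            np.outer(fillers['a'], roles[2]))    # a in position 2

# Unbinding: because the role vectors are linearly independent, each has
# a dual vector (computed via the pseudoinverse) that recovers exactly
# the filler bound to that role.
dual = np.linalg.pinv(np.stack([roles[1], roles[2]]))  # shape (4, 2)
recovered = encoding @ dual[:, 0]                      # filler in position 1
assert np.allclose(recovered, fillers['b'])
```

Because a filler vector can itself be the encoding of a structured object, the same binding operation can be iterated to encode hierarchical structures such as trees.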

⁴ An additional, and important, part of the harmonic grammar framework is the idea that traditional grammatical representations, which are discrete, are approximations of real linguistic knowledge, which can have states 'in-between' two discrete representations; see Smolensky & Goldrick 2016, for example. I gloss over this part of the discussion, as it is not immediately relevant to understanding the relevance of the implementational mapping problem. This is one sense in which a network can fail to contain abstract linguistic representations but can contain representations that are 'close', as alluded to in the introduction.

The grammars that come out of applying the isomorphism just discussed to two-layer neural networks are workable only if the input layer is coded in a 'localist' fashion, that is, with individual nodes or sets of nodes corresponding to individual elements of the representation. Tensor product representations, by contrast, can yield 'distributed' representations. In distributed representations, individual nodes are not identifiable with representational elements: in the worst case, all nodes may need to be examined in order to decode any part of the representation. The existence of the tensor product mapping scheme means that one can always find a localist recoding for any given representation. However, tensor product representations are completely independent of harmony theory, forming a broad family of implementational mapping theories for representations. For example, Pater's (2019) commentary discusses a case of a question-answer system in which a tensor product structure is enforced during network training (Palangi et al. 2017), but with no interpretation of the final weights as a harmonic grammar. One could easily imagine going beyond enforcing structure to probing trained networks to understand whether they contained a tensor product encoding of some grammatical representational system—in order to attempt to address the fundamental questions outlined in the introduction.⁵

The second type of fusional research cited is a set of studies investigating whether recurrent neural network language models trained on corpora have human-like judgments for agreement (Linzen, Dupoux, & Goldberg 2016). I point out here only that these studies, while fundamentally important, do not permit us to draw the kind of conclusions outlined in the introduction. Because the focus is restricted to whether the networks tested match human judgments—essentially, whether they generate the correct set of strings, plus some associated gradient degree of acceptability/set membership—these studies can inevitably tell us only about the accuracy of the models at the computational level. To make this point more clearly, consider the recent Gulordava et al. 2018 study, a follow-up to the tests of agreement in recurrent neural networks (RNNs) cited in Pater 2019, using a different trained model (and improving the tests). In contrast to the Linzen, Dupoux, & Goldberg model (which the authors also retest with their new items), the paper finds that the new RNNs pass the agreement tests, even at long distances. The authors leave open the question of what the crucial difference between the two models is that allows one to show human-like behavior while the other fails. It is a critical question in light of the fact that neither model is trained with any explicit bias for hierarchical structure.
But it is not the same as the question Pater raises in this context: 'assessing the extent to which grammatical representations can be learned from experience' (p. e64). That question relies critically on knowing something not only about what the system does but how it does it. The fact that the system shows the correct behavior does not have any straightforward implication about the nature of the representations it is using to do so. (Chomsky's (1975) informal reflections about the 'structure'-dependence of agreement merely assert that hierarchical structure serves as the basis for a better hypothesis than a purely linear rule, not that no alternative representation could support the correct mapping; in this regard, this early discussion of the learning problem posed by agreement is representative of later reassertions of it.)

To see whether the better-performing system has really learned to code something we would call syntactic hierarchy, we would need to address the implementational mapping problem for this type of network. We will never be able to use neural networks to assess whether it is possible to learn (a particular type of) grammatical representations from (a particular type of) data if we do not have a way of assessing whether it has learned those representations.

⁵ One could also imagine attempting to decode higher layers of a feedforward or recurrent network in such a way as to allow the interpretation of these layers as making up parts of a constraint satisfaction vector, as in harmonic grammar. This would still not permit the complete implementational mapping found in harmony theory, however, as this would imply outputs and hidden structure that are calculated in a fundamentally different way, by optimization, rather than the standard feedforward mechanisms.
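Returning to the agreement studies, the logic of the behavioral test they rely on can be reduced to a sketch like the following (my reduction; the stand-in 'language model' and its probabilities are invented). The point is that the test inspects outputs only:

```python
# Sketch of the logic of a behavioral agreement test (cf. Linzen, Dupoux,
# & Goldberg 2016). The 'language model' is any function from a sentence
# prefix to a probability distribution over possible next words; the toy
# model below is a stand-in with hand-set numbers.
def prefers_grammatical(lm, prefix, grammatical, ungrammatical):
    """True if the model assigns the grammatical continuation higher
    probability than the ungrammatical one. A pass tells us about the
    computational level only, not about the representations used."""
    probabilities = lm(prefix)
    return probabilities[grammatical] > probabilities[ungrammatical]

toy_lm = lambda prefix: {'is': 0.6, 'are': 0.4}   # invented stand-in
print(prefers_grammatical(toy_lm, 'The key to the cabinets', 'is', 'are'))
# -> True: correct behavior, but nothing yet about how it was computed
```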

4. How to proceed. Tensor product representations represent one framework for stating mapping hypotheses between neural network representations and formal representational theories, which allow for mappings between network weights and complex hierarchical representations. There are other approaches to 'decoding' the action of connectionist networks in terms of some interpretable 'model' of their behavior. Pinker and Prince's (1988) article, responding to the connectionist past-tense inflector of Rumelhart and McClelland (1986), did not provide any reusable methodological tools for understanding other neural network models; it did, however, provide a long exegesis of the behavior of a single network, providing proof that such networks could be 'understood' and did not need to remain black boxes. Properly within the realm of neural network research, a number of techniques have been developed for assessing the relative importance of the different input features in determining the output; relatedly, starting in the early 1990s, a number of 'rule extractor' techniques were developed that could be used to turn the parameters of neural networks into sequences of rules operating on the (possibly continuous) input space of the network (sometimes yielding only approximations to the network's behavior). However, none of these techniques are applicable to the problem of extracting sequences of exact rules from the types of network architectures in common usage today (Taylor & Darrah 2005, Özbakır et al. 2010, Augasta & Kathirvalavakumar 2012).

One of the reasons that the problem of decoding neural network information processing has seen a sharp uptick in interest in recent years is probably the impact of the 'analogy' result of Mikolov et al. 2013. The authors demonstrated that the representation of king, learned by a neural network solely on the basis of sequential relations in natural text corpora, stood in the same geometric relation to the representation of queen as man stood to woman, and, more generally, that representations in the system approximated an implicit semantic/syntactic feature structure. The methodology used to arrive at this conclusion provides a method for assessing, given a specific hypothesized set of binary featural contrasts, whether one network approximates them better than another. The method has since been applied, with some variations, to a host of other types of representations (Dunbar et al. 2015, Gladkova et al. 2016, Linzen, Dupoux, & Spector 2016, Chaabouni et al. 2017). The original method has been criticized as giving erroneous results in various cases and for making narrow and unmotivated assumptions about the form of the mapping between the network's representations and the hypothesized feature system (Levy & Goldberg 2014, Linzen 2016), but it provided both a compelling result and a reminder of a compelling idea: there are ways of interpreting neural network representations as respecting the structure of some abstract, and understandable, theory.

A different line of research is represented by the Kriegeskorte et al. 2008 method for assessing isomorphisms of two representational systems. This method is, in principle, applicable to the implementational mapping problem, but it has almost exclusively been used for comparing neural network representations against equally opaque data from brain imaging (seen as 'representations' in a broad sense).
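As an illustration of the analogy methodology, here is a minimal sketch (mine; the 'embeddings' are hand-built around an explicit feature structure, so the test succeeds by construction, where a real evaluation would apply the same test to vectors learned by a network):

```python
# A minimal sketch of the analogy evaluation of Mikolov et al. 2013. The
# four dimensions stand for the features (royal, male, female, fruit);
# these vectors are invented for illustration only.
import numpy as np

emb = {
    'king':  np.array([1.0, 1.0, 0.0, 0.0]),
    'queen': np.array([1.0, 0.0, 1.0, 0.0]),
    'man':   np.array([0.0, 1.0, 0.0, 0.0]),
    'woman': np.array([0.0, 0.0, 1.0, 0.0]),
    'apple': np.array([0.0, 0.0, 0.0, 1.0]),   # distractor
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c):
    """Nearest vocabulary item (excluding a, b, c) to emb[a] - emb[b] + emb[c]."""
    target = emb[a] - emb[b] + emb[c]
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# The mapping hypothesis: featural oppositions correspond to vector
# differences, and composition to vector addition. By construction,
# king - man + woman lands exactly on queen here.
assert analogy('king', 'man', 'woman') == 'queen'
```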

5. Conclusion. Pater (2019) insists on the immediate importance of a research program that goes back to the early days of connectionism (Fodor & Pylyshyn 1988, Smolensky 1988a, McCloskey 1991): using neural networks to advance abstract linguistic theory, not as an alternative to it. This is indeed a promising research direction whose time has come. However, before an interdisciplinary research program of the kind envisioned by Pater can be made real, a very concrete, broad, and well-thought-out set of formal methodologies is needed for assessing how well a particular neural network aligns with a linguistic representational theory. Without careful reflection on what it would mean for a grammatical hypothesis to be 'instantiated' in a neural network, and ways to understand what kind of alternative representations a network model is proposing to us, we will be lost in this endeavor.

The above discussion should not be taken to suggest that there is a general, universal solution to the implementational mapping problem. Suppose, for example, that, in follow-up research on how RNNs learn agreement, we wanted to give relative scores to two different networks, in order to assess which one was coming closer to having learned hierarchical structure, in some well-defined sense. The Mikolov et al. 2013 'analogy' approach works by assessing whether minimal representational oppositions, such as those between king and queen or man and woman (assuming that these differ minimally in some representation of semantic gender), can be captured by doing arithmetic addition and subtraction operations (does king − queen + woman equal man?). Tensor product representations make a similar assumption. Both assume a specific class of implementational mappings wherein the structure of the network's representational space is necessarily one in which addition of two representations is the network's way of composing these representations. This may be reasonable in specific types of networks, but it is not guaranteed to be appropriate: a network might 'contain' binary semantic features in some different sense, using a structure based on something other than addition.

We already know from neuroscience that there is no single, universal way of 'interpreting' real neurons: gerbils' and chickens' neural systems for sound localization in the horizontal plane in front of the body can be described in the same algorithmic-representational terms—calculating interaural time differences—but this information is coded in neural firing in a fundamentally different way across the two species (see Ashida & Carr 2011). Any attempt to 'find' interaural time differences in gerbils under the expectation that they should be encoded in the same way as in chickens would lead to a negative result (they are coded using neural firing rate in mammals, but not in birds). This would lead to the false conclusion that the two species' brains perform fundamentally different operations, while, in fact, they perform the same operations in different ways. The problem is not insurmountable, but simply needs to be taken into account: any assessment of the congruence between an opaque (neural network) representation and a high-level formal theory is necessarily and inextricably bundled with the assessment of the validity of the mapping hypothesis.
The early work on linking neural network representations with more understandable views of them as abstract rules and representations was extremely important, and it has advanced far too slowly in the intervening years. Furthermore, it still is not always clearly distinguished from the problem of testing whether the networks get the right answer. This lack of centrality shows in the target article, which proposes extensive fusion but does not isolate implementational mapping as a critical component, or even as a research problem. Given the complexity of the problem, I would suggest that implementational mapping represents, in fact, the bulk of the work of fusion between formal linguistic theory and neural network research.

REFERENCES

Ashida, Go, and Catherine E. Carr. 2011. Sound localization: Jeffress and beyond. Current Opinion in Neurobiology 21.745–51. DOI: 10.1016/j.conb.2011.05.008.
Augasta, M. Gethsiyal, and Thangairulappan Kathirvalavakumar. 2012. Reverse engineering the neural networks for rule extraction in classification problems. Neural Processing Letters 35.131–50. DOI: 10.1007/s11063-011-9207-8.
Berwick, Robert, and Amy Weinberg. 1984. The grammatical basis of linguistic performance: Language use and acquisition. Cambridge, MA: MIT Press.
Chaabouni, Rahma; Ewan Dunbar; Neil Zeghidour; and Emmanuel Dupoux. 2017. Learning weakly supervised multimodal phoneme embeddings. Proceedings of INTERSPEECH 2017, 2218–22. DOI: 10.21437/Interspeech.2017-1689.
Chomsky, Noam. 1957. Syntactic structures. The Hague: Mouton.
Chomsky, Noam. 1975. Reflections on language. New York: Pantheon Books.
Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press.
Dunbar, Ewan; Gabriel Synnaeve; and Emmanuel Dupoux. 2015. Quantitative methods for comparing featural representations. Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS), Glasgow. Online: https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS1024.pdf.
Fodor, Jerry A., and Zenon W. Pylyshyn. 1988. Connectionism and cognitive architecture: A critical analysis. Cognition 28.3–71. DOI: 10.1016/0010-0277(88)90031-5.
Gladkova, Anna; Aleksandr Drozd; and Satoshi Matsuoka. 2016. Analogy-based detection of morphological and semantic relations with word embeddings: What works and what doesn't. Proceedings of the NAACL Student Research Workshop, 8–15. DOI: 10.18653/v1/N16-2002.
Gulordava, Kristina; Piotr Bojanowski; Edouard Grave; Tal Linzen; and Marco Baroni. 2018. Colorless green recurrent networks dream hierarchically. Proceedings of NAACL-HLT 2018, 1195–1205. DOI: 10.18653/v1/N18-1108.
Hornik, Kurt. 1991. Approximation capabilities of multilayer feedforward networks. Neural Networks 4.251–57. DOI: 10.1016/0893-6080(91)90009-T.
Kriegeskorte, Nikolaus; Marieke Mur; and Peter Bandettini. 2008. Representational similarity analysis—Connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience 2:4. DOI: 10.3389/neuro.06.004.2008.
Legendre, Géraldine; Yoshiro Miyata; and Paul Smolensky. 1990a. Can connectionism contribute to syntax? Harmonic grammar, with an application. CU-CS-485-90. Computer Science Technical Reports 467. Online: https://scholar.colorado.edu/csci_techreports/467.
Legendre, Géraldine; Yoshiro Miyata; and Paul Smolensky. 1990b. Harmonic grammar—a formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations. CU-CS-465-90. Computer Science Technical Reports 447. Online: https://scholar.colorado.edu/csci_techreports/447.
Leshno, Moshe; Vladimir Ya. Lin; Allan Pinkus; and Shimon Schocken. 1993. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks 6.861–67. DOI: 10.1016/S0893-6080(05)80131-5.
Levy, Omer, and Yoav Goldberg. 2014. Linguistic regularities in sparse and explicit word representations. Proceedings of the 18th Conference on Computational Natural Language Learning, 171–80. Online: http://www.aclweb.org/anthology/W14-1618.
Lin, Henry W.; Max Tegmark; and David Rolnick. 2017. Why does deep and cheap learning work so well? Journal of Statistical Physics 168.1223–47. DOI: 10.1007/s10955-017-1836-5.
Linzen, Tal. 2016. Issues in evaluating semantic spaces using word analogies. Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP, 13–18. Online: http://www.aclweb.org/anthology/W16-2503.
Linzen, Tal; Emmanuel Dupoux; and Yoav Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics 4.521–35. Online: http://aclweb.org/anthology/Q16-1037.
Linzen, Tal; Emmanuel Dupoux; and Benjamin Spector. 2016. Quantificational features in distributional word representations. Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics (*SEM 2016), 1–11. Online: http://www.aclweb.org/anthology/S16-2001.

Marr, David. 1982. Vision: A computational investigation into the human representation and processing of visual information. Cambridge, MA: MIT Press.
Matthews, Robert J. 1991. Psychological reality of grammars. The Chomskyan turn, ed. by Asa Kasher, 182–200. Oxford: Blackwell.
McCloskey, Michael. 1991. Networks and theories: The place of connectionism in cognitive science. Psychological Science 2.387–95. DOI: 10.1111/j.1467-9280.1991.tb00173.x.
Mendelson, Elliott. 1997. Introduction to mathematical logic. 4th edn. New York: Chapman and Hall.
Mikolov, Tomas; Kai Chen; Greg Corrado; and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781 [cs.CL]. Online: https://arxiv.org/abs/1301.3781.
Miller, George, and Noam Chomsky. 1963. Finitary models of language users. Handbook of mathematical psychology, vol. 2, ed. by Robert Duncan Luce, Robert R. Bush, and Eugene Galanter, 419–92. New York: Wiley.
Özbakır, Lale; Adil Baykasoğlu; and Sinem Kulluk. 2010. A soft computing-based approach for integrated training and rule extraction from artificial neural networks: DIFACONN-miner. Applied Soft Computing 10.304–17. DOI: 10.1016/j.asoc.2009.08.008.
Palangi, Hamid; Paul Smolensky; Xiaodong He; and Li Deng. 2017. Question-answering with grammatically-interpretable representations. arXiv:1705.08432 [cs.CL]. Online: https://arxiv.org/abs/1705.08432.
Pater, Joe. 2019. Generative linguistics and neural networks at 60: Foundation, friction, and fusion. Language 95(1).e41–e74.
Phillips, Colin. 1996. Order and structure. Cambridge, MA: MIT dissertation.
Pinker, Steven, and Alan Prince. 1988. On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition 28.73–193. DOI: 10.1016/0010-0277(88)90032-7.
Pollack, Jordan B. 1988. Recursive auto-associative memory: Devising compositional distributed representations. Proceedings of the 10th annual meeting of the Cognitive Science Society (CogSci 1988), 33–39. Online: http://mindmodeling.org/cogscihistorical/cogsci_10.pdf.
Rumelhart, David E., and James L. McClelland. 1986. On learning the past tenses of English verbs. Parallel distributed processing: Explorations in the microstructure of cognition, vol. 2, ed. by David E. Rumelhart, James L. McClelland, and the PDP Research Group, 216–71. Cambridge, MA: MIT Press.
Sanz, Ricardo. 2008. Top 100 most influential works in cognitive science. UPM Autonomous Systems Laboratory. Online: http://tierra.aslab.upm.es/public/index.php?option=com_content&task=view&id=141.
Smolensky, Paul. 1986. Information processing in dynamical systems: Foundations of harmony theory. Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1, ed. by David E. Rumelhart, James L. McClelland, and the PDP Research Group, 194–281. Cambridge, MA: MIT Press.
Smolensky, Paul. 1988a. The constituent structure of connectionist mental states: A reply to Fodor and Pylyshyn. The Southern Journal of Philosophy 26.137–61. DOI: 10.1111/j.2041-6962.1988.tb00470.x.
Smolensky, Paul. 1988b. On the proper treatment of connectionism. Behavioral and Brain Sciences 11.1–23. DOI: 10.1017/S0140525X00052432.
Smolensky, Paul, and Matthew Goldrick. 2016. Gradient symbolic representations in grammar: The case of French liaison. Baltimore: Johns Hopkins University, and Evanston, IL: Northwestern University, ms. Online: http://roa.rutgers.edu/article/view/1552.
Smolensky, Paul; Matthew Goldrick; and Donald Mathis. 2014. Optimization and quantization in gradient symbol systems: A framework for integrating the continuous and the discrete in cognition. Cognitive Science 38.1102–38. DOI: 10.1111/cogs.12047.
Taylor, Brian J., and Marjorie A. Darrah. 2005. Rule extraction as a formal method for the verification and validation of neural networks. Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, 2915–20. DOI: 10.1109/IJCNN.2005.1556388.

Laboratoire de Linguistique Formelle
CNRS, Université Paris Diderot, and Université Sorbonne Paris Cité
[[email protected]]

[Received 15 August 2018;
revision invited 26 August 2018;
revision received 29 September 2018;
revision invited 30 September 2018;
revision received 1 October 2018;
accepted 1 October 2018]