Fluid Construction Grammar for Historical and Evolutionary Linguistics Pieter Wellens1, Remi Van Trijp2, Katrien Beuls1, Luc Steels2,3
1 VUB AI Lab, Pleinlaan 2, 1050 Brussels (Belgium), pieter|katrien@ai.vub.ac.be
2 Sony Computer Science Laboratory Paris, 6 Rue Amyot, 75005 Paris (France)
3 ICREA Institute for Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona (Spain)

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 127–132, Sofia, Bulgaria, August 4-9 2013.

Abstract

Fluid Construction Grammar (FCG) is an open-source computational grammar formalism that is becoming increasingly popular for studying the history and evolution of language. This demonstration shows how FCG can be used to operationalise the cultural processes and cognitive mechanisms that underlie language evolution and change.

1 Introduction

Historical linguistics has been radically transformed over the past two decades by the advent of corpus-based approaches. Ever-increasing datasets, both in size and richness of annotation, are becoming available (Yuri et al., 2012; Davies, 2011), and linguists now have more powerful tools at their disposal for uncovering which changes have taken place. In this demonstration, we present Fluid Construction Grammar (FCG; Steels, 2011), an open-source grammar formalism that makes it possible to also address the question of how these changes happened, by uncovering the cognitive mechanisms and cultural processes that drive language evolution.

FCG combines the expressive power of feature structures and unification with the adaptivity and robustness of machine learners. In sum, FCG aims to be an open instrument for developing robust and open-ended models of language processing that can be used for both parsing and production. FCG can be downloaded at http://www.fcg-net.org.

2 Design Philosophy

Fluid Construction Grammar is rooted in a cognitive-functional approach to language, which is quite different from a generative grammar such as HPSG (Pollard and Sag, 1994). A generative grammar is a model of language competence that licenses well-formed structures and rejects ill-formed utterances. Such grammars often decide on the well- or ill-formedness of utterances by using a strong type system that defines a set of features and possible values for those features. The burden of efficient and robust language processing with a generative grammar largely rests on the shoulders of the language processor.

A cognitive-functional grammar, on the other hand, functions more like a transducer between meaning and form. In parsing, such a grammar tries to uncover as much meaning as possible from a given utterance rather than deciding on its grammaticality. In the other direction, the grammar tries to produce intelligible utterances, which are well-formed as a side-effect if the grammar adequately captures the conventions of a particular language. A cognitive-functional grammar is best implemented without a strong type system, because the set of possible features and their values is assumed to be open-ended. Efficient and robust language processing thus becomes a joint responsibility of the grammar and the linguistic processor.

3 Reversible Language Processing

As a construction grammar, FCG represents all linguistic knowledge as pairings of function and form (called constructions). This means that any linguistic item, be it a concrete lexical item (see Figure 1) or a schematic construction, shares the same fundamental representation in FCG.

Each construction consists of two poles (a semantic/functional one and a syntactic/form one), each represented as a feature structure. Because the semantic and syntactic poles are kept separate, the same processing engine can efficiently use the same construction for both parsing and production, simply by changing the direction of application.
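To make this reversibility concrete, here is a minimal Python sketch, assuming a drastically simplified representation in which each pole is a set of ground (feature, value) pairs. The names `Construction`, `match`, `merge` and `apply_construction`, and the toy "Kim" values, are hypothetical and are not FCG's actual API. The helper names echo the two kinds of unification FCG uses (match and merge), but here match is only a subset test and merge only a set union, standing in for FCG's untyped unification over feature structures with variables.

```python
from dataclasses import dataclass


# Hypothetical, heavily simplified construction: each pole is a frozenset of
# ground (feature, value) pairs. Real FCG poles are feature structures with
# logic variables and categories (sem-cat, syn-cat); those are omitted here.
@dataclass(frozen=True)
class Construction:
    name: str
    sem_pole: frozenset  # semantic/functional pole
    syn_pole: frozenset  # syntactic/form pole


def match(conditions: frozenset, structure: frozenset) -> bool:
    """Applicability check: every condition must already hold in the structure."""
    return conditions <= structure


def merge(contribution: frozenset, structure: frozenset) -> frozenset:
    """Add the information of the other pole to the structure."""
    return structure | contribution


def apply_construction(cxn: Construction, structure: frozenset, direction: str):
    """Apply one construction in either direction with the same engine."""
    if direction == "production":  # meaning -> form
        conditions, contribution = cxn.sem_pole, cxn.syn_pole
    elif direction == "parsing":  # form -> meaning: roles of the poles reversed
        conditions, contribution = cxn.syn_pole, cxn.sem_pole
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    if not match(conditions, structure):
        return None  # construction is not applicable
    return merge(contribution, structure)


# Toy stand-in for the "Kim" lexical construction (illustrative values only).
kim_lex = Construction(
    name="kim-lex",
    sem_pole=frozenset({("meaning", "bind person kim")}),
    syn_pole=frozenset({("form", 'string "Kim"')}),
)

produced = apply_construction(
    kim_lex, frozenset({("meaning", "bind person kim")}), "production")
parsed = apply_construction(
    kim_lex, frozenset({("form", 'string "Kim"')}), "parsing")
print(sorted(produced))
print(sorted(parsed))
```

Only the `direction` argument changes between the two calls: the construction itself is stored once, which is the design point the section makes about a single bidirectional representation.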
© 2013 Association for Computational Linguistics

Figure 1: Lexical construction for the proper noun "Kim" as shown in the FCG web interface. All constructions are mappings between semantic (left) and syntactic feature structures (right).

Figure 2: Schematic overview of the experimental methodology for historical and evolutionary linguists. The example here shows only two linguistic stages but there could be more. (Diagram labels: Linguistic system 1 and Linguistic system 2, each with 1. Reconstruction, 2. Individual Learning, 3. Population Alignment; 4. Grammaticalization.)

FCG processing uses two different kinds of unification, called match and merge. The match phase is a conditional phase that checks whether a construction is applicable. The merge operation most closely resembles classical (yet untyped) unification. In production (i.e. going from meaning to form), the processor treats a construction's semantic pole as a set of conditions that need to be satisfied, and the syntactic pole as additional information that the construction can contribute. In parsing (i.e. going from form to meaning), the roles of the poles are reversed.

Since FCG pays a lot of attention to the interaction between linguistic knowledge and processing, it makes it possible to investigate the consequences of particular aspects of grammar with regard to representation, production, parsing, learning and propagation (in a population of language users). For example, a small case system may be easier to represent and produce than a large system, but it might also lead to increased ambiguity in parsing and learning that the larger system would avoid. Fluid Construction Grammar can bring these differences to the surface for further computational analysis.

It is exactly this ability to monitor the impact of grammatical choices that has sparked the interest of an increasingly wide audience of historical and evolutionary linguists. With FCG, different historical stages can be implemented (which addresses questions about representation and processing), but FCG also comes bundled with a reflective learning framework (Beuls et al., 2012) for learning the key constructions of each stage. That same architecture has proven powerful enough to implement processes of grammaticalization, so that actual linguistic change over time can be modeled (van Trijp, 2010; Beuls and Steels, 2013; Wellens and Loetzsch, 2012).

4 How to set up an evolutionary linguistics experiment in FCG?

As the FCG processor can both produce and parse utterances, it is possible to instantiate not one but a set or population of FCG processors (or FCG agents) that can communicatively interact with each other. Experiments in historical or evolutionary linguistics make use of this multi-agent approach, in which all agents engage in situated pairwise interactions (language games) (Steels, 2012b).

In this systems demo we will focus on a recent experiment in the emergence of grammatical agreement (Beuls and Steels, 2013). The language game involves two agents: one agent (the speaker) has to describe one or more (at most three) objects in a scene to the other agent (the hearer). Each object can be described by one or more words. It follows that without any grammatical marking it would be difficult (often impossible) for the hearer to figure out which words describe the same object, and thus to arrive at a successful interpretation. The hypothesis is that the introduction of agreement markers helps solve this ambiguity.

Besides setting up a language game script, the methodology consists of operationalizing the linguistic strategies required for a population to bootstrap and maintain a particular linguistic system (in this case nominal agreement). Examples of linguistic systems already investigated include German case (van Trijp, 2012a; van Trijp, 2013), the grammatical expression of space (Spranger and Steels, 2012), the emergence of quantifiers (Pauw and Hilferty, 2012) and the expression of aspect in Russian (Gerasymova et al., 2012) [for an overview see (Steels, 2011; Steels, 2012a)].

Figure 3: Reflective meta-layer architecture operating as part of an FCG agent/processor. (Diagram labels: routine processing; meta-layer processing; diagnostic; problem; repair.)

Figure 4: Meaningful marker strategy.

An experiment generally investigates multiple linguistic systems of increasing complexity, where each system can, but need not, map to a stage along an attested grammaticalization pathway. Most often a stage is introduced in order to gradually increase the complexity of the emergent dynamics. In this demo we posit four systems/strategies: (1) a baseline purely lexical strategy, (2) a strategy to bootstrap and align formal (meaningless) agreement markers, (3) a strategy to bootstrap and align meaningful agreement mark-

… so that one agent can learn the constructions based on the input of another agent. These learning operations are generally divided into diagnostics and repair strategies (see Figure 3). Diagnostics continually monitor FCG processing for errors or inefficiencies and generate problems if they are found. Repair strategies then act on these problems by altering the linguistic inventory (e.g. adding, removing or changing constructions).

Population Alignment: There is a large gap between the cognitive machinery needed for learning an existing linguistic system (step 2) and that needed for bootstrapping, aligning and maintaining a complete linguistic system from scratch. In this step, individual learning operators are extended with alignment strategies.