
Distributional Semantics Meets Construction Grammar. Towards a Unified Usage-Based Model of Grammar and Meaning

Giulia Rambelli, University of Pisa, [email protected]
Emmanuele Chersoni, The Hong Kong Polytechnic University, [email protected]
Philippe Blache, Aix-Marseille University, [email protected]
Chu-Ren Huang, The Hong Kong Polytechnic University, [email protected]
Alessandro Lenci, University of Pisa, [email protected]

Abstract

In this paper, we propose a new type of semantic representation of Construction Grammar that combines constructions with the vector representations used in Distributional Semantics. We introduce a new framework, Distributional Construction Grammar, where grammar and meaning are systematically modeled from language use, and finally, we discuss the kind of contributions that distributional models can provide to CxG representation from a linguistic and cognitive perspective.

1 Introduction

In the last decades, usage-based models of language have attracted growing attention (Tomasello, 2003; Bybee, 2010). The different approaches covered by this label are based on the assumptions that linguistic knowledge is embodied in mental processing and representations that are sensitive to context and statistical probabilities (Boyland, 2009), and that language structures at all levels, from morphology to syntax, emerge out of facts of actual language usage (Bybee, 2010).

A usage-based framework that turned out to be extremely influential is Construction Grammar (CxG) (Hoffmann and Trousdale, 2013), a family of theories sharing the fundamental idea that language is a collection of form-meaning pairings called constructions (henceforth Cxs) (Fillmore, 1988; Goldberg, 2006). Cxs differ in their degree of schematicity, ranging from morphemes (e.g., pre-, -ing), to complex words (e.g., daredevil), to filled or partially-filled idioms (e.g., give the devil his due or jog (someone's) memory), to more abstract patterns like the ditransitive Cx [Subj V Obj1 Obj2]. It is worth stressing that, even if the concept of construction is based on the idea that linguistic properties actually emerge from language use, CxG theories have typically preferred to model the semantic content of constructions in terms of hand-made, formal representations like those of Frame Semantics (Baker et al., 1998). This leaves open the issue of how semantic representations can be learned from empirical evidence, and how they relate to the usage-based nature of Cxs. In fact, for a usage-based model of grammar based on a strong syntax-semantics parallelism, it would be desirable to be grounded in a framework that allows the semantic content of Cxs to be learned from language use.

In this perspective, a promising solution for representing constructional semantics is given by an approach to meaning representation that has gained rising interest in both computational linguistics and cognitive science, namely Distributional Semantics (henceforth DS). DS is a usage-based model of meaning, based on the well-established assumption that the statistical distribution of linguistic items in context plays a key role in characterizing their semantic behaviour (the Distributional Hypothesis (Harris, 1954)). More precisely, Distributional Semantic Models (DSMs) represent the lexicon in terms of vector spaces, where a lexical target is described by a vector (also known as embedding) built by identifying its syntactic and lexical contexts in a corpus (Lenci, 2018). Lately, neural models that learn distributional vectors have gained massive popularity: these algorithms build low-dimensional vector representations by learning to optimally predict the contexts of the target words (Mikolov et al., 2013). On the negative side, DS lacks a clear connection with usage-based theoretical frameworks.
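The Distributional Hypothesis described above can be made concrete with a toy illustration (ours, not part of the original paper): count-based vectors are built from co-occurrence counts in a miniature corpus, and semantic similarity is approximated by the cosine between vectors. The corpus and all resulting counts are invented for illustration only.

```python
# Toy count-based DSM: a word is represented by the counts of the words it
# co-occurs with in the same sentence; similar words share contexts.
import math
from collections import Counter
from itertools import combinations

corpus = [
    "the student reads a book",
    "the professor reads a paper",
    "the student writes a paper",
    "the dog chases a cat",
]

window_counts = {}  # word -> Counter of co-occurring words
for sentence in corpus:
    tokens = sentence.split()
    for w1, w2 in combinations(tokens, 2):
        window_counts.setdefault(w1, Counter())[w2] += 1
        window_counts.setdefault(w2, Counter())[w1] += 1

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# 'student' and 'professor' share contexts (reads, paper, ...), so they end
# up closer to each other than 'student' is to 'dog'.
sim_sp = cosine(window_counts["student"], window_counts["professor"])
sim_sd = cosine(window_counts["student"], window_counts["dog"])
```

Real DSMs differ in scale (millions of tokens, weighting schemes, dimensionality reduction, or neural prediction objectives), but the underlying intuition is the one above.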

Proceedings of the First International Workshop on Designing Meaning Representations, pages 110–120, Florence, Italy, August 1st, 2019. © 2019 Association for Computational Linguistics

To the best of our knowledge, existing attempts at linking DS with models of grammar have rather targeted formal theories like Montague Grammar and Categorial Grammar (Baroni et al., 2014; Grefenstette and Sadrzadeh, 2015).

To sum up, both CxG and DS share the assumption that linguistic structures naturally emerge from language usage, and that a representation of both the form and the meaning of any linguistic item can be modeled through its distributional statistics and, more generally, with the quantitative information derived from corpus data. However, these two models still live in parallel worlds. On the one hand, CxG is a model of grammar in search of a consistent usage-based model of meaning; conversely, DS is a computational framework for building semantic representations in search of an empirically adequate theory of grammar.

As we illustrate in Section 2, occasional encounters between DS and CxG have already happened, but we believe that new fruitful advances could come from the exploitation of the mutual synergies between CxG and DS, and from letting these two worlds finally meet and interact in a more systematic way. Following this direction of research, we introduce a new representation framework called Distributional Construction Grammar, which aims at bringing together these two theoretical paradigms. Our goal is to integrate distributional information into constructions by completing their semantic structures with distributional vectors extracted from large textual corpora, as samples of language usage.

These pages are structured as follows: after reviewing existing literature on CxG and related computational studies, in Section 3 we outline the key characteristics of our theoretical proposal, while Section 4 provides a general discussion about what contributions DSMs can provide to CxG representation from a linguistic and cognitive perspective. Although this is essentially a theoretical contribution, we outline ongoing work focusing on its computational implementation and empirical validation. We conclude by reporting future perspectives of research.

2 Related Work

Despite the popularity of the constructional approach in corpus linguistics (Gries and Stefanowitsch, 2004), computational semantics research has never formulated a systematic proposal for deriving representations of constructional meaning from corpus data. Previous literature has mostly focused either on the automatic identification of constructions on the basis of their formal features, or on modeling the meaning of a specific Cx.

For the former approach, we should mention the works of Dunn (2017, 2019), which aim at automatically inducing a set of grammatical units (Cxs) from a large corpus. On the one hand, Dunn's contributions provide a method for extracting Cxs from corpora; on the other hand, they are mainly concerned with the formal side of the constructions, and especially with the problem of how syntactic constraints are learned. Some sort of semantic representation is included, in the form of semantic clusters of word embeddings to which the word forms appearing in the constructions are assigned. However, these works do not present any evaluation of the construction representations in terms of semantic tasks.

Another line of research has focused on using constructions for building computational models of language acquisition. Alishahi and Stevenson (2008) propose a model for the representation, acquisition and use of verb argument structure, formulating constructions as probabilistic associations between syntactic and semantic properties of verbs and their arguments. This probabilistic association emerges over time through a Bayesian acquisition process in which similar verb usages are detected and grouped together to form general constructions, based on their syntactic and semantic properties. Despite the success of this model, the semantic representation of argument structure is still symbolic, and the semantic categories of the input constructions are manually compiled, in contrast with the usage-based nature of constructions.

Other studies have used DSMs to model constructional meaning, focusing on a specific type of Cx rather than on the entire grammar. For example, Levshina and Heylen (2014) build a vector space to study Dutch causative constructions with doen ('do') and laten ('let'). They compute several vector spaces with different context types, both for the nouns that fill the Causer and Causee slots and for the verbs that fill the Effected Predicate slot. Then, they cluster these nouns and verbs at different levels of granularity and test which classification better predicts the use of laten and doen.

A recent trend in diachronic linguistics investigates linguistic change as a sequence of gradual changes in distributional patterns of usage (Bybee, 2010).

For instance, Perek (2016) investigates the productivity of the V the hell out of NP construction (e.g., You scared the hell out of me) from 1930 to 2009. On one side, he clusters the vectors of the verbs occurring in this construction to pinpoint the preferred semantic domains of the Cx in its diachronic evolution. Secondly, he computes the density of the semantic space of the construction around a given word in a certain period, showing it to be predictive of that word joining the construction in the subsequent period. A similar approach is applied to study changes in the productivity of the Way-construction over the period 1830–2009 (Perek, 2018). Perek's analysis also proves that distributional similarity and neighbourhood density in the vector space can be predictive of the usage of a construction with a new verb. Other works have followed this approach, demonstrating the validity of DSMs for modeling the semantic change of constructions in diachrony. Amato and Lenci (2017) examine the Italian gerundival periphrases stare ('to stay'), andare ('to go') and venire ('to come') followed by a gerund. As in previous works, they use DSMs to i) identify similarities and differences among Cxs by clustering the vectors of the verbs occurring in each Cx, and ii) investigate the changes undergone by the semantic space of the verbs occurring in the Cxs throughout a very long period (from 1550 to 2009).

Lebani and Lenci (2017) present an unsupervised distributional semantic representation of argument constructions. Following the assumption that constructional meanings for argument Cxs arise from the meaning of the high-frequency verbs that co-occur with them (Goldberg, 1999; Casenhiser and Goldberg, 2005; Barak and Goldberg, 2017), they compute distributional vectors for Cxs as the centroids of the vectors of their typical verbs, and use them to model the psycholinguistic data about construction priming in Johnson and Goldberg (2013). This representation of construction meaning has also been applied to study valency coercion by Busso et al. (2018).

Following a parallel research line on probing tasks for distributed vectors, Kann et al. (2019) investigate whether word and sentence embeddings encode the grammatical distinctions necessary for inferring the idiosyncratic frame-selectional properties of verbs. Their findings show that, at least for some alternations, verb embeddings encode sufficient information for distinguishing between acceptable and unacceptable combinations.

3 Distributional CxG Framework

We introduce a new framework aimed at integrating the computational representations derived from distributional methods into the explicit formalization of Construction Grammars, called Distributional Construction Grammar (DisCxG). DisCxG is based on three components:

• Constructions: stored pairings of form and function, including morphemes, words, idioms, partially lexically filled and fully general linguistic patterns (Goldberg, 2003);

• Frames: schematic semantic knowledge describing scenes and situations in terms of their semantic roles;

• Events: semantic information concerning particular event instances with their specific participants. The introduction of this component, which is a novelty with respect to traditional CxG frameworks, has been inspired by cognitive models such as Generalized Event Knowledge (McRae and Matsuki, 2009) and the Words-as-Cues hypothesis (Elman, 2014).

The peculiarity of DisCxG is that we distinguish two layers of semantic representation, referring to two different and yet complementary aspects of semantic knowledge. Specifically, frames define a prototypical semantic representation based on the different semantic roles (the frame elements) defining argument structures, while events provide a specialization of the frame by taking into account information about specific participants and the relations between them. Crucially, we assume that both these layers have a DS representation in terms of distributional vectors learned from corpus co-occurrences.

Following the central tenet of CxGs, according to which linguistic information is encoded in a similar way for lexical items as well as for more abstract Cxs (e.g., the covariational-conditional Cx, the ditransitive Cx, etc.), the three components of DisCxG are modeled using the same type of formal representation with recursive feature structures, which is inspired by Sign-Based Construction Grammar (SBCG) (Sag, 2012; Michaelis, 2013).
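As a toy sketch of how an abstract construction could receive a distributional vector, the centroid approach of Lebani and Lenci (2017) discussed above can be rendered as follows. The 4-dimensional verb embeddings are invented for illustration; real vectors would be learned from a corpus.

```python
# Sketch of the centroid approach to constructional meaning: the vector of
# an argument construction is the centroid (mean) of the vectors of its
# typical verbs. All embeddings below are toy values, not real data.
import numpy as np

# Hypothetical verb embeddings (in practice, learned from a corpus).
verb_vectors = {
    "give": np.array([0.9, 0.1, 0.2, 0.7]),
    "send": np.array([0.8, 0.2, 0.1, 0.6]),
    "hand": np.array([0.7, 0.1, 0.3, 0.8]),
    "sing": np.array([0.1, 0.9, 0.8, 0.1]),
}

def construction_vector(typical_verbs):
    """Centroid of the embeddings of a construction's typical verbs."""
    return np.mean([verb_vectors[v] for v in typical_verbs], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The ditransitive Cx represented through its high-frequency transfer verbs.
ditransitive = construction_vector(["give", "send", "hand"])

# A transfer verb should be closer to the construction vector than an
# unrelated verb is.
sim_give = cosine(verb_vectors["give"], ditransitive)
sim_sing = cosine(verb_vectors["sing"], ditransitive)
```

This is the sense in which a fully schematic pattern, with no lexical material of its own, can still be assigned a corpus-derived vector.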

3.1 Constructions

In DisCxG, a construction is represented by form and semantic features. The following list presents the set of main features of Cxs, adapting the formalization in SBCG:

• The FORM feature contains the basic formal characteristics of constructions. It includes (i) the PHONological/SURFACE form, (ii) the (morpho)syntactic features (SYN), i.e., part-of-speech (TYPE), CASE (nominative, accusative), and the set of subcategorized elements (VAL), and (iii) PROPERTIES, representing explicitly the syntactic relations among the elements of the Cx.

• The ARGument-STructure implements the interface between syntactic and semantic roles. The arguments are ordered according to the accessibility hierarchy (subj ≺ d-obj ≺ obl ...), encoding the syntactic role. Each argument specifies the case, related to the grammatical function, and links to the thematic role.¹

• The SEMantic feature specifies the properties of the Cx's meaning (Section 3.2).

Unlike SBCG or other CxG theories, we include inside FORM a new feature called PROPERTIES, borrowed from Property Grammars (Blache, 2005). Properties encode syntactic information about the components of a Cx, and they play an important role in its recognition. However, the discussion of this linguistic aspect is not presented here, as the focus of this paper is on the semantic side of constructions.²

As said above, a Cx can describe linguistic objects of various levels of complexity and schematicity: words, phrases, fully lexicalized idiomatic patterns, partially lexicalized schemas, etc. Thus, the attribute-value matrix can be applied to lexical entries, such as the verb read in Figure 1, as well as to abstract constructions that do not involve lexical material. Figure 2 depicts the ditransitive Cx. The semantic particularity of this construction is that, whatever the lexicalization of the verb, the construction always involves a possession interpretation (more precisely, the transfer of something to somebody), represented in the TRANSFER frame.

Differently from the standard SBCG formalization of Cxs, we add the distributional feature DS-VECTOR to the semantic layer in order to integrate lexical distributional representations. The semantic structure of a lexical item can be associated with its distributional vector (e.g., the embedding of read), but we can also include a distributional representation of abstract syntactic constructions, following the approach of Lebani and Lenci (2017) illustrated in Section 2.

3.2 Frames

A frame is a schematic representation of an event or scenario together with the participating actors/objects/locations and their (semantic) roles (Fillmore, 1982). For instance, the sentences

1. (a) Mary bought a car from John (for 5000$).
   (b) John sold a car to Mary (for 5000$).

activate the same COMMERCIAL TRANSACTION frame, consisting of a SELLER (John), a BUYER (Mary), a GOOD which is sold (car), and the MONEY used in the transaction (5000$).

Semantic frames are the standard meaning representation in CxG, which represents them as symbolic structures. The source of this information is typically FrameNet (Ruppenhofer et al., 2016), a lexical database of English containing more than 1,200 semantic frames linked to more than 200,000 manually annotated sentences. A non-negligible problem of FrameNet is that entries must be created by expert lexicographers. This has led to a widely recognized coverage problem in its lexical units (Baker, 2012).

In DisCxG, semantic frames are still represented as structures, but the values of semantic roles consist of distributional vectors. As for the COMMERCIAL TRANSACTION frame in Figure 3, each frame element is associated with a specific embedding. It is worth noting that, in this first version of the DisCxG model, frame representations are still based on predefined lists of semantic roles, as defined in FrameNet (e.g., BUYER, SELLER, etc.). However, some works have recently attempted to automatically infer frames (and their roles) from distributional information.³

¹ SBCG distinguishes between valence and argument structure: the ARG-ST encodes overt and covert arguments, including extracted (non-local) and unexpressed elements, while VAL in the form description represents only realized elements. When no covert arguments occur, these features are identical.
² For more details on the Property Grammar framework, see Blache (2016).
³ Lately, SemEval 2019 proposed a task on unsupervised lexical semantic frame induction (http://alt.qcri.org/semeval2019/index.php?id=tasks).
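A hypothetical data-structure rendering of a DisCxG entry may help fix ideas: an SBCG-style attribute-value matrix extended with the DS-VECTOR feature described above. The field names mirror the paper's features, but the class design and all vector values are our own illustrative assumptions.

```python
# Hypothetical sketch of DisCxG entries as feature structures with an added
# DS-VECTOR slot. Vectors are toy stand-ins for corpus-derived embeddings.
from dataclasses import dataclass, field

@dataclass
class Construction:
    name: str
    form: dict                  # SURFACE / SYN / PROPERTIES features
    arg_st: list                # argument structure, in accessibility order
    frames: list                # evoked frames with role-to-argument links
    ds_vector: list = field(default_factory=list)  # distributional meaning

# A lexical entry: the verb read, which evokes the READING frame.
read_lxm = Construction(
    name="read-lxm-cx",
    form={"SURFACE": "read", "SYN": {"TYPE": "V", "VFORM": "fin"}},
    arg_st=["NP_i", "NP_j"],
    frames=[{"frame": "read-fr", "READER": "i", "TEXT": "j"}],
    ds_vector=[0.2, 0.7, 0.1],
)

# An abstract construction gets the same representation, minus lexical
# material: the ditransitive Cx evokes a TRANSFER frame.
ditransitive = Construction(
    name="ditransitive-cx",
    form={"SYN": {"CAT": "V"}},
    arg_st=["NP_x[subj]", "NP_y[obl]", "NP_z[obj]"],
    frames=[{"frame": "transfer-fr", "AGENT": "x",
             "RECIPIENT": "y", "THEME": "z"}],
    ds_vector=[0.6, 0.1, 0.3],
)
```

The point of the sketch is that lexical and abstract constructions share one representation type, as the framework requires.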

[Figure 1 and Figure 2: attribute-value matrices for the lexical construction read-lxm-cx and for the ditransitive-cx, pairing FORM and ARG-ST features with SEM features (FRAMES and DS-VECTOR).]

Figure 1: Description of the read verb. Figure 2: Description of the ditransitive Cx.

Woodsend and Lapata (2015) use distributional representations to induce embeddings for predicates and their arguments. Ustalov et al. (2018) propose a different methodology for unsupervised semantic frame induction. They build embeddings as the concatenations of subject-verb-object triples and identify frames as clustered triples. Of course, a limit of this approach is that it only uses subject and object arguments, while frames are generally associated with a wider variety of roles. Lebani and Lenci (2018) instead provide a distributional representation of verb-specific semantic roles as clusters of features automatically induced from corpora.

In this paper, we assume that at least some aspects of semantic roles can be derived by combining (e.g., with summation) the distributional vectors of their most prototypical fillers, following an approach widely explored in DS (Baroni and Lenci, 2010; Erk et al., 2010; Sayeed et al., 2016; Santus et al., 2017). For instance, the buyer role in the COMMERCIAL TRANSACTION frame can be taken as a vector encoding the properties of the typical nouns filling this role. We are aware that this solution is just an approximation of the content of frame elements. How to satisfactorily characterize semantic frames and roles using DS is in fact still an open research question.

[Figure 3: The COMMERCIAL TRANSACTION frame containing the distributional representation of the semantic roles — an attribute-value matrix in which each frame element (BUYER, SELLER, GOODS, MONEY, PLACE) is paired with an embedding.]

3.3 Events

Neurocognitive research has brought extensive evidence that stored world knowledge plays a key role in online language production and comprehension. An important aspect of such knowledge consists of the events and situations that we experience under different modalities, including the linguistic input. McRae and Matsuki (2009) call it Generalized Event Knowledge (GEK), because it contains information about prototypical event structures. Language comprehension has been characterized as a largely predictive process (Kuperberg and Jaeger, 2015). Predictions are memory-based, and experiences about events and their participants are used to generate expectations about the upcoming linguistic input, thereby minimizing the processing effort (Elman, 2014; McRae and Matsuki, 2009). For instance, argument combinations that are more 'coherent' with the event scenarios activated by the previous words are read faster in self-paced reading tasks and elicit smaller N400 amplitudes in ERP experiments (Paczynski and Kuperberg, 2012).

In DisCxG, events have a crucial role: they bridge the gap between the concrete instantiation of a Cx in context and its conceptualized meaning (conveyed by frames).
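The assumption introduced in Section 3.2 — that a role vector can be approximated by combining the vectors of its prototypical fillers — can be sketched as a toy example (ours, not the authors' implementation; the 3-dimensional filler embeddings are invented).

```python
# Toy sketch: a frame role vector as the additive combination (summation)
# of the embeddings of its prototypical fillers. All vectors are invented.
import numpy as np

filler_vectors = {
    "customer": np.array([0.8, 0.2, 0.1]),
    "client":   np.array([0.7, 0.3, 0.2]),
    "tourist":  np.array([0.6, 0.4, 0.1]),
}

def role_vector(prototypical_fillers):
    """Sum of the embeddings of a role's most typical fillers."""
    return np.sum([filler_vectors[f] for f in prototypical_fillers], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# BUYER role of COMMERCIAL TRANSACTION, approximated from typical buyers.
buyer = role_vector(["customer", "client", "tourist"])

# A plausible filler fits the role vector well.
fit = cosine(filler_vectors["customer"], buyer)
```

As the paper stresses, this is only an approximation of frame-element content; the sketch just makes the summation operation explicit.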

For example, let us consider the verb read. We know that this verb subcategorizes for two noun phrases (form) and involves a generic READING frame in which there is someone who reads (READER) and something that is read (TEXT). This frame only provides an abstract, context-independent representation of the verb meaning, and the two roles can be generally defined as clusters of properties derived from the singular subjects and objects of read. However, the semantic representation comprehenders build during sentence processing is influenced by the specific fillers that instantiate the frame elements. If the input is A student reads.., the fact that the word student appears as the subject of the verb activates a specific scenario, together with a series of expectations about the prototypicality of other lexical items. Consequently, the object of the previous sentence is more likely to be book rather than magazine (Chersoni et al., 2019). Accordingly, in DisCxG events are considered as functions that specialize the semantic meaning encoded in frames. The word student specializes the READING frame into a specific event, triggering expectations about the most likely participants of the other roles: the READER is encoded as a lexical unit vector, and the distributional restriction applied to the TEXT is represented by a subset of possible objects ordered by their degree of typicality in the event. Figure 4 gives a simple example of the specialization brought about by event knowledge.

[Figure 4: The student-read event as the specialization of the READING frame, with READER restricted to the student vector and TEXT to a typicality-ordered set of objects (book, paper, ...).]

In a similar way, events can instantiate an abstract construction dynamically, according to the context. The different lexicalizations of the AGENT and the RECIPIENT in the ditransitive construction cause a different selection of the THEME. For example, the fact that the sentence fragment The teacher gives students ... could be completed as in (2) expresses a distributional restriction that can be encoded as an event capturing the co-occurrences teacher/student/exercises (Figure 5).

2. The teacher gives students ... → The teacher gives students exercises

[Figure 5: Ditransitive event specialization — the ditransitive-cx with AGENT teacher, RECIPIENT students, and THEME restricted to a typicality-ordered set (exercise, homework, ...).]

Any lexical item activates a portion of event knowledge (Elman, 2014): if verbs evoke events, nouns evoke the entities that participate in events. Thus, events and entities are themselves interlinked: there is no specific feature EVENT in the description of the lexical entry teacher, but events are activated by the lexical entry, generating a network of expectations about upcoming words in the sentence (McRae and Matsuki, 2009).

Given this assumption, Chersoni et al. (2019) represent event knowledge in terms of a Distributional Event Graph (DEG) automatically built from parsed corpora. In this graph, nodes are embeddings and edges are labeled with syntactic relations and weighted using statistical association measures (Figure 6). Each event is a path in the DEG. Thus, given a lexical cue w, it is possible to identify the events it activates (together with the strength of its activation, defined as a function of the graph weights) and to generate expectations about incoming input on both the paradigmatic and syntagmatic axes. With this graph-based approach, Chersoni et al. (2019) model sentence comprehension as the dynamic and incremental creation of a semantic representation integrated into a semantically coherent structure contributing to the sentence interpretation.

We propose to include in our framework the information encoded in the DEG. Each lexical entry contains a pointer to its corresponding node in the graph. Therefore, the frame specialization we have described above corresponds to an event encoded as a specific path in the DEG. Event information represents a way to unify the schematic descriptions contained in the grammar with the world knowledge and contextual information progressively activated by lexical items and integrated during language processing.
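A miniature, hypothetical version of a Distributional Event Graph may clarify the mechanism: nodes are words, edges carry a syntactic relation and an association weight, and a lexical cue generates ranked expectations about its co-participants. The relations and weights below are invented for illustration; the actual DEG of Chersoni et al. (2019) is built from parsed corpora.

```python
# Toy DEG: edges (source, target) labeled with a syntactic relation and an
# association weight. Expectations for a cue are targets ranked by weight.
deg = {
    ("student", "read"):  {"rel": "nsubj", "weight": 0.9},
    ("read", "book"):     {"rel": "dobj",  "weight": 0.8},
    ("read", "paper"):    {"rel": "dobj",  "weight": 0.6},
    ("read", "magazine"): {"rel": "dobj",  "weight": 0.2},
    ("teacher", "give"):  {"rel": "nsubj", "weight": 0.7},
    ("give", "exercise"): {"rel": "dobj",  "weight": 0.5},
}

def expectations(cue, rel):
    """Words linked to `cue` by relation `rel`, ranked by association weight."""
    ranked = [(tgt, e["weight"]) for (src, tgt), e in deg.items()
              if src == cue and e["rel"] == rel]
    return sorted(ranked, key=lambda x: -x[1])

# After "A student reads...", 'book' is the strongest object expectation.
objs = expectations("read", "dobj")
```

An event then corresponds to a path through such edges (e.g., student --nsubj--> read --dobj--> book), and its activation strength can be defined as a function of the weights along the path.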

4 Some Other Arguments in Favor of a Distributional CxG

As we said in Section 2, few works have tried to use distributional semantic representations of constructions, and existing studies have focused more on applying DS to a particular construction type than on providing a general model to represent the semantic content of Cxs. We argue that DSMs could give an important contribution to designing representations of constructional meaning. In what follows, we briefly discuss some specific issues related to Construction Grammars that could be addressed by combining them with Distributional Semantics.

Measuring similarity among constructions and frames. The dominant approaches like frame semantics and traditional CxGs tend to represent entities and their relations in a formal (hand-made) way. A potential limitation of these methods is that it is hard to assess the similarity between frames or constructions, while one advantage of distributional vectors is that one can easily compute the degree of similarity between linguistic items represented in a vector space. For example, Busso et al. (2018) built a semantic space for several Italian argument constructions and then computed the similarity of their vectors, observing that some Cxs, like the Caused-Motion and the Dative, have similar distributional behaviour. As for frames, there has been some work on using distributional similarity between vectors for their unsupervised induction (Ustalov et al., 2018), for comparing frames across languages (Sikos and Padó, 2018), and even for the automatic identification of the semantic relations holding between them (Botschen et al., 2017).

Identifying idiomatic meaning. Many studies in theoretical, descriptive and experimental linguistics have recently questioned the Fregean principle of compositionality, which assumes that the meaning of an expression is the result of the incremental composition of its sub-constituents. There is a large number of linguistic phenomena whose meaning is accessed directly from the whole linguistic structure: this is typically the case with idioms or multi-word expressions, where the figurative meaning cannot be decomposed. In computational semantics, a large literature has been aiming at modeling idiomaticity using DSMs. Senaldi et al. (2016) carried out an idiom type identification task, representing Italian V-NP and V-PP Cxs as vectors. They observed that the vectors of VN and AN idioms are less similar to the vectors of lexical variants of these expressions than the vectors of compositional constructions are. Cordeiro et al. (2019) developed a framework for predicting the compositionality of nominal compounds using DSMs, evaluating to what extent they capture idiomaticity compared to human judgments. Results revealed a high agreement between the models and human predictions, suggesting that they are able to incorporate information about idiomaticity. In future works, it would be interesting to see whether DSM-based approaches can be used in combination with methods for the identification of the formal features of constructions (Dunn, 2017, 2019), in order to tackle the task of compositionality prediction with syntactic and semantic features simultaneously.

Modeling sentence comprehension. A trend in computational semantics regards the application of DSMs to sentence processing (Mitchell et al., 2010; Lenci, 2011; Sayeed et al., 2015; Johns and Jones, 2015, i.a.). Chersoni et al. (2016, 2017) propose a distributional model of sentence comprehension inspired by the general principles of the Memory, Unification and Control framework (Hagoort, 2013, 2015). The memory component includes events in GEK as feature structures containing information directly extracted from parsed sentences in corpora: attributes are syntactic dependencies, while values are distributional vectors of dependent lexemes. Then, they model semantic composition as an event construction and update function F, whose aim is to build a coherent semantic representation by integrating the GEK cued by the linguistic elements. The framework has been applied to the logical metonymy phenomenon (e.g., The student begins the book), using the semantic complexity function to model the processing costs of metonymic sentences, which were shown to be higher compared to non-coercion sentences (McElree et al., 2001; Traxler et al., 2002). Evaluation against psycholinguistic datasets proves the linguistic and psycholinguistic validity of using embeddings to represent events and of including them in an incremental model of sentence comprehension.

psycholinguistic validity of using embeddings to represent events and including them in an incremental model of sentence comprehension.

Figure 6: An extract of DEG showing several instances of events (Chersoni et al., 2019)

Evaluations based on experimental evidence. DSMs have proved to be very useful in modeling human performance in psycholinguistic tasks (Mandera et al., 2017). This is an important finding, since it allows us to test the predictions of Construction Grammar theories against data derived from behavioral experiments. To cite an example from the DS literature, the models proposed by Lebani and Lenci (2017) replicated the priming effect of the lexical decision task by Johnson and Goldberg (2013), where the participants were asked to judge whether a given verb was a real word or not, after being exposed to an argument structure construction in the form of a Jabberwocky sentence. The authors of the study created distributional representations of constructions as combinations of the vectors of their typical verbs, and measured their similarity with the verbs of the original experiment, showing that their model can accurately reproduce the results reported by Johnson and Goldberg (2013).

5 Conclusion

In this paper, we investigated the potential contribution of DSMs to the semantic representation of constructions, and we presented a theoretical proposal bringing together vector spaces and constructions into a unique framework. It is worth highlighting our main contributions:

• We built a unified representation of grammar and meaning, based on the assumption that language structure and properties emerge from language use.

• We integrated information about events to build a semantic representation of an input as an incremental and predictive process.

Converging different layers of meaning representation into a unique framework is not a trivial problem, and in our future work we will need to find the optimal way to balance these two components: semantic vectors derived from corpus data on the one hand, and a possibly accurate formalization of the internal structure of the constructions on the other hand. In this contribution, we hoped to show that merging the two frameworks would be worth the effort, as they share many theoretical assumptions and complement each other on the basis of their respective strengths.

Our future goal is the automatic building and inclusion of a distributional representation of frames and events in DisCxG; our aim is to exploit the final resource to build, for the first time, a Distributional Constructicon. Moreover, we are planning to apply this framework in a predictive model of language comprehension, defining how a Cx is activated by the combination of syntactic, lexical and distributional cues occurring in DisCxG. We believe this framework could be a starting point for applications in NLP such as Knowledge Representation and Reasoning, Natural Language Understanding and Generation, but also a potential term of comparison for psycholinguistic models of human language comprehension.
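The modeling strategy attributed above to Lebani and Lenci (2017) — representing a construction as a combination of the vectors of its typical verbs, and then comparing that representation to target verbs — can be sketched in a few lines. The sketch below is purely illustrative: the vectors are toy 4-dimensional values (real DSM vectors have hundreds of dimensions), the verb sets are invented, and centroid averaging plus cosine similarity are assumed here as one standard way of implementing "combination" and "similarity"; this is not the authors' actual implementation or data.

```python
import numpy as np

def centroid(vectors):
    """Average a set of word vectors into a single construction vector."""
    return np.mean(vectors, axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors standing in for corpus-derived embeddings (illustrative only).
vectors = {
    "give":  np.array([0.9, 0.1, 0.0, 0.2]),
    "hand":  np.array([0.8, 0.2, 0.1, 0.1]),
    "send":  np.array([0.7, 0.3, 0.0, 0.2]),
    "think": np.array([0.1, 0.9, 0.5, 0.0]),
}

# Construction vector: centroid of the vectors of its typical verbs,
# e.g. a ditransitive construction [Subj V Obj1 Obj2].
ditransitive = centroid([vectors["give"], vectors["hand"], vectors["send"]])

# A verb strongly associated with the construction should be closer to the
# construction vector than an unrelated verb, mirroring the priming effect.
assert cosine(ditransitive, vectors["give"]) > cosine(ditransitive, vectors["think"])
```

On this toy data, the construction vector is far more similar to an associated verb than to an unrelated one, which is the qualitative pattern the distributional replication of the priming experiment relies on.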

References

Afra Alishahi and Suzanne Stevenson. 2008. A Computational Model of Early Argument Structure Acquisition. Cognitive Science, 32(5):789–834.

Irene Amato and Alessandro Lenci. 2017. Story of a Construction: Statistical and Distributional Analysis of the Development of the Italian Gerundival Construction. In G. Marotta and F. S. Lievers, editors, Strutture Linguistiche e Dati Empirici in Diacronia e Sincronia, pages 135–158. Pisa University Press.

Collin F. Baker. 2012. FrameNet, Current Collaborations and Future Goals. Language Resources and Evaluation, 46(2):269–286.

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet Project. In Proceedings of COLING-ACL, pages 86–90, Montreal, Canada. ACL.

Libby Barak and Adele Goldberg. 2017. Modeling the Partial Productivity of Constructions. In Proceedings of the 2017 AAAI Spring Symposium Series on Computational Construction Grammar and Natural Language Understanding, pages 131–138.

Marco Baroni, Raffaella Bernardi, and Roberto Zamparelli. 2014. Frege in Space: A Program of Compositional Distributional Semantics. Linguistic Issues in Language Technology, 9(6):5–110.

Marco Baroni and Alessandro Lenci. 2010. Distributional Memory: A General Framework for Corpus-Based Semantics. Computational Linguistics, 36(4):673–721.

Philippe Blache. 2005. Property Grammars: A Fully Constraint-Based Theory. In H. Christiansen, P. R. Skadhauge, and J. Villadsen, editors, Constraint Solving and Language Processing. CSLP 2004, volume 3438 of LNAI, pages 1–16. Springer, Berlin, Heidelberg.

Philippe Blache. 2016. Representing Syntax by Means of Properties: A Formal Framework for Descriptive Approaches. Journal of Language Modelling, 4(2):183–224.

Teresa Botschen, Hatem Mousselly Sergieh, and Iryna Gurevych. 2017. Prediction of Frame-to-Frame Relations in the FrameNet Hierarchy with Frame Embeddings. In Proceedings of the ACL Workshop on Representation Learning for NLP, pages 146–156.

Joyce Tang Boyland. 2009. Usage-Based Models of Language. In D. Eddington, editor, Experimental and Quantitative Linguistics, pages 351–419. Lincom, Munich.

Lucia Busso, Ludovica Pannitto, and Alessandro Lenci. 2018. Modelling Italian Construction Flexibility with Distributional Semantics: Are Constructions Enough? In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), pages 68–74.

Joan Bybee. 2010. Language, Usage and Cognition. Cambridge University Press.

Devin Casenhiser and Adele E. Goldberg. 2005. Fast Mapping Between a Phrasal Form and Meaning. Developmental Science, 8(6):500–508.

Emmanuele Chersoni, Philippe Blache, and Alessandro Lenci. 2016. Towards a Distributional Model of Semantic Complexity. In Proceedings of the COLING Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), pages 12–22.

Emmanuele Chersoni, Alessandro Lenci, and Philippe Blache. 2017. Logical Metonymy in a Distributional Model of Sentence Comprehension. In Proceedings of *SEM, pages 168–177.

Emmanuele Chersoni, Enrico Santus, Ludovica Pannitto, Alessandro Lenci, Philippe Blache, and Chu-Ren Huang. 2019. A Structured Distributional Model of Sentence Meaning and Processing. Journal of Natural Language Engineering. To appear.

Silvio Cordeiro, Aline Villavicencio, Marco Idiart, and Carlos Ramisch. 2019. Unsupervised Compositionality Prediction of Nominal Compounds. Computational Linguistics, 45:1–57.

Jonathan Dunn. 2017. Computational Learning of Construction Grammars. Language and Cognition, 9(2):254–292.

Jonathan Dunn. 2019. Frequency vs. Association for Constraint Selection in Usage-Based Construction Grammar. In Proceedings of the NAACL Workshop on Cognitive Modeling and Computational Linguistics.

Jeffrey L. Elman. 2014. Systematicity in the Lexicon: On Having Your Cake and Eating It Too. In Paco Calvo and John Symons, editors, The Architecture of Cognition, pages 115–145. The MIT Press.

Katrin Erk, Sebastian Padó, and Ulrike Padó. 2010. A Flexible, Corpus-Driven Model of Regular and Inverse Selectional Preferences. Computational Linguistics, 36(4):723–763.

Charles J. Fillmore. 1982. Frame Semantics. Linguistics in the Morning Calm, pages 111–137.

Charles J. Fillmore. 1988. The Mechanisms of Construction Grammar. Annual Meeting of the Berkeley Linguistics Society, 14:35–55.

Adele E. Goldberg. 1999. The Emergence of the Semantics of Argument Structure Constructions. In B. MacWhinney, editor, The Emergence of Language, pages 197–212. Lawrence Erlbaum Publications, Hillsdale, NJ.

Adele E. Goldberg. 2003. Constructions: A New Theoretical Approach to Language. Trends in Cognitive Sciences, 7(5):219–224.

Adele E. Goldberg. 2006. Constructions at Work: The Nature of Generalization in Language. Oxford University Press.

Edward Grefenstette and Mehrnoosh Sadrzadeh. 2015. Concrete Models and Empirical Evaluations for the Categorical Compositional Distributional Model of Meaning. Computational Linguistics, 41(1):71–118.

Stefan Th. Gries and Anatol Stefanowitsch. 2004. Extending Collostructional Analysis: A Corpus-Based Perspective on 'Alternations'. International Journal of Corpus Linguistics, 9(1):97–129.

Peter Hagoort. 2013. MUC (Memory, Unification, Control) and Beyond. Frontiers in Psychology, 4:1–13.

Peter Hagoort. 2015. MUC (Memory, Unification, Control): A Model on the Neurobiology of Language Beyond Single Word Processing. In Neurobiology of Language, pages 339–347. Elsevier.

Zellig S. Harris. 1954. Distributional Structure. Word, 10:146–162.

Thomas Hoffmann and Graeme Trousdale, editors. 2013. The Oxford Handbook of Construction Grammar. Oxford University Press, Oxford.

Brendan T. Johns and Michael N. Jones. 2015. Generating Structure from Experience: A Retrieval-Based Model of Language Processing. Canadian Journal of Experimental Psychology, 69(3):233–251.

Matt A. Johnson and Adele E. Goldberg. 2013. Evidence for Automatic Accessing of Constructional Meaning: Jabberwocky Sentences Prime Associated Verbs. Language and Cognitive Processes, 28(10):1439–1452.

Katharina Kann, Alex Warstadt, Adina Williams, and Samuel R. Bowman. 2019. Verb Argument Structure Alternations in Word and Sentence Embeddings. In Proceedings of the Society for Computation in Linguistics (SCiL) 2019, pages 287–297.

Gina R. Kuperberg and T. Florian Jaeger. 2015. What Do We Mean by Prediction in Language Comprehension? Language, Cognition and Neuroscience.

Gianluca E. Lebani and Alessandro Lenci. 2017. Modelling the Meaning of Argument Constructions with Distributional Semantics. In Proceedings of the AAAI 2017 Spring Symposium on Computational Construction Grammar and Natural Language Understanding, pages 197–204.

Gianluca E. Lebani and Alessandro Lenci. 2018. A Distributional Model of Verb-Specific Semantic Roles Inferences. In Thierry Poibeau and Aline Villavicencio, editors, Language, Cognition, and Computational Models, pages 118–158. Cambridge University Press, Cambridge.

Alessandro Lenci. 2011. Composing and Updating Verb Argument Expectations: A Distributional Semantic Model. In Proceedings of the ACL Workshop on Cognitive Modeling and Computational Linguistics, pages 58–66.

Alessandro Lenci. 2018. Distributional Models of Word Meaning. Annual Review of Linguistics, 4(1):151–171.

Natalia Levshina and Kris Heylen. 2014. A Radically Data-Driven Construction Grammar: Experiments with Dutch Causative Constructions. In R. Boogaart, T. Colleman, and G. Rutten, editors, Extending the Scope of Construction Grammar, pages 17–46. Mouton de Gruyter, Berlin.

Paweł Mandera, Emmanuel Keuleers, and Marc Brysbaert. 2017. Explaining Human Performance in Psycholinguistic Tasks with Models of Semantic Similarity Based on Prediction and Counting: A Review and Empirical Validation. Journal of Memory and Language, 92:57–78.

Brian McElree, Matthew J. Traxler, Martin J. Pickering, Rachel E. Seely, and Ray Jackendoff. 2001. Reading Time Evidence for Enriched Composition. Cognition, 78(1):B17–B25.

Ken McRae and Kazunaga Matsuki. 2009. People Use their Knowledge of Common Events to Understand Language, and Do So as Quickly as Possible. Language and Linguistics Compass, 3(6):1417–1429.

Laura A. Michaelis. 2013. Sign-Based Construction Grammar. In The Oxford Handbook of Construction Grammar, pages 133–152. Oxford University Press, Oxford.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems (NIPS 2013), pages 3111–3119.

Jeff Mitchell, Mirella Lapata, Vera Demberg, and Frank Keller. 2010. Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure. In Proceedings of ACL, pages 196–206, Uppsala, Sweden. ACL.

Martin Paczynski and Gina R. Kuperberg. 2012. Multiple Influences of Semantic Memory on Sentence Processing: Distinct Effects of Semantic Relatedness on Violations of Real-World Event/State Knowledge and Animacy Selection Restrictions. Journal of Memory and Language, 67(4).

Florent Perek. 2016. Using Distributional Semantics to Study Syntactic Productivity in Diachrony: A Case Study. Linguistics, 54(1):149–188.

Florent Perek. 2018. Recent Change in the Productivity and Schematicity of the Way-Construction: A Distributional Semantic Analysis. Corpus Linguistics and Linguistic Theory, 14(1):65–97.

Josef Ruppenhofer, Michael Ellsworth, Myriam Schwarzer-Petruck, Christopher R. Johnson, and Jan Scheffczyk. 2016. FrameNet II: Extended Theory and Practice.

Ivan A. Sag. 2012. Sign-Based Construction Grammar: An Informal Synopsis. In Sign-Based Construction Grammar, volume 193, pages 69–202. CSLI Publications.

Enrico Santus, Emmanuele Chersoni, Alessandro Lenci, and Philippe Blache. 2017. Measuring Thematic Fit with Distributional Feature Overlap. In Proceedings of EMNLP, pages 648–658.

Asad Sayeed, Stefan Fischer, and Vera Demberg. 2015. Vector-Space Calculation of Semantic Surprisal for Predicting Word Pronunciation Duration. In Proceedings of ACL-IJCNLP, pages 763–773, Beijing, China. ACL.

Asad Sayeed, Clayton Greenberg, and Vera Demberg. 2016. Thematic Fit Evaluation: An Aspect of Selectional Preferences. In Proceedings of the ACL Workshop on Evaluating Vector-Space Representations for NLP, pages 99–105.

Marco Silvio Giuseppe Senaldi, Gianluca E. Lebani, and Alessandro Lenci. 2016. Lexical Variability and Compositionality: Investigating Idiomaticity with Distributional Semantic Models. In Proceedings of the ACL Workshop on Multiword Expressions, pages 21–31.

Jennifer Sikos and Sebastian Padó. 2018. Using Embeddings to Compare FrameNet Frames Across Languages. In Proceedings of the COLING Workshop on Linguistic Resources for Natural Language Processing, pages 91–101.

Michael Tomasello. 2003. Constructing a Language: A Usage-Based Theory of Language Acquisition. Harvard University Press, Cambridge, MA.

Matthew J. Traxler, Martin J. Pickering, and Brian McElree. 2002. Coercion in Sentence Processing: Evidence from Eye-Movements and Self-Paced Reading. Journal of Memory and Language, 47(4):530–547.

Dmitry Ustalov, Alexander Panchenko, Andrei Kutuzov, Chris Biemann, and Simone Paolo Ponzetto. 2018. Unsupervised Semantic Frame Induction Using Triclustering. In Proceedings of ACL, pages 55–62.

Kristian Woodsend and Mirella Lapata. 2015. Distributed Representations for Unsupervised Semantic Role Labeling. In Proceedings of EMNLP, pages 2482–2491.
