Workshop on High-level Methodologies for Grammar Engineering @ ESSLLI 2013

Düsseldorf, Germany, 12–16 August 2013

Denys Duchier and Yannick Parmentier (Eds.)

Preface

These are the proceedings of the International Workshop on High-level Methodologies for Grammar Engineering (HMGE), held at ESSLLI 2013 in Düsseldorf, Germany. This workshop follows a series of workshops dedicated to Grammar Engineering Across Frameworks (GEAF), which took place in 2007 (Stanford), 2008 (Manchester) and 2009 (Singapore). These workshops offer a place to report on research about the design and use of deep grammars for the representation and processing of language. HMGE was open to submissions on topics including, but not limited to:

– development, maintenance and enhancement of grammars
– semi-automatic acquisition of grammars
– debugging environments for grammar design
– dedicated description languages for grammar engineering
– applications of large-scale grammars

Ten papers were selected via the rigorous efforts of a Program Committee composed of renowned researchers. Each paper was reviewed by three independent committee members. HMGE would not have been possible without the precious help of the program committee, the organizing committee, the ESSLLI local organizing committee and their sponsors. Thanks to them all.

Wishing you fruitful reading, yours faithfully

June 2013
Denys Duchier & Yannick Parmentier
Local Chairs, HMGE 2013

Organization

Executive Committee

Workshop Chairs: Denys Duchier and Yannick Parmentier (Université d'Orléans)

Program Committee

Emily M. Bender, Washington, USA
Philippe Blache, Aix-en-Provence, France
Miriam Butt, Konstanz, Germany
Eric De La Clergerie, Paris, France
Benoît Crabbé, Paris, France
Berthold Crysmann, Paris, France
Stefanie Dipper, Bochum, Germany
Denys Duchier, Orléans, France
Claire Gardent, Nancy, France
Josef van Genabith, Dublin, Ireland
Timm Lichte, Düsseldorf, Germany
Montserrat Marimon, Barcelona, Spain
Yusuke Miyao, Tokyo, Japan
Stefan Müller, Berlin, Germany
Gertjan van Noord, Groningen, The Netherlands
Yannick Parmentier, Orléans, France

Table of contents

• A Type-Logical Treebank for French ...... 1
  Richard Moot

• Clustering for categorial grammar induction ...... 13
  Noémie-Fleur Sandillon-Rezer

• Constructing a Phenomenal Corpus: Towards Detecting Linguistic Phenomena in Precision Grammars ...... 25
  Ned Letcher and Timothy Baldwin

• Coupling Trees and Frames through XMG ...... 37
  Timm Lichte, Alexander Diez and Simon Petitjean

• Extended Projections in a Guadeloupean TAG Grammar ...... 49
  Emmanuel Schang

• FRIGRAM: a French Interaction Grammar ...... 63
  Guy Perrier and Bruno Guillaume

• Kahina: A Hybrid Trace-Based and Chart-Based Debugging System for Grammar Engineering ...... 75
  Johannes Dellert, Kilian Evang and Frank Richter

• SlaviCLIMB: Combining expertise for Slavic grammar development using a metagrammar ...... 87
  Antske Fokkens and Tania Avgustinova

• The CoreGram Project: Theoretical Linguistics, Theory Development and Verification ...... 93
  Stefan Müller

• Time Travel in Grammar Engineering: Using a Metagrammar to Broaden the Search Space ...... 105
  Antske Fokkens and Emily M. Bender

Demo papers

• A Toolkit for Engineering Type-Logical Grammars ...... 117
  Richard Moot
• eXtensible MetaGrammar: a Modular Tool for Grammar Engineering ...... 119
  Simon Petitjean

• Leopar: an Interaction Grammar Parser ...... 121
  Guy Perrier and Bruno Guillaume

• Ygg, parsing French text using AB grammars ...... 123
  Noémie-Fleur Sandillon-Rezer

A Type-Logical Treebank for French

Richard Moot

LaBRI (CNRS), Bordeaux University
[email protected]

1 Introduction

Categorial grammars have interesting theoretical advantages, most notably their very clean syntax-semantics interface. In the last decade, research in Combinatory Categorial Grammar has shown that this is not merely a theoretical advantage, but that, with the appropriate resources and tools — an annotated treebank (the CCGbank [13]), a very efficient parser [10] and a semantic lexicon [4] — we can use categorial grammars for wide-coverage, deep semantic analysis. Applications of the resulting wide-coverage semantics include natural-language question-answering [5] and computing textual entailments [6]. A key element has been the development of the CCGbank, which has both allowed parameter optimization for the wide-coverage parser and provided a framework (in types and in derivations) for the semantic applications. Categorial grammars in the logical tradition [15, 18, 17] have stayed somewhat behind in terms of their application to large-scale linguistic data. The goal of the current paper is to describe the TLGbank, a semi-automatically extracted treebank containing type-logical proofs, created with the explicit goal of making similar wide-coverage parsing and semantics possible in the type-logical context.

2 The French Treebank

The French Treebank (FTB, [1]) is a set of syntactically annotated news articles from the newspaper Le Monde. The FTB consists of 12,891 annotated sentences with a total of 383,227 words. The FTB has previously been used to extract lexical-functional grammars [20] and tree adjoining grammars [11]. For the annotation, the FTB uses simple, rather flat trees with some functional syntactic annotation (subject, object, infinitival argument, etc.). Consecutive multiword expressions have been merged in the annotation, and neither traces nor discontinuous dependencies have been annotated. Figure 1 in Section 4.1 shows a fragment of a sentence from the FTB. Verb clusters are treated as constituents (labeled VN) and the arguments of the verb occur as sisters of the verbal cluster (e.g. the infinitival argument with functional role OBJ in Figure 1).

3 Type-Logical Grammars

This section is a very short introduction to (multimodal) type-logical grammars. More detailed introductions can be found in Section 2.4 of [15] and in Chapter 5 of [17]. The atomic formulas are n (for nouns), np (for noun phrases), pp_x (for prepositional phrases, with x the preposition heading the phrase) and s_x for sentences (distinguishing between several types: s_main for a main, tensed sentence, s_whq for a wh-question, s_q for a sentence introduced by que (that), and further types for passives s_pass, infinitives s_inf, and past s_ppart and present s_ppres participles; this is inspired by the FTB annotation, though passives are not annotated as such there, and by the categorial treatments of [9, 13], implemented using first-order logic [16]). An intransitive verb is assigned np\s_main, indicating that it requires a noun phrase to its left in order to form an inflected sentence. Similarly, transitive verbs are assigned the formula (np\s_main)/np, requiring a noun phrase to their right in order to form an intransitive verb. Table 1 lists (a slightly simplified version of) the most common rules used in the extracted treebank. Section 3.1 sketches some linguistic phenomena requiring additional rules and gives some references as to where to find these rules.

Lex:   w ⊢ A                (lexical assignment)
Hyp:   x ⊢ A                (hypothesis)

/E:    if X ⊢ A/B and Y ⊢ B, then X ∘ Y ⊢ A
\E:    if Y ⊢ B and X ⊢ B\A, then Y ∘ X ⊢ A
/I:    if X ∘ x ⊢ A for a hypothesis x ⊢ B, then X ⊢ A/B
\I:    if x ∘ X ⊢ A for a hypothesis x ⊢ B, then X ⊢ B\A

\₁E:   if X[Y] ⊢ B and Z ⊢ B\₁A, then X[Y ∘ Z] ⊢ A                  (infixation)
/↑I:   if X[Y ∘ x] ⊢ A for a hypothesis x ⊢ B, then X[Y] ⊢ A/↑B     (extraction)

Table 1. Logical rules for multimodal categorial grammars

We will abbreviate the lexicon rule by simply writing w ⊢ A. The rule /E simply states that whenever we have shown an expression X to be of type A/B and we have shown an expression Y to be of type B, then the tree with X as its immediate subtree on the left and Y as its immediate subtree on the right is of type A. An easy instantiation of this rule (with X := the, Y := student, A := np, B := n) would be the following (the \E rule is symmetric).

    the ⊢ np/n     student ⊢ n
    --------------------------- /E
        the ∘ student ⊢ np

The two rules at the bottom of the table require some special attention. The \₁E rule is an infixation rule. This rule is used for adverbs (and other VP modifiers) occurring after the verb. Like the \E rule, it takes a B formula as its argument, but infixes itself to the right of any subtree Y of X (X[Y] denotes a tree X with a designated subtree Y¹). An example is shown below for the VP "appauvrit dangereusement la CGT" (impoverishes the CGT dangerously). The interest of this rule is that it allows a uniform type assignment for adverbs occurring post-verbally, regardless of other verb arguments.

    appauvrit ⊢ (np\s)/np     la ∘ CGT ⊢ np
    ---------------------------------------- /E
    appauvrit ∘ (la ∘ CGT) ⊢ np\s     dangereusement ⊢ (np\s)\₁(np\s)
    ------------------------------------------------------------------ \₁E
    (appauvrit ∘ dangereusement) ∘ (la ∘ CGT) ⊢ np\s

Finally, the /↑ rule is an extraction rule, extracting a B constituent from any right branch inside an X constituent. Section 4.3 shows an example.²

3.1 Additional Linguistic Phenomena

The rules listed in Table 1 correspond to the most frequently used rules for the type-logical treebank. The additional rules are a) for the product (primarily used for coordination of multiple arguments, as shown in sentence (1) below, where the two verb arguments np and pp are conjoined; see Section 2.4 of [18]), b) for gapping (as shown in sentence (2) below, where the transitive verb "atteindre" is absent from the second clause; a multimodal solution is proposed in [12]), and c) some special rules to treat past-perfect quoted speech, as shown in sentence (3) below. The parenthesized sentence is the argument of the past participle "ajouté" and, in addition, this argument is discontinuous. The solution is essentially to analyse the entire verb group missing the s argument, "a ajouté ...", as s_main\₁s_main.

(1) ... augmenter [np ses fonds propres] [pp de 90 millions de francs] et [np les quasi-fonds propres] [pp de 30 millions] ...
    '... increase [np its equity] [pp by 90 million francs] and [np its quasi-equity] [pp by 30 million] ...'

(2) Le salaire horaire atteint dorénavant 34,06 francs et le SMIC mensuel brut [tv] 5756,14 francs.
    'The hourly wage from now on reaches 34.06 francs and the gross monthly minimum wage [tv] 5756.14 francs.'

¹ For adverbs, as here, Y is typically the verb, but in principle infixation is possible anywhere (an admitted simplification).
² For readers familiar with the displacement calculus [19], the infixation construction A\₁B corresponds to ˇB↓A and the extraction construction A/↑B to ˆ(A↑B).

(3) [sl Les conservateurs], a ajouté le premier ministre ..., [sr "ne sont pas des opportunistes qui virevoltent d'une politique à l'autre]
    '[sl The Conservatives], added the Prime Minister ..., [sr "are not opportunists who flip-flop from one policy to another]'

4 Grammar Extraction

Grammar extraction algorithms for categorial grammars follow a general methodology (see, for example, [7, 13]; shown as item 2 below) with some additional rules to deal with the quirks of the format of the input treebank. A high-level description of the grammar extraction algorithm used for the FTB is given below.

1. split multiword expressions,
2. binarize the tree, keeping track of the distinction between modifiers and arguments; arguments are assigned formulas based on their syntactic label (e.g. np for a noun phrase argument, np\s_inf for an infinitival argument, etc.) — a sketch of this step is given below,
3. reattach verb cluster arguments,
4. rearrange coordinations,
5. insert traces in the appropriate places and assign the appropriate formulas to relative pronouns and clitics.
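As a rough illustration of step 2, the toy function below computes a head's formula from its result category and its argument sisters. The tree encoding and the CAT table mapping FTB labels to formulas are our own simplified assumptions, not the actual extraction code.

    # Illustrative sketch: arguments are folded into the head's formula,
    # from the outermost argument inward; modifiers (not shown) would
    # instead receive X/X or X\X formulas themselves.
    CAT = {"NP-SUJ": "np", "NP-OBJ": "np", "VPinf-OBJ": r"np\s_inf"}

    def head_formula(result, sisters):
        """sisters: (label, side) pairs, outermost argument first."""
        formula = result
        for label, side in sisters:
            b = CAT.get(label, "np")
            formula = f"({b}\\{formula})" if side == "left" else f"({formula}/{b})"
        return formula

    print(head_formula("s_main", [("NP-SUJ", "left"), ("NP-OBJ", "right")]))
    # ((np\s_main)/np) -- the transitive verb formula of Section 3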

Unfortunately, nearly all of these steps require at least some human intervention: the FTB annotation makes the distinction between modifiers and arguments only for certain categories (sentences, infinitive phrases, present participle phrases, but not past participle phrases or noun phrases), meaning that for many major categories this information is not explicitly annotated and needs to be verified manually.

4.1 Verb Clusters

As discussed in Section 2, verb clusters (which include clitics and adverbs) and their arguments are sisters in the FTB annotation trees. Figure 1 shows an example corresponding to sentence (4).

(4) Ils ont déjà pu constater que (...)
    'They have already been able to note that (...)'

In a categorial setting, we obtain a much simpler analysis if these VN arguments are arguments of the embedded verbs instead (in the current case, we'd like the infinitival group to be the argument of the past participle "pu", of the verb "pouvoir", can). At the bottom of Figure 1 we see the rightward-branching structure which results from the corpus transformation. Note also how the adverb "déjà" (already) is assigned the VP-modifier formula (np\s_x)/(np\s_x), which is parametric in the type of sentence (in essence, this is a formula with an implicit first-order quantifier ranging over the different sentence types; see [16] or Section 2.7 of [15]; in the figure, x is instantiated to ppart).

[Figure 1 shows the flat FTB tree for sentence (4) — SENT dominating CLS-SUJ (Ils), the verb cluster VN (ont déjà pu) and VPinf-OBJ (constater que ...) — and, below it, the right-branching categorial derivation obtained after rebracketing, in which "pu" takes the infinitival group as its argument.]

Fig. 1. Rebracketing a verbal group and its arguments
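As a rough sketch of this transformation (not the actual corpus-conversion code), the following toy function rebuilds a flat verb cluster as the right-branching structure of Figure 1, with the most embedded verb taking the argument; the tuple encoding of trees is an assumption for illustration.

    def rebracket(cluster_words, argument):
        # Each word of the cluster selects everything to its right, so the
        # cluster [ont, déjà, pu] plus its argument becomes right-branching.
        tree = argument
        for w in reversed(cluster_words):
            tree = (w, tree)
        return tree

    print(rebracket(["ont", "déjà", "pu"], "constater que ..."))
    # ('ont', ('déjà', ('pu', 'constater que ...')))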

4.2 Coordination and Interpunction Symbols

The sentences below illustrate some of the problems with coordinations which we will discuss in this section.

(5) Elles reprennent et amplifient des programmes existants ou en cours d'adaptation
    'They resume and amplify existing programs or [programs] currently being adapted'

(6) Les lieux où les deux derniers morts ont été recensés, lundi 30 décembre, la Yougoslavie et la Colombie, (...)
    'The places where the two last deaths have been reported, Monday 30 December, Yugoslavia and Colombia, (...)'

Figure 2 shows the FTB syntactic structure of sentence (5). In categorial grammars, conjunctions like "ou" (or) are generally assigned instances of the formula (X\X)/X (for a contextually appropriate choice of the formula X). The first conjunction conjoins the two transitive verbs (instantiating X with the formula (np\s_main)/np), which share both the subject and the object. For the second coordination it is the adjective and the prepositional phrase which are conjoined (though this is not so clear from the annotation alone, where it seems an unlike coordination between an np and a pp). As is standard in categorial grammars, we assign both the adjective and the PP the formula n\n (this is the standard assignment for a PP modifying a noun), turning this seemingly unlike coordination into a trivial instance of the general coordination scheme.

[Figure 2 shows the FTB tree of sentence (5): the coordinated verbs "reprennent" and "amplifient" under VN, and the NP "des programmes" with a coordination of the adjective "existants" and the PP "en cours d'adaptation".]

Fig. 2. Coordination

The (somewhat simplified) FTB annotation of sentence (6), shown in Figure 3 on the next page, illustrates another problem: appositives, which are treated by assigning a coordination-like formula to the interpunction symbol preceding them (a similar solution is used for parentheticals and for most extrapositions³). Additionally, we have to distinguish between the NP-MOD temporal adverb (which modifies the verb "recensés") and the NP-MOD for the appositive (which conjoins with "Les lieux", the places). As the example shows, these cases are difficult to infer from the information provided by the FTB annotation alone, and therefore must be verified manually; in total a bit over 20% of the interpunction symbols — over ten thousand in all — are assigned coordination-like categories.

³ Not all extrapositions can be analysed as coordinations this way. In the example below

(i) A cela s'ajoute une considération générale : (...)
    'To that adds itself a general consideration'

"A cela" is assigned s/(s/↑pp_à), allowing it to function as a long-distance pp argument to "s'ajoute".

[Figure 3 shows the NP-SUJ subtree of sentence (6): "Les lieux" followed by the relative clause Srel, the temporal NP-MOD "lundi 30 décembre" and the appositive NP-MOD "la Yougoslavie ...".]

Fig. 3. Appositives

4.3 Traces and Long-Distance Dependencies

As an example of a simple long-distance dependency in the corpus, consider the example below.

(7) Premier handicap auquel il convenait de s'attaquer : l'inflation
    'First handicap to which it was agreed to attack: inflation'

Figure 4 on the next page shows how the insertion of traces works. In the input structure at the top of the figure, "auquel" (to which) is assigned a preposition+pronoun POS-tag and the role of a prepositional object with the preposition "à" (to). However, this preposition is an argument of the verb "s'attaquer à" (to attack), which occurs much lower in the annotation tree. Since none of these dependencies are annotated in the French Treebank, all relative pronouns, wh-pronouns and clitics — a total of over 3,000 occurrences in the corpus — have been manually annotated with the correct long-distance dependencies. At the bottom of Figure 4, the manually added long-distance dependency is shown.

4.4 Analysis

Categorial grammars, much like lexicalized tree adjoining grammars and other strongly lexicalized formalisms, use very construction-specific lexical entries. This means, for example, that when a verb can be used both as a transitive and as an intransitive verb, it will have (at least) two distinct lexical entries. For extracted grammars, this generally means a very high level of lexical ambiguity. Using the most detailed extraction parameters, the final lexicon uses 1,101 distinct formulas (though only 800 of these occur more than once, 684 more than twice and 570 at least five times). Using a slightly less detailed extraction (which, for example, distinguishes only pp_de, pp_à and pp_par and uses simply pp for prepositional phrases headed by other prepositions), there are 761 different formulas in the lexicon (of which only 684 occur more than once, 546 more than twice and 471 at least five times).

[Figure 4 shows the FTB annotation of example (7) at the top — "auquel" tagged P+PRO inside the PP-A OBJ of the relative clause — and, at the bottom, the corresponding natural-deduction derivation in which a hypothetical pp_à argument of "s'attaquer" is withdrawn by the extraction rule, giving "auquel" the formula (n\n)/(s_main/↑pp_à).]

Fig. 4. Adding traces to the output

Even in this second lexicon, many frequent words have a great number of lexical assignments. The conjunction “et” (and) has 86 different lexical formulas, the comma “,” (which, as we have seen, often functions much like a conjunction) is assigned 72 distinct formulas, the adverb “plus” (more) 44 formulas (in part because of possible combinations with “que”, than), the prepositions “pour”, “en” and “de” 43, 42 and 40 formulas respectively, and the verb “est” (is) 39 formulas.

Though this kind of lexical ambiguity may seem like a problem when using the lexicon for parsing, well-known techniques such as supertagging [2], which assign the contextually most probable set of formulas (supertags) to each word, can be used to reduce the lexical ambiguity to an acceptable level. To give an idea of how effective this strategy is in the current context, with the reduced lexicon of 761 formulas: when assigning only the most likely formula to each word, 90.6% of the words are assigned the correct formula; when assigning each word all formulas with probability greater than 1% of that of the most likely supertag (for an average of 2.3 formulas per word), the supertagger assigns the correct formula in 98.4% of cases (complete treebank, using ten-fold cross-validation).
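A minimal sketch of this thresholding step is given below; the probability table is invented for illustration, and only the 1% factor comes from the text.

    def select_supertags(probs, beta=0.01):
        # Keep every formula whose probability is at least beta times
        # that of the most probable formula for this word.
        best = max(probs.values())
        return {f for f, p in probs.items() if p >= beta * best}

    probs = {r"(np\s_main)/np": 0.62, r"np\s_main": 0.30, "n": 0.002}
    print(select_supertags(probs))
    # {'(np\\s_main)/np', 'np\\s_main'} -- 'n' falls below the threshold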

4.5 Comparison With the CCGbank

Apart from the obvious theoretical differences between CCG and type-logical grammars and the different treatment of certain linguistic phenomena — such as extraction — that this implies, it is worth spending some time on some of the less obvious differences between the two treebanks. Whereas the CCGbank uses a certain number of non-combinatory rules (notably for extraposition and coordination, but also to transform passives np\s_pass into adjectives n\n and (bare) nouns n into noun phrases np), the current treebank uses no non-logical rules. As a result, the lexicon of the type-logical treebank does more of the work (and consequently, the task of the supertagger is more difficult). If we want to reduce the size of the lexicon in a way similar to the CCGbank, there are two basic options:

– the first option is to allow non-logical rules in the same spirit as the CCGbank,
– the second option, more in line with the general spirit of type-logical grammars, is to exploit the derivability relation and to replace the analysis of passives by a formula F such that F ⊢ n\n (see Section 4.4.2 of [18] for a particularly nice solution).

However, we leave the transformation of the proofs in the corpus in these two ways to future research.

5 Tools

To facilitate annotation, correction and parsing, several tools have been developed, using a combination of Prolog and Tcl/Tk. In addition, several well-known tools have been used for the exploitation of the corpus: the Stanford Tregex tool [14] for browsing and querying the French Treebank (as well as some of its transformations) and the C&C tools [10] for training POS-tag and supertag models using the annotated corpus. Figure 5 on the next page shows a screenshot of the interface to the supertagger and parser. This "horizontal" interface allows the user to type in sentences and see the resulting semantic output from the parser. The darker-shaded percentage of the block to the left of each formula gives a visual indication of the probability assigned to the formula (the exact numbers can be seen by moving the mouse over the corresponding area). Apart from some configuration options, this interface is not interactive. Figure 6 shows a screenshot of the "vertical" interface to the parser and supertagger. This is an interactive interface, allowing the user to select (or type in) the desired formula — to help prevent errors, the current frequency of the chosen formula for the current word is displayed after a manual choice of formula — as well as to select the parser rule applications by clicking on one of the premisses of a rule (an additional dialog pops up in case the rule choice is ambiguous). The weight column shows the log-probability of the item.

Fig. 5. Screenshot of the supertagger interface

Fig. 6. Screenshot of the interactive parser

6 Bootstrapping

Given that the French Treebank is somewhat small compared to other treebanks, and given that the conversion of the FTB to the type-logical treebank was rather labour-intensive, it makes sense to look at more effective and efficient ways of increasing the size of the treebank. The tools described in the previous section, interfacing with the supertagger and the parser for the core corpus, are useful in this respect. Currently, slightly over 1,600 additional sentences have been annotated (for a total annotated corpus of 14,539 sentences and 421,348 words). Most of these sentences come from the Sequoia treebank [8] and the French TimeBank [3]. The observed accuracy of the supertagger for the sentences from the L'Est Républicain newspaper is slightly lower than the results reported in Section 4.4: in 88.1% of cases, the best supertag is correct, and in 97.6% of cases the correct supertag has probability greater than 1% of that of the best supertag (compared to 90.6% and 98.4% respectively for the cross-validated results). Part of this difference might be attributed to stylistic differences between the two newspapers (initial experiments with annotating unseen sentences from Le Monde seem to confirm this), but it may also be the case that cross-validation gives a somewhat optimistic picture of actual performance on unseen data from other sources (the different training and test sets not being completely independent).

7 Obtaining the Tools and Resources

All tools, as well as the POS-tagger and supertagger models and a semantic lexicon in the style of [4], are available from the author's website under the LGPL licence. The TLGbank, being a derived work, is available under the same licensing conditions as the French Treebank. The Sequoia/L'Est Républicain part of the treebank is available under the LGPL-LR licence.

8 Conclusions

We have shown how the French Treebank has been semi-automatically transformed into a set of derivations in multimodal type-logical grammars. This is an important first step in training and evaluating wide-coverage type-logical parsers, and we hope to see several competitive type-logical parsers in the future.

References

1. Abeillé, A., Clément, L., Kinyon, A.: Building a treebank for French. In: Proceedings of the Second International Language Resources and Evaluation Conference. Greece (2000)
2. Bangalore, S., Joshi, A.: Supertagging: Using Complex Lexical Descriptions in Natural Language Processing. MIT Press (2011)
3. Bittar, A.: Building a TimeBank for French: A Reference Corpus Annotated According to the ISO-TimeML Standard. Ph.D. thesis, Université Paris Diderot (2010)
4. Bos, J., Clark, S., Steedman, M., Curran, J.R., Hockenmaier, J.: Wide-coverage semantic representation from a CCG parser. In: Proceedings of the 20th International Conference on Computational Linguistics (COLING-2004). pp. 1240–1246. Geneva, Switzerland (2004)

5. Bos, J., Curran, J.R., Guzzetti, E.: The Pronto QA system at TREC-2007: harvesting hyponyms, using nominalisation patterns, and computing answer cardinality. In: Voorhees, E.M., Buckland, L.P. (eds.) The Sixteenth Text REtrieval Conference, TREC 2007. pp. 726–732. Gaithersburg, MD (2007)
6. Bos, J., Markert, K.: Recognising textual entailment with logical inference. In: Proceedings of the 2005 Conference on Empirical Methods in Natural Language Processing (EMNLP 2005). pp. 628–635 (2005)
7. Buszkowski, W., Penn, G.: Categorial grammars determined from linguistic data by unification. Studia Logica 49, 431–454 (1990)
8. Candito, M., Seddah, D.: Le corpus Sequoia : annotation syntaxique et exploitation pour l'adaptation d'analyseur par pont lexical. In: Proceedings of Traitement Automatique des Langues Naturelles (TALN). Grenoble (2012)
9. Carpenter, B.: Categorial grammars, lexical rules and the English predicative. In: Levine, R. (ed.) Formal Grammar: Theory and Practice. No. 2 in Vancouver Studies in Cognitive Science, University of British Columbia Press, Vancouver (1991)
10. Clark, S., Curran, J.R.: Parsing the WSJ using CCG and log-linear models. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-2004). pp. 104–111. Barcelona, Spain (2004)
11. Dybro-Johansen, A.: Extraction automatique de grammaires à partir d'un corpus français. Master's thesis, Université Paris 7 (2004)
12. Hendriks, P.: Ellipsis and multimodal categorial type logic. In: Morrill, G., Oehrle, R.T. (eds.) Proceedings of Formal Grammar 1995. pp. 107–122. Barcelona, Spain (1995)
13. Hockenmaier, J., Steedman, M.: CCGbank, a corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics 33(3), 355–396 (2007)
14. Levy, R., Andrew, G.: Tregex and Tsurgeon: tools for querying and manipulating tree data structures. In: 5th International Conference on Language Resources and Evaluation (LREC 2006). (2006)
15. Moortgat, M.: Categorial type logics. In: van Benthem, J., ter Meulen, A. (eds.) Handbook of Logic and Language, chap. 2, pp. 95–179. North-Holland Elsevier, Amsterdam (2011)
16. Moot, R.: Extended Lambek calculi and first-order linear logic. To appear in Springer Lecture Notes in Artificial Intelligence (2013)
17. Moot, R., Retoré, C.: The Logic of Categorial Grammars. Lecture Notes in Artificial Intelligence, Springer (2012)
18. Morrill, G.: Categorial Grammar: Logical Syntax, Semantics, and Processing. Oxford University Press (2011)
19. Morrill, G., Valentín, O., Fadda, M.: The displacement calculus. Journal of Logic, Language and Information 20(1), 1–48 (2011)
20. Schluter, N., van Genabith, J.: Treebank-based acquisition of LFG parsing resources for French. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC'08). Marrakech, Morocco (2008)

Clustering for categorial grammar induction

Noémie-Fleur Sandillon-Rezer
[email protected]

LaBRI, CNRS, 351 cours de la Libération, 33405 Talence

Abstract. In this paper, we describe the way we use hierarchical clustering to learn an AB grammar from partial derivation trees. We describe AB grammars and the derivation trees we use as input for the clustering, then the way we extract information from treebanks for the clustering. The unification algorithm, based on the information extracted from our clusters, is then explained and the results discussed.

1 Introduction

The aim of this article is to present a new grammatical inference method. The input of our inference procedure is partially specified derivation trees; hierarchical clustering is then used to decide which variables must be unified. The basic idea is that words in similar contexts get similar formulas. The rich information from the treebanks is used in the clustering step, and we do not use a supertagger to tame the lexical ambiguity, given that we already did so in [Sa2] and we wanted to stay close to a Buszkowski and Penn style of inference. The categorial grammars we use are AB grammars, defined by Ajdukiewicz [Aj1] and Bar-Hillel [Ba1], which form the core of categorial grammars and are a subsystem of both the Lambek calculus [La1] and Combinatory Categorial Grammar¹ [Ho1]. They have only two elimination rules (see Table 1).

Table 1. The elimination rules for AB grammars

    A/B, B ⟹ A    [/E]
    B, B\A ⟹ A    [\E]
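As a small illustration (an ad-hoc tuple encoding of our own, not anything from the paper), the two elimination rules can be implemented directly:

    # Atoms are strings; A/B is encoded as ("/", A, B) and B\A as ("\\", B, A).
    def forward(left, right):
        """/E: A/B followed by B yields A (None if the rule does not apply)."""
        if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
            return left[1]

    def backward(left, right):
        """\\E: B followed by B\\A yields A."""
        if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
            return right[2]

    tv = ("/", ("\\", "np", "s"), "np")   # (np\s)/np, a transitive verb
    vp = forward(tv, "np")                # np\s
    print(backward("np", vp))             # 's'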

They have been used since Lambek's work to formally represent natural language, and while some constructions are complex to treat, solutions are summarized in the article by Moot [Mo2]. Grammatical inference for categorial grammars is performed in three ways, depending on the method and the input structures.

¹ Though note that we follow Lambek's convention of always having the result category above the slash.

The method of Adriaans [Ad1, TA1] starts from sentences with no structure. Though this works well in many cases, it is hard for such grammars to correctly infer PP attachments, and since there are treebanks where this information has been correctly annotated, it makes sense to exploit it. Methods using partial structures are described by Buszkowski and Penn [Bu1, BP1] and Kanazawa [Ka1], and clearly fall within Gold's paradigm of learning [Go1]. The input structures are described later (Section 3), but the output is a rigid grammar in the case of Buszkowski and Penn, or a k-valued grammar in the case of Kanazawa's algorithm. A rigid grammar is not representative of a natural language, and when k ≥ 2, it has been proved that the problem of grammatical inference is NP-hard [CF1]. However, even a 2-valued grammar cannot represent natural languages: experience with extracted grammars shows that the maximum number of types per word is typically large, with many frequent words having over forty types [Ho1, SM1]. Finally, we have the methods using fully specified structures, like that of Hockenmaier [Ho2], which computes a Combinatory Categorial Grammar, or Sandillon-Rezer and Moot [SM1], which uses a generalized tree transducer to transform syntactic trees into derivation trees of an AB grammar [La1]. It is the output of the last method that serves both as a gold standard for evaluation and, after erasing the manually annotated formulas while keeping only the rule names, as input to our inference algorithm; we focus on the way we get trees in Section 2. In this article we combine the second method, which uses partial structures complemented with some information from the treebanks, with clustering. We evaluate both the complexity of the task and the quality of the obtained lexicon. Clustering is done using a similarity measure based on the local context of the words, directly extracted from a treebank. The derivation trees are extracted from annotated treebanks. We use two different treebanks for this purpose: the French Treebank [AC1] and Sequoia [CS1]. The two corpora are syntactically annotated using similar annotation protocols [AC2]. The main differences between the two are the number of sentences and the origin of these sentences. The French Treebank is composed of 12,351 sentences which come from a selection of articles from the newspaper Le Monde, and Sequoia is composed of 3,204 sentences coming from various sources, like Wikipedia, the newspaper L'Est Républicain, or medical references. Figure 1 shows an example of a syntactic tree. The pre-terminal nodes contain the POS-tag of the word, the other internal nodes contain the phrasal type of their subtree and the leaves represent the words of the sentence. Since the annotation format does not correspond to AB derivation trees, we use a generalized tree transducer to transform the trees from both treebanks into AB derivation trees. We will start by describing the AB grammar generated by the tree transducer, then we will briefly recall the general unification algorithm for AB grammars and describe the one we use. In the fourth section, we will describe the format of the vectors we use as input for the clustering step. The evaluation of our method will be shown just after, and a discussion about possible extensions of this work will precede the conclusion.

[Figure 1 shows an FTB syntactic tree: SENT dominating NP-SUJ (Le CBV), VN (autorise), NP-OBJ (Indosuez), VPinf-A OBJ (à ne pas déposer d'OPA sur la CPR) and PONCT.]

Fig. 1. Example of syntactic tree from the French Treebank: "The CBV allows Indosuez not to make a tender offer on the CPR."

2 Getting AB derivations

The derivation trees of an AB grammar represent the successive applications of elimination rules. To get them, we use a generalized top-down tree transducer, which transforms the trees from the two treebanks into AB derivation trees. The basics of transducers can be found in [Co1], and the specification of the G-transducer that we use is explained in [SM1, Sa1]. Figure 2 shows an example of the output of the transducer when given the tree of Figure 1. Fewer than 1,650 transduction rules, which we created ourselves, are needed to convert 92% of the French Treebank (93% of Sequoia).

[Figure 2 shows the AB derivation tree produced by the transducer for the tree of Figure 1, with each node labelled by its formula, e.g. VN ((np\s)/(np\s_i))/np for "autorise".]

Fig. 2. Output of the transducer. The information from the syntactic tree remains. In the real output, we suppress the types on the leaves, which are inherited from the pre-terminal nodes.

The transducer uses a small number of atomic types for the transduction: np, n, s, txt, pp, cs, clr, which correspond respectively to: noun phrase, noun, sentence, text (a sentence with an ending punctuation), prepositional phrase, subordinate clause and reflexive clitic. In addition, the types np\s_p and np\s_i are given for past participle and infinitival groups. However, the transducer is quite modular: one can create a new set of rules with new atomic types, and the transducer will apply them as long as the rules create binary trees and the types stay coherent according to the elimination rules of the Lambek calculus. We will use these types to initialize our trees before the unification step; the input format will be described in Section 3.

3 Grammar induction

A well known grammatical induction algorithm for rigid AB grammars is described by Buszkowski and Penn [BP1, Bu1]. When it comes to learning a rigid grammar, it is quite simple: either the types can be unified to get one type per word, or the algorithm fails. The algorithm created by Kanazawa [Ka1] learns a k-valued grammar. The two algorithms, however, need the same input format; Figure 3 shows an example. For a k-valued grammar, deciding which of the types to unify is more complicated, and the best solution for a grammar can generally only be decided globally. As a consequence, grammar inference is known to be NP-hard [CF1] when k ≥ 2. Moreover, k is generally unknown in advance. To illustrate the unification problem, we will take two sentences from the French Treebank: "Le gouvernement n'avait ni écrit ni choisi cet accord dont nous avons hérité" (The government did not write nor choose this agreement we inherited) and "Nous avons de plus en plus de monde dans les DOM-TOM" (We have more and more people in the DOM-TOM). By applying the Buszkowski and Penn algorithm, the verb "avons" will get two types which look alike: (a\b)/c and (d\e)/f, because in each case it takes two arguments, the first one on its left and the second one on its right. However, we cannot unify the two types for "avons", because the right argument is a noun (de plus en plus de monde) in one case and a past participle (hérité) in the other, and the two must not get the same type, to avoid ungrammatical sentences. For these two sentences we need at least a 2-valued grammar, and "avons" needs to get the two types (np\s)/np and (np\s)/(np\s_p). Between these standard inference algorithms and ours, there are two main differences. The first one is that we use some of the types from the treebank annotations, as summarized in Table 2. The types assigned to nodes have been chosen according to their frequency in the lexicon extracted from the transduced trees, with a cutoff of 75%: above 75 percent, we assume that the type is clearly predominant and can be given whenever the POS-tag and the word are used as an argument. If a label is not in this table, its type will be set to a variable (if it is an argument). If the node is a functor, its type is instantiated simultaneously with that of its argument. The input trees are then specified derivation trees where some sub-formulas contain free variables.

[Figure 3 shows the derivation tree of Figure 1 with all formulas replaced by type variables a, b, c, ..., connected by the /E and \E rule applications.]

Fig. 3. Example input for the Buszkowski and Penn algorithm

Grammatical inference consists of transforming these partially specified derivation trees into complete derivation trees. An example of our input trees is shown in Figure 4. We can note that some words, even if their POS-tags are not in Table 2, already have a complex type without variables.

Table 2. List of types assigned to the nodes that have the given label, if and only if they are arguments.

    label   type      label     type
    TEXT    txt       SENT      s
    NP      np        NP-ARG    np
    PP      pp        AP-ARG    n\n
    CLS     np        CLS-SUJ   np
    NC      n         NPP       np
    VPP     np\s_p    VINF      np\s_i

The other difference is the use of clusters to guide the unification step, as discussed below. A cluster groups words by similarity between their vectors. At each cluster level, we apply a unification step. We choose a priority order² according to the clusters, level by level, which can be summed up as:

1. unify the smallest clusters,
2. unify the clusters where there is only one choice per variable,
3. unify with the simplest candidate (the complexity of a candidate is computed from the number of slashes / and \ it contains),

² The steps are sorted by accuracy for the effectiveness of the unification.

[Figure 4 shows the input tree of our algorithm for the sentence of Figure 1: the types of Table 2 remain (np, np\s_i, ...), while the others are replaced by variables a1, b1, c1.]

Fig. 4. Input tree of our inference algorithm. Some types remain from the previous tree, others are replaced by variables.

4. select the first candidate found for the other variables, with the possibility to randomize the unification.

Unfortunately, after the third step a variable may still have more than one possible unification, and we cannot yet determine which one is the best. In this case, we use an arbitrary unification. Two methods can be applied: the "first-found" method, which picks the first candidate in the list of all possible unifications (so it always gives the same result), or randomly picking one possible unification (the results are never the same, but sometimes they are better).

If variables remain in the types (because not all words may be represented at a given level), we proceed to a new cluster level. This way of processing first unifies the variables which are the most similar. Even with variables, the derivation tree remains a valid AB derivation: the two elimination rules are the only ones used to create the trees.
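The sketch below illustrates steps 1–3 under assumptions of our own: clusters are plain lists of variables and the candidate sets are given directly, whereas in the real system they come from the clustering and the derivation trees.

    def complexity(formula):
        # Number of connectives, used to prefer the simplest candidate.
        return formula.count("/") + formula.count("\\")

    def unify_level(clusters, candidates):
        """clusters: lists of variables; candidates: variable -> possible types."""
        subst = {}
        for cluster in sorted(clusters, key=len):        # step 1: smallest first
            for var in cluster:
                options = candidates.get(var, [])
                if len(options) == 1:                    # step 2: forced choice
                    subst[var] = options[0]
                elif options:                            # step 3: simplest type
                    subst[var] = min(options, key=complexity)
        return subst

    print(unify_level([["a1"], ["b1", "c1"]],
                      {"a1": [r"np\s_i"], "b1": ["pp", r"(n\n)/np"], "c1": []}))
    # {'a1': 'np\\s_i', 'b1': 'pp'}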

4 Clustering

4.1 Vector extraction

We have decided to assign vectors to words, extracting information from the treebanks before transduction. The vectors have six dimensions:

1. POS-tag of the word (mother node);
2. syntactic node (grandmother node);
3–4. POS-tags of the left and the right neighbour node;
5–6. distance from the nearest common ancestor with the left and the right neighbour node.

le₁  ⟨DET, NP-SUJ, NILL, NC, −5, 2⟩
le₂  ⟨DET, NP-MOD, VPP, ADJ, 3, 2⟩

Fig. 5. Two vectors corresponding to the determiner le (the).

If there is no left or right neighbour (first or last word of a sentence), the value of the corresponding coordinate of the vector is set to NILL or −5, depending on whether it is a label or a number. Two example vectors are shown in Figure 5. To compare the vectors, we need to transform them into vectors in Zⁿ, n ∈ N. We choose to transform each label into a vector where one or two dimensions have the value 1 and the others are equal to 0. The POS-tags and the syntactic data are transformed this way, while the distance from the nearest common ancestor stays as it is. The transformation is illustrated in Table 3 with only a part of all the data. Of course, there is a "dimension" for almost all POS-tags (some exceptions are the ones we want to see unified together, like ET for foreign words and NPP for proper nouns); for the syntactic information, we only make a distinction between an argument (marked by -SUJ, -OBJ, -ATS, ... at the end of the label) and a modifier (-MOD). The P+D POS-tag corresponds to a word which gathers a preposition and a determiner ("du" or "au" instead of "de le" or "à le"); that is why the coordinates corresponding to determiner and preposition are both set to 1.

Table 3. Example of vector transformations.

    POS-tag   NC   DET   P
    NC        1    0     0    0...0
    DET       0    1     0    0...0
    P+D       0    1     1    0...0

    label     NP   ...      -ARG   -MOD
    NP        1    0...0
    NP-SUJ    1    0...0    1      0
    NP-MOD    1    0...0    0      1

The transformed vectors can be used with R [IG1] without further transformations.
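The encoding of Table 3 amounts to the following one-hot transformation; the tag inventories below are truncated, illustrative assumptions.

    POS = ["NC", "DET", "P", "ADV"]          # one dimension per POS-tag
    EXPAND = {"P+D": ["P", "DET"]}           # P+D switches on two dimensions

    def encode_tag(tag):
        comps = EXPAND.get(tag, [tag])
        return [1 if p in comps else 0 for p in POS]

    def encode_syn(label):
        head, _, func = label.partition("-")
        base = [1 if head == h else 0 for h in ["NP", "PP", "AP"]]
        is_mod = func == "MOD"
        is_arg = bool(func) and not is_mod   # -SUJ, -OBJ, ... are arguments
        return base + [int(is_arg), int(is_mod)]

    print(encode_tag("P+D"))      # [0, 1, 1, 0]
    print(encode_syn("NP-SUJ"))   # [1, 0, 0, 1, 0]
    print(encode_syn("NP-MOD"))   # [1, 0, 0, 0, 1]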

4.2 Creation of clusters

To compute the hierarchical clustering, we use the software R [IG1] with the Manhattan distance metric and, for the clustering itself, Ward's minimum variance method [Wa1]. For a better overview of the whole clustering, we use Tulip [AM1], which produces graphs like the one in Figure 6. The details of the cluster graph will be shown in the next section.

Fig. 6. Cluster corresponding to our test-set sentences. The part of the cluster circled with dots is where most of the verbs are; the dashed circle corresponds to the determiners, and the plain circle to the adjectives. We notice that the adjectives and the determiners are close to each other.
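The authors work in R; a rough Python equivalent of the setup described above pairs Manhattan distances with Ward linkage (note that Ward linkage is normally defined over Euclidean distances, so this mirrors the paper's choice only approximately).

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import pdist

    vectors = np.array([[0, 1, 0, 0, -5, 2],    # le_1, cf. Fig. 5
                        [0, 1, 0, 0,  3, 2],    # le_2
                        [1, 0, 0, 0,  2, 1]])   # an invented noun vector
    dist = pdist(vectors, metric="cityblock")   # Manhattan distance
    tree = linkage(dist, method="ward")         # hierarchical clustering
    print(fcluster(tree, t=2, criterion="maxclust"))  # e.g. [1 1 2]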

5 Evaluation

Before evaluating our method, we want to give some baselines for it:

– If we unify the types only when there is one solution, a lot of word-type pairs are left with variables: around 80%. However, there is a lower number of variables, due to the unification (a ratio of 47%).
– If we only use the first-found method or the all-random method to unify clusters, no variables remain, but the comparison with the gold-standard lexicon is really bad (less than 50% of identical types).
– If we try to unify without the clustering step, the method does not take into account the context of words, and becomes equivalent to trying to reduce the lexicon of each word, which is not the point of our work: we would rather have the same type for different words which occur in the same context.

We ran our method over 754 sentences of Sequoia and a set of 553 sentences from the French Treebank, choosing the sentences at random. Table 4 shows the effectiveness of the method, computed from the percentage of variables remaining after unification.

Table 4. Remaining variables after the various unifications. The first two baselines have been tested with the 553 sentences of the French Treebank.

                      754 Sequoia   553 FTB   One-solution unif.   First-found
    not unified       3             0         364                  0
    number of var.    1,429         686       686                  686
    rate              99.8%         100%      46.9%                100%

Nevertheless, we cannot judge the effectiveness of our method only by the remaining variables. Indeed, if the variables are unified but the resulting types are overly complex, we cannot say that our method performs well, even though all sentences are derivable. Comparing the lexicons extracted from the transduced trees and from the unified words over the 754 sentences of Sequoia, we note that the lexicon similarity is 82.7%: only 2,967 word-type pairs differ out of the 17,110 pairs of the lexicons, corresponding to 1,012 word forms. We focused our study on the 553 sentences of the French Treebank. We compare the lexicon extracted from the unified trees with the one from the transducer, which corresponds to a lexicon of 2,076 different words, with 4,808 associated types for 5,731 total occurrences. The differences between the two lexicons concern about 329 words, corresponding to only 859 word-type pairs, that is 14.9% of the lexicon; in other words, 85.1% of the lexicons are identical. We count as identical modifications like an np which becomes an n, or the inversion between a pp and a pp_de or pp_à (all three correspond to prepositional phrases, but the last two add the information that the preposition is à or de), but these cases are really rare (less than 2% of the lexicon). Table 5 sorts the differences into three categories: the errors, the similar pairs and the identical pairs. We can give two examples of these categories:

The past participle "accumulé" (accumulated) gets the type np\s_p from unification and n\n from the transducer. The type given by the unification corresponds to a VPP used as a past participle and not as an adjective, but in the CCGbank [Ho1] a special rule permits the translation of np\s_p into n\n, so treating the two as similar for the purpose of this evaluation seems justified. The verb form "change" (change) gets the type np\s instead of (np\s)/np. It is a real error: instead of being treated as a transitive verb (which takes two arguments), it is treated as an intransitive verb. The error comes from the clustering step, where "change" is near an intransitive verb. However, it is important to remember that even with the inconsistencies, the sentences can still be parsed with the types given by unification. Figure 7 shows two level-0 clusters. The left one is a cluster of adverbs; the variable ab331 will be unified with np. The right cluster only has variables that will be unified together; this cluster gathers adjectives and participles used as adjectives.
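The lexicon similarity figures above can be read as the share of word-type pairs common to the gold and unified lexicons; the sketch below spells out that comparison under our reading (it is not the authors' evaluation script), with invented toy lexicons.

    from collections import Counter

    def lexicon_similarity(gold, predicted):
        # Lexicons as multisets of (word, type) pairs; similarity is the
        # proportion of gold pairs also produced by unification.
        g, p = Counter(gold), Counter(predicted)
        return sum((g & p).values()) / sum(g.values())

    gold = [("avons", r"(np\s)/np"), ("avons", r"(np\s)/(np\s_p)"), ("le", "np/n")]
    pred = [("avons", r"(np\s)/np"), ("avons", r"np\s"), ("le", "np/n")]
    print(f"{lexicon_similarity(gold, pred):.1%}")   # 66.7%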

Table 5. Ratio between the differences, counted in words. Even for the baselines, the ratio is computed over the whole lexicon.

    Unification + clustering    identical   4,832   85.1%
                                similar     5,168   91.3%
    One-solution unification    identical     728   12.7%
                                similar     1,164   20.3%
    First-found method          identical   2,745   47.9%
                                similar     3,576   62.4%

Fig. 7. Focus on two clusters of level 0.

6 Discussion

The method we use is designed to run on derivation trees: we use unsupervised learning, but with very informative input structures including some syntactic information. However, it could be extended, with some modifications, to sentences lacking such structures, using the raw sentences together with the derivation trees. The problem is to get vectors from sentences that do not have a syntactic tree. We could use vectors in a subspace, because some information, like the POS-tags of words, can easily be retrieved from a tagger like the one of Moot [Mo1]. Another solution is to retrieve the structures using the Stanford Parser [GM1] and then apply our algorithm without modifying anything. Then, we can cluster these vectors with the other ones, making a projection to lower the number of dimensions. This way, the words can be given the most used type of a level-0 cluster. This gives us a better overview of the words than if we just gave them the most used type in the lexicon corresponding to their POS-tag. Given that our vectors are high-dimensional and sparse, we could also apply the method described by Kailing et al. [KK1] to manage them.

6.1 Scaling up to Larger Treebanks

We plan to apply the current method to larger datasets, but we will have to deal with much larger clusterings for Sequoia (more than 63,000 words) or the French Treebank (around 300,000 words). Though the complexity remains polynomial, O(n³) would still be problematic for these larger datasets. Note, however, that this is a significant improvement over other learning algorithms, since learning k-valued grammars has an exponential complexity.

6.2 Optimized type unification

For the moment, we use the "first found" criterion to unify variables when there is no other choice criterion. A better solution would be to look at all the variables, assign each of them a list of possible unifications, and use the Hungarian algorithm [Ku1, Mu1] to choose the unification which produces the set of formulas that is the smallest in terms of the total sum of connectives after unification.
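A hedged sketch of this proposal casts the variable/candidate pairing as an assignment problem (the candidate list is invented, and a realistic cost matrix would also have to rule out infeasible unifications):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    variables = ["a1", "b1"]
    candidates = ["np", "pp", r"(np\s)/np"]

    def complexity(formula):
        return formula.count("/") + formula.count("\\")

    # Cost of assigning a candidate to a variable: its connective count.
    cost = np.array([[complexity(c) for c in candidates] for _ in variables])
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    print({variables[r]: candidates[c] for r, c in zip(rows, cols)})
    # {'a1': 'np', 'b1': 'pp'}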

7 Conclusion

In this article, we have shown a new method to extract an AB grammar by unification, using clusters to guide us. Some proposals for improving this work, focused on the treatment of coordinations and modifiers, are discussed in [Sa3]. As explained in Section 6, it opens up many directions for future work. We decided to use hierarchical clustering, which enabled us to unify the lexicon step by step, either until convergence or, very exceptionally, until a conflict blocks further unification. However, it would be interesting to test our method with other types of clustering and distance metrics, like the k-means method or the one called "Clustering By Committee", described in [Pa1]. The latter method searches for the best centroids of clusters, which are supposed to be representative of their clusters; but it cannot be applied here in its present state, because we want to perform a unification after the clustering, and the types of the centroids are not defined. The results we have are promising, since in spite of the fact that the input is less detailed (and therefore requires fewer annotator hours), they are near our gold standard: 91.3% of the lexicon is similar, and 85.1% identical.

References

[AC1] Abeillé, A., Clément, L., Toussenel, F.: Building a treebank for French. Treebanks: Building and Using Parsed Corpora, p. 165–188 (2003)
[AC2] Abeillé, A., Clément, L.: Annotation morpho-syntaxique (2003)
[Ad1] Adriaans, P.W.: Learning shallow context-free languages under simple distributions. Algebras, Diagrams and Decisions in Language, Logic and Computation, CSLI/CUP (2001)
[Aj1] Ajdukiewicz, K.: Die syntaktische Konnexität. Studia Philosophica (1935)
[AM1] Auber, D., Mary, P.: Tulip: Better visualization through research (2007)
[Ba1] Bar-Hillel, Y.: Language and information: selected essays on their theory and application. Addison-Wesley Pub. Co. (1964)
[Bu1] Buszkowski, W.: Solvable problems for classical categorial grammars. Bull. Pol. Acad. Sci. Math., p. 373–382 (1987)

[BP1] Buszkowski, W., Penn, G.: Categorial grammars determined from linguistic data by unification. Studia Logica, p. 431–454 (1990)
[CS1] Candito, M., Seddah, D.: Le corpus Sequoia : annotation syntaxique et exploitation pour l'adaptation d'analyseur par pont lexical. TALN'2012 proceedings, p. 321–334 (2012)
[Co1] Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M.: Tree automata techniques and applications. Available on: http://www.grappa.univ-lille3.fr/tata (2007)
[CF1] Costa-Florêncio, C.: Consistent identification in the limit of any of the classes k-valued is NP-hard. LACL 2001 proceedings, p. 125–138 (2001)
[GM1] Green, S., Marneffe, M.C., Bauer, J., Manning, C.D.: Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French. EMNLP proceedings, p. 725–735 (2011)
[Go1] Gold, E.: Language identification in the limit. Information and Control (1967)
[Ho2] Hockenmaier, J.: Data and models for statistical parsing with Combinatory Categorial Grammar. PhD thesis (2003)
[Ho1] Hockenmaier, J., Steedman, M.: CCGbank: a corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics, p. 355–396 (2007)
[IG1] Ihaka, R., Gentleman, R.: R project (1993)
[KK1] Kailing, K., Kriegel, H., Kröger, P.: Density-connected subspace clustering for high-dimensional data. Proceedings of the Fourth SIAM International Conference on Data Mining, p. 246–257 (2004)
[Ka1] Kanazawa, M.: Learnable Classes of Categorial Grammars. Center for the Study of Language and Information, Stanford University (1994)
[Ku1] Kuhn, H.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, p. 83–97 (1955)
[La1] Lambek, J.: The mathematics of sentence structure. The American Mathematical Monthly 65, p. 154–170 (1958)
[Mo1] Moot, R.: Wide-coverage semantics for spatio-temporal reasoning. Traitement Automatique des Langues 52, p. 1–28 (2012)
[Mo2] Moot, R.: Semi-automated extraction of a wide-coverage type-logical grammar for French. Traitement Automatique des Langues proceedings (2010)
[Mu1] Munkres, J.: Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, p. 32–38 (1957)
[Pa1] Pantel, P.: Clustering by committee. PhD thesis (2003)
[SM1] Sandillon-Rezer, N.F., Moot, R.: Using tree transducers for grammatical inference. Proceedings of Logical Aspects of Computational Linguistics (2011)
[Sa1] Sandillon-Rezer, N.F.: Learning categorial grammar with tree transducers. ESSLLI Student Session Proceedings, p. 175–181 (2011)
[Sa2] Sandillon-Rezer, N.F.: Ygg, parsing French text using AB grammars. LACL demo session, p. 17–20 (2012)
[Sa3] Sandillon-Rezer, N.F.: Clustering pour grammaires catégorielles du second ordre. Mixeur proceedings, p. 88–91 (2013)
[TA1] Trautwein, H., Adriaans, P., Vervoort, M.: Towards high speed grammar induction on large text corpora. SOFSEM '00 proceedings, p. 173–186 (2000)
[Wa1] Ward, J.: Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, p. 236–244 (1963)

Constructing a Phenomenal Corpus: Towards Detecting Linguistic Phenomena in Precision Grammars

Ned Letcher and Timothy Baldwin

Department of Computing and Information Systems, The University of Melbourne
[email protected], [email protected]

Abstract. Precision grammars and their accompanying treebanks are valuable linguistic resources, but due to their inherent complexity, as well as differences in underlying theoretical approaches and differing styles of implementation between grammar engineers, the discoverability of linguistic phenomena is poor. In this paper, we describe the methodology used for the manual annotation of a small syntactic phenomenon corpus of English. When scaled up to a larger corpus, this resource will be used to link structures in parser output — and thus grammar-internal components — to the linguistic phenomena whose analysis they pertain to. This resource will also provide an opportunity for phenomenon-specific parser evaluation.

Keywords: Precision grammar, grammar engineering, linguistic phenomenon detection, discoverability

1 Introduction

Precision grammars are machine readable grammars of natural language which aim to accurately model the linguistic constraints of specific languages. The linguistic fidelity of precision grammars, combined with their ability to process large amounts of naturally occurring language, means that these grammars and their respective treebanks constitute a significant opportunity for the exploration of linguistic phenomena (Bender, 2008). However, identifying components of grammars responsible for implementing specific phenomena can be problematic due to the interaction between linguistic phenomena and difficulties in their documentation.

The problem this paper is concerned with is the discoverability of linguistic phenomena within precision grammars. By linguistic phenomena, we mean the components of language that linguists concern themselves with when describing language and that are amenable to structural analysis by a formal grammar.

In this paper, we argue that precision grammars would be more valuable linguistic resources if the linguistic phenomena they cover were more discoverable, and additionally, that it would be desirable to have automated approaches for achieving this. We present a proof-of-concept linguistic phenomenon corpus of English, annotated independently of existing grammar formalisms, and outline the methodology used in its construction. By comparing parser output of items from this corpus with phenomenon annotations, this corpus will provide a means by which grammar-internal components can be linked with the phenomena they are associated with. Additionally, this resource will also facilitate fine-grained phenomenon-specific parser evaluation.

2 Background

Since precision grammars are both machine readable and designed with the express purpose of modelling the linguistic constraints of language, prima facie it might seem that the identification of linguistic phenomena within precision grammars would be straightforward. The reality is far from this. In the rest of this section, we outline the difficulties involved in pinpointing components of precision grammars which pertain to specific phenomena and the factors which contribute towards this.

2.1 One Does Not Simply grep for Linguistic Phenomena

In the context of someone wishing to determine whether a precision grammar covers a particular linguistic phenomenon, and if it does, which components of the grammar constrain the analysis of that phenomenon, a number of potential strategies can be employed.

One approach is to use a string-based search tool such as grep to search grammar source files and derivation trees produced by the grammar. This strategy is premised upon the assumption that names assigned to components of the grammar and node labels in derivation trees are likely to correspond to the phenomena whose analyses they constrain. While this assumption may be reasonable, there are a number of reasons why this approach will fail. The first is due to pragmatic constraints — it is often simply not possible to encode the complete names of phenomena, as this would result in unwieldy source code and derivation trees. For instance, the LinGO English Resource Grammar (Flickinger, 2002) uses the name v_cp_le for the lexical type of verbs that take a complementizer phrase as an argument. Without knowing this abbreviation in advance, however, the type would not be readily retrievable. This clearly motivates the maintenance of rigorous documentation; however, even in its presence, there are still significant barriers to phenomenon discovery.

Another problem with string-based searches is that it is not always clear what to search for. In descriptive linguistics, there are many different terms that have come to be used to describe different linguistic phenomena, many of which have near or sometimes exact synonyms. For instance, the terms subordinate clause, dependent clause and embedded clause are used throughout the literature. Adopting a standard for linguistic description — such as GOLD (Farrar and Lewis, 2005) — would likely ameliorate this difficulty. A limitation of GOLD, however, is that in aiming to be applicable across different languages and different formalisms, it focuses on capturing high-level phenomena such as sentential force and main versus subordinate clauses, whereas more nuanced phenomena such as raising and control constructions and ellipsis are not handled.

Phenomena may also not be identifiable via keyword search due to the analysis found in the grammar differing from the expected one. A cursory glance through the linguistic typology literature will reveal that linguistic phenomena are often subject to multiple competing analyses. When implementing an analysis of a phenomenon in a given language, a grammar engineer might need to select between a number of such analyses. For example, the passive voice can be analyzed in a number of different ways, including as a lexical rule, a raising-type construction or a gapping-type construction. Once a particular analysis has been selected, documentation and grammar components will refer to terminology pertaining to this analysis, and thus plausible queries for the phenomenon could fail. These two intertwined issues of competing nomenclature and analyses are further exacerbated when dealing with cross-formalism investigation. Precision grammar implementations exist for multiple grammar frameworks, such as LFG (Maxwell and Kaplan, 1996), HPSG (Copestake and Flickinger, 2000), and CCG (Baldridge and Kruijff, 2003). Each of these brings its own theoretical presuppositions to the table, increasing the potential for failed queries.
Even in cases where string-based search approaches do succeed in identifying relevant grammar components, there is a good chance that the results will not be exhaustive. This arises in part from the inherent complexity of precision grammars, with the analysis of many phenomena involving constraints across various grammar subsystems. Additionally, linguistic phenomena and their respective analyses within grammars interact with each other (Bender, 2008; Fokkens, 2011). When extending a grammar, it is possible that the analysis of a new phenomenon will interact poorly with analyses of existing phenomena, requiring additional constraints on superficially unrelated components. Tracking which phenomena these constraints were intended to pertain to becomes difficult. An example of this situation is discussed in Bender (2008), where a V2 analysis of word order in Wambaya interacts with the analysis of secondary predicates.

An alternative to string-based search approaches is to use a treebank search tool such as TGrep2 (Rohde, 2005) or Fangorn (Ghodke and Bird, 2010). The problem with this approach is that it assumes some knowledge of how the phenomenon is likely to be represented in a treebank — and thus also of how it is analyzed within the grammar — which introduces a circularity into the discovery process.
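The naming problem at the heart of this subsection can be made concrete with a minimal Python sketch of a naive string search over grammar-internal type names; only v_cp_le is taken from the ERG example above, while the other type names are invented for illustration.

grammar_types = ["v_cp_le", "v_np_trans_le", "n_rel_pro_le"]  # v_cp_le is real; the rest are hypothetical

for query in ["complementizer", "complement clause", "subordinate clause"]:
    # A descriptive query never matches the opaque abbreviation.
    hits = [t for t in grammar_types if query in t]
    print(query, "->", hits)

# Every query returns an empty list: without prior knowledge of the
# grammar's naming conventions, v_cp_le is simply not retrievable.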

2.2 Other Pitfalls when Searching for Linguistic Phenomena

An additional problem with the use of precision grammar-derived treebanks is not knowing in advance whether the treebank contains items that exhibit the phenomenon. Another possibility, then, is to manually parse items known to exhibit the phenomenon and inspect the contents of the derivation trees. Aside from being laborious, this approach runs into some significant problems.

Firstly, there may be other phenomena in the sentence that are unhandled by the grammar, thus blocking a successful parse and resulting in a false negative. The grammar might also overgenerate for this sentence, resulting in a false positive for the phenomenon. The grammar could also have an analysis of the phenomenon, but fail to parse the sentence due to the analysis not being general enough. There is also the problem of identifying the correct parse from the parse forest, given that the nature of the analysis in the grammar (if there is one) is unknown.

In addition to all of the aforementioned issues, there is also significant heterogeneity to be found within the architecture of precision grammars. Even within two grammars built using the same formalism and grammar engineering toolchain, it is possible to see dramatic differences in implementation arising from differing theoretical assumptions, engineering decisions or just general differences in grammar engineering styles, potentially making the navigation of previously unseen grammar source code difficult, even for grammar engineers experienced in the formalism. This can be seen in the Norsyg Norwegian Grammar,1 which uses the same formalism and grammar development environment used within the DELPH-IN consortium, but differs markedly from other DELPH-IN grammars in that the argument structure of predicates is determined through syntax rules, rather than being specified in the lexicon (Haugereid, 2004).

The result of all this is that for most precision grammars, without prior experience with their internal components, it is usually not possible to quickly ascertain whether a precision grammar covers a particular phenomenon, nor to determine which components are involved in its handling.

1 http://moin.delph-in.net/NorsygTop

3 A Phenomenal Vision

There has been significant work in the development of phenomenon-centric grammar engineering practices, such as the LinGO Grammar Matrix (Bender et al., 2010), CLIMB (Fokkens et al., 2012), and the ParGram project (King et al., 2005). These different approaches all try to build into the grammar engineering process strategies for identifying linguistic phenomena.

The ParGram project is distinct from the other cited projects, in that its focus on linguistic phenomena occurs towards the end of the grammar engineering pipeline, by harmonizing parse results from grammars of different languages so that phenomena are represented similarly. The LinGO Grammar Matrix involves the creation of a core module containing analyses of linguistic phenomena purported to be universal to all languages, and the curation of a set of libraries of linguistic phenomena that languages may or may not make use of. The Customization System provided by the Grammar Matrix enables users to select from these libraries with appropriate parameters to generate fully-functioning starter grammars, removing much of the initial overhead involved in grammar creation. CLIMB extends the Grammar Matrix's Customization System to provide a metagrammar engineering methodology for the ongoing development of grammars which is analysis-centric, allowing constraints not normally contiguous in the grammar to be grouped together.

Rather than altering the process by which grammars are developed, our proposed approach treats existing grammars as fixed resources from which we try to automate the detection and location of constraints that correspond to specific phenomena. We propose to use parser output known to exhibit specific linguistic phenomena to link those phenomena with the corresponding grammar-internal components which constrain them. We plan to do this by learning corresponding "signatures" from within data structures produced by the parser that are characteristic of specific phenomena. These signatures could take the form of flat clusters of node labels or commonly-occurring subtrees from the derivation trees produced by the parser, as well as patterns within other data structures produced by the parser. In order for this to work, however, what is first required is a corpus annotated with linguistic phenomena.

Bender et al. (2011) describe a method for semi-automated detection of phenomena within a treebank. This was performed by manually identifying lexical types and construction rules found in derivation trees considered characteristic of the phenomena under investigation. These were used to generate candidate items which were then vetted to verify the occurrence of the phenomenon. This approach has the advantage of being able to rapidly identify many candidates, but is limited by the fact that it can only be used to locate phenomena covered by the grammar, and biases results towards the analysis found in the grammar. Our approach differs in that we perform manual annotation of phenomena independently of any grammar, ensuring that we do not bias phenomena towards analyses found in specific grammars, nor exclude instances of phenomena not yet implemented.

As far as we are aware, there are no large-scale treebanks specifically annotated with linguistic phenomena. In the remainder of this section we outline some specific motivations for developing such a resource.

3.1 Incentives for Building a Phenomenon Corpus

Enhancing Descriptive Grammars Bender et al. (2012) argue that electronic descriptive grammars can be enhanced through augmentation with treebanks produced by precision grammars. These treebanks provide a ready source of exemplar items exhibiting the phenomena described in the reference grammar, which can be recalled via the use of a treebank search tool. Reference grammars then become more dynamic, with the number of potential exemplars increasing as the treebank grows in size. This approach requires authors to provide canned queries to retrieve appropriate exemplars. Automated phenomena detection could potentially relieve this burden; furthermore, as suggested by Bender et al., this could also allow users to navigate from arbitrary structures within exemplar items back to the relevant parts of the descriptive grammar.

Grammar Re-use The grammar engineering process involves a considerable amount of time. When making a contribution to a precision grammar, not only is confirmation required that changes have had the intended effect, but also that changes did not introduce regressions such as a drop in coverage, loss of accuracy of existing analyses, or the introduction of spurious ambiguity. In the context of implementing a new phenomenon in a grammar, it would be desirable for a grammar engineer to be able to use existing grammars which contain tried-and-tested implementations of the relevant construction as inspiration. With a means of detecting that a grammar handles a phenomenon and of identifying the components involved in its implementation, it would be possible to develop a search tool that would allow grammar engineers to much better exploit the knowledge contained within precision grammars.

Phenomenon-based Parser Evaluation In addition to the long-term goals of phenomenon detection, this corpus would also have more immediate applications. One in particular is that of phenomenon-based parser evaluation, in the style of Bender et al. (2011). This more nuanced form of parser evaluation is able to pinpoint specific gaps in parser coverage that would otherwise be hidden by coarser, more commonly-used parser evaluation approaches. Bender et al. found that all the parsers they examined struggled to correctly identify many frequently-occurring non-local dependencies. One notable difference between the approach proposed in this paper and that of Bender et al. is that we identify occurrences of different phenomena via lexical spans in a corpus, whereas Bender et al. represent constructions via their component "dependencies". While there are certainly merits to a dependency-based representation, it tends to require noisy mappings from the native outputs of individual parsers (Clark and Curran, 2007); lexical spans, on the other hand, are a natural representation for all grammar formalisms.

Bootstrapping the Phenomenon Corpus Another application of the proposed signature extraction approach is the automatic annotation of linguistic phenomena, inspired by the approach of Bender et al. (2011). We plan to investigate this as a means of rapidly scaling up the size of the phenomenon corpus. We suspect that this approach has the potential to yield accurate phenomenon annotation, but will be limited in that it will likely result in non-exhaustive annotation of phenomena, as particular manifestations of phenomena not captured by the signatures will not be identified.

4 Methodology

In this section we describe the methodology used for the construction of a proof-of-concept phenomenon corpus. The initial stage involved identifying the desiderata for such a resource; these are outlined below.

1. Grammar engineering framework independent: Annotations need to be represented independently of specific grammars and frameworks. This ensures we do not bias phenomena toward analyses found within existing grammars and that the corpus can be applied to grammars from different frameworks.
2. Exhaustively annotated: Each item in the corpus needs to be exhaustively annotated so that the non-occurrence of a phenomenon can be used as negative data. An item not containing a phenomenon under investigation is a useful data point insofar as it suggests that structures involved in the parse are less likely to be associated with the phenomenon.
3. Token-level: Each phenomenon occurrence in a sentence should be annotated at the word token level to yield tighter associations between phenomena and substructures produced by the parser.
4. Disambiguated: By basing the corpus on a manually disambiguated treebank produced by a precision grammar, the parse for each item will reflect the best parse the grammar can yield, increasing the likelihood of phenomena being associated with relevant components of the grammar. To be in keeping with the first desideratum, there would ideally be treebanks of the same corpus produced in multiple grammar engineering frameworks.

4.1 Phenomena

The notion of linguistic phenomena is hard to pin down. Rather than attempt to rigorously define it, we have opted for an operational characterization that is compatible with the use of the term in the motivating contexts described in Section 3.1.

Since we want to leave the door open for cross-linguistic comparison, we looked for phenomena that are found across many of the world's languages. Instead of looking for simple morphosyntactic phenomena such as case, tense and gender, we chose more complex phenomena that would likely involve the interaction of various simple phenomena, and thus be more likely for their implementation to involve constraints across different parts of the grammar.

As a part of the process of selecting the five different phenomena, we consulted the typological literature to determine the range of languages the phenomena are known to be found in, as well as to develop a set of criteria for annotators to use to identify instances of the phenomenon. Rather than trying to account for the full variability of each phenomenon across all the languages of the world, we attempted to find a simple set of criteria that is sufficiently broad to capture a range of different manifestations of the phenomenon, while also providing a simple-to-use and clear guide to prevent annotators from becoming quagmired in analysis. For this preliminary version of the corpus, we chose phenomena that take the form of simple constructions spanning either matrix or subordinate clauses. The five phenomena selected were: (1) the passive voice, (2) interrogative clauses, (3) imperative clauses, (4) relative clauses, and (5) complement clauses. For further details regarding the chosen phenomena along with the criteria developed for their identification, please see the annotator guidelines document.2

2 http://repository.unimelb.edu.au/10187/17611

These phenomena are also relatively broad, encompassing other finer-grained categories of phenomena such as subject raising relative clauses and impersonal passives. It is likely that users of precision grammars will be more interested in these more fine-grained categories. The problem is knowing how fine-grained to go: how narrowly should each phenomenon be delineated, and how should the linguistic phenomenon space be carved up? A particular concern here is that, as outlined in Section 2, different grammars will divide the space in different ways due to typological variation and differing theoretical approaches. It is for this reason we believe it makes more sense to prioritize cross-grammar applicability in the corpus construction stage. This does not necessarily preclude the identification of finer-grained phenomena, however, as it is foreseeable that this could be achieved in the phenomenon detection stage through the use of clustering techniques. Furthermore, finer-grained distinctions would likely involve increased annotation time costs; if successful, this approach has the advantage that it automates this more costly analysis.

4.2 Constructing the Corpus

The purpose of this preliminary proof-of-concept corpus was to test out and fine-tune the criteria for phenomenon identification in the annotation guidelines and produce an exportable format for distribution. The corpus was composed of 200 lines of The Speckled Band, a Sherlock Holmes story by Arthur Conan Doyle. This text was chosen for two reasons: (1) it is out of copyright, and can thus be shared freely; and (2) it is syntactically highly rich and varied, and thus more likely to shed light on potential problems with the annotation guidelines. As discussed in Section 5, we plan to use existing treebanks produced by precision grammars developed within the DELPH-IN consortium3 as a basis for the large-scale phenomenon corpus; thus, our methodology targets the format used for storing treebanks in the DELPH-IN grammar engineering toolchain.

For the annotation process, we used brat,4 a browser-based rapid annotation tool (Stenetorp et al., 2012). Annotators were instructed to exhaustively annotate each sentence with occurrences of the five phenomena by labelling character spans they deemed to display the phenomena. The annotations were then converted into the format used by [incr tsdb()], the grammar profiling and treebanking tool used in the DELPH-IN consortium (Oepen and Flickinger, 1998). As a part of its relational database used to store test suite profiles, [incr tsdb()] conveniently supports the recording of multiple phenomenon occurrences for a given item, and it is trivial to augment existing [incr tsdb()] profiles with phenomenon records.

After an initial round of annotation of 50 sentences by two different annotators with training in syntactic theory, the results were used to fine-tune the annotator guidelines by making the instructions clearer and eliminating identified ambiguities. A second round of 50 lines was completed by both annotators using the updated guidelines, and inter-annotator agreement was calculated using Fleiss' coefficient of agreement (Artstein and Poesio, 2008). Agreement was calculated on a binary basis for each phenomenon using word tokens, such that tokens occurring within an annotation span received a 1 if the annotation was of the type of phenomenon being calculated, and 0 otherwise. After the results were analyzed, the final 100 lines of the text were completed by a single annotator.

3 http://www.delph-in.net
4 http://brat.nlplab.org

Phenomenon             Frequency   Agreement
Passive voice              25         0.85
Interrogative clause       16         0.94
Imperative clause           5         0.78
Relative clause            62         0.91
Complement clause          54         0.71

Table 1. Attested phenomenon occurrences within the 200 lines of Sherlock Holmes text and inter-annotator agreement for the 50 line subset.
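The span-to-token binarization just described can be sketched in a few lines of Python. Everything below (the sentence, the character spans and the function names) is our own illustration, not the authors' code, and only raw observed agreement is computed; a coefficient in the family of Fleiss, as used in the paper, would additionally correct for chance agreement.

def token_labels(tokens, spans):
    """1 for tokens lying inside an annotated character span, else 0."""
    labels, position = [], 0
    for tok in tokens:
        start, end = position, position + len(tok)
        labels.append(int(any(s <= start and end <= e for s, e in spans)))
        position = end + 1  # account for the single space between tokens
    return labels

def observed_agreement(a, b):
    """Raw proportion of matching token labels for two annotators."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

sentence = "The window was opened by the doctor"
tokens = sentence.split()
annotator1 = token_labels(tokens, [(11, 21)])  # marked "was opened"
annotator2 = token_labels(tokens, [(11, 35)])  # marked "was opened by the doctor"
print(observed_agreement(annotator1, annotator2))  # 4 of 7 tokens agree, about 0.57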

4.3 Results

Table 1 shows the results of the annotation of the Sherlock Holmes text. The relatively low frequency of imperative clauses in the corpus highlights the sensitivity of some linguistic phenomena to specific domains. We would, for example, expect there to be few imperatives in a corpus of newswire text, but likely many in a corpus comprised of spoken language. While it will be important to select phenomena that are well represented in the final corpus, it will still be advantageous to include infrequently occurring phenomena, as this will provide an opportunity to determine how well our phenomenon detection techniques perform in the presence of data scarcity.

The inter-annotator agreement results show good reliability5 for all phenomena with the exception of complement clauses. Investigation of disagreements for this category showed that there were a number of cases where one of the annotators had simply missed a complement clause, suggesting that this frequently occurring phenomenon is easier to overlook — a problem which should improve with increased vigilance and further practice.

5 There is a general consensus in the computational linguistics community (Artstein and Poesio, 2008) that values of coefficients in the family of Fleiss' coefficient greater than 0.8 are said to show "good reliability" while greater than 0.67 allows for "highly tentative and cautious conclusions."

5 Next Steps

This initial prototype corpus was developed as a proof-of-concept, to help refine the methodology of its construction. Now that it has been developed, we can look ahead to the next steps. Firstly, we will turn our attention to building a large-scale phenomenon corpus based on the DeepBank resource (Flickinger et al., 2012a). DeepBank is a dynamically annotated large-scale treebank of text from The Wall Street Journal sections of the Penn Treebank (Marcus et al., 1994). The annotations are produced by the LinGO English Resource Grammar (ERG), a broad-coverage precision grammar of English (Flickinger, 2002) developed within the DELPH-IN consortium. The treebank contains manually disambiguated parse results and is available in a number of representation formats. The existence of parallel treebanks developed with the same methodology for both Portuguese and Bulgarian via the ParDeepBank project (Flickinger et al., 2012b) leaves open the possibility for cross-linguistic phenomena exploration.

The linguistic phenomena selected for annotation in this corpus were intentionally chosen so as to be well known and frequently occurring constructions that are relatively uncontroversial in their characterization. In the large-scale phenomenon corpus, we will introduce additional phenomena with more subtle characteristics whose identification will likely involve more difficulties. Each phenomenon added will be subject to the same methodology as outlined in this paper to ensure reliability in the identification of phenomena.

After the creation of a larger phenomenon corpus, we will then be in a position to develop a phenomenon signature extraction technique which uses parser output from the ERG to link linguistic phenomena found in DeepBank to corresponding components of the ERG. While we initially plan to focus on the derivation trees produced by the parser, this approach could also be extended to use semantic output, as well as the contents of the parse chart — the data structure built up by the parser at parse time. This last source of data is significant in that it is available even for sentences that do not receive a spanning parse, potentially providing a means of detecting the presence of unimplemented phenomena. We also foresee the possibility of using the signatures to perform automatic phenomenon annotation — potentially providing a means of rapidly scaling up the phenomenon corpus, albeit with "silver" rather than gold standard annotations. These automatically extracted annotations would be biased towards instances of phenomena handled by the ERG; however, the spans themselves remain formalism independent, as they would be just flat character spans.

6 Conclusion

In this paper we have identified various incentives for developing techniques for detecting linguistic phenomena within precision grammars. We have argued that a necessary — but so far non-existent — resource for this goal is corpora annotated with linguistic phenomena independently of existing precision grammars. We also presented a proof-of-concept English phenomenon corpus — along with a methodology for its development — that aims to fill this gap. While currently only a small offering, we plan to produce a much larger version of the corpus with additional phenomena based on the DeepBank treebank. This resource will then be used to explore techniques for automatically extracting characteristic signatures of linguistic phenomena within parser output, providing a means of associating specific linguistic phenomena with relevant grammar components. In addition, this resource will also have more immediate applications, such as enabling fine-grained phenomenon-based parser evaluation. We hope this work will ultimately lead towards improving the discoverability of linguistic phenomena within precision grammars by helping them wear their linguistic phenomena on their respective sleeves, thus increasing their utility as linguistic resources.

References

Artstein, R. and Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4):555–596.
Baldridge, J. and Kruijff, G.-J. M. (2003). Multi-modal combinatory categorial grammar. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Budapest, Hungary.
Bender, E. M. (2008). Grammar engineering for linguistic hypothesis testing. In Proceedings of the Texas Linguistics Society X Conference: Computational Linguistics for Less-Studied Languages, pages 16–36, Austin, USA.
Bender, E. M., Drellishak, S., Fokkens, A., Poulson, L., and Saleem, S. (2010). Grammar customization. Research on Language & Computation, 8(1):23–72.
Bender, E. M., Flickinger, D., Oepen, S., and Zhang, Y. (2011). Parser evaluation over local and non-local deep dependencies in a large corpus. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 397–408, Edinburgh, Scotland, UK.
Bender, E. M., Ghodke, S., Baldwin, T., and Dridan, R. (2012). From database to treebank: On enhancing hypertext grammars with grammar engineering and treebank search. In Nordhoff, S., editor, Electronic Grammaticography. University of Hawai'i Press.
Clark, S. and Curran, J. (2007). Formalism-independent parser evaluation with CCG and DepBank. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 248–255, Prague, Czech Republic.
Copestake, A. and Flickinger, D. (2000). An open-source grammar development environment and broad-coverage English grammar using HPSG. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece.
Farrar, S. and Lewis, W. D. (2005). The GOLD community of practice: an infrastructure for linguistic data on the web. In Proceedings of the EMELD 2005 Workshop on Digital Language Documentation: Linguistic Ontologies and Data Categories for Language Resources, Cambridge, USA.
Flickinger, D. (2002). On building a more efficient grammar by exploiting types. In Oepen, S., Flickinger, D., Tsujii, J., and Uszkoreit, H., editors, Collaborative Language Engineering, pages 1–17. Stanford: CSLI Publications.

Flickinger, D., Kordoni, V., and Zhang, Y. (2012a). DeepBank: A dynamically annotated treebank of the Wall Street Journal. In Proceedings of the 11th International Workshop on Treebanks and Linguistic Theories, Lisbon, Portugal.
Flickinger, D., Kordoni, V., Zhang, Y., Branco, A., Simov, K., Osenova, P., Carvalheiro, C., Costa, F., and Castro, S. (2012b). ParDeepBank: Multiple parallel deep treebanking. In Proceedings of the 11th International Workshop on Treebanks and Linguistic Theories, pages 97–108, Lisbon, Portugal.
Fokkens, A. (2011). Metagrammar engineering: Towards systematic exploration of implemented grammars. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, USA.
Fokkens, A., Avgustinova, T., and Zhang, Y. (2012). CLIMB grammars: three projects using metagrammar engineering. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey.
Ghodke, S. and Bird, S. (2010). Fast query for large treebanks. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 267–275, Los Angeles, USA.
Haugereid, P. (2004). Linking in constructions. In Müller, S., editor, Proceedings of the HPSG-2004 Conference, Center for Computational Linguistics, Katholieke Universiteit Leuven, pages 414–422. CSLI Publications, Stanford.
King, T. H., Forst, M., Kuhn, J., and Butt, M. (2005). The feature space in parallel grammar writing. Research on Language and Computation, 3:139–163.
Marcus, M., Kim, G., Marcinkiewicz, M. A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., and Schasberger, B. (1994). The Penn Treebank: annotating predicate argument structure. In Proceedings of the Workshop on Human Language Technology, pages 114–119, Plainsboro, USA.
Maxwell, J. T. and Kaplan, R. M. (1996). An efficient parser for LFG. In Butt, M. and King, T. H., editors, The Proceedings of the LFG '96 Conference, Rank Xerox, Grenoble.
Oepen, S. and Flickinger, D. P. (1998). Towards systematic grammar profiling: test suite technology ten years after. Journal of Computer Speech and Language Special Issue on Evaluation, 12:411–436.
Rohde, D. L. T. (2005). TGrep2 user manual version 1.15. http://tedlab.mit.edu/~dr/Tgrep2/tgrep2.pdf.
Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., and Tsujii, J. (2012). brat: a web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations Session at EACL 2012, Avignon, France.

Coupling Trees, Words and Frames through XMG?

Timm Lichte1, Alexander Diez1, and Simon Petitjean2

1 University of Düsseldorf, Germany 2 University of Orléans, France

Abstract. This work presents first results on the integration of frame-based representations into the lexical framework of eXtensible MetaGrammar (XMG). Originally XMG supported tree-based syntactic structures and underspecified representations of predicate-logical formulae, but the representation of frames as a sort of typed feature structures failed due to reasons of usability and completeness. Therefore, after having explored strategies to simulate frames within the original XMG, we introduce an extension that is capable of handling frames directly by means of a novel <frame>-dimension. The <frame>-dimension can also be applied to recent accounts of morphological decomposition, as we show using a refined version of the <morph>-dimension from [4]. The presented extensions to XMG are fully operational in a new prototype.

1 Introduction

Recent work [9, 10, 16] has shown increasing interest in coupling a frame-based semantics with a tree-based syntax such as Tree Adjoining Grammar (TAG, [7]). While having led to promising results on the theoretic side, it is unclear how to implement these ideas with existing grammar engineering tools, let alone how to bring them alive in natural language parsing. In this paper, we present first results on the integration of frame-based representations into the lexical framework of eXtensible MetaGrammar (XMG, [3]). XMG originally supported tree-based syntactic structures and underspecified representations of predicate-logical formulae, but the representation of frames as a sort of typed feature structures failed due to reasons of usability and completeness. Therefore we extend XMG by a novel <frame>-dimension that makes it capable of handling frames directly. This feature also paves the way for implementing recent accounts to morphological decomposition, such as in [16], where morphemes are linked to a frame-semantic representation.

The paper proceeds as follows. The next section briefly illustrates the lexical objects that we are concerned with, and Section 3 then shows the proposed factorization which crucially guides the implementation with XMG. After having explained the basics of XMG in Section 4, and after having explored strategies to simulate frames within the <sem>-dimension in Section 5, we present the <frame>-dimension in Section 6. Finally, Section 7 covers the application to a frame-based analysis of morphological decomposition.

? This paper has greatly benefited from discussions with Laura Kallmeyer, Rainer Osswald and Yulia Zinova. We also thank the reviewers of HMGE 2013 for valuable comments. The work presented in this paper was financed by the Deutsche Forschungsgemeinschaft (DFG) within the CRC 991.

Fig. 1. A tree template, its frame-semantic counterpart and the associated type hierarchy.

2 A frame-based semantics for LTAG

We are concerned with lexical objects such as in Fig. 1, namely a (partial) phrase structure tree, a typed feature structure, and the associated type hierarchy, all of which are taken from [10, 11]. Phrase structure trees such as (a) make up the lexicon of a TAG, which is why they are also called elementary trees. TAG is a grammar formalism based on tree-rewriting, meaning that elementary trees can be combined (by means of two basic operations, substitution and adjunction) to generate larger trees.3 A lexicalized TAG (LTAG) adds the constraint that every elementary tree includes at least one terminal leaf, the lexical anchor. Note however that throughout this paper we are rather concerned with so-called tree templates, which lack a lexical anchor and from which elementary trees are lexically derived. The site of lexical insertion, here of throws, is marked by means of the ⋄-symbol. Finally the nodes of both elementary trees and tree templates carry (non-recursive) feature structures, as shown at least in some nodes of (b) in Fig. 1. This is relevant for the interface between tree and frame, since following [9, 10, 16] the interface is indicated by means of co-occurring boxed numbers. For example, this way the subject NP-leaf is linked with the actor role(s) of the frame, eventually causing the unification of the actor role and the frame of the incoming NP.

3 Since in the present paper we are mainly concerned with the organization of the lexicon and thus focus on single elementary trees, we skip most details of the formalism here. See [7] or [1] for comprehensive presentations.

3 Factorization of tree templates and frames

Richly structured lexical objects like those in Fig. 1 make necessary some kind of metagrammatical factorization, once a large coverage grammar gets compiled and maintained [15]. Metagrammatical factorization roughly means to dene re- curring subcomponents of lexical objects, which can then be combined in several ways, for example in a purely constraint-based fashion as is the case in XMG. In addition to the benet in terms of grammar engineering, however, [911] claim that metagrammar factorization can be also used to reect constructional anal- yses in the spirit of Construction Grammar [12, 6]. Under this perspective the lexical material as well as the used constructions contribute meaning. Taking these two aspects into account, [11] propose to factorize the tree template and the frame in Fig. 1 along the lines of Fig. 2, where boxes stand for the resulting factors or classes (i. e. classes in the sense of XMG), consisting of a tree and a frame fragment.4 It illustrates that the tree-frame couple in Fig. 1 results from instantiating the class POConstr, which combines the classes n0Vn1 and DirPrepObj-to. Note that Fig. 2 shows a part of the proposed factorization only, as for example the class n0Vn1 would result from combining three other classes (Subj, VSpine, DirObj). Combining two classes essentially means that all associated information is unied, from which a minimal model is computed (see next section). Finally, one constructional facet can be found in the class POConstr in that it only contributes a frame fragment, but no tree fragment.

4 Furthermore double edges indicate identity constraints, while within trees dashed edges represent non-strict dominance and * non-strict precedence. ≺ 40 Timm Lichte, Alexander Diez and Simon Petitjean

class POConstr

class nv0Vn1 class DirPrepObj-to S VP

NP[I= 1 ] VP

VP[PATH= 3 ] * PP[I= 1 ,E= 0 ,PREP=to] VP ≺ V  V [E= 0 ] * NP[I= 3 ]  ≺ directed-motion goal 0 1  event path 3   actor   0 1  theme 3    

causation actor  1  theme 2   goal 3  0     activity  cause   actor 1   " #    effect theme   4 2     h i

Fig. 2. Metagrammatical factorization of the tree template and the frame in Fig. 1.

The graphic representation of the metagrammatical factorization in Fig. 2 remains at a rather informal level and the question arises, how this could be translated into XMG code. We will see in Section 5 that the original XMG did not give a straightforward answer due to reasons of usability and completeness  other than the new -dimension, which is presented in Section 6.

4 A brief introduction to XMG

XMG (eXtensible MetaGrammar, [3]) stands both for a metagrammatical de- scription language and the compiler for this language. Descriptions are organized into classes, that can be reused (i. e. imported or instantiated) by other classes. Borrowing from object oriented programming, classes are encapsulated, which Coupling Trees and Frames through XMG 41 class n0Vn1 ... { node ?S [cat=s]; node ?VP1 [cat=vp]; node ?VP2 [cat=vp]; node ?SUBJ [cat=np]; node ?OBJ [cat=np]; node ?V [cat=v]; ?S->?SUBJ; ?S->?VP1; ?VP1->*?VP2; ?VP2->?OBJ; ?VP2->?V; ?V>>*?OBJ } ...

Fig. 3. The -dimension of class n0Vn1. means that each class can handle the scopes of their variables explicitly, by declaring variables and choosing which ones to make accessible for other instan- tiating classes (i. e. to export). The namespace of a class is then composed of the declared variables and all the variables exported by the imported classes. Crucial elements of a class are the so-called dimensions. Dimensions can be equipped with specic description languages and are compiled independently, therefore enabling the grammar writer to treat the levels of linguistic informa- tion separately. The -dimension, for example, allows to describe TAG tree templates (or fragments thereof). To give a concrete example, Fig. 3 shows the -dimension of class n0Vn1 from Fig. 2. It shows two sorts of statements, namely those like `node ?S [cat=s];' that instantiate nodes of the trees, and those like `?S->?SUBJ;' which determine the relative position of two nodes in the trees by referring to dominance and linear precedence.5 In contrast, the -dimension includes descriptions of a dierent language as we will see in the following section. When the metagrammar is compiled, rst a set of descriptions for each class under evaluation is accumulated, and then the accumulated descriptions are resolved to yield minimal models. In the case of , the solver computes tree templates as minimal models, which is to say that only those nodes are included that are mentioned in the description. The nal result can be explored with a viewer, or exported as XML le to feed a parser (e. g. the TuLiPA parser [8]).

5 Implementation within the -dimension

As mentioned before, the -dimension is designed to contain underspecied, at formulae of predicate logic (borrowing from [2]). It is possible, however, to simulate frames by using binary predicates, such that they represent single se- mantic features. For example, a frame such as 0 [actor 1 ] is translated into the binary predicate actor(?X0,?X1). A more detailed example for the class POConstr is shown in Fig. 4. The implementation of the syntax-semantics inter- face is straightforward, since the same variables can be used in both dimensions

5 In XMG, variable names are prexed with a question mark (`?'). 42 Timm Lichte, Alexander Diez and Simon Petitjean

(c)

class POConstr (a) ... { causation actor(?X0,?X1); actor  1  theme(?X0,?X2); theme 2 goal(?X0,?X3);   goal 3  cause(?X0,?X5); 0    effect(?X0,?X4);  activity  cause  actor(?X5,?X1);  actor 1   " #  theme(?X4,?X2);   effect 4 theme 2      % causation type  h i event(?X0,+); activity(?X0,-); (b) motion(?X0,-); event causation(?X0,+); % activity type activity motion causation event(?X5,+); activity(?X5,+); motion-activity directed-motion directed-motion(?X5,-); causation(?X5,-) } ...

Fig. 4. The feature structure of POConstr (repeated from Fig. 2), the corresponding type hierarchy (repeated from Fig. 1), and its simulation inside the -dimension.

and . Taking the class n0Vn1 from Fig, 2 as an example, the variable ?X1 would also appear in the XMG description of the subject-NP leaf, and the variable ?X2 in the description of the object-NP leaf (later identied with ?X3 in the class POConstr). We skip a code example due to lack of space. While the simulation of the frame via binary predicates is straightforward, it is far less obvious how to simulate types and the type hierarchy. However, as can be seen again from Fig. 4, types can be also simulated by sets of binary predicates with a boolean second argument, or type simulating sets (TSS) as we call them, in the following way:6 given a type hierarchy T such as the one in Fig. 4, we say that t in T is simulated by the minimal set of predicates Pt(?X) for some variable ?X, if Pt(?X) is assembled in the following way: for every t0 in T , if t0 reexively and transitively dominates t in T , then t0(?X, +) is in Pt(?X); else if t and t0 have no common subtype, then t0(?X, ) is in P (?X). To give − t an example, Pdirected-motion(?X) for the type directed-motion in the type hierar- chy of Fig. 4 would be the set activity(?X, ), motion(?X, +), causation(?X, ), { − −

6 Thanks to Laura Kallmeyer and Yulia Zinova for pointing this out. Coupling Trees and Frames through XMG 43 motion-activity(?X, ), directed-motion(?X, +) .7 It is easily seen that the size − } of some Pt(?X) crucially depends on the position of t in T , and on the size of T . One basic problem of this approach is that so far XMG does not interpret the predicates of the -dimension, but merely accumulates them for later use. Hence XMG allows for the coexistence of predicates activity(?X,-) and activity(?X,+), which should be ruled out when simulating types as constraints on unication. But even if XMG was enhanced to verify the functionality of pred- icates, at least three disadvantages would remain: (i) TSS have to be provided by the grammar writer, (ii) they have to be included in the XMG descriptions as a whole, and (iii) unifying sister types with a common subtype will yield a TSS that does not immediately reveal the type of the common subtype. The latter disadvantage might be more of an aesthetic kind, but the rst and the second one clearly have an impact on usability. Modifying the type hierarchy in the context of a large grammar would make necessary a meta-Metagrammar, that would automatically recompute the TSS and adapt the parts of the XMG descriptions, where TSS were used. Therefore we present a novel -dimension in the next section, which is adjusted to the peculiarities of frame structures and frame composition.

6 A new -dimension

The description language employed in the -dimension of the extended XMG follows the one of the -dimension in many respects.8 Basically, a -dimension contains descriptions of nodes and edges, where nodes can be assigned a variable (with var) and a type, and edges can carry a semantic label. The example in Fig. 5 illustrates this.9 Note that the var-equations cor- respond to the boxed numbers found in the AVM notation of frames (see Fig. 1 and 2). But comparing -descriptions with -descriptions also reveals several crucial distinctions: neither do -descriptions employ non-strict dominance, nor do they refer to linear precedence, as the daughters of a mother are generally unordered. Furthermore the edges in the -dimension can carry a semantic label, other than those in the -dimension. The most essential dierence, however, is found in the solvers, since the solver of the -dimension takes into account the type hierarchy, which is spec- ied globally within the header of the code example in Fig. 5. It also computes and inserts the highest common subtype. Apart from that the solving of the -descriptions is relatively cheap, since nodes and edges are completely specied, and therefore the solving only consists of nding the and traverse the edges top-down. In contrast, the solvers of the -dimension (and of the -dimension) rest upon the unication of untyped partial descriptions,

7 Of course Pdirected-motion(?X) could be further minimized, since motion- activity(?X, ) already follows from activity(?X, ). 8 To be exact,− we extended XMG2 (http://wikilligramme.loria.fr/doku.php?id=xmg2).− 9 A bracket notation is also available, similarly to the one in the -dimension. 44 Timm Lichte, Alexander Diez and Simon Petitjean

class POConstr ... hierarchy TYPE = {(event,activity),(event,motion),(event,causation), (activity,motion-activity), (motion,motion-activity),(motion,directed-motion)} ... { node (type=causation,var=?X0) ?ROOT; node (var=?X1) ?ACTOR; node (var=?X2) ?THEME; node (var=?X3) ?GOAL; node (type=activity) ?CAUSE; node (var=?X4) ?EFFECT; node (var=?X1) ?CAUSEACTOR; node (var=?X2) ?EFFECTTHEME;

edge (label=actor) ?ROOT ?ACTOR; edge (label=theme) ?ROOT ?THEME; edge (label=goal) ?ROOT ?GOAL; edge (label=cause) ?ROOT ?CAUSE; edge (label=effect) ?ROOT ?EFFECT; edge (label=actor) ?CAUSE ?CAUSEACTOR; edge (label=theme) ?EFFECT ?EFFECTTHEME } ...

Fig. 5. The -dimension of class POConstr.

which means they only take into account the values when unifying features or predicates. The solvers of and do not dier, however, with respect to one crucial aspect: both resolve only tree structures. This might come as a surprise given that frames correspond to directed graphs [14], where nodes can be imme- diately dominated by more than one other node. Of course, in restricting itself to tree structures, the -dimension can model structures like the latter one only in an implicit way, namely by identifying separate nodes based on identi- cal values in their respective var-property. Going back to the example in Fig. 5, the identication of nodes ?ACTOR and ?CAUSEACTOR is expressed through the var-value ?X1. This account obviously borrows from the use of boxed-number variables in common AVM notation. It remains to say that single connected frames with a unique root are resolved based on the descriptions within the -dimension. We do not see that solutions with more than one root could become necessary on the level of the lexicon.10

10 Otherwise the solver of the -dimension might be modied analogously to the shift from TAG to multi-component TAG [13]. Coupling Trees and Frames through XMG 45

derived verb nasypat'

prex na stem sypat'

na sypat'

causation causation scalar-change effector    1  scale patient 2  scale      goal 3  max 12      " #    0     activity  effect  state  cause   init   effector    value 0   " 1 #    " #             scalar-change-of-location    state    result  0  patient   value   2    " 12 #        goal 3              scale 14 amount     effect     stateh i   init    value    " 15 #          state   result    value 16    " #       

Fig. 6. Morphological decomposition of the Russian verb nasypat' ('to put') following [16]. The aspectual prex na adds a perfective reading to the verbal stem sypat'. The perfective reading is encoded in the frame of na in terms of a specic scalar change.

7 An application to morphological decomposition

[16] not only present a metagrammatical factorization of elementary tree tem- plates and their frame-based semantics, but they also briey show that the same idea can be applied below the word level, namely to verbal prexation in Rus- sian: just as tree templates can be factorized (or decomposed) into tree frag- ments, complex morphological units can be decomposed into morphemes and their specic frame-semantic contribution. As an example, [16] decompose the perfective verb nasypat' ('to put') along the lines of Fig. 6. Again, we won't go into the linguistic details of the proposed analysis, but rather try to answer the question how this could be implemented by means of XMG. Until recently, the morphological level of linguistic descriptions had not been attracting much attention within the framework of XMG. In general, was (and still is) seen to lie outside its main focus, and that the word lexicon should be rather treated using other means. To our knowledge, the rst work to deviate from this general picture is [4] in treating the system of agglutinative axes in Ikota (a Bantu language). For this, [4] presents an extension of XMG which lets the grammar writer dene linearly ordered elds for each type of ax. The specication and eld assignment of axes then takes place in a new dimension, called . 46 Timm Lichte, Alexander Diez and Simon Petitjean

class nasypat declare ?M1 ?M2 ?S1 ?S2 { ?M1 = na[]; ?M2 = sypat[]; ?S1 = ?M1.?S; ?S2 = ?M2.?S; { ?S1 >> ?S2 } }

class na class sypat export ?S export ?S declare ?S declare ?S { { { { morpheme ?S; morpheme ?S; ?S <- "na" ?S <- "sypat'" } } } }

Fig. 7. The -dimension of the classes corresponding to the morphological de- composition of nasypat' in Fig. 6.

Considering the case of morphological decomposition shown in Fig. 6, it would be possible to adopt the elds-and- approach of [4] in a straightfor- ward way: rst two elds, say F1 and F2, would be globally specied, and then na would be assigned to F1 and sypat' to F2 within their respective morph- dimension. Given, however, that prexation in Russian verbs is more exible, allowing for, e. g., the stacking of several aspectual prexes, we have chosen a more general approach which underlies the implementation shown in Fig. 7.11 Instead of specifying a xed number and order of elds, the linear order of na and sypat' is constrained locally inside the instantiating class nasypat using the common LP-operator (`>>'). Note that the operator <- assigns a surface string to a morpheme object. No matter what approach to morphology is chosen, the respective -dimension remains the same, along the lines of what has been presented in the last section. It is therefore omitted in Fig. 7. The solver for the -dimension is rather simple compared to the one of , since the order of morphemes is constrained by immediate precedence only and the accumulated descriptions are supposed to be complete, meaning that no precedence between morphemes has to be inferred. After accumulating the morpheme objects and the precedence rules between them, the solver therefore just searches for the rst morpheme (with no morpheme preceding it), and then

11 A more exible -dimension could be also advantageous in other cases, such as nominal compounds in German. Coupling Trees and Frames through XMG 47 follows the line(s) of precedence rules. Finally it checks that no morpheme is left behind. Of course the presented -dimension and its solver are very prelimi- nary and designed specically for the kind of analysis in Fig. 6. It needs to be claried in subsequent research whether this is also applicable on a larger scale.

8 Conclusion

In this paper, we presented ongoing efforts to extend the grammar engineering framework XMG in order to deal with typed feature structures, respectively frames. We showed that the simulation of frames within the <syn>-dimension is doable; however, there are disadvantages concerning the implementation of type hierarchies. Therefore a novel <frame>-dimension was developed, which is adjusted to the peculiarities of frame structure and frame composition, and which should eventually reduce the burden for the grammar writer. We then showed that the <frame>-dimension can not only be combined with trees and tree fragments, but it can also be useful for the implementation of recent frame-based accounts of morphological decomposition, thereby considerably widening the scope of XMG. The presented extensions to XMG are fully operational in a recent prototype. We further plan to make available a bracket notation for <frame>-descriptions that is closer to the common AVM notation, and to also include the <frame>-dimension in the XMG viewer.

References

1. Abeillé, A., Rambow, O.: Tree Adjoining Grammar: An overview. pp. 1–68 (2000)
2. Bos, J.: Predicate logic unplugged. In: Dekker, P., Stokhof, M. (eds.) Proceedings of the Tenth Amsterdam Colloquium. pp. 133–143. Amsterdam, Netherlands (1996)
3. Crabbé, B., Duchier, D., Gardent, C., Le Roux, J., Parmentier, Y.: XMG: eXtensible MetaGrammar. Computational Linguistics 39(3), 1–66 (2013), http://hal.archives-ouvertes.fr/hal-00768224/en/
4. Duchier, D., Ekoukou, B.M., Parmentier, Y., Petitjean, S., Schang, E.: Describing morphologically rich languages using metagrammars: a look at verbs in Ikota. In: Workshop on Language Technology for Normalisation of Less-Resourced Languages (SALTMIL 8 - AfLaT 2012). pp. 55–59 (2012), http://www.tshwanedje.com/publications/SaLTMiL8-AfLaT2012.pdf#page=67
5. Fillmore, C.J.: Frame semantics. In: The Linguistic Society of Korea (ed.) Linguistics in the Morning Calm, pp. 111–137. Hanshin Publishing (1982)
6. Goldberg, A.: Constructions at Work. The Nature of Generalizations in Language. Oxford Univ. Press, Oxford (2006)
7. Joshi, A.K., Schabes, Y.: Tree-Adjoining Grammars. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 3, pp. 69–124. Springer, Berlin, New York (1997)
8. Kallmeyer, L., Lichte, T., Maier, W., Parmentier, Y., Dellert, J., Evang, K.: TuLiPA: Towards a multi-formalism parsing environment for grammar engineering. In: Coling 2008: Proceedings of the Workshop on Grammar Engineering Across Frameworks. pp. 1–8. Manchester, England (August 2008)

9. Kallmeyer, L., Osswald, R.: An analysis of directed motion expressions with Lexicalized Tree Adjoining Grammars and frame semantics. In: Ong, L., de Queiroz, R. (eds.) Proceedings of WoLLIC. pp. 34–55. No. 7456 in Lecture Notes in Computer Science LNCS, Springer (September 2012)
10. Kallmeyer, L., Osswald, R.: A frame-based semantics of the dative alternation in Lexicalized Tree Adjoining Grammars. In: Piñón, C. (ed.) Empirical Issues in Syntax and Semantics 9. pp. 167–184 (2012), ISSN 1769-7158
11. Kallmeyer, L., Osswald, R.: Syntax-driven semantic frame composition in Lexicalized Tree Adjoining Grammar (2013), unpublished manuscript
12. Kay, P.: An informal sketch of a formal architecture for Construction Grammar. Grammars 5, 1–19 (2002)
13. Parmentier, Y., Kallmeyer, L., Lichte, T., Maier, W.: XMG: eXtending MetaGrammars to MCTAG. In: Actes de l'atelier sur les formalismes syntaxiques de haut niveau, Conférence sur le Traitement Automatique des Langues Naturelles, TALN 2007. Toulouse, France (June 2007)
14. Petersen, W.: Representation of concepts as frames. The Baltic International Yearbook of Cognition, Logic and Communication 2, 151–170 (2007)
15. Xia, F., Palmer, M., Vijay-Shanker, K.: Developing tree-adjoining grammars with lexical descriptions. In: Bangalore, S., Joshi, A.K. (eds.) Using Complex Lexical Descriptions in Natural Language Processing, pp. 73–110. MIT Press, Cambridge (2010)
16. Zinova, Y., Kallmeyer, L.: A frame-based semantics of locative alternation in LTAG. In: Proceedings of the 11th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+11). pp. 28–36. Paris, France (September 2012), http://www.aclweb.org/anthology-new/W/W12/W12-4604

Extended Projections in a Guadeloupean TAG Grammar

Emmanuel Schang

Univ. Orléans, LLL UMR 7270, F-45065 Orléans, France
[email protected]

1 Introduction

This paper presents a TAG (meta-)grammar of Guadeloupean Creole (GC). Guadeloupean is a French-based creole language spoken on the island of Guadeloupe (French West Indies). While sharing most of its lexicon with French, GC differs from French in its grammar (see Bernabé (1983) for a comprehensive description, and Damoiseau (2012) for a comparative approach). In particular, GC has preverbal Tense-Aspect Markers while French has inflectional suffixes for Tense. I will show that these TMAs can be correctly described as extended projections of verbs. They are generated as inflected forms of a verbal lexeme (i.e. in morphology rather than in syntax). First, I present succinctly the key concepts of a TAG grammar to the reader. Next, section 2 describes the Tense-Aspect markers (TMAs) in GC and section 3 presents the way they are incorporated in the GC metagrammar¹. Section 4 briefly presents how the same approach can be applied to the nominal domain.

Before addressing the question of Tense and Aspect markers, let me briefly present the key notions of TAG. Tree-Adjoining Grammar (TAG) allows two operations (for a complete description, see Joshi and Schabes (1997), from which I take the following definitions):

– “Substitution takes place only on non-terminal nodes of the frontier of a tree. [...] By convention, the nodes on which substitution is allowed are marked by a down arrow (↓). When substitution occurs on a node n, the node is replaced by the tree to be substituted. When a node is marked for substitution, only trees derived from initial trees can be substituted for it.” (Joshi and Schabes, 1997, p. 4)
– “Adjoining² builds a new tree from an auxiliary tree β and a tree α (α is any tree, initial, auxiliary or derived). Let α be a tree containing a non-substitution node n labeled by X and let β be an auxiliary tree whose root node is also labeled by X. The resulting tree γ, obtained by adjoining β to α at node n, is built as follows:

¹ I would like to thank Simon Petitjean, Yannick Parmentier and Denys Duchier for their precious help. I would also like to express my gratitude to the anonymous reviewers for their precious comments.
² I will prefer the term adjunction in the remainder of the paper.

• the sub-tree of α dominated by n, call it t, is excised, leaving a copy of n behind.
• the auxiliary tree β is attached at the copy of n and its root node is identified with the copy of n.
• the sub-tree t is attached to the foot node of β and the root node of t (i.e. n) is identified with the foot node of β.” (Joshi and Schabes, 1997, p. 4)

As proposed in Frank (2002), Substitution and Adjunction are supposed to be universal operations. The differences between languages can only reside in the shape of the elementary trees. The way elementary trees are built is thus a crucial matter, and yet the locus of divergences between several TAG grammars. It is thus necessary to explain the principles that govern the building of the elementary trees. The major reference for a French TAG grammar is Abeillé (2002)³, which presents the following (linguistic) principles of elementary tree well-formedness:

Lexical Anchoring: An elementary tree must have (at least) one non-empty lexical head.
Predicate-Argument Co-occurrence: A predicate elementary tree must have a node for each of its arguments.
Semantic Anchoring: A syntactic elementary tree must correspond to a (non-empty) semantic element.
Compositionality Principle: An elementary tree corresponds to one and only one semantic unit.

Furthermore, I adopt the Condition on Elementary Tree Minimality (CETM) (Frank, 2002, p. 54):

CETM: The syntactic heads in an elementary tree and their projections must form an extended projection of a single lexical head.

This leads me to adapt Grimshaw (2000)'s definition of head and projection to the TAG framework and say that an elementary tree is a maximal projection of one (or several) lexical head(s) within which the categorial features are shared. This opens discussions about what is functional and what is lexical. In the case of prepositions, this is a difficult matter for which the debate remains open (see Cinque (2010)).

2 Tense-Aspect Markers

2.1 A Brief Description of Tense-Aspect Markers in GC

Creole languages are known to make use of independent⁴ markers to express Tense and Aspect (see Winford (2012) for a synthesis), usually gathered under the label TMA markers (Tense, Mood and Aspect markers) or TAM (Tense and Aspect Markers). Guadeloupean behaves like other creoles w.r.t. TMAs. The inflectional morphology found in French verbs is replaced in GC by independent morphemes. For comprehensive analyses, the reader may refer to McCrindle (1999) and Pfänder (2000). In Table 1, I will use the description of Vaillant (2008a), taking as a paradigmatic example the verb dansé 'to dance'⁵ ⁶.

³ For English, see the XTAG project (XTAG Research Group (2001)).
⁴ In a sense that will be clarified later.

Table 1. Tense and Aspect markers.

Value                                        Form
Accomplished/Aoristic                        dansé
Unaccomplished / Present                     ka dansé
Frequentative                                ka dansé
Progressive                                  ka dansé
Future                                       ké dansé
Unaccomplished Future (seldom)               ké ka dansé
Accomplished past (pluperfect)               té dansé
Unaccomplished past                          té ka dansé
Irrealis (Past)                              té ké dansé
Irrealis unaccomplished (extremely rare)     té ké ka dansé
Conditional / Optative                       té dansé

Vaillant (2008b) also notes that ké ka and té ké ka are attested, although rare⁷. The precise semantic value and uses of these TMAs are beyond the scope of this paper, and the reader may refer to Pfänder (2000) for a complete description. The following lines are just intended to present the useful key points for a non-specialist reader.

Bare Verbs: As in many languages, bare verbs in GC are used to express the (past) perfective (or preterite) with dynamic processes (as in (1-a)) and express the present tense with stative verbs⁸, as in (1-b).

(1) a. Jan rivé.
       Jean come
       'Jean came.'
    b. Jan enmé Sofi.
       Jean love Sophie
       'Jean loves Sophie.' (and not: *'Jean loved Sophie')

Tense: The anterior marker of GC is té. When combined with non-stative verbs, té (ant) provides a perfective interpretation:

⁵ There is another marker kay, which is sometimes analysed as part of this system; however, as I will show later, it can't be included among the TMAs.
⁶ This table shows the main uses of the TMA markers to give the reader a quick overview of the TMAs, but, as will be claimed below, the interpretation of a TMA sequence is highly dependent on the context.
⁷ I leave aside the marker laka, which can plausibly be analysed as la (locative) and ka.
⁸ And more generally with non-stative predicates, such as adjectival predicates.

(2) Sofi té palé ba Jan
    Sophie ant speak to Jean
    'Sophie had spoken to Jean.'

And a past imperfective reading with stative verbs:

(3) Jan té enmé Sofi.
    Jean ant love Sophie
    Litt.: '(At this time,) Jean was loving Sophie'

Aspect: GC Aspect markers are ka and ké, for Imperfective and Prospective respectively. When ka is combined with stative verbs, the reading must be Iterative, as in:

(4) I ka tini onlo lajan a fen a chak mwa.
    3sg asp have a-lot-of money at end of each month
    'He has a lot of money at the end of each month.' (example from (Delumeau, 2006, p. 117))

With non-stative predicates, the favored interpretation is Imperfective:

(5) Jan ka manjé.
    Jean ASP eat
    'Jean is eating'

ké triggers a future reading:

(6) Jan ké manjé.
    Jean ASP eat
    'Jean will eat'

2.2 Projecting TMAs

TMAs in GC have already been described in the TAG framework by Vaillant (2008a). This analysis differs from the one I present here in several respects: First, it does not rely on a metagrammar, a point which matters since the implementation I present here relies heavily on the concept of metagrammar. Second, it calls upon adjunction to describe the TMA markers, an approach I will question here. In this section, I will provide arguments in support of integrating TMA markers into the elementary trees as extended projections of a verbal head⁹, instead of using adjunction as a way to integrate TMAs (as auxiliary trees) into the structure. Naturally, it is well known that every grammar using substitution can be rewritten using adjunction only. Formal considerations are therefore of little help in this discussion as far as the derived tree is concerned, but if derivation trees are intended to reflect the meaning of the sentence, the way the trees combine is a crucial matter. I base my argumentation on linguistic arguments and propose below a series of tests which show that TMAs behave differently from verbs.

⁹ This can be extended to predicates in general, since there are non-verbal predicates in GC.

Several tests have been proposed for Romance languages (in Abeillé and Godard (2003)) that cannot be used for GC. These are based on clitic-climbing and infinitival constructions, which do not exist as such in GC and are therefore of little help. Instead, I will base my arguments on coordination and cleft structures.

Coordination: While coordination can apply to lexical items almost without restriction (see (Bernabé, 1983, p. 1396ff.) for a review of coordination in GC), TMA markers cannot be coordinated in the same position:

(7) *Jan ka é ké manjé.
     Jean Imperf and Prosp eat
     'Jean is and will be eating'

But lexical items can be coordinated:

(8) sèvolan la ka monté, monté, (é) monté !
    kite def imperf go-up go-up (and) go-up
    'The kite goes up, up, up!' (from (Bernabé, 1983, p. 545))

(9) shows that the repetition of the predicate is the norm, since coordination of TMAs is blocked, and (10) shows that verb coordination can appear below TMAs.

(9) an fouté y, an ka fouté y, an ké fouté y !
    1sg win 3sg 1sg imperf win 3sg 1sg prosp win 3sg
    'I won (against him in the past), I won (this time) and I will win!'

(10) Jan ka dansé é chanté
     Jean imperf dance and sing
     'Jean is dancing and singing'

Predicate Cleft: Predicate cleft is a frequent construction in GC (while impossible in French). In (11), the lexical verb is clefted without the TMA marker. The same sentence with a clefted TMA is ungrammatical, as in (12).

(11) sé monté nou ka monté pou nou rivé la nou ka alé.
     it-is climb 1pl imperf climb for 1pl arrive there 1pl imperf go
     'We are climbing to arrive where we are going' (intensive meaning)

(12) a. *sé ka monté nou ka monté... [Aspectual marker prohibited]
     b. *sé té monté nou té monté... [Tense marker prohibited]

This test highlights the difference between (still) real periphrastic elements and TMAs. For instance, it is usually accepted that the form kay (ka+ay) 'Imperfective+go' (example (13)) is one of the TMAs (see Vaillant (2008a) and (Bernabé, 1983, p. 1035)).

(13) Jan kay vini
     Jean imperf+go come
     'Jean is going to come'

But ay 'go' differs from the TMAs w.r.t. this test, as shown in (14):

(14) sé ay Jan kay vini
     it-is go Jean imperf+go come
     'Jean is going to come.' (stress on the movement)

Then, the verb ay 'to go' will be inserted using adjunction, as in Abeillé (2002) for aspectual verbs and auxiliaries in French. It anchors its own tree and is stored in the Lexicon, contrarily to the TMAs. The negative marker is also excluded from the clefted position (15), lending weight to the hypothesis that the clefted element is below any verbal functional projections [VP V (NP)].

(15) Sé (*pa) monté, nou pa monté
     it-is neg climb 1pl neg climb
     'We didn't climb up.'

A noteworthy exception seems to be (16):

(16) Sé vlé pa, i vlé pa !
     It-is will neg 3sg will neg
     'He really doesn't want!'

But the behavior of vlé here is atypical, since the negative marker follows the verb¹⁰. From these tests, I conclude that TMA markers are functional elements, just like inflectional affixes in French. The only, but significant, difference is that these markers are not expressed via synthesis but via a particular form of periphrasis (a situation where two or more words express the meaning expected for a single word). Syntactic intermediate projections are available between Tense and Aspect and allow some (rare) adverbs to be inserted between the TMA markers, as in (17):

(17) Pyè té ja ka vin.
     Pierre anterior already imperf come
     'Pierre was already coming.' (from (Bernabé, 1983, p. 1060))

Thus, I treat TMA markers as functional elements that form an extended projection of the lexical head, in conformity with the CETM. This work is in accordance with many recent works attempting to reconsider the traditional division between morphology and syntax. For instance, Brown et al. (2012) try to avoid a binary division of labor between morphology and syntax and claim that periphrases are both. I will follow here some key assumptions presented in the Paradigm Function Morphology (PFM) framework (Stump (2001); Bonami and Stump (in prep.)), which are highly compatible with a lexicalist approach to syntax. This clearly separates the TAG grammar I propose here from the standard GB analyses (in the wake of Chomsky (1981)), which are non-lexicalist. In particular, I adopt the idea that "some syntactic constructions express the pairing of a lexeme with a morphosyntactic set, and should thus be incorporated into inflectional paradigms as inflectional periphrases" (Bonami and Stump, in prep., p. 20).

¹⁰ It might illustrate a case of lexical integration of pa into the verbal stem.

In the case of GC's TMAs, it is reasonable to claim that the TMAs and the lexical verb realize different forms of the same lexeme. Bonami and Webelhuth (2013) note that most morphological descriptions of functional periphrasis are rudimentary in their description of the periphrasis' syntactic structure, and that its syntactic behavior differs from one language to another (see also Abeillé and Godard (2003)). I suggest that TMAs are in part similar to proclitics, an analysis which has also been proposed in the HPSG framework by Henri and Kihm (2013), but still allow syntactic nodes for adjunction. The present study, as well as Henri and Kihm (2013) and Schang et al. (2012) for other creole languages, suggests a finer-grained characterization of the TMAs from a typological point of view.

3 TMAs in the TAG Metagrammar

3.1 eXtensible Meta-Grammar

XMG¹¹ is a declarative language for specifying tree-based grammars at a meta-level (Crabbé and Duchier (2004); Crabbé et al. (2013)). XMG allows the linguist to capture generalizations in his grammar by defining tree fragments (Classes) that can combine via feature unification (conjunctive / disjunctive combinations of fragments). Once defined, Classes can be reused in distinct contexts, allowing elegant generalizations. A core grammar is described by fragments of elementary trees, and these fragments combine to form the expanded grammar, which is made of elementary trees¹². I will use these concepts to describe the TMA part of the core grammar of GC (Crabbé and Duchier, 2004, p. 1).

3.2 Fragments of Functional Projections

From the preceding section, I take for granted that TMAs are not stored in the Lexicon (they don't anchor any tree properly) but are co-anchors of the elementary tree associated with verbs¹³. For instance, (19) illustrates the structure of (18).

(18) Jan té ka manjé
     Jean ant imperf eat
     'Jean was eating.'

¹¹ I am using the XMG-2 version.
¹² See Crabbé (2005) for a large metagrammar of French using XMG.
¹³ For a similar approach to Santomense's Tense and Aspect markers, see Schang et al. (2012).

(19) [S [N Jan] [V [T té] [V [Imperf ka] [V manjé]]]]

In (19), S is a projection of V, the maximal functional stretching of the verb. In XMG's framework, the structure (19) is broken down into four pieces (i.e. classes), each containing minimal information. These Classes are listed below.

– CanSubject: for the External Argument of the verb. It is described in (20-a).
– Imperf: as a projection of the Imperfective (Aspect) marker. It is described in (20-b).
– Tensed: as a projection of Tense. It is described in (20-c).
– Intransitive verb: the minimal projection of V. It is described in (20-d).

(20) a. [S NP↓ V[proj:T|Imperf|V]]
     b. [V[proj:Imperf] ka V[proj:V]]
     c. [V[proj:T] té V[proj:Imperf|V]]
     d. [V[proj:V] V⋄]

Thus, (19) is built up from the following conjunction of Classes:

CanSubject ∧ Intransitive ∧ Imperf ∧ Tensed

As in Schang et al. (2012) for Santomense, the feature proj(ection) is used to rule out invalid combinations in the output elementary tree.¹⁴

¹⁴ In the values associated with the feature proj, "|" refers to disjunction.

From the conjunction of classes given above, the metagrammar compilation yields elementary trees for intransitive verbs, inflected with Tense and Aspect, as shown in (21)¹⁵.

(21) [elementary tree as displayed by XMG; see footnote 15]
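To illustrate how the proj feature rules out invalid combinations, the following Python sketch simulates the stacking of the fragments from (20). The dictionary encoding and the valid_stack function are hypothetical simplifications for illustration, not XMG code.

    # Each fragment projects a proj value and constrains the proj value
    # of the V node it embeds (cf. (20) and footnote 14).
    FRAGMENTS = {
        "Tensed":       {"proj": "T",      "embeds": {"Imperf", "V"}},  # té V[proj:Imperf|V]
        "Imperf":       {"proj": "Imperf", "embeds": {"V"}},            # ka V[proj:V]
        "Intransitive": {"proj": "V",      "embeds": set()},            # the lexical V
    }

    def valid_stack(stack):
        """Accept a top-down sequence of fragments if every fragment
        embeds one whose proj value it is compatible with."""
        for outer, inner in zip(stack, stack[1:]):
            if FRAGMENTS[inner]["proj"] not in FRAGMENTS[outer]["embeds"]:
                return False
        return stack[-1] == "Intransitive"   # the projection ends in a lexical V

    print(valid_stack(["Tensed", "Imperf", "Intransitive"]))   # True:  té ka V
    print(valid_stack(["Imperf", "Tensed", "Intransitive"]))   # False: *ka té V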

The same mechanism is used to describe verb families. For instance, the class of Intransitive verbs inflected for TMAs can be described as a class IntransV gathering the classes of the inflected forms for intransitive verbs¹⁶:

BareV | ImperfV | ProspImperfV | TensProspImperfV | TensedV | ProspV | TensImperfV | TensProspV

As expected from the morphological approach I defend here, the TMAs do not appear as adjuncts but are co-anchors of the verb. (22) shows the derivation tree¹⁷ for the sentence Jan té ka manjé 'Jean was eating'.

(22) α1-manjé[Ant; Imperf]
         |
      α2-Jan

The benefit we have here with regard to adjunction is that the semantic interpretation of the sequence N0 té ka V is directly derivable from the features on the verb (see Table 1). There is no need to postulate ambiguous TMA markers triggering various interpretations, or different syntactic positions for the same marker, as is commonly proposed in descriptions of creole languages (see Winford (2012)), i.e. postulating several distinct ka or Ø markers to derive the correct interpretation. On the contrary, it is compatible with the PFM idea that words (or multiword expressions) realize the morphosyntactic property set of the lexeme they are a member of. Then, té ka manjé is the realization of the morphosyntactic set {Anterior, Imperfective} for the lexeme manjé.

¹⁵ Which is a picture of the elementary tree as displayed by XMG.
¹⁶ The abbreviations are: Imperf(ective), Prosp(ective), Tens(ed).
¹⁷ Where α is the conventional label for an elementary tree.
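Read as data, Table 1 then amounts to a finite pairing between marker sequences and morphosyntactic property sets. The following Python fragment is a hypothetical rendering of that pairing; the property labels and the function realize are invented for illustration.

    # Hypothetical paradigm: a TMA marker sequence realizes a
    # morphosyntactic property set of the verbal lexeme (cf. Table 1).
    PARADIGM = {
        ():                 frozenset(),
        ("ka",):            frozenset({"Imperfective"}),
        ("ké",):            frozenset({"Prospective"}),
        ("té",):            frozenset({"Anterior"}),
        ("ké", "ka"):       frozenset({"Prospective", "Imperfective"}),
        ("té", "ka"):       frozenset({"Anterior", "Imperfective"}),
        ("té", "ké"):       frozenset({"Anterior", "Prospective"}),
        ("té", "ké", "ka"): frozenset({"Anterior", "Prospective", "Imperfective"}),
    }

    def realize(markers, lexeme):
        props = PARADIGM[tuple(markers)]
        return " ".join(markers + [lexeme]), set(props)

    print(realize(["té", "ka"], "manjé"))
    # ('té ka manjé', {'Anterior', 'Imperfective'})  -- set order may vary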

4 Extended Projections in the Nominal Domain

The same methodology developed for the TMAs is adopted in the Nominal domain, incorporating Definite, Demonstrative and Plural markers as co-anchors of a lexical Noun, as in (23), where the Plural marker sé precedes the N and the Definite marker follows the N, as illustrated in (24). Note that in GC, the Plural requires the Definite, *sé timoun being ungrammatical.

(23) [N[proj:Pl] sé [N[proj:Def] N⋄ la]]

(24) [sé timoun la] vin
     Pl child Def come
     'The children came.'

(23) is the conjunction of the classes:

Plural | Definite | Noun

where Plural can only appear if Definite is present.
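A hypothetical Python sketch of this combination constraint (the noun_phrase function and its keyword arguments are invented for illustration; the real constraint is stated in the metagrammar classes):

    # The Plural co-anchor sé requires the Definite co-anchor la:
    # sé timoun la is well-formed, *sé timoun is not.
    def noun_phrase(noun, definite=False, plural=False):
        if plural and not definite:
            raise ValueError("Plural requires Definite: *sé " + noun)
        words = (["sé"] if plural else []) + [noun] + (["la"] if definite else [])
        return " ".join(words)

    print(noun_phrase("timoun", definite=True, plural=True))   # sé timoun la
    print(noun_phrase("timoun"))                               # timoun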

The genitive preposition a¹⁸ 'of' in (25), which only has a functional role akin to a Genitive case marker, is also treated as a co-anchor. In (25), the NP [a Lelette] is adjoined to the head noun kaz.

(25) [kaz [a Lelette]]
     house Genitive Lelette
     'Lelette's house'

The genitival form of the noun (as head of the genitive NP) is an auxiliary tree (for adjunction) containing the co-anchor a, as shown in (26).

¹⁸ And its allomorph an when preceding a nasal.

(26) [N N* [N[proj:Gen] a N⋄]]

The comparison of GC with Matiniké Creole (spoken on the island of Martinique, closely related to GC and diachronically linked to it) lends weight to this approach, since in Matiniké, the same NP as in (25) would be kaz Lelette, without preposition. The difference between these creoles lies only in the fact that Matiniké has a covert (silent) preposition for the same structure as (26). To summarize, the inflected forms of a noun can be represented as a conjunction of classes:

Demonstrative-N | Plural-N | Definite-N | Bare-N | Genitive-N

Note that while Plural, Demonstrative, Definite and Genitive appear as syntactic projections in GC, the same functional elements are concatenated at the word level in other languages.

5 Conclusion

I have shown that the Tense and Aspect markers of Guadeloupean Creole are functional projections that can be described as co-anchors of a lexical head (extended projections). Their response to the tests of coordination and cleft structures shows that TMAs are not as free as other periphrastic elements (such as the verb ay 'to go'). Their place is somewhere between concatenative morphology at the word level and 'free' periphrastic elements, which I take to be adjoined elements. In the nominal domain, I have suggested that the definite, demonstrative, plural and genitive markers are also functional elements of the same sort. These functional elements are integrated in the elementary trees that form the GC grammar as co-anchors of the lexical item. I have demonstrated how these elements form fragments of (elementary) trees and how they combine to form the expanded grammar. To do this, I have used the XMG formalism for metagrammars. Since the combination of the fragments constitutes inflected forms of a lexeme (as compound tenses are still members of the verbal paradigm, see Ackerman and Stump (2004)), the building of elementary trees is as much a morphological as a syntactic operation. It thus casts a new light on Creole languages, which are commonly thought to have little (or even no) morphology.

Bibliography

Abeille,´ A. (2002). Une Grammaire ´electronique du Fran¸cais. CNRS Editions, Paris. Abeille,´ A. and Godard, D. (2003). Les predicats´ complexes dans les langues romanes. Les langues romanes. Paris: CNRS Editions, pages 125–184. Ackerman, F. and Stump, G. (2004). Paradigms and periphrastic expression: a study in realization-based lexicalism. Projecting Morphology. CSLI., pages 111–58. Bernabe,´ J. (1983). Fondal-natal. l’Harmattan Paris. Bonami, O. and Stump, G. (in prep). Paradigm Function Morphology. In Spencer, A., editor, The Handbook of Morphology, 2nd ed. Wiley-Blackwell. Bonami, O. and Webelhuth, G. (2013). The phrase-structural diversity of periphrasis: a lexicalist account. In Chumakina, M. C. G. G., editor, Periphrasis: The role of syntax and morphology in paradigms. Oxford University Press. Brown, D., Chumakina, M., Corbett, G., Popova, G., and Spencer, A. (2012). Defining ‘periphrasis’: key notions. Morphology, 22(2):233–275. Chomsky, N. (1981). Lectures on the Theory of Government and Binding. Dordrecht: Foris. Cinque, G. (2010). The syntax of adjectives: a comparative study. Crabbe,´ B. (2005). Repr´esentationinformatique de grammaires d’arbres fortement lex- icalis´ees: le cas de la grammaire d’arbres adjoints. PhD thesis, Universite´ Nancy 2. Crabbe,´ B. and Duchier, D. (2004). Metagrammar Redux. In International Workshop on Constraint Solving and Language Processing, Copenhagen. Crabbe,´ B., Duchier, D., Gardent, C., Le Roux, J., and Parmentier, Y. (2013). XMG : eXtensible MetaGrammar. Computational Linguistics, 39(3):1–66. Damoiseau, R. (2012). Syntaxe cr´eolecompar´ee. Karthala et cndp-crdp edition. Delumeau, F. (2006). Une description linguistique du cr´eoleguadeloup´eendans la perspective de la g´en´eration automatique d’´enonc´es. PhD thesis, Universite´ Paris X. Frank, R. (2002). Phrase Structure Composition and Syntactic Dependencies. MIT Press, Cambridge, Mass. Grimshaw, J. (2000). Locality and extended projection. Amsterdam Studies in the Theory and History of Linguistic Science Series 4, pages 115–134. Henri, F. and Kihm, A. (2013). The Morphology of TMA in Mauritian and Creole. Joshi, A. K. and Schabes, Y. (1997). Tree-Adjoining Grammars. In Rozenberg, G. and Salomaa, A., editors, Handbook of Formal Languages, volume 3, pages 69–124. Springer, Berlin, New York. McCrindle, K. L. (1999). Temps, mode et aspect, les creoles des Caraibes a base lexi- cale franc¸aise. Pfander,¨ S. (2000). Aspekt und Tempus im Frankokreol. G. Narr. Schang, E., Duchier, D., Ekoukou, B. M., Parmentier, Y., and Petitjean, S. (2012). De- scribing Sao˜ Tomense Using a Tree-Adjoining Meta-Grammar. In 11th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+ 11). Stump, G. (2001). Inflectional morphology: a theory of paradigm structure, volume 93. Cambridge Univ Pr. Extended Projections in a Guadeloupean TAG Grammar 61

Vaillant, P. (2008a). A layered grammar model: Using tree-adjoining grammars to build a common syntactic kernel for related dialects. arXiv preprint arXiv:0810.1207.
Vaillant, P. (2008b). Une grammaire formelle du créole martiniquais pour la génération automatique. arXiv preprint arXiv:0810.1199.
Winford, D. (2012). Creole Languages. The Oxford Handbook of Tense and Aspect.
XTAG Research Group (2001). A Lexicalized Tree Adjoining Grammar for English. Technical report, Institute for Research in Cognitive Science, Philadelphia. Available from ftp://ftp.cis.upenn.edu/pub/xtag/release-2.24.2001/tech-report.pdf.

Frigram: a French Interaction Grammar

Guy Perrier and Bruno Guillaume

LORIA, Université de Lorraine, Nancy, France

Abstract. We present Frigram, a French grammar with large coverage, written in the formalism of Interaction Grammars. The originality of the formalism lies in its system of polarities, which expresses the resource sensitivity of natural languages and which is used to guide syntactic composition. We focus the presentation on the principles of the grammar, its modular architecture, the link with a lexicon independent of the formalism, and the companion property, which helps to guarantee the consistency of the whole grammar.

Keywords: Formal Grammar, Model Theoretic Syntax, Polarity, Interaction Grammar.

1 Introduction

The aim of our work is to show that it is possible to build a realistic computational grammar of French which integrates fine linguistic knowledge with a large coverage. As a framework, we have chosen the formalism of Interaction Grammar (IG) [7]. IG combines a flexible view of grammars as constraint systems with the use of a polarity system to control syntactic composition. The system of polarities expresses the saturation state of partial syntactic structures and their ability to combine together. The main challenge is to guarantee and to maintain the consistency of the grammar while aiming at the largest coverage. We resort to several means:

– a modular organization of the grammar in a hierarchy of classes, which is able to capture the generalizations of the language,
– principles of well-formedness for the elementary structures of the grammar,
– a separation of the grammar itself from the lexicon, which is independent of any grammatical formalism,
– the use of the companion property to help check the consistency of the grammar.

Starting with a brief presentation of IG, we continue with an explanation of the different points mentioned above, a comparison with other French grammars, and a discussion of the evaluation of the grammar.

2 Interaction Grammars

IG is a grammatical formalism which is devoted to the syntax of natural languages using two notions: tree description and polarity. For a complete presentation of the formalism, the reader can refer to [7].

2.1 Tree Descriptions

The notion of a tree description [11] is related to a model-theoretic view of the syntax of natural languages [10]. In this view, the basic objects of the grammar are not trees but properties that are used to describe them, in other words tree descriptions. This approach is very flexible, allowing the expression of elementary properties in a totally independent way, and their combination in a free manner. A tree description can be viewed either as an underspecified tree, or as the specification of a tree family, each tree being a model of the specification.

[Figure: a PTD made of three tree fragments, one per word of "montrez-le !", with nodes nPred, nS, nS0, nSubj, nHead, nObj, nCompl, nVmax, nVanch, nVclit, nClit, nClit0, nPunct and nPunctSign carrying polarized cat/funct features]

Fig. 1. PTD associated with the sentence “montrez-le !” by the grammar Frigram.

Figure 1 gives an example of a tree description, which is associated with the sentence "montrez-le !" ["show it!"]. Even if the description is composed of three parts associated with the three words of the sentence (punctuation signs are considered as words), it must be considered as a unique tree description. A tree description is a finite set of nodes structured by two kinds of relations: dominance and precedence. Dominance relations can be immediate or large. In the example, there are only immediate dominance relations, represented with solid lines. Precedence relations can also be immediate or large. They are represented with arrows in Figure 1; these arrows are solid and black or dashed and green, depending on whether the dependencies are immediate or large. Nodes, which represent constituents, are labelled with features describing their morpho-syntactic properties. Feature values are atoms or atom disjunctions. When a feature value is the disjunction of all elements of a domain, this value is denoted with "?". A mechanism of co-indexation between feature values (a common index [[n]] is put before their values) allows for sharing. It is possible to add constraints on the phonological form and on the daughters of nodes: a node is declared to be empty if its phonological form is empty, and it is graphically represented with a white rectangle; otherwise, it is declared to be full, and it is represented with a light-yellow rectangle; it is declared to be closed if the set of its daughters is fixed, and it is represented with a double rectangle; finally, a node is declared to be an anchor if it is a full leaf, and it is used to anchor a word of the language. An anchor is represented with a canary yellow rectangle. IG uses three kinds of empty nodes:

– when an argument has moved from its canonical position, this position is marked with a trace, an empty node with the feature empty_type = track; this covers all cases of extraction, subject inversion and cliticization of arguments; in Figure 1, node nCompl is the empty trace of the object represented by the clitic pronoun "le";
– when an argument is not expressed with a phonological form, it is represented with an empty node carrying the feature empty_type = arg; this is the case for subjects of adjectives, infinitives and imperatives, as well as some objects of infinitives (tough movement); in Figure 1, node nSubj represents the non-expressed subject of the imperative verb "montrez";
– in the presence of an ellipsis, the head of the elided expression may be represented with an empty node carrying the feature empty_type = ellipsis.

2.2 Polarities

Polarities are used to express the saturation state of syntactic trees. They are attached to the features that label description nodes, with the following meaning:

– a positive feature f → v expresses an available resource, which must be consumed;
– a negative feature f ← v expresses an expected resource, which must be provided; it is the dual of a positive feature; one negative feature must match exactly one corresponding positive feature to be saturated, and conversely;
– a saturated feature f ↔ v expresses a linguistic property that needs no combination to be saturated;
– a virtual feature f ~ v expresses a linguistic property that needs to be realized by combining with an actual feature (an actual feature is a positive or saturated feature).

In Figure 1, node nObj carries a negative feature cat ← np and a positive feature funct → obj, which together represent the expected object noun phrase for the transitive verb "montrez". The virtual features of the second part of the tree description represent the syntactic context required by the clitic pronoun "le": a verb nVclit put immediately before the pronoun to build the node nVmax with it.

Only resource-sensitive features are polarized. Other features are called neutral features and denoted f = v. For instance, agreement properties are expressed with neutral features. Descriptions labelled with polarized features are called polarized tree descriptions (PTDs) in the rest of the article.
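The effect of merging two nodes carrying the same feature name can be summarized by a small composition table over the four polarities. The Python sketch below is a simplified, hypothetical rendering of that table; it ignores feature-value unification, which must succeed as well, and is not the actual IG definition.

    # Simplified polarity composition on a shared feature name:
    # '->' positive, '<-' negative, '<->' saturated, '~' virtual.
    def compose(p, q):
        pair = {p, q}
        if pair == {"->", "<-"}:
            return "<->"                 # resource consumed: saturated
        if pair == {"~"}:
            return "~"                   # two virtual features stay virtual
        if "~" in pair:
            other = (pair - {"~"}).pop()
            if other in ("->", "<->"):   # virtual realized by an actual feature
                return other
        return None                      # merging is not allowed

    for p, q in [("->", "<-"), ("~", "<->"), ("~", "->"), ("->", "->")]:
        print(p, "+", q, "=", compose(p, q))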

2.3 Grammars as constraint systems

An interaction grammar is defined by a finite set of elementary PTDs, named EPTDs in the following, and it generates a tree language. A tree belongs to the language if it is a model of a finite set of EPTDs in the sense given by [7]. Each node of the EPTDs is mapped to a node of the model through an interpretation function. Properties of models include:

– A model is saturated: every positive feature f → v is matched with its dual feature f ← v in the model, and vice versa. Moreover, every virtual feature has to find an actual corresponding feature in the model.
– A model is minimal: it has to add a minimum of information to the initial descriptions (it cannot add immediate dominance relations or features that do not exist in the initial descriptions).

Parsing a sentence with a grammar G consists first in selecting an appropriate set of EPTDs from G. The selection step is facilitated if G is lexicalized: each EPTD has an anchor associated with a word of the language. This strongly reduces the search space for the EPTDs. Then, the parsing process itself reduces to the resolution of a constraint system. It consists in building all models of the selected set of EPTDs. Figure 1 represents a possible selection of EPTDs from Frigram to parse the sentence "montrez-le !". The selection includes three EPTDs¹, which are gathered in a unique PTD. Figure 2 shows the unique minimal and saturated model of the PTD. It is an ordered tree where nodes are labelled with non-polarized features in the form f : v, where v is an atomic value. In the head of each node, a list gives the nodes of the PTD that are interpreted in the node of the model. In an operational view of parsing, the building of a saturated and minimal model is performed step by step by refining the initial PTD with a merging operation between nodes, guided by one of the following constraints:

– neutralise a positive feature with a negative feature having the same name and carrying a value unifiable with the value of the first feature;
– realize a virtual feature by combining it with an actual feature (a positive or saturated feature) having the same name and carrying a value unifiable with the value of the first feature.

¹ The EPTDs are labelled by the name of the class of the grammar generating them, followed by a number. In the order of the sentence, we have NP0_V_NP1_134, PROclit_compl_imper_pos_38 and PUNCTstop_S1inter_imper_1.

[Figure: the model tree, in which each node lists the PTD nodes interpreted in it (e.g. class_NP0_V_NP1_134.nHead and class_PROclit_compl_imper_pos_38.nVmax merge into one v node) and carries neutral features such as cat : v, funct : head]

Fig. 2. Model of the PTD shown in Figure 1 representing the syntax of the sentence “montrez-le !”.

The constraints of the description interact with node merging to entail a partial superposition of their contexts, represented by the tree fragments in which they are situated. So the model of Figure 2 can be obtained from the PTD of Figure 1 with a sequence of three node merging operations: nVanch with nVclit, nObj with nCompl, and nPred with nS0. To summarize, IG combines the strong points of two families of formalisms: the flexibility of Unification Grammars and the saturation control of Categorial Grammars.

3 The Principles of the Grammar Frigram

Frigram is an IG for the French language; it contains 3 794 EPTD templates. Frigram follows a set of principles which express formally the chosen linguistic modeling. These principles are also used to automatically check the consistency of the grammar and its conformity to the linguistic principles. The constituency-to-dependency transformation used with Frigram also relies strongly on this set of principles.

Definition 1. A node with a positive or saturated cat feature is called a concrete node.

Principle 1 (cat-funct). In an EPTD, any node has a cat feature, and if it is concrete, it also has a funct feature.

The consequence is that any node of a model has a cat feature and a funct feature. Another consequence is that any node of a model has a unique concrete antecedent in the original PTD, because two concrete nodes of a PTD cannot merge in the model, according to the composition rules of polarities.

Principle 2 (strict lexicalisation). Any EPTD has exactly one anchor node. This anchor node has a saturated cat feature with an atomic feature value.

Definition 2. A spine in an EPTD is a list of nodes N1, N2, ..., Np such that:

– for any i such that 1 < i ≤ p, node Ni is a daughter node of Ni-1;
– for any i such that 1 < i ≤ p, node Ni has a saturated feature cat and a feature funct ↔ head;
– node N1 is a concrete node and its feature funct has a value different from head; it is called the maximal projection of all nodes belonging to the spine;
– node Np is either an anchor or an empty leaf; in the first case, the spine is called a main spine; in the second case, it is called an empty spine; in both cases, node Np is called the lexical head of all nodes belonging to the spine.

Principle 3 (spine). Any concrete node of an EPTD belongs to exactly one spine.

An important corollary of the spine principle is that every node N of a PTD model has exactly one lexical head in this model, denoted head(N) and defined as follows: the concrete antecedent of N in the initial PTD belongs to exactly one spine, and head(N) is the interpretation in the model of the leaf ending the spine. A second important corollary is that every node in a PTD model which is not a leaf has exactly one daughter node with the feature funct : head. By following all nodes with this feature, we have a more direct way of finding the lexical head of every node in a PTD model (see the sketch below). A third corollary is that each model node with a positive feature cat is the maximal projection of some spine. From the strict lexicalisation and spine principles, we can also deduce that every EPTD has exactly one main spine. To illustrate the concept of a spine, let us consider the EPTDs of Figure 1. The EPTD associated with the verb "montrez" has two spines: the main spine nPred, nHead, nVanch with its lexical head nVanch, and an empty spine reduced to the single node nSubj. The formalism of IG is situated in the constituency approach to syntax, as opposed to the dependency approach, but the principles of Frigram allow for an automatic transformation of any parse made with IG from a constituency setting into a dependency setting. Our purpose here is not to describe the transformation in detail but to give an outline of it.
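As a minimal illustration of the second corollary, here is a hypothetical Python sketch that finds the lexical head of a model node by following, at each level, the unique daughter carrying funct : head. The nested-dictionary encoding of nodes is invented for illustration and is not the Leopar data structure.

    def lexical_head(node):
        """Follow the unique funct:head daughter down to a leaf."""
        while node["daughters"]:
            node = next(d for d in node["daughters"] if d["funct"] == "head")
        return node

    # Toy model of the spine nPred -> nHead -> nVanch from Figure 1:
    model = {"funct": "void", "daughters": [
        {"funct": "subj", "daughters": []},
        {"funct": "head", "daughters": [
            {"funct": "head", "daughters": [], "word": "montrez"}]}]}
    print(lexical_head(model))   # the innermost funct:head leaf, "montrez"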

[Figure: on the left, the dependency graph over "montrez", "le", "!" and two empty words, with the trace linked to "le" by an ANT relation; on the right, the final dependency tree with subj, obj and punct dependencies over montrez (v), le (pro) and ! (punct)]

Fig. 3. The dependency graphs representing the syntax of the sentence “montrez-le !”.

The dependencies are generated by the interactions between the polarized features cat and funct of the different nodes of the initial PTD, and they are projected onto the full and empty words of the sentence through the notion of lexical head. For the sentence "montrez-le !", from the PTD of Figure 1 and its model in Figure 2, we compute the dependency graph on the left of Figure 3. It has two empty words: the subject of "montrez", named ε_a (which corresponds to the node with feature empty_type = arg), and its object ε_t (which corresponds to the node with feature empty_type = track), which is the trace of the clitic pronoun "le". Traces are linked to their antecedent with the relation ANT. In a second step, the empty words are removed and their incident dependencies are transferred to their full antecedent, when it exists. In our very simple example, the resulting dependency graph reduces to the tree on the right of Figure 3, but in more complex sentences, the dependency graph includes cycles and nodes with several governors. When there is more than one solution, hand-crafted rules are used to compute a weight for each solution in order to rank them.
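The second step can be pictured with a small hypothetical Python sketch: dependencies whose endpoints are empty words are rerouted to the full antecedent when one exists, and dropped otherwise. The triples, the names eps_a/eps_t and the antecedent table are invented for illustration.

    def contract_empties(deps, antecedent, empty_words):
        """deps: set of (governor, label, dependent) triples."""
        out = set()
        for gov, lab, dep in deps:
            gov = antecedent.get(gov, gov)   # transfer to the full antecedent
            dep = antecedent.get(dep, dep)
            if gov not in empty_words and dep not in empty_words:
                out.add((gov, lab, dep))
        return out

    deps = {("montrez", "subj", "eps_a"), ("montrez", "obj", "eps_t"),
            ("montrez", "punct", "!")}
    print(contract_empties(deps, {"eps_t": "le"}, {"eps_a", "eps_t"}))
    # {('montrez', 'obj', 'le'), ('montrez', 'punct', '!')}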

4 The Architecture of the Grammar

4.1 The Modular Organisation of the Grammar

It is unthinkable to build a grammar with about 4 000 EPTD templates manually, considering each one individually. Even if it were possible, maintaining the consistency of such a grammar would be intractable. Now, the EPTD templates of Frigram share a lot of fragments, and it is possible to organize the grammar as a class hierarchy. A tool, XMG [4], was specially designed to build this kind of grammar. XMG provides a language to define a grammar as a set of classes. A class can be defined directly, but also from other classes by means of two composition operations: conjunction and disjunction. Each class is structured according to several dimensions. Frigram uses two dimensions: the first one is the syntactic dimension, where objects are EPTD templates, and the second one is the dimension of the interface with the lexicon, where objects are feature structures. The terminal classes of the hierarchy define the EPTD templates of the grammar that are computed by the XMG compiler.

[Figure: class hierarchy from BasicVerb down through ActiveInflectionVerb, InfiniteVerb/FiniteVerb, ImperativeVerb/NonImperativeVerb, ParticipialVerb, ActiveMorphology, PredicateCompl, DirectObject/IndirectObject and their nominal variants, combined by AND/OR nodes into NP0_V_NP1 and finally NP0_V_NP1_PP2]

Fig. 4. The hierarchy of classes used to define the NP0_V_NP1_PP2 class of transitive verbs with an indirect complement

Figure 4 gives the example of a terminal class, the NP0_V_NP1_PP2 class of transitive verbs with an indirect complement, together with the hierarchy of classes used to define it. The current grammar Frigram is composed of 428 classes, including 179 terminal ones, which are compiled into 3 794 EPTD templates. Of course, some general classes can be used in several different contexts. For instance, adjective, noun and verb descriptions all inherit from the same subclasses related to complements of predicative structures. The set of classes is organized in a module hierarchy (Figure 5). There is another hierarchy related to the different forms of extraction, in relative, interrogative and cleft clauses. The root module of that hierarchy is the ExtractGramWord module. Three modules depend on it: Complementizer, Interrogative and Relative. Moreover, there are isolated modules related to specific categories.

[Figure: the main module hierarchy, with verbKernel and complement at the top, verbPersonalDiatheses, verbImpersonalDiatheses, verbSubjectControl, adverb, adjective and noun below, and verb at the bottom]

Fig. 5. The main hierarchy of modules

4.2 The link with a lexicon independent of the formalism

The full grammar is produced from the set of EPTD templates and a lexicon. Each EPTD template is associated with a feature structure (called its interface) which describes a syntactic frame corresponding to the lexical units able to anchor it; lexicon entries are also described through feature structures. Unification between the interface of the EPTD template and the lexicon feature structure is used to control the combination of a lexical unit description and the template. Thanks to the strict lexicalisation principle, each EPTD of the grammar has a unique anchor node linked with a lexical unit of the language. Since there is a co-indexation between features of the EPTD template and features of the interface, a side effect of anchoring is the instantiation of some feature values in the EPTD. In our system, the lexicon used is Frilex², which combines morphological information taken from ABU³ and from Morphalou [12] with syntactic information for verbs from Dicovalence [14]. Frilex contains 530 000 entries. To avoid size explosion, the full grammar is built on the fly for each input sentence.
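The anchoring control can be pictured as plain feature-structure unification. The Python sketch below uses flat feature structures and invented feature names; the actual interfaces of Frigram and the entries of Frilex are of course richer, with co-indexation between interface and template.

    def unify(fs1, fs2):
        """Flat feature-structure unification: fail on a value clash."""
        out = dict(fs1)
        for feat, val in fs2.items():
            if feat in out and out[feat] != val:
                return None      # clash: the entry cannot anchor the template
            out[feat] = val
        return out

    interface = {"cat": "v", "frame": "NP0_V_NP1"}             # template side
    entry = {"lemma": "montrer", "cat": "v", "frame": "NP0_V_NP1"}
    print(unify(interface, entry) is not None)                 # True: anchoring succeeds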

5 The Companion Property and the Consistency of the Grammar

Our ambition is to build a grammar with coverage of all of the most frequent phenomena of French syntax. Even if the hierarchical structure of the grammar makes it more compact and eases the maintenance of its consistency, the size of the grammar may be important, and the grammar offers no global view of its contents. To verify the consistency of the grammar, it is necessary to check the behavior of each EPTD in the composition process with other EPTDs. One way of doing this is to parse corpora with the grammar, but this is a very partial check. Now, the formalism of IG provides a mechanism to verify the consistency of a grammar in a static way, based on the EPTDs of the grammar, without using parsing. The mechanism uses the Companion Property. Originally, this property was introduced by [1] to perform lexical disambiguation with IG. Let us consider an interaction grammar.

² http://wikilligramme.loria.fr/doku.php?id=frilex
³ http://abu.cnam.fr/DICO/mots-communs.html

Definition 3. A companion of a polarized feature in an EPTD E1 of the grammar is a polarized feature of an EPTD E2 such that the first feature is saturated by the second feature in a merging of their nodes leading to a consistent PTD.

What we mean by "a consistent PTD" is that the PTD resulting from the node merging has at least one model, which is a tree but not necessarily minimal and saturated. For instance, consider the EPTD associated with the verb "montrez" in Figure 1. A companion of the positive feature funct → obj is the negative feature funct ← obj|subjpred of the EPTD associated with the clitic pronoun "le". The notion of companion can be expressed at the template level: we systematically compute the companions of all polarized features of the EPTD templates of the grammar. In this way, we limit the number of companions to compute. So, for the EPTD template E0 corresponding to the EPTD anchored with "montrez" in Figure 1 and for the positive feature funct → obj, we find 97 companions: 85 are right companions, that is, companions coming from EPTD templates for which the anchor is on the right of the anchor of E0 after merging, and 12 are companions without order constraints on the anchor of their EPTD. Among all the information given by the computation of all the companions, one particular part is immediately usable: the polarized features without companions. If they have no companion, their EPTDs cannot enter any parse, which means that the EPTD template must be removed from the grammar or that there is some mistake in its definition.
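The static check can be sketched in Python as follows. The flat encoding of polarized features as (template, feature, polarity, values) tuples is a hypothetical simplification that ignores the tree contexts and order constraints taken into account by the real computation.

    def missing_companions(grammar):
        """Report positive/negative features that no dual feature can saturate."""
        dual = {"->": "<-", "<-": "->"}
        missing = []
        for name, feat, pol, vals in grammar:
            comps = [n for (n, f, p, v) in grammar
                     if f == feat and p == dual.get(pol) and vals & v]
            if pol in dual and not comps:
                missing.append((name, feat, pol))
        return missing

    g = [("NP0_V_NP1_134", "funct", "->", {"obj"}),
         ("PROclit_compl_imper_pos_38", "funct", "<-", {"obj", "subjpred"})]
    print(missing_companions(g))   # []: each polarized feature has a companion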

6 Comparison with other French Grammars and Evaluation of the Grammar

There is very little work on the construction of French computational grammars from linguistic knowledge using semi-automatic tools. Historically, a very fruitful work was the PhD thesis of Candito [2] on the modular organization of TAGs, with an application to French and Italian. This thesis was a source of inspiration for the development of several French grammars. A first grammar produced according to this approach and able to parse large corpora was FRMG [15]. FRMG falls within the TAG formalism, and its originality lies in the use of specific operators on nodes to factorize trees: disjunction, guards, repetition and shuffling. As a consequence, the grammar is very compact, with only 207 trees. Moreover, these trees are not written by hand but are automatically produced from a multiple-inheritance hierarchy of classes. Another French grammar inspired by [2] is the French TAG developed by [3]. Like Frigram, this grammar was written with XMG. Contrary to FRMG, it is constituted of classical TAG elementary trees, hence its more extensive form: it includes 4 200 trees and essentially covers verbs. It was a purely syntactic grammar, and it was then extended in the semantic dimension by [5] for generation.

Evaluating the soundness of Frigram and comparing its coverage with other French grammars is problematic. The first difficulty is that there is no robust parser able to deal with IG. The tool developed so far (Leopar [6]) was designed to experiment, to test and to help grammar development. It was later enriched with filtering algorithms to improve the supertagging stages of the parsing process. Nevertheless, it does not have any robust mechanism to deal with sentences that are not completely covered by the grammar. After the filtering steps, deep parsing relies on an exhaustive search of tree description models, which is an NP-hard task. As a consequence, Leopar can be used to parse sentences of length up to 15 words. The French Treebank (FTB) contains 3 398 sentences of length lower than 15. Moreover, all the linguistic phenomena present in real corpora like the FTB cannot be modeled through a lexicalized grammar: dislocation, coordination of non-constituents, parenthetical clauses... These phenomena require an extra-grammatical treatment, which is not yet implemented in Leopar. Thus, we consider the subset of sentences without explicit extra-grammatical phenomena (parentheses, reported speech); there are 2 166 such sentences. The parser Leopar with the Frigram resource is able to parse 56.4% of the sentences considered.

Another way to evaluate a grammar's coverage is to use test suites. Such suites must include not only positive examples but also negative examples, to test the overgeneration of the grammar. There exists such a suite for French, the TSNLP [8], but unfortunately it ignores a lot of phenomena that are very frequent in French. On the set of grammatical sentences of the TSNLP, Leopar and Frigram are able to parse 88% of the sentences. This is equivalent to the number achieved in [9], but the remaining sentences correspond to sentences that should be covered by the robustness of the parser rather than by the detailed grammar (unusual kinds of coordination, sentences with incomplete negations, ...). To deal with the TSNLP's drawbacks, we have designed our own test suite, which is complementary to the TSNLP; it contains 874 positive sentences and 180 negative ones. 93% of the grammatical sentences are parsed, and the ratio is 21% for ungrammatical sentences.
The reader can find the test suite on a web page⁴. For the positive sentences, the result of parsing is also given in the form of a dependency graph. The variety of the examples gives a good idea of the coverage of Frigram, and the richness of the dependency graphs helps to understand the subtlety of the grammar.

7 Conclusion

The next step to go further with Frigram is to solve the bottleneck of the parser Leopar in order to parse raw corpora. We need to improve the efficiency of the parser to contain the possible explosion resulting from the increase in grammar size in combination with increased sentence length. It is also necessary to take robustness into account in the parsing algorithm and to add extra-grammatical procedures to deal with phenomena that go beyond the lexicalized grammar. For English, [13] is a first attempt to build an IG grammar

4 http://wikilligramme.loria.fr/doku.php?id=hmge_2013

that should be extended in order to reach a coverage equivalent to that of Frigram.

References

1. G. Bonfante, B. Guillaume, and M. Morey. Polarization and abstraction of grammatical formalisms as methods for lexical disambiguation. In 11th International Conference on Parsing Technology, IWPT'09, Paris, France, 2009.
2. M.-H. Candito. Organisation modulaire et paramétrable de grammaires électroniques lexicalisées. Application au français et à l'italien. Thèse d'université, Université Paris 7, 1999.
3. B. Crabbé. Représentation informatique de grammaires fortement lexicalisées : application à la grammaire d'arbres adjoints. Thèse de doctorat, Université Nancy 2, 2005.
4. B. Crabbé, D. Duchier, C. Gardent, J. Le Roux, and Y. Parmentier. XMG: eXtensible MetaGrammar. Computational Linguistics, 39(3):1–66, 2013.
5. C. Gardent and Y. Parmentier. SemTAG: a platform for specifying Tree Adjoining Grammars and performing TAG-based Semantic Construction. In 45th Annual Meeting of the Association for Computational Linguistics, pages 13–16, Prague, Czech Republic, 2007.
6. B. Guillaume, J. Le Roux, J. Marchand, G. Perrier, K. Fort, and J. Planul. A Toolchain for Grammarians. In Coling 2008, pages 9–12, Manchester, United Kingdom, 2008.
7. B. Guillaume and G. Perrier. Interaction Grammars. Research on Language and Computation, 7:171–208, 2009.
8. S. Lehmann, S. Oepen, S. Regnier-Prost, K. Netter, V. Lux, J. Klein, K. Falkedal, F. Fouvry, D. Estival, E. Dauphin, H. Compagnion, J. Baur, L. Balkan, and D. Arnold. TSNLP — Test Suites for Natural Language Processing. In Proceedings of COLING 1996, Copenhagen, 1996.
9. G. Perrier. A French interaction grammar. In International Conference on Recent Advances in Natural Language Processing (RANLP 2007), pages 463–467, Borovets, Bulgaria, September 27–29 2007.
10. G. K. Pullum and B. C. Scholz. On the Distinction between Model-Theoretic and Generative-Enumerative Syntactic Frameworks. In LACL 2001, Le Croisic, France, volume 2099 of Lecture Notes in Computer Science, pages 17–43, 2001.
11. J. Rogers and K. Vijay-Shanker. Obtaining trees from their descriptions: an application to tree-adjoining grammars. Computational Intelligence, 10(4):401–421, 1994.
12. L. Romary, S. Salmon-Alt, and G. Francopoulo. Standards going concrete: from LMF to Morphalou. In M. Zock, editor, COLING 2004 Enhancing and using electronic dictionaries, pages 22–28, Geneva, Switzerland, August 29th 2004. COLING.
13. S. Tabatabayi Seifi. An interaction grammar for English verbs. In R. K. Rendsvig and S. Katrenko, editors, Proceedings of the ESSLLI 2012 Student Session, pages 160–169, Opole, Poland, August 6–17 2012.
14. K. Van den Eynde and P. Mertens. La valence : l'approche pronominale et son application au lexique verbal. French Language Studies, 13:63–104, 2003.
15. É. Villemonte De La Clergerie. Building factorized TAGs with meta-grammars. In The 10th International Conference on Tree Adjoining Grammars and Related Formalisms — TAG+10, pages 111–118, New Haven, CT, USA, 2010.

Kahina: A Hybrid Trace-Based and Chart-Based Debugging System for Grammar Engineering

Johannes Dellert1, Kilian Evang2, and Frank Richter1

1 University of Tübingen 2 University of Groningen

Abstract. This paper provides an overview of the debugging framework Kahina, discussing its architecture as well as its application to debugging in different constraint-based grammar engineering environments. The exposition focuses on and motivates the hybrid nature of the system between source-level debugging by means of a tracer and high-level analysis by means of graphical tools.

1 Introduction

Several decades after their inception, the design, implementation and debugging of symbolic, constraint-based grammars are still a considerable challenge. Despite all efforts to modularize grammars, declarative constraints can have far-reaching unexpected side effects at runtime. The original grammar writers are often linguists with little experience in software development, meaning that in practice their declarative designs must be optimized for the algorithmic environment by iterative refinement. It is a major challenge in this scenario that the execution of even relatively simple grammars tends to result in many thousands of computation steps. To make debugging of such grammars feasible at all, innovative and carefully designed debugging tools are needed.

In this paper, we present central aspects of the Kahina debugging framework, which was created to address these issues. We discuss its general architecture as well as its application to grammar debugging in two implementation platforms for constraint-based grammars, TRALE [3] and QType [15]. Unlike other grammar engineering environments, Kahina emphasizes the analysis of the execution model of declaratively designed grammars, with the goal of making the procedural effects of constraint systems more transparent. Developers can more easily spot where constraints have adverse side effects, and where an implementation can be optimized for efficiency. The main challenge in creating such a debugging system is to find appropriate ways to project functionally meaningful intervals from large sequences of computations onto intuitively comprehensible graphical displays. Within the overall display, the close connections between the different perceptual units need to be highlighted, while avoiding redundant information that could easily overwhelm the user. A highly configurable selection of various perspectives on grammar execution is offered to make its contingencies explorable in a balanced fashion.

Kahina was initially developed as a graphical front end for TRALE's source-level debugger. TRALE is an interesting target, as it is explicitly designed for the implementation of grammatical constraints that stay close to the specification of theoretical HPSG grammars. As Melnik [13] finds in her comparison of the LKB [4] and TRALE for HPSG grammar implementation, the latter system requires fewer initial adjustments of theoretical grammars, at the price of a possibly stronger deviation of the processing model from non-technical user expectations. At the same time, the LKB traditionally provided more graphical debugging support and guidance to users, similar to the facilities featured by XLE [11], an implementation environment for LFG. Kahina aims at graphical debugging support for complex implementation systems such as TRALE, especially to help novice linguist users understand the underlying procedural model. It is designed to bridge the considerable gap between detailed but low-level, command-line based debugging for expert users, and the high-level view of chart-based tools, which (deliberately) hide many potentially relevant procedural details.

Beyond support for grammar implementation platforms, Kahina provides advanced general debugging facilities for logic programming. The system expands on earlier ideas for graphical Prolog debugging presented e.g. by Dewar & Cleary [8] and Eisenstadt [9], and is also inspired by SWI-Prolog's GUI tracer [16], the most mature visualization tool for Prolog processes currently available.3

Section 2 gives a bird's-eye view of the Kahina architecture, also outlining the process of implementing a debugging system. Sections 3–5 focus on the central features and functions of a debugging system for TRALE. Section 6 discusses chart displays as a central component of many grammar development systems, and provides a case study of how the demands of different systems are accommodated. Section 7 critically evaluates Kahina's approach to grammar debugging, before Section 8 summarizes prominent features and points to open issues.

2 The Kahina Architecture

Kahina is written in Java and distributed under the GPL.4 Core components are a GUI framework based on Swing, a State class for storing large amounts of step data, a message passing system (Controller) for communication among components, and a control agent system for the automatization of user interactions.

Kahina is designed as a general framework for implementing debugging systems, by which we mean integrated graphical environments for analyzing computations consisting of hierarchically related steps, where each step may be associated with a source code location and other detail information for visual display. A debugging system is built by implementing specialized components using Kahina's classes and interfaces. The architecture of the TRALE debugger is shown in Fig. 1; a similar architecture has been used to implement graphical debuggers for different client systems, including SICStus and SWI Prolog.

3 An earlier version of Kahina for TRALE was previously presented in a poster session at the HPSG conference 2010 [7].
4 http://www.kahina.org

Fig. 1: Architecture of the Kahina-based TRALE debugger

On the Prolog side, the existing source-level debugger was extended by a control loop that interleaves with TRALE's execution process and communicates with Kahina to transmit step details and obtain tracing commands for controlling execution. Communication with Kahina is done via SICStus Prolog's Jasper library, which uses the Java Native Interface. On the Java side it is handled by a specialized bridge which translates low-level TRALE tracing information into Kahina's data model for storage and visualization. Another important application-specific Java component is a step class used by the State component, defining the data attributes associated with each step. Finally, a customizable configuration of view components defines how the step data are visualized.
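To make this division of labor more concrete, here is a minimal Java sketch of how such a bridge and step class might fit together. All names (TraleStep, TraleBridge and their methods) are illustrative assumptions for exposition, not Kahina's actual API.

import java.util.HashMap;
import java.util.Map;

/** One step in the computation, as stored by the State component (hypothetical). */
class TraleStep {
    final int id;                 // numeric step ID shown in the tree
    final String description;    // e.g. "rule close", "unify"
    String port = "Call";        // Call, Exit, DetExit, Redo or Fail
    int sourceLine = -1;         // optional link into the grammar source

    TraleStep(int id, String description) {
        this.id = id;
        this.description = description;
    }
}

/** Translates low-level tracer messages into the step data model (hypothetical). */
class TraleBridge {
    private final Map<Integer, TraleStep> state = new HashMap<>();

    /** Called for each message arriving from the Prolog-side control loop. */
    void step(int id, String description, int sourceLine) {
        TraleStep s = new TraleStep(id, description);
        s.sourceLine = sourceLine;
        state.put(id, s);
    }

    /** Called when a step reaches a port (Exit, Fail, ...). */
    void port(int id, String port) {
        state.get(id).port = port;   // update status for color-coding
    }
}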

3 Visualizing Parsing Processes

In this and the next two sections, we focus on the TRALE debugger and the features it offers to grammar engineers. In TRALE, a chart parser steps through the input from right to left, building all possible chart edges starting at each position before proceeding. Its principal operations or steps are:

– rule close: a failure-driven loop over all phrase-structure rules in the grammar; takes a chart edge as input and recursively builds all edges that can be built with the selected edge as the leftmost child.
– rule: tries to apply a phrase-structure rule with the input edge as leftmost daughter. Existing chart edges are used for covering the other daughters. Success leads to a new edge on which rule close is called recursively.
– retrieve edge: retrieves a passive edge from the chart for use as a non-leftmost daughter in a rule application.
– cat: applies a daughter description as part of a rule application.
– goal: executes a procedural attachment as part of a rule application.
– mother: applies a mother description as part of a rule application.

Fig. 2: Control flow tree of a parsing process with three levels of detail

TRALE comes with a command-line based tracer which treats these steps as procedure boxes following the Prolog tracing model of [1]. Kahina builds upon this notion and visualizes parsing processes as trees with goals as nodes and subgoals as child nodes. The resulting control flow tree is a central element of Kahina's GUI and is used to retrieve details on individual steps (see Section 4) by mouse click. Each node is labeled with a numeric ID and a short description of the step, and is color-coded for status (Call, Exit, DetExit, Redo or Fail).

Since a single huge tree with thousands of parse steps cannot be navigated, the tree is split into three subviews that show different levels of detail (Fig. 2). The first subview shows a thinned-out version of the tree in which only the cornerstones are displayed, i.e. the rule close and rule steps. Selection of cornerstone nodes controls which part of the tree occupies the second subview: it contains the descendants of the selected cornerstone down to the next cornerstones, displayed as leaves. Descendants of cat, goal and mother nodes are in turn displayed in the third subview when the respective step is selected.

Apart from their size, TRALE parsing processes are challenging to visualize due to their complex structure. Steps can be arranged in at least two meaningful tree structures. The call tree, in which goals have their subgoals as children, is visualized in the control flow tree through indentation. This tree roughly corresponds to the structure of clauses and subclauses in a logic program. The search tree is used in backtracking. Without backtracking, a search tree would be a long unary branch in which each step is the child of the step invoked before it, visualized by the top-to-bottom arrangement of steps in the control flow tree. When Prolog backtracks, the step that is the last active choicepoint is copied to represent the new invocation. The copy becomes a sibling of the original, starting a new branch. Kahina shows only one branch of the search tree at a time, focusing on visualizing the call tree, but at each choicepoint two arrow buttons permit browsing through the siblings in the search tree. Access to earlier, "failed" branches is important because TRALE makes extensive use of failure-driven loops for exhausting the search space of possible edges. The user can flip through the different rules that were applied to any particular edge.

One way to inspect parsing processes is to trace them interactively, watching the tree grow step by step or in larger chunks. The available tracing commands are similar to those of a classical Prolog tracer (e.g. SICStus Prolog [2]).

Fig. 3: Kahina’s GUI showing tracing buttons, the chart, and all step detail views

They are exposed as clickable buttons, shown in Fig. 3: creep advances to the next step. fail forces a not-yet-completed step to fail, making it possible to explore otherwise inaccessible parts of the search space. auto-complete completes the current step without interaction at substeps, but saves all information about the substeps for later inspection. skip does the same without collecting any tracing information; in effect it prunes parts of the process currently not of interest. leap proceeds without interaction up to the next breakpoint (see Section 5) or the end of the parsing process. Additional buttons pause or stop leap and auto-complete actions, or step through a history of recently selected steps.

Since Kahina keeps old backtracking branches available – a feature that sets it apart from other graphical Prolog tracers such as that of SWI-Prolog [16] – it also supports full post-mortem analysis of a completed parsing process. The message console complements the control flow tree as a navigation tool by displaying a timeline of tracing events. Events include step ports just like in a console-based Prolog tracer, but also user-generated and automatized tracing events: forced fails, auto-completes, skips, and leaps. All events associated with the currently selected step are highlighted, such as the Call port and the DetExit port of the selected unify step in Fig. 3.
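The coexistence of the call tree and the search tree can be illustrated with a small sketch. The following Java fragment is a hypothetical rendering of a step node that participates in both structures; the class and method names are invented here and do not reflect Kahina's actual data model.

import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the dual tree structure described above: each node
 * sits both in the call tree (subgoals) and in the search tree (alternatives
 * created on backtracking).
 */
class StepNode {
    final int id;
    StepNode caller;                                        // parent in the call tree
    final List<StepNode> subgoals = new ArrayList<>();      // children in the call tree
    final List<StepNode> alternatives = new ArrayList<>();  // search-tree siblings

    StepNode(int id) { this.id = id; }

    /** On Redo, the last active choicepoint is copied into a new branch. */
    StepNode redo(int newId) {
        StepNode copy = new StepNode(newId);
        copy.caller = this.caller;
        this.alternatives.add(copy);   // the arrow buttons browse this list
        return copy;
    }
}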

4 Step Detail Views

Kahina’s GUI contains different views for displaying details of the currently selected step. Steps are selected dynamically according to the context, in most cases through direct user interaction with either the control flow tree or the chart display, a graphical representation of the parse chart (Section 6). The source code editor reacts to the selection of a step that corresponds to an element in the grammar (a rule application, a description, a constraint or a procedure) by marking the corresponding source code line. Computation steps are thus connected to the underlying grammar code. Fig. 3 shows the step details of a unification step within a constraint application (HPSG’s Head Feature Principle) that fires during the application of a description of a rule to the mother feature structure. The defining source code line is colored. Syntax highlighting and basic editing are also supported. While the source code view shows the reasons for what is happening, the feature structure view shows the structures that are being built as a result. For this view, Kahina draws on the Gralej5 library. For steps that belong to rule applications, it shows the corresponding local tree, with feature structures for all daughters that have been or are being processed, and also the mother structure once it is being processed. The substructure that is being modified in each step is highlighted in yellow. For steps that belong to procedural attachments, the active goal is displayed with embedded graphical representations for feature structure arguments. A separate bindings view analogously shows feature structures associated with variables in the TRALE description language at every point during the application of a rule, constraint, or procedure. In Fig. 3, the head complement rule is being applied to the input substring “likes mary”, as shown by the local tree in the feature structure view. The current constraint binds the mother’s synsem:loc:cat:head to a variable. The effect is seen in the bindings view, which compares the state before and after execution of the step. Before, the Head variable is still unbound, defaulting to mgsat (most general satisfier). Afterwards, it contains the value of the head feature.

5 Automatization via Control Agents

Stepping through a parsing process manually and skipping or auto-completing irrelevant sections is error-prone and tiring. The leap command completes a parse without interaction for later post-mortem analysis, but may preserve too much information if only a certain part of the grammar should be debugged. Classic Prolog tracers, and also TRALE's console-based debugger, offer automatization via leashing and breakpoints. The user determines that certain types of steps should be unleashed, i.e. the tracer will simply proceed (creep) at them without asking what to do. Thus, the user can step through the parse faster, without having to issue a creep command at every step. Breakpoints single out a class

5 http://code.google.com/p/gralej/ Kahina: A Hybrid Trace-Based and Chart-Based Debugging System 81 of steps of interest and then use the leap command to immediately jump to the next occurrence of that kind of step in the trace without further interaction. Kahina generalizes leashing and breakpoints to the more powerful concept of control agents. A control agent is best seen as a simple autonomous agent consisting of a sensor which detects a pattern in individual steps or in a step tree, and an actuator which reacts to pattern matches by issuing tracing commands. Control agents are a method of describing and implementing intelligent behavior with the purpose of automatizing parts of the tracing and thereby of the grammar debugging process.

Fig. 4: The control agent editor

The control agent editor (Fig. 4) serves to define, activate and deactivate control agents. Control agents are grouped according to their tracing command, and are accordingly called creep agents, skip agents, etc. While creep agents typically effect unleashing behavior, skip agents and complete agents complete steps without interaction, including all descendant steps, making these steps atomic for the purposes of tracing. Break agents act like breakpoints by exiting leap mode when their sensors fire. Fail agents let certain steps fail immediately, which is useful e.g. for temporarily deactivating some phrase-structure rule. Warn agents have actuators which not only stop a leap but also display a warning. Their sensor counts the pattern matches, and only fires after reaching a predefined threshold. Warn agents help detect infinite recursion or inefficiencies, e.g. when a procedural attachment predicate is called too often.

Due to the flexibility of their actuators, control agents are more powerful than traditional debugging mechanisms. Since they build on the existing tracing commands rather than introducing completely new concepts6, we believe they are more intuitive to learn after a short exposure to manual tracing. The central technique for defining basic control agents is the creation of sensors by defining step patterns. Usually a simple substring check suffices to identify the relevant step class, but users can also build complex conditions with logical connectives and a range of elementary step tests, as exemplified in Fig. 4. Additionally, source code sensors that fire at steps associated with a code location can be created by opening the source code view's context menu for the line of interest

(such as the line containing the name of a phrase structure rule, in order to catch rule application steps) and selecting the desired actuator. In sum, control agents provide an expressive yet accessible tool for automatization, which is crucial for efficient tracing of parse processes.

6 In fact, Kahina sees the manually-tracing user as just another, external control agent with especially complex behavior.

6 Specialized Charts for TRALE and QType

Apart from storing partial results in parsing algorithms based on dynamic programming, a chart summarizes the parsing process by storing successfully parsed constituents, including those which do not become part of a complete parse. The spans covered by constituents are usually symbolized as edges over the input string. By default, Kahina's chart display shows currently active edges (gray) and successfully built passive edges (green). Dependencies between chart edges are optionally displayed by highlighting the edges the selected edge is composed of, and the edges that it is part of. An unexpected parse can often be narrowed down to an unwanted edge for a substructure, while a missing parse is often due to an unrecognized constituent. This has made chart displays a central top-level parse inspection tool in the LKB and XLE. Practical grammar engineering in TRALE has heavily relied on a third-party chart display.

While parsing algorithms usually only require storage of positive intermediate results, a grammar engineer often needs to find out why an expected substructure is missing. Kahina recognizes the equal importance of successful and failed edges: a failed attempt to build an edge can be displayed as a failed edge which is linked to the step where the respective failure occurred. A chart that does not contain failed edges, such as the LKB chart, does not provide direct access to such information. As the chart display is fully integrated with the basic tracing functionality, the number of edges on the chart grows as a parse progresses. Every chart edge constitutes a link into the step tree, giving quick access to the step where the respective attempt to establish a constituent failed or succeeded.
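The role of failed edges as links into the step tree can be summed up in a small sketch. The following Java fragment is an illustrative assumption about what a chart edge record might minimally contain; it is not Kahina's actual edge class.

/**
 * Hypothetical chart edge model: every edge records its span, its status,
 * and the ID of the step that built (or failed to build) it, so that
 * clicking an edge can jump into the control flow tree.
 */
class ChartEdge {
    enum Status { ACTIVE, PASSIVE, FAILED }   // gray, green, red

    final int start, end;    // span over the input string
    final String label;      // e.g. the rule or category
    final Status status;
    final int stepId;        // link into the control flow tree

    ChartEdge(int start, int end, String label, Status status, int stepId) {
        this.start = start;
        this.end = end;
        this.label = label;
        this.status = status;
        this.stepId = stepId;
    }
}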

Fig. 5: Examples of the chart displays for TRALE and QType

Kahina's view components are designed for flexible customization to different grammar engineering systems, as demonstrated by two very different chart variants tailored to TRALE and QType, respectively. On the left side of Figure 5, we see a chart from a TRALE implementation of the English grammar by Pollard & Sag [14]. After recognizing all the input tokens, the right-to-left bottom-up parser has established an edge for the constituent "likes him" by means of the head complement schema. The chart also contains failed edges for several other schemas. The red edges below the successful edge represent failed attempts to form constituents by means of the respective lexicon edge. The edges above it represent failed attempts to build larger constituents from the successful edge. The lexicon edge for the input symbol "she" is currently being closed. The active application of the head subject schema will eventually lead to a successful parse.

Unlike TRALE, QType does not use dynamic programming. To still provide the functionality of a chart for analyzing left-corner parses, we defined a bridge that constructs a left-corner (LC) chart from the incoming step information. An intermediate stage of this chart for a parse process is displayed on the right side of Figure 5. The top-down prediction steps of the LC parser are visualized by inverted L-shaped prediction edges which wrap around the edges that were generated while attempting to complete the prediction. In the example, the active prediction edge labeled s -> np vpc ranging over the entire sentence indicates that QType is working on a parse based on the corresponding rule at the root level. The required np constituent has already been recognized by successful unification with the feature structure derived from the lexical entry for "uther", and QType is now trying to complete the expected vpc constituent. The feature structure for the next token "storms" is of category tv. Attempts to interpret it directly as a vpc, or to once again apply the rule s -> np vpc to integrate it, have failed. The unary rule vp -> v was more successful, resulting in a feature structure of type vp. Its unification with the vpc specification in the rule definition will fail due to the mismatch between the subcat lists of the transitive verb and the postulated vpc structure, causing the correct phrase structure rule for verbs with one complement to be predicted next.

7 Discussion

One of the main contributions of Kahina to symbolic grammar engineering consists in the integration of two very different debugging paradigms of previous grammar debugging environments. We first compare Kahina to the old console-based source-level debugger (SLD) for TRALE, and then to LKB and XLE as the two most popular environments which rely on a graphical interface and high-level representations.

During unifications and applications of descriptions, TRALE's SLD displays feature structures only on demand, and for one step at a time. This makes it hard to recognize these operations as processes and to understand them. Kahina's GUI makes it easy to go back and forth between steps and to quickly compare different points in time. Neither does the old SLD provide explicit information on how and when constraints and procedural attachments are executed. Given the complexities of TRALE's constraint formalism, this is a severe problem,

since goals are often suspended until some preconditions are fulfilled, and are only then executed in a delayed fashion. In larger parses, this behavior makes it virtually impossible to infer the current state of execution from a linear trace. Kahina's two-dimensional step tree with specialized nodes for representing suspended goals makes these aspects much more transparent.

In a classical tracer, decisions are always made locally, without any possibility to correct errors. A single erroneous skip command or a small mistake in a breakpoint definition may force the user to abort and restart a long tracing process. This necessarily leads to defensive behavior to prevent the loss of relevant information. As a result, traces tend to take longer than they would if information on past steps remained accessible. Kahina improves the accessibility of non-local information by its support for post-mortem inspection, but also with the simultaneous graphical display of multiple types of information.

The other major debugging paradigm of grammar engineering is characterized by chart-based high-level debugging (LKB and XLE). The LKB is the most relevant point of comparison for a TRALE debugger, since both systems are primarily designed for HPSG implementations. The LKB excels at detailed error messages for violations of formal conditions, whereas for more complex debugging tasks, especially those involving rule interactions, its tools are far less developed. It is standard debugging procedure to find a short sentence that exhibits the relevant problem, and then to inspect the parse chart in a time-consuming process which may require substantial intuition about the grammar. Once the problem is isolated in a small set of phrases, LKB's mechanism for interactive unification checks comes into play. Any two structures in the feature structure visualization can be tested for unifiability. If unification fails, the user receives explicit feedback on the reasons for failure. To trace the interaction between multiple constraints, intermediate results of successful unifications are used to chain together unification checks.

While Kahina also supports high-level parse chart inspection in the spirit of the LKB, interactive unification is only supported experimentally. Kahina compensates for this by its much more direct support for locating sources of error. Since every single unification or retrieval step is fully exposed by the source-level debugger, the inefficient process of narrowing down a problem only by means of the chart and test parses can in most cases be avoided. This reduces the importance of interactive unification, since the relevant failure can already be observed in the full context of the original problematic parse.

The LFG parser of XLE consists of a c-structure parsing component, and a constraint system that subsequently enforces f-structure constraints. A display of legal c-structures for which no valid f-structures could be found provides more fine-grained feedback about the reasons for structure invalidity than in the LKB. The XLE chart shows edges for partial matches of c-structure rules. While this provides some of the desired information on failed edges, compared to Kahina it still lacks information on rules that fail to apply because already the first constituent cannot be established.
For the failed edges that are shown, XLE provides advanced analysis tools, also for finding out why no valid f-structure for a given c-structure could be found. All views offer options for extending the displayed information by invalid or incomplete structures, and selecting such a structure will highlight the parts which were missing in a c-structure rule or which violated some f-structure constraint. The exact way in which f-structure constraints are enforced nevertheless remains opaque. This means that XLE lacks Kahina's support for grammar optimization, because the order in which the individual constraints are enforced is not exposed and cannot be manipulated.

To sum up the discussion, Kahina combines desirable properties of both the chart-based and the tracer-based grammar debugging paradigms. The main advantage of its hybrid approach lies in providing support for high-level parse inspection via the chart interface, while still making it possible to find out and to visualize on demand how exactly a parse is computed, effectively giving direct and fine-grained access to sources of undesirable behavior. The procedural orientation also supports the application of profiling techniques for grammar optimization, which is not possible if access is limited to high-level abstractions.

A downside of Kahina's hybrid nature may be that it could take longer for a beginner to develop an efficient grammar debugging workflow than in other environments, mainly because heavy use of control agents is necessary to keep step data extraction efficient and manageable. Moreover, the high-level components do not provide very advanced functionality in the area of interactive error diagnostics by default. Users must dig a little deeper in search of an error diagnosis, but the explanations they obtain along the way are very detailed and complete. Finally, Kahina's layered system architecture makes it straightforward to integrate further high-level error diagnosis modules tailored to user requirements. The prototype of a grammar workbench presented in [5] gives a glimpse of related ongoing developments. Recent extensions are not yet completely integrated, and need to be evaluated by grammar developers before they can become part of a release.

8 Conclusion

Kahina is a framework for building debugging environments for different grammar implementation systems. Its debuggers can take advantage of a modular system architecture, predefined bridges for communication with Prolog systems, a variety of view components, and external software modules. An implementation time of less than 500 person hours for the most recent debugger, compared with an estimated 3,000 person hours for the first, demonstrates the effectiveness of the framework in facilitating the development of interactive graphical debuggers for additional grammar engineering systems. In the present paper, we emphasized the application to TRALE in our discussion of Kahina's strategies for breaking up traces into conceptually meaningful chunks of information. The amount of information presented to the user, and the strategies by which it is gathered (in small steps, forcing shortcuts, leaping with or without recording information), can be customized by means of control agents that offer a very powerful abstraction layer for modifying tracing behavior.

The design of Kahina is tailored to a hybrid approach to grammar debugging which attempts to combine the advantages of a high-level chart-based view of parsing processes with the usefulness of a Prolog tracer for understanding every aspect of computational behavior and system performance. Initial experiences with the new TRALE debugger indicate that its low-level components especially help novice users to gain better insight into controlling parsing behavior within the system. The needs of expert users are currently catered for by a flexible chart display with high interactivity. This aspect of the system will be strengthened in future work through the development of tools that go beyond the analysis of parses.

References

1. Byrd, L.: Understanding the control flow of Prolog programs. In: Tärnlund, S.Å. (ed.) Proceedings of the 1980 Logic Programming Workshop. pp. 127–138 (1980)
2. Carlsson, M., et al.: SICStus Prolog User's Manual, Release 4.1.1. Tech. rep., Swedish Institute of Computer Science (December 2009)
3. Carpenter, B., Penn, G.: ALE 3.2 User's Guide. Technical Memo CMU-LTI-99-MEMO, Carnegie Mellon Language Technologies Institute (1999)
4. Copestake, A.: Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford, CA (2002)
5. Dellert, J.: Extending the Kahina Debugging Environment by a Feature Workbench for TRALE. Diplomarbeit, Universität Tübingen (February 2012)
6. Dellert, J.: Interactive Extraction of Minimal Unsatisfiable Cores Enhanced By Meta Learning. Diplomarbeit, Universität Tübingen (January 2013)
7. Dellert, J., Evang, K., Richter, F.: Kahina, a Debugging Framework for Logic Programs and TRALE. The 17th International Conference on Head-Driven Phrase Structure Grammar (2010)
8. Dewar, A.D., Cleary, J.G.: Graphical display of complex information within a Prolog debugger. International Journal of Man-Machine Studies 25, 503–521 (1986)
9. Eisenstadt, M., Brayshaw, M., Paine, J.: The Transparent Prolog Machine. Intellect Books (1991)
10. Evang, K., Dellert, J.: Kahina. Web site (2013), access date: 2013-03-04. http://www.kahina.org/
11. Kaplan, R., Maxwell, J.T.: LFG grammar writer's workbench. Technical report, Xerox PARC (1996), ftp://ftp.parc.xerox.com/pub/lfg/lfgmanual.ps
12. Lazarov, M.: Gralej. Web site (2012), access date: 2013-03-05. http://code.google.com/p/gralej/
13. Melnik, N.: From "hand-written" to computationally implemented HPSG theories. Research on Language and Computation 5(2), 199–236 (2007)
14. Pollard, C., Sag, I.A.: Head-Driven Phrase Structure Grammar. The University of Chicago Press, Chicago (1994)
15. Rumpf, C.: Offline-Compilierung relationaler Constraints in QType. Online slides (2002), access date: 2013-03-07. http://user.phil-fak.uni-duesseldorf.de/~rumpf/talks/Offline-Compilierung.pdf
16. Wielemaker, J.: An overview of the SWI-Prolog programming environment. In: Mesnard, F., Serebrenik, A. (eds.) Proceedings of the 13th International Workshop on Logic Programming Environments. pp. 1–16. Katholieke Universiteit Leuven, Heverlee, Belgium (2003)

SlaviCLIMB: Combining expertise for Slavic grammar development using a metagrammar

Antske Fokkens1 and Tania Avgustinova2

1 The Network Institute, VU University Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands [email protected]
2 Department of Computational Linguistics, Saarland University Campus, 66123 Saarbrücken, Germany [email protected]

Abstract. We present the initial steps and outlook of Slaviclimb, the dynamic component of an effort to develop grammatical resources for Slavic languages around a shared core grammar. The requirements of this component are: (i) it should provide a platform for sharing similar but not identical analyses, (ii) it should be able to contain alternative analyses for the same phenomenon, (iii) it should be easy to use for syntacticians and grammar engineers and (iv) it should provide an interface for Slavicists not trained in grammar engineering allowing them to create grammars. Though the first two properties have been addressed in previous work, this paper is, to our knowledge, the first to address (iii) and (iv) within the context of metagrammars and multilingual grammar development.

Keywords: Metagrammar engineering, alternative analyses management, linguistic hypotheses testing, Slavic languages

1 Introduction

This paper introduces initial ideas on Slaviclimb, a metagrammar that is part of an effort to develop hpsg (Pollard and Sag, 1994) grammatical resources for a closed set of languages constituting the traditionally well-studied Slavic language family. The structural design of the applied framework (Avgustinova and Zhang, 2009) is centered on a common Slavic core grammar (SlaviCore) whose components are shared within the language group (Avgustinova, 2007). In the exploration phase of the project, SlaviCore is viewed as an extension of the typologically dedicated components of the LinGO Grammar Matrix (Bender et al., 2002, 2010), with Slaviclimb representing a dynamic grammar engineering component that captures language specific variations and facilitates grammar development for individual Slavic languages. We address the main goals of Slaviclimb and how we plan to achieve them by creating extensions to existing technology.

We define the following requirements for Slaviclimb: It should provide a platform for sharing analyses that are similar, but not identical, in different Slavic languages, i.e. it should trigger variations based on parameters of linguistic properties. Second, it should be able to contain alternative analyses for the same phenomenon for more systematic exploration, as suggested by Fokkens (2011). Third, it should be easy for syntacticians and grammar engineers to implement analyses. Finally, it should provide a customization interface, so that Slavicists can create grammars for Slavic languages by selecting implemented analyses without being experts in grammar engineering or even hpsg.

This paper is structured as follows. In §2, we outline the background of this work. §3 illustrates the role of Slaviclimb through an example. We describe the currently available software and planned additions for Slaviclimb in §4 and conclude by comparing Slaviclimb to related work in §5.

2 Background

Limiting the typological coverage to a closed group of related languages provides a special perspective on cross-linguistic modeling. Our long term goal is to encode mutually interoperable analyses of a wide variety of linguistic phenomena, taking into account eminent typological commonalities and systematic differences. The parallel construction of hpsg-based Slavic grammatical resources employs a common core module in combination with language specific extensions and corpus-based grammar elaboration at all stages of the project. Playing a key role in linguistic research, alternative analyses are indispensable in modeling language variation. The Slavic language group offers "laboratory conditions" for experimenting with a novel type of phenomena-oriented, alternatives-aware grammar modeling. Therefore, the explicit formulation of alternative analyses needs to be supported from the beginning. As the grammar engineering effort presented here is mainly concerned with enhancing the empirical validation of grammar sharing, the following aspects of multilingual parallel grammar development are of immediate interest to us: (i) what can be included in the common core module and what are the implications of such decisions; (ii) the practicability of phenomena-driven management of alternative analyses; (iii) robust guidelines supporting the grammar engineering activities in (i) and (ii).

The primary purpose of Slaviclimb is to support the development of SlaviCore. It follows the climb3 metagrammar engineering methodology (Fokkens, 2011; Fokkens et al., 2012), which is tightly linked to the Grammar Matrix customization system (Bender et al., 2010). The latter can create starter grammars covering a wide range of typological variation for a limited set of phenomena. Users can specify linguistic properties through a web interface and the customization system automatically generates an implemented grammar. The resulting grammar can then be extended manually. The climb methodology extends the idea of generating code to a general approach for grammar development. The grammar engineer includes new implementations in a metagrammar which uses the original structure of the Grammar Matrix and part of its code generation software. Advantages of the approach include facilitating a phenomenon-based structure of the grammar, supporting alternative analyses for the same phenomenon and increasing modularity (Fokkens et al., 2012). However, we need

to introduce a few adjustments to climb in order to accommodate the third and fourth requirements mentioned above, namely: it should be easy to use for grammar engineers, and Slavicists not trained in hpsg should be able to create grammars. These adjustments will be presented in detail in §4. First, we illustrate the contribution of Slaviclimb through an example in §3.

3 Comparative Library Implementations with a Matrix Basis

3 Experimental modeling of the Slavic case system

3.1 The original SlaviCore model

The notion of case, as it is used in linguistic theory, is quite complex and refers to a range of linguistic properties at different levels of description (Avgustinova, 2007, p. 25). The main distinction lies in understanding case as morphological marking of forms versus case as a syntactic notion linked to a particular function. This can be modeled by classifying the type case along two dimensions: f-case for the level of syntactic functions and m-case representing case marking. The functional case is an abstraction over regular case variation and language-specific constraints with respect to case marking. In the hpsg type hierarchy it introduces an additional dimension of case classification. This in particular means that in the lexical entry of a verb, the case value is of the respective f-case type. These abstract case values are expanded to their concrete instances on the basis of lexical and contextual constraints, taking into consideration the relevant (language-specific) morphological and adpositional case marking.

Currently, SlaviCore contains an elaborate case hierarchy of 117 types that captures the wide variety of case marking found in Slavic languages, as well as (among others) the range of functional subjective cases (most subjects in Slavic are nominatives, but Polish also has dative subjects and Russian both datives and genitives, whereas Bulgarian exhibits a morphological base form) and the range of functional objective cases (mostly accusative, but also genitive and instrumental in Polish and Russian, whereas Bulgarian exhibits a morphological oblique form). None of the Slavic languages uses all 117 types, which is the first motivation for a dynamic component.

3.2 Modeling with SlaviCLIMB

The customization machinery used in climb can add underspecified types to the type hierarchy based on underspecified properties of lexical items or morphological markers.4 A similar mechanism can be used to create the subtypes that link morphological cases to the right functional cases. As of now, Slaviclimb includes an option that triggers the top of the case hierarchy including the main functional cases, while the morphological case can be specified for the language in question. The following addition to Slaviclimb will provide a setup where the rest of the type hierarchy is based on direct observations in the language.

4 This is a Grammar Matrix functionality described in Drellishak (2009).

Fig. 1. Overview of development of a Bulgarian grammar with Slaviclimb [schematic: a Slavicist fills out a web-based questionnaire yielding choices; the Slaviclimb metagrammar, maintained by a grammar engineer, generates SlaviCore, the South Slavic and Balkan Slavic subcores, and the Bulgarian grammar]

If a lexical item subcategorizes for a specific functional case, an option indicating the morphological case it subcategorizes for is used to generate the necessary subtypes. In this setup, the case hierarchy depends on the lexical items and morphemes that are defined in choices.5 A Slavicist can define lexical items and morphemes with the relevant linguistic properties, and the exact case hierarchy will be derived from these definitions. The case hierarchies thus created for individual languages can be compared to the original proposal in SlaviCore. Modeling the case hierarchy through Slaviclimb provides an environment to empirically verify Avgustinova's theoretical work. The original purpose of Slaviclimb, to support parallel analyses, allows grammar writers to either include the functional-morphological distinction or not. For instance, Slaviclimb supports Avgustinova's model for Bulgarian, placing it in a wider Slavic context, but also the language-specific solution adopted in the Bulgarian grammar BURGER (Osenova, 2010).
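To make the generation step more tangible, the following tdl fragment sketches what the two case dimensions and a few generated linking subtypes might look like. The type names are invented for illustration and deliberately tiny; they do not reproduce SlaviCore's actual 117-type hierarchy.

; Hypothetical TDL sketch; type names are illustrative only.
case := *top*.

; functional dimension (f-case)
f-case := case.
subjective := f-case.
objective := f-case.

; morphological dimension (m-case)
m-case := case.
nom := m-case.
dat := m-case.

; subtypes linking the two dimensions, of the kind that could be
; generated from lexical items defined in choices:
nom-subjective := subjective & nom.  ; the default Slavic subject case
dat-subjective := subjective & dat.  ; e.g. Polish and Russian dative subjects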

4 SlaviCLIMB

Figure 1 provides a schematic overview of how a Bulgarian grammar may be created using Slaviclimb. The ellipses indicate components that are generated by the system, the rectangle and diamond represent components requiring active interaction with experts. Slavicists fill out a web-based questionnaire, defining phenomena of a particular language and, if applicable, which analysis should be selected. They can indicate whether the phenomenon is a general Slavic property, a property of a subgroup of languages or language specific. This information does not influence the behavior of the ultimate grammar, but is represented in its architecture. The definitions provided by the Slavicist are stored in standardized form in choices, which allows us to verify hypotheses about language specific or shared behavior by comparing the choices for individual languages. Choices is the input to the Slaviclimb metagrammar developed by a grammar engineer.

5 choices refers to the file that contains definitions of linguistic properties which trigger implementations in the metagrammar. SlaviCLIMB: Combining expertise for Slavic grammar development 91

The metagrammar creates type definitions based on choices and places them in the common core, in a subcore or in the language specific file.

Slaviclimb currently covers the process starting with choices and can generate the complete SlaviCore and Russian Resource Grammar (RRG, Avgustinova and Zhang, 2010). It supports both the functional-morphological distinction and a simple case hierarchy. As such, it currently fulfills our first two requirements as defined in §1. A declarative version of climb has been developed, which allows grammar engineers to define their analyses in tdl. This revision can serve as a basis to redesign Slaviclimb so that it is easier for grammar engineers to work with, which will fulfill the third requirement. It is technically possible for a Slavicist to manually define choices, but this requires extensive training. The Grammar Matrix architecture has a web interface that can be filled out by linguists without hpsg expertise, and detailed documentation on how to use the customization system (Fokkens et al., 2012). A Slavic specific version of the interface and documentation will be created to make Slaviclimb more accessible to Slavicists (fulfilling the fourth requirement). Designing such an interface is not a trivial task and will be ongoing research requiring both grammar engineering and Slavic expertise. Though the Slavicist will likely need support from grammar engineering experts to fix elements in the grammar or extend it further, a significant amount of work can be done through this method. This is shown by Borisova (2010), who modelled poly-personal agreement in Georgian almost exclusively using the Grammar Matrix customization system.

5 Conclusion

Methods for sharing analyses across grammars have been developed since the early days of grammar engineering. Grammar engineers of individual grammars that are part of the ParGram project (Butt et al., 2002) meet biannually to compare and discuss their grammars. Kinyon et al. (2006) bring Candito's (1998) MetaGrammar to a cross-linguistic level through a study of verb second word order in several languages. Grammatical Framework (GF, Ranta, 2009) reveals several similarities to our approach. It uses a metagrammar to divide expertise and share implementations across languages, including family-based cores for Romance and Scandinavian languages. Unlike Slaviclimb, it is mainly designed as a practical approach, and even though it is based on a well-defined formalism and contains a vast amount of linguistic information, it is not related to a linguistic theory. Both the syntacticians and the engineers with knowledge of the languages in GF need a strong technical background, whereas Slaviclimb aims to provide a platform that allows also Slavicists without technical knowledge to create grammars. The CoreGram project (Müller, 2013) is particularly interesting in relation to the work presented in this paper. It takes a bottom-up approach of sharing analyses between hpsg grammars by means of a common core. If a property is found in several grammars, it is placed in a subcore shared by these grammars. It is in principle possible to gain knowledge about linguistic universals through CoreGram, but no strong claims are made on the innateness of such principles. The closed set of languages in the Slavic family allows us to partially adopt a top-down approach, potentially leading to different insights than a bottom-up approach. It should therefore be interesting to compare the insights of the two projects. The methods mentioned above mainly resemble SlaviCore. To our knowledge, Slaviclimb forms the first proposal of a dynamic component that can combine expertise to empirically validate theoretical linguistic models.

References

Avgustinova, T.: Language Family Oriented Perspective in Multilingual Grammar Design. Linguistik International: Band 17. Peter Lang, Frankfurt am Main (2007)
Avgustinova, T., Zhang, Y.: Parallel grammar engineering for Slavic languages. In: Proceedings of GEAF, Singapore (2009)
Avgustinova, T., Zhang, Y.: Conversion of a Russian dependency treebank into HPSG derivations. In: Proceedings of TLT'9 (2010)
Bender, E.M., Flickinger, D., Oepen, S.: The Grammar Matrix: An open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. In: Proceedings of the Workshop on Grammar Engineering and Evaluation, Taipei, Taiwan (2002) 8–14
Bender, E.M., Drellishak, S., Fokkens, A., Poulson, L., Saleem, S.: Grammar customization. Research on Language & Computation 8(1) (2010) 23–72
Borisova, I.: Implementing Georgian polypersonal agreement through the LinGO Grammar Matrix. Master's thesis, Saarland University (2010)
Butt, M., Dyvik, H., King, T.H., Masuichi, H., Rohrer, C.: The Parallel Grammar project. In: Proceedings of the Workshop on Grammar Engineering and Evaluation (2002) 1–7
Candito, M.: Building parallel LTAG for French and Italian. In: Proceedings of ACL and ICCL, Montreal, Canada (1998) 211–217
Drellishak, S.: Widespread but Not Universal: Improving the Typological Coverage of the Grammar Matrix. PhD thesis, University of Washington (2009)
Fokkens, A.: Metagrammar engineering: Towards systematic exploration of implemented grammars. In: Proceedings of ACL, Portland, Oregon, USA (2011) 1066–1076
Fokkens, A., Avgustinova, T., Zhang, Y.: CLIMB grammars: three projects using metagrammar engineering. In: Proceedings of LREC 2012, Istanbul, Turkey (2012)
Fokkens, A., Bender, E.M., Gracheva, V.: LinGO Grammar Matrix customization system documentation (2012)
Kinyon, A., Rambow, O., Scheffler, T., Yoon, S., Joshi, A.K.: The metagrammar goes multilingual: A cross-linguistic look at the V2-phenomenon. In: Proceedings of the 8th International Workshop on TAG and Related Formalisms, Sydney, Australia (2006) 17–24
Müller, S.: The CoreGram project: Theoretical linguistics, theory development and verification. In: Proceedings of the Workshop on HMGE (2013)
Osenova, P.: BUlgarian Resource Grammar Efficient and Realistic (BURGER) (2010)
Pollard, C., Sag, I.: Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago, USA (1994)
Ranta, A.: The GF resource grammar library. Linguistic Issues in Language Technology 2(2) (2009)

The CoreGram Project: A Brief Overview and Motivation

Stefan Müller

Freie Universität Berlin

Keywords: Universal Grammar, Head-Driven Phrase Structure Grammar, Multilingual Grammar Engineering, TRALE

This paper gives a brief overview of the CoreGram project. For a more detailed description of the project, further motivation and a comparison with similar enterprises, see Müller, 2013.

1 Overview and Motivation

The goal of the CoreGram project is the development of large scale computer processable grammar fragments of several languages that share a common core. The theoretical framework is Head-Driven Phrase Structure Grammar (HPSG, Pollard and Sag, 1994; Müller, 2007b). Currently we work on the following languages:

– German (Müller, 2007b, 2009b, 2012; Müller and Ørsnes, 2011a; Müller, To appear a)
– Danish (Ørsnes, 2009; Müller, 2009b, 2012; Müller and Ørsnes, 2011a, In Preparation)
– Persian (Müller, 2010; Müller and Ghayoomi, 2010; Müller, Samvelian and Bonami, In Preparation)
– Maltese (Müller, 2009a)
– Mandarin Chinese (Lipenkova, 2009; Müller and Lipenkova, 2009)
– Yiddish (Müller and Ørsnes, 2011b)
– English (Müller, 2009b, 2012)
– Spanish
– French

For the implementation we use the TRALE system (Meurers, Penn and Richter, 2002; Penn, 2004), which allows for a rather direct encoding of HPSG analyses (Melnik, 2007; Müller, 2013). The grammars of German, Danish, Persian, Maltese, and Mandarin Chinese are of non-trivial size and can be downloaded at http://hpsg.fu-berlin.de/Projects/CoreGram.html. They are also part of the next version of the Grammix CD-ROM (Müller, 2007a). The grammars of Yiddish and English are toy grammars that are used to verify cross-linguistic analyses of special phenomena, and the work on Spanish and French is part of the work in the Sonderforschungsbereich 632, which has just started. See Bildhauer, 2008 for an implemented grammar of Spanish that will be converted into the format of the grammars mentioned above.

I believe that working out large scale computer-implemented grammars is the best way to verify the consistency of linguistic theories (Müller, 1999, Chapter 22; Bender,

2008). Much linguistic work is published in journal articles, but the underlying assumptions of different articles may differ, so that it is difficult to imagine a coherent view that incorporates all insights. Even for articles by the same author it is not guaranteed that basic assumptions are shared between articles, since individual papers can take several years to be published. Hence, I believe that books are the right format for describing linguistic theories, and ideally such theories are backed up by computer implementations. The larger fragments of the CoreGram project will be documented in a series of book publications. The first book in this series was Müller, 2007b, which describes a fragment of German that is implemented in the grammar BerliGram. Three further books are in preparation and will be submitted to the series Implemented Grammars by Language Science Press: one on the Persian grammar developed in the PerGram project (Müller, Samvelian and Bonami, In Preparation), one on the Danish grammar developed in the DanGram project (Müller and Ørsnes, In Preparation) and one on the Mandarin Chinese grammar developed in the ChinGram project.

2 The Poverty of the Stimulus and Motivation of Analyses

Huge progress has been made in recent years in the area of language acquisition. Input-based methods with an utterance-final bias have been shown to explain acquisition data better than maturation explanations or principles-and-parameters models (Freudenthal et al., 2006, 2007, 2009). Bod (2009) showed that English auxiliary inversion can be learned even without the kind of evidence that Chomsky (1971, p. 29–33) claimed to be necessary and unavailable. The cited research shows that quite elaborate structures can be learned from the input alone; hence, if there is any innate language-specific knowledge at all, it is probably rather general, as assumed for instance by Hauser, Chomsky and Fitch (2002). The consequence for linguistic research is that the existence of certain structures in one language does not imply that such structures are part of the grammar of all languages. So, the existence of object agreement in Basque cannot be used as evidence for object agreement projections (AgrO) in German. Neither can the existence of postpositions and agreement in Hungarian be seen as evidence for AgrO projections and hidden movement processes in English. Such complicated analyses cannot be motivated language-internally and hence are not acquirable from the input alone.

Instead of imposing constraints from one language onto other languages, a bottom-up approach seems to be more appropriate: grammars for individual languages should be motivated language-internally. Grammars that share certain properties can be grouped in classes. This makes it possible to capture generalizations about groups of languages and about language as such. Let us consider some examples: German, Dutch, Danish, English and French. If we start developing grammars for German and Dutch, we find that they share a lot of properties: both are SOV and V2 languages, and both have a verbal complex. One main difference is the order of elements in the verbal complex. The situation can be depicted as in Figure 1. There are some properties that are shared between German and Dutch (Set 3). For instance, the argument structure, a list containing descriptions of syntactic and semantic properties of arguments, and the linking of these arguments to the meaning of the sign are contained in Set 3. In addition, the constraints for SOV languages, the verb position in V2 clauses and the fronting of a constituent in V2 clauses are contained in Set 3. The respective constraints are shared between the two grammars.

Figure 1. Shared properties of German and Dutch (Set 3, shared: Arg St, V2, SOV, VC; Set 1: German; Set 2: Dutch)

When we add another language, say Danish, we get further differences. While German and Dutch are SOV, Danish is an SVO language. Figure 2 shows the resulting situation: the top-most node represents constraints that hold for all languages (for instance the argument structure constraints, linking and V2), and the node below it contains constraints that hold for German and Dutch only.1 For instance, these constraints include constraints regarding verbal complexes and SOV order.

Figure 2. Shared properties of German, Dutch, and Danish (Set 5: Arg Str, V2; Set 4: SOV, VC; Set 1: German; Set 2: Dutch; Set 6: Danish)

1 In principle, there could be constraints that hold for Dutch and Danish but not for German, and for German and Danish but not for Dutch. These constraints would be removed from Set 1 and Set 2 respectively and put into another constraint set higher up in the hierarchy. These sets are not illustrated in the figure, and we keep the names Set 1 and Set 2 for the constraint sets for German and Dutch.

The union of Set 4 and Set 5 is the Set 3 of Figure 1. If we add further languages, further constraint sets will be distinguished. Figure 3 shows the situation that results when we add English and French. Again, the picture is not complete, since there are constraints that are shared by Danish and English but not by French, but the general idea should be clear: by consistently working this way, we should arrive at constraint sets that directly correspond to those that were established in the typological literature.

Figure 3. Languages and language classes (Set 8: Arg Str; Set 7: V2; Set 11: SVO; Set 4: SOV, VC; Sets 1, 2, 6, 12, 13: German, Dutch, Danish, English, French)

It should be clear from what has been said so far that the goal of every scientist who works this way is to find generalizations and to describe a new language in a way that reuses theoretical constructs that have been found useful for a language that is already covered. However, as was explained above, the resulting grammars should be motivated by data of the respective languages and not by facts from other languages. In situations where more than one analysis would be compatible with a given dataset for language X, evidence from language Y with similar constructs is most welcome and can be used as evidence in favor of one of the two analyses for language X. I call this approach the bottom-up approach with cheating: unless there is contradicting evidence, we can reuse analyses that have been developed for other languages.
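
The set relations behind Figures 1–3 can be made concrete with a small sketch. The following Python fragment is purely illustrative: the constraint names and the set representation are invented here; in the actual project the shared sets correspond to shared grammar files (cf. Section 5.2).

    # Illustrative sketch of the constraint-set arithmetic behind Figures 1-3.
    # Constraint names and the set encoding are invented for this example.
    LANGUAGES = {
        'german': {'arg-str', 'v2', 'sov', 'vc'},
        'dutch':  {'arg-str', 'v2', 'sov', 'vc'},
        'danish': {'arg-str', 'v2', 'svo'},
    }

    def shared(*langs):
        """Constraints common to all of the given languages."""
        return set.intersection(*(LANGUAGES[l] for l in langs))

    set5 = shared('german', 'dutch', 'danish')   # {'arg-str', 'v2'}
    set4 = shared('german', 'dutch') - set5      # {'sov', 'vc'}
    set3 = shared('german', 'dutch')             # Set 3 of Figure 1
    assert set3 == set4 | set5                   # Set 3 is the union of Sets 4 and 5
    print(sorted(set5), sorted(set4))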

3 Coverage and Highlights

The grammars of German, Persian, and Danish are relatively big. The German grammar (BerliGram) was the first one that was implemented. It is an extension of the grammars that were developed for the individual chapters of the HPSG textbook (Müller, 2007b). The Situation Semantics was replaced by Minimal Recursion Semantics (MRS, Copestake et al., 2005). MRS allows for underspecification of scope, so that a sentence like (1a) gets one representation from which the several scopings in (1b–d) can be derived.

(1) a. dass Max wieder alle Fenster öffnete
       that Max again all windows opened
       'that Max opened all windows again'

    b. again′(∀(CAUSE(open))); repetitive
    c. again′(CAUSE(∀(open))); repetitive
    d. CAUSE(again′(∀(open))); restitutive

The example in (1a) is an example of lexical subscoping: öffnen is lexically decomposed into CAUSE(open), and wieder can scope below the CAUSE operator although there is no decomposition in the syntax.

Apart from the modification of the semantics component, further special phenomena were implemented, for instance an analysis of multiple frontings (Müller, 2003), something that is unique among the existing HPSG implementations. For a discussion of approaches to constituent order that are incompatible with the multiple fronting data see Müller, 2005, In Preparation. Furthermore, the analysis of depictives was added (Müller, 2008). Analyses that have been implemented in my earlier grammars of German have not yet been transferred to BerliGram.

The Danish grammar is documented in a 500+ page book, which is not complete yet. The following examples show in a compact way the interaction of several phenomena: passive with promotion of either the direct object or the indirect object (2a,c), passive and pronoun shift (2b,d), and partial fronting and object shift (2b,d):

(2) a. Bjarne bliver ikke anbefalet den.
       Bjarne.NOM is not recommended it.ACC
       'The book is not recommended to Bjarne.'
    b. ? Anbefalet bliver Bjarne den ikke.
         recommended is Bjarne.NOM it.ACC not
         'The book is not recommended to Bjarne.'
    c. Bogen bliver ikke anbefalet ham.
       book.DEF.NOM is not recommended him.ACC
       'The book is not recommended to him.'
    d. ? Anbefalet bliver bogen ham ikke.
         recommended is book.DEF.NOM him.ACC not
         'The book is not recommended to him.'

The Mandarin Chinese grammar was implemented with the help of Jia Zhongheng. We used the description in Li and Thompson, 1981 as the basis of our implementation. Among the things that are special are NPs that contain classifiers (3) and the change of part of speech by reduplication as in (4).

(3) na4 liang4 hong2 de che1 xiu4.le
    that CL red DE car rust.ASP
    'That red car rusts.'

The adjective gao1xing4 is converted into an adverb by forming the pattern AABB from the original adjective AB, that is, gao1 is doubled and xing4 is doubled as well.

(4) ta1 gao1gao1xing4xing4 you3yong3
    he AABB=happily swims
    'He swims happily.'

The Persian grammar is a larger fragment, which still needs to be documented. The examples in (5) show light verb constructions, which are an important feature of the language. (5a) shows that the future auxiliary can interrupt the preverb-verb sequence of light verbs. (5b) shows an example with the negation prefix and pro-drop.

(5) a. Man in kâr râ anjâm xâham dâd.2
       I this job DOM performance will-1SG gave
       'I will do this work.'
    b. mard râ dust nadât.
       man DOM friend NEG-had
       'He/she does not like the man.'

2 Karimi-Doostan, 1997, p. 73.

The Maltese grammar is an implementation of the description of Fabri, 1993. Fabri works in the framework of Lexical Decomposition Grammar, which is also a lexical framework, and his analyses were translatable into HPSG without great effort. The examples in (6) show definiteness marking. (6b) shows assimilation and (6c) shows clitics:

(6) a. Il-komunist xejjer lil-l-papa.
       DEF-communist winks(3msg) Case-DEF-pope(msg)
       'The communist winks at the pope.'
    b. It-terrorist bagħat l-ittr-a lil-l-president.
       DEF-terrorist sent DEF-letter-F Case-DEF-president
       'The terrorist sent the president the letter.'
    c. It-terrorist bagħat=hie=l.
       DEF-terrorist send.3M.SG=3F.SG=3M.SG
       'The terrorist sent it to him.'

(6c) is ambiguous: there is also a reading with clitic left dislocation. Both readings are found by the grammar.

4 Basic Assumptions

4.1 Valence

We assume that valence is represented in a uniform way across languages. Arguments of a head are represented in the ARG-ST list (Pollard and Sag, 1994, Chapter 9). They are mapped to the valence features SPR and COMPS in a language-dependent fashion. For instance, English and Danish map the subject to the SPR list and the other arguments to COMPS. Danish inserts an expletive in cases in which there is no element that can be mapped to SPR, while English does not do this. German differs from both languages in mapping all arguments of finite verbs to the COMPS list.

The arguments in the ARG-ST list are ordered according to the obliqueness hierarchy of Keenan and Comrie (1977), which plays a role in the analysis of a variety of phenomena. The elements of the ARG-ST list are linked to the semantic roles that a certain head has to fill. Since the traditional role labels like agent and patient are problematic, we adopt Dowty's proto-role approach (Dowty, 1991) and use ARG1, ARG2, and so on as role labels.
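
As a purely illustrative sketch of this language-dependent mapping (the Python rendering and all names are invented for this example; in the grammars the mapping is stated as TRALE type constraints):

    # Illustrative sketch of the ARG-ST -> SPR/COMPS mapping described above.
    def map_arg_st(arg_st, language, finite=True):
        """Map an ARG-ST list (ordered by obliqueness) to valence features."""
        if language == 'german' and finite:
            # German maps all arguments of finite verbs to COMPS.
            return {'SPR': [], 'COMPS': list(arg_st)}
        if language in ('english', 'danish'):
            # The subject (least oblique argument) goes to SPR, the rest to COMPS.
            if arg_st:
                return {'SPR': [arg_st[0]], 'COMPS': list(arg_st[1:])}
            # Danish inserts an expletive when nothing can be mapped to SPR.
            if language == 'danish':
                return {'SPR': ['expletive'], 'COMPS': []}
            return {'SPR': [], 'COMPS': []}
        raise ValueError('no mapping defined for ' + language)

    print(map_arg_st(['NP[nom]', 'NP[acc]'], 'german'))
    # {'SPR': [], 'COMPS': ['NP[nom]', 'NP[acc]']}
    print(map_arg_st(['NP[nom]', 'NP[acc]'], 'english'))
    # {'SPR': ['NP[nom]'], 'COMPS': ['NP[acc]']}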

4.2 Constituent Structure and Constituent Order

Classical HPSG came with very few immediate dominance schemata: the Head-Complement Schema, the Head-Specifier Schema, the Head-Adjunct Schema, the Head-Filler Schema for binding off unbounded dependencies, and the Head-Extra Schema for binding off clause-bound nonlocal dependencies. Since Sag, 1997 many HPSG analyses have a more constructional flavor, that is, specific subconstructions of these general schemata are introduced (Sag, 2010). In the CoreGram project we stay within the old tradition of HPSG and keep the rather abstract dominance schemata. However, it is possible to state further constraints on the respective structures. So rather than having several very specific instances of the Head-Filler Schema, we have very few, for instance for verb-second clauses and for relative clauses, and we formulate additional implicational constraints that constrain actual instances of head-filler phrases further if the antecedent of the implicational constraint is true. Since the schemata are rather general, they can be used for all languages under consideration so far.

Of course the languages differ in terms of constituent order, but this can be dealt with by using linearization rules that are sensitive to features whose values are language-specific. For instance, all heads have a feature INITIAL. The value is '+' if the head has to be serialized before its complements and '–' if it follows its complements. German and Persian verbs are INITIAL–, while English, Danish, Mandarin Chinese and Maltese verbs are INITIAL+. We assume binary branching structures and hence we get the structures in (7) for English and the corresponding German example:

(7) a. He [[gave the woman] a book].
    b. er [der Frau [ein Buch gab]]
       he the woman a book gave

The LP rules enforce that gave is linearized before the woman and gave the woman is linearized before a book.

The scrambling of arguments is accounted for by ID schemata that allow the combination of a head with any of its arguments, independent of the position an element has in the valence list of its head. Non-scrambling languages like English on the other hand combine heads with their complements in a strict order: the least oblique element is combined with the head first and then the more oblique complements follow. Non-scrambling languages with head-final order take the last element from the valence list first.
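
The effect of the INITIAL feature on (7) can likewise be sketched in a few lines. This is illustrative only; the function and the feature encoding are invented for this example:

    # Illustrative sketch of an INITIAL-based linearization rule:
    # a head with INITIAL + precedes its complement, INITIAL - follows it.
    def linearize(head, complement, initial):
        """Return the surface order of a head and one complement."""
        return [head, complement] if initial else [complement, head]

    # English 'gave' is INITIAL +; German 'gab' is INITIAL -.
    step1 = linearize('gave', 'the woman', initial=True)
    step2 = linearize(' '.join(step1), 'a book', initial=True)
    print(' '.join(step2))          # gave the woman a book

    step1 = linearize('gab', 'ein Buch', initial=False)
    step2 = linearize(' '.join(step1), 'der Frau', initial=False)
    print(' '.join(step2))          # der Frau ein Buch gab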

4.3 Morphology and Lexical Rules

We follow a lexical-rule-based approach to morphology. Lexical rules are basically unary branching trees that license new lexical items. A lexical rule can add to or subtract from the phonology (in implementations, the orthography) of an input item. For instance, it is possible to analyze the complex morphological patterns that we observe in Semitic languages by mapping a root consisting of consonants to a full-fledged stem or word that has the appropriate vowels inserted. We follow Bresnan and Mchombo (1995) in assuming the Lexical Integrity Principle. This means that all morphological combinations have to be done by lexical rules, that is, fully inflected forms are part of the lexicon, most of them being licensed by productive lexical rules.

Lexical rules do not have to change the phonology/orthography of the item they apply to. For instance, lexical rules can be used to license further lexical items with extended or reduced valence requirements. As was argued in Müller, 2002, 2006, resultative constructions should be treated lexically. So there is a lexical rule that maps the stem fisch- of the intransitive version of the verb fischen ('to fish') onto a stem fisch- that selects for a secondary predicate (adjective or PP) and the subject of this predicate, as in (8). A sketch of such a valence-extending rule is given after the example.

(8) Er fischt den Teich leer.
    he fishes the pond empty
    'He fishes the pond empty.'
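
The following is a minimal sketch of such a valence-extending lexical rule. It is illustrative only: the feature names and the dictionary encoding are invented, and the real rule is stated declaratively in TRALE.

    # Illustrative sketch of the resultative lexical rule described above:
    # same stem, extended ARG-ST.
    def resultative_lexical_rule(item):
        """Map an intransitive verb stem to a resultative variant that
        additionally selects an accusative NP and a secondary predicate."""
        assert item['ARG-ST'] == ['NP[nom]'], 'rule applies to intransitives'
        out = dict(item)              # the orthography stays fisch-
        out['ARG-ST'] = ['NP[nom]', 'NP[acc]', 'PRD[adj/PP]']
        return out

    fisch_intrans = {'STEM': 'fisch-', 'ARG-ST': ['NP[nom]']}
    print(resultative_lexical_rule(fisch_intrans))
    # {'STEM': 'fisch-', 'ARG-ST': ['NP[nom]', 'NP[acc]', 'PRD[adj/PP]']}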

5 Implementation Details

5.1 TRALE

The grammars are implemented in TRALE. TRALE implements typed feature descriptions. Every grammar consists of a signature (a type hierarchy with feature introduction and appropriateness constraints) and a theory that states constraints on objects of these types. TRALE is implemented in Prolog and comes with an implementation of relational constraints that maps the TRALE relations to Prolog relations. TRALE has two parsers: a standard bottom-up chart parser and a linearization parser. The CoreGram project uses the standard bottom-up parser. Both parsers use a phrase structure backbone.

Compared to other systems like the LKB (Copestake, 2002), the expressive power of the description language is high (see also Melnik, 2007). This allows for the rather direct implementation of analyses that are proposed by theoretical linguists. The following descriptive devices are used in the theory and are provided by TRALE; the references point to papers that argue for such constructs:

– empty elements (Kiss, 1995; Meurers, 1999; Müller, 2007b; Levine and Hukari, 2006; Bender, 2000 and Sag, Wasow and Bender, 2003, p. 464; Borsley, 2004, Section 3.3; Müller, To appear b),
– relational constraints (Pollard and Sag, 1994; Bouma et al., 2001),
– complex antecedents in implicational constraints (Meurers, 2000, p. 207; Koenig and Davis, 2004, p. 145, 149; Müller, 2007b, p. 145, Section 10.3; Bildhauer and Cook 2010, p. 75),
– cyclic structures (Engdahl and Vallduví, 1994, p. 56; Meurers, 1999; Meurers, 2001, p. 176; Samvelian, 2007, p. 638), and
– a morphology component that has the expressive power that is needed to account for nontrivial morphological phenomena.

5.2 Setup of CoreGram

The grammars are organized in one directory for every language. The respective directories contain a subdirectory named Core-Grammar. This directory contains files that are shared between the grammars. For instance, the file core-macros.pl contains macros that are or can be used by all languages. For every language there is a load file that loads the files from the core grammar directory that are relevant for the respective language. So, for instance, english.pl, french.pl, and danish.pl all load nom-acc.pl, since these languages are nominative-accusative languages. These files also contain code for loading macros and constraints for languages that do not form a verbal complex, while german.pl loads the files for cluster-forming languages. These files directly correspond to the constraint sets that were discussed in Section 2; a schematic rendering is given below.

The possibility to specify type constraints makes it possible to specify constraints that hold for a certain construction cross-linguistically in a file that is loaded by all grammars, and to restrict structures of this type further in language-particular files.

Lexical rules are also described by feature descriptions and organized in type hierarchies (Meurers, 2001). Like other constraints, the constraints on lexical rules can be shared.
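
Purely as an illustration of this organization (the dictionary encoding and the file names no-cluster.pl and cluster.pl are invented; core-macros.pl and nom-acc.pl are from the text, and the real setup uses Prolog load files):

    # Illustrative sketch of the load-file organization described above.
    CORE = ['core-macros.pl']

    LANGUAGE_FILES = {
        # nominative-accusative languages without a verbal complex
        'english': CORE + ['nom-acc.pl', 'no-cluster.pl'],
        'french':  CORE + ['nom-acc.pl', 'no-cluster.pl'],
        'danish':  CORE + ['nom-acc.pl', 'no-cluster.pl'],
        # German additionally loads the files for cluster-forming languages
        'german':  CORE + ['nom-acc.pl', 'cluster.pl'],
    }

    def shared_files(*languages):
        """Files loaded by all given languages; these correspond to the
        shared constraint sets of Section 2."""
        return set.intersection(*(set(LANGUAGE_FILES[l]) for l in languages))

    print(sorted(shared_files('english', 'danish', 'german')))
    # ['core-macros.pl', 'nom-acc.pl']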

6 Measuring Progress

Much to the frustration of many linguists, the contribution of certain approaches to progress in linguistics is rather unclear. Many proposals do not extend the amount of data that is covered in comparison to analyses that were developed during the 1980s in the framework of GB and other, non-transformational frameworks. In comparison, the methodology described in Section 2 leads to grammars with increasing coverage and to analyses that are improved by cross-linguistic considerations.

The TRALE system has been combined with [incr tsdb()], a software package for systematic grammar profiling (Oepen and Flickinger, 1998). The grammars are accompanied by a set of example phrases that can be analyzed by the grammar. In addition, the test suite files contain ungrammatical word sequences from the literature and ungrammatical sequences that were discovered during the grammar development process. Since TRALE has a chart display that makes it possible to inspect the parse chart, it is possible to inspect all linguistic objects that are licensed by the grammar, even if they do not play a role in analyzing the particular sentence under consideration. The result of this careful inspection is a collection of ungrammatical word sequences that no theoretical linguist would have been able to come up with, since it is very difficult to find all the side effects of an analysis that is not constrained sufficiently. These negative examples are distributed with the grammars and book publications and can help theoretical and computational linguists improve their theories and implementations.

After changing a grammar, the sentences of the respective test suite are parsed and the result can be compared to previous results. This way it is ensured that the coverage of grammars is extended. If constraints are changed in files that are shared with other grammars, the respective grammars are tested as well. The effects that changes to grammar X cause in grammar Y are often unexpected, and hence it is very important to do systematic testing.
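
A minimal sketch of this regression step follows (illustrative only; the actual profiling is done with [incr tsdb()], and parse here is a stand-in for the TRALE parser):

    # Illustrative sketch of regression testing over a test suite.
    GOLD = {                       # item -> expected grammaticality
        'er schlaeft': True,
        'schlaeft er': True,
        'er er schlaeft': False,
    }

    def run_suite(parse):
        """Parse every item and record whether the grammar accepts it."""
        return {item: parse(item) for item in GOLD}

    def regressions(previous, current):
        """Items whose status changed since the last run."""
        return {i for i in previous if previous[i] != current[i]}

    old = run_suite(lambda s: GOLD[s])    # last run matched the gold standard
    new = run_suite(lambda s: False if s == 'schlaeft er' else GOLD[s])
    print(regressions(old, new))          # {'schlaeft er'}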

7 Conclusions

We argued that linguistic theories have reached a level of complexity that cannot be handled by humans without the help of computers. We discussed a method for constructing surface-oriented theories by extending the number of languages that are considered, and finally we provided a brief description of the basic assumptions and the basic setup of the CoreGram project.

Bibliography

Bender, E. M. 2000. Syntactic Variation and Linguistic Competence: The Case of AAVE Copula Absence. Ph.D. thesis, Stanford University.
Bender, E. M. 2008. Grammar Engineering for Linguistic Hypothesis Testing. In N. G. et al. (ed.), Proceedings of the Texas Linguistics Society X Conference, pp 16–36, Stanford CA: CSLI Publications ONLINE.
Bildhauer, F. 2008. Representing Information Structure in an HPSG Grammar of Spanish. Dissertation, Universität Bremen.
Bildhauer, F. and Cook, P. 2010. German Multiple Fronting and Expected Topic-Hood. In S. Müller (ed.), Proceedings of the 17th International Conference on Head-Driven Phrase Structure Grammar, pp 68–79, Stanford: CSLI Publications.
Bod, R. 2009. From Exemplar to Grammar: Integrating Analogy and Probability in Language Learning. Cognitive Science 33(4), 752–793.
Borsley, R. D. 2004. An Approach to English Comparative Correlatives. In Müller (2004), pp 70–92.
Bouma, G., Malouf, R. and Sag, I. A. 2001. Satisfying Constraints on Extraction and Adjunction. Natural Language and Linguistic Theory 19(1), 1–65.
Bresnan, J. and Mchombo, S. A. 1995. The Lexical Integrity Principle: Evidence from Bantu. Natural Language and Linguistic Theory 13, 181–254.
Chomsky, N. 1971. Problems of Knowledge and Freedom. London: Fontana.
Copestake, A. 2002. Implementing Typed Feature Structure Grammars. Stanford: CSLI Publications.
Copestake, A., Flickinger, D. P., Pollard, C. J. and Sag, I. A. 2005. Minimal Recursion Semantics: an Introduction. Research on Language and Computation 4(3), 281–332.
Dowty, D. R. 1991. Thematic Proto-Roles and Argument Selection. Language 67(3), 547–619.
Engdahl, E. and Vallduví, E. 1994. Information Packaging and Grammar Architecture: A Constraint-Based Approach. In E. Engdahl (ed.), Integrating Information Structure into Constraint-Based and Categorial Approaches, pp 39–79, Amsterdam: ILLC, DYANA-2 Report R.1.3.B.
Fabri, R. 1993. Kongruenz und die Grammatik des Maltesischen. Tübingen: Niemeyer.
Freudenthal, D., Pine, J. M., Aguado-Orea, J. and Gobet, F. 2007. Modeling the Developmental Patterning of Finiteness Marking in English, Dutch, German, and Spanish Using MOSAIC. Cognitive Science 31(2), 311–341.
Freudenthal, D., Pine, J. M. and Gobet, F. 2006. Modeling the Development of Children's Use of Optional Infinitives in Dutch and English Using MOSAIC. Cognitive Science 30(2), 277–310.
Freudenthal, D., Pine, J. M. and Gobet, F. 2009. Simulating the Referential Properties of Dutch, German, and English Root Infinitives in MOSAIC. Language Learning and Development 5(1), 1–29.
Hauser, M. D., Chomsky, N. and Fitch, W. T. 2002. The Faculty of Language: What Is It, Who Has It, and How Did It Evolve? Science 298, 1569–1579.
Karimi-Doostan, G. 1997. Light Verb Constructions in Persian. Ph.D. thesis, Department of Language and Linguistics, University of Essex.
Keenan, E. L. and Comrie, B. 1977. Noun Phrase Accessibility and Universal Grammar. Linguistic Inquiry 8(1), 63–99.
Kiss, T. 1995. Infinite Komplementation. Neue Studien zum deutschen Verbum infinitum. Tübingen: Niemeyer.
Koenig, J.-P. and Davis, A. R. 2004. Raising Doubts about Russian Impersonals. In Müller (2004).
Levine, R. D. and Hukari, T. E. 2006. The Unity of Unbounded Dependency Constructions. Stanford: CSLI Publications.
Li, C. N. and Thompson, S. A. 1981. Mandarin Chinese. A Functional Reference Grammar. Berkeley and Los Angeles: University of California Press.
Lipenkova, J. 2009. Serienverbkonstruktionen im Chinesischen und ihre Analyse im Rahmen von HPSG. Masters Thesis, Institut für Sinologie, Freie Universität Berlin.
Melnik, N. 2007. From "Hand-Written" to Computationally Implemented HPSG Theories. Research on Language and Computation 5(2), 199–236.
Meurers, W. D. 1999. German Partial-VP Fronting Revisited. In G. Webelhuth, J.-P. Koenig and A. Kathol (eds.), Lexical and Constructional Aspects of Linguistic Explanation, pp 129–144, Stanford: CSLI Publications.
Meurers, W. D. 2000. Lexical Generalizations in the Syntax of German Non-Finite Constructions. Arbeitspapiere des SFB 340 No. 145, Eberhard-Karls-Universität, Tübingen.
Meurers, W. D. 2001. On Expressing Lexical Generalizations in HPSG. Nordic Journal of Linguistics 24(2), 161–217.
Meurers, W. D., Penn, G. and Richter, F. 2002. A Web-Based Instructional Platform for Constraint-Based Grammar Formalisms and Parsing. In D. Radev and C. Brew (eds.), Effective Tools and Methodologies for Teaching NLP and CL, pp 18–25.
Müller, S. 1999. Deutsche Syntax deklarativ. Head-Driven Phrase Structure Grammar für das Deutsche. Tübingen: Niemeyer.
Müller, S. 2002. Complex Predicates: Verbal Complexes, Resultative Constructions, and Particle Verbs in German. Stanford: CSLI Publications.
Müller, S. 2003. Mehrfache Vorfeldbesetzung. Deutsche Sprache 31(1), 29–62.
Müller, S. (ed.). 2004. Proceedings of the 11th International Conference on Head-Driven Phrase Structure Grammar, Stanford: CSLI Publications.
Müller, S. 2005. Zur Analyse der deutschen Satzstruktur. Linguistische Berichte 201, 3–39.
Müller, S. 2006. Phrasal or Lexical Constructions? Language 82(4), 850–883.
Müller, S. 2007a. The Grammix CD Rom. A Software Collection for Developing Typed Feature Structure Grammars. In T. H. King and E. M. Bender (eds.), Grammar Engineering across Frameworks 2007, Stanford: CSLI Publications.
Müller, S. 2007b. Head-Driven Phrase Structure Grammar: Eine Einführung. Tübingen: Stauffenburg Verlag, first edition.
Müller, S. 2008. Depictive Secondary Predicates in German and English. In C. Schroeder, G. Hentschel and W. Boeder (eds.), Secondary Predicates in Eastern European Languages and Beyond, pp 255–273, Oldenburg: BIS-Verlag.
Müller, S. 2009a. A Head-Driven Phrase Structure Grammar for Maltese. In B. Comrie, R. Fabri, B. Hume, M. Mifsud, T. Stolz and M. Vanhove (eds.), Introducing Maltese Linguistics, pp 83–112, Amsterdam: John Benjamins Publishing Co.
Müller, S. 2009b. On Predication. In S. Müller (ed.), Proceedings of the 16th International Conference on Head-Driven Phrase Structure Grammar, pp 213–233, Stanford: CSLI Publications.
Müller, S. 2010. Persian Complex Predicates and the Limits of Inheritance-Based Analyses. Journal of Linguistics 46(3), 601–655.
Müller, S. 2012. On the Copula, Specificational Constructions and Type Shifting. Ms. Freie Universität Berlin.
Müller, S. 2013. The CoreGram Project: Theoretical Linguistics, Theory Development and Verification. Ms. Freie Universität Berlin.
Müller, S. In Preparation. German Sentence Structure: An Analysis with Special Consideration of So-Called Multiple Fronting. Berlin: Language Science Press.
Müller, S. To appear a. Artenvielfalt und Head-Driven Phrase Structure Grammar. In J. Hagemann and S. Staffeldt (eds.), Syntaktische Analyseperspektiven, Tübingen: Stauffenburg.
Müller, S. To appear b. Elliptical Constructions, Multiple Frontings, and Surface-Based Syntax. In G. Jäger, P. Monachesi, G. Penn and S. Wintner (eds.), Proceedings of Formal Grammar 2004, Nancy, Stanford: CSLI Publications.
Müller, S. and Ghayoomi, M. 2010. PerGram: A TRALE Implementation of an HPSG Fragment of Persian. In Proceedings of 2010 IEEE International Multiconference on Computer Science and Information Technology – Computational Linguistics Applications (CLA'10), Wisła, Poland, 18–20 October 2010, pp 461–467, Polish Information Processing Society.
Müller, S. and Lipenkova, J. 2009. Serial Verb Constructions in Chinese: An HPSG Account. In S. Müller (ed.), Proceedings of the 16th International Conference on Head-Driven Phrase Structure Grammar, University of Göttingen, Germany, pp 234–254, Stanford: CSLI Publications.
Müller, S. and Ørsnes, B. 2011a. Positional Expletives in Danish, German, and Yiddish. In S. Müller (ed.), Proceedings of the 18th International Conference on Head-Driven Phrase Structure Grammar, University of Washington, U.S.A., pp 167–187, Stanford: CSLI Publications.
Müller, S. and Ørsnes, B. 2011b. Positional Expletives in Danish, German, and Yiddish. In S. Müller (ed.), Proceedings of the 18th International Conference on Head-Driven Phrase Structure Grammar, pp 167–187, Stanford: CSLI Publications.
Müller, S. and Ørsnes, B. In Preparation. Danish in Head-Driven Phrase Structure Grammar. Berlin: Language Science Press.
Müller, S., Samvelian, P. and Bonami, O. In Preparation. Persian in Head-Driven Phrase Structure Grammar. Berlin: Language Science Press.
Ørsnes, B. 2009. Preposed Negation in Danish. In S. Müller (ed.), Proceedings of the 16th International Conference on Head-Driven Phrase Structure Grammar, University of Göttingen, Germany, pp 255–275, Stanford: CSLI Publications.
Oepen, S. and Flickinger, D. P. 1998. Towards Systematic Grammar Profiling. Test Suite Technology Ten Years After. Journal of Computer Speech and Language 12(4), 411–436. (Special Issue on Evaluation).
Penn, G. 2004. Balancing Clarity and Efficiency in Typed Feature Logic Through Delaying. In D. Scott (ed.), Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), Main Volume, pp 239–246, Barcelona, Spain.
Pollard, C. J. and Sag, I. A. 1994. Head-Driven Phrase Structure Grammar. Chicago, IL and London: University of Chicago Press.
Sag, I. A. 1997. English Relative Clause Constructions. Journal of Linguistics 33(2), 431–484.
Sag, I. A. 2010. English Filler-Gap Constructions. Language 86(3), 486–545.
Sag, I. A., Wasow, T. and Bender, E. M. 2003. Syntactic Theory: A Formal Introduction. Stanford: CSLI Publications, second edition.
Samvelian, P. 2007. A (Phrasal) Affix Analysis of the Persian Ezafe. Journal of Linguistics 43, 605–645.
Time Travel in Grammar Engineering: Using a Metagrammar to Broaden the Search Space*

Antske Fokkens1 and Emily M. Bender2

1 The Network Institute, VU University Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands, [email protected]; 2 Department of Linguistics, University of Washington, Box 354340, Seattle WA 98195-4340, USA, [email protected]

Abstract. In syntactic research, often more than one analysis can be found to account for available data. Interaction with analyses of other phenomena may help choose among them, but only phenomena analyzed in the past are taken into account. As a result, syntactic research is influenced by the order in which phenomena are treated. This paper presents a metagrammar-based methodology for maintaining alternative analyses over time. This way, we can use evidence from phenomena to be analyzed in the future to choose between analyses. We describe climb, a methodology that implements this idea for hpsg grammars in tdl.

Keywords: Linguistic hypothesis testing, metagrammars

1 Introduction

Grammars, both those built by linguists and the natural objects they are intended to model, are complex objects (Bierwisch, 1963; Bender, 2008a; Bender et al., 2011). A key benefit of grammar engineering as an approach to syntactic research is that it allows researchers to build and manipulate models of much greater complexity. In fact, it is extremely difficult to verify whether all interactions between phenomena in a language model behave correctly without implementing them and checking them with a computer (Bierwisch, 1963; Müller, 1999).

In any given grammar engineering project, analyses are implemented in some temporal order. But analyses of phenomena interact, and the analytical choices already made influence the relative attractiveness of alternatives at each later choice point; thus the order in which phenomena are added affects the result (Fokkens, 2011). We argue (with Fokkens (2011)) that better grammars can result if grammar engineers can break free of the temporal sequence of implementation, and that metagrammar engineering is an effective way to do so.

Challenges related to deeply embedded analyses are familiar to most grammar engineers working on large-scale grammars. Francis Bond (p.c.) reports that it is often hard to identify parts of the grammar which relate to obsolete analyses in the Japanese grammar Jacy (Siegel and Bender, 2002). Montserrat Marimon (p.c.) reports that there are analyses in her Spanish grammar (Marimon, 2010) for clitics and word order that need revision, but it would be an elaborate undertaking to make these changes, due to interactions with other phenomena, and they have been put on hold for now. Tracy King (p.c.) reports an ongoing discussion within ParGram (Butt et al. 2002) on whether adjectives have subjects or not. The English LFG grammar (Riezler et al. 2002) was changed a few times, but this was so time-consuming that King decided to call the last change final.

This paper presents the workflow of a metagrammar engineering approach that helps avoid such problems. §2 discusses the theoretical question of what makes an analysis correct. In §3, we discuss the role a metagrammar can play when using grammar engineering for hypothesis testing. §4 presents the workflow of a specific approach for using this idea for building grammars in hpsg (Pollard and Sag, 1994). We compare our approach to related work in §6.

* The authors thank Ned Letcher and the anonymous reviewers for their feedback which helped improve this paper. All remaining errors are our own.

2 Correct analyses

We take it as uncontroversial that grammar engineers, whether working towards practical ends or building grammars to test linguistic hypotheses, strive to develop correct analyses. But what does it mean for a linguistic analysis to be correct? In order to answer that question, we first have to pin down our modeling domain. Many linguists (e.g. Chomsky (1986)) take the modeling domain to be internalized knowledge of language. That is, the rules and principles proposed in the grammatical model should stand in a transparent relation to the rules and principles that speakers encode in 'wet-ware'. However, we do not have direct access to speaker knowledge. Instead, our primary data include attested utterances (written and spoken) as well as speaker intuitions about the meanings of those utterances (entailments, paraphrase relations) and their acceptability.

The first test of a model of grammar, then, is that it should be able to account for the available data: Chomsky's (1965) 'descriptive adequacy' is a prerequisite for any kind of 'explanatory adequacy'. However, the available data typically vastly underdetermine the analysis (Wasow, 1978; Soames, 1984; Bender, 2010): even with the large data sets that implemented grammars can work with, and even fixing the general grammatical framework, there are always multiple ways to approach the analysis of any given phenomenon. Possible sources of information constraining the choice between alternative analyses include psycholinguistic evidence (Sag and Wasow, 2011) and cross-linguistic comparability. That is, if an analysis works for comparable phenomena across different languages, it can be considered a point in favor of the analysis, though the weight given to such evidence varies depending on the theoretical status of universals in a framework.

Here, however, we will focus on evidence which comes from the rest of the grammar. A correct analysis should not only work correctly in isolation, but also work in the larger context of the grammar, making correct predictions when it interacts with other analyses. This is part of descriptive adequacy and therefore uncontroversial. Care should be taken, however, with its application, as we need to distinguish between phenomena and their analyses. If an analysis leads to incorrect predictions regarding the interaction with another phenomenon, then the analysis can be shown to be in error. However, if the incorrect predictions come from the interaction of the analyses of two phenomena, then it is not clear a priori which of the analyses is incorrect.

Fig. 1. Grammar engineering workflow, from Bender et al. (2011, p. 10). (The cycle runs: develop initial test suite; identify phenomena to analyze; develop analysis; extend test suite with examples documenting the analysis; implement analysis; parse sample sentences; debug implementation; compile grammar; parse full test suite; treebank.)

3 Metagrammar engineering for hypothesis testing

Because grammars are such complex objects, testing hypotheses about grammatical phenomena generally requires the development of complex models, with many interacting parts. Bender et al. (2011) illustrate how grammar engineering, together with regression testing, can help linguists manage the complexity of their models. This involves both testing the interactions between analyses and testing analyses against much larger collections of data than is feasible by hand. Bender et al. illustrate the process with the diagram in Fig. 1. Grammar engineering is a powerful tool for linguistic hypothesis testing because humans and computers are good at different things: whereas humans are (still!) better suited to the task of developing linguistic analyses, computers are better at systematic checking and can verify the interaction of analyses on large amounts of data.

3.1 Multiple analyses

The process depicted in Fig. 1 is cyclic, with new phenomena being added on each successive pass. The testsuites document the analyses that have been implemented and allow the linguist to ensure that later work does not break what has gone before. This is key to the ability of grammar engineering to facilitate linguistic hypothesis testing. However, when we view the process of grammar engineering in this light, it also becomes apparent that phenomena considered earlier in the development of a grammar and their analyses have an asymmetrical influence on analyses of phenomena developed later (see also Bender (2008b)).

This asymmetrical influence is unfortunate: it is fairly common for a key phenomenon constraining the choice of analysis of another phenomenon to be addressed only after several further passes through the cycle. In the meantime, whichever analysis was chosen for the phenomenon implemented earlier may become deeply embedded in the growing grammar. This has several unfortunate consequences. First, over time, it is easy to forget what alternative analyses were available. Second, the longer an analysis has been part of a grammar, the more other analyses are likely to depend on it in some way. As noted in the introduction, this leads to scenarios where it becomes cumbersome or impractical to change an analysis, even when it is discovered to be suboptimal. Finally, even if a grammar engineer is inspired to revise a deeply embedded analysis, it is simply not possible to explore all the paths not taken, that is, all the alternative analyses of the various interacting phenomena that might have been just slightly more desirable had the revised analysis been the one chosen in the first place.

In order to escape from this asymmetrical influence problem, what is required is the ability to explore multiple development paths. As described in the next section, this is exactly what metagrammar engineering provides.

3.2 A Many-Model interpretation

We will illustrate how a metagrammar may help improve this situation by drawing a parallel to the many-world interpretation of quantum mechanics (Everett 1957, DeWitt 1972). Quantum mechanics can predict the probability of the location of a photon, and this prediction forms a wave function. However, as soon as the photon is observed, the probability function is reduced to one point and, according to the alternative Copenhagen Interpretation, the wave function collapses. The many-world interpretation rejects this idea and maintains that the alternative worlds in which the photon is at another location than the one observed are real, implying that alternative histories and futures are real. The analogue we imagine for grammar engineering is a many-model interpretation, where each model considers different analyses to be correct. Each implementation decision we make places us in a given model. While making a decision in grammar engineering, the grammar engineer sets off on a specific road, and the route that is taken (the order in which phenomena are considered, the choices made concerning their implementations) influences the final destination (the resulting grammar). However, unlike real life, where we are stuck in a specific world and cannot explore alternatives, we can look at alternative models for grammars. The only problem is that it is not feasible to have a clear picture of all consequences of a specific decision while following the traditional grammar engineering approach. But what if we could monitor more than one model at the same time? What if, instead of making a decision to commit to a specific model, we follow a couple of models for a while, and test them with new analyses until we have gathered enough evidence to feel comfortable about a decision?

The methodology of metagrammar engineering can achieve this. The metagrammar can contain alternative analyses, including the alterations needed to make them interact correctly with analyses for other phenomena. Alternative models can be generated by selecting different options in the metagrammar. In §4, we describe a specific approach that applies this idea to hpsg grammars (Pollard and Sag, 1994) written in tdl (Copestake, 2000).3 It should be noted, however, that the idea of using a metagrammar and code generation to monitor alternative grammar models is both theory- and software-independent.

Fig. 2. Metagrammar engineering workflow. (Manual steps: identify phenomenon to analyze; add analyses to metagrammar; extend choices files describing particular grammars; extend testsuite; implement/debug and test analyses. Automatic steps, for each alternative grammar: compile grammar(s); parse sample sentences; parse full test suite.)

Fig. 2 schematizes the general workflow with a metagrammar approach. The stacked ovals represent manual processes that manipulate sets of alternative analyses, and the stacked diamonds automatic processes involving alternative grammars compiled out of those resources. The grammar engineer has the option of adding multiple alternative analyses for any phenomenon and maintains a set of so-called 'choices' files specifying grammars built out of particular sets of choices among the analyses. The testing cycle involves compiling all of these grammars and parsing the testsuite with all of them. Using the many-worlds metaphor, the resources encoding the alternative analyses and grammars are what define the space of models that the grammar engineer can monitor, the testsuite is the lens through which the monitoring is done, and the automated processing steps constitute the observations of each of the many 'worlds'.
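
To make this testing cycle concrete, here is an illustrative sketch. All names and the stand-in functions are invented; the real pipeline uses the climb customization code and grammar-profiling tools:

    # Illustrative sketch of monitoring several grammar 'models' at once.
    CHOICES = {
        'aux-rule-model': {'vc-analysis': 'aux-rule'},
        'basic-model':    {'vc-analysis': 'basic'},
    }

    TESTSUITE = ['dass er ihr das Buch geben wird',
                 '*dass er ihr das Buch wird geben']

    def compile_grammar(choices):
        # A real metagrammar would emit tdl here; the choices themselves
        # serve as a stand-in for the compiled grammar.
        return choices

    def parse(grammar, item):
        # Stand-in: pretend the two models disagree on one test item.
        if grammar['vc-analysis'] == 'basic' and item.endswith('geben wird'):
            return False
        return not item.startswith('*')

    def observe_all_models():
        """Compile every choices file and parse the testsuite with each."""
        return {name: {item: parse(compile_grammar(c), item) for item in TESTSUITE}
                for name, c in CHOICES.items()}

    for model, outcome in observe_all_models().items():
        print(model, outcome)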

4 The CLIMB approach

This section describes the climb4 approach for metagrammar engineering. We first describe how climb emerged from the Grammar Matrix (Bender et al. 2010), and then describe the three variations of climb that currently exist: the traditional procedural climb, declarative climb and short-climb.

3 tdl is the delph-in (www.delph-in.net) joint reference formalism.
4 Comparative Libraries of Implementations with a Matrix Basis; see moin.delph-in.net/ClimbTop for more project information and software download.

4.1 The Grammar Matrix and CLIMB

Climb (Fokkens 2011, Fokkens et al. 2012) builds on the LinGO Grammar Matrix (Bender et al. 2010), sharing with it the key idea of grammar customization. Grammar customization is the practice of generating grammar code from stored analyses on the basis of some input. There are, however, important differences, stemming from the goals of the projects: the Grammar Matrix aims to help (possibly novice) grammar engineers start new grammars, whereas the goal of climb is to support long-term grammar development for experienced grammar engineers.

The Grammar Matrix applies grammar customization to the problem of rapid development of precision grammars. It stores a core grammar and libraries of analyses of cross-linguistically varying phenomena, which users can select in order to create a grammar fragment for (ideally) any human language. The project emphasizes typological coverage, and therefore can only add phenomena slowly, as each one must be grounded in thorough typological research. Climb generalizes the idea of grammar customization to metagrammar engineering, placing the customization source code under the control of the grammar engineer, so that different levels of parameterization can be achieved in individual grammar development projects. Users are encouraged to explore the possibilities of the customization system and expand it for their language-specific needs. This entails further differences between the projects: Matrix users access the system through a web-based questionnaire that hides the underlying hpsg analyses, while climb users are actively developing hpsg analyses (as implemented in tdl). Also, using grammar customization in the development of language-specific resources frees the grammar engineer to focus on phenomena as they manifest in the language(s) at hand, without concern for the full range of typological variation.

4.2 Procedural CLIMB

The original version of climb (procedural climb) builds directly on the Grammar Matrix, by simply taking the customization system (minus the web front-end) and allowing grammar engineers to extend it for particular languages. A climb metagrammar takes a choices file as its input and produces a grammar in tdl. The choices file specifies phenomena and properties that are generated using the metagrammar's libraries, which mainly consist of procedural functions that add type definitions to the grammar based on definitions in the choices file.

Fig. 3 gives a sample analysis as implemented in procedural climb. The metagrammar code contains control statements which check for certain properties in the choices file and statements which output tdl accordingly. The small boxes in the figure are partial choices files, while the (declarative) tdl code shows the system output according to those choices. This example illustrates the ways in which the analysis of object-raising verbs depends on the presence or absence of reflexives in the grammar specification, on the one hand, and the chosen analysis of verbal clusters on the other. Procedural climb can capture alternative analyses and thus achieves the goal of maintaining analyses in parallel. It has some additional advantages common to metagrammar approaches, including increased modularity, more phenomena-oriented organization, facilitation of multilingual grammar development, and application-dependent variations in grammars.

Choices fragment:                      Choices fragment:
  obj-raising=yes                        obj-raising=yes
  has-reflexives=on                      has-reflexives=on
  vc-analysis=aux-rule                   vc-analysis=basic

Output for vc-analysis=aux-rule:

  obj-raising-verb-lex := non-refl-verb-lex &
    ditransitive-second-arg-raising-lex-item &
    [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < #subj >, SPR < >, SPEC < >,
                             COMPS < #obj, #vcomp > ],
      ARG-ST < [ ], #obj & [ LOCAL.CAT.VAL.SPR < > ],
               #vcomp & [ LOCAL.CAT.VAL.SUBJ < [ ] > ] > ].

Metagrammar code (Python):

  if ch.get('obj-raising') == 'yes':
      if ch.get('has-reflexives'):
          mylang.add('obj-raising-verb-lex := non-refl-verb-lex.')
      else:
          mylang.add('obj-raising-verb-lex := main-verb-lex.')
      typedef = \
          'obj-raising-verb-lex := ditrans-second-arg-raising-lex-item & \
             [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < #subj >, \
                                      SPR < >, \
                                      SPEC < > ], \
               ARG-ST < #subj & [ LOCAL.CAT [ VAL.SPR < > ] ], [ ], [ ] > ].'
      mylang.add(typedef)
      if ch.get('vc-analysis') == 'aux-rule':
          comps_struc = \
              '[ SYNSEM.LOCAL.CAT.VAL.COMPS < #obj, #vcomp >, \
                 ARG-ST < [ ], #obj & [ LOCAL.CAT.VAL.SPR < > ], \
                          #vcomp & [ LOCAL.CAT.VAL.SUBJ < [ ] > ] > ].'
      else:
          comps_struc = \
              '[ SYNSEM.LOCAL.CAT.VAL.COMPS < #obj, #vcomp . #comps >, \
                 ARG-ST < [ ], #obj & [ LOCAL.CAT.VAL.SPR < > ], \
                          #vcomp & [ LOCAL.CAT.VAL [ SUBJ < [ ] >, \
                                                     COMPS #comps ]] > ].'
      mylang.add('obj-raising-verb-lex := ' + comps_struc)

Output for vc-analysis=basic:

  obj-raising-verb-lex := non-refl-verb-lex &
    ditransitive-second-arg-raising-lex-item &
    [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < #subj >, SPR < >, SPEC < >,
                             COMPS < #obj, #vcomp . #comps > ],
      ARG-ST < [ ], #obj & [ LOCAL.CAT.VAL.SPR < > ],
               #vcomp & [ LOCAL.CAT.VAL [ SUBJ < [ ] >,
                                          COMPS #comps ] ] > ].

Fig. 3. Snippet of procedural climb code with alternative choices and their output

A drawback of this approach, however, is that it requires implementing procedural functions in Python. Even for grammar engineers proficient in programming, it can be a nuisance to switch back and forth between declarative definitions of constraints in the grammar and procedural code invoking constraints based on choices in the metagrammar.

4.3 Declarative CLIMB

Declarative climb addresses this drawback of procedural climb. Fig. 4 shows the implementation of a basic type for object raising in declarative climb. The definitions in declarative climb are written directly in tdl, where paths in type definitions may optionally be abbreviated. A small set of additional declarative statements is used to identify where analysis-specific parts of an implementation start. The example includes alternative additions made to the basic type depending on the analysis chosen for verbal clusters (like in Fig. 3). The choices file indicates which analyses in the metagrammar should not be selected for the generated grammar. Procedural Python implementations are still used to interpret a choices file and merge type definitions from different locations in the metagrammar, but grammar engineers can treat them like a black box.

Basic type in the metagrammar (abbreviated paths):

  obj-raising-verb-lex := non-refl-verb-lex &
    ditrans-second-arg-raising-lex-item &
    [ SUBJ < #subj >, SPR < >, SPEC < >,
      ARG-ST < #subj & [ SPR < > ], [ ], [ ] > ].

Analysis-specific additions:

  Begin=aux-rule-vc
  obj-raising-verb-lex :=
    [ COMPS < #obj, #vcomp >,
      ARG-ST < [ ], #obj & [ SPR < > ],
               #vcomp & [ SUBJ < [ ] > ] > ].
  End=aux-rule-vc

  Begin=basic-vc
  [ COMPS < #obj, #vcomp . #comps >,
    ARG-ST < [ ], #obj & [ SPR < > ],
             #vcomp & [ SUBJ < [ ] >,
                        COMPS #comps ] > ].
  End=basic-vc

Choices with category=analysis and exclude=basic-vc yield:

  obj-raising-verb-lex := non-refl-verb-lex &
    ditransitive-second-arg-raising-lex-item &
    [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < #subj >, SPR < >, SPEC < >,
                             COMPS < #obj, #vcomp > ],
      ARG-ST < [ ], #obj & [ LOCAL.CAT.VAL.SPR < > ],
               #vcomp & [ LOCAL.CAT.VAL.SUBJ < [ ] > ] > ].

Choices with category=analysis and exclude=aux-rule-vc yield:

  obj-raising-verb-lex := non-refl-verb-lex &
    ditransitive-second-arg-raising-lex-item &
    [ SYNSEM.LOCAL.CAT.VAL [ SUBJ < #subj >, SPR < >, SPEC < >,
                             COMPS < #obj, #vcomp . #comps > ],
      ARG-ST < [ ], #obj & [ LOCAL.CAT.VAL.SPR < > ],
               #vcomp & [ LOCAL.CAT.VAL [ SUBJ < [ ] >,
                                          COMPS #comps ] ] > ].

Fig. 4. Snippet of declarative climb code with alternative choices and their output

4.4 SHORT-CLIMB

Both procedural and declarative climb assume that the grammar is built within a climb approach from the ground up. But mature grammars can also benefit from the ability to explore alternative analyses (even if most of what has been implemented before adopting climb is kept constant). It is not practical to take a very large grammar and re-implement it from the ground up just to get this benefit. Instead, short-climb5 provides support for maintaining alternative analyses for grammars that have been developed the traditional way.

5 Starting High On a Roof Top climb
An alternative analysis can be included by writing declarative statements on types and (partial) type definitions to be added to the grammar, types or constraints to be removed from the grammar and types that need to be modified. The short-climb processing code takes a grammar written in tdl and a set of modifications as input and produces a revised grammar. It can optionally also provide a short-climb definition file to convert the revised grammar into the original grammar. Finally, it can produce a reduced grammar that only contains the common properties of both analyses, together with two files that can either create the original or the revised analysis using short-climb software. As such, short-climb can be used to convert a traditionally written grammar to declarative climb step by step.
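
A minimal sketch of such a modification step, under invented assumptions (short-climb's actual input format and processing code differ; the dict-based grammar representation here is illustrative only):

    # Illustrative sketch of applying a short-climb-style modification file
    # to a grammar, also producing the reverse modifications.
    def apply_modifications(grammar, mods):
        """Return a revised grammar plus the reverse modification list
        that would reconstruct the original grammar."""
        revised, reverse = dict(grammar), []
        for op, typename, definition in mods:
            if op == 'add':
                reverse.append(('remove', typename, None))
                revised[typename] = definition
            elif op == 'remove':
                reverse.append(('add', typename, revised.pop(typename)))
            elif op == 'modify':
                reverse.append(('modify', typename, revised[typename]))
                revised[typename] = definition
        return revised, reverse

    grammar = {'verb-lex': '[ SUBJ < #subj > ]'}
    mods = [('modify', 'verb-lex', '[ SUBJ < #subj >, COMPS < > ]'),
            ('add', 'aux-lex', 'verb-lex & [ AUX + ]')]
    revised, undo = apply_modifications(grammar, mods)
    original, _ = apply_modifications(revised, list(reversed(undo)))
    assert original == grammar    # the reverse file restores the original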

5 Scalability

Climb has been used to develop gclimb, a metagrammar for Germanic languages, developed with the goals of exploring the efficiency of alternative analyses and the applicability of the climb methodology at the scale of resource grammars (Fokkens, in progress). The results in Fokkens (in progress) indicate that climb scales well for long-term projects. gclimb has been developed for two and a half years and, in spite of having been a side project for most of that time, it already covers a wide range of phenomena for German. It contains alternative analyses for word order and auxiliaries, two phenomena that interact heavily with other phenomena in the grammar, together resulting in five variations. In addition, the system provides for different approaches to the lexicon, including fully inflected or abstract forms of morphemes as well as a variation that integrates the lexicon and morphology of Cheetah (Cramer, 2011). In total, there are 25 different choices files specifying grammars representing different combinations of these analytical possibilities.

As far as quantity is concerned, gclimb was built on top of a version of the Grammar Matrix that had 3,651 lines of code in its linguistic libraries. The metagrammar's libraries now consist of 13,032 lines of code, compared to 6,250 lines of code in the current version of the Grammar Matrix customization system. The choices files for these grammars range from 1,970 lines of definitions (for small grammars used for testing) up to 3,657 lines (for creating the complete Cheetah core grammar and incorporating its lexicon). A customized grammar consists of approximately 7,300 lines of code.

6 Related Work

Metagrammars have been in use for over a decade, with notable long-term efforts such as the MetaGrammar project (Candito 1998, Clément and Kinyon 2003) and GF (Ranta 2004, Ranta 2011). They have supported code sharing (Clément and Kinyon 2003, Ranta 2011), increased modularity, provided means for generalizations (Candito, 1998), combined expertise from linguists and engineers (Ranta, 2004) and facilitated multilingual grammar development (Ranta, 2011).

The two projects that most closely resemble declarative climb in their architecture are Sygal and Wintner's (2011) approach for modular development of typed unification grammars and the CoreGram project (Müller, 2013). Like the work mentioned above, neither of these has addressed systematic comparison of alternative analyses, focusing instead on increasing modularity (Sygal and Wintner, 2011) and cross-linguistic grammar sharing (CoreGram). These differences in goals are reflected in differences in architecture.

Sygal and Wintner (2011) propose a structure that allows engineers to implement modules defining a part of a type hierarchy. These modules can be combined to form a complete grammar by specially defined functions. They point out the similarity between their approach and the Grammar Matrix customization system, where the main difference is that the grammar engineer does not have control over customization. This is exactly where climb differs from the Grammar Matrix. Where Sygal and Wintner improve modularity by developing a mathematically well-founded model, climb resulted from a practical approach to implementing alternative analyses, which requires increased modularity. This difference is also reflected in the verification of the two approaches. The former has been tested in a proof-of-concept implementation of the small hierarchy from the appendix of Pollard and Sag (1994). Climb includes a grammar for German covering phenomena including subordinate clauses, complex predicates and extraposed comparatives. It seems promising that the theoretically well-founded approach and the practical broad-coverage approach are similar.

The CoreGram project (Müller, 2013) is a bottom-up approach for cross-linguistic sharing of analyses between hpsg grammars running in TRALE (Meurers et al., 2002). Implementations that can be used for all grammars form a common core. When properties are observed that are shared by a subset of languages, they can be placed in subcores. Definitions belonging to the core, a subcore or language-specific properties are defined in different files, and complete grammars are created by selecting the appropriate files. This approach partially provides the same functionality as declarative climb. In fact, the CoreGram approach can be used to monitor alternative analyses, but this is not (yet) a goal of the CoreGram project. Declarative climb differs from CoreGram in providing a few additional features to facilitate the accommodation of alternative analyses. First, climb allows the metagrammar engineer to consolidate all constraints associated with one alternative in a single location, even if in the compiled grammar they affect many different types, giving the grammar engineer two views: one based on analyses and one based on complete types. Declarative climb also supports the definition of abbreviated paths, which to a certain extent supports the exploration of alternative feature geometries (see the sketch at the end of this section). These differences can all be traced back to the different goals of CoreGram and climb, respectively.

To our knowledge, the only work that suggests that a metagrammar can be used to test alternative analyses is Fokkens (2011), but that work focuses on the efficiency of implemented grammars. As such, we believe this work is the first to address the role of a metagrammar in hypothesis testing for syntactic research.
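As an illustration of the abbreviated paths mentioned above, the declarative code in Figure 4 constrains short names such as SUBJ and COMPS, which climb expands to the full feature paths of the compiled grammar. Assuming a declaration syntax of the following shape (invented here for illustration), redefining the right-hand sides would redirect every analysis to an alternative feature geometry at once:

    ; hypothetical abbreviation declarations; syntax illustrative only
    SUBJ  = SYNSEM.LOCAL.CAT.VAL.SUBJ
    COMPS = SYNSEM.LOCAL.CAT.VAL.COMPS
    SPR   = SYNSEM.LOCAL.CAT.VAL.SPR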

7 Conclusion

This paper outlined the role metagrammars can play in linguistic hypothesis testing. We described climb, a metagrammar engineering approach that provides the means to create hpsg grammars written in tdl. Three versions of climb have been presented: procedural climb following Fokkens (2011), declarative climb and short-climb. Declarative climb allows grammar engineers to define their metagrammar libraries declaratively in tdl and has been developed to make the approach more accessible to grammar engineers who are not trained in procedural programming. Short-climb can be used to create libraries that apply changes to large grammars that have been developed in the traditional (non-metagrammar) way. It thus provides the possibility of testing alternative analyses systematically in a large-scale grammar.

The advantages of implementing grammars for hypothesis testing are well known (Bierwisch 1963, Müller 1999, Bender 2008b). Metagrammar engineering adds an additional advantage in that it allows the syntactician to systematically explore alternative analyses over time. In traditional grammar engineering, analyses may be so deeply embedded in the grammar that it becomes too cumbersome to go back and change them. Metagrammar engineering makes it possible to monitor multiple models of a grammar. If we include two alternative analyses in a metagrammar and evidence for a particular analysis is found far in the future, we have the complete paths of both analyses. There is thus no need to dig into past decisions; instead, we can simply follow the path of the right choice, as if we had time traveled and corrected our past decision.

References

Bender, E.M.: Evaluating a crosslinguistic grammar resource: A case study of Wambaya. In: Proceedings of ACL-08: HLT, Columbus, Ohio (2008) 977–985
Bender, E.M.: Grammar engineering for linguistic hypothesis testing. In: Proceedings of the Texas Linguistics Society X Conference: Computational Linguistics for Less-Studied Languages, Stanford, CSLI Publications (2008) 16–36
Bender, E.M.: Reweaving a grammar for Wambaya: A case study in grammar engineering for linguistic hypothesis testing. LILT 3 (2010) 1–34
Bender, E.M., Drellishak, S., Fokkens, A., Poulson, L., Saleem, S.: Grammar customization. Research on Language & Computation 8(1) (2010) 23–72
Bender, E.M., Flickinger, D., Oepen, S.: Grammar engineering and linguistic hypothesis testing: Computational support for complexity in syntactic analysis. In Bender, E.M., Arnold, J.E., eds.: Language from a Cognitive Perspective: Grammar, Usage and Processing. Stanford: CSLI Publications (2011) 5–29
Bierwisch, M.: Grammatik des deutschen Verbs. Volume II of Studia Grammatica. Akademie Verlag (1963)
Butt, M., Dyvik, H., King, T.H., Masuichi, H., Rohrer, C.: The parallel grammar project. In: Proceedings of the 2002 Workshop on Grammar Engineering and Evaluation - Volume 15, Association for Computational Linguistics (2002) 1–7
Candito, M.: Building parallel LTAG for French and Italian. In: Proceedings of ACL and ICCL, Montreal, Canada (1998) 211–217

Chomsky, N.: Aspects of the Theory of Syntax. MIT Press, Cambridge, MA (1965)
Chomsky, N.: Knowledge of Language: Its Nature, Origin, and Use. Praeger, New York (1986)
Clément, L., Kinyon, A.: Generating parallel multilingual LFG-TAG grammars with a MetaGrammar. In: Proceedings of ACL, Sapporo, Japan (2003)
Copestake, A.: Appendix: Definitions of typed feature structures. Natural Language Engineering 6 (2000) 109–112
Cramer, B.: Improving the feasibility of precision-oriented HPSG parsing. PhD thesis, Saarland University (2011)
DeWitt, B.S.: The many-universes interpretation of quantum mechanics. In: Proceedings of the International School of Physics "Enrico Fermi" Course IL: Foundations of Quantum Mechanics (1972)
Everett, H.: "Relative State" formulation of quantum mechanics. Reviews of Modern Physics 29 (1957) 454–462
Fokkens, A.: Metagrammar engineering: Towards systematic exploration of implemented grammars. In: Proceedings of ACL, Portland, Oregon, USA (2011)
Fokkens, A., Avgustinova, T., Zhang, Y.: CLIMB grammars: three projects using metagrammar engineering. In: Proceedings of LREC 2012, Istanbul, Turkey (2012)
Fokkens, A.: Enhancing Empirical Research for Linguistically Motivated Precision Grammars. PhD thesis, Saarland University (in progress)
Marimon, M.: The Spanish resource grammar. In: Proceedings of LREC, European Language Resources Association (2010)
Meurers, W.D., Penn, G., Richter, F.: A web-based instructional platform for constraint-based grammar formalisms and parsing. In: Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics (2002) 19–26
Müller, S.: Deutsche Syntax deklarativ. Head-Driven Phrase Structure Grammar für das Deutsche. Max Niemeyer Verlag, Tübingen (1999)
Müller, S.: The CoreGram project: Theoretical linguistics, theory development and verification. In: Proceedings of the Workshop on High-level Methodologies for Grammar Engineering (2013)
Pollard, C., Sag, I.: Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago, USA (1994)
Ranta, A.: Grammatical Framework: A type-theoretical grammar formalism. Journal of Functional Programming 14(2) (2004) 145–189
Ranta, A.: Grammatical Framework: Programming with Multilingual Grammars. CSLI Publications, Stanford, USA (2011)
Riezler, S., King, T.H., Kaplan, R.M., Crouch, R., Maxwell III, J.T., Johnson, M.: Parsing the Wall Street Journal using a Lexical-Functional Grammar and discriminative estimation techniques. In: Proceedings of ACL (2002)
Sag, I.A., Wasow, T.: Performance-compatible competence grammar. In Borsley, R., Börjars, K., eds.: Non-Transformational Syntax: Formal and Explicit Models of Grammar. Wiley-Blackwell (2011)
Siegel, M., Bender, E.M.: Efficient deep processing of Japanese. In: Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization, Taipei, Taiwan (2002)
Soames, S.: Linguistics and psychology. Linguistics and Philosophy 7 (1984) 155–179
Sygal, Y., Wintner, S.: Towards modular development of typed unification grammars. Computational Linguistics 37(1) (2011) 29–74
Wasow, T.: On constraining the class of transformational languages. Synthese 39 (1978)

A Toolkit for Engineering Type-Logical Grammars

Richard Moot

LaBRI (CNRS), Bordeaux University
[email protected]

This demo shows a multi-purpose toolkit for annotating and parsing type-logical grammars. The current implementation and demo focus on the analysis of French, but the tools are fully general and can be adapted to other languages. Figure 1 gives an overview of the different tools and resources. The POS-tag model and the supertag model have been trained using the tools of [1]. They provide first part-of-speech tags and then a set of contextually likely formulas (supertags) for each of the words in an input sentence. These formulas are the input to the chart parser, a stripped-down and optimized version of the Grail parser for type-logical grammars [2], which produces a representation of the meaning of the sentence but also LaTeX output of the parse/proof.

[Fig. 1. An overview of the toolkit: a pipeline pairing resources with programs; input text feeds the POS-tagger (POS model), whose tagged text feeds the supertagger (supertag model), whose formulas feed the parser (semantic lexicon), which produces DRT semantics.]

In addition to these programs, a number of interface tools are provided to simplify user interaction. For example, an annotation tool helps bootstrap unannotated parts of the corpus. It uses the POS-tagger and supertagger to give an annotation for a sentence and allows the user to interactively correct this sentence, either by selecting an alternative proposed by the supertagger or by manually typing the correct formula. To help prevent errors, the number of times this word-POS tag combination occurs with this formula is listed, and the full annotation can be passed directly to the parser. All tools and resources are released under the GNU Lesser General Public License.

References

1. Clark, S., Curran, J.R.: Parsing the WSJ using CCG and log-linear models. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-2004), pp. 104–111. Barcelona, Spain (2004)
2. Moot, R.: Wide-coverage French syntax and semantics using Grail. In: Proceedings of Traitement Automatique des Langues Naturelles (TALN), Montreal (2010)

eXtensible MetaGrammar: a modular tool for grammar engineering

Simon Petitjean

LIFO, University of Orléans

1 Introduction

Designing and maintaining a large-scale formal grammar is a long and complex task, because it means manipulating and checking thousands of structures. XMG [1] is a compiler inspired by the concept of MetaGrammar [2], created to address this issue. The idea behind XMG is to generate the grammar by describing fragments of the grammar rules and combining them to produce the whole set of rules. Linguists work with different languages, are interested in different representation levels of these languages, and use different formalisms to represent the linguistic information. Dimensions are the concept XMG uses to separate these levels of description and to provide adapted languages for describing the structures they imply. Linguistic theories are represented in the compiler by means of principles. Principles are sets of constraints that can be applied to dimensions. For example, the unicity principle ensures that a property is unique inside a model. A high level of modularity is necessary for the compiler to be easily adapted to new linguistic theories, which come with their own dimensions and principles. For this reason, the compiler can be built dynamically by choosing compiler bricks and combining them.
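For illustration, a class describing a reusable tree fragment in the syntactic dimension can be written along the following lines. This sketch follows the style of published XMG examples; details such as the comment syntax may differ from the actual tool:

    class CanonicalSubject
    declare ?S ?N
    { <syn>{
        node ?S [cat=s];               % root of the fragment
        node ?N (mark=subst) [cat=n];  % substitution node for the subject
        ?S -> ?N                       % immediate dominance
      }
    }

Classes like this one can then be combined by conjunction and disjunction to accumulate descriptions, from which the solver extracts the models that constitute the grammar.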

2 Compiling a MetaGrammar

Generating a grammar with XMG means going through a set of compiling steps, summarized in Figure 1. The first steps take a MetaGrammar file, compile it and execute the generated code. This leads to a set of accumulations, corresponding to dimensions. Then, for each dimension, dedicated final compiling steps extract all models from the descriptions, producing the grammar.

3 Compiler Bricks

Every part of the compiler that defines a dedicated language and associates compiling steps with this language is called a compiler brick. These bricks can correspond to dimensions, but also to finer-grained reusable parts of the compiler. For example, to declare and manipulate Attribute Value Matrices inside a dimension, one just needs to instantiate the AVM brick.

[Fig. 1. A basic view of the compiler: a .mg file is compiled and executed, yielding one accumulation per dimension (dim 1, dim 2, ...); solving each accumulation produces the .xml grammar file.]

4 Extensions

Originally designed to deal with two tree-based formalisms for syntax, Tree Adjoining Grammars [3] and Interaction Grammars [4], and a predicate-based formalism for semantics ("flat" semantics), XMG has lately been extended with new dimensions. Another level of linguistic description has been explored: the morphological one. Two different bricks, based on different linguistic theories, were created; one of them has been used in [5]. A new semantic dimension is also available, handling frame-based semantics. It has been used together with the second morphological brick in [6].

References

1. Petitjean, S.: Spotting and improving modularity in large scale grammar development. In: Proceedings of the Student Workshop of the 18th European Summer School in Logic, Language and Information (ESSLLI 2012), Opole, Poland (2012)
2. Candito, M.: A Principle-Based Hierarchical Representation of LTAGs. In: Proceedings of COLING 96, Copenhagen, Denmark (1996)
3. Joshi, A.K., Schabes, Y.: Tree adjoining grammars. In Rozenberg, G., Salomaa, A., eds.: Handbook of Formal Languages. Springer Verlag, Berlin (1997)
4. Perrier, G.: Interaction Grammars. In: Proceedings of COLING 2000, Saarbrücken, Germany (2000)
5. Duchier, D., Magnana Ekoukou, B., Parmentier, Y., Petitjean, S., Schang, E.: Describing Morphologically-rich Languages using Metagrammars: a Look at Verbs in Ikota. In: Workshop on "Language technology for normalisation of less-resourced languages", 8th SALTMIL Workshop on Minority Languages and the 4th Workshop on African Language Technology, Istanbul, Turkey (2012)
6. Lichte, T., Diez, A., Petitjean, S.: Coupling trees, words and frames through XMG. In: Proceedings of the Workshop on High-level Methodologies for Grammar Engineering (HMGE 2013), Düsseldorf, Germany (to appear)

Leopar: an Interaction Grammar Parser

Guy Perrier and Bruno Guillaume

LORIA, Université de Lorraine, Nancy, France

1 Overview

Leopar (http://leopar.loria.fr) is a parser for natural languages based on the formalism of Interaction Grammars (IG). The parsing principle (called "electrostatic parsing") consists in neutralizing opposite polarities: a positive polarity corresponds to an available linguistic feature and a negative one to an expected feature. The structures used in IG are underspecified syntactic trees decorated with polarities, called Polarized Tree Descriptions (PTDs). During the parsing process, PTDs are combined by partial superposition, guided by the aim of neutralizing polarities: two opposite polarities are neutralized by merging their support nodes. Parsing succeeds if the process ends with a minimal and neutral tree. The figure below describes Leopar and the toolchain around it (EPTD stands for Elementary Polarized Tree Description).
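The polarity bookkeeping behind electrostatic parsing can be sketched in a few lines of Python. This is a simplified model of the counting idea (the same idea underlies the polarity filters described in the next section), not Leopar code: each feature counts +1 per available occurrence and -1 per expected occurrence, the expected root category adds one extra expectation, and a lexical selection can only lead to a neutral tree if every count cancels out.

    from collections import Counter

    def is_balanced(lexical_selection, goal="s"):
        # The full parse must deliver exactly one tree of the goal category,
        # so the goal starts out as one expected (negative) occurrence.
        totals = Counter({goal: -1})
        for item in lexical_selection:
            for feature, polarity in item:
                totals[feature] += polarity
        # A neutral tree is possible only if every feature cancels out.
        return all(count == 0 for count in totals.values())

    # Toy lexical items: a transitive verb offers an s and expects two nps;
    # each noun phrase offers an np.
    verb = [("s", +1), ("np", -1), ("np", -1)]
    np = [("np", +1)]
    print(is_balanced([np, verb, np]))  # True: selection kept
    print(is_balanced([np, verb]))      # False: filtered out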

2 Main modules

Anchoring: each lexicon entry is described with a hypertag (i.e., a feature structure which describes morpho-syntactic information about the lexical unit); in order to preserve tokenization ambiguities, tokenization is represented as a Directed Acyclic Graph (DAG) of hypertags; for each hypertag describing a lexical unit, the relevant EPTDs are built by instantiation of template EPTDs defined in the grammar.
Filtering: paths in the DAG produced by the anchoring represent the set of lexical selections that should be considered by the parsing process. In order to reduce the number of paths and so speed up the deep parsing, the next step in Leopar is to filter out paths that are doomed to fail. Two kinds of filters are used: polarity filters remove lexical selections for which the set of polarities is not well-balanced, and companion filters remove lexical selections for which it can be predicted (from knowledge of the template EPTDs) that some polarity will fail to find a dual polarity (called a companion) able to saturate it.


Deep parsing: the atomic operation used during deep parsing is the node merging operation. At each step, two dual polarities are chosen; the two nodes carrying these polarities are merged and the tree descriptions around them are superposed. Of course, in case of deadlock, backtracking is used to choose another pair of polarities.
Phrase-structure trees and dependency graphs: (E)PTDs are tree descriptions which describe constraints on the phrase-structure tree. The parsing process aims to build a phrase-structure tree which is a model of the EPTDs chosen for each lexical unit. Dependency graphs are built from the phrase-structure tree, but also with information taken from the parsing process itself.

Ygg, parsing French text using AB grammars

No´emie-FleurSandillon-Rezer

LaBRI, CNRS, 351 cours de la Libération, 33405 Talence

This demo focuses on the use of Ygg, our sentence analyzer, which uses a probabilistic version of the CYK algorithm [Yo1]. An overview of our software chain is given in Figure 1.

[Fig. 1. Processing line: the corpus is fed to the SynTAB extractor, which produces derivation trees and, from these, the grammar; raw sentences are typed by the Supertagger, and Ygg parses the typed sentences into analysed sentences.]

In order to run our analyzer, we need an input grammar and sentences to analyze. The raw sentences come from various corpora and are typed by the Supertagger [Mo2]. The AB grammar [La1] is extracted from the French Treebank with SynTAB, described in detail in [SM1]: our tree transducer takes as input the syntactic trees of the French Treebank and gives as output a forest of AB derivation trees. Among other options, we chose an AB grammar for its links with semantics and the possibility to extract λ-terms from the derivation trees. By gathering the leaves of the derivation trees, we obtain the usual form of an AB grammar: a lexicon which links words to their various types. However, for the needs of the CYK algorithm, we decided to extract a more conventional grammar. The AB grammar is already in Chomsky Normal Form, which is necessary for the algorithm. We added a stochastic component by subdividing the rules according to their root (label plus type) and counting the occurrences of the various instantiations of the two AB rule schemata (a → a/b b and a → b b\a). Finally, we use CYK to create derivation trees and the λ-terms corresponding to them.
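For illustration, the core of a probabilistic CYK recognizer over such a binary grammar can be written compactly. The sketch below uses a toy lexicon and invented probabilities and is not the Ygg implementation, which additionally keeps backpointers so that derivation trees and λ-terms can be rebuilt from the chart:

    # Probabilistic CYK over a grammar in Chomsky Normal Form.
    # lexicon: word -> {type: prob}; rules: (left, right) -> {parent: prob}.
    def cyk(words, lexicon, rules):
        n = len(words)
        chart = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            chart[i][i + 1] = dict(lexicon[w])  # lexical entries fill the diagonal
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                k = i + span
                cell = chart[i][k]
                for j in range(i + 1, k):  # try every split point
                    for left, pl in chart[i][j].items():
                        for right, pr in chart[j][k].items():
                            for parent, p in rules.get((left, right), {}).items():
                                score = pl * pr * p
                                if score > cell.get(parent, 0.0):
                                    cell[parent] = score  # keep the best derivation
        return chart[0][n]

    # The two AB application schemata written as binary rules:
    #   a/b  b => a  (forward application)    b  b\a => a  (backward application)
    lexicon = {"Jean": {"np": 1.0}, "aime": {"(np\\s)/np": 1.0}, "Marie": {"np": 1.0}}
    rules = {("(np\\s)/np", "np"): {"np\\s": 1.0}, ("np", "np\\s"): {"s": 1.0}}
    print(cyk("Jean aime Marie".split(), lexicon, rules))  # {'s': 1.0}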

References

[La1] Lambek, J.: The mathematics of sentence structure. The American Mathematical Monthly 65 (1958)
[Mo2] Moot, R.: Semi-automated extraction of a wide-coverage type-logical grammar for French. In: Proceedings of TALN 2010, Montreal (2010)
[SM1] Sandillon-Rezer, N.F., Moot, R.: Using tree transducers for grammatical inference. In: Proceedings of LACL 2011, Montpellier (2011)
[Yo1] Younger, D.: Context-free grammar processing in n³ (1967)

Author Index

A
Avgustinova, Tania ............. 87
B
Baldwin, Timothy ............... 25
Bender, Emily M. ............... 105
D
Dellert, Johannes .............. 75
Diez, Alexander ................ 37
E
Evang, Kilian .................. 75
F
Fokkens, Antske ................ 87, 105
G
Guillaume, Bruno ............... 63, 121
L
Letcher, Ned ................... 25
Lichte, Timm ................... 37
M
Moot, Richard .................. 1, 117
Müller, Stefan ................. 93
P
Perrier, Guy ................... 63, 121
Petitjean, Simon ............... 37, 119
R
Richter, Frank ................. 75
S
Sandillon-Rezer, Noémie-Fleur .. 13, 123
Schang, Emmanuel ............... 49