Proceedings of NAACL-HLT 2016

NAACL HLT 2016

Workshop on Discontinuous Structures in Natural Language Processing (DiscoNLP)

Proceedings of the Workshop

June 17, 2016
San Diego, California, USA

© 2016 The Association for Computational Linguistics

Order copies of this and other ACL proceedings from:

Association for Computational Linguistics (ACL)
209 N. Eighth Street
Stroudsburg, PA 18360
USA
Tel: +1-570-476-8006
Fax: +1-570-476-0860
[email protected]

ISBN 978-1-941643-85-3

Introduction

This volume contains the papers presented at the Workshop on Discontinuous Structures in Natural Language Processing, held in San Diego, California, on June 17, 2016, during the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

The modeling of certain structures in natural language requires a mechanism for discontinuity, in the sense that we must account for two or more parts of the structure that are not adjacent. This is true across many languages and on different description levels. For instance, on the lexical level, this concerns discontinuous morphological phenomena such as transfixation (templatic morphology), as well as phrasal verbs and non-contiguous multiword expressions. On the syntactic level, discontinuity is caused by phenomena such as extraposition, topicalization, and argument scrambling. Morphologically rich languages (MRLs) are particularly likely to exhibit such phenomena. Other examples include disfluency and anaphora/coreference resolution with discontinuous antecedents; modeling in both of the latter areas requires an extended domain of locality. On a higher level, discontinuity is a relevant factor in machine translation, as well as in complex question answering and in topic structure modeling.
Discontinuity has been studied intensively in a range of different areas, including but not limited to grammar development, syntactic and semantic parsing, morphological analysis, machine translation, anaphora resolution, discourse modeling, automatic summarization, and complex question answering. Nevertheless, the treatment of discontinuous structures remains a challenge. On the one hand, recovering non-local information is generally associated with a high computational cost. On the other hand, discontinuities are inherently a low-frequency phenomenon, which means that statistical approaches tend to analyze them incorrectly as more frequent local phenomena. Additionally, it is not always clear if and how NLP tasks can benefit from knowing about discontinuity, that is, why one should care, particularly considering the computational cost.

The goal of this workshop is to bring together researchers from the different areas, to give them a forum to exchange ideas and problem solutions, to create synergy effects, and to enable more powerful solutions. This encompasses not only linguistic analyses and work on analyzing or recovering the corresponding structures, e.g., in non-projective dependency parsing, but also studies on "use cases", which show how information about discontinuity can be used to enhance NLP tasks. We think that given the broad program we have put together, this goal has been more than fulfilled.

Thanks to all authors who have contributed their work! Out of ten submissions, seven were selected for presentation. We would also like to extend our gratitude to the program committee, who have dedicated their time and effort to make this workshop a high-quality event.

See you in San Diego!
Wolfgang Maier, Sandra Kübler, and Constantin Orăsan

Organizers:

Wolfgang Maier, University of Düsseldorf (Germany)
Sandra Kübler, Indiana University (USA)
Constantin Orăsan, University of Wolverhampton (UK)

Program Committee:

Anne Abeillé, University Paris 7 (France)
Krasimir Angelov, University of Gothenburg (Sweden)
Marianna Apidianaki, LIMSI (France)
Eric de la Clergerie, INRIA (France)
Andreas van Cranenburgh, Royal Netherlands Academy for Arts and Sciences (The Netherlands)
Joachim Daiber, University of Amsterdam (The Netherlands)
Carlos Gómez Rodríguez, University of A Coruña (Spain)
Eva Hasler, University of Cambridge (UK)
Mijail Kabadjov, University of Essex (UK)
Sylvain Kahane, University Paris 10 (France)
Laura Kallmeyer, University of Düsseldorf (Germany)
Philipp Koehn, University of Edinburgh (UK)
Johannes Leveling, Elsevier (The Netherlands)
Timm Lichte, University of Düsseldorf (Germany)
Peter Ljunglöf, University of Gothenburg (Sweden)
Georgiana Marsic, University of Wolverhampton (UK)
Detmar Meurers, University of Tübingen (Germany)
Jean-Luc Minel, Université Paris Ouest Nanterre La Défense (France)
Sara Moze, University of Wolverhampton (UK)
Philippe Muller, University of Toulouse/IRIT (France)
Preslav Nakov, Qatar Computing Research Institute (Qatar)
Mark-Jan Nederhof, University of St. Andrews (UK)
Yannick Parmentier, University of Orléans (France)
Ted Pedersen, University of Minnesota (USA)
Irene Renau, Pontificia Universidad Católica de Valparaíso (Chile)
Lonneke van der Plas, University of Malta (Malta)
Natalie Schluter, University of Copenhagen (Denmark)
Djamé Seddah, University Paris 4 (France)
Khalil Sima’an, University of Amsterdam (The Netherlands)
Yannick Versley, University of Heidelberg (Germany)
Suzan Veberne, University of Nijmegen (The Netherlands)
Andy Way, Dublin City University (Ireland)

Invited Speaker:

David Chiang, University of Notre Dame (USA)

Table of Contents

An LFG Account of Discontinuous Nominal Expressions
Liselotte Snijders

Non-projectivity and valency
Zdenka Uresova, Eva Fucikova and Jan Hajic

Machine Translation of Non-Contiguous Multiword Units
Anabela Barreiro and Fernando Batista

Discontinuous VP in Bulgarian
Elisaveta Balabanova

Discontinuous Genitives in Hindi/Urdu
Sebastian Sulger

Discontinuous parsing with continuous trees
Wolfgang Maier and Timm Lichte

Discontinuity Re^2-visited: A Minimalist Approach to Pseudoprojective Constituent Parsing
Yannick Versley

Workshop Program

Friday, June 17, 2016

9:30–10:00 An LFG Account of Discontinuous Nominal Expressions
Liselotte Snijders

10:00–10:30 Non-projectivity and valency
Zdenka Uresova, Eva Fucikova and Jan Hajic

10:30–11:00 Coffee break

11:00–12:15 Invited Talk: Finite automata for free word order languages
David Chiang

12:15–12:45 Machine Translation of Non-Contiguous Multiword Units
Anabela Barreiro and Fernando Batista

12:45–14:30 Lunch break

14:30–15:00 Discontinuous VP in Bulgarian
Elisaveta Balabanova

15:00–15:30 Discontinuous Genitives in Hindi/Urdu
Sebastian Sulger

16:00–16:30 Discontinuous parsing with continuous trees
Wolfgang Maier and Timm Lichte

16:30–17:00 Discontinuity Re^2-visited: A Minimalist Approach to Pseudoprojective Constituent Parsing
Yannick Versley

17:00–17:45 Panel discussion

An LFG Account of Discontinuous Nominal Expressions

Liselotte Snijders
Waseda University
Tokyo, Japan
[email protected]

Abstract

This paper presents an overview of an LFG treatment of discontinuous nominal expressions involving modification, making the claim that cross-linguistically different types of discontinuity (i.e. in Warlpiri and English) should be captured by the same overall analysis, despite being licensed in different ways. LFG’s separation of grammatical functions from phrase structural positions intuitively accounts for discontinuous expressions, and its use of glue semantics ensures that discontinuous and contiguous expressions receive the same semantic analysis.

1 Introduction

Discontinuity of nominal expressions, a phenomenon in which two or more parts of a semantic nominal unit are non-adjacent in phrase structure, is prevalent in languages traditionally classified as “non-configurational” (Hale, 1983), e.g. the Australian languages Warlpiri, Wambaya, Jaminjung (Simpson, 1991; Nordlinger, 1998; Schultze-Berndt and Simard, 2012), Latin (Devine and Stephens, 2000; Spevak, 2010), Ancient Greek (Devine and Stephens, 2006), and is also attested in a number of Slavic languages, e.g. Russian (Sekerina, 1997; Sekerina, 1999) and Polish (Siewierska, 1984). An example of nominal discontinuity from Warlpiri is shown in (1) (Simpson, 1991, p. 282):¹

(1) Kurdu-ngku  ka    wajilipi-nyi   wita-ngku
    child-ERG   PRES  chase-NONPAST  small-ERG
    ‘The small child is chasing it.’

In (1) a head noun is separated from a modifier, but both parts map to the same grammatical function (subject). The two parts of the discontinuous expression share the same case-marking. A similar type of discontinuity involving modification is attested in English, in the cases of relative clause extraposition in (2a) and NP-PP split in (2b) (Kirkwood, 1977, p. 55):²

(2) a. The man entered who I met yesterday.
    b. A number of stories soon appeared about Watergate.

A similar type of discontinuity is in fact also attested in Warlpiri (Hale, 1976, p. 78):³

(3) Ngajulu-rlu  rna  yankirri  pantu-rnu   kuja-lpa  ngapa      nga-rnu.
    I-ERG        AUX  emu.ABS   spear-PAST  COMP-AUX  water.ABS  drink-PAST
    ‘I speared the emu that was drinking water.’

¹ This type of Warlpiri example has another interpretation, which can be translated as ‘The child_i is chasing it and it_i is small’

² Another type of discontinuity in English involving modification is partial fronting, e.g. About Japan, the woman wrote many books; additional examples are discussed in Section 6.

³ Hale (1976) refers to this type of example as ‘adjoined relative clause’: it can also precede the sentence as a whole (somewhat like a hanging topic). It can also have a temporal reading:

