NAACL HLT 2016

Workshop on Discontinuous Structures in Natural Language Processing (DiscoNLP)

Proceedings of the Workshop

June 17, 2016
San Diego, California, USA

© 2016 The Association for Computational Linguistics

Order copies of this and other ACL proceedings from:

Association for Computational Linguistics (ACL) 209 N. Eighth Street Stroudsburg, PA 18360 USA Tel: +1-570-476-8006 Fax: +1-570-476-0860 [email protected]

ISBN 978-1-941643-85-3

Introduction

This volume contains the papers presented at the Workshop on Discontinuous Structures in Natural Language Processing, held in San Diego, California on June 17, 2016 during the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

The modeling of certain structures in natural language requires a mechanism for discontinuity, in the sense that we must account for two or more parts of the structure that are not adjacent. This is true across many languages and on different description levels. For instance, on the lexical level, this concerns discontinuous morphological phenomena such as transfixation (templatic morphology), as well as phrasal verbs and non-contiguous multiword expressions. On the syntactic level, discontinuity is caused by phenomena such as extraposition, topicalization, and argument scrambling. Morphologically rich languages (MRLs) are particularly likely to exhibit such phenomena. Other examples include disfluency and anaphora/coreference resolution with discontinuous antecedents; modeling in both of the latter areas requires an extended domain of locality. On a higher level, discontinuity is a relevant factor in machine translation, as well as in complex question answering and in topic structure modeling. Discontinuity has been studied intensively in a range of different areas, including but not limited to grammar development, syntactic and semantic parsing, morphological analysis, machine translation, anaphora resolution, discourse modeling, automatic summarization and complex question answering.

Nevertheless, the treatment of discontinuous structures remains a challenge. On the one hand, recovering non-local information is generally associated with a high computational cost. On the other hand, discontinuities are inherently a low-frequency phenomenon, which means that statistical approaches have a tendency to analyze them incorrectly as more frequent local phenomena. Additionally, it is not always clear if and how NLP tasks can benefit from knowing about discontinuity, that is, why one should care, particularly considering the given computational cost. The goal of this workshop is to bring together researchers from the different areas to give them a forum to exchange ideas and problem solutions, to create synergy effects, and to enable more powerful solutions. This encompasses not only linguistic analyses and work on analyzing or recovering the corresponding structures, for example in non-projective dependency parsing, but also studies on "use cases", which show how information about discontinuity can be used to enhance NLP tasks. We think that given the broad program we have put together, this goal has been more than fulfilled.

Thanks to all authors who have contributed their work! Out of ten submissions, seven were selected for presentation. We would also like to extend our gratitude to the program committee, who have dedicated their time and effort to make this workshop a high-quality event.

See you in San Diego!

Wolfgang Maier, Sandra Kübler, and Constantin Orăsan


Organizers:

Wolfgang Maier, University of Düsseldorf (Germany)
Sandra Kübler, Indiana University (USA)
Constantin Orăsan, University of Wolverhampton (UK)

Program Committee:

Anne Abeillé, University Paris 7 (France)
Krasimir Angelov, University of Gothenburg (Sweden)
Marianna Apidianaki, LIMSI (France)
Eric de la Clergerie, INRIA (France)
Andreas van Cranenburgh, Royal Netherlands Academy for Arts and Sciences (The Netherlands)
Joachim Daiber, University of Amsterdam (The Netherlands)
Carlos Gómez Rodríguez, University of A Coruña (Spain)
Eva Hasler, University of Cambridge (UK)
Mijail Kabadjov, University of Essex (UK)
Sylvain Kahane, University Paris 10 (France)
Laura Kallmeyer, University of Düsseldorf (Germany)
Philipp Koehn, University of Edinburgh (UK)
Johannes Leveling, Elsevier (The Netherlands)
Timm Lichte, University of Düsseldorf (Germany)
Peter Ljunglöf, University of Gothenburg (Sweden)
Georgiana Marsic, University of Wolverhampton (UK)
Detmar Meurers, University of Tübingen (Germany)
Jean-Luc Minel, Université Paris Ouest Nanterre La Défense (France)
Sara Moze, University of Wolverhampton (UK)
Philippe Muller, University of Toulouse/IRIT (France)
Preslav Nakov, Qatar Computing Research Institute (Qatar)
Mark-Jan Nederhof, University of St. Andrews (UK)
Yannick Parmentier, University of Orléans (France)
Ted Pedersen, University of Minnesota (USA)
Irene Renau, Pontificia Universidad Católica de Valparaíso (Chile)
Lonneke van der Plas, University of Malta (Malta)
Natalie Schluter, University of Copenhagen (Denmark)
Djamé Seddah, University Paris 4 (France)
Khalil Sima’an, University of Amsterdam (The Netherlands)
Yannick Versley, University of Heidelberg (Germany)
Suzan Veberne, University of Nijmegen (The Netherlands)
Andy Way, Dublin City University (Ireland)

Invited Speaker: David Chiang, University of Notre Dame (USA)


Table of Contents

An LFG Account of Discontinuous Nominal Expressions
Liselotte Snijders

Non-projectivity and valency
Zdenka Uresova, Eva Fucikova and Jan Hajic

Machine Translation of Non-Contiguous Multiword Units
Anabela Barreiro and Fernando Batista

Discontinuous VP in Bulgarian
Elisaveta Balabanova

Discontinuous Genitives in Hindi/Urdu
Sebastian Sulger

Discontinuous parsing with continuous trees
Wolfgang Maier and Timm Lichte

Discontinuity Re^2-visited: A Minimalist Approach to Pseudoprojective Constituent Parsing
Yannick Versley


Workshop Program

Friday, June 17, 2016

9:30–10:00   An LFG Account of Discontinuous Nominal Expressions
             Liselotte Snijders

10:00–10:30  Non-projectivity and valency
             Zdenka Uresova, Eva Fucikova and Jan Hajic

10:30–11:00  Coffee break

11:00–12:15  Invited Talk: Finite automata for free word order languages
             David Chiang

12:15–12:45  Machine Translation of Non-Contiguous Multiword Units
             Anabela Barreiro and Fernando Batista

12:45–14:30  Lunch break

14:30–15:00  Discontinuous VP in Bulgarian
             Elisaveta Balabanova

15:00–15:30  Discontinuous Genitives in Hindi/Urdu
             Sebastian Sulger

16:00–16:30  Discontinuous parsing with continuous trees
             Wolfgang Maier and Timm Lichte

16:30–17:00  Discontinuity Re^2-visited: A Minimalist Approach to Pseudoprojective Constituent Parsing
             Yannick Versley

17:00–17:45  Panel discussion


An LFG Account of Discontinuous Nominal Expressions

Liselotte Snijders Waseda University Tokyo, Japan [email protected]

Abstract

This paper presents an overview of an LFG treatment of discontinuous nominal expressions involving modification, making the claim that cross-linguistically different types of discontinuity (i.e. in Warlpiri and English) should be captured by the same overall analysis, despite being licensed in different ways. LFG’s separation of grammatical functions from phrase structural positions intuitively accounts for discontinuous expressions, and its use of glue semantics ensures that discontinuous and contiguous expressions receive the same semantic analysis.

1 Introduction

Discontinuity of nominal expressions, a phenomenon in which two or more parts of a semantic nominal unit are non-adjacent in phrase structure, is prevalent in languages traditionally classified as “non-configurational” (Hale, 1983), e.g. the Australian languages Warlpiri, Wambaya, Jaminjung (Simpson, 1991; Nordlinger, 1998; Schultze-Berndt and Simard, 2012), Latin (Devine and Stephens, 2000; Spevak, 2010), Ancient Greek (Devine and Stephens, 2006), and is also attested in a number of Slavic languages, e.g. Russian (Sekerina, 1997; Sekerina, 1999) and Polish (Siewierska, 1984). An example of nominal discontinuity from Warlpiri is shown in (1) (Simpson, 1991, p. 282):¹

(1) Kurdu-ngku  ka    wajilipi-nyi   wita-ngku
    child-ERG   PRES  chase-NONPAST  small-ERG
    ‘The small child is chasing it.’

In (1) a head noun is separated from a modifier, but both parts map to the same grammatical function (SUBJ). The two parts of the discontinuous expression share the same case-marking. A similar type of discontinuity involving modification is attested in English, in the cases of relative clause extraposition in (2a) and NP-PP split in (2b) (Kirkwood, 1977, p. 55):²

(2) a. The man entered who I met yesterday.
    b. A number of stories soon appeared about Watergate.

A similar type of discontinuity is in fact also attested in Warlpiri (Hale, 1976, p. 78):³

(3) Ngajulu-rlu  rna  yankirri  pantu-rnu   kuja-lpa  ngapa      nga-rnu.
    I-ERG        AUX  emu.ABS   spear-PAST  COMP-AUX  water.ABS  drink-PAST
    ‘I speared the emu that was drinking water.’

¹ This type of Warlpiri example has another interpretation, which can be translated as ‘The childᵢ is chasing it and itᵢ is small’ (Simpson, 1991). Based on Simpson’s work this appears to be secondary predication rather than discontinuity; therefore I only take the interpretation in (1) into account.

² Another type of discontinuity in English involving modification is partial fronting, e.g. About Japan, the woman wrote many books; additional examples are discussed in Section 6.

³ Hale (1976) refers to this type of example as ‘adjoined relative clause’: it can also precede the sentence as a whole (somewhat like a hanging topic). It can also have a temporal reading: ‘I speared the emu while it was drinking water’.


Proceedings of DiscoNLP 2016, pages 1–11, San Diego, California, June 17, 2016. © 2016 Association for Computational Linguistics

Discontinuity of nominal expressions, whether of the kind in which two words are marked with the same case (as in (1)), or of the kind in which a modifier of an argument is postposed to follow the clause (as in (2) and (3)), presents a challenge for syntactic theory, as it requires the two or more parts of syntactic information to be united in the semantics.

In this paper I illustrate how discontinuous nominal expressions can be accounted for within the Lexical-Functional Grammar (LFG) framework based on previous work. With its constraint-based nature and parallel architecture, LFG provides a straightforward way of handling discontinuity by allowing two or more separate parts of phrase structural information to map to the same functional structure, and thereby to the same semantic structure. To limit the scope of discussion, the focus of this paper is on nominal discontinuity involving modification specifically (i.e. a head and a modifier being separated), but Section 8 briefly addresses a different type of discontinuity in comparison. I make the claim that discontinuous nominal expressions (involving modification) in typologically different languages (i.e. in Warlpiri and English) are instances of the same phenomenon and therefore require the same analysis, despite being licensed by somewhat different phrase structure rules. I propose a definition that captures both types of discontinuity, and illustrate that LFG is capable of accounting for the different types in a straightforward fashion.

Overall I aim to illustrate LFG’s potential to contribute to an implementation of discontinuity in NLP systems, because of its straightforward account of discontinuity, which does not require any special mechanisms. Discontinuous data is more challenging for approaches which parse sentences based on linear ordering, such as dependency grammar and approaches relying on surface phrase structure configuration. LFG is computationally implementable and has been implemented in the XLE system (Crouch et al., 2011). This paper thereby contributes to potential enhancements of NLP tasks with regard to discontinuity.

2 Lexical-Functional Grammar

Lexical-Functional Grammar (LFG) (Kaplan and Bresnan, 1982) is a constraint-based syntactic framework which posits a parallel architecture, separating information about grammatical functions from phrase structural configuration. For this reason it is well-suited to account for languages with relatively free ordering of grammatical functions (e.g. Warlpiri). LFG posits two syntactic levels: constituent structure (c-structure) and functional structure (f-structure). C-structure encodes information about linear precedence, dominance relations and constituency, and is represented as a phrase structure tree. F-structure hosts information about the grammatical functions of the predicate of a sentence (including adjuncts), along with a range of morphosyntactic information such as case, number, tense and aspect. It is represented as an attribute-value matrix. C-structure nodes are locally annotated with information about grammatical functions and/or lexical information. Each c-structure node is associated with a particular f-structure, and a local annotation on the c-structure node ensures the mapping of the c-structure node to this f-structure via the φ function. Specifically, the local annotation on a particular c-structure node specifies the relation of the f-structure associated with this node to the f-structure associated with its mother node. An example of this mapping for the simple sentence John walked is shown in Figure 1.

In the annotations on c-structure, ↑ points to the f-structure of the mother node and ↓ points to the f-structure of the current node. The annotation ↑ = ↓ thus expresses that the f-structure of the mother node is the same as the f-structure of its daughter node, and the annotation (↑ SUBJ) = ↓ expresses that the daughter node maps to the subject of its mother’s f-structure. In English, this subject annotation is structurally associated with the NP node that is in the specifier of IP position (see Section 5). The grammatical functions SUBJ and OBJ (as well as a number of other functions assumed in LFG, such as OBJθ and OBLθ) are unique, e.g. a predicate can only have one subject which it subcategorizes for. This subcategorization is marked in the lexical annotation on walked in Figure 1, (↑ PRED) = ‘walk⟨SUBJ⟩’, which states that the predicate (PRED) value of this word is ‘walk’ and takes one argument, SUBJ. Adjuncts (ADJ) are not unique, but map to a set, by means of the annotation ↓ ∈ (↑ ADJ), to be discussed in more detail in the next section.

Figure 1: An illustration of c- to f-structure mapping.

  c-structure:
    IP
      NP  (↑ SUBJ) = ↓
        N  ↑ = ↓
          John  (↑ PRED) = ‘John’
      I′  ↑ = ↓
        VP  ↑ = ↓
          V  ↑ = ↓
            walked  (↑ PRED) = ‘walk⟨SUBJ⟩’, (↑ TENSE) = PAST

  f-structure:
    [ PRED   ‘walk⟨SUBJ⟩’
      TENSE  PAST
      SUBJ   [ PRED ‘John’ ] ]

3 Discontinuity in LFG

3.1 Previous Work

Discontinuous nominal expressions have been analyzed in LFG by Simpson (1991) (Warlpiri), Kuhn (1999; 2001) (German), Cavar and Seiss (2011) (New-Shtokavian) and are discussed by Snijders (2012; 2015).⁴ Simpson (1991) and Kuhn (1999; 2001) share a similar overall analysis in assuming that two parts of a discontinuous nominal expression map to the same f-structure. Here I focus on Simpson’s (1991) analysis. Simpson’s analysis for example (1), adapted to fit with her more recently proposed c-structure for Warlpiri (Simpson, 2007), is shown in Figure 2 (leaving out lexical annotations).⁵

Figure 2: C- to f-structure mapping of a discontinuous expression.

  c-structure:
    IP
      NP  (↑ SUBJ) = ↓
        N  ↑ = ↓
          kurdu-ngku  ‘child-ERG’
      I′  ↑ = ↓
        I  ↑ = ↓
          ka  ‘PRES’
        S  ↑ = ↓
          V  ↑ = ↓
            wajilipi-nyi  ‘chase-NONPAST’
          NP  (↑ SUBJ) = ↓
            N  ↓ ∈ (↑ ADJ)
              wita-ngku  ‘small-ERG’

  f-structure:
    [ PRED   ‘chase⟨SUBJ,OBJ⟩’
      TENSE  PRES
      SUBJ   [ PRED ‘child’
               CASE ERG
               ADJ  { [ PRED ‘small’
                        CASE ERG ] } ]
      OBJ    [ PRED ‘PRO’ ] ]

Crucially, the two parts of the discontinuous expression both map to the f-structure of the subject of the predicate. The annotation ↓ ∈ (↑ ADJ) on the N node states that this node (↓) maps to the set of adjuncts of the NP node (↑). Note that the OBJ, despite being absent in c-structure, is present in f-structure as the verb requires an object: its PRED value is ‘PRO’.

The overall f-structure for the contiguous example is the same, as shown by the annotations on the c-structure of the contiguous version of example (1), shown in Figure 3. Unlike in English, in the Warlpiri c-structures the subject annotation comes from the case-marking, not from the structural position.

Figure 3: C-structure for a contiguous example, mapping to the f-structure in Figure 2.

    IP
      NP  (↑ SUBJ) = ↓
        N  ↑ = ↓
          kurdu  ‘child’
        N  ↓ ∈ (↑ ADJ)
          wita-ngku  ‘small-ERG’
      I′  ↑ = ↓
        I  ↑ = ↓
          ka  ‘PRES’
        S  ↑ = ↓
          V  ↑ = ↓
            wajilipi-nyi  ‘chase-NONPAST’

The example with the discontinuous expression and the contiguous one thus have the same f-structure, which is mapped to the same semantic structure, as will be discussed in Section 7.

3.2 English Extraposition

Previous work in LFG does not discuss discontinuous nominal expressions in English, nor does it provide an account of how discontinuity in different languages is licensed in different ways. One type of English discontinuity, extraposition, will be addressed here, and I show that it requires the same treatment as the discontinuity with two case-marked nominals as found in Warlpiri, providing a cross-linguistic definition of discontinuous nominal expressions in Section 4. I show that any cross-linguistic differences are due to differences in c-structure rules (phrase structure rules).

Discontinuity involving extraposition and discontinuity involving two words with the same case-marking have in common the fact that two parts of the same grammatical function are not adjacent in phrase structure. In the instance of relative clause extraposition in English, I propose to represent the extraposed clause by means of an adjoined CP clause.⁶ The structure I propose for (2a) is shown in Figure 4 with partial annotations. In the case of (2b) we would instead have an adjoined PP.

Figure 4: C-structure for English relative clause extraposition.

    IP
      IP  ↑ = ↓
        NP  (↑ SUBJ) = ↓
          D
            the
          N′
            N
              man
        I′  ↑ = ↓
          VP
            V
              entered
      CP  ↓ ∈ (↑ SUBJ ADJ)
        who I met yesterday

I note a few differences between the type of discontinuity with two case-marked nominals (Figure 2) and the English extraposition one (Figure 4). Linguistically, a difference is the type of categories that may be separated from each other, i.e. in Warlpiri two nominals, while in English an NP and a CP or PP. A second point of linguistic variation is the position that the two or more parts of the discontinuous expression may appear in. In English this is restricted, shown by the unacceptability of *A number of stories soon about Watergate appeared. In the case of Warlpiri discontinuity in which the subparts have the same case-marking, the placement of the subparts is much freer, reflecting Warlpiri’s property of free placement of grammatical functions.⁷

Another difference between the discontinuous expression in Warlpiri in Figure 2 and the English one in Figure 4 is the type of annotation on the modifier: in the Warlpiri case both head and modifier map to the same f-structure directly (subject of the predicate by means of the annotation (↑ SUBJ) = ↓). In the English case, the modifier maps to the f-structure of the adjunct of the SUBJ, whereas the head maps to the overall f-structure of the SUBJ. The annotations are thus somewhat different, but the end result for both examples is the same: both head and modifier are contained within the f-structure of the subject. This is an important observation, as we would have the same f-structure mapping even with different annotations. For example, in the Warlpiri c-structure in Figure 2, the annotations on the two NPs are both (↑ SUBJ) = ↓, both mapping to the same f-structure, but we could also imagine a set of annotations where the adjunct has the annotation ↓ ∈ (↑ SUBJ ADJ). A definition of discontinuity needs to abstract away from this variation in annotation, which is reflected in the definition proposed in the following section.

⁴ There is other work on long-distance dependencies in LFG (e.g. Kaplan and Zaenen (1989), Clément et al. (2002) among others), but this literature discusses cases of arguments appearing outside the clause that they are part of (e.g. wh-fronting out of embedded clauses). This is a different type of long-distance dependency than the discontinuous nominal expressions discussed here, as the latter case involves two parts of the same grammatical function being separated.

⁵ It is generally assumed that adjective-like elements in Warlpiri are of the N category (Hale, 1983; Simpson, 1991; Hale et al., 1995).

⁶ The CP does not form a constituent with the VP, as preposing of both is ruled out: *‘Entered who I saw yesterday, the man’. For this reason adjunction to IP is appropriate.

⁷ However, like word order, discontinuity is not random, but is triggered by information structure, as discussed by De Jong (1986) (Latin), Cavar and Seiss (2011) (New-Shtokavian) and Schultze-Berndt and Simard (2012) (Jaminjung). A full discussion of the information structure of discontinuous expressions is beyond the scope of this paper. Also, recall that Warlpiri does appear to have a type of extraposition as shown in (3), which seems more restricted in its placement than the type of discontinuity involving case-marked nominals.
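The observation that word order and the exact shape of the annotations do not affect the resulting f-structure can be made concrete with a small Python sketch. It rests on simplifying assumptions that are not part of the paper's analysis: each word is paired with the composed path from the clause f-structure to the f-structure it contributes to (so both the Warlpiri N under a SUBJ-annotated NP and the English CP with ↓ ∈ (↑ SUBJ ADJ) receive the path SUBJ ADJ), CASE features and the OBJ ‘PRO’ are left out, and the relative clause is reduced to a single PRED value. The function name `build_fs` and the data layout are hypothetical.

```python
def build_fs(words):
    """Build a clause f-structure (nested dict) from (form, path, features) triples.

    A path is a tuple of attribute names read from the clause f-structure
    downwards; an "ADJ" step adds a new member to the adjunct set.
    """
    clause = {}
    for _form, path, feats in words:
        fs = clause
        for attr in path:
            if attr == "ADJ":                # set-valued: add a new member
                member = {}
                fs.setdefault("ADJ", []).append(member)
                fs = member
            else:                            # unique function such as SUBJ
                fs = fs.setdefault(attr, {})
        for key, value in feats.items():
            if fs.setdefault(key, value) != value:
                raise ValueError(f"conflicting values for {key}")
    return clause


# Example (1): head and modifier are separated by the auxiliary and the verb.
discontinuous = [
    ("kurdu-ngku", ("SUBJ",), {"PRED": "child"}),
    ("ka", (), {"TENSE": "PRES"}),
    ("wajilipi-nyi", (), {"PRED": "chase<SUBJ,OBJ>"}),
    ("wita-ngku", ("SUBJ", "ADJ"), {"PRED": "small"}),
]

# Contiguous version (Figure 3): head and modifier are adjacent.
contiguous = [
    ("kurdu", ("SUBJ",), {"PRED": "child"}),
    ("wita-ngku", ("SUBJ", "ADJ"), {"PRED": "small"}),
    ("ka", (), {"TENSE": "PRES"}),
    ("wajilipi-nyi", (), {"PRED": "chase<SUBJ,OBJ>"}),
]

# Example (2a): the extraposed clause is annotated with the path SUBJ ADJ.
english = [
    ("the man", ("SUBJ",), {"PRED": "man"}),
    ("entered", (), {"PRED": "enter<SUBJ>"}),
    ("who I met yesterday", ("SUBJ", "ADJ"), {"PRED": "meet<SUBJ,OBJ>"}),
]

print(build_fs(discontinuous) == build_fs(contiguous))
print(build_fs(english)["SUBJ"])
```

In all three cases the modifier ends up inside the f-structure of the subject, which is the property that the definition in the next section relies on.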

4 Definition of Discontinuity in LFG

In order to capture nominal discontinuity in a more formal way, I propose the following definition:⁸

(4) Nominal discontinuous expressions:
    Given two c-structure constituents X and Y, X ≠ Y, {X, Y} form a discontinuous nominal expression iff:
    i.   Neither X nor Y dominates the other; and
    ii.  X and Y map to the f-structure or sub-f-structure of the same grammatical function; and
    iii. The yield of X is not string adjacent to the yield of Y; and
    iv.  The constituent(s) that intervene(s) between X and Y do(es) not map to the f-structure or any sub-f-structure of the grammatical function that X and Y map to.

The key here is that the two parts of a discontinuous expression both map to the f-structure of the same grammatical function, or to an f-structure that is contained within the f-structure of this grammatical function. Consider the partial structures shown in Figure 5, none of which fulfills all conditions of definition (4).

[Figure 5: Structures not fulfilling the conditions of definition (4); panels (a)–(c).]

[Figure 6: Instances of discontinuous nominal expressions; panels (a)–(b).]

Condition (i) rules out structure (a) in Figure 5 as being discontinuous; X and Y here map to the same f-structure, but X dominates Y. Condition (ii) rules out the structure in Figure 5 (b) as an instance of discontinuity. In this structure the yield of X is not string adjacent to the yield of Y (in other words, the edges of X and Y do not coincide), but X and Y do not map to the f-structure (or sub-f-structure) of the same grammatical function. Condition (iii) ensures that there is an intervening element, and rules out the structure in Figure 5 (c) as an instance of discontinuity.

The structure in Figure 6 (a) fulfills all conditions as listed in (4), including condition (iv). Finally, according to condition (ii) in (4), the structure in Figure 6 (b) is also a case of discontinuity.

5 Constraining Discontinuity

The definition of nominal discontinuity in (4) covers both the type of nominal discontinuity attested with two case-marked nominals (one of which modifies the other) and the type with an extraposed modifier clause. Their analysis is very similar in terms of their c- to f-structure mapping, but the way in which the two different types are licensed is somewhat different. The case-marked nominal type of discontinuity is made possible by the assumption in LFG that in languages like Warlpiri grammatical functions are assigned lexically and not by phrase structure configuration (Dalrymple, 2001; Bresnan et al., 2016). Free assignment of grammatical functions enables the existence of discontinuous expressions as in (1).⁹ Cases of English extraposition are more constrained as grammatical functions are generally assumed to be assigned configurationally.

In LFG c-structure is licensed and constrained by c-structure rules (phrase structure rules), assumed to be static constraints on c-structure. In English, a rule ensuring that the subject appears pre-verbally and VP-externally is as follows:¹⁰

(5) IP  →  NP              I′
           (↑ SUBJ) = ↓    ↑ = ↓

The annotation on the NP ensures that if the NP is present, it obligatorily hosts the subject.¹¹ The rule in (5) partly licenses the c-structure in Figure 1. In Warlpiri, with no obligatory annotations for any grammatical function, the c-structure rules are less constrained in this dimension. An example is the IP rule partly licensing the c-structures in Figures 2 and 3, where GF = ‘grammatical function’:¹²

(6) GF ≡ { SUBJ | OBJ }
    IP  →  { NP            |  V     }    I′
             (↑ GF) = ↓       ↑ = ↓      ↑ = ↓

The node preceding I′ ranges over NP and V, as either an NP or a V may appear preceding the AUX constituent (which is in I position), as long as it bears a focus function. The annotations on NP and V are different. Relevant for the current discussion is the annotation on the NP: it is unspecified for its grammatical function (in this case, it ranges over SUBJ and OBJ, but this set can be extended depending on the attested data). Assuming that all NPs in Warlpiri’s c-structure rules have the same unspecified annotation (↑ GF) = ↓, it is possible for an annotation for the same GF to appear on two NPs, enabling discontinuity (under the assumption that in Warlpiri syntactic heads are optional). The actual annotation on the NPs is determined lexically, by case-marking.

In English the situation is somewhat different. Discontinuity of the kind shown in (2) and Figure 4 is licensed by the rule in (5) and a rule like the one in (7) for adjunction of CP or PP to IP:

(7) XP ≡ { CP | PP }
    IP  →  IP       XP
           ↑ = ↓    ↓ ∈ (↑ GF ADJ)

The annotation on the XP, ↓ ∈ (↑ GF ADJ), maps to the adjunct set of a GF function. Note that this GF is not restricted to SUBJ, as we can have examples like Mary mentioned the claim yesterday that John is intelligent. Despite configurational assignment of GFs in English, there is some nonspecificity in annotation here. It appears that Warlpiri adjoined relative clauses (as in (3)) can be licensed by a similar rule with a similar annotation on an extraposed CP (but with the difference that the clause can appear on either side of the main IP). A more in-depth investigation of the data is needed to posit a specific rule like this, but a generalized rule of extraposition seems plausible and would be promising for a uniform approach to this type of discontinuity. Discontinuity involving case-marked nominals and discontinuity involving extraposition have somewhat different underlying mechanisms. We can say that different restrictions on annotations on c-structure rules in the different constructions (and languages) lead to very similar outcomes, namely that two parts of c-structure which are non-adjacent can map to the f-structure or sub-f-structure of the same grammatical function.

⁸ One reviewer points out that these conditions (especially (i), (iii) and (iv)) are not confined to nominals. However, this paper restricts itself to nominal expressions (involving modification); extending the definition to other kinds of discontinuity is briefly addressed in Section 8.

⁹ Again, this does not mean that discontinuity is unconstrained: it appears constrained by information structure. Moreover, there are languages which like Warlpiri have free assignment of grammatical functions, but which lack discontinuous nominal expressions in which two words with the same case-marking are separated. We can say that Warlpiri has head optionality, allowing for a modifier to appear without its head nominal.

¹⁰ The subject can also appear in Spec,CP position, for example when it is a wh-word.

¹¹ I note ‘if the NP is present’, as LFG adheres to the principle of Economy of Expression, which states that all phrase structure nodes are optional, unless required by independent principles (Bresnan et al., 2016). An example of an independent principle is satisfaction of subcategorization requirements, e.g. if the subject is expressed elsewhere in the c-structure (e.g. in Spec,CP) then the NP node in (5) is absent.

¹² The reason for assuming an IP in Warlpiri (following Austin and Bresnan (1996), Simpson (2007)) is the set positions of two types of constituents. The first is the verb-like constituent referred to as AUX (‘auxiliary’) in the Warlpiri literature, like ka (glossed ‘PRES’) in (1), assumed to appear in I position. The second is the constituent immediately preceding the AUX, which Simpson (2007) assumes to always have a focus discourse function (similarly, she assumes that Spec,CP always hosts a topic function).

¹³ I thank the anonymous reviewer for suggesting to include these examples in the paper.

6 More complex cases

There are other types of more complex cases of nominal discontinuity involving modification in English, namely extraposition with embedding (in (8a)) (Müller, 2016, p. 443), extraction out of complex NPs (in (8b)) and secondary predication (in (8c)):¹³

(8) a. Many proofs of the theorem appeared that I wanted to think about.

6 b. Who did they take pictures of? pair of sentences with or without a discontinuous ex- c. She watched him naked. pression have the same f-structure, as shown above. Semantics in LFG is represented on the level of Extraposition with embedding ((8a)) is a more s-structure. Following Dalrymple and Nikolaeva complex version of example (2). The extraposed (2011, p. 90), I assume that f-structure is mapped clause can be assumed to either modify the head directly to s-structure via the σ function. The di- many proofs or the modifier of the theorem. In the rect mapping ensures that sentences with the same f- first case, rule (7) applies, with GF = SUBJ. For the structure will receive the same semantics. A discon- latter case we need an extension of rule (7) to ensure tinuous expression and a contiguous expression will that the relative clause can map to the adjunct set of therefore have the same semantics, as also pointed the adjunct of the head, which can be achieved by out by Dalrymple (2001). This is achieved by glue an additional possible annotation on the XP in (7) of semantics (Dalrymple et al., 1993; Dalrymple, 1999; the form (ADJ ADJ GF ). Dalrymple, 2001), the linguistic theory of semantic ↓∈ ∈ ↑ As for the type of discontinuity involving extrac- composition commonly used in LFG, which relies tion from complex NPs ((8b)), this is captured by on linear logic. Glue semantics associates meaning the definition of discontinuity in (4) if we assume constructors, instructions on how to combine mean- that the first part (who) maps to the adjunct set of ings to form the meaning of the sentences, with lexi- the object, and that pictures of maps to the object. cal items (or in some cases with phrase structural po- The preposition of can by itself map to the adjunct sitions). 
Semantic composition is therefore largely set of the object as well, meaning that both who and separate from c-structure constituency, which is es- of are part of the adjunct set.14 Therefore this type pecially beneficial for the purpose of accounting for of example does not contradict the generalizations discontinuous nominal expressions, as we want the proposed. same semantic analysis for two different c-structural The example of secondary predication in (8c) is configurations. For example, in the Warlpiri exam- not covered by the definition of nominal discontinu- ple in Figure 2, the two subparts of the discontinuous ity in (4), as there is no intervening material between expression each contribute their own meaning con- him and naked. This construction appears very simi- structors. Before looking at these, consider the f- to lar to relative clause extraposition in the sense that s-structure mapping for the SUBJ (the discontinuous a modifier follows the sentence as a whole. Un- expression) of example (1): der the current approach this is not assumed to be a case of discontinuity, even if the two adjacent words PRED ‘child’ (9) s him and naked do not form a syntactic constituent. I ADJ PRED ‘small’  leave this issue open for discussion.     h i  An approach to discontinuity following the defi- VAR v s nition in (4) thus covers most cases, but in an imple- σ h i RESTR r mentation of this approach, one might need to con- λ.X.small(Xh) ichild(X): v r sider specific constructions individually to achieve  ∧  ⊸ accurate results. The f- to s-structure mapping of the head by itself 7 Mapping to Semantics is very similar: For completeness, I discuss how discontinuous nom- (10) PRED ‘child’ h i inal expressions involving modification can be ana- VAR v lyzed semantically. 
The c- to f-structure mapping  h i in LFG via the φ function ensures that a minimal RESTR r   14However, this mapping does not ensure that who and of end λX.child(Xh): iv ⊸ r up as part of the same f-structure in this set; this issue will be addressed in Section 8. In (9), the subject’s f-structure is labelled s, and its s-structure is labelled sσ. The s-structure sσ has two

attributes: VAR, which has as its value an s-structure labelled v, and RESTR, which has as its value an s-structure labelled r. The attribute VAR represents a variable of type e, and RESTR is of type t and represents a restriction on the variable of type e. For (9) the restriction would state that the variable must range over individuals that are both children and small. For (10) the only restriction is that the variable must range over individuals that are children. The notation ⊸, the linear implication symbol of linear logic, signifies that if there is an attribute VAR (v) in the s-structure (↑σ) then there is also an attribute RESTR (r) in that same s-structure.

The s-structure in (9) of the subject of example (1) comes from the lexical entries of the two words of the discontinuous expression, with the one entry restricting the other. These lexical entries are shown in Figure 7, leaving out case marking. The lexical entry of the head, kurdu (‘child’), states that kurdu provides a value for the PRED attribute in the f-structure, namely child. The second part of the lexical entry of kurdu makes a statement about the mapping to s-structure (signaled by the use of ↑σ mapping to s-structure). It states the restriction on the variable VAR: it must range over individuals that are children. The lexical specification of the modifier wita is somewhat more complicated. Here (ADJ ∈ ↑) refers to the f-structure of which ↑ (the adjunct) is a member (the set of adjuncts, or modifiers), (ADJ ∈ ↑)σ refers to the s-structure corresponding to that f-structure and ((ADJ ∈ ↑)σ VAR) refers to the value of the VAR attribute of that s-structure. Likewise, ((ADJ ∈ ↑)σ RESTR) refers to the value of the RESTR attribute of the s-structure (ADJ ∈ ↑)σ. Referring to these s-structures with the labels v and r, as shown in the examples in (9) and (10), the meaning constructor premises for the two individual words of the discontinuous expression in example (1) are shown in Figure 8, with the meaning constructors in bold and brackets (with the glue semantics side on the right).

kurdu   (↑ PRED) = ‘child’
        λX.child(X) : (↑σ VAR) ⊸ (↑σ RESTR)
wita    (↑ PRED) = ‘small’
        λP.λX.small(X) ∧ P(X) :
        [((ADJ ∈ ↑)σ VAR) ⊸ ((ADJ ∈ ↑)σ RESTR)] ⊸ [((ADJ ∈ ↑)σ VAR) ⊸ ((ADJ ∈ ↑)σ RESTR)]
Figure 7: Lexical entries for kurdu and wita.

[child]   λX.child(X) : v ⊸ r
[small]   λP.λX.small(X) ∧ P(X) : [v ⊸ r] ⊸ [v ⊸ r]
Figure 8: Meaning constructors for kurdu and wita.

From the meaning constructors for the two individual words, one can deduce the meaning of the overall expression as shown in Figure 9. The meaning constructor for the modifier consumes the contribution of the noun (v ⊸ r), and thereby provides a new meaning, also associated with v ⊸ r. Without providing a full overview of glue semantics and its use in LFG, this section has shown that by associating meaning constructors directly with lexical items and not with phrase structural positions, one can have the same semantic derivation for both contiguous and discontinuous nominal expressions.

λX.child(X) : v ⊸ r        λP.λX.small(X) ∧ P(X) : [v ⊸ r] ⊸ [v ⊸ r]
---------------------------------------------------------------------
λX.small(X) ∧ child(X) : v ⊸ r
Figure 9: Deduction of the meaning of the discontinuous expression.

8 Remaining Issues

There are a few remaining issues left to be resolved with regard to this approach to discontinuity. First, one outstanding technical issue that was brought up by Snijders (2012) is the problematic account of discontinuous adjuncts. Discontinuous adjuncts are found for example in Latin (Bolkestein, 2001, p. 255). In c-structure both parts of a discontinuous adjunct will be marked with the annotation ↓ ∈ (↑ ADJ), to ensure that both parts map to the adjunct set of the predicate. However, unlike SUBJ or OBJ, ADJ is not a unique grammatical function. This is apparent from the set notation. Any NP annotated with the adjunct annotation will map to the adjunct set, and in principle each NP will form its own f-structure in this set. There is no clear way to distinguish between the case in which two nominals (with the same case, number, gender, assuming that this is a constraining factor for discontinuity) form two separate f-structure adjuncts (separate elements of the adjunct set) or are part of the same adjunct f-structure. The only constraining factor in this is ‘PRED clash’: grammatical functions (including ADJ)

8 IP 10. In this c-structure the two parts of the phrasal verb both map to the same f-structure, namely the NP I′ overarching f-structure of the sentence (shown by ( SUBJ)= = ↑ ↓ ↑ ↓ = ). Aiming for a general definition of disconti- ↑ ↓ N nuity, this makes it seem appropriate to change con- = I VP ↑ ↓ = = dition (ii) in the definition of discontinuity in (4) to ↑ ↓ ↑ ↓ er read ‘X and Y map to the same f-structure or its gab V′ sub-f-structure’, with no mention of a grammatical = ↑ ↓ function. However, this rephrasing makes inaccu- NP V0 rate predictions, because if we refer to the highest ( OBJ)= = f-structure level (the one of the whole sentence) and ↑ ↓ ↑ ↓ ˆ its sub-f-structures, we refer to all of the f-structures den Kampf P 16 = contained in the sentence. Making reference to all ↑ ↓ f-structures in the definition of discontinuity makes auf it impossible to make any point about discontinuity. Figure 10: C-structure for example (12a). One solution is to change the phrasing of condition may only have one PRED value: (ii) to ‘X and Y map to the same f-structure’, but under this condition the English structure in Figure (11) PRED ‘VERB SUBJ ’ h i 4 would not be an instance of a discontinuous ex- SUBJ PRED ‘SUBJ’  pression, at least not with the annotations as shown.  h i  However, we want the definition to cover both extra- ADJ PRED ‘ADJ1’ , PRED ‘ADJ2’    position and discontinuity like example (1), as ulti-  h i h i   mately both are somewhat different instances of the PRED clash ensures that two adjuncts which both same mechanism. contribute a PRED value do not unify in f-structure. Nonetheless, there is no straightforward way to dis- 9 Conclusion tinguish between the two cases just described, and to ensure that two parts of a discontinuous adjunct This paper has illustrated an LFG approach to dis- map to the same f-structure (or sub-f-structure). 
continuous nominal expressions involving modifica- A second issue is that the definition in (4) cov- tion, i.e. by letting two (or more) c-structural con- ers discontinuous nominal expressions, but it does stituents map to the same (sub-)f-structure of a spe- not encompass other types of discontinuity, such as cific grammatical function. In this I follow Simpson the one created by ‘particle verbs’ (Ackerman, 1983; (1991), Kuhn (1999; 2001), while making the ex- Pi˜n´on, 1992; L¨udeling, 2001; Booij, 2002; Toivo- plicit claim that different types of discontinuity (e.g. nen, 2003; Forst et al., 2010).15 Consider an exam- two constituents with the same case-marking or the ple from German (Forst et al., 2010, p. 229): case of extraposed XPs) should be captured by the same overall analysis, despite being licensed in dif- (12) Er gab den Kampf auf. ferent ways. Crucially, discontinuous expressions he gave the.ACC fight up and contiguous expressions receive the same map- ‘He gave up the fight.’ ping to semantics, enabled by glue semantics’ asso- I follow Toivonen’s (2003) analysis of verbal par- ciation of meaning constructors with lexical items, ticles as non-projecting words, marked as P.ˆ Exam- not phrase structure. This paper has thereby aimed ple (12) then has the c-structure as shown in Figure to illustrate LFG’s potential in contributing to a po- tential implementation of discontinuity in NLP sys- 15 Also, discontinuous nominal expressions not involving tems. modification but rather with a separated determiner and a noun, as found for example in Latin (Devine and Stephens, 2006, 16One reviewer suggests referring to immediate sub-f- p. 524), have not been discussed. The semantic mapping for this structures only, but the immediate sub-f-structures of the sen- will be different than the mapping for modification described in tence’s overarching f-structure include those of the arguments Section 7. of the predicate, and thereby of the sentence as a whole.

9 Acknowledgments Mary Dalrymple. 2001. Lexical-Functional Grammar, volume 34 of and Semantics. Academic Press, I gratefully acknowledge the JSPS for funding this San Diego; London. work. I also thank Jelke Bloem, Mary Dalrymple, Andrew M. Devine and Laurence D. Stephens. 2000. Ryo Otoguro, Marjolein Poortvliet and the anony- Discontinuous syntax: hyperbaton in Greek. Oxford mous reviewers for their useful suggestions. University Press, Oxford/New York. Andrew M. Devine and Laurence D. Stephens. 2006. Latin Word Order. Oxford University Press, Ox- References ford/New York. Martin Forst, Tracy Holloway King, and Tibor Laczk´o. Farrell Ackerman. 1983. Miscreant Morphemes: 2010. Particle Verbs in Computational LFGs: Is- Phrasal Predicates in Ugric. Ph.D. thesis, UC Berke- sues from English, German and Hungarian. In Miriam ley. Butt and Tracy Holloway King, editors, Proceedings Peter Austin and Joan Bresnan. 1996. Non- of the LFG10 Conference, pages 228–248. CSLI Pub- configurationality in Australian Aboriginal languages. lications. Natural Language and Linguistic Theory , 14(2):215– Kenneth L. Hale, Mary Laughren, and Jane Simpson. 268. 1995. Warlpiri syntax. In Joachim Jacobs, Arnim A. Machteld Bolkestein. 2001. Random Scrambling? von Stechow, Wolfgang Sternefeld, and Theo Venne- Constraints on Discontinuity in Latin Noun Phrases. mann, editors, Syntax. Ein internationales Handbuch In C. Moussy, editor, De lingua Latina novae quaes- zeitgenossischer¨ Forschung: An International Hand- tiones, pages 245–258. Peeters, Louvain. book of Contemporary Research, pages 1430–51. Wal- Geert Booij. 2002. Separable Complex Verbs in Dutch: ter de Gruyter, Berlin New York. A Case of Periphrastic Word Formation. In Nicole Kenneth L. Hale. 1976. The adjoined relative clause in Deh´e, Ray Jackendoff, Andrew McIntyre, and Silke Australia. In R.M.W. Dixon, editor, Grammatical cat- Urban, editors, Verb-Particle Explorations, pages 21– egories in Australian languages, pages 78–105. 
Aus- 42. Mouton de Gruyter, Berlin. tralian Institute of Aboriginal Studies, Canberra. Joan Bresnan, Ash Asudeh, Ida Toivonen, and Stephen Kenneth L. Hale. 1983. Warlpiri and the Grammar of Wechsler. 2016. Lexical-Functional Syntax. Wiley Non-Configurational Languages. Natural Language Blackwell, Chichester, West Sussex. and Linguistic Theory, 1.1:5–47. Damir Cavar and Melanie Seiss. 2011. Clitic Placement, Jan R. De Jong. 1986. Hyperbaton en informatiestruk- Syntactic Discontinuity and Information Structure. In tuur. Lampas, pages 323–331. Miriam Butt and Tracy Holloway King, editors, Pro- Ronald M. Kaplan and Joan Bresnan. 1982. Lexical- ceedings of the LFG11 Conference, pages 131–151. Functional Grammar: A Formal System for Grammat- CSLI Publications. ical Representation. In Joan Bresnan, editor, The Men- Lionel Cl´ement, Kim Gerdes, and Sylvain Kahane. 2002. tal Representation of Grammatical Relations, pages An LFG-Type Grammar for German Based on the Ty- 173–281. MIT Press, Cambridge, MA. pological Model. In M. Butt and T.H. King, editors, Ronald M. Kaplan and Annie Zaenen. 1989. Long- Proceedings of the LFG02 Conference, pages 116– Distance Dependencies, Constituent Structure and 129. CSLI Publications. Functional Uncertainty. In Mark Baltin and Anthony Richard Crouch, Mary Dalrymple, Ronald M. Kaplan, Kroch, editors, Alternative Conceptions of Phrase Tracy King, III Maxwell, John T., and Paula Newman. Structure, pages 17–42. University of Chicago Press, 2011. XLE Documentation. Palo Alto Research Cen- Chicago, IL. Reprinted in M. Dalrymple, R.M. Ka- ter (PARC), Palo Alto, CA. plan, J.T. Maxwell III and A. Zaenen, editors, Formal Mary Dalrymple and Irina Nikolaeva. 2011. Objects and Issues in Lexical-Functional Grammar, pages 137- Information Structure. Cambridge Studies in Linguis- 165, CSLI Publications, Stanford University, 1995. tics. Cambridge University Press, Cambridge. H.W. Kirkwood. 1977. 
Discontinuous Noun Phrases in Mary Dalrymple, John Lamping, and Vijay Saraswat. Existential Sentences in English and German. Journal 1993. LFG semantics via constraints. In Proceed- of Linguistics, 13(1):53–66. ings of the 6th Meeting of the EACL, pages 97–105, Jonas Kuhn. 1999. The syntax and semantics of split Utrecht. European Association for Computational Lin- NPs in LFG. In Francis Corblin, Carmen Dobrovie- guistics. Sorin, and Jean-Marie Marandin, editors, Selected Mary Dalrymple, editor. 1999. Semantics and Syntax papers from the Colloque de Syntaxe et Semantique´ in Lexical Functional Grammar: The Resource Logic a` Paris (CSSP 1997), pages 145–166. Thesus, The Approach. MIT Press, Cambridge, MA. Hague.

10 Jonas Kuhn. 2001. Resource Sensitivity in the Syntax- Ida Toivonen. 2003. Non-Projecting Words: A Case Semantics interface and the German Split NP Con- Study of Swedish Particles. Studies in Natural Lan- struction. In T. Kiss and D. Meurers, editors, guage and Linguistic Theory. Kluwer Academic Pub- Constraint-Based Approaches to Germanic Syntax. lishers, Dordrecht. CSLI Publications, Stanford, CA. Anke L¨udeling. 2001. On Particle Verbs and Similar Constructions in German. CSLI Publications, Stan- ford, CA. Stefan M¨uller. 2016. Grammatical Theory: From trans- formational grammar to constraint-based approaches. Textbooks in Language Sciences 1. Language Science Press, Berlin. Rachel Nordlinger. 1998. Constructive Case: Evidence from Australian Languages. CSLI Publications, Stan- ford, CA. Revised version of 1997 Stanford University dissertation. Chris Pi˜n´on. 1992. The Preverb Problem in German and Hungarian. In Proceedings of BLS, pages 395–408. Eva Schultze-Berndt and Candide Simard. 2012. Con- straints on noun phrase discontinuity in an Australian language: The role of prosody and information struc- ture. Linguistics, 50:1015–1058. Irina Sekerina. 1997. The Syntax and Processing of Split Scrambling Constituents in Russian. Ph.D. the- sis, SUNY Graduate School and University Center. Irina Sekerina. 1999. The Scrambling Complexity Hypothesis and Processing of Split Scrambling Con- stituents in Russian. Journal of Slavic Linguistics, 7(2):265–304. Anna Siewierska. 1984. Phrasal Discontinuity in Polish. Australian Journal of Linguistics, 4:57–71. Jane Simpson. 1991. Warlpiri Morpho-Syntax: a Lexi- calist Approach, volume 23 of Studies in Natural Lan- guage and Linguistic Theory. Kluwer Academic Pub- lishers, Dordrecht. Jane Simpson. 2007. Expressing Pragmatic Constraints on Word Order. In Annie Zaenen, Jane Simpson, Tracy Holloway King, Jane Grimshaw, Joan Maling, and Chris Manning, editors, Architectures, Rules, and Preferences: Variations on Themes by Joan W. 
Bres- nan, pages 403–427.CSLI Publications, Stanford, CA. Liselotte Snijders. 2012. Issues Concerning Constraints on Discontinuous NPs in Latin. In Miriam Butt and Tracy Holloway King, editors, Proceedings of the LFG12 Conference, pages 565–581. CSLI Publica- tions. Liselotte Snijders. 2015. The Nature of Configurational- ity in LFG. Ph.D. thesis, University of Oxford. Olga Spevak. 2010. Constituent Order in Classical Latin Prose. Studies in Language Companion Se- ries. John Benjamins Publishing Company, Amster- dam/Philadelphia.

Non-projectivity and valency

Zdenka Uresova and Eva Fucikova and Jan Hajic
Faculty of Mathematics and Physics, Charles University in Prague
Institute of Formal and Applied Linguistics
Malostranske nam. 25
11800 Prague 1, Czech Republic
{uresova,fucikova,hajic}@ufal.mff.cuni.cz

Proceedings of DiscoNLP 2016, pages 12–21, San Diego, California, June 17, 2016. © 2016 Association for Computational Linguistics

Abstract

We describe the results of an investigation of a specific type of discontinuous construction, namely non-projective constructions concerning verbs and their arguments. This topic is especially important for languages with a relatively free word order, such as Czech, which is the language we have primarily worked with. For comparison, we have included some results for English. The corpora used for both languages are the Prague Czech-English Dependency Treebank and the Prague Dependency Treebank, which are both annotated at a dependency syntax level as well as a deep (semantic) level, including verbs and their valency (arguments). We are using traditionally defined non-projectivity on trees with full linear ordering, but the two levels of annotation are innovatively combined to determine if a particular (deep) verb-argument structure is non-projective. As a result, we have identified several types of discontinuities, which we classify either by the verb class or structurally in terms of the verb, its arguments and their dependents. In addition, we have quantitatively compared selected phenomena found in Czech translated texts (in the PCEDT) to the native Czech as found in the original Prague Dependency Treebank.

1 Introduction

Non-projective constructions in general have long been the subject of research in computational linguistics, especially within the frameworks of various dependency-based theories (Marcus, 1965; Hudson, 1994). In Czech, which is our focus here as a representative of a (relatively) free-word order language which frequently displays this phenomenon, we can cite e.g., (Uhlířová, 1972), (Štícha, 1996), (Oliva, 2001) or (Petkevič, 1998; Petkevič, 2001). However, at that time, they did not have a syntactically annotated corpus at their disposal, let alone a semantically annotated one. Their works are thus rather theoretical treatments with little confrontation with real texts, even though these works have at least laid a very good basis for the treatment of projectivity by defining (from various perspectives) what non-projectivity actually is in terms of sentence structure representation.

The first treatment of non-projective constructions based on an annotated corpus, namely in the annotation scenario of the Prague Dependency Treebank (PDT), was presented by Hajičová (2004) and this issue was further elaborated by Havelka (2005) where some properties of non-projective edges relevant for the newly presented algorithms were discussed and a hint on finding all non-projective edges using its output was given. Havelka (2007) followed and focused on a refinement of the definitions of non-projectivity (having found certain errors in previously published definitions, among other things) and introduced measures to further refine the notion. In addition, he also showed how empirical results corroborate theoretical results. All of these works have focused on the basic properties of non-projectivity at the same level of linguistic description (i.e., surface dependency syntax or the deep, semantically-oriented “tectogrammatical” representation as defined in the Prague Dependency Treebank), i.e., the authors limited themselves to only one syntactic layer at a time instead of trying to define and investigate the phenomenon from both perspectives, thus providing a more compact approach. Hajičová et al. (2004) made an attempt at classification of non-projective constructions on these two levels separately.1 In our work, we are trying to use both the surface and deep layer together to specify and investigate a “new breed” of non-projectivity in a more holistic approach.

In Natural Language Processing, non-projectivity has long been ignored, since the first treebanks, such as the Penn Treebank (Marcus et al., 1993), have been annotated using parse trees (or, phrase-structure-based annotation), which technically do not allow for direct representation of non-projectivity, and the surrogate means (co-indexing and traces, some of which can be considered to represent non-projective constructions) have also been largely ignored by syntactic parsers developed (trained) on them. Only after the development of dependency parsers started, using natively2 annotated dependency treebanks (which naturally do contain non-projectivities), has non-projectivity finally been seriously looked at from the parsing perspective (McDonald et al., 2005; Nivre and Nilsson, 2005; Nivre, 2006; Kuhlmann and Nivre, 2006; Nivre, 2007; Hall and Nivre, 2008; Nivre, 2009; Bohnet and Nivre, 2012; Björkelund and Nivre, 2015). Since such parsers work with the surface-syntactic dependency trees, there was no specific attention paid to the relation between deep syntax or semantics and non-projectivity.

In our study, we describe the results of investigating non-projectivity of verbs and their arguments, using two levels of description: for defining the constructions of interest, i.e., verbs and their arguments, we use the deep syntactic/semantic annotation level of the available corpus, while for testing non-projectivity using the standard definitions, we use the ‘unquestionable’ linear ordering from the surface dependency annotation which in turn follows the original word order. We believe this is a novel approach not found in previous studies.

1 The special linear ordering (which does not follow the surface word order) of nodes at the tectogrammatical layer of annotation of all PDT-style treebanks will be described in Sect. 2.2.2.
2 By “natively” annotated dependency treebanks we mean treebanks originally annotated manually using dependency scheme and guidelines, as opposed to phrase-based treebanks converted automatically to dependencies ex-post.

2 The corpus and its annotation

2.1 The corpora used: Prague Czech-English Dependency Treebank and the Prague Dependency Treebank

Prague Czech-English Dependency Treebank (PCEDT) is a parallel, linguistically annotated corpus (Hajič et al., 2012). The texts come from the WSJ part of the Penn Treebank (Marcus et al., 1993); the Czech side is their professional translation. The corpus consists of about one million tokens (on each language side) in about 50 thousand aligned sentence pairs. It is currently available from the Linguistic Data Consortium3 as well as from the LINDAT/CLARIN repository.4 This corpus follows the multilayer annotation scenario used in the original Prague Dependency Treebank (PDT).

The tectogrammatical annotation of these corpora includes also links to two valency lexicons, the PDT-Vallex (for Czech) and the EngVallex (for English). The Czech valency lexicon, called PDT-Vallex,5 is publicly available as a part of the one-million-word Prague Dependency Treebank (PDT) version 2 published by the Linguistic Data Consortium.6 It has been developed as a resource for valency annotation in the PDT; it is based on the Functional Generative Description valency theory framework - for details, see (Urešová, 2011b; Urešová, 2011a). The EngVallex7 is a lexicon of English verbs, built on the same grounds as PDT-Vallex. It was created by a (largely manual) adaptation of an already existing resource for English with similar purpose, namely the PropBank Lexicon (Palmer et al., 2005; Kingsbury and Palmer, 2002), to the PDT labeling standards (see also (Cinková, 2006)).

3 https://catalog.ldc.upenn.edu/LDC2012T08
4 http://hdl.handle.net/11858/00-097C-0000-0015-8DAF-4
5 http://hdl.handle.net/11858/00-097C-0000-0023-4338-F
6 http://www.ldc.upenn.edu/LDC2006T01
7 http://hdl.handle.net/11858/00-097C-0000-0023-4337-2

2.2 PCEDT and PDT annotation

The PCEDT is annotated on both the Czech and the English side using the PDT style of annotation. Every sentence is annotated at three explicitly interlinked layers: morphology, dependency syntax (Hajič, 1998) and tectogrammatics (deep syntax/semantics).

2.2.1 Surface dependency syntax

The surface dependency syntax annotation in both the PCEDT and the PDT (Hajič et al., 2004) assigns a node to each word and punctuation symbol in the sentence. It is rooted in an extra node holding the ID and other bookkeeping information about the sentence. Heads are determined, when in doubt, using the morphosyntactic argument: if a node controls the morphosyntactic behavior of the word directly related to it, for example by agreement, morphosyntactic control constraints etc., it is considered to be the head. All relations (edges in the tree) are labeled by the type of the relation. In the PDT (and PCEDT), there are relatively few coarse-grained types: Pred and Pnom for predicate and the nominal part of a predicate in copula constructions, respectively, then Sb, Obj and Adv for verb dependents (Subject, Object, and Adverbials), and Atr for all nominal modifiers. Auxiliaries are divided into another set of types, such as AuxV (function word-verb), AuxP for prepositions (which are heads) and AuxC for subordinate conjunctions, to name the most important. There are also ‘structural’ labels for coordination, apposition and parenthetical relation. An example is in Fig. 1.

Importantly, for the investigation of non-projectivity, all the nodes are numbered by ordinal numbers starting with 0 for the extra root node, 1 for the first word in the sentence in its surface word order, etc., forming a total linear ordering of all the nodes.

2.2.2 Deep syntax and semantics

The tectogrammatical annotation layer is based on the Functional Generative Description theory (Sgall et al., 1986). The structure of a sentence is represented as a rooted tree (as it is at the surface dependency level), with nodes bearing a number of attributes describing their syntactic and semantic properties. Edges are labeled by the (mostly semantic) types of dependency relations, called ‘functors’. As opposed to the surface syntactic annotation, function words and punctuation have no nodes of their own; only content words are kept. However, in addition to the content words that have a surface counterpart, there are also nodes which have no surface counterpart (some types of restored ellipses, such as surface-elided semantically obligatory verb arguments etc.).

The set of ‘functors’ is different (and richer) than the set of dependency relations at the surface dependency level. While verb arguments are described by five core argument functors (Actor (ACT) and Patient (PAT) for the first two, and then the more semantically defined Addressee (ADDR), Effect (EFF) and Origin (ORIG)), there is a set of about 30 adverbial types (LOCation, DIR1ection (from), DIR3ection (to), MANNer, ACMP for accompaniment, TWHEN, TSINce, THL (how long) and several more for time adverbials, CAUSe, BENeficiary, etc.). For nominal modifiers, RSTR and DESC (restrictive and descriptive dependent) are added. Nodes serving as structure descriptors (such as coordination and apposition “heads”) are similar to the ones used at the surface dependency layer of annotation.

In addition, every verb (i.e., content verb) in the treebank is disambiguated for its sense based on an inventory of senses in the corresponding valency lexicon (PDT-Vallex for Czech and EngVallex for English, cf. Sect. 2.1). Its arguments as annotated in the treebank correspond to the argument ‘slots’ as recorded in the valency lexicons. Morphosyntactic constraints on the individual arguments as recorded in the lexicons have been checked and are consistent with the treebank annotation of the corresponding argument nodes.

Ordering of nodes in the tectogrammatical annotation (also a total linear order) does not correspond, however, to the surface word order, and thus any non-projectivity seen in the tectogrammatical annotation can only be judged relative to the definition of the “deep word order” and thus it has not been used here (for its prevalent use, cf. (Hajičová et al.,

2004)).8

3 Definition of non-projectivity

3.1 Dependency syntax and non-projective constructions

The definition of projectivity we are using is as follows (from (Hajičová et al., 2004) and (Havelka, 2005)):

Definition. A subtree S of a rooted dependency tree T is projective if for all nodes a, b and c of the subtree S the condition (P) holds:

\[ (b \downarrow a \;\&\; b < a \;\&\; c \downarrow\downarrow b \rightarrow c < a) \quad \text{or} \]
\[ (b \downarrow a \;\&\; b > a \;\&\; c \downarrow\downarrow b \rightarrow c > a) \tag{P} \]

where $b \downarrow a$ means that b is immediately dependent on a, $c \downarrow\downarrow b$ means that c is a descendant of b (i.e., transitively dependent), and < and > have the usual meaning with respect to the linear ordering of nodes.

3.2 Measure of the degree of non-projectivity

Havelka (2005) introduces the notion of a gap as a set of all nodes that ’cause’ an edge to be non-projective, i.e., the head node of such an edge being a root of a non-projective tree. However, in our work, we believe that the mere set of words, or even their count, is too fine-grained to describe the ’degree’ of non-projectivity, at least for the purposes of this study on verb-headed constructions. Therefore, we define a gap as the number of continuous spans (rather than a number of all words) that ‘interfere’ in (are not part of) the yield of the node, of which the subtree rooted by it is being tested for non-projectivity. We also use the phrase “be in the gap” (for a word or node of a tree), if the projection of that word based on its linear surface word order is one of those that fall into that gap.

4 Finding non-projective constructions and measuring their complexity

In our analysis of non-projective constructions related to verb and its arguments, we have used the definition described in Sect. 3. However, since we are interested in verbs and their arguments, which are annotated on the deep (tectogrammatical) level, we have modified the definition combining the two layers. The modified definition, named CLP (Combined-Layer Projectivity), follows these three rules for determining the necessary components of the original definition:

• words (nodes for verbs, their arguments and their dependents/descendants) are taken from the tectogrammatical level;
• dependencies (i.e., the structure of the subtrees of interest) are also taken from the tectogrammatical level of annotation (used for determining the $\downarrow$ and $\downarrow\downarrow$ relations in the definition (P));
• linear ordering is taken from the surface syntactic level of annotation, using the surface node’s (referred to by the lex.rf link from the tectogrammatical node) ord attribute, i.e., the surface word order is used.

While we could have possibly used the surface dependencies for determining non-projectivity, the approach outlined above gives more adequate results since (a) we are focusing on verbs and their arguments, which naturally occur at the deep layer of annotation and (b) this annotation has been done fully manually in all three corpora we use, while the surface syntax has been generated automatically on both sides of the PCEDT and thus is not reliable, especially with regard to non-projectivity.9

To illustrate the gap measure as defined earlier, Fig. 1 shows a non-projective construction with one gap - the projection of the tree based on the linear ordering of nodes (i.e., word order in the case of surface dependency syntax) has two parts. In this example, the word “To” (this) is an Object of the verb “splnit” (to fulfill), and therefore, the subtree rooted in “splnit” is non-projective, since the words “je” (is) and “možno” (possible) are not descendants of “splnit”, and they both constitute the one single gap present in the projection of the “splnit”-rooted subtree.
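The gap measure and a literal reading of condition (P) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `PARENT`, `ORD`, `descendants`, `gap_count` and `satisfies_p` are hypothetical, the tree is the four-word surface tree of Fig. 1 entered by hand, and it is assumed that every position lying between two positions of a yield is occupied by some word of the sentence. Under the CLP definition, `PARENT` would instead hold the tectogrammatical dependencies and `ORD` the surface `ord` values reached through the `lex.rf` links.

```python
from typing import Dict, Optional, Set

# Hypothetical toy encoding of the Fig. 1 sentence "To je možno splnit".
# PARENT maps each node to its head (None for the root of the tree);
# ORD gives the surface word order that supplies the linear ordering.
PARENT: Dict[str, Optional[str]] = {
    "je": None, "možno": "je", "splnit": "je", "To": "splnit",
}
ORD: Dict[str, int] = {"To": 1, "je": 2, "možno": 3, "splnit": 4}


def descendants(node: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return all nodes transitively dependent on `node` (excluding `node`)."""
    found: Set[str] = set()
    frontier = [node]
    while frontier:
        head = frontier.pop()
        for child, child_head in parent.items():
            if child_head == head:
                found.add(child)
                frontier.append(child)
    return found


def gap_count(node: str, parent: Dict[str, Optional[str]],
              order: Dict[str, int]) -> int:
    """Count the contiguous spans of foreign words inside the yield of `node`."""
    positions = sorted(order[n] for n in descendants(node, parent) | {node})
    # Every jump larger than 1 between neighbouring yield positions is one gap.
    return sum(1 for left, right in zip(positions, positions[1:])
               if right - left > 1)


def satisfies_p(root: str, parent: Dict[str, Optional[str]],
                order: Dict[str, int]) -> bool:
    """Check condition (P), read literally, on the subtree rooted in `root`."""
    subtree = descendants(root, parent) | {root}
    for b in subtree:
        a = parent[b]
        if a is None or a not in subtree:
            continue  # b has no head inside the subtree being tested
        for c in descendants(b, parent):
            # c must lie on the same side of a as its ancestor b does.
            if (order[b] < order[a]) != (order[c] < order[a]):
                return False
    return True
```

For the Fig. 1 tree, `gap_count("splnit", PARENT, ORD)` counts the single span formed by "je" and "možno", while the whole tree rooted in "je" has no gap of its own yet still fails `satisfies_p`, because "To" and its head "splnit" lie on opposite sides of "je".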

8According to the tectogrammatical annotation manual 9In English, the number of non-projective constructions (Mikulova´ et al., 2006), the linear order of the nodes in the tec- posited by the surface dependency parser is negligible com- togrammatical trees is given by the attribute dord, or “deep or- pared to the number of non-projective constructions determined der” which is defined independently of the surface word order by using the (manually annotated) tectogrammatical dependen- using so-called “contextual boundness” criterion. cies as described in the above three bullets.

[Figure 1 tree, sentence a-ln94202-3-p3s2; nodes: je (Pred); možno (Pnom); splnit (Sb); To (Obj)]
Cs: To.Obj je.Pred možno.Pnom splnit.Sb
En: (lit.) This.Obj is.Pred possible.Pnom to fulfill.Sb
En: This can be fulfilled
Figure 1: Simple non-projective construction, gap=1

The number of gaps can be easily computed for every node in the surface dependency tree, by going through all the nodes from its yield (i.e., through all nodes which are descendants of the node in question) and counting the gaps. However, one has to be careful: subtrees with no gaps can still be non-projective “inside”, i.e., some of their subtrees might still be non-projective with gap count greater than zero. For the description of non-projectivity of verbs and their arguments, we have thus computed the non-projectivity of the argument-rooted subtrees separately from the non-projectivity of the subtree rooted by the verb in question, which might have no gaps. On the other hand, if any of the argument-rooted subtrees has the gap equal to zero, it is not relevant to our goals whether there is a non-projectivity “hidden” inside, for some of its subtrees. In other words, we consider (verb-rooted) subtrees that have either

• non-zero gap measure at the verb root, or
• non-zero gap measure at any of its arguments.

For simplicity, we will call these constructions (and only these) non-projective, even though we are aware of the fact that we are ignoring gap=0 constructions with embedded non-projectivity.

An important aspect of the extraction was that we have used both layers of the PDT-style annotation using the modified (CLP) definition as described earlier: the identification of whether a word is a verb or not, or whether a word is an argument to a verb, has been performed at the tectogrammatical level (using all content, i.e., non-auxiliary, non-modal verbs, which had a link to the corresponding Czech or English valency lexicon). Arguments to such verbs have been identified using the valency dictionary entry, which lists all arguments by their function label (called “functor” in the tectogrammatical annotation scheme, cf. (Mikulová et al., 2006)). These labels have been matched to all immediately dependent nodes on the verb in the tectogrammatical annotation. However, for reasons already mentioned, we have used the inter-layer links that the annotation scheme contains, and which connect the nodes in the surface syntax dependency tree with the tectogrammatical one(s), to retrieve the original word order and use it as described in the third bullet in (CLP).

This way, every construction of a verb and its argument(s)10 could be tested against the enhanced (CLP) definition of non-projectivity.

5 Classification of verb-argument non-projective constructions

We have extracted all examples of non-projective constructions for verbs and their arguments from the English and Czech sides of the PCEDT,11 and for comparison also from the Prague Dependency Treebank (representing natively written Czech texts).

The overall number of non-projective constructions on the surface syntactic level of annotation using the original (P) definition of projectivity and the breakdown by the number of gaps is given in Table 1. The total number of nodes at the dependency syntax layer of the PCEDT is 1,173,766 on the English side and 1,151,150 on the Czech side. The total number of nodes counted in the PDT is 833,193 (only sentences annotated also at the tectogrammatical layer have been used).

Lang.       0 gaps    1 gap     2 gaps    >2 gaps
en          479       112       1         0
cs (tr.)    61,619    44,774    3,827     449
cs (nat.)   29,912    14,259    196       2
Table 1: Non-projective constructions in surface dependency trees, overall counts

The small number of non-projective constructions on the English side of the PCEDT (i.e., in the WSJ texts) is caused by the fact that the parser has been trained on non-native dependency annotation, and thus almost always prefers projective constructions. The highest number of gaps on the Czech side of the PCEDT was 8, in five cases (and there was no non-projective subtree with 7 gaps). Overall, slightly below 10% of the subtrees there are non-projective, and less than 5% have at least one gap.

In the PDT, the overall number of nodes at the dependency syntax layer is 833,193, and as can be seen from the last row of Table 1, the percentages for non-projective nodes and for non-projective nodes with at least one gap are 5.3% and 1.7%, respectively.

When the (CLP) definition is used, the numbers look different (Tab. 2). The total number of nodes at the tectogrammatical layer of the PCEDT is 757,021 on the English side and 819,206 on the Czech side. The total number of tectogrammatical nodes in the PDT is 593,473.

Lang.       0 gaps    1 gap     2 gaps    >2 gaps
en          11,328    5,561     15        0
cs (tr.)    9,702     4,503     21        0
cs (nat.)   9,186     4,848     53        2
Table 2: Non-projective constructions in PCEDT and PDT, overall counts using the (CLP) definition

This table differs substantially from Tab. 1, giving much more balanced figures due to the manual annotation of the tectogrammatical layer. Based on these observations, we have used only the (CLP) definition for our subsequent investigation.

5.1 Constructions involving a verb and its argument

The overall number of verb tokens tested for non-projectivity in the PCEDT was 92,840. Among those, there are 2,352 cases (1,311 in English, 1,042 in Czech translations) where the non-projectivity involves a verb and its argument (i.e., the verb is in the gap of the non-projective subtree of its argument) and 1,407 (932 in English, 476 in Czech) cases of two arguments (i.e., one argument is in the gap of a non-projective subtree of another argument).

[Figure 2 tree; nodes: říkat/říká (PRED); kritik (ACT); vyčkávat/vyčkává (EFF); proč/Proč (CAUS); Dinkins (ACT); kdy/vždycky (MANN)]
Cs: Proč.CAUS Dinkins.ACT, říká.PRED kritik.ACT, vždycky.MANN vyčkává.EFF ...
En: (lit.) Why.CAUS Dinkins.ACT, says.PRED the kicker.ACT, always.MANN waits.EFF ...
En: Why Dinkins always waits ..., says the kicker.
Figure 2: Non-projective construction, gap=1, verb in gap

An example of a Czech construction with non-projectivity of a subtree rooted in a verb argument, where the verb is in the gap, is shown in Fig. 2 (it uses the (P) definition on a surface dependency tree). Here, the root verb of the subordinate clause “vyčkává” (waits), which is an argument (labeled Effect) of the matrix verb “říká” (says) on the tectogrammatical layer, dominates a non-projective subtree, since the subject has been fronted before the root verb of the whole sentence. This is one of the very typical cases of non-projective constructions, where the main verb is a communication or a reported speech verb (say, add, shout, remember, answer, argue, go on, to name a few extracted from the PCEDT).12

Another typical example of non-projective constructions in Czech involving a verb is a construction with a catenative13 (and modal or quasi-modal) verb like “podařit”, “začít”, “zkusit”, “nechat” (lit. “manage”, “start”, “try”, “let”), etc. The argument, which is often non-projective, is the Patient (PAT), typically expressed as infinitive, whose first or second argument (Actor (ACT) or Patient (PAT)) is fronted “across” the verb. An example is “domy.PAT nezkoušej.PRED prodávat.PAT bez makléře.ACMP” (lit. “houses.PAT do-not-try.PRED sell.PAT without an-agent.ACMP”).14

In English, in one of the rare cases where there is no Czech non-projective counterpart, a construction which gives rise to non-projectivity is a verb argument (typically Actor (ACT) expressed as Subject, i.e., in active voice) preceding the verb, which is then complemented by a time or location expression, and only then a relative clause dependent on the argument is placed: “T-shirts.ACT appeared.PRED in the corridors.LOC that.ACT carried.RSTR ...” (Fig. 3). Here, in the tectogrammatical representation, “T-shirt” is the Actor (ACT), an argument of “appear”, and the clause starting “that carried...” depends on it. In the tree, the subtree rooted in “T-shirt” is non-projective, since the verb “appear” (and all the words from the location adverbial, i.e., “in the corridors”) form the gap.

[Figure 3 tree; nodes: appear.enunc/appeared (PRED); t-shirt/T-shirts (ACT); corridor/in the corridors (LOC); carry/carried (RSTR); that (ACT)]
En: T-shirts.ACT appeared.PRED in the corridors.LOC that.ACT carried.RSTR ...
Cs (lit.): *Trička.ACT se objevila.PRED na chodbách.LOC která.ACT nesla.RSTR ...
Cs: Na chodbách se objevila trička, která nesla ...
Figure 3: Non-projective construction with ACT’s dependent (RSTR) branching non-projectively to the right, verb in gap

Another English example involving a copula is “... opinion.ACT is.PRED mixed.PAT on how much of a boost the market would get.RSTR” where the root “get” of the relative clause depends on “opinion”, and therefore “opinion” heads a non-projective subtree with the predicator “is (mixed)” falling in the gap.15 Another example is “... the plan.PAT is.PRED impossible.PAT to accommodate.PAT”, where “plan” is a dependent of “accommodate”, which itself is a dependent of “is”, creating a non-projective subtree rooted in “accommodate”.

In Czech, there are only a few constructions which allow similar non-projectivity to the one just described for English, typically containing the verb “být” as a copula: “... dividendy.ACT jsou.PRED splatné.PAT k 2. lednu.TWHEN z akcií.RSTR ...” (lit. ... dividends.ACT are.PRED payable.PAT Jan.TWHEN 2 to stock.RSTR) where “akcií” (lit. shares) depends on “dividendy” (lit. dividends), and thus causes the non-projectivity of the subtree rooted in “dividendy”, with the verb “jsou” (lit. are) in the gap.

In English, but possible in Czech too,16 is a construction in which a verb argument is modified by two or more modifiers, with one immediately following it in the surface word order but the other being far right, after additional arguments or adjuncts of the dominant verb, such as in: “A total.ACT of 139 companies.RSTR raised.PRED dividends.PAT in October.TWHEN, basically unchanged.RSTR ...”, where “unchanged” is a dependent of “total,” not the verb,17 putting the verb (and some of the additional dependents of the verb, such as “dividends” and “October”) in the gap of the non-projective subtree rooted in “total”.

10Unless it is a NULL argument, which has no overt word in the surface sentence as a counterpart; these have been ignored.

11For those verbs that are translations of an English verb construction, to avoid constructions which might be too influenced by the fact that they are translations of a syntactically very different one.

12Counting on a sample of 100 examples from the English side of the PCEDT, 43 have been of this type.

13Catenative verbs are usually defined as those combining with non-finite verbal forms, see e.g. (Palmer, 1974; Quirk et al., 1985; Mindt, 1999; Leech et al., 2012).

14In such constructions, a question might arise how the shared argument between the head verb and the non-finite dependent verb is treated: as has been described earlier, any node elided on the surface (even if present at the tectogrammatical layer) is ignored for non-projectivity considerations due to the non-existence of its word order index, which we in no way try to re-create.

15One could argue that the subordinate clause could be considered an Adverbial clause depending on the verb, in which case there will be no non-projectivity. However, the distinction between “opinion on [clause] is mixed” and “opinion is mixed on [clause]” has been considered to be in the information structure rather than in syntax (Hajič et al., 2004), and thus the structure in the PDT-style of annotation is the same. This argument holds, due to morphosyntactic considerations such as agreement, more firmly for Czech, but it was applied to English as well by analogy.

16Even though all cases that we have found in the PCEDT have been translated using a completely different (and projective) construction.

17We are leaving aside the discussion whether annotating “unchanged” as a dependent on “total” is adequate for the semantic/tectogrammatical layer of annotation, but at the moment this is how such Measure Phrases have been treated in PDT.
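The “verb in the gap” test used for the constructions in this section can be sketched as follows. This is only an illustration under stated assumptions: the tree is given as a list of head positions indexed by surface word order (-1 for the root), the location adverbial “in the corridors” is collapsed into the single node “corridors”, and the helper names `subtree_yield`, `words_in_gaps` and `verb_in_gap` are hypothetical, not taken from any PDT tool.

```python
from typing import Dict, List, Set


def subtree_yield(heads: List[int], node: int) -> Set[int]:
    """Collect the word-order positions of `node` and all its descendants."""
    children: Dict[int, List[int]] = {}
    for child, head in enumerate(heads):
        children.setdefault(head, []).append(child)
    covered: Set[int] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        covered.add(current)
        stack.extend(children.get(current, []))
    return covered


def words_in_gaps(heads: List[int], node: int) -> List[int]:
    """Positions lying inside the span of the subtree but not belonging to it."""
    covered = subtree_yield(heads, node)
    return [p for p in range(min(covered), max(covered) + 1) if p not in covered]


def verb_in_gap(heads: List[int], argument: int, verb: int) -> bool:
    """True if the governing verb falls into a gap of the argument-rooted subtree."""
    return verb in words_in_gaps(heads, argument)


# Simplified version of the Fig. 3 example (assumed encoding):
# T-shirts(0) appeared(1) corridors(2) that(3) carried(4)
words = ["T-shirts", "appeared", "corridors", "that", "carried"]
heads = [1, -1, 1, 4, 0]  # "carried" depends on "T-shirts", "that" on "carried"

print([words[p] for p in words_in_gaps(heads, 0)])  # ['appeared', 'corridors']
print(verb_in_gap(heads, argument=0, verb=1))       # True
```

The same check applied to a second argument instead of the verb would flag the argument-argument cases, where one argument lies in the gap of another argument's subtree.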

5.2 Constructions involving two or more arguments

These cases are less frequent than the cases involving the verb being in the gap of the non-projective argument-rooted tree, but they do exist.

Similar to the case of verb-argument non-projectivity of the “T-shirts.ACT appeared.PRED in the corridors.LOC that carried.RSTR ...”-type as described in the previous section, is a construction where a Patient (PAT) follows a verb, followed by an adverbial (dependent on the verb), and only then the attribute of the Patient follows: “ABC.ACT signed.PRED an agreement.PAT with DEF.ADDR under which shares will be acquired.RSTR ...”. Since “with ...” is an argument (Addressee) of “sign” at the tectogrammatical layer, and thus depends on it, the subtree rooted in the deep object (PAT) argument “agreement” is non-projective. The type of this argument-argument non-projectivity is PAT-ADDR (the ADDR-labeled argument is projected to the gap in the yield of the subtree rooted in “agreement”).

5.3 Left vs. right non-projective edges

It is well known that fronting or ‘movement to the left’ tends to create non-projective constructions. In (Hajičová et al., 2004), it was only such moves that have been investigated, due also to their relation to information structure, which was one of the foci in that study.

However, in our study, we also wanted to investigate whether non-projective edges leading to the right (both in Czech and English) are rare(r), or whether they differ substantially from those left-branching ones studied previously.

Lang.       left (%)          right (%)
en          1,122 (64.93%)    606 (35.07%)
cs (tr.)    913 (68.96%)      411 (31.04%)
cs (nat.)   1,945 (79.85%)    491 (20.15%)
Table 3: Left- vs. right-branching non-projective subtrees rooted in a verb argument

The statistics alone show two things: first, the prevalence of left-branching non-projective edges is much higher in the native Czech treebank (PDT) than on the Czech side of the PCEDT (which suggests influence of non-projective constructions on translation), and second, that while left-branching does prevail 2:1 or more over right-branching, the number of right-branching non-projectivities rooted at verb arguments is substantial (and thus, worth further studies).

6 Conclusions

We have described the results of investigation of non-projective constructions involving verbs and their arguments, using no predefined classification scheme but an annotated material of the Prague Czech-English Dependency Treebank and the original Prague Dependency Treebank. We can summarize our findings in a few main points:

• as a starting point, we have divided the corpus material into those constructions that involve the verb and at least one of its arguments vs. those involving two or more arguments (and not the verb itself), under the hypothesis that these two cases will display different behavior; however, this proved not to be a crucial distinction (“(a tak) transakce.PAT je.PRED přitom.TPAR levnější provádět...” lit. (and so) transaction.PAT is.PRED at-the-same-time.TPAR cheaper to perform vs. “(a tak) je.PRED transakce.PAT přitom.TPAR levnější provádět” lit. (and so) is.PRED transaction.PAT at-the-same-time.TPAR cheaper to perform, with “transakce” depending on “provádět”);

• the most frequent case is the construction with the communication/reporting verbs (verba dicendi and similar verbs) when used in the middle of the direct or reported speech construction they introduce;

• nominals used as arguments can have their attribute(s) (whether expressed by a clause or as a prepositional phrase) across other arguments or adjuncts of the verb;

• as expected, certain types of non-projectivity are due to the conventions used in the annotation;

• when comparing native Czech with translated Czech, the statistics on the direction of non-projective branching rooted in a verb argument suggest that translators are probably influenced by the source English and do not use left-branching non-projective constructions as often as they appear in native Czech;

• we have independently confirmed that the focus on fronted or left-moved constructions in (Hajičová et al., 2004) was right, but that roughly 1/3 of non-projective constructions rooted in a verb argument are right-branching and thus not to be ignored in future research;

• certain types of verb-related non-projectivities described in (Hajičová et al., 2004), such as a nominal group in Czech with dislocated RSTR (depending on a verb argument) (“společnou.RSTR máme.PRED ... zodpovědnost.PAT”, lit. “common.RSTR we-have.PRED ... responsibility.PAT”), were not attested in translated Czech (PCEDT), but have been found in the PDT. The same holds for numerals with a dislocated dependent.

In terms of future work, there are two possible directions. In the technological area, the results (especially on English) confirm that non-projectivity is indeed going to be a problem for (deep) parsers, and that even surface dependency parsers should be looked at again to see if improvements are possible based on error analysis using the classification presented. On the theoretical side, we would like to (a) continue to investigate the less frequent cases which we have not included in this study, (b) involve other features of the tectogrammatical annotation, such as the information structure (topic/focus annotation, and/or co-reference information) and (c) define the types of non-projective verb-argument constructions more formally, to allow for an automatic classification, e.g., on a large corpus.

Acknowledgments

The work described herein has been supported by the grant GP13-03351P of the Grant Agency of the Czech Republic and by the LINDAT/CLARIN Research Infrastructure projects, LM2010013 and LM2015071, funded by the MEYS of the Czech Republic. It has also been using language resources developed and distributed by the LINDAT/CLARIN project (http://lindat.cz).

We would like to thank all three reviewers of the paper, who provided valuable comments; specifically, we are grateful to the anonymous reviewer #2, whose in-depth review helped us to realize and correct several important shortcomings of the original version.

References

Anders Björkelund and Joakim Nivre. 2015. Non-deterministic oracles for unrestricted non-projective transition-based dependency parsing. In Proceedings of the 14th International Conference on Parsing Technologies, pages 76–86.

Bernd Bohnet and Joakim Nivre. 2012. A transition-based system for joint part-of-speech tagging and labeled non-projective dependency parsing. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1455–1465.

Silvie Cinková. 2006. From propbank to engvallex: Adapting the propbank-lexicon to the valency theory of the functional generative description. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), pages 2170–2175, Genova, Italy. ELRA.

Jan Hajič, Jarmila Panevová, Eva Buráňová, Zdeňka Urešová, Alevtina Bémová, Jan Štěpánek, Petr Pajas, and Jiří Kárník. 2004. Anotace na analytické rovině. Návod pro anotátory. Technical Report TR-2004-23, ÚFAL/CKL MFF UK, Prague.

Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, and Zdeněk Žabokrtský. 2012. Announcing Prague Czech-English Dependency Treebank 2.0. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pages 3153–3160, Istanbul, Turkey. European Language Resources Association.

Jan Hajič. 1998. Building a syntactically annotated corpus: The Prague Dependency Treebank. In Issues of Valency and Meaning. Studies in Honour of Jarmila Panevová (ed. Eva Hajičová). Karolinum, Charles University Press, Prague, ISBN 80-7184-601-5.

Eva Hajičová, Jiří Havelka, Petr Sgall, Kateřina Veselá, and Daniel Zeman. 2004. Issues of projectivity in the Prague Dependency Treebank. The Prague Bulletin of Mathematical Linguistics, (81):5–22.

Johan Hall and Joakim Nivre. 2008. Parsing discontinuous phrase structure with grammatical functions. In Advances in Natural Language Processing, 6th International Conference, GoTAL 2008, Gothenburg, Sweden, August 25-27, 2008, Proceedings, pages 169–180.

Jiří Havelka. 2005. Projectivity in totally ordered rooted trees: An alternative definition of projectivity and optimal algorithms for detecting non-projective edges and projectivizing totally ordered rooted trees. The Prague Bulletin of Mathematical Linguistics, (84):13–30.

Jiří Havelka. 2007. Beyond projectivity: Multilingual evaluation of constraints and measures on non-projective structures. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 608–615, Praha, Czechia. ÚFAL MFF UK, Association for Computational Linguistics.

Richard Hudson. 1994. Discontinuous phrases in dependency grammar. (6):89–124.

P. Kingsbury and M. Palmer. 2002. From Treebank to Propbank. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC-2002), pages 1989–1993. Citeseer.

Marco Kuhlmann and Joakim Nivre. 2006. Mildly non-projective dependency structures. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL) Main Conference Poster Sessions, pages 507–514.

Geoffrey N. Leech, Marianne Hundt, Christian Mair, and Nicholas Smith. 2012. Change in Contemporary English. Cambridge University Press, New York.

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Solomon Marcus. 1965. Sur la notion de projectivité. Mathematical Logic Quarterly, 11(2):181–192.

Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 523–530, Vancouver, BC, Canada. Association for Computational Linguistics.

Marie Mikulová, Alevtina Bémová, Jan Hajič, Eva Hajičová, Jiří Havelka, Veronika Kolářová, Lucie Kučová, Markéta Lopatková, Petr Pajas, Jarmila Panevová, Magda Razímová, Petr Sgall, Jan Štěpánek, Zdeňka Urešová, Kateřina Veselá, and Zdeněk Žabokrtský. 2006. Annotation on the tectogrammatical level in the Prague Dependency Treebank. Annotation manual. Technical Report 30, Prague, Czech Rep.

Dieter Mindt. 1999. Finite vs. Non-Finite Verb Phrases in English. In Form, Function and Variation in English, pages 343–352, Frankfurt am Main. Peter Lang GmbH.

Joakim Nivre and Jens Nilsson. 2005. Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 99–106.

Joakim Nivre. 2006. Constraints on non-projective dependency parsing. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 73–80.

Joakim Nivre. 2007. Incremental non-projective dependency parsing. In Proceedings of Human Language Technologies: The Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT), pages 396–403.

Joakim Nivre. 2009. Non-projective dependency parsing in expected linear time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 351–359.

Karel Oliva. 2001. Některé aspekty komplexity českého slovního nepořádku. 3:163–172.

Martha Palmer, Dan Gildea, and Paul Kingsbury. 2005. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.

F. Palmer. 1974. The English Verb. Longman, London.

Vladimír Petkevič. 1998. Special Cases of Non-Projective Constructions in the Syntax of Czech Sentence. pages 61–66.

Vladimír Petkevič. 2001. Neprojektivní konstrukce v češtině z hlediska automatické morfologické disambiguace českých textů. In Čeština - univerzália a specifika 3. Sborník konference ve Šlapanicích u Brna, 22.-24.11.2000 (eds. Zdeňka Hladká, Petr Karlík), pages 197–205. MU Brno.

Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A Comprehensive Grammar of the English Language. Longman, London.

Petr Sgall, Eva Hajičová, and Jarmila Panevová. 1986. The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Dordrecht, Reidel, and Prague, Academia, Prague.

František Štícha. 1996. Křížení vět v češtině. Naše řeč, 79(1):26–31.

Ludmila Uhlířová. 1972. On the non-projective constructions in Czech. The Prague Bulletin of Mathematical Linguistics, (3):171–181.

Zdeňka Urešová. 2011a. Valence sloves v Pražském závislostním korpusu. Studies in Computational and Theoretical Linguistics. Ústav formální a aplikované lingvistiky, Praha, Czechia.

Zdeňka Urešová. 2011b. Valenční slovník Pražského závislostního korpusu (PDT-Vallex). Studies in Computational and Theoretical Linguistics. Ústav formální a aplikované lingvistiky, Praha, Czechia.

Machine Translation of Non-Contiguous Multiword Units

Anabela Barreiro1 and Fernando Batista1,2 (1) INESC-ID Lisboa, Portugal (2) ISCTE-IUL, Instituto Universitário de Lisboa, Portugal {anabela.barreiro, fernando.batista}@inesc-id.pt

Proceedings of DiscoNLP 2016, pages 22–30, San Diego, California, June 17, 2016. © 2016 Association for Computational Linguistics

Abstract

Non-adjacent linguistic phenomena such as non-contiguous multiwords and other phrasal units containing insertions, i.e., words that are not part of the unit, are difficult to process and remain a problem for NLP applications. Non-contiguous multiword units are common across languages and constitute some of the most important challenges to high quality machine translation. This paper presents an empirical analysis of non-contiguous multiwords, and highlights our use of the Logos Model and the Semtab function to deploy semantic knowledge to align non-contiguous multiword units with the goal to translate these units with high fidelity. The phrase level manual alignments illustrated in the paper were produced with the CLUE-Aligner, a Cross-Language Unit Elicitation alignment tool.

1 Introduction

Recently, in natural language processing (NLP), there has been an increasing interest in multiword units and in the problems they raise. Multiword units, most commonly known as multiword expressions1, have been defined by Baldwin and Kim (2010) as "lexical items that: (a) can be decomposed into multiple lexemes; and (b) display lexical, syntactic, semantic, pragmatic and/or statistical idiomaticity". Compositionality is the property that makes the automatic processing of multiword units particularly challenging. Multiword units occur very frequently with different degrees of compositionality. Some represent free combinations, such as the English noun phrase round table (i.e., meeting); some have opaque meanings, where the meaning of the unit cannot be deduced from the meaning of its individual constituents, such as piece of cake used figuratively with the meaning of something easy to do, or pay a visit, equivalent to the verb visit. Translations of multiword units are often idiomatic and unpredictable, and a word-for-word translation may result in poor quality translation (acid test). Additionally, in many cases, the idiom does not exist, or exists with a different structural and lexical form, in the target language (raining cats and dogs). Finally, the morpho-syntactic properties of multiword units allow, in some cases, the insertion of external elements into the unit (go for a [INSERTION] ride).

Notwithstanding the efforts undertaken to improve multiword unit processing, lack of formalization still triggers problems with the syntactic and semantic analysis of sentences where multiwords occur and impairs the performance of NLP systems, affecting especially machine translation (MT). Whenever a multiword unit contains insertions, there is a remote dependency that contributes to additional difficulties in analysing and translating that multiword unit. The causes for poor quality translation of lexical and semantico-syntactic phenomena, namely cross-linguistic multiwords and other phrasal units, can be summed up in three points:

1. Current methodologies rely mostly on statistical techniques to train and evaluate MT systems. Statistical machine translation (SMT) models are built on the grounds of alignment pairs acquired mostly automatically. Unsupervised learning approaches adopted by SMT systems use probabilistic alignments where linguistic knowledge is still limited. Inability to identify multiword units correctly often results in translation deficiencies.

2. Shortcomings in current state-of-the-art supervised learning and manual word alignment standard practices, such as lack of publicly available manual multilingual datasets and lack of linguistically motivated alignment guidelines, impose significant constraints on translation quality, because they disregard non-adjacent linguistic phenomena or syntactic discontinuity.

3. Current tools are incapable of assisting, in an efficient way, human annotators in the task of correctly identifying non-contiguous multiwords and producing rules from them.

This paper discusses the aforementioned problems from an empirical point of view and provides a solution for them in experimental research inspired by the Logos Model of machine translation (Scott, 2003; Barreiro et al., 2011).2 We use the Europarl corpus (Koehn, 2005) to illustrate the kind of linguistic knowledge that needs to be represented in future alignment tasks, with a special focus on the alignment challenges presented by non-contiguous multiword units. The alignment examples in the paper were annotated with the CLUE-Aligner tool (Barreiro et al., 2016). Even though similar in name to the "clue alignment approach" (Tiedemann, 2003; Tiedemann, 2011), mainly devoted to word-level alignment, our approach is theoretically and methodologically different, with a focus on phrase alignment, contemplating multiwords and linguistically-relevant phrasal units. In addition, in our approach, CLUE is an acronym for "Cross-Language Unit Elicitation", where a source and a target language can correspond to the same language, as in the case of paraphrases.

2 Addressing the Challenges

The first problem highlighted in section 1 has been addressed by the creation of an empirical basis for identifying support verb constructions in a corpus and understanding the impact of these multiword units on translation quality. Barreiro et al. (2014) evaluated the translation of support verb constructions by two MT systems, OpenLogos and Google Translate, and concluded that neither of these systems translates them well. Overall, OpenLogos suffers from a weak lexicon, while Google Translate translation errors are more of a structural nature. Although the translations are still problematic, the Logos Model presents an advantage with regard to the SMT approach: the Logos Model relies on deep semantico-syntactic analysis to translate not only contiguous multiword units, such as the support verb construction to draw a distinction between, but also non-contiguous multiword units, such as the support verb construction to bring [INSERTION] to a conclusion.

The second problem reported has been approached by the creation of manually annotated alignments (the Gold CLUE4Translation), which represent an important asset in the development of MT systems. Supervised learning uses manual alignments and aims at taking context, syntax and other grammatical and semantic knowledge into consideration. The Logos Model served as inspiration to deploy this linguistic knowledge into the alignment task through the identification of translation relationships among words, multiwords or phrasal units in bilingual parallel sentences, i.e., sentence pairs that have been identified as translations of each other. It also inspired the establishment of new linguistically-motivated alignment guidelines for pairs of translation units (the CLUE4Translation Alignment Guidelines) that aim to improve the quality of the (machine) translation of multiword units, among other linguistic phenomena.

The third problem was tackled by the creation of a solution for the annotation of non-contiguous multiwords and other phrasal units: the aforementioned CLUE-Aligner3, a web alignment tool that places a special emphasis on the annotation of pairs of semantically equivalent non-adjacent structures in monolingual and bilingual parallel sentences. The pairs of non-contiguous multiword units and phrasal expressions can be used in rule development.

1This term has also been designated inter alia as "multiword lexical items", "phraseological units" and "fixed expressions", with slight variations in scope and meaning.

2The Logos Model underlies both the commercial system and its degraded open source version OpenLogos.

3https://esperto.l2f.inesc-id.pt/esperto/aligner/index.pl?

3 The Logos Model

The struggles of SMT with multiwords have been reported in several research works (Barreiro et al., 2013; Kordoni and Simova, 2014; Barreiro et al., 2014; Semmar, 2012), among others. Multiword units are a source of mistranslations not only by MT systems, but also by professional translators, in part because they are a source of various contextual nuances, but also because they can be non-contiguous. For example, verbal expressions such as the English prepositional verb to deal with take different senses (and translations) depending on contexts, typically their object or prepositional phrase complement. If the context of the verb is to deal with questions, as in example (1), then the French translation should be s’occuper de (to be busy with).

thoughts) on) in Portuguese. The different translations of deal are related to the idiomatic ways that predicate nouns select their support verbs in different languages: take a vow in English, but ‘make a vow’ in the Romance languages (hacer in Spanish, faire in French, and fazer in Portuguese).

(1) EN - our Asian partners prefer to deal with questions which unite us
    ES - nuestros socios asiáticos prefieren dedicarse a las cuestiones que nos unen
    FR - nos partenaires asiatiques préfèrent s’attacher à ([a+a]) ce qui nous unit
    PT - os nossos parceiros asiáticos preferem centrar-se unicamente nas ([em+as]) questões comuns

If the different nuances of a verbal expression are difficult to capture even for translators, it is not surprising that these expressions are poorly translated by MT systems, unless these systems integrate semantic or contextual knowledge and apply it to the translation process, as illustrated in example (2).
On the The French MT output of example (1) by the Google other hand, if the context is he proved unable to Translate (GT) system is a literal translation where deal with the problem, then the translation should no context has been taken into consideration. How- be the translation of its paraphrase handle the prob- ever, the OpenLogos translation is correct and even lem. However, if the context is he refused to deal of a higher quality than that provided by a profes- with the problem, then the translation would be a sional translator in the Europarl corpus (s’occuper translation of the paraphrase analyse and try to solve de is more precise than s’attacher a in that context). the problem. These different nuances are related to the ambiguity and weakness of the verb deal and the (2) FR GT - nos partenaires asiatiques préfèrent *traiter − different meanings of the predicate-like nouns ques- avec des questions qui nous unissent tions (issues, topics, interrogations, etc.) or prob- FR OL - nos associés asiatiques préfèrent s’occuper des questions− qui nous unissent lem (difficulty, exercise, etc.). It is the meaning of these nouns that triggers the different translations of The precision in the OpenLogos translation is as- deal, just like the verb take will have different trans- sociated with the application of a Semtab contex- lations depending on the predicate noun it supports tual pattern-rule, which is a deep structure pattern (walk, responsibility, comfort, etc.). Therefore, the that matches on/applies to a great variety of surface two slightly different meanings for problem in the structures: last two examples explain the distinct paraphrase: 4 handle in one case, and analyze and try to solve in (3) deal(VI) with N(questions) = s’occuper de N the other case. 
This Semtab pattern-rule states that, when fol- In the Europarl corpus used in our exploratory lowed by the direct object noun questions or a noun study not all translations are optimal and often trans- of the same semantico-syntactic class, the verb is lational equivalents are approximate rather than ex- translated as s’occuper de, overriding the default act. Therefore, the English prepositional verb to 4 deal with in example (1) is translated in the Ro- Here we only display the comment line of the Semtab rule, mance languages as dedicarse a (engage in) in Span- not the rule itself or what it does in terms of the Logos language. The rule notation is arcane due to its numeric representation and ish, the reflexive s’attacher a (focus on/stick to) in it would take a larger effort to explain the use and meaning of French, and centrar-se em (concentrate/center (their the distinct codes in the Logos Model.

dictionary translation for this verb. The power of this rule is that it allows the translation system to recognize and analyze multiword units, even when the elements of the multiword units are non-contiguous. The alignment of multiword units to feed an SMT system needs to reflect these semantic nuances, in a similar way to the way the Logos Model uses data-driven pattern-rules to account for these nuances.5 This proves that alignments that mirror Semtab semantic and contextual pattern-rules of the Logos Model can help create new MT systems and improve existing ones.

5 Unlike rule-based MT models, the Logos Model is not rule-driven, but data-driven, i.e., in Logos, patterns, not rules, do the matching. So, in Logos, a rule refers to the action component once a match is made. Like SMT, it is possible to apply the same techniques to the data, which in the Logos Model is not literal words but semantico-syntactic (SAL) patterns or entities. This is the reason why it makes sense to train a machine learning system to learn new SAL patterns based on alignments, instead of on the conventionally used SMT patterns.

4 Alignment of Non-Contiguous Multiwords Inspired by Logos

Non-contiguous multiword units are difficult to recognize and process, causing many MT systems to fail in providing the correct translations. For SMT systems, non-contiguous multiword units represent a significant challenge to a correct word and phrase alignment (Shen et al., 2009).

The Europarl corpus contains a significant number of occurrences of non-contiguous multiword units, such as the support verb constructions illustrated in subsections 4.1 – 4.5, which are a source of translation errors due to incorrect alignment. Table 1 shows the number of occurrences for five different cases of non-contiguous multiword units in a subset of the Europarl corpus, containing about 47.4 million words, where the search was performed using all forms of each verb. The third pattern is the most common type of multiword unit and also the one with the highest spectrum of common usages. About 83% of the forms occur more than once. The fourth pattern occurs more than once 65% of the time, revealing these commonly adopted constructions appear in many different forms. On the other hand, the first, second and fifth patterns occur only once, 62%, 78% and 62% of the time, thus suggesting that learning automatic models to deal with these types of constructions may not be straightforward. The remainder of this section discusses each case taking into account the Logos Model and showing how each alignment is represented in CLUE-Aligner.

Pattern                     #occurrences   #unique
bring [] to a conclusion             114        84
set [] in motion                     162       140
play [] role                        5165      1216
take [] interest in                  360       163
keep [] informed about                77        58

Table 1: Statistics from a subset of Europarl.

4.1 bring [] to a conclusion

In example (4), the English non-contiguous support verb construction bring [INSERTION] to a conclusion places the predicate noun conclusion, with its adnominal modifiers, ten words apart from the support verb bring. The Spanish, French, and Portuguese translation equivalents adopt different stylistic variants and simpler surface structures (i.e., syntax) by transforming the support verb construction into semantically equivalent verbal constructions, the single verb acelerar (speed up, accelerate) in Spanish or the compound verbs faire avancer (make advance) and apressar-se a apresentar (hurry to present) in French and Portuguese, respectively.

(4) EN - I would urge the European Commission to bring the process of adopting the directive to on additional pensions to a conclusion
ES - insto a la comisión europea para que acelere la directiva sobre pensiones complementares
FR - j’insiste auprès de la comission européenne pour faire avancer la directive sur les pensions complémentaires
PT - exorto a comissão europeia a apressar-se a apresentar a directiva relativa as pensões complementares

This non-contiguous support verb construction, with a remote placement of one of the components of the unit, represents a difficult unit to align and to translate. In general, statistical (or statistically-based) MT systems translate contiguous multiword units fairly well, taking into account context (surrounding word strings). However, purely statistical phrase-based MT systems translate poorly multiwords that contain elements placed remotely

(long distance dependency), as illustrated in the Portuguese translation of the support verb construction by Google Translate in example (5), where the verb is missing.

(5) PT GT - Gostaria de exortar a Comissão Europeia a que o processo de adopção da directiva para as pensões adicionais *para uma conclusão.

Figure 1: Alignment of bring [] to a conclusion

Figure 1 represents the P-alignment of the non-contiguous support verb construction bring [] to a conclusion with its contiguous equivalent compound verb apressar-se a apresentar in Portuguese. The most common expression found in our subset of the Europarl corpus is bring this matter to a conclusion and the remaining expressions occur very few times.

4.2 setting [] in motion

Often in translation, a non-contiguous expression in a source language can be maintained in the target language or replaced by an equivalent but contiguous expression that conveys the same meaning. It can also be transformed into a simpler contiguous syntactic structure, such as a single word. For example, the Portuguese translation for the non-contiguous English support verb construction set in motion in example (6) is the single verb empreender (undertake). Both Spanish and French maintain the support verb constructions (llevar a cabo and mettre en chantier), but they are contiguous, having no insertions. The presence of a non-contiguous expression in one of the sentences of the language pair causes additional complexity to the alignment task, which we are able to solve with the Logos Model approach.

(6) EN - many member states thus have the major task of setting structural reform in motion
ES - he aquí por lo tanto una tarea de gran importancia para que numerosos estados miembros lleven a cabo reformas estructurales
FR - il y a donc là une táche considérable pour beaucoup d’états membres, celle de mettre en chantier des réformes structurelles
PT - há, portanto, uma tarefa importante para muitos estados-membros em empreender reformas estruturais

Figure 2: Alignment of setting [] in motion

Figure 2 represents the alignment of the non-contiguous support verb construction setting [] in motion with the single verb empreender in Portuguese.

4.3 play [] role

In some cases, the verbal expression is always expressed in the form of a support verb construction, which is the case of play [INSERTION] role, because there is no semantically equivalent single verb. The support verb can take several forms, i.e., the construction can be stylistically different (Barreiro, 2009). Figure 3 exemplifies the adjective modifier insertions increasingly predominant in the English sentence. These insertions are excluded from the English–Portuguese alignment pair play [] a role – desempenham um papel and aligned separately.

4.4 take [] interest in

Non-contiguous prepositional verbs are aligned together with the preposition. Example (7) illustrates the alignment of the English support verb construction take [] interest in with its semantically equivalent prepositional verbs in the Romance languages. In this support verb construction, the preposition in

is selected by the predicate noun interest, and not by the support verb take. In the Romance languages, the prepositions are selected by strong verbs: ocuparse [] de in Spanish, s’occuper [] des in French, and debruçar-se [] sobre in Portuguese. The adjectival insertion special aligns with the adverbial insertions in the Romance languages: en especial in Spanish, en particulier in French, and em especial in Portuguese.

Figure 3: Alignment of play [] role

(7) EN - the committee on employment and social affairs took a special interest in types of supplementary pension funds
ES - la comisión de empleo y de asuntos sociales se ha ocupado en especial de las modalidades de la asistencia suplementaria a la tercera edade
FR - la commission de l’emploi et des affaires sociales s’est en particulier occupée des ([de+les]) différentes formes de retraite complémentaire
PT - a comissão do emprego e dos assuntos sociais debruçou-se em especial sobre as possibilidades existentes para regimes complementares de reforma

4.5 keep [] informed about

Prepositional adjectives align as internal elements of support verb constructions. Example (8) illustrates the alignment of contiguous prepositional adjectives, with the exception of Spanish. The prepositional adjective in Spanish contains an adverbial insertion (periódicamente) between the adjective informados and the preposition de. In English, French, and Portuguese, the adverbs occur before the prepositional adjectives. Therefore, the contiguous prepositional adjective informed about in English aligns with its semantically equivalent prepositional adjectives informés des in French, and informados acerca d(os) in Portuguese.

(8) EN - calling on the commission to keep us regularly informed about recent developments
ES - pidiendo a la comisión que nos mantenga informados periódicamente de lo que vaya ocurriendo
FR - appelant la commission à nous tenir régulièrement informés des ([de+les]) derniers développements de ce dossier
PT - em que se apela à comissão para que nos mantenha regularmente informados acerca dos ([de+os]) progressos que se forem realizando

Figure 5: Alignment of keep []informed about

Figure 5 represents the alignment of the non-contiguous support verb construction with a prepositional adjective keep [] informed about with its Spanish equivalent mantenga informados de.
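The non-contiguous alignments discussed in this section can be pictured as pairs of token-position sets, one set per language, where a jump in the positions marks an insertion. The following is only a minimal sketch under that assumption: the names UnitAlignment, is_contiguous and render are hypothetical and do not describe the internal format of CLUE-Aligner. It uses the English and Spanish sentences of example (8).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UnitAlignment:
    """One aligned pair of units, each stored as sorted token positions.

    Hypothetical structure for illustration only.
    """
    source: Tuple[int, ...]
    target: Tuple[int, ...]


def is_contiguous(positions: Tuple[int, ...]) -> bool:
    """True when the positions form one unbroken run of tokens."""
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


def render(tokens: List[str], positions: Tuple[int, ...]) -> str:
    """Spell out a unit, writing [] wherever inserted tokens are skipped."""
    parts: List[str] = []
    previous = None
    for pos in positions:
        if previous is not None and pos - previous > 1:
            parts.append("[]")
        parts.append(tokens[pos])
        previous = pos
    return " ".join(parts)


# Example (8), English and Spanish, tokenized on whitespace.
en = ("calling on the commission to keep us regularly informed "
      "about recent developments").split()
es = ("pidiendo a la comisión que nos mantenga informados "
      "periódicamente de lo que vaya ocurriendo").split()

# keep [] informed about  <->  mantenga informados [] de
pair = UnitAlignment(source=(5, 8, 9), target=(6, 7, 9))

print(render(en, pair.source))
print(render(es, pair.target))
print(is_contiguous(pair.source), is_contiguous(pair.target))
```

Storing positions rather than strings keeps each unit atomic while leaving the inserted words (us regularly, periódicamente) free to be aligned separately.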

Figure 4: Alignment of took [] interest in

Figure 4 represents the alignment of the non-contiguous prepositional verb took [] interest in with the corresponding reflexive prepositional verb s’occuper de in French.

5 Advantages of the Logos Model

Former word alignment techniques, even when they contemplated multiword unit alignments, were unable to present a consistent and efficient solution to process non-contiguous expressions. The advantage of the Logos Model with regards to non-contiguous

multiword units is its ability to relate constituents that are apart (even very far apart) in the sentence. Semtab is an effective way of analysing and translating words in context, especially when the context is remote. In addition to this, Semtab also allows generalizing between alternative forms of the same multiword, phrase or expression. For example, it presents the possibility of generalizing translations of take a walk to translations of walk, if one of these two is found in the training corpus. Similarly, closed class items or highly frequent multiwords and phrases might be learnt quickly and be translated correctly by an SMT system, but open class items or less frequent multiwords and phrases might present more challenging problems that can be observed in MT translations, but also in non-native speakerisms, such as the choice of a support verb for a particular support verb construction (e.g., make a visit or pay a visit?), which can be robustly corrected by the use of Semtab.

Independently of the MT approach, the most important consideration with respect to multiword units is that they should never be processed on a word-for-word basis, because they represent atomic semantico-syntactic and translation units and cannot be broken down into constituent parts in any alignment process. Given that SMT translation quality depends on the quality of the alignments, a better representation of multiword units and greater amounts of training data are necessary. Only a more general representation and access to lexica will have an impact on unseen multiwords. Therefore, linguistic knowledge "elicited" in the alignment process and the use of a more refined alignment tool can solve some of the problems related to multiword unit alignment, when it is so relevant that these alignments mirror the unity of the expression.

6 Analysis of Preliminary Results

Taking into account the search performed in section 4 and the corresponding results summarised in Table 1, we have analysed the first 20 sentences extracted from the subset corpus for each one of the multiword cases. In order to assess the current translation quality of each one of the previously described cases, we translated each sample using Google Translate and performed an empirical evaluation of the achieved results.

For the support verb construction bring [] to a conclusion, we categorized all 20 translations as incorrect, inadequate or non-optimal. Example (9) illustrates a literal, unnatural Portuguese translation trazer a uma conclusão for this support verb construction, where a paraphrase of it, such as concluir or terminar este dossier, represents a higher quality translation.

(9) EN - The Council is full of good intentions to do all it can to bring this dossier to a conclusion
PT GT - O Conselho está cheio de boas intenções para fazer todo o possível para *trazer este dossier a uma conclusão

We also categorized all 20 translations of sentences with the support verb construction set [] in motion as incorrect. Example (10) illustrates a literal, incorrect translation estabeleceu [] em movimento, instead of iniciou or pôs em marcha.

(10) EN - it was the Polish Solidarnosc movement which set the downfall of the Soviet superpower in motion 20 years ago
PT GT - foi o movimento Solidarnosc polonês que *estabeleceu a queda da superpotência soviética em movimento há 20 anos

For the support verb construction play [] role, we categorized 8 of the 20 translations as incorrect, inadequate or non-optimal. Example (11) illustrates a literal, incorrect translation jogar o papel, instead of desempenhar o papel.

(11) EN - the European Parliament is not prepared to simply play the role of observer
PT GT - o Parlamento Europeu não está preparado para simplesmente *jogar o papel de observador

For the support verb construction take [] interest in, we categorized 16 of the 20 translations as incorrect, inadequate or non-optimal. Example (12) illustrates the consequences that an incorrect approach to non-contiguous support verb constructions (and other multiword units) has on translation, which is responsible for the incorrect agreement between noun (interesse is masculine) and adjective (morna is feminine), but also for the non-optimal translation of the support verb. A higher quality translation would use the non-elementary support verbs manifeste or demonstre, in the present subjunctive instead

of in the infinitive form (unlike the incorrectly chosen support verb form ter).

(12) EN - It is unacceptable for the Commission only to take a lukewarm interest in a country
PT GT - É inaceitável que a Comissão só a *ter um interesse morna em um país

For the support verb construction keep informed about, we categorized 9 of the 20 translations as incorrect, inadequate or non-optimal. Example (13) illustrates an incorrect translation tem [...] manteve informados sobre for this support verb construction, which should be translated as (que nos) tem mantido informados, or (que nos) tem informado, among others.

(13) EN - We have a Commissioner who has played and still is playing a major role in this enlargement, who has constantly kept us informed about what he was doing and with whom we have clearly always been on the same wavelength from a political point of view.
PT GT - Temos um Comissário, que desempenhou e continua a desempenhar um papel importante no este alargamento, que *tem constantemente nos *manteve informados sobre o que ele estava fazendo e com quem temos claramente sido sempre no mesmo comprimento de onda de um ponto de vista político

In addition to the lexical problems related to the translation of non-contiguous multiword units, there are also structural errors, such as lack of agreement (e.g., para nos manter regular e estreitamente *informado sobre; que o Parlamento *ser bem *informados sobre) and incorrect word order (se conseguirmos *a adoptar e defini-lo em movimento), among others.

Even though we have analysed just a few cases, the findings point to a general lack of quality in the translation of non-contiguous support verb constructions, which appears to be true also for other types of non-contiguous multiword units and phrasal expressions. A broader quantification of the phenomenon would help validate our preliminary results. Subsequent work could evaluate the performance of hierarchical phrase-based, syntax-based, and neural network translation models, which have the theoretical capacity to learn non-contiguous expressions.

7 Conclusions and Future Directions

This paper aims to prove that standard MT systems can benefit significantly by assuming a correct processing of non-contiguous multiword units, which current approaches are not exploring efficiently. The amount of post-editing effort can be reduced by increasing the quality of the alignments.

The processing, recognition and translation of non-contiguous support verb constructions is a challenging problem when using alignment techniques. Some methodologies are inefficient in the sense that they violate the intrinsic property of the unit as an atomic group of elements when aligning them individually or when not respecting the correct boundaries of the unit.

Another problem concerns manual multilingual alignment scarcity and lack of linguistically rich alignment guidelines. Previously proposed word alignment guidelines cover cross-linguistic phenomena superficially, excluding the important alignment challenges (and challenges to machine translation) presented by non-contiguous support verb constructions and other multiwords and phrasal units.

This paper presented the reasons why correct and non-ambiguous alignment of non-contiguous units is important, and showed how alignment challenges have been addressed in the Logos Model. This model inspired us to create an alignment methodology that allows the correct alignment of non-contiguous multiwords and other phrasal units, which we were able to represent graphically in CLUE-Aligner, an alignment tool that handles non-adjacent structures in an appropriate way.

A strategic follow-up of this experimental research is to extract translation rules from manually annotated corpora and enhance an initially created Gold standard of manual annotations of bilingual alignment pairs to feed CLUE-Aligner, based on alignment decisions documented in the work in progress set of CLUE Alignment Guidelines. Therefore, future work aims at enhancing CLUE-Aligner to align and extract automatically large amounts of alignment pairs to be applied to MT case studies, with an ultimate goal to improve translation applications. Linguistically-based alignments extracted from good quality translation corpora can contribute to increased precision and recall in SMT systems, with the subsequent improvement of translation quality. They are also a valuable asset for applications that require monolingual paraphrases.

This paper could not be ended without

underlining the great importance of paraphrases in the translation process. Future machine translation requires paraphrastic knowledge that allows choosing, among possible translations, the best translation for a multiword unit or phrasal expression in the particular sentence where it occurs (i.e., in context), as illustrated in example (14).

(14) EN - It is time to bring this issue to a conclusion
EN - We must bring this episode to a conclusion
PT - Está na altura de resolver esta questão
PT - Chegou a hora de concluir este assunto
PT - Ponhamos um ponto final neste tema
PT - Temos de concluir este episódio.

Acknowledgements

This research work was supported by Fundação para a Ciência e Tecnologia (FCT), under reference UID/CEC/50021/2013, project eSPERTo – EXPL/MHC-LIN/2260/2013, and post-doctoral grant SFRH/BPD/91446/2012. The authors wish to thank Brigitte Orliac and Bud Scott for commenting on the portion of this paper related to the Logos Model.

References

Timothy Baldwin and Su Nam Kim. 2010. Multiword Expressions. In Nitin Indurkhya and Fred J. Damerau, editors, Handbook of Natural Language Processing, Second Edition. CRC Press, Taylor and Francis Group, Boca Raton, FL. ISBN 978-1420085921.

Anabela Barreiro, Bernard Scott, Walter Kasper, and Bernd Kiefer. 2011. OpenLogos Rule-Based Machine Translation: Philosophy, Model, Resources and Customization. Machine Translation, 25(2):107–126.

Anabela Barreiro, Johanna Monti, Brigitte Orliac, and Fernando Batista. 2013. When Multiwords Go Bad in Machine Translation. In Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technology, Machine Translation Summit XIV.

Anabela Barreiro, Johanna Monti, Brigitte Orliac, Susanne Preuss, Kutz Arrieta, Wang Ling, Fernando Batista, and Isabel Trancoso. 2014. Linguistic Evaluation of Support Verb Constructions by OpenLogos and Google Translate. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), pages 35–40. ELRA.

Anabela Barreiro, Francisco Raposo, and Tiago Luís. 2016. CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units. In Nicoletta Calzolari et al., editor, Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), pages –. ELRA.

Anabela Barreiro. 2009. Make it Simple with Paraphrases: Automated Paraphrasing for Authoring Aids and Machine Translation. Ph.D. thesis, Universidade do Porto, Portugal.

Philipp Koehn. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Conference Proceedings: the tenth Machine Translation Summit, pages 79–86, Phuket, Thailand. AAMT.

Valia Kordoni and Iliana Simova. 2014. Multiword expressions in machine translation. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, May. ELRA.

Bernard (Bud) Scott. 2003. The Logos Model: An Historical Perspective. Machine Translation, 18(1):1–72.

Dhouha Semmar. 2012. Identifying Bilingual Multi-Word Expressions for Statistical Machine Translation. In Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC12), pages 23–25. ELRA.

Libin Shen, Jinxi Xu, Bing Zhang, Spyros Matsoukas, and Ralph Weischedel. 2009. Effective use of linguistic and contextual information for statistical machine translation. In EMNLP 09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 72–80.

Jörg Tiedemann. 2003. Combining clues for word alignment. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 12–17, Budapest, Hungary.

Jörg Tiedemann. 2011. Bitext Alignment. Morgan and Claypool.


Discontinuous VP in Bulgarian

Elisaveta Balabanova University of Library Studies and Information Technologies UniBIT [email protected]

Abstract

This paper presents Bulgarian discontinuous constituents.1 Bulgarian is claimed to be a language of relatively free word order. As a typical manifestation of free word order, discontinuous constituents in Bulgarian have not been studied so far. The paper discusses and analyzes the freedom in Bulgarian word order and points out the way discontinuity has been treated within BulTreeBank. We show the results of our linguistic analysis of discontinuous VPs and summarize the extent of word order freedom and word order constraints within VP.

1 In our definition "constituent" is the same as "phrase". For the parts of the constituent we use the term "elements of the constituent".

1 Introduction

It is well known that discontinuous constituents are a typical manifestation of free word order. It is also claimed that discontinuity is characteristic of languages with rich morphology. Bulgarian shares both of the above features. Scientists working on word order problems in Bulgarian debate whether Bulgarian is a configurational or non-configurational language. Most of them share the belief that Bulgarian is a configurational language, but has some non-configurational features (i.e. the free permutation of the elements within VP) (Penchev, 1991).

By exploring the issue of discontinuity in VP we aim to show the extent of word order freedom in Bulgarian and the restrictions on this freedom, coming from semantics.

The structure of the paper is as follows: In Section 2 discontinuous constituents and the theories of "free" word order in Bulgarian are discussed; in Section 3 discontinuous constituents are presented within BulTreeBank; Section 4 deals with the types of discontinuity in VP and Section 5 concludes the paper.

2 Discontinuous constituents and the theories of "free" word order in Bulgarian

Researchers on Bulgarian word order have so far noticed that there is a greater word order freedom within the verb phrase than within other phrases (Rudin, 1986; Penchev, 1991). Scientists show that the variety of word order models usually is due to the influence of discourse on word ordering. In the tradition of Bulgarian word order investigations, especially in the first half of the 20th century, information packaging was taken as one of the most prominent manifestations of discourse. Thus much of the research in the field of word order was devoted to the connection between word order and information packaging.

Sv. Ivanchev (Ivanchev, 1975) is the first scientist who spread the ideas of the Prague linguistic school in Bulgaria. The relation between word order and information packaging is investigated by a number of researchers (Georgieva, 1974; Brezinski, 1995; Avgustinova, 1997; Tisheva, 2003; Tisheva and Djonova, 2002; Tisheva and Djonova, 2004a; Tisheva and Djonova, 2004b; Tisheva, 2013).

The interrelation between intonation, word order and information packaging is a topic of research of another Bulgarian linguist – Jordan Penchev (Penchev, 1980). J. Penchev aims to describe the main intonation types of Bulgarian sentences and


Proceedings of DiscoNLP 2016, pages 31–36, San Diego, California, June 17, 2016. © 2016 Association for Computational Linguistics

for this reason he uses the information from the relationship between semantics and information packaging. There are several researchers who investigate particular word order constructions, but as a whole we can summarize that all the researchers claim that there are a number of factors of a structural (syntactic), discourse and prosodic nature which affect the word order models in Bulgarian. The different combinations of these factors give rise to a larger number of word order combinations within some Bulgarian phrases (the VP especially) than in others (NP, AP), which is the reason for the researchers to discuss whether Bulgarian is a nonconfigurational language. Scientists deny this hypothesis, showing that the word order freedom in some phrases is an isolated phenomenon, which cannot be taken as a sign of nonconfigurationality. They claim that within the structure of Bulgarian there is a combination of configurational and nonconfigurational features.

Based on the above mentioned assumptions, all the researchers share the opinion that Bulgarian has a rather free or relatively free word order, but no one has pointed out precisely the extent of the word order freedom and the restrictions on word order. Also, no one so far has studied discontinuity in Bulgarian, so our survey is the first attempt to analyze one of the most frequent types of discontinuous VPs and the factors causing discontinuity.

3 Representation of discontinuous constituents within BulTreeBank

Investigating discontinuity within a corpus is a good way to investigate the extent of word order freedom and word order constraints. For this purpose we use BulTreeBank, a corpus of syntactic trees of Bulgarian sentences. Constituency within the treebank is represented via graphs, which are defined on the basis of the mother-daughter relation (Simov and Osenova, 2004). Graphs are chosen to stay close to the context-free tree representation (Simov and Osenova, 2004). In the syntactic trees the original word order is preserved and discontinuous elements are introduced where necessary (Simov and Osenova, 2004). In the examples from the treebank VPC stands for a verb-complement phrase, VPS – for a verb-subject phrase, VPA is a head-adjunct phrase and VPF – a head-filler phrase, which has an extracted element, realized outside the phrase.

There are three types of discontinuous constituents in the treebank.

3.1 Functional element DiscA

This is when a higher dependent is realized between the head and lower dependent/s. For the word order to be preserved, the higher element is marked up with the functional element DiscA (Discontinuous adjunct) and is annotated at a higher place with the functional element nid (nonimmediate dominance). Then the elements DiscA and nid are connected with the same index, seen as a line in the tree below (Simov and Osenova, 2004).

Figure 1: Sentence from the treebank with DiscA element: Izvednazh telefonat pak zvanna. (Suddenly, the telephone again rang.)

3.2 Functional element DiscM

DiscM stands for discontinuous mixture. This is a mixture of two constituents. The elements of two constituents are mixed with neither of the two being a governor of the other (Simov and Osenova, 2004). This is a very rare case of discontinuity and has only two or three occurrences in the treebank.
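The notion of discontinuity described in this section can be illustrated with a minimal Python sketch. It assumes a toy tree encoding in which every word carries its surface position, and it treats a constituent as discontinuous when the positions of its words do not form an unbroken range. The classes `Leaf` and `Node`, the helper names, and the bracketing of the example sentence are illustrative assumptions; they are not the BulTreeBank format, and the sketch does not model the DiscA/nid indices.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    word: str
    position: int  # 0-based position of the word in the surface string


@dataclass
class Node:
    label: str
    children: List[Union["Node", Leaf]] = field(default_factory=list)


def yield_positions(node: Union[Node, Leaf]) -> List[int]:
    """Return the sorted surface positions of all words under a node."""
    if isinstance(node, Leaf):
        return [node.position]
    positions: List[int] = []
    for child in node.children:
        positions.extend(yield_positions(child))
    return sorted(positions)


def is_discontinuous(node: Union[Node, Leaf]) -> bool:
    """A constituent is discontinuous if its word positions have a gap."""
    positions = yield_positions(node)
    return any(b - a > 1 for a, b in zip(positions, positions[1:]))


def discontinuous_labels(node: Union[Node, Leaf]) -> List[str]:
    """Collect the labels of all discontinuous constituents, top-down."""
    if isinstance(node, Leaf):
        return []
    found = [node.label] if is_discontinuous(node) else []
    for child in node.children:
        found.extend(discontinuous_labels(child))
    return found


# Hypothetical bracketing of "Izvednazh telefonat pak zvanna":
# the verb-subject phrase (telefonat ... zvanna) is interrupted by "pak".
tree = Node("VPA", [
    Leaf("Izvednazh", 0),
    Node("VPA", [
        Leaf("pak", 2),
        Node("VPS", [Leaf("telefonat", 1), Leaf("zvanna", 3)]),
    ]),
])

print(discontinuous_labels(tree))  # ['VPS']
```

Under this assumed bracketing only the verb-subject phrase is reported, because the adjunct "pak" stands between the subject and the head verb while both VPA projections cover an unbroken stretch of words.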


Figure 2: Sentence from the treebank with DiscM element: Malki go momi beriaha. (Little it girls picked up; i.e. little girls picked it up.)

3.3 Functional element DiscE

This is external realization of an inner constituent. This is the case of extraction (Simov and Osenova, 2004). Again the element DiscE (Discontinuous extraction) is marked with nid with the same index as the phrase where it has been extracted from.

Figure 3: Sentence from the treebank with DiscE and DiscA element: S tozi sluchai az sym svoevremenno zapoznat. (With this case I am on time informed.)

4 Types of discontinuity, based on the type of the element causing it

In this part we present our investigation of discontinuity within VP. VP is chosen as the most prominent example of free word order in Bulgarian (see Section 2). For the completion of the task it was necessary first of all to extract from the corpus all the sentences containing discontinuous VPs. The total number of sentences with discontinuity in the corpus is 4160, which makes about 35% of all the sentences in the treebank. After doing this, we had to select the types of discontinuities within VP. We found out that there are 2 main groups of discontinuity in Bulgarian VP: i) discontinuity caused by an element which is part of the syntactic structure of the sentence (the element causing discontinuity in this case is marked up with DiscA in the treebank) and ii) discontinuity caused by an element which is not part of the syntactic structure of the sentence (the element causing discontinuity in this case is marked up with the tag Pragmatic element in the treebank). In this paper we will not deal with discontinuities of the second type. We focus only on discontinuities caused by elements which are part of the syntactic structure of the tree. These elements are: adjuncts; extracted complements of the head verb; and the subject. Here we will focus only on discontinuity caused by adjuncts.

4.1 Discontinuity, caused by adjuncts

Within BulTreeBank, discontinuities caused by adjuncts form the largest group of discontinuities (67% of all the sentences with discontinuity in VP). In the treebank the sentences are annotated along the lines of HPSG (Pollard and Sag, 1994). Thus, according to the theoretical frame we use, adjuncts are attached as sisters of the saturated VP phrase, i.e. when the verb has realized its dependents – complement/s and subject (if there is a subject in the sentence, since Bulgarian is a pro-drop language). Only after the verb has taken its dependents and formed either a VPC (verb-complement phrase) or a VPS (verb-subject phrase) is the adjunct attached to this VPC or VPS phrase, forming a VPA phrase. This is the usual case, when adjuncts are realized linearly without causing discontinuity. In this linear realization of the adjunct the latter modifies semantically the saturated VP. On the contrary, in the cases of discontinuities the adjuncts are realized linearly first and the dependents of the verb (subject and/or complements) – afterwards. In such sentences the projection of the VPA phrase is higher up in the tree and the linear intersection is seen as a line in the graph (see Section 3.1). There are two cases of discontinuity in VP caused by an adjunct: i) the adjunct


is realized between the subject and the head verb; ii) the adjunct is realized between the head verb and the complement.

Before starting the linguistic analysis, we came across one problem. Namely, the adjuncts were not classified by types in the treebank. Therefore, we needed to have a classification of adjuncts first and then annotate manually all the adjuncts in the sentences with discontinuities, according to this classification. Only afterwards could we extract the sentences with discontinuities by types of the adjuncts. The classification of adjuncts we used is based on GSBKE (GSBKE, 1983) and contains the following types of adjuncts: adjuncts of time, of manner, of quantity and degree, of place, adjuncts of second predication, of condition, of reason, and of aim. In all the sentences with an adjunct causing discontinuity, we annotated the adjuncts manually along this classification. Then it was possible to extract the groups of discontinuities by the type of the adjunct. This allowed us to draw conclusions about the reasons causing discontinuous linear realization of the elements of the VP.

4.1.1 Discontinuity between the subject and the head verb

This is the biggest group of sentences with discontinuities caused by adjuncts.

Here is the proportional distribution of sentences with adjuncts causing discontinuity between the subject and the head verb in VP: Adjuncts of time – 45%; Adjuncts of manner – 31%; Adjuncts of quantity and degree – 11%; Adjuncts of place – 5%; Adjuncts of second predication – 3%; Adjuncts of condition – 2%; Adjuncts of reason – 1%; Adjuncts of aim – less than 1%.

Figure 4: Sentence from the treebank with adjunct between the head verb and the complement: Puskat v Evropa (adjunct of place) akcii na Intersputnik. (They release in Europe shares of Intersputnik.)

The information packaging2 in this type of sentences follows two main patterns:

1) The adjunct is part of the Ground.

Example: Ground [link [V tzivilizovania sviat] tail [chovek prez celia si zhivot (adjunct of time)]] Focus [pazi „svetaia svetih“ na svoiata reputacia – svoeto kreditno dosie]. (In the civilized world one, during his whole life, keeps the most precious of his reputation – his credit history.)

In sentences with such discontinuity and communicatively marked word order3 the adjunct can take the information value of either tail or link of the tail. In sentences with communicatively unmarked word order the information value of the adjunct is only a tail.

2) The adjunct is part of the Focus.

2 For analysis of information packaging we use the methodology of Engdahl and Vallduvi (Engdahl and Vallduvi, 1994; Engdahl and Vallduvi, 1996), where Focus is the actual information of the sentence, Ground is what is presupposed by the information at the output. Sentences have Ground only if the context ensures it. The Link is the particular place in the sentence for introduction of the new information and the Tail points out that there is a need for information update in this part of the discourse.

3 For the relation between word order and information packaging we use the model of T. Avgustinova (Avgustinova, 1997), in which there are 4 types of word order, according to the information packaging: communicatively unmarked, parenthetical, communicatively marked and emphatic.


Example: Ground [Vseki opit za ocenka na organiziranata prestapnost v izmereniata na nacionalnata sigurnost] Focus [zadalzhitelno (adjunct of manner) predpolaga predvaritelno da se utochniat obhvatyt i sadarzhanieto na samoto poniatie]. (Any attempt to estimate organized crime in the context of national security obligatorily presupposes first defining the scope and content of the notion itself.)

4.1.2 Discontinuity between the head verb and the complement

The position right after the head verb in VP has been investigated by a number of researchers (Rudin, 1986; Avgustinova, 1997; Penchev in: Boyadzhiev, Kutzarov, Penchev, 1999; Tisheva, 2000; Tisheva, 2013).

Here is the proportional distribution of sentences with adjuncts causing discontinuity between the head verb and the complement in VP: Adjuncts of manner – 40%; Adjuncts of time – 30%; Adjuncts of quantity and degree – 11%; Adjuncts of place – 9%; Adjuncts of second predication – 4%; Adjuncts of aim – 2%; Adjuncts of condition – less than 1%; Adjuncts of reason – less than 1%.

Figure 5: Sentence from the treebank with adjunct between the subject and the head verb: Izvednazh telefonat pak (adjunct of manner) zvynna. (Suddenly the telephone again rang.)

In sentences with adjuncts between the head verb and the complement the adjunct becomes part of the focus.

Example: Razbira se, choveshko e da se sbyrka, no tuk Focus [znam mnogo dobre (adjunct of manner) za kakvo stava duma]. (Of course, it’s human to make mistakes, but I know exactly what we’re talking about.)

4.2 Conclusion about the word order models with discontinuity, caused by adjuncts

From our linguistic analysis we can summarize that the factors which rule the realization of adjuncts within VP are:

1. The information packaging within the sentence, which depends on
2. The semantics of the adjuncts. According to the semantics of the adjunct and according to which part of the sentence the adjunct is syntactically attached, we distinguish4 two types of adjuncts: i) sentential (they modify semantically the whole sentence) and ii) phrasal (they modify semantically a particular element of the VP).

Most of the adjuncts in Bulgarian modify semantically the whole sentence (these are the adjuncts of time, of place, of condition, of reason and of aim). Syntactically, these adjuncts are realized as sisters of the saturated VP. Thus their linear realization within VP is only a result of the particular information packaging that the speaker chooses to make in his utterance.

The word order realization of the phrasal adjuncts (adjuncts of manner, of quantity and degree and adjuncts of second predication) is restricted by semantic constraints. The semantic scope of these adjuncts – i.e. over a particular element of the VP5 – demands that they are realized in contact with the element they semantically modify (the contact position can be pre- or postposition).

4 This distinction is already known for other languages, but the author is the first one who defines it for Bulgarian.

5 For this we use the term “narrow semantic scope”.

Since discontinuous constituents are a typical manifestation of free word order, we can summarize that the word order freedom within Bulgarian VP is a result of different information packaging. The constraints on word order, though, come from semantics. This means that whenever adjuncts with narrow semantic scope are realized within VP, their semantics poses restrictions on word order, since the adjunct has to be realized in contact with the element of the VP it semantically modifies. The realization of the adjunct in contact with the element it modifies semantically (in pre- or postposition to


this element) results in syntactic discontinuity of the phrase.

5 Conclusion

In this paper we have reviewed the theories about Bulgarian word order in the light of discontinuous constituents. We have shown how discontinuous constituents have been presented within BulTreeBank. We have also pointed out the types of discontinuous constituents and presented our linguistic analysis of the discontinuities caused by adjuncts. We have described the reasons for linear realization of adjuncts within VP and we have also summarized the factors which trigger word order freedom and impose word order constraints on the elements of VP, thus pointing out the precise extent of word order freedom in Bulgarian, which had not been studied thoroughly so far.

References

Tanya Avgustinova. 1997. Word order and clitics in Bulgarian. Saarbrücken dissertations in computational linguistics and language technology. Vol. 5. Universität des Saarlandes.

Stefan Brezinski. 1995. Kratak balgarski sintaksis. UI „Sv. Kliment Ohridski“. Sofia.

BulTreeBank. http://www.BulTreeBank.org/

Engdahl and Vallduvi. 1994. Information packaging and grammar architecture: A constraint-based approach. In E. Engdahl, editor, Integrating Information Structure into Constraint-based and Categorial Approaches, volume R.1.3.B of DYANA, Edinburgh, 41-79.

Engdahl and Vallduvi. 1996. Information packaging in HPSG. In Grover and Vallduvi, 1996, 113-128.

Elena Georgieva. 1974. Slovored na prostoto izrechenie v balgarskia knizhoven ezik. Izd. BAN. Sofia.

GSBKE. 1983. Gramatika na savremennia balgarski knizhoven ezik. 1983. Tom 3. Izd. BAN. Sofia.

Svetomir Ivanchev. 1957. Nabljudenia varhu upotrebata na chlena. In: Sv. Ivanchev. Prinosi v balgarskoto i slavianskoto ezikoznanie. Sofia. 1978.

Jordan Penchev. 1980. Osnovni intonacionni konturi v balgarskoto izrechenie. Izd. BAN. Sofia.

Jordan Penchev. 1991. Nekonfiguracionni iavlenia v balgarskia sintaksis, sp. Balgarski ezik, kn. 6, Izd. BAN. Sofia. Bulgaria.

Carl Pollard and Ivan Sag. 1994. Head-Driven Phrase Structure Grammar. The University of Chicago Press.

Katrin Rudin. 1986. Aspects of Bulgarian syntax: complementizers and wh-constructions. Slavica Publishers, Inc.

Kiril Simov and Petya Osenova. 2004. BTB-TR05: BulTreeBank Stylebook. http://www.BulTreeBank.org/TechRep/BTB-TR05.pdf

Jovka Tisheva. 2003. Bulgarian yes-no questions with particles nali and nima. In: Investigations into Formal Slavic Linguistics. Contributions of the 4th European Conference in Formal Description of Slavic Languages. Peter Kosta, Janna Blaszczak, Jens Frasek, Ljudmila Geist, Marzena Zygis (eds.). Peter Lang, Europäischer Verlag der Wissenschaften, Frankfurt am Main, 715-729.

Jovka Tisheva. 2013. Pragmatichni aspekti na ustnata rech. Litera et Lingua Series Dissertations. http://slav.uni-sofia.bg/naum/liliseries/diss/2013/3

Jovka Tisheva, Marina Djonova. 2002. Information structure and clitics in TreeBanks. In: Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT 2002). Sozopol. Bulgaria.

Jovka Tisheva, Marina Djonova. 2004a. Stariat nov topic. In: VII Nacionalni slavistichni chetenia. Sofia.

Jovka Tisheva, Marina Djonova. 2004b. Za niakoi slovoredni modeli za topicalizacia na razgovornata rech. In: Sedma nauchna konferencia po problemite na razgovornata rech. Veliko Tarnovo.

Discontinuous Genitives in Hindi/Urdu

Sebastian Sulger Department of Linguistics University of Konstanz [email protected]

Abstract

This paper discusses genitive phrases in Hindi/Urdu in general and puts a particular focus on genitive scrambling, a process whereby the basic order of constituents is changed. In Hindi/Urdu, genitive phrases may not only occur at different structural positions within the NP that they modify; under the right circumstances, they can also be found outside of the NP, yielding discontinuous structures. The theoretical challenge is to identify and formalize the linguistic constraints that govern genitive scrambling. Further, a successful computational treatment correctly attaches the genitive phrase to its head NP. I use a Lexical-Functional Grammar to solve both challenges, demonstrating that the constraints can be aptly formulated using a functional uncertainty path. Successful attachment further depends on the morphological agreement of the genitive phrase with its head. On a theoretical level, the present contribution sheds light on the possibilities of NP discontinuities in a morphologically rich language like Hindi/Urdu.

1 Introduction

Discontinuous constituents offer particular challenges for various NLP applications, such as question-answering, coreference resolution or topic modeling. This paper relates to an application that is further up the NLP toolchain: syntactic parsing. Here, the main challenges lie in:

• adapting the parser to be able to process the discontinuous structures;
• reconstructing the dependencies in the analysis, i.e., attaching the discontinuous parts to their syntactic heads.

Third, from a theoretical linguistic point of view, one would also want to derive generalizations about what kinds of discontinuities are possible, and what kinds do not appear. Depending on the language studied, investigating such constraints is helpful since they can provide cross-linguistic insight into the phenomenon of discontinuity, and why it can or cannot take place.

This paper presents a study of discontinuous NPs in the morphologically-rich South Asian language Hindi/Urdu.1 The focus is on genitive NP modifiers, which display a great deal of discontinuity. As will be seen below, in the right configurations, they may be scrambled out of their NP domain, removing them from the heads that they modify. Neither the phenomenon itself nor the configurations that allow for it have been previously discussed in the literature.

1 The two languages Hindi and Urdu are so closely related that many researchers in linguistics treat them as a single language, Hindi/Urdu. Differences between Urdu and Hindi are mainly in the script (Urdu uses a version of Arabic script, while Hindi uses Devanagari) as well as in the vocabulary (Urdu uses more Persian and Arabic vocabulary, and Hindi evolved from Sanskrit). There are further minor differences in the phonology as well as in the derivational morphology; the syntax is almost identical.

The paper contributes to solving all three of the above challenges. It discusses the empirical properties of the Hindi/Urdu genitive in general as well as genitive discontinuity, investigated by collecting data from native speakers and searching the


Proceedings of DiscoNLP 2016, pages 37–46, San Diego, California, June 17, 2016. © 2016 Association for Computational Linguistics

Hindi/Urdu Treebank (Bhatt et al., 2009) (§2,3,4). I arrive at a couple of theoretical generalizations, which can be aptly formulated via functional uncertainty within the framework of Lexical-Functional Grammar (LFG, Dalrymple (2001)). I suggest that the possibility of the genitive to appear outside its NP is a result of the rich agreement between the genitive case marker and the NP head. Finally, I describe how the Hindi/Urdu ParGram grammar (Butt and King, 2007; Bögel et al., 2009), a computational LFG grammar developed as part of the ParGram project (Sulger et al., 2013; Butt et al., 2002) and implemented in XLE (Crouch et al., 2015), is adapted to parse and correctly attach discontinuous genitives to their NPs (§5).2 The paper concludes in §6.

2 The Hindi/Urdu ParGram grammar can be tested using the INESS website at http://iness.uib.no/.

2 General Description

The genitive case in Hindi/Urdu is realized using the clitic k-, which is attached to a possessor NP. Under the analysis of Hindi/Urdu case in Butt and King (2004), which I adapt here, all case clitics functionally head a KP (case phrase).3 The genitive differs from other case clitics: it agrees in number, gender and morphological form (nominative or oblique) with the head noun, the possessum. For the feminine, there is morphological syncretism in that a single form ki is used throughout the feminine inflectional pattern. For the masculine, there is syncretism between the singular oblique and plural nominative and oblique. Table 1 shows the complete pattern of the clitic. In (1)–(3), the a. examples are valid NPs, displaying the correct agreement pattern.

3 The status of the case marker as a clitic is not of direct importance here; the interested reader is referred to Butt and King (2004) for a comprehensive discussion.

Gender     Number    Inflection  Form
Masculine  Singular  Nominative  ka
                     Oblique     ke
           Plural    Nominative  ke
                     Oblique     ke
Feminine   Singular  Nominative  ki
                     Oblique     ki
           Plural    Nominative  ki
                     Oblique     ki

Table 1: Possible inflections of Hindi/Urdu genitive case clitic k-

(1) a. ram=ka mAkan
       Ram.M.SG=GEN.M.SG house.M.SG
       ‘Ram’s house’
    b. * ram=ki mAkan
       Ram.M.SG=GEN.F.SG/PL house.M.SG
    c. * ram=ke mAkan
       Ram.M.SG=GEN.M.PL house.M.SG

(2) a. nina=ki bet.i
       Nina.F.SG=GEN.F.SG daughter.F.SG
       ‘Nina’s daughter’
    b. * nina=ka bet.i
       Nina.F.SG=GEN.M.SG daughter.F.SG
    c. * nina=ke bet.i
       Nina.F.SG=GEN.M.PL daughter.F.SG

(3) a. nadya=ke bet.e
       Nadya.F.SG=GEN.M.PL son.M.PL
       ‘Nadya’s sons’
    b. * nadya=ka bet.e
       Nadya.F.SG=GEN.M.SG son.M.PL
    c. * nadya=ki bet.e
       Nadya.F.SG=GEN.F.SG/PL son.M.PL

Within NPs, the modifying possessor phrase comes first, then the possessum (i.e., the head of the NP); this conforms to the general clausal word order in Hindi/Urdu, which is head-final (Mohanan, 1994; Butt, 1995). The position of the genitive phrase varies with respect to other NP modifiers, such as adjectives or quantifiers; see (4) for an example. NP modifiers occurring after the NP head are judged as ungrammatical by the informants; see (4c) for an example. Another example illustrating the variable word order inside the NP is shown in (5).

(4) a. ram=ki nili gar.i
       Ram.M.SG=GEN.F.SG blue.F.SG car.F.SG
       ‘Ram’s blue car’
    b. nili ram=ki gar.i
       blue.F.SG Ram.M.SG=GEN.F.SG car.F.SG
       ‘Ram’s blue car’
    c. * nili gar.i ram=ki
       blue.F.SG car.F.SG Ram.M.SG=GEN.F.SG

(5) a. Ustad=ka kUch hoSyar talIb-Ilm
       teacher.M.SG=GEN.M.SG some smart student.M.PL
       ‘some smart students of the teacher’
    b. Ustad=ka hoSyar kUch talIb-Ilm
    c. kUch Ustad=ka hoSyar talIb-Ilm
    d. kUch hoSyar Ustad=ka talIb-Ilm
    e. hoSyar kUch Ustad=ka talIb-Ilm
    f. hoSyar Ustad=ka kUch talIb-Ilm

The constraint that NP modifiers have to precede their head inside the NP is corroborated by data such as in (6b) (a permutation of (6a)). Here, the genitive occurs after the NP head, bet.e ‘sons’, which is itself marked with the ergative case. The fact that (6b) is ungrammatical is a clear indication that the genitive phrases cannot be right-adjoined to the NP head.

(6) a. [[nadya=ke do bet.e]NP=ne]KP gar.i=ko cAla-yi hE
       Nadya.F.SG=GEN.M.PL two son.M.PL=ERG car.F.SG=ACC drive-PERF.F.SG be.PRES.3.SG
       ‘Nadya’s two sons have driven the car.’
    b. * [[do bet.e nadya=ke]NP=ne]KP gar.i=ko cAla-yi hE
       two son.M.PL Nadya.F.SG=GEN.M.PL=ERG car.F.SG=ACC drive-PERF.F.SG be.PRES.3.SG

The Hindi/Urdu genitive has a wide functional distribution: it appears on adjuncts and nominal arguments. The LFG analysis of Sulger (to appear) is assumed here, which argues for a differentiated treatment of the genitive KP in terms of the grammatical functions (GF) subject/SUBJ (7a), object/OBJ (7b) and adjunct/ADJUNCT (7c).4

(7) a. ram=ki tIppAni
       Ram.M.SG=GEN.F.SG comment.F.SG
       ‘Ram’s comment/criticism’
    b. gar.i=ki tAbahi
       car.F.SG=GEN.F.SG destruction.F.SG
       ‘the car’s destruction’
    c. sUrx rAng=ki mez
       red color.M.SG=GEN.F.SG table.F.SG
       ‘the table of red color’

4 Due to space limitations, I do not go into detail regarding the treatment in Sulger (to appear). I will say, however, that the evidence includes binding of a reflexive pronoun as well as iterativity/optionality of nominal arguments vs. adjuncts. Note also that semantically, the genitive may realize various roles as a modifier, e.g., an agent in (7a), a patient in (7b), and an attribute in (7c). This semantic variety is known from many languages, including English; a semantic classification of the genitive is certainly not within the scope of this paper.

3 Genitive Scrambling

In addition to the variable word order inside NPs, there are examples showing that the genitive modifiers can occur outside of the NPs they modify. I will refer to this as Genitive Scrambling. In (8a), the genitive occurs in the canonical position inside the NP to the left of the head noun. In (8b), the genitive is scrambled outside of the subject NP to the end of the clause; still, it must be analyzed as a modifier of the head noun dost ‘friend’, since it cannot be argued to be an argument of the intransitive verb a ‘come’.

(8) a. ram=ka dost ay-a
       Ram.M.SG=GEN.M.SG friend.M.SG.NOM come-PERF.M.SG
       ‘Ram’s friend came.’ (Butt and Zinsmeister, 2009)
    b. dost ay-a ram=ka
       friend.M.SG.NOM come-PERF.M.SG Ram.M.SG=GEN.M.SG
       ‘Ram’s friend came.’ (Butt and Zinsmeister, 2009)

In (9a), the object gar.i ‘car’ is modified by the genitive Us=ki ‘her/his/its’. The genitive can be

scrambled out of the object to the beginning of the clause as in (9b). From the morphosyntax, it is clear that in (9b) the feminine-inflected Us=ki ‘her/his/its’ modifies gar.i ‘car’, since that is the only feminine nominal in the sentence. A very similar example is in (10).

(9) a. ram=ne Us=ki gar.i bazar=mẽ dekh-i
       Ram.M.SG=ERG PRON.3.SG.OBL=GEN.F.SG car.F.SG.NOM market.M.SG=LOC.IN see-PERF.F.SG
       ‘Ram saw her/his car in the market.’ (adapted from Bögel and Butt (2013), p. 301)
    b. Us=ki ram=ne gar.i bazar=mẽ dekh-i
       PRON.3.SG.OBL=GEN.F.SG Ram.M.SG=ERG car.F.SG.NOM market.M.SG=LOC.IN see-PERF.F.SG
       ‘His/her car, Ram saw in the market.’ (adapted from Bögel and Butt (2013), p. 301)

(10) a. tUm=ne kIs=ki kItab xArid-i?
        you=ERG who.SG.OBL=GEN.F.SG book.F.SG.NOM buy-PERF.F.SG
        ‘Whose book did you buy?’ (adapted from Bögel and Butt (2013), p. 301)
     b. kIs=ki tUm=ne kItab xArid-i?
        who.SG.OBL=GEN.F.SG you=ERG book.F.SG.NOM buy-PERF.F.SG
        ‘Whose book did you buy?’ (adapted from Bögel and Butt (2013), p. 301)

Genitives may also be scrambled to the right. In (11a), a permutation of (9a), the object is topicalized to the front of the clause. In (11b), the genitive phrase modifying the object is scrambled to the right and occurs after the subject. A similar example is given in (12), where kIs=ki ‘whose’ modifies kItab ‘book’, but is not in the same constituent.

(11) a. Us=ki gar.i ram=ne bazar=mẽ dekh-i
        PRON.3.SG.OBL=GEN.F.SG car.F.SG.NOM Ram.M.SG=ERG market.M.SG=LOC.IN see-PERF.F.SG
        ‘His/her car, Ram saw in the market.’ (adapted from Bögel and Butt (2013), p. 301)
     b. gar.i ram=ne Us=ki bazar=mẽ dekh-i
        car.F.SG.NOM Ram.M.SG=ERG PRON.3.SG.OBL=GEN.F.SG market.M.SG=LOC.IN see-PERF.F.SG
        ‘His/her car, Ram saw in the market.’ (adapted from Bögel and Butt (2013), p. 301)

(12) a. kIs=ki kItab tUm=ne xArid-i?
        who.SG.OBL=GEN.F.SG book.F.SG.NOM you=ERG buy-PERF.F.SG
        ‘Whose book did you buy?’ (adapted from Bögel and Butt (2013), p. 301)
     b. kItab tUm=ne kIs=ki xArid-i?
        book.F.SG.NOM you=ERG who.SG.OBL=GEN.F.SG buy-PERF.F.SG
        ‘Whose book did you buy?’ (Bögel and Butt (2013), p. 301)

Recall that the order within NPs is head-final. As seen in (11)–(12), however, when genitives are scrambled outside of their NP, this order is not necessarily preserved. Using the terminology of Fanselow and Féry (2006), I refer to scrambled genitives that occur before their heads in the sentence as non-inverted scrambled genitives, and to scrambled genitives that occur after their heads as inverted scrambled genitives.

It is a reasonable assumption that scrambling of genitive phrases is possible since the genitive displays rich morphology which agrees with its head,

enabling speakers to identify the nominal in the sentence modified by the genitive. Fanselow and Féry (2006) identify agreement inside NPs as a main factor influencing the availability of discontinuous NPs across languages, but there are also counter-examples against this generalization; Turkish, for example, has discontinuous NPs, in spite of the absence of agreement inside nominal projections.

4 Some Preferences and Constraints

The operation of genitive scrambling does not occur without constraints. This section sums up these constraints, which serve as the empirical background for the XLE implementation of genitive scrambling as described in §5. Each of the constraints was verified by intensive consultation with at least three native speakers.

4.1 Local Attachments are Preferred

Consider (13a), which involves a topicalized object. The possessor of that object can be scrambled to the right as in (13b). In cases such as (13b), Us=ki is either a scrambled genitive modifying gar.i ‘car’ or a canonical genitive locally attached to bag ‘park’; the agreement morphology does not rule out either. Where the agreement morphology permits both scrambled as well as locally attached genitives, local attachments are highly preferred. Here, informants judge Us=ki ‘his/her’ as modifying bag ‘park’, but acknowledge that it may also modify gar.i ‘car’.

(13) a. Us=ki gar.i nadya=ne bag=mẽ dekh-i
        PRON.3.SG.OBL=GEN.F.SG car.F.SG.NOM Nadya.F.SG=ERG park.F.SG=LOC.IN see-PERF.F.SG
        ‘Her/his car, Nadya saw in the park.’
     b. gar.i nadya=ne Us=ki bag=mẽ dekh-i
        car.F.SG.NOM Nadya.F.SG=ERG PRON.3.SG.OBL=GEN.F.SG park.F.SG=LOC.IN see-PERF.F.SG
        ‘The car, Nadya saw in her park.’
        preferred over
        ‘His/her car, Nadya saw in the park.’

The preference for local attachment is reflected in a principle well-known from cognitive science, first discussed by Kimball (1973) as the Right Association principle, and reformulated by Gibson (1991) as the Recency Preference.

4.2 Scrambling and Case

The examples above involve genitives that are scrambled out of bare NPs. Genitives may also be scrambled out of NPs that are overtly case-marked; in this case, inverted scrambled genitives are ungrammatical, and the genitive has to precede its head in the clause. Examples are shown in (14). In both sentences, ram=ke ‘Ram’s’ modifies bAcco=ne ‘children=ERG’, but since the latter is ergative-marked, the former has to precede it.

(14) a. ram=ke kAl bAcco=ne yIh gana ga-ya th-a
        Ram.M.SG=GEN.M.SG.OBL yesterday child.M.PL.OBL=ERG this song.M.SG.NOM sing-PERF.M.SG be.PAST-M.SG
        ‘Ram’s children sang this song yesterday.’
     b. * bAcco=ne kAl ram=ke yIh gana ga-ya th-a
        child.M.PL.OBL=ERG yesterday Ram.M.SG=GEN.M.SG.OBL this song.M.SG.NOM sing-PERF.M.SG be.PAST-M.SG

A similar example involving a genitive scrambled from an overtly-marked object NP is given in (15): ram=ke ‘Ram’s’ needs to precede its head kUt.t.e=ko ‘dog=ACC’.

(15) a. bAcco=ne ram=ke kAl kUt.t.e=ko dekh-a
        child.M.PL.OBL=ERG Ram.M.SG=GEN.M.SG.OBL yesterday dog.M.SG.OBL=ACC see-PERF.M.SG
        ‘The children saw Ram’s dog yesterday.’

     b. * bAcco=ne kUṭṭe=ko kAl ram=ke dekʰ-a
        child.M.PL.OBL=ERG dog.M.SG.OBL=ACC yesterday Ram.M.SG=GEN.M.SG.OBL see-PERF.M.SG

Recall that genitive KPs modifying nominals in overtly case-marked KPs need to have oblique nominal morphology. One might assume, then, that examples such as (15b) are bad simply because there are several options for the genitive KP to modify a nominal, given the high amount of syncretism in genitive case marking for the oblique; e.g., in (15b) the genitive could modify both bAcco and kUṭṭe. (16) shows that this cannot be the issue. Here, the genitive can modify both nominals, being in linear precedence to both of them; cf. also (14b), which is ungrammatical, even though the agreement morphology clearly rules out any possibility of modification other than bAcco.

(16) ram=ke kAl bAcco=ne kUṭṭe=ko dekʰ-a
     Ram.M.SG=GEN.M.SG.OBL yesterday child.M.PL.OBL=ERG dog.M.SG.OBL=ACC see-PERF.M.SG
     ‘The children saw Ram’s dog yesterday.’ or ‘Ram’s children saw the dog yesterday.’

4.3 Scrambling from Complement Clauses

Another constraint concerns complement clauses. None of my informants judge possessors scrambled out of finite complement clauses as grammatical; cf. the ungrammatical example in (17). However, a majority of my informants indicate that it is grammatical to scramble genitive phrases from within non-finite complement clauses, e.g., the clause headed by the modal verb sAk ‘can’ in (18). This is in line with the findings by Mahajan (1990), Kidwai (1999) as well as Kidwai (2000), who state that scrambling of arguments from within finite complement clauses is generally not accepted, whereas scrambling from non-finite complement clauses is.

(17) * Us=ki ram=ne kAh-a kIh [nina=ne gaṛi dekʰ-i]
     PRON.3.SG.OBL=GEN.F.SG Ram.M.SG=ERG say-PERF.M.SG that Nina.F.SG=ERG car.F.SG.NOM see-PERF.F.SG

(18) Us=ki ram gaṛi dekʰ sAk-a
     PRON.3.SG.OBL=GEN.F.SG Ram.M.SG.NOM car.F.SG.NOM see can-PERF.M.SG
     ‘His/her car, Ram could see.’

4.4 No Scrambling out of Adjuncts

The third constraint concerning genitive scrambling is that genitive KPs may not be scrambled from within adjuncts. In (19a), Us=ki ‘her/his/its’ is a genitive phrase modifying bag ‘park’, which itself is locative case-marked and an adjunct to the overall clause. The possessor may not be scrambled from its NP to any other position in the clause (19b–c).

(19) a. ram=ne Us=ki bag=mẽ haṭʰi dekʰ-a
        Ram.M.SG=ERG PRON.3.SG.OBL=GEN.F.SG park.F.SG=LOC.IN elephant.M.SG.NOM see-PERF.M.SG
        ‘Ram saw an elephant in his/her park.’
     b. * Us=ki ram=ne bag=mẽ haṭʰi dekʰ-a
     c. * ram=ne bag=mẽ haṭʰi Us=ki dekʰ-a

Island behavior, i.e., the unavailability of constituents for movement/scrambling, is symptomatic for clausal adjuncts and is well-known throughout the literature, first discussed by Ross (1967). It is also a well-known diagnostic for distinguishing arguments from adjuncts, as discussed by, e.g., Needham and Toivonen (2011) in an LFG setting.
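The hard constraints of Sections 4.2 to 4.4 can be summarized as a small decision procedure. The following Python sketch is only an illustration of the empirical generalizations stated above and is not part of the XLE grammar; the class `ScrambledGenitive`, its field names and the function `is_acceptable` are hypothetical names introduced here. The soft preference of Section 4.1 is deliberately left out, since it ranks analyses rather than excluding them.

```python
from dataclasses import dataclass


@dataclass
class ScrambledGenitive:
    """Illustrative description of one scrambled genitive (names are assumptions)."""
    head_case_marked: bool  # head NP carries an overt case clitic, e.g. =ne, =ko
    precedes_head: bool     # genitive occurs to the left of its head in the clause
    out_of_adjunct: bool    # head NP is an adjunct, e.g. locative bag=me
    out_of_clause: str      # complement clause crossed: "none", "nonfinite", "finite"


def is_acceptable(g: ScrambledGenitive) -> bool:
    """Apply the constraints of Sec. 4.2-4.4 in turn."""
    if g.out_of_adjunct:                            # Sec. 4.4: adjuncts are islands
        return False
    if g.out_of_clause == "finite":                 # Sec. 4.3: no finite complements
        return False
    if g.head_case_marked and not g.precedes_head:  # Sec. 4.2: must precede marked head
        return False
    return True


# Encodings of examples from the text (the feature values are my reading of them).
EXAMPLES = {
    "13b": ScrambledGenitive(False, False, False, "none"),
    "14a": ScrambledGenitive(True, True, False, "none"),
    "14b": ScrambledGenitive(True, False, False, "none"),
    "17": ScrambledGenitive(False, True, False, "finite"),
    "18": ScrambledGenitive(False, True, False, "nonfinite"),
    "19b": ScrambledGenitive(True, True, True, "none"),
}

if __name__ == "__main__":
    for label, g in EXAMPLES.items():
        print(label, "ok" if is_acceptable(g) else "*")
```

Under this encoding the starred examples (14b), (17) and (19b) are rejected, while (13b), (14a) and (18) are accepted, matching the judgments reported above.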

4.5 No Scrambling from Deep Within

The last constraint to be discussed here indicates that it is not possible to scramble genitive phrases that are selected by nominals further down a path of grammatical functions. Consider the example in (20a). SOhAr ‘husband’ is modified by a genitive SUBJ orAt=ke ‘the woman’s’. SOhAr=ki, in turn, is an extrinsic possessor SUBJ modifying the overall object of the clause, gaṛi ‘car’. The structure is as indicated by the bracketing in (20b). In the similar example (21), sUrx rAng=ke ‘of red color’ is an ADJUNCT modifying mAkan ‘house’.

(20) a. ram=ne orAt=ke SOhAr=ki gaṛi dekʰ-i
        Ram.M.SG=ERG woman.F.SG=GEN.M.SG.OBL husband.M.SG=GEN.F.SG car.F.SG.NOM see-PERF.F.SG
        ‘Ram saw the woman’s husband’s car.’
     b. ram=ne [[[orAt=ke]SUBJ SOhAr=ki]SUBJ gaṛi]OBJ dekʰ-i

(21) a. nina=ne sUrx rAng=ke mAkan=ka dArvaza dekʰ-a
        Nina.F.SG=ERG red color.M.SG=GEN.M.SG house.M.SG=GEN.M.SG door.M.SG see-PERF.M.SG
        ‘Nina saw the red house’s door.’
     b. nina=ne [[[sUrx rAng=ke]ADJUNCT mAkan=ka]SUBJ dArvaza]OBJ dekʰ-a

Given such situations, consider the examples in (22)–(23). In (22a–b), orAt=ke ‘the woman’s’, the SUBJ genitive KP modifying SOhAr ‘husband’, cannot appear outside of the NP it is embedded in, i.e., outside the NP headed by gaṛi ‘car’, since it is embedded too far down in that NP, its GF path being (↑ OBJ […]

(22) b. * ram=ne SOhAr=ki gaṛi orAt=ke dekʰ-i

(23) a. * sUrx rAng=ke nina=ne mAkan=ka dArvaza dekʰ-a
     b. * nina=ne mAkan=ka dArvaza sUrx rAng=ke dekʰ-a

5 XLE Implementation

This section describes the implementation of the Hindi/Urdu genitive as well as its scrambling properties and resulting discontinuities. The implementation uses the XLE grammar development platform, which includes an industrial-strength parser and generator for LFG grammars (Crouch et al., 2015).

5.1 General Setup

The lexical entry for the feminine genitive case marker ki is given in (24). Recall the agreement pattern of the genitive case marker in Table 1; in XLE, constraining equations can account for the requirements concerning gender, number as well as morphological form. In (24), the constraints are in the form of inside-out constraining equations, since the genitive KP may be embedded in a SUBJ, an ADJUNCT or an OBJ f-structure inside the head noun’s f-structure. The last line in (24) states that the case marker needs to be inside an f-structure that has the feature NTYPE; this ensures that the genitive only occurs as a nominal case (i.e., not on verbal arguments/adjuncts).

(24) kI K * (ˆ CASE) =c gen
          (({SUBJ|OBJ|ADJUNCT} ˆ) GEND) =c fem
          (({SUBJ|OBJ|ADJUNCT} ˆ) NTYPE).

The XLE grammar rules in (25) construct the KP and NP. (25a) states that the KP consists of an NP and an optional case marker K. (25b) states that an NP may consist of a simple pronoun or a modified noun (Nadj). In (25c), the use of the shuffle operator (,), separating the KP, AP and N nodes, ensures that each of these nodes may occur in any order, thereby allowing for different word orders inside the NP. The annotation ! […]

heads. Sample c- and f-structures for (2a) are shown in Figures 1 and 2.⁵

(25) a. KP --> NP
            (K).
     b. NP --> {PRON
               |Nadj}.
     c. Nadj = KP*: (! CASE) = gen […]

[Figure 1: Hindi/Urdu NP c-structure for (2a); recoverable leaves: nInA, kI under N and K nodes]

[Figure 2 (caption not recoverable): f-structure for "nInA kI bETI". PRED 'bETI<[1:nInA]>'; SUBJ: PRED 'nInA', NTYPE (NSEM PROPER PROPER-TYPE name, NSYN proper), SEM-PROP SPECIFIC +, ANIM +, CASE gen, GEND fem, NUM sg, PERS 3 […]]

[…] OBJ, XCOMP SUBJ, etc. (XCOMP is the grammatical function used for non-finite complement clauses). Thus, (26) describes exactly those paths that scrambled genitives may be extracted from; it does not allow for genitives scrambled from adjuncts, finite complement clauses (which are inside the COMP GF) or from deeper GF paths (e.g., OBJ […]

[…] is overtly case-marked), in which case the genitive is required to precede its head (again implemented using head precedence, see above). Finally, line 9 adds an O(ptimality)T(heory) mark to the scrambled genitive, called attach, which marks the analysis as non-optimal when it is in direct competition with a local attachment analysis (which does not carry the OT mark).

(27) KP-SCRAMBLE = KP*: (! CASE) = gen
                   (ˆ KP-SCRAMBLE-PATH) = %PATH
                   {(%PATH CASE) =c nom
                   |(%PATH CASE) ˜= nom […]

Given the ambiguity of the genitive discussed in Section 2, all sentences yield ambiguous parse results. As an example, reconsider (13b). The sentence is part of the testsuite and yields two optimal as well as two unoptimal solutions. Under the two optimal readings, Us=ki ‘his/her’ locally modifies bag ‘park’ as a subject or an adjunct; under the two unoptimal readings, Us=ki ‘his/her’ is a scrambled genitive subject or adjunct modifying gaṛi ‘car’.

XLE does not display unoptimal solutions by default; the developer/annotator can select the unoptimal solution(s) by clicking the OT mark that controls the (dis)preference. Figure 3 shows the optimal solution where Us=ki ‘his/her’ is a subject, while Figure 4 shows the corresponding unoptimal solution, i.e., the scrambled genitive analysis.⁸

⁸ The c-structures are not shown here due to space limitations. In the c-structure corresponding to the f-structure in Figure 3, the genitive attaches below the NP headed by bag ‘park’, while in the c-structure for Figure 4, the genitive attaches to the clausal node, resulting in a flat structure.

[Figure 3: Hindi/Urdu NP f-structure for (13b), "gARI nAdiyah nE us kI bAG mEN dEkHI". PRED 'dEkH<[23:nAdiyah], [1:gARI]>'; SUBJ 'nAdiyah' (CASE erg); OBJ 'gARI' (CASE nom); ADJUNCT 'bAG<[45:vuh]>' (CASE loc) containing SUBJ 'vuh' (CASE gen)]

[Figure 4: Hindi/Urdu NP f-structure for (13b), "gARI nAdiyah nE us kI bAG mEN dEkHI". PRED 'dEkH<[23:nAdiyah], [1:gARI]>'; SUBJ 'nAdiyah' (CASE erg); OBJ 'gARI<[45:vuh]>' (CASE nom) containing SUBJ 'vuh' (CASE gen); ADJUNCT 'bAG' (CASE loc)]

6 Summary

The paper describes Hindi/Urdu genitives in general and their scrambling properties in particular. I take a detailed look at the empirical distribution of this phenomenon, including its syntactic constraints, and formulate a generalization using LFG. The generalization is implemented in the Hindi/Urdu ParGram grammar using XLE.

Future theoretical work includes a comparison with other morphologically-rich languages. An initial investigation has shown that scrambling data in Turkish, as discussed by e.g. Kornfilt (2003), are similar, but display a constraint called the “barrier constraint” by Chomsky (1986), which rules out possessors that occur directly right-adjoined to arguments; the constraint does not exist in Hindi/Urdu. Since ParGram includes a Turkish grammar (Çetinoğlu, 2009), a comparison of the annotations necessary to cover the genitive scrambling facts would be interesting.

Acknowledgments

This work is supported by a Nuance Foundation grant on Tense and Aspect in Multilingual Semantic Construction. I would like to thank the native speakers who have provided me with judgments; in alphabetical order, these are Qaiser Abbas, Tafseer Ahmed, Rajesh Bhatt, Miriam Butt, Farhat Jabeen, Asad Mustafa, Ghulam Raza and Ashwini Vaidya.

References

Rajesh Bhatt, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Misra Sharma, and Fei Xia. 2009. A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu. In Proceedings of the Third Linguistic Annotation Workshop, pages 186–189, Suntec, Singapore, August. Association for Computational Linguistics.

Tina Bögel and Miriam Butt. 2013. Possessive Clitics and Ezafe in Urdu. In Kersti Börjars, David Denison, and Alan Scott, editors, Morphosyntactic Categories and the Expression of Possession, pages 291–322. John Benjamins.

Tina Bögel, Miriam Butt, Annette Hautli, and Sebastian Sulger. 2009. Urdu and the Modular Architecture of ParGram. In Proceedings of the Conference on Language and Technology 2009 (CLT09). Center for Research in Urdu Language Processing (CRULP).

Tina Bögel. 2012. Urdu – Roman Transliteration via Finite State Transducers. In Proceedings of FSMNLP’12, pages 25–29.

Miriam Butt and Tracy Holloway King. 2004. The Status of Case. In Veneeta Dayal and Anoop Kumar Mahajan, editors, Clause Structure in South Asian Languages, pages 153–198. Kluwer.

Miriam Butt and Tracy Holloway King. 2007. Urdu in a Parallel Grammar Development Environment. Language Resources and Evaluation: Special Issue on Asian Language Processing: State of the Art Resources and Processing, 41:191–207.

Miriam Butt and Heike Zinsmeister. 2009. ESSLLI 2009 Course on Case, Scrambling and Default Word Order. Course material.

Miriam Butt, Helge Dyvik, Tracy Holloway King, Hiroshi Masuichi, and Christian Rohrer. 2002. The Parallel Grammar Project. In Proceedings of the COLING-2002 Workshop on Grammar Engineering and Evaluation, pages 1–7.

Miriam Butt. 1995. The Structure of Complex Predicates in Urdu. Dissertations in Linguistics. CSLI Publications.

Özlem Çetinoğlu. 2009. A Large Scale LFG Grammar for Turkish. Ph.D. thesis, Sabanci University.

Noam Chomsky. 1986. Barriers. MIT Press.

Dick Crouch, Mary Dalrymple, Ronald M. Kaplan, Tracy Holloway King, John T. Maxwell III, and Paula Newman. 2015. XLE Documentation. Palo Alto Research Center.

Mary Dalrymple. 2001. Lexical Functional Grammar, volume 34 of Syntax and Semantics. Academic Press, New York.

Gisbert Fanselow and Caroline Féry. 2006. Prosodic and Morphosyntactic Aspects of Discontinuous Noun Phrases: a Comparative Perspective. Manuscript, University of Potsdam.

Edward Gibson. 1991. A Computational Theory of Human Linguistic Processing: Memory Limitations and Processing Breakdown. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.

Ayesha Kidwai. 1999. Word Order and Focus Positions in Universal Grammar. In Georges Rebuschi and Laurice Tuller, editors, The Grammar of Focus, pages 213–244. John Benjamins.

Ayesha Kidwai. 2000. XP-Adjunction in Universal Grammar: Scrambling and Binding in Hindi-Urdu. Oxford University Press.

John Kimball. 1973. Seven Principles of Surface Structure Parsing in Natural Language. Cognition, 2:15–47.

Jaklin Kornfilt. 2003. Scrambling, Subscrambling, and Case in Turkish. In Simin Karimi, editor, Word Order and Scrambling. Blackwell Publishing.

Anoop Kumar Mahajan. 1990. The A/A-Bar Distinction and Movement Theory. Ph.D. thesis, MIT.

Muhammad Kamran Malik, Tafseer Ahmed, Sebastian Sulger, Tina Bögel, Atif Gulzar, Ghulam Raza, Sarmad Hussain, and Miriam Butt. 2010. Transliterating Urdu for a Broad-Coverage Urdu/Hindi LFG Grammar. In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC 2010), pages 2921–2927.

Tara Mohanan. 1994. Argument Structure in Hindi. Dissertations in Linguistics. CSLI Publications.

Stephanie Needham and Ida Toivonen. 2011. Derived Arguments. In Miriam Butt and Tracy Holloway King, editors, Proceedings of the LFG11 Conference, pages 401–421. CSLI Publications.

John Robert Ross. 1967. Constraints on Variables in Syntax. Ph.D. thesis, MIT.

Sebastian Sulger, Miriam Butt, Tracy Holloway King, Paul Meurer, Tibor Laczkó, György Rákosi, Cheikh Bamba Dione, Helge Dyvik, Victoria Rosén, Koenraad De Smedt, Agnieszka Patejuk, Özlem Çetinoğlu, I Wayan Arka, and Meladel Mistica. 2013. ParGramBank: The ParGram Parallel Treebank. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 550–560, Sofia, Bulgaria, August. Association for Computational Linguistics.

Sebastian Sulger. to appear. Modeling Nominal Predications in Hindi/Urdu. Ph.D. thesis, University of Konstanz.

Discontinuous parsing with continuous trees

Wolfgang Maier and Timm Lichte
Institute for Language and Information, University of Düsseldorf
Universitätsstr. 1, 40225 Düsseldorf, Germany
{maierwo,lichte}@phil.hhu.de

Abstract

We introduce a new method for incremental shift-reduce parsing of discontinuous constituency trees, based on the fact that discontinuous trees can be transformed into continuous trees by changing the order of the terminal nodes. It allows for a clean formulation of different oracles, leads to faster parsers and provides better results. Our best system achieves an F1 of 80.02 on TIGER.

1 Introduction

Certain structures in natural language can be described as discontinuous, in the sense that they consist of two or more parts which are not adjacent. In linguistics, such structures are typically considered the result of some kind of movement of an element out of a “base” position. Discontinuous structures occur across many languages (Huck and Ojeda, 1987). Sentence (1) is a German example, taken from the NeGra treebank.

(1) Darüber muss nachgedacht werden
    Thereof must thought-about be
    ‘We have to think about that’

In this sentence, the adverb Darüber, modifier of the participle nachgedacht, is moved to the front. Treebank annotation generally accounts for such structures: Either the base position of an element is marked with a trace node which is coindexed with the moved element, as is done, e.g., in the Penn Treebank; or all parts of a discontinuous constituent are grouped under a single node, as is done in the German TIGER and NeGra treebanks. This can be seen in Fig. 1, which shows the treebank annotation of (1). The connection between Darüber and its reference participle nachgedacht is made by grouping both words under a single VP node.

[Figure 1: Discontinuous annotation of (1). Terminals: Darüber/PROAV muß/VMFIN nachgedacht/VVPP werden/VAINF ./$. Nodes: S, VP, VP; edge labels: HD, OC, MO]

Parsing discontinuous constituents is a challenge, since approaches that produce context-free derivations cannot be used. For treebanks in which discontinuities are represented by traces, various approaches have been presented, mostly based on the extension of a CFG parser with a pre-, post- or in-processing step. See, e.g., Johnson (2002), Dienes and Dubey (2003), Levy and Manning (2004), Schmid (2006), and Cai et al. (2011). For the direct parsing of discontinuous constituents, grammar-based techniques have been used, mostly on the basis of Linear Context-Free Rewriting System (LCFRS) (Vijay-Shanker et al., 1987). LCFRS is an extension of Context-Free Grammar in which a single non-terminal can span $k \geq 1$ continuous parts of the input string; i.e., CFG is a special case of LCFRS in which $k = 1$. See, e.g., Kallmeyer and Maier (2013), van Cranenburgh (2012), Angelov and Ljunglöf (2014), Nederhof and Vogler (2014),

or Cohen and Gildea (2015). Even with advanced approaches such as the latter, the high parsing complexity of such approaches is a major bottleneck which tends to lead to low parsing speeds.

Another approach consists of creating a reversible conversion of discontinuous constituents to dependencies, and parsing those with an appropriate dependency parser. This very successful approach is taken by Hall and Nivre (2008) and Fernández-González and Martins (2015).

Recently, Versley (2014) and Maier (2015) have exploited a strategy known from non-projective dependency parsing (Nivre, 2009; Nivre et al., 2009): One can convert every non-projective dependency tree into a projective one by reordering its words. Non-projective dependency parsing can therefore be cast as projective dependency parsing with an additional online reordering operation which allows for the input to be processed out-of-order (“swap”). The same holds for the parsing of discontinuous constituency trees. While Versley (2014) adapts the “easy-first” strategy of Goldberg and Elhadad (2010) to work with a swap operation, Maier (2015) extends the shift-reduce approach of Zhu et al. (2013) correspondingly. Note that the idea of processing linear precedence and immediate dominance for discontinuous parsing separately has also been explored in a grammar-based context by Nederhof and Vogler (2014).

In this paper, we build on the work of Maier (2015) and make two contributions. Firstly, we introduce a new parser transition SKIPSHIFT-i which, in comparison to the swap operation, reduces the number of decisions required to produce a discontinuous constituent and therefore leads to fewer errors. Secondly, we address the problem that when processing the input terminals out-of-order, the same tree can be mapped to different parser transition sequences. We introduce an algorithm which reorders the terminals of a tree off-line such that the resulting tree is continuous. The reordered terminals are used as a basis for obtaining an oracle that maps the tree to a canonical transition sequence. All new techniques are implemented within uparse, the publicly available parser of Maier (2015).¹ An experimental evaluation shows that choosing the appropriate terminal order is crucial to parsing success: We obtain state-of-the-art results for discontinuous shift-reduce constituency parsing, namely 80.02 on the TIGER data set of Hall and Nivre (2008).

The remainder of the article is organized as follows. In Sec. 2, we present the basic architecture for shift-reduce parsing with discontinuous constituents of Maier (2015), on which we build our work. Sec. 3 presents our methodology. In Sec. 4, we present our experiments and discuss the results, and Sec. 5 closes the article.

¹ https://github.com/wmaier/uparse

Proceedings of DiscoNLP 2016, pages 47–57, San Diego, California, June 17, 2016. © 2016 Association for Computational Linguistics

2 Discontinuous Shift-Reduce Parsing

2.1 Basic parser architecture

We base our work on the shift-reduce parser architecture of Maier (2015), which in turn is based on the architecture of Zhu et al. (2013).

As usual in shift-reduce parsing, a parser state represents a (partial) derivation. It is a tuple consisting of a queue of incoming pairs of input tokens and POS tags which have not yet been processed,² and a stack which holds completed constituents. The initial parser state has an empty stack and a queue which holds the input string to be parsed. From a given state, other states can be reached via the following state transitions.

² POS tagging and constituency shift-reduce parsing can be done jointly, as demonstrated, e.g., by Mi and Huang (2015). However, note that throughout, we assume POS tagging to be done outside of the parser as in earlier work (Zhu et al., 2013; Maier, 2015).

• SHIFT shifts a single element from the queue onto the stack.

• BINL-X, resp. BINR-X build a new X constituent with the first two stack elements as its children and its head coming from its left, resp. right child. The new constituent replaces the first two stack elements.

• UNARY-X builds a new X constituent with the first stack element as its child. The new constituent replaces the first stack element.

• SWAP-i handles discontinuous constituents. It allows for a block of i elements from the stack

to be swapped back on the queue, starting with the second stack element.³

• FINISH pops the last remaining element from the stack, given that it is labeled with the root label and the queue is empty.

• IDLE can be applied any number of times after FINISH. This compensates for different lengths of analyses (Zhu et al., 2013).

³ This operation is called COMPOUNDSWAPi in Maier (2015); we do not use single SWAP here.

A parser state to which FINISH has been applied is a final state. Transitions can only be applied on states which fulfill certain conditions. For instance, SHIFT can only be applied if there are elements left on the queue. The full set of the corresponding conditions is listed in the appendix of Zhang and Clark (2009), and in Maier (2015) (for SWAP).

2.2 Oracles

An oracle is used to obtain canonical transition sequences from gold treebank trees. These sequences can then be learned by the parser.

Since we only use transitions that handle unary and binary nodes, incoming trees must be binarized. As in previous work, we use head-outward binarization with binary top and bottom productions, and a single binarization label @X for all X constituents. For details on the binarization, see Maier (2015) and references therein.

For continuous parsing, i.e., when no discontinuous trees have to be handled such as in Zhu et al. (2013), one can use a simple oracle which traverses the tree top-down in postorder. Before the traversal starts, a single IDLE followed by a single FINISH transition are generated. Then, at each binary X node, a corresponding BINL-X/BINR-X is generated; each unary X node leads to a UNARY-X transition. For a terminal, SHIFT is generated. In a last step the transition sequence is reversed.

Discontinuous trees can be handled with the SWAP-i transition, which allows for the input to be processed out-of-order (Maier, 2015). The corresponding oracle traverses a treebank tree bottom-up, left-to-right. Mirroring the parser, it maintains an incoming list of token/POS-pairs to be processed (initially filled with the sequence of terminals/pre-terminals of the tree), a stack structure holding subtrees of the treebank tree to be processed (initially empty), and a list holding the result.

While the incoming list is not empty, repeat the following two steps.

1. If the stack is not empty, repeat:
   • If the root of the first subtree on the stack is the only child of its parent node p (labeled X), pop the subtree from the stack and push the subtree the root of which is p, then add an UNARY-X transition to the result.
   • If the first two subtrees on the stack share the same parent node p (labeled X), pop both subtrees from the stack and push the subtree the root of which is p, then add a BINR-X/BINL-X transition, depending on the head side of p.
   If no transitions have been added, go to the next step.

2. If the incoming list is not empty, process the next terminal as follows. Determine the leftmost terminal dominated by the right child of the parent of the top element on the stack. If there is no gap, then this is the first element in our incoming list. We can add a single SHIFT, remove the first element from the incoming list and add it to the stack. If there is a gap, then there are $i \geq 1$ terminals between the head of the list and the terminal to be shifted. We therefore add i+1 SHIFT and one SWAP-i transition; then we remove the (i+1)st element from the incoming list and add it to the stack.

Note that in the continuous case, both oracles yield the same result. As an example, consider Fig. 1. At the start, no reduce transitions can be added because the stack is empty. We pass to step 2, add a SHIFT transition to the result and remove the first token (Darüber) from the list. Then, we must jump over the gap, i.e., we determine the leftmost terminal dominated by the right child of the parent of the topmost stack element. The parent of the topmost stack element is VP, and its rightmost child is nachgedacht, resp. VVPP.
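The transition inventory of Sec. 2.1 can be made concrete with a small simulator. The Python sketch below is an illustration only and is not taken from uparse: the function name `run` and the tuple encoding of constituents are assumptions made here, FINISH and IDLE are omitted, and SWAP-i is read as moving the i stack elements directly below the top back to the front of the queue in their original order. The last lines replay the SHIFT/SWAP-1 sequence that the worked example for Fig. 1 arrives at (sentence-final punctuation left out).

```python
from collections import deque


def run(tokens, transitions):
    """Apply a transition sequence and return (stack, remaining queue).

    Constituents are encoded as tuples (label, head_side, left, right);
    this encoding is an assumption made for the sketch.
    """
    queue = deque(tokens)  # tokens not yet processed
    stack = []             # completed constituents, top of stack is stack[-1]
    for t in transitions:
        name, _, arg = t.partition("-")
        if name == "SHIFT":
            stack.append(queue.popleft())
        elif name == "SWAP":
            i = int(arg)
            top = stack.pop()
            block = stack[-i:]                 # i elements below the top
            del stack[-i:]
            queue.extendleft(reversed(block))  # back to the queue, order kept
            stack.append(top)
        elif name in ("BINL", "BINR"):
            right = stack.pop()
            left = stack.pop()
            stack.append((arg, name[-1], left, right))  # 'L'/'R' marks the head side
        elif name == "UNARY":
            stack.append((arg, "U", stack.pop()))
        else:
            raise ValueError(f"unknown transition: {t}")
    return stack, list(queue)


TOKENS = ["Darüber", "muß", "nachgedacht", "werden"]
SWAP_SEQUENCE = [
    "SHIFT", "SHIFT", "SHIFT", "SWAP-1", "BINR-VP",
    "SHIFT", "SHIFT", "SWAP-1", "BINR-VP",
    "SHIFT", "BINR-S",
]

if __name__ == "__main__":
    print(run(TOKENS, SWAP_SEQUENCE))
```

Replaying the sequence leaves a single S constituent on the stack and an empty queue; muß is shifted three times in total, which is the overhead that a direct skipping operation would avoid.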

We therefore add two SHIFTs and one SWAP-1. This then allows for the addition of a BINR-VP. In the following, we have to jump over muß again. The parent of the topmost stack element is (the upper) VP and its rightmost child is werden. Therefore, we add again two SHIFTs and one SWAP-1 followed by BINR-VP. Last, we SHIFT the remaining muß and add BINR-S.

2.3 Structured prediction

For structured prediction, each parser state is assigned a score. The score of the start state is 0, and the score of the $(i+1)$th state is the sum of the score of the $i$th state and the dot product of a local feature vector obtained through a feature function $\phi$ and a global weight vector $\theta$. We train $\theta$ with the averaged Perceptron using max violation (Huang et al., 2012).

Decoding is done with beam search. The beam of size $n$ is initialized with the start state. Then repeatedly, a candidate list is filled with all states which can be built using transitions on states on the beam. The highest scoring $n$ of them are put back on the beam. Parsing is finished if the highest scoring state on the beam is a final state.

2.4 Features

The feature function $\phi$ works by applying templates to parser states. The templates describe particular configurations of stack and queue. As does Maier (2015), we use the feature template set from Zhang and Clark (2009) as a baseline, and furthermore experiment with the extended features of Zhu et al. (2013). We also use the features for discontinuities from Maier (2015). For the full template list, consult Appendix A.

3 Methodology

We now present the main contributions of this paper, namely, the tree reordering algorithm, the new parser transition, and the new oracles.

3.1 Discontinuous Trees

First, we define the structures which are manipulated by the tree reordering algorithm. A discontinuous tree is a directed acyclic graph $T = (V, E, r)$ where $V$ is a set of nodes with $r \in V$ the root node, and $E \subseteq V \times V$ a set of edges with $E^*$ the reflexive transitive closure of $E$. For all $v \in V \setminus \{r\}$, there is exactly one $(u, v) \in E$ with $u \in V$; $u$ is thereby called the parent of $v$ and $v$ a child of $u$. Nodes with no children are called terminals. We let $V_T = \{v \in V \mid v \text{ is a terminal}\}$; all $v_t \in V_T$ are uniquely numbered from 1 to $|V_T|$ by a function $ind$ (Maier and Lichte, 2011). The yield of a node $v \in V$ is the set of all terminals it dominates, i.e., for all $v \in V$, $yield(v) = \{u \in V_T \mid (v, u) \in E^*\}$. $V$ is equipped with an order $\prec$, which orders the nodes according to the lowest index of the terminals they dominate. I.e., $u \prec v$ iff $\min(\{ind(u') \mid u' \in yield(u)\}) < \min(\{ind(v') \mid v' \in yield(v)\})$ where $u, v \in V$. We write $c(v)$ for the ordered list of the children of a given node $v$; for the $i$th child of $v$, we write $c(v)[i]$.

Any node $v \in V$ is discontinuous (as opposed to continuous) if there are $v_1, v_2 \in yield(v)$ such that $ind(v_1) + 1 < ind(v_2)$ and $ind(v_1) + 1 \notin \{ind(v') \mid v' \in yield(v)\}$. If $\{i \mid ind(v_1) < i < ind(v_2)\} \cap \{ind(v') \mid v' \in yield(v)\} = \emptyset$, then the tuple $(v_1, v_2)$ is called a gap of size $ind(v_2) - ind(v_1) - 1$; $v_1$ and $v_2$ are called its left and right border. $T$ is discontinuous if $V$ contains discontinuous nodes. A continuous node which has discontinuous children is called a gap creator.

3.2 Continuous Tree Reordering

As already observed in previous literature (Nivre, 2009; Versley, 2014; Maier, 2015), a discontinuous tree can be made continuous by changing the order of the terminals. The new terminal order must be such that all gaps in yields are eliminated, i.e., it must ensure that discontinuous parts are joined.

In other words, given a tree $(V, E, r)$ we want to produce a function $ind_\sigma$ to replace $ind$. $ind_\sigma$ must be such that all $v \in V$ are continuous. In general, more than one such function will exist for a tree. As an example, consider again the tree in Fig. 1. Two possible permutations of the terminals which eliminate the VP gap would be: Darüber nachgedacht werden muß (i.e., $ind_\sigma(\text{Darüber}) = 1$, $ind_\sigma(\text{nachgedacht}) = 2$, $ind_\sigma(\text{werden}) = 3$, $ind_\sigma(\text{muß}) = 4$) and muß Darüber nachgedacht werden (i.e., $ind_\sigma(\text{muß}) = 1$, $ind_\sigma(\text{Darüber}) = 2$, $ind_\sigma(\text{nachgedacht}) = 3$, $ind_\sigma(\text{werden}) = 4$). Note that the first order is the complementizer-free order of embedded sentences. The second order is also an acceptable German sentence, namely, the question form of the original sentence.
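The definitions of Sec. 3.1 and the two permutations just mentioned can be checked mechanically. The following Python sketch encodes the tree of Fig. 1 (without the sentence-final punctuation) and implements yield, gap, discontinuity and gap creator as defined above. The dictionary encoding and all identifiers (`CHILDREN`, `terminals`, `gaps`, and the node names `VP_upper` and `VP_lower`) are illustrative assumptions made here, not part of the original formalization.

```python
# Tree of Fig. 1: each non-terminal maps to its ordered list of children.
CHILDREN = {
    "S": ["VP_upper", "muß"],
    "VP_upper": ["VP_lower", "werden"],
    "VP_lower": ["Darüber", "nachgedacht"],
}

# ind: original terminal numbering, and the two continuous orders from Sec. 3.2.
IND = {"Darüber": 1, "muß": 2, "nachgedacht": 3, "werden": 4}
EMBEDDED_ORDER = {"Darüber": 1, "nachgedacht": 2, "werden": 3, "muß": 4}
QUESTION_ORDER = {"muß": 1, "Darüber": 2, "nachgedacht": 3, "werden": 4}


def terminals(v):
    """yield(v): all terminals dominated by v (a terminal dominates itself)."""
    if v not in CHILDREN:
        return [v]
    return [t for c in CHILDREN[v] for t in terminals(c)]


def gaps(v, ind):
    """All gaps (left border, right border, size) in the yield of v under ind."""
    ts = sorted(terminals(v), key=ind.get)
    return [(a, b, ind[b] - ind[a] - 1)
            for a, b in zip(ts, ts[1:]) if ind[b] - ind[a] > 1]


def is_discontinuous(v, ind):
    return bool(gaps(v, ind))


def is_gap_creator(v, ind):
    """A continuous node with at least one discontinuous child."""
    return (not is_discontinuous(v, ind)
            and any(is_discontinuous(c, ind) for c in CHILDREN.get(v, [])))


if __name__ == "__main__":
    for node in CHILDREN:
        print(node, gaps(node, IND), is_gap_creator(node, IND))
```

Under the original numbering both VP nodes have a gap of size 1 between Darüber and nachgedacht and S is the only gap creator; under either of the two reordered numberings no node has a gap.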

form of the original sentence.

Given a binarized tree $T = (V, E, r)$, we can obtain different variants of $\mathrm{ind}_\sigma$ systematically with a recursive top-down procedure reorder, to be called on $r$. reorder yields a bijection $\sigma$ of the set $\{\mathrm{ind}(v_t) \mid v_t \in \mathrm{yield}(r)\}$ to itself, and we define $\mathrm{ind}_\sigma$ such that for all $v_t \in \mathrm{yield}(r)$, $\mathrm{ind}_\sigma(v_t) = \sigma(\mathrm{ind}(v_t))$. Note that we use tuple notation with brackets for the output of the bijection; ⊕ denotes tuple concatenation.

When called on a node $v \in V$, reorder distinguishes three cases. (1) If v is a terminal, reorder(v) simply yields [ind(v)]. (2) If v is a unary node, reorder(v) yields reorder(c(v)[0]). (3) If v is a binary node, then there are two different possibilities. reorder can yield reorder(c(v)[0]) ⊕ reorder(c(v)[1]) or reorder(c(v)[1]) ⊕ reorder(c(v)[0]). The former is called left reordering, since all terminals of the left child are placed in front of all terminals of the right child. Correspondingly, the latter is called the right reordering.

Not all nodes have to be reordered in the same way; we therefore explore the following reordering selections.

• LEFT: always select left reordering.
• RIGHT: always select right reordering. This also reorders continuous nodes.
• RIGHTD: select right reordering if node is a gap creator, otherwise do not reorder.
• DISTi: select right reordering if node is a gap creator and the left or right child have a gap of size ≥ i, otherwise do not reorder.
• LABEL: specify a reordering direction for particular labels, and a default reordering for all others.

As an example, let us look at the result of RIGHTD when applied on the tree in Fig. 1. The original set of indices of the root node is {1, 2, 3, 4}. Since S is binary and a gap creator (S is continuous, but its VP child is discontinuous), we select the right reordering, i.e., reorder(S) is reorder(muß) ⊕ reorder(VP). reorder(muß) is [2]. The VP is not a gap creator, i.e., we apply the left reordering and obtain reorder(VP) ⊕ reorder(werden). reorder(werden) is [4]. The lower VP is also not a gap creator, therefore we again apply the left reordering and obtain reorder(Darüber) ⊕ reorder(nachgedacht). reorder(Darüber) is [1], and reorder(nachgedacht) is [3], i.e., the result for the lower VP is [1, 3]. The result for the upper VP is [1, 3, 4], and the result for S is [2, 1, 3, 4]. In other words, we obtain $\mathrm{ind}_\sigma$(Darüber) = 2, $\mathrm{ind}_\sigma$(muß) = 1, $\mathrm{ind}_\sigma$(nachgedacht) = 3, and $\mathrm{ind}_\sigma$(werden) = 4, the above-mentioned question ordering of the sentence.

3.3 From Swap to Skip-Shift

We introduce a new transition SKIPSHIFT-i, which shifts the ith element from the queue (counting from 0). In other words, it allows the parser to directly skip elements on the queue while shifting. SKIPSHIFT-i is subject to the same restrictions as the "regular" SHIFT transition (Zhang and Clark, 2009): The queue must not be empty and the state must not be a final state. Furthermore, if the last transition was of type BINR- and led to an intermediate constituent, SKIPSHIFT-i is disallowed.

The new transition reduces the number of transitions needed to parse a discontinuity. As an example, consider the transition sequence which would be extracted from the tree in Fig. 1 with SHIFT and SWAP, namely SHIFT, SHIFT, SHIFT, SWAP, BINR-VP, SHIFT, SHIFT, SWAP, BINR-VP, SHIFT, BINR-S. Note that the word muß is shifted two times and swapped back on the queue before it is finally used in a reduce transition the third time it is shifted. With SKIPSHIFT-i, only 7 instead of 11 decisions are required: SKIPSHIFT-0, SKIPSHIFT-1, BINL-VP, SKIPSHIFT-1, BINL-VP, SKIPSHIFT-0, BINR-S.

3.4 From reordering to oracles

In the case of discontinuities, the bottom-up oracle from Maier (2015) determines the next terminal to be shifted by skipping over the discontinuity through a traversal of the relevant part of the input tree, namely, the path from the left to right border of a gap.

We modify the old oracle such that the order of operations is determined by the terminal reordering obtained with the reorder procedure described in Sec. 3.2. If the incoming terminal list is ordered by

$\mathrm{ind}_\sigma$, the gap borders in the original ordering are joined together. Therefore, no tree walk is necessary to determine the next terminal to be shifted (i.e., the right border of the gap). It is always the first one in the reordered list. We must, however, determine how many terminals have to be shifted and swapped, resp. skipped, relative to the original order, i.e., the size i of the gap that is skipped. For this, we determine the index of the first terminal in the list of terminals to be processed, and then count the number of terminals in the list that have a lower index; i is set to this number minus one. Once determined, i can be used with either SHIFT and SWAP, or with SKIPSHIFT. With the former, we generate i + 1 SHIFT followed by one SWAP-i transition. With the latter, we just generate a single SKIPSHIFT-i transition.

3.5 More features

The new SKIPSHIFT-i operation can access any element in the input queue, not just the first one. In order to make the corresponding information available to the parser, we introduce a new feature set which consists of all token/POS pairs remaining on the queue ("full queue features").

Furthermore, we adapt swap importance weighting from Maier (2015). During training, all updates that concern a swap transition are counted twice. We do the same for SKIPSHIFT-i.

4 Experiments

We implement the tree transformations with the corresponding oracles, as well as the SKIPSHIFT-i operation, within uparse, the publicly available implementation of the parser of Maier (2015).⁴

4.1 Data and Setup

In order to facilitate a comparison, we use the same data as Maier (2015), namely the TIGER treebank release 2.2 with the splits of Farkas and Schmid (2012) and Hall and Nivre (2008). In the former, the first half of the last 10,000 sentences is used for development and the second half for testing, and the rest is left for training. In the latter, the treebank is split into ten parts, such that sentence i is put into part i mod 10. The first of those parts is used for testing, the concatenation of the rest for training. As usual, we attach the material which is not included in the annotation (mainly punctuation) to the tree itself.

We run the training for 20 iterations using beam size 4. Other parameters are indicated later. The results are reported as labeled bracket scores, as obtained with the evaluation module of discodop.⁵ We use the proper.prm file of the discodop distribution, i.e., root nodes are not included, but punctuation is included in the evaluation.

4.2 Results

All experimental results are listed in Tab. 1 (evaluation on all constituents) and Tab. 2 (evaluation only on discontinuous constituents).

As a baseline, we run a single experiment with the SWAP-i transition. Then, with different settings, we run experiments with the SKIPSHIFT-i transition, using the LEFT, RIGHTD, RIGHT, LABEL, and the DISTi reorderings, choosing 2, 4, and 8 as distance for the latter. For the label-based reordering, we employ the left reordering for all NP and PP nodes, and RIGHTD for all other nodes. NPs and PPs are very similar in the TIGER annotation; the only difference between both is the presence of a preposition in PPs, resp. the absence of it in NPs (Maier et al., 2014). Both are often deeply embedded; we therefore presume that recognizing the entire phrase before recognizing the material in the gap is more promising.

The results for SWAP-i confirm the results from Maier (2015). With the SKIPSHIFT-i operation, the parser performs consistently better than with SWAP-i. The best result is obtained with DIST2, an F1 gain of 2.2 over the swap baseline. Unsurprisingly, RIGHT does not perform well due to the fact that it also aggressively reorders nodes which are continuous; it will not be included in the remaining experiments. When evaluating discontinuous constituents only, DIST8 is the most successful setting. This can be explained by the fact that preferring the right reordering in nodes with children that have large gaps means that the i in SKIPSHIFT-i transitions can be kept lower than with the left reordering. Since SKIPSHIFT-i transitions with a lower i are seen more frequently in training, results improve. LABEL is not that successful, achieving only a slightly higher precision on discontinuous constituents.
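The reorder procedure of Sec. 3.2 and the SKIPSHIFT-i indices it induces can be sketched as follows. This is a minimal illustration, not the uparse implementation: the `Node` class and the function names are hypothetical, only the LEFT, RIGHT and RIGHTD selections are covered, and the gap-creator test is an assumption read off the worked example (a continuous node with at least one discontinuous child). The SKIPSHIFT index is taken to be the position of the next terminal in the remaining queue, following the definition in Sec. 3.3.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    index: Optional[int] = None  # 1-based terminal position; None for nonterminals
    children: List["Node"] = field(default_factory=list)


def terminal_indices(node: Node) -> List[int]:
    """Sorted original indices of all terminals dominated by node."""
    if node.index is not None:
        return [node.index]
    return sorted(i for c in node.children for i in terminal_indices(c))


def is_continuous(node: Node) -> bool:
    idx = terminal_indices(node)
    return idx == list(range(idx[0], idx[-1] + 1))


def is_gap_creator(node: Node) -> bool:
    # Assumption based on the worked example: the node itself is continuous,
    # but at least one of its children is discontinuous.
    return is_continuous(node) and any(not is_continuous(c) for c in node.children)


def reorder(node: Node, selection: str = "RIGHTD") -> List[int]:
    """Return the terminal order produced by the chosen reordering selection."""
    if node.index is not None:          # case (1): terminal
        return [node.index]
    if len(node.children) == 1:         # case (2): unary node
        return reorder(node.children[0], selection)
    left, right = node.children         # case (3): binary node
    use_right = selection == "RIGHT" or (
        selection == "RIGHTD" and is_gap_creator(node)
    )
    first, second = (right, left) if use_right else (left, right)
    return reorder(first, selection) + reorder(second, selection)


def skipshift_indices(order: List[int]) -> List[int]:
    """Queue position (counting from 0) of each terminal at the time it is shifted."""
    queue = sorted(order)               # the queue holds terminals in sentence order
    result = []
    for t in order:
        i = queue.index(t)
        result.append(i)
        queue.pop(i)
    return result


# Tree from the worked example: S -> VP muß, VP -> VP werden, VP -> Darüber nachgedacht.
lower_vp = Node("VP", children=[Node("Darüber", 1), Node("nachgedacht", 3)])
upper_vp = Node("VP", children=[lower_vp, Node("werden", 4)])
tree = Node("S", children=[upper_vp, Node("muß", 2)])

rightd_order = reorder(tree, "RIGHTD")
left_order = reorder(tree, "LEFT")
left_skips = skipshift_indices(left_order)
```

On this tree, RIGHTD gives the order derived step by step in Sec. 3.2, and the LEFT order gives the SKIPSHIFT indices of the seven-transition sequence in Sec. 3.3.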

⁴ https://github.com/wmaier/uparse
⁵ https://github.com/andreasvc/discodop

                 All constituents (Table 1)     Discontinuous constituents (Table 2)
Setting          Rec.   Prec.  F1     Exact     Rec.   Prec.  F1     Exact
SWAP-i           72.80  74.67  73.72  36.27     10.29  23.64  14.34   8.03
Baseline
  LEFT           73.94  75.54  74.73  37.37     14.31  18.64  16.19  12.33
  RIGHTD         75.15  76.77  75.95  37.19     10.90  25.27  15.24   8.74
  RIGHT          25.61  22.53  23.97  10.68      1.73   0.18   0.33   0.51
  LABEL          75.14  76.71  75.92  37.28     10.58  25.88  15.02   8.75
  DIST2          75.13  76.81  75.96  37.19     11.82  28.15  16.64   9.73
  DIST4          74.98  76.55  75.76  37.19     11.63  25.40  15.96  10.10
  DIST8          75.18  76.64  75.88  37.56     13.53  27.11  18.05  11.29
+ extended features
  LEFT           74.43  75.87  75.14  37.80     13.54  21.38  16.58  12.20
  RIGHTD         75.58  77.07  76.32  37.86     10.25  31.00  15.41   9.04
  LABEL          75.24  76.61  75.92  37.41     10.19  27.92  14.94   8.99
  DIST2          75.70  77.25  76.46  37.82     11.02  31.32  16.31   9.99
  DIST4          75.60  77.07  76.33  37.84     11.22  29.16  16.20  10.25
  DIST8          75.18  76.61  75.89  37.83     11.78  26.00  16.21  10.56
+ extended and disco features
  LEFT           74.77  76.10  75.43  37.84     14.90  29.53  19.80  14.17
  RIGHTD         75.73  77.14  76.43  37.13      8.43  42.13  14.05   8.14
  LABEL          75.65  77.07  76.35  37.38      8.87  45.91  14.86   8.57
  DIST2          75.57  76.95  76.25  37.69      9.03  41.49  14.83   9.43
  DIST4          75.63  77.08  76.35  37.52     10.14  42.86  16.40   9.62
  DIST8          75.39  76.77  76.08  37.72     11.42  37.96  17.55  11.06
+ extended and full queue features
  LEFT           74.44  75.96  75.19  38.54     14.30  24.86  18.16  12.98
  RIGHTD         75.42  76.96  76.18  38.20     10.96  30.69  16.15   9.79
  LABEL          75.30  76.91  76.10  37.90     11.12  29.02  16.08   9.57
  DIST2          75.65  77.17  76.40  38.01     11.58  31.93  17.00   9.91
  DIST4          75.43  76.83  76.12  38.22     11.03  27.74  15.78   9.81
  DIST8          75.20  76.54  75.87  37.95     12.06  27.60  16.79  10.42
+ extended features and imp. weighting
  LEFT           74.45  75.82  75.13  37.53     13.39  24.24  17.25  12.36
  RIGHTD         75.44  76.77  76.10  37.48      9.15  31.98  14.23   7.79
  LABEL          75.42  76.84  76.12  37.70      9.47  33.83  14.80   8.32
  DIST2          75.51  76.89  76.19  37.37     10.00  33.59  15.41   9.12
  DIST4          75.18  76.55  75.86  37.68      9.70  32.50  14.94   9.41
  DIST8          75.00  76.36  75.67  37.12     10.54  29.14  15.48   9.78
+ extended, disco and full queue feature + importance weighting
  LEFT           74.87  76.23  75.54  37.61     14.31  35.27  20.36  13.80
  RIGHTD         75.28  76.63  75.95  37.03      8.91  45.93  14.92   8.37
  LABEL          75.30  76.76  76.02  36.94      8.39  43.10  14.04   8.44
  DIST2          75.43  76.95  76.19  37.54      8.45  42.62  14.11   8.67
  DIST4          75.33  76.65  75.98  37.05      8.91  41.11  14.65   8.72
  DIST8          75.16  76.64  75.89  37.28     10.95  41.84  17.35  10.55

Table 1: Results (all constituents)
Table 2: Results (discontinuous constituents)
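As a sanity check on how the F1 columns relate to recall and precision, the following sketch recomputes two Table 1 entries with the standard harmonic-mean definition of F1. The function name `f1` is ours; the input figures are the rounded values from the table, so small rounding differences against other rows are to be expected.

```python
def f1(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall (labeled bracket F1)."""
    return 2 * precision * recall / (precision + recall)


# Table 1, all constituents: SWAP-i baseline and DIST2 with baseline features.
swap_f1 = f1(72.80, 74.67)
dist2_f1 = f1(75.13, 76.81)
gain = dist2_f1 - swap_f1
```

The recomputed values agree with the reported 73.72 and 75.96, and their difference rounds to the 2.2 points quoted in the text.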

The fact that we need fewer operations in total in order to build a tree (in comparison to SWAP-i) is reflected in reduced parsing times. With SWAP-i, we need 63 seconds to parse the entire test set (79.5 sent./sec.); with SKIPSHIFT-i and LEFT, we only need 49 seconds (101.8 sent./sec.).

When adding the extended features from Zhu et al. (2013), the trend seen in Maier (2015) is repeated: Looking deeper into the structures on the stack leads to a higher performance when looking at all constituents. On discontinuous constituents, we obtain an improved precision, but a slightly worse recall. The features for discontinuities also have the same effect as in Maier (2015): no improvement is achieved on all constituents. The precision on discontinuous constituents is, however, much higher while the corresponding recall drops sharply, indicating data sparseness. Adding the full queue features to the extended features only has a very small effect.

Also, adding importance weighting alone is not very successful. However, when we combine the extended, the discontinuous and the full queue features with importance weighting, we achieve the best result on discontinuous constituents; surprisingly, this happens when using the LEFT reordering.

Last, for comparison, we run experiments on the Hall and Nivre (2008) (H&N) data set. We run our own parser (LEFT; extended, discontinuous, and full queue features; importance weighting), and also discodop (van Cranenburgh and Bod, 2013) (vC) (default settings, using gold POS tags). Tab. 3 shows the corresponding results along with those of Versley (2014) (V), Maier (2015) (M), H&N and Fernández-González and Martins (2015) (F&M) (taken from Maier (2015)). Note that we do improve on M, but still lie far behind F&M.

      V      M      here   vC     H&N    F&M
F1    74.23  79.52  80.02  79.00  79.93  85.53
E     37.32  44.32  45.11  41.33  37.78  51.21

Table 3: Results for sentence length ≤ 40 on H&N data

4.3 Discussion

What about the big picture? In spite of an improvement on Maier (2015), the scores on the discontinuous constituents remain very low. A manual analysis of the parsing results leads us to the conclusion that the strongest point of discontinuous shift-reduce parsing, namely the locality of the search which leads to its speed, is also its biggest weakness. Certain structures simply cannot be recognized, almost "by definition". For instance, in order to correctly recognize an NP with an extraposed modifier, the reduction of the full NP must be delayed until the complete modifier has been recognized. With the current parsing model, in some situations, there is just no way of knowing if a delay is necessary, since the modifier can still be out of reach for the feature function when the first part of the full NP has already been recognized. Due to beam search, once the NP is reduced, we cannot backtrack, i.e., the modifier cannot be attached later.

One way of addressing this problem could be the use of exact search, such as in Thang et al. (2015). However, there is another perspective. An important finding of the experiments is that recognizing material in gaps before the discontinuous constituent itself (RIGHTD) leads to high precision and low recall on discontinuous constituents, while recognizing the discontinuous constituent first (LEFT) leads to more errors, but also catches more cases (lower precision, higher recall). The reason for this is that in the former case, one decides too late and in the latter case too early if a partially recognized constituent is part of a discontinuous structure. We conjecture that what makes the parsers of Fernández-González and Martins (2015) and van Cranenburgh and Bod (2013) successful are their mechanisms of handling this issue: The former joins the recognition of a terminal with the decision of what part of a (potentially discontinuous) constituent it belongs to. The latter gets structure-global context by building the final tree as a combination of discontinuous base structures. This makes it particularly successful on discontinuous structures, achieving Prec./Rec./F1 of 33.77/50.29/40.41 on them (compared to our result of 19.36/39.71/26.03); unfortunately, Fernández-González and Martins (2015) have not reported results on discontinuous constituents alone. In future work, we will explore possibilities of integrating such a mechanism in a shift-reduce approach.

We have seen that choosing the transition order well, i.e., picking the right oracle, is crucial for parsing success. We therefore want to explore how the

order of transitions affects parsing results in the continuous case. A particular transition order could be forced via a tree transformation such as right-corner transform (Schuler et al., 2010). Concretely, right-corner transform would give preference to easier transition orders in which we reduce as soon as possible, i.e., long sequences of SHIFTs would be avoided.

Last, it should be noted that the application of SKIPSHIFT-i is not limited to discontinuous constituency parsing. We want to apply our method to non-projective dependency parsing, where SKIPSHIFT-i could be used instead of swap-eager/lazy transitions (Nivre et al., 2009) in a parsing framework such as the one of Zhang and Nivre (2011). This would also allow for an "intersection" between our work and the one of Fernández-González and Martins (2015).

5 Conclusion

We have presented a new tree reordering method which makes discontinuous constituency trees continuous. The reordering method can be used to obtain oracles for discontinuous shift-reduce parsing. In conjunction with a new parser transition, we have achieved state-of-the-art results for discontinuous shift-reduce constituency parsing.

Appendix A. Feature Templates

Our parser uses feature templates from previous work. For the sake of completeness, we list them here. s_i and q_i stand for the ith item on stack and queue, w is the head word, t the head tag and c the constituent label (w, t and c are identical on preterminal level). l and r (ll and rr) are the left and right children (grand-children) of the corresponding element on the stack; u deals with unary constituents.

The following baseline features have been presented by Zhang and Clark (2009).

unigrams
s0tc, s0wc, s1tc, s1wc, s2tc, s2wc, s3tc, s3wc, q0wt, q1wt, q2wt, q3wt, s0lwc, s0rwc, s0uwc, s1lwc, s1rwc, s1uwc

bigrams
s0ws1w, s0ws1c, s0cs1w, s0cs1c, s0wq0w, s0wq0t, s0cq0w, s0cq0t, s1wq0w, s1wq0t, s1cq0w, s1cq0t, q0wq1w, q0wq1t, q0tq1w, q0tq1t

trigrams
s0cs1cs2w, s0cs1cs2c, s0cs1cq0w, s0cs1cq0t, s0cs1wq0w, s0cs1wq0t, s0ws1cs2c, s0ws1cq0t

The extended features have been introduced by Zhu et al. (2013).

extended
s0llwc, s0lrwc, s0luwc, s0rlwc, s0rrwc, s0ruwc, s0ulwc, s0urwc, s0uuwc, s1llwc, s1lrwc, s1luwc, s1rlwc, s1rrwc, s1ruwc

The following disco features stem from Maier (2015). As explained there, in the following templates, x denotes the gap type of a stack element. It can be "none" (tree on stack is fully continuous), "pass" (there is a gap at the root), and "gap" (the root of this tree fills a gap, i.e., its children have gaps, but the root does not). y stands for the sum of all gap lengths.

unigrams
s0xwc, s1xwc, s2xwc, s3xwc, s0xtc, s1xwc, s2xtc, s3xwc, s0xy, s1xy, s2xy, s3xy

bigrams
s0xs1c, s0xs1w, s0xs1x, s0ws1x, s0cs1x, s0xs2c, s0xs2w, s0xs2x, s0ws2x, s0cs2x, s0ys1y, s0ys2y, s0xq0t, s0xq0w

Acknowledgments

We would like to thank Omri Abend for discussions. Thanks also to the three anonymous reviewers for valuable comments and suggestions. This work was partially funded by Deutsche Forschungsgemeinschaft (DFG).

References

Krasimir Angelov and Peter Ljunglöf. 2014. Fast statistical parsing with parallel multiple context-free grammars. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 368–376, Gothenburg, Sweden.
Shu Cai, David Chiang, and Yoav Goldberg. 2011. Language-independent parsing with empty elements. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 212–216, Portland, OR.
Shay B. Cohen and Daniel Gildea. 2015. Parsing linear context-free rewriting systems with fast matrix multiplication. CoRR, abs/1504.08342.

Péter Dienes and Amit Dubey. 2003. Antecedent recovery: Experiments with a trace tagger. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 33–40, Sapporo, Japan.
Richard Farkas and Helmut Schmid. 2012. Forest reranking through subtree ranking. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1038–1047, Jeju Island, Korea, July. Association for Computational Linguistics.
Daniel Fernández-González and André F. T. Martins. 2015. Parsing as reduction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Beijing, China.
Yoav Goldberg and Michael Elhadad. 2010. An efficient algorithm for easy-first non-directional dependency parsing. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 742–750, Los Angeles, CA.
Johan Hall and Joakim Nivre. 2008. Parsing discontinuous phrase structure with grammatical functions. In Bengt Nordström and Aarne Ranta, editors, Advances in Natural Language Processing, volume 5221 of Lecture Notes in Computer Science, pages 169–180. Springer, Gothenburg, Sweden.
Liang Huang, Suphan Fayong, and Yang Guo. 2012. Structured perceptron with inexact search. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 142–151, Montréal, Canada, June. Association for Computational Linguistics.
Geoffrey Huck and Almerindo Ojeda, editors. 1987. Discontinuous constituency. Academic Press, New York.
Mark Johnson. 2002. A simple pattern-matching algorithm for recovering empty nodes and their antecedents. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 136–143, Philadelphia, PA.
Laura Kallmeyer and Wolfgang Maier. 2013. Data-driven parsing using probabilistic linear context-free rewriting systems. Computational Linguistics, 39(1):87–119.
Roger Levy and Christopher Manning. 2004. Deep dependencies from context-free statistical parsers: Correcting the surface dependency approximation. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), Main Volume, pages 327–334, Barcelona, Spain.
Wolfgang Maier and Timm Lichte. 2011. Characterizing discontinuity in constituent treebanks. In Formal Grammar. 14th International Conference, FG 2009. Bordeaux, France, July 25-26, 2009. Revised Selected Papers, volume 5591 of LNCS/LNAI, pages 167–182, Berlin, Heidelberg, New York. Springer-Verlag.
Wolfgang Maier, Miriam Kaeshammer, Peter Baumann, and Sandra Kübler. 2014. Discosuite - A parser test suite for German discontinuous structures. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland. European Language Resources Association (ELRA).
Wolfgang Maier. 2015. Discontinuous incremental shift-reduce parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1202–1212, Beijing, China, July. Association for Computational Linguistics.
Haitao Mi and Liang Huang. 2015. Shift-reduce constituency parsing with dynamic programming and pos tag lattice. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1030–1035, Denver, Colorado, May–June. Association for Computational Linguistics.
Mark-Jan Nederhof and Heiko Vogler. 2014. Hybrid grammars for discontinuous parsing. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1370–1381, Dublin, Ireland.
Joakim Nivre, Marco Kuhlmann, and Johan Hall. 2009. An improved oracle for dependency parsing with online reordering. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT'09), pages 73–76, Paris, France.
Joakim Nivre. 2009. Non-projective dependency parsing in expected linear time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 351–359, Singapore.
Helmut Schmid. 2006. Trace prediction and recovery with unlexicalized PCFGs and slash features. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 177–184, Sydney, Australia.
William Schuler, Samir AbdelRahman, Tim Miller, and Lane Schwartz. 2010. Broad-coverage parsing using human-like memory constraints. Computational Linguistics, 36(1):1–30.
Le Quang Thang, Hiroshi Noji, and Yusuke Miyao. 2015. Optimal shift-reduce constituent parsing with structured perceptron. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1534–1544, Beijing, China, July. Association for Computational Linguistics.
Andreas van Cranenburgh and Rens Bod. 2013. Discontinuous parsing with an efficient and accurate DOP model. In Proceedings of The 13th International Conference on Parsing Technologies, Nara, Japan.
Andreas van Cranenburgh. 2012. Efficient parsing with linear context-free rewriting systems. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 460–470, Avignon, France.
Yannick Versley. 2014. Experiments with easy-first non-projective constituent parsing. In Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages, pages 39–53, Dublin, Ireland.
K. Vijay-Shanker, David Weir, and Aravind K. Joshi. 1987. Characterising structural descriptions used by various formalisms. In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, pages 104–111, Stanford, CA.
Yue Zhang and Stephen Clark. 2009. Transition-based parsing of the Chinese treebank using a global discriminative model. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT'09), pages 162–171, Paris, France.
Yue Zhang and Joakim Nivre. 2011. Transition-based dependency parsing with rich non-local features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 188–193, Portland, Oregon, USA, June. Association for Computational Linguistics.
Muhua Zhu, Yue Zhang, Wenliang Chen, Min Zhang, and Jingbo Zhu. 2013. Fast and accurate shift-reduce constituent parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 434–443, Sofia, Bulgaria.

Discontinuity (Re)²visited: A Minimalist Approach to Pseudoprojective Constituent Parsing

Yannick Versley LinkedIn∗ yversley@lınkedın.comii

Abstract structures, which today offer straightforward ways to deal with nonprojective structures in practical In this paper, we use insights from Mini- ways despite the fact that exact parsing of nonpro- malist Grammars (Keenan and Stabler, 2003) jective dependencies with second-order factors is in- to argue for a context-free approximation of tractable in general (McDonald, 2006). discontinuous structures that is both easy to In the following, we present a principled treat- parse for state-of-the-art dynamic program- ming constituent parsers and has a simple and ment for approximating discontinuous syntactic effective method for the reconstruction of dis- structures by context-free ones. The resulting novel continuous tree structures. technique for pseudoprojective parsing is well suited The results achieved on the Tiger treebank – for lexicalized as well as unlexicalized projective paired with state-of-the-art constituent parsers parsers, and yields a feasible solution for accurate such as the BLLIP and Berkeley parsers probabilistic parsing of discontinuous structures. – both improve on existing transformation- based approaches for representing discontin- 2 Related work uous structures and the state-of-the-art results of Fernandez-Gonz´ alez´ and Martins’ (2015) Grammar-based approaches to constituent parsing parsing-as-reduction approach. based on minimally context-sensitive formalisms such as LCFRS/MCFG can give guarantees of polynomial-time parsability. However, the step to 1 Introduction practical (i.e., fast and accurate) parsing is much larger than is the case for context-free structures: For languages with free(r) word order and richer Specifically, binarization is a key to yielding prob- morphology, predicate-argument structure (depen- abilistic formulations with a good tradeoff be- dencies of words and their heads/governors) and tween expressivity and sparsity e.g. in the parser topology (contiguous phrases or regions in the sen- of Charniak (2000). 
Binarization, however, is non- tence) do not always match up. Hence, constituency straightforward for LCFRS (Gildea, 2010; van Cra- parsing techniques that rely on a context-free back- nenburgh, 2012), and, unlike for CFG, it is in- bone either have to do with a language-dependent effective to obtain a grammar with lower parsing approximation that puts topology at the center (e.g. complexity. As a result, parsers directly based on the TuBa-D/Z¨ treebank of Telljohann et al., 2009, LCFRS such as Kallmeyer and Maier (2013) are based on topological fields) or can only produce rather limited in terms of speed and accuracy. an approximation of the actual predicate-argument Greedy (and beam-search) parsing of discontin- structures (as is the case with most parsing ap- uous constituents has recently seen some progress proaches targeting the Negra and Tiger treebanks, in the form of approaches that use swapping tech- cf. Skut et al., 1997; Brants et al., 2002). niques in transition-based parsing of discontinuous As an alternative to this procrustean choice, prac- constituents (Versley, 2014; Maier, 2015). A fur- titioners have traditionally preferred dependency ther strain of approaches uses other dependency *Work was done at the University of Heidelberg parsing techniques by reducing discontinuous con-

58

Proceedings of DiscoNLP 2016, pages 58–69, San Diego, California, June 17, 2016. © 2016 Association for Computational Linguistics

stituent parsing to a dependency labeling problem (Hall and Nivre, 2008; Fernández-González and Martins, 2015). However, neither of these two groups of approaches lends itself easily to reranking (Collins, 2000; Charniak and Johnson, 2005) or comparable techniques.

As in van Cranenburgh and Bod (2013), we assume that pairing k-best parsing of context-free structures with a deterministic back-transform to discontinuous structures is a feasible approach to yield k-best lists of discontinuous structures. While van Cranenburgh and Bod rely on a complex ranking mechanism as the second step, we show that refinements to the first step can already yield results that surpass the state of the art without any separate ranking step.

3 Pseudoprojective parsing

After initial successes with head-lexicalized PCFG parsing for English (Collins, 1997; Charniak, 2000), researchers tried to apply these models to languages that show less configurationality than English. For Czech, Collins et al. (1999) reordered the original word sequence into one where the corresponding tree becomes continuous. For German, Dubey and Keller (2003) transform the treebank by raising discontinuous parts to a higher node in the tree.

Both of these approaches will yield parsing models that (potentially) produce sensible trees for feasible sentences that have a continuous syntactic structure (which comprises the large majority of sentences in English, but may only cover less than half of sentences in typical text for German or Czech). These approximations are therefore problematic, as they offer no path from the unparsed tokens as they occur in the text to (potentially) discontinuous trees reflecting predicate-argument structure.

Subsequent approaches such as Levy and Manning (2004) use a multiple-step algorithm to revert the changes introduced by node raising by (heuristically or automatically) identifying NULL elements in the tree, and in a second step identifying dislocated material and linking it up with an appropriate NULL position. Unlike work on heuristic reattachment of dependencies such as Hall and Novak (2005) or Nivre and Nilsson (2005), however, this work is relatively complex. Levy and Manning's evaluation is also performed on the individual steps rather than in a framework that looks at the complete process of parsing and reattachment.

Boyd (2007) proposes an approximation of LCFRS's block structure to get to a tree transformation that is, in contrast to a pure node raising approach, reversible. Boyd labels the blocks of a discontinuous node by appending a special suffix (e.g. "VP*" instead of "VP"), and later merges these nodes in a top-down fashion. We will discuss this approach in more detail in section 3.2.

Boyd herself compares her approach of block-based transformation to a setting where node raising was applied to the training corpus but no back-transformation was used (similar to parsing models distributed with modern constituent parsers). Using gold part-of-speech tags, undoing the block-based transformation yielded a better result than node raising without back-transformation.

Contra Boyd, Rehbein and van Genabith (2009) find in an evaluation based on f-structure conversion that Boyd's way of transforming trees gives worse results than their approach. Hsu (2010) looks purely at how well parsers are able to reproduce the structures created by pseudoprojective transformations, and finds that parsers introduce more errors in Boyd's method than in the node-raising method [1].

3.1 Formalizing pseudoprojectivity

For our purposes a tree $T = (NT \cup \{t_1, \ldots, t_n\}, E)$ over a sequence of terminals $\mathit{Term} = t_1, \ldots, t_n$ is a directed acyclic graph such that (i) terminals have no children, (ii) all nonterminal nodes have at least one child, (iii) all nodes have at most one parent, and (iv) there is a unique topmost node that dominates all other nodes. For the nodes of such a tree we can recursively assign a yield function such that the yield of $t_i$ is $\{i\}$ and the yield of a nonterminal node is the union of the yields of its children.

We call a node contiguous if its yield is a contiguous subsequence; we call a tree contiguous if all its nodes are contiguous.

[1] A reviewer points out that these results should be seen in the context of the experimental framework used (Rehbein only uses the Berkeley parser, and Hsu only uses plain unbinarized PCFGs) and that particular transformations may be more or less appropriate for specific parsers.
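The yield and contiguity definitions of Section 3.1 can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the `Node` class, the helper `t`, and the toy bracketing of a sentence with an extraposed relative clause are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical tree node: terminals carry a 1-based position, nonterminals carry children."""
    label: str
    children: List["Node"] = field(default_factory=list)
    position: Optional[int] = None  # set only for terminals


def tree_yield(node: Node) -> Set[int]:
    """The yield of terminal t_i is {i}; a nonterminal's yield is the union of its children's yields."""
    if node.position is not None:
        return {node.position}
    result: Set[int] = set()
    for child in node.children:
        result |= tree_yield(child)
    return result


def is_contiguous_node(node: Node) -> bool:
    """A node is contiguous if its yield has no gaps."""
    positions = tree_yield(node)
    return max(positions) - min(positions) + 1 == len(positions)


def is_contiguous_tree(node: Node) -> bool:
    """A tree is contiguous if every one of its nodes is contiguous."""
    return is_contiguous_node(node) and all(is_contiguous_tree(c) for c in node.children)


def t(word: str, i: int) -> Node:
    """Helper (assumed for this sketch) that builds a terminal at position i."""
    return Node(label=word, position=i)


# Toy tree (assumed bracketing): the relative clause "der grinst" belongs to
# the NP headed by "Mann" but is separated from it by "gesehen".
rel = Node("S", [t("der", 9), t("grinst", 10)])
pp = Node("PP", [t("mit", 5), t("dem", 6), t("Hamster", 7)])
np_node = Node("NP", [t("den", 3), t("Mann", 4), pp, rel])
vp = Node("VP", [np_node, t("gesehen", 8)])
root = Node("S", [t("Ich", 1), t("habe", 2), vp])
```

In this toy tree the NP has the yield {3, 4, 5, 6, 7, 9, 10} and is therefore not contiguous, while the VP above it covers positions 3 to 10 without a gap; the tree as a whole is not contiguous because one of its nodes is not.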

A pseudoprojective transform over a set of trees $\mathcal{T}$ is a pair (proj, unproj) of functions with the following properties:

• proj is a total function from trees in $\mathcal{T}$ to trees. For any tree $T \in \mathcal{T}$, the value $\mathrm{proj}(T)$ is a projective tree over the same terminal sequence.

• unproj is a partial function from projective trees to trees from $\mathcal{T}$. For any $T' \in \mathrm{proj}(\mathcal{T})$, $\mathrm{proj}(\mathrm{unproj}(T')) = T'$.

• proj (and by extension unproj) preserves contiguous nodes: if $T$ contains a node with label $a$ and a contiguous yield $i..j$, $\mathrm{proj}(T)$ must contain a node with label $a$ and a contiguous yield $i..j$. If a contiguous node $a_1$ has a contiguous ancestor $a_2$, the ancestor relationship between $a_1$ and $a_2$ is preserved in $\mathrm{proj}(T)$.

• There is a set of barrier nodes $B$ among the contiguous nodes from $T \cap T'$, minimally including the topmost node and all terminal nodes. If $\mathrm{proj}(T)$ contains a node not contained in $T$ with an ancestor $n_a \in B$ and a descendant $n_d \in B$, then $T$ must contain a noncontiguous node with ancestor $n_a$ and descendant $n_d$.

(I.e., $\mathrm{proj}(T) = T$ for all contiguous trees.)

Note that we do not require proj to be injective (in many useful cases it is not), nor do we say anything about unproj's behaviour on trees outside of $\mathrm{proj}(\mathcal{T})$. Possibilities for proj include simply deleting discontiguous nodes, adjusting their spans by reparenting (as in node raising) and/or adding other nodes.

3.2 LCFRS-inspired approximations

A Linear Context-Free Rewriting System (LCFRS) is a grammar where nodes in the parse tree do not correspond to a single span, but to a fixed number of blocks, yielding productions such as

$S_1(uvwxy) \rightarrow VP_2(u, x)\; V_1(v)\; NP_2(w, y)$

In grammar-based parsing, the number of blocks (i.e. letter variables in the production rule) determines the parsing complexity; for pseudo-projective transforms, the most important distinction is between (contiguous) blockdegree 1 nodes and (discontiguous) nodes with a larger blockdegree.

Boyd (2007) proposes a pseudoprojective transformation that approximates LCFRS's block structure: Boyd labels the blocks of a discontinuous node by appending a special suffix (e.g. "VP*" instead of "VP"), and later merges these nodes in a top-down fashion.

Van Cranenburgh (2012) suggests a refinement of Boyd's approximation – independently disc where the different blocks of a discontinuous phrase receive different labels (yielding an approximate production of $S \rightarrow VP^{*1}\; V\; NP^{*1}\; VP^{*2}\; NP^{*2}$).

In practice, we found that van Cranenburgh's approach creates many rare categories such as $VP^{*12}$, and that limiting the numbers in the superscripts yields identical performance to Boyd's approach.

3.3 A new look at node raising

If we have a treebank where a VP is the projection of a verb, we (or the parser) have a concrete expectation of what to find inside a VP node. In contrast, LCFRS and Boyd's transforms treat the trees as non-lexicalized constructs: for a VP, we would get two VP* nodes with rather different properties. The second (in our case) contains a verb and other children that we would normally expect under a VP (not VP*) node, and the first one contains topicalized material. Furthermore, the occurrence of both topicalization and extraposition can mean that there is no strong regularity among "first parts" and "second parts", which renders van Cranenburgh's refinement ineffective.

If we consider trees to have head terminals, and the hierarchical relations to induce dependency relations between terminals (see definition in the appendix), we could hope for contiguous subtree preservation, a stronger version of contiguity preservation:

Suppose we are given two trees $T_1$ and $T_2$ where the terminal subsequences $i_1..j_1$ of $T_1$ and $i_2..j_2$ of $T_2$, as well as the dependencies between the terminals in these subsequences, are identical. If $T_1$ has a contiguous subtree with a yield of $i_1..j_1$, then $\mathrm{proj}(T_2)$ should also contain the same contiguous subtree.

Node raising (leaving aside unary productions) fulfills this subtree conservation property for treebanks such as Tiger that use sibling adjunction (see Carreras et al., 2008). The LCFRS-derived pseudo-projective transformations do not, which intuitively

explains the result of Hsu (2010) that the latter are harder for a context-free parser.

In conclusion, we see two properties we want to maintain: on the one hand, subtree contiguity preservation seems to be beneficial for getting more accurate parses of the projectivized trees; on the other hand, some marking to aid deprojectivization seems very helpful.

[Figure 1: A Minimalist Grammar tree]

[Figure 2: A Tiger-style tree]

4 Approximating Minimalist Grammars

The approach we present in this paper is based on an approximation of minimalist grammars (MG), stemming from a family of approaches that is popular in mainstream generative linguistics (Chomsky, 1986). Transformational grammar and its descendants have inspired some early work on parsing beyond context-free structures (Dorr, 1987; Lin, 1993), but have for a long time lacked a formalization that would enable more principled work.

Minimalist grammars (in the sense we will use here) are a grammar formalism introduced by Stabler (1997) and Keenan and Stabler (2003) that belongs to the class of mildly context-sensitive grammars (Vijay-Shanker et al., 1987; Michaelis, 1998), and carry (in comparison with LCFRS) the benefit of being a lexicalized formalism, and potentially of yielding more compact grammars (Michaelis, 1998) and a more natural model of probability assignments (Hunter and Dyer, 2014).

At the core of Minimalist Grammars are nodes/expressions that carry a category (e.g. x) together with valencies for certain categories (=x, x=) and attractee features (e.g. -w), which designate a node as being moved from its argument position to a higher node, and attractor features (e.g. +w), which designate a node as a host for a moved constituent of the corresponding type. A MOVE operation extracts a −x constituent into a +x position, whereas a MERGE operation adjoins one node to another.

To illustrate the correspondence between the analysis assigned by a minimalist grammar and our target representation (flat discontinuous trees), let us compare Figures 1 and 2. On the one hand, MG assumes binary adjunctions while the Tiger annotation scheme has flat phrases; this kind of variation in annotation scheme, while it may influence the structures preferred by PCFG and similar models, is semantically equivalent (Johnson, 1998). On the other hand, the representation with discontinuous phrases eliminates all empty nodes (both the empty complementizer and the trace of the moved who), sometimes yielding headless phrases.

We should emphasize here that, while many insights about constraints on movement that are valid in Minimalist Grammars are still valid in actual treebanks (whenever we take care to allow for notational differences), it is not true in general that annotated treebanks are designed using the criteria that are applied in the Minimalist Program (indeed, Minimalist Grammars have the goal of providing a formal grammar to express such ideas in a more theory-neutral way), nor do decisions about headedness, argumenthood, and the modeling of moved nodes necessarily correspond to those that one would make in an MG.

Beyond couching the relation between predicate-argument trees and surface trees in more principled terms (see also Boston et al., 2010 for a discussion of the relation to dependencies), what do we gain?

The Shortest Move Constraint (SMC) posits that any material that is moved must attach at the first possible host node (i.e., for a moved phrase of type −x,

attach to the lowest ancestor with the feature +x). With the caveat that the SMC may not be appropriate for all types of movement (e.g. scrambling), it firstly provides an intuition on possible hosts for dislocated material (whereas the rule of "raise the nodes up to a host node where the material can be attached without discontinuity" is of a much more practical nature), while simultaneously predicting that any (potential) host phrase must also act as a barrier for movement.

Following the discussion of Boston et al. (2010), we find a plausible rationale for contiguous subtree preservation. A downside is that undoing Move-based dislocation, especially if we consider node raising as the prototypical example, would need more information. Consider the following sentence:

(1) Ich habe [np den Mann [pp mit dem Hamster]] gesehen [der grinst]
    I have the man with the hamster seen which grins
    "I have seen the man with the hamster which grins"

In this case, the attachment of the relative clause to Mann (man) or to Hamster (hamster) is an instance of general attachment ambiguity, and while we can use very general preferences (attach low; prefer attaching as an argument rather than as an adjunct, cf. Hobbs and Bear, 1990), correct attachment often requires semantic or context-dependent information. (LCFRS-inspired transforms pass this problem on to the parser, in a form that poorly fits state-of-the-art parsing models.)

4.1 Representing Moved Phrases

If we look at minimalist grammars from our pseudoprojective transform perspective, we see that even when we want a transform that fulfills the contiguous subtree preservation criterion (which excludes Boyd's and van Cranenburgh's solutions, even though they are pseudoprojective transforms in our sense), we can solve the problem differently from, and hopefully better than, node raising.

Marcus et al. (1994) and later Skut et al. (1997), in a solution that we will call the trace-filler mechanism, try to ensure a relation between host phrase and the argument-structure parent by inserting an empty node in the latter and coindexing it with the dislocated material; empty nodes are not supported by today's parsers, and the idea of encoding the coindexation by appending slash categories to phrases has been shown to be non-beneficial (Schiehlen, 2004).

If we want to preserve contiguous subtrees and yet add some marking, we could insert a phrase between the host phrase and the dislocated material, in the simplest case one single fixed category, as in

(2) [vp [np den Mann [pp mit dem Hamster]] gesehen [X [s der grinst]]]

where we inserted an additional X node between host node and dislocated material. In contrast to simple raising, this would encode the information about which nodes were dislocated and which ones were not in the tree rather than leaving it implicit.

The LR scheme for adding additional information to raised nodes, which we propose here, leaves the block that includes the head with an unchanged node label, but labels the other parts as being left- or right-dislocated, yielding, for example, VP*L, VP and VP*R as labels if the middle phrase contains the head of the VP. In our example, we would obtain the following:

(3) [vp [np den Mann [pp mit dem Hamster]] gesehen [NP*R [s der grinst]]]

Similar to the trace-filler view, and different from the LCFRS view, we distinguish between the main part of a phrase (normally the one containing the head) and discontinuous (moved) parts. We mark the moved parts with the concatenation of the original parent label and a *L or *R suffix. In comparison with node raising, the additional node fulfills an important function: it helps the parser recognize where such moved or extraposed material can occur, based on topology (i.e., outside context) as well as the material itself, and it also provides useful information for reattachment.

4.2 Simple Heuristics for Backtransformation

Adopting a minimalist perspective on reattachment, we would assume that a parser for context-free structures can plausibly produce the derived structures that capture sentence topology but not necessarily the full set of argument and adjunct dependencies.
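The labelling step of the LR scheme from Section 4.1 can be illustrated with a small Python sketch. It covers only the choice of labels for the blocks of a discontinuous node, not the tree surgery (raising the moved block and inserting the marker node). The function names, the representation of a node as a set of terminal positions, and the assumption that the head position is already known are all hypothetical choices made for this example.

```python
from typing import List, Set, Tuple

Block = Tuple[str, int, int]  # (label, first position, last position)


def split_into_blocks(positions: Set[int]) -> List[Tuple[int, int]]:
    """Group a set of terminal positions into maximal contiguous runs."""
    blocks: List[Tuple[int, int]] = []
    for pos in sorted(positions):
        if blocks and pos == blocks[-1][1] + 1:
            # Extend the current run by one position.
            blocks[-1] = (blocks[-1][0], pos)
        else:
            # Start a new run.
            blocks.append((pos, pos))
    return blocks


def lr_labels(label: str, positions: Set[int], head: int) -> List[Block]:
    """Keep the plain label on the block containing the head; mark other blocks *L or *R."""
    labelled: List[Block] = []
    for start, end in split_into_blocks(positions):
        if start <= head <= end:
            labelled.append((label, start, end))
        elif end < head:
            labelled.append((label + "*L", start, end))
        else:
            labelled.append((label + "*R", start, end))
    return labelled


# NP of example (1), with words numbered from 1: "den Mann mit dem Hamster"
# occupies positions 3-7, "der grinst" positions 9-10, head "Mann" at 4.
np_blocks = lr_labels("NP", {3, 4, 5, 6, 7, 9, 10}, head=4)

# A three-block VP whose middle block contains the head.
vp_blocks = lr_labels("VP", {1, 2, 4, 5, 7}, head=4)
```

Under these assumptions, `np_blocks` contains a plain NP block for positions 3 to 7 and an NP*R block for positions 9 to 10, which corresponds to the NP*R node in example (3); `vp_blocks` yields the VP*L, VP, VP*R pattern mentioned in the text.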

method                    F1
uncross, no reattach      94.99
LR + top-down reattach    97.90
LR + bottom-up reattach   99.20
  +S barrier              99.08

Table 1: Heuristic reattachment: roundtrip results on Tiger sentences 40475-45474

To capture the discontinuous structures, we investigate two reattachment heuristics:

• The top-down heuristic chooses an appropriate node from the siblings of the dislocated phrase, similar to the way that Boyd's or van Cranenburgh's reattachment algorithm would operate.

• The bottom-up heuristic iterates over the yield of the parent node (limited to the part left of the dislocated phrase for *R phrases, and otherwise to the part right of it), to produce a sequence of all suitable descendants of the parent node in a close-to-far, low-to-high fashion.

Evidence from reconstructing gold data (see table 1) shows that the bottom-up heuristic is more accurate in the reconstruction, which is consistent with the preference for low attachment as formulated by, e.g., Hobbs and Bear (1990).

Considering the relation to Minimalist Grammar and the Shortest Move Constraint, we would expect that certain nodes act as barriers and can block dislocation or movement across them: In example (4) (see figure 3), such a barrier constraint would prevent the fronted subclause "nehmen die Überlegungen Gestalt an" (if the plans become more concrete) from being attached to the VP internal to the relative clause instead of the VP of the matrix sentence.

(4) [VP*L Nehmen die Überlegungen Gestalt an], würden die Frankfurter, [S die im Januar 1988 [VP ihr 125jähriges Bestehen feiern] können], [VP die Verbindung zu ihren historischen Wurzeln kappen].
    take the thoughts shape on, would the Frankfurt+ADJ, which in January 1988 their 125-year existence celebrate can, the connection to their historical roots clip.
    "If the plans become more concrete, the Frankfurt group, which had its 125-year anniversary in January 1988, would cut the connection to their historical roots."

Figure 3: An example sentence where the closest VP is not a suitable movement target

In our experiments with projectivising the development set of the Tiger treebank and subsequently reattaching dislocated phrases (see table 1), we see that top-down reattachment already provides a 60% error reduction with respect to simply leaving nodes unattached, and that using a preference for lower attachment ("bottom-up reattach") yields a further 60% error reduction in terms of phrase F1. In contrast, we see that having sentence (S) nodes block movement leads to a slight decrease in accuracy.

5 Experimental set-up

In the following parsing experiments, we want to compare more directly the accuracy of our LR scheme of projectivization and reattachment to that of, e.g., Boyd's proposal while taking into account many of the concerns that occur in practical parsing today, in particular the compatibility with other techniques used to improve parsing accuracy (linguistic tree transformations, products of latent variable grammars, word clustering-based generalizations of words). As we are specifically concerned about the behaviour with more unknown words and different distributions of syntactic constructions that occur in out-of-domain corpora, we exclusively use the part-of-speech tags assigned by the parsing model itself.

Corpora used. As an in-domain corpus that we split into a training, development and testing set, we use the Tiger treebank (Brants et al., 2002), which encodes argument and adjunct relations in a discontinuous constituent structure with edge labels. We use two splits that were used in the literature for parsing experiments: the first, called the SPMRL split, reproduces the train/development/test portions of Farkas and Schmid (2012) and was used in the SPMRL shared tasks of 2013 for dependency and constituency parsing, using sentences 1–40474 as training set, the next 5000 as development set, and the remaining 5000 as a test set. The second split was first used in the experiments of Hall and Nivre (2008) and uses folds 9 and 10 in a 10-fold setup as development and testing portions, respectively.

As an out-of-domain dataset, we use the Smultron treebanks of Volk et al. (2015), which include portions of a novel (sophie), business reports (economy), texts about mountaineering (alpine) as well as extracts from the manual of a DVD player (dvdman). The annotation of the Smultron treebanks is loosely based on the Tiger scheme but differs in two important respects: on the one hand, the Tiger scheme merges PP nodes with the noun phrase that is the argument of the preposition, yielding one single PP phrase; on the other hand, the Smultron annotation scheme uses extra nodes for unary noun, verb and adjective phrases that would be elided in the Tiger scheme. For a sensible comparison, we use a transformed version of the Smultron treebanks where unary nodes are deleted and argument NPs of prepositional phrases are unwrapped.

Parsing models. For reasons of simplicity, we limit ourselves to a small number of generative parsing models that have been shown to work well for context-free parsing.

The BLLIP parser (of which we use the generative model only and not the discriminative reranking part) uses a "maximum-entropy-inspired" probabilistic model that produces head-lexicalized constituents from the inside out while conditioning on the two previous neighbours, the grandparent constituents, and their heads (Charniak, 2000).

The PCFG-LA parsing model of Petrov et al. (2006) uses a zeroth-order right-markovized version of the treebank which is subsequently augmented with latent symbol refinements in order to improve the fit, using smoothing and a split-merge procedure to avoid overfitting in the EM-based refinement process. We found that four split-merge iterations (instead of the default six) gave the best results with the linguistically transformed trees.

As the PCFG-LA model learns a latent-variable augmentation of the treebank trees and most often reaches a non-unique local maximum of the EM objective, it is possible and useful to combine multiple PCFG-LA models to reach an even better performance using a product grammar approach where each rule is scored by a product-of-experts of multiple parsers (Petrov, 2010).

For out-of-domain parsing, Candito and Seddah (2010) found that replacing words with clusters improved the generalization ability of parsers. After preliminary experiments showed that replacing all words with clusters actually had a negative effect, our setting using word clusters only replaces words that occur fewer than five times, using word clusters derived with the Marlin tool (Müller and Schuetze, 2015) and text from the DECOW corpus [2], limiting the vocabulary size to the most frequent 250 000 word types as done by Müller and Schütze.

Finally, we also include in our investigation the use of linguistic tree transformations, which e.g. Dubey (2005) as well as Versley and Rehbein (2009) found useful both for unlexicalized and discriminative PCFG parsing. In particular, we use lowering of parenthetic material as proposed by Maier et al. (2012) to reduce the complexity of discontinuities, but also markers for relative clauses and comparative phrases, linguistically motivated subcategorization information for sentences, added case information for noun phrases, as well as refining part-of-speech classes using some morphological information.

5.1 Comparing Boyd with LR

For each of the three parsers (BLLIP, PCFG-LA, PCFG-LA product grammar) we produce transformed trees with the non-annotated treebank trees (orig) as well as the enriched ones (xform), using Boyd's projectivization transform (boyd) and the one proposed here (LR).

Quite expectedly, we find that head-lexicalized parsing using the BLLIP parser is substantially helped by the linguistically motivated tree transformations. Somewhat less intuitively, as the earlier results of Petrov et al. (2006) for English indicate that basic transformations such as head annotation do not help PCFG-LA models, we find that our linguistically motivated transformations substantially help both single PCFG-LA and product grammars.

[2] http://www.corporafromtheweb.org

We also see that the LR transform performs slightly worse than Boyd's transform in the BLLIP parser,

              BLLIP                         PCFG-LA                       LA-product
variant       F1     EX     POS    discF1   F1     EX     POS    discF1   F1     EX     POS    discF1
orig-Boyd     80.32  43.75  97.55  68.41    82.35  42.95  97.63  70.55    83.11  45.66  97.76  72.68
orig-LR       81.28  45.71  97.68  70.02    82.27  43.57  97.71  71.46    83.29  45.81  97.80  72.95
xform-Boyd    82.01  44.67  97.70  71.55    81.35  42.71  97.55  70.92    83.62  46.16  97.72  74.43
xform-LR      81.93  45.27  97.73  71.64    82.43  43.49  97.68  72.77    84.18  46.90  97.84  75.43
LR + cleanup  82.19  45.53  97.73  71.85    82.43  43.49  97.68  72.77    84.36  47.08  97.84  75.56
LR + filter   82.20  45.63  97.72  72.09    82.58  43.87  97.69  73.11    84.09  47.31  97.79  75.81

Table 2: Comparison on the Tiger development set, sentences with ≤ 70 words

whereas the tendency is exactly reversed in PCFG-LA-based parsing. Looking at the evaluation results when ignoring continuous constituents (see table 2, discF1 columns), we see that the LR scheme specifically improves the quality of discontinuities, sometimes by a slight amount, and sometimes by as much as one percent.

Dealing with invalid solutions. Some of the decrease in performance for the BLLIP parser for the LR scheme is due to the occasional dislocated phrases (e.g. NP*R) that cannot be reattached and are then left behind. For the Boyd transform, our implementation already reverts discontinuity-marked phrases (e.g. NP*) to the original label (e.g. NP). To deal with the 'invalid' phrases, we propose two solutions: the first one, which we call the cleanup strategy, involves a post-processing step in which all dislocated phrase nodes are deleted (and their daughters attached to the deleted phrase's parent). The second one, which we call the filter strategy, involves producing a ranked list of parses, from which we delete any parse that includes dislocated nodes that cannot be reattached. Of the remaining parses, we choose the highest-scoring one.

For the BLLIP parser, we find that both the cleanup step and the k-best list filter result in improvements over the default version, while the advantage of the filtering-based approach over the simpler cleanup approach is very slight at best.

5.2 Comparing with the state of the art

Table 4 compares the result of our pseudoprojective parsing approach to other evaluation results on the Tiger treebank using parser-assigned POS tags [3].

We find that our BLLIP parsing model already performs quite well, going beyond the results of van Cranenburgh and Bod (2013), who use a PCFG base parser followed by a DOP-inspired reranking step, or the discriminative parsing results of Versley (2014). Using the product-grammar approach, we find that our approach outperforms the parsing-as-reduction approach of Fernández-González and Martins (2015) by a small margin in the Hall/Nivre split, also yielding much improved (+1.9%) evaluation results for the case of the SPMRL split.

5.3 Experiments with Out-of-domain parsing

Table 3 shows the results on different treebanks from SMULTRON in comparison with those on Tiger. We see that out-of-domain results are substantially lower than those achievable on the Tiger development set: The novel (sophie) performs relatively well while performance on the other domains falls off even more. Most difficulties arise with the DVD manuals (dvdman), which Seeker and Kuhn (2014) already attribute to several phenomena not seen in well-edited text.

If we compare the accuracy of the POS assignment from the PCFG-LA parsing models to both the tagging results of Seeker and Kuhn (2014) and our own, in each case using the Marmot CRF tagger (Müller et al., 2013), we see that in all cases (except for the Alpine domain) the parser is able to make use of the syntactic context to achieve improved part-of-speech accuracy, even if the overall difficulty of the out-of-domain texts is higher. Use of word clusters seems to be especially helpful in those cases where the parser has difficulty at the POS level.

6 Summary

In this paper, we have presented a new, linguistically well-motivated method for transforming discontinuous trees to context-free ones and back.

[3] Fernández-González and Martins report substantially higher results using gold part-of-speech tags.

Because this method allows us to make effective use of state-of-the-art parsing for continuous trees (similar to the parsing-as-reduction approach leveraging state-of-the-art models for dependency parsing), this transformation approach has a substantial advantage over models that use a more complex grammar formalism but have to use a simpler statistical model.

Using three generative probabilistic models, we showed that our method performs better than the older transformation approach of Boyd (2007), and outperforms the current state of the art for discontinuous parsing on the Tiger treebank, the parsing-as-reduction approach of Fernández-González and Martins (2015). Future work will explore feature-based statistical models for reattachment and parse selection.

Acknowledgements. The research described in this paper was supported by the Leibniz Society and the Baden-Württemberg ministry of science, research and the arts as part of the ScienceCampus Empirical Linguistics and Computational Natural Language Modeling. The author is grateful to Jen Sikos and Alexis Palmer, as well as the three anonymous reviewers, for helpful comments on earlier versions of this paper.

Parser             LF1/70  EX/70  POS
TIGERDEV
multi+sm4          84.18   46.90  97.84
multi+sm4+clust    84.31   47.51  97.74
ALPINE
multi+sm4          74.18   32.70  93.79
multi+sm4+clust    74.80   33.96  93.96
marmot Seeker14                   94.42
ECONOMY
multi+sm4          74.38   22.05  91.73
multi+sm4+clust    74.50   22.44  92.40
marmot Seeker14                   91.83
SOPHIE
multi+sm4          77.71   38.56  96.51
multi+sm4+clust    77.53   38.19  96.41
marmot Seeker14                   95.20
DVDMAN
multi+sm4          71.54   25.78  90.55
multi+sm4+clust    72.45   26.56  90.22
marmot Seeker14                   90.81

Table 3: Comparative results for parsing the SMULTRON treebanks (LR without cleanup)

                        L ≤ 40         all
Tiger-H&N (pred)        F1     EX      F1     EX
Hall&Nivre 2008         75.33  32.63   -      -
van Cranenburgh '13     78.8-  40.8-   -      -
Fernandez&Martins '15   82.57  45.93   81.12  44.48
Ours, BLLIP             81.16  43.17   79.68  41.87
Ours, PCFGLA-prod       82.93  44.26   81.93  42.87

                        L ≤ 70         all
Tiger-SPMRL (pred)      F1     EX      F1     EX
Versley 2014            73.90  37.00   -      -
Fernandez&Martins '15   77.72  38.75   77.32  38.64
Ours, BLLIP             76.96  35.52   76.52  35.42
Ours, PCFGLA-prod       79.84  39.61   79.50  39.50

Table 4: Test set results on the split of Hall and Nivre (2008) and on the SPMRL split

References

Boston, M. F., Hale, J. T., and Kuhlmann, M. (2010). Dependency structures derived from Minimalist Grammars. In Ebert, C., Jäger, G., and Michaelis, J., editors, MOL 10/11, volume 6149 of Lecture Notes in Computer Science, pages 1–12. Springer-Verlag, Berlin Heidelberg.

Boyd, A. (2007). Discontinuity revisited: An improved conversion to context-free representations. In Proceedings of the Linguistic Annotation Workshop (LAW 2007).

Brants, S., Dipper, S., Hansen, S., Lezius, W., and Smith, G. (2002). The TIGER treebank. In Proc. TLT 2002.

Candito, M.-H. and Seddah, D. (2010). Parsing word clusters. In Proceedings of the First Workshop on Statistical Parsing of Morphologically-Rich Languages (SPMRL 2010).

Carreras, X., Collins, M., and Koo, T. (2008). TAG, dynamic programming, and the perceptron for efficient, feature-rich parsing. In Proceedings of CoNLL.

Charniak, E. (2000). A maximum-entropy-inspired parser. In Sixth Applied Natural Language Processing Conference (ANLP-NAACL 2000).

Charniak, E. and Johnson, M. (2005). Coarse-to-fine n-best parsing and maxent discriminative reranking. In Proc. ACL 2005.

Chomsky, N. (1986). Barriers. Linguistic Inquiry Monographs. MIT Press.

Collins, M. (1997). Three generative, lexicalized models for statistical parsing. In Proc. ACL 1997.

Collins, M. (2000). Discriminative reranking for natural language parsing. In ICML 2000.

Collins, M., Hajič, J., Ramshaw, L., and Tillmann, C. (1999). A statistical parser for Czech. In Proceedings of ACL 1999.

Dorr, B. J. (1987). Principle-based parsing for machine translation. Technical report, Massachusetts Institute of Technology Artificial Intelligence Laboratory.

Dubey, A. (2005). What to do when lexicalization fails: parsing German with suffix analysis and smoothing. In ACL-2005.

Dubey, A. and Keller, F. (2003). Probabilistic parsing for German using sister-head dependencies. In ACL'2003.

Farkas, R. and Schmid, H. (2012). Forest reranking through subtree ranking. In EMNLP-CoNLL 2012.

Fernández-González, D. and Martins, A. F. T. (2015). Parsing as reduction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1523–1533, Beijing, China. Association for Computational Linguistics.

Gildea, D. (2010). Optimal parsing strategies for linear context-free rewriting systems. In Proceedings of NAACL 2010.

Hall, J. and Nivre, J. (2008). Parsing discontinuous phrase structure with grammatical functions. In Proceedings of the 6th International Conference on Natural Language Processing (GoTAL 2008).

Hall, K. and Novak, V. (2005). Corrective modeling for non-projective dependency parsing. In Proceedings of the Ninth International Workshop on Parsing Technology (IWPT 2005).

Hobbs, J. R. and Bear, J. (1990). Two principles of parse preference. In Coling 1990.

Hsu, Y.-Y. (2010). Comparing conversions of discontinuity in PCFG parsing. In Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories.

Hunter, T. and Dyer, C. (2014). Distributions on minimalist grammar derivations. In Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13).

Johnson, M. (1998). PCFG models of linguistic tree representations. Computational Linguistics, 24(4):613–632.

Kallmeyer, L. and Maier, W. (2013). Data-driven parsing using probabilistic linear context-free rewriting systems. Computational Linguistics, 39(1):87–119.

Keenan, E. L. and Stabler, E. P. (2003). Bare Grammar. CSLI Publications, Stanford.

Levy, R. and Manning, C. (2004). Deep dependencies from context-free statistical parsers: correcting the surface dependency approximation. In ACL 2004.

Lin, D. (1993). Principle-based parsing without overgeneration. In Proceedings of ACL 1993.

Maier, W. (2015). Discontinuous incremental shift-reduce parsing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1202–1212, Beijing, China. Association for Computational Linguistics.

Maier, W., Kaeshammer, M., and Kallmeyer, L. (2012). PLCFRS parsing revisited: Restricting the fan-out to two. In Proceedings of the 11th International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG+11).

Marcus, M., Kim, G., Marcinkiewicz, M. A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., and Schasberger, B. (1994). The Penn treebank: Annotating predicate argument structure. In Workshop on Human Language Technology.

McDonald, R. (2006). Online learning of approximate dependency parsing algorithms. In EACL 2006.

Michaelis, J. (1998). Derivational minimalism is mildly context-sensitive. In Logical Aspects of Computational Linguistics.

Müller, T., Schmid, H., and Schütze, H. (2013). Efficient higher-order CRFs for morphological tagging. In Proceedings of EMNLP 2013.

Müller, T. and Schuetze, H. (2015). Robust morphological tagging with word representations. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 526–536, Denver, Colorado. Association for Computational Linguistics.

Nivre, J. and Nilsson, J. (2005). Pseudo-projective dependency parsing. In Proceedings of ACL 2005.

Petrov, S. (2010). Products of random latent variable grammars. In HLT-NAACL 2010.

Petrov, S., Barrett, L., Thibaux, R., and Klein, D. (2006). Learning accurate, compact, and interpretable tree annotation. In COLING-ACL 2006.

Rehbein, I. and van Genabith, J. (2009). Automatic acquisition of LFG resources for German: As good as it gets. In Proceedings of LFG09.

Schiehlen, M. (2004). Annotation strategies for probabilistic parsing in German. In Proc. Coling 2004.

Seeker, W. and Kuhn, J. (2014). An out-of-domain test suite for dependency parsing of German. In Proceedings of LREC 2014.

Skut, W., Krenn, B., Brants, T., and Uszkoreit, H. (1997). An annotation scheme for free word order languages. In Proceedings of the Fifth Conference on Applied Natural Language Processing (ANLP-97).

Stabler, E. (1997). Derivational minimalism. In Logical Aspects of Computational Linguistics, pages 68–95.

Telljohann, H., Hinrichs, E. W., Kübler, S., Zinsmeister, H., and Beck, K. (2009). Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). Technical report, Seminar für Sprachwissenschaft, Universität Tübingen.

van Cranenburgh, A. (2012). Efficient parsing with linear context-free rewriting systems. In EACL 2012.

van Cranenburgh, A. and Bod, R. (2013). Discontinuous parsing with an efficient and accurate DOP model. In Proceedings of the International Conference on Parsing Technologies (IWPT 2013).

Versley, Y. (2014). Experiments with easy-first non-projective constituent parsing. In Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages, pages 39–53, Dublin, Ireland. Dublin City University.

Versley, Y. and Rehbein, I. (2009). Scalable discriminative parsing for German. In Proc. IWPT 2009.

Vijay-Shanker, K., Weir, D. J., and Joshi, A. K. (1987). Characterizing structural descriptions produced by various grammatical formalisms. In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics (ACL 1987).

Volk, M., Göhring, A., Rios, A., Marek, T., and Samuelsson, Y. (2015). SMULTRON (version 4.0): The Stockholm MULtilingual parallel TReebank. An English-French-German-Quechua-Spanish-Swedish parallel treebank with sub-sentential alignments.

Appendix A. Notational clarifications

For two trees $T$ and $T'$ covering the same sequence of terminals, and containing no unary productions, we define the intersection tree $T \cap T'$ as follows:

• If we identify nonterminals with (label, yield) pairs, the nonterminals of $T \cap T'$ are exactly those (label, yield) pairs that correspond to nonterminals of both $T$ and $T'$.

• The set of edges of $T \cap T'$ is determined as the cover relation of its descendent relation. A node $n_1$ is a descendent of a node $n_2$ iff $\mathrm{yield}(n_1) \subseteq \mathrm{yield}(n_2)$.

We can extend this definition to trees containing unary productions if we consider mappings $L_T$ from yields to label sequences (i.e., $\mathcal{P}(\mathbb{N}_n) \to \Sigma^*$ if $\Sigma$ is our set of labels). We can define this mapping for a tree as follows:

• if there is no node in $T$ with a yield $y$, then $L_T(y) = \varepsilon$

• if $T$ contains one or more nodes with the same yield, they form a chain of unary productions. Given the sequence $l_1, \ldots, l_m \in \Sigma^*$ of the labels of this sequence (read from top to bottom), we then set $L_T(y) = l_1, \ldots, l_m$.

Taking (e.g.) the longest common prefix of two such sequences gives us an operation that is idempotent, commutative and associative.

If we identify nonterminals with (label, yield, order) triples, we can extend the definition of the intersection tree as follows:

• The nonterminals of $T \cap T'$ are those (label, yield, order) triples that correspond to nonterminals of both $T$ and $T'$ and whose unary parents (if any) are also nonterminals of $T \cap T'$.

• A node $n_1$ is a descendent of a node $n_2$ if either $\mathrm{yield}(n_1) \subsetneq \mathrm{yield}(n_2)$ or if $\mathrm{yield}(n_1) = \mathrm{yield}(n_2) \wedge \mathrm{order}(n_1) \leq \mathrm{order}(n_2)$.

We can assign induced dependencies to a node given a suitable head assignment function as follows: Given a head assignment function $\mathrm{headidx} : \Sigma \times \Sigma^* \to \mathbb{N}$ (such that $\mathrm{headidx}(p, c_1, \ldots, c_m) \in \{1, \ldots, m\}$) we can recursively assign a head to each node by using:

• $\mathrm{head}(n) = n$ for terminal nodes

• $\mathrm{head}(n) = \mathrm{head}(n_k)$ if $n$ has label $l_p$ and the children $n_1, \ldots, n_m$ have labels $l_1, \ldots, l_m$ and $\mathrm{headidx}(l_p; l_1, \ldots, l_m) = k$

The induced dependency graph then contains $\mathrm{Range}(\mathrm{Term})$ as nodes. For any pair of a node $n$ and its child $n'$, the dependency graph contains a dependency edge $(\mathrm{head}(n), \mathrm{head}(n'))$ as long as $\mathrm{head}(n) \neq \mathrm{head}(n')$.
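As an illustration only, the definitions in this appendix can be sketched in Python for the simple case of trees without unary productions. Everything concrete here is an assumption made for the sketch rather than part of the paper: nonterminals are represented as (label, frozenset of terminal positions) pairs, the `Node` class stores a tree for head assignment, terminals are identified by their position, and `toy_headidx` is a hypothetical head rule.

```python
from dataclasses import dataclass

# --- Intersection tree (trees without unary productions) -----------------
# Assumption: a nonterminal is a (label, yield) pair whose yield is a
# frozenset of terminal positions, so discontinuous yields are allowed.

def intersection_nodes(nodes_a, nodes_b):
    """(label, yield) pairs that occur as nonterminals in both trees."""
    return set(nodes_a) & set(nodes_b)

def cover_edges(nodes):
    """(parent, child) edges: the cover relation of proper yield inclusion."""
    edges = set()
    for child in nodes:
        ancestors = [n for n in nodes if child[1] < n[1]]
        for parent in ancestors:
            # Keep the edge only if no other ancestor lies strictly between.
            if not any(mid[1] < parent[1] for mid in ancestors):
                edges.add((parent, child))
    return edges

# --- Head assignment and induced dependencies ----------------------------
@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()  # an empty tuple marks a terminal
    position: int = -1    # terminal position (ignored for nonterminals)

def head(node, headidx):
    """Position of the terminal heading `node` (recursive definition)."""
    if not node.children:
        return node.position
    k = headidx(node.label, [c.label for c in node.children])  # 1-based
    return head(node.children[k - 1], headidx)

def induced_dependencies(node, headidx):
    """(head(n), head(n')) for each node n and child n' with differing heads."""
    edges = set()
    for child in node.children:
        h, hc = head(node, headidx), head(child, headidx)
        if h != hc:
            edges.add((h, hc))
        edges |= induced_dependencies(child, headidx)
    return edges

def toy_headidx(parent_label, child_labels):
    """Hypothetical rule: the first V* child if any, else the last child."""
    for i, label in enumerate(child_labels, start=1):
        if label.startswith("V"):
            return i
    return len(child_labels)

# Two toy trees over four terminals that share S and a discontinuous VP.
fs = frozenset
t1 = {("S", fs({0, 1, 2, 3})), ("VP", fs({0, 2, 3})), ("NP", fs({2, 3}))}
t2 = {("S", fs({0, 1, 2, 3})), ("VP", fs({0, 2, 3})), ("PP", fs({2, 3}))}
shared = intersection_nodes(t1, t2)
shared_edges = cover_edges(shared)

def term(tag, i):
    return Node(tag, (), i)

tree = Node("S", (
    Node("NP", (term("D", 0), term("N", 1))),
    Node("VP", (term("V", 2), Node("NP", (term("D", 3), term("N", 4))))),
))
deps = induced_dependencies(tree, toy_headidx)
```

In this toy setting the intersection keeps only the S node and the discontinuous VP node, linked by a single edge, and the induced dependency graph of `tree` has the verb at position 2 as its root. Handling unary productions would additionally require the order component and the label-sequence mapping $L_T$ described above, which this sketch leaves out.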


Author Index

Balabanova, Elisaveta, 31
Barreiro, Anabela, 22
Batista, Fernando, 22

Fucikova, Eva, 12

Hajic, Jan, 12

Lichte, Timm, 47

Maier, Wolfgang, 47

Snijders, Liselotte, 1
Sulger, Sebastian, 37

Uresova, Zdenka, 12

Versley, Yannick, 58
