Zero Alignment of Arguments in a Parallel Treebank

Jana Šindlerová, Eva Fučíková, Zdeňka Urešová
Charles University in Prague, Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics, Czech Republic
{sindlerova,uresova,fucikova}@ufal.mff.cuni.cz

Abstract

This paper analyses several points of interlingual dependency mismatch on the material of a parallel Czech-English dependency treebank. Particularly, the points of alignment mismatch between the valency frame arguments of the corresponding verbs are observed and described. The attention is drawn to the question whether such mismatches stem from the inherent semantic properties of the individual languages, or from the character of the used linguistic theory. Comments are made on the possible shifts in meaning. The authors use the findings to make predictions about possible machine translation implementation of the data.

1 Introduction

In machine translation tasks lately, paraphrases have been used and studied intensely. They basically serve to improve the evaluation metrics of MT systems. The ability to generate valid paraphrases also plays an important role in information retrieval tasks, textual entailment etc. The so-called paraphrase tables can be automatically extracted from parallel corpora (Denkowski and Lavie, 2010; Ganitkevitch et al., 2013).

So far, only lexical paraphrases have been explored for Czech (Barančíková et al., 2014), with syntactic (structural) paraphrases intended for future enhancement of the systems. For English, experiments with both lexical and syntactic paraphrases are employed (Dorr et al., 2004).

This paper presents a preliminary linguistic analysis of structural paraphrases based on valency representations. It appears that certain types of paraphrases affect the valency structure of verbs, and possibly the semantic structure of the sentence, in terms of foregrounding or backgrounding different arguments.[1]

We believe that the analysis of possible syntactic variation within paraphrases, especially such that involves a kind of “disproportion”, in the parallel treebank data, would be beneficial for further MT experiments.

By a disproportion in dependencies, we mean such structural configurations that involve a different number of dependencies in corresponding syntactic structures, i.e., an alignment of “something” on one side of the translation to “nothing” on the other side. For the purposes of this paper, we call it a “zero alignment”.

2 Related Work

The analysis in this paper goes in a similar direction as that of (Sanguinetti et al., 2013), though our interest in what they call a “translation shift” is of a different kind. The authors claim that dependency structures are well suited to account for the alignment of syntactically different treelets between languages, because of the subtree structures constituting similar semantic units. We take their findings as our starting point and provide a linguistic analysis of some of the well-identified categories of translation shift from their research, in order to get a better understanding of different linguistic grounds for different syntactic structures for a parallel semantic content. Also, our analysis is based on the deep syntactic layer (in contrast to the surface structure alignments used in the paper mentioned above), therefore it does not have to deal with those structural phenomena that might not have important semantic consequences, but only serve for topic-focus hierarchization purposes (such as word order variation, simple passivization etc.).

[1] Here, we use the label "argument" in a simplifying manner. Any element which is included in the valency frame is referred to as an argument.

Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), pages 330–339, Uppsala, Sweden, August 24–26 2015.

Our research is also inspired by (Bojar et al., 2013), an attempt to generate as many translation paraphrases as possible, in order to enlarge the reference set of translations for MT evaluation purposes. The experiment described in the paper used mostly a flat approach, and was carried out with substantial work provided by human annotators. We believe that our research might help establish rules for automatic extraction of true syntactic paraphrases (without unnecessary noise) from parallel corpora, based on the valency patterns of words, so that most of the work could be done automatically, with minimal human control.

3 Methodology and Data

In the research, we took advantage of the existence of Czech-English parallel data, namely the Prague Czech-English Dependency Treebank 2.5 (PCEDT 2.5) (Hajič et al., 2012).[2]

It is a collection of about 50 000 sentences, taken from the Wall Street Journal part of Penn treebank (Marcus et al., 1993),[3] translated manually into Czech, transformed into dependency trees and annotated at the level of deep syntactic relations (called tectogrammatic layer). In short, the tectogrammatic layer contains mostly content words (with several defined exceptions) connected with oriented edges and labelled with syntactico-semantic functors according to the Functional Generative Description approach (FGD), see (Sgall et al., 1986). Ellipsis and anaphora resolution is also included, as well as automatic alignment of corresponding nodes.

The PCEDT 2.5 is annotated according to the FGD valency theory (FGDVT) and two valency lexicons (one for each language) are part of the release.

PDT-Vallex[4] (Hajič et al., 2003; Urešová, 2011) has been developed as a resource for annotating argument relations in the Prague Dependency Treebank (Hajič et al., 2006). The version used here contains 11,933 valency frames for 7,121 verbs. Each valency frame in the PDT-Vallex represents a distinct verb meaning. Valency frames consist of argument slots represented by tectogrammatic functors (slots). Each slot is marked as obligatory or facultative and its typical morphological realization forms are listed. Frame entries are supplemented with illustrative sentence examples.

EngVallex[5] (Cinková, 2006) was created as an adaptation of an already existing resource of English verb argument structure characteristics, the Propbank (Palmer et al., 2005). The original Propbank argument structure frames have been adapted to the FGD scheme, so that it currently bears the structure of the PDT-Vallex, though some minor deviations from the original scheme have been allowed in order to save some important theoretical features of the original Propbank annotation. This lexicon includes 7,148 valency frames for 4,337 verbs.

PDT-Vallex and EngVallex have been interlinked into a new resource called CzEngVallex (Urešová et al., 2015a; Urešová et al., 2015). Besides the complete data of the two lexicons, the CzEngVallex contains a database of frame-to-frame, and subsequently, argument-to-argument pairs for the purposes of machine translation experiments (Urešová et al., 2015b). PCEDT and the CzEngVallex data have already been used successfully in several MT experiments aimed at valency frame detection and selection (Dušek et al., 2014) and also for word sense disambiguation (Dušek et al., 2015).

The interlinking of CzEngVallex frames was carried out via an annotation over the PCEDT. First, an automatic alignment procedure was run over the data, which suggested translational links between nodes of the tectogrammatic layer. Corresponding verb pairs[6] and argument pairs were highlighted. Then, manual revision and correction of the alignments by two annotators was carried out. Thus, as a by-product of building the lexicon, a collection of illustrative annotated tree pairs is available for each verb pair of the CzEngVallex.

4 Zero Alignment in the Data

In the following sections, we will describe the most important, consistent and frequent points of zero alignment found in the data. For each section, we will comment on the linguistic background of the phenomena described and the possible consequences for semantic interpretation in the individual languages.

[2] https://catalog.ldc.upenn.edu/LDC2012T08
[3] https://catalog.ldc.upenn.edu/LDC99T42
[4] http://lindat.mff.cuni.cz/services/PDT-Vallex
[5] http://lindat.mff.cuni.cz/services/EngVallex
[6] As a basic stage of building the CzEngVallex, only verb-verb pairs were taken into account.
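
As an illustration of the frame-to-frame and argument-to-argument pairs described in Section 3, the following minimal Python sketch shows one way such a pair could be represented and how a zero alignment could be read off it. The class and field names (`ValencyFrame`, `FramePair`, `links`) are our own assumptions for illustration and do not reflect the actual CzEngVallex file format. Frames are reduced to a lemma and a list of functors; the example data are the see / vidět frames and the mapping given for verbs of perception in Section 4.1.1.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ValencyFrame:
    """A valency frame reduced to its lemma and the functors of its slots
    (hypothetical simplification; real lexicon entries carry more detail)."""
    lemma: str
    functors: List[str]


# One argument link: (English functor, Czech functor).
# None stands for "nothing" on that side of the translation.
ArgLink = Tuple[Optional[str], Optional[str]]


@dataclass
class FramePair:
    """A frame-to-frame pair with its argument-to-argument links."""
    en: ValencyFrame
    cs: ValencyFrame
    links: List[ArgLink] = field(default_factory=list)

    def zero_alignments(self) -> List[ArgLink]:
        """Return the links where an argument has no counterpart."""
        return [(e, c) for e, c in self.links if e is None or c is None]


# Mapping reported for verbs of perception: ACT -> ACT; PAT -> EFF; --- -> PAT
see_videt = FramePair(
    en=ValencyFrame("see", ["ACT", "PAT"]),
    cs=ValencyFrame("vidět", ["ACT", "PAT", "EFF"]),
    links=[("ACT", "ACT"), ("PAT", "EFF"), (None, "PAT")],
)

print(see_videt.zero_alignments())  # [(None, 'PAT')]
```

Representing the missing side explicitly as `None`, rather than omitting the link, keeps the "something to nothing" configuration visible in the data, which is the property the following sections examine.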

4.1 Catenative Verbs - Single vs. Double Object Interpretation

One of the prominent points of alignment disproportion in the data is found in sentences with catenative verbs. Catenative verbs are usually defined as those combining with non-finite verbal forms. Between the finite catenative verb and the non-finite verb form, there might appear an intervening NP that might be interpreted as the subject of the dependent verbal form. In this section, we will be concerned with exactly those verbs allowing the sequence of a finite catenative verb – NP – a non-finite catenative verb.

4.1.1 ECM Constructions, Raising to Object

Most Czech linguistic approaches do not recognize the term Exceptional Case Marking (ECM) in the sense of “raising to object”; instead they generally address similar constructions under the label “accusative with infinitive”. The difference between ECM and control verbs is not taken into account in most Czech grammars. In short, raising and ECM are generally considered a marginal phenomenon in Czech and are not treated conceptually (Panevová, 1996), except for several attempts to describe agreement issues, e.g., the morphological behaviour of predicative complements described in a phrase structure grammar formalism (Przepiórkowski and Rosen, 2005).

The reason for this negligent approach to ECM is probably rooted in the low frequency of ECM constructions in Czech. Czech sentences corresponding to English sentences with ECM mostly do not allow catenative constructions. They usually involve a standard dependent clause with a finite verb, see Fig. 1,[7] or they include a nominalization, thus keeping the structures strictly parallel.

Figure 1: Alignment of the ECM construction (tree diagram not reproduced).
En: They expect him to cut costs...
Cz: Očekávají, že sníží náklady...

[7] In the examples displayed, the green dashed lines connect the annotated verb pair, the dotted lines connect verb dependents, the thick arrows mark collected verb arguments, the automatic node alignment is displayed in blue, the manually corrected alignment is marked in red. The images have been cropped or otherwise adjusted for the sake of clarity.

The only exception are verbs of perception (see, hear), which usually allow both ways of Czech translation – with an accusative NP followed by a non-finite verb form (1a), or with a dependent clause (1b), not speaking about the third possibility involving an accusative NP followed by a dependent clause (1c).

(1) He saw Peter coming.
    a. Viděl Petra přicházet.
       He saw Peter.ACC to come.
    b. Viděl, že Petr přichází.
       He saw that Peter.NOM is coming.
    c. Viděl Petra, jak přichází.
       He saw Peter.ACC, how is coming.

In this type of accusative-infinitive sequence, the accusative element is in FGDVT analysed consistently as the direct object of the matrix verb (the PATient argument) and the non-finite verb form then as the predicative complement of the verb (the EFFect argument).

The PCEDT annotation of verbs of perception is shown in Fig. 2, with frame arguments mapped in the following way:

    ACT → ACT; PAT → EFF; --- → PAT

Figure 2: Alignment of the perception verbs’ arguments. The corresponding arguments man-muž are interpreted as belonging to verbs in different levels of the structure (tree diagram not reproduced).
En: I have seen [one or two] men die...
Cz: Zato jsem viděla [jednoho nebo dva] muže zemřít...

The literature mentions two ways of ECM structural analysis, a flat one, representing the NP as dependent on the matrix verb, and a layered one, representing the intervening NP as the subject of the dependent verb. This mirrors the opinion that verbs allowing ECM usually have three syntactic, but only two semantic arguments. It is then a matter of decision between a syntactic and semantic approach to tree construction.

The English part of the PCEDT data was annotated in the layered manner,[8] thus most of the pairs in the treebank appear as strictly parallel. The consistency of structures is one of the most important advantages of the layered approach; there is no need of having two distinct valency frames for the two syntactic constructions of the verb, therefore, the semantic relatedness of the verb forms is kept. Also, there are other specific constructions supporting the layered analysis for English, like the there-constructions intervening instead of the NP, see (2).

(2) We expected there to be slow growth.

[8] The annotation followed the original phrasal annotation of the data in the Penn Treebank.

On the other hand, the Czech part of the PCEDT data uses flat annotation, partly because the catenative construction with raising structure is fairly uncommon in Czech (cf. Sect. 4.1.1). The flat structure is easier to interpret, or translate in a morphologically correct way to the surface realization, but it requires multiple frames for semantically similar verb forms (the instances of the verb to see in see the house fall and see the house are in the FGD valency approach considered two distinct lexical units) and it also leaves alignment mismatches in the parallel data.

The treatment of ECM constructions in English and in Czech is different. It reflects both the differences internal to the languages and their consequences in theoretical thinking. Contrary to English, Czech nouns carry strong indicators of morphology – case, number and gender. The rules for the subject-verb agreement block overt realization of subjects of the infinitives. The accusative ending naturally leads to the interpretation of the presumed subject of the infinitive as the object of the matrix verb. The morphosyntactic representation is taken as a strong argument for using a flat structure in the semantic representation, and a covert co-referential element for filling the “empty” ACTor position of the infinitive. In English, in general, there is no such strong indication and therefore the layered structure is preferred in the semantic representation.

4.1.2 Object Control Verbs, Equi Verbs, Causatives

Contrary to the ECM constructions, object control verb constructions (OCV), involving verbs such as make, cause, or get, are analyzed strictly as double-object in both languages, i.e., the intervening NP is dependent on the matrix verb (and licensed by it) and there is usually a co-referential empty element of some kind in the valency structure of the dependent verb form. OCV constructions are similarly frequent in Czech and English and their alignment in the PCEDT data is balanced, see Fig. 3.[9]

Figure 3: Alignment of the control verbs’ arguments (tree diagram not reproduced).
En: ...making water run...
Cz: ...přimět vodu téct...

[9] In Fig. 3, English ACT of run does not show the coreference link to water since the annotation of coreferential relations has not yet been completed on the English side of the PCEDT, as opposed to the Czech side (cf. the coreference link from ACT of téci to voda).

Interestingly, it is sometimes the case that English control verbs in the treebank are translated with non-control, non-catenative verbs on the Czech side, and the intervening NP is transformed to a dependent of the lower verb of the dependent clause (see Fig. 4), or even a more complex nominalization of the dependent structure is used.

Figure 4: Alignment of English OCV with Czech non-OCV construction (tree diagram not reproduced).
En: The fact...... will also make the profit picture look...
Cz: Skutečnost...... způsobuje, že ziskový obraz vypadá...

The verb involved in this kind of translation shift may be either a more remote synonym, or a conversive verb.[10]

Such a translation shift brings about (at least a slight) semantic shift in the interpretation, usually in the sense of de-causativisation of the meaning (prompt → lead to).[11] Nevertheless, this type of semantic shift does not prevent the use of the structure as a sufficiently equivalent expression of the semantic content. We approach this as an inherent property of (any) language to suppress certain aspects of meaning without losing the general sense of synonymity.

[10] Semantic conversion in our understanding relates different lexical units, or different meanings of the same lexical unit, which share the same situational meaning. The valency frames of conversive verbs can differ in the number and type of valency complementations, their obligatoriness or morphemic forms. Prototypically, semantic conversion involves permutation of situational participants.
[11] Note that the de-causativisation process is possible without objections whereas the reverse shift, from non-control verb to a control verb, is rare if it at all exists.

4.2 Complex Predication

By “complex predication” we mean a combination of two lexical units, usually a (semantically empty, or “light”) verb and a noun (carrying main lexical meaning and marked with CPHR functor in the data), forming a predicate with a single semantic reference, e.g., to make an announcement, to undertake preparations, to get an order. There are some direct consequences for the syntactically annotated parallel data.

The first type of zero alignment is connected to the fact that a complex predication in one language can be easily translated with a one-word reference, and consequently aligned to a one-word predication, in the other language. This is quite a trivial case. In the data, then, one component of the complex predication remains unaligned. There are basically two ways of resolving such cases: either one can align the light verb with the full verb in the other language, or one can align the full verb with the dependent noun in the complex predication, based on the similarity of semantic content. In the CzEngVallex, the decision was to align the verbs, reflecting the fact that the verb and the noun phrase form a single unit from the semantic point of view.

The second type of zero alignment is connected to the presence of a “third” valency argument within the complex predication structure, e.g., En: placed weight on retailing - Cz: klást důraz na prodej, see Fig. 5.

Complex predicates have been annotated according to quite a complicated set of rules on the Czech side of the PCEDT data (for details, see (Mikulová et al., 2006)). Those rules include also the so-called dual function of a valency modification. There are two possible dependency positions for the “third” valency argument of the complex predicate: either it is modelled as the dependent of the semantically empty verb, or as a dependent of the nominal component. The decision between the two positions relies on multiple factors, such as valency structure of the semantically full use of the verb, valency structure of the noun in other contexts, behaviour of synonymous verbs etc. On the Czech side, the “third” valency argument was strongly preferred to be a dependent of the nominal component.

On the English side of the PCEDT, the preferred decision was different. The “third” argument was annotated as a direct dependent of the light verb
(probably due to lower confidence of non-native speaker annotators in judging verb valency issues).

Figure 5: Mismatch due to complex predication solution (tree diagram not reproduced).
En: Other furriers have also placed more weight on retailing.
Cz: Ostatní obchodníci s kožešinami rovněž kladou větší důraz na maloobchodní prodej.

There is probably no chance of dealing with the dependencies in one of the two above stated ways only. The class of complex predicates in the data is wide and heterogeneous with respect to semantic and morphosyntactic qualities. Nevertheless, the data suggest several points of interesting inconsistencies stemming from the imperfection or lack of reliability of the theoretical guidelines. For example, the dependency of the valency complementation of the complex predicate klást důraz ‘place emphasis’, as can be seen in Fig. 5, is solved as a dependency on the nominal component, whereas in the complex predicate klást požadavek ‘place claim’, the valency lexicon entry involves a direct dependency on the verb. Keeping in mind that the verb klást ‘to place’ has three arguments in its semantically full occurrences, we would expect direct dependency on the verb in both cases.

4.3 Conversive Verbs

A considerable number of unaligned arguments in the data is caused by the translator’s choice of a verb in a conversive relation to the verb used in the original language. For some reason (e.g., frequency of the verbal lexical unit, topic-focus articulation etc.), the translator decides not to use the syntactically most similar lexical unit, but uses a conversive one (cf. also Sect. 4.1.2), thus causing the arguments to relocate in the deep syntactic structure, see Fig. 6.

Figure 6: Mismatch due to the use of conversive verbs (tree diagram not reproduced).
En: His election increases Ryder’s board to 14 members.
Cz: Jeho zvolením se počet členů správní rady společnosti Ryder zvýšíl na 14.

The relocation of arguments frequently goes together with backgrounding of one of the arguments, which then either disappears from the translation, or is transformed into an adjunct, or into a dependent argument embedded even lower in the structure.

The first argument (actant)[12] in the FGD approach is strongly underspecified. It is mostly delimited by its position in the tectogrammatic annotation. Its prevalent morphosyntactic realization is nominative case, but certain exceptions are recognized (verbs of feeling etc.).

[12] Under the term “actant”, FGDVT distinguishes five core constituting valency complementations, ACT, PAT, ADDR, EFF, and ORIG.

Also, the ACT position (first actant) is subject to the process called “shifting of cognitive roles” (Panevová, 1974), i.e., other semantic roles can take the nominative case and the corresponding place in the structure
in case there is no semantic agent in the structure. Thus we get semantically quite different elements (e.g., +anim vs. -anim) in the ACT position, even with formally identical verb instances, see the English side of Figs. 7 and 8.

Figure 7: Conflict due to the underspecification of the ACT position (tree diagram not reproduced).
En: Mr. Wertheimer based this on a statement by Mr. Keating...
Cz: Wertheimer se opírá o prohlášení Keatinga...

Figure 8: Original collect for the verbs base and opírat se (tree diagram not reproduced).
En: The report was based on a telephone survey...
Cz: Zpráva se opírá o telefonický výzkum...

This formal feature of the FGDVT gives rise to a number of conflicts in the parallel structures, namely in structures that undergo semantic de-agentization or (milder) de-concretization of the agent.

Here the question arises whether such verb instances correspond to different meanings of the verb (represented by different verb frames), or whether they correspond to a single meaning (represented by a single valency frame). It is often the case that the Czech data tend to overgeneralize the valency frames through considering the different instances as realizations of a single deep syntactic valency frame, when there is no other modification intervening in the frame. Therefore, this approach chosen for the Czech annotation sometimes shows a conflict, as in Fig. 7.

The valency structure for both instances of base is identical, only in the first case, the verb is used in active voice, whereas in the second case, it takes passive morphology. There are three semantic arguments in the structure. We will call them the Person that expresses an opinion, the Expressed Opinion and the Resource for the opinion. The Person bases the Expressed Opinion on the Resource. With the English verb, the Expressed Opinion always takes the PAT position and the Resource the ORIGin position in the valency structure. On the other hand, on the Czech side of the data, there is a conflict. In both cases, there are seemingly only two arguments. In the first case, the Expressed Opinion is sort of backgrounded from the semantic structure. If there were a need to express it overtly, it would probably appear with locative morphology, as an adjunct: Wertheimer se v tomto opírá o prohlášení... ‘Wertheimer in this relies on a statement’ (see also an authentic example from the data in Fig. 9). In the second case, on the other hand, the structure follows the passivized English structure in backgrounding the Person (note that the se morpheme does NOT stand for a passive morphology here). If there were a need for expressing the Person, it would probably appear as a specifying dependent to the ACT position: Jejich zpráva se opírá o telefonický výzkum. ‘Their report is based on a phone survey’. In the second case, the Expressed Opinion does not take the PAT position, but the ACT position in the structure, which is the cause of the conflict. We are able to reformulate the first case
in a corresponding manner to show the Expressed Opinion argument in the ACT position and the Person backgrounded from the structure, see (3):

(3) a. Wertheimer se ve svém názoru opírá o prohlášení Keatinga.
       Wertheimer REFL in his opinion leans to the statement by Keating
    b. Wertheimerův názor se opírá o prohlášení Keatinga.
       Wertheimer’s opinion REFL leans to the statement by Keating
    c. Wertheimer opírá svůj názor o prohlášení Keatinga.
       Wertheimer leans his opinion to the statement by Keating

Figure 9: Original collect for the verbs base and vycházet with LOC argument linked to PAT (tree diagram not reproduced).
En: ...they based their conclusions on government statistics.
Cz: ...vycházejí z vládních statistik.

The problem of the status of a Czech verbal-adjoining se-morpheme is a complex one and there is no clear scientific consensus in this respect. The se-morpheme in Czech has a variety of functions, e.g., a passivization morpheme for the so-called “reflexive passive” form, a “dispositional diathesis” morpheme, a reflexive morpheme for lexical derivation of impersonal verbal variants, or an accusative reflexive pronoun.

These variants differ with respect to the way they are reflected in the data and in the lexicon. Some are treated as individual verb lemmas, some as surface variants of a common non-reflexive lemma.

The conflicts in annotation have a substantial reason – the ways in which English and Czech express backgrounding of the agent are multiple and they differ across the languages. Czech uses the se-morphemization often, in order to preserve the topic focus articulation (information) structure, whereas English does not have such a morpheme to work with, so it often uses simple passivization, or middle construction.

Moreover, the first valency position in Czech is often overgeneralized, allowing a multitude of semantically different arguments, which is, due to “economy of description”, sometimes not reflected in the linguistic theory.

4.4 Arguments Mapped to Adjuncts

In the previous section, we have described the bilingual treebank data manifestation of the fact that languages have different means of expressing a content, and we have noted that these can also vary between argument and adjunct interpretation. This variation appears both within a single language (one language expresses a largely synonymous content with either argument or adjunct means) and across languages (a direct consequence of the former case: an argument (actant) in one language can be translated into another language using an adjunct construction). Languages may differ in the preference for either of the possibilities.

Observing such mismatches in a parallel treebank occasionally leads us to hesitate whether our interpretation of a word (or phrase) as an argument or an adjunct is proper or justifiable. There may be two possible consequences drawn from the observation of a mismatch – either there are some (rather subtle) semantic reasons for structuring a word as an argument/adjunct, or there might be some imperfection in our theoretical thinking about the internal system of a particular language.

The theoretical distinction between arguments and adjuncts is subject to serious debates in the world of linguistics (Hwang, 2011; Tutunjian and Boland, 2008), and so far there is no approach known to us that would overcome this problem easily. Still, we can see that the real data indicate some remarkable points that stand at the roots of the argument/adjunct distinction problem, most prominently the nature of the relation between the form of the argument and its semantics.

In the parallel treebank, we find cases (among
others) such as alignment of an actor with a temporal adjunct (4) or an actor with a causal adjunct (5), etc.

(4) Americans haven’t forgiven China’s leaders for the military assault of June 3-4 that killed hundreds, and perhaps thousands, of demonstrators.
    a. Američané neodpustili čínským vůdcům vojenský útok z 3.-4. června, při kterém zahynuly stovky, možná i tisíce demonstrantů.
       Americans haven’t forgiven Chinese leaders military assault from 3-4 June, during which died hundreds, maybe even thousands demonstrators

(5) The purchase will make Quebecor the second-largest commercial printer in North America.
    a. Díky této koupi se společnost Quebecor stane druhou největší komerční tiskárnou v Severní Americe.
       Thanks to this purchase REFL the company Quebecor will become second largest commercial printer in North America

The interpretation of the argument in the above stated examples is driven mainly by its morphological form, which is a surprising finding considering that we are dealing with deep syntax, or even semantics.

It is believed that the form of the expression more or less mirrors its function in the language. The width of the paraphrasing range though, both within and across languages, leads us to question whether it is appropriate to lay much stress on the difference between arguments and adjuncts in the description of a language.

5 Conclusion

We have encountered several reasons for the presence of a zero alignment in the data. Though these reasons have different grounds, they tend to be interconnected in the language.

1. Language is flexible in paraphrasing linguistic content with different syntactic means. Even pairs of sentences which include semantic backgrounding or foregrounding of different arguments are easily interpreted as synonymous.

2. It is possible to use predicates that are in a conversive relation, or predicates of different complexity.

3. [...] to the shift in their morphosyntactic properties, to the shift in their valency status, or even to their complete disappearance from the structure.

4. The FGD, having been built on a morphologically rich Czech language, relies strongly on the morphosyntactic form of the individual arguments. Therefore, disproportions of the zero alignment or argument mismatch kind must appear when it is applied to other languages with different typological properties.

Points 1, 2 and 3 belong among inherent deeply rooted properties of (perhaps any) natural language. Such differences are not to be overcome by means of possible theoretical unification of description.

Point 4, on the other hand, belongs to the properties of a certain linguistic theory. We will leave it open whether it would be appropriate to change the very roots of a linguistic theory in order to make it more flexible for use across different languages. Nevertheless, it appears that it is at least possible to change those aspects that cause individual and otherwise unjustifiable conflicts in the data.

Acknowledgements

This work has been supported by the grant GP13-03351P of the Grant Agency of the Czech Rep.

References

P. Barančíková, R. Rosa, and A. Tamchyna. 2014. Improving Evaluation of English-Czech MT through Paraphrasing. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, and J. Mariani, editors, Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), pages 596–601, Reykjavík, Iceland. European Language Resources Association.

O. Bojar, M. Macháček, A. Tamchyna, and D. Zeman. 2013. Scratching the surface of possible translations. In Text, Speech, and Dialogue, pages 465–474. Springer.

S. Cinková. 2006. From PropBank to EngValLex: adapting the PropBank-Lexicon to the valency theory of the functional generative description. In Proceedings of the fifth International conference on Language Resources and Evaluation (LREC 2006), Genova, Italy.

3. The backgrounding and foregrounding of ar- M. Denkowski and A. Lavie. 2010. Meteor-next and guments leads to syntactic relocation of other the meteor paraphrase tables: Improved evaluation arguments in the structure, and consequently support for five target languages. In Proceedings

338 of the Joint Fifth Workshop on Statistical Machine M. Mikulová, A. Bémová, J. Hajic,ˇ E. Hajicová,ˇ Translation and MetricsMATR, pages 339–342. As- J. Havelka, V. Kolárová,ˇ L. Kucová,ˇ M. Lopatková, sociation for Computational Linguistics. P. Pajas, J. Panevová, M. Razímová, P. Sgall, J. Štepánek,ˇ Z. Urešová, K. Veselá, and Z. Žabokrt- B. J. Dorr, R. Green, L. Levin, O. Rambow, D. Far- ský. 2006. Annotation on the tectogrammatical well, N. Habash, S. Helmreich, E. Hovy, K.J. Miller, level in the Prague Dependency Treebank. Annota- T. Mitamura, et al. 2004. Semantic annotation and tion manual. Technical Report 30, Prague, Czech lexico-syntactic paraphrase. Proceedings of the 4th Rep. International Conference on Language Resources and Evaluation (LREC) Workshop on Building Lex- M. Palmer, D. Gildea, and P. Kingsbury. 2005. The ical Resources from Semantically Annotated Cor- proposition bank: An annotated corpus of semantic pora, pages 47 – 52. roles. Computational Linguistics, 31(1):71–106.

O. Dušek, J. Hajic,ˇ and Z. Urešová. 2014. Verbal va- J. Panevová. 1974. On verbal Frames in Functional lency frame detection and selection in Czech and Generative Description. Prague Bulletin of Mathe- English. In The 2nd Workshop on EVENTS: Defi- matical Linguistics, 22:3–40. nition, Detection, Coreference, and Representation, pages 6–11, Stroudsburg, PA, USA. Association for J. Panevová. 1996. More remarks on control. Prague Computational Linguistics. Linguistic Circle Papers, 2(1):101–120.

O. Dušek, E. Fucíková,ˇ J. Hajic,ˇ M. Popel, A. Przepiórkowski and A. Rosen. 2005. Czech and Polish raising/control with or without structure shar- J. Šindlerová, and Z. Urešová. 2015. Using Par- ing. 3:33–66. allel Texts and Lexicons for Verbal Word Sense Disambiguation. In Proceedings of the Third In- M. Sanguinetti, C. Bosco, and L. Lesmo. 2013. De- ternational Conference on Dependency Linguistics, pendency and constituency in translation shift anal- Depling 2015, page this volume. Uppsala Univer- ysis. DepLing 2013, page 282. sity. P. Sgall, E.Hajicová,ˇ and J. Panevová. 1986. The J. Ganitkevitch, B. Van Durme, and Ch. Callison- Meaning of the Sentence in Its Semantic and Prag- Burch. 2013. PPDB: The paraphrase database. In matic Aspects. Dordrecht, Reidel, and Prague, Proceedings of NAACL-HLT, pages 758–764, At- Academia, Prague. lanta, Georgia, June. Association for Computational Linguistics. D. Tutunjian and J. E. Boland. 2008. Do we need a distinction between arguments and adjuncts? Evi- J. Hajic,ˇ E. Hajicová,ˇ J. Panevová, P. Sgall, O. Bo- dence from psycholinguistic studies of comprehen- jar, S. Cinková, E. Fucíková,ˇ M. Mikulová, P. Pajas, sion. Language and Linguistics Compass, 2(4):631– J. Popelka, J. Semecký, J. Šindlerová, J. Štepánek,ˇ 646. J. Toman, Z. Urešová, and Z. Žabokrtský. 2012. Announcing Prague Czech-English Dependency Z. Urešová, E. Fucíková,ˇ and J. Šindlerová. 2015. Treebank 2.0. In Proceedings of LREC, pages 3153– CzEngVallex: Mapping Valency between Lan- 3160. guages. Technical Report TR-2015-58, Charles University in Prague, Institute of Formal and Ap- J. Hajic,ˇ J. Panevová, Z. Urešová, A. Bémová, plied Lingustics, Prague. To appear at http:// V. Kolárová,ˇ and P. Pajas. 2003. PDT-Vallex: Creat- ufal.mff.cuni.cz/techrep/tr58.pdf. ing a Large-coverage Valency Lexicon for Treebank Annotation. In Proceedings of The Second Work- Z. Urešová, O. Dušek, E. Fucíková,ˇ J. Hajic,ˇ and shop on Treebanks and Linguistic Theories, vol- J. 
Šindlerová. 2015b. Bilingual English-Czech va- ume 9, page 57–68. lency lexicon linked to a parallel corpus. In Pro- ceedings of the The 9th Linguistic Annotation Work- J. Hajic,ˇ J. Panevová, E. Hajicová,ˇ P. Sgall, P. Pajas, shop (LAW IX 2015), Stroudsburg, PA, USA. Asso- J. Štepánek,ˇ J. Havelka, M. Mikulová, Z. Žabokrt- ciation for Computational Linguistics. ský, M. ŠevcíkovᡠRazímová, and Z. Urešová. Z. Urešová. 2011. Valenˇcníslovník Pražského závis- 2006. Prague Dependency Treebank 2.0. Num- lostního korpusu (PDT-Vallex). Studies in Compu- ber LDC2006T01. Linguistic Data Consortium, tational and Theoretical Linguistics. Ústav formální Philadelphia, PA, USA. a aplikované lingvistiky, Praha, Czechia.

J. D. Hwang. 2011. Making verb argument adjunct Z. Urešová, O. Dušek, E. Fucíková,ˇ J. Hajic,ˇ and distinctions in English. Synthesis paper, University J. Šindlerová. 2015a. Bilingual English-Czech Va- of Colorado, Boulder, Colorado. lency Lexicon Linked to a Parallel Corpus. In Pro- ceedings of The 9th Linguistic Annotation Work- M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. shop, pages 124–128, Denver, Colorado, USA, June. 1993. Building a Large Annotated Corpus of En- Association for Computational Linguistics. glish: The Penn Treebank. COMPUTATIONAL LINGUISTICS, 19(2):313–330.

339