<<

Building the Valency Lexicon of

Viktor Bielicky´ Otakar Smrzˇ

Institute of Formal and Applied Linguistics, Charles University in Prague Malostranske´ nam´ estˇ ´ı 25, Prague 1, 118 00, Czech Republic [email protected]

Abstract This paper describes the building of a valency lexicon of Arabic verbs using a morphologically and syntactically annotated corpus, the Prague Arabic Dependency Treebank, as its primary source. We present the theoretical account on valency developed within the Functional Generative Description theory. We apply the framework to Arabic and discuss various valency-related phenomena with respect to examples from the corpus. We then outline the methodology and the linguistic and technical resources used in the building of the lexicon. Valency lexicons can find application in automatic parsing as well as in language generation.

1. Introduction Actant Meaning Example ACT Actor Peter read a letter. Valency of a lexical unit, in particular a , is a set of its ADDR Addressee Peter gave Mary a book. obligatory and/or optional arguments potentially or actu- PAT Patient I saw him. ally realized in an utterance. Valency information is useful EFF Effect We made her the secretary. in restoring the syntactic structure of an utterance, and has ORIG Origin She made a cake from apples. consequences for the study of the meaning. The goal of this paper is to prepare the theoretical (Sections Table 1: Types of actants (inner participants) illustrated on 2 and 3) and methodological (Sections 4 and 5) background English sentences (Lopatkova´ et al., 2006: xvi). for creating the valency lexicon of the most frequent Arabic verbs, exploiting various resources of information. Our ap- Actant Meaning DIR1 Direction to proach is inspired by the VALLEX lexicon of Czech verbs MANN Manner DIR3 Direction from (Lopatkova´ et al., 2006, 2008) and its treebank-oriented MEANS Means TWHEN Time when twin project, the PDT-VALLEX (Hajicˇ et al., 2003, 2006). LOC Location THO Time how often In our case, we focus on (MSA) and take as reference the Prague Arabic Dependency Treebank (PADT). It provides refined linguistic annota- Table 2: Types of adjunct (free modifications) appearing in ´ tions whose multi-level description scheme discerns func- this paper. For the complete list, cf. (Mikulova et al., 2006). tional morphology, analytical dependency syntax, and tec- togrammatical representation of linguistic meaning. The on a verb. Each verb has at least one valency frame. The current, largely extended version of PADT (cf. Hajicˇ et al., exact number of valency frames depends on the number of 2004; Smrz,ˇ 2007) covers over one million words of text. different meanings of the particular verb. For expressing relations between a verb and its complements, FGD uses 2. Theory of Valency in FGD various functors. These functors are divided into actants Before we focus on some issues concerning verbal va- (inner participants, arguments) and free (adverbial) modifi- lency in Arabic and our proposed methodology for creat- cations (adjuncts). The entire number of actants is five (for ing the valency lexicon, let us briefly outline the theoreti- examples in English, see Table 1): cal framework we have adopted. The Functional Genera- ACTor – usually the agent (the surface subject) or the tive Description (FGD) theory, which has been elaborated bearer of some property/quality; PATient – the goal/target since the sixties of the last century (in particular in Sgall or the affected by the action with consequences for et al., 1986; Hajicovˇ a´ and Sgall, 2003), is a multi-stratal its morphemic representation (the case in inflectional lan- dependency-oriented description of language. The valency guages) brought about by verbal government (usually the theory of verbs has been thoroughly researched within the direct object of transitive verbs); ADDRessee – usually the framework of FGD since the seventies (Panevova,´ 1974, indirect object on the surface; ORIGin – this participant 1975, 1994; Lopatkova´ and Panevova,´ 2005). The ques- is probably never obligatory; EFFect – usually the second tion of valency is closely associated with the underlying (inanimate) object, the predicative complement or the ad- tectogrammatical level of language description represent- verbial of result. ing the meaning of the discourse. As regards the actants, they have to fulfill two conditions. According to the valency theory of FGD, valency informa- The first condition is that the set of certain actants is char- tion of the given verb is defined by the valency frame—the acteristic for a particular verb—in other words, not every sequence of frame slots—which is filled by a specific num- actant can depend on every verb. The second is that every ber of various valency complements, i.e. a variety of either actant can occur only once as a complement of the given required or specifically permitted syntactic units dependent verb, disregarding coordination or apposition.

2300 On the contrary, there are different kinds of free modifi- not be takeninto consideration in our present study of Ara- cations denoting various types of adverbial complementa- bic and willbe the subject of our further research. In this tion (e.g. time, location, direction, manner, aim, cause, re- preliminary phase of our research, we adopt the valency gard, accompaniment). These free modifications can ap- frames in their narrow sense, i.e. including obligatoryand pear more than once with a single verb and theoretically optional actants and only obligatoryfree modifications as can modify any verb. It means that they are actually not has been pointed at in Table 3. restricted to a certain group of verbs, as is the case with actants. For examples of free modifications, see Section 3. 3. Valency in Arabic: Preliminary Overview The verbal valency frame in its narrow sense consists of In this section, we will adapt some aspects of the above both obligatory and optional inner participants and obliga- mentioned theoretical approach of FGD for Arabic in order tory free modifications (see Table 3). The criterion of obli- to makeour preliminary observations about the valency be- gatoriness or optionality of verbal complements was intro- havior of Arabic verbs and their verbonominal derivatives. duced in the dialogue test by (Panevova,´ 1974, 1975) with The only elaborate workon valency in Arabic which has respect to possibility to intentionally omit a contextually come to our knowledge is (Al-Qahtani, 2004). Contrary to bound obligatory complement on the surface morphemic FGD, al-Qahtani has adopted predominantly semantic ap- level of representation through the ellipsis or, for instance, proach, since he deals with verbal valency in terms of Case as a general (“dummy”) subject or object, etc. Grammar theory. He applies the Matrix Model of (Cook, 1979) to the semantic classification of Arabic verbs (state, obligatory optional action, and process verbs). To each class a specific set of re- inner participants (actants) + + quired semantic complements (“deep cases”) is assigned— free modifications (adjuncts) + − namely Agent, Experiencer, Benefactive, Object, and Loca- tive. The obligatoryObject is omnipresent with every verb Table 3: Members included in the valency frames. (in contrast to Actor in FGD) and can occur more than once in a case frame. Experiencer, Benefactive, and Locative are It is to be stated that the approach adopted by the FGD mutually exclusive. Sometimes, a particular case is not re- takes into account both syntactic and semantic criteria alized on the surface(“covert case role”), i.e. it is either par- for assigning functors to verbal complements (contrary to tially covert (“deletable”) or totally covert (“coreferential” other more semantically-based approaches). Within this or “lexicalized”). Those deletable case roles can be omit- approach the concept of “shifting of cognitive roles” was ted on the surface(optional or elided complements in terms adopted (Panevova,´ 1974, 1975, 1994). This “shifting” de- of FGD, see (X) and Table 4), whereas theso-called coref- notes application of primarily syntactic criteria for identi- erential and lexicalized case roles arealwaysabsent from fying the first two actants (Actor and Patient). Due to this the surface.The former coreferential roles denote instances fact, the first actant of the given verb is always identified where a single noun cumulates twocase roles simultane- as Actor and the second one as Patient regardless of their ously (not permitted in FGD, see (Y)), while the latterlex- actual semantics. On the contrary, semantic criteria are ap- icalized roles include instances where a certain case role plied when assigning functors to other actants as well as to (usually Object) is incorporated in the semantics of the verb all free (adverbial) modifications of a verb. For the concept (see (Z)). No shift of case roles takesplace in this approach. of “shifting” see Figure 1. Some examples of “shifting” (Al-Qahtani, 2004: 148, 178) will be illustrated on Arabic in Section 3. (X) qala¯ Zaydun maq ulata-hu¯ he-said Zayd said-of-him ORIG Zayd said what he had to say qal¯ AEO/E-del (Experiencer is deleted) ACT PAT ADDR (Y) darasa Zaydun al-kit aba¯ he-studied Zayd the-book Zayd studied the book EFF daras AEO/A=E (Agent equals Experiencer) (Z) ,amila Zaydun he-workedZayd Figure 1: Shifting of cognitive roles as a criterion for as- Zayd worked= Zayd did some work signing functors to actants (inner participants) of a verbal ,amil AO/O-lex (Object is lexicalized) frame (Panevov a,´ 1994: 234). (subject) li- (prep.) ,an (prep.) 4-/-inna (conj.) The valency frames as appeared in the valency lexicon ACTobl ADDRopt PATopt EFFobl of Czech verbs VALLEX are enriched with twoother someone to someone about sth. something/that sets of complements, namely quasi-valency complements and typical complements (Lopatkov a´ and Panevov a,´ 2005; ÈA¯ Lopatkova,´ 2003). The former quasi-valency type (consist- Table 4: Valency frame of the verb qal¯ ‘to say’. ing of newly introduced Obstacle and Difference and of re- vised previously existing complements Intention and Me- diator) is the kind of complement lying somewhere inbe- 3.1. Verbal valency tween the free modifications and actants, while the latter First, let us demonstratesome basic issues postulated by the typical type denotes optional free modification usually co- FGD approach. In all the following examples in this sec- occurring with a particular verb. Those complements will tion, the complements highlighted in bold are considered

2301 to be obligatory, the others are optional. Some examples transitive verbs as -a,.ta¯ IV ‘to give’ (=both objects (ADDR, derived from available corpora had to be abridged. PAT) are in accusative (8)) can be used regularly also in the In case that the verbal valency frame consists of only one reversed position (PAT, ADDR). In that case, the indirect inner participant, it is always Actor, whatever the semantics object (i.e., ADDR) appears with the preposition li- (9). of that complement would be. Here, the syntactic criteria (8) -a,.ta-hu¯ ’l-furs. ata play the major role in assigning the functor to a comple- dheACT-gave-dhimADDR dthe-opportunityPAT ment. Those verbs are typically intransitive stative (1) or he gave him the opportunity passive/reflexive (2). (9) -a,.tat-i ’s-sayt.arata li-’l-bunuki¯ (1) kana¯ yanamu¯ ,adatan¯ f¯ı sˇari¯ ,in s. ag˙¯ırin ditACT-gave dthe-powerPAT dto-the-banksADDR he-was dheACT-sleeps dusuallyTHO din a-street a- it gave the power to the banks LOC small Valency frames with ACT, PAT, EFF. The following verbs he usually slept in a small street are also double transitive: (2) intah. ara bi-tas. w¯ıbi-hi ’l-musaddasa -ila¯ ra-si-hi (10) ,ayyana-hu hakiman¯ li-’l-Kuwayti d ACT d . he -commited-a-suicide by-aiming-of-him the- dheACT-appointed-dhimPAT da-ruler for-KuwaitEFF MEANS gun at head-of-him he appointed him as a ruler of Kuwait he committed a suicide by aiming the gun at his head (11) i,tabara -Adun¯ ¯ıs -ur¯ ub¯ ¯ıyan If the valency frame includes two actants, the first actant is dhe-consideredACT dAdonisPAT da-EuropeanEFF considered to be Actor and the other Patient. Some verbs he considered Adonis to be European are directly transitive (3), whereas others are transitive in- Valency frames with four actants ACT, PAT, ORIG, EFF: directly through a preposition (4). (12) targamaˇ -aktara min hams¯ına kitaban¯ min-a ’l-faris¯ ¯ı- ¯ (3) ,aqadat-i ’l-lagnatuˇ ’l-munaz. z. imatu mu-tamaran s. ih. a-¯ yati -ila¯ ’l-,arab¯ıyati ˘ f¯ıyan -awwala min -amsi dheACT-translated da-more than fifty a-bookPAT dfrom d ACT it-held the-committee the-organizational PersianORIG dinto ArabicEFF d PAT d a-conference a-press a-first-day from he translated more than fifty books from Persian into TWHEN yesterday Arabic the organizational committee held a press conference (13) gayyarat˙ [as-ˇ sarikˇ atu]¯ nasˇat¯ a-ha¯ min -inta¯giˇ ’l-qamhi the day before yesterday . . -ila¯ -inta¯giˇ ’l-buduri¯ d ¯ d (4) nagahˇ . a ,ulama¯-u farans¯ıyuna¯ f¯ı ’stinsah¯ i -araniba¯ itACT-changed [the companies] activity-of-itPAT d ACT ˘d he-succeeded scientists French in cloning-of dfrom production-of the-wheatORIG dto production-of PAT rabbits the-seedsEFF French scientists succeeded in cloning rabbits [the companies] changed their activity from the pro- In case of verbs with three or more actants (no matter if duction of grain to the production of seeds obligatory or optional) where Patient is from the seman- In the following examples, let us mention some verbal va- tic viewpoint not realized in the valency frame, the above lency frames that comprise some type of free (adverbial) mentioned preference of syntactic criteria is applied. This modifications. means that the other actant (EFF, ORIG, or ADDR) under- (14) bad-u ’l-harbi wada,a-hu -amama¯ -amrin waqi¯ ,in goes the “shift” (see Section 2., esp. Figure 1) to occupy the . . dbeginning-of the-warACT it-put-dhimPAT din-front-of unfilled slot of Patient. To the remaining actants, functors a-thing a-realDIR3 are assigned according to the semantic criteria. In example the beginning of the war put him in front of the reality (5), Effect undergoes the shift to Patient. (15) ,adat¯ min-a ’l-Qahirati¯ -ila¯ Bayruta¯ (5) tahawwalat-i ’l-munazzamatu min -adati¯ muwa-¯ . . . dsheACT-returned dfrom CairoDIR1 dto BeirutDIR3 gahatinˇ -ila¯ -adatin¯ li-’l-bahti .¯ she returned from Cairo to Beirut it-changed dthe-organizationACT dfrom instrument- of a-confrontationORIG dto an-instrument for-the- It should be pointed out that we make a difference in verbal researchPAT (EFF → PAT) frames when assigning a functor to a verbal complement the organization changed from an instrument of con- that could be semantically regarded as a free modification frontation to an instrument for research (e.g. some directional meaning), but on the surface level this complement is the direct object in accusative. In this In the examples below, some verbs with three and more case, the syntactic (or morpho-syntactic) viewpoint (verbal actants are illustrated. Valency frames with ACT, ADDR, government affects the morphemic form of a complement, PAT: i.e. the criterion of direct transitivity) is preferred, and con- (6) samah. a la-hu bi-’d-duhuli¯ -ila¯ ’l-bayti sequently a functor of Patient is assigned as in sentence d ACT d ˘ ADDR d he -permitted to-him by-the-entering into (16). If there are two (or more) different morphemic real- the-housePAT izations on the surface (i.e. prepositional phrase versus di- he permited him to enter the house rect verbal government), although the meaning of that verb (7) sˇarakat¯ zawga-hˇ a¯ f¯ı ’l-h. ukmi is in both cases the very same, two (or more) different va- dsheACT-shared dhusband-of-herADDR din the-reignPAT lency frames are distinguished ((17) with DIR3 and (18) she shared the reign with her husband with PAT). It is worth mentioning that the usual word order of double (16) g˙adarat-i¯ ’l-Qahirata¯ -ila¯ Tall -Ab¯ıb

2302 dsheACT-left dCairoPAT dto Tell AvivDIR3 jointPAT she left Cairo for Tell Aviv and no joint press conference was held (17) was. ala ’l-muntahabu -ila¯ mad¯ınati Sal¯ ¯ırnu¯ ’l--¯ıt.al¯ ¯ıyati With double transitive verbs, those with complements Pa- it-arrived dthe-representation˘ ACT dto town-of Salerno tient and Effect, the first object (PAT) substitutes the gram- the-ItalianDIR3 matical (surface) subject while the second object (EFF) re- the representation arrived to the Italian town Salerno mains in accusative (compare to the active (10)). (18) was. altu-ha¯ [dimasq]ˇ min-a ’d-Dawh. ati (22) ,uyyina ’d-dukturu¯ Maws. il¯ı wak¯ılan li-kull¯ıyati ’t.-t.ibbi dIACT-arrived-to-ditPAT [Damascus] dfrom ad- he-was-appointed dthe-doctor MawsiliPAT dan- DawhaDIR1 assistant-dean to-faculty-of the-medicineEFF I arrived there [to Damascus] from ad-Dawha doctor Mawsili was appointed as an assistant dean of On the contrary, when a more abstract meaning of a partic- the faculty of medicine ular verb occurs, the complement is no longer considered It is to be pointed out that those double transitive verbs to be a free modification (directional meaning) and both with complements ADDR and PAT (verbs as -a,.ta¯ ‘to give’) (or more) variants—that with a prepositional phrase and might be passivized in two ways, either the former object the other with a direct object—are regarded as morphemic usually referred to as indirect (23) or the latter direct ob- variants of the same actant (Patient in this case) within one ject (24) can substitute the grammatical (surface) subject single valency frame (19) and (20). (Agameya, 2008: 559) (compare the active voice (8) and (19) was. alat q¯ımatu-ha¯ -ila¯ hamsati yur¯ uh¯ atin¯ (9)). it-reached dvalue-of-itACT˘ dto five eurosPAT (23) -u,tiyat fursatan taniyatan¯ . . ¯ its value reached 5 euros dsheADDR-was-given da-chance a-secondPAT (20) was. alat q¯ımatu ’s. -s. adir¯ ati¯ 625 milyuna¯ dul¯ arin¯ she was given the second chance d ACT d it-reached value-of the-exports 625 million-of a- (24) -u,.tiya ’d. -d. aw-u ’l--ahd. aru li-’l-malikati dollarPAT it-was-given dthe-light˘ the-greenPAT dto-the- the value of export reached 625 million dollars queenADDR When dealing with verbal valency in MSA, some issues the queen was given the green light concerning diathesis should be briefly discussed as well. In case of indirectly transitive verbs through prepositions MSA (contrary to Arabic dialects), as the successor of when passivized, the verb itself always remains in the 3rd , has preserved one of its characteristic person masculine singular in the passive while the surface features—regularly formed passive by changing the vowel subject (the previous object of the active verb) goes af- pattern of active verb (so-called inflectional, internal or ter the preposition (Badawi et al., 2004: 387–388; Ryding, apophonic passive)—which is usually used when the agent 2005: 666–667). of an action is not known or is preferred not to be men- (25) yuhkamu ,alay-hi bi-’s-signiˇ 1 . tioned. With some rare exceptions, only transitive verbs it-is-sentenced dupon-himPAT dby-the-jailEFF undergo passivisation, no matter if they are transitive di- he is sentenced to jail rectly or indirectly through the preposition. In the passive, Sometimes, the agent (Actor on the underlying level of rep- the position of the underlying Actor is reduced and the un- resentation) of some verb in the is expressed derstood surface object (usually PAT or ADDR) of the ac- periphrastically after particular prepositional phrases like tive verb becomes a subject (Agameya, 2008: 558). How- min qibali ‘by, on part of’, etc. (Badawi et al., 2004: 385– ever, besides this type of diathetical transformation, another 386; Retso,¨ 2006: 624–625). type of passive exists in Arabic, namely “a derivational pas- sive, where a derivational verb form (typically V, VII, or (26) sarikˇ atun¯ tudaru¯ min qibali mudara¯-a mutahas. s. is.¯ına d PAT d ˘ VIII) is used to convey a passive, reflexive or mediopas- companies is-managed from directions-of man- ACT sive sense of the action involved in the verb” (Ryding, agers specialist 2005: 657). Those cases are then, as a result of derivation companies are managed by professional managers through verbal morpho-semantic patterns, autonomous lex- It is worth mentioning that several Arabic verbs in passive icalized passive or passive-related verbs with their own va- voice have undergone some kind of semantic shift and are lency frames with more or less probable word-formational used figuratively. Due to this fact they have to be consid- relation to some active verb (, factitive, etc.) they ered as idiomatic, since they are no longer real semantic are derived from, cf. Figure 2. counterparts of their active forms. We can mention at this With directly , Actor is reduced and the direct point two very frequent verbs (27) tuwuffiy V ‘to die, to object (Patient) becomes a grammatical subject (compare to pass away’ and (28) ustushidˇ X ‘to die as a martyr’ reflect- the active voice in (3)). ing some degree of euphemism in connection with religious feeling. Those verbs would be treated in our proposed lex- (21) wa-lam yu,qad -ayyu mu-tamarin s. ih. af¯ ¯ıyin mustarakinˇ and-not it-was-held dany a-conference a-press a- icon as separate word entries. (27) tuwuffiya walidu-hu¯ f¯ı hadit¯ i sayyaratin¯ . ¯ he-died dfather-of-himACT din accident-of a-carMANN 1Some intransitive verbs, esp. those of movement, become transitive secondarily when taking a preposition, e.g. gˇa¯-‘to come’ his father died in a car accident → gˇa¯- bi- ‘to bring sth.’; qam¯ ‘to stand up’ → qam¯ bi- ‘to carry (28) ustushidaˇ ,ama¯ 1991 f¯ı hadit¯ i ’gtiy˙ alin¯ . ¯ out sth.’ (Badawi et al., 2004: 382–383; Drozd´ık, 2001). dheACT-died-as-a-martyr dyear-of 1991TWHEN din

2303 event-of an-assassinationMANN emphasize that almost all verbonominal derivatives (except he died as a martyr in 1991 in an assassination for verbal nouns of form I) are in Arabic derived regularly. Reflexivity is expressed in MSA either lexically by deriva- A great number of verbonominal derivatives can also be tional morpho-semantic classes (especially reflexive/pas- used as substantives and adjectives. In case of many verbal sive forms V, VII, VIII and some reflexive meanings of X) nouns, they are no longer used in their verbal meaning of or periphrastically using the nouns nafs ‘a self, a soul’ or action and they are capable of forming plural. As regards dat¯ ‘a self’ in the position of all possible verbal comple- the , many of them have been substantivized and ¯ ments. Some verbs with reflexive meaning have been al- also form plurals or are used as adjectives. Despite the ready illustrated in examples (2) and (5). Another closely loss of the verbal character of both types of verbonominals, related phenomena to reflexivity, i.e. reciprocity, is ex- not only do they preserve some of their valency comple- pressed in MSA either by derived morpho-semantic form ments (esp. in case of participles), but those complements VI (29) (then lexicalized) or by an anaphoric means using are usually expressed on the surface level by the same mor- the noun ba,d. ‘part, some’ (Badawi et al., 2004: 391–394). phemic representation, as it is with the verb (by preposition (29) ta,aqadat¯ ma,a ’l-banki ’l--islam¯ ¯ıyi bi-Dubay ,ala¯ -ida-¯ or with accusative). However, only those cases where the verbonominals preserve their verbal character (action) are rati funduqi -Abu¯ Z. ab¯ı ditACT-came-to-the-agreement dwith the-bank the- in the scope of our interest in this paper. islamic in-DubayADDR don managing-of hotel-of Abu As concerns the , it is capable of preserving all DhabiPAT slots of the original verbal valency frame when Actor is an- it came to the agreement with the Islamic Bank in nexed to it in the position of nomen rectum (mud. af¯ -ilay-hi). Dubay on managing the Abu Dhabi Hotel It is worth pointing out that those verbal nouns derived from directly transitive verbs can emphasize their verbal charac- The last issue related to diathesis to some degree concerns ter through affecting the accusative case in their object (see a certain type of verbs that are sometimes regarded as im- (33)) (Badawi et al., 2004: 238). personal owing to the fact that they frequently seem to be As regards the , namely the active participle, in neutralized in gender (3rd person singular masc.), since cases where it substitutes for verbal predicate, it behaves they disagree with their grammatical subject (modal verbs the same way as the verb and keeps all the verbal valency wagabˇ -amkan of necessity ( I, etc.) and possibility ( IV)) slots (see (34)). Actor is realized as a surface subject or is (McCarus, 1976: 17; Al-Qahtani, 2004: 114–118; Badawi absorbed by the subject of the governing sentence in such et al., 2004: 395–398, 418–419). These verbs have a verbal frequent cases where the active participle as a verbal at- noun (30) or subordinate clause after the conjunction -an tribute (h. al¯ ) develops the preceding sentence. (31) as a surface subject which corresponds according to As an example of preserving the valency frame (Table 5), our approach to Actor on the underlying level. They have we can mention here verb .talab¯ ‘to demand’ and its active their Patient, if realized on the surface, after the preposition participle mutalib¯ ‘demanding’ and verbal noun mutalabah¯ ,ala¯ . . (usually ) or as a direct object (when pronoun) or after ‘demanding, a demand’. This verb is double transitive; one li- -amkan preposition (when a noun) in case of IV. object is direct, the other is indirect through the preposition. (30) yumkinu-hu ’d-duhulu¯ f¯ı ’l-h. arbi d PAT˘ d ACT it-enables- him the-entering in the-war (subject) bi- (prep.)/-an (conj.) 4- (object) it is possible for him to enter the war ACTobl PATobl ORIGopt (31) yagibuˇ ,ala¯ quw¯ ati¯ ’t-tah. alufi¯ -an tuh. ¯ıt.u¯ bi-had¯ a¯ ’l- someone something/that of someone baladi ¯ d PAT d it-is-necessary on forces-of the-alliance that Table 5: Valency frame of the verb .talab¯ I. ËA£ ‘to demand’. they-encircle by-this the-countryACT the ally forces have to encircle that country (32) talabat-i¯ ’l-wikalatu¯ ’l-ittihad¯ ati¯ bi-tahl¯ıli ... According to the FGD approach, auxiliary verbs do not . . . it-demanded dthe-agencyACT dthe-unionsORIG dby- have their own valency frames, as they only modify the analysis-of . . . PAT main verb. These verbs (e.g. “kan¯ and her sisters”) are be- the agency demanded of the unions an analysis of . . . yond the scope of this paper. (33) wa-mut.alabati¯ ’l-wafdi ’l-gamˇ a¯,ata bi-h. alli nafsi-ha¯ 3.2. Valency of verbonominal derivatives and-demanding-of dthe-WafdACT dthe-groupORIG dby- dissolution-of self-of-itPAT Valency—a potential of a particular lexical unit to bind and al-Wafd’s demand that the group dissolves itself some other syntactic element(s)—is not limited only to verbs but it is the property of all autosemantic words, i.e. in- (34) mut.aliban¯ qiyadata¯ ’l-gayˇ siˇ bi--iqamati¯ h. iwarin¯ d ACT d ORIG cluding nouns and adjectives. [ he -is]-demanding leadership-of the-army d PAT In case of Arabic, studying the verbal valency should be by-establishing-of a-dialog very fruitful with respect to its verbonominal derivatives—a [he is] demanding of the military leadership to enter into a dialog participle (active and passive) and a verbal noun (mas. dar). Not only can these bear similar syntactic function as the verb (e.g. active participle as a predicate), but in many cases 4. Methodology and Resources they preserve the same or almost the same valency frame as The linguistic work on exploring the language data, extract- the verb they are derived from. In this respect, we have to ing the valency information, and recording it in form of a

2304 compact lexicon, needs to be supported with reusable re- project will include TrEd, Netgraph, and Xaira. TrEd and sources, effective tools, and scalable procedures. Netgraph6 allow structural queries into the trees. Xaira7 The key concept in our scenario is that of PDT-VALLEX searches in linear text, but the underlying data for it can (Hajicˇ et al., 2003; Hajicˇ and Uresovˇ a,´ 2003). The lexi- comprise node-related annotations of various kinds, in par- con is gradually developed by linking its conceivable en- ticular some automatically disambiguated morphological tries with their instances in the treebank. Conversely, the information (cf. Hajicˇ et al., 2005). treebank’s annotations are linked to the lexicon, so that the The complex representation of the valency lexicon based on validity of annotations can be verified, and the mutual con- the principle of links will yet enable miscellaneous options sistency of the treebank and the lexicon can be established. for the reuse, processing, and exporting of the data. One Due to Petr Pajas (p.c., 2008), the TrEd2 annotation en- of the possible linearizations of the lexicon is depicted in vironment can be extended for the new linking and editing Figure 2. Other output formats can follow, for instance, the tasks, as the data of the valency lexicon can conveniently be online, printed, or interactive interfaces of VALLEX. formatted in terms of trees (note the valency frame treelets and the capturing of alternatives in the trees of Figure 2). 5. Proposed Structure of the Lexicon While a comparable functionality is readily in place for the With respect to specific needs of Arabic, a slightly modified Czech treebank projects, our own contribution is to design structure of a lexical entry used in VALLEXwill be adopted and implement this model thoroughly for Arabic and the (Lopatkova´ et al., 2006). Entries will be organized alpha- PADT data. The new valency lexicon will be interlinked betically according to the Arabic three-/four-consonantal 8 with the morphological lexicon of ElixirFM, which itself is root, as is usual in most lexicographic works. The pro- linked with the morphological annotations. posed lexical entry will provide the following information: ElixirFM3 (Smrz,ˇ 2007) is an open-source implementation Word Root an abstract consonantal morpheme that can in- of inflectional and derivational morphological processes in clude more than one lexical entry (lexeme); Arabic allowing, among other things, to automatically com- Lexeme the most abstract notion represented by the pute the conversions between verbs, participles, and ver- (citation form); one lexeme encompasses all inflec- bal nouns required for the study of valency. Entries of the tional morphological forms and discerns one or more ElixirFM lexicon are encoded using morphophonemic tem- lexical units corresponding to different meanings; plates (shown in Figure 2). Most of the lexical information Morphological Information identification of the deriva- is adapted from the Buckwalter lexicon (Buckwalter, 2002). tional class (I–X) and the corresponding verbal noun; Converting the ElixirFM lexicon into a data structure acces- Lexical Unit each lexeme would contain at least one lex- sible and editable with TrEd leaves open the possibility to ical unit, i.e. a particular meaning of the given verb expand the annotation of valency also to general nouns and with its syntactic properties described by the valency adjectives (cf. Hajicˇ et al., 2003). The work on the valency frame; a potential idiomatic usage of a verb with its re- lexicon will however first concern the most frequent verbs stricted collocability will be taken into account—those and verbonominal derivatives, as discussed previously. cases will be treated as separate lexical units; Using the PADT data for the extraction of Arabic valency Valency Frame a sequence of frame slots—each of them information brings significant advantages. In the develop- is assigned to a different valency complementation of ment version of the treebank (to be published as PADT 2.0 the given verb (lexical unit) bringing the valency infor- in 2008), over one million tokens are annotated into surface mation in terms of the FGD functors as well as the sur- syntactic trees, and more than thirty thousand tokens are face morphemic representation of such complementa- equipped with tectogrammatical features. The underlying tion (e.g. 4- (accusative case), -I (indefinite state), morphological annotations provide disambiguated lexical the form of a particular bound lexeme, conjunction, or and inflectional (functional and structural) morphological preposition, like li-, -an, etc.); abstractions compatible with the ElixirFM system. Obligatoriness/Optionality valency frames will consist The contents of the arising valency lexicon will not be lim- of both obligatory and optional inner participants (ac- ited to evidence from PADT. The Arabic Gigaword (Graff, tants) and only of obligatory free modifications; 2007) supplies even more raw data of the newswire domain, Diathesis possible passivisation of the given verb by while the CLARA4 corpus offers documents from literature changing the internal vowel pattern, in case it is se- and other types of texts. The printed dictionaries to be con- mantically permitted, should be mentioned and illus- sulted include (e.g. Wehr, 1979; Baalbaki, 2000; Hoogland trated with an example taken from available corpora; et al., 2003; Zemanek´ et al., 2006). Additional information examples from the corpus as well We believe that the valency frames gathered in VALLEX5 as the exact frequency of occurrence and rank of a (Zabokrtskˇ y,´ 2005; Lopatkova´ et al., 2006, 2008) and PDT- lexeme in PADT will be provided; syntactic-semantic VALLEX can serve as a useful source of inspiration for de- verb class will be the subject of our future research.9 scribing valency in other languages as well. Based on our previous experience with exploring and man- 6http://quest.ms.mff.cuni.cz/netgraph/ aging large linguistic corpora, the search tools used in the 7http://sf.net/projects/xaira/ 8Weak verbs will be treated as triconsonantal with w or y in- 2http://ufal.mff.cuni.cz/∼pajas/tred/ cluded among the root consonants. 3http://sf.net/projects/elixir-fm/ 9In this respect, works of (McCarus, 1976) and (Al-Qahtani, 4http://enlil.ff.cuni.cz/ 2004) as well as verb classes proposed in VALLEX are of high 5http://ufal.mff.cuni.cz/vallex/2.5/ inspiration for us. 2305  |> "‘ q d" <| [ X † ¨  FaCaL ‘verb‘[ "hold", "convene", "conclude" ]‘imperf‘ FCiL, aqad i Y®« , I( )  FaCCaL ‘verb‘[ "complicate" ], ,aqqad II Y®«   TaFaCCaL ‘verb‘[ "be complicated" ], ta,aqqad V Y®ªK   TaFACaL ‘verb‘[ "contract", "convene" ], ta,aqad¯ VI Y¯AªK  © InFaCaL ‘verb‘[ "be held", "be gathered", "be convened" ], in,aqad VII Y®ªK @   IFtaCaL ‘verb‘[ "believe" ]] i,taqad VIII Y®J« @      ,aqad Y®« ta,aqad¯ Y¯AªK i,taqad Y®J« @

ACT PAT ACT ADDR PAT ACT ADDR PAT ACT ADDR PAT ACT PAT 4- ma,a 4- ,ala¯ -amal 4- ma,a ,ala¯ -anna 4- bi- bi-    -anna I ,aqad ya,qid ,aqd Y®« ¤ Y® ª K ¤ Y®« ∼ 526 situation became complicated afterwards as a conse- quence of the division of Germany ACT PAT (4-) to call, to hold (a meeting, a conference,   etc.) |act ,aqadat-i ’l-lagnatuˇ ’l-wizar¯ ¯ıyatu ’s-sur¯ ¯ıyatu VI ta,aqad¯ Y¯AªK ∼ 20 al-kuwayt¯ıyatu ’gtimˇ a¯,a-ha¯ ’d-dawr¯ıya f¯ı Dimasqaˇ the ACT ADDR (ma,a) PAT (,ala¯) to come to an agreement, committee of Syrian and Kuwaiti ministers held its to make a contract |act kama¯ ta,aqadat¯ [as-ˇ sarikatu]ˇ pas regular meeting in Damascus | wa-lam yu,qad -ayyu ma,a ’l-banki ’l--islam¯ ¯ıyi bi-Dubay ,ala¯ -idarati¯ fun- mu-tamarin sihaf¯ ¯ıyin mustarakinˇ and no joint press . . duqi -Abu¯ Z. ab¯ı ’t-tabi¯ ,i la-hu it [the company] also conference was held came to the agreement with the Islamic Bank in Dubai ACT ADDR (ma,a) PAT (4-) to conclude (a contract, a on managing its Abu Dhabi Hotel act treaty, etc.) | al-mutaqqafu ’llad¯ı ya,qidu s. afqatan  © ¯ ¯ VII in,aqad Y®ªK @ ∼ 42 ma,a ’l-h. ukumati¯ the intelectual that is concluding a bargain with the government |pas ,uqidat-i ’ttifaq¯ ¯ıyatu ACT to assemble, to meet, to be held, to take place ’l-Gazˇ a¯-iri ,ama¯ 1975 bayna -Ir¯ ana¯ wa-’l-,Iraqi¯ the (a meeting, a conference, etc.) |act wa-f¯ı tisrˇ ¯ı- Algerian agreement between Iran and Iraq was con- na ’t-tan¯ ¯ı 1995, in,aqada f¯ı Barsilˇ unah¯ al-mu-tamaru ¯ ¯ cluded in 1975 ’l-wizar¯ ¯ıyu li-’s-ˇ sirˇ akati¯ ’l--ur¯ ub¯ ¯ıyati ’l-mutawassit.¯ı- ACT ADDR (,ala¯) PAT (-amal 4-) to set one’s hope(s) to yati in November 1995, the Ministerial Conference (idiom) |act ya,qidu ’n-nad¯ ¯ı ’l--isban¯ ¯ıyu -am¯ alan¯ ,ala¯ of the Euro-Mediterranean Partnership was held in -an tusa¯,ida saˇ ,b¯ıyatu B¯ıkham¯ al-gˇarifatu¯ f¯ı ’htiraqi¯ Barcelona ˘  ’s-suqi¯ the Spanish club is setting its hopes to that VIII i,taqad Y®J« @ ∼ 285 Beckham’s great popularity will help to penetrate into ACT PAT (4-, bi-, -anna, bi--anna) to believe, to think the market |pas allad¯ına yu,qadu ,alay-him al--amalu |act kama¯ -anna-n¯ı -a,taqidu -anna ’r-ragulaˇ ’s-ˇ sarqˇ ¯ıya f¯ı ta,l¯ımi ’s-sig˙ari¯ wa-’l-kib¯ ari¯ li-lugati-n˙ a¯ ’l-hayyati . . . la¯ yuhibbu -an yakuna¯ murattabu zawgati-hiˇ -a,la¯ min those to whom the hope is set that they will teach both . murattabi-hi I also think that the man from the Ori- the young and the old our live language  ent does not like when his wife’s salary is higher than II ,aqqad Y®« ∼ 9 his own | fa-’l-,Iraqu¯ laysa milkan li-quw¯ ati¯ ’t-tah. a-¯ lufi, wa--in i,taqada ba,du ’l-muhafiz¯ ¯ına ’l-gududiˇ da-¯ ACT PAT to complicate |act dalika¯ -amrun yu,aqqidu ’l- . . . ¯ ¯ lika Iraq is not property of the ally forces even if some muskilataˇ wa-la¯ yahullu-ha¯ this is a thing that only . neo-conservatives have that opinion |pas ba,da nasriˇ complicates the problem and does not solve it  har¯ıt.ati ’t.-t.ar¯ıqi ’llat¯ı yu,taqadu -anna-ha¯ sa-takunu¯ Y®ª K ˘ V ta,aqqad ∼ 3 h. ayaw¯ıyatan f¯ı h. alli ’l-qad. ¯ıyati ’l-filast.¯ın¯ıyati after the publication of the Road Map, which is believed to be ACT to be/become complicated |act al-wad. ,u ta,aqqada lah¯ iqan bi-ma¯ hadata li--Almaniy¯ a¯ min taqs¯ımin the vital for the solution of the Palestinian Question . . ¯  Figure 2: Top left: Verb entries of the ElixirFM lexicon nested under the ,q d Y®« root. Top right: Possible layout of these entries including the explicit derivational class, showing that various pieces of information can be inferred directly from this lexicon’s representation. Middle: Valency frame treelets and the constraints on the surface realization of the functors, organized into trees. Optional slots are marked with dashed edges. Multiple options with frames or constraints are rendered as dotted links. Bottom: Linearized valency information augmented with the derivational class number, frequency, glosses, and corpus-based examples. Obligatory functors are highlited in bold.

2306 6. Conclusion Jan Hajic,ˇ Eva Hajicovˇ a,´ Jarmila Panevova,´ Petr Sgall, Petr Pajas, Jan Stˇ epˇ anek,´ Jirˇ´ı Havelka, and Marie Mikulova.´ Prague De- In our contribution, we have overviewed the theoretical pendency Treebank 2.0. LDC2006T01, ISBN 1-58563-370-4, concept of valency developed in FGD and have adapted its 2006. essential part for Modern Standard Arabic. We have illus- Eva Hajicovˇ a´ and Petr Sgall. Dependency Syntax in Func- trated various valency-related phenomena on a number of tional Generative Description. In Dependenz und Valenz – De- instances, extracted mainly from the PADT treebank. pendency and Valency, volume I, pages 570–592. Walter de We have presented the methodology, tools, and resources Gruyter, 2003. used for creating the valency lexicon of Arabic verbs, which Clive Holes. Modern Arabic: Structures, Functions, and Vari- is intended for applications in computational parsing or lan- eties. Georgetown University Press, 2004. guage generation, and for use by human researchers. Jan Hoogland, Kees Versteegh, and Manfred Woidich. Woorden- The proposed valency lexicon will be exploited in partic- boek Arabisch-Nederlands. Bulaaq, Amsterdam, 2003. ´ ´ ular during further tectogrammatical annotations of PADT, Marketa Lopatkova. Valency in the Prague Dependency Treebank: Building the Valency Lexicon. Prague Bulletin of Mathemati- and might possibly serve for enriching the expected sec- cal Linguistics, 78–80:37–60, 2003. ond edition of the corpus-based Arabic-Czech Dictionary Marketa´ Lopatkova´ and Jarmila Panevova.´ Recent developments (Zemanek´ et al., 2006). of the theory of valency in the light of the Prague Dependency Treebank. In Maria´ Simkovˇ a,´ editor, Insight into Slovak and Acknowledgements Czech Corpus Linguistic, pages 83–92. Veda, Bratislava, 2005. Marketa´ Lopatkova,´ Zdenekˇ Zabokrtskˇ y,´ and Vaclava´ Benesovˇ a.´ We would like to thank Petr Zemanek,´ Marketa´ Lopatkova,´ Valency Lexicon of Czech Verbs VALLEX 2.0. Technical Re- ˇ Petr Pajas, and Zdenekˇ Zabokrtsky´ for their valuable com- port 34, UFAL MFF UK, Charles University in Prague, 2006. ments and inspiring ideas. This work has been funded by Marketa´ Lopatkova,´ Zdenekˇ Zabokrtskˇ y,´ and Vaclava´ Kettnerova.´ the Ministry of Education of the Czech Republic (MSM00- Valencnˇ ´ı slovn´ık ceskˇ ych´ sloves [Valency Lexicon of Czech 21620823 and MSM0021620838) and by the Grant Agency Verbs]. Karolinum, Praha, 2008. of the Czech Academy of Sciences (1ET101120413). Ernest N. McCarus. A Semantic Analysis of Arabic Verbs. In Michigan Oriental Studies in Honour of George G. Cameron, References pages 3–28. Department of Near Eastern Studies, University of Michigan, Ann Arbor, 1976. Amira Agameya. Passive (syntax). In Kees Versteegh, editor, Marie Mikulova´ et al. A Manual for Tectogrammatical Layer An- Encyclopaedia of Arabic Language, volume 3, pages 558–564. notation of the Prague Dependency Treebank. Technical Re- Brill, Leiden – Boston, 2008. port 30, UFAL MFF UK, Charles University in Prague, 2006. Duleim Masoud Al-Qahtani. Semantic Valence of Arabic Verbs. Jarmila Panevova.´ On Verbal Frames in Functional Generative Libraire du Liban, Beirut, 2004. Description, Part I. Prague Bulletin of Mathematical Linguis- Rohi Baalbaki. Al-Mawrid: A Modern Arabic-English Dictio- tics, 22:3–40, 1974. nary. Dar El-Ilm Lilmalayin, Beirut, 2000. Jarmila Panevova.´ On Verbal Frames in Functional Generative Elsaid Badawi, Mike G. Carter, and Adrian Gully. Modern Writ- Description, Part II. Prague Bulletin of Mathematical Linguis- ten Arabic: A Comprehensive Grammar. Routledge, 2004. tics, 23:17–52, 1975. Tim Buckwalter. Buckwalter Arabic Morphological Analyzer Jarmila Panevova.´ Valency Frames and the Meaning of the Sen- Version 1.0. LDC2002L49, ISBN 1-58563-257-0, 2002. tence. In The Prague School of Functional and Structural Walter A. Cook. Case Grammar: Development of the Matrix Linguistics, pages 223–243. John Benjamins, Amsterdam– Model (1970–1978). Georgetown University Press, 1979. Philadelphia, 1994. Ladislav Drozd´ık. Grammatical, Derivational and Lexical Dimen- Jan Retso.¨ Diathesis. In Kees Versteegh, editor, Encyclopaedia sions of Transitivity in Arabic. In Modern Written Arabic: of Arabic Language, volume 1, pages 622–626. Brill, Leiden – Studies in Grammar, Lexicon and Prestigious Oral Communi- Boston, 2006. cation, pages 87–103. Veda, Bratislava, 2001. Karin C. Ryding. A Reference Grammar of Modern Standard Ara- David Graff. Arabic Gigaword Third Edition. LDC2007T40, 1- bic. Cambridge University Press, 2005. 58563-460-3, 2007. Petr Sgall, Eva Hajicovˇ a,´ and Jarmila Panevova.´ The Meaning of Jan Hajicˇ and Zdenkaˇ Uresovˇ a.´ Linguistic Annotation: from the Sentence in Its Semantic and Pragmatic Aspects. D. Reidel Links to Cross-Layer Lexicons. In Proceedings of The Second & Academia, 1986. Workshop on Treebanks and Linguistic Theories, pages 69–80, Otakar Smrz.ˇ Functional Arabic Morphology. Formal System and Vaxj¨ o,¨ Sweden, 2003. Implementation. PhD thesis, Charles University in Prague, Jan Hajic,ˇ Jarmila Panevova,´ Zdenkaˇ Uresovˇ a,´ Alevtina Bemov´ a,´ 2007. Veronika Kola´rovˇ a,´ and Petr Pajas. PDT-VALLEX: Creating a Hans Wehr. Arabic-English Dictionary: The Hans Wehr Dictio- Large-coverage Valency Lexicon for Treebank Annotation. In nary of Modern Written Arabic. Spoken Language Services, Proceedings of the Second Workshop on Treebanks and Lin- 1979. Edited by J. Milton Cowan. guistic Theories, pages 57–68, Vaxj¨ o,¨ Sweden, 2003. William Wright. A Grammar of the Arabic Language. Cambridge Jan Hajic,ˇ Otakar Smrz,ˇ Petr Zemanek,´ Petr Pajas, Jan Snaidauf,ˇ University Press, 1991. Emanuel Beska,ˇ Jakub Kra´cmar,ˇ and Kamila Hassanova.´ Zdenekˇ Zabokrtskˇ y.´ Valency Lexicon of Czech Verbs. PhD thesis, Prague Arabic Dependency Treebank 1.0. LDC2004T23, ISBN Charles University in Prague, 2005. 1-58563-319-4, 2004. Petr Zemanek,´ Andrea Moustafa, Nadeˇzdaˇ Obadalova,´ and Jan Hajic,ˇ Otakar Smrz,ˇ Tim Buckwalter, and Hubert Jin. Feature- Frantisekˇ Ondra´s.ˇ Arabsko-ceskˇ y´ slovn´ık [Arabic-Czech Dic- Based Tagger of Approximations of Functional Arabic Mor- tionary]. Set Out, Praha, 2006. phology. In Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), pages 53–64, Barcelona, Spain, 2005.

2307