Semantic Role Labeling Systems for Arabic using Kernel Methods

Mona Diab Alessandro Moschitti Daniele Pighin CCLS, Columbia University DISI, University of Trento FBK-irst; DISI, University of Trento New York, NY 10115, USA Trento, I-38100, Italy Trento, I-38100, Italy [email protected] [email protected] [email protected]

Abstract and the subject of the second, it is the ‘theme’ in There is a widely held belief in the natural lan- both sentences. Same idea applies to passive con- guage and computational linguistics commu- structions, for example. nities that Semantic Role Labeling (SRL) is There is a widely held belief in the NLP and com- a significant step toward improving important putational linguistics communities that identifying applications, e.g. and in- and defining roles of arguments in a sen- formation extraction. In this paper, we present an SRL system for Modern Standard Arabic tence has a lot of potential for and is a significant that exploits many aspects of the rich mor- step toward improving important applications such phological features of the language. The ex- as document retrieval, , question periments on the pilot Arabic Propbank data answering and (Moschitti et show that our system based on Support Vector al., 2007). Machines and Kernel Methods yields a global To date, most of the reported SRL systems are for SRL F1 score of 82.17%, which improves the English, and most of the data resources exist for En- current state-of-the-art in Arabic SRL. glish. We do see some headway for other languages 1 Introduction such as German and Chinese (Erk and Pado, 2006; Sun and Jurafsky, 2004). The systems for the other Shallow approaches to semantic processing are mak- languages follow the successful models devised for ing large strides in the direction of efficiently and English, e.g. (Gildea and Jurafsky, 2002; Gildea and effectively deriving tacit semantic information from Palmer, 2002; Chen and Rambow, 2003; Thompson text. Semantic Role Labeling (SRL) is one such ap- et al., 2003; Pradhan et al., 2003; Moschitti, 2004; proach. With the advent of faster and more power- Xue and Palmer, 2004; Haghighi et al., 2005). In the ful computers, more effective machine learning al- same spirit and facilitated by the release of the Se- gorithms, and importantly, large data resources an- mEval 2007 Task 18 data1, based on the Pilot Arabic notated with relevant levels of semantic information, Propbank, a preliminary SRL system exists for Ara- such as the FrameNet (Baker et al., 1998) and Prob- bic2 (Diab and Moschitti, 2007; Diab et al., 2007a). Bank (Kingsbury and Palmer, 2003), we are seeing However, it did not exploit some special character- a surge in efficient approaches to SRL (Carreras and istics of the Arabic language on the SRL task. M`arquez, 2005). In this paper, we present an SRL system for MSA SRL is the process by which predicates and their that exploits many aspects of the rich morphological arguments are identified and their roles are defined features of the language. It is based on a supervised in a sentence. For example, in the English sen- model that uses support vector machines (SVM) tence, ‘John likes apples.’, the predicate is ‘likes’ technology (Vapnik, 1998) for argument boundary whereas ‘John’ and ‘apples’, bear the semantic role detection and argument classification. It is trained labels agent (ARG0) and theme (ARG1). The cru- and tested using the pilot Arabic Propbank data re- cial fact about semantic roles is that regardless of leased as part of the SemEval 2007 data. Given the the overt syntactic structure variation, the underly- lack of a reliable Arabic deep syntactic parser, we ing predicates remain the same. Hence, for the sen- tence ‘John opened the door’ and ‘the door opened’, 1http://nlp.cs.swarthmore.edu/semeval/ though ‘the door’ is the object of the first sentence 2We use Arabic to refer to Modern Standard Arabic (MSA).

798 Proceedings of ACL-08: HLT, pages 798–806, Columbus, Ohio, USA, June 2008. c 2008 Association for Computational Linguistics use gold standard trees from the Arabic Tree Bank the VSO constructions, the verb agrees with the syn- (ATB) (Maamouri et al., 2004). tactic subject in Gender only, while in the SVO con- This paper is laid out as follows: Section 2 structions, the verb agrees with the subject in both presents facts about the Arabic language especially Number and Gender. Even though, in the ATB, an in relevant contrast to English; Section 3 presents equal distribution of both VSO and SVO is observed the approach and system adopted for this work; Sec- (each appearing 30% of the time), it is known that tion 4 presents the experimental setup, results and in general Arabic is predominantly in VSO order. discussion. Finally, Section 5 draws our conclu- Moreover, the pro-drop cases could effectively be sions. perceived as VSO orders for the purposes of SRL. 2 Arabic Language and Impact on SRL Syntactic Case is very important in the cases of VSO and pro-drop constructions as they indicate the syn- Arabic is a very different language from English in tactic roles of the object arguments with accusative several respects relevant to the SRL task. Arabic is a Case. Unless the morphology of syntactic Case is semitic language. It is known for its templatic mor- explicitly present, such free order could run phology where are made up of roots and af- the SRL system into significant confusion for many fixes. Clitics agglutinate to words. Clitics include of the predicates where both arguments are semanti- prepositions, conjunctions, and pronouns. cally of the same type. In contrast to English, Arabic exhibits rich mor- Arabic exhibits more complex noun phrases than phology. Similar to English, Arabic verbs explic- English mainly to express possession. These con- itly encode tense, voice, Number, and Person fea- structions are known as idafa constructions. Mod- tures. Additionally, Arabic encodes verbs with Gen- ern standard Arabic does not have a special parti- der, Mood (subjunctive, indicative and jussive) in- cle expressing possession. In these complex struc- formation. For nominals (nouns, adjectives, proper tures a surface indefinite noun (missing an explicit names), Arabic encodes syntactic Case (accusative, definite article) may be followed by a definite noun genitive and nominative), Number, Gender and Def- marked with genitive Case, rendering the first noun initeness features. In general, many of the morpho- syntactically definite. For example, I JË@ ÉgP rjl logical features of the language are expressed via . . 3 Albyt ‘man the-house’ meaning ‘man of the house’, short vowels also known as diacritics . ÉgP becomes definite. An adjective modifying the Unlike English, syntactically Arabic is a pro-drop . noun Ég. P will have to agree with it in Number, language, where the subject of a verb may be im- Gender, Definiteness, and Case. However, with- plicitly encoded in the verb morphology. Hence, we Q out explicit morphological encoding of these agree- observe sentences such as ÈA®K .Ë@ É¿@ Akl AlbrtqAl ments, the scope of the arguments would be con- ‘ate-[he] the-oranges’, where the verb Akl encodes fusing to an SRL system. In a sentence such as the third Person Masculine Singular subject in the ÉK ñ¢Ë@ I J.Ë@ Ég. P rjlu Albyti AlTwylu meaning ‘the verbal morphology. It is worth noting that in the tall man of the house’: ‘man’ is definite, masculine, ATB 35% of all sentences are pro-dropped for sub- singular, nominative, corresponding to Definiteness, ject (Maamouri et al., 2006). Unless the syntactic Gender, Number and Case, respectively; ‘the-house’ parse is very accurate in identifying the pro-dropped is definite, masculine, singular, genitive; ‘the-tall’ is case, identifying the syntactic subject and the under- definite, masculine, singular, nominative. We note lying semantic arguments are a challenge for such that ‘man’ and ‘tall’ agree in Number, Gender, Case pro-drop cases. and Definiteness. Syntactic Case is marked using Arabic syntax exhibits relative free word order. short vowels u, and i at the end of the word. Hence, Arabic allows for both subject-verb-object (SVO) rjlu and AlTwylu agree in their Case ending5 With- 4 and verb-subject-object (VSO) argument orders. In out the explicit marking of the Case information, 3Diacritics encode the vocalic structure, namely the short vowels, as well as the gemmination marker for consonantal dou- 5The presence of the Albyti is crucial as it renders rjlu defi- bling, among other markers. nite therefore allowing the agreement with AlTwylu to be com- 4MSA less often allows for OSV, or OVS. plete.

799 S

VP NPARGM−TMP VBDpredicate NPARG0 NPARG1 NP @YK. NP NP NP PP NN JJ started YgB@ úæ•AÖ Ï@ NN NP NNP NNP NN JJ IN NP úm'ðP   Þ Sunday past  KP NN JJ ðP . èPAK P éJ Ö P È NNP president Zhu Rongji visit official to Z@PPñË@ ú æJ ’Ë@ YJêË@ ministers Chinese India Figure 1: Annotated Arabic Tree corresponding to ‘Chinese Prime minister Zhu Rongjy started an official visit to India last Sunday.’ namely in the word endings, it could be equally valid Feature Name Description Predicate Lemmatization of the predicate word that ‘the-tall’ modifies ‘the-house’ since they agree Path Syntactic path linking the predicate and in Number, Gender and Definiteness as explicitly an argument, e.g. NN↑NP↑VP↓VBX Partial path Path feature limited to the branching of marked by the Definiteness article Al. Hence, these the argument idafa constructions could be tricky for SRL in the No-direction path Like Path without traversal directions absence of explicit morphological features. This is Phrase type Syntactic type of the argument node Position Relative position of the argument with compounded by the general absence of short vowels, respect to the predicate expressed by diacritics (i.e. the u and i in rjlu and Al- Verb subcategorization Production rule expanding the predicate parent node byti,) in naturally occurring text. Idafa constructions Syntactic Frame Position of the NPs surrounding the in the ATB exhibit recursive structure, embedding predicate First and last word/POS First and last words and POS tags of other NPs, compared to English where possession is candidate argument phrases annotated with flat NPs and is designated by a pos- sessive marker. Table 1: Standard linguistic features employed by most SRL systems. Arabic texts are underspecified for diacritics to different degrees depending on the genre of the Arabic. In this research, we go beyond the previ- text (Diab et al., 2007b). Such an underspecifica- ously proposed basic SRL system for Arabic (Diab tion of diacritics masks some of the very relevant et al., 2007a; Diab and Moschitti, 2007). We exploit morpho-syntactic interactions between the different the full morphological potential of the language to categories such as agreement between nominals and verify our hypothesis that taking advantage of the their modifiers as exemplified before, or verbs and interaction between morphology and syntax can im- their subjects. prove on a basic SRL system for morphologically Having highlighted the differences, we hypothe- rich languages. size that the interaction between the rich morphol- Similar to the previous Arabic SRL systems, our ogy (if explicitly marked and present) and syntax adopted SRL models use Support Vector Machines could help with the SRL task. The presence of ex- to implement a two step classification approach, plicit Number and Gender agreement as well as Case i.e. boundary detection and argument classifica- information aids with identification of the syntactic tion. Such models have already been investigated subject and object even if the word order is relatively in (Pradhan et al., 2005; Moschitti et al., 2005). The free. Gender, Number, Definiteness and Case agree- two step classification description is as follows. ment between nouns and their modifiers and other 3.1 Predicate Argument Extraction nominals, should give clues to the scope of argu- The extraction of predicative structures is based on ments as well as their classes. The presence of such the sentence level. Given a sentence, its predicates, morpho-syntactic information should lead to better as indicated by verbs, have to be identified along argument boundary detection and better classifica- with their arguments. This problem is usually di- tion. vided in two subtasks: (a) the detection of the target 3 An SRL system for Arabic argument boundaries, i.e. the span of the argument The previous section suggests that an optimal model words in the sentence, and (b) the classification of should take into account specific characteristics of the argument type, e.g. Arg0 or ArgM for Propbank

800 S VP VP VP VP VP NP NP NNP VBD D N

NP VP ⇒ VBD NP VBD NP VBD NP VBD NP VBD NP D N NNP Mary bought a cat . . .

NNP VBD NP bought D N D N bought D N bought D N bought a cat Mary

Mary bought D N a cat a cat cat

a cat Figure 2: Fragment space generated by a tree kernel function for the sentence Mary bought a cat. or Agent and Goal for the FrameNet. 3.2 Features The standard approach to learn both the detection The discovery of relevant features is, as usual, a and the classification of predicate arguments is sum- complex task. The choice of features is further com- marized by the following steps: pounded for a language such as Arabic given its rich (a) Given a sentence from the training-set, generate morphology and morpho-syntactic interactions. a full syntactic parse-tree; To date, there is a common consensus on the set of (b) let P and A be the set of predicates and the set basic standard features for SRL, which we will refer of parse-tree nodes (i.e. the potential arguments), re- to as standard. The set of standard features, refers to spectively; unstructured information derived from parse trees. (c) for each pair hp, ai∈P×A: extract the feature e.g. Phrase Type, Predicate Word or Head Word. + representation set, Fp,a and put it in T (positive ex- Typically the standard features are language inde- amples) if the subtree rooted in a covers exactly the pendent. In our experiments we employ the features words of one argument of p, otherwise put it in T − listed in Table 1, defined in (Gildea and Jurafsky, (negative examples). 2002; Pradhan et al., 2005; Xue and Palmer, 2004). For instance, in Figure 1, for each combination of For example, the Phrase Type indicates the syntac- the predicate started with the nodes NP, S, VP, VPD, tic type of the phrase labeled as a predicate argu- NNP, NN, PP, JJ or IN the instances Fstarted,a are ment, e.g. NP for ARG1 in Figure 1. The generated. In case the node a exactly covers ‘presi- Path contains the path in the parse tree between the dent ministers Chinese Zhu Rongji’ or ‘visit official predicate and the argument phrase, expressed as a to India’, Fp,a will be a positive instance otherwise sequence of nonterminal labels linked by direction it will be a negative one, e.g. Fstarted,IN . (up or down) symbols, e.g. VBD ↑ VP ↓ NP for The T + and T − sets are used to train the bound- ARG1 in Figure 1. The Predicate Word is the surface ary classifier. To train the multi-class classifier, T + form of the verbal predicate, e.g. started for all argu- + ments. The standard features, as successful as they can be reorganized as positive Targi and negative − are, are designed primarily for English. They are not Targi examples for each argument i. This way, an in- dividual ONE-vs-ALL classifier for each argument i exploiting the different characteristics of the Arabic can be trained. We adopt this solution, according language as expressed through morphology. Hence, to (Pradhan et al., 2005), since it is simple and ef- we explicitly encode new SRL features that capture fective. In the classification phase, given an unseen the richness of Arabic morphology and its role in morpho-syntactic behavior. The set of morphologi- sentence, all its Fp,a are generated and classified by cal attributes include: inflectional morphology such each individual classifier Ci. The argument associ- ated with the maximum among the scores provided as Number, Gender, Definiteness, Mood, Case, Per- by the individual classifiers is eventually selected. son; derivational morphology such as the Lemma form of the words with all the diacritics explicitly The above approach assigns labels independently, marked; vowelized and fully diacritized form of the without considering the whole predicate argument surface form; the English gloss6. It is worth noting structure. As a consequence, the classifier output that there exists highly accurate morphological tag- may generate overlapping arguments. Thus, to make gers for Arabic such as the MADA system (Habash the annotations globally consistent, we apply a dis- and Rambow, 2005; Roth et al., 2008). MADA tags ambiguating heuristic adopted from (Diab and Mos- chitti, 2007) that selects only one argument among 6The gloss is not sense disambiguated, hence they include multiple overlapping arguments. homonyms.

801 Feature Name Description Definiteness Applies to nominals, values are definite, indefinite or inapplicable Number Applies to nominals and verbs, values are singular, plural or dual or inapplicable Gender Applies to nominals, values are feminine, masculine or inapplicable Case Applies to nominals, values are accusative, genitive, nominative or inapplicable Mood Applies to verbs, values are subjunctive, indicative, jussive or inapplicable Person Applies to verbs and pronouns, values are 1st, 2nd, 3rd person or inapplicable Lemma The citation form of the word fully diacritized with the short vowels and gemmination markers if applicable Gloss this is the corresponding English meaning as rendered by the underlying lexicon. Vocalized word The surface form of the word with all the relevant diacritics. Unlike Lemma, it includes all the inflections. Unvowelized word The naturally occurring form of the word in the sentence with no diacritics.

Table 2: Rich morphological features encoded in the Extended Argument Structure Tree (EAST). modern standard Arabic with all the relevant mor- VP VBD NP phological features as well as it produces highly ac-  curate lemma and gloss information by tapping into @YK. NP NP an underlying morphological lexicon. A list of the NN NP NNP NNP ' extended features is described in Table 2.  KP NN JJ ðP ú m. ðP

The set of possible features and their combina- Z@PPñË@ ú æJ ’Ë@ tions are very large leading to an intractable fea- Figure 3: Example of the positive AST structured feature encoding ture selection problem. Therefore, we exploit well the argument ARG0 in the sentence depicted in Figure 1. known kernel methods, namely tree kernels, to ro- K t1,t2 P P n1,n2 , where bustly experiment with all the features simultane- T ( ) = n1∈Nt1 n2∈Nt2 ∆( ) ously. Such kernel engineering, as shown in (Mos- Nt1 and Nt2 are the sets of nodes of t1 and t2, re- chitti, 2004), allows us to experiment with many spectively. The function ∆(·) evaluates the num- syntactic/semantic features seamlessly. ber of common fragments rooted in n1 and n2, i.e. |F| ∆(n1,n2) = Pi=1 Ii(n1)Ii(n2). ∆ can be ef- 3.3 Engineering Arabic Features with Kernel ficiently computed with the algorithm proposed in Methods (Collins and Duffy, 2002). Feature engineering via kernel methods is a useful technique that allows us to save a lot of time in the 3.4 Structural Features for Arabic design and implementation of features. The basic In order to incorporate the characteristically rich idea is (a) to design a set of basic value-attribute Arabic morphology features structurally in the tree features and apply polynomial kernels and generate representations, we convert the features into value- all possible combinations; or (b) to design basic tree attribute pairs at the leaf node level of the tree. Fig structures expressing properties related to the target 1 illustrates the morphologically underspecified tree linguistic objects and use tree kernels to generate with some of the morphological features encoded in all possible tree subparts, which will constitute the the POS tag such as VBD indicating past tense. This feature representation vectors for the learning algo- contrasts with Fig. 4 which shows an excerpt of the rithm. same tree encoding the chosen relevant morpholog- Tree kernels evaluate the similarity between two ical features. trees in terms of their overlap, generally measured For the sake of classification, we will be dealing as the number of common substructures (Collins with two kinds of structures: the Argument Structure and Duffy, 2002). For example, Figure 2, shows Tree (AST) (Pighin and Basili, 2006) and the Ex- a small parse tree and some of its fragments. To tended Argument Structure Tree (EAST). The AST design a function which computes the number of is defined as the minimal subtree encompassing all common substructures between two trees t1 and t2, and only the leaf nodes encoding words belonging let us define the set of fragments F={f1,f2, ..} and to the predicate or one of its arguments. An AST the indicator function Ii(n), equal to 1 if the tar- example is shown in Figure 3. The EAST is the get fi is rooted at node n and 0 otherwise. A tree corresponding structure in which all the leaf nodes kernel function KT (·) over two trees is defined as: have been extended with the ten morphological fea-

802 VP

VBD NP

FEAT FEAT FEAT FEAT FEAT FEAT FEAT NP NP

Gender Number Person Lemma Gloss Vocal UnVocal NN NP . . .

MASC S 3 bada>-a start/begin+he/it bada>a bd> FEAT FEAT FEAT FEAT FEAT FEAT FEAT . . .

Definite Gender Number Case Lemma Gloss Vocal

DEF MASC S GEN ra}iys president/head/chairman ra}iysi Figure 4: An excerpt of the EAST corresponding to the AST shown in Figure 3, with attribute-value extended morphological features represented as leaf nodes. tures described in Table 2, forming a vector of 10 cle È@. Therefore, the system could be lead astray to preterminal-terminal node pairs that replace the sur- conclude that ‘the-Chinese’ does not modify ‘pres- face of the leaf. The resulting EAST structure is ident’ but rather ‘the-ministers’. Without knowing shown in Figure 4. the Case information and the agreement features be- Not all the features are instantiated for all the leaf tween the verb @YK. ‘started’ and the two nouns head- ing the two main NPs in our tree, the syntactic sub- node words. Due to space limitations, in the fig-  ure we did not include the Features that have NULL ject can be either èPAK P ‘visit’ or  KP ‘president’ in values. For instance, Definiteness is always asso- Figure 1. The EAST is more effective in identifying ciated with nominals, hence the verb @YK. bd’ ‘start’ the first noun as the syntactic subject and the second is assigned a NULL value for the Definite feature. as the object since the morphological information in- Verbs exhibit Gender information depending on in- dicates that they are in nominative and accusative flections. For our example, @YK. ‘started’ is inflected Case, respectively. Also the agreement in Gender for masculine Gender, singular Number, third per- and Number between the verb and the syntactic sub- son. On the other hand, the noun Z@PPñË@ is definite ject is identified in the enriched tree. We see that @YK. and is assigned genitive Case since it is in a posses- ‘started’ and KP ‘president’ agree in being singu-  sive, idafa, construction. lar and masculine. If èPAK P ‘visit’ were the syntactic The features encoded by the EAST can provide subject, we would have seen the verb inflected as very useful hints for boundary and role classifica- H @YK. ‘started-FEM’ with a feminine inflection to re- tion. Considering Figure 1, argument boundaries is flect the verb-subject agreement on Gender. Hence not as straight forward to identify as there are sev- these agreement features should help with the clas- eral NPs. Assuming that the inner most NP ‘minis- sification task. ters the-Chinese’ is a valid Argument could poten- 4 Experiments tially be accepted. There is ample evidence that any In these experiments we investigate (a) if the tech- NN followed by a JJ would make a perfectly valid nology proposed in previous work for automatic Argument. However, an AST structure would mask SRL of English texts is suitable for Arabic SRL the fact that the JJ ‘the-Chinese’ does not modify the systems, and (b) the impact of tree kernels using NN ‘ministers’ since they do not agree in Number7, new tree structures on Arabic SRL. For this purpose, and in syntactic Case, where the latter is genitive and we test our models on the two individual phases the former is nominative. ‘the-Chinese’ in fact mod- of the traditional 2-stage SRL model (i.e. bound- ifies ‘president’ as they agree on all the underlying ary detection and argument classification) and on morphological features. Conversely, the EAST in the complete SRL task. We use three different fea- Figure 4 explicitly encodes this agreement includ- ture spaces: a set of standard attribute-value features ing an agreement on Definiteness. It is worth noting and the AST and the EAST structures defined in that just observing the Arabic word ‘president’  KP 3.4. Standard feature vectors can be combined with in Fig 1, the system would assume that it is an indef- a polynomial kernel (Poly), which, when the de- inite word since it does not include the definite arti- gree is larger than 1, automatically generates feature 7The POS tag on this node is NN as broken plural, however, conjunctions. This, as suggested in (Pradhan et al., the underlying morphological feature Number is plural. 2005; Moschitti, 2004), can help stressing the differ-

803 ences between different argument types. Tree struc- tures can be used in the learning algorithm thanks to the tree kernels described in Section 3.3. Moreover, to verify if the above feature sets are equivalent or complementary, we can join them by means of addi- tive operation which always produces a valid kernel (Shawe-Taylor and Cristianini, 2004).

4.1 Experimental setup

We use the dataset released in the SemEval 2007 Figure 5: Impact of polynomial kernel, tree kernels and their combi- Task 18 on Arabic Semantic Labeling (Diab et al., nations on boundary detection. 2007a). The data covers the 95 most frequent verbs in the Arabic III ver. 2 (ATB). The ATB consists of MSA newswire data from the Annhar newspaper, spanning the months from July to November, 2002. All our experiments are carried out with gold standard trees. An important characteristic of the dataset is the use of unvowelized Arabic in the Buckwalter transliteration scheme for deriving the basic features for the AST experimental condition. The data com- Figure 6: Impact of the polynomial kernel, tree kernels and their prises a development set, a test set and a training combinations on the accuracy in role classification (gold boundaries) set of 886, 902 and 8,402 sentences, respectively, and on the F1 of complete SRL task (boundary + role classification). where each set contain 1725, 1661 and 21,194 argu- ment instances. These instances are distributed over ing that both languages share an underlying syntax- 26 different role types. The training instances of semantics interface. Moreover, we note that the F1 the boundary detection task also include parse-tree of EAST is higher than the F1 of AST which in turn nodes that do not correspond to correct boundaries is higher than the linear kernel (Poly1). However, (we only considered 350K examples). For the exper- when conjunctive features (Poly2-4) are used the iments, we use SVM-Light-TK toolkit8 (Moschitti, system accuracy exceeds those of tree kernel mod- 2004; Moschitti, 2006) and its SVM-Light default els alone. Further increasing the polynomial degree parameters. The system performance, i.e. F1 on sin- (Poly5-6) generates very complex hypotheses which gle boundary and role classifier, accuracy of the role result in very low accuracy values. multi-classifier and the F1 of the complete SRL sys- Therefore, to improve the polynomial kernel, we tems, are computed by means of the CoNLL evalua- sum it to the contribution of AST and/or EAST, tor9. obtaining AST+Poly3 (polynomial kernel of degree 3), EAST+Poly3 and AST+EAST+Poly3, whose F1 4.2 Results scores are also shown in Figure 5. Such com- Figure 5 reports the F1 of the SVM boundary classi- bined models improve on the best polynomial ker- fier using Polynomial Kernels with a degree from 1 nel. However, not much difference is shown be- to 6 (i.e. Polyi), the AST and the EAST kernels and tween AST and EAST on boundary detection. This their combinations. We note that as we introduce is expected since we are using gold standard trees. conjunctions, i.e. a degree larger than 2, the F1 in- We hypothesize that the rich morphological fea- creases by more than 3 percentage points. Thus, not tures will help more with the role classification only are the English features meaningful for Ara- task. Therefore, we evaluate role classification with bic but also their combinations are important, reveal- gold boundaries. The curve labeled ”classification” 8http://disi.unitn.it/ moschitti in Figure 6 illustrates the accuracy of the SVM ∼ 9http://www.lsi.upc.es/ srlconll/soft.html role multi-classifier according to different kernels. ∼ 804 AST+ P3 AST EAST AST+ EAST+ EAST+ Role Precision Recall Fβ=1 P3 P3 P3 ARG0 96.14% 97.27% 96.70 P 81.73 80.33 81.7 81.73 82.46 83.08 ARG0-STR 100.00% 20.00% 33.33 R 78.93 75.98 77.42 80.01 80.67 81.28 ARG1 88.52% 92.70% 90.57 F1 80.31 78.09 79.51 80.86 81.56 82.17 ARG1-STR 33.33% 15.38% 21.05 ARG2 69.35% 76.67% 72.82 Table 3: F1 of different models on the Arabic SRL task. ARG3 66.67% 16.67% 26.67 ARGM-ADV 66.98% 61.74% 64.25 ARGM-CAU 100.00% 9.09% 16.67 Again, we note that a degree larger than 1 yields ARGM-CND 25.00% 33.33% 28.57 a significant improvement of more than 3 percent ARGM-LOC 67.44% 95.08% 78.91 ARGM-MNR 54.00% 49.09% 51.43 points, suggesting that the design of Arabic SRL ARGM-NEG 80.85% 97.44% 88.37 system based on SVMs requires polynomial kernels. ARGM-PRD 20.00% 8.33% 11.76 ARGM-PRP 85.71% 66.67% 75.00 In contrast to the boundary results, EAST highly im- ARGM-TMP 91.35% 88.79% 90.05 proves over AST (by about 3 percentage points) and produces an F1 comparable to the best Polynomial Table 4: SRL F1 of the single arguments using the kernel. Moreover, AST+Poly3, EAST+Poly3 and AST+EAST+Poly3 kernel. AST+EAST+Poly3 all yield different degrees of im- SRL task. We note that, as for English SRL, ARG0 provement, where the latter model is both the richest shows high values (96.70%). Conversely, ARG1 in terms of features and the most accurate. seems more difficult to be classified in Arabic. The These results strongly suggest that: (a) tree ker- F for ARG1 is only 90.57% compared with 96.70% nels generate new syntactic features that are useful 1 for ARG0. for the classification of Arabic semantic roles; (b) This may be attributed to the different possi- the richer morphology of Arabic language should ble syntactic orders of Arabic consructions confus- be exploited effectively to obtain accurate SRL sys- ing the syntactic subject with the object especially tems; (c) tree kernels appears to be a viable approach where there is no clear morphological features on to effectively achieve this goal. the arguments to decide either way. To illustrate the practical feasibility of our system, we investigate the complete SRL task where both 5 Conclusions the boundary detection and argument role classifica- We have presented a model for Arabic SRL that tion are performed automatically. The curve labeled yields a global SRL F1 score of 82.17% by combin- ”boundary + role classification” in Figure 6 reports ing rich structured features and traditional attribute- the F of SRL systems based on the previous ker- 1 value features derived from English SRL systems. nels. The trend of the plot is similar to the gold- The resulting system significantly improves previ- standard boundaries case. The difference among ously reported results on the same task and dataset. the F scores of the AST+Poly3, EAST+Poly3 and 1 This outcome is very promising given that the avail- AST+EAST+Poly3 is slightly reduced. This may able data is small compared to the English data sets. be attributed to the fact that they produce similar For future work, we would like to explore further boundary detection results, which in turn, for the explicit morphological features such as aspect tense global SRL outcome, are summed to those of the and voice as well as richer POS tag sets such as those classification phase. Table 3 details the differences proposed in (Diab, 2007). Finally, we would like to among the models and shows that the best model experiment with automatic parses and different syn- improves the SRL system based on the polynomial tactic formalisms such as dependencies and shallow kernel, i.e. the SRL state-of-the-art for Arabic, by parses. about 2 percentage points. This is a very large im- provement for SRL systems (Carreras and M`arquez, Acknowledgements 2005). These results confirm that the new enriched Mona Diab is partly funded by DARPA Contract No. HR0011- structures along with tree kernels are a promising ap- 06-C-0023. Alessandro Moschitti has been partially funded by proach for Arabic SRL systems. CCLS of the Columbia University and by the FP6 IST LUNA Finally, Table 4 reports the F of the best model, 1 project contract no 33549. AST+EAST+Poly3, for individual arguments in the

805 References Tabessi. 2006. Developing and Using a Pilot Dialectal Arabic Treebank. Collin F. Baker, Charles J. Fillmore, and John B. Lowe. Alessandro Moschitti, Ana-Maria Giuglea, Bonaventura 1998. The Berkeley FrameNet Project. In COLING- Coppola, and Roberto Basili. 2005. Hierarchical ACL ’98: University of Montreal´ . Semantic Role Labeling. In Proceedings of CoNLL- Xavier Carreras and Llu´ıs M`arquez. 2005. Introduction 2005, Ann Arbor, Michigan. to the CoNLL-2005 Shared Task: Semantic Role La- Alessandro Moschitti, Silvia Quarteroni, Roberto Basili, beling. In Proceedings of CoNLL-2005, Ann Arbor, and Suresh Manandhar. 2007. Exploiting Syntactic Michigan. and Shallow Semantic Kernels for Question Answer John Chen and Owen Rambow. 2003. Use of Deep Lin- Classification. In Proceedings of ACL’07, Prague, guistic Features for the Recognition and Labeling of Czech Republic. Semantic Arguments. In Proceedings of EMNLP, Sap- Alessandro Moschitti. 2004. A Study on Convolution poro, Japan. Kernels for Shallow Semantic . In proceedings Michael Collins and Nigel Duffy. 2002. New Ranking of ACL’04, Barcelona, Spain. Algorithms for Parsing and Tagging: Kernels over Dis- Alessandro Moschitti. 2006. Making Tree Kernels Prac- crete structures, and the voted perceptron. In ACL02. tical for Natural Language Learning. In Proceedings Mona Diab and Alessandro Moschitti. 2007. Semantic of EACL’06. Parsing for Modern Standard Arabic. In Proceedings Alessandro Moschitti, Daniele Pighin and Roberto Basili. of RANLP, Borovets, Bulgaria. 2006. Semantic Role Labeling via Tree Kernel Joint Mona Diab, Musa Alkhalifa, Sabry ElKateb, Christiane Inference. In Proceedings of CoNLL-X. Fellbaum, Aous Mansouri, and Martha Palmer. 2007a. Sameer Pradhan, Kadri Hacioglu, Wayne Ward, James H. Semeval-2007 task 18: Arabic Semantic Labeling. In Martin, and Daniel Jurafsky. 2003. Semantic Role Proceedings of SemEval-2007, Prague, Czech Repub- Parsing: Adding Semantic Structure to Unstructured lic. Text. In Proceedings ICDM’03, Melbourne, USA. Mona Diab, Mahmoud Ghoneim, and Nizar Habash. Sameer Pradhan, Kadri Hacioglu, Valerie Krugler, 2007b. Arabic Diacritization in the Context of Sta- Wayne Ward, James H. Martin, and Daniel Jurafsky. tistical Machine Translation. In Proceedings of MT- 2005. Support Vector Learning for Semantic Argu- Summit, Copenhagen, Denmark. ment Classification. Machine Learning. Mona Diab. 2007. Towards an Optimal Pos Tag Set for Ryan Roth, Owen Rambow, Nizar Habash, Mona Diab, Modern Standard Arabic Processing. In Proceedings and Cynthia Rudin. 2008. Arabic Morphological Tag- of RANLP, Borovets, Bulgaria. ging, Diacritization, and Lemmatization Using Lex- Katrin Erk and Sebastian Pado. 2006. Shalmaneser – A eme Models and Feature Ranking. In ACL’08, Short Toolchain for Shallow . Proceedings Papers, Columbus, Ohio, June. of LREC. John Shawe-Taylor and Nello Cristianini. 2004. Kernel Daniel Gildea and Daniel Jurafsky. 2002. Automatic La- Methods for Pattern Analysis. Cambridge University beling of Semantic Roles. Computational Linguistics. Press. Daniel Gildea and Martha Palmer. 2002. The Neces- Honglin Sun and Daniel Jurafsky. 2004. Shallow Seman- sity of Parsing for Predicate Argument Recognition. tic Parsing of Chinese. In Proceedings of NAACL- In Proceedings of ACL-02, Philadelphia, PA, USA. HLT. Nizar Habash and Owen Rambow. 2005. Arabic Tok- Cynthia A. Thompson, Roger Levy, and Christopher enization, Part-of-Speech Tagging and Morphological Manning. 2003. A Generative Model for Semantic Disambiguation in One Fell Swoop. In Proceedings of Role Labeling. In ECML’03. ACL’05, Ann Arbor, Michigan. Vladimir N. Vapnik. 1998. Statistical Learning Theory. Aria Haghighi, Kristina Toutanova, and Christopher John Wiley and Sons. Manning. 2005. A Joint Model for Semantic Role Nianwen Xue and Martha Palmer. 2004. Calibrating Labeling. In Proceedings ofCoNLL-2005, Ann Arbor, Features for Semantic Role Labeling. In Dekang Lin Michigan. and Dekai Wu, editors, Proceedings of EMNLP 2004, Paul Kingsbury and Martha Palmer. 2003. Propbank: the Barcelona, Spain. Next Level of Treebank. In Proceedings of and Lexical Theories. Mohamed Maamouri, Ann Bies, Tim Buckwalter, and Wigdan Mekki. 2004. The Penn Arabic Treebank : Building a Large-Scale Annotated Arabic Corpus. Mohamed Maamouri, Ann Bies, Tim Buckwalter, Mona Diab, Nizar Habash, Owen Rambow, and Dalila

806