<<

Morpho-syntactically Annotated Amharic Seyoum, Binyam Ephrem Miyao, Yusuke Mekonnen, Baye Yimam Addis Ababa University National Institute of Informatics Addis Ababa University [email protected] [email protected] [email protected]

morphological features and syntactic structure Abstract (dependency structures). , on the other hand, is a process of In this paper, we describe an ongoing project recognizing input sentences and identifying units of developing a treebank for Amharic. The main objective of developing the treebank is to like subject, verb, and object. It is used to use it as an input for the development of a determine the interaction among these parser. Morphologically-rich Languages like grammatical functions. Deep parsing, sometimes Arabic, Amharic and other Semitic languages referred to as deep processing, is a process present challenges to the state-of-art in whereby rich linguistic resource is used to give a parsing. In such language morphemes play detailed (or deep) syntactic as well as semantic important functions in both and analysis of input sentences. Such systems are syntax. In addition to the existence of high basic components and language specific lexical variations due the morphology, resources for any Natural Language Processing Amharic has a number of clitics which are not applications which require deeper understanding indicated with any special marker in the orthography. Considering the status of of Natural Languages. Amharic resources and challenges to the The structure of the paper is as follows: after existing approach to parsing, we suggest to we give a brief discussion on the motivation for develop a treebank where clitics are separated the development of the treebank in Section 2, the manually from content and annotated challenges of Amharic to treebank development semi-automatically for part-of-speech, will be discussed in Section 3. Section 4 deals morphological features and syntactic with the existing resources and their limitation. relations. Section 5 is devoted to the proposed solution and finally the paper will concluded in Section 6. 1. Introduction 2. Motivation A treebank in general can be viewed as a linguistically annotated corpus that includes The main objective of the project is to develop a grammatical analysis beyond part-of-speech level parser for Amharic sentences. For parser (Nivre, 2008). The level of annotation could development there are two major approaches. include , phrase and sentence levels (Frank These are grammar-driven (or rule-based) and and Erhard, 2012). Such language specific data-driven (or statistical-based) (Nivre, 2008). annotated corpus is an input for the development In grammar-driven parsing, a formal grammar is of various natural language processing tools that used to define possible parsing results for each use data-driven approaches. Though the design of string in the language. We define the grammar a treebank should be motivated by its intended rules and the list of possible lexical items to use, most treebank annotation schemes are which the rules can apply. It follows linguistic organized into a number of layers (Nivre, 2008). motivation to precisely describe a grammar of a For our purpose, we propose three layers of language. Even though, the rules are hand-written information. These are the part-of-speech, (or hand-crafted), they are capable of delivering a

48

highly accurate in-depth analysis of complex be categorized as less resourced language. As a natural language phenomena. result, to develop a parser for Amharic we Motivated by the hypothesis that humans suggested first to develop a treebank manually. In recognize patterns and phrases that have occurred this paper, we are going to address some of the in past experience (Bod et al., 2012), the task in challenges and solution in developing the data-driven approaches is to learn syntactic treebank for Amharic. structures from an existing treebank or a large syntactically annotated corpus. In data-driven 3. Challenges approach, the development of a parsing system presupposes availability of a treebank for a given In developing a treebank for languages which are language (Nivre et al., 2007). In data-driven, the morphologically rich and less resourced, there are knowledge of the grammar rules will be learned a number of challenging issues which should be from a manually-parsed training data. given due attention. In the following sections we Both methods have their own shortcomings provide a summary of the major challenges we and benefits. Grammar-driven methods are have noticed so far. known to be linguistically precise, but have a 3. 1 Morphologically-rich languages problem of robustness and ambiguity. On the other hand, data-driven methods are good for Amharic being one of the morphologically rich developing wide-coverage parsers rapidly. languages presents a challenge to the area of NLP However, the accuracy of the parser depends on in general and to parsing in particular (Dehadari, the magnitude of the training data and the et al., 2011). It is considered as a existence of accurate language specific resources. morphologically-rich languages in the sense that As data-driven methods depend on the size of grammatical relations like subject, object, etc. or corpus, it is also subject to problems of word arrangements and syntactic information are robustness, the ability of a system to give a indicated morphologically or at word level certain analysis for a new or unseen input (Habash, 2010).When parsing models which sentence (Nivre, 2006). have the highest performance for languages like On the other hand, in choosing which English are adopted and implemented for approaches to follow for the development of morphologically-rich languages like Amharic, parsing system, we need to consider the status of they perform poorly. This is due to the the language. In general, languages can be highly complexity of the morphological structure resourced or less resourced. Such distinction is (Tsarfaty, et al., 2010). based on the availability of tools and electronic An orthographic word in such a language data prepared in a language. Regarding Amharic which is delimited by white space, may be a there are initiatives to develop a large corpus. It combination of one or more function words and is also stated that language processing research inflectional morphemes. For instance, an for Amharic has shown some progress in recent Amharic orthographic word like ከየቤቱና years in both corpus and basic Natural Language /kəjjəbetunna/ “from each house and”, includes Processing (NLP) tools development (Gamback, the preposition ከ- /kə-/ “from”, reduced form of 2012).The existing corpus so far focuses on web the distributive marker እየ- /ɨjjə/ “each”, ቤት /bet/ or news corpus. These sources are good for “house”, the definite marker -ኡ /-u / “the” and the producing a large corpus quickly. However, for conjunction -ና /-nna/ “and”. The clitics like Amharic, there are no text preparation tools like preposition, conjunction, auxiliaries, etc. have Spell Checker, , etc. As a syntactic roles that indicate grammatical relations result, a part of such corpus is produced with with the content words (Tsarfaty, 2013). In order some level of errors. The efficiency of NLP tools to show the syntactic relation between clitics and trained on such corpus may also be questioned. content words, we need to consider how to Therefore, in both tools and corpus, Amharic can

49

represent and tag such elements. In addition, in handle this feature. In some cases, morphologically-rich languages a syntactic gemination can be used to convey position may not give clue about the syntactic grammatical information. Such case is relation. Rather it provides morphological very common in the construction of features that determine the function of a phrase in relativized verbs. a sentence. Thus, we should combine both B. words - As can be observed structural and morphological features to predict in other languages, compound words are syntactic dependencies. written in three ways; with a space State-of-the-art in parsing technology has between the combined words (ቡና ቤት been using data-driven systems. However, due to high morphological variations that exist in /bunna bet/ “bar”፣ አየር መንገድ /ʔajər morphologically rich languages, it is impractical məngəd/ “airline”), separated by a hyphen to observe all such variations in a given annotated (ስነ-ስርዓት /sɨnə-sɨrʔat/ “procedures”፣ ስነ- data. As a result, data-driven systems do not ጥበብ /sɨnə-t‘ɨbəb/ “art”) or written as a guarantee recognition of all morphological single word (ቤተክርስቲያን /bətəkɨrsɨtijan/ variations (Tsarfaty, 2013). Verbs in Amharic, “church”፣ መስሪያቤቶች /məsrijabetoʧʧ/ for instance, can have more than thousands word “offices”). forms (Gasser, 2010). Capturing all these word C. Syntactic words – words which are forms in a corpus is unthinkable. Apart from the separated by white-space (semicolon in morphological variations, the written forms may old documents) may be coupled with cause further variations due to clitics that are functional words like a preposition, attached to content words. Since there are a conjunction or auxiliaries. Thus, an number of clitics in Amharic, the degree of variations increases. Thus, in order to decrease orthographic word may be a phrase (ለሰው the degree of variations arising from attachment /ləsəw/ “to human”), a clause (የሚገኙትና of functional words, one needs to segment these /jəmmigəɲɲutɨnna/“and those that are forms. Segmenting clitics is not a simple task as found/available), or even a sentence it can be part of a content word or can be a word (አልመጣችም /ʔalmət‘t‘aʧʧɨm/ “She did not which is reduced due to phonological processes. come.”)

3.2 Writing system All such features need to be addressed in processing Amharic texts. Some of the above The Amharic script is called Ge’ez or Ethiopic problems call for standardization efforts to be where a consonant and a vowel are represented made whereas others are due to the decision to by one symbol. In most literature, the writing write what is in mind and what is actually system is considered as alpha-syllabic. But some produced. people argue that there may not be a one-to-one relationship between a grapheme and a syllable. 4. Existing resource In other words, each symbol in Amharic orthography represents a CV syllable. However, In recent years, language processing research on a syllable in Amharic may have CVC structure. Amharic has grown. This is partly because of the Problems related to the writing system which existence of a reasonable-size Part of Speech could be worthy considering in the development (POS) tagged corpus and the development of of the treebank are: Morphological Analyzer. The tagged corpus is A. Gemination or consonantal length - in news corpus from Walta Information Center Amharic gemination is phonemic in the (WIC). It is manually tagged by the staff member sense that it could bring meaning change. of the Ethiopian Languages Research Center However, the writing system does not

50

(ELRC). It consists of 210,000 tokens collected 5. Proposed solutions from 1065 news documents (Demeke and Mesfin, 2006). The corpus is used to develop a In the previous sections we have shown that the stemmer (Argaw, and Asker, 2007), Named existing corpora and tool cannot be used for our Entity recognition (Alemu, 2013), a chanker purpose due to their limitation of scope or focus. For our purpose, we want to analyze both lexical (Ibrahim and Assabie, 2014). However, the and function words. Thus, we propose the corpus contains some errors and annotation development of a treebank where both content inconsistencies (Gamback, 2012), (Gebrekidan, and function words are separated. In other words, 2010). Beside the identified problems, they we propose to separate function words or clitics consider orthographic words as their unit of from their phonological host. Even though, the analysis. Function words which are attached to distinction between clitics and affixes are content words are not considered separately. debatable, for our purpose, the following list Thus, we cannot use this corpus as it is. elements are considered as clitics.

Another important resource is a 1. Prepositions morphological analyzer called HornMorpho 2. The Possessive marker or pronominal (Gasser, 2011). It is described as “the most genitive markers complete morphological processing tool for 3. Definite marker Amharic” (Gamback, 2012). The system can be 4. Accusative marker used to analysis, segment and generate words. 5. Conjunction 6. Negation The performance was tested on 200 randomly 7. Auxiliaries selected words and has been reported to have 8. Relative pronouns above 95% accuracy (Gasser, 2011). The tool is 9. Nominal clause marker developed by taking orthographic words into 10. Subject and object pronominal consideration. As a result, it provides POS, agreement markers morphological and syntactic information, and The above elements should be separated from other information related to function words that content words. To do so, we have collected five are attached to the word. Even though, the thousand sentences from different sources which information it provides is important for the include grammar books, biographies, news, analysis of words in isolation, it has to be fictions, science books, law and religions. All the modified for the purpose of developing a parser. collected sentences were manually checked for The major focus in the development was on spelling errors. These sentences will then be lexical words not on function words. The system annotated at different levels. Before the gets confused when lexical words attaches more annotation, we will decompose words into than one function words. For instance, smaller meaningful units without loss of their እንደየክልሎች /ʔɨndəjjəklɨloʧʧu/ “as to the basic meaning. As it is indicated above, in Amharic writing system, those listed function respective regions”, it contains two clitics, እንደ words are written together with content words. /ʔɨndə/ and እየ-/ʔɨjjə/. The system guessed eight Thus, we should segment the two. Such analysis whereas when we remove a clitic, it segmentation will be done following a guideline gives the right analysis. However, since clitics are which we have prepared. not considered as a separate word, the system The guideline gives what should be does not give any analysis for clitis. Therefore, considered in the manual segmentation. For even though it is a very important tool to check instance, complex word in the text, that is a the structure of words, we may not use it for our combination of a content word and one or more purpose as it stands. clitics should be embraced by a bracket. This helps to keep track of the input word which is

51

segmented. When a complex word is segmented, marker is attached to a passive verb, the passive the elements in the orthographic words may not marker ‘ተ‘ /tə/ will get assimilated to the always be the same. They may be modified or consonant that begins the word. Thus, we cannot reduced in some way. For instance, the word tell whether a relatived verb is an active or ወደሚገኝ /wədəjəmmigəɲɲ/ “to which that is passive from the orthography unless we consider found” will be segmented into ወደ_የ_እም_ይ_ተገኝ the context or the pronouciation. For instance, the wədə_jə_ɨmm_jɨ_təgəɲɲ. From this example we word, የሚበላ can read as /jəmmibəlla/ “the one noticed that the orthographic word is a reduced who is eating” as an active form or read as form. When a preposition precedes the /jəmmibbəlla/ “the one who is being eaten” as a complimentizer የ jə (relative marker), the form passive form. In the morpholigical anlysis of will be reduced into ም mm. Thus, we need to keep HornMopho, this is handled by giving both the input orthographic word using the bracket and analysis. The following figure shows the analysis show the components that make up the form in of HornMorpho. the segmentation. The guidline also provides on how to check wheather a certain form is a clitics or part of the content word. Clitics that we have listed above, in most cases are short forms which may be part of the word. In such cases, they will not be segmented. For instance, the form ከ /kə/ “from” is a preposition. However, it can be part of a content word as in ከበደ /kəbbədə/ “became Figure 1: snap shot taken from HornMorpho analyisis heavy” or “a personal name”. In such cases, the from ከ/kə/ should not be segemented. This We notice from figure 1 above that the indicates that we cannot apply a certain rule or expression የሚበላ can have two citation forms በላ write a regular expression to automatically /bəlla/ “eat” for active and ተበላ /təbəlla/ “ being segment clitics. Separating clitics, we can say eaten” for passive. The possible interpritation of that, requires knoweldge of exisiting words in the the expression is given under “grammar” part of laguage. the analysis. In our manual anotation, since the In addition, the form of the clitics can be segmentation is done for a give sentence which is changed due to phonological process. This the context, this expression will be segmented as change can also be observed in the orthography. either as የ_እም_ይ_በል_ኣ or as የ_እም_ይ_ተ_በል_ኣ For instance, the prepostion ለ lə “to/for” is depending on the contex. The manual attached to a content word that begines with the segmentation is therefore important to the vowel like ኣሸናፊነት /aʃʃənnafinnət/ “winning”, the development of a morphological analyser with a form of the preposition will be changed.As a disambugation module for the future. result, the form becomes ላሸናፊነት /laʃʃənnafinnət/ This stage is the basic and fundamental step “ for a winning”. If we consider all the variations where the data is given to three annotators who a clitic may have, it will be problematic to handle are linguists and have better understanding of the all variations. Thus, the guideline suggests to language for the manual segmentation. Inter- restore to the orginal form in the segmentation. annotators agreement will be checked. After we Accordingly, ላሸናፊነት will be segmented into have reached above 95% inter-annotator ለ_ኣሸናፊነት. agreement, we will assign them a separate data The manual segmentation is important to for clitic segmentation. The result of this level solve some ambiguities observed in the will be a corpus of clitics separated from lexical arthography. For instance, some verbs which are words. It will help us to develop a tokenizer relativized can be in active or passive form. This which is a basic tool for the language. ambiguity occurs because when the relative

52

After the segmentation, the corpus will be 33 CONJ conjunction annotated for POS tag and morphological 34 SCONJ subordinating conjunction features. We have compiled 56 POS tag sets 35 NUM cardinal based on morphosyntactic properties words. 36 ORD ordinal Table 1 summarizes the POS tag sets. 37 INTJ interjection No tag Name 38 BG beginning 39 MD medial 1 CN common 40 FN final 2 ABS abstract foreign word, written in other 41 FW language 3 CLN collective 42 AC acronym 4 PN proper 43 AB abbreviation 5 VN verbal noun/infinitive 44 FM formula 6 CMN compound noun 45 EM emoticon 46 AN answer 7 PR personal 47 NG negative 8 RF reflexive 48 UC unclassified 9 DM demonstrative 49 SY symbol 10 IN interrogative 50 MWE multi word Expression 11 IND indefinite 51 DEF definite marker 12 POSP possessive 52 ACC accusative marker 13 QAN indefinite 53 GEN genetive marker 14 ADJ other adjectives 54 NEG negative marker 15 ADJN adjective derived from noun 55 RLP relative pronoun 16 ADJV adjective derived from verbs 56 POSM possessive marker 17 CADJ compound Adjective Table 1: POS tag set

18 COP copula The above tags will be revised based on the 19 VI intransitive with no complement feedback we will get from the annotators. The list 20 VIP intransitive with a complement is subjected for modification. In addition, we 21 VT transitive have listed possible morphological features which words in Amharic can represent. Table 2 22 VTN transitive with complement lists the morphological features that can be 23 VEX existential annotated in the treebank. 24 VEV eventive Basic Categories Inflection Type Tags 25 CV compound verb masculine m asc 26 AUX auxilary verb Gender feminine f em 27 ADVT time common com 28 ADVP place singular s ing 29 ADVD degree Nominal plural plur 30 ADVM manner Number dual dual 31 PREP adposition collective coll 32 POT postposition Case nominative nom

53

accusative acc system to annotate for both type of information. genitive gen The result will be manually checked and corrected by the annotators. The system again Definite definite def learns from the corrections. In other words, it will infinitive inf be done in iterative ways i.e. manually gerund ger annotation, training, manual correction, indicative ind verb form retraining, and annotation (Judge et al., 2006). jussive jus The result will be used to develop an automatic question que POS tagger and morphological analyzer. negative neg Finally, we plan to annotate the sentences with grammatical relations using a universal past past dependency framework (Nivre, 2015). We have Tense present pres identified and compiled potential syntactic future fut relations for Amharic. Table 3 provides potential imperfect imp syntactic relations identified so far. perfective perf Aspect Category Relation Description prospective pro adj adjective progressive prog possessive pred active act construction Nominal passive pass Dependency app predicate Verb Voice reciprocal rcp spec apposition causative cau cpnd specification first 1 subj subject of a verb Person second 2 pass passive subject third 3 verbal positive obj object of a verb Dependency Negative /affirmative pos impv imperative negative neg pro prohibition subject subj prepositional object obj gen Agreement phrase dative dat link PP attachment applicative app coordinating masculine masc conj Gender conjunction feminine fem sub subordinate clause singular sing Number cond condition Plural plu phrases and clauses rslt result Table 2: Morphological features conc concessive Therefore, the segmented sentences will be annotated for both POS tag and morphological temp temporal features. This could be done in a semi-automatic loc local way. That means, some of the data like 100 caus causal sentences will be manually annotated and then the machine learns the tag and morphological amd purpose features out of these seed sentences. Then other Table 3: Dependency relations set of sample sentences will be given to the

54

Using the above relations, sentences will be POS tag, morphological features and dependency semi-automatically annotated for syntactic relations. Figure (1) is a screen-shot for sample relations following the same procedures we data attempted for a couple of sentences. follow in the above annotations. Consequently, put all the activities together we will have a treebank annotated for all the three information:

Figure 2: Annotation sample using Brat

In figure 2 we observe a dependency tree for an under Linguistic Capacity Building-Tools for Amharic sentence. We many notice that complex inclusive development of Ethiopia, expressions are embraced by a bracket and their (http://www.hf.uio.no/iln/english/research/projec segmentation are indicated following the bracket. ts/linguistic-capacity-building-tools-for-the- When we want to retrieve the orthography we can inclu/). We want to thank both institutions for consider the expression in bracket and when we their support. We also would to extend our want to represent the syntactic roles played by the appreciations to the three anonymous reviewers clitics we can consider the segmentation. for their feedback that greatly improved the Furthermore, the morphological features are also article. indicated together with their tags if a given token has morphological features. We have produced References the above kind of representation for some couple Abeba Ibrahim and Yaregal Assabie. (2014). of sentences. However, the manual segmentation Amharic Sentence Parsing Using Base of the remaining sentences is in progress. Phrase Chunking. CICLing 2014, (pp. 297- 306).

6. Conclusion Alemu, Besufikad. (2013). A Named Entity Recognition for Amharic . MA Thesis, Addis We have described an ongoing project that aims Ababa University. at developing a treebank for Amharic. As the language is less resourced and morphologically- Amsalu, Saba and Demeke, Girma A. (2006). Non- rich, we suggest the annotation of the treebank to concatinative Finite-State Morphotactics of have three tiers: POS tag, morphological features Amharic Verbs. and syntactic relations. Before the annotation is Argaw, Atelach Alemu and Asker, Lars. (2007). An done, orthographic word needs to be segmented Amharic Stemmer : Reducing Words to their if it has clitics. We suggested that the minimal Citation Forms. Proceedings of the 5th unit for our analysis should be syntactic words, Workshop on Important Unresolved i.e. both content word and functional words. Matters, pages , (pp. 104 -110). Prague, Czech Republic: Association for Acknowledgments Computational Linguistics.

This project is partially funded by the Director for Binyam Gebrekidan. (2010). Part of Speech Tagging Research of AAU under Adaptive Problem- for Amharic. Masters Thesis, University of Wolverhampton. Wolverhampton. Solving Research grant and NORHED fund

55

Carroll, John. (2000). Statistical Parsing. In N. T. R. Habash, Nizar (2010). Introduction to Arabic Natural Dale (Ed.), Handbook of Natural Language Language Processing. Morgan & Claypool,. Processing (pp. 525–544). John Judge, Aoife Cahill, Josef van Genabith. (2006). Dehadari, J., Tounsi, L. and Genabith, J. (2011). QuestionBank: Creating a Corpus of Parse- Morphological Features for Parsing Annotated Questions. ACL 2006, 21st Morphologically- rich Languages: A Case of International Conference on Computational Arabic . Proceedings of the 2nd Workshop Linguistics and 44th Annual Meeting of the on Statistical Parsing of Morphologically- Association for Computational Linguistics. Rich Language (SPMRL 2011), (pp. 12-21). Sydney, Australia: ACL. Dublin Irelan. Maamouri, Mohamed and Bies, Ann. (2004). Dehadari, Jon, Tounsi, Lamia and Genabith, Josef. Developing an Arabic Treebank: Methods, (2011). Morphological Features for Parsing Guidelines, Procedures, and Tools. Morphologically-rich Languages: A Case of Proceedings of the Workshop on Arabic. Proceedings of the 2nd Workshop Computational Approaches to Arabic Script- on Statistical Parsing of Morphologically- based Languages, 2-9. Rich Language (SPMRL 2011), (pp. 12-21). Dublin Ireland. Nivre, Joakim. (2006). Two Strategies for Text Parsing. Journal of Linguistics, 19, 440– Demeke , Girma Awgichew and Getachew , Mesfin. 448. (2006). Manual annotation of Amharic news items with part-of-speech tags and its Nivre, Joakim. (2008). . In M. a. Kytö, challenges. Ethiopian Languages Research : An International Center Working Papers, 2, 1–16. Handbook (pp. 225-241). Mouton de Gruyter. Frank, Aneette; Hinrichs, Erhard;. (2012). Treebanks : Linking Linguistic Theory to Nivre, Joakim. (2015). Towards a Universal Computational Linguistics. Linguistic Issues Grammar for Natural Language Processing. in Language Technology, 7(1). In Lecture Notes in Computer Science (including subseries Lecture Notes in Gamback, Björn. (2012). Tagging and Verifying an Artificial Intelligence and Lecture Notes in Amharic News Corpus. the 8th International Bioinformatics) (pp. 3–16). Switzerland: Conference on Language Resources and Springer International Publishing. Evaluation (ELRA). Workshop on Language doi:10.1007/978-3-642-28601-8 Technology for Normalisation of Less- Resourced Languages. Istanbul, Turkey. Nivre, Joakim; Hall, Johan; Sandra, Kubler; Mcdonald, Ryan; Nilsson, Jens; Riedel, Gambäck, Björn; Olsson, Fredrik; Atelach, Alemu Sebastian & Yuret, Deniz. (2007). The Argaw; Asker, Lars. (2009). Methods for CoNLL 2007 Shared Task on Dependency Amharic part-of-speech tagging. Parsing. Proceedings of the CoNLL Shared Proceedings of the First Workshop on Task Session of EMNLP-CoNLL 2007, (pp. Language Technologies for African 915–932). Prague. Languages, (pp. 104-111). Rens Bod ; Remko Scha ; Khalil Sima. (2012). Data Gasser, Michael. (2010). A Dependency Grammar Oriented Parsing. (R. Bod , R. Scha , & K. for Amharic. Workshop on Language Sima, Eds.) CSLI Studies in Computation, Resource and Human Language 14(4), 472–476. Technologies for Semitic Languages. Tsarfaty , Reut; Seddah, Djame; Kubler , Sandra; Gasser, Michael. (2011). HornMorpho: a system for Niver, Joakim;. (2013). Parsing morphological processing of Amharic, Morphologically Rich Languages: Oromo, and Tigrinya. Conference on Introduction to the Special Issue. HUman Language Technology for Association for Computational Linguistics, Development. Alexandria, Egypt. 39(1), 15 - 22.

56 Tsarfaty, Reut. (2013). A Unified Morpho-Syntactic Languages, (pp. 1-12). Los Angeles, Scheme of Stanford Dependencies. California. Proceedings of ACL. Tsarfaty, Reut; Seddah , Djame; Goldberg, Yoav; Tsarfaty, Reut; Seddah , Djame; Goldberg, Yoav; Kubler , Yannick; Candito , Marie; Foster , Kubler , Yannick; Candito , Marie; Foster , Jennifer; Versley , Yannick; Rehbein, Ines; Jennifer; Versley , Yannick; Rehbein, Ines; Tounsi, Lamia. (2010). Statistical Parsing of Tounsi, Lamia. (2010). Statistical Parsing of Morphologically Rich Languages (SPMRL): Morphologically Rich Languages (SPMRL): What, How and Whither. The Proceedings What, How and Whither. The Proceedings of the NAACL HLT 2010 First Workshop on of the NAACL HLT 201 First Workshop on Statical Parsing of Morphologically-Rich Statistical Parsing of Morphologically-Rich Languages, (pp. 1-12). Los Angles, California.

57