Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging
Total Page:16
File Type:pdf, Size:1020Kb
Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging Ehsan Doostmohammadi|, Minoo Nassajian|, Adel Rahimi♠ |Sharif University of Technology, Tehran, Iran ♠Dathena Science Pte. Ltd., Singapore fe.doostm72,[email protected], [email protected] Abstract This affix is always pronounced but almost al- ways not written, which results in a high degree Ezafe is a grammatical particle in some Iranian of ambiguity in reading and understanding Persian languages that links two words together. Re- texts. It is hence considered as one of the most gardless of the important information it con- veys, it is almost always not indicated in Per- interesting issues in Persian linguistics, and it has sian script, resulting in mistakes in reading been discussed in details from phonological as- complex sentences and errors in natural lan- pects (Ghomeshi, 1997), morphological aspects guage processing tasks. In this paper, we (Samvelian, 2006, 2007) and (Karimi and Brame, experiment with different machine learning 2012), and syntactic aspects (Samiian, 1994; Lar- methods to achieve state-of-the-art results in son and Yamakido, 2008; Kahnemuyipour, 2006, the task of ezafe recognition. Transformer- 2014, 2016). based methods, BERT and XLMRoBERTa, achieve the best results, the latter achieving Nearly 22% of the Persian words have ezafe (Bi- 2.68% F1-score more than the previous state- jankhan et al., 2011), which shows the prevalence of-the-art. We, moreover, use ezafe informa- of this marker. Moreover, this construction also ap- tion to improve Persian part-of-speech tagging pears in other languages such as Hawramani (Holm- results and show that such information will not berg and Odden, 2005), Zazaki (Larson and Ya- be useful to transformer-based methods and ex- makido, 2006; Toosarvandani and van Urk, 2014), plain why that might be the case. Kurdish (Karimi, 2007) etc. Ezafe construction 1 Introduction is also similar to idafa construction in Arabic and construct state in Hebrew (Habash, 2010; Karimi Persian ezafe is an unstressed morpheme that ap- and Brame, 2012) and Zulu (Jones, 2018). pears on the end of the words, as -e after conso- Ezafe recognition is the task of automatically la- nants and as -ye1 after vowels. This syntactic phe- beling the words ending with ezafe, which is crucial nomenon links a head noun, head pronoun, head for some tasks such as speech synthesis (Sheikhan adjective, head preposition, or head adverb to their et al., 1997; Bahaadini et al., 2011), as ezafe is al- modifiers in a constituent called ‘ezafe construc- ways pronounced, but rarely written. Furthermore, tion’ (Nassajian et al., 2019). Whether a word in a as recognizing the positions of this marker in sen- sentence receives or does not receive ezafe might tences helps determine phrase boundaries, it highly affect that sentence’s semantic and syntactic struc- facilitates other natural language processing (NLP) tures, as demonstrated in Examples 1a and 1b in tasks, such as tokenization (Ghayoomi and Mom- Figure1. There are some constructions in English tazi, 2009), syntactic parsing (Sagot and Walther, that can be translated by ezafe construction in Per- 2010; Nourian et al., 2015), part-of-speech (POS) sian. For instance, English ‘of’ has the same role tagging (Hosseini Pozveh et al., 2016), and ma- as Persian ezafe to show the part-whole relation, chine translation (Amtrup et al., 2000). the relationship of possession, or ‘’s’ construction, In this paper, we experiment with different meth- and possessive pronouns followed by nouns show- ods to achieve state-of-the-art results in the task of ing genitive cases are mirrored by Persian ezafe ezafe recognition. We then use the best of these (Karimi and Brame, 2012). methods to improve the results for the task of POS 1The y is called an intrusive y and is an excrescence be- tagging. After establishing a baseline for this task, tween two vowels for the ease of pronunciation. we provide the ezafe information to the POS tag- 961 Findings of the Association for Computational Linguistics: EMNLP 2020, pages 961–971 November 16 - 20, 2020. c 2020 Association for Computational Linguistics prep nsubj pobj advmod dobj-lvc root (1) a. Darpey-e ettefaq¯ at-e¯ diruz ’u ’este’fa¯ dad¯ . following-ez.2 events-ez. yesterday he/she resign did . P N ADV PRO N V DELM “Following the yesterday events, he/she resigned.” prep advmod nsubj pobj dobj-lvc root b. Darpey-e ettefaq¯ at¯ diruz ’u ’este’fa¯ dad¯ . following-ez. events yesterday he/she resign did . P N ADV PRO N V DELM “Following the events, he/she resigned yesterday.” Figure 1: An example of the role of ezafe in the syntactic and semantic structures. ging model once in the input text and the other time Megerdoomian et al.(2000) use a rule-based as an auxiliary task in a multi-task setting, to see method to design a Persian morphological analyzer. the difference in the results. The contributions of They define an ezafe feature to indicate the pres- this paper are (1) improving the state-of-the-art re- ence or absence of ezafe for each word based on sults in both of ezafe recognition and POS tagging the following words in a sentence. Another work is tasks, (2) analyzing the results of ezafe recognition Muller¨ and Ghayoomi(2010) that considers ezafe task to pave the way for further enhancement in the as a part of implemented head-driven phrase struc- future work, (3) improving POS tagging results in ture grammar (HPSG) to formalize Persian syntax some of the methods by providing ezafe informa- and determine phrase boundaries. In addition, No- tion and explaining why transformer-based models joumian(2011) designs a Persian lexical diacritizer might not benefit from such information. The code to insert short vowels within words in sentences us- for our experiments is available on this project’s ing finite-state transducers (FST) to disambiguate GitHub repository 3. words phonologically and semantically. They use After reviewing the previous work of both tasks a rule-based method to insert ezafe based on the in Section2, we introduce our methodology in context and the POS tags of the previous words. Section3 and data in Section4. We then discuss As for the statistical approach, Koochari et al. ezafe recognition and POS tagging tasks and their (2006) employ classification and regression trees results in Sections5 and6, respectively. (CART) to predict the absence or presence of ezafe marker. They use features such as Persian mor- 2 Previous Work phosyntactic characteristics, the POS tags of the current word, two words before, and three words af- 2.1 Ezafe Recognition ter the current word to train the model. Their train In the field of NLP, a few studies have been car- set contains approximately 70,000 words, and the ried out on Persian ezafe recognition, including test corpus consists of 30,382 words. To evaluate rule-based methods, statistical methods, and hybrid the performance of the model, they use Kappa fac- methods. Most of the previous work on the task tor, and they report 98.25% accuracy in the case of ezafe rely on long lists of hand-crafted rules and fail to non- words and 88.85% in the case of words ezafe achieve high performance on the task. with . As another research, we can mention Asghari et al.(2014) that employs maximum en- 2Ezafe. tropy (ME) and conditional random fields (CRF) 3https://github.com/edoost/pert methods. They use the 10 million word Bijankhan 962 corpus (Bijankhan et al., 2011) and report an ac- lion word Bijankhan corpus, ME and CRF with curacy of 97.21% for the ME tagger and 97.83% different window sizes, the best results of which for the CRF model with a window of size 5. They are 95% for both models with a window size of 5. also utilize five Persian specific features in a hy- brid setting with the models to achieve the highest 3 Methodology accuracy of 98.04% with CRF. We see both ezafe recognition and POS tagging Isapour et al.(2008) propose a hybrid method as sequence labeling problems, i.e., mapping each to determine ezafe positions using probabilistic input word to the corresponding class space of the context-free grammar (PCFG) and then consider task. For the ezafe recognition task, the class space the relations between the heads and their modi- size is two, 0 for words without and 1 for words fiers. The obtained accuracy is 93.29%, reportedly. with ezafe. The class space size for POS tagging Another work is Noferesti and Shamsfard(2014) task is 14, consisting of the coarse-grained POS that uses both a rule-based method and a genetic tags in the 10 million word Bijankhan corpus. The algorithm. At first, they apply 53 syntactic, mor- results in Section 2.1 are unfortunately reported on phological, and lexical rules to texts to determine different, and in most cases irreproducible, test sets, words with ezafe. Then, the genetic algorithm is using accuracy as the performance measure (which employed to recognize words with ezafe, which is insufficient and unsuitable for the task), making have not been recognized at the previous step. To the comparison difficult. We hence re-implemented train and test the model, they use the 2.5 million the model that reports the highest accuracy on the word Bijankhan corpus (Bijankhan, 2004) and ob- largest test set and compare its results with ours. tain an accuracy of 95.26%. 3.1 Models 2.2 POS Tagging We experiment with three types of models: condi- Azimizadeh et al.(2008) use a trigram hidden tional random fields (CRF) (Lafferty et al., 2001), Markov model trained on the 2.5 million word Bi- recurrent neural networks (RNN) (Rumelhart et al., jankhan corpus.