Zeina Aizouky
Total Page:16
File Type:pdf, Size:1020Kb
i Arabic-English Google Translation Evaluation and Arabic Sentiment Analysis by Zeina Aizouky A thesis submitted in partial fulfillment of the requirements for the degree of Master of Arts Department of Digital Humanities University of Alberta © Zeina Aizouky, 2020 ii Abstract Machine translation is one of the most important tasks in the automatic processing of the natural languages, but its systems are still very far from achieving any performance close to ideal human translation due to many obstacles and difficulties. For example, grammatical rules between different languages are different, also, some words in the source language may not have an equivalent translation in the target language and this will lead to accurate translation results. Also, sentiment analysis is another application of Natural Language Processing (NLP) which is the process of identifying the feelings or sentiments held in a piece of text and classifying them into positive, negative, or neutral. This study sets out to examine the efficiency and accuracy of Google Translate’s translation between the Arabic and English languages and to evaluate the qualification of sentiment analysis when we apply it to Arabic texts. My study will also examine how old texts and texts written in dialects of Arabic affect both sentiment analysis and Google Translate’s performance. So, I used Arabic songs’ lyrics which differ between each other in terms of precedence and formality where they can be old or new, and formal or informal. To evaluate the results of Google Translate in this thesis, a lexical and grammatical analysis was used and the retrieval rate of words, sentences, and meanings of the text in the source language was calculated. In this thesis, I proposed a new approach to sentiment analysis which I conducted over five Arabic songs’ lyrics. Some of the lyrics are written in informal colloquial Arabic and some in Modern Standard Arabic (MSA). I performed sentiment analysis on those lyrics to evaluate the efficiency of the Google translation by comparing the results of the sentiment analysis of Arabic lyrics and the translated versions of those lyrics. iii The results of the evaluation of Google Translate showed that Google Translate has failed to achieve adequate translation, especially in old complex informal and unstructured Arabic texts. In addition, the results of applying sentiment analysis to Arabic texts showed that old and informal texts lower the performance of sentiment analysis. Also, applying English sentiment analysis to the translated version gave different results than applying Arabic opinion mining to the Arabic original version. Furthermore, Google Translate lowered the accuracy of the English sentiment analysis compared to the results I got when I applied the English sentiment analysis to my suggested translation of the lyrics. iv Acknowledgment I would like to thank the University of Alberta for affording me the un-imaginable opportunity to complete my Masters’ degree here. And Special thanks to Dr. Maureen Engel and Nashir Karmali whom I would not be here without their help. I also would like to thank my supervisor Dr. Harvey Quamen for the help and guidance through my thesis process. Geoffrey Rockwell and Sean Gouglas for taking the time to be in my supervisory committee. Last but certainly not least is my family, husband, and my friends for encouraging and supporting me throughout my entire program. v Table of Contents Table of Contents Abstract ............................................................................................................................... ii Acknowledgment ............................................................................................................... iv Table of Contents ................................................................................................................ v List of Tables ..................................................................................................................... xi List of Figures and Illustrations ....................................................................................... xiii Chapter 1: Introduction ....................................................................................................... 1 • Background in the Arabic Language ........................................................................ 3 ➢ Arabic Phonology .............................................................................................. 6 ➢ Arabic Orthography ........................................................................................... 8 ➢ Arabic Morphology ........................................................................................... 9 • Background in NLP ................................................................................................ 11 ➢ NLP importance ............................................................................................... 12 ➢ Machine translation ......................................................................................... 14 ➢ Sentiment Analysis .......................................................................................... 15 ➢ Challenges in NLP ........................................................................................... 17 Chapter 2: Introduction to Arabic Music .......................................................................... 20 • Arabic Musical Instruments .................................................................................... 20 vi • Differences between Arabic and Western music .................................................... 21 • Influence of Arabic Music on Western Music ........................................................ 24 • Influence of Western Music on Arabic Music ........................................................ 26 • Muwashahat and religious music (Whirling Dervish) Mawlawiyya ...................... 27 • Turko-Arab influence (Samaai) .............................................................................. 28 • The Arabic Maqamat .............................................................................................. 30 • Tarab ....................................................................................................................... 31 • Arabic Singers ......................................................................................................... 32 Chapter 3: Literature Review ............................................................................................ 34 • NLP Applications Accuracy in Arabic Language .................................................. 34 • Introduction to MT and Google Translate .............................................................. 35 • Machine translation:................................................................................................ 37 ➢ Current state-of-the-art in Machine translation ............................................... 37 ➢ Big debates in Machine translation ................................................................. 38 ➢ Google Translate efficiency in academic fields .............................................. 40 ➢ Google Translate efficiency with texts from different eras ............................. 41 ➢ Google Translate efficiency between Arabic and English .............................. 42 ➢ Google Translate accuracy in translating Arabic texts .................................... 43 ➢ Google translate efficiency in translating Arabic poetry ................................. 44 vii ➢ Old Arabic texts and Google Translate accuracy ............................................ 46 ➢ Texts in Dialects of Arabic and Google Translate .......................................... 46 ➢ Arabic-English Google Translate errors .......................................................... 48 • Sentiment analysis: ................................................................................................. 48 ➢ Current state-of-the-art in sentiment analysis .................................................. 48 ➢ Big debates in sentiment analysis .................................................................... 50 ➢ Sentiment analysis accuracy in Arabic texts ................................................... 52 ➢ Sentiment analysis efficiency in Arabic texts.................................................. 54 ➢ Sentiment analysis errors in Arabic texts ........................................................ 55 ➢ Texts in Dialects of Arabic and sentiment analysis ......................................... 56 • Conclusion .............................................................................................................. 56 Chapter 4: Machine Translation and Google Translate Evaluation .................................. 58 • Introduction about MT and Google translate .......................................................... 58 ➢ Bilingual and Multilingual systems ................................................................. 62 ➢ Median Language ............................................................................................ 63 ➢ Statistical Translation ...................................................................................... 64 ➢ Google Translate .............................................................................................. 65 ➢ Arabic Language in MT .................................................................................. 66 • Methodology ..........................................................................................................