ALANAZI, MOHAMMAD S., Ph.D., December 2019
Modern and Classical Language Studies

THE USE OF COMPUTER-ASSISTED TRANSLATION TOOLS FOR ARABIC TRANSLATION: USER EVALUATION, ISSUES, AND IMPROVEMENTS (368 pp.)

Dissertation Advisor: Sue Ellen Wright

The development of technology since the last quarter of the 20th century has played a momentous role in shaping the translation process for most languages. The Arabic language, however, has faced difficulties in keeping up with the accelerated changes in computer-assisted translation tools. Those challenges have been examined extensively during the last decade; however, previous studies have not adequately taken into account the evaluations of these tools made by Arabic language translators. The challenging morphological, syntactic, phonetic, and phonological characteristics of Arabic make it one of the most complicated languages for current translation technology, which may explain an understandably negative assessment among Arabic language translators. This study examined Arabic language translators' evaluation of computer-assisted translation tools and investigated potential problems that can complicate the use of the tools. Finally, the study discussed factors to take into consideration when developing computer-assisted tools to address Arabic language translators' needs. The study hypothesized that Arabic language translators would express concerns regarding language-specific issues during the use of the tools, and that complications would occur while working with these applications, e.g. MT suggestions, segmentation, punctuation, and script-related issues. To test the study's hypothesis, a mixed methodological approach was pursued, combining an online survey and an observational experiment. Arabic language translators were recruited to participate in the study. A mixed quantitative and qualitative analysis of the collected data was conducted to demonstrate the participants' responses to and evaluation of the tools. The results of the study reveal a strong inclination by the Arabic language translators in this study to encourage and support the use of CAT tools despite the complications (e.g., segmentation, punctuation, and spelling), and suggest that Arabic language translators are more likely to make changes to TM matches and to perform extensive post-editing of MT suggestions. Triangulation of the survey and experiment findings supports the conclusion that there is no relationship between the complications experienced while using translation tools and the expressed level of satisfaction.

Keywords: Arabic language, MT, CAT tools, evaluation, perspectives, complications
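The triangulated finding reported above, that the complications participants experienced did not track their expressed satisfaction, can be illustrated with a minimal sketch of that kind of quantitative check. The data, variable names, and choice of a rank correlation below are assumptions for illustration only; they are not the dissertation's data or its exact statistical procedure.

```python
# Hypothetical sketch: testing for a relationship between the number of
# complications a translator logged and their satisfaction rating (1-5).
# The data below is invented for illustration; it is NOT the study's data.
from scipy.stats import spearmanr

complications = [5, 2, 7, 1, 4, 3, 6, 2, 5, 3]   # issues logged per participant
satisfaction  = [4, 4, 3, 5, 4, 5, 4, 3, 5, 4]   # 1 (low) to 5 (high)

rho, p_value = spearmanr(complications, satisfaction)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# A small rho with a large p-value would be consistent with the reported
# finding of no relationship between complications and satisfaction.
```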
THE USE OF COMPUTER-ASSISTED TRANSLATION TOOLS FOR ARABIC TRANSLATION: USER EVALUATION, ISSUES, AND IMPROVEMENTS

A Dissertation Submitted to Kent State University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

By Mohammad S. Alanazi
December 2019

© Copyright
All rights reserved
Except for previously published materials

Dissertation written by
Mohammad S. Alanazi
B.A., Imam Muhammad Ibn Saud Islamic University, 2009
M.A., University of Florida, 2013
Ph.D., Kent State University, 2019

Approved by
Sue Ellen Wright, Chair, Doctoral Dissertation Committee
Said Shiyab, Member, Doctoral Dissertation Committee
Erik Angelone, Member, Doctoral Dissertation Committee
Michael Carl, Member, Doctoral Dissertation Committee
Yesim Kaptan, Member, Doctoral Dissertation Committee

Accepted by
Keiran Dunne, Chair, Department of Modern and Classical Language Studies
James L. Blank, Dean, College of Arts and Sciences

TABLE OF CONTENTS

TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
ARABIC TRANSLITERATION
DEDICATION
ACKNOWLEDGMENT

CHAPTER I: INTRODUCTION
  1.1 Overview
  1.2 Statement of the Problem
  1.3 Research Questions and Hypotheses
  1.4 Potential Impact and Significance
  1.5 Overview of the Dissertation

CHAPTER II: LITERATURE REVIEW
  2.1 Introduction
  2.2 Challenges for Arabic Language Natural Language Processing
    2.2.1 Morphological Analysis in Arabic
    2.2.2 Syntactical Analysis of Arabic
    2.2.3 Arabic Diacritics
    2.2.4 Arabic Diglossia
    2.2.5 Arabic Optical Character Recognition (OCR)
    2.2.6 Governmental and Academic Support for Arabic Computer Tools
  2.3 Development of Arabic Automated Translation Tools
    2.3.1 Rule-Based Machine Translation
    2.3.2 Example-Based Machine Translation
    2.3.3 Statistical Machine Translation
    2.3.4 Hybrid Machine Translation
    2.3.5 Neural Network-Based Machine Translation
  2.4 Shift to Human-Machine Translation
    2.4.1 The Translator's Workstation (Computer-Assisted Translation Tools)
    2.4.2 Challenges Encountered with CAT Tools with Arabic
    2.4.3 Integrated Computer-Assisted Translation Tools
  2.5 Summary

CHAPTER III: METHODOLOGY
  3.1 Introduction
  3.2 Research Approach
  3.3 Survey
    3.3.1 Participants
    3.3.2 Materials and Procedure
  3.4 Experiment
    3.4.1 Translation Task
    3.4.2 Semi-Structured Interview
    3.4.3 Participants
    3.4.4 Research Procedure
  3.5 Data Elicitation
    3.5.1 Quantitative Analysis
    3.5.2 Qualitative Analysis
Recommended publications
  • An Artificial Neural Network Approach for Sentence Boundary Disambiguation in Urdu Language Text
    The International Arab Journal of Information Technology, Vol. 12, No. 4, July 2015

    An Artificial Neural Network Approach for Sentence Boundary Disambiguation in Urdu Language Text
    Shazia Raj, Zobia Rehman, Sonia Rauf, Rehana Siddique, and Waqas Anwar
    Department of Computer Science, COMSATS Institute of Information Technology, Pakistan

    Abstract: Sentence boundary identification is an important step for text processing tasks, e.g., machine translation, POS tagging, text summarization etc. In this paper, we present an approach comprising a Feed Forward Neural Network (FFNN) along with part of speech information of the words in a corpus. The proposed adaptive system has been tested after training it with varying sizes of data and threshold values. The best results our system produced are 93.05% precision, 99.53% recall and 96.18% f-measure.

    Keywords: Sentence boundary identification, feed forward neural network, back propagation learning algorithm.

    Received April 22, 2013; accepted September 19, 2013; published online August 17, 2014

    1. Introduction
    Sentence boundary disambiguation is a problem in natural language processing that decides about the beginning and end of a sentence. Almost all language processing applications need their input text split into sentences for certain reasons. Sentence boundary detection is a difficult task as very often ambiguous punctuation marks are used in the text. In such conditions it would be difficult for a machine to differentiate sentence terminators from ambiguous punctuations. Urdu language processing is in its infancy stage and application development for it is slower for a number of reasons. One of these reasons is the lack of an Urdu text corpus, either tagged or untagged. However,
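As a rough, hedged illustration of the approach summarised above, a feed-forward network fed with simple features (including part-of-speech cues) around each candidate punctuation mark, consider the sketch below. The feature set, toy data, and network size are assumptions for illustration, not the authors' actual configuration.

```python
# Minimal sketch of FFNN-based sentence boundary disambiguation.
# Features and training data are toy examples, not the paper's corpus.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Each candidate boundary is described by simple features such as:
# [is_period, next_token_capitalized, prev_token_is_abbrev, prev_pos_is_noun]
X = np.array([
    [1, 1, 0, 1],   # "... school. He ..."   -> boundary
    [1, 0, 1, 1],   # "... Dr. smith ..."    -> not a boundary
    [1, 1, 1, 0],   # "... U.S. The ..."     -> boundary
    [1, 0, 0, 0],   # "... 3.5 percent ..."  -> not a boundary
])
y = np.array([1, 0, 1, 0])

# A small feed-forward network (one hidden layer) trained with
# backpropagation, in the spirit of the FFNN described in the paper.
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)

print(clf.predict([[1, 1, 0, 0]]))  # predict for an unseen candidate boundary
```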
  • Uila Supported Apps
    Uila Supported Applications and Protocols, updated Oct 2020

    01net.com: 01net website, a French high-tech news site.
    050 plus: 050 plus is a Japanese embedded smartphone application dedicated to audio-conferencing.
    0zz0.com: 0zz0 is an online solution to store, send and share files.
    10050.net: China Railcom group web portal.
    10086.cn: This protocol plug-in classifies the http traffic to the host 10086.cn. It also classifies the ssl traffic to the Common Name 10086.cn.
    104.com: Web site dedicated to job research.
    1111.com.tw: Website dedicated to job research in Taiwan.
    114la.com: Chinese web portal operated by YLMF Computer Technology Co.
    115.com: Chinese cloud storing system of the 115 website. It is operated by YLMF Computer Technology Co.
    118114.cn: Chinese booking and reservation portal.
    11st.co.kr: Korean shopping website 11st. It is operated by SK Planet Co.
    1337x.org: Bittorrent tracker search engine.
    139mail: 139mail is a chinese webmail powered by China Mobile.
    15min.lt: Lithuanian news portal.
    163.com: Chinese web portal 163. It is operated by NetEase, a company which pioneered the development of Internet in China.
    17173.com: Website distributing Chinese games.
    17u.com: Chinese online travel booking website.
    20minutes: 20 minutes is a free, daily newspaper available in France, Spain and Switzerland. This plugin classifies websites.
    24h.com.vn: Vietnamese news portal.
    24ora.com: Aruban news portal.
    24sata.hr: Croatian news portal.
    24SevenOffice: 24SevenOffice is a web-based Enterprise resource planning (ERP) system.
    24ur.com: Slovenian news portal.
    2ch.net: Japanese adult videos web site.
    2Shared: 2shared is an online space for sharing and storage.
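Most of the entries above describe classification by HTTP host or TLS Common Name; a minimal sketch of that idea follows. The matching logic is an assumption for illustration and is not Uila's implementation; only the hostnames are taken from the list.

```python
# Toy classifier: map an observed HTTP Host / TLS SNI value to an application
# label, in the spirit of the host-based rules described above.
APP_SIGNATURES = {
    "10086.cn": "10086.cn (China Mobile portal)",
    "163.com": "163.com (NetEase portal)",
    "1337x.org": "1337x.org (BitTorrent tracker search)",
}

def classify(hostname: str) -> str:
    # Match the host itself or any of its parent domains.
    parts = hostname.lower().split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in APP_SIGNATURES:
            return APP_SIGNATURES[candidate]
    return "unclassified"

print(classify("www.163.com"))   # -> 163.com (NetEase portal)
```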
  • Automatic Correction of Real-Word Errors in Spanish Clinical Texts
    Sensors Article
    Automatic Correction of Real-Word Errors in Spanish Clinical Texts
    Daniel Bravo-Candel 1, Jésica López-Hernández 1, José Antonio García-Díaz 1, Fernando Molina-Molina 2 and Francisco García-Sánchez 1,*
    1 Department of Informatics and Systems, Faculty of Computer Science, Campus de Espinardo, University of Murcia, 30100 Murcia, Spain; [email protected] (D.B.-C.); [email protected] (J.L.-H.); [email protected] (J.A.G.-D.)
    2 VÓCALI Sistemas Inteligentes S.L., 30100 Murcia, Spain; [email protected]
    * Correspondence: [email protected]; Tel.: +34-86888-8107

    Abstract: Real-word errors are characterized by being actual terms in the dictionary. By providing context, real-word errors are detected. Traditional methods to detect and correct such errors are mostly based on counting the frequency of short word sequences in a corpus. Then, the probability of a word being a real-word error is computed. On the other hand, state-of-the-art approaches make use of deep learning models to learn context by extracting semantic features from text. In this work, a deep learning model was implemented for correcting real-word errors in clinical text. Specifically, a Seq2seq Neural Machine Translation Model mapped erroneous sentences to their corrected versions. For that, different types of error were generated in correct sentences by using rules. Different Seq2seq models were trained and evaluated on two corpora: the Wikicorpus and a collection of three clinical datasets. The medicine corpus was much smaller than the Wikicorpus due to privacy issues when dealing with patient information.
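The rule-based error-generation step described above, which produces (erroneous, correct) sentence pairs for training the Seq2seq corrector, can be sketched as follows. The confusion sets and the example sentence are invented Spanish-like toy data, not the authors' rules or corpora.

```python
# Sketch of rule-based real-word error generation for training a Seq2seq
# corrector: swap a word for another valid dictionary word it is often
# confused with. Confusion sets and sentences are toy examples only.
import random

CONFUSION_SETS = [
    {"vez", "ves"},        # real Spanish words that are easily confused
    {"haya", "halla"},
    {"tubo", "tuvo"},
]

def corrupt(sentence: str, rng: random.Random) -> str:
    """Return a copy of the sentence with one real-word error injected."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        for conf in CONFUSION_SETS:
            if tok in conf:
                tokens[i] = rng.choice(sorted(conf - {tok}))
                return " ".join(tokens)
    return sentence  # nothing to corrupt

rng = random.Random(0)
correct = "el paciente tuvo fiebre alta"
pair = (corrupt(correct, rng), correct)   # (erroneous source, correct target)
print(pair)  # one training pair for the Seq2seq model
```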
  • Armenophobia in Azerbaijan
    Dear reader,
    The Union of Young Scientists and Specialists of Artsakh (UYSSA) presents its project, the Artsakh E-Library website, where you can find and download for FREE scientific and research, cognitive and literary materials on Artsakh in Armenian, Russian and English languages. If re-using any material from our site you have first to get the UYSSA approval and specify the required data. We thank all the authors and publishers for permission to post electronic versions of their works on the site.
  • A Comparison of Knowledge Extraction Tools for the Semantic Web
    A Comparison of Knowledge Extraction Tools for the Semantic Web
    Aldo Gangemi 1,2
    1 LIPN, Université Paris 13 - CNRS - Sorbonne Cité, France
    2 STLab, ISTC-CNR, Rome, Italy

    Abstract. In the last years, basic NLP tasks: NER, WSD, relation extraction, etc. have been configured for Semantic Web tasks including ontology learning, linked data population, entity resolution, NL querying to linked data, etc. Some assessment of the state of art of existing Knowledge Extraction (KE) tools when applied to the Semantic Web is then desirable. In this paper we describe a landscape analysis of several tools, either conceived specifically for KE on the Semantic Web, or adaptable to it, or even acting as aggregators of extracted data from other tools. Our aim is to assess the currently available capabilities against a rich palette of ontology design constructs, focusing specifically on the actual semantic reusability of KE output.

    1 Introduction
    We present a landscape analysis of the current tools for Knowledge Extraction from text (KE), when applied on the Semantic Web (SW). Knowledge Extraction from text has become a key semantic technology, and has become key to the Semantic Web as well (see e.g. [31]). Indeed, interest in ontology learning is not new (see e.g. [23], which dates back to 2001, and [10]), and an advanced tool like Text2Onto [11] was set up already in 2005. However, interest in KE was initially limited in the SW community, which preferred to concentrate on manual design of ontologies as a seal of quality. Things started changing after the linked data bootstrapping provided by DBpedia [22], and the consequent need for substantial population of knowledge bases, schema induction from data, natural language access to structured data, and in general all applications that make joint exploitation of structured and unstructured content.
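Since the comparison above centres on the semantic reusability of knowledge-extraction output, a small example of what such output looks like once lifted to RDF may help. The entities, properties, and namespace are invented for illustration, and rdflib is used here only as a convenient serializer, not as one of the surveyed tools.

```python
# Toy example: representing one extracted fact as RDF so it can be reused
# on the Semantic Web. Names and namespace are illustrative only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/ke/")
g = Graph()
g.bind("ex", EX)

rome, italy = EX["Rome"], EX["Italy"]
g.add((rome, RDF.type, EX["City"]))       # entity typing from NER output
g.add((rome, EX["capitalOf"], italy))     # extracted relation
g.add((rome, EX["label"], Literal("Rome")))

print(g.serialize(format="turtle"))
```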
  • ACL 2019 Social Media Mining for Health Applications (#SMM4H)
    ACL 2019
    Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task
    Proceedings of the Fourth Workshop
    August 2, 2019, Florence, Italy

    © 2019 The Association for Computational Linguistics
    Order copies of this and other ACL proceedings from:
    Association for Computational Linguistics (ACL)
    209 N. Eighth Street, Stroudsburg, PA 18360, USA
    Tel: +1-570-476-8006, Fax: +1-570-476-0860
    [email protected]
    ISBN 978-1-950737-46-8

    Preface
    Welcome to the 4th Social Media Mining for Health Applications Workshop and Shared Task - #SMM4H 2019. The total number of users of social media continues to grow worldwide, resulting in the generation of vast amounts of data. Popular social networking sites such as Facebook, Twitter and Instagram dominate this sphere. According to estimates, 500 million tweets and 4.3 billion Facebook messages are posted every day. According to the latest Pew Research Report, nearly half of adults worldwide and two-thirds of all American adults (65%) use social networking. The report states that of the total users, 26% have discussed health information, and, of those, 30% changed behavior based on this information and 42% discussed current medical conditions. Advances in automated data processing, machine learning and NLP present the possibility of utilizing this massive data source for biomedical and public health applications, if researchers address the methodological challenges unique to this media. In its fourth iteration, the #SMM4H workshop takes place in Florence, Italy, on August 2, 2019, and is co-located with the
  • Welsh Language Technology Action Plan: Progress Report 2020
    Welsh language technology action plan: Progress report 2020

    Audience: All those interested in ensuring that the Welsh language thrives digitally.

    Overview: This report reviews progress with work packages of the Welsh Government's Welsh language technology action plan between its October 2018 publication and the end of 2020. The Welsh language technology action plan derives from the Welsh Government's strategy Cymraeg 2050: A million Welsh speakers (2017). Its aim is to plan technological developments to ensure that the Welsh language can be used in a wide variety of contexts, be that by using voice, keyboard or other means of human-computer interaction.

    Action required: For information.

    Further information: Enquiries about this document should be directed to: Welsh Language Division, Welsh Government, Cathays Park, Cardiff CF10 3NQ. e-mail: [email protected] @cymraeg Facebook/Cymraeg

    Additional copies: This document can be accessed from gov.wales

    Related documents: Prosperity for All: the national strategy (2017); Education in Wales: Our national mission, Action plan 2017–21 (2017); Cymraeg 2050: A million Welsh speakers (2017); Cymraeg 2050: A million Welsh speakers, Work programme 2017–21 (2017); Welsh language technology action plan (2018); Welsh-language Technology and Digital Media Action Plan (2013); Technology, Websites and Software: Welsh Language Considerations (Welsh Language Commissioner, 2016)

    Mae'r ddogfen yma hefyd ar gael yn Gymraeg. This document is also available in Welsh.
  • Full-Text Processing: Improving a Practical NLP System Based on Surface Information Within the Context
    Full-text processing: improving a practical NLP system based on surface information within the context
    Tetsuya Nasukawa
    IBM Research, Tokyo Research Laboratory
    1623-14 Shimotsuruma, Yamato-shi, Kanagawa-ken 242, Japan
    nasukawa@trl.vnet.ibm.com

    Abstract
    Rich information for resolving ambiguities in sentence analysis, including various context-dependent problems, can be obtained by analyzing a simple set of parsed trees of each sentence in a text without constructing a precise model of the context through deep semantic analysis. Thus, processing a group of sentences together makes it possible to improve the accuracy of a machine translation system.

    Without constructing a precise model of the context through deep semantic analysis, our framework refers to a set of parsed trees (results of syntactic analysis) of each sentence in the text as context information. Thus, our context model consists of parsed trees that are obtained by using an existing general syntactic parser. Except for information on the sequence of sentences, our framework does not consider any discourse structure such as the discourse segments, focus space stack, or dominant hierarchy (Grosz and Sidner, 1986). Therefore, our approaches to context processing
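Since the excerpt above describes using the parse trees of earlier sentences as a lightweight context model, a toy illustration may help. The trees, the co-occurrence heuristic, and the example sentence below are assumptions made for illustration; they are not the paper's actual algorithm or data.

```python
# Minimal illustration of using previously parsed sentences as context:
# head-dependent pairs seen earlier in the text vote on how to resolve an
# ambiguous attachment later on. Trees and the decision rule are toy stand-ins.
from collections import Counter

# Parse trees of earlier sentences, as (head, relation, dependent) triples.
context_trees = [
    [("saw", "nsubj", "woman"), ("saw", "obj", "comet"), ("saw", "obl", "telescope")],
    [("observed", "obj", "star"), ("observed", "obl", "telescope")],
]

pair_counts = Counter()
for tree in context_trees:
    for head, rel, dep in tree:
        pair_counts[(head, dep)] += 1

def prefer_attachment(word, candidate_heads):
    """Pick the head that co-occurred with the word most often in the context."""
    return max(candidate_heads, key=lambda h: pair_counts[(h, word)])

# "She saw the comet with the telescope": attach "telescope" to "saw" or "comet"?
print(prefer_attachment("telescope", ["saw", "comet"]))  # -> "saw"
```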
  • Learning to Read by Spelling Towards Unsupervised Text Recognition
    Learning to Read by Spelling: Towards Unsupervised Text Recognition
    Ankush Gupta, Andrea Vedaldi, Andrew Zisserman
    Visual Geometry Group, University of Oxford
    [email protected] [email protected] [email protected]

    Figure 1: Text recognition from unaligned data. We present a method for recognising text in images without using any labelled data. This is achieved by learning to align the statistics of the predicted text strings, against the statistics of valid text strings sampled from a corpus. The figure above visualises the transcriptions as various characters are learnt through the training iterations. The model first learns the concept of {space}, and hence, learns to segment the string into words; followed by common words like {to, it}, and only later learns to correctly map the less frequent characters like {v, w}. The last transcription also corresponds to the ground-truth (punctuations are not modelled). The colour bar on the right indicates the accuracy (darker means higher accuracy).

    ABSTRACT
    This work presents a method for visual text recognition without using any paired supervisory data. We formulate the text recognition task as one of aligning the conditional distribution of strings

    1 INTRODUCTION
    read (ri:d) verb • Look at and comprehend the meaning of (written or printed matter) by interpreting the characters or symbols of which it is composed.
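The core idea above, aligning the statistics of predicted strings with the statistics of valid text from a corpus, can be pictured with a much simpler stand-in. The paper trains this alignment adversarially; the sketch below only scores candidate transcriptions by how well their character-bigram statistics match a tiny corpus, and the corpus and candidate strings are invented for illustration.

```python
# Toy stand-in for the statistics-alignment idea: score candidate decodings
# by how closely their character-bigram statistics match a text corpus.
# The paper uses an adversarial objective; this divergence score is only an
# illustrative proxy, and the corpus/candidates are invented.
from collections import Counter
import math

corpus = "brought to view by dissection it was discovered that the heart"

def bigram_dist(text):
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}

ref = bigram_dist(corpus)

def mismatch(candidate):
    """Cross-entropy-like score: lower means statistics closer to the corpus."""
    dist = bigram_dist(candidate)
    return -sum(p * math.log(ref.get(bg, 1e-6)) for bg, p in dist.items())

candidates = ["trougfht to ferr oy disectins", "brought to view by dissection"]
print(sorted(candidates, key=mismatch))  # the valid-looking string scores lower
```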
  • Detecting Personal Life Events from Social Media
    Open Research Online: The Open University's repository of research publications and other research outputs

    Detecting Personal Life Events from Social Media
    Thesis
    How to cite: Dickinson, Thomas Kier (2019). Detecting Personal Life Events from Social Media. PhD thesis, The Open University.
    © 2018 The Author
    https://creativecommons.org/licenses/by-nc-nd/4.0/
    Version: Version of Record
    Link(s) to article on publisher's website: http://dx.doi.org/doi:10.21954/ou.ro.00010aa9
    Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright owners. For more information on Open Research Online's data policy on reuse of materials please consult the policies page.
    oro.open.ac.uk

    Detecting Personal Life Events from Social Media
    A thesis presented by Thomas K. Dickinson to The Department of Science, Technology, Engineering and Mathematics in partial fulfilment of the requirements for the degree of Doctor of Philosophy in the subject of Computer Science
    The Open University, Milton Keynes, England, May 2019
    Thesis advisor: Professor Harith Alani & Dr Paul Mulholland

    Abstract
    Social media has become a dominating force over the past 15 years, with the rise of sites such as Facebook, Instagram, and Twitter. Some of us have been with these sites since the start, posting all about our personal lives and building up a digital identity of ourselves. But within this myriad of posts, what actually matters to us, and what do our digital identities tell people about ourselves? One way that we can start to filter through this data is to build classifiers that can identify posts about our personal life events, allowing us to start to self-reflect on what we share online.
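As a heavily simplified illustration of the classifiers the thesis describes for spotting life-event posts, the sketch below trains a bag-of-words model on a few invented posts. The example texts, labels, and model choice are assumptions for illustration, not the thesis's datasets or methods.

```python
# Toy life-event post classifier: TF-IDF features + logistic regression.
# Posts and labels are invented; this is not the thesis's model or data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "We got married yesterday, best day of my life",
    "Just accepted a new job offer, starting next month",
    "Thinking about what to cook for dinner tonight",
    "This traffic is unbearable today",
]
labels = [1, 1, 0, 0]  # 1 = personal life event, 0 = not

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

print(model.predict(["We just welcomed our baby girl into the world"]))
```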
  • Spelling Correction: from Two-Level Morphology to Open Source
    Spelling Correction: from two-level morphology to open source
    Iñaki Alegria, Klara Ceberio, Nerea Ezeiza, Aitor Soroa, Gregorio Hernandez
    Ixa group, University of the Basque Country / Eleka S.L.
    649 P.K. 20080 Donostia. Basque Country.
    [email protected]

    Abstract
    Basque is a highly inflected and agglutinative language (Alegria et al., 1996). Two-level morphology has been applied successfully to this kind of languages and there are two-level based descriptions for very different languages. After doing the morphological description for a language, it is easy to develop a spelling checker/corrector for this language. However, what happens if we want to use the speller in the "free world" (OpenOffice, Mozilla, emacs, LaTeX, ...)? Ispell and similar tools (aspell, hunspell, myspell) are the usual mechanisms for these purposes, but they do not fit the two-level model. In the absence of two-level morphology based mechanisms, an automatic conversion from two-level description to hunspell is described in this paper, making it possible to build on previous work and reuse the morphological information.

    1. Introduction
    Two-level morphology (Koskenniemi, 1983; Beesley & Karttunen, 2003) has been applied successfully to the morphological description of highly inflected languages. There are two-level based descriptions for very different languages (English, German, Swedish, French, Spanish, Danish, Norwegian, Finnish, Basque, Russian, Turkish, Arab, Aymara, Swahili, etc.). After doing the morphological description, it is easy to develop a spelling checker/corrector for the language.

    2. Free software for spelling correction
    Unfortunately there are not open source tools for spelling correction with these features:
    • It is standardized in most of the applications (OpenOffice, Mozilla, emacs, LaTeX, ...).
    • It is based on the two-level morphology.
    The spell family of spell checkers (ispell, aspell, myspell) fits the first condition but not the second.
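To picture the conversion the paper describes, from a morphological lexicon to hunspell's .dic/.aff format, here is a toy sketch. The stems, suffixes, and flag letters are invented and vastly simpler than a real two-level description of Basque; only the output file format follows hunspell's conventions.

```python
# Toy illustration of emitting hunspell .dic/.aff files from a small lexicon
# with suffix classes, in the spirit of the conversion described above.
# Words, suffixes and flag letters are invented; real Basque morphology is
# far richer than this.
lexicon = {"etxe": "A", "mendi": "A"}          # stem -> affix flag
suffix_rules = {"A": ["a", "ak", "an"]}        # flag -> suffixes it licenses

with open("toy.aff", "w", encoding="utf-8") as aff:
    aff.write("SET UTF-8\n")
    for flag, suffixes in suffix_rules.items():
        aff.write(f"SFX {flag} Y {len(suffixes)}\n")
        for s in suffixes:
            # Format: SFX <flag> <strip> <add> <condition>
            aff.write(f"SFX {flag} 0 {s} .\n")

with open("toy.dic", "w", encoding="utf-8") as dic:
    dic.write(f"{len(lexicon)}\n")
    for stem, flag in lexicon.items():
        dic.write(f"{stem}/{flag}\n")
# hunspell -d toy would now accept inflected forms like "etxea" or "mendiak".
```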
  • Natural Language Processing
    Chowdhury, G. (2003) Natural language processing. Annual Review of Information Science and Technology, 37. pp. 51-89. ISSN 0066-4200
    http://eprints.cdlr.strath.ac.uk/2611/

    This is an author-produced version of a paper published in The Annual Review of Information Science and Technology (ISSN 0066-4200). This version has been peer-reviewed, but does not include the final publisher proof corrections, published layout, or pagination.

    Strathprints is designed to allow users to access the research output of the University of Strathclyde. Copyright © and Moral Rights for the papers on this site are retained by the individual authors and/or other copyright owners. Users may download and/or print one copy of any article(s) in Strathprints to facilitate their private study or for non-commercial research. You may not engage in further distribution of the material or use it for any profitmaking activities or any commercial gain. You may freely distribute the url (http://eprints.cdlr.strath.ac.uk) of the Strathprints website. Any correspondence concerning this service should be sent to The Strathprints Administrator: [email protected]

    Natural Language Processing
    Gobinda G. Chowdhury
    Dept. of Computer and Information Sciences, University of Strathclyde, Glasgow G1 1XH, UK
    e-mail: [email protected]

    Introduction
    Natural Language Processing (NLP) is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. NLP researchers aim to gather knowledge on how human beings understand and use language so that appropriate tools and techniques can be developed to make computer systems understand and manipulate natural languages to perform the desired tasks.