Developing Methods and Resources for Automated Processing of the African Language Igbo

Author: Ikechukwu E. Onyenwe
Supervisor: Dr. Mark R. Hepple

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

Department of Computer Science
Faculty of Engineering
April 2017

Declaration

I hereby declare that I am the sole author of this thesis. Except where specific reference is made to the work of others, the contents of this thesis are original and have not been submitted in whole or in part for consideration for any other degree or qualification in this, or any other, university. The contents are the outcome of research done under my supervisor. Parts of this thesis have appeared in the following publications:

1. Onyenwe, Ikechukwu E., Chinedu Uchechukwu, and Mark Hepple. Part-of-speech Tagset and Corpus Development for Igbo, an African Language. In Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop, pages 93-98, Dublin, Ireland, August 23-24 2014. 2015 Association for Computational Linguistics.
2. Onyenwe, Ikechukwu, Mark Hepple, Chinedu Uchechukwu, and Ignatius Ezeani. Use of Transformation-Based Learning in Annotation Pipeline of Igbo, an African Language. In Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, pages 24-33, Hissar, Bulgaria, September 10, 2015. 2015 Association for Computational Linguistics.
3. Onyenwe, Ikechukwu E. and Mark Hepple. Predicting Morphologically-Complex Unknown Words in Igbo. In Proceedings of the Nineteenth International Conference on Text, Speech, Dialogue - TSD 2016, Brno, Czech Republic, September 12-16, 2016. Published by Springer-Verlag in Lecture Notes in Artificial Intelligence (LNAI), Volume 9924 [1].
4. Onyenwe, Ikechukwu E. and Mark Hepple. Predicting Morphologically-Complex Unknown Words in Igbo. In Proceedings of the Community-based Building of Language Resources (CBBLR) workshop - TSD 2016, Brno, Czech Republic, September 12, 2016 [2].
5. Onyenwe, Ikechukwu, Mark Hepple, and Chinedu Uchechukwu. Improving Accuracy of Igbo Corpus Annotation Using Morphological Reconstruction and Transformation-Based Learning. JEP-TALN-RECITAL 2016, Workshop TALAf 2016: Traitement Automatique des Langues Africaines (TALAf 2016: African Language Processing), July 2016, Paris, France. Publisher: ATALA/AFCP, pages 1-10.
6. Onyenwe, Ikechukwu, Mark Hepple, and Ignatius Ezeani. Towards An Effective Igbo Part-of-Speech Tagger. Manuscript submitted for journal publication.
7. Onyenwe, Ikechukwu E., Mark Hepple, Chinedu Uchechukwu, and Ignatius Ezeani. A Basic Language Resource Kit Implementation for IgboNLP Project. Manuscript submitted for journal publication.

[1] The best papers which succeeded in both review processes (by the TSD 2016 Conference PC and the CBBLR Workshop 2016 PC) will be published in the TSD 2016 Springer and CBBLR Proceedings.
[2] See footnote 1.

The above jointly authored publications are primarily the work of the first author. The role of the co-authors was editorial and supervisory.

This copy has been supplied on the understanding that it is copyright material and that no quotation from the thesis may be published without proper acknowledgement.

Ikechukwu E. Onyenwe
April 2017

Dedication

To my family: my lovely wife Obiọma and my son Chimdịndu.

To these great ones: Onyeanusi Ikechukwu and Gladys N. Onyenwe, for being my wonderful parents. Dr. Mark Hepple, for being an extremely good mentor and supervisor to me. Prof. Boniface C.E. Egboka, for believing in me. Rev. Canon Prof. A.D. Nkamnebe, for overwhelming guidance and support.

In loving memory of my beloved sister Lily. Forever in my heart, my beautiful sister with a beautiful heart.
Acknowledgements

Firstly, I would like to express my exceptional appreciation and sincere thanks to my supervisor, Dr. Mark Hepple (a Reader in Computer Science), who has been an immense mentor to me. I would like to thank him for supporting my research and enabling me to grow as a research scientist. His patience, motivation, immense knowledge and guidance on both my research and my career have been invaluable. His guidance and hard questions helped me to broaden my research from various standpoints throughout the study and the writing of this PhD thesis. I could not have envisaged having a greater supervisor and mentor for my PhD study. He is the true definition of a mentor.

I would like to thank the rest of my panel committee members, my chair, Dr. John Barker, and my advisor, Dr. Mark Stevenson, for their insightful comments and suggestions. My sincere thanks also go to Dr. Uchechukwu Chinedu, a senior linguist who provided me with some Igbo linguistic materials and ideas. His collaboration throughout this study was very helpful. The administrative teams (com-X groups) of Computer Science, University of Sheffield, are gratefully appreciated for their administrative support throughout this PhD study.

Many thanks to Nnamdi Azikiwe University and the Tertiary Education Trust Fund (TETFund) Nigeria for the funding. Special thanks to the Vice Chancellor, Prof. Joe Ahaneku, and his management team for their support.

I sincerely appreciate all who have positively impacted my life, especially these great ones: Mr. Robin and Mrs. Joan Story, The Rt. Rev Prof. 'kelue Okoye, HRH Sir Dr. Harry Obi-Nwosu, Prof. S.O. Anigbogu, and Pastor (Dr.) and Mrs. Sam Okerenta.

I thank my colleagues at the IgboNLP project and the Natural Language Processing Group of the Computer Science Department, Faculty of Engineering, The University of Sheffield, United Kingdom, for the sleepless nights we worked together meeting deadlines for panel meetings, research retreats, conferences, etc., and for all of the fun we had in the last four years. Special thanks to Mark Tice, Ignatius Ezeani, Olusayo Obajemu, Samuel Nwagbo, and Joshua Gbenga Adeyemi for their friendship and support beyond earthly norm. I also thank my wonderful colleagues and friends within and outside Nnamdi Azikiwe University in Nigeria, and those outside Nigeria, for their calls and messages. You are one of the major reasons why I smile.

A special thanks to my family; words cannot express how grateful I am to my wife (seed of beauty) and son, parents, in-laws, and siblings for all of their prayers on my behalf.

To God be all the glory; great things He has done. Amen!

Ikechukwu E. Onyenwe, April 2017.

Abstract

Natural Language Processing (NLP) research is still in its infancy in Africa. Most languages in Africa have few or no NLP resources available, and Igbo is among those with none. In this study, we develop NLP resources to support NLP-based research in the Igbo language. The springboard is the development of a new part-of-speech (POS) tagset for Igbo (IgbTS), based on a slight adaptation of the EAGLES guideline to accommodate language-internal features not recognized in EAGLES. The tagset has three granularities: fine-grain (85 tags), medium-grain (70 tags) and coarse-grain (15 tags). The medium-grained tagset strikes a balance between the other two for practical purposes. Following this is the preprocessing of Igbo electronic texts through normalization and tokenization.
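
As an illustration only, the normalization and tokenization step can be sketched as below. The sketch assumes that normalization means Unicode NFC composition (so that dotted Igbo letters such as ọ and ị have a single encoding) plus whitespace cleanup, and it uses a generic word-or-punctuation pattern as a stand-in for the tagset's definition of a word token. The names `normalize`, `tokenize` and `_TOKEN` are hypothetical and are not taken from the thesis.

```python
import re
import unicodedata
from typing import List

# Assumed token pattern: a run of word characters (plus combining marks such
# as tone diacritics), optionally hyphen-joined, or one punctuation character.
_TOKEN = re.compile(r"[\w\u0300-\u036f]+(?:-[\w\u0300-\u036f]+)*|[^\w\s]")


def normalize(text: str) -> str:
    """Compose characters to NFC and collapse runs of whitespace."""
    composed = unicodedata.normalize("NFC", text)
    return " ".join(composed.split())


def tokenize(text: str) -> List[str]:
    """Normalize the text, then split it into word and punctuation tokens."""
    return _TOKEN.findall(normalize(text))


if __name__ == "__main__":
    # "o" + COMBINING DOT BELOW is composed into the single letter U+1ECD.
    print(tokenize("Obio\u0323ma  bi\u0323a."))
```

Composing to NFC first matters because the same dotted vowel can arrive either precomposed or as a base letter plus a combining mark, and a corpus that mixes the two would count one word form as two.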
The tokenizer was developed in this study using the tagset's definition of a word token, and the outcome is an Igbo corpus (IgbC) of about one million tokens. The IgbTS was applied to a part of the IgbC to produce the first Igbo tagged corpus (IgbTC). To investigate the effectiveness, validity and reproducibility of the IgbTS, an inter-annotator agreement (IAA) exercise was undertaken, which led to revision of the IgbTS where necessary. A novel automatic method was developed to bootstrap a manual annotation process by exploiting the by-products of this IAA exercise, in order to improve the IgbTC. To further improve the quality of the IgbTC, a committee-of-taggers approach was adopted to propose erroneous instances in the IgbTC for correction. A novel automatic method was also developed and used that applies knowledge of affixes to flag and correct all morphologically-inflected words in the IgbTC whose tags wrongly mark them as not morphologically inflected.

Experiments towards the development of an automatic POS tagging system for Igbo using the IgbTC show good accuracy scores, comparable to those for other languages on which these taggers have been tested, such as English. Accuracy on words not seen during the taggers' training (unknown words) is considerably lower, and lower still on unknown words that are morphologically complex, which indicates difficulty in handling morphologically-complex words in Igbo. This was improved by adopting a morphological reconstruction method (a linguistically-informed segmentation into stems and affixes) that reformats these morphologically-complex words into patterns learnable by machines. This enables taggers to use knowledge of the stems and associated affixes of these words during tagging to predict their appropriate tags.
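
The idea of segmenting a morphologically-complex word into a stem and its affixes can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's method: it handles suffixes only, it takes the stem lexicon and the suffix inventory as inputs, and the strings in the demo are toy ASCII placeholders rather than vetted Igbo forms. The names `reconstruct` and `segment_suffixes` are hypothetical.

```python
from typing import List, Optional, Sequence, Set


def segment_suffixes(tail: str, suffixes: Sequence[str]) -> Optional[List[str]]:
    """Return a list of suffixes that concatenate to `tail`, or None."""
    if not tail:
        return []
    for suffix in suffixes:
        if tail.startswith(suffix):
            rest = segment_suffixes(tail[len(suffix):], suffixes)
            if rest is not None:
                return [suffix] + rest
    return None


def reconstruct(word: str, stems: Set[str], suffixes: Sequence[str]) -> List[str]:
    """Split `word` into a known stem plus suffixes, else return [word]."""
    ordered = sorted(suffixes, key=len, reverse=True)  # try longer suffixes first
    # Try the longest candidate stem first; the tail must be non-empty.
    for cut in range(len(word) - 1, 0, -1):
        stem, tail = word[:cut], word[cut:]
        if stem in stems:
            parts = segment_suffixes(tail, ordered)
            if parts:
                return [stem] + parts
    return [word]


if __name__ == "__main__":
    toy_stems = {"bia", "ri"}            # placeholder stem lexicon
    toy_suffixes = ["ghi", "kwa", "la"]  # placeholder suffix inventory
    print(reconstruct("biaghikwa", toy_stems, toy_suffixes))
```

A tagger can then be given the stem and the suffix sequence as separate features, so that an unseen word form still shares evidence with seen forms built from the same parts.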
Interestingly, this method outperforms the other methods that existing taggers use to handle unknown words, and it achieves an impressive increase in accuracy on morphologically-inflected unknown words and on unknown words overall. These developments constitute the first NLP toolkit for the Igbo language and a step towards achieving the objective of Basic Language Resources Kits (BLARK)
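
For readers who want to reproduce this kind of comparison, the split between known-word and unknown-word accuracy can be computed as in the sketch below. It assumes that a word counts as unknown when it does not occur in the training vocabulary; the function name `accuracy_by_vocab` and the tags `X` and `Y` are hypothetical placeholders, not IgbTS tags or thesis results.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple


def accuracy_by_vocab(
    gold: Sequence[Tuple[str, str]],
    predicted: Sequence[str],
    train_vocab: Set[str],
) -> Dict[str, Optional[float]]:
    """Tagging accuracy overall, on known words, and on unknown words."""
    counts: Dict[str, List[int]] = {
        "overall": [0, 0],  # [correct, total]
        "known": [0, 0],
        "unknown": [0, 0],
    }
    for (word, gold_tag), pred_tag in zip(gold, predicted):
        bucket = "known" if word in train_vocab else "unknown"
        for key in ("overall", bucket):
            counts[key][1] += 1
            counts[key][0] += int(gold_tag == pred_tag)
    # A bucket with no tokens has no defined accuracy.
    return {k: (c / t if t else None) for k, (c, t) in counts.items()}


if __name__ == "__main__":
    gold = [("a", "X"), ("b", "Y"), ("c", "X"), ("d", "Y")]
    predicted = ["X", "Y", "Y", "Y"]
    print(accuracy_by_vocab(gold, predicted, train_vocab={"a", "b"}))
```

Reporting the three figures separately keeps a high overall score from hiding weak performance on unseen word forms.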