Words in a Text

Total Page:16

File Type:pdf, Size:1020Kb

Words in a Text Cross-lingual geo-parsing for non-structured data Judith Gelernter Wei Zhang Language Technologies Institute Language Technologies Institute School of Computer Science School of Computer Science Carnegie Mellon University Carnegie Mellon University [email protected] [email protected] ABSTRACT to cross-language geo-information retrieval. We will examine A geo-parser automatically identifies location words in a text. We Strötgen’s view that normalized location information is language- have generated a geo-parser specifically to find locations in independent [16]. Difficulties in cross-language experiments are unstructured Spanish text. Our novel geo-parser architecture exacerbated when the languages use different alphabets, and when combines the results of four parsers: a lexico-semantic Named the focus is on proper names, as in this case, the names of Location Parser, a rules-based building parser, a rules-based street locations. parser, and a trained Named Entity Parser. Each parser has Geo-parsing structured versus unstructured text requires different different strengths: the Named Location Parser is strong in recall, language processing tools. Twitter messages are challenging to and the Named Entity Parser is strong in precision, and building geo-parse because of their non-grammaticality. To handle non- and street parser finds buildings and streets that the others are not grammatical forms, we use a Twitter tokenizer rather than a word designed to do. To test our Spanish geo-parser performance, we tokenizer. We use an English part of speech tagger created for compared the output of Spanish text through our Spanish geo- tweets, which was not available to us in Spanish. Our machine parser, with that same Spanish text translated into English and run learning models for recognizing location words acknowledge through our English geo-parser. The results were that the Spanish Twitter-specific features such as hash tags (index terms specific geo-parser identified toponyms with an F1 of .796, and the to Twitter), website addresses and names in Twitter @mentions. English geo-parser identified toponyms with an F1 of .861 (and Both our English and Spanish models were trained on tweets. this is despite errors introduced by translation from Spanish to English), compared to an F1 of .114 from a commercial off-the- Our focus is social media for crisis response due to the importance shelf Spanish geo-parser. Results suggest (1) geo-parsers should of location in message text. The use of social media for disaster be built specifically for unstructured text, as have our Spanish and relief had been established by the time of the Haiti earthquake in English geo-parsers, and (2) location entities in Spanish that have January 2010, when posts to Facebook, Flickr, YouTube and been machine translated to English are robust to geo-parsing in Twitter were numerous [5]. Research revealed that those English. messages sent during the Haiti earthquake that were considered “actionable” and more useful than the others were generally the Categories and Subject Descriptors ones that included locations [10]. Moreover, Twitterers themselves are more likely to pass along, or re-tweet, messages H.3.1 [Content Analysis and Indexing]: Linguistic processing that include geo-location and situational updates [17], indicating General Terms that Twitterers find such messages important. Our data for this Algorithms, Design research are tweets pertaining to a 2010 earthquake in Chile. The basis of our experiment is to send a test set of Spanish Keywords Chilean earthquake messages through our Spanish geo-parser, a geo-parse, location, translation, Spanish, Twitter, microtext, geo- commercial Spanish geo-parser, and in English translation reference, cross-language geographic information retrieval (CL- through our own English geo-parser. The output of the three GIR) algorithms were scored in comparison to manual geo-tagging annotation of the Spanish tweets by fluent Spanish speakers. 1. INTRODUCTION Translating a document into a language plentiful in language Our English translations of the Spanish tweets come from Google processing tools and location resources, such as English, amounts Translate (see Table 1 for example).1 This tool finds statistically- recurring patterns and selects out of hundreds of millions of human-translated documents what seems to be the best translation for unseen word groupings. Given the vast amount of high quality Permission to make digital or hard copies of all or part of this work for Spanish-English translations, the Google Translate output is also personal or classroom use is granted without fee provided that copies are of high quality. not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for Our research questions are: components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to How does our Spanish geo-parser perform on identifying post on servers or to redistribute to lists, requires prior specific permission toponyms in Spanish tweets in comparison to our English geo- and/or a fee. Request permissions from [email protected]. GIR'13, November 05 2013, Orlando, FL, USA Copyright is held by the owner/author(s). Publication rights licensed to ACM. Copyright 2013 ACM 978-1-4503-2241-6/13/11$15.00 1 http://dx.doi.org/10.1145/2533888.2533943 http://translate.google.com/ and according to Google Translate “About”, http://translate.google.com/about/ 2.2 What to geo-parse? Table 1. Sample tweet on the February earthquake Locations found by geo-parsers are usually toponyms, or political in Chile, 2010 (with locations in bold) and administrative place names at the resolution of city or higher Spanish Info geologica Terremoto Chile: …construidos sobre la parte in the spatial hierarchy. Some have mined for telephone number tweet plana de concepcion, dundadas sobre suelo lando (arenas del and postal codes for additional location precision [1], but neither bio bio y relleno. telephone nor postal information appear in tweets with any Google- Chile earthquake geological info: … built on the flat of frequency. We parse for streets and buildings as well as translated conception, on soil dundadas lando (bio bio sands and fillers) toponyms because that is what our preliminary inquiry into tweet locations in tweet text uncovered [7]. The OpenCalais by Thomson Reuters parses for facility, which we call building.5 parser finding toponyms on those tweets Google-translated into 2.3 Geo-parser structure English? Multi-lingual text mining systems have been developed in one language and ported to another, as for example Gamon et al. [4]. How does our Spanish geo-parser for unstructured text compare For cross-lingual applications such as cross-lingual topic tracking to a commercial geo-parser for Spanish?2 or linking related documents across languages, the most common approach is to use Machine Translation or bilingual dictionaries to Our contributions include (1) a Spanish geo-parser that is tuned to translate foreign languages into one language, typically English informal, unstructured text,3 and (2) a prediction based on our [15]. We will try this approach for geo-parsing tweets. experimental results about the relative utility of geo-parsing within a language compared with geo-parsing across languages 2.4 Misspelling which use the same alphabet. The Levenshtein algorithm that identifies misspellings was proposed in the 1960s, and the original paper is in Russian.6 The 2. LITERATURE REVIEW distance between two spellings has been called the “edit distance” 2.1 Geo-parsing in that it is the number of edits needed to transform one string into Geo-parsing is a subset of named entity recognition, which another string. Edits are insertion, deletion or substitution. Other identifies entities in text and classifies them into categories such methods of identifying mis-spelled words have been based on as agent, organization and location. Our literature review focuses web-wide data with many more words spelled correctly than mis- therefore on language processing rather than geographic spelled [18]. This, along with the edit distance of Levenshtein, is information retrieval. the basis of the Norvig algorithm that uses probability theory to find the correct spelling of a word [11.]. Misspelling sub-routines Geo-parsing is performed automatically based on language rules. are not standard in geo-parsers, so that locations spelled A model based on language rules is then applied to unseen incorrectly or in non-standard form are often missed. instances to find location words in unseen text [2]. The text of a tweet is not only unstructured, it is also brief. If not supplemented 3. GAZETTEER STRUCTURE with metadata, it is potentially difficult to parse due to lack of This section describes the structure of our gazetteer as it is part of context. It has been found that a data set too small or with both the Named Location Parser and the Named Entity insufficient context for the location word as in the geo-parsing of Recognizer parser. Gazetteer structure is essential because of its word or phrase search queries is liable to yield low recall and immense multi-GB size. Without accommodating the data precision results [8]. structure, gazetteer lookup would make processing time Our Spanish geo-parser was created specifically for this research. unacceptable. Our English geo-parser was created earlier [6], but it has 3.1 Gazetteer as trie and inverted index continued to be developed in concert with our Spanish geo-parser We use the structure of a trie tree (from retrieval) because it is for the benefit of this research. Other geo-parsers include those by flexible for information look-up. Searching character by character MetaCarta, Yahoo! Placemaker, GeoGravy, NetOwl, and open allows us to match word approximations that are misspellings, and source systems such as CLAVIN, TextGrounder, Geodict, adjectival locations that are similar in stem but differ in ending GeoDoc, Geotxt, Europeana, OpenSextant and DIGMAP, as well from the gazetteer term.
Recommended publications
  • Geographical Names and Sustainable Tourism
    No. 59 NOVEMBERNo. 59 NOVEMBER 2020 2020 Geographical Names and Sustainable Tourism Socio- Institutional cultural Sustainable Tourism Economic Environmental Table of Contents The Information Bulletin of the United Nations MESSAGE FROM THE CHAIRPERSON ............................................... 3 Group of Experts on Geographical Names (formerly Reconsidérer notre mobilité ......................................................... 3 UNGEGN Newsletter) is issued twice a year by the Secretariat of the Group of Experts. The Secretariat Reconsider our mobility ............................................................... 4 is served by the Statistics Division (UNSD), MESSAGE FROM THE SECRETARIAT ................................................. 5 Department for Economic and Social Affairs (DESA), Secretariat of the United Nations. Contributions “Geographical names and sustainable tourism ............................ 5 and reports received from the Experts of the Group, IN MEMORIAM ................................................................................ 7 its Linguistic/Geographical Divisions and its Working Groups are reviewed and edited jointly by the Danutė Janė Mardosienė (1947-2020) ........................................ 7 Secretariat and the UNGEGN Working Group on SPECIAL FEATURE: GEOGRAPHICAL NAMES AND SUSTAINABLE Publicity and Funding. Contributions for the TOURISM ......................................................................................... 9 Information Bulletin can only be considered when they are made
    [Show full text]
  • Exonyms – Standards Or from the Secretariat  Message from the Secretariat 4
    NO. 50 JUNE 2016 In this issue Preface Message from the Chairperson 3 Exonyms – standards or From the Secretariat Message from the Secretariat 4 Special Feature – Exonyms – standards standardization? or standardization? What are the benefits of discerning 5-6 between endonym and exonym and what does this divide mean Use of Exonyms in National 6-7 Exonyms/Endonyms Standardization of Geographical Names in Ukraine Dealing with Exonyms in Croatia 8-9 History of Exonyms in Madagascar 9-11 Are there endonyms, exonyms or both? 12-15 The need for standardization Exonyms, Standards and 15-18 Standardization: New Directions Practice of Exonyms use in Egypt 19-24 Dealing with Exonyms in Slovenia 25-29 Exonyms Used for Country Names in the 29 Repubic of Korea Botswana – Exonyms – standards or 30 standardization? From the Divisions East Central and South-East Europe 32 Division Portuguese-speaking Division 33 From the Working Groups WG on Exonyms 31 WG on Evaluation and Implementation 34 From the Countries Burkina Faso 34-37 Brazil 38 Canada 38-42 Republic of Korea 42 Indonesia 43 Islamic Republic of Iran 44 Saudi Arabia 45-46 Sri Lanka 46-48 State of Palestine 48-50 Training and Eucation International Consortium of Universities 51 for Training in Geographical Names established Upcoming Meetings 52 UNGEGN Information Bulletin No. 50 June 2106 Page 1 UNGEGN Information Bulletin The Information Bulletin of the United Nations Group of Experts on Geographical Names (formerly UNGEGN Newsletter) is issued twice a year by the Secretariat of the Group of Experts. The Secretariat is served by the Statistics Division (UNSD), Department for Economic and Social Affairs (DESA), Secretariat of the United Nations.
    [Show full text]
  • Latvia Toponymic Factfile
    TOPONYMIC FACT FILE Latvia Country name Latvia State title Republic of Latvia Name of citizen Latvian Official language Latvian (lv) Country name in official language Latvija State title in official language Latvijas Republika Script Roman n/a. Latvian uses the Roman alphabet with three Romanization System diacritics (see page 3). ISO-3166 country code (alpha-2/alpha-3) LV / LVA Capital (English conventional) Riga1 Capital in official language Rīga Population 1.88 million2 Introduction Latvia is the central of the three Baltic States3 in north-eastern Europe on the eastern shore of the Baltic Sea. It has existed as an independent state c.1918 to 1940 and again since 1990. In size it is similar to Sri Lanka or Sierra Leone. Latvia is approximately 1% smaller than neighbouring Lithuania, but has only two-thirds the population, estimated at 1.88 million in 20202. The population has been falling steadily since a high of 2,660,000 in 1989 source: Eurostat). Geographical names policy Latvian is written in Roman script. PCGN recommends using place names as found on official Latvian-language sources, retaining all diacritical marks. Latvian generic terms frequently appear with lower-case initial letters, and PCGN recommends reflecting this style. Allocation and recording of geographical names in Latvia are the responsibility of the Latvia Geospatial Information Agency (Latvian: Latvijas Ģeotelpiskās informācijas aģentūra – LGIA) which is part of the Ministry of Defence (Aizsardzības ministrija). The geographical names database on the LGIA website: http://map.lgia.gov.lv/index.php?lang=2&cPath=3&txt_id=24 is a useful official source for names.
    [Show full text]
  • Spell Checker
    International Journal of Scientific and Research Publications, Volume 5, Issue 4, April 2015 1 ISSN 2250-3153 SPELL CHECKER Vibhakti V. Bhaire, Ashiki A. Jadhav, Pradnya A. Pashte, Mr. Magdum P.G ComputerEngineering, Rajendra Mane College of Engineering and Technology Abstract- Spell Checker project adds spell checking and For handling morphology language-dependent algorithm is an correction functionality to the windows based application by additional step. The spell-checker will need to consider different using autosuggestion technique. It helps the user to reduce typing forms of the same word, such as verbal forms, contractions, work, by identifying any spelling errors and making it easier to possessives, and plurals, even for a lightly inflected language like repeat searches .The main goal of the spell checker is to provide English. For many other languages, such as those featuring unified treatment of various spell correction. Firstly the spell agglutination and more complex declension and conjugation, this checking and correcting problem will be formally describe in part of the process is more complicated. order to provide a better understanding of these task. Spell checker and corrector is either stand-alone application capable of A spell checker carries out the following processes: processing a string of words or a text or as an embedded tool which is part of a larger application such as a word processor. • It scans the text and selects the words contained in it. Various search and replace algorithms are adopted to fit into the • domain of spell checker. Spell checking identifies the words that It then compares each word with a known list of are valid in the language as well as misspelled words in the correctly spelled words (i.e.
    [Show full text]
  • Finite State Recognizer and String Similarity Based Spelling
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by BRAC University Institutional Repository FINITE STATE RECOGNIZER AND STRING SIMILARITY BASED SPELLING CHECKER FOR BANGLA Submitted to A Thesis The Department of Computer Science and Engineering of BRAC University by Munshi Asadullah In Partial Fulfillment of the Requirements for the Degree of Bachelor of Science in Computer Science and Engineering Fall 2007 BRAC University, Dhaka, Bangladesh 1 DECLARATION I hereby declare that this thesis is based on the results found by me. Materials of work found by other researcher are mentioned by reference. This thesis, neither in whole nor in part, has been previously submitted for any degree. Signature of Supervisor Signature of Author 2 ACKNOWLEDGEMENTS Special thanks to my supervisor Mumit Khan without whom this work would have been very difficult. Thanks to Zahurul Islam for providing all the support that was required for this work. Also special thanks to the members of CRBLP at BRAC University, who has managed to take out some time from their busy schedule to support, help and give feedback on the implementation of this work. 3 Abstract A crucial figure of merit for a spelling checker is not just whether it can detect misspelled words, but also in how it ranks the suggestions for the word. Spelling checker algorithms using edit distance methods tend to produce a large number of possibilities for misspelled words. We propose an alternative approach to checking the spelling of Bangla text that uses a finite state automaton (FSA) to probabilistically create the suggestion list for a misspelled word.
    [Show full text]
  • Spell Checking in Computer-Assisted Language Learning: a Study of Misspellings by Nonnative Writers of German
    SPELL CHECKING IN COMPUTER-ASSISTED LANGUAGE LEARNING: A STUDY OF MISSPELLINGS BY NONNATIVE WRITERS OF GERMAN Anne Rimrott Diplom, Justus Liebig Universitat, 2002 THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS In the Department of Linguistics O Anne Rimrott 2005 SIMON FRASER UNIVERSITY Spring 2005 All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without permission of the author. APPROVAL Name: Anne Rimrott Degree: Master of Arts Title of Thesis: Spell Checking in Computer-Assisted Language Learning: A Study of Misspellings by Nonnative Writers of German Examining Committee: Chair: Dr. Alexei Kochetov Assistant Professor, Department of Linguistics Dr. Trude Heift Senior Supervisor Associate Professor, Department of Linguistics Dr. Chung-hye Han Supervisor Assistant Professor, Department of Linguistics Dr. Maria Teresa Taboada Supervisor Assistant Professor, Department of Linguistics Dr. Mathias Schulze External Examiner Assistant Professor, Department of Germanic and Slavic Studies University of Waterloo Date DefendedIApproved: SIMON FRASER UNIVERSITY PARTIAL COPYRIGHT LICENCE The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection.
    [Show full text]
  • Exploiting Wikipedia Semantics for Computing Word Associations
    Exploiting Wikipedia Semantics for Computing Word Associations by Shahida Jabeen A thesis submitted to the Victoria University of Wellington in fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science. Victoria University of Wellington 2014 Abstract Semantic association computation is the process of automatically quan- tifying the strength of a semantic connection between two textual units based on various lexical and semantic relations such as hyponymy (car and vehicle) and functional associations (bank and manager). Humans have can infer implicit relationships between two textual units based on their knowledge about the world and their ability to reason about that knowledge. Automatically imitating this behavior is limited by restricted knowledge and poor ability to infer hidden relations. Various factors affect the performance of automated approaches to com- puting semantic association strength. One critical factor is the selection of a suitable knowledge source for extracting knowledge about the im- plicit semantic relations. In the past few years, semantic association com- putation approaches have started to exploit web-originated resources as substitutes for conventional lexical semantic resources such as thesauri, machine readable dictionaries and lexical databases. These conventional knowledge sources suffer from limitations such as coverage issues, high construction and maintenance costs and limited availability. To overcome these issues one solution is to use the wisdom of crowds in the form of collaboratively constructed knowledge sources. An excellent example of such knowledge sources is Wikipedia which stores detailed information not only about the concepts themselves but also about various aspects of the relations among concepts. The overall goal of this thesis is to demonstrate that using Wikipedia for computing word association strength yields better estimates of hu- mans’ associations than the approaches based on other structured and un- structured knowledge sources.
    [Show full text]
  • Automatic Spelling Correction Based on N-Gram Model
    International Journal of Computer Applications (0975 – 8887) Volume 182 – No. 11, August 2018 Automatic Spelling Correction based on n-Gram Model S. M. El Atawy A. Abd ElGhany Dept. of Computer Science Dept. of Computer Faculty of Specific Education Damietta University Damietta University, Egypt Egypt ABSTRACT correction. In this paper, we are designed, implemented and A spell checker is a basic requirement for any language to be evaluated an end-to-end system that performs spellchecking digitized. It is a software that detects and corrects errors in a and auto correction. particular language. This paper proposes a model to spell error This paper is organized as follows: Section 2 illustrates types detection and auto-correction that is based on n-gram of spelling error and some of related works. Section 3 technique and it is applied in error detection and correction in explains the system description. Section 4 presents the results English as a global language. The proposed model provides and evaluation. Finally, Section 5 includes conclusion and correction suggestions by selecting the most suitable future work. suggestions from a list of corrective suggestions based on lexical resources and n-gram statistics. It depends on a 2. RELATED WORKS lexicon of Microsoft words. The evaluation of the proposed Spell checking techniques have been substantially, such as model uses English standard datasets of misspelled words. error detection & correction. Two commonly approaches for Error detection, automatic error correction, and replacement error detection are dictionary lookup and n-gram analysis. are the main features of the proposed model. The results of the Most spell checkers methods described in the literature, use experiment reached approximately 93% of accuracy and acted dictionaries as a list of correct spellings that help algorithms similarly to Microsoft Word as well as outperformed both of to find target words [16].
    [Show full text]
  • Hungarian Geographical Bulletin 68 (2019) (1) 93–101
    DOI: 10.15201/hungeobull.68.1.7Book review section – HungarianHungarian Geographical Geographical Bulletin Bulletin 68 (2019) 68 2019 (1) (1) 93–101. 93 BOOK REVIEW SECTION Kocsis, K., Gercsák, G., Horváth, G., Keresztesi, Z. and Nemerkényi, Z. (eds.): National Atlas of Hungary Vol. 2: Natural Environment. Budapest, Hungarian Academy of Sciences, Research Centre for Astronomy and Earth Sciences, Geographical Institute. 2018. 183 p. The current volume is Part 2 of the new “National advisory committee, and István Klinghammer, an Atlas of Hungary”, and the first volume that has been outstanding representative of Hungarian cartogra- published out of the four volumes, for Part 1 (The phy, as honorary president of the editorial board. Hungarian State and Its Place in the World) will arrive The current four volume project constitutes the later. Part 3 (Society) comes next, very likely in 2019, third generation of national atlases in Hungary af- and Part 4 (Economy) is also to be published soon, ter the first such work in 1967, edited by Sándor before Part 1. A digital version will accompany the Radó, and the second atlas in 1989, which was ed- hardcopy of each volume, which will be distributed ited by Márton Pécsi and was later complemented in two editions, Hungarian and English. by additional maps. Actually, one can also consider The list of editors and authors reads like a ‘Who is it the fourth generation if counting András Rónai’s who?’ in Hungarian geography and cartography. At “Atlas of Central-Europe” from 1945 as a national the top is Károly Kocsis, director of the Geographical atlas.
    [Show full text]
  • Development of a Persian Syntactic Dependency Treebank
    Development of a Persian Syntactic Dependency Treebank Mohammad Sadegh Rasooli Manouchehr Kouhestani Amirsaeid Moloodi Department of Computer Science Department of Linguistics Department of Linguistics Columbia University Tarbiat Modares University University of Tehran New York, NY Tehran, Iran Tehran, Iran [email protected] [email protected] [email protected] Abstract tions in tasks such as machine translation. Depen- dency treebanks are collections of sentences with This paper describes the annotation process their corresponding dependency trees. In the last and linguistic properties of the Persian syn- decade, many dependency treebanks have been de- tactic dependency treebank. The treebank veloped for a large number of languages. There are consists of approximately 30,000 sentences at least 29 languages for which at least one depen- annotated with syntactic roles in addition to morpho-syntactic features. One of the unique dency treebank is available (Zeman et al., 2012). features of this treebank is that there are al- Dependency trees are much more similar to the hu- most 4800 distinct verb lemmas in its sen- man understanding of language and can easily rep- tences making it a valuable resource for ed- resent the free word-order nature of syntactic roles ucational goals. The treebank is constructed in sentences (Kubler¨ et al., 2009). with a bootstrapping approach by means of available tagging and parsing tools and man- Persian is a language with about 110 million ually correcting the annotations. The data is speakers all over the world (Windfuhr, 2009), yet in splitted into standard train, development and terms of the availability of teaching materials and test set in the CoNLL dependency format and annotated data for text processing, it is undoubt- is freely available to researchers.
    [Show full text]
  • Study on Spell-Checking System Using Levenshtein Distance Algorithm
    International Journal of Recent Development in Engineering and Technology Website: www.ijrdet.com (ISSN 2347-6435(Online) Volume 8, Issue 9, September 2019) Study on Spell-Checking System using Levenshtein Distance Algorithm Thi Thi Soe1, Zarni Sann2 1Faculty of Computer Science 2Faculty of Computer Systems and Technologies 1,2FUniversity of Computer Studies (Mandalay), Myanmar Abstract— Natural Language Processing (NLP) is one of If the word is not found, it is considered to be an error, the most important research area carried out in the world of and an attempt may be made to suggest that word. When a Artificial Intelligence (AI). NLP supports the tasks of a word which is not within the dictionary is encountered, portion of AI such as spell checking, machine translation, most spell checkers provide an option to add that word to a automatic text summarization, and information extraction, so list of known exceptions that should not be flagged. An on. Spell checking application presents valid suggestions to the user based on each mistake they encounter in the user’s adaptive spelling checker tool based on Ternary Search document. The user then either makes a selection from a list Tree data structure is described in [1]. A comprehensive of suggestions or accepts the current word as valid. Spell- spelling checker application presented a significant checking program is often integrated with word processing challenge in producing suggestions for a misspelled word that checks for the correct spelling of words in a document. when employing the traditional methods in [2]. It learned Each word is compared against a dictionary of correctly spelt the complex orthographic rules of Bangla.
    [Show full text]
  • Spell Checker in CET Designer
    Linköping University | Department of Computer Science Bachelor thesis, 16 ECTS | Datateknik 2016 | LIU-IDA/LITH-EX-G--16/069--SE Spell checker in CET Designer Rasmus Hedin Supervisor : Amir Aminifar Examiner : Zebo Peng Linköpings universitet SE–581 83 Linköping +46 13 28 10 00 , www.liu.se Upphovsrätt Detta dokument hålls tillgängligt på Internet – eller dess framtida ersättare – under 25 år från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår. Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och för undervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. För att garantera äktheten, säkerheten och tillgängligheten finns lösningar av teknisk och admin- istrativ art. Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i den omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sam- manhang som är kränkande för upphovsmannenslitterära eller konstnärliga anseende eller egenart. För ytterligare information om Linköping University Electronic Press se förlagets hemsida http://www.ep.liu.se/. Copyright The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circum- stances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose.
    [Show full text]