Usage of WordNet in Natural Language Generation


Hongyan Jing
Department of Computer Science, Columbia University
New York, NY 10027, USA
[email protected]

Abstract

WordNet has rarely been applied to natural language generation, despite its wide application in other fields. In this paper, we address three issues in the usage of WordNet in generation: adapting a general lexicon like WordNet to a specific application domain, using the information in WordNet in the generation process, and augmenting WordNet with other types of knowledge that are helpful for generation. We propose a three-step procedure to tailor WordNet to a specific domain, and we carried out experiments on a basketball corpus (1,015 game reports, 1.7MB).

1 Introduction

WordNet (Miller et al., 1990) has been successfully applied in many human language related applications, such as word sense disambiguation, information retrieval, and text categorization; yet generation is among the fields in which the application of WordNet has rarely been explored. We demonstrate in this paper that, as a rich semantic net, WordNet is indeed a valuable resource for generation. We propose a corpus-based technique to adapt WordNet to a specific domain and present experiments in the basketball domain. We also discuss possible ways to use WordNet knowledge in the generation task and to augment WordNet with other types of knowledge.

In Section 2, we answer the question of why WordNet is useful for generation. In Section 3, we discuss the problems to be solved to successfully apply WordNet to generation. In Section 4, we present techniques for solving these problems. Finally, we present future work and conclude.

2 Why a valuable resource for generation?

WordNet is a potentially valuable resource for generation for four reasons. First, synonym sets in WordNet (synsets) can provide a large number of lexical paraphrases. One major shortcoming of current generation systems is their poor expressive capability: usually no or very limited paraphrases are provided by a generation system, due to the cost of hand-coding them in the lexicon. Synsets, however, make it possible to generate lexical paraphrases without tedious hand-coding in individual systems. For example, for the output sentence "Jordan hit a jumper", we can generate the paraphrase "Jordan hit a jump shot" simply by replacing the word jumper with its synonym jump shot, listed in the same WordNet synset. Such replacements, however, are not always appropriate. For example, tally and rack up are listed as synonyms of the word score, yet while sentences like "Jordan scored 22 points" are common in newspaper sports reports, sentences like "Jordan tallied 22 points" or "Jordan racked up 22 points" seldom occur. To successfully apply WordNet for paraphrasing, we need to develop techniques which can correctly identify the interchangeability of synonyms in a given context.
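The paper predates toolkits such as NLTK, but the synset lookup it describes is easy to prototype today. The sketch below is a minimal illustration of the idea under our own assumptions (NLTK's WordNet interface, a hypothetical function name, and naive string substitution), not the authors' implementation.

# Minimal sketch of synset-based paraphrasing, assuming NLTK and its
# WordNet data are installed (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def synset_paraphrases(sentence, word, pos=wn.NOUN):
    # Collect every synonym that shares a synset with `word`, across
    # all of its senses, and substitute it into the sentence.
    variants = set()
    for synset in wn.synsets(word, pos=pos):
        for lemma in synset.lemma_names():
            synonym = lemma.replace('_', ' ')
            if synonym != word:
                variants.add(sentence.replace(word, synonym))
    return variants

# One sense of "jumper" lists "jump shot", yielding the paraphrase
# "Jordan hit a jump shot"; other senses yield inappropriate variants
# (e.g. the garment sense), which is exactly the interchangeability
# problem noted above.
print(synset_paraphrases("Jordan hit a jumper", "jumper"))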
Secondly, as a semantic net linked by lexical relations, WordNet can be used for lexicalization in generation. Lexicalization maps the semantic concepts to be conveyed to appropriate words. It is usually achieved by step-wise refinement based on syntactic, semantic, and pragmatic constraints while traversing a semantic net (Danlos, 1987). Currently most generation systems acquire their semantic net for lexicalization by building their own, whereas WordNet offers the possibility of acquiring such knowledge automatically from an existing resource.

Next, the WordNet ontology can be used for building a domain ontology. Most current generation systems build their domain ontology manually from scratch. The process is time and labor intensive, and the introduction of errors is likely. The WordNet ontology has wide coverage, so it can possibly be used as a basis for building a domain ontology. The problem to be solved is how to adapt it to a specific domain.

Finally, the fact that WordNet is indexed by concepts rather than merely by words makes it especially desirable for the generation task. Unlike language interpretation, generation takes as input the semantic concepts to be conveyed and maps them to appropriate words. Thus an ideal generation lexicon should be indexed by semantic concepts rather than words. Most available linguistic resources are not directly usable in generation due to their lack of a mapping between concepts and words. WordNet is by far the richest and largest database among all resources that are indexed by concepts. Other relatively large concept-based resources, such as the PENMAN ontology (Bateman et al., 1990), usually include only hyponymy relations, compared to the rich types of lexical relations present in WordNet.

3 Problems to be solved

Despite the above advantages, some problems must be solved for the application of WordNet in a generation system to be successful.

The first problem is how to adapt WordNet to a particular domain. With 121,962 unique words, 99,642 synsets, and 173,941 word senses as of version 1.6, WordNet represents the largest publicly available lexical resource to date. The wide coverage is on one hand beneficial, since as a general resource it allows WordNet to provide information for different applications. On the other hand, it can also be quite problematic, since it is very difficult for an application to efficiently handle such a large database. Therefore, the first step towards utilizing WordNet in generation is to prune unrelated information in the general database so as to tailor it to the domain. Conversely, domain-specific knowledge that is not covered by the general database needs to be added to it.

Once WordNet is tailored to the domain, the main problem is how to use its knowledge in the generation process. As we mentioned in Section 2, WordNet can potentially benefit generation in three respects: producing a large number of lexical paraphrases, providing the semantic net for lexicalization, and providing a basis for building the domain ontology. A number of problems need to be solved at this stage, including: (a) when using synsets to produce paraphrases, how do we determine whether two synonyms are interchangeable in a particular context? (b) while WordNet can provide the semantic net for lexicalization, the constraints for choosing a particular node during lexical choice still need to be established; and (c) how should the WordNet ontology be used?

The last problem concerns augmenting WordNet with other types of information. Although WordNet is a rich lexical database, it cannot contain all the types of information needed for generation; for example, syntactic information in WordNet is weak. It is therefore worthwhile to investigate the possibility of combining it with other resources.

In the following section, we address the above issues in order and present our experimental results in the basketball domain.
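Problem (a) foreshadows the corpus-based treatment in Section 4: a synonym that is rarely attested in the domain corpus is a poor substitute, however valid it may be in general English. The sketch below illustrates this frequency-filter idea; the function, the threshold, and the toy corpus are our assumptions, not the paper's procedure.

from collections import Counter

def safe_synonyms(candidates, corpus_tokens, min_count=2):
    # Keep only candidate synonyms that are themselves attested in
    # the domain corpus at least `min_count` times.
    frequency = Counter(corpus_tokens)
    return [c for c in candidates if frequency[c] >= min_count]

# "score" is frequent in basketball reports while "tally" and
# "rack up" are rare, so only "score" survives and a sentence like
# "Jordan tallied 22 points" is never generated.
toy_corpus = ["Jordan", "score", "points", "score", "jumper", "score"]
print(safe_synonyms(["score", "tally", "rack up"], toy_corpus))  # ['score']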
4 Solutions

4.1 Adapting WordNet to a domain

We propose a corpus-based method to automatically adapt a general resource like WordNet to a domain. Most generation systems still use hand-coded lexicons and ontologies; however, corpus-based automatic techniques are in demand as natural language generation is used in more ambitious applications and large corpora in various domains become available. The proposed method involves three steps of processing.

Step 1: Prune unused words and synsets

We first prune words and synsets that are listed in WordNet but not used in the domain. This is accomplished by tagging the domain corpus with part-of-speech information; then, for each word in WordNet, if it appears in the domain corpus with the same part of speech as in the corpus, the word is kept in the result, otherwise it is eliminated. For each synset in WordNet, if none of the words in the synset appears in the domain corpus, the synset as a whole is deleted. The only exception is that if a synset is the closest common ancestor of two synsets in the domain corpus, the synset is always kept in the result. The reason for keeping this kind of synset is to generalize the semantic category of verb arguments, as we illustrate in Step 2. The frequency of words in such synsets is marked as zero so that they will not be used in output. Figure 1 shows two example pruning operations: (A) is the general case, and (B) is a case involving an ancestor synset. In this step, words are not yet disambiguated, so all the senses of a word remain in the result; the pruning of unlikely senses is achieved in Step 2, when verb argument clusters are utilized. Words that are in the corpus but not covered by WordNet are also identified at this stage; later, in Step 3, we guess the meanings of these unknown words and place them in the domain ontology.

            { A }
           /     \                       { A }
        { B }    ...       ===>         /     \
       /     \                       { D }    ...
    { C }   { F }
   /     \
{ D }   { E }

         before                         after

(A) Synsets A and D appear in the corpus, while B, C, E, and F do not.

            { A }
           /     \                       { A }
        { B }   { C }      ===>         /     \
               /     \               { B }   { D }
            { D }   { E }

         before                         after

(B) Synsets B and D appear in the corpus; A, C, and E do not. Note that
    synset A is not removed, since it is the closest ancestor of B and D.

Figure 1: Examples of corpus-based pruning

A total of 1,015 news reports on basketball games (1.7MB, Clarinet news, 1990-1991) were collected. The frequency count found a total of 1,414 unique nouns (proper names excluded) and 993 unique verbs in the corpus. Compared to the 94,473 nouns and 10,318 verbs in WordNet 1.6, only 1.5% of nouns and 9.6% of verbs are used in the domain.
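Under the stated assumptions (NLTK's WordNet interface standing in for the WordNet 1.6 database used in the paper, and a POS-tagged corpus already reduced to (word, POS) counts), Step 1 might be sketched as follows. This is our reconstruction of the described procedure, not the authors' code.

from itertools import combinations
from nltk.corpus import wordnet as wn

def prune_wordnet(corpus_counts, pos=wn.NOUN):
    # `corpus_counts`: mapping from (word, pos) pairs in the
    # POS-tagged domain corpus to their frequencies.
    # Keep a WordNet word only if it occurs in the corpus with the
    # same part of speech.
    kept_words = {word for (word, p), count in corpus_counts.items()
                  if p == pos and count > 0 and wn.synsets(word, pos=pos)}

    # Keep a synset if at least one of its member words survives.
    # Words are not yet disambiguated, so all senses of a kept word
    # remain; sense pruning happens in Step 2.
    kept_synsets = {s for word in kept_words
                    for s in wn.synsets(word, pos=pos)}

    # Exception: also keep the closest common ancestors of kept
    # synsets, marked with frequency zero so they never surface in
    # generated output.
    ancestors = set()
    for s1, s2 in combinations(kept_synsets, 2):
        ancestors.update(s1.lowest_common_hypernyms(s2))
    zero_frequency = ancestors - kept_synsets

    return kept_synsets | ancestors, zero_frequency

The pairwise ancestor pass is quadratic in the number of kept synsets, which is tolerable at the scale reported above (1,414 domain nouns) but would be the first thing to restrict in a larger domain.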