Engineering ABSTRACT Building an Efficient Reverse Dictionary

Research Paper Volume : 3 | Issue : 5 | EngineeringMay 2014 • ISSN No 2277 - 8179 KEYWORDS : Building an Efficient Reverse Dictionary Sneha P.S PG Scholar, Department of Computer Science & Engineering, Vedavyasa Institute of Technology, University of Calicut , Kerala, India Mrs.S.Kavitha Associate Professor, Department of Computer Science & Engineering, Vedavyasa Murugesan Institute of Technology, University of Calicut , Kerala, India ABSTRACT In a traditional forward dictionary, words are mapped to their definitions. In the case of a reverse dictionary, a phrase describing the desired concept is taken as the user input, and returns a set of candidate words that satisfy the input phrase. This research work describes the implementation of an efficient reverse dictionary where the user is also able to find out the actual dictionary definition of those words returned as output and thereby help the user to check the accuracy of the re- sults obtained from the reverse dictionary output. Effectively, a Reverse Dictionary addresses the “word is on the tip of my tongue, but I can’t quite remember it” problem. This work has significant application not only for the general public, particularly those who work closely with words, but also in the general field of conceptual search. This research work proposes a novel approach called “wordster” to create and process the reverse dictionary and also evaluate the retrieval accuracy and runtime response time performance. The wordster approach can provide significant improvements in performance scale without sacrificing the quality of the result. I. INTRODUCTION dictionary .An online dictionary [1] is a dictionary that is acces- WordNet organizes them in the order of the most frequently sibleThis papervia the report Internet work through on creation a web of browser. an efficient A forward online dictionreverse- used to the least frequently used (Semcor). word searching. However, there is no text stemming. The exact we already had the meaning or the idea but aren’t too sure of Work in text analyzer, surveyed in detail in [3] , performs exact theary appropriateis one which word, maps thenfrom REVERSEwords to DICTIONARYtheir definitions. is the What right if aretext words matching that isare case used sensitive. in particular That phrasesmeans, wherethe word they “JAVA” denote is different from the word “Java”. Synonyms such as “sick” and “ill” one for use. For example, to find the word “doctor” knowing only that he is a “person who cures disease”. The RD addresses the mustunique consider meaning multi-word but are not expressions identified. andThe multiplework discussed senses inof students,“word is onprofessional the tip of writers,my tongue, scientists, but I can’tmarketing quite andremember adver- polysemous[4] need to tackle words other and theirparts translations.of speech in additionThe pairwise to nouns similar and- tisementit” problem. professionals, This problem teachers, is mainly the facedlist goes by writers,on. including ity of words in two phrases is well described in and the major - the following two constraints. First, the user input phrase may drawback identified in the work is sequence of words is not con While creating a reverse dictionary, it is important to consider thesidered same important.For words and so example, the method Consider in [2] twowould phrases: consider “the them dog - who bit the man” and “the man who bit the dog.” They contain not exactly match the forward dictionary definition of a word.- convey very different meanings. For example, user may enter the phrase “talks a lot” when look virtually equivalent, without considering that the two phrases containing for asame concept words such but as are“garrulous” conceptually whose similar. dictionary Second, defini the SVM approach discussed in [5] deal with the pre-creation of a tion might be “full of trivial conversation”. Both phrases do not online. According to a recent study, online users become impa- phase.Since the SVM approach has access to all the word rela- response to a request must be quick since it needs to be usable tionshipcontext vector data available for each andword compares in WordNet the inputduring phrase the learning to the - tient if the response to a request takes longer than 4-5 seconds. In order to build a is reverse a lexical dictionary, database whichfirst a is forward available diction online context vector for each word in the WordNet corpus, we can use andary isprovides needed. a WordNetlarge repository is the forward of English dictionary lexical items.used inThere this asthe in SVM SVM).Once quality as the a roughmodel benchmark has been created for high and quality given output. input Into work. WordNet [2] which are theLSI[6], system, pre-create it can thebe directlymodel for searched each word by ain user WordNet(Similar input U. The main disadvantage with the SVM approach and LSI is that the is a multilingual WordNet for European languages response time is high. fourstructured types ofin Partsthe same of Speech way as (POS) the English- noun, verb,language adjective, WordNet. and WordNet was designed to establish the connections between- In LDA [7], input all the dictionary words and their correspond- - ing context vectors to GibbsLDA++, and receive a set of classes adverb. The smallest unit in a WordNet is synset, which repre as output. Then provide the user input U to identify the class undersents a one specific type meaningof POS is ofcalled a word. a sense. It includes Each sense the word, of a word its ex is to which U belongs, based on GibbsLDA++ inference. The dic- planation, and its synonyms. The specific meaning of one word- tures containing sets of terms with synonymous meanings. Each Reverse Dictionary output. LDA can be readily embedded in a in a different synset. Synsets are equivalent to senses= struc- moretionary complex words correspondingmodel—a property to that that class is not are possessed identified by as LSI.the ample, the words night, night-time and dark constitute a single The work described in need to compute the user input concept synset hasthat a hasgloss the that following defines thegloss: concept the time it represents. after sunset For and ex before sunrise while it is dark outside. Synsets are connected between this and the dictionary concept vectors. Vector com- to one another through the explicit semantic relations. Some of vector at runtime and then subsequently compute the distance these relations (hypernym, hyponym for nouns and hypernym and troponym for verbs) constitute is-a-kind-of (holonymy) and schemes.putation is known to be quite computationally intensive even is-a-part-of (meronymy for nouns) hierarchies.For example, though much effort has been expended in building efficient tree is a kind of plant, tree is a hyponym of plant and plant is a The Cp/cv method described in deals with single word focus hypernym of tree. Analogously, trunk is a part of a tree and takes and addresses multiword similarity problem- where semantic trunk as a meronym of tree and tree is a holonym of trunk. For similarity must be computed between multiword phrases. The one word and one type of POS, if there is more than one sense, two existing online reverse dictionaries [6],[7] available do not IJSR - INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH 259 Research Paper Volume : 3 | Issue : 5 | May 2014 • ISSN No 2277 - 8179 parameter representing the maximum number of words to be rank the output and thereby reduces the efficiency. returnedbased on similarityas output. to U and return top β Ws where β is an input overcome all the problems discussed above and thereby create This paper introduces a new approach called Wordster that C. Ranking candidate words II.an efficient PROPOSED reverse APPROACH dictionary. found is compared with the user input phrase U. On the basis The proposed RD system is based on the fact that the user input ofHere that, semantic sorts a similarityset of output of definitionwords in theof candidate order of decreasingwords “S” similarity to U as compared to S. There is a need to assign a exactly matching, then at least it must be conceptually similar. similarity measure for each ( S,U ) pair, where U is the user in- phrase should resemble the word’s actual definition, and, if not phase is the find candidate word .In this phase, when a user in- Here it is necessary to compute both term similarity and term The proposed approach mainly consists of two phases. The first importance.put phrase and S is the definition of candidate words found out. inputput phrase phrase. is received, This phase first consists find candidate of two keywords substeps: from a forward 1)build D. Term Similarity dictionary, whose definition have some similarity to the user Compute term similarity between two terms based on their lo- - the RMS ; and 2) query the RMS. The second phase involves - the ranking of those candidate words in the order of quality cation in the WordNet hierarchy. The WordNet hierarchy organ of match. Before discussing the above steps in detail, there is a ofizes the words two terms in English and iflanguage the LCA from of two general terms at in root this tohierarchy most spe is Synonymneed to define set, severalsyn(t): A concepts set of conceptually used in this work.related terms for t. root,cific at then leaf those nodes. two Consider terms will the have LCA little(Least similarity. Common If Ancestor the LCA is ) syn(talk) might consist of {speak, utter, mouth}. more deeper, two terms will have greater similarity. W AntonymFor example, set, Want(t) : A set of conceptually opposite or negated to compute the similarity ant - W Define a similarity function terms for t. For example, W (pleasant) might consist of {“un between two terms ‘a’ and ‘b’.

Engineering ABSTRACT Building an Efficient Reverse Dictionary

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support