DOI 10.4010/2016.639 ISSN 2321 3361 © 2016 IJESC

Research Article Volume 6 Issue No. 3

Reverse Dictionary

Veena Gurram1, Sweta Rathod2, Anish Lushte3, Pranay Vaidya4, Vinod Alone5, Mahendra Pawar6
UG Scholar1,2,3,4, Assistant Professor5,6
Department of Computer Engineering, PVPPCOE, Sion, Mumbai, India.

Abstract: In this paper, we describe the design and implementation of a reverse dictionary. Unlike a traditional forward dictionary, which maps from words to their definitions, a reverse dictionary takes a user input phrase describing the desired concept and returns a set of candidate words that satisfy the input phrase. This work has significant application not only for the general public, particularly those who work closely with words, but also in the general field of conceptual search.

Index Terms: search process, web-based services.

I. INTRODUCTION

In this paper, we report work on creating an online reverse dictionary (RD). As opposed to a regular (forward) dictionary that maps words to their definitions, an RD performs the converse mapping, i.e., given a phrase describing the desired concept, it provides words whose definitions match the entered definition phrase. For example, suppose a forward dictionary informs the user that the meaning of “spelunking” is “exploring caves.” A reverse dictionary, on the other hand, offers the user an opportunity to enter the phrase “check out natural caves” as input and expect to receive the word “spelunking” (and possibly other words with similar meanings) as output.

Effectively, the RD addresses the “word is on the tip of my tongue, but I can’t quite remember it” problem. A particular category of people afflicted heavily by this problem are writers, including students, professional writers, scientists, marketing and advertisement professionals, teachers, and many others. In fact, for most people with a certain level of education, the problem is often not lacking knowledge of the meaning of a word, but, rather, being unable to recall the appropriate word on demand. The RD addresses this widespread problem.

II. RELATED WORK

In a reverse dictionary, the user gives an input phrase and receives a number of words as output, ranked by an algorithm. The works most closely related to ours are the existing reverse dictionaries [3][4], which have certain drawbacks. An existing dictionary receives an input phrase and returns many output words, so it can be tedious for the user to find the right one among them. A ranking-based dictionary, by contrast, lets the user choose from a smaller set of words that are closely related to the phrase. Example: in an existing reverse dictionary, the input phrase “unknown name” returns over 100 search results, including “anonymous”, “nameless”, “incognito”, “jane doe”, “john doe”, “some”, “sky-blue pink”, “dark horse”, etc. [4]. The most accurate results for “unknown name” are “anonymous” and “nameless”, and “incognito” also conveys part of the intended meaning. Words like “sky-blue pink”, “challenge” and “key”, however, have nothing to do with the concept; they are returned only because they are associated with the word “unknown”. The reverse mapping set (RMS) of a term t, denoted R(t), is the set of dictionary words whose definitions contain t; every word whose definition contains the query terms becomes a suggested output.

III. LITERATURE SURVEY

The literature survey focuses on the performance and approach of current systems.

Existing System: After studying the existing systems, we concluded that they lack the following qualities:
• The existing dictionary outputs 100 results, and most of them are not related to the phrase.
• In the existing system, the ranking of results is not efficient ([6] T. Dao and T. Simpson; [10] Z. Wu and M. Palmer). Therefore, an accurate result for a given phrase is not guaranteed.
• The existing systems do not auto-correct the input phrase: if the user makes a mistake while entering the phrase, the system ends up giving wrong results.

IV. PROPOSED SYSTEM

In a reverse dictionary, the user enters a phrase describing the desired concept and receives a certain number of words as output. To do so, we first build the RMS (Reverse Mapping Set). Building the RMS means finding, for each word w, the set of words in whose definitions w is found. Example: the word “sleep” is found in 4 definitions belonging to 4 words, so R(sleep) is {“slumber”, “sopor”, “nap”, “rest”}. These entries must be entered manually for each word. The RMS of the words can be derived from the dictionary data [2][6]. Stop words such as “whereas”, “whenever”, “however” and “very” need to be removed, since they are not an important part of the process. Antonyms, on the other hand, need to be handled. Example: when the word “sleep” is preceded by “cannot”, the antonym of “sleep”, which is “wake up”, should be used in the search.

The search can be enhanced by considering the synonyms, hypernyms and hyponyms of each word. When the words do not yield enough output, the synonyms, hypernyms and hyponyms of that word are considered, and their RMS also contribute to the output words. A synonym is a word with the same or nearly the same meaning, whereas a hypernym is the more general class under which the word falls. Example: a “parrot” is a kind of “bird”, so the hypernym of “parrot” is “bird”; a hyponym is a more specific term, such as “macaw”, which is a kind of parrot. Considering synonyms, hypernyms and hyponyms increases the number of output words, i.e., if “parrot” does not yield enough results, its synonyms, hyponyms and hypernyms will yield more. Finally, the results are arranged according to their rank.
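The RMS construction and ranking described above can be sketched in Java, the language the system is implemented in. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name ReverseMappingSketch, the toy dictionary (built around the R(sleep) example) and the stop-word list are hypothetical, and candidates are ranked simply by how many query terms their definitions contain.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Minimal sketch of building and querying a Reverse Mapping Set (RMS).
// The toy dictionary, stop-word list and hit-count ranking are illustrative assumptions.
class ReverseMappingSketch {
    // Hypothetical stop-word list; a real system would use a much fuller one.
    static final Set<String> STOP_WORDS = Set.of(
            "a", "an", "the", "of", "in", "on", "or", "this", "very", "however", "whereas", "whenever");

    // Lower-case the text, split on non-letters and drop stop words.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // R(t): for every term t, the set of headwords whose definition contains t.
    static Map<String, Set<String>> buildRms(Map<String, String> dictionary) {
        Map<String, Set<String>> rms = new HashMap<>();
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            for (String term : tokenize(entry.getValue())) {
                rms.computeIfAbsent(term, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return rms;
    }

    // Union of R(t) over the distinct query terms, ranked by how many query terms
    // each candidate's definition contains (ties broken alphabetically).
    static List<String> lookup(Map<String, Set<String>> rms, String phrase) {
        Map<String, Integer> hits = new HashMap<>();
        for (String term : new LinkedHashSet<>(tokenize(phrase))) {
            for (String word : rms.getOrDefault(term, Set.of())) {
                hits.merge(word, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(hits.keySet());
        ranked.sort((a, b) -> {
            int byHits = Integer.compare(hits.get(b), hits.get(a)); // more matched terms first
            return byHits != 0 ? byHits : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = Map.of(
                "slumber", "a light sleep",
                "sopor", "a deep unnatural sleep",
                "nap", "a short sleep during the day",
                "rest", "a period of sleep or relaxation",
                "noon", "the middle of the day");
        Map<String, Set<String>> rms = buildRms(dictionary);
        System.out.println(rms.get("sleep"));                      // [nap, rest, slumber, sopor]
        System.out.println(lookup(rms, "short sleep in the day")); // [nap, noon, rest, slumber, sopor]
    }
}
```

Ranking by hit count means a word whose definition covers every query term (an AND-style match) outranks words that match only one term (OR-style matches), without discarding the latter.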

International Journal of Engineering Science and Computing, March 2016 2727 http://ijesc.org/
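The antonym handling (“cannot sleep” searched as “wake up”) and the synonym/hypernym/hyponym fallback from Section IV can be sketched as a small query-rewriting step. This is illustrative only: the class name QueryExpansionSketch, the negation cue list, the tiny lexical tables (hypothetical stand-ins for a resource such as WordNet [2]) and the minResults threshold are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of negation handling and sparse-result query expansion.
// The cue words and lexical tables are hypothetical stand-ins for a WordNet-style resource.
class QueryExpansionSketch {
    static final Set<String> NEGATIONS = Set.of("not", "cannot", "never", "no");
    static final Map<String, String> ANTONYMS = Map.of("sleep", "wake up", "pleasant", "unpleasant");
    static final Map<String, List<String>> SYNONYMS = Map.of("sleep", List.of("slumber"));
    static final Map<String, List<String>> HYPERNYMS = Map.of("parrot", List.of("bird"));
    static final Map<String, List<String>> HYPONYMS = Map.of("parrot", List.of("macaw"));

    // Replace a negation cue followed by a term that has a known antonym with that antonym,
    // e.g. ["cannot", "sleep"] -> ["wake up"]. Other terms pass through unchanged.
    static List<String> expandAntonyms(List<String> terms) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < terms.size()) {
            String term = terms.get(i);
            boolean negated = NEGATIONS.contains(term) && i + 1 < terms.size()
                    && ANTONYMS.containsKey(terms.get(i + 1));
            if (negated) {
                out.add(ANTONYMS.get(terms.get(i + 1)));
                i += 2; // consume both the cue and the negated term
            } else {
                out.add(term);
                i += 1;
            }
        }
        return out;
    }

    // When a query returned fewer than minResults words, append the synonyms, hypernyms
    // and hyponyms of each term (without duplicates) so that more R(t) sets are consulted.
    static List<String> expandIfSparse(List<String> terms, int resultCount, int minResults) {
        if (resultCount >= minResults) {
            return terms;
        }
        List<String> out = new ArrayList<>(terms);
        for (String term : terms) {
            for (Map<String, List<String>> relation : List.of(SYNONYMS, HYPERNYMS, HYPONYMS)) {
                for (String related : relation.getOrDefault(term, List.of())) {
                    if (!out.contains(related)) {
                        out.add(related);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expandAntonyms(List.of("cannot", "sleep", "night"))); // [wake up, night]
        System.out.println(expandIfSparse(List.of("parrot"), 1, 5));             // [parrot, bird, macaw]
    }
}
```

Expansion is applied only when the result count falls below the threshold, so well-specified phrases are not diluted by looser related terms.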

The final step is to sort the results, which is necessary when there are many output words. Example: in the existing reverse dictionary, the input phrase “discrimination based on colour” returns over 100 search results, including “racism”, “classism”, “judgemental”, “colour bar”, “colour line”, “red”, “nepotism”, “fair”, etc. The most accurate result for “discrimination based on colour” is “racism”, but “colour line” and “colour bar” also convey part of the intended meaning. Words like “fair” and “red”, however, have nothing to do with the concept; they are returned only because they are associated with the word “colour” [3]. To avoid such unnecessary results, we reduce the number of output words, so that the remaining words are those closely associated with the search concept. Thus, in the previous example, when a user enters the phrase “discrimination based on colour”, the output should be “racism”, “prejudice”, “colour line”, “colour bar”, “nepotism”, etc. To avoid returning too many words, the query words are grouped in pairs, and words whose definitions contain both words of a pair are preferred [1]. Example: for “discrimination based on colour”, the definition of “racism” contains both “discrimination” and “colour”, so “racism” is given higher priority.

Advantages:
1. Time efficiency: quick output.
2. Accuracy: gives accurate words.
3. Auto-correction: the phrase entered by the user is corrected if it is misspelled.

System Architecture:

Fig: System Architecture

The reverse dictionary is a computer application that takes the user's input phrase and returns the corresponding words as output. The RMS contains a set of mappings, the dictionary definitions and parse trees [8] for the definitions. The synonym database contains the set of synonyms for each word in the user's input phrase, the hypernym/hyponym database contains the hypernym and hyponym sets for each such word, and the antonym database contains the set of antonyms for each such word.

Process:
• The user enters a set of words/phrases to be looked up.
• Stop words (unwanted words that do not affect the meaning of a phrase or carry minimal meaning) are removed from the input. Eg: “this”.
• The remaining important words are stemmed, i.e., converted to their base form. Eg: “describing” becomes “describe”.
• Words/phrases containing negation are converted to their antonyms. Eg: “not pleasant” becomes “unpleasant”.
• Tokens are generated and a query is formed from these tokens.
• If too few results are found, the query is expanded using synonyms, hypernyms and hyponyms.
• Results are sorted according to the WordNet hierarchy by comparing term similarity and term importance in the phrase.
• The results are displayed.

Fig: Working

V. MODULE DESCRIPTION

Algorithms:
• K-Means Algorithm: K-means clustering tends to find clusters of comparable spatial extent, whereas the expectation-maximization mechanism allows clusters to have different shapes. It is used to partition the data into multiple sets.
• Algorithm Build RMS: RMS stands for Reverse Mapping Set. This algorithm maps each term to the set of words whose definitions contain that term, which makes the retrieved words less sensitive to the exact wording of the input phrase. For an input dictionary D, a mapping R is created for every term appearing in the sense phrases (definitions). The Build RMS algorithm describes this reverse mapping pattern.
• Algorithm GenerateQuery: This algorithm generates a query for each SetType (synonyms, antonyms, hypernyms, hyponyms) used for mapping and retrieving the reverse terms for the given input phrase. The generated query Q is passed to the other algorithms to find the words matching the given set of terms in the phrase. This is the building algorithm for the SetType and sorted queries.
• Algorithm ExecuteQuery: Given a query Q over a phrase containing the terms t1, t2, t3, ..., tk, it performs AND/OR operations. With OR, the reverse mapping sets of the terms are combined by union, R(t1) ∪ R(t2) ∪ ... ∪ R(tk); with AND, they are intersected, R(t1) ∩ R(t2) ∩ ... ∩ R(tk). The resulting union or intersection is returned.
• Algorithm ExpandAntonyms: Given a query Q of the form t1, t2, t3, ..., tk, it creates a copy of the query and, for each negated term, creates a subquery that replaces the negated term with its antonyms. If the modified copy differs from the original query, the copy is returned; otherwise the negated terms are returned.
• Algorithm ExpandQuery: Given a query Q of the form t1, t2, t3, ..., tk, it performs AND/OR operations for every term ti in the query. For AND, a subquery q is created for each SetType (synonyms, antonyms, hyponyms and hypernyms); for OR, the terms of Q are replaced by those in the subquery q. Finally, it returns the result of ExecuteQuery.


• Algorithm SortResults: Create an empty list K and arrange all terms in order of their retrieval priority for ease of mapping. The sorted terms are ordered by search priority, i.e., term importance and semantic and weighted similarity factors, to generate a candidate set that is then ranked using mathematical computation.

VI. HARDWARE AND SOFTWARE REQUIREMENTS

Hardware:
1. Processor
2. RAM: 256 MB minimum
3. Hard disk: 10 GB

Software:
1. Browser
2. Coding Language: Java
3. Database: MySQL
4. Server: Apache Tomcat

Frontend:
1. HTML
2. CSS
3. JavaScript

Backend:
1. MySQL
2. Java
3. JSP

Technologies Used:

1. Java: Java has been in use for about two decades and continues to grow because it is continuously upgraded. Java leapt to the forefront of internet programming. The features that play the most important role in website development are:
• Servlet
• Java Beans
• JDBC
Java is an object-oriented programming language that provides mechanisms to implement the three object-oriented principles: encapsulation, inheritance and polymorphism. Java is a simple language that can be learned without extensive training by anyone with some experience of current programming practices; if you already understand the basic concepts of object-oriented programming, learning Java is even easier. Java inherits the C/C++ syntax and many of the object-oriented features of C++.

2. HTML (Hypertext Markup Language): HTML will be used to create every word, line, image, etc. with which the user interacts.

3. CSS (Cascading Style Sheets): Every element created using HTML will be aligned and styled using CSS.

4. JavaScript: JavaScript will be used for dynamic web pages.

5. MySQL: MySQL is well known as the world's most widely used open-source database (back end). It is the database most commonly paired with PHP, and PHP-MySQL is the most frequently used open-source scripting/database pair. The user interface that WAMP, LAMP and XAMPP servers provide for MySQL is easy to use and reduces our work to a large extent.

6. JSP: JavaServer Pages (JSP) is a technology that helps software developers create dynamically generated web pages based on HTML, XML, or other document types. Released in 1999 by Sun Microsystems, JSP is similar to PHP and ASP, but it uses the Java programming language.

VII. RESULT ANALYSIS

Response Time Performance: The graph below compares the response time of the proposed system with that of a previously available dictionary, plotting response time against request rate.

Fig: Response Time

Quality Results: To improve result quality, the system finds the word that most exactly matches the meaning of the input phrase.

Fig: Quality Parameter

VIII. CONCLUSION

In this paper, we discuss how to build a reverse dictionary and note the problems that occur in existing systems. We therefore suggest a set of methods for constructing and querying a reverse dictionary, and report on result quality, run time and scalability. Our approach shows an improvement in quality and run time compared with the existing reverse dictionaries, onelook.com and dictionary.com. Hence, given a user's search concept, the reverse dictionary outputs a relevant word.


ACKNOWLEDGEMENT

We wish to acknowledge Prof. Vinod Alone and Prof. Mahendra Pawar for their thorough support and guidance at every step, from conceptualization to implementation of the system, which aided the successful completion of this survey paper.

REFERENCES

[1] Ryan Shaw, Anindya Datta, Debra Vander Meer, and Kaushik Dutta, “Building a Scalable Database-Driven Reverse Dictionary,” vol. 25, pp. 528-540, 2013.

[2] G. Miller, C. Fellbaum, R. Tengi, P. Wakefield, and H. Langone, “Wordnet Lexical Database,” http://wordnet.princeton.edu/wordnet/download/, 2009.

[3] Dictionary.com, LLC, “Reverse Dictionary,” http://dictionary.reference.com/reverse, 2009.

[4] OneLook.com, “Onelook.com Reverse Dictionary,” http://www.onelook.com/, 2009.

[5] O.S. Project, “Opennlp,” http://opennlp.sourceforge.net/, 2009.

[6] T. Dao and T. Simpson, “Measuring Similarity between Sentences,” http://opensvn.csie.org/WordNetDotNet/trunk/Projects/Thanh/Paper/WordNetDotNet_Semantic_Similarity.pdf, 2009.

[7] http://en.wikipedia.org/wiki/Data_mining

[8] J. Earley, “An Efficient Context-Free Parsing Algorithm,” Comm. ACM, vol. 13, no. 2, pp. 94-102, 1970. J. Kim and K. Candan, “Cp/cv: Concept Similarity Mining without Frequency Information from Domain Describing Taxonomies,” Proc. ACM Conf. Information and Knowledge Management, 2006.

[9] D. Lin, “An Information-Theoretic Definition of Similarity,” Proc. Int’l Conf. Machine Learning, pp. 296-298, 1998.

[10] Z. Wu and M. Palmer, “Verbs Semantics and Lexical Selection,” Proc. 32nd Ann. Meeting Assoc. for Computational Linguistics, pp. 133-138, 1994.

[11] D. Widdows and K. Ferraro, “Semantic Vectors,” http://code.google.com/p/semanticvectors/, 2010.

[12] N. Segata and E. Blanzieri, “Fast Local Support Vector Machines for Large Datasets,” Proc. Int’l Conf. Machine Learning and Data Mining in Pattern Recognition, pp. 295-310, July 2009.

[13] U. of Pennsylvania, “The Penn Treebank Project,” http://www.cis.upenn.edu/~treebank/, 2009.

[14] R. Mihalcea, C. Corley, and C. Strapparava, “Corpus-Based and Knowledge-Based Measures of Text Semantic Similarity,” Proc. Nat’l Conf. Artificial Intelligence, pp. 775-780, 2006.

[15] C. Manning, P. Raghavan, and H. Schutze, “Introduction to Information Retrieval,” Cambridge Univ. Press, 2008.

[16] T. Joachims, “Svmlight,” http://svmlight.joachims.org/, 2008, and T. Joachims, “Svm-multiclass,” http://svmlight.joachims.org/svm_multiclass.html, 2008.
