
International Journal of Pure and Applied Mathematics, Volume 116, No. 21, 2017, 719-727.
ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version).
URL: http://www.ijpam.eu (Special Issue)

Information Extraction Using Semantic Similarity Features in Natural Language Processing

1S. Jayalakshmi and 2Ananthi Sheshasaayee
1Periyar University, Vels University, India. [email protected]
2PG & Research Dept. of Computer Science, Queen Marys College for Women, Chennai, India. [email protected]

Abstract: Question answering systems have achieved significant progress, and question classification is an essential part of them. Retrieving the most relevant information from the web remains a major challenge: the system must interpret difficult questions posted by users and furnish the appropriate answer, together with the significant supporting information, in adequate sentences. Analysing a natural language question means assigning it a semantic category that represents the type of answer required, which leads to an accurate question answering system. Methods: The proposed approach focuses on predicting the original intention of a question and providing the candidate answer with adequate, significant information drawn from both a web corpus and an ontology. It incorporates three modules: generating relevant documents, ranking the documents, and predicting the precise answer. Findings: The approach identifies an appropriate, concise candidate answer automatically. Applications: Improved system accuracy and better results for the user with less effort and time.


1. Introduction

Web searching is growing rapidly, and users demand more sophisticated search tools capable of providing highly relevant information with ease [1]; the web has become a global and easily accessible repository of textual information. The web search engine has been exploited as the most significant tool on the internet for extracting information from the web, enabling users to retrieve relevant information through an internet search engine [2]. Hence, information management systems aim to create a comfortable searching environment for users amid the flood of online information; information retrieval [3] is the process of converting users' information needs into a list of documents relevant to those needs through a web search engine.

Question Answering System

Question Answering (QA) [4] is a service that satisfies users' needs by providing adequate sentences as the answer to a specific natural language question, instead of providing a set of relevant web documents. The QA system has received considerable attention due to the increasing amount of web content and the high demand for digital information. IR engines are designed to retrieve only documents, not the specific information that answers a question from the abundance of relevant information [5]. If a user requires specific information, the user has to examine the retrieved documents manually to find an accurate answer. The QA system addresses this problem with the help of NLP methods and the IR system.

The QA system jointly applies IR and NLP techniques to access online information flexibly. An automated QA system is related to an IR system in that it offers the desired answers to the queries submitted by users, but the QA system differs in providing the information needs as direct answers.

Significance of Semantic Similarity in QA

Semantics is the most complex and essential aspect of natural language. It is involved in three processes of a QA system: identifying the question, recognizing the topic and retrieving the relevant answer. Semantic analysis is based on different semantic resources, such as ontology classes, WordNet and FrameNet, which are used to understand the input query. QA also benefits from semantic role labelling (SRL), which improves the accuracy of the returned answers.

Semantic search finds ways to improve the accuracy of search by using the conceptual knowledge hidden on the internet. It does not assign ranks merely to predict relevancy; it uses hidden meaning to produce its outputs.

Problem Statement and Scope of the Work

Conventional QA systems still confront the answer generation problem for questions in which the WH-operator is missing. Improper questions that do not begin with 'W' or 'H' are likely to mislead syntactic question processing. Conventional QA systems apply semantic analysis to natural language questions throughout the QA stages, yet they still fall short of identifying the accurate answer for every question. To overcome this problem, answer extraction based on lexical and syntactic features, reinforced by the semantic relations of the arguments, is necessary for a QA system.

The main scope of the QA system is to improve the ease with which users understand information, together with the benefits of the IR system. In web searching, users need to retrieve the relevant answer quickly from the web search engine, even when their questions are lexically incorrect or improperly formed. Thus the QA system provides the accurate result as the answer at the top of the result page, rather than retrieving a set of documents containing the answer.

The QA technique enables the system to answer both proper and improper natural language questions: it provides the appropriate answer precisely by constructing a proper question from an awkward one using syntactic, semantic and pragmatic examination techniques. Syntactic analysis arranges the words in an appropriate way. Semantic analysis provides the meaning of the question by applying semantic features. The semantic features are: synonyms, which carry a related meaning of the word; hypernyms, which denote a more general term (e.g., flowers); and hyponyms, which denote subdivisions of a more general term (e.g., flowers: jasmine, rose, lotus). Pragmatic analysis identifies content about the context. All of the above techniques are used to identify the most relevant answer to the posted query.
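As a minimal sketch of how these semantic features could expand a question term, the following Java fragment uses a small in-memory lexicon in place of WordNet; the class, the lexicon entries and the method names are assumptions made for illustration, not part of the published system.

import java.util.*;

// Illustrative semantic expansion of a query term using the three
// relations described above (synonyms, hypernyms, hyponyms).
public class SemanticExpansion {

    // Toy lexicon standing in for a WordNet lookup.
    static final Map<String, Map<String, List<String>>> LEXICON = new LinkedHashMap<>();
    static {
        Map<String, List<String>> flower = new LinkedHashMap<>();
        flower.put("synonym", Arrays.asList("bloom", "blossom"));
        flower.put("hypernym", Arrays.asList("plant"));                    // more general term
        flower.put("hyponym", Arrays.asList("jasmine", "rose", "lotus"));  // more specific terms
        LEXICON.put("flower", flower);
    }

    // Collect all related terms of a word; unknown words expand to themselves only.
    static Set<String> expand(String term) {
        Set<String> expanded = new LinkedHashSet<>();
        expanded.add(term);
        Map<String, List<String>> relations = LEXICON.get(term.toLowerCase(Locale.ROOT));
        if (relations != null) {
            relations.values().forEach(expanded::addAll);
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Prints: [flower, bloom, blossom, plant, jasmine, rose, lotus]
        System.out.println(expand("flower"));
    }
}

An expanded term set of this kind can then be matched against candidate answer sentences, so that a sentence mentioning "rose" can still satisfy a question about flowers.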

2. An Overview of SSSR-QAS

SSSR-QAS (Semantic and Syntactic Structure Representation based Question Answering System) is an automated question answering system based on lexical, syntactic and semantic measures. It comprises three stages, namely Question Processing, Document Processing and Answer Validation, as shown in Fig. 1.
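The three stages can be pictured as the following skeleton, in which each stage consumes the output of the previous one; all class and method names here are illustrative placeholders rather than the published implementation.

import java.util.List;

// Skeleton of the three SSSR-QAS stages (illustrative only).
public class SSSRQASPipeline {

    String processQuestion(String rawQuestion) {
        // classify the question (coarse/fine class) and reformulate it
        // into an overt WH form if necessary
        return rawQuestion.trim();
    }

    List<String> processDocuments(String question, List<String> retrieved) {
        // filter retrieved documents by title and snippet, generate
        // POS-based patterns and order sentences by answer weight
        return retrieved;
    }

    String validateAnswer(String question, List<String> rankedSentences) {
        // validate the main verb / named-entity type and re-rank the
        // candidate sentences, returning the top answer
        return rankedSentences.isEmpty() ? "" : rankedSentences.get(0);
    }

    String answer(String rawQuestion, List<String> retrieved) {
        String q = processQuestion(rawQuestion);
        List<String> candidates = processDocuments(q, retrieved);
        return validateAnswer(q, candidates);
    }
}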

Question Processing

Question Processing is the process of identifying the correct format of the question type. It contains two major modules, question classification and question reformulation. Question classification [6] is mainly used to identify the question type, i.e. a WH-overt or WH-covert question, and to generate main and sub classes: coarse classes such as ABBR, DESC, ENTY and LOC, and fine classes such as LOC:country and LOC:city. The linear order of the arguments technique is applied for question extraction.
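The following Java fragment is a heuristic stand-in for the classifier, intended only to illustrate what the coarse labels look like; the published system classifies questions with an SVM trained on the TREC question set, and any rules or class names beyond those listed above are assumptions.

import java.util.Locale;

// Heuristic stand-in for the SVM question classifier: maps a question
// to a coarse class such as those mentioned above.
public class CoarseQuestionClassifier {

    static String classify(String question) {
        String q = question.toLowerCase(Locale.ROOT);
        if (q.startsWith("where"))                                         return "LOC";   // e.g. LOC:city, LOC:country
        if (q.startsWith("who"))                                           return "HUM";
        if (q.startsWith("when") || q.matches(".*how (many|much|long).*")) return "NUM";
        if (q.startsWith("what does") && q.contains("stand for"))          return "ABBR";
        if (q.startsWith("what is") || q.startsWith("define"))             return "DESC";
        return "ENTY";   // default coarse class for entity-type questions
    }

    public static void main(String[] args) {
        System.out.println(classify("Where is the Taj Mahal?"));     // LOC
        System.out.println(classify("What does NASA stand for?"));   // ABBR
    }
}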

Document Processing

Document Processing matches [7] the appropriate question terms and removes irrelevant content, providing correct sentence formation for the answer retrieval process. The answers [8] are filtered on the basis of the title and snippet in order to retrieve the most relevant answer types. Pattern generation [9] and answer-weight based pattern ordering are used to arrange the content in weighted order, from the most relevant to the least relevant text.
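The fragment below illustrates title/snippet filtering and weight-based ordering in the spirit of this step; the Result class, the term-overlap score and all names are assumptions made for this sketch, not the published implementation.

import java.util.*;
import java.util.stream.Collectors;

// Illustrative title/snippet filtering and weight-based ordering of retrieved results.
public class DocumentFilter {

    static class Result {
        final String title, snippet;
        Result(String title, String snippet) { this.title = title; this.snippet = snippet; }
    }

    // Count how many question terms occur in a piece of text.
    static long overlap(Set<String> questionTerms, String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        return questionTerms.stream().filter(lower::contains).count();
    }

    // Keep results whose title or snippet shares at least one question term,
    // ordered from most to least overlapping (a stand-in for the answer weight).
    static List<Result> filterAndRank(String question, List<Result> retrieved) {
        Set<String> terms = new HashSet<>(Arrays.asList(question.toLowerCase(Locale.ROOT).split("\\W+")));
        return retrieved.stream()
                .filter(r -> overlap(terms, r.title) + overlap(terms, r.snippet) > 0)
                .sorted(Comparator.comparingLong(
                        (Result r) -> overlap(terms, r.title) + overlap(terms, r.snippet)).reversed())
                .collect(Collectors.toList());
    }
}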

Answer Processing

Answer Processing identifies the candidate answer sentences [10] and validates [11] the main verb in order to display the correct answer in the ranked list. It re-ranks the content [12] according to the posted query and assigns a score to each pattern according to the semantic relation between the question and the answer type.
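As a toy illustration of type validation and re-ranking, the fragment below keeps only candidate sentences compatible with the expected answer type and then orders them by overlap with the question; the type patterns and the scoring are assumptions made for this sketch.

import java.util.*;
import java.util.regex.Pattern;

// Toy answer validation and re-ranking of candidate sentences.
public class AnswerValidator {

    static final Map<String, Pattern> TYPE_PATTERNS = new HashMap<>();
    static {
        TYPE_PATTERNS.put("NUM", Pattern.compile("\\b\\d+(\\.\\d+)?\\b"));   // numeric answers
        TYPE_PATTERNS.put("LOC", Pattern.compile("\\b[A-Z][a-z]+\\b"));      // crude proper-noun check
    }

    // A sentence validates if it matches the pattern of the expected answer type.
    static boolean validates(String expectedType, String sentence) {
        Pattern p = TYPE_PATTERNS.get(expectedType);
        return p == null || p.matcher(sentence).find();
    }

    // Drop type-incompatible sentences, then rank by question-term overlap.
    static List<String> rerank(String question, String expectedType, List<String> candidates) {
        Set<String> qTerms = new HashSet<>(Arrays.asList(question.toLowerCase(Locale.ROOT).split("\\W+")));
        List<String> valid = new ArrayList<>();
        for (String s : candidates) {
            if (validates(expectedType, s)) valid.add(s);
        }
        valid.sort(Comparator.comparingLong(
                (String s) -> Arrays.stream(s.toLowerCase(Locale.ROOT).split("\\W+"))
                        .filter(qTerms::contains).count()).reversed());
        return valid;
    }
}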

Fig. 1: SSSR-QAS approach (architecture diagram)

SSSR-QAS Algorithm
Input: posted query Q
Output: correct answer

For all posted queries Q do
    For all QA samples do
        Construct the training corpus (TC) for QA
    // Phase 1: Question Processing
    While (checking the user query) do
        If (Q is of overt type) then
            Identify whether Q is a WH or non-WH question
            Identify the sub and main class for Q from the TC
        End if
        If (Q is of covert type) then
            Convert Q into an overt-type question (SVO structure)
        End if
        Q(NLP) <- preprocessing of Q
        If (Q is of overt type) then
            If (the question type maps to an answer type) then
                Classify the question using the SVM machine-learning algorithm
            End if
        End if
        For all Q of all answer types do
            Extract the answer using the linear order of words in the syntactic representation
        End for
    End while
    // Phase 2: Document Processing
    For all documents D retrieved from the web search engine do
        If (title(D) matches the user query) then
            Add D to the document list
        Else
            Add D to the removed list
        End if
        If (snippet(D) matches the user query) then
            Add the snippet S to the snippet list
        Else
            Add S to the removed list
        End if
        If (S is to be pattern-ranked) then
            Generate the pattern using POS tags
        End if
    End for
    // Phase 3: Answer Validation
    For all ranked answer sentences do
        Select the highest-ranked sentence with a matching pattern S(D)
        For all relevant sentences S do
            If (S is a relevant answer) then
                Validate the named-entity type based on TC(QA)
            Else if (pattern(S) = pattern(Q)) then
                Assign the top rank to the answer for Q
            End if
        End for
    End for
End for

Algorithm 1: SSSR-QAS

3. Performance Analysis for Question Answering System

The SSSR-QAS approach is implemented on the Java platform with the Java Expert System Shell (JESS) rule engine. The Docjax search engine and an IR engine are used to retrieve semantic relationships, and the Java API for WordNet Searching (JAWS) provides the interface for retrieving content from the WordNet database. Non-WH (covert) questions are the most complex question type; during preprocessing, the SSSR-QAS approach converts a non-WH question into a WH-overt question, using the Porter stemmer and the Stanford parser to extract the question pattern.
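A rough illustration of this covert-to-overt rewriting is sketched below; the actual system relies on the Porter stemmer and the Stanford parser, so the heuristic rules here are only assumed stand-ins for the intent of the preprocessing step.

import java.util.Locale;

// Illustrative reformulation of a covert (non-WH) question into an overt WH question.
public class CovertToOvert {

    static String toOvert(String question) {
        String q = question.trim().replaceAll("[.?!]+$", "");
        String lower = q.toLowerCase(Locale.ROOT);
        if (lower.matches("^(what|who|when|where|why|which|how)\\b.*")) {
            return q + "?";                                          // already an overt WH question
        }
        if (lower.startsWith("name ") || lower.startsWith("give ")) {
            return "What is " + q.substring(q.indexOf(' ') + 1) + "?";
        }
        return "What is " + q + "?";                                 // default rewrite for keyword queries
    }

    public static void main(String[] args) {
        System.out.println(toOvert("Name the capital of France"));  // What is the capital of France?
        System.out.println(toOvert("Where is the Taj Mahal?"));     // Where is the Taj Mahal?
    }
}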


Dataset

The SSSR-QAS approach collects questions from the TREC-8, TREC-9 and TREC-10 datasets, which together contain 5,952 questions; of these, 5,452 questions are used as the training set and 500 questions as the test set [13]. The main aim of this evaluation is to analyse the accuracy of the SSSR-QAS system and compare it with the existing Improving Question Answering System (IQAS) approach [14].

Results and Analysis

Precision

The number of all retrieved answers = true positives TP (correct and relevant) + false positives FP (retrieved but not relevant).

Precision = TP / (TP + FP)

Table 1: Precision Ratio

No. of trained questions: 1000 to 5426 (measured at 2726)
Correct answers retrieved:  SSSR-QAS 95%,  IQAS 90.8%

Fig. 2: Precision (%) vs. number of questions (in thousands) for SSSR-QAS and IQAS

Recall

The number of all correct answers = true positives TP (correct and relevant) + false negatives FN (relevant but not retrieved).

Recall = TP / (TP + FN)

Table 2: Recall Ratio

No. of trained questions    Level 1    Level 2    Level 3    Level 4
SSSR-QAS                    9.3        9.0        8.9        8.7
IQAS                        9.0        8.8        8.5        8.4

Fig. 3: Recall (%) vs. number of trained questions (levels 1-4) for SSSR-QAS and IQAS


F-Measure

F-measure is the weighted harmonic mean of precision and recall. In terms of overall accuracy, SSSR-QAS predicts answers more accurately than the IQAS approach, improving by 9% at a complex-question factor of 6.8%.
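The three metrics can be computed as in the small worked example below; the TP/FP/FN counts are assumed values for illustration, not figures taken from the evaluation above.

// Worked example of precision, recall and F-measure.
public class EvaluationMetrics {

    static double precision(int tp, int fp) { return (double) tp / (tp + fp); }
    static double recall(int tp, int fn)    { return (double) tp / (tp + fn); }

    // F-measure as the evenly weighted harmonic mean of precision and recall.
    static double fMeasure(double p, double r) { return 2 * p * r / (p + r); }

    public static void main(String[] args) {
        int tp = 95, fp = 5, fn = 9;                      // assumed counts for illustration
        double p = precision(tp, fp);                     // 0.950
        double r = recall(tp, fn);                        // about 0.913
        System.out.printf("P=%.3f R=%.3f F=%.3f%n", p, r, fMeasure(p, r));
    }
}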

4. Conclusion

This study has presented the SSSR-QAS approach for tackling answer ambiguity and answer-selection complexity in QA systems. The goal of this approach is accomplished by exploiting web and semantic knowledge together across three phases. The approach provides precise answers and disambiguates the candidate answer sentences, which are ranked according to the posted query.

5. Future Work

Future directions include semantic resolution and recognizing different questions about the same answer. QA systems need to bridge the gap between syntactically and semantically different natural language questions and answer-bearing texts. Questions and answers are semantically or syntactically interrelated with each other; hence, a deep understanding of questions and answers is crucial in QA systems, especially for descriptive questions.

References

[1] Etzioni O., Search needs a shake-up, Nature 476(7358) (2011), 25-26.
[2] Kolomiyets O., Moens M.F., A survey on question answering technology from an information retrieval perspective, Information Sciences 181(24) (2011), 5412-5434.
[3] Singh V., Dwivedi S.K., Question Answering: A Survey of Research, Techniques and Issues, Procedia Technology 10 (2013), 417-424.
[4] Zhang D., Lee W.S., Question Classification using Support Vector Machines, Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2003), 26-32.


[5] Li X., Roth D., Learning Question Classifiers: The Role of Semantic Information, Natural Language Engineering 12(3) (2006), 229-249.
[6] Quarteroni S., Moschitti A., Manandhar S., Basili R., Advanced Structural Representations for Question Classification and Answer Re-ranking, Proceedings of the European Conference on Information Retrieval, Springer-Verlag (2007), 234-245.
[7] Roberts I., Gaizauskas R., Evaluating passage retrieval approaches for question answering, Advances in Information Retrieval, Springer (2004), 72-84.
[8] Cui H., Sun R., Li K., Kan M.Y., Chua T.S., Question answering passage retrieval using dependency relations, Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2005), 400-407.
[9] Agirre E., Ansa O., Arregi X., De Lacalle M.L., Otegi A., Saralegi X., Zaragoza H., Elhuyar-IXA: Semantic relatedness and cross-lingual passage retrieval, Workshop of the Cross-Language Evaluation Forum for European Languages (2009), 273-280.
[10] Gabrilovich E., Markovitch S., Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis, Proceedings of the 20th International Joint Conference on Artificial Intelligence (2007), 6-12.
[11] Gomez-Adorno H., Pinto D., Ayala D.V., Semantic Answer Validation in Question Answering Systems for Reading Comprehension Tests, Pattern Recognition, Lecture Notes in Computer Science (2013).
[12] Gunawardena T., Lokuhetti M., Pathirana N., Ragel R., Deegalla S., An automatic answering system with template matching for natural language questions, 5th IEEE International Conference on Information and Automation for Sustainability (2010), 353-358.
[13] Saxena A.K., Sambhu G.V., Kaushik S., Subramaniam L.V., IITD-IBMIRL System for Question Answering Using Pattern Matching, Semantic Type and Semantic Category Recognition, Proceedings of TREC (2007).
[14] Cui H., Kan M.Y., Chua T.S., Soft pattern matching models for definitional question answering, ACM Transactions on Information Systems 25(2) (2007).
