Contextualized Web Search: Query-Dependent Ranking and Social Media Search

Total Page:16

File Type:pdf, Size:1020Kb

Contextualized Web Search: Query-Dependent Ranking and Social Media Search CONTEXTUALIZED WEB SEARCH: QUERY-DEPENDENT RANKING AND SOCIAL MEDIA SEARCH A Thesis Presented to The Academic Faculty by Jiang Bian In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the College of Computing Georgia Institute of Technology December 2010 CONTEXTUALIZED WEB SEARCH: QUERY-DEPENDENT RANKING AND SOCIAL MEDIA SEARCH Approved by: Dr. Hongyuan Zha, Advisor Dr. Guy Lebanon College of Computing College of Computing Georgia Institute of Technology Georgia Institute of Technology Dr. Eugene Agichtein Dr. Richard Vuduc Mathematics and Computer Science College of Computing Department Georgia Institute of Technology Emory University Dr. Alexander Gray Date Approved: September 15, 2010 College of Computing Georgia Institute of Technology ACKNOWLEDGEMENTS I would never have been able to finish my dissertation without the guidance of my committee members, help from my friends, and support from my family. I owe an enormous debt of gratitude to my advisor, Dr. Hongyuan Zha, who has showed me the beauty of science, offered me precious opportunities for collabo- rations, and supplied sustenance, literally and metaphorically, through long seasons in Atlanta. I would like to express my deep gratitude to one of committee members and collaborators, Dr. Eugene Agichtein, for offering many insightful suggestions and friendly accessibility throughout my graduate study. I would like to thank Dr. Ding Zhou for his patient guidance on my early research work and my PhD life. This dissertation has also received contributions from outside the university, for which I owe my special thanks to Dr. Tie-Yan Liu and Dr. Zhaohui Zheng who mentored my two important industrial internships. I thank Dr. Tao Qin, Dr. Xin Li, Dr. Fan Li and Dr. Max Gubin for giving me so much invaluable advice from indus- try. I feel especially thankful to my colleagues from Georgia Institute of Technology, Emory University, Microsoft Research Asia, Yahoo! Labs, Facebook, University of Illinois at Urbana-Champaign, and Nokia Research Center, with whom I have enjoyed collaboration. Finally, I am deeply indebted to my parents for bringing me into this world and providing me with a sound education. Finally, I would like to thank Dr. Weiyun Chen, who was always standing by me through both good and bad times. iii TABLE OF CONTENTS ACKNOWLEDGEMENTS ............................ iii LIST OF TABLES ................................. ix LIST OF FIGURES ................................ xi SUMMARY ..................................... xiv CHAPTERS I INTRODUCTION .............................. 1 1.1 Query-Dependent Ranking for Web Search . 3 1.2 Context-Aware Social Media Search . 4 1.3 Contributions . 9 1.4 Dissertation Organization . 10 II RELATED WORK .............................. 12 2.1 Ranking for Web Search . 12 2.1.1 Learning to Rank . 12 2.1.2 Incorporate Query Difference in Ranking . 13 2.2 Search in Social Media . 14 2.2.1 Information Retrieval over Community Question Answering 14 2.2.2 Modeling User Authority . 15 2.2.3 User Interactions and Related Spam in Web Search . 16 2.2.4 Information Extraction and Integration over Social Media . 17 III QUERY-DEPENDENT RANKING FOR SEARCH . 19 3.1 Query Difference in Ranking . 19 3.2 Incorporating Query Difference into Ranking . 21 3.2.1 Query-Dependent Loss Functions . 21 3.2.2 Learning Methods . 22 3.3 Applying Query-Dependent Loss To A Specific Query Categorization 23 iv 3.3.1 Query-Dependent Loss Functions Based on Query Taxonomy of Web Search . 24 3.3.2 Learning Methods . 27 3.3.3 Example Query-Dependent Loss Functions . 30 3.4 Experiments . 33 3.4.1 Experimental Setup . 33 3.4.2 Experimental Results . 35 3.4.3 Discussions . 40 3.5 Query-Dependent Loss Functions v.s. Query-Dependent Rank Func- tions . 43 3.6 Summary . 46 IV RANKING SPECIALIZATION FOR WEB SEARCH . 47 4.1 A Divide-and-Conquer Framework . 48 4.2 Identifying Ranking-Sensitive Query Topics . 50 4.2.1 Generating Query Features . 50 4.2.2 Generating Query Topics and Computing Topic Distribution for Queries . 51 4.3 A Unified Approach for Learning Multiple Ranking Models . 52 4.3.1 Problem Statement . 52 4.3.2 Topical RankSVM . 53 4.4 Ensemble Ranking for New Queries . 55 4.5 Experimental Setup . 56 4.5.1 Data Collection . 56 4.5.2 Evaluation Metrics . 58 4.5.3 Ranking Methods Compared . 58 4.6 Experimental Results . 59 4.6.1 Experiments with LETOR Dataset . 59 4.6.2 Experiments with SE-Dataset . 68 4.7 Summary . 71 v V RANKING OVER SOCIAL MEDIA .................... 74 5.1 Community Question Answering . 74 5.2 Learning Ranking Functions for CQA Retrieval . 78 5.2.1 Problem Definition of QA Retrieval . 78 5.2.2 User Interactions in Community QA . 79 5.2.3 Features and Preference Data Extraction . 80 5.2.4 Learning Ranking Function from Preference Data . 84 5.3 Experimental Setup . 85 5.3.1 Datasets . 85 5.3.2 Evaluation Metrics . 88 5.3.3 Ranking Methods Compared . 89 5.4 Experimental Results . 91 5.4.1 Learning Ranking Function . 91 5.4.2 Robustness to Noisy Labels . 92 5.4.3 Ablation Study on Feature Set . 93 5.4.4 QA Retrieval . 94 5.5 Summary . 97 VI LEARNING TO RECOGNIZE RELIABLE USERS AND CONTENT IN SOCIAL MEDIA ............................... 98 6.1 Content Quality and User Reputation in Social Media . 98 6.2 Learning Content Quality and User Reputation in CQA . 99 6.2.1 Problem Statement . 100 6.2.2 Coupled Mutual Reinforcement Principle . 101 6.2.3 CQA-MR: Coupled Semi-Supervised Mutual Reinforcement 104 6.3 Experimental Setup . 109 6.3.1 Data Collection . 109 6.3.2 Evaluation Metrics . 111 6.3.3 Methods Compared . 111 6.4 Experimental Results . 113 vi 6.4.1 Predicting Content Quality and User Reputation . 113 6.4.2 Quality-aware CQA Retrieval . 115 6.4.3 Effects of the QA quality and User Reputation Features . 117 6.4.4 Effects of the Amount of Supervision . 118 6.5 Summary . 120 VII ROBUST RANKING IN SOCIAL MEDIA . 122 7.1 User Votes in Social Media . 122 7.2 Vote Spam in Social Media . 124 7.2.1 Vote Spam Attack Models . 124 7.3 Robust Ranking in Community Question Answering . 126 7.3.1 Robust Ranking Method . 126 7.3.2 Datasets . 127 7.3.3 Ranking Methods Compared . 127 7.3.4 Evaluation on Robustness of Ranking to Vote Spam Attack 128 7.4 Experimental Results . 128 7.4.1 QA Retrieval . 128 7.4.2 Robustness to Vote Spam . 131 7.4.3 Analyzing Feature Contributions . 133 7.5 Summary . 136 VIII LEARNING TO ORGANIZE THE KNOWLEDGE IN SOCIAL MEDIA 137 8.1 Task Question and Its Structured Semantics . 137 8.2 Integrating Task Questions Based on Structured Semantics . 141 8.2.1 Problem Formulation . 141 8.2.2 Aspect Discovery and Clustering on CQA Content . 143 8.2.3 Generating Representative Description for Aspects . 149 8.3 Applications . 150 8.3.1 Identifying Semantics for New Questions . 150 8.3.2 Finding Similar Questions . 151 vii 8.3.3 Other Applications . 153 8.4 Experiments . 154 8.4.1 Data Collection . 155 8.4.2 Empirical Results and Analysis . 157 8.4.3 Quantitative Evaluation . 161 8.4.4 Experiments on External Applications . 162 8.5 Summary . 166 IX CONCLUSION ................................ 168 APPENDICES REFERENCES ................................... 170 VITA ........................................ 178 viii LIST OF TABLES 1 MAP value of RankNet, SQD-RankNet, and UQD-RankNet . 37 2 MAP value of ListMLE, SQD-ListMLE, and UQD-ListMLE . 38 3 Top 10 Results (Doc IDs) of One Informational Query (Query ID: 87) 41 4 Top 10 Results (Doc IDs) of One Navigational Query (Query ID: 52) 42 5 MAP value of RSVM, CRSVM, LRSVM, and TRSVM on TREC2003 and TREC2004 . 62 6 MAP value of RSVM, LRSVM, and TRSVM on MQ2003 and MQ2004 62 7 Top 8 most important features for RSVM on TREC2003 . 63 8 Top 8 most important features for CRSVM on TREC2003 . 63 9 Top 8 most important features for TRSVM on TREC2003 . 63 10 Features used to represent textual elements and user interactions . 81 11 Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) for GBrank and other metrics for baseline1 . 96 12 Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) for GBrank and other metrics for baseline2 . 96 13 Features Spaces: X(Q);X(A) and X(U)................ 105 14 Inter-annotator agreement and Kappa for question quality . 111 15 Accuracy of GBRank-MR, GBRank-Supervised, GBRank-HITS, GBRank, and Baseline (TREC 1999-2006 questions) . 117 16 Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) for GBrank, GBrank-robust and Baseline . 130 17 Information Gain for all features both when training and testing data are not polluted . 134 18 Information Gain for all features both when training and testing data are polluted . 135 19 Examples of Integrated Task Summaries of two instances of The Task Topic \travel" . 154 20 Examples of Integrated Task Summaries of two instances of The Task Topic \pets" . 155 21 Basic statistics of the collections of task questions from Yahoo! Answers156 ix 22 Basic statistics of the prior knowledge of aspect from eHow . 156 23 Inter-annotator agreement for aspect discovering . 161 24 Evaluation of Clustering Accuracy . 162 x LIST OF FIGURES 1 Accuracy in terms of NDCG of query-dependent RankNet compared with original RankNet on TREC2003 . 36 2 Accuracy in.
Recommended publications
  • Web Query Classification and URL Ranking Based on Click Through Approach
    S.PREETHA et al, International Journal of Computer Science and Mobile Computing Vol.2 Issue. 10, October- 2013, pg. 138-144 Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320–088X IJCSMC, Vol. 2, Issue. 10, October 2013, pg.138 – 144 RESEARCH ARTICLE Web Query Classification and URL Ranking Based On Click through Approach S.PREETHA1, K.N Vimal Shankar2 PG Scholar1, Asst. Professor2 Department of Computer Science & Engineering1, 2 V.S.B. Engineering College, Karur, India1, 2 [email protected], [email protected] Abstract― In a web based application; different users may have different search goals when they submit it to a search engine. For a broad-topic and ambiguous query it is difficult. Here we propose a novel approach to infer user search goals by analyzing search engine query logs. This is typically shown in cases such as these: Different users have different backgrounds and interests. However, effective personalization cannot be achieved without accurate user profiles. We propose a framework that enables large-scale evaluation of personalized search. The goal of personalized IR (information retrieval) is to return search results that better match the user intent. First, we propose a framework to discover different user search goals for a query by clustering the proposed feedback sessions. Feedback sessions are getting constructed from user click-through logs and can efficiently reflect the information needs of users. Second, we propose an approach to generate pseudo-documents to better represent the feedback sessions for clustering.
    [Show full text]
  • Classifying User Search Intents for Query Auto-Completion
    Classifying User Search Intents for Query Auto-Completion Jyun-Yu Jiang Pu-Jen Cheng Department of Computer Science Department of Computer Science University of California, Los Angeles National Taiwan University California 90095, USA Taipei 10617, Taiwan [email protected] [email protected] ABSTRACT completion is to accurately predict the user's intended query The function of query auto-completion1 in modern search with only few keystrokes (or even without any keystroke), engines is to help users formulate queries fast and precisely. thereby helping the user formulate a satisfactory query with Conventional context-aware methods primarily rank candi- less effort and less time. For instance, users can avoid mis- date queries2 according to term- and query- relationships to spelling and ambiguous queries with query completion ser- the context. However, most sessions are extremely short. vices. Hence, how to capture user's search intents for pre- How to capture search intents with such relationships be- cisely predicting the query one is typing is very important. The context information, such as previous queries and comes difficult when the context generally contains only few 3 queries. In this paper, we investigate the feasibility of dis- click-through data in the same session , is usually utilized covering search intents within short context for query auto- to capture user's search intents. It also has been exten- completion. The class distribution of the search session (i.e., sively studied in both query completion [1, 21, 35] and sug- gestion [18, 25, 37]. Previous work primarily focused on cal- issued queries and click behavior) is derived as search in- 2 tents.
    [Show full text]
  • Automatic Web Query Classification Using Labeled and Unlabeled Training Data Steven M
    Automatic Web Query Classification Using Labeled and Unlabeled Training Data Steven M. Beitzel, Eric C. Jensen, David D. Lewis, Abdur Chowdhury, Ophir Frieder, David Grossman Aleksandr Kolcz Information Retrieval Laboratory America Online, Inc. Illinois Institute of Technology [email protected] {steve,ej,ophir,dagr}@ir.iit.edu {cabdur,arkolcz}@aol.com ABSTRACT We examine three methods for classifying general web queries: Accurate topical categorization of user queries allows for exact match against a large set of manually classified queries, a increased effectiveness, efficiency, and revenue potential in weighted automatic classifier trained using supervised learning, general-purpose web search systems. Such categorization and a rule-based automatic classifier produced by mining becomes critical if the system is to return results not just from a selectional preferences from hundreds of millions of unlabeled general web collection but from topic-specific databases as well. queries. The key contribution of this work is the development Maintaining sufficient categorization recall is very difficult as of a query classification framework that leverages the strengths web queries are typically short, yielding few features per query. of all three individual approaches to achieve the most effective We examine three approaches to topical categorization of classification possible. Our combined approach outperforms general web queries: matching against a list of manually labeled each individual method, particularly improves recall, and allows queries, supervised learning of classifiers, and mining of us to effectively classify a much larger proportion of an selectional preference rules from large unlabeled query logs. operational web query stream. Each approach has its advantages in tackling the web query classification recall problem, and combining the three 2.
    [Show full text]
  • Semantic Matching in Search
    Foundations and Trends⃝R in Information Retrieval Vol. 7, No. 5 (2013) 343–469 ⃝c 2014 H. Li and J. Xu DOI: 10.1561/1500000035 Semantic Matching in Search Hang Li Huawei Technologies, Hong Kong [email protected] Jun Xu Huawei Technologies, Hong Kong [email protected] Contents 1 Introduction 3 1.1 Query Document Mismatch ................. 3 1.2 Semantic Matching in Search ................ 5 1.3 Matching and Ranking ................... 9 1.4 Semantic Matching in Other Tasks ............. 10 1.5 Machine Learning for Semantic Matching in Search .... 11 1.6 About This Survey ...................... 14 2 Semantic Matching in Search 16 2.1 Mathematical View ..................... 16 2.2 System View ......................... 19 3 Matching by Query Reformulation 23 3.1 Query Reformulation .................... 24 3.2 Methods of Query Reformulation .............. 25 3.3 Methods of Similar Query Mining .............. 32 3.4 Methods of Search Result Blending ............. 38 3.5 Methods of Query Expansion ................ 41 3.6 Experimental Results .................... 44 4 Matching with Term Dependency Model 45 4.1 Term Dependency ...................... 45 ii iii 4.2 Methods of Matching with Term Dependency ....... 47 4.3 Experimental Results .................... 53 5 Matching with Translation Model 54 5.1 Statistical Machine Translation ............... 54 5.2 Search as Translation .................... 56 5.3 Methods of Matching with Translation ........... 59 5.4 Experimental Results .................... 61 6 Matching with Topic Model 63 6.1 Topic Models ........................ 64 6.2 Methods of Matching with Topic Model .......... 70 6.3 Experimental Results .................... 74 7 Matching with Latent Space Model 75 7.1 General Framework of Matching .............. 76 7.2 Latent Space Models ...................
    [Show full text]
  • A New Algorithm for Inferring User Search Goals with Feedback Sessions
    International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) Volume 3 Issue 11, November 2014 A Survey on: A New Algorithm for Inferring User Search Goals with Feedback Sessions Swati S. Chiliveri, Pratiksha C.Dhande a few words to search engines. Because these queries are short Abstract— For a broad-topic and ambiguous query, different and ambiguous, how to interpret the queries in terms of a set users may have different search goals when they submit it to a of target categories has become a major research issue. search engine. The inference and analysis of user search goals can be very useful in improving search engine relevance and For example, when the query ―the sun‖ is submitted to a user experience. This paper provides search engine, some people want to locate the homepage of a an overview of the system architecture of proposed feedback United Kingdom newspaper, while some people want to session framework with their advantages. Also we have briefly discussed the literature survey on Automatic Identification of learn the natural knowledge of the sun. Therefore, it is User Goals in Web Search with their pros and cons. We have necessary and potential to capture different user search goals, presented working of various classification approaches such as user needs in information retrieval. We define user search SVM, Exact matching and Bridges etc. under the web query goals as the information on different aspects of a query that classification framework. Lastly, we have described the existing user groups want to obtain. Information need is a user’s web clustering engines such as MetaCrawler, Carrot 2, particular desire to obtain information to satisfy users need.
    [Show full text]
  • Most Valuable (Cited)
    Top 100 Most Valuable Patent Portfolios Based on 3 million patents issued from 1-1-2005 to 3-21-2017 Methodology: 1) Identify all US-based patents and their patent citations 2) Identify most cited patents (patent must be cited > 10x as 10x is the average) 3) Aggregate all of the "most cited patents" into their Assignees and sort Patents Total Average Rank Assignee Being Cited Citations Citations/Patent 1 International Business Machines Corporation 5902 163252 27.66 2 Microsoft Corporation 4688 142191 30.33 3 Ethicon Endo-Surgery, Inc. 715 97544 136.43 4 Western Digital Technologies, Inc. 972 71947 74.02 5 Intel Corporation 2273 66689 29.34 6 Hewlett-Packard Development Company, L.P. 1922 58888 30.64 7 Tyco Healthcare Group LP 498 56695 113.85 8 Samsung Electronics Co., Ltd. 2097 53863 25.69 9 Canon Kabushiki Kaisha 1950 52703 27.03 10 Cisco Technology, Inc. 1702 51491 30.25 11 Micron Technology, Inc. 1567 48443 30.91 12 Medtronic, Inc. 1049 42652 40.66 13 Sony Corporation 1454 37963 26.11 14 Semiconductor Energy Laboratory Co., Ltd. 1177 36719 31.20 15 Apple Inc. 1057 36234 34.28 16 Hitachi, Ltd. 1314 36126 27.49 17 Western Digital (Fremont), LLC 322 36007 111.82 18 Masimo Corporation 220 35402 160.92 19 IGT 746 35063 47.00 20 Nokia Corporation 1110 34107 30.73 21 Kabushiki Kaisha Toshiba 1346 33712 25.05 22 Sun Microsystems, Inc. 982 29744 30.29 23 Matsushita Electric Industrial Co., Ltd. 1092 29719 27.22 24 General Electric Company 1301 29432 22.62 25 Silverbrook Research PTY LTD 656 25941 39.54 26 Boston Scientific SciMed, Inc.
    [Show full text]
  • Improving Search Engines Via Classification
    Improving Search Engines via Classi¯cation Zheng Zhu May 2011 A Dissertation Submitted to Birkbeck College, University of London in Partial Ful¯llment of the Requirements for the Degree of Doctor of Philosophy Department of Computer Science & Information Systems Birkbeck College University of London Declaration This thesis is the result of my own work, except where explicitly acknowledge in the text. Zheng Zhu i Abstract In this dissertation, we study the problem of how search engines can be improved by making use of classi¯cation. Given a user query, traditional search engines output a list of results that are ranked according to their relevance to the query. However, the ranking is independent of the topic of the document. So the results of di®erent topics are not grouped together within the result output from a search engine. This can be problematic as the user must scroll though many irrelevant results until his/her desired information need is found. This might arise when the user is a novice or has super¯cial knowledge about the domain of interest, but more typically it is due to the query being short and ambiguous. One solution is to organise search results via categorization, in particular, the classi¯cation. We designed a target testing experiment on a controlled data set, which showed that classi¯cation-based search could improve the user's search experience in terms of the numbers of results the user would have to inspect before satisfying his/her query. In our investigation of classi¯cation to organise search results, we not only consider the classi¯cation of search results, but also query classi¯cation.
    [Show full text]
  • Building Bridges for Web Query Classification
    Building Bridges for Web Query Classification Dou Shen†, Jian-Tao Sun‡, Qiang Yang†, Zheng Chen‡ †Department of Computer Science and Engineering, Hong Kong University of Science and Technology {dshen,qyang}@cs.ust.hk ‡Microsoft Research Asia, Beijing, P.R.China {jtsun, zhengc}@microsoft.com ABSTRACT 1. INTRODUCTION Web query classification (QC) aims to classify Web users’ With exponentially increasing information becoming avail- queries, which are often short and ambiguous, into a set of able on the Internet, Web search has become an indispens- target categories. QC has many applications including page able tool for Web users to gain desired information. Typi- ranking in Web search, targeted advertisement in response cally, Web users submit a short Web query consisting of a to queries, and personalization. In this paper, we present a few words to search engines. Because these queries are short novel approach for QC that outperforms the winning solu- and ambiguous, how to interpret the queries in terms of a tion of the ACM KDDCUP 2005 competition, whose objec- set of target categories has become a major research issue. tive is to classify 800,000 real user queries. In our approach, In this paper, we call the problem of generating a ranked list we first build a bridging classifier on an intermediate tax- of target categories from user queries the query classification onomy in an offline mode. This classifier is then used in problem, or QC for short. an online mode to map user queries to the target categories The importance of QC is underscored by many services via the above intermediate taxonomy.
    [Show full text]
  • Query Classification Based on Textual Patterns Mining and Linguistic Features Mohamed Ettaleb, Chiraz Latiri, Patrice Bellot
    Query Classification based on Textual Patterns Mining and Linguistic Features Mohamed Ettaleb, Chiraz Latiri, Patrice Bellot To cite this version: Mohamed Ettaleb, Chiraz Latiri, Patrice Bellot. Query Classification based on Textual Patterns Mining and Linguistic Features. 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2019), Apr 2019, La Rochelle, France. hal-02447749 HAL Id: hal-02447749 https://hal.archives-ouvertes.fr/hal-02447749 Submitted on 21 Jan 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Query Classification based on Textual Patterns Mining and Linguistic Features Mohamed ETTALEB1, Chiraz LATIRI2, and Patrice BELLOT2 1 University of Tunis El Manar, Faculty of Sciences of Tunis, LIPAH research Laboratory, Tunis ,Tunisia 2 Aix-Marseille University, CNRS, LIS UMR 7020, 13397, Marseille, France [email protected], [email protected], [email protected] Abstract. We argue that verbose natural language queries used for soft- ware retrieval contain many terms that follow specific discourse rules, yet hinder retrieval. Through verbose queries, users can express complex or highly specific information needs. However, it is difficult for search engine to deal with this type of queries.
    [Show full text]
  • Ricardo Baeza-Yates Berthier Ribeiro-Neto
    Modern Information Retrieval The Concepts and Technology behind Search Ricardo Baeza-Yates Berthier Ribeiro-Neto Second edition Addison-Wesley Harlow, England Reading, Massachusetts Menlo Park, California• New York Don Mills, Ontario Amsterdam• Bonn Sydney Singapore• Tokyo Madrid• San Juan• Milan •Mexico City• Seoul Taipei • • • • References [1] I. Aalbersberg. Incremental relevance feedback. In Proc of the Fifteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, pages 11–22, Denmark, 1992. [2] H. Abdi. Kendall rank correlation. In N. Salkind, editor, Encyclopedia of Measure- ment and Statistics, Thousand Oaks, CA, 2007. Sage. [3] H. Abdi. The Kendall rank correlation coefficient. Technical report, Univ. of Texas at Dallas, 2007. [4] K. Aberer, F. Klemm, M. Rajman, and J. Wu. An architecture for peer-to-peer information retrieval. In Workshop on Peer-to-Peer Information Retrieval,Sheffield, UK, July 2004. [5] S. Abiteboul. Querying semi-structured data. In F. N. Afrati and P. Kolaitis, editors, Int. Conf. on Database Theory (ICDT), number 1186 in LNCS, pages 1–18, Delphi, Greece, 1997. Springer-Verlag. [6] S. Abiteboul, M. Preda, and G. Cobena. Adaptive on-line page importance compu- tation. In Proceedings of the twelfth international conference on World Wide Web, pages 280–290, Budapest, Hungary, 2003. ACM Press. [7] M. Abolhassani and N. Fuhr. Applying the divergence from randomness approach for content-only search in xml documents. Lecture Notes in Computer Science, 2997:409– 420, 2004. [8] M. Abrams, editor. World Wide Web: Beyond the Basics. Prentice Hall, 1998. [9] M. Abrol, N. Latarche, U. Mahadevan, J. Mao, R. Mukherjee, P.
    [Show full text]
  • 104. Ontology Based Web Query Classification
    International Journal of Engineering Research and General Science Volume 3, Issue 3, May-June, 2015 ISSN 2091-2730 Ontology-Based Web Query Classification Bharat Katariya Computer Engineering Guided By Prof. Mitula Pandya [M.E, C.E] Alpha college of Engineering & Technology, Khatraj, Gandhinagar – 382721 Abstract- Now a day’s use of internet becomes very popular. Everyone gets their need by searching on internet. User can access information easily by web searching tools like a search engine. (Google) According to one survey nearly 70% of the people use a Search Engine to access the information available on Web. If Query submitted by the user to the search engine is too short and ambiguous then it can create problem to getting right documents. Query classification is one technique of Data Mining in query should classify to the number of predefined categories. Query classification use ontology as a model to retrieve the document. Ontology is used in information retrieval system to retrieve more relevant information from a collection of unstructured information source. The main objective of this work is to use ontology concepts as categories. The concepts of ontology will be used as training set. Each document is ranking by the probability. The semantic relations will solve the problem of ambiguity. Web query classification is used to ranking the page(Search engine optimization). Key Words- Classification, Ontology, Web query, categories of query, Irrelevant link, Duplicate link I. Introduction Web query classification is a part of web mining. Web mining include data cleaning, data integration from multiple sources, warehousing the data, data cube construction, data selection, data mining, presentation of mining results, pattern and knowledge to be used to stored into knowledge base.
    [Show full text]
  • Web Search Query Privacy, an End-User Perspective
    Journal of Information Security, 2017, 8, 56-74 http://www.scirp.org/journal/jis ISSN Online: 2153-1242 ISSN Print: 2153-1234 Web Search Query Privacy, an End-User Perspective Kato Mivule Department of Computer Science, Norfolk State University, Norfolk, USA How to cite this paper: Mivule, K. (2017) Abstract Web Search Query Privacy, an End-User Perspective. Journal of Information Securi- While search engines have become vital tools for searching information on ty, 8, 56-74. the Internet, privacy issues remain a growing concern due to the technological http://dx.doi.org/10.4236/jis.2017.81005 abilities of search engines to retain user search logs. Although such capabili- Received: December 28, 2016 ties might provide enhanced personalized search results, the confidentiality of Accepted: January 14, 2017 user intent remains uncertain. Even with web search query obfuscation tech- Published: January 17, 2017 niques, another challenge remains, namely, reusing the same obfuscation me- thods is problematic, given that search engines have enormous computation Copyright © 2017 by author and Scientific Research Publishing Inc. and storage resources for query disambiguation. A number of web search This work is licensed under the Creative query privacy procedures involve the cooperation of the search engine, a non- Commons Attribution International trusted entity in such cases, making query obfuscation even more challenging. License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/ In this study, we provide a review on how search engines work in regards to Open Access web search queries and user intent. Secondly, this study reviews material in a manner accessible to those outside computer science with the intent to intro- duce knowledge of web search engines to enable non-computer scientists to approach web search query privacy innovatively.
    [Show full text]