Searching for People on Web Search Engines Amanda Spink and Bernard J
Recommended publications
DEWS: a Decentralized Engine for Web Search
DEWS: A Decentralized Engine for Web Search. Reaz Ahmed, Rakibul Haque, Md. Faizul Bari, Raouf Boutaba, David R. Cheriton School of Computer Science, University of Waterloo, [r5ahmed|m9haque|mfbari|rboutaba]@uwaterloo.ca; Bertrand Mathieu, Orange Labs, Lannion, France. (Technical Report: CS-2012-17) Abstract: The way we explore the Web is largely governed by centrally controlled, clustered search engines, which is not healthy for our freedom on the Internet. A better solution is to enable the Web to index itself in a decentralized manner. In this work we propose a decentralized Web search mechanism, named DEWS, which will enable the existing webservers to collaborate with each other to form a distributed index of the Web. DEWS can rank the search results based on query keyword relevance and relative importance of websites. DEWS also supports approximate matching of query keywords and incremental retrieval of search results in a decentralized manner. We use the standard LETOR 3.0 dataset to validate the DEWS protocol. Simulation results show that the ranking accuracy of DEWS is very close to the centralized case, while network overhead for collaborative search and indexing is logarithmic in network size. Simulation results also show that DEWS is resilient to changes in the available pool of indexing webservers and works efficiently even in the presence of heavy query load. 1 Introduction: The Internet is the largest repository of documents that mankind has ever created. Voluntary contributions from millions of Internet users around the globe and a decentralized, autonomous hosting infrastructure are the sole factors propelling the continuous growth of the Internet.
Market Research SD-5 Gathering Information About Commercial Products and Services
Market Research SD-5: Gathering Information About Commercial Products and Services. Defense Standardization Program, January 2008.
Contents: Foreword; Background (What Is Market Research?; Why Do Market Research?; When Is Market Research Done?; Who Should Be Involved in Market Research?: Technical Specialist, User, Logistics Specialist, Testing Specialist, Cost Analyst, Legal Counsel, Contracting Officer); Guiding Principles (Start Early; Define and Document Requirements; Refine as You Proceed; Tailor the Investigation; Repeat as Necessary; Communicate; Involve Users); The Market Research Process (Strategic Market Research (Market Surveillance): Identify the Market or Market Segment of Interest, Identify Sources of Market Information, Collect Relevant Market Information, Document the Results; Tactical Market Research (Market Investigation): Summarize Strategic Market Research, Formulate Requirements, Identify Sources of Information, Collect Product or Service Information from Sources, Collect Information from Product or Service Users, Evaluate the Data, Document the Results); Other Considerations (Amount of Information to Gather; Procurement Integrity Act; Paperwork Reduction Act; Cost of Market Research); Other Information on Market Research; Appendix A: Types of Information Available on the Internet; Appendix B: Web-Based Information Sources; Appendix C: Examples of Tactical Information.
Foreword: The Department of Defense (DoD) relies extensively on the commercial market for the products and services it needs, whether those products and services are purely commercial, modified for DoD use from commercial products and services, or designed specifically for DoD.
How to Choose a Search Engine Or Directory
How to Choose a Search Engine or Directory: Fields & File Types
If you want to search for... / Choose...
Audio/Music: AllTheWeb | AltaVista | Dogpile | Fazzle | FindSounds.com | Lycos Music Downloads | Lycos Multimedia Search | Singingfish
Date last modified: AllTheWeb Advanced Search | AltaVista Advanced Web Search | Exalead Advanced Search | Google Advanced Search | HotBot Advanced Search | Teoma Advanced Search | Yahoo Advanced Web Search
Domain/Site/URL: AllTheWeb Advanced Search | AltaVista Advanced Web Search | AOL Advanced Search | Google Advanced Search | Lycos Advanced Search | MSN Search Search Builder | SearchEdu.com | Teoma Advanced Search | Yahoo Advanced Web Search
File Format: AllTheWeb Advanced Web Search | AltaVista Advanced Web Search | AOL Advanced Search | Exalead Advanced Search | Yahoo Advanced Web Search
Geographic location: Exalead Advanced Search | HotBot Advanced Search | Lycos Advanced Search | MSN Search Search Builder | Teoma Advanced Search | Yahoo Advanced Web Search
Images: AllTheWeb | AltaVista | The Amazing Picture Machine | Ditto | Dogpile | Fazzle | Google Image Search | IceRocket | Ixquick | Mamma | Picsearch
Language: AllTheWeb Advanced Web Search | AOL Advanced Search | Exalead Advanced Search | Google Language Tools | HotBot Advanced Search | iBoogie Advanced Web Search | Lycos Advanced Search | MSN Search Search Builder | Teoma Advanced Search | Yahoo Advanced Web Search
Multimedia & video: AllTheWeb | AltaVista | Dogpile | Fazzle | IceRocket | Singingfish | Yahoo Video Search
Page Title/URL: AOL Advanced
Multitasking Web Search on Alta Vista
Multitasking Web Search on Alta Vista. Amanda Spink* & Minsoo Park, School of Information Sciences, University of Pittsburgh, 610 IS Building, 135 N. Bellefield Avenue, Pittsburgh PA 15260, Tel: (412) 624-9454, Fax: (412) 648-7001; Bernard J. Jansen, School of Info Sciences and Technology, The Pennsylvania State University, 4P Thomas Building, University Park PA 16802, Tel: (814) 865-6459, Fax: (814) 865-6424; Jan Pedersen, Chief Scientist, Overture Web Search Division, 1070 Arastradero Road, Palo Alto, CA 94304. * To whom all correspondence should be addressed.
Abstract: A user’s single session with a Web search engine may consist of seeking information on single or multiple topics. Most Web search sessions consist of two queries of two words. We present findings from a study of two-query search sessions on the Alta Vista Web search engine to examine the degree of multitasking search by typical web searchers. A sample of two-query length search sessions was filtered from Alta Vista transaction logs from 2003. Findings include: (1) 81% of two-query sessions were multitasking searches, and (2) there are a broad
Ozmutlu [8] show that IR searches often include multiple topics, during a single search session or multitasking search. Spink, Bateman and Greisdorf [9] found that eleven (3.8%) of the 287 Excite users responding to a Web-based survey reported multitasking searches. However, limited knowledge exists on the characteristics and patterns of multitasking searches. Recent studies have examined multitasking searching on the Excite and AlltheWeb.com Web search engines [10, 11]. Ozmutlu, Ozmutlu and Spink [10] provide a detailed analysis of multitasking sessions on AlltheWeb.com.
The Deep Web: Surfacing Hidden Value
White Paper. The Deep Web: Surfacing Hidden Value. BrightPlanet.com LLC, July 2000. The author of this study is Michael K. Bergman. Editorial assistance was provided by Mark Smither; analysis and retrieval assistance was provided by Will Bushee. This White Paper is the property of BrightPlanet.com LLC. Users are free to distribute and use it for personal use. Some of the information in this document is preliminary. BrightPlanet plans future revisions as better information and documentation are obtained. We welcome submission of improved information and statistics from others involved with the “deep” Web. Mata Hari® is a registered trademark and BrightPlanet™, CompletePlanet™, LexiBot™, search filter™ and A Better Way to Search™ are pending trademarks of BrightPlanet.com LLC. All other trademarks are the respective property of their registered owners. © 2000 BrightPlanet.com LLC. All rights reserved. Summary: BrightPlanet has uncovered the “deep” Web, a vast reservoir of Internet content that is 500 times larger than the known “surface” World Wide Web. What makes the discovery of the deep Web so significant is the quality of content found within. There are literally hundreds of billions of highly valuable documents hidden in searchable databases that cannot be retrieved by conventional search engines. This discovery is the result of groundbreaking search technology developed by BrightPlanet called a LexiBot™, the first and only search technology capable of identifying, retrieving, qualifying, classifying and organizing “deep” and “surface” content from the World Wide Web. The LexiBot allows searchers to dive deep and explore hidden data from multiple sources simultaneously using directed queries. Businesses, researchers and consumers now have access to the most valuable and hard-to-find information on the Web and can retrieve it with pinpoint accuracy.
Towards the Ontology Web Search Engine
TOWARDS THE ONTOLOGY WEB SEARCH ENGINE. Olegs Verhodubs. Abstract. The project of the Ontology Web Search Engine is presented in this paper. The main purpose of this paper is to develop a project that can be easily implemented. The Ontology Web Search Engine is software that looks for and indexes ontologies on the Web. OWL (Web Ontology Language) ontologies are meant, and they are necessary for the functioning of the SWES (Semantic Web Expert System). SWES is an expert system that will use ontologies found on the Web, generate rules from them, and supplement its knowledge base with these generated rules. It is expected that the SWES will serve as a universal expert system for the average user. Keywords: Ontology Web Search Engine, Search Engine, Crawler, Indexer, Semantic Web. I. INTRODUCTION. The technological development of the Web during the last few decades has provided us with more information than we can comprehend or manage effectively [1]. Typical uses of the Web involve seeking and making use of information, searching for and getting in touch with other people, reviewing catalogs of online stores and ordering products by filling out forms, and viewing adult material. Keyword-based search engines such as YAHOO, GOOGLE and others are the main tools for using the Web, and they provide links to relevant pages on the Web. Despite improvements in search engine technology, the difficulties remain essentially the same [2]. Firstly, relevant pages retrieved by search engines are useless if they are distributed among a large number of mildly relevant or irrelevant pages.
Merging Multiple Search Results Approach for Meta-Search Engines
MERGING MULTIPLE SEARCH RESULTS APPROACH FOR META-SEARCH ENGINES. By Khaled Abd-El-Fatah Mohamed. B.S., Cairo University, Egypt, 1995; M.A., Cairo University, Egypt, 1999; M.A., University of Pittsburgh, 2001. Submitted to the Graduate Faculty of School of Information Sciences in Partial Fulfillment of the requirements for the degree of Doctor of Philosophy, University of Pittsburgh, 2004. UNIVERSITY OF PITTSBURGH, INFORMATION SCIENCES. This dissertation was presented by Khaled Abd-El-Fatah Mohamed. It was defended on January 29, 2004 and approved by Chris Tomer, PhD, Associate Professor, DLIS; Jose-Marie Griffiths, PhD, Professor, DLIS; Don King, Research Professor, DLIS; Amy Knapp, PhD, ULS. Dissertation Director: Chris Tomer, PhD, Associate Professor. Khaled A. Mohamed, PhD, University of Pittsburgh, 2004. Meta-search engines are finding tools developed to enhance search performance by submitting user queries to multiple search engines and combining the search results in a unified ranked list. They use a data fusion technique, which requires three major steps: database selection, results combination, and results merging. This study tries to build a framework that can be used for merging the search results retrieved from any set of search engines. This framework is based on answering three major questions: 1. How can meta-search developers define the optimal rank order for the selected engines? 2. How can meta-search developers choose the best combination of search engines? 3. What is the optimal heuristic merging function for aggregating the rank order of the retrieved documents from incomparable search engines?
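As a rough illustration of the results-merging step mentioned in the excerpt above, the sketch below fuses ranked lists from several engines with a Borda count. Borda count is a standard rank-aggregation method used here only as a stand-in; it is not the merging function proposed in the dissertation, and the function name, engine names and result identifiers are hypothetical.

```python
from collections import defaultdict


def borda_merge(ranked_lists):
    """Merge ranked result lists into one list using a Borda count.

    ranked_lists maps an engine name to its results, best first.
    A result at 0-based position p in a list of length n earns n - p points;
    a result that an engine did not return earns nothing from that engine.
    """
    scores = defaultdict(int)
    for results in ranked_lists.values():
        n = len(results)
        for position, url in enumerate(results):
            scores[url] += n - position
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical engines and result identifiers, for illustration only.
example = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u1", "u4"],
    "engine_c": ["u2", "u3", "u1"],
}
merged = borda_merge(example)
```

A positional scheme like this needs only rank order, which is why it can be applied when the engines' own relevance scores are not comparable with one another.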
Organizing User Search Histories
Global Journal of Computer Science and Technology: Network, Web & Security, Volume 13, Issue 13, Version 1.0, Year 2013. Type: Double Blind Peer Reviewed International Research Journal. Publisher: Global Journals Inc. (USA). Online ISSN: 0975-4172 & Print ISSN: 0975-4350. Organizing User Search Histories. By Ravi Kumar Yandluri, Gokaraju Rangaraju Institute of Engineering & Technology, India. Abstract - Internet users continuously make queries over the web to obtain required information. They need information about various tasks and subtasks, for which they use search engines. Over a period of time they make plenty of related queries. Search engines save these queries and maintain users’ search histories. Users can view their search histories in chronological order. However, the search histories are not organized into related groups; in fact, there is no organization other than the chronological order. Recently Hwang et al. studied the problem of organizing historical search information of users into groups dynamically. This automatic grouping of user search histories can also help search engines in various applications such as collaborative search, sessionization, query alterations, result ranking and query suggestions. They proposed various techniques to achieve this. In this paper we implemented those techniques using a prototype web application built with Java technologies. The experimental results revealed that the proposed application is useful for organizing search histories. Index terms: search engine, search history, click graph, query grouping. GJCST-E Classification: H.3.5. © 2013. Ravi Kumar Yandluri. This is a research/review paper, distributed under the terms of the Creative Commons Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Internet Research
Discipline Specific Researching on the Internet. The internet is growing exponentially, and thousands of new web pages are being added each day. The upside is that you have an enormous amount of information only a few mouse clicks away. The downside is that you must refine your approach to online research in order to target the handful of websites that may be useful to you. The following search tips are designed to get you started and save you time. Happy hunting! Finding Sources. Go to www.library.unh.edu: The University of New Hampshire Library has access to dozens of online databases catering to nearly every subject you may be studying. The site also has several online research guides and tools to ensure you'll find what you need. The library’s web site is a good first choice to help narrow your research. Try several search engines: Google is the most popular search engine in the world, so that is a good place to start. There are, however, other search engines that might be of use to you: Altavista.com, askjeeves.com, ditto.com, excite.com, metacrawler.com, dogpile.com, Alltheweb.com, yahoo.com. Use the advanced search function: Most search engines have a link for “advanced search.” This can be extremely useful in narrowing your search, as the additional features enable you to search by exact phrase, date range, domain, language, date of web page update, and more. Choose your search words carefully: Use words that are specific, or unique, to what you are looking for. Some words are very common and will lead to far too many hits.
A Community-Based Approach to Personalizing Web Search
COVER FEATURE. A Community-Based Approach to Personalizing Web Search. Barry Smyth, University College Dublin. Researchers can leverage the latent knowledge created within search communities by recording users’ search activities (the queries they submit and results they select) at the community level. They can use this data to build a relevance model that guides the promotion of community-relevant results during regular Web search.
Over the past few years, current Web search engines have become the dominant tool for accessing information online. However, even today’s most successful search engines struggle to provide high-quality search results: Approximately 50 percent of Web search sessions fail to find any relevant results for the searcher. The earliest Web search engines adopted an information-retrieval view of search, using sophisticated term-based matching techniques to identify relevant documents from repeated occurrences of salient query terms. Although such techniques proved useful for identifying a set of potentially relevant results, they offered little insight into how such results could be usefully ranked.
through rates should be preferred during ranking. Unfortunately, Direct Hit’s so-called popularity engine did not play a central role on the modern search stage (although the technology does live on as part of the Teoma search engine) largely because the technology proved inept at identifying new sites or less well-traveled ones, even though they may have had more contextual relevance to a given search. Despite Direct Hit’s fate, the notion that searchers themselves could influence the ranking of results by virtue of their search activities remained a powerful one. It resonates well with the ideas that underpin the social web, a phrase often used to highlight the importance of a suite
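The idea in the excerpt above, recording which results a community's members select for each query and using those counts to promote community-relevant results, can be sketched as follows. This is a minimal illustration under stated assumptions: the class name, the relevance formula (a result's share of the community's selections for a query) and the example data are hypothetical and are not taken from the article.

```python
from collections import defaultdict


class CommunityModel:
    """Counts which results a community's members select for each query."""

    def __init__(self):
        # selections[query][url] = number of times url was selected for query
        self.selections = defaultdict(lambda: defaultdict(int))

    def record(self, query, url):
        """Log one selection of url in response to query."""
        self.selections[query][url] += 1

    def relevance(self, query, url):
        """Share of this community's selections for query that went to url."""
        counts = self.selections.get(query, {})
        total = sum(counts.values())
        return counts.get(url, 0) / total if total else 0.0

    def rerank(self, query, results):
        """Promote community-selected results; the sort is stable, so results
        with equal relevance keep the underlying engine's order."""
        return sorted(results, key=lambda url: -self.relevance(query, url))


# Hypothetical community log and engine result list, for illustration only.
model = CommunityModel()
model.record("jaguar", "cars.example")
model.record("jaguar", "cars.example")
model.record("jaguar", "cats.example")
engine_results = ["cats.example", "wiki.example", "cars.example"]
reranked = model.rerank("jaguar", engine_results)
```

Because unseen queries and unselected results score zero, the underlying engine's ranking is left unchanged wherever the community has no history.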
Resources for Patrons
Part 1. Ready Reference on the Web: Resources for Patrons. Quick! What is the best Web resource for finding a good vacuum cleaner? Getting information for a grade-school “state” report? Finding a life-saving clinical trial? People just walk up to our desks and ask us these questions every day. And we are expected to answer them in seconds, not minutes or hours. For heaven’s sake! How are we supposed to know? Well, as Robert W. Winter, Arthur G. Coons Professor of the History of Ideas, Emeritus at Occidental College in Los Angeles once told our class, “A liberal arts education will teach you what you don’t know.” In other words, as educated people, as librarians, we know that we don’t know everything, yet we are armed with tools to find out about anything. My hope is that the first section of this book can serve as one of those tools. I have tried to cover some of the most asked subject areas that I encounter every day on the job. I also point to directories of high-quality sites that can quickly connect librarians and their patrons to the answers that they need. These listings may not answer every question. But I guarantee you, if you can learn to turn to them first, your patrons will hail your alacrity and brilliance. And isn’t this glory what we librarians live for? Chapter 1. Searching and Meta-Searching the Internet. The topic of searching the Web puts me in an elegiac mood. I think back to 1994 when I was first playing with the Web during my library school internship at the Getty Research Institute.
Do Metadata Models Meet IQ Requirements?
Do Metadata Models meet IQ Requirements? Claudia Rolker, Forschungszentrum Informatik (FZI), Haid-und-Neu-Str., Karlsruhe, Germany, rolker@fzi.de; Felix Naumann, Humboldt-Universität zu Berlin, Unter den Linden, Berlin, Germany, naumann@dbis.informatik.hu-berlin.de. Abstract: Research has recognized the importance of analyzing information quality (IQ) for many different applications: The success of data integration greatly depends on the quality of the individual data. In statistical applications, poor data quality often leads to wrong conclusions. High information quality is literally a vital property of hospital information systems. Poor data quality of stock price information services can lead to economically wrong decisions. Several projects have analyzed this need for IQ metadata and have proposed a set of IQ criteria or attributes which can be used to properly assess information quality. In this paper we survey and compare these approaches. In a second step we take a look at existing prominent proposals of metadata models, especially those on the Internet. Then we match these models to the requirements of information quality modeling. Finally, we propose a quality assurance procedure for the assurance of metadata models. Introduction: The quality of information is becoming increasingly important, not only because of the rapid growth of the Internet and its implication for the information industry. Also, the anarchic nature of the Internet has made industry and researchers aware of this issue. As awareness of quality issues amongst information