Towards an Open Web Index: Lessons from the Past
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Why We Need an Independent Index of the Web ¬ Dirk Lewandowski 50 Society of the Query Reader
Politics of Search Engines 49 René König and Miriam Rasch (eds), Society of the Query Reader: Reections on Web Search, Amsterdam: Institute of Network Cultures, 2014. ISBN: 978-90-818575-8-1. Why We Need an Independent Index of the Web ¬ Dirk Lewandowski 50 Society of the Query Reader Why We Need an Independent Index of the Web ¬ Dirk Lewandowski Search engine indexes function as a ‘local copy of the web’1, forming the foundation of every search engine. Search engines need to look for new documents constantly, detect changes made to existing documents, and remove documents from the index when they are no longer available on the web. When one considers that the web com- prises many billions of documents that are constantly changing, the challenge search engines face becomes clear. It is impossible to maintain a perfectly complete and current index.2 The pool of data changes thousands of times each second. No search engine can keep up with this rapid pace of change.3 The ‘local copy of the web’ can thus be viewed as the Holy Grail of web indexing at best – in practice, different search engines will always attain a varied degree of success in pursuing this goal.4 Search engines do not merely capture the text of the documents they find (as is of- ten falsely assumed). They also generate complex replicas of the documents. These representations include, for instance, information on the popularity of the document (measured by the number of times it is accessed or how many links to the document exist on the web), information extracted from the documents (for example the name of the author or the date the document was created), and an alternative text-based 1. -
Evaluating Exploratory Search Engines : Designing a Set of User-Centered Methods Based on a Modeling of the Exploratory Search Process Emilie Palagi
Evaluating exploratory search engines : designing a set of user-centered methods based on a modeling of the exploratory search process Emilie Palagi To cite this version: Emilie Palagi. Evaluating exploratory search engines : designing a set of user-centered methods based on a modeling of the exploratory search process. Human-Computer Interaction [cs.HC]. Université Côte d’Azur, 2018. English. NNT : 2018AZUR4116. tel-01976017v3 HAL Id: tel-01976017 https://tel.archives-ouvertes.fr/tel-01976017v3 Submitted on 20 Feb 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THÈSE DE DOCTORAT Evaluation des moteurs de recherche exploratoire : Elaboration d'un corps de méthodes centrées utilisateurs, basées sur une modélisation du processus de recherche exploratoire Emilie PALAGI Wimmics (Inria, I3S) et Data Science (EURECOM) Présentée en vue de l’obtention Devant le jury, composé de : du grade de docteur en Informatique Christian Bastien, Professeur, Université de Loraine d’Université Côte d’Azur Pierre De Loor, Professeur, ENIB Dirigée par : Fabien Gandon -
Gekonnt Suchen
20 > PRAXIS > SUCH-TIPPS PCtipp, März 2020 Gekonnt suchen Wer im Internet etwas sucht, kommt nicht um Google herum. Mit den richtigen Suchtricks werden Ihre Ergebnisse viel besser. Ausserdem stellen wir einige praktische Alternativen zur Google-Suche vor. VON LUCA DIGGELMANN oogle ist die Nummer eins in Sachen raus. Sowohl über diverse Einstellungen als die Google bei der Suchanfrage lädt, oder die Websuche – ob man es mag oder nicht. auch mit Textcodes finden Sie besser, was Sie Region, aus der Google seine Inhalte primär GSowohl in Sachen Funktionalität als wirklich suchen. bezieht. auch bei der Qualität der Suchergebnisse ist der Internetgigant nur schwer zu schlagen. EINSTELLUNGEN ANPASSEN TABS VERWENDEN Deshalb gibt es von uns gleich nachfolgend Gleich zu Beginn sollten Sie die Einstellungen Direkt unter jeder Suche zeigt Google diverse viele Tipps für alle, die Google mögen. Und von Google genau durchgehen. Das lohnt sich Reiter (engl. Tabs) an. Standardmässig ist Alle für alle anderen stellen wir ab S. 23 mehrere generell bei jeder Software, so auch bei der angewählt. Dahinter reihen sich Optionen wie Alternativen vor, mit denen man auch nicht Google-Suche. Sie finden die Einstellungen auf News, Bilder, Maps oder Shopping. Die genaue schlecht fährt. der Frontseite von Google unten rechts im Reihenfolge variiert jeweils leicht, je nachdem, grauen Balken, Bild 1. welche Inhalte Google zu Ihrer Anfrage fin- Besser googeln In den Einstellungen sehen Sie verschiedene det. Klicken Sie einen der Tabs an, zeigt Menüs. Gehen Sie diese Eintrag für Eintrag Google vornehmlich Inhalte aus der gewähl- Bereits ohne Vorkenntnisse ist Google eine genau durch, besonders das Menü Sucheinstel- ten Kategorie an. -
The Autonomous Surfer
Renée Ridgway The Autonomous Surfer CAIS Report Fellowship Mai bis Oktober 2018 GEFÖRDERT DURCH RIDGWAY The Autonomous Surfer Research Questions The Autonomous Surfer endeavoured to discover the unknown unknowns of alternative search through the following research questions: What are the alternatives to Google search? What are their hidden revenue models, even if they do not collect user data? How do they deliver divergent (and qualitative) results or knowledge? What are the criteria that determine ranking and relevance? How do p2p search engines such as YaCy work? Does it deliver alternative results compared to other search engines? Is there still a movement for a larger, public index? Can there be serendipitous search, which is the ability to come across books, articles, images, information, objects, and so forth, by chance? Aims and Projected Results My PhD research investigates Google search – its early development, its technological innovation, its business model of the past 20 years and how it works now. Furthermore, I have experimented with Tor (The Onion Router) in order to find out if I could be anonymous online, and if so, would I receive diver- gent results from Google with the same keywords. For my fellowship at CAIS I decided to first research search engines that were incorporated into the Tor browser as default (Startpage, Disconnect) or are the default browser now (DuckDuckGo). I then researched search engines in my original CAIS proposal that I had come across in my PhD but hadn’t had the time to research; some are from the Society of the Query Reader (2014) and others I found en route or on colleagues’ suggestions. -
About Garlic and Onions a Little Journey…
About Garlic and Onions A little journey… Tobias Mayer, Technical Solutions Architect BRKSEC-2011 Cisco Webex Teams Questions? Use Cisco Webex Teams to chat with the speaker after the session How 1 Find this session in the Cisco Events Mobile App 2 Click “Join the Discussion” 3 Install Webex Teams or go directly to the team space 4 Enter messages/questions in the team space BRKSEC-2011 © 2020 Cisco and/or its affiliates. All rights reserved. Cisco Public 3 About Garlic and Onions We are all looking for privacy on the internet, for one or the other reason. This Session is about some technologies you can use to anonymise your network traffic, such as Tor (The Onion Router). The first part will give an introduction and explain the underlaying technology of Tor. We will take look at how you can not only use the Tor browser for access but also how the Tor network is working. We will learn how you can establish a Tor session and how we can find hidden websites and give examples of some websites...So we will enter the Darknet together. Beside Tor, we will also take a quick look at other techniques like I2P (Garlic Routing). In the last section we will make a quick sanity check what security technologies we can use to (maybe) detect such traffic in the network. This presentation is aimed at everyone who likes to learn about anonymization techniques and have a little bit of fun in the Darknet. BRKSEC-2011 © 2020 Cisco and/or its affiliates. All rights reserved. -
Distributed Indexing/Searching Workshop Agenda, Attendee List, and Position Papers
Distributed Indexing/Searching Workshop Agenda, Attendee List, and Position Papers Held May 28-19, 1996 in Cambridge, Massachusetts Sponsored by the World Wide Web Consortium Workshop co-chairs: Michael Schwartz, @Home Network Mic Bowman, Transarc Corp. This workshop brings together a cross-section of people concerned with distributed indexing and searching, to explore areas of common concern where standards might be defined. The Call For Participation1 suggested particular focus on repository interfaces that support efficient and powerful distributed indexing and searching. There was a great deal of interest in this workshop. Because of our desire to limit attendance to a workable size group while maximizing breadth of attendee backgrounds, we limited attendance to one person per position paper, and furthermore we limited attendance to one person per institution. In some cases, attendees submitted multiple position papers, with the intention of discussing each of their projects or ideas that were relevant to the workshop. We had not anticipated this interpretation of the Call For Participation; our intention was to use the position papers to select participants, not to provide a forum for enumerating ideas and projects. As a compromise, we decided to choose among the submitted papers and allow multiple position papers per person and per institution, but to restrict attendance as noted above. Hence, in the paper list below there are some cases where one author or institution has multiple position papers. 1 http://www.w3.org/pub/WWW/Search/960528/cfp.html 1 Agenda The Distributed Indexing/Searching Workshop will span two days. The first day's goal is to identify areas for potential standardization through several directed discussion sessions. -
Final Study Report on CEF Automated Translation Value Proposition in the Context of the European LT Market/Ecosystem
Final study report on CEF Automated Translation value proposition in the context of the European LT market/ecosystem FINAL REPORT A study prepared for the European Commission DG Communications Networks, Content & Technology by: Digital Single Market CEF AT value proposition in the context of the European LT market/ecosystem Final Study Report This study was carried out for the European Commission by Luc MEERTENS 2 Khalid CHOUKRI Stefania AGUZZI Andrejs VASILJEVS Internal identification Contract number: 2017/S 108-216374 SMART number: 2016/0103 DISCLAIMER By the European Commission, Directorate-General of Communications Networks, Content & Technology. The information and views set out in this publication are those of the author(s) and do not necessarily reflect the official opinion of the Commission. The Commission does not guarantee the accuracy of the data included in this study. Neither the Commission nor any person acting on the Commission’s behalf may be held responsible for the use which may be made of the information contained therein. ISBN 978-92-76-00783-8 doi: 10.2759/142151 © European Union, 2019. All rights reserved. Certain parts are licensed under conditions to the EU. Reproduction is authorised provided the source is acknowledged. 2 CEF AT value proposition in the context of the European LT market/ecosystem Final Study Report CONTENTS Table of figures ................................................................................................................................................ 7 List of tables .................................................................................................................................................. -
Online-Quellen Nutzen: Recherche Im Internet
ONLINE-QUELLEN NUTZEN: RECHERCHE IM INTERNET Katharina Gilarski, Verena Müller, Martin Nissen (Stand: 08/2020) Inhaltsverzeichnis 1 Recherche mit Universalsuchmaschinen ............................................................. 2 1.1 Clever googlen ......................................................................................................... 2 1.2 Suchoperatoren für die einfache Suche ................................................................... 2 1.3 Die erweiterte Suche mit Google .............................................................................. 3 1.4 Die umgekehrte Bildersuche mit Google .................................................................. 4 1.5 Das Geheimnis der Treffersortierung in Google ....................................................... 5 1.6 Alternative universelle Suchmaschinen .................................................................... 6 1.7 Surface vs. Deep Web ............................................................................................. 9 1.8 Vor- und Nachteile von universellen Suchmaschinen ............................................. 10 1.9 Bewertung von Internetquellen ............................................................................... 10 2 Recherche mit wissenschaftlichen Suchmaschinen ......................................... 11 2.1 Google Scholar: Googles Suchmaschine für Wissenschaftler ................................ 11 2.2 Suchmöglichkeiten von Google Scholar ................................................................ -
Web-Page Indexing Based on the Prioritize Ontology Terms
Web-page Indexing based on the Prioritize Ontology Terms Sukanta Sinha1, 4, Rana Dattagupta2, Debajyoti Mukhopadhyay3, 4 1Tata Consultancy Services Ltd., Victoria Park Building, Salt Lake, Kolkata 700091, India [email protected] 2Computer Science Dept., Jadavpur University, Kolkata 700032, India [email protected] 3Information Technology Dept., Maharashtra Institute of Technology, Pune 411038, India [email protected] 4WIDiCoReL Research Lab, Green Tower, C-9/1, Golf Green, Kolkata 700095, India Abstract. In this world, globalization has become a basic and most popular human trend. To globalize information, people are going to publish the docu- ments in the internet. As a result, information volume of internet has become huge. To handle that huge volume of information, Web searcher uses search engines. The Web-page indexing mechanism of a search engine plays a big role to retrieve Web search results in a faster way from the huge volume of Web re- sources. Web researchers have introduced various types of Web-page indexing mechanism to retrieve Web-pages from Web-page repository. In this paper, we have illustrated a new approach of design and development of Web-page index- ing. The proposed Web-page indexing mechanism has applied on domain spe- cific Web-pages and we have identified the Web-page domain based on an On- tology. In our approach, first we prioritize the Ontology terms that exist in the Web-page content then apply our own indexing mechanism to index that Web- page. The main advantage of storing an index is to optimize the speed and per- formance while finding relevant documents from the domain specific search engine storage area for a user given search query. -
Merging Multiple Search Results Approach for Meta-Search Engines
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by D-Scholarship@Pitt MERGING MULTIPLE SEARCH RESULTS APPROACH FOR META-SEARCH ENGINES By Khaled Abd-El-Fatah Mohamed B.S-------- Cairo University, Egypt, 1995 M.A-------Cairo University, Egypt, 1999 M.A------- University of Pittsburgh 2001 Submitted to the Graduate Faculty of School of Information Sciences in Partial Fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh 2004 UNIVERSITY OF PITTSBURGH INFORMATION SCIENCES This dissertation was presented by Khaled Abd-El-Fatah Mohamed It was defended on Janauary 29, 2004 and approved by Chris Tomer, PhD, Associate Professor, DLIS Jose-Marie Griffiths, PhD, Professor, DLIS Don King, Research Professor, DLIS Amy Knapp, PhD, ULS Dissertation Director: Chris Tomer, PhD, Associate Professor MERGING MULTIPLE SEARCH RESULTS APPROACH FOR META-SEARCH ENGINES Khaled A. Mohamed, PhD University of Pittsburgh, 2004 Meta Search Engines are finding tools developed for enhancing the search performance by submitting user queries to multiple search engines and combining the search results in a unified ranked list. They utilized data fusion technique, which requires three major steps: databases selection, the results combination, and the results merging. This study tries to build a framework that can be used for merging the search results retrieved from any set of search engines. This framework based on answering three major questions: 1. How meta-search developers could define the optimal rank order for the selected engines. 2. How meta-search developers could choose the best search engines combination. 3. What is the optimal heuristic merging function that could be used for aggregating the rank order of the retrieved documents form incomparable search engines. -
Awareness Watch™ Newsletter V18N2 February 2020
Awareness Watch™ Newsletter By Marcus P. Zillman, M.S., A.M.H.A. http://www.AwarenessWatch.com/ V18N2 February 2020 Welcome to the V18N2 February 2020 issue of the Awareness Watch™ Newsletter. This newsletter is available as a complimentary subscription and will be issued monthly. Each newsletter will feature the following: Awareness Watch™ Featured Report Awareness Watch™ Spotters Awareness Watch™ Book/Paper/Article Review Subject Tracer™ Information Blogs I am always open to feedback from readers so please feel free to email with all suggestions, reviews and new resources that you feel would be appropriate for inclusion in an upcoming issue of Awareness Watch™. This is an ongoing work of creativity and you will be observing constant changes, constant updates knowing that “change” is the only thing that will remain constant!! Awareness Watch™ Featured Report This month’s featured report covers Privacy Resources 2020 and is a comprehensive listing of privacy resources, tools and alerts including search engines, directories, subject guides and index resources and sites on the Internet available for the 2020 year. The below list of the privacy sources is taken partially from my Subject Tracer™ white paper titled Privacy Resources 2020 and is constantly updated with Subject Tracer™ bots at the following URL: http://PrivacyResources.info/ These resources and sources will help you to discover the many new pathways available through the Internet to find the latest new and existing privacy research, resources, sources, tools and sites. As this site is constantly updated it would be to your benefit to bookmark and return to the above URL frequently. -
BRKSEC-2011.Pdf
#CLUS About Garlic and Onions A little journey… Tobias Mayer, Technical Solutions Architect BRKSEC-2011 #CLUS Me… CCIE Security #14390, CISSP & Motorboat driving license… Working in Content Security & TLS Security tmayer{at}cisco.com Writing stuff at “blogs.cisco.com” #CLUS BRKSEC-2011 © 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public 3 Agenda • Why anonymization? • Using Tor (Onion Routing) • How Tor works • Introduction to Onion Routing • Obfuscation within Tor • Domain Fronting • Detect Tor • I2P – Invisible Internet Project • Introduction to Garlic Routing • Conclusion #CLUS BRKSEC-2011 © 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public 4 Cisco Webex Teams Questions? Use Cisco Webex Teams (formerly Cisco Spark) to chat with the speaker after the session How 1 Find this session in the Cisco Events App 2 Click “Join the Discussion” 3 Install Webex Teams or go directly to the team space 4 Enter messages/questions in the team space Webex Teams will be moderated cs.co/ciscolivebot#BRKSEC-2011 by the speaker until June 18, 2018. #CLUS © 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public 5 Different Intentions Hide me from Government! Hide me from ISP! Hide me from tracking! Bypass Corporate Bypass Country Access Hidden policies restrictions (Videos…) Services #CLUS BRKSEC-2011 © 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public 6 Browser Identity Tracking does not require a “Name” Tracking is done by examining parameters your browser reveals https://panopticlick.eff.org #CLUS BRKSEC-2011 © 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public 7 Proxies EPIC Browser #CLUS BRKSEC-2011 © 2018 Cisco and/or its affiliates.