Federated Search for Heterogeneous Environments

Total Page:16

File Type:pdf, Size:1020Kb

Federated Search for Heterogeneous Environments Federated Search for Heterogeneous Environments Jaime Arguello CMU-LTI-11-008 Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Ave., Pittsburgh, PA 15213 www.lti.cs.cmu.edu Thesis Committee: Jamie Callan (chair) Jaime Carbonell Yiming Yang Fernando Diaz (Yahoo! Research) Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy In Language and Information Technologies © 2011, Jaime Arguello Para Sophia, con todo mi amor. Acknowledgments This dissertation would have not been possible without the persistent guidance and en- couragement of my mentors. I owe a big ‘thank you’ to my advisor, Jamie Callan. It was during the Fall of 2004, when I took two half-semester courses taught by Jamie (Digi- tal Libraries and Text Data Mining), that I discovered the field of Information Retrieval. Seven years later, I am writing my dissertation acknowledgments. Jamie’s advice on both research and life in general has been a great source of insight and support. His feedback on research papers, presentations, lecture slides, proposals, and this disserta- tion has taught me a great deal about communicating effectively. From Jamie’s example, I have learned valuable lessons on research, teaching, and mentoring. I would like to thank Jaime Carbonell, Yiming Yang, and Fernando Diaz for agreeing to be in my thesis committee. Their feedback was critical in making this dissertation stronger. Fernando Diaz helped shape many of the ideas presented in this dissertation. It was during an internship with Fernando at Yahoo! where I began working on vertical selection. I enjoyed it so much I returned for a second internship a year later. I have been fortunate to have had Fernando as a mentor and collaborator ever since. I wish to thank the rest of the group at Yahoo! Labs Montreal for making my two in- ternships so rewarding. I was lucky to have had Jean-François Crespo as a manager, who provided everything necessary to run experiments quickly. Jean-François Paiement was a helpful collaborator on the work on domain adaptation for vertical selection. Hughes Buchard, Alex Rochette, Remi Kwan, and Daniel Boies provided much needed feedback, help with tools, and a model work environment. Merci pour tout! I would like to thank Carolyn Rosé for taking me in as a masters student and for guiding me through my first two years doing research. More generally, I would like to thank all the LTI faculty for putting together such an engaging curriculum and the LTI staff for keeping the place running and always being patient with my questions. Grad school would have not been the same without my friendship with Jon Elsas. It is quite possible I ran every research idea through Jon. Those hundreds of lunch and coffee-break discussions are much appreciated. Thank you to all my other friends at CMU, who provided such valuable feedback on every practice talk I subjected them to. And, thank you to my officemates, Grace Hui-Yang and Anagha Kulkarni, for providing such a nice environment to work in. I would also like to thank all my friends outside of CMU for taking me away from work (admittedly, without much convincing) and for making those times so worthwhile. I am grateful to Tori Ames for her friendship and support during my last year as a PhD student. 1 2 Finally and most importantly, I would like to thank my family for their love and encouragement. Leaping into the unknown is easier when you have someone there to pick you up and cheer you on. None of this would have been possible without my mother, who taught me the value of loving what you do. Abstract In information retrieval, federated search is the problem of automatically searching across multiple distributed collections or resources. It is typically decomposed into two subsequent steps: deciding which resources to search (resource selection) and deciding how to combine results from multiple resources into a single presentation (results merg- ing). Federated search occurs in different environments. This dissertation focuses on an environment that has not been deeply investigated in prior work. The growing heterogeneity of digital media and the broad range of user information needs that occur in today’s world have given rise to a multitude of systems that specialize on a specific type of search task. Examples include search for news, images, video, local businesses, items for sale, and even social-media interactions. In the Web search domain, these specialized systems are called verticals and one important task for the Web search engine is the prediction and integration of relevant vertical content into the Web search results. This is known as aggregated web search and is the main focus on this dissertation. Providing a single-point of access to all these diverse systems requires federated search solutions that can support result-type and retrieval-algorithm heterogeneity. This type of heterogeneity violates major assumptions made by state-of-the-art resource selection and results merging methods. While existing resource selection methods derive predictive evidence exclusively from sampled resource content, the approaches proposed in this dissertation draw on ma- chine learning as a means to easily integrate various different types of evidence. These include, for example, evidence derived from (sampled) vertical content, vertical query- traffic, click-through information, and properties of the query string. In order to operate in a heterogeneous environment, we focus on methods that can learn a vertical-specific relationship between features and relevance. We also present methods that reduce the need for human-produced training data. Existing results merging methods formulate the task as score normalization. In a more heterogeneous environment, however, combining results into a single presentation requires satisfying a number of layout constraints. The dissertation proposes a novel formulation of the task: block ranking. During block-ranking, the objective is to rank sequences of results that must appear grouped together (vertically or horizontally) in the final presentation. Based on this formulation, the dissertation proposes and empiri- cally validates a cost-effective methodology for evaluating aggregated web search results. Finally, it proposes the use of machine learning methods for the task of block-ranking. 3 Contents 1 Introduction 1 1.1 Unique Properties of Aggregated Web Search . 3 1.2 The Goal and Contribution of this Dissertation . 6 1.2.1 Summary of Contributions . 10 1.3 Impact on Other Applications . 11 1.4 Dissertation Outline . 12 2 Prior Research in Federated Search 13 2.1 Cooperative vs. Uncooperative Federated Search . 13 2.2 Resource Representation . 14 2.2.1 How much to sample . 14 2.2.2 What to sample . 15 2.2.3 When to re-sample . 16 2.2.4 Combining collection representations. 16 2.3 Resource Selection . 16 2.3.1 Problem Formulation and Evaluation . 16 2.3.2 Modeling Query-Collection Similarity . 17 2.3.3 Modeling the Relevant Document Distribution . 19 2.3.4 Modeling the Document Score Distribution . 20 2.3.5 Combining Collection Representations . 22 2.3.6 Modeling Search Engine Effectiveness . 23 2.3.7 Query Classification . 25 2.3.8 Machine Learned Resource Selection . 26 2.4 Results Merging . 27 2.5 Summary . 28 3 Vertical Selection 30 3.1 Formal Task Definition . 32 3.2 Classification Framework . 32 3.3 Features . 32 3.3.1 Corpus Features . 33 3.3.2 Query-Log Features . 36 3.3.3 Query Features . 38 3.3.4 Summary of Features . 39 3.4 Methods and Materials . 39 4 CONTENTS 5 3.4.1 Verticals . 40 3.4.2 Queries . 40 3.4.3 Evaluation Metric . 41 3.4.4 Implementation Details . 42 3.4.5 Single-evidence Baselines . 42 3.5 Experimental Results . 42 3.6 Discussion . 44 3.6.1 Feature Ablation Study . 44 3.6.2 Per Vertical Performance . 45 3.7 Related Work in Vertical Selection . 46 3.8 Summary . 48 4 Domain Adaptation for Vertical Selection 50 4.1 Formal Task Definition . 51 4.2 Related Work . 51 4.3 Vertical Adaptation Approaches . 54 4.3.1 Gradient Boosted Decision Trees . 54 4.3.2 Learning a Portable Model . 55 4.3.3 Adapting a Model to the Target Vertical . 58 4.4 Features . 59 4.4.1 Query Features . 59 4.4.2 Query-Vertical Features . 59 4.5 Methods and Materials . 60 4.5.1 Verticals . 60 4.5.2 Queries . 61 4.5.3 Evaluation Metrics . 61 4.5.4 Unsupervised Baselines . 62 4.6 Experimental Results . 63 4.7 Discussion . 63 4.8 Summary . 68 5 Vertical Results Presentation: Evaluation Methodology 69 5.1 Related Work in Aggregated Web Search Evaluation . 71 5.1.1 Understanding User Behavior . 71 5.1.2 Aggregated Web Search Evaluation . 72 5.2 Layout Assumptions . 74 5.3 Task Formulation: Block Ranking . 75 5.4 Metric-Based Evaluation Methodology . 75 5.4.1 Collecting Block-Pair Preference Judgements . 76 5.4.2 Deriving the Reference Presentation . 77 5.4.3 Measuring Distance from the Reference . 79 5.5 Methods and Materials . 80 5.5.1 Block-Pair Preference Assessment . 81 6 CONTENTS 5.5.2 Verticals . 81 5.5.3 Queries . 82 5.5.4 Empirical Metric Validation . 83 5.6 Experimental Results . 83 5.6.1 Assessor Agreement on Block-Pair Judgements . 84 5.6.2 Assessor Agreement on Presentation-Pair Judgements . 85 5.6.3 Metric Validation Results . 86 5.7 Discussion . 88 5.8 Summary . 89 6 Vertical Results Presentation: Approaches 92 6.1 Formal Task Definition . 93 6.2 Related Work . 94 6.3 Features . ..
Recommended publications
  • Key Issue Cervone 1L
    Serials – 20(1), March 2007 Key issue Key issue Federated searching: today, tomorrow and the future (?) FRANK CERVONE Assistant University Librarian for Information Technology Northwestern University, Evanston, IL, USA Although for many people it may seem that authorization, thereby allowing patrons the ability federated search software has been around forever, to use the resources from wherever they may it has only been a little over three years since happen to be. federated search burst on the library scene. In that time, the software has been evolving in many ways as the landscape of Internet-based searching has Standard features and evolving trends shifted. When many libraries decided to install federated In the current generation of federated searching search software, they were trying to address some products, standard features include the ability to of the issues that have confounded searchers since use multiple protocols for gathering data from the introduction of CD ROM-based indexes in the information products. In addition to the ubiquitous early 1990s. For example, it has been known for Z39.50 protocol, newer protocols such as SRU/SRW some time that when patrons have multiple and OAI-PMH are rapidly becoming basic require- databases available to them, they typically prefer ments for all federated search engines, regardless to use a single interface rather than having to learn of scale. difference interfaces for each database. Further- As the needs of researchers and faculty evolve, more, most people want to have a way to search advanced features are being built into next- those multiple resources simultaneously and then generation federated searching products, such as consolidate the search results into a single, ordered integration of the federated searching software with list.
    [Show full text]
  • Federated Search: New Option for Libraries in the Digital Era Shailendra Kumar Gareema Sanaman Namrata Rai
    International CALIBER-2008 267 Federated Search: New Option for Libraries in the Digital Era Shailendra Kumar Gareema Sanaman Namrata Rai Abstract The article describes the concept of federated searching and demarcates the difference between metasearching and federated searching which are synonymously used. Due to rapid growth of scholarly information, need of federated searching arises. Advantages of federated search have been described along with the search model indicating old search model and federated search model. Various technologies used for federated searching have been discussed. While working with search, the selection of federated search engine and how it works in libraries and other institutions are explained. Article also covers various federated search providers and at the end system advantages and drawbacks in federated search have been listed. Keywords: Federated Search, Meta Search, Google Scholar, SCOPUS, XML, SRW 1. Introduction In the electronic information environment one of the responses to the problem of bringing large amounts of information together has been for libraries to introduce portals. A portal is a gateway, or a point where users can start their search for information on the web. There are a number of different types of portals, for example universities have been introducing “institutional portals”, which can be described as “a layer which aggregates, integrates, personalizes and presents information, transactions and applications to the user according to their role and preferences” (Dolphin, Miller & Sherratt, 2002). A second type of portal is a “subject portals”. A third type of portal is a “federated search tool” which brings together the resources to a library subscribes and allows cross-searching of these resources.
    [Show full text]
  • Federal Depository Library Handbook
    US Government Printing Office Federal Depository Library Program Federal Depository Library Handbook Federal Depository Library Handbook US Government Printing Office Library Services & Content Management, Library Planning & Development 732 N. Capitol St. NW, Washington, DC 20401 202-512-1800 • 866-512-1800 • Fax 202-512-2104 [email protected] Text highlighted in yellow refers to legal and/or program requirements for depository libraries. Preface The Federal Depository Library Handbook (Handbook), describes requirements of Federal depository libraries, both legal and those prescribed by the Government Printing Office (GPO). Additionally the Handbook provides guidance to libraries on how they can meet the requirements of the Federal Depository Library Program (FDLP). This information was previously found in two publications. Instructions to Depository Libraries contained the FDLP requirements while practical guidance for carrying out FDLP operations was in the Federal Depository Library Manual and its supplements. The Instructions and Manual are superseded by the Handbook. In accepting the privilege of Federal depository status for their libraries, directors agreed to abide by all the laws and requirements governing officially designated depository libraries. Recognizing this, chapter 2 of the Handbook outlines the legal requirements and each chapter of the Handbook includes a section for library administrators. These sections are also consolidated in Appendix C. Depository coordinators MUST ensure that all personnel involved in any aspect of depository operations are aware of the obligations of depository libraries and of the importance of the Handbook. Depository staff should review the Handbook on a regular basis and any questions can be directed to askGPO (http://www.gpoaccess.gov/help/index.html).
    [Show full text]
  • From Federated to Aggregated Search
    From federated to aggregated search Fernando Diaz, Mounia Lalmas and Milad Shokouhi [email protected] [email protected] [email protected] Outline Introduction and Terminology Architecture Resource Representation Resource Selection Result Presentation Evaluation Open Problems Bibliography 1 Outline Introduction and Terminology Architecture Resource Representation Resource Selection Result Presentation Evaluation Open Problems Bibliography Introduction What is federated search? What is aggregated search? Motivations Challenges Relationships 2 A classical example of federated search One query Collections to be searched www.theeuropeanlibrary.org A classical example of federated search Merged list www.theeuropeanlibrary.org of results 3 Motivation for federated search Search a number of independent collections, with a focus on hidden web collections Collections not easily crawlable (and often should not) Access to up-to-date information and data Parallel search over several collections Effective tool for enterprise and digital library environments Challenges for federated search How to represent collections, so that to know what documents each contain? How to select the collection(s) to be searched for relevant documents? How to merge results retrieved from several collections, to return one list of results to the users? Cooperative environment Uncooperative environment 4 From federated search to aggregated search “Federated search on the web” Peer-to-peer network connects distributed peers (usually for
    [Show full text]
  • Federated Search of Text Search Engines in Uncooperative Environments
    Federated Search of Text Search Engines in Uncooperative Environments Luo Si Language Technology Institute School of Computer Science Carnegie Mellon University [email protected] Thesis Committee: Jamie Callan (Carnegie Mellon University, Chair) Jaime Carbonell (Carnegie Mellon University) Yiming Yang (Carnegie Mellon University) Luis Gravano (Columbia University) 2006 i ACKNOWLEDGEMENTS I would never have been able to finish my dissertation without the guidance of my committee members, help from friends, and support from my family. My first debt of gratitude must go to my advisor, Jamie Callan, for taking me on as a student and for leading me through many aspects of my academic career. Jamie is not only a wonderful advisor but also a great mentor. He told me different ways to approach research problems. He showed me how to collaborate with other researchers and share success. He guided me how to write conference and journal papers, had confidence in me when I doubted myself, and brought out the best in me. Jamie has given me tremendous help on research and life in general. This experience has been extremely valuable to me and I will treasure it for the rest of my life. I would like to express my sincere gratitude towards the other members of my committee, Yiming Yang, Jamie Carbonell and Luis Gravano, for their efforts and constructive suggestions. Yiming has been very supportive for both my work and my life since I came to Carnegie Mellon University. Her insightful comments and suggestions helped a lot to improve the quality of this dissertation. Jamie has kindly pointed out different ways to solve research problems in my dissertation, which results in many helpful discussions.
    [Show full text]
  • Numbers Are Footnotes for Positions Referenced Below
    Gary L. Wade 320 Crescent Village Circle Unit 1396 San Jose, CA 95134 [email protected] 949-505-3223 KEY EXPERIENCES • Passion for developing solutions for Apple Inc.’s platforms (macOS, iOS, tvOS, watchOS) • Innovative in usability design and testing, and refactoring for performance • Design and development of multi-platform client/server/cloud software solutions • Facilitate instruction for technical and non-technical staff including one-on-one mentoring CURRENT TALENT SUMMARY (numbers are footnotes for positions referenced below) • Adobe Photoshop (24,17,10,9,8,1) • Macintosh (Mac OS X) SDK (28,27,26,21,20, • AppKit/macOS (28,27,26,21,20,19,18,17,1) 19,18,17,16,15,14,13,12,11,8,7,6,5,4,2,1) • C (28,27,26,24,23,22,21,20,19,18,17,16,15,14, • Objective-C (28,27,26,24,23,22,21,20,19,18, 13,12,11,8,7,6,5,4,1) 17,1) • C++ (21,20,18,17,16,15,14,13,12,11,8,7,6,5,1) • Perforce (21,20,15) • Client/Server/Cloud (28,27,26,25,24,22,21,20, • Quality Assurance (25,24,17,3,1) 19,18,17,16,13,7,6,4) • QuickTime/AVKit (25,24,22,17,13,10,9,6,5) • Cocoa (28,27,26,25,24,23,22,21,20,19,18,17,1) • REST (28,27,26,25,24,22,21) • CoreFoundation (28,27,26,22,21,20,18,17,16, • Sales support (21,10,9,6,1) 15,14,13,12,1) • sqlite (23) • Cross-platform (28,27,21,20,17,15,14,13,12,8, • Subversion (27,22,19,18,1) 7,6,5,4,2) • Swift (28,27,26,25,24) • git (28,27,26,25,24,23,22) • UIKit/iOS (28,27,26,24,23,22,1) • HTML (28,27,26,25,17,14,12,10,9,1) • UIKit/tvOS (25) • Int.
    [Show full text]
  • By Michael L. Black Dissertation
    TRANSPARENT CULTURES: IMAGINED USERS AND THE POLITICS OF SOFTWARE DESIGN (1975-2012) BY MICHAEL L. BLACK DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in English in the Graduate College of the University of Illinois at Urbana-Champaign, 2014 Urbana, Illinois Doctoral Committee: Professor Robert Markley, Chair Associate Professor Ted Underwood Associate Professor Melissa Littlefield Associate Professor Spencer Schaffner Associate Professor John T. Newcomb ii Abstract The rapid pace of software’s development poses serious challenges for any cultural history of computing. While digital media studies often sidestep historicism, this project asserts that computing’s messy, and often hidden, history can be studied using digital tools built to adapt text-mining strategies to the textuality of source code. My project examines the emergence of personal computing, a platform underlying much of digital media studies but that itself has received little attention outside of corporate histories. Using an archive of technical papers, professional journals, popular magazines, and science fiction, I trace the origin of design strategies that led to a largely instrumentalist view of personal computing and elevated “transparent design” to a privileged status. I then apply text-mining tools that I built with this historical context in mind to study source code critically, including those features of applications hidden by transparent design strategies. This project’s first three chapters examine how and why strategies of information hiding shaped consumer software design from the 1980s on. In Chapter 1, I analyze technical literature from the 1970s and 80s to show how cognitive psychologists and computer engineers developed an ideal of transparency that discouraged users from accessing information structures underlying personal computers.
    [Show full text]
  • 3/1996 Info1mat1on• Technology
    OCCASION This publication has been made available to the public on the occasion of the 50th anniversary of the United Nations Industrial Development Organisation. DISCLAIMER This document has been produced without formal United Nations editing. The designations employed and the presentation of the material in this document do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations Industrial Development Organization (UNIDO) concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries, or its economic system or degree of development. Designations such as “developed”, “industrialized” and “developing” are intended for statistical convenience and do not necessarily express a judgment about the stage reached by a particular country or area in the development process. Mention of firm names or commercial products does not constitute an endorsement by UNIDO. FAIR USE POLICY Any part of this publication may be quoted and referenced for educational and research purposes without additional permission from UNIDO. However, those who make use of quoting and referencing this publication are requested to follow the Fair Use Policy of giving due credit to UNIDO. CONTACT Please contact [email protected] for further information concerning UNIDO publications. For more information about UNIDO, please visit us at www.unido.org UNITED NATIONS INDUSTRIAL DEVELOPMENT ORGANIZATION Vienna International Centre, P.O. Box 300, 1400 Vienna, Austria Tel: (+43-1) 26026-0 · www.unido.org · [email protected] (v), 6 9 p · 21850 EMERGING TECHNOLOGY SERIES 3/1996 Info1mat1on• Technology UNITED NATIONS INDUSTRIAL DEVEWPMENT ORGANIZATION Vienna, 1997 TO OUR READERS EMERGING TECHNOLOGY This issue of lnfonnation Technology brings a cover paper on intranets.
    [Show full text]
  • Federated Search As a Transformational Technology Enabling Knowledge Discovery: the Role of Worldwidescience.Org
    Federated search as a transformational technology enabling knowledge discovery: the role of WorldWideScience.org ABSTRACT Purpose ­ To describe the work of the Office of Scientific and Technical Information (OSTI) in the U.S. Department of Energy Office of Science and OSTI’s development of the powerful search engine, WorldWideScience.org. With tools such as Science.gov and WorldWideScience.org, the patron gains access to multiple, geographically dispersed deep web databases and can search all of the constituent sources with a single query. Approach – Historical and descriptive Findings – That WorldWideScience.org fills a unique niche in discovering scientific material in an information landscape that includes search engines such as Google and Google Scholar. Value – One of the few articles to describe in depth the important work being done by the U.S. Office of Scientific and Technical Information in the field of search and discovery Paper type – Review Keywords – OSTI, WorldWideScience, ScienceAccelerator.gov, Science.gov, search engines, federated search, Google, multilingual Th h e dd ee vv ee ll oo pp me e nn t oo f OO SS TI Established in 1947, the U.S. Department of Energy (DOE) Office of Scientific and Technical Information (OSTI) (http://www.osti.gov/) fulfills the agency’s responsibilities to collect, preserve, and disseminate scientific and technical information (STI) emanating from DOE research and development (R&D) activities. OSTI was founded on the principle that science progresses only if knowledge is shared. OSTI’s mission is to advance science and sustain creativity by making R&D findings available and useful to DOE and other researchers and the public.
    [Show full text]
  • School Board of National School Boards Association
    DOCUMENT RESUME ED 410 678 EA 028 578 TITLE Bringing Tomorrow's Technology to You Today: School Board of Tomorrow Resource Guide. INSTITUTION National School Boards Association, Alexandria, VA. PUB DATE 1997-04-00 NOTE 34p.; Guide to an exhibit at the Annual Meeting of the National School Boards Association (Anaheim, CA, April 26-29, 1997). PUB TYPE Guides Non-Classroom (055) EDRS PRICE MF01/PCO2 Plus Postage. DESCRIPTORS *Boards of Education; *Computer Mediated Communication; Computer Networks; *Computer System Design; Corporate Support; *Educational Technology; Electronic Mail; Elementary Secondary Education; Information Management; Information Networks; Information Services; Information Systems; *Information Technology; *Telecommunications ABSTRACT The National School Boards Association (NSBA), the National School Boards Foundation, NSBA's Institute for the Transfer of Technology to Education, and Apple Computer, Inc., launched "The School Board of Tomorrow Exhibit" at NSBA's 1996 annual conference and exposition in Orlando, Florida. This handbook summarizes the communication technologies featured in the exhibit. The first part provides an overview of the five different environments simulated in the exhibit: a school board member's home office, the family education network (FEN), a superintendent's office, a school board meeting room, and a community forum. The second part of this guide contains five selected NSBA articles from "The Electronic School." The articles offer advice and information on hiring technical constOtants for a school district, conducting successful bond campaigns, using e-mail to conduct school board business, and acquiring funding for technology. A list of technology providers is included. (LMI) ******************************************************************************** Reproductions supplied by EDRS are the best that can be made from the original document.
    [Show full text]
  • Aquene Freechild Submission
    Different Comments: Please strengthen Voluntary Voting System Guidelines First Name Last Name Email Date submitted Comment for interference through cyberattacks, it is imperative the VVSG prohibit connectivity to the public Internet through wireless modems or other means. We want to ban modems in vote counting machines both to protect data and to prevent manipulation. Therefore, we urge the Commission to add the following to the guideline under Principle 13: DATA PROTECTION: 'The voting system does not use wireless technology or connect to any public jenbigelow@ 2019-05-02 telecommunications infrastructure.' Indeed, eliminating Jennifer Bigelow hotmail.com 23:32:32 GMT wireless modems and internet connectivity will not the draft Voluntary Voting System Guidelines (VVSG) and commend the robust principles and guidelines for software independence, auditability and ballot secrecy. Given the fact that our election systems are being targeted for interference through cyberattacks, it is imperative the VVSG also prohibit connectivity to the public Internet through wireless modems or other means. We want to ban modems in vote counting machines both to protect data and to prevent manipulation. Therefore, we urge the Commission to add the following to the guideline under bobjanz01@ 2019-05-03 Principle 13: DATA PROTECTION: 'The voting system does Bob JANZ gmail.com 00:43:14 GMT not use wireless technology or connect to any public computer science. As such I speak with some authority when I say I strongly support the draft Voluntary Voting System Guidelines (VVSG). I commend the robust principles and guidelines for software independence, auditability and ballot secrecy. Given the fact that our election systems are being targeted for interference through cyberattacks, it is imperative the VVSG also prohibit connectivity to the public Internet through wireless modems or other means.
    [Show full text]
  • Clustering and Ranking
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Electronic Thesis and Dissertation Archive - Università di Pisa Universita` degli Studi di Pisa Dipartimento di Informatica Dottorato di Ricerca in Informatica Ph.D. Thesis On Two Web IR Boosting Tools: Clustering and Ranking Antonino Gull`ı Supervisor Referee Prof. Paolo Ferragina Prof. Dino Pedreschi Referee Prof. Francesco Romani May 6, 2006 Abstract This thesis investigates several research problems which arise in modern Web Information Retrieval (WebIR). The Holy Grail of modern WebIR is to find a way to organize and to rank results so that the most “relevant” come first. The first break-through technique was the exploitation of the link structure of the Web graph in order to rank the result pages, using the well-known Pagerank and Hits algorithms. This link-analysis approaches have been improved and extended, but yet they seem to be insufficient inadequate to provide satisfying search experience. In many situations a flat list of ten search results is not enough, and the users might desire to have a larger number of search results grouped on-the-fly in folders of similar topics. In addition, the folders should be annotated with meaningful labels for rapid identification of the desired group of results. In other situations, users may have different search goals even when they express them with the same query. In this case the search results should be personalized according to the users’ on-line activities. In order to address this need, we will discuss the algorithmic ideas behind SnakeT, a hierarchical clustering meta-search engine which personalizes searches according to the clusters selected on-the-fly by users.
    [Show full text]