Jaime Teevan

JAIME TEEVAN
[email protected]
http://www.teevan.org

Jaime Teevan is a Senior Researcher at Microsoft Research and an Affiliate Assistant Professor at the University of Washington. Working at the intersection of information retrieval, human computer interaction, and social media, she studies people’s information seeking activities. Much of her research focuses on the social and temporal context of information use, and she developed the first personalized search algorithm used by Bing. Her accomplishments have been honored with Technology Review TR35 Young Innovator and Borg Early Career Awards. Jaime has published numerous technical articles, including several books and best papers. She received a Ph.D. from MIT and a B.S. from Yale University.

EDUCATION

Massachusetts Institute of Technology, Cambridge, MA
Ph.D., Electrical Engineering and Computer Science, January 2007.
Thesis: Supporting Finding and Re-Finding Through Personalization
Advisor: Prof. David R. Karger
Committee Members: Prof. Mark S. Ackerman, Dr. Susan T. Dumais, Prof. Robert C. Miller
S.M., Electrical Engineering and Computer Science, June 2001.
Thesis: Improving Information Retrieval with Textual Analysis: Bayesian Models and Beyond
Advisor: Prof. David R. Karger

Yale University, New Haven, CT
B.S., Computer Science, May 1998. Cum laude, with distinction in major.
Senior thesis: Automatically Creating High Quality Internet Directories (Idea sold to Infoseek)
Advisor: Prof. Gregory D. Hager

PROFESSIONAL EXPERIENCE

Senior Researcher, Microsoft Research, 2012 – Present.
Researcher, Microsoft Research, 2006 – 2012.
Studied the social and temporal context of information use using large-scale log analysis. Developed the first personalized search algorithm used by Bing.

Affiliate Professor, Information School, University of Washington, 2013 – Present.
Affiliate Assistant Professor, Information School, University of Washington, 2012 – 2013.
Mentored students and lectured in graduate-level courses. Explored successful approaches to personal information management.

Research Assistant, MIT, Computer Science and Artificial Intelligence Laboratory, 1999 – 2006.
Developed a tool to support re-finding in dynamic information environments. Devised a generative model for information retrieval that better matches real text data than previous naïve Bayesian models.

Research Intern, Microsoft Research, Spring 2004.
Investigated the different things people mean by the same query. Developed a system to personalize search results by implicitly inferring the user’s intent based on previously encountered information.

Software Engineer, Infoseek, Summer 1997, July 1998 – August 1999.
Recipient Go Getter Award. Researched Internet organization and methods for determining webpage quality. Lead engineer for the software controlling all inter- and intra-application navigation.

AWARDS AND HONORS

Borg Early Career Award, CRA-W, 2014.
Senior Leader Bench Program, Microsoft, 2012 – present.
Senior Member, ACM, 2013.
Delphi Fellow, Big Think, 2011.
Corporate R&D Accelerated Development Program, Microsoft, 2010 – 2011.
Gold Star, Microsoft, 2010.
TR 35 2009 Young Innovator, Technology Review, 2009.
Financial Technology Option (FTO), MIT Sloan School, 2003.
Go Getter Award, Infoseek Corporation, 1999.
Computing Research Association Outstanding Undergraduate Award, honorable mention, 1998.

BEST PAPER AWARDS

Notable Article, Computing Review, 2013 (with Radinsky, Svore, Dumais, Shokouhi, Horvitz).
Honorable Mention, ICWSM 2013 (with Kairam, Morris, Liebling, Dumais).
Honorable Mention, CHI 2013 (with Adar, Tan).
Best Paper Nominee, CSCW 2013 (with Hehmeyer).
Honorable Mention, CHI 2012 (with Bernstein, Dumais, Liebling, Horvitz).
Best Search Marketing Paper, 2010 (with Dumais, Horvitz).
Best Paper, CHI 2010 (with Dumais, Liebling).
Best Paper Nominee, WSDM 2010 (with Tyler).
Best Paper, Search Marketing 2010 (with Dumais, Horvitz).
Best Student Paper, WSDM 2009 (with Adar, Dumais, Elsas).
Best Paper, CHI 2008 (with Adar, Dumais).

GRADUATE FELLOWSHIPS

National Science Foundation Graduate Research Fellowship, 1999 – 2003.
National Defense Science and Engineering Graduate Fellowship, honorable mention, 1998.

AWARDS FROM YALE UNIVERSITY

Cum laude, with distinction in major, 1998.
Master’s Cup, Timothy Dwight College, 1998.
J. Edward Meeker Prize for Excellence in Freshman composition, 1995.
Bloch Prize for the Freshman who shall write the best essay in English, 1995.

SELECTED PROFESSIONAL ACTIVITIES

Panels Chair, Conference on Human Factors in Computing Systems (CHI), 2016.
Industry Chair, Conference on Research and Development in Information Retrieval (SIGIR), 2015.
Doctoral Consortium Chair, Conference on Human Factors in Computing Systems (CHI), 2015.
General Chair, Conference on Web Search and Data Mining (WSDM), 2012.
Associate Editor, ACM Transactions on Information Systems (TOIS), 2011 – present.
Founder, Human Computer Interaction Seminar Series, MIT, CSAIL, 2003 – 2006.
Institute Representative, Faculty Committee on the Library System, MIT, 2003 – 2005.
Graduate Student Council Representative, Massachusetts Institute of Technology, 2002 – 2003.

REVIEWING

Senior Program Committee, CHI 2012, 2013, 2014, HCOMP 2014, SIGIR 2013, 2014, WSDM 2014.
Program Committee, ASIST, CHI, CIKM, CSCW, ICWSM, IIiX, SIGIR, UIST, Web Science, WSDM, WWW.
Reviewer, AAAI, ASIST, CACM, CHI, CIKM, CSCW, ECIR, FnT in HCI, GI, HCI, HCIR, ICWSM, IIiX, IP&M, IWC, MobileHCI, SIGIR, TKDE, TOCHI, TOIS, TWeb, UIST, WSDM, WWW.

PUBLICATIONS

BOOKS

1. Meredith Ringel Morris and Jaime Teevan. Collaborative Web Search: Who, What, Where, When, and Why. San Rafael, CA: Morgan & Claypool Series on Information Concepts, Retrieval, and Services (Ed. Gary Marchionini), 2010.
2. William Jones and Jaime Teevan (Eds.). Personal Information Management.
Seattle: University of Washington Press, 2007.

BOOK CHAPTERS

3. Susan Dumais, Robin Jeffries, Daniel M. Russell, Diane Tang and Jaime Teevan. “Understanding User Behavior through Log Data and Analysis.” In Judith S. Olson and Wendy Kellogg (Eds.), Ways of Knowing in HCI. New York: Springer, 2014.
4. Jaime Teevan and Susan Dumais. “Web Retrieval, Ranking, and Personalization.” In Ian Ruthven and Diane Kelly (Eds.), Interactive Information Seeking, Behaviour and Retrieval. London: Facet Publishing, 2011.
5. Jaime Teevan, Robert Capra and Manuel A. Pérez-Quiñones. “How People Find Personal Information.” In William Jones and Jaime Teevan (Eds.), Personal Information Management. Seattle: University of Washington Press, 2007.
6. Diane Kelly, Jaime Teevan and Richard Boardman. “Understanding What Works: Evaluating PIM Tools.” In William Jones and Jaime Teevan (Eds.), Personal Information Management. Seattle: University of Washington Press, 2007.

EDITORSHIPS

7. Jaime Teevan, William Jones and Benjamin B. Bederson (Eds.). Special Issue on Personal Information Management. Communications of the ACM (CACM), 49(1), January 2006.

REFEREED JOURNAL ARTICLES

8. Jaime Teevan, Meredith Ringel Morris and Shiri Azenkot. Supporting Interpersonal Interaction during Collaborative Mobile Search. IEEE Computer special issue on Collaborative Information Seeking, 47(3), 2014.
9. Kira Radinsky, Krysta Svore, Susan T. Dumais, Milad Shokouhi, Jaime Teevan and Eric Horvitz. Behavioral Dynamics on the Web: Learning, Modeling and Predicting. ACM Transactions on Information Systems (TOIS), 31(3), 2013. (Notable Computing Article)
10. Jaime Teevan, Susan T. Dumais and Eric Horvitz. Potential for Personalization. ACM Transactions on Computer-Human Interaction (TOCHI) special issue on Data Mining for Understanding User Needs, 2009. (Best Search Marketing Paper)
11. Jaime Teevan. How People Recall, Recognize, and Reuse Search Results.
ACM Transactions on Information Systems (TOIS) special issue on Keeping, Refinding, and Sharing Personal Information, 26(4), September 2008.
12. Edward Cutrell, Susan T. Dumais and Jaime Teevan. Searching to Eliminate Personal Information Management. Communications of the ACM (CACM), Special Issue on Personal Information Management, 49(1), January 2006.
13. Diane Kelly and Jaime Teevan. Implicit Feedback for Inferring User Preference: A Bibliography. SIGIR Forum, 37(2), 2003.

REFEREED CONFERENCE PAPERS

14. Avishay Livne, Vivek Gokuladas, Jaime Teevan, Susan Dumais and Eytan Adar. CiteSight: Supporting Contextual Citation Recommendation Using Differential Search. In Proceedings of the 37th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2014), Gold Coast, Australia, July 2014.
15. Chia-Jung Lee, Jaime Teevan and Sebastian de la Chica. Characterizing Multi-Click Behavior and the Risks and Opportunities of Changing Results during Use. In Proceedings of the 37th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2014), Gold Coast, Australia, July 2014.
16. Carsten Eickhoff, Jaime Teevan, Ryen White and Susan T. Dumais. Lessons from the Journey: A Query Log Analysis of Within-Session Learning. In Proceedings of the Seventh ACM International Conference on Web Search and Data Mining (WSDM 2014), New York, NY, February 2014.
17. Walter Lasecki, Jaime Teevan and Ece Kamar. Information Extraction and Manipulation Threats in Crowd-Powered Systems. In Proceedings of the 2014 ACM Conference on Computer Supported Cooperative Work (CSCW 2014), Baltimore, MD, February 2014.
18. Anne Oeldorf-Hirsch, Brent Hecht, Meredith Ringel Morris, Jaime Teevan