Experiences in Building and Operating Epost, a Reliable Peer-To-Peer Application

Total Page:16

File Type:pdf, Size:1020Kb

Experiences in Building and Operating Epost, a Reliable Peer-To-Peer Application Experiences in Building and Operating ePOST, a Reliable Peer-to-Peer Application Alan Mislove†‡ Ansley Post†‡ Andreas Haeberlen†‡ Peter Druschel† {amislove,abpost,ahae,druschel}@rice.epostmail.org †Max Planck Institute for Software Systems, Stuhlsatzenhausweg 85, 66123 Saarbr¨ucken, Germany. ‡Rice University, 6100 Main Street, MS-132, Houston, TX 77005, USA. ABSTRACT Keywords Peer-to-peer (p2p) technology can potentially be used to build Peer-to-peer, electronic mail, reliability, decentralized sys- highly reliable applications without a single point of failure. tems However, most of the existing applications, such as file shar- ing or web caching, have only moderate reliability demands. 1. INTRODUCTION Without a challenging proving ground, it remains unclear Decentralized, cooperative systems (also known as p2p whether the full potential of p2p systems can be realized. systems) have received much attention in recent years be- To provide such a proving ground, we have designed, de- cause they promise robustness to a wide range of work- ployed and operated a p2p-based email system. We chose loads and failures, are highly scalable with respect to nodes, email because users depend on it for their daily work and users, and data, and enable cooperative sharing of otherwise therefore place high demands on the availability and reliabil- underutilized resources. The technology was first adopted ity of the service, as well as the durability, integrity, authen- for use in file sharing and content distribution systems like ticity and privacy of their email. Our system, ePOST, has KaZaA, Gnutella, eDonkey and BitTorrent [3, 20, 29, 40], been actively used by a small group of participants for over which now enjoy widespread popular use. While these sys- two years. tems have grown to a considerable scale, they have only very In this paper, we report the problems and pitfalls we en- modest reliability demands. In general, their users seem to countered in this process. We were able to address some of tolerate slow downloads, repeated failures, and even cor- them by applying known principles of system design, while rupted data, as long as they get the content eventually. others turned out to be novel and fundamental, requiring us Prototypes of more reliable p2p systems, including con- to devise new solutions. Our findings can be used to guide tent distribution [3, 10, 16, 55], information retrieval [24, 52, the design of future reliable p2p systems and provide inter- 53,56], and name resolution [41,54], have mostly focused on esting new directions for future research. availability, scalability, and resilience to overload. Coopera- tive storage systems [5,12,30] additionally require durability and consistency, but to our knowledge, these systems have Categories and Subject Descriptors not yet been developed and deployed for production use. C.2.4 [Computer-Communications networks]: Distributed Thus, it remains unclear whether the full reliability poten- Systems—Peer-to-peer applications; H.4.3 [Information Sys- tial of p2p systems can be realized in practice. tems]: Communications Applications—Electronic mail; D.4.5 In an effort to address this question, we have designed, [Operating Systems]: Reliability—Fault-tolerance; D.0 deployed and operated ePOST, a serverless email system. [Software]: General—Peer-to-peer systems We have chosen email not because it is a killer application for p2p (it isn’t). Rather, we chose it as a proving ground because it is well understood, widely used, and has strin- gent reliability requirements. Many users depend on email General Terms for their daily work and therefore place high demands on the Algorithms, Design, Measurement, Reliability, Experimen- availability of the service, as well as the durability, integrity, tation authenticity, and privacy of their email. Hence, by demon- strating that users can rely on a p2p-based email system as their primary email service, we believe that we have taken a major step towards establishing p2p technology as a viable Permission to make digital or hard copies of all or part of this work for platform for applications with high reliability demands. personal or classroom use is granted without fee provided that copies are As an additional benefit, the basic elements of an email not made or distributed for profit or commercial advantage and that copies system — event notification, durable storage of mostly im- bear this notice and the full citation on the first page. To copy otherwise, to mutable data, and some per-user mutable metadata — are republish, to post on servers or to redistribute to lists, requires prior specific characteristic of many collaborative applications. Demon- permission and/or a fee. EuroSys’06, April 18–21, 2006, Leuven, Belgium. strating that highly reliable email services can be provided Copyright 2006 ACM 1-59593-322-0/06/0004 ...$5.00. using p2p technology suggests that the technology can sup- port other collaborative applications and even enable new of streaming data, while ePOST’s focus is on availability, applications. durability and consistency. In this paper, we report on our experience in designing, End System Multicast (ESM) [10, 25, 49] is a deployed deploying, and operating ePOST. Since ePOST users ex- system used to broadcast streaming video using cooperating pect the system to perform equally well or better than a end hosts to disseminate the data. ESM, like VoIP, is used conventional server-based system, our challenge was to pro- for streaming media data, which has high requirements of vide an acceptable user experience from a system built upon timeliness but not reliability or durability. This makes the unreliable machines and networks. Surprisingly, the prob- challenges it faces very different, though no less significant, lems we encountered were generally the result of incorrect from those addressed in ePOST. assumptions made in the design of underlying component OpenHash [28, 45] is an publicly open DHT system that technologies like the structured overlay and the distributed provides a public hashtable interface allowing users to store hash table (DHT) used by ePOST. Seemingly unrelated and retrieve data. OpenHash is a public service infrastruc- user-perceivable errors and inconsistencies in ePOST oper- ture for others to build applications and therefore does not ation often led us to discover fundamental problems with emphasize data durability and consistency to the extent re- these components. quired for running ePOST. We describe in some detail the challenges we encountered CoDoNS and CoDNS [41, 44] are decentralized systems in providing users with an acceptable service. This means that seek to replace or augment the existing DNS infras- providing strong data durability, high availability, and a tructure to improve its reliability, availability, and perfor- mostly consistent view of the user’s email folders despite mance. Since DNS entries are infrequently updated, con- correlated node failures, network partitions, network anoma- sistency problems are not likely to be visible to the user. lies, and node churn. In some cases, we were able to address Coral [11,16] is a cooperative web cache, which is centrally problems by applying known principles of system design; administered on the PlanetLab [43] network and does not al- other issues turned out to be novel and fundamental, re- low untrusted nodes to join the network. Coral-CDN forms quiring us to devise new solutions. After a partial re-design a cache of data that can be found on the Internet, and as to address initial problems, ePOST has supported a small such all data contained can be recovered from the source. user base, including the authors, who have relied on the The Resilient Overlay Networks (RON) [1] project has system as their primary email service for over two years. deployed a network of nodes that can be used to increase Overall, this paper makes four contributions: (i) We re- the reliability of routing in the Internet. It augments the port on our experiences in designing and operating a co- service traditionally provided by the Internet and does not operative, serverless email service that supports real users. store data on behalf of users. (ii) We identify two problems, routing anomalies and non- transitivity, that emerged during our use of the system, and 3. BACKGROUND we describe and evaluate our solutions for them. (iii) We note some pitfalls we encountered that are of interest to de- In this section, we give a brief overview of the ePOST signers of similar systems, though we were able to address system, and we describe our current deployment. The initial them using known principles of system design. Finally, (iv) design of ePOST was previously sketched in [35]. we demonstrate that it is possible to support applications with high availability, data durability, and consistency using 3.1 Overview p2p technology. ePOST provides secure, decentralized email services to The rest of this paper is structured as follows: Section 2 its users. Each user runs the ePOST software on her lo- describes related work, and Section 3 presents background cal machine and contributes some of that machine’s CPU, on the design and deployment of ePOST. Section 4 describes storage and network bandwidth to the system. Users’ email four general lessons we distilled from our experience build- messages and metadata are stored cooperatively and redun- ing and operating ePOST, and we present and evaluate solu- dantly in a DHT backed by the participants’ disks. All data tions for several problems we had to solve. Section 5 details is signed for authenticity and encrypted for confidentiality some pitfalls we encountered, and Section 6 describes inter- before it is inserted into the cooperative storage. esting directions for future research. Section 7 presents our In order to deliver an email in ePOST, the sender inserts conclusions. the email and a special delivery request into the cooperative store. Nodes that store such a delivery request subscribe to 2. RELATED WORK a multicast group under the recipient’s identifier.
Recommended publications
  • Resurrect Your Old PC
    Resurrect your old PCs Resurrect your old PC Nostalgic for your old beige boxes? Don’t let them gather dust! Proprietary OSes force users to upgrade hardware much sooner than necessary: Neil Bothwick highlights some great ways to make your pensioned-off PCs earn their keep. ardware performance is constantly improving, and it is only natural to want the best, so we upgrade our H system from time to time and leave the old ones behind, considering them obsolete. But you don’t usually need the latest and greatest, it was only a few years ago that people were running perfectly usable systems on 500MHz CPUs and drooling over the prospect that a 1GHz CPU might actually be available quite soon. I can imagine someone writing a similar article, ten years from now, about what to do with that slow, old 4GHz eight-core system that is now gathering dust. That’s what we aim to do here, show you how you can put that old hardware to good use instead of consigning it to the scrapheap. So what are we talking about when we say older computers? The sort of spec that was popular around the turn of the century. OK, while that may be true, it does make it seem like we are talking about really old hardware. A typical entry-level machine from six or seven years ago would have had something like an 800MHz processor, Pentium 3 or similar, 128MB of RAM and a 20- 30GB hard disk. The test rig used for testing most of the software we will discuss is actually slightly lower spec, it has a 700MHz Celeron processor, because that’s what I found in the pile of computer gear I never throw away in my loft, right next to my faithful old – but non-functioning – Amiga 4000.
    [Show full text]
  • Modest Talk at Guadec/Desktop Summit 2009
    static void _f_do_barnacle_install_properties(GObjectClass *gobject_class) { GParamSpec *pspec; Modest /* Party code attribute */ pspec = g_param_spec_uint64 Creating a modern mobile (F_DO_BARNACLE_CODE, "Barnacle code.", "Barnacle code", 0, email client with gnome G_MAXUINT64, G_MAXUINT64 /* default value */, G_PARAM_READABLE technologies | G_PARAM_WRITABLE | G_PARAM_PRIVATE); g_object_class_install_property (gobject_class, F_DO_BARNACLE_PROP_CODE, José Dapena Paz jdapena AT igalia DOT com Sergio Villar Senín svillar AT igalia DOT com Brief history ● Started in 2006 ● 2007 Targeted for Maemo Chinook ● December 2007 first beta release ● 2008 Maemo Diablo ● 2009 Development becomes public. Repository moved to git How big is it? ● Modest Total Physical Source Lines of Code (SLOC) = 104,675 ● Tinymail Total Physical Source Lines of Code (SLOC) = 179,363 Goals Easy to use Embedded devices ● Small resources ● Small screen ● Small storage Multiple UI. Common logic Gnome UI Maemo 4/Diablo UI Maemo 5/Fremantle UI Coming soon... Support for most common email protocols ● IMAP ● POP ● SMTP Push email IMAP IDLE Extensibility New plugin architecture Architecture Architecture Gtk+ Hildon 2 GLib Pango GConf GtkHTML Modest Modest Modest Alarm MCE Abook plugin Xproto Plugin Yproto libtinymail-gtk libtinymail-maemo libtinymail-camel camel-lite Camel Xproto Tinymail Camel Yproto IMAP POP SMTP Xproto daemon Camel-lite mmap-ed summaries ● Very reduced memory usage ● Very compact representation on disk ● Efficient use of memory (thx kernel) IMAP IDLE support Camel features out-of-the-box ● Great support for MIME ● Stream based API ● Modular extensible Tinymail ● Multiple platforms ● Simplifies Camel API's ● Integrated Glib mainloop ● Gtk+ widgets ● Asynchronous API. Responsive UI ● Modular design Modest ● Message view based on gtkhtml ● Rich message editor based on wpeditor ● Offline read of messages and folders ● Integration with network status libraries Migration to Hildon 2.2 Hildon 2.2/Fremantle philosophy ● Proper experience with finger in small screens.
    [Show full text]
  • Webwords 58 Internet Resources Caroline Bowen
    Shaping innovative services: Reflecting on current and future practice Webwords 58 Internet resources Caroline Bowen ow timely it was that, just as Webwords’ minder Although they are time-consuming to master, email and completed an entry (Bowen, in press) in Jack Damico the web require of the user little, if any, technical savvy, Hand Martin Ball’s massive, 4-volume encyclopaedia, beyond conquering computer use with a desktop, laptop, intended for students of human communication sciences tablet, or smart phone. The device, must have: (a) a current, and disorders and the “educated general reader”, the topic routinely updated operating system (e.g., iOS, Linux, for the July 2017 JCPSLP landed, somewhat belatedly, on Windows), (b) an up-to-date browser (e.g., Chrome, Firefox, her desk. The topic, “Shaping innovative services: Opera, Safari), (c) plug-ins or browser extensions (e.g., Reflecting on current and future practice”, harmonised Adobe Flash Player, Java applet, QuickTime Player); an perfectly with the encyclopaedia essay, which covered both email client (e.g., Apple Mail, IBM Lotus Notes, MS Outlook, internet innovations, and online resources that have existed Mozilla Thunderbird) and/or (d) browser accessible web since the www was initiated. Accordingly, Webwords 58 mail; and, for mobile computing, (e) access to WiFi or a comprises the complete entry, reproduced here, prior to G3 or G4 network. Mobile computing technology employs publication, by kind permission of the publisher, in the hope Bluetooth, near field communication (NFC), or WiFi, and that SLP/SLT students around the world will find it helpful. mobile hardware, to transmit data, voice and video via a Typically for encyclopaedias, the piece does not include computer or any other wireless-enabled device, without parenthetical citations of published works.
    [Show full text]
  • Awareness, Training and Trust in Interaction with Adaptive Spam Filters
    Awareness, Training and Trust in Interaction with Adaptive Spam Filters. Henriette S.M. Cramer, Vanessa Evers, Maarten W. van Someren, Bob J. Wielinga Human-Computer Studies, University of Amsterdam Science Park 107 1098 XG Amsterdam, The Netherlands [email protected] +31 (0)20 525 6660 ABSTRACT user trust as they rely on correction of filter mistakes, Even though adaptive (trainable) spam filters are a common especially when a filter isn’t pre-trained. Users need to example of systems that make (semi-)autonomous decisions spend time and effort to train their filter. They have to be on behalf of the user, trust in these filters has been convinced to keep on training their system and need to underexplored. This paper reports a study of usage of spam understand the way the system learns. Studies on spam filters in the daily workplace and user behaviour in training filters can therefore provide interesting information about these filters (N=43). User observation, interview and survey the usage of spam filters in specific, but also about techniques were applied to investigate attitudes towards two interaction with autonomous and adaptive systems in types of filters: a user-adaptive (trainable) and a rule-based general. Despite this opportunity, spam filter usage is a filter. While many of our participants invested extensive surprisingly underexplored area. effort in training their filters, training did not influence filter Ways to develop spam filters are widely available, but trust. Instead, the findings indicate that users’ filter information on user interaction with spam filters is scarce. awareness and understanding seriously impacts attitudes Research is available on interaction with e.g.
    [Show full text]
  • Mailenable Business Advantages
    MailEnable Business Advantages Advantages of E-mail Why choose MailEnable? Electronic mail (E-mail) has become Advantages for Users mission critical for most successful MailEnable Mail Servers provide a messaging infrastructure that enhances and increases businesses. E-mail allows for fast and productivity for the end user. The features are designed to facilitate ease of communication concise communication between by making the process fast, accessible, customizable and simple to use. For example; multiple users and provides an efficient method of information Access e-mail anywhere ± Web mail is an email dissemination. In most cases, e-mail service that uses a web browser as an interface is cheaper, faster and more reliable allowing for the viewing, sending, receiving, editing than long distance phone calls and and deleting of email, just as you would with your postal deliveries. standard email clients such as Outlook or Eudora. E-mail instantly generates a written Web mail is ideal for mobile users because it allows record of interactions between you to access your email from anywhere, anytime senders. Mail can be sorted and with any computer connected to the Internet. filtered automatically before even being opened to filter out junk mail or prioritize important information. E-mail Customised look and feel± Web mail µskins¶ allow you to redesign the look and feel of has revolutionized the manner in your web mail and web administration. MailEnable has a skin library available on-line, or you which business is conducted. can customize your own look. By changing colors, icons, buttons, images and backgrounds, you can even brand web mail with your company logo for a consistent corporate look.
    [Show full text]
  • Icewarp Server Quick Start Guide
    IceWarp Unified Communications IceWarp Server Quick Start Guide Version 10.4 Printed on 29 August, 2011 Contents Quick Start Guide 1 IceWarp Server Overview ............................................................................................................................................... 1 IceWarp Server Highlights .............................................................................................................................................. 2 Getting Started ............................................................................................................................................................... 3 Download ........................................................................................................................................................... 3 Installation ......................................................................................................................................................... 3 Wizard ................................................................................................................................................................ 3 Activation ........................................................................................................................................................... 3 Add Domain ....................................................................................................................................................... 3 Network ............................................................................................................................................................
    [Show full text]
  • Linux E-Mail Set Up, Maintain, and Secure a Small Office E-Mail Server
    Linux E-mail Set up, maintain, and secure a small office e-mail server Ian Haycox Alistair McDonald Magnus Bäck Ralf Hildebrandt Patrick Ben Koetter David Rusenko Carl Taylor BIRMINGHAM - MUMBAI This material is copyright and is licensed for the sole use by Jillian Fraser on 20th November 2009 111 Sutter Street, Suite 1800, San Francisco, , 94104 Linux E-mail Set up, maintain, and secure a small office e-mail server Copyright © 2009 Packt Publishing All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews. Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book. Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information. First published: June 2005 Second edition: November 2009 Production Reference: 1051109 Published by Packt Publishing Ltd. 32 Lincoln Road Olton Birmingham, B27 6PA, UK. ISBN 978-1-847198-64-8 www.packtpub.com
    [Show full text]
  • Taking Email to Task
    Taking Email to Task: The Design and Evaluation of a Task Management Centered Email Tool Victoria Bellotti, Nicolas Ducheneaut, Mark Howard, Ian Smith Palo Alto Research Center 3333 Coyote Hill Road Palo Alto, CA 94706 USA +1 650 812 4000 {bellotti, nicolas, mahoward, iansmith}@parc.com ABSTRACT avenues. Among these, CAFE [20] is probably the most Email has come to play a central role in task management, ambitious. Based on studies of email users’ behavior, yet email tool features have remained relatively static in CAFE offers three modes (busy, cool and curious) to recent years, lagging behind users’ evolving practices. The satisfy varying needs in different situations. Cadiz et al.’s Taskmaster system narrows this gap by recasting email as prototype [7] focuses on keeping up with the flow of task management and embedding task-centric resources messages as they arrive and the activity of email triage on directly in the client. In this paper, we describe the field a body of new messages. In a similar vein Rohall et al. research that inspired Taskmaster and the principles behind [18] describe visualization techniques to quickly identify its design. We then describe how user studies conducted dependencies between messages in a conversation. Finally, with “live” email data over a two-week period revealed the Gwidzka’s work [12] focuses on the management of value of a task-centric approach to email system design and pending tasks and the process of reminding to act on them. its potential benefits for overloaded users. This body of research is, however, only in its infancy, with Keywords each prototype only addressing one facet of the problem of Email, task management, user studies, system evaluation task management in email.
    [Show full text]
  • A Portable User-Level Approach for System-Wide Integrity Protection†
    A Portable User-Level Approach for System-wide Integrity Protectiony Wai-Kit Sze and R. Sekar Stony Brook University Stony Brook, NY, USA ABSTRACT • Difficulty of secure policy enforcement. Non-bypassable poli- In this paper, we develop an approach for protecting system in- cies usually have to be enforced in the OS kernel. They are usu- tegrity from untrusted code that may harbor sophisticated malware. ally much harder to develop than user-level defenses. Moreover, We develop a novel dual-sandboxing architecture to confine not kernel-based solutions cannot be easily ported across different only untrusted, but also benign processes. Our sandboxes place OSes, or even different versions of the same OS. only a few restrictions, thereby permitting most applications to An alternative to sandboxing is isolated execution of untrusted code. function normally. Our implementation is performed entirely at One-way isolation [15, 23] permits untrusted software to read any- the user-level, requiring no changes to the kernel. This enabled us thing, but its outputs are held in isolation. Two-way isolation limits to port the system easily from Linux to BSD. Our experimental re- both reads and writes, holding the inputs as well as outputs of un- sults show that our approach preserves the usability of applications, trusted applications in an isolated environment. The app model on while offering strong protection and good performance. Moreover, Android, Apple iOS, and Windows 8 sandbox are generally based policy development is almost entirely automated, sparing users and on this two-way isolation model. administrators this cumbersome and difficult task. Isolation approaches provide stronger protection from malware since they block all interactions between untrusted and benign soft- 1.
    [Show full text]
  • RIM Blackberry Synchronization Guide
    IceWarp Unified Communications RIM BlackBerry Synchronization Guide Version 10.4 Printed on 16 April, 2012 Contents RIM Blackberry Synchronization 1 Blackberry Enterprise Server .......................................................................................................................................... 2 Choosing the Wireless Connection ................................................................................................................................. 3 BIS Email Account ........................................................................................................................................................... 4 Plug-in Installation.......................................................................................................................................................... 4 Installing an SSL Certificate ............................................................................................................................................ 5 ActiveSync Alternatives .................................................................................................................................................. 5 1 C H A P T E R 1 RIM Blackberry Synchronization Disclaimer Information contained herein is provided as-is. The use of products and services as mentioned here is not authorized by their manufacturer and service provider Research In Motion Inc. IceWarp or any of its affiliate can not be held responsible for any damage, direct or implied, when using this information in production
    [Show full text]
  • August 2021 Edition
    THE OFFICIAL NEWSPAPER OF, AND FOR, PINE MOUNTAIN LAKE PROPERTY OWNERS 2021 AUGUST TABLE OF CONTENTS Administration 2-19, 24, 32-35 PMLA Financials 6 Letters to the Editor 8 The Grill & Lounge 16-17 “Homes on the Hill” Real Estate 25-31 PML Clubs & 36 - 37, 41 Recreation 43-44 Community News 45-50 School News 50 Home Improvement 51-54 Classified Ads 55 Requested PAID ABS DIRECT U.S. POSTAGE U.S. PRSRT STD PRSRT Change Service ANNUALANNUAL MEMBERMEMBER MEETINGMEETING BOARDBOARD ELECTIONELECTION SATURDAY,SATURDAY, AUGUSTAUGUST 2121 PMLPML LAKELAKE LODGELODGE DON'TDON'T FORGETFORGET TOTO VOTEVOTE PLEASEPLEASE RETURNRETURN YOURYOUR BALLOTBALLOT FAWNFAWN SEASONSEASON ININ PMLPML EVERYEVERY VOTEVOTE COUNTSCOUNTS PLEASEPLEASE WATCHWATCH FORFOR WILDLIFEWILDLIFE MEETMEET THETHE CANDIDATESCANDIDATES OUTDOOROUTDOOR BURNINGBURNING FORFOR PMLAPMLA BOARDBOARD ISIS PROHIBITEDPROHIBITED OFOF DIRECTORSDIRECTORS FIREFIRE DANGERDANGER ISIS SeeSee PagesPages 20-2320-23 VERYVERY HIGHHIGH 19228 Pine Mountain Dr. Groveland, CA 95321 Groveland, 19228 Pine Mountain Dr. 2 ADMINISTRATION August 2021 PINE MOUNTAIN LAKE ASSOCIATION PHONE & EMAIL DIRECTORY ADMINISTRATION DEPARTMENT OF SAFETY 209.962.8600 General Manager – Joseph Powell Director of Safety – 209.962.8633 [email protected] Natalie Trujillo [email protected] www.pinemountainlake.com Admin Asst. to G.M. – 209.962.8627 Debra Durai Sergeant – 209.962.1244 [email protected] Sgt. Teri Cathrein ADMINISTRATION OFFICE HOURS* [email protected] Human Resources – 209.962.8628 8:00 AM TO 4:30 PM – MON THRU FRI Shannon Abbott Sergeant – 209.962.8616 OPEN AT 8:45 AM SECOND TUESDAY OF EACH MONTH [email protected] Sgt. Carrie Harvey THE OFFICE IS CLOSED BETWEEN 12:00 & 1:00 EACH DAY E.C.C.
    [Show full text]
  • Modeling Intention in Email
    Modeling Intention in Email Vitor R. Carvalho CMU-LTI-08-007 Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Ave., Pittsburgh, PA 15213 www.lti.cs.cmu.edu Thesis Committee: William W. Cohen, Chair Tom M. Mitchell Robert E. Kraut Lise Getoor, University of Maryland Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Copyright c 2008 Vitor R. Carvalho Keywords: Machine Learning, Email Management, Ranking To Sarah iv Abstract Email management has a fundamental role in modern work productivity. In this thesis we present evidence that email management can be potentially improved by the effective use of machine learning techniques to model dif- ferent aspects of user intention. We initially propose a taxonomy of user in- tentions in terms of Speech Acts applied to email communication, or “email acts”, and show that email act classification can be largely automated, poten- tially leading to better email prioritization and management. We then describe how machine learning can be used to reduce the chances of costly email addressing errors. One type of costly error is an “email leak”, i.e., mistakenly sending a message to an unintended recipient — a widespread problem that can severely harm individuals and corporations. Another type of addressing error is forgetting to add an intended collaborator as recipient, a likely source of costly misunderstandings and communication delays that can be potentially addressed with intelligent recipient recommendation. We pro- pose several different approaches to address these problems, and show very positive experimental results in a large email collection. In addition, we de- scribe a 4-week long user study based on the implementation of some of the proposed models in a popular email client (Mozilla Thunderbird).
    [Show full text]