This Slide Intentionally Left Blank

GMAIL
What it is, and what it is NOT

RIP Ray Tomlinson

“Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.”
~ Jamie Zawinski (JWZ), The Law of Software Envelopment

THE GOOGLE SESSIONS: ACT ONE
• My Greatest Flaw is that I am Slow
• My organization, the School District of Escambia County (ECSD), is coming to The Cloud.
• We want to understand it before we make it integral to our operations.
• Thus, four sessions on Google Apps.
• Each builds on the next.
• Some observations might be trite or even inaccurate.
• Call me out on those.
• Our approach begins outside an older application, and moves inward.

WHERE ECSD IS NOW
• Management believes maintaining 3 separate e-mail systems (GroupWise, Gmail, O365 Outlook) is silly.
• Pick the simplest to maintain and go with that.
• Gmail is required by GAPS, which is required by Chromebooks.
• Most users will not notice missing features.
• Management is generally trustworthy and intelligent.
• Power users want us to move to Gmail.
• Lots of “When are we moving?” commentary.
• But quite a bit of the reverse as well, from other types of power users.
• Especially calendar folks.
• I have a natural suspicion of closed, vendor-locked systems (Thanks, Pearson!)
• Google bristles and points to Vault when we say that.
• Those of us maintaining e-mail systems want to make an informed decision.
• I’ve used Gmail for years, but what do I know about it?
• Putting aside transition and licensing costs, what is the “Right” thing?

INQUIRY GROUND RULES
• The Internet is full of Guides to Gmail with helpful tips.
• And yet Gmail is really a moving target from an interface perspective.
• Fundamental features are another story.
• How useful is the past in our rapidly iterating technological space?
• If you use the term “digital immigrant”, that’s a paddlin’.
• “Millennial”: you know THAT’s a paddlin’.
• The history of Gmail has been narrated in a number of places.
• Including Wikipedia.
• I prefer “Founders at Work” by Jessica Livingston.
• Interview with Paul Buchheit.
• Recall the lessons of “The Power Broker”.
• Google is not going to publish deep implementation details.
• Let us not let Kremlinology carry us away.

WHY DO USERS PREFER GMAIL?
• The interface feels futuristic.
• AJAX was new when it premiered.
• More discussion down the line.
• Their most common tasks are within easy reach.
• It is highly tolerant of “messy” user behavior.
• Benefit of the doubt goes to the user, not the machine.
• This is very hard to do.
• Easy to access from any device.
• Nothing is ever lost.
• It learns.
• Google has done a great job of selling itself as a bunch of geniuses inventing the future.
• Fair enough. Everyone has an image to maintain.
• Is German engineering really that much better than French?
• The guys working on the self-driving cars are not working on Gmail.

IS GMAIL E-MAIL?
• Question is not meant to be “provocative”.
• Answer invites exploration.
• Is E-mail based on its traditional implementation?
• Or its purpose?
• And if Gmail fails both of those tests, is that problematic?

TRADITIONAL E-MAIL
• One sender, one or more recipients.
• Private, not public.
• Otherwise it would be Usenet / NNTP.
• “Store and Forward”
• Shuttling text files across a network.
• Files stored on a disk.
• First retrieved on the mail server machine.
• Which might serve lots of purposes.
• PINE, MUTT, EMACS (see JWZ quote)
• Later, retrieved via network protocols.
• Support for attachments and rich text via MIME.
• The network is built on trust.
• Until the Long September.
• September 1992 had 30 days.
• September 1993 has 8,223 days and counting.

EMAIL STANDARDS
• SMTP - Sending and Transfers.
• POP - Retrieval to local client.
• IMAP - Reading from server.
• MIME - Content encoding.
• MX Records - DNS Routing for E-Mail.
• Headers etc. defined by RFC, but mainly by folklore.
• Storage: Berkeley MBOX format, spool files.
• LDAP: Original Directory.
• CardDAV: Replacement Directory.
• CalDAV: Calendaring.
• ActiveSync: Mobile Collaboration.

THE TRAGEDY OF EMAIL
• The foundational standards of e-mail are uniformly terrible, and we have spent decades trying to build something sensible upon them.
• SMTP was too naive, and so we added relay restrictions, authentication, encryption, and finally SPF trickery.
• POP was limited to copy-then-delete from a single directory.
• So we invented IMAP for multi-folder, multi-client.
• And did anyone ever implement it well on the server or client?
• MX Records had to be supplemented by other methods of domain verification.
• The very existence of anti-spam technologies demonstrates poor design.
• And let us not even speak of strategies for generating headers, threading, quoting, summarizing (all sadly documented at http://jwz.org/hacks with discussions regarding Netscape Mail v2 and v3).
• Until very late in the game there was no comprehensive standard for calendaring (CalDAV).
• So we invented GroupWare …

“GROUPWARE BAD”
• Internet E-mail was the world of those who had Internet access.
• Local networks developed their own messaging systems independent of it.
• Support for most standards initially bolted on, and always uneasy.
• Exchange
  • Nobody has ever been able to adequately explain Exchange architecture to me.
  • At least it gave us ActiveSync.
• Lotus Notes
  • A synchronization engine that generated e-mail, calendaring, workflow apps, websites, CMS, and much more.
  • Taco Bell Syndrome - “It’s all the same, it’s just the way you fold it.”
• GroupWise
  • I have an entire presentation on the Architectural History of GroupWise.
  • The past glory of GroupWise is its data store.
  • The future glory might be extending that via APIs.
• Apple Mail
  • Designed emlx as a sort of “Super MBOX” format.
  • Flags designed for Apple Mail client.
  • Optimized for Spotlight metadata and searching.
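
For contrast with the proprietary stores above, here is a minimal sketch of the plain standards-based baseline: a MIME message appended to a Berkeley mbox file and read back. It uses only Python's standard `mailbox` and `email` modules. The function names, addresses, and file name are made up for illustration and do not come from any product discussed here.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a plain-text message with standard headers and a MIME body."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)  # adds MIME-Version and Content-Type headers
    return msg


def store_and_list(path, messages):
    """Append messages to an mbox file, then return the stored subjects."""
    box = mailbox.mbox(path)  # creates the file if it does not exist
    try:
        for msg in messages:
            box.add(msg)
        box.flush()  # write pending changes to disk
        return [stored["Subject"] for stored in box]
    finally:
        box.close()


if __name__ == "__main__":
    # Hypothetical addresses and file name, for illustration only.
    path = os.path.join(tempfile.mkdtemp(), "demo.mbox")
    demo = [
        build_message("a@example.com", "b@example.com", "First", "hello"),
        build_message("a@example.com", "b@example.com", "Second", "again"),
    ]
    print(store_and_list(path, demo))
```

The resulting file is one flat text file in which each message begins with a `From ` separator line. That is roughly the "text files on a disk" model the groupware products replaced with their own stores.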

WHERE IS FUNCTIONALITY IMPLEMENTED?
• We might think of features implemented on a quadrant
• In or Above the Data Layer
• Through Standard E-mail Mechanism or Through Proprietary Tech
• Example: Checking whether a recipient has read an e-mail.
• GroupWise implements functionality as integral part of the data layer using proprietary calls.
• Lotus Notes implements functionality through “read receipts” as a quasi-standard feature interpreted by recipient’s client.
• Some clients store read receipts with message, some mark them as a separate, standalone message.

ENTER GMAIL
• Gmail was initially built upon two existing Google projects
• Google Groups (a web layer above Usenet)
• Free text searching (applied to general datasets)
• All that code has definitely been rewritten.
• Goal was to find ways of helping users with massive mailboxes manage their E-mail.
• Implication is that the E-Mail data storage began as standard inboxes (maybe mbox format).
• But it could not stay that way.
• The basic approach lives on in “Gmailify”
• Had to be web-based because Google is a web-oriented company.

SEARCH CAUSES STORAGE PROBLEMS
• Typical approach to e-mail search is to do a straightforward search through E-Mail data store based on filter and keywords.
• Indexes can be built ahead of time to speed searches based on structured data (subject, addresses, time).
• Google Web Search crawls the web server farms to build search indexes.
• Only once the information is brought into the index through crawl retrieval can the data be searched.
• Early Gmail users expected mail to be searchable as soon as it entered inbox.
• But it could not be, because building the search index takes time once a message is incorporated into the data set.
• Moreover, the model for serving web content requires distribution and redundancy of data across multiple web servers.
• How do we guarantee that a user has full access to their e-mail inbox with current data?
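
The gap between "delivered" and "searchable" can be shown with a toy inverted index. This is only a sketch under loud assumptions: `BatchIndex` and `IncrementalIndex` are hypothetical names, the tokenizer is deliberately naive, and nothing here reflects how Google actually implements search. It contrasts the crawl-then-index model with indexing on arrival.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(TOKEN.findall(text.lower()))


class BatchIndex:
    """Crawl-style index: mail is searchable only after rebuild() runs."""

    def __init__(self):
        self.store = {}     # msg_id -> raw text
        self.postings = {}  # term -> set of msg_ids

    def deliver(self, msg_id, text):
        self.store[msg_id] = text  # stored, but not yet indexed

    def rebuild(self):
        postings = defaultdict(set)
        for msg_id, text in self.store.items():
            for term in tokenize(text):
                postings[term].add(msg_id)
        self.postings = postings

    def search(self, query):
        """Return ids of messages containing every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


class IncrementalIndex(BatchIndex):
    """Index-on-arrival: each message is slotted into postings immediately."""

    def deliver(self, msg_id, text):
        super().deliver(msg_id, text)
        for term in tokenize(text):
            self.postings.setdefault(term, set()).add(msg_id)


if __name__ == "__main__":
    batch, live = BatchIndex(), IncrementalIndex()
    for box in (batch, live):
        box.deliver(1, "Budget meeting Friday")
    print(batch.search("budget"), live.search("budget"))
```

In the batch version a freshly delivered message is invisible to search until the next rebuild, which is the user complaint described above. The incremental version pays the indexing cost at delivery time instead.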

SEARCH SOLUTIONS TRANSFORMED GMAIL
• New methods of searching had to be built just for E-Mail
• “Google” was too slow!
• A Guess: Speculative indexes and other aggressive, resource-intensive data sets have to be generated ahead of time so that messages can be broken up and slotted into them as soon as they are received.
• The back end data for Gmail must be optimized for quick and continuous processing by very efficient algorithms.
• As such it must operate on almost the entire Gmail data set continuously.
• A Guess: This means that the entire data set is one, and partitioning the data sets beyond single users is actually kind of hard.
• Gmail is Groupware, but at a huge and simplified level.
• Google is the group, and the user is the user.
• All other relationships are shims.
• German Gmail (Googlemail) and Google Apps domain e-mail demonstrate this paradigm.
• All doors lead to Gmail
• The price of all this is that it is hard to separate educational data from other data.
• Privacy lawsuits.

FREE CANDY FOR EVERYONE!
• Unlimited storage is part of the model, not merely a selling point.
• You are already holding multiples of the data for indexes, backups, server farm distribution etc.
• A Guess: Cutting the data set down is not desired
• Trimming is another (inefficient) operation.
• The algorithms learn better on large data sets.
• Hence, it is hard to delete e-mails, most are just archived
• Although, deletion has to exist for strong reasons.

GMAIL STORAGE
• The data does not represent anything one might recognize as traditionally structured e-mail data
• Nor the esoteric formats supported by Groupware products.
• Multiple abstraction layers in front of the user are required.
• Even when exporting mail, the data presented to the user is a complex composite.
• The same is true for protocol access.
• Labels are not folders.
• Labels are additive metadata.
• Folders imply absolute data organization, which does not exist.
• Threads are not generated after the fact, but are a fundamental structure.
• A single message is just a very short thread.

GMAIL INTERFACE
• Original interface was too slow, even beyond searching.