Annals of law Google’s moon shot

The quest for the universal . by Jeffrey Toobin

Every weekday, a truck pulls up to the Cecil H. Green Library, on the campus of Stanford University, and collects at least a thousand books, which are taken to an undisclosed location and scanned, page by page, into an enormous database being created by Google. The company is also retrieving books from at several other leading universities, including Harvard and Oxford, as well as the New York Public Library. At the University of Michigan, Google’s original partner in Google Book Search, tens of thousands of books are processed each week on the company’s custom-made scanning equipment.

Google intends to scan every book ever published, and to make the full texts searchable, in the same way that Web sites can be searched on the company’s engine at google.com. At the books site, which is up and running in a beta (or testing) version, at books.google.com, you can enter a word or phrase—say, Ahab and whale—and the search returns a list of works in which the terms appear, in this case nearly eight hundred titles, including numerous editions of Herman Melville’s novel. Clicking on “Moby-Dick, or The Whale” calls up Chapter 28, in which Ahab is introduced. You can scroll through the chapter, search for other terms that appear in the book, and compare it with other editions. Google won’t say how many books are in its database, but the site’s value as a research tool is apparent; on it you can find a history of Urdu newspapers, an 1892 edition of Jane Austen’s letters, several guides to writing haiku, and a Harvard alumni directory from 1919.

No one really knows how many books there are. The most volumes listed in any catalogue is thirty-two million, the number in WorldCat, a database of titles from more than twenty-five thousand libraries around the world. Google aims to scan at least that many. “We think that we can do it all inside of ten years,” Marissa Mayer, a vice-president at Google who is in charge of the books project, said recently, at the company’s headquarters, in Mountain View, California. “It’s mind-boggling to me, how close it is. I think of Google Books as our moon shot.”

Google’s is not the only book-scanning venture. Amazon has digitized hundreds of thousands of the books it sells, and allows users to search the texts; Carnegie Mellon is hosting a project called the , which so far has scanned nearly a million and a half books; the , a consortium that includes , Yahoo, and several major libraries, is also scanning thousands of books; and there are many smaller projects in various stages of development. Still, only Google has embarked on a project of a scale commensurate with its corporate philosophy: “to organize the world’s information and make it universally accessible and useful.”

In part because of that ambition, Google’s endeavor is encountering opposition. A federal court in New York is considering two challenges to the project, one brought by several writers and the Authors Guild, the other by a group of publishers, who are also, curiously, partners in Google Book Search. Both sets of plaintiffs claim that the library component of the project violates copyright law. Like most federal lawsuits, these cases appear likely to be settled before they go to trial, and the terms of any such deal will shape the future of digital books. Google, in an effort to put the lawsuits behind it, may agree to pay the plaintiffs more than a court would require; but, by doing so, the company would discourage potential competitors. To put it another way, being taken to court and charged with copyright infringement on a large scale might be the best thing that ever happens to Google’s foray into the printed word.

Though Google has more than ten thousand employees—about fifty new ones are hired each week—and a market capitalization of more than a

30 THE NEW YORKER, FEBRUARY 5, 2007

hundred and fifty billion dollars, the company cultivates the air of a college campus at its headquarters, in Silicon Valley. Now and then, there are self-consciously wacky stunts, like Pajama Day, which happened to take place when I visited. (The event was to be madcap within reason; supervisors were told to convey the message that “pajamas means ‘pajamas,’ not ‘what you sleep in.’ ”) When I met with Sergey Brin, a co-founder of Google, he was wearing bright-blue p.j.s, with the company’s logo stitched on the breast pocket.

Publishers have sued Google for breaching copyright. A settlement seems likely, but it may not be in the public’s interest.

The story of how Brin and Google’s other co-founder, Larry Page, met as graduate students in computer science at Stanford in the mid-nineties, and devised a series of elegant software algorithms that allowed Web searchers to find relevant information quickly and efficiently, has become part of Silicon Valley lore. Less well known is that, at the time, Brin and Page were also working on Stanford’s Technologies Project, an attempt, funded by the federal government, to organize different kinds of stored information, including books, articles, and journals, in digital form. “There was an attitude in computer science that putting things on dead trees was obsolete and getting it all into a searchable, digital format was a quest that had to be accomplished someday,” Terry Winograd, a Stanford professor who was a mentor to Page and Brin, said.

After founding Google, in 1998, Page and Brin—who are now in their mid-thirties and worth around fourteen billion dollars each—began to talk about how to include books in the company’s database. Page, in particular, embraced the idea of putting books online; at one point, he set up a primitive lab in his office, with a scanner and a page-turning machine. “I think it was motivating to have those kinds of aspirations, but nobody really took it seriously,” Brin told me. The men were less interested in making it easy for people to obtain the full texts of books online than in making accessible the information those books contained. “We really care about the comprehensiveness of a search,” Brin said. “And comprehensiveness isn’t just about, you know, total number of words or bytes, or whatnot. But it’s about having the really high-quality information. You have thousands of years of human knowledge, and probably the highest-quality knowledge is captured in books. So not having that—it’s just too big an omission.” As Marissa Mayer put it, “Google has become known for providing access to all of the world’s knowledge, and if we provide access to books we are going to get much higher-quality and much more reliable information. We are moving up the food chain.”

In 2002, Google quietly made overtures to several libraries at major universities. The company proposed to digitize the entire collection free of charge, and give the library an electronic copy of each of its books. “Larry is an undergrad alum here at Michigan, and he knew we were already interested in digitizing the library as part of our preservation efforts,” John Wilkin, an associate university librarian at Michigan, told me. “There was a lot of back-and-forth between Google and us in the process. We wanted to insure that the materials wouldn’t be damaged and that what came out could be used as a preservation surrogate. They started experimenting with different ways of copying the images, and we started a pilot project in July, 2004. We’ve been getting better, going faster. We’re doubling our output all the time.” The


Michigan library holds seven million volumes, and Wilkin believes that Google will have copied the entire collection in about six years.

Last month, at the New York Public Library, Google hosted a conference on the future of the publishing industry. About four hundred people—mainly publishing executives and agents—attended, most of them grimly aware of the simultaneous lethargy and panic that have characterized their industry’s response to the digital age. Nearly all attempts to sell books in an electronic format have been disappointing, and now Google appeared to be encroaching on the publishers’ domain. The implicit message of the conference was summed up by a quotation from Charles Darwin that was projected on a screen: “It is not the strongest of the species that survive, nor the most intelligent, but the ones most responsive to change.” As Laurence Kirschbaum, a longtime publishing executive who recently became a literary agent, told me at the conference, “Google is now the gatekeeper. They are reaching an audience that we as publishers and authors are not reaching. It makes perfect sense to use the specificity of a search engine as a tool for selling books.”

Google thought so, too, and designed the books project accordingly. In addition to forming partnerships with libraries, the company has signed contracts with nearly every major American publisher. When one of these publishers’ books is called up in response to search queries, Google displays a portion of the total work and shows links to the publisher’s Web site and online shops like Amazon, where users can buy the book. “We are helping the publishers reach consumers that otherwise might not have known about their books and helping them market their books by giving limited but relevant previews of the books,” Jim Gerber, Google’s director of content partnerships, told me. “The Internet and search are custom made for marketing books. When there are a hundred and seventy-five thousand new books each year, you can’t market each one of those books in mass market. When someone goes into a search engine to learn more about a topic, that is a perfect time to make them aware that a given book exists. Publishers know that ‘browse leads to buy.’ ” (Google says that it does not take a cut of sales made through its books site.)

Still, on October 19, 2005, several leading publishers, including Simon & Schuster, the Penguin Group, and McGraw Hill—all of which are partners in Google Book Search—filed a lawsuit against the company, seeking to stop the project. The publishers don’t object to Google’s plan for helping them sell new books, but they assert that the library component of the project is illegal. They claim that Google’s “massive, wholesale and systematic copying of entire books still protected by copyright” infringes on the publishers’ rights. They demand that Google stop further copying and “destroy all unauthorized copies made by Google through the Google Library Project of any copyrighted works.” (The Authors Guild filed its lawsuit around the same time.) The publishers, who have the support of the Association of American Publishers, are suffering from a version of the problem that John Kerry had in the last Presidential campaign: they are for Google Book Search at the same time that they are against it.

Copyright law dates to the birth of the Republic. Article I of the Constitution assigns Congress the right to pass laws “securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.” The first copyright law was passed in 1790, and it has been frequently and confusingly amended over the years, most recently in the Sonny Bono Copyright Term Extension Act of 1998, which extended copyright terms by twenty years. (The law is also known as the Mickey Mouse Protection Act, because the Walt Disney Company, seeking to protect its copyright on early animated classics like “Steamboat Willie,” lobbied heavily for it.) The twisted history of copyright law has insured an awkward passage into the digital age.

The legal assertion at the core of Google’s business plan is its purported right to scan millions of copyrighted books without payment to or permission from the copyright owners. Approximately twenty per cent of all books are in the public domain; these include books that were never copyrighted, like government publications, and works whose copyrights have expired, like “Moby-Dick.” Google has simply copied such books and made them available on the Web. Roughly ten per cent of books are copyrighted and in print—that is, actively being sold by publishers. Many of these books are covered by Google’s arrangement with its publisher partners, which allows the company to scan and display parts of the works. The vast majority of books belong to a third category: still protected by copyright, or of uncertain status, and out of print. These books are at the center of the conflict between Google and the publishers. Google is scanning these books in full but making only “snippets”

(the company’s term) available on the Web. (Google searches turn up only the search term and about twenty words on either side of it.) Copyright law has never forbidden all “copying” of a protected work; scholars and journalists have long been allowed to quote portions of copyrighted material under the doctrine of fair use. Google maintains that the chunks of copyrighted material that it makes available on its books site are legal under fair use. “We really analogized book search to Web search, and we rely on fair use every day on Web search,” David C. Drummond, a senior vice-president at Google who is overseeing the response to the lawsuits, told me. “Web sites that we crawl are copyrighted. People expect their Web sites to be found, and Google searches find them. So, by scanning books, we give books the chance to be found, too.” (Google also has an “opt out” policy, which allows copyright holders to request that specific titles be omitted from the company’s database.)

However, according to the plaintiffs in the cases against Google, the act of copying the complete text amounts to an infringement, even if only portions are made available to users. “What they are doing, of course, is scanning literally millions of copyrighted books without permission,” Paul Aiken, the executive director of the Authors Guild, said. “Google is doing something that is likely to be very profitable for them, and they should pay for it. It’s not enough to say that it will help the sales of some books. If you make a movie of a book, that may spur sales, but that doesn’t mean you don’t license the books. Google should pay. We should be finding ways to increase the value of the stuff on the Internet, but Google is saying the value of the right to put books up there is zero.”

Google asserts that its use of the copyrighted books is “transformative,” that its database turns a book into essentially a new product. “A key part of the line between what’s fair use and what’s not is transformation,” Drummond said. “Yes, we’re making a copy when we digitize. But surely the ability to find something because a term appears in a book is not the same thing as reading the book. That’s why is a different product from the book itself.” In other words, Google says that being able to search books on its site—which it describes as the equivalent of a giant library card catalogue—is not the same as making the books themselves available. But the publishers cite another factor in fair-use analysis: the amount of the copyrighted work that is used in the creation of the new one. Google is copying entire books, which doesn’t sound “fair” to the plaintiff publishers and authors. “Traditional copyright analysis says that a transformation leads to the creation of a new and independent work, like a parody or a work of criticism,” Jane Ginsburg, a professor at Columbia Law School, said. “Copying the entire work, which is what Google is doing, does not preclude a finding of fair use, but it does fall outside the traditional paradigm.”

Harvard, Stanford, and Oxford have prohibited Google from scanning copyrighted works in their collections, limiting the company to books that are in the public domain. Because of the opacity of copyright law, and the extension of protections mandated by the 1998 act, it’s not always clear which works are still protected. (Copyright status can become murky when authors die or publishing houses go out of business.) Stanford has drawn a line at 1964 and prohibited Google from copying most works published since that date. “When Google got sued, we got nervous,” Michael A. Keller, the university librarian at Stanford, told me. “We’re not a public institution. We don’t have any state immunity from being sued ourselves, so we started sorting out the stuff that we know is public domain.” (Several of the public institutions that are Google’s partners, including the Universities of Michigan, California, Virginia, and Texas at Austin, are allowing the scanning of copyrighted material.)

The chief engineer of Google’s system for scanning books in the library collections is Dan Clancy, who joined the company after eight years at NASA, where he supervised teams of Ph.D.s working on problems related to artificial intelligence. Google provides its employees with free food twenty-four hours a day, and Clancy, a tall, shambling man with a shock of white-blond hair, conducted most of our conversations with bits of granola bar clinging to his shirt.

“Previously, when people have done scanning, they always were constrained by their budget and their scale,” Clancy told me. “They had to spend all this time figuring out which were the perfect ten thousand books, so they spent as much time in selection as in scanning. All the technology out there developed solutions for what I’ll call low-rate scanning. There was no need for a company to build a machine that could scan thirty million books. Doing this project just using commercial, off-the-shelf technology was not feasible. So we had to build it ourselves.”

Google will not discuss its proprietary scanning technology, but, rather than investing in page-turning equipment, the company employs people to operate the machines, I was told by someone familiar with the process. “Automatic page-turners are optimized for a normal book, but there is no such thing as a normal book,” Clancy said. “There is a great deal of variability over books in a library, in terms of size or dust or brittle pages.” (To needle Google, several blogs have posted images from the books site that include the scanners’ fingers.) Google will not reveal how much it is spending on the books project. In 2005, Microsoft announced that it would spend two and a half million dollars to scan a hundred thousand out-of-copyright books in the collection of the . At this rate, scanning thirty-two million books—the number in WorldCat’s database—would cost Google eight hundred million dollars, a major but hardly extravagant expenditure for a multibillion-dollar corporation.

Copying all those pages presents many difficulties, but writing software to make the books useful to searchers is even harder. “The scanning technology is boring,” Clancy said. “The real challenge is to get somebody something that they are actually interested in, inside a book. Web sites are part of a network, and that’s a significant part of how we rank sites in our search—how much other sites refer to the others.” But, he added, “Books are not part of a network. There


is a huge research challenge, to understand the relationship between books.”

Still, the basic search protocols function well. A search for “Heart of Darkness” leads immediately to Joseph Conrad’s novel, which is not as obvious as it sounds, considering how common the words in the title are. As Clancy said, “If you put in ‘Heart of Darkness,’ we have to know that you’re looking for the novel, not a book about lighting conditions in cardiac surgery. So how do we do that? We rank some words more important than others. The title may matter more than the content, so we may weight that more. You could also look at what other people have searched for, so if everyone who searched for ‘Heart of Darkness’ clicked on the novel, we might figure that you probably will, too.”

The most important data for ranking searches, Clancy explained, may come from Web pages that link to books in Google’s database. (For instance, if links on the phrase “Clinton’s autobiography” direct users to a copy of “My Life” on the books site, there is a high probability that people who use the same search terms will also want this result.) “We just started, and we need to make these books networked, and we need people to help us do that,” Clancy said.

Google’s database contains many books in languages other than English, but for now they must be searched in the original tongue. On the company’s Web site, there is already a primitive translation feature, and it may someday be enhanced to allow books to be rendered in another language at the touch of a button. “In terms of democratization, you want to be able to access information,” Clancy told me. In places like the Arab world, where few titles are translated into the local languages each year, he said, access to the world’s books could have a substantial impact. “We are talking about a universal digital library,” Clancy went on. “I hope this world evolves so that there exists a time where somebody sitting at a terminal can access all the world’s information.”

Such messianism cannot obscure the central truth about Google Book Search: it is a business. Google has pledged not to show advertising next to the pages of library books, but the company does sell advertising alongside search results that lead to books obtained from publishers. Google’s prospects for producing revenue from the books project appear rather modest, but the company has often made a profit on ventures that initially seemed unlikely to be lucrative. “We’ve had this fortunate streak that when we’ve done things that have impacted our users and society as a whole—positively, in a significant way—we’ve been rewarded by that downstream in some way, even though we may not have envisioned exactly what it was right offhand,” Sergey Brin told me. “We didn’t have ads when we first put up Web search. It wasn’t clear it was great business when we started search. In fact, the companies that were doing search were moving away from it. But we just thought it was important, and we thought that where there was a will there would be a way. And in fact it turned out to be a great way to make money—doing search with targeted advertising. And I think you’ll find the same sort of thing here.”

The key legal question is whether the courts will allow Google to continue to scan copyrighted material without permission. But the schedule of the lawsuits may turn out to be as significant as the merits of the cases, which are before Judge John E. Sprizzo. In keeping with the stately pace of federal litigation, the depositions of witnesses are to begin sometime this year, and the parties will be allowed to file motions for summary judgment—in Google’s case, to dismiss the suits—in early 2008. Then there could be a trial. If the cases are appealed, they could linger well into the next decade.

However, most people involved in the dispute believe that a settlement is likely. “The suits that have been filed are a business negotiation that happens to be going on in the courts,” Marissa Mayer told me. “We think of it as a business negotiation that has a large legal-system component to it.” According to Pat Schroeder, the former congresswoman, who is the president of the Association of American Publishers, “This is basically a business deal. Let’s find a way to work this out. It can be done. Google can license these rights, go to the rights holder of these books, and make a deal.”

The terms of such a deal aren’t hard to imagine. The Authors Guild is concerned that pirated copies of the books on Google’s site could leak to the public, and so the organization would insist on security measures. (Sadly, for writers and publishers, demand for their products has never been robust enough to generate a major piracy problem.) As for distribution of the proceeds from the site, Google might agree to share revenue with publishers, in the way that radio stations pay for the music they play; publishers could receive a fee based on a statistical analysis of how often their books are viewed. Google could pay in cash or in kind, with advertising.

But a settlement that serves the parties’ interests does not necessarily benefit the public. “It’s clearly in both sides’ interest to settle,” Lawrence Lessig, a professor at Stanford Law School, said. “Businesses in Internet time can’t wait around for years for lawsuits to be resolved. Google wants to be able to get this done, and get permission to resume scanning copyrighted material at all the libraries. For the publishers, if Google gives them anything at all, it creates a practical precedent, if not a legal precedent, that no one has the right to scan this material without their consent. That’s a win for them. The problem is that even though a settlement would be good for Google and good for the publishers, it would be bad for everyone else.”

Libraries have recognized for some time that they must adapt to the digital age, and many have taken steps in that direction. In 1995, Stanford founded the HighWire Press, which now provides electronic access to more than a thousand scholarly journals. A few years later, Stanford digitized most of its card catalogue, and circulation of its books increased by fifty per cent. “Once our students could sit in their dorm rooms and find out what we had in the library, they sought out more books,” Michael Keller, the university librarian, says. Individual libraries sometimes received grants to scan specific collections—in 2001, the New York Public Library used federal money to digitize a substantial portion of the collection at its Schomburg Center for Research in Black Culture—but a comprehensive effort seemed inconceivable. According to Paul LeClerc, who has


been the president of the New York Public Library for the past thirteen years, “For the first decade of my tenure, I was always asked, ‘Weren’t libraries going to go online?’ And I’d say of course we want to do it, but it’s not going to happen, because no one is going to give us the money to do it. Nowhere on the horizon was that amount of money predictable or identifiable. Then came Google. This struck us as being the quickest, the fastest, and the most efficient way of getting large-scale additions to our collections online for free use.”

Among Google’s potential competitors in the field of library digitization are members of the Open Content Alliance, which facilitates various scanning projects around the country and overseas. Funded largely by Microsoft and the Alfred P. Sloan Foundation, the O.C.A. has formed alliances with many companies and institutions, including the Boston Public Library, the American Museum of Natural History, and . For the moment, though, the O.C.A.’s members are copying only material in the public domain (and works from copyright owners who have given explicit permission), which limits the scope of the projects substantially.

Google’s advantage may well be cemented if the company settles its lawsuits with the publishers and authors. “If Google says to the publishers, ‘We’ll pay,’ that means that everyone else who wants to get into this business will have to say, ‘We’ll pay,’ ” Lessig said. “The publishers will get more than the law entitles them to, because Google needs to get this case behind it. And the settlement will create a huge barrier for any new entrants in this field.”

In other words, a settlement could insulate Google from competitors, which would be especially troubling, because the company has already proved that when it comes to searches it is not infallible. “Google didn’t get video search right—YouTube did,” Tim Wu, a professor at Columbia Law School, said. (Google solved that problem by buying YouTube last year for $1.6 billion.) “Google didn’t get blog search right—technorati.com did,” Wu went on. “So maybe Google won’t get book search right. But if they settle the case with the publishers and create huge barriers to newcomers in the market there won’t be any competition. That’s the greatest danger here.”

The most striking thing about Pajama Day at Google was how few people participated. Most of the rank and file saw the stunt for the manufactured fun that it was. They came to work in their usual slacker uniforms of jeans and T-shirts—which are, in their way, as conformist as white shirts and ties were at I.B.M. in the nineteen-sixties. Google, as its employees seem to recognize, cannot pretend to be anything other than a large and powerful corporation.

It’s easy to mock Google’s unofficial motto—“Don’t be evil”—but there is nothing evil about Google Book Search. At the same time, there is nothing inherently virtuous about it. Google has succeeded because, on the whole, it has developed excellent products; it’s folly to judge the company’s behavior on moral grounds. Its shareholders certainly don’t.

Nor can publishers and authors, who are struggling for a way to survive in a new age, portray their conflict with the company as one between good and evil. The dual status of several leading publishers as both partner and adversary to Google underscores their desperate need to hedge their bets in a digital world that they have yet to master. The publishers’ complaint against Google states that “the Publishers support making books available in digital form so that those books can be, among other things, researched through electronic means.” That may be true in theory, but trade publishers, in particular, have been slow to embrace new technology, especially for out-of-print books; Google will almost certainly bring more attention to these works than their own publishers have.

The law is supposed to resolve issues like these—between self-interested parties with reasonable claims and legitimate arguments. But the rules of copyright are so ambiguous, and the courts so slow, that the judicial system serves largely to implement the law of the jungle. “There is a real opportunity to move books into the digital arena,” Marissa Mayer told publishers during the conference at the New York Public Library. “And we are going to do it together.”

