Chapter 10. Our Archive

The New Enlightenment • The New Enlightenment Chapter 10. Our Archive Peter B. Kaufman Published on: Feb 23, 2021 License: Creative Commons Attribution 4.0 International License (CC-BY 4.0) The New Enlightenment • The New Enlightenment Chapter 10. Our Archive Who controls our access to the libraries of content developed and produced and archived over the last hundred-plus years? Who controls, or has tried to control, our search across these screens and servers for the moving pictures and sounds we are looking for? The forces of the Monsterverse are arrayed on every battlefield. We have to recognize these forces for what they are. And we have to gird for war. As we digitize all of our cultural heritage materials for access, we link our institutions and ourselves together online, and are in fact building one big supercomputer— futurologist Kevin Kelly has called it a “planetary electric membrane”—comparable to the individual human brain. It is an organism of collective human intelligence in the business now of processing the hundreds of thousands of full-length feature films we have made, the millions of television shows, the tens of millions of recorded songs, tens of billions of books, and billions of web pages—and looking at the world every day through camera lenses and microphones, including 3 billion phones and counting—all recording our own sounds and visions. It is a supercomputer so large that if we think of it as one connected thing, it processes some 3 million emails every second and generates so many exabytes of data each year that it consumes 5 percent of the world’s electrical energy. And what it wants is . more knowledge. Increasing sentience, or intelligence. Who is writing the software that makes this contraption useful and productive? We are. When we post and tag photos, for example, we are teaching the machine to give names to images, and the thickening links between caption and picture form a “neural net” that can continue learning! The 100 billion times per day humans click on one page or another is a way of teaching the web what we think is important. Each time we forge a link between words, we teach it an idea. We may think we are merely wasting time when we surf mindlessly or blog an item, but each time we click a link we strengthen a node somewhere in the supercomputer’s mind. “Google is learning. We teach it while we think it is teaching us. Every search for information is itself a piece of information Google can learn from.”1 An understanding of our rights and our power needs to course through the knowledge sector. The Rebel Alliance—should we found it—must have this understanding, and a deep awareness of the pedigree of our integrity, the duration of our centuries-old struggle, at its core. While Wikipedia doesn’t formally recognize the Encyclopédie 2 The New Enlightenment • The New Enlightenment Chapter 10. Our Archive anywhere as its exquisite forerunner, the Internet Archive roots its name and founding principles—even core graphics—in the Library of Alexandria, from the Ptolemaic period, some 2,300 years ago, in Egypt, 7,400 miles away from where the Internet Archive is based in current-day San Francisco. The Internet Archive was funded initially, and still, in the main, by its founder, Brewster Kahle, from the sale of a type of search engine he created called, not incidentally, Alexa. Today the Archive, an independent nonprofit, runs on an operating budget of approximately $10 million per year. Its goal is to provide “universal access to all knowledge.” Its primary operation is to archive the worldwide web—not a bad ambition, given that everyone seems to be ignoring the essential mandate of collecting, keeping, and learning from the past. Today the archive holds 330 billion web pages, 20 million books and texts, 4.5 million audio recordings (including 180,000 live concerts), 4 million videos (including 1.6 million television news programs), 3 million images, and 200,000 software programs.2 Any user anywhere can create a free account and archive her own media—it’s as close to a public resource as we can get. Is this important? It is no accident that the etymology of the word “archive” comes from the Greek ἄρχω—“to begin, rule, govern”—and thus no accident that “archive” shares the same root as the words “monarch” and “hierarchy.”3 Archives started in the “archon”—the seat of government—and the centrality of the power of the archive is likely to be the story of the twenty-first century.4 If this book is in part about the power of the moving image and the web, then it is at least worth noting that this—the archive as pure power—is the critical message of one of the highest-dollar-grossing moving images of all time. Set in the future (but how far?), the Na’avi people in Avatar plug into and connect with the sounds of the past—they make zahaylu—as the primordial way for them to heal, regenerate their powers, and enlighten themselves. It seems to work. We too have to imagine a collective of knowledge institutions forging alliances to make available all of their holdings to similar seekers and the weak and the injured at our own Tree of Knowledge. The Internet Archive mirrors the Library of Alexandria’s ambitions. “Starting as early as 300 BCE,” we have been told, “the Ptolemaic kings who ruled Alexandria had the inspired idea of luring leading scholars, scientists, and poets to their city by offering them life appointments at the Museum”—located right in the center of the city— featuring a library where “most of the intellectual inheritance of Greek, Latin, Babylonian, Egyptian, and Jewish cultures had been assembled at enormous cost and carefully archived for research.” Ptolemy III (246–221 BCE), according to one 3 The New Enlightenment • The New Enlightenment Chapter 10. Our Archive historian, “is said to have sent messages to all the rulers of the known world, asking for books to copy.” As a result, Euclid developed his geometry in Alexandria; Archimedes discovered pi and laid the foundation for calculus; Eratosthenes posited that the earth was round and calculated its circumference to within 1 percent; Galen revolutionized medicine. Alexandrian astronomers postulated a heliocentric universe; geometers deduced that the length of a year was 365 1⁄4 days and proposed adding a “leap day” every fourth year; geographers speculated that it would be possible to reach India by sailing west from Spain; engineers developed hydraulics and pneumatics; anatomists first understood clearly that the brain and the nervous system were a unit, studied the function of the heart and the digestive system, and conducted experiments in nutrition. The level of achievement was staggering. And: The Alexandrian library was not associated with a particular doctrine or philosophical school; its scope was the entire range of intellectual inquiry. It represented a global cosmopolitanism, a determination to assemble the accumulated knowledge of the whole world and to perfect and add to this knowledge.5 The concatenating links of past and present abound. The Encyclopédie launched in 1750 with a prospectus that included a map of human knowledge. It was built upon the knowledge maps of Francis Bacon—to whom Diderot and D’Alembert quite frequently alluded.6 Bacon sought to organize knowledge—memory, reason, imagination, that sort of thing—and here we go again. Google’s mission is one they put out there simply for us: Organize the world’s information and make it universally accessible and useful. And it’s terrifying.7 Should one private, profit-seeking company, one state, one religion, or one church have a role like this ceded to it by the people? (The answer, again, is still no.) The company’s work on its taxonomy of knowledge objects, its “knowledge graph,” is in many senses founded upon the knowledge base of structured data that Metaweb developed and ran publicly as Freebase from 2007 to 2010. Google bought that out from Danny Hillis and his colleagues, who today, at MIT and elsewhere, are trying to develop a not-for-profit alternative that may take billions of dollars to underwrite.8 At the time of that purchase, that free knowledge base was said to be able to sort and classify 70 billion facts. Today, the challenge is orders of magnitude greater.9 We have to understand how to make knowledge as easily 4 The New Enlightenment • The New Enlightenment Chapter 10. Our Archive searchable and discoverable as other products are in our society, from music and sports videos to pornography and shoes. What we have that the Carnegie Commission did not have is the new network, some woke alliance members, and extraordinary computing power. (What they had that we do not is a sense of deep, catalyzing fear—but we will be getting that again.) Where artificial intelligence may take us is for other work in the future, but for now, what if we were to steal a page from the Monsterverse? What if, as Rebel Alliance members, we were to acknowledge our own lack of progress to date, as a field? While we noodle over the potential rights challenges involved in making our assets searchable, our lunch is being eaten. Our breakfast. Our dinner. The commercial sector is actively exploiting the growth potential for such advanced products and services. Pandora, Netflix, iTunes, and IMDb (the Internet Movie Database), among others, enable customers to experience moving images, sounds, texts, and images, and they provide thriving recommendation engines. Google has more than two hundred signals in its Page-Rank algorithm.

Chapter 10. Our Archive

Getting Started Computing at the Al Lab by Christopher C. Stacy Abstract

Selected Filmography of Digital Culture and New Media Art

Creativity in Computer Science. in J

2012, Dec, Google Introduces Metaweb Searching

This Introduction to the SHOT SIGCIS Session Examining The

Sept. 25Th Born: Sept

Landmark Contributions by Students in Computer Science Version 12: September 16, 2009

Past, Present, and Future

Mechanical Computing: the Computational Complexity of Computable Capable of Being Worked out by Physical Devices Calculation, Especially Using a Computer

Funding a Revolution: Government Support for Computing Research

Vector Models for Data-Parallel Computing

Being Digital