Culturomics: Word Play
NEWS FEATURE
23 JUNE 2011 | VOL 474 | NATURE | 437
© 2011 Macmillan Publishers Limited. All rights reserved

By mining a database of the world’s books, Erez Lieberman Aiden is attempting to automate much of humanities research. But is the field ready to be digitized?

BY ERIC HAND

Erez Lieberman Aiden is standing on the sun deck of his town house, rocking back and forth on the balls of his bare feet as he belts out a blessing. The Hebrew words echo across the quiet courtyards of Harvard University in Cambridge, Massachusetts. The sky has turned indigo as the light and warmth leak away from this day in late April. Shalom aleichem, he sings. Peace be upon you.

Lieberman Aiden — molecular biologist, applied mathematician and, at 31 years old, the precocious doyen of the emerging field known as the digital humanities — could do with a little peace. The cries of his 10-month-old son have abated — for the moment — and he has had just enough time to throw on a pair of frayed black trousers and a shiny synthetic pullover before his guests arrive. A five o’clock shadow darkens the terrain between his thick goatee and unkempt hair. The night before, he caught a late train back from Princeton University in New Jersey, where he, the geeky scientist, had the delicate task of informing a room of erudite historians that his efforts at mining a database of 5 million books, about 4% of all those ever published, had made much of what they do trivially easy. The scrupulous tracking of ideas across history, for instance — work that has consumed entire careers — can be done in seconds with tools that Lieberman Aiden and his colleagues have invented.

Yet his role as evangelist for change in the humanities — or doomsday prophet, depending on your point of view — is just one of the many parts played by Lieberman Aiden. He is also: the inventor of a groundbreaking protocol that reveals how DNA can be tightly wound and yet untangled enough to orchestrate life; the chief executive of iShoe, a company that is testing sensor-stuffed shoe inserts to help the elderly with their balance; and the co-founder, with his wife, of Bears Without Borders, which sends thousands of stuffed animals to children in the developing world. (Barely concealed in the couple’s basement are mounds of donated animals awaiting delivery.) In pouring his energies into all the projects that excite him, Lieberman Aiden doesn’t transcend disciplinary boundaries so much as ignore them. And although he is still technically a postdoctoral researcher at Harvard, Lieberman Aiden seems to publish the results of those projects almost exclusively on the covers of Science and Nature; hung in the stairwell below the sun deck, he has framed blow-ups of the magazine covers to prove it.

But that is work, and this is Shabbat dinner, the start of the Jewish Sabbath: a time for rest. The light switches will remain untouched, leaving the house illuminated through the night; the hot plate in the kitchen, on which the meal is being warmed, is on a timer. Three candles have been lit, one for each member of the household. Lieberman Aiden sings unabashedly in a hearty baritone that is not at all like his reedy, excitable speaking voice. He gazes at his wife, Aviva Presser Aiden, who grins back at him, holding her sweater tight to herself in the chilly night air. She too has reason to rest contentedly. The week before, she learned that she had won a US$100,000 grant from the Bill & Melinda Gates Foundation in Seattle, Washington, to build a microbial fuel cell that could charge mobile phones in Africa. The project means a year-long break from her studies at Harvard Medical School in Boston, where she is adding an MD to her PhD in genetics.

It is only by comparison with this academic power-couple that the other dinner guests — two young, self-assured Harvard physics graduates — look a bit lost, but that probably has more to do with their unfamiliarity with the Shabbat rituals. They flip through the Hebrew prayer books and try to follow along. But Lieberman Aiden, who in his 20s toyed with becoming a rabbi, has no need for the book. These are the texts that he has studied for years. These are the words he knows best.

READING VERY NOT-CAREFULLY

As a reader with a finite amount of time, Lieberman Aiden likes to say, you pretty much have two choices. You can read a small number of books very carefully. Or you can read lots of books “very, very not-carefully”. Most humanities scholars abide by the former approach. In a process known as close-reading, they seek out original sources in archives, where they underline, annotate and cross-reference the text in efforts to identify and interpret authors’ intentions, historical trends and linguistic evolution.

It’s the approach Lieberman Aiden followed for a 2007 paper in Nature¹. Sifting through old grammar books, he and his colleagues identified 177 verbs that were irregular in the era of Old English (around AD 800) and studied their conjugation in Middle English (around AD 1200), then in the English used today. They found that less-commonly used verbs regularized much more quickly than commonly used ones: ‘wrought’ became ‘worked’, but ‘went’ has not become ‘goed’. The study gave Lieberman Aiden a first-hand lesson in how painstaking a traditional humanities approach could be.

But what if, Lieberman Aiden wondered, you could read every book ever written ‘not-carefully’? You could then show how verbs are conjugated not just at isolated moments in history, but continuously through time, as the cul[...]

[...] a corpus of 500 billion words. A ‘one-gram’ plots the frequency of a single word such as ‘feminism’ over time; a ‘two-gram’ shows the frequency of a contiguous phrase, such as ‘touch base’ (see ‘Think outside the box’).

Google unveiled the tool on 16 December 2010, the same day that Lieberman Aiden and his colleagues published a paper in Science² describing how the tool could be used, for example, to identify the verb that has regularized the fastest: ‘chid’ and ‘chode’ to ‘chided’ in some 200 years (see ‘The fastest verb on the planet’). “We found ‘found’ 200,000 times more often than we finded ‘finded’,” they wrote, with characteristic playfulness. “In contrast, ‘dwelt’ dwelt in our data only 60 times as often as ‘dwelled’ dwelled.”

Interspersed between the jokes were real discoveries — many of which had nothing to do with verbs. By comparing German and English texts from the first half of the twentieth century, the team showed that the Nazi regime suppressed mention of the Jewish artist Marc Chagall, and that the n-grams tool could be used to identify artists, writers or activists whose suppression had hitherto been unknown. Lieberman Aiden and Michel called their approach culturomics, a reference to the genomics-like scale of the book database, and a nod to the future, when they hope that more of the media that underpin culture — newspapers, blogs, art, music — will be folded in.

In the first 24 hours after its launch, the n-grams viewer (ngrams.googlelabs.com) received more than one million hits. Dan Cohen, director of the Roy Rosenzweig Center for History and New Media at George Mason University in Fairfax, Virginia, calls the tool a “gateway drug” for the digital humanities, a field [...]

[...] of words and phrases produced by the n-grams tool. “I think saying all books equal the DNA of human experience — I think that’s a very dangerous parallel,” says Cohen. How do you factor in the cultural contributions of furniture, or dance, or ticket stubs at a movie hall, he asks. What about all the books that were never published? Or the culture as experienced by the world’s vast illiterate populations?

Other scholars have deep reservations about the digital humanities movement as a whole — especially if it will come at the expense of traditional approaches. “You can’t help but worry that this is going to sweep the deck of all money for humanities everywhere else,” says Anthony Grafton, a historian at Princeton and president of the American Historical Association, who uses a giant, geared wooden reading wheel to help him manage his oversized, Renaissance texts. He wants researchers to hold onto the power that comes with intimately knowing their primary sources, right down to the scribbled notes in the margin that would elude the book scanners. “You don’t want to give up what is your own core activity,” he says.

FOLLOWING TRADITION

Back at the Aiden house, the Shabbat dinner guests have all laved their hands with a glass of water and returned to the sun deck for matzo-ball soup. Lieberman Aiden explains some of the trepidation he felt when he and Michel talked to the historians at Princeton about their work. “I was a little bit nervous going in,” he says. “I really thought that we were going to get denounced at one point.”

Although Lieberman Aiden and Michel are sensitive to the feelings of traditional humanities scholars, they are also too young, restless and deeply ambitious to slow their own pursuits. Lieberman Aiden says that the influence of technology on the humanities is already past a tipping point. The tools and methods that it provides, he says, will be impossible for researchers to ignore.