BIG DATA’S BIG VISIONARY

As cholera swept through London in the mid-19th century, a physician named John Snow painstakingly drew a paper map indicating clusters of homes where the deadly waterborne infection had struck. In an iconic feat in public health history, he implicated the Broad Street pump as the source of the scourge, a founding event in modern epidemiology. Today, Snow might have crunched GPS information and disease prevalence data and solved the problem within hours. And in a cellphone text, he might also have sought advice from a certain professor of computational biology and bioinformatics at Harvard School of Public Health, who likely would have spun off dozens of ideas about study design, data analysis and modeling, and interpretation. That’s how John Quackenbush operates.

Photo: Kent Dayton / HSPH

Boyish-looking at age 52, with long hair, a fashionably scruffy beard, and a wardrobe stocked with jeans, black T-shirts, hiking boots, and a black leather jacket, Quackenbush looks more like an indie band member than someone at the forefront of mapping modern insights into the causes of disease. He has created novel methods of analyzing today’s relentless flood of digitized information that is itself scourge or salvation, depending on how it is harnessed.

“We’re making huge investments in technology to generate data. We’re making huge investments in electronic medical records,” he says. “What’s surprising is not what we’ve done, but what we haven’t done. We haven’t made a parallel investment in tools to make sense of all this information.”

WHAT MAKES A REVOLUTION?

Tucked in a warren of labs and wooden cabinets housing glass flasks, Quackenbush’s scientific lair is a casual repository of digital data, stored on black computer towers lying helter-skelter. It is also an homage to domestic bliss, festooned with photographs of his wife, Mary Kalamaras, an editor and photographer, and his 8-year-old son, Adam.

Mulling the megatrend that big data represent, Quackenbush likens today’s genomics revolution, kindled by the sequencing of the human genome in 2000, to other turning points in the history of science. In the early 17th century, when Galileo built a telescope and pointed it at Jupiter, he found that the planet was circled by moons, an observation that confirmed Copernicus’ theory that the earth revolved around the sun, and thus shuffled humankind’s rank in the cosmic order. Likewise, in the 19th and 20th centuries, experiments conducted at the extremes of velocity and distance enabled physicists to unveil the structure and interactions of nature’s tiniest particles.

In essence, these turning points forced us to reimagine ourselves and the world around us. Today’s revolution will have the same effect.

The science is being driven, in part, by economics. In 2001, it cost about $100 million to figure out the order of DNA nucleotides (the billions of A’s, C’s, G’s, and T’s) in an individual’s genome, and it took months for armies of researchers around the world to generate and interpret the data. By 2009, the cost had dropped to $100,000 and the time required to a few weeks. Today, it costs between $1,000 and $2,000, an easy credit-card purchase, and takes a day or two.

Put simply, genetic sequencing has become a commodity. “We are awash in data. Biology is evolving from being a pure laboratory science into an information science,” says Quackenbush. “And when you look at all the great scientific revolutions, it’s data that drive new ways of thinking about problems. We all have ideas. We all think we know about how the universe operates. But when you start to get empirical data, you realize that your hypotheses aren’t true. In biomedical research, we’ve had a lot of ideas about everything, from ‘What are the origins and evolution of humans?’ to ‘What is the basic nature of disease?’ Genomic data are fundamentally changing the way we think about those questions.”

AMASSING DATA FROM DISPARATE SOURCES

Indeed, says Quackenbush, big data may represent a treasure trove of potential solutions to countless medical and public health problems. In a few years, researchers will be able to conduct large observational cohort studies that yield whole-genome sequences on hundreds or thousands of volunteers. They could then link the genomic information to diet and lifestyle, health records, environmental exposures, and other data. Once this digitized information is amassed, synthesized, distilled, and analyzed, it could offer clues to how our genetic profiles raise the risk of certain diseases or protect us, and how our genes interact with what’s inside and around us.

“Environmental exposures such as cigarette smoking or obesity have much greater relative risks than almost any genetic factor you can imagine,” says Quackenbush. “But everyone has weird Uncle Bob who smoked until he was 90 and never coughed. Or, on the other hand, a friend or relative who never smoked but developed spontaneous lung cancer at 40.”

WHO TRANSMITTED INFECTION TO WHOM?

Big data may help scientists detail the spread of HIV more reliably than through contact tracing, which is based on first-person recollections. “Today, we ask people about their sexual partners to track the movement of the infection. Or we collect empirical data and map the flow of disease in social networks,” observes Quackenbush.

“But someday, we will be able to sequence the virus and in that way actually pinpoint who has transmitted the infection to whom, by tracking the mutations that the virus has picked up. Why is that important from a public health viewpoint? Because understanding how the disease is transmitted in networks helps you develop strategies to stop it. Even today, tracking diseases like SARS, MERS, and Ebola involves analyzing combinations of modern molecular and social interaction data.”

Big data might even reveal hidden associations between apparently disparate afflictions. “One of the things I would love to be able to do is look at all the different diseases that co-occur in people,” says Quackenbush. “If we had genetic information, we could combine all that data together to understand if certain genetic risk factors predispose you not to one disease, but to a host of seemingly different diseases.” For instance, a genetic twist in an epithelial cell in the colon that raises the risk for cancer might also raise the risk of asthma or chronic bronchitis in an epithelial cell in the lung. “If we start to see such connections,” says Quackenbush, “we can think about common risk factors and even common therapies.”

Photo: Medical staff working with Médecins Sans Frontières (MSF) don protective gear before entering an isolation area at the MSF Ebola treatment center in Kailahun, Sierra Leone, July 2014. © REUTERS

“I LOVED THE MAD SCIENTIST.”

Quackenbush was 5 when he performed his first bold scientific experiment: mixing cleaning chemicals in the bathtub. “That kid is still at the core of who I am,” he says. “When I was little, I watched Batman on TV and I loved the mad scientist villains the best. I came to be excited about science because it involved this process of discovery. I want to understand how things work.”

The quest to understand how things work, and to solve problems by discerning connections, has also been a theme in Quackenbush’s own life. “My father was extremely abusive. We had a lot of domestic violence.” After his parents divorced, he lived with his mother, who worked as a nurse. When he was 12, his father briefly kidnapped him and his sisters. Quackenbush says he hasn’t had contact with his father since, adding, “He was at the far end of the spectrum of acceptable behavior. His antics didn’t create a healthy environment for any of us.

“Would I change any of that? I don’t think I would. The road we take is a big part of who we are. The experiences we have are what make us. All of us face different adversities in our lives, and the challenge is to overcome them. The most important lesson I’ve learned is how to fail.”

The lesson also applies professionally. “As a scientist, if you are working at the edge of your understanding, you are going to come up with ideas that are just plain wrong. If you are a successful scientist, you have to be prepared to quickly figure out why you’re wrong and then try something else. And if you do that enough, you develop an intuition that can help you be wrong less often. But if you aren’t failing, you aren’t trying.”

PHYSICS TO BIOLOGY

Quackenbush’s first scientific passion was theoretical physics. “In physics, we draw conclusions about things we can no longer see and observe. We collect data and plug them into theoretical models. Then we refine those models to see where they break down, so that we can reinterpret the data and build a better understanding of how some particle or force functions.”

In the 1980s, in graduate school at the University of California, Los Angeles (UCLA), he toiled for months on a particularly elusive problem. By the end, he had written a 60-page calculation. “I kept going back to my adviser, who kept telling me it was wrong. The third time, when I went through it and got the same answer, I knew that I was right all along. It was an epiphany. I was sitting in this little office. The weather was dreary. And I had this feeling of sheer joy at discovering a tiny little corner of the universe that no one else knew existed.”

At the time, however, interest in his blissful theoretical corner of the universe was waning. With the Cold War drawing to a close, government funding for physics research dried up. By 1990, Quackenbush’s fascination with high-energy physics had mutated. “I had been on an experiment at Fermilab outside of Chicago. I had just come back to UCLA from weeks, including Thanksgiving, of manning the experiment on the midnight-to-8 a.m. ‘owl’ shift. I walked into my office and was greeted with the news that our funding had taken a severe cut and that, as a postdoctoral fellow, I was expendable. It was a devastating experience. I spent that night feeling like a complete failure. But the next morning, I woke up and asked myself where the most interesting unsolved scientific problems were.”

Helping a girlfriend, who was a PhD student in biology, analyze her data, he discovered a seamless fit between the burgeoning field of molecular biology and his physics training, which had taught him a two-step approach: distill the question to a problem one can solve, then generalize the answer into universal principles.

Quackenbush soon moved swiftly through the most prestigious molecular biology and genomics programs in the country. In 2005, his scientific peregrinations brought him to Boston, with dual appointments at HSPH and at the Dana-Farber Cancer Institute.

At all these posts, the animating impulse behind Quackenbush’s science was transparency. In 2013, he was named a White House Open Science Champion of Change for making open sharing of scientific data a reality. “We don’t publish a paper without ensuring that both the software and the data are accessible, so that other people can reproduce our work,” he says offhandedly. The award committee couched the achievement in grander rhetoric: “Since the Human Genome Project began in the 1990s, new technologies, producing previously unimaginable quantities of data on human health and disease, have been driving a revolution in medicine and biomedical research. John Quackenbush has been a pioneer in ensuring that these data, and the tools needed to access them, are available, accessible, and useful.”

EXPLAINING WOMEN’S GREATER RISK OF ALZHEIMER’S

One of the emerging mysteries in medicine is why women and men face different risks for a number of common deadly conditions, from heart disease to chronic obstructive pulmonary disease. Alzheimer’s disease reflects one of the starkest gender imbalances: two-thirds of sufferers are women.

Focusing on Alzheimer’s, Quackenbush and his colleague Kimberly Glass are applying new computational tools to a data set that had been around for more than a decade and had already been extensively analyzed. But they are asking a new question: Are active genetic circuits being switched on and off differently in men and women? What they found was that in Alzheimer’s patients, certain genes were indeed activated differently in men and women, and these genes were highly responsive to estrogen and testosterone. As Quackenbush sees it, “There are subtle hormonal balances that appear to hold the system in check.”

It was a startling discovery, and for Quackenbush it opened up fresh avenues of research. “If the genes activated in Alzheimer’s disease are hormonally responsive, would something like hormone replacement therapy (HRT) in women have a protective effect, or might it actually increase risk? We don’t know.”

To arrive at the answer, he envisions using epidemiological data from large cohorts (such as from the Framingham Heart Study or from the Centers for Medicare and Medicaid Services) to tease out whether women who received HRT are at higher or lower risk for Alzheimer’s disease. If HRT is proven to lower the risk of Alzheimer’s, “then we can provide women with the option of considering whether they want to take on the known risks of HRT (an increase in breast cancer) to mitigate the risk for Alzheimer’s. But first we need to conclusively establish the link.”

Deploying big data in this manner may transform the way science is conducted. Rather than dissecting the function of individual genes and then carrying out years of clinical trials to confirm a hypothesis, investigators may simply be able to analyze existing data. According to Quackenbush, “There are certain questions where, if the big data evidence is strong enough, doing the clinical trial may not be practical or even necessary.”

Photo: A daughter cares for her 85-year-old mother, who suffers from Alzheimer’s disease. © Mikkel Ostergaard / PANOS

PERSONALIZING MEDICINE

The most immediate application of genomics will likely be in personalized medicine. Even today, genetic profiles are being used to target treatments for everything from breast tumors to heart disease to neuropsychiatric disorders.

Quackenbush sees more possibilities. He is currently exploring cancer treatment through a reversed lens: asking whether a tumor’s genetic profile correlates with its size, shape, density, and, most important, invasiveness. If it does, then doctors could potentially determine the genetic profile of a malignancy based on simple CT scan images, which in turn would inform treatment.

“If I can test your tumor for $1,000 and tell you that you’re not likely to respond to a particular therapy that would cost $30,000, that’s a huge public health win,” he notes, “because that money can be used for other potentially effective therapies, or to support other parts of the health care system. And hopefully we can then help you move more quickly to a treatment with a greater likelihood of being effective.”

Quackenbush’s commitment to the data revolution is not merely theoretical. “My grandmother died of Alzheimer’s. I don’t know if she carried an APOE [apolipoprotein E] mutation, which raises the risk of the disease, or not. But I guarantee you that at some point, I’ll be sequenced. From my personal perspective, there is tremendous power in information.”

LIKE LEARNING A LANGUAGE

For all the popular enthusiasm surrounding big data, the diatribes against it are growing: that it’s noisy and rife with false associations; that it doesn’t necessarily equate to knowledge or understanding; that it doesn’t reflect the real, messy world (dubbed “thick data”); and that it won’t solve complex human problems.

“I would say all of those things are true,” concedes Quackenbush. “Data by itself is not a panacea. But that doesn’t mean we can’t use it. We just need to be smart about how we use it. My experience over the course of my lifetime is that the more information we have, the greater is the opportunity to learn new things. The challenge and the opportunity rest in separating meaningless correlations from causal relationships.”

Getting a handle on big data and genomics is like mastering a language, he adds. “There are tens of thousands of words. You can get by just fine with a few hundred. But the subtleties and complexities of what we can convey by using the entire spectrum of the noisy lexicon are part of the joy of being able to speak and communicate.”

Quackenbush clearly revels both in doing the science and in talking about the adventures and misadventures along the way. Hands resting on head, eyes widening, he says, “The most exciting moment is when the data don’t agree with the model. We’re always looking to be surprised.”

Madeline Drexler is editor of Harvard Public Health.

Harvard Public Health, Fall 2014