Computational Biology

Sequencing, Sequencing, Sequencing

BGI-Shenzhen tackles the cute, the edible, and pretty much everything else.

BY ALISSA POH

SHENZHEN, CHINA—Boxing Day 2008: The Luohu border between Hong Kong and mainland China is crowded, smoky, and noisy. With my visitor’s visa to Shenzhen in hand, I’m cleared to visit the Beijing Genomics Institute’s (BGI’s) Shenzhen-based sequencing facility. Accompanied by my father and brother-in-law, I wave down the nearest cab. The cab driver doesn’t have the faintest idea where BGI is located. As none of us can handle his thick Mandarin accent, I’m forced to call Zhuo Li, vice president of BGI’s health care division. I hand the phone to the driver, and happily we’re deposited at BGI’s main entrance. It’s a tall gray-and-glass structure, distinctly newer and shinier than the neighboring buildings. My companions head across the street for a late breakfast (frog legs), and I wander in to meet Li.

Photo caption: BGI currently houses dozens of Illumina sequencers and scores of China’s ‘best and brightest’.

The lobby lacks a smiling receptionist, tasteful paintings, or piped-in music. Save for supercomputers humming away within a glass-enclosed area and several ping-pong tables—naturally—in a corner, it’s Spartan. My sense is BGI’s staff wasn’t going to spend any time on décor that they could otherwise devote to research.

Li is tall, lean, and intense. He greets me in immaculate English, and escorts me on a whirlwind tour of the institute. It’s eleven floors of long hallways, each with its respective research unit—bioinformatics and the like—on one side, posters papering the opposite wall. Lab-coated staff are everywhere, poring over printouts, peering into cell-culture hoods, shuttling racks of test tubes from one lab to another. Most ignore me, apart from the occasional half-diffident glance.

BGI Beginnings

Li succinctly answers my questions, but as I discover, Chinese scientists are rather more close-mouthed than their Western colleagues. Getting them to elaborate beyond the facts is akin to tooth extraction.

It all started in August 1998 with the Human Genome Project, which geneticist Yang Huanming and three like-minded countrymen, all recently returned from U.S. postdocs, saw as the perfect way to position China on the genomics and sequencing stage. Yang’s plan was to utilize the Chinese Academy of Sciences (CAS), as its Institute of Genetics already had its own Human Genome Center. But he quickly concluded that CAS, bound by traditions, was lagging behind the rest of the world. In early 1999, he broke away, setting up BGI as a private, non-profit research organization. A few months later, at a conference at the Wellcome Trust Sanger Institute in the U.K., Yang announced China’s intention of becoming a global player in genomics.

Naturally, he was asked whether he had the money to realize his vision. As he later confessed to Science, he lied. Just four months after the conference, CAS funded three Chinese sequencing centers to tackle 1 percent of the human genome, with BGI receiving over half of the total award. But Yang didn’t know it at the time, standing at the podium and looking out at a sea of skeptical faces. He gambled on the funds somehow materializing, figuring that what the audience didn’t know couldn’t hurt them, or BGI’s image.

Three years later, the genomics world took notice when BGI metamorphosed from one man’s intangible dream to the cover of Science, having outraced its global competition to shotgun-sequence the indica rice genome. A reform-minded China proved that the will to succeed, spirited nationalism, and sheer manpower can be a potent combination. BGI split its sequencing team into 12-hour shifts so the machines could run 24/7 for the 74 days it took to finish indica. Dispensing with the commute between workplace and home, staff catnapped in hallways or simply dozed in their chairs.

By 2002, BGI had outgrown its initial home and relocated to an industrial park in Beijing, with an additional campus in Hangzhou. The original Beijing unit assumed responsibility for all commercial and outsourcing projects, while the Hangzhou branch focused on sequencing and academic research. Then in 2007,

[20] BIO•IT WORLD NOVEMBER | DECEMBER 2009 www.bio-itworld.com

BGI made a major investment in next-gen sequencing technology—Illumina’s Solexa—and moved its headquarters to Shenzhen. The director is 33-year-old Jun Wang, a handsome, highly decorated Ph.D. from Peking University whose interest in genomics dates back a decade.

The Chinese Way

New employees at BGI-Shenzhen don’t need reminding about the institute’s game plan. It’s right in their faces: poster-style and of billboard proportions, spanning an entire hallway. Printed in giant font, dead center, is a four-word slogan—“Sequencing is the basic!” It’s the foundation for moving into broader biological systems and processes—analysis of DNA variation and global methylation, protein networks, and metagenomics, ultimately providing individualized health care and agricultural advances.

Large-scale research is a mainstay of BGI-Shenzhen, and the “Tree of Life” project among its most prominent. Silkworms, cucumbers, chickens, and pigs are but a few examples of organisms large and small that the institute’s scientists have already sequenced. On the wall-sized poster, they’re lumped into three groups: animals, plants, and microorganisms. Animals are labeled “economic” (ducks, for instance); “endangered” (the Chinese river dolphin); or “model” (Drosophila). Similarly, microorganisms are categorized as industrial, pathogenic, or environmental. Projects past, present, and future are annotated, respectively, by red flags, green stars, and yellow circles.

BGI-Shenzhen is perhaps best known for the panda genome, as well as a Han-Chinese individual whose genome was but the third announced and published worldwide, after Watson and Venter. Back in February 2008, the institute launched its International Giant Panda Genome project, aiming to sequence and assemble the draft sequence within six months. The honor fell to Jingjing, the prototype for the Beijing Olympics’ panda mascot. The project was wrapped up by October. This ranked among China’s top ten technology accomplishments for 2008, and is viewed as a major step toward understanding why pandas eat only bamboo and have poor libido. It also suggested that rather than being related to raccoons, they likely hail from the bear family.

The first (human) Asian sequence is the starting point for BGI-Shenzhen’s Yanhuang project—so named for the Mandarin saying yan huang zi sun, or “descendants of Yan and Huang,” two emperors from ancient times that many Chinese consider their earliest ancestors. The institute has its sights set on sequencing at least 100 additional Chinese genomes, to better study genetic variations among China’s different populations.

“It got a lot of media attention,” Li says of the November 2008 publication in Nature. “Not long afterwards, we received RMB10 million [$1.46 million] from an anonymous Chinese donor. He’s interested in decoding personal genetic information to improve biomedical research, and wants to help this project move forward.”

Information gleaned from Yanhuang will contribute to the 1000 Genomes project, aimed at creating the most finely-tuned reference map of human genetic variation to date, down to the 1 percent level. BGI-Shenzhen is one of the key players in this undertaking.

Other initiatives include a “strategic alliance,” since early 2008, with Knome, George Church’s personal genomics company. The latter gets prime access to BGI’s capabilities in whole-genome sequencing, assembly, and annotation for its private clients.

Photo captions, from top: The Beijing Genomics Institute, Shenzhen. An artist’s representation of BGI’s next-planned home in a Shenzhen industrial park currently under construction in nearby Enshan village. Supercomputers at BGI. Photos: Alissa Poh.


BGI-Shenzhen is also one of 13 academic and industrial participants in MetaHIT, a four-year project financed by the European Commission to study connections between the genes of the human intestinal microbiota and our health, zooming in on inflammatory bowel disease and obesity. In addition, a Sino-Danish diabetes project involves deep-sequencing of exons and other conserved genomic regions from more than 4,000 individuals, in an attempt to discover genetic variations linked with obesity, type 2 diabetes and hypertension.

“We’re a completely private organization, with an annual budget of [$30 million],” Li says. “So to feed ourselves and carry out all our projects, we rely on revenue from these collaborations, and our spin-off companies [ten in total].” BGI-Shenzhen also benefits from the generous support of Shenzhen’s municipal government.

‘Omics Know-How

BGI-Shenzhen relies heavily on Illumina’s Genome Analyzers for its myriad sequencing projects; at last count, April 2009, their fleet had expanded to 29 (eight are in Hong Kong). The machines are kept in continuous production, churning out data at a daily rate of 60 gigabases (GB). “We could sequence the human genome 20 times a day,” Li says, “but we probably won’t load all our machines with just the one sample.” He’s not joking, and yet I wonder if one can’t take the last half of his statement as a bit of deadpan humor.

Might they eventually switch to another platform such as Life Technologies’ SOLiD system? “We’ve developed all our software and applications based on Illumina, which is why we mainly use their technology,” Li responds. “But we do have two SOLiD machines, and we may get more. It doesn’t necessarily mean we’ll switch; we’d like to make good use of both [technologies].”

The software developers work within BGI-Shenzhen’s energetic bioinformatics group, one of the largest in China, if not the world, directed by seven-year veteran Ruiqiang Li. “I don’t know any other place with so many bioinformaticians [200 and counting] under one roof,” Zhuo Li affirms, adding that many are 25 or younger, with some of the brightest stars barely out of college. Designing novel analysis tools capable of handling short-read sequences by the ton is among the group’s specialties. Their Short Oligonucleotide Analysis Package (SOAP), for instance, includes de novo software where assembling large genomes—panda, human and the like—takes just about two days.

BGI-Shenzhen also has an active health care platform, which Li manages. They’ve developed a variety of affordable, quality diagnostic tests—for instance, tissue-matching via the gold standard Sequencing-Based Typing (SBT). China is seeing an increasing number of bone marrow transplants, yet most diagnostic laboratories remain unequipped with the expensive SBT commercial kits. Hence BGI’s decision to manufacture their own SBT reagents and software.

Public health and smarter disease surveillance are additional foci, particularly “digitalized health,” complete with databases for personal health records. Li’s platform has successfully introduced this improved system to Chinese communities in Yunnan Province, Inner Mongolia, and Tibet—nearly 25,000 individuals in all—and with the support of their local government, they’re doing the same for residents in Shenzhen’s Yantian district, which surrounds the institute.

Not surprisingly, BGI dabbles in cloning and genetic engineering, mainly for agricultural and animal husbandry purposes. Researchers in this division use handmade cloning (HMC) technology—a cheaper and simpler alternative—to produce transgenic pigs. They’ve already created a porcine model of Alzheimer’s, collaborating with Danish scientists.

Looking Ahead

Rather than dwell on how many years they’ve been in existence, folks at BGI consider their institute “as young as genomics.” And much like the field itself—which has accelerated at lightning speed within the last decade—BGI now has five additional branches across China, plus a presence in Hong Kong, Denmark, and the U.S. (California). Coming from a tiny brick building devoid of staff, equipment, or money, it’s phenomenal growth. So what does BGI see in its future?

A “Personal Genomics Industry” by 2012, for starters. BGI believes the cost of human genome sequencing could drop below $1000 soon, making feasible an era where digitalized and personalized health records are affordable. They’ve estimated the Chinese market for such services at a fat RMB1 trillion ($146.5 billion).

And of course, more sequencing—of the cute, the edible, and anything else imaginable. BGI-Shenzhen scientists are already working on the emperor penguin, the Tibetan antelope, and the polar bear. “These [creatures] will help us understand how living organisms adapt to extreme environments,” Li says. “And we think it is fun as well. Actually, we want to sequence everything—and we will.”

Blueprint of the Supercomputer Center

Requirement
Genomics is a typical data-intensive computational application. The sequencing platform currently generates over 10 Tb of raw data every day.

Milestones
• By the end of 2008: 20 Tflops (tera floating-point operations per second), 1 PB storage
• By mid-2009: 50 Tflops, 5 PB storage
• By the end of 2009: 100 Tflops, 10 PB storage

System Architecture
• Computational capability: >100 Tflops, Linux cluster system, 4 ways x 4 cores CPUs and 32-64 RAM per node, ~500 computing nodes;
• Storage: 10 PB, large-scale parallel file system, high speed I/O;
• Network: 10 Gb computing Ethernet, 100 Mb management Ethernet;
• System: professional high-performance Linux cluster system and job management;
• Software: bioinformatics software development by our own team

Budget
RMB60 million ($8.79 million)
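The throughput and storage figures quoted in this article can be sanity-checked with a little arithmetic. The Python sketch below is only a back-of-envelope illustration, not anything BGI uses. It takes the article's numbers (60 gigabases per day, 29 Genome Analyzers, 10 PB of planned storage, "over 10 Tb" of raw data per day) and adds three assumptions that are not in the article: a haploid human genome of roughly 3 gigabases, "Tb" read as terabytes, and decimal units (1 PB = 1000 TB). The function names and the 30x coverage value are hypothetical, chosen for illustration.

```python
# Back-of-envelope check of figures quoted in the article.
# Assumptions (NOT from the article): ~3 gigabases per haploid human genome,
# "Tb" interpreted as terabytes, decimal units (1 PB = 1000 TB).

HUMAN_GENOME_GIGABASES = 3.0  # assumed approximate genome size


def genomes_per_day(daily_gigabases: float, coverage: float = 1.0) -> float:
    """Human genomes' worth of sequence produced per day at a fold coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return daily_gigabases / (HUMAN_GENOME_GIGABASES * coverage)


def days_to_fill(storage_pb: float, daily_tb: float) -> float:
    """Days until storage is full if raw data arrives at a constant rate
    and nothing is ever deleted."""
    if daily_tb <= 0:
        raise ValueError("daily_tb must be positive")
    return storage_pb * 1000.0 / daily_tb


if __name__ == "__main__":
    # Single-pass (1x) genome equivalents per day at 60 gigabases/day.
    print(genomes_per_day(60))
    # The same output spread over a hypothetical 30x coverage per genome.
    print(genomes_per_day(60, coverage=30))
    # Average gigabases per machine per day across 29 instruments.
    print(60 / 29)
    # Days to fill 10 PB at 10 TB of raw data per day.
    print(days_to_fill(10, 10))
```

Under these assumptions, Li's "20 times a day" corresponds to single-pass (1x) output. Any real project sequences each genome many times over, so the number of finished genomes per day would be correspondingly lower.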
