
Beginning & Basis of Biotechnology

The Fourth Asian Conference on Biotechnology and Development
February 12-13, 2009, Kathmandu, Nepal

Huanming Yang, Ph.D.
Beijing Genomics Institute (BGI)

1. A world of crises and opportunities
2. Two pillars and platforms of genomics
3. “Three BY ALLs” for genomics
4. Four proposals for collaboration
5. Five collaborative projects undergoing

Biotechnology: challenges and opportunities for Asia, for developing countries

2007-2008: Crisis
• Global energy crisis
• Global food crisis
• Global financial crisis
• Global economic crisis

Crisis & Opportunity

2007: A Year of Miracle

“There were more breakthroughs (in life sciences) last year (2007) than in several of the past decades combined.”

Dr. Eric Topol (Scripps Institute, March 11, 2008): The hunt for genetic gold

“A Year of Miracle”

“These findings are just a prelude to what's shaping up as a true conceptual and technological revolution. Just as physics shocked the world in the 20th century, it is now clear that the life sciences will shake up the world in the 21st.”

Newsweek, Oct. 15, 2007

2007: The Year of Sequencing

Nature Methods 5:11-14, 2008

2007 “Question of the Year”

The editors of Nature Genetics posed the 2007 “Question of the Year”: What does it mean if a human genome sequence costs less than $1000?

The “$1000 genome” is no longer a question of whether it is possible, but of when it will be realized. Such a question was hard to imagine only 4 years ago.

BREAKTHROUGH OF THE YEAR: Equipped with faster, cheaper technologies for sequencing DNA and assessing variation in genomes on scales ranging from one to millions of bases, researchers are finding out how truly different we are from one another.

Science 318: 1842–1843, 21 December 2007

Now, we have moved from asking what in our DNA makes us human to striving to know what in my DNA makes me me.

Volume 22 | Issue 12 | Dec. 2008

TIME’s Best Inventions of 2008 (9 Nov. 2008)

1. The Retail DNA Test (the personal DNA testing service tops the list)

“We are at the beginning of a personal-genomics revolution that will transform not only how we take care of ourselves but also what we mean by personal information.”

TIME magazine

Top 10 Medical Breakthroughs of 2008

4. Genomes for the Masses

James Watson did it. So did Craig Venter. Now you, too, can map your entire genome and reveal some of its many secrets. Scientists debate whether that information is really worth anything at the moment — in many cases, there isn't enough scientific knowledge to interpret what it really means to have this gene variant or that one — but companies at least make it possible for you to take a gander at your genetic data. (Although the service was available previously, until this year, it's been prohibitively expensive.) You provide a sample of saliva, from which your DNA is extracted, copied and combed for the presence of 90 known genetic variations that code for different traits or conditions, from lactose intolerance (though you could probably drink a glass of milk and find out for far cheaper) to prostate cancer. Right now, there's no way to know whether you'll get cancer just because you have the gene, but once the science has advanced, the hope is that such genetic mining will predict disease, giving people the option of seeking treatment before they get sick.

favours developing countries

To develop your own bioindustry by taking advantage of your rich genetic resources

Genomics

A science to find genes

Basis and beginning of Biotechnology

(biotech cannot be done without genes)

2. Two pillars and platforms of genomics

Two Pillars of Genomics

“Life is of sequence”

Genetic information is in sequence ATTCGGTAACGATTAGAA

DNA sequence: The essence of “life is life”!

The same for plants

Two Pillars of Genomics

“Life is digital!” “the instructions for making a life from one generation to the next is digital, NOT analogue …”
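The “digital” claim can be made concrete with a minimal sketch: each of the four bases fits in two bits, so a DNA sequence maps losslessly to a bit string and back. The particular bit assignment and the function names `encode` and `decode` are illustrative assumptions, not anything taken from the slides; the example sequence is the one shown on the “Life is of sequence” slide.

```python
# Hypothetical 2-bit code: any fixed one-to-one assignment of bases to bit pairs works.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(sequence):
    """Turn a DNA string into a string of 0s and 1s, two bits per base."""
    return "".join(BASE_TO_BITS[base] for base in sequence.upper())


def decode(bits):
    """Invert encode(): read the bit string two bits at a time."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


sequence = "ATTCGGTAACGATTAGAA"  # example sequence from the slide
bits = encode(sequence)
print(bits)
print(decode(bits) == sequence)  # the round trip loses no information
```

Two bits per base is the smallest fixed-width code for a four-letter alphabet, which is one reason sequence data lends itself to computation.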

“Life is of sequence”, “Life is digital”: making sequencers and supercomputers major tools for genomics

Two Pillars of Genomics

“Life is of sequence, Life is digital!”

Revolutionizing life sciences

No sequence, no knowledge!

“All biology in the future will start with the knowledge of genomes and proceed hopefully.”

J. D. Watson, 2003

An issue of cost!

“One ‘buck’, one base”

HGP: $3.0 b / 3 Gb ($1.0/bp)
1999: $0.5 b / 3 Gb
2005: $30 m / 3 Gb ($0.01/bp)
Future: (1) $3.0 m / 3 Gb; (2) $0.1 m / 3 Gb
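The per-base figures follow directly from dividing each total by the 3 Gb genome size. A minimal sketch of that arithmetic, using only the totals quoted on the slide; the dictionary labels and the function name `cost_per_base` are illustrative:

```python
GENOME_SIZE_BP = 3_000_000_000  # 3 Gb, the genome size used on the slide

# Total cost per genome in US dollars, as quoted on the slide.
genome_cost_usd = {
    "HGP": 3.0e9,
    "1999": 0.5e9,
    "2005": 30e6,
    "Future (1)": 3.0e6,
    "Future (2)": 0.1e6,
}


def cost_per_base(total_usd, genome_bp=GENOME_SIZE_BP):
    """Cost of one base pair, given the total cost of one genome."""
    return total_usd / genome_bp


for label, total in genome_cost_usd.items():
    print(f"{label:>10}: ${cost_per_base(total):.6f} per bp")
```

The HGP row reproduces the “one buck, one base” figure, and the 2005 row the one cent per base shown in parentheses.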

Sequencing is no longer a “luxury” tool

Next-Generation Sequencers

Aiming at “$100,000 genome”

Sequencing Revolution: 454

20 – 500 Mb (100-500 bp/read) per 6-hour run

Sequencing Revolution: SOLiD

Sequencing by Ligation, 3 Gb/run

Sequencing Revolution: Illumina Solexa

3-6 Gb/6 days
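To put the per-run yields quoted for the three platforms on a common footing, the sketch below estimates how many runs each would need to cover a 3 Gb human genome. The 10-fold coverage target is an assumption chosen for illustration, and the function name `runs_needed` is hypothetical; only the per-run yields come from the slides.

```python
import math

GENOME_BP = 3e9        # 3 Gb human genome, as used in the cost slide
TARGET_COVERAGE = 10   # assumed coverage target, not stated on these slides

# (lowest, highest) yield per run in bases, as quoted on the slides.
yield_per_run_bp = {
    "454": (20e6, 500e6),
    "SOLiD": (3e9, 3e9),
    "Illumina Solexa": (3e9, 6e9),
}


def runs_needed(yield_bp, coverage=TARGET_COVERAGE, genome_bp=GENOME_BP):
    """Whole runs required to produce coverage x genome_bp bases."""
    return math.ceil(coverage * genome_bp / yield_bp)


for platform, (low, high) in yield_per_run_bp.items():
    print(f"{platform}: {runs_needed(high)} to {runs_needed(low)} runs "
          f"for {TARGET_COVERAGE}x coverage")
```

This ignores read length, error rates, and run time, which differ considerably between platforms.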

~1000 molecules per ~1 µm cluster
~1000 clusters per 100 µm square
Random array of ~40 million clusters per experiment

Next-Next-Generation Sequencers

Aiming at “$1000 genome”

NEXT-GEN SEQUENCING TECHNOLOGIES

26 (10):1146, Oct. 2008

PacBio to Start Selling Next-Gen Sequencer To Early Users in 2010

The company projects that with improvements to its and in camera technology, it will eventually be able to generate more than 100 gigabases of sequence data per hour, provide reads at least as long as Sanger sequencing, and offer run times measuring in minutes at a cost of hundreds of dollars. Pacific Biosciences also prepares the 15-Minute Genome by 2013.

GenomeWeb Newsroom, February 13, 2008

VisiGen to Offer 'Nano-Sequencing' $1,000 Genome Service by 2009

[February 12, 2008] By Bernadette Toner, GenomeWeb News Editor

SALT LAKE CITY (GenomeWeb News) – Next-generation sequencing firm VisiGen Biotechnologies plans to offer a service based on its real-time single-molecule sequencing, or "nano-sequencing machine", technology by the end of 2009, and to follow that with the launch of equipment and reagents in another 18 months to two years. The technology could enable researchers to sequence an entire human genome in less than a day for under $1,000, which can generate around 4 gigabases of data per day. At that throughput, the technology could sequence 44 human genomes per year at 10-fold coverage for around $1,000 per genome. In addition, read lengths for the instrument are expected to be around one kilobase.

Oxford Nanopore Technologies

• Earlier this year, UK-based Oxford Nanopore Technologies, a startup company developing a nanopore-based sequencing technology, raised £10 million ($20 million) in a second financing round from non-VC institutional and private investors, adding to a £7.5 million ($15 million) round in 2006 (see In Sequence 4/8/2008).

Technical Platform of Genomics

It is essential for developing countries to build powerful infrastructure for sustainable development of science

① Sequencing

104 × MegaBACE
25 × ABI 3700/3730
18 × Illumina Solexa
1 × 454
2 × ABI SOLiD

“BGI to Ramp up Sequencing Abilities”

NEW YORK, March 26, 2008 (GenomeWeb News) – Beijing Genomics Institute is dramatically expanding its DNA sequencing capacity by adding fourteen new next-generation sequencers, … to bring BGI's raw-sequencing data output to up to 20 Gbps per day or more, making it the 3rd biggest center in the world by capacity.

Space for 40 new sequencers

Programmers > 150

② Supercomputers at BGI, Beijing & Hangzhou

Dawning 3000

Supercomputers in Beijing: SGI, IBM, Dawning 2000, SUN

CPUs: 1192
Memory: 2.3 T
Storage: 1458 T
Speed: 20 TFLOPS

Home-made new supercomputer at BGI-Shenzhen

To read the genome

BGI has contributed to most, if not all, programs for CNV detection and other applications of the next-generation sequencers, especially for Solexa.

This is a revolution

“This is just the start”

3. “Three BY ALLs” for genomics

An imbalanced world

Two “tribes” in the world:

The Rich & the Poor.

Where is her hope? Kathmandu, Feb. 11, 2008

Hope!

The challenge is not only technology, but also humanity!

Genomics should not create more differences or make the differences even bigger

Nature, April 27, 2006

“Human genome sequencing presented a unique opportunity for China to join the international community. I salute all our friends and colleagues at the collaborating institutions for their contributions to this task and for their support of free data-sharing under the spirit of the HGP that is ‘owned by all, done by all and shared by all’,” said Yang.

News Release for Completion of Chromosome 3, April 27, 2006

“The HGP Spirit”

“Owned by all, done by all, shared by all!”

Science, Feb. 16, 2001

The first time in history: free-sharing of genomics data

Providing all countries with almost the same opportunity, placing all countries at a similar starting point in life sciences and biotechnology

“China has become the latest contributor to the worldwide sequencing effort alongside France, Germany, Japan, UK and USA.”
--- International Human Genome Sequencing Consortium, 1 Sept. 1999

USA 54%, UK 33%, Japan 7%, France 3%, Germany 2%, China 1%

Science, Feb. 16, 2001

Canada 10%, China 10% (HKSTU), Japan 25%, UK 25%, USA 30%

Nov. 6, 2008

The International HapMap Project

The genetic markers generated by the HapMap make GWAS possible

Biology Reborn

“In five months, from April through August, geneticists at the Harvard/MIT Broad Institute, founded by Eric Lander; at deCODE in Iceland, founded by Kari Stefansson, and several other institutions have published papers suggesting that the key to a deeper understanding of the human genome may finally be in hand. These scientists have identified specific alterations in the sequence of DNA that play causative roles in a broad range of common diseases,

Newsweek, Oct. 15, 2007

Biology Reborn

“including type 1 and type 2 diabetes; schizophrenia; bipolar disorder; glaucoma; inflammatory bowel disease; rheumatoid arthritis; hypertension; restless legs syndrome; susceptibility to gallstone formation; lupus; multiple sclerosis; coronary heart disease; colorectal, prostate and breast cancers; susceptibility to HIV infection... Unlike so many previous "disease gene" discoveries, these findings are being replicated and validated.”

Newsweek, Oct. 15, 2007

Multi-origin of the domesticated chicken

Another major contribution to science and mankind by China.

“The rice genome is well cooked”

Citations: 1078

“A landmark paper should be read by all plant biologists.”

Let the international scientific community evaluate!

Impacts on global genomics

All the data generated are freely available to all

To make the best use of the free data

To contribute to the databases

for your own country, reasonably both scientifically and economically

>26 countries will join

The 2nd Meeting of the International Nov. 15-18, 2008, Maryland, USA The

Three sequencer-producers applied to join

4. Four proposals for collaboration

Four Proposals for Collaboration on Genomics in Asia

1. To build a network based on friendship among us

Any governmental collaboration is based on personal friendship and mutual trust. I thank the organizers for the opportunity to meet old friends and to make new friends.

2. To share the platforms established and to be established

through a mechanism that works both scientifically and economically

3. To educate the young generation

by running training courses and programs for graduates and visiting scholars, as we did with Hongkong Chinese University, South China S & T University, and Shenzhen University.

4. To join the internationally collaborative projects

to join the ongoing international projects, to sequence samples from your populations, and/or an animal or a plant which is the logo genome for your country.

5. Five collaborative projects undergoing

Five Projects undergoing

1. The “1000 Genomes Project”
2. The Cancer Genome Project (TCGA)
3. The Metagenomics Projects
4. The Pathogen Genome Project
5. Tree of Life: The 10,000 Species (“万物”, “all living things”) Genomes Project

EMBARGOED for Release: Tues., Jan. 22, 2008, 8 a.m. Eastern
Contact: Geoff Spencer, NHGRI, 301 402-0911

International Consortium Announces the 1000 Genomes Project

Major Sequencing Effort Will Produce Most Detailed Map Of Human Genetic Variation to Support Disease Studies

The project will receive major support from the Sanger Institute in Hinxton, England, the Beijing Genomics Institute, Shenzhen (BGI Shenzhen) in China and the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH).

The 1000 Genomes Project: a catalogue of human variation created using next generation sequencing

Data Production (Gb) by Pilot Project

>1,000 x coverage of human genome already in pilots!

• Current size of Genbank: 235,135,312,328 bp

• During September and October the 1000 Genomes project produced the equivalent of GenBank every week

• Raw data is freely available now
  – ftp://ftp.1000genomes.ebi.ac.uk
  – ftp://ftp-trace.ncbi.nih.gov/1000genomes/

Slide courtesy Paul Flicek

1000 Genomes Project Data To Be Released Within Months as Pilot Phase Nears Completion

[November 14, 2008] By Andrea Anderson, GenomeWeb staff reporter

PHILADELPHIA (GenomeWeb News) – The 1000 Genomes Project collaborators plan to begin releasing data early next year and expect to finish sequencing 1,200 human genomes by around the end of 2009, project representative David Altshuler announced yesterday at the American Society of Human Genetics meeting here.

May 31, 2007

Nov. 6, 2008

We have to sequence more and more individuals from all the populations!

Five Projects undergoing

2. The Cancer Genome Project (TCGA)

Human Cancer Genome Project

Recommendation for a Human Cancer Genome Project

Report of Working Group on Biomedical Technology
National Cancer Advisory Board (NCAB)
February 2005

Let science unify us!

1. To turn a country’s project into an internationally collaborative project

2. To call for developing countries to join

NIH TCGA Symposium, April 2005

How big is the project?

50 types of cancers, > 500 samples for each

The Chinese Initiative on Cancer Genomics

Another historic opportunity for China and all Asian countries to join the international community

Scientists Form International Cancer Genome Consortium

EMBARGOED For release after 8 a.m. Eastern, Tues., April 29, 2008

Researchers from four continents today announced they are launching the International Cancer Genome Consortium (ICGC), a collaboration designed to generate high-quality genomic data on up to 50 types of cancer through efforts projected to require up to a decade. The ICGC, which is extending an invitation to all nations to participate, will make its data rapidly and freely available to the global research community.

Current ICGC members include:
• Australia: National Health and Medical Research Council
• Canada: Genome Canada; Ontario Institute for Cancer Research
• China: Chinese Cancer Genome Consortium
• European Union: European Commission (Observer Status)
• India: Department of Biotechnology, Ministry of Science & Technology
• Singapore: Genome Institute of Singapore
• United Kingdom: The Wellcome Trust
• United States: National Institutes of Health (NIH)

Five Projects undergoing

3. The Metagenomics Projects

Human Metagenome: the extension of our genome

Human microbiota: the microorganisms that live inside (in cavities of the human body and even within human cells) and on (the surfaces of) the human body.

Human microbiome (metagenome): the collective genomes of the human microbiota, including the genomes of bacteria (the majority), Archaea, yeasts, and viruses.

The human microbiota contains at least 10 times more cells than the human body itself.
The human microbiome contains at least 100 times more unique genes than the human genome itself.

The complex and dynamic microbiota has a profound influence on physiology, nutrition, immunity and neurology; imbalances in the microbial population are a significant factor in many diseases.

Metagenomics of the Human Intestinal Tract
EU FP7 Large Collaborative Project

The MetaHIT Project

“The consortium plans to sequence the genomes of reference microbes and to generate metagenomic sequence data of samples from the human gut,” according to Dusko Ehrlich, a researcher who coordinates the project.

Five Projects undergoing

4. The Pathogen Genome Project

Pathogens identified in the past 5 years

• SARS-CoV – detection and identification of the pathogen of an emerging infectious disease

• HPAIV-H5 – detection and surveillance of highly pathogenic avian influenza

• Streptococcus suis – detection and identification of Streptococcus suis in Ziyang, Sichuan, and analysis of virulence variation

• Unknown pathogen (Semliki Forest / densonucleosis virus) – rapid detection and identification of the pathogens behind fevers of unknown origin in Hainan and Xinjiang

Sequencing is the most reliable and fastest way for pathogen identification, detection and sub-typing

BGI is acting!

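[Timeline figure: samples obtained; whole-genome sequencing of 4 virus strains completed; virus identified in patient serum; ELISA kit developed; research reports and papers published; 300,000 tests donated]

The first 4 strains of SARS-CoV identified by BGI 24 hours after receiving the samples, April 15-16, 2003

Sequencing and analysis of the Qinghai Lake avian influenza virus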

(Science 309 (5738): 1206, 2005)

Whole-genome sequencing of 4 AIV strains completed within 24 hours, with the following features:

• all are of the H5N1 subtype;

• HA cleavage-site motif PQGERRRKKR/G;

• the amino acid residue at position 627 of the PB2 protein is K;

• a deletion of 20 amino acid residues in NA;

• the probable source is the A/peregrine falcon/HK/D0028/04 virus found in Hong Kong, with reassortment having occurred.

Four strains of AIV identified within 24 hours by sequencing at BGI

International collaborations on TB, AIV, and HPV, …

Five Projects undergoing

5. Tree of Life: The 10,000 Species (“万物”) Genomes Project

“A better understanding of DNA function will come only from generating data from diverse genomes.”

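Nature, 6 Sept. 2007

Decoding the Tree of Life: rice (China), silkworm (China), chicken (international), tobacco (China), panda (international), cotton (China), cucumber (China), melon (international)

“10000 Species Genomes Project”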

Chinese unlock panda's genome

San Diego Union Tribune 2:00 a.m. January 14, 2009 By Terri Somers

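Jingjing, the 2008 Beijing Olympics mascot, … has been sequenced. “The biggest technological advance in genetics analysis has been in gene sequencing, … like BGI did with the panda genome.”

Evolution of the giant panda

Preliminary analysis shows that, among the species sequenced so far, the panda genome is closest to the dog genome (79.9%) and also shares about 62% similarity with the human genome.

Cucumber Genome Project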

Feasibility: 367 Mb, homozygous genome;

Economic value: 38 M tons, 100B RMB for cucurbits

Orphan crop but easy to breed

Other Genome Projects Initiated

• 34 silkworm genomes
• >40 rice genomes
• >10 Drosophila
• Pig
• Oyster
• Duck
• Maize
• Wheat
• Sorghum
• Cabbage
• Potato
……

We have to sequence all the species of scientific, economic, and ecological significance!

[Journal covers: Nature (Feb. 2001), Science (Apr. 2002), Genome Research (Jun. 2003), Science (Dec. 2004), Nature (Dec. 2004), PLoS Biology (Feb. 2005), Nature (Nov. 2008)]

Genomics cannot be done alone!

Let’s work together!

Acknowledgment

Sponsors
Chinese Academy of Sciences
Ministry of Science and Technology
National Natural Science Foundation
Shenzhen, Yantian Governments
Zhejiang, Chongqing Governments
Beijing Municipal Government
Hangzhou Municipal Government
Yueqing Municipal Government

All our supporters, collaborators, international advisors, colleagues and friends, and all my young staff