Genome and Transcriptome Comparisons Between Human And
Total Page:16
File Type:pdf, Size:1020Kb
! "## $%$&%'% "# ()*&($&+&))'& ,,,, - &$$*(. ! " #$%&%' ''()''* $ + * *,$ $-. *# /01$ 2 +$0 " 30%' '04 1 5 2 5$ 603 0 %%0 0789(!( &%'%0 1$$ 6$ : + $2 + ;< + 05 *$$ $ 6+ *+ $ $ * $ * *0 1$ *$$ $6 ** 2 $2 2$+ $+ = $ + *01$*2 * + + -,15/ !> *$ $ 6+ 01$ *,15+ 2$ $$ $ $ + 0# ,15+ $ ** $,152 $,15**$ * * 07 $$ 2 +$ + $ + $+$ 01$$ 2$+ ** 2 $ 2 03 > *$+ ** 2 $* *$+ $ 2 ** + 0" *%! 2$$ *$ * * 07 $* $ 2 = + $$ 6 * 0"+ $* + -1/0 1$? *1 + $ 03 *1 * $$ + 01$ ** * $ *$ 2 * *+ $ 2$1 * 0 7 $ $ * + 2 $ $ 601$* + * *$ * $ + ** ** + $ + 2 $ $ 60 $ $ 6+ + = + + ! "# "$!%! " "&'()*+) " @3 " %' ' 79 < <%'< 789(!( &%'% ) ))) %!(A-$ )BB 00B C D ) ))) %!(A/ Problems worthy of attack prove their worth by hitting back. Piet Hein, Danish scientist and philosopher (1905-1996) Till min familj List of Papers This thesis is based on the following papers, which are referred to in the text by their Roman numerals. I Comparative genomic analysis of human and chimpanzee indicates a key role for indels in primate evolution. Wetterbom, A., Sevov, M., Cavelier, L., Bergström, TF. J Mol Evol (2006) Nov;63(5):682-90 II Genome-wide analysis of chimpanzee genes with premature termi- nation codons. Wetterbom, A., Gyllensten, U., Cavelier, L., Bergström, TF. BMC Genomics (2009) Jan 29;10:56 III Global comparison of the human and chimpanzee transcriptomes using Affymetrix exon arrays. Wetterbom, A., Ameur, A., Feuk, L., Gyllensten, U., Bergström, TF. Cavelier, L. Manuscript. IV Deep sequencing of the chimpanzee transcriptome identifies numer- ous transcribed regions in frontal cortex and liver. Wetterbom, A.*, Ameur, A.*, Feuk, L., Gyllensten, U., Cavelier, L. Submitted. * Both authors contributed equally. Reprints were made with permission from the respective publishers. Related papers by the author Indels in the evolution of the human and primate genomes. Wetterbom, A., Cavelier, L., Bergström, TF. In: ENCYCLOPEDIA OF LIFE SCIENCES. John Wiley & Sons, Ltd: Chichester. (July 2008). http://www.els.net/ [doi:10.1002/9780470015902.a0020851]. Global and unbiased detection of splice junctions from RNA-seq data. Ameur, A., Wetterbom, A., Feuk, L., Gyllensten, U. Submitted. Contents Introduction...................................................................................................11 The genome...................................................................................................14 Genomic variation and nucleotide divergence between human and chimpanzee...............................................................................................14 The transcriptome: from DNA to RNA ........................................................17 Protein coding genes ................................................................................17 Other transcription products.....................................................................19 Implications for phenotypic divergence...................................................21 Enabling technologies...................................................................................25 Publicly available data: genome assemblies and annotations ..................25 Gene expression arrays.............................................................................26 Next-generation sequencing.....................................................................27 Analysis of data from high-throughput methods......................................30 Present research ............................................................................................32 Aims .........................................................................................................32 Paper I: Comparative genomic analysis of human and chimpanzee indicates a key role for indels in primate evolution .................................33 Paper II: Genome-wide analysis of chimpanzee genes with premature termination codons ...................................................................................35 Paper III: Global comparison of the human and chimpanzee transcriptomes using Affymetrix exon arrays ..........................................36 Paper IV: Deep sequencing of the chimpanzee transcriptome identifies numerous transcribed regions in frontal cortex and liver.........................39 General discussion of the present research...............................................40 A note on the ethical implications of working with primate samples ......44 Concluding remarks and future perspectives................................................46 Why do we bother? ..................................................................................46 What have we learnt? ...............................................................................46 Where do we go next?..............................................................................47 Sammanfattning på svenska..........................................................................49 Acknowledgements.......................................................................................51 References.....................................................................................................53 Abbreviations BAC Bacterial artificial chromosome Bp Base pair(s) cDNA Complementary DNA CNS Conserved noncoding sequence CNV Copy number variant DNA Deoxyribonucleic acid ENCODE The encyclopedia of DNA elements Gb Giga base pairs GO Gene Ontology EJC Exon junction complex EPHX1 Epoxide hydrolase 1 EST Expressed sequence tag FDR False discovery rate FOXP2 Forkhead box P2 GSTO2 Glutathione S-transferase omega 2 HPR Human proteome resource Indel Insertion or deletion Kb Kilo base pairs Mb Mega base pairs MYA Million years ago mRNA Messenger RNA miRNA Micro RNA ncRNA Noncoding RNA NGS Next-generation sequencing NMD Nonsense mediated decay RNA Ribonucleic acid PCR Polymerase chain reaction PTC Premature termination codon rRNA Ribosomal RNA SD Segmental duplication siRNA Short interferring RNA snRNA Small nuclear RNA snoRNA Small nucleolar RNA SOLiD Sequencing by oligonucleotide ligation and detection tRNA Transfer RNA TSS Transcription start site UTR Untranslated region Introduction One of the most famous scholars of Uppsala University was Carl von Linné (1707-1778), a Swedish botanist, zoologist and physician who is recognised as the founder of modern taxonomy. Linné noticed the morphological simi- larities between humans and the great apes and classified the species to- gether in his great work, Systema Naturae (Linné 1735). Although the deci- sion to group humans and great apes together was biologically correct, it was controversial since it challenged the prevailing religious view of humans as the crown of creation. It would take more than a century before Linné’s work was supported by Thomas Huxley (1864) and Charles Darwin (1871) who also argued that humans were evolutionarily related to chimpanzees (Figure 1) and other great apes. Today it is widely accepted that chimpan- zees represent humankind’s closest living relative and molecular evidence places the last common ancestor of human and chimpanzee 5-7 million years ago (Figure 2). Figure 1. Chimpanzee at the Kolmården Zoo. Photograph by Anna Wetterbom. Since the time of separation both lineages have acquired a number of changes in their genomes and the initial genome comparison (Mikkelsen et al. 2005) revealed a nucleotide divergence of ~ 1.2 % and an additional indel 11 divergence of ~ 3 %. Other types of highly variable regions have also been described, such as segmental duplications (Cheng et al. 2005b) and trans- posable Alu elements (Mikkelsen et al. 2005), further adding to the variation. This implies that the popularly cited ‘estimate’ that humans and chimpan- zees are > 98 % alike is not entirely true. It all depends on which type of variation we consider. Figure 2. Species tree indicating the relationship between human, chimpanzee (Pan troglodytes), bonobo (Pan paniscus) and other closely related primates. Divergence times, in million years ago (MYA), are indicated to the left (Chen and Li 2001; Glazko and Nei 2003; Yu et al. 2003). Changes in the nucleotide sequence provide the raw material for evolu- tion. If a genetic variant results in a phenotype that confers an advantage to the carrier, the frequency of this genetic variant is likely to increase in the population. This is referred to as positive selection. The opposite is known as purifying selection, which acts to remove genotypes that cause a deleterious phenotype. However, most changes in the nucleotide sequence are selec- tively neutral, or nearly neutral. Meaning that the molecular changes either (i) do not change the phenotype or (ii) the resulting phenotype is neither advantageous nor deleterious. The neutral theory of molecular evolution was proposed by Kimura (1968) and it states that the majority of genetic changes occuring during evolution have little or no functional impact. The theory was initially developed to account for genetic alterations and it has recently been extended by Khaitovich et al (2004b) who suggested that changes in gene expression also follow a model for neutral evolution. Their results imply that similar to changes at the DNA level, many genetically determined expres- sion differences within and between species have little or