Genomes:Genomes: Whatwhat Wewe Knowknow …… Andand Whatwhat Wewe Don’Tdon’T Knowknow

Genomes:Genomes: WhatWhat wewe knowknow …… andand whatwhat wewe don’tdon’t knowknow Complete draft sequence 2001 OctoberOctober 15,15, 20072007 Dr.Dr. StefanStefan Maas,Maas, BioSBioS Lehigh Lehigh U.U. © SMaas 2007 What we know Raw genome data © SMaas 2007 The range of genome sizes in the animal & plant kingdoms !! NoNo correlationcorrelation betweenbetween genomegenome sizesize andand complexitycomplexity © SMaas 2007 What accounts for the often massive and seemingly arbitrary differences in genome size observed among eukaryotic organisms? The fruit fly The mountain grasshopper Drosophila melanogaster Podisma pedestris 180 Mb 18,000 Mb The difference in genome size of a factor of 100 is difficult to explain in view of the apparently similar levels of evolutionary, developmental and behavioral complexity of these organisms. © SMaas 2007 ComplexityComplexity doesdoes notnot correlatecorrelate withwith genomegenome sizesize 3.4 × 10 9 bp 6.7 × 1011 bp Homo sapiens Amoeba dubia © SMaas 2007 ComplexityComplexity doesdoes notnot correlatecorrelate withwith genegene numbernumber ~31,000~31,000 genesgenes ~26,000~26,000 genesgenes ~50,000~50,000 genesgenes © SMaas 2007 IsIs anan ExpansionExpansion inin GeneGene NumberNumber drivingdriving EvolutionEvolution ofof HigherHigher Organisms?Organisms? Vertebrata 30,000 Urochordata 16,000 Arthropoda 14,000 Nematoda 21,000 Fungi 2,000 – 13,000 Vascular plants 25,000 – 60,000 Unicellular sps. 5,000 – 10,000 Prokaryotes 500 - 7,000 © SMaas 2007 Structure of DNA Watson and Crick in 1953 proposed that DNA is a double helix in which the 4 bases are base paired, Adenine (A) with Thymine (T) and Guanine (G) with Cytosine (C). © SMaas 2007 © SMaas 2007 © SMaas 2007 Steps in the folding of DNA to create an eukaryotic chromosome 30 nm fiber (6 nucleosomes per turn) FactorFactor ofof condensation:condensation: Ca.Ca. 10,00010,000 foldfold Why? ! Facilitates movement during cell division ! Decreases error rate © SMaas 2007 Genes code for Proteins with RNA as an intermediate © SMaas 2007 The Genetic Code © SMaas 2007 The human karyotype short arm long arm 23 pairs of chromosomes © SMaas 2007 Gene splicing: Removal of non-coding introns ExonExon IntronIntron mature splice product PROTEIN © SMaas 2007 HumanHuman genomegenome factsfacts •• ~~ 30,00030,000 genesgenes •• OnOn average:average: ––CodingCoding length:length: 1.41.4 kbkb ––GeneGene extent:extent: 3030 kbkb ––88 ExonsExons (135(135 bpbp),), 77 intronsintrons (2,200(2,200 bpbp)) ––GeneGene density:density: 11.5/111.5/1 MbMb © SMaas 2007 IsIs thethe genomegenome emptyempty ?? Exons 1.5%1.5% Introns (junk?) Intergenic regions (junk?) © SMaas 2007 Correlation between complexity and amount of non-coding DNA 1.0 © SMaas 2007 © SMaas 2007 DNA with low complexity is located in the middle and at the ends of a chromosome = gene poor region © SMaas 2007 Imagine a gene that could duplicate copies of itself within the genome time © SMaas 2007 Repeats dominate the human genome One megabase from chromosome 7 UCSC Genome Browser Interspersed repeats http://genome.cse.ucsc.edu Human genome: ~ 50% repetitive sequences © SMaas 2007 Transposable elements in the human genome © SMaas 2007 GenomeGenome vsvs TranscriptomeTranscriptome How much of the genomic DNA is converted into RNA sequences? © SMaas 2007 TranscriptomeTranscriptome factsfacts • 97-98% of transcriptional output is non coding RNA (ncRNA) • Only 1.5% of genome are protein coding • But: including introns, 30% of genome is transcribed • Adding ncRNA genes: >50% of genome is transcribed © SMaas 2007 Genotyping © SMaas 2007 Sequence variations between individuals © SMaas 2007 Sequence variations between individuals © SMaas 2007 The Genome Sequence -- an open book …. (written in well-known language but poorly understood grammar) How make sense of whole genome sequences? Bioinformatics helps to: " Find regulatory sequences " Find protein coding sequences " Find related sequences " Compare sequences across species Main aim: understand what are the functions of all genes and how they work together © SMaas 2007 © SMaas 2007 Functional classification of expressed genes Collection of expressed genes RandomlyRandomly selectedselected sequencessequences fromfrom humanhuman cellscells groupedgrouped byby functionfunction © SMaas 2007 Genes in the Mycoplasma genitalium classified by function SmallestSmallest genomegenome ofof anyany knownknown freefree livingliving organismorganism ! What is the smallest number of genes needed for survival? 580 kb genome <> 471 genes © SMaas 2007 Functional genomics I Use of DNA microarrays (chips) Fluorescently tagged cDNA probes are hybridized to DNA spots in the microarray for normal vs brain brain tumor studying differential expression of Example: thousands of genes at a time in two mRNA samples © SMaas 2007 What is Proteomics © SMaas 2007 © SMaas 2007 Steps in the creation of a transgenic mouse © SMaas 2007 © SMaas 2007 Summary - Conclusions We do know: • Complete human genome sequence • Function of ca. 50% of protein-coding genes • Some interactions and regulation of genes • > 2700 disease-causing single gene mutations We don’t know: • Function of at least 50% of genes • How do gene products work together ? (within cells and within organism) • What makes the human so special ? (its not the gene number or genome size) © SMaas 2007.

Genomes:Genomes: Whatwhat Wewe Knowknow …… Andand Whatwhat Wewe Don’Tdon’T Knowknow

Genome Sequence of the Progenitor of the Wheat D Genome Aegilops Tauschii Ming-Cheng Luo1*, Yong Q

Saccharomyces Cerevisiae

Improved Maize Reference Genome with Single-Molecule Technologies Yinping Jiao1, Paul Peluso2, Jinghua Shi3, Tiffany Liang3, Michelle C

Apis Mellifera)

Genome-Wide Analysis of Admixture and Adaptation in the Africanized Honeybee

Recombination in Diverse Maize Is Stable, Predictable, and Associated with Genetic Load

Co-Expression of Neighboring Genes in the Zebrafish (Danio Rerio) Genome

Ceranae, an Emergent Pathogen of Honey Bees

Genome-Wide Analysis of Local Chromatin Packing in Arabidopsis Thaliana

Genome-Wide DNA Alterations in X-Irradiated Human Gingiva Fibroblasts

Saccharomyces Cerevisiae HOWARD BUSSEY*T, DAVID B

Genomewide Analysis of Admixture and Adaptation in the Africanized Honeybee