Genomes: What we know … and what we don’t know

Complete draft sequence 2001

October 15, 2007 Dr. Stefan Maas, BioS Lehigh U. © SMaas 2007

What we know

Raw data

The range of genome sizes in the animal & plant kingdoms

→ No correlation between genome size and complexity

What accounts for the often massive and seemingly arbitrary differences in genome size observed among eukaryotic organisms?

The fruit fly: 180 Mb

The mountain grasshopper Podisma pedestris: 18,000 Mb

The 100-fold difference in genome size is difficult to explain in view of the apparently similar levels of evolutionary, developmental and behavioral complexity of these organisms.

Complexity does not correlate with genome size

Homo sapiens: 3.4 × 10^9 bp

Amoeba dubia: 6.7 × 10^11 bp
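The size gaps quoted on these slides can be checked with a few lines of arithmetic. The sketch below only uses the four genome sizes given above; the dictionary keys and the function name `size_ratio` are illustrative choices, not part of the lecture.

```python
# Genome sizes as quoted on the slides, in base pairs.
GENOME_SIZE_BP = {
    "fruit fly": 180_000_000,
    "Podisma pedestris": 18_000_000_000,
    "Homo sapiens": 3_400_000_000,
    "Amoeba dubia": 670_000_000_000,
}


def size_ratio(larger: str, smaller: str) -> float:
    """Return how many times larger the first genome is than the second."""
    return GENOME_SIZE_BP[larger] / GENOME_SIZE_BP[smaller]


# Grasshopper vs fruit fly: the factor of 100 mentioned in the text.
print(size_ratio("Podisma pedestris", "fruit fly"))

# Amoeba vs human: close to a 200-fold difference.
print(round(size_ratio("Amoeba dubia", "Homo sapiens")))
```

Both ratios make the same point as the slides: genome size alone says little about organismal complexity.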

Complexity does not correlate with gene number

~31,000 genes   ~26,000 genes   ~50,000 genes

Is an Expansion in Gene Number driving Evolution of Higher Organisms?

Vertebrata 30,000

Urochordata 16,000

Arthropoda 14,000

Nematoda 21,000

Fungi 2,000 – 13,000

Vascular plants 25,000 – 60,000

Unicellular sps. 5,000 – 10,000

Prokaryotes 500 – 7,000

Structure of DNA

Watson and Crick in 1953 proposed that DNA is a double helix in which the 4 bases are base paired, Adenine (A) with Thymine (T) and Guanine (G) with Cytosine (C).
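The pairing rule means that one strand fully determines the other. A minimal sketch, assuming plain uppercase or lowercase A/C/G/T input; the function names are illustrative:

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-paired partner of each base, in the same order."""
    return strand.upper().translate(PAIRING)


def reverse_complement(strand: str) -> str:
    """Return the opposite strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Because the two strands run antiparallel, the reverse complement is the form usually wanted when the second strand is written out as a sequence.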

Steps in the folding of DNA to create a eukaryotic chromosome

30 nm fiber (6 nucleosomes per turn)

Factor of condensation: ca. 10,000-fold

Why?

→ Facilitates movement during cell division
→ Decreases error rate

Genes code for Proteins with RNA as an intermediate
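The flow from DNA to RNA to protein can be sketched as two small functions. This is a toy illustration, not a full translator: the codon table below holds only a handful of entries from the standard genetic code, and the input sequence is made up for the example.

```python
# Partial codon table (standard genetic code); only the codons used in the example.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Write the mRNA that matches the DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from position 0 until a stop codon; unknown codons become 'X'."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCAAATAA")  # hypothetical toy gene
print(mrna)             # AUGUUUGGCAAAUAA
print(translate(mrna))  # MFGK
```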

The

The karyotype

short arm

long arm

23 pairs of chromosomes

Gene splicing: Removal of non-coding introns

Exon

Intron

mature splice product

PROTEIN
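Splicing can be pictured as cutting the exon intervals out of the primary transcript and joining them in order. The sketch below assumes the exon coordinates are already known (0-based, end-exclusive); the toy transcript and its coordinates are invented for illustration.

```python
from __future__ import annotations


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals in order, dropping everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: exon 1 + intron + exon 2.
exon1 = "AUGGCC"
intron = "GUAAGUUUUCAG"  # toy intron starting with GU and ending with AG
exon2 = "UUUUAA"
pre_mrna = exon1 + intron + exon2

mature = splice(pre_mrna, [(0, 6), (18, 24)])
print(mature)  # AUGGCCUUUUAA
```

The mature splice product is shorter than the primary transcript by exactly the intron length, which is the point the slide makes graphically.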

Human genome facts

• ~30,000 genes
• On average:
  – Coding length: 1.4 kb
  – Gene extent: 30 kb
  – 8 exons (135 bp), 7 introns (2,200 bp)
  – Gene density: 11.5/1 Mb
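These averages allow a rough estimate of how much of the genome is coding and how much lies within genes. The sketch below is back-of-the-envelope arithmetic only; it assumes the 3.4 × 10^9 bp genome size quoted earlier in the slides, and the variable names are illustrative.

```python
GENE_COUNT = 30_000               # ~30,000 genes
CODING_LENGTH_BP = 1_400          # average coding length, 1.4 kb
GENE_EXTENT_BP = 30_000           # average gene extent, 30 kb
GENOME_SIZE_BP = 3_400_000_000    # assumption: 3.4 x 10^9 bp from the Homo sapiens slide

# Share of the genome that is protein-coding sequence.
coding_fraction = GENE_COUNT * CODING_LENGTH_BP / GENOME_SIZE_BP

# Share of the genome spanned by genes, introns included.
genic_fraction = GENE_COUNT * GENE_EXTENT_BP / GENOME_SIZE_BP

print(f"coding: {coding_fraction:.1%}")  # about 1.2%
print(f"genic:  {genic_fraction:.1%}")   # about 26%
```

Under these assumptions only a little over 1% of the genome is coding sequence, while roughly a quarter lies within gene boundaries once introns are counted.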

Is the genome empty?

Exons 1.5%

Introns (junk?)

Intergenic regions (junk?)

Correlation between complexity and amount of non-coding DNA

DNA with low complexity is located in the middle and at the ends of a chromosome

= gene poor region

Imagine a gene that could duplicate copies of itself within the genome

[Figure axis: time]

Repeats dominate the

One megabase from chromosome 7

UCSC Genome Browser (http://genome.cse.ucsc.edu): Interspersed repeats

Human genome: ~50% repetitive sequences

Transposable elements in the human genome

Genome vs Transcriptome

How much of the genomic DNA is converted into RNA sequences?

Transcriptome facts

• 97-98% of transcriptional output is non-coding RNA (ncRNA)
• Only 1.5% of the genome is protein coding
• But: including introns, 30% of the genome is transcribed
• Adding ncRNA genes: >50% of the genome is transcribed

Genotyping

Sequence variations between individuals

The Genome Sequence -- an open book … (written in a well-known language but with poorly understood grammar)

How do we make sense of whole genome sequences?

Bioinformatics helps to:
• Find regulatory sequences
• Find protein coding sequences
• Find related sequences
• Compare sequences across species

Main aim: understand the functions of all genes and how they work together
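As one concrete illustration of "finding protein coding sequences", the sketch below scans a DNA string for open reading frames: stretches that begin with ATG and run in frame to a stop codon. It is a deliberately naive version that looks at the forward strand only and ignores introns, so it is a teaching toy rather than a real gene finder; the function name, the `min_codons` parameter and the test sequence are all illustrative.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) of forward-strand ORFs, 0-based and end-exclusive.

    Each ORF starts at an ATG and ends just after the first in-frame stop codon.
    min_codons is the minimum number of codons before the stop, counting ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


sequence = "CCATGAAATTTTAGGG"  # hypothetical toy sequence
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])  # 2 14 ATGAAATTTTAG
```

Real gene prediction in eukaryotes has to cope with introns, both strands and regulatory signals, which is part of why the "grammar" of the genome remains hard to read.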

Functional classification of expressed genes

Collection of expressed genes

Randomly selected sequences from human cells grouped by function

Genes in Mycoplasma genitalium classified by function

Smallest genome of any known free-living organism

→ What is the smallest number of genes needed for survival?
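For scale, the figures quoted on this slide for Mycoplasma genitalium (a 580 kb genome holding 471 genes) can be turned into a gene density and compared with the human value of 11.5 genes per Mb given earlier. A small sketch; the function name is illustrative:

```python
def genes_per_mb(gene_count: int, genome_size_bp: int) -> float:
    """Return the number of genes per megabase of genome."""
    return gene_count / (genome_size_bp / 1_000_000)


mycoplasma_density = genes_per_mb(471, 580_000)
human_density = 11.5  # genes per Mb, from the human genome facts slide

print(round(mycoplasma_density))                     # 812 genes per Mb
print(round(580_000 / 471))                          # about 1231 bp of genome per gene
print(round(mycoplasma_density / human_density, 1))  # roughly 70-fold denser than human
```

At about one gene per 1.2 kb, this genome leaves very little room for non-coding sequence, in sharp contrast to the human genome.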

580 kb genome ↔ 471 genes

Functional genomics I

Use of DNA microarrays (chips)

Fluorescently tagged cDNA probes are hybridized to DNA spots in the microarray for studying differential expression of thousands of genes at a time in two mRNA samples.

Example: normal brain vs brain tumor
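Differential expression from a two-sample array is commonly summarized as a log2 ratio of the two fluorescence intensities per spot. The sketch below is one simple way to do that, not a description of any particular analysis pipeline: the gene names and intensity values are invented, and the pseudocount and the two-fold threshold are assumptions chosen for illustration.

```python
import math


def log2_ratio(tumor: float, normal: float, pseudocount: float = 1.0) -> float:
    """Return log2(tumor / normal), with a pseudocount to avoid division by zero."""
    return math.log2((tumor + pseudocount) / (normal + pseudocount))


def classify(intensities: dict, threshold: float = 1.0) -> dict:
    """Label each gene 'up', 'down' or 'unchanged' in tumor relative to normal.

    A threshold of 1.0 on the log2 scale corresponds to a two-fold change.
    """
    calls = {}
    for gene, (normal, tumor) in intensities.items():
        ratio = log2_ratio(tumor, normal)
        if ratio >= threshold:
            calls[gene] = "up"
        elif ratio <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical spot intensities: gene -> (normal brain, brain tumor).
spots = {"gene_a": (100, 800), "gene_b": (500, 520), "gene_c": (900, 100)}
print(classify(spots))
```

Working on the log scale makes a two-fold increase and a two-fold decrease symmetric around zero, which is why it is a convenient summary for this kind of comparison.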

What is Proteomics

Steps in the creation of a transgenic mouse

Summary - Conclusions

We do know:
• Complete human genome sequence
• Function of ca. 50% of protein-coding genes
• Some interactions and regulation of genes
• > 2700 disease-causing single gene mutations

We don’t know:
• Function of at least 50% of genes
• How do gene products work together? (within cells and within organism)
• What makes the human so special? (it’s not the gene number or genome size)