Genome-Wide Transcription and the Implications for Genomic Organization

REVIEWS

Philipp Kapranov, Aarron T. Willingham and Thomas R. Gingeras

Abstract | Recent evidence of genome-wide transcription in several species indicates that the amount of transcription that occurs cannot be entirely accounted for by current sets of genome-wide annotations. Evidence indicates that most of both strands of the human genome might be transcribed, implying extensive overlap of transcriptional units and regulatory elements. These observations suggest that genomic architecture is not colinear, but is instead interleaved and modular, and that the same genomic sequences are multifunctional: that is, used for multiple independently regulated transcripts and as regulatory regions. What are the implications and consequences of such an interleaved genomic architecture in terms of increased information content, transcriptional complexity, evolution and disease states?
Affymetrix, Inc., Santa Clara, California 95051, USA. Correspondence to T.R.G. e-mail: [email protected] doi:10.1038/nrg2083
Nature Reviews Genetics, Volume 8, June 2007

The description of the lac operon in 1961 by Jacob and Monod [1] established a conceptual model of gene organization by which a DNA sequence is neatly split into separate regulatory and protein-coding portions, with the protein-coding portion of a gene preceded by a defined region of DNA that regulates its transcriptional initiation and followed by a functional stretch of DNA that controls its termination. This simple but elegant model has been supported by a wealth of biochemical and genetic data and has consequently become engraved in all thinking regarding genomic organization for nearly every species.

A simplistic extension of this model is that a region in a genome usually has just one function; so, a genome consists of a linear arrangement of different functional elements that are interspersed with non-functional elements. For example, a region of DNA can be either a promoter or an exon, but usually not both. The advent of genome-wide techniques for studying transcription has enabled transcriptome studies on an unprecedented scale (BOX 1). What emerges is that a genomic region can be used for different purposes and that different functional elements can co-locate in the same region in a genome. This observation prompts us to re-evaluate the current dogma, which can be referred to as the 'colinear' model, and indicates an alternative model for genomic organization. This 'interleaved' model reflects the observation that multiple functional elements can overlap in the same genomic space. Here we discuss recent empirical data supporting this and consider the implications, advantages and challenges of this new model of genomic architecture.

Emerging genomic architecture

In-depth analyses of the transcriptional outputs of the human, mouse, fly and other genomes from a range of experimental approaches (TABLE 1; BOX 1) suggest that the information content of a genome is complex, and that this complexity manifests itself at two levels. The fraction of a genome that is used as an information carrier is much higher than previously expected, and much of the unannotated transcription, the so-called 'transcriptional dark matter' [2], remains to be characterized. Unbiased transcriptome profiling using tiling arrays for ten human chromosomes revealed that 56% of the transcribed base pairs in cytosolic polyadenylated RNA (the cytosol contains the most mature, processed RNAs) do not correspond to annotated exons of protein-coding genes, mRNAs or ESTs [3]. The complexity of nuclear transcriptomes is much higher: fivefold more transcribed base pairs are detected in nuclear RNA than in cytosolic RNA [3], and approximately 80% corresponds to the unannotated portion of the genome [3]. In total, ~15% of all interrogated base pairs can be detected as RNA molecules (either in the cytosol or in the nucleus) in a single human cell line. This is in contrast to a total of 1–2% of base pairs that correspond to the exons of all the annotated protein-coding genes [3]. These data strongly indicate that a significant portion of the human genome can be transcribed. Estimates made by the Encyclopedia Of DNA Elements (ENCODE) consortium [4], a large multidisciplinary and collaborative effort to characterize the regulatory landscape of ~1% of the human genome, suggest that this is in fact the case. Depending on which empirical data sets are included in the estimate, as much as 93% of the genomic sequences in the surveyed ENCODE regions seem capable of being transcribed [5]. This estimate is derived from the union of all intronic and exonic sequences detected by several empirical RNA-mapping technologies in multiple biological samples. A surprisingly large number of unannotated transcripts [6,7] or novel isoforms of protein-coding genes [5] for which primary structures have been elucidated by sequencing do not seem to encode proteins. These transcripts are often referred to as non-coding RNAs (ncRNAs). This is a putative designation, as it is possible that some might in fact encode short proteins or peptides. The term 'transcript of unknown function' (TUF) [3,8] has been proposed by the ENCODE consortium to denote such putative non-coding molecules, thus reserving the label 'ncRNA' for those RNAs for which there is some functional evidence.

Multifunctional usage of the same genomic space is common. Overlapping transcripts can be produced from the same or opposite strands of DNA. The regions of overlap of transcripts from opposite strands can include the exons that are present in mature RNAs, or be mostly confined to the introns. This is exemplified by the phosphatidylserine decarboxylase (PISD) gene, which has at least nine overlapping independent transcripts in its genic boundary (FIG. 1). Many detected

Tiling array
A microarray design in which the probes are selected to interrogate a genome with a consistent, pre-determined spacing between each probe.

Box 1 | Technologies for mapping RNA expression

The methods for analysing structure and expression levels of the RNAs described in this Review can be broadly classified into two groups: sequencing-based and hybridization-based approaches.

Sequencing-based approaches. These approaches rely on obtaining direct information about the order of nucleotides in an RNA molecule. They can be further subdivided into methods that involve sequencing of full-length or nearly full-length RNAs, or sequencing of short portions of RNAs, typically derived from the 3′ (SAGE) [113], 5′ (CAGE) [96] or both termini (PET) [97] of the corresponding RNAs. Before sequencing, RNAs are converted into cDNAs that can be further processed to generate truncated cDNAs that contain only short sequences or 'tags' (typically ~14–22 bases) that represent the sequences from either one of the two termini of the original RNA. Generation of tags significantly increases throughput, which in turn significantly increases depth of coverage.

Sequencing-based methods provide the most detailed information about the structure of an RNA molecule, but they have a much lower throughput than hybridization-based methods. Therefore, full-length cDNA sequencing is typically used to catalogue exemplars of different RNA molecules, rather than as a means to comprehensively count the number of molecules in a sample. However, sequencing of short tags of cDNAs, such as CAGE, SAGE and PET tags, has greatly benefited from the increases in throughput and parallelism of sequence-readout methods, and is now used to identify the 5′ and 3′ termini of RNA molecules and estimate their abundance.

Hybridization-based approaches. These methods rely on measuring the magnitude of hybridization of a probe to its target in a complex background, relative to the signal from the background or from control probes. In hybridization-based detection, a probe would detect all molecules that contain regions of complementarity to that probe. If RNA molecules are not separated (using traditional gel-based techniques, that is, northern blots) before hybridization, the net sum of the hybridization signal is the sum of the signal from all the different molecules that can hybridize to a probe. Compared with the sequencing-based methods, the main advantages of hybridization-based methods are higher throughput and depth of sampling.
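
The additive nature of hybridization signal described in Box 1, combined with the fixed probe spacing of a tiling array, can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the analysis pipeline used in the studies cited here: the `Transcript` type, the function names, the coordinates and the abundance values are all hypothetical, a transcript is assumed to hybridize only if it covers the whole probe, and signal is assumed to be linear in abundance.

```python
from typing import List, NamedTuple


class Transcript(NamedTuple):
    """A toy RNA molecule mapped onto a genomic region (hypothetical data model)."""
    start: int        # first covered base, 0-based inclusive
    end: int          # one past the last covered base (exclusive)
    abundance: float  # arbitrary units


def tile_probes(region_length: int, probe_length: int, step: int) -> List[int]:
    """Return probe start positions placed at a fixed step across the region."""
    return list(range(0, region_length - probe_length + 1, step))


def probe_signal(probe_start: int, probe_length: int,
                 transcripts: List[Transcript]) -> float:
    """Sum the abundance of every transcript that fully contains the probe.

    Assumption: a transcript hybridizes only if it covers the whole probe,
    and signal is linear in abundance. Real arrays are noisier than this.
    """
    probe_end = probe_start + probe_length
    return sum(t.abundance for t in transcripts
               if t.start <= probe_start and probe_end <= t.end)


# Two overlapping toy transcripts in a 100-base region.
transcripts = [Transcript(0, 60, 2.0), Transcript(20, 100, 1.0)]
starts = tile_probes(region_length=100, probe_length=25, step=25)
signals = [probe_signal(s, 25, transcripts) for s in starts]
print(list(zip(starts, signals)))
```

With these toy values the probe at position 25 lies inside both transcripts and reports their combined abundance (3.0), whereas the flanking probes report only one transcript each. The probe alone therefore cannot tell which molecule, or how many distinct molecules, produced its signal, which is why overlapping transcripts are hard to resolve by hybridization without a prior separation step.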
