The study of eQTL variations by RNA-seq: from SNPs to phenotypes

Review

Jacek Majewski and Tomi Pastinen

Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, 740 Dr. Penfield Avenue, Rm 7210, Montreal, Quebec, H3A 1A4, Canada

Corresponding author: Majewski, J. ([email protected]).

Trends in Genetics, February 2011, Vol. 27, No. 2. doi:10.1016/j.tig.2010.10.006. 0168-9525/$ – see front matter © 2010 Elsevier Ltd. All rights reserved.

Common DNA variants alter the expression levels and patterns of many human genes. Loci responsible for this genetic control are known as expression quantitative trait loci (eQTLs). The resulting variation of gene expression across individuals has been postulated to be a determinant of phenotypic variation and susceptibility to complex disease. In the past, the application of expression microarray and genetic variation data to study populations enabled the rapid identification of eQTLs in model organisms and humans. Now, a new technology promises to revolutionize the field. Massively parallel RNA sequencing (RNA-seq) provides unprecedented resolution, allowing us to accurately monitor not only the expression output of each genomic locus but also reconstruct and quantify alternatively spliced transcripts. RNA-seq also provides new insights into the regulatory mechanisms underlying eQTLs. Here, we discuss the major advances introduced by RNA-seq and summarize current progress towards understanding the role of eQTLs in determining human phenotypic diversity.

Complex traits and common variants in humans: noncoding DNA takes center stage

The majority of mutations underlying monogenic disease traits alter protein structure. As a consequence of this observation, protein-coding variants were the primary candidates in the early search for susceptibility alleles for multifactorial, complex disease traits [1]. However, the first 5 years of genome-wide association studies (GWAS) for complex disease have shown that if mapping had been restricted to coding variants alone, only approximately 5% of the currently validated disease associations would have been discovered [2]. Thus, the dissection of the genetic architecture of human disease is now focused on variants residing outside of coding regions; that is, variants that potentially affect regulatory elements. A well-known example, the lactase persistence phenotype in European populations, was mapped 5′ distal to LCT, a gene coding for the lactase enzyme in the small intestine [3]. This noncoding stretch of DNA, within an intron of an adjacent gene and with no known function, was subsequently shown to contain a distal enhancer specific to enterocytes producing lactase in the digestive tract [4]. The replication of this association in other ethnicities [5] confirmed the role of nucleotide substitutions within this element in regulating LCT expression and explaining population differences in their abilities to digest milk sugar (lactose tolerance). The more widespread importance of noncoding or regulatory DNA alterations in disease, as implied by GWAS, now calls for approaches to characterize such variation and its links to disease phenotypes.

Genome-wide identification of loci controlling gene expression

The parallel assessment of thousands of transcripts using DNA microarrays is clearly one of the revolutionary technologies that launched the ‘genomic’ era. The genome-wide association of genetic and transcriptome variations was first achieved in yeast [6], where expression traits of the progeny were shown to be largely correlated with the genetic contribution of parental genotypes. The excitement of observing thousands of quantitative traits, or eQTLs, in a technically straightforward experiment quickly spread to studies in more complex genomes [7] including the human genome [8]. Several eQTL studies in humans [...]

RNA-seq could provide a platform-independent and objective standard compared with the microarray approach (Box 2). Recent RNA-seq-based eQTL studies have both confirmed and further clarified previous microarray results [25,26]. The first comparisons of cis-eQTL detection by RNA-seq with microarray-based approaches are promising; when disagreeing results between the two approaches have been detected, the RNA-seq data have more frequently matched the allelic biases observed by the Sanger sequencing-based validation method [27]. Importantly, sequencing technologies are advancing so rapidly that even the most recently published RNA-seq studies have used already outdated platforms, with low sequencing coverage and short reads. Technologies available today allow much higher coverage and longer reads at a reduced cost. In parallel with these incremental improvements, the introduction of ‘third generation sequencing’ [28] promises to allow simple sample preparation (without the need for amplification) and longer read lengths (thousands of bases), resulting in a more direct assessment of RNA abundance and mRNA isoforms.

In the following sections, we describe in detail the major advances brought about by RNA-seq that have allowed us to identify the eQTLs responsible for variations at the transcript, isoform and allele levels. We also outline the [...]

Glossary

Expression quantitative trait loci (eQTL): Term most commonly used to describe a statistically significant genotype–gene expression level correlation. Expression is detected by using microarrays or RNA-seq, and genotypes can be collected at a high density (typical for association-based mapping) or lower density (utilized in family- or model organism-based linkage or eQTL mapping).

Genome-wide association studies (GWAS): GWAS use large case-control cohorts of individual- or population-based samples with quantitative phenotypic data (such as height or lipid levels), which are characterized for genetic variation at a high density, e.g. 500 000 to 1 000 000 genotypes collected across the genome. The links between polymorphisms and disease risk or quantitative traits are then observed by the point-wise assessment of genetic marker alleles for enrichment among cases or among tails of distribution for quantitative phenotypes.

Linkage disequilibrium (LD): The nonrandom association of alleles at different loci. LD is usually the result of a close physical location and lack of recombination between loci. One of the consequences of high LD in the human genome is the presence of haplotype blocks consisting of large numbers of polymorphic markers that can be grouped into a limited number of haplotypes.

Next-generation sequencing (NGS): Techniques based on the amplification and sequencing of short stretches of DNA in parallel for millions of individual target molecules using randomly ordered arrays or the suspensions of sheared target molecules.

Paired-end reads: NGS targets are generally short sheared DNA fragments that can be sequenced from one or both ends of the fragment. The latter approach allows the collection of physically linked, or paired-end reads, facilitating the mapping and understanding of polynucleotide sequences beyond the read length of a single NGS read.

RNA sequencing (RNA-seq): NGS application for RNA species present in a sample. Typically, mRNA is isolated from a tissue of interest, converted into cDNA and sheared into smaller fragments, millions of which can be sequenced in parallel using one of the NGS technologies. Aligning these short fragments to the genome can explain the sequence composition in mRNA, expression level (based on the number of overlapping sequences to a specific gene) and gene structure (based on splice junctions).

Box 1. The study of eQTL in humans

In humans, identifying eQTLs is usually carried out by analyzing the linkage or association [8,11,42,82] between gene expression levels and genetic markers in cis (within a preselected interval close to the gene) or in trans (distant or located on different chromosomes). In the genomic era, many screens for eQTLs have been carried out by measuring the expression levels of a large number of genes and testing them for linkage (in families) or association (in populations) with a large number of genetic markers. Although such approaches have enjoyed some success, as in any whole-genome analysis, false positive results can be introduced due to multiple testing problems (for example, when testing tens of thousands of genes against millions of SNP markers) and systematic errors related to the specific genomic technologies used. Hence, the choice of the most accurate gene expression assay is a crucial component of eQTL surveys. A recent suggestion is that RNA-seq can provide a more accurate assessment of expression, and extending this technology to studies of population variation could potentially provide refined information at the isoform, transcript and allelic expression levels. With this approach even minute changes in the levels of the expression of genes are detectable, but detecting variations in relative isoform abundance or allelic expression can need substantially higher coverage than for observing population variation in full transcript expression [25,26,59]. In RNA-seq, establishing optimal correction for known and hidden technical biases in experiments [26] as well as modeling for isoform structures based on short-read data [25] are crucial. However, a generalizable approach has yet to emerge given the rapid progression of NGS technology in terms of throughput and read length, which are both influencing the choice [...]
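The cis association test and the multiple testing problem described in Box 1 can be illustrated with a minimal sketch. It is not the method of any study cited here: the function names, the 100 kb cis window, the toy genotypes and expression values are all hypothetical, a permutation test on the Pearson correlation stands in for the linkage or regression statistics used in practice, and Bonferroni correction is chosen only because it is the simplest adjustment to show.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def permutation_pvalue(genotypes, expression, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the genotype-expression correlation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-expression pairing
        if abs(pearson_r(genotypes, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def cis_eqtl_scan(gene_pos, expression, snps, window=100_000, alpha=0.05):
    """Test every SNP within `window` bp of the gene (hypothetical cis interval).

    snps maps a SNP name to (position, genotype dosages coded 0/1/2).
    Returns name -> (r, p, significant after Bonferroni correction).
    """
    cis = {name: g for name, (pos, g) in snps.items()
           if abs(pos - gene_pos) <= window}
    threshold = alpha / max(len(cis), 1)  # Bonferroni over the tests performed
    results = {}
    for name, g in cis.items():
        p = permutation_pvalue(g, expression)
        results[name] = (pearson_r(g, expression), p, p <= threshold)
    return results

# Toy data for 12 individuals (invented for illustration only).
expression = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 3.2]
snps = {
    "snp_a": (1_020_000, [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]),
    "snp_b": (1_050_000, [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]),
    "snp_far": (5_000_000, [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]),
}
results = cis_eqtl_scan(gene_pos=1_000_000, expression=expression, snps=snps)
for name, (r, p, significant) in results.items():
    print(f"{name}: r={r:.2f} p={p:.4f} significant={significant}")
```

With only two cis SNPs the correction is mild; in a genome-wide screen of tens of thousands of genes against millions of markers, the same threshold rule becomes far more stringent, which is the concern raised in Box 1.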
