RESEARCH

David S. DeLuca,1 Ayellet V. Segrè,1 Timothy J. Sullivan,1 Taylor R. Young,1 Ellen T. Gelfand,1 Casandra A. Trowbridge,1 Julian B. Maller,1,2 Taru Tukiainen,1,2 Monkol Lek,1,2 Lucas D. Ward,1,3 Pouya Kheradpour,1,3 Benjamin Iriarte,3 Yan Meng,1 Cameron D. Palmer,1,4 Tõnu Esko,1,4,5 Wendy Winckler,1 Joel N. Hirschhorn,1,4 Manolis Kellis,1,3 Daniel G. MacArthur,1,2 Gad Getz1,6; UNC/NCSU: Andrey A. Shabalin,7 Gen Li,8 Yi-Hui Zhou,9 Andrew B. Nobel,8 Ivan Rusyn,10,11 Fred A. Wright9; Univ. of Geneva: Tuuli Lappalainen,12,13,14,15,16,17 Pedro G. Ferreira,12,13,14 Halit Ongen,12,13,14 Manuel A. Rivas,18 Alexis Battle,19,20 Sara Mostafavi,19 Jean Monlong,21,22,23 Michael Sammeth,21,22,24 Marta Melé,21,22,25 Ferran Reverter,21,26 Jakob M. Goldmann,21,27 Daphne Koller,19 Roderic Guigó,21,22,28 Mark I. McCarthy,18,29,30 Emmanouil T. Dermitzakis12,13,14; Univ. of Chicago: Eric R. Gamazon,31,32 Hae Kyung Im,31 Anuar Konkashbaev,31,32 Dan L. Nicolae,31 Nancy J. Cox,31,32 Timothée Flutre,33,34 Xiaoquan Wen,35 Matthew Stephens,33,36 Jonathan K. Pritchard33,37,38; Harvard: Zhidong Tu,39,40 Bin Zhang,39,40 Tao Huang,39,40 Quan Long,39,40 Luan Lin,39,40 Jialiang Yang,39,40 Jun Zhu,39,40 Jun Liu41

Biospecimen and data collection, processing, quality control, storage, and pathological review

caHUB Biospecimen Source Sites, National Disease Research Interchange (NDRI): Amanda Brown,42 Bernadette Mestichelli,42 Denee Tidwell,42 Edmund Lo,42 Michael Salvatore,42 Saboor Shad,42 Jeffrey A. Thomas,42 John T. Lonsdale42; Roswell Park: Michael T. Moser,43 Bryan M. Gillard,43 Ellen Karasik,43 Kimberly Ramsey,43 Christopher Choi,43 Barbara A. Foster43; Science Care Inc.: John Syron,44 Johnell Fleming,44 Harold Magazine44; Gift of Life Donor Program: Rick Hasz45; LifeNet Health: Gary D. Walters46; UNYTS: Jason P. Bridge,47 Mark Miklos,47 Susan Sullivan47

caHUB ELSI study VCU: Laura K. Barker,48 Heather M. Traino,48,49 Maghboeba Mosavel,48 Laura A. Siminoff48,49

caHUB comprehensive biospecimen resource Van Andel Research Institute: Dana R. Valley,50 Daniel C. Rohrer,50 Scott D. Jewell50

caHUB pathology resource center NCI: Philip A. Branton51; Leidos Biomedical Research Inc.: Leslie H. Sobin,52 Mary Barcus52

caHUB comprehensive data resource Leidos Biomedical Research Inc.: Liqun Qi,52 Jeffrey McLean,52 Pushpa Hariharan,52 Ki Sung Um,52 Shenpei Wu,52 David Tabor,52 Charles Shive52

caHUB operations management Leidos Biomedical Research Inc.: Anna M. Smith,52 Stephen A. Buia,52 Anita H. Undale,52 Karna L. Robinson,52 Nancy Roche,52 Kimberly M. Valentino,52 Angela Britton,52 Robin Burges,52 Debra Bradbury,52 Kenneth W. Hambright,52 John Seleski,53 Greg E. Korzeniewski52; Sapient Government Services: Kenyon Erickson54

Brain bank operations University of Miami: Yvonne Marcus,55 Jorge Tejada,55 Mehran Taherian,55 Chunrong Lu,55 Margaret Basile,55 Deborah C. Mash55

Program management NHGRI: Simona Volpi,56 Jeffery P. Struewing,56 Gary F. Temple,56 Joy Boyer,57 Deborah Colantuoni56; NIMH: Roger Little,58 Susan Koester59; NCI: Latarsha J. Carithers,51 Helen M. Moore,51 Ping Guan,51 Carolyn Compton,51 Sherilyn J. Sawyer,51 Joanne P. Demchok,60 Jimmie B. Vaught,51 Chana A. Rabiner,51 Nicole C. Lockhart51,57

Writing committee Kristin G. Ardlie,1 Gad Getz,1,6 Fred A. Wright,9 Manolis Kellis,1,3 Simona Volpi,56 Emmanouil T. Dermitzakis12,13,14

1Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. 2Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. 3MIT Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. 4Center for Basic and Translational Obesity Research and Division of Endocrinology, Boston Children’s Hospital, Boston, MA 02115, USA. 5Estonian Genome Center, University of Tartu, Tartu, Estonia. 6Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA. 7Center for Biomarker Research and Personalized Medicine, Virginia Commonwealth University, Richmond, VA 23298, USA. 8Department of Statistics and Operations Research and Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA. 9Bioinformatics Research Center and Departments of Statistics and Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA. 10Department of Environmental Sciences and Engineering, University of North Carolina, Chapel Hill, NC 27599. 11Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA. 12Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland. 13Institute for Genetics and Genomics in Geneva (iG3), University of Geneva, 1211 Geneva, Switzerland. 14Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland. 15Department of Genetics, Stanford University, Stanford, CA 94305, USA. 16New York Genome Center, New York, NY 10011, USA. 17Department of Systems Biology, Columbia University Medical Center, New York, NY 10032, USA. 18Wellcome Trust Centre for Human Genetics Research, Nuffield Department of Clinical Medicine, University of Oxford, Oxford OX3 7BN, UK. 19Department of Computer Science, Stanford University, Stanford, CA 94305, USA. 20Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA. 21Centre for Genomic Regulation (CRG), 08003 Barcelona, Catalonia, Spain. 22Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain. 23Human Genetics Department, McGill University, Montréal, Quebec H3A 0G1, Canada. 24National Institute for Scientific Computing, Petropolis 25651-075, Rio de Janeiro, Brazil. 25Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA. 26Universitat de Barcelona, 08028 Barcelona, Spain. 27Radboud University Nijmegen, Netherlands. 28Institut Hospital del Mar d’Investigacions Mèdiques (IMIM), 08003 Barcelona, Spain. 29Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Oxford OX3 7LJ, UK. 30Oxford NIHR Biomedical Research Centre, Churchill Hospital, Oxford OX3 7LJ, UK. 31Section of Genetic Medicine, Department of Medicine and Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA. 32Division of Genetic Medicine, Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA. 33Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA. 34INRA, Department of Plant Biology and Breeding, UMR 1334, AGAP, Montpellier, 34060, France. 35Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA. 36Department of Statistics, University of Chicago, Chicago, IL 60637, USA. 37Department of Genetics and Biology, Stanford University, Stanford, CA 94305, USA. 38Howard Hughes Medical Institute, Chicago, IL, USA. 39Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 40Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 41Department of Statistics, , Cambridge, MA 02138. 42National Disease Research Interchange, Philadelphia, PA 19103, USA. 43Roswell Park Cancer Institute, Buffalo, NY 14263, USA. 44Science Care Inc., Phoenix, AZ, USA. 45Gift of Life Donor Program, Philadelphia, PA 19103, USA. 46LifeNet Health, Virginia Beach, VA 23453, USA. 47UNYTS, Buffalo, NY 14203, USA. 48Virginia Commonwealth University, Richmond, VA 23298, USA. 49Department of Public Health, Temple University, Philadelphia, PA 19122, USA. 50Van Andel Research Institute, Grand Rapids, MI 49503, USA. 51Biorepositories and Biospecimen Research Branch, National Cancer Institute, Bethesda, MD 20892, USA. 52Biospecimen Research Group, Clinical Research Directorate, Leidos Biomedical Research, Inc., Rockville, MD 20852, USA. 53iDoxSolutions Inc., Bethesda, MD 20814, USA. 54Sapient Government Services, Arlington, VA 22201, USA. 55Brain Endowment Bank, Department of Neurology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA. 56Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD 20892, USA. 57Division of and Society, National Human Genome Research Institute, Bethesda, MD 20892, USA. 58Office of Science Policy, Planning, and Communications, National Institute of Mental Health, Bethesda, MD 20892, USA. 59Division of Neuroscience and Basic Behavioral Science, National Institute of Mental Health, Bethesda, MD 20892, USA. 60Cancer Diagnosis Program, National Cancer Institute, Bethesda, MD 20892, USA.

SUPPLEMENTARY MATERIALS
www.sciencemag.org/content/348/6235/648/suppl/DC1
Materials and Methods
Box S1
Figs. S1 to S34
Tables S1 to S15
References (50–86)

3 October 2014; accepted 3 April 2015
10.1126/science.1262110

REPORTS

HUMAN GENOMICS

The human transcriptome across tissues and individuals

Marta Melé,1,2* Pedro G. Ferreira,1,3,4,5* Ferran Reverter,1,6,7* David S. DeLuca,8 Jean Monlong,1,7,9 Michael Sammeth,1,7,10 Taylor R. Young,8 Jakob M Goldmann,1,7,11 Dmitri D. Pervouchine,1,7,12 Timothy J. Sullivan,8 Rory Johnson,1,7 Ayellet V. Segrè,8 Sarah Djebali,1,7 Anastasia Niarchou,3,4,5 The GTEx Consortium, Fred A. Wright,13 Tuuli Lappalainen,3,4,5,14,15 Miquel Calvo,6 Gad Getz,8,16 Emmanouil T. Dermitzakis,3,4,5 Kristin G. Ardlie,8† Roderic Guigó1,7,17,18†

Transcriptional regulation and posttranscriptional processing underlie many cellular and organismal phenotypes. We used RNA sequence data generated by the Genotype-Tissue Expression (GTEx) project to investigate the patterns of transcriptome variation across individuals and tissues. Tissues exhibit characteristic transcriptional signatures that show stability in postmortem samples. These signatures are dominated by a relatively small number of genes (which is most clearly seen in blood), though few are exclusive to a particular tissue and vary more across tissues than individuals. Genes exhibiting high interindividual expression variation include disease candidates associated with sex, ethnicity, and age. Primary transcription is the major driver of cellular specificity, with splicing playing mostly a complementary role; except for the brain, which exhibits a more divergent splicing program. Variation in splicing, despite its stochasticity, may play in contrast a comparatively greater role in defining individual phenotypes.

Gene expression is the key determinant of cellular phenotype, and genome-wide expression analysis has been a mainstay of genomics and biomedical research, providing insights into the molecular events underlying human biology and disease. Whereas expression data sets from tissues/primary cells (1, 2) and individuals (3) have accumulated over recent years, only limited expression data sets have allowed analysis across tissues and individuals

660 8 MAY 2015 • VOL 348 ISSUE 6235 sciencemag.org SCIENCE RESEARCH | REPORTS

1Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain. 2Harvard Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA. 3Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland. 4Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland. 5Swiss Institute of Bioinformatics, Geneva, Switzerland. 6Facultat de Biologia, Universitat de Barcelona (UB), Barcelona, Catalonia, Spain. 7Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain. 8Broad Institute of MIT and Harvard, Cambridge, MA, USA. 9McGill University, Montreal, Canada. 10National Institute for Scientific Computing (LNCC), Petropolis, Rio de Janeiro, Brazil. 11Radboud University, Nijmegen, Netherlands. 12Faculty of Bioengineering and Bioinformatics, Moscow State University, Leninskie Gory 1-73, 119992 Moscow, Russia. 13North Carolina State University, Raleigh, NC, USA. 14New York Genome Center, New York, NY, USA. 15Department of Systems Biology, Columbia University, New York, NY, USA. 16Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA. 17Institut Hospital del Mar d’Investigacions Mèdiques (IMIM), Barcelona, Catalonia, Spain. 18Joint CRG-Barcelona Super Computing Center (BSC)–Institut de Recerca Biomedica (IRB) Program in Computational Biology, Barcelona, Catalonia, Spain.
*These authors contributed equally to this work. †Corresponding author. E-mail: [email protected] (K.G.A.); roderic.[email protected] (R.G.)

simultaneously (4). The Genotype-Tissue Expression Project (GTEx) is developing such a resource (5, 6), collecting multiple “nondiseased” tissues sampled from recently deceased human donors. We analyzed the GTEx pilot data freeze (6), which comprised RNA sequencing (RNA-seq) from 1641 samples from 175 individuals representing 43 sites: 29 solid organ tissues, 11 brain subregions, whole blood, and two cell lines: Epstein-Barr virus–transformed lymphocytes (LCL) and cultured fibroblasts from skin [table S1 and (7)].

The identification and characterization of genetic variants that are associated with gene expression are extensively discussed in (6). Here we use the GTEx data to investigate the patterns of transcriptome variation across individuals and tissues and how these patterns associate with human phenotypes. RNA-seq performed on the GTEx pilot samples produced an average of 80 million paired-end mapped reads per sample (fig. S1) (7, 8). We used the mapped reads to quantify gene expression using Gencode V12 annotation (9), which includes 20,110 protein-coding genes (PCGs) and 11,790 long noncoding RNAs (lncRNAs). Comparison with microarray-based quantification for a subset of 736 samples showed concordance between the two technologies (average correlation coefficient = 0.83, fig. S2). At the threshold defined for expression quantitative trait loci (eQTL) analysis [reads per kilobase per million mapped reads (RPKM) > 0.1, see (7)], at which 88% of PCGs and 71% of lncRNAs are detected in at least one sample, the distribution of gene expression across tissues is U-shaped and complementary between PCGs (generally ubiquitously expressed) and lncRNAs (typically tissue-specific or not expressed, Fig. 1A).

Tissues show a characteristic transcriptional signature, as revealed by multidimensional scaling, of both PCG and lncRNA expression (figs. 1B, S3, and S4), with individual phenotypes contributing little (fig. S5). The primary separation, as observed in prior studies (10), is between nonsolid (blood) and solid tissues and, within solid tissues, brain is the most distinct. Brain subregions are not well differentiated, with the exception of cerebellum (fig. S6). Postmortem ischemia appears to have little impact on the characteristic tissue transcriptional signatures, as previously noted (11). In a comparison of 798 GTEx samples with 609 “nondiseased” samples obtained from living (surgical) donors (table S2), we found that GTEx samples clustered with surgical samples of the same tissue type (Fig. 1C and table S3) (12).

Tissue transcription is generally dominated by the expression of a relatively small number of genes. Indeed, we found that for most tissues, about 50% of the transcription is accounted for by a few hundred genes (13). In many tissues, the bulk of transcription is of mitochondrial origin (Fig. 1D and table S4) (14). In kidney, for instance, a highly aerobic tissue with many mitochondria, a median of 51% (>65% in some samples) of the transcriptional output is from the mitochondria (fig. S7). Other tissues show nuclear-dominated expression; in blood, for example, three hemoglobin genes contribute more than 60% to total transcription. Genes related to lipid metabolism in pancreas, actin in muscle, and thyroglobulin in thyroid are other examples of nuclear genes contributing disproportionally to tissue-specific transcription. Because RNA samples are generally sequenced to the same depth, in tissues where a few genes dominate expression, fewer RNA-seq reads are comparatively available to estimate the expression of the remaining genes, decreasing the power to estimate expression variation. These tissues (i.e., blood, muscle, and heart; Fig. 1E) are, consequently, those with less power to detect eQTLs (6). Because most eQTL analyses are performed on easily accessible samples, such as blood, this highlights the relevance of the GTEx multitissue approach.

Although thousands of genes are differentially expressed between tissues (fig. S8) or show tissue-preferential expression (fig. S9 and table S5), fewer than 200 genes are expressed exclusively in a given tissue (figs. S10 and S11 and tables S6 and S7, A to E). The vast majority (~95%) are exclusive to testis and many are lncRNAs. This may reflect low-level basal transcription common to all cell types or result from general tissue heterogeneity, with few primary cell types being specific to a given tissue.

Expression of repetitive elements also recapitulates tissue type (table S8 and fig. S12A). We identified 3046 PCGs whose expression, in at least one tissue, was correlated with the expression of the closest repeat element (on average 2827 base pairs away, fig. S12B). In about half of these cases, the repeat was also significantly coexpressed with other repeats of its same family (table S8 and fig. S13). LncRNA expression can be regulated by specific repeat families (15), and we found evidence that testis-specific expression could be regulated by endogenous retrovirus L repeats (ERVL and ERVL-MaLR) (fig. S12C).

Using linear mixed models, we found that variation in gene expression is far greater among tissues (47% of total variance in gene expression) than among individuals (4% of total variance, Fig. 2A and table S9), and very similar for PCGs and lncRNAs when controlling for gene expression (Fig. 2A). Genes that show high expression variance across individuals and low variance across tissues include genes on the sex chromosomes, as well as autosomal genes, such as the RHD gene that determines Rh blood group.

We identified 92 PCGs and 43 lncRNAs with global sex-biased expression [false discovery rate (FDR) < 0.05, Fig. 2B and table S10]. Genes overexpressed in males are predominantly located on the Y chromosome. Conversely, many genes on the X chromosome are overexpressed in females, suggesting that more genes might escape X inactivation than previously described (16). Among these, we found XIST and JPX, known to participate in X inactivation, as well as the lncRNAs RP11-309M23.1 and RP13-216E22.4, the expression of which shows enrichment in the nucleus in female cell lines from ENCODE (17) and hence could be candidates to also participate in X inactivation (fig. S14) (16).

Among autosomal PCGs, MMP3, linked to susceptibility to coronary heart disease [Online Mendelian Inheritance in Man (OMIM) no. 614466] and more prevalent in males, shows the strongest expression bias (Fig. 2B).

We detected 221 PCGs and 153 lncRNAs globally differentially expressed between individuals of European and African-American ancestry (FDR < 0.05, Fig. 2C and table S11). There is a slight enrichment of lncRNAs (P < 1 × 10−6), among which we identified the RP11-302J23.1 gene, highly expressed in cardiac tissue in African Americans only, and located in a region that harbors weak associations to heart disease (18). Additionally, some genes showing differential expression by ethnicity lie in genomic regions under positive selection in European or sub-Saharan African populations (Fig. 2C and fig. S15).

Finally, we detected 1993 genes that globally change expression with age (FDR < 0.05, Fig. 2D and table S12). Genes that decrease expression are enriched in functions and pathways related to neurodegenerative diseases such as Parkinson’s and Alzheimer’s diseases, among which eight harbor single-nucleotide polymorphisms (SNPs) for these diseases identified from genome-wide association studies (P < 0.05). Among the genes that increase expression with age is EDA2R, whose ligand, EDA, has been associated with age-related phenotypes (19).

We also identified 753 genes with tissue-specific sex-biased expression (FDR < 0.05, table S13) predominantly in breast tissue (92%), and 31 genes with tissue-specific ethnicity-biased expression, many in the skin (FDR < 0.05, Table 1 and table S14). Among the sex-differentially expressed genes, five show biased expression specifically in heart and are of interest given the differing

prevalence of cardiovascular disease between males and females. One of these genes, PLEKHA7 (fig. S15C), contains SNPs associated with risk for cardiovascular disease.

Overall, tissue specificity is likely to be driven by the concerted expression of multiple genes. Thus, we performed sex-based differential analysis of coexpression networks. We identified 42 coexpression modules in males and 46 in females (fig. S16). Among male-specific modules, we found one related to spermatid differentiation and development (FDR = 9.0 × 10−4, fig. S16B), and among female-specific modules, we found one related to epidermis and ectoderm development (FDR = 4.6 × 10−14, fig. S16C). Differential network expression, therefore, distinguishes differences between males and females not well captured by analysis of individual genes.

Split-mapped RNA-seq reads predict about 87,000 novel junctions with very strong support

[Figure 1. Axes: (A) number of tissues versus number of genes (thousands), by RPKM bin, for protein-coding genes and lncRNAs; (B) coordinate 1 (34%) versus coordinate 2 (16%), and distance between clusters; (D) number of genes versus fraction of total transcriptional output.]

Fig. 1. The GTEx multitissue transcriptome. (A) Gene expression levels and number of tissues in which genes are expressed (>0.1 RPKM in at least 80% of the samples). RPKMs are averaged over all genes expressed in a given number of tissues. (B) Sample and tissue similarity on the basis of gene expression profiles. Left: Multidimensional scaling. Right: Tissue hierarchical clustering. (C) Expression values from eight GTEx tissues (colored circles) plotted radially along seven metagenes extracted from expression data. Antemortem samples curated from the Gene Expression Omnibus (GEO) cluster strongly with GTEx tissues. (D) Transcriptome complexity. Bottom: Cumulative distribution of the average fraction of total transcription contributed by genes when sorted from most-to-least expressed in each tissue (x axis). Lines represent mean values across samples of the same tissue, and lighter-color surfaces around the mean represent dispersion calculated as the standard deviation divided by the cumulative sum of all means. Top: Biological type and relative contribution to total transcription of the hundred most expressed genes. Height of the bars is proportional to the fraction that these genes contribute to total transcription.
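The RPKM threshold and the cumulative curves of Fig. 1D can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the pipeline used for the GTEx data: the function names (`rpkm`, `cumulative_fraction`, `genes_needed`) and the toy numbers are hypothetical, and RPKM is computed from its standard definition (reads per kilobase of gene length per million mapped reads).

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count / (gene_length_bp / 1_000) / (total_mapped_reads / 1_000_000)


def cumulative_fraction(expression):
    """Cumulative share of total expression, genes sorted most-to-least expressed."""
    ordered = sorted(expression, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("total expression is zero")
    running, fractions = 0.0, []
    for value in ordered:
        running += value
        fractions.append(running / total)
    return fractions


def genes_needed(expression, share=0.5):
    """Smallest number of top-expressed genes whose combined share reaches `share`."""
    for n, fraction in enumerate(cumulative_fraction(expression), start=1):
        if fraction >= share:
            return n
    return len(expression)


# Hypothetical toy sample (not GTEx data): one gene dominates the output.
toy_expression = [60, 20, 10, 5, 5]
print(rpkm(500, 2_000, 10_000_000))   # 500 reads, 2 kb gene, 10 million mapped reads
print(genes_needed(toy_expression))   # genes needed to reach half of total expression
```

In a sample where a handful of genes account for most reads, `genes_needed` returns a small number, which corresponds to the steep initial rise of the cumulative curve described in the caption.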


(fig. S17). These tend to be more tissue specific, detected in fewer samples, and less conserved than previously annotated junctions (only 2.6% of novel junctions can be detected as orthologous in mouse, compared to 65% for annotated junctions). Multidimensional scaling based on exon inclusion levels again largely recapitulates tissue type (Fig. 3A). However, samples from brain cluster as the primary outgroup, supporting the existence of a distinct splicing program in the brain (20). Furthermore, preferential gene expression of RNA-binding proteins and both differential and preferential exon inclusion are enriched in the brain (figs. S18 and S19 and table S15). We found very few exons exclusively included or excluded in a given tissue (fig. S20 and table S16), 40% of which show exclusive inclusion in the brain. We also found that micro-exons (<15 bp) are overwhelmingly used in the brain compared to other tissues (Wilcoxon test, P < 1 × 10−7, Fig. 3B). This pattern is not obvious in short exons longer than 15 bp (P = 0.3, fig. S21). This observed brain-specific splicing pattern may result from differential splicing in the cerebellum, because expression clustering of the brain regions reveals a general up-regulation of RNA-binding proteins specifically in the cerebellum (Fig. 3C). This is also the brain region exhibiting the largest proportion of novel splicing events (fig. S22).

[Figure 2. Axes: (A) tissue variance versus individual variance; (B, C) expression difference (log [RPKM]), with MMP3 and RP11-302J23.1 expression (RPKM) by tissue; (D) regression coefficient, with EDA2R expression (RPKM) versus age (yr).]

Fig. 2. Gene expression across tissues and individuals. (A) Left: Contribution of tissue and individual to gene expression variation of PCGs and lncRNAs. Bottom right: Mean ± SD over all genes (filled circles) and over genes with similar expression levels in PCGs and lncRNAs (unfilled circles). Circle size is proportional to the sum of tissue and individual variation, and segment length corresponds to 0.5 SD. Top right: genes with high individual variation and low tissue variation. (B) Sex differentially expressed genes. Top: differentially expressed genes (FDR < 0.05) sorted according to expression differences between males and females. Genes in the Y chromosome are sorted according to the expression in males. Bottom: MMP3 gene expression in males and females. (C) Genes differentially expressed with ethnicity. Top: differentially expressed genes (FDR < 0.05) between African Americans (AA) and European Americans (EA) sorted according to expression differences. A few of these genes lie in regions reported to be under positive selection in similar populations. Bottom: expression of RP11-302J23.1. (D) Genes differentially expressed with age. Top: Genes sorted according to the regression coefficient. Bottom: expression of EDA2R gene in nerve and artery as a function of age. Shaded area around the regression line represents 95% confidence interval.
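The partition of expression variance into tissue and individual contributions (Fig. 2A) can be sketched in simplified form. The text reports linear mixed models; the Python below is not that method but a simpler sum-of-squares decomposition, assuming one gene measured on a complete tissue-by-individual grid with no missing samples. The function name `variance_fractions` and the toy matrix are hypothetical.

```python
def variance_fractions(matrix):
    """Split the variance of one gene into tissue, individual, and residual shares.

    matrix[t][i] is the expression in tissue t of individual i.
    Assumes a complete grid (every individual sampled in every tissue),
    which is a simplification relative to a linear mixed model.
    """
    n_tissues, n_individuals = len(matrix), len(matrix[0])
    grand_mean = sum(sum(row) for row in matrix) / (n_tissues * n_individuals)
    tissue_means = [sum(row) / n_individuals for row in matrix]
    individual_means = [
        sum(matrix[t][i] for t in range(n_tissues)) / n_tissues
        for i in range(n_individuals)
    ]
    ss_total = sum((x - grand_mean) ** 2 for row in matrix for x in row)
    if ss_total == 0:
        return {"tissue": 0.0, "individual": 0.0, "residual": 0.0}
    ss_tissue = n_individuals * sum((m - grand_mean) ** 2 for m in tissue_means)
    ss_individual = n_tissues * sum((m - grand_mean) ** 2 for m in individual_means)
    tissue = ss_tissue / ss_total
    individual = ss_individual / ss_total
    return {"tissue": tissue, "individual": individual, "residual": 1.0 - tissue - individual}


# Hypothetical toy gene: two tissues (rows), two individuals (columns).
print(variance_fractions([[1, 2], [3, 4]]))
```

For the toy matrix, most of the variation lies between rows (tissues), so the tissue share dominates, in the same spirit as the much larger tissue than individual contribution described in the text.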


[Figure 3. Axes: (A) coordinate 1 (52%) versus coordinate 2 (11.5%); (B) phi, by tissue; (C) z score, brain subregions by RNA binding protein; (D) tissue variance versus individual variance; (E) gene expression contribution to isoform abundance versus density, between individual and between tissue.]

Fig. 3. Splicing across tissue and individuals. (A) Multidimensional scaling of all samples on the basis of exon inclusion levels (Percent spliced in, PSI). (B) Microexon inclusion across tissues. Values of tissue exon inclusion close to 1 (–1) indicate that the microexon is included (excluded), in nearly all samples from the tissue, and excluded (included) in nearly all samples from the rest of the tissues. Tissues are sorted according to tissue exon inclusion (phi) median value. (C) Clustering of brain samples on the basis of the normalized expression levels of 67 RNA binding proteins involved in splicing. The order of samples and genes is obtained by biclustering the expression matrix. (D) Left: Contribution of tissue and individual to splicing variation in PCGs. Bottom right: Mean ± SD of individual and tissue contributions to splicing and to gene expression variation. Circle size is proportional to the sum of tissue and individual variation and segment length corresponds to 0.5 SD. Top right: Genes with high splicing variation across individuals. (E) Contribution of gene expression to the between-individual and between-tissue variation in isoform abundance.
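The exon inclusion level (percent spliced in, PSI) named in the caption can be sketched as a simple ratio of read counts. The Python below uses a commonly used definition, inclusion reads divided by inclusion plus exclusion reads; this is an assumption for illustration, since the exact estimator is described in the supplementary materials rather than here. The function name `psi` and the read counts are hypothetical.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in: share of junction reads that support exon inclusion.

    Returns None when no reads cover the exon, because the ratio is undefined.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


# Hypothetical counts: 30 reads support inclusion, 10 support skipping.
print(psi(30, 10))
```

A value near 1 means the exon is included in almost all transcripts of the sample, and a value near 0 means it is almost always skipped.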


Table 1. Genes with sex-biased and ethnicity-biased expression in GTEx tissues. Differentially expressed genes between males (M) and females (F) and between African American (AA) and European American (EA) in those tissues with at least 10 samples per group. Median fold change (on autosomal genes) was calculated for tissues with more than two significant genes.

Sex. Columns: no. of samples (M, F); no. of genes (all, PCGs, lncRNA, Y chrom., X chrom., pseudoautosomal region, autosomal genes); med. fold change.

Tissue              M     F     All   PCGs  lncRNA  Y chrom.  X chrom.  Pseudoautosomal  Autosomal  Med. fold change
Adipose             76    37    45    29    16      28        4         1                12         2.8
Artery              88    57    40    21    19      29        2         0                9          2.3
Blood               99    57    20    12    8       18        1         0                1          –
Breast              14    13    762   567   195     26        21        0                715        4.2
Esophagus           27    11    23    14    9       22        1         0                0          –
Heart               74    34    27    15    12      23        1         0                3          3
LCL                 26    13    23    11    12      21        1         1                0          –
Lung                76    43    34    17    17      31        1         0                2          –
Muscle              87    51    42    27    15      24        6         0                12         2.7
Nerve               54    34    38    24    14      25        5         1                7          2.3
Skin                76    43    47    32    15      31        5         0                11         2.3
Thyroid             65    40    42    25    17      27        5         1                9          1.9
No. of samples      942   566
No. of individuals  111   64    847   628   219     41        34        1                771

Ethnicity. Columns: no. of samples (AA, EA); no. of genes (all, PCGs, lncRNA); med. fold change.

Tissue              AA    EA    All   PCGs  lncRNA  Med. fold change
Adipose             17    95    0     0     0       –
Artery              21    121   3     0     0       2.9
Blood               24    129   0     0     0       –
Breast              5     22    –     –     –       –
Esophagus           5     33    –     –     –       –
Heart               11    95    2     1     1       –
LCL                 9     30    –     –     –       –
Lung                14    104   4     3     1       3
Muscle              18    117   2     2     0       –
Nerve               13    73    2     1     1       –
Skin                14    103   14    13    1       2.9
Thyroid             13    90    13    7     6       3.5
No. of samples      172   1307  35    27    8
No. of individuals  24    148

In contrast to gene expression, variation of splicing, measured either from relative isoform abundance or exon inclusion, is similar across tissues and across individuals, but exhibits a much larger proportion of residual unexplained variation (Fig. 3D, fig. S23, and table S17). This could arise from nonadditive interactions between individuals and tissues, but might also reflect stochastic, nonfunctional fluctuations that are more common in splicing than in expression (21). Among the genes that show high interindividual splicing variability, we found an enrichment of ribosomal proteins and genes related to translation and protein biosynthesis (Fig. 3D and table S18). Higher variability between individuals may also partially reflect an effect of ischemic time on splicing, which we observed when clustering samples by exon inclusion within each tissue (fig. S24).

The abundance of splicing isoforms reflects the actions of both primary transcription and posttranscriptional processing, mostly alternative splicing. To determine the relative contribution of each process, we estimated the proportion of variance in isoform abundance that can be simply explained by variance in gene expression. We found that gene expression explains only 45% of the variance between individuals, but 84% of the variance between tissues (Fig. 3E and fig. S25). This strongly suggests that primary transcription is the main driver of cellular specificity, with splicing playing a complementary role. Although this may be unexpected, given the magnitude of the effect, it is consistent with recent findings of low proteomic support for alternatively spliced isoforms (22) and few shifts in major protein isoforms across cell types (table S19) (23).

Overall, our results underscore the value of monitoring the transcriptome of multiple tissues and individuals in order to understand tissue-specific transcriptional regulation and to uncover the transcriptional determinants of human phenotypic variation and disease susceptibility.

REFERENCES AND NOTES

1. FANTOM Consortium and the RIKEN PMI and CLST (DGT) et al., Nature 507, 462–470 (2014).
2. ENCODE Project Consortium, Nature 489, 57–74 (2012).
3. T. Lappalainen et al., Nature 501, 506–511 (2013).
4. E. Grundberg et al., Nat. Genet. 44, 1084–1089 (2012).
5. T. J. Lonsdale et al., Nat. Genet. 45, 580–585 (2013).
6. The GTEx Consortium, Science 348, 648–660 (2015).
7. Materials and methods are available in the supplementary materials on Science Online.
8. D. S. DeLuca et al., Bioinformatics 28, 1530–1532 (2012).
9. J. Harrow et al., Genome Res. 22, 1760–1774 (2012).
10. M. Lukk et al., Nat. Biotechnol. 28, 322–324 (2010).
11. A. C. Birdsill, D. G. Walker, L. Lue, L. I. Sue, T. G. Beach, Cell Tissue Bank. 12, 311–318 (2011).
12. J. P. Brunet, P. Tamayo, T. R. Golub, J. P. Mesirov, Proc. Natl. Acad. Sci. U.S.A. 101, 4164–4169 (2004).
13. P. Carninci et al., Genome Res. 10, 1617–1630 (2000).
14. R. D. Kelly, A. Mahmud, M. McKenzie, I. A. Trounce, J. C. St John, Nucleic Acids Res. 40, 10124–10138 (2012).
15. D. Kelley, J. Rinn, Genome Biol. 13, R107 (2012).
16. L. Carrel, H. F. Willard, Nature 434, 400–404 (2005).
17. S. Djebali et al., Nature 489, 101–108 (2012).
18. V. Regitz-Zagrosek, U. Seeland, Wien. Med. Wochenschr. 161, 109 (2011).
19. M. Yan et al., Science 290, 523–527 (2000).
20. G. Yeo, D. Holste, G. Kreiman, C. B. Burge, Genome Biol. 5, R74 (2004).
21. J. K. Pickrell, A. A. Pai, Y. Gilad, J. K. Pritchard, PLOS Genet. 6, e1001236 (2010).
22. I. Ezkurdia et al., Mol. Biol. Evol. 29, 2265–2283 (2012).
23. M. Gonzàlez-Porta, A. Frankish, J. Rung, J. Harrow, A. Brazma, Genome Biol. 14, R70 (2013).

ACKNOWLEDGMENTS

We acknowledge and thank the donors and their families for their generous gifts of organ donation for transplantation and tissue donations for the GTEx research study. We thank the Genomics Platform at the for data generation; L. Gaffney for help with figures; E. Gelfand and C. Trowbridge for project support and members of the Analysis Working Group for feedback; and D. MacArthur, J. Maller, and B. Neale for critical reading of the manuscript. The primary and processed data used to generate the analyses presented here are available in the following locations: All primary sequence files are deposited in and available from dbGaP (phs000424.v3.p1); gene and transcript quantifications are available on the GTEx Portal (www.gtexportal.org). The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (http://commonfund.nih.gov/GTEx). Additional funds were provided by the National Cancer Institute (NCI); National Human Genome Research Institute; National Heart, Lung, and Blood Institute; National Institute on Drug Abuse; National Institute of Mental Health; and National Institute of Neurological Disorders and Stroke. This work was supported by the following grants and contracts from the United States National Institutes of Health: contract HHSN261200800001E (Leidos Prime contract with NCI); contracts 10XS170 [National Disease Research Interchange (NDRI)], 10XS171 (Roswell Park Cancer Institute), 10X172 (Science Care Inc.), and 12ST1039 (IDOX); contract 10ST1035 (Van Andel Institute); contract HHSN268201000029C (Broad Institute); R01 DA006227-17 (University of Miami Brain Bank); R01 MH090941 (University of Geneva), European Research Council, Swiss National Science Foundation, and Louis-Jeantet Foundation to E.T.D.; R01 MH090936 (University of North Carolina–Chapel Hill); and grants BIO2011-26205 from the Spanish Ministerio de Ciencia e Innovación (MICINN), 2014 SGR 464 and 2014 SGR 1319 from the Generalitat de Catalunya, and 294653 from the European Research Council–European Commission.

SUPPLEMENTARY MATERIALS

www.sciencemag.org/content/348/6235/660/suppl/DC1
Materials and methods
Figs. S1 to S25
Tables S1 to S20
References (24–72)
Data tables S4 to S7 and S9 to S18

20 October 2014; accepted 2 April 2015
10.1126/science.aaa0355
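The variance partitioning reported in the text (gene expression explaining 45% of the between-individual and 84% of the between-tissue variance in isoform abundance) can be illustrated with a simplified calculation. The sketch below is a stand-in under stated assumptions, not the method used in this study, which is described in the supplementary materials. It assumes that "proportion of variance explained" can be approximated by the squared correlation of a least-squares fit, that the between-tissue component is captured by tissue means, and that the between-individual component is captured by deviations from the tissue mean. The names `r_squared` and `split_variance_explained` and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def r_squared(x, y):
    """Fraction of variance in y explained by a least-squares line on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains nothing, or has nothing to explain
    return (sxy * sxy) / (sxx * syy)


def split_variance_explained(samples):
    """Split variance explained into tissue and individual components.

    samples: iterable of (tissue, gene_expression, isoform_abundance)
    tuples for ONE isoform (assumed layout, one tuple per sample).
    Returns (between_tissue_r2, between_individual_r2).
    """
    by_tissue = defaultdict(list)
    for tissue, gene, iso in samples:
        by_tissue[tissue].append((gene, iso))

    tissue_gene, tissue_iso = [], []  # per-tissue means
    resid_gene, resid_iso = [], []    # per-sample deviations from tissue mean
    for values in by_tissue.values():
        g_mean = fmean(g for g, _ in values)
        i_mean = fmean(i for _, i in values)
        tissue_gene.append(g_mean)
        tissue_iso.append(i_mean)
        resid_gene.extend(g - g_mean for g, _ in values)
        resid_iso.extend(i - i_mean for _, i in values)

    return r_squared(tissue_gene, tissue_iso), r_squared(resid_gene, resid_iso)
```

Called on one isoform's samples, `split_variance_explained(samples)` returns the two fractions; averaging them over many isoforms would give summary values of the same kind as those quoted from Fig. 3E, although the numbers themselves depend on the actual model used.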

The human transcriptome across tissues and individuals
Marta Melé et al.
Science 348, 660 (2015); DOI: 10.1126/science.aaa0355


The following resources related to this article are available online at www.sciencemag.org (this information is current as of July 7, 2015):

Updated information and services, including high-resolution figures, can be found in the online version of this article at: http://www.sciencemag.org/content/348/6235/660.full.html

Supporting Online Material can be found at: http://www.sciencemag.org/content/suppl/2015/05/06/348.6235.660.DC1.html

A list of selected additional articles on the Science Web sites related to this article can be found at: http://www.sciencemag.org/content/348/6235/660.full.html#related

This article cites 64 articles, 29 of which can be accessed free: http://www.sciencemag.org/content/348/6235/660.full.html#ref-list-1

This article has been cited by 2 articles hosted by HighWire Press; see: http://www.sciencemag.org/content/348/6235/660.full.html#related-urls

This article appears in the following subject collections: Genetics http://www.sciencemag.org/cgi/collection/genetics

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright 2015 by the American Association for the Advancement of Science; all rights reserved. The title Science is a registered trademark of AAAS.