Gtex in the UCSC Genome Browser
Total Page:16
File Type:pdf, Size:1020Kb
GTEx in the UCSC Genome Browser Kate Rosenbloom UCSC Genome Browser Group December 2015 Mo#vaon Incorporate newer gene expression data sets and explore new ways of displaying #ssue- specific gene expression in the browser. Current #ssue expression tracks: GNF Atlas 2 (2009) ENCODE transcrip#on signal (2008-2012) è Browser supplement funding to integrate GTEx into the genome browser database and develop visualizaons OXT: Oxytocin precursor Expression from GNF Atlas2 Aims 1. Integrate gene expression data and metadata into the browser database 2. Develop new track display with more compact layout for higher on-screen annotaon density 3. Create browser tracks for GTEx gene-level expression and allele-specific expression Progress 1. Integrate gene expression data and metadata into the ✔ browser database ✔ 2. Develop new bar-graph track display ✔ 3. Create browser track for GTEx gene-level expression 4. Create browser track for GTEx allele-specific expression Try the GTEx gene expression track on the UCSC preview browser: hVp://#nyurl.com/gtexUcsc GTEx Gene Expression track Scale 1 kb hg19 chr17: 37,821,000 37,821,500 37,822,000 37,822,500 37,823,000 Gene Expression in 53 tissues from GTEx RNA-seq of 2921 samples (209 donors) TCAP Basic Gene Annotation Set from ENCODE/GENCODE Version 18 UCSC Genes (RefSeq, GenBank, CCDS, Rfam, tRNAs & Comparative Genomics) Transcription Levels Assayed by RNA-seq on 9 Cell Lines from ENCODE Transcription Peptide sequences identified from MS spectra of 971 samples by PeptideAtlas 4.88 _ 100 vertebrates Basewise Conservation by PhyloP 100 Vert. Cons -4.5 _ LCAP gene locus, showing highest expression in heart and skeletal muscle. Mouseover on bars shows #ssue type and median score. Click-through shows boxplot with score ranges. Gene expression details page Mul#-gene view Scale 100 kb hg19 chr17: 37,800,000 37,850,000 37,900,000 37,950,000 Gene Expression in 53 tissues from GTEx RNA-seq of 2921 samples (209 donors) NEUROD2 MIR4728 PPP1R1B MIEN1 STARD3 GRB7 TCAP IKZF3 PNMT PGAP3 ERBB2 Transcription Levels Assayed by RNA-seq on 9 Cell Lines from ENCODE Transcription Peptide sequences identified from MS spectra of 971 samples by PeptideAtlas PeptideAtlas 200 Kbp region of chromosome 17 Mul#-megabase view Scale 1 Mb hg19 chr17: 37,000,000 37,500,000 38,000,000 38,500,000 39,000,000 Gene Expression in 53 tissues from GTEx RNA-seq of 2921 samples (209 donors) GPR179 MIR4734 ARL5C NEUROD2 PSMD3 RARA KRT24 SOCS7 MLLT6 CACNB1 PPP1R1B CSF3 TOP2A KRT25 ARHGAP23 CWC25 RPL19 STARD3 MED24 IGFBP4 KRT28 SRCIN1 FBXO47 MED1 GRB7 THRA TNS4 KRT10 C17orf96 PLXDC1 CDK12 ZPBP2 WIPF2 CCR7 KRT23 MIR4726 STAC2 TCAP NR1D1 SMARCE1 KRT39 CISD3 FBXL20 PNMT MSL1 KRT26 PCGF2 PGAP3 CASC3 KRT27 PSMB3 ERBB2 RAPGEFL1 TMEM99 PIP4K2B MIR4728 CDC6 KRT12 MIR4727 MIEN1 KRT20 C17orf98 IKZF3 KRT40 RPL23 GSDMB KRTAP3-3 LASP1 ORMDL3 KRTAP3-2 LINC00672 LRRC3C KRTAP3-1 LRRC37A11P GSDMA CTD-2206N4.4 SNORD124 Transcription Levels Assayed by RNA-seq on 9 Cell Lines from ENCODE Transcription Peptide sequences identified from MS spectra of 971 samples by PeptideAtlas PeptideAtlas 2.7 Mbp region of chromosome 17 Track features • Displays bar graph height as median RPKM raw score (with configurable range) or log transform. Mouseovers on gene shows descrip#on; on bars show #ssue and score. • Shows sample quar#les and outliers on details page. • Allows filtering by #ssue. • Provides comparison func#on to subset samples (currently, Male/Female). • Displays comparisons as difference graph or mirror graphs. Tissue selecon Track configuraon supports user choice of #ssues to display Tissue selec#on, cont. Alternave configuraon panel, with #ssues grouped by system Comparisons Scale 50 kb hg19 chr17: 37,800,000 37,850,000 Gene Expression in 53 tissues from GTEx RNA-seq of 2921 samples (209 donors) F F NEUROD2 TCAP M M F F PPP1R1B PGAP3 M M F STARD3 ERBB2 M F PNMT M Transcription Levels Assayed by RNA-seq on 9 Cell Lines from ENCODE Transcription Simple Nucleotide Polymorphisms (dbSNP 144) Found in >= 1% of Samples Common SNPs(144) Gender expression differences graph. The #ssue filter was applied here to exclude gender-specific ssues. Track Descrip#on Page Track descrip#on Linkouts to GTEx portal UCSC/GENCODE genes detail page GTEx gene expression details page Under the covers: GTEx Database tables Primary table for GTEx Gene Expression track. This is a summary table of #ssue medians Per gene, anchored to the genomic locaon of the GTEx union transcript for the gene. Related tables: Data: gtexSampleData (RPKM scores for each gene by sample) Metadata: gtexTissue, gtexSample, gtexDonor Data mining & analysis: UCSC public MySQL server $ mysql –user=genome –host=genome-mysql.cse.ucsc.edu –A \! !hgFixed -e ‘select sampleId, tissue, gender, age,\ ! !deathClass, ischemicTime, autolysisScore, rin,\ ! !collectionSites, batchId, isolationDate from \ ! ! ! !gtexSample, gtexDonor where \ ! ! ! !! !gtexSample.donor=gtexDonor.name' | \! !!sed -e 's/ //' > !sampleDf.txt! $ R! >sampleDf <- read.table("sampleDf.txt", sep="\t", ! ! !header=TRUE)! Example: Query GTEx metadata tables to create an R dataframe Data mining & analysis: New tool for database queries/intersec#ons Plans • Release GTEx gene expression tracks on hg19 and hg38 (V4, then V6) • Implement GTEx allele-specific expression track: – Display will be based on difference bar graph, as in gene expression track – Ini#al design will locate graph at eQTL SNP’s – Your input welcome! Other plans • Gene sorter • Tree of cells visualizaon Community data: GTEx data hub ? • RNA-seq signal tracks for all samples • Exon-level expression • Transcript-level expression • Proteomics ? -> Can display on UCSC or Ensembl (and later, NCBI sequence viewer) -> Can view and intersect eQtl’s with regulatory data from ENCODE, Epigenomics Roadmap, etc. -> Can view and intersect with variant and medical annotaons, comparave genomics GTEx + private data: Genome Browser in a Box (GBiB) GBiB running on a Macbook, with private data loaded as a custom track. The GBiB preinstalled image is run in a virtual machine so data can be kept local. UCSC tracks can be mirrored locally or retrieved from the UCSC public MySQL server. Acknowledgements Colleagues: UCSC Genome Browser group PI: Jim Kent Collaborators: GTEx Consor#um Funding: NHGRI (5 U41 HG002371 to UCSC Center for Genomic Science) UCSC Browser help Send us a message to our public mailing list: [email protected] View training info, videos, and user’s guides: http://genome.ucsc.edu/training/ Find us on: Genome Browser @GenomeBrowser UCSC Genome Browser More information: More information: More information: Send us a message on our Find us on: SendSend us us a a message message on on our our FindFind us us on: on: publicpublic mailing mailing list: list: [email protected] mailing list: GenomeBrowser [email protected]@soe.ucsc.edu GenomeBrowser GenomeBrowser View training info, videos, and @GenomeBrowser ViewView training training info, info, videos, videos, and and @GenomeBrowser @GenomeBrowser user’suser’s guides: guides: http://genome.ucsc.edu/training/user’s guides: UCSC Genome Browser http://genome.ucsc.edu/training/ UCSC Genome Browser http://genome.ucsc.edu/training/ UCSC Genome Browser .