GTEx in the UCSC Genome Browser
Kate Rosenbloom UCSC Genome Browser Group
December 2015

Motivation

Incorporate newer gene expression data sets and explore new ways of displaying tissue-specific gene expression in the browser.
Current tissue expression tracks:
• GNF Atlas 2 (2009)
• ENCODE transcription signal (2008-2012)

-> Browser supplement funding to integrate GTEx into the genome browser database and develop visualizations

OXT: Oxytocin precursor. Expression from GNF Atlas2

Aims
1. Integrate gene expression data and metadata into the browser database
2. Develop new track display with more compact layout for higher on-screen annotation density
3. Create browser tracks for GTEx gene-level expression and allele-specific expression
Progress

✔ 1. Integrate gene expression data and metadata into the browser database
✔ 2. Develop new bar-graph track display
✔ 3. Create browser track for GTEx gene-level expression
4. Create browser track for GTEx allele-specific expression
Try the GTEx gene expression track on the UCSC preview browser: http://tinyurl.com/gtexUcsc
GTEx Gene Expression track
[Figure: GTEx Gene Expression track (53 tissues from GTEx RNA-seq of 2921 samples, 209 donors) at the TCAP locus, hg19 chr17, around 37,821,000-37,823,000, shown above the GENCODE Version 18 Basic gene set, UCSC Genes, ENCODE RNA-seq transcription levels (9 cell lines), PeptideAtlas peptides (971 samples), and 100-vertebrate PhyloP conservation tracks]
TCAP gene locus, showing highest expression in heart and skeletal muscle. Mouseover on bars shows tissue type and median score. Click-through shows boxplot with score ranges.

Gene expression details page

Multi-gene view
[Figure: GTEx Gene Expression track, hg19 chr17, around 37,800,000-37,950,000, with bar graphs for NEUROD2, PPP1R1B, STARD3, TCAP, PNMT, PGAP3, ERBB2, MIR4728, MIEN1, GRB7, and IKZF3, shown above ENCODE RNA-seq transcription levels and PeptideAtlas tracks]

200 Kbp region of chromosome 17

Multi-megabase view
[Figure: GTEx Gene Expression track, hg19 chr17, around 37,000,000-39,000,000, with bar graphs for dozens of genes (including GPR179, ERBB2, TOP2A, RARA, and the KRT genes), shown above ENCODE RNA-seq transcription levels and PeptideAtlas tracks]

2.7 Mbp region of chromosome 17

Track features
• Displays bar graph height as median RPKM raw score (with configurable range) or log transform. Mouseover on a gene shows its description; mouseover on a bar shows tissue and score.
• Shows sample quartiles and outliers on details page.
• Allows filtering by tissue.
• Provides comparison function to subset samples (currently, Male/Female).
• Displays comparisons as difference graph or mirror graphs.

Tissue selection
Track configuration supports user choice of tissues to display

Tissue selection, cont.

Alternative configuration panel, with tissues grouped by system

Comparisons
[Figure: GTEx Gene Expression track, hg19 chr17, around 37,800,000-37,850,000, with female (F) and male (M) graphs for NEUROD2, TCAP, PPP1R1B, PGAP3, STARD3, ERBB2, and PNMT, shown above ENCODE RNA-seq transcription levels and Common SNPs (dbSNP 144) tracks]
Gender expression differences graph. The tissue filter was applied here to exclude gender-specific tissues.

Track Description Page

Track description

Linkouts to GTEx portal

UCSC/GENCODE genes detail page

GTEx gene expression details page

Under the covers: GTEx Database tables
Primary table for the GTEx Gene Expression track. This is a summary table of tissue medians per gene, anchored to the genomic location of the GTEx union transcript for the gene.
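To make "summary table of tissue medians per gene" concrete, here is a minimal Python sketch that collapses per-sample scores into one median per gene and tissue. The function name `tissue_medians` and the input rows are hypothetical and made up for illustration; this is not the browser's own loading code, and the values are not GTEx data.

```python
from collections import defaultdict
from statistics import median


def tissue_medians(sample_rows):
    """Collapse (gene, tissue, rpkm) rows into {gene: {tissue: median RPKM}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for gene, tissue, rpkm in sample_rows:
        grouped[gene][tissue].append(rpkm)
    return {
        gene: {tissue: median(values) for tissue, values in by_tissue.items()}
        for gene, by_tissue in grouped.items()
    }


# Made-up per-sample scores, for illustration only (not GTEx data).
rows = [
    ("TCAP", "heart", 120.0),
    ("TCAP", "heart", 80.0),
    ("TCAP", "heart", 100.0),
    ("TCAP", "liver", 0.5),
    ("TCAP", "liver", 1.5),
    ("ERBB2", "heart", 10.0),
]

medians = tissue_medians(rows)
print(medians["TCAP"]["heart"])  # 100.0
print(medians["TCAP"]["liver"])  # 1.0
```

Each gene ends up with one value per tissue, which is the quantity a bar in the track would represent under this simplified model.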
Related tables:
• Data: gtexSampleData (RPKM scores for each gene by sample)
• Metadata: gtexTissue, gtexSample, gtexDonor

Data mining & analysis: UCSC public MySQL server
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A \
      hgFixed -e 'select sampleId, tissue, gender, age,
      deathClass, ischemicTime, autolysisScore, rin,
      collectionSites, batchId, isolationDate from
      gtexSample, gtexDonor where
      gtexSample.donor=gtexDonor.name' | \
      sed -e 's/ //' > sampleDf.txt
$ R
> sampleDf <- read.table("sampleDf.txt", sep="\t",
      header=TRUE)
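The join in the mysql query on this slide can be tried offline. The following is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. Only the table names, column names, and join condition are taken from that query; the column types, the reduced column set, and every row value are assumptions made up for illustration.

```python
import sqlite3

# In-memory stand-in for the two metadata tables (made-up rows, assumed types).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gtexDonor (name TEXT, gender TEXT, age INTEGER, deathClass INTEGER);
    CREATE TABLE gtexSample (sampleId TEXT, tissue TEXT, donor TEXT, rin REAL);
    INSERT INTO gtexDonor VALUES ('donorA', 'F', 50, 1), ('donorB', 'M', 60, 2);
    INSERT INTO gtexSample VALUES
        ('sample1', 'heart', 'donorA', 7.5),
        ('sample2', 'liver', 'donorA', 6.9),
        ('sample3', 'heart', 'donorB', 8.1);
""")

# Same join condition as the slide's query: each sample row picks up its donor's fields.
cursor = conn.execute("""
    SELECT sampleId, tissue, gender, age, deathClass, rin
    FROM gtexSample, gtexDonor
    WHERE gtexSample.donor = gtexDonor.name
    ORDER BY sampleId
""")
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# Tab-separated text with a header line, the layout read.table(sep="\t", header=TRUE) expects.
lines = ["\t".join(header)] + ["\t".join(str(v) for v in row) for row in rows]
print("\n".join(lines))
```

The result has one line per sample, with donor-level fields such as gender and age repeated for every sample from the same donor.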
Example: query the GTEx metadata tables with mysql to create an R data frame

Data mining & analysis: New tool for database queries/intersections

Plans
• Release GTEx gene expression tracks on hg19 and hg38 (V4, then V6)
• Implement GTEx allele-specific expression track:
  – Display will be based on difference bar graph, as in gene expression track
  – Initial design will locate graph at eQTL SNPs
  – Your input welcome!

Other plans
• Gene sorter
• Tree of cells visualization

Community data: GTEx data hub ?

• RNA-seq signal tracks for all samples
• Exon-level expression
• Transcript-level expression
• Proteomics ?
-> Can display on UCSC or Ensembl (and later, NCBI sequence viewer)
-> Can view and intersect eQTLs with regulatory data from ENCODE, Epigenomics Roadmap, etc.
-> Can view and intersect with variant and medical annotations, comparative genomics

GTEx + private data: Genome Browser in a Box (GBiB)
GBiB running on a MacBook, with private data loaded as a custom track. The GBiB preinstalled image is run in a virtual machine so data can be kept local. UCSC tracks can be mirrored locally or retrieved from the UCSC public MySQL server.

Acknowledgements
Colleagues: UCSC Genome Browser group
PI: Jim Kent
Collaborators: GTEx Consortium
Funding: NHGRI (5 U41 HG002371 to UCSC Center for Genomic Science)

UCSC Browser help
Send a message to our public mailing list: genome@soe.ucsc.edu
View training info, videos, and user’s guides: http://genome.ucsc.edu/training/
Find us on: Genome Browser
@GenomeBrowser
UCSC Genome Browser