GTEx in the UCSC Genome Browser

Kate Rosenbloom UCSC Genome Browser Group

December 2015 Movaon Incorporate newer gene expression data sets and explore new ways of displaying ssue- specific gene expression in the browser.

Current ssue expression tracks:

GNF Atlas 2 (2009) ENCODE transcripon signal (2008-2012)

è Browser supplement funding to integrate GTEx into the genome browser database and develop visualizaons

OXT: Oxytocin precursor Expression from GNF Atlas2 Aims

1. Integrate gene expression data and metadata into the browser database

2. Develop new track display with more compact layout for higher on-screen annotaon density

3. Create browser tracks for GTEx gene-level expression and allele-specific expression

Progress 1. Integrate gene expression data and metadata into the ✔ browser database

✔ 2. Develop new bar-graph track display

✔ 3. Create browser track for GTEx gene-level expression

4. Create browser track for GTEx allele-specific expression

Try the GTEx gene expression track on the UCSC preview browser: hp://nyurl.com/gtexUcsc

GTEx Gene Expression track

Scale 1 kb hg19 chr17: 37,821,000 37,821,500 37,822,000 37,822,500 37,823,000 Gene Expression in 53 tissues from GTEx RNA-seq of 2921 samples (209 donors)

TCAP

Basic Gene Annotation Set from ENCODE/GENCODE Version 18

UCSC Genes (RefSeq, GenBank, CCDS, Rfam, tRNAs & Comparative Genomics)

Transcription Levels Assayed by RNA-seq on 9 Cell Lines from ENCODE Transcription

Peptide sequences identified from MS spectra of 971 samples by PeptideAtlas

4.88 _ 100 vertebrates Basewise Conservation by PhyloP 100 Vert. Cons -4.5 _

LCAP gene locus, showing highest expression in heart and skeletal muscle. Mouseover on bars shows ssue type and median score. Click-through shows boxplot with score ranges. Gene expression details page Mul-gene view

Scale 100 kb hg19 chr17: 37,800,000 37,850,000 37,900,000 37,950,000 Gene Expression in 53 tissues from GTEx RNA-seq of 2921 samples (209 donors) NEUROD2 MIR4728

PPP1R1B MIEN1

STARD3 GRB7

TCAP IKZF3 PNMT PGAP3

ERBB2 Transcription Levels Assayed by RNA-seq on 9 Cell Lines from ENCODE Transcription

Peptide sequences identified from MS spectra of 971 samples by PeptideAtlas PeptideAtlas

200 Kbp region of Mul-megabase view

Scale 1 Mb hg19 chr17: 37,000,000 37,500,000 38,000,000 38,500,000 39,000,000 Gene Expression in 53 tissues from GTEx RNA-seq of 2921 samples (209 donors) GPR179 MIR4734 ARL5C NEUROD2 PSMD3 RARA KRT24 SOCS7 MLLT6 CACNB1 PPP1R1B CSF3 TOP2A KRT25 ARHGAP23 CWC25 RPL19 STARD3 MED24 IGFBP4 KRT28 SRCIN1 FBXO47 MED1 GRB7 THRA TNS4 KRT10 C17orf96 PLXDC1 CDK12 ZPBP2 WIPF2 CCR7 KRT23 MIR4726 STAC2 TCAP NR1D1 SMARCE1 KRT39 CISD3 FBXL20 PNMT MSL1 KRT26 PCGF2 PGAP3 CASC3 KRT27 PSMB3 ERBB2 RAPGEFL1 TMEM99 PIP4K2B MIR4728 CDC6 KRT12 MIR4727 MIEN1 KRT20 C17orf98 IKZF3 KRT40 RPL23 GSDMB KRTAP3-3 LASP1 ORMDL3 KRTAP3-2 LINC00672 LRRC3C KRTAP3-1 LRRC37A11P GSDMA CTD-2206N4.4 SNORD124 Transcription Levels Assayed by RNA-seq on 9 Cell Lines from ENCODE Transcription

Peptide sequences identified from MS spectra of 971 samples by PeptideAtlas PeptideAtlas

2.7 Mbp region of chromosome 17 Track features

• Displays bar graph height as median RPKM raw score (with configurable range) or log transform. Mouseovers on gene shows descripon; on bars show ssue and score. • Shows sample quarles and outliers on details page. • Allows filtering by ssue. • Provides comparison funcon to subset samples (currently, Male/Female). • Displays comparisons as difference graph or mirror graphs. Tissue selecon

Track configuraon supports user choice of ssues to display Tissue selecon, cont.

Alternave configuraon panel, with ssues grouped by system Comparisons

Scale 50 kb hg19 chr17: 37,800,000 37,850,000 Gene Expression in 53 tissues from GTEx RNA-seq of 2921 samples (209 donors) F F NEUROD2 TCAP M M F F PPP1R1B PGAP3 M M F STARD3 ERBB2 M F PNMT M Transcription Levels Assayed by RNA-seq on 9 Cell Lines from ENCODE Transcription

Simple Nucleotide Polymorphisms (dbSNP 144) Found in >= 1% of Samples Common SNPs(144)

Gender expression differences graph. The ssue filter was applied here to exclude gender-specific ssues. Track Descripon Page Track descripon Linkouts to GTEx portal

UCSC/GENCODE genes detail page

GTEx gene expression details page Under the covers: GTEx Database tables

Primary table for GTEx Gene Expression track. This is a summary table of ssue medians Per gene, anchored to the genomic locaon of the GTEx union transcript for the gene.

Related tables: Data: gtexSampleData (RPKM scores for each gene by sample) Metadata: gtexTissue, gtexSample, gtexDonor Data mining & analysis: UCSC public MySQL server

$ mysql –user=genome –host=genome-mysql.cse.ucsc.edu –A \! !hgFixed -e ‘select sampleId, tissue, gender, age,\ ! !deathClass, ischemicTime, autolysisScore, rin,\ ! !collectionSites, batchId, isolationDate from \ ! ! ! !gtexSample, gtexDonor where \ ! ! ! !! !gtexSample.donor=gtexDonor.name' | \! !!sed -e 's/ //' > !sampleDf.txt! $ R! >sampleDf <- read.table("sampleDf.txt", sep="\t", ! ! !header=TRUE)!

Example: Query GTEx metadata tables to create an R dataframe Data mining & analysis: New tool for database queries/intersecons Plans

• Release GTEx gene expression tracks on hg19 and hg38 (V4, then V6) • Implement GTEx allele-specific expression track: – Display will be based on difference bar graph, as in gene expression track – Inial design will locate graph at eQTL SNP’s – Your input welcome! Other plans

• Gene sorter

• Tree of cells visualizaon Community data: GTEx data hub ?

• RNA-seq signal tracks for all samples • Exon-level expression • Transcript-level expression • Proteomics ?

-> Can display on UCSC or Ensembl (and later, NCBI sequence viewer)

-> Can view and intersect eQtl’s with regulatory data from ENCODE, Epigenomics Roadmap, etc.

-> Can view and intersect with variant and medical annotaons, comparave genomics GTEx + private data: Genome Browser in a Box (GBiB)

GBiB running on a Macbook, with private data loaded as a custom track. The GBiB preinstalled image is run in a virtual machine so data can be kept local. UCSC tracks can be mirrored locally or retrieved from the UCSC public MySQL server. Acknowledgements

Colleagues: UCSC Genome Browser group PI: Jim Kent Collaborators: GTEx Consorum Funding: NHGRI (5 U41 HG002371 to UCSC Center for Genomic Science) UCSC Browser help

Send us a message to our public mailing list: [email protected]

View training info, videos, and user’s guides: http://genome.ucsc.edu/training/

Find us on: Genome Browser

@GenomeBrowser

UCSC Genome Browser

MoreMore information: information: More information:

Send us a message on our Find us on: SendSend us us a a message message on on our our FindFind us us on: on: publicpublic mailing mailing list: list: [email protected] mailing list: GenomeBrowser [email protected]@soe.ucsc.edu GenomeBrowser GenomeBrowser

View training info, videos, and @GenomeBrowser ViewView training training info, info, videos, videos, and and @GenomeBrowser @GenomeBrowser user’suser’s guides: guides: http://genome.ucsc.edu/training/user’s guides: UCSC Genome Browser http://genome.ucsc.edu/training/ UCSC Genome Browser http://genome.ucsc.edu/training/ UCSC Genome Browser