Computational Analysis of the ENCODE Datasets and Other Related Epigenetic Explorations

Computational analysis of the ENCODE datasets and other related epigenetic explorations . Ved Topkar Harvard College class of 2016 Gunawardena Lab Harvard Medical School Department of Systems Biology 13 August 2013 . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 1 / 1 Introduction Presentation Goals FULL understanding of discussed material Ask questions along the way! . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 2 / 1 Introduction Outline 1. Molecular biology in a jiffy 2. A case study Hypothesis formulation Analyzing data 3. More examples . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 3 / 1 Introduction Outline 1. Molecular biology in a jiffy 2. A case study Hypothesis formulation Analyzing data 3. More examples . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 3 / 1 Introduction Outline 1. Molecular biology in a jiffy 2. A case study Hypothesis formulation Analyzing data 3. More examples . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 3 / 1 Molecular Biology Essentials The Cell . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 4 / 1 Molecular Biology Essentials The Central Dogma . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 5 / 1 Molecular Biology Essentials Transcriptional Regulation . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 6 / 1 Molecular Biology Essentials Transcriptional Access . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 7 / 1 Background Epigenetics and Gene Expression Things beyond just the base pairs in DNA matter ! gene expression . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 8 / 1 Background The Question Analyze the ENCODE dataset . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 9 / 1 Background The Question Analyze the ENCODE dataset . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 9 / 1 Background . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 10 / 1 Background ENCODE (Overview) . Overview . National Human Genome Institute: Encyclopedia of DNA Elements (ENCODE) Nearly 600 collaborating labs post HGP . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 11 / 1 Background The Data Set . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 12 / 1 Background Data Types Raw signals Raw signal peak calling outputs (e.g. PeakSeq results) Relatively course-grain peak data . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 13 / 1 A simple example TFBS The Game Plan . Can we reduce transcription factor binding landscapes into categories? . Scan across genome, looking for promoters Bin promoters appropriately Score binding at each promoter Clustering analysis . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 14 / 1 A simple example TFBS RefSeq . Overview . Curated database of genes New versions released as frequently as Firefox Includes pseudogenes, haplotype variations, and predicted genes . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 15 / 1 A simple example TFBS Defining Promoter? Only upstream from TSS? Incredibly far regulatory regions? Intronic regulation? Post termination regulatory elements? 1000 bp upstream . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 16 / 1 A simple example TFBS Defining Promoter? Only upstream from TSS? Incredibly far regulatory regions? Intronic regulation? Post termination regulatory elements? 1000 bp upstream . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 16 / 1 A simple example TFBS Defining Promoter? Only upstream from TSS? Incredibly far regulatory regions? Intronic regulation? Post termination regulatory elements? 1000 bp upstream . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 16 / 1 A simple example TFBS Defining Promoter? Only upstream from TSS? Incredibly far regulatory regions? Intronic regulation? Post termination regulatory elements? 1000 bp upstream . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 16 / 1 A simple example TFBS Defining Promoter? Only upstream from TSS? Incredibly far regulatory regions? Intronic regulation? Post termination regulatory elements? 1000 bp upstream . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 16 / 1 A simple example TFBS Binning How do we quantitatively analyze promoter presence? Break promoter regions into bins for a finer metric? Do we give weights to bins as a function of their position? Single, unweighted 1000 bp bin of counts . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 17 / 1 A simple example TFBS Binning How do we quantitatively analyze promoter presence? Break promoter regions into bins for a finer metric? Do we give weights to bins as a function of their position? Single, unweighted 1000 bp bin of counts . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 17 / 1 A simple example TFBS Binning How do we quantitatively analyze promoter presence? Break promoter regions into bins for a finer metric? Do we give weights to bins as a function of their position? Single, unweighted 1000 bp bin of counts . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 17 / 1 A simple example TFBS Binning How do we quantitatively analyze promoter presence? Break promoter regions into bins for a finer metric? Do we give weights to bins as a function of their position? Single, unweighted 1000 bp bin of counts . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 17 / 1 A simple example Survey of Promoters Forward vs. Backward Strand . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 18 / 1 A simple example Survey of Promoters TF Binding Frequency . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 19 / 1 A simple example Survey of Promoters Histogram of promoter binding . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 20 / 1 A simple example Survey of Promoters Promoter/TFBS intersections . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 21 / 1 A simple example Survey of Promoters Computational Efficiency This was an exercise in program optimization Original algorithm took about 5 days, optimized/parallelized algorithm took just a few hours . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 22 / 1 A simple example Survey of Promoters Clustering The unsupervised grouping of information such that groups have similar elements that are dissimilar from elements in other groups . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 23 / 1 A simple example Clustering analysis Clustering . Minkowski Distance (p = 2 ! Euclidean Distance) . ! 1 Xn p p jxi − yi j . i=1 . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 24 / 1 A simple example Clustering analysis Clustering . Simple Linkage Clustering . D(X ; Y ) = min(d(x; y)) 8 (x; y)ϵ(X ; Y ) . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 25 / 1 A simple example Clustering analysis Clustering (simple linkage exhibiting chaining) . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 26 / 1 A simple example Clustering analysis Clustering . Complete Linkage Clustering . D(X ; Y ) = max(d(x; y)) 8 (x; y)ϵ(X ; Y ) . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 27 / 1 A simple example Clustering analysis Clustering (complete linkage) . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 28 / 1 A simple example Clustering analysis Computational Efficiency Pairwise distance calculations require a LOT of RAM . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 29 / 1 A simple example Clustering analysis 2-way clustering heatmap . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 30 / 1 A simple example Clustering analysis 2-way clustering heatmap (with random data arrays) . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 31 / 1 A simple example Roadmap Dataset Analysis Histone Modifications (Roadmap Dataset) . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 32 / 1 A simple example Roadmap Dataset Analysis A quick correlation test R = 0.042 . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 33 / 1 A simple example Roadmap Dataset Analysis Conclusions There are numerous methods of promoter binning that prove useful for complexity reduction Unsupervised clustering of preprocessed promoter data yield results with biological significance . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 34 / 1 A simple example Roadmap Dataset Analysis Next Steps Incorporation of nucleosome enrichment, methylation, and histone modification data in a more meaningful way Further refining of cluster analysis pipeline to uncover more unknown biology . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 35 / 1 A simple example End Thanks! Jeremy Gunawardena and the HMS Department of Systems Biology! My collaborator Tobias Ahsendorf! PRISE and Greg Llacer! The lovely PRISE staff! This audience! . Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 36 / 1.

Computational Analysis of the ENCODE Datasets and Other Related Epigenetic Explorations

ENCODE Regions Identification of Higher-Order Functional Domains in the Human

Developing and Implementing an Institute-Wide Data Sharing Policy Stephanie OM Dyke and Tim JP Hubbard*

Estimating the Number of Unseen Variants in the Human Genome

The ENCODE Project

The ENCODE (Encyclopedia of S DNA Elements) Project

What Is a Gene, Post-ENCODE? History and Updated Definition

A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

Current Advances of Epigenetics in Periodontology from ENCODE Project

DNA Copy Number Analysis of Fresh and Formalin-Fixed Specimens By

ENCODE Phase 2: Participants and Projects

National Human Genome Research Institute EA Feingold, PJ G

Multiple Genes Encode Nuclear Factor 1-Like Proteins That Bind To