
Developing the Cis-Regulatory Association Model (CRAM) to Identify Combinations of Transcription Factors in ChIP-Seq Data

Thesis

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By
Brian Kennedy, B.S.
Department of Computer Science and Engineering
The Ohio State University
2010

Thesis Committee:
Victor X. Jin, Advisor
Raghu Machiraju

Copyright by
Brian Kennedy
2010

Abstract

There are approximately 2,600 human transcription factors (TFs) which may cis-regulate the expression of proximal genes. These TFs may further interact with one another and exhibit different behavior in combination than individually, forming cis-regulatory modules (CRMs). Even simple 2 and 3 TF combinations could form over 2.9 billion different cis-regulatory modules. Testing the functionality of these modules experimentally will be a massive undertaking. CRAM, the Cis-Regulatory Association Modeler, predicts functional regulatory modules in silico using TFs found in sequences searched for TF motifs defined by Position Weight Matrices.
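As a rough check on the figure of over 2.9 billion modules quoted above, the sketch below counts the unordered pairs and triples that can be drawn from 2,600 factors. It assumes the approximate count of 2,600 is exact and ignores the order, spacing, and orientation of binding sites; the names N_TFS and count_modules are illustrative and are not taken from CRAM.

```python
from math import comb

# Assumption: treat the abstract's "approximately 2,600" as an exact count.
N_TFS = 2600


def count_modules(n_tfs: int, sizes=(2, 3)) -> int:
    """Count unordered TF combinations of each size in `sizes`."""
    return sum(comb(n_tfs, k) for k in sizes)


pairs = comb(N_TFS, 2)      # 2-TF combinations
triples = comb(N_TFS, 3)    # 3-TF combinations
total = count_modules(N_TFS)

print(f"pairs:   {pairs:,}")
print(f"triples: {triples:,}")
print(f"total:   {total:,}")
```

Under these assumptions the total is 2,929,332,900, of which only about 3.4 million are pairs, so the search space is dominated by the three-factor combinations.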
CRAM targets ChIP-seq data and finds CRMs which are over-represented in the target sequences compared to a random background, or another contrasting sample of sequences, by using contrast frequent item-set mining in the experimental ChIP-seq peaks and the control sample. The error with which these CRMs can be separated from the random background, using features such as degree of motif matching, relative position, and genetic conservation, is used to determine which CRMs are truly specific to the experimental ChIP-seq sample. Feed-forward neural networks are used to learn the function which specifies the classifiability of each CRM and to calculate the error by which the CRMs are compared. Several other programs use a comparable approach; however, the application of neural networks specifically and of contrast item-set mining is novel.

Dedicated to my mother, father, and brother, for all of their love and support.

Acknowledgments

I have many people to thank for my making it this far: my advisor, Dr. Victor Jin, for everything he's done; Dr. Raghu Machiraju, for his counsel and support; all of my lab mates, for their knowledge, assistance, and encouragement; and the incredible Biomedical Informatics Department staff for everything they do.

Vita

2003            Memphis Central High School
2008            B.S. Computer Science, University of Memphis
2009            Transferred from M.S. Bioinformatics, University of Memphis
2009-Present    M.S. Computer Science & Engineering, The Ohio State University

Publications

Kennedy BA, Gao W, Huang TH, Jin VX (2009) HRTBLDb: an informative data resource for hormone receptors target binding loci. Nucleic Acids Res. 38:D676-681

Bapat SA, Jin V, Berry N, Balch C, Sharma N, Kurrey N, Zhang S, Fang F, Lan X, Li M, Kennedy BA, Bigsby RM, Huang TH, Nephew KP (2010) Multivalent epigenetic marks confer microenvironment-responsive epigenetic plasticity to ovarian cancer cells. Epigenetics 5(8):716-729

Fields of Study

Major: Computer Science & Engineering
Machine Learning applied in Bioinformatics

Table of Contents

Abstract
Acknowledgments
Vita
Table of Contents
List of Illustrations
List of Tables
Lists of Symbols & Abbreviations
Chapter 1: Introduction
    1.1 Biological Background of the Question
    1.2 Research Question & Goals
    1.3 Prior Art in CRM Prediction
    1.4 Overview of the Solution
    1.5 Overview of the Thesis
Chapter 2: Algorithms and Design
    2.1 Input Data
    2.2 Transcription Factor Representation
    2.3 Transcription Factor Search
    2.4 Genetic Conservation Measurement
    2.5 Transcription Factor Cartization
    2.6 Contrast Frequent Item-set Mining
    2.7 Artificial Neural Networks
    2.8 Output Data
Chapter 3: Comparing Transcription Factors in K562 cells
    3.1 Data Source
    3.2 Data summary
    3.3 Biological background
    3.4 Method
    3.5 Analysis
    3.6 Discussion
Chapter 4: Contrasting TGF-β Treatment in A2780
    4.1 Data source
    4.2 Data summary
    4.3 Biological background
    4.4 Method
    4.5 Analysis
    4.6 Discussion
Chapter 5: Conclusions
    5.1 Address of the Research Question
    5.2 Discussion of CRAM
    5.3 Future Work
Bibliography
Appendix
    A. FASTA format
    B. BED format
    E. Output format
    F. JSON format
