Regularization Methods for Predicting an Ordinal Response Using Longitudinal High-Dimensional Genomic Data

Virginia Commonwealth University, VCU Scholars Compass, Theses and Dissertations, Graduate School, 2013. Jiayi Hou, Virginia Commonwealth University. Part of the Biostatistics Commons. © The Author. Downloaded from https://scholarscompass.vcu.edu/etd/3242.

This dissertation is brought to you for free and open access by the Graduate School at VCU Scholars Compass. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of VCU Scholars Compass. For more information, please contact [email protected].

© Jiayi Hou 2013. All Rights Reserved.

REGULARIZATION METHODS FOR PREDICTING AN ORDINAL RESPONSE USING LONGITUDINAL HIGH-DIMENSIONAL GENOMIC DATA

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Virginia Commonwealth University

By Jiayi Hou
B.S. Mathematics, Sichuan University, 2008

Advisor: Kellie J. Archer
Associate Professor, Department of Biostatistics
Director, VCU Massey Cancer Center Biostatistics Shared Resource

Virginia Commonwealth University
Richmond, Virginia
December, 2013

Acknowledgement

I owe my sincere thanks to the countless people who have helped, supported, and encouraged me during my long Ph.D. journey. Without your help, this thesis would not have been possible.

First and foremost, I must thank my thesis advisor, Dr. Kellie Archer, for the generous support she gave to help pave the path to achievement. Through her enthusiasm, vision and determination, Dr. Archer inspired me to devote myself to statistical learning. Learning from Dr. Archer has been the greatest pleasure and will be a lifetime treasure.

Thank you to the rest of my thesis committee: Dr. Chris Gennings and Dr. Robert Johnson, who are experts in biostatistics; Dr. Sam Chen, who has tremendous research experience in genetics and genomics; and Dr. Juan Lu, who has contributed enormously to the field of epidemiology. Your constructive feedback, rich support and kind encouragement have helped me grow both as a researcher and as a person. I truly appreciate the time, effort and energy you dedicated to making this happen.

Many other great supervisors I got to know through various projects and positions have taught me a lot. Thank you to Dr. Mark Reimers for introducing me to the field of bioinformatics and offering me rigorous training. Thank you to Dr. Charlie Kish and Dr. Carol Summitt for providing a unique opportunity to gain real industrial experience and for broadening the areas where my knowledge can be applied. Thank you to Dr. Phillip Yates, Dr. Max Kuhn and the other team members at Pfizer Inc. for giving me an enjoyable, memorable and cool summer in Connecticut.

I owe a big thanks to my classmates Adam Sima, Caroline Carrico, Amber Wilk, Sarah Reese, Chunfeng Ren, Qing Zhou and Yan Jin, who grew, laughed, gossiped and enjoyed graduate school with me. Other people in the department provided generous support and helped me get through school: Yvonne Hargrove, Gayle Spivery, Helen Wang, Brian Bush, Russ Boyle, Dr. Donna McClish, Dr. Roy Sabo, Dr. Jessica Ketchum, Dr. Guimin Gao and Dr. Shumei Sun. I owe you a great debt of gratitude. I am grateful to all my other friends in Richmond, VA, for your companionship, tolerance and encouragement, which made the journey wonderful.

Finally, I owe everything to my parents and family members, who unconditionally supported me in pursuing my dream. I feel blessed and will always cherish my memories of Virginia Commonwealth University.

Table of Contents

List of Figures
List of Tables
Abstract
1 Introduction to Ordinal Model
1.1 Ordinal Responses
1.2 Model Framework for Ordinal Responses
1.2.1 Cumulative Logit Model
1.2.2 Adjacent Categories Model
1.2.3 Continuation Ratio Model
1.3 Estimation of the Coefficients
1.3.1 Maximum Likelihood Estimate
1.3.2 Optimization Technique
1.3.3 Software Implementation
1.4 NIMH Schizophrenia Example
2 Regularization Methods for High-dimensional Data
2.1 Regularization Methods for Continuous Response
2.1.1 LASSO
2.1.2 Forward Stagewise Method
2.1.3 LAR
2.2 Regularization Methods for Dichotomous Responses
2.2.1 LASSO for Logistic Regression
2.2.2 Forward Stagewise for Logistic Regression
2.3 Coordinate Descent for LASSO Regularization Paths
2.4 Some Discussion
3 Statistical Models for Longitudinal Data
3.1 Linear Mixed Model
3.1.1 Linear Regression Model
3.1.2 ANOVA and MANOVA Approaches for Repeated Measurement
3.1.3 Linear Mixed Model
3.1.4 Estimating Parameters for a Linear Mixed Model
3.2 Nonlinear Mixed Model
3.2.1 The Model Framework
3.2.2 The Marginal Likelihood and its Approximation
3.2.3 Estimation of the Parameters
3.2.4 Orange Tree Example
3.3 Generalized Linear Model
3.3.1 Generalized Linear Model Framework
3.3.2 Moments and Likelihood for GLM
3.3.3 Maximum Likelihood Estimates for GLM
3.3.4 Quasi-Likelihood Estimates for GLM
3.4 Generalized Linear Mixed Model
3.4.1 Generalized Estimating Equations for Marginal Model
3.4.2 Penalized Quasi-likelihood for GLMM
4 Random Coefficient Model with Ordinal Response
4.1 Random Coefficient Model with Ordinal Response
4.2 The Marginal Likelihood and its Approximation
4.3 Estimating Model Parameters
4.4 Estimating the Random Effects
4.5 NIMH Schizophrenia Example Revisited
4.6 Health Services Research Example
5 Penalized Model for Traditional Longitudinal High-dimensional Data with an Ordinal Response
5.1 Review of Forward Stagewise Method
5.2 Regularization Method for High-dimensional Data with Ordinal Response
5.3 Regularization Method for Longitudinal High-dimensional Data with an Ordinal Response
5.4 Model Assessment and Selection
5.5 Software Implementation
5.6 Simulations to Evaluate the Proposed Model
5.6.1 Simulation for High-dimensional Data
5.6.2 Simulation for Longitudinal High-dimensional Data
5.7 Some Discussion
6 Application of Proposed Methodology
6.1 Application to the Smoking Study
6.2 Application to the Glue Grant Study
6.2.1 Marshall score for the renal system
6.2.2 Marshall score for the central nervous system
6.2.3 Aggregated Marshall score
6.3 Discussion
7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
7.2.1 Variable Selection using a LAR-type Algorithm
7.2.2 Variable Selection with Consideration of the Correlations between Features
7.2.3 Application to Other Genomic and Medical Data
Bibliography
Appendices
A NIMH Schizophrenia Data Code
A.1 R code for NIMH Schizophrenia Data
A.2 R code for NIMH Schizophrenia Data using VGAM package
A.3 SAS code for NIMH Schizophrenia Data
B Orange Tree Example Code
B.1 R code for Orange Tree Example
B.2 R code for Orange Tree Example using lme4 package
B.3 SAS code for Orange Tree Example
B.4 WinBUGS code for Orange Tree Example
C NIMH Schizophrenia Longitudinal Data Code
C.1 R code for NIMH Schizophrenia Longitudinal Data
C.2 SAS code for NIMH Schizophrenia Longitudinal Data
C.3 R code for NIMH Schizophrenia Longitudinal Data using ordinal package
D NIMH Schizophrenia Longitudinal Data Additional Results
D.1 Random Coefficient Model with Adjacent Categories Logit
D.2 Random Coefficient Model with Backward Continuation Ratio
D.3 Random Coefficient Model with Forward Continuation Ratio
E Health Service Research Example Code
E.1 R code for Health Service Research Example
E.2 SAS code for Health Service Research Example
F Health Service Research Example Additional Results
F.1 Health Service Research Example output: Random Intercept Model with Adjacent-Category Logit
F.2 Random Intercept Model with Backward Continuation Ratio
F.3 Random Intercept Model with Forward Continuation Ratio Logit
G GSE10006 Smoking Study Additional Results
H Glue Grant Burn Injury Study Example Additional Results
I R code for R package ordinalmixed with Applications
I.1 Source Code
I.2 Application to NIMH Schizophrenia Longitudinal Data
I.3 Application to Health Service Research Example
I.4 Application to GSE10006 Smoking Study
I.5 Application to Glue Grant Burn Injury Study
I.6 High-dimensional Data Simulation
I.7 Longitudinal High-dimensional Data Simulation

List of Figures

2.1 Estimation picture for the LASSO
2.2 Least squares projection in the linear regression model
3.1 Orange Tree Growth Curves
4.1 Summary of IMPS score (Normal, Mild, Marked, Severe) by Time in the Placebo Group
4.2 Summary of IMPS score (Normal, Mild, Marked, Severe) by Time in the Intervention Group
4.3 Summary of Housing Status by Time in Group with Section 8 Certificates
4.4 Summary of Housing Status by Time in Group without Section 8 Certificates
5.1 Flowchart for function FSPenFixed in R package ordinalmixed. The blue circle represents input/output and the cyan rectangle represents an R function. The FSPenFixed function first calls function forward.stagewise.cum to perform steps 1, 2 and 3 described in GMIFS for ordinal response with high-dimensional data.
Recommended publications
  • (12) STANDARD PATENT (11) Application No. AU 2010325179 B2 (19) AUSTRALIAN PATENT OFFICE
    (54) Title: Blood transcriptional signature of active versus latent Mycobacterium tuberculosis infection. (51) International Patent Classification(s): C12N 15/31 (2006.01); G01N 33/15 (2006.01); C12Q 1/68 (2006.01); C12R 1/32 (2006.01). (21) Application No.: 2010325179. (22) Date of Filing: 2010.08.19. (87) WIPO No.: WO11/066008. (30) Priority Data: (31) Number 12/628,148, (32) Date 2009.11.30, (33) Country US. (43) Publication Date: 2011.06.03. (44) Accepted Journal Date: 2015.03.12. (71) Applicant(s): Medical Research Council; Baylor Research Institute; Imperial College Healthcare NHS Trust. (72) Inventor(s): Banchereau, Jacques F.; Chaussabel, Damien; O'Garra, Anne; Berry, Matthew; Kon, Onn Min. (74) Agent / Attorney: Pizzeys, PO Box 291, WODEN, ACT, 2606. (56) Related Art: WO 2004/001070; BERRY, M.P.R. et al., Thorax, 2008, Vol. 63 (Suppl VII), page A63; STERN, J.N. et al., Immunologic Research, 2009, Vol. 45, pages 1-12; MISTRY, R. et al., The Journal of Infectious Diseases, 2007, Vol. 195, pages 357-365. (12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT). (19) World Intellectual Property Organization, International Bureau. (10) International Publication Number: WO 2011/066008 A3. (43) International Publication Date: 3 June 2011 (03.06.2011). (51) International Patent Classification: C12Q 1/68 (2006.01); G01N 33/15 (2006.01); C12N 15/31 (2006.01); C12R 1/32 (2006.01). (74) Agents: CHALKER, Daniel J. et al.; Chalker Flores, LLP, 14951 North Dallas Parkway, Suite 400, Dallas, TX 75254 (US).
  • Identification and Characterization of Genes Essential for Human Brain Development
    Citation: Ganesh, Vijay S. 2012. Identification and Characterization of Genes Essential for Human Brain Development. Doctoral dissertation, Harvard University. Citable link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:9773743. Terms of Use: This article was downloaded from Harvard University's DASH repository and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA. Copyright © 2012 by Vijay S. Ganesh. All rights reserved. Dissertation Advisor: Dr. Christopher A. Walsh. Author: Vijay S. Ganesh. Abstract: The human brain is a network of ninety billion neurons that allows for many of the behavioral adaptations considered unique to our species. One-fifth of these neurons are layered in an epithelial sheet known as the cerebral cortex, which is exquisitely folded into convolutions called gyri. Defects in neuronal number clinically present with microcephaly (Greek for “small head”), and in inherited cases these defects can be linked to mutations that identify genes essential for neural progenitor proliferation. Most microcephaly genes are characterized as playing a role in the centrosome; however, rarer presentations of microcephaly have identified different mechanisms. Charged multivesicular body protein/Chromatin modifying protein 1A (CHMP1A) is a member of the ESCRT-III endosomal sorting complex, but is also suggested to localize to the nuclear matrix and regulate chromatin.
  • Analysis of the Indacaterol-Regulated Transcriptome in Human Airway
    Supplemental material to this article can be found at: http://jpet.aspetjournals.org/content/suppl/2018/04/13/jpet.118.249292.DC1. 1521-0103/366/1/220–236$35.00. https://doi.org/10.1124/jpet.118.249292. THE JOURNAL OF PHARMACOLOGY AND EXPERIMENTAL THERAPEUTICS, J Pharmacol Exp Ther 366:220–236, July 2018. Copyright © 2018 by The American Society for Pharmacology and Experimental Therapeutics. Analysis of the Indacaterol-Regulated Transcriptome in Human Airway Epithelial Cells Implicates Gene Expression Changes in the Adverse and Therapeutic Effects of β2-Adrenoceptor Agonists. Dong Yan, Omar Hamed, Taruna Joshi, Mahmoud M. Mostafa, Kyla C. Jamieson, Radhika Joshi, Robert Newton, and Mark A. Giembycz. Departments of Physiology and Pharmacology (D.Y., O.H., T.J., K.C.J., R.J., M.A.G.) and Cell Biology and Anatomy (M.M.M., R.N.), Snyder Institute for Chronic Diseases, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada. Received March 22, 2018; accepted April 11, 2018. ABSTRACT: The contribution of gene expression changes to the adverse and therapeutic effects of β2-adrenoceptor agonists in asthma was investigated using human airway epithelial cells as a therapeutically relevant target. Operational model-fitting established that the long-acting β2-adrenoceptor agonists (LABA) indacaterol, salmeterol, formoterol, and picumeterol were full agonists on BEAS-2B cells transfected with a cAMP-response element reporter but differed in efficacy (indacaterol ≥ formoterol …). … activity, and positive regulation of neutrophil chemotaxis. The general enriched GO term extracellular space was also associated with indacaterol-induced genes, and many of those, including CRISPLD2, DMBT1, GAS1, and SOCS3, have putative anti-inflammatory, antibacterial, and/or antiviral activity. Numerous indacaterol-regulated genes were also induced or repressed in BEAS-2B cells and human primary bronchial epithelial cells by …
  • Diana Maria De Figueiredo Pinto, Molecular Markers for the…
    University of Aveiro, Department of Chemistry, 2015. Diana Maria de Figueiredo Pinto. Molecular markers for Diabetic Nephropathy. Dissertation presented to the University of Aveiro in fulfillment of the requirements for the degree of Master in Biochemistry, Clinical Biochemistry branch, carried out under the scientific supervision of Dr. Maria Conceição Venâncio Egas, researcher at the Center for Neuroscience and Cell Biology of the University of Coimbra, and Dr. Rita Maria Pinho Ferreira, assistant professor in the Department of Chemistry of the University of Aveiro. This work was carried out under the COMPETE program, through the project DoIT – Desenvolvimento e Operacionalização da Investigação de Translação (Development and Operationalization of Translational Research), ref: FCOMP-01-0202-FEDER-013853. The jury: President, Prof. Francisco Manuel Lemos Amado, associate professor in the Department of Chemistry of the University of Aveiro; Dr. Maria do Rosário Pires Maia Neves Almeida, researcher at the Center for Neuroscience and Cell Biology of the University of Coimbra; Dr. Maria Conceição Venâncio Egas, researcher at the Center for Neuroscience and Cell Biology of the University of Coimbra. Acknowledgements: First of all, I want to express my gratitude to Dr. Conceição Egas, supervisor of this dissertation, for her support, her words of encouragement, and the availability she showed throughout all the stages that led to the completion of this work. Thank you for the knowledge you passed on, which contributed so much to raising my scientific knowledge, and for the opportunity to join your research group. Your support and suggestions were decisive for carrying out this study.
  • Content Based Search in Gene Expression Databases and a Meta-Analysis of Host Responses to Infection
    A Thesis Submitted to the Faculty of Drexel University by Francis X. Bell in partial fulfillment of the requirements for the degree of Doctor of Philosophy, November 2015. © Copyright 2015 Francis X. Bell. All Rights Reserved. Acknowledgments: I would like to acknowledge and thank my advisor, Dr. Ahmet Sacan. Without his advice, support, and patience I would not have been able to accomplish all that I have. I would also like to thank my committee members and the Biomed Faculty that have guided me. I would like to give special thanks to the members of the bioinformatics lab, in particular the members of the Sacan lab: Rehman Qureshi, Daisy Heng Yang, April Chunyu Zhao, and Yiqian Zhou. Thank you for creating a pleasant and friendly environment in the lab. I give the members of my family my sincerest gratitude for all that they have done for me. I cannot begin to repay my parents for their sacrifices. I am eternally grateful for everything they have done. The support of my sisters and their encouragement gave me the strength to persevere to the end. Table of Contents: List of Tables; List of Figures; Abstract; 1. A Brief Introduction to Gene Expression; 1.1 Central Dogma of Molecular Biology; 1.1.1 Basic Transfers; 1.1.2 Uncommon Transfers; 1.2 Gene Expression; 1.2.1 Estimating Gene Expression; 1.2.2 DNA Microarrays
  • Development of a Hepatitis C Virus Knowledgebase with Computational Prediction of Functional Hypothesis of Therapeutic Relevance
    KOJO KWOFIE SAMUEL. Thesis presented in fulfillment of the requirements for the Degree of Doctor Philosophiae at the South African National Bioinformatics Institute, Faculty of Natural Sciences, University of the Western Cape, May 2011. Supervisor: Prof. Vladimir Bajic. Co-supervisor: Prof. Alan Christoffels. Keywords: Association; Biomedical concepts; Database; Dictionaries; Hepatitis C Virus; Hepatocellular carcinoma; Hypothesis generation; Protein-protein interactions; Text mining. Abstract: Ameliorating Hepatitis C Virus (HCV) therapeutic and diagnostic challenges requires robust intervention strategies, including approaches that leverage the plethora of rich data published in the biomedical literature to gain a greater understanding of HCV pathobiological mechanisms. The multitudes of metadata originating from HCV clinical trials, as well as from low- and high-throughput experiments, embedded in text corpora can be mined as data sources for the implementation of HCV-specific resources. HCV-customized resources may support the generation of worthy and testable hypotheses and reveal potential research clues to augment the pursuit of efficient diagnostic biomarkers and therapeutic targets. This research thesis reports the development of two freely available HCV-specific web-based resources: (i) Dragon Exploratory System on Hepatitis C Virus (DESHCV), accessible via http://apps.sanbi.ac.za/DESHCV/ or http://cbrc.kaust.edu.sa/deshcv/, and (ii) Hepatitis C Virus Protein Interaction Database (HCVpro), accessible via http://apps.sanbi.ac.za/hcvpro/ or http://cbrc.kaust.edu.sa/hcvpro/. DESHCV is a text mining system implemented using named concept recognition and co-occurrence based approaches to computationally analyze about 32,000 HCV-related abstracts obtained from PubMed.
  • Downloaded Definitions for 4,716 Human and Murine (Mammalian) Pathways
    bioRxiv preprint doi: https://doi.org/10.1101/041244; this version posted February 24, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. A Powerful Procedure for Pathway-based Meta-Analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations. Han Zhang¹, William Wheeler², Paula L Hyland¹, Yifan Yang³, Jianxin Shi¹, Nilanjan Chatterjee⁴*, Kai Yu¹*. ¹ Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD 20892, USA; ² Information Management Services Inc., Calverton, MD 20904, USA; ³ Department of Statistics, University of Kentucky, Lexington, KY 40508, USA; ⁴ Department of Biostatistics, Bloomberg School of Public Health and Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA. * Corresponding Authors: Kai Yu ([email protected]); Nilanjan Chatterjee ([email protected]). Abstract: Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data.
  • (12) Patent Application Publication (10) Pub. No.: US 2012/0015839 A1 Chinnaiyan (43) Pub
    (19) United States. (12) Patent Application Publication. (10) Pub. No.: US 2012/0015839 A1, Chinnaiyan. (43) Pub. Date: Jan. 19, 2012. (54) RECURRENT GENE FUSIONS IN CANCER. (75) Inventor: Arul M. Chinnaiyan, Plymouth, MI (US). (73) Assignee: THE REGENTS OF THE UNIVERSITY OF MICHIGAN, Ann Arbor, MI (US). (21) Appl. No.: 13/145,067. (22) PCT Filed: Jan. 8, 2010. (86) PCT No.: PCT/US2010/020501. § 371 (c)(1), (2), (4) Date: Sep. 30, 2011. Related U.S. Application Data: (60) Provisional application No. 61/143,598, filed on Jan. 9, 2009; provisional application No. 61/187,776, filed on Jun. 17, 2009. Publication Classification: (51) Int. Cl. C12Q 1/68 (2006.01); C40B 30/04 (2006.01). (52) U.S. Cl.: 506/9; 435/6.14; 435/6.11. (57) ABSTRACT: The present invention relates to compositions and methods for cancer diagnosis, research and therapy, including but not limited to, cancer markers. In particular, the present invention relates to recurrent gene fusions as diagnostic markers and clinical targets for cancer (e.g., prostate cancer). [FIGURE 1 and FIGURE 2, drawing sheets 1 and 2 of 32, US 2012/0015839 A1, Jan. 19, 2012]
  • Doctoral Dissertation by Yongsheng Huang
    Integrative Statistical Learning with Applications to Predicting Features of Diseases and Health, by Yongsheng Huang. A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Bioinformatics) in The University of Michigan, 2011. Doctoral Committee: Professor Alfred O. Hero III, Co-Chair; Professor Jay L. Hess, Co-Chair; Professor Daniel Burns Jr.; Professor Gilbert S. Omenn; Associate Professor Kerby Shedden. © Yongsheng Huang 2011. All Rights Reserved. I dedicate this dissertation to my parents and my sisters. It is their unconditional love that gave me the courage and perseverance to continue on this long and winding road towards personal and professional improvement. For so many years, they have quietly and patiently waited for me to grow up. I dedicate this work to my true friend Jiehua Guo, who is like a brother to me and helped me tremendously at many critical moments along this journey. I also dedicate my dissertation to the University of Michigan for granting me the privilege of its invaluable educational resources. The time I spent studying here will always be one of the most significant parts of my life. ACKNOWLEDGEMENTS: This dissertation would not have been even remotely possible without the guidance of Professor Alfred Hero, period. Professor Hero brought me into the world of mathematical statistics and taught me the true meaning of statistical thinking. He often worked with me late into the night and early in the morning, going through each analysis that I performed and every sentence of the manuscripts that I wrote. He demonstrated dedication and a rigorous attitude towards science.
  • Genetic Distance as an Alternative to Physical Distance for Definition of Gene Units in Association Studies
    Rodriguez-Fontenla et al. BMC Genomics 2014, 15:408. http://www.biomedcentral.com/1471-2164/15/408. METHODOLOGY ARTICLE, Open Access. Genetic distance as an alternative to physical distance for definition of gene units in association studies. Cristina Rodriguez-Fontenla, Manuel Calaza and Antonio Gonzalez*. Abstract. Background: Some association studies, such as those implemented in VEGAS, ALIGATOR, i-GSEA4GWAS, GSA-SNP and other software tools, use genes as the unit of analysis. These genes include the coding sequence plus flanking sequences. Polymorphisms in the flanking sequences are of interest because they involve cis-regulatory elements or they inform on untyped genetic variants through linkage disequilibrium. Gene extensions have customarily been defined as ± 50 Kb. This approach is not fully satisfactory because genetic relationships between neighbouring sequences are a function of genetic distances, which are only poorly replaced by physical distances. Results: Standardized recombination rates (SRR) from the deCODE recombination map were used as units of genetic distance. We searched for an SRR producing flanking sequences near the ± 50 Kb offset that has been common in previous studies. An SRR ≥ 2 was selected because it led to gene extensions with median length = 45.3 Kb and had the simplicity of an integer value. As expected, boundaries of the genes defined with the ± 50 Kb and with the SRR ≥ 2 rules were rarely concordant. The impact of these differences was illustrated with the interpretation of top association signals from two large studies, including many hits and their detailed analysis based on different criteria. The definition based on genetic distance was more concordant with the results of these studies than the one based on physical distance.
  • Reassessing Colugo Phylogeny, Taxonomy, and Biogeography
    REASSESSING COLUGO PHYLOGENY, TAXONOMY, AND BIOGEOGRAPHY BY GENOME WIDE COMPARISONS AND DNA CAPTURE HYBRIDIZATION FROM MUSEUM SPECIMENS. A Dissertation by VICTOR CHRISTIAN MASON, Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY. Chair of Committee: William J Murphy; Committee Members: Kristofer M Helgen, Paul Samollow, Tiffani Williams, James Cai; Head of Department: Dorothy Shippen. May 2016. Major Subject: Genetics. Copyright 2016 Victor Christian Mason. ABSTRACT: The ability to uncover the phylogenetic history of archived museum material with molecular techniques has rapidly improved due to the reduced cost and increased sequence capacity of next-generation sequencing technologies. However, it remains difficult to isolate large, orthologous DNA regions across multiple divergent species. Here we describe the use of cross-species DNA capture hybridization techniques and next-generation sequencing to selectively isolate and sequence mitochondrial DNA genomes and nuclear DNA from the degraded DNA of museum specimens, using probes generated from the DNA of an extant species. Colugos are among the most poorly understood of all living mammals despite their central role in our understanding of higher-level primate relationships. Two described species of these extreme gliders are the sole living members of a unique mammalian order, Dermoptera, distributed throughout Southeast Asia. We generated a draft genome sequence for a Sunda colugo and a reference alignment for the Philippine colugo, and used these to identify colugo-specific enrichment in sensory and musculoskeletal related genes that likely underlie their nocturnal and gliding adaptations. Phylogenomic analysis and catalogs of rare genomic changes overwhelmingly support the hypothesis that colugos are the sister group to primates (Primatomorpha), to the exclusion of treeshrews.
  • Identification and Characterization of Interferon Induced Ubiquitinated
    Dissertation for the academic degree of Doctor rerum naturalium (Dr. rer. nat.) in Biology: "Identification and Characterization of Interferon-γ Induced Ubiquitinated Newly Synthesized Proteins". Submitted to the Faculty of Mathematics and Natural Sciences I, Humboldt-Universität zu Berlin, by Diplom-Chemikerin (graduate chemist) Anne Wiemhoefer. President of Humboldt-Universität zu Berlin: Prof. Dr. Jan-Hendrik Olbertz. Dean of the Faculty of Mathematics and Natural Sciences I: Prof. Dr. Andreas Herrmann. Reviewers: 1. Prof. Dr. Peter-Michael Kloetzel; 2. Prof. Dr. Wolfgang Lockau; 3. Prof. Dr. Wolfgang Dubiel. Date of the oral examination: 29 June 2011. Table of Contents: Summary (German); Abstract; 1 Introduction; 1.1 Ubiquitination; 1.1.1 Ubiquitin; 1.1.2 Process of ubiquitination; 1.1.3 Endoplasmic reticulum associated degradation system; 1.1.4 Ubiquitin chains and functions; 1.2 Process of deubiquitination