Bioinformatics Approaches Towards Facilitating Drug Development

Anna Ying-Wah Lee
Doctor of Philosophy
School of Computer Science
McGill University
Montréal, Québec
August 2010

A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Doctor of Philosophy.

© Anna Ying-Wah Lee, 2010

ACKNOWLEDGEMENTS

I would like to thank my supervisors Mike Hallett and Sarah Jenna, and my collaborator David Thomas, for their valuable guidance and support. Their efforts have taught me a lot about research itself, in addition to science. Despite my generally quiet nature, I would like to thank them for sending me off to present at various meetings. Those meetings exposed me to a lot of great science, and I greatly appreciated the opportunities to attend. I would also like to thank my labmates for making my graduate experience enjoyable both in and outside of the lab. Members of the David Thomas lab have also been helpful and kind to me over the years.

I also thank my family and friends. My parents have been quite supportive, considering that they wanted me to become a different kind of doctor. My high school friends have always supported me by encouraging me to stay in school. Finally, I thank André Courtemanche, Dr. and Mrs. Milton Leong, and NSERC for providing me with fellowships.

ABSTRACT

Drug development is currently a time-consuming, costly and challenging process. The process typically starts with the identification of a therapeutic target for a given disease. A therapeutic target is a biological molecule whose binding by compounds is expected to cause a desired therapeutic effect; that is, target-binding compounds have the potential to become drug candidates. However, many drug candidates tend to fail during clinical trials, and consequently very few candidates become approved new drugs.
This trend suggests that the early stages of drug development should be improved to provide better drug candidates. Drug candidates may fail during clinical trials because of unacceptable toxicity or insufficient efficacy observed in humans, which suggests that the assessments made of a compound during the early stages of drug development often predict its effect in humans inaccurately. One of the main goals of systems biology is to accurately predict how a given biological system responds to perturbations, e.g. treatment with a compound, so systems biology can help address these challenges in drug development. However, there are currently gaps in our knowledge of biological systems. Here we use machine learning techniques to exploit existing systems data to help fill these gaps. In particular, we developed a method that uses the occurrences of motifs in protein sequences to predict kinase-substrate interactions. We also developed a method that uses gene expression, protein-protein interaction and phenotype data to predict genetic interactions. These predicted interactions can facilitate the identification of potential therapeutic targets, and a better selection of therapeutic targets should ultimately lead to better drug candidates.

We also address the challenge of developing combinatorial therapies. Although combinatorial therapies are advantageous, the scale of the experiments required to search for desirable chemical combinations is currently prohibitive. We therefore developed a method that uses system response data to predict chemical synergies, towards facilitating the development of combinatorial therapies.

Overall, this thesis shows how computational prediction in a systems biology framework can be used to facilitate and expedite the early stages of drug development.

ABRÉGÉ

Drug development is currently a costly, difficult and time-consuming process.
The process generally begins with the identification of a therapeutic target for a specific disease. A therapeutic target is a biological molecule, and the binding of compounds to target molecules is expected to cause a therapeutic effect. Compounds that bind the targets therefore have the potential to become drug candidates. However, many drug candidates tend to fail during clinical trials, and as a result very few candidates become approved new drugs. This trend suggests that the early stages of drug development must be improved in order to provide better-quality drug candidates. The reasons a drug candidate may fail during clinical trials include unacceptable toxicity and insufficient efficacy observed in humans. These reasons suggest that assessments of a compound during the early stages of drug development poorly predict the effect of the compound in humans. One of the main goals of systems biology is to predict accurately how a biological system responds to perturbations, for example, treatment with a compound. This suggests that systems biology can help address the challenges of drug development. However, there are currently gaps in our knowledge of systems. Here, we use machine learning techniques to exploit existing systems information to fill these gaps. In particular, we developed a method that uses occurrences of motifs in protein sequences to predict kinase-substrate interactions. We also developed a method that uses gene expression, protein-protein interaction and phenotype information to predict genetic interactions. These predicted interactions can facilitate the identification of potential therapeutic targets.
Ultimately, a better selection of therapeutic targets should lead to better-quality drug candidates.

We also addressed the challenge of developing combinatorial therapies. Although combinatorial therapies are advantageous, the scale of the experiments needed to search for desirable chemical combinations is currently prohibitive. We therefore developed a method that uses system response information to predict chemical synergies in order to facilitate the development of combinatorial therapies.

Overall, this thesis shows how computational prediction within a systems biology framework can be used to facilitate and accelerate the early stages of drug development.

KEY TO ABBREVIATIONS

ADME: Absorption, Distribution, Metabolism and Excretion
AIC: Akaike's information criterion
AlvC: alverine citrate
AMCA: actin-myosin contractile apparatus
AMD: amiodarone
ASPM: Abnormal Spindle-like, Microcephaly-associated
ATP: adenosine triphosphate
AU: approximately unbiased
AUC: area under the (ROC) curve
BBD: Bud6p-binding domain
BEN: benomyl
Blebb.: blebbistatin
CASP: caspofungin
CBP: carboplatin
CDK: cyclin-dependent kinase
CFU: colony forming units
CGC: Caenorhabditis Genetic Center
CLSI: Clinical and Laboratory Standards Institute
cmpd: compound
COOH: C-terminal tail region (of Bni1p)
CPT: camptothecin
CPZ: chlorpromazine
CsA: cyclosporin A
DAD: Dia-autoregulatory domain
DAPI: 4',6-diamidino-2-phenylindole
DGC: dystrophin glycoprotein complex
DMD: Duchenne Muscular Dystrophy
DMI: desipramine
DNA: deoxyribonucleic acid
DOXY: doxycycline
Emo: accumulation of endomitotic oocytes
ER: endoplasmic reticulum
FA: formic acid
FCZ: fluconazole
FDA: Food and Drug Administration
FEN: fenpropimorph
FH: Formin-Homology domain
FICI: fractional inhibitory concentration index
FKBP: FK506 binding protein
GAP: GTPase activating protein
GBD: GTPase binding domain
GDI: guanine-nucleotide dissociation inhibitor
GDP: guanosine diphosphate
GEF: guanine-nucleotide exchange factor
GFP: green fluorescent protein
GIN: genetic interaction neighbourhood
GIST: gastrointestinal stromal tumour
GO: Gene Ontology
Gon: severe shortening of the gonads
GST: glutathione S-transferase
GTP: guanosine triphosphate
GW: genome-wide
HIV: human immunodeficiency virus
HOG: high-osmolarity glycerol
HMM: hidden Markov model
HPLC: high performance liquid chromatography
HygB: hygromycin B
IBU: ibuprofen
IPTG: isopropyl β-D-1-thiogalactopyranoside
KAN: kanamycin
LatA: latrunculin A
LC-MS: liquid chromatography-mass spectrometry
MAPK: mitogen-activated protein kinase
MCC: minimum cytotoxic concentration
MFC: minimum fungicidal concentration
MIC: minimum inhibitory concentration
MLC: myosin light chain
MMS: methyl methanesulfonate
MPA: mycophenolic acid
mRNA: messenger ribonucleic acid
MRSP: mental retardation and synaptic plasticity
MYR: myriocin
NA: not available
NCRR: National Center for Research Resources
ND: not done
NGM: nematode growth media
NIH: National Institutes of Health
NRC: National Research Council of Canada
NYS: nystatin
PAK: p21 activated kinase
PCR: polymerase chain reaction
PhEP: phenotype, gene expression and PP interaction network
PIN: physical interaction neighbourhood
PLZF: promyelocytic leukaemia zinc finger
PP: protein-protein, e.g. PP interaction
PSSM: position-specific scoring matrix
Q-TOP: quadrupole time-of-flight
RNA: ribonucleic acid
RNAi: ribonucleic acid interference
ROC: receiver-operating characteristic curve
RSC: Remodel the Structure of Chromatin
SAGA:
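The abstract describes predicting kinase-substrate interactions from the occurrences of motifs in protein sequences. One common way to score how well a stretch of sequence matches a motif is a position-specific scoring matrix (PSSM, listed among the abbreviations). The sketch below is a generic illustration under stated assumptions, not the method developed in this thesis: it assumes equal-length, pre-aligned phosphorylation-site peptides over the 20 standard amino acids, a uniform background frequency of 1/20, and a pseudocount of 1. The function names `build_pssm`, `scan` and `best_hit`, and the toy sites, are hypothetical.

```python
import math

# Assumption: only the 20 standard amino acids are scored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sites, pseudocount=1.0):
    """Build a log-odds PSSM (log2 of observed / background frequency).

    Generic sketch, not the thesis's published method. Assumes all sites have
    the same length, use standard residues, and a uniform 1/20 background.
    """
    if not aligned_sites:
        raise ValueError("need at least one aligned site")
    width = len(aligned_sites[0])
    for site in aligned_sites:
        if len(site) != width or any(aa not in AMINO_ACIDS for aa in site):
            raise ValueError("sites must share one length and use standard residues")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(width):
        # Start every residue at the pseudocount so no frequency is zero.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for site in aligned_sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm


def scan(sequence, pssm):
    """Score every window of the PSSM's width; return (start, score) pairs."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(aa not in AMINO_ACIDS for aa in window):
            continue  # skip windows containing non-standard residues (e.g. X)
        score = sum(column[aa] for column, aa in zip(pssm, window))
        hits.append((start, score))
    return hits


def best_hit(sequence, pssm):
    """Return the highest-scoring (start, score) window, or None if none scored."""
    return max(scan(sequence, pssm), key=lambda hit: hit[1], default=None)


if __name__ == "__main__":
    # Hypothetical toy sites, invented for illustration only.
    sites = ["RRASL", "RRGSV", "KRASI", "RRPSL"]
    pssm = build_pssm(sites)
    print(best_hit("MGDEEAKRRASLWPQ", pssm))
```

With these toy sites, the best window in `MGDEEAKRRASLWPQ` starts at index 7 (the `RRASL` stretch). A real predictor would also need a score threshold or a trained classifier on top of such scores; that step is deliberately left out here.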

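Predicting chemical synergies, as described in the abstract, presupposes a measured notion of synergy that predictions can be checked against. One widely used measure from checkerboard assays is the fractional inhibitory concentration index (FICI), built from minimum inhibitory concentrations (MIC); both abbreviations appear in the key above. It is defined as FICI = MIC_A(in combination) / MIC_A(alone) + MIC_B(in combination) / MIC_B(alone). The sketch below computes the index and applies one commonly used set of cutoffs (FICI ≤ 0.5 synergy, FICI > 4 antagonism, otherwise no interaction). Whether the thesis uses exactly these cutoffs is an assumption, and the function names `fici` and `classify_interaction` are hypothetical.

```python
def fici(mic_a_alone, mic_b_alone, mic_a_combo, mic_b_combo):
    """Fractional inhibitory concentration index for a pair of compounds.

    mic_*_alone: MIC of each compound tested on its own.
    mic_*_combo: concentration of each compound in the inhibitory combination.
    """
    values = (mic_a_alone, mic_b_alone, mic_a_combo, mic_b_combo)
    if any(v <= 0 for v in values):
        raise ValueError("concentrations must be positive")
    # FIC_A + FIC_B: fraction of each compound's solo MIC needed in combination.
    return mic_a_combo / mic_a_alone + mic_b_combo / mic_b_alone


def classify_interaction(index, synergy_cutoff=0.5, antagonism_cutoff=4.0):
    """Label a FICI value using commonly used (assumed) cutoffs."""
    if index <= synergy_cutoff:
        return "synergy"
    if index > antagonism_cutoff:
        return "antagonism"
    return "no interaction"


if __name__ == "__main__":
    # Hypothetical MICs (e.g. in µg/mL), invented for illustration.
    index = fici(mic_a_alone=8.0, mic_b_alone=16.0, mic_a_combo=1.0, mic_b_combo=4.0)
    print(index, classify_interaction(index))  # prints: 0.375 synergy
```

Each fractional term is the share of a compound's solo MIC that is still needed when it is paired with the other compound, so an index well below 1 means both compounds inhibit growth at much lower doses together than alone.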