Bioinformatics Approaches Towards Facilitating Drug Development
BIOINFORMATICS APPROACHES TOWARDS FACILITATING DRUG DEVELOPMENT

Anna Ying-Wah Lee

Doctor of Philosophy

School of Computer Science
McGill University
Montréal, Québec
August 2010

A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Doctor of Philosophy.

© Anna Ying-Wah Lee, 2010

ACKNOWLEDGEMENTS

I would like to thank my supervisors Mike Hallett and Sarah Jenna, and my collaborator David Thomas, for their valuable guidance and support. Their efforts have taught me a lot about research itself, in addition to science. Given my generally quiet nature, I would especially like to thank them for sending me off to present at various meetings. Those meetings exposed me to a lot of great science and I greatly appreciated the opportunities to attend. I would also like to thank my labmates for making my graduate experience enjoyable both inside and outside of the lab. Members of the David Thomas lab have also been helpful and kind to me over the years. I also thank my family and friends. My parents have been quite supportive considering the fact that they wanted me to become a different kind of doctor. My high school friends have always supported me by encouraging me to stay in school. Finally, I thank André Courtemanche, Dr. and Mrs. Milton Leong and NSERC for providing me with fellowships.

ABSTRACT

Drug development is currently a time-consuming, costly and challenging process. The process typically starts with the identification of a therapeutic target for a given disease. A therapeutic target is a biological molecule, and the binding of compounds to target molecules is expected to cause a desired therapeutic effect. That is, target-binding compounds have the potential to become drug candidates. However, many drug candidates fail during clinical trials and, consequently, very few candidates become approved new drugs.
This trend suggests that the early stages of drug development should be improved to provide better drug candidates.

The reasons a drug candidate may fail during clinical trials include unacceptable toxicity and insufficient efficacy observed in humans. These reasons suggest that the assessments of a compound during the early stages of drug development often inaccurately predict the effect of the compound in humans. One of the main goals of systems biology is to accurately predict how a given biological system responds to perturbations, e.g. treatment with a compound. This suggests that systems biology can help address challenges in drug development. However, there are currently gaps in our knowledge of systems. Here we use machine learning techniques to exploit existing systems data towards filling in these gaps. In particular, we developed a method that uses the occurrences of motifs in protein sequences to predict kinase-substrate interactions. We also developed a method that uses gene expression, protein-protein interaction and phenotype data to predict genetic interactions. These predicted interactions can facilitate the identification of potential therapeutic targets. Ultimately, a better selection of therapeutic targets should lead to better drug candidates.

We also address the challenge of developing combinatorial therapies. Although combinatorial therapies are advantageous, the scale of the experiments required to search for desirable chemical combinations is currently prohibitive. We therefore developed a method that uses system response data to predict chemical synergies towards facilitating the development of combinatorial therapies.

Overall, this thesis shows how computational prediction in a systems biology framework can be used to facilitate and expedite the early stages of drug development.
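
As a rough illustration of what "occurrences of motifs in protein sequences" can look like as input to a learning method, the sketch below counts regular-expression motif matches in a sequence and returns them as a fixed-order feature vector. It is a minimal sketch under stated assumptions, not the method developed in this thesis: the two motif patterns, the names `MOTIFS`, `motif_occurrences` and `feature_vector`, and the example sequence are all hypothetical.

```python
import re

# Hypothetical motif patterns, for illustration only (not taken from the thesis).
MOTIFS = {
    "proline_directed": r"[ST]P",  # S or T immediately followed by P
    "basophilic": r"R.[ST]",       # R, any residue, then S or T
}


def motif_occurrences(sequence, motifs=MOTIFS):
    """Count (possibly overlapping) occurrences of each motif in a protein sequence."""
    seq = sequence.upper()
    counts = {}
    for name, pattern in motifs.items():
        # A zero-width lookahead lets overlapping matches be counted separately.
        counts[name] = sum(1 for _ in re.finditer(f"(?={pattern})", seq))
    return counts


def feature_vector(sequence, motifs=MOTIFS):
    """Return motif counts in a fixed order (motif names sorted alphabetically)."""
    counts = motif_occurrences(sequence, motifs)
    return [counts[name] for name in sorted(motifs)]


if __name__ == "__main__":
    example = "MSPTPRRASV"  # made-up sequence
    print(motif_occurrences(example))
    print(feature_vector(example))
```

A vector of this kind could be passed to any standard classifier; the choice of motifs and of learning algorithm is outside the scope of this sketch.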

ABRÉGÉ

Drug development is currently a costly, difficult and time-consuming process. The process generally begins with the identification of a therapeutic target for a specific disease. A therapeutic target is a biological molecule, and the binding of compounds to target molecules is expected to cause a therapeutic effect. Compounds that bind to targets therefore have the potential to become drug candidates. However, many drug candidates tend to fail during clinical trials and, consequently, very few candidates become approved new drugs. This trend suggests that the early stages of drug development must be improved in order to provide drug candidates of better quality.

The reasons a drug candidate may fail during clinical trials include unacceptable toxicity and insufficient efficacy observed in humans. These reasons suggest that the assessments of a compound during the early stages of drug development poorly predict the effect of the compound in humans. One of the main objectives of systems biology is to predict accurately how a biological system responds to perturbations, for example treatment with a compound. This suggests that systems biology can help address the challenges of drug development. However, there are currently gaps in our knowledge of systems. Here, we use machine learning techniques to exploit existing systems information to fill these gaps. In particular, we developed a method that uses occurrences of motifs in protein sequences to predict kinase-substrate interactions. We also developed a method that uses gene expression, protein-protein interactions and phenotype information to predict genetic interactions. These predicted interactions can facilitate the identification of potential therapeutic targets. Ultimately, a better selection of therapeutic targets should lead to drug candidates of better quality.

We also addressed the challenge of developing combinatorial therapies. Although combinatorial therapies are advantageous, the scale of the experiments needed to search for desirable chemical combinations is currently prohibitive. We therefore developed a method that uses system response information to predict chemical synergies, with a view to facilitating the development of combinatorial therapies.

Overall, this thesis shows how computational prediction within a systems biology framework can be used to facilitate and accelerate the early stages of drug development.

KEY TO ABBREVIATIONS

ADME: Absorption, Distribution, Metabolism and Excretion
AIC: Akaike's information criterion
AlvC: alverine citrate
AMCA: actin-myosin contractile apparatus
AMD: amiodarone
ASPM: Abnormal Spindle-like, Microcephaly-associated
ATP: adenosine triphosphate
AU: approximately unbiased
AUC: area under the (ROC) curve
BBD: Bud6p-binding domain
BEN: benomyl
Blebb.: blebbistatin
CASP: caspofungin
CBP: carboplatin
CDK: cyclin-dependent kinase
CFU: colony forming units
CGC: Caenorhabditis Genetic Center
CLSI: Clinical and Laboratory Standards Institute
cmpd: compound
COOH: C-terminal tail region (of Bni1p)
CPT: camptothecin
CPZ: chlorpromazine
CsA: cyclosporin A
DAD: Dia-autoregulatory domain
DAPI: 4',6-diamidino-2-phenylindole
DGC: dystrophin glycoprotein complex
DMD: Duchenne Muscular Dystrophy
DMI: desipramine
DNA: deoxyribonucleic acid
DOXY: doxycycline
Emo: accumulation of endomitotic oocytes
ER: endoplasmic reticulum
FA: formic acid
FCZ: fluconazole
FDA: Food and Drug Administration
FEN: fenpropimorph
FH: Formin-Homology domain
FICI: fractional inhibitory concentration index
FKBP: FK506 binding protein
GAP: GTPase activating protein
GBD: GTPase binding domain
GDI: guanine-nucleotide dissociation inhibitor
GDP: guanosine diphosphate
GEF: guanine-nucleotide exchange factor
GFP: green fluorescent protein
GIN: genetic interaction neighbourhood
GIST: gastrointestinal stromal tumour
GO: Gene Ontology
Gon: severe shortening of the gonads
GST: glutathione S-transferase
GTP: guanosine triphosphate
GW: genome-wide
HIV: human immunodeficiency virus
HOG: high-osmolarity glycerol
HMM: hidden Markov model
HPLC: high performance liquid chromatography
HygB: hygromycin B
IBU: ibuprofen
IPTG: Isopropyl β-D-1-thiogalactopyranoside
KAN: kanamycin
LatA: latrunculin A
LC-MS: liquid chromatography-mass spectrometry
MAPK: mitogen-activated protein kinase
MCC: minimum cytotoxic concentration
MFC: minimum fungicidal concentration
MIC: minimum inhibitory concentration
MLC: myosin light chain
MMS: methyl methanesulfonate
MPA: mycophenolic acid
mRNA: messenger ribonucleic acid
MRSP: mental retardation and synaptic plasticity
MYR: myriocin
NA: not available
NCRR: National Center for Research Resources
ND: not done
NGM: nematode growth media
NIH: National Institutes of Health
NRC: National Research Council of Canada
NYS: nystatin
PAK: p21 activated kinase
PCR: polymerase chain reaction
PhEP: phenotype, gene expression and PP interaction network
PIN: physical interaction neighbourhood
PLZF: promyelocytic leukaemia zinc finger
PP: protein-protein, e.g. PP interaction
PSSM: position-specific scoring matrix
Q-TOF: quadrupole time-of-flight
RNA: ribonucleic acid
RNAi: ribonucleic acid interference
ROC: receiver-operating characteristic curve
RSC: Remodel the Structure of Chromatin
SAGA: