Exploring Genomic Medicine Using Integrative Biology

Exploring Genomic Medicine Using Integrative Biology by Atul Janardhan Butte B.A. Computer Science Brown University, 1991 M.D. Brown University Medical School, 1995 SUBMITTED TO THE HARVARD-MIT DIVISION OF HEALTH SCIENCES AND TECHNOLOGY IN PARTIAL FULFILLMENT OF THE REQURIEMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN HEALTH SCIENCES AND TECHNOLOGY AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY JUNE 2004 © 2004 Massachusetts Institute of Technology. All rights reserved. I - Signature of Author: ··L Harvard-MIT Division of Health Sciences and Technology May 12, 2004 / ( Certified by: Isaac Kohane, M.D., Ph.D. Associate Professor of Pediatrics, Harvard Medical School Henderson Professor of Health Sciences and Technology Harvard-MIT Division of Health Sciences and Technology Thesis Supervisor . Accepted by: Ivlarna L. Gray, rn.u. Edward Hood Taplin Professor of Medical ind Electrical Engineering Co-Director, Harvard-MIT Division of HealtH Sciences and Technology "- ' ., ARCHIVE! MASSACHUSETS INSTiTU S OF TECHNOLOGY I SEP 2 2005 LIBRARIES %I '.t. Exploring Genomic Medicine Using Integrative Biology by Atul Janardhan Butte SUBMITTED TO THE HARVARD-MIT DIVISION OF HEALTH SCIENCES AND TECHNOLOGY IN PARTIAL FULFILLMENT OF THE REQURIEMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN HEALTH SCIENCES AND TECHNOLOGY AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY ABSTRACT Instead of focusing on the cell, or the genotype, or on any single measurement modality, using integrative biology allows us to think holistically and horizontally. A disease like diabetes can lead to myocardial infarction, nephropathy, and neuropathy; to study diabetes in genomic medicine would require reasoning from a disease to all its various complications to the genome and back. I am studying the process of intersecting nearly-comprehensive data sets in molecular biology, across three representative modalities (microarrays, RNAi and quantitative trait loci) out of the more than 30 available today. This is difficult because the semantics and context of each experiment performed becomes more important, necessitating a detailed knowledge about the biological domain. I addressed this problem by using all public microarray data from NIH, unifying 50 million expression measurements with standard gene identifiers and representing the experimental context of each using the Unified Medical Language System, a vocabulary of over 1 million concepts. I created an automated system to join data sets related by experimental context. I evaluated this system by finding genes significantly involved in multiple experiments directly and indirectly related to diabetes and adipogenesis and found genes known to be involved in these diseases and processes. As a model first step into integrative biology, I then took known quantitative trait loci in the rat involved in glucose metabolism and build an expert system to explain possible biological mechanisms for these genetic data using the modeled genomic data. The system I have created can link diseases from the ICD-9 billing code level down to the genetic, genomic, and molecular level. In a sense, this is the first automated system built to study the new field of genomic medicine. 3 Biographical Note Atul Butte is currently on staff in the Children's Hospital Informatics Program, is a practicing pediatric endocrinologist at Children's Hospital, Boston, and is an Instructor at Harvard Medical School. Dr. Butte received his undergraduate degree in Computer Science from Brown University in 1991, and worked in several stints as a software engineer at Apple Computer (on the System 7 team) and Microsoft Corporation (on the Excel team). He graduated from the Brown University School of Medicine in 1995, during which he worked as a research fellow at NIDDK through the Howard Hughes/NIH Research Scholars Program. He completed his residency in Pediatrics and Fellowship in Pediatric Endocrinology in 2001, both at Children's Hospital, Boston. Dr. Butte has authored 25 publications in bioinformatics, medical informatics, and molecular diabetes and has delivered more than 30 presentations world-wide on bioinformatics, including four at the National Institutes of Health. During his research work under Dr. Isaac Kohane, he developed a novel methodology for analyzing large data sets of RNA expression, called Relevance Networks. This technique was published in the Proceedings of the National Academy of Science (2000, 97:12182). Dr. Butte's recent awards include the 2003 Emory University School of Medicine, Pathology Residents' Choice Award, 2002 American Association for Clinical Chemistry Outstanding Speaker Award, 2002 Endocrine Society Travel Award based on presentation merit, 2001 American Association for Cancer Research Scholar-In-Training Award and the 2001 Lawson Wilkins Pediatric Endocrine Society Clinical Scholar Award. Dr. Butte's research is supported by grants from NCI, NIDDK, NHLBI, NINDS, NIAID, NLM, the Endocrine Fellows Foundation, the Genentech Center for Clinical Research and Education, the Lawson Wilkins Pediatric Endocrinology Society, and Merck. Along with Isaac Kohane and Alvin Kho, Dr. Butte has co-authored one of the first books on microarray analysis titled "Microarrays for an Integrative Genomics" published by MIT Press. 4 Acknowledgements More than 10 years ago, I first heard of Dr. Isaac Kohane while I interviewed at Children's Hospital for a residency position. I knew immediately that I had to work with him. Six years ago, he introduced me to the new field of functional genomics as he handed me one of the first microarray data sets. My life has forever changed. Dr. Isaac Kohane, my mentor and friend, I cannot thank you enough. Six years ago, I met Dr. Peter Szolovits as I enrolled as a student at MIT. Dr. Szolovits introduced me to courses at MIT that first inspired me to apply data modeling techniques to genomic data, which led to my Masters Degree in Medical Informatics. He has always encouraged discipline in my writing. Dr. Szolovits will always be a prominent role-model for me, as my work increasingly spans both clinical and bioinformatics. My work has forever changed. Dr. Szolovits, I cannot thank you enough. More than 10 years ago, I heard of the incredible work in the field of insulin receptor signal transduction being accomplished by Dr. C. Ronald Kahn at the Joslin Diabetes Center. I am still amazed that I continue to have the opportunity to benefit from his wisdom, knowledge, guidance, and mentorship. Because of Dr. Kahn, I now have the ability to focus my bioinformatics skills and ideas on the growing problem of diabetes mellitus. My career has forever changed. Dr. Kahn, I cannot thank you enough. My wife, Dr. Tarangini Deshpande, entered my life 41/2years ago. I cannot imagine life without her. Beyond absorbing my role and responsibilities in the home while this dissertation was being written, she continues to inspire my life and my work, from the most theoretical to the deepest technical levels. In all ways, I could not have focused on this thesis without her. I cannot thank her enough. A little girl entered my life 16 months ago. Her name, Kimayani, literally means "by a miracle." I apologize to her for the time away that this work required, and I promise to make it up to her starting now. My parents, Janardhan and Mangala Butte, have encouraged my academic development since birth. For heading out at night to purchase new toys or puzzles immediately after I mastered the ones I had as a toddler, to teaching me biology in elementary school, to introducing me to computer programming as a teenager, I cannot thank them enough. 5 My brother, Dr. Manish Butte, who inspires me by pushing himself constantly to learn and develop as a physician scientist. I cannot thank him enough. My inlaws, Sudhir and Neela Deshpande, have been enormously helpful in caring for Kimayani, allowing me to intensely pursue this academic work. Their love and support for me have been immeasurable. I cannot thank them enough. My fellow colleagues, faculty, fellows, students and staff of the Children's Hospital Informatics Program, who have provided an inspirational sounding-board and test-bed for many of my ideas over the past six years. Dr. Joseph Majzoub, for supporting my pursuit of this Ph.D. while I was still a fellow and attending physician in the Division of Pediatric Endocrinology at Children's Hospital. My grandparents, aunts and uncles, and ancestors for their continued blessings. Without financial support from the following institutions, I could not have completed this work: the National Library of Medicine, the National Heart Lung and Blood Institute, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Cancer Institute, the National Institute of Allergy and Infectious Diseases, the Endocrine Fellows Foundation, the Lawson Wilkins Pediatric Endocrinology Society, the Harvard Center for Neurodegeneration and Repair, the Genentech Center for Clinical Research and Education, the Merck / Massachusetts Institute of Technology fellowship, Children's Hospital Division of Endocrinology, and the Harvard-MIT Division of Health Sciences and Technology. 6 1. Introduction Genomic medicine has been defined by Alan Guttmacher and Francis S. Collins as the application of our rapidly expanding knowledge of the human genome to medical practice. 4 We commonly define genomic medicine by the individual experimental modalities available to study the genome. As an example of this, as of this writing, there are over 3,600 publications

Exploring Genomic Medicine Using Integrative Biology

Analysis of Gene Expression Data for Gene Ontology

Identification of Evolutionarily Conserved Md1 Splice Variants

Irs2 and Irs4 Synergize in Non-Leprb Neurons to Control Energy Balance and Glucose Homeostasis#.” Molecular Metabolism 3 (1): 55-63

Serum Concentrations of Afamin Are Elevated in Patients with Polycystic Ovary Syndrome

The Association of Single Nucleotide Polymorphisms in Intronic Regions of Islet Cell Autoantigen 1 and Type 1 Diabetes Mellitus

Comparative Genome Mapping in the Sequence-Based Era: Early Experience with Human Chromosome 7

Antisense Afp Transcripts in Mouse Liver and Their Potential Role in Afp Gene Regulation

Biochemical Mechanisms of Thyroid Hormone Deiodination

The Imprinted DLK1-MEG3 Gene Region on Chromosome 14Q32.2 Alters Susceptibility to Type 1 Diabetes

Mouse Irs4 Knockout Project (CRISPR/Cas9)

REVIEW Iodothyronine Deiodinase Structure and Function

Aneuploidy: Using Genetic Instability to Preserve a Haploid Genome?