Consumer Genomics to Genomic Medicine: Role of Discrete Algorithms


Laxmi Parida
IBM Computational Biology Center, IBM TJ Watson Research Center, New York, USA

Consumer Genomics
• www.ibm.com/genographic
• www.nationalgeographic.com/genographic
• Delivering genomic results directly to consumers
• Non-medical applications
• Over 450,000 public participants
• 100,000 participants thru PI

Migratory Map Representation

Predicting the past - Andreas Dress
• Recombining loci: ARG (ancestral recombinations graphs), Griffiths and Marjoram, 1996
• Neighbor-joining method; tree courtesy Naruya Saitou, 2002

Quantify what can be inferred?
• Recombining loci: ?? vs. 100 %
• ARG (ancestral recombinations graphs), Griffiths and Marjoram, 1996
• Neighbor-joining method; tree courtesy Naruya Saitou, 2002

The Origin of Population Genomics
• Brilliant Blunders
• 1859, 1860

Mathematical Population Genetics
• Darwinian selection & Mendelian genetics are actually complementary (Sewall Wright, Ronald Fisher, JBS Haldane)
• Population Genetics: “deals with (statistical) analysis of the inheritance & prevalence of genes in populations”
• Neutral Theory (Kimura, King, Jukes), 1968-80:
  - Most of the observed genetic variation is selectively neutral
  - Importance of stochastic factors by random genetic drift
• Kingman’s coalescence, 1970-80s: retrospective coalescent
  “explicitized the tree”
• 1990s: availability of genomes
• Population Genetics => Genomics

Genome Variations
• Human Genome Project (completed 2000; 2003), genomics.energy.gov: more variations than expected
• International SNP Consortium (launched 2000) and The HapMap Project (2003), www.hapmap.org; snp.cshl.org: SNP variations
• 1000 Genomes Project (launched June 2008), www.1000genomes.org: deep catalog of human genetic variation
• Personal Genome Project, www.personalgenomes.org
Nothing in Biology Makes Sense Except in the Light of Evolution. -- Theodosius Dobzhansky

Retrospective: Coalescence
(track flow of ancestral genetic material for the extants only)
• Wright-Fisher population:
  1. Constant population
  2. Non-overlapping generations
  3. Panmictic
• LCA / MRCA. LCA:
  1. Common ancestor (CA)
  2. Least among CAs
• Single parent/node vs. multiple parents/node
• The Phylogeny
[Figure: genealogy drawn between past and present]

Bound the infinite structure: root
• MRCA/GMRCA: [Grand] Most Recent Common Ancestor

Motivation question: Is ARG even reconstructible? How much?
• JCB 10; BMC Bioinformatics 11

The Random Graphs Framework
Forbidden Structure
Graph definition
• Infinite number of vertices arranged in finite-sized rows
• Edges introduced via a random process across immediate rows
Probability space of such graphs
• Space is non-enumerable (no bijective map to natural numbers)
• Uniform probability measure

Question: LCA?
• Ancestor without ancestry paradox

Annotated graph
• Mendelian model

Annotated Edges
• Retrospective Coalescence
• Marginal Genealogies

Illustration of Theorem & edge annotations
[Figure: edges annotated with sets such as {1, 2, 3}]

Topological Definition of MRCA: Least Common Ancestor with Ancestry (LCAA)

Random Graphs: Probability Space
• Space is non-enumerable (no bijective map to natural numbers)
• Uniform probability measure
• Probability of some event F(h) for a fixed depth, h, & take limit:

Minimal Descriptor
• Non-redundant core
L Parida, P F Palamara, A Javed, A minimal descriptor of an ancestral recombinations graph, BMC Bioinformatics, 2011.

Minimal Descriptor of an ARG
1. Structure-preserving (marginal genealogies are identical to that of G)
2. Samples-preserving (genetic variation patterns in the samples are identical to that of G)
[Figure labels: t-coalescent node, t-coalescent node]

Minimal descriptor (MD) results
• Theorem 1: An unbounded ARG always has a bounded minimal descriptor.
• Theorem 2: The reduced MD ARG is unique.
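
The row-structured random graph and the Wright-Fisher assumptions listed above (constant population, non-overlapping generations, panmictic) can be sketched as a small simulation. This is only an illustration under stated assumptions, not the construction used in the cited papers: the function names, the `parents_per_node` parameter and the fixed seed are hypothetical, and the check is pure graph reachability. It does not track the flow of genetic material, which is the distinction the LCAA definition is about.

```python
import random


def sample_parents(n, parents_per_node, rng):
    """One row of edges: each of the n vertices picks its parents
    uniformly at random from the n vertices of the row above."""
    return [tuple(rng.randrange(n) for _ in range(parents_per_node))
            for _ in range(n)]


def depth_of_common_ancestor(n, samples, parents_per_node=1,
                             max_depth=10_000, seed=0):
    """Walk back row by row until some vertex is an ancestor of every sample.

    Returns the number of rows above the present at which this first
    happens, or None if it does not happen within max_depth rows.
    """
    rng = random.Random(seed)
    # reached_by[v] = set of sampled extant units that have v as an ancestor
    reached_by = {v: {v} for v in samples}
    target = set(samples)
    for depth in range(1, max_depth + 1):
        row = sample_parents(n, parents_per_node, rng)
        above = {}
        for child, reached in reached_by.items():
            for parent in row[child]:
                above.setdefault(parent, set()).update(reached)
        reached_by = above
        if any(reached == target for reached in reached_by.values()):
            return depth
    return None


if __name__ == "__main__":
    # Single parent per node: the graph above the samples is a tree.
    print(depth_of_common_ancestor(n=20, samples=[0, 1, 2, 3, 4]))
    # Two parents per node: a common ancestor in the graph sense only.
    print(depth_of_common_ancestor(n=20, samples=[0, 1, 2, 3, 4],
                                   parents_per_node=2))
```

With one parent per node the first common ancestor found is the root of a tree over the samples; with two parents per node the same check only says that a vertex is reachable from every sample.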
Minimal descriptor of an ARG
Implications:
• What is the model?
• Theoretical upper bound on the reconstructed structure
  Embarrassingly simple; models data that don’t matter
• The lean mdARG can be a benchmark
• How small is the core? Substantially (empirically observed)
• Importance sampling?
• Mindless compression? No: structure & samples preserving
• Observation: no gapped segments in binary MD ARGs
L Parida, P F Palamara, A Javed, A minimal descriptor of an ancestral recombinations graph, BMC Bioinformatics, 2011.

Quantify what can be inferred?
• Recombining loci: ~65 % vs. 100 %
• ARG (ancestral recombinations graphs), Griffiths and Marjoram, 1996
• Neighbor-joining method; tree courtesy Naruya Saitou, 2002

Reconstruction (IRiS pipeline)
• Components: DSR Algorithm, RECOMATRIX, RECOTYPE
• Diagram labels: network, recombinations, calibration
• References: JCB 08; BMC Bioinformatics 09; Hum Gen 11; MBE 11; Bioinformatics 11; PLoS Comp Bio 10

The Central Problem: DSR Algorithm
1. Identify SNP block patterns
2. Compute segments with no evidence of recombination within each segment
3. Construct trees
[Figure: haplotypes a-e split into segment 1 and segment 2, with Tree 1 and Tree 2 built on them; JCB ‘08]

Minimize recombination events
• Occam’s Razor Principle
• The DSR Algorithm is a bottom-up tree merger in which each node is assigned Dominant, Subdominant, or Recombinant.
[Figure: bottom-up merger of Tree 1 and Tree 2]
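
Step 2 of the DSR outline ("segments with no evidence of recombination") can be illustrated with the classical four-gamete test. This is a swapped-in, textbook criterion chosen for the sketch; the slides do not say which test the DSR/IRiS pipeline actually applies, and the function names and the greedy left-to-right scan are assumptions.

```python
def compatible(col_a, col_b):
    """Four-gamete test: two biallelic SNP columns show no evidence of
    recombination if fewer than four distinct allele pairs occur."""
    return len(set(zip(col_a, col_b))) < 4


def segments_without_recombination(haplotypes):
    """Greedy left-to-right segmentation of equal-length haplotype strings.

    Returns half-open (start, end) SNP index ranges such that every pair
    of SNPs inside a range passes the four-gamete test.
    """
    n_snps = len(haplotypes[0])
    columns = [tuple(h[j] for h in haplotypes) for j in range(n_snps)]
    segments, start = [], 0
    for j in range(1, n_snps):
        # Close the current segment as soon as SNP j conflicts with any SNP in it.
        if any(not compatible(columns[i], columns[j]) for i in range(start, j)):
            segments.append((start, j))
            start = j
    segments.append((start, n_snps))
    return segments


if __name__ == "__main__":
    # SNPs 0 and 1 show all four gametes (00, 01, 10, 11); SNPs 1 and 2 do not.
    print(segments_without_recombination(["000", "010", "100", "111"]))
    # [(0, 1), (1, 3)]
```

A tree can then be built on each returned segment, which is what step 3 of the outline refers to.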
Parameters: devil in the details ...
• SNP patterns
• Granularity
• Haplotype clusters
• Ancestral states
• Sequential Markovian

IRiS: Reconstructing ARG at genomic scales
• RECOTYPE, RECOMATRIX
• Bioinformatics 11

Edges & Ages

IRiS
• researcher.ibm.com/project/2303

Populations
• Africa: 1. Yoruba, 2. Maasai, 3. Luhya, 4. Chad, 5. African American
• North Africa-Middle East: 6. Lebanese, 7. Kuwaiti, 8. Iranian, 9. Egyptian, 10. Moroccan
• Europe: 11. NW European, 12. British, 13. Dutch, 14. Basque, 15. Gypsies, 16. Tuscan, 17. Romanian, 18. Chechen, 19. Russian
• Central Asia: 20. Tatar, 21. Altaian, 22. Uighur
• South East Asia: 23. Gujarati, 24. Tamil (CN), 25. Tamil (NTN), 26. Kalita
• East Asia: 27. Adi, 28. Tibetan, 29. Laotian, 30. Ati, 31. Chinese, 32. Japanese
• South America: 33. Mexican
Data set
• Five regions on X chromosome spanning 2MB
• 1255 SNPs
• 1318 samples
• 33 populations
• Custom array of SNPs (NOT tagSNPs, i.e. based on LD)
[Figure: the 33 populations plotted by number]

Multi-Dimensional Scaling (MDS) on ARG

Possible corridor out of Africa: Northern route thru Middle East vs Southern route thru Arabia (Bab-el-Mandeb)
• Using recombinational diversity (Nei’s diversity on recotypes)
• Does not match SNP diversity
• Reconstructed ARG
[Figure labels: Southern Migration, Northern Migration]
https://researcher.ibm.com/researcher/view_project.php?id=2303

Summary of implications:
1.
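
The "Nei’s diversity on recotypes" measure named on the out-of-Africa slide can be illustrated with the standard unbiased gene diversity estimator, H = n/(n-1) · (1 - Σ p_i²), applied to per-sample type labels. This is a sketch under assumptions: the slides do not define how recotypes are encoded or exactly how the diversity was computed, so each recotype is treated here as an opaque hashable label and the function name is hypothetical.

```python
from collections import Counter


def nei_gene_diversity(types):
    """Unbiased Nei gene diversity of a sample of hashable type labels.

    H = n / (n - 1) * (1 - sum_i p_i**2), where p_i is the sample
    frequency of type i and n is the number of sampled units.
    """
    n = len(types)
    if n < 2:
        raise ValueError("need at least two sampled units")
    counts = Counter(types)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Hypothetical recotype labels for two populations.
    pop_a = ["r1", "r1", "r2", "r2"]
    pop_b = ["r1", "r1", "r1", "r1"]
    print(nei_gene_diversity(pop_a))  # two equally frequent types
    print(nei_gene_diversity(pop_b))  # a single type: 0.0
```

Comparing this value across populations, once on recotype labels and once on SNP haplotypes, is the kind of contrast the slide refers to when it says the recombinational diversity does not match SNP diversity.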