Bayesian Assembly of Reads from High Throughput Sequencing

Total Page:16

File Type:pdf, Size:1020Kb

Bayesian Assembly of Reads from High Throughput Sequencing BAYESIAN ASSEMBLY OF READS FROM HIGH THROUGHPUT SEQUENCING A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY Jonathan Laserson March 2012 © 2012 by Jonathan Daniel Laserson. All Rights Reserved. Re-distributed by Stanford University under license with the author. This work is licensed under a Creative Commons Attribution- Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/ This dissertation is online at: http://purl.stanford.edu/xp796hy4748 ii I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Daphne Koller, Primary Adviser I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Serafim Batzoglou I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Andrew Fire Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives. iii iv Abstract The high-throughput sequencing revolution allows us to take millions of noisy short reads from the DNA in a sample, essentially taking a snapshot of the genomic material in the sample. To recover the true genomes, these reads are assembled by algorithms exploiting their high coverage and overlap. In this thesis, I focus on two scenarios for sequence assembly. In both cases I use the same principled Bayesian approach to design an algorithm that uncovers the composition of the genomic sequences that produced the reads. The first scenario is Metagenomics, a de novo assembly task where the reads come from an unknown and diverse population of genomes. Here the challenge stems from uncertainty over the sample composition, read coverage, and noise levels. We developed Genovo, an assembler driven by a probabilistic model of the process of read generation. A nonparametric prior accounts for the unknown number of genomes and their relative abundance in the sample. We compared our method to three other short read assembly programs, over real and synthetic datasets. The results show that our Bayesian approach provides assemblies that outperform the state of the art on various measures. The second scenario is lineage assembly, where the reads come from short but clonally related genomes only slightly mutated from each other, generated over a small time scale. We developed ImmuniTree, a tool that builds a phylogenetic tree representing the mutated genomes, where the input reads are partitioned among the tree’s nodes. Unlike traditional phylogenetic trees, our trees are not necessarily binary, and we allow reads in internal nodes. The underlying model compounds a sequencing noise model on top of a mutation model, allowing the user to tell apart disagreements v between the reads that are due to random noise, versus an actual mutation. We demonstrated ImmuniTree on reconstruction of B cell lineages. In a healthy immune system, the B cells are produced with an incredible diversity, in an essentially random process. When a B cell detects an invader it starts cloning itself through a process of hypermutation, with B cells that detect the invader better selected over those that do not. We collected samples from 10 organ donors, each providing four tissue samples. Reads taken from the samples cover each the variable region of the heavy chain in a single B cell receptor. To analyze this data, we clustered the reads into clones and built a tree for each one using ImmuniTree. Our findings show that many of the clones are shared by B cells from multiple tissues, which suggests a high degree of B cell migration. In addition, our results show other aspects of B cell production in which the four tissues differ. vi Acknowledgments First, I’d like to thank Daphne Koller, my advisor, who taught me about sharp and uncompromising research. Throughout my PhD I worked on a number of projects in different fields, and one of the things I most appreciate is that each and every one of those projects was addressing a central problem in its field. These projects opened doors to amazing opportunities and collaborations. I want to thank Daphne for her guidance, and her genuine interest in the success of all my projects. Vladimir Jojic, with whom I worked closely for more than a year, was also a great influence on my PhD path. He taught me a lot about MCMC and computational Biology. In fact, he is the reason I transitioned from computer vision to Biology, for better or worse. The collaboration with Scott Boyd and Andrew Fire was particularly enjoyable. They devoted a ton of their time discussing with me the different aspects of their data, and offered many research trajectories and help. I also want to thank their students, Katherine Jackson and Elyse Hope for streamlining the data from their side, and Clauss Niemann for collecting and sequencing the samples. I want to thank Serafim Batzoglou and his wonderful students for helpful discus- sions, and also for adopting me when I was all alone in that conference in Lisbon. I will never forget the day I gave a talk to your group. Through the PhD my funding that made this work possible came from the Stan- ford Graduate Fellowship, and also National Science Foundation Grant BDI-0345474. Many other people helped me through my time at Stanford, either directly to my research or simply by making the lab a happy place. I cannot possibly list here all of them. But I will name a few. Geremy Heitz was my first mentor in Daphne’s group, vii back when I was doing vision. I learned a lot from him about coding paradigms and running jobs on the cluster. Ben Packer and I were close friends from day one in Stanford, and soon after we became office-mates and colleagues in Daphne’s group. This makes him the person who had spent the most time in the same room with me in all of California, and we are still friends. Karen Sachs is both a good friend and one of those rare people who nativley knows both Computer Science and Biology. Almost every paper I wrote or presented went through her eyes first. My thanks go to all the other talented members of the DAGS group, especially Gal Elidan, Alexis Battle, Yoni Donner, Manfred Claassen, Such Saria, and Cristina Pop. Also thanks to my cousin, Uri Laserson, for many useful and stimulating conversations as somehow we ended up working in the same domain. During my nomadic search for purpose in my first year at Stanford, I enjoyed the interaction with many Stanford Professors, many in the Psychology department. I spent a few months in the groups of Vladlen Koltun, who also supported me finan- cially, and Lera Boroditsky. Even though I turned out to take a different academic route, I feel that what I learned in that first year profoundly shaped my interests and views. Many of the PhD students who joined the CS department the same year as I became great friends and companions. I want to thank Daniel Ramage, Siddhartha Chaudhuri, and Paul Heymann. Special thanks to Aleksandra Korolova for a true friendship that sadly did not last. For great friendship and good times that go beyond Stanford, I want to thank Orly Fuhrman, who became my housemate, and who was briefly my lab-mate as well, Leor Vartikovski, who quickly became the key ingredient of my social life, and also Shlomi, Roee, Inbal & Micha, Rachel, Eran, Arash, Maya, Rafit (and family), Joy, Noam, Nir, Inbal & Noy, Sivan, Pamela & Ophir, Natalie, Sarah, and Amy. Thanks also to my good old pals from Israel, Zvika, Eyal, Aviv, Gilad, Alex and Tair. Last, I want to thank my family. For my parents, Aliza and Avi, I think the PhD was the first time they ever saw me struggle, and they gave me tremendous and continuous support all the way from Israel, along with good advice and unconditional love. My sister Mia, my brother Itamar, his wife Ayelet, and all my beloved extended viii family, you all made me feel how fun it is to have the family near by, and this is one of the best reasons I can think of for coming back home. ix Contents Abstract v Acknowledgments vii 1 Introduction 1 1.1 Contributions ............................... 4 1.2 ThesisOutline............................... 5 1.3 Previously Published Work . 6 2 Preliminaries 7 2.1 TheJointDistribution .......................... 7 2.1.1 Independence ........................... 11 2.2 BayesianNetworks ............................ 12 2.2.1 PlateModels ........................... 14 2.2.2 Inference.............................. 15 2.2.3 Finding The MAP Assignment . 18 2.3 CommonDistributions . .. .. 19 2.3.1 DirichletDistribution. 19 2.4 MarkovChainMonteCarlo . 23 2.4.1 GibbsSampling.......................... 26 2.4.2 Using MCMC to Get the MAP Assignment . 26 2.4.3 MCMC With Collapsed Particles . 27 2.4.4 MCMC With Auxiliary Variables . 27 2.4.5 SliceSampling........................... 27 x 2.4.6 Example: Bayesian Clustering . 29 2.5 Non-ParametricModels . .. .. 32 2.5.1 Example: Bayesian Clustering - Continued . 32 2.5.2 The Chinese Restaurant Process . 34 2.5.3 TheDirichletProcess. 35 3 De novo Sequence Assembly 37 3.1 Background ...............................
Recommended publications
  • Algorithms for Computational Biology 8Th International Conference, Alcob 2021 Missoula, MT, USA, June 7–11, 2021 Proceedings
    Lecture Notes in Bioinformatics 12715 Subseries of Lecture Notes in Computer Science Series Editors Sorin Istrail Brown University, Providence, RI, USA Pavel Pevzner University of California, San Diego, CA, USA Michael Waterman University of Southern California, Los Angeles, CA, USA Editorial Board Members Søren Brunak Technical University of Denmark, Kongens Lyngby, Denmark Mikhail S. Gelfand IITP, Research and Training Center on Bioinformatics, Moscow, Russia Thomas Lengauer Max Planck Institute for Informatics, Saarbrücken, Germany Satoru Miyano University of Tokyo, Tokyo, Japan Eugene Myers Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany Marie-France Sagot Université Lyon 1, Villeurbanne, France David Sankoff University of Ottawa, Ottawa, Canada Ron Shamir Tel Aviv University, Ramat Aviv, Tel Aviv, Israel Terry Speed Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia Martin Vingron Max Planck Institute for Molecular Genetics, Berlin, Germany W. Eric Wong University of Texas at Dallas, Richardson, TX, USA More information about this subseries at http://www.springer.com/series/5381 Carlos Martín-Vide • Miguel A. Vega-Rodríguez • Travis Wheeler (Eds.) Algorithms for Computational Biology 8th International Conference, AlCoB 2021 Missoula, MT, USA, June 7–11, 2021 Proceedings 123 Editors Carlos Martín-Vide Miguel A. Vega-Rodríguez Rovira i Virgili University University of Extremadura Tarragona, Spain Cáceres, Spain Travis Wheeler University of Montana Missoula, MT, USA ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Bioinformatics ISBN 978-3-030-74431-1 ISBN 978-3-030-74432-8 (eBook) https://doi.org/10.1007/978-3-030-74432-8 LNCS Sublibrary: SL8 – Bioinformatics © Springer Nature Switzerland AG 2021 This work is subject to copyright.
    [Show full text]
  • UNIVERSITY of CALIFORNIA RIVERSIDE Unsupervised And
    UNIVERSITY OF CALIFORNIA RIVERSIDE Unsupervised and Zero-Shot Learning for Open-Domain Natural Language Processing A Dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science by Muhammad Abu Bakar Siddique June 2021 Dissertation Committee: Dr. Evangelos Christidis, Chairperson Dr. Amr Magdy Ahmed Dr. Samet Oymak Dr. Evangelos Papalexakis Copyright by Muhammad Abu Bakar Siddique 2021 The Dissertation of Muhammad Abu Bakar Siddique is approved: Committee Chairperson University of California, Riverside To my family for their unconditional love and support. i ABSTRACT OF THE DISSERTATION Unsupervised and Zero-Shot Learning for Open-Domain Natural Language Processing by Muhammad Abu Bakar Siddique Doctor of Philosophy, Graduate Program in Computer Science University of California, Riverside, June 2021 Dr. Evangelos Christidis, Chairperson Natural Language Processing (NLP) has yielded results that were unimaginable only a few years ago on a wide range of real-world tasks, thanks to deep neural networks and the availability of large-scale labeled training datasets. However, existing supervised methods assume an unscalable requirement that labeled training data is available for all classes: the acquisition of such data is prohibitively laborious and expensive. Therefore, zero-shot (or unsupervised) models that can seamlessly adapt to new unseen classes are indispensable for NLP methods to work in real-world applications effectively; such models mitigate (or eliminate) the need for collecting and annotating data for each domain. This dissertation ad- dresses three critical NLP problems in contexts where training data is scarce (or unavailable): intent detection, slot filling, and paraphrasing. Having reliable solutions for the mentioned problems in the open-domain setting pushes the frontiers of NLP a step towards practical conversational AI systems.
    [Show full text]
  • Are Profile Hidden Markov Models Identifiable?
    Are Profile Hidden Markov Models Identifiable? Srilakshmi Pattabiraman Tandy Warnow Department of Electrical and Computer Engineering Department of Computer Science University of Illinois at Urbana-Champaign University of Illinois at Urbana-Champaign Urbana, Illinois Urbana, Illinois [email protected] [email protected] ABSTRACT 1 INTRODUCTION Profile Hidden Markov Models (HMMs) are graphical models that Profile Hidden Markov Models (HMMs) are arguably themost can be used to produce finite length sequences from a distribution. common statistical models in bioinformatics. Originally introduced In fact, although they were only introduced for bioinformatics 25 by Haussler and colleagues in [10, 12], and then expanded later years ago (by Haussler et al., Hawaii International Conference on in many subsequent texts [4–6, 9, 11, 21, 25], profile HMMs are Systems Science 1993), they are arguably the most commonly used now used in many analytical steps in biological sequence analysis statistical model in bioinformatics, with multiple applications, in- [15, 17–19, 22]. cluding protein structure and function prediction, classifications Profile Hidden Markov models are graphical models with match of novel proteins into existing protein families and superfamilies, states, insertion states, and deletion states; and the match and in- metagenomics, and multiple sequence alignment. The standard use sertion states emit letters from an underlying alphabet Σ (i.e., Σ of profile HMMs in bioinformatics has two steps: first a profile may be the 20 amino acids, the four nucleotides, or some other HMM is built for a collection of molecular sequences (which may set of symbols). In the standard form presented in [4] (widely in not be in a multiple sequence alignment), and then the profile HMM use in bioinformatics applications), each profile Hidden Markov is used in some subsequent analysis of new molecular sequences.
    [Show full text]
  • Computational Biology and Bioinformatics
    Vol. 30 ISMB 2014, pages i1–i2 BIOINFORMATICS EDITORIAL doi:10.1093/bioinformatics/btu304 Editorial This special issue of Bioinformatics serves as the proceedings of The conference used a two-tier review system, a continuation the 22nd annual meeting of Intelligent Systems for Molecular and refinement of a process begun with ISMB 2013 in an effort Biology (ISMB), which took place in Boston, MA, July 11–15, to better ensure thorough and fair reviewing. Under the revised 2014 (http://www.iscb.org/ismbeccb2014). The official confer- process, each of the 191 submissions was first reviewed by at least ence of the International Society for Computational Biology three expert referees, with a subset receiving between four and (http://www.iscb.org/), ISMB, was accompanied by 12 Special eight reviews, as needed. These formal reviews were frequently Interest Group meetings of one or two days each, two satellite supplemented by online discussion among reviewers and Area meetings, a High School Teachers Workshop and two half-day Chairs to resolve points of dispute and reach a consensus on tutorials. Since its inception, ISMB has grown to be the largest each paper. Among the 191 submissions, 29 were conditionally international conference in computational biology and bioinfor- accepted for publication directly from the first round review Downloaded from matics. It is expected to be the premiere forum in the field for based on an assessment of the reviewers that the paper was presenting new research results, disseminating methods and tech- clearly above par for the conference. A subset of 16 papers niques and facilitating discussions among leading researchers, were viewed as potentially in the top tier but raised significant practitioners and students in the field.
    [Show full text]
  • University of California Santa Cruz Sample
    UNIVERSITY OF CALIFORNIA SANTA CRUZ SAMPLE-SPECIFIC CANCER PATHWAY ANALYSIS USING PARADIGM A dissertation submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in BIOMOLECULAR ENGINEERING AND BIOINFORMATICS by Stephen C. Benz June 2012 The Dissertation of Stephen C. Benz is approved: Professor David Haussler, Chair Professor Joshua Stuart Professor Nader Pourmand Dean Tyrus Miller Vice Provost and Dean of Graduate Studies Copyright c by Stephen C. Benz 2012 Table of Contents List of Figures v List of Tables xi Abstract xii Dedication xiv Acknowledgments xv 1 Introduction 1 1.1 Identifying Genomic Alterations . 2 1.2 Pathway Analysis . 5 2 Methods to Integrate Cancer Genomics Data 10 2.1 UCSC Cancer Genomics Browser . 11 2.2 BioIntegrator . 16 3 Pathway Analysis Using PARADIGM 20 3.1 Method . 21 3.2 Comparisons . 26 3.2.1 Distinguishing True Networks From Decoys . 27 3.2.2 Tumor versus Normal - Pathways associated with Ovarian Cancer 29 3.2.3 Differentially Regulated Pathways in ER+ve vs ER-ve breast can- cers . 36 3.2.4 Therapy response prediction using pathways (Platinum Free In- terval in Ovarian Cancer) . 38 3.3 Unsupervised Stratification of Cancer Patients by Pathway Activities . 42 4 SuperPathway - A Global Pathway Model for Cancer 51 4.1 SuperPathway in Ovarian Cancer . 55 4.2 SuperPathway in Breast Cancer . 61 iii 4.2.1 Chin-Naderi Cohort . 61 4.2.2 TCGA Breast Cancer . 63 4.3 Cross-Cancer SuperPathway . 67 5 Pathway Analysis of Drug Effects 74 5.1 SuperPathway on Breast Cell Lines .
    [Show full text]
  • Introduction
    Introduction IJCAI-01 Conference Committee IJCAI-01 Program Committee: Contents: CONFERENCE CHAIR: Elisabeth André, DFKI GmbH (Germany) Introduction 2 Hector J. Levesque, University of Toronto (Canada) Minoru Asada, Osaka University (Japan) Sponsors & Committees 2-3 Franz Baader, RWTH Aachen (Germany) PROGRAM CHAIR: IJCAI-01 Awards 4 Craig Boutilier, University of Toronto (Canada) Bernhard Nebel,Albert-Ludwigs-Universität, Freiburg Didier Dubois, IRIT-CNRS (France) Conference at a Glance 5 (Germany) Maria Fox, University of Durham (United Kingdom) Workshop Program 6-7 LOCAL ARRANGEMENTS CHAIR: Hector Geffner, Universidad Simón Bolívar Doctoral Consortium 8 James Hoard, The Boeing Company, Seattle (USA) (Venezuela) Tutorial Program 8 SECRETARY-TREASURER: Georg Gottlob,Vienna University of Technology (Austria) Conference Program Highlights 9 Ronald J. Brachman,AT&T Labs – Research (USA) Invited Speakers 10 Haym Hirsh, Rutgers University (USA) IAAI-01 Conference 11 Eduard Hovy, Information Sciences Institute (USA) Advisory Committee: Joxan Jaffar, National University of Singapore Technical Program 12-19 Bruce Buchanan, University of Pittsburgh (USA) (Singapore) Exhibit Program 20-23 Silvia Coradeschi, Örebro University (Sweden) Daphne Koller, Stanford University (USA) RoboCup 2001 24 Olivier Faugeras, INRIA (France) Fangzhen Lin, Hong Kong University of Science and Registration Information 25 Cheng Hu, Chinese Academy of Sciences (China) Technolog y (Hong Kong) General Information 25-27 Nicholas Jennings, University of London (England) Heikki Mannila, Nokia Research Center (Finland) Conference Maps 28-30 Henry Kautz, University of Washington (USA) Robert Milne, Intelligent Applications (United Kingdom) IJCAI-03 Conference 31 Robert Mercer, University of Western Ontario (Canada) Daniele Nardi, Università di Roma “La Sapienza” Special Meetings 31 Silvia Miksch,Vienna University of Technology (Italy) (Austria) Dana Nau, University of Maryland (USA) Devika Subramanian, Rice University (USA) Patrick Prosser, University of Glasgow (UK) Welcome to IJCAI-01 L.
    [Show full text]
  • Research News
    Computing Research News COMPUTING RESEARCH ASSOCIATION, CELEBRATING 40 YEARS OF SERVICE TO THE COMPUTING RESEARCH COMMUNITY JUNE 2013 Vol. 25 / No. 6 Announcements 2 Coalition for National Science Funding 2 CRA Announces Outstanding Undergraduate Researcher Award Winners 3 Computing Research in Action 5 CERP Infographic 6 NSF Funding Opportunity 6 CRA Recognizes Participants 7 CRA Board Members 16 CRA Board Officers 16 CRA Staff 16 Professional Opportunities 17 COMPUTING RESEARCH NEWS, JUNE 2013 Vol. 25 / No. 6 Announcements 2012 Taulbee Report Updated May 15, 2013 Corrected Table F6 Click here to download updated version CRA Releases Latest Research Issue Report New Technology-based Models for Postsecondary Learning: Conceptual Frameworks and Research Agendas The report details the findings of a National Science Foundation-Sponsored Computing Research Association Workshop held at MIT on January 9-11, 2013. From the report: “Advances in technology and in knowledge about expertise, learning, and assessment have the potential to reshape the many forms of education and training past matriculation from high school. In the next decade, higher education, military and workplace training, and professional development must all transform to exploit the opportunities of a new era, leveraging emerging technology-based models that can make learning more efficient and possibly improve student support, all at lower cost for a broader range of learners.” The report is now available as a pdf at http://cra.org/resources/research-issues/. Slides from the presentation at NSF on April 19, 2013 are also available. Investments in STEM Research and Education: Fueling American Innovation On May 7, at the Rayburn House Office Building in Brett Bode from the National Center for Supercomputing Washington, DC, the Coalition for National Science Funding Applications at University of Illinois Urbana-Champaign were (CNSF) held its 19th annual exhibition and reception, on hand to talk about the “Blue Waters” project.
    [Show full text]
  • I S C B N E W S L E T T
    ISCB NEWSLETTER FOCUS ISSUE {contents} President’s Letter 2 Member Involvement Encouraged Register for ISMB 2002 3 Registration and Tutorial Update Host ISMB 2004 or 2005 3 David Baker 4 2002 Overton Prize Recipient Overton Endowment 4 ISMB 2002 Committees 4 ISMB 2002 Opportunities 5 Sponsor and Exhibitor Benefits Best Paper Award by SGI 5 ISMB 2002 SIGs 6 New Program for 2002 ISMB Goes Down Under 7 Planning Underway for 2003 Hot Jobs! Top Companies! 8 ISMB 2002 Job Fair ISCB Board Nominations 8 Bioinformatics Pioneers 9 ISMB 2002 Keynote Speakers Invited Editorial 10 Anna Tramontano: Bioinformatics in Europe Software Recommendations11 ISCB Software Statement volume 5. issue 2. summer 2002 Community Development 12 ISCB’s Regional Affiliates Program ISCB Staff Introduction 12 Fellowship Recipients 13 Awardees at RECOMB 2002 Events and Opportunities 14 Bioinformatics events world wide INTERNATIONAL SOCIETY FOR COMPUTATIONAL BIOLOGY A NOTE FROM ISCB PRESIDENT This newsletter is packed with information on development and dissemination of bioinfor- the ISMB2002 conference. With over 200 matics. Issues arise from recommendations paper submissions and over 500 poster submis- made by the Society’s committees, Board of sions, the conference promises to be a scientific Directors, and membership at large. Important feast. On behalf of the ISCB’s Directors, staff, issues are defined as motions and are discussed EXECUTIVE COMMITTEE and membership, I would like to thank the by the Board of Directors on a bi-monthly Philip E. Bourne, Ph.D., President organizing committee, local organizing com- teleconference. Motions that pass are enacted Michael Gribskov, Ph.D., mittee, and program committee for their hard by the Executive Committee which also serves Vice President work preparing for the conference.
    [Show full text]
  • Curriculum Vitae
    Curriculum Vitae Tandy Warnow Grainger Distinguished Chair in Engineering 1 Contact Information Department of Computer Science The University of Illinois at Urbana-Champaign Email: [email protected] Homepage: http://tandy.cs.illinois.edu 2 Research Interests Phylogenetic tree inference in biology and historical linguistics, multiple sequence alignment, metage- nomic analysis, big data, statistical inference, probabilistic analysis of algorithms, machine learning, combinatorial and graph-theoretic algorithms, and experimental performance studies of algorithms. 3 Professional Appointments • Co-chief scientist, C3.ai Digital Transformation Institute, 2020-present • Grainger Distinguished Chair in Engineering, 2020-present • Associate Head for Computer Science, 2019-present • Special advisor to the Head of the Department of Computer Science, 2016-present • Associate Head for the Department of Computer Science, 2017-2018. • Founder Professor of Computer Science, the University of Illinois at Urbana-Champaign, 2014- 2019 • Member, Carl R. Woese Institute for Genomic Biology. Affiliate of the National Center for Supercomputing Applications (NCSA), Coordinated Sciences Laboratory, and the Unit for Criticism and Interpretive Theory. Affiliate faculty member in the Departments of Mathe- matics, Electrical and Computer Engineering, Bioengineering, Statistics, Entomology, Plant Biology, and Evolution, Ecology, and Behavior, 2014-present. • National Science Foundation, Program Director for Big Data, July 2012-July 2013. • Member, Big Data Senior Steering Group of NITRD (The Networking and Information Tech- nology Research and Development Program), subcomittee of the National Technology Council (coordinating federal agencies), 2012-2013 • Departmental Scholar, Institute for Pure and Applied Mathematics, UCLA, Fall 2011 • Visiting Researcher, University of Maryland, Spring and Summer 2011. 1 • Visiting Researcher, Smithsonian Institute, Spring and Summer 2011. • Professeur Invit´e,Ecole Polytechnique F´ed´erale de Lausanne (EPFL), Summer 2010.
    [Show full text]
  • UCLA UCLA Electronic Theses and Dissertations
    UCLA UCLA Electronic Theses and Dissertations Title Bipartite Network Community Detection: Development and Survey of Algorithmic and Stochastic Block Model Based Methods Permalink https://escholarship.org/uc/item/0tr9j01r Author Sun, Yidan Publication Date 2021 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA Los Angeles Bipartite Network Community Detection: Development and Survey of Algorithmic and Stochastic Block Model Based Methods A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Statistics by Yidan Sun 2021 © Copyright by Yidan Sun 2021 ABSTRACT OF THE DISSERTATION Bipartite Network Community Detection: Development and Survey of Algorithmic and Stochastic Block Model Based Methods by Yidan Sun Doctor of Philosophy in Statistics University of California, Los Angeles, 2021 Professor Jingyi Li, Chair In a bipartite network, nodes are divided into two types, and edges are only allowed to connect nodes of different types. Bipartite network clustering problems aim to identify node groups with more edges between themselves and fewer edges to the rest of the network. The approaches for community detection in the bipartite network can roughly be classified into algorithmic and model-based methods. The algorithmic methods solve the problem either by greedy searches in a heuristic way or optimizing based on some criteria over all possible partitions. The model-based methods fit a generative model to the observed data and study the model in a statistically principled way. In this dissertation, we mainly focus on bipartite clustering under two scenarios: incorporation of node covariates and detection of mixed membership communities.
    [Show full text]
  • Director's Update
    Director’s Update Francis S. Collins, M.D., Ph.D. Director, National Institutes of Health Council of Councils Meeting September 6, 2019 Changes in Leadership . Retirements – Paul A. Sieving, M.D., Ph.D., Director of the National Eye Institute Paul Sieving (7/29/19) Linda Birnbaum – Linda S. Birnbaum, Ph.D., D.A.B.T., A.T.S., Director of the National Institute of Environmental Health Sciences (10/3/19) . New Hires – Noni Byrnes, Ph.D., Director, Center for Scientific Review (2/27/19) Noni Byrnes – Debara L. Tucci, M.D., M.S., M.B.A., Director, National Institute on Deafness and Other Communication Disorders (9/3/19) Debara Tucci . New Positions – Tara A. Schwetz, Ph.D., Associate Deputy Director, NIH (1/7/19) Tara Schwetz 2019 Inaugural Inductees Topics for Today . NIH HEAL (Helping to End Addiction Long-termSM) Initiative – HEALing Communities Study . Artificial Intelligence: ACD WG Update . Human Genome Editing – Exciting Promise for Cures, Need for Moratorium on Germline . Addressing Foreign Influence on Research … and Harassment in the Research Workplace NIH HEAL InitiativeSM . Trans-NIH research initiative to: – Improve prevention and treatment strategies for opioid misuse and addiction – Enhance pain management . Goals are scientific solutions to the opioid crisis . Coordinating with the HHS Secretary, Surgeon General, federal partners, local government officials and communities www.nih.gov/heal-initiative NIH HEAL Initiative: At a Glance . $500M/year – Will spend $930M in FY2019 . 12 NIH Institute and Centers leading 26 HEAL research projects – Over 20 collaborating Institutes, Centers, and Offices – From prevention research, basic and translational research, clinical trials, to implementation science – Multiple projects integrating research into new settings .
    [Show full text]
  • Top 100 AI Leaders in Drug Discovery and Advanced Healthcare Introduction
    Top 100 AI Leaders in Drug Discovery and Advanced Healthcare www.dka.global Introduction Over the last several years, the pharmaceutical and healthcare organizations have developed a strong interest toward applying artificial intelligence (AI) in various areas, ranging from medical image analysis and elaboration of electronic health records (EHRs) to more basic research like building disease ontologies, preclinical drug discovery, and clinical trials. The demand for the ML/AI technologies, as well as for ML/AI talent, is growing in pharmaceutical and healthcare industries and driving the formation of a new interdisciplinary industry (‘data-driven healthcare’). Consequently, there is a growing number of AI-driven startups and emerging companies offering technology solutions for drug discovery and healthcare. Another important source of advanced expertise in AI for drug discovery and healthcare comes from top technology corporations (Google, Microsoft, Tencent, etc), which are increasingly focusing on applying their technological resources for tackling health-related challenges, or providing technology platforms on rent bases for conducting research analytics by life science professionals. Some of the leading pharmaceutical giants, like GSK, AstraZeneca, Pfizer and Novartis, are already making steps towards aligning their internal research workflows and development strategies to start embracing AI-driven digital transformation at scale. However, the pharmaceutical industry at large is still lagging behind in adopting AI, compared to more traditional consumer industries -- finance, retail etc. The above three main forces are driving the growth in the AI implementation in pharmaceutical and advanced healthcare research, but the overall success depends strongly on the availability of highly skilled interdisciplinary leaders, able to innovate, organize and guide in this direction.
    [Show full text]