Profile of David Haussler

Sequencing the human genome was the final grand scientific achievement of the 20th century. One scientist who played a major part in organizing and analyzing the three billion base pairs of DNA that make up our genome is David Haussler, who was elected to the National Academy of Sciences in 2006. His Inaugural Article, on mathematically modeling the evolution of genomes, is published in this issue (1).

Haussler is currently a professor of biomolecular engineering at the University of California, Santa Cruz (UCSC), and a Howard Hughes Medical Institute investigator, but his career took a slight detour from his southern California roots to scientific prominence. He began his career in the humanities before settling in mathematics, but a keen ability to mix his interests in biology and genomics has led to his consistently applying math to biology.

[Photo: David Haussler]

A Restive Youth

Haussler grew up in Los Angeles, where his father, an engineer, encouraged David’s and his brother Mark’s interest in science. David, however, did not follow a straight path to science. During high school he was more interested in art and psychology and, after graduating, enrolled in the Academy of Art in San Francisco, in 1971, where he studied painting for three months. He soon transferred to tiny, offbeat Immaculate Heart College (IHC) in Hollywood, where he studied gestalt therapy in the hope of becoming a practicing psychologist.

During this restless time, his brother helped him find his calling. After two years at IHC, David left in 1973 to major in mathematics at Connecticut College (New London, CT).

“I think the turning point came when I went to work in my brother’s lab. He’s 12 years older and was a biochemist at the University of Arizona,” Haussler recalls. “He said, ‘You want a summer job? Come to my lab and I’ll teach you how to do science.’ He gave me Lehninger’s book on biochemistry and said, ‘Read this first.’ I read the text and worked in his lab and it was a dream summer.”

“By the end of it we had measured the levels of the hormonal form of vitamin D in the human bloodstream for the first time, and we published a paper in Science, my first publication,” Haussler recalls. “Although my job was the lab work, I also ended up doing a key step in the analysis for the paper, because I was the math major. It was really a foundational experience for me and a harbinger of things to come” (2).

After completing his bachelor’s degree in 1975, Haussler received a master’s degree in applied mathematics in 1979 from California Polytechnic State University at San Luis Obispo. He then moved to the University of Colorado (Boulder, CO), where he obtained his Ph.D. in computer science in 1982. In 2005, he won the Classic Paper Award from the American Association for Artificial Intelligence (AAAI) for an earlier manuscript on learning algorithms (3).

He has also won the Dickson Prize in science from Carnegie Mellon University (Pittsburgh, PA) and the Association for Computing Machinery/AAAI Allen Newell Award, and he is a Fellow of the California Academy of Sciences, the American Academy of Arts and Sciences, and the American Association for the Advancement of Science. He is 54 and married, with two children in college.

Haussler’s doctoral thesis was in pure math, reporting his study of formal language theory and the theory of computation, including Turing machines. It may seem far removed from the human genome, but Haussler explains that this abstract world of machine language describes how anything that is computable can be computed, using very simple rules, on strings written over finite alphabets, “so it could be DNA in one incarnation, and something quite different in another” (4).

Sequencing Pioneers

While at Boulder in the early 1980s, Haussler befriended fellow grad students Gene Myers and Gary Stormo, and the three of them worked toward sequencing DNA and making sense of genomes. They met in a seminar led by Haussler’s supervisor, Andrzej Ehrenfeucht.

Since their student days, the three have each had an enormous impact on genomics: Stormo developed fundamental methods of recognizing sequence motifs and other patterns in genomic data, Myers led the bioinformatics team at Celera, the private company that sequenced the human genome, and Haussler’s group provided bioinformatics for the public, government-funded Human Genome Project.

“In the 1980s, we were analyzing small snippets of the E. coli genome that were available and genomes of bacteriophages like phi-X 174. These represented the first products of the early sequencing efforts, driven by the introduction of recombinant DNA methodologies,” Haussler says. “We didn’t have much data to work with back then.” But that didn’t stop them from preparing the techniques that could be used to analyze the large quantities of DNA to come (5).

Later that decade, Haussler moved among computer science, artificial intelligence, and statistics. “I was interested in how brain-like algorithms could be built and what their limitations and strengths were,” he says. “I wanted to know what was theoretically learnable in a very general sense.”

Even while working in mathematics, Haussler kept his eye on genomics, a contemporary field that only began taking shape during the late 1970s. When more gene sequence data became available in the 1990s, he got back into the field, developing statistical models and algorithms that were later used in major genome projects (6).

“It was a long way from bacteriophage genomes to our first whole animal genome,” Haussler says. “Of course, the ultimate project was the human genome. We were recruited into the project because they wanted experts to find the genes in the DNA. We had developed a methodology for this using hidden Markov models.”

He officially joined the public Human Genome Project in 1999. “When we got there, the public project had just tiny snippets of DNA scattered all over the genome in GenBank files without any cohesive map or assembly to pull them together,” Haussler recalls. There were genetic maps of the genome measured in centimorgans, radiation hybrid maps, and physical maps made from restriction enzyme digest data obtained from the thousands of approximately 150,000-base pair artificial chromosomes that the project was sequencing. He remembers that the project’s original plans for assembling the draft genome data were not working and had to be reinvented.

“These maps were mutually inconsistent in places, the data were noisy, and trying to overlay all the sequence data (both genomic DNA snippets and cDNA sequences made from mRNAs) was just a huge jigsaw puzzle,” he says. “We couldn’t even start to find the genes until we had built long stretches of continuous DNA.”

[Photo: David Haussler’s wet lab]

“Jim Kent, of my group, stepped in at the last minute and saved the day by writing an amazing assembly program that we call GigAssembler,” Haussler says (7). The program works at a scale of billions of bases of information, taking information from 13 sources, including the different maps and information on RNA transcripts.

Haussler says that without Kent, who wrote 20,000 lines of code in just a few months, the public project would not have caught up with Gene Myers’ team at Celera, which was well funded and boasted plenty of computational power.

“We ended up doing our first assembly on 100 desktop machines that the UCSC chancellor and dean of engineering hastily purchased for us. [...] You could download the human genome, for free, without restriction, from Santa Cruz that day. This was humanity’s first real glimpse at its own recipe.”

Since then, Haussler has been assembling and scanning other genomes to find the recipes of different species of living animals, and using the results to determine the difference between the DNA of various organisms and modern humans. His team has been a part of deciphering the mouse, chimpanzee, macaque, fruit fly, chicken, and rat genomes. All data from those projects [...] have put on more powerful search and interactive capabilities for access to an increasing variety of high-throughput genomic data” (8).

Along the way, the researchers have found ultra-conserved regions of the human genome that have remained unchanged for hundreds of millions of years, along with an RNA gene, expressed in early development of the brain’s neocortex, that has changed dramatically only recently and could play a role in human uniqueness (9–11).

Haussler’s current research is focused on the evolution of complete genomes. He wants to take the trees of life that have been created by comparisons of single genes and do the same with entire genomes, studying the changes that made species what they are, or were (12).

“The only hope for understanding the molecular evolution of life is to understand [...]

DNA Evolution

In his Inaugural Article, he and his [...]

This is a Profile of a recently elected member of the National Academy of Sciences to accompany the member’s Inaugural Article on pages 14254-14261 of volume 105.

© 2008 by The National Academy of Sciences of the USA. www.pnas.org/cgi/doi/10.1073/pnas.0808284105. PNAS, September 23, 2008, vol. 105, no. 38, 14251–14253.