Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale

Total Page:16

File Type:pdf, Size:1020Kb

Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale SCALABLE PARALLEL METHODS FOR ANALYZING METAGENOMICS DATA AT EXTREME SCALE By JEFFREY ALAN DAILY A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY WASHINGTON STATE UNIVERSITY School of Electrical Engineering and Computer Science MAY 2015 c Copyright by JEFFREY ALAN DAILY, 2015 All rights reserved c Copyright by JEFFREY ALAN DAILY, 2015 All rights reserved To the Faculty of Washington State University: The members of the Committee appointed to examine the dissertation of JEFFREY ALAN DAILY find it satisfactory and recommend that it be accepted. Ananth Kalyanaraman, Ph.D., Chair John Miller, Ph.D. Carl Hauser, Ph.D. Sriram Krishnamoorthy, Ph.D. ii ACKNOWLEDGEMENT First, I would like to thank Dr. Ananth Kalyanaraman for his supervision and support throughout the work. I would like to thank Dr. Abhinav Vishnu who encouraged my pursuit of my Ph.D. before anyone else, suggested Dr. Kalyanaraman as my advisor, and has been a valuable mentor and friend. I would like to thank Dr. Sriram Krishnamoorthy for his role on my graduate committee, for his research support at our employer, Pacific Northwest Northwest Laboratory (PNNL), and for his mentoring role while performing my research. I would like to thank Dr. John Miller and Dr. Carl Hauser for being on my graduate committee. Last, but not least, I would like to thank my employer, PNNL, for the tuition reimbursement benefit that assisted my pursuit of this degree. iii SCALABLE PARALLEL METHODS FOR ANALYZING METAGENOMICS DATA AT EXTREME SCALE Abstract by Jeffrey Alan Daily, Ph.D. Washington State University May 2015 Chair: Ananth Kalyanaraman, Ph.D. The field of bioinformatics and computational biology is currently experiencing a data revolution. The exciting prospect of making fundamental biological discoveries is fueling the rapid development and deployment of numerous cost-effective, high-throughput next-generation sequencing technologies. The result is that the DNA and protein sequence repositories are being bombarded with new sequence information. Databases are continuing to report a Moores law-like growth trajectory in their database sizes, roughly doubling every 18 months. In what seems to be a paradigm-shift, individual projects are now capable of generating billions of raw sequence data that need to be analyzed in the presence of already annotated sequence information. While it is clear that data-driven methods, such as sequencing homology detection, are becoming the mainstay in the field of computational life sciences, the algorithmic advancements essential for implementing complex data analytics at scale have mostly lagged behind. Sequence homology detection is central to a number of bioinformatics applications including genome sequencing and protein family characterization. Given millions of sequences, the goal is to identify all pairs of sequences that are highly similar (or “homologous”) on the basis of alignment criteria. While there are optimal alignment iv algorithms to compute pairwise homology, their deployment for large-scale is currently not feasible; instead, heuristic methods are used at the expense of quality. In this dissertation, we present the design and evaluation of a parallel implemen- tation for conducting optimal homology detection on distributed memory supercomput- ers. Our approach uses a combination of techniques from asynchronous load balancing (viz. work stealing, dynamic task counters), data replication, and exact-matching filters to achieve homology detection at scale. Results for a collection of 2.56M sequences show parallel efficiencies of ∼75-100% on up to 8K cores, representing a time-to-solution of 33 seconds. We extend this work with a detailed analysis of single-node sequence alignment performance using the latest CPU vector instruction set extensions. Preliminary results re- veal that current sequence alignment algorithms are unable to fully utilize widening vector registers. v TABLE OF CONTENTS Page ACKNOWLEDGEMENTS . iii ABSTRACT . v LIST OF TABLES . ix LIST OF FIGURES . x CHAPTER 1. INTRODUCTION . 1 1.1 Quantifying Scaling Requirements . 4 1.2 Contributions . 5 2. BACKGROUND AND RELATED WORK . 7 2.1 Algorithms and Data Structures for Homology Detection . 7 2.1.1 Notation . 7 2.1.2 Sequence Alignment . 8 2.1.3 Sequence Homology . 10 2.1.4 String Indices for Exact Matching . 11 2.2 Parallelization of Homology Detection . 14 2.2.1 Vectorization Opportunities in Sequence Alignment . 15 2.2.2 Distributed Alignments . 18 3. SCALABLE PARALLEL METHODS . 20 3.1 Filtering Sequence Pairs . 20 vi 3.1.1 Suffix Tree Filter . 20 3.1.2 Length Based Cutoff Filter . 23 3.1.3 Storing Large Numbers of Tasks . 24 3.1.4 Storing Large Sequence Datasets . 24 3.2 Load Imbalance . 25 3.2.1 Load Imbalance Caused by Filters . 25 3.2.2 Load Imbalance in Homology Detection . 26 3.2.3 Solutions to Load Imbalance . 27 3.3 Implementation . 29 4. RESULTS AND DISCUSSION . 31 4.1 Compute Resources . 31 4.2 Datasets . 31 4.3 Length-Based Filter . 32 4.4 Suffix Tree Filter . 33 4.4.1 Suffix Tree Heuristic Parameters . 34 4.4.2 Distributed Datasets . 36 4.4.3 Strong Scaling . 36 5. EXTENSIONS . 42 5.1 Vectorized Sequence Alignments using Prefix Scan . 42 5.1.1 Algorithmic Comparison of Striped and Scan . 44 5.1.2 Workload Characterization . 47 5.1.3 Empirical Characterization . 48 5.1.3.1 Cache Analysis . 51 5.1.3.2 Instruction Mix Analysis . 52 5.1.3.3 Query Length versus Performance . 54 vii 5.1.3.4 Query Length versus Number of Striped Corrections . 56 5.1.3.5 Scoring Criteria Analysis . 57 5.1.3.6 Prescriptive Solutions on Choice of Algorithm . 58 5.2 Communication-Avoiding Filtering using Tiling . 60 6. CONCLUSIONS AND FUTURE WORK . 64 viii LIST OF TABLES Page 2.1 Enhanced Suffix Array for Input String s = mississippi$. 14 3.1 Notation used in the dissertation. 21 5.1 Relative performance of each vector implementation approach. For this study, we used a small but representative protein sequence dataset and aligned every protein to each other protein. The vector implementations used the SSE4.1 ISA, splitting the 128-bit vector register into 8 16-bit in- tegers. The codes were run using a single thread of execution on a Haswell CPU running at 2.3 Ghz. 44 5.2 Cache analysis of all-to-all sequence alignment for the Bacteria 2K dataset on Haswell. Scan and Striped are using 4, 8, and 16 Lanes. For both scan and striped, instruction and data miss rates for all cache levels were less than 1 percent and are therefore not shown. The primary difference between approaches is indicated by the instruction and data references. 51 5.3 Cache analysis of all-to-all sequence alignment for the Bacteria 2K dataset on Xeon Phi. Scan and Striped are using only 16 lanes because the KNC ISA only supports 32-bit integers which restricts this analysis to 16 lanes. 52 5.4 Decision table showing which algorithm should be used given a particular class of sequence alignment and query length. 58 ix LIST OF FIGURES Page 2.1 Input string s = mississippi$ and corresponding suffix tree Γ. 13 2.2 Known ways to vectorize Smith-Waterman alignments using vectors with four elements. The tables shown here represent a query sequence of length 18 against a database sequence of length 6. Alignment tables are shown with colored elements, indicating the most recently computed cells. In or- der of most recently computed to least recently, the order is green, yellow, orange, and red. Dark gray cells were computed more than four vector epochs ago. Light gray cells indicate padded cells, which are required to properly align the computation(s) but are otherwise ignored or discarded. The blue lines indicate the relevant portion of the tables with the table ori- gin in the upper left corner. (Blocked) Vectors run parallel to the the query sequence. Each vector may need to recompute until values converge. First described by Rognes and Seeberg (Rognes and Seeberg, 2000). (Diagonal) Vectors run parallel to the anti-diagonal. Fist described by Wozniak (Woz- niak, 1997). (Striped) Vectors run parallel to the query using a striped data layout. A column may need to be recomputed at most P − 1 times until the values converge. First described by Farrar (Farrar, 2007). (Scan) This is the approach taken in this dissertation. It is similar to (Striped) but requires exactly two iterations over a column. 16 3.1 Characterization of time spent in alignment operations for an all-against-all alignment of a 15K sequence dataset from the CAMERA database. 27 x 3.2 Schematic illustration of the execution on a compute node. One thread is reserved to facilitate the transfer of tasks. Tasks are stolen only from the shared portion of the deque and delivered only to the private portion. The set of input sequences typically fits entirely within a compute node and is shared by all worker threads. The task pool starts having only subtree processing (pair generation) tasks but as subtree tasks are processed they add pair alignment tasks to the pool, as well. The dynamic creation and stealing of tasks causes the tasks to become unordered. 30 4.1 Execution times for the 80K dataset using the dynamic load balancing strategies of work stealing (‘Brute’) and distributed task counters (‘Counter’). Additionally, a length-based filter is applied to each strategy. Work steal- ing iterators are not considered as they performed similarly to the original work stealing approach. 33 4.2 Execution times for suffix tree filter and best non-tree filter strategy for the 80K sequence dataset. 35 4.3 Execution times for suffix tree filter for the 80K sequence dataset when it is replicated on each node or distributed across nodes.
Recommended publications
  • 120421-24Recombschedule FINAL.Xlsx
    Friday 20 April 18:00 20:00 REGISTRATION OPENS in Fira Palace 20:00 21:30 WELCOME RECEPTION in CaixaForum (access map) Saturday 21 April 8:00 8:50 REGISTRATION 8:50 9:00 Opening Remarks (Roderic GUIGÓ and Benny CHOR) Session 1. Chair: Roderic GUIGÓ (CRG, Barcelona ES) 9:00 10:00 Richard DURBIN The Wellcome Trust Sanger Institute, Hinxton UK "Computational analysis of population genome sequencing data" 10:00 10:20 44 Yaw-Ling Lin, Charles Ward and Steven Skiena Synthetic Sequence Design for Signal Location Search 10:20 10:40 62 Kai Song, Jie Ren, Zhiyuan Zhai, Xuemei Liu, Minghua Deng and Fengzhu Sun Alignment-Free Sequence Comparison Based on Next Generation Sequencing Reads 10:40 11:00 178 Yang Li, Hong-Mei Li, Paul Burns, Mark Borodovsky, Gene Robinson and Jian Ma TrueSight: Self-training Algorithm for Splice Junction Detection using RNA-seq 11:00 11:30 coffee break Session 2. Chair: Bonnie BERGER (MIT, Cambrige US) 11:30 11:50 139 Son Pham, Dmitry Antipov, Alexander Sirotkin, Glenn Tesler, Pavel Pevzner and Max Alekseyev PATH-SETS: A Novel Approach for Comprehensive Utilization of Mate-Pairs in Genome Assembly 11:50 12:10 171 Yan Huang, Yin Hu and Jinze Liu A Robust Method for Transcript Quantification with RNA-seq Data 12:10 12:30 120 Zhanyong Wang, Farhad Hormozdiari, Wen-Yun Yang, Eran Halperin and Eleazar Eskin CNVeM: Copy Number Variation detection Using Uncertainty of Read Mapping 12:30 12:50 205 Dmitri Pervouchine Evidence for widespread association of mammalian splicing and conserved long range RNA structures 12:50 13:10 169 Melissa Gymrek, David Golan, Saharon Rosset and Yaniv Erlich lobSTR: A Novel Pipeline for Short Tandem Repeats Profiling in Personal Genomes 13:10 13:30 217 Rory Stark Differential oestrogen receptor binding is associated with clinical outcome in breast cancer 13:30 15:00 lunch break Session 3.
    [Show full text]
  • Extrachromosomal and Other Mechanisms of Oncogene Amplification in Cancer
    Title: Extrachromosomal and other mechanisms of oncogene amplification in cancer Abstract: Increase in the number of copies of tumor promoting (onco-) genes is a hallmark of many cancers, and cancers with copy number amplifications are often associated with poor outcomes. Despite their importance, the mechanisms causing these amplifications are incompletely understood. In this talk, we describe our recent results suggesting that a large faction of amplification is due to formation of extrachromosomal DNA (ecDNA). EcDNA play a critical role in tumor heterogeneity, accelerated cancer evolution, and drug resistance through their unique mechanism of non-chromosomal inheritance. While predominant, ecDNA are not the only mechanism to cause amplification. We also describe recent algorithmic methods required to distinguish ecDNA from other mechanisms including Breakage Fusion Bridge formation, Chromothripsis, and simpler events such as tandem duplications and translocations. The talk is a mix of published and unpublished work, largely in collaboration with Paul Mischel's lab at UCSD. EcDNA was recently recognized as one of the grand challenges of cancer research by Cancer Research UK and the National Cancer Institute. Reading Luebeck, 2020, Verhaak 2019 Biography Vineet Bafna, Ph.D., joined the Computer Science faculty at the University of California, San Diego in 2003, after seven years in the biosciences industry. He received his Ph.D. in computer science from The Pennsylvania State University in 1994 and was an NSF postdoctoral researcher at the Center for Discrete Mathematics and Theoretical Computer Science for two years. From 1996-99, Bafna was a senior investigator at SmithKline Beecham, conducting research on DNA signaling, target discovery and EST assembly.
    [Show full text]
  • UC San Diego Electronic Theses and Dissertations
    UC San Diego UC San Diego Electronic Theses and Dissertations Title Computational methods for genome-wide non-coding RNA discovery and analysis Permalink https://escholarship.org/uc/item/5qc2h8tf Author Zhang, Shaojie Publication Date 2007 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA, SAN DIEGO Computational Methods for Genome-Wide Non-Coding RNA Discovery and Analysis A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science by Shaojie Zhang Committee in charge: Professor Vineet Bafna, Chair Professor Sanjoy Dasgupta Professor Pavel Pevzner Professor Glenn Tesler Professor Steven Wasserman 2007 . Copyright Shaojie Zhang, 2007 All rights reserved. The dissertation of Shaojie Zhang is approved, and it is acceptable in quality and form for publication on micro- ¯lm: Chair University of California, San Diego 2007 iii To my parents. iv TABLE OF CONTENTS Signature Page . iii Dedication . iv Table of Contents . v List of Figures . vii List of Tables . viii Acknowledgements . ix Vita, Publications, and Fields of Study . xi Abstract . xiii 1 Introduction . 1 1.1 Non-coding RNAs . 1 1.2 RNA secondary structure . 4 1.3 The Challenge of ncRNA Discovery and Analysis . 6 1.3.1 RNA Homolog Search . 7 1.3.2 RNA Consensus Folding for ncRNA Discovery . 8 1.4 Dissertation Outline . 9 2 FastR: Fast RNA Search Using Structure-based Filters . 11 2.1 Introduction . 11 2.2 Methods . 14 2.2.1 Strucutre-based Filters . 14 2.2.2 Structure-based Filter Design . 17 2.2.3 Optimal Structure-based Filter Design .
    [Show full text]
  • Call for Papers (Page 1)
    CALL FOR PAPERS IEEE Computational Systems Bioinformatics Conference Stanford, CA • August 8-11, 2005 You are invited to submit papers to the 2005 IEEE Computational Systems Bioinformatics Conference (CSB2005). The conference’s goal is to facilitate exchange of ideas and collaborations between computer scientists and biologists by presenting cutting-edge computational biology research findings. Such research has an interdisciplinary character. Computer science and mathematical modeling papers must contain a concise description of the biological problem being solved, and biology papers should show how computation or analysis affects the results. Topics of interest include (but are not limited to): • Microarray Data Analysis • Mathematical and Quantitative Models • MicroRNA and RNAi of Cellular and Multicellular Systems • Pathways, Networks, Systems Biology • Synthetic Biological Systems • Biomedical Applications • Sequence Alignment • Biological Data Visualization • Evolution and Phylogenetics • Protein Structures and Complexes • Functional Genomics • Biological Data Mining • High Performance Bio-computing • Pattern Recognition • Comparative Genomics • Microbial Community Analysis • SNPs and Haplotyping • Promoter Analysis and Discovery Full papers are limited to 12 pages, single-spaced, in 12-point type, including title, abstract (250 words or less), figures, tables, text, and bibliography. The first page should give keywords, authors’ postal and electronic mailing addresses. Papers must not have been previously published and must not be currently under consideration for publication elsewhere. Papers will be submitted electronically in MS Word, postscript or PDF format. This year, the conference will also accept short papers, limited to four pages. These papers should describe new research activity in which a complete set of results may not yet be available. Full and short papers will have 25 and 15 minutes, respectively, of presentation time.
    [Show full text]
  • Computational Discovery of Splicing Events from High-Throughput Omics Data
    COMPUTATIONAL DISCOVERY OF SPLICING EVENTS FROM HIGH-THROUGHPUT OMICS DATA by Yen-Yi Lin M.Sc., National Taiwan University, 2008 B.Sc., National Taiwan University, 2003 Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the School of Computing Sciences Faculty of Applied Sciences c Yen-Yi Lin 2017 SIMON FRASER UNIVERSITY Summer 2017 Copyright in this work rests with the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation. Approval Name: Yen-Yi Lin Degree: Doctor of Philosophy (Computing Sciences) Title: COMPUTATIONAL DISCOVERY OF SPLICING EVENTS FROM HIGH-THROUGHPUT OMICS DATA Examining Committee: Chair: Faraz Hach University Research Associate Cenk Sahinalp Senior Supervisor Professor Martin Ester Senior Supervisor Professor Peter Unrau Supervisor Professor Dept. of Molecular Biology and Biochemistry Leonid Chindelevitch Internal Examiner Assistant Professor Vineet Bafna External Examiner Professor Computer Science and Engineering University of California, San Diego Date Defended: July 18, 2017 ii Abstract The splicing mechanism, the process of forming mature messenger RNA (mRNA) by only concatenating exons and removing introns, is an essential step in gene expression. It allows a single gene to have multiple RNA isoforms which potentially code different proteins. In addition, aberrant transcripts generated from non-canonical splicing events (e.g. gene fusions) are believed to be potential drivers in many tumor types and human diseases. Thus, identification and quantification of expressed RNAs from RNA-Seq data become fundamental steps in many clinical studies. For that reason, number of methods have been developed. Most popular computational methods designed for these high-throughput omics data start by analyzing the datasets based on existing gene annotations.
    [Show full text]
  • Front Matter
    Cambridge University Press 978-1-107-01146-5 - Bioinformatics for Biologists Edited by Pavel Pevzner and Ron Shamir Frontmatter More information BIOINFORMATICS FOR BIOLOGISTS The computational education of biologists is changing to prepare students for facing the complex data sets of today’s life science research. In this concise textbook, the authors’ fresh pedagogical approaches lead biology students from first principles towards computational thinking. A team of renowned bioinformaticians take innovative routes to introduce computational ideas in the context of real biological problems. Intuitive explanations promote deep understanding, using little mathematical formalism. Self-contained chapters show how computational procedures are developed and applied to central topics in bioinformatics and genomics, such as the genetic basis of disease, genome evolution, or the tree of life concept. Using bioinformatic resources requires a basic understanding of what bioinformatics is and what it can do. Rather than just presenting tools, the authors – each a leading scientist – engage the students’ problem-solving skills, preparing them to meet the computational challenges of their life science careers. PAVEL PEVZNER is Ronald R. Taylor Professor of Computer Science and Director of the Bioinformatics and Systems Biology Program at the University of California, San Diego. He was named a Howard Hughes Medical Institute Professor in 2006. RON SHAMIR is Raymond and Beverly Sackler Professor of Bioinformatics and head of the Edmond J. Safra Bioinformatics
    [Show full text]
  • Research in Computational Molecular Biology
    Terry Speed Haiyan Huang (Eds.) Research in Computational Molecular Biology 1 lth Annual International Conference, RECOMB 2007 Oakland, CA, USA, April 21-25, 2007 Proceedings Sprin ger Table of Contents QNet: A Tool for Querying Protein Interaction Networks 1 Banu Dost, Tomer Shlomi, Nitin Gupta, Eytan Ruppin, Vineet Bafna, and Roded Sharan Pairwise Global Alignment of Protein Interaction Networks by Matching Neighborhood Topology 16 Rohit Singh, Jinbo Xu, and Bonnie Berger Reconstructing the Topology of Protein Complexes 32 Allister Bernard, David S. Vaughn, and Alexander J. Hartemink Network Legos: Building Blocks of Cellular Wiring Diagrams 47 T.M. Murali and Corban G. Rivera An Efficient Method for Dynamic Analysis of Gene Regulatory Networks and in silico Gene Perturbation Experiments 62 Abhishek Garg, Ioannis Xenarios, Luis Mendoza, and Giovanni DeMicheli A Feature-Based Approach to Modeling Protein-DNA Interactions 77 Eilon Sharon and Eran Segal Network Motif Discovery Using Subgraph Enumeration and Symmetry-Breaking 92 Joshua A. Grochow and Manolis Kellis Nucleosome Occupancy Information Improves de novo Motif Discovery 107 Leelavati Narlikar, Raluca Gordan, and Alexander J. Hartemink Framework for Identifying Common Aberrations in DNA Copy Number Data 122 Amir Ben-Dor, Doron Lipson, Anya Tsalenko, Mark Reimers, Lars O. Baumbusch, Michael T. Barrett, John N. Weinstein, Anne-Lise B0rresen-Dale, and Zohar Yakhini Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models 137 Wenyi Wang, Benilton Carvalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, and Rafael A. Irizarry GIMscan: A New Statistical Method for Analyzing Whole-Genome Array CGH Data 151 Yanxin Shi, Fan Guo, Wei Wu, and Eric P.
    [Show full text]
  • Table of Contents More Information
    Cambridge University Press 978-1-107-01146-5 - Bioinformatics for Biologists Edited by Pavel Pevzner and Ron Shamir Table of Contents More information CONTENTS Extended contents ix Preface xv Acknowledgments xxi Editors and contributors xxiv A computational micro primer xxvi P A R T I Genomes 1 1 Identifying the genetic basis of disease 3 Vineet Bafna 2 Pattern identification in a haplotype block 23 Kun-Mao Chao 3 Genome reconstruction: a puzzle with a billion pieces 36 Phillip E. C. Compeau and Pavel A. Pevzner 4 Dynamic programming: one algorithmic key for many biological locks 66 Mikhail Gelfand 5 Measuring evidence: who’s your daddy? 93 Christopher Lee P A R T II Gene Transcription and Regulation 109 6 How do replication and transcription change genomes? 111 Andrey Grigoriev 7 Modeling regulatory motifs 126 Sridhar Hannenhalli 8 How does the influenza virus jump from animals to humans? 148 Haixu Tang vii © in this web service Cambridge University Press www.cambridge.org Cambridge University Press 978-1-107-01146-5 - Bioinformatics for Biologists Edited by Pavel Pevzner and Ron Shamir Table of Contents More information viii Contents P A R T III Evolution 165 9 Genome rearrangements 167 Steffen Heber and Brian E. Howard 10 Comparison of phylogenetic trees and search for a central trend in the “Forest of Life” 189 Eugene V.Koonin, Pere Puigbo,` and Yuri I. Wolf 11 Reconstructing the history of large-scale genomic changes: biological questions and computational challenges 201 Jian Ma P A R T IV Phylogeny 225 12 Figs, wasps, gophers, and lice: a computational exploration of coevolution 227 Ran Libeskind-Hadas 13 Big cat phylogenies, consensus trees, and computational thinking 248 Seung-Jin Sul and Tiffani L.
    [Show full text]
  • Friday (August 4) Intelligent Systems for Molecular Biology
    and 2nd Annual AB3C Conference: X-Meeting fortaleza, brazil - auGust 6-10, 2006 friday (auGust 4) Intelligent Systems for Molecular Biology Room A Room B1 Room E1 (2nd Floor) 7:00 a.m. Shuttles depart Hotels to Convention Center 7:30 a.m. - 6:00 p.m. Registration - Hall G 3Dsig Alternative Splicing BOSC 9:00 a.m. - 10:30 a.m. Satellite Meeting SIG SIG 10:30 a.m. - 11:00 a.m. Coffee Break - Satellite & SIG meetings (available outside meeting rooms) 3Dsig Alternative Splicing BOSC 11:00 a.m - 12:00 p.m. Satellite Meeting SIG SIG 12:00 p.m. - 1:00 p.m. Lunch - Satellite and SIG meetings (Hall F) 3Dsig Alternative Splicing BOSC 1:00 p.m. - 3:30 p.m. Satellite Meeting SIG SIG 3:30 p.m. - 4:00 p.m. Coffee Break - Satellite & SIG meetings (available outside meeting rooms) 3Dsig Alternative Splicing BOSC 4:00 p.m. - 6:30 p.m. Satellite Meeting SIG SIG 6:30 p.m. Shuttles depart Convention Center to Hotels 1 and 2nd Annual AB3C Conference: X-Meeting fortaleza, brazil - auGust 6-10, 2006 saturday (auGust 5) Intelligent Systems for Molecular Biology Room A Room B1 Room B2 Room E1 (2nd Floor) 7:00 a.m. Shuttles depart Hotels to Convention Center 7:30 a.m. - 6:00 p.m. Registration - Hall G Joint BioLINK & 3Dsig Alternative Splicing Bio-ontologies BOSC 9:00 a.m. - 10:30 a.m. Satellite Meeting SIG SIG SIG 10:30 a.m. - 11:00 a.m. Coffee Break - Satellite & SIG meetings (available outside meeting rooms) Joint BioLINK & 3Dsig Alternative Splicing Bio-ontologies BOSC 11:00 a.m - 12:00 p.m.
    [Show full text]
  • The Future of Genomic Medicine II the Neurosciences Institute Auditorium San Diego, California
    SCRIPPS GENOMIC MEDICINE A COLL ABoraTion OF S cripps H ea LT H and T H E S C R I P P S R E S E A RC H I NS T I T U T E TheThe Future Future of of Genomic Genomic MedicineMedicine II II in collaboration with: Friday, February 27 and Saturday, February 28, 2009 The Neurosciences Institute Auditorium San Diego, California Dear Colleague, Scripps Genomic Medicine A collaboration between Scripps Health and The Scripps Research Institute We invite you to a special day and a half program on a Friday, February 27 and Saturday, February 28, 2009 in beautiful La Jolla, California. Once again, Scripps Genomic Medicine, an initiative of Scripps Health in collaboration this year’s program will be held at the Neurosciences Institute Auditorium on The with The Scripps Research Institute, supports basic research and clinical pro- Scripps Research Institute campus. grams focused on defining the genes that underlie susceptibility to disease, As you will see in the enclosed program, we are inviting a dynamic group of and will take these findings into drug discovery programs and ultimately into speakers who will cover a wide range of topics. The Friday afternoon session clinical trials. The program’s work involves genotyping tens of thousands of will focus on Technology Breakthroughs and Challenges whereas the Saturday individuals of diverse ancestry in an attempt to identify and define genes re- session will feature a wide variety of topics along with keynote speakers. Our goal is to continue to examine the salient progress and challenges in the field of sponsible for major disease and the underpinnings of health.
    [Show full text]
  • Curriculum Vitae
    Curriculum Vitae Eran Halperin Updated to : October 14, 2016. Affiliation: Professor, Department of Computer Science, University of California, Los Angeles (UCLA) Professor, Department of Anesthesiology, University of California, Los Angeles (UCLA) Research areas: Computational Biology, Genomics, Epigenomics, Statistical Genetics, Population Genetics, , Algo- rithms, Machine Learning. Education: 1997-01 Ph.D. in Computer Science, Tel-Aviv University. Thesis: Approximation algorithms for optimization problems. Advisor: Prof. Uri Zwick. 1993-96 M.Sc. in Computer Science, Tel-Aviv University (Summa Cum Laude). Thesis: Bipartite subgraphs of integer weighted graphs. Advisor: Prof. Noga Alon. 1990-93 B.Sc. in Mathematics and Computer Science, Tel-Aviv University (Summa Cum Laude), Experience Academic Research Positions: 2016-now Professor, Computer Science and Anesthesiology, University of California, Los Angeles (UCLA) 2011-2016 Associate Professor, Blavatnik School of Computer Science, and the Department of Molecular Microbiology and Biotechnology, Tel-Aviv University. 2004-2016 Senior Research Scientist at the International Computer Science Institute (ICSI, Berkeley). 2008-2011 Senior Lecturer, Blavatnik School of Computer Science, and the Department of Molec- ular Microbiology and Biotechnology, Tel-Aviv University. 2003-2004 Research Associate at the Computer Science department of Princeton University. 2001-03 Post doc at the Computer Science department of the University of California in Berke- ley, and at the International Computer Science Institute (ICSI). Hosts: Richard Karp, Christos Papadimitriou, Satish Rao, Alistair Sinclair. July-August 2000 Summer intern in AT&T research labs, Florham Park, New Jersey. Mentor: Edith Cohen. 1 Positions in the industry: 07/11-present Scientific Advisory Board in Genia Technologies (nanopores sequencing tech- nologies) 05/12-present Computational Advisory Board in DNA Nexus 10/12-10/13 Scientific Advisory Board in Gene by Gene 07/07-12/08 Director of Bioinformatics in Navigenics, Inc.
    [Show full text]
  • Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale
    PNNL-24266 Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale May 2015 JA Daily SCALABLE PARALLEL METHODS FOR ANALYZING METAGENOMICS DATA AT EXTREME SCALE By JEFFREY ALAN DAILY A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY WASHINGTON STATE UNIVERSITY School of Electrical Engineering and Computer Science MAY 2015 c Copyright by JEFFREY ALAN DAILY, 2015 All rights reserved c Copyright by JEFFREY ALAN DAILY, 2015 All rights reserved To the Faculty of Washington State University: The members of the Committee appointed to examine the dissertation of JEFFREY ALAN DAILY find it satisfactory and recommend that it be accepted. Ananth Kalyanaraman, Ph.D., Chair John Miller, Ph.D. Carl Hauser, Ph.D. Sriram Krishnamoorthy, Ph.D. ii ACKNOWLEDGEMENT First, I would like to thank Dr. Ananth Kalyanaraman for his supervision and support throughout the work. I would like to thank Dr. Abhinav Vishnu who encouraged my pursuit of my Ph.D. before anyone else, suggested Dr. Kalyanaraman as my advisor, and has been a valuable mentor and friend. I would like to thank Dr. Sriram Krishnamoorthy for his role on my graduate committee, for his research support at our employer, Pacific Northwest Northwest Laboratory (PNNL), and for his mentoring role while performing my research. I would like to thank Dr. John Miller and Dr. Carl Hauser for being on my graduate committee. Last, but not least, I would like to thank my employer, PNNL, for the tuition reimbursement benefit that assisted my pursuit of this degree. iii SCALABLE PARALLEL METHODS FOR ANALYZING METAGENOMICS DATA AT EXTREME SCALE Abstract by Jeffrey Alan Daily, Ph.D.
    [Show full text]