Lecture Notes in Bioinformatics 12715

Subseries of Lecture Notes in Computer Science

Series Editors

Sorin Istrail, Brown University, Providence, RI, USA
Pavel Pevzner, University of California, San Diego, CA, USA
University of Southern California, Los Angeles, CA, USA

Editorial Board Members

Søren Brunak, Technical University of Denmark, Kongens Lyngby, Denmark
Mikhail S. Gelfand, IITP, Research and Training Center on Bioinformatics, Moscow, Russia
Max Planck Institute for Informatics, Saarbrücken, Germany
University of Tokyo, Tokyo, Japan
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
Marie-France Sagot, Université Lyon 1, Villeurbanne, France
University of Ottawa, Ottawa, Canada
Tel Aviv University, Ramat Aviv, Tel Aviv, Israel
Terry Speed, Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia
Max Planck Institute for Molecular Genetics, Berlin, Germany
W. Eric Wong, University of Texas at Dallas, Richardson, TX, USA

More information about this subseries at http://www.springer.com/series/5381

Carlos Martín-Vide • Miguel A. Vega-Rodríguez • Travis Wheeler (Eds.)

Algorithms for Computational Biology

8th International Conference, AlCoB 2021
Missoula, MT, USA, June 7–11, 2021
Proceedings

Editors

Carlos Martín-Vide, Rovira i Virgili University, Tarragona, Spain
Miguel A. Vega-Rodríguez, University of Extremadura, Cáceres, Spain
Travis Wheeler, University of Montana, Missoula, MT, USA

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Bioinformatics
ISBN 978-3-030-74431-1
ISBN 978-3-030-74432-8 (eBook)
https://doi.org/10.1007/978-3-030-74432-8

LNCS Sublibrary: SL8 – Bioinformatics

© Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

These proceedings contain the papers that were presented at the 8th International Conference on Algorithms for Computational Biology (AlCoB 2021), held in Missoula, Montana, USA, during June 7–11, 2021. Due to the COVID-19 pandemic, AlCoB 2020 and AlCoB 2021 were merged and held together on these dates. AlCoB 2020 proceedings were published as LNCS/LNBI 12099.

The scope of AlCoB includes topics of either theoretical or applied interest, namely:

– Sequence analysis
– Sequence alignment
– Sequence assembly
– Genome rearrangement
– Regulatory motif finding
– Phylogeny reconstruction
– Phylogeny comparison
– Structure prediction
– Compressive genomics
– Proteomics: molecular pathways, interaction networks, mass spectrometry analysis
– Transcriptomics: splicing variants, isoform inference and quantification, differential analysis
– Next-generation sequencing: population genomics, metagenomics, metatranscriptomics, epigenomics
– Genome CD architecture
– Microbiome analysis
– Cancer computational biology
– Systems biology

AlCoB 2021 received 22 submissions. Most papers were reviewed by three Program Committee members. A few external reviewers were also consulted. After a thorough and lively discussion phase, the committee decided to accept 12 papers (an acceptance rate of about 55%). The conference program included three invited talks and some poster presentations of work in progress.

The excellent facilities provided by the EasyChair conference management system allowed us to deal with the submissions successfully and to prepare these proceedings in time.

We would like to thank all invited speakers and authors for their contributions, the Program Committee and the external reviewers for their cooperation, and Springer for its very professional publishing work.

March 2021
Carlos Martín-Vide
Miguel A. Vega-Rodríguez
Travis Wheeler

Organization

AlCoB 2021 was organized by the University of Montana, Missoula, USA, and the Institute for Research Development, Training and Advice (IRDTA), Brussels/London, Belgium/UK.

Program Committee

Ludmil Alexandrov, University of California, San Diego, USA
Can Alkan, Bilkent University, Turkey
Mani Arumugam, University of Copenhagen, Denmark
Massachusetts Institute of Technology, USA
Sanchita Bhattacharya, University of California, San Francisco, USA
Chao Cheng, Dartmouth College, USA
Keith Crandall, George Washington University, USA
Colin Dewey, University of Wisconsin-Madison, USA
Ian Dunham, European Bioinformatics Institute, UK
Anton Enright, University of Cambridge, UK
Joe Felsenstein, University of Washington, USA
Pedro G. Ferreira, University of Porto, Portugal
Martin Frith, University of Tokyo, Japan
Debashis Ghosh, University of Colorado, USA
Michael Gribskov, Purdue University, USA
Michael Hawrylycz, Allen Institute for Brain Science, USA
Daniel Huson, University of Tübingen, Germany
Kazutaka Katoh, Osaka University, Japan
Miriam Konkel, Clemson University, USA
Maria-Jesus Martin, European Bioinformatics Institute, UK
Carlos Martín-Vide (Chair), Rovira i Virgili University, Spain
David H. Mathews, University of Rochester, USA
Aaron McKenna, Dartmouth College, USA
Ryan E. Mills, University of Michigan, USA
Burkhard Morgenstern, University of Göttingen, Germany
Zemin Ning, Wellcome Sanger Institute, UK
Joel S. Parker, University of North Carolina at Chapel Hill, USA
Kay Prüfer, Max Planck Institute for Evolutionary Anthropology, Germany
Knut Reinert, Free University of Berlin, Germany
Walter L. Ruzzo, University of Washington, USA
Russell Schwartz, Carnegie Mellon University, USA
Gordon Smyth, Walter and Eliza Hall Institute of Medical Research, Australia
Peter F. Stadler, Leipzig University, Germany
Barcelona Supercomputing Center, Spain
Fabio Vandin, University of Padua, Italy
Kai Wang, Children’s Hospital of Philadelphia, USA
Matt T. Weirauch, Cincinnati Children’s Hospital, USA
Travis Wheeler, University of Montana, USA
Zohar Yakhini, Interdisciplinary Center Herzliya, Israel
Shibu Yooseph, University of Central Florida, USA

Additional Reviewer

Marzieh Eslami Rasekh

Organizing Committee

Sara Morales, IRDTA, Brussels, Belgium
Manuel Parra-Royón, University of Granada, Spain
David Silva (Co-chair), IRDTA, London, UK
Miguel A. Vega-Rodríguez, University of Extremadura, Cáceres, Spain
Travis Wheeler (Co-chair), University of Montana, Missoula, USA

Contents

Biological Dynamical Systems and Other Biological Processes

Learning Molecular Classes from Small Numbers of Positive Examples Using Graph Grammars
Ernst Althaus, Andreas Hildebrandt, and Domenico Mosca

Can We Replace Reads by Numeric Signatures? Lyndon Fingerprints as Representations of Sequencing Reads for Machine Learning
Paola Bonizzoni, Clelia De Felice, Alessia Petescia, Yuri Pirola, Raffaella Rizzi, Jens Stoye, Rocco Zaccagnino, and Rosalba Zizza

Exploiting Variable Sparsity in Computing Equilibria of Biological Dynamical Systems by Triangular Decomposition
Wenwen Ju and Chenqi Mou

A Recovery Algorithm and Pooling Designs for One-Stage Noisy Group Testing Under the Probabilistic Framework
Yining Liu, Sachin Kadyan, and Itsik Pe’er

Phylogenetics

Novel Phylogenetic Network Distances Based on Cherry Picking
Kaari Landry, Aivee Teodocio, Manuel Lafond, and Olivier Tremblay-Savard

Best Match Graphs with Binary Trees
David Schaller, Manuela Geiß, Marc Hellmuth, and Peter F. Stadler

Scalable and Accurate Phylogenetic Placement Using pplacer-XR
Eleanor Wedell, Yirong Cai, and

Comparing Methods for Species Tree Estimation with Gene Duplication and Loss
James Willson, Mrinmoy Saha Roddur, and Tandy Warnow

Sequence Alignment and Genome Rearrangement

Reversal Distance on Genomes with Different Gene Content and Intergenic Regions Information
Alexsandro Oliveira Alexandrino, Klairton Lima Brito, Andre Rodrigues Oliveira, Ulisses Dias, and Zanoni Dias

Reversals Distance Considering Flexible Intergenic Regions Sizes
Klairton Lima Brito, Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Ulisses Dias, and Zanoni Dias

Improved DNA-versus-Protein Homology Search for Protein Fossils
Yin Yao and Martin C. Frith

The Maximum Weight Trace Alignment Merging Problem
Paul Zaharias, Vladimir Smirnov, and Tandy Warnow

Author Index