Motif Search Prune Algorithm for Parallel Environment
Recommended publications
Bioinformatics Algorithms Techniques and Appli
BIOINFORMATICS ALGORITHMS: Techniques and Applications. Edited by Ion I. Măndoiu and Alexander Zelikovsky. A JOHN WILEY & SONS, INC., PUBLICATION. Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201)-748-6011, fax (201)-748-6008. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation.
A Comprehensive and High-Performance Motif Finding Approach on Heterogeneous Systems
ABSTRACT: A COMPREHENSIVE AND HIGH-PERFORMANCE MOTIF FINDING APPROACH ON HETEROGENEOUS SYSTEMS. Finding unknown regulatory motifs in DNA sequences is a crucial task for understanding gene expression, and it requires both accuracy and efficiency. We propose DMF, a combinatorial approach that uses hash-based heuristics to skip unnecessary computations while retaining the maximum accuracy. Parallelized versions of our DMF approach, called PDMF, have been developed to use CPU, GPU and heterogeneous computing architectures in order to achieve the maximum performance. PDMF also incorporates SIMD instructions to further accelerate the task of unknown motif search. Our experimental results show that the multicore version (PDMFm) achieved an 8.87x speedup over DMF. The GPU version (PDMFg) achieved 41.48x and 9.95x average speedups over the serial version and PDMFm, respectively. Our SIMD-enhanced heterogeneous approach (PDMFh) achieved a 3.42x speedup over our fastest GPU model (PDMFg1). The proposed approach was tested for performance against popular approximate and suffix tree-based approaches on real-world datasets of various sizes, and the experimental results showed that it achieved the maximum accuracy within a practical time bound for motif lengths 6 to 14. Sanjay Soundarajan, May 2020. A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science in the College of Science and Mathematics, California State University, Fresno, May 2020. © 2020 Sanjay Soundarajan. APPROVED For the Department of Computer Science: We, the undersigned, certify that the thesis of the following student meets the required standards of scholarship, format, and style of the university and the student's graduate degree program for the awarding of the master's degree.
A Speedup Technique for (L, D)-Motif Finding Algorithms Sanguthevar Rajasekaran*, Hieu Dinh
Rajasekaran and Dinh, BMC Research Notes 2011, 4:54, http://www.biomedcentral.com/1756-0500/4/54. RESEARCH ARTICLE, Open Access. A speedup technique for (l, d)-motif finding algorithms. Sanguthevar Rajasekaran*, Hieu Dinh. Abstract. Background: The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results: Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of them. The exact version of the PMS problem has been shown to be NP-hard.
A Sample Sequence Selection Algorithm for Quorum Planted Motif Search on Large DNA Datasets Qiang Yu, Dingbang Wei and Hongwei Huo*
Yu et al., BMC Bioinformatics (2018) 19:228, https://doi.org/10.1186/s12859-018-2242-y. RESEARCH ARTICLE, Open Access. SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets. Qiang Yu, Dingbang Wei and Hongwei Huo*. Abstract. Background: Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. Results: We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D’ with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D’. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D’ efficiently and (2) that the qPMS algorithms executed on D’ can find implanted or real motifs in a significantly shorter time than when executed on D.
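The qPMS definition quoted in the abstract above can be made concrete with a small brute-force sketch. This is only an illustration of the problem statement, not SamSelect and not any of the qPMS algorithms the paper refers to; the function names (`hamming`, `occurs_in`, `qpms_bruteforce`) and the floating-point tolerance are assumptions introduced here. It enumerates all 4^l candidate strings, so it is practical only for very small l.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_in(motif, seq, d):
    """True if some l-length window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def qpms_bruteforce(seqs, l, d, q, alphabet="ACGT"):
    """Return every l-length string that occurs, with up to d mismatches,
    in at least q * t of the t input sequences (illustrative only)."""
    threshold = q * len(seqs) - 1e-9  # small tolerance for float rounding
    hits = []
    for cand in product(alphabet, repeat=l):
        motif = "".join(cand)
        count = sum(occurs_in(motif, s, d) for s in seqs)
        if count >= threshold:
            hits.append(motif)
    return hits


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTACGA", "GGGGGG"]
    # Exact matches (d = 0) required in at least half of the sequences.
    print(qpms_bruteforce(seqs, l=3, d=0, q=0.5))
```

With q = 1 the same function solves the ordinary (l, d) planted motif search, since every input sequence must then contain an occurrence.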
An Efficient Approach for Finding Common DNA Motifs with Gaps
M.SC. ENGG. THESIS. An Efficient Approach for Finding Common DNA Motifs with Gaps, by Suri Dipannita Sayeed, Student ID: 0416052004. Submitted to the Department of Computer Science and Engineering in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering. Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka 1000. July 09, 2019. Dedicated to the Department of CSE, BUET. AUTHOR’S CONTACT: Suri Dipannita Sayeed, 282, New Elephant Road, Dhaka-1205. Email: [email protected] The thesis titled “An Efficient Approach for Finding Common DNA Motifs with Gaps”, submitted by Suri Dipannita Sayeed, Roll No. 0416052004P, Session April 2016, to the Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, has been accepted as satisfactory in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering and approved as to its style and contents. Examination held on July 09, 2019. Board of Examiners: 1. Dr. Atif Hasan Rahman, Assistant Professor, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1000 (Chairman, Supervisor). 2. Dr. Md. Mostofa Akbar, Professor and Head of the Department, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1000 (Member). 3. Dr. M. Sohel Rahman, Professor, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1000 (Member). 4. Dr. Md. Shamsuzzoha Bayzid, Assistant Professor, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1000 (Member).