A Molecular Phylogenetic Toolkit Using Node.Js

RESEARCH ARTICLE phylo-node: A molecular phylogenetic toolkit using Node.js Damien M. O'Halloran1,2* 1 Department of Biological Sciences, The George Washington University, Washington, DC, United States of America, 2 Institute for Neuroscience, The George Washington University, Washington, DC, United States of America * [email protected] Abstract a1111111111 a1111111111 a1111111111 a1111111111 Background a1111111111 Node.js is an open-source and cross-platform environment that provides a JavaScript codebase for back-end server-side applications. JavaScript has been used to develop very fast and user-friendly front-end tools for bioinformatic and phylogenetic analyses. However, no such toolkits are available using Node.js to conduct comprehensive molecular phylogenetic OPEN ACCESS analysis. Citation: O'Halloran DM (2017) phylo-node: A molecular phylogenetic toolkit using Node.js. PLoS Results ONE 12(4): e0175480. https://doi.org/10.1371/ To address this problem, I have developed, phylo-node, which was developed using Node. journal.pone.0175480 js and provides a stable and scalable toolkit that allows the user to perform diverse molecu- Editor: Quan Zou, Tianjin University, CHINA lar and phylogenetic tasks. phylo-node can execute the analysis and process the resulting Received: October 9, 2016 outputs from a suite of software options that provides tools for read processing and genome Accepted: March 27, 2017 alignment, sequence retrieval, multiple sequence alignment, primer design, evolutionary Published: April 14, 2017 modeling, and phylogeny reconstruction. Furthermore, phylo-node enables the user to deploy server dependent applications, and also provides simple integration and interopera- Copyright: © 2017 Damien M. O'Halloran. This is an open access article distributed under the terms tion with other Node modules and languages using Node inheritance patterns, and a cus- of the Creative Commons Attribution License, tomized piping module to support the production of diverse pipelines. which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Conclusions Data Availability Statement: All source code and phylo-node is open-source and freely available to all users without sign-up or login require- user guidelines are openly available at the GitHub ments. All source code and user guidelines are openly available at the GitHub repository: repository: https://github.com/dohalloran/phylo- https://github.com/dohalloran/phylo-node. node. Funding: This work was supported by the George Washington University (GWU) Columbian College of Arts and Sciences, GWU Office of the Vice- President for Research, and the GWU Department of Biological Sciences. Introduction Competing interests: The author has declared that The cost of whole genome sequencing has plummeted over the last decade and as a conse- no competing interests exist. quence, the demand for genome sequencing technology has risen significantly [1]. This PLOS ONE | https://doi.org/10.1371/journal.pone.0175480 April 14, 2017 1 / 8 phylo-node demand has meant that producing large complex datasets of DNA and RNA sequence information is common in small research labs, and in terms of human health this boom in sequence information and precipitous drop in sequencing costs has had a direct impact in the area of personalized medicine [2±5]. However, once the sequence information becomes available, per- haps the greater challenge is then processing, analyzing, and interpreting the data. To keep pace with this challenge, the development of new, fast, and scalable software solutions is required to visualize and interpret this information. JavaScript is a lightweight programming language that uses a web browser as its host environment. JavaScript is cross-platform and supported by all modern browsers. Because Java- Script is client-side, it is very fast as it doesn't have to communicate with a server and wait for a response in order to run some code. Web browsers are ubiquitous and require no dependen- cies to deploy and operate, and so JavaScript represents an obvious solution for visualizing sequence information. Front-end developments using JavaScript have proven to be extremely efficient in providing fast, easy-to-use, and embeddable solutions for data analysis [6±14]. A very active community of developers at BioJS (biojs.io/) provides diverse components for parsing sequence data types, data visualization, and bioinformatics analysis in JavaScript [6,7,15±19]. Node.js provides server-side back-end JavaScript. Node.js is written in C, C++, and Java- Script and uses the Google Chrome V8 engine to offer a very fast cross-platform environment for developing server side Web applications. Node is a single-threaded environment, which means that only one line of code will be executed at any given time; however, Node employs non-blocking techniques for I/O tasks to provide an asynchronous ability, by using callback functions to permit the parallel running of code. Node holds much potential for the bioinformatic analysis of molecular data. A community of Node developers provides modules for bioinformatic sequence workflows (biojs.io/) which in time will likely parallel the BioJS community for the number of modules versus components. However, as of now there are no robust tools for phylogenetic analysis pipelines currently available using the Node.js codebase. To fill this void I have developed, phylo-node, which provides a Node.js toolkit that provides sequence retrieval, primer design, alignment, phylogeny reconstruction and as well as much more, all from a single toolkit. phylo-node is fast, easy to use, and offers simple customization and portability options through various inheritance patterns. The Node package manager, npm (https://www.npmjs.com/), provides a very easy and efficient way to manage dependen- cies for any Node application. phylo-node is available at GitHub (https://github.com/ dohalloran/phylo-node), npm (https://www.npmjs.com/package/phylo-node), and also BioJS (biojs.io/d/phylo-node). Materials and methods phylo-node was developed using the Node.js codebase. The phylo-node core contains a base wrapper object that is used to prepare the arguments and directory prior to program execution. The base wrapper module is contained within the Wrapper_core directory (Fig 1). An individual software tool can be easily accessed and executed by importing the module for that tool so as to get access to the method properties on that object. These method properties are available to the user by using the module.exports reference object. Inside a driver script file, the user can import the main module object properties and variables by using the require keyword which is used to import a module in Node.js. The require keyword is actually a global variable, and a script has access to its context because it is wrapped prior to execution inside the runInThisContext function (for more details, refer to the Node.js source code: https://github. com/nodejs). Once imported, the return value is assigned to a variable which is used to access PLOS ONE | https://doi.org/10.1371/journal.pone.0175480 April 14, 2017 2 / 8 phylo-node Fig 1. Workflow for phylo-node. phylo-node is organized into a workflow of connected modules and application scripts. In order to interface with a software tool, the base wrapper module is invoked to process command-line requests that are then passed into the software specific module. The input for the specific software can be passed into the base wrapper from a folder specified by the user or by using the sequence retrieval module which is contained within the Sequence directory. The Pipes directory contains a module for easy piping of data between applications while binaries and executables can be downloaded using the get_executable module from within the Download folder to deploy software specific modules within the Run directory or to provide applications to a web server from within the Server directory. https://doi.org/10.1371/journal.pone.0175480.g001 the various method properties on that object. For example: a method property on the phyml object is phyml.getphyml(), which invokes the getphyml method on the phyml object to download and decompress the PhyML executable. For a complete list of all methods, refer to the README.md file at the GitHub repository (https://github.com/dohalloran/phylo-node/blob/ master/README.md) and for a tutorial on using phylo-node refer to the video here: https:// www.youtube.com/watch?v=CNcFgW122lI. In order to correctly wrap and run each executable, new shells must be spawned so as to execute specific command formats for each executable. This was achieved by using child.process.exec, which will launch an external shell and execute the command inside that shell while buffering any output by the process. Binary files and executables were downloaded and executed in this manner and the appropriate file and syntax selected by determining the user's operating system. phylo-node was validated on Microsoft Windows 7 Enterprise ver.6.1, MacOSX El Capitan ver.10.11.5, and Linux Ubuntu 64-bit ver.14.04 LTS. Results and discussion phylo-node is a toolkit to interface with key applications necessary in building a phylogenetic pipeline (Table 1). Firstly, phylo-node allows the user to remotely download sequences by building a unique URL and passing this string to the NCBI e-utilities API (http://www.ncbi. nlm.nih.gov/books/NBK25501/). Any number of genes can be supplied as command-line arguments to phylo-node by accessing the fetch_seqs.fasta method on the fetch_seqs object in PLOS ONE | https://doi.org/10.1371/journal.pone.0175480 April 14, 2017 3 / 8 phylo-node Table 1. Summary of phylo-node applications. Module Description Source Application Citation fetch_seqs Remotely retrieve https://www.ncbi.nlm.nih.gov/books/ ASN.1 and FASTA sequences - sequence data NBK25501/ get_executable Download binaries and https://www.npmjs.com/package/ batch, exe, jar etc.

Load more