Network 10.2.0.0. User Guide
Version date: 30 December 2020
Copyright © 2020 Fluxus Technology Ltd. All rights reserved.
Legal Disclaimer: This user guide shall not be interpreted as a warranty of any kind. Use of the software is subject to the terms under www.fluxus-engineering.com/network_terms.htm

Preface

In this version of the user guide, I have added some suggestions on next generation sequencing (NGS) data analysis, on the beta version of our VCF2RDF converter, and on SARS-CoV-2 data analysis.

As for nearly everyone else in most parts of the world, our work and lives have been severely affected by the unprecedented events and countermeasures. We thank everyone who continues to support our community-funded Network software by purchasing Network Publisher and DNA Alignment. Specifically, I would like to suggest that those who have sufficient funding upgrade their Network Publisher lab license to a Network Publisher university license. We would welcome more funding to keep our equipment and commercial software developer licenses up to date and to cover the ever-increasing overhead costs.

There are now two versions: Network 10 for current Windows systems, and Network 4 for older Windows systems. Network 4 hits memory limits faster and comes with minor display problems on some modern computers, but will also run on current Windows if you ignore the warning messages. This user guide still contains screen snapshots from an old Network version; apart from their Windows appearance, these screens have not changed in the new versions.

We generally try to answer your emails, but sometimes we get bounce-backs. Before you think that we are ignoring you, please bear in mind that your email provider may be blocking our reply to you.

Looking back, Network has come a long way since we released the first DOS version in January 2000.
The network reduction strategies described in this user guide are even more relevant today than in 2000, when data sizes were generally smaller and “hairballs” were not yet a general problem. If you run into a problem, we hope that this updated user guide will continue to help you to get a meaningful network out of your data. Looking forward, I wish everyone a rapid return to normal conditions.

Michael Forster, 30 December 2020

Table of Contents

Preface
1. Overview
1.1 Scope of application
1.2 Network building options
1.3 Further complexity reduction options
1.4 Complementary options
2. Work Flow
2.1 Overview of the general work flow and the RM-MJ work flow
2.1.1 Variable data
2.1.2 Preparation of variable data sets for Network
2.1.3 Weights
2.1.4 Frequency
2.1.5 Epsilon (in MJ), Connection Cost / Greedy FHP (in MJ) / MJ square option
2.1.6 Reduction threshold r and out file option (in RM network option)
2.1.7 MP option to clean up networks
2.1.8 Star Contraction option: Use for network simplification, or for identification of population expansion events
2.1.9 "Frequency>1" Criterion for networks with large number of taxa
2.1.10 RM-MJ network calculation for reduced complexity
2.2 DNA nucleotide sequence data
2.2.1 Data entry
2.2.2 Network calculation using the MJ algorithm with optional external rooting
2.2.3 Discussing, analysing, and interpreting network results (MJ and RM)
2.2.4 Graphical layout of results
2.2.4.1 Node and pie chart colouring in Network Publisher 2.0.0.0
2.2.5 Verification using the RM option
2.3 RNA nucleotide sequence data
2.3.1 Data entry
2.4 Amino acid sequence data
2.4.1 Data entry
2.4.2 Network calculation, analysis, interpretation, and graphics
2.5 STR data (short tandem repeat, microsatellite data)
2.5.1 Data entry
2.5.2 Network calculation, analysis, interpretation, and graphics
2.6 Endonuclease data (RFLP, restriction fragment length data)
2.6.1 Data entry
2.6.2 Network calculation, analysis, interpretation, and graphics
2.7 Binary data
2.7.1 Data entry
2.7.2 Network calculation, analysis, interpretation, and graphics
2.8 Time estimates
2.8.1 Calibration of network mutation rate with a known event
2.8.2 Age estimation of a node in the network
3. Software Limits in Network 10.2.0.0.
4. Network 10.2.0.0.: Present and Future
5. Feedback: Bug Reports and Enhancement Requests
6. Next Generation Sequencing (NGS) suggestions
6.1 VCF2RDF Converter
6.2 GenSearchNGS
6.3 SARS-CoV-2
6.3.1 Private mutations of bat coronavirus in (PNAS) network
6.3.2 Suggestions for adding virus genomes to early stage (PNAS) network
6.3.3 Suggestions for alignment of virus genomes
6.3.5 Suggestions for variant-calling in virus genomes

1. Overview

1.1 Scope of application

Network is used to reconstruct phylogenetic networks and trees, to infer ancestral types and potential types, evolutionary branchings, and variants, and to estimate datings. The algorithms are designed for non-recombining bio-molecules.
Successful applications include mtDNA, Y-STR, amino acid, RNA, virus DNA, bacterial DNA, some effectively non-recombining autosomal DNA, and non-biomolecule data such as linguistic data. By contrast, recombining bio-molecules will deliver high-dimensional networks that are difficult to interpret. The work flow, including data preparation and interpretation of results, is described in detail in the next chapters.

1.2 Network building options

The Network software was developed to reconstruct all possible shortest least complex phylogenetic