
sam2lca
Release 0.3.1

Maxime Borry
Sep 07, 2021

Homepage: github.com/maxibor/sam2lca

CONTENTS:

1 sam2lca
  1.1 Quick start
  1.2 Installation
  1.3 Documentation
2 Python API
3 Command Line Interface
  3.1 sam2lca
4 Output
  4.1 JSON
  4.2 CSV

CHAPTER ONE
SAM2LCA

Lowest Common Ancestor from a SAM/BAM/CRAM sequence alignment file

1.1 Quick start

Quick analysis of sequencing reads aligned to a DNA database:

    sam2lca analyze myfile.bam

See all options:

    sam2lca --help
    sam2lca update-db --help
    sam2lca analyze --help

1.2 Installation

1.2.1 From source

    git clone git@github.com:maxibor/sam2lca.git
    conda env create -f environment.yml
    conda activate sam2lca
    pip install git+ssh://git@github.com/maxibor/sam2lca.git

1.2.2 From Conda

    conda install -c conda-forge -c bioconda -c maxibor sam2lca

1.2.3 From PyPI

    pip install sam2lca

1.3 Documentation

The documentation is available here: sam2lca.readthedocs.io
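For scripted workflows, the quick-start command from section 1.1 can also be assembled and launched from Python. The sketch below is only an illustration and is not part of sam2lca: the helper names `build_analyze_command` and `run_analyze` are hypothetical, and it assumes the `sam2lca` executable is on the PATH. It uses only the `analyze` options `--identity`, `--length` and `--output` listed in the Command Line Interface chapter.

```python
import shutil
import subprocess


def build_analyze_command(sam_path, identity=None, length=None, output=None):
    """Assemble the argument list for `sam2lca analyze` (hypothetical helper).

    Options left as None are omitted, so sam2lca falls back to its defaults.
    """
    cmd = ["sam2lca", "analyze"]
    if identity is not None:
        cmd += ["--identity", str(identity)]
    if length is not None:
        cmd += ["--length", str(length)]
    if output is not None:
        cmd += ["--output", str(output)]
    # The SAM/BAM/CRAM path is the single required positional argument.
    cmd.append(str(sam_path))
    return cmd


def run_analyze(sam_path, **options):
    """Run `sam2lca analyze`, raising if the executable is missing or fails."""
    if shutil.which("sam2lca") is None:
        raise RuntimeError("sam2lca executable not found on PATH")
    return subprocess.run(build_analyze_command(sam_path, **options), check=True)
```

Calling `run_analyze("myfile.bam")` would be equivalent to the quick-start command `sam2lca analyze myfile.bam`. Passing the arguments as a list avoids shell quoting problems with unusual file names.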
CHAPTER TWO
PYTHON API

sam2lca.main.sam2lca(sam, mappings, tree, process, identity, length, conserved, dbdir, output)

    Performs LCA on SAM/BAM/CRAM alignment file

    Parameters
    • sam (str) – Path to SAM/BAM/CRAM alignment file
    • mappings (str) – Type of Acc2Tax mapping
    • tree (str) – Optional taxonomic tree
    • process (int) – Number of processes for parallelization
    • identity (float) – Minimum identity
    • length (int) – Minimum alignment length
    • conserved (bool) – Ignore reads mapping in ultraconserved regions
    • dbdir (str) – Path to database storage directory
    • output (str) – Path to sam2lca output file

sam2lca.main.update_database(mappings, dbdir, ncbi)

    Downloads/prepares mappings and taxonomy databases

    Parameters
    • mappings (str) – Type of Acc2Tax mapping
    • dbdir (str) – Path to database storage directory
    • ncbi (bool) – Updates NCBI taxonomic tree

CHAPTER THREE
COMMAND LINE INTERFACE

To access the help menu:

    $ sam2lca --help

The list of arguments and options is detailed below.

3.1 sam2lca

sam2lca: Last Common Ancestor on SAM/BAM/CRAM alignment files

Author: Maxime Borry
Contact: <borry[at]shh.mpg.de>
Homepage & Documentation: github.com/maxibor/sam2lca

    sam2lca [OPTIONS] COMMAND [ARGS]...

Options

--version
    Show the version and exit.

-m, --mappings <mappings>
    Mapping type of accession to TAXID
    Default: nucl
    Options: nucl|prot|test

-d, --dbdir <dbdir>
    Directory to store taxonomy databases
    Default: /home/docs/.sam2lca

3.1.1 analyze

Run the sam2lca analysis

SAM: path to SAM/BAM/CRAM alignment file

    sam2lca analyze [OPTIONS] SAM

Options

-i, --identity <identity>
    Minimum identity
    Default: 0.8

-l, --length <length>
    Minimum alignment length
    Default: 30

-c, --conserved
    Ignore reads mapping in ultraconserved regions

-p, --process <process>
    Number of processes for parallelization
    Default: 2

-t, --tree <tree>
    Optional Newick Taxonomy Tree

-o, --output <output>
    sam2lca output file

Arguments

SAM
    Required argument

3.1.2 update-db

Download/prepare mappings and taxonomy databases

    sam2lca update-db [OPTIONS]

Options

-n, --ncbi
    Update NCBI taxonomy tree

CHAPTER FOUR
OUTPUT

sam2lca generates a JSON file and a CSV file as outputs.

4.1 JSON

A JSON file with NCBI Taxonomy IDs as keys.
• name: scientific name of the taxon
• rank: taxonomic rank of the taxon
• count: number of reads mapping to the taxon
• lineage: taxonomic lineage of the taxon

Example:

    {
        "543": {
            "name": "Enterobacteriaceae",
            "rank": "family",
            "count": 2152,
            "lineage": [
                { "no rank": "root" },
                { "no rank": "cellular organisms" },
                { "superkingdom": "Bacteria" },
                { "phylum": "Proteobacteria" },
                { "class": "Gammaproteobacteria" },
                { "order": "Enterobacterales" },
                { "family": "Enterobacteriaceae" }
            ]
        },
        "300267": {
            "name": "Shigella dysenteriae Sd197",
            "rank": "no rank",
            "count": 338,
            "lineage": [
                { "no rank": "root" },
                { "no rank": "cellular organisms" },
                { "superkingdom": "Bacteria" },
                { "phylum": "Proteobacteria" },
                { "class": "Gammaproteobacteria" },
                { "order": "Enterobacterales" },
                { "family": "Enterobacteriaceae" },
                { "genus": "Shigella" },
                { "species": "Shigella dysenteriae" },
                { "no rank": "Shigella dysenteriae Sd197" }
            ]
        },
        "511145": {
            "name": "Escherichia coli str. K-12 substr. MG1655",
            "rank": "no rank",
            "count": 385,
            "lineage": [
                { "no rank": "root" },
                { "no rank": "cellular organisms" },
                { "superkingdom": "Bacteria" },
                { "phylum": "Proteobacteria" },
                { "class": "Gammaproteobacteria" },
                { "order": "Enterobacterales" },
                { "family": "Enterobacteriaceae" },
                { "genus": "Escherichia" },
                { "species": "Escherichia coli" },
                { "no rank": "Escherichia coli K-12" },
                { "no rank": "Escherichia coli str. K-12 substr. MG1655" }
            ]
        }
    }

4.2 CSV

Rows: Taxa

Columns:
• TAXID: NCBI taxonomy ID
• name: Name of the taxon
• rank: Taxonomic rank
• count: Number of reads assigned to this taxon
• lineage: Taxonomic lineage of this taxon

    TAXID, name, rank, count, lineage
    543, Enterobacteriaceae, family, 2242,"[{'no rank':'root'},{'no rank':'cellular organisms'},{'superkingdom':'Bacteria'},{'phylum':'Proteobacteria'},{'class':'Gammaproteobacteria'},{'order':'Enterobacterales'},{'family':'Enterobacteriaceae'}]"
    511145, Escherichia coli str. K-12 substr. MG1655, no rank, 385,"[{'no rank':'root'},{'no rank':'cellular organisms'},{'superkingdom':'Bacteria'},{'phylum':'Proteobacteria'},{'class':'Gammaproteobacteria'},{'order':'Enterobacterales'},{'family':'Enterobacteriaceae'},{'genus':'Escherichia'},{'species':'Escherichia coli'},{'no rank':'Escherichia coli K-12'},{'no rank':'Escherichia coli str. K-12 substr. MG1655'}]"
    300267, Shigella dysenteriae Sd197, no rank, 248,"[{'no rank':'root'},{'no rank':'cellular organisms'},{'superkingdom':'Bacteria'},{'phylum':'Proteobacteria'},{'class':'Gammaproteobacteria'},{'order':'Enterobacterales'},{'family':'Enterobacteriaceae'},{'genus':'Shigella'},{'species':'Shigella dysenteriae'},{'no rank':'Shigella dysenteriae Sd197'}]"
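The JSON layout described in section 4.1 (taxonomy IDs as keys, each with name, rank, count and lineage) can be read with the Python standard library. The following is a minimal sketch under that assumption: the helper names `top_taxa` and `lineage_name` are hypothetical and not part of sam2lca, and the embedded sample is a trimmed copy of the documented example with shortened lineages.

```python
import json

# Trimmed copy of the example from section 4.1 (lineages shortened for brevity).
EXAMPLE_JSON = """
{
  "543": {
    "name": "Enterobacteriaceae", "rank": "family", "count": 2152,
    "lineage": [{"no rank": "root"}, {"superkingdom": "Bacteria"},
                {"family": "Enterobacteriaceae"}]
  },
  "300267": {
    "name": "Shigella dysenteriae Sd197", "rank": "no rank", "count": 338,
    "lineage": [{"no rank": "root"}, {"family": "Enterobacteriaceae"},
                {"genus": "Shigella"}, {"species": "Shigella dysenteriae"},
                {"no rank": "Shigella dysenteriae Sd197"}]
  },
  "511145": {
    "name": "Escherichia coli str. K-12 substr. MG1655", "rank": "no rank", "count": 385,
    "lineage": [{"no rank": "root"}, {"family": "Enterobacteriaceae"},
                {"genus": "Escherichia"}, {"species": "Escherichia coli"}]
  }
}
"""


def top_taxa(results, n=10):
    """Return (taxid, name, count) tuples sorted by decreasing read count."""
    ranked = sorted(results.items(), key=lambda item: item[1]["count"], reverse=True)
    return [(taxid, rec["name"], rec["count"]) for taxid, rec in ranked[:n]]


def lineage_name(record, rank):
    """Return the first lineage name recorded at `rank`, or None if absent.

    "no rank" can occur several times in one lineage; only the first is returned.
    """
    for entry in record["lineage"]:
        if rank in entry:
            return entry[rank]
    return None


results = json.loads(EXAMPLE_JSON)
for taxid, name, count in top_taxa(results, n=2):
    print(taxid, name, count)
```

For a real run, `json.loads(EXAMPLE_JSON)` would be replaced by `json.load` on the JSON file written by `sam2lca analyze`. Note that JSON keys are strings, so taxonomy IDs need an explicit `int()` conversion if numeric IDs are required.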