Sam2lca's Documentation!

Sam2lca's Documentation!

sam2lca Release 0.3.1 Maxime Borry Sep 07, 2021 CONTENTS: 1 sam2lca 3 1.1 Quick start................................................3 1.2 Installation................................................3 1.3 Documentation..............................................4 2 Python API 5 3 Command Line Interface 7 3.1 sam2lca..................................................7 4 Output 11 4.1 JSON................................................... 11 4.2 CSV................................................... 13 5 Indices and tables 15 Python Module Index 17 Index 19 i ii sam2lca, Release 0.3.1 Homepage: github.com/maxibor/sam2lca CONTENTS: 1 sam2lca, Release 0.3.1 2 CONTENTS: CHAPTER ONE SAM2LCA Lowest Common Ancestor from a SAM/BAM/CRAM sequence alignment file 1.1 Quick start Quick analyis of sequencing reads aligned to a DNA database sam2lca analyze myfile.bam See all options sam2lca --help sam2lca update-db --help sam2lca analyze --help 1.2 Installation 1.2.1 From source git clone [email protected]:maxibor/sam2lca.git conda env create -f environment.yml conda activate sam2lca pip install git+ssh://[email protected]/maxibor/sam2lca.git 1.2.2 From Conda conda install -c conda-forge -c bioconda -c maxibor sam2lca 3 sam2lca, Release 0.3.1 1.2.3 From Pypi pip install sam2lca 1.3 Documentation The documentation is available here: sam2lca.readthedocs.io 4 Chapter 1. sam2lca CHAPTER TWO PYTHON API sam2lca.main.sam2lca(sam, mappings, tree, process, identity, length, conserved, dbdir, output) Performs LCA on SAM/BAM/CRAM alignment file Parameters • sam (str) – Path to SAM/BAM/CRAM alignment file • mappings (str) – Type of Acc2Tax mapping • tree (str) – Optional taxonomic tree • process (int) – Number of process for parallelization • identity (float) – Minimum identity • length (int) – Minimum alignment length • dbdir (str) – Path to database stroring directory • output (str) – Path to sam2lca output file sam2lca.main.update_database(mappings, dbdir, ncbi) Performs LCA on SAM/BAM/CRAM alignment file Parameters • mappings (str) – Type of Acc2Tax mapping • dbdir (str) – Path to database stroring directory • ncbi (bool) – Updates NCBI taxonomic tree 5 sam2lca, Release 0.3.1 6 Chapter 2. Python API CHAPTER THREE COMMAND LINE INTERFACE To access the help menu: $ sam2lca --help The list of arguments of options is detailed below 3.1 sam2lca sam2lca: Last Common Ancestor on SAM/BAM/CRAM alignment files Author: Maxime Borry Contact: <borry[at]shh.mpg.de> Homepage & Documentation: github.com/maxibor/sam2lca sam2lca[OPTIONS] COMMAND[ARGS]... Options --version Show the version and exit. -m, --mappings <mappings> Mapping type of accession to TAXID Default nucl Options nucl|prot|test -d, --dbdir <dbdir> Directory to store taxonomy databases Default /home/docs/.sam2lca 7 sam2lca, Release 0.3.1 3.1.1 analyze Run the sam2lca analysis SAM: path to SAM/BAM/CRAM alignment file sam2lca analyze[OPTIONS] SAM Options -i, --identity <identity> Minimum identity Default 0.8 -l, --length <length> Minimum alignment length Default 30 -c, --conserved Ignore reads mapping in ultraconserved regions -p, --process <process> Number of process for parallelization Default 2 -t, --tree <tree> Optional Newick Taxonomy Tree -o, --output <output> sam2lca output file Arguments SAM Required argument 3.1.2 update-db Download/prepare mappings and taxonomy databases sam2lca update-db[OPTIONS] 8 Chapter 3. Command Line Interface sam2lca, Release 0.3.1 Options -n, --ncbi Update NCBI taxonomy tree 3.1. sam2lca 9 sam2lca, Release 0.3.1 10 Chapter 3. Command Line Interface CHAPTER FOUR OUTPUT sam2lca generates a JSON and CSV file as outputs. 4.1 JSON A JSON file with NCBI Taxonomy IDs as keys. • name: scientific name of the taxon • rank: taxonomic rank of the taxon • count: number of reads mapping to the taxon • lineage: taxonomic lineage of the taxon Example: { "543":{ "name": "Enterobacteriaceae", "rank": "family", "count": 2152, "lineage":[ { "no rank": "root" }, { "no rank": "cellular organisms" }, { "superkingdom": "Bacteria" }, { "phylum": "Proteobacteria" }, { "class": "Gammaproteobacteria" }, { "order": "Enterobacterales" }, { "family": "Enterobacteriaceae" } ] (continues on next page) 11 sam2lca, Release 0.3.1 (continued from previous page) }, "300267":{ "name": "Shigella dysenteriae Sd197", "rank": "no rank", "count": 338, "lineage":[ { "no rank": "root" }, { "no rank": "cellular organisms" }, { "superkingdom": "Bacteria" }, { "phylum": "Proteobacteria" }, { "class": "Gammaproteobacteria" }, { "order": "Enterobacterales" }, { "family": "Enterobacteriaceae" }, { "genus": "Shigella" }, { "species": "Shigella dysenteriae" }, { "no rank": "Shigella dysenteriae Sd197" } ] }, "511145":{ "name": "Escherichia coli str. K-12 substr. MG1655", "rank": "no rank", "count": 385, "lineage":[ { "no rank": "root" }, { "no rank": "cellular organisms" }, { "superkingdom": "Bacteria" }, { "phylum": "Proteobacteria" }, { "class": "Gammaproteobacteria" (continues on next page) 12 Chapter 4. Output sam2lca, Release 0.3.1 (continued from previous page) }, { "order": "Enterobacterales" }, { "family": "Enterobacteriaceae" }, { "genus": "Escherichia" }, { "species": "Escherichia coli" }, { "no rank": "Escherichia coli K-12" }, { "no rank": "Escherichia coli str. K-12 substr. MG1655" } ] } } 4.2 CSV Rows: Taxons Columns: • TAXID: NCBI taxonomy ID • name: Name of the taxon • rank: Taxonomic rank • count: Number of reads assigned to this taxon • lineage: Taxonomic lineage of this taxon TAXID, name, rank, count, lineage 543, Enterobacteriaceae, family, 2242,"[{'no rank':'root'},{'no rank':'cellular ,!organisms'},{'superkingdom':'Bacteria'},{'phylum':'Proteobacteria'},{'class ,!':'Gammaproteobacteria'},{'order':'Enterobacterales'},{'family': ,!'Enterobacteriaceae'}]" 511145, Escherichia coli str.K-12 substr. MG1655, no rank, 385,"[{'no rank':'root'} ,!,{'no rank':'cellular organisms'},{'superkingdom':'Bacteria'},{'phylum': ,!'Proteobacteria'},{'class':'Gammaproteobacteria'},{'order':'Enterobacterales'} ,!,{'family':'Enterobacteriaceae'},{'genus':'Escherichia'},{'species': ,!'Escherichia coli'},{'no rank':'Escherichia coli K-12'},{'no rank': ,!'Escherichia coli str. K-12 substr. MG1655'}]" 300267, Shigella dysenteriae Sd197, no rank, 248,"[{'no rank':'root'},{'no rank': ,!'cellular organisms'},{'superkingdom':'Bacteria'},{'phylum':'Proteobacteria'}, ,! {'class':'Gammaproteobacteria'},{'order':'Enterobacterales'},{'family': ,!'Enterobacteriaceae'},{'genus':'Shigella'},{'species':'Shigella dysenteriae'}, ,! {'no rank':'Shigella dysenteriae Sd197'}]" 4.2. CSV 13 sam2lca, Release 0.3.1 14 Chapter 4. Output CHAPTER FIVE INDICES AND TABLES • genindex • modindex • search 15 sam2lca, Release 0.3.1 16 Chapter 5. Indices and tables PYTHON MODULE INDEX s sam2lca.main,5 17 sam2lca, Release 0.3.1 18 Python Module Index INDEX Symbols sam2lca-update-db command line --conserved option,9 sam2lca-analyze command line -o option,8 sam2lca-analyze command line --dbdir <dbdir> option,8 sam2lca command line option,7 -p --identity <identity> sam2lca-analyze command line sam2lca-analyze command line option,8 option,8 -t --length <length> sam2lca-analyze command line sam2lca-analyze command line option,8 option,8 --mappings <mappings> M sam2lca command line option,7 module --ncbi sam2lca.main,5 sam2lca-update-db command line option,9 S --output <output> SAM sam2lca-analyze command line sam2lca-analyze command line option,8 option,8 --process <process> sam2lca command line option sam2lca-analyze command line --dbdir <dbdir>,7 option,8 --mappings <mappings>,7 --tree <tree> --version,7 sam2lca-analyze command line -d,7 option,8 -m,7 --version sam2lca() (in module sam2lca.main),5 sam2lca command line option,7 sam2lca.main -c module,5 sam2lca-analyze command line sam2lca-analyze command line option option,8 --conserved,8 -d --identity <identity>,8 sam2lca command line option,7 --length <length>,8 -i --output <output>,8 sam2lca-analyze command line --process <process>,8 option,8 --tree <tree>,8 -l -c,8 sam2lca-analyze command line -i,8 option,8 -l,8 -m -o,8 sam2lca command line option,7 -p,8 -n -t,8 19 sam2lca, Release 0.3.1 SAM,8 sam2lca-update-db command line option --ncbi,9 -n,9 U update_database() (in module sam2lca.main),5 20 Index.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    24 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us