sam2lca Release 0.3.1
Maxime Borry
Sep 07, 2021
CONTENTS:
1 sam2lca
    1.1 Quick start
    1.2 Installation
    1.3 Documentation
2 Python API
3 Command Line Interface
    3.1 sam2lca
4 Output
    4.1 JSON
    4.2 CSV
Homepage: github.com/maxibor/sam2lca
CHAPTER ONE
SAM2LCA
Lowest Common Ancestor from a SAM/BAM/CRAM sequence alignment file
1.1 Quick start
Quick analysis of sequencing reads aligned to a DNA database:
$ sam2lca analyze myfile.bam
See all options:
$ sam2lca --help
$ sam2lca update-db --help
$ sam2lca analyze --help
1.2 Installation
1.2.1 From source
$ git clone git@github.com:maxibor/sam2lca.git
$ cd sam2lca
$ conda env create -f environment.yml
$ conda activate sam2lca
$ pip install git+ssh://git@github.com/maxibor/sam2lca.git
1.2.2 From Conda
$ conda install -c conda-forge -c bioconda -c maxibor sam2lca
1.2.3 From PyPI
$ pip install sam2lca
1.3 Documentation
The documentation is available here: sam2lca.readthedocs.io
CHAPTER TWO
PYTHON API
sam2lca.main.sam2lca(sam, mappings, tree, process, identity, length, conserved, dbdir, output)
    Performs LCA on a SAM/BAM/CRAM alignment file.
    Parameters
    • sam (str) – Path to SAM/BAM/CRAM alignment file
    • mappings (str) – Type of Acc2Tax mapping
    • tree (str) – Optional taxonomic tree
    • process (int) – Number of processes for parallelization
    • identity (float) – Minimum identity
    • length (int) – Minimum alignment length
    • dbdir (str) – Path to database storing directory
    • output (str) – Path to sam2lca output file

sam2lca.main.update_database(mappings, dbdir, ncbi)
    Downloads and prepares the mappings and taxonomy databases.
    Parameters
    • mappings (str) – Type of Acc2Tax mapping
    • dbdir (str) – Path to database storing directory
    • ncbi (bool) – Updates NCBI taxonomic tree
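To make "performs LCA" concrete, here is a minimal sketch of a lowest-common-ancestor computation over taxonomic lineages. It is an illustration only and not sam2lca's implementation: the function name `lowest_common_ancestor` is hypothetical, and the lineages are written in the same shape as the `lineage` field shown in the Output chapter. The idea is that a read aligning to several references is assigned to the deepest taxon shared by all of them.

```python
from typing import Dict, List

# A lineage is an ordered list of single-entry {rank: name} dicts, root first,
# mirroring the "lineage" field of the sam2lca output (assumed shape).
Lineage = List[Dict[str, str]]


def lowest_common_ancestor(lineages: List[Lineage]) -> Dict[str, str]:
    """Return the deepest node shared by all lineages (hypothetical helper)."""
    if not lineages:
        raise ValueError("at least one lineage is required")
    lca = None
    # zip stops at the shortest lineage, so no lineage is indexed past its end.
    for nodes in zip(*lineages):
        if all(node == nodes[0] for node in nodes):
            lca = nodes[0]
        else:
            break
    if lca is None:
        raise ValueError("lineages share no common root")
    return lca


common = [
    {"no rank": "root"},
    {"no rank": "cellular organisms"},
    {"superkingdom": "Bacteria"},
    {"phylum": "Proteobacteria"},
    {"class": "Gammaproteobacteria"},
    {"order": "Enterobacterales"},
    {"family": "Enterobacteriaceae"},
]
shigella = common + [
    {"genus": "Shigella"},
    {"species": "Shigella dysenteriae"},
    {"no rank": "Shigella dysenteriae Sd197"},
]
ecoli = common + [
    {"genus": "Escherichia"},
    {"species": "Escherichia coli"},
    {"no rank": "Escherichia coli K-12"},
    {"no rank": "Escherichia coli str. K-12 substr. MG1655"},
]

# A read hitting both strains would be placed at their shared family.
print(lowest_common_ancestor([shigella, ecoli]))
```

With a single lineage the function returns its last node, which corresponds to a read that aligns to one reference only.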
CHAPTER THREE
COMMAND LINE INTERFACE
To access the help menu:
$ sam2lca --help
The arguments and options are detailed below.
3.1 sam2lca
sam2lca: Last Common Ancestor on SAM/BAM/CRAM alignment files
Author: Maxime Borry
Contact:
sam2lca [OPTIONS] COMMAND [ARGS]...
Options
--version
    Show the version and exit.
-m, --mappings
3.1.1 analyze
Run the sam2lca analysis
SAM: path to SAM/BAM/CRAM alignment file
sam2lca analyze [OPTIONS] SAM
Options
-i, --identity
Arguments
SAM Required argument
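When the analysis has to be launched from a script, one option is to call the command line through Python's `subprocess` module. The sketch below only uses the `sam2lca analyze SAM` form documented above. The helper names `build_analyze_command` and `run_analyze` are hypothetical, and no optional flags are passed.

```python
import shutil
import subprocess
from typing import List


def build_analyze_command(sam_path: str) -> List[str]:
    """Build the argument list for `sam2lca analyze SAM` (hypothetical helper)."""
    return ["sam2lca", "analyze", sam_path]


def run_analyze(sam_path: str) -> int:
    """Run the analysis and return the process exit code."""
    # Fail early with a clear message if sam2lca is not installed.
    if shutil.which("sam2lca") is None:
        raise FileNotFoundError("sam2lca executable not found on PATH")
    completed = subprocess.run(build_analyze_command(sam_path), check=False)
    return completed.returncode


# Only the command is printed here; call run_analyze("myfile.bam") to execute it.
print(build_analyze_command("myfile.bam"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.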
3.1.2 update-db
Download/prepare mappings and taxonomy databases
sam2lca update-db [OPTIONS]
Options
-n, --ncbi
    Update NCBI taxonomy tree
CHAPTER FOUR
OUTPUT
sam2lca generates two output files: a JSON file and a CSV file.
4.1 JSON
A JSON file with NCBI Taxonomy IDs as keys. Each entry has the following fields:
• name: scientific name of the taxon
• rank: taxonomic rank of the taxon
• count: number of reads mapping to the taxon
• lineage: taxonomic lineage of the taxon
Example:
{
    "543": {
        "name": "Enterobacteriaceae",
        "rank": "family",
        "count": 2152,
        "lineage": [
            {"no rank": "root"},
            {"no rank": "cellular organisms"},
            {"superkingdom": "Bacteria"},
            {"phylum": "Proteobacteria"},
            {"class": "Gammaproteobacteria"},
            {"order": "Enterobacterales"},
            {"family": "Enterobacteriaceae"}
        ]
    },
    "300267": {
        "name": "Shigella dysenteriae Sd197",
        "rank": "no rank",
        "count": 338,
        "lineage": [
            {"no rank": "root"},
            {"no rank": "cellular organisms"},
            {"superkingdom": "Bacteria"},
            {"phylum": "Proteobacteria"},
            {"class": "Gammaproteobacteria"},
            {"order": "Enterobacterales"},
            {"family": "Enterobacteriaceae"},
            {"genus": "Shigella"},
            {"species": "Shigella dysenteriae"},
            {"no rank": "Shigella dysenteriae Sd197"}
        ]
    },
    "511145": {
        "name": "Escherichia coli str. K-12 substr. MG1655",
        "rank": "no rank",
        "count": 385,
        "lineage": [
            {"no rank": "root"},
            {"no rank": "cellular organisms"},
            {"superkingdom": "Bacteria"},
            {"phylum": "Proteobacteria"},
            {"class": "Gammaproteobacteria"},
            {"order": "Enterobacterales"},
            {"family": "Enterobacteriaceae"},
            {"genus": "Escherichia"},
            {"species": "Escherichia coli"},
            {"no rank": "Escherichia coli K-12"},
            {"no rank": "Escherichia coli str. K-12 substr. MG1655"}
        ]
    }
}
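The JSON structure can be read with Python's standard `json` module. The sketch below embeds an abridged copy of the example (lineages shortened) so that it runs on its own. The helper names `taxa_by_count` and `name_at_rank` are hypothetical and not part of sam2lca. With a real output file, `json.load` on the opened file would replace `json.loads`.

```python
import json
from typing import Optional

# Abridged copy of the documented example; lineages are shortened for brevity.
example = """
{
  "543": {"name": "Enterobacteriaceae", "rank": "family", "count": 2152,
          "lineage": [{"no rank": "root"}, {"family": "Enterobacteriaceae"}]},
  "300267": {"name": "Shigella dysenteriae Sd197", "rank": "no rank", "count": 338,
             "lineage": [{"no rank": "root"}, {"family": "Enterobacteriaceae"},
                         {"genus": "Shigella"}]},
  "511145": {"name": "Escherichia coli str. K-12 substr. MG1655", "rank": "no rank",
             "count": 385,
             "lineage": [{"no rank": "root"}, {"family": "Enterobacteriaceae"},
                         {"genus": "Escherichia"}]}
}
"""


def taxa_by_count(results: dict) -> list:
    """Return (taxid, name, count) tuples, most abundant taxon first."""
    rows = [(taxid, entry["name"], entry["count"]) for taxid, entry in results.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def name_at_rank(entry: dict, rank: str) -> Optional[str]:
    """Return the first lineage name found at `rank`, or None if absent."""
    for node in entry["lineage"]:
        if rank in node:
            return node[rank]
    return None


results = json.loads(example)
for taxid, name, count in taxa_by_count(results):
    print(taxid, name, count, name_at_rank(results[taxid], "genus"))
```

Because "no rank" can occur several times in one lineage, `name_at_rank` is mainly useful for named ranks such as genus or family.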
4.2 CSV
Rows: taxa
Columns:
• TAXID: NCBI taxonomy ID
• name: Name of the taxon
• rank: Taxonomic rank
• count: Number of reads assigned to this taxon
• lineage: Taxonomic lineage of this taxon
Example:
TAXID,name,rank,count,lineage
543,Enterobacteriaceae,family,2242,"[{'no rank': 'root'}, {'no rank': 'cellular organisms'}, {'superkingdom': 'Bacteria'}, {'phylum': 'Proteobacteria'}, {'class': 'Gammaproteobacteria'}, {'order': 'Enterobacterales'}, {'family': 'Enterobacteriaceae'}]"
511145,Escherichia coli str. K-12 substr. MG1655,no rank,385,"[{'no rank': 'root'}, {'no rank': 'cellular organisms'}, {'superkingdom': 'Bacteria'}, {'phylum': 'Proteobacteria'}, {'class': 'Gammaproteobacteria'}, {'order': 'Enterobacterales'}, {'family': 'Enterobacteriaceae'}, {'genus': 'Escherichia'}, {'species': 'Escherichia coli'}, {'no rank': 'Escherichia coli K-12'}, {'no rank': 'Escherichia coli str. K-12 substr. MG1655'}]"
300267,Shigella dysenteriae Sd197,no rank,248,"[{'no rank': 'root'}, {'no rank': 'cellular organisms'}, {'superkingdom': 'Bacteria'}, {'phylum': 'Proteobacteria'}, {'class': 'Gammaproteobacteria'}, {'order': 'Enterobacterales'}, {'family': 'Enterobacteriaceae'}, {'genus': 'Shigella'}, {'species': 'Shigella dysenteriae'}, {'no rank': 'Shigella dysenteriae Sd197'}]"
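In the example, the lineage column appears to hold a Python-style list literal inside a quoted CSV field. Assuming that is the case, a minimal sketch for reading the file uses the standard `csv` module and decodes the lineage with `ast.literal_eval`. The helper name `read_sam2lca_csv` is hypothetical, and the embedded sample is abridged from the documented example.

```python
import ast
import csv
import io
from typing import Dict, List

# Abridged sample in the documented column layout; lineages are shortened.
example = """TAXID,name,rank,count,lineage
543,Enterobacteriaceae,family,2242,"[{'no rank': 'root'}, {'family': 'Enterobacteriaceae'}]"
300267,Shigella dysenteriae Sd197,no rank,248,"[{'no rank': 'root'}, {'genus': 'Shigella'}]"
"""


def read_sam2lca_csv(handle) -> List[Dict]:
    """Parse rows, converting count to int and lineage to a list of dicts."""
    # skipinitialspace tolerates files written with a space after each comma.
    reader = csv.DictReader(handle, skipinitialspace=True)
    rows = []
    for row in reader:
        row["count"] = int(row["count"])
        # literal_eval only evaluates Python literals, never arbitrary code.
        row["lineage"] = ast.literal_eval(row["lineage"])
        rows.append(row)
    return rows


rows = read_sam2lca_csv(io.StringIO(example))
for row in rows:
    print(row["TAXID"], row["name"], row["count"], row["lineage"][-1])
```

For a file on disk, opening it with `open(path, newline="")` and passing the handle to the same function should work the same way.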