ncbi-taxonomist Documentation Release 1.2.1+8580b9b
Jan P Buchmann
2020-11-15
Contents:
1 Installation
2 Basic functions
3 Cookbook
4 Container
5 Frequently Asked Questions
6 Module references
7 Synopsis
8 Requirements and Dependencies
9 Contact
10 Indices and tables
Python Module Index
Index
CHAPTER 1
Installation
Content
• Local pip install (no root required)
• Global pip install (root required)
ncbi-taxonomist is available on PyPI and can be installed via pip. If you use a Python package manager other than pip, please consult its documentation. If you are installing ncbi-taxonomist on a non-Linux system, treat the proposed methods as guidelines and adjust them as required.
Important: Please note
If some of the proposed commands are unfamiliar to you, don't just invoke them; look them up first, e.g. in the man pages or online. If you are unfamiliar with pip, check pip -h.
Note: Python 3 vs. Python 2
Because Python 2 and Python 3 can co-exist on the same system, some installation commands may need to be invoked slightly differently. Development and support for Python 2 ended in January 2020, and Python 2 should no longer be used. ncbi-taxonomist requires Python >= 3.8. Depending on your OS and/or distribution, the default pip command can install packages for either Python 2 or Python 3. Make sure you use pip for Python 3, e.g. pip3 on Ubuntu.
1.1 Local pip install (no root required)
$: pip install ncbi-taxonomist --user
On Linux, ncbi-taxonomist will be installed to $HOME/.local/bin. If you cannot invoke ncbi-taxonomist from the command line, it is likely that $HOME/.local/bin is not in your $PATH (check with echo $PATH). In that case, choose one of the following options:
• add $HOME/.local/bin to your $PATH:
  – echo 'export PATH="${PATH}:${HOME}/.local/bin"' >> ~/.bashrc
• add an alias:
  – see man bash or https://www.tldp.org/LDP/abs/html/aliases.html
• invoke $HOME/.local/bin/ncbi-taxonomist explicitly, using its full path
1.2 Global pip install (root required)
$: pip install ncbi-taxonomist
ncbi-taxonomist should now be installed in /usr/local/bin, which is usually already in your $PATH.
CHAPTER 2
Basic functions
All ncbi-taxonomist commands have the following underlying structure:
$: ncbi-taxonomist <command> <options>
Contents
• Collect
  – Output format
    * JSON output
    * XML output
• Map
  – Taxids and names
  – Mapping accession
  – Supported accession Entrez databases
  – Output format
    * JSON output
      · Single mapping result
      · Multiple mapping results
    * XML output
      · Single mapping result
      · Multiple mapping results
• Resolve
  – Taxids and names
  – Accessions
  – Output format
    * JSON output
      · Single mapping result
      · Multiple mapping results
    * XML output
      · Single mapping result
      · Multiple mapping results
• Import
  – Local database schema
  – Import taxa via collect
  – Import taxa via resolve
  – Import accessions
• Subtree
  – Collecting subtrees
    * Between two given ranks
    * Collect one specific rank
    * Collect from a given rank to root and print XML
    * Collect from a given rank to lowest rank
  – Output format
    * JSON output
    * XML output
• Group
  – Creating a group
  – Retrieve a group
2.1 Collect
The collect command fetches taxa, including their lineages, from the Entrez Taxonomy database. If the queried taxids or names share parts of the same lineage, the shared taxa are printed only once.
2.1.1 Output format
The output describes the collected taxa, one taxon per line. A single taxon has the following structure, shown here for chimpanzee (taxid 9598):
{
  "taxid": 9598,
  "rank": "species",
  "parentid": 9596,
  "name": "Pan troglodytes",
  "names": {
    "Pan troglodytes": "scientific_name",
    "chimpanzee": "GenbankCommonName"
  }
}
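The names object maps each name of a taxon to its name type. A minimal Python sketch, using only the standard library, of how one output line could be read and its names grouped by type; names_by_type is a hypothetical helper, not part of ncbi-taxonomist:

```python
import json
from collections import defaultdict

# One taxon as printed by `ncbi-taxonomist collect` (copied from the example above).
line = ('{"taxid":9598,"rank":"species","parentid":9596,"name":"Pan troglodytes",'
        '"names":{"Pan troglodytes":"scientific_name","chimpanzee":"GenbankCommonName"}}')


def names_by_type(taxon: dict) -> dict:
    """Group the names of one taxon by their name type (hypothetical helper)."""
    groups = defaultdict(list)
    for name, name_type in taxon["names"].items():
        groups[name_type].append(name)
    return dict(groups)


taxon = json.loads(line)
grouped = names_by_type(taxon)
print(taxon["taxid"], grouped["scientific_name"], grouped.get("GenbankCommonName", []))
```

Grouping by type is useful because a taxon can have several names of the same type, e.g. the many "type material" entries of Escherichia coli.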
Collecting taxa for chimpanzee and human:
$: ncbi-taxonomist collect -n chimpanzee human
JSON output
{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}
{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"}
{"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"}
{"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"}
{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,"name":"Eumetazoa"}
{"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name":"Bilateria"}
{"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"}
{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"}
{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"}
{"taxid":7742,"rank":"clade","names":{"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"}
{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},"parentid":7742,"name":"Gnathostomata"}
{"taxid":117570,"rank":"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi"}
{"taxid":117571,"rank":"clade","names":{"Euteleostomi":"scientific_name"},"parentid":117570,"name":"Euteleostomi"}
{"taxid":8287,"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,"name":"Sarcopterygii"}
{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"}
{"taxid":32523,"rank":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":"Tetrapoda"}
{"taxid":32524,"rank":"clade","names":{"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"}
{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name"},"parentid":32524,"name":"Mammalia"}
{"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,"name":"Theria"}
{"taxid":9347,"rank":"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"}
{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":"scientific_name"},"parentid":9347,"name":"Boreoeutheria"}
{"taxid":314146,"rank":"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name":"Euarchontoglires"}
{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"}
{"taxid":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid":9443,"name":"Haplorrhini"}
{"taxid":314293,"rank":"infraorder","names":{"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"}
{"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid":314293,"name":"Catarrhini"}
{"taxid":314295,"rank":"superfamily","names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"}
{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_name"},"parentid":314295,"name":"Hominidae"}
{"taxid":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,"name":"Homininae"}
{"taxid":9605,"rank":"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"}
{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"}
{"taxid":9596,"rank":"genus","names":{"Pan":"scientific_name"},"parentid":207598,"name":"Pan"}
{"taxid":9598,"rank":"species","names":{"Pan troglodytes":"scientific_name","chimpanzee":"GenbankCommonName"},"parentid":9596,"name":"Pan troglodytes"}
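Because every taxon carries its parentid, the lineage of any collected taxon can be rebuilt from the collect output alone. A minimal Python sketch, standard library only, working on a few lines copied from the output above; load_taxa and lineage are hypothetical helper names, not ncbi-taxonomist functions:

```python
import json

# A few lines of `ncbi-taxonomist collect` output (abbreviated from the example above).
COLLECT_OUTPUT = """\
{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_name"},"parentid":314295,"name":"Hominidae"}
{"taxid":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,"name":"Homininae"}
{"taxid":9605,"rank":"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"}
{"taxid":9596,"rank":"genus","names":{"Pan":"scientific_name"},"parentid":207598,"name":"Pan"}
{"taxid":9598,"rank":"species","names":{"Pan troglodytes":"scientific_name","chimpanzee":"GenbankCommonName"},"parentid":9596,"name":"Pan troglodytes"}
"""


def load_taxa(text: str) -> dict:
    """Index taxa by taxid, so a taxon appearing twice is stored only once."""
    taxa = {}
    for line in text.splitlines():
        if line.strip():
            taxon = json.loads(line)
            taxa[taxon["taxid"]] = taxon
    return taxa


def lineage(taxa: dict, taxid: int) -> list:
    """Follow parentid links upwards until the parent is null or was not collected."""
    path = []
    while taxid in taxa:
        path.append(taxa[taxid]["name"])
        taxid = taxa[taxid]["parentid"]
    return path


taxa = load_taxa(COLLECT_OUTPUT)
print(" < ".join(lineage(taxa, 9598)))
```

With the abbreviated input, the walk stops at Hominidae because its parent (314295) is not part of the sample; with the full output it would continue up to "cellular organisms", whose parentid is null.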
XML output
(example not recoverable from the source document)
2.2 Map
The map command maps taxids, names, and accessions to their taxonomic information. If the -edb argument is not specified, accessions are looked up in the nucleotide Entrez database.
2.2.1 Taxids and names
Taxids and names can be mapped in the same call. Multiple taxids or names can be separated by commas and/or spaces. Names containing spaces need to be enclosed in single quotes ('). For example:
$: ncbi-taxonomist map -t 562, 10508 -n man 'Influenza B virus (B/Acre/121609/2012)', chimpanzee
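The quotes matter because the shell, not ncbi-taxonomist, splits the command line into arguments. The sketch below uses Python's standard shlex.split, which follows POSIX shell quoting rules, to show which arguments the command above produces, and then splits them on commas with a hypothetical helper split_queries. It only illustrates the quoting rule; it is not ncbi-taxonomist's own argument parser, and it would break names that themselves contain commas.

```python
import shlex

cmd = ("ncbi-taxonomist map -t 562, 10508 -n man "
       "'Influenza B virus (B/Acre/121609/2012)', chimpanzee")

# POSIX quoting: the quoted name (plus the trailing comma) stays one argument.
argv = shlex.split(cmd)


def split_queries(args: list) -> list:
    """Split arguments on commas and drop empty pieces (hypothetical helper)."""
    queries = []
    for arg in args:
        queries.extend(part.strip() for part in arg.split(",") if part.strip())
    return queries


taxids = split_queries(argv[argv.index("-t") + 1:argv.index("-n")])
names = split_queries(argv[argv.index("-n") + 1:])
print(taxids)
print(names)
```

Without the quotes, the shell would split the influenza name into five separate arguments.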
2.2.2 Mapping accession
The default database for mapping accessions is nucleotide. To map an accession from a different database, specify that database with the --entrezdb/-edb argument.
2.2.3 Supported accession Entrez databases
Entrez database   Example
assembly          ncbi-taxonomist map -edb assembly -a ASM1001476v1 ViralProj177933
bioproject        ncbi-taxonomist map -edb bioproject -a PRJNA604394
nucleotide        ncbi-taxonomist map -edb nucleotide -a MH510449.1
                  ncbi-taxonomist map -a MH510449.1
protein           ncbi-taxonomist map -a YP_009345145 -edb protein
Note: Queries against the following databases do not return the queried accession in the results. Therefore, if more than one accession is requested, the results cannot be matched to the accessions they belong to. To preserve the one-to-one relationship, accessions from these databases need to be queried one at a time instead of as a batch query. Future releases will try to implement such queries.
• biosample
• biosystems
• cdd
• dbvar
• gap
• gapplus
• gene
• genome
• geoprofiles: accessions like GDS6063 should work
• proteinclusters: the commontaxonomy attribute can be used as the name
• sra: returns XML results only and needs a dedicated parser
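Until batch queries are supported for these databases, accessions can be mapped one at a time so that each result belongs to exactly one accession. A minimal Python sketch, assuming ncbi-taxonomist is on your $PATH and that its output is one JSON object per line, as for the databases above. It only uses the map, -edb, and -a arguments shown in this section; single_map_commands and map_one_by_one are hypothetical names.

```python
import json
import subprocess


def single_map_commands(db: str, accessions: list) -> list:
    """Build one `ncbi-taxonomist map` call per accession."""
    return [["ncbi-taxonomist", "map", "-edb", db, "-a", acc] for acc in accessions]


def map_one_by_one(db: str, accessions: list) -> dict:
    """Run each call separately and key its JSON result lines by the queried accession."""
    results = {}
    for acc, argv in zip(accessions, single_map_commands(db, accessions)):
        proc = subprocess.run(argv, capture_output=True, text=True, check=True)
        results[acc] = [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]
    return results
```

For example, map_one_by_one("geoprofiles", ["GDS6063"]) issues one request per accession. This trades speed for correctness: each accession costs a separate Entrez request.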
2.2.4 Output format
Each result shows the mode, the query, the type of result (cast), and the corresponding taxon or accession.
JSON output
Single mapping result
• Taxon:
{
  "mode": "mapping",
  "query": "Influenza B virus (B/Acre/121609/2012)",
  "cast": "taxon",
  "taxon": {
    "taxid": 1334390,
    "rank": "no rank",
    "names": {
      "Influenza B virus (B/Acre/121609/2012)": "scientific_name"
    },
    "parentid": 11520,
    "name": "Influenza B virus (B/Acre/121609/2012)"
  }
}
• Accession:
{
  "mode": "mapping",
  "query": "ASM1001476v1",
  "cast": "accs",
  "accession": {
    "taxid": 1962788,
    "accessions": {
      "assemblyaccession": "GCA_010014765.1",
      "lastmajorreleaseaccession": "GCA_010014765.1",
      "assemblyname": "ASM1001476v1"
    },
    "db": "assembly",
    "uid": 5515991
  }
}
Multiple mapping results
{"mode":"mapping","query":"Influenza B virus (B/Acre/121609/2012)","cast":"taxon","taxon":{"taxid":1334390,"rank":"no rank","names":{"Influenza B virus (B/Acre/121609/2012)":"scientific_name"},"parentid":11520,"name":"Influenza B virus (B/Acre/121609/2012)"}}
{"mode":"mapping","query":"man","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"}}
{"mode":"mapping","query":"562","cast":"taxon","taxon":{"taxid":562,"rank":"species","names":{"Escherichia coli":"scientific_name","Bacillus coli":"Synonym","Bacterium coli":"Synonym","Bacterium coli commune":"Synonym","Enterococcus coli":"Synonym","E. coli":"CommonName","Escherichia sp. 3_2_53FAA":"Includes","Escherichia sp. MAR":"Includes","bacterium 10a":"Includes","bacterium E3":"Includes","Escherichia/Shigella coli":"EquivalentName","ATCC 11775":"type material","ATCC:11775":"type material","BCCM/LMG:2092":"type material","CCUG 24":"type material","CCUG 29300":"type material","CCUG:24":"type material","CCUG:29300":"type material","CIP 54.8":"type material","CIP:54.8":"type material","DSM 30083":"type material","DSM:30083":"type material","IAM 12119":"type material","IAM:12119":"type material","JCM 1649":"type material","JCM:1649":"type material","LMG 2092":"type material","LMG:2092":"type material","NBRC 102203":"type material","NBRC:102203":"type material","NCCB 54008":"type material","NCCB:54008":"type material","NCTC 9001":"type material","NCTC:9001":"type material","personal::U5/41":"type material","strain U5/41":"type material"},"parentid":561,"name":"Escherichia coli"}}
{"mode":"mapping","query":"ASM1001476v1","cast":"accs","accession":{"taxid":1962788,"accessions":{"assemblyaccession":"GCA_010014765.1","lastmajorreleaseaccession":"GCA_010014765.1","assemblyname":"ASM1001476v1"},"db":"assembly","uid":5515991}}
{"mode":"mapping","query":"PRJNA604394","cast":"accs","accession":{"taxid":573,"accessions":{"project_id":604394,"project_acc":"PRJNA604394","project_name":"Klebsiella pneumoniae strain:S01"},"db":"bioproject","uid":604394}}
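In the mapping output above, the payload sits under "taxon" when the cast is "taxon" and under "accession" when the cast is "accs". A minimal Python sketch, standard library only, that turns such output into a query-to-taxid lookup; taxid_per_query and PAYLOAD_KEY are hypothetical names based only on the example output:

```python
import json

# Two result lines from the example above (the E. coli line is omitted for brevity).
MAP_OUTPUT = """\
{"mode":"mapping","query":"man","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"}}
{"mode":"mapping","query":"PRJNA604394","cast":"accs","accession":{"taxid":573,"accessions":{"project_id":604394,"project_acc":"PRJNA604394","project_name":"Klebsiella pneumoniae strain:S01"},"db":"bioproject","uid":604394}}
"""

# Key holding the payload for each cast, as seen in the map output above.
PAYLOAD_KEY = {"taxon": "taxon", "accs": "accession"}


def taxid_per_query(text: str) -> dict:
    """Map each query string to the taxid of its mapping result."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        result = json.loads(line)
        payload = result[PAYLOAD_KEY[result["cast"]]]
        mapping[result["query"]] = payload["taxid"]
    return mapping


print(taxid_per_query(MAP_OUTPUT))
```

Such a lookup makes it easy to annotate, for example, a table of accessions with their taxids.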
XML output
Single mapping result
• Taxon: (example not recoverable from the source document)
• Accession: (example not recoverable from the source document)
Multiple mapping results
(example not recoverable from the source document)
2.3 Resolve
The resolve command resolves lineages. Names and taxids can be resolved directly, while accessions have to be mapped first.
2.3.1 Taxids and names
$: ncbi-taxonomist resolve -n man -t2
2.3.2 Accessions
$: ncbi-taxonomist map -a QZWG01000002.1 MG831203 | ncbi-taxonomist resolve -m
2.3.3 Output format
Each result shows the mode, the query, the type of result (cast), and the corresponding lineage. For queried names or taxids, the data of the queried taxon is shown; for accessions, the data of the queried accession is shown.
JSON output
Single mapping result
{
  "mode": "resolve",
  "query": "man",
  "cast": "taxon",
  "taxon": {
    "taxid": 9606,
    "rank": "species",
    "names": {
      "Homo sapiens": "scientific_name",
      "human": "GenbankCommonName",
      "man": "CommonName"
    },
    "parentid": 9605,
    "name": "Homo sapiens"
  },
  "lineage": [
    {"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},
    {"taxid":9605,"rank":"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"},
    {"taxid":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,"name":"Homininae"},
    {"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_name"},"parentid":314295,"name":"Hominidae"},
    {"taxid":314295,"rank":"superfamily","names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"},
    {"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid":314293,"name":"Catarrhini"},
    {"taxid":314293,"rank":"infraorder","names":{"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"},
    {"taxid":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid":9443,"name":"Haplorrhini"},
    {"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"},
    {"taxid":314146,"rank":"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name":"Euarchontoglires"},
    {"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":"scientific_name"},"parentid":9347,"name":"Boreoeutheria"},
    {"taxid":9347,"rank":"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"},
    {"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,"name":"Theria"},
    {"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name"},"parentid":32524,"name":"Mammalia"},
    {"taxid":32524,"rank":"clade","names":{"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"},
    {"taxid":32523,"rank":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":"Tetrapoda"},
    {"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"},
    {"taxid":8287,"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,"name":"Sarcopterygii"},
    {"taxid":117571,"rank":"clade","names":{"Euteleostomi":"scientific_name"},"parentid":117570,"name":"Euteleostomi"},
    {"taxid":117570,"rank":"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi"},
    {"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},"parentid":7742,"name":"Gnathostomata"},
    {"taxid":7742,"rank":"clade","names":{"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"},
    {"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"},
    {"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"},
    {"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"},
    {"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name":"Bilateria"},
    {"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,"name":"Eumetazoa"},
    {"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"},
    {"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"},
    {"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},
    {"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}
  ]
}
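The lineage array of a resolve result starts with the queried taxon and ends at the root, whose parentid is null. A minimal Python sketch, standard library only, that reduces a resolve result to (rank, name) pairs listed from the root downwards; it uses the result for taxid 2 (Bacteria) with its names objects shortened, and ranked_lineage is a hypothetical helper:

```python
import json

# Resolve result for taxid 2, with the "names" objects shortened for brevity.
RESOLVE_LINE = (
    '{"mode":"resolve","query":"2","cast":"taxon",'
    '"taxon":{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name"},'
    '"parentid":131567,"name":"Bacteria"},'
    '"lineage":[{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name"},'
    '"parentid":131567,"name":"Bacteria"},'
    '{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},'
    '"parentid":null,"name":"cellular organisms"}]}'
)


def ranked_lineage(result: dict, root_first: bool = True) -> list:
    """Return (rank, name) pairs of the lineage, by default starting at the root."""
    pairs = [(taxon["rank"], taxon["name"]) for taxon in result["lineage"]]
    return pairs[::-1] if root_first else pairs


result = json.loads(RESOLVE_LINE)
for rank, name in ranked_lineage(result):
    print(f"{rank}\t{name}")
```

Because each output line is a complete JSON object, the same function can be applied line by line to the output of a resolve call with several queries.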
Multiple mapping results
{"mode":"resolve","query":"man","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},"lineage":[{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},{"taxid":9605,"rank":"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"},{"taxid":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,"name":"Homininae"},{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_name"},"parentid":314295,"name":"Hominidae"},{"taxid":314295,"rank":"superfamily","names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"},{"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid":314293,"name":"Catarrhini"},{"taxid":314293,"rank":"infraorder","names":{"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"},{"taxid":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid":9443,"name":"Haplorrhini"},{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"},{"taxid":314146,"rank":"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name":"Euarchontoglires"},{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":"scientific_name"},"parentid":9347,"name":"Boreoeutheria"},{"taxid":9347,"rank":"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"},{"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,"name":"Theria"},{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name"},"parentid":32524,"name":"Mammalia"},{"taxid":32524,"rank":"clade","names":{"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"},{"taxid":32523,"rank":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":"Tetrapoda"},{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"},{"taxid":8287,"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,"name":"Sarcopterygii"},{"taxid":117571,"rank":"clade","names":{"Euteleostomi":"scientific_name"},"parentid":117570,"name":"Euteleostomi"},{"taxid":117570,"rank":"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi"},{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},"parentid":7742,"name":"Gnathostomata"},{"taxid":7742,"rank":"clade","names":{"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"},{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"},{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"},{"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"},{"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name":"Bilateria"},{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,"name":"Eumetazoa"},{"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"},{"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
{"mode":"resolve","query":"2","cast":"taxon","taxon":{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":"Bacteria"},"lineage":[{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
{"mode":"resolve","query":"MG831203","cast":"accs","accs":{"taxid":198112,"accessions":{"accessionversion":"MG831203.1","caption":"MG831203","extra":"gi|1496532032|gb|MG831203.1|"},"db":"nucleotide","uid":1496532032},"lineage":[{"taxid":198112,"rank":"species","names":{"Deformed wing virus":"scientific_name","DWV":"GenbankAcronym"},"parentid":232799,"name":"Deformed wing virus"},{"taxid":232799,"rank":"genus","names":{"Iflavirus":"scientific_name"},"parentid":699189,"name":"Iflavirus"},{"taxid":699189,"rank":"family","names":{"Iflaviridae":"scientific_name"},"parentid":464095,"name":"Iflaviridae"},{"taxid":464095,"rank":"order","names":{"Picornavirales":"scientific_name"},"parentid":2732506,"name":"Picornavirales"},{"taxid":2732506,"rank":"class","names":{"Pisoniviricetes":"scientific_name"},"parentid":2732408,"name":"Pisoniviricetes"},{"taxid":2732408,"rank":"phylum","names":{"Pisuviricota":"scientific_name"},"parentid":2732396,"name":"Pisuviricota"},{"taxid":2732396,"rank":"kingdom","names":{"Orthornavirae":"scientific_name"},"parentid":2559587,"name":"Orthornavirae"},{"taxid":2559587,"rank":"clade","names":{"Riboviria":"scientific_name"},"parentid":10239,"name":"Riboviria"},{"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}]}
XML output
Single mapping result
(example not recoverable from the source document)
Multiple mapping results
(example not recoverable from the source document)
2.4 Import
The import command imports taxa, lineages, and accessions into a local SQLite database. It also prints the results of the preceding command to standard output.
2.4.1 Local database schema
CREATE TABLE taxa
  (id INTEGER PRIMARY KEY,
   taxonid INT NOT NULL,
   rank TEXT NULL,
   parentid INT NULL,
   UNIQUE(taxonid));
CREATE UNIQUE INDEX taxa_idx ON taxa (taxonid);
CREATE TABLE names
  (id INTEGER PRIMARY KEY,
   taxonid INT,
   name TEXT,
   type TEXT NULL,
   FOREIGN KEY (taxonid) REFERENCES taxa(taxonid) ON DELETE CASCADE,
   UNIQUE(taxonid, name));
CREATE TRIGGER delete_names DELETE ON names
  BEGIN DELETE FROM names WHERE taxonid=old.taxonid; END;
CREATE UNIQUE INDEX names_idx ON names (taxonid, name);
CREATE TABLE accessions
  (id INTEGER PRIMARY KEY,
   accession TEXT NOT NULL,
   db TEXT NOT NULL,
   type TEXT NULL,
   uid INT NOT NULL,
   taxonid INT NOT NULL,
   FOREIGN KEY (taxonid) REFERENCES taxa(taxonid) ON DELETE CASCADE,
   UNIQUE(accession, uid));
CREATE TRIGGER delete_uids DELETE ON accessions
  BEGIN DELETE FROM accessions WHERE uid=old.uid; END;
CREATE UNIQUE INDEX accessions_idx ON
  accessions (accession, uid);
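The schema can be tried out without ncbi-taxonomist. A minimal Python sketch using the standard sqlite3 module: it creates the schema above in an in-memory database and stores one taxon from the collect output. The helper import_taxon is hypothetical and only mirrors the table layout; it is not how ncbi-taxonomist itself performs the import.

```python
import json
import sqlite3

# Schema copied from the listing above.
SCHEMA = """
CREATE TABLE taxa (id INTEGER PRIMARY KEY, taxonid INT NOT NULL, rank TEXT NULL,
                   parentid INT NULL, UNIQUE(taxonid));
CREATE UNIQUE INDEX taxa_idx ON taxa (taxonid);
CREATE TABLE names (id INTEGER PRIMARY KEY, taxonid INT, name TEXT, type TEXT NULL,
                    FOREIGN KEY (taxonid) REFERENCES taxa(taxonid) ON DELETE CASCADE,
                    UNIQUE(taxonid, name));
CREATE TRIGGER delete_names DELETE ON names
  BEGIN DELETE FROM names WHERE taxonid=old.taxonid; END;
CREATE UNIQUE INDEX names_idx ON names (taxonid, name);
CREATE TABLE accessions (id INTEGER PRIMARY KEY, accession TEXT NOT NULL, db TEXT NOT NULL,
                         type TEXT NULL, uid INT NOT NULL, taxonid INT NOT NULL,
                         FOREIGN KEY (taxonid) REFERENCES taxa(taxonid) ON DELETE CASCADE,
                         UNIQUE(accession, uid));
CREATE TRIGGER delete_uids DELETE ON accessions
  BEGIN DELETE FROM accessions WHERE uid=old.uid; END;
CREATE UNIQUE INDEX accessions_idx ON accessions (accession, uid);
"""


def import_taxon(conn: sqlite3.Connection, taxon: dict) -> None:
    """Insert one collect-style taxon and its names; existing rows are kept (hypothetical)."""
    conn.execute("INSERT OR IGNORE INTO taxa (taxonid, rank, parentid) VALUES (?, ?, ?)",
                 (taxon["taxid"], taxon["rank"], taxon["parentid"]))
    conn.executemany("INSERT OR IGNORE INTO names (taxonid, name, type) VALUES (?, ?, ?)",
                     [(taxon["taxid"], name, name_type)
                      for name, name_type in taxon["names"].items()])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
human = json.loads('{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name",'
                   '"human":"GenbankCommonName","man":"CommonName"},"parentid":9605,'
                   '"name":"Homo sapiens"}')
with conn:  # commits on success
    import_taxon(conn, human)
    import_taxon(conn, human)  # a second import adds no duplicate rows
rows = conn.execute("SELECT t.taxonid, t.rank, n.name, n.type FROM taxa t "
                    "JOIN names n ON t.taxonid=n.taxonid ORDER BY n.id").fetchall()
for row in rows:
    print(row)
```

The UNIQUE constraints on taxa(taxonid) and names(taxonid, name) are what make INSERT OR IGNORE safe for repeated imports of overlapping lineages.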
2.4.2 Import taxa via collect
$: ncbi-taxonomist collect -n man -t2 | ncbi-taxonomist import --database taxa-collect.db
{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}
{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"}
{"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"}
{"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"}
{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,"name":"Eumetazoa"}
{"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name":"Bilateria"}
{"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"}
{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"}
{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"}
{"taxid":7742,"rank":"clade","names":{"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"}
{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},"parentid":7742,"name":"Gnathostomata"}
{"taxid":117570,"rank":"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi"}
{"taxid":117571,"rank":"clade","names":{"Euteleostomi":"scientific_name"},"parentid":117570,"name":"Euteleostomi"}
{"taxid":8287,"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,"name":"Sarcopterygii"}
{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"}
{"taxid":32523,"rank":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":"Tetrapoda"}
{"taxid":32524,"rank":"clade","names":{"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"}
{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name"},"parentid":32524,"name":"Mammalia"}
{"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,"name":"Theria"}
{"taxid":9347,"rank":"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"}
{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":"scientific_name"},"parentid":9347,"name":"Boreoeutheria"}
{"taxid":314146,"rank":"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name":"Euarchontoglires"}
{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"}
{"taxid":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid":9443,"name":"Haplorrhini"}
{"taxid":314293,"rank":"infraorder","names":{"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"}
{"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid":314293,"name":"Catarrhini"}
{"taxid":314295,"rank":"superfamily","names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"}
{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_name"},"parentid":314295,"name":"Hominidae"}
{"taxid":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,"name":"Homininae"}
{"taxid":9605,"rank":"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"}
{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"}
{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}
{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":"Bacteria"}
• Check database:
sqlite3 -header taxa.db 'SELECT * FROM taxa t JOIN names n ON t.taxonid=n.taxonid;'
id|taxonid|rank|parentid|id|taxonid|name|type
1|9606|species|9605|1|9606|Homo sapiens|scientific_name
1|9606|species|9605|2|9606|human|GenbankCommonName
1|9606|species|9605|3|9606|man|CommonName
2|9605|genus|207598|4|9605|Homo|scientific_name
3|207598|subfamily|9604|5|207598|Homininae|scientific_name
4|9604|family|314295|6|9604|Hominidae|scientific_name
5|314295|superfamily|9526|7|314295|Hominoidea|scientific_name
6|9526|parvorder|314293|8|9526|Catarrhini|scientific_name
7|314293|infraorder|376913|9|314293|Simiiformes|scientific_name
8|376913|suborder|9443|10|376913|Haplorrhini|scientific_name
9|9443|order|314146|11|9443|Primates|scientific_name
10|314146|superorder|1437010|12|314146|Euarchontoglires|scientific_name
11|1437010|clade|9347|13|1437010|Boreoeutheria|scientific_name
12|9347|clade|32525|14|9347|Eutheria|scientific_name
13|32525|clade|40674|15|32525|Theria|scientific_name
14|40674|class|32524|16|40674|Mammalia|scientific_name
15|32524|clade|32523|17|32524|Amniota|scientific_name
16|32523|clade|1338369|18|32523|Tetrapoda|scientific_name
17|1338369|clade|8287|19|1338369|Dipnotetrapodomorpha|scientific_name
18|8287|superclass|117571|20|8287|Sarcopterygii|scientific_name
19|117571|clade|117570|21|117571|Euteleostomi|scientific_name
20|117570|clade|7776|22|117570|Teleostomi|scientific_name
21|7776|clade|7742|23|7776|Gnathostomata|scientific_name
22|7742|clade|89593|24|7742|Vertebrata|scientific_name
23|89593|subphylum|7711|25|89593|Craniata|scientific_name
24|7711|phylum|33511|26|7711|Chordata|scientific_name
25|33511|clade|33213|27|33511|Deuterostomia|scientific_name
26|33213|clade|6072|28|33213|Bilateria|scientific_name
27|6072|clade|33208|29|6072|Eumetazoa|scientific_name
28|33208|kingdom|33154|30|33208|Metazoa|scientific_name
29|33154|clade|2759|31|33154|Opisthokonta|scientific_name
30|2759|superkingdom|131567|32|2759|Eukaryota|scientific_name
31|131567|no rank||33|131567|cellular organisms|scientific_name
32|2|superkingdom|131567|34|2|Bacteria|scientific_name
32|2|superkingdom|131567|35|2|eubacteria|GenbankCommonName
32|2|superkingdom|131567|36|2|bacteria|BlastName
32|2|superkingdom|131567|37|2|Monera|Inpart
32|2|superkingdom|131567|38|2|Procaryotae|Inpart
32|2|superkingdom|131567|39|2|Prokaryota|Inpart
32|2|superkingdom|131567|40|2|Prokaryotae|Inpart
32|2|superkingdom|131567|41|2|prokaryote|Inpart
32|2|superkingdom|131567|42|2|prokaryotes|Inpart
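The same join can be run from Python with the standard sqlite3 module. The sketch below rests on an assumption: its schema is a hypothetical minimal one reconstructed only from the column headers printed above (taxa: id, taxonid, rank, parentid; names: id, taxonid, name, type). The real ncbi-taxonomist schema may differ. The demo uses an in-memory database instead of taxa.db.

```python
import sqlite3

# Hypothetical minimal schema, reconstructed only from the column headers
# printed above; the real ncbi-taxonomist schema may differ.
SCHEMA = """
CREATE TABLE taxa  (id INTEGER PRIMARY KEY, taxonid INTEGER UNIQUE,
                    rank TEXT, parentid INTEGER);
CREATE TABLE names (id INTEGER PRIMARY KEY, taxonid INTEGER,
                    name TEXT, type TEXT);
"""


def scientific_names(conn):
    """Return {taxonid: scientific name} using the taxa/names join shown above."""
    rows = conn.execute(
        "SELECT t.taxonid, n.name FROM taxa t "
        "JOIN names n ON t.taxonid = n.taxonid "
        "WHERE n.type = 'scientific_name'"
    )
    return dict(rows.fetchall())


# Demo on an in-memory database; sqlite3.connect("taxa.db") would open a real file.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO taxa (taxonid, rank, parentid) VALUES (?, ?, ?)",
    [(9606, "species", 9605), (9605, "genus", 207598)],
)
conn.executemany(
    "INSERT INTO names (taxonid, name, type) VALUES (?, ?, ?)",
    [(9606, "Homo sapiens", "scientific_name"),
     (9606, "human", "GenbankCommonName"),
     (9605, "Homo", "scientific_name")],
)
print(scientific_names(conn))
```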
2.4.3 Import taxa via resolve
ncbi-taxonomist resolve -n man -t2 | ncbi-taxonomist import -db taxa-resolve.db
{"mode":"resolve","query":"man","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},"lineage":[{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},{"taxid":9605,"rank":"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"},{"taxid":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,"name":"Homininae"},{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_name"},"parentid":314295,"name":"Hominidae"},{"taxid":314295,"rank":"superfamily","names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"},{"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid":314293,"name":"Catarrhini"},{"taxid":314293,"rank":"infraorder","names":{"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"},{"taxid":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid":9443,"name":"Haplorrhini"},{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"},{"taxid":314146,"rank":"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name":"Euarchontoglires"},{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":"scientific_name"},"parentid":9347,"name":"Boreoeutheria"},{"taxid":9347,"rank":"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"},{"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,"name":"Theria"},{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name"},"parentid":32524,"name":"Mammalia"},{"taxid":32524,"rank":"clade","names":{"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"},{"taxid":32523,"rank":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":"Tetrapoda"},{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"},{"taxid":8287,"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,"name":"Sarcopterygii"},{"taxid":117571,"rank":"clade","names":{"Euteleostomi":"scientific_name"},"parentid":117570,"name":"Euteleostomi"},{"taxid":117570,"rank":"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi"},{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},"parentid":7742,"name":"Gnathostomata"},{"taxid":7742,"rank":"clade","names":{"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"},{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"},{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"},{"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"},{"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name":"Bilateria"},{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,"name":"Eumetazoa"},{"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"},{"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
{"mode":"resolve","query":"2","cast":"taxon","taxon":{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":"Bacteria"},"lineage":[{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
• Check database: The database should be identical to the database created with the collect command above.
sqlite3 taxa-resolve.db 'SELECT * FROM taxa t JOIN names n ON t.taxonid=n.taxonid;'
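One way to check this claim from Python is sketched below. It assumes the same taxa and names columns as the queries above. The id columns are surrogate keys that can depend on insertion order, so the comparison ignores them and compares only taxonid, rank, parentid and taxonid, name, type. The helper names and the in-memory demo schema are hypothetical.

```python
import sqlite3


def table_content(conn):
    """Return the taxa and names rows without the autoincrement id columns."""
    taxa = set(conn.execute("SELECT taxonid, rank, parentid FROM taxa"))
    names = set(conn.execute("SELECT taxonid, name, type FROM names"))
    return taxa, names


def same_content(conn_a, conn_b):
    """True if both databases hold the same taxa and names, ignoring ids."""
    return table_content(conn_a) == table_content(conn_b)


def demo_db(taxa_rows, name_rows):
    """Build an in-memory database with a hypothetical minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE taxa (id INTEGER PRIMARY KEY, taxonid INTEGER, rank TEXT, parentid INTEGER);"
        "CREATE TABLE names (id INTEGER PRIMARY KEY, taxonid INTEGER, name TEXT, type TEXT);"
    )
    conn.executemany("INSERT INTO taxa (taxonid, rank, parentid) VALUES (?, ?, ?)", taxa_rows)
    conn.executemany("INSERT INTO names (taxonid, name, type) VALUES (?, ?, ?)", name_rows)
    return conn


taxa_rows = [(9606, "species", 9605), (9605, "genus", 207598)]
name_rows = [(9606, "Homo sapiens", "scientific_name"), (9605, "Homo", "scientific_name")]
db_a = demo_db(taxa_rows, name_rows)
db_b = demo_db(taxa_rows[::-1], name_rows[::-1])  # same rows, different ids
print(same_content(db_a, db_b))
```

With real files, the call would be same_content(sqlite3.connect("taxa.db"), sqlite3.connect("taxa-resolve.db")).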
2.4.4 Import accessions
Importing accessions imports only the taxid of each accession, not any other taxon metadata.
ncbi-taxonomist map --entrezdb protein --accessions AFR11853 AIA66128.1 | ncbi-taxonomist import -db taxa.db
• Check database:
sqlite3 -header taxa.db 'SELECT * FROM accessions a JOIN taxa t ON a.taxonid==t.taxonid;'
id|accession|db|type|uid|taxonid|id|taxonid|rank|parentid
1|AIA66128.1|protein|accessionversion|641483259|1239567|33|1239567||
2|AIA66128|protein|caption|641483259|1239567|33|1239567||
3|gi|641483259|gb|AIA66128.1||protein|extra|641483259|1239567|33|1239567||
4|AFR11853.1|protein|accessionversion|403044789|1224525|34|1224525||
5|AFR11853|protein|caption|403044789|1224525|34|1224525||
6|gi|403044789|gb|AFR11853.1||protein|extra|403044789|1224525|34|1224525||
To add the missing taxon information, see Importing accessions in the Cookbook for an extended command. The database after adding the missing data:
sqlite3 -header taxa.db 'SELECT * FROM accessions a JOIN taxa t ON a.taxonid==t.taxonid;'
id|accession|db|type|uid|taxonid|id|taxonid|rank|parentid
1|AIA66128.1|protein|accessionversion|641483259|1239567|33|1239567|species|249588
2|AIA66128|protein|caption|641483259|1239567|33|1239567|species|249588
3|gi|641483259|gb|AIA66128.1||protein|extra|641483259|1239567|33|1239567|species|249588
4|AFR11853.1|protein|accessionversion|403044789|1224525|34|1224525|species|35278
5|AFR11853|protein|caption|403044789|1224525|34|1224525|species|35278
6|gi|403044789|gb|AFR11853.1||protein|extra|403044789|1224525|34|1224525|species|35278
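Before adding the missing data, it can help to list the taxids whose taxa rows still lack a rank. The following sketch assumes the accessions and taxa columns shown above and that missing values are stored as NULL or as empty strings. The demo schema and rows are hypothetical. The resulting taxids are the kind of input that the Cookbook pipeline passes to ncbi-taxonomist collect -t.

```python
import sqlite3


def taxids_missing_metadata(conn):
    """Taxids referenced in accessions whose taxa row has no rank yet."""
    rows = conn.execute(
        "SELECT DISTINCT a.taxonid FROM accessions a "
        "JOIN taxa t ON a.taxonid = t.taxonid "
        "WHERE t.rank IS NULL OR t.rank = ''"
    )
    return sorted(row[0] for row in rows)


# Hypothetical minimal schema based on the column headers shown above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE accessions (id INTEGER PRIMARY KEY, accession TEXT, db TEXT,"
    " type TEXT, uid INTEGER, taxonid INTEGER);"
    "CREATE TABLE taxa (id INTEGER PRIMARY KEY, taxonid INTEGER, rank TEXT, parentid INTEGER);"
)
conn.executemany("INSERT INTO taxa (taxonid, rank, parentid) VALUES (?, ?, ?)",
                 [(1239567, None, None), (1224525, "species", 35278)])
conn.executemany(
    "INSERT INTO accessions (accession, db, type, uid, taxonid) VALUES (?, ?, ?, ?, ?)",
    [("AIA66128.1", "protein", "accessionversion", 641483259, 1239567),
     ("AFR11853.1", "protein", "accessionversion", 403044789, 1224525)],
)
missing = taxids_missing_metadata(conn)
print(" ".join(map(str, missing)))
```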
2.5 Subtree
ncbi-taxonomist subtree collects taxonomic subsamples for taxids or names in a local database.
Note: Fetching subtrees remotely from Entrez is in development.
A local database is required, for example:
$: ncbi-taxonomist collect -t 142786 9606 | ncbi-taxonomist import -db test.db
2.5.1 Collecting subtrees
Between two given ranks
$: ncbi-taxonomist subtree -db test.db -t 142786 9606 --lrank order --hrank phylum
{"mode":"subtree","query":9606,"subtree":[{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"},{"taxid":314146,"rank":"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name":"Euarchontoglires"},{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":"scientific_name"},"parentid":9347,"name":"Boreoeutheria"},{"taxid":9347,"rank":"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"},{"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,"name":"Theria"},{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name"},"parentid":32524,"name":"Mammalia"},{"taxid":32524,"rank":"clade","names":{"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"},{"taxid":32523,"rank":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":"Tetrapoda"},{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"},{"taxid":8287,"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,"name":"Sarcopterygii"},{"taxid":117571,"rank":"clade","names":{"Euteleostomi":"scientific_name"},"parentid":117570,"name":"Euteleostomi"},{"taxid":117570,"rank":"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi"},{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},"parentid":7742,"name":"Gnathostomata"},{"taxid":7742,"rank":"clade","names":{"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"},{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"},{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"}]}
{"mode":"subtree","query":142786,"subtree":[{"taxid":464095,"rank":"order","names":{"Picornavirales":"scientific_name"},"parentid":2732506,"name":"Picornavirales"},{"taxid":2732506,"rank":"class","names":{"Pisoniviricetes":"scientific_name"},"parentid":2732408,"name":"Pisoniviricetes"},{"taxid":2732408,"rank":"phylum","names":{"Pisuviricota":"scientific_name"},"parentid":2732396,"name":"Pisuviricota"}]}
Collect one specific rank
$: ncbi-taxonomist subtree -db test.db -t 142786 9606 --rank order
{"mode":"subtree","query":9606,"subtree":[{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"}]}
{"mode":"subtree","query":142786,"subtree":[{"taxid":464095,"rank":"order","names":{"Picornavirales":"scientific_name"},"parentid":2732506,"name":"Picornavirales"}]}
Collect from a given rank to root and print XML
$: ncbi-taxonomist subtree -x -db test.db -t 142786 9606 --lrank order
Collect from a given rank to lowest rank
$: ncbi-taxonomist subtree -db test.db -t 142786 9606 --hrank order
{"mode":"subtree","query":9606,"subtree":[{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},{"taxid":9605,"rank":"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"},{"taxid":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,"name":"Homininae"},{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_name"},"parentid":314295,"name":"Hominidae"},{"taxid":314295,"rank":"superfamily","names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"},{"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid":314293,"name":"Catarrhini"},{"taxid":314293,"rank":"infraorder","names":{"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"},{"taxid":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid":9443,"name":"Haplorrhini"},{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"}]}
{"mode":"subtree","query":142786,"subtree":[{"taxid":142786,"rank":"genus","names":{"Norovirus":"scientific_name","Norwalk-like viruses":"EquivalentName"},"parentid":11974,"name":"Norovirus"},{"taxid":11974,"rank":"family","names":{"Caliciviridae":"scientific_name"},"parentid":464095,"name":"Caliciviridae"},{"taxid":464095,"rank":"order","names":{"Picornavirales":"scientific_name"},"parentid":2732506,"name":"Picornavirales"}]}
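The rank options appear to select a contiguous part of the path between a queried taxon and the root. The Python sketch below reproduces that selection on an in-memory lineage list. It is a conceptual illustration under that assumption, not how ncbi-taxonomist implements subtree, which reads a local database. rank_window and the abbreviated lineage are hypothetical. Because ranks such as clade or no rank can occur several times in a lineage, list.index picks their lowest occurrence.

```python
def rank_window(lineage, lrank=None, hrank=None):
    """Slice a lineage ordered from the queried taxon up to the root.

    lrank: lowest rank to keep (None = start at the queried taxon)
    hrank: highest rank to keep (None = continue to the root)
    Returns [] if a requested rank does not occur in the lineage.
    """
    ranks = [taxon["rank"] for taxon in lineage]
    if (lrank and lrank not in ranks) or (hrank and hrank not in ranks):
        return []
    start = ranks.index(lrank) if lrank else 0
    stop = ranks.index(hrank) + 1 if hrank else len(lineage)
    return lineage[start:stop]


def taxon_names(taxa):
    return [taxon["name"] for taxon in taxa]


# Abbreviated lineage of taxid 9606, taken from the outputs above.
LINEAGE = [
    {"taxid": 9606, "rank": "species", "name": "Homo sapiens"},
    {"taxid": 9605, "rank": "genus", "name": "Homo"},
    {"taxid": 9443, "rank": "order", "name": "Primates"},
    {"taxid": 40674, "rank": "class", "name": "Mammalia"},
    {"taxid": 7711, "rank": "phylum", "name": "Chordata"},
    {"taxid": 131567, "rank": "no rank", "name": "cellular organisms"},
]

print(taxon_names(rank_window(LINEAGE, "order", "phylum")))  # like --lrank order --hrank phylum
print(taxon_names(rank_window(LINEAGE, "order", "order")))   # like --rank order
print(taxon_names(rank_window(LINEAGE, lrank="order")))      # like --lrank order
print(taxon_names(rank_window(LINEAGE, hrank="order")))      # like --hrank order
```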
2.5.2 Output format
JSON output
{
  "mode": "subtree",
  "query": 9606,
  "subtree": [
    { "taxid": 9443, "rank": "order", "names": { "Primates": "scientific_name" }, "parentid": 314146, "name": "Primates" },
    { "taxid": 314146, "rank": "superorder", "names": { "Euarchontoglires": "scientific_name" }, "parentid": 1437010, "name": "Euarchontoglires" },
    { "taxid": 1437010, "rank": "clade", "names": { "Boreoeutheria": "scientific_name" }, "parentid": 9347, "name": "Boreoeutheria" },
    { "taxid": 9347, "rank": "clade", "names": { "Eutheria": "scientific_name" }, "parentid": 32525, "name": "Eutheria" },
    { "taxid": 32525, "rank": "clade", "names": { "Theria": "scientific_name" }, "parentid": 40674, "name": "Theria" },
    { "taxid": 40674, "rank": "class", "names": { "Mammalia": "scientific_name" }, "parentid": 32524, "name": "Mammalia" },
    { "taxid": 32524, "rank": "clade", "names": { "Amniota": "scientific_name" }, "parentid": 32523, "name": "Amniota" },
    { "taxid": 32523, "rank": "clade", "names": { "Tetrapoda": "scientific_name" }, "parentid": 1338369, "name": "Tetrapoda" },
    { "taxid": 1338369, "rank": "clade", "names": { "Dipnotetrapodomorpha": "scientific_name" }, "parentid": 8287, "name": "Dipnotetrapodomorpha" },
    { "taxid": 8287, "rank": "superclass", "names": { "Sarcopterygii": "scientific_name" }, "parentid": 117571, "name": "Sarcopterygii" },
    { "taxid": 117571, "rank": "clade", "names": { "Euteleostomi": "scientific_name" }, "parentid": 117570, "name": "Euteleostomi" },
    { "taxid": 117570, "rank": "clade", "names": { "Teleostomi": "scientific_name" }, "parentid": 7776, "name": "Teleostomi" },
    { "taxid": 7776, "rank": "clade", "names": { "Gnathostomata": "scientific_name" }, "parentid": 7742, "name": "Gnathostomata" },
    { "taxid": 7742, "rank": "clade", "names": { "Vertebrata": "scientific_name" }, "parentid": 89593, "name": "Vertebrata" },
    { "taxid": 89593, "rank": "subphylum", "names": { "Craniata": "scientific_name" }, "parentid": 7711, "name": "Craniata" },
    { "taxid": 7711, "rank": "phylum", "names": { "Chordata": "scientific_name" }, "parentid": 33511, "name": "Chordata" }
  ]
}
{
  "mode": "subtree",
  "query": 142786,
  "subtree": [
    { "taxid": 464095, "rank": "order", "names": { "Picornavirales": "scientific_name" }, "parentid": 2732506, "name": "Picornavirales" },
    { "taxid": 2732506, "rank": "class", "names": { "Pisoniviricetes": "scientific_name" }, "parentid": 2732408, "name": "Pisoniviricetes" },
    { "taxid": 2732408, "rank": "phylum", "names": { "Pisuviricota": "scientific_name" }, "parentid": 2732396, "name": "Pisuviricota" }
  ]
}
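The indented layout above is for readability. The command itself prints one JSON object per line, as in the earlier examples. A minimal Python sketch for consuming that line-based output follows. subtree_names is a hypothetical helper, and the sample record is the viral subtree from the --lrank order --hrank phylum example.

```python
import io
import json


def subtree_names(stream):
    """Map each query to the names of its subtree taxa, in output order."""
    result = {}
    for line in stream:
        if line.strip():
            record = json.loads(line)
            result[record["query"]] = [taxon["name"] for taxon in record["subtree"]]
    return result


SAMPLE = (
    '{"mode":"subtree","query":142786,"subtree":['
    '{"taxid":464095,"rank":"order","names":{"Picornavirales":"scientific_name"},'
    '"parentid":2732506,"name":"Picornavirales"},'
    '{"taxid":2732506,"rank":"class","names":{"Pisoniviricetes":"scientific_name"},'
    '"parentid":2732408,"name":"Pisoniviricetes"},'
    '{"taxid":2732408,"rank":"phylum","names":{"Pisuviricota":"scientific_name"},'
    '"parentid":2732396,"name":"Pisuviricota"}]}\n'
)
print(subtree_names(io.StringIO(SAMPLE)))
```

In a pipeline, the script would call subtree_names(sys.stdin) and read the output of ncbi-taxonomist subtree directly.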
XML output
2.6 Group
ncbi-taxonomist group creates and lists taxonomic groups in a local ncbi-taxonomist database.
2.6.1 Creating a group
$: ncbi-taxonomist collect -n 'Black willow' 'Black hickory' | \
   ncbi-taxonomist import -db taxa.db | \
   ncbi-taxonomist group --add tree -db taxa.db
2.6.2 Retrieve a group
Groups can be retrieved as taxids and processed, e.g. with jq, and reused.
$: ncbi-taxonomist group --get tree -db taxa.db | \
   jq '.taxa[]' | \
   ncbi-taxonomist map -t -db taxa.db
CHAPTER 3
Cookbook
Contents
• Reformatting results
  – Convert accession lineages into TSV
  – Convert a lineage into a table
• Importing accessions
  – Map accessions and collect corresponding taxa
• Creating a valid XML file from line based XML output
3.1 Reformatting results
Examples of how to use jq to reformat JSON output. For more jq help, please refer to:
• jq manual
• Reshaping JSON with jq
3.1.1 Convert accession lineages into TSV
Convert the lineages of several nucleotide accessions into tab-separated output. The queried accession is printed in the first field. Substituting @tsv with @csv in the example results in CSV output.
ncbi-taxonomist map -a QZWG01000002.1 MG831203 | ncbi-taxonomist resolve --mapping | \
  jq -r '[.query, .lineage[].name]|@tsv'
MG831203	Deformed wing virus	Iflavirus	Iflaviridae	Picornavirales	Pisoniviricetes	Pisuviricota	Orthornavirae	Riboviria	Viruses
QZWG01000002.1	Glycine soja	Glycine subgen. Soja	Glycine	Phaseoleae	indigoferoid/millettioid clade	NPAAA clade	50 kb inversion clade	Papilionoideae	Fabaceae	Fabales	fabids	rosids	Pentapetalae	Gunneridae	eudicotyledons	Mesangiospermae	Magnoliopsida	Spermatophyta	Euphyllophyta	Tracheophyta	Embryophyta	Streptophytina	Streptophyta	Viridiplantae	Eukaryota	cellular organisms
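The same reshaping can be done in Python when jq is not available. The sketch below assumes one resolve record per line with the query and lineage keys shown above. lineage_rows and write_table are hypothetical helpers, and the sample record is abbreviated. Note that jq's @csv quotes every string, whereas csv.writer only quotes fields that need it.

```python
import csv
import io
import json
import sys


def lineage_rows(stream):
    """Yield [query, name, name, ...] for each resolve record in a line-based stream."""
    for line in stream:
        if line.strip():
            record = json.loads(line)
            yield [record["query"]] + [taxon["name"] for taxon in record["lineage"]]


def write_table(stream, out=sys.stdout, delimiter="\t"):
    """Write one row per record; pass delimiter="," for CSV."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in lineage_rows(stream):
        writer.writerow(row)


# Abbreviated resolve record (lineage shortened, other keys omitted).
SAMPLE = ('{"query":"MG831203","lineage":[{"name":"Deformed wing virus"},'
          '{"name":"Iflavirus"},{"name":"Viruses"}]}\n')
out = io.StringIO()
write_table(io.StringIO(SAMPLE), out)
print(out.getvalue(), end="")
```

In a pipeline, write_table(sys.stdin) would read the output of ncbi-taxonomist resolve --mapping directly.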
3.1.2 Convert a lineage into a table
Convert the lineage into a table with the tab-separated columns taxid, name, rank, and parentid.
ncbi-taxonomist resolve -t 9606 | \
  jq -r '.lineage[]|"\(.taxid)\t\(.name)\t\(.rank)\t\(.parentid)"'
9606	Homo sapiens	species	9605
9605	Homo	genus	207598
207598	Homininae	subfamily	9604
9604	Hominidae	family	314295
314295	Hominoidea	superfamily	9526
9526	Catarrhini	parvorder	314293
314293	Simiiformes	infraorder	376913
376913	Haplorrhini	suborder	9443
9443	Primates	order	314146
314146	Euarchontoglires	superorder	1437010
1437010	Boreoeutheria	clade	9347
9347	Eutheria	clade	32525
32525	Theria	clade	40674
40674	Mammalia	class	32524
32524	Amniota	clade	32523
32523	Tetrapoda	clade	1338369
1338369	Dipnotetrapodomorpha	clade	8287
8287	Sarcopterygii	superclass	117571
117571	Euteleostomi	clade	117570
117570	Teleostomi	clade	7776
7776	Gnathostomata	clade	7742
7742	Vertebrata	clade	89593
89593	Craniata	subphylum	7711
7711	Chordata	phylum	33511
33511	Deuterostomia	clade	33213
33213	Bilateria	clade	6072
6072	Eumetazoa	clade	33208
33208	Metazoa	kingdom	33154
33154	Opisthokonta	clade	2759
2759	Eukaryota	superkingdom	131567
131567	cellular organisms	no rank	null
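A Python equivalent of this table is sketched below. It assumes the taxid, name, rank, and parentid keys of the resolve output. Like jq's string interpolation, it prints the root's null parentid as null. lineage_table is a hypothetical helper, and the sample is the taxid 2 lineage from section 2.4.3 with the names objects omitted.

```python
import json


def lineage_table(record):
    """Return taxid, name, rank, parentid per lineage entry, tab-separated.

    A missing parent (the root) is printed as "null", matching the jq output.
    """
    rows = []
    for taxon in record["lineage"]:
        parent = "null" if taxon["parentid"] is None else str(taxon["parentid"])
        rows.append("\t".join([str(taxon["taxid"]), taxon["name"], taxon["rank"], parent]))
    return "\n".join(rows)


record = json.loads(
    '{"query":"2","lineage":['
    '{"taxid":2,"rank":"superkingdom","parentid":131567,"name":"Bacteria"},'
    '{"taxid":131567,"rank":"no rank","parentid":null,"name":"cellular organisms"}]}'
)
print(lineage_table(record))
```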
3.2 Importing accessions
Mapping accessions fetches only the corresponding taxid, not the remaining taxon metadata.
3.2.1 Map accessions and collect corresponding taxa
ncbi-taxonomist map --entrezdb protein --accessions AFR11853 AIA66128.1 | \
  ncbi-taxonomist import -db taxa.db | \
  jq '.accession.taxid' | \
  ncbi-taxonomist collect -t | \
  ncbi-taxonomist import -db taxa.db
{"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}
{"taxid":2559587,"rank":"clade","names":{"Riboviria":"scientific_name"},"parentid":10239,"name":"Riboviria"}
{"taxid":2732396,"rank":"kingdom","names":{"Orthornavirae":"scientific_name"},"parentid":2559587,"name":"Orthornavirae"}
{"taxid":2732408,"rank":"phylum","names":{"Pisuviricota":"scientific_name"},"parentid":2732396,"name":"Pisuviricota"}
{"taxid":2732507,"rank":"class","names":{"Stelpaviricetes":"scientific_name"},"parentid":2732408,"name":"Stelpaviricetes"}
{"taxid":2732551,"rank":"order","names":{"Stellavirales":"scientific_name"},"parentid":2732507,"name":"Stellavirales"}
{"taxid":39733,"rank":"family","names":{"Astroviridae":"scientific_name"},"parentid":2732551,"name":"Astroviridae"}
{"taxid":249588,"rank":"genus","names":{"Mamastrovirus":"scientific_name"},"parentid":39733,"name":"Mamastrovirus"}
{"taxid":1239567,"rank":"species","names":{"Mamastrovirus 3":"scientific_name","Porcine astrovirus":"EquivalentName"},"parentid":249588,"name":"Mamastrovirus 3"}
{"taxid":2585030,"rank":"no rank","names":{"unclassified Riboviria":"scientific_name"},"parentid":2559587,"name":"unclassified Riboviria"}
{"taxid":439490,"rank":"no rank","names":{"unclassified ssRNA viruses":"scientific_name"},"parentid":2585030,"name":"unclassified ssRNA viruses"}
{"taxid":35278,"rank":"clade","names":{"unclassified ssRNA positive-strand viruses":"scientific_name"},"parentid":439490,"name":"unclassified ssRNA positive-strand viruses"}
{"taxid":1224525,"rank":"species","names":{"Cadicistrovirus":"scientific_name"},"parentid":35278,"name":"Cadicistrovirus"}
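Each collected taxon refers to its parent via parentid, so a lineage can be rebuilt from this output without another Entrez query. The Python sketch below assumes one taxon object per line, as printed above. lineage_from_taxa is a hypothetical helper. The sample rows are the Mamastrovirus 3 taxa from this output with the names objects omitted.

```python
import json


def lineage_from_taxa(lines, taxid):
    """Follow parentid links from taxid up to the root and return the names.

    lines: one taxon JSON object per line, as printed by collect/import.
    Stops at a null parentid, at a taxid missing from the input, or on a cycle.
    """
    taxa = {}
    for line in lines:
        if line.strip():
            taxon = json.loads(line)
            taxa[taxon["taxid"]] = taxon
    path, seen = [], set()
    current = taxid
    while current in taxa and current not in seen:
        seen.add(current)
        path.append(taxa[current]["name"])
        current = taxa[current]["parentid"]
    return path


# taxid, parentid, name of the Mamastrovirus 3 lineage from the output above.
ROWS = [
    (10239, None, "Viruses"),
    (2559587, 10239, "Riboviria"),
    (2732396, 2559587, "Orthornavirae"),
    (2732408, 2732396, "Pisuviricota"),
    (2732507, 2732408, "Stelpaviricetes"),
    (2732551, 2732507, "Stellavirales"),
    (39733, 2732551, "Astroviridae"),
    (249588, 39733, "Mamastrovirus"),
    (1239567, 249588, "Mamastrovirus 3"),
]
LINES = [json.dumps({"taxid": t, "parentid": p, "name": n}) for t, p, n in ROWS]
print(" > ".join(lineage_from_taxa(LINES, 1239567)))
```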
3.3 Creating a valid XML file from line based XML output
To create a valid XML document from the line based output, the output has to be enclosed in a single root element, i.e. between an opening and a closing root tag. On Linux, this can be achieved with a subshell, as shown in Listing 3.1.
Listing 3.1: Creating valid XML from line based output. Line 3 shows the command to create a valid XML output. The xmllint command on line 4 is not required but demonstrates the validity of the created XML output.
1 $: ncbi-taxonomist map --accessions QZWG01000002.1 MG831203 | \
2 ncbi-taxonomist resolve --xml --mapping | \
3 (echo "
4 xmllint --pretty 1 -
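If the output is processed in Python anyway, the wrapping can also be done there. The sketch below uses xml.etree.ElementTree from the standard library. The root tag name and the sample fragments are hypothetical and do not reproduce the exact ncbi-taxonomist XML elements. The sketch also assumes the lines carry no XML declaration.

```python
import xml.etree.ElementTree as ET


def wrap_xml_lines(lines, root_tag="results"):
    """Concatenate line-based XML fragments under one root element and parse them.

    root_tag is a hypothetical name; any tag works if it is used for both the
    opening and the closing tag. Raises ET.ParseError if not well-formed.
    """
    body = "".join(line.strip() for line in lines if line.strip())
    return ET.fromstring(f"<{root_tag}>{body}</{root_tag}>")


def is_well_formed(text):
    """True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments; element names are illustrative only.
LINES = [
    "<taxon><taxid>9606</taxid><name>Homo sapiens</name></taxon>\n",
    "<taxon><taxid>9605</taxid><name>Homo</name></taxon>\n",
]
print(is_well_formed("".join(LINES)))  # several top-level elements: not a valid document
root = wrap_xml_lines(LINES)
print([taxon.findtext("taxid") for taxon in root])
```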
CHAPTER 4
Container
ncbi-taxonomist comes with a Docker image and a Singularity image. Both include jq to facilitate JSON handling, and both provide the mountpoint /dbs to mount host directories, e.g. to use local databases.
Content
• Docker
  – Install
  – Test
  – Basic usage
    * Mapping
    * Resolving
    * Pipelines
    * Local database
    * Docker ncbi-taxonomist and jq
• Singularity
  – Install
    * Build
  – Test
  – Basic usage
    * Mapping
    * Resolving
    * Pipelines
    * Local database
    * Singularity ncbi-taxonomist and jq
Note: The commands shown here assume a current Linux system. Please adjust them to your system accordingly.
4.1 Docker
The Docker container can be found at https://gitlab.com/janpb/ncbi-taxonomist/container_registry/. Please check the Docker Docs if some commands are unclear.
• The Docker image creates a user named user, which runs all commands in the container
• The container has the mountpoint /dbs to bind host paths
4.1.1 Install
The latest ncbi-taxonomist Docker image can be pulled from registry.gitlab.com/janpb/ncbi-taxonomist:latest. It can be run with the command docker run registry.gitlab.com/janpb/ncbi-taxonomist. If desired, the image can be given a more concise tag using docker tag registry.gitlab.com/janpb/ncbi-taxonomist ncbi-taxonomist.
1 $: docker pull registry.gitlab.com/janpb/ncbi-taxonomist:latest
2 latest: Pulling from janpb/ncbi-taxonomist
3 cbdbe7a5bc2a: Pull complete
4 50d9a3e26028: Pull complete
5 a0e2567dead0: Pull complete
6 #cut
7 $: docker tag registry.gitlab.com/janpb/ncbi-taxonomist:latest ncbi-taxonomist
8 $: docker images
9 ncbi-taxonomist                             latest   f957b80d1034   22 hours ago   68.3MB
10 registry.gitlab.com/janpb/ncbi-taxonomist  latest   f957b80d1034   22 hours ago   68.3MB
Line 6 indicates cut output. The output on lines 2-5 and 9-10 will likely look different on your system.
4.1.2 Test
Assuming the image is tagged ncbi-taxonomist, the following command should print the basic usage:
$: docker run --rm -it ncbi-taxonomist
usage: ncbi-taxonomist [--version] [-v] [--apikey APIKEY] {map,resolve,import,collect,subtree,group} ...

commands:
  {map,resolve,import,collect,subtree,group}
    map                 Map taxid to names and vice-versa
#cut
4.1.3 Basic usage
The examples assume the image has been tagged ncbi-taxonomist and show representative commands.
Mapping
$: docker run --rm -it ncbi-taxonomist map -t 9606
{"mode":"mapping","query":"9606","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"}}
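The names object maps each name to its type, so a single type can hold several names (e.g. the Inpart names of Bacteria). The Python sketch below assumes the output structure shown above and inverts names into type → names. names_by_type is a hypothetical helper.

```python
import json


def names_by_type(taxon):
    """Invert a taxon's "names" object ({name: type}) into {type: [names]}."""
    grouped = {}
    for name, name_type in taxon["names"].items():
        grouped.setdefault(name_type, []).append(name)
    return grouped


LINE = ('{"mode":"mapping","query":"9606","cast":"taxon","taxon":{"taxid":9606,'
        '"rank":"species","names":{"Homo sapiens":"scientific_name","human":'
        '"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"}}')
record = json.loads(LINE)
print(names_by_type(record["taxon"]))
```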
Resolving
$: docker run --rm -it ncbi-taxonomist resolve -t2 -n 'Arabidopsis'
{"mode":"resolve","query":"Arabidopsis","cast":"taxon","taxon":{"taxid":3701,"rank":"genus","names":{"Arabidopsis":"scientific_name","Cardaminopsis":"Synonym"},"parentid":980083,"name":"Arabidopsis"},"lineage":[{"taxid":3701,"rank":"genus","names":{"Arabidopsis":"scientific_name","Cardaminopsis":"Synonym"},"parentid":980083,"name":"Arabidopsis"},{"taxid":980083,"rank":"tribe","names":{"Camelineae":"scientific_name"},"parentid":3700,"name":"Camelineae"},{"taxid":3700,"rank":"family","names":{"Brassicaceae":"scientific_name"},"parentid":3699,"name":"Brassicaceae"},{"taxid":3699,"rank":"order","names":{"Brassicales":"scientific_name"},"parentid":91836,"name":"Brassicales"},{"taxid":91836,"rank":"clade","names":{"malvids":"scientific_name"},"parentid":71275,"name":"malvids"},{"taxid":71275,"rank":"clade","names":{"rosids":"scientific_name"},"parentid":1437201,"name":"rosids"},{"taxid":1437201,"rank":"clade","names":{"Pentapetalae":"scientific_name"},"parentid":91827,"name":"Pentapetalae"},{"taxid":91827,"rank":"clade","names":{"Gunneridae":"scientific_name"},"parentid":71240,"name":"Gunneridae"},{"taxid":71240,"rank":"clade","names":{"eudicotyledons":"scientific_name"},"parentid":1437183,"name":"eudicotyledons"},{"taxid":1437183,"rank":"clade","names":{"Mesangiospermae":"scientific_name"},"parentid":3398,"name":"Mesangiospermae"},{"taxid":3398,"rank":"class","names":{"Magnoliopsida":"scientific_name"},"parentid":58024,"name":"Magnoliopsida"},{"taxid":58024,"rank":"clade","names":{"Spermatophyta":"scientific_name"},"parentid":78536,"name":"Spermatophyta"},{"taxid":78536,"rank":"clade","names":{"Euphyllophyta":"scientific_name"},"parentid":58023,"name":"Euphyllophyta"},{"taxid":58023,"rank":"clade","names":{"Tracheophyta":"scientific_name"},"parentid":3193,"name":"Tracheophyta"},{"taxid":3193,"rank":"clade","names":{"Embryophyta":"scientific_name"},"parentid":131221,"name":"Embryophyta"},{"taxid":131221,"rank":"subphylum","names":{"Streptophytina":"scientific_name"},"parentid":35493,"name":"Streptophytina"},{"taxid":35493,"rank":"phylum","names":{"Streptophyta":"scientific_name"},"parentid":33090,"name":"Streptophyta"},{"taxid":33090,"rank":"kingdom","names":{"Viridiplantae":"scientific_name"},"parentid":2759,"name":"Viridiplantae"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
{"mode":"resolve","query":"2","cast":"taxon","taxon":{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":"Bacteria"},"lineage":[{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
Pipelines
$: docker run --rm -i ncbi-taxonomist map -edb bioproject -a PRJNA604394 | \
   docker run --rm -i ncbi-taxonomist resolve -m
{"mode":"resolve","query":"PRJNA604394","cast":"accs","accs":{"taxid":573,"accessions":{"project_id":604394,"project_acc":"PRJNA604394","project_name":"Klebsiella pneumoniae strain:S01"},"db":"bioproject","uid":604394},"lineage":[{"taxid":573,"rank":"species","names":{"Klebsiella pneumoniae":"scientific_name","'Klebsiella aerogenes' (Kruse) Taylor et al. 1956":"Synonym","Bacillus pneumoniae":"Synonym","Bacterium pneumoniae crouposae":"Synonym","Hyalococcus pneumoniae":"Synonym","Klebsiella pneumoniae aerogenes":"Synonym","Klebsiella sp. 2N3":"Includes","Klebsiella sp. C1(2016)":"Includes","Klebsiella sp. M-AI-2":"Includes","Klebsiella sp. PB12":"Includes","Klebsiella sp. RCE-7":"Includes","ATCC 13883":"type material","ATCC:13883":"type material","BCCM/LMG:2095":"type material","CCUG 225":"type material","CCUG:225":"type material","CDC 298-53":"type material","CDC:298-53":"type material","CIP 82.91":"type material","CIP:82.91":"type material","DSM 30104":"type material","DSM:30104":"type material","HAMBI 450":"type material","HAMBI:450":"type material","IAM 14200":"type material","IAM:14200":"type material","IFO 14940":"type material","IFO:14940":"type material","JCM 1662":"type material","JCM:1662":"type material","LMG 2095":"type material","LMG:2095":"type material","NBRC 14940":"type material","NBRC:14940":"type material","NCTC 9633":"type material","NCTC:9633":"type material"},"parentid":570,"name":"Klebsiella pneumoniae"},{"taxid":570,"rank":"genus","names":{"Klebsiella":"scientific_name"},"parentid":543,"name":"Klebsiella"},{"taxid":543,"rank":"family","names":{"Enterobacteriaceae":"scientific_name"},"parentid":91347,"name":"Enterobacteriaceae"},{"taxid":91347,"rank":"order","names":{"Enterobacterales":"scientific_name"},"parentid":1236,"name":"Enterobacterales"},{"taxid":1236,"rank":"class","names":{"Gammaproteobacteria":"scientific_name"},"parentid":1224,"name":"Gammaproteobacteria"},{"taxid":1224,"rank":"phylum","names":{"Proteobacteria":"scientific_name"},"parentid":2,"name":"Proteobacteria"},{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name"},"parentid":131567,"name":"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
Local database
To use a local database with the ncbi-taxonomist Docker container, bind the path on the host machine to the container's internal mount point /dbs (-v host_path:/dbs). To obtain write permissions on the host path, set the --user argument when writing to a local database. On Linux, the current user and group IDs can be obtained with the id command (Listing 4.1).
Listing 4.1: Populating a local database using the ncbi-taxonomist Docker container. Line 4 shows how to run the container as the current user.
1 $: ls ${PWD}
2 #empty
3 $: docker run --rm -i ncbi-taxonomist collect -t 9606 | \
4 docker run --rm -i --user $(id -u):$(id -g) -v ${PWD}:/dbs ncbi-taxonomist import -db /dbs/dockertaxa.db
5 {"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},
˓→"parentid":null,"name":"cellular organisms"}
6 {"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid
˓→":131567,"name":"Eukaryota"}
7 {"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid
˓→":2759,"name":"Opisthokonta"}
8 {"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid
˓→":33154,"name":"Metazoa"}
9 {"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,
˓→"name":"Eumetazoa"}
10 {"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,
˓→"name":"Bilateria"}
11 {"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid
˓→":33213,"name":"Deuterostomia"}
12 {"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,
˓→"name":"Chordata"}
13 {"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid
˓→":7711,"name":"Craniata"}
14 #cut
15 $: ls ${PWD}
16 dockertaxa.db
17 $: docker run --rm -i -v ${PWD}:/dbs ncbi-taxonomist resolve -t 9606 -db /dbs/dockertaxa.db
18 {"mode":"resolve","query":"9606","cast":"taxon","taxon":{"taxid":9606,"rank":"species
˓→","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":
˓→"CommonName"},"parentid":9605,"name":"Homo sapiens"},"lineage":[{"taxid":9606,"rank
˓→":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName",
˓→"man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},{"taxid":9605,"rank":
˓→"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"},{"taxid
˓→":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,
˓→"name":"Homininae"},{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_
˓→name"},"parentid":314295,"name":"Hominidae"},{"taxid":314295,"rank":"superfamily",
˓→"names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"},{
˓→"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid
˓→":314293,"name":"Catarrhini"},{"taxid":314293,"rank":"infraorder","names":{
˓→"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"},{"taxid
˓→":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid
˓→":9443,"name":"Haplorrhini"},{"taxid":9443,"rank":"order","names":{"Primates":
˓→"scientific_name"},"parentid":314146,"name":"Primates"},{"taxid":314146,"rank":
˓→"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name
˓→":"Euarchontoglires"},{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":
˓→"scientific_name"},"parentid":9347,"name":"Boreoeutheria"},{"taxid":9347,"rank":
˓→"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"},{
˓→"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,
˓→"name":"Theria"},{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name
˓→"},"parentid":32524,"name":"Mammalia"},{"taxid":32524,"rank":"clade","names":{
˓→"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"},{"taxid":32523,"rank
˓→":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":
˓→"Tetrapoda"},{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":
˓→"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"},{"taxid":8287,
˓→"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,
˓→"name":"Sarcopterygii"},{"taxid":117571,"rank":"clade","names":{"Euteleostomi":
˓→"scientific_name"},"parentid":117570,"name":"Euteleostomi"},{"taxid":117570,"rank":
˓→"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi
˓→"},{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},
˓→"parentid":7742,"name":"Gnathostomata"},{"taxid":7742,"rank":"clade","names":{
˓→"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"},{"taxid
˓→":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,
˓→"name":"Craniata"},{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_
˓→name"},"parentid":33511,"name":"Chordata"},{"taxid":33511,"rank":"clade","names":{
˓→"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"},{"taxid
˓→":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name
˓→":"Bilateria"},{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},
˓→"parentid":33208,"name":"Eumetazoa"},{"taxid":33208,"rank":"kingdom","names":{
˓→"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"},{"taxid":33154,"rank
˓→":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":
˓→"Opisthokonta"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":
˓→"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no
˓→rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":
˓→"cellular organisms"}]}
Docker ncbi-taxonomist and jq
To use the jq included in the container, set Docker's --entrypoint argument to jq when invoking docker run (Listing 4.2).
Listing 4.2: Using ncbi-taxonomist and jq together in the Docker container. Line 3 shows how to modify the Docker run command for jq.
1 $: docker run --rm -i ncbi-taxonomist map -a QZWG01000002.1 MG831203 | \
2 docker run --rm -i ncbi-taxonomist resolve --mapping | \
3 docker run --rm -i --entrypoint 'jq' ncbi-taxonomist -r '[.query, .lineage[].name]|@tsv'
4 MG831203 Deformed wing virus Iflavirus Iflaviridae
˓→Picornavirales Pisoniviricetes Pisuviricota Orthornavirae Riboviria
˓→Viruses
5 QZWG01000002.1 Glycine soja Glycine subgen. Soja Glycine Phaseoleae
˓→indigoferoid/millettioid clade NPAAA clade 50 kb inversion clade
˓→Papilionoideae Fabaceae Fabales fabids rosids Pentapetalae Gunneridae
˓→ eudicotyledons Mesangiospermae Magnoliopsida Spermatophyta Euphyllophyta
˓→Tracheophyta Embryophyta Streptophytina Streptophyta Viridiplantae
˓→Eukaryota cellular organisms
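If jq is not at hand, the same table can be produced with a few lines of Python. The following minimal sketch assumes only the JSON-lines output of resolve shown above (one object per line with the keys query and lineage) and mimics the filter '[.query, .lineage[].name]|@tsv'. The file name lineage_tsv.py used below is hypothetical. Unlike jq's @tsv, this sketch does not escape tabs or backslashes inside names.

```python
import json
import sys
from typing import Iterable, Iterator


def lineage_rows(lines: Iterable[str]) -> Iterator[str]:
    """Yield one tab-separated row per resolve result: query, then lineage names."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSON-lines stream
        result = json.loads(line)
        names = [taxon["name"] for taxon in result.get("lineage", [])]
        yield "\t".join([str(result["query"])] + names)


if __name__ == "__main__":
    # Read resolve output from stdin, write one TSV row per query to stdout.
    for row in lineage_rows(sys.stdin):
        print(row)
```

Saved as lineage_tsv.py, it could replace the jq stage on line 3 of Listing 4.2, e.g. ... resolve --mapping | python3 lineage_tsv.py.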
4.2 Singularity
The Singularity container can be found at https://cloud.sylabs.io/library/jpb/ncbi-taxonomist/ncbi-taxonomist. Please check the Singularity documentation if some commands are unclear.
• The Singularity image creates a user named user, which runs all commands in the container.
• The container has the mount point /dbs to bind host paths.
4.2.1 Install
The latest ncbi-taxonomist Singularity image can be pulled from https://cloud.sylabs.io/library/jpb/ncbi-taxonomist/ncbi-taxonomist using the command singularity pull library://jpb/ncbi-taxonomist/ncbi-taxonomist. If desired, the image can be renamed to a more concise name, as shown on line 4.
1 $: singularity pull library://jpb/ncbi-taxonomist/ncbi-taxonomist
2 INFO: Downloading library image
3 23.7MiB / 23.7MiB [======] 100% 545.9 KiB/s 0s
4 $: mv ncbi-taxonomist_latest.sif ncbi-taxonomist.sif
The download progress shown on line 3 will likely look different on your system.
Build
The Singularity container can be built using the definition file container/SINGULARITY.def present in the repository.
To build the image locally, you need root permissions; alternatively, use the --remote option of the build command (Listing 4.3). For more Singularity build options, check the corresponding man page (man singularity build) or the Singularity documentation.
Listing 4.3: Building the ncbi-taxonomist Singularity container. The command on line 1 builds locally and requires root permissions, while the command on line 2 uses the --remote build option and does not require root permissions.
1 $: singularity build ncbi-taxonomist.sif SINGULARITY.def
2 $: singularity build --remote ncbi-taxonomist.sif SINGULARITY.def
4.2.2 Test
Assuming the image is named ncbi-taxonomist.sif, invoking it without arguments prints the basic usage, indicating a successful installation (Listing 4.4):
Listing 4.4: ncbi-taxonomist usage
1 $: ./ncbi-taxonomist.sif
2 usage: ncbi-taxonomist [--version] [-v] [--apikey APIKEY] {map,resolve,import,collect,subtree,group} ...
3
4 commands:
5 {map,resolve,import,collect,subtree,group}
6 map Map taxid to names and vice-versa
7 #cut
4.2.3 Basic usage
The examples assume the image is named ncbi-taxonomist.sif and show representative commands. The image can be used as an executable, i.e. it can be invoked as ./ncbi-taxonomist.sif, which corresponds to the command singularity run ncbi-taxonomist.sif. Listing 4.5 shows how to use both forms.
Mapping
1 $: ./ncbi-taxonomist.sif map -t 9606
2 {"mode":"mapping","query":"9606","cast":"taxon","taxon":{"taxid":9606,"rank":"species
˓→","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":
˓→"CommonName"},"parentid":9605,"name":"Homo sapiens"}}
Resolving
1 $: ./ncbi-taxonomist.sif resolve -t2 -n 'Arabidopsis'
2 {"mode":"resolve","query":"Arabidopsis","cast":"taxon","taxon":{"taxid":3701,"rank":
˓→"genus","names":{"Arabidopsis":"scientific_name","Cardaminopsis":"Synonym"},
˓→"parentid":980083,"name":"Arabidopsis"},"lineage":[{"taxid":3701,"rank":"genus",
˓→"names":{"Arabidopsis":"scientific_name","Cardaminopsis":"Synonym"},"parentid
˓→":980083,"name":"Arabidopsis"},{"taxid":980083,"rank":"tribe","names":{"Camelineae":
˓→"scientific_name"},"parentid":3700,"name":"Camelineae"},{"taxid":3700,"rank":"family
˓→","names":{"Brassicaceae":"scientific_name"},"parentid":3699,"name":"Brassicaceae"},
˓→{"taxid":3699,"rank":"order","names":{"Brassicales":"scientific_name"},"parentid
˓→":91836,"name":"Brassicales"},{"taxid":91836,"rank":"clade","names":{"malvids":
˓→"scientific_name"},"parentid":71275,"name":"malvids"},{"taxid":71275,"rank":"clade",
˓→"names":{"rosids":"scientific_name"},"parentid":1437201,"name":"rosids"},{"taxid
˓→":1437201,"rank":"clade","names":{"Pentapetalae":"scientific_name"},"parentid
˓→":91827,"name":"Pentapetalae"},{"taxid":91827,"rank":"clade","names":{"Gunneridae":
˓→"scientific_name"},"parentid":71240,"name":"Gunneridae"},{"taxid":71240,"rank":
˓→"clade","names":{"eudicotyledons":"scientific_name"},"parentid":1437183,"name":
˓→"eudicotyledons"},{"taxid":1437183,"rank":"clade","names":{"Mesangiospermae":
˓→"scientific_name"},"parentid":3398,"name":"Mesangiospermae"},{"taxid":3398,"rank":
˓→"class","names":{"Magnoliopsida":"scientific_name"},"parentid":58024,"name":
˓→"Magnoliopsida"},{"taxid":58024,"rank":"clade","names":{"Spermatophyta":"scientific_
˓→name"},"parentid":78536,"name":"Spermatophyta"},{"taxid":78536,"rank":"clade","names
˓→":{"Euphyllophyta":"scientific_name"},"parentid":58023,"name":"Euphyllophyta"},{
˓→"taxid":58023,"rank":"clade","names":{"Tracheophyta":"scientific_name"},"parentid
˓→":3193,"name":"Tracheophyta"},{"taxid":3193,"rank":"clade","names":{"Embryophyta":
˓→"scientific_name"},"parentid":131221,"name":"Embryophyta"},{"taxid":131221,"rank":
˓→"subphylum","names":{"Streptophytina":"scientific_name"},"parentid":35493,"name":
˓→"Streptophytina"},{"taxid":35493,"rank":"phylum","names":{"Streptophyta":
˓→"scientific_name"},"parentid":33090,"name":"Streptophyta"},{"taxid":33090,"rank":
˓→"kingdom","names":{"Viridiplantae":"scientific_name"},"parentid":2759,"name":
˓→"Viridiplantae"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":
˓→"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no
˓→rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":
˓→"cellular organisms"}]}
3 {"mode":"resolve","query":"2","cast":"taxon","taxon":{"taxid":2,"rank":"superkingdom",
˓→"names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":
˓→"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart",
˓→"Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid
˓→":131567,"name":"Bacteria"},"lineage":[{"taxid":2,"rank":"superkingdom","names":{
˓→"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName
˓→","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":
˓→"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":
˓→"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":
˓→"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
Pipelines
1 $: ./ncbi-taxonomist.sif map -edb bioproject -a PRJNA604394 | \
2 ./ncbi-taxonomist.sif resolve -m
3 {"mode":"resolve","query":"PRJNA604394","cast":"accs","accs":{"taxid":573,"accessions
˓→":{"project_id":604394,"project_acc":"PRJNA604394","project_name":"Klebsiella
˓→pneumoniae strain:S01"},"db":"bioproject","uid":604394},"lineage":[{"taxid":573,
˓→"rank":"species","names":{"Klebsiella pneumoniae":"scientific_name","'Klebsiella
˓→aerogenes' (Kruse) Taylor et al. 1956":"Synonym","Bacillus pneumoniae":"Synonym",
˓→"Bacterium pneumoniae crouposae":"Synonym","Hyalococcus pneumoniae":"Synonym",
˓→"Klebsiella pneumoniae aerogenes":"Synonym","Klebsiella sp. 2N3":"Includes",
˓→"Klebsiella sp. C1(2016)":"Includes","Klebsiella sp. M-AI-2":"Includes","Klebsiella
˓→sp. PB12":"Includes","Klebsiella sp. RCE-7":"Includes","ATCC 13883":"type material",
˓→"ATCC:13883":"type material","BCCM/LMG:2095":"type material","CCUG 225":"type
˓→material","CCUG:225":"type material","CDC 298-53":"type material","CDC:298-53":
˓→"type material","CIP 82.91":"type material","CIP:82.91":"type material","DSM 30104":
˓→"type material","DSM:30104":"type material","HAMBI 450":"type material","HAMBI:450":
˓→"type material","IAM 14200":"type material","IAM:14200":"type material","IFO 14940":
˓→"type material","IFO:14940":"type material","JCM 1662":"type material","JCM:1662":
˓→"type material","LMG 2095":"type material","LMG:2095":"type material","NBRC 14940":
˓→"type material","NBRC:14940":"type material","NCTC 9633":"type material","NCTC:9633
˓→":"type material"},"parentid":570,"name":"Klebsiella pneumoniae"},{"taxid":570,"rank
˓→":"genus","names":{"Klebsiella":"scientific_name"},"parentid":543,"name":"Klebsiella
˓→"},{"taxid":543,"rank":"family","names":{"Enterobacteriaceae":"scientific_name"},
˓→"parentid":91347,"name":"Enterobacteriaceae"},{"taxid":91347,"rank":"order","names":
˓→{"Enterobacterales":"scientific_name"},"parentid":1236,"name":"Enterobacterales"},{
˓→"taxid":1236,"rank":"class","names":{"Gammaproteobacteria":"scientific_name"},
˓→"parentid":1224,"name":"Gammaproteobacteria"},{"taxid":1224,"rank":"phylum","names":
˓→{"Proteobacteria":"scientific_name"},"parentid":2,"name":"Proteobacteria"},{"taxid
˓→":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name"},"parentid":131567,
˓→"name":"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":
˓→"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
Local database
To use a local database with the ncbi-taxonomist Singularity container, bind the path on the host machine to the container's internal mount point /dbs with the --bind option. This option cannot be passed when the image is invoked as an executable, so use singularity run instead (Listing 4.5). Alternatively, store the bind options in the environment variable SINGULARITY_BIND, which also works with the executable form (Listing 4.6).
Listing 4.5: Populating a local database using the ncbi-taxonomist Singularity container. Lines 4 and 17 show how to bind the current working directory to the container. #cut indicates shortened output.
1 $: ls ${PWD}
2 #empty
3 $: ./ncbi-taxonomist.sif collect -t 9606 | \
4 singularity run --bind ${PWD}:/dbs ncbi-taxonomist.sif import -db /dbs/simgtaxa.db
5 {"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},
˓→"parentid":null,"name":"cellular organisms"}
6 {"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid
˓→":131567,"name":"Eukaryota"}
7 {"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid
˓→":2759,"name":"Opisthokonta"}
8 {"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid
˓→":33154,"name":"Metazoa"}
9 {"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,
˓→"name":"Eumetazoa"}
10 {"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,
˓→"name":"Bilateria"}
11 {"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid
˓→":33213,"name":"Deuterostomia"}
12 {"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,
˓→"name":"Chordata"}
13 {"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid
˓→":7711,"name":"Craniata"}
14 #cut
15 $: ls ${PWD}
16 simgtaxa.db
17 $: singularity run --bind ${PWD}:/dbs ncbi-taxonomist.sif resolve -t 9606 -db /dbs/simgtaxa.db
18 {"mode":"resolve","query":"9606","cast":"taxon","taxon":{"taxid":9606,"rank":"species
˓→","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":
˓→"CommonName"},"parentid":9605,"name":"Homo sapiens"},"lineage":[{"taxid":9606,"rank
˓→":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName",
˓→"man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},{"taxid":9605,"rank":
˓→"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"},{"taxid
˓→":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,
˓→"name":"Homininae"},{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_
˓→name"},"parentid":314295,"name":"Hominidae"},{"taxid":314295,"rank":"superfamily",
˓→"names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"},{
˓→"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid
˓→":314293,"name":"Catarrhini"},{"taxid":314293,"rank":"infraorder","names":{
˓→"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"},{"taxid
˓→":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid
˓→":9443,"name":"Haplorrhini"},{"taxid":9443,"rank":"order","names":{"Primates":
˓→"scientific_name"},"parentid":314146,"name":"Primates"},{"taxid":314146,"rank":
˓→"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name
˓→":"Euarchontoglires"},{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":
˓→"scientific_name"},"parentid":9347,"name":"Boreoeutheria"},{"taxid":9347,"rank":
˓→"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"},{
˓→"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,
˓→"name":"Theria"},{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name
˓→"},"parentid":32524,"name":"Mammalia"},{"taxid":32524,"rank":"clade","names":{
˓→"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"},{"taxid":32523,"rank
˓→":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":
˓→"Tetrapoda"},{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":
˓→"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"},{"taxid":8287,
˓→"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,
˓→"name":"Sarcopterygii"},{"taxid":117571,"rank":"clade","names":{"Euteleostomi":
˓→"scientific_name"},"parentid":117570,"name":"Euteleostomi"},{"taxid":117570,"rank":
˓→"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi
˓→"},{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},
˓→"parentid":7742,"name":"Gnathostomata"},{"taxid":7742,"rank":"clade","names":{
˓→"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"},{"taxid
˓→":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,
˓→"name":"Craniata"},{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_
˓→name"},"parentid":33511,"name":"Chordata"},{"taxid":33511,"rank":"clade","names":{
˓→"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"},{"taxid
˓→":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name
˓→":"Bilateria"},{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},
˓→"parentid":33208,"name":"Eumetazoa"},{"taxid":33208,"rank":"kingdom","names":{
˓→"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"},{"taxid":33154,"rank
˓→":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":
˓→"Opisthokonta"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":
˓→"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no
˓→rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":
˓→"cellular organisms"}]}
Listing 4.6: Populating a local database using the ncbi-taxonomist Singularity container and the SINGULARITY_BIND environment variable. Line 1 shows how to set the environment variable; the output of the echo command on line 2 should show your current working directory followed by :/dbs. #result indicates the same results as for the corresponding commands in Listing 4.5.
1 $: export SINGULARITY_BIND="${PWD}:/dbs"
2 $: echo $SINGULARITY_BIND
3 /path/to/your/current/working/directory:/dbs
4 $: ./ncbi-taxonomist.sif collect -t 9606 | \
5 ./ncbi-taxonomist.sif import -db /dbs/simgtaxa.db
6 #result
7 $: ls ${PWD}
8 simgtaxa.db
9 $: ./ncbi-taxonomist.sif resolve -t 9606 -db /dbs/simgtaxa.db
10 #result
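The file written by import is an SQLite database. To check what the container wrote to the bound directory without going through ncbi-taxonomist, a minimal sketch using Python's standard sqlite3 module can list the tables and their row counts. The table names are read from sqlite_master rather than assumed, since the database schema is not described here; the file name count_rows.py is hypothetical.

```python
import sqlite3
import sys
from pathlib import Path
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Return {table_name: row_count} for every table in an SQLite file."""
    # Open read-only through a file: URI so a mistyped path is not created.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts: Dict[str, int] = {}
        for table in tables:
            # Names come from sqlite_master; double any quotes before quoting.
            quoted = '"' + table.replace('"', '""') + '"'
            counts[table] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    # Usage: python3 count_rows.py simgtaxa.db [more.db ...]
    for path in sys.argv[1:]:
        for name, count in table_row_counts(path).items():
            print(f"{path}\t{name}\t{count}")
```

Running python3 count_rows.py simgtaxa.db in the directory from Listing 4.6 would print one line per table; opening the file read-only avoids touching the database the container wrote.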
Singularity ncbi-taxonomist and jq
To use the jq included in the Singularity container, pass the --app jq option to the singularity run command (Listing 4.7).
Listing 4.7: Using ncbi-taxonomist and jq together in the Singularity container. Line 1 shows how to invoke jq to print its usage (cut for clarity). Line 5 shows the use of jq in an ncbi-taxonomist Singularity pipeline.
1 $: singularity run --app jq ncbi-taxonomist.sif
2 #jq usage
3 $: ./ncbi-taxonomist.sif map -a QZWG01000002.1 MG831203 | \
4 ./ncbi-taxonomist.sif resolve --mapping | \
5 singularity run --app jq ncbi-taxonomist.sif -r '[.query, .lineage[].name]|@tsv'
6 MG831203 Deformed wing virus Iflavirus Iflaviridae
˓→Picornavirales Pisoniviricetes Pisuviricota Orthornavirae Riboviria
˓→Viruses
7 QZWG01000002.1 Glycine soja Glycine subgen. Soja Glycine Phaseoleae
˓→indigoferoid/millettioid clade NPAAA clade 50 kb inversion clade
˓→Papilionoideae Fabaceae Fabales fabids rosids Pentapetalae Gunneridae
˓→ eudicotyledons Mesangiospermae Magnoliopsida Spermatophyta Euphyllophyta
˓→Tracheophyta Embryophyta Streptophytina Streptophyta Viridiplantae
˓→Eukaryota cellular organisms
CHAPTER 5
Frequently Asked Questions
Content
• openSSL
  – I’m getting an SSL: CERTIFICATE_VERIFY_FAILED error
• SQLite
  – sqlite3.OperationalError: near "ON": syntax error during import
    * Possible solution 1
    * Possible solution 2
5.1 openSSL
5.1.1 I’m getting an SSL: CERTIFICATE_VERIFY_FAILED error
If you encounter an SSL error like SSL: CERTIFICATE_VERIFY_FAILED, you may need to enable the SSL module for Python or update the certificates. The required steps depend on your OS or distribution.
• macOS: run Install Certificates.command, usually found in the folder where Python has been installed.
On Linux, you may need to update the certificates:
• Debian: run update-ca-certificates --fresh and export the environment variable SSL_CERT_DIR=/etc/ssl/certs.
• Arch Linux: install the ca-certificates* packages, e.g. pacman -S ca-certificates*
It is also possible to update the certificates via pip:
• run pip install --upgrade certifi
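To see which certificate locations your Python installation actually consults, the standard ssl module can be queried directly. This minimal sketch reports the default CA file and directory, the names of the environment variables OpenSSL reads (such as SSL_CERT_DIR), and, if the third-party certifi package is installed, the path of its bundle. It only inspects the configuration and opens no network connection.

```python
import ssl
from typing import Dict, Optional


def ca_locations() -> Dict[str, Optional[str]]:
    """Collect the CA certificate locations seen by Python's ssl module (and certifi)."""
    paths = ssl.get_default_verify_paths()
    info: Dict[str, Optional[str]] = {
        "openssl_version": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,                  # None if the default file does not exist
        "capath": paths.capath,                  # None if the default directory does not exist
        "cafile_env": paths.openssl_cafile_env,  # usually SSL_CERT_FILE
        "capath_env": paths.openssl_capath_env,  # usually SSL_CERT_DIR
    }
    try:
        import certifi  # third-party package; only present if installed
        info["certifi"] = certifi.where()
    except ImportError:
        info["certifi"] = None
    return info


if __name__ == "__main__":
    for key, value in ca_locations().items():
        print(f"{key}: {value}")
```

If both cafile and capath are None, Python found no system certificates at its default locations, which is a likely cause of CERTIFICATE_VERIFY_FAILED.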
5.2 SQLite
5.2.1 sqlite3.OperationalError: near "ON": syntax error during import
ncbi-taxonomist aborts with an error message similar to the one shown below:
Traceback (most recent call last):
  File "/tools/python/3.7.4/bin/ncbi-taxonomist", line 93, in <module>
  ...
  File "...py", line 95, in import_stdin
    commit(db, taxa, names)
  File "/tools/python/3.7.4/lib/python3.7/site-packages/ncbitaxonomist/db/dbimporter.py", line 34, in commit
    db.add_taxa(taxa)
  File "/tools/python/3.7.4/lib/python3.7/site-packages/ncbitaxonomist/db/dbmanager.py", line 69, in add_taxa
    self.taxa.insert(self.connection, values)
  File "/tools/python/3.7.4/lib/python3.7/site-packages/ncbitaxonomist/db/table/taxa.py", line 39, in insert
    connection.cursor().executemany(stmt, taxavalues)
sqlite3.OperationalError: near "ON": syntax error
Possible solution 1
The local taxonomy database may use an outdated ncbi-taxonomist database schema. In this case, rebuild the database using a current version of ncbi-taxonomist.
Possible solution 2
This problem has been reported before (issue 2). ncbi-taxonomist uses the PostgreSQL-style UPSERT syntax (INSERT ... ON CONFLICT), which was introduced in SQLite 3.24.0. You need a recent Python version (>= 3.8) whose sqlite3 module uses SQLite >= 3.24.0. If you cannot update Python or SQLite, use one of the available containers. If none of these solutions works for you, please open an issue.
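Whether the SQLite library behind your Python interpreter supports UPSERT can be checked with the standard sqlite3 module. The sketch below prints the SQLite library version (sqlite3.sqlite_version, which is not the Python version) and tries an INSERT ... ON CONFLICT statement on a throw-away in-memory database. The table layout is illustrative only and is not ncbi-taxonomist's schema.

```python
import sqlite3
import sys

# SQLite release that introduced the INSERT ... ON CONFLICT (UPSERT) clause.
MIN_UPSERT_VERSION = (3, 24, 0)


def sqlite_supports_upsert() -> bool:
    """Run a PostgreSQL-style UPSERT against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        # Illustrative table, not ncbi-taxonomist's actual schema.
        conn.execute("CREATE TABLE taxa (taxid INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO taxa (taxid, name) VALUES (9606, 'man')")
        # Older SQLite versions reject this with an error like: near "ON": syntax error
        conn.execute(
            "INSERT INTO taxa (taxid, name) VALUES (9606, 'Homo sapiens') "
            "ON CONFLICT(taxid) DO UPDATE SET name = excluded.name"
        )
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("SQLite library:", sqlite3.sqlite_version)
    print("UPSERT supported:", sqlite_supports_upsert())
```

If the last line reports False, updating Python alone may not help when it links against an old system SQLite; the containers avoid the problem.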
CHAPTER 6
Module references
Documentation of the different modules and classes used in ncbi-taxonomist.
Contents
• Module references
  – ncbi-taxonomist
  – Analyzer
    * Accession analyzer: ncbitaxonomist.analyzer.accession
    * Collection analyzer: ncbitaxonomist.analyzer.collect
    * Mapping analyzer: ncbitaxonomist.analyzer.mapping
    * Resolve analyzer: ncbitaxonomist.analyzer.resolve
  – Cache
    * Cache module: ncbitaxonomist.cache
    * Taxa cache module: ncbitaxonomist.cache.taxa
    * Accession cache module: ncbitaxonomist.cache.accession
  – Converter
    * Base converter: ncbitaxonomist.convert.converter
    * Attribute mapping: ncbitaxonomist.convert.convertermap
    * Local database accession converter: ncbitaxonomist.convert.accessiondb
    * NCBI accessions converter: ncbitaxonomist.convert.ncbiaccession
    * NCBI taxon converter: ncbitaxonomist.convert.ncbitaxon
    * Local database taxon converter: ncbitaxonomist.convert.taxadb
  – Data models
    * Basic data model: ncbitaxonomist.model.datamodel
    * Taxon model: ncbitaxonomist.model.taxon
    * Accession Data model: ncbitaxonomist.model.accession
  – Database
    * Database manager: ncbitaxonomist.db.dbmanager
    * Database importer: ncbitaxonomist.db.dbimporter
    * Database tables
      · Base table: ncbitaxonomist.db.table.basetable
      · Taxa table: ncbitaxonomist.db.table.taxa
      · Names table: ncbitaxonomist.db.table.names
      · Accession table: ncbitaxonomist.db.table.accessions
      · Accession table: ncbitaxonomist.db.table.groups
  – Entrez results
    * Accession result: ncbitaxonomist.entrezresult.accession
    * Taxa cache module: ncbitaxonomist.entrezresult.mapping
    * Accession cache module: ncbitaxonomist.entrezresult.taxonomy
  – Formatter
    * Base module: ncbitaxonomist.formatter.base
    * JSON formatter: ncbitaxonomist.formatter.jsonformatter
    * XML formatter: ncbitaxonomist.formatter.xmlformatter
  – Logging
    * Configuration: ncbitaxonomist.log.conf
    * Logger: ncbitaxonomist.log.logger
  – Mappers
    * Mapper: ncbitaxonomist.mapper
    * Remote mapper: ncbitaxonomist.analyzer.mapping
    * Remote accession mapper: ncbitaxonomist.analyzer.accession
  – Parser
    * Argument parser: ncbitaxonomist.parser.arguments
    * Group data parser: ncbitaxonomist.parser.group
    * General stdout parser: ncbitaxonomist.parser.stdout
  – Queries
    * Collect queries
      · Base query: ncbitaxonomist.query.collect.collect
      · Name query: ncbitaxonomist.query.collect.name
      · Taxid query: ncbitaxonomist.query.collect.taxid
    * Map queries
      · Base query: ncbitaxonomist.query.map.map
      · Name query: ncbitaxonomist.query.map.name
      · Taxid query: ncbitaxonomist.query.map.taxid
      · Accession query: ncbitaxonomist.query.map.accession
    * Resolve queries
      · Base query: ncbitaxonomist.query.resolve.resolve
      · Name query: ncbitaxonomist.query.resolve.name
      · Taxid query: ncbitaxonomist.query.resolve.taxid
      · Accession query: ncbitaxonomist.query.resolve.accession
    * Remote query pipelines
  – Payloads
    * Base class for payloads: ncbitaxonomist.payload.payload
    * Taxid payload
    * Names payload
    * Accessions payload
    * Accession map payload
  – Resolver
  – Lineage resolver
  – Subtrees
    * Subtree
  – Subtree analyzer
  – Utility functions used across modules
    * Utility functions: ncbitaxonomist.utils
6.1 ncbi-taxonomist
This is the entry script for ncbi-taxonomist. It checks the parameters of the requested command and runs it.
6.2 Analyzer
Analyzers handle remote data from Entrez and inherit from entrezpy.base.analyzer.EutilsAnalyzer.
6.2.1 Accession analyzer: ncbitaxonomist.analyzer.accession
6.2.2 Collection analyzer: ncbitaxonomist.analyzer.collect
6.2.3 Mapping analyzer: ncbitaxonomist.analyzer.mapping
6.2.4 Resolve analyzer: ncbitaxonomist.analyzer.resolve
6.3 Cache
The cache stores taxa from already resolved queries so they can be reused, avoiding unnecessary local or remote database lookups.
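The idea behind the cache can be illustrated with a dictionary keyed by taxid: once a taxon is cached, its lineage can be rebuilt by following parentid links without another lookup. The following minimal sketch works on taxon records shaped like ncbi-taxonomist's JSON output (taxid, rank, parentid, name). The class TaxaCacheSketch and its methods are hypothetical and are not the API of ncbitaxonomist.cache.

```python
from typing import Dict, List, Optional, Set


class TaxaCacheSketch:
    """Hypothetical in-memory cache mapping taxid -> taxon record."""

    def __init__(self) -> None:
        self.taxa: Dict[int, dict] = {}

    def cache(self, taxon: dict) -> None:
        self.taxa[taxon["taxid"]] = taxon

    def incache(self, taxid: int) -> bool:
        return taxid in self.taxa

    def lineage(self, taxid: int) -> Optional[List[dict]]:
        """Follow parentid links from taxid up to the root.

        Returns None if a taxon on the way is not cached, i.e. a local or
        remote lookup would still be needed.
        """
        path: List[dict] = []
        seen: Set[int] = set()
        current: Optional[int] = taxid
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle in parentid links at taxid {current}")
            seen.add(current)
            taxon = self.taxa.get(current)
            if taxon is None:
                return None
            path.append(taxon)
            current = taxon["parentid"]  # None at the root
        return path
```

Records parsed with json.loads turn the root's "parentid":null into None, which ends the loop; a None result signals a cache miss rather than an error.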
6.3.1 Cache module: ncbitaxonomist.cache
6.3.2 Taxa cache module: ncbitaxonomist.cache.taxa
6.3.3 Accession cache module: ncbitaxonomist.cache.accession
class ncbitaxonomist.cache.accession.AccessionCache
    Class to handle caching of accessions. Accessions are stored in a mapping with accessions as keys and ncbitaxonomist.model.accession.AccessionData instances as values.
    cache(acc: Type[ncbitaxonomist.model.accession.Accession])
        Caches accession.
    get_accession(acc) → Type[ncbitaxonomist.model.accession.Accession]
        Returns given or all taxids in cache.
    incache(name=None, taxid=None)
        Tests if given accession is in cache.
    is_empty()
        Tests if cache is empty.
6.4 Converter
Converters convert between data models and plain attribute mappings.
6.4.1 Base converter: ncbitaxonomist.convert.converter
class ncbitaxonomist.convert.converter.ModelConverter
    Base class for converters between attributes and models.
    convert_from_model(model: Type[ncbitaxonomist.model.datamodel.DataModel], outdict: Mapping[KT, VT_co] = None) → Dict[KT, VT]
        Virtual method converts model to attributes.
    convert_to_model(attributes: Mapping[str, any], srcdb=None) → Type[ncbitaxonomist.model.datamodel.DataModel]
        Virtual method converts attributes to model.
    map_inattributes(mattribs: Mapping[str, any], indata: Mapping[str, any], convmap: Mapping[str, str], switch: bool = False)
        Map input attributes to wanted model attributes.
6.4.2 Attribute mapping: ncbitaxonomist.convert.convertermap
Maps indicating which data attributes are converted to which model attributes.
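Such a conversion map can be pictured as a plain dictionary from source attribute names to model attribute names, applied in one direction when building a model and inverted when exporting attributes again. The sketch below is only an illustration: the source field names (TaxId, Rank, ScientificName, ParentTaxId), CONVMAP and the function rename_attributes are hypothetical and are not taken from ncbitaxonomist.convert.convertermap.

```python
from typing import Dict, Mapping

# Hypothetical map: source attribute name -> model attribute name.
CONVMAP: Dict[str, str] = {
    "TaxId": "taxid",
    "Rank": "rank",
    "ScientificName": "name",
    "ParentTaxId": "parentid",
}


def rename_attributes(indata: Mapping[str, object],
                      convmap: Mapping[str, str],
                      invert: bool = False) -> Dict[str, object]:
    """Keep only mapped attributes and rename them.

    With invert=True the map is read in the opposite direction, e.g. to turn
    model attributes back into source attributes.
    """
    mapping = {v: k for k, v in convmap.items()} if invert else dict(convmap)
    return {mapping[key]: value for key, value in indata.items() if key in mapping}
```

Attributes missing from the map (such as an unrelated source field) are dropped, so a round trip through both directions only preserves mapped attributes.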
6.4.3 Local database accession converter: ncbitaxonomist.convert.accessiondb
class ncbitaxonomist.convert.accessiondb.DbAccessionConverter
    Class implementing a converter for accession attributes and models.
    convert_from_model(model: Type[ncbitaxonomist.model.accession.Accession], outdict=None) → Dict[str, str]
        Converts an accession model to attributes.
    convert_to_model(attributes: Mapping[str, any], srcdb=None) → Type[ncbitaxonomist.model.accession.Accession]
        Converts local database attributes to an accession model.
6.4.4 NCBI accessions converter: ncbitaxonomist.convert.ncbiaccession
class ncbitaxonomist.convert.ncbiaccession.NcbiAccessionConverter
    Converts NCBI accession data into a model, or a model into attributes.
    convert_from_model(model: Type[ncbitaxonomist.model.accession.Accession], outdict=None) → Dict[str, str]
        Converts an accession model to attributes.
    convert_to_model(attributes: Mapping[str, any], srcdb=None) → Type[ncbitaxonomist.model.accession.Accession]
        Converts NCBI attributes to an accession model.
6.4.5 NCBI taxon converter: ncbitaxonomist.convert.ncbitaxon
6.4.6 Local database taxon converter: ncbitaxonomist.convert.taxadb
class ncbitaxonomist.convert.taxadb.TaxaDbConverter
    Converts local database attributes into ncbitaxonomist.model.taxon.Taxon instances and vice versa.
    convert_from_model(model: Type[ncbitaxonomist.model.taxon.Taxon], outdict=None) → Dict[str, str]
        Virtual method; converts a model to attributes.
    convert_to_model(attributes: Mapping, srcdb=None) → Type[ncbitaxonomist.model.taxon.Taxon]
        Converts local database taxon attributes into an ncbitaxonomist.model.taxon.Taxon.
6.5 Data models
ncbi-taxonomist data models implement taxonomic and accession data. All models use ncbitaxonomist.model.datamodel.DataModel as their base class.
6.5.1 Basic data model: ncbitaxonomist.model.datamodel
class ncbitaxonomist.model.datamodel.DataModel(cast, attributes: Mapping[KT, VT_co] = None)
    Base class for data models.
    get_attributes() → Dict[str, any]
        Returns taxon attributes as a dictionary.
    classmethod new(attributes: Mapping[str, any] = None) → ncbitaxonomist.model.datamodel.DataModel
        Returns a new instance with the given attributes.
    classmethod new_from_json(json_attributes: str) → ncbitaxonomist.model.datamodel.DataModel
        Returns a new instance with attributes encoded in JSON.
    taxid()
ncbitaxonomist.model.datamodel.int_attribute(attribute)
    Enforces int for attribute.
ncbitaxonomist.model.datamodel.standardize_attributes(attributes: Mapping[str, any])
    Converts None into an empty dictionary. See the Important warning at https://docs.python.org/3/tutorial/controlflow.html#default-argument-values
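The warning referenced by standardize_attributes concerns mutable default arguments: a default dictionary is created once and shared between calls. The sketch below demonstrates the pitfall and the None-to-empty-dict pattern; the function bodies are a re-implementation inferred from the one-line description above, and append_calls and new_attributes are hypothetical names.

```python
from typing import Any, Dict, Mapping, Optional


def append_calls(attributes: Dict[str, Any] = {}) -> Dict[str, Any]:
    # Pitfall: this default dict is created once, at definition time,
    # and the same object is reused by every call that omits the argument
    attributes["calls"] = attributes.get("calls", 0) + 1
    return attributes


def standardize_attributes(attributes: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    # Safe pattern: default to None and create a fresh dict on each call
    # (a shallow copy is returned when a mapping is given)
    return {} if attributes is None else dict(attributes)


def new_attributes(attributes: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    attributes = standardize_attributes(attributes)
    attributes["calls"] = attributes.get("calls", 0) + 1
    return attributes
```

Calling append_calls() twice returns the same dictionary with "calls" equal to 2, whereas new_attributes() starts from an empty dictionary every time.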
6.5.2 Taxon model: ncbitaxonomist.model.taxon
6.5.3 Accession Data model: ncbitaxonomist.model.accession
Inheritance diagram: ncbitaxonomist.model.datamodel.DataModel → ncbitaxonomist.model.accession.Accession
class ncbitaxonomist.model.accession.Accession(attributes: Mapping[KT, VT_co] = None)
    get_accessions() → Dict[str, str]
        Returns accessions as a dictionary.
    get_attributes() → Dict[str, any]
        Returns taxon attributes as a dictionary.
    classmethod new(attributes: Mapping[str, any] = None) → ncbitaxonomist.model.datamodel.DataModel
        Returns a new instance with the given attributes.
    classmethod new_from_json(json_attributes: str) → ncbitaxonomist.model.datamodel.DataModel
        Returns a new instance with attributes encoded in JSON.
    taxid()
    update_accessions(accession: Mapping[str, str])
        Updates accessions from a dictionary with the structure accession:type.
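The accession model combines JSON construction with an accession:type dictionary. The following is a hypothetical stand-in, not the library class: the name AccessionSketch and the JSON keys "taxid" and "accessions" are assumptions chosen only to show how new_from_json, update_accessions, and get_accessions could fit together.

```python
import json
from typing import Any, Dict, Mapping, Optional


class AccessionSketch:
    """Hypothetical stand-in for an accession data model."""

    def __init__(self, attributes: Optional[Mapping[str, Any]] = None) -> None:
        attributes = {} if attributes is None else attributes
        # Assumed attribute keys: "taxid" and "accessions"
        self.taxon_id: Optional[int] = attributes.get("taxid")
        self.accessions: Dict[str, str] = dict(attributes.get("accessions", {}))

    @classmethod
    def new_from_json(cls, json_attributes: str) -> "AccessionSketch":
        # Decode a JSON object and build a new instance from its attributes
        return cls(json.loads(json_attributes))

    def update_accessions(self, accessions: Mapping[str, str]) -> None:
        # Merge a mapping with the structure accession -> accession type
        self.accessions.update(accessions)

    def get_accessions(self) -> Dict[str, str]:
        # Return a copy so callers cannot modify the internal state
        return dict(self.accessions)
```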
6.6 Database
Database modules for a local ncbi-taxonomist database
6.6.1 Database manager: ncbitaxonomist.db.dbmanager
6.6.2 Database importer: ncbitaxonomist.db.dbimporter
6.6.3 Database tables
Base table: ncbitaxonomist.db.table.basetable
class ncbitaxonomist.db.table.basetable.BaseTable(name: str, database: str)
    Implements a basic table in a taxonomist database.
    create(connection: Type[sqlite3.Connection]) → ncbitaxonomist.db.table.basetable.BaseTable
        Virtual function to create table.
    create_index(connection: Type[sqlite3.Connection]) → None
        Virtual function to create table index.
    insert(connection: Type[sqlite3.Connection], values: Tuple) → None
        Virtual function to insert rows.
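The "virtual function" pattern of BaseTable can be sketched in Python by raising NotImplementedError in the base class and overriding the methods in a concrete table. All names below (BaseTableSketch, RankTable, the "ranks" schema) are hypothetical and only mirror the create/create_index/insert interface described above.

```python
import sqlite3
from typing import Iterable, Tuple


class BaseTableSketch:
    """Hypothetical base table mirroring the create/create_index/insert interface."""

    def __init__(self, name: str, database: str) -> None:
        self.name = name
        self.database = database

    def create(self, connection: sqlite3.Connection) -> "BaseTableSketch":
        raise NotImplementedError  # "virtual": subclasses define the schema

    def create_index(self, connection: sqlite3.Connection) -> None:
        raise NotImplementedError

    def insert(self, connection: sqlite3.Connection, values: Iterable[Tuple]) -> None:
        raise NotImplementedError


class RankTable(BaseTableSketch):
    """Hypothetical concrete table storing (taxid, rank) rows."""

    def create(self, connection: sqlite3.Connection) -> "RankTable":
        # The table name comes from code, not user input, so f-strings are safe here
        connection.execute(
            f"CREATE TABLE IF NOT EXISTS {self.name} "
            "(taxid INTEGER PRIMARY KEY, rank TEXT)")
        return self  # returning self allows RankTable(...).create(conn)

    def create_index(self, connection: sqlite3.Connection) -> None:
        connection.execute(
            f"CREATE INDEX IF NOT EXISTS {self.name}_rank_idx ON {self.name} (rank)")

    def insert(self, connection: sqlite3.Connection,
               values: Iterable[Tuple[int, str]]) -> None:
        connection.executemany(
            f"INSERT OR REPLACE INTO {self.name} VALUES (?, ?)", values)
        connection.commit()
```

Returning the table from create() mirrors the documented return type and lets table creation be chained with index creation.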
Taxa table: ncbitaxonomist.db.table.taxa
class ncbitaxonomist.db.table.taxa.TaxaTable(database: str)
    Implements the taxa table for a local taxonomy database.
    create(connection: Type[sqlite3.Connection]) → ncbitaxonomist.db.table.taxa.TaxaTable
        Virtual function to create table.
    create_index(connection: Type[sqlite3.Connection]) → None
        Virtual function to create table index.
    get_lineage(connection: Type[sqlite3.Connection], taxid: int, name_table: str) → Type[sqlite3.Cursor]
        Recursive construction of the lineage from the given taxid to the highest parent.
    get_rows(connection: Type[sqlite3.Connection]) → Type[sqlite3.Cursor]
    get_subtree(connection: Type[sqlite3.Connection], taxid: int) → Type[sqlite3.Cursor]
        Depth-first search of taxon ids to find the subtree of taxid.
    get_taxids(connection: Type[sqlite3.Connection]) → Type[sqlite3.Cursor]
    insert(connection: Type[sqlite3.Connection], taxavalues: Iterable[Tuple[int, str, int]]) → None
        Virtual function to insert rows.
    insert_taxids(connection: Type[sqlite3.Connection], taxids: Iterable[int]) → None
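Walking a parent-pointer table upwards (lineage) or downwards (subtree) can be expressed in SQLite with recursive common table expressions. This is a sketch under assumptions: the schema taxa(taxid, rank, parentid) and a self-referencing root are hypothetical, and the subtree query uses a recursive CTE rather than the depth-first search named above, so the result is returned as an unordered set.

```python
import sqlite3
from typing import List, Set, Tuple

# Hypothetical schema: taxa(taxid INTEGER PRIMARY KEY, rank TEXT, parentid INTEGER);
# the root taxon is assumed to be its own parent.
LINEAGE_SQL = """
WITH RECURSIVE lineage(taxid, rank, parentid, depth) AS (
  SELECT taxid, rank, parentid, 0 FROM taxa WHERE taxid = ?
  UNION ALL
  SELECT t.taxid, t.rank, t.parentid, l.depth + 1
  FROM taxa AS t JOIN lineage AS l ON t.taxid = l.parentid
  WHERE l.taxid != l.parentid  -- stop at the self-referencing root
)
SELECT taxid, rank FROM lineage ORDER BY depth
"""

SUBTREE_SQL = """
WITH RECURSIVE subtree(taxid) AS (
  SELECT ?
  UNION  -- UNION (not UNION ALL) drops duplicates such as the root's self-loop
  SELECT t.taxid FROM taxa AS t JOIN subtree AS s ON t.parentid = s.taxid
)
SELECT taxid FROM subtree
"""


def get_lineage(connection: sqlite3.Connection, taxid: int) -> List[Tuple[int, str]]:
    """Return (taxid, rank) pairs from the query taxon up to the root."""
    return connection.execute(LINEAGE_SQL, (taxid,)).fetchall()


def get_subtree(connection: sqlite3.Connection, taxid: int) -> Set[int]:
    """Return the taxid itself and all of its descendants."""
    return {row[0] for row in connection.execute(SUBTREE_SQL, (taxid,))}
```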
Names table: ncbitaxonomist.db.table.names
class ncbitaxonomist.db.table.names.NameTable(database: str)
    Implements the name table in a taxonomist database.
    create(connection: Type[sqlite3.Connection]) → ncbitaxonomist.db.table.names.NameTable
        Virtual function to create table.
    create_index(connection: Type[sqlite3.Connection])
        Virtual function to create table index.
    get_rows(connection: Type[sqlite3.Connection]) → Type[sqlite3.Cursor]
    insert(connection: Type[sqlite3.Connection], values: Tuple[int, str, str])
        Virtual function to insert rows.
    name_to_taxid(connection: Type[sqlite3.Connection], name) → Type[sqlite3.Cursor]
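A name-to-taxid lookup is an indexed equality query on the name column. In this sketch the schema names(taxid, name, type), the index name, and the helper functions are hypothetical; only the (int, str, str) row shape follows the insert() signature above.

```python
import sqlite3
from typing import Iterable, List, Tuple


def create_names(connection: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per (taxid, name, name type)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS names (taxid INTEGER, name TEXT, type TEXT)")
    # An index on name keeps name -> taxid lookups fast on large tables
    connection.execute("CREATE INDEX IF NOT EXISTS names_name_idx ON names (name)")


def insert_names(connection: sqlite3.Connection,
                 values: Iterable[Tuple[int, str, str]]) -> None:
    connection.executemany("INSERT INTO names VALUES (?, ?, ?)", values)


def name_to_taxid(connection: sqlite3.Connection, name: str) -> List[Tuple[int, str]]:
    """Return (taxid, name type) pairs for an exact name match."""
    cursor = connection.execute(
        "SELECT taxid, type FROM names WHERE name = ? ORDER BY taxid", (name,))
    return cursor.fetchall()
```

Because one name can belong to several taxa, the lookup returns a list rather than a single taxid.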
Accession table: ncbitaxonomist.db.table.accessions
class ncbitaxonomist.db.table.accessions.AccessionTable(database)
    create(connection: Type[sqlite3.Connection]) → ncbitaxonomist.db.table.accessions.AccessionTable
        Virtual function to create table.
    create_index(connection: Type[sqlite3.Connection]) → None
        Virtual function to create table index.
    get_rows(connection: Type[sqlite3.Connection]) → Type[sqlite3.Cursor]
    insert(connection: Type[sqlite3.Connection], values: Iterable[Tuple[str, str, str, int, int]]) → None
        Virtual function to insert rows.
Group table: ncbitaxonomist.db.table.groups
class ncbitaxonomist.db.table.groups.GroupTable(database: str)
    create(connection: Type[sqlite3.Connection]) → ncbitaxonomist.db.table.groups.GroupTable
        Virtual function to create table.
    create_index(connection: Type[sqlite3.Connection]) → None
        Virtual function to create table index.
    delete_from_group(connection: Type[sqlite3.Connection], values: Iterable[Tuple[str, int]]) → None
    delete_group(connection: Type[sqlite3.Connection], groupname: str) → None
    insert(connection: Type[sqlite3.Connection], values: Iterable[Tuple[int, str]]) → None
        Virtual function to insert rows.
    retrieve_group(connection: Type[sqlite3.Connection], groupname: str)
    retrieve_names(connection: Type[sqlite3.Connection]) → Type[sqlite3.Cursor]
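The group operations map onto simple SQL statements over (taxid, groupname) rows. This sketch is hypothetical: the table name taxon_groups, its columns, and the function names are assumptions; only the tuple orders follow the signatures above ((taxid, groupname) for inserts, (groupname, taxid) for deletions).

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema; "taxon_groups" avoids clashing with SQL keywords such as GROUP
CREATE_SQL = ("CREATE TABLE IF NOT EXISTS taxon_groups "
              "(taxid INTEGER, name TEXT, UNIQUE (taxid, name))")


def create_groups(connection: sqlite3.Connection) -> None:
    connection.execute(CREATE_SQL)


def insert_group(connection: sqlite3.Connection, values: Iterable[Tuple[int, str]]) -> None:
    # Rows are (taxid, groupname); duplicate memberships are silently ignored
    connection.executemany(
        "INSERT OR IGNORE INTO taxon_groups (taxid, name) VALUES (?, ?)", values)


def delete_from_group(connection: sqlite3.Connection, values: Iterable[Tuple[str, int]]) -> None:
    # Rows are (groupname, taxid), matching the order of the placeholders
    connection.executemany(
        "DELETE FROM taxon_groups WHERE name = ? AND taxid = ?", values)


def delete_group(connection: sqlite3.Connection, groupname: str) -> None:
    connection.execute("DELETE FROM taxon_groups WHERE name = ?", (groupname,))


def retrieve_group(connection: sqlite3.Connection, groupname: str) -> List[int]:
    rows = connection.execute(
        "SELECT taxid FROM taxon_groups WHERE name = ? ORDER BY taxid", (groupname,))
    return [taxid for (taxid,) in rows]


def retrieve_names(connection: sqlite3.Connection) -> List[str]:
    rows = connection.execute("SELECT DISTINCT name FROM taxon_groups ORDER BY name")
    return [name for (name,) in rows]
```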
6.7 Entrez results
Implementations of Entrez results inherited from entrezpy.base.result.EutilsResult.
6.7.1 Accession result: ncbitaxonomist.entrezresult.accession
6.7.2 Mapping result: ncbitaxonomist.entrezresult.mapping
6.7.3 Taxonomy result: ncbitaxonomist.entrezresult.taxonomy
6.8 Formatter
Formats JSON and XML outputs.
6.8.1 Base module: ncbitaxonomist.formatter.base
6.8.2 JSON formatter: ncbitaxonomist.formatter.jsonformatter
6.8.3 XML formatter: ncbitaxonomist.formatter.xmlformatter
6.9 Logging
Logging for ncbi-taxonomist.
6.9.1 Configuration: ncbitaxonomist.log.conf
6.9.2 Logger: ncbitaxonomist.log.logger
6.10 Mappers
Mappers handle the mapping of taxids, names, and accessions to each other. Analyzers are inherited and adjusted from entrezpy.
6.10.1 Mapper: ncbitaxonomist.mapper
6.10.2 Remote mapper: ncbitaxonomist.analyzer.mapping
6.10.3 Remote accession mapper: ncbitaxonomist.analyzer.accession
6.11 Parser
Parsers used in ncbi-taxonomist
6.11.1 Argument parser: ncbitaxonomist.parser.arguments
ncbitaxonomist.parser.arguments.parse(basename)
ncbitaxonomist.parser.arguments.version(basename)
6.11.2 Group data parser: ncbitaxonomist.parser.group
class ncbitaxonomist.parser.group.GroupParser
    parse(groupname: str)
        Parses taxids from stdin and adds them to the group groupname.
    parse_taxa_list(taxa_list, taxids, groupname)
    parse_taxon(taxid, taxids, groupname)
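Parsing group input amounts to reading taxids from a text stream and pairing each with the group name. The input format here (whitespace- or comma-separated integer taxids) and the function name parse_group_input are assumptions for illustration; the actual GroupParser may expect a different format, such as the output of other ncbi-taxonomist commands.

```python
from typing import Iterable, List, Set, Tuple


def parse_group_input(lines: Iterable[str], groupname: str) -> List[Tuple[int, str]]:
    """Collect unique (taxid, groupname) rows from lines of text (e.g. sys.stdin).

    Hypothetical input format: whitespace- or comma-separated integer taxids.
    """
    rows: List[Tuple[int, str]] = []
    seen: Set[int] = set()
    for line in lines:
        for token in line.replace(",", " ").split():
            taxid = int(token)
            # Keep the first occurrence only, preserving input order
            if taxid not in seen:
                seen.add(taxid)
                rows.append((taxid, groupname))
    return rows
```

The resulting rows have the (taxid, groupname) shape used by GroupTable.insert().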
6.11.3 General stdout parser: ncbitaxonomist.parser.stdout
6.12 Queries
Queries are modules implementing a specific taxonomic query, either remote or for a local database.
6.12.1 Collect queries
Queries to collect taxa remotely from Entrez.
Base query: ncbitaxonomist.query.collect.collect
Name query: ncbitaxonomist.query.collect.name
Taxid query: ncbitaxonomist.query.collect.taxid
6.12.2 Map queries
Queries to map taxa locally or remotely from Entrez.
Base query: ncbitaxonomist.query.map.map
Name query: ncbitaxonomist.query.map.name
Taxid query: ncbitaxonomist.query.map.taxid
Accession query: ncbitaxonomist.query.map.accession
6.12.3 Resolve queries
Queries to resolve taxa and accessions locally or remotely from Entrez.
Base query: ncbitaxonomist.query.resolve.resolve
Name query: ncbitaxonomist.query.resolve.name
Taxid query: ncbitaxonomist.query.resolve.taxid
Accession query: ncbitaxonomist.query.resolve.accession
6.12.4 Remote query pipelines
entrezpy.conduit pipelines to fetch remote query data.
6.13 Payloads
Payloads implement the requested taxids, names, and accessions. They keep track of what has been successfully analyzed.
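Such bookkeeping can be pictured as moving each requested query from a pending set to a done set once it has been analyzed. The class PayloadSketch is hypothetical and does not reproduce the library's payload classes; it only illustrates how leftover pending queries could, for example, be passed on to a remote lookup.

```python
from typing import Iterable, Set


class PayloadSketch:
    """Hypothetical payload tracking which requested queries were analyzed."""

    def __init__(self, queries: Iterable[str]) -> None:
        self.pending: Set[str] = set(queries)
        self.done: Set[str] = set()

    def mark_done(self, query: str) -> None:
        # Only queries that were actually requested can be marked as done
        if query in self.pending:
            self.pending.remove(query)
            self.done.add(query)

    def is_complete(self) -> bool:
        # True once every requested query has been analyzed
        return not self.pending
```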
6.13.1 Base class for payloads: ncbitaxonomist.payload.payload
6.13.2 Taxid payload
6.13.3 Names payload
6.13.4 Accessions payload
6.13.5 Accession map payload
6.14 Resolver
The resolver module implements the resolving of lineages for names, taxids, and accessions.
6.15 Lineage resolver
The lineage resolver resolves whole lineages or only the lineage taxa between given ranks.
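Selecting the taxa between two ranks is a slice of the lineage. In this sketch the (taxid, rank) tuple representation, the function ranks_between, and its fallback behavior for missing ranks are assumptions, not the resolver's actual interface.

```python
from typing import List, Optional, Tuple

Taxon = Tuple[int, str]  # hypothetical (taxid, rank) representation


def ranks_between(lineage: List[Taxon], lower: Optional[str] = None,
                  upper: Optional[str] = None) -> List[Taxon]:
    """Return lineage taxa from rank `lower` up to rank `upper`, inclusive.

    `lineage` is ordered from the query taxon towards the root. A rank that
    is None or not present in the lineage falls back to the corresponding
    end of the lineage, so passing no ranks returns the whole lineage.
    """
    ranks = [rank for _, rank in lineage]
    start = ranks.index(lower) if lower in ranks else 0
    end = ranks.index(upper) + 1 if upper in ranks else len(lineage)
    return lineage[start:end]
```

With a toy lineage [(100, "species"), (10, "genus"), (5, "family"), (1, "root")], ranks_between(lineage, "genus", "family") yields the genus and family entries only.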
6.16 Subtrees
Subtrees are selected taxa from lineages.
6.16.1 Subtree
Implements a subtree.
6.17 Subtree analyzer
The subtree analyzer manages subtrees
6.18 Utility functions used across modules
6.18.1 Utility functions: ncbitaxonomist.utils
Content
• Synopsis
  – Functions
• Requirements and Dependencies
  – Requirements
  – Dependencies
• Contact
• Indices and tables
CHAPTER 7
Synopsis
$: pip install ncbi-taxonomist --user
$: ncbi-taxonomist collect -n human
ncbi-taxonomist handles and manages phylogenetic data available in NCBI’s Entrez databases.
7.1 Functions
• Collect: collect taxa from the Entrez Taxonomy database
• Map: map taxids, names, and accessions to related taxonomic information
• Resolve: resolve lineages for taxa (taxids and names) and accessions, e.g. sequence or protein accessions
• Import: store obtained results locally in a SQLite database
• Subtree: extract a whole lineage, a specific rank, or a range of ranks from a taxid or name
• Group: create user-defined groups of taxa, for example:
  – create a group for all taxa specific to a project
  – group taxa without a phylogenetic relationship, e.g. group all taxa representing trees into a group “trees”
The ncbi-taxonomist commands, e.g. map or import, can be chained together using pipes to form more complex tasks. For example, to populate a local database, collect fetches data remotely from Entrez and prints it to STDOUT, from where import reads it via STDIN and populates the local database:
$: ncbi-taxonomist collect -n human | ncbi-taxonomist import -db taxo.db
CHAPTER 8
Requirements and Dependencies
8.1 Requirements
• Required: Python >= 3.8
  $: python --version
• Optional: to use local databases, SQLite (>= 3.24.0) has to be installed. ncbi-taxonomist works without local databases, but then needs to fetch all data remotely for each query.
  $: sqlite3 --version
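The same requirements can be checked from within Python. Note that sqlite3 --version reports the command-line tool, whereas Python's sqlite3 module reports the SQLite library it is linked against, and the two can differ. The helper names python_ok and sqlite_ok are hypothetical; the version thresholds are taken from the requirements above.

```python
import sqlite3
import sys

MIN_PYTHON = (3, 8)
MIN_SQLITE = (3, 24, 0)


def python_ok() -> bool:
    # Compare only (major, minor) of the running interpreter
    return sys.version_info[:2] >= MIN_PYTHON


def sqlite_ok() -> bool:
    # sqlite_version_info is the version of the SQLite library
    # that Python's sqlite3 module is linked against
    return sqlite3.sqlite_version_info >= MIN_SQLITE
```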
8.2 Dependencies
ncbi-taxonomist has one dependency:
• entrezpy: to handle remote requests to NCBI’s Entrez databases
  – https://gitlab.com/ncbipy/entrezpy.git
  – https://pypi.org/project/entrezpy/
  – https://doi.org/10.1093/bioinformatics/btz385
This library is maintained by myself and relies solely on the Python standard library. Therefore, ncbi-taxonomist is less prone to dependency hell.
CHAPTER 9
Contact
To report bugs and/or errors, please open an issue at https://gitlab.com/ncbi-taxonomist or contact me at: [email protected]. Of course, feel free to fork the code, improve it, and/or open a pull request.
CHAPTER 10
Indices and tables
• genindex • modindex • search
Python Module Index
• ncbitaxonomist.cache
• ncbitaxonomist.cache.accession
• ncbitaxonomist.convert.accessiondb
• ncbitaxonomist.convert.converter
• ncbitaxonomist.convert.convertermap
• ncbitaxonomist.convert.ncbiaccession
• ncbitaxonomist.convert.taxadb
• ncbitaxonomist.db.table.accessions
• ncbitaxonomist.db.table.basetable
• ncbitaxonomist.db.table.groups
• ncbitaxonomist.db.table.names
• ncbitaxonomist.db.table.taxa
• ncbitaxonomist.log.conf
• ncbitaxonomist.model.accession
• ncbitaxonomist.model.datamodel
• ncbitaxonomist.parser.arguments
• ncbitaxonomist.parser.group