G Model
MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS
Molecular & Biochemical Parasitology xxx (2016) xxx–xxx
Contents lists available at ScienceDirect
Molecular & Biochemical Parasitology
−
WormBase ParaSite a comprehensive resource for helminth genomics
a,∗ a b a b
Kevin L. Howe , Bruce J. Bolt , Myriam Shafie , Paul Kersey , Matthew Berriman
a
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
b
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
a r t i c l e i n f o a b s t r a c t
Article history: The number of publicly available parasitic worm genome sequences has increased dramatically in the
Received 7 October 2016
past three years, and research interest in helminth functional genomics is now quickly gathering pace in
Received in revised form
response to the foundation that has been laid by these collective efforts. A systematic approach to the
24 November 2016
organisation, curation, analysis and presentation of these data is clearly vital for maximising the util-
Accepted 25 November 2016
ity of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.
Available online xxx
wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and
platyhelminth species are integrated, adding value by way of systematic and consistent functional annota-
Keywords:
tion (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage
Genome browser
specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide
Comparative genomics
Functional genomics several ways of exploring the data, including genome browsers, genome and gene summary pages, text
Helminths search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review,
WormBase we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the
Ensembl
displays and tools available to users for interrogating helminth genomic data.
© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND
license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction agricultural importance and therefore attract research interest in
their own right. The first parasitic nematode to have its genome
The WormBase project (http://www.wormbase.org, [1]) was fully sequenced was Brugia malayi [3], and since then, many have
initiated to facilitate and accelerate biological research that uses the followed [4–37]. Of the WormBase “core” species (the ones for
model nematode Caenorhabditis elegans, by making the collected which the reference genome sequence and annotation are curated),
outputs of scientists accessible from a single resource. This enables three are parasitic worms: Brugia malayi, Onchocerca volvulus and
the transfer of this wealth of knowledge to the study of other meta- Strongyloides ratti (http://www.wormbase.org/species). However,
zoa, from nematodes to humans. C. elegans data described in the there are a number of challenges associated with expanding the
research literature, deposited in the archives, or submitted directly remit to helminths. Firstly, the primary research goal of parasitol-
is placed into context via a combination of detailed manual cura- ogists is to identify ways of controlling the parasite, and as such
tion and semi-automatic data integration. In addition, the project their desired entry points and common use-cases for WormBase
curates the reference genome sequence, gene structures and other are often distinct from those of scientists doing basic science using
genomic features for C. elegans, thereby providing a high quality C. elegans as a model. Secondly, the wider community of worm par-
foundation for downstream studies. The WormBase mission also asitologists includes those studying platyhelminths (flatworms),
extends to free-living relatives of C. elegans, which were the focus which are outside the taxonomic scope of WormBase, which is
of early post-genomic research in the areas of evolution and com- a nematode resource. Thirdly, there have been recent concerted
parative biology [2]. efforts to sequence the genomes of many helminths (nematodes
In recent years, WormBase has begun to expand its mission and platyhelmiths) (e.g. the 50 Helminth Genomes Initiative [38]),
to include plant and animal parasitic nematodes which, while resulting in a flood of new draft genomes that vary considerably in
more distantly related to C. elegans, have direct biomedical and contiguity.
In response to these challenges, we have created WormBase Par-
aSite, a comprehensive new resource for parasitic worm genomes,
∗ aiming to serve parasitologists working on helminths who use
Corresponding author.
E-mail address: [email protected] (K.L. Howe). genomics as an investigative tool. WormBase ParaSite leverages the
http://dx.doi.org/10.1016/j.molbiopara.2016.11.005
0166-6851/© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.
0/).
Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem
Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005
G Model
MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS
2 K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx
Table 1
infrastructure and expertise of the WormBase project, with a main
Species with RNA-Seq expression tracks in WormBase ParaSite (release 7). Included
mission of breadth (foundational data for many species) rather than
are some unpublished studies that are publicly available via the European
depth (rich data for a single species).
Nucleotide Archive.
Species Studies Samples (tracks)
Nematode Brugia malayi 3 [58,59] 57
2. Data integration and analysis
Onchocerca volvulus 2 [60,61] 19
Strongyloides ratti 2 [34] 8
2.1. Genomes Strongyloides stercoralis 2 [62] 42
Trichuris muris 1 [25] 22
Our mission is to include all publicly available nematode and Platyhelminth Echinococcus granulosus 2 [15] 6
platyhelminth genomes in WormBase ParaSite. Where multi- Echinococcus multicularis 2 [15] 35
Fasciola hepatica 2 [31] 17
ple genome assemblies exist for the same species (e.g. different
Hymenolepis microstoma 2 [15] 17
genome projects for Haemonchus contortus [18,19], and male and
Schistosoma mansoni 6 [63–67] 33
female isolates for Trichuris suis [24]), we have included all. Release
7 of the resource (August 2016) included genomes from 82 nema-
tode species (98 genomes) and 28 platyhelminth species (30 2.2.2. Functional annotation
genomes). We maintain an up-to-date list of genomes for ease of Literature describing gene function is sparse for most of the
reference (http://parasite.wormbase.org/species.html). species in WormBase ParaSite, although we anticipate that the
Our primary source for genome sequences is the International availability of the genomes will stimulate new research. In lieu of
Nucleotide Sequence Database Collaboration (INSDC) resources annotations based on experimental evidence, we have used estab-
[39]. In rare cases, we collect genomes from project-specific FTP lished automated methods to predict the function of as many gene
sites or direct engagement with genome project scientists. How- products as possible. Firstly, we broker the submission of all protein
ever, it is our strong preference to use genomes deposited with sequences in WormBase ParaSite to the UniProt Knowledgebase
INSDC as these have been formally checked, processed and ver- [51], and import the product names assigned by them. These are
sioned by an authoritative sequence archive resource. As well as defined by a combination of manual curation and automatic anno-
allowing us to formally disambiguate between different genomes tation. For more detailed annotation, we use InterProScan [52] from
for the same species, by way of the INSDC BioProject identifier the InterPro project [53]. As well as predicting protein domains (e.g.
(http://www.ncbi.nlm.gov/bioproject), it also simplifies the pro- from the Pfam [54] database), it also assigns terms from the Gene
cess of integrating and displaying functional genomics data that has Ontology (GO) [55]. We run the latest version of the pipeline and
been submitted to the archives. For the species in common between data across the complete proteome for every genome each release,
WormBase and WormBase ParaSite (C. elegans and other free-living so that we are always up-to-date with the latest InterPro domain
nematodes, Brugia malayi, Strongyloides ratti and Onchocerca volvu- and Gene Ontology annotations.
lus), we synchronise the data with a specific release of WormBase.
For example, WormBase ParaSite release 7 was synchronised with
2.2.3. Gene expression
WormBase release WS254, meaning that data for species in com-
High-throughput sequencing of RNA has become the standard
mon were identical in both resources.
assay for measuring gene expression, and numerous studies con-
ducting “RNA-Seq” experiments in helminth species have now been
performed and deposited in the sequence archives. We ultimately
2.2. Genome annotation aim to align all helminth RNA-Seq data to the corresponding refer-
ence genome using a standard pipeline, in collaboration with the
2.2.1. Gene structures Gene Expression Atlas project [56] (see Section 4). In the meantime,
The definition of the intron/exon structure of protein-coding we have processed data for a small number of species using our own
genes is an important foundation for interpretation of genome pipeline, as a pilot study (Table 1). Briefly, we use STAR [57] to align
function. WormBase curates gene structures for a C. elegans and a each experiment to the reference genome, merging resulting BAM
small set of nematode species with high-quality reference genomes files when they are technical replicates of the same sample. We
[40]. For others, we import annotations from the acknowledged have labelled each sample with the appropriate descriptive terms,
authority for that genome (e.g. GeneDB [41] for Schistosoma man- using ontological terms where possible (e.g. from the WormBase
soni), or from the group that sequenced and published the genome life-stage ontology). The alignments can be viewed on the genome
[4–37]. browser (see Section 3.2).
In certain circumstances, we collaborate with genome projects
to provide first pass annotation of protein-coding gene structures.
2.3. Comparative genomics
We have co-developed a pipeline that produces high-quality gene
predictions using MAKER [42] to integrate evidence from mul-
Having genomes, genes and proteins for many helminth species
tiple sources: ab initio gene predictions from AUGUSTUS [43],
provides the opportunity to study the evolution of helminths at the
GeneMark-ES [44], and SNAP [45]; projected annotations from
molecular level. We use the Ensembl Compara system [68] to infer
C. elegans and the taxonomically nearest previously-annotated
the orthology and paralogy relationships between all nematode and
helminth using GenBlastG [46] and RATT [47]; and alignments of
platyhelminth genes, supplemented by genes from a number of
ESTs, mRNAs and proteins from related organisms. The pipeline was
other comparator species, including human, mouse and yeast.
used to annotate the majority of the genomes sequenced as part of
The method can be briefly summarised as follows: genes are first
the 50 Helminth Genomes Project [38].
organised into homologous clusters. Traditionally, this has been
A small number of genomes have curated structures for non-
done using NCBI BLAST+ [69] and hcluster sg (H. Li, unpublished)
coding RNAs, which we also import into WormBase ParaSite. To
but more recent versions of the system support the use of Hidden
supplement these, we have a pipeline that uses RNAmmer [48],
Markov Models from the PANTHER protein classification system
tRNAScan-SE [49] and Rfam [50] to predict structures for (respec-
[70]. A protein multiple alignment for each cluster is then con-
tively) ribosomal RNAs, transfer RNAs, and other non-coding RNAs.
structed using M-Coffee [71] or MAFFT [72]. The choice of program
Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem
Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005
G Model
MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS
K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx 3
Fig. 1. Viewing variation data in WormBase ParaSite. Top: the variation image view of a DNA helicase gene in Strongyloides ratti. The genomic variants are shown in a track
along the top, drawn to show their location with respect to the gene structure and protein domains, and colour-coded according to their putative effect on the protein.
Bottom: summary view of a single variant in the S. ratti genome. This view shows the effect of the variant on the reference annotation, genotypes for all samples assayed, and
(not shown in figure) metrics associated with the quality of the variant call (e.g. read-depth). This particular variant gives rise to a nonsense mutation in 6 of the 10 strains
assayed.
is made dynamically by the pipeline using a set of empirically- [75,76]; and (c) the website displays and tools [73]. We have cus-
derived rules regarding certain properties of the input data (for tomised some of the Ensembl tools for use in WormBase ParaSite.
example, MAFFT is generally favoured for larger clusters). Finally, For example, we have modified Ensembl code to provide sequence
TreeBeST [68] is used to produce a gene tree, combining a number search and data mining services that allow the interrogation of all
of methods which reconcile the sequence-based gene phylogeny species, or large sub-groups of species (e.g. all nematodes) at once
with the species phylogeny. The resulting tree represents an evo- in a single query (see Section 3.3).
lutionary history of the gene family, which can be used to infer
true orthologues (genes in different species related by a specia-
3. Website − displays and tools
tion event) and paralogues (genes in the same species related by a
duplication event).
3.1. General navigation
2.4. Infrastructure The most common entry point to WormBase ParaSite is via the
home page (http://parasite.wormbase.org), which collects together
The Ensembl infrastructure [73] is the basis for much of the man- the latest news from the project, some basic statistics, and a tool for
agement and analysis of the data in WormBase ParaSite. The main finding genomes of interest. The header bar is present on all pages in
components we use are (a) the MySQL database schema and Appli- WormBase ParaSite. As well as providing short-cut links to a variety
cation Programming Interface for loading and storing the data [74]; of tools and documentation pages (http://parasite.wormbase.org/
(b) the genome analysis pipelines and workflow management tools info), it has a search-box which allows searching for genes by a vari-
Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem
Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005
G Model
MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS
4 K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx
Fig. 2. Compara tree showing the evolution of fox-1 in Trichinella spiralis, a gene involved in sex determination. In this view, the Trichinella sub-tree has been expanded
for additional detail. Other parts of the tree are collapsed for brevity. A schematic of a protein multiple alignment of the homologs is shown in the right, with regions of
conservation represented as coloured blocks.
ety of criteria, including gene name, product name, protein domain expanding sub-trees according to interest (Fig. 2). Alongside this,
names/accessions, and Gene Ontology terms. The search box has an protein multiple alignments for the family, or selected sub-trees,
auto-complete function, allowing searches to be performed using can be viewed and downloaded in a variety of formats.
only partial information. It is also possible to create a personal user account in Worm-
We also maintain a page for each genome and gene in Worm- Base ParaSite, and to store configuration and tool results under that
Base ParaSite. The genome pages collect together a description of account. Attaching a custom genome browser track whilst “logged
the species, attribution/references for the genome sequencing and in” will result in that track becoming available on any computer
annotation, and example entry points to data for that genome (e.g. where the user has logged into their account. Additionally, results
gene, genomic region). Also shown are some basic statistics about from the online tools (see Sections 3.3–3.5) are stored indefinitely
the quality of the genome, for example CEGMA [77] and BUSCO for retrieval at a later date.
[78] scores. Recent releases have displayed these graphically in an
Assembly Statistics panel, using code developed by the LepBase 3.2. Genome browser
project [79]. We also provide these data as columns in the table
of all genomes (http://parasite.wormbase.org/species.html), which We provide a fully interactive genome browser for every
can be sorted by each of the criteria. genome. By default, this shows the assembly and annotated gene
The gene pages act a starting place for exploring various types models. Additional tracks can be switched on using the left-hand
of information associated with the genes, including transcript navigation menu of the genome browser. Some tracks are available
models, functional annotations, orthologues and paralogues, cross- for every genome: DNA sequence, 6-frame protein translation, %GC
references to other resources, and the genomic context of the gene content, repetitive elements, protein-coding gene structures, and
via the genome browser. A recent addition has been variation data non-coding RNAs. Other tracks are available only for genomes for
from re-sequencing of isolates from selected species. These data which data is currently available. For example, we have RNA-Seq
can be viewed directly in WormBase ParaSite, via custom pages tracks for 8 genomes (see Section 2.2). The RNA-Seq tracks can be
and tables (Fig. 1), and the genome browser (see Section 2.2). used obtain a graphical overview of the gene expression landscape
The principal entry point for the comparative genomics data for a gene, or genomic region (Fig. 3).
(Section 2.3) is from the gene pages. The left-hand side-bar has The browser allows users to view their own data in the context
links to tables of orthologues and paralogues for the gene, which of the reference data. For small files (less than 20MB), these can be
can be filtered in a variety of ways. It is also possible to view the full uploaded directly through the web interface. Larger files must be
tree for the family of which the gene is a member, collapsing and stored externally (e.g. on a FTP server), where they will be retrieved
Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem
Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005
G Model
MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS
K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx 5
Fig. 3. Genome browser view for the Smp 033000 gene in Schistosoma mansoni. Top: a zoomed in view showing 6-frame translation of DNA, the location of start and stop
codons. Bottom: a zoomed-out view, showing the genomic context of the gene, with other protein-coding and non-coding genes, and repetitive elements. Both views show
expression in 4 life-stages, showing that this gene has highest expression in the Somule 3 h post infection, consistent with the findings of the original study [63].
on-demand by the WormBase ParaSite website. A variety of file 3.4. Advanced querying
formats can be attached, including BigWig, BAM, CRAM and UCSC
Track Hubs (http://parasite.wormbase.org/info/Browsing/Upload). Advanced search and custom data export is available through
our BioMart tool (http://parasite.wormbase.org/biomart). BioMart
is designed to allow the simple and intuitive creation of custom
tables and sequence files from sets of genes that meet a defined set
of criteria [80].
A typical use of WormBase ParaSite BioMart involves three basic
3.3. Sequence searching steps:
Our BLAST service (using NCBI-BLAST+ [69]) can be used to
search a nucleotide or peptide sequence against the complete pro- 1. Define query filters: these are a set of criteria that must be met
teome or genome of any combination of species in WormBase for genes to be included in the output. Filters can be narrow (e.g.
ParaSite. We have short-cut buttons for searching common groups a specific list of gene identifiers) or broad (e.g. all genes for a
of species (e.g. all nematodes, or all platyhelminths), and also pro- species or entire clade of species).
vide a graphical widget for defining a fully customised list. By 2. Define output attributes: these are the columns that will appear
default, BLAST results are saved on the WormBase ParaSite servers in the output table or, for sequences, the information that will
for seven days. However, by saving a BLAST submission to a user be included in the entry headers.
account, results are retained indefinitely. BLAST hits can be viewed 3. View results. Initially, a preview of the first 10 lines of output are
and downloaded in a variety of formats (http://parasite.wormbase. shown in the browser. Once happy that the query is working as
org/info/Tools/blast.html). intended, the user can then export the full result set to a file.
Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem
Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005
G Model
MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS
6 K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx
Fig. 4. Using BioMart to identify C. elegans genes that could be used to model the response of drug targeting transmembrane signaling receptors in Schistosoma mansoni.
Left: setting query filters to restrict the query to S. mansoni genes with a C. elegans orthologue, without a human orthologue and associated with transmembrane signaling
receptor activity. Right, top: setting output attributes to customize the output table. Right, bottom: preview of the results within the web browser, prior to download.
The types of data that be used as filters and attributes in include: release, (c) genomes for that species (identified by NCBI BioProject),
and (d) data files for that genome.
• Another tool we make available is the Ensembl Variant Effect
Gene IDs, names and descriptions
• Predictor (VEP) [82]. This annotates genomic variants from re-
Identifiers for data from external databases (e.g. UniProt, RefSeq)
• sequencing or population genomics studies with predictions of the
Gene structure (e.g. exons, introns)
• how the reference annotations are affected by the variants (e.g.
Protein domains and function (e.g. InterPro, Pfam)
• whether they coincide with protein-coding genes, and the puta-
Gene Ontology annotations
• tive effect they have on the corresponding protein sequence). The
Orthologues and paralogues
• user can upload a list of variants in some standard formats, e.g. the
Sequences (e.g. genomic, transcript, peptide)
Variant Call Format (VCF). Results are fully exportable, including
as annotated VCF files. Full documentation and examples are avail-
A typical use-case for BioMart might be to find all membrane
able on the website (http://parasite.wormbase.org/info/Tools/vep.
proteins in filarial nematodes that do not have a human ortho-
html).
logue. This query relies on the fact that we have flagged potential
For bioinformaticians wishing to develop applications using the
transmembrane proteins using the TMHMM [81] software. The
data in WormBase ParaSite, direct programmatic access is available
query is performed by (a) selecting the “Filarioidea” taxon in the
via two routes. Firstly, we provide a “RESTful” application program-
SPECIES filter; (b) selecting “Restrict to genes without orthologues
ming interface (API) [83] that can be used with any programming
in Human” in the HOMOLOGY filter; (c) selecting “Limit to genes
language. A documented catalogue of “endpoints” and example
with TMMHMM protein features” in the PROTEIN DOMAINS filter;
code is available through the RESTful API section of the website
and (d) Clicking the “Results” button. Another example use-case for
(http://parasite.wormbase.org/rest). Secondly, data be retrieved
shown in Fig. 4, and more detailed instructions and worked exam-
from our BioMart service using the R programming language
ples are available from our help pages (http://parasite.wormbase.
and BioMaRt Bioconductor package (http://parasite.wormbase.org/
org/info/Tools/biomart.html).
info/Tools/biomart.html). These services allow software developers
and analysts to retrieve and manipulate the latest data in Worm-
3.5. Other tools Base ParaSite on demand, without having to download complete
data sets in bulk.
We provide a collection of files for each genome, including the
genome sequence itself (both plain and repeat-masked), and the
annotations in a variety of formats. Individual files can be down- 4. Challenges and future directions
loaded from our browser (http://parasite.wormbase.org/ftp.html),
or bulk downloads can be performed by FTP (ftp://ftp.wormbase. The availability of many helminth reference genomes opens up
org/pub/wormbase/parasite). The structure of the site is fairly self- the possibility a wide variety of functional genomics investigations,
explanatory, with folders nested by (a) releases, (b) species in that from life-stage specific gene expression to population genetics.
Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem
Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005
G Model
MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS
K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx 7
Making the data from these studies available in way that is useful to Saunders, A.E. Sluder, K. Smith, M. Stanke, T.R. Unnasch, J. Ware, A.D. Wei, G.
Weil, D.J. Williams, Y. Zhang, S.A. Williams, C. Fraser-Liggett, B. Slatko, M.L.
users has traditionally required inefficient and redundant interac-
Blaxter, A.L. Scott, Draft genome of the filarial nematode parasite Brugia
tions between data providers, WormBase curators and numerous
malayi, Science 317 (5845) (2007) 1756–1760.
archive and specialist resources. In order to reduce these costs as [4] P. Abad, J. Gouzy, J.M. Aury, P. Castagnone-Sereno, E.G. Danchin, E. Deleury, L.
Perfus-Barbeoch, V. Anthouard, F. Artiguenave, V.C. Blok, M.C. Caillaud, P.M.
data volumes continue to grow rapidly, we are increasingly mov-
Coutinho, C. Dasilva, F. De Luca, F. Deau, M. Esquibet, T. Flutre, J.V. Goldstone,
ing towards a model where our website pulls data directly from the
N. Hamamouch, T. Hewezi, O. Jaillon, C. Jubin, P. Leonetti, M. Magliano, T.R.
archives and other complementary resources, using their applica- Maier, G.V. Markov, P. McVeigh, G. Pesole, J. Poulain, M. Robinson-Rechavi, E.
Sallet, B. Segurens, D. Steinbach, T. Tytgat, E. Ugarte, C. van Ghelder, P.
tion programming interfaces. As a pilot for this model, we have
Veronico, T.J. Baum, M. Blaxter, T. Bleve-Zacheo, E.L. Davis, J.J. Ewbank, B.
brokered the submission of genome variation data from Strongy-
Favery, E. Grenier, B. Henrissat, J.T. Jones, V. Laudet, A.G. Maule, H.
loides ratti (Mark Viney, pers. comm) and Schistosoma mansoni [84] Quesneville, M.N. Rosso, T. Schiex, G. Smant, J. Weissenbach, P. Wincker,
to the European Variation Archive (EVA) [85]. Our website code has Genome sequence of the metazoan plant-parasitic nematode Meloidogyne
incognita, Nat. Biotechnol. 26 (8) (2008) 909–915.
been extended to extract these data from the EVA and display them
[5] C.H. Opperman, D.M. Bird, V.M. Williamson, D.S. Rokhsar, M. Burke, J. Cohn, J.
directly (see Section 3.1).
Cromer, S. Diener, J. Gajan, S. Graham, T.D. Houfek, Q. Liu, T. Mitros, J. Schaff,
For the past year, we have been aligning selected helminth RNA- R. Schaffer, E. Scholl, B.R. Sosinski, V.P. Thomas, E. Windham, Sequence and
genetic map of Meloidogyne hapla: a compact nematode genome for plant
Seq data sets to the corresponding reference genome, and making
parasitism, Proc. Natl. Acad. Sci. U. S. A. 105 (39) (2008) 14802–14807.
the alignments available as Track Hubs [86] for visualization in
[6] Schistosoma japonicum Genome Sequencing and Functional Analysis
WormBase-ParaSite and other compliant browsers (Fig. 4). We are Consortium, The Schistosoma japonicum genome reveals features of
host-parasite interplay, Nature, 460, 7253, (2009) 345-51. Please see https://
now collaborating with the Gene Expression Atlas (GxA)[56] to
www.ncbi.nlm.nih.gov/pubmed/19606140.
provide curation, analysis, display and querying of helminth gene
[7] M. Berriman, B.J. Haas, P.T. LoVerde, R.A. Wilson, G.P. Dillon, G.C. Cerqueira,
expression data. Specific high-value data-sets are curated within S.T. Mashiyama, B. Al-Lazikani, L.F. Andrade, P.D. Ashton, M.A. Aslett, D.C.
Bartholomeu, G. Blandin, C.R. Caffrey, A. Coghlan, R. Coulson, T.A. Day, A.
the GxA framework, and these can be interrogated by the GxA tools
Delcher, R. DeMarco, A. Djikeng, T. Eyre, J.A. Gamble, E. Ghedin, Y. Gu, C.
(for example querying for genes with a similar expression profile
Hertz-Fowler, H. Hirai, Y. Hirai, R. Houston, A. Ivens, D.A. Johnston, D. Lacerda,
to a given gene). To simplify the user experience, we plan to embed C.D. Macedo, P. McVeigh, Z. Ning, G. Oliveira, J.P. Overington, J. Parkhill, M.
Pertea, R.J. Pierce, A.V. Protasio, M.A. Quail, M.A. Rajandream, J. Rogers, M.
access to these tools directly in the WormBase ParaSite. We will
Sajid, S.L. Salzberg, M. Stanke, A.R. Tivey, O. White, D.L. Williams, J. Wortman,
also expand the curation to expression data sets in other species.
W. Wu, M. Zamanian, A. Zerlotini, C.M. Fraser-Liggett, B.G. Barrell, N.M.
We will continue to extend WormBase ParaSite with function- El-Sayed, The genome of the blood fluke Schistosoma mansoni, Nature 460
ality geared towards the use-cases of helminth parasitologists. One (7253) (2009) 352–358.
[8] M. Mitreva, D.P. Jasmer, D.S. Zarlenga, Z. Wang, S. Abubucker, J. Martin, C.M.
example of this is the identification of candidate genes for thera-
Taylor, Y. Yin, L. Fulton, P. Minx, S.P. Yang, W.C. Warren, R.S. Fulton, V.
peutics. In the near future, we will develop a pipeline to identify
Bhonagiri, X. Zhang, K. Hallsworth-Pepin, S.W. Clifton, J.P. McCarter, J.
reliable homology between helminth genes and known drug tar- Appleton, E.R. Mardis, R.K. Wilson, The draft genome of the parasitic
nematode Trichinella spiralis, Nat. Genet. 43 (3) (2011) 228–235.
gets in the ChEMBL database [87].
[9] T. Kikuchi, J.A. Cotton, J.J. Dalzell, K. Hasegawa, N. Kanzaki, P. McVeigh, T.
Takanashi, I.J. Tsai, S.A. Assefa, P.J. Cock, T.D. Otto, M. Hunt, A.J. Reid, A.
Funding Sanchez-Flores, K. Tsuchihara, T. Yokoi, M.C. Larsson, J. Miwa, A.G. Maule, N.
Sahashi, J.T. Jones, M. Berriman, Genomic insights into the origin of parasitism
in the emerging plant pathogen Bursaphelenchus xylophilus, PLoS Pathog. 7
WormBase ParaSite is funded by the UK Biotechnology and Bio- (9) (2011) e1002219.
[10] A.R. Jex, S. Liu, B. Li, N.D. Young, R.S. Hall, Y. Li, L. Yang, N. Zeng, X. Xu, Z. Xiong,
logical Sciences Research Council [BB/K020080].
F. Chen, X. Wu, G. Zhang, X. Fang, Y. Kang, G.A. Anderson, T.W. Harris, B.E.
Campbell, J. Vlaminck, T. Wang, C. Cantacessi, E.M. Schwarz, S. Ranganathan,
Acknowledgements P. Geldhof, P. Nejsum, P.W. Sternberg, H. Yang, J. Wang, J. Wang, R.B. Gasser,
Ascaris suum draft genome, Nature 479 (7374) (2011) 529–533.
[11] N.D. Young, A.R. Jex, B. Li, S. Liu, L. Yang, Z. Xiong, Y. Li, C. Cantacessi, R.S. Hall,
We are indebted to Alessandra Traini, Eleanor Stanley and Jane X. Xu, F. Chen, X. Wu, A. Zerlotini, G. Oliveira, A. Hofmann, G. Zhang, X. Fang,
Lomax for their work on WormBase ParaSite. We would also like Y. Kang, B.E. Campbell, A. Loukas, S. Ranganathan, D. Rollinson, G. Rinaldi, P.J.
Brindley, H. Yang, J. Wang, J. Wang, R.B. Gasser, Whole-genome sequence of
to thank Michael Paulini, Avril Coghlan, and other members of the
Schistosoma haematobium, Nat. Genet. 44 (2) (2012) 221–225.
WormBase and WTSI Parasite genomics groups for useful advice
[12] C. Godel, S. Kumar, G. Koutsovoulos, P. Ludin, D. Nilsson, F. Comandatore, N.
and feedback. Wrobel, M. Thompson, C.D. Schmid, S. Goto, F. Bringaud, A. Wolstenholme, C.
Bandi, C. Epe, R. Kaminsky, M. Blaxter, P. Maser, The genome of the
heartworm, Dirofilaria immitis, reveals drug and vaccine targets, FASEB J. 26
References (11) (2012) 4650–4661.
[13] J. Wang, M. Mitreva, M. Berriman, A. Thorne, V. Magrini, G. Koutsovoulos, S.
Kumar, M.L. Blaxter, R.E. Davis, Silencing of germline-expressed genes by DNA
[1] K.L. Howe, B.J. Bolt, S. Cain, J. Chan, W.J. Chen, P. Davis, J. Done, T. Down, S.
elimination in somatic cells, Dev. Cell 23 (5) (2012) 1072–1080.
Gao, C. Grove, T.W. Harris, R. Kishore, R. Lee, J. Lomax, Y. Li, H.M. Muller, C.
[14] Y. Huang, W. Chen, X. Wang, H. Liu, Y. Chen, L. Guo, F. Luo, J. Sun, Q. Mao, P.
Nakamura, P. Nuin, M. Paulini, D. Raciti, G. Schindelman, E. Stanley, M.A. Tuli,
Liang, Z. Xie, C. Zhou, Y. Tian, X. Lv, L. Huang, J. Zhou, Y. Hu, R. Li, F. Zhang, H.
K. Van Auken, D. Wang, X. Wang, G. Williams, A. Wright, K. Yook, M.
Lei, W. Li, X. Hu, C. Liang, J. Xu, X. Li, X. Yu, The carcinogenic liver fluke,
Berriman, P. Kersey, T. Schedl, L. Stein, P.W. Sternberg, WormBase 2016:
Clonorchis sinensis: new assembly, reannotation and analysis of the genome
expanding to enable helminth genomic research, Nucleic Acids Res. 44 (D1)
and characterization of tissue transcriptomes, PLoS One 8 (1) (2013) e54732.
(2016) D774–D780.
[15] I.J. Tsai, M. Zarowiecki, N. Holroyd, A. Garciarrubio, A. Sanchez-Flores, K.L.
[2] L.D. Stein, Z. Bao, D. Blasiar, T. Blumenthal, M.R. Brent, N. Chen, A. Chinwalla,
Brooks, A. Tracey, R.J. Bobes, G. Fragoso, E. Sciutto, M. Aslett, H. Beasley, H.M.
L. Clarke, C. Clee, A. Coghlan, A. Coulson, P. D’Eustachio, D.H. Fitch, L.A. Fulton,
Bennett, J. Cai, F. Camicia, R. Clark, M. Cucher, N. De Silva, T.A. Day, P.
R.E. Fulton, S. Griffiths-Jones, T.W. Harris, L.W. Hillier, R. Kamath, P.E.
Deplazes, K. Estrada, C. Fernandez, P.W. Holland, J. Hou, S. Hu, T. Huckvale, S.S.
Kuwabara, E.R. Mardis, M.A. Marra, T.L. Miner, P. Minx, J.C. Mullikin, R.W.
Hung, L. Kamenetzky, J.A. Keane, F. Kiss, U. Koziol, O. Lambert, K. Liu, X. Luo, Y.
Plumb, J. Rogers, J.E. Schein, M. Sohrmann, J. Spieth, J.E. Stajich, C. Wei, D.
Luo, N. Macchiaroli, S. Nichol, J. Paps, J. Parkinson, N. Pouchkina-Stantcheva,
Willey, R.K. Wilson, R. Durbin, R.H. Waterston, The genome sequence of
N. Riddiford, M. Rosenzvit, G. Salinas, J.D. Wasmuth, M. Zamanian, Y. Zheng,
Caenorhabditis briggsae: a platform for comparative genomics, PLoS Biol. 1
Taenia solium Genome Consortium, X. Cai, X. Soberon, P.D. Olson, J.P. Laclette,
(2) (2003) E45.
K. Brehm, M. Berriman, The genomes of four tapeworm species reveal
[3] E. Ghedin, S. Wang, D. Spiro, E. Caler, Q. Zhao, J. Crabtree, J.E. Allen, A.L.
adaptations to parasitism, Nature 496 (7443) (2013) 57–63.
Delcher, D.B. Guiliano, D. Miranda-Saavedra, S.V. Angiuoli, T. Creasy, P.
[16] C.A. Desjardins, G.C. Cerqueira, J.M. Goldberg, J.C. Dunning Hotopp, B.J. Haas, J.
Amedeo, B. Haas, N.M. El-Sayed, J.R. Wortman, T. Feldblyum, L. Tallon, M.
Zucker, J.M. Ribeiro, S. Saif, J.Z. Levin, L. Fan, Q. Zeng, C. Russ, J.R. Wortman,
Schatz, M. Shumway, H. Koo, S.L. Salzberg, S. Schobel, M. Pertea, M. Pop, O.
D.L. Fink, B.W. Birren, T.B. Nutman, Genomics of Loa loa, a wolbachia-free
White, G.J. Barton, C.K. Carlow, M.J. Crawford, J. Daub, M.W. Dimmic, C.F.
filarial parasite of humans, Nat. Genet. 45 (5) (2013) 495–500.
Estes, J.M. Foster, M. Ganatra, W.F. Gregory, N.M. Johnson, J. Jin, R.
[17] X. Bai, B.J. Adams, T.A. Ciche, S. Clifton, R. Gaugler, K.S. Kim, J. Spieth, P.W.
Komuniecki, I. Korf, S. Kumar, S. Laney, B.W. Li, W. Li, T.H. Lindblom, S.
Sternberg, R.K. Wilson, P.S. Grewal, A lover and a fighter: the genome
Lustigman, D. Ma, C.V. Maina, D.M. Martin, J.P. McCarter, L. McReynolds, M.
Mitreva, T.B. Nutman, J. Parkinson, J.M. Peregrin-Alvarez, C. Poole, Q. Ren, L.
Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem
Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005
G Model
MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS
8 K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx
sequence of an entomopathogenic nematode Heterorhabditis bacteriophora, A. Kulkarni, D. Harbecke, E. Nagayasu, S. Nichol, Y. Ogura, M.A. Quail, N.
PLoS One 8 (7) (2013) e69618. Randle, D. Xia, N.W. Brattig, H. Soblik, D.M. Ribeiro, A. Sanchez-Flores, T.
[18] R. Laing, T. Kikuchi, A. Martinelli, I.J. Tsai, R.N. Beech, E. Redman, N. Holroyd, Hayashi, T. Itoh, D.R. Denver, W. Grant, J.D. Stoltzfus, J.B. Lok, H. Murayama, J.
D.J. Bartley, H. Beasley, C. Britton, D. Curran, E. Devaney, A. Gilabert, M. Hunt, Wastling, A. Streit, T. Kikuchi, M. Viney, M. Berriman, The genomic basis of
F. Jackson, S.L. Johnston, I. Kryukov, K. Li, A.A. Morrison, A.J. Reid, N. Sargison, parasitism in the Strongyloides clade of nematodes, Nat. Genet. 48 (3) (2016)
G.I. Saunders, J.D. Wasmuth, A. Wolstenholme, M. Berriman, J.S. Gilleard, J.A. 299–307.
Cotton, The genome and transcriptome of Haemonchus contortus, a key [35] P.K. Korhonen, E. Pozio, G. La Rosa, B.C. Chang, A.V. Koehler, E.P. Hoberg, P.R.
model parasite for drug and vaccine discovery, Genome Biol. 14 (8) (2013) Boag, P. Tan, A.R. Jex, A. Hofmann, P.W. Sternberg, N.D. Young, R.B. Gasser,
R88. Phylogenomic and biogeographic reconstruction of the Trichinella complex,
[19] E.M. Schwarz, P.K. Korhonen, B.E. Campbell, N.D. Young, A.R. Jex, A. Jabbar, Nat. Commun. 7 (2016) 10513.
R.S. Hall, A. Mondal, A.C. Howe, J. Pell, A. Hofmann, P.R. Boag, X.Q. Zhu, T. [36] S.T. Small, L.J. Reimer, D.J. Tisch, C.L. King, B.M. Christensen, P.M. Siba, J.W.
Gregory, A. Loukas, B.A. Williams, I. Antoshechkin, C. Brown, P.W. Sternberg, Kazura, D. Serre, P.A. Zimmerman, Population genomics of the filarial
R.B. Gasser, The genome and developmental transcriptome of the strongylid nematode parasite Wuchereria bancrofti from mosquitoes, Mol. Ecol. 25 (7)
nematode Haemonchus contortus, Genome Biol. 14 (8) (2013) R89. (2016) 1465–1477.
[20] H. Zheng, W. Zhang, L. Zhang, Z. Zhang, J. Li, G. Lu, Y. Zhu, Y. Wang, Y. Huang, J. [37] S.N. McNulty, C. Strube, B.A. Rosa, J.C. Martin, R. Tyagi, Y.J. Choi, Q. Wang, K.
Liu, H. Kang, J. Chen, L. Wang, A. Chen, S. Yu, Z. Gao, L. Jin, W. Gu, Z. Wang, L. Hallsworth Pepin, X. Zhang, P. Ozersky, R.K. Wilson, P.W. Sternberg, R.B.
Zhao, B. Shi, H. Wen, R. Lin, M.K. Jones, B. Brejova, T. Vinar, G. Zhao, D.P. Gasser, M. Mitreva, Dictyocaulus viviparus genome, variome and
McManus, Z. Chen, Y. Zhou, S. Wang, The genome of the hydatid tapeworm transcriptome elucidate lungworm biology and support future intervention,
Echinococcus granulosus, Nat. Genet. 45 (10) (2013) 1168–1175. Sci. Rep. 6 (2016) 20316.
[21] P.H. Schiffer, M. Kroiher, C. Kraus, G.D. Koutsovoulos, S. Kumar, J.I. Camps, N.A. [38] Wellcome Trust Sanger Institute, The 50 Helminth Genomes Initiative, 2014,
Nsah, D. Stappert, K. Morris, P. Heger, J. Altmuller, P. Frommolt, P. Nurnberg, http://www.sanger.ac.uk/science/collaboration/50hgp, Accessed 6 October
W.K. Thomas, M.L. Blaxter, E. Schierenberg, The genome of Romanomermis 2016.
culicivorax: revealing fundamental changes in the core developmental [39] G. Cochrane, I. Karsch-Mizrachi, T. Takagi, International nucleotide sequence
genetic toolkit in Nematoda, BMC Genomics 14 (2013) 923. database the international nucleotide sequence database collaboration,
[22] Y.T. Tang, X. Gao, B.A. Rosa, S. Abubucker, K. Hallsworth-Pepin, J. Martin, R. Nucleic Acids Res. 44 (D1) (2016) D48–D50.
Tyagi, E. Heizer, X. Zhang, V. Bhonagiri-Palsikar, P. Minx, W.C. Warren, Q. [40] K. Howe, P. Davis, M. Paulini, M.A. Tuli, G. Williams, K. Yook, R. Durbin, P.
Wang, B. Zhan, P.J. Hotez, P.W. Sternberg, A. Dougall, S.T. Gaze, J. Mulvenna, J. Kersey, P.W. Sternberg, WormBase: annotating many nematode genomes,
Sotillo, S. Ranganathan, E.M. Rabelo, R.K. Wilson, P.L. Felgner, J. Bethony, J.M. Worm 1 (1) (2012) 15–21.
Hawdon, R.B. Gasser, A. Loukas, M. Mitreva, Genome of the human hookworm [41] F.J. Logan-Klumpler, N. De Silva, U. Boehme, M.B. Rogers, G. Velarde, J.A.
Necator americanus, Nat. Genet. 46 (3) (2014) 261–269. McQuillan, T. Carver, M. Aslett, C. Olsen, S. Subramanian, I. Phan, C. Farris, S.
[23] J.A. Cotton, C.J. Lilley, L.M. Jones, T. Kikuchi, A.J. Reid, P. Thorpe, I.J. Tsai, H. Mitra, G. Ramasamy, H. Wang, A. Tivey, A. Jackson, R. Houston, J. Parkhill, M.
Beasley, V. Blok, P.J. Cock, S. Eves-van den Akker, N. Holroyd, M. Hunt, S. Holden, O.S. Harb, B.P. Brunk, P.J. Myler, D. Roos, M. Carrington, D.F. Smith, C.
Mantelin, H. Naghra, A. Pain, J.E. Palomares-Rius, M. Zarowiecki, M. Berriman, Hertz-Fowler, M. Berriman, GeneDB—an annotation database for pathogens,
J.T. Jones, P.E. Urwin, The genome and life-stage specific transcriptomes of Nucleic Acids Res. 40 (2012) D98–D108, Database issue.
Globodera pallida elucidate key aspects of plant parasitism by a cyst [42] C. Holt, M. Yandell, MAKER2: an annotation pipeline and genome-database
nematode, Genome Biol. 15 (3) (2014) R43. management tool for second-generation genome projects, BMC Bioinf. 12
[24] A.R. Jex, P. Nejsum, E.M. Schwarz, L. Hu, N.D. Young, R.S. Hall, P.K. Korhonen, S. (2011) 491.
Liao, S. Thamsborg, J. Xia, P. Xu, S. Wang, J.P. Scheerlinck, A. Hofmann, P.W. [43] M. Stanke, O. Keller, I. Gunduz, A. Hayes, S. Waack, B. Morgenstern,
Sternberg, J. Wang, R.B. Gasser, Genome and transcriptome of the porcine AUGUSTUS: ab initio prediction of alternative transcripts, Nucleic Acids Res.
whipworm Trichuris suis, Nat. Genet. 46 (7) (2014) 701–706. 34 (2006) W435–W439, Web Server issue.
[25] B.J. Foth, I.J. Tsai, A.J. Reid, A.J. Bancroft, S. Nichol, A. Tracey, N. Holroyd, J.A. [44] V. Ter-Hovhannisyan, A. Lomsadze, Y.O. Chernoff, M. Borodovsky, Gene
Cotton, E.J. Stanley, M. Zarowiecki, J.Z. Liu, T. Huckvale, P.J. Cooper, R.K. prediction in novel fungal genomes using an ab initio algorithm with
Grencis, M. Berriman, Whipworm genome and dual-species transcriptome unsupervised training, Genome Res. 18 (12) (2008) 1979–1990.
analyses provide molecular insights into an intimate host-parasite [45] I. Korf, Gene finding in novel genomes, BMC Bioinf. 5 (2004) 59.
interaction, Nat. Genet. 46 (7) (2014) 693–700. [46] R. She, J.S. Chu, B. Uyar, J. Wang, K. Wang, N. Chen, genBlastG: using BLAST
[26] N.D. Young, N. Nagarajan, S.J. Lin, P.K. Korhonen, A.R. Jex, R.S. Hall, H. searches to build homologous gene models, Bioinformatics 27 (15) (2011)
Safavi-Hemami, W. Kaewkong, D. Bertrand, S. Gao, Q. Seet, S. Wongkham, B.T. 2141–2143.
Teh, C. Wongkham, P.M. Intapan, W. Maleewong, X. Yang, M. Hu, Z. Wang, A. [47] T.D. Otto, G.P. Dillon, W.S. Degrave, M. Berriman, RATT: rapid annotation
Hofmann, P.W. Sternberg, P. Tan, J. Wang, R.B. Gasser, The Opisthorchis transfer tool, Nucleic Acids Res. 39 (9) (2011) e57.
viverrini genome provides insights into life in the bile duct, Nat. Commun. 5 [48] K. Lagesen, P. Hallin, E.A. Rodland, H.H. Staerfeldt, T. Rognes, D.W. Ussery,
(2014) 4378. RNAmmer: consistent and rapid annotation of ribosomal RNA genes, Nucleic
[27] L.J. Tallon, X. Liu, S. Bennuru, M.C. Chibucos, A. Godinez, S. Ott, X. Zhao, L. Acids Res. 35 (9) (2007) 3100–3108.
Sadzewicz, C.M. Fraser, T.B. Nutman, J.C. Dunning Hotopp, Single molecule [49] T.M. Lowe, S.R. Eddy, tRNAscan-SE: a program for improved detection of
sequencing and genome assembly of a clinical specimen of Loa loa, the transfer RNA genes in genomic sequence, Nucleic Acids Res. 25 (5) (1997)
causative agent of loiasis, BMC Genomics 15 (2014) 788. 955–964.
[28] H.M. Bennett, H.P. Mok, E. Gkrania-Klotsas, I.J. Tsai, E.J. Stanley, N.M. Antoun, [50] E.P. Nawrocki, S.W. Burge, A. Bateman, J. Daub, R.Y. Eberhardt, S.R. Eddy, E.W.
A. Coghlan, B. Harsha, A. Traini, D.M. Ribeiro, S. Steinbiss, S.B. Lucas, K.S. Floden, P.P. Gardner, T.A. Jones, J. Tate, R.D. Finn, Rfam 12.0: updates to the
Allinson, S.J. Price, T.S. Santarius, A.J. Carmichael, P.L. Chiodini, N. Holroyd, A.F. RNA families database, Nucleic Acids Res. 43 (2015) D130–D137, Database
Dean, M. Berriman, The genome of the sparganosis tapeworm Spirometra issue.
erinaceieuropaei isolated from the biopsy of a migrating brain lesion, Genome [51] UniProt Consortium, UniProt: a hub for protein information, Nucleic Acids
Biol. 15 (11) (2014) 510. Res, 43, (Database issue), (2015) D204-12. Please see https://www.ncbi.nlm.
[29] X.Q. Zhu, P.K. Korhonen, H. Cai, N.D. Young, P. Nejsum, G. von nih.gov/pubmed/25348405.
Samson-Himmelstjerna, P.R. Boag, P. Tan, Q. Li, J. Min, Y. Yang, X. Wang, X. [52] P. Jones, D. Binns, H.Y. Chang, M. Fraser, W. Li, C. McAnulla, H. McWilliam, J.
Fang, R.S. Hall, A. Hofmann, P.W. Sternberg, A.R. Jex, R.B. Gasser, Genetic Maslen, A. Mitchell, G. Nuka, S. Pesseat, A.F. Quinn, A. Sangrador-Vegas, M.
blueprint of the zoonotic pathogen Toxocara canis, Nat. Commun. 6 (2015) Scheremetjew, S.Y. Yong, R. Lopez, S. Hunter, InterProScan 5: genome-scale
6145. protein function classification, Bioinformatics 30 (9) (2014) 1236–1240.
[30] E.M. Schwarz, Y. Hu, I. Antoshechkin, M.M. Miller, P.W. Sternberg, R.V. Aroian, [53] A. Mitchell, H.Y. Chang, L. Daugherty, M. Fraser, S. Hunter, R. Lopez, C.
The genome and transcriptome of the zoonotic hookworm Ancylostoma McAnulla, C. McMenamin, G. Nuka, S. Pesseat, A. Sangrador-Vegas, M.
ceylanicum identify infection-specific gene families, Nat. Genet. 47 (4) (2015) Scheremetjew, C. Rato, S.Y. Yong, A. Bateman, M. Punta, T.K. Attwood, C.J.
416–422. Sigrist, N. Redaschi, C. Rivoire, I. Xenarios, D. Kahn, D. Guyot, P. Bork, I.
[31] K. Cwiklinski, J.P. Dalton, P.J. Dufresne, J. La Course, D.J. Williams, J. Letunic, J. Gough, M. Oates, D. Haft, H. Huang, D.A. Natale, C.H. Wu, C. Orengo,
Hodgkinson, S. Paterson, The Fasciola hepatica genome: gene duplication and I. Sillitoe, H. Mi, P.D. Thomas, R.D. Finn, The InterPro protein families
polymorphism reveals adaptation to the host environment and the capacity database: the classification resource after 15 years, Nucleic Acids Res. 43
for rapid evolution, Genome Biol. 16 (2015) 71. (2015) D213–D221, Database issue.
[32] A.R. Dillman, M. Macchietto, C.F. Porter, A. Rogers, B. Williams, I. [54] R.D. Finn, P. Coggill, R.Y. Eberhardt, S.R. Eddy, J. Mistry, A.L. Mitchell, S.C.
Antoshechkin, M.M. Lee, Z. Goodwin, X. Lu, E.E. Lewis, H. Goodrich-Blair, S.P. Potter, M. Punta, M. Qureshi, A. Sangrador-Vegas, G.A. Salazar, J. Tate, A.
Stock, B.J. Adams, P.W. Sternberg, A. Mortazavi, Comparative genomics of Bateman, The Pfam protein families database: towards a more sustainable
Steinernema reveals deeply conserved gene regulatory networks, Genome future, Nucleic Acids Res. 44 (D1) (2016) D279–D285.
Biol. 16 (2015) 200. [55] M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis,
[33] K. Wasik, J. Gurtowski, X. Zhou, O.M. Ramos, M.J. Delas, G. Battistoni, O. El K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A.
Demerdash, I. Falciatori, D.B. Vizoso, A.D. Smith, P. Ladurner, L. Scharer, W.R. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, G.
McCombie, G.J. Hannon, M. Schatz, Genome and transcriptome of the Sherlock, Gene ontology: tool for the unification of biology. The Gene
regeneration-competent flatworm, Macrostomum lignano, Proc. Natl. Acad. Ontology Consortium, Nat. Genet. 25 (1) (2000) 25–29.
Sci. U. S. A. 112 (40) (2015) 12462–12467. [56] R. Petryszak, M. Keays, Y.A. Tang, N.A. Fonseca, E. Barrera, T. Burdett, A.
[34] V.L. Hunt, I.J. Tsai, A. Coghlan, A.J. Reid, N. Holroyd, B.J. Foth, A. Tracey, J.A. Fullgrabe, A.M. Fuentes, S. Jupp, S. Koskinen, O. Mannion, L. Huerta, K. Megy,
Cotton, E.J. Stanley, H. Beasley, H.M. Bennett, K. Brooks, B. Harsha, R. Kajitani, C. Snow, E. Williams, M. Barzine, E. Hastings, H. Weisser, J. Wright, P. Jaiswal,
Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem
Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005
G Model
MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS
K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx 9
W. Huber, J. Choudhary, H.E. Parkinson, A. Brazma, Expression Atlas [71] I.M. Wallace, O. O’Sullivan, D.G. Higgins, C. Notredame, M-Coffee: combining
update—an integrated database of gene and protein expression in humans, multiple sequence alignment methods with T-Coffee, Nucleic Acids Res. 34
animals and plants, Nucleic Acids Res. 44 (D1) (2016) D746–G752. (6) (2006) 1692–1699.
[57] A. Dobin, C.A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. [72] K. Katoh, D.M. Standley, MAFFT multiple sequence alignment software
Chaisson, T.R. Gingeras, STAR: ultrafast universal RNA-seq aligner, version 7: improvements in performance and usability, Mol. Biol. Evol. 30 (4)
Bioinformatics 29 (1) (2013) 15–21. (2013) 772–780.
[58] Y.J. Choi, E. Ghedin, M. Berriman, J. McQuillan, N. Holroyd, G.F. Mayhew, B.M. [73] A. Yates, W. Akanni, M.R. Amode, D. Barrell, K. Billis, D. Carvalho-Silva, C.
Christensen, M.L. Michalski, A deep sequencing approach to comparatively Cummins, P. Clapham, S. Fitzgerald, L. Gil, C.G. Giron, L. Gordon, T. Hourlier,
analyze the transcriptome of lifecycle stages of the filarial worm, Brugia S.E. Hunt, S.H. Janacek, N. Johnson, T. Juettemann, S. Keenan, I. Lavidas, F.J.
malayi, PLoS Negl. Trop. Dis. 5 (12) (2011) e1409. Martin, T. Maurel, W. McLaren, D.N. Murphy, R. Nag, M. Nuhn, A. Parker, M.
[59] C. Ballesteros, L. Tritten, M. O’Neill, E. Burkman, W.I. Zaky, J. Xia, A. Moorhead, Patricio, M. Pignatelli, M. Rahtz, H.S. Riat, D. Sheppard, K. Taylor, A. Thormann,
S.A. Williams, T.G. Geary, The effect of In vitro cultivation on the A. Vullo, S.P. Wilder, A. Zadissa, E. Birney, J. Harrow, M. Muffato, E. Perry, M.
transcriptome of adult brugia malayi, PLoS Negl. Trop. Dis. 10 (1) (2016) Ruffier, G. Spudich, S.J. Trevanion, F. Cunningham, B.L. Aken, D.R. Zerbino, P.
e0004311. Flicek, Ensembl 2016, Nucleic Acids Res. 44 (D1) (2016) D710–D716.
[60] J.A. Cotton, S. Bennuru, A. Grote, B. Harsha, A. Tracey, R. Beech, S.R. Doyle, M. [74] A. Stabenau, G. McVicker, C. Melsopp, G. Proctor, M. Clamp, E. Birney, The
Dunn, J.C. Hotopp, N. Holroyd, T. Kikuchi, O. Lambert, A. Mhashilkar, P. Ensembl core software libraries, Genome Res. 14 (5) (2004) 929–933.
Mutowo, N. Nursimulu, J.M. Ribeiro, M.B. Rogers, E. Stanley, L.S. Swapna, I.J. [75] S.C. Potter, L. Clarke, V. Curwen, S. Keenan, E. Mongin, S.M. Searle, A. Stabenau,
Tsai, T.R. Unnasch, D. Voronin, J. Parkinson, T.B. Nutman, E. Ghedin, M. R. Storey, M. Clamp, The Ensembl analysis pipeline, Genome Res. 14 (5)
Berriman, S. Lustigman, The genome of Onchocerca volvulus, agent of river (2004) 934–941.
blindness, Nat. Microbiol. 2 (2016) 16216. [76] J. Severin, K. Beal, A.J. Vilella, S. Fitzgerald, M. Schuster, L. Gordon, A.
[61] S.N. McNulty, B.A. Rosa, P.U. Fischer, J.M. Rumsey, P. Erdmann-Gilmore, K.C. Ureta-Vidal, P. Flicek, J. Herrero, eHive: an artificial intelligence workflow
Curtis, S. Specht, R.R. Townsend, G.J. Weil, M. Mitreva, An integrated system for genomic analysis, BMC Bioinf. 11 (2010) 240.
multiomics approach to identify candidate antigens for serodiagnosis of [77] G. Parra, K. Bradnam, I. Korf, CEGMA: a pipeline to accurately annotate core
human onchocerciasis, Mol. Cell. Proteomics 14 (12) (2015) 3224–3233. genes in eukaryotic genomes, Bioinformatics 23 (9) (2007) 1061–1067.
[62] J.D. Stoltzfus, S. Minot, M. Berriman, T.J. Nolan, J.B. Lok, RNAseq analysis of the [78] F.A. Simao, R.M. Waterhouse, P. Ioannidis, E.V. Kriventseva, E.M. Zdobnov,
parasitic nematode Strongyloides stercoralis reveals divergent regulation of BUSCO: assessing genome assembly and annotation completeness with
canonical dauer pathways, PLoS Negl. Trop. Dis. 6 (10) (2012) e1854. single-copy orthologs, Bioinformatics 31 (19) (2015) 3210–3212.
[63] A.V. Protasio, I.J. Tsai, A. Babbage, S. Nichol, M. Hunt, M.A. Aslett, N. De Silva, [79] R.J. Challis, S. Kumar, K.K.K. Dasmahapatra, C.D. Jiggins, M. Blaxter, Lepbase:
G.S. Velarde, T.J. Anderson, R.C. Clark, C. Davidson, G.P. Dillon, N.E. Holroyd, the Lepidopteran genome database, bioRxiv (2016).
P.T. LoVerde, C. Lloyd, J. McQuillan, G. Oliveira, T.D. Otto, S.J. Parker-Manuel, [80] D. Smedley, S. Haider, B. Ballester, R. Holland, D. London, G. Thorisson, A.
M.A. Quail, R.A. Wilson, A. Zerlotini, D.W. Dunne, M. Berriman, A Kasprzyk, BioMart—biological queries made easy, BMC Genomics 10 (2009)
systematically improved high quality genome and transcriptome of the 22.
human blood fluke Schistosoma mansoni, PLoS Negl. Trop. Dis. 6 (1) (2012) [81] A. Krogh, B. Larsson, G. von Heijne, E.L. Sonnhammer, Predicting
e1455. transmembrane protein topology with a hidden Markov model: application
[64] J.J. Collins 3rd, B. Wang, B.G. Lambrus, M.E. Tharp, H. Iyer, P.A. Newmark, to complete genomes, J. Mol. Biol. 305 (3) (2001) 567–580.
Adult somatic stem cells in the human parasite Schistosoma mansoni, Nature [82] W. McLaren, L. Gil, S.E. Hunt, H.S. Riat, G.R. Ritchie, A. Thormann, P. Flicek, F.
494 (7438) (2013) 476–479. Cunningham, The Ensembl variant effect predictor, Genome Biol. 17 (1)
[65] S. Leutner, K.C. Oliveira, B. Rotter, S. Beckmann, C. Buro, S. Hahnel, J.P. (2016) 122.
Kitajima, S. Verjovski-Almeida, P. Winter, C.G. Grevelding, Combinatory [83] A. Yates, K. Beal, S. Keenan, W. McLaren, M. Pignatelli, G.R. Ritchie, M. Ruffier,
microarray and SuperSAGE analyses identify pairing-dependently transcribed K. Taylor, A. Vullo, P. Flicek, The Ensembl REST API: ensembl data for any
genes in Schistosoma mansoni males, including follistatin, PLoS Negl. Trop. language, Bioinformatics 31 (1) (2015) 143–145.
Dis. 7 (11) (2013) e2532. [84] T. Crellen, F. Allan, S. David, C. Durrant, T. Huckvale, N. Holroyd, A.M. Emery,
[66] B. Wang, J.J. Collins 3rd, P.A. Newmark, Functional genomic characterization of D. Rollinson, D.M. Aanensen, M. Berriman, J.P. Webster, J.A. Cotton, Whole
neoblast-like stem cells in larval Schistosoma mansoni, Elife 2 (2013) e00768. genome resequencing of the human parasite Schistosoma mansoni reveals
[67] C.L. Valentim, D. Cioli, F.D. Chevalier, X. Cao, A.B. Taylor, S.P. Holloway, L. population history and effects of selection, Sci. Rep. 6 (2016) 20954.
Pica-Mattoccia, A. Guidi, A. Basso, I.J. Tsai, M. Berriman, C. Carvalho-Queiroz, [85] C.E. Cook, M.T. Bergman, R.D. Finn, G. Cochrane, E. Birney, R. Apweiler, The
M. Almeida, H. Aguilar, D.E. Frantz, P.J. Hart, P.T. LoVerde, T.J. Anderson, european bioinformatics institute in 2016: data growth and integration,
Genetic and molecular basis of drug resistance and species-specific drug Nucleic Acids Res. 44 (D1) (2016) D20–D26.
action in schistosome parasites, Science 342 (6164) (2013) 1385–1389. [86] B.J. Raney, T.R. Dreszer, G.P. Barber, H. Clawson, P.A. Fujita, T. Wang, N.
[68] A.J. Vilella, J. Severin, A. Ureta-Vidal, L. Heng, R. Durbin, E. Birney, Nguyen, B. Paten, A.S. Zweig, D. Karolchik, W.J. Kent, Track data hubs enable
EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic visualization of user-defined genome-wide annotations on the UCSC genome
trees in vertebrates, Genome Res. 19 (2) (2009) 327–335. browser, Bioinformatics 30 (7) (2014) 1003–1005.
[69] C. Camacho, G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, T.L. [87] A.P. Bento, A. Gaulton, A. Hersey, L.J. Bellis, J. Chambers, M. Davies, F.A. Kruger,
Madden, BLAST+: architecture and applications, BMC Bioinf. 10 (2009) 421. Y. Light, L. Mak, S. McGlinchey, M. Nowotka, G. Papadatos, R. Santos, J.P.
[70] H. Mi, S. Poudel, A. Muruganujan, J.T. Casagrande, P.D. Thomas, PANTHER Overington, The ChEMBL bioactivity database: an update, Nucleic Acids Res.
version 10: expanded protein families and functions, and analysis tools, 42 (2014) D1083–D1090, Database issue.
Nucleic Acids Res. 44 (D1) (2016) D336–D342.
Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem
Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005