<<

G Model

MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS

Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

Contents lists available at ScienceDirect

Molecular & Biochemical Parasitology

WormBase ParaSite a comprehensive resource for helminth genomics

a,∗ a b a b

Kevin L. Howe , Bruce J. Bolt , Myriam Shafie , Paul Kersey , Matthew Berriman

a

European Molecular Biology Laboratory, European Institute, Genome Campus, Hinxton, Cambridge CB10 1SD, UK

b

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

a r t i c l e i n f o a b s t r a c t

Article history: The number of publicly available parasitic worm genome sequences has increased dramatically in the

Received 7 October 2016

past three years, and research interest in helminth functional genomics is now quickly gathering pace in

Received in revised form

response to the foundation that has been laid by these collective efforts. A systematic approach to the

24 November 2016

organisation, curation, analysis and presentation of these data is clearly vital for maximising the util-

Accepted 25 November 2016

ity of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.

Available online xxx

.org) for interrogating helminth genomes on a large scale. Data from over 100 and

platyhelminth species are integrated, adding value by way of systematic and consistent functional annota-

Keywords:

tion (e.g. domains and terms), gene expression analysis (e.g. alignment of life-stage

Genome browser

specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide

Comparative genomics

Functional genomics several ways of exploring the data, including genome browsers, genome and gene summary pages, text

Helminths search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review,

WormBase we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the

Ensembl

displays and tools available to users for interrogating helminth genomic data.

© 2016 The Authors. Published by Elsevier B.V. This is an article under the CC BY-NC-ND

license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction agricultural importance and therefore attract research interest in

their own right. The first parasitic nematode to have its genome

The WormBase project (http://www.wormbase.org, [1]) was fully sequenced was [3], and since then, many have

initiated to facilitate and accelerate biological research that uses the followed [4–37]. Of the WormBase “core” species (the ones for

model nematode , by making the collected which the reference genome sequence and annotation are curated),

outputs of scientists accessible from a single resource. This enables three are parasitic worms: Brugia malayi, Onchocerca volvulus and

the transfer of this wealth of knowledge to the study of other meta- Strongyloides ratti (http://www.wormbase.org/species). However,

zoa, from to humans. C. elegans data described in the there are a number of challenges associated with expanding the

research literature, deposited in the archives, or submitted directly remit to helminths. Firstly, the primary research goal of parasitol-

is placed into context via a combination of detailed manual cura- ogists is to identify ways of controlling the parasite, and as such

tion and semi-automatic data integration. In addition, the project their desired entry points and common use-cases for WormBase

curates the reference genome sequence, gene structures and other are often distinct from those of scientists doing basic science using

genomic features for C. elegans, thereby providing a high quality C. elegans as a model. Secondly, the wider community of worm par-

foundation for downstream studies. The WormBase mission also asitologists includes those studying platyhelminths (flatworms),

extends to free-living relatives of C. elegans, which were the focus which are outside the taxonomic scope of WormBase, which is

of early post-genomic research in the areas of evolution and com- a nematode resource. Thirdly, there have been recent concerted

parative biology [2]. efforts to sequence the genomes of many helminths (nematodes

In recent years, WormBase has begun to expand its mission and platyhelmiths) (e.g. the 50 Helminth Genomes Initiative [38]),

to include plant and animal parasitic nematodes which, while resulting in a flood of new draft genomes that vary considerably in

more distantly related to C. elegans, have direct biomedical and contiguity.

In response to these challenges, we have created WormBase Par-

aSite, a comprehensive new resource for parasitic worm genomes,

∗ aiming to serve parasitologists working on helminths who use

Corresponding author.

E-mail address: [email protected] (K.L. Howe). genomics as an investigative tool. WormBase ParaSite leverages the

http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

0166-6851/© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.

0/).

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem

Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model

MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS

2 K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

Table 1

infrastructure and expertise of the WormBase project, with a main

Species with RNA-Seq expression tracks in WormBase ParaSite (release 7). Included

mission of breadth (foundational data for many species) rather than

are some unpublished studies that are publicly available via the European

depth (rich data for a single species).

Nucleotide Archive.

Species Studies Samples (tracks)

Nematode Brugia malayi 3 [58,59] 57

2. Data integration and analysis

Onchocerca volvulus 2 [60,61] 19

Strongyloides ratti 2 [34] 8

2.1. Genomes Strongyloides stercoralis 2 [62] 42

Trichuris muris 1 [25] 22

Our mission is to include all publicly available nematode and Platyhelminth Echinococcus granulosus 2 [15] 6

platyhelminth genomes in WormBase ParaSite. Where multi- Echinococcus multicularis 2 [15] 35

Fasciola hepatica 2 [31] 17

ple genome assemblies exist for the same species (e.g. different

Hymenolepis microstoma 2 [15] 17

genome projects for Haemonchus contortus [18,19], and male and

Schistosoma mansoni 6 [63–67] 33

female isolates for Trichuris suis [24]), we have included all. Release

7 of the resource (August 2016) included genomes from 82 nema-

tode species (98 genomes) and 28 platyhelminth species (30 2.2.2. Functional annotation

genomes). We maintain an up-to-date list of genomes for ease of Literature describing gene function is sparse for most of the

reference (http://parasite.wormbase.org/species.html). species in WormBase ParaSite, although we anticipate that the

Our primary source for genome sequences is the International availability of the genomes will stimulate new research. In lieu of

Nucleotide Collaboration (INSDC) resources annotations based on experimental evidence, we have used estab-

[39]. In rare cases, we collect genomes from project-specific FTP lished automated methods to predict the function of as many gene

sites or direct engagement with genome project scientists. How- products as possible. Firstly, we broker the submission of all protein

ever, it is our strong preference to use genomes deposited with sequences in WormBase ParaSite to the UniProt Knowledgebase

INSDC as these have been formally checked, processed and ver- [51], and import the product names assigned by them. These are

sioned by an authoritative sequence archive resource. As well as defined by a combination of manual curation and automatic anno-

allowing us to formally disambiguate between different genomes tation. For more detailed annotation, we use InterProScan [52] from

for the same species, by way of the INSDC BioProject identifier the InterPro project [53]. As well as predicting protein domains (e.g.

(http://www.ncbi.nlm.gov/bioproject), it also simplifies the pro- from the [54] database), it also assigns terms from the Gene

cess of integrating and displaying functional genomics data that has Ontology (GO) [55]. We run the latest version of the pipeline and

been submitted to the archives. For the species in common between data across the complete proteome for every genome each release,

WormBase and WormBase ParaSite (C. elegans and other free-living so that we are always up-to-date with the latest InterPro domain

nematodes, Brugia malayi, Strongyloides ratti and Onchocerca volvu- and Gene Ontology annotations.

lus), we synchronise the data with a specific release of WormBase.

For example, WormBase ParaSite release 7 was synchronised with

2.2.3. Gene expression

WormBase release WS254, meaning that data for species in com-

High-throughput of RNA has become the standard

mon were identical in both resources.

assay for measuring gene expression, and numerous studies con-

ducting “RNA-Seq” experiments in helminth species have now been

performed and deposited in the sequence archives. We ultimately

2.2. Genome annotation aim to align all helminth RNA-Seq data to the corresponding refer-

ence genome using a standard pipeline, in collaboration with the

2.2.1. Gene structures Gene Expression Atlas project [56] (see Section 4). In the meantime,

The definition of the intron/exon structure of protein-coding we have processed data for a small number of species using our own

genes is an important foundation for interpretation of genome pipeline, as a pilot study (Table 1). Briefly, we use STAR [57] to align

function. WormBase curates gene structures for a C. elegans and a each experiment to the reference genome, merging resulting BAM

small set of nematode species with high-quality reference genomes files when they are technical replicates of the same sample. We

[40]. For others, we import annotations from the acknowledged have labelled each sample with the appropriate descriptive terms,

authority for that genome (e.g. GeneDB [41] for Schistosoma man- using ontological terms where possible (e.g. from the WormBase

soni), or from the group that sequenced and published the genome life-stage ontology). The alignments can be viewed on the genome

[4–37]. browser (see Section 3.2).

In certain circumstances, we collaborate with genome projects

to provide first pass annotation of protein-coding gene structures.

2.3. Comparative genomics

We have co-developed a pipeline that produces high-quality gene

predictions using MAKER [42] to integrate evidence from mul-

Having genomes, genes and for many helminth species

tiple sources: ab initio gene predictions from AUGUSTUS [43],

provides the opportunity to study the evolution of helminths at the

GeneMark-ES [44], and SNAP [45]; projected annotations from

molecular level. We use the Ensembl Compara system [68] to infer

C. elegans and the taxonomically nearest previously-annotated

the orthology and paralogy relationships between all nematode and

helminth using GenBlastG [46] and RATT [47]; and alignments of

platyhelminth genes, supplemented by genes from a number of

ESTs, mRNAs and proteins from related organisms. The pipeline was

other comparator species, including human, mouse and yeast.

used to annotate the majority of the genomes sequenced as part of

The method can be briefly summarised as follows: genes are first

the 50 Helminth Genomes Project [38].

organised into homologous clusters. Traditionally, this has been

A small number of genomes have curated structures for non-

done using NCBI BLAST+ [69] and hcluster sg (H. Li, unpublished)

coding RNAs, which we also import into WormBase ParaSite. To

but more recent versions of the system support the use of Hidden

supplement these, we have a pipeline that uses RNAmmer [48],

Markov Models from the PANTHER protein classification system

tRNAScan-SE [49] and [50] to predict structures for (respec-

[70]. A protein multiple alignment for each cluster is then con-

tively) ribosomal RNAs, transfer RNAs, and other non-coding RNAs.

structed using M-Coffee [71] or MAFFT [72]. The choice of program

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem

Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model

MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS

K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx 3

Fig. 1. Viewing variation data in WormBase ParaSite. Top: the variation image view of a DNA helicase gene in Strongyloides ratti. The genomic variants are shown in a track

along the top, drawn to show their location with respect to the gene structure and protein domains, and colour-coded according to their putative effect on the protein.

Bottom: summary view of a single variant in the S. ratti genome. This view shows the effect of the variant on the reference annotation, genotypes for all samples assayed, and

(not shown in figure) metrics associated with the quality of the variant call (e.g. read-depth). This particular variant gives rise to a nonsense mutation in 6 of the 10 strains

assayed.

is made dynamically by the pipeline using a set of empirically- [75,76]; and (c) the website displays and tools [73]. We have cus-

derived rules regarding certain properties of the input data (for tomised some of the Ensembl tools for use in WormBase ParaSite.

example, MAFFT is generally favoured for larger clusters). Finally, For example, we have modified Ensembl code to provide sequence

TreeBeST [68] is used to produce a gene tree, combining a number search and data mining services that allow the interrogation of all

of methods which reconcile the sequence-based gene phylogeny species, or large sub-groups of species (e.g. all nematodes) at once

with the species phylogeny. The resulting tree represents an evo- in a single query (see Section 3.3).

lutionary history of the gene family, which can be used to infer

true orthologues (genes in different species related by a specia-

3. Website − displays and tools

tion event) and paralogues (genes in the same species related by a

duplication event).

3.1. General navigation

2.4. Infrastructure The most common entry point to WormBase ParaSite is via the

home page (http://parasite.wormbase.org), which collects together

The Ensembl infrastructure [73] is the basis for much of the man- the latest news from the project, some basic statistics, and a tool for

agement and analysis of the data in WormBase ParaSite. The main finding genomes of interest. The header bar is present on all pages in

components we use are (a) the MySQL database schema and Appli- WormBase ParaSite. As well as providing short-cut links to a variety

cation Programming Interface for loading and storing the data [74]; of tools and documentation pages (http://parasite.wormbase.org/

(b) the genome analysis pipelines and workflow management tools info), it has a search-box which allows searching for genes by a vari-

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem

Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model

MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS

4 K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

Fig. 2. Compara tree showing the evolution of fox-1 in Trichinella spiralis, a gene involved in sex determination. In this view, the Trichinella sub-tree has been expanded

for additional detail. Other parts of the tree are collapsed for brevity. A schematic of a protein multiple alignment of the homologs is shown in the right, with regions of

conservation represented as coloured blocks.

ety of criteria, including gene name, product name, expanding sub-trees according to interest (Fig. 2). Alongside this,

names/accessions, and Gene Ontology terms. The search box has an protein multiple alignments for the family, or selected sub-trees,

auto-complete function, allowing searches to be performed using can be viewed and downloaded in a variety of formats.

only partial information. It is also possible to create a personal user account in Worm-

We also maintain a page for each genome and gene in Worm- Base ParaSite, and to store configuration and tool results under that

Base ParaSite. The genome pages collect together a description of account. Attaching a custom genome browser track whilst “logged

the species, attribution/references for the genome sequencing and in” will result in that track becoming available on any computer

annotation, and example entry points to data for that genome (e.g. where the user has logged into their account. Additionally, results

gene, genomic region). Also shown are some basic statistics about from the online tools (see Sections 3.3–3.5) are stored indefinitely

the quality of the genome, for example CEGMA [77] and BUSCO for retrieval at a later date.

[78] scores. Recent releases have displayed these graphically in an

Assembly Statistics panel, using code developed by the LepBase 3.2. Genome browser

project [79]. We also provide these data as columns in the table

of all genomes (http://parasite.wormbase.org/species.html), which We provide a fully interactive genome browser for every

can be sorted by each of the criteria. genome. By default, this shows the assembly and annotated gene

The gene pages act a starting place for exploring various types models. Additional tracks can be switched on using the left-hand

of information associated with the genes, including transcript navigation menu of the genome browser. Some tracks are available

models, functional annotations, orthologues and paralogues, cross- for every genome: DNA sequence, 6-frame protein translation, %GC

references to other resources, and the genomic context of the gene content, repetitive elements, protein-coding gene structures, and

via the genome browser. A recent addition has been variation data non-coding RNAs. Other tracks are available only for genomes for

from re-sequencing of isolates from selected species. These data which data is currently available. For example, we have RNA-Seq

can be viewed directly in WormBase ParaSite, via custom pages tracks for 8 genomes (see Section 2.2). The RNA-Seq tracks can be

and tables (Fig. 1), and the genome browser (see Section 2.2). used obtain a graphical overview of the gene expression landscape

The principal entry point for the comparative genomics data for a gene, or genomic region (Fig. 3).

(Section 2.3) is from the gene pages. The left-hand side-bar has The browser allows users to view their own data in the context

links to tables of orthologues and paralogues for the gene, which of the reference data. For small files (less than 20MB), these can be

can be filtered in a variety of ways. It is also possible to view the full uploaded directly through the web interface. Larger files must be

tree for the family of which the gene is a member, collapsing and stored externally (e.g. on a FTP server), where they will be retrieved

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem

Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model

MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS

K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx 5

Fig. 3. Genome browser view for the Smp 033000 gene in Schistosoma mansoni. Top: a zoomed in view showing 6-frame translation of DNA, the location of start and stop

codons. Bottom: a zoomed-out view, showing the genomic context of the gene, with other protein-coding and non-coding genes, and repetitive elements. Both views show

expression in 4 life-stages, showing that this gene has highest expression in the Somule 3 h post infection, consistent with the findings of the original study [63].

on-demand by the WormBase ParaSite website. A variety of file 3.4. Advanced querying

formats can be attached, including BigWig, BAM, CRAM and UCSC

Track Hubs (http://parasite.wormbase.org/info/Browsing/Upload). Advanced search and custom data export is available through

our BioMart tool (http://parasite.wormbase.org/biomart). BioMart

is designed to allow the simple and intuitive creation of custom

tables and sequence files from sets of genes that meet a defined set

of criteria [80].

A typical use of WormBase ParaSite BioMart involves three basic

3.3. Sequence searching steps:

Our BLAST service (using NCBI-BLAST+ [69]) can be used to

search a nucleotide or peptide sequence against the complete pro- 1. Define query filters: these are a set of criteria that must be met

teome or genome of any combination of species in WormBase for genes to be included in the output. Filters can be narrow (e.g.

ParaSite. We have short-cut buttons for searching common groups a specific list of gene identifiers) or broad (e.g. all genes for a

of species (e.g. all nematodes, or all platyhelminths), and also pro- species or entire clade of species).

vide a graphical widget for defining a fully customised list. By 2. Define output attributes: these are the columns that will appear

default, BLAST results are saved on the WormBase ParaSite servers in the output table or, for sequences, the information that will

for seven days. However, by saving a BLAST submission to a user be included in the entry headers.

account, results are retained indefinitely. BLAST hits can be viewed 3. View results. Initially, a preview of the first 10 lines of output are

and downloaded in a variety of formats (http://parasite.wormbase. shown in the browser. Once happy that the query is working as

org/info/Tools/.html). intended, the user can then export the full result set to a file.

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem

Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model

MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS

6 K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

Fig. 4. Using BioMart to identify C. elegans genes that could be used to model the response of drug targeting transmembrane signaling receptors in Schistosoma mansoni.

Left: setting query filters to restrict the query to S. mansoni genes with a C. elegans orthologue, without a human orthologue and associated with transmembrane signaling

receptor activity. Right, top: setting output attributes to customize the output table. Right, bottom: preview of the results within the web browser, prior to download.

The types of data that be used as filters and attributes in include: release, (c) genomes for that species (identified by NCBI BioProject),

and (d) data files for that genome.

• Another tool we make available is the Ensembl Variant Effect

Gene IDs, names and descriptions

• Predictor (VEP) [82]. This annotates genomic variants from re-

Identifiers for data from external databases (e.g. UniProt, RefSeq)

• sequencing or population genomics studies with predictions of the

Gene structure (e.g. exons, introns)

• how the reference annotations are affected by the variants (e.g.

Protein domains and function (e.g. InterPro, Pfam)

• whether they coincide with protein-coding genes, and the puta-

Gene Ontology annotations

• tive effect they have on the corresponding protein sequence). The

Orthologues and paralogues

• user can upload a list of variants in some standard formats, e.g. the

Sequences (e.g. genomic, transcript, peptide)

Variant Call Format (VCF). Results are fully exportable, including

as annotated VCF files. Full documentation and examples are avail-

A typical use-case for BioMart might be to find all membrane

able on the website (http://parasite.wormbase.org/info/Tools/vep.

proteins in filarial nematodes that do not have a human ortho-

html).

logue. This query relies on the fact that we have flagged potential

For bioinformaticians wishing to develop applications using the

transmembrane proteins using the TMHMM [81] software. The

data in WormBase ParaSite, direct programmatic access is available

query is performed by (a) selecting the “Filarioidea” taxon in the

via two routes. Firstly, we provide a “RESTful” application program-

SPECIES filter; (b) selecting “Restrict to genes without orthologues

ming interface (API) [83] that can be used with any programming

in Human” in the HOMOLOGY filter; (c) selecting “Limit to genes

language. A documented catalogue of “endpoints” and example

with TMMHMM protein features” in the PROTEIN DOMAINS filter;

code is available through the RESTful API section of the website

and (d) Clicking the “Results” button. Another example use-case for

(http://parasite.wormbase.org/rest). Secondly, data be retrieved

shown in Fig. 4, and more detailed instructions and worked exam-

from our BioMart service using the R programming language

ples are available from our help pages (http://parasite.wormbase.

and BioMaRt Bioconductor package (http://parasite.wormbase.org/

org/info/Tools/biomart.html).

info/Tools/biomart.html). These services allow software developers

and analysts to retrieve and manipulate the latest data in Worm-

3.5. Other tools Base ParaSite on demand, without having to download complete

data sets in bulk.

We provide a collection of files for each genome, including the

genome sequence itself (both plain and repeat-masked), and the

annotations in a variety of formats. Individual files can be down- 4. Challenges and future directions

loaded from our browser (http://parasite.wormbase.org/ftp.html),

or bulk downloads can be performed by FTP (ftp://ftp.wormbase. The availability of many helminth reference genomes opens up

org/pub/wormbase/parasite). The structure of the site is fairly self- the possibility a wide variety of functional genomics investigations,

explanatory, with folders nested by (a) releases, (b) species in that from life-stage specific gene expression to population genetics.

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem

Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model

MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS

K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx 7

Making the data from these studies available in way that is useful to Saunders, A.E. Sluder, K. Smith, M. Stanke, T.R. Unnasch, J. Ware, A.D. Wei, G.

Weil, D.J. Williams, Y. Zhang, S.A. Williams, C. Fraser-Liggett, B. Slatko, M.L.

users has traditionally required inefficient and redundant interac-

Blaxter, A.L. Scott, Draft genome of the filarial nematode parasite Brugia

tions between data providers, WormBase curators and numerous

malayi, Science 317 (5845) (2007) 1756–1760.

archive and specialist resources. In order to reduce these costs as [4] P. Abad, J. Gouzy, J.M. Aury, P. Castagnone-Sereno, E.G. Danchin, E. Deleury, L.

Perfus-Barbeoch, V. Anthouard, F. Artiguenave, V.C. Blok, M.C. Caillaud, P.M.

data volumes continue to grow rapidly, we are increasingly mov-

Coutinho, C. Dasilva, F. De Luca, F. Deau, M. Esquibet, T. Flutre, J.V. Goldstone,

ing towards a model where our website pulls data directly from the

N. Hamamouch, T. Hewezi, O. Jaillon, C. Jubin, P. Leonetti, M. Magliano, T.R.

archives and other complementary resources, using their applica- Maier, G.V. Markov, P. McVeigh, G. Pesole, J. Poulain, M. Robinson-Rechavi, E.

Sallet, B. Segurens, D. Steinbach, T. Tytgat, E. Ugarte, C. van Ghelder, P.

tion programming interfaces. As a pilot for this model, we have

Veronico, T.J. Baum, M. Blaxter, T. Bleve-Zacheo, E.L. Davis, J.J. Ewbank, B.

brokered the submission of genome variation data from Strongy-

Favery, E. Grenier, B. Henrissat, J.T. Jones, V. Laudet, A.G. Maule, H.

loides ratti (Mark Viney, pers. comm) and Schistosoma mansoni [84] Quesneville, M.N. Rosso, T. Schiex, G. Smant, J. Weissenbach, P. Wincker,

to the European Variation Archive (EVA) [85]. Our website code has Genome sequence of the metazoan plant-parasitic nematode Meloidogyne

incognita, Nat. Biotechnol. 26 (8) (2008) 909–915.

been extended to extract these data from the EVA and display them

[5] C.H. Opperman, D.M. Bird, V.M. Williamson, D.S. Rokhsar, M. Burke, J. Cohn, J.

directly (see Section 3.1).

Cromer, S. Diener, J. Gajan, S. Graham, T.D. Houfek, Q. Liu, T. Mitros, J. Schaff,

For the past year, we have been aligning selected helminth RNA- R. Schaffer, E. Scholl, B.R. Sosinski, V.P. Thomas, E. Windham, Sequence and

genetic map of Meloidogyne hapla: a compact nematode genome for plant

Seq data sets to the corresponding reference genome, and making

parasitism, Proc. Natl. Acad. Sci. U. S. A. 105 (39) (2008) 14802–14807.

the alignments available as Track Hubs [86] for visualization in

[6] Schistosoma japonicum Genome Sequencing and Functional Analysis

WormBase-ParaSite and other compliant browsers (Fig. 4). We are Consortium, The Schistosoma japonicum genome reveals features of

host-parasite interplay, Nature, 460, 7253, (2009) 345-51. Please see https://

now collaborating with the Gene Expression Atlas (GxA)[56] to

www.ncbi.nlm.nih.gov/pubmed/19606140.

provide curation, analysis, display and querying of helminth gene

[7] M. Berriman, B.J. Haas, P.T. LoVerde, R.A. Wilson, G.P. Dillon, G.C. Cerqueira,

expression data. Specific high-value data-sets are curated within S.T. Mashiyama, B. Al-Lazikani, L.F. Andrade, P.D. Ashton, M.A. Aslett, D.C.

Bartholomeu, G. Blandin, C.R. Caffrey, A. Coghlan, R. Coulson, T.A. Day, A.

the GxA framework, and these can be interrogated by the GxA tools

Delcher, R. DeMarco, A. Djikeng, T. Eyre, J.A. Gamble, E. Ghedin, Y. Gu, C.

(for example querying for genes with a similar expression profile

Hertz-Fowler, H. Hirai, Y. Hirai, R. Houston, A. Ivens, D.A. Johnston, D. Lacerda,

to a given gene). To simplify the user experience, we plan to embed C.D. Macedo, P. McVeigh, Z. Ning, G. Oliveira, J.P. Overington, J. Parkhill, M.

Pertea, R.J. Pierce, A.V. Protasio, M.A. Quail, M.A. Rajandream, J. Rogers, M.

access to these tools directly in the WormBase ParaSite. We will

Sajid, S.L. Salzberg, M. Stanke, A.R. Tivey, O. White, D.L. Williams, J. Wortman,

also expand the curation to expression data sets in other species.

W. Wu, M. Zamanian, A. Zerlotini, C.M. Fraser-Liggett, B.G. Barrell, N.M.

We will continue to extend WormBase ParaSite with function- El-Sayed, The genome of the blood fluke Schistosoma mansoni, Nature 460

ality geared towards the use-cases of helminth parasitologists. One (7253) (2009) 352–358.

[8] M. Mitreva, D.P. Jasmer, D.S. Zarlenga, Z. Wang, S. Abubucker, J. Martin, C.M.

example of this is the identification of candidate genes for thera-

Taylor, Y. Yin, L. Fulton, P. Minx, S.P. Yang, W.C. Warren, R.S. Fulton, V.

peutics. In the near future, we will develop a pipeline to identify

Bhonagiri, X. Zhang, K. Hallsworth-Pepin, S.W. Clifton, J.P. McCarter, J.

reliable homology between helminth genes and known drug tar- Appleton, E.R. Mardis, R.K. Wilson, The draft genome of the parasitic

nematode Trichinella spiralis, Nat. Genet. 43 (3) (2011) 228–235.

gets in the ChEMBL database [87].

[9] T. Kikuchi, J.A. Cotton, J.J. Dalzell, K. Hasegawa, N. Kanzaki, P. McVeigh, T.

Takanashi, I.J. Tsai, S.A. Assefa, P.J. Cock, T.D. Otto, M. Hunt, A.J. Reid, A.

Funding Sanchez-Flores, K. Tsuchihara, T. Yokoi, M.C. Larsson, J. Miwa, A.G. Maule, N.

Sahashi, J.T. Jones, M. Berriman, Genomic insights into the origin of parasitism

in the emerging plant pathogen Bursaphelenchus xylophilus, PLoS Pathog. 7

WormBase ParaSite is funded by the UK Biotechnology and Bio- (9) (2011) e1002219.

[10] A.R. Jex, S. Liu, B. Li, N.D. Young, R.S. Hall, Y. Li, L. Yang, N. Zeng, X. Xu, Z. Xiong,

logical Sciences Research Council [BB/K020080].

F. Chen, X. Wu, G. Zhang, X. Fang, Y. Kang, G.A. Anderson, T.W. Harris, B.E.

Campbell, J. Vlaminck, T. Wang, C. Cantacessi, E.M. Schwarz, S. Ranganathan,

Acknowledgements P. Geldhof, P. Nejsum, P.W. Sternberg, H. Yang, J. Wang, J. Wang, R.B. Gasser,

Ascaris suum draft genome, Nature 479 (7374) (2011) 529–533.

[11] N.D. Young, A.R. Jex, B. Li, S. Liu, L. Yang, Z. Xiong, Y. Li, C. Cantacessi, R.S. Hall,

We are indebted to Alessandra Traini, Eleanor Stanley and Jane X. Xu, F. Chen, X. Wu, A. Zerlotini, G. Oliveira, A. Hofmann, G. Zhang, X. Fang,

Lomax for their work on WormBase ParaSite. We would also like Y. Kang, B.E. Campbell, A. Loukas, S. Ranganathan, D. Rollinson, G. Rinaldi, P.J.

Brindley, H. Yang, J. Wang, J. Wang, R.B. Gasser, Whole-genome sequence of

to thank Michael Paulini, Avril Coghlan, and other members of the

Schistosoma haematobium, Nat. Genet. 44 (2) (2012) 221–225.

WormBase and WTSI Parasite genomics groups for useful advice

[12] C. Godel, S. Kumar, G. Koutsovoulos, P. Ludin, D. Nilsson, F. Comandatore, N.

and feedback. Wrobel, M. Thompson, C.D. Schmid, S. Goto, F. Bringaud, A. Wolstenholme, C.

Bandi, C. Epe, R. Kaminsky, M. Blaxter, P. Maser, The genome of the

heartworm, Dirofilaria immitis, reveals drug and vaccine targets, FASEB J. 26

References (11) (2012) 4650–4661.

[13] J. Wang, M. Mitreva, M. Berriman, A. Thorne, V. Magrini, G. Koutsovoulos, S.

Kumar, M.L. Blaxter, R.E. Davis, Silencing of germline-expressed genes by DNA

[1] K.L. Howe, B.J. Bolt, S. Cain, J. Chan, W.J. Chen, P. Davis, J. Done, T. Down, S.

elimination in somatic cells, Dev. Cell 23 (5) (2012) 1072–1080.

Gao, C. Grove, T.W. Harris, R. Kishore, R. Lee, J. Lomax, Y. Li, H.M. Muller, C.

[14] Y. Huang, W. Chen, X. Wang, H. Liu, Y. Chen, L. Guo, F. Luo, J. Sun, Q. Mao, P.

Nakamura, P. Nuin, M. Paulini, D. Raciti, G. Schindelman, E. Stanley, M.A. Tuli,

Liang, Z. Xie, C. Zhou, Y. Tian, X. Lv, L. Huang, J. Zhou, Y. Hu, R. Li, F. Zhang, H.

K. Van Auken, D. Wang, X. Wang, G. Williams, A. Wright, K. Yook, M.

Lei, W. Li, X. Hu, C. Liang, J. Xu, X. Li, X. Yu, The carcinogenic liver fluke,

Berriman, P. Kersey, T. Schedl, L. Stein, P.W. Sternberg, WormBase 2016:

Clonorchis sinensis: new assembly, reannotation and analysis of the genome

expanding to enable helminth genomic research, Nucleic Acids Res. 44 (D1)

and characterization of tissue transcriptomes, PLoS One 8 (1) (2013) e54732.

(2016) D774–D780.

[15] I.J. Tsai, M. Zarowiecki, N. Holroyd, A. Garciarrubio, A. Sanchez-Flores, K.L.

[2] L.D. Stein, Z. Bao, D. Blasiar, T. Blumenthal, M.R. Brent, N. Chen, A. Chinwalla,

Brooks, A. Tracey, R.J. Bobes, G. Fragoso, E. Sciutto, M. Aslett, H. Beasley, H.M.

L. Clarke, C. Clee, A. Coghlan, A. Coulson, P. D’Eustachio, D.H. Fitch, L.A. Fulton,

Bennett, J. Cai, F. Camicia, R. Clark, M. Cucher, N. De Silva, T.A. Day, P.

R.E. Fulton, S. Griffiths-Jones, T.W. Harris, L.W. Hillier, R. Kamath, P.E.

Deplazes, K. Estrada, C. Fernandez, P.W. Holland, J. Hou, S. Hu, T. Huckvale, S.S.

Kuwabara, E.R. Mardis, M.A. Marra, T.L. Miner, P. Minx, J.C. Mullikin, R.W.

Hung, L. Kamenetzky, J.A. Keane, F. Kiss, U. Koziol, O. Lambert, K. Liu, X. Luo, Y.

Plumb, J. Rogers, J.E. Schein, M. Sohrmann, J. Spieth, J.E. Stajich, C. Wei, D.

Luo, N. Macchiaroli, S. Nichol, J. Paps, J. Parkinson, N. Pouchkina-Stantcheva,

Willey, R.K. Wilson, R. Durbin, R.H. Waterston, The genome sequence of

N. Riddiford, M. Rosenzvit, G. Salinas, J.D. Wasmuth, M. Zamanian, Y. Zheng,

Caenorhabditis briggsae: a platform for comparative genomics, PLoS Biol. 1

Taenia solium Genome Consortium, X. Cai, X. Soberon, P.D. Olson, J.P. Laclette,

(2) (2003) E45.

K. Brehm, M. Berriman, The genomes of four tapeworm species reveal

[3] E. Ghedin, S. Wang, D. Spiro, E. Caler, Q. Zhao, J. Crabtree, J.E. Allen, A.L.

adaptations to parasitism, Nature 496 (7443) (2013) 57–63.

Delcher, D.B. Guiliano, D. Miranda-Saavedra, S.V. Angiuoli, T. Creasy, P.

[16] C.A. Desjardins, G.C. Cerqueira, J.M. Goldberg, J.C. Dunning Hotopp, B.J. Haas, J.

Amedeo, B. Haas, N.M. El-Sayed, J.R. Wortman, T. Feldblyum, L. Tallon, M.

Zucker, J.M. Ribeiro, S. Saif, J.Z. Levin, L. Fan, Q. Zeng, C. Russ, J.R. Wortman,

Schatz, M. Shumway, H. Koo, S.L. Salzberg, S. Schobel, M. Pertea, M. Pop, O.

D.L. Fink, B.W. Birren, T.B. Nutman, Genomics of Loa loa, a wolbachia-free

White, G.J. Barton, C.K. Carlow, M.J. Crawford, J. Daub, M.W. Dimmic, C.F.

filarial parasite of humans, Nat. Genet. 45 (5) (2013) 495–500.

Estes, J.M. Foster, M. Ganatra, W.F. Gregory, N.M. Johnson, J. Jin, R.

[17] X. Bai, B.J. Adams, T.A. Ciche, S. Clifton, R. Gaugler, K.S. Kim, J. Spieth, P.W.

Komuniecki, I. Korf, S. Kumar, S. Laney, B.W. Li, W. Li, T.H. Lindblom, S.

Sternberg, R.K. Wilson, P.S. Grewal, A lover and a fighter: the genome

Lustigman, D. Ma, C.V. Maina, D.M. Martin, J.P. McCarter, L. McReynolds, M.

Mitreva, T.B. Nutman, J. Parkinson, J.M. Peregrin-Alvarez, C. Poole, Q. Ren, L.

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem

Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model

MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS

8 K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx

sequence of an entomopathogenic nematode Heterorhabditis bacteriophora, A. Kulkarni, D. Harbecke, E. Nagayasu, S. Nichol, Y. Ogura, M.A. Quail, N.

PLoS One 8 (7) (2013) e69618. Randle, D. Xia, N.W. Brattig, H. Soblik, D.M. Ribeiro, A. Sanchez-Flores, T.

[18] R. Laing, T. Kikuchi, A. Martinelli, I.J. Tsai, R.N. Beech, E. Redman, N. Holroyd, Hayashi, T. Itoh, D.R. Denver, W. Grant, J.D. Stoltzfus, J.B. Lok, H. Murayama, J.

D.J. Bartley, H. Beasley, C. Britton, D. Curran, E. Devaney, A. Gilabert, M. Hunt, Wastling, A. Streit, T. Kikuchi, M. Viney, M. Berriman, The genomic basis of

F. Jackson, S.L. Johnston, I. Kryukov, K. Li, A.A. Morrison, A.J. Reid, N. Sargison, parasitism in the Strongyloides clade of nematodes, Nat. Genet. 48 (3) (2016)

G.I. Saunders, J.D. Wasmuth, A. Wolstenholme, M. Berriman, J.S. Gilleard, J.A. 299–307.

Cotton, The genome and transcriptome of Haemonchus contortus, a key [35] P.K. Korhonen, E. Pozio, G. La Rosa, B.C. Chang, A.V. Koehler, E.P. Hoberg, P.R.

model parasite for drug and vaccine discovery, Genome Biol. 14 (8) (2013) Boag, P. Tan, A.R. Jex, A. Hofmann, P.W. Sternberg, N.D. Young, R.B. Gasser,

R88. Phylogenomic and biogeographic reconstruction of the Trichinella complex,

[19] E.M. Schwarz, P.K. Korhonen, B.E. Campbell, N.D. Young, A.R. Jex, A. Jabbar, Nat. Commun. 7 (2016) 10513.

R.S. Hall, A. Mondal, A.C. Howe, J. Pell, A. Hofmann, P.R. Boag, X.Q. Zhu, T. [36] S.T. Small, L.J. Reimer, D.J. Tisch, C.L. King, B.M. Christensen, P.M. Siba, J.W.

Gregory, A. Loukas, B.A. Williams, I. Antoshechkin, C. Brown, P.W. Sternberg, Kazura, D. Serre, P.A. Zimmerman, Population genomics of the filarial

R.B. Gasser, The genome and developmental transcriptome of the strongylid nematode parasite Wuchereria bancrofti from mosquitoes, Mol. Ecol. 25 (7)

nematode Haemonchus contortus, Genome Biol. 14 (8) (2013) R89. (2016) 1465–1477.

[20] H. Zheng, W. Zhang, L. Zhang, Z. Zhang, J. Li, G. Lu, Y. Zhu, Y. Wang, Y. Huang, J. [37] S.N. McNulty, C. Strube, B.A. Rosa, J.C. Martin, R. Tyagi, Y.J. Choi, Q. Wang, K.

Liu, H. Kang, J. Chen, L. Wang, A. Chen, S. Yu, Z. Gao, L. Jin, W. Gu, Z. Wang, L. Hallsworth Pepin, X. Zhang, P. Ozersky, R.K. Wilson, P.W. Sternberg, R.B.

Zhao, B. Shi, H. Wen, R. Lin, M.K. Jones, B. Brejova, T. Vinar, G. Zhao, D.P. Gasser, M. Mitreva, Dictyocaulus viviparus genome, variome and

McManus, Z. Chen, Y. Zhou, S. Wang, The genome of the hydatid tapeworm transcriptome elucidate lungworm biology and support future intervention,

Echinococcus granulosus, Nat. Genet. 45 (10) (2013) 1168–1175. Sci. Rep. 6 (2016) 20316.

[21] P.H. Schiffer, M. Kroiher, C. Kraus, G.D. Koutsovoulos, S. Kumar, J.I. Camps, N.A. [38] Wellcome Trust Sanger Institute, The 50 Helminth Genomes Initiative, 2014,

Nsah, D. Stappert, K. Morris, P. Heger, J. Altmuller, P. Frommolt, P. Nurnberg, http://www.sanger.ac.uk/science/collaboration/50hgp, Accessed 6 October

W.K. Thomas, M.L. Blaxter, E. Schierenberg, The genome of Romanomermis 2016.

culicivorax: revealing fundamental changes in the core developmental [39] G. Cochrane, I. Karsch-Mizrachi, T. Takagi, International nucleotide sequence

genetic toolkit in Nematoda, BMC Genomics 14 (2013) 923. database the international nucleotide sequence database collaboration,

[22] Y.T. Tang, X. Gao, B.A. Rosa, S. Abubucker, K. Hallsworth-Pepin, J. Martin, R. Nucleic Acids Res. 44 (D1) (2016) D48–D50.

Tyagi, E. Heizer, X. Zhang, V. Bhonagiri-Palsikar, P. Minx, W.C. Warren, Q. [40] K. Howe, P. Davis, M. Paulini, M.A. Tuli, G. Williams, K. Yook, R. Durbin, P.

Wang, B. Zhan, P.J. Hotez, P.W. Sternberg, A. Dougall, S.T. Gaze, J. Mulvenna, J. Kersey, P.W. Sternberg, WormBase: annotating many nematode genomes,

Sotillo, S. Ranganathan, E.M. Rabelo, R.K. Wilson, P.L. Felgner, J. Bethony, J.M. Worm 1 (1) (2012) 15–21.

Hawdon, R.B. Gasser, A. Loukas, M. Mitreva, Genome of the human hookworm [41] F.J. Logan-Klumpler, N. De Silva, U. Boehme, M.B. Rogers, G. Velarde, J.A.

Necator americanus, Nat. Genet. 46 (3) (2014) 261–269. McQuillan, T. Carver, M. Aslett, C. Olsen, S. Subramanian, I. Phan, C. Farris, S.

[23] J.A. Cotton, C.J. Lilley, L.M. Jones, T. Kikuchi, A.J. Reid, P. Thorpe, I.J. Tsai, H. Mitra, G. Ramasamy, H. Wang, A. Tivey, A. Jackson, R. Houston, J. Parkhill, M.

Beasley, V. Blok, P.J. Cock, S. Eves-van den Akker, N. Holroyd, M. Hunt, S. Holden, O.S. Harb, B.P. Brunk, P.J. Myler, D. Roos, M. Carrington, D.F. Smith, C.

Mantelin, H. Naghra, A. Pain, J.E. Palomares-Rius, M. Zarowiecki, M. Berriman, Hertz-Fowler, M. Berriman, GeneDB—an annotation database for pathogens,

J.T. Jones, P.E. Urwin, The genome and life-stage specific transcriptomes of Nucleic Acids Res. 40 (2012) D98–D108, Database issue.

Globodera pallida elucidate key aspects of plant parasitism by a cyst [42] C. Holt, M. Yandell, MAKER2: an annotation pipeline and genome-database

nematode, Genome Biol. 15 (3) (2014) R43. management tool for second-generation genome projects, BMC Bioinf. 12

[24] A.R. Jex, P. Nejsum, E.M. Schwarz, L. Hu, N.D. Young, R.S. Hall, P.K. Korhonen, S. (2011) 491.

Liao, S. Thamsborg, J. Xia, P. Xu, S. Wang, J.P. Scheerlinck, A. Hofmann, P.W. [43] M. Stanke, O. Keller, I. Gunduz, A. Hayes, S. Waack, B. Morgenstern,

Sternberg, J. Wang, R.B. Gasser, Genome and transcriptome of the porcine AUGUSTUS: ab initio prediction of alternative transcripts, Nucleic Acids Res.

whipworm Trichuris suis, Nat. Genet. 46 (7) (2014) 701–706. 34 (2006) W435–W439, Web Server issue.

[25] B.J. Foth, I.J. Tsai, A.J. Reid, A.J. Bancroft, S. Nichol, A. Tracey, N. Holroyd, J.A. [44] V. Ter-Hovhannisyan, A. Lomsadze, Y.O. Chernoff, M. Borodovsky, Gene

Cotton, E.J. Stanley, M. Zarowiecki, J.Z. Liu, T. Huckvale, P.J. Cooper, R.K. prediction in novel fungal genomes using an ab initio algorithm with

Grencis, M. Berriman, Whipworm genome and dual-species transcriptome unsupervised training, Genome Res. 18 (12) (2008) 1979–1990.

analyses provide molecular insights into an intimate host-parasite [45] I. Korf, Gene finding in novel genomes, BMC Bioinf. 5 (2004) 59.

interaction, Nat. Genet. 46 (7) (2014) 693–700. [46] R. She, J.S. Chu, B. Uyar, J. Wang, K. Wang, N. Chen, genBlastG: using BLAST

[26] N.D. Young, N. Nagarajan, S.J. Lin, P.K. Korhonen, A.R. Jex, R.S. Hall, H. searches to build homologous gene models, Bioinformatics 27 (15) (2011)

Safavi-Hemami, W. Kaewkong, D. Bertrand, S. Gao, Q. Seet, S. Wongkham, B.T. 2141–2143.

Teh, C. Wongkham, P.M. Intapan, W. Maleewong, X. Yang, M. Hu, Z. Wang, A. [47] T.D. Otto, G.P. Dillon, W.S. Degrave, M. Berriman, RATT: rapid annotation

Hofmann, P.W. Sternberg, P. Tan, J. Wang, R.B. Gasser, The Opisthorchis transfer tool, Nucleic Acids Res. 39 (9) (2011) e57.

viverrini genome provides insights into life in the bile duct, Nat. Commun. 5 [48] K. Lagesen, P. Hallin, E.A. Rodland, H.H. Staerfeldt, T. Rognes, D.W. Ussery,

(2014) 4378. RNAmmer: consistent and rapid annotation of ribosomal RNA genes, Nucleic

[27] L.J. Tallon, X. Liu, S. Bennuru, M.C. Chibucos, A. Godinez, S. Ott, X. Zhao, L. Acids Res. 35 (9) (2007) 3100–3108.

Sadzewicz, C.M. Fraser, T.B. Nutman, J.C. Dunning Hotopp, Single molecule [49] T.M. Lowe, S.R. Eddy, tRNAscan-SE: a program for improved detection of

sequencing and genome assembly of a clinical specimen of Loa loa, the transfer RNA genes in genomic sequence, Nucleic Acids Res. 25 (5) (1997)

causative agent of loiasis, BMC Genomics 15 (2014) 788. 955–964.

[28] H.M. Bennett, H.P. Mok, E. Gkrania-Klotsas, I.J. Tsai, E.J. Stanley, N.M. Antoun, [50] E.P. Nawrocki, S.W. Burge, A. Bateman, J. Daub, R.Y. Eberhardt, S.R. Eddy, E.W.

A. Coghlan, B. Harsha, A. Traini, D.M. Ribeiro, S. Steinbiss, S.B. Lucas, K.S. Floden, P.P. Gardner, T.A. Jones, J. Tate, R.D. Finn, Rfam 12.0: updates to the

Allinson, S.J. Price, T.S. Santarius, A.J. Carmichael, P.L. Chiodini, N. Holroyd, A.F. RNA families database, Nucleic Acids Res. 43 (2015) D130–D137, Database

Dean, M. Berriman, The genome of the sparganosis tapeworm Spirometra issue.

erinaceieuropaei isolated from the biopsy of a migrating brain lesion, Genome [51] UniProt Consortium, UniProt: a hub for protein information, Nucleic Acids

Biol. 15 (11) (2014) 510. Res, 43, (Database issue), (2015) D204-12. Please see https://www.ncbi.nlm.

[29] X.Q. Zhu, P.K. Korhonen, H. Cai, N.D. Young, P. Nejsum, G. von nih.gov/pubmed/25348405.

Samson-Himmelstjerna, P.R. Boag, P. Tan, Q. Li, J. Min, Y. Yang, X. Wang, X. [52] P. Jones, D. Binns, H.Y. Chang, M. Fraser, W. Li, C. McAnulla, H. McWilliam, J.

Fang, R.S. Hall, A. Hofmann, P.W. Sternberg, A.R. Jex, R.B. Gasser, Genetic Maslen, A. Mitchell, G. Nuka, S. Pesseat, A.F. Quinn, A. Sangrador-Vegas, M.

blueprint of the zoonotic pathogen Toxocara canis, Nat. Commun. 6 (2015) Scheremetjew, S.Y. Yong, R. Lopez, S. Hunter, InterProScan 5: genome-scale

6145. protein function classification, Bioinformatics 30 (9) (2014) 1236–1240.

[30] E.M. Schwarz, Y. Hu, I. Antoshechkin, M.M. Miller, P.W. Sternberg, R.V. Aroian, [53] A. Mitchell, H.Y. Chang, L. Daugherty, M. Fraser, S. Hunter, R. Lopez, C.

The genome and transcriptome of the zoonotic hookworm Ancylostoma McAnulla, C. McMenamin, G. Nuka, S. Pesseat, A. Sangrador-Vegas, M.

ceylanicum identify infection-specific gene families, Nat. Genet. 47 (4) (2015) Scheremetjew, C. Rato, S.Y. Yong, A. Bateman, M. Punta, T.K. Attwood, C.J.

416–422. Sigrist, N. Redaschi, C. Rivoire, I. Xenarios, D. Kahn, D. Guyot, P. Bork, I.

[31] K. Cwiklinski, J.P. Dalton, P.J. Dufresne, J. La Course, D.J. Williams, J. Letunic, J. Gough, M. Oates, D. Haft, H. Huang, D.A. Natale, C.H. Wu, C. Orengo,

Hodgkinson, S. Paterson, The Fasciola hepatica genome: gene duplication and I. Sillitoe, H. Mi, P.D. Thomas, R.D. Finn, The InterPro protein families

polymorphism reveals adaptation to the host environment and the capacity database: the classification resource after 15 years, Nucleic Acids Res. 43

for rapid evolution, Genome Biol. 16 (2015) 71. (2015) D213–D221, Database issue.

[32] A.R. Dillman, M. Macchietto, C.F. Porter, A. Rogers, B. Williams, I. [54] R.D. Finn, P. Coggill, R.Y. Eberhardt, S.R. Eddy, J. Mistry, A.L. Mitchell, S.C.

Antoshechkin, M.M. Lee, Z. Goodwin, X. Lu, E.E. Lewis, H. Goodrich-Blair, S.P. Potter, M. Punta, M. Qureshi, A. Sangrador-Vegas, G.A. Salazar, J. Tate, A.

Stock, B.J. Adams, P.W. Sternberg, A. Mortazavi, Comparative genomics of Bateman, The Pfam protein families database: towards a more sustainable

Steinernema reveals deeply conserved gene regulatory networks, Genome future, Nucleic Acids Res. 44 (D1) (2016) D279–D285.

Biol. 16 (2015) 200. [55] M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis,

[33] K. Wasik, J. Gurtowski, X. Zhou, O.M. Ramos, M.J. Delas, G. Battistoni, O. El K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A.

Demerdash, I. Falciatori, D.B. Vizoso, A.D. Smith, P. Ladurner, L. Scharer, W.R. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, G.

McCombie, G.J. Hannon, M. Schatz, Genome and transcriptome of the Sherlock, Gene ontology: tool for the unification of biology. The Gene

regeneration-competent flatworm, Macrostomum lignano, Proc. Natl. Acad. Ontology Consortium, Nat. Genet. 25 (1) (2000) 25–29.

Sci. U. S. A. 112 (40) (2015) 12462–12467. [56] R. Petryszak, M. Keays, Y.A. Tang, N.A. Fonseca, E. Barrera, T. Burdett, A.

[34] V.L. Hunt, I.J. Tsai, A. Coghlan, A.J. Reid, N. Holroyd, B.J. Foth, A. Tracey, J.A. Fullgrabe, A.M. Fuentes, S. Jupp, S. Koskinen, O. Mannion, L. Huerta, K. Megy,

Cotton, E.J. Stanley, H. Beasley, H.M. Bennett, K. Brooks, B. Harsha, R. Kajitani, C. Snow, E. Williams, M. Barzine, E. Hastings, H. Weisser, J. Wright, P. Jaiswal,

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem

Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005

G Model

MOLBIO-11030; No. of Pages 9 ARTICLE IN PRESS

K.L. Howe et al. / Molecular & Biochemical Parasitology xxx (2016) xxx–xxx 9

W. Huber, J. Choudhary, H.E. Parkinson, A. Brazma, Expression Atlas [71] I.M. Wallace, O. O’Sullivan, D.G. Higgins, C. Notredame, M-Coffee: combining

update—an integrated database of gene and protein expression in humans, multiple methods with T-Coffee, Nucleic Acids Res. 34

animals and plants, Nucleic Acids Res. 44 (D1) (2016) D746–G752. (6) (2006) 1692–1699.

[57] A. Dobin, C.A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. [72] K. Katoh, D.M. Standley, MAFFT multiple sequence alignment software

Chaisson, T.R. Gingeras, STAR: ultrafast universal RNA-seq aligner, version 7: improvements in performance and usability, Mol. Biol. Evol. 30 (4)

Bioinformatics 29 (1) (2013) 15–21. (2013) 772–780.

[58] Y.J. Choi, E. Ghedin, M. Berriman, J. McQuillan, N. Holroyd, G.F. Mayhew, B.M. [73] A. Yates, W. Akanni, M.R. Amode, D. Barrell, K. Billis, D. Carvalho-Silva, C.

Christensen, M.L. Michalski, A deep sequencing approach to comparatively Cummins, P. Clapham, S. Fitzgerald, L. Gil, C.G. Giron, L. Gordon, T. Hourlier,

analyze the transcriptome of lifecycle stages of the filarial worm, Brugia S.E. Hunt, S.H. Janacek, N. Johnson, T. Juettemann, S. Keenan, I. Lavidas, F.J.

malayi, PLoS Negl. Trop. Dis. 5 (12) (2011) e1409. Martin, T. Maurel, W. McLaren, D.N. Murphy, R. Nag, M. Nuhn, A. Parker, M.

[59] C. Ballesteros, L. Tritten, M. O’Neill, E. Burkman, W.I. Zaky, J. Xia, A. Moorhead, Patricio, M. Pignatelli, M. Rahtz, H.S. Riat, D. Sheppard, K. Taylor, A. Thormann,

S.A. Williams, T.G. Geary, The effect of In vitro cultivation on the A. Vullo, S.P. Wilder, A. Zadissa, E. Birney, J. Harrow, M. Muffato, E. Perry, M.

transcriptome of adult brugia malayi, PLoS Negl. Trop. Dis. 10 (1) (2016) Ruffier, G. Spudich, S.J. Trevanion, F. Cunningham, B.L. Aken, D.R. Zerbino, P.

e0004311. Flicek, Ensembl 2016, Nucleic Acids Res. 44 (D1) (2016) D710–D716.

[60] J.A. Cotton, S. Bennuru, A. Grote, B. Harsha, A. Tracey, R. Beech, S.R. Doyle, M. [74] A. Stabenau, G. McVicker, C. Melsopp, G. Proctor, M. Clamp, E. Birney, The

Dunn, J.C. Hotopp, N. Holroyd, T. Kikuchi, O. Lambert, A. Mhashilkar, P. Ensembl core software libraries, Genome Res. 14 (5) (2004) 929–933.

Mutowo, N. Nursimulu, J.M. Ribeiro, M.B. Rogers, E. Stanley, L.S. Swapna, I.J. [75] S.C. Potter, L. Clarke, V. Curwen, S. Keenan, E. Mongin, S.M. Searle, A. Stabenau,

Tsai, T.R. Unnasch, D. Voronin, J. Parkinson, T.B. Nutman, E. Ghedin, M. R. Storey, M. Clamp, The Ensembl analysis pipeline, Genome Res. 14 (5)

Berriman, S. Lustigman, The genome of Onchocerca volvulus, agent of river (2004) 934–941.

blindness, Nat. Microbiol. 2 (2016) 16216. [76] J. Severin, K. Beal, A.J. Vilella, S. Fitzgerald, M. Schuster, L. Gordon, A.

[61] S.N. McNulty, B.A. Rosa, P.U. Fischer, J.M. Rumsey, P. Erdmann-Gilmore, K.C. Ureta-Vidal, P. Flicek, J. Herrero, eHive: an artificial intelligence workflow

Curtis, S. Specht, R.R. Townsend, G.J. Weil, M. Mitreva, An integrated system for genomic analysis, BMC Bioinf. 11 (2010) 240.

multiomics approach to identify candidate antigens for serodiagnosis of [77] G. Parra, K. Bradnam, I. Korf, CEGMA: a pipeline to accurately annotate core

human onchocerciasis, Mol. Cell. Proteomics 14 (12) (2015) 3224–3233. genes in eukaryotic genomes, Bioinformatics 23 (9) (2007) 1061–1067.

[62] J.D. Stoltzfus, S. Minot, M. Berriman, T.J. Nolan, J.B. Lok, RNAseq analysis of the [78] F.A. Simao, R.M. Waterhouse, P. Ioannidis, E.V. Kriventseva, E.M. Zdobnov,

parasitic nematode Strongyloides stercoralis reveals divergent regulation of BUSCO: assessing genome assembly and annotation completeness with

canonical dauer pathways, PLoS Negl. Trop. Dis. 6 (10) (2012) e1854. single-copy orthologs, Bioinformatics 31 (19) (2015) 3210–3212.

[63] A.V. Protasio, I.J. Tsai, A. Babbage, S. Nichol, M. Hunt, M.A. Aslett, N. De Silva, [79] R.J. Challis, S. Kumar, K.K.K. Dasmahapatra, C.D. Jiggins, M. Blaxter, Lepbase:

G.S. Velarde, T.J. Anderson, R.C. Clark, C. Davidson, G.P. Dillon, N.E. Holroyd, the Lepidopteran genome database, bioRxiv (2016).

P.T. LoVerde, C. Lloyd, J. McQuillan, G. Oliveira, T.D. Otto, S.J. Parker-Manuel, [80] D. Smedley, S. Haider, B. Ballester, R. Holland, D. London, G. Thorisson, A.

M.A. Quail, R.A. Wilson, A. Zerlotini, D.W. Dunne, M. Berriman, A Kasprzyk, BioMart—biological queries made easy, BMC Genomics 10 (2009)

systematically improved high quality genome and transcriptome of the 22.

human blood fluke Schistosoma mansoni, PLoS Negl. Trop. Dis. 6 (1) (2012) [81] A. Krogh, B. Larsson, G. von Heijne, E.L. Sonnhammer, Predicting

e1455. transmembrane protein topology with a : application

[64] J.J. Collins 3rd, B. Wang, B.G. Lambrus, M.E. Tharp, H. Iyer, P.A. Newmark, to complete genomes, J. Mol. Biol. 305 (3) (2001) 567–580.

Adult somatic stem cells in the human parasite Schistosoma mansoni, Nature [82] W. McLaren, L. Gil, S.E. Hunt, H.S. Riat, G.R. Ritchie, A. Thormann, P. Flicek, F.

494 (7438) (2013) 476–479. Cunningham, The Ensembl variant effect predictor, Genome Biol. 17 (1)

[65] S. Leutner, K.C. Oliveira, B. Rotter, S. Beckmann, C. Buro, S. Hahnel, J.P. (2016) 122.

Kitajima, S. Verjovski-Almeida, P. Winter, C.G. Grevelding, Combinatory [83] A. Yates, K. Beal, S. Keenan, W. McLaren, M. Pignatelli, G.R. Ritchie, M. Ruffier,

microarray and SuperSAGE analyses identify pairing-dependently transcribed K. Taylor, A. Vullo, P. Flicek, The Ensembl REST API: ensembl data for any

genes in Schistosoma mansoni males, including follistatin, PLoS Negl. Trop. language, Bioinformatics 31 (1) (2015) 143–145.

Dis. 7 (11) (2013) e2532. [84] T. Crellen, F. Allan, S. David, C. Durrant, T. Huckvale, N. Holroyd, A.M. Emery,

[66] B. Wang, J.J. Collins 3rd, P.A. Newmark, Functional genomic characterization of D. Rollinson, D.M. Aanensen, M. Berriman, J.P. Webster, J.A. Cotton, Whole

neoblast-like stem cells in larval Schistosoma mansoni, Elife 2 (2013) e00768. genome resequencing of the human parasite Schistosoma mansoni reveals

[67] C.L. Valentim, D. Cioli, F.D. Chevalier, X. Cao, A.B. Taylor, S.P. Holloway, L. population history and effects of selection, Sci. Rep. 6 (2016) 20954.

Pica-Mattoccia, A. Guidi, A. Basso, I.J. Tsai, M. Berriman, C. Carvalho-Queiroz, [85] C.E. Cook, M.T. Bergman, R.D. Finn, G. Cochrane, E. Birney, R. Apweiler, The

M. Almeida, H. Aguilar, D.E. Frantz, P.J. Hart, P.T. LoVerde, T.J. Anderson, european bioinformatics institute in 2016: data growth and integration,

Genetic and molecular basis of drug resistance and species-specific drug Nucleic Acids Res. 44 (D1) (2016) D20–D26.

action in schistosome parasites, Science 342 (6164) (2013) 1385–1389. [86] B.J. Raney, T.R. Dreszer, G.P. Barber, H. Clawson, P.A. Fujita, T. Wang, N.

[68] A.J. Vilella, J. Severin, A. Ureta-Vidal, L. Heng, R. Durbin, E. Birney, Nguyen, B. Paten, A.S. Zweig, D. Karolchik, W.J. Kent, Track data hubs enable

EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic visualization of user-defined genome-wide annotations on the UCSC genome

trees in vertebrates, Genome Res. 19 (2) (2009) 327–335. browser, Bioinformatics 30 (7) (2014) 1003–1005.

[69] C. Camacho, G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, T.L. [87] A.P. Bento, A. Gaulton, A. Hersey, L.J. Bellis, J. Chambers, M. Davies, F.A. Kruger,

Madden, BLAST+: architecture and applications, BMC Bioinf. 10 (2009) 421. Y. Light, L. Mak, S. McGlinchey, M. Nowotka, G. Papadatos, R. Santos, J.P.

[70] H. Mi, S. Poudel, A. Muruganujan, J.T. Casagrande, P.D. Thomas, PANTHER Overington, The ChEMBL bioactivity database: an update, Nucleic Acids Res.

version 10: expanded protein families and functions, and analysis tools, 42 (2014) D1083–D1090, Database issue.

Nucleic Acids Res. 44 (D1) (2016) D336–D342.

Please cite this article in press as: K.L. Howe, et al., WormBase ParaSite − a comprehensive resource for helminth genomics, Mol Biochem

Parasitol (2016), http://dx.doi.org/10.1016/j.molbiopara.2016.11.005