<<

Review articles

Genomics and the biology of parasites David A. Johnston,1* Mark L. Blaxter,2 Wim M. Degrave,3 Jeremy Foster,4 Alasdair C. Ivens,5 and Sara E. Melville6

Summary Despite the advances of modern , the threat of chronic illness, disfigure- ment, or death that can result from parasitic infection still affects the majority of the world population, retarding economic development. For most parasitic diseases, current therapeutics often leave much to be desired in terms of administration regime, toxicity, or effectiveness and potential vaccines are a long way from market. Our best prospects for identifying new targets for drug, vaccine, and diagnostics development and for dissecting the biological basis of drug resis- tance, antigenic diversity, infectivity and lie in parasite analy- sis, and international mapping and discovery initiatives are under way for a variety of protozoan and helminth parasites. These are far from ideal experimental , and the influence of biological and genomic characteristics on experi- mental approaches is discussed, progress is reviewed and future prospects are examined. BioEssays 1999;21:131–147. ௠ 1999 John Wiley & Sons, Inc.

Prologue sickness, loss of productive labour and premature death, imposes a multibillion dollar restriction on the economic The need for parasite genome analysis development of the Third World.(1,2) Despite the medical and healthcare revolution of the late 20th Amongst the most severe parasitic diseases, targeted for century, billions of people still suffer from one or more tropical special attention by the World Health Organization (WHO) parasitic disease and the constant drain imposed by chronic are malaria, leishmaniasis, Chagas disease (American try- panosomiasis), and African sleeping sickness (African try- panosomiasis), caused by protozoan parasites, and schisto- somiasis (Bilharzia) and cutaneous and lymphatic filariasis, 1Department of Zoology, The Natural History Museum, London, United Kingdom. caused by metazoan helminth parasites (Table 1). These 2Institute of Cell, Animal and Population Biology, King’s Buildings, diseases are intimately linked with poverty and persist through University of Edinburgh, Edinburgh, United Kingdom. the complex intertwining of socioeconomic factors (inad- 3 Department of Biochemistry and Molecular Biology, Oswaldo Cruz equate healthcare, poor housing and sanitation, political Institute, Rio de Janeiro, Brazil. 4New England Biolabs, Beverly, MA. inertia, difficult communications, etc.) with biological factors 5Department of Biochemistry, Imperial College, London, United King- (repeat infection, drug resistance, malnutrition, immunosup- dom. pressive effects of HIV/AIDS, emergence of disease vectors 6Department of Pathology, University of Cambridge, United Kingdom. resistant to chemical control, etc.). Economic development Abbreviations: BAC, bacterial artificial ; EST, expressed itself plays a role through environmental change that favours sequence tag; FISH, fluorescent in situ hybridisation; GST, genome sequence tag; MB, megabase; ORF, open reading frame; PCR, the spread of, and human contact with, vector species, polymerase chain reaction; PFGE, pulsed field gel electrophoresis; through human migration (with immunogically naive people PRINS, primed in situ hybridisation; VSG, variable surface glycopro- moving into endemic areas and infected people spreading tein; WHO, World Health Organization; YAC, artificial chromo- disease to new localities), and through uncontrolled (shanty- some. *Correspondence to: David A. Johnston, Department of Zoology, The town) urbanisation. Natural History Museum, London, United Kingdom. E-mail: Current control measures rely heavily on pharmacologic [email protected] intervention to alleviate symptoms and reduce parasite trans- mission, coupled to vector control. However, widespread

BioEssays 21:131–147, ௠ 1999 John Wiley & Sons, Inc. BioEssays 21.2 131 132 articles Review iEsy 21.2 BioEssays TABLE 1. Medical Impact of Parasites Targeted for Genome Analysis

Vector/ People Disease Disease and intermediate infected cases/year Deaths/year At risk Main control causative host (millions) (millions) (millions) (millions) Distribution problems

Protozoan parasites African sleeping sickness Tsetse fly Unknown 0.25–0.3 0.225–0.27 Invariably 55 36 countries of sub- ● Insufficient medical surveil- (African trypanosomiasis) fatal if untreated Saharan Africa lance Trypanosoma brucei (90% of cases) ● Premature cessation of con- subspp. (kinetoplastid trol programmes protozoan) ● Drug toxicity

Chagas Disease (American Triatome bugs 16–18 6–10 million Unknown, but infection 90–100 (25% S. USA, Cent. and S. ● Zoonotic infections trypanosomiasis) suffer chronic dis- is fatal in 10–40% of of S. and Cent. America ● Blood transfusions Trypanosoma cruzi (kineto- ease chronic cases America) plastid protozoan)

Leishmaniasis Sandflies 12–15 2 Unknown 350–370 88 countries of N. ● Increase in vector popula- Leishmania sp. (13 species) Africa, Middle East, tions as malaria vector con- (kinetoplastid protozoan) and S. America trol programs are wound down ● Immunosuppressive effects of HIV ● Prolonged drug treatment/ hospitalisation required

Malaria Mosquitoes Unknown 300–500 1.5–2.7 2,500 (40% global 98 countries, mainly in ● Drug resistance in parasite Plasmodium sp. (4 species) population) sub-Saharan Africa, ● Insecticide resistance in (apicomplexan proto- also Arabia, Latin vector zoan) America, S. E. Asia ● Migration and travel

Metazoan helminth para- sites Schistosomiasis (Bilharzia) Freshwater snails 250 20 0.3–0.5 600 76 countries of Africa, ● Water development pro- Schistosoma sp. (5 species) S. and Cent. grammes (trematode platyhelminth) America, Caribbean, ● Zoonotic infections India, China and S. ● Drug resistance E. Asia ● Migration

Cutaneous and lymphatic fila- Mosquitoes and other 150 43 Unknown, but small 1,100 73 countries of sub- ● Potentially eradicable riasis arthropods (black- Saharan Africa, S. 7 species in 5 genera flies, mites, cope- and Cent. America, includingcerca Brugia, Oncho- pods) Caribbean, India, and Wuchereria China, S. E. Asia, (filarial nematodes) and the Pacific Review articles

(and improper) use of the few treatments that were previously cheap, effective, and easy to administer has resulted in the TABLE 2. Internet Information Sources for appearance and spread of resistant parasites and vectors. Parasite Genome Projects Even where drug resistance is not yet a serious problem, Parasite Genome Initiative WWW sites available treatments may require long-term drug administra- Malaria Genome Network WWW sites tion and/or hospitalisation, involve regimens that have not ● P. falciparum gene sequence tag project: http://parasite.arf. been standardised or use compounds that are subcurative, ufl.edu/malaria.html ● P. falciparum genome sequencing project: http://www.sanger. invoke severe (sometimes fatal) side effects or may not even ac.uk/Projects/P_falciparum/ be officially licensed.(3) In the longer term, vaccines will ● malaria database: http://www. wehi.edu.au/MalDB-www/who.html provide a major impact but factors such as antigenic switch- Leishmania Genome Network WWW site: http://www.ebi.ac.uk/ ing, mutation, diversity, autoimmune stimulation and poor parasites/leish.html antigenicity have limited progress and none are currently Trypanosoma brucei Genome Network WWW site: http:// parsun1.path.cam.ac.uk/ available for these six diseases. Trypanosoma cruzi Genome Network WWW site: http://www.dbbm. The development of new drugs that are both safe and fiocruz.br/genome/tcruzi/tcruzi.html effective and the identification of new vaccine antigens are Schistosoma Genome Network WWW site: http://www.nhm.ac.uk/ hosted_sites/schisto/ urgent needs. However, eukaryotic parasites are far from Filarial Genome Network WWW sites ideal experimental organisms, and the mass screening tech- ● UK: http://helios.bto.ed.ac.uk/mbx/fgn/filgen1.html ● ϳ niques available for prokaryote pathogens are rarely appli- US Mirror: http://math.smith.edu/ sawlab/fgn/filgen.html cable. Therefore, rational design strategies, based on a Coordinating sites sound understanding of how parasites develop, survive, and WHO Parasite Genome WWW site: http://www.ebi.ac.uk/parasites/ reproduce in their different hosts, of parasite-host and parasite- parasite-genome.html (includes a BLAST-server for individual parasite datasets) immune system interactions and of the factors that determine WHO Parasite Genome FTP site: ftp.ebi.ac.uk/pub/databases/para- behaviour, pathogenicity, drug resistance and antigenic varia- sites/ WHO-TDR Parasite Genome Committee: http://www.who.ch/tdr/ tion represent the only practical approach to identifying new workplan/genome.htm . Sequencing those parts of the genome that en- Parasite Genome Mailing List Archive: http://www.mailbase.ac.uk/ code this information is an urgent need, and the past few lists-p-t/parasite-genome/ years have seen the initiation of Parasite Genome Initiatives Parasite Genome Databases to facilitate the acquisition, analysis, and distribution of such In addition to being available for local installation (follow links in the data. above sites for information), the WHO-TDR Parasite Genome Databases are available, on-line, to registered users of the In this article, representatives from several of the Parasite Mapping Project Resource Centre.(122) Genome Initiatives discuss the rationale behind current ap- proaches, report on progress, and predict future develop- ments as information accumulates and research shifts from genomic data generation to postgenomic analysis. Genome projects break the mould of conventional scientific practice by of governmental and charitable support, potentially enough to being collaborative efforts, performed with a service function sequence the entire genome.(18) The Schistosoma, Brugia in mind and with the wish to disseminate data as widely and malayi (as a model filarial nematode), Trypanosoma brucei, T. rapidly as possible (much of which is inappropriate for formal cruzi and Leishmania Initiatives fall within the WHO’s Special publication). Thus, extensive reference will be made to Programme for Research and Training in Tropical Disease. information sources on the Internet. It is not intended to WHO’s resources are extremely limited and its funding is provide either a ‘‘blow-by-blow’’ account of the activities largely ‘‘pump-priming’’ to facilitate development of re- under way within each of the Parasite Genome Initiatives or sources, strategies and collaborations to support applications detailed descriptions of genome analysis techniques. For to national or international agencies for larger scale projects. these, the reader is referred to other articles in this issue, to This ‘‘pump-priming’’ is beginning to bear fruit with the the general literature, and to the review-based publications Leishmania and T. brucei Genome Initiatives recently attract- and Worldwide Web (WWW) resources generated by the ing multimillion dollar support. Initiatives (Table 2).(4–17) Briefly stated, the main aims of all the initiatives are: (1) to increase knowledge of parasite molecular biology, especially with respect to mechanisms of drug resistance, antigenic Overview of the parasite genome projects variation, and genetic diversity; (2) to identify with key Of the six Parasite Genome Projects discussed here, the cellular functions that could represent new drug targets and to Malaria was initiated with funding from the identify antigens with diagnostic and/or vaccine potential; (3) Wellcome Trust and has since attracted some US$ 25,000,000 where practical, to develop a physical map of the parasite’s

BioEssays 21.2 133 Review articles

precisely because they represent ideal ‘‘model’’ organisms. They are easy and cheap to maintain in the laboratory, they show rapid growth/development, they can be obtained in very large numbers, they are amenable to diverse experimental manipulations and analyses and hundreds of strains have been identified and characterised in detail, allowing a vast amount of information on biochemistry, behaviour, develop- ment, genetics, etc., to be accumulated to underpin genomic analyses.(22,23) Moreover, their are organised in a ‘‘conventional’’ manner. Parasites, however, belong in the ‘‘real world,’’ they often require protracted animal passage for maintenance of all, or part, of the life cycle, many cannot be cultured in vitro, restricting experimental manipulations, cer- tain stages of the life cycle may be available in extremely Figure 1. WHO Parasite Genome Initiatives involve collabo- limited quantities (for example, restricting the amounts of rations between laboratories in the Developed and Develop- material available for cDNA library construction), and their ing Worlds genomes may display unique features that complicate analy- sis. Being human pathogens, their study is also frequently subject to strict controls. Thus, what is desirable to analyse and what is practical may be radically different. In this section, genome and/or other library resources for distribution to the we discuss some of the problems that parasite genome wider research community; and (4) to curate, analyse, and analysis faces and how this influences the approaches being disseminate parasite genome data. used. A further aim of the WHO Initiatives is to equip scientists from disease-endemic countries with expertise in . Choice of organism to target for analysis The Developing World contains 70% of the world population The diseases discussed in this article are, in fact, spectra of and 24% of the scientific community, yet receives only 5% of caused by different strains, subspecies, species global research funding and contributes to 3% of indexed or, in the case of filariasis, genera of parasite that may be (19) research publications. Most of the large-scale biotechnol- separated by many millions of years of evolutionary history, ogy initiatives, including the model organism genome projects, resulting in dramatic differences in distribution, vector and were conceived as domestic initiatives by developed coun- definitive host specificity, behaviour and pathology. The choice tries and the WHO Initiatives represent one of the few of organism to study, thus, is critical. Selecting the single most opportunities available to Third World countries to collaborate pathogenic, human-infective form generates a unified data in long-term, far-reaching projects and to acquire the exper- set of immediate clinical relevance but may reveal little about tise necessary to exploit advances in biotechnology for the causes of biomedically significant variation. Moreover, for (20) national development (Fig. 1). The Malaria Genome Project parasites that require animal passage, one that can parasitize has been, to date, an exclusively ‘‘First World’’ project small laboratory mammals has a much wider potential for involving the United States, United Kingdom, and Australia study and experimental manipulation than one that requires a although the importance of involving scientists from endemic primate host, even if it is not, in itself, a human pathogen. (21) countries is recognised. Such considerations are becoming increasingly important as the postgenomic era approaches because small mammal Problems models are widely available and permit the development/ Through advances in laboratory technology, automation and analysis of transgenic parasite systems.(21) On a purely informatics, it is now possible to rapidly acquire genomic data practical level, adopting a single species may require that the for any organism considered of sufficient importance to justify life cycle (including that of the hosts) or culture system is the effort. In an ideal world, genome analysis would encom- transferred between laboratories, not always a simple task. pass karyotype analysis, chromosome mapping, physical There is also the question of strain selection; a single mapping, genetic mapping, gene discovery, and genomic reference strain produces a unified data set and allows sequencing approaches allied to ongoing informatics and selection for useful biological characteristics but may require functional analysis. such as the yeast Saccharo- establishment of life cycles/culture systems, construction of myces cerevisiae and the free-living nematode Caenorhabdi- new DNA and cDNA libraries, and extensive work to define its tis elegans have been selected for detailed genome analysis biological features and karyotype.(24) The choice of experimen-

134 BioEssays 21.2 Review articles

TABLE 3. Target Organisms for Parasite Genome Analysis

Disease Parasite selected for analysis Rationale

Protozoan parasites African sleeping sickness Trypanosoma brucei TREU 927/4 (for physical mapping) ● Life cycle can be maintained by transmission through tsetse fly and mice ● Bloodstream and insect stages can be grown in vitro ● Sexually competent, genetic crosses are possible and hybrid progeny have been cloned ● Existing high quality libraries

Chagas disease Trypanosoma cruzi CL-Brener (for all studies) ● Likely to have derived from a human infection ● Grows in mice ● Both mammalian and insect stages can be main- tained in vitro ● Shows preference for heart and muscle cells in vitro ● Susceptible to current therapeutics

Leishmaniasis Leishmania major Friedlin (for physical mapping and ● Differentiates in vitro genomic sequencing) ● Can be passaged through sandflies ● Host immune responses very well characterised ● Transfection and gene knockout systems well devel- oped

Malaria Plasmodium falciparum 3D7 (for physical mapping ● Bloodstream stage grows well and produces sexual and genomic sequencing) stages in vitro ● The reference strain for chromosome numbering ● Retains all known functions ● The parent of a genetic cross so cloned progeny available for future analysis Metazoan helminth parasites Schistosomiasis Schistosoma mansoni, Schistosoma japonicum, ● The 2 main human pathogens (research priority given (any strain) to S. mansoni) ● Transfer and establishment of life cycle is difficult ● Encompass the genetic diversity of human-infective species ● Existing data suggests that data from one species will be relevant to the other ● Comparative genome analysis is immediately pos- sible ● Allows the most widespread international collabora- tion

Cutaneous and lymphatic filariasis Brugia malayi TRS ● Amenable to laboratory passage through mosquitoes and jirds, thus providing access to all stages of the life cycle ● Phylogenetically close to Wuchereria bancrofti, the main human pathogen

tal system is inevitably a compromise, and the different integral component of genome analysis. With the notable Parasite Genome Initiatives have reached different decisions exception of the protozoan Toxoplasma (not discussed in this based on the characteristics of the individual organisms review), complexities of the life cycle mean that parasites are (Table 3). rarely amenable to true genetic analysis. No sexual cycle has been observed in either Leishmania or T. cruzi, precluding Genetic analysis any type of genetic study. In Plasmodium, experimental For model organisms such as S. cerevisiae and C. elegans, crosses are technically difficult and only two have been genetic mapping (the identification and positioning on the reported(25,26) but a crude genetic linkage map is available.(27) of duplications, deletions, and marker loci that Genetic analysis of T. brucei is possible(28) but restricted by control phenotypically identifiable genetic traits by analysis of difficulties in obtaining large numbers of hybrid offspring from recombination frequencies from genetic crosses) forms an tsetse fly infections. However, isolation and cloning of large

BioEssays 21.2 135 Review articles

TABLE 4. Key Features of the Genomes of Target Parasites

Protozoan parasites Metazoan helminth parasites

Parameter T. brucei T. cruzi L. major P. falciparum Schistosoma B. malayi

Ploidy diploid diploid diploid haploid in man diploid diploid Haploid genome size 37–40 43–50 33–34 27–30 270 100 (MB) Chromosome At least 11 pairs of 30–40, exact n ϭ 36 n ϭ 14 n ϭ 8 ZW, sex deter- n ϭ 5 XY, sex deter- number megabase house- number still to be mination mination keeping chromo- determined somes Chromosome size 1.15–5.2 MB, non- 0.45–4.0 MB, non- 0.3–3.0 MB, non- 0.6–3.4 MB, non- condensing condensing and type condensing condensing condensing condensing A:T content 50% overall, 49% 37% 82% overall, up to 66% 75% 40–50% in coding 90–95% in inter- regions, 50–60% genic regions in intergenic regions Composition 12% highly repetitive 20–40% repetitive 30% repetitive approx. 40% coding 60% repetitive 17% repetitive, no 20% middle 30% single copy methylated DNA repetitive Expressed genes 10,000 10,000 8,000–10,000 5,000–7,500 15,000–20,000 16,000

numbers of hybrids and selection of polymorphic markers (chromosome set) and so establishing the chromosome (e.g., for drug resistance, human infectivity, transmissibility) is compliment, assigning reference marker loci to individual ongoing and genetic analysis is an integral part of the T. chromosomes and determining the extent and causes of brucei genome project.(29) Some experimental crosses have genomic variation are vital first steps. Because the chromo- also been performed in schistosomes, but again technical somes of the protozoan parasites do not condense during cell difficulties preclude widespread application,(30) none have yet division, they cannot be visualized by light microscopy and been performed in Brugia. karyotype definition/analysis must be performed on chromo- somes separated by pulse field gel electrophoresis (PFGE) Influence of genome characteristics techniques. This process has proved no easy task. The characteristics of individual parasite genomes (Table 4) The simplest organization is found in P. falciparum for impose severe limitations on the types of analysis that are which the entire chromosome compliment can be resolved on possible. An immediate distinction can be drawn between the a single PFGE run.(31) Homologous chromosomes of different protozoan parasites, which have (relatively) small genomes isolates can vary in size by up to 20%,(32) with polymorphism and chromosomes that do not condense during cell division, resulting from various phenomena, including chromosome and the helminths, which possess (relatively) large genomes breakage in subtelomeric regions followed by addition of with correspondingly large, typically chromosomes telomere repeats, crossover events, interchromosomal trans- that condense during cell division. This distinction has pro- positions and amplifications of regions of the genome under found effects on the strategies that can be adopted. selective pressure;(32,33) specific amplifications, linked to drug resistance, have also been identified in central regions.(34) Genome organization in the protozoan parasites Despite this variation, both linkage and synteny (gene order) The smaller overall size of the protozoan genomes (27–50 appear to be well conserved between isolates and between megabases [MB]) means that complete analysis, including species,(35,36) suggesting that the physical map being devel- full genomic sequencing, is potentially feasible. For prokary- oped for the reference strain will be broadly applicable. A otes with small genomes (Ͻ3 MB), the full genome sequence further problem with genome analysis in P. falciparum is that, can be assembled by computer from random ‘‘shotgun’’ due to its extremely high A:T content, fragments of P. sequences. However, computational systems capable of falciparum DNA Ͼ 5 KB are generally unstable in Escherichia larger shotgun assembly projects such as those of the coli;(37) therefore, conventional bacterial mapping vectors parasitic protozoa are in their infancy and so a physical map such as cosmids and bacterial artificial chromosomes (BACs) (the defined minimal set of overlapping clones that spans the cannot be used for library construction. entire genome) is a prerequisite for genomic sequencing. The PFGE of Leishmania chromosomes reveals approxi- physical map, in turn, needs to be referenced to the karyotype mately 25 visible bands that vary in staining intensity (suggest-

136 BioEssays 21.2 Review articles ing that they contain varying numbers of unresolved chromo- chromosomes can occur either subtelomerically or within the somes) and karyotypes of different strains and species show central, gene-rich area.(10) Retroposon-like elements have dramatic polymorphisms that have led to confusion about been described from both VSG-related and other regions,(48) chromosome number.(38) However, physical linkage patterns and there is speculation that they contribute to genome revealed by hybridising probe sequences to PFGE-separated rearrangement. chromosomes from a variety of species and strains have The karyotype of T. cruzi is as polymorphic as that of T. confirmed that 36 chromosome pairs exist.(38) Variation in size brucei; 20 to 42 bands that vary in staining intensity are seen between homologous chromosomes and extensive comigra- in different strains under a variety of PFGE conditions.(49) tion of nonhomologous chromosomes results in shuffling of Chromosome homologues vary by up to 50% within and chromosome identification numbers with respect to chromo- between isolates but physical linkage and overall chromo- some size. This makes chromosome identification difficult somal organization appear to be conserved.(50) Densitometric and the side-by-side analysis of two strains of L. infantum and analysis that uses telomeric probes reveals approximately 64 one of L. major under three sets of PFGE conditions is chromosome equivalents per cell, but the definitive karyotype required to unambiguously resolve all 36 chromosome pairs remains to be determined(51) and probe hybridisation to and so confirm a map location.(39) Despite this polymorphism, PFGE-separated chromosomes is being used to identify linkage groups appear well-conserved between species, further linkage groups.(52) The genome of reference strain suggesting that chromosome structure and gene order are CL-Brener appears to be very stable,(24) possibly because T. maintained(38,40) and the karyotype is now being transferred to cruzi is an intracellular parasite and so does not possess the the genome project reference strain. Variation between ho- minichromosomes, VSGs, and associated gross genomic mologous chromosomes results from amplification/deletion in rearrangements that T. brucei requires for immune evasion. subtelomeric regions, indicating that the central core region of However, genomic rearrangements have be detected, varia- the chromosomes (which contains the coding and structural tion in gene copy number/gene family composition can sequences) is more stable.(40) The karyotype of cloned strains occur(50,53) and breaking of physical linkage groups has been appears to be very stable,(41) possibly as Leishmania lacks a observed.(50) Moreover, there are telomere-associated genes, sexual cycle and utilises polycistronic , which including some for surface antigens, and exchange occurs may select against genome rearrangement. between them.(54) have also been de- T. brucei possesses three classes of chromosome, i.e., tected.(55) mini-, intermediate, and large, that serve different func- Despite such problems, the fact that the protozoan chromo- tions.(42) The large (megabase) chromosomes, which contain somes can be (imperfectly) separated on, and isolated from, the housekeeping genes, are the main focus for genomic PFGE gels means that chromosome-by-chromosome ap- study. Polymorphism amongst these chromosomes is more proaches are possible, dividing the genome into more manage- extreme than that in Leishmania with overall DNA content able portions and simplifying the task of physical map generation. varying by up to 30% and chromosome homologues varying by up to 400%.(43) For the megabase chromosomes, chromo- Genome organization in the helminth parasites some number has been determined by hybridising probes to Because the chromosomes of Schistosoma and Brugia con- PFGE-separated chromosomes(43) and confirmed by com- dense during cell division, they are visible by light microscopy. parative mapping of karyotypically polymorphic stocks under Moreover, differences in chromosome size, shape, centro- various PFGE conditions, allowing a chromosome numbering mere position and banding pattern allow the individual chro- system to be proposed.(44) Despite the huge size polymor- mosomes to be identified, facilitating karyotype analysis(56,57) phisms, no loss of synteny has been observed,(43) indicating and allowing probes to be mapped onto chromosome spreads that variation is not due to large translocations between by fluorescent in situ hybridisation (FISH) (although currently chromosomes. No gene probes hybridise exclusively to the only for large fragment probes).(58,59) To date, approximately mini- or intermediate chromosomes, suggesting that these 100 yeast artificial chromosome (YAC) clones have been may contain only variant surface glycoprotein genes (VSGs) mapped to the schistosome karyotype by FISH to anchor and involved in antigen switching for immune avoidance, VSG orientate the future physical map(60,61) and primed in situ expression sites and related sequences.(45) Huge arrays of hybridisation (PRINS) techniques are being optimised to transcriptionally silent VSGs are also found on several mega- increase the sensitivity of chromosome mapping. (59) Such base chromosomes(46) and extensive genome rearrange- techniques are also applicable to Brugia but have not yet ments occur to bring them into trancriptionally active telo- been extensively used. meric sites.(47) Despite these rearrangements, and the size Gross chromosomal morphology is conserved in schisto- polymorphisms, the overall structure of homologous chromo- somes, although there is slight variation in centromere posi- somes appears to be highly conserved.(43) Unlike P. falcipa- tion and C-band pattern between strains, and greater varia- rum and Leishmania, size-polymorphic regions in T. brucei tion between species.(56,62) There are reports of major genome

BioEssays 21.2 137 Review articles rearrangements occurring during the schistosome life cycle,(63) helminth Initiatives have focused on gene discovery and and it has been suggested that retroviruses could transfer development of resources for future mapping efforts. host genes, including those for major histocompatibility anti- gens, to the schistosome, which then expresses them for Physical mapping (64,65) immune avoidance. Such findings are disputed by other A physical map, that enables sequences to be ordered by (66) workers and the situation remains to be finally clarified reference to their parent clone, is a prerequisite for assembly (67) although retrotransposons are known to be present. In of the protozoan genome sequencing projects and even for addition, natural, hybridisation of schistosome species is a the parasitic helminths, where genome size is likely to (68) well-documented phenomenon, and there is preliminary preclude full-scale sequencing, will provide major benefits (1) (69) evidence for gene transfer between species. What effect by identifying the minimal clone set and regions of repetitive such plasticity (if it truly exists) will have on genome analysis DNA, it minimises sequencing efforts; (2) it expedites the remains to be determined. In contrast, the Brugia genome cloning of genes identified through EST sequencing and appears to be stable, chromatin diminution (the fragmenta- facilitates isolation of syntenous (similarly positioned) homo- tion and elimination of germ-line-specific chromatin), which is logues from related species; (3) it permits analysis of gene common in nematodes and can make the production of structure/order, transcription units, regulatory elements, and representative DNA libraries difficult, does not occur and polymorphic regions; and (4) it allows interspecific compari- transposable elements have not been reported. However, son of local gene order, physical linkage, and long range Brugia is known to carry a Wolbachia-like endosymbiont synteny. Many techniques are available for physical mapping within its cells(70) and the presence of this ‘‘foreign’’ genome and are appropriate for different circumstances. may complicate analyses, necessitating a parallel, endosym- Although P. falciparum DNA is generally unstable in biont Genome Initiative. Furthermore, as with Plasmodium, traditional (bacterial) mapping vectors, it is stable in YACs(71) the relatively high A:T content of Brugia DNA, may cause and these have been used in a polymerase chain reaction problems in library construction/analysis. (PCR)-based mapping strategy. Nested pools of YAC clones The downside of possessing large chromosomes is that were prepared as PCR templates, and PCR primers specific even the smallest of the helminth chromosomes are too large for known P.falciparum sequences (from characterised genes, to be separated by PFGE; therefore, individual chromosomes microsatellites, and genome sequence tags (GSTs, short cannot be isolated. This means that chromosome-specific regions of sequence generated from randomly selected DNA analyses cannot be undertaken and that physical mapping clones) used to amplify sequences from them. A positive PCR strategies are restricted to chromosome walking techniques reaction tags any YAC pool that contains the specific se- (which are relatively inefficient) or to global shotgun ap- quence under test, and the nested pools positively identify the proaches (a massive undertaking on genomes of this size). individual YAC(s) containing that sequence without needing The resources required for full-scale analysis of the helminth to individually screen each clone for each sequence. These genomes would be comparable to, or greater than, those YACs became anchor points for map generation. End se- required by the C. elegans project, and it is unlikely that full quence was generated from the anchor YACs and used to genomic sequencing will ever be initiated (the schistosome design PCR primers for the next screening round. Consecu- genome also contains so much repetitive DNA that this is tive cycles of sequencing and PCR screening assembled a unlikely to be a worthwhile endeavour). Therefore, under series of contigs that eventually covered the whole chromo- current circumstances, the helminth Initiatives must be selec- some, generating the map and, at the same time, assigning tive, targeting specific regions of the genome for detailed defined markers to it. Other chromosomes were mapped by analysis and their initial priority has been to target coding the probe saturation approach by using GSTs from chromo- regions through gene discovery. some fragment libraries as markers. Comparative restriction mapping of YAC contigs and genomic DNA confirmed corre- spondence between map and genome. Complete maps are Progress available for chromosomes 2, 3, 4, 8, and 12,(72–76) but not all The Parasite Genome Initiatives are ongoing efforts and, are from the reference strain and so work is ongoing to ultimately will encompass as many different kinds of analysis transfer them to it. Excepting chromosomes 10 and 11, all the as are possible to perform in the individual systems. How- other chromosome maps are effectively complete, with 85% ever, in the initial phases, practical considerations have of the genome assembled, and the density of marker se- required the different Initiatives to focus on different aspects quences is now being increased by hybridisation of ex- of genome analysis. Because full genomic sequencing is a pressed sequence tags (ESTs, see Gene Discovery section stated goal of the protozoan Initiatives, priority has been below) and GSTs to YAC clones spotted onto high density given to karyotype definition (see above) and physical map- filters. Low-resolution restriction maps are also available for ping as a prelude to genomic sequencing, whereas the most chromosomes, again not all derived from the reference

138 BioEssays 21.2 Review articles strain but comparison between strains suggests correspon- Contig assembly is ongoing as a prelude to genomic sequenc- dence.(5) ing and chromosomes 3 and 4 have already been mapped For Leishmania, a cosmid fingerprinting strategy has been from cosmids.(83) In addition, 850 KB of chromosome XVI has used for physical mapping. Individual cosmid clones were cut been assembled from YACs.(52) with restriction enzymes (enzymes that cut DNA at particular For the metazoan parasites Brugia and Schistosoma, sequences), and the fragments were end labelled, separated genome size means that only low-resolution maps are pos- on polyacrylamide gels, and visualized by autoradiography. sible with current resources and efforts have concentrated on Fragment patterns were matched by computer to identify generating the resources for future studies. overlapping cosmids and assemble them into contigs. Probes For Brugia, two complimentary mapping strategies will be generated from clones at the ends of contigs were then used: (1) probing BAC library grids with 5,000 ESTs to provide hybridised to cosmid filters to link contigs, and contigs were sequenced marker loci; (2) sample-without-replacement assigned to chromosomes by hybridisation to PFGE-sepa- screening of BAC libraries with BAC end probes to assemble rated chromosomes. From a library of 9,216 cosmid clones, contigs (in this approach, BAC clones are selected at random 9,004 fingerprints have been assembled into 39 contigs, and probes generated from the ends of their inserts, these covering Ͼ91% of the genome.(77) ESTs are currently being probes are then used to screen the BAC library gridded onto placed on the physical map to provide transcriptional loci to filters and identify overlapping clones, nonhybridising clones facilitate open reading frame (ORF) analysis of future ge- are used for each successive round of screening until all of nomic sequence data. the clones have been identified).(84) The endosymbiont ge- For T. brucei, the cloned reference stock TREU 927 has a nome is also targeted for mapping.(85) smaller genome than most other stocks and its homologous For Schistosoma, a YAC library exists,(60) a BAC library is chromosomes are of more even size.(43) This is fortuitous for currently under construction, and a pilot project is under way mapping because both homologues are represented in the to produce a low-resolution map for chromosome 3 (which genomic libraries which are constructed from diploid stages has the greatest density of YAC markers assigned to it). A of the life cycle. P1 (bacteriophage cloning vectors capable of chromosome walking strategy that uses the YACs as anchor carrying inserts of 85–100 KB) and cosmid libraries are points and assembles contigs outward from them until the available and have been used to generate a complete whole chromosome is covered will be used.(86) The majority of contig/restriction map of chromosome 1(78) and large contigs the schistosome genome is composed of repetitive DNA, and anchored to the other 10 megabase chromosomes by EST it is not yet known how this will affect mapping strategies, markers (Melville, unpublished). Comparison of the map with although it is likely that end probes, rather than whole inserts, size-variant homologues in other cloned stocks indicates that will need to be used as probes to minimise false positive polymorphisms affect the entire chromosome, although dis- signals due to dispersed repetitive sequences. persed repeat sequences are confined to one specific re- Although the helminth chromosomes are too large to be gion.(10) Map generation is by hybridisation of ESTs, GSTs, isolated by PFGE, preventing production of chromosome- and clone end probes onto gridded genomic libraries, and a specific sublibraries, it has been possible to physically scrape BAC library is under construction to facilitate mapping. individual schistosome chromosomes from chromosome (59) Concurrently, ESTs are being mapped onto the karyotype to spreads by using a micromanipulator. It is, theoretically, (87) anchor and orientate the map contigs. possible to generate libraries from such material and this For T. cruzi, cosmid, BAC, and YAC libraries are avail- approach may allow more focused mapping and sequencing able(52,53,79) and have been screened with chromosomes studies to be performed in the future. isolated by PFGE to produce chromosome-specific sublibrar- ies.(80) Mapping strategies have been adapted from those Gene discovery used in the yeast genome project, with predominately chromo- With the identification of new drug and vaccine targets and some-specific cDNA clones (identified by hybridisation), rather the elucidation of mechanisms of drug resistance, antigenic than cosmids, being used as probes to screen the cosmid variation, and genetic diversity being priorities for all of the library gridded onto filters.(53,81) This approach combines map Parasite Genome Initiatives, gene discovery programmes production with gene mapping and minimises problems that form an integral part of their activities. Much of the initial work would result from repetitive DNA sequences present in has exploited the conventional EST approach,(88) in which cosmid probes. Once contigs are assembled, restriction short (100–500 base) sequences are generated from the mapping identifies the overlapping regions to minimise future ends of randomly picked cDNA clones to provide ‘‘tags’’ for sequencing efforts and provide markers to confirm the accu- that clone. These are then compared with the public se- racy of sequencing data. The vectors used for cosmid library quence databases to try to identify the gene they were construction have been specifically designed to facilitate both derived from, allowing the rapid production of a gene cata- this restriction analysis and future transfection studies.(53,82) logue for the organism. Examination of cDNA libraries gener-

BioEssays 21.2 139 Review articles

Figure 2. The impact of parasite gene discovery programmes (data to mid-September of 1998). Current totals are available on the WWW(109)

ated from different stages of the life cycle detects developmen- and so random genomic sequencing would not be an efficient tally regulated genes and generating/exploiting these way to identify new genes. Such constraints are not univer- resources has been an integral part of the Initiatives.(89–95) sally applicable. The T. brucei genome is very gene dense; Due to the limited availability of in vitro culture systems, only one has ever been found,(102) and intergenic parasites are usually recovered from laboratory hosts, with regions are very short.(103) Thus, for T. brucei, random the inherent risk of contamination of both cDNA and genomic sequencing of clones from genomic DNA libraries is as DNA libraries with host material. All trypanosome and Leish- efficient a means of gene discovery as random sequencing of mania mRNAs, and the majority of nematode mRNAs, un- cDNA clones.(103) It has further advantages in that genes dergo post-translational modification through trans-splicing of expressed at any stage of the life cycle can be identified a spliced leader (SL) sequence onto the 5’ end of tran- (including those where cDNA library construction is impos- scripts.(96,97) As the SL sequence is parasite specific, using it sible), the library is effectively ‘‘normalised,’’ so redundancy is for second-strand cDNA synthesis produces libraries that are minimised and, because intergenic regions are short, se- free from contamination with host cDNA (they are also, by quencing both ends of an insert can detect different ORFs. definition, intact at their 5’ end). Such libraries have been The P. falciparum genome is similarly gene dense (up to 40% extensively used.(92–95) A schistosome SL has also been coding),(91) and here another technique is being used. For as identified(98) but is restricted to a small proportion of cDNA yet ill-defined reasons, mung bean nuclease specifically cuts transcripts (estimated at Ͻ10%, R. Davis, personal communi- P. falciparum DNA before and after structural genes and cation). Its significance remains to be determined, and schis- within some . The resulting noncoding regions do not tosome SL libraries have not yet been examined. clone efficiently;(104,105) therefore, genomic libraries con- In conventional cDNA libraries, differences in gene tran- structed from mung bean nuclease-restricted P. falciparum scription levels and cloning bias mean that some transcripts DNA are efficient gene discovery resources(105) with similar are very abundant, others very rare,(99) and so redundancy in benefits to those in T. brucei. Moreover, manipulation of the data set, due to the same genes being ‘‘discovered’’ over digestion conditions allows preferential release of either and over again, can rapidly reduce the efficiency of EST whole genes or exons, facilitating isolation of sequences for projects. Several approaches are being explored to over- further analysis.(106) Mung bean nuclease shows a similar come such problems: (1) examining a large number of cDNA ‘‘specificity’’ on trypanosome and Leishmania DNA(107,108) but libraries, because the bias of each will differ; (2) library has not yet been exploited for gene discovery in these systems. normalisation, to reduce the compositional bias(100) (already Irrespective of strategy used, parasite gene discovery has in use in the T. cruzi initiative and shortly to be used in been hugely successful and has resulted in a logarithmic Schistosoma); (3) the use of arbitrary-primed reverse trans- increase in the catalogue of known parasite sequences (Fig. criptase (RT)-PCR to generate mini-libraries, which also 2); parasites being amongst the highest placed organisms in allows library construction from the limiting amounts of cDNA the EST league table.(109) For example, if gene discovery in available from many stages of parasite life cycles;(101) (4) Schistosoma were to have continued at the pre-EST rate, it prescreening of libraries, to remove abundant transcripts;(9,85) would have taken some 700 years to obtain the full gene and (5) random genomic DNA sequencing (see below). catalogue, whereas 5 years of the EST initiative has tagged Expressed sequence tagging was devised for efficient 15%–20% of the genes in its genome. All the EST sequences gene discovery in organisms with massive genomes in which and their parent clones are in the public domain to serve as a the coding regions represent only a small fraction of the total resource for the research community. Hundreds of clones

140 BioEssays 21.2 Review articles have been distributed for ongoing detailed characterisation genome, genomic sequencing may allow efficient gene discov- studies, and many have been used as probes to identify homolo- ery in T. cruzi as gene-rich areas, with ORFs every 3 KB, have gous genes in other species. They are also being used as defined been identified.(52) In addition, experiments indicate some probes in the physical and chromosome mapping projects. synteny between Leishmania and T. cruzi; therefore, map and genomic sequence data from the former may allow targeted analysis in the latter.(114) Full genome sequencing For the metazoan parasites Brugia and Schistosoma, As discussed above, features of the P. falciparum and T. which have massive genomes (and in Schistosoma, very high brucei genomes allow efficient gene discovery from genomic levels of repetitive DNA), full genome sequencing is probably DNA clones.(103,105) However, the entire genome sequence of neither practical, nor necessary, even if the resources were a parasite represents the complete inventory of virulence available. Instead, targeted areas of the genome will be factors, immunogens, drug targets, etc., expressed by that selected for sequencing. For Brugia, the imminent completion organism and, thus, is the ultimate resource for the research of the C. elegans genomic sequence will provide an invalu- community. Determining full genomic sequence for even a able ‘‘roadmap’’ to aid such studies.(115) small genome requires very significant investment in labour, reagents, and facilities. Funding has been secured to se- quence the entire P. falciparum genome,(18) with the workload Informatics shared by major sequencing facilities in the United States The Parasite Genome Projects are generating large amounts (TIGR and Stanford) and United Kingdom (Sanger Centre). of data of diverse forms; sequences, homology search This work is proceeding, chromosome-by-chromosome, with results, sequence clusters, library details, bibliographic, labo- shotgun sequencing of individual chromosomes purified by ratory and personnel information, clone hybridisation results, PFGE. The Sanger and Stanford groups also use shotgun network policy documents, etc., and much of these data are libraries of YACs previously localised to chromosomes by the of wide interest as it pertains to groups of organisms that are physical mapping Initiative to identify contigs from the same traditionally under-researched. The Internet provides the region and aid gap closure. To date, over 176,000 sequenc- ideal mechanism for rapid dissemination of these data; all of ing runs have been performed; pilot projects on chromo- the parasite genome networks maintain WWW servers provid- somes 2 and 3 are nearly complete(18) with chromosomes 1, ing up to date information for their projects and an email 4, and 14 well under way, and chromosomes 9, 10, 11, 12, discussion list facilitates communication. In addition, a dedi- and 13 started.(110) With large-scale genome sequencing cated site at the European Institute provides projects costing some US$ 500,000 per MB,(18) complete general information and analytical services, including a para- sequencing of the genomes of all the other parasitic protozoa site-specific BLAST server (Table 2).(16) seems unlikely in the short term. However, medium-scale For all of the Parasite Genome Initiatives, cluster analysis funding has been secured for several projects. For Leish- of the sequence data sets is an important ongoing activity. By mania, where the minimal tiled cosmid set from the physical grouping sequences that are derived from the same gene, map allows a chromosome-by-chromosome approach, fund- these analyses reveal redundancy in the datasets and identify ing is available to sequence 45% of the genome (15 MB) by the actual number of different genes tagged to date (as opposed the year 2000, with the goal of obtaining the complete to sequences generated), thus monitoring the efficiency of the genome sequence by 2002. Chromosome 1 is complete,(111) ongoing work, for example, in Brugia, with some 16,000 se- significant progress has been made for chromosome 3, and quences available, approximately one in three new sequences chromosomes 2, 4, 5, and 6 are in progress.(112) Genomic tags a previously unknown gene, whereas in Schistosoma (8,000 sequencing means that EST initiatives for Leishmania have sequences available) it is two in three.(116,117) Clustering also now been halted. For T. brucei, two projects are under way, creates consensus sequences that may facilitate gene identi- one to completely sequence chromosome 1, the other a fication and identifies highly expressed genes of unknown large-scale, random genome sequence survey contributing to function, that may be important candidates for early study. To both gene discovery and physical mapping as a prelude to facilitate data analysis, standardised clone nomenclature sequencing whole chromosomes.(113) These endeavours will systems have been devised for individual Initiatives (details produce dividends in cataloguing new genes, especially on the Network WWW sites, see Table 2) and, for Brugia,a those expressed at very low levels and so unlikely to be standardised gene nomenclature has been proposed.(118) identified through ESTs (and will be vital when redundancy Data generation is one thing but these data need to be makes EST generation inefficient), in identifying regulatory made available to the research community in a form that is elements and in elucidating genome structure and organiza- easy both to access and to navigate. To this end, as well as tion. The T. cruzi Initiative is still seeking major funding, but a their WWW sites, all the Parasite Genome Initiatives curate pilot project has virtually completed chromosome 3 and integrated, hypertext databases of genomic and related data. chromosomes 2 and 4 are under way.(52) Despite a larger These are all based on ACeDB, the database engine devised

BioEssays 21.2 141 Review articles

Figure 3. The Parasite Genome Initiatives main- tain publicly accessible databases that allow hyper- text linking of related information and graphical displays of genome data. These are built using the ACeDB database engine.(119) ACeDB has many advantages for the Parasite Genome Initiatives: (1) the database software is freeware and avail- able for Unix, Intel, and Macintosh operating sys- tems, maximising its potential user base; (2) the data structure is platform-independent, facilitating curation; (3) database organization is completely flexible, through addition of new data classes or modification of existing ones, and so can be modified as the needs of each project evolve.

for the C. elegans Genome Initiative,(119) which is specifically Parasite Genome Initiatives exist for a purpose, i.e., to designed for the storage, analysis, and retrieval of genome identify new drug, vaccine, and diagnostics targets; to deter- data, with graphical displays to facilitate interpretation of mine the molecular basis of parasite function; and to provide specific data types such as sequences and maps (Fig. 3). resources to facilitate analysis. These resources are in the All the parasite genome databases are available for local public domain and ‘‘post genomic’’ investigations that exploit installation(120,121) and the WHO parasite databases are also them for functional analysis are already being initiated. mounted at the Human Genome Mapping Project Resource Database homology searches have identified hundreds of Centre (access requires registration).(122) WWW interfaces to ESTs/GSTs as being of potential interest, including potential the databases are under development to facilitate Internet immuno-modulatory , signal transduction molecules, access and a CD-ROM version is being produced for distribu- cell cycle regulators, key metabolic enzymes, etc., and these tion to laboratories in developing countries where Internet have been distributed for detailed characterisation studies. access is insufficient to permit downloading. For vaccine development, as well as work to identify and characterise new candidate antigens, attention is increasing Prospects being focused on the use of novel delivery systems such as Despite the limitations of the biological systems involved, the multiantigen cocktails, DNA vaccines, or live viral vectors, or Parasite Genome Initiatives have already proved to be on the modulation of parasite-encoded immunomodulatory extremely effective mechanisms for generating a wealth of proteins.(123–130) fundamental information about parasites. The Initiatives are The majority of newly tagged genes from the gene discov- ongoing and future work will both enhance existing datasets ery programmes (55%–85%) and of the ORFs identified by and generate new ones as mapping and genome sequencing genomic sequencing have no recognisable homology with efforts develop. In addition, the coming years are likely to see characterised genes on the public databases(9,89–95,103,105) huge improvements in technology that will transform molecu- and a major challenge for the future will be to develop lar biology from an analytical science into a discipline of effective, user-friendly bioinformatics systems capable of bioinformatics, structure/function analysis and in vivo manipu- assigning function to them or of identifying potential functional lation. In this final section, we examine how the Parasite motifs within them. Furthermore, with the advent of major Genome Initiatives might develop and where their data may genome sequencing efforts comes the need to identify the lead us. location of genes within the sequence. Gene prediction

142 BioEssays 21.2 Review articles algorithms already exist for Plasmodium and C. elegans but Not surprisingly, a large proportion of Brugia ESTs have a C. will need to be trained to work for other species. elegans gene as their closest homologue. Frequently, the Early identification of gene function is useful but by no latter has been fully sequenced, facilitating functional identifi- means essential, and the mass screening of pooled, unchar- cation of the Brugia version. The evolutionary distance acterised ESTs for their and/or DNA vaccine potential between the two nematodes is such that shared sequence is also being considered. Similarly, the development of motifs are likely to have functional significance and could standardised phenotypic assays, suitable for mass screening represent targets for broad spectrum nematocides. Compari- of cultured protozoan parasites with novel drug compounds is son in the opposite direction is also of value with Brugia ESTs, being addressed. confirming the existence of hypothetical C. elegans genes ESTs derived from different life cycle stages already that have been predicted from genomic sequence.(115) Syn- provide basic information on the ontogeny of gene expression teny between the two species is also being investigated as a (with the caveat of post-transcriptional control), but the possible route to targeted gene discovery; if a potentially absence of a gene from a particular EST dataset does not interesting gene in C. elegans is flanked by two housekeep- necessarily mean that the gene is not expressed at that ing genes, and gene order is conserved between the two stage, it may just not have been picked from the relevant species, the housekeeping genes can be used as markers for library. As the datasets grow, analysis of gene expression rapid isolation of the Brugia homologue. Similarly, for the patterns will be achieved far more effectively through the use trypanosomes and Leishmania, it may be possible to map of DNA chip technology, where the known gene set is gridded genes whose location is known in one species onto the and screened with cDNA from different developmental stages, physical map of the other species. from parasites with different biological properties, and from Perhaps the most exciting prospects are for gene analysis parasites subjected to different environmental stresses (drug by in vivo genetic manipulation (knockouts, transgenics, etc.), challenge, cytokine stimulation, temperature shock, etc.) to both for mass screening assays to identify functionally impor- identify genes that are induced, repressed, or developmen- tant genes and for analysis of individual genes. Transfection tally regulated. Production of gene arrays is high on the systems already exist for a number of parasitic proto- agenda of all of the Parasite Genome Initiatives, although the zoa,(82,131) and recent improvements in reporter gene technol- expressed gene content of these organisms means that, at ogy will facilitate gene expression studies on these organ- least in the short term, such arrays are likely to be generated isms.(132) Schistosome transfection systems are under ‘‘in-house,’’ by gridding the known gene set onto filters, rather development,(86) and for Brugia it may even be possible to than by using state of the art, but costly, commercial technolo- use its endosymbiont as a vehicle for gene delivery. With the gies that synthesise gene sequences directly on glass slides exception of Plasmodium, in which the human infective stage through a combination of oligonucleotide synthesis and pho- is haploid, all the parasites are diploid and this complicates tolithography techniques. the production of ‘‘knockout’’ lines as both copies of the has enormous potential to identify targeted gene must be inactivated. For Leishmania, chromo- gene homologues in related organisms, to provide informa- some fragmentation techniques aimed at generating a com- tion about functional diversity (e.g., across the filarial nema- prehensive panel of strains in which known regions of the todes)(85) or to assist in the development of experimental genome are rendered haploid, and other knockout strategies, models (e.g., by using data from P. falciparum as a template are scheduled for investigation. Functional analysis in well- for rapid analysis of rodent malarias in which parasite gene defined heterologous systems will also prove vital. It is knockout/transfection can more readily be studied).(21) Se- already possible to express genes from parasitic nematodes quences that match only sequences from related organisms in C. elegans(133) and to rescue S. cerevisiae mutants with or have no database match may also represent specific malaria genes.(134) An international collaboration is currently phylogenetic or pathologic adaptations and, thus, be possible aiming to produce a panel of knockouts for every single S. targets for intervention. Some of the most exciting compara- cerevisiae gene (some 6,000) in each of four genetic back- tive analysis are between Brugia and C. elegans. The latter, a grounds(135) which would provide a major resource for func- free-living nematode, has been studied in exquisite detail; a tional analysis. Adaptation of such technologies to those near complete physical map and full genome sequence will parasitic protozoa that can be cultured in vitro may ultimately be available by 1999, the full cell lineage is known, anatomy be possible (T. brucei, T. cruzi, and some Leishmania species and neural connections have been reconstructed from serial (but not L. major) can be grown in axenic culture, allowing the sections, and thousands of morphologic, developmental, and production of large numbers of parasites for screening as- behavioural mutants have been genetically mapped.(23) Tech- says, culture of Plasmodium requires addition of mammalian niques are available to analyse gene expression and function erythrocytes). Expression of malaria genes in Toxoplasma in C. elegans and many may be transferable to Brugia.(115) may also be possible.

BioEssays 21.2 143 Review articles

Proteomics, the study of gene expression through the sequencing centres where the huge economies of scale databasing and preliminary characterisation of its protein necessary for cost-effective genome sequencing are pos- products, is a major new research field that complements sible, and toward more insular research where postgenomic more traditional aspects of genomics.(136) Two-dimensional studies with commercial potential are concerned. The chal- gel electrophoresis is used to separate proteins from extracts lenge will be to keep the dialogue going and to ensure that the of cells, tissues, organs, or whole organisms and to produce a need to fund does not jeopardise progress database of their isoelectric point, molecular weight, and in the genomic studies from which they derive. relative abundance. Individual protein spots (as little as 1 picomole) are then excised from the gel and analysed by Concluding remarks N-terminal peptide sequencing or automated ‘‘time of flight’’ Enshrined in the mandate of the WHO is the right of each and mass spectroscopy. Comparative analysis of protein profiles every person, including the world’s underprivileged, to good obtained from different sources (life cycle stages, experimen- health and the successful control of tropical diseases is tal conditions, etc.) directly reveals changes in protein expres- essential to achieve this goal. In the past few years, collabora- sion levels. The amino acid sequence tag data from these tions between institutes in the Developed and Developing analyses is then compared with the DNA or protein sequence Worlds have clearly demonstrated the potential contribution databases to identify known proteins. For unknown proteins, that Parasite Genome Initiatives can make to this goal by amino acid sequence is used to design oligonucleotide identifying new targets for the development of diagnostic probes to isolate the genomic or cDNA clones corresponding tests, drugs, and vaccines, and by providing the biological to that protein. Without doubt, the Parasite Genome Initiatives and informatics resources for their rapid characterisation and will shortly commence proteomic studies, and the integration analysis. Moreover, progress has been achieved on a remark- of proteomic data with existing genomic databases will be an ably small budget. Parasitology is a historically underserved important task. The potential is already clear; comparison of a research discipline (per fatality, malaria gets 2% of the peptide sequence from the filarial nematode Dirofilaria immi- research funding that HIV/Aids receives), significant analysis tis with the Brugia EST database has been used to identify of any genome is a massive, labour intensive and costly Brugia cDNA clones for the homologous gene, and these exercise, and there is much competition for limited funds. The then were used to isolate the Dirofilaria equivalent (Blaxter, Parasite Genome community has successfully established unpublished). collaborative links and proved the feasibility of its ap- On the purely technical side, optical restriction mapping proaches. What is needed now is the will, on the part of techniques that use light microscopy to size and position funding agencies, to dedicate the funds to allow full-scale chromosome fragments may prove of benefit in the parasitic genomic and postgenomic analyses to occur.(138) protozoa, in analysing size polymorphism in homologous chromosomes and identifying markers for distinguish comi- Acknowledgments: (137) grating, nonhomologous chromosomes. We thank colleagues in the Parasite Genome Networks for In the excitement of product-oriented analysis, we should sharing data. This paper is dedicated to the memory of Prof. not forget that parasites are fascinating organisms in their Mette Strand, coordinator of the WHO Schistosoma Genome own right, providing systems for the study of differentiation, Network from 1994 to 1997. development, gene expression, and evolution. For example, platyhelminthes are traditionally regarded as the earliest diverging bilateral metazoans, and schistosomes are unique References amongst them in possessing separate sexes, this being 1. Pinto Dias JC. Epidemiology of Chagas disease. In: Wendel S, Brener Z, Camargo ME, Rassi A, editors. Chagas disease - American Trypanoso- determined through heteromorphic sex chromosomes. Were miasis: its impact on transfusion and clinical medicine), Sao Paulo, resources available to map and sequence these chromo- Brazil: ISBT Brazil; 1992. p 49–80. somes and compare them with the chromosomes of hermaph- 2. http://www.malaria.org/MAC.HTM 3. Cook GC. Adverse effects of chemotherapeutic agents used in tropical rodite relatives and to the sex chromosomes of other organ- medicine. Drug Saf 1995;13:31–45. isms, fascinating and fundamental biological questions could 4. Foster J, Thompson J. The Plasmodium falciparum genome project: a resource for researchers. Parasitol Today 1995;11:1–4. be addressed. 5. Dame JB, Arnot DE, Bourke PF, Chakrabarti D, Christodoulou Z, Coppel To date, much of the output of the Parasite Genome RL, Cowman AF, Craig AG, Fischer K, Foster J, Goodman N, Hinterberg Initiatives has been generated by individual researchers K, Holder AA, Holt D, Kemp DJ, Lanzer M, Lim A, Newbold CI, Ravetch JV, Reddy GR, Rubio J, Schuster SM, Su X-Z, Thompson JK, Vital F, working together in international collaborations. This ap- Wellems TE, Werner EB. Current status of the Plasmodium falciparum proach has done much to stimulate communication within the genome project. Mol Biochem Parsitol 1996;79:1–12. research community and to provide opportunities for the 6. Blackwell JM. Parasite genome analysis: progress in the Leishmania genome project. Trans R Soc Trop Med Hyg 1997;91:107–110. training of scientists from disease endemic countries. As the 7. Ivens AC, Blackwell JM. Unravelling the Leishmania genome. Curr Opin Initiatives develop, there is likely to be a shift toward the major Genet Dev 1996;6:704–710.

144 BioEssays 21.2 Review articles

8. Degrave W, Levin MJ, da Silveira, JF, Morel CM. Parasite genome 34. Watanabe J, Inselburg J. Establishing a physical map of chromosome projects and the Trypanosoma cruzi genome initiative. Mem Inst Os- no-4 of Plasmodium falciparum. Mol Biochem Parasitol 1994;65:189– waldo Cruz 1997 ;92:859–862. 199. 9. Zingales B, Rondinelli E, Degrave W, daSilveira JF, Levin M, LePaslier D, 35. Janse CJ, Carlton JMR, Walliker D, Waters AP. Conserved location of Modabber F, Dobrokhotov B, Swindle J, Kelly JM, Aslund L, Hoheisel JD, genes on polymorphic chromosomes of four species of malaria para- Ruiz AM, Cazzulo JJ, Pettersson U, Frasch AC. The Trypanosoma cruzi sites. Mol Biochem Parasitol 1994;68:285–296. genome initiative. Parasitol Today 1997;13:16–22. 36. Carlton JMR, Vinkenoog R, Waters AP, Waliker D. Gene synteny in 10. Melville SE. Parasite genome analysis. Genome research in Trypano- species of Plasmodium. Mol Biochem Parasitol 1998;93:285–294. soma brucei: chromosome size polymorphism and its relevance to 37. Weber JL. Molecular biology of malaria parasites. Exp Parasitol 1988;66: genome mapping and analysis. Trans R Soc Trop Med Hyg 1997;91:116– 143–170. 120. 38. Wincker P, Ravel C, Blaineau C, Pages M, Jauffret Y, Dedet JP, Bastien P 11. Melville SE. The African trypanosome genome project: focus on the The Leishmania genome comprises 36 chromosomes conserved across future. Parasitol Today 1998;14:129–131. widely divergent human pathogenic species. Nucleic Acids Res 1996;24: 12. Melville SE, Majiwa P, Donelson J. Resources available from the African 1688–1694. trypanosome genome project. Parasitol Today 1998;14:3–4. 39. Wincker P, Ravel C, Britto C, Dubessay P, Bastien P, Pages M, Blaineau 13. Johnston DA. The WHO/UNDP/World Bank Schistosoma genome initia- C. A direct method for the chromosomal assignment of DNA markers in tive: current status. Parasitol Today 1997;13:45–46. Leishmania. Gene 1997;194:77–80. 14. Reddy GR. Determining the sequence of parasite DNA. Parasitol Today 40. Ravel C, Macari F, Bastien P, Pages M, Blaineau C. Conservation among 1995;11:37–42. old world Leishmania species of six physical linkage groups defined in 15. Blaxter M. The filarial genome network. Parasitol Today 1995;11:441– Leishmania infantum small chromosomes. Mol Biochem Parasitol 1995; 442. 69:1–8. 16. Blaxter M, Aslett M. Internet resources for the parasite genome projects. 41. Bastien P, Blaineau C, Pages M. Molecular karyotype analysis in Trends Genet 1997;13:40–41. Leishmania. Subcell Biochem 1992;18:131–187. 17. Ivens AC, Smith DF. Parasite genome analysis. A global map of the 42. Myler PJ. Molecular variation in trypanosomes. Acta Trop 1993;53:205– Leishmania major genome: prelude to genomic sequencing. Trans R Soc 225. Trop Med Hyg 1997;91:111–115. 43. Melville SE, Leech V, Gerrard CS, Tait A, Blackwell JM. The molecular 18. Butler D. Funding assured for international malaria sequencing project. karyotype of the megabase chromosomes of Trypanosoma brucei and Nature 1997;388:701. the assignment of molecular markers. Mol Biochem Parasitol 1998;94: 19. Gibbs WW. Lost science in the Third World. Sci Am 1995;273:76–83. 155–173. 20. Pena SD. Third World participation in genome projects. Trends Biotech- 44. Turner CMR, Melville SE, Tait AA. Proposal for karyotype nomenclature in nol 1996;14:74–77. T brucei. Parasitol Today 1997;13:5–6. 21. http://www.malaria.org/FOCUS.HTM 45. Weiden M, Osheim YN, Beyer AL, Van der Ploeg LH. Chromosome 22. Strathern JN, Jones EW, Broach JR, editors. The molecular biology of the structure: DNA sequence elements of a subset of the yeast Saccharomyces: life cycle and inheritance. Cold Spring Harbor, minichromosomes of the protozoan Trypanosoma brucei. Mol Cell Biol New York: CSHL Press; 1981. 1991;11:3823–3834. 23. Riddle D, Blumenthal T, Meyer B, Priess J, editors. C. elegans II. Cold 46. Van der Ploeg LH, Valerio D, De Lange T, Bernards A, Borst P, Grosveld Spring Harbor, New York: CSHL Press; 1997. FG. An analysis of cosmid clones of nuclear DNA from Trypanosoma 24. Zingales B, Pereira ME, Oliveira RP, Almeida KA, Umezawa ES, Souto RP, brucei shows that the genes for variant surface glycoproteins are Vargas N, Cano MI, da Silveira JF, Nehme NS, Morel CM, Brener Z, clustered in the genome. Nucleic Acids Res 1982;10:5905–5923. Macedo A. Trypanosoma cruzi genome project: biological characteris- 47. Cross GA, Wirtz LE, Navarro M. Regulation of vsg expression site tics and molecular typing of clone CL Brener. Acta Trop 1997;68:159– transcription and switching in Trypanosoma brucei. Mol Biochem Parasi- 173. tol 1998;91:77–91. 25. Walliker D, Quakyi IA, Wellems TE, McCutchan TF, Szarfman A, London 48. Beals TP, Boothroyd JC. Genomic organization and context of a trypano- WT, Corcoran LM, Burkot TR, Carter R. Genetic analysis of the human malaria parasite Plasmodium falciparum. Science 1987;236:1661–1666. some variant surface glycoprotein gene family. J Mol Biol 1992;225:961– 26. Wellems TE, Panton LJ, Gluzman IY, do Rosario VE, Gwadz RW, 971. Walker-Jonah A, Krogstad DJ. Chloroquine resistance not linked to 49. Santos MR, Cano MI, Schijman A, Lorenzi H, Vazquez M, Levin MJ, mdr-like genes in a Plasmodium falciparum cross. Nature 1990;345:253– Ramirez JL, Brandao A, Degrave WM, Franco da Silveira J. The 255. Trypanosoma cruzi genome project: nuclear karyotype and gene map- 27. Walker-Jonah A, Dolan SA, Gwadz RW, Panton LJ, Wellems TE. An RFLP ping of clone CL Brener. Mem Inst Oswaldo Cruz 1997;92:821–828. map of the Plasmodium falciparum genome, recombination rate and 50. Henriksson J, Porcel B, Rydaker M, Ruiz A, Sabaj V, Galanti N, Cazzulo favoured linkage groups in a genetic cross. Mol Biochem Parasitol JJ, Frasch AC, Pettersson U. Chromosome specific markers reveal 1992;51:313–320. conserved linkage groups in spite of extensive chromosomal size 28. Gibson WC. The significance of genetic exchange in trypanosomes. variation in Trypanosoma cruzi. Mol Biochem Parasitol 1995;73:63–74. Parasitol Today 1995;11:465–468. 51. Cano MI, Gruber A, Vazquez M, Cortes A, Levin MJ, Gonzalez A, 29. http://www.gla.ac.uk/ Acad/WUMP/wumprtai.htm Degrave W, Rondinelli E, Zingales B, Ramirez JL Alonso C, Requena JM, 30. Pica-Mattoccia L, Dias LC, Moroni R, Cioli D. Schistosoma mansoni: Franco da Silveira J. Molecular karyotype of clone CL Brener chosen for genetic complementation analysis shows that two independent hycan- the Trypanosoma cruzi genome project. Mol Biochem Parasitol 1995;71: thone/oxamniquine-resistant strains are mutated in the same gene. Exp 273–278. Parasitol 1993;77:445–449. 52. http://www.dbbm.fiocruz.br/genome/tcruzi/meet97.html 31. Wellems TE, Walliker D, Smith CL, do Rosario VE, Maloy WL, Howard RJ, 53. Hanke J, Sanchez DO, Henriksson J, Aslund L, Pettersson U, Frasch AC, Carter R, McCutchan TF. A histidine-rich protein gene marks a linkage Hoheisel JD. Mapping the Trypanosoma cruzi genome: analyses of group favoured strongly in a genetic cross of Plasmodium falciparum. representative cosmid libraries. Biotechniques 1996;21:686–688. Cell 1987;49:633–642. 54. Ruef BJ, Dawson BD, Tewari D, Fouts DL, Manning JE. Expression and 32. Lanzer M, de Bruin D, Wertheimer SP, Ravetch JV. Organization of evolution of members of the Trypanosoma cruzi trypomastigote surface chromosomes in Plasmodium falciparum: a model for generating karyo- antigen multigene family. Mol Biochem Parasitol 1994;63:109–120. typic diversity. Parasitol Today 1994;10:114–117. 55. Martin F, Maranon C, Olivares M, Alonso C, Lopez, MC. Characterization 33. Lanzer M, Fischer K, Le Blancq SM. Parasitism and chromosome of a non-long terminal repeat cDNA (L1Tc) from Trypano- dynamics in protozoan parasites: is there a connection? Mol Biochem soma cruzi: homology of the first ORF with the ape family of DNA repair Parasitol 1995;70:1–8. enzymes. J Mol Biol 1995;247:49–59.

BioEssays 21.2 145 Review articles

56. Short RB, Liberatos JD, Teehan WH, Bruce JI. Conventional Giemsa- Cohen D, Le Paslier D, Levin MJ. Towards the physical map of the stained and C-banded chromosomes of seven strains of Schistosoma Trypanosoma cruzi nuclear genome: construction of YAC and BAC mansoni. J Parasitol 1989;75:920–926. libraries of the reference clone T cruzi CL-Brener. Mem Inst Oswaldo 57. Sakaguchi Y, Tada I, Ash LR, Aoki Y. Karyotypes of Brugia pahangi and Cruz 1997;92:843–852. Brugia malayi (Nematoda: Filarioidea). J Parasitol 1983;69:1090–1093. 80. Frohme M, Hanke J, Aslund L, Pettersson U, Hoheisel JD. Selective 58. Hirai H, LoVerde P. FISH techniques for constructing physical maps on generation of chromosomal cosmid libraries within the Trypanosoma schistosome chromosomes. Parasitol Today 1995;11:310–314. cruzi genome project. Electrophoresis 1998;19:478–481. 59. http://www.nhm.ac.uk/hosted_sites/schisto/network/meetings/96meet.h- 81. Hoheisel JD, Maier E, Mott R, McCarthy L, Grigoriev AV, Schalkwyk LC, tml Nizetic D, Francis F, Lehrach H. High resolution cosmid and P1 maps 60. Tanaka M, Hirai H, LoVerde PT, Nagafuchi S, Franco GR, Simpson AJ, spanning the 14 Mb genome of the fission yeast S. pombe. Cell Pena SD. Yeast artificial chromosome (YAC)-based genome mapping of 1993;9:109–120. Schistosoma mansoni. Mol Biochem Parasitol 1995;69:41–51. 82. Kelly JM, Das P, Tomas AM. An approach to functional complementation 61. http://www.nhm.ac.uk/hosted_sites/schisto/mapping/FISHmaps.html by introduction of large DNA fragments into Trypanosoma cruzi and 62. Grossman AI, Short RB, Kuntz RE. Somatic chromosomes of Schisto- Leishmania donovani using a cosmid shuttle vector. Mol Biochem soma rodhaini, S mattheei, and S intercalatum. J Parasitol 1981;67:41– Parasitol 1994;65:51–62. 44. 83. Hanke J, Frohme M, Laurent JP, Swindle J, Hoheisel JD. Hybridization 63. Nara T, Iwamura Y, Tanaka M, Irie Y, Yasuraoka K. Dynamic changes of mapping of Trypanosoma cruzi chromosomes III and IV. Electrophoresis DNA sequences in Schistosoma mansoni in the course of development. 1998;19:482–485. Parasitology 1990;100:241–245. 84. http://helios.bto.ed.ac.uk/mbx/fgn/mapping.html 64. Iwamura Y, Yonekawa H, Irie Y. Detection of host DNA sequences 85. http://helios.bto.ed.ac.uk/mbx/fgn/pubs/1998report.html including the H-2 locus of the major histocompatibility complex in 86. http://www.nhm.ac.uk/hosted_sites/schisto/network/meetings/98meet.h- schistosomes. Parasitology 1995;110:163–170. tml 65. Irie Y, Iwamura Y. Host-related DNA sequences are localized in the body 87. Brown SDM, Carey AH. Chromosome dissection and cloning. In: Gosden of schistosome adults. Parasitology 1993;107:519–528. JR, editor. Chromosome analysis protocols. New Jersey: Humana; 1994. 66. Clough KA, Drew AC, Brindley PJ. Host-like sequences in the schisto- p 425–436. some genome. Parasitol Today 1996;12:283–286. 88. Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, 67. Drew AC, Brindley PJ. A retrotransposon of the non-long terminal repeat Xiao H, Merril CR, Wu A, Olde B, Moreno RF, Kerlavage AR, McCombie class from the human blood fluke Schistosoma mansoni. Similarities to WR and Venter JC. Complementary DNA sequencing: expressed se- the chicken-repeat-1-like elements of vertebrates. Mol Biol Evol 1997;14: quence tags and . Science 1991;252:1651–1656. 602–610. 89. Franco GR, Adams MD, Soares MB, Simpson AJ, Venter JC, Pena SD. 68. Southgate VR, Rollinson D. Natural history of transmission and schisto- Identification of new Schistosoma mansoni genes by the EST strategy some interactions In: Rollinson D, Simpson AJG, editors. The biology of using a directional cDNA library. Gene 1995;152:141–147. schistosomes: from genes to latrines. London: Academic Press; 1987. p 90. Franco GR, Rabelo EM, Azevedo V, Pena HB, Ortega JM, Santos TM, 347–378. Meira WS, Rodrigues NA, Dias CM, Harrop R, Wilson A, Saber M, 69. Bremond P, Sellin B, Sellin E, Nameoua B, Labbo R, Theron A, Combes Abdel-Hamid H, Faria MS, Margutti ME, Parra JC, Pena SD. Evaluation of C. Arguments for the modification of the genome introgression of the cDNA libraries from different developmental stages of Schistosoma human parasite Schistosoma haematobium by genes from S. bovis,in mansoni for production of expressed sequence tags (ESTs). DNA Res Niger. C R Acad Sci III 1993;316:667–670. 1997;30:231–240. 70. Sironi M, Bandi C, Sacchi L, Di Sacco B, Damiani G, Genchi C. Molecular 91. Chakrabarti D, Reddy GR, Dame JB, Almira EC, Laipis PJ, Ferl RJ, Yang evidence for a close relative of the arthropod endosymbiont Wolbachia in TP, Rowe TC, Schuster SM. Analysis of expressed sequence tags from a filarial worm. Mol Biochem Parasitol 1995;74:223–227. Plasmodium falciparum. Mol Biochem Parasitol 1994;66:97–104. 71. de Bruin D, Lanzer M, Ravetch JV. Characterization of yeast artificial 92. el-Sayed NM, Alarcon CM, Beck JC, Sheffield VC, Donelson JE. cDNA chromosomes from Plasmodium falciparum: construction of a stable expressed sequence tags of Trypanosoma brucei rhodesiense provide representative library and cloning telomeric DNA fragments. Genomics new insights into the biology of the parasite. Mol Biochem Parasitol 1992;14:332–339. 1995;73:75–90. 72. Lanzer M, de Bruin D Ravetch JV. Transcription differences in polymor- 93. Brandao A, Urmenyi T, Rondinelli E, Gonzalez A, de Miranda AB, phic and conserved domains of a complete cloned P falciparum Degrave W. Identification of transcribed sequences (ESTs) in the Trypano- chromosome. Nature 1993;361:654–657. soma cruzi genome project. Mem Inst Oswaldo Cruz 1997;92:863–866. 73. Thompson JK, Cowman AF. A YAC contig and high resolution restriction 94. Levick MP, Blackwell JM, Connor V, Coulson RM, Miles A, Smith HE, Wan map of chromosome 3 from Plasmodium falciparum. Mol Biochem KL, Ajioka JW. An expressed sequence tag analysis of a full-length, Parasitol 1997;90:537–542. spliced-leader cDNA library from Leishmania major promastigotes. Mol 74. Rubio JP, Triglia T, Kemp DJ, de Bruin D, Ravetch JV, Cowman AF. A YAC Biochem Parasitol 1996;76:345–348. contig map of Plasmodium falciparum chromosome 4: characterization 95. Blaxter ML, Raghavan N, Ghosh I, Guiliano D, Lu W, Williams SA, Slatko of a DNA amplification between two recently separated isolates. Genom- B, Scott AL. Genes expressed in Brugia malayi infective third stage ics 1995;26:192–198. larvae. Mol Biochem Parasitol 1996;77:77–93. 75. Fischer K, Horrocks P, Preu␤ M, Wiesner J, Wunsch S, Camargo AA, 96. Ullu E, Nilsen T. Molecular biology of protozoan and helminth parasites. Lanzer M. Expression of var genes located within polymorphic subtelo- In: Marr JJ, Muller M, editors. Biochemistry and molecular biology of meric domains of Plasmodium falciparum chromosomes. Mol Cell Biol parasites. New York: Academic Press; 1995. p 1–17. 1997;17:3679–3686. 97. Krause M, Hirsh D. A trans-spliced leader sequence on actin mRNA in C. 76. Rubio JP, Thompson JK, Cowman AF. The var genes of Plasmodium elegans. Cell 1987;49:753–761. falciparum are located in the subtelomeric region of most chromosomes. 98. Rajkovic A, Davis RE, Simonsen JN, Rottman FM. A spliced leader is EMBO J 1996;15:4069–4077. present on a subset of mRNAs from the human parasite Schistosoma 77. Ivens AC, Lewis SM, Bagherzadeh A, Zhang L, Chan HM, Smith DF. A mansoni. Proc Natl Acad Sci USA 1990;87:8879–8883. physical map of the Leishmania major Friedlin genome. Genome Res 99. Lewin B. Units of transcription and translation: sequence components of 1998;8:135–145. heterogeneous nuclear RNA and messenger RNA. Cell 1975;4:77–93. 78. http://parsun1.path.cam.ac.uk/rep97.htm 100. Bonaldo MF, Lennon G, Soares MB. Normalization and subtraction: two 79. Ferrari I, Lorenzi H, Santos MR, Brandariz S, Requena JM, Schijman A, approaches to facilitate gene discovery. Genome Res 1996;6:791–806. Vazquez M, da Silveira JF, Ben-Dov C, Medrano C, Ghio S, Lopez 101. Neto ED, Harrop R, Correa-Oliveira R, Wilson RA, Pena SD, Simpson AJ. Bergami P, Cano I, Zingales B, Urmenyi TP, Rondinelli E, Gonzalez A, Minilibraries constructed from cDNA generated by arbitrarily primed Cortes A, Lopez MC, Thomas MC, Alonso C, Ramirez JL, Chiurrillo MA, RT-PCR: an alternative to normalized libraries for the generation of ESTs Rangel Aldao R, Branda˜o A, Degrave W, Perrot V, Saumier M, Billaut A, from nanogram quantities of mRNA. Gene 1997;186:135–142.

146 BioEssays 21.2 Review articles

102. Schneider A, McNally KP, Agabian N. Splicing and 3’-processing of the 120. ftp.ebi.ac.uk/pub/databases/parasites/ tyrosine tRNA of Trypanosoma brucei. J Biol Chem 1993;268:21868– 121. http://www.wehi.edu.au/MalDB-www/genomeInfo/MalariaDB/malari- 21874. aDB.html 103. el-Sayed NM, Donelson JE. A survey of the Trypanosoma brucei 122. http://www.hgmp.mrc.ac.uk/About/Registration/ rhodesiense genome using shotgun sequencing. Mol Biochem Parasitol 123. Engers HD, Godal T. Malaria vaccine development: current status. 1997;84:167–178. Parasitol Today 1998;14:56–64. 104. McCutchan TF, Hansen JL, Dame JB, Mullins JA. Mung bean nuclease 124. Grieve RB, Wisnewski N, Frank GR, Tripp CA. Vaccine research and cleaves Plasmodium genomic DNA at sites before and after genes. development for the prevention of filarial nematode infections. Pharm Science 1984;225:625–628. Biotechnol 1995;6:737–768. 105. Reddy GR, Chakrabarti D, Schuster SM, Ferl RJ, Almira EC, Dame JB. 125. Pays E, Nolan DP. Expression and function of surface proteins in Gene sequence tags from Plasmodium falciparum genomic DNA frag- Trypanosoma brucei. Mol Biochem Parasitol 1998;9:3–36. ments prepared by the ‘‘genease’’ activity of mung bean nuclease. Proc 126. Soo SS, Villarreal-Ramos B, Anjam Khan CM, Hormaeche CE, Blackwell Natl Acad Sci USA 1993;90:9867–9871. JM. Genetic control of immune response to recombinant antigens 106. Vernick KD, Imberski RB, McCutchan TF. Mung bean nuclease exhibits a carried by an attenuated Salmonella typhimurium vaccine strain: Nramp1 generalized gene-excision activity upon purified Plasmodium falciparum influences T-helper subset responses and protection against leishmanial genomic DNA. Nucleic Acids Res 1988;16:6883–6896. challenge. Infect Immun 1998;66:1910–1917. 107. Brown KH, Brentano ST, Donelson JE. Mung bean nuclease cleaves 127. Engers HD, Bergquist R, Modabber F. Progress on vaccines against preferentially at the boundaries of variant surface glycoprotein gene parasites. Dev Biol Stand 1996;87:73–84. transpositions in trypanosome DNA. J Biol Chem 1986;261:10352– 128. Cox FE. Designer vaccines for parasitic diseases. Int J Parasitol 10358. 1997;27:1147–1157. 108. Muhich ML, Simpson L. Specific cleavage of kinetoplast minicircle DNA 129. Kalinna BH. DNA vaccines for parasitic infections. Immunol Cell Biol from Leishmania tarentolae by mung bean nuclease and identification of 1997;75:370–375. several additional minicircle sequence classes. Nucleic Acids Res 130. Waine GJ, McManus DP. Schistosomiasis vaccine development: the 1986;14:5531–5556. current picture. Bioessays 1997;19:435–443 109. http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html 131. Kelly JM. Genetic transformation of parasitic protozoa. Adv Parasitol 110. http://www.sanger.ac.uk/Projects/P_falciparum/ 1997;39:227–270. 111. Myler PJ, Audleman L, deVos T, Hixson G, Kiser P, Lemley C, Magness C, Ricke E, Sisk E, Sunkin S, Swartzell S, Westlake T, Bastien P, Fu G, Ivens 132. Van Wye JD, Haldar K. Expression of green fluorescent protein in A, Stuart K. Leishmania major (Friedlin) chromosome 1 has an unusual Plasmodium falciparum. Mol Biochem Parasitol 1997;87:225–229. distribution of protein coding genes. 1998. Pro Natl Acad Sci USA 133. Kwa MS, Veenstra JG, Van Dijk M, Roos MH. Beta-tubulin genes from the (submitted). parasitic nematode Haemonchus contortus modulate drug resistance in 112. http://www.ebi.ac.uk/parasites/LGN/leishseq.html Caenorhabditis elegans. J Mol Biol 1995;246:500–510. 113. http://www.tigr.org/tdb/mdb/tbdb/index.html 134. Mamoun CB, Sullivan DJ Jr, Banerjee R, Goldberg DE. Identification and 114. Bringaud F, Vedrenne C, Cuvillier A, Parzy D, Baltz D, Tetaud E, Pays E, characterization of an unusual double serine/threonine protein phospha- Venegas J, Merlin G, Baltz T. Conserved organization of genes in tase 2C in the malaria parasite Plasmodium falciparum. J Biol Chem trypanosomatids. Mol Biochem Parasitol 1998;94:249–264. 1998;273:11241–11247. 115. Burglin TR, Lobos E, Blaxter ML. Caenorhabditis elegans as a model for 135. Shoemaker DD, Lashkari DA, Morris D, Mittmann M, Davis RW. Quantita- parasitic nematodes. Int J Parasitol 1998;28:395–411. tive phenotypic analysis of yeast deletion mutants using a highly parallel 116. http://www.nhm.ac.uk/hosted_sites/schisto/clusters/intro.html molecular bar-coding strategy. Nat Genet 1996;14:450–456. 117. http://helios.bto.ed.ac.uk/mbx/fgn/clusters/cluster.html 136. Celis JE, Ostergaard M, Jensen NA, Gromova I, Rasmussen HH, Gromov 118. Blaxter ML, Guiliano DB, Scott AL, Williams SA. A unified nomenclature P. Human and mouse proteomic databases: novel resources in the for filarial genes Parasitol Today 1997;13:416–417. protein universe. FEBS Lett 1998;430:64–72. 119. Durbin R, Thierry Mieg J. A C. elegans database. Documentation, code 137. http://www.tigr.org/tdb/mdb/pfdb/opticalmapping.html and data available from anonymous FTP servers at lirmm.lirmm.fr, 138. Bond E, Austin MJ. Funding sequencing efforts. Science 1997;275:1051– cele.mrc-lmb.cam.ac.uk, and ncbi.nlm.nih.gov 1991. 1052.

BioEssays 21.2 147