New Var Reconstruction Algorithm Exposes High Var Sequence Diversity in a Single Geographic Location in Mali Antoine Dara1†, Elliott F
Total Page:16
File Type:pdf, Size:1020Kb
Dara et al. Genome Medicine (2017) 9:30 DOI 10.1186/s13073-017-0422-4 RESEARCH Open Access New var reconstruction algorithm exposes high var sequence diversity in a single geographic location in Mali Antoine Dara1†, Elliott F. Drábek2†, Mark A. Travassos1, Kara A. Moser2, Arthur L. Delcher2,QiSu2, Timothy Hostelley2, Drissa Coulibaly3, Modibo Daou3ˆ, Ahmadou Dembele3, Issa Diarra3, Abdoulaye K. Kone3, Bourema Kouriba3, Matthew B. Laurens1, Amadou Niangaly3, Karim Traore3, Youssouf Tolo3, Claire M. Fraser2,4,5, Mahamadou A. Thera3, Abdoulaye A. Djimde3, Ogobara K. Doumbo3, Christopher V. Plowe1 and Joana C. Silva2,4* Abstract Background: Encoded by the var gene family, highly variable Plasmodium falciparum erythrocyte membrane protein-1 (PfEMP1) proteins mediate tissue-specific cytoadherence of infected erythrocytes, resulting in immune evasion and severe malaria disease. Sequencing and assembling the 40–60 var gene complement for individual infections has been notoriously difficult, impeding molecular epidemiological studies and the assessment of particular var elements as subunit vaccine candidates. Methods: We developed and validated a novel algorithm, Exon-Targeted Hybrid Assembly (ETHA), to perform targeted assembly of var gene sequences, based on a combination of Pacific Biosciences and Illumina data. Results: Using ETHA, we characterized the repertoire of var genes in 12 samples from uncomplicated malaria infections in children from a single Malian village and showed them to be as genetically diverse as vars from isolates from around the globe. The gene var2csa, a member of the var family associated with placental malaria pathogenesis, was present in each genome, as were vars previously associated with severe malaria. Conclusion: ETHA, a tool to discover novel var sequences from clinical samples, will aid the understanding of malaria pathogenesis and inform the design of malaria vaccines based on PfEMP1. ETHA is available at: https://sourceforge.net/projects/etha/. Keywords: Malaria, Plasmodium falciparum, Plasmodium falciparum erythrocyte membrane protein-1, PfEMP1, var, var2csa, Mali, ETHA Background against clinical malaria [1, 2]. The best studied of these Naturally acquired immunity to malaria appears to variant surface antigen (VSA) families, Plasmodium occur, at least in part, through the acquisition of anti- falciparum erythrocyte membrane protein-1 (PfEMP1) bodies to parasite antigens expressed on the surface of antigens, are large molecules expressed on the surface of infected erythrocytes. In epidemiological studies, having the infected erythrocyte [3, 4] that bind to endothelial antibodies to these parasite-produced erythrocyte sur- receptors [5–8]. Each PfEMP1 typically contains be- face antigens is consistently associated with protection tween two and eight Duffy-binding-like (DBL) domains and one to two cysteine-rich interdomain regions * Correspondence: [email protected] (CIDR). PfEMP1s are encoded by the var family of †Equal contributors P. ˆ genes, 40 to 61 of which have been found in the few Deceased falciparum 2Institute for Genome Sciences, University of Maryland School of Medicine, assembled genomes available to date. An in- Baltimore, MD, USA fected erythrocyte expresses only one PfEMP1 variant 4Department of Microbiology and Immunology, University of Maryland on its surface [9]. P. falciparum parasites undergo clonal School of Medicine, Baltimore, MD, USA Full list of author information is available at the end of the article antigenic variation that is central to their ability to evade © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Dara et al. Genome Medicine (2017) 9:30 Page 2 of 14 the host immune response [10]. PfEMP1s mediate members of multigene families. In contrast, Illumina se- tissue-specific cytoadherence, which allows infected quencing technologies produce much more accurate data, erythrocytes to sequester in small capillaries of targeted but the short read length poses a challenge for var gene host tissue and evade clearance by the spleen, thereby assembly. producing devastating pathologic effects [11]. Compared Here, we describe a novel, specialized pipeline that to other P. falciparum genes, most var genes exhibit ex- leverages publicly available var gene sequences and treme diversity, with less than 50% shared amino acid capitalizes on a combination of relatively low amounts sequence identity [12]. Such diversity has proven a major of PacBio and Illumina data to reconstruct var gene se- obstacle to the development of strategies to amplify and quences present in clinical malaria field samples. We sequence new var genes from clinical samples, hamper- show that all P. falciparum genomes from uncompli- ing the understanding of var gene biology and the role cated malaria infections harbor var2csa and also var of PfEMP1 in clinical disease. The full complement of gene subclasses previously associated with severe mal- var genes is only known for a few reference genomes aria. We also show a lack of clustering of VAR2CSA by and two clinical samples [13]. Until now, it has thus not geography and identify segments conserved across most been possible to sequence vars with adequate complete- var2csa sequences despite high overall sequence diver- ness and from a number of samples sufficient to associ- sity at this locus among all samples analyzed. Such ate specific var elements with clinical outcomes such as segments, if immunogenic, suggest the potential of a pregnancy-associated or severe malaria [14]. pregnancy-associated malaria vaccine based on a limited The best characterized PfEMP1, VAR2CSA, is expressed number of strains. on the surface of infected erythrocytes that bind to chon- droitin sulfate in the placental matrix [15, 16]. Antibodies Methods to this antigen prevent placental cytoadherence and are as- Sample collection, genomic DNA preparation, and sociated with protection from placental malaria [15]. From quantification of host contamination the limited number of variants sequenced, VAR2CSA ex- A total of 3–5 mL of venous whole blood were obtained hibits significantly greater sequence conservation than from 12 children with symptomatic malaria episodes in other vars, with over 75% shared amino acid identity Bandiagara, Mali, in 2010, as part of a study measuring among orthologs [12, 17, 18], making it a promising sub- the incidence of malaria at a vaccine testing site [22]. unit vaccine candidate against pregnancy-associated mal- Whole blood samples were leukocyte-depleted at the aria. Better characterization of VAR2CSA diversity within time of collection using CF11 columns [23] and then and across different geographic locations would enable an kept frozen at –80 °C until DNA extraction. DNA was informed assessment of the need for regionally-based vac- extracted using the QIAamp Blood Midi kit, yielding cines versus a globally effective vaccine approach. 300 μL of each sample eluted in distilled water. Samples The study of rapidly evolving genes such as those that were stored at –80 °C until DNA sequencing. We deter- encode most variant surface antigen families relies on de mined clonality using six neutral microsatellite markers novo genome assemblies and/or gene-targeted sequen- [24, 25]. Host contamination was calculated retrospect- cing approaches. Assessment of genetic variation based ively by measuring the proportion of sequenced reads on mapping of re-sequencing reads requires reads to that mapped to the human genome (version GRCh37) map uniquely between isolates within a maximum se- (see below). Isolate NF54 was provided by Sanaria, quence divergence cutoff (usually ~2%, or a maximum Working Cell Bank SAN02-073009, cultured using of two to three single nucleotide polymorphisms, SNPs, standard protocols [26], and DNA extracted as above. in a 100 bp read). A large proportion of reads originating from var loci often cannot be reliably mapped across Generation of Illumina and PacBio whole-genome isolates, either because they map to repetitive regions sequence data and genome assemblies present in multiple copies in the genome or because Genomic DNA libraries for all samples were constructed they differ among orthologous copies by more than the for sequencing on the Illumina platform using the KAPA threshold cutoff necessary for mapping [19]. The long Library Preparation Kit (Kapa Biosystems, Woburn, MA, sequence reads generated with Pacific Biosciences (Pac- USA). First, 500 ng of DNA was fragmented with the Bio) technology have great utility as they allow the Covaris E210 to about 200 bp. Then libraries were pre- placement of repetitive regions in an assembly, using pared using a modified version of manufacturer’s proto- flanking unique conserved sequences as a genomic an- col. The DNA was purified between enzymatic reactions chor [20, 21]. However, if the PacBio reads are short in and the size selection of the library was performed comparison to the length of VSA genes, the high error with AMPure XT beads (Beckman