Plasmodium falciparum malaria in the Greater Mekong Sub-region: Elucidating parasite migration and genomic signatures of selection

Item Type dissertation

Authors Jacob, Christopher George

Publication Date 2015

Abstract Current estimates place over one third of the world's population at risk of contracting malaria, with approximately 300-500 million cases of clinical illness each year. Among these cases it is estimated more than 600k end in death, mostly among child...

Keywords migration; selection; Asia, Southeastern; Genetics; Malaria; Plasmodium falciparum


Link to Item http://hdl.handle.net/10713/4589

Curriculum Vitae

Christopher George Jacob

Graduate Program in Life Sciences, University of Maryland, Baltimore

Contact Information Center for Vaccine Development University of Maryland School of Medicine 685 W. Baltimore St. Suite 480 Baltimore, MD 21201 E-mail: [email protected]

Education

2008 B.S., College of Mount Saint Joseph, Cincinnati, OH. Biology, Chemistry minor (Summa Cum Laude)

2015 (expected) Ph.D., University of Maryland Baltimore, Baltimore, MD. Molecular Medicine (Advisor: Christopher Plowe)

Post Graduate Education and Training 2011 Workshop on Molecular Evolution Fort Collins, Colorado, U.S.A. Director: Michael Cummings

Employment History

Other Employment 2008-2009 Cytogenetic laboratory assistant Division of Human Genetics, Cincinnati Children’s Hospital, Cincinnati, OH

2009-present Graduate Research Assistant, Center for Vaccine Development, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD. Thesis Advisor: Christopher Plowe

Institute for Genome Sciences University of Maryland School of Medicine, Baltimore, MD Advisor: Joana Carneiro Da Silva

Department of Microbiology and Immunology University of Maryland Baltimore, Baltimore, MD Advisor: Shiladitya DasSarma

Honors and Awards

2004-2008 Presidential Scholarship, College of Mount Saint Joseph

2004-2008 Art and Science Award, College of Mount Saint Joseph

2008 Biology Department Award, College of Mount Saint Joseph

Professional Society Memberships 2011-present American Society of Tropical Medicine and Hygiene

Teaching Service

Graduate Courses 2010-2011 Teaching Assistant, Mechanisms in Biomedical Sciences (Bioinformatics Module) (GPLS 601) Graduate Program in Life Sciences, University of Maryland Baltimore, 50 Graduate Students

2011 Teaching Assistant, Beginning PERL for Bioinformatics (GPLS 618) Graduate Program in Life Sciences, University of Maryland Baltimore, 10 Graduate Students

2012 Teaching Assistant, Genomics and Bioinformatics (GPLS 716) Graduate Program in Life Sciences, University of Maryland Baltimore, 15 Graduate Students

Publications

Peer-Reviewed Journal Articles (*equal authorship)

1. Capes MD, Coker JA, Gessler R, Grinblat-Huse V, Dassarma SL, Jacob CG, Kim JM, Dassarma P, Dassarma S. The information transfer system of halophilic archaea. Plasmid 2011 Mar; 65(2):77-101

2. Shaukat AM, Gilliams EA, Kenefic LJ, Laurens MB, Dzinjalamala FK, Nyirenda OM, Thesing PC, Jacob CG, Molyneux ME, Taylor TE, Plowe CV, Laufer MK. Clinical manifestations of new versus recrudescent malaria infections following anti-malarial drug treatment. Malaria Journal 2012 Jun; 11(1):207

3. Takala-Harrison S, Jacob CG*, Clark TG*, Cummings MP, Miotto O, Dondorp AM, Fukuda MM, Nosten F, Noedl H, Imwong M, Bethell D, Se Y, Lon C, Tyner SD, Saunders D, Socheat D, Ariey F, Pyae Phyo A, Starzengruber P, Fuehrer HP, Swoboda P, Stepniewska K, Flegg J, Arze C, Cerqueira GC, Silva JC, Ricklefs S, Porcella SF, Stephens RM, Adams M, Kenefic L, Campino S, Auburn S, MacInnis B, Kwiatkowski DP, Su X, White NJ, Ringwald P, Plowe CV. Genetic loci associated with delayed clearance of Plasmodium falciparum following artemisinin treatment in Southeast Asia. PNAS 2013 Jan 2; 110(1):240-5

4. Kyaw MP, Nyunt MH, Chit K, Aye MM, Aye KH, Lindegardh N, Tarning J, Imwong M, Jacob CG, Rasmussen C, Perin J, Ringwald P, Nyunt MM. Reduced Susceptibility of Plasmodium falciparum to Artesunate in Southern Myanmar. PLOS One 2013 Mar 8; 8(3)

5. Laurens MB, Billingsley P, Richman A, Eappen AG, Adams M, Li T, Chakravarty S, Gunasekera A, Jacob CG, Sim BKL, Edelman R, Plowe CV, Hoffman SL, Lyke KE. Successful Human Infection with P. falciparum Using Three Aseptic Anopheles stephensi Mosquitoes: A New Model for Controlled Human Malaria Infection. PLOS One 2013 July 16; 8(7)

6. Khattak AA, Venkatesan M, Jacob CG, Artimovich EM, Nadeem MF, Nighat F, Hombhanje F, Mita T, Malik SA, Plowe CV. A comprehensive survey of polymorphisms conferring anti-malarial resistance in Plasmodium falciparum across Pakistan. Malaria Journal 2013 Sept; 12:300

7. Munro JB, Jacob CG, Silva JC. A novel clade of unique eukaryotic ribonucleotide reductase R2 subunits is exclusive to apicomplexan parasites. Journal of Molecular Evolution 2013 Sept; 77(3):92-106

8. Jacob CG, Tan JC, Miller BA, Tan A, Takala-Harrison S, Ferdig MT, Plowe CV. A microarray platform and novel SNP calling algorithm to evaluate Plasmodium falciparum field samples of low DNA quantity. BMC Genomics 2014 Aug 26; 15(1):719

9. Jacob CG*, Takala-Harrison S*, Arze C, Cummings MP, Silva JC, Dondorp AM, Fukuda MM, Hien TT, Mayxay M, Noedl H, Nosten F, Kyaw MP, Nhien NTT, Imwong M, Bethell D, Se Y, Lon C, Tyner SD, Saunders DL, Socheat D, Ariey F, Mercereau-Puijalon O, Menard D, Newton PN, Starzengruber P, Fuehrer HP, Swoboda P, Khan WA, Phyo AP, Nyunt MM, Nyunt MH, Brown TS, Adams M, Pepin CS, Bailey J, Tan JC, Ferdig MT, Clark TG, Miotto OM, MacInnis B, Kwiatkowski DP, White NJ, Ringwald P, Plowe CV. Independent emergence of Plasmodium falciparum artemisinin resistant mutations in Southeast Asia. Journal of Infectious Diseases, Epub 2014 Sep 1

10. Brown T, Jacob CG, Silva JC, Takala-Harrison S, Djimde A, Dondorp A, Fukuda M, Noedl H, Kyaw MP, Mayxay M, Hien TT, Plowe CV, Cummings MP. Plasmodium falciparum field isolates from areas of repeated emergence of drug resistance show no evidence of hypermutator phenotype. Infection, Genetics, and Evolution, Epub 2014 Dec 13

11. Miotto O, Amato R, Ashley EA, MacInnis B, Almagro-Garcia J, Amaratunga C, Lim P, Mead D, Oyola S, Dhorda M, Imwong M, Woodrow C, Manske M, Stalker J, Drury E, Campino S, Amenga-Etego L, Thanh TNN, Tran HT, Ringwald P, Bethell D, Nosten F, Phyo AP, Pukrittayakamee S, Chotivanich K, Chuor CM, Nguon C, Suon S, Sreng S, Newton PN, Mayxay M, Khanthavong M, Hongvanthong B, Htut Y, Han KT, Kyaw MP, Faiz MA, Fanello CI, Mokuolu OA, Jacob CG, Takala-Harrison S, Plowe CV, Day NP, Dondorp AM, Spencer CAC, McVean G, Fairhurst RM, White NJ, Kwiatkowski DP. Genetic architecture of artemisinin-resistant Plasmodium falciparum. Nature Genetics, Epub 2015 Jan 19

12. Travassos MA, Coulibaly D, Laurens MB, Dembele A, Tolo Y, Kone AK, Traore K, Niangaly A, Guindo A, Wu Y, Berry AA, Jacob CG, Takala-Harrison S, Kouriba B, Lyke KE, Diallo DA, Doumbo OK, Plowe CV, Thera MA. Hemoglobin C trait provides protection from clinical falciparum malaria comparable to that of Hemoglobin S trait in Malian children. Journal of Infectious Diseases (under review)

13. Huang F, Takala-Harrison S, Jacob CG, Liu H, Sun X, Yang H, Nyunt MM, Adams M, Zhou S, Xia Z, Ringwald P, Bustos MD, Tang L, Plowe CV. A single mutation in K13 predominates in Southern China and is associated with delayed clearance of Plasmodium falciparum following artemisinin treatment. Journal of Infectious Diseases (under review)

14. Chaorattanakawee S, Saunders DL, Sea D, Chanarat N, Yingyuen K, Sundrakes S, Saingam P, Buathong N, Sriwichai S, Chann S, Se Y, Yom Y, Heng TK, Kong N, Kuntawunginn W, Tanghongchaiwiriya K, Jacob CG, Takala-Harrison S, Plowe CV, Lin JT, Chuor CM, Satharath P, Tyner SD, Gosi P, Teja-isavadharm P, Lon C, Lanteri CA. Ex vivo drug susceptibility and Plasmodium falciparum multidrug resistance gene 1 (pfmdr1) profiling of clinical isolates from Cambodia in 2008-2013 suggest emerging piperaquine resistance. (in preparation)

Book Chapters

1. Jacob CG, Plowe CV. Malaria Genomics and the Developing World. Genomics Applications for the Developing World, Advances in Microbial Ecology, Nelson KE, Jones-Nelson B (eds.), Springer Science+Business Media, New York, 2012, 117-130

Abstracts (*Presented)

1. Munro JB, Jacob CG, Silva JC. A novel clade of eukaryotic ribonucleotide reductase R2 subunits is exclusive to apicomplexan parasites. American Society of Tropical Medicine and Hygiene Annual Meeting, December 2011, Philadelphia, PA.

2. Takala-Harrison S, Imwong M, Jacob CG, Arze C, Dondorp A, Fukuda M, Nosten F, Noedl H, Bethell D, Se Y, Lon C, Tyner S, Saunders D, Socheat D, Pyae Phyo A, Starzengruber P, Swoboda P, Stepniewska K, Flegg J, Cerqueira G, Silva JC, Adams M, Kenefic L, Bailey J, Niangaly A, White N, Ringwald P, Plowe CV. ARC3: Associations between candidate gene polymorphisms and parasite clearance rate following treatment with artemisinins. American Society of Tropical Medicine and Hygiene Annual Meeting, December 2011, Philadelphia, PA.

3. Takala-Harrison S, Clark T, Cummings M, Jacob CG, Miotto O, Dondorp A, Fukuda M, Nosten F, Noedl H, Imwong M, Bethell D, Se Y, Lon C, Tyner S, Saunders D, Socheat D, Pyae Phyo A, Starzengruber P, Swoboda P, Stepniewska K, Flegg J, Cerqueira G, Silva JC, Arze C, Ricklefs S, Porcella S, Adams M, Kenefic L, Campino S, Auburn S, Manske M, MacInnis B, Kwiatkowski D, Su X, White N, Ringwald P, Plowe CV. ARC3: A genome-wide association study of the genetic basis of parasite clearance rate following treatment with artemisinins. American Society of Tropical Medicine and Hygiene Annual Meeting, December 2011, Philadelphia, PA.

4. Jacob CG*, Miotto O, Takala-Harrison S, Clark T, Cummings M, Dondorp A, Fukuda M, Nosten F, Noedl H, Imwong M, Bethell D, Se Y, Lon C, Tyner S, Saunders D, Socheat D, Pyae Phyo A, Starzengruber P, Swoboda P, Cerqueira G, Silva JC, Arze C, Ricklefs S, Porcella S, Adams M, Kenefic L, Campino S, Auburn S, Manske M, MacInnis B, Kwiatkowski D, Su X, White N, Ringwald P, Plowe CV. ARC3: Detecting recent positive selection in artemisinin resistant malaria parasites. American Society of Tropical Medicine and Hygiene Annual Meeting, December 2011, Philadelphia, PA.

5. Jacob CG*, Miotto O, Takala-Harrison S, Clark T, Cummings M, Dondorp A, Fukuda M, Nosten F, Noedl H, Imwong M, Bethell D, Se Y, Lon C, Tyner S, Saunders D, Socheat D, Pyae Phyo A, Starzengruber P, Swoboda P, Cerqueira G, Silva JC, Arze C, Ricklefs S, Porcella S, Adams M, Kenefic L, Campino S, Auburn S, Manske M, MacInnis B, Kwiatkowski D, Su X, White N, Ringwald P, Plowe CV. Detecting recent positive selection in artemisinin-resistant malaria parasites from Southeast Asia. HHMI Translation Medicine Meeting, April 2012, Ashburn, VA.

6. Jacob CG*, Takala-Harrison S, Miotto O, Manske M, Woodrow C, Dondorp AM, Fukuda MM, Noedl H, Imwong M, Bethell D, Se Y, Lon C, Tyner SD, Saunders D, Socheat D, Starzengruber P, Fuehrer HP, Swoboda P, Arze C, Silva JC, Maslen G, Auburn S, Campino S, MacInnis B, Kwiatkowski DP, White NJ, Ringwald P, Plowe CV. SNPs reveal genome-wide differences between two populations from Southeast Asia. Genomic Epidemiology of Malaria Meeting, June 2012, Hinxton, UK.

7. Takala-Harrison S, Pepin C, Cummings MP, Jacob CG, Arze C, Dondorp A, Fukuda M, Hien TT, Kyaw MP, Nyunt MH, Nyunt MM, Mayxay M, Newton PN, Nosten F, Noedl H, Imwong M, Bethell D, Se Y, Lon C, Tyner S, Saunders D, Socheat D, Phyo AP, Starzengruber P, Swoboda P, Fuehrer HP, Clark TG, Silva JC, Adams M, Tan JC, Ferdig MT, Miotto O, Manske M, MacInnis B, Kwiatkowski D, White NJ, Ringwald P, Plowe CV. A replication genome-wide association study of the genetic basis of delayed parasite clearance following treatment with artemisinins. American Society of Tropical Medicine and Hygiene Annual Meeting, November 2013, Washington D.C.

8. Jacob CG*, Tan JC, Miller BA, Tan A, Saunders D, Lanteri C, Lon C, Sugarum R, Hoonchaiyaphum T, Satimai W, Mita T, Alam MS, Akter J, Fang H, Han KT, Takala-Harrison S, Ferdig MT, Plowe CV. Application of a Plasmodium falciparum whole genome SNP microarray to field samples collected as dried blood spots from Southeast Asia. American Society of Tropical Medicine and Hygiene Annual Meeting, November 2013, Washington D.C.

9. Jacob CG*, Takala-Harrison S, Silva JC, Lanteri C, Saunders D, Bethell D, Tyner S, Se Y, Lon C, Socheat D, Dondorp A, Imwong M, Hien TT, Satimai W, Sugarum R, Hoonchaiyaphum T, Mita T, Alam MS, Akter J, Fang H, Kyaw MP, Han KT, Nyunt MM, Mayxay M, Newton PN, Nosten F, Phyo AP, Noedl H, Starzengruber P, Tan JC, Ferdig MT, Fairhurst RM, Miotto O, Manske M, MacInnis B, Kwiatkowski D, White NJ, Ringwald P, Plowe CV. Falciparum malaria in the Greater Mekong Sub-Region: Mapping gene flow and genomic signatures of drug resistance. American Society of Tropical Medicine and Hygiene Annual Meeting, November 2013, Washington D.C.

10. Larsen CP, Venkatesan M, Jacob CG, Adams M, Mon HH, Han KT, Nyunt MH, Takala-Harrison S, Smith JJ, Nyunt MM, Ringwald P, Kyaw MP, Plowe CV. Field validation of candidate molecular markers of artemisinin resistance in Myanmar. American Society of Tropical Medicine and Hygiene Annual Meeting, November 2013, Washington D.C.

11. Sugarum R, Venkatesan M, Jacob CG*, Hoonchaiyaphum T, Vadla M, Smith JJ, Takala-Harrison S, Satimai W, Plowe CV. Field validation of candidate molecular markers of artemisinin resistance in . American Society of Tropical Medicine and Hygiene Annual Meeting, November 2013, Washington D.C.

12. Huang F, Venkatesan M, Jacob CG, Takala-Harrison S, Adams M, Liu H, Yang H, Zhou S, Tang L, Plowe CV. Genomic epidemiology of artemisinin resistance on the China-Myanmar border. American Society of Tropical Medicine and Hygiene Annual Meeting, November 2013, Washington D.C.

13. Cassin JW, Jacob CG, Thesing PC, Nyirenda OM, Masonga R, Taylor TE, Plowe CV, Laufer MK. Malaria transmission in households in Blantyre, Malawi. American Society of Tropical Medicine and Hygiene Annual Meeting, November 2013, Washington D.C.

14. Jacob CG*, Takala-Harrison S, Dhorda M, Joshi S, Adams M, Erickson K, Wang A, Tan JC, Miller BA, Tan A, Nguon C, Pukrittayakamee S, Imwong M, Tinh Hien T, Htut Y, Mokuolu O, Mayxay M, Onyamboko MA, Phyo AP, Nosten F, Faiz MA, Miotto O, MacInnis B, Kwiatkowski DP, Fairhurst RM, Ferdig MT, Guerin PJ, Dondorp AM, Day NP, White NJ, Plowe CV. Application of a Plasmodium falciparum whole genome SNP microarray to field samples collected in the Tracking Resistance to Artemisinin Collaboration. Genomic Epidemiology of Malaria Meeting, June 2014, Hinxton, UK.

15. Bailey JA, Travassos MA, Jacob CG, Ouattara A, Coulibaly D, Laurens MB, Takala-Harrison S, Diggs CL, Soissons L, Brown TS, Lyke KE, Lanar DE, Dutta S, Heppner DG, Pablo J, Nakajima R, Jasinskas A, Niangaly A, Berry AA, Kouriba B, Thera MA, Doumbo OK, Felgner PL, Plowe CV. Plasmodium falciparum AMA-1-based subunit vaccine FMP2.1/AS02A elicits a diverse and strong yet unprotective immune response in a pediatric cohort in Bandiagara, Mali. American Society of Tropical Medicine and Hygiene Annual Meeting, November 2014, New Orleans, LA

16. Takala-Harrison S, Kyaw MP, Han KT, Kyaw YM, Aye KH, Pyar KP, Moser, Jacob CG, Adams M, Hampton SM, Wang A, Hlaing TM, Htut Y, Nyunt MM, Plowe CV. The prevalence and origins of artemisinin resistant falciparum malaria in Myanmar. American Society of Tropical Medicine and Hygiene Annual Meeting, November 2014, New Orleans, LA

17. Huang F, Takala-Harrison S, Jacob CG, Adams M, Liu H, Yang H, Tang L, Zhou X, Ringwald P, Plowe CV. Therapeutic efficacy of dihydroartemisinin-piperaquine for treating uncomplicated Plasmodium falciparum malaria and the K13-propeller polymorphisms along the China-Myanmar border. American Society of Tropical Medicine and Hygiene Annual Meeting, November 2014, New Orleans, LA

Major Invited Speeches

National

1. Jacob CG. Genome-wide studies of clinical resistance of Plasmodium falciparum to artemisinins. Molecular Epidemiology and Evolutionary Genetics of Infectious Diseases: Apicomplexa molecular epidemiology and evolution. Loyola University, New Orleans, LA, USA, November 2012.

2. Jacob CG. Falciparum malaria in the Greater Mekong Sub-Region: Mapping gene flow and genomic signatures of drug resistance. American Society of Tropical Medicine and Hygiene Annual Meeting, November 2013, Washington D.C.

International

1. Jacob CG. Genome-wide studies of clinical resistance of Plasmodium falciparum to artemisinins. Infectious Disease Genomics & Global Health: Parasites and Vectors (Session Co-chair). Moller Centre, Cambridge, UK, October 2012.

Published Multimedia

Microbe Magazine: The news magazine of the American Society for Microbiology, March 2010 - Cover photographer - Christopher Jacob & Priya DasSarma

Abstract

Title of Dissertation: Plasmodium falciparum Malaria in the Greater Mekong Sub-Region: Elucidating Parasite Migration and Genomic Signatures of Selection

Christopher George Jacob, Doctor of Philosophy, 2015

Dissertation directed by: Christopher V. Plowe, M.D., M.P.H.

Professor, Medicine, Epidemiology and Public Health, Microbiology and Immunology; Leader, Malaria Group; Associate Director for Research Training, Center for Vaccine Development; Investigator, Howard Hughes Medical Institute

Current estimates place over one third of the world’s population at risk of contracting malaria, with approximately 300-500 million cases of clinical illness each year. More than 600,000 of these cases are estimated to end in death, mostly among children less than 5 years of age in Africa, making malaria one of the leading causes of infectious disease death in the world. Resistance to most antimalarial medications is currently widespread in Plasmodium falciparum, and some areas harbor multidrug-resistant parasites, including parasites resistant to artemisinin, the current first-line treatment.

Single nucleotide polymorphisms (SNPs) were typed from over 2000 field samples by DNA microarray and whole-genome sequencing. Genotypes were used to group parasites into putative subpopulations with the model-based ancestry estimation program ADMIXTURE, and bidirectional migration rates between geographic locations were estimated with the coalescent program LAMARC. We detected thirteen subpopulations within Southeast Asian samples, including a core of six subpopulations within West Cambodia, the region with the highest prevalence of artemisinin resistance. We find evidence of parasite gene flow across Southeast Asia, both between proximal and between distant populations. Analyses of genes under positive selection using long-haplotype methods (iHS and XP-EHH), as well as population differentiation, found multiple loci shared across populations, including drug resistance genes, vaccine antigens, and genes possibly involved in local natural adaptation. Our study highlights the complex genetic structure within Southeast Asia and patterns of parasite migration that identify areas most susceptible to the import of resistant parasites, as well as genes under selection that could be future drug targets or vaccine antigens.

Plasmodium falciparum Malaria in the Greater Mekong Sub-Region: Elucidating Parasite Migration and Genomic Signatures of Selection

by Christopher George Jacob

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, Baltimore in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2015

©Copyright 2015 by Christopher George Jacob

All Rights Reserved

Table of Contents

Chapter 1: Malaria Genomics and the Developing World

1.1 Essentials of malaria

1.2 Sequencing human Plasmodium genomes

1.3 Post-genomics: fundamental research

1.4 Malaria genomics helping the developing world

1.4.1 Vaccine target discovery and vaccine developments

1.4.2 Drug resistance mechanisms and markers

1.4.2.1 Drug resistance mechanisms and markers in the pre-genomic era

1.4.2.2 An opportunity to deter drug resistance

1.5 Global Collaboration

1.6 Drug resistance update

Chapter 2: A microarray platform and novel SNP calling algorithm to evaluate Plasmodium falciparum field samples of low DNA quantity

2.1 Background

2.2 Methods

2.2.1 SNP selection and chip design

2.2.2 DNA labeling and hybridization

2.2.3 Heuristic base calling algorithm

2.2.4 Experimental samples

2.2.5 Quantitative PCR

2.3 Results

2.3.1 Heuristic base calling

2.3.2 Analysis of field isolates and leukocyte depletion

2.3.3 Effects of whole-genome amplification

2.4 Discussion

2.5 Conclusions

Chapter 3: Admixture and gene flow of Plasmodium falciparum in Southeast Asia

3.1 Introduction

3.2 Population structure and genetic relatedness

3.3 Coalescent analysis and estimation of gene flow

3.4 Supplementary materials

3.4.1 Results

3.4.1.1 Geographic population relatedness

3.4.2 Methods

3.4.2.1 Study Sites

3.4.2.2 Parasite genotyping

3.4.2.3 Quality control of parasite genotypes

3.4.2.4 Ancestry estimation

3.4.2.5 Population Differentiation

3.4.2.6 Coalescent analysis

Chapter 4: Detection of signatures of positive selection in Plasmodium falciparum

4.1 Introduction

4.2 Methods

4.2.1 Sampling locations

4.2.2 Parasite genotyping

4.2.3 Detection of signatures of positive selection

4.3 Results

4.3.1 Genes under selection across multiple geographic sites

4.3.2 Drug resistance associated signatures of selection

4.4 Discussion

Chapter 5: Discussion and Future Directions

List of Tables

Table 1.1 Malaria sequencing efforts to date

Table 2.1 SNP call accuracy and SNP call rate for 3D7 cultured parasites

Table 2.2 SNP call accuracy and SNP call rate of NF54 purified DNA

Table 3.1 Bi-directional migration rates

Table 4.1 Top regions under selection across multiple populations

List of Figures

Figure 2.1 Microarray probe statistics

Figure 2.2 Sample intensities and intensity distributions of varying center bases

Figure 2.3 SNP call accuracy and SNP call rate of low quantity reference DNA

Figure 2.4 SNP call accuracy and SNP call rate of field samples

Figure 3.1 Distribution of genetic subpopulations

Figure 3.2 Principal components analysis

Figure 3.3 Histograms of identity-by-state values

Figure 3.4 Average cross-validation error per K

Figure 4.1 Heat map of selected regions in Pailin, Cambodia

List of Abbreviations

ACT Artemisinin combination therapy

AMA1 Apical membrane antigen 1

BP Base pair

CRT Chloroquine resistance transporter

CV Cross-validation

DBL Duffy binding like

DHFR Dihydrofolate reductase

DHPS Dihydropteroate synthase

EMP1 Erythrocyte membrane protein 1

ETRAMP Early transcribed membrane proteins

GCH1 GTP cyclohydrolase 1

GMS Greater Mekong Subregion

GWAS Genome-wide association study

HbE Hemoglobin E

IBS Identity-by-state

iHS Integrated haplotype score

IRB Institutional review board

KB Kilobase


LD Leukocyte depletion (chapter 2)

LD Linkage disequilibrium (chapters 3 and 4)

MalariaGEN Malaria genomic epidemiology network

MDR1 Multidrug resistance gene 1

MSP1 Merozoite surface protein 1

PCA Principal components analysis

PWG Plasmodium writing group

QC Quality control

RESA Ring-infected erythrocyte surface antigen

SEA Southeast Asia

SNP Single nucleotide polymorphism

WGA Whole genome amplification

WHO World Health Organization

WWARN WorldWide Antimalarial Resistance Network

WTSI Wellcome Trust Sanger Institute

XP-EHH Cross-population extended haplotype homozygosity


Chapter 1: Malaria Genomics and the Developing World

Published as a book chapter: Jacob CG and Plowe CV. Malaria Genomics and the Developing World. Genomics Applications for the Developing World, Advances in Microbial Ecology, Nelson KE, Jones-Nelson B (eds.), Springer Science+Business Media, New York, 2012, 117-130. This chapter serves as the introduction and has been updated to reflect the current state of malaria research.

1.1 Essentials of Malaria

Malaria is a disease caused by parasites of the genus Plasmodium, a member of the phylum Apicomplexa. Apicomplexans are unusual in being the only large clade on the tree of life that is entirely parasitic. It is thought that every type of mammal, bird, and reptile is parasitized by at least one species of Plasmodium [1]. Five Plasmodium species infect humans: P. falciparum, P. vivax, P. malariae, P. ovale, and P. knowlesi, the last of which until recently was thought to infect only non-human primates [2]. P. falciparum and P. vivax are the most prevalent species, and P. falciparum is responsible for most cases of severe malaria and death. A key feature of apicomplexans is that these eukaryotic organisms exist mainly in a haploid state, with most having only a brief obligatory diploid phase. Another distinctive feature is the apicoplast, a plastid believed to have originated from the phagocytosis of a chloroplast-containing microorganism [3].

Current estimates place over one third of the world’s population at risk of Plasmodium spp. infection. Malaria is transmitted throughout most of the tropics and subtropics, including sub-Saharan Africa, much of Asia, parts of Central and South America, and parts of the Middle East. However, the burden of disease is concentrated in tropical Africa, where transmission is highest. An estimated 300-500 million cases of clinical malaria illness occur each year, of which slightly less than one million end in death, placing malaria among the top three infectious killers worldwide along with HIV/AIDS and tuberculosis. Malaria genomic research to date has focused largely on P. falciparum as the leading cause of malaria-associated disease and death, although the genomes of P. vivax [4], the most common human malaria outside of Africa, and of several animal Plasmodia used as model systems have now been sequenced [5-7], as shown in Table 1.1.

As a mosquito-borne infection, malaria follows the rains. Where transmission is intense (as many as 1000 infected mosquito bites per person per year), as in much of sub-Saharan Africa, repeated exposure throughout early childhood results in naturally acquired immunity that protects older children and adults against disease but does not fully prevent infection. This immunity exerts balancing selection pressure that drives extreme genetic diversity in the malaria antigens targeted by immune responses [8].

Immune protection is thought to require repeated exposure to diverse parasites to build up a full repertoire of allele-specific immune responses to parasites that infect a given human population [9]. In the absence of this repeated exposure to malaria infection, immunity wanes over the course of a few years.

Infection with a Plasmodium sp. results in clinical outcomes ranging from asymptomatic infection to classical malarial fever paroxysms, to severe forms of falciparum malaria such as profound anemia, coma, seizures, and death, typically from respiratory failure. The greater pathogenic potential of P. falciparum is chiefly attributed to a large, diverse family of var genes encoding proteins that are expressed one at a time on the surface of the host red blood cell and mediate cytoadherence to host tissues. Switching among var genes is thought to allow P. falciparum parasites to evade var-specific immune responses, and the interplay between allele-specific immune responses and expression of var genes and other genes encoding variant antigens is likely to account for much of the variation in clinical manifestations of malaria [10-12].

The life cycle of malaria parasites is similar across the genus and begins in the female Anopheles mosquito. Worm-like sporozoites are injected by a biting infected mosquito, enter the bloodstream, and quickly migrate to the liver, where they enter hepatocytes.

There, they develop into merozoites that multiply until the hepatocyte ruptures, releasing free merozoites into the circulation, where they invade red blood cells and either enter a continuous cycle of merozoite development and red cell invasion or develop into male and female gametocytes, which can be taken up during another mosquito blood meal. In the mosquito midgut, male gametes fertilize female gametes to create the only diploid stage of the life cycle, providing an opportunity for genetic recombination when genetically non-identical gametes mate. After mating, the zygote undergoes meiosis and develops into a motile ookinete and then an oocyst on the midgut wall, which produces haploid sporozoites that migrate to the mosquito salivary glands, ready to infect the next host.

1.2 Sequencing Human Plasmodium Genomes

Sequencing of the P. falciparum genome began in the mid-1990s. Parts of the genome were completed by 1998, and the entire genome was completed in 2002 [13]. This was accomplished in parallel with the draft human genome in 2001 [14] and the sequencing of the Anopheles gambiae (the leading mosquito vector of malaria) genome, also in 2002 [15]. The completion of all three genomes marked the first time that genomes of the parasite, vector, and host of an infectious disease were available [16].

The genome of P. falciparum posed challenges not previously encountered in sequencing eukaryotic genomes. The ~23 megabase genome is extremely A+T rich, with an average G+C content of only 19.4% that drops to 13.5% in intergenic regions. The approximately 5300 nuclear-encoded genes are distributed across 14 chromosomes ranging in length from 650 kilobases (chromosome 1) to 3.3 megabases (chromosome 14). Over 60% of the initially identified genes lack significant homology to genes in other eukaryotes, leaving the majority of P. falciparum genes without functional annotation. P. falciparum also carries apicoplast and mitochondrial genomes of approximately 35 and 6 kilobases, respectively [13]. Other basic insights gained from sequencing the P. falciparum genome included the location of genes involved in antigenic variation in subtelomeric regions and the relative abundance, compared with the genomes of free-living eukaryotes, of genes involved in immune evasion and host-parasite interactions.

The second human malaria parasite to be fully sequenced was P. vivax. Its genome was completed in 2008 and, like those of all sequenced malaria parasites, contains 14 chromosomes. Slightly larger than that of P. falciparum, the P. vivax genome is ~26.8 megabases and contains approximately 5400 nuclear-encoded genes. A key difference between the P. vivax and P. falciparum genomes is their G+C content: whereas P. falciparum had the lowest G+C content of any genome sequenced at the time, P. vivax has an average G+C content of 42.3% [4], the highest of all sequenced Plasmodia [17].


1.3 Post-Genomics: Fundamental Research

Since the complete P. falciparum genome was published in 2002, draft and complete genome sequences of several more parasites from various hosts have become available (Table 1.1). The availability of multiple genome sequences within the Plasmodium clade allows researchers to relate functional differences among species to their genetic differences and provides an evolutionary view of genes, potentially highlighting genes under selection that could serve as vaccine or drug targets. A key example of such comparative genomics is the study of the differing invasion machineries of P. falciparum and P. vivax. Phenotypic differences between the two species include the restriction of P. vivax invasion to reticulocytes expressing the Duffy antigen, an obligatory receptor for vivax invasion of host erythrocytes [18]. This restriction led some to speculate that the P. vivax invasion machinery was less complex than that of P. falciparum [4]. Sequencing and analysis of the P. vivax genome, however, revealed an expansion of the reticulocyte binding protein family that could provide diversity similar to that of P. falciparum [19].

Comparative genomics is not limited to studying similarities among members of gene families. Comparison of parasite and host genomes can provide insight into the feasibility of potential drug targets. Many enzymes that are essential for parasite survival, and would therefore be attractive drug targets, are too similar to their host counterparts to be inhibited by drugs without also harming the host. Cellular pathways and processes currently considered promising sources of drug targets include metabolism, DNA replication and transcription, and protein modification enzymes [20, 21].

Studies characterizing the transcriptome of P. falciparum have revealed novel parasite biology and validated gene models. Initial transcriptomic studies of intraerythrocytic parasites showed patterns of gene expression unique to malaria. Using a large-scale parasite culture system, parasite transcripts were measured with an oligonucleotide array every hour after invasion [22]. The results showed that transcripts are produced only at the point in the cell cycle when they are needed, and that P. falciparum nuclear genes are not polycistronic, whereas genes located in the apicoplast genome are coregulated [23].

Table 1.1. Malaria Sequencing Efforts to Date: Parasites were selected for sequencing primarily on the basis of infectivity in humans, use as animal models of parasite biology, and phylogenetic relationships. Citation: PWG = Plasmodium Writing Group white paper for sequencing malarial genomes; WTSI = Wellcome Trust Sanger Institute. Host: H = human, N = non-human primate, R = rodent, B = bird, L = lizard.

*P. falciparum isolates are currently in multiple stages of sequencing development.


Species | Strain/Isolate/Location | Coverage/Status | Host | Citation
P. falciparum | 3D7 | Complete [Published] | H, N | [11]
P. falciparum | >25 isolates | ---* | H, N | PWG
P. vivax | Salvador I | 10x [Published] | H | [4]
P. knowlesi | H | 8x [Published] | H, N | -
P. ovale | Nigeria I | 8x [Incomplete] | H | WTSI
P. coatneyi | Malaysia | in progress | N | PWG
P. cynomolgi | Berok | in progress | N | PWG
P. fragile | Sri Lanka | in progress | N | PWG
P. inui | OS | in progress | N | PWG
P. reichenowi | Oscar | in progress | N | WTSI
P. berghei | ANKA | 3x [Published] | R | [5]
P. berghei | NK65 | in progress | R | PWG
P. chabaudi | AS | 8x [Published] | R | [5]
P. vinckei | P. v. vinckei | in progress | R | PWG
P. vinckei | P. v. petteri | in progress | R | PWG
P. yoelii | 17XNL | 5x [Published] | R | -
P. yoelii | 17XA | in progress | R | PWG
P. yoelii | 17XYM | in progress | R | PWG
P. gallinaceum | A | 3x [Incomplete] | B | WTSI
P. relictum | K1 | in progress | B | PWG
P. relictum | KV115 | in progress | B | PWG
P. mexicanum | U.S.A. | in progress | L | PWG


1.4 Malaria Genomics Helping the Developing World

Nearly every paper reporting results of genome sequencing projects, transcriptome studies and other malaria genomics research claims that the reported advances will lead to the identification of new drug and vaccine targets. The first generation of malaria genomics studies focused on a small number of parasite strains that had been cultured in the laboratory for many generations, unexposed to human (or, for that matter, mosquito) immunity and other environmental stimuli. Less widely appreciated is the notion that sequencing large numbers of wild parasite isolates from the field, accompanied by demographic, clinical and parasite phenotype information, will directly inform vaccine development [24] and the discovery of mechanisms of drug action and markers of drug resistance. These kinds of genomic epidemiology studies, now underway, are likely to yield meaningful public health benefits for the developing world in the not-too-distant future.

1.4.1 Vaccine Target Discovery and Vaccine Development

Vaccine target discovery studies sequence and compare multiple isolates of the same species to locate highly polymorphic genes [25] and genes under diversifying selection [26], which are likely to encode immunogenic antigens. These approaches are validated by applying the same methods to known antigens, which show similar levels of selection and/or polymorphism. The validity of this approach is constrained, however, by the limited number of malaria antigens that have demonstrated meaningful efficacy as vaccines in humans: at present just one such antigen, the circumsporozoite protein, is used in a vaccine formulation that prevents clinical malaria with modest but significant efficacy [27]. More highly polymorphic blood stage antigens such as merozoite surface protein 1 (MSP1) and apical membrane antigen 1 (AMA1) were thought to hold high promise as vaccine candidates on the basis of in vitro and animal studies, but initial human trials of monovalent MSP1 [28] and bivalent AMA1 [29] vaccines have shown no efficacy against clinical malaria.
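One simple way to rank genes by polymorphism, as in the discovery studies described above, is nucleotide diversity (π): the average number of pairwise differences per site among aligned sequences. The sketch below assumes equal-length, pre-aligned sequences and treats every character (including gaps) as a state; it illustrates the idea and is not necessarily the statistic used in the cited studies. The sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi) for equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair of sequences
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


# Hypothetical aligned fragments of one gene from four isolates
isolates = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
pi = nucleotide_diversity(isolates)
print(f"pi = {pi:.4f}")  # 7 differences over 6 pairs x 10 sites
```

Genes with unusually high π relative to the genome-wide distribution would be candidate antigens; tests aimed specifically at selection (for example dN/dS ratios or Tajima's D) capture information that π alone does not.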

These disappointing results are likely due in part, if not chiefly, to insufficient cross-protection against malaria parasites with highly diverse forms of the vaccine antigens [24, 30], although insufficient immunogenicity may also contribute to the lack of protective efficacy to date [31].

The extremely polymorphic blood stage antigen AMA1 demonstrates the pitfalls of using evidence of immune selection to pick vaccine candidates. Among 506 AMA1 sequences from parasites collected over three years in a single town in Mali, West Africa, 214 unique AMA1 haplotypes were detected. In the worst case, this might mean that an approximately 200-valent vaccine would be needed to provide complete protection against the diverse forms of malaria in a single rural town. In hopes of identifying a reduced number of variants that would be needed for a broadly protective polyvalent or chimeric AMA1 vaccine, in vitro and animal studies [32] and molecular epidemiological approaches [33] have been used to try to pinpoint the polymorphic AMA1 codons that are most important in determining allele-specific immune responses. Based on these approaches, a single cluster of eight polymorphic codons was identified that could be used to define about 10 haplotypes that might cover 80% of natural variants. Although a 10-valent malaria vaccine might still be infeasible, this complexity is within the range of currently licensed vaccines. Sequencing AMA1 from malaria episodes experienced by children immunized with a highly immunogenic AMA1 vaccine [34] in a recently completed field efficacy trial may permit further narrowing of the number of variants needed for a cross-protective AMA1 vaccine.
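The coverage calculation behind a statement such as "about 10 haplotypes might cover 80% of natural variants" can be sketched as follows. The sketch assumes each infection has already been reduced to a string of residues at the selected codons and counts identical strings as one haplotype; the cited studies may have grouped related haplotypes differently, and the residues below are hypothetical.

```python
from collections import Counter


def haplotypes_for_coverage(haplotypes, target=0.80):
    """Smallest set of most frequent haplotypes whose combined frequency reaches target.

    Taking haplotypes in descending frequency order is optimal here: for any k,
    the k most common haplotypes cover the largest possible share of samples.
    """
    counts = Counter(haplotypes)
    n = len(haplotypes)
    chosen, covered = [], 0
    for hap, count in counts.most_common():
        if covered / n >= target:
            break
        chosen.append(hap)
        covered += count
    return chosen, covered / n


# Hypothetical residues at an eight-codon cluster, one string per sampled infection
samples = ["KNEDIKHN"] * 5 + ["KNEDIRHN"] * 3 + ["ENQDIKHN", "KDEDIKYN"]
chosen, coverage = haplotypes_for_coverage(samples, target=0.80)
print(len(chosen), coverage)  # prints: 2 0.8
```

Applied to real field sequences, the length of `chosen` approximates the valency a polyvalent vaccine would need to reach the target coverage.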

Recent serological profiling studies using a protein array containing 1200 recombinant proteins from the P. falciparum reference genome have suggested that protective humoral immune responses are directed against large numbers of malaria antigens [35]. This finding may help to explain the difficulty of achieving high and sustained efficacy with malaria vaccines based on just one or a few malaria antigens [24, 36].

In the face of the failure of such subunit vaccines, as well as DNA and viral vectored vaccines, to provide high-level protection, the concept of whole-organism vaccines has recently been revisited [37]. A radiation-attenuated, metabolically active, non-replicating whole sporozoite vaccine has been manufactured in and purified from aseptically raised mosquitoes [38] and was recently evaluated for safety and efficacy in an experimental sporozoite challenge trial in humans. If early clinical trials of sporozoite vaccines in the United States and Europe demonstrate protection against homologous challenge with the same parasite clone used to make the vaccine, it will then be necessary to assess efficacy against natural heterologous challenge in high-transmission settings such as Africa. It is hoped that immunizing with the very large number of antigens expressed by the whole organism will generate enough redundancy in protective immune responses to protect against diverse parasites. In the likely event that protection against diverse natural challenge afforded by a single-strain sporozoite vaccine is less than complete, comparing the genomes of breakthrough infections in vaccinated people with those of infections in unvaccinated controls will inform the design of multi-strain vaccines. This novel type of comparative genomics should identify genes encoding proteins that are under directional selection by vaccine-induced immunity, thus identifying the antigens most responsible for protective efficacy.
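One way to look for such vaccine-driven ("sieve") effects is a per-locus 2×2 comparison: at each polymorphic site, count breakthrough infections carrying the vaccine-strain allele versus another allele in vaccinated participants and in controls. The sketch below implements a two-sided Fisher's exact test with the standard library only; the counts are hypothetical, and a genome-wide analysis would also need correction for multiple testing and for confounders.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: vaccinated, control. Columns: vaccine-strain allele, other allele.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical counts at one locus: the vaccine-type allele is rarer in vaccinees' infections
p = fisher_exact_two_sided(1, 9, 11, 3)
print(f"p = {p:.4f}")
```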

1.4.2 Drug Resistance Mechanisms and Markers

The recent emergence in Southeast Asia of P. falciparum resistance to the leading class of antimalarial drugs [39, 40] is a problem of urgent public health importance that malaria genomics can help to address by identifying genetic loci associated with resistance that can serve as molecular markers for resistance surveillance. Such markers for other antimalarial drugs were identified in the pre-genomic era, but without current genomic resources and technologies the process took so long that resistance had already spread globally by the time the markers were identified and validated as surveillance tools [41]. Genomic science has the potential to greatly accelerate this process, particularly as genome sequencing shifts from complete sequencing and assembly of a limited number of genomes to less comprehensive but higher-throughput genome-wide genotyping of large numbers of samples from field studies.

1.4.2.1 Drug Resistance Mechanisms and Markers in the Pre-Genomic Era

Nearly a decade before the international effort to sequence the P. falciparum genome began in earnest in 1996, the gene encoding P. falciparum dihydrofolate reductase (dhfr) was cloned and sequenced using primers based on consensus sequences of dhfr genes from other organisms [42]. Sequencing dhfr from P. falciparum strains sensitive and resistant to pyrimethamine and other antifolate drugs quickly identified a set of single nucleotide polymorphisms (SNPs) that caused resistance [43, 44] and that had potential use as surveillance tools [45].
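Resistance-conferring changes in genes such as dhfr are usually reported as amino acid substitutions (reference residue, codon number, new residue), for example the dhfr S108N change. A minimal sketch of that comparison for two in-frame coding sequences, using the standard genetic code, is shown below; the sequences are hypothetical toy fragments, not real dhfr sequence.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def amino_acid_substitutions(ref_cds, query_cds):
    """List non-synonymous changes such as 'S2N' between two in-frame coding sequences."""
    ref_cds, query_cds = ref_cds.upper(), query_cds.upper()
    if len(ref_cds) != len(query_cds) or len(ref_cds) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    changes = []
    for pos in range(0, len(ref_cds), 3):
        ref_aa = CODON_TABLE[ref_cds[pos:pos + 3]]
        alt_aa = CODON_TABLE[query_cds[pos:pos + 3]]
        if ref_aa != alt_aa:
            # Codon numbers are 1-based, as in names like S108N
            changes.append(f"{ref_aa}{pos // 3 + 1}{alt_aa}")
    return changes


# Hypothetical fragments: codon 2 AGC->AAC changes Ser to Asn; codon 3 AAA->AAG is synonymous
reference = "ATGAGCAAATGG"
resistant = "ATGAACAAGTGG"
print(amino_acid_substitutions(reference, resistant))  # prints: ['S2N']
```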

Chloroquine-resistant P. falciparum emerged on the Thailand-Cambodia border in the late 1950s, spread throughout the region, and then disseminated globally, arriving in Africa in the late 1970s. The search for a chloroquine resistance marker was less straightforward than that for antifolate resistance. With no known mechanism of resistance, it was not possible to search for an orthologous candidate gene. Lacking an assembled genome that would permit modern approaches to locating the genetic basis of chloroquine resistance, such as genome-wide association studies (GWAS), researchers completed a genetic cross in the mid-1980s between the chloroquine-sensitive HB3 clone of P. falciparum from Honduras and the chloroquine-resistant clone Dd2, a parasite of Indochinese lineage [46]. The parental clones were mixed in culture and fed to mosquitoes, in which recombination occurred, and the mosquitoes were then allowed to take a blood meal on splenectomized chimpanzees. Initial mapping of the resulting progeny used 85 restriction fragment length polymorphisms (RFLPs) across the 14 chromosomes. This genetic cross showed that neither of two known Plasmodium multidrug resistance-like candidate genes was associated with the resistance phenotype, but it identified a ~400 kilobase region on chromosome 7 that was. Because this region was postulated to contain 80 to 100 protein-coding genes, further mapping was needed to narrow it down [47, 48].
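In a cross between haploid parasites, each progeny clone inherits one parent's allele at every marker, so a marker tightly linked to the resistance locus should track the phenotype across progeny. The sketch below scores each marker by that concordance. It is a simplified single-marker illustration with hypothetical progeny and marker names; a formal linkage analysis would typically compute LOD scores.

```python
def marker_concordance(genotypes, resistant):
    """Per-marker fraction of progeny whose inherited allele matches their phenotype.

    genotypes: dict mapping marker name -> list of 0/1 per progeny clone
               (1 = allele inherited from the resistant parent).
    resistant: list of booleans, one per progeny clone, in the same order.
    """
    n = len(resistant)
    scores = {}
    for marker, alleles in genotypes.items():
        if len(alleles) != n:
            raise ValueError(f"marker {marker} has {len(alleles)} calls, expected {n}")
        matches = sum((allele == 1) == is_res for allele, is_res in zip(alleles, resistant))
        scores[marker] = matches / n
    return scores


# Hypothetical cross with eight progeny clones and three markers
resistant = [True, True, False, True, False, False, True, False]
genotypes = {
    "chr7_marker_a": [1, 1, 0, 1, 0, 0, 1, 0],  # co-segregates perfectly
    "chr7_marker_b": [1, 1, 0, 1, 0, 0, 0, 0],  # one recombinant
    "chr3_marker_c": [1, 0, 1, 0, 1, 0, 1, 0],  # unlinked
}
for marker, score in sorted(marker_concordance(genotypes, resistant).items()):
    print(marker, score)
```

Markers near 1.0 bound the candidate region; recombinant progeny (such as the one separating marker_a from marker_b) are what allow the interval to be narrowed.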

To pinpoint the genetic determinant of resistance, a high-resolution linkage map was created using 342 microsatellites, or simple sequence repeats [49]. Over several years, and with some false alarms, this map was used to narrow the chloroquine resistance locus to 36 kb, and by directly sequencing this region in cross progeny and in geographically diverse isolates, the P. falciparum chloroquine resistance transporter (PfCRT) was identified as the primary determinant of chloroquine resistance. The central role of PfCRT in both in vitro and clinical chloroquine resistance was demonstrated in genetic transformation studies [50], and a SNP in PfCRT was validated as a molecular marker for surveillance of chloroquine resistance in field studies [51]. PCR-based protocols for detecting the marker in DNA extracted from filter-paper blood spots collected by finger-prick were disseminated through the World Health Organization even before the research results were published, and these assays were widely deployed throughout the malaria-endemic world [52-54], providing an example of the potential for genomics to improve public health in developing countries. However, these results were published in 2000 and 2001, about 15 years after the effort to identify the genetic basis of chloroquine resistance began, and after chloroquine efficacy had already been compromised in many parts of the world by the global dissemination of resistant forms of PfCRT.

1.4.2.2 An Opportunity to Deter Drug Resistance

As resistant forms of PfCRT, DHFR and dihydropteroate synthase (the target of sulfa drugs) spread globally [55], chloroquine and the antifolate combination sulfadoxine-pyrimethamine lost efficacy against malaria, resulting in large increases in malaria deaths [56]. These older drugs were replaced as first-line therapies by the artemisinins, a class of compounds derived from qinghaosu, the active compound of the Chinese herb qinghao (Artemisia annua) [57]. Artemisinins are used in conjunction with one or more partner drugs as artemisinin-based combination therapies (ACTs). ACTs are fast-acting, effective, and safe drugs that represent the last line of defense in areas with multidrug resistance.

While the partnering of drugs into combination therapies is meant to deter resistance [58], it is probable that years of artemisinin monotherapy distribution, along with use of substandard or counterfeit artemisinins and ACTs, contributed to the emergence of resistance [59], which was recently reported on the Thailand-Cambodia border [39, 40].

If artemisinin resistance follows the patterns established by chloroquine and antifolate resistance, which also originated along the Thailand-Cambodia border before disseminating globally, malaria deaths can be expected to increase sharply once again, reversing recent downward trends in malaria incidence and mortality and threatening to derail the renewed effort to eradicate malaria [60]. The World Health Organization is coordinating an urgent effort to contain resistance in Cambodia [61], but this initiative is hampered by a lack of knowledge about whether, and in what direction(s), resistance may be spreading from its site of origin. A molecular marker for resistance would greatly aid the containment effort and would provide a valuable tool for surveillance at sentinel sites where clinical resistance has not yet been observed.

Because they rely on dried blood spots that can be collected by finger-prick and require no cold chain, molecular surveillance tools can be standardized and deployed more widely and rapidly than surveillance based on clinical protocols or on in vitro assays that require frozen venous blood [54]. Research to identify artemisinin resistance markers has thus far mainly followed the same candidate gene approach that was used to identify antifolate resistance markers in the last century, focusing on genes known to play a role in resistance to other drugs or hypothesized to be involved in purported mechanisms of drug action. These approaches have not yet explained the mode of action of artemisinins or the mechanism of artemisinin resistance, and no candidate gene has so far been associated with delayed parasite clearance [62]. A comprehensive genome-wide search for the molecular basis of resistance is therefore not only warranted but urgent.

Even with the malaria genome sequence in hand and rapidly improving next-generation sequencing platforms making it possible to sequence large numbers of parasite isolates, several challenges remain to using genomics to identify artemisinin resistance markers. First, a clearly defined, reproducible and genetically inherited phenotype is needed. The reference genome, along with SNP discovery studies, has allowed the creation of genome-wide diversity maps [25, 63], which have been used to design SNP arrays. These arrays allow rapid and cost-effective genotyping of hundreds to thousands of parasites. The first GWAS in P. falciparum used an array with about 3000 SNPs. This study identified genetic loci associated with in vitro susceptibility to artemisinins, measured by culture-adapting field isolates and testing their ability to survive in the presence of different concentrations of artemisinins [26]. However, the relevance, if any, of these loci to clinical resistance remains unknown, because the in vitro phenotype correlates poorly with delayed clearance of parasites following treatment with ACTs, the main clinical manifestation of the recently documented in vivo resistance.
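In its simplest form, a GWAS of a quantitative phenotype (for example an in vitro susceptibility measure or a clearance half-life) fits a separate regression of phenotype on genotype at each SNP. The sketch below assumes haploid 0/1 calls, which is reasonable for clonal blood-stage infections but is an assumption, and reports each SNP's slope and t-statistic. Real analyses would also adjust for population structure and multiple testing; the phenotype values and SNP names are hypothetical.

```python
from math import copysign, inf, sqrt


def snp_regression(genotypes, phenotype):
    """Least-squares fit of phenotype on a 0/1 haploid genotype at one SNP.

    Returns (slope, t_statistic); with a 0/1 predictor the slope equals the
    difference in mean phenotype between allele-1 and allele-0 carriers.
    """
    n = len(phenotype)
    if n != len(genotypes) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(genotypes) / n, sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, phenotype))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, phenotype))
    se = sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return slope, (0.0 if slope == 0 else copysign(inf, slope))
    return slope, slope / se


def scan(snp_table, phenotype):
    """Run snp_regression on every polymorphic SNP; monomorphic SNPs are skipped."""
    return {
        snp_id: snp_regression(calls, phenotype)
        for snp_id, calls in snp_table.items()
        if len(set(calls)) > 1
    }


# Hypothetical phenotype (e.g. clearance half-life in hours) for six isolates
phenotype = [2.0, 2.2, 1.8, 6.1, 5.9, 6.0]
snp_table = {"snp_A": [0, 0, 0, 1, 1, 1], "snp_B": [0] * 6, "snp_C": [0, 1, 0, 1, 0, 1]}
for snp_id, (slope, t) in scan(snp_table, phenotype).items():
    print(snp_id, round(slope, 2), round(t, 1))
```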

Genetically identical parasite strains identified in multiple patients in western Cambodia share similarly fast or slow parasite clearance rates, indicating that these phenotypes may be linked to distinct heritable determinants and that clearance rate is a suitable phenotype for GWAS [64]. In addition to standard regression methods for GWAS, machine learning methods such as random forests [65] offer several advantages for analyzing large genetic datasets, including high accuracy and the ability to assess interactions. Complementary to GWAS, methods that detect signatures of selection locate features of the genome, such as regions of reduced heterozygosity and extended haplotypes, that could have resulted from recent selection. Because these methods are entirely phenotype-independent, they avoid the need for costly clinical or in vitro investigations to measure phenotypes [66-69].
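Reduced heterozygosity, one of the selection signatures mentioned above, can be scanned with a simple windowed statistic. The sketch below computes expected heterozygosity, 1 − Σ p_i², at each SNP across isolates and averages it in fixed windows; unusually low windows would be candidates for recent selective sweeps. The 10 kb window, the treatment of missing calls (`None`) and the data are assumptions, and extended-haplotype statistics such as EHH or iHS are not shown.

```python
from collections import Counter


def expected_heterozygosity(calls):
    """Expected heterozygosity 1 - sum(p_i^2) at one locus; None entries are missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return None
    n = len(observed)
    return 1.0 - sum((k / n) ** 2 for k in Counter(observed).values())


def windowed_heterozygosity(positions, loci_calls, window=10_000):
    """Mean expected heterozygosity in non-overlapping windows of `window` bp on one chromosome."""
    sums, counts = {}, {}
    for pos, calls in zip(positions, loci_calls):
        he = expected_heterozygosity(calls)
        if he is None:
            continue  # locus with no usable calls
        start = (pos // window) * window
        sums[start] = sums.get(start, 0.0) + he
        counts[start] = counts.get(start, 0) + 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Hypothetical calls (one list per SNP, one entry per isolate) on a single chromosome
positions = [1_200, 4_800, 15_000, 18_000]
loci_calls = [
    ["A", "T", "A", "T"],
    ["G", "G", "C", "C"],
    ["A", "A", "A", "A"],
    ["C", "C", None, "T"],
]
for start, he in windowed_heterozygosity(positions, loci_calls).items():
    print(f"{start}-{start + 9_999} bp: mean He = {he:.3f}")
```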

Several other impediments remain for genomic studies aimed at identifying genetic determinants of clinically important phenotypes such as drug resistance. Because resistant parasites are found in remote rural sites far from established cold chains, finding, collecting, preserving, and transporting samples from the field to genotyping and sequencing centers is fraught with challenges. Current genotyping and next-generation sequencing platforms require relatively large amounts of parasite DNA with little human DNA contamination. This requires either costly and cumbersome filtration of leukocytes from blood samples obtained from sick patients at remote field sites, or new, untested and relatively expensive methods for separating human and parasite DNA after extraction. Moreover, malaria control efforts targeting areas where resistance has emerged have reduced the incidence of malaria in some of these areas to the extent that it is increasingly difficult to enroll patients in the clinical trials needed to measure clinical resistance phenotypes and collect parasite samples. Finally, many residents of these areas are transient migrant workers or military personnel who may be reluctant or unable to participate in study activities, including the follow-up needed to measure parasite clearance rates. Close collaboration among genomic and clinical scientists and public health officials, together with multi-national, multi-site studies, is therefore required to design and conduct genomic epidemiology studies aimed at identifying artemisinin resistance markers.

Recently the World Health Organization coordinated a set of such studies at four sites in Cambodia, Thailand and Bangladesh, and a larger study at as many as 15 sites, including two sentinel sites in Africa, is now getting underway. Whole-genome sequencing of field isolates without culture adaptation and cloning has been increasingly successful despite the limitations of low parasite DNA and contaminating human DNA. At the Wellcome Trust Sanger Institute, more than 1000 such field-collected samples, including hundreds from clinical trials of artemisinin efficacy, have been subjected to next-generation sequencing. Although the quality and coverage are not sufficient for complete de novo assembly, it is presently possible to genotype more than 80,000 SNPs in field samples with a high degree of confidence. The SNP calls resulting from these high-throughput sequencing efforts will be used for GWAS and studies of signatures of selection in the hope of identifying artemisinin resistance markers that can be used for surveillance and to guide containment efforts.

1.5 Global Collaboration

The sequencing of the malaria, mosquito and human genomes was a monumental scientific accomplishment. Realizing its full potential public health benefit will require collaboration among sequencing centers, clinical investigators and public health officials in developing countries. Malaria does not recognize national borders, and indeed drug resistance seems always to arise in international border regions. Efficient application of genomic science to vaccine design and development, and especially to the identification of drug resistance markers that can help contain emerging resistance, requires not only collaboration among scientists of different stripes but also close transnational collaboration and international coordination.

In the case of artemisinin resistance, the coordinating role has been taken up by the World Health Organization’s Global Malaria Programme. New programs such as the Worldwide Antimalarial Resistance Network (WWARN) [70, 71] and the Malaria Genomic Epidemiology Network (MalariaGEN) [72] are also working to bridge scientific and public health disciplines across international borders, creating internet-based tools that integrate clinical and genetic data and that are designed to be useful to both genomic and clinical scientists and, in the case of WWARN, to public health officials. These networks and programs have different mandates but share the goals of malaria control and eventual eradication, and continued collaboration is essential to translating advances in genomic science into public health benefits for the developing world.

1.6 Drug Resistance Update

Since this chapter was published, multiple studies of the clinical and genomic properties of artemisinin resistance have appeared. Most important is the discovery of a gene strongly associated with clinical and in vitro resistance to artemisinin. This gene, PF3D7_1343700, encodes a propeller protein made up of kelch domains as well as a conserved Plasmodium domain of unknown function. Because it resides on chromosome 13, the protein has been abbreviated kelch13, and it is referred to as such in the rest of this document. The function of this protein is unknown, but mutations in its six-bladed kelch propeller domain are associated with prolonged clearance half-lives in vivo and increased survival in a ring-stage survival assay. The first evidence of this gene's role in artemisinin resistance came from a mutation accumulation experiment in which an artemisinin-sensitive parasite line from Tanzania was made resistant by continuous culture in the presence of increasing drug concentrations.

Whole-genome sequencing at several points during the selection revealed eight point mutations in seven genes. Genomic evidence from our previous GWAS, which highlighted a region of chromosome 13 containing kelch13, helped single out this mutation from the others. Subsequent investigation of field-derived samples confirmed this in vitro result in clinically relevant samples. Numerous kelch13 mutations have been identified since this initial study, and evidence from our group points to both independent emergence and spread of resistant parasites.
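The clearance half-life used as the in vivo phenotype is derived from the slope of the log-linear decline in parasite density after treatment: if ln(density) falls at rate k per hour, the half-life is ln(2)/k. A minimal sketch, assuming all supplied points lie on the log-linear segment (dedicated tools such as WWARN's Parasite Clearance Estimator also handle lag and tail phases), with hypothetical counts:

```python
from math import log


def clearance_half_life(times_h, densities):
    """Parasite clearance half-life in hours from a log-linear least-squares fit.

    Fits ln(density) = a - k * t and returns ln(2) / k.
    """
    if len(times_h) != len(densities) or len(set(times_h)) < 2:
        raise ValueError("need paired measurements at two or more distinct times")
    if any(d <= 0 for d in densities):
        raise ValueError("densities must be positive to take logarithms")
    ys = [log(d) for d in densities]
    n = len(ys)
    mt, my = sum(times_h) / n, sum(ys) / n
    sxy = sum((t - mt) * (y - my) for t, y in zip(times_h, ys))
    sxx = sum((t - mt) ** 2 for t in times_h)
    k = -sxy / sxx  # clearance rate constant per hour
    if k <= 0:
        raise ValueError("parasite density is not declining")
    return log(2) / k


# Hypothetical counts (parasites/ul) that halve every 6 hours
times = [0, 6, 12, 18, 24]
counts = [80_000, 40_000, 20_000, 10_000, 5_000]
print(f"half-life = {clearance_half_life(times, counts):.2f} h")
```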


Chapter 2: A microarray platform and novel SNP calling algorithm to evaluate Plasmodium falciparum field samples of low DNA quantity

Jacob CG, Tan JC, Miller BA, Tan A, Takala-Harrison S, Ferdig MT, Plowe CV. A microarray platform and novel SNP calling algorithm to evaluate Plasmodium falciparum field samples of low DNA quantity. BMC Genomics 2014 Aug 26;15(1):71.

JCT helped with the design concept and calculation of probe length. BAM trained me in the initial chip protocol at the University of Notre Dame. AT provided the D-score algorithm for initial use and for comparison with the novel algorithm. STH, MTF and CVP provided guidance and supervision at UMB and Notre Dame.

2.1 Background

Tools for assessing genetic diversity in malaria parasites are of potential use for the discovery of novel malaria vaccine antigens [73] and for understanding the molecular basis of drug resistance [74-78]. Recent advances in genome sequencing technology offer high-throughput methods for obtaining full genomic data [63, 79]. However, these technologies still require relatively large quantities of high-quality DNA. Microarrays can tolerate DNA of the lower quantity and quality typical of patient-derived field samples, which contain far more human than parasite DNA. Thus, field samples unsuitable for full-genome sequencing may be amenable to microarray analysis. Microarrays can also be adapted to incorporate new loci, to detect copy number polymorphisms, or to answer specific research questions.


Microarrays have been developed for genotyping Plasmodium falciparum on a variety of platforms that detect single nucleotide polymorphisms (SNPs) at loci that had been described at the time of platform design [80-83]. Many novel SNPs were discovered in a large-scale P. falciparum sequencing project, resulting in a map of population genomic variation [79]. We sought to create a microarray that could genotype highly informative loci within this genomic map in field samples that were not amenable to next-generation sequencing. We also wanted a high-throughput system capable of working with DNA extracted from low-volume blood samples. We chose a custom NimbleGen 4.2 million probe design in multiplex format. This platform comprises 12 identical arrays on one slide, each capable of genotyping 33,716 loci within the P. falciparum genome. When dual-color labeling is used, two samples can be hybridized to a single array, yielding genotypes at 33,716 SNPs for 24 samples in a single 2-day experiment. Because several such slides can be run simultaneously, the approach is relatively high-throughput and inexpensive.
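The per-slide throughput follows directly from the figures above; the last line additionally uses the eight probes per SNP described under chip design and counts only SNP-interrogating probes, not control or random probes:

```latex
\begin{aligned}
12~\text{arrays} \times 2~\text{samples per array} &= 24~\text{samples per slide}\\
24~\text{samples} \times 33{,}716~\text{SNPs} &= 809{,}184~\text{genotypes per slide}\\
33{,}716~\text{SNPs} \times 8~\text{probes per SNP} &= 269{,}728~\text{SNP probes per array}
\end{aligned}
```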

2.2 Methods

2.2.1 SNP selection and chip design

A pooled set of all available SNPs was gathered from Version 1.0 of the MalariaGEN Community Project [79] and from a SNP set on an Affymetrix array used in a previous study [75]. This pooled set was filtered on the proportion of missing data in previous data sets, on hyper-heterozygosity, and on a minor allele frequency (MAF) of ≥1% in Southeast Asia and Africa. SNPs were prioritized by MAF, with precedence given to those with higher MAF values; the highest priority was given to SNPs that were variable in multiple populations. SNPs were ranked in order of priority, and a final filter of minimum inter-SNP distance was then applied. To reduce the number of large genomic gaps, lower-priority SNPs were given preference over high-priority SNPs that lay within 22 bp of another high-priority SNP. The 22 bp value was determined by systematically increasing the inter-SNP distance threshold and measuring the number of large genomic gaps and the total number of loci.
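The priority-then-spacing filter can be sketched as a greedy pass: SNPs are visited in descending priority, and each is kept only if no already-kept SNP on the same chromosome lies within 22 bp. This is one plausible reading of the procedure; treating "within" as at most 22 bp and collapsing the ranking into a single numeric priority score are assumptions, and the candidate SNPs are hypothetical.

```python
import bisect


def select_spaced_snps(snps, min_distance=22):
    """Keep SNPs in descending priority, skipping any within min_distance bp of a kept SNP.

    snps: iterable of (chromosome, position, priority) tuples.
    Returns the kept tuples sorted by chromosome and position.
    """
    kept_positions = {}  # chromosome -> sorted positions already kept
    selected = []
    for chrom, pos, priority in sorted(snps, key=lambda s: s[2], reverse=True):
        positions = kept_positions.setdefault(chrom, [])
        i = bisect.bisect_left(positions, pos)
        # Only the nearest kept neighbours on each side need to be checked
        left_ok = i == 0 or pos - positions[i - 1] > min_distance
        right_ok = i == len(positions) or positions[i] - pos > min_distance
        if left_ok and right_ok:
            bisect.insort(positions, pos)
            selected.append((chrom, pos, priority))
    return sorted(selected, key=lambda s: (s[0], s[1]))


# Hypothetical candidates: (chromosome, position, priority score such as MAF)
candidates = [("chr1", 100, 0.40), ("chr1", 110, 0.30), ("chr1", 150, 0.10), ("chr2", 500, 0.20)]
print(select_spaced_snps(candidates))
```

In this toy input the 0.30-priority SNP at position 110 is dropped because it sits 10 bp from a higher-priority SNP, while the lower-priority SNP at 150 is kept, mirroring the preference for lower-priority SNPs that fill gaps.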

SNPs from African and Southeast Asian parasites were chosen to provide full coverage of the P. falciparum genome, excluding telomeric regions and hyper-variable var gene clusters. Among the SNPs chosen for the array, 22,189 were variable in Africa, 12,723 in Southeast Asia, and 3,811 in Papua New Guinea; 6,007 were variable in both Africa and Southeast Asia. Also included on the array were 23 loci associated with drug resistance in the genes pfcrt, pfmdr-1, pfdhps, and pfdhfr. Across chromosomes, the median inter-SNP distance averaged 324 bp (Figure 2.1A).
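The summary statistic for inter-SNP spacing can be read as the mean, over chromosomes, of each chromosome's median gap between adjacent SNPs. A sketch under that interpretation, with hypothetical positions:

```python
from statistics import mean, median


def mean_of_chromosome_medians(snp_positions):
    """Mean, across chromosomes, of each chromosome's median gap between adjacent SNPs.

    snp_positions: dict chromosome -> positions (any order); chromosomes with
    fewer than two SNPs are skipped.
    """
    medians = []
    for positions in snp_positions.values():
        ordered = sorted(positions)
        gaps = [b - a for a, b in zip(ordered, ordered[1:])]
        if gaps:
            medians.append(median(gaps))
    if not medians:
        raise ValueError("no chromosome has two or more SNPs")
    return mean(medians)


# Hypothetical positions on two chromosomes
positions = {"chr1": [100, 400, 1000, 1300], "chr2": [650, 50, 250]}
print(mean_of_chromosome_medians(positions))
```

Using per-chromosome medians keeps a few very long gaps (for example around excluded var clusters) from dominating the summary.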


Figure 2.1 Microarray probe statistics. A) Histogram of inter-SNP distances between adjacent SNPs. B) Boxplot showing the distribution of inter-SNP distances where allele variability has been shown. C) Histogram of probe G + C content. D) Histogram showing the distribution of probe lengths.


The probes designed for the array mostly had low GC content (Figure 2.1C) and variable lengths (29–41 bp) (Figure 2.1D). Variable-length probes provide higher average intensities than fixed-length probes, and the low GC content reflects the AT richness of the P. falciparum genome [80]. Standard NimbleGen probes were also placed on the array, including alignment probes, chip identification probes, and random probes designed to match the composition of our user-designed probes. The average intensity of 8,421 random probes was used to define the global array background.

The NimbleGen 4.2M Probe Custom DNA Microarray comes in several formats, from which we selected the one with the highest number of plexes (12) to increase the number of samples for high-throughput field studies. Probe length was determined by the predicted melting temperature of the sequence surrounding each SNP. For every SNP there were eight probes: four in the sense direction, each with a different base at the center position, and four in the antisense direction. Probe quartets were arranged sequentially on the array to avoid variation due to chip defects and spatial bias. This array is no longer available, as NimbleGen has discontinued custom microarray production. Steps are being taken to transfer this probe set and analysis pipeline to a new commercial platform.
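The probe layout can be sketched as follows: grow a window centred on the SNP until its predicted melting temperature reaches a target, then write four sense probes (one per centre base) and their four reverse complements. The text does not state the Tm model or target used, so this sketch assumes the basic GC-content formula Tm = 64.9 + 41 × (G + C − 16.4) / N, a hypothetical 55 °C target, and odd lengths between 29 and 41 bp so that the SNP sits exactly at the centre. The template is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def basic_tm(seq):
    """Approximate Tm (deg C) with the basic GC formula, intended for oligos longer than 13 nt."""
    gc = sum(base in "GC" for base in seq)
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def design_probe_set(genome, snp_index, target_tm=55.0, min_len=29, max_len=41):
    """Eight probes centred on a SNP: four sense and four antisense, one per centre base.

    The window grows symmetrically (odd lengths only) until basic_tm reaches target_tm;
    if the target is never reached, the longest window is used.
    """
    for length in range(min_len, max_len + 1, 2):
        half = length // 2
        start, end = snp_index - half, snp_index + half + 1
        if start < 0 or end > len(genome):
            raise ValueError("SNP too close to the end of the sequence")
        window = genome[start:end]
        if basic_tm(window) >= target_tm:
            break
    probes = {}
    for base in "ACGT":
        sense = window[:half] + base + window[half + 1:]
        antisense = reverse_complement(sense)
        probes[("sense", base)] = sense
        # Reversal keeps the centre of an odd-length probe in place
        probes[("antisense", antisense[half])] = antisense
    return probes


# Hypothetical AT-rich template with the SNP at index 60
template = "AAT" * 20 + "G" + "TAA" * 20
probes = design_probe_set(template, 60)
for key, probe in sorted(probes.items()):
    print(key, len(probe), round(gc_fraction(probe), 2), probe)
```

On AT-rich sequence such as this the target is hard to reach, so the probes extend to the maximum length, consistent with the observation that longer probes compensate for low GC content.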

2.2.2 DNA labeling and hybridization

Sample DNA was concentrated by vacuum centrifugation to a volume of 30–50 μl and heat-denatured with 1 OD of 65% random nonamers labeled with Cy3 or Cy5 for 10 minutes at 98°C. Denatured DNA was chilled on ice for 2 minutes and then incubated for 2 hours at 37°C in the presence of 50 units of Klenow fragment and a 50X dNTP mixture. The reaction was terminated with 0.5 M EDTA, and DNA was precipitated with 5 M NaCl and isopropanol. Labeled DNA was washed 2–3 times with ice-cold 80% ethanol to remove unincorporated dye. After removal of the ethanol, samples were rehydrated in water, and Cy3- and Cy5-labeled samples were combined for multiplexing.

Samples were vacuum-concentrated and resuspended in a buffer containing NimbleGen alignment oligo and 1X Denhardt’s solution. The samples were heat-denatured at 95°C for 5 minutes and stabilized at 42°C before loading onto the array. Samples were hybridized on a NimbleGen hybridization station for 16–24 hours at 42°C. Slides were disassembled in a dish containing Wash Buffer 1 at 42°C and then washed in Wash Buffer 1, Wash Buffer 2, and Wash Buffer 3 for 2 minutes, 1 minute, and 15 seconds, respectively. Slides were washed and then dried in the SlideWasher 12 Array Processing System. Microarrays were scanned with a NimbleGen MS 200 Microarray Scanner at 2 μm resolution using “autogain” to adjust scanning parameters automatically for each array.

2.2.3 Heuristic base calling algorithm

Each SNP typed by this array is at a bi-allelic locus, as determined from extensive sequencing of global P. falciparum isolates [79]. The heuristic algorithm therefore focuses on the intensities of the two possible alleles. The algorithm computes the global mean intensity of all probes with the same center base and adjusts each individual intensity by the difference between the global means of the two bases being interrogated. Intensities are evaluated and adjusted independently for the sense and antisense directions. After adjustment, a SNP is called only if it fulfills all of the following criteria: 1) the contrast of intensities is greater than or equal to 0.98, 2) the forward and reverse SNP calls are concordant, and 3) all intensities are above the global background level, defined as the average intensity of the random probes.

Multiple background thresholds were tested, up to the average random-probe intensity plus two standard deviations (data not shown). Raising the threshold changed SNP calling accuracy minimally but reduced the SNP call rate more substantially. The algorithm was written in the Perl programming language and uses standard outputs from the Roche NimbleScan (v2.6) software. Because this microarray has been discontinued, the algorithm will be made publicly available once the probe set is validated on a new platform.
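The three calling criteria can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Perl implementation: the contrast formula |x − y| / (x + y), the rule of subtracting the global-mean difference from the allele whose center base has the higher mean, and all function names are assumptions, because the text does not give exact formulas. The `k_sd` parameter mirrors the background thresholds tested above (random-probe average plus up to two standard deviations).

```python
from statistics import mean, pstdev

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def global_means(probes):
    """Mean intensity for each probe center base; probes yields (center_base, intensity)."""
    sums, counts = {}, {}
    for base, intensity in probes:
        sums[base] = sums.get(base, 0.0) + intensity
        counts[base] = counts.get(base, 0) + 1
    return {base: sums[base] / counts[base] for base in sums}


def background_threshold(random_probe_intensities, k_sd=0.0):
    """Average random-probe intensity plus k_sd standard deviations (k_sd=0: plain average)."""
    return mean(random_probe_intensities) + k_sd * pstdev(random_probe_intensities)


def contrast(x, y):
    """ASSUMED contrast measure |x - y| / (x + y); negative adjusted values are clamped to 0."""
    x, y = max(x, 0.0), max(y, 0.0)
    total = x + y
    return abs(x - y) / total if total > 0 else 0.0


def strand_call(raw, alleles, gmeans, min_contrast=0.98, antisense=False):
    """Call one strand of a bi-allelic SNP, or return None.

    raw maps each allele (sense orientation) to its raw intensity; on the antisense
    strand the probe center base is the complement of the allele.
    """
    a, b = alleles
    key_a, key_b = (COMPLEMENT[a], COMPLEMENT[b]) if antisense else (a, b)
    diff = gmeans[key_a] - gmeans[key_b]
    # ASSUMED rule: lower the allele whose center base has the higher global mean.
    adj_a = raw[a] - max(diff, 0.0)
    adj_b = raw[b] - max(-diff, 0.0)
    if contrast(adj_a, adj_b) < min_contrast:
        return None
    return a if adj_a > adj_b else b


def call_snp(sense, antisense, alleles, gmeans_sense, gmeans_antisense,
             background, min_contrast=0.98):
    """Criterion 3 (background), criterion 1 (contrast, per strand), criterion 2 (concordance)."""
    if min(list(sense.values()) + list(antisense.values())) <= background:
        return None
    s = strand_call(sense, alleles, gmeans_sense, min_contrast)
    r = strand_call(antisense, alleles, gmeans_antisense, min_contrast, antisense=True)
    return s if s is not None and s == r else None


if __name__ == "__main__":
    # Hypothetical global means in which G/C probes run brighter than A/T probes.
    gm_sense = {"A": 150.0, "C": 280.0, "G": 300.0, "T": 100.0}
    gm_anti = {"A": 100.0, "C": 300.0, "G": 280.0, "T": 150.0}
    bg = background_threshold([40.0, 50.0, 60.0])
    probe = {"T": 1000.0, "G": 205.0}
    print(call_snp(probe, probe, ("T", "G"), gm_sense, gm_anti, bg))  # → T
```

In this toy example the raw contrast, `contrast(1000, 205)`, is only about 0.66, so an unadjusted caller would make no call; subtracting the 200-unit G/T global-mean difference from the G probe lifts the contrast above 0.98 and yields a concordant T call on both strands.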

2.2.4 Experimental samples

P. falciparum (3D7 strain) was grown in culture under standard conditions from stocks procured from the MR4 repository. Base parasitemia was determined by light microscopy, and dilutions were made in human whole blood to simulate 1,000, 10,000, 100,000, and 500,000 parasites/μl. Leukocyte depletion was performed on a subset of samples using 2.5 mL of blood in a CF11 column. Too few parasites were available to prepare the 2.5 mL of blood required for leukocyte depletion of the 100,000 and 500,000 parasites/μl mixtures. DNA was extracted from these samples with the Qiagen mini-kit. Purified Dd2 and 3D7 DNA was also obtained from the MR4 repository. Purified NF54 parasite DNA was generously donated by Sanaria Inc. in Rockville, MD. Whole blood samples from field isolates came from studies conducted in Southeast Asia [84]. A subset of samples was whole-genome amplified before genotyping using a Qiagen REPLI-g mini kit. Simple linear regression was used to test the relationship of DNA quantity with SNP call rate and SNP call accuracy. Informed consent was obtained from participants or their parents or guardians for samples collected as part of clinical trials, following protocols approved by the Research Ethics Review Committee of the World Health Organization and local Institutional Review Boards (IRBs), and samples were analyzed following a protocol approved by the University of Maryland, Baltimore IRB.

2.2.5 Quantitative PCR

The P. falciparum gene encoding the 18S ribosomal RNA was amplified by qPCR for each sample. In a total reaction volume of 25 μl, 2 μl of sample DNA was combined with 10 μM probe, 10 μM of each primer, water, and TaqMan Universal PCR Master Mix (containing AmpliTaq Gold DNA Polymerase, dNTPs, and dUTP). The primer and probe sequences were as follows: Forward, 5′-GTA ATT GGA ATG ATA GGA ATT TAC AAG GT-3′; Reverse, 5′-TCA ACT ACG AAC GTT TTA ACT GCA AC-3′; Probe, 5′-FAM GAA CGG GAG GTT AAC AA MGB-3′. PCR conditions were 15 minutes at 95°C followed by 45 cycles of 15 seconds at 95°C and 1 minute at 60°C. For quantification, a standard curve and a no-DNA control were run on each plate. Standard curve DNA was derived from purified NF54 parasite DNA and quantified using SYBR Green. This DNA was diluted to 3, 1.5, 0.75, 0.375, 0.188, 0.094, and 0.047 ng per μl, and each standard and sample was run in duplicate, with the final quantity expressed as the mean of both values.
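Quantification against the standard curve can be sketched as follows, assuming the usual log-linear relationship between cycle threshold (Ct) and input quantity, fitted by least squares over every standard well. The function names are hypothetical, and the Ct values in the example are invented for illustration; only the dilution series comes from the text.

```python
import math

# Standard-curve concentrations (ng/μl) listed above; each is run in duplicate.
STANDARDS_NG_PER_UL = [3, 1.5, 0.75, 0.375, 0.188, 0.094, 0.047]


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept over all standard wells."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to recover ng/μl for one well."""
    return 10 ** ((ct - intercept) / slope)


def sample_quantity(duplicate_cts, slope, intercept):
    """Final quantity = mean of the quantities estimated from the duplicate wells."""
    values = [quantity_from_ct(ct, slope, intercept) for ct in duplicate_cts]
    return sum(values) / len(values)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 corresponds to perfect doubling per cycle."""
    return 10 ** (-1 / slope) - 1


if __name__ == "__main__":
    # Hypothetical Ct values, two wells per standard, for illustration only.
    wells = [q for q in STANDARDS_NG_PER_UL for _ in range(2)]
    cts = [22.1, 22.3, 23.1, 23.2, 24.1, 24.0, 25.1, 25.2,
           26.0, 26.2, 27.1, 27.0, 28.1, 28.0]
    slope, intercept = fit_standard_curve(wells, cts)
    print(f"slope {slope:.2f}, efficiency {amplification_efficiency(slope):.0%}")
    print(f"sample: {sample_quantity([24.6, 24.8], slope, intercept):.3f} ng/μl")
```

Because the standards are two-fold dilutions, an ideal assay shifts Ct by about one cycle per dilution, giving a slope near −3.32 and an efficiency near 100%; a slope far from that value is a useful plate-level QC flag. Converting ng/μl to total parasite DNA (ng) would additionally require the eluate volume, which is not stated here.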


2.3 Results

2.3.1 Heuristic base calling

To test the accuracy of this microarray, we genotyped cultured parasites of the 3D7 reference strain, simulating varying levels of parasitemia by dilution into whole blood and applying various pre- and post-extraction processing of the DNA. A previous P. falciparum array using the same NimbleGen technology used a SNP-calling algorithm referred to as the D-score method [80]. Applied to the new array, this approach provided an average accuracy of >92% for samples with 10,000 or more parasites/μl. However, samples with low parasitemia (1,000 parasites/μl) yielded very poor accuracy (69% and 76%, Table 2.1). Investigation of this inaccuracy identified a strong bias toward G and C SNP calls in lower-parasitemia samples (Figure 2.2A). This bias most likely results from a lack of probe saturation in low-DNA samples combined with intensities inflated by the higher binding affinity of the triple hydrogen bond between guanine and cytosine. The D-score method relies entirely on raw intensity values to call SNPs, without regard to the base composition of the probe (Figure 2.2B). Many microarrays do not rely on raw intensity values to make base calls, and in some instances a complex model is fitted to the data to call SNPs accurately [85, 86]. To overcome this bias, a heuristic algorithm was developed that uses the probe center base to adjust raw intensity values (Figure 2.2C). When applied to the samples described above, this new algorithm gave accuracies greater than 95% in nearly all samples tested.


Table 2.1 SNP call accuracy and SNP call rate (%) of 3D7 cultured parasites

Parasitemia     WGA   LD    Heuristic   D-score     Heuristic   D-score
                            accuracy    accuracy    call rate   call rate
1,000 p/μl      +     +     99.0        93.5        75.4        56.0
1,000 p/μl      +     -     96.2        76.4        36.8        40.7
1,000 p/μl      -     +     99.6        97.4        68.7        63.2
1,000 p/μl      -     -     94.1        69.0        15.6        32.3
10,000 p/μl     +     +     99.4        94.4        76.5        51.7
10,000 p/μl     +     -     99.4        95.8        87.5        59.1
10,000 p/μl     -     +     99.7        98.6        87.6        70.8
10,000 p/μl     -     -     99.5        94.1        62.9        50.4
100,000 p/μl    +     +     -           -           -           -
100,000 p/μl    +     -     99.6        98.7        94.4        76.0
100,000 p/μl    -     +     -           -           -           -
100,000 p/μl    -     -     99.7        98.8        92.0        78.9
500,000 p/μl    +     +     -           -           -           -
500,000 p/μl    +     -     99.7        98.4        94.5        64.4
500,000 p/μl    -     +     -           -           -           -
500,000 p/μl    -     -     99.8        98.9        95.3        72.8

To test the accuracy and call rate of the new heuristic algorithm, cultured 3D7 strain parasites were genotyped under differing conditions of leukocyte depletion, whole-genome amplification, and parasite concentration. We then compared the previously described D-score method with the heuristic algorithm. (WGA = whole-genome amplification, LD = leukocyte depletion, p/μl = parasites per microliter, - = not performed.)


Figure 2.2 Sample intensities and intensity distributions for different center bases. A) Distribution of raw probe intensities. Dotted lines indicate global mean intensity values for T (black) and G (green) center-position alleles. B) Average raw intensities of a single SNP for 24 field samples. Dotted lines bound the area of low contrast, where points between the lines would not be distinguishable. Black points were called as a T allele by genome sequencing, and green diamonds were called as a G allele. C) Average intensities after global mean adjustment; all sequencing-based calls match the microarray calls after processing with the algorithm.


The algorithm uses the global array average for probes with identical center positions to adjust individual probe intensities. A call is made only when the sense and antisense calls match, the contrast between alleles is at least 0.98, and all intensities are greater than the random-binding threshold. This heuristic algorithm outperformed the D-score method in the mixing experiment with cultured 3D7 parasites, raising the average accuracy of these samples to >98% and increasing the accuracy of the 1,000 parasites/μl samples by as much as 25 percentage points. We also observed an increase in SNP call rates in most conditions when using this heuristic algorithm.
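The accuracy comparison can be checked directly against the Table 2.1 values. The sketch below (hypothetical names) averages the heuristic and D-score accuracies over every tested condition and finds the largest heuristic-minus-D-score difference at 1,000 parasites/μl, expressed in percentage points.

```python
# Accuracy (%) from Table 2.1, ordered WGA+/LD+, WGA+/LD-, WGA-/LD+, WGA-/LD-; None = not performed.
HEURISTIC = {
    1_000: [99.0, 96.2, 99.6, 94.1],
    10_000: [99.4, 99.4, 99.7, 99.5],
    100_000: [None, 99.6, None, 99.7],
    500_000: [None, 99.7, None, 99.8],
}
D_SCORE = {
    1_000: [93.5, 76.4, 97.4, 69.0],
    10_000: [94.4, 95.8, 98.6, 94.1],
    100_000: [None, 98.7, None, 98.8],
    500_000: [None, 98.4, None, 98.9],
}


def mean_accuracy(table):
    """Mean over all tested conditions, skipping cells that were not performed."""
    values = [v for row in table.values() for v in row if v is not None]
    return sum(values) / len(values)


def largest_gain(parasitemia):
    """Largest heuristic minus D-score accuracy difference (percentage points) at one parasitemia."""
    pairs = zip(HEURISTIC[parasitemia], D_SCORE[parasitemia])
    return max(h - d for h, d in pairs if h is not None and d is not None)


if __name__ == "__main__":
    print(f"heuristic mean accuracy: {mean_accuracy(HEURISTIC):.1f}%")
    print(f"D-score mean accuracy: {mean_accuracy(D_SCORE):.1f}%")
    print(f"largest gain at 1,000 p/μl: {largest_gain(1_000):.1f} percentage points")
```

The largest gain comes from the sample without WGA or leukocyte depletion (94.1% versus 69.0%), the condition in which the G/C intensity bias is expected to be strongest.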

To set more accurate sample quality control (QC) thresholds, we switched from measuring DNA quantity in parasites/μl to a quantitative PCR (qPCR) method. For the previous microarray, a DNA quantity cutoff of 250 ng was used to select samples for testing, but lower input amounts were not systematically evaluated [9]. Using the new heuristic algorithm and 152–219 ng of DNA from the NF54 lab strain quantified by 18S qPCR, we observed an average SNP-calling accuracy of 99.4% (Table 2.2). To identify the lower limit of DNA quantity, we tested 1–40 ng of purified DNA from 3D7 and Dd2 parasites, obtaining >95% accuracy down to 20 ng of parasite DNA for 3D7 and down to 10 ng for Dd2. Call rates across these samples varied widely (8%–70%) and were associated with the amount of DNA used (p = 1.57e-05) (Figure 2.3). DNA quantity was also associated with call accuracy in these samples (p = 0.025).
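The simple linear regression named in the Methods can be sketched with the standard library alone. As illustrative input, the example uses the eight non-WGA NF54 rows of Table 2.2; these are not the 3D7/Dd2 data behind the p-values reported above, so its output should not be read as reproducing them. The function name is hypothetical. A two-sided p-value follows from a t distribution with n − 2 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.

```python
import math


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns slope, intercept, Pearson r, the t statistic for the slope,
    and its degrees of freedom (n - 2).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    # Residual sum of squares gives the standard error of the slope.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t = slope / se_slope if se_slope > 0 else math.copysign(math.inf, slope)
    return slope, intercept, r, t, n - 2


if __name__ == "__main__":
    # Illustrative input: non-WGA NF54 rows of Table 2.2 (DNA in ng, call rate in %).
    dna_ng = [219, 212, 210, 203, 170, 168, 156, 152]
    call_rate = [86.6, 81.5, 78.6, 73.8, 85.7, 77.3, 76.9, 69.0]
    slope, intercept, r, t, df = simple_linear_regression(dna_ng, call_rate)
    print(f"slope = {slope:.3f} %/ng, r = {r:.2f}, t = {t:.2f} on {df} df")
```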


Table 2.2 SNP call accuracy and SNP call rate of NF54 purified DNA

ID            Parasite DNA (ng)   Call accuracy   Call rate
NF54-1        219                 99.4%           86.6%
NF54-2        212                 99.5%           81.5%
NF54-3        210                 99.4%           78.6%
NF54-4        203                 99.4%           73.8%
NF54-5        170                 99.4%           85.7%
NF54-6        168                 99.4%           77.3%
NF54-7        156                 99.2%           76.9%
NF54-8        152                 99.3%           69.0%
NF54 (WGA)    222                 99.6%           91.6%
NF54 (WGA)    138                 99.5%           85.1%

NF54 reference strain DNA was typed at varying quantities, and call accuracy and call rate were calculated. Two samples underwent whole-genome amplification (WGA) before genotyping; their DNA quantities reflect post-WGA amounts.


Figure 2.3 SNP call accuracy and SNP call rate of low-quantity reference DNA. Black and grey bars show the SNP call accuracy of purified 3D7 and Dd2 strain parasites at low DNA quantities (1–40 ng). Lines show the SNP call rate of each typed sample. DNA quantity was significantly associated with SNP calling accuracy (p = 0.025) and with SNP call rate (p = 1.57e-05).


2.3.2 Analysis of field isolates and leukocyte depletion

To assess the feasibility and accuracy of genotyping P. falciparum DNA extracted directly from patient blood, we used samples with high and low parasite DNA quantities derived from venous blood that had or had not been subjected to leukocyte depletion. The high-DNA group comprised 17 samples from Southeast Asia that had successfully undergone Illumina whole-genome sequencing and had a parasite DNA quantity of 250 ng. These samples had an average accuracy of 98.4% and an average call rate of 85.4% (Figure 2.4A). The low-DNA group comprised seven samples from Southeast Asia containing 37–206 ng of parasite DNA, with an average accuracy of 98.8% and an average call rate of 73.9%. There was no correlation between accuracy and DNA quantity (p = 0.79) (Figure 2.4B), whereas call rate was strongly correlated with DNA quantity (p = 9.8e-08) (Figure 2.4C), indicating that DNA quantity is a good predictor of the number of SNPs called.

Analyses were performed on all reference and field isolates tested to determine the per-SNP accuracy rate. Of the 33,716 SNPs typed by this array, ~28,000 were called correctly in 100% of samples analyzed. Only a small subset (~200) was called incorrectly in all samples, and ~600 could not be called in any sample. Only SNPs called correctly in >95% of samples were included in the final versions of the heuristic base-calling algorithm. On visual inspection, inaccurate SNPs appear randomly distributed, both across the genome and across the layout of the printed array.
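The per-SNP filter can be sketched as follows. The text does not define the denominator, so this sketch assumes that a sample counts toward a SNP only when the array made a call and a reference (sequencing) genotype exists; under that assumption, SNPs never called in any sample simply drop out. Function and variable names are hypothetical.

```python
def per_snp_accuracy(calls, truth):
    """Fraction of correct calls for each SNP.

    calls and truth map sample -> {snp_id: allele}; None marks a missing call.
    ASSUMPTION: only samples where the SNP was called and has a reference
    genotype count toward its denominator.
    """
    correct, called = {}, {}
    for sample, snp_calls in calls.items():
        reference = truth.get(sample, {})
        for snp, allele in snp_calls.items():
            expected = reference.get(snp)
            if allele is None or expected is None:
                continue
            called[snp] = called.get(snp, 0) + 1
            correct[snp] = correct.get(snp, 0) + (allele == expected)
    return {snp: correct[snp] / called[snp] for snp in called}


def retained_snps(calls, truth, min_accuracy=0.95):
    """SNPs called correctly in more than min_accuracy of samples."""
    accuracy = per_snp_accuracy(calls, truth)
    return sorted(snp for snp, value in accuracy.items() if value > min_accuracy)


if __name__ == "__main__":
    # Toy example: snp1 always correct, snp2 wrong in one of two samples, snp3 never called.
    calls = {"s1": {"snp1": "A", "snp2": "G", "snp3": None},
             "s2": {"snp1": "A", "snp2": "T", "snp3": None}}
    truth = {"s1": {"snp1": "A", "snp2": "G", "snp3": "C"},
             "s2": {"snp1": "A", "snp2": "G", "snp3": "C"}}
    print(retained_snps(calls, truth))
```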


Figure 2.4 SNP call accuracy and SNP call rate of field samples. A) SNP calling accuracy (grey bars) and SNP call rates (black line) for 24 field isolates from Southeast Asia. B) Correlation of SNP call accuracy with DNA quantity (p = 0.79). C) Correlation of SNP call rate with DNA quantity (p = 9.8e-08).


We noted a difference for low-quantity samples that had undergone leukocyte depletion to remove human DNA before DNA extraction. Leukocyte depletion removes human white blood cells from a whole blood sample, enriching the final sample for the red blood cells that harbor the parasites and, consequently, for parasite DNA. In the analysis of cultured 3D7 parasites diluted in whole blood, a subset of samples was leukocyte depleted using CF11 cellulose columns. For the 1,000 parasites/μl samples genotyped with the heuristic algorithm, leukocyte depletion increased accuracy by an average of 4.2 percentage points and SNP call rate by an average of 45.9 percentage points (Table 2.1). This increase was more pronounced in samples that had not undergone whole-genome amplification (WGA).
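The leukocyte-depletion averages can be reproduced from the heuristic-algorithm columns of Table 2.1 with a few lines of arithmetic. The sketch below (hypothetical names) computes the LD+ minus LD− difference within each WGA stratum at 1,000 parasites/μl and averages the two strata.

```python
# Heuristic-algorithm results (%) at 1,000 parasites/μl, from Table 2.1.
ACCURACY = {"WGA+": {"LD+": 99.0, "LD-": 96.2}, "WGA-": {"LD+": 99.6, "LD-": 94.1}}
CALL_RATE = {"WGA+": {"LD+": 75.4, "LD-": 36.8}, "WGA-": {"LD+": 68.7, "LD-": 15.6}}


def ld_gains(table):
    """Percentage-point change from leukocyte depletion within each WGA stratum."""
    return {wga: row["LD+"] - row["LD-"] for wga, row in table.items()}


def mean_gain(table):
    """Average of the per-stratum leukocyte-depletion gains."""
    gains = ld_gains(table)
    return sum(gains.values()) / len(gains)


if __name__ == "__main__":
    for name, table in (("accuracy", ACCURACY), ("call rate", CALL_RATE)):
        gains = ld_gains(table)
        print(name, {k: round(v, 1) for k, v in gains.items()}, round(mean_gain(table), 2))
```

The averages come out at 4.15 and 45.85 percentage points, which round half-up to the 4.2 and 45.9 reported above. The per-stratum gains (2.8 versus 5.5 for accuracy, 38.6 versus 53.1 for call rate) show why the effect is larger without WGA.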

2.3.3 Effects of whole-genome amplification

Samples with low DNA quantity can be subjected to WGA to increase the amount of DNA, allowing more samples to pass preliminary QC metrics. Several cultured and purified DNA samples underwent WGA before genotyping and were used to evaluate possible bias introduced by amplification. Accuracies in cultured parasites showed no marked difference (≤0.2%), while WGA slightly increased the call rate overall (Table 2.1). Two NF54 DNA samples were subjected to WGA before microarray analysis, yielding an average accuracy of 99.54% and an average call rate of 88.34%, about 10 percentage points higher than the non-WGA samples (Table 2.2).

2.4 Discussion

We set out to develop a SNP genotyping platform for P. falciparum field samples of low DNA quantity and quality, and evaluated this tool using parasite DNA from cultured parasites and from preserved venous blood collected from malaria-infected individuals in field studies. An earlier microarray used 250 ng of DNA as the minimum level tested, although lower DNA quantities were not systematically evaluated [80]. Our heuristic SNP-calling algorithm accounts for a bias discovered in lower-quantity samples and yields highly accurate results for samples with as little as 37 ng of parasite DNA in field samples and 10 ng in purified DNA samples. We also obtained high accuracy with filter paper samples, albeit with lower reproducibility (not shown). Adding a WGA step and a strict DNA quantity threshold may improve reproducibility for filter paper blood spots.

DNA microarrays have previously been used for SNP detection in P. falciparum [80-82, 87]. With whole-genome sequence data increasingly available thanks to lower costs and improved methodology, the value of DNA microarrays can be questioned. However, for valuable field samples associated with important demographic, clinical, and other phenotypic data such as drug resistance or vaccine resistance [88], microarrays can rescue the high proportion of samples that fail to meet the criteria for sequencing. Samples that often do not meet sequencing standards include low-volume blood samples, filter paper blood spots, samples with high levels of human DNA contamination, and archived or degraded samples. Our goal was to create a SNP microarray that complements genome sequence data in association and population genetics studies by providing genome-wide SNP data for samples unfit for sequencing.

The SNPs were chosen from a set of highly validated loci identified by sequencing hundreds of global P. falciparum isolates [79]. This large sequencing project also illustrates why microarrays still have a place in genomic research: the first iteration of samples prepared for sequencing had a QC pass rate of ~30%, and more recent samples had a pass rate of >50%, leaving a significant number of samples with no data. By selecting SNPs from within this highly validated set of loci, we can fill the gaps left by sequencing while continuing to use samples for which sequencing was successful. The cost of genome-wide microarray analysis (currently less than $90/sample excluding personnel costs) compares favorably with that of genome sequencing (at least 3-fold higher for short-read Illumina sequencing and approximately 25-fold higher for third-generation sequencing and assembly). Imputation strategies are being developed that may further increase the value of microarray-generated SNP information.

2.5 Conclusions

This microarray genotypes over 33,000 variable positions in the P. falciparum genome with high accuracy and high throughput. The ability to run samples with as little as 10 ng of parasite DNA increases the number of field samples for which whole genome analysis is possible. The selection of loci also allows samples genotyped on this microarray to be analyzed in conjunction with higher quality samples sequenced using next-generation sequencing platforms.


Chapter 3: Admixture and Gene Flow of Plasmodium falciparum in Southeast Asia.

Manuscript formatted for submission to Science Magazine. All samples were collected by local organizations and were provided to us by Arjen Dondorp, Mark Fukuda, Francois Nosten, Harald Noedl, Mallika Imwong, Delia Bethell, Youry Se, Chanthap Lon, Stuart Tyner, David L Saunders, Charlotte Lanteri, Frederic Ariey, Aung Pyae Phyo, Peter Starzengruber, Hans-Peter Fuehrer, Paul Swododa, Nicholas White, Odile Mercereau-Puijalon, Didier Menard, Paul Newton, Maniphone Khanthavong, Bouasy Hongvanthong, Wasif A. Khan, Myaing M. Nyunt, Myat H. Nyunt, Myat P. Kyaw, Ye Htut, and Kay Thwe Han. Design and conceptualization were aided by Shannon Takala-Harrison, Joana Carneiro da Silva, and Christopher V. Plowe. Statistical analysis was aided by Jason Bailey. Whole-genome sequencing was completed by the team at the Sanger Centre: Olivo Miotto, Susana Campino, Sarah Auburn, Bronwyn MacInnis, Magnus Manske, Jacob Almagro-Garcia, Gareth Maslen, Robert Amato, and Dominic Kwiatkowski. Some DNA microarray samples were run in the lab by Christopher Pepin, Matthew Adams, Kaia Erickson, Amy Wang, and Sudhaunshu Joshi.

3.1 Introduction

The emergence of resistance to former first-line antimalarial drugs, such as chloroquine and sulfadoxine-pyrimethamine, contributed to the failure of the first malaria eradication campaign and led to increased disease and death due to malaria. Epidemiological studies of resistance in Plasmodium falciparum have shown evidence suggestive of limited but multiple origins of resistance for both chloroquine [89] and sulfadoxine-pyrimethamine [90, 91], but in the case of all three drugs, highly resistant parasite lineages that are prevalent in Africa have been traced to a single Southeast Asian origin [92, 93].

Resistance to the current first-line treatment for falciparum malaria, artemisinin, has been confirmed in multiple locations throughout Southeast Asia [94-98]. Unlike previous waves of drug resistance, artemisinin-resistant parasites have both emerged independently at multiple sites within Southeast Asia and spread between geographic regions [99].

The discovery of artemisinin-resistant parasites led to a large concerted effort to determine the molecular cause of resistance and to find a viable molecular marker for tracking resistant parasites. Many different approaches were taken, including genome-wide association studies (GWAS) [100, 101] and mutation accumulation studies [102], leading to the discovery of a molecular marker of artemisinin resistance. Mutations in the kelch13 gene are significantly associated with increased parasite clearance half-life and in vitro ring-stage survival; however, not all mutations confer an equal level of resistance [103], nor are all mutations equally distributed across Southeast Asia [99]. It has also been shown that P. falciparum “founder” populations (genetically homogeneous sub-populations) within Southeast Asia are associated with increased parasite clearance half-life [104], and that some of these populations are associated with specific kelch13 mutations [105].

The diversity of resistance-associated kelch13 mutations [103] and their independent emergence, which may have given rise to founder populations, create an opportunity for selection of more resistant or fitter parasites, as well as for the rapid expansion and spread of resistant parasites outside of Western Cambodia. Highly structured populations within a region can hinder the integration of migrants because of incompatible genetic backgrounds; conversely, genetically similar populations can exchange genetic polymorphisms more freely. Structure can also complicate genomic association studies, because alleles confined to specific subpopulations can be spuriously associated with phenotypes that have a similar distribution. We used model-based analyses to estimate population structure and gene flow between 19 sites in Southeast Asia.

3.2 Population Structure and Genetic Relatedness

To investigate the population structure of parasites in Southeast Asia, we used the model-based method ADMIXTURE. We genotyped 2,185 samples from the six countries comprising the Greater Mekong Subregion (GMS; Cambodia, Laos, Myanmar, Thailand, Vietnam, and the Yunnan Province of China) and Bangladesh, as well as two sites in Africa (Nigeria and the Democratic Republic of Congo). All analyses in this study used single nucleotide polymorphisms (SNPs), called either from whole-genome sequence or from a custom P. falciparum DNA microarray [106]. Inference of population structure with this model assumes that individuals are unrelated and that genetic markers are unlinked.

To remove related individuals, the genetic analysis toolset PLINK was used to calculate identity-by-state (IBS) percentages, to determine the threshold for “relatedness”, and to group related parasites into family or “clone” groups. A small set of highly related individuals is seen among Southeast Asian samples (Figure 3.3A) and, to a lesser extent, among African samples (Figure 3.3B). Samples in the whole data set grouped into 201 clone groups of 2 to 56 parasites. A single representative of each group was retained along with all other unrelated parasites, for a total of 1,257 individuals. A sliding-window linkage disequilibrium (LD) analysis was used to remove linked markers, leaving 10,378 independent polymorphic sites after pruning. With highly related individuals and linked loci removed to conform to the model assumptions, ten replicate runs were performed for each number of populations (K) from 1 to 39 with cross-validation. A K of 14 produced the model with the highest likelihood and the lowest cross-validation error (Figure 3.4). In analyses where highly related individuals were not removed, the technical replicates never reached a peak likelihood. Individuals were assigned to the subpopulation from which they derived the greatest percentage of ancestry.
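The relatedness filter described above can be sketched as an IBS computation followed by single-linkage grouping. This is an illustration of the idea, not PLINK's implementation: the IBS definition for haploid calls, the threshold value, and the rule for choosing a group's representative (fewest missing calls) are assumptions, and all names are hypothetical. Unrelated parasites form singleton groups and are therefore kept, as in the text.

```python
from itertools import combinations


def ibs(a, b):
    """Identity by state for two haploid genotype vectors: share of jointly called sites that match."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def clone_groups(genotypes, threshold):
    """Single-linkage grouping: any pair with IBS >= threshold ends up in the same group."""
    parent = {sample: sample for sample in genotypes}

    def find(sample):
        while parent[sample] != sample:
            parent[sample] = parent[parent[sample]]  # path halving
            sample = parent[sample]
        return sample

    for s, t in combinations(genotypes, 2):
        if ibs(genotypes[s], genotypes[t]) >= threshold:
            parent[find(s)] = find(t)

    groups = {}
    for sample in genotypes:
        groups.setdefault(find(sample), []).append(sample)
    return list(groups.values())


def representatives(genotypes, threshold):
    """One parasite per group, ASSUMED to be the one with the fewest missing calls."""
    return sorted(
        min(group, key=lambda s: sum(g is None for g in genotypes[s]))
        for group in clone_groups(genotypes, threshold)
    )


if __name__ == "__main__":
    # Toy genotypes (None = missing); the 0.95 threshold is hypothetical.
    g = {"p1": ["A", "C", "G", "T"], "p2": ["A", "C", "G", None],
         "p3": ["T", "G", "C", "A"], "p4": ["T", "G", "C", "T"]}
    print(clone_groups(g, 0.95), representatives(g, 0.95))
```

Single linkage means chains of moderately similar parasites can merge into one large group, which is one way a clone group could reach dozens of members.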

The ADMIXTURE analysis revealed distinct population structure within Southeast Asia, including multiple subpopulations within Cambodia (Figure 3.1). African samples fall within a single subpopulation. Two small subpopulations are each confined to a single geographic location, in Southern Laos and Eastern Cambodia, respectively. All other subpopulations occur in at least two geographic locations, some in as many as ten, and parasites from a single geographic location can belong to up to ten different genetic subpopulations. Samples from Bangladesh belong to two subpopulations, most of them to a Bangladesh-only subpopulation. Samples from China, Myanmar, and West Thailand belong primarily to three subpopulations, members of which are also found in Cambodia and Laos. Eastern sites (South Laos, East Cambodia, and ) consist mostly of parasites from a single large subpopulation. East Thailand, West Cambodia, North Cambodia, and South Cambodia show large amounts of structure and are made up primarily of six different subpopulations. Some admixture exists between this group and others in Southeast Asia, but most parasites belong to one of these six subpopulations.
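The model-selection and assignment steps behind these site-level tallies can be sketched as follows. The sketch assumes the standard ADMIXTURE `.Q` output (one line per individual, K space-separated ancestry fractions) and replicate cross-validation errors already collected into a dictionary; it selects K on mean cross-validation error only, whereas the text also considered likelihood. Function names and the toy data are hypothetical.

```python
def parse_q(lines):
    """Parse ADMIXTURE .Q output: one individual per line, K space-separated ancestry fractions."""
    return [[float(v) for v in line.split()] for line in lines if line.strip()]


def best_k(cv_errors):
    """cv_errors maps K to the cross-validation errors of its replicate runs; pick the lowest mean."""
    return min(cv_errors, key=lambda k: sum(cv_errors[k]) / len(cv_errors[k]))


def assign(q_row):
    """Index of the subpopulation contributing the greatest ancestry proportion."""
    return max(range(len(q_row)), key=lambda k: q_row[k])


def subpopulations_per_site(q_matrix, sites):
    """Number of distinct assigned subpopulations observed at each sampling site."""
    by_site = {}
    for q_row, site in zip(q_matrix, sites):
        by_site.setdefault(site, set()).add(assign(q_row))
    return {site: len(labels) for site, labels in by_site.items()}


if __name__ == "__main__":
    # Hypothetical K = 3 output for four parasites and made-up replicate CV errors.
    q = parse_q(["0.7 0.2 0.1", "0.1 0.8 0.1", "0.3 0.3 0.4", "0.6 0.3 0.1"])
    print(best_k({13: [0.51, 0.52], 14: [0.50, 0.49], 15: [0.50, 0.53]}))
    print(subpopulations_per_site(q, ["Pailin", "Pailin", "Pailin", "Tak"]))
```

Assigning by the largest ancestry fraction gives every parasite a single label even when it is heavily admixed, so per-site counts like these are best read alongside the underlying ancestry proportions.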


Figure 3.1 Distribution of genetic subpopulations. Map showing prevalence of each genetic subpopulation by geographic site.

Genomic studies like those used to discover the kelch13 gene benefit from knowledge of the underlying genomic and population genetic characteristics of the parasites being investigated. Our group’s initial study searching for markers of artemisinin resistance used multiple strategies to correct for underlying population structure in a complex regression model [101]. Using principal components as covariates corrected moderate amounts of confounding due to population structure but still left many false positives. Ultimately, a method that used a genetic similarity matrix as a random effect in mixed models provided the best adjustment for confounding due to structure [107]. Other types of studies depend on population structure itself for confident results.
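The mixed-model adjustment mentioned above is commonly written in the following generic form. This is a standard textbook formulation offered for orientation; the exact covariates and estimation procedure used in [107] may differ. Here y is the vector of phenotypes (for example, parasite clearance half-lives), X holds the intercept, the SNP being tested, and any fixed covariates, and K is the genetic similarity matrix.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon},
\qquad
\mathbf{g} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_g^{2}\,\mathbf{K}\right),
\qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \sigma_e^{2}\,\mathbf{I}\right)
\quad\Longrightarrow\quad
\operatorname{Var}(\mathbf{y}) = \sigma_g^{2}\,\mathbf{K} + \sigma_e^{2}\,\mathbf{I}
```

Because genetically similar parasites are modeled as having correlated phenotypes through the K term, structure at every scale is absorbed into the covariance, whereas a few principal components entered as fixed effects capture only the broadest axes of structure.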

A recent study described numerous P. falciparum “founder” populations in West Cambodia, several of which were associated with the artemisinin resistance phenotype [104]. A major caveat of the founder population hypothesis is the method by which the populations were initially determined. Multi-locus model-based ADMIXTURE analysis was performed with an LD-trimmed SNP set from whole-genome sequencing. This data set, however, included highly related, if not completely identical (clonal), parasites. Including such samples violates the model’s assumptions and may result in incorrect estimates of population structure. What have been called subpopulations, or “founder” populations, may in fact have been family groups of highly related parasites. Our data show that unrelated parasites have high levels of admixture, with populations composed mostly of admixed parasites.

Using principal components analysis (PCA) to look for patterns of gross population structure, we first observed that a large number of subpopulations appear closely related; this core set comprises nine subpopulations. The parasites that do not cluster with this core set on principal components 1 and 2 include those in the African, Bangladeshi, West Cambodian (Pailin, Battambang, and Pursat), and East Thai subpopulations (Figure 3.2). High differentiation within West Cambodia could reflect the presence of highly related “founder” populations resulting from antimalarial drug resistance, or it could be a separate event that may hold answers about the historical patterns of early emergence of drug resistance in this area. An increased mutation rate in Cambodian parasites was previously hypothesized to be responsible [108], but recent work on clinical isolates shows no difference in mutation rate between parasites from different regions of Southeast Asia [109].

Figure 3.2 Principal components analysis. Principal components 1 and 2 of (A) all parasites, colored by ADMIXTURE-defined population, and (B) parasites with greater than 80% identity within each subpopulation. Circles highlight the extent of each subpopulation.


3.3 Coalescent Analysis and Estimation of Gene Flow

Coalescent theory provides an opportunity to study patterns of migration between sites in Southeast Asia. To measure gene flow, we fitted a complex multi-parameter coalescent model to data from across Southeast Asia using the program LAMARC. We used a Bayesian framework to estimate θ = 2Neµ (where Ne is the effective population size and µ is the per-site, per-generation mutation rate) and bi-directional migration rates M = m/µ (where m is the per-generation probability that a lineage is an immigrant) for all populations in our dataset. The bi-directional migration values show uneven gene flow between individual populations as well as patterns of gene flow between geographic regions. The largest migration rate is from Kampot, South Cambodia, to Pailin, West Cambodia, while the lowest value is between Kampong Speu, South Cambodia, and nearby Kampot, also in South Cambodia (Table 3.1).
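A useful consequence of the two definitions above follows by direct substitution. Writing i for the recipient population and j for the source (notation introduced here for clarity), M is the immigration probability measured in units of the mutation rate, so M = 100 means the per-generation immigration probability is 100 times the per-site mutation rate. Multiplying by the recipient's θ cancels µ and gives a rate on the effective-population scale. This is algebra on the stated definitions; LAMARC's own output conventions may report scaled quantities differently.

```latex
\theta_i = 2N_{e,i}\,\mu,
\qquad
M_{j \to i} = \frac{m_{j \to i}}{\mu}
\quad\Longrightarrow\quad
\theta_i\, M_{j \to i}
  = 2N_{e,i}\,\mu \cdot \frac{m_{j \to i}}{\mu}
  = 2N_{e,i}\, m_{j \to i}
```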

Table 3.1 Bi-directional migration rates. Top 10% of values highlighted.

[Table 3.1 matrix, values not recoverable from the source: estimated migration rates (M) from each source site (“Migration FROM”) into each recipient site (“Migration INTO”) for 19 sites: Bangladesh (Chittagong); Cambodia (Pailin, Battambang, Pursat, Oddar Meanchey, Preah Vihear, Kampong Speu, Kampot, Preah Sihanouk, Ratanakiri); China (Yunnan); Laos (Savannakhet, Attapeu); Myanmar (Bago, Tanintharyi); Thailand (Tak, Ranong, Srisaket); Vietnam (Phuoc Bihn).]

Looking at general patterns of migration, we see the highest average immigration rates into Bago (Central Myanmar), Battambang (West Cambodia), and Ratanakiri (East Cambodia). Myanmar has the highest incidence of malaria within Southeast Asia, and Bago is centrally located within the country. The main contributors to the influx of gene flow into Bago are Central Laos, East Cambodia, and Southeast Bangladesh. Battambang Province in West Cambodia contains Cambodia’s second-largest city and is a potential crossroads between sites throughout Cambodia, as seen in the immigration of parasites from the North (Preah Vihear) and East (Ratanakiri). Interestingly, we also observe migration into West Cambodia from the nearby sites of Tanintharyi, South Myanmar, and Ranong, South Thailand, both across the Gulf of Thailand. The third-highest recipient of foreign parasites is Ratanakiri in East Cambodia. While not centrally located in the Mekong Subregion like Bago or near a large city like Battambang, Ratanakiri lies between Mondulkiri Province in Cambodia and Southern Laos, both of which have the highest rates of malaria in their respective countries [110]. These regions, however, are not the largest contributors of foreign DNA to this population; the largest contributors are in fact Preah Sihanouk in South Cambodia, Srisaket in East Thailand, and the Yunnan Province of China. Our model allows migration between every pair of populations (an “all-by-all” scenario), and the estimated rates differ both across distances and between adjacent populations. We did not simulate un-sampled populations in our model, which could potentially produce false migration rates [111]; however, including potential “ghost” populations would have overparameterized the model. Little has been published about the migration of people across Southeast Asia, because many migrants are unregistered [112] and most existing research focuses on “border malaria” [113] rather than migration across distances. However, a study in Eastern Thailand on the border with Cambodia did show that in certain sampling locations the largest proportion of migrant workers were from Myanmar or of Mon descent [112], as opposed to people from neighboring Cambodia.
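The summaries used in this paragraph (highest average immigration into a site, and the largest contributors to a given site) can be sketched from a pairwise migration matrix stored as a dictionary keyed by (source, recipient). The site labels and rates in the example are hypothetical placeholders, not values from Table 3.1, and the function names are illustrative.

```python
def mean_immigration(rates):
    """Average immigration rate into each recipient over all of its source populations."""
    totals, counts = {}, {}
    for (source, recipient), m in rates.items():
        if source == recipient:
            continue
        totals[recipient] = totals.get(recipient, 0.0) + m
        counts[recipient] = counts.get(recipient, 0) + 1
    return {r: totals[r] / counts[r] for r in totals}


def top_recipients(rates, n=3):
    """Recipients ranked by mean immigration rate, highest first."""
    means = mean_immigration(rates)
    return sorted(means, key=means.get, reverse=True)[:n]


def top_sources(rates, recipient, n=3):
    """Largest contributors of migrants into one recipient, highest first."""
    incoming = {s: m for (s, r), m in rates.items() if r == recipient and s != r}
    return sorted(incoming, key=incoming.get, reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical three-site matrix: (source, recipient) -> M.
    rates = {("A", "B"): 10.0, ("C", "B"): 30.0, ("A", "C"): 5.0,
             ("B", "C"): 5.0, ("B", "A"): 50.0, ("C", "A"): 70.0}
    print(top_recipients(rates, 2), top_sources(rates, "A", 2))
```

As the Ratanakiri example shows, the sites contributing most to a recipient need not be its geographic neighbors, so ranking sources explicitly is more informative than inspecting adjacent cells of the matrix.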

Understanding how parasites are distributed and spread throughout Southeast Asia has many implications for malaria control and elimination. A large aspect of this is the migration pattern of parasites in areas with high levels of drug resistance. Recent reports on the distribution of kelch13 mutants show the highest prevalence within Cambodia, centered on sites in West Cambodia. While other sites have a mixture of kelch13 resistance mutations, Pailin has a skewed distribution with high levels of the 580Y mutation. It is possible that this mutation, which has been shown to cause lower levels of resistance in in vitro models, has a selective advantage that allows it to outcompete parasites carrying other (more detrimental) resistance mutations. If this hypothesis holds true, it will be important to monitor the frequency of these alleles, as well as the migration patterns of sites with this mutation. In our study, the three West Cambodia sites (Battambang, Pailin, and Pursat) have large emigration values to sites in West and South Thailand and Laos, areas with low to moderate prevalence of kelch13 mutations. Each of these recipient sites has a different subpopulation structure, which may slow the integration of resistance alleles exported from West Cambodia. These patterns, if corroborated by additional molecular marker data, will be important in determining the evolving dynamics of artemisinin resistance in Southeast Asia and can influence strategies for the elimination of artemisinin-resistant malaria.

Our study highlights the complicated population structure within Southeast Asia using the largest population genomic dataset to date. The presence of many subpopulations within Cambodia, where sampling was densest, may be typical of all areas and is most likely not the result of "founder" populations arising from recent selective sweeps associated with drug resistance, as we observe multiple populations within most study sites. This complicated structure may also exist in other areas of malaria transmission in Southeast Asia and could become apparent with denser sampling outside areas of high drug resistance. With denser sampling, more efficient methods of measuring gene flow will need to be developed, as our model-based coalescent analysis was at the limits of what the method can simulate and test. Mutations in kelch13, while present in Africa, have yet to be associated with artemisinin resistance there, and keeping Southeast Asian mutants out of Africa is essential to the health of the millions at risk of infection.

3.4 Supplementary Materials

3.4.1 Results

3.4.1.1 Geographic population relatedness

General patterns of population relatedness can be measured with Wright's FST. This statistic compares genome-wide heterozygosity within and between populations and ranges from 0 to 1, with near-zero values representing highly related populations and values close to one representing highly divergent populations. Our FST values ranged from 0.01 to 0.128, which is within the range seen in other studies of P. falciparum population relatedness [114]. To test for isolation by distance we used a Mantel test (excluding sites from Africa). A Mantel correlation of 0.314 was observed (p = 0.025). Outliers from the model were detected using a simple linear regression of genetic divergence on geographic distance: 95% prediction intervals revealed six outlying values, all of which were comparisons between Srisakhet, Thailand and other populations. FST was also calculated between each Southeast Asian site and Kinshasa, DRC in Africa; these values ranged from 0.045 (Chittagong) to 0.131 (Srisakhet).
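The heterozygosity-based definition of FST can be sketched in a few lines of Python. This is a minimal illustration assuming per-SNP allele frequencies are already available for two populations. The function name `fst_heterozygosity` is hypothetical, and the thesis does not state which FST estimator was used, so this Nei-style ratio of averages is an assumption rather than the author's calculation.

```python
import numpy as np


def fst_heterozygosity(p1, p2):
    """Genome-wide FST between two populations from per-SNP allele frequencies.

    p1, p2: frequencies of the same allele at each SNP in populations 1 and 2.
    HS is the mean within-population expected heterozygosity, HT the expected
    heterozygosity of the pooled population; FST = (sum HT - sum HS) / sum HT.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    pbar = (p1 + p2) / 2
    ht = 2 * pbar * (1 - pbar)
    informative = ht > 0  # skip SNPs monomorphic in both populations
    return float((ht[informative] - hs[informative]).sum() / ht[informative].sum())


# Identical frequencies give 0; fixed differences give 1.
same = fst_heterozygosity([0.5, 0.2], [0.5, 0.2])
fixed = fst_heterozygosity([0.0], [1.0])
```

Summing HT and HS before dividing (rather than averaging per-SNP ratios) keeps nearly monomorphic SNPs from dominating the genome-wide value.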

3.4.2 Methods

3.4.2.1 Study Sites

Study sites were selected from ongoing and completed surveillance studies and clinical trials from all nations contained within the Greater Mekong Sub-Region (GMS; Cambodia, Laos, Myanmar, Thailand, Vietnam, and the Yunnan Province of China). Thirty-one studies were used from 23 sites in six countries of the GMS, with two studies from Bangladesh and two additional sites in Africa. All samples were collected between 2008 and 2013. Samples were from studies of ACT or artesunate therapeutic efficacy or from passive surveillance for clinical malaria, and were from microscopy- or PCR-confirmed P. falciparum infections. All samples were from symptomatic infections and met the inclusion criteria of the original study. Samples were collected under protocols approved by local ethical review boards, and genomic analyses were completed with approval of the University of Maryland School of Medicine institutional review board.

3.4.2.2 Parasite Genotyping

All analyses in this study were completed using single nucleotide polymorphisms (SNPs). These SNPs were called from full genome sequence or from a DNA microarray. Samples of adequate quality from collaborating studies were sent to the Wellcome Trust Sanger Institute for full genome sequencing [115]. DNA was extracted from whole venous blood or leukocyte-depleted blood. Samples were sequenced to varying coverage using Illumina technology and analyzed with an automated pipeline at the Sanger Institute, which called approximately 400,000 positions in the genome deemed to be variable based on a global sample set. Samples from studies that did not meet quality control criteria for full genome sequencing, or that were from non-contributing studies, were analyzed using a NimbleGen custom DNA microarray assaying more than 33,000 variable positions in the parasite genome [106]. The same nucleotide positions typed on the microarray were extracted from the full genome data for analysis.

3.4.2.3 Quality control of parasite genotypes

A total of 2,185 samples passed initial quality control (QC) during preliminary data collection. SNP data for 1,468 samples were obtained from whole-genome sequence data, while data for the remaining 717 samples were generated on the DNA microarray. Preliminary chip analyses identified a set of 28,496 SNPs that could be reliably typed; SNPs excluded from the original ~33,000 were either consistently miscalled by the SNP calling algorithm or consistently not called at all (N values). Missing-data filters were then applied to both SNPs and samples. Based on a plot of percent missing calls per SNP, a 30 percent cutoff was judged adequate, leaving 25,278 SNPs. A similar plot of missing data per sample led to a 40 percent cutoff, leaving 1,925 samples. A final step removed triallelic SNPs, leaving 25,135 SNPs.
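The filtering sequence above (SNP missingness, then sample missingness, then removal of triallelic SNPs) can be sketched as follows. This is a hypothetical Python helper. It assumes a samples x SNPs matrix of single-base calls with `N` for missing data, and it assumes the cutoffs are inclusive (a SNP with exactly 30 percent missing calls is kept); neither the data layout nor the boundary handling is specified in the text.

```python
import numpy as np


def qc_filter(calls, snp_max_missing=0.30, sample_max_missing=0.40):
    """Filter a samples x SNPs matrix of base calls ("N" = missing call).

    Order follows the text: SNP missingness, then sample missingness, then
    removal of SNPs with more than two observed alleles.
    Returns (filtered_calls, kept_sample_indices, kept_snp_indices).
    """
    calls = np.asarray(calls)
    samples = np.arange(calls.shape[0])
    snps = np.arange(calls.shape[1])

    # 1) Drop SNPs whose fraction of missing calls exceeds the SNP cutoff.
    snp_ok = (calls == "N").mean(axis=0) <= snp_max_missing
    calls, snps = calls[:, snp_ok], snps[snp_ok]

    # 2) Drop samples whose fraction of missing calls exceeds the sample cutoff.
    sample_ok = (calls == "N").mean(axis=1) <= sample_max_missing
    calls, samples = calls[sample_ok], samples[sample_ok]

    # 3) Drop SNPs with three or more distinct non-missing alleles.
    n_alleles = np.array([len(set(col) - {"N"}) for col in calls.T], dtype=int)
    biallelic = n_alleles <= 2
    return calls[:, biallelic], samples, snps[biallelic]
```

Applying the SNP filter first means sample missingness is judged only on SNPs that survive, which is the order the text describes.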

3.4.2.4 Ancestry Estimation

Prior to ancestry analysis, SNP-based genotypes were used to determine relatedness among individual parasites, using the genomic analysis toolkit plink [116] to generate an identity-by-state (IBS) pairwise distance matrix. Groups of highly genetically similar parasites ("IBS groups") were formed by combining samples with high pairwise IBS values (>= 0.995) (Figure 3.3), and a single, randomly selected representative of each IBS group was included in the ancestry estimation. After removal of related individuals, genotypes were pruned to remove linkage among markers, again using plink: windows of 30 SNPs with a 10 SNP overlap were used to remove SNPs with r² values >= 0.1.
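The IBS grouping step can be illustrated with a short sketch. It assumes haploid base calls (`N` = missing) and computes IBS as the fraction of jointly called SNPs with identical calls. It treats "combining" samples as single-linkage grouping, so any chain of pairs with IBS >= 0.995 ends up in one group. The helper names are hypothetical; the thesis used plink to compute the matrix, and plink's exact IBS definition may differ.

```python
import numpy as np


def ibs_matrix(calls):
    """Pairwise IBS: fraction of SNPs called in both samples with identical calls."""
    calls = np.asarray(calls)
    n = calls.shape[0]
    ibs = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            both = (calls[i] != "N") & (calls[j] != "N")
            share = (calls[i][both] == calls[j][both]).mean() if both.any() else 0.0
            ibs[i, j] = ibs[j, i] = share
    return ibs


def ibs_groups(ibs, threshold=0.995):
    """Single-linkage groups: samples are joined whenever their IBS >= threshold."""
    n = ibs.shape[0]
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if ibs[i, j] >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def pick_representatives(groups, seed=0):
    """Keep one randomly chosen sample from each IBS group."""
    rng = np.random.default_rng(seed)
    return sorted(int(rng.choice(g)) for g in groups)
```

Single linkage is the simplest reading of "combining"; a stricter rule (every pair in a group above the cut-off) would produce more, smaller groups.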

Data were run with ADMIXTURE [117] with the number of populations (K) ranging from 1 to 39 and 10 replicates per value of K. Cross-validation (CV) was performed, and CV errors were used to determine the most likely number of populations (Figure 3.4).
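Selecting K from replicated cross-validation runs amounts to averaging the CV error per K and taking the minimum. The replicate used for the Q-matrix is then the one with the highest log-likelihood. The sketch below is hypothetical. The regular expression assumes ADMIXTURE's `--cv` log lines look like `CV error (K=3): 0.498`, and "lowest mean CV error" is assumed to be the selection rule behind "most likely number of populations".

```python
import re

import numpy as np

# Assumed format of the cross-validation lines ADMIXTURE prints with --cv.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv(log_text):
    """Extract (K, CV error) pairs from ADMIXTURE log text."""
    return [(int(k), float(err)) for k, err in CV_LINE.findall(log_text)]


def choose_k(cv_errors):
    """Return (best_K, mean_cv_by_K); best_K has the lowest mean CV error.

    cv_errors: dict mapping K -> list of CV errors, one per replicate run.
    """
    mean_cv = {k: float(np.mean(v)) for k, v in cv_errors.items()}
    best_k = min(mean_cv, key=mean_cv.get)
    return best_k, mean_cv


def best_replicate(loglikelihoods):
    """Index of the replicate with the greatest final log-likelihood."""
    return int(np.argmax(loglikelihoods))
```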

Ancestry values (Q-matrix) for each sample were taken from the replicate with the greatest likelihood. Principal components analysis (PCA) was used to examine general patterns of parasite relatedness; eigenvectors were calculated in R [118] using the functions dist and cmdscale.
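Because `cmdscale` performs classical multidimensional scaling (principal coordinates analysis), applying it to Euclidean distances from `dist` gives the same sample coordinates as PCA on the centred genotype matrix, up to the sign of each axis. Below is a minimal NumPy sketch of that computation. It assumes a complete numeric genotype matrix, and the function name is hypothetical.

```python
import numpy as np


def classical_mds(X, k=2):
    """Principal coordinates of the rows of X, analogous to cmdscale(dist(X), k) in R.

    X: samples x SNPs numeric matrix (e.g. 0/1 allele coding, no missing values).
    """
    X = np.asarray(X, dtype=float)
    gram = X @ X.T
    sq_norms = np.diag(gram)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2 * gram  # squared Euclidean distances
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering                  # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(b)                   # ascending order
    top = np.argsort(eigvals)[::-1][:k]                    # largest eigenvalues first
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0, None))
```

Distances are formed from the Gram matrix rather than by broadcasting over every sample pair and SNP, so memory grows with the square of the number of samples rather than with samples squared times SNPs.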

3.4.2.5 Population Differentiation

Genetic differences between parasite populations are influenced by their time since divergence as well as the level of migration between them. A standard way of measuring these differences is Wright's FST, which compares the average heterozygosity within two populations to the heterozygosity of their theoretical combined population [119]. In two isolated populations, genetic drift acts randomly and independently on each population, driving certain alleles to fixation and others to loss. The isolation by distance model states that the farther apart two populations are, the longer they have been isolated from each other and, consequently, the more time their allele frequencies have had to diverge [120]. To test this model we used a Mantel test, implemented with the ecodist package in the R statistical program.
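A Mantel test correlates the entries of two distance matrices and assesses significance by permuting the sample labels of one matrix. The sketch below is a simplified, hypothetical Python version, not the ecodist implementation. It uses Pearson correlation and a one-sided permutation p-value, and the choice of 999 permutations is an assumption.

```python
import numpy as np


def mantel(d1, d2, permutations=999, seed=0):
    """Mantel test between two square distance matrices.

    Returns (r, p): the Pearson correlation of the upper-triangle entries and a
    one-sided permutation p-value, the fraction of permutations (including the
    observed labelling) with correlation >= the observed r.
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    upper = np.triu_indices(n, k=1)
    r_obs = np.corrcoef(d1[upper], d2[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 1  # count the observed arrangement
    for _ in range(permutations):
        perm = rng.permutation(n)
        shuffled = d1[np.ix_(perm, perm)]  # relabel samples in matrix 1 only
        if np.corrcoef(shuffled[upper], d2[upper])[0, 1] >= r_obs:
            hits += 1
    return float(r_obs), hits / (permutations + 1)
```

In the isolation-by-distance setting, `d1` would be the pairwise FST matrix and `d2` the pairwise geographic distances between sites. Rows and columns are permuted together so each permuted matrix is still a valid distance matrix.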

3.4.2.6 Coalescent Analysis

Programs based on coalescent theory reconstruct the genealogical history of a population under varying parameter values to find the combination of values that best fits the data (i.e., has the highest likelihood) under the genealogical model. Such programs use Markov chain Monte Carlo sampling to select values for parameters such as migration rate, effective population size, and mutation rate in order to generate and evaluate genealogies. For this study LAMARC [121] was used to estimate the effective population size (Ne) through the parameter θ (2Neµ) and directional migration rates (M = m/µ, where m is the per-generation probability that a lineage is an immigrant). Prior to analysis, cross-population extended haplotype homozygosity was used to identify regions of the parasite genome affected by selective sweeps due to linked positively selected loci [122]. Loci within regions affected by sweeps were masked from coalescent estimation, and plink was used to filter linked loci using the same criteria applied prior to ancestry analysis.
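The linkage filter can be sketched as a greedy windowed r² prune. This hypothetical helper assumes a complete 0/1 genotype matrix with SNPs in chromosomal order. It takes the window size and overlap as described for the ancestry analysis (so windows advance by window minus overlap SNPs) and always drops the later SNP of a correlated pair. plink's own pruning is parameterized by window size, step, and r² threshold and uses its own rule for which SNP to drop, so results will not match it exactly.

```python
import numpy as np


def r_squared(a, b):
    """Squared Pearson correlation of two allele vectors; NaN if either is monomorphic."""
    if a.std() == 0 or b.std() == 0:
        return float("nan")
    return float(np.corrcoef(a, b)[0, 1] ** 2)


def ld_prune(G, window=30, overlap=10, max_r2=0.1):
    """Greedy windowed LD pruning; returns indices of retained SNPs.

    G: samples x SNPs 0/1 matrix, SNPs in chromosomal order, no missing data.
    Windows advance by (window - overlap) SNPs. Within a window, whenever two
    retained SNPs have r^2 >= max_r2, the later SNP is dropped.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be smaller than the window size")
    G = np.asarray(G, dtype=float)
    n_snps = G.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    for start in range(0, n_snps, window - overlap):
        idx = [i for i in range(start, min(start + window, n_snps)) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue
            for j in idx[pos + 1:]:
                # r_squared returns NaN for monomorphic SNPs; NaN >= max_r2 is False.
                if keep[j] and r_squared(G[:, i], G[:, j]) >= max_r2:
                    keep[j] = False
    return np.flatnonzero(keep)
```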

For each LAMARC run, identical parameters were used for every chromosome. The nucleotide substitution model was the Felsenstein 84 model, which allows for unequal base frequencies, with a modified transition to transversion ratio of 0.63 to account for the genomic bias in P. falciparum [123]. Because panel-based SNPs were used for coalescent analysis, a correction factor was applied prior to running the model [124]. Initial θ values were calculated from the data using the FST setting, and starting migration values were not estimated but began at the default of 100. We employed a Bayesian search strategy with a single long chain of length 1,000,000. Bayesian priors for θ were on a logarithmic scale with upper and lower limits of 2 and 0.0001, respectively. Priors for migration were on a linear scale with upper and lower limits of 1000 and 0.01, respectively. Chain heating was employed with four temperatures of 1, 1.1, 1.5, and 3.
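To make the prior and heating settings concrete, the following sketch shows what a logarithmic versus a linear uniform prior implies, together with the standard swap rule of Metropolis-coupled MCMC under the four reported temperatures. It is a generic illustration with hypothetical function names, not LAMARC's code. It assumes heating raises the likelihood to the power 1/T while the prior is left unheated; LAMARC's internal details may differ.

```python
import math
import random

TEMPERATURES = [1.0, 1.1, 1.5, 3.0]  # chain temperatures reported in the text
THETA_PRIOR = (0.0001, 2.0)          # logarithmic scale
MIGRATION_PRIOR = (0.01, 1000.0)     # linear scale


def sample_log_uniform(lo, hi, rng):
    """Draw from a prior that is uniform on the log scale between lo and hi."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def sample_linear_uniform(lo, hi, rng):
    """Draw from a prior that is uniform on the linear scale between lo and hi."""
    return rng.uniform(lo, hi)


def swap_accept_prob(loglik_i, loglik_j, temp_i, temp_j):
    """Probability of swapping the states of chains i and j in Metropolis-coupled MCMC.

    Each chain targets likelihood**(1/T) times the prior; the priors cancel, so a
    swap is accepted with probability min(1, exp((1/T_i - 1/T_j) * (logL_j - logL_i))).
    """
    log_ratio = (1.0 / temp_i - 1.0 / temp_j) * (loglik_j - loglik_i)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


rng = random.Random(1)
theta_draws = [sample_log_uniform(*THETA_PRIOR, rng) for _ in range(1000)]
```

With these bounds, about 46 percent of the logarithmic θ prior mass lies below 0.01, since (ln 0.01 - ln 0.0001) / (ln 2 - ln 0.0001) is roughly 0.465, whereas a linear prior over the same range would place almost none there. This is why parameters spanning several orders of magnitude are usually given logarithmic priors.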

Figure 3.3 Histograms of identity-by-state values. Pairwise IBS values (A) among Southeast Asian samples, (B) among African samples, and (C) between Southeast Asian and African samples. The vertical line shows the 0.995 cut-off value for "related" groups.


Figure 3.4 Average cross-validation error per K. Plot of average CV error rate over 10 replicates for each population number (K).


Chapter 4: Detection of Signatures of Positive Selection in Plasmodium falciparum.

Manuscript formatted for submission to Genome Research. All samples were collected by local organizations and were provided to us by Arjen Dondorp, Mark Fukuda, Francois Nosten, Harald Noedl, Mallika Imwong, Delia Bethell, Youry Se, Chanthap Lon, Stuart Tyner, David L Saunders, Charlotte Lanteri, Frederic Ariey, Aung Pyae Phyo, Peter Starzengruber, Hans-Peter Fuehrer, Paul Swoboda, Nicholas White, Odile Mercereau-Puijalon, Didier Menard, Paul Newton, Maniphone Khanthavong, Bouasy Hongvanthong, Wasif A. Khan, Myaing M. Nyunt, Myat H. Nyunt, Myat P. Kyaw, Ye Htut, and Kay Thwe Han. Design and conceptualization were aided by Shannon Takala-Harrison, Joana Carneiro da Silva, and Christopher V. Plowe. Statistical analysis was aided by Jason Bailey. Whole genome sequencing was completed by the team at the Sanger Centre: Olivo Miotto, Susana Campino, Sarah Auburn, Bronwyn MacInnis, Magnus Manske, Jacob Almagro-Garcia, Gareth Maslen, Robert Amato, and Dominic Kwiatkowski. Some DNA microarray samples were run in our laboratory by Christopher Pepin, Matthew Adams, Kaia Erickson, Amy Wang, and Sudhaunshu Joshi.

4.1 Introduction

Between 2000 and 2013, malaria incidence rates decreased by more than 30% [125], gains that need to be maintained if global eradication is to remain attainable [126]. No vaccine for malaria is currently licensed, and resistance to artemisinin, the basis of World Health Organization (WHO) recommended treatment, is present at multiple sites in Southeast Asia [95, 97, 98, 127]. There is a constant need for novel drug targets and therapeutics to replace those lost to resistance, as well as new antigens to develop into vaccines. The increased availability of whole-genome sequence and genotype data for Plasmodium falciparum allows more comprehensive scans for genes of public health interest. One such approach is to identify regions of the parasite genome potentially under recent positive selection resulting from natural adaptation or resistance to antimalarial drugs. Screens for positive selection in humans have shown that genetic variants associated with resistance to malaria infection are under positive selection [128], and such studies in malaria parasites were recently used to locate a locus associated with resistance to artemisinin [100, 101]. These studies highlight the power of tests of selection, and other genome-wide analyses have identified potentially interesting genes within specific populations of parasites and humans. Previous studies of positive selection in P. falciparum, however, have focused on one or a few populations, making it difficult to differentiate between genes under local adaptation and those under selection across entire regions or globally.

Most genome-wide studies of selection in P. falciparum use long-haplotype methods to search for genomic regions under positive selection. The integrated haplotype score (iHS) is computed within a single population and can detect long haplotypes at intermediate frequency. Cross-population extended haplotype homozygosity (XP-EHH) applies the same extended haplotype homozygosity framework to compare haplotype lengths between two populations, and is useful in detecting haplotypes that have risen to high frequency or fixation in one population. A third method, used in human population studies, applies a windowed FST analysis to search for regions of drastic population differentiation that could have arisen from positive selection. In P. falciparum studies using iHS and XP-EHH, the drug-resistance associated genes pfcrt, pfmdr1, pfdhps, pfdhfr, and pfgch1 were identified as lying in regions under possible positive selection, along with hundreds of other genes of known or unknown function [129-133].
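The long-haplotype statistics described above can be sketched from their definitions. EHH between a core SNP and a distant SNP is the probability that two randomly drawn haplotypes are identical over the whole interval. Integrating EHH over physical distance gives iHH, from which unstandardized iHS (ancestral versus derived carriers within one population) and XP-EHH (the same quantity over all haplotypes in two populations) are formed as log ratios. This is a simplified Python sketch with hypothetical names. It assumes phased 0/1 haplotypes, trapezoidal integration, and a 0.05 EHH cutoff; tools such as selscan add refinements (gap handling, maximum extension, standardization), and sign conventions vary between implementations.

```python
from collections import Counter

import numpy as np


def ehh(haps, core, end):
    """Probability that two random haplotypes are identical from SNP `core` to `end`.

    haps: haplotypes x SNPs array of 0/1 alleles (phased, no missing data).
    """
    lo, hi = sorted((core, end))
    n = haps.shape[0]
    if n < 2:
        return 0.0
    counts = Counter(map(tuple, haps[:, lo:hi + 1]))
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def ihh(haps, pos, core, cutoff=0.05):
    """Integrated EHH: trapezoid-rule area under EHH versus physical position,
    extended on each side of the core until EHH drops below `cutoff`."""
    total = 0.0
    for step in (1, -1):
        prev_e, prev_x = ehh(haps, core, core), pos[core]
        j = core + step
        while 0 <= j < len(pos):
            e = ehh(haps, core, j)
            total += 0.5 * (prev_e + e) * abs(pos[j] - prev_x)
            if e < cutoff:
                break
            prev_e, prev_x = e, pos[j]
            j += step
    return total


def xp_ehh(haps_a, haps_b, pos, core):
    """Unstandardized XP-EHH, ln(iHH_A / iHH_B): positive => longer haplotypes in A."""
    return float(np.log(ihh(haps_a, pos, core) / ihh(haps_b, pos, core)))


def ihs_unstandardized(haps, pos, core, ancestral=0):
    """Unstandardized iHS, ln(iHH_ancestral / iHH_derived), at the core SNP."""
    anc = haps[haps[:, core] == ancestral]
    der = haps[haps[:, core] != ancestral]
    return float(np.log(ihh(anc, pos, core) / ihh(der, pos, core)))
```

Under this convention a negative iHS marks unusually long haplotypes carrying the derived allele, matching the ancestral/derived interpretation used in the Methods.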

Our study aims to characterize signatures of selection shared across multiple sites in Southeast Asia in order to locate regions of the genome under positive selection due to regional natural adaptation. Selective events that occur in multiple or all populations of a region could help locate genes encoding essential proteins for use as drug targets, or conserved antigens that could be integrated into new vaccine designs. To determine regions under selection we used single nucleotide polymorphism data from samples collected at 19 study sites in Southeast Asia. Both windowed FST and XP-EHH methods were used for between-population comparisons, along with the population-specific iHS method to identify ongoing sweeps. Using all three methods, as well as a new combined FST-XP-EHH score to locate regions with both long haplotypes and high population differentiation, we find that most signatures of selection are population specific while a subset appear across multiple populations. We also see signatures of selection in populations with a high prevalence of artemisinin resistance that could harbor genes accessory to the recently discovered kelch13 mutations [102].

4.2 Methods

4.2.1 Sampling Locations

Sampling sites were selected from ongoing and completed surveillance studies and clinical trials from all nations contained within the Greater Mekong Sub-Region (GMS: Cambodia, Laos, Myanmar, Thailand, Vietnam, and the Yunnan Province of China). A total of 31 studies were used from 23 sites in six countries from the GMS, with two studies from Bangladesh and two additional sites in Africa. All samples were collected between 2008 and 2013 and were from studies of ACT or artesunate therapeutic efficacy or from passive surveillance for clinical malaria. Samples are from smear- or PCR-confirmed P. falciparum infections and met the inclusion criteria of the initial collection study. All samples were collected with approval from local review boards, and analysis was completed under institutional review board approval at the University of Maryland School of Medicine.

4.2.2 Parasite Genotyping

All analyses in this study were completed using SNP data. These SNPs were called from full genome sequence or from a custom DNA microarray [106]. Samples of adequate quality from collaborating studies were sent to the Wellcome Trust Sanger Institute for full genome sequencing [115]. DNA was extracted from whole venous blood or leukocyte-depleted blood. Samples were sequenced to varying coverage using Illumina technology and analyzed with an automated pipeline at the Sanger Institute, which called approximately 400,000 positions in the genome determined to be variable based on a global population sample set. Samples from studies that did not meet quality control criteria for full genome sequencing, or that were from non-contributing studies, were analyzed using a NimbleGen custom DNA microarray assaying approximately 33,000 variable positions in the parasite genome [106]. The same nucleotide positions typed on the microarray were extracted from the full genome data for use in subsequent analysis.

SNP imputation was completed with Beagle [134], which has been used in a similar capacity in multiple genomic studies to fill in missing data in haplotypes [99-101, 134]. This imputation software uses a haplotype phasing method that identifies linkage disequilibrium (LD) blocks and fits a maximum likelihood model within blocks, giving preference to preexisting haplotypes over newly generated ones. Imputation was performed in each genetic population separately to keep population haplotypes at the frequencies observed in the data. A genotype certainty score is given for each imputed nucleotide, and only nucleotides with scores >= 0.95 were used.
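The certainty filter, applied after imputing each genetic population separately, could look like the sketch below. It does not call Beagle or parse its output. `impute_fn` is a hypothetical stand-in for any imputation step that returns imputed calls plus a same-shaped array of certainty scores, and calls below 0.95 are assumed to be set back to missing (`N`) rather than dropped.

```python
import numpy as np


def mask_low_certainty(calls, certainty, min_certainty=0.95):
    """Set imputed calls with certainty below the threshold back to missing ("N")."""
    calls = np.array(calls, dtype="<U1", copy=True)
    certainty = np.asarray(certainty, dtype=float)
    calls[certainty < min_certainty] = "N"
    return calls


def impute_by_population(calls, populations, impute_fn, min_certainty=0.95):
    """Impute each genetic population separately, then apply the certainty filter.

    calls: samples x SNPs base calls ("N" = missing); populations: one label per
    sample; impute_fn: hypothetical callable returning (imputed_calls, certainty)
    for the subset of samples it is given.
    """
    calls = np.asarray(calls, dtype="<U1")
    populations = np.asarray(populations)
    out = calls.copy()
    for pop in np.unique(populations):
        rows = populations == pop
        imputed, certainty = impute_fn(calls[rows])
        out[rows] = mask_low_certainty(imputed, certainty, min_certainty)
    return out
```

Passing only one population's samples to the imputation step is what keeps haplotype frequencies from other populations from influencing the imputed calls.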

4.2.3 Detection of Signatures of Positive Selection

We used cross-population extended haplotype homozygosity (XP-EHH) [122], integrated haplotype scores (iHS) [135], and population differentiation to detect signatures of positive selection. Both XP-EHH and iHS search for unusually long haplotypes (i.e., extended blocks of linkage disequilibrium) given their frequency in the population. Once the XP-EHH and iHS scores were computed, they were normalized to a distribution with a mean of zero and a variance of one. The raw scores are already approximately normally distributed; after normalization, positive XP-EHH values indicate unusually long haplotypes in the reference population and negative values indicate extended haplotypes in the non-reference population, while for iHS positive and negative values indicate extended haplotypes on ancestral and derived alleles, respectively. XP-EHH and iHS scores were calculated with the program SELSCAN using modified parameters, which included truncation of haplotypes at the ends of chromosomes and extension of haplotypes up to 200 kb [136].
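Standardizing scores to mean zero and variance one is a simple shift and scale. The sketch below shows the genome-wide version described in the text. As an assumption, it also shows a frequency-binned variant often used for iHS, because raw iHS depends on allele frequency; the text does not say whether binning was applied. Function names are hypothetical.

```python
import numpy as np


def standardize(scores):
    """Shift and scale scores to mean 0 and variance 1, ignoring NaNs."""
    scores = np.asarray(scores, dtype=float)
    return (scores - np.nanmean(scores)) / np.nanstd(scores)


def standardize_by_freq_bin(scores, derived_freq, n_bins=20):
    """Standardize separately within derived-allele-frequency bins (assumed variant).

    SNPs in bins with fewer than two scores or zero spread are left as NaN.
    """
    scores = np.asarray(scores, dtype=float)
    freq = np.asarray(derived_freq, dtype=float)
    bins = np.minimum((freq * n_bins).astype(int), n_bins - 1)  # freq 1.0 -> last bin
    out = np.full_like(scores, np.nan)
    for b in np.unique(bins):
        in_bin = bins == b
        if in_bin.sum() > 1 and np.nanstd(scores[in_bin]) > 0:
            out[in_bin] = standardize(scores[in_bin])
    return out
```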

In the approach based on population differentiation, we used windowed FST values to look for highly differentiated genomic regions between populations, which might be indicative of positive selection [137]. Pairwise FST values were calculated for every SNP and averaged across 20 SNP windows, with an overlap of 5 SNPs between consecutive windows. FST values vary between 0 and 1, with 1 corresponding to cases where all the variation in the data is due to differences between the two populations, and 0 to cases where all variants are present at similar frequencies in the two populations, which are therefore undifferentiated. To make FST values comparable across population pairs (e.g., South Cambodia versus West Cambodia compared with South Cambodia versus Bangladesh), we normalized each window's FST by the average FST across the entire chromosome for that population pair. Values between 0 and 1 represent loci with less than average differentiation, a value of 1 equals the mean, and values above 1 represent loci with higher than average differentiation. We devised a third statistic that combines the normalized FST values with the XP-EHH scores. For this combined XP-EHH-FST statistic, each SNP window was given a score equal to the product of its average normalized FST value and its average XP-EHH value. This combined measure takes into account both the presence of long haplotypes and population differentiation. Within each population comparison, genomic regions were ranked by the combined score, and windows at or above the 95th percentile were deemed significant. Genes were considered under selection in their respective population if the genomic region in which they lay was within the top 5% of scores.
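The windowed FST normalization and combined score can be sketched for one chromosome and one population pair. Assumptions not stated in the text are labeled in comments. Per-SNP FST uses a simple heterozygosity-based estimator, the chromosome-wide mean is taken over per-SNP values, and windows at the end of the chromosome may be shorter than 20 SNPs. Because XP-EHH can be negative, high combined scores pick out windows that are both unusually differentiated and carry long haplotypes in the reference population. All names are hypothetical.

```python
import numpy as np


def windows(n_snps, size=20, overlap=5):
    """(start, stop) SNP index ranges; consecutive windows share `overlap` SNPs."""
    if n_snps == 0:
        return []
    step = size - overlap
    return [(s, min(s + size, n_snps)) for s in range(0, max(n_snps - overlap, 1), step)]


def per_snp_fst(p1, p2):
    """Heterozygosity-based FST per SNP, (HT - HS) / HT; NaN where HT == 0 (assumed estimator)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    hs = p1 * (1 - p1) + p2 * (1 - p2)  # mean of 2p(1-p) over the two populations
    pbar = (p1 + p2) / 2
    ht = 2 * pbar * (1 - pbar)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(ht > 0, (ht - hs) / ht, np.nan)


def combined_scores(p1, p2, xpehh, size=20, overlap=5):
    """Combined XP-EHH-FST score for each window on one chromosome.

    Window FST (mean of per-SNP FST) is divided by the chromosome-wide mean FST,
    so 1 means average differentiation, then multiplied by the window's mean XP-EHH.
    """
    fst = per_snp_fst(p1, p2)
    xpehh = np.asarray(xpehh, dtype=float)
    chrom_mean = np.nanmean(fst)  # assumption: mean over per-SNP values
    scored = []
    for start, stop in windows(len(fst), size, overlap):
        norm_fst = np.nanmean(fst[start:stop]) / chrom_mean
        scored.append((start, stop, float(norm_fst * np.nanmean(xpehh[start:stop]))))
    return scored


def top_windows(scored, pct=95):
    """Windows whose combined score is at or above the given percentile."""
    cutoff = np.percentile([score for _, _, score in scored], pct)
    return [w for w in scored if w[2] >= cutoff]
```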

4.3 Results

4.3.1 Genes under selection across multiple geographic sites

Using the combined XP-EHH-FST score, the genome-wide scans of positive selection identified many population-specific selection events, as well as selection events that appear in multiple populations simultaneously. Twenty regions were identified as being highly shared between multiple populations, with five genes under selection in seven geographic locations (Table 4.1). The first gene lies in a region under selection on chromosome 1 that appears in all three West Cambodia sites (Battambang, Pailin, and Pursat), East Thailand (Srisakhet), and North Cambodia (Oddar Meanchey and Preah Vihear). This extended signature contains thirteen genes, two of which are present in six or seven population comparisons. These genes, PF3D7_0113700 and PF3D7_0113800, encode a heat-shock protein 40 (type II) [138] and a DBL-containing PfEMP1 paralog [139], respectively. Most genes within this region encode exported proteins of unknown function, but a third encodes a RESA-like protein. The second and third genes are within a 77 kb region on chromosome 11 that contains 18 genes. Both are conserved Plasmodium proteins of unknown function with no assigned gene ontology terms. These genes lie directly downstream of two related ETRAMP genes (11.1 and 11.2), which localize to the parasitophorous vacuole and are expressed in early ring-stage parasites, and upstream of sufD and multiple kinase activators. The fourth gene under selection across seven populations, PF3D7_1223500, is again unannotated. This gene lies at the overlap of two seemingly distinct signatures, the first containing a core of seven genes and the second a large (105 kb) signature centered on the drug resistance gene GTP cyclohydrolase 1 (gch1). The final gene, PF3D7_1343800, lacks functional annotation but appears to be an integral membrane protein and lies directly next to the artemisinin resistance gene kelch13.


Table 4.1 Top regions under selection across multiple populations.

Chr     Start      Stop       Genes
MAL1    520807     536351     2
MAL2    509525     521746     1
MAL5    907837     936315     1
MAL6    228132     244521     3
MAL6    268856     308549     10
MAL8    845253     847380     1
MAL8    1043813    1059003    5
MAL8    482461     487554     1
MAL8    511497     521529     1
MAL9    597694     600561     1
MAL9    972548     979965     2
MAL9    136661     139034     2
MAL11   146427     150815     1
MAL12   946864     1010251    14
MAL12   300988     313929     1
MAL13   1730702    1753486    1
MAL13   1985337    1995614    1
MAL13   669692     678772     1
MAL14   2155736    2166467    1
MAL14   2519205    2528994    3

Among the other signatures containing a gene selected in at least six different populations, there are 447 genes. These include multiple heat shock proteins (4), DnaJ domain proteins (5), and AP2 domain-containing transcription factors (5). There are also two drug resistance associated genes: pfmdr1 [140], associated with resistance to mefloquine and chloroquine, and gch1, associated with resistance to the antifolate drugs pyrimethamine and sulfadoxine [141]. Unlike other drug resistance loci (pfcrt, pfdhps, pfdhfr), these two genes are known for effects brought about by increased copy number. Other highly selected genes include a MORN repeat protein, shown in Toxoplasma gondii to be associated with nuclear division during cellular replication [142], and cyp24, a member of the cyclophilin family, which in Plasmodium is the receptor for antimalarial cyclosporins [143].

4.3.2 Drug Resistance Associated Signatures of Selection

Drug resistance genes often leave large scars in the genome following a selective sweep. Multiple drug-resistance associated sweeps have passed through various parts of Southeast Asia, caused mainly by four genes (pfcrt, pfmdr1, pfdhps, and pfdhfr) responsible for resistance to chloroquine, mefloquine, sulfadoxine, and pyrimethamine. The recent emergence of resistance to artemisinin has also been shown to produce a large signature of selection [100, 101]. When these signatures are examined by ancestral genetic subpopulation, only the African population shows any selection near kelch13; since artemisinin resistance has not been reported in Africa and resistance is a recent event, this signal is most likely due to another nearby gene unrelated to resistance. By geographic location, pfdhfr is under selection at both sites in Laos, pfcrt is under selection only in Bago, Myanmar, and pfmdr1 is under selection in multiple populations, including Battambang, Kinshasa, Oddar Meanchey, Pursat, and Srisakhet. pfdhps is not under high selection in any geographic location, but milder selective events appear on visual inspection of the plots in Savannakhet, Kinshasa, Ilorin, Chittagong, and Binh Phuoc. Artemisinin resistance attributed to kelch13 appears to leave signatures in Bago, Binh Phuoc, Chittagong, Ilorin, Kampot, Pailin, Ratanakiri, and Savannakhet. Signatures containing kelch13 are not particularly large (wide), with the exception of Pailin, which shows extensive haplotype homozygosity over a very large section of the chromosome.


4.4 Discussion

Our analyses of signatures of selection in parasite populations provide a resource for the malaria community to use in studies of drug resistance and vaccine development. We observed signatures unique to geographic populations as well as signatures in ancestral populations that have persisted across multiple geographic locations. Among all signatures we see a large number of chaperone proteins, along with genes encoding proteins associated with drug resistance and candidate vaccines. As with the rest of the P. falciparum genome, many of the highly selected genes have little associated information beyond putative amino acid structures (e.g., transmembrane domains). Among the annotated selected genes we find evidence for selection surrounding the drug resistance genes gch1 and kelch13, and ron2, which has recently shown promise as an effective vaccine antigen [144].

Drug resistance to artemisinin has been the focal point of malaria research in Southeast Asia since the first reports in 2008 [98, 127]. Since that time resistance has been more fully defined: new tests were developed to define in vitro resistance [145], and clinical resistance has become more clearly understood [146]. The first genome-wide study looking for molecular associations with resistance explored a region on chromosome 13 linked with kelch13, using XP-EHH to narrow regions of interest prior to association analysis [100]. A second study used a more traditional genome-wide association study (GWAS) design, and its results pointed toward the same region on chromosome 13 as well as two additional loci (chromosomes 10 and 14) [101]. Subsequent studies in Southeast Asia show multiple haplotypes and alleles associated with resistance [95, 99]. At sites with multiple resistance alleles, selection scans lose power because the signal is diluted across multiple soft sweeps. At Pailin, however, where resistance is near fixation, the C580Y mutation is the majority allele and a large signal is present in genome-wide scans of selection.

To identify accessory mutations that could be working in parallel with kelch13, we explored highly selected genes in Pailin, where a single mutation predominates (Figure 4.1). Among the top 1% of selection scores we identified 14 non-overlapping windows. Fourteen genes at the cores of the selected haplotypes were identified, among them two proteins (PF3D7_1217600 and PF3D7_1333200) involved in cell cycle regulation that could be related to results from a recent transcriptional study of resistant parasites [147]. That study identified multiple pathways dysregulated in artemisinin-resistant parasites, including downregulation of DNA replication genes, which can stall life stage progression.
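The text does not specify how non-overlapping windows were obtained from the top 1% of scores. One plausible approach, sketched here with hypothetical helpers, is to keep windows at or above the 99th percentile and merge any that overlap (adjacent windows share 5 SNPs) into disjoint regions.

```python
import numpy as np


def merge_intervals(intervals):
    """Merge overlapping [start, stop) intervals into disjoint regions."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(m) for m in merged]


def top_regions(scored, pct=99):
    """Disjoint regions built from windows scoring at or above the pct percentile.

    scored: list of (start, stop, score) tuples, e.g. combined XP-EHH-FST windows.
    """
    cutoff = np.percentile([score for _, _, score in scored], pct)
    return merge_intervals([(a, b) for a, b, score in scored if score >= cutoff])
```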

Figure 4.1 Heat map of selected regions in Pailin, Cambodia. Heat map of regions under possible positive selection in Pailin, Cambodia. Regions are colored by rank percentile of combined F ST -XP-EHH score.



Another result is a strong signature on chromosome 1 that is shared by populations throughout West and North Cambodia and Southeast Thailand. This signature contains a core of four genes, three of which are known to be exported into the red blood cell and may be involved in knob formation. These genes, an HSP40 protein with a known association with exported VAR proteins [148], a PfEMP1 paralog [139], and a RESA-like protein, could act together as a gene cassette underlying the selected haplotype, or a single mutation within this set could be responsible. The distribution of this signature resembles that of the hemoglobin E variant (HbE), which occurs at higher frequency in this particular area of Southeast Asia [149]. We hypothesize that this cassette of proteins, or an individual protein within the signature, has adapted specifically to replicate more efficiently in people with HbE. This discovery could help further elucidate the mechanism by which the parasite manipulates the human red blood cell.

Detection of putative loci under positive selection highlights the benefit of phenotype-free whole-genome analyses in the search for genes of public health interest. Here we describe loci under possible selection associated with drug resistance, interaction with the human immune system, and natural adaptation to human red cell polymorphisms. These loci can help narrow the focus of efforts to develop antimalarial treatments and novel vaccine antigens, and should serve as a resource for the malaria community to inform future genome-wide and gene-specific studies.

Chapter 5: Discussion and Future Directions

Malaria is a parasitic disease with widespread distribution that threatens the health and lives of billions of people worldwide. Plasmodium falciparum and P. vivax are the most widespread species that cause disease, with P. falciparum being the more deadly of the two. The need for effective ways to treat, eliminate, and eventually eradicate malaria is both perennial and urgent. Sequencing of the P. falciparum genome gave researchers a rich new tool in the fight against malaria [13]; however, it was soon realized that this tool would be far more difficult to use than previously thought [150]. Since its publication thousands of additional genomes have been sequenced, and only in the past five years have whole-genome data been fully utilized.

Our studies focused on extracting whole-genome data from clinical samples of P. falciparum malaria for use in population-wide association analyses. Whole-genome sequencing remains expensive for large-scale genomic epidemiological studies, so we sought a more scalable option. We designed and tested the first generation of a whole-genome single nucleotide polymorphism (SNP) microarray for genotyping parasites extracted from infected individuals in the field [106]. The microarray was designed for maximum throughput while maintaining high-resolution, whole-genome coverage. We selected approximately 33,000 polymorphic sites in the genome for a custom NimbleGen 4.2M platform. Unlike other P. falciparum microarrays, which used polymorphic sites identified in one or a few isolates [82, 151], ours used polymorphic sites determined by deep sequencing of hundreds of global isolates [115]. Each slide-based array has twelve individual sub-arrays, to each of which two separately labeled (colored) samples can be hybridized. Accuracy and sample limits were tested using reference DNA from laboratory isolates (3D7, NF54, and Dd2) and field samples from Southeast Asia. All reference and test isolates had previously undergone whole-genome sequencing, so results from hybridized samples could be compared against the sequence data.
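The accuracy test described above comes down to comparing array genotype calls with whole-genome sequencing calls at the same SNPs. The Python sketch below is illustrative only. The function names `concordance` and `call_rate`, the encoding of calls as allele strings with `None` for a failed call, and the choice to skip sites missing on either platform are assumptions, not the procedure reported in [106]. As a point of arithmetic, twelve sub-arrays with two labeled samples each allow up to 24 samples per slide.

```python
from typing import Optional, Sequence, Tuple


def concordance(array_calls: Sequence[Optional[str]],
                wgs_calls: Sequence[Optional[str]]) -> Tuple[float, int]:
    """Fraction of sites where the array call matches the sequencing call.

    Both sequences list calls at the same SNPs in the same order; None marks
    a site with no call. Sites missing on either platform are skipped, so the
    call rate is reported separately from concordance.
    Returns (concordance, number_of_sites_compared).
    """
    if len(array_calls) != len(wgs_calls):
        raise ValueError("call vectors must cover the same sites")
    compared = matched = 0
    for a, w in zip(array_calls, wgs_calls):
        if a is None or w is None:
            continue  # no call on one platform: not informative for accuracy
        compared += 1
        if a == w:
            matched += 1
    if compared == 0:
        return float("nan"), 0
    return matched / compared, compared


def call_rate(calls: Sequence[Optional[str]]) -> float:
    """Fraction of sites with a non-missing call."""
    if not calls:
        return float("nan")
    return sum(c is not None for c in calls) / len(calls)


# Example: one hypothetical sample typed at four SNPs on both platforms.
array = ["A", "T", None, "G"]
wgs = ["A", "C", "C", "G"]
print(concordance(array, wgs))  # (0.6666666666666666, 3)
print(call_rate(array))         # 0.75
```

Keeping call rate and concordance separate prevents a sample with many failed calls from looking artificially accurate.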

After successfully testing this microarray, our laboratory was able to genotype thousands of field samples at relatively low cost for our population genetic studies. The microarray was used in a genome-wide association study of artemisinin resistance that corroborated the association of the kelch13 locus with resistance and also showed the spread and independent emergence of certain resistant parasites [99]. It was also used to genotype the parasite samples analyzed in this thesis, and these genotypes will be used in future analyses. Unfortunately, NimbleGen has discontinued production of custom microarrays, so we will not be able to extend this tool to other laboratories and collection studies. In future work we plan to build a second-generation microarray on a new platform, based on the polymorphic loci placed on the first-generation array.

In total we successfully genotyped over 2,000 clinical malaria samples, giving us the largest single Plasmodium genomic dataset assembled to date. Our analysis had two aims: first, to use these samples to identify loci associated with recently emerged artemisinin resistance; and second, to use the dataset to characterize the population genomics of Plasmodium in Southeast Asia.

Through collaborations with malaria researchers, programs, and national health systems, we collected samples from 21 sites in 7 countries in Southeast Asia, and 2 in Central and West Africa. These samples came from drug efficacy studies or from passive surveillance between 2008 and 2013, and all were from smear-positive cases of P. falciparum malaria. Genotypes used for analysis were derived either from whole-genome sequencing or from our custom NimbleGen microarray. In addition to being the origin of most drug-resistant strains of P. falciparum, Cambodia has been shown to have unusual patterns of population structure [99, 101, 104, 131]. Many early studies noted this structure but did not further investigate its causes or consequences. The emergence of artemisinin resistance, again in this area of Cambodia, brought these population genetic questions into focus. Early GWAS showed significant p-value inflation, with most of the false associations attributable to population structure. A mixed model using a genetic similarity matrix eventually controlled for this confounding [101], but understanding the structure remained important because it can affect how resistance alleles and haplotypes disperse across a geographic area.
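The p-value inflation mentioned above is often summarized with the genomic control inflation factor λ: the median association test statistic divided by its expected median under the null hypothesis. The thesis does not state that this particular diagnostic was used. The Python sketch below is an illustration that assumes two-sided p-values from 1-degree-of-freedom tests, and the function name `genomic_inflation` is hypothetical. Values near 1 suggest little inflation, while values well above 1 are consistent with confounding such as population structure. The mixed model itself is not reproduced here.

```python
from statistics import NormalDist, median
from typing import Sequence

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
_CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values: Sequence[float]) -> float:
    """Genomic control lambda from two-sided, 1-df association p-values."""
    if not p_values:
        raise ValueError("need at least one p-value")
    stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # Convert the two-sided p-value back to a chi-square(1) statistic.
        # Using the lower tail (p / 2) avoids rounding 1 - p/2 to 1.0 for tiny p.
        z = _STD_NORMAL.inv_cdf(p / 2.0)
        stats.append(z * z)
    return median(stats) / _CHI2_1DF_MEDIAN


# Evenly spaced (null-like) p-values give lambda close to 1 ...
null_like = [(i + 0.5) / 1001 for i in range(1001)]
print(round(genomic_inflation(null_like), 3))
# ... while uniformly shrunken p-values, as under unmodeled structure, push it above 1.
print(genomic_inflation([p / 4 for p in null_like]) > 1)
```

Because λ uses the median, a handful of true associations barely moves it, whereas genome-wide confounding shifts the whole distribution.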

To explore this population structure we used the model-based program ADMIXTURE [117], which is similar to STRUCTURE [152] but uses a newer optimization algorithm for faster analysis and provides internal model checking through cross-validation. We detected 13 genetic subpopulations in Southeast Asia and a single population comprising the African parasites. The greatest structure was seen within West Cambodia; however, this was also the most densely sampled area. It will be essential to expand the dataset to include other geographic sites sampled at similar density, to confirm that these structure patterns are unique to Cambodia and not an artifact of a skewed sampling protocol. We hypothesize that this is a Cambodia-specific phenomenon, because we see shared populations across the second most densely sampled area, Southern Laos and East Cambodia. These three populations share most of their genetic subpopulation structure, with only a minority of parasites belonging to other genetic subpopulations.
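ADMIXTURE's cross-validation is typically run once per candidate number of ancestral populations K, and the K with the lowest cross-validation error is a common starting point for model choice. The sketch below assumes log lines of the form `CV error (K=13): 0.51234`, as printed when ADMIXTURE is run with its `--cv` option. The exact format should be checked against the installed version, and the helper names `parse_cv_errors` and `lowest_cv_k` are hypothetical. It does not claim to reproduce how the 13 subpopulations reported here were selected.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed log format, e.g. "CV error (K=13): 0.51234"
_CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(lines: Iterable[str]) -> Dict[int, float]:
    """Collect {K: cross-validation error} from ADMIXTURE log lines."""
    errors: Dict[int, float] = {}
    for line in lines:
        match = _CV_LINE.search(line)
        if match:
            errors[int(match.group(1))] = float(match.group(2))
    return errors


def lowest_cv_k(errors: Dict[int, float]) -> Tuple[int, float]:
    """Return the K with the smallest CV error (ties go to the smaller K)."""
    if not errors:
        raise ValueError("no CV error lines found")
    best = min(errors, key=lambda k: (errors[k], k))
    return best, errors[best]


# Hypothetical log excerpt from runs at K = 2..4.
log = [
    "Loglikelihood: -1234567.8",
    "CV error (K=2): 0.60",
    "CV error (K=3): 0.55",
    "CV error (K=4): 0.57",
]
print(lowest_cv_k(parse_cv_errors(log)))  # (3, 0.55)
```

In practice the CV error curve often flattens rather than showing a sharp minimum, so neighboring values of K are usually inspected as well.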


In addition to determining the population structure, we were also interested in the patterns of migration among Southeast Asian parasites. Determining how alleles are shared among populations identifies the locations most at risk of importing and integrating newly emerged artemisinin resistance mutations. We have already seen resistance alleles spread between West Cambodia and South Vietnam, areas with little shared population structure [99]. These shared Cambodia-to-Vietnam alleles can serve as a case study for fully assessing how population structure and migration rate affect the spread and integration of resistance mutations. It will be important to monitor the genetic background of parasites carrying shared resistance mutations (outside the haplotype directly surrounding the kelch13 gene) to see whether these mutations are being integrated into the subset of parasites with “Cambodia-like” structure or into those of the East Southeast Asia subpopulation. It will also be important to monitor the allele frequencies of native versus imported resistance mutations to determine whether a different genetic background carries an associated fitness cost.
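One simple way to ask whether an imported resistance allele carries a fitness cost is to convert its frequency change over time into a selection coefficient. The sketch below assumes a haploid model (P. falciparum is haploid during its human blood stages) with constant relative fitness 1 + s. It also assumes no drift or continued migration, and a known number of generations between samples. Under these assumptions the odds of the allele are multiplied by (1 + s) each generation. The function names are hypothetical. Real surveillance data would also need to account for sampling error and ongoing importation.

```python
import math


def logit(p: float) -> float:
    """Log-odds of an allele frequency strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError(f"frequency must lie strictly between 0 and 1: {p}")
    return math.log(p / (1.0 - p))


def project_frequency(p0: float, s: float, generations: float) -> float:
    """Allele frequency after `generations` of haploid selection with fitness 1 + s."""
    odds = p0 / (1.0 - p0) * (1.0 + s) ** generations
    return odds / (1.0 + odds)


def selection_coefficient(p0: float, pt: float, generations: float) -> float:
    """Invert the haploid model: the s that takes p0 to pt in `generations`.

    Negative values indicate a fitness cost relative to the other allele.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return math.exp((logit(pt) - logit(p0)) / generations) - 1.0


# Hypothetical frequencies of a native and an imported resistance allele.
native = selection_coefficient(0.30, 0.45, generations=20)
imported = selection_coefficient(0.30, 0.20, generations=20)
print(f"native s = {native:+.4f}, imported s = {imported:+.4f}")
```

Working on the log-odds scale makes the haploid model linear in time, so the same function can also be fitted to more than two sampling points with ordinary regression.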

The final piece of the genetic studies presented in this thesis is the whole-genome scans for positive selection within and across Southeast Asian parasite populations. Positive selection is typically seen in response to natural adaptation or as selective sweeps surrounding drug resistance loci. Most studies of positive selection in P. falciparum have been coupled with GWAS searching for loci associated with drug resistance [101, 131, 132], while studies of positive selection in humans have focused primarily on natural adaptation to different environments. We also sought to use scans for positive selection to identify loci associated with artemisinin resistance, but such a comprehensive dataset allowed further exploration of the selective history of parasites throughout Southeast Asia. With the discovery of multiple mutations associated with artemisinin resistance [102], the utility of selection scans for this purpose diminished greatly. Methods that locate regions affected by selective sweeps rely on the long, high-frequency haplotypes produced by a resistance-conferring mutation [122, 128]. When multiple resistance alleles are present, they produce multiple long haplotypes, each at low to intermediate frequency, which commonly used methods can miss. To pursue selective sweeps associated with artemisinin resistance, we therefore focused on populations with a single predominant resistance allele, specifically Pailin, West Cambodia, where the C580Y mutation is at high frequency [95, 99]. Scans of this population located regions outside of the kelch13 locus on chromosome 13 that appear to be under positive directional selection.

In addition to selective events associated with artemisinin resistance, we also wanted to explore loci under selection shared across sites in Southeast Asia. By increasing the number of geographic sites sampled, we were able to determine which genomic loci underwent selection as a result of adaptation to the Southeast Asian environment, including new vectors and hosts. The greatest benefit of these whole-genome scans for selection is the identification of genes of public health importance for use as potential drug targets or vaccine antigens, two things that are needed as artemisinin resistance increases and parts of the globe move toward malaria elimination.
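The sweep-detection methods referred to above build on extended haplotype homozygosity (EHH) [69]. For chromosomes carrying a core allele, EHH is the probability that two randomly chosen ones are identical across the whole interval from the core SNP to a given distance. A recent sweep keeps EHH high over long distances. The Python sketch below assumes phased haploid data encoded as one string per parasite, with one character per SNP in genomic order. It is a simplified illustration rather than the implementation used for the scans in this thesis, and the function names are hypothetical. Standardized statistics such as iHS [67] integrate EHH profiles like these and compare them between the two alleles at the core SNP.

```python
from collections import Counter
from typing import List, Sequence


def ehh(haplotypes: Sequence[str], core: int, allele: str, end: int) -> float:
    """EHH for carriers of `allele` at SNP index `core`, over SNPs core..end.

    Each haplotype is a string with one allele character per SNP, in genomic
    order; `end` may lie on either side of `core`.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two carriers of the core allele")
    lo, hi = min(core, end), max(core, end)
    # Group carriers by their exact sequence over the interval.
    groups = Counter(h[lo:hi + 1] for h in carriers)
    identical_pairs = sum(c * (c - 1) // 2 for c in groups.values())
    return identical_pairs / (n * (n - 1) // 2)


def ehh_decay(haplotypes: Sequence[str], core: int, allele: str) -> List[float]:
    """EHH at each SNP from the core rightward to the last SNP."""
    length = len(haplotypes[0])
    return [ehh(haplotypes, core, allele, end) for end in range(core, length)]


# Toy data: five parasites typed at five SNPs; SNP 0 is the core.
haps = ["AAAAA", "AAAAA", "AAAAT", "AATTT", "TATAT"]
print(ehh_decay(haps, core=0, allele="A"))  # [1.0, 1.0, 0.5, 0.5, 0.16666666666666666]
```

Each extra carrier haplotype that breaks away from the others lowers EHH. This is why several distinct resistance haplotypes, each at modest frequency, yield weaker signals than one dominant sweep.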

These population genomic studies of structure, migration, and positive selection highlight the wide range of fruitful analyses made possible by high-quality whole-genome data. We explore not only the interaction of parasites across Southeast Asia through migration, but also site-specific features, including genetic subpopulations and loci under positive selection. The specific examples we cite in our migration and selection analyses are meant to promote hypothesis-driven research into the genes we have highlighted, and to serve as a resource for other researchers as they identify their own genes of interest. The extensive catalog of genotyped parasites also allows further association analyses, such as investigating the cause of emerging piperaquine resistance in Cambodia. We believe this research lays a firm foundation for population studies in Southeast Asia and can serve as a basis for further studies.


REFERENCES

1. Morrison DA: Evolution of the Apicomplexa: where are we now? Trends Parasitol 2009, 25: 375-382.

2. Cox-Singh J, Davis TM, Lee KS, Shamsul SS, Matusop A, Ratnam S, Rahman HA, Conway DJ, Singh B: Plasmodium knowlesi malaria in humans is widely distributed and potentially life threatening. Clin Infect Dis 2008, 46: 165-171.

3. Waller RF, McFadden GI: The apicoplast: a review of the derived plastid of apicomplexan parasites. Curr Issues Mol Biol 2005, 7: 57-79.

4. Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, Cheng Q, Coulson RM, Crabb BS, Del Portillo HA, Essien K, Feldblyum TV, Fernandez-Becerra C, Gilson PR, Gueye AH, Guo X, Kang'a S, Kooij TW, Korsinczky M, Meyer EV, Nene V, Paulsen I, White O, Ralph SA, Ren Q, Sargeant TJ, Salzberg SL, Stoeckert CJ, Sullivan SA, Yamamoto MM, Hoffman SL, Wortman JR, Gardner MJ, Galinski MR, Barnwell JW, Fraser-Liggett CM: Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature 2008, 455: 757-763.

5. Hall N, Karras M, Raine JD, Carlton JM, Kooij TW, Berriman M, Florens L, Janssen CS, Pain A, Christophides GK, James K, Rutherford K, Harris B, Harris D, Churcher C, Quail MA, Ormond D, Doggett J, Trueman HE, Mendoza J, Bidwell SL, Rajandream MA, Carucci DJ, Yates JR, III, Kafatos FC, Janse CJ, Barrell B, Turner CM, Waters AP, Sinden RE: A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses. Science 2005, 307: 82-86.

6. Carlton JM, Angiuoli SV, Suh BB, Kooij TW, Pertea M, Silva JC, Ermolaeva MD, Allen JE, Selengut JD, Koo HL, Peterson JD, Pop M, Kosack DS, Shumway MF, Bidwell SL, Shallom SJ, van Aken SE, Riedmuller SB, Feldblyum TV, Cho JK, Quackenbush J, Sedegah M, Shoaibi A, Cummings LM, Florens L, Yates JR, Raine JD, Sinden RE, Harris MA, Cunningham DA, Preiser PR, Bergman LW, Vaidya AB, van Lin LH, Janse CJ, Waters AP, Smith HO, White OR, Salzberg SL, Venter JC, Fraser CM, Hoffman SL, Gardner MJ, Carucci DJ: Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature 2002, 419: 512-519.

7. Pain A, Bohme U, Berry AE, Mungall K, Finn RD, Jackson AP, Mourier T, Mistry J, Pasini EM, Aslett MA, Balasubrammaniam S, Borgwardt K, Brooks K, Carret C, Carver TJ, Cherevach I, Chillingworth T, Clark TG,
Galinski MR, Hall N, Harper D, Harris D, Hauser H, Ivens A, Janssen CS, Keane T, Larke N, Lapp S, Marti M, Moule S, Meyer IM, Ormond D, Peters N, Sanders M, Sanders S, Sargeant TJ, Simmonds M, Smith F, Squares R, Thurston S, Tivey AR, Walker D, White B, Zuiderwijk E, Churcher C, Quail MA, Cowman AF, Turner CM, Rajandream MA, Kocken CH, Thomas AW, Newbold CI, Barrell BG, Berriman M: The genome of the simian and human malaria parasite Plasmodium knowlesi. Nature 2008, 455: 799-803.

8. Conway DJ, Polley SD: Measuring immune selection. Parasitology 2002, 125 Suppl: S3-16.

9. Hviid L: Naturally acquired immunity to Plasmodium falciparum malaria in Africa. Acta Trop 2005, 95: 270-275.

10. Su XZ, Heatwole VM, Wertheimer SP, Guinet F, Herrfeldt JA, Peterson DS, Ravetch JA, Wellems TE: The large diverse gene family var encodes proteins involved in cytoadherence and antigenic variation of Plasmodium falciparum-infected erythrocytes. Cell 1995, 82: 89-100.

11. Baruch DI, Pasloske BL, Singh HB, Bi X, Ma XC, Feldman M, Taraschi TF, Howard RJ: Cloning the P. falciparum gene encoding PfEMP1, a malarial variant antigen and adherence receptor on the surface of parasitized human erythrocytes. Cell 1995, 82: 77-87.

12. Smith JD, Chitnis CE, Craig AG, Roberts DJ, Hudson-Taylor DE, Peterson DS, Pinches R, Newbold CI, Miller LH: Switches in expression of Plasmodium falciparum var genes correlate with changes in antigenic and cytoadherent phenotypes of infected erythrocytes. Cell 1995, 82: 101-110.

13. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 2002, 419: 498-511.

14. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R,
Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigo R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M: The sequence of the human genome. Science 2001, 291: 1304-1351.

15. Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, Salzberg SL, Loftus B, Yandell M, Majoros WH, Rusch DB, Lai Z, Kraft CL, Abril JF, Anthouard V, Arensburger P, Atkinson PW, Baden H, de Berardinis V, Baldwin D, Benes V, Biedler J, Blass C, Bolanos R, Boscus D, Barnstead M, Cai S, Center A, Chaturverdi K, Christophides GK, Chrystal MA, Clamp M, Cravchik A, Curwen V, Dana A, Delcher A, Dew I, Evans CA, Flanigan M, Grundschober-Freimoser A, Friedli L, Gu Z, Guan P, Guigo R, Hillenmeyer ME, Hladun SL, Hogan JR, Hong YS, Hoover J, Jaillon O, Ke Z, Kodira C, Kokoza E, Koutsos A, Letunic I, Levitsky A, Liang Y, Lin JJ, Lobo NF, Lopez JR, Malek JA, McIntosh TC, Meister S, Miller J, Mobarry C, Mongin E, Murphy SD, O'Brochta DA, Pfannkoch C, Qi R, Regier MA, Remington K, Shao H, Sharakhova MV, Sitter CD, Shetty J, Smith TJ, Strong R, Sun J, Thomasova D, Ton LQ, Topalis P, Tu Z, Unger MF, Walenz B, Wang A, Wang J, Wang M, Wang X, Woodford KJ, Wortman JR, Wu M, Yao A, Zdobnov EM, Zhang H, Zhao Q, Zhao S, Zhu SC, Zhimulev I, Coluzzi M, della Torre A, Roth CW, Louis C, Kalush F, Mural RJ, Myers EW, Adams MD, Smith HO, Broder S, Gardner MJ, Fraser CM, Birney E, Bork P, Brey PT, Venter JC, Weissenbach J, Kafatos FC, Collins FH, Hoffman SL: The genome sequence of the malaria mosquito Anopheles gambiae. Science 2002, 298: 129-149.

16. Malaria after the genomes. Lancet 2002, 360: 1107.

17. Pain A, Hertz-Fowler C: Plasmodium genomics: latest milestone. Nat Rev Microbiol 2009, 7: 180-181.

18. Miller LH, Mason SJ, Clyde DF, McGinniss MH: The resistance factor to Plasmodium vivax in blacks. The Duffy-blood-group genotype, FyFy. N Engl J Med 1976, 295: 302-304.

19. Galinski MR, Medina CC, Ingravallo P, Barnwell JW: A reticulocyte-binding protein complex of Plasmodium vivax merozoites. Cell 1992, 69: 1213-1226.

20. de Azevedo WF, Jr., Soares MB: Selection of targets for drug development against protozoan parasites. Curr Drug Targets 2009, 10: 193-201.

21. Yeh I, Altman RB: Drug Targets for Plasmodium falciparum: a post-genomic review/survey. Mini Rev Med Chem 2006, 6: 177-202.

22. Bozdech Z, Zhu J, Joachimiak MP, Cohen FE, Pulliam B, DeRisi JL: Expression profiling of the schizont and trophozoite stages of Plasmodium falciparum with a long-oligonucleotide microarray. Genome Biol 2003, 4: R9.

23. Bozdech Z, Llinas M, Pulliam BL, Wong ED, Zhu J, DeRisi JL: The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol 2003, 1: E5.

24. Takala SL, Plowe CV: Genetic diversity and malaria vaccine design, testing and efficacy: preventing and overcoming 'vaccine resistant malaria'. Parasite Immunol 2009, 31: 560-573.

25. Mu J, Awadalla P, Duan J, McGee KM, Keebler J, Seydel K, McVean GA, Su XZ: Genome-wide variation and identification of vaccine targets in the Plasmodium falciparum genome. Nat Genet 2007, 39: 126-130.

26. Mu J, Myers RA, Jiang H, Liu S, Ricklefs S, Waisberg M, Chotivanich K, Wilairatana P, Krudsood S, White NJ, Udomsangpetch R, Cui L, Ho M,
Ou F, Li H, Song J, Li G, Wang X, Seila S, Sokunthea S, Socheat D, Sturdevant DE, Porcella SF, Fairhurst RM, Wellems TE, Awadalla P, Su XZ: Plasmodium falciparum genome-wide scans for positive selection, recombination hot spots and resistance to antimalarial drugs. Nat Genet 2010, 42: 268-271.

27. Alonso PL, Sacarlal J, Aponte JJ, Leach A, Macete E, Milman J, Mandomando I, Spiessens B, Guinovart C, Espasa M, Bassat Q, Aide P, Ofori-Anyinam O, Navia MM, Corachan S, Ceuppens M, Dubois MC, Demoitie MA, Dubovsky F, Menendez C, Tornieporth N, Ballou WR, Thompson R, Cohen J: Efficacy of the RTS,S/AS02A vaccine against Plasmodium falciparum infection and disease in young African children: randomised controlled trial. Lancet 2004, 364: 1411-1420.

28. Ogutu BR, Apollo OJ, McKinney D, Okoth W, Siangla J, Dubovsky F, Tucker K, Waitumbi JN, Diggs C, Wittes J, Malkin E, Leach A, Soisson LA, Milman JB, Otieno L, Holland CA, Polhemus M, Remich SA, Ockenhouse CF, Cohen J, Ballou WR, Martin SK, Angov E, Stewart VA, Lyon JA, Heppner DG, Withers MR: Blood stage malaria vaccine eliciting high antigen-specific antibody concentrations confers no protection to young children in Western Kenya. PLoS ONE 2009, 4: e4708.

29. Sagara I, Dicko A, Ellis RD, Fay MP, Diawara SI, Assadou MH, Sissoko MS, Kone M, Diallo AI, Saye R, Guindo MA, Kante O, Niambele MB, Miura K, Mullen GE, Pierce M, Martin LB, Dolo A, Diallo DA, Doumbo OK, Miller LH, Saul A: A randomized controlled phase 2 trial of the blood stage AMA1-C1/Alhydrogel malaria vaccine in children in Mali. Vaccine 2009, 27: 3090-3098.

30. Takala SL, Coulibaly D, Thera MA, Dicko A, Smith DL, Guindo AB, Kone AK, Traore K, Ouattara A, Djimde AA, Sehdev PS, Lyke KE, Diallo DA, Doumbo OK, Plowe CV: Dynamics of polymorphism in a malaria vaccine antigen at a vaccine-testing site in Mali. PLoS Med 2007, 4: e93.

31. Ouattara A, Mu J, Takala-Harrison S, Saye R, Sagara I, Dicko A, Niangaly A, Duan J, Ellis RD, Miller LH, Su XZ, Plowe CV, Doumbo OK: Lack of allele-specific efficacy of a bivalent AMA1 malaria vaccine. Malar J 2010, 9: 175.

32. Dutta S, Lee SY, Batchelor AH, Lanar DE: Structural basis of antigenic escape of a malaria vaccine candidate. Proc Natl Acad Sci U S A 2007, 104: 12488-12493.

33. Takala SL, Coulibaly D, Thera MA, Batchelor AH, Cummings MP, Escalante AA, Ouattara A, Traore K, Niangaly A, Djimde AA, Doumbo OK, Plowe CV: Extreme polymorphism in a vaccine antigen and risk of clinical
malaria: implications for vaccine development. Sci Transl Med 2009, 1: 2ra5.

34. Thera MA, Doumbo OK, Coulibaly D, Laurens MB, Kone AK, Guindo AB, Traore K, Sissoko M, Diallo DA, Diarra I, Kouriba B, Daou M, Dolo A, Baby M, Sissoko MS, Sagara I, Niangaly A, Traore I, Olotu A, Godeaux O, Leach A, Dubois MC, Ballou WR, Cohen J, Thompson D, Dube T, Soisson L, Diggs CL, Takala SL, Lyke KE, House B, Lanar DE, Dutta S, Heppner DG, Plowe CV: Safety and immunogenicity of an AMA1 malaria vaccine in Malian children: results of a phase 1 randomized controlled trial. PLoS ONE 2010, 5: e9041.

35. Crompton PD, Kayala MA, Traore B, Kayentao K, Ongoiba A, Weiss GE, Molina DM, Burk CR, Waisberg M, Jasinskas A, Tan X, Doumbo S, Doumtabe D, Kone Y, Narum DL, Liang X, Doumbo OK, Miller LH, Doolan DL, Baldi P, Felgner PL, Pierce SK: A prospective analysis of the Ab response to Plasmodium falciparum before and after a malaria season by protein microarray. Proc Natl Acad Sci U S A 2010, 107: 6958-6963.

36. Plowe CV, Alonso P, Hoffman SL: The potential role of vaccines in the elimination of falciparum malaria and the eventual eradication of malaria. J Infect Dis 2009, 200: 1646-1649.

37. Luke TC, Hoffman SL: Rationale and plans for developing a non-replicating, metabolically active, radiation-attenuated Plasmodium falciparum sporozoite vaccine. J Exp Biol 2003, 206: 3803-3808.

38. Hoffman SL, Billingsley PF, James E, Richman A, Loyevsky M, Li T, Chakravarty S, Gunasekera A, Chattopadhyay R, Li M, Stafford R, Ahumada A, Epstein JE, Sedegah M, Reyes S, Richie TL, Lyke KE, Edelman R, Laurens MB, Plowe CV, Sim BK: Development of a metabolically active, non-replicating sporozoite vaccine to prevent Plasmodium falciparum malaria. Hum Vaccin 2010, 6: 97-106.

39. Noedl H, Se Y, Schaecher K, Smith BL, Socheat D, Fukuda MM: Evidence of artemisinin-resistant malaria in western Cambodia. N Engl J Med 2008, 359: 2619-2620.

40. Dondorp AM, Nosten F, Yi P, Das D, Phyo AP, Tarning J, Lwin KM, Ariey F, Hanpithakpong W, Lee SJ, Ringwald P, Silamut K, Imwong M, Chotivanich K, Lim P, Herdman T, An SS, Yeung S, Singhasivanon P, Day NP, Lindegardh N, Socheat D, White NJ: Artemisinin resistance in Plasmodium falciparum malaria. N Engl J Med 2009, 361: 455-467.

41. Plowe CV, Roper C, Barnwell JW, Happi CT, Joshi HH, Mbacham W, Meshnick SR, Mugittu K, Naidoo I, Price RN, Shafer RW, Sibley CH, Sutherland CJ, Zimmerman PA, Rosenthal PJ: World Antimalarial Resistance
Network (WARN) III: molecular markers for drug resistant malaria. Malar J 2007, 6: 121.

42. Bzik DJ, Li WB, Horii T, Inselburg J: Molecular cloning and sequence analysis of the Plasmodium falciparum dihydrofolate reductase-thymidylate synthase gene. Proc Natl Acad Sci U S A 1987, 84: 8360-8364.

43. Peterson DS, Walliker D, Wellems TE: Evidence that a point mutation in dihydrofolate reductase-thymidylate synthase confers resistance to pyrimethamine in falciparum malaria. Proc Natl Acad Sci U S A 1988, 85: 9114-9118.

44. Peterson DS, Milhous WK, Wellems TE: Molecular basis of differential resistance to cycloguanil and pyrimethamine in Plasmodium falciparum malaria. Proc Natl Acad Sci U S A 1990, 87: 3018-3022.

45. Plowe CV, Djimde A, Bouare M, Doumbo O, Wellems TE: Pyrimethamine and proguanil resistance-conferring mutations in Plasmodium falciparum dihydrofolate reductase: polymerase chain reaction methods for surveillance in Africa. Am J Trop Med Hyg 1995, 52: 565-568.

46. Walliker D, Quakyi IA, Wellems TE, McCutchan TF, Szarfman A, London WT, Corcoran LM, Burkot TR, Carter R: Genetic analysis of the human malaria parasite Plasmodium falciparum. Science 1987, 236: 1661-1666.

47. Wellems TE, Walker-Jonah A, Panton LJ: Genetic mapping of the chloroquine-resistance locus on Plasmodium falciparum chromosome 7. Proc Natl Acad Sci U S A 1991, 88: 3382-3386.

48. Wellems TE, Panton LJ, Gluzman IY, do Rosario VE, Gwadz RW, Walker-Jonah A, Krogstad DJ: Chloroquine resistance not linked to mdr-like genes in a Plasmodium falciparum cross. Nature 1990, 345: 253-255.

49. Su X, Kirkman LA, Fujioka H, Wellems TE: Complex polymorphisms in an approximately 330 kDa protein are linked to chloroquine-resistant P. falciparum in Southeast Asia and Africa. Cell 1997, 91: 593-603.

50. Fidock DA, Nomura T, Talley AK, Cooper RA, Dzekunov SM, Ferdig MT, Ursos LM, Sidhu AB, Naude B, Deitsch KW, Su XZ, Wootton JC, Roepe PD, Wellems TE: Mutations in the P. falciparum digestive vacuole transmembrane protein PfCRT and evidence for their role in chloroquine resistance. Mol Cell 2000, 6: 861-871.

51. Djimde A, Doumbo OK, Cortese JF, Kayentao K, Doumbo S, Diourte Y, Dicko A, Su XZ, Nomura T, Fidock DA, Wellems TE, Plowe CV, Coulibaly D: A molecular marker for chloroquine-resistant falciparum malaria. N Engl J Med 2001, 344: 257-263.


52. Djimde A, Doumbo OK, Steketee RW, Plowe CV: Application of a molecular marker for surveillance of chloroquine-resistant falciparum malaria. Lancet 2001, 358: 890-891.

53. Djimde AA, Dolo A, Ouattara A, Diakite S, Plowe CV, Doumbo OK: Molecular diagnosis of resistance to antimalarial drugs during epidemics and in war zones. J Infect Dis 2004, 190: 853-855.

54. Plowe CV, Roper C, Barnwell JW, Happi CT, Joshi HH, Mbacham W, Meshnick SR, Mugittu K, Naidoo I, Price RN, Shafer RW, Sibley CH, Sutherland CJ, Zimmerman PA, Rosenthal PJ: World Antimalarial Resistance Network (WARN) III: molecular markers for drug resistant malaria. Malar J 2007, 6: 121.

55. Plowe CV: The evolution of drug-resistant malaria. Trans R Soc Trop Med Hyg 2009, 103 Suppl 1: S11-S14.

56. Trape JF, Pison G, Preziosi MP, Enel C, Dulou AD, Delaunay V, Samb B, Lagarde E, Molez JF, Simondon F: Impact of chloroquine resistance on malaria mortality. Comptes Rendus de l'Academie des Sciences de Paris / Life Sciences 1998, 321: 689-697.

57. Hien TT, White NJ: Qinghaosu. Lancet 1993, 341: 603-608.

58. White NJ, Olliaro PL: Strategies for the prevention of antimalarial drug resistance: rationale for combination chemotherapy for malaria. Parasitol Today 1996, 12: 399-401.

59. Dondorp AM, Yeung S, White L, Nguon C, Day NP, Socheat D, von Seidlein L: Artemisinin resistance: current status and scenarios for containment. Nat Rev Microbiol 2010, 8: 272-280.

60. Tanner M, de Savigny D: Malaria eradication back on the table. Bull World Health Organ 2008, 86: 82.

61. World Health Organization: Global plan for artemisinin resistance containment (GPARC). 2011.

62. Imwong M, Dondorp AM, Nosten F, Yi P, Mungthin M, Hanchana S, Das D, Phyo AP, Lwin KM, Pukrittayakamee S, Lee SJ, Saisung S, Koecharoen K, Nguon C, Day NP, Socheat D, White NJ: Exploring the contribution of candidate genes to artemisinin resistance in Plasmodium falciparum. Antimicrob Agents Chemother 2010, 54: 2886-2892.

63. Volkman SK, Sabeti PC, DeCaprio D, Neafsey DE, Schaffner SF, Milner DA, Jr., Daily JP, Sarr O, Ndiaye D, Ndir O, Mboup S, Duraisingh MT, Lukens A, Derr A, Stange-Thomann N, Waggoner S, Onofrio R, Ziaugra L, Mauceli E, Gnerre S, Jaffe DB, Zainoun J, Wiegand RC, Birren BW, Hartl DL,
Galagan JE, Lander ES, Wirth DF: A genome-wide map of diversity in Plasmodium falciparum. Nat Genet 2007, 39: 113-119.

64. Anderson TJ, Nair S, Nkhoma S, Williams JT, Imwong M, Yi P, Socheat D, Das D, Chotivanich K, Day NP, White NJ, Dondorp AM: High heritability of malaria parasite clearance rate indicates a genetic basis for artemisinin resistance in western Cambodia. J Infect Dis 2010, 201: 1326-1330.

65. Cummings MP, Segal MR: Few amino acid positions in rpoB are associated with most of the rifampin resistance in Mycobacterium tuberculosis. BMC Bioinformatics 2004, 5: 137.

66. Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW, Pritchard JK: Signals of recent positive selection in a worldwide sample of human populations. Genome Res 2009, 19: 826-837.

67. Voight BF, Kudaravalli S, Wen X, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol 2006, 4: e72.

68. Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallee C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe'er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G,
Evans DM, Morris AP, Weir BS, Tsunoda T, Johnson TA, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L'Archeveque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL: Genome-wide detection and characterization of positive selection in human populations. Nature 2007, 449: 913-918.

69. Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, Ackerman HC, Campbell SJ, Altshuler D, Cooper R, Kwiatkowski D, Ward R, Lander ES: Detecting recent positive selection in the human genome from haplotype structure. Nature 2002, 419: 832-837.

70. Sibley CH, Barnes KI, Watkins WM, Plowe CV: A network to monitor antimalarial drug resistance: a plan for moving forward. Trends Parasitol 2008, 24: 43-48.

71. Sibley CH, Barnes KI, Plowe CV: The rationale and plan for creating a World Antimalarial Resistance Network (WARN). Malar J 2007, 6: 118.

72. Malaria Genomic Epidemiology Network: A global network for investigating the genomic epidemiology of malaria. Nature 2008, 456: 732-737.

73. Thera MA, Plowe CV: Vaccines for malaria: how close are we? Annu Rev Med 2012, 63: 345-357.

74. Dondorp AM, Fairhurst RM, Slutsker L, Macarthur JR, Breman JG, Guerin PJ, Wellems TE, Ringwald P, Newman RD, Plowe CV: The threat of artemisinin-resistant malaria. N Engl J Med 2011, 365: 1073-1075.

75. Takala-Harrison S, Clark TG, Jacob CG, Cummings MP, Miotto O, Dondorp AM, Fukuda MM, Nosten F, Noedl H, Imwong M, Bethell D, Se Y, Lon C, Tyner SD, Saunders DL, Socheat D, Ariey F, Phyo AP, Starzengruber P, Fuehrer HP, Swoboda P, Stepniewska K, Flegg J, Arze C, Cerqueira GC, Silva JC, Ricklefs SM, Porcella SF, Stephens RM, Adams M, Kenefic LJ, Campino S, Auburn S, MacInnis B, Kwiatkowski DP, Su XZ, White NJ, Ringwald P, Plowe CV: Genetic loci associated with delayed clearance of Plasmodium falciparum following artemisinin treatment in Southeast Asia. Proc Natl Acad Sci U S A 2013, 110: 240-245.

76. Miotto O, Almagro-Garcia J, Manske M, MacInnis B, Campino S, Rockett KA, Amaratunga C, Lim P, Suon S, Sreng S, Anderson JM, Duong S, Nguon C, Chuor CM, Saunders D, Se Y, Lon C, Fukuda MM, Amenga-Etego L, Hodgson AV, Asoala V, Imwong M, Takala-Harrison S, Nosten F, Su XZ, Ringwald P, Ariey F, Dolecek C, Hien TT, Boni MF, Thai CQ, Amambua-Ngwa A, Conway DJ, Djimde AA, Doumbo OK, Zongo I, Ouedraogo JB, Alcock D, Drury E, Auburn S, Koch O, Sanders M, Hubbart C, Maslen G, Ruano-Rubio V, Jyothi D, Miles A, O'Brien J, Gamble C, Oyola SO, Rayner JC, Newbold CI, Berriman M, Spencer CC, McVean G, Day NP, White NJ, Bethell D, Dondorp AM, Plowe CV, Fairhurst RM, Kwiatkowski DP: Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. Nat Genet 2013, 45: 648-655.

77. Ariey F, Witkowski B, Amaratunga C, Beghain J, Langlois AC, Khim N, Kim S, Duru V, Bouchier C, Ma L, Lim P, Leang R, Duong S, Sreng S, Suon S, Chuor CM, Bout DM, Menard S, Rogers WO, Genton B, Fandeur T, Miotto O, Ringwald P, Le Bras J, Berry A, Barale JC, Fairhurst RM, Benoit-Vical F, Mercereau-Puijalon O, Menard D: A molecular marker of artemisinin-resistant Plasmodium falciparum malaria. Nature 2014, 505: 50-55.

78. Cheeseman IH, Miller BA, Nair S, Nkhoma S, Tan A, Tan JC, Al-Saai S, Phyo AP, Moo CL, Lwin KM, McGready R, Ashley E, Imwong M, Stepniewska K, Yi P, Dondorp AM, Mayxay M, Newton PN, White NJ, Nosten F, Ferdig MT, Anderson TJ: A major genome region underlying artemisinin resistance in malaria. Science 2012, 336: 79-82.

79. Manske M, Miotto O, Campino S, Auburn S, Almagro-Garcia J, Maslen G, O'Brien J, Djimde A, Doumbo O, Zongo I, Ouedraogo JB, Michon P, Mueller I, Siba P, Nzila A, Borrmann S, Kiara SM, Marsh K, Jiang H, Su XZ, Amaratunga C, Fairhurst R, Socheat D, Nosten F, Imwong M, White NJ, Sanders M, Anastasi E, Alcock D, Drury E, Oyola S, Quail MA, Turner DJ, Ruano-Rubio V, Jyothi D, Amenga-Etego L, Hubbart C, Jeffreys A, Rowlands K, Sutherland C, Roper C, Mangano V, Modiano D, Tan JC, Ferdig MT, Amambua-Ngwa A, Conway DJ, Takala-Harrison S, Plowe CV, Rayner JC, Rockett KA, Clark TG, Newbold CI, Berriman M, MacInnis B, Kwiatkowski DP: Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature 2012, 487: 375-379.

80. Tan JC, Miller BA, Tan A, Patel JJ, Cheeseman IH, Anderson TJ, Manske M, Maslen G, Kwiatkowski DP, Ferdig MT: An optimized microarray platform for assaying genomic variation in Plasmodium falciparum field populations. Genome Biol 2011, 12: R35.

81. Campino S, Auburn S, Kivinen K, Zongo I, Ouedraogo JB, Mangano V, Djimde A, Doumbo OK, Kiara SM, Nzila A, Borrmann S, Marsh K, Michon P, Mueller I, Siba P, Jiang H, Su XZ, Amaratunga C, Socheat D, Fairhurst RM, Imwong M, Anderson T, Nosten F, White NJ, Gwilliam R, Deloukas P, MacInnis B, Newbold CI, Rockett K, Clark TG, Kwiatkowski DP: Population genetic analysis of Plasmodium falciparum parasites using a customized Illumina GoldenGate genotyping assay. PLoS One 2011, 6: e20251.

82. Jiang H, Yi M, Mu J, Zhang L, Ivens A, Klimczak LJ, Huyen Y, Stephens RM, Su XZ: Detection of genome-wide polymorphisms in the AT-rich Plasmodium falciparum genome using a high-density microarray. BMC Genomics 2008, 9: 398.

83. Kidgell C, Volkman SK, Daily J, Borevitz JO, Plouffe D, Zhou Y, Johnson JR, Le RK, Sarr O, Ndir O, Mboup S, Batalov S, Wirth DF, Winzeler EA: A systematic map of genetic variation in Plasmodium falciparum. PLoS Pathog 2006, 2: e57.

84. Takala-Harrison S, Jacob CG, Arze C, Cummings MP, Silva JC, Dondorp AM, Fukuda MM, Hien TT, Mayxay M, Noedl H, Nosten F, Kyaw MP, Nhien NT, Imwong M, Bethell D, Se Y, Lon C, Tyner SD, Saunders DL, Ariey F, Mercereau-Puijalon O, Menard D, Newton PN, Khanthavong M, Hongvanthong B, Starzengruber P, Fuehrer HP, Swoboda P, Khan WA, Phyo AP, Nyunt MM, Nyunt MH, Brown TS, Adams M, Pepin CS, Bailey J, Tan JC, Ferdig MT, Clark TG, Miotto O, MacInnis B, Kwiatkowski DP, White NJ, Ringwald P, Plowe CV: Independent Emergence of Artemisinin Resistance Mutations Among Plasmodium falciparum in Southeast Asia. J Infect Dis 2014.

85. Teo YY, Inouye M, Small KS, Gwilliam R, Deloukas P, Kwiatkowski DP, Clark TG: A genotype calling algorithm for the Illumina BeadArray platform. Bioinformatics 2007, 23: 2741-2746.

86. Carvalho B, Bengtsson H, Speed TP, Irizarry RA: Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics 2007, 8: 485-499.

87. Dharia NV, Sidhu AB, Cassera MB, Westenberger SJ, Bopp SE, Eastman RT, Plouffe D, Batalov S, Park DJ, Volkman SK, Wirth DF, Zhou Y, Fidock DA, Winzeler EA: Use of high-density tiling microarrays to identify mutations globally and elucidate mechanisms of drug resistance in Plasmodium falciparum. Genome Biol 2009, 10: R21.

88. Takala SL, Plowe CV: Genetic diversity and malaria vaccine design, testing and efficacy: preventing and overcoming 'vaccine resistant malaria'. Parasite Immunol 2009, 31: 560-573.

89. Wootton JC, Feng X, Ferdig MT, Cooper RA, Mu J, Baruch DI, Magill AJ, Su XZ: Genetic diversity and chloroquine selective sweeps in Plasmodium falciparum. Nature 2002, 418: 320-323.

90. Pearce RJ, Pota H, Evehe MS, Ba e, Mombo-Ngoma G, Malisa AL, Ord R, Inojosa W, Matondo A, Diallo DA, Mbacham W, van den Broek IV, Swarthout TD, Getachew A, Dejene S, Grobusch MP, Njie F, Dunyo S, Kweku M, Owusu-Agyei S, Chandramohan D, Bonnet M, Guthmann JP, Clarke S, Barnes KI, Streat E, Katokele ST, Uusiku P, Agboghoroma CO, Elegba OY, Cisse B, Elbasit IE, Giha HA, Kachur SP, Lynch C, Rwakimari JB, Chanda P, Hawela M, Sharp B, Naidoo I, Roper C: Multiple origins and regional dispersal of resistant dhps in African Plasmodium falciparum malaria. PLoS Med 2009, 6: e1000055.

91. Roper C, Pearce R, Nair S, Sharp B, Nosten F, Anderson T: Intercontinental spread of pyrimethamine-resistant malaria. Science 2004, 305: 1124.

92. Mita T, Venkatesan M, Ohashi J, Culleton R, Takahashi N, Tsukahara T, Ndounga M, Dysoley L, Endo H, Hombhanje F, Ferreira MU, Plowe CV, Tanabe K: Limited geographical origin and global spread of sulfadoxine-resistant dhps alleles in Plasmodium falciparum populations. J Infect Dis 2011, 204: 1980-1988.

93. Plowe CV: The evolution of drug-resistant malaria. Trans R Soc Trop Med Hyg 2009, 103 Suppl 1: S11-S14.

94. Amaratunga C, Sreng S, Suon S, Phelps ES, Stepniewska K, Lim P, Zhou C, Mao S, Anderson JM, Lindegardh N, Jiang H, Song J, Su XZ, White NJ, Dondorp AM, Anderson TJ, Fay MP, Mu J, Duong S, Fairhurst RM: Artemisinin-resistant Plasmodium falciparum in Pursat province, western Cambodia: a parasite clearance rate study. Lancet Infect Dis 2012, 12: 851-858.

95. Ashley EA, Dhorda M, Fairhurst RM, Amaratunga C, Lim P, Suon S, Sreng S, Anderson JM, Mao S, Sam B, Sopha C, Chuor CM, Nguon C, Sovannaroth S, Pukrittayakamee S, Jittamala P, Chotivanich K, Chutasmit K, Suchatsoonthorn C, Runcharoen R, Hien TT, Thuy-Nhien NT, Thanh NV, Phu NH, Htut Y, Han KT, Aye KH, Mokuolu OA, Olaosebikan RR, Folaranmi OO, Mayxay M, Khanthavong M, Hongvanthong B, Newton PN, Onyamboko MA, Fanello CI, Tshefu AK, Mishra N, Valecha N, Phyo AP, Nosten F, Yi P, Tripura R, Borrmann S, Bashraheil M, Peshu J, Faiz MA, Ghose A, Hossain MA, Samad R, Rahman MR, Hasan MM, Islam A, Miotto O, Amato R, MacInnis B, Stalker J, Kwiatkowski DP, Bozdech Z, Jeeyapant A, Cheah PY, Sakulthaew T, Chalk J, Intharabut B, Silamut K, Lee SJ, Vihokhern B, Kunasol C, Imwong M, Tarning J, Taylor WJ, Yeung S, Woodrow CJ, Flegg JA, Das D, Smith J, Venkatesan M, Plowe CV, Stepniewska K, Guerin PJ, Dondorp AM, Day NP, White NJ: Spread of artemisinin resistance in Plasmodium falciparum malaria. N Engl J Med 2014, 371: 411-423.

96. Phyo AP, Nkhoma S, Stepniewska K, Ashley EA, Nair S, McGready R, ler MC, Al-Saai S, Dondorp AM, Lwin KM, Singhasivanon P, Day NP, White NJ, Anderson TJ, Nosten F: Emergence of artemisinin-resistant malaria on the western border of Thailand: a longitudinal study. Lancet 2012, 379: 1960-1966.

97. Kyaw MP, Nyunt MH, Chit K, Aye MM, Aye KH, Aye MM, Lindegardh N, Tarning J, Imwong M, Jacob CG, Rasmussen C, Perin J, Ringwald P, Nyunt MM: Reduced susceptibility of Plasmodium falciparum to artesunate in southern Myanmar. PLoS One 2013, 8: e57689.

98. Dondorp AM, Nosten F, Yi P, Das D, Phyo AP, Tarning J, Lwin KM, Ariey F, Hanpithakpong W, Lee SJ, Ringwald P, Silamut K, Imwong M, Chotivanich K, Lim P, Herdman T, An SS, Yeung S, Singhasivanon P, Day NP, Lindegardh N, Socheat D, White NJ: Artemisinin resistance in Plasmodium falciparum malaria. N Engl J Med 2009, 361: 455-467.

99. Takala-Harrison S, Jacob CG, Arze C, Cummings MP, Silva JC, Dondorp AM, Fukuda MM, Hien TT, Mayxay M, Noedl H, Nosten F, Kyaw MP, Nhien NT, Imwong M, Bethell D, Se Y, Lon C, Tyner SD, Saunders DL, Ariey F, Mercereau-Puijalon O, Menard D, Newton PN, Khanthavong M, Hongvanthong B, Starzengruber P, Fuehrer HP, Swoboda P, Khan WA, Phyo AP, Nyunt MM, Nyunt MH, Brown TS, Adams M, Pepin CS, Bailey J, Tan JC, Ferdig MT, Clark TG, Miotto O, MacInnis B, Kwiatkowski DP, White NJ, Ringwald P, Plowe CV: Independent Emergence of Artemisinin Resistance Mutations Among Plasmodium falciparum in Southeast Asia. J Infect Dis 2014.

100. Cheeseman IH, Miller BA, Nair S, Nkhoma S, Tan A, Tan JC, Al-Saai S, Phyo AP, Moo CL, Lwin KM, McGready R, Ashley E, Imwong M, Stepniewska K, Yi P, Dondorp AM, Mayxay M, Newton PN, White NJ, Nosten F, Ferdig MT, Anderson TJ: A major genome region underlying artemisinin resistance in malaria. Science 2012, 336: 79-82.

101. Takala-Harrison S, Clark TG, Jacob CG, Cummings MP, Miotto O, Dondorp AM, Fukuda MM, Nosten F, Noedl H, Imwong M, Bethell D, Se Y, Lon C, Tyner SD, Saunders DL, Socheat D, Ariey F, Phyo AP, Starzengruber P, Fuehrer HP, Swoboda P, Stepniewska K, Flegg J, Arze C, Cerqueira GC, Silva JC, Ricklefs SM, Porcella SF, Stephens RM, Adams M, Kenefic LJ, Campino S, Auburn S, MacInnis B, Kwiatkowski DP, Su XZ, White NJ, Ringwald P, Plowe CV: Genetic loci associated with delayed clearance of Plasmodium falciparum following artemisinin treatment in Southeast Asia. Proc Natl Acad Sci U S A 2013, 110: 240-245.

102. Ariey F, Witkowski B, Amaratunga C, Beghain J, Langlois AC, Khim N, Kim S, Duru V, Bouchier C, Ma L, Lim P, Leang R, Duong S, Sreng S, Suon S, Chuor CM, Bout DM, Menard S, Rogers WO, Genton B, Fandeur T, Miotto O, Ringwald P, Le Bras J, Berry A, Barale JC, Fairhurst RM, Benoit-Vical F, Mercereau-Puijalon O, Menard D: A molecular marker of artemisinin-resistant Plasmodium falciparum malaria. Nature 2014, 505: 50-55.

103. Straimer J, Gnadig NF, Witkowski B, Amaratunga C, Duru V, Ramadani AP, Dacheux M, Khim N, Zhang L, Lam S, Gregory PD, Urnov FD, Mercereau-Puijalon O, Benoit-Vical F, Fairhurst RM, Menard D, Fidock DA: K13-propeller mutations confer artemisinin resistance in Plasmodium falciparum clinical isolates. Science 2014.

104. Miotto O, Almagro-Garcia J, Manske M, MacInnis B, Campino S, Rockett KA, Amaratunga C, Lim P, Suon S, Sreng S, Anderson JM, Duong S, Nguon C, Chuor CM, Saunders D, Se Y, Lon C, Fukuda MM, Amenga-Etego L, Hodgson AV, Asoala V, Imwong M, Takala-Harrison S, Nosten F, Su XZ, Ringwald P, Ariey F, Dolecek C, Hien TT, Boni MF, Thai CQ, Amambua-Ngwa A, Conway DJ, Djimde AA, Doumbo OK, Zongo I, Ouedraogo JB, Alcock D, Drury E, Auburn S, Koch O, Sanders M, Hubbart C, Maslen G, Ruano-Rubio V, Jyothi D, Miles A, O'Brien J, Gamble C, Oyola SO, Rayner JC, Newbold CI, Berriman M, Spencer CC, McVean G, Day NP, White NJ, Bethell D, Dondorp AM, Plowe CV, Fairhurst RM, Kwiatkowski DP: Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. Nat Genet 2013, 45: 648-655.

105. Miotto O, Amato R, Ashley EA, MacInnis B, Almagro-Garcia J, Amaratunga C, Lim P, Mead D, Oyola SO, Dhorda M, Imwong M, Woodrow C, Manske M, Stalker J, Drury E, Campino S, Amenga-Etego L, Thanh TN, Tran HT, Ringwald P, Bethell D, Nosten F, Phyo AP, Pukrittayakamee S, Chotivanich K, Chuor CM, Nguon C, Suon S, Sreng S, Newton PN, Mayxay M, Khanthavong M, Hongvanthong B, Htut Y, Han KT, Kyaw MP, Faiz MA, Fanello CI, Onyamboko M, Mokuolu OA, Jacob CG, Takala-Harrison S, Plowe CV, Day NP, Dondorp AM, Spencer CC, McVean G, Fairhurst RM, White NJ, Kwiatkowski DP: Genetic architecture of artemisinin-resistant Plasmodium falciparum. Nat Genet 2015.

106. Jacob CG, Tan JC, Miller BA, Tan A, Takala-Harrison S, Ferdig MT, Plowe CV: A microarray platform and novel SNP calling algorithm to evaluate Plasmodium falciparum field samples of low DNA quantity. BMC Genomics 2014, 15: 719.

107. Zhou X, Stephens M: Genome-wide efficient mixed-model analysis for association studies. Nat Genet 2012, 44: 821-824.

108. Rathod PK, McErlean T, Lee PC: Variations in frequencies of drug resistance in Plasmodium falciparum. Proc Natl Acad Sci U S A 1997, 94: 9389-9393.

109. Brown TS, Jacob CG, Silva JC, Takala-Harrison S, Djimde A, Dondorp AM, Fukuda M, Noedl H, Nyunt MM, Kyaw MP, Mayxay M, Hien TT, Plowe CV, Cummings MP: Plasmodium falciparum field isolates from areas of repeated emergence of drug resistant malaria show no evidence of hypermutator phenotype. Infect Genet Evol 2014.

110. Delacollette C, D'Souza C, Christophel E, Thimasarn K, Abdur R, Bell D, Dai TC, Gopinath D, Lu S, Mendoza R, Ortega L, Rastogi R, Tantinimitkul C, Ehrenberg J: Malaria trends and challenges in the Greater Mekong Subregion. Southeast Asian J Trop Med Public Health 2009, 40: 674-691.

111. Beerli P: Effect of unsampled populations on the estimation of population sizes and migration rates between sampled populations. Mol Ecol 2004, 13: 827-836.

112. Wangroongsarb P, Sudathip P, Satimai W: Characteristics and malaria prevalence of migrant populations in malaria-endemic areas along the Thai-Cambodian border. Southeast Asian J Trop Med Public Health 2012, 43: 261-269.

113. Smith C, Whittaker M: Beyond mobile populations: a critical review of the literature on malaria and population mobility and suggestions for future directions. Malar J 2014, 13: 307.

114. Anderson TJ, Nair S, Sudimack D, Williams JT, Mayxay M, Newton PN, Guthmann JP, Smithuis FM, Tran TH, van den Broek IV, White NJ, Nosten F: Geographical distribution of selected and putatively neutral SNPs in Southeast Asian malaria parasites. Mol Biol Evol 2005, 22: 2362-2374.

115. Manske M, Miotto O, Campino S, Auburn S, Almagro-Garcia J, Maslen G, O'Brien J, Djimde A, Doumbo O, Zongo I, Ouedraogo JB, Michon P, Mueller I, Siba P, Nzila A, Borrmann S, Kiara SM, Marsh K, Jiang H, Su XZ, Amaratunga C, Fairhurst R, Socheat D, Nosten F, Imwong M, White NJ, Sanders M, Anastasi E, Alcock D, Drury E, Oyola S, Quail MA, Turner DJ, Ruano-Rubio V, Jyothi D, Amenga-Etego L, Hubbart C, Jeffreys A, Rowlands K, Sutherland C, Roper C, Mangano V, Modiano D, Tan JC, Ferdig MT, Amambua-Ngwa A, Conway DJ, Takala-Harrison S, Plowe CV, Rayner JC, Rockett KA, Clark TG, Newbold CI, Berriman M, MacInnis B, Kwiatkowski DP: Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature 2012, 487: 375-379.

116. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007, 81: 559-575.

117. Alexander DH, Novembre J, Lange K: Fast model-based estimation of ancestry in unrelated individuals. Genome Res 2009, 19: 1655-1664.

118. R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2008.

119. Wright S: Systems of Mating. Genetics 1921, 6: 111-178.

120. Wright S: Isolation by Distance. Genetics 1943, 28: 114-138.

121. Kuhner MK: LAMARC 2.0: maximum likelihood and Bayesian estimation of population parameters. Bioinformatics 2006, 22: 768-770.

122. Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, Schaffner SF, Lander ES, Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallee C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe'er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Johnson TA, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, 
Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF,
Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L'Archeveque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL: Genome-wide detection and characterization of positive selection in human populations. Nature 2007, 449: 913-918.

123. Escalante AA, Lal AA, Ayala FJ: Genetic polymorphism and natural selection in the malaria parasite Plasmodium falciparum. Genetics 1998, 149: 189-202.

124. McGill JR, Walkup EA, Kuhner MK: Correcting coalescent analyses for panel-based SNP ascertainment. Genetics 2013, 193: 1185-1196.

125. World Health Organization Global Malaria Programme: World Malaria Report 2014. Geneva: World Health Organization; 2014.

126. Roberts L, Enserink M: Malaria. Did they really say ... eradication? Science 2007, 318: 1544-1545.

127. Noedl H, Se Y, Schaecher K, Smith BL, Socheat D, Fukuda MM: Evidence of artemisinin-resistant malaria in western Cambodia. N Engl J Med 2008, 359: 2619-2620.

128. Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, Ackerman HC, Campbell SJ, Altshuler D, Cooper R, Kwiatkowski D, Ward R, Lander ES: Detecting recent positive selection in the human genome from haplotype structure. Nature 2002, 419: 832-837.

129. Amambua-Ngwa A, Tetteh KK, Manske M, Gomez-Escobar N, Stewart LB, Deerhake ME, Cheeseman IH, Newbold CI, Holder AA, Knuepfer E, Janha O, Jallow M, Campino S, MacInnis B, Kwiatkowski DP, Conway DJ: Population genomic scan for candidate signatures of balancing selection to guide antigen characterization in malaria parasites. PLoS Genet 2012, 8: e1002992.

130. Ocholla H, Preston MD, Mipando M, Jensen AT, Campino S, MacInnis B, Alcock D, Terlouw A, Zongo I, Ouedraogo JB, Djimde AA, Assefa S, Doumbo OK, Borrmann S, Nzila A, Marsh K, Fairhurst RM, Nosten F, Anderson TJ, Kwiatkowski DP, Craig A, Clark TG, Montgomery J: Whole-Genome Scans Provide Evidence of Adaptive Evolution in Malawian Plasmodium falciparum Isolates. J Infect Dis 2014, 210: 1991-2000.

131. Mu J, Myers RA, Jiang H, Liu S, Ricklefs S, Waisberg M, Chotivanich K, Wilairatana P, Krudsood S, White NJ, Udomsangpetch R, Cui L, Ho M, Ou F, Li H, Song J, Li G, Wang X, Seila S, Sokunthea S, Socheat D, Sturdevant DE, Porcella SF, Fairhurst RM, Wellems TE, Awadalla P, Su XZ: Plasmodium falciparum genome-wide scans for positive selection, recombination hot spots and resistance to antimalarial drugs. Nat Genet 2010, 42: 268-271.

132. Van Tyne D, Park DJ, Schaffner SF, Neafsey DE, Angelino E, Cortese JF, Barnes KG, Rosen DM, Lukens AK, Daniels RF, Milner DA, Jr., Johnson CA, Shlyakhter I, Grossman SR, Becker JS, Yamins D, Karlsson EK, Ndiaye D, Sarr O, Mboup S, Happi C, Furlotte NA, Eskin E, Kang HM, Hartl DL, Birren BW, Wiegand RC, Lander ES, Wirth DF, Volkman SK, Sabeti PC: Identification and functional validation of the novel antimalarial resistance locus PF10_0355 in Plasmodium falciparum. PLoS Genet 2011, 7: e1001383.

133. Borrmann S, Straimer J, Mwai L, Abdi A, Rippert A, Okombo J, Muriithi S, Sasi P, Kortok MM, Lowe B, Campino S, Assefa S, Auburn S, Manske M, Maslen G, Peshu N, Kwiatkowski DP, Marsh K, Nzila A, Clark TG: Genome-wide screen identifies new candidate genes associated with artemisinin susceptibility in Plasmodium falciparum in Kenya. Sci Rep 2013, 3: 3318.

134. Browning SR, Browning BL: Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 2007, 81: 1084-1097.

135. Voight BF, Kudaravalli S, Wen X, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol 2006, 4: e72.

136. Szpiech ZA, Hernandez RD: selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol 2014, 31: 2824-2827.

137. Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG: Measures of human population structure show heterogeneity among genomic regions. Genome Res 2005, 15: 1468-1476.

138. Kulzer S, Charnaud S, Dagan T, Riedel J, Mandal P, Pesce ER, Blatch GL, Crabb BS, Gilson PR, Przyborski JM: Plasmodium falciparum-encoded exported hsp70/hsp40 chaperone/co-chaperone complexes within the host erythrocyte. Cell Microbiol 2012, 14: 1784-1795.

139. Rask TS, Hansen DA, Theander TG, Gorm PA, Lavstsen T: Plasmodium falciparum erythrocyte membrane protein 1 diversity in seven genomes--divide and conquer. PLoS Comput Biol 2010, 6.

140. Wilson CM, Volkman SK, Thaithong S, Martin RK, Kyle DE, Milhous WK, Wirth DF: Amplification of pfmdr1 associated with mefloquine and halofantrine resistance in Plasmodium falciparum from Thailand. Mol Biochem Parasitol 1993, 57: 151-160.

141. Nair S, Miller B, Barends M, Jaidee A, Patel J, Mayxay M, Newton P, Nosten F, Ferdig MT, Anderson TJ: Adaptive copy number evolution in malaria parasites. PLoS Genet 2008, 4: e1000243.

142. Gubbels MJ, Vaishnava S, Boot N, Dubremetz JF, Striepen B: A MORN-repeat protein is a dynamic component of the Toxoplasma gondii cell division apparatus. J Cell Sci 2006, 119: 2236-2245.

143. Kumar R, Musiyenko A, Barik S: Plasmodium falciparum calcineurin and its association with heat shock protein 90: mechanisms for the antimalarial activity of cyclosporin A and synergism with geldanamycin. Mol Biochem Parasitol 2005, 141: 29-37.

144. Srinivasan P, Ekanem E, Diouf A, Tonkin ML, Miura K, Boulanger MJ, Long CA, Narum DL, Miller LH: Immunization with a functional protein complex required for erythrocyte invasion protects against lethal malaria. Proc Natl Acad Sci U S A 2014, 111: 10311-10316.

145. Witkowski B, Amaratunga C, Khim N, Sreng S, Chim P, Kim S, Lim P, Mao S, Sopha C, Sam B, Anderson JM, Duong S, Chuor CM, Taylor WR, Suon S, Mercereau-Puijalon O, Fairhurst RM, Menard D: Novel phenotypic assays for the detection of artemisinin-resistant Plasmodium falciparum malaria in Cambodia: in-vitro and ex-vivo drug-response studies. Lancet Infect Dis 2013, 13: 1043-1049.

146. Flegg JA, Guerin PJ, Nosten F, Ashley EA, Phyo AP, Dondorp AM, Fairhurst RM, Socheat D, Borrmann S, Bjorkman A, Martensson A, Mayxay M, Newton PN, Bethell D, Se Y, Noedl H, Diakite M, Djimde AA, Hien TT, White NJ, Stepniewska K: Optimal sampling designs for estimation of Plasmodium falciparum clearance rates in patients treated with artemisinin derivatives. Malar J 2013, 12: 411.

147. Mok S, Ashley EA, Ferreira PE, Zhu L, Lin Z, Yeo T, Chotivanich K, Imwong M, Pukrittayakamee S, Dhorda M, Nguon C, Lim P, Amaratunga C, Suon S, Hien TT, Htut Y, Faiz MA, Onyamboko MA, Mayxay M, Newton PN, Tripura R, Woodrow CJ, Miotto O, Kwiatkowski DP, Nosten F, Day NP, Preiser PR, White NJ, Dondorp AM, Fairhurst RM, Bozdech Z: Drug resistance. Population transcriptomics of human malaria parasites reveals the mechanism of artemisinin resistance. Science 2015, 347: 431-435.

148. Kulzer S, Rug M, Brinkmann K, Cannon P, Cowman A, Lingelbach K, Blatch GL, Maier AG, Przyborski JM: Parasite-encoded Hsp40 proteins define novel mobile structures in the cytosol of the P. falciparum-infected erythrocyte. Cell Microbiol 2010, 12: 1398-1420.

149. Taylor SM, Cerami C, Fairhurst RM: Hemoglobinopathies: slicing the Gordian knot of Plasmodium falciparum malaria pathogenesis. PLoS Pathog 2013, 9: e1003327.

150. Volkman SK, Ndiaye D, Diakite M, Koita OA, Nwakanma D, Daniels RF, Park DJ, Neafsey DE, Muskavitch MA, Krogstad DJ, Sabeti PC, Hartl DL, Wirth DF: Application of genomics to field investigations of malaria by the international centers of excellence for malaria research. Acta Trop 2012, 121: 324-332.

151. Volkman SK, Sabeti PC, DeCaprio D, Neafsey DE, Schaffner SF, Milner DA, Jr., Daily JP, Sarr O, Ndiaye D, Ndir O, Mboup S, Duraisingh MT, Lukens A, Derr A, Stange-Thomann N, Waggoner S, Onofrio R, Ziaugra L, Mauceli E, Gnerre S, Jaffe DB, Zainoun J, Wiegand RC, Birren BW, Hartl DL, Galagan JE, Lander ES, Wirth DF: A genome-wide map of diversity in Plasmodium falciparum. Nat Genet 2007, 39: 113-119.

152. Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics 2000, 155: 945-959.