Anopheles Gambiae Genome Conservation As a Resource for Rational Gene Drive Target Site Selection
Total Page:16
File Type:pdf, Size:1020Kb
insects Article Anopheles gambiae Genome Conservation as a Resource for Rational Gene Drive Target Site Selection Nace Kranjc 1 , Andrea Crisanti 1, Tony Nolan 2,* and Federica Bernardini 1,* 1 Department of Life Sciences, Imperial College, London SW7 2AZ, UK; [email protected] (N.K.); [email protected] (A.C.) 2 Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool L3 5QA, UK * Correspondence: [email protected] (T.N.); [email protected] (F.B.) Simple Summary: Malaria is a huge public health burden that affects predominantly sub-Saharan Africa and is transmitted by Anopheles mosquitoes. As a measure for population control, a method called gene drive has been recently developed, which relies on genetic engineering to introduce specific genetic traits into mosquito populations. Gene drives are designed to insert at specific target sites in the mosquito genome. The efficacy of gene drives greatly depends on the selection of appropriate target sites that are functionally or structurally constrained and less likely to tolerate mutations that can hinder the spread of the desired trait in the population. The aim of this study was to perform a genome-wide analysis of highly conserved genomic regions in Anopheles gambiae and introduce a measure of conservation that could indicate sites of functional or structural constraint. The results of this analysis are gathered in a publicly available dataset that can support gene drive target selection and can offer further insights in the nature of conserved genomic regions. Abstract: The increase in molecular tools for the genetic engineering of insect pests and disease vectors, such as Anopheles mosquitoes that transmit malaria, has led to an unprecedented investigation of the genomic landscape of these organisms. The understanding of genome variability in wild Citation: Kranjc, N.; Crisanti, A.; mosquito populations is of primary importance for vector control strategies. This is particularly the Nolan, T.; Bernardini, F. Anopheles case for gene drive systems, which look to introduce genetic traits into a population by targeting gambiae Genome Conservation as a specific genomic regions. Gene drive targets with functional or structural constraints are highly Resource for Rational Gene Drive desirable as they are less likely to tolerate mutations that prevent targeting by the gene drive and Target Site Selection. Insects 2021, 12, consequent failure of the technology. In this study we describe a bioinformatic pipeline that allows 97. https://doi.org/10.3390/ insects12020097 the analysis of whole genome data for the identification of highly conserved regions that can point at potential functional or structural constraints. The analysis was conducted across the genomes Received: 18 December 2020 of 22 insect species separated by more than hundred million years of evolution and includes the Accepted: 19 January 2021 observed genomic variation within field caught samples of Anopheles gambiae and Anopheles coluzzii, Published: 23 January 2021 the two most dominant malaria vectors. This study offers insight into the level of conservation at a genome-wide scale as well as at per base-pair resolution. The results of this analysis are gathered in a Publisher’s Note: MDPI stays neutral data storage system that allows for flexible extraction and bioinformatic manipulation. Furthermore, with regard to jurisdictional claims in it represents a valuable resource that could provide insight into population structure and dynamics published maps and institutional affil- of the species in the complex and benefit the development and implementation of genetic strategies iations. to tackle malaria. Keywords: Anopheles gambiae complex; vector control strategies; gene drive; genetic variability; genome conservation; genetic resistance Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and 1. Introduction conditions of the Creative Commons Malaria is a threat to life, health and wellbeing. It is a parasitic disease transmitted Attribution (CC BY) license (https:// Anopheles creativecommons.org/licenses/by/ exclusively by mosquito species from the genus. In 2018 there were 228 million 4.0/). cases recorded worldwide and 405,000 of these resulted in death [1]. The investment Insects 2021, 12, 97. https://doi.org/10.3390/insects12020097 https://www.mdpi.com/journal/insects Insects 2021, 12, 97 2 of 12 in malaria control programs relying on insecticide-treated mosquito nets (ITNs), indoor residual spraying and artemisinin-based combination drug therapies has been crucial for the decline of malaria incidents between 2000 and 2015, with the vast majority of this decline being attributed to the control of mosquito numbers [2]. However, from 2014 to 2018, the rate of change slowed dramatically due to the rise of resistance to drugs and insecticides [1]. In sub-Saharan Africa, where the malaria burden is highest, the majority of malaria transmission is caused by members of the Anopheles gambiae species complex. Eight species are currently included in this complex of which three, Anopheles gambiae, Anopheles coluzzii and Anopheles arabiensis, are responsible for 95% of malaria transmission in Africa [3]. Re- cently, a new species, Anopheles fontenillei, was discovered in Central Africa and attributed to the complex [4]. Transgenesis tools for Anopheles mosquitoes have been in use since the turn of the century [5,6] and in recent years, a number of genetic control methods have been generated that have proven, in a laboratory setting, to have great potential to tackle malaria [7–11]. Among the genetic control methods available for suppression or alteration of Anopheles mosquito populations, technologies based on gene drive systems seem to be the most promising. Naturally occurring gene drives alter Mendelian rules of segregation by increasing the probability that a specific allele will be included in the gametes that contribute to the offspring of any one individual. This higher than normal rate of inheritance allows the rapid spread of that specific allele through the population [12–14]. In recent years, genome editing technologies have facilitated the development of syn- thetic gene drives designed to enable the spread of altered traits mimicking the dynamics of the gene drives found in nature [10,11,15–17]. Due to their simple molecular components and efficiency, CRISPR/Cas9-based homing gene drives have quickly become a system of choice in this context. This technology is based on the presence of a bacterial endonuclease (Cas9) and a guide RNA (gRNA) that is responsible for directing the sequence specificity of Cas9 [18]. A genetic construct encoding the units of the Cas9-gRNA complex can be inserted into the genome of an organism to cause disruption of a target gene. The driving nature of this system relies on the ability of the gRNA to direct Cas9 to cut the wild-type version of the target gene on the homologous chromosome. This cut triggers the activation of the cell DNA break repair machinery, which will use the homologous regions flanking the genetic construct as a template to repair the double strand break. The repair mechanism can thus lead to the copy of the gene drive construct (and any encoded trait) and the con- version of the cell to a homozygous genotype. When this process happens in the germline it results in gametes with a higher than 50% probability of containing the gene drive, which in turn causes the allele to increase in frequency each generation and spread through the population. We have previously shown that a CRISPR-Cas9 gene drive construct can be used to target genes with essential roles in the female fertility of Anopheles mosquitoes [9]. However, the high fitness cost imposed by this type of gene drive can create a strong selection pressure for mutations that restore function to the target gene yet are resistant to CRISPR-Cas9 cleavage. This highlights the requirement, when designing gene drive systems, for a meticulous selection of genic target sequences that are under high levels of functional constraint and therefore less likely to tolerate resistant mutations. Indeed, in caged populations, suppression was achieved only when doublesex, a gene involved in sex determination and highly conserved within the Anopheles complex was targeted, showing that this aspect is crucial for the success of mosquito population control through this approach [11,19–21]. High levels of sequence conservation can be a good proxy for functional constraint. Recently, high-throughput DNA sequencing of wild-caught Anopheles mosquitoes revealed complex population structure, patterns of gene flow and a genetic diversity higher than expected [22–24]. Here we describe a bioinformatics pipeline that integrates a systematic analysis of the data on genetic variation in more than 1000 wild-caught Anopheles gambiae individuals and conserved syntenic regions of 19 Anopheles species, and three phyloge- netically more distant species within the same Diptera order. This user-friendly pipeline Insects 2021, 12, 97 3 of 12 performs a sliding window analysis across the whole genome for every possible gene drive target sequence, attributing to it a conservation score (Cs), which is an indicator of their tendency to tolerate mutations. Cs incorporates interspecies variation, intraspecies variation and a measure of the selection forces applying there. This should prove a useful resource when selecting for functionally