Anopheles Mosquitoes Revealed New Principles of 3D Genome Organization in Insects
bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.114017; this version posted May 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Anopheles mosquitoes revealed new principles of 3D genome organization in insects

Varvara Lukyanchikova1,2,3,4,+, Miroslav Nuriddinov3,+, Polina Belokopytova3,4, Jiangtao Liang1,2, Maarten J.M.F. Reijnders5, Livio Ruzzante5, Robert M. Waterhouse5, Zhijian Tu2,6, Igor V. Sharakhov1,2,7,*, Veniamin Fishman3,4,*

1 Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
2 Fralin Life Science Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
3 Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
4 Novosibirsk State University, Novosibirsk, Russia
5 Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
6 Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
7 Department of Cytology and Genetics, Tomsk State University, Tomsk, Russia
+ co-first authors
* correspondence to I.V.S. ([email protected]) and V.F. ([email protected])

Abstract

Chromosomes are hierarchically folded within cell nuclei into territories, domains and subdomains, but the functional importance and evolutionary dynamics of these hierarchies are poorly defined. Here, we comprehensively profiled genome organizations of five Anopheles mosquito species and showed how different levels of chromatin architecture influence contacts between genomic loci. Patterns observed on Hi-C maps are associated with known cytological structures, epigenetic profiles, and gene expression levels. At the level of individual loci, we identified specific, extremely long-ranged looping interactions, conserved for ~100 million years.
We showed that the mechanisms underlying these looping contacts differ from previously described Polycomb-dependent interactions and clustering of active chromatin.

Introduction

Three-dimensional genome organization has recently been recognized as a complex and dynamic mechanism of gene regulation. Understanding of these features has been advanced by the development of chromosome conformation capture (3C) methods, which enable genome-wide chromatin contacts to be studied at high resolution [1–3]. In addition, data obtained using 3C methods help to generate high-quality, chromosome-level genome assemblies, allowing comprehensive analysis of genome evolution [4]. Comparative studies performed on multiple vertebrate species revealed that genome architecture is evolutionarily conserved and could be explained by dynamic interplay between processes of cohesin-mediated loop extrusion and chromatin compartmentalization [5–8]. In insects, comprehensive analyses [9–13] and cross-species comparisons [12,14] of genome architecture have to date focused only on Drosophila species. These studies suggested that, in contrast to mammals, the process of loop extrusion does not define the structure of chromatin contacts [14]. Instead, separation of active and repressed chromatin plays an essential role in formation and interaction of topologically associated domains (TADs) [15,16], which are basic units of chromatin organization in Drosophila. Recently, Ghavi-Helm et al. [17] suggested that disrupting TADs does not influence coordinated gene expression, based on the analysis of Drosophila lines with highly rearranged genomes.
In contrast, Renschler et al. [14] used three distantly related Drosophila species to show that, while chromosomal rearrangements might shuffle the positions of entire TADs, the TADs themselves are preferentially maintained as intact units. Thus, the roles that three-dimensional chromatin interactions play in the function and evolution of insect genomes remain unclear. To address these apparently conflicting observations, we characterized the chromosomal-level genome architectures of five Anopheles mosquito species using a Hi-C approach. Comparative analyses of multiple mosquito genomes [18] previously revealed a high rate of chromosomal rearrangements, especially on the X chromosome, making them an attractive model for studying interconnections between structural variations, chromosome evolution and genome architecture. Malaria has a devastating global impact on public health and welfare, and Anopheles mosquitoes are the exclusive vectors of human malaria parasites. The Hi-C data allowed us to improve existing genome assemblies of three mosquito species and generate new chromosome-length assemblies for others. Our analysis of TADs and genomic compartmentalization, supplemented by improved algorithms for compartment identification, demonstrated conservation of principles controlling chromatin organization in Anopheles species and other insects. We found specific looping interactions, sometimes spanning several dozen megabases (Mb), in all studied Anopheles species, and showed that these interactions are evolutionarily conserved over ~100 million years. We generated RNA-seq and ChIP-seq data to show that these loops cannot be explained using known molecular mechanisms and represent a specific type of chromatin interaction. Aggregating long-range chromatin interactions, we found that contact probabilities decrease beyond a certain genomic distance.
Performing a broad evolutionary comparison between Anopheles species, other insects, and vertebrates, we showed that this limiting genomic distance is taxon-dependent and suggested a mechanistic explanation of this phenomenon.

Results

Hi-C-guided assembly of five Anopheles mosquito genomes.

In the Hi-C experiment we used 15-18 h embryos of mosquito species from three different subgenera of the Anopheles genus: Cellia (An. coluzzii, An. merus, An. stephensi), Nyssorhynchus (An. albimanus) and Anopheles (An. atroparvus) (Fig. 1, A-C). In addition, we sequenced Hi-C libraries from an adult An. merus mosquito. Based on our analysis, phylogenetic relationships of the selected species represent a broad range of evolutionary distances, from 0.5 million years (MY) between the closely related An. coluzzii and An. merus species [19] to 100 MY separating the most distant lineages such as An. coluzzii and An. albimanus [18] (Fig. 1, A).

Fig. 1. Selected species and Hi-C experimental setup. A) A time-calibrated phylogenetic tree shows estimated evolutionary distances among the selected Anopheles species; numbers in boxes show divergence times in millions of years (MY) for each branch; the scale bar represents 20 MY; B) Anopheles merus adult female mosquito (adult mosquito illustrations were taken from the VectorBase repository); C) Anopheles atroparvus 15-18 h embryo developmental stage; D) The experimental design of the embryo Hi-C experiments.

After sequencing the libraries and merging biological replicates, we obtained 60-194 million unique alignable reads for each species (Supplementary Table 1). Library statistics show high quality of the obtained data (Supplementary Table 1).
Genomes of Anopheles species have been challenging to assemble to chromosomal levels, mainly due to the presence of highly repetitive DNA clusters that regular Illumina sequencing and assembly cannot successfully resolve [18]. Chromosome-length assemblies were already available for two species (An. albimanus, An. atroparvus), whereas for An. coluzzii, An. merus and An. stephensi there were only scaffold-level assemblies with N50s of 3.5 Mb [20], 2.7 Mb [18], and 1.6 Mb [21], respectively. While evolutionary superscaffolding and chromosomal anchoring improved these assemblies, they did not reach a full chromosomal level [22]. We employed the 3D-DNA pipeline [4] to de novo assemble the An. coluzzii, An. merus, and An. stephensi genomes using our Hi-C datasets. We reassembled chromosomes of An. albimanus and An. atroparvus using available chromosomal assemblies as drafts. For all species, five large scaffolds corresponding to the number of chromosomal arms (X, 2R, 2L, 3R, 3L) were identified (Table 1). Multiple misassemblies and several chromosome rearrangements were detected and fixed manually. Available physical maps of the genomes were used to verify these corrections [23–27]. For An. coluzzii Mopti we used recently published PacBio contigs from a single An. coluzzii Ngousso mosquito [20], characterized by an N50 of 3.5 Mb, because our Hi-C data revealed multiple errors in scaffolds of the existing An. gambiae PEST assembly. This indeed resulted in a more accurate assembly, which was used in further analysis (Table 1). For An. merus we performed PacBio sequencing using whole genomic DNA extracted from 100 adult males, which produced reads with an average read length N50 of 2.7 Mb (Supplementary Table 2).
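
The N50 values quoted above summarize assembly contiguity. As a reminder of what the statistic measures, here is a minimal sketch assuming the conventional definition: N50 is the length L such that sequences of length at least L together contain at least half of all assembled bases. The function name `n50` and the toy lengths are illustrative only and are not taken from the assembly pipeline or the data used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumes the conventional definition: sort lengths from longest to
    shortest, accumulate them, and report the length at which the
    running total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (arbitrary units, hypothetical values).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

Because the statistic depends only on the length distribution, a higher N50 indicates that more of the assembly sits in long sequences, but it says nothing about whether those sequences are correctly joined. That is why the Hi-C contact maps and physical maps were needed to detect misassemblies.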