Community Science Program: FY22 Proposal
Proposer’s Name: Olivier VALLON
Proposal Title: The Chlamydomonas pan-genome
Proposal (WIP) ID: 508098
Lay description: The land plants that make up our forests and produce most of our food have tiny relatives called green algae. Studying them teaches us more about how plants carry out photosynthesis, the only way nature can withdraw CO2 from the atmosphere to build living matter. Among these inconspicuous organisms, the unicellular Chlamydomonas reinhardtii is a key model organism because it is easy to manipulate. It has been used in laboratories since 1945 and has enabled many discoveries. Our project will explore the natural biodiversity of this species and discover new functions for its many genes.
A) Brief description:
Abstract: Chlamydomonas reinhardtii, with its 111 Mb genome, is the premier green algal model for research on photosynthesis, organelle biogenesis and lipid metabolism, and for testing and designing biotechnology applications. It also serves as a model in research on cilia, the eukaryotic cell cycle, microbial interactions, nutrient homeostasis and other fields. Thanks to continuous efforts from the JGI, a genome sequence of a reference strain has been available since the early 2000s, and the organism is one of several flagship species for the JGI. Chlamydomonas reinhardtii is cosmopolitan, and next-generation sequencing of field isolates has revealed surprisingly high genetic diversity: SNP-level differences between strains of the same species reach up to 3% and are associated with high phenotypic variability. We will harness the power of comparative genomics to build a functional description of the genome features of this important model organism. We will sequence and assemble the genomes of up to 20 strains, most of them field isolates that are either available in collections or will be collected by us from the wild.
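The SNP-level difference between two strains is usually expressed as the fraction of aligned positions at which they carry different bases. The minimal Python sketch below illustrates only that arithmetic, assuming two already-aligned, gap-free sequences of equal length. The function name `snp_divergence` and the toy sequences are hypothetical. Real genome-wide estimates come from whole-genome alignments or read mapping and must also handle indels, missing data and repeats, none of which this sketch attempts.

```python
def snp_divergence(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ.

    Assumes (hypothetically) that both sequences are already aligned,
    have the same length and contain no gaps; case is ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return mismatches / len(seq_a)


# Toy 100 bp "reference" and an "isolate" carrying three substitutions.
reference = "ACGT" * 25
isolate_bases = list(reference)
for pos in (10, 50, 90):  # these positions hold 'G' in the reference
    isolate_bases[pos] = "A"
isolate = "".join(isolate_bases)

print(f"{snp_divergence(reference, isolate):.1%}")  # prints 3.0%
```

Three mismatches over 100 aligned positions give 3%, the upper figure reported for field isolates. Over 111 Mb of alignable sequence, the same rate would correspond to roughly 3.3 million differing positions.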
This project is connected to a genome-wide association study using the "MAGIC" design, which has already generated a large population of recombinant strains. The Chlamydomonas pan-genome will not only facilitate data analysis for individual investigator-initiated projects that rely on Chlamydomonas, but will also help us understand how species-level diversity operates in natural environments to shape the phototrophic biome.
Scope of Work: Phase I, to start immediately, will consist of the assembly and annotation of 10 strains, mostly field isolates, and the annotation of 5 previously obtained chromosome-level assemblies. The core sequencing technology will be PacBio HiFi, run by JGI on DNA samples prepared by us. Hi-C (JGI) may contribute to synteny analysis and scaffolding, and Nanopore sequencing (external) may be used for DNA methylation. Assembly will benefit from additional RNA-Seq or Iso-Seq data generated by JGI. In Phase II, the consortium will isolate new wild strains from diverse locations worldwide. Using Illumina sequencing, we will select 10 novel strains that add further diversity, for JGI to assemble and annotate. The consortium will analyze all data for synteny and orthology relationships and will eventually present them to the public as a Chlamydomonas pan-genome database. All the PIs involved have contributed to Chlamydomonas genomics in many different ways, usually in close collaboration with JGI staff. While Illumina sequencing is appropriate for monitoring genetic diversity, the level of divergence we observe is so high that genome assembly and structural annotation are necessary to correlate genotype and phenotype at the resolution of interest. Today, only JGI has the expertise and experience in both assembly and structural annotation needed for such an ambitious project involving up to 20 strains.
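Assembly contiguity is commonly summarized by the N50 statistic. The Background section uses it when it cites earlier PacBio assemblies of Chlamydomonas with N50 values approaching 3 Mb. N50 is the length L such that contigs of length at least L together contain at least half of the assembled bases. Below is a minimal Python sketch of that definition. The function name `n50` and the contig lengths in the example are hypothetical and do not describe any real assembly.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly given its contig lengths.

    Contigs are sorted from longest to shortest and their lengths summed;
    N50 is the length of the contig at which the running total first
    reaches half of the total assembly size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test for running >= total / 2
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


# Hypothetical toy assembly, lengths in Mb (not real data): total 25 Mb.
toy_contigs_mb = [9, 6, 4, 3, 2, 1]
print(n50(toy_contigs_mb))  # prints 6
```

N50 measures contiguity, not correctness. An assembly can have a good N50 and still collapse near-identical repeats such as recent TE copies.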
Over the years, JGI has accumulated transcriptomic data and tools that allow its staff to generate a precise description of the transcripts, including alternatively spliced isoforms, in a controlled pipeline that allows direct comparison between strains. We also rely on JGI's powerful functional annotation pipeline to generate deep homology-based insight into gene function.
B) Background information:
Technical Information: Based on previous assemblies, the nuclear genome of Chlamydomonas is approximately 111 Mb in size, with a GC content of 64% (Merchant et al., 2007). It is composed of 17 chromosomes, ranging from 3.7 Mb to 9.8 Mb. A total of 269 transposable elements (TEs) have been described in Chlamydomonas, which together account for 10.8-12.4% of the genome depending on the strain (Craig et al., 2021). Their diversity is high, with 8 of the 9 known TE orders represented, in 16 superfamilies. The high similarity between copies of the same TE (80% of copies show <5% divergence from their consensus) explains some of the problems with assemblies generated before long reads were available. As a unicellular alga, Chlamydomonas is amenable to many microbiological techniques (Harris, 2009). In particular, large clonal cultures can be grown from a single cell in a matter of days, so very little heterogeneity is expected in DNA preparations. Because the strains under study are haploid, they will not need haplotype phasing. Established protocols for preparing high-quality DNA exist and have previously been applied by us (Lin et al., 2013). They have yielded highly contiguous assemblies after PacBio sequencing, some with N50 values approaching 3 Mb.
Available Resources: The co-PIs of this proposal lead some of the most prominent laboratories working on Chlamydomonas.
Accordingly, they have significant financial support from national and international granting agencies, which will allow them to easily generate the starting materials and analyze the results. A selection of their current grants, many of which will directly benefit from the pan-genome initiative, is listed in their CVs (appendix). Chlamydomonas is usually the main, if not the sole, research organism in the PIs' labs, so the proposal is central to their research programs. The diversity of the subjects they study, from photosynthesis to cilia, from genome evolution to metal homeostasis, and from transgenesis to synthetic biology, ensures that all aspects of Chlamydomonas biology will be covered in the analysis of the pan-genome. Several ongoing projects that will synergize with this pan-genome proposal deserve mention. Population geneticist Rory Craig is generating assemblies of other algae. These include 3 field isolates of C. reinhardtii that are not part of our sequencing request but that we would like JGI to annotate along with the strains it will assemble. This project targets TE dynamics in the species, the geographic structure of genetic diversity, gene flow, mutation rates and selection pressure (Craig et al., 2019). Co-PIs Marc Hanikenne, Pierre Cardol, Tom Druet and Denis Baurain, all from Liège University, are generating a MAGIC (Multiparent Advanced Generation Inter-Cross) population to fine-map quantitative traits. They crossed 8 starter strains (7 field isolates and 1 laboratory strain), all of which present interesting phenotypic characteristics and are part of our assembly/annotation request. The crossing scheme is designed so that all segments of each parental genome are equally represented in the final population, which has been finalized at 768 recombinant lines. Illumina sequencing was used to map the variants (Fig. 1). However, the high genetic diversity of the field isolates (up to 2-3% at the level of SNPs and short indels) makes mapping to a single reference genome inadequate to capture the entire spectrum of variants. They would therefore greatly benefit from assembled parental genomes and a high-quality annotation.
Figure 1: The MAGIC design (left) and preliminary results of Illumina sequencing (right), illustrating chromosome-scale variations in the proportions of reads representing each of the parents (average contribution of each parent: 10.3-15.6%).
The goal of their MAGIC project is to correlate sequence variants with quantitative traits measured in the parents and in each member of the recombinant population. The traits include a host of photosynthetic parameters deduced from the analysis of chlorophyll a fluorescence induction curves. The other target is metal homeostasis, approached by measuring elemental profiles and growth properties in media with limiting or excess metals. This population is meant to be stored frozen and distributed to laboratories interested in scoring other phenotypes (motility, mating, phototaxis and drug sensitivity can all be scored easily in plate format).
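With 8 founders and a crossing scheme designed for equal representation, each parent is expected to contribute 1/8, or 12.5%, of the population's genome. The 10.3-15.6% range in Figure 1 can therefore be read as deviations from that expectation. The minimal Python sketch below assumes that reads have already been assigned to a parent of origin. It does not cover that assignment step, which the high divergence between parents makes nontrivial. The parent labels, counts and function names are hypothetical and are not the consortium's data or pipeline.

```python
from collections import Counter


def parental_proportions(assignments: list[str]) -> dict[str, float]:
    """Fraction of assigned reads attributed to each parent."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {parent: n / total for parent, n in counts.items()}


def deviation_from_expected(proportions: dict[str, float],
                            n_parents: int = 8) -> dict[str, float]:
    """Observed proportion minus the equal-contribution expectation 1/n_parents."""
    expected = 1 / n_parents
    return {parent: frac - expected for parent, frac in proportions.items()}


# Hypothetical read-to-parent assignments for 8 founders (P1..P8), 1000 reads.
toy_assignments = (["P1"] * 150 + ["P2"] * 100
                   + ["P3"] * 125 + ["P4"] * 125 + ["P5"] * 125
                   + ["P6"] * 125 + ["P7"] * 125 + ["P8"] * 125)

props = parental_proportions(toy_assignments)
for parent, dev in sorted(deviation_from_expected(props).items()):
    print(f"{parent}: {props[parent]:.1%} (expected 12.5%, deviation {dev:+.1%})")
```

This sketch produces a single genome-wide figure per parent. To reveal chromosome-scale variation like that shown in Figure 1, the same proportions would instead be computed per chromosome or per genomic window.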
Other ongoing programs that will benefit from the pan-genome include:
- the analysis of a family of ~40 fast-evolving repetitive proteins (Boulouis et al., 2015) that affect chloroplast transcript stability and may be sequence-specific endoribonucleases (Vallon);
- the search for variants in epigenetic processes that may provide tools for transgene expression and reveal algal-specific pathways (Bock, Schroda);
- the discovery of new regulatory elements that could be used for synthetic biology (Smith, Schroda);
- the control of cilia biogenesis and swimming behavior by multi-gene networks (Dutcher);
- variants in photoprotective mechanisms that may be linked to photic niche adaptations (Niyogi);
- the importance of gene neighborhoods (Blaby-Haas);
- variability in mating-type structure that may affect sexuality (Dutcher) ……
Technical Challenges: PacBio sequencing and genome assembly have proven very efficient in Chlamydomonas, and the wealth of transcriptome and homology