International Journal of Genomics

Genomic and Postgenomic Approaches to Understand Environmental Microorganisms

Lead Guest Editor: Rafael Silva-Rocha
Guest Editors: María-Eugenia Guazzaroni and Raul A. Platero

Copyright © 2018 Hindawi. All rights reserved.

This is a special issue published in “International Journal of Genomics.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Editorial Board

Jacques Camonis, France; Sylvia Hagemann, Austria; Ferenc Olasz, Hungary; Luigi Ceci, Italy; Henry Heng, USA; Elena Pasyukova, Russia; Maria Luisa Chiusano, Italy; Eivind Hovig, Norway; Graziano Pesole, Italy; Prabhakara V. Choudary, USA; Hieronim Jakubowski, USA; Giulia Piaggio, Italy; Martine A. Collart, Switzerland; B.-H. Jeong, Republic of Korea; Ernesto Picardi, Italy; Giandomenico Corrado, Italy; Atsushi Kurabayashi, Japan; Paulo M. Pinto, Brazil; M. Dmitrzak-Weglarz, Poland; Sang Hong Lee, Australia; Mohamed Salem, USA; Antonio Ferrante, Italy; Daofeng Li, USA; Tatiana Tatusova, USA; Marco Gerdol, Italy; Julio Martin-Garcia, USA; Wilfred van IJcken, Netherlands; João Paulo Gomes, Portugal; Giuliana Napolitano, Italy; Brian Wigdahl, USA; Soraya E. Gutierrez, Chile; Corey Nislow, Canada; Jinfa Zhang, USA; M. Hadzopoulou-Cladaras, Greece; Michael Nonnemacher, USA

Contents

Genomic and Postgenomic Approaches to Understand Environmental Microorganisms
María-Eugenia Guazzaroni, Raul Alberto Platero, and Rafael Silva-Rocha
Editorial (2 pages), Article ID 4915348, Volume 2018 (2018)

The Endophytic Bacterial Microbiota Associated with Sweet Sorghum (Sorghum bicolor) Is Modulated by the Application of Chemical N Fertilizer to the Field
Cintia Mareque, Thais Freitas da Silva, Renata Estebanez Vollú, Martín Beracochea, Lucy Seldin, and Federico Battistoni
Research Article (10 pages), Article ID 7403670, Volume 2018 (2018)

New Genomic Approaches to Enhance Biomass Degradation by the Industrial Fungus Trichoderma reesei
Renato Graciano de Paula, Amanda Cristina Campos Antoniêto, Liliane Fraga Costa Ribeiro, Cláudia Batista Carraro, Karoline Maria Vieira Nogueira, Douglas Christian Borges Lopes, Alinne Costa Silva, Mariana Taíse Zerbini, Wellington Ramos Pedersoli, Mariana do Nascimento Costa, and Roberto Nascimento Silva
Review Article (17 pages), Article ID 1974151, Volume 2018 (2018)

Metagenomic Approaches for Understanding New Concepts in Microbial Science
Luana de Fátima Alves, Cauã Antunes Westmann, Gabriel Lencioni Lovate, Guilherme Marcelino Viana de Siqueira, Tiago Cabral Borelli, and María-Eugenia Guazzaroni
Review Article (15 pages), Article ID 2312987, Volume 2018 (2018)

Protein Engineering Strategies to Expand CRISPR-Cas9 Applications
Lucas F. Ribeiro, Liliane F. C. Ribeiro, Matheus Q. Barreto, and Richard J. Ward
Review Article (12 pages), Article ID 1652567, Volume 2018 (2018)

Calibrating Transcriptional Activity Using Constitutive Synthetic Promoters in Mutants for Global Regulators in Escherichia coli
Ananda Sanches-Medeiros, Lummy Maria Oliveira Monteiro, and Rafael Silva-Rocha
Research Article (10 pages), Article ID 9235605, Volume 2018 (2018)

Hindawi
International Journal of Genomics
Volume 2018, Article ID 4915348, 2 pages
https://doi.org/10.1155/2018/4915348

Editorial
Genomic and Postgenomic Approaches to Understand Environmental Microorganisms

María-Eugenia Guazzaroni,1 Raul Alberto Platero,2 and Rafael Silva-Rocha3

1Faculty of Philosophy, Sciences and Letters at Ribeirao Preto, Ribeirao Preto, Brazil
2Clemente Estable Biological Research Institute, Montevideo, Uruguay
3Ribeirao Preto Medical School, Ribeirao Preto, Brazil

Correspondence should be addressed to Rafael Silva-Rocha; [email protected]

Received 4 September 2018; Accepted 4 September 2018; Published 24 October 2018

Copyright © 2018 María-Eugenia Guazzaroni et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The introduction of new high-throughput technologies has profoundly impacted the development of modern science outside the gene-centered understanding perceived in the earlier genomic era [1, 2]. Thus, low-cost platforms known as Next Generation Sequencing (NGS) technologies can produce millions of sequences of DNA molecules with different yields and sequence lengths, and their cost has fallen considerably over the last fifteen years [3].

At the same time, microorganisms are the most abundant organisms on Earth, playing elemental roles in biogeochemical, biomedical, and biotechnological processes. In this sense, novel problem-solving techniques involving systems and synthetic biology approaches have emerged to give significant insights into the molecular mechanisms related to microbial function. Moreover, this knowledge can be extended to our understanding of microbial processes, allowing the exploration of environmental microorganisms, including bacteria and fungi, as cell factories. In this context, this special issue attempts to explore how postgenomic approaches have made it possible to take advantage of the abundance of genetic and biochemical information currently being produced from the fields of (meta)genomics, (meta)transcriptomics or transcriptome profiling, and (meta)proteomics and how the large amounts of data generated by these methodologies are integrated to engineer microbes for relevant applications.

Accordingly, the current special issue includes two original research articles and three review articles, comprising different aspects of the selected topics. For example, in one article, L. F. Ribeiro et al. review the current strategies applied to overcome the limitations of Cas9 protein (CRISPR-associated protein 9) to generate robust and efficient tools for customized DNA manipulation using protein engineering approaches. Application of the reviewed techniques in diverse biological processes should permit the optimization of gene therapy, metabolic flux, and synthetic gene networks. In another review, L. de Fátima Alves et al. focus on how metagenomics has contributed to scientific understanding in diverse areas of knowledge. In this framework, the main milestones in the metagenomic field over the last 30 years, since the first published metagenomic experiment, are presented. The article is oriented in a philosophical manner, providing perceptions into potentialities and current challenges of metagenomic approaches, presenting the field as a promoter of new concepts in microbial science. The review article authored by R. G. de Paula et al. discusses new perspectives about control of secretion and cellulase expression based on RNA-seq and functional characterization data of the filamentous fungus Trichoderma reesei, one of the most well-studied cellulolytic microorganisms. In one research article, A. Sanches-Medeiros et al. report a new method to calibrate transcriptional activity using constitutive synthetic promoters in the bacterium Escherichia coli. In this article, the authors demonstrate that simple experimental techniques involving the use of a single fluorescent reporter and plasmids are sufficient to provide robust characterization of transcriptional elements, which is fundamental to proper construction of complex synthetic circuits. Finally, in another research article, C. Mareque et al. evaluate how different levels of N fertilization affect the structure of total and diazotrophic endophytic bacterial microbiota associated with field-grown Sorghum bicolor. The data obtained by the authors contribute to the understanding of how agronomical practices impact the plant-associated microbiota, an essential step toward the design of new sustainable agricultural production systems based on plant growth-promoting bacteria.

Thus, in our opinion, the articles gathered in this special issue will allow us to explore novel methodological approaches to understand the molecular mechanisms associated with microbial function. We hope that these materials will help to inspire readers to perform novel studies capitalizing on the unlimited biological potential of environmental microorganisms, probably bringing important benefits in areas of agriculture, industry, pharmacy, and biotechnology.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

María-Eugenia Guazzaroni
Raul Alberto Platero
Rafael Silva-Rocha

References

[1] E. L. van Dijk, H. Auger, Y. Jaszczyszyn, and C. Thermes, “Ten years of next-generation sequencing technology,” Trends in Genetics, vol. 30, no. 9, pp. 418–426, 2014.
[2] L. Perbal, “The case of the gene: postgenomics between modernity and postmodernity,” EMBO Reports, vol. 16, no. 7, pp. 777–781, 2015.
[3] A. Sanchez-Flores and C. Abreu-Goodger, “A practical guide to sequencing genomes and transcriptomes,” Current Topics in Medicinal Chemistry, vol. 14, no. 3, pp. 398–406, 2014.

Hindawi
International Journal of Genomics
Volume 2018, Article ID 7403670, 10 pages
https://doi.org/10.1155/2018/7403670

Research Article
The Endophytic Bacterial Microbiota Associated with Sweet Sorghum (Sorghum bicolor) Is Modulated by the Application of Chemical N Fertilizer to the Field

Cintia Mareque,1 Thais Freitas da Silva,2 Renata Estebanez Vollú,2 Martín Beracochea,1 Lucy Seldin,2 and Federico Battistoni1

1Departamento de Bioquímica y Genómica Microbianas, Instituto de Investigaciones Biológicas Clemente Estable, Avenida Italia 3318, Montevideo 11600, Uruguay
2Laboratório de Genética Microbiana, Instituto de Microbiologia Paulo de Góes, Universidade Federal do Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil

Correspondence should be addressed to Federico Battistoni; [email protected]

Received 21 March 2018; Accepted 16 August 2018; Published 30 September 2018

Academic Editor: Rafael Silva-Rocha

Copyright © 2018 Cintia Mareque et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Sweet sorghum (Sorghum bicolor) is a multipurpose crop used as a feedstock to produce bioethanol, sugar, energy, and animal feed. However, it requires high levels of N fertilizer application to achieve optimal growth, which causes environmental degradation. Bacterial endophytes, which live inside plant tissues, play a key role in the health and productivity of their host. This particular community may be influenced by different agronomical practices. The aim of the work was to evaluate the effects of N fertilization on the structure, diversity, abundance, and composition of the endophytic and diazotrophic bacterial communities associated with field-grown sweet sorghum. PCR-DGGE, quantitative PCR, and high-throughput sequencing were performed based on the amplification of rrs and nifH genes. The level of N fertilization affected the structure and abundance but not the diversity of the endophytic bacterial communities associated with sweet sorghum plants. This effect was pronounced in the roots of both bacterial communities analyzed and may depend on the physiological state of the plants. Specific bacterial classes and genera increased or decreased when the fertilizer was applied. The data obtained here contribute to a better understanding of the effects of agronomical practices on the microbiota associated with this important crop, with the aim of improving its sustainability.

1. Introduction

Globally, sweet sorghum (Sorghum bicolor) is the fourth most important cereal and is a multipurpose crop that is used in grain, forage, syrup, fodder, and bioethanol production [1]. Microbial communities play a crucial role in ecosystems, and particularly plant microbiomes can modulate the growth, health, productivity, C-sequestration, and phytoremediation of plants and play a key role in global biogeochemical cycles [2].

Bacterial endophytes are defined as bacteria that can be detected at a particular moment within the internal tissues of an apparently healthy host plant [3]. Many endophytes are likely to have positive effects on their hosts, with the best examples being N₂-fixing bacteria [4], but the potential of applying endophytic bacteria as inoculants is underexplored [5–7]. Several members of the phyla Proteobacteria, Firmicutes, Bacteroidetes, and Actinobacteria have been isolated as endophytes. These bacteria have been recognized to have profoundly favourable impacts on plant growth by producing phytohormones, synthesizing fungicidal and/or bactericidal substances, enhancing the availability of minerals, possessing phosphate-solubilizing activity, and providing nitrogen to plants [4]. In addition, endophytic bacteria are an effective agent for stimulating plant secondary metabolism and for improving or producing functional components [4, 8]. These features underline the potential of endophytic bacteria as bioinoculants for agronomically important crops, with the aim of developing more sustainable production systems [9].

The structure and diversity of the endophytic communities have been shown to be potentially influenced by several factors, such as the plant species and genotype, agricultural practices, and environmental conditions [10–12]. To better understand and manipulate the contribution of endophytic bacteria to plants when used as bioinoculants, it is crucial to decipher the community structure, diversity, composition, metabolic processes, adaptability, and beneficial features of the microbiome associated with target crops. Plants have been shown to harbour an enormous diversity of bacteria, including diazotrophs [4, 13, 14]. Therefore, a better understanding of the endophytic bacterial microbiota of host plants may help elucidate their role within their hosts, moving toward the development of more sustainable agronomical practices.

The availability of nitrogen often limits crop productivity, and chemical nitrogen fertilization is widely used to increase the yield of agronomical crops. However, this practice has a high environmental and economic cost, since crop plants are able to use only 50% of the applied fertilizer, while the rest is lost from the plant-soil system through gaseous emissions, runoff erosion, and leaching [15, 16]. The environmental impacts of this loss range from greenhouse effects, ozone layer damage, and acid rain to changes in the global N cycle and nitrate pollution of surface and ground water [17]. In addition, the application of N fertilizer could also inhibit the N₂-fixation process by diazotrophic bacteria in the soil or associated with the plants [18].

These problems emphasize the urgent need for new technologies based on plant growth-promoting bacteria (PGPB) to help achieve more sustainable agricultural production systems.

Previously, part of the cultivable community associated with sweet sorghum (cv. M81E) was described. Isolates from this community showed several plant growth-promoting (PGP) traits, and some of them were described as plant growth promoters of this sweet sorghum cultivar [19].

The aim of the present study was to evaluate the effects of N fertilization on the structure, diversity, abundance, and composition of endophytic and diazotrophic bacterial communities associated with sweet sorghum in fields. To study these communities, DNA fingerprinting, quantitative PCR, and high-throughput sequencing techniques were used.

2. Materials and Methods

2.1. Plant Sampling and DNA Extraction. Sweet sorghum plants were sampled from two different fields with contrasting N fertilization levels: 0 and 100 kg N ha⁻¹, at the cropping region Bella Union-Artigas, Uruguay (30°37′56.0″S 57°21′18.0″W). The soil physicochemical characteristics were as follows: pH 6.2, 42% sand, 25% silt, 33% clay, 1.51% organic matter, and 0.12% total N.

At the laboratory, each collected plant (three plants from each fertilization level) was independently divided into roots and upper and lower stems, which were used for total microbial DNA extraction. The total microbial community DNA was extracted from 1.5 g of each plant fraction using the PowerSoil® DNA Isolation Kit (MO BIO Laboratories, Inc. USA) and purified with the Kit Wizard® DNA Clean-Up System (Promega, USA). Prior to the bacterial DNA extraction, the stem samples were disinfected with 70% EtOH and the epidermis was peeled off with a sterile scalpel. In the case of the roots, the rhizospheric soil was removed by vortexing for 10 min in 0.9% NaCl, the material was disinfected for 10 min in 70% EtOH, surface-sterilized for 30 min in 4% sodium hypochlorite, and then rinsed with sterile deionized water. Finally, the root samples were sonicated for 15 min and vortexed for 1 min. Subsequently, the ends of the material were removed with a sterile scalpel and discarded, and sterility tests were conducted on the remainder of the tissue on TSA (trypticase soy agar) plates. DNA preparations were visualized after electrophoresis in a 1% (w/v) agarose gel in 1x TBE buffer to assess their integrity, stained with GoodView (Beijing SBS; Genetech), and stored at −20°C prior to PCR amplification.

2.2. Nested PCR Amplification of the nifH and 16S rRNA Coding Genes. For DGGE analyses, the 16S rRNA and nifH gene sequences from stem and root samples were amplified in triplicate, using nested PCR as previously described [20–22].

For 16S rRNA coding gene amplification, the first PCR was carried out with the forward primer 799F and the reverse primer 1492R (Table S1) [20], generating a product of approximately 700 bp. The second step was carried out with the forward primer F968 containing a GC clamp and the reverse primer R1401 (Table S1) [23], yielding a product of 433 bp. For both reactions, the PCR mixture contained 1.0 μl of template DNA (16 ng μl⁻¹), 2x GoTaq® Reaction Buffer, 2.1 mM MgCl₂, 0.4 mM dNTPs, 0.2 μM of each primer, and 2 U Taq polymerase, in a final reaction volume of 25 μl. In the second PCR, 1 μl of the first PCR product was used as a template. The PCR conditions were as follows: 35 cycles consisting of denaturing at 94°C for 20 sec, annealing at 53°C for 40 sec for the first PCR and 48°C for 90 sec for the second PCR, and primer extension for 40 sec for the first PCR and 90 sec for the second PCR at 72°C with a final extension at 72°C for 10 min.

DGGE analyses based on the 16S rRNA gene of the Proteobacteria, Actinobacteria, and Firmicutes phyla were performed using specific primers for each group (Table S1). The reaction mixtures and the PCR conditions were as previously reported [20, 23–27], while the conditions for the second PCR were the same as mentioned above.

For the nifH gene amplification, the first PCR was carried out with the forward primer FGPH19 and the reverse primer POLR (Table S1) [21, 22], generating a product of 429 bp. The second PCR was carried out with the forward primer POLF containing a GC clamp and the reverse primer AQER (Table S1) [21], yielding a product of 320 bp. The PCR reaction mixture contained 1 μl of template DNA (16 ng μl⁻¹), 10 μl 5x GoTaq® Reaction Buffer, 2.1 mM MgCl₂, 0.8 mM dNTPs, 0.2 μM of both sets of primers, and 1.25 U Taq polymerase, in a final reaction volume of 50 μl. For the second reaction, 1 μl of the first PCR product was used as a template

[Figure 1 image. Panel (a): DGGE gel with root, lower stem, and upper stem lanes under +N and −N, with numbered bands. Panel (b): dendrogram on a similarity scale from 60 to 100.]

Figure 1: (a) Denaturing gradient gel electrophoresis (DGGE) fingerprints of 16S rRNA gene fragments amplified from endophytic DNA templates isolated from sweet sorghum plants grown under high (+) and low (−) N fertilization levels (100 and 0 kg N ha⁻¹, respectively). Numbers indicate the excised bands from which sequences were determined. (b) Dendrogram obtained using the unweighted pair group method with arithmetic averages and Dice similarity coefficients. Grey and black squares: high and low nitrogen fertilization levels, respectively. L and U: lower and upper stem parts, respectively.

for the next reaction. The PCR conditions were as follows: 30 cycles consisting of denaturing at 94°C for 1 min, annealing at 50°C for 1 min for the first PCR and 55°C for the second PCR, and primer extension for 2 min at 72°C with a final extension at 72°C for 5 min. The amplification products obtained were analyzed on 1% (w/v) agarose gels via electrophoresis in TAE buffer (20 mM Tris-acetate, 0.5 mM EDTA; pH 9) and stained with GoodView (Beijing SBS; Genetech).

2.3. Denaturing Gradient Gel Electrophoresis (DGGE). PCR product (20 μl) was loaded onto 8% (w/v) polyacrylamide gels, 1 mm thick, in 1x TAE buffer (20 mM Tris-acetate, 0.5 mM EDTA; pH 9), with a 40–65% urea and formamide denaturant gradient for the study of Alphaproteobacteria, Betaproteobacteria, and Firmicutes, 45–75% for Actinobacteria, and 20–70% for the diazotrophic community. The electrophoresis was run at 60 V for 16 hours using the D-Code system from Bio-Rad Laboratories. The gels were stained for 30 min with 1x SYBR® Green (Invitrogen™), and the bands were visualized and digitalized using Storm™ (GE Healthcare). Selected bands (numbered in Figures 1 and 2) were excised from the gels with a sterile scalpel, reamplified, and sequenced. For the latter, the bands were eluted at 4°C overnight in 50 μl of Milli-Q water, and 1 μl of each supernatant was used as a PCR template. The reamplification reaction conditions were the same as used for the second cycle described above, using the primers F968-GC and R1401 for the 16S rRNA gene or the primers POLF-GC and AQER for the nifH gene. PCR products obtained were sent for sequencing to Macrogen Inc., Korea. Forward and reverse sequences obtained were assembled using the DNA Baser Sequence Assembler v3.x 302 (2010) (http://www.DnaBaser.com). Nucleotide sequences obtained were identified by BLASTn analyses [28], using the GenBank database from the National Center for Biotechnology Information. All sequences obtained in this study were deposited in the GenBank database under the accession numbers KY062497-KY062552.

2.4. Bioinformatics and Statistical Analysis. The dendrograms and the binary matrix based on the digitalized image of the DGGE gels were constructed with the UPGMA algorithm with arithmetic averages and Dice similarity coefficients using the GelCompar II 6.5 software (Applied Maths NV). The Shannon-Wiener, Simpson, and Chao-1 alpha indices

[Figure 2 image. Panel (a): DGGE gel with root, lower stem, and upper stem lanes under +N and −N, with numbered bands. Panel (b): dendrogram on a similarity scale from 60 to 100.]

Figure 2: (a) Denaturing gradient gel electrophoresis (DGGE) fingerprints of nifH gene fragments amplified from endophytic DNA templates isolated from sweet sorghum plants grown under high (+) and low (−) N fertilization levels (100 and 0 kg N ha⁻¹, respectively). Numbers indicate the excised bands from which sequences were determined. (b) Dendrogram obtained using the unweighted pair group method with arithmetic averages and Dice similarity coefficients. Grey and black squares: high and low nitrogen fertilization levels, respectively. L and U: lower and upper stem parts, respectively.

were calculated from the band patterns using the densitometry curves and then exported into a quantitative numeric matrix, relative to the band surface.

An ANOVA test was performed using the Fisher LSD post hoc test at a significance level of P < 0.05. All statistical analyses were performed in the InfoStat programme [29].

2.5. Quantification of the Bacterial Community by Quantitative PCR (qPCR). The abundances of 16S rRNA and nifH genes were quantified by real-time PCR (qPCR) using the primers 6S-27F/338R and POLF/POLR (Table S1), respectively [21, 22, 30]. The qPCR reaction was performed in a CFX96 Touch Real-Time PCR (Bio-Rad) instrument, and all measurements were performed using the SYBR Green approach. The PCR mixture was 12.5 μl of the iQ SYBR Green Supermix (Bio-Rad), 1 μM of each primer, and 4 to 25 ng of DNA template, within a total volume of 25 μl. The qPCR cycle for the 16S rRNA coding gene consisted of a denaturation step for 10 min at 95°C and 40 cycles of 15 s at 95°C, 30 s at 58°C, and 30 s at 72°C. The qPCR cycle for the nifH gene consisted of a denaturation step for 5 min at 95°C and 40 cycles of 10 s at 95°C, 10 s at 59°C, and 30 s at 72°C. Product specificity was confirmed by melting curve analysis (58–95°C, 0.5°C per read, 5 s hold) and visualization in agarose gels, which showed specific product bands at the expected size of 180 bp for the 16S rRNA gene and 360 bp for the nifH gene.

For both genes, three replicates in duplicate were used. For the standard curve, triplicates were employed for every run using a known number of each gene from the genome of Herbaspirillum seropedicae SmR1 [31], from 6.62 × 10¹ to 6.62 × 10⁵ copies of the nifH gene and from 1.99 × 10³ to 1.99 × 10⁶ copies of the 16S rRNA encoding gene.

For the standard curve, mass concentrations of standard DNA were converted into copy concentrations using the following equation [32]:

\[
\text{DNA (copy)} = \frac{6.02 \times 10^{23}\ \text{(copy/mol)} \times \text{DNA amount (g)}}{\text{DNA length (bp)} \times 660\ \text{(g/mol/bp)}} \tag{1}
\]

For statistical analyses, an ANOVA test was performed using the InfoStat programme and, where significant differences were confirmed, the means were compared using the Tukey test with P < 0.05 [29].
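As an illustration of equation (1), the following is a minimal sketch of the mass-to-copy conversion and of a tenfold dilution series for the standards. The function names, the 1 ng input, and the 5.5 Mb genome length are hypothetical values chosen for the example and are not taken from the study; the sketch also leaves out any correction for the number of target gene copies per genome.

```python
AVOGADRO = 6.02e23   # copies per mole, the value used in equation (1)
BP_MASS = 660.0      # average mass of one base pair in g/mol/bp, as in equation (1)


def dna_copies(mass_g, length_bp):
    """Return the number of template copies in mass_g grams of DNA.

    length_bp is the length of the template molecule (for example a genome).
    """
    if mass_g < 0 or length_bp <= 0:
        raise ValueError("mass must be >= 0 and length must be > 0")
    return AVOGADRO * mass_g / (length_bp * BP_MASS)


def tenfold_series(top_copies, points):
    """Copy numbers of a 10-fold dilution series, highest concentration first."""
    return [top_copies / 10 ** i for i in range(points)]


# Hypothetical input: 1 ng of a 5.5 Mb genome (illustrative length only).
example_copies = dna_copies(1e-9, 5.5e6)

# Five-point series spanning the nifH standard range quoted in the text.
nifh_series = tenfold_series(6.62e5, 5)
```

Working in grams and base pairs keeps the units identical to those written in equation (1), so each factor in the code maps onto one term of the formula.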

2.6. Ion Torrent® High-Throughput Sequencing of the of the expected size obtained using each set of primers were Bacterial 16S rRNA Coding Gene resolved by DGGE. The results from the DGGE analysis of the total endo- 2.6.1. Sample Collection and DNA Extraction. For bacterial phytic bacterial community based on the 16S rRNA gene DNA extraction, same plant samples used in the DGGE amplicon are shown in Figure 1(a). The UPGMA-assisted experiments were employed. In this case, stems from four cluster analysis of the DGGE gels revealed the endophytic sweet sorghum plants were sampled and pooled from two communities clustered according to the organ analyzed different fields with contrasting N fertilization levels (0 and −1 (roots and stem) with 50.5% similarity (Figure 1(b)). More- 100 kg N ha ). Bacterial DNA extraction was performed over, within the aforementioned organs, the communities following the protocol previously described [33], with the are grouped with respect to the N fertilization level analyzed, fi following modi cations: stems were peeled with a sterilized with 61.3 and 66.4% similarities observed for the stems and scalpel, and 50 g of the inner stem tissues were homogenized roots, respectively. However, no distinct community struc- in 300 ml of Milli-Q water. Bacterial DNA extraction from turing was observed between the lower and upper stems the enriched fraction was obtained using a CTAB bacterial (Figure 1(b)). DNA isolation method (Joint Genome Institute protocols/ When the whole endophytic community was analyzed by http://1ofdmq2n8tc36m6i46scovo2e.wpengine.netdna-cdn. DGGE, alpha diversity indices (Simpson 1-D, Chao-1, and com/wp-content/uploads/2014/02/JGIBacterial-DNA-isola Shannon H) showed that the higher values were obtained tion-CTAB-Protocol-2012.pdf). 
The purity of the extracted DNA was checked with the NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) (260/280 nm ratio), and it was quantified using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). The integrity of the DNA was also confirmed by electrophoresis in a 0.8% agarose gel with 1x TAE buffer.

2.6.2. Sequencing and Data Analysis. PCR amplification of the V6 region of the 16S rRNA gene sequences was carried out using a pool of six forward and reverse degenerate primers each (Table S2) [34]. High-throughput sequencing of the amplicons was conducted using the Ion Torrent Personal Genome Machine (PGM) platform at the Genomic Department of the IIBCE (Uruguay). Raw sequencing reads were checked using the following quality criteria: (i) polyclonal reads with Ion Torrent Suite (5.03) were discarded, (ii) reads were trimmed to 90 bp and shorter ones were discarded, (iii) reads with an expected error rate equal to or greater than 1.0 using Usearch (Edgar, 2010) were also discarded, and (iv) reads that match with the Sorghum bicolor complete genome [35] using Bowtie 2 [36] were discarded. Downstream analysis followed the pipeline described by Pylro et al. [37]. A custom set of bash scripts was constructed to automatize the pipeline; these are available on the site https://github.com/mberacochea/sorghum-bicolor-M81E-16S. All sequences obtained in this study were deposited under the NCBI accession number PRJNA352426.

3. Results

3.1. Endophytic Bacterial Communities Associated with Different Tissues of Sweet Sorghum Plants. DNA was recovered from all the sweet sorghum samples (roots and stems) of cv. M81E grown in the field under different N fertilization levels (0 and 100 kg N ha⁻¹). All DNA samples were used as templates for PCR amplification using primers based on general 16S rRNA coding genes, on specific 16S rRNA genes for Alpha-, Beta-, and Gammaproteobacteria, Actinobacteria, Firmicutes, and on the nifH gene (Table S1). DNA fragments

for the root communities at the low N fertilization level, but no significant differences were observed among the treatments analyzed (Table S3).

The DNA bands were retrieved from the DGGE gels, reamplified, and sequenced (Figure 1(a)). BLASTn analyses revealed that all the identified genera belonged to the phylum Proteobacteria (Table S2), which was primarily represented by genera from the classes Betaproteobacteria (Duganella, Aquabacterium, Bordetella, and Massilia) and Gammaproteobacteria (Pantoea, Salmonella, Klebsiella, Kosakonia, Pseudomonas, Serratia, and Stenotrophomonas).

Interestingly, when the phyla Proteobacteria (classes Alpha, Beta, and Gamma), Firmicutes, and Actinobacteria were analyzed using specific primers based on the 16S rRNA gene amplicon (Table S1), the UPGMA-assisted cluster analysis of the DGGE gels revealed that the structure of those communities did not cluster according to the plant organ (roots and stem) or to the fertilization level analyzed (0 and 100 kg N ha⁻¹) (data not shown). Selected bands retrieved from each DGGE gel are shown in Table S4.

Analysis of the Alphaproteobacteria DNA bands amplified using the specific primers and retrieved from the DGGE gels showed that only 45% of the sequences were related to the expected genera, such as Agrobacterium, Ancylobacter, Brevundimonas, and Pleomorphomonas (Table S4). In contrast, analysis of the Betaproteobacteria bands isolated from the DGGE gels, which were amplified using the specific primers, revealed that 100% of the sequences were related to the expected genera. From the Betaproteobacteria DGGE gel, only the genera Massilia and Methyloversatilis were identified (Table S4). Despite the specificity of the primers, none of the selected bands sequenced from the Gammaproteobacteria DGGE gels was assigned to a genus belonging to this class. By contrast, the identities of the bands retrieved from the Actinobacteria DGGE gels showed that 100% of the genera identified were related to the expected phyla, including Curtobacterium, Microbacterium, Nocardia, and Sediminihabitans (Table S4). Similarly, the sequence identities of the bands retrieved from the Firmicutes DGGE gel were also 100% related to the expected phyla, including Bacillus, Macrococcus, Staphylococcus, and Exiguobacterium (Table S4).
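The read-quality criteria (ii) and (iii) listed in Section 2.6.2 can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their bash scripts are in the repository cited in that section) and it does not call Usearch. It assumes the usual definition of expected errors as the sum of per-base error probabilities, 10^(−Q/10), and it assumes that trimming happens before the expected-error check. The function names and constants are hypothetical.

```python
from typing import List, Optional, Tuple

TRIM_LENGTH = 90            # criterion (ii): reads are trimmed to 90 bp
MAX_EXPECTED_ERRORS = 1.0   # criterion (iii): reads at or above this value are discarded


def expected_errors(phred_scores: List[int]) -> float:
    """Sum of per-base error probabilities, P = 10^(-Q/10) (assumed definition)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def filter_read(seq: str, phred_scores: List[int]) -> Optional[Tuple[str, List[int]]]:
    """Return the trimmed read, or None if it fails criterion (ii) or (iii)."""
    if len(seq) != len(phred_scores):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) < TRIM_LENGTH:
        return None  # shorter than 90 bp: discarded
    # Assumption: trim first, then evaluate expected errors on the trimmed read.
    seq, phred_scores = seq[:TRIM_LENGTH], phred_scores[:TRIM_LENGTH]
    if expected_errors(phred_scores) >= MAX_EXPECTED_ERRORS:
        return None
    return seq, phred_scores
```

For instance, a 100 bp read with uniform quality Q30 is kept after trimming to 90 bp, while the same read at Q10 is discarded because its expected errors exceed the threshold.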

[Figure 3 plot. Vertical axis: number of gene copies (log scale, 1.0E+00 to 1.0E+06). Horizontal axis: treatment, (+) N and (−) N, each with root, lower stem, and upper stem. Series: 16S rRNA and nifH.]

Figure 3: Quantification of 16S rRNA and nifH gene copies in samples taken from different organs of sweet sorghum plants (cv. M81E) grown in the field under high (+) and low (−) N fertilization (100 and 0 kg N ha⁻¹, respectively). Means within two treatments that have the same letter are not significantly different by Tukey test with a P < 0.05.
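The copy numbers in Figure 3 rest on qPCR standard curves. As a hedged illustration of how such a curve yields an amplification efficiency and a copy number, the sketch below fits Ct against log10(copies) and applies the commonly used relation E = 10^(−1/slope) − 1. The dilution series and Ct values are hypothetical and are not data from this study; the function names are illustrative only.

```python
from typing import List, Tuple


def fit_standard_curve(log10_copies: List[float],
                       ct_values: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    syy = sum((y - mean_y) ** 2 for y in ct_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


def amplification_efficiency(slope: float) -> float:
    """Efficiency as a fraction (0.90 means 90%), from E = 10^(-1/slope) - 1."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the copy number of a sample."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series (log10 copies) and measured Ct values.
standards_log10 = [3.0, 4.0, 5.0, 6.0, 7.0]
standards_ct = [27.1, 23.5, 19.9, 16.4, 12.7]

slope, intercept, r_squared = fit_standard_curve(standards_log10, standards_ct)
print(f"slope={slope:.3f}, R^2={r_squared:.4f}, "
      f"efficiency={100 * amplification_efficiency(slope):.1f}%")
```

Under this relation, a slope near −3.32 corresponds to 100% efficiency; steeper slopes give lower efficiencies and shallower slopes give values above 100%.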

In contrast to the 16S rRNA-based data, analysis of the endophytic-diazotrophic bacterial community based on the DGGE gels with the nifH gene amplicons showed that the stems produced a greater number of bands than the roots in both treatments analyzed (Figure 2(a)). In addition, the UPGMA-assisted cluster analysis of the endophytic-diazotrophic community structure showed that this community was clustered according to treatment (0 and 100 kg N ha⁻¹) that exhibited 34.7% similarity, except for a single root replicate. Moreover, within these treatments, the communities were grouped according to the organs analyzed, with 51.2 and 37.8%, respectively. In particular, under low N fertilization conditions, the stem community separated into two groups (lower and upper stem samples) that exhibited 47.5% similarity (Figure 2(b)).

The diversity of diazotrophic bacterial communities was also evaluated based on the DGGE gels obtained. In this case, higher alpha index values were obtained in the stems from the high N fertilization condition. However, in all the treatments analyzed, no significant differences were observed (Table S5). Selected bands from the DGGE gels in which the diazotrophic community was analyzed were excised and reamplified, and the products were sequenced (Figure 2(a)). BLASTn analyses revealed that all the bands were closely related to nifH genes of Gammaproteobacteria (71%), Betaproteobacteria, and Cyanobacteria (4%), while 21% were related to unculturable bacteria (Table S5). From the last group, the first hit from the BLASTn analysis that matched to a culturable strain was also taken into account. In this case, all the sequences were closely associated with nifH genes from Gammaproteobacteria members (Table S5).

3.2. Quantification of the Endophytic and Diazotrophic-Endophytic Bacterial Communities. The abundances of the bacterial endophytic and endophytic-diazotrophic communities were assessed by qPCR using the 16S rRNA and nifH genes. The standard curves for the 16S rRNA gene amplification showed a linear correlation (R²) of 0.99, corresponding to a PCR efficiency of 90%; for the nifH gene, the linear correlation was 0.98, corresponding to a PCR efficiency of 110%. The 16S rRNA gene abundance varied from 3.0 × 10⁴ to 2.5 × 10⁵, while for the nifH gene, the number of copies varied from 1.9 × 10¹ to 1.2 × 10³ (Figure 3).

The number of copies of the 16S rRNA gene in the roots of plants grown under high N fertilization (+N) conditions was significantly higher than in all other conditions analyzed (Figure 3). By contrast, the number of nifH gene copies varied markedly in the roots and stems (lower and upper) of plants grown under low N fertilization (−N) conditions, while no differences were observed within the plants grown under +N conditions (Figure 3). Indeed, under −N conditions, the abundance of the nifH genes was significantly higher in the roots than in the other organs analyzed.

3.3. Bacterial Community Composition. After quality control, the number of retained reads was 41,736 and 44,890 for the DNA samples from plants grown under +N and −N conditions, respectively. Rarefaction curves were used to assess OTU richness and showed that an asymptote was reached for both treatments analyzed, with a higher number of OTUs observed in the +N treatment (Figure S1).

The relative abundance of the microbial clades at two taxonomic levels (phylum and class) is summarized in Figure 4. At the phylum level, OTUs related to Proteobacteria and Firmicutes dominated both treatments analyzed, with relative abundances of 75/65 and 18/27% in each treatment (+/−N), respectively. In addition, Actinobacteria and Bacteroidetes OTUs accounted for approximately 2% of the relative bacterial abundances in both treatments.

An analysis at the class level showed that in the +N treatment, Gammaproteobacteria was the most abundant class (45%), followed by Betaproteobacteria (21%), Bacilli (18%), Alphaproteobacteria (8.9%), and Actinobacteria (2%). The most abundant class in the −N treatment was also

[Figure 4 plots: relative abundance (0 to 100%) for the +N and −N treatments; panel (a) phyla, panel (b) classes.]

Figure 4: Taxonomic composition of the 16S rRNA samples associated with sweet sorghum plants (cv. M81E) grown in the field under different N fertilization levels (+/−N). Relative abundance (over 0.5%) of the bacteria at the level of (a) phylum and (b) class.
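Figure 4 reports relative abundances for taxa above 0.5%. A minimal sketch of that kind of summary follows, assuming per-taxon read counts as input: counts are converted to percentages, and taxa at or below the threshold are pooled into an "Other" category. The pooling rule, the taxon names, and the counts are hypothetical illustrations, not the study's data or the authors' scripts.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int],
                       threshold_pct: float = 0.5) -> Dict[str, float]:
    """Convert read counts to percentages, pooling low-abundance taxa.

    Taxa whose relative abundance is not above threshold_pct are summed
    into a single "Other" entry (an assumed convention).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to summarize")
    summary: Dict[str, float] = {}
    pooled = 0.0
    for taxon, n_reads in counts.items():
        pct = 100.0 * n_reads / total
        if pct > threshold_pct:
            summary[taxon] = pct
        else:
            pooled += pct
    if pooled > 0:
        summary["Other"] = summary.get("Other", 0.0) + pooled
    return summary


# Hypothetical read counts for one treatment.
example_counts = {"taxon_A": 700, "taxon_B": 250, "taxon_C": 47, "taxon_D": 3}

for taxon, pct in relative_abundance(example_counts).items():
    print(f"{taxon}: {pct:.1f}%")
```

Applying the function to each treatment separately gives two percentage profiles that can be compared taxon by taxon.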

Gammaproteobacteria (48%), followed by Bacilli (27%), Alphaproteobacteria (13%), Betaproteobacteria (4.2%), and Actinobacteria (2.4%) (Figure 4).

Analysis of OTUs exhibiting a 10-fold change in abundance between treatments showed that OTUs that were most affected by the N fertilization treatment were members of the Beta- and Gammaproteobacteria, as well as Firmicutes. Within the Betaproteobacteria class, the OTUs associated with the genus Herbaspirillum increased from 0.1 to 5.0% after the N fertilization treatment. For the Gammaproteobacteria, large changes were observed in OTUs related to the genus Erwinia, showing a decrease from 21.6 to 1.1%, while Pseudomonas increased from 1.3 to 17.9% after the N fertilization treatment. Finally, within Firmicutes, OTUs related to the genus Bacillus decreased from 14.8 to 8.6% in response to fertilization with 100 kg N ha⁻¹.

4. Discussion

Understanding the effects of agronomical practices, e.g., chemical fertilization, on plant microbiota is necessary to optimize plant-microbiota communities with the aim of improving agronomical sustainability.

In this work, different culture-independent methods were used to evaluate the impact of chemical N fertilization on the endophytic and diazotrophic-endophytic communities associated with the commercial sweet sorghum cv. M81E grown under field conditions. These methodologies are powerful tools that have greatly contributed to identifying the microbial composition and diversity in a wide range of ecosystems, including the interior of plant tissues [38–41].

Our results demonstrated that N fertilization in the field influenced the structure but not the diversity of the endophytic bacterial community within each organ analyzed (roots and stems). This result agrees with a previous study, in which the structures of sorghum stem and root communities were shown to be significantly different [42]. Moreover, the effects of chemical fertilization on the endophytic bacterial community structure were reported for several grasses, such as Dactylis glomerata, Festuca rubra, and Lolium perenne [43, 44]. Different selective processes act together during the recruitment of bacteria that finally colonize the internal tissues of plants [8]. Thus, the combination of N fertilization and the specific plant organ features (e.g., high sugar content) could be the primary factors that influenced the bacterial endophyte structure observed in this study. Additionally, we observed that the bacterial abundance in the stems of both treatments analyzed was statistically the same and was not influenced by the N fertilization treatment. Nevertheless, the bacterial abundance on roots was increased in response to N fertilization, a case where the diversity was lower but not significantly so. Because root tissues are the primary entry point for bacteria, N fertilization may directly affect the physiological state of roots and the bacteria in the vicinity that can effectively infect the root internal tissues as a consequence [39, 45].

With respect to the bacteria identified, the results showed that the 16S rRNA gene sequences from bands retrieved from DGGE were all related to Proteobacteria, as was previously reported for arable sweet sorghum [46]. Moreover, high-throughput sequencing analysis based on the 16S rRNA gene from the potential endophytic bacterial community also showed that most of the OTUs obtained from both treatments were also related to Proteobacteria. These results agree with those obtained by Maropola et al. [42], who observed that Proteobacteria, in addition to Firmicutes and

Actinobacteria, were the most dominant phyla in both communities analyzed (roots and stems) of sweet sorghum plants.

Interestingly, when Proteobacteria (Alpha and Beta), Actinobacteria, and Firmicutes phyla were analyzed by DGGE, several genera were identified, in agreement with previous studies in which the culturable endophytic community associated with sweet sorghum was analyzed [19, 47, 48]. Moreover, from the DGGE analysis, we showed that the genera Massilia from the class Betaproteobacteria, as well as Bacillus from the phylum Firmicutes, were well represented. Species of both genera are common inhabitants of the inner tissues of various species of plants, where they play an important role in plant protection and growth promotion [4, 8, 14].

With respect to the effect of the chemical fertilization on the composition of the bacterial community, it was interesting to observe that the Betaproteobacteria, as well as members of the class Bacilli, had large shifts in the number of OTUs present after the application of the N fertilizer. OTUs that increased in abundance in the presence of N fertilizer were associated with the genera Pseudomonas and Herbaspirillum, while the OTUs that decreased were associated with the genera Bacillus and Erwinia. The association of representatives of these genera with different plants (including sweet sorghum) has previously been described [14]. Our results support the hypothesis that the physiological state of the plant modulates the bacterial microbiota composition recruiting specific bacteria, a phenomenon that may play a key role in promoting plant health and growth [8, 39, 45]. Further experiments are needed to determine the specific functions of the identified bacterial genera in the microbiota of sweet sorghum.

In addition, we observed that the level of N fertilization was the primary factor affecting the structure of the diazotrophic-endophytic bacterial community, but it did not significantly affect its diversity. In addition, the abundances of the nifH gene in the internal tissues of both analyzed treatments were not significantly different, except for the roots of plants grown without N fertilization. In the latter case, the abundance of the diazotrophic-endophytic bacterial community was higher and less diverse than that in the roots of plants grown with N fertilization. These results also support the hypothesis that under certain conditions, a specific endophytic community is recruited from a pool of opportunistic bacteria present in the soil and that those most competitive can infect and survive within the plant tissue environment [8, 42, 46]. In this study, the absence of N fertilization may have prompted plants to recruit diazotrophic bacteria, which may contribute to the plant growth promotion via the BNF process. This is supported by the observation that positive BNF was detected in the roots of sweet sorghum plants by an acetylene reduction assay (ARA) [48]. Moreover, our results are consistent with previous studies in which the nifH gene abundance of sweet sorghum, maize, and rice treated with different levels of nitrogen fertilizer was studied [49–51]. However, it should be noted that because the present study was based on the extraction of total DNA from the roots and stems of sorghum plants, we cannot assume that the nifH genes were actually active.

The identities of the nifH amplicon sequences retrieved from the DGGE gels belonged to the phyla Cyanobacteria and Proteobacteria. Within the Proteobacteria, as was observed in the 16S rRNA gene analysis, the most abundant class detected was the Gammaproteobacteria. Similar results were obtained when the diversity of the nifH gene pools of sweet sorghum was studied, but in this case, the most abundant classes affected by the chemical N fertilization were Alpha- and Betaproteobacteria [49]. It is interesting to note that within the Gammaproteobacteria, most of the sequences were from the genera Enterobacter and Klebsiella, which are well described as diazotrophic-endophytic plant growth promoters [52, 53] and as being associated with sweet sorghum plants [19, 48]. These results stress the role that these genera may play as PGP diazotrophs in the microbiota associated with sweet sorghum plants.

5. Conclusions

The results obtained in our study showed that the application of N fertilizer affected the structure, abundance, and composition of the endophytic bacterial communities associated with sweet sorghum plants. This effect was pronounced in the roots of both bacterial communities analyzed and may have depended on the physiological state of the plants. Moreover, specific bacterial classes and genera increased or decreased when the fertilizer was applied. The data obtained in this study contribute to a better understanding of the effects of different agronomical practices on the microbiota associated with this important crop, which may help improve its sustainability.

Abbreviations

DGGE: Denaturing gradient gel electrophoresis
PGPB: Plant growth-promoting bacteria
BNF: Biological nitrogen fixation.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported by grants from the Sectorial Energy Fund (Project FSE_2011_1_5911) of the Uruguayan National Agency for Innovation and Research (Agencia Nacional de Investigación e Innovación, ANII) and the Uruguayan Program for the Development of the Basic Sciences (Programa de Desarrollo de las Ciencias Básicas, PEDECIBA). The authors are very grateful to Ing. Agr. Fernando Hackembruch from the Agriculture Department of the Alcoholes Uruguay S.A. (ALUR S.A.)
for the plant material supplied and Dr. Claudia Piccini for her critical reading and inputs to the manuscript.

Supplementary Materials

Table S1: primers used for PCR amplification. Table S2: primers used for Ion Torrent pyrosequencing analysis. Table S3: alpha diversity indices. Statistical analysis of the total endophytic and diazotrophic-endophytic bacterial community associated with sweet sorghum cv. M81E, grown under different N fertilization levels (+/−N). Table S4: identification by NCBI BLASTn of 16S rRNA sequences retrieved from DGGE bands. Table S5: identification by NCBI BLASTx of nifH sequences retrieved from DGGE bands. Figure S1: rarefaction curves of sweet sorghum samples treated with high and low N fertilization levels (+N and −N, respectively).

References

[1] A. Almodares and M. R. Hadi, “Production of bioethanol from sweet sorghum: a review,” African Journal of Agricultural Research, vol. 4, pp. 772–780, 2009.
[2] R. Mendes, P. Garbeva, and J. M. Raaijmakers, “The rhizosphere microbiome: significance of plant beneficial, plant pathogenic, and human pathogenic microorganisms,” FEMS Microbiology Reviews, vol. 37, no. 5, pp. 634–663, 2013.
[3] B. J. E. Schulz and C. J. C. Boyle, “What are endophytes?,” in Microbial Root Endophytes, B. Schulz, C. Boyle, and T. N. Sieber, Eds., Springer, Berlin, Heidelberg, 2006.
[4] G. Santoyo, G. Moreno-Hagelsieb, M. del Carmen Orozco-Mosqueda, and B. R. Glick, “Plant growth-promoting bacterial endophytes,” Microbiological Research, vol. 183, pp. 92–99, 2016.
[5] D. Egamberdieva, S. J. Wirth, V. V. Shurigin, A. Hashem, and E. F. Abd_Allah, “Endophytic bacteria improve plant growth, symbiotic performance of chickpea (Cicer arietinum L.) and induce suppression of root rot caused by Fusarium solani under salt stress,” Frontiers in Microbiology, vol. 8, pp. 1–13, 2017.
[6] R. D. Lally, P. Galbally, A. S. Moreira et al., “Application of endophytic Pseudomonas fluorescens and a bacterial consortium to Brassica napus can increase plant height and biomass under greenhouse and field conditions,” Frontiers in Plant Science, vol. 8, pp. 1–10, 2017.
[7] Y. Liu, L. Cao, H. Tan, and R. Zhang, “Surface display of ACC deaminase on endophytic Enterobacteriaceae strains to increase saline resistance of host rice sprouts by regulating plant ethylene synthesis,” Microbial Cell Factories, vol. 16, no. 1, pp. 214–219, 2017.
[8] P. R. Hardoim, L. S. van Overbeek, and J. D. Elsas, “Properties of bacterial endophytes and their proposed role in plant growth,” Trends in Microbiology, vol. 16, no. 10, pp. 463–471, 2008.
[9] A. V. Sturz, B. R. Christie, and J. Nowak, “Bacterial endophytes: potential role in developing sustainable systems of crop production,” Critical Reviews in Plant Sciences, vol. 19, no. 1, pp. 1–30, 2000.
[10] P. G. Dennis, A. J. Miller, and P. R. Hirsch, “Are root exudates more important than other sources of rhizodeposits in structuring rhizosphere bacterial communities?,” FEMS Microbiology Ecology, vol. 72, no. 3, pp. 313–327, 2010.
[11] P. R. Hardoim, L. S. van Overbeek, G. Berg et al., “The hidden world within plants: ecological and evolutionary considerations for defining functioning of microbial endophytes,” Microbiology and Molecular Biology Reviews, vol. 79, no. 3, pp. 293–320, 2015.
[12] K. Ulrich, A. Ulrich, and D. Ewald, “Diversity of endophytic bacterial communities in poplar grown under field conditions,” FEMS Microbiology Ecology, vol. 63, no. 2, pp. 169–180, 2008.
[13] M. R. R. Coelho, I. E. Marriel, S. N. Jenkins, C. V. Lanyon, L. Seldin, and A. G. O’Donnell, “Molecular detection and quantification of nifH gene sequences in the rhizosphere of sorghum (Sorghum bicolor) sown with two levels of nitrogen fertilizer,” Applied Soil Ecology, vol. 42, no. 1, pp. 48–53, 2009.
[14] M. Rosenblueth and E. Martínez-Romero, “Bacterial endophytes and their interactions with hosts,” Molecular Plant-Microbe Interactions, vol. 19, no. 8, pp. 827–837, 2006.
[15] D. Tilman, “The greening of the green revolution,” Nature, vol. 396, no. 6708, pp. 211-212, 1998.
[16] P. M. Vitousek, J. D. Aber, R. W. Howarth et al., “Human alteration of the global nitrogen cycle: sources and consequences,” Ecological Applications, vol. 7, no. 3, pp. 737–750, 1997.
[17] A. O. Adesemoye and J. W. Kloepper, “Plant-microbes interactions in enhanced fertilizer-use efficiency,” Applied Microbiology and Biotechnology, vol. 85, no. 1, pp. 1–12, 2009.
[18] M. M. Roper and V. V. S. R. Gupta, “Enhancing non-symbiotic N2 fixation in agriculture,” The Open Agriculture Journal, vol. 10, no. 1, pp. 7–27, 2016.
[19] C. Mareque, C. Taulé, M. Beracochea, and F. Battistoni, “Isolation, characterization and plant growth promotion effects of putative bacterial endophytes associated with sweet sorghum (Sorghum bicolor (L) Moench),” Annales de Microbiologie, vol. 65, no. 2, pp. 1057–1067, 2015.
[20] M. K. Chelius and E. W. Triplett, “The diversity of archaea and bacteria in association with the roots of Zea mays L,” Microbial Ecology, vol. 41, no. 3, pp. 252–263, 2001.
[21] F. Poly, L. Ranjard, S. Nazaret, F. Gourbiere, and L. J. Monrozier, “Comparison of nifH gene pools in soils and soil microenvironments with contrasting properties,” Applied and Environmental Microbiology, vol. 67, no. 5, pp. 2255–2262, 2001.
[22] P. Simonet, M. C. Grosjean, A. K. Misra, S. Nazaret, B. Cournoyer, and P. Normand, “Frankia genus-specific characterization by polymerase chain reaction,” Applied and Environmental Microbiology, vol. 57, no. 11, pp. 3278–3286, 1991.
[23] H. Heuer, M. Krsek, P. Baker, K. Smalla, and E. M. Wellington, “Analysis of Actinomycete communities by specific amplification of genes encoding 16S rRNA and gel-electrophoretic separation in denaturing gradients,” Applied and Environmental Microbiology, vol. 63, no. 8, pp. 3233–3241, 1997.
[24] J. K. Brons and J. D. Van Elsas, “Analysis of bacterial communities in soil by use of denaturing gradient gel electrophoresis and clone libraries, as influenced by different reverse primers,” Applied and Environmental Microbiology, vol. 74, no. 9, pp. 2717–2727, 2008.
[25] P. Garbeva, J. A. Van Veen, and J. D. Van Elsas, “Predominant Bacillus spp. in agricultural soil under different management regimes detected via PCR-DGGE,” Microbial Ecology, vol. 45, no. 3, pp. 302–316, 2003.

[26] N. C. M. Gomes, H. Heuer, J. Schönfeld, R. Costa, L. Mendonça-Hagler, and K. Smalla, “Bacterial diversity of the rhizosphere of maize (Zea mays) grown in tropical soil studied by temperature gradient gel electrophoresis,” Plant and Soil, vol. 232, no. 1/2, pp. 167–180, 2001.
[27] H. Heuer, K. Hartung, G. Wieland, I. Kramer, and K. Smalla, “Polynucleotide probes that target a hypervariable region of 16S rRNA genes to identify bacterial isolates corresponding to bands of community fingerprints,” Applied and Environmental Microbiology, vol. 65, no. 3, pp. 1045–1049, 1999.
[28] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.
[29] J. A. Di Rienzo, F. Casanoves, M. G. Balzarini, L. Gonzalez, M. Tablada, and C. W. Robledo, InfoStat Versión, Grupo InfoStat, FCA, Universidad Nacional de Córdoba, Argentina, 2016, http://www.infostat.com.ar.
[30] D. Bulgari, P. Casati, F. A. Quaglino, and P. A. Bianco, “Endophytic bacterial community of grapevine leaves influenced by sampling date and phytoplasma infection process,” BMC Microbiology, vol. 14, no. 1, p. 198, 2014.
[31] T. Pellizzaro Pereira, F. P. do Amaral, P. Dall’Asta, F. C. A. Brod, and A. C. M. Arisi, “Real-time PCR quantification of the plant growth promoting bacteria Herbaspirillum seropedicae strain SmR1 in maize roots,” Molecular Biotechnology, vol. 56, pp. 660–670, 2014.
[32] J. A. Whelan, N. B. Russell, and M. A. Whelan, “A method for the absolute quantification of cDNA using real-time PCR,” Journal of Immunological Methods, vol. 278, no. 1-2, pp. 261–269, 2003.
[33] H. X. Wang, Z. L. Geng, Y. Zeng, and Y. M. Shen, “Enriching plant microbiota for a metagenomic library construction,” Environmental Microbiology, vol. 10, no. 10, pp. 2684–2691, 2008.
[34] S. Jünemann, K. Prior, R. Szczepanowski et al., “Bacterial community shift in treated periodontitis patients revealed by ion torrent 16S rRNA gene amplicon sequencing,” PLoS One, vol. 7, no. 8, article e41606, 2012.
[35] A. H. Paterson, J. E. Bowers, R. Bruggmann et al., “The Sorghum bicolor genome and the diversification of grasses,” Nature, vol. 457, no. 7229, pp. 551–556, 2009.
[36] B. Langmead and S. L. Salzberg, “Fast gapped-read alignment with Bowtie 2,” Nature Methods, vol. 9, no. 4, pp. 357–359, 2012.
[37] V. S. Pylro, L. F. W. Roesch, D. K. Morais, I. M. Clark, P. R. Hirsch, and M. R. Tótola, “Data analysis for 16S microbial profiling from different benchtop sequencing platforms,” Journal of Microbiological Methods, vol. 107, pp. 30–37, 2014.
[38] A. Campisano, L. Antonielli, M. Pancher, S. Yousaf, M. Pindo, and I. Pertot, “Bacterial endophytic communities in the grapevine depend on pest management,” PLoS One, vol. 9, no. 11, article e112763, 2014.
[39] T. F. da Silva, R. E. Vollú, J. M. Marques, J. F. Salles, and L. Seldin, “The bacterial community associated with rose-scented geranium (Pelargonium graveolens) leaves responds to anthracnose symptoms,” Plant and Soil, vol. 414, no. 1-2, pp. 69–79, 2017.
[40] D. Fischer, B. Pfitzner, M. Schmid et al., “Molecular characterisation of the diazotrophic bacterial community in uninoculated and inoculated field-grown sugarcane (Saccharum sp.),” Plant and Soil, vol. 356, no. 1-2, pp. 83–99, 2012.
[41] A. Sessitsch, P. Hardoim, J. Döring et al., “Functional characteristics of an endophyte community colonizing rice roots as revealed by metagenomic analysis,” Molecular Plant-Microbe Interactions, vol. 25, no. 1, pp. 28–36, 2012.
[42] M. K. A. Maropola, J.-B. Ramond, and M. Trindade, “Impact of metagenomic DNA extraction procedures on the identifiable endophytic bacterial diversity in Sorghum bicolor (L. Moench),” Journal of Microbiological Methods, vol. 112, pp. 104–117, 2015.
[43] F. Wemheuer, K. Kaiser, P. Karlovsky, R. Daniel, S. Vidal, and B. Wemheuer, “Bacterial endophyte communities of three agricultural important grass species differ in their response towards management regimes,” Scientific Reports, vol. 7, no. 1, article 40914, 2017.
[44] F. Wemheuer, B. Wemheuer, D. Kretzschmar et al., “Impact of grassland management regimes on bacterial endophyte diversity differs with grass species,” Letters in Applied Microbiology, vol. 62, no. 4, pp. 323–329, 2016.
[45] L. C. Carvalhais, P. G. Dennis, B. Fan et al., “Linking plant nutritional status to plant-microbe interactions,” PLoS One, vol. 8, no. 7, article e68555, 2013.
[46] J.-B. Ramond, F. Tshabuse, C. W. Bopda, D. A. Cowan, and M. I. Tuffin, “Evidence of variability in the structure and recruitment of rhizospheric and endophytic bacterial communities associated with arable sweet sorghum (Sorghum bicolor (L) Moench),” Plant and Soil, vol. 372, no. 1-2, pp. 265–278, 2013.
[47] J. L. Grönemeyer, C. S. Burbano, T. Hurek, and B. Reinhold-Hurek, “Isolation and characterization of root-associated bacteria from agricultural crops in the Kavango region of Namibia,” Plant and Soil, vol. 356, no. 1-2, pp. 67–82, 2012.
[48] W. L. Pedersen, K. Chakrabarty, R. V. Klucas, and A. K. Vidaver, “Nitrogen fixation (acetylene reduction) associated with roots of winter wheat and sorghum in Nebraska,” Applied and Environmental Microbiology, vol. 35, no. 1, pp. 129–135, 1978.
[49] M. R. R. Coelho, M. de Vos, N. P. Carneiro, I. E. Marriel, E. Paiva, and L. Seldin, “Diversity of nifH gene pools in the rhizosphere of two cultivars of sorghum (Sorghum bicolor) treated with contrasting levels of nitrogen fertilizer,” FEMS Microbiology Letters, vol. 279, no. 1, pp. 15–22, 2008.
[50] A. Rodríguez-Blanco, M. Sicardi, and L. Frioni, “Plant genotype and nitrogen fertilization effects on abundance and diversity of diazotrophic bacteria associated with maize (Zea mays L.),” Biology and Fertility of Soils, vol. 51, no. 3, pp. 391–402, 2015.
[51] Z. Tan, T. Hurek, and B. Reinhold-Hurek, “Effect of N-fertilization, plant genotype and environmental conditions on nifH gene pools in roots of rice,” Environmental Microbiology, vol. 5, no. 10, pp. 1009–1015, 2003.
[52] A. L. Iniguez, Y. Dong, and E. W. Triplett, “Nitrogen fixation in wheat provided by Klebsiella pneumoniae 342,” Molecular Plant-Microbe Interactions, vol. 17, no. 10, pp. 1078–1085, 2004.
[53] C. Taulé, A. Castillo, S. Villar, F. Olivares, and F. Battistoni, “Endophytic colonization of sugarcane (Saccharum officinarum) by the novel diazotrophs Shinella sp. UYSO24 and Enterobacter sp. UYSO10,” Plant and Soil, vol. 403, no. 1-2, pp. 403–418, 2016.

Hindawi
International Journal of Genomics
Volume 2018, Article ID 1974151, 17 pages
https://doi.org/10.1155/2018/1974151

Review Article

New Genomic Approaches to Enhance Biomass Degradation by the Industrial Fungus Trichoderma reesei

Renato Graciano de Paula, Amanda Cristina Campos Antoniêto, Liliane Fraga Costa Ribeiro, Cláudia Batista Carraro, Karoline Maria Vieira Nogueira, Douglas Christian Borges Lopes, Alinne Costa Silva, Mariana Taíse Zerbini, Wellington Ramos Pedersoli, Mariana do Nascimento Costa, and Roberto Nascimento Silva

Molecular Biotechnology Laboratory, Department of Biochemistry and Immunology, Ribeirao Preto Medical School (FMRP), University of Sao Paulo, Ribeirao Preto, SP, Brazil

Correspondence should be addressed to Renato Graciano de Paula; [email protected]

Received 21 March 2018; Revised 20 June 2018; Accepted 29 July 2018; Published 24 September 2018

Academic Editor: Raul A. Platero

Copyright © 2018 Renato Graciano de Paula et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The filamentous fungus Trichoderma reesei is one of the most well-studied cellulolytic microorganisms. It is the most important fungus for the industrial production of enzymes for biomass deconstruction and is widely used in the biotechnology industry, mainly in the production of biofuels. Here, we performed an analytic review of the holocellulolytic system presented by T. reesei as well as the transcriptional and signaling mechanisms involved with holocellulase expression in this fungus. We also discuss new perspectives about control of secretion and cellulase expression based on RNA-seq and functional characterization data of T. reesei growth in different carbon sources, which comprise glucose, cellulose, sophorose, and sugarcane bagasse.

1. Trichoderma reesei: Environmental and Lignocellulosic Biomass Degrader

Trichoderma species are ubiquitous and cosmopolitan. They are very efficient colonizers of a variety of habitats and can be found from the tundra to the tropics [1], especially in lignocellulosic material and plant rhizospheres, and this effectiveness is translated by the ability of competently degrading the available substrate and of secreting different enzymes and metabolites used in the process [2–5]. The capability of growing in such a variety of carbon sources is also due to the high and fast capacity of responding to diverse environmental signals, being able to adapt according to that current background and regulate its growth, conidiation, and the production of enzymes and secondary metabolites. These signals may vary from different nutrients found in the milieu to the absence and presence of light, and adjusting to them is crucial for the survival of the microorganism [2, 4, 6–8].

As a result of this versatility, Trichoderma species are very useful in many aspects that range from plant biocontrol [9] to various sorts of industries [10–13], especially for the cellulolytic enzymes produced by them. Among all species from this genus and which are industrially used, Trichoderma reesei is the most studied one regarding lignocellulosic biomass degradation, since it is the main producer of cellulolytic and xylanolytic enzymes [14–17]. The ability of growing in a wide range of carbon sources allows great variability in the production of cellulases, since the gene expression and secretion of enzymes are directly dependent on the different chemical signals produced from the diverse substrates. Considering that the plant biomass, one of the most important and complex substrates used by Trichoderma, is composed of mono-, di-, and polysaccharides, the different sugars may have different levels of induction or repression of cellulase genes. Some of the cellulase inducers are cellulose, β-glucan, xylan, lactose, cellobiose, and sophorose, while glucose is the main repressor carbon source [18]. When T. reesei degrades

Lignin

XYN EAR FER AXE B)

C) XGL Holocellulose MAN GLU B) GAL ARA XYL MON Lignin (a) (b) Hemicellulose

 G Glucose EAR Endoarabinase Fructose GAL Galactosidase CEL6A AA9 AA9 Mannose FER Feruloyl Arabinose ARA Arabinofuronidase SWO EG CEL7A Galactose GLU Glucuronidase Glucuronic acid XGL Xyloglucanase CEL7A Feluric acid AXE Acetylxylanesterase G CEL6A Acetyl moiety EG Endoglucanase Cellobiose CEL6A Cellobiohydrolase II EG XYN Endoxylanase CEL7A Cellobiohydrolase I CEL6A EG CEL7A XYL -Xylosidase G -Glucosidase MON Mannosidase SWO Swollenin MAN Mannanase AA9 LPMO

(c) Cellulose

Figure 1: Global regulation of holocellulase expression in T. reesei. (a) The schematic structure of the lignocellulosic biomass, which is constituted by lignin and holocellulose, composed of hemicellulose and cellulose. All chains are drawn from the reducing (left) to the nonreducing end (right). (b) Enzymes that attack hemicellulose act in synergy in order to efficiently hydrolyze it and promote a more accessible surface area on cellulose, to enhance cellulase activities. (c) The enzymatic degradation of cellulose: EG acts by cleaving in amorphous regions of the chain, while CEL6A and CEL7A cleave at the nonreducing and reducing ends, respectively. The resulting oligosaccharides from this cleavage are then broken into monosaccharides by β-glucosidase, so they become capable of being directly metabolized by the organism. SWO expands the cellulolytic chain to improve cellulase accessibility to it, and AA9 works through bivalent- metallic-ion-dependent oxidative metabolism (based, among others, on [39, 144–146]). the lignocellulosic biomass, cellobiose may be converted into into second-generation bioethanol [13, 27, 28]. However, sophorose by a transglycosylation activity of a β-glucosidase considering that the plant biomass is highly recalcitrant, [19, 20]. The comparison of the genomes of Trichoderma large amounts of enzymes are needed during the hydroly- species, including T. reesei, suggests they have a mycoparasi- sis process, which make the biofuel production economi- tic common ancestral, probably from fungi that degrade cally unfeasible [29–31]. In this context, Trichoderma lignocellulosic material. Considering this, T. 
reesei may reesei may play an important role in decreasing costs for have maintained the mycoparasitic characteristic, which bioethanol production, whereas it is the filamentous fungus allows it to have advantages over other species when com- with the greatest capacity of degrading the lignocellulosic peting for substrate, through the conversion of cellobiose biomass [2, 15, 32]. into sophorose by transglycosylation to be metabolized [21, 22]. Differently from other fungi, in T. reesei, sophorose 2. The Global Analysis of the Enzymatic acts as a very potent cellulase inducer in very low concentra- Repertoire of Trichoderma reesei tions, being able to induce the expression of some xylanases as well [23, 24]. The filamentous fungus T. reesei obtains energy through the With the increasing concern about the environmental degradation of the lignocellulosic biomass, composed spe- disadvantages that fossil fuels and nuclear energy represent cially of the cellulose polymer [2, 33, 34]. The cellulolytic nowadays, there has been considerable pursuit for new ways enzymes synergistically act to degrade the cellulose polymer. of generating large-scale renewable green energy. One of Regarding the site of action, these enzymes are classified in at the currently most promising possibilities is the usage of least three large groups: the endoglucanases, which cleave the lignocellulosic material from agricultural and industrial internal bonds of the cellulose fiber; the exoglucanases, which waste [25, 26]. This material is mainly composed of cellulose act in the external region of the cellulose chain; and β-gluco- and hemicellulose, which suffer enzymatic hydrolysis and are sidases, which hydrolyze soluble oligosaccharides into glu- converted into simple carbohydrate monomers and finally cose molecules [2, 35] (Figure 1). 
Recently, LPMOs (lytic polysaccharide monooxygenases) have been suggested to make the cellulose polymer more accessible to the action of traditional cellulases through an oxidative mechanism [36]. In addition to the LPMOs, a protein known as swollenin also participates in the deconstruction of the plant cell wall by breaking the hydrogen bonds between the cellulose microfibrils without hydrolyzing them, thus increasing the biomass degradation efficiency [37] (Figure 1).

In 2012, Häkkinen and coworkers performed a reannotation of all genes encoding enzymes belonging to CAZy (Carbohydrate-Active Enzymes, http://www.cazy.org) in T. reesei and identified 201 genes of glycosyl hydrolases, 22 carbohydrate esterases, and 5 genes of polysaccharide lyases [38]. The cellulases produced by T. reesei belong to six GH families: endo-β-1,4-D-glucanases are found in the GH5, GH7, GH12, and GH45 families, the exoglucanases in the GH6 and GH7 families, and the β-glucosidases in the GH3 family [39]. Cel7a is the dominant enzyme of the cellulolytic complex, comprising about 60% of the total proteins secreted by T. reesei, followed by Cel6a (20%) and then the endoglucanases, mostly Cel7b (10%). The β-glucosidases are present in smaller amounts and represent only 1% of the total proteins secreted by this species [40–42].

In 2014, dos Santos Castro and coworkers showed that when T. reesei is grown in cellulase-inducing carbon sources, such as cellulose and sophorose, CAZyme-encoding genes are highly transcriptionally induced. An opposite profile is observed during growth in glucose, in which most of the genes encoding GH families had low expression. In this study, 61 genes encoding CAZymes were highly expressed in the presence of cellulose in comparison to glucose [21]. Among these, the main upregulated gene was the copper-dependent polysaccharide monooxygenase cel61b (ID 120961), which was almost 3 thousand-fold more expressed in cellulose than in glucose. The endoglucanase cel12a (ID 123232) was the second most upregulated gene in this condition, followed by a β-mannanase. In sophorose, 58 genes were transcriptionally induced when compared to glucose. The most expressed gene in this condition was the endoglucanase cel12a (ID 123232), as also observed in cellulose. The two cellobiohydrolases of T. reesei, cel7a (ID 123989) and cel6a (ID 72567), ranked second and third among the most expressed genes in sophorose [21] (Figure 2). In a study by de Paula [43], 98 CAZyme-encoding genes were upregulated during growth in sugarcane bagasse compared to glycerol in the T. reesei QM6a strain. The endoglucanase cel12a was the third most expressed gene in this condition, similar to what dos Santos Castro and coworkers [21] found in the presence of cellulose. The two CAZyme genes most expressed in sugarcane bagasse encode a mannanase from the GH76 family (ID 122495) and the hemicellulase xyn3 (ID 120229). Similar to what dos Santos Castro and coworkers found in sophorose [21], the cellobiohydrolase cel7a was also highly expressed in sugarcane bagasse, achieving 326-fold higher expression in sugarcane bagasse than in glycerol (ID 122495) (Figure 2).

[Figure 2 image: heatmap with a Log2FoldChange color scale from −10 to 10. Condition labels: cellulose, sophorose, glucose, and sugarcane bagasse. Strain labels: QM6a, QM9414, Δcre1, and Δxyr1. Gene labels include cel6a, cel7a, cel61b (AA9), cel12a (endoglucanase), a β-mannanase, an endo-β-1,4-xylanase, and candidate α-xylosidase/α-glucosidase, α,α-trehalase, β-1,3-glucanase, and endo-1,3-β-glucanase genes.]

Figure 2: Heatmap of the top CAZyme-encoding genes differentially expressed in T. reesei strains grown in cellulose, sophorose, glucose, and sugarcane bagasse. Gene expression results were transformed into Log2FoldChange values and used to construct the heat map with GraphPad Prism version 7 (https://www.graphpad.com/).

CAZyme gene expression is regulated at the transcriptional level by transcription factors such as XYR1 and CRE1 (for more details, see Transcriptional Regulation of Biomass Degradation). In 2016, dos Santos Castro and coworkers showed that the XYR1 regulator positively regulates the expression of at least 61 CAZyme genes in cellulose and 46 genes in sophorose [35]. The genes encoding a candidate α-xylosidase/α-glucosidase (ID 69944), an endo-β-1,4-xylanase (ID 120229), and a candidate copper-dependent polysaccharide monooxygenase AA9 (ID 120961) were the major XYR1 targets in the presence of cellulose. All of them were more than 500-fold less expressed in the Δxyr1 mutant compared to the parental QM9414, and the first two genes were also modulated by XYR1 in the presence of sophorose. However, none of these genes are under CRE1-mediated carbon catabolic repression (CCR) [44, 45] (Figure 2).

CRE1-mediated CCR was also evaluated regarding the CAZymes of T. reesei by Antoniêto and coworkers [44, 45]. In these studies, the most evident CCR occurred during growth of T. reesei in glucose [45]. In this condition, several genes encoding CAZymes were repressed by CRE1. A gene encoding a candidate endo-1,3-β-glucanase (ID 73256) was the most repressed by CRE1, being more than 2 thousand-fold more expressed in the Δcre1 mutant compared to QM9414, followed by another candidate β-1,3-glucanase (ID 56418) and a candidate α,α-trehalase (ID 123456) (Figure 2). The trehalase gene was also under positive regulation by XYR1 [35]. Modulation of the trehalase gene by the XYR1 and CRE1 regulators may be a strategy adopted by T. reesei to avoid the unnecessary use of the energy stock when a readily available carbon source is in the culture medium, since this disaccharide acts as an energetic reserve in fungi [44]. Taken together, these results showed that, although the cellulase gene expression profile is similar during T. reesei growth in inducing carbon sources such as cellulose, sophorose, and sugarcane bagasse, the mechanisms employed by XYR1 and CRE1 to control the expression of specific genes depend on the carbon source available in the environment (Figure 2).

3. Involvement of Transporters during the Lignocellulosic Biomass Degradation Process in Trichoderma reesei

The process of lignocellulosic biomass utilization involves the capacity of T. reesei to sense the insoluble cellulose in the environment and initiate the rapid production of the enzymatic machinery required to break down cellulose and supply nutrients for its growth [46–48]. Appropriate sensing of the extracellular insoluble cellulose is key to initiating the rapid synthesis of cellulases by this fungus, and the uptake of soluble sugars released from biomass hydrolysis denotes a potential point of control in the induced cascade [48]. In this context, transporter proteins have an important role during the biomass degradation process. Transport systems act in the sensing and uptake of essential nutrients and ions and allow the excretion of end products of metabolism and toxic substances. These transporters are also involved in communication between cells and the environment [49, 50]. It is possible that organisms sense cellulose through the recognition of sugars by a transporter in the membrane. In fact, two MFS sugar transporters, Stp1 and Crt1, were implicated in cellulose sensing and cellulase induction in T. reesei [48]. Transporter proteins can carry different small-molecule inducers from the extracellular environment into the fungus, influencing the expression of CAZyme-encoding genes [51–53]. Approximately 5% (459 genes) of the T. reesei genome comprises genes that encode proteins involved in transport [21]. Among these, the largest group of identified transporters belongs to the major facilitator superfamily (MFS), a class of sugar transporters, followed by the ABC (ATP binding cassette) transporters. These two families of transporter proteins have been the most intensively studied among the fungal transporters [34, 54].

The genomes of filamentous fungi encode large numbers of MFS transporters [55, 56]. These proteins can transport a broad variety of substrates and are divided into 17 distinct families, among which three families (1, 5, and 7) are involved in sugar transport into the cell. These transporters carry out the transport of carbon sources, including hexose and pentose sugars, and of small soluble molecules in response to ion gradients [50, 57, 58]. In filamentous fungi, sugar transporters that belong to the MFS permease family are able to recognize and carry more than one type of sugar into the fungal cell. For example, the T. reesei STP1 transporter is involved in glucose and cellobiose uptake [48], and the Aspergillus nidulans transporter XtrD was shown to be able to transport xylose, glucose, and several other monosaccharides [59].

Despite advances in studies about the involvement of MFS sugar transporters during biomass degradation, very few sugar transporters have been functionally characterized in T. reesei [48, 60, 61]. A deep transcriptomic and proteomic study investigating the molecular basis for lignocellulose-degrading enzyme production in T. reesei during growth in cellulose, sophorose, and glucose revealed new components involved in cellulose degradation, including transporters, and the MFS permease family was the most represented in the different carbon sources evaluated [21]. In this study, the MFS permeases differentially expressed in the three carbon sources were analyzed. The genes encoding the MFS permeases ID 69957 and ID 76800, targets of light signaling in Trichoderma, were highly expressed in cellulose when compared to glucose and sophorose. In sophorose, a gene encoding an MFS maltose permease (ID 48444) was highly induced, while the MFS permease ID 76641 was expressed at a higher level in glucose than in sophorose or cellulose [21]. Conversely, cultures with cellulose or sophorose promoted the expression of several MFS permeases at similar levels, including crt1 (ID 3405), which is required for cellulase induction by cellulose and lactose and also mediates the cellulose sensing process in T. reesei [48]; Hxt1, a glucose permease [60]; Str1 (ID 50894), a xylose transporter [61]; and MFS ID 79202, which is critical for cellulase production in lactose cultures, although it is related to a sucrose transporter. In addition, stp1 exhibited a higher level of expression in sophorose than in cellulose, although it is involved in cellobiose and glucose transport [21].

In addition, in T. reesei, the global induction of genes in response to exposure to wheat straw showed that MFS transporters were highly transcribed in straw and repressed in glucose-rich conditions. The genes encoding MFS ID 3405 (crt1), ID 50894 (str1), and ID 69957 were upregulated in the presence of straw and downregulated in the presence of glucose [62]. Besides, during cultivation of the T. reesei RUT C-30 strain in sugarcane bagasse, the genes encoding the MFS permeases crt1 and str1 were strongly induced by this carbon source at all the time points analyzed (6, 12, and 24 hours) [63]. The induction of MFS transporters may be controlled by endogenous regulatory systems over a range of in vivo conditions. In T. reesei, the modulation of transporter expression was shown to be carbon source-dependent [21, 35, 44]. A global transcriptome analysis of the Δxyr1 T. reesei mutant strain and the parental strain QM9414 grown in the presence of cellulose, sophorose, and glucose as sole carbon sources showed that genes encoding proteins belonging to the MFS permease superfamily are the genes most modulated by XYR1 (Figure 3). The genes that encode the MFS permeases ID 69957, ID 44175, ID 54632, ID 79202, and ID 50894 and crt1 were shown to be strongly repressed by XYR1 in the presence of cellulose [35].

[Figure 3 image: (a) schematic of the induction condition (cellulose, XYR1) and the repression condition (glucose, CRE1), showing extracellular signaling through the MFS permeases ID 44175, ID 68122, ID 69834, ID 60945, ID 48444, ID 54632, ID 50894, ID 79202, ID 22912, and ID 69957 to the intracellular regulators XYR1, ACE3, HAP2/3/5, BGL1, ACE2, CRE1, and ACE1 and to cellulase expression; (b) heatmap of the same ten MFS permease genes for the strains QM9414, Δxyr1, Δcre1, and QM6a grown in cellulose, sophorose, glucose, and sugarcane bagasse.]

Figure 3: Sugar transporters potentially associated with the degradation of biomass in T. reesei. (a) Predicted model of XYR1- and CRE1-mediated MFS permease regulation in T. reesei under the cellulose-inducing condition and the glucose-repressing condition. (b) Heatmap of the top MFS transporter-encoding genes differentially expressed in T. reesei strains grown in cellulose, sophorose, glucose, and sugarcane bagasse. Gene expression results were transformed into Log2FoldChange values and used to construct the heat map with GraphPad Prism version 7 (https://www.graphpad.com/).
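The Log2FoldChange transformation mentioned in the captions of Figures 2 and 3 can be illustrated with a minimal Python sketch. The reviewed studies do not describe their exact computation here, so the function name `log2_fold_change`, the optional pseudocount, and the gene names and expression values below are all hypothetical and serve only to show how a ratio between a test condition and a reference condition becomes a signed, symmetric value suitable for a heat map.

```python
import math


def log2_fold_change(condition: float, reference: float, pseudocount: float = 0.0) -> float:
    """Return log2(condition / reference).

    The pseudocount is an optional, assumed safeguard against zero
    expression values; it is not taken from the reviewed studies.
    """
    numerator = condition + pseudocount
    denominator = reference + pseudocount
    if numerator <= 0 or denominator <= 0:
        raise ValueError("expression values must be positive (consider a pseudocount)")
    return math.log2(numerator / denominator)


# Hypothetical normalized expression values, NOT data from the reviewed studies.
expression = {
    "gene_A": {"cellulose": 800.0, "glucose": 100.0},  # 8-fold higher in cellulose
    "gene_B": {"cellulose": 50.0, "glucose": 200.0},   # 4-fold lower in cellulose
    "gene_C": {"cellulose": 120.0, "glucose": 120.0},  # unchanged
}

# One Log2FoldChange value per gene, with glucose as the reference condition.
log2fc = {
    gene: log2_fold_change(values["cellulose"], values["glucose"])
    for gene, values in expression.items()
}

for gene, value in log2fc.items():
    print(f"{gene}\t{value:+.2f}")
```

On this scale, induction and repression of the same magnitude are symmetric around zero (an 8-fold increase gives +3 and an 8-fold decrease gives −3). For orientation, a fold change of about 3 thousand, such as the one reported above for cel61b in cellulose versus glucose, corresponds to a Log2FoldChange of roughly 11.6.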

Furthermore, it is possible that the protein transporters are involved in carbon catabolic repression (CCR) in T. reesei. In a comparison between the Δcre1 T. reesei mutant strain and the parental strain QM9414, grown in cellulose, glucose, and sophorose, the main genes marked by CRE1-mediated CCR encode proteins belonging to the MFS permease superfamily. Most genes of MFS permeases were upregulated by CRE1 in the presence of glucose. Among these, the expression of the maltose permease gene (ID 48444) and three other MFS permease genes (ID 60945, ID 79202, and ID 44174) was higher in the Δcre1 mutant than in the parental strain [45] (Figure 3). In sophorose, the gene encoding a xylose transporter (ID 104072) was highly repressed by CRE1. During the transcriptional analysis of T. reesei grown in wheat straw and glucose, this gene was strongly transcribed in wheat straw and repressed in glucose [62].

Furthermore, a large number of ABC transporters are encoded in T. reesei [34]. Generally, ABC proteins are integral to the membrane, acting as ATP-driven transporters for several substrates, including lipids, sugars, drugs, heavy metals, and auxin [64]. In fungi, ABC transporters are involved in the secretion of secondary metabolites, resistance to toxic compounds, and cell signaling [63]. Although the ABC transporters have been shown to be important for different processes in fungi, the role of the transporters belonging to the ABC family is still unclear in T. reesei. In this fungus, the ABC transporters showed carbon source-dependent transcriptional regulation and were upregulated in cellulose and sophorose [35]. In cellulose, the ABC transporter ID 76682 was highly upregulated, while the gene encoding the ABC transporter ID 80028 was upregulated in sophorose [21]. Additionally, in T. reesei, it was demonstrated that the transport of molecules by ABC transporters is more highly induced by wheat straw and lactose than by glucose [53, 65].

The ABC transporters were shown to be modulated by transcription factors such as XYR1 and CRE1. In the Δxyr1 T. reesei mutant strain, the genes encoding the ABC transporters ID 55814, ID 60116 (MRP-type ABC transporter), and ID 120114 were downregulated in the presence of cellulose, sophorose, and glucose, respectively. In contrast, the gene encoding the ABC transporter ID 58899 (MDR-type ABC transporter) was upregulated in the presence of cellulose [35]. In the Δcre1 mutant strain, the gene encoding the ABC transporter ID 76682 (PDR-type ABC transporter) was upregulated in the presence of cellulose [45], while in sophorose, the major CRE1-repressed gene was ID 76682. However, four other ABC transporter genes (ID 82105, 123293, 73924, and 58899) were repressed by CRE1 in the presence of this carbon source [44]. Most of the identified ABC transporters are correlated with the multidrug resistance (MDR), pleiotropic drug resistance (PDR), and multidrug resistance-related protein (MRP) ABC protein subfamilies [66]. Although the ABC transporters contribute to multidrug resistance in microbial pathogens and tumor cells, in fungi, these transporters have been scarcely studied [34]. In other fungi, the ABC transporters are also related to mycoparasitic interaction and antifungal resistance [67, 68], and, interestingly, several studies have shown the involvement of ABC transporters in sugar transport [69]. For example, Sulfolobus solfataricus (an extreme thermoacidophilic archaeon) uses several sugars as the sole carbon and energy source. This sugar transport is mediated by two families of binding protein-dependent ABC transporters that may transport different sugars such as arabinose, cellobiose, maltose, and trehalose [70]. In Corynebacterium glutamicum, Watanabe et al. [71] described the functional characterization of a xyloside ABC transporter and an enhanced uptake of xylooligosaccharides in the presence of a functional xylEFG-encoded xyloside ABC transporter. In Pyrococcus furiosus, cellobiose uptake involves an inducible ABC transporter system that binds not only cellobiose but also cellotriose, cellotetraose, cellopentaose, laminaribiose, laminaritriose, and sophorose [72].

As exposed here, both MFS transporters and ABC transporters are suggested to be involved in the process of biomass degradation in T. reesei. The novel transporters identified offer new perspectives for studies about the involvement of protein transporters in the expression and secretion of cellulases for the degradation of lignocellulosic biomass. New proteins might thus be revealed as participants in the process of sensing and transducing signals related to biomass deconstruction, supporting future strain improvement for cellulase production. The characterization of these transporters may allow the construction of strains that degrade plant biomass more efficiently and contribute to the implementation of T. reesei in the bioethanol industry.

4. Transcriptional Regulation of Biomass Degradation

The regulation of holocellulose degradation by the fungus T. reesei is a highly coordinated phenomenon dependent on the carbon sources available in the medium. In the presence of readily metabolizable carbon sources, such as glucose, the fungus represses the genes accountable for the expression of cellulolytic enzymes as a way of saving energy, while in the presence of inductive carbon sources, such as cellobiose and sophorose, the fungus activates cellulase production. At the transcriptional level, transcription factors are the key proteins for regulating the expression of genes that act on the hydrolysis of the cellulose polymer [32, 73–75]. In T. reesei, ten transcription factors involved in the regulation of this process have been identified so far. They are the positive regulators XYR1, ACEII, ACEIII, LAE1, VEL1, BglR, and the HAP2/3/5 complex, as well as the repressors ACEI, RCE1, and CRE1 (Figure 4).

[Figure 4 image: schematic showing the sugars sophorose, cellobiose, and glucose; the regulators XYR1, VEL1, ACE3, LAE1, BglR, ACE2, HAP2/3/5, ACE1, RCE1, and CRE1; other, unidentified TFs (marked "?"); and their targets (holocellulases, transporters, and TFs).]

Figure 4: Overview of the transcriptional regulation of biomass degradation. The regulation of cellulose deconstruction involves at least 10 transcription factors: the positive regulators XYR1, ACEII, ACEIII, LAE1, VEL1, BglR, and the HAP2/3/5 complex, as well as the repressors ACEI, RCE1, and CRE1. However, several transcription factors that have not yet been identified are potentially involved in this mechanism.

In T. reesei, XYR1 is the master positive regulator of cellulase production, and deletion of this transcription factor totally eliminates cellulase gene expression and also impairs the induction of hemicellulolytic genes involved in the degradation of xylans and arabinans [73, 76–78]. In 2016, dos Santos Castro and coworkers showed that XYR1 mainly regulates genes belonging to the CAZyme family, in a carbon source-dependent manner [35]. In this work, two xylosidases were the main downregulated genes in the Δxyr1 mutant in the presence of cellulose (ID 69944; 1260-fold) and sophorose (ID 121127; 209-fold). In addition to CAZymes, transcription factors and transporters were also regulated by XYR1. Besides XYR1, the deletion of the aceII transcription factor causes a decrease in the transcription levels of most cellulases and significantly reduces the cellulolytic activity when the fungus is grown in a cellulose-containing medium [79]. Furthermore, the ACEIII transcription factor was more recently discovered, and its deletion was detrimental to the production of cellulases and xylanases in T. reesei [80]. It is believed that the HAP2/3/5 complex promotes the formation of an open chromatin structure, which is necessary for the activation of the transcription process [81]. In addition, the BglR transcription factor acts as a positive regulator of β-glucosidase-specific genes [82]. Deletion of lae1 reduces the production of cellulases, xylanases, and the auxiliary factors CIP1 and swollenin [83]. Finally, deletion of vel1 completely abolishes the expression of cellulases, xylanases, and xyr1 genes in the presence of lactose. Interestingly, studies have shown that LAE1 and VEL1 act in combination to regulate genes involved in biomass degradation [83, 84].

Among the negative regulators of cellulase gene transcription, the main transcription factor is the CRE1 catabolic repressor [75]. In 2014, Antoniêto et al. showed that, in addition to traditional cellulases, CRE1 deletion affects the expression of genes involved in nutrient transport, other transcription factors, and oxidative metabolism. In the presence of cellulose, this regulator represses genes encoding copper transporters and ferric reductase enzymes and, consequently, inhibits the access of the traditional cellulases to the cellulose polymer. In the presence of glucose, CRE1 acts by suppressing the expression of genes related to the entry of the inducers into the cell [45]. It was also shown that the transcription factor xyr1 (ID 122208) is the main regulator targeted by CRE1 in the presence of glucose, being almost 40 times more expressed in the Δcre1 mutant when compared to the parental QM9414 [45]. In sophorose, CRE1 mainly modulates CAZymes and membrane permeases, including maltose permeases that possibly act in transporting the disaccharide sophorose [44]. Regarding the other negative regulators of cellulose degradation, studies have shown that ACEI represses the expression of the major cellulolytic genes (cbh1, cbh2, egl1, and egl2) and xylanases (xyn1 and xyn2) under the cellulose- and sophorose-inducing conditions [85]. In 2017, Cao and coworkers identified a new transcription factor, Rce1, which acts as a repressor of cellulase gene expression, antagonizing XYR1 by binding to the cbh1 promoter [86].

Despite the numerous studies about the transcription factors already identified in T. reesei, several regulatory proteins involved in the regulation of lignocellulosic biomass deconstruction have not been characterized yet (Figure 4). In a study by dos Santos Castro and coworkers [21], the expression of 7 genes encoding transcription factors was affected during growth in cellulose. Likewise, 18 genes were also modulated in the presence of sophorose and glucose. In sugarcane bagasse, 88 transcription factors were modulated in the QM6a strain compared to growth in glycerol [43]. The expression of several genes encoding transcription factors was also under the regulation of XYR1 and CRE1. During cultivation in cellulose and glucose, 14 transcription factors were targeted by the CRE1-mediated mechanism of carbon catabolic repression (CCR) [45]. In sophorose, 8 regulators were also regulated by CRE1 [44]. The main positive regulator of cellulase production, XYR1, regulates the expression of 31 other transcription factors in the presence of cellulose, 17 in sophorose, and 7 in glucose [35]. In all these studies, most of the modulated transcription factors have not been characterized yet, which highlights the importance of further exploring the regulatory proteins involved in the degradation of biomass.

5. Epigenetic Regulation of Holocellulase Expression

Another important mechanism of gene transcriptional regulation involves chromatin modification, a phenomenon that includes the modification of the histones, which are the proteins responsible for DNA packaging. In T. reesei, the evidence of nucleosome rearrangement in the promoter region of cel6a and cel7a suggests that chromatin remodeling is necessary to promote cellulolytic enzyme expression [81, 87]. Interestingly, deleting the methyltransferase LAE1 impairs cellulase expression, while overexpressing this protein increases cellulase expression and changes the H3K4 methylation pattern in the promoter region of cel5b [83]. This mechanism of regulation was also demonstrated in M. oryzae, in which deleting the methyltransferase MoSET1 decreased cellulase induction [88]. Likewise, acetyltransferases are important enzymes involved in chromatin remodeling. In T. reesei, acetyltransferases belonging to the GCN5 family are crucial for cellulase expression [89]. During growth of T. reesei QM6a in sugarcane bagasse, de Paula [43] showed that at least 9 genes related to chromatin remodeling were upregulated and 8 were downregulated in comparison to glycerol. Of these genes, the acetyltransferase SidF (ID 82628) was almost 40 times more expressed in sugarcane bagasse, reaching the top of the list of genes modulated by this carbon source [43]. Also, dos Santos Castro and coworkers showed that two genes encoding SWI-SNF chromatin-remodeling complex proteins (ID 123327 and ID 122943) were highly expressed in the presence of cellulose and sophorose [21]. The latter gene was a target of CCR by CRE1 under the cellulose condition [45]. All these findings reinforce the evidence for the involvement of chromatin modifications in the mechanism of biomass degradation by the filamentous fungus T. reesei.

Genetic engineering has become an important way of improving the production of holocellulolytic enzymes. The modification of gene promoters has also been extensively studied. Recently, Hirasawa and coworkers modified the promoter region of xyn3 by using the xyn1 cis-acting region and obtained improved enzyme expression [90]. A study by Zou and coworkers showed that replacing the CRE1 binding motifs in the promoter region of cbh1 with the binding motifs of the positive regulators ACEII and the HAP2/3/5 complex improved promoter efficiency [91]. In addition, Uzbas and coworkers obtained a Δxyr1 strain able to secrete the cellulases cel7a, cel7b, and cel12a under the control of the promoter regions of two highly expressed genes, tef1 and cdna1, in a glucose-containing medium [92]. Similarly, CCR in the presence of glucose was eliminated by the deletion of CRE1 binding sites and the insertion of multiple copies of positive regulator-binding sites in the promoter region of cbh1, which also increased heterologous gene expression in T. reesei [93]. These reports highlight the importance of investigating the transcription factors involved in the regulation of biomass degradation and the mechanisms through which these regulators interact with the architectural framework of the DNA in order to improve holocellulase production.

6. Signaling Pathways Involved in Cellulose, Sophorose, Glucose, and Sugarcane Bagasse Recognition

Sensing the changing environment is important for both fungal survival and competition. Fungi possess mechanisms for sensing signals and reacting to them [34, 94, 95]. Right after the perception of environmental signals, a complex network can be initiated, which is responsible for integrating all signals and promoting a gene response suited to the environmental conditions [34]. In T. reesei, different signaling pathways have been described to be involved with fungal development and environment sensing. Among these, the signaling pathways dependent on G protein, cAMP, Ras-GTPases, protein kinases, phosphatases, calcium, and MAPK are the best characterized intracellular pathways, although some aspects remain unclear [95–104]. Several studies showed the involvement of Gα proteins in the regulation of cellulase gene transcription by light [105, 106]. A functional characterization of a GPCR-encoding gene of Trichoderma atroviride showed its role in vegetative growth and conidiation [107] and in the expression of chitinase-encoding genes [108]. Recently, it was demonstrated that two G proteins, Gβ and Gγ, act along with a class I phosducin protein in controlling the expression of glycoside hydrolase genes [109]. The cAMP pathway is a highly conserved signaling cascade in which cAMP acts as a secondary messenger promoting the integration of different pathways [99]. In T. reesei, cAMP levels control cellulase gene expression [110], and according to Nogueira et al. [111], this regulation is carbon source-dependent. Cellulase gene expression can also be regulated by the dynamics of protein phosphorylation and dephosphorylation involving protein kinases and phosphatases [34], respectively. Schuster et al. [112] demonstrated the role of protein kinase A (PKA) in the regulation of cellulase expression in the presence of light. Additionally, the deletion of a T. reesei casein kinase II promotes the decrease of chitinase expression and the repression of sporulation and glucose metabolism [113]. Similarly, He et al. [114] showed that the deletion of protein kinase EKi1 increases chitinase-encoding gene expression, radial growth, conidiation, and ethanolamine accumulation in the cell wall. MAPK-mediated phosphorylation can regulate important processes in T. reesei. The deletion of the MAPK gene tmk3 induced a decrease in transcription and cellulase production [115]. Furthermore, it was also demonstrated that tmk2 is involved in processes regulating cell wall integrity, sporulation, and cellulase production [116]. Recently, it was shown that MAPK IME2 represses the expression of the three main cellulase genes (cbh1, cbh2, and egl1) in T. reesei and activates the expression of XYR1 and CRE1 [117].

Finally, another interesting signaling cascade involved with the regulation of important processes in fungi is the intracellular calcium-dependent signaling pathway. Calcium homeostasis is essential for organisms, and the intracellular calcium level reflects environmental changes [118, 119]. In T. reesei, calcium-modulated protein (calmodulin) has an important role in xylanase expression [120]. Equally, it was demonstrated that NCS (neuronal calcium sensor-like) interacts with […] and […], regulating the light-dependent signaling pathway involved with the control of cellulase expression [121]. Recent reports have shown the differential expression of genes encoding calcium transporters and other calcium signaling pathway members during cultivation of T. reesei QM9414, Δxyr1, and Δcre1 strains in the presence of cellulose, sophorose, and glucose [21, 35, 44, 45]. Similarly, the cultivation of the T. reesei QM6a strain in sugarcane bagasse, glucose, and glycerol induced the expression of genes encoding Ca2+-ATPases and other calcium signaling transporters [43]. These results suggest the involvement of different signaling pathways controlling cellulase expression in the filamentous fungus T. reesei.

A deep analysis of gene expression in two functional mutant strains of the main transcription factors (XYR1 and CRE1) [35, 44, 45] and of two MAPK-encoding genes showed that different signaling pathways and regulatory mechanisms are involved with the regulation of cellulase expression [43]. The main finding of these studies was that enzyme regulation is carbon source-dependent. The growth of the QM9414 parental strain in the presence of cellulose and sophorose revealed a distinct expression pattern of genes encoding proteins involved with intracellular signaling. Additionally, 50 and 28 genes related to signal transduction were differentially expressed in the presence of cellulose and sophorose, respectively [35]. In cellulose, the most upregulated gene encoded the conidiospore surface protein cmp1 (ID 72379), which was 21-fold more expressed in the parental strain. In contrast, the most upregulated gene in sophorose encoded an unknown protein (ID 73119), which was 8-fold more expressed in the parental strain. Regarding

[Figure 5: two Venn diagrams, panels (a) and (b), comparing differentially expressed signaling pathway genes of QM9414, Δxyr1, and Δcre1 grown in cellulose and sophorose. Region labels in the original include RasGEF, RAS1, RhoGTPase, GPCR, histidine kinases (HHK1; HHK6), Ca2+ signaling, phospholipase C, casein kinase 1, MAPKK, velvet protein, protein phosphatase 2C, and serine/threonine protein kinase.]
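The Venn diagrams of Figure 5 were drawn with Venny 2.1 from four lists of differentially expressed genes. Each region of such a diagram holds the genes found in exactly one combination of lists. A minimal Python sketch of that partition is given below, assuming the lists are available as plain sets of gene IDs. The function name, the list names, and the gene IDs are hypothetical and are not taken from the studies discussed here.

```python
from itertools import combinations


def venn_regions(gene_lists):
    """Map each combination of list names to the genes found in exactly those lists."""
    names = sorted(gene_lists)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Genes shared by every list in this combination ...
            inside = set.intersection(*(gene_lists[n] for n in combo))
            # ... minus genes that also occur in any list outside it.
            outside = set().union(*(gene_lists[n] for n in names if n not in combo))
            exclusive = inside - outside
            if exclusive:
                regions[combo] = exclusive
    return regions


# Hypothetical gene IDs, for illustration only.
example = {
    "QM9414_cellulose": {"g1", "g2", "g3"},
    "QM9414_sophorose": {"g2", "g4"},
    "dxyr1_cellulose": {"g3", "g5"},
    "dxyr1_sophorose": {"g2", "g6"},
}
regions = venn_regions(example)
```

Every gene lands in exactly one region, so the region sizes add up to the number of distinct genes across all lists.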

Figure 5: Expression pattern of differentially expressed signaling pathway genes in T. reesei during cultivation in cellulose and sophorose. (a) Comparative Venn diagram of expressed genes between the QM9414 and Δxyr1 mutant strains in the presence of cellulose and sophorose. (b) Comparative Venn diagram of expressed genes between the QM9414 and Δcre1 mutant strains in the presence of cellulose and sophorose. Venn diagram clustering in both panels was designed using Venny 2.1.

downregulated genes, an unknown protein (ID 65522, 12-fold) was the most repressed in the presence of cellulose. Moreover, the most repressed gene in the presence of sophorose encoded the conidiospore surface protein cmp1 (ID 72379), which was 69-fold less expressed in this condition [35]. Curiously, this gene was the most upregulated one when the QM9414 mutant strain was grown in the presence of cellulose. This result suggests that this protein may have an important role in cellulose recognition, although the regulatory mechanisms involved with the signal transduction remain unclear.

Here, we compared the expression patterns of signaling pathway-encoding genes in two functional mutant strains of the positive regulator XYR1 and the negative regulator CRE1 of cellulase expression in T. reesei. The reported studies have shown that there is a specific pattern of signaling pathways for the recognition of different carbon sources (Figure 5). In the QM9414 parental strain, the RAS-GTPases (RasGEF, RAS1, and RhoGTPase) and histidine kinases (HHK1 and HHK6) are the main activated proteins (Figure 5(a)). Moreover, the ras1 gene (ID 67275) and the hhk6 gene (ID 62751) were 8-fold and 2-fold more expressed in the presence of cellulose, respectively [35]. Zhang et al. [97] showed that ras1 plays important roles in cellular processes such as polarized apical growth, hyphal branch formation, sporulation, and cAMP level adjustment. Additionally, the deletion of the GTPase ras2 modulates the expression of major cellulase genes and transcription factors in cellulose. The growth and protein secretion of T. reesei in cellulose cultures were decreased in the Δrho3 GTPase mutant strain, suggesting that rho3 is involved with secretion processes in this fungus [100].

In the presence of sophorose, we observed distinct expression patterns, with GPCR, phospholipases D and C, and Ca2+ signaling being the main pathways involved with carbon source recognition (Figure 5(a)). The Δxyr1 mutant strain demonstrated a distinct pattern of regulation, and, in the presence of cellulose, we observed the activation of casein kinase 1 and MAPKK signaling pathways (Figure 5(a)). The gene encoding casein kinase 1 (ID 55049) was 2.5-fold more expressed in this condition, and the MAPKK (ID 57513) was increased about 2-fold with cellulose as the carbon source [35]. Wang et al. [113] showed that the casein kinase pathway governs chitinase expression, and beyond that, casein kinase-dependent phosphorylation has been suggested to be an important mechanism of regulation of DNA-binding zinc finger proteins, such as CRE1 [122]. These results suggest that XYR1 is a negative regulator of some genes encoding components of intracellular signaling pathways involved with cellulose recognition, such as the MAPK and casein kinase cascades, this regulation being an additional mechanism of transcriptional regulation of cellulase expression.

The functional Δcre1 mutant strain exhibited a specific expression pattern related to signaling pathways. In the presence of cellulose, the main signaling pathways involved with cellulose sensing were the Ca2+, phospholipase, cAMP, and velvet-dependent signaling pathways (Figure 5(b)). The Ca2+-dependent protein kinase (ID 62181, 5.7-fold) was the most upregulated gene in this condition. Moreover, a cAMP-dependent protein (ID 119614) was 2-fold more expressed in Δcre1 grown in cellulose [45]. The phospholipase D gene (ID 22331) was 3.8-fold more expressed in Δcre1 in the presence of cellulose [45]. Finally, the deletion of cre1 promoted a decrease of 2.3-fold in the expression

[Figure 6: three-set Venn diagram of differentially expressed signaling pathway genes in QM6a, Δtmk1, and Δtmk2 grown in sugarcane bagasse. Region counts in the original: 66 (54.1%), 22 (18%), 25 (20.5%), 2 (1.6%), 1 (0.8%), 1 (0.8%), and 5 (4.1%). Region labels include cAMP phosphodiesterase (PDE1), Rho-like GTPase, Rho2 GTPase, RhoA GTPase, RasGAP, Ca2+ signaling, germinal center kinase, GPCR, PTH11-GPCR, histidine kinase HHK6, phospholipase C, phospholipase D, serine/threonine protein kinase, protein tyrosine phosphatases, and unknown protein.]
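The percentages printed in the Figure 6 Venn diagram can be reproduced by dividing each region count by the total number of genes across all regions. The short Python sketch below uses the seven counts shown in the figure. Which count belongs to which region is not asserted here, except that the text ties the 5-gene (4.1%) region to the Δtmk1 strain. The variable and function names are illustrative.

```python
# Region counts as printed in the Figure 6 Venn diagram (QM6a, Δtmk1, Δtmk2).
# Only the 5-gene region is tied to a strain by the text (Δtmk1, 4.1%).
region_counts = [66, 22, 25, 2, 1, 1, 5]


def region_percentages(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


total_genes = sum(region_counts)
percentages = region_percentages(region_counts)
```

With these counts the total is 122 genes, and the rounded shares match the values printed in the figure.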

Figure 6: Expression pattern of differentially expressed signaling pathway genes in the QM6a parental strain and two functional MAPK mutant strains in the presence of sugarcane bagasse. (a) Comparative Venn diagram of expressed genes between the QM6a, Δtmk1, and Δtmk2 mutant strains in the presence of sugarcane bagasse. Venn diagram clustering was designed using Venny 2.1.

of regulatory protein velvet 1 (ID 122284). Karimi-Aghcheh et al. [84] showed that the deletion of vel1 completely blocked the expression of xylanases, cellulases, and the regulator XYR1 in the presence of lactose. Since vel1 was downregulated with cre1 deletion, this result suggests that CRE1 might indirectly regulate XYR1 expression through a vel1-dependent mechanism, and this may be a still unclear mechanism of CCR regulation in T. reesei.

The growth of T. reesei QM6a in sugarcane bagasse demonstrated a specific expression profile of signaling-encoding genes. In this condition, the expression of genes belonging to cAMP, GTPase, germinal center kinase, calcium, phospholipases, GPCR, protein phosphatases, and PTH11 signaling pathways was observed (Figure 6). Among these, PTH11-encoding genes were an overrepresented category, with eight genes differentially expressed in this condition. The most upregulated PTH11 (ID 69500) was 14.5-fold more expressed in the presence of sugarcane bagasse [43]. In the filamentous fungus Neurospora crassa, PTH11 GPCR seems to play a key, though still unclear, role in cellulose recognition and in the control of holocellulolytic enzyme expression [123]. De Paula [43] showed in a global transcriptional analysis of two functional mutants for the MAPK signaling pathway that there is cross talk among different signaling pathways in the presence of sugarcane bagasse. In the Δtmk2 mutant strain, the main signaling pathways involved with sugarcane bagasse recognition are Rho GTPase, GPCR, histidine kinase, and PTH11 GPCR (Figure 6). Interestingly, PTH11-encoding genes were also overrepresented in the tmk2 deletion when compared to the parental strain [43]. The deletion of tmk1 alters the expression of few genes belonging to signaling pathways. Only 4.1% of all genes related to intracellular signaling were involved with sugarcane bagasse recognition in the Δtmk1 mutant strain. Curiously, Vercoe et al. [124] demonstrated in Ruminococcus flavefaciens, a cellulolytic bacterium, that phosphorylation and dephosphorylation dynamics are important for carbon source recognition and regulation of carbon metabolism. Together, our results suggest that in T. reesei, there may be a mechanism for carbon source recognition involving PTH11 GPCR, and that this process might be regulated by posttranslational events.

The understanding of how signal transduction pathways modulate cellulase gene expression has gained attention as a tool for strain improvement in T. reesei [125]. Adequate environmental perception is a crucial step for the regulation of gene expression. Thereby, changing the transmission of signals in order to adjust enzyme secretion in response to environmental conditions is an advantageous alternative for enhancing the efficiency of genes of interest [18]. Different approaches can be employed to improve the production of enzymes by T. reesei. One of them is the manipulation of the expression of holocellulolytic-encoding genes by increasing the expression of activators and/or decreasing the expression of repressors that control the expression of the cellulose- and hemicellulose-degrading enzymes. Thus, to carry out this task, a deeper knowledge of holocellulase regulation is crucial [23, 35, 45, 76, 126–128].

In order to improve cellulase expression, different signal transduction mechanisms can be exploited. Holocellulolytic gene expression may be regulated by different factors such as light, carbon and
Only 4.1% of nitrogen sources, pH, temperature, inorganic compounds, International Journal of Genomics 11 transcription factors, and epigenetic events [21, 35, 44, 45, 81, sophorose, there were three of unknown function. These 87, 126, 129–133]. These factors and their effects in regulat- results suggested these genes (were upregulated in inducing ing intracellular signaling pathways have been well studied conditions) play an important and so far a neglected role in in Trichoderma. As an example, in T. reesei, light and photo- biomass degradation. receptors BLR1 and BLR2, which belong to light signaling, Some transporters have been reported to be involved are known to regulate expression of cellulase genes [134, with sensing external carbon sources and thus with the 135]. Regarding cbh1 and cbh2, the two main cellobiohydro- induction of CAZymes [48, 61]. Although the importance lases, it has been demonstrated that growth of T. reesei in the of these proteins has been known, specially sugar trans- presence of light promotes an increase of 2-fold in gene porters, for the cellular response to the environment, there expression in the presence of cellulose [134, 135]. The signal are putative transporters that have been differentially integration of light and nutrient signaling was an important expressed [21, 35, 44, 45] and have not yet been character- discovery in the understanding of cellulase gene expression ized. Figure 7(a) summarizes a few genes that have been [105, 106]. The G-protein alpha subunits GNA1 and GNA3 expressed in very different levels depending upon the carbon were described to be involved in light-dependent regulation source. Among them, four copper transporters (ID 52315, ID of cellulase gene expression. Additionally, transcriptional 62716, ID 71029, and ID 108749) had transcriptional levels analysis of the effect of deletion of GNB1 and GNG1 as decreased in the presence of sophorose [21]. 
Supporting this well as the phosducin-like protein PHLP1 showed that gly- result, Bak showed in 2015 that a copper transporter was cosyl hydrolases are the major targets of light-dependent downregulated in Phanerochaete chrysosporium cultivated signaling by heterotrimeric G-proteins [109]. The intracel- in rice straw compared to no carbon source [138]. In 2016, lular level of cAMP directly affects cellulase expression [110, dos Santos Castro showed that the deletion of xyr1 caused 136]. Nogueira et al. [111] showed that intracellular levels of the transcriptional levels of these genes to be upregulated cAMP were 4-fold in the presence of sophorose, and cAMP in the knockout strain compared to the wild type in may regulate secretion of cellulolytic enzymes in T. reesei in sophorose [35]. Deletion of the major transcription factor the presence of this sugar. Interestingly, de Paula [43] showed involved in CCR, cre1, did not result in changes in these that deletion of MAPK gene tmk1 promoted the downregula- four genes [44, 45]. Therefore, these copper transporters tion of the main holocellulolytic genes of T. reesei. These may be involved in biomass degradation since copper is a results suggest that TMK1 is a positive regulator of cellulase required for LPMO activity [139], and the results gene expression and point this gene as a potential target for obtained so far suggest they are regulated (directly or improvement engineering approaches to enhance cellulase indirectly) by XYR1 but not by CRE1. expression in this fungus. Finally, all the knowledge about Another class of proteins related to biomass degradation intracellular signaling pathways and its effects in the cellulase regulation and that is still poorly characterized is transcrip- expression in T. reesei will provide important insights for tion factor (TF). 
A lot of effort have been put into discovering metabolic engineering for strain improvement to be used in new TF that would be important for the regulation of biotechnological industries. expression of genes that is important for an efficient biomass degradation [73, 80, 82, 140]. Antoniêto et al. [44, 45] and dos Santos Castro et al. [21, 35] studied differential expres- 7. The New Players Potentially sion in T. reesei cultivated in different carbon sources. Also, Involved in T. reesei Lignocellulosic they have studied two transcription factors, CRE1 and Biomass Degradation XYR1, and how they affect the transcriptional level of genes related to biomass degradation. Among the transcription In order to survive in different environments, filamentous factors that showed differences in expression, a few unknown fungi must be able to sense the surroundings and respond TF presented particularly high differences of expression to it accordingly. Once the fungus senses the carbon source Figure 7(b). The high expression of these TFs in sugarcane (whether it is sugarcane bagasse, cellulose, or sophorose— bagasse, cellulose, and sophorose suggests that they can play cellulase inducers), a signaling cascade is activated, and it a role on the regulation of biomass degradation and should ends with increasing/decreasing the transcription level of be considered targets for further studies [137]. Particularly, certain genes involved in the degradation of each carbon gene 107641, classified as a transcription factor by KOG, source. However, all this sensing and responding processes was repressed in sugarcane bagasse and induced in sophor- are still not completely characterized. Attempts to solve this ose. Also, the deletion of xyr1 caused this gene (ID 107641) puzzle have been made by analyzing transcriptome results, to be highly expressed when T. reesei was cultivated in and several studies have shown that T. reesei presents genes sophorose, a cellulase inductor. 
Moreover, this TF (ID that code for proteins of unknown function [137]. 107641) was also highly expressed in the knockout of cre1 Adifferential analysis of expressed genes when T. reesei when cultivated in cellulose Figure 7(b). All these results was cultivated in cellulose versus glucose, in sophorose versus combined suggest that this TF (ID 107641) is involved in cellulose, and in sophorose versus glucose revealed that about the regulation of transcription of enzymes related to biomass 35 to 46% of the genes presented unknown function [21]. degradation and that XYR1 and CRE1 supposedly act as Also, when analyzing the top 10 upregulated genes in repressors for the expression of this TF. Several studies have cellulose, five of them corresponded to protein of unknown identified genes encoding putative fungal C6 zinc finger- function [21]. Among the top 10 upregulated genes in type transcription factors enriched among the differential 12 International Journal of Genomics

[Figure 7: heat maps with panel titles "Copper transporters" (a) and "Transcription factors" (b). Rows: cellulose, sophorose, glucose, and sugarcane bagasse for the QM6a, QM9414, Δcre1, and Δxyr1 strains. Columns (gene IDs): 52315, 62716, 71028, 108749 in (a); 121178, 122140, 107641, 123881, 122271, 105520, 121164, 121107, 106051, 123130 in (b). Color scale ticks: −5, 0, 5.]
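Figure 7 reports expression as Log2FoldChange values, whereas the running text quotes plain fold changes such as "21-fold more expressed" or "69-fold less expressed". The relationship between the two can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the pipeline used by the authors. The function name is hypothetical, and the two example values are the cmp1 (ID 72379) fold changes quoted in the text from [35].

```python
import math


def log2_fold_change(fold, direction="up"):
    """Convert an 'N-fold more/less expressed' value to a signed log2 fold change."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    value = math.log2(fold)
    # Upregulation is positive, downregulation is negative on the log2 scale.
    return value if direction == "up" else -value


# cmp1 (ID 72379): 21-fold more expressed in cellulose, 69-fold less in sophorose.
cmp1_cellulose = log2_fold_change(21, "up")
cmp1_sophorose = log2_fold_change(69, "down")
```

On this scale a 2-fold increase corresponds to +1 and a 2-fold decrease to −1, which is why a symmetric color scale centered on 0 is convenient for heat maps.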

Figure 7: Heatmap expression of new players potentially involved with T. reesei biomass degradation. (a) Expression profile of copper transporter-encoding genes differentially expressed in T. reesei strains grown in cellulose, sophorose, glucose, and sugarcane bagasse. (b) Expression profile of transcription factor-encoding genes differentially expressed in T. reesei strains grown in cellulose, sophorose, glucose, and sugarcane bagasse. The gene expression results were transformed into Log2FoldChange values and used for heat map construction with GraphPad Prism version 7 (https://www.graphpad.com/).

analysis [80, 141–143]. However, characterizing them according to the […] and which gene expression each regulates is still necessary in order to gain greater knowledge of biomass degradation regulation. Therefore, the identification of novel transcription factors whose transcription levels are affected by different carbon sources may be the first step toward understanding more deeply how the regulation of transcription occurs. These new findings could then be used for engineering T. reesei in order to obtain a new hyperproducer strain for industrial use in biomass degradation with higher efficiency.

8. Conclusions

The adoption of the filamentous fungus T. reesei as the most important holocellulase producer in biotechnology industries has been established over the years. However, some aspects of holocellulase expression remain unclear. Thus, global transcriptional analyses are an excellent approach to understanding gene expression and promoting the selection of candidate genes for the construction of strains producing high levels of holocellulases for plant cell wall degradation. In this way, knowledge about the mechanisms involved in the recognition of environmental signals, sugar transport, and transcriptional regulatory events involved with fungal adaptation to different conditions is an extremely important step in the comprehension of fungal biology. The integration of all these data might contribute to a better understanding of the regulatory mechanisms during lignocellulosic biomass degradation by T. reesei, facilitating its application in fungal biotechnology.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Authors’ Contributions

Renato Graciano de Paula and Amanda Cristina Campos Antoniêto contributed equally to this work.

Acknowledgments

This work was supported by The State of São Paulo Research Foundation (FAPESP) (Processes 2016/20358-5, 2016/23233-9, and 2017/04206-3).

References

[1] G. J. Samuels, “Trichoderma: a review of biology and systematics of the genus,” Mycological Research, vol. 100, no. 8, pp. 923–935, 1996.
[2] V. K. Gupta, A. S. Steindorff, R. G. de Paula et al., “The post-genomic era of Trichoderma reesei: what’s next?,” Trends in Biotechnology, vol. 34, no. 12, pp. 970–982, 2016.
[3] G. E. Harman, A. H. Herrera-Estrella, B. A. Horwitz, and M. Lorito, “Special issue: Trichoderma – from basic biology to biotechnology,” Microbiology, vol. 158, no. 1, pp. 1-2, 2012.
[4] A. Schuster and M. Schmoll, “Biology and biotechnology of Trichoderma,” Applied Microbiology and Biotechnology, vol. 87, no. 3, pp. 787–799, 2010.

[5] C. P. Kubicek, M. Komon-Zelazowska, and I. S. Druzhinina, “Fungal genus Hypocrea/Trichoderma: from barcodes to biodiversity,” Journal of Zhejiang University Science B, vol. 9, no. 10, pp. 753–763, 2008.
[6] D. Tisch and M. Schmoll, “Targets of light signalling in Trichoderma reesei,” BMC Genomics, vol. 14, no. 1, p. 657, 2013.
[7] A. Schuster, C. P. Kubicek, M. A. Friedl, I. S. Druzhinina, and M. Schmoll, “Impact of light on Hypocrea jecorina and the multiple cellular roles of ENVOY in this process,” BMC Genomics, vol. 8, no. 1, p. 449, 2007.
[8] I. Druzhinina and C. P. Kubicek, “Species concepts and biodiversity in Trichoderma and Hypocrea: from aggregate species to species clusters?,” Journal of Zhejiang University Science B, vol. 6, no. 2, pp. 100–112, 2005.
[9] G. E. Harman, “Overview of mechanisms and uses of Trichoderma spp,” Phytopathology, vol. 96, no. 2, pp. 190–194, 2006.
[10] Y. Galante, A. De Conti, and R. Monteverdi, Application of Trichoderma Enzymes in the Food and Feed Industries, G. E. Harman and C. P. Kubicek, Eds., Taylor and Francis, London, 1998.
[11] Y. Galante, A. De Conti, and R. Monteverdi, Application of Trichoderma Enzymes in the Textile Industry, G. E. Harman and C. P. Kubicek, Eds., Taylor and Francis, London, 1998.
[12] J. Buchert, T. Oksanen, J. Pere, M. Siika-Aho, A. Suurnäkki, and L. Viikari, Applications of Trichoderma reesei Enzymes in the Pulp and Paper Industry, G. E. Harman and C. P. Kubicek, Eds., Taylor and Francis, London, 1998.
[13] S. Pereira, L. Maehara, C. Machado, and C. Farinas, “2G ethanol from the whole sugarcane lignocellulosic biomass,” Biotechnology for Biofuels, vol. 8, no. 1, p. 44, 2015.
[14] R. H. Bischof, J. Ramoni, and B. Seiboth, “Cellulases and beyond: the first 70 years of the enzyme producer Trichoderma reesei,” Microbial Cell Factories, vol. 15, no. 1, p. 106, 2016.
[15] Y. Li, C. Liu, F. Bai, and X. Zhao, “Overproduction of cellulase by Trichoderma reesei RUT C30 through batch-feeding of synthesized low-cost sugar mixture,” Bioresource Technology, vol. 216, pp. 503–510, 2016.
[16] J. Huang, D. Chen, Y. Wei et al., “Direct ethanol production from lignocellulosic sugars and sugarcane bagasse by a recombinant Trichoderma reesei strain HJ48,” The Scientific World Journal, vol. 2014, Article ID 798683, 8 pages, 2014.
[17] S. S. Adav and S. K. Sze, “Trichoderma secretome: an overview,” in Biotechnology and Biology of Trichoderma, V. K. Gupta, M. Schmoll, A. Herrera-Estrella, R. S. Upadhyay, I. Druzhinina, and M. G. Tuohy, Eds., pp. 103–114, Elsevier, Waltham, 1st edition, 2014.
[18] H. Bazafkan, D. Tisch, and M. Schmoll, “Regulation of glycoside hydrolase expression in Trichoderma,” in Biotechnology and Biology of Trichoderma, V. K. Gupta, M. Schmoll, A. Herrera-Estrella, R. Upadhyay, I. Druzhinina, and M. G. Tuohy, Eds., p. 527, Elsevier, 2014.
[19] J. Gao, Y. Qian, Y. Wang, Y. Qu, and Y. Zhong, “Production of the versatile cellulase for cellulose bioconversion and cellulase inducer synthesis by genetic improvement of Trichoderma reesei,” Biotechnology for Biofuels, vol. 10, no. 1, p. 272, 2017.
[20] Q. Zhou, J. Xu, Y. Kou et al., “Differential involvement of β-glucosidases from Hypocrea jecorina in rapid induction of cellulase genes by cellulose and cellobiose,” Eukaryotic Cell, vol. 11, no. 11, pp. 1371–1381, 2012.
[21] L. dos Santos Castro, W. Pedersoli, A. C. Antoniêto et al., “Comparative metabolism of cellulose, sophorose and glucose in Trichoderma reesei using high-throughput genomic and proteomic analyses,” Biotechnology for Biofuels, vol. 7, no. 1, p. 41, 2014.
[22] C. P. Kubicek, A. Herrera-Estrella, V. Seidl-Seiboth et al., “Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma,” Genome Biology, vol. 12, no. 4, article R40, 2011.
[23] L. dos Santos Castro, A. C. C. Antoniêto, W. R. Pedersoli, R. Silva-Rocha, G. F. Persinoti, and R. N. Silva, “Expression pattern of cellulolytic and xylanolytic genes regulated by transcriptional factors XYR1 and CRE1 are affected by carbon source in Trichoderma reesei,” Gene Expression Patterns, vol. 14, no. 2, pp. 88–95, 2014.
[24] E. A. Znameroski, S. T. Coradetti, C. M. Roche et al., “Induction of lignocellulose-degrading enzymes in Neurospora crassa by cellodextrins,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 16, pp. 6012–6017, 2012.
[25] Z. Anwar, M. Gulfraz, and M. Irshad, “Agro-industrial lignocellulosic biomass a key to unlock the future bio-energy: a brief review,” Journal of Radiation Research and Applied Sciences, vol. 7, no. 2, pp. 163–173, 2014.
[26] V. Vandenbossche, J. Brault, G. Vilarem et al., “A new lignocellulosic biomass deconstruction process combining thermo-mechano chemical action and bio-catalytic enzymatic hydrolysis in a twin-screw extruder,” Industrial Crops and Products, vol. 55, pp. 258–266, 2014.
[27] J. K. Saini, R. Saini, and L. Tewari, “Lignocellulosic agriculture wastes as biomass feedstocks for second-generation bioethanol production: concepts and recent developments,” 3 Biotech, vol. 5, no. 4, pp. 337–353, 2015.
[28] C. A. Cardona, J. A. Quintero, and I. C. Paz, “Production of bioethanol from sugarcane bagasse: status and perspectives,” Bioresource Technology, vol. 101, no. 13, pp. 4754–4766, 2010.
[29] A. V. Gusakov, “Alternatives to Trichoderma reesei in biofuel production,” Trends in Biotechnology, vol. 29, no. 9, pp. 419–425, 2011.
[30] L. R. Lynd, W. H. van Zyl, J. E. Mcbride, and M. Laser, “Consolidated bioprocessing of cellulosic biomass: an update,” Current Opinion in Biotechnology, vol. 16, no. 5, pp. 577–583, 2005.
[31] L. R. Lynd, P. J. Weimer, W. H. van Zyl, and I. S. Pretorius, “Microbial cellulose utilization: fundamentals and biotechnology,” Microbiology and Molecular Biology Reviews, vol. 66, no. 3, pp. 506–577, 2002.
[32] Y. Shida, T. Furukawa, and W. Ogasawara, “Deciphering the molecular mechanisms behind cellulase production in Trichoderma reesei, the hyper-cellulolytic filamentous fungus,” Bioscience, Biotechnology, and Biochemistry, vol. 80, no. 9, pp. 1712–1729, 2016.
[33] D. Martinez, R. M. Berka, B. Henrissat et al., “Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina),” Nature Biotechnology, vol. 26, no. 5, pp. 553–560, 2008.
[34] M. Schmoll, C. Dattenböck, N. Carreras-Villaseñor et al., “The genomes of three uneven siblings: footprints of the lifestyles of three Trichoderma species,” Microbiology and Molecular Biology Reviews, vol. 80, no. 1, pp. 205–327, 2016.

[35] L. dos Santos Castro, R. G. de Paula, A. C. C. Antoniêto, G. F. Persinoti, R. Silva-Rocha, and R. N. Silva, “Understanding the role of the master regulator XYR1 in Trichoderma reesei by global transcriptional analysis,” Frontiers in Microbiology, vol. 7, p. 175, 2016.
[36] R. J. Quinlan, M. D. Sweeney, L. Lo Leggio et al., “Insights into the oxidative degradation of cellulose by a copper metalloenzyme that exploits biomass components,” Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 37, pp. 15079–15084, 2011.
[37] M. Saloheimo, M. Paloheimo, S. Hakola et al., “Swollenin, a Trichoderma reesei protein with sequence similarity to the plant expansins, exhibits disruption activity on cellulosic materials,” European Journal of Biochemistry, vol. 269, no. 17, pp. 4202–4211, 2002.
[38] M. Häkkinen, M. Arvas, M. Oja et al., “Re-annotation of the CAZy genes of Trichoderma reesei and transcription in the presence of lignocellulosic substrates,” Microbial Cell Factories, vol. 11, no. 1, p. 134, 2012.
[39] I. S. Druzhinina and C. P. Kubicek, “Genetic engineering of Trichoderma reesei cellulases and their production,” Microbial Biotechnology, vol. 10, no. 6, pp. 1485–1499, 2017.
[40] S. K. Brady, S. Sreelatha, Y. Feng, S. P. S. Chundawat, and M. J. Lang, “Cellobiohydrolase 1 from Trichoderma reesei degrades cellulose in single cellobiose steps,” Nature Communications, vol. 6, no. 1, p. 10149, 2015.
[41] V. K. Gupta, Biotechnology and Biology of Trichoderma, Elsevier, 2014.
[42] J. Strakowska, L. Błaszczyk, and J. Chełkowski, “The significance of cellulolytic enzymes produced by Trichoderma in opportunistic lifestyle of this fungus,” Journal of Basic Microbiology, vol. 54, no. S1, pp. S2–S13, 2014.
[43] R. G. De Paula, Characterization of the MAPK-dependent signaling pathway in the regulation of cellulase expression by the fungus Trichoderma reesei (Hypocrea jecorina), University of Sao Paulo, 2017.
[44] A. C. Campos Antoniêto, R. Graciano de Paula, L. d. Santos Castro, R. Silva-Rocha, G. Felix Persinoti, and R. Nascimento Silva, “Trichoderma reesei CRE1-mediated carbon catabolite repression in response to sophorose through RNA sequencing analysis,” Current Genomics, vol. 17, no. 2, pp. 119–131, 2016.
[45] A. C. C. Antoniêto, L. dos Santos Castro, R. Silva-Rocha, G. F. Persinoti, and R. N. Silva, “Defining the genome-wide role of CRE1 during carbon catabolite repression in Trichoderma reesei using RNA-Seq analysis,” Fungal Genetics and Biology, vol. 73, pp. 93–103, 2014.
[46] C. P. Kubicek, R. Messner, F. Gruber, M. Mandels, and E. M. Kubicek-Pranz, “Triggering of cellulase biosynthesis by cellulose in Trichoderma reesei: involvement of a constitutive, sophorose-inducible, glucose-inhibited β-diglucoside permease,” Journal of Biological Chemistry, vol. 268, no. 26, pp. 19364–19368, 1993.
[49] E. M. Quistgaard, C. Löw, F. Guettou, and P. Nordlund, “Understanding transport by the major facilitator superfamily (MFS): structures pave the way,” Nature Reviews Molecular Cell Biology, vol. 17, no. 2, pp. 123–132, 2016.
[50] S. S. Pao, I. T. Paulsen, and M. H. Saier Jr., “Major facilitator superfamily,” Microbiology and Molecular Biology Reviews, vol. 62, no. 1, pp. 1–34, 1998.
[51] P. Cai, R. Gu, B. Wang et al., “Evidence of a critical role for cellodextrin transporter 2 (CDT-2) in both cellulose and hemicellulose degradation and utilization in Neurospora crassa,” PLoS One, vol. 9, no. 2, article e89330, 2014.
[52] J. d. O. Porciuncula, T. Furukawa, Y. Shida et al., “Identification of major facilitator transporters involved in cellulase production during lactose culture of Trichoderma reesei PC-3-7,” Bioscience, Biotechnology, and Biochemistry, vol. 77, no. 5, pp. 1014–1022, 2014.
[53] C. Ivanova, J. A. Bååth, B. Seiboth, and C. P. Kubicek, “Systems analysis of lactose metabolism in Trichoderma reesei identifies a lactose permease that is essential for cellulase induction,” PLoS One, vol. 8, no. 5, article e62631, 2013.
[54] N. Chaudhary, I. Kumari, P. Sandhu, M. Ahmed, and Y. Akhter, “Proteome scale census of major facilitator superfamily transporters in Trichoderma reesei using protein sequence and structure based classification enhanced ranking,” Gene, vol. 585, no. 1, pp. 166–176, 2016.
[55] W. Vongsangnak, M. Salazar, K. Hansen, and J. Nielsen, “Genome-wide analysis of maltose utilization and regulation in aspergilli,” Microbiology, vol. 155, no. 12, pp. 3893–3902, 2009.
[56] J. E. Galagan, S. E. Calvo, K. A. Borkovich et al., “The genome sequence of the filamentous fungus Neurospora crassa,” Nature, vol. 422, no. 6934, pp. 859–868, 2003.
[57] C. J. Law, P. C. Maloney, and D.-N. Wang, “Ins and outs of major facilitator superfamily antiporters,” Annual Review of Microbiology, vol. 62, no. 1, pp. 289–305, 2008.
[58] N. Yan, “Structural advances for the major facilitator superfamily (MFS) transporters,” Trends in Biochemical Sciences, vol. 38, no. 3, pp. 151–159, 2013.
[59] A. C. Colabardini, L. N. Ries, N. Brown et al., “Functional characterization of a xylose transporter in Aspergillus nidulans,” Biotechnology for Biofuels, vol. 7, no. 1, p. 46, 2014.
[60] W. Zhang, Y. Cao, J. Gong, X. Bao, G. Chen, and W. Liu, “Identification of residues important for substrate uptake in a glucose transporter from the filamentous fungus Trichoderma reesei,” Scientific Reports, vol. 5, no. 1, article 13829, 2015.
[61] Z. B. Huang, X. Z. Chen, L. N. Qin, H. Q. Wu, X. Y. Su, and Z. Y. Dong, “A novel major facilitator transporter TrSTR1 is essential for pentose utilization and involved in xylanase induction in Trichoderma reesei,” Biochemical and Biophysical Research Communications, vol. 460, no. 3, pp. 663–669, 2015.
[47] C. P. Kubicek, R. Messner, F. Gruber, R. L. Mach, and [62] L. Ries, S. T. Pullan, S. Delmas, S. Malla, M. J. Blythe, and E. M.
Kubicek-Pranz, “The Trichoderma cellulase regulatory D. B. Archer, “Genome-wide transcriptional response of puzzle: from the interior life of a secretory fungus,” Enzyme Trichoderma reesei to lignocellulose using RNA sequencing and Microbial Technology, vol. 15, no. 2, pp. 90–99, 1993. and comparison with Aspergillus niger,” BMC Genomics, [48] W. Zhang, Y. Kou, J. Xu et al., “Two major facilitator vol. 14, no. 1, p. 541, 2013. superfamily sugar transporters from Trichoderma reesei [63] L. Atanasova, S. L. Crom, S. Gruber et al., “Comparative and their roles in induction of cellulase biosynthesis,” Journal transcriptomics reveals different strategies of Trichoderma of Biological Chemistry, vol. 288, no. 46, pp. 32861–32872, mycoparasitism,” BMC Genomics, vol. 14, no. 1, p. 121, 2013. 2013. International Journal of Genomics 15

[64] J. Kang, J.-U. Hwang, M. Lee et al., “PDR-type ABC [78] A. R. Stricker, K. Grosstessner-Hain, E. Wurleitner, and R. L. transporter mediates cellular uptake of the phytohormone Mach, “Xyr1 (xylanase regulator 1) regulates both the hydro- abscisic acid,” Proceedings of the National Academy of lytic enzyme system and D-xylose metabolism in Hypocrea Sciences of the United States of America, vol. 107, no. 5, jecorina,” Eukaryotic Cell, vol. 5, no. 12, pp. 2128–2137, 2006. – pp. 2355 2360, 2010. [79] N. Aro, A. Saloheimo, M. Ilmén, and M. Penttilä, “ACEII, a [65] R. Bischof, L. Fourtis, A. Limbeck, C. Gamauf, B. Seiboth, and novel transcriptional activator involved in regulation of cellu- C. P. Kubicek, “Comparative analysis of the Trichoderma lase and xylanase genes of Trichoderma reesei,” The Journal of reesei transcriptome during growth on the cellulase inducing Biological Chemistry, vol. 276, no. 26, pp. 24309–24314, 2001. ” substrates wheat straw and lactose, Biotechnology for [80] M. Häkkinen, M. J. Valkonen, A. Westerholm-Parvinen et al., Biofuels, vol. 6, no. 1, p. 127, 2013. “Screening of candidate regulators for cellulase and hemicel- [66] E. Lamping, P. V. Baret, A. R. Holmes, B. C. Monk, lulase production in Trichoderma reesei and identification of A. Goffeau, and R. D. Cannon, “Fungal PDR transporters: a factor essential for cellulase production,” Biotechnology for phylogeny, topology, motifs and function,” Fungal Genetics Biofuels, vol. 7, no. 1, p. 14, 2014. – and Biology, vol. 47, no. 2, pp. 127 142, 2010. [81] S. Zeilinger, M. Schmoll, M. Pail, R. L. Mach, and C. P. “ fi [67] M. Ruocco, S. Lanzuise, F. 
Vinale et al., Identi cation of a Kubicek, “Nucleosome transactions on the Hypocrea jecorina new biocontrol gene in Trichoderma atroviride: the role of (Trichoderma reesei) cellulase promoter cbh2 associated with an ABC transporter membrane pump in the interaction with cellulase induction,” Molecular Genetics and Genomics, ff ” di erent plant-pathogenic fungi, Molecular Plant-Microbe vol. 270, no. 1, pp. 46–55, 2003. Interactions, vol. 22, no. 3, pp. 291–301, 2009. [82] M. Nitta, T. Furukawa, Y. Shida et al., “A new Zn(II) Cys - “ 2 6 [68] M. Karlsson, M. B. Durling, J. Choi et al., Insights on the type transcription factor BglR regulates β-glucosidase expres- evolution of mycoparasitism from the genome of Clonosta- sion in Trichoderma reesei,” Fungal Genetics and Biology, ” chys rosea, Genome Biology and Evolution, vol. 7, no. 2, vol. 49, no. 5, pp. 388–397, 2012. – pp. 465 480, 2015. “ “ ” [83] B. Seiboth, R. A. Karimi, P. A. Phatale et al., The putative [69] S. Wilkens, Structure and mechanism of ABC transporters, protein methyltransferase LAE1 controls cellulase gene F1000Prime Reports, vol. 7, p. 14, 2015. expression in Trichoderma reesei,” Molecular Microbiology, [70] M. G. L. Elferink, S. V. Albers, W. N. Konings, and A. J. M. vol. 84, no. 6, pp. 1150–1164, 2012. Driessen, “Sugar transport in Sulfolobus solfataricus is [84] R. Karimi Aghcheh, Z. Németh, L. Atanasova et al., “The mediated by two families of binding protein-dependent VELVET A orthologue VEL1 of Trichoderma reesei regulates ABC transporters,” Molecular Microbiology, vol. 39, no. 6, fungal development and is essential for cellulase gene expres- pp. 1494–1503, 2001. sion,” PLoS One, vol. 9, no. 11, article e112799, 2014. [71] A. Watanabe, K. Hiraga, M. Suda, H. Yukawa, and M. Inui, [85] N. Aro, M. Ilmen, A. Saloheimo, and M. 
Penttila, “ACEI of “Functional characterization of Corynebacterium alkanolyti- Trichoderma reesei is a repressor of cellulase and xylanase cum β-xylosidase and xyloside ABC transporter in Coryne- expression,” Applied and Environmental Microbiology, bacterium glutamicum,” Applied and Environmental vol. 69, no. 1, pp. 56–65, 2003. Microbiology, vol. 81, no. 12, pp. 4173–4183, 2015. “ [72] S. M. Koning, M. G. L. Elferink, W. N. Konings, and A. J. M. [86] Y. Cao, F. Zheng, L. Wang et al., Rce1, a novel transcrip- Driessen, “Cellobiose uptake in the hyperthermophilic tional repressor, regulates cellulase gene expression by antag- onizing the transactivator Xyr1 in Trichoderma reesei,” archaeon Pyrococcus furiosus is mediated by an inducible, – high-affinity ABC transporter,” Journal of Bacteriology, Molecular Microbiology, vol. 105, no. 1, pp. 65 83, 2017. vol. 183, no. 17, pp. 4979–4984, 2001. [87] L. Ries, N. J. Belshaw, M. Ilmén, M. E. Penttilä, “ [73] T. Furukawa, Y. Shida, N. Kitagami et al., “Identification of M. Alapuranen, and D. B. Archer, The role of CRE1 in specific binding sites for XYR1, a transcriptional activator nucleosome positioning within the cbh1 promoter and ” of cellulolytic and xylanolytic genes in Trichoderma reesei,” coding regions of Trichoderma reesei, Applied Microbiology – Fungal Genetics and Biology, vol. 46, no. 8, pp. 564–574, 2009. and Biotechnology, vol. 98, no. 2, pp. 749 762, 2014. “ [74] T. Nakari-Setala, M. Paloheimo, J. Kallio, J. Vehmaanpera, [88] B. Van Vu, K. T. M. Pham, and H. Nakayashiki, Substrate- M. Penttila, and M. Saloheimo, “Genetic modification of induced transcriptional activation of the MoCel7C cellulase carbon catabolite repression in Trichoderma reesei for gene is associated with methylation of histone H3 at lysine ” improved protein production,” Applied and Environmental 4 in the rice blast fungus Magnaporthe oryzae, Applied and – Microbiology, vol. 75, no. 14, pp. 4853–4860, 2009. Environmental Microbiology, vol. 
79, no. 21, pp. 6823 6832, [75] T. Portnoy, A. Margeot, R. Linke et al., “The CRE1 carbon 2013. catabolite repressor of the fungus Trichoderma reesei:a [89] Q. Xin, Y. Gong, X. Lv, G. Chen, and W. Liu, “Trichoderma master regulator of carbon assimilation,” BMC Genomics, reesei histone acetyltransferase Gcn5 regulates fungal growth, vol. 12, no. 1, p. 269, 2011. conidiation, and cellulase gene expression,” Current Microbi- – [76] T. Portnoy, A. Margeot, V. Seidl-Seiboth et al., “Differential ology, vol. 67, no. 5, pp. 580 589, 2013. regulation of the cellulase transcription factors XYR1, [90] H. Hirasawa, K. Shioya, T. Furukawa et al., “Engineering of ACE2, and ACE1 in Trichoderma reesei strains producing the Trichoderma reesei xylanase3 promoter for efficient high and low levels of cellulase,” Eukaryotic Cell, vol. 10, enzyme expression,” Applied Microbiology and Biotechnol- no. 2, pp. 262–271, 2011. ogy, vol. 102, no. 6, pp. 2737–2752, 2018. [77] E. Akel, B. Metz, B. Seiboth, and C. P. Kubicek, “Molecular [91] G. Zou, S. Shi, Y. Jiang et al., “Construction of a cellulase regulation of arabinan and L-arabinose metabolism in Hypo- hyper-expression system in Trichoderma reesei by promoter crea jecorina (Trichoderma reesei),” Eukaryotic Cell, vol. 8, and enzyme engineering,” Microbial Cell Factories, vol. 11, no. 12, pp. 1837–1844, 2009. no. 1, p. 21, 2012. 16 International Journal of Genomics

[92] F. Uzbas, U. Sezerman, L. Hartl, C. P. Kubicek, and the presence of light,” Eukaryotic Cell, vol. 8, no. 3, pp. 410– B. Seiboth, “A homologous production system for Tricho- 420, 2009. derma reesei secreted proteins in a cellulase-free back- [107] K. Brunner, M. Omann, M. E. Pucher et al., “Trichoderma G ” ground, Applied Microbiology and Biotechnology, vol. 93, protein-coupled receptors: functional characterisation of a – no. 4, pp. 1601 1608, 2012. cAMP receptor-like protein from Trichoderma atroviride,” [93] T. Liu, T. Wang, X. Li, and X. Liu, “Improved heterologous Current Genetics, vol. 54, no. 6, pp. 283–299, 2008. gene expression in Trichoderma reesei by cellobiohydrolase [108] M. R. Omann, S. Lehner, C. Escobar Rodriguez, K. Brunner, ” I gene (cbh1) promoter optimization, Acta Biochimica et and S. Zeilinger, “The seven-transmembrane receptor Gpr1 – Biophysica Sinica, vol. 40, no. 2, pp. 158 165, 2008. governs processes relevant for the antagonistic interaction [94] Y.-S. Bahn, C. Xue, A. Idnurm, J. C. Rutherford, J. Heitman, of Trichoderma atroviride with its host,” Microbiology, and M. E. Cardenas, “Sensing the environment: lessons from vol. 158, no. 1, pp. 107–118, 2012. ” – fungi, Nature Reviews Microbiology, vol. 5, no. 1, pp. 57 69, [109] D. Tisch, C. P. Kubicek, and M. Schmoll, “The phosducin-like 2007. protein PhLP1 impacts regulation of glycoside hydrolases [95] M. Schmoll, “The information highways of a biotechnological and light response in Trichoderma reesei,” BMC Genomics, workhorse-signal transduction in Hypocrea jecorina,” BMC vol. 12, no. 1, p. 613, 2011. Genomics, vol. 9, no. 1, p. 430, 2008. [110] S. Sestak and V. Farkas, “Metabolic regulation of endogluca- [96] M. Saloheimo and T. M. 
Pakula, “The cargo and the transport nase synthesis in Trichoderma reesei: participation of cyclic system: secreted proteins and protein secretion in Tricho- AMP and glucose-6-phosphate,” Canadian Journal of Micro- derma reesei (Hypocrea jecorina),” Microbiology, vol. 158, biology, vol. 39, no. 3, pp. 342–347, 1993. – no. 1, pp. 46 57, 2012. [111] K. M. V. Nogueira, M. d. N. Costa, R. G. de Paula, F. C. [97] J. Zhang, Y. Zhang, Y. Zhong, Y. Qu, and T. Wang, “Ras Mendonça-Natividade, R. Ricci-Azevedo, and R. N. Silva, GTPases modulate morphogenesis, sporulation and cellulase “Evidence of cAMP involvement in cellobiohydrolase gene expression in the cellulolytic fungus Trichoderma expression and secretion by Trichoderma reesei in presence reesei,” PLoS One, vol. 7, no. 11, article e48786, 2012. of the inducer sophorose,” BMC Microbiology, vol. 15, [98] S. Zeilinger, B. Reithner, V. Scala, I. Peissl, M. Lorito, and no. 1, p. 195, 2015. R. L. Mach, “Signal transduction by Tga3, a novel G protein [112] A. Schuster, D. Tisch, V. Seidl-Seiboth, C. P. Kubicek, and alpha subunit of Trichoderma atroviride,” Applied and Envi- M. Schmoll, “Roles of protein kinase A and adenylate cyclase ronmental Microbiology, vol. 71, no. 3, pp. 1591–1597, 2005. in light-modulated cellulase regulation in Trichoderma ree- ” [99] C. A. D'Souza and J. Heitman, “Conserved cAMP signaling sei, Applied and Environmental Microbiology, vol. 78, no. 7, – cascades regulate fungal development and virulence,” FEMS pp. 2168 2178, 2012. Microbiology Reviews, vol. 25, no. 3, pp. 349–364, 2001. [113] M. Wang, H. Yang, M. Zhang et al., “Functional analysis of α [100] T. Vasara, M. Saloheimo, S. Keränen, and M. Penttilä, Trichoderma reesei CKII 2, a catalytic subunit of casein ” “Trichoderma reesei rho3, a homologue of yeast RHO3, sup- kinase II, Applied Microbiology and Biotechnology, vol. 99, – presses the growth defect of yeast sec15-1 mutation,” Current no. 14, pp. 5929 5938, 2015. Genetics, vol. 40, no. 
2, pp. 119–127, 2001. [114] R. He, W. Guo, and D. Zhang, “An ethanolamine kinase Eki1 ff [101] R. Morawetz, H. Mischak, J. Goodnight, T. Lendenfeld, a ects radial growth and cell wall integrity in Trichoderma ” J. F. Mushinsky, and C. P. Kubicek, “A protein kinase- reesei, FEMS Microbiology Letters, vol. 362, no. 17, article encoding gene, pktl, from Trichoderma reesei, homologous fnv133, 2015. to the yeast YPK1 and YPK2 (YKR2) genes,” Gene, [115] M. Wang, Q. Zhao, J. Yang et al., “A mitogen-activated vol. 146, no. 2, pp. 309-310, 1994. protein kinase Tmk3 participates in high osmolarity resis- [102] R. Morawetz, T. Lendenfeld, H. Mischak et al., “Cloning and tance, cell wall integrity maintenance and cellulase produc- ” characterisation of genes (pkc1 and pkcA) encoding protein tion regulation in Trichoderma reesei, PLoS One, vol. 8, kinase C homologues from Trichoderma reesei and Aspergil- no. 8, article e72189, 2013. lus niger,” Molecular and General Genetics MGG, vol. 250, [116] M. Wang, Y. Dong, Q. Zhao et al., “Identification of the no. 1, pp. 17–28, 1996. role of a MAP kinase Tmk2 in Hypocrea jecorina (Tricho- ” fi [103] T. Lendenfeld and P. C. Kubicek, “Characterization and derma reesei), Scienti c Reports, vol. 4, no. 1, article 6732, properties of protein kinase C from the filamentous fungus 2015. Trichoderma reesei,” Biochemical Journal, vol. 330, no. 2, [117] F. Chen, X. Z. Chen, X. Y. Su et al., “An Ime2-like mitogen- pp. 689–694, 1998. activated protein kinase is involved in cellulase expression fi ” [104] M. Wang, M. Zhang, L. Li et al., “Role of Trichoderma reesei in the lamentous fungus Trichoderma reesei, Biotechnology mitogen-activated protein kinases (MAPKs) in cellulase for- Letters, vol. 37, no. 10, pp. 2055–2062, 2015. mation,” Biotechnology for Biofuels, vol. 10, no. 1, p. 99, [118] D. E. Clapham, “Calcium signaling,” Cell, vol. 131, no. 6, 2017. pp. 1047–1058, 2007. [105] C. Seibel, G. Gremel, R. do Nascimento Silva, A. Schuster, [119] L. 
Navazio, B. Baldan, R. Moscatiello et al., “Calcium- C. P. Kubicek, and M. Schmoll, “Light-dependent roles of mediated perception and defense responses activated in the G-protein α subunit GNA1 of Hypocrea jecorina plant cells by metabolite mixtures secreted by the biocon- (anamorph Trichoderma reesei),” BMC Biology, vol. 7, no. 1, trol fungus Trichoderma atroviride,” BMC Plant Biology, p. 58, 2009. vol. 7, no. 1, p. 41, 2007. [106] M. Schmoll, A. Schuster, R. N. Silva, and C. P. Kubicek, “The [120] R. L. Mach, S. Zeilinger, D. Kristufek, and C. P. Kubicek, G-alpha protein GNA3 of Hypocrea jecorina (Anamorph “Ca2+-calmodulin antagonists interfere with xylanase forma- Trichoderma reesei) regulates cellulase gene expression in tion and secretion in Trichoderma reesei,” Biochimica et International Journal of Genomics 17

Biophysica Acta (BBA) - Molecular Cell Research, vol. 1403, fungus Trichoderma virens,” BMC Genomics, vol. 14, no. 1, no. 3, pp. 281–289, 1998. p. 138, 2013. [121] D. Tisch, C. P. Kubicek, and M. Schmoll, “New insights [134] F. Castellanos, M. Schmoll, P. Martínez et al., “Crucial into the mechanism of light modulated signaling by factors of the light perception machinery and their impact heterotrimeric G-proteins: ENVOY acts on gna1 and on growth and cellulase gene transcription in Trichoderma gna3 and adjusts cAMP levels in Trichoderma reesei (Hypo- reesei,” Fungal Genetics and Biology, vol. 47, no. 5, pp. 468– crea jecorina),” Fungal Genetics and Biology, vol. 48, no. 6, 476, 2010. – pp. 631 640, 2011. [135] M. Schmoll, L. Franchi, and C. P. Kubicek, “Envoy, a PAS/ [122] A. Cziferszky, R. L. Mach, and C. P. Kubicek, “Phosphoryla- LOV domain protein of Hypocrea jecorina (Anamorph tion positively regulates DNA binding of the carbon catabo- Trichoderma reesei), modulates cellulase gene transcription lite repressor Cre1 of Hypocrea jecorina (Trichoderma in response to light,” Eukaryotic Cell, vol. 4, no. 12, reesei),” The Journal of Biological Chemistry, vol. 277, pp. 1998–2007, 2005. – no. 17, pp. 14688 14694, 2002. [136] W. Dong, Q. Yinbo, and G. Peiji, “Regulation of cellulase [123] I. E. Cabrera, I. V. Pacentine, A. Lim et al., “Global synthesis in mycelial fungi: participation of ATP and cyclic analysis of predicted G protein−coupled receptor genes AMP,” Biotechnology Letters, vol. 17, no. 6, pp. 593–598, in the filamentous fungus, Neurospora crassa,” G3: 1995. – Genes, Genomes, Genetics, vol. 5, no. 12, pp. 2729 [137] G. P. Borin, C. C. Sanchez, E. S. de Santana et al., “Compara- 2743, 2015. tive transcriptome analysis reveals different strategies for deg- [124] P. E. Vercoe, S. A. Kocherginskaya, and B. A. 
White, radation of steam-exploded sugarcane bagasse by Aspergillus “Differential protein phosphorylation–dephosphorylation in niger and Trichoderma reesei,” BMC Genomics, vol. 18, no. 1, response to carbon source in Ruminococcus flavefaciens p. 501, 2017. ” FD-1, Journal of Applied Microbiology, vol. 94, no. 6, [138] J. S. Bak, “Lignocellulose depolymerization occurs via an – pp. 974 980, 2003. environmentally adapted metabolic cascades in the wood- [125] D. Tisch and M. Schmoll, “Novel approaches to improve rotting basidiomycete Phanerochaete chrysosporium,” cellulase biosynthesis for biofuel production—adjusting Microbiology, vol. 4, no. 1, pp. 151–166, 2015. signal transduction pathways in the biotechnological work- [139] G. R. Hemsworth, E. M. Johnston, G. J. Davies, and ” — horse Trichoderma reesei, in Biofuel Production -Recent P. H. Walton, “Lytic polysaccharide monooxygenases in Developments and Prospects, M. A. Santos Bernardes, Ed., biomass conversion,” Trends in Biotechnology, vol. 33, – pp. 199 224, Intech, Rijeka, Croatia, 2011. no. 12, pp. 747–761, 2015. “ [126] T. M. Mello-de-Sousa, A. Rassinger, M. E. Pucher et al., The [140] W. C. Kim, J. Y. Kim, J. H. Ko, H. Kang, and K. H. Han, impact of chromatin remodelling on cellulase expression in “Identification of direct targets of transcription factor ” Trichoderma reesei, BMC Genomics, vol. 16, no. 1, p. 588, MYB46 provides insights into the transcriptional regulation 2015. of secondary wall biosynthesis,” Plant Molecular Biology, [127] G. P. Borin, C. C. Sanchez, A. P. de Souza et al., “Comparative vol. 85, no. 6, pp. 589–599, 2014. secretome analysis of Trichoderma reesei and Aspergillus [141] Y. Xiong, V. W. Wu, A. Lubbe et al., “A fungal transcription ” Niger during growth on sugarcane biomass, PLoS One, factor essential for starch degradation affects integration of vol. 10, no. 6, article e0129275, 2015. carbon and nitrogen metabolism,” PLoS Genetics, vol. 13, [128] R. Peterson and H. 
Nevalainen, “Trichoderma reesei RUT- no. 5, article e1006737, 2017. — ” C30 thirty years of strain improvement, Microbiology, [142] S. T. Coradetti, J. P. Craig, Y. Xiong, T. Shock, C. Tian, and – vol. 158, no. 1, pp. 58 68, 2012. N. L. Glass, “Conserved and essential transcription factors [129] R. K. Aghcheh and C. P. Kubicek, “Epigenetics as an for cellulase gene expression in ascomycete fungi,” Proceed- emerging tool for improvement of fungal strains used in ings of the National Academy of Sciences, vol. 109, no. 19, biotechnology,” Applied Microbiology and Biotechnology, pp. 7397–7402, 2012. – vol. 99, no. 15, pp. 6167 6181, 2015. [143] P. K. Foreman, D. Brown, L. Dankmeyer et al., “Transcrip- [130] A. B. Sanz, R. García, J. M. Rodríguez-Peña, C. Nombela, and tional regulation of biomass-degrading enzymes in the J. Arroyo, “Cooperation between SAGA and SWI/SNF filamentous fungus Trichoderma reesei,” The Journal of Bio- complexes is required for efficient transcriptional responses logical Chemistry, vol. 278, no. 34, pp. 31988–31997, 2003. ” regulated by the yeast MAPK Slt2, Nucleic Acids Research, [144] F. Segato, A. R. Damásio, R. C. de Lucas, F. M. Squina, and – vol. 44, pp. 7159 7172, 2016. R. A. Prade, “Genomics review of holocellulose deconstruc- [131] R. M. Duran, S. Gregersen, T. D. Smith et al., “The role of tion by aspergilli,” Microbiology and Molecular Biology Aspergillus flavus veA in the production of extracellular Reviews, vol. 78, no. 4, pp. 588–613, 2014. ” proteins during growth on starch substrates, Applied Micro- [145] C. M. Payne, B. C. Knott, H. B. Mayes et al., “Fungal cellu- – biology and Biotechnology, vol. 98, no. 11, pp. 5081 5094, lases,” Chemical Reviews, vol. 115, no. 3, pp. 1308–1448, 2015. 2014. “ ” “ [146] A. Berlin, No barriers to cellulose breakdown, Science, [132] J. Schumacher, A. Simon, K. C. Cohrs et al., The VELVET vol. 342, no. 6165, pp. 1454–1456, 2013. 
Hindawi
International Journal of Genomics
Volume 2018, Article ID 2312987, 15 pages
https://doi.org/10.1155/2018/2312987

Review Article

Metagenomic Approaches for Understanding New Concepts in Microbial Science

Luana de Fátima Alves,1 Cauã Antunes Westmann,2 Gabriel Lencioni Lovate,1 Guilherme Marcelino Viana de Siqueira,1 Tiago Cabral Borelli,3 and María-Eugenia Guazzaroni3

1 Department of Biochemistry, Faculdade de Medicina de Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP, Brazil
2 Department of Cell Biology, Faculdade de Medicina de Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP, Brazil
3 Department of Biology, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP, Brazil

Correspondence should be addressed to María-Eugenia Guazzaroni; [email protected]

Received 23 February 2018; Revised 21 June 2018; Accepted 29 July 2018; Published 23 August 2018

Academic Editor: Henry Heng

Copyright © 2018 Luana de Fátima Alves et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Over the past thirty years, since the dawn of metagenomic studies, a completely new (micro) universe has been revealed, with the potential to have profound impacts on many aspects of society. Remarkably, the study of the human microbiome has provided a new perspective on a myriad of human traits previously regarded as solely (epi)genetically encoded, such as disease susceptibility, immunological response, and social and nutritional behaviors. In this context, metagenomics has established a powerful framework for understanding the intricate connections between human societies and microbial communities, ultimately allowing for the optimization of both human health and productivity. Thus, we have shifted from the old concept of microbes as harmful organisms to a broader panorama, in which the nature of the relationship between humans and microbes is flexible and directly dependent on our own decisions and practices. In parallel, metagenomics has also been playing a major role in the prospecting of “hidden” genetic features and the development of biotechnological applications, through the discovery of novel genes, enzymes, pathways, and bioactive molecules with completely new or improved biochemical functions. Therefore, this review highlights the major milestones over the last three decades of metagenomics, providing insights into both its potentialities and current challenges.

1. Introduction

About thirty years ago, in 1986, Pace and collaborators [1] proposed, for the first time, the revolutionary idea of cloning DNA directly from environmental samples to analyze the complexity of natural microbial populations (Figure 1, indicated as M2). The adopted strategy was based on shotgun cloning of 16S rRNA genes using purified DNA from natural samples. At that time, the authors stressed that although the DNA originated from a mixed population of microorganisms, the methodology allowed the recovery and subsequent sequencing of individual rRNA genes. Thus, by evaluating complete or partial rRNA sequences, the composition of the original microbial populations could be retrieved.

Around ten years later, in 1998, the term “metagenome” appeared, when Handelsman and collaborators [2] described the importance of soil microorganisms as sources for new natural compounds (Figure 1, indicated as M6). According to them, a new frontier in science was emerging: the mining of novel chemical compounds from uncultured microorganisms, which comprise more than 99% of microbial diversity [3]. This new concept in microbial science opened the minds of the scientific community to the astonishingly large catalogue of biochemical functions in nature that remain to be discovered.

Currently, metagenomics is subdivided into two major approaches, which target different aspects of the local microbial community associated with a given environment.

M1 (1977): Sanger et al. develop DNA sequencing
M2 (1986): Pace et al. perform, for the first time, cloning of DNA directly from environmental samples
M3 (1991): Schmidt et al. generate the first metagenomic library from marine plankton
M4 (1995): Healy et al. construct metagenomic libraries from an environment related to the gene of interest to mine cellulases
M5 (1996): Stein et al. describe the first genomic sequence bearing a 16S rRNA gene of an uncultured archaeon
M6 (1998): Handelsman and collaborators employ the term “metagenome”
M7 (2000): Rondon et al. introduce the term “metagenomic library”
M8 (2004): Sequencing of the Sargasso Sea by Venter et al.
M9 (2005): First next-generation sequencing platform is released
M10 (2010): MetaHIT consortium releases the human gut microbial gene catalogue
M11 (2016): MetaSUB consortium is created

Figure 1: Timeline of the major advancements in metagenomics. Timeline highlighting important developments in the field over the last 40 years, since Sanger sequencing (M1), and over the last 30 years, since the first published metagenomic experiment (M2). The main metagenomic milestones are shown as M1–M11 (all of them are highlighted in the text where they were mentioned).

In the first one, the so-called structural metagenomic approach, the main focus is to study the structure of the uncultivated microbial population, which can be expanded to other properties, such as the reconstruction of the complex metabolic network established between community members (Figure 2) [4, 5]. In this sense, the microbial community structure can be defined as the population composition and its dynamics in a specific ecosystem, in response to selective pressures and spatiotemporal parameters. The study of the community structure allows a deeper understanding of the relationships between the individual components that build a community and is essential for deciphering ecological or biological functions among its members [5, 6]. In contrast, the functional metagenomic approach aims to identify genes that code for a function of interest, which involves the generation of expression libraries with thousands of metagenomic clones followed by activity-based screenings (Figure 2) [7, 8].

It is important to highlight that 16S rRNA gene surveys are often referred to as metagenomic studies, although they are not. In 16S rRNA gene analysis, the study is focused on a single gene used as a taxonomic marker (Figure 2). Structural metagenomics, on the other hand, aims to investigate the genomes of the microbial community members. In this sense, the latter approach allows the overall reconstruction of the community structure, potentially revealing metabolic pathways of the whole microbiome and assigning minor or major geoecological roles to community members [4, 6, 9, 10]. We highlight that 16S rRNA surveys and metagenomics are not mutually exclusive; on the contrary, approaches that link 16S rRNA analyses to genes or metabolic pathways have been shown to be useful in determining the functional potential of a microbiome [11–13]. Thus, the combination of these complementary strategies allows for a deeper exploration of relevant biological questions in microbial ecology such as “who are the members of the community?” and “what are their functional roles?”

In the early days of metagenomic studies, the use of Sanger sequencing technology [14] provided important progress in the field [15–17] (Figure 1, indicated as M1). However, the advent of next-generation sequencing (NGS) technologies capable of sequencing millions of DNA fragments simultaneously, at a low cost, greatly bolstered the field [18–21] (Figure 1, indicated as M9). Comparatively, NGS platforms can recover up to 5000 Mb of DNA sequence per day at a cost of about $0.50/Mb, while Sanger sequencing methodology yields about 6 Mb of DNA sequence per day at a cost about 1000 times higher [22].

This review focuses on how metagenomics has contributed to scientific understanding in many different areas of knowledge. In this manner, milestones in metagenomics have ranged from findings with significant biotechnological impact to unexpected outcomes with high biomedical relevance, shining a light on hidden molecular components and on the connections between microbial communities and complex diseases [23–26]. We also discuss the current limitations of the field that should be overcome to achieve conceptual advances in microbial science.
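The throughput and cost figures quoted above from [22] can be turned into a small worked comparison. The following is a minimal sketch in Python, not a calculation from the original article: the project size of 3000 Mb is a hypothetical value chosen only for illustration, and the Sanger cost per Mb is assumed to be exactly 1000 times the NGS cost, following the "about 1000 times higher" statement.

```python
# Figures quoted in the text (attributed there to reference [22]).
NGS_MB_PER_DAY = 5000.0       # Mb of sequence per day, NGS platforms
NGS_COST_PER_MB = 0.50        # US dollars per Mb, NGS platforms
SANGER_MB_PER_DAY = 6.0       # Mb of sequence per day, Sanger sequencing
SANGER_COST_FACTOR = 1000.0   # assumption: exactly 1000x the NGS cost per Mb

# Hypothetical project size, chosen only for illustration.
TARGET_MB = 3000.0


def sequencing_estimate(target_mb, mb_per_day, cost_per_mb):
    """Return (days, cost) needed to produce target_mb of sequence."""
    days = target_mb / mb_per_day
    cost = target_mb * cost_per_mb
    return days, cost


ngs_days, ngs_cost = sequencing_estimate(
    TARGET_MB, NGS_MB_PER_DAY, NGS_COST_PER_MB
)
sanger_days, sanger_cost = sequencing_estimate(
    TARGET_MB, SANGER_MB_PER_DAY, NGS_COST_PER_MB * SANGER_COST_FACTOR
)

print(f"NGS:    {ngs_days:.1f} days, ${ngs_cost:,.0f}")
print(f"Sanger: {sanger_days:.1f} days, ${sanger_cost:,.0f}")
print(f"Throughput ratio (NGS/Sanger): {NGS_MB_PER_DAY / SANGER_MB_PER_DAY:.0f}x")
```

Under these assumptions, the hypothetical 3000 Mb project takes under a day and about $1,500 with NGS, against roughly 500 days and about $1,500,000 with Sanger sequencing, which illustrates why NGS made large-scale metagenomic sequencing practical.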

[Figure 2 (schematic). Recoverable panel labels: environmental DNA (eDNA) extraction; eDNA sequencing; construction of a 16S rRNA library; sequencing; screenings of libraries for function/activity; microbial community analysis; phylogenetic analysis (example tree of Clostridium 16S rRNA sequences with accession numbers). The three branches are labeled functional metagenomics, structural metagenomics, and not metagenomics.]

Figure 2: The metagenomics framework and its two main approaches. Both structural and functional metagenomic approaches are the main strategies for exploring key ecological and biotechnological features in environmental samples, respectively. Additionally, 16S rRNA gene surveys can work in synergy with metagenomics for further understanding of microbial ecology.

2. Milestones in Metagenomics

In 1991, Schmidt and collaborators generated the first metagenomic library using DNA from marine picoplankton [27], and, some years later, Healy et al. constructed metagenomic libraries from enriched consortia sampled from cellulose digesters to mine genes encoding cellulases [28] (Figure 1, indicated as M3 and M4, respectively). In this context, the idea of screening metagenomic libraries from specific environments was introduced, so that the number and properties of the retrieved genes (e.g., enzymes) could be correlated with the conditions of the source environments. However, it was only in 2000 that Rondon and collaborators [29] coined the term “metagenomic libraries,” by generating libraries in BACs (bacterial artificial chromosomes) using DNA from soil samples (Figure 1, indicated as M7). Furthermore, the authors also performed phylogenetic analysis of 16S rRNA sequences and identified a number of clones expressing heterologous genes in functional screenings using Escherichia coli as host.

Since then, a large amount of data has been generated using metagenomic approaches, impacting different areas of high applicability in our society (Figure 2). Here, we highlight the main milestones in metagenomic studies that defined the field in distinct contexts. Although many of the advances in the field can be credited to novel sequencing approaches and the development of new computational methodologies to analyze the generated data, in this review, we have focused on highlighting biological discoveries through metagenomics rather than describing these more technical strategies, which have been addressed in recent reviews [30–32].

2.1. Genomic and Taxonomic Novelties in Environments. Ever since the proposal of using molecular markers such as 16S (or 18S) rRNA for comparing species similarity, in the late 1970s, followed by the outlining of a third “urkingdom” (the Archaebacteria), our knowledge of organism relatedness has taken a great leap forward. Taxonomic classification started to rely on a “comparative approach that can measure the degree of difference in comparable structures,” which not only allowed a more resolved phylogeny and a less biased organization based on the Prokaryotae versus Eukaryotae dichotomy but also made it possible to better understand how life on earth has come to be [33]. In this context, metagenomic approaches have been used to generate data of novel genomes from otherwise uncultivated organisms, deepening the framework of genomic tools available for comparison and study.

The shotgun sequencing of the Sargasso Sea waters, by Venter and colleagues in 2004 [34], is one of the most illustrative examples of how metagenomics is a feasible way to accumulate genomic knowledge (Figure 1, indicated as M8). In this study, Venter and collaborators recovered almost 1.5 Gbp of microbial DNA sequences from microbial populations of three marine sites using en masse whole genome shotgun sequencing from filtered sea water. This led to the finding of almost 70 thousand novel genes among the roughly 1.2 million genes identified by ORF (open reading frame) identification and alignment of the putative protein products. Among the main findings, the researchers described a novel ammonia oxidation pathway uninhibited by UV light, putative genes for trace metal resistance such as arsenate and copper, and, additionally, up to 782 new proteorhodopsin-like receptor genes. The latter added an insight to the exotic marine photoheterotrophic lifestyle first described by Béjà and colleagues only a few years before [35]. Further, the metagenomic study of the Sargasso Sea allowed the identification of 148 previously unknown rRNA bacterial phylotypes. However, this probably was an underestimation of the environment’s total genomic pool, given that the difference in the number of rRNA coding genes within the rRNA operon between species may result in biases in PCR studies with underrepresentation of some of the community constituents, as also discussed by Klappenbach and colleagues [36].

In this context, a recent study based on metagenomic data has shown that about 10% of environmental bacterial or archaeal sequences might not be recovered when using a targeted PCR survey with the most common primers for SSU rRNA [37]. For instance, a study led by Brown and colleagues in 2015 [38] first described the Candidate Phyla Radiation (CPR) bacterial lineage comprising at least 25 new bacterial phyla, which make up at least 15% of the domain Bacteria. These novel organisms lack many biosynthetic pathways and possess unusual features in their small genomes, such as self-splicing introns within the rRNA genes and novel ribosome structure, providing insight into the organelle’s evolutionary trajectory. It is worth noting that all complete CPR genomes curated in the work have only one copy of the 16S rRNA gene and many of the organisms would evade detection by 16S rRNA gene amplicon surveys, even though they correspond to such a high percentage of bacterial diversity in environments. In their work, Venter and collaborators attempted to address this issue and to better elucidate the phylogenetic relationships in the recovered genetic material by employing six other phylogenetic markers, such as heat shock protein 70 (HSP70) and elongation factor Tu (EF-Tu), as well as several methods to estimate species diversity, which resulted in the estimation of over 1000 species [34].

Metagenomic studies also shed a light on challenges in the field of evolutionary biology, such as in the understanding of sexual reproduction as a constraint on genomic variation [39]. Prior to the metagenomic era, it was believed that many microbial species should be genomic clones, since asexual reproduction was assumed to produce identical clones. Yet, metagenomic analyses unexpectedly revealed that most microbial species were not clonal [34]. Thus, asexual reproduction present in bacteria should increase genetic variation, providing evolutionary diversity for future environmental challenges [40]. This finding was essential to support the conclusion that sexual reproduction acts as a constraint on genomic and epigenetic variation, thereby limiting adaptive evolution [39].

In recent years, new sampling and sequencing techniques allowed researchers to further explore life diversity, improving phylogenetic, genomic, and ecological notions previously established. Rinke and colleagues, for instance, revealed the “microbial dark matter” in a single-cell genomics study that comprised over 20 major uncultivated archaeal and bacterial lineages [41]. They first described archaeal sigma factors and the first reported lateral gene transfer from a eukaryote to an archaeon. Another insight into how the archaeal/eukaryotic relationship came to be and how the modern eukaryotic cell was formed was provided by Spang and collaborators [42], who identified a candidate archaeal phylum which forms a monophyletic group with eukaryotes. It also possesses genes encoding proteins similar to those related to cell shape formation processes in eukaryotic cells, which might suggest sophisticated membrane remodeling and vesicular trafficking processes in eukaryotic cells even before the acquisition of the mitochondrion.

2.2. Innovative Functions in Uncultivable Microbes. Sequence-based analysis in metagenomics can be accomplished, overall, by following one of two paths: (i) sequencing all clones with a phylogenetic marker indicating the potential taxonomic source of the DNA fragment or (ii) sequencing random fragments until a gene of interest is found, followed by sequencing of the adjacent regions to find taxonomic markers. The former method was developed by Stein’s group and described the first genomic sequence bearing a 16S rRNA gene of an uncultured archaeon (Figure 1, indicated as M5) [43]. This provided the first highlights of the capabilities of metagenomics for unravelling novel genes, functions, and taxonomic groups. In this study, with a marine picoplankton assemblage collected in the eastern North Pacific, a 38.5 kbp fragment containing an archaeal 16S rRNA gene was isolated for the first time. Among other features, the discovery of an RNA helicase and a glutamate semialdehyde aminotransferase, which were still unknown in archaeal organisms, was reported. The metagenomic library in this study had an average fragment size of 40 kb and was based on fosmid backbones. The screening was very labor intensive, relying on Southern blotting with phylogenetic probes for the 16S rRNA gene, followed by sequencing of selected clones through automated Sanger or shotgun methods.

It did not take long for the same group to push the boundaries and further advance the incipient field of metagenomics [35, 44]. From the surface waters of the Californian coast, a marine planktonic bacterial assemblage was collected for metagenomics analysis. Hitherto, bacteriorhodopsins were considered unique features of halophilic archaea; however, the small ribosomal subunit gene revealed that the source of the rhodopsin detected here was a gammaproteobacterium (uncultivable bacterial SAR86 lineage). It was also shown that the protein was functional when cloned into E. coli, presenting similar kinetics to the archaeal-related cognates [35]. In this study, both metagenomic library and fragment average sizes were much larger than in the previous study and the screening process was accelerated (6240 screened clones with an average size of 80 kb) [44]. These improvements were only possible due to the establishment of BACs and PCR-based screening as new tools for metagenomic studies. Thus, it did not take long for novel proteorhodopsins to be detected in other populations of planktonic marine bacteria [45]. This was considered the first great “metagenomic success,” allowing the adoption of these new techniques among laboratories around the world.

2.3. Deciphering and Rebuilding Microbial Communities. By starting with an extremely simple microbial community (an acid mine drainage (AMD) microbial community), evidenced by an initial group-specific fluorescence in situ hybridization (FISH) assay, Tyson and collaborators [46] were able to perform the first assembly of genomes directly from environmental samples. In this study, they obtained two nearly complete genomes of Leptospirillum group II and Ferroplasma type II and partially recovered another three genomes. The initial characterization of this microenvironment, a biofilm with rather extreme conditions like very acidic pH (approximately 0.83), revealed the presence of archaea (Thermoplasmatales) and bacteria (Leptospirillum, Sulfobacillus, and Acidimicrobium), with the domination of Leptospirillum group II. The low diversity of the system was reported as possibly related to the extreme environmental conditions. Afterwards, the DNA sequencing of a biofilm sample suggested adaptive molecular traits of the community to survive in this environment, such as homologous recombination forming mosaic genome types, and metabolic adaptations, such as abundance of genes related to ferrous iron oxidation.

In other metagenomic studies, Tringe et al. [5] performed comparisons of the composition and functionality of microbial communities from two nutrient-poor and two nutrient-enriched environments. This approach was mainly concerned with gene function rather than genome composition, overcoming limitations to genome assembly from complex environments. The authors showed that gene function and structure differ in nutrient-limited (Sargasso Sea and AMD) versus nutrient-abundant (Minnesota farm soil and deep-sea “whale fall” carcasses) environments. Some gene functions were exclusive to specific environments; for instance, (i) cellobiose phosphorylases were only found in the agricultural soil and (ii) light-driven proton pumps were only found in the Sargasso Sea samples, whereas no photoreceptors were found in the deep-sea samples.

The development of functional approaches at large scales also provided novel insights into communities’ key metabolic processes. A comparative functional profiling of 9 biomes was performed by Dinsdale and collaborators [47], describing how different biological traits play essential roles in each environment. For instance, the authors showed an abundance of virulence genes in fish- and terrestrial-animal-associated metagenomes in comparison to the other biomes. In contrast, virome analysis of the total number of biomes showed a more uniform gene functional composition due to phages playing similar roles in different environments [47].

2.4. City-Scale Molecular Profile of DNA. The last decades of metagenomics have shed a light on the human microbiome and its profound influence on a wide range of aspects which were previously regarded as solely (epi-)genetically encoded, such as disease susceptibility, immunological response, and social and nutritional behaviors [48–52]. Although there is still much to learn in this subject, recent studies are targeting not only the inside but also the external part of the human world, also known as microbiomes from human-built environments [53–55]. In this context, a myriad of anthropocentric utensils and physical spaces has been assessed, such as kitchen sponges [56], dollar bills [57], ATMs (Automated Teller Machines) [58], homes [59], hospitals [60], subways [61, 62], food production sites [63], and even the International Space Station [64]. The main goal of this new research branch is to provide a framework for understanding the relationship between human societies and microbial communities, ultimately optimizing both human health and productivity.

A seminal study in the context of urban areas was conducted by Afshinnekoo et al. [61], who sampled different features of New York subways and reported that nearly 1700 microbial taxa were dominated mostly by human skin bacteria and to a lesser extent by microbes from the human gastrointestinal and urogenital tracts. Almost half the DNA present on the subway surfaces matched no known organism. Although results showed that the bacteria found in the subways were mostly harmless, several pathogenic agents, including fragments of the plague and anthrax genomes, were detected. This was the beginning of an international consortium called The Metagenomics and Metadesign of Subways and Urban Biomes (MetaSUB) that has been sampling urban microbiomes throughout the world [62] (Figure 1, indicated as M11). An important scientific agenda was also launched in 2017, the microbiomes of the built environment [65]. Its main objective is to assess the current state of knowledge on indoor microbiomes and also to map out research agendas and advise government agencies on how living spaces can be designed “to support occupant health and wellbeing.” Other studies have assessed complementary aspects of this matter, such as the influence of landscape connectivity on microbial diversity [66], the influence of green areas in urban spaces [67, 68], and the consequences of excessive time spent indoors in the context of both human health and environmental microbiomes [54, 69].

Altogether, those studies allowed a few important conclusions to be drawn regarding human-built spaces and microbiomes, further explained by Stephens et al. [70]: (i) culture-independent methods are essential for those surveys, (ii) indoor spaces often harbor unique microbial communities, tightly related to the indoor sources (mostly humans and pets), (iii) building design and operation can directly modulate indoor microbial communities, and (iv) it is possible to optimize human health by exposure to certain microbial groups. Consequently, society moved from the old concept of microbes as harmful organisms to a new view in which the interaction between humans and microorganisms can be flexible and directly dependent on our own decisions and practices. Further studies shall reveal novel rules on “good living standards” for both humans and microbes in built environments.

2.5. The Human Microbiome. The concept of the human microbiome, embracing the idea that human beings are highly susceptible to the microbial communities that live in and on our bodies, was an indubitable milestone in metagenomics with large repercussions in many areas. In this sense, scientific contributions involving metagenomic approaches rapidly highlighted the evidence showing that these microbiomes play key roles in human health and disease.

The human gut microbiota (the collection of microorganisms that inhabit the human gut) is composed of up to 10^14 microorganisms [71, 72] including bacteria, viruses, fungi, and protozoa. Deciphering the function and composition of our microbiome (the collective genomes of the resident microbial community) is a challenge that has been explored by researchers in a series of initiatives like the Human Microbiome Project (HMP), the Integrative Human Microbiome Project (iHMP), and the MetaHIT (METAgenomics of the Human Intestinal Tract) [73–75] (Figure 1, indicated as M10). The findings of these projects have provided valuable data about the function of the human microbiome in different tracts (e.g., nasal, oral, skin, gastrointestinal, and urogenital). Particularly, advances in molecular biology procedures, next-generation DNA sequencing, and omics techniques have provided access not only to the microbial genetic diversity but also to an understanding of the physiology and the lifestyle of our microbiome. In this sense, it was demonstrated that gut microorganisms perform several elemental functions like synthesis of essential amino acids and vitamins and processing of cellulosic material [76], playing an important role in a number of human health aspects [77].

In a series of studies coming out of the Gordon lab at Washington University School of Medicine in St. Louis, Ley and collaborators [78] showed that obesity has a microbial component. To explore the relation between gut microbial ecology and body fat in humans, the authors studied 12 obese people, who were randomly assigned to two types of low-calorie diet. The composition of their gut microbiota was monitored over the course of one year by sequencing 16S ribosomal RNA genes from stool samples [78]. The data obtained showed that two groups of beneficial bacteria were dominant in the human gut, the Bacteroidetes and the Firmicutes. In addition, they showed that the percentage of Bacteroidetes correlated with the percentage of weight loss. In other studies [79], they found that the transplantation of the microbiota from obese mice to lean mice could lead to an increase of body fat in transplanted mice when compared with transplantation from lean mice microbiota.

The other important outcome from the human gut microbiome studies was regarding antibiotic resistance. In order to determine an “antibiotic resistance potential,” Forslund et al. [80] performed a quantitative gut metagenomic analysis of known resistance genes from people of three countries. In this study, the authors showed that antibiotic resistance gene abundance in the general human population is correlated with the length of antibiotic usage in livestock [80–82]. In another study, Raymond and coworkers [83] showed that the initial gut microbiome affects its recomposition after antibiotic treatment. They administered a second-generation cephalosporin antibiotic (cefprozil) to healthy individuals and showed that the antibiotic altered the microbiome of healthy volunteers in an interindividual manner, allowing the emergence of potentially pathogenic Enterobacteriaceae in a subgroup of patients, probably related to a decreased initial gut microbiome diversity in those individuals.

Besides obesity and antibiotic resistance, the human gut microbiome has been associated with several diseases such as type 2 diabetes, cardiovascular and inflammatory bowel diseases, and even cancer [84]. Some studies have also associated the gut microbiome with intestinal immunity. It has been shown that a healthy microbiota improves local expression of a Toll-like receptor (TLR) [85] which recognizes the PAMPs (pathogen-associated molecular patterns) expressed by a broad range of infective agents and improves the percentage of antigen-presenting cells, differentiated T cells, and lymphoid follicles [86, 87]. Besides the local immunity, the gut microbiota affects the systemic immunity by increasing splenic CD4+ T cells and systemic antibody expression [88]. Consequently, the global role of the gut microbiota in intestinal immunity has increased the interest of the scientific community in developing techniques that improve human health by manipulating the gut microbiota.

Nowadays, researchers are exploiting these important results for medical applications; for instance, fecal microbiota transplantation (FMT) has been used to eliminate recurrent Clostridium difficile infection by transplantation of healthy microbiota in human patients [89]. Besides that, FMT has been successfully used in treatment of inflammatory bowel disease, functional gastrointestinal disorders, hepatic encephalopathy, obesity, and metabolic syndrome [24, 90].

2.6. Biomedical Significance. Findings from studies of the gut microbiome shed a light on a number of diseases directly impacted by it, becoming a promising scope for advances in understanding and treating complex diseases. Among them, Crohn’s disease [91], rheumatoid arthritis [92], obesity [93, 94], type 1 and type 2 diabetes [95–97], breast cancer [98, 99], and atherosclerosis [100] can be cited as associated with the gut microbiome. Thus, researchers are interested in finding biomarkers and microbiome-based signatures for use in diagnostics, prognostics, and treatment of patients with diseases related to the human microbiome, describing important targets with biomedical significance that could be useful for public health.

In this context, Yu and collaborators [101] performed a metagenome-wide association study (MGWAS) using stool samples from 74 Chinese individuals with colorectal cancer and 54 controls, aiming to identify noninvasive biomarkers for colorectal cancer. The authors found that, besides known colorectal cancer-associated species such as Fusobacterium nucleatum and Peptostreptococcus stomatis, another 20 microbial gene markers could differentiate colorectal cancer from control patients. In order to define a “worldwide” signature for colorectal cancer identification, they validated four of these gene markers in published Danish, French, and Austrian cohorts. This result indicates that the four biomarkers validated in individuals from different countries might be used for early diagnosis of colorectal cancer even in different populations with different gut microbiome structures, and it holds promise for early noninvasive diagnosis of the disease. Later, Yu’s group [102] developed a new diagnostic tool for colorectal cancer using the four biomarkers validated in 2015. For this, they applied a probe-based duplex qPCR assay for quantifying these bacterial genes and showed that one of these genes can discriminate colorectal cancer from control individuals with a sensitivity of 77.7% and a specificity of 79.5%. Moreover, combining these four bacterial genes with a fecal immunochemical test improved the diagnosis, displaying a sensitivity of 92.8% and a specificity of 81.5%.

In the same way, Pascal et al. [103] defined a microbial signature for Crohn’s disease. They performed an MGWAS using samples from 2045 individuals from four countries (Spain, Belgium, UK, and Germany) and found eight groups of microorganisms that could be used to discriminate between Crohn’s disease and ulcerative colitis (the two main inflammatory bowel diseases that share many immunologic, epidemiologic, and clinical features). Then, they developed an algorithm that showed a specificity of approximately 90% for Crohn’s disease detection when compared with healthy control, anorexia, ulcerative colitis, and irritable bowel syndrome patients. Similarly, Loomba et al. [104] defined a gut microbiome signature for the diagnosis of advanced fibrosis in individuals with nonalcoholic fatty liver disease using a metagenomic analysis. Analogously, a similar approach was used to distinguish between type 2 diabetes individuals and nondiabetic controls [97]. By means of a sequence-based profiling metagenomic approach, the authors showed that type 2 diabetes individuals were characterized by an increase of opportunistic pathogens, an enrichment of sulphate reduction genes and oxidative stress resistance genes, and a decrease in butyrate-producing species. Taken together, these gut metagenomic markers might become a powerful tool for the diagnosis of patients with the disease.

2.7. Mining of Microbial Genes In Vivo. New outcomes from the studies of the human microbiome inspired novel biological questions related to microbial fitness in diverse human tracts. Thus, elucidating the set of genes involved in the colonization and maintenance of the intestinal microbiota would provide valuable tools for further engineering of probiotics or to enhance the survival of certain microorganisms directly related to healthy conditions.

In this way, the in vivo temporal functional metagenomics approach, developed by Yaung and collaborators [105], allowed them to mine microbial genes associated with microbial fitness in the mammalian gastrointestinal tract. The temporal sequencing platform employed in this study allowed the detection of genes that confer microbial fitness using a mouse as a model host. The authors inoculated germ-free mice with E. coli transformed with a library composed of 2–5 kb fragments of the complete Bacteroides thetaiotaomicron (Bt) genome. Through a 28-day experiment of kinetic monitoring of the enriched clones inside the mice’s gastrointestinal tract, they were able to identify genes related to colonization and maintenance of the bacteria inside the tract. The study revealed that different sets of genes were enriched in the pool of bacteria at contrasting times. For instance, during the early phase of colonization, genes responsible for the synthesis of polysaccharides and lipopolysaccharides (LPS) were significantly present in the pool, expanding E. coli’s LPS biosynthesis repertoire with the acquired Bt biosynthetic genes. Those changes in the LPS synthesis could provide the E. coli with different antigenicity and enhance resistance to barriers for colonization in the gut. Nevertheless, a distinct set of genes was present in the long-term experiment, mostly related to sugar metabolism or transport. Furthermore, except for a mutation in the galK chromosomal gene, the recipient strain maintained its genetic stability [105].

The work is the first one to use temporal-functional metagenomics to describe how temporal data can contribute to the discovery of genes with functions of interest, since most of the genes involved in the GI tract community fitness would not be found if the data came from only an endpoint [105]. Temporal approaches like these could also bring interesting insights about interaction dynamics and fitness of microbes in other environments (such as bacteria associated with plant growth or parasitic interactions), unraveling the genes involved in a microbial community structure and metabolism. Moreover, identified genes could be used as novel drug target genes in pathogenic bacteria.

2.8. Biotechnological Impact. Biotechnology is one of the fields most favored by the metagenomic era. As microorganisms are the major source of biocatalysts for industrial purposes, increasing the repertoire of biochemical transformations available for biotechnological solutions is of high relevance [106]. Since Healy and collaborators [28] introduced the idea of constructing metagenomic libraries from an environment related to a gene of interest, functional and sequence-based metagenomics have been shown to be effective in the identification of novel genes that confer resistance to extreme conditions, enzymes, antibiotics, and other bioactive molecules derived from a variety of environments (Table 1) [107–111]. Furthermore, functional metagenomic approaches made it possible to identify novel biological parts with specific activities without the need for isolation and cultivation of microorganisms.

In this context, a report from Ferrer and collaborators [112] should be highlighted. The authors constructed a metagenomic library of DNA from a cow rumen in a phage lambda vector and performed functional screenings for different carbohydrate-active enzymes. Considering that cow rumen microorganisms are specialized in degradation of cellulosic plant material, the sample used for library construction should be enriched in biomass-degrading genes. The success of the approach was shown by the high rate of recovery of clones with different hydrolytic activities (22 clones), among them 9 esterases, 12 endo-β-1,4-glucanases, and 1 cyclodextrinase. Moreover, after DNA sequencing analysis of the enzymes, 8 could not be found in any sequenced genomic data, revealing that 36% of the recovered enzymes were completely new.

Other interesting works related to exploring functional genes in leaf-cutter ant fungus gardens were carried out to determine enzymes and pathways involved in symbiosis between leaf-cutter ants and their cultivar. By using metagenomic approaches, authors have determined the microbial

Table 1: Genes discovered through metagenomic approaches with high biotechnological potential.

| Function/gene target | DNA source | Library size | Screening method∗ | Number of hits found | Biotechnological relevance | Reference |
|---|---|---|---|---|---|---|
| Enzymes | | | | | | |
| Esterases, endo-β-1,4-glucanases, and cyclodextrinase | Cow rumen | 1.1 Gb | Function based | 22 | Eight enzymes (36%) were entirely new | [112] |
| Laccase | Water from South China Sea | 1.4 Gb | Sequencing based | 1 | High chloride resistance and ability to decolorize industrial dyes | [136] |
| Naphthalene dioxygenase | Oil-contaminated soil | 294 Mb | Function based | 2 | Applicable in oil-contaminated soil/water | [137] |
| Oxygenases | Artificially polluted soil | 5.2 Gb | Function based | 29 | Applicable in oil-contaminated soil/water | [126] |
| | Leaf-branch compost | 735 Mb | Function based | 19 | Potential application in polyethylene terephthalate (PET) degradation | [138] |
| Phenol hydroxylases and catechol 2,3-dioxygenases | Wastewater treatment plant | 495 Mb | Function based | 413 | Potential use in aromatic compound degradation | [139] |
| Carboxylesterase | Marine water | ~1.3 Gb | Function based | 95 | Cold-active and salt-resistant enzyme | [140] |
| Cellulase/esterase | Water lakes | 1.86 Gb | Function based | 3 | New cellulase | [141] |
| Cellulase | Soil | Not found | Function based | 1 | Halo- and thermotolerant enzyme | [142] |
| β-Glucosidase | Hydrothermal spring water | Not found | Function based | 1 | Thermotolerant and heat-active enzyme | [143] |
| Lipase/protease/hemolysins/biosurfactants | Slaughterhouse drain | ~884 Mb | Function based | 22 | Antimicrobial activity | [144] |
| Genes that confer resistance to extreme conditions | | | | | | |
| Acid resistance genes | Plankton and rhizosphere from Tinto River | ~2.3 Gb | Function based | 15 | Genes involved in acid resistance | [118] |
| Nickel resistance genes | Rhizosphere of E. andevalensis from Tinto River | 2.15 Gb | Function based | 13 | Genes related to nickel resistance | [117] |
| Salt resistance genes | Brine and rhizosphere from Es Trenc saltern | 2.15 Gb | Function based | 11 | Genes conferring salt resistance | [116] |
| Arsenic resistance genes | Headwater from Tinto River | 151 Mb | Function based | 18 | Genes involved in arsenic resistance | [145] |
| Regulatory sequences | | | | | | |
| Constitutive promoters | Soil from secondary Atlantic Forest | ~500 Mb | Function based | 33 | Use as “biobricks” | [135] |
| Pathways/systems/operons | | | | | | |
| Naphthalene-degrading system | Naphthalene-contaminated groundwater | ~283 Mb | Sequencing based | 3 | Pollutant-degrading enzyme systems | [146] |
| Dioxygenase-degrading cluster | Forest soil | 260–815 bp | Sequencing based | 11 | Degrading phenoxyalkanoic acid (PAA) herbicides avoiding groundwater contamination | [147] |
| NRPS biosynthetic pathway | Tunicate consortium in Florida Keys | ~280 Mb | Sequencing based | 1 | ET-743 biosynthetic pathway; anticancer molecule | [148] |
| Bioactive molecules | | | | | | |
| Pigmentation producing and antibacterial activity | Soil | Not found | Function based | 45 | Potential new molecules to be used as antibiotics | [125] |

Table 1: Continued.

| Function/gene target | DNA source | Library size | Screening method∗ | Number of hits found | Biotechnological relevance | Reference |
|---|---|---|---|---|---|---|
| Turbomycin A and B | Soil | ~1 Gb | Function based | 3 | Antibiotic activity | [16] |
| Antimicrobial small molecules | Soil | ~720 Mb | Function based | 4 | Antibiotic activity | [149] |

∗All genes discovered through sequencing-based methodologies were experimentally tested for their related functions.

composition of the fungus gardens [113, 114] and how the plant biomass-degrading process occurs in this microbial community, revealing a number of novel cellulases involved in it [113–115].

Other relevant metagenomic studies take advantage of the genetic potential of microbes inhabiting extreme environments, such as those with high or low temperatures, salinity, acidity, pressure, radiation, or high concentrations of heavy metals (Table 1) [116–118]. Deciphering the microbial diversity and metabolic activities of microorganisms in extreme conditions reveals the biochemical strategies they use to survive. This, in turn, can be used to expand the survival capability of bacteria used in industrial processes. In addition, those organisms are interesting sources of enzymatic activities with specific and unusual features. Table 1 summarises some of the most noticeable genes that have been discovered until now using the strategies described above.

3. Conclusions and Perspectives

Over the last three decades, since the first studies using the concept of metagenomics, extraordinary advances in the field have been achieved. Collective intelligence from a plethora of experts in diverse fields (such as biologists, biochemists, geneticists, physicists, and computer scientists) was imperative for answering central biological questions and for bringing biotechnological solutions to a myriad of different fields.

Understanding properties such as structure, diversity, richness, and dynamics of microbial communities is essential for unravelling the underlying processes that govern the organization of those systems. However, for a more comprehensive analysis, it is crucial to integrate information from both the microbial community and the environment it is embedded in. The macrodynamics of physical spaces and the interactions between their components directly modulate the microbiological universe (and vice versa). Thus, understanding the physical, chemical, and relational aspects of an environment can provide insightful predictions about its microinhabitants, whereas the reverse process, depicting an environment from a collection of “microbial footprints,” although more challenging, is also attainable [119].

In this context, one of the most proximal models of study is our own body and the built spaces we create and live in. From our bodies to our cities and far away, the Earth is heavily packed with microbes [120–122]. A “reference man” (one who is 70 kilograms, 20–30 years old, and 1.7 meters tall) contains on average about 30 trillion human cells and 39 trillion bacteria [123] and emits bacteria at rates of over a million biological particles per hour [124]. Then, it is daunting to ask: can we understand what “maketh men” through their microbiome? Nowadays, we know that our personal ecosystem of microbes is shed on everything we touch and everyplace we go as “molecular echoes.” Thus, can we trace back an individual lifetime through metagenomics? What about the lifetime of a whole society? In reverse, can we use metagenomics to guide the rational design of novel built environments (indoors and outdoors) for artificially selecting microbial communities, which will ultimately contribute to human health? The answer to all those exciting questions is yes, we can, at least partially, as described in diverse examples throughout this review, which are enabling us to move towards the right answers.

In contrast, despite the notable milestones reached in the field, there are still crucial challenges that need to be faced in order to delineate new conceptual advances in microbial science. Development of novel bioinformatic tools specific for metagenomic analysis is still necessary, since next-generation sequencing platforms are generating an increasing amount of data that is not directly proportional to its biological significance. That is, there is an enormous quantity of information in sequenced data that needs to be transformed into biological understanding. In the near future, processing and analysing huge datasets in an integrated way with data already known will be the main challenge of the field [30].

On the other hand, the success of function-based screenings depends on factors like the size of the gene in metagenomic DNA, its abundance in the samples, the efficiency of the screening method, the host vector system used, and the heterologous expression of the gene. Nowadays, after overcoming some critical limitations related to host-biased screenings, researchers have used alternative hosts besides E. coli to perform the screenings [125–130]. For this, the use of broad host range vectors, able to replicate in different hosts, is required. Although a large collection of broad host range vectors is available [131], we still need to create à la carte vectors specific for some microorganisms that are essential to certain industrial uses. In this sense, strategies involving synthetic biological approaches [132] have been crucial to develop new smart screening methods. Engineered biological circuits (the so-called biosensors) have accelerated the identification of positive activities in metagenomic screenings including millions of clones. Interesting examples are the substrate-induced gene expression (SIGEX) and the product-induced gene expression (PIGEX) approaches [17, 133] or a riboswitch-based selection system initially constructed for mining thiamine uptake functions [134], but generalizable to other compounds. Furthermore, developing new methods for expanding the search space of functional metagenomics from enzymes to novel genetic elements such as regulators, promoters, and cis-regulatory sequences is very important for both mining biological parts and understanding their natural diversity [17, 130, 133, 135].

In summary, current challenges in metagenomics that need to be addressed can be divided into two main groups: (i) the development of novel bioinformatic tools and (ii) the generation of novel molecular tools. The first group comprises the necessity of dealing with the colossal amount of information delivered from novel sequencing approaches, as previously described. Thus, it is imperative to transform the metagenomic information overload into biological understanding. The second challenge is related to the generation of novel molecular approaches such as merging metagenomics and synthetic biology to delineate novel strategies for activity-driven screening. Existing functional screening methods usually have low rates of gene target identification. Therefore, the construction of novel synthetic circuits able to detect enzymatic activities (or other target gene outputs) present in the cloned metagenomic fragments is essential to improve the screening efficiency of metagenomic libraries. In this manner, by combining the collective efforts of specialists to overcome the previously described challenges, it will be possible to integrate emerging concepts and dive deeper into the universe of metagenomics, expanding the current knowledge in a myriad of areas. Shedding light on the “hidden” world of uncultured microorganisms, and its inherent biochemical treasures, shall tell us stories not only about the multitude of Microverses that surround us but also about ourselves.

Conflicts of Interest

The authors have no conflict of interest to declare.

Acknowledgments

The authors are grateful to the anonymous reviewer for her/his careful reading of our manuscript and her/his many insightful comments and suggestions. This work was supported by the National Counsel of Technological and Scientific Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico 472893/2013–0) and by Young Research Awards by the Sao Paulo State Foundation (Fundação de Amparo à Pesquisa do Estado de São Paulo, Award no. 2015/04309–1). Luana de Fátima Alves, Cauã Antunes Westmann, and Tiago Cabral Borelli are beneficiaries of Fundação de Amparo à Pesquisa do Estado de São Paulo fellowships (Award nos. 2016/06323–4, 2016/05472–6, and 2017/20818–9, resp.).

References

[1] N. R. Pace, D. A. Stahl, D. J. Lane, and G. J. Olsen, “The analysis of natural microbial populations by ribosomal RNA sequences,” in Advances in Microbial Ecology, M. K. Cou, Ed., pp. 1–55, Springer, Boston, MA, USA, 1986.
[2] J. Handelsman, M. R. Rondon, S. F. Brady, J. Clardy, and R. M. Goodman, “Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products,” Chemistry & Biology, vol. 5, no. 10, pp. R245–R249, 1998.
[3] R. D. Sleator, C. Shortall, and C. Hill, “Metagenomics,” Letters in Applied Microbiology, vol. 47, no. 5, pp. 361–366, 2008.
[4] J. Handelsman, “Metagenomics: application of genomics to uncultured microorganisms,” Microbiology and Molecular Biology Reviews, vol. 69, no. 1, pp. 195–195, 2005.
[5] S. G. Tringe, C. von Mering, A. Kobayashi et al., “Comparative metagenomics of microbial communities,” Science, vol. 308, no. 5721, pp. 554–557, 2005.
[6] J. M. Vieites, M. E. Guazzaroni, A. Beloqui, P. N. Golyshin, and M. Ferrer, “Metagenomics approaches in systems microbiology,” FEMS Microbiology Reviews, vol. 33, no. 1, pp. 236–255, 2009.
[7] C. Schmeisser, H. Steele, and W. R. Streit, “Metagenomics, biotechnology with non-culturable microbes,” Applied Microbiology and Biotechnology, vol. 75, no. 5, pp. 955–962, 2007.
[8] M. E. Guazzaroni, R. Silva-Rocha, and R. J. Ward, “Synthetic biology approaches to improve biocatalyst identification in metagenomic library screening,” Microbial Biotechnology, vol. 8, no. 1, pp. 52–64, 2015.
[9] M. E. Guazzaroni, A. Beloqui, P. N. Golyshin, and M. Ferrer, “Metagenomics as a new technological tool to gain scientific knowledge,” World Journal of Microbiology and Biotechnology, vol. 25, no. 6, pp. 945–954, 2009.
[10] S. Louca, M. F. Polz, F. Mazel et al., “Function and functional redundancy in microbial systems,” Nature Ecology & Evolution, vol. 2, no. 6, pp. 936–943, 2018.
[11] M. Li, B. Wang, M. Zhang et al., “Symbiotic gut microbes modulate human metabolic phenotypes,” Proceedings of the National Academy of Sciences, vol. 105, no. 6, pp. 2117–2122, 2008.
[12] M. G. I. Langille, J. Zaneveld, J. G. Caporaso et al., “Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences,” Nature Biotechnology, vol. 31, no. 9, pp. 814–821, 2013.
[13] D. E. Hunt, Y. Lin, M. J. Church et al., “Relationship between abundance and specific activity of bacterioplankton in open ocean surface waters,” Applied and Environmental Microbiology, vol. 79, no. 1, pp. 177–184, 2013.
[14] F. Sanger, G. M. Air, B. G. Barrell et al., “Nucleotide sequence of bacteriophage phi X174 DNA,” Nature, vol. 265, no. 5596, pp. 687–695, 1977.
[15] M. Breitbart, I. Hewson, B. Felts et al., “Metagenomic analyses of an uncultured viral community from human feces,” Journal of Bacteriology, vol. 185, no. 20, pp. 6220–6223, 2003.
[16] D. E. Gillespie, S. F. Brady, A. D. Bettermann et al., “Isolation of antibiotics turbomycin A and B from a metagenomic library of soil microbial DNA,” Applied and Environmental Microbiology, vol. 68, no. 9, pp. 4301–4306, 2002.
[17] T. Uchiyama, T. Abe, T. Ikemura, and K. Watanabe, “Substrate-induced gene-expression screening of environmental metagenome libraries for isolation of catabolic genes,” Nature Biotechnology, vol. 23, no. 1, pp. 88–93, 2005.
[18] P. J. Turnbaugh, V. K. Ridaura, J. J. Faith, F. E. Rey, R. Knight, and J. I. Gordon, “The effect of diet on the human gut microbiome: a metagenomic analysis in humanized gnotobiotic mice,” Science Translational Medicine, vol. 1, no. 6, p. 6ra14, 2009.
[19] S. Sunagawa, L. P. Coelho, S. Chaffron et al., “Structure and function of the global ocean microbiome,” Science, vol. 348, no. 6237, article 1261359, 2015.
[20] A. Klindworth, E. Pruesse, T. Schweer et al., “Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies,” Nucleic Acids Research, vol. 41, no. 1, pp. e1–11, 2013.
[21] A. Oulas, C. Pavloudi, P. Polymenakou et al., “Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies,” Bioinformatics and Biology Insights, vol. 9, pp. BBI.S12462–BBI.S12488, 2015.
[22] M. Kircher and J. Kelso, “High-throughput DNA sequencing - concepts and limitations,” BioEssays, vol. 32, no. 6, pp. 524–536, 2010.
[23] E. Le Chatelier, MetaHIT consortium, T. Nielsen et al., “Richness of human gut microbiome correlates with metabolic markers,” Nature, vol. 500, no. 7464, pp. 541–546, 2013.
[24] D. Kao, B. Roach, H. Park et al., “Fecal microbiota transplantation in the management of hepatic encephalopathy,” Hepatology, vol. 63, no. 1, pp. 339–340, 2016.
[25] N. Qin, F. Yang, A. Li et al., “Alterations of the human gut microbiome in liver cirrhosis,” Nature, vol. 513, no. 7516, pp. 59–64, 2014.
[26] H. K. Pedersen, V. Gudmundsdottir, H. B. Nielsen et al., “Human gut microbes impact host serum metabolome and insulin sensitivity,” Nature, vol. 535, no. 7612, pp. 376–381, 2016.
[27] T. M. Schmidt, E. F. DeLong, and N. R. Pace, “Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing,” Journal of Bacteriology, vol. 173, no. 14, pp. 4371–4378, 1991.
[28] F. G. Healy, R. M. Ray, H. C. Aldrich, A. C. Wilkie, L. O. Ingram, and K. T. Shanmugam, “Direct isolation of functional genes encoding cellulases from the microbial consortia in a thermophilic, anaerobic digester maintained on lignocellulose,” Applied Microbiology and Biotechnology, vol. 43, no. 4, pp. 667–674, 1995.
[29] M. R. Rondon, P. R. August, A. D. Bettermann et al., “Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms,” Applied and Environmental Microbiology, vol. 66, no. 6, pp. 2541–2547, 2000.
[30] A. Escobar-Zepeda, A. V.-P. de León, and A. Sanchez-Flores, “The road to metagenomics: from microbiology to DNA sequencing technologies and bioinformatics,” Frontiers in Genetics, vol. 6, 2015.
[31] D. D. Roumpeka, R. J. Wallace, F. Escalettes, I. Fotheringham, and M. Watson, “A review of bioinformatics tools for bio-prospecting from metagenomic sequence data,” Frontiers in Genetics, vol. 8, 2017.
[32] F. P. Breitwieser, J. Lu, and S. L. Salzberg, “A review of methods and databases for metagenomic classification and assembly,” Briefings in Bioinformatics, pp. 1–15, 2017.
[33] C. R. Woese and G. E. Fox, “Phylogenetic structure of the prokaryotic domain: the primary kingdoms,” Proceedings of the National Academy of Sciences, vol. 74, no. 11, pp. 5088–5090, 1977.
[34] J. C. Venter, K. Remington, J. F. Heidelberg et al., “Environmental genome shotgun sequencing of the Sargasso Sea,” Science, vol. 304, no. 5667, pp. 66–74, 2004.
[35] O. Béjà, L. Aravind, E. V. Koonin et al., “Bacterial rhodopsin: evidence for a new type of phototrophy in the sea,” Science, vol. 289, no. 5486, pp. 1902–1906, 2000.
[36] J. A. Klappenbach, J. M. Dunbar, and T. M. Schmidt, “rRNA operon copy number reflects ecological strategies of bacteria,” Applied and Environmental Microbiology, vol. 66, no. 4, pp. 1328–1333, 2000.
[37] E. A. Eloe-Fadrosh, N. N. Ivanova, T. Woyke, and N. C. Kyrpides, “Metagenomics uncovers gaps in amplicon-based detection of microbial diversity,” Nature Microbiology, vol. 1, no. 4, 2016.
[38] C. T. Brown, L. A. Hug, B. C. Thomas et al., “Unusual biology across a group comprising more than 15% of domain Bacteria,” Nature, vol. 523, no. 7559, pp. 208–211, 2015.
[39] R. Gorelick and H. H. Q. Heng, “Sex reduces genetic variation: a multidisciplinary review,” Evolution, vol. 65, no. 4, pp. 1088–1098, 2011.
[40] H. H. Q. Heng, “Elimination of altered karyotypes by sexual reproduction preserves species identity,” Genome, vol. 50, no. 5, pp. 517–524, 2007.
[41] C. Rinke, P. Schwientek, A. Sczyrba et al., “Insights into the phylogeny and coding potential of microbial dark matter,” Nature, vol. 499, no. 7459, pp. 431–437, 2013.
[42] A. Spang, J. H. Saw, S. L. Jørgensen et al., “Complex archaea that bridge the gap between prokaryotes and eukaryotes,” Nature, vol. 521, no. 7551, pp. 173–179, 2015.
[43] J. L. Stein, T. L. Marsh, K. Y. Wu, H. Shizuya, and E. F. DeLong, “Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon,” Journal of Bacteriology, vol. 178, no. 3, pp. 591–599, 1996.
[44] O. Béjà, M. T. Suzuki, E. V. Koonin et al., “Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage,” Environmental Microbiology, vol. 2, no. 5, pp. 516–529, 2000.
[45] O. Béjà, E. N. Spudich, J. L. Spudich, M. Leclerc, and E. F. DeLong, “Proteorhodopsin phototrophy in the ocean,” Nature, vol. 411, no. 6839, pp. 786–789, 2001.
[46] G. W. Tyson, J. Chapman, P. Hugenholtz et al., “Community structure and metabolism through reconstruction of microbial genomes from the environment,” Nature, vol. 428, no. 6978, pp. 37–43, 2004.
[47] E. A. Dinsdale, R. A. Edwards, D. Hall et al., “Functional metagenomic profiling of nine biomes,” Nature, vol. 452, no. 7187, pp. 629–632, 2008.
[48] J. Lloyd-Price, A. Mahurkar, G. Rahnavard et al., “Strains, functions and dynamics in the expanded human microbiome project,” Nature, vol. 550, pp. 61–66, 2017.
[49] J. A. Gilbert, R. A. Quinn, J. Debelius et al., “Microbiome-wide association studies link dynamic microbial consortia to disease,” Nature, vol. 535, no. 7610, pp. 94–103, 2016.
[50] A. L. Kau, P. P. Ahern, N. W. Griffin, A. L. Goodman, and J. I. Gordon, “Human nutrition, the gut microbiome and the immune system,” Nature, vol. 474, no. 7351, pp. 327–336, 2011.
[51] E. A. Archie and J. Tung, “Social behavior and the microbiome,” Current Opinion in Behavioral Sciences, vol. 6, pp. 28–34, 2015.

[52] T. Kuntz and J. Gilbert, “Does the brain listen to the gut?,” eLife, vol. 5, 2016.
[53] A. J. Prussin and L. C. Marr, “Sources of airborne microorganisms in the built environment,” Microbiome, vol. 3, no. 1, p. 78, 2015.
[54] S. W. Kembel, E. Jones, J. Kline et al., “Architectural design influences the diversity and structure of the built environment microbiome,” The ISME Journal, vol. 6, no. 8, pp. 1469–1479, 2012.
[55] M. H. Y. Leung and P. K. H. Lee, “The roles of the outdoors and occupants in contributing to a potential pan-microbiome of the built environment: a review,” Microbiome, vol. 4, no. 1, pp. 21–15, 2016.
[56] T. Thomas, L. Moitinho-Silva, M. Lurgi et al., “Diversity, structure and convergent evolution of the global sponge microbiome,” Nature Communications, vol. 7, 2016.
[57] J. M. Maritz, S. A. Sullivan, R. J. Prill, E. Aksoy, P. Scheid, and J. M. Carlton, “Filthy lucre: a metagenomic pilot study of microbes found on circulating currency in New York City,” PLoS One, vol. 12, no. 4, pp. e0175527–e0175516, 2017.
[58] H. M. Bik, J. M. Maritz, A. Luong, H. Shin, M. G. Dominguez-Bello, and J. M. Carlton, “Microbial community patterns associated with automated teller machine keypads in New York City,” mSphere, vol. 1, no. 6, pp. e00226–e00216, 2016.
[59] R. I. Adams, A. C. Bateman, H. M. Bik, and J. F. Meadow, “Microbiota of the indoor environment: a meta-analysis,” Microbiome, vol. 3, no. 1, p. 49, 2015.
[60] S. G. Tringe, T. Zhang, X. Liu et al., “The airborne metagenome in an indoor urban environment,” PLoS One, vol. 3, no. 4, article e1862, 2008.
[61] E. Afshinnekoo, C. Meydan, S. Chowdhury et al., “Geospatial resolution of human and bacterial diversity with city-scale metagenomics,” Cell Systems, vol. 1, no. 1, pp. 72–87, 2015.
[62] The MetaSUB International Consortium, “The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report,” Microbiome, vol. 4, no. 1, p. 24, 2016.
[63] N. A. Bokulich, Z. T. Lewis, K. Boundy-Mills, and D. A. Mills, “A new perspective on microbial landscapes within food production,” Current Opinion in Biotechnology, vol. 37, pp. 182–189, 2016.
[64] J. M. Lang, D. A. Coil, R. Y. Neches et al., “A microbial survey of the International Space Station (ISS),” PeerJ, vol. 5, article e4029, 2017.
[65] National Academies of Sciences, Engineering, and Medicine, Microbiomes of the Built Environment, National Academies Press, Washington, DC, USA, 2017.
[66] J. S. Griffin, N. Lu, N. Sangwan et al., “Microbial diversity in an intensively managed landscape is structured by landscape connectivity,” FEMS Microbiology Ecology, vol. 93, no. 10, pp. 1–12, 2017.
[67] G. Mhuireach, B. R. Johnson, A. E. Altrichter et al., “Urban greenness influences airborne bacterial community composition,” Science of The Total Environment, vol. 571, pp. 680–687, 2016.
[68] A. T. Reese, A. Savage, E. Youngsteadt et al., “Urban stress is associated with variation in microbial species composition - but not richness - in Manhattan,” The ISME Journal, vol. 10, no. 3, pp. 751–760, 2016.
[69] J. A. Gilbert, “How do we make indoor environments and healthcare settings healthier?,” Microbial Biotechnology, vol. 10, no. 1, pp. 11–13, 2017.
[70] B. Stephens, “What have we learned about the microbiomes of indoor environments?,” mSystems, vol. 1, no. 4, 2016.
[71] S. R. Gill, M. Pop, R. T. DeBoy et al., “Metagenomic analysis of the human distal gut microbiome,” Science, vol. 312, no. 5778, pp. 1355–1359, 2006.
[72] J. W. Arnold, J. Roach, and M. A. Azcarate-Peril, “Emerging technologies for gut microbiome research,” Trends in Microbiology, vol. 24, no. 11, pp. 887–901, 2016.
[73] The Integrative HMP (iHMP) Research Network Consortium, “The integrative human microbiome project: dynamic analysis of microbiome-host omics profiles during periods of human health and disease,” Cell Host & Microbe, vol. 16, no. 3, pp. 276–289, 2014.
[74] The Human Microbiome Project Consortium, “Structure, function and diversity of the healthy human microbiome,” Nature, vol. 486, no. 7402, pp. 207–214, 2012.
[75] MetaHIT Consortium, J. Qin, R. Li et al., “A human gut microbial gene catalogue established by metagenomic sequencing,” Nature, vol. 464, no. 7285, pp. 59–65, 2010.
[76] F. Bäckhed, R. E. Ley, J. L. Sonnenburg, D. A. Peterson, and J. I. Gordon, “Host-bacterial mutualism in the human intestine,” Science, vol. 307, no. 5717, pp. 1915–1920, 2005.
[77] P. C. Watts, K. R. Buley, S. Sanderson, W. Boardman, C. Ciofi, and R. Gibson, “Parthenogenesis in Komodo dragons,” Nature, vol. 444, no. 7122, pp. 1021–1022, 2006.
[78] R. E. Ley, P. J. Turnbaugh, S. Klein, and J. I. Gordon, “Microbial ecology: human gut microbes associated with obesity,” Nature, vol. 444, no. 7122, pp. 1022–1023, 2006.
[79] P. J. Turnbaugh, R. E. Ley, M. A. Mahowald, V. Magrini, E. R. Mardis, and J. I. Gordon, “An obesity-associated gut microbiome with increased capacity for energy harvest,” Nature, vol. 444, no. 7122, pp. 1027–1031, 2006.
[80] K. Forslund, S. Sunagawa, J. R. Kultima et al., “Country-specific antibiotic use practices impact the human gut resistome,” Genome, vol. 23, no. 7, pp. 1163–1169, 2013.
[81] K. Forslund, S. Sunagawa, L. P. Coelho, and P. Bork, “Metagenomic insights into the human gut resistome and the forces that shape it,” BioEssays, vol. 36, no. 3, pp. 316–329, 2014.
[82] T. S. Ghosh, S. S. Gupta, G. B. Nair, and S. S. Mande, “In silico analysis of antibiotic resistance genes in the gut microflora of individuals from diverse geographies and age-groups,” PLoS One, vol. 8, no. 12, article e83823, 2013.
[83] F. Raymond, A. A. Ouameur, M. Déraspe et al., “The initial state of the human gut microbiome determines its reshaping by antibiotics,” The ISME Journal, vol. 10, no. 3, pp. 707–720, 2016.
[84] R. K. Singh, H.-W. Chang, D. Yan et al., “Influence of diet on the gut microbiome and implications for human health,” Journal of Translational Medicine, vol. 15, no. 1, pp. 73–17, 2017.
[85] A. Lundin, C. M. Bok, L. Aronsson et al., “Gut flora, Toll-like receptors and nuclear receptors: a tripartite communication that tunes innate immunity in large intestine,” Cellular Microbiology, vol. 10, no. 5, pp. 1093–1103, 2008.
[86] Y. K. Lee and S. K. Mazmanian, “Has the microbiota played a critical role in the evolution of the adaptive immune system?,” Science, vol. 330, no. 6012, pp. 1768–1773, 2010.

[87] Y. Belkaid and T. W. Hand, “Role of the microbiota in immunity and inflammation,” Cell, vol. 157, no. 1, pp. 121–141, 2014.
[88] M. C. Noverr and G. B. Huffnagle, “Does the microbiota regulate immune responses outside the gut?,” Trends in Microbiology, vol. 12, no. 12, pp. 562–568, 2004.
[89] J. R. Allegretti, J. R. Korzenik, and M. J. Hamilton, “Fecal microbiota transplantation via colonoscopy for recurrent C. difficile infection,” Journal of Visualized Experiments, no. 94, article e52154, 2014.
[90] S. Gupta, E. Allen-Vercoe, and E. O. Petrof, “Fecal microbiota transplantation: in perspective,” Therapeutic Advances in Gastroenterology, vol. 9, no. 2, pp. 229–239, 2015.
[91] P. Seksik, L. Rigottier-Gois, G. Gramet et al., “Alterations of the dominant faecal bacterial groups in patients with Crohn’s disease of the colon,” Gut, vol. 52, no. 2, pp. 237–242, 2003.
[92] J. U. Scher, A. Sczesnak, R. S. Longman et al., “Expansion of intestinal Prevotella copri correlates with enhanced susceptibility to arthritis,” eLife, vol. 2, 2013.
[93] L. Zhao, “The gut microbiota and obesity: from correlation to causality,” Nature Reviews Microbiology, vol. 11, no. 9, pp. 639–647, 2013.
[94] C. Menni, M. A. Jackson, T. Pallister, C. J. Steves, T. D. Spector, and A. M. Valdes, “Gut microbiome diversity and high-fibre intake are related to lower long-term weight gain,” International Journal of Obesity, vol. 41, no. 7, pp. 1099–1105, 2017.
[95] M. Knip and H. Siljander, “The role of the intestinal microbiota in type 1 diabetes mellitus,” Nature Reviews Endocrinology, vol. 12, no. 3, pp. 154–167, 2016.
[96] J. L. Han and H. L. Lin, “Intestinal microbiota and type 2 diabetes: from mechanism insights to therapeutic perspective,” World Journal of Gastroenterology, vol. 20, no. 47, pp. 17737–17745, 2014.
[97] J. Qin, Y. Li, Z. Cai et al., “A metagenome-wide association study of gut microbiota in type 2 diabetes,” Nature, vol. 490, no. 7418, pp. 55–60, 2012.
[98] C. M. Velicer, S. R. Heckbert, J. W. Lampe, J. D. Potter, C. A. Robertson, and S. H. Taplin, “Antibiotic use in relation to the risk of breast cancer,” JAMA, vol. 291, no. 7, pp. 827–835, 2004.
[99] H. T. Sørensen, M. V. Skriver, S. Friis, J. K. McLaughlin, W. J. Blot, and J. A. Baron, “Use of antibiotics and risk of breast cancer: a population-based case–control study,” British Journal of Cancer, vol. 92, no. 3, pp. 594–596, 2005.
[100] Z. Wang, E. Klipfell, B. J. Bennett et al., “Gut flora metabolism of phosphatidylcholine promotes cardiovascular disease,” Nature, vol. 472, no. 7341, pp. 57–63, 2011.
[101] J. Yu, Q. Feng, S. H. Wong et al., “Metagenomic analysis of faecal microbiome as a tool towards targeted non-invasive biomarkers for colorectal cancer,” Gut, vol. 66, no. 1, pp. 70–78, 2016.
[102] Q. Liang, J. Chiu, Y. Chen et al., “Fecal bacteria act as novel biomarkers for noninvasive diagnosis of colorectal cancer,” Clinical Cancer Research, vol. 23, no. 8, pp. 2061–2070, 2017.
[103] V. Pascal, M. Pozuelo, N. Borruel et al., “A microbial signature for Crohn’s disease,” Gut, vol. 66, no. 5, pp. 813–822, 2017.
[104] R. Loomba, V. Seguritan, W. Li et al., “Gut microbiome-based metagenomic signature for non-invasive detection of advanced fibrosis in human nonalcoholic fatty liver disease,” Cell Metabolism, vol. 25, no. 5, pp. 1054–1062.e5, 2017.
[105] S. J. Yaung, L. Deng, N. Li et al., “Improving microbial fitness in the mammalian gut by in vivo temporal functional metagenomics,” Molecular Systems Biology, vol. 11, no. 3, p. 788, 2015.
[106] L. Fernández-Arrojo, M. E. Guazzaroni, N. López-Cortés, A. Beloqui, and M. Ferrer, “Metagenomic era for biocatalyst identification,” Current Opinion in Biotechnology, vol. 21, no. 6, pp. 725–733, 2010.
[107] M. Ferrer, A. Ghazi, A. Beloqui et al., “Functional metagenomics unveils a multifunctional glycosyl hydrolase from the family 43 catalysing the breakdown of plant polymers in the calf rumen,” PLoS One, vol. 7, no. 6, article e38134, 2012.
[108] M. V. Del Pozo, L. Fernández-Arrojo, J. Gil-Martínez et al., “Microbial β-glucosidases from cow rumen metagenome enhance the saccharification of lignocellulose in combination with commercial cellulase cocktail,” Biotechnology for Biofuels, vol. 5, no. 1, pp. 73–13, 2012.
[109] C. Thompson, W. Beys-Da-Silva, L. Santi et al., “A potential source for cellulolytic enzyme discovery and environmental aspects revealed through metagenomics of Brazilian mangroves,” AMB Express, vol. 3, no. 1, p. 65, 2013.
[110] C. Yang, Y. Xia, H. Qu et al., “Discovery of new cellulases from the metagenome by a metagenomics-guided strategy,” Biotechnology for Biofuels, vol. 9, no. 1, p. 138, 2016.
[111] S. Courtois, C. M. Cappellano, M. Ball et al., “Recombinant environmental libraries provide access to microbial diversity for drug discovery from natural products,” Applied and Environmental Microbiology, vol. 69, no. 1, pp. 49–55, 2003.
[112] M. Ferrer, O. V. Golyshina, T. N. Chernikova et al., “Novel hydrolase diversity retrieved from a metagenome library of bovine rumen microflora,” Environmental Microbiology, vol. 7, no. 12, pp. 1996–2010, 2005.
[113] F. O. Aylward, K. E. Burnum, J. J. Scott et al., “Metagenomic and metaproteomic insights into bacterial communities in leaf-cutter ant fungus gardens,” The ISME Journal, vol. 6, no. 9, pp. 1688–1701, 2012.
[114] G. Suen, J. J. Scott, F. O. Aylward, and C. R. Currie, “The microbiome of leaf-cutter ant fungus gardens,” in Handbook of Molecular Microbial Ecology II: Metagenomics in Different Habitats, F. J. de Bruijn, Ed., John Wiley & Sons, Inc., Hoboken, NJ, USA, 2011.
[115] F. O. Aylward, K. E. Burnum-Johnson, S. G. Tringe et al., “Leucoagaricus gongylophorus produces diverse enzymes for the degradation of recalcitrant plant polymers in leaf-cutter ant fungus gardens,” Applied and Environmental Microbiology, vol. 79, no. 12, pp. 3770–3778, 2013.
[116] S. Mirete, M. R. Mora-Ruiz, M. Lamprecht-Grandío, C. G. de Figueras, R. Rosselló-Móra, and J. E. González-Pastor, “Salt resistance genes revealed by functional metagenomics from brines and moderate-salinity rhizosphere within a hypersaline environment,” Frontiers in Microbiology, vol. 6, 2015.
[117] S. Mirete, C. G. De Figueras, and J. E. González-Pastor, “Novel nickel resistance genes from the rhizosphere metagenome of plants adapted to acid mine drainage,” Applied and Environmental Microbiology, vol. 73, no. 19, pp. 6001–6011, 2007.
[118] M. E. Guazzaroni, V. Morgante, S. Mirete, and J. E. González-Pastor, “Novel acid resistance genes from the metagenome of the Tinto River, an extremely acidic environment,” Environmental Microbiology, vol. 15, no. 4, pp. 1088–1102, 2013.
[119] D. R. Garza, M. C. van Verk, M. A. Huynen, and B. E. Dutilh, “Towards predicting the environmental metabolome from metagenomics with a mechanistic model,” Nature Microbiology, vol. 3, no. 4, pp. 456–460, 2018.
[120] A. P. Alivisatos, M. J. Blaser, E. L. Brodie et al., “A unified initiative to harness Earth’s microbiomes,” Science, vol. 350, no. 6260, pp. 507–508, 2015.
[121] L. R. Thompson, J. G. Sanders, D. McDonald et al., “A communal catalogue reveals Earth’s multiscale microbial diversity,” Nature, vol. 551, no. 7681, pp. 457–463, 2017.
[122] C. Schmidt, “Living in a microbial world,” Nature Biotechnology, vol. 35, no. 5, pp. 401–403, 2017.
[123] R. Sender, S. Fuchs, and R. Milo, “Revised estimates for the number of human and bacteria cells in the body,” PLoS Biology, vol. 14, no. 8, pp. e1002533–e1002514, 2016.
[124] J. F. Meadow, A. E. Altrichter, A. C. Bateman et al., “Humans
Applied and Environmental Microbiology, vol. 76, no. 21, pp. 7029–7035, 2010.
[134] H. J. Genee, A. P. Bali, S. D. Petersen et al., “Functional mining of transporters using synthetic selections,” Nature Chemical Biology, vol. 12, no. 12, pp. 1015–1022, 2016.
[135] C. A. Westmann, L. d. F. Alves, R. Silva-Rocha, and M.-E. Guazzaroni, “Mining novel constitutive promoter elements in soil metagenomic libraries in Escherichia coli,” Frontiers in Microbiology, vol. 9, 2018.
[136] Z. Fang, T. Li, Q. Wang et al., “A bacterial laccase from marine microbial metagenome exhibiting chloride tolerance and dye decolorization ability,” Applied Microbiology and Biotechnology, vol. 89, no. 4, pp. 1103–1110, 2011.
[137] A. Ono, R. Miyazaki, M. Sota, Y. Ohtsubo, Y. Nagata, and M. Tsuda, “Isolation and characterization of naphthalene-catabolic genes and plasmids from oil-contaminated soil by using two cultivation-independent approaches,” Applied Microbiology and Biotechnology, vol. 74, no. 2, pp.
501 510, differ in their personal microbial cloud,” Peer J, vol. 3, article 2007. e1258, 2015. [138] S. Sulaiman, S. Yamato, E. Kanaya et al., “Isolation of a novel [125] J. W. Craig, F.-Y. Chang, J. H. Kim, S. C. Obiajulu, and S. F. homolog with polyethylene terephthalate-degrading Brady, “Expanding small-molecule functional metagenomics activity from leaf-branch compost by using a metagenomic ” through parallel screening of broad-hostrRange cosmid envi- approach, Applied and Environmental Microbiology, vol. 78, – ronmental DNA libraries in diverse proteobacteria,” Applied no. 5, pp. 1556 1562, 2012. and Environmental Microbiology, vol. 76, no. 5, pp. 1633– [139] C. C. Silva, H. Hayden, T. Sawbridge et al., “Identification of 1641, 2010. genes and pathways related to phenol degradation in metage- fi ” [126] H. Nagayama, T. Sugawara, R. Endo et al., “Isolation of oxy- nomic libraries from petroleum re nery wastewater, PLoS – genase genes for indigo-forming activity from an artificially One, vol. 8, no. 4, pp. e61811 e61811, 2013. polluted soil metagenome by functional screening using [140] A. Tchigvintsev, H. Tran, A. Popovic et al., “The environment Pseudomonas putida strains as hosts,” Applied Microbiology shapes microbial enzymes: five cold-active and salt-resistant and Biotechnology, vol. 99, no. 10, pp. 4453–4470, 2015. carboxylesterases from marine metagenomes,” Applied – [127] B. Leis, A. Angelov, M. Mientus et al., “Identification of novel Microbiology and Biotechnology, vol. 99, no. 5, pp. 2165 esterase-active enzymes from hot environments by use of the 2178, 2015. host bacterium Thermus thermophilus, ” Frontiers in Micro- [141] H. C. Rees, S. Grant, B. Jones, W. D. Grant, and S. Heaphy, biology, vol. 6, 2015. “Detecting cellulase and esterase enzyme activities encoded ” [128] A. Angelov, M. Mientus, S. Liebl, and W. 
Liebl, “A two-host by novel genes present in environmental DNA libraries, – fosmid system for functional screening of (meta) genomic Extremophiles, vol. 7, no. 5, pp. 415 421, 2003. libraries from extreme thermophiles,” Systematic and Applied [142] R. Garg, R. Srivastava, V. Brahma, L. Verma, S. Karthikeyan, Microbiology, vol. 32, no. 3, pp. 177–185, 2009. and G. Sahni, “Biochemical and structural characterization of ” [129] T. Aakvik, K. F. Degnes, R. Dahlsrud et al., “A plasmid RK2- a novel halotolerant cellulase from soil metagenome, Scien- fi – based broad-host-range cloning vector useful for transfer of ti c Reports, vol. 6, no. 1, pp. 1 15, 2016. metagenomic libraries to a variety of bacterial species,” FEMS [143] C. Schröder, S. Elleuche, S. Blank, and G. Antranikian, “Char- Microbiology Letters, vol. 296, no. 2, pp. 149–158, 2009. acterization of a heat-active archaeal β-glucosidase from a ” [130] N. I. Johns, A. L. C. Gomes, S. S. Yim et al., “Metagenomic hydrothermal spring metagenome, Enzyme and Microbial – mining of regulatory elements enables programmable Technology, vol. 57, pp. 48 54, 2014. species-selective gene expression,” Nature Methods, vol. 15, [144] S. Thies, S. C. Rausch, F. Kovacic et al., “Metagenomic discov- no. 5, pp. 323–329, 2018. ery of novel enzymes and biosurfactants in a slaughterhouse fi ” fi [131] R. Silva-Rocha, E. Martínez-García, B. Calles et al., “The bio lm microbial community, Scienti c Reports, vol. 6, – standard European vector architecture (SEVA): a coherent no. 1, pp. 1 12, 2016. platform for the analysis and deployment of complex prokary- [145] V. Morgante, S. Mirete, C. G. de Figueras, M. Postigo Cacho, otic phenotypes,” Nucleic Acids Research, vol. 41, no. D1, and J. E. González-Pastor, “Exploring the diversity of arsenic pp. D666–D675, 2013. resistance genes from acid mine drainage microorganisms,” – [132] L. D. F. Alves, R. Silva-Rocha, and M.-E. Guazzaroni, Environmental Microbiology, vol. 17, no. 6, pp. 
1910 1925, “Enhancing metagenomic approaches through synthetic 2015. biology,” in Functional Metagenomics: Tools and Applications [146] Y. Wang, Y. Chen, Q. Zhou et al., “A culture-independent María-Eugenia Guazzaroni, T. C. Charles, M. R. Liles, and A. approach to unravel uncultured bacteria and functional genes Sessitsch, Eds., pp. 1–14, Springer International Publishing, in a complex microbial community,” PLoS One, vol. 7, no. 10, Berlin, 1st edition, 2017. article e47530, 2012. [133] T. Uchiyama and K. Miyazaki, “Product-induced gene [147] A. Zaprasis, Y. J. Liu, S. J. Liu, H. L. Drake, and M. A. expression, a product-responsive reporter assay used to Horn, “Abundance of novel and diverse tfdA-like genes, screen metagenomic libraries for enzyme-encoding genes,” encoding putative phenoxyalkanoic acid herbicide-degrading International Journal of Genomics 15

dioxygenases, in soil,” Applied and Environmental Microbiol- ogy, vol. 76, no. 1, pp. 119–128, 2009. [148] C. M. Rath, B. Janto, J. Earl et al., “Meta-omic characteriza- tion of the marine invertebrate microbial consortium that produces the chemotherapeutic natural product ET-743,” ACS Chemical Biology, vol. 6, no. 11, pp. 1244–1256, 2011. [149] M. N. IA, C. L. Tiong, C. Minor et al., “Expression and isola- tion of antimicrobial small molecules from soil DNA librar- ies,” Journal of Molecular Microbiology and Biotechnology, vol. 3, no. 2, pp. 301–308, 2001. Hindawi International Journal of Genomics Volume 2018, Article ID 1652567, 12 pages https://doi.org/10.1155/2018/1652567

Review Article
Protein Engineering Strategies to Expand CRISPR-Cas9 Applications

Lucas F. Ribeiro,1 Liliane F. C. Ribeiro,2 Matheus Q. Barreto,2 and Richard J. Ward3

1Department of Biology, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, University of São Paulo, São Paulo, SP, Brazil
2Department of Biochemistry and Immunology, Faculdade de Medicina de Ribeirão Preto, University of São Paulo, São Paulo, SP, Brazil
3Department of Chemistry, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, University of São Paulo, São Paulo, SP, Brazil

Correspondence should be addressed to Lucas F. Ribeiro; [email protected]

Received 21 March 2018; Accepted 6 June 2018; Published 2 August 2018

Academic Editor: Raul A. Platero

Copyright © 2018 Lucas F. Ribeiro et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The development of precise and modulated methods for the customized manipulation of DNA is an important objective for the study and engineering of biological processes and is essential for the optimization of gene therapy, metabolic flux, and synthetic gene networks. The clustered regularly interspaced short palindromic repeat- (CRISPR-) associated protein 9 (Cas9) is an RNA-guided, site-specific DNA-binding complex that can be reprogrammed to interact specifically with a desired DNA target sequence. CRISPR-Cas9 has been used in a wide variety of applications ranging from basic science to the clinic, such as gene therapy, gene regulation, epigenome modification, and chromosome imaging. Although Cas9 has been successfully used as a precise tool in all these applications, some limitations have also been reported, for instance (i) a strict dependence on a protospacer-adjacent motif (PAM) sequence, (ii) aberrant off-target activity, (iii) the large size of Cas9, which is problematic for CRISPR delivery, and (iv) a lack of modulation of protein binding and activity, which is crucial for precise spatiotemporal control of gene expression or genome editing. These obstacles hinder the use of CRISPR for disease treatment and in wider biotechnological applications. Protein-engineering approaches offer solutions to overcome the limitations of Cas9 and generate robust and efficient tools for customized DNA manipulation. Here, recent protein-engineering approaches for expanding the versatility of the Streptococcus pyogenes Cas9 (SpCas9) are reviewed, with an emphasis on studies that improve or develop novel protein functions through domain fusion or splitting, rational design, and directed evolution.

1. Introduction

Progress in genetic engineering, metabolic engineering, and synthetic biology will be determined by the development of versatile, user-friendly technologies for the precise and efficient manipulation of cells. The clustered regularly interspaced short palindromic repeat- (CRISPR-) associated protein 9 (CRISPR-Cas9) system is a particularly attractive tool for gene/base editing, gene regulation studies, epigenetic modulation, genome imaging, and manipulation of chromatin topology [1–7]. This technology has attracted considerable attention due to its simplicity for targeting and modifying a specific DNA sequence. Use of the Cas9 protein circumvents the requirement of other DNA-binding proteins, such as zinc finger nucleases (ZFNs) and transcription activator-like effectors (TALEs), to be reengineered each time in order to bind different target DNA sequences. In CRISPR-Cas9, the element that specifies the DNA target is not the protein itself but a single-guide RNA (sgRNA) molecule, which is straightforward to design and to synthesize [8]. The sgRNA-Cas9 complex binds to the DNA to create a double-stranded break. Cleavage by the sgRNA-Cas9 requires both sequence complementarity between the sgRNA (spacer sequence) and the target DNA sequence (“protospacer”) as well as the presence of an appropriate protospacer-adjacent motif (PAM) sequence at the 3′ end of the protospacer sequence [9]. CRISPR-Cas9 is part of the adaptive immune system in prokaryotes, and the PAM sequence allows this immune system to distinguish between self and nonself DNA targets [10]. In addition, there is significant variation in PAM specificity between Cas9 orthologs [11], which extends the applications of Cas9 to complex systems that require orthogonality or multiplexing [12].

The Cas9 enzyme is an endonuclease found in some bacteria and Archaea [13–15]. The Cas9 of Streptococcus pyogenes (SpCas9) is the best characterized and most widely applied for DNA sequence manipulation. SpCas9 comprises 1368 amino acids and is organized in multiple domains, each with a distinct function (Figure 1(a)) [16]. The HNH-like and RuvC-like domains have nuclease activity and are so named because they present sequence similarity to other nucleases. The HNH-like domain cleaves the DNA strand that is complementary to the sgRNA (target strand), and the RuvC-like domain cleaves the noncomplementary DNA strand (nontarget strand) (Figure 1(b)) [8, 17, 18]. Alanine substitution of a key residue in the RuvC domain (D10A) produces a nick in the target strand, while the H840A mutation of the HNH domain nicks only the nontarget strand. The double D10A/H840A mutant eliminates nuclease activity in both domains, producing a catalytically inactive, or “dead,” Cas9 (dCas9) [8]. Nevertheless, dCas9 retains the ability of Cas9 to bind specifically to a target DNA sequence.

The structure of Cas9 presents two lobes, an α-helical recognition lobe (REC) and a nuclease lobe (NUC) (Figures 1(a) and 1(b)) [16]. The REC lobe is comprised of three α-helical domains and has no structural similarity to any other known protein. The NUC lobe includes the RuvC and HNH nuclease domains and the C-terminal domain (CTD). The two lobes are connected by two linker sequences, the arginine-rich bridge helix (Arg) and the disordered linker (DL). The sgRNA-DNA complex is bound at the interface between the two lobes. The CTD contains the PAM-interaction residues necessary for the recognition of the PAM sequence, in which arginine residues at positions 1333 and 1335 play an important role [19]. The apo Cas9 protein (without sgRNA) adopts an inactive conformation, in which the PAM-interaction region is largely disordered [20], and sgRNA binding is a key event for Cas9 activation [20]. When bound to sgRNA, the REC lobe of Cas9 undergoes a large conformational change and the Cas9-sgRNA complex is ready to probe for the target DNA sequence. Single-molecule experiments have demonstrated that the recognition of target DNA by Cas9-sgRNA occurs through three-dimensional collision [21]. The process starts by probing for the correct PAM sequence; if the PAM is not found, the protein rapidly dissociates from the DNA. When the PAM is present, the DNA adjacent to the PAM sequence begins to denature, and subsequent base pairing of the sgRNA forms an RNA:DNA hybrid [21, 22]. DNA cleavage requires perfect complementarity between the sgRNA and 10–12 nucleotides located at the 3′ end of the 20-nt spacer sequence, and this hybrid region is termed the “seed” region [8, 21, 23, 24]. Imperfect base pairing outside the seed region can be tolerated and results in off-target Cas9 activity [25]. This detailed understanding of the structural basis of Cas9 activity is the starting point for improving and expanding the functions and applications of this protein.

Despite its proven potential, the CRISPR-Cas9 system has limitations that restrict its wide use for disease treatment or in other biotechnological applications. These limitations include the strict dependence on a PAM sequence, off-target DNA cleavage, and problems for CRISPR delivery posed by the large size of the protein [26–30]. Protein engineering could not only offer elegant approaches to overcome these limitations but may also be used to improve the kinetic and biophysical properties of the enzyme. Engineered Cas9 could provide robust and efficient tools for DNA manipulation tailored to specified applications. In this review, we will discuss recent protein-engineering approaches to expand Cas9 functions, emphasizing those studies that have improved or created SpCas9 functions through domain fusion or splitting, rational design, and directed evolution.

2. Engineering Cas9 by End-to-End Fusion or Protein Domain Insertion

Protein domains are evolutionarily conserved polypeptide units that generally show independent structural or functional properties. Proteins containing multiple domains represent more than two-thirds of the proteins found in prokaryotes and eukaryotes [31]. Moreover, protein domains can act as structural reservoirs to generate new protein architectures [32]. Bioengineers have used domains as building blocks to generate new proteins with novel biotechnological applications [33–38]. Multidomain proteins can be generated either by end-to-end fusion or by domain insertion. End-to-end fusion consists of creating a peptidyl linkage between the N-terminal residue of one domain and the C-terminal residue of another. Since no prior knowledge of the protein structure is necessary, end-to-end fusion is amongst the most widely used strategies in protein engineering [37], and several studies have used this approach to engineer Cas9 with the aim of manipulating its properties for different applications (Figure 2(a)) [39–42].

Alternatively, multidomain proteins can be formed by domain insertion, in which one domain (insert) is spliced into another domain (acceptor) either at a specific position or by random insertion. Domain insertion can create structural coupling among the combined domains, with the emergence of new functions (Figure 2(b)) [43]. Furthermore, in contrast to end-to-end fusion, in which proteins are linked by a single contact point, two domains fused by domain insertion are linked by peptidyl bonds at two contact points, which can generate more stable structures [36, 44–46]. In the following examples, we discuss current strategies for engineering SpCas9 through end-to-end fusion and domain insertion [43].

2.1. Reducing Off-Target Events. One of the main limitations of the CRISPR-Cas9 system is the high level of off-target DNA-cleavage events [47]. In order to improve specificity, dimerization-dependent Cas9-based nucleases have been created by end-to-end fusion of dCas9 with a dimerization-dependent FokI nuclease domain (Figure 2(a)) [48, 49]. The fusion protein doubled the sequence length required for

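[Figure 1 image. (a) Domain map of SpCas9 with residue positions labeled 1, 56, 94, 712, 718, 765, 780, 906, 918, 1099, and 1368 and asterisks at D10 and H840; domains, in order: RuvC-I, Arg, helical domains I–III, DL, RuvC-II, L-I, HNH, L-II, RuvC-III, CTD; lobes, in order: NUC, REC, NUC. (b) Schematic of Cas9 with sgRNA and target DNA, with the RuvC and HNH cleavage sites labeled. (c) Structure labeled with REC lobe, NUC lobe, sgRNA, and target DNA.]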

Figure 1: Overall organization, structure, and function of CRISPR-associated protein 9 (Cas9) from Streptococcus pyogenes (SpCas9). (a) Schematic representation of the domain organization of SpCas9. Asterisks denote catalytic residues. (b) Cas9 (blue) requires an sgRNA that has a 20 bp region complementary to the target DNA. Cas9 requires two RNA components: CRISPR RNA (crRNA; orange) and transactivating RNA (tracrRNA; green). The sgRNA is a chimeric RNA in which crRNA and tracrRNA are fused through a linker. The PAM sequence (5′-NGG-3′) is shown in yellow and is crucial for binding and cleavage. DNA cleavage is carried out by two different domains: the HNH domain, which cuts the target strand, and the RuvC domain, which cleaves the nontarget strand. (c) Cartoon representation of the crystal structure of SpCas9 (PDB 4UN3). Cas9 domains are colored according to the scheme in (a). Abbreviations: Arg: arginine-rich bridge helix; DL: disordered linker; CTD: C-terminal domain; NUC: nuclease lobe; PAM: protospacer-adjacent motif; REC: recognition lobe.

target DNA recognition and efficient cleavage, resulting in a >140-fold higher specificity in human cells as compared to wild-type Cas9 [49]. However, this strategy restricts the overall targetable sequence space because it requires two Cas9-compatible sequences to be close enough (~15 to 25 bp) for FokI to dimerize and cleave the target DNA. In an alternative strategy to improve Cas9 DNA-binding precision [41], a DNA-binding domain (either a zinc finger protein (ZFP) or a transcription activator-like effector (TALE)) was fused to a Cas9 in which key PAM-interacting residues were mutated to reduce DNA-binding affinity. Three different Cas9-ZFP end-to-end fusion proteins were designed to bind to 12-base-pair sequences, and their activities were tested at previously defined off-target sites. The approach not only dramatically decreased off-target cleavage events, but the Cas9-ZFP chimeras also reduced the size of the engineered protein, which is advantageous for viral delivery systems.
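The pairing constraint described for the FokI-dCas9 fusions (two Cas9-compatible sites roughly 15 to 25 bp apart) can be illustrated with a short Python sketch. It scans a sequence for 20-nt protospacers next to a 5′-NGG-3′ PAM on either strand and lists site pairs whose gap falls in that range. The function names, the "PAM-out" pairing orientation (left half-site on the reverse strand, right half-site on the forward strand), and the demo sequence are assumptions made for illustration and are not taken from the cited studies; practical guide design would also need off-target scoring.

```python
import re

# Forward-strand site: a 20-nt protospacer followed by an NGG PAM.
# Lookahead is used so that overlapping sites are all reported.
FWD = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")
# Reverse-strand site as read on the forward strand: CCN followed by 20 nt.
REV = re.compile(r"(?=CC[ACGT]([ACGT]{20}))")


def forward_sites(seq):
    """Start index of each forward-strand protospacer (its PAM lies 3' of it)."""
    return [m.start() for m in FWD.finditer(seq.upper())]


def reverse_sites(seq):
    """Start index, in forward-strand coordinates, of each reverse-strand protospacer."""
    return [m.start() + 3 for m in REV.finditer(seq.upper())]


def pam_out_pairs(seq, min_gap=15, max_gap=25):
    """Return (left_start, right_start, gap) for site pairs in the assumed
    PAM-out orientation whose protospacers are min_gap..max_gap bp apart."""
    pairs = []
    right_starts = forward_sites(seq)
    for left in reverse_sites(seq):
        left_end = left + 20  # first base after the left protospacer
        for right in right_starts:
            gap = right - left_end
            if min_gap <= gap <= max_gap:
                pairs.append((left, right, gap))
    return pairs


# Hypothetical demo sequence: CCT + 20 nt, an 18-bp spacer, then 20 nt + TGG.
demo = "CCT" + "A" * 20 + "T" * 18 + "A" * 20 + "TGG"
print(pam_out_pairs(demo))  # → [(3, 41, 18)]
```

Tightening `min_gap` and `max_gap` shrinks the list of usable pairs, which is one way to see why requiring two adjacent sites reduces the targetable sequence space.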

[Figure 2 image. (a) Gene fusion of the dCas9 gene with an active-domain gene. Applications and example active domains: DNA cleavage (FokI nuclease); base editing (cytidine or adenine deaminase); inducible heterodimerization (CRY2, CIB1, CIBN, ABI–PYL1, GID1–GAI); gene activation (p65, VPR); gene inhibition (KRAB); epigenetic modifications (DNMT3a, CpG MTase, LSD1); genomic imaging (fluorescent proteins). (b) Gene fusion of the Cas9 or dCas9 gene with an input-domain gene; input domains shown: binding domain, inteins.]

Figure 2: Schematic representations of Cas9 engineering by domain fusion. (a) End-to-end gene fusion approach. A diverse range of new technologies for DNA manipulation can be created by end-to-end fusion of Cas9 with an active domain. These fusions have been used to expand Cas9 applications. (b) Schematic depiction of Cas9 engineering by domain insertion. A switch behaviour can emerge such that the Cas9 protein is regulated by the input domain’s recognition of an input signal. Either binding domains or inteins have been used to generate Cas9 with a switch response. DNA sequences are depicted as lines. A light gray color of the Cas9 indicates that the protein is inactive or less active, while blue represents the active state. The signal that modulates the switch is shown as a black triangle.

In a further attempt to decrease off-target events, SpCas9 was fused to a structurally unstable protein domain, either dihydrofolate reductase (DHFR) or the estrogen receptor (ER50) [42], which targets the fusion protein for rapid proteasomal degradation [50]. Switched systems were created by fusion of DHFR or ER50 to both the N- and C-termini of SpCas9. These fusion proteins bind to the DNA target with full endonuclease activity only in the presence of trimethoprim (TMP) or 4-hydroxytamoxifen (4-HT), small molecules that stabilize DHFR and ER50, respectively. In the absence of TMP or 4-HT, the fused proteins were targeted for degradation by the proteasome [42]. This approach demonstrates that limiting Cas9 action to a short and controlled period may decrease off-target activity.

2.2. Base Editing with Cas9 Fusion Chimeras. Cas9 has also been fused end-to-end with cytidine deaminase [51] and adenine deaminase [52] (Figure 2(a)) in order to develop nucleotide base editors. In both cases, the goal was to efficiently correct point mutations related to diseases without the generation of random insertions and deletions (indels). To achieve this goal, dCas9 was used as a DNA-binding domain and fused to a deaminase. In 2016, Komor et al. created a so-called “base editor” to convert cytidines into uridines within a sequence of five nucleotides located between the protospacer and PAM [51]. A cytidine deaminase from rat known as APOBEC1 (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 1) was fused to the N-terminus of dCas9 using a linker in order to maintain deaminase activity. Iterative optimizations of the linker and chimera were performed, and each step of optimization produced a new “generation” of the construct. The third generation of base editors (BE3) consisted of APOBEC1 linked to the N-terminus of dCas9 with the catalytic His840 restored and a uracil glycosylase inhibitor linked at the Cas9 C-terminus (APOBEC–linker–dCas9(A840H)–UGI). This chimera showed a permanent point mutation correction of up to 75% of total cellular DNA with 1% indel formation [51]. Since the use of this chimera is limited by off-target activity, BE3 has been improved by incorporating four point mutations (N497A, R661A, Q695A, and Q926A) to generate a high-fidelity base editor (HF-BE3) with reduced off-target activity [53]. However, efficient CRISPR-Cas9-based editors require a 5′-NGG-3′ PAM, limiting the sequence space in the genome that can be efficiently targeted. Furthermore, these base editors are only able to efficiently convert C to T within an editing window of ~5 bp, and this window could introduce undesired changes around the target base. In an attempt to address these limitations, SpCas9 in BE3 was replaced by Cas9 from Staphylococcus aureus (SaCas9) to generate APOBEC1–SaCas9n–UGI (SaBE3). In order to obtain a narrower editing window, a series of iterative optimizations were conducted, resulting in base editors with altered PAM specificities (YE1-BE3, EE-BE3, YE2-BE3, or YEE-BE3) [54]. Although mutations in the cytidine deaminase enzyme could narrow the editing window, the variants were not able to discriminate among the cytidines within the window. An alternative base-editing chimera has been described that uses an engineered human APOBEC3a (eA3A) domain in BE3 [55]. The eA3A-BE3 chimera was able to correct a mutation in a human beta-thalassemia promoter with >40-fold higher precision than BE3. Furthermore, eA3A-BE3 had lower off-target activity as compared to BE3 [55].

Using an alternative approach, Gaudelli et al. sought to develop a base editor to convert A:T base pairs to G:C by fusing dCas9 to an adenosine deaminase [52]. However, since there is no enzyme known to deaminate adenine in DNA, the first step was to evolve E. coli TadA, a tRNA adenine deaminase that converts adenine to inosine. Mutations in the vicinity of the D108 residue in TadA were sufficient to introduce adenine deamination activity against DNA substrates. The TadA A106V/D108N mutant was fused to the N-terminus of Cas9 (D10A mutant with nickase activity) [52]. After several rounds of iterative optimization, the evolved fusion proteins were capable of converting A:T base pairs into G:C base pairs in human cells with approximately 50% efficiency, high specificity (~99%), and low rates of indel formation (<0.1%). These studies highlight both the potential and the importance of improving the precision of CRISPR-Cas9 base editors for use in medical therapies.

2.3. Cas9 Chimeras for Controlling Transcriptional Regulation Activated by Light. Several studies have reported the use of CRISPR-Cas9 to control gene expression [40, 56–58]. Two similar systems have been developed by fusion of CRISPR-Cas9 and the light-inducible heterodimerizing cryptochrome 2 (CRY2) together with the cryptochrome-interacting basic helix-loop-helix protein 1 (CIB1) (Figure 2(a)) [40, 56]. Nihongaki et al. fused dCas9 with CIB1 and CRY2 with a transcriptional activator [56]. In this system, dCas9 is expressed and binds to a target DNA sequence guided by the sgRNA. Upon blue light irradiation, CRY2 and CIB1 heterodimerize, and the transcriptional activator is recruited to the target locus, activating gene expression. After optimization, a fusion protein was generated in which the N-terminus of dCas9 was linked to a nuclear localization signal (NLS), and the C-terminus of dCas9 was linked to CIB1 with a truncated C-terminus. The activator includes three NLSs for correct nuclear localization, the photolyase homology region of CRY2 (CRY2PHR), and the activator domain p65; the final pair of fusion proteins, NLS-dCas9-trCIB1 and (NLS)3-CRY2PHR-p65, demonstrated a 31-fold induction. Polstein and Gersbach created a light-activated CRISPR/Cas9 effector (LACE) system in which a different fusion protein combination was used [40]. Optimized LACE consisted of two CIB domains fused to both the N- and C-termini of Cas9 together with a CRY2 fused to a transcriptional activator. This system was used in HEK293T cells to control expression of human IL1RN and presented a 400-fold increase after 30 hours of blue light irradiation.

In addition to light activation, systems for chemical activation of dCas9 to regulate gene expression have been developed [57, 58]. A variant of Cas9 with nuclease activity modulated by the presence of the tamoxifen analogue 4-HT (Figure 3(d)) [58] was based on fusions of the human estrogen receptor 2 (ERT2) domain at either the N- or C-terminus of Cas9, in which the position of the nuclear localization signal (NLS) was varied. The result was a switch protein working as a rapidly inducible conditional genome-editing system, with high DNA-cleavage efficiency after induction with 4-HT and low background in the absence of inducer [58]. Variants of Cas9 responding to both chemical and light inducers have also been developed [57], in which an activator domain (VPR) was fused to six chemical- and light-inducible heterodimerization domains (Figure 2(a)): abscisic acid- (ABA-) inducible ABI–PYL1, gibberellin- (GA-) inducible GID1–GAI, rapamycin-inducible FKBP–FRB, phytochrome-based red-light-inducible PHYB–PIF, cryptochrome-based blue-light-inducible CRY2PHR–CIBN, and light-oxygen-voltage-based blue-light-inducible FKF1–GI [57]. The best results were obtained with ABA- and GA-dimerized VPR–SpdCas9-activated systems (Figure 3(f)), in which EGFP protein expression increased 165- and 94-fold, respectively, upon induction. These efficiencies were similar to those obtained for the VPR-dCas9 direct fusion protein, demonstrating the potential for the use of these chimeras as gene expression regulators. The platform developed in this work was also designed to be used as AND, OR, NAND, and NOR dCas9 logic gates, offering a protein-based alternative to produce a functional output from multiple inputs [57].

2.4. Epigenetic Regulation by Cas9 Chimeras. Cas9 chimeras have also been created with the aim of controlling and manipulating epigenetic modifications (Figure 2(a)) [59–62]. Fusion of a DNA-binding domain to a cytosine DNA methyltransferase has been a common strategy to elucidate the effects of DNA methylation in mammalian cells, in which dCas9 was fused to a DNA methyltransferase domain named DNMT3a [59–63]. In these chimeras, the dCas9 acts as a DNA-binding domain and directs binding to a specific DNA promoter target determined by the sgRNA. The methyltransferase then modulates DNA methylation of the target promoter, resulting in the downregulation of target genes.

In a different strategy, DNMT3a was artificially split, generating N- and C-terminal fragments. Subsequently, the DNMT3a C-terminal fragment was fused to the C-terminus

[Figure 3 image. Panel labels: (a) 4-HT; engineered ligand-dependent intein; intein self-splicing. (b) hν; GFP (pdDronpa). (c) hν; light-oxygen-voltage (LOV) domain. (d) 4-HT; estrogen receptor (ER-LBD). (e) N-intein; C-intein; intein recognition; intein splicing; active Cas9. (f) GA; ABA; VPR; PYL1; ABI; GID1; GAI; AND and OR gates.]

Figure 3: Selected case studies of engineering dCas9 by domain fusion. (a) Intein insertion inactivates dCas9, and in the presence of 4-HT, the inserted intein (red) undergoes self-splicing and restores the active Cas9 structure and function [70]. (b) Insertion of an engineered GFP (pdDronpa) that dimerizes in the dark and prevents DNA binding. On light illumination, pdDronpa dissociates and enables the binding of dCas9 to DNA [71]. (c) In the absence of light, the LOV domain (dark green) maintains a stable dimer that sterically blocks dCas9 function. Light induces LOV dissociation and Cas9 activation [74]. (d) Domain insertion of a ligand-binding domain (LBD) of the estrogen receptor ER leads to an allosteric activation by 4-HT [75]. Rather than domain insertion, an end-to-end fusion with ER-LBD results in nuclear translocation by 4-HT [58]. (e) A split Cas9 composed of two separate fragments is fused to intein sequences that perform self-splicing upon dimerization, leading to fully active Cas9 [83]. (f) The dCas9 fusion with multiple domains can generate a multi-input system and produce logic gates [57]. A VPR–SpdCas9 construct induces gene expression in the presence of gibberellin (GA) OR/AND abscisic acid (ABA). A light gray color of the Cas9 indicates that the protein is inactive or less active while blue represents the active state.

of dCas9 using a 15-amino-acid linker [20]. The dCas9 directed the assembly of methyltransferase fragments at the CpG site, resulting in an efficient (~70%) and predictable DNA methylation [63]. Application of Cas9 fusions for epigenetic manipulation also includes the fusion of a histone demethylase LSD1 (Lys-specific histone demethylase) to Cas9 from Neisseria meningitidis [64]. This chimera was used to target the distal enhancer region of the endogenous transcription factor gene Oct4 in mouse embryonic stem cells, resulting in the repression of Oct4 expression and loss of pluripotency [64]. In a separate study, targeting three endogenous promoters using a chimera in which the C-terminus of a dCas9 was fused to the highly conserved acetyltransferase p300 resulted in acetylation of the histone H3 lysine 27 at the target site and transcriptional activation of target genes [65]. Finally, dCas9 was fused to two families of proteins directly involved in gene silencing through methylation, the KRAB (Krüppel-associated box) that forms a complex with two histone methyltransferases, and the DNMT [59]. This study showed that the combination of dCas9-KRAB with dCas9-DNMT improved silencing efficiency. These examples highlight the versatility of the Cas9 as a tool for silencing transcription and targeting regulatory regions.

2.5. Cas9 Chimera for Genome Imaging. With the aim of simplifying the study of spatial genome organization, fusion of fluorescent protein domains to dCas9 has been applied to the visualization of genomic loci and chromatin spatiotemporal dynamics in live cells (Figure 2(a)) [39, 66, 67]. Although the fusion of single-color protein fluorescent labels (eGFP) to SpdCas9 has been described [39, 66], multiple labels are required to differentiate interchromosomal and intrachromosomal loci within the nucleus. The design of multicolor versions of dCas9 from three bacterial orthologs, S. pyogenes, N. meningitidis, and Streptococcus thermophilus, has been reported [67]. Each dCas9 ortholog, targeting the human telomere DNA repeat, was fused to a fluorescent protein (green fluorescent protein (GFP), red fluorescent protein (RFP), or blue fluorescent protein (BFP)). The construction NLS-dCas9-(NLS)1–3-(XFP)3, in which XFP means GFP, RFP, or BFP, permitted the estimation of the intranuclear distance between loci in different chromosomes and the linear distance between two loci in the same chromosome, allowing assessment of the DNA compaction in these regions in a live cell. The development of these tools may enable the study of the 4D nucleome and the regulation of gene expression in different eukaryotic cell types and at various stages of development and differentiation [67].

2.6. Cas9 Chimeras Created by Domain Insertion. A limitation of many Cas9-derived systems is that their activity is not directly modulated, and domain insertion has been used to create new functions in Cas9, including inducible response. Fine tuning of the Cas9 is of key importance to reduce the off-target activity and to allow spatiotemporal control of the protein. Inducible Cas9 can be created by fusing two domains in such a way that the Cas9 activity is regulated by the recognition of an input signal by a sensor domain. Inteins (intervening proteins) are proteins that perform protein-splicing posttranslational modifications [68]. With the goal of creating a regulated Cas9, SpCas9 has been recombined with the intein 37R3-2 [69, 70]. Since inteins are self-excised from the “immature” polypeptide, producing a functional mature protein, the insertion of an intein into the Cas9 may lead to its inactivation, and its excision may result in Cas9 activation. The 37R3-2 intein was engineered to perform protein splicing only in the presence of the chemical 4-HT. Subsequently, the engineered 37R3-2 was inserted at 15 positions of Cas9 distributed throughout its different domains. Insertions at Ala728, Thr995, and Ser1154 did not lead to a significant inactivation of Cas9 in the absence of 4-HT, suggesting that these positions tolerate large protein insertions. Insertions at positions S219 and C574 produced low nuclease activity in the absence of 4-HT and an ~4-fold increase of this activity in the presence of the compound (Figure 3(a)). The on-target/off-target ratio was 6-fold higher on average and up to 25-fold higher when compared to the wild-type Cas9, demonstrating that the insertion of the intein into specific sites of Cas9 was able to increase the protein precision for gene editing. However, a decrease in nonhomologous end joining (NHEJ) efficiency was observed, and in addition, the activation of the Cas9 activity by the intein domain is irreversible.

2.7. Photoregulation of Cas9 Chimeras Created by Domain Insertion. The ideal tool for DNA editing would be an inducible Cas9 that could be readily modulated by the presence of a nontoxic, nonmetabolizable, inexpensive signal. A recent study explored the photomodulated dimerizing protein domains known as pdDronpa [71], a green fluorescent protein which dimerizes in the dark but dissociates to monomers upon illumination with light at 500 nm [72]. A single-chain photoswitchable Cas9 (ps-SpCas9) was engineered by insertion of the pdDronpa protein into Cas9 at two positions, and in the absence of light, the dimerization of the pdDronpa protein blocks the binding of Cas9 to the DNA (Figure 3(b)). The fluorescent protein was inserted into two loops in Cas9 at the REC2 domain and into the CTD domain of the NUC lobe. These loops are situated across the DNA-binding cleft and are occluded by pdDronpa dimerization. Although psCas9 produced an indel level that was lower than that of the parental Cas9 and similar to that of other photoinduced two-component Cas9 systems [73], the ps-SpCas9 functions as a single chain, and the pdDronpa domain can also be used as a localization and expression marker. The ps-SpCas9 was also engineered to regulate gene transcription, for which the mutations D10A and H841A were introduced in Cas9 to create ps-dSpCas9, which was then fused to a VP64-p65-Rta (VPR) transactivation module [57] at the N-terminus. The pdDronpa domain was replaced with a new version of the protein called pdDronpa1.2 with reduced basal activity in the absence of light [72]. This new protein, denominated as VPR-ps-dSpCas9, showed a 58-fold induction of the reporter gene (mCherry) after light illumination. This is higher than an optimized multiple-chain light-activated Cas9 effector (LACE) system [40, 56] and comparable with chemically inducible systems [57].

A recent study also used a photomodulated dimerizing protein known as RsLOV, a photoreceptor from Rhodobacter sphaeroides, which in the dark is a homodimer and upon illumination by blue light dissociates to its monomeric form [74]. The RsLOV was inserted using flexible linkers at 231 positions throughout the dCas9 protein via multiplex-inverse PCR (Figure 3(c)). Chimeras were selected by fluorescence-activated cell sorting (FACS) based on the ability to modulate RFP expression in the presence of blue light (470 nm), and two photoactivatable RsLOV-Cas9 variants demonstrated repression activity enhancement under blue light. A temperature-sensitive variant, denoted as tsRC9, was also isolated and presented activity at 29°C but negligible activity at 37°C [74].

2.8. Exploring the Limits of Insertional Fusion with Cas9. In an elegant approach, Oakes et al. [75] have characterized SpCas9 tolerance to domain insertion by random insertion of an engineered Mu transposon flanked by BsaI endonuclease restriction sites into dCas9. Deep sequencing analysis showed that the transposon was inserted in >70% of all possible amino acid sites of dCas9, and this library was used as a starting point for insertion of the 86 aa PDZ domain. The library containing the plasmid carrying dCas9 and Mu transposon was digested with BsaI, and the transposon was replaced by the BsaI-digested gene encoding the PDZ domain with flanking amino acid linkers. After PDZ insertion, FACS was used to identify dCas9 variants that could repress RFP expression, and 127 positions were identified as tolerant to PDZ insertion. These sites tended to cluster around flexible loops, at solvent-exposed residues and the ends of helices, and preferential sites for domain insertion were found mainly within the helical II recognition lobe (REC), in the RuvC-III region and throughout the CTD domain. Insertions into vital motifs, such as sgRNA-binding grooves, the bridge helix, the PAM-binding pocket, and the DNA/RNA heteroduplex annealing channel, resulted in impaired Cas9 functionality. Subsequently, eight insertion sites were chosen to introduce a SH3 domain, of which six showed repression levels similar to parental dCas9 [75]. Furthermore, 2–4 insertions of multiple PDZ and SH3 domains were introduced into dCas9 at validated insertion sites, and many of these constructs were capable of repressing expression to a degree comparable with the parental dCas9. Moreover, with the aim of developing a switch Cas9, the well-characterized human estrogen receptor-α ligand-binding domain (ER-LBD) was inserted into the naive dCas9 transposition library (Figure 3(d)). The ER-LBD binds 4-HT, and a dCas9 variant carrying an ER-LBD insertion (denominated as darC9:231) capable of modulating gene repression in the presence of 4-HT was identified. The catalytic residues (D10 and H840) were reintroduced in darC9 to produce arC9:231. In the presence of 4-HT, arC9 increased chromosomal cleavage 100- and 24-fold in E. coli and human cells, respectively [75]. These results demonstrate that the domain insertion profiling of Cas9 can generate important functional information, which can be used to facilitate protein engineering.

3. Structure-Based Cas9 Engineering

The crystal structure of Cas9 from S. pyogenes [76] provides the basis for rational engineering and modification of the protein. Using structure-based design principles, it is possible to enhance specific properties of Cas9, reduce nonspecific activity, switch specific regions of the protein, and obtain enzymes that are suitable for specific genome-engineering applications.

Nonspecific cleavage of DNA is the consequence of imperfect complementarity between the RNA guide and a genomic site, leading to off-target gene editing. The nontarget DNA is stabilized by Cas9 through the positively charged groove located between the HNH-, RuvC-, and PAM-interacting residues in the CTD domain. Engineering of this region holds the promise of reducing off-target editing, and the substitution of the 32 positively charged residues in this groove by alanine led to the identification of five residues that decreased off-target cleavage [77]. The combination of these mutations generated three Cas9 variants with normal on-target activity and decreased off-target indel formation. Using a similar approach, alanine substitution of four polar or charged residues located in the same groove resulted in undetectable levels of off-target indels [78]. Structural analysis revealed that these residues participate in nonspecific interactions with the phosphate backbone of the target DNA strand.

The cleavage of DNA by Cas9 is dependent on the recognition of the protospacer-adjacent motif (PAM). The recognition of a specific PAM sequence by Cas9 is one of the limitations of using the wild-type SpCas9. Sequence databases presently contain over 1000 Cas9 orthologs whose different PAM specificities could provide insights to engineer the SpCas9 and alter its specificity [79]. The CTD domain (containing the PAM-interacting motif) from the SpCas9 has been exchanged for that of the orthologous CRISPR-3 Cas9 from Streptococcus thermophilus [76], which altered the recognition of the 5′-NGG-3′ to the 5′-NGGNG-3′ PAM from S. thermophilus.

A limitation of SpCas9 in gene therapy is its size (~4.3 kb), which limits its efficient gene-based delivery via recombinant adeno-associated virus (rAAV) [80, 81]. rAAV is a viral vector that has been successfully used in therapeutic gene editing. However, the size limit of an insert in this vector is ~4.7 kb, which is problematic for packaging SpCas9, sgRNA, and control elements [80, 81]. To reduce the size of Cas9, the effects of deletion of specific Cas9 domains on protein activity were investigated [76]. It was observed that the enzyme with the REC2 domain deleted (Δ175–307) maintained 50% activity as compared to its wild-type counterpart, indicating that this domain is not critical for DNA cleavage and could potentially be targeted to reduce the size of Cas9 [76].

Using another approach, the nuclease (M1 to E57-GSS linker-G729 to D1368) and the α-helical lobes (G56 to S714) of Cas9 were expressed separately [82], and although neither of the two fragments presented activity, both parts readily reassemble to an active form on contact with the sgRNA, albeit with reduced editing efficiency. In another study, the Cas9 coding sequence was divided into two segments, and each was fused to an intein moiety (N-moiety: M1 to E573 and C-moiety: C574 to D1368 or N-moiety: M1–K637 and C-moiety: T638–D1368) [83]. After the sequences were introduced into the cell by a recombinant adeno-associated virus, protein splicing efficiently reconstituted the Cas9 as a single polypeptide with full activity (Figure 3(e)), suggesting a means to use CRISPR for gene therapy via adenovirus release of genetic material. Splitting Cas9 into two segments (N-moiety: D2–V713 and C-moiety: S714–D1368) and fusing each segment to a photodimerization domain (pMag and nMag) [84] resulted in reassociation of the two fragments on irradiation with blue light and restoration of Cas9 structure and activity, suggesting the possibility for optical regulation of Cas9 activity. Based on crystal structure analysis, the Cas9 was split at eleven sites [85], and in each case, the fragments were fused to FKBP or FRB, which dimerize in the presence of rapamycin. In addition, the N-terminal fragment was fused to a nuclear export sequence (NES) and the C-terminal fragment to a nuclear localization sequence (NLS). Thus, split Cas9 can be reassembled upon rapamycin induction and acted in the nucleus without detectable off-target activity. Moreover, a split dCas9 version was fused to the VP64 transactivation domain, and this construction was able to activate transcription of target genes in the presence of rapamycin.

The crystal structure of the Cas9 shows that K866 undergoes substantial conformational change upon sgRNA binding, allowing the proper positioning of the target DNA. With the aim of modulating Cas9, a site-specific photocaged lysine was introduced at this position to create optochemical control of Cas9 [71]. The photocaged lysine at this position deactivated Cas9, and the caging group could be removed through light exposure. The photocaged Cas9 showed minimal background activity in the absence of UV light and reached parental Cas9 levels after light irradiation.
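The size constraint and the split points described in this section can be checked with simple codon arithmetic. The sketch below is illustrative only: the residue numbers and the ~4.7 kb limit are taken from the text, while the function and variable names are hypothetical. It counts codons only, so it ignores stop codons, localization signals, intein moieties, promoters, and the sgRNA cassette; the results are therefore lower bounds, not the ~4.3 kb figure quoted above.

```python
# Illustrative arithmetic only. Residue numbers and the ~4.7 kb limit come from
# the text; the function and variable names are hypothetical.

RAAV_INSERT_LIMIT_BP = 4700  # "~4.7 kb" insert limit quoted for rAAV


def coding_length_bp(first_residue: int, last_residue: int) -> int:
    """Base pairs needed to encode an inclusive residue range (3 bp per codon)."""
    return 3 * (last_residue - first_residue + 1)


# Full-length SpCas9 runs from M1 to D1368 in the residue ranges cited above.
full_length_bp = coding_length_bp(1, 1368)

# REC2 deletion (residues 175-307) and the coding sequence left afterwards.
rec2_bp = coding_length_bp(175, 307)
without_rec2_bp = full_length_bp - rec2_bp

# Intein-split designs: (N-moiety range, C-moiety range) as given in the text.
split_designs = {
    "E573/C574": ((1, 573), (574, 1368)),
    "K637/T638": ((1, 637), (638, 1368)),
}

print(f"full length: {full_length_bp} bp, "
      f"headroom under limit: {RAAV_INSERT_LIMIT_BP - full_length_bp} bp")
print(f"REC2 deletion removes {rec2_bp} bp, leaving {without_rec2_bp} bp")
for name, (n_range, c_range) in split_designs.items():
    n_bp = coding_length_bp(*n_range)
    c_bp = coding_length_bp(*c_range)
    print(f"split {name}: N-moiety {n_bp} bp, C-moiety {c_bp} bp")
```

Under these assumptions, the REC2 deletion saves only a few hundred base pairs, whereas each split design roughly halves the payload per vector, which is consistent with the interest in split Cas9 for rAAV delivery.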

4. Engineering PAM Specificity by Directed Evolution

The sequences recognized by Cas9 are limited by the requirement of a specific protospacer-adjacent motif (PAM) [24]. One strategy to improve CRISPR-Cas9 is by changing PAM specificity using directed evolution techniques [86]. Based on the crystal structure of Cas9 [76], the PAM-interacting domain of SpCas9 was subjected to random mutagenesis, and a library of mutated sequences was screened against the 5′-NGA-3′ PAM-target site. Three Cas9 variants called VQR (D1135V/R1335Q/T1337R), EQR (D1135E/R1335Q/T1337R), and VRER (D1135V/G1218R/R1335E/T1337R) were obtained, whose PAM specificity changed from 5′-NGG-3′ observed in the wild-type Cas9 to 5′-NGA-3′, 5′-NGAG-3′, and 5′-NGCG-3′, respectively, thereby broadening the target range of the Cas9 [86].

This approach is limited because each variant needs to be evolved separately against a potential PAM sequence, and this limitation is exacerbated for Cas9 orthologs that specify longer PAMs. As an alternative, variants with relaxed specificities within the PAM could be evolved [87], and the predicted PI-domain from SaCas9 was randomly mutagenized and tested for PAM specificity, resulting in a Cas9 variant called KKH (E782K/N968K/R1015H) which showed the same DNA-cleavage specificity when compared to its wild-type counterpart (5′-NNGRRT-3′). Nevertheless, the KKH enzyme can also identify nonspecific sites such as 5′-NNARRT-3′, 5′-NNCRRT-3′, and 5′-NNTRRT-3′, which increases its targeting range. The crystal structures of the three variants [76] revealed that PAM-reprogramming of these enzymes is based on a synergistic effect of the mutations in the displacement of the PAM duplex, enabling different nucleotide sequences to be recognized by these new PAMs.

In order to enhance acquisition of spacer sequences flanked by non-NGG PAM motifs, SpCas9 variants were created by error-prone PCR, and after selection, a Cas9 variant I473F was identified with higher specificity for the PAM sequence 5′-NAG-3′ [88]. This variant caused an enhanced immune response against viruses due to higher rates of spacer acquisition, showing the importance of the enzyme not only in DNA-cleavage events but also in gaining new spacer sequences to facilitate the CRISPR immune response in bacteria.

A recent strategy that has been employed to broaden PAM specificity was to use phage-assisted continuous evolution (PACE) to evolve a SpCas9 variant (xCas9) [89]. Using PACE, hundreds of generations of directed evolution could be implemented, and screening used a bacterial one-hybrid selection in which dCas9 was fused to the ω subunit of a bacterial RNA polymerase which on binding to a PAM causes phage propagation. It was proposed that the replication of variants with broader PAM specificity would be favored. After continuous in vivo protein evolution, several variants were enriched in the pool, including R324L, S409I, and M694I, which are located near the DNA-sgRNA interface in the crystal structure. Restoring the catalytic residues showed that the variants were able to cleave the DNA with five PAM sequences (5′-NG-3′, 5′-NNG-3′, 5′-GAA-3′, 5′-GAT-3′, and 5′-CAA-3′), and off-target tests showed a significant off-target activity reduction as compared to the parental SpCas9.

5. Future Perspectives

Cas9 has a tremendous utility for the regulation and modification of complex biological systems. However, overcoming the limitations of the system is paramount to realize its full potential. An ideal Cas9-based tool should bind and/or cleave a single specific target in a complex genome without generating off-targets as side products. In addition, it would be highly desirable to develop Cas9 technologies endowed with precise spatiotemporal control, rapid responses to inducers, lack of toxicity, ease of customization, and high efficiency of in vitro and in vivo delivery. The studies described here showed that over recent years, new ways to improve Cas9 functions have been developed. These protein-engineering efforts have made significant contributions to the improvement of gene therapy, genomic imaging, and the emerging field of synthetic biology.

However, new Cas9-based tools are required in order to overcome a number of features that are still poorly explored. For example, most of the systems described in this review still present low delivery efficiency and off-target activity, are relatively expensive, and are not readily adapted for multiplexing. This suggests that there is still plenty of scope for engineering Cas9. For example, some inducible Cas9 systems use expensive steroids that have short half-lives in solution and relatively high toxicity for both prokaryote and eukaryote cells [90–92]. Engineering Cas9 to respond to a range of inexpensive and nontoxic signals would result in CRISPR applications that are more flexible and economically feasible on a larger scale. In addition, many of these studies are specific to mammalian cells. The adaptability of these tools to other platforms such as bacteria, fungi, and plant cells would greatly increase their biotechnological impact. Moreover, little is known about the allergenic potential of Cas9 in gene therapies in humans, and protein engineering plays a key role in studying and solving this potential issue. Protein engineering of catalytic and biophysical properties is as yet little explored, and Cas9 with improved catalytic efficiency or able to work in extreme conditions could be useful for engineering extremophilic organisms. In addition to Cas9 engineering, sgRNA engineering has also been used to enhance CRISPR functionality, and the combination of both approaches could be used to expand Cas9 applications [4, 93]. Given the rapid development of diverse engineered SpCas9s, it is likely that in the future, a wide range of tailored Cas9, working as simple or multiplex tools, will be available for a wide variety of genome-editing applications, including gene therapy and treatment of disease.
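The PAM specificities described in Section 4 are written with IUPAC degenerate codes (N for any base, R for A or G). As a minimal sketch, the snippet below checks whether a candidate site matches a PAM pattern and estimates how often each pattern would occur at a random position. Several points are assumptions made for illustration: the function names are hypothetical, bases are treated as independent and equally frequent, and the relaxed KKH pattern NNNRRT is simply the union of the four PAMs listed for that variant in the text.

```python
from fractions import Fraction

# IUPAC nucleotide codes needed for the PAMs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}

# PAM patterns (5' to 3') as quoted in the text. NNNRRT for KKH is derived here
# as the union of NNGRRT, NNARRT, NNCRRT, and NNTRRT (an illustrative shorthand).
PAMS = {
    "SpCas9 wild type": "NGG",
    "VQR": "NGA",
    "EQR": "NGAG",
    "VRER": "NGCG",
    "SaCas9 wild type": "NNGRRT",
    "KKH (relaxed)": "NNNRRT",
}


def pam_matches(pattern: str, site: str) -> bool:
    """True if every base of `site` is allowed by the IUPAC code in `pattern`."""
    site = site.upper()
    return len(pattern) == len(site) and all(
        base in IUPAC[code] for code, base in zip(pattern, site)
    )


def random_match_fraction(pattern: str) -> Fraction:
    """Fraction of random sites matching `pattern`, assuming uniform bases."""
    fraction = Fraction(1)
    for code in pattern:
        fraction *= Fraction(len(IUPAC[code]), 4)
    return fraction


for name, pattern in PAMS.items():
    print(f"{name:18s} 5'-{pattern}-3'  {random_match_fraction(pattern)}")
```

Under the uniform-base assumption, relaxing SaCas9 from NNGRRT to NNNRRT raises the expected site frequency from 1/64 to 1/16, a fourfold increase, which gives a rough sense of what "increases its targeting range" means; real genomes have biased base composition, so actual frequencies will differ.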

Abbreviations

SpCas9: Cas9 from Streptococcus pyogenes
SaCas9: Cas9 from Staphylococcus aureus
PAM: Protospacer-adjacent motif
NLS: Nuclear localization signal
CRY2: Light-inducible heterodimerizing cryptochrome 2
CIB1: Calcium- and integrin-binding protein 1
4-HT: 4-Hydroxytamoxifen
VPR: Chimeric activation domain composed of the activation domains VP64, P65, and Rta
pdDronpa: Engineered GFP that dimerizes in the dark
LOV: Photomodulated dimerizing protein
ER-LBD: Ligand-binding domain of the estrogen receptor.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

The authors are grateful for support from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (LFR: 2016/18827-7; LFCR: 2016/20358-5). The authors thank Dr. M. Ostermeier for the helpful insights. The authors sincerely apologize to the authors whose work could not be included in this manuscript due to space limitation.

References

[1] P. Mali, K. M. Esvelt, and G. M. Church, “Cas9 as a versatile tool for engineering biology,” Nature Methods, vol. 10, no. 10, pp. 957–963, 2013.
[2] P. D. Hsu, E. S. Lander, and F. Zhang, “Development and applications of CRISPR-Cas9 for genome engineering,” Cell, vol. 157, no. 6, pp. 1262–1278, 2014.
[3] M. F. Copeland, M. C. Politz, and B. F. Pfleger, “Application of TALEs, CRISPR/Cas and sRNAs as trans-acting regulators in prokaryotes,” Current Opinion in Biotechnology, vol. 29, pp. 46–54, 2014.
[4] R. Barrangou and J. A. Doudna, “Applications of CRISPR technologies in research and beyond,” Nature Biotechnology, vol. 34, no. 9, pp. 933–941, 2016.
[5] R. Barrangou and P. Horvath, “A decade of discovery: CRISPR functions and applications,” Nature Microbiology, vol. 2, article 17092, 2017.
[6] S. L. Morgan, N. C. Mariano, A. Bermudez et al., “Manipulation of nuclear architecture through CRISPR-mediated chromosomal looping,” Nature Communications, vol. 8, 2017.
[7] N. Hao, K. E. Shearwin, and I. B. Dodd, “Programmable DNA looping using engineered bivalent dCas9 complexes,” Nature Communications, vol. 8, no. 1, p. 1628, 2017.
[8] M. Jinek, K. Chylinski, I. Fonfara, M. Hauer, J. A. Doudna, and E. Charpentier, “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity,” Science, vol. 337, no. 6096, pp. 816–821, 2012.
[9] F. J. M. Mojica, C. Díez-Villaseñor, J. García-Martínez, and C. Almendros, “Short motif sequences determine the targets of the prokaryotic CRISPR defence system,” Microbiology, vol. 155, no. 3, pp. 733–740, 2009.
[10] D. Rath, L. Amlinger, A. Rath, and M. Lundgren, “The CRISPR-Cas immune system: biology, mechanisms and applications,” Biochimie, vol. 117, pp. 119–128, 2015.
[11] I. Fonfara, A. le Rhun, K. Chylinski et al., “Phylogeny of Cas9 determines functional exchangeability of dual-RNA and Cas9 among orthologous type II CRISPR-Cas systems,” Nucleic Acids Research, vol. 42, no. 4, pp. 2577–2590, 2014.
[12] K. M. Esvelt, P. Mali, J. L. Braff, M. Moosburner, S. J. Yaung, and G. M. Church, “Orthogonal Cas9 proteins for RNA-guided gene regulation and editing,” Nature Methods, vol. 10, no. 11, pp. 1116–1121, 2013.
[13] F. J. M. Mojica and F. Rodriguez-Valera, “The discovery of CRISPR in archaea and bacteria,” The FEBS Journal, vol. 283, no. 17, pp. 3162–3169, 2016.
[14] F. J. M. Mojica and L. Montoliu, “On the origin of CRISPR-Cas technology: from prokaryotes to mammals,” Trends in Microbiology, vol. 24, no. 10, pp. 811–820, 2016.
[15] D. Burstein, L. B. Harrington, S. C. Strutt et al., “New CRISPR-Cas systems from uncultivated microbes,” Nature, vol. 542, no. 7640, pp. 237–241, 2017.
[16] F. Jiang and J. A. Doudna, “CRISPR–Cas9 structures and mechanisms,” Annual Review of Biophysics, vol. 46, no. 1, pp. 505–529, 2017.
[17] H. Chen, J. Choi, and S. Bailey, “Cut site selection by the two nuclease domains of the Cas9 RNA-guided endonuclease,” The Journal of Biological Chemistry, vol. 289, no. 19, pp. 13284–13294, 2014.
[18] G. Gasiunas, R. Barrangou, P. Horvath, and V. Siksnys, “Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria,” Proceedings of the National Academy of Sciences, vol. 109, no. 39, pp. E2579–E2586, 2012.
[19] C. Anders, O. Niewoehner, A. Duerst, and M. Jinek, “Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease,” Nature, vol. 513, no. 7519, pp. 569–573, 2014.
[20] M. Jinek, F. Jiang, D. W. Taylor et al., “Structures of Cas9 endonucleases reveal RNA-mediated conformational activation,” Science, vol. 343, no. 6176, article 1247997, 2014.
[21] S. H. Sternberg, S. Redding, M. Jinek, E. C. Greene, and J. A. Doudna, “DNA interrogation by the CRISPR RNA-guided endonuclease Cas9,” Nature, vol. 507, no. 7490, pp. 62–67, 2014.
[22] M. D. Szczelkun, M. S. Tikhomirova, T. Sinkunas et al., “Direct observation of R-loop formation by single RNA-guided Cas9 and cascade effector complexes,” Proceedings of the National Academy of Sciences, vol. 111, no. 27, pp. 9798–9803, 2014.
[23] L. Cong, F. A. Ran, D. Cox et al., “Multiplex genome engineering using CRISPR/Cas systems,” Science, vol. 339, no. 6121, pp. 819–823, 2013.
[24] W. Jiang, D. Bikard, D. Cox, F. Zhang, and L. A. Marraffini, “RNA-guided editing of bacterial genomes using CRISPR-Cas systems,” Nature Biotechnology, vol. 31, no. 3, pp. 233–239, 2013.
[25] V. Pattanayak, S. Lin, J. P. Guilinger, E. Ma, J. A. Doudna, and D. R. Liu, “High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity,” Nature Biotechnology, vol. 31, no. 9, pp. 839–843, 2013.

[26] G. J. Gibson and M. Yang, “What rheumatologists need to know about CRISPR/Cas9,” Nature Reviews Rheumatology, vol. 13, no. 4, pp. 205–216, 2017.
[27] J. Luo, “CRISPR/Cas9: from genome engineering to cancer drug discovery,” Trends in Cancer, vol. 2, no. 6, pp. 313–324, 2016.
[28] S. Q. Tsai and J. K. Joung, “Defining and improving the genome-wide specificities of CRISPR-Cas9 nucleases,” Nature Reviews Genetics, vol. 17, no. 5, pp. 300–312, 2016.
[29] J. A. Doudna and C. A. Gersbach, “Genome editing: the end of the beginning,” Genome Biology, vol. 16, no. 1, p. 292, 2015.
[30] D. B. T. Cox, R. J. Platt, and F. Zhang, “Therapeutic genome editing: prospects and challenges,” Nature Medicine, vol. 21, no. 2, pp. 121–131, 2015.
[31] R. Aroul-Selvam, T. Hubbard, and R. Sasidharan, “Domain insertions in protein structures,” Journal of Molecular Biology, vol. 338, no. 4, pp. 633–641, 2004.
[32] M. Wang and G. Caetano-Anollés, “The evolutionary mechanics of domain organization in proteomes and the rise of modularity in the protein world,” Structure, vol. 17, no. 1, pp. 66–78, 2009.
[33] L. F. Ribeiro, T. D. Warren, and M. Ostermeier, “Construction of protein switches by domain insertion and directed evolution,” Methods in Molecular Biology, vol. 1596, pp. 43–55, 2017.
[34] J. Tullman, N. Nicholes, M. R. Dumont, L. F. Ribeiro, and M. Ostermeier, “Enzymatic protein switches built from paralogous input domains,” Biotechnology and Bioengineering, vol. 113, no. 4, pp. 852–858, 2016.
[35] L. F. Ribeiro, N. Nicholes, J. Tullman et al., “Insertion of a xylanase in xylose binding protein results in a xylose-stimulated xylanase,” Biotechnology for Biofuels, vol. 8, no. 1, p. 118, 2015.
[36] L. F. Ribeiro, J. Tullman, N. Nicholes et al., “A xylose-stimulated xylanase-xylose binding protein chimera created by random nonhomologous recombination,” Biotechnology for Biofuels, vol. 9, no. 1, p. 119, 2016.
[37] M. Ostermeier, “Engineering allosteric protein switches by domain insertion,” Protein Engineering, Design and Selection, vol. 18, no. 8, pp. 359–364, 2005.
[38] V. Stein and K. Alexandrov, “Synthetic protein switches: design principles and applications,” Trends in Biotechnology, vol. 33, no. 2, pp. 101–110, 2015.
[39] B. Chen, L. A. Gilbert, B. A. Cimini et al., “Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system,” Cell, vol. 155, no. 7, pp. 1479–1491, 2013.
[40] L. R. Polstein and C. A. Gersbach, “A light-inducible CRISPR-Cas9 system for control of endogenous gene activation,” Nature Chemical Biology, vol. 11, no. 3, pp. 198–200, 2015.
[41] M. F. Bolukbasi, A. Gupta, S. Oikemus et al., “DNA-binding-domain fusions enhance the targeting range and precision of Cas9,” Nature Methods, vol. 12, no. 12, pp. 1150–1156, 2015.
[42] B. Maji, C. L. Moore, B. Zetsche et al., “Multidimensional chemical control of CRISPR-Cas9,” Nature Chemical Biology, vol. 13, no. 1, pp. 9–11, 2017.
[43] M. Berrondo, M. Ostermeier, and J. J. Gray, “Structure prediction of domain insertion proteins from structures of individual domains,” Structure, vol. 16, no. 4, pp. 513–527, 2008.
[44] L. F. Ribeiro, G. P. Furtado, M. R. Lourenzoni et al., “Engineering bifunctional laccase-xylanase chimeras for improved catalytic performance,” The Journal of Biological Chemistry, vol. 286, no. 50, pp. 43026–43038, 2011.
[45] B. Pierre, J. W. Labonte, T. Xiong et al., “Molecular determinants for protein stabilization by insertional fusion to a thermophilic host protein,” Chembiochem, vol. 16, no. 16, pp. 2392–2402, 2015.
[46] C. S. Kim, B. Pierre, M. Ostermeier, L. L. Looger, and J. R. Kim, “Enzyme stabilization by domain insertion into a thermophilic protein,” Protein Engineering Design & Selection, vol. 22, no. 10, pp. 615–623, 2009.
[47] X. H. Zhang, L. Y. Tee, X. G. Wang, Q. S. Huang, and S. H. Yang, “Off-target effects in CRISPR/Cas9-mediated genome engineering,” Molecular Therapy - Nucleic Acids, vol. 4, article e264, 2015.
[48] S. Q. Tsai, N. Wyvekens, C. Khayter et al., “Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing,” Nature Biotechnology, vol. 32, no. 6, pp. 569–576, 2014.
[49] J. P. Guilinger, D. B. Thompson, and D. R. Liu, “Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification,” Nature Biotechnology, vol. 32, no. 6, pp. 577–582, 2014.
[50] A. K. Singh Gautam, S. Balakrishnan, and P. Venkatraman, “Direct ubiquitin independent recognition and degradation of a folded protein by the eukaryotic proteasomes-origin of intrinsic degradation signals,” PLoS One, vol. 7, no. 4, p. e34864, 2012.
[51] A. C. Komor, Y. B. Kim, M. S. Packer, J. A. Zuris, and D. R. Liu, “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage,” Nature, vol. 533, no. 7603, pp. 420–424, 2016.
[52] N. M. Gaudelli, A. C. Komor, H. A. Rees et al., “Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage,” Nature, vol. 551, no. 7681, pp. 464–471, 2017.
[53] H. A. Rees, A. C. Komor, W. H. Yeh et al., “Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery,” Nature Communications, vol. 8, 2017.
[54] Y. B. Kim, A. C. Komor, J. M. Levy, M. S. Packer, K. T. Zhao, and D. R. Liu, “Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions,” Nature Biotechnology, vol. 35, no. 4, pp. 371–376, 2017.
[55] J. M. Gerhke, O. R. Cervantes, M. Kendell Clement, L. Pinello, and J. Keith Joung, “High-precision CRISPR-Cas9 base editors with minimized bystander and off-target mutations,” bioRxiv, article 273938, 2018.
[56] Y. Nihongaki, S. Yamamoto, F. Kawano, H. Suzuki, and M. Sato, “CRISPR-Cas9-based photoactivatable transcription system,” Chemistry & Biology, vol. 22, no. 2, pp. 169–174, 2015.
[57] Y. Gao, X. Xiong, S. Wong, E. J. Charles, W. A. Lim, and L. S. Qi, “Complex transcriptional modulation with orthogonal and inducible dCas9 regulators,” Nature Methods, vol. 13, no. 12, pp. 1043–1049, 2016.
[58] K. I. Liu, M. N. B. Ramli, C. W. A. Woo et al., “A chemical-inducible CRISPR-Cas9 system for rapid control of genome editing,” Nature Chemical Biology, vol. 12, no. 11, pp. 980–987, 2016.
[59] A. Amabile, A. Migliara, P. Capasso et al., “Inheritable silencing of endogenous genes by hit-and-run targeted epigenetic editing,” Cell, vol. 167, no. 1, pp. 219–232.e14, 2016.

[60] P. Stepper, G. Kungulovski, R. Z. Jurkowska et al., “Efficient improved specificity,” Science, vol. 351, no. 6268, pp. 84– targeted DNA methylation with chimeric dCas9-Dnmt3a- 88, 2016. ” Dnmt3L methyltransferase, Nucleic Acids Research, vol. 45, [78] B. P. Kleinstiver, V. Pattanayak, M. S. Prew et al., “High-fidelity – no. 4, pp. 1703 1713, 2017. CRISPR-Cas9 nucleases with no detectable genome-wide off- [61] A. Vojta, P. Dobrinić, V. Tadić et al., “Repurposing the target effects,” Nature, vol. 529, no. 7587, pp. 490–495, 2016. ” CRISPR-Cas9 system for targeted DNA methylation, Nucleic [79] V. Siksnys and G. Gasiunas, “Rewiring Cas9 to target new – Acids Research, vol. 44, no. 12, pp. 5615 5628, 2016. PAM sequences,” Molecular Cell, vol. 61, no. 6, pp. 793-794, [62] J. I. McDonald, H. Celik, L. E. Rois et al., “Reprogrammable 2016. fi CRISPR/Cas9-based system for inducing site-speci c DNA [80] E. Zinn and L. H. Vandenberghe, “Adeno-associated virus: fit ” – methylation, Biology Open, vol. 5, no. 6, pp. 866 874, 2016. to serve,” Current Opinion in Virology, vol. 8, pp. 90–97, 2014. “ [63] T. Xiong, G. E. Meister, R. E. Workman et al., Targeted DNA [81] K. Chamberlain, J. M. Riyad, and T. Weber, “Expressing trans- methylation in human cells using engineered dCas9-methyl- genes that exceed the packaging capacity of adeno-associated ” fi transferases, Scienti c Reports, vol. 7, no. 1, p. 6732, 2017. virus capsids,” Human Gene Therapy Methods, vol. 27, no. 1, [64] N. A. Kearns, H. Pham, B. Tabak et al., “Functional annotation pp. 1–12, 2016. ” of native enhancers with a Cas9-histone demethylase fusion, [82] A. V. Wright, S. H. Sternberg, D. W. Taylor et al., “Rational – Nature Methods, vol. 12, no. 5, pp. 401 403, 2015. design of a split-Cas9 enzyme complex,” Proceedings of the [65] I. B. Hilton, A. M. D'Ippolito, C. M. Vockley et al., “Epigenome National Academy of Sciences, vol. 112, no. 10, pp. 
2984– editing by a CRISPR-Cas9-based acetyltransferase activates 2989, 2015. ” genes from promoters and enhancers, Nature Biotechnology, [83] D. J. J. Truong, K. Kühner, R. Kühn et al., “Development of an – vol. 33, no. 5, pp. 510 517, 2015. intein-mediated split-Cas9 system for gene therapy,” Nucleic [66] T. Anton, S. Bultmann, H. Leonhardt, and Y. Markaki, Acids Research, vol. 43, no. 13, pp. 6450–6458, 2015. “ fi Visualization of speci c DNA sequences in living mouse [84] Y. Nihongaki, F. Kawano, T. Nakajima, and M. Sato, fl embryonic stem cells with a programmable uorescent “Photoactivatable CRISPR-Cas9 for optogenetic genome edit- ” – CRISPR/Cas system, Nucleus, vol. 5, no. 2, pp. 163 172, 2014. ing,” Nature Biotechnology, vol. 33, no. 7, pp. 755–760, 2015. [67] H. Ma, A. Naseri, P. Reyes-Gutierrez, S. A. Wolfe, S. Zhang, [85] B. Zetsche, S. E. Volz, and F. Zhang, “A split-Cas9 architecture “ and T. Pederson, Multicolor CRISPR labeling of chromo- for inducible genome editing and transcription modulation,” ” somal loci in human cells, Proceedings of the National Nature Biotechnology, vol. 33, no. 2, pp. 139–142, 2015. Academy of Sciences, vol. 112, no. 10, pp. 3002–3007, 2015. [86] B. P. Kleinstiver, M. S. Prew, S. Q. Tsai et al., “Engineered “ ’ [68] N. H. Shah and T. W. Muir, Inteins: nature s gift to protein CRISPR-Cas9 nucleases with altered PAM specificities,” ” – chemists, Chemical Science, vol. 5, no. 2, pp. 446 461, 2014. Nature, vol. 523, no. 7561, pp. 481–485, 2015. “ [69] S. H. Peck, I. Chen, and D. R. Liu, Directed evolution of [87] B. P. Kleinstiver, M. S. Prew, S. Q. Tsai et al., “Broadening the a small-molecule-triggered intein with improved splicing targeting range of Staphylococcus aureus CRISPR-Cas9 by ” properties in mammalian cells, Chemistry & Biology, vol. 18, modifying PAM recognition,” Nature Biotechnology, vol. 33, – no. 5, pp. 619 630, 2011. no. 12, pp. 1293–1298, 2015. [70] K. M. Davis, V. Pattanayak, D. B. Thompson, J. A. 
Zuris, [88] R. Heler, A. V. Wright, M. Vucelja, D. Bikard, J. A. Doudna, “ and D. R. Liu, Small molecule-triggered Cas9 protein and L. A. Marraffini, “Mutations in Cas9 enhance the rate of fi ” with improved genome-editing speci city, Nature Chemical acquisition of viral spacer sequences during the CRISPR- – Biology, vol. 11, no. 5, pp. 316 318, 2015. Cas immune response,” Molecular Cell, vol. 65, no. 1, [71] X. X. Zhou, X. Zou, H. K. Chung et al., “A single-chain photo- pp. 168–175, 2017. switchable CRISPR-Cas9 architecture for light-inducible gene [89] J. H. Hu, S. M. Miller, M. H. Geurts et al., “Evolved Cas9 ” editing and transcription, ACS Chemical Biology, vol. 13, variants with broad PAM compatibility and high DNA – no. 2, pp. 443 448, 2017. specificity,” Nature, vol. 556, no. 7699, pp. 57–63, 2018. “ [72] X. X. Zhou, L. Z. Fan, P. Li, K. Shen, and M. Z. Lin, Optical [90] F. Atroshi, A. Rizzo, T. Westermarck, and T. Ali-Vehmas, control of cell signaling by single-chain photoswitchable “Effects of tamoxifen, melatonin, coenzyme Q10, and L- ” – kinases, Science, vol. 355, no. 6327, pp. 836 842, 2017. carnitine supplementation on bacterial growth in the presence [73] G. T. Hess, L. Frésard, K. Han et al., “Directed evolution using of mycotoxins,” Pharmacological Research, vol. 38, no. 4, dCas9-targeted somatic hypermutation in mammalian cells,” pp. 289–295, 1998. – Nature Methods, vol. 13, no. 12, pp. 1036 1042, 2016. [91] X. Liu, E. Pisha, D. A. Tonetti et al., “Antiestrogenic and [74] F. Richter, I. Fonfara, B. Bouazza et al., “Engineering of DNA damaging effects induced by tamoxifen and toremifene temperature- and light-switchable Cas9 variants,” Nucleic metabolites,” Chemical Research in Toxicology, vol. 16, no. 7, Acids Research, vol. 44, no. 20, pp. 10003–10014, 2016. pp. 832–837, 2003. [75] B. L. Oakes, D. C. Nadler, A. Flamholz et al., “Profiling of [92] P. W. Fan, F. Zhang, and J. L. 
Bolton, “4-Hydroxylated engineering hotspots identifies an allosteric CRISPR-Cas9 metabolites of the antiestrogens tamoxifen and toremifene are switch,” Nature Biotechnology, vol. 34, no. 6, pp. 646–651, metabolized to unusually stable quinone methides,” Chemical 2016. Research in Toxicology, vol. 13, no. 1, pp. 45–52, 2000. [76] H. Nishimasu, F. A. Ran, P. D. Hsu et al., “Crystal structure of [93] C. M. Nowak, S. Lawson, M. Zerez, and L. Bleris, “Guide RNA Cas9 in complex with guide RNA and target DNA,” Cell, engineering for versatile Cas9 functionality,” Nucleic Acids vol. 156, no. 5, pp. 935–949, 2014. Research, vol. 44, pp. 9555–9564, 2016. [77] I. M. Slaymaker, L. Gao, B. Zetsche, D. A. Scott, W. X. Yan, and F. Zhang, “Rationally engineered Cas9 nucleases with Hindawi International Journal of Genomics Volume 2018, Article ID 9235605, 10 pages https://doi.org/10.1155/2018/9235605

Research Article Calibrating Transcriptional Activity Using Constitutive Synthetic Promoters in Mutants for Global Regulators in Escherichia coli

Ananda Sanches-Medeiros, Lummy Maria Oliveira Monteiro, and Rafael Silva-Rocha

Systems and Synthetic Biology Lab, FMRP - University of São Paulo, Ribeirão Preto, SP, Brazil

Correspondence should be addressed to Rafael Silva-Rocha; [email protected]

Received 24 October 2017; Accepted 30 January 2018; Published 21 March 2018

Academic Editor: João Paulo Gomes

Copyright © 2018 Ananda Sanches-Medeiros et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The engineering of synthetic circuits in cells relies on the use of well-characterized biological parts that would perform predicted functions under the situation considered, and many efforts have been taken to set biological standards that could define the basic features of these parts. However, since most synthetic biology projects usually require a particular cellular chassis and set of growth conditions, defining standards in the field is not a simple task as gene expression measurements could be affected severely by genetic background and culture conditions. In this study, we addressed promoter parameterization in bacteria in different genetic backgrounds and growth conditions. We found that a small set of constitutive promoters of different strengths controlling a short-lived GFP reporter placed in a low-copy number plasmid produces remarkably reproducible results that allow for the calibration of promoter activity over different genetic backgrounds and physiological conditions, thus providing a simple way to set standards of promoter activity in bacteria. Based on these results, we proposed the utilization of synthetic constitutive promoters as tools for calibration for the standardization of biological parts, in a way similar to the use of DNA and protein ladders in molecular biology as references for comparison with samples of interest.

1. Introduction

Understanding the logic underlying the genetics of a microorganism based on the dynamics of its promoters and transcription factors is essential for manipulation of other living systems. A way to study this logic is introducing synthetic circuits provided with a reporter gene into living cells and analyzing the results of the expression [1, 2]. However, the success of the implementation of complex circuits in living cells relies strongly on the correct production of the molecular components of the cells and is not limited to the influences of the promoters and transcription factors on gene expression. Several factors are responsible for controlling gene expression, including the rates of mRNA and protein production and their rates of degradation. Synthesis of mRNA depends strongly on promoter strength, which determines how frequently the RNA polymerase (RNAP) is recruited to the promoter to initiate transcription [3]. On the other hand, the rate of protein production depends strongly on the strength of the ribosome binding site (RBS) in recruiting ribosomes for the translation of the target protein [4]. Additionally, the dilution or degradation of mRNA and proteins depends on the physiological state of the cell, just as their synthesis relies on cell physiology with respect to the availability of nucleotides, amino acids, RNAP, and ribosomes [5]. In this way, changes in cell physiology and growth conditions can cause variability in gene expression in a manner that is independent of promoter regulation [5].

On account of these possible variations between cells, several attempts have been made to establish biological standards for promoter activity, and the use of internal promoters as references was proposed some years ago [6, 7]. More recently, the use of calibrated internal promoters has been proposed as an alternative for defining relative promoter activities during experimental measurements of transcription levels. In this method, an endogenous (or reference) promoter is placed in the same plasmid as the target promoter, each of them controlling the expression of a different fluorescent protein, and the intrinsic promoter activity is calculated as a ratio of the two outputs [8]. However, the expression of additional genes in the host bacterium can increase genetic load and influence gene expression as well. In this way, inserting a calibrated internal promoter would disturb cell functions [9]. Additionally, most methods have focused on the analysis of maximal promoter activity at fixed conditions or on the linear expression range of promoter activity, limiting the use of standards under conditions where cells are subjected to changing physiological regimens [8]. These requirements make the use of calibration methods for the analysis of regulated promoters extremely difficult.

In this study, we sought to analyze intrinsic promoter activity using a single reporter gene in different strains of Escherichia coli by using a simple and straightforward protocol. For the determination of intrinsic promoter activity, we used a low-copy number plasmid based on the p15a origin of replication (ori) and a short-lived GFP with an LVA tag [10]. We analyzed four constitutive promoters available in the Registry of Standard Biological Parts and a wild-type Plac promoter as a regulated system. As hosts, we used two strains of E. coli mutant for the global regulatory proteins ihf and fis, which are responsible for regulating the expression of hundreds of genes in this bacterium [11], obtained from the widely used Keio collection of E. coli mutants [12]. Additionally, glucose was used as the external source of variation, since all strains exhibited improved growth rates in its presence. Under the conditions of the analysis, we observed that the system exhibited invariant promoter activities that were independent of the strains and growth conditions used, indicating that it was able to demonstrate the intrinsic properties of the promoters analyzed. In addition, to prove that our calibrator works, we tested the natural Pm promoter of Pseudomonas putida with different concentrations of 3-methylbenzoate (3MBz) [13] and calibrated it with our four constitutive promoters in liquid and solid medium. In this way, it was possible to verify that the calibrator works and presents a potential application in synthetic biology. In this regard, we propose a simple, plasmid-based, single-reporter method for promoter calibration that is compatible with regulated promoters and changes in growth conditions, which could be fundamental to the characterization of biological parts in synthetic biology.

2. Material and Methods

2.1. Bacterial Strains, Plasmids, and Growth Conditions. The bacterial strains, plasmids, and primers used in this study are listed in Table 1. E. coli DH5α was used for cloning the pMR1-Pjx (where x stands for 100, 106, 114, and 113) and pMR1-Plac vectors [14] by transformation using the heat-shock method, and E. coli DH10B was used for cloning pGLR2-Pjx vectors by electroporation and for GFP/Lux expression analysis [15]. For the calibration of promoter activity, E. coli BW25113 was used as the wild-type strain, and ihf (Δihf) or fis (Δfis) mutants (from the Keio collection) were used as mutant hosts with reduced growth rate. Plasmids pMR1 and pGLR2 were used as reporter systems for promoter analysis. Plasmid pMR1 has a low-copy p15a ori, a chloramphenicol-resistance marker, and two genes encoding fluorescent proteins oriented in opposite directions (mCherry and gfplva). Plasmid pGLR2 has a low-copy RK2 origin of replication, a kanamycin-resistance marker, and two reporter genes oriented in the same direction, namely, the GFP gene followed by luxCDABE. Although this vector carries GFP, only Lux was measured when pGLR2 was used in this work. The E. coli strains were grown at 37°C with aeration at an agitation rate of 180 rpm in LB medium (for overnight growth) or M9 minimal medium (containing 6.4 g/L Na2HPO4•7H2O, 1.5 g/L KH2PO4, 0.25 g/L NaCl, and 0.5 g/L NH4Cl) supplemented with 2 mM MgSO4, 0.1 mM CaCl2, 0.1 mM casamino acids, and 1% glycerol as the sole carbon source (for growth during the analysis). When required, chloramphenicol (34 μg/mL), kanamycin (50 μg/mL), or glucose (0.4%) was added to the medium. In the minimal medium, the antibiotics were added at half of the previously mentioned concentrations.

2.2. Plasmid and Strain Construction. For the experiments, oligonucleotides were synthesized (Exxtend, Campinas, Brazil) based on the synthetic constitutive promoters from the iGEM BBa_J23104 set of promoters (http://parts.igem.org/Part:BBa_J23104), with an annealing site on pMR1 and restriction sites for EcoRI and BamHI. The promoters J23100, J23106, J23114, and J23113 (referred to here as Pj100, Pj106, Pj114, and Pj113, resp.) were used (see Table 1). Once these fragments were amplified by PCR, they were digested with EcoRI and BamHI and inserted into the multiple cloning sites (MCS) of pMR1 and pGLR2, thus generating pMR1-Pjx and pGLR2-Pjx. These plasmids were introduced into the DH5α and DH10B strains, respectively, cloned, and sequenced. The pMR1-Pjx plasmids were then introduced into E. coli BW25113 and into the ihf and fis mutants obtained from the Keio collection.

The xylS, PxylS, and Pm elements were PCR amplified with Phusion High-Fidelity DNA polymerase (Thermo Fisher Scientific) using the primer pair 5_xylS_EcoRI (5′-GAA TTC TCA AGC CAC TTC CTT TTT GCA TTG-3′) and 3_Pm_BamHI (5′-GGA TCC ATT ATT GTT TCT GTT GCA TAA AGC C-3′) and the pSEVA438 vector (pBBR1 replication origin, Sm/Sp; Silva-Rocha and de Lorenzo [1]) as template. These primers introduced EcoRI and BamHI restriction sites (GAATTC and GGATCC, the first six nucleotides of each primer) at the 5′ and 3′ ends, respectively. The PCR products were gel purified, digested with EcoRI/BamHI, and ligated to the pMR1 vector previously cut with the same restriction enzymes. The resulting plasmid, named pMR1-xylS-Pm, was sequenced to check its integrity and introduced into the E. coli strains (BW25113 and its ihf and fis mutants).

2.3. GFP Fluorescence and Bioluminescence Assays and Data Processing. To analyze promoter activity, single colonies of recombinant strains containing pMR1-Pjx and of E. coli DH10B containing pGLR2-Pjx were grown overnight in LB medium supplemented with chloramphenicol (34 μg/mL) or kanamycin (50 μg/mL) for plasmid selection, at 37°C with aeration and agitation at 180 rpm. The strains

Table 1: Strains, plasmids and primers used in this study.

| Strains, plasmids, and primers | Description | Reference |
| --- | --- | --- |
| Strains | | |
| E. coli DH5α | F− endA1 glnV44 thi-1 recA1 relA1 gyrA96 deoR nupG purB20 φ80dlacZΔM15 Δ(lacZYA-argF)U169, hsdR17(rK− mK+), λ−. | [12] |
| E. coli DH10B | mcrA Δ(mrr-hsdRMS-mcrBC) φ80lacZΔM15 ΔlacX74 recA1 araD139 Δ(ara-leu)7697 galU galK rpsL endA1 nupG Δdcm. | [12] |
| E. coli BW25113 | lacI+ rrnBT14 ΔlacZWJ16 hsdR514 ΔaraBADAH33 ΔrhaBADLD78 rph-1 Δ(araB–D)567 Δ(rhaD–B)568 ΔlacZ4787(::rrnB-3) hsdR514 rph-1. | [12] |
| E. coli JW1702 | E. coli BW25113 Δihf mutant | [12] |
| E. coli JW3229 | E. coli BW25113 Δfis mutant | [12] |
| Plasmids | | |
| pMR1 | CmR, ori p15a. GFPlva promoter probe vector | [14] |
| pMR1-Pj113 | pMR1 with Pj113 cloned as EcoRI/BamHI fragment | This work |
| pMR1-Pj114 | pMR1 with Pj114 cloned as EcoRI/BamHI fragment | This work |
| pMR1-Pj106 | pMR1 with Pj106 cloned as EcoRI/BamHI fragment | This work |
| pMR1-Pj100 | pMR1 with Pj100 cloned as EcoRI/BamHI fragment | This work |
| pMR1-Plac | pMR1 with Plac promoter cloned as EcoRI/BamHI fragment | [28] |
| pGLR2 | KmR, oriT, ori RK2. SEVA-based vector with dual GFP-lux reporter | [15] |
| pGLR2-Pj113 | pGLR2 with Pj113 cloned as EcoRI/BamHI fragment | This work |
| pGLR2-Pj114 | pGLR2 with Pj114 cloned as EcoRI/BamHI fragment | This work |
| pGLR2-Pj106 | pGLR2 with Pj106 cloned as EcoRI/BamHI fragment | This work |
| pGLR2-Pj100 | pGLR2 with Pj100 cloned as EcoRI/BamHI fragment | This work |
| pMR1-xylS-Pm | pMR1 with PxylS, xylS and Pm cloned as EcoRI/BamHI fragment | This work |
| Primers | | |
| Pj100-FW | GAATTCTTGACGGCTAGCTCAGTCCTAGG | This work |
| Pj100-RV | TACAGTGCTAGCAAGTGGATCCTTGCGATC | This work |
| Pj106-FW | GAATTCTTTACGGCTAGCTCAGTCCTAGGTA | This work |
| Pj106-RV | TAGTGCTAGCAAGTGGATCCTTGCGATC | This work |
| Pj114-FW | GAATTCTTTATGGCTAGCTCAGTCCTAGGT | This work |
| Pj114-RV | ACAATGCTAGCAAGTGGATCCTTGCGATC | This work |
| Pj113-FW | GAATTCCTGATGGCTAGCTCAGTCCTAGGG | This work |
| Pj113-RV | ATTATGCTAGCAAGTGGATCCTTGCGATC | This work |
| 5_xylS_EcoRI | GAATTCTCAAGCCACTTCCTTTTTGCATTG | This work |
| 3_Pm_BamHI | GGATCCATTATTGTTTCTGTTGCATAAAGCC | This work |

grown overnight were washed with MgSO4 (10 mM) buffer, resuspended in the same buffer, and diluted 1 : 20 in M9 minimal medium (containing 6.4 g/L Na2HPO4•7H2O, 1.5 g/L KH2PO4, 0.25 g/L NaCl, and 0.5 g/L NH4Cl) supplemented with 2 mM MgSO4, 0.1 mM CaCl2, 0.1 mM casamino acids, chloramphenicol (17 μg/mL), and 1% glycerol as the sole carbon source. When required, glucose (0.4%) was added to the medium. In total, 200 μL of the culture was placed in a 96-well plate and analyzed using a Victor X3 plate reader (PerkinElmer) over several hours at 37°C. At 30-minute intervals, the optical density at 600 nm (OD600nm) and the fluorescence (excitation 485 nm and emission 535 nm) were measured for the strains containing pMR1-Pjx; the optical density at 600 nm (OD600nm), the fluorescence (excitation 485 nm and emission 535 nm), and the bioluminescence were measured for DH10B containing pGLR2-Pjx. Promoter activities were expressed as fluorescence or bioluminescence normalized by the OD600nm after background correction (fluorescence/OD600nm). As a positive control for the pMR1-Pjx analysis, the wild-type lac promoter (Plac), which is regulated by CRP, was used. Data analysis and representation were performed using Microsoft Excel (2016) and ad hoc R scripts. To prove that the pMR1-Pjx system works as a gene expression standard, the natural Pm promoter of P. putida, which is regulated by XylS when this regulator is induced by 3MBz, was used [13]. The pMR1-xylS-Pm construct contains the xylS promoter (PxylS), which in the presence of 3MBz leads to the expression of the XylS regulatory protein. The XylS regulator binds to 3MBz and activates the Pm promoter, inducing the expression of GFP. Data analysis and representation were performed using ad hoc R scripts.
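The normalization described above can be sketched in a few lines of Python. This is only an illustration, not the authors' R or Excel workflow: the function names, the example readings, and the choice of the empty-vector signal as background are assumptions, and OD600 values are assumed to be already blank-corrected. The second function expresses each promoter's maximal activity as a percentage of the Pj100 reference, which is how the relative strengths are reported in the Results.

```python
import math

def promoter_activity(signal, od600, background=0.0):
    """Return background-corrected reporter signal per OD600 unit at each time point."""
    activities = []
    for s, od in zip(signal, od600):
        # A well with no measurable growth gives an undefined ratio.
        activities.append((s - background) / od if od > 0 else math.nan)
    return activities

def relative_activity(activities, reference="Pj100"):
    """Express each promoter's maximal activity as a percentage of the reference."""
    def peak(values):
        return max(v for v in values if not math.isnan(v))
    ref_peak = peak(activities[reference])
    return {name: 100.0 * peak(vals) / ref_peak for name, vals in activities.items()}

# Hypothetical plate-reader readings at three time points (not data from the study).
od = [0.1, 0.2, 0.4]
raw = {
    "Pj100": [1100.0, 2100.0, 4100.0],
    "Pj106": [600.0, 1100.0, 2100.0],
}
background = 100.0  # assumed: fluorescence of the empty-vector control
activities = {name: promoter_activity(sig, od, background) for name, sig in raw.items()}
print(relative_activity(activities))
```

Using the maximal value as the summary statistic mirrors the "maximal promoter activity" reported in Figure 1; a steady-state average over late time points would be an equally reasonable choice.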

3. Results and Discussion

3.1. Quantification of Constitutive Promoter Activity Using GFP and luxCDABE Reporters. For the analysis of promoter activities, we used two reporter plasmids: one based on a short-lived GFP variant placed into a narrow-host-range vector (pMR1 [14]) and one based on a synthetic GFP-luxCDABE reporter system placed into a broad-host-range vector [15, 16], used to measure Lux, as represented in Figure 1(a). In order to observe the effects of these differences (regarding the use of different reporter systems) on the measurement of the activities of the promoters of interest, we analyzed the promoter activities of four BioBrick parts, namely Pj100, Pj106, Pj114, and Pj113, that contain mutations in the sequences at −35 or −10 and exhibit about 100%, 50%, 10%, and 1% activities, respectively (relative to Pj100 activity). As shown in Figures 1(b) and 1(c), the maximal promoter activities of the four synthetic promoters analyzed were virtually identical for both the short-lived GFPlva and the luxCDABE reporters, resulting in relative activity values that are close to the expected values. When we analyzed the promoter activities during the growth period of E. coli using the GFPlva and Lux reporters, we observed that the differences were present throughout the growth period of the bacteria, with better differentiation of the intrinsic promoter activities achieved using the luminescent reporter system (Figures 1(d) and 1(e)). Although the luciferase reporter provided better differentiation, the GFP reporter allows users to perform single-cell experiments that cannot be made using light-emitting reporters. Since most synthetic biology works use the GFP reporter, and since GFP provided sufficient resolution to analyze the promoters and also allowed for single-cell analysis (a possible calibrator approach), we focused the remainder of the study on the pMR1 reporter system containing the short-lived variant of the reporter protein.

3.2. Robust Calibration of Promoter Activities under Different Pleiotropic and Growth Conditions. During the experiments, wild-type and mutant strains of E. coli were grown in minimal M9 medium with 1% glycerol or with 1% glycerol plus 0.4% glucose, so that the cells were adapted to a richer physiological regimen. Figure 2(a) represents the critical steps in gene expression that were influenced by the bacterial hosts and growth conditions used. In this regard, the rate of mRNA synthesis (βm) was the main parameter controlled by a specific synthetic promoter, while the rate of protein synthesis (βp) was dependent on the strength of the RBS involved (which was the same in all constructs analyzed). Additionally, the rates of mRNA and protein degradation were dependent on the nature of the reporter sequence (i.e., due to differences in the sequence of reporter genes) and the growth rate of the bacteria (since fast-growing bacteria have higher dilution rates of mRNA and protein than slow-growing bacteria). In this method, the use of constitutive promoters with different strengths allowed for variations in βm and facilitated the analysis of its sensitivity to changes in the dilution or degradation rates of mRNA and proteins. As shown in Figures 2(b) and 2(c), both ihf and fis mutants exhibited reduced growth compared to the wild type for all constructs analyzed. However, in all cases, the addition of 0.4% glucose to the growth medium resulted in a stepwise improvement in the growth of the strains. In other words, in the presence of glucose, bacterial growth is faster, although there is no glucose effect on the final promoter activity. Our calibrator approach is an interesting way to avoid genetic background differences, indicating that the calibrator can be used in several conditions to standardize promoter studies using different strains under a variety of growth conditions (glucose, or 3MBz as described below).

In order to observe the effects of the differences in strains and growth medium on the measurements of the activities of the promoters of interest, we analyzed four synthetic promoters that contain sequence differences at −35 or −10 (Figure 3(a)). As shown in Figure 3(b), the regulated Plac promoter (a natural promoter used as reference) exhibited strong activity in the three strains analyzed (wild type, Δihf, and Δfis), and this activity was fully suppressed in the presence of glucose (due to the inactivation of CRP [17]). When we analyzed the promoter activity of the four synthetic promoters, we observed that the promoter dynamics and steady-state promoter activity were almost invariant in the different mutant strains and under the two growth conditions (Figure 3(c)), indicating that these promoters were not influenced by the drastic physiological variations among the different strains of E. coli (BW25113 and its ihf and fis mutants). It is noteworthy that the same was observed for the addition of glucose to the medium, which did not compromise the promoter activity, again showing that the internal calibrator can be used in different situations. When we performed a comparison of the observed promoter activities, Pj106 exhibited an activity level very close to the expected value (45.7% observed versus 47% expected), whereas Pj114 and Pj113 exhibited promoter activities varying by ~3% and 2.3%, respectively, from the value of the activity exhibited by Pj100 (compared to the expected values, 10% and 1%, resp.). These expected values come from the previous analysis of the iGEM BBa_J23104 set of promoters (http://parts.igem.org/Part:BBa_J23104). These differences are due to the differences in the reporter, plasmids, and strains used for promoter characterization. Additionally, the Plac promoter exhibited an activity of about 30% under nonrepressive conditions when compared to the Pj100 reference promoter.

These results show that the use of the short-lived GFP reporter in combination with constitutive promoters is a simple way to calibrate promoter activity under user-specific experimental conditions, similar to the way that DNA and protein ladders are used in molecular biology techniques as references for the comparison of specific targets. In conclusion, our data show how intrinsic promoter activity can be calibrated using single reporter genes and simple data processing without the need for internal promoter references.

3.3. The Calibrator Can Be Applied to the Induction System xylS-Pm. In order to prove that the set of four promoters proposed in this work acts as an internal calibrator even when applied to an induced expression system, we analyzed

[Figure 1 image. (a) Maps of pMR1-Pjx (Pjx driving GFPlva) and pGLR2-Pjx (Pjx driving GFP and luxCDABE). (b, c) Bar charts of promoter activity (a.u., 10^5) for −, Pj113, Pj114, Pj106, and Pj100 with the GFPlva and luxCDABE reporters. (d, e) Promoter activity (log10 GFP/OD600 and log10 Lux/OD600) versus time (h) for Pj100, Pj106, Pj114, Pj113, and the empty vector.]

Figure 1: Construction and validation of the reporter systems. (a) Synthetic promoters were cloned into the plasmid pMR1, which contains a short-lived GFPlva variant, and pGLR2, a broad host range vector containing a GFP-luxCDABE reporter system. (b) Maximal promoter activity of the four promoters in pMR1 vector. (c) Maximal promoter activity analyzed by monitoring lux expression using pGLR2 constructions. (d) GFP expression profile along the growth curve from reporters cloned in pMR1 vector. (e) lux expression profile along the growth curve from reporters cloned in pGLR2 vector. The solid lines represent the average values calculated using data from three independent experiments while dashed lines represent standard deviation from the samples.


[Figure 2 panels: (a) schematic of gene expression showing RNAP, the promoter (P), gfplva, and the rates βm, βp, γm, and γp; (b) and (c) OD600 against time (h) for wt, Δihf, and Δfis strains.]

Figure 2: Quantification of growth variation in different E. coli strains under two physiological regimens. (a) Schematic representation of the main steps of gene expression in bacteria. The strength of the interaction between RNA polymerase (RNAP) and the target promoter determines the rate of mRNA synthesis (βm), while the RBS sequence determines the rate of protein translation (βp). The rates of mRNA and protein dilution or degradation (γm and γp, resp.) depend on cell growth and the physiological regimen of the cells. (b) Growth curve of E. coli strains harboring a Plac::GFPlva fusion in minimal medium with 1% glycerol (left) or 1% glycerol plus 0.4% glucose (right) as carbon source. (c) Growth curve of E. coli strains harboring different promoter fusions (Pj100, Pj106, Pj114, and Pj113) in minimal medium with 1% glycerol or 1% glycerol plus 0.4% glucose (labeled as glu) as carbon source. Solid lines represent average values calculated using data from three independent experiments for wild type (black), Δihf (red), and Δfis (green) strains, while dashed lines represent the upper and lower limits of standard deviations.

the four synthetic promoters and pMR1-xylS-Pm system (Figure 4(a)) with increasing concentrations of 3MBz. In order to demonstrate that our calibrator is robust and works even on a blue light transilluminator, we analyzed colonies grown on Petri dishes containing LB medium plus 1000 μM 3MBz. As shown in Figure 4(b), pMR1-xylS-Pm displayed the same promoter activity as pMR1-Pj106. Next, we tested the system at increasing concentrations of the inducer 3MBz; these assays were carried out in the three E. coli strains previously used in this work. The data shown in Figure 4(d) correspond to 4.5 hours after the start of induction. Figure 4 shows that increasing the 3MBz concentration did not change Pjx promoter activity, either at this time point (Figure 4(d)) or over the 8 hours of the experiment (Figure 4(c)).

In Figure 4(c), it is important to note that each color on the graph represents a different Pjx synthetic

promoter, and the lines within each color group correspond to different 3MBz concentrations. The lines within each color group do not differ from one another, which suggests that adding 3MBz does not alter the activity of the Pjx promoters. In contrast, for the 3MBz-inducible system, the aromatic compound changed promoter activity along a sigmoidal curve as the 3MBz concentration increased (Figure 4(d)).

A brief and simple conclusion can be drawn from Figure 4(d): regardless of the host strain used, at 1 to 10 μM 3MBz (0 to 1 on the x-axis), the Pm promoter shows activity similar to that of the Pj114 promoter. At 10 to 100 μM (1 to 2 on the x-axis), Pm shows activity intermediate between those of the Pj114 and Pj106 promoters. Finally, at 100 to 1000 μM (2 to 3 on the x-axis), Pm shows promoter activity close

[Figure 3 panels: (a) −35 and −10 sequences of the four promoters, listed below; (b) and (c) promoter activity (a.u., log10) against time (h) for wt, Δihf, and Δfis strains.]

Promoter   −35      −10
Pj100      TTGACG   TACAGT
Pj106      TTTACG   TATAGT
Pj114      TTTATG   TACAAT
Pj113      CTGATG   GATTAT

Figure 3: Quantification of promoter activities in different E. coli strains. (a) Representation of the sequences at −10 or −35 for the four constitutive promoters analyzed, using bold letters for bases conserved relative to the Pj100 reference. (b) Promoter activity of E. coli strains harboring a Plac::GFPlva fusion in minimal medium with 1% glycerol (left) or 1% glycerol plus 0.4% glucose (right) as carbon source. (c) Promoter activity of E. coli strains harboring different promoter fusions (Pj100, Pj106, Pj114, and Pj113) in minimal medium with 1% glycerol or 1% glycerol plus 0.4% glucose (labeled as glu) as carbon source. Solid lines represent the average values calculated using data from three independent experiments for wild type (black), Δihf (red), and Δfis (green) strains, while dashed lines represent the upper and lower limits of standard deviations.

[Figure 4 panels: (a) map of pMR1-xylS-Pm (PxylS, xylS, Pm, GFPlva); (b) colonies of BW25113 carrying xylS-Pm, Pj100, Pj106, Pj113, Pj114, or pMR1 on medium with 1000 μM 3MBz; (c) promoter activity (log10 GFP/OD600) against time (h) in BW25113; (d) promoter activity (log GFP/OD) against 3MBz concentration (log scale, 0.0 to 3.0) for BW25113, ΔIHF, and ΔFis, with series for xylS, Pj114, Pj100, Pj113, and Pj106.]

Figure 4: The calibrator can be applied to the induction system xylS-Pm. (a) The xylS promoter (PxylS), the xylS gene, and the Pm promoter were cloned into the plasmid pMR1, which contains a short-lived GFP variant. (b) xylS-Pm calibration on LB solid medium with 1000 μM of 3MBz added. This calibration was performed in the BW25113 wt strain. (c) Pjx promoter activity over 8 hours of experiment, analyzed by monitoring GFPlva expression using pMR1 constructions. (d) GFP expression profile for 7 different 3MBz concentrations for Pjx and xylS-Pm in pMR1 vector, 4.5 hours after the induction. Solid lines represent the average values calculated using data from three independent experiments for wild type, Δihf, and Δfis strains, while dashed lines represent the upper and lower limits of standard deviations.

to Pj106 promoter activity. Additionally, we can safely conclude that the sigmoidal response of the xylS-Pm system is due to the addition of 3MBz and not to environmental or host changes, since the calibration system does not change under these conditions.

4. Conclusions

The standardization of biological parts for the construction of complex circuits forms the basis of synthetic biology [18–21]. In this regard, the implementation of synthetic biological circuits may fail when poorly characterized parts are used, and several strategies have been proposed to mitigate this problem [6, 22–24]. In this report, we have highlighted that simple experimental techniques involving a single fluorescent reporter and plasmids are sufficient to provide robust characterization of transcriptional elements, without the need for dual markers and complicated mathematical treatments [8]. Additionally, plasmids provide an easy way of implementing synthetic circuits, which accelerates design-build-test cycles in synthetic biology. Once a circuit has been effectively implemented and tested, introducing a single copy of the construct by chromosomal insertion at the same region for all promoters is recommended, both to enhance the performance of the system and to provide stable strains for final use, because insertion at different regions could modify GFP expression [25–27]. In this sense, using this promoter set at different chromosomal regions could provide a way to standardize the variation in gene expression caused by chromosomal position and structure. At the same time, low-copy-number plasmids can provide results similar to those of single-copy set-ups under certain circumstances [1, 15]. In general, since each synthetic biology project has its own design and uses specific hosts and experimental conditions, calibrators such as those described in this paper could provide a simple way to standardize the experimental conditions used. This would be similar to the use of molecular-weight size markers in molecular biology techniques as references for the comparison of samples of interest under varying experimental conditions. Although the calibration methods used in this study were implemented in strains of the Keio collection of E. coli mutants [12], we expect that the validations presented here will be adopted by other research groups studying synthetic biology as well as molecular microbiology. Additionally, the approaches used in this study can be easily adopted using alternative plasmid standards for Gram-negative bacteria other than E. coli, such as the vectors available in the SEVA database [16], thus creating a significant impact on research in microbiology.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Acknowledgments

This work was supported by the Young Investigator Award of Sao Paulo State Foundation (FAPESP, award number 2012/22921-8). Ananda Sanches-Medeiros was supported by a Scientific Initiation Scholarship (FAPESP, award number 2015/22386-3). Lummy Maria Oliveira Monteiro was supported by a FAPESP PhD fellowship (FAPESP, award number 2016/19179-9). The authors are thankful to the lab members for insightful discussions on this work.

References

[1] R. Silva-Rocha and V. de Lorenzo, “Chromosomal integration of transcriptional fusions,” Methods in Molecular Biology, vol. 1149, pp. 479–489, 2014.
[2] A. Zaslaver, A. Bren, M. Ronen et al., “A comprehensive library of fluorescent transcriptional reporters for Escherichia coli,” Nature Methods, vol. 3, no. 8, pp. 623–628, 2006.
[3] D. F. Browning and S. J. W. Busby, “Local and global regulation of transcription initiation in bacteria,” Nature Reviews Microbiology, vol. 14, no. 10, pp. 638–650, 2016.
[4] L. Zelcbuch, N. Antonovsky, A. Bar-Even et al., “Spanning high-dimensional expression space using ribosome-binding site combinatorics,” Nucleic Acids Research, vol. 41, no. 9, article e98, 2013.
[5] S. Klumpp and T. Hwa, “Bacterial growth: global effects on gene expression, growth feedback and proteome partition,” Current Opinion in Biotechnology, vol. 28, pp. 96–102, 2014.
[6] J. R. Kelly, A. J. Rubin, J. H. Davis et al., “Measuring the activity of BioBrick promoters using an in vivo reference standard,” Journal of Biological Engineering, vol. 3, no. 1, p. 4, 2009.
[7] B. Canton, A. Labno, and D. Endy, “Refinement and standardization of synthetic biological parts and devices,” Nature Biotechnology, vol. 26, no. 7, pp. 787–793, 2008.
[8] T. J. Rudge, J. R. Brown, F. Federici et al., “Characterization of intrinsic properties of promoters,” ACS Synthetic Biology, vol. 5, no. 1, pp. 89–98, 2016.
[9] M. Carbonell-Ballestero, E. Garcia-Ramallo, R. Montanez, C. Rodriguez-Caso, and J. Macia, “Dealing with the genetic load in bacterial synthetic biology circuits: convergences with the Ohm’s law,” Nucleic Acids Research, vol. 44, no. 1, pp. 496–507, 2016.
[10] J. B. Andersen, C. Sternberg, L. K. Poulsen, S. P. Bjorn, M. Givskov, and S. Molin, “New unstable variants of green fluorescent protein for studies of transient gene expression in bacteria,” Applied and Environmental Microbiology, vol. 64, no. 6, pp. 2240–2246, 1998.
[11] A. Martinez-Antonio and J. Collado-Vides, “Identifying global regulators in transcriptional regulatory networks in bacteria,” Current Opinion in Microbiology, vol. 6, no. 5, pp. 482–489, 2003.
[12] T. Baba, T. Ara, M. Hasegawa et al., “Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection,” Molecular Systems Biology, vol. 2, 2006.
[13] R. Silva-Rocha and V. de Lorenzo, “Broadening the signal specificity of prokaryotic promoters by modifying cis-regulatory elements associated with a single transcription factor,” Molecular BioSystems, vol. 8, no. 7, pp. 1950–1957, 2012.
[14] M. E. Guazzaroni and R. Silva-Rocha, “Expanding the logic of bacterial promoters using engineered overlapping operators for global regulators,” ACS Synthetic Biology, vol. 3, no. 9, pp. 666–675, 2014.
[15] I. M. Benedetti, V. de Lorenzo, and R. Silva-Rocha, “Quantitative, non-disruptive monitoring of transcription in single cells with a broad-host range GFP-luxCDABE dual reporter system,” PLoS One, vol. 7, no. 12, article e52000, 2012.
[16] R. Silva-Rocha, E. Martínez-García, B. Calles et al., “The Standard European Vector Architecture (SEVA): a coherent platform for the analysis and deployment of complex prokaryotic phenotypes,” Nucleic Acids Research, vol. 41, no. D1, pp. D666–D675, 2013.
[17] A. Schmitz, “Cyclic AMP receptor protein interacts with lactose operator DNA,” Nucleic Acids Research, vol. 9, no. 2, pp. 277–292, 1981.
[18] A. S. Khalil and J. J. Collins, “Synthetic biology: applications come of age,” Nature Reviews Genetics, vol. 11, no. 5, pp. 367–379, 2010.
[19] R. P. Shetty, D. Endy, and T. F. Knight Jr, “Engineering BioBrick vectors from BioBrick parts,” Journal of Biological Engineering, vol. 2, no. 1, p. 5, 2008.
[20] E. Andrianantoandro, S. Basu, D. K. Karig, and R. Weiss, “Synthetic biology: new engineering rules for an emerging discipline,” Molecular Systems Biology, vol. 2, 2006.
[21] C. A. Voigt, “Genetic parts to program bacteria,” Current Opinion in Biotechnology, vol. 17, no. 5, pp. 548–557, 2006.
[22] J. A. N. Brophy and C. A. Voigt, “Principles of genetic circuit design,” Nature Methods, vol. 11, no. 5, pp. 508–520, 2014.
[23] V. Singh, “Recent advancements in synthetic biology: current status and challenges,” Gene, vol. 535, no. 1, pp. 1–11, 2014.
[24] A. de Las Heras, C. A. Carreño, E. Martinez-Garcia, and V. de Lorenzo, “Engineering input/output nodes in prokaryotic regulatory circuits,” FEMS Microbiology Reviews, vol. 34, no. 5, pp. 842–865, 2010.
[25] R. C. Brewster, F. M. Weinert, H. G. Garcia, D. Song, M. Rydenfelt, and R. Phillips, “The transcription factor titration effect dictates level of gene expression,” Cell, vol. 156, no. 6, pp. 1312–1323, 2014.
[26] J. W. Lee, A. Gyorgy, D. E. Cameron et al., “Creating single-copy genetic circuits,” Molecular Cell, vol. 63, no. 2, pp. 329–336, 2016.
[27] D. F. Browning, D. C. Grainger, and S. J. W. Busby, “Effects of nucleoid-associated proteins on bacterial chromosome structure and gene expression,” Current Opinion in Microbiology, vol. 13, no. 6, pp. 773–780, 2010.
[28] G. R. Amores, M. E. Guazzaroni, and R. Silva-Rocha, “Engineering synthetic cis-regulatory elements for simultaneous recognition of three transcriptional factors in bacteria,” ACS Synthetic Biology, vol. 4, no. 12, pp. 1287–1294, 2015.