608885.Full.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/608885; this version posted April 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 1 Genomics, Molecular and Evolutionary Perspective of NAC 2 Transcription Factors 3 4 Tapan Kumar Mohanta1*ꞩ, Dhananjay Yadav2ꞩ, Adil Khan1, Abeer Hashem3,4, Baby Tabassum5, 5 Abdul Latif Khan1, Eslayed Fathi Abda_Allah6, Ahmed Al-Harrasi1* 6 1Natural and Medicinal Plant Sciences Research Center, University of Nizwa, Nizwa, 616, Oman 7 2Dept. of Medical Biotechnology, Yeungnam University, Gyeongsan, 38541, Republic of Korea. 8 3Botany and Microbiology Department, College of Science, King Saud University, P.O. Box. 9 2460, Riyadh 11451, Saudi Arabia. 10 4Mycology and Plant Disease Survey Department, Plant Pathology Research Institute, ARC, Giza 11 12511, Egypt. 12 5Toxicology laboratory, Department of Zoology, Raza P.G. College, Rampur, Uttar Pradesh, 13 244091, India 14 6Plant Production Department, College of Food and Agricultural Sciences, King Saud University, 15 P.O. Box. 2460 Riyadh 11451, Saudi Arabia. 16 17 ꞩ Contributed equally 18 *Corresponding author 19 Tapan Kumar Mohanta, E-mail: [email protected], [email protected] 20 Ahmed Al-Harrasi, E-mail: [email protected] 21 Abstract 22 NAC (NAM, ATAF1,2, and CUC2) transcription factors are one of the largest transcription factor 23 families found in plants and are involved in diverse developmental and signalling events. Despite 24 the availability of comprehensive genomic information from diverse plant species, the basic 25 genomic, biochemical, and evolutionary details of NAC TFs have not been established. Therefore, 26 NAC TFs family proteins from 160 plant species were analyzed in the current study. The analysis, 27 among other things, identified the first algal NAC TF in the Charophyte, Klebsormidium 28 flaccidum. Furthermore, our analysis revealed that NAC TFs are membrane bound and contain 29 monopartite, bipartite, and multipartite nuclear localization signals. NAC TFs were also found to 30 encode a novel chimeric protein domain and are part of a complex interactome network. 31 Synonymous codon usage is absent in NAC TFs and it appears that they have evolved from 32 orthologous ancestors and undergone significant duplication events to give rise to paralogous 33 NAC TFs. 34 Keywords: NAC transcription factor, Transmembrane domain, Chimeric NAC transcription 35 factor, Nuclear localization signal, Codon usage, Gene duplication 36 1 bioRxiv preprint doi: https://doi.org/10.1101/608885; this version posted April 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 37 Introduction 38 Next-generation sequencing (NGS) has fostered the sequencing of many plant genomes. The 39 availability of so many genomes has allowed researchers to readily identify genes, examine 40 genetic diversity within a species, and gain insight into the evolution of genes and gene families. 41 Gene expression is regulated in part by different families of proteins known as transcription 42 factors (TFs). TFs are involved in inducing the transcription of DNA into RNA. They include 43 numerous and diverse proteins, all of which contain one or more DNA-binding motifs. The DNA- 44 binding domain enables them to bind to the promoter or repressor sequence of DNA that is 45 present either at the upstream, downstream, or within an intron region of a coding gene. Some TFs 46 bind to a DNA promoter region located near the transcription start site of a gene and help to form 47 the transcription initiation complex. Other TFs bind to regulatory enhancer sequences and 48 stimulate or repress transcription of the related genes. Regulating transcription is of paramount 49 importance to controlling gene expression and TFs enable the expression of an individual gene in 50 a unique manner, such as during different stages of development or in response to biotic or abiotic 51 stress. TFs act as a molecular switch for temporal and spatial gene regulation. A considerable 52 portion of a genome consists of genes encoding transcription factors. For example, there are at 53 least 52 TF families in Arabidopsis thaliana, and the NAC (no apical meristem (NAM) TF family 54 is one of them. 55 NAC TFs are characterised by the presence of a conserved N-terminal NAC domain comprising 56 approximately 150 amino acids and a diversified C-terminal end. The DNA binding NAC domain 57 is divided into five sub-domains designated A-E. Sub-domain A is apparently involved in the 58 formation of functional dimers, while sub-domains B and E appear to be responsible for the 59 functional divergence of NAC genes 1–4. The dimeric architecture of NAC proteins can remain 60 stable even at a concentration of 5M NaCl 4. The dimerization is established by Leu14-Thr23, 61 and Glu26-Tyr31 amino acid residues. The dimeric form is responsible for the functional unit of 62 stress-responsive SNAC1 and can modulate DNA-binding specificity. Sub-domains C and D 63 contain positively charged amino acids that bind to DNA. The crystal structure of the SNAC1 TF 64 revealed the presence of a central semi-β-barrel formed from seven twisted anti-parallel β-strands 65 with three α-helices 4. The NAC domain is most responsible for DNA binding activity that lies 66 between amino acids Val119-Ser183, Lys123-Lys126, with Lys79, Arg85, and Arg88 reside 67 within different strands of β-sheets 2,5,6. The remaining portion of the NAC domain contains a 68 loop region composed of the amino acids, Gly144-Gly149 and Lys180-Asn183, which are very 69 flexible in nature 4. The loop region of SNAC1 is quite long and different from the loop region of 70 ANAC, an abscisic-acid-responsive NAC, and could underlie the basis for different biological 71 functions. NAC TFs possesses mono or bipartite nuclear localization signals which contain a Lys 72 residue in sub-domain D 1,6–8. In addition, NAC proteins, as part of a mechanism of self- 73 regulation, also modulate the expression of several other proteins 6,9. The D subunit of a few NAC 74 TFs contain a hydrophobic negative regulatory domain (NRD), comprised of L-V-F-Y amino 75 acids, which is involved in suppressing transcriptional activity 10. For example, the NRD domain 76 can suppress the transcriptional activity of Dof, WRKY, and APETALA 2/dehydration responsive 77 elements (AP2/DRE) TFs 10. 2 bioRxiv preprint doi: https://doi.org/10.1101/608885; this version posted April 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 78 Studies indicate that the diverse C-terminal domain contains a transcription regulatory region 79 (TRR) which has several group-specific motifs that can activate or repress transcription activity 80 11–14. The diverse C-terminal region imparts differences in the function of individual NAC 81 proteins by regulating the interaction of NAC TFs with diverse target proteins. Although the C- 82 terminal region of NAC TFs is diverse, it also contains group-specific conserved motifs 15. 83 Although the diverse aspects of NAC TFs have been studied, most were conducted within 84 individual plant species. A detailed comparative study of the genomic, molecular biology, and 85 evolution of NAC TFs has not been conducted. Therefore, a comprehensive analysis of NAC TFs 86 is presented in the current study. 87 Results and discussion 88 NAC transcription factors exhibit diverse genomic and biochemical features 89 Advancements in genome sequencing technology have enabled the discovery of the genomic 90 details of hundreds of plant species. The availability this genome sequence data allowed us to 91 characterize the genomic details of NAC TFs in diverse plant species. The presence of NAC TFs 92 in 160 species (18774 NAC sequences) was identified and served as the basis of the conducted 93 analyses. Comparisons of NAC sequences revealed that Brassica napus has the highest number 94 (410) of NAC TFs, while the pteridophyte plant, Marchantia polymorpha, was found to contain 95 the lowest number (9) (Table 1). On average, monocot plants contain a higher (141.20) number of 96 NAC TFs relative to dicot plants (125.56). Except for Hordeum vulgare (76), Saccharum 97 officinarum (44), and Zostera marina (62) all other monocot species possess more than one 98 hundred NAC TFs each (Table 1). Lower eukaryotic plants, bryophytes and pteridophytes also 99 possess NAC TFs. In addition, the algal species, Klebsormidium flaccidum, also contains NAC 100 TFs and this finding represents the first report of NAC TFs in algae (Table 1). A NAC TF in 101 Trifolium pratense (Tp57577_TGAC_v2_mRNA14116) was found to be the largest NAC TF, 102 comprising 3101 amino acids, while a NAC TF in Fragaria x ananassa 103 (FANhyb_icon00034378_a.1.g00001.1) was found to be the smallest NAC TF, comprising only 104 25 amino acids. Although it only contains a 25 amino acid sequence, it still encodes a NAC 105 domain. Typically, NAC TFs contain a single NAC domain located near the N-terminal region of 106 the protein. The current analysis, however, also identified NAC TFs with two NAC domains. At 107 least 77 of the 160 studied species were found to contain two NAC domains (Table 1).