2019 International Conference on Energy, Power, Environment and Computer Application (ICEPECA 2019) ISBN: 978-1-60595-612-1

The MatK Gene of Spiranthes Sinensis Encodes a Maturase Involved in Multiple Biological Processes

* Jing FAN College of Life Sciences, Leshan normal University, Leshan, P.R. China *Corresponding author

Keywords: MatK Gene, Spiranthes Sinensis, Group II Intron, Domain X.

Abstract. The maturase K gene (matK) encodes the only hypothetical group II intron maturase in the chloroplast genome. To investigate the molecular characteristics and functions of matK gene in Spiranthes sinensis (SsmatK), the Ssmatk protein was systematically analyzed by bioinformatics. The results showed that SsmatK gene encodes an unstable hydrophilic protein located in chloroplast. Sequence alignment showed that SsmatK protein contained a domain X region, which was a specific region of group II intron encoding maturase. Phylogenetic tree analysis showed that SsmatK protein is most closely related to the matK proteins of Schiedeella garayana and Schiedeella romeroana. Post-translational modification analysis revealed that SsmatK protein had 55 phosphorylation sites and 3 glycosylation sites. The result of tertiary structure showed that two potential phosphorylation sites Ser446 and Thr450 in the essential SCARTLARKHK sequence may be two important synergistic sites for SsmatK protein function. Molecular functional analysis of the gene ontology showed that SsmatK protein was defined as binding activity, including specific binding to other proteins, organic cyclic compounds, nucleic acids, heterocyclic compounds and RNA molecules. Biological processes analysis of gene ontology further revealed that SsmatK was mainly involved in metabolic processes, gene expression and RNA processing. The above results suggest that the SsmatK gene located within the chloroplast group II intron encodes a highly conserved maturase and participates in a variety of biological processes. Therefore, strengthening the identification of potential coding genes in chloroplast genome will help to enrich our understanding of the complexity of transcriptional genes.

Introduction chloroplasts are important organelles that convert light energy into chemical energy. At present, the chloroplast genome sequences of many have been obtained, and a good understanding of the various biological processes occurring in chloroplasts has been made [1]. Spiranthes sinensis is one of the important orchid plants in China and has been listed as a national secondary protected plant containing effective substances for scavenging free radicals, anti-inflammatory and anti-tumor [2-4]. However, little information about the chloroplast coding genes of Spiranthes sinensis is available. The group II introns are of interest because they have the potential to encode maturase that play a role in introns mobility and RNA splicing [5]. For example, the matK gene located in the intron of chloroplast transfer RNA gene for lysine (trnK) encodes the protein that involved in trnk intron splicing [6]. The most prominent feature of matK is its rapid evolution rates, the base substitutions of matK gene occur anywhere in the triple codons, however, base substitutions for most other types of genes usually occur at the third base of the triplet codon, making the encoded matK proteins more prone to amino acid variation than other types of proteins [7]. The insertion and deletion of nucleic acids also often occur in the matK gene and resulting in transposition mutations [8-9]. Although about 2500 orchid matk genes in the GenBank database were defined as pseudo-genes, the SsmatK gene can express a normal protein with a molecular weight of about 62 kDa [10]. However, molecular information of SsmatK remains largely unknown. In this paper, molecular characteristics and functions of SsmatK gene were analyzed. This work is of great

222

significance for enriching molecular information of the chloroplast introns encoding matK gene of Spiranthes sinensis.

Materials and Methods Materials Sequences of matK protein from different species were derived from NCBI database, including Spiranthes sinensis (GenBank accession no. SCA64306.1), Hordeum vulgare (GenBank accession no. BAC54890.1), Aulosepalum pyramidale (GenBank accession no. CAP06379.1), Aulosepalum oestlundii (GenBank accession no. CAP06381.1), Beloglottis mexicana (GenBank accession no. SCA64305.1), Aulosepalum sp. Reyes (GenBank accession no. CAP06382.1), Aulosepalum tenuiflorum (GenBank accession no. CAP06387.1), Sarcoglottis schaffneri (GenBank accession no: CAP16586.1), Pelexia sp. Salazar (GenBank accession no. CAP16585.1), Physogyne gonzalezii (GenBank accession no: SCA64308.1), Dichromanthus michuacanus (GenBank accession no: CAP16591.1), Brachystele polyantha (GenBank accession no: SCA64310.1), Deiregyne falcata (GenBank accession no: SCN13869.1), Dichromanthus yucundaa (GenBank accession no: SCA64322.1), Schiedeella garayana (GenBank accession no: SCA64314.1), Schiedeella romeroana (GenBank accession no: SCA64315.1), Schiedeella wercklei (GenBank accession no: SCA64319.1), Dichromanthus cinnabarinus (GenBank accession no: CAP16590.1). Methods The physicochemical properties of matK protein were analyzed using Protparam software on the ExPASy Server (https://web.expasy.org/protparam/). Transmembrane analysis was performed on the TMHMM server with default parameters (http://www.cbs.dtu.dk/services/TMHMM-2.0/) [11]. Signal peptide prediction was performed by the online SignalP 4.1 server (http://www.cbs.dtu.dk/ services/SignalP/). Protein subcellular localization and gene ontology analysis were predicted using the online tools PredictProtein (https://www.predictprotein.org/). To determine the conserved domain of SsmatK protein, 12 matK protein sequences from various plants were retrieved from NCBI database and compared using Clustal X1.83 (EMBL-EBI, Cambridge, UK). To determine the evolutionary position of matK proteins amongst Spiranthes sinensis and its close relatives, 18 matK protein sequeces were used for multiple sequence alignment by MEGA 6.0 software, and then phylogenetic tree was generated using the neighbor-joining algorithm and tested with 1000 bootstrap replications [12]. Subsequently, the genetic distance was calculated according to the protein difference by p-distance method. The potential phosphorylation sites were identified by NetPhos 3.1 server (http://www.cbs.dtu.dk/services/NetPhos/). N-linked and O-linked glycosylation sites were predicted by NetNGlyc 1.0 Server (http://www.cbs.dtu.dk/services/ NetNGlyc/) and NetOGlyc 4.0 Server (http://www.cbs.dtu.dk/services/NetOGlyc/), respectively. Peptide secondary structure prediction was constructed using the SOPMA software available at Prabi-Gerland Rhone-Alpes Bioinformatic Pole Gerland (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_ sopma.html) [13]. To construct the tertiary structure of SsmatK, its protein sequence was submitted to the swiss-model workspace, then a target-template sequence alignment was done, finally, the tertiary structure of SsmatK was automatically established by the swiss-model server and viewed with PyMOL software (version 0.99, DeLano Scientific LLC, South San Francisco, California, USA) [14].

Results

Physicochemical Properties of SsmatK Protein To reveal the physicochemical properties of SsmatK protein, its protein sequence was analyzed by the ProtParam tool on ExPASy server. The results showed that SsmatK encodes a 517 amino acid protein with molecular weight of 62.34 kDa and theoretical isoelectric point (pI) of 9.88, the molecular

223 formula of SsmatK was determined as C2941H4479N743O728S14, the unstable coefficient was 45.83 and the average hydropathicity was -0.039, respectively. Since the instability parameter of stable protein is less than 40 and the average hydropathicity value of hydrophilic protein is less than 0, it is inferred that SsmatK is an unstable hydrophilic protein.

Transmembrane Topology and Signal Peptide Analysis of SsmatK Protein To analyse the transmembrane topology of SsmatK, the protein sequence was analyzed using the default settings of the TMHMM v2.0 program. The result showed the heights of the posterior probability peaks of SsmatK were lower than the transmembrane line, indicating no typical transmembrane region in SsmatK protein (Fig. 1A). For SignalP predictions, the default settings for eukaryotes and the default cut-off value 0.45 (D-score) for SignalP-noTM networks were chosen. The neural network in SignalP generated three scores for each position of the input SsmatK sequence, the maximal value of raw cleavage site score (C-score), combined cleavage site score (Y-score) and signal peptide score (S-score) were 0.177, 0.137 and 0.171, respectively. The average S-score (mean S) of the possible signal peptide at position 1-37 amino acids was 0.105, the weighted average of the mean S and the maximal Y-scores (D-score) was 0.119. The above results showed that all scores for SsmatK protein were lower than the cut-off score of 0.45, indicating that there was no signal peptide in SsmatK protein (Fig. 1B). The over-all results suggest that SsmatK is a non-secretory protein.

Figure 1. Transmembrane domain and signal peptide analysis of SsmatK protein. Note: (A): Transmembrane topology. T indicates transmembrane, O and I represent outside and inside of the membrane, respectively; (B): Signal peptide. C-score represents raw cleavage site score, Y-score represents the combined cleavage site score, S-score represents signal peptide score.

Multiple Sequence Alignment and Phylogenetic Tree of SsmatK Protein To analyze the conserved region of SsmatK protein, the protein sequences of Spiranthes sinensis and 11 other plants were compared using the do complete alignment option of ClustalX 1.83. The results showed that the aligned protein matrix of matK consisted of 517 amino acids, the amino acid substitutions of matK protein in different plants were relatively even, but the amino acid substitution rates of N-terminal proteins were slightly higher. A domain-X region was found at position amino acid 372-472 of SsmatK, which contained the essential functional domian of SCARTLARKHK of matK proteins (Fig. 2). All above results suggest that matK protein sequences varies by species, but they are all constrained to retain the essential structural feature to maintain their biological functions. To analyze the evolution of SsmatK protein, the matK protein sequences of different species were analyzed by MEGA 6.0 software, then the distance estimation was calculated to establish the NJ phylogenetic tree. The results showed that SsmatK protein was closely related to the matk proteins of Schiedeella garayana and Schiedeella romeroana, their genetic distance were both 0.033. However, the genetic distance between the matK proteins of Spiranthes sinensis and Hordeum vulgare was far, with a genetic distance of 0.382 (Fig. 3).

224

Figure 2. Multiple sequence alignment of 12 matK proteins. Note: A period (.) indicates that the residue is identical to that of Spiranthes sinensis in the first line. The conserved domain-X motif in matK protein was indicated by overline, which has an essential sequence SCARTLARKHK marked

225 with a wavy line. The phosphorylation sites were indicated by asterisks, asparagine (N)-linked glycosylation sites were labeled with triangle and threonine (O)-linked glycosylation site was indicated by rhombus.

Figure 3. Neighbor-joining (NJ) phylogenetic tree of matK proteins. Note: The tree was constructed with the matK proteins of Spiranthes sinensis and 17 other plants. The number near each node represents the percentage that supports that node, with 1000 bootstrap replicates.

Phosphorylation and Glycosylation Analysis of MatK Proteins To reveal post-translational modifications of SsmatK protein, the N-glycosylation sites, O-glycosylation sites and phosphorylation sites were predicted by NetNGlyc1.0 Server, NetOGlyc4.0 Server and NetPhos3.1 Server, respectively. The prediction of phosphorylation sites of SsmatK protein showed that there were 55 phosphorylation sites, including 7 tyrosines (Y), 12 threonines (T), 35 serines (S) and 1 isoleucine (I). Analysis of glycosylation sites showed that SsmatK protein had two asparagine (N)-linked glycosylation sites at position of 83 and 336 amino acids, and a threonine (O)-linked glycosylation site at the 458-amino acid residue. The above results indicated that phosphorylation was the main modification of SsmatK protein. Interestingly, two amino acids (Ser446 and Thr450) in the essential sequence SCARTLARKHK of SsmatK protein were also phosphorylated, meaning that these two amino acids are potential important functional sites (Fig. 2).

Structure of SsmatK Protein To explore the spatial structure of SsmatK protein, the secondary and tertiary structure of SsmatK protein were studied using the Sopma and Swiss-model software, respectively. The secondary structure showed that SsmatK consists of 47.58% alpha-helices, 18.18% extended strands, 3.87% beta-turns and 30.37% random coils. The tertiary structure of SsmatK also indicated that alpha-helices and random coils are the main components (Fig. 4A). In addition, inter-residue interactions between Ser446 and Thr450, two potential phosphorylation sites in the essential sequence SCARTLARKHK of SsmatK protein, were aslo found. The results revealed that one oxygen atom of Ser446 can interact with the nitrogen-atom of Thr450, the interaction distance was 3.5 Å (Fig. 4B), indicating these two sites maybe have synergistic effects on the function of SsmatK protein.

226

Figure 4. Tertiary structure and amino acid residue interactions of SsmatK protein. Note: (A) Tertiary structure of matK protein of Spiranthes sinensis were shown in cartoon representation, with alpha-helices colored in red, coiled regions colored in green, beta sheets colored in yellow. The essential sequence SCARTLARKHK colored in blue, amino acids Ser446 (S446) and Thr450 (T450) were indicated by arrows. N and C represent the N- and C-terminals of the SsmatK peptide chain, respectively; (B) Interaction of the amino acid residues Ser446 and Thr450 was shown in stick-sphere representation, the hydrogen bond was shown as dashed red lines and the interaction distance was 3.5 Å.

Figure 5. Gene ontology analysis of SsmatK protein. Note: (A) Molecular function ontology; (B) Cellular component ontology; (C) Biological process ontology

227 Functional Analysis of SsmatK Protein To determine the functional characteristics of SsmatK, the amino acid sequence was analyzed by Predictprotein program. The results showed that there were 12 protein binding sites at amino acid sites 1+2, 19, 21, 73, 75, 91, 196, 198, 214+215, 243, 418 and 499, respectively. In addition, a polynucleotide binding region was also found at position amino acid 460. Gene ontology information indicated that the molecular function of SsmatK protein was mainly binding (Fig. 5A). The result of cellular component ontology showed SsmatK protein mainly existed in chloroplasts (Fig. 5B). The result of biological process ontology further revealed SsmatK was involved in metabolic process, gene expression and RNA processing (Fig. 5C). The above results indicate that SsmatK protein has important function for chloroplast.

Discussion Chloroplasts are important organelles for plant photosynthesis, the normal operation of chloroplast in higher plants requires the coordination of chloroplast genes and nuclear genes [15]. It was generally believed that the chloroplast genome of most higher plants contains 100-130 different genes, about 100 of these genes encode proteins involved in photosynthesis and gene expression [16-17]. However, depending on the size of the chloroplast genome, its coding capacity is far beyond what we now know. Therefore, strengthening the identification of unknown functional genes of chloroplast genome will helps to enrich our understanding of the chloroplast genome functions. Spiranthes sinensis was listed as an endangered species in China. To date, however, only a few of the chloroplast genes have been isolated and their molecular information remains largely unknown. In this study, molecular characteristics and functions of the group II intron encoding matK protein in Spiranthes sinensis were studied. matK protein proposed to be the only maturase encoded by group II intron of chloroplast genome, and the intron splicing of some chloroplast genes can only be performed by chloroplast maturases, which makes matK an important gene for chloroplast to play its normal function [18]. matK is a fast evolving gene at levels of nucleotides and amino acids, this unique feature makes it widely used in the phylogenetic analysis of land plants [19-20]. However, rapid evolution of chloroplast matK gene will lead to production of pseudogenes as a result of gene inactivation, and the matK gene is often considered a pseudogene due to occasional occurrence of frameshift indels, high frequency of transition/transversion or substitution ratios in some orchid taxa [21-22]. With the help of Protparam, TMHMM v2.0, signalP 4.1 and Predictprotein tools, SsmatK was predicted to be an unstable non-transmembrane protein (Fig. 1). Many chloroplast maturases encoded by group II introns can function in RNA splicing and binding of intron RNA during reverse transcription, which usually contain a readily identifiable domain X region containing an essential sequence SX3-6TLAXKXK [5, 23]. The multiple sequence alignment result in our study revealed that the domain X region existed in SsmatK protein and featured with the essential sequence SCARTLARKHK (Fig. 2). Phylogenetic tree analysis showed that SsmatK is closely related to the matK proteins of Schiedeella garayana and Schiedeella romeroana (Fig. 3). Therefore, based on the results of multiple sequence alignment and phylogenetic tree analysis, it can be concluded that the SsmatK protein belongs to the group II intron encoded maturase. Post-translational modification can regulate protein activity and localization, it can also affect protein-protein interaction in many cellular processes [24]. In addition, post-translational modification is also an important approach to increase proteome complexity and lead to fine regulation of protein function [25-26]. Glycosylation and phosphorylation analysis indicated that SsmatK protein contains 55 phosphorylation sites and 3 glycosylation sites (Fig. 2). The above results indicated that phosphorylation is the most important protein modification pattern of SsmatK protein, and since phosphorylation and dephosphorylation of proteins are key steps in cell signal transmission, these sites may have important effects on the activity and function of SsmatK protein. Residual interaction was found between the two potentially phosphorylated amino acids Ser446 and Thr450 in the high conservative SCARTLARKHK motif, the result suggested that Ser446 and Thr450 may be two synergistic sites that are important for SsmatK

228

protein function (Fig. 4). MatK gene has been shown to be closely related to the normal function of chloroplasts. A 19-base insertion was found in the encoding region of chloroplast matK gene of Wogon-Sugi, which caused the loss of chlorophyll synthesis [8]. In our study, the molecular function of SsmatK protein was defined as binding activity, including specific binding to other proteins, organic cyclic compounds, nucleic acids, heterocyclic compounds and RNA molecules, mainly involved in metabolic process, gene expression and RNA processing, indicating that SsmatK is an important multifunctional protein (Fig. 5).

Conclusion The characteristics and functions of SsmatK protein were studied from the perspective of bioinformatics in this study, the results suggest that the SsmatK gene located in the chloroplast trnK intron encodes a maturase involved in mutiple biological processes. This paper provided a detailed molecular information for the SsmatK gene, it was also significant for enriching chloroplast genetic information of the endangered Spiranthes sinensis.

Acknowledgements This work was supported by the key scientific research projects of Sichuan education department (16ZA0302).

References [1] J. Lan, L. Li, L. Jia, Y. Cao, H. Bai, H. Chen, S. Liu, L. Wu, G. Liu, Expression profiling of chloroplast-encoded proteins in rice leaves at different growth stages, Progress in Biochemistry and Biophysics. 38 (2011) 652-660. (Chinese) [2] C. Liang, C. Chang, C. Liang, K. Hung, C. Hsieh, In vitro antioxidant activities, free radical scavenging capacity, and tyrosinase inhibitory of flavonoid compounds and ferulic acid from Spiranthes sinensis (Pers.) Ames, Molecules. 19 (2014) 4681-4694. [3] J. Peng, Q. Xu, Y. Xu, Y. Qi, X. Han, L. Xu, A new anticancer dihydroflavanoid from the root of Spiranthes australis (R. Brown) Lindl, Natural Product Research. 21 (2007) 641-645. [4] P. Shie, S. Huang, J. Deng, G. Huang, Spiranthes sinensis suppresses production of pro-inflammatory mediators by down-regulating the NF-κB signaling pathway and up-regulating HO-1/Nrf2 anti-oxidant protein, The American Journal of Chinese Medicine. 43 (2015) 969-989. [5] G. Mohr, P. Perlman, A. Lambowitz, Evolutionary relationships among group II intron-encoded proteins and identification of a conserved domain that may be related to maturase function, Nucleic Acids Research. 21 (1993) 4991-4997. [6] M. Enan, Low-stringency single-specific-primer PCR as a tool for detection of mutations in the matK gene of phaseolus vulgaris exposed to paranitrophenol, Journal of Cell and Molecular Biology. 10 (2012) 71-77. [7] M. Barthet, K. Hilu, Expression of matK: functional and evolutionary implications, American Journal of Botany. 94 (2007) 1402-1412. [8] T. Hirao, A. Watanabe, M. Kurita, T. Kondo, K. Takata, A frameshift mutation of the chloroplast matK coding region is associated with chlorophyll deficiency in the Cryptomeria japonica virescent mutant Wogon-Sugi, Current Genetics. 55 (2009) 311-321. [9] J.V. Freudenstein, D.M. Senyo, Relationships and evolution of matK in a group of leafless orchids (Corallorhiza and Corallorhizinae; : Epidendroideae), American Journal of Botany. 95 (2008) 498-505.

229 [10] M.M. Barthet, K. Moukarzel, K.N. Smith, J. Patel, K.W. Hilu, Alternative translation initiation codons for the plastid maturase MatK: unraveling the pseudogene misconception in the Orchidaceae, BMC Evolutionary Biology. 15 (2015) 210. [11] A. Krogh, B. Larsson, G. Von Heijne, E.L. Sonnhammer, Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes, Journal of Molecular Biology. 305 (2001) 567-580. [12] K. Tamura, G. Stecher, D. Peterson, A. Filipski, S. Kumar, MEGA6: molecular evolutionary genetics analysis version 6.0, Molecular Biology and Evolution. 30 (2013) 2725-2729. [13] C. Geourjon, G. Deléage, SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments, Bioinformatics. 11 (1995) 681-684. [14] T. Schwede, J. Kopp, N. Guex, M. Peitsch, SWISS-MODEL: an automated protein homology-modeling server, Nucleic Acids Research. 31 (2003) 3381-3385. [15] J.E. Mullet, Chloroplast development and gene expression, Annual Review of Plant Physiology and Plant Molecular Biology. 39 (1988) 475-502. [16] D. Shi, Z. Liu, W. Jin, Biosynthesis, catabolism and related signal regulations of plant chlorophyll, Hereditas. 31 (2009) 698-704. (Chinese) [17] S. Shusei, N. Yasukazu, K. Takakazu, A. Erika, T. Satoshi, Complete structure of the chloroplast genome of arabidopsis thaliana, DNA Research. 06 (1999) 283-290. [18] H. Neuhaus, G. Link, The chloroplast tRNALys(UUU) gene from mustard (Sinapis alba) contains a class II intron potentially coding for a maturase-related polypeptide, Current Genetics. 11 (1987) 251-257. [19] W.T. Jin, A. Schuiteman, M.W. Chase, J.W. Li, S.W. Chung, T.C. Hsu, X.H. Jin, Phylogenetics of subtribe Orchidinae s.l. (Orchidaceae; ) based on seven markers (plastidmatK,psaB, rbcL, trnL-F, trnH-psba,and nuclear nrITS,Xdh): implications for generic delimitation, BMC Plant Biology. 17 (2017) 222. [20] C. Castellanos, R. Steeves, G.P. Lewis, A. Bruneau, A settled sub-family for the orphan tree: The phylogenetic position of the endemic Colombian Orphanodendron in the Leguminosae, Brittonia, 69 (2017) 62-70. [21] A. Kocyan, E.F. De Vogel, E. Conti, B. Gravendeel, Molecular phylogeny of Aerides (Orchidaceae) based on one nuclear and two plastid markers: A step forward in understanding the evolution of the Aeridinae, Molecular Phylogenetics and Evolution. 48 (2008) 422-443. [22] W.M. Whitten, N.H. Williams, M.W. Chase, Subtribal and generic relationships of Maxillarieae (Orchidaceae) with emphasis on Stanhopeinae: combined molecular evidence, American Journal of Botany. 87 (2000) 1842-1856. [23] D.C. Hao, J. Mu, S.L. Chen, P.G. Xiao, Physicochemical evolution and positive selection of the gymnosperm matK proteins, Journal of Genetics. 89 (2010) 81-89. [24] A. Hashiguchi, S. Komatsu, Impact of post-translational modifications of crop proteins under abiotic stress, Proteomes. 04 (2016) 42. [25] M. Audagnotto, M. Dal Peraro, Protein post-translational modifications: In silico prediction tools and molecular modeling, Computational and Structural Biotechnology Journal. 15 (2017) 307-319. [26] A. Darling, V. Uversky, Intrinsic Disorder and Posttranslational Modifications: The Darker Side of the Biological Dark Matter, Frontiers in Genetics. 09 (2018) 158.

230