<<

www.nature.com/scientificreports

OPEN Molecular evidence for origin, diversifcation and ancient gene duplication of Received: 22 January 2019 Accepted: 9 August 2019 (SBTs) Published: xx xx xxxx Yan Xu1,2,3, Sibo Wang2,4,5, Linzhou Li2,6, Sunil Kumar Sahu 2,4, Morten Petersen5, Xin Liu 2,4, Michael Melkonian7, Gengyun Zhang2,4 & Huan Liu 2,4,5

Plant subtilases (SBTs) are a widely distributed family of which participates in plant developmental processes and immune responses. Although SBTs are divided into seven subgroups in , their origin and evolution, particularly in green remain elusive. Here, we present a comprehensive large-scale evolutionary analysis of all subtilases. The plant subtilases SBT1-5 were found to be monophyletic, nested within a larger radiation of suggesting that they originated from bacteria by a single horizontal gene transfer (HGT) event. A group of bacterial subtilases comprising representatives from four phyla was identifed as a sister group to SBT1-5. The phylogenetic analyses, based on evaluation of novel streptophyte algal genomes, suggested that the recipient of the HGT of bacterial subtilases was the common ancestor of , and . Following the HGT, the gene duplicated in the common ancestor and the two genes diversifed into SBT2 and SBT1, 3–5 respectively. Comparative structural analysis of homology-modeled SBT2 also showed their conservation from bacteria to embryophytes. Our study provides the frst molecular evidence about the evolution of plant subtilases via HGT followed by a frst gene duplication in the common ancestor of Coleochaetophyceae, Zygnematophyceae, and embryophytes, and subsequent expansion in embryophytes.

Serine proteases are a highly abundant and functionally diverse class of proteins which occupy a notable place in plants1,2. According to the MEROPS peptidase database, the clan SB is one of 13 clans of serine proteases that are widely distributed in Archaea, Bacteria, and eukaryotes2–4. Clan SB contains two families, family S8 (ofen called the subtilase family) and family S53 (the family)4. Te catalytic mechanisms of the two families of clan SB are diferent. In family S8 the residues form a in the order Asp, His, Ser, whereas fam- ily S53 contains a catalytic tetrad in the order Glu, Asp, Asp, Ser. Family S8 is the second largest family of serine proteases and is divided into two subfamilies, S8A (type example ) and S8B (type example )5. Plant subtilisin-like proteases (also known as plant subtilases, SBTs) belong to subfamiliy S8A. Te frst subtilase cloned from plants was cucumisin from melon fruit6. In the past 20 years, several SBT gene families have been revealed throughout the plant , in Vitis vinifera7,8, Arabidopsis thaliana9, Oryza sativa10, Solanum lycopersicum11,12, Solanum tuberosum13, and others. Te Arabidopsis proteome alone comprises 56 subtilases. Based on these fndings SBTs have been divided into six subgroups9, and recently one of the sub- groups (SBT6.1) was described as a seventh subgroup (SBT7)14. Schaller et al. have summarized most of the functions of plant subtilases, such as in embryogenesis, seed devel- opment and germination, cuticle formation and epidermal patterning, vascular development, programmed cell

1BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China. 2BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China. 3China National GeneBank, Institute of New Agricultural Resources, BGI-Shenzhen, Jinsha Road, Shenzhen, 518120, China. 4State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, 518083, China. 5Department of Biology, University of Copenhagen, Copenhagen, Denmark. 6School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China. 7Botanical Institute, Cologne Biocenter, University of Cologne, Cologne, D-50674, Germany. Yan Xu, Sibo Wang and Linzhou Li contributed equally. Correspondence and requests for materials should be addressed to H.L. (email: [email protected])

Scientific Reports | (2019)9:12485 | https://doi.org/10.1038/s41598-019-48664-6 1 www.nature.com/scientificreports/ www.nature.com/scientificreports

a. b.

Embryophyta Alveolata Streptophyte algae Euglenozoa Stramenopiles Glaucophyta Rhodophyta Fungi Bacteria Archaea

Chloroflexi Chloroflexi Firmicutes 100 Bacterial S8A genes 100 Caldiserica (S8 cluster 1) SBT1-5 97 Caldiserica 98 Coprothermobacterota 84 Actinobacteria Chloroflexi Chloroflexi 98 Gamma-proteobacteria 83 75 S8 cluster 1 86 100 Delta-proteobacteria Delta-proteobacteria Bacterial Subtilases 79 100 Actinobacteria 69 100 Chloroflexi Actinobacteria

Actinobacteria 40 83 100 Bacterial Subtilases S8 cluster 2 100 100 Gamma-proteobacteria 98 59 97 Chloroflexi 99 Beta-proteobacteria/Gamma-proteobacteria

Beta-proteobacteria/Gamma-proteobacteria KP43 proteinase 93 100 Gamma-proteobacteria 70 Beta-proteobacteria 88 100 Firmicutes 83 Actinobacteria

56 57 SBT7 97 Chloroflexi 95 68 Actinobacteria

Mesotaenium endlicherianum S8 cluster 4 99 Streptophyte algae scutata HGT 100 92 Streptophyte algae

100 “ sp.” Plant Subtilases 100 Streptophyte algae (SBT2) 73 Tree scale: 1 Embryophyta SBT6 100

S8 cluster 3

Figure 1. Phylogenetic relationship of the S8A gene subfamily. (a) Maximum likelihood unrooted phylogenetic tree of the S8A subfamily from representative Archaea, Bacteria and species was constructed with IQ-TREE (model: LG + R10, predicted by Modelfnder) using an ultrafast bootstrap approximation (100,000 bootstrap replicates). Colored domains display eight diferent clusters in the S8A subfamily. Te domains with a continuous line indicate resolved clusters, while domains with dotted lines represent undefned clusters. Te colored circles at the top lef represent the species composition of individual clusters. Te plant subtilases (SBT1-5) apparently originated from bacterial subtilases (red branches in in paraphyletic divergences) through a single HGT. (b) Phylogenetic analysis of plant subtilases using an extended taxon sampling of bacterial subtilases to search for a bacterial sister group to the plant subtilases was constructed by Maximum Likelihood using 500 bootstrap replicates (model: WAG + F + R7, predicted by Modelfnder). Plant subtilases are monophyletic with a clade of bacterial sequences derived from four phyla (Proteobacteria (only Gammaproteobacteria and Betaproteobacteria), Chlorofexi, Actinobacteria and Firmicutes). Te streptophyte algal sequences from Mesotaenium endlicherianum, Coleochaete scutata and “Spirotaenia sp.” diverge paraphyletically from the common ancestor of the plant subtilases with “Spirotaenia sp.” in sister position to embryophytes (the detailed tree with all taxon and species names is shown as Supplementary Fig. S1). Some bacterial S8 genes from the phylogenetic tree of the S8 cluster 1 (a) were selected as an outgroup.

death, organ abscission, and plant responses to their biotic and abiotic environments15–17. However, many specifc functions and physiological substrates of SBTs still need to be explored. Previous studies have found that land plant subtilases were derived from a single HGT (horizontal gene transfer) event18. Horizontal gene transfer is proposed relative to vertical gene transfer, which breaks down the boundaries of kinship and complicates the pos- sibility of gene fow. Many genes found in algae doesn’t have any homologs in higher plants were suggesting their possible bacterial origin. For instance, two alginate-specifc , MC5E and GDP-mannose dehydrogenase show high similarity with the bacterial genes, indicating that these genes might have undergone a non-canonical evolutionary history. Taylor and Qiu (2017) investigated the evolutionary history of plant subtilases through a phylogenetic analysis using 2,460 subtilase sequences of 341 species, and identifed 11 new gene lin- eages14. Meanwhile, the presence of plant subtilases in streptophyte algae, the grade of most closely related to land plants (embryophytes), was also frst reported in their study, based on analysis of transcriptomes14. Plant subtilases seem to have experienced several gene duplications, which were accompanied by functional diversifcation7. Due to their multiple duplications and complex evolutionary history, it is important to explore the origin and diversifcation of SBTs among including the streptophyte algae. We have recently sequenced the genomes of several streptophyte algal species (unpublished data) enabling us to explore the evo- lution of subtilases including all major lineages of streptophyte algae. We reconstructed the phylogeny of all S8A subtilases existing in archaea, bacteria, and . Ten we focused on the origin and evolution of plant subtilases (SBTs). Here we show that plant subtilases originated from bacteria by HGT into streptophyte algae followed by a gene duplication event in the common ancestor of Coleochaetophyceae, Zygnematophyceae and embryophytes. Our study provides new information about the origin and early diversifcation of plant subtilases and thus contributes to a better understanding of the phylogeny of the S8A protease family. Results Classifcation of the S8A gene family and putative origins of plant-type subtilases. To explore the phylogenetic relationship between plant subtilases and other members of the S8A family, genes related to plant subtilases were selected based on genome functional annotation. We performed a phylogenetic analysis incorpo- rating genes from Archaea, Bacteria, Fungi, Amoebozoa, Stramenopiles, Euglenozoa and (Fig. 1a;

Scientific Reports | (2019)9:12485 | https://doi.org/10.1038/s41598-019-48664-6 2 www.nature.com/scientificreports/ www.nature.com/scientificreports

Supplementary Tables 1–3 with all genes harboring the conserved S8 domain. Finally, eight (nine: SBT7 and S8 cluster 4 could not be reliably separated in the phylogenetic analysis, Fig. 1a) defned clusters were obtained by combining domains and phylogenetic information (Fig. 1a; for conserved domain designations see Supplementary Table S3): Proteinase K, KP43 proteinase, S8 clusters 1, 2, 3, and 4, SBT 6, SBT 7, and cluster SBT 1–5. Among the S8A subfamily genes, the Proteinase K-related cluster fwas well supported. Proteinase K is a known that has previously been found only in fungi and bacteria5 but our analysis revealed that Proteinase K homologs also occur in Archaeplastida [from Rhodophyta, Glaucophyta, to green algae and bryo- phytes (Marchantia polymoprpha and Physcomitrella patens)] but not in vascular plants (Fig. 1; Supplementary Table S3). Another well-supported cluster is the KP43 protease with the Peptidases_S8_Kp43_protease domain. Te architecture of the Kp43 protease was reported to be similar to kexin and , both of which belong to the S8B subfamily19. In addition to Proteinase K and KP43 protease, other S8 family genes were also clustered in diferent groups (Supplementary Table S3). However, it was difcult to defne them precisely because of their non-systematic domain distribution. Plant subtilases possess one of three types of domains: Peptidases_S8_Tripeptidyl_Aminopeptidase_II (SBT6), Peptidases_S8_SKI-1_like (SBT7), and Peptidases_S8_3 (SBT1-5) (Supplementary Table S3). In Fig. 1a, SBT-1-5 occurs only in embryophytes, two clades of streptophyte algae (Coleochaetophyceae and Zygnematophyceae) and a set of diverse bacteria, which suggests horizontal gene transfer between these unrelated organisms. We termed these bacterial genes “bacterial subtilases” or “bacterial SBT”. In order to further explore the phylogenetic relationship between bacterial subtilases and plant subtilases, a detailed tree was reconstructed by using the genes selected from a wider range of species (Fig. 1b; Supplementary Fig. S1). In our preliminary analysis we performed the Blastp against the nr database using the cutof e value 1e-10. We found that only bacteria and streptophyte algae possess plant-like SBTs, suggesting the complete absence of SBTs among Archaea, Fungi and other eukaryote taxa. Te further detailed phylogenetic analyses of the plant algae-type subtilases suggested a single HGT event from a bacterial donor because the streptophyte sequences had a single origin, nested within a larger radiation of bacterial subtilases (Fig. 1a,b; Supplementary Fig. S1). Te search for the bacterial donor of the HGT is compounded by the fact that the SBTs apparently underwent mul- tiple HGTs among bacteria (Supplementary Fig. S1), i.e. the bacterial SBT phylogeny does not refect the species phylogeny. Te bacterial sister group to the plant subtilases (bootstrap support 95%; Fig. 1b; Supplementary Fig. S1) comprised species from four phyla (Proteobacteria (only Gammaproteobacteria and Betaproteobacteria), Chlorofexi, Actinobacteria and Firmicutes). All bacterial SBT sequences in this clade (except Halioglobus sp.20, which is a marine bacterium) correspond to soil bacteria, e.g. Glycomyces xiaoerkulensis21, Longilinea arvoryzae22 and Streptosporangium roseum23. HGT among soil bacteria is rampant involving IncP- and IncPromA-type broad host range plasmids24. Te frst diverging lineage in the bacterial sister clade consists of fve sequences from Gammaproteobacteria and Betaproteobacteria indicating that perhaps one of these two classes of bacteria had provided the donor SBT gene for the plant subtilases. Although a representative set of fve cyanobacterial genomes was included in the analysis, none encoded plant-like SBTs (Supplementary Fig. S1). An HGT of the SBT gene in the opposite direction i.e. from a plant donor to bacteria cannot be ruled out, however, this is unlikely, because the plant subtilases were nested within a larger bacterial radiation (and not the opposite), and no trace of plant subtilases were found in the earlier diverging streptophyte algae (for a recent review on HGT from bacteria to eukaryotes25). Te phylogenetic tree (Fig. 1b; Supplementary Fig. S1) indicated that the recipient of the bacterial subtilase was a streptophyte alga, most likely the common ancestor of Coleochaetophyceae, Zygnematophyceae and embryophytes (Fig. 1b).

Phylogenetic analysis of plant subtilases SBT1 to SBT5. To further explore the phylogenetic rela- tionship among SBT1-5 plant subtilases, an extensive search was conducted by taking 54 subtilases as the standard reference9. A total of 314 genes from 3 species of streptophyte algae (Coleochaete scu- tata, “Spirotaenia sp.” and Mesotaenium endlicherianum) and 7 species of embryophytes (Marchantia polymor- pha, Physcomitrella patens, moellendorfi, Salvinia cucullata, Oryza sativa, Zea mays and Arabidopsis thaliana) were selected. Tese identifed plant subtilase genes were also classifed into fve subgroups (SBT1-5), similar to the classifcation based on subtilases from Arabidopsis thaliana (Fig. 2). Each gene’s intron number and their average length was calculated as well (Fig. 2; Supplementary Table S5). It allowed us to gain new insights into the evolution of subtilases in Viridiplantae. First, the most interesting observation was the existence of plant-like SBT genes in streptophyte algae. Te possible presence of plant subtilases in algae had been reported before for three species of Zygnematophyceae ( sp., Cylindrocystis sp. and Roya obtusa; Taylor & Qiu, 2017; their Fig. 114) based on the evaluation of transcriptomes established by the 1,000 plants transcriptome initiative (1KP project)26, but their distribution within streptophyte algae was not studied. Because of the recent availability of high-quality genome data of several streptophyte algal clades, we were able to provide the frst evidence for the origin of SBT1 to SBT5 in the common ancestor of Coleochaetophyceae, Zygnematophyceae and embryophytes. According to the phylogenetic tree, following their origin by HGT, the plant subtilases likely underwent one gene duplication in the ancestor of Coleochaetophyceae, Zygnematophyceae and embryophytes, one copy evolved into SBT2 and the other was ancestral to SBT1, SBT3, SBT4, and SBT5 (Fig. 2). Other gene or genome duplications possibly occurred in the common ancestor of embryophytes, however, because of the low confdence levels with less than 50% bootstrap support, relationships among SBT3, SBT4, and SBT5 could not be resolved. Te SBT1 subclass is the largest of the subtilase subfamilies according to their large gene copy numbers, and this subclass has undergone multiple gene duplications starting in the monocot plants. Interestingly, we found that the intron numbers vary considerably between diferent groups of plant subtilases, especially in SBT1, almost all the genes had no intron. Tis phenomenon was also reported in grape subtilases, which is inferred that, in order to increase the ftness of an organism, the intragenic recombination is also increased which is further related to the evolu- tionary rate of genes7,27,28.

Scientific Reports | (2019)9:12485 | https://doi.org/10.1038/s41598-019-48664-6 3 www.nature.com/scientificreports/ www.nature.com/scientificreports

25

12

1

10

0 0 0

1 1 10 10 10

10 9 10 10 8

10

7 13 10

10

7 8 9 7

9

11 9

3 4 10 1 9 10 1

6

9 4 1 4 1 9 10 9 9 10 15

8 13 9

5 0

0 8 0

0

0

0

0 22 12 1087.3 0

0

4 8 9 5 0 0

3

385.67

2 0

2 2

8 9

2 4

. 277. 8

2 . 6 4

1 . 0

2 4 .

16.5 1

1 1 7

5

10.5 1

9 1 299.8 4

1 2

0 .56

8

2 186. 1 . 0 1 82.22

6 3 97.6 166.78 7 9 183. 18.5 1 5 13.6 1

7 1 168 121.5 18.43 320 88 7.8 0 1 13.22 7 75 6 1 264 726.27 11 269.38 193.78 77 243.67 0 214.75 106. 83.5 72 6 305.13 169.78 0 6 365. 224.8 42.6

400.73 0 28 3 9 0 0 205.13 58.4 2374 184. 0 218.07 0 3 0 5 0 1267 0 195.2 197 0

0 0 9 590. 0 0 796.0 33 0 0 9 0 0 3 554.1 ancestors 0

3 M 0

p

. a Sacu of SBT1,3,4,5

1 p

Z Sacu v1. . 0 0 799.43 Zm00001d002783 o

Os04t0543700-01 1

m 1 Zm0000 5

. 63 Zm0 l

y 7

6 6

Os01t076920 0

0 0 0

8

3 0 v1.1 V A V6.

0

0

9

0 0

T

5 s 0001 5

0 4 21 0 1 19.g021298 3 7 1 9V6.1 002.g000828 0 A 0

1 2 1 5 1 s0240. G

s 4 A s0263.g0 T5G44530.1 4

d A 1d043327 3 0 0 0 P T4G204 2 8 1 6 5 09V6.1 T1G3060 0 0

0 ACid 15 d012 1 s0 P 2 1 0 2 1 1

9 y 1 11 4 3 P ACid l 5 1 d Phypa65 9 a

i

4 SBT2 o ACid 0 9

. p

0 p 1 C 00000-0 1 0 Phypa48 63V6. y 655

. 0-01 a Map . g02662 9 1d01466 d03648 A

1 h p

Phypa 154030 26913 30. P M

P 8 40914 Phypa 065 P 1541 Phypa163 5 0 Phypa7 3 0 0. Mapol oly013 ACid Sacu v1.1 s0 0 1 15S1T1871 Sacu Sacu v1.1 T4G30020.1 Phy 308V6. 1 AT2G19170.1 67S01971 199 5 A S08 12 7 23 Os06t07 0 804 0 1 Zm0000 isp365S13 1 03 0 v1.1 s0 pa14 4S0 y0060s0 Zm00001 0 5s00 5410 PACid 15414 Sp 7V6. 9 0S0568 Spisp5 0 1 n12 Colsc 319V6.1 s 1 00-00 03.1 Colsc8333S0T1580 10.1 Spi 532 369S0695 0-01 1 1 0 Spisp 080.g0 Mesen472 890 0 1 e sp255 Mese 1 . 1.1.p p 0 Spis Mesen28 01431 0 1055S Mesen 3 n Col T1G04 Spisp 1794 944 S A 01d037 5900-01 0 Col p2 sc9 1 1 1619 Os03t 0 e 1 Col sc2 04893 6 255S 1S009 Os05t036870 4 768S 00-0 g s 305 Zm000 68 0 c 0 1 -01 1670S 0T1734 1 Os07t068 2S0T 1620 59 d023 Mesen96S Os03t0430500-0 86 0 3 Me 1 3 A Zm00001d02253 296 2220 0T190 80. 0 Mesen225S04sen15 Os12t04276 5 5 100 0524600 0-0 1d0 8 4 Me Zm00001 5 1012 7S03 2 1930 7 sen20 8 Os10t 1 49 0 0 Mesen2 S 100 6 Zm0000 Clavi 016 0-00 Mesen157S03010 7S0414 Os03t0 01d048 17 0 355 8 l 5 100 Cl ba 76 Mesen100 89S 0 avibacter cter-W Zm00001d027387 99 100 Arthrobacte 33 055900 0581 a 100 1 Zm000 1d0200 0 9 i 85 04t P 94 Kytoc 3 100 Os r -W 0122 100 0 0 S00 100 P 100

1 0-00 Kytococcuoccus- r-W 012 Zm0000 2 Intrasporangium-W 9803 061 100 2

e 0 P Zm00001d0026 0 005267086.2980 250

t 0 W 5.1 018230 0 P 100 Zm00001d00398 1d0 0-01 5 s- 08143974 34. 100 0 c W 100 100 1 9 Os04t Herpe Colwelli P 6530

37

0128 4 0 a 1 69 Zm0000 0 6 to P 51 s a-WP 0 02038. 6. 25 0 iph 1 1 80 Os02t06 4 Gl 3491397. 100 48 on- B ac 0330 85 79 0 1 Zm00001d017524 0261 1 0 8 Roseiflexuie W 45 cola 20 04t0558900-01 P 28 512. 05 84080. 92 001d 4 - 1 Os 0 2 4 C W 453 0 hlor P 0 100 635 1 90 Zm0 oflexus-Ys-W 512 .1 Na 76 6 7 P 9. 0 kamurella-W 012120 88534.1 1 100 Phypa156 69V6. 5 104.71 Herp 63 100 186V SBT1 P 1 Phypa232 23V6. 418.3 etosi 0016364 54 0 0 8 781. 59 1 100 ypa78 8 ph 100 h Roseifl P 01574750 1 P on- 100 100 In W 73. T1G01900.1 01 0 Ba tr 100 100 A 0- cillu asporan exus-W P 0 1 0 133.22 54533605 99 9 685. s am 7.1 100 100 6 100 Sacu v1.1057830 s0096.g01937 6 8 Ba gium-W P 0121218 0 cillu yloliq 96 2 1 s li uefa .1 Os07t che P 53 001d00662191 9 01 76 98 niform ciens 91. 74 100 96 m00 0 3493 1 100 Z 01d0 -Subtil 874. 22 is-S 58 40 7 ubtilisin C 1 100 100 Zm000 0 0 isin 65 0430700-013668 9 BPN 62 100 24 O Os04t 2 s02 arlsber 77 94 68 99 31 100 25.5 t02 40 Zm00001d00 0 7080 g -00 Os02t0 00 96 0-00 T5G67090.1 .86 2702 52 100 A 8688 0 Os02t 99 100 4 175. 00-01 0 02709 95 Os01t0 d012268 10 25 Os02t 0 00-0 00-01 317.83 0271000-01 0 Zm00001 80 100 1 Os02t0269600-0 98 00-0 71 0 Os01t08689 9 223.33 Zm0 t04358 79 Os05 10179 96 0001d007 0 52 0 100 .g0226 0 Zm000 91 140 517 100 41 Zm00001d0 s0 4 11 01d00731 270 1.33 97 73 cu v1.1 0 Zm0000 61 Sa 140.g02 359.25 9 100 0 2 1d043216 2.g010887 Zm0 99 Sacu v1.1 s0 003 0 0001d04 95 89 96 Os01t0 321 61 66 100 Sacu v1.1 s 4 11 1 6 120.g021409 0 1042 8.3 795 0 100-0 76 Sacu v1.1 s0 Os01t 0 33 100 7 0795000-0 86 240.1 0 82 T3G14 0 1 Os01t 1 100 100 71 A 00-0 438 0794800-0 93 76 .5 100 93 Os08t04521 0 Os01t 1 100 100 76 5 0795400 98 1 Zm00001d050092 26 Os01 -01 56 0 2 t0795 96 200-0 AT4G34980.1 1 432 0 6 432.43 Os04t012 100 73 16 1 Os03t0242900-01 100-01 98 97 99 Os04t0120300 675 9 55 Zm00001d028391 p 242.4 -00 100 .1. 0 Os04t01213 00-00 100 70 Mapoly0136s0006 364.33 100 .1 0 Zm00001 100 28 90V6 1 7 d037651 100 62 Phypa Os04 97 22V6.1 0 100.11 t0127300-01 9 98 Phypa170 8 100 100 100 2 1 Os04t0127 Phypa19 223V6. 144 86.5 200-02 7 100 98 17170 AT4G10540. PACid 154 1277 1 50 0 8 91.5 46 AT4G10510. 42 PACid 15407072 1 100 28 39 96 5420927 0 93.13 A PACid 1 0 8 T4G10550.3 89 64 41 AT3G14067.1 0 93.75 AT1G32960.1 100 100 100 00-01 0 8 A Os02t07792 T1G32950.1 100 100 0 102 Zm00001d018282 AT1G32940.1 90 12 0 0 9 100 72 100 Os02t0778900-01 117.44 AT1G32970.1 99 100 Zm00001d018281 0 01 91.89 AT4G10530.1 100 98 9 Zm00001d0 87 52104 0 AT4G10520.1 100 100 0 89.11 74 Os02t0779000-00 9 AT4G21640.1 98 0 175.89 93 100 94 39 Os02t0779900-00 AT4G21630.1 100 100 0 Os02t0780200-01 9 108.11 AT4G21650.1 29 100 AT2G0592 3077 .1 0.1 0 1 AT1G66220 100 100 115.78 21 89 Os09t044 9 0.1 Embryophyta 1000-00

SBT3 AT1G6621 112 63 89 100 Zm00001d 0 123.43 020627 T5G11940.1 25 average length of intron 9 A Os10t0394 70 100 200-01 460 90.86 AT4G21326 1 98 53 Zm00001d0330 100 39 0 7 AT4G21323.1 95 100 Streptophyte algae 105.75 AT5G5175 21283 100 0.1 1 Zm00001d0 17 Os04t057330 98 7 100 861 7 02089 100 0-00 Zm00001d Zm00001d 0 0 5 90.88 026068 01d00620 100 Bacteria 22 100 AT5G67360 Zm000 0 intron number 0 8 40 1 .1 97 50 1 100 Os03t06 1 Os09t0530800-0 100 59 74 05300-0 0 83 Zm00001d 1 6 0 PACid 154160 276 50 033194 0 99 22 Os03t 0 PACid 15415 100 10 12 0761500-0 107.56 50 37 78 Zm0 2 8 PACid 15415415 32 0001d034 0 0 horizontal gene transfer and ancient gene duplication 82 P 1 5 ACid 1 12 15.7 Cid 15404306 1 PA 54083 0 9 402669 PACid 1 37 0 59.88 PACid 15 907 100 99 5418128 0 PACid 0 2 154191 PACid 15406 85 100 58.75 Mapoly0 4 0 2 6 0 Zm00001d02773 100 PA 013s008 Tree scale: 1 Cid 15 51.14 4.1. Os03t0158700-01 9 PA 41 p 0 0 9 Cid 1387 AT4G26330.1 43 28 100 15 54.63 9V6.2 Phypa2 411963 4 100 0 0 Sacu v1.1 18s0 Phypa70 24 10 115V6.1 55.88 p 26 .2. 85 PACid 15418 79. Phypa70 249V6.1 52 100 068.g0164 23 s0026 .p 100 92 46 0 8 026.1 34 97 AT 57.5 34 5G4564 22 22 Mapoly0098 98 24 100 1 603.44 100 AT5G45650. oly0098s0 8 41 99 0.1 0 Map 8 284.86 ACid 154077 100 46 Os01t07 P 100 1 407. 98 82 91 15 65 Zm00 27800-0 55.1 5 PACid 1541889 0 7 100 001d 3 Cid 15404373 100 PACid 154127 1 PA 5418903 20 0435 54.8 122.14 20 33 58 57 397 3 41 950.13 PA 81 9 ACid 1 100 Cid 0 P 100 38 83 8 34 PACid 15420 11 243.86 59 100 PACid 15404 20 291.62 80 15405672589 97 43 PACid 196 ACid 1541890654227 44 85 100 P 34 Mapoly0040s0059.1. 1542 8 98 100 5404398 0 PACid 1 92 100 Mapoly004 0093 524.63 99 114.3 234.7 Cid 1 90 100 2 PA 94 Mapoly 8 6 9 79 0s0057.1. ACid 1540438 9 Mapo 13 234.7 P 99 0004s01 232.88 100 100 p 100 9 58.13 ACid 15422486 84 Mapol ly00 P 49 100 7 1 100 p 0 Sacu 40s0 74.1.p ACid 1540180 6230 71 y0059s0 361. P 9

100 05 89.3 29 66 Sacu v1.1 s0035.gv1.1 s02 88 ACid 15412476 99 217.67 P 579 83 6.1 3 10 8 98 100 Sacu 076.1. .p 158 ACid 154168 6121 94 P 206. 41 Sacu v1 v1.1 21.g02 p 0 8 ACid 1540 6 88 53.6 P 97 100 Sacu v1.1 s00 58.2 5 100 7 50 0 44 s022 619 ACid 1540 1 Sacu v1.1 s0023.g00896.1 s00 9 74 P 100 46 55 0 53 9.g 1 6 7 100 30 1346 72 ACid 15413064 Sacu v1.1 s0169.g02 P 409656 15.g0 0263 9 7 57 95 ACid 15415 2282 Os08t

2 55.5 P 66.g016 96 61.2 3548 52 100 0669 53 100 100 100 Zm00 58 PACid 15 0326700- 5 100 A 9 ACid 154 1541 T2G04 73.1

50 001d 12 P 100 A 06556 T5G5 8 1 PACid 6 13 100 Os09t0 0349 55.63 Cid 15407556 160 00 8 429 1 20 Zm00001 9810. 5 PA SBT .1 24 1 ACid 154 Os01t070 9 55.63 P 47990 130. 1 10 ACid 15406553 0607 Zm0 P 2 Zm0000 d020867 13 0001 0-0 PACid 1540681 Os0 4 ACid 15410604d 1541 2300-00 1 P Zm00001d01579 13 10 2t0 d0437 509 ACi 9 8 135.5 P 1d04679 190.71 68 P ACid 15 P 1987 8 3 ACid 15412832 407880 P 0.1 ACid

P 31 ACid 15417550 ACid 1 P PA 00- 315. 3 8 ACid 15417548 1540502 P 154 41827 3 P ACid 1542005Cid 01 PA

ACid 15 154 8 135.42 P 285. P 15415 1026 1.2 ACid T5G5883 58810 .1 Cid 1 A P AC 8 11 P T5G58820.1 120.1 .1 0633 7 P ACid 15423056 6 56 A P ACid 1540810 id 15408 T5G58840.1 T5G 2 SBT4 A A P ACid 1 5413 .1 63 P ACid 1541 1 224. PA ACid 5 T5G59 6 P 8 A P 62.2 T5G59090 0.1 ACid 1 207 5 Cid 15 3 P 5 A ACid T5G59130 P 540 66 6 P ACid A P ACid 1 154080 P ACi T5G59100.1 PA O ACid 1540620 5 Z .1 ACid 06 0 P A T3G46850.1 S 7 55.57 383 s 6 m 5 0 15420 A A T3G46840 a 260.4 Cid 1541 540810 4 58 0 d 1 0489 6 6 0 1065 0.1 A 019064 c C 15419995 9 0.1 0 T4G15040.1 d02099 - 1 6

4

u T5G5919 0 1837 4 A 0 i 64 t 5406 0 197. d019 d

0 15412242 5406 A 0 T2G39850.1 5 79 v

4

6 3 0 1 3

9 A 1

01 0 2 10 1 1 2015 0001 5

8 .

d 1 6 4 8 d 67.29 4 4G00230.1 7 736 01d0194 1

T5G03620 1 19 7 259. 0 0

2 s 8 9

0

0 A 3 0 8 0 1 AT T1G2016

Zm0 T1G 0 8 t 0 5 6 1 19 53.25 9 2 Zm00001d A

0 A 3

1 - Zm000 7 8 1 1 44 0

0 Zm00001d019065 0

1 1.83 8 3 12 0

54.5 s 8

8 m Zm000 . Os09t0482660-01

g

O Z 2

0

55.1 2

4

9

55.2 1 165.14 9 8 1 9 436.22 54.25 55.3 5 8 3 10 78.22 576. 1 29 16.6 54.13 72 97 5 5 7 809.6 8 223 78.1 7 7 7 57.7 65.5

192.67 55.7 521. 85.13 719 87.18 2 5 2 88.25 56.5 8 97.5 5 .17 10 57.75

77.1 4 100 158.88 2 12 102.78 22 7 10 51.4 145.25 5 73 143. 137.14 4 101.22 99.33 60. 1 15.3 98.44 59.75

59.5 85.75 54.5 1

92.13 79. 1

7 57.8 3 53.9 1 9

5 12.0 7 87 7 210 125.33 6 127.44 44 5

7

8 3 9

.

3

1

8

6 9

7 . 10.5

83.8 6

9

190.78 . 1 7 5

183. 106. 0

162.38 5 86.63

8

.

2 0 1 7 291

1

9

6 8 9

8 15

1 7 9 10 8 10 6 3 10

10 3 9 8 3

6 2 8 9

8

8 10 8 10 8

9

2 8

8 11 9

9

9 8

9 6 7 9 5 9

8 9 3 8 4

7

5 8 9 9 5

8 9 8 16 9

10

9 8

8 9

5

10

10

10 10

1

1

23 14

Figure 2. Phylogenetic relationship of diferent classes of plant subtilases. Te tree was constructed with IQ- TREE by employing the Maximum Likelihood method (model: WAG + R8, predicted by Modelfnder). Te circles around the phylogenetic tree (from center to periphery) represent SBT1-5, and the respective species distribution, indicated by diferent colors. Some bacterial subtilase genes from the phylogenetic tree of the S8A subfamily (Supplementary Fig. S1) were selected as outgroup. Histograms of two outer rings are numbers (red histogram) and average length of introns (green histogram) respectively. For each clade, numbers above branches indicate bootstrap values based on 200 replications.

SBT6 and SBT7 are highly conserved among various lineages. SBT7 and SBT6 are reported as homologs of human protein convertases and are characterized by a stronger similarity to the mammalian kexins and pyrolysins than to plant subtilases9. According to the phylogenetic tree in Fig. 1, SBT7 and SBT6 are indeed very distinct from SBT1-5 with diferent S8 domains (Supplementary Table S3). We conducted a broad search among (Homo sapiens, Drosophila melanogaster, Mus musculus, and Bactrocera dorsalis) and found that both SBT6 and SBT7 have a broad distribution with one or two gene copies. Te phylogeny of SBT7 and SBT6 among eukaryotic species is shown in Fig. 3. It seems that these genes are ubiquitously present among all species we selected. Interestingly, the presence of SBT6 has been reported in only two bacterial species (Blastopirellula marina, Rubinisphaera brasiliensis)29. Both species are related members of occurring in saline/ marine environments. Based on these observations, and our phylogenetic tree, we hypothesize that SBT7 and SBT6 had likely a eukaryotic origin and SBT6 might have been transferred to the two species of Planctomycetes via HGT, although the donor remains unknown as the support values in this region of the SBT6 tree are extremely low. In spite of their broad distribution, the motifs of both SBT7 and SBT6 genes displayed a consistently high similarity among eukaryotes, showing their conservation among various lineages.

Taxonomic distribution of S8A genes and plant subtilases expansion in Embryophyta. To intu- itively show the overall distribution of S8A genes, we performed a quantitative statistic combing all explored

Scientific Reports | (2019)9:12485 | https://doi.org/10.1038/s41598-019-48664-6 4 www.nature.com/scientificreports/ www.nature.com/scientificreports

Schizosaccharomyces pombe TPP2 Motif 10 96 Malassezia pachydermatis TPP2 Fungi Motif 17 100 Moesziomyces antarcticus TPP2 Motif 19 Stramenopiles-Fragilariopsis-OEU13959.1 Stramenopiles Motif 20 SBT6 32 Porum69557.1 Drosophila melanogaster TPP2 Motif 3 Caenorhabditis elegans TPP2 80 Motif 11 40 Homo sapiens TPP2 Motif 1 19 100 Mus musculus TPP2 32 bac-Blastopirellula marina DSM 3645 TPP2 Motif 12 93 bac-Rubinisphaera brasiliensis TPP2 Bacteria Motif 2 Amoebozoa-Planoprotostelium-PRP81954.1 24 35 Motif 13 Stramenopiles-Ectocarpus-ES0004G00070 Motif 16 25 Stramenopiles-Phytophthora-XP 002906072.1 Stramenopiles 20 100 Stramenopiles-Phytophthora-XP 008899999.1 Motif 5 Chlre5586 Motif 8 100 Volca7180 Motif 6 100 Chlva7429 100 Cocsu1782 Motif 9 99 Mesovi-comp25362 c0 seq1 m.23508 Motif 4 100 Mesovi-comp7206 c0 seq1 m.2217 Motif 7 KleflGAQ87799.1 84 PACid 15412976 Green lineage Motif 15 96 Mapoly0069s0029.1.p Motif 18 69 100 Phypa2 123V6.1 Motif 14 100 Phypa304 1V6.1 36 Sacu v1.1 s0149.g023286 Os02t0664300-01 96 100 Zm00001d017522 100 AT4G20850.1 Euglenozoa-Trypanosoma cruzi XP 806372.1 Euglenozoa SBT7 Stramenopiles-Ectocarpus siliculosus ES0273G00010 100 Stramenopiles-Phytophthora parasitica XP 008898115.1 Stramenopiles 100 Stramenopiles-Phytophthora infestans XP 002904862.1 81 Porum68732.1 Red algae Bactrocera dorsalis MBTPS1 24 100 Homo sapiens MBTPS1 Animal 52 100 Mus musculus MBTPS1 39 Amoebozoa-Planoprotostelium fungivorum PRP88795.1 48 Amoebozoa-Acytostelium subglobosum XP 012754371.1 Amoeba 100 Amoebozoa-Acytostelium subglobosum XP 012758124.1 24 Cyanophora paradoxa ConsensusfromContig10185-SBT6.1Glaucophyte algae MicRCC9363 MBTPS1 77 Batpr7359 MBTPS1 71 Ostta7021 MBTPS1 46 55 54 Chlva756 MBTPS1 Cocsu148 MBTPS1 78 Chlre8114 MBTPS1 100 38 Volca24 53 Chlat66S07212 Mesvi29S04991 KleflGAQ86362.1 36 Spisp1455S07574 Green lineage 88 100 Spisp369S13987 80 Mesen371S07005 76 Colsc4866S0T10164 82 Mapoly0155s0018.1.p 57 Phypa55 133V6.1 92 PACid 15422900 Sacu v1.1 s0003.g001892 75 Os06t0163500-01 92 100 Zm00001d045217 Tree scale: 1 100 AT5G19660.1 5' 3' 0 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800 850 900 950 1000 1050 1100 1150 1200 1250 1300 1350 1400 1450 1500 1550 1600 1650 1700

Figure 3. Phylogenetic relationship of SBT7 and SBT6 and their motif composition. Te phylogenetic tree was constructed by IQ-TREE based on the Maximum Likelihood method (model: LG + F + R5, predicted by Modelfnder). For each clade, bootstrap values are labeled on each branch based on 200 replications. 20 conserved motifs were identifed through MEME analysis.

genes with their phylogenetic relationship above. Statistical analyses of the copy numbers showed the distribution of S8A genes among Archaeplastida (Fig. 4; Supplementary Table S4). Te S8 gene clusters revealed a discontinu- ous distribution among the lineages, and we could only detect their presence up to early-diverging embryophytes (Bryophyta). For example, we found that genes of the S8 cluster 3 exist in species of Chlorophyta, Sreptophyta, in (Physcomitrella patens) and in the red alga Porphyra umbilicales but not in vascular plants. In proteinease K, both Porphyra umbilicalis (Rhodophyta) and Cyanophora paradoxa (Glaucophyta) have a larger copy number than the Viridiplantae combined. Plant subtilases (SBT2 and the ancestor of SBT1, SBT3, SBT4, and SBT5) frst appeared in derived streptophyte algae (Coleochaetophyceae and Zygnematophyceae) with the excep- tion of SBT6 and SBT7 that occur throughout Archaeplastida albeit with a discontinuous distribution in SBT 6 (Fig. 4). Signifcant expansions of copy numbers were observed in SBT1 among monocot species and in SBT 4 and SBT 5 in Selaginella moellendorfi although their signifcance remains unknown (Figs 2,4). Unfortunately, the function of these genes remains uncertain without experimental validation, although we inferred that the expansions may be connected with it and the species’ living environment (habitat).

Conserved structures of SBT2 during their evolution. With the exception of gene copy numbers and amino acid site mutation, variant gene structure also refects evolutionary diference among the classifca- tion of diverse species. Te tertiary structures of subtilases have been reported earlier, i.e. the cucumisin from melon fruits ( (PDB) code 3VTA and 4YN3)30,31, and SlSBT3 from tomato (PDB code 3I6S)32. However, the protein structural information of SBT2 (the earliest-diverging subtilase of embryophytes; Fig. 2) is unavailable. Terefore, we frst homology-modeled the 3D structure of SBT2, and then performed a structural comparison of this modelled protein among prokaryotes and eukaryotes (Fig. 5). Te analyses revealed the signif- icant structural similarity between bacteria vs algae, algae vs bryophytes, and bryophytes vs dicots/monocots. In fact, SBT2 showed a high similarity between bacteria and embryophytes as well. Moreover, when we analyzed the combined SBT2 sequences among all the fve species, they still exhibited highly conserved regions (pink color) (Fig. 5b, lef panel), which was also evident in the MSA (multiple sequence alignment) (Fig. 5b, right panel). Tese observations confrmed the highly conserved nature of SBT2 throughout the evolution, and its likely origin and transfer via HGT from bacteria. Discussion Proteases play key roles in the developmental regulation of plants. While plant genomes encode hundreds of proteases, the largest class of them are represented by serine proteases33. However, despite being the dominant class, the complex evolutionary history and function of serine proteases are not yet fully explored. In this study, we performed a comprehensive phylogenetic analysis of the S8A gene peptidase family using 835 genes from Archaea, Bacteria, Fungi, Amoebozoa, Stramenopiles, Euglenozoa, and Archaeplastida. All the genes were clus- tered into several groups, including genes that clustered into seven groups of plant subtilases (SBT1 to SBT7). Genes corresponding to plant subtilases were also selected to build a refned phylogenetic tree to show the rela- tionships among them. Previous studies have shown that some plant-like in fungi have been acquired from embryophytes34, and another study implicated a single HGT event involved in the origin of plant subti- lases18. However, none of these studies addressed the possible recipient of the HGT among Viridiplantae or the early evolution of plant subtilases. Based on our phylogenetic analyses and the copy number variation among subtilases, we concluded that the evolution of plant subtilases began with a single HGT event followed by a

Scientific Reports | (2019)9:12485 | https://doi.org/10.1038/s41598-019-48664-6 5 www.nature.com/scientificreports/ www.nature.com/scientificreports

Arabidopsis thaliana

Oryza sativa

Zea mays

Salvinia cucullata

Selaginella moellendorffii

Physcomitrella patens

Marchantia polymorpha

Mesotaenium endlicherianum

“Spirotaenia sp.”

Coleochaete scutata

Chara braunii

Klebsormidium nitens

Chlorokybus atmophyticus

Mesostigma viride

Chlorella variabilis

Coccomyxa sorokiniana

Chlamydomonas reinhardtii

Volvox carteri

Ostreococcus tauri Embryophyta 30

Streptophyte algae 25 Micromonas commoda Chlorophyta 15 Bathycoccus prasinos Rhodophyta 5

Cyanophora paradoxa Glaucophyta 1

Porphyra umbilicalis

e K SBT7 SBT6 SBT2 SBT1 SBT5 SBT4 SBT3

S8 cluster 1S8 cluster 2S8 cluster S83 cluster 4 Proteinase KP43 proteinas

Figure 4. Copy number variation of subtilases among Archaeplastida. Te copy numbers were calculated based on the phylogenetic tree (Figs 1a and 2) and functional annotation. Te colors corresponding to respective group of taxa are highlighted at appropriate positions.

frst gene duplication in the common ancestor of Coleochaetophyceae, Zygnematophyceae, and embryophytes. Interestingly, the putative donor taxa of the plant subtilases (SBT1-5) genes belong to four phyla (Proteobacteria [only Gammaproteobacteria and Betaproteobacteria], Chlorofexi, Actinobacteria and Firmicutes) (Fig. 1b; Supplementary Fig. S1) and with one exception (Halioglobus sp.) are all soil bacteria. Tis may suggest that the HGT occurred in a terrestrial environment corroborating the mounting evidence that streptophyte algae through- out their evolution underwent increasing adaptation to subaerial/terrestrial habitats although many extant streptophyte algae thrive in aquatic habitats35 (and unpublished results). Subsequently, one of the algae-type SBTs gradually evolved into the present-day plant SBT2s, another was split into two parts evolving into SBT1 and SBT3, 4 and 5 respectively via an extra gene or genome duplication event that presumably occurred in the ancestor of the embryophytes (Fig. 6). However, our hypothesis difers from that proposed by Taylor and Qiu14. According to their hypothesis, the lineages of SBT 1,3,4, and 5 originated before the divergence of Embryophyta and Zygnematophyceae, and the SBT2 lineage originated early in embryophytes. In this study, we also predicted that subtilases in SBT 3,4, and 5 might have undergone more complicated species-specifc duplications or gene losses, and therefore require more detailed phylogenetic information. As SBT6 and SBT7 could not be clustered into the same group with SBT1 to SBT5, we built another tree to show their distribution among eukaryotes. Both SBT6 and SBT7 were also found to be distinct from each other, which can be also judged by their distinct roles in plants9. For instance, in Arabidopsis, the activation of two membrane-bound transcription factors bZIP28 and bZIP17 depends upon the cleavage by SBT7 during ER (endoplasmic reticulum) stress signaling and salt stress, respectively36,37. In addition to membrane-bound transcription factors, other proteins have also been identifed as substrates of SBT738,39. SBT6 acts in a proteolytic pathway downstream of the proteasome during cadmium stress40. Interestingly, two bacterial species that have likely obtained SBT6 genes from plants (Fig. 3) can both survive in a saline environment41,42, suggesting that SBT6 may also act under salt stress.

Scientific Reports | (2019)9:12485 | https://doi.org/10.1038/s41598-019-48664-6 6 www.nature.com/scientificreports/ www.nature.com/scientificreports

a

C. psychrerythraea M. endlicherianum P. patens C. psychrerythraea M. endlicherianum P. patens A. thaliana A. thaliana

β2 η3 α3 β3 b 230 240 250 260 270 Colwellia psychrerythraea NK LI GA RY F DASFSSQYEV Q YALG EF DSP RD ADGHG SHT A ST A GG N EG V S Mesotaenium endlicherianum KK VI GA NY F KDGFEYSVGAIAKK. D SP SP LD VDGHG SHT A ST A AG N AG Y P Physcomitrella patens GK VI GA KY F ARGIMAADMF N ETY. DF ASP FD GDGHG THT SSI A AG S SG VP Oryza sativa GK IV GA QH F AKAAIAAGAF N PDV. DF ASP LD GDGHG SHT A AI A AG N NG IP Arabidopsis thaliana GK II GA QH F AEAAKAAGAF N PDI. DF ASP MD GDGHG SHT A AI A AG N NG IP β4 β5 η4 TT 280 290 300 310 320 Colwellia psychrerythraea A M L S G ANA G T VSG MAPRARIA AYK A C W NSDYKNPEGGDEA G ....C F YG D Mesotaenium endlicherianum V GS K ELPA G L ASG MAPRARIA AYK VIW YN...... EA G EGPY G ES S D Physcomitrella patens V V VKG YNY G T ASG IAPRARIA VYK VIY RD...... G .... GF LS D Oryza sativa V R MHG HEF G K ASG MAPRARIA VYK VLY R...... LF G .... GY VS D Arabidopsis thaliana V R MHG YEF G K ASG MAPRARIA VYK ALY R...... LF G .... GF VA D

η11β18 α8 α9

590 600 610 620 630 Colwellia psychrerythraea ..SAPMFGTQ GE T F KY LQ GTSM SS PH IAG MA AL FK ESNSS WS P AQ I KSA M Mesotaenium endlicherianum .. G RQTLNGK P LLSN VLS GTSM A APH LAG LA AL VK QK Y P RWS P F AI KSA I Physcomitrella patens QN G IDVTGFV GE S F A MIS GTSM A TPH IAG VV AL VK QK H P TWS TS AI HSA L Oryza sativa PN G TDEANYA GE G F A MVS GTSM A APH IAG IA AL IK QK N P KWS P S AI KSA L Arabidopsis thaliana AN G TDEANYI GE G F A LIS GTSM A APH IAG IA AL VK QK H P QWS P A AI KSA L

Figure 5. SBT2 is conserved during the course of evolution from bacteria to embryophytes. (a) Comparative structural analysis of the homology-modelled SBT2 protein between Colwellia psychrerythraea, Mesotaenium endlicherianum, Physcomitrella patens, and Arabidopsis thaliana or Oryza sativa by superimposition. Te colors corresponding to individual species are labelled in the fgure. (b) Te compiled analysis of the SBT2 protein among all the above-mentioned species by using the CONSURF tool. Te pink color at the core region indicates the highly conserved nature of SBT2 throughout evolution (lef panel). Te right panel displays the multiple sequence alignment of the representative protein sequence from the fve species along with the information regarding secondary structures revealing conserved amino acid blocks.

In conclusion, large-scale phylogenetic analyses of subtilases among species in Archaea, Bacteria, and eukar- yotes were performed to better understand their complex evolutionary history. Phylogenetic trees of subtilase genes showed the diversifcation of the S8A superfamily and the origin of plant subtilases through a single HGT event likely from a soil bacterium to the common ancestor of Coleochaetophyceae, Zygnematophyceae and embryophytes, suggesting that this ancestor may have thrived in a subaerial/terrestrial environment. Methods Data retrieval and fundamental analyses of plant subtilases. Te whole genome sequences and structure annotation of 143 species were downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/), including 5 species from Archaea, 101 species from Bacteria, and 37 species from eukaryotes (Supplementary Table S1). We included six newly sequenced and representative genomes of streptophyte algae: viride (strain CCAC 1140), atmophyticus (strain CCAC 0220), Entransia fmbriata (strain UTEX 2353), Coleochaete scutata (strain SAG 110.80), “Spirotaenia sp.” (strain CCAC 0214), and Mesotaenium endlicheria- num (strain SAG 12.94). Tese algae were obtained as axenic strains from the Culture Collection of Algae at the University of Cologne (http://www.ccac.uni-koeln.de/), and the DNA as extracted by a modifed CTAB method43. All genomes were processed for functional annotation using the SWISS-PROT database (cutoff: e-value of 1e-5) to selected possible subtilisin-like proteases by using the keywords “Subtilisin-like protease”, “Subtilase”, “Tripeptidyl-peptidase II”, “TPP2”, or “Membrane-bound transcription factor site-1 protease”. In all these species, only 2 species of Archaea, 53 species of Bacteria, 41 species of eukaryotes were found to encode a total of 835 S8A genes (including subtilases). To further explore phylogenetic relationship between bacterial subtilases and plant subtilases, we rebuilt a detailed HGT tree (Fig. 1b; Supplementary Fig. S1). Bacterial subtilases in the detailed HGT tree (Fig. 1b) were selected by comparing every algal subtilases with nr databases using a 1e-10 e-value and 1,000 max target sequences as the cutof followed by a process of removing redundancy. According to NCBI bacterial , the species sources of these bacterial subtilases were classifed and for each taxon we only selected some repre- sentative species in order to cover all the bacterial taxonomy, fnally 80 genes were used in the analysis. Parts of genes from cluster 1 of the S8A gene subfamily tree were also added to this analysis as the outgroup and

Scientific Reports | (2019)9:12485 | https://doi.org/10.1038/s41598-019-48664-6 7 www.nature.com/scientificreports/ www.nature.com/scientificreports

Coleochaetophyceae Algae-type SBT2

Duplication Zygnematophyceae SBT2

Embryophyta

HGT

ancestors of SBT1, 3, 4 and5 Coleochaetophyceae Bacterial SBT-like

Duplication Zygnematophyceae

SBT1

Embryophyta

SBT3, 4, 5

Embryophyta

Figure 6. An overview about the evolution of plant subtilases. Te recipient of the HGT of bacterial subtilases was the common ancestor of Coleochaetophyceae, Zygnematophyceae and embryophytes. Following the HGT the subtilase gene duplicated in the common ancestor and the two genes diversifed into SBT2 and SBT1,3–5 respectively, and subsequent expansion in embryophytes.

SBT2s were selected as the representative plant subtilases since they are one of the earliest-diverging subtilases of embryophytes. For SBT6 and SBT7, 13 genes from animals (Homo sapiens, Caenorhabditis elegans, Drosophila melano- gaster, Mus musculus, and Bactrocera dorsalis), fungi (Schizosaccharomyces pombe, Malassezia pachydermatis, Moesziomyces antarcticus) and bacteria (Blastopirellula marina, Rubinisphaera brasiliensis) were selected from the UniProt database (https://www.uniprot.org/). Tese gene were also included in the phylogenetic analysis of the S8A gene subfamily. Te conserved domains of selected genes were searched using NCBI’s CDD database (conserved domain database, https://www.ncbi.nlm.nih.gov/cdd/, cutof: e-value of 1e-5), the genes which do not contain the domain belonging to the Peptidases S8/S53 superfamily or contain a domain named Peptidases S8 Protein convertases Kexins Furin-like (S8B subfamily) were excluded from our analysis. Finally, 915 genes were selected for the down- stream analysis (Supplementary Tables S2, S3). Te intron numbers and the intron’s average length of plant subti- lases were also calculated (Supplementary Table S5).

Phylogenetic tree construction. Multiple sequence alignments were performed by MAFFT using high accuracy method (parameters:–maxiterate 1000–localpair). We removed the sites having the gap ratio higher than 50%. Phylogenetic analyses were conducted by using the IQ-TREE sofware44 (model: LG + R10, WAG + F + R7, WAG + R8 and LG + F + R5 for phylogenetic trees from Fig. 1a, Fig. 1b, Fig. 2 and Fig. 3 respectively, all these models were predicted and selected by Modelfnder). Due to the large gene number of sequences in the S8A gene family tree, we used an ultrafast bootstrap approximation (UFBoot, parameter: -bb) to assess branch support45, UFBoot overcomes the computational burden required by the nonparametric bootstrap and is faster than the standard procedure providing relatively unbiased branch support values. Te reliability of diferent trees was assessed using diferent bootstrap replicates and except for these, all parameters were set as default. Unexpectedly long branches in all the trees were removed because they ofen refer to erroneous sequences.

Identification of conserved motifs. The local Multiple Em for Motif Elicitation (MEME, http:// meme-suite.org/) tool was used to identify conserved motifs. All SBT6.1 and SBT6.2 genes were analyzed in our study using the classical model. Te number of motifs MEME should fnd was set to 20.

Homology modeling and comparative structural analyses of plant subtilases. Five homolo- gous proteins were selected from Colwellia psychrerythraea, Mesotaenium endlicherianum, Physcomitrella patens, Oryza sativa, and Arabidopsis thaliana respectively. Te tertiary structure of these fve proteins were modelled by SWISS-MODEL (https://swissmodel.expasy.org/), which is a fully automated structure homology-modelling server. Comparative structural analysis of the homology modelled SBT2 protein was carried out using the CONSURF tool (http://consurf.tau.ac.il/). Finally, the protein superimposition was done by employing Biovia Discovery Studio (2017, R2).

Scientific Reports | (2019)9:12485 | https://doi.org/10.1038/s41598-019-48664-6 8 www.nature.com/scientificreports/ www.nature.com/scientificreports

Data Availability Te sequences of S8A genes which we identifed from the green algae (Mesostigma viride, Chlorokybus atmophyt- icus, Klebsormidium nitens, Chara braunii, Coleochaete scutata, “Spirotaenia sp”., Mesotaenium endlicherianum) are available in the CNGB Nucleotide Sequence Archive (CNSA: http://db.cngb.org/cnsa; accession number CNP0000252). Te specifc details regarding other genes which were used in this study are available in Supple- mentary Table S3. References 1. Schaller, A. A cut above the rest: the regulatory function of plant proteases. Planta 220, 183–197 (2004). 2. Di Cera, E. Serine proteases. IUBMB life 61, 510–515 (2009). 3. Page, M. J. & Di Cera, E. Serine peptidases: classifcation, structure and function. Cellular and Molecular Life Sciences 65, 1220–1236 (2008). 4. Rawlings, N. D. et al. Te MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database. Nucleic Acids Res 46, D624–D632, https://doi.org/10.1093/nar/gkx1134 (2018). 5. Siezen, R. J. & Leunissen, J. A. Subtilases: the superfamily of subtilisin‐like serine proteases. Protein science 6, 501–523 (1997). 6. Yamagata, H., Masuzawa, T., Nagaoka, Y., Ohnishi, T. & Iwasaki, T. Cucumisin, a from melon fruits, shares structural homology with subtilisin and is generated from a large precursor. Journal of Biological Chemistry 269, 32725–32731 (1994). 7. Cao, J. et al. Genome-wide and molecular evolution analysis of the subtilase gene family in Vitis vinifera. BMC Genomics 15, 1116, https://doi.org/10.1186/1471-2164-15-1116 (2014). 8. Figueiredo, J. et al. Revisiting Vitis vinifera subtilase gene family: a possible rolein grapevine resistance against Plasmopara viticola. Front Plant Sci 7, 1783, https://doi.org/10.3389/fpls.2016.01783 (2016). 9. Rautengarten, C. et al. Inferring hypotheses on functional relationships of genes: Analysis of the Arabidopsis thaliana subtilase gene family. PLoS Comput Biol 1, e40, https://doi.org/10.1371/journal.pcbi.0010040 (2005). 10. Tripathi, L. P. & Sowdhamini, R. Cross genome comparisons of serine proteases in Arabidopsis and rice. Bmc Genomics 7, 200 (2006). 11. Meichtry, J., Amrhein, N. & Schaller, A. Characterization of the subtilase gene family in tomato (Lycopersicon esculentum Mill.). Plant Mol Biol 39, 749–760 (1999). 12. Reichardt, S. et al. Te tomato subtilase family includes several cell death-related proteinases with caspase specifcity. Sci Rep 8, 10531, https://doi.org/10.1038/s41598-018-28769-0 (2018). 13. Norero, N. S., Castellote, M. A., de la Canal, L. & Feingold, S. E. Genome-wide analyses of subtilisin-like serine proteases on Solanum tuberosum. American journal of potato research 93, 485–496 (2016). 14. Taylor, A. & Qiu, Y.-L. Evolutionary history of subtilases in land plants and their involvement in symbiotic interactions. Molecular Plant-Microbe Interactions 30, 489–501 (2017). 15. Schaller, A., Stintzi, A. & Graf, L. Subtilases - versatile tools for protein turnover, plant development, and interactions with the environment. Physiol Plant 145, 52–66, https://doi.org/10.1111/j.1399-3054.2011.01529.x (2012). 16. Schaller, A. Plant Subtilisins. Handbook of Proteolytic Enzymes. 3, 3247–3254, https://doi.org/10.1016/B978-0-12-382219-2.00717-1 (2013). 17. Schaller, A. et al. From structure to function - a family portrait of plant subtilases. New Phytol 218, 901–915, https://doi.org/10.1111/ nph.14582 (2018). 18. Yue, J., Hu, X., Sun, H., Yang, Y. & Huang, J. Widespread impact of horizontal gene transfer on plant colonization of land. Nature communications 3, 1152 (2012). 19. Nonaka, T. et al. Te crystal structure of an oxidatively stable subtilisin-like alkaline serine protease, KP-43, with a C-terminal β-barrel domain. Journal of Biological Chemistry 279, 47344–47351 (2004). 20. Han J. R. et al. Halioglobus sediminis sp. nov., isolated from coastal sediment. International journal of systematic and evolutionary microbiology, https://doi.org/10.1099/ijsem.0.003366 (2019). 21. Wang, Y. et al. Glycomyces xiaoerkulensis sp. nov., isolated from Xiaoerkule lake in Xinjiang, China. International journal of systematic and evolutionary microbiology 68, 2722–2726 (2018). 22. Matsuura, N. et al. Draf genome sequences of Anaerolinea thermolimosa IMO-1, Bellilinea caldifstulae GOMI-1, Leptolinea tardivitalis YMTK-2, Levilinea saccharolytica KIBI-1, Longilinea arvoryzae KOME-1, previously described as members of the Class Anaerolineae (Chlorofexi). Genome Announc. 3, e00975–15 (2015). 23. Nolan, M. et al. Complete genome sequence of Streptosporangium roseum type strain (NI 9100 T). Standards in genomic sciences 2, 29 (2010). 24. Klümper, U. et al. Broad host range plasmids can invade an unexpectedly diverse fraction of a soil bacterial community. ISME J. 9, 934–945 (2015). 25. Husnik, F. & McCutcheon, J. P. Functional horizontal gene transfer from bacteria to eukaryotes. Nat. Rev. Microbiol. 16, 67–79 (2018). 26. Matasci, N. et al. Data access for the 1,000 Plants (1KP) project. Gigascience 3, 17, https://doi.org/10.1186/2047-217X-3-17 (2014). 27. Rose, R., Schaller, A. & Ottmann, C. Structural features of plant subtilases. Plant Signal Behav 5, 180–183 (2010). 28. Roy, S. W. & Gilbert, W. Te evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet 7, 211–221, https://doi. org/10.1038/nrg1807 (2006). 29. Rockel, B., Kopec, K. O., Lupas, A. N. & Baumeister, W. Structure and function of II, a giant cytosolic protease. Biochim Biophys Acta 1824, 237–245, https://doi.org/10.1016/j.bbapap.2011.07.002 (2012). 30. Murayama, K. et al. Crystal structure of cucumisin, a subtilisin-like endoprotease from Cucumis melo L. J Mol Biol 423, 386–396, https://doi.org/10.1016/j.jmb.2012.07.013 (2012). 31. Sotokawauchi, A. et al. Structural basis of cucumisin protease activity regulation by its propeptide. Te Journal of Biochemistry 161, 45–53 (2017). 32. Ottmann, C. et al. Structural basis for Ca2+-independence and activation by homodimerization of tomato subtilase 3. Proc Natl Acad Sci USA 106, 17223–17228, https://doi.org/10.1073/pnas.0907587106 (2009). 33. Van der Hoorn, R. A. Plant proteases: from phenotypes to molecular mechanisms. Annu. Rev. Plant Biol. 59, 191–223 (2008). 34. Armijos Jaramillo, V. D., Vargas, W. A., Sukno, S. A. & Ton, M. R. Horizontal transfer of a subtilisin gene from plants into an ancestor of the plant pathogenic fungal genus Colletotrichum. PLoS One 8, e59078, https://doi.org/10.1371/journal.pone.0059078 (2013). 35. Harholt, J., Moestrup, Ø. & Ulvskov, P. Why Plants Were Terrestrial from the Beginning. Trends Plant Sci 21, 96–101, https://doi. org/10.1016/j.tplants.2015.11.010 (2016). 36. Liu, J. X., Srivastava, R., Che, P. & Howell, S. H. Salt stress responses in Arabidopsis utilize a signal transduction pathway related to endoplasmic reticulum stress signaling. Plant J 51, 897–909, https://doi.org/10.1111/j.1365-313X.2007.03195.x (2007). 37. Liu, J. X. & Howell, S. H. Endoplasmic reticulum protein quality control and its relationship to environmental stress responses in plants. Plant Cell 22, 2930–2942, https://doi.org/10.1105/tpc.110.078154 (2010). 38. Srivastava, R., Liu, J. X., Guo, H., Yin, Y. & Howell, S. H. Regulation and processing of a plant hormone, AtRALF23, in Arabidopsis. Te Plant Journal 59, 930–939 (2009).

Scientific Reports | (2019)9:12485 | https://doi.org/10.1038/s41598-019-48664-6 9 www.nature.com/scientificreports/ www.nature.com/scientificreports

39. Wolf, S., Rausch, T. & Greiner, S. Te N‐terminal pro region mediates retention of unprocessed type‐I PME in the Golgi apparatus. Te Plant Journal 58, 361–375 (2009). 40. Polge, C. et al. Evidence for the Existence in Arabidopsis thaliana of the Proteasome Proteolytic Pathway: Activation in Response to Cadmium. J Biol Chem 284, 35412–35424, https://doi.org/10.1074/jbc.M109.035394 (2009). 41. Fuller, R. S., Brake, A. & Torner, J. Yeast prohormone processing (KEX2 gene product) is a Ca2+-dependent serine protease. Proc Natl Acad Sci USA 86, 1434–1438 (1989). 42. Schlesner, H. et al. Taxonomic heterogeneity within the Planctomycetales as derived by DNA–DNA hybridization, description of Rhodopirellula baltica gen. nov., sp. nov., transfer of Pirellula marina to the genus Blastopirellula gen. nov. as Blastopirellula marina comb. nov. and emended description of the genus Pirellula. International journal of systematic and evolutionary microbiology 54, 1567–1580 (2004). 43. Sahu, S. K., Tangaraj, M. & Kathiresan, K. DNA extraction protocol for plants with high levels of secondary metabolites and polysaccharides without using liquid nitrogen and phenol. ISRN 2012 (2012). 44. Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and efective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32, 268–274, https://doi.org/10.1093/molbev/msu300 (2015). 45. Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: Improving the Ultrafast Bootstrap Approximation. Mol Biol Evol 35, 518–522, https://doi.org/10.1093/molbev/msx281 (2018). Acknowledgements We thank Prof. Jinling Huang and Dr. Shifeng Cheng for their useful advice and suggestions about the study. Financial support was provided by the Shenzhen Municipal Government of China (Grant numbers No. JCYJ20151015162041454 and No. JCYJ20160331150739027), Guangdong Provincial Key Laboratory of Genome Read and Write (Grant Number 2017B030301011), and Guangdong Provincial Key Laboratory of core colletion of corp genetic resources research and application (Grant Number 2011A091000047). Tis work is part of the 10KP project led by BGI-Shenzhen and China National GeneBank. Author Contributions H.L., S.B.W., L.Z.L., S.K.S. and X.Y. conceived, designed, and supervised the study. X.Y., S.B.W., L.Z.L. and S.K.S. performed the data analysis. G.Y.Z. and M.M. advised and discussed about the study. X.Y. and S.K.S. wrote the manuscript. S.B.W., L.Z.L., S.K.S., M.M., G.Y.Z., X.L. and P.M. revised the manuscript. All authors read and approved the fnal version of the manuscript. Additional Information Supplementary information accompanies this paper at https://doi.org/10.1038/s41598-019-48664-6. Competing Interests: Te authors declare no competing interests. Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional afliations. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Cre- ative Commons license, and indicate if changes were made. Te images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not per- mitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© Te Author(s) 2019

Scientific Reports | (2019)9:12485 | https://doi.org/10.1038/s41598-019-48664-6 10