G3: Genes|Genomes|Genetics Early Online, published on February 21, 2018 as doi:10.1534/g3.118.200040
1 Title: Synopsis of The SOFL Plant-Specific Gene Family
2
3 Reuben Tayengwa*,†,1, Jianfei Zhao‡, Courtney F. Pierce†,2, Breanna E. Werner†,3, Michael
4 M. Neff*,†
5
6 *Program in Molecular Plant Sciences, Washington State University, Pullman, WA, USA.
7 †Department Crop and Soil Sciences, Washington State University, Pullman, WA, USA.
8 ‡Department of Biology, University of Pennsylvania, Philadelphia, PA, USA.
9
10 Present addresses
11 1University of Maryland, Plant Sciences and Horticultural Landscape Department, College Park,
12 MD 20742.
13 2Colorado State University, Department of Animal Sciences, Fort Collins, CO 80523.
14 3Washington State University College of Nursing, Spokane, WA 99202.
15
16
17
18
1
© The Author(s) 2013. Published by the Genetics Society of America. 19 Running Title: The SOFL gene family
20
21 Keywords
22 SOFL, plant-specific, gene-family, phylogenetic, Arabidopsis, evolution, introns
23 For correspondence (e-mail [email protected])
24 Address: 387 Johnson Hall, PO Box 646420, Pullman WA 99164-6420 USA
25 Phone: 509-335-7705, Fax: 509-335-8674
26
27
28
29
30
31
32
33
34
35
36
2
37 ABSTRACT
38 SUPPRESSOR OF PHYB-4#5 DOMINANT (sob5-D) was previously identified as a suppressor
39 of the phyB-4 long-hypocotyl phenotype in Arabidopsis thaliana. Overexpression of SOB5
40 conferred dwarf phenotypes similar to those observed in plants containing elevated levels of
41 cytokinin (CK) nucleotides and nucleosides. Two SOB-FIVE- LIKE (SOFL) proteins, AtSOFL1
42 and AtSOFL2, which are more similar at the protein level to each other than they are to SOB5,
43 conferred similar phenotypes to the sob5-D mutant when overexpressed. We used founding
44 SOFL gene family members to perform database searches and identified a total of 289 SOFL
45 homologues in sequenced genomes of 89 angiosperm species. Phylogenetic analysis results
46 implied that the SOFL gene family emerged during the expansion of angiosperms and later
47 evolved into four distinct clades. Among the newly identified gene family members are four
48 previously unreported Arabidopsis SOFLs. Multiple sequence alignment of the 289 SOFL
49 protein sequences revealed two highly conserved domains; SOFL-A and SOFL-B.
50 Overexpression and site-directed mutagenesis studies demonstrated that the SOFL domains are
51 necessary for SOB5 and AtSOFL1’s overexpression phenotypes. Examination of the subcellular
52 localization patterns of the founding Arabidopsis thaliana SOFLs suggested they may be
53 localized in the cytoplasm and the nucleus. We have discovered that SOFLs are a plant-specific
54 gene family characterized by two conserved domains that are important for function.
55
56 INTRODUCTION
57 Many genes can be grouped into specific families based on nucleotide and protein sequence
58 similarity. Such gene families have arisen as a result of duplication and expansion of individual
59 members (Wang et al. 2012). Gene family sizes vary across different lineages and may have
3
60 important functional outcomes related to adaptation or speciation (Qu and Zhu 2006; Vollrath et
61 al. 1998). Functional and evolutionary studies of genes belonging to various families have
62 greatly enhanced our understanding of components involved in plant growth and development
63 (Kim et al. 2007; Higuchi et al. 2004; Xie et al. 1999; Kim 2006; Kondou et al. 2010; Zhao et al.
64 2014). Despite significant progress made towards discovery and curations of many gene families,
65 it is estimated that the majority of plant gene families remain uncharacterized (Guo 2012).
66 Therefore, the continued identification and study of hitherto uncharacterized gene families
67 remains crucial for our continued endeavor to eventually achieve the goal of functionally
68 characterizing most of the genes.
69
70 The Arabidopsis thaliana activation-tagged mutant sob5-D (SUPPRESSOR OF PHYB-4#5
71 DOMINANT) was previously identified as a suppressor of the long-hypocotyl phenotype
72 conferred by a phyB-4 mutation (Zhang et al. 2006). Database searches using SOB5 (At5g08150)
73 protein sequence as a query revealed two SOB-FIVE- LIKE (SOFL) proteins in Arabidopsis,
74 AtSOFL1 (At1g26210) and AtSOFL2 (At1g68870) (Zhang et al. 2006). AtSOFL1 and AtSOFL2
75 are more similar to each other than they are to SOB5 at the protein level (Zhang et al. 2006,
76 2009). When individually overexpressed, AtSOFL1 and AtSOFL2 conferred phenotypes similar
77 to the sob5-D activation-tagged mutant (Zhang et al. 2006, 2009). Transgenic lines individually
78 overexpressing SOB5, AtSOFL1 and AtSOFL2 genes displayed reduced apical dominance, and
79 conferred smaller rosette leaves, retarded root growth and delayed senescence phenotypes
80 (Zhang et al. 2006, 2009). In addition, these transgenic lines also contained elevated levels of
81 cytokinin (CK) nucleotides and nucleosides; trans-zeatin riboside (tZR), trans-zeatin riboside
82 monophosphate (tZRMP), and N6-(∆2-isopentenyl)adenosine monophosphate (iPRMP) (Zhang et
4
83 al. 2006, 2009). These results suggested that the founding SOFL gene family members may have
84 similar or overlapping CK-related functions. However, the specific details of what roles, if any,
85 intermediate CK nucleotide/nucleoside species play during plant growth and development are
86 not fully understood and will have to be further investigated in the future.
87
88 Zhang et al., (2006), reported that the three founding SOFLs constituted a small, novel and
89 intron-less Arabidopsis thaliana gene family. In addition, SOFL homologues and expressed
90 sequence tags were also identified in a few monocotyledonous and dicotyledonous species;
91 Malus x domestica, Brassica napus, Oryza sativa and Populus trichocarpa (Zhang et al. 2006).
92 Due to limited sequenced genomes at the time it was difficult to obtain a comprehensive
93 phylogenetic overview of the SOFL gene family. Nonetheless, based on these preliminary results
94 we speculated that SOFL homologues also existed in genomes of other eukaryotic, prokaryotic,
95 and or mammalian species (Zhang et al. 2006).
96
97 Since Arabidopsis SOFLs were first identified, more genome sequence information has been
98 released (Fang et al. 2013; Wheeler et al. 2008; Kersey et al. 2016). We have taken a
99 phylogenetic approach to update our understanding of the evolution and diversification of the
100 SOFL gene family. Database searches using the three founding SOFL members’ proteins
101 sequences as queries revealed at least 289 SOFL homologues in 89 angiosperm species. During
102 this process, we also discovered that the Arabidopsis thaliana genome contains four additional
103 and more divergent SOFLs that were previously unreported. In addition, phylogenetic analysis
104 showed that the 289 SOFL homologues evolved into four main clades. Finally, to generate
105 preliminary data as well as gain some insights into this gene family, we performed subcellular
5
106 localization, tissue expression and domain mutational studies using the three founding SOFL
107 members. This updated synopsis of the SOFL gene family provides basic background
108 information needed to design future studies to eventually fully characterize this novel gene
109 family.
110
111 MATERIALS AND METHODS
112 Plant materials and growth conditions
113 All Arabidopsis thaliana lines used in this manuscript are in the Columbia (Col-0) background.
114 For phenotypic analysis, all plants were grown on soil in pots in growth chambers set at 21°C
115 with white light (200 μmol m-2 sec-1) and 60-70 % humidity, and under 16 hours light and 8
116 hours darkness.
117
118 Sequence alignment and phylogenetic analysis
119 SOB5, AtSOFL1 and AtSOFL2 protein sequences were downloaded from NCBI (Lamesch et al.
120 2012; Goodstein et al. 2011; Wheeler et al. 2008; Sayers et al. 2009). The sequences were used
121 as queries to search NCBI and Phytozome databases for SOFL homologues using BLASTP
122 option using a < 1 X 10-2 cut-off value. The amino acid sequences were aligned using Probalign
123 V1.3 (Roshan 2014) on CIPRES Science Gateway Server (Miller et al. 2015). Bayesian
124 inference analysis was performed with the Mr. Bayes 3.2.1 on XSEDE tool on CIPRES Science
125 Gateway for 30 million generations with 8 chains to run. Generations were sampled every 10,000
126 generations with the first 25% generations were used as a burn-in.
127
128 Sequence logo analysis
6
129 Sequence logo analysis of conserved SOFL-A and SOFL-B domains was performed using the
130 online WebLogo tool (Crooks et al. 2004). 289 SOFL sequences retrieved from NCBI (Sayers et
131 al. 2009)and Phytozome (Goodstein et al. 2011) were used to generate the sequence logo using
132 default settings.
133
134 Overexpression and point-mutation analysis of SOFL domains in SOB5 and AtSOFL1
135 Site-directed point mutations were generated using the QuikChange® Lightning mutagenesis kit
136 according to manufacturer’s instructions (Agilent Technologies). Mutagenesis primers were
137 designed using Agilent’s web-based QuikChange® Primer Design Program
138 (http://www.genomics.agilent.com/primerDesignProgram.jsp) (Table S1). Gateway® compatible
139 entry vectors, pENTR223 (ABRC), carrying SOB5 and AtSOFL1 coding sequences, were used as
140 templates in mutagenesis PCR reactions. Each point mutation was confirmed via sequencing.
141 Once mutagenesis was confirmed, SOB5 or AtSOFL1 coding sequences carrying point mutations
142 were cloned via LR® reactions into pCHF3 (Zhang et al. 2006) and pEarlyGate100 (Earley et al.
143 2006) binary vectors, respectively. Destination binary vectors carrying mutated SOB5 and
144 AtSOFL1 genes under control of the constitutive 35S promoter were transformed into
145 Arabidopsis thaliana Col-0 wild-type plants using the floral dip method and the Agrobacterium
146 tumafaciens strain GV3101 (Clough and Bent 1998).Transformants were screened on 1/2×
147 Linsmaier and Skoog media plates containing 25 mg L-1 Basta (pEarleyGate 100) and 30 mg L-1
148 kanamycin (pCHF3). Homozygous lines were identified in the T3 generation. Three independent
149 transgenic lines representing each construct were selected for analysis. All plants were grown
150 and analyzed in a growth chamber.
151
7
152 RNA extraction and real-time PCR
153 Total RNA was extracted from tissues pooled from three independent plants (n=3 for all tissues)
154 of six-day old seedlings, 10-day old juvenile plants, rosette leaves from a 20-day old plant, floral
155 tissue, siliques, a 25-day old adult plant, and roots from a 25-day old plant using the Qiagen®
156 RNeasy Mini Kit (Qiagen). On-column DNase digestion was performed using the RNase-Free
157 DNase Set (Qiagen, Valencia, CA). Complementary DNA (cDNA) was further generated using
158 the iScript Reverse Transcription Supermix for RT-qPCR (Bio Rad, Hercules, CA). Identical
159 amounts of cDNA input were used in PCR reactions using either ubiquitin10 (UBQ10) or gene-
160 specific primers (Table S1). Amplification using the UBQ10 primers was done in 30 cycles
161 while for gene-specific primers 45 cycles were used. Negative control, no reverse transcriptase
162 (No RT), samples were prepared using the same reaction conditions and reagents (minus reverse
163 transcriptase enzyme) used to make cDNA.
164
165 CFP-SOFL fusion constructs and onion bombardment
166 To generate pSAT4-CFP-SOB5, pSAT4-CFP-AtSOFL1, and pSAT4-CFP-AtSOFL2 N-terminal
167 fusion constructs, respective full-length coding sequences, each contained in the entry vector
168 pENTR223 (ABRC), were cloned into pSAT4-CFP (ABRC) destination vector via Gateway®
169 LR reactions (Invitrogen, Carlsbad, CA). Onion epidermal layers were co-bombarded with
170 pSAT6-mRFP (ABRC) plasmid and pSAT4-CFP constructs carrying SOFLs fused to CFP using
171 a PDS-1000/He Biolistic transformation system (Bio Rad, Hercules, CA). Bombarded onion
172 epidermal layers were incubated in the dark for 40 hours. To identify successfully transformed
173 cells and observe fluorescent signals, the onion epidermal layer was examined on a Leica TCS
174 SP8 X (Leica Microsystems, Mannheim, Germany) confocal microscope.
8
175
176 Data availability
177 Genetic material used in this manuscript is available upon request. Primers used in the study are
178 listed in Table S1. Total numbers of SOFL homologues identified in various species are listed in
179 Table S2. Table S3 shows results from protein localization prediction analysis. Alternatively-
180 spliced AtSOFL1 product sequences (AtSOFL1.1 and AtSOFL1.2) are provided in the
181 supplemental file (File S2). Figure S1 shows a cartoon and gel image of AtSOFL1’s alternative
182 splicing products. Sequences extracted from our database searches using the founding SOFL
183 protein sequences as queries are provided in Supplemental file.
184
185 RESULTS
186 SOFLs are a plant-specific gene family
187 To identify SOFL homologues, we performed BLASTP searches against the NCBI database
188 (Sayers et al. 2009) using SOB5, AtSOFL1 and AtSOFL2 sequences as queries. We extracted
189 289 SOFL homologues from 89 flowering plant species, of which 15 were monocotyledons and
190 74 were dicotyledons (Table S2). To gain insight into the evolutionary history of the SOFL gene
191 family, we next performed phylogenetic analysis using the 289 SOFL protein sequences
192 extracted via database searches. Our analysis suggested that SOFLs likely evolved into four
193 major clades: Clade-I, II, III and IV (Figure 1a and 1b). We hypothesize that SOFLs within each
194 clade have greatly expanded through evolution, with Clades-I and -IV showing the largest
195 expansion (Figure 1a and1b). However, since the tree was constructed only with the existing
196 SOFL genes in the plant species examined and no outgroup used, we cannot rule out gene loss
197 events. Annotation data from NCBI (Sayers et al. 2009), Phytozome (Goodstein et al. 2011) and
9
198 individual species genome databases revealed that 60 SOFL homologues contained introns and
199 another 226 did not (Table S2). There was no annotation data for Arabis alpina and Erythranthe
200 guttata (Mimulas guttatus) SOFL homologues.
201
202 Arabidopsis thaliana genome contains seven SOFL members.
203 It was previously reported that the Arabidopsis genome contained three SOFL members (Zhang
204 et al. 2006, 2009). However, latest database search results revealed that the Arabidopsis thaliana
205 genome contained a total of seven SOFL members (File S1). In addition, we also identified seven
206 SOFLs in Arabidopsis lyrata (File S1). We have designated the newly identified Arabidopsis
207 thaliana SOFL genes as AtSOFL3 (AT3G30580), AtSOFL4 (AT5G38790), AtSOFL5
208 (AT4G33800) and AtSOFL6 (AT1G58460). SOB5 is in Clade VI of the phylogenetic tree (Figure
209 1a and 1b), compared to AtSOFL1 and AtSOFL2 which were in Clade I. Interestingly, the newly
210 identified SOFL members, AtSOFL3, AtSOFL4, AtSOFL5 and AtSOFL6 are in Clade IV (Figure
211 1b), suggesting that genetic redundancy may exist among these genes and SOB5. All
212 Arabidopsis SOFLs are intron-less except AtSOFL1(recently classified by TAIR as intron-
213 containing), AtSOFL5 and AtSOFL6.
214
215 We next assessed the expression patterns of Arabidopsis thaliana SOFL homologues. RNA was
216 extracted from seedlings, juvenile plants, adult rosette leaf, floral structures, siliques, an entire
217 flowering plant and adult plant roots. SOB5, AtSOFL1 and AtSOFL2 transcripts were detected in
218 all samples tested except roots (Figure 2). AtSOFL3, AtSOFL4, AtSOFL5 and AtSOFL6 were
219 expressed at varying levels in seedlings, juvenile plants, flowers, siliques and whole plant
220 (Figure 2). AtSOFL3, AtSOFL4, AtSOFL5 were the only ones detected in roots. AtSOFL6
10
221 showed little to no expression in rosette leaves and roots, and similarly, AtSOFL5 showed little
222 to no expression in seedlings and rosette leaves (Figure 2). AtSOFL4 was expressed in all
223 samples tested but had the lowest transcript levels overall. The varying transcript accumulation
224 levels and differential tissue expression patterns may suggest unique functional roles among the
225 seven Arabidopsis thaliana SOFLs during plant development (Figure 2). Surprisingly, during
226 efforts to amplify a full-length AtSOFL1 (At1g26210) cDNA we identified a previously
227 unreported splice variant. We have designated the original 447 bp transcript AtSOFL1
228 (At1g26210.1), and the newly identified 771bp second transcript as (At1g26210.2) (Figure S1a,
229 b).
230
231 SOFL homologues are characterized by two conserved domains
232 Multiple sequence alignment analysis is a useful tool used to infer relationships among gene
233 family members. Alignment data can provide functional information through the identification of
234 key conserved domains and other important features. We aligned SOFL homologues protein
235 sequences and revealed two conserved domains in the N-terminal region (Figure 3a), which were
236 previously reported by Zhang et al. (2006), albeit based on fewer protein sequences. We have
237 designated the two conserved domains, SOFL-A and SOFL-B (Figure 3a). When only
238 Arabidopsis SOFL sequences were aligned (Figure 3b) and compared to an alignment of all 289
239 SOFL homologues (Figure 3c, 3d) we observed slight differences in the SOFL-B domain. In an
240 alignment of Arabidopsis SOFL sequences only, SOFL-B domain contains eight 100%
241 conserved residues SM×SDASS×P (Figure 3b). However, the same domain only contains four
242 100% conserved residues (S××SDA) when all 289 SOFLs homologous sequences were aligned
243 (Figure 3d). In general, based on all SOFL orthologue sequences identified so far, SOFL-A
11
244 domain contains an SGWT×Y motif and SOFL-B domain contains an S××SDA motif (× =
245 amino acid residue that is not 100% conserved) (Figure 3c, d). There were no areas of high
246 sequence similarity in the C-terminal region, except for an increased presence of basic amino
247 acid residues in most SOFL homologues (Table S3).
248
249 Conserved amino acid residues in SOFL domains are required for the manifestation of
250 SOB5 and AtSOFL1 overexpression phenotypes.
251 Previously, Zhang et al., (2009), investigated the requirement of some of the 100% conserved
252 residues in SOFL-A and SOFL-B domains for the manifestation of AtSOFL2’s overexpression
253 phenotype via site-directed mutagenesis. Constructs carrying AtSOFL2 gene harboring individual
254 point mutations in each of the two conserved domains were used to generate plants
255 overexpressing an aberrant protein. Resultant transgenic mutant plants overexpressing AtSOFL2
256 gene harboring T21I and D80N point mutations lost the phenotype that is typically associated
257 with the overexpression of the wild type AtSOFL2 gene (Zhang et al. 2009). These results
258 implied that the conserved amino acid residues were important for the manifestation of AtSOFL2
259 overexpression phenotype.
260
261 To further investigate the biological importance of some of the 100% conserved residues in the
262 other two founding SOFLs, SOB5 and AtSOFL1, we generated a series of site-directed point
263 mutations in their respective SOFL-A and SOFL-B domains. Wild-type Arabidopsis thaliana
264 plants were separately transformed with constructs carrying mutations in SOB5 (T21I and P61R)
265 and AtSOFL1 (T23I and P84R) coding sequences. Transgenic plants overexpressing SOB5
266 (T21I, P61R) and AtSOFL1 (T23I, P84R) mutated genes lost the dwarf/semi-dwarf phenotypes
12
267 typically observed when wild-type versions were overexpressed (Figure 4a, b). We also
268 performed site-directed mutagenesis on non-conserved amino acid residues; D33H and D53H for
269 SOB5 and AtSOFL1, respectively, which are not located in the two conserved domains (Figure
270 3b). Transgenic plants overexpressing mutated SOB5 (D33H) and AtSOFL1(D53H) exhibited
271 phenotypes similar to those observed when wild-type genes are constitutively expressed (Figure
272 4a, b) (Zhang et al. 2006, 2009). These data suggest that less conserved amino acid residues are
273 not required for the observed phenotypes.
274
275 SOFL subcellular localization
276 We used two publicly available online prediction programs, Wolf PSORT (Horton et al. 2007)
277 and SeqNLS (Lin and Hu 2013), to gain insights into subcellular localization patterns of the 289
278 SOFL homologues. Wolf PSORT converts protein amino acid sequences into numerical
279 localization features based on sorting signals, amino acid composition and functional motifs such
280 as DNA-binding motifs (Horton et al. 2007). SeqNLS uses a sequential pattern mining algorithm
281 to effectively identify potential nuclear localization signals (NLS) in protein sequences (Lin and
282 Hu 2013). Wolf PSORT predicted 274 SOFL homologues to localize to the nucleus, with the
283 remainder predicted to localize to chloroplast, cytoplasm, mitochondria, peroxisome, Golgi and
284 extracellular space (Table S3). In contrast, the SeqNLS program detected nuclear localization
285 signals in only 156 SOFLs with a statistically significant prediction score of at least 0.89 (Table
286 S2) (Lin and Hu 2013). Interestingly SeqNLS did not detect nuclear localization signals in 13
287 SOFLs which were predicted to localize to the nucleus by Wolf PSORT (Table S3). 119 SOFLs
288 scored below the default cutoff threshold to be classified as containing a NLS. Overall, at least
13
289 156 SOFL homologues were predicted to localize to the nucleus by both Wolf PSORT and
290 SeqNLS algorithms (Table S3).
291
292 To further assess some of the subcellular prediction data, we examined the intracellular
293 distribution of the founding Arabidopsis thaliana SOFL members. We fused SOB5, AtSOFL1
294 and AtSOFL2 to the carboxyl end of a cyan fluorescent protein (CFP) and transiently expressed
295 the fusion proteins in onion epidermal cells. CFP-SOB5, CFP-AtSOFL1 and CFP-AtSOFL2
296 fluorescent signals were observed in both the cytoplasm and nucleus of the onion epidermal cells
297 (Figure 5). We also examined whether the fluorescent signals of all three fusion proteins were
298 localized to the cytosol and not the cell wall. This was achieved by inducing plasmolysis through
299 exposure of onion epidermal peels to 0.8 M mannitol. Ensuing the mannitol treatment, CFP
300 signal remained restricted to the edges of the plasmolyzed and plasma membrane and or
301 cytoplasm, suggesting that CFP-SOB5, CFP-AtSOFL1 and CFP-AtSOFL2 fusion proteins were
302 not localized to the cell wall.
303
304 DISCUSSION
305 SOFL homologues emerged in angiosperms
306 The founding members of the SOFL gene family were first identified via an activation tagging
307 screen in Arabidopsis thaliana (Zhang et al. 2006). Based on the limited number of sequenced
308 genomes available at the time, it was initially concluded that SOB5, AtSOFL1 and AtSOFL2 were
309 a small three-member intron-less Arabidopsis thaliana gene family. To expand on earlier studies
310 and acquire a basic level understanding of this poorly characterized gene family, we took
311 advantage of an increasing number of sequenced genomes to perform database searches using
14
312 the founding Arabidopsis thaliana founding SOFL members as queries. We have retrieved a total
313 of 289 SOFL sequences from 89 angiosperm species (Table S2). However, we cannot rule out
314 the possibility that SOFL homologues are present in unsequenced organisms or were lost in other
315 genomes. In addition, we expect additional SOFL homologues to be identified in other species as
316 more genomes are sequenced. Furthermore, as genome sequence annotation improves, it is
317 possible that some of the SOFL homologues currently classified as intron-less or intron-
318 containing may be re-categorized in the future.
319
320 No SOFL homologues were identified in Volvox cartari (green algae), Chlamydomonas
321 reinhardtii (green algae), Ostreococcus lucimarinas (single-celled water algae), Micromonas
322 pusilla (water algae), Physcomitrella patens (non-vascular bryophyte, moss) and Selaginella
323 moellendorffii (member of an ancient vascular plant lineage) (Lamesch et al. 2012), strongly
324 suggesting that SOFLs may have emerged after the separation of lycophytes, pterophytes and
325 seed plants. Out of the four main plant groups: bryophytes, seedless vascular
326 (pteridophytes/lycophytes), gymnosperms and angiosperms, SOFLs have so far only been
327 identified in angiosperms. We hypothesize that the SOFL gene family emerged and expanded
328 during the evolution of flowering plant species.
329
330 SOFL gene family is comprised of intron-containing and intron-less genes
331 Approximately 20% of SOFL homologues identified so far contain introns and 80% are intron-
332 less (Table S2). These results are inconsistent with overall data from rice and Arabidopsis
333 genomes which revealed the presence of only 19.9% and 21.7% intron-lacking genes,
334 respectively (Jain et al. 2008). Our latest database search results also showed that Arabidopsis
15
335 thaliana contains seven SOFL genes, in contrast to previous reports of three intron-less family
336 members (Zhang et al. 2006, 2009). Out of the four newly discovered Arabidopsis SOFLs, only
337 AtSOFL5 and AtSOFL6 contain introns, a departure from the previous designation of the
338 Arabidopsis SOFL family as being comprised of members lacking introns (Zhang et al. 2006,
339 2009). Genes lacking introns are a characteristic of prokaryotes and are a useful resource for
340 studying the evolution of gene architecture in eukaryotes, but information on their biological
341 significance remains limited (Yan et al. 2014). Considering that SOFLs seem to have emerged
342 during the evolution of angiosperms, it is surprising that the majority of the genes in this family
343 are intron-less, a trait expected in early species (Zou et al. 2011). On the other hand, this result
344 can be explained by the fact that majority of the intron-less SOFLs could have potentially arisen
345 because of gene duplication events (Yan et al. 2016; Lecharny et al. 2003). Gene prediction and
346 gene functional studies data suggests that intron-less genes may play unique roles in growth and
347 development, including translation and energy metabolism in maize, rice and Arabidopsis as well
348 as cell envelope and amino acid biosynthesis in rice and Arabidopsis (Yan et al. 2014; Jain et al.
349 2008). In addition, gain-of-function studies from Zhang et al. (2006, 2009) suggested that the
350 three-founding intron-lacking SOFLs may be involved in CK-related functions. According to
351 Zhao et al. (2014) the presence of introns enhances the transcription of associated genes.
352 Therefore, to further explore the biological significance of intron-less genes, studies that include
353 gain-of-function, loss of function, gene expression pattern, gene expression level analysis and
354 subcellular localization will need to be performed in the future.
355
356 SOFL-A and SOFL-B domains are important for SOB5 and AtSOFL1’s overexpression
357 phenotypes
16
358 Previously, Zhang et al. (2009) demonstrated via site-directed mutagenesis experiments that
359 certain amino residues in the SOFL domains were necessary for the manifestation of AtSOFL2’s
360 overexpression phenotype. In our study, we have similarly shown that specific conserved amino
361 acid residues in SOFL-A and SOFL-B domains are also necessary for both SOB5’s and
362 AtSOFL1’s overexpression phenotypes (Figure 4). These results, together with sequence logo
363 analysis showing that certain amino acid residues in SOFL-A and SOFL-B domains are 100%
364 conserved in all 289 SOFLs, further strengthen the hypothesis that they are important for
365 function. These results should, however, be interpreted with caution because point mutations can
366 potentially cause proteins to fold incorrectly. Nonetheless, our results demonstrated, at least, that
367 not all point mutations lead to a loss of protein function. Future experiments in which all
368 conserved residues are replaced with alanines or entire domains are deleted may provide answers
369 to whether SOFL-A and SOFL-B domains are important for function. It is not yet clear what
370 biological role these highly conserved domains play. However, conserved domains typically play
371 crucial roles in protein-protein interactions, DNA binding, and other important cellular
372 processes.
373
374 Founding Arabidopsis thaliana SOFL members localize to the nucleus and cytoplasm
375 To begin to examine the intracellular distribution of SOFLs homologues we first used nuclear
376 localization signal (NLS) detection and sub-cellular localization programs, SeqNLS (Lin and Hu
377 2013) and Wolf PSORT (Horton et al. 2007). The two programs predicted that majority of
378 SOFLs contained NLSs and may localize to the nucleus among other organelles (Table S3). This
379 hypothesis was further supported by CFP-SOFL protein fusion subcellular localization
380 experimental data, which showed that SOB5, AtSOFL1 and AtSOFL2 localize to the nucleus
17
381 and the cytosol (Figure 5). However, SOB5 subcellular localization results were at odds with
382 findings by Zhang et al, (2006) who reported that SOB5 only localized to the cytoplasm and/or
383 plasma membrane, but not the nucleus. These contrasting conclusions could be due to a different
384 experimental design and fluorescence detection methods used. The SOB5-GFP C-terminal fusion
385 protein in Zhang et al. (2006) was missing five C-terminal amino acids from SOB5, whereas in
386 our case we used a full-length SOB5 protein fused to the carboxyl end of the CFP tag. Secondly,
387 we used confocal microscopy to detect CFP fluorescence, which is more reliable compared to
388 fluorescence microscopy used by Zhang et al. (2006). In addition, we generated and analyzed N-
389 terminal fusions in contrast to the C-terminal fusion described in Zhang et al., (2006). C-terminal
390 and N-terminal tagged proteins have been shown to show opposite localization patterns (Palmer
391 and Freeman 2004), a possibility that may also explain SOB5-GFP and CFP-SOB5 localization
392 pattern. In the future, both C-and N-terminal fusions of each protein should be examined to avoid
393 such potential problems.
394
395 Even though we used a plasmolysis assay to show that the three founding SOFLs were not
396 localized to the cell wall, we could not distinguish whether they were localized in the plasma
397 membrane, cytosol or both. To try and answer this question, we used a bioinformatics approach
398 by running the 289 SOFL homologue sequences against web-based transmembrane protein
399 topology prediction algorithms. TMHMM (Kahsay et al. 2005) and TMMOD (Kahsay et al.
400 2005), both hidden Markov prediction models, did not predict any transmembrane helices in all
401 289 SOFL orthologue sequences. This outcome could be verified experimentally by
402 discriminating between cytosolic and cell membrane protein using osmotic disruption of the
403 protoplast vacuole in hypotonic solution (Serna 2005). This method results in the diffusion of the
18
404 GFP signal from the cell periphery to the central part of the cell volume, an outcome that will not
405 occur when the protein under study is attached to the cell membrane (Serna 2005). Overall, these
406 data suggest that at least, the three founding Arabidopsis thaliana SOFLs and possibly several
407 other SOFL homologues may localize to the nucleus or cytoplasm.
408
409 CONCLUSION
410 The identification of at least 289 SOFL homologues creates an opportunity to study and
411 characterize this novel gene family in at least 89 flowering plant species. A combination of gain-
412 of-function, loss-of-function, protein-protein interaction and CK quantitation studies will go a
413 long way towards answering various questions raised by Zhang et al. (2006, 2009) about the
414 SOFL gene family. One critical question is whether CK nucleosides and nucleotides have
415 biological activity in plant growth and development, as suggested by Zhang et al., (2006, 2009).
416 Gain-of-function and CK-quantitation studies in selected species may be used to test the
417 hypothesis that overexpression of SOB5, AtSOFL1 and AtSOFL2 homologues can cause similar
418 CK-related phenotypes reported by Zhang et al., (2006, 2009). Results from such studies can
419 then be compared to higher order null mutants in which putative redundant SOFLs from the same
420 phylogenetic clades are knocked out. In addition, our site-directed mutagenesis studies suggested
421 that conserved SOFL-A and SOFL-B domains were important for function. Similar studies
422 involving mutagenesis of additional conserved amino acid residues in both Arabidopsis and other
423 species will likely provide new functional details regarding these poorly characterized domains.
424 Finally, even though our latest database searches suggest that SOFLs are a plant specific gene
425 family, the ever-increasing number of sequenced genomes will continue to test this hypothesis.
426
19
427 ACKNOWLEDGEMENTS
428 We thank Dr. Jingyu Zhang of the Institute of Botany, Chinese Academy of Sciences, Beijing,
429 China for assistance in providing background information regarding prior work. We also thank
430 Dr. Amit Dhingra and his lab members at Washington State University for help using the
431 Biolistic PDS- 1000/He Particle Delivery System during transient expression of CFP-protein
432 fusion in onion epidermal layers. We are grateful to Dr. Daniel Mullendore of the WSU
433 Franceschi Microscopy and Imaging Center for the guidance and assistance in using the confocal
434 microscope during sub-cellular localization analyses. This research was supported by the USDA
435 National Institute of Food and Agriculture, HATCH project 1007178 (to M.M.N.). This work
436 was also supported in part by the funds provided by the Program in WSU Molecular Plant
437 Sciences GPSI Fellowship (to R.T.).
438
439 Conflict of interest
440 The authors have no conflict of interest to declare.
441
442
443
444
445
446
447
448
449
20
450 Short legends for Supporting Information
451 Table S1. Primers used in this manuscript. List of primers that were used for genotyping,
452 construct development and measuring transcript accumulation.
453
454 Table S2. A total of 289 SOFL homologues were identified in at least 89 flowering plant
455 species. Founding Arabidopsis thaliana SOFL family members were used as queries to blast the
456 NCBI (Sayers et al. 2009) and Phytozome databases. SOFL gene family homologues were
457 retrieved from genome sequences of 89 monocotyledon and dicotyledon species. The total
458 number of SOFLs included in this list is not final and is likely to change over time as the
459 curation, annotation and sequencing of more genome sequences continue to improve and
460 increase.
461
462 Table S3. Subcellular localization prediction analysis. All 289 SOFL orthologue protein
463 sequences were analyzed through Wolf PSORT (Horton et al. 2007) and SeqNLS (Lin and Hu
464 2013) online databases. SeqNLS did not detect NLS in 14 SOFL homologues and classified an
465 additional 119 SOFLs as scoring below a cut-off threshold score to be confidently predicted as
466 nuclear localizing (containing residues in yellow, green, cyan, blue and navy-blue shades). Wolf
467 PSORT predicted 274 SOFLs out of 289 homologues to localize to the nucleus. Underlined
468 orange and red colored parts of the peptide sequence represent predicted NLSs.
469
470 Figure S1. AtSOFL1 is alternatively spliced into AtSOFL1.1 and AtSOFL1.2 splice variants.
471 (a) AtSOFL1.1 is the first splicing variant and represents At1g26210, previously thought to be an
472 intron-less gene, and includes a partially retained intron (dashed box and red solid line).
21
473 AtSOFL1.2 represents the second splicing variant and includes two predicted exons. The blue
474 box [a] and the green box [b] both represent the two predicted exons. The red solid line
475 represents an intron. The brown box represents 3’ UTR region. Small orange and purple regions
476 represent the SOFL-A and SOFL-B domains, respectively. (b) Agarose gel image showing a
477 AtSOFL1.1 PCR product amplified from genomic DNA and an alternatively spliced AtSOFL1.2
478 semi-quantitative PCR product. M = marker.
479
480 File S1. SOFL homologue protein sequences. List of 289 SOFL homologues retrieved from
481 NCBI and Phytozome database searches.
482
483 File S2. Annotation of AtSOFL1 and the two alternative splicing variants. Our BLASTP
484 search revealed a, (a) 168 amino acid hypothetical protein, GenBank accession number
485 AAG50669.1 with a predicted (b) 507 nucleotide long coding sequence (CDS), GenBank
486 accession number AC079829.6 from which we designed PCR primers (Table S1). To further
487 investigate this putative SOFL gene, we performed PCR using genomic DNA and
488 complementary DNA as templates and amplified products of (c) 986 and (d) 771 base pairs sizes
489 (which includes the 3’ UTR sequence), respectively. Based on the predicted CDS length we had
490 expected a PCR product of 507 base pairs if we used cDNA as template. To investigate this
491 anomaly, we sequenced the two amplified products. Sequence alignment of the genomic DNA
492 and cDNA PCR amplified products revealed that AtSOFL1 contains a 215-base pair intron
493 (highlighted in grey). Sequence alignment analysis showed that (e) AtSOFL1.1’s first 319
494 nucleotides out of 986 are 100% identical to (f) AtSOFL1 (At1G26210) gene’s CDS.
495 Furthermore, the terminal 128 nucleotides of AtSOFL1.1’s CDS are 100% identical to the first
22
496 segment of AtSOFL1’s 215 nucleotide long intron segment. This suggests that AtSOFL1’s 986
497 nucleotide-long gene produce two alternatively spliced variants; AtSOFL1.1 and AtSOFL1.2.
498 AtSOFL1.1 CDS is translated into a 148-amino acid long peptide and (g) AtSOFL1.2 CDS is
499 translated into a predicted 118 amino acid products. *represents a stop codon.
500
501 FIGURE LEGENDS
502 Figure 1. Phylogenetic analysis of 289 SOFL homologue protein sequences showed that the
503 gene family evolved into four main clades. (A) and (B) Annotated rectangular layout mid-point
504 root Mr. Bayes phylogenetic trees. Bayesian inference analysis was performed on 289 SOFL
505 homologues amino acid sequences with the Mr. Bayes 3.2.1 on XSEDE tool on CIPRES Science
506 Gateway for 30 million generations with 8 chains to run with convergence at 0.0099 (Miller et al.
507 2015).
508
509 Figure 2. Expression levels of Arabidopsis thaliana SOFL genes in various tissues. Total
510 RNA was isolated from; 6-day old seedlings (S), juvenile plants (J), rosette leaf (L), floral
511 structure (F), siliques (SQ), whole adult plant (WP) and roots (R) and used for RT-PCR analysis.
512 UBIQUITIN10 was used as an internal control for PCR. No reverse transcriptase (No RT) RNA
513 samples were used as negative controls.
514
515 Figure 3. Multiple sequence alignment and analysis of SOFL proteins. (a) Illustration
516 showing the topology of SOFL proteins. Purple rectangular blocks represent the two conserved
517 domains, SOFL-A and SOFL-B. (b) N-terminal amino acid sequence alignment of Arabidopsis
518 thaliana SOFL proteins. The alignment was obtained using JalView Probcons with Defaults
519 program (Malek 2001; Zhang 2003). Semi-conserved amino acids are indicated with a light
23
520 purple shade. Conserved amino acids are indicated by a darker purple shade. Red squares
521 indicate amino acid residues that were selected for site-directed mutagenesis. WebLogo©
522 analysis of conserved, (c) SOFL-A domain, and (d) SOFL-B domain, using all 289 SOFL protein
523 sequences (Lin and Hu 2013). Red asterisk denotes 100% conserved amino acid residues among
524 all 289 SOFL sequences.
525 Figure 4. Phenotypes of transgenic plants overexpressing full-length SOB5 and AtSOFL1
526 genes harboring point mutations generated via site-directed mutagenesis. (a) Wild-type and
527 sob5-D control plants compared to transgenic plants overexpressing the following SOB5 point
528 mutations: T21I, P61R and D33H. D33H mutation is in a non-conserved portion of SOB5 gene.
529 (b) Wild-type and 35S:AtSOFL1 control plants compared to transgenic plants overexpressing the
530 following AtSOFL1 point mutations: T23I, P84R and D53H. D53H is in a non-conserved portion
531 of AtSOFL1 gene. All plants were grown together in a growth chamber for two weeks. Three
532 independent transgenic lines were analyzed. This experiment was repeated three times with
533 similar outcomes.
534 Figure 5. Subcellular localization of CFP-SOB5 (a), CFP-AtSOFL1 (b), CFP-AtSOFL2 (c)
535 fusion proteins in onion epidermal cells. Most biochemical functions carried out in plant cells
536 are performed by proteins in specific cellular locations. Protein subcellular localization studies, via
537 fluorescent protein fusions, are a useful tool to narrow down cellular functions of novel or unknown
538 proteins. Onion epidermal peels were biolistically bombarded with CFP-protein fusion constructs
539 and incubated in the dark for 40 hours, then visualized under a confocal microscope. Each panel
540 shows, in a clockwise direction, CFP-protein fusion signal, free RFP signal, merge of CFP and
541 RFP signal and bright-field view showing outline of cells. White arrows indicate outline of
24
542 plasmolyzed membrane. Free RFP construct was used as a control for successful bombardment
543 as well as a localization marker. Plasmolysis was induced by treatment with 0.8 M mannitol.
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
25
565 LITERATURE CITED
566 Clough, S.J., and A.F. Bent, 1998 Floral dip: a simplified method for Agrobacterium-mediated
567 transformation of Arabidopsis thaliana. Plant J 16 (6):735-743.
568 Crooks, G.E., G. Hon, J.M. Chandonia, and S.E. Brenner, 2004 WebLogo: a sequence logo generator.
569 Genome Res 14 (6):1188-1190.
570 Earley, K.W., J.R. Haag, O. Pontes, K. Opper, T. Juehne et al., 2006 Gateway-compatible vectors for
571 plant functional genomics and proteomics. Plant J 45 (4):616-629.
572 Fang, H., M.E. Oates, R.B. Pethica, J.M. Greenwood, A.J. Sardar et al., 2013 A daily-updated tree of
573 (sequenced) life as a reference for genome research. Sci Rep 3:2015.
574 Goodstein, D.M., S. Shu, R. Howson, R. Neupane, R.D. Hayes et al., 2011 Phytozome: a comparative
575 platform for green plant genomics. Nucleic Acids Res 40 (Database issue):D1178-1186.
576 Guo, Y.L., 2012 Gene family evolution in green plants with emphasis on the origination and evolution of
577 Arabidopsis thaliana genes. Plant J 73 (6):941-951.
578 Higuchi, M., M.S. Pischke, A.P. Mahonen, K. Miyawaki, Y. Hashimoto et al., 2004 In planta functions of
579 the Arabidopsis cytokinin receptor family. Proceedings of the National Academy of Sciences of
580 the United States of America 101 (23):8821-8826.
581 Horton, P., K.J. Park, T. Obayashi, N. Fujita, H. Harada et al., 2007 WoLF PSORT: protein localization
582 predictor. Nucleic Acids Res 35 (Web Server issue):W585-587.
583 Jain, M., P. Khurana, A.K. Tyagi, and J.P. Khurana, 2008 Genome-wide analysis of intronless genes in
584 rice and Arabidopsis. Funct Integr Genomics 8 (1):69-78.
585 Kahsay, R.Y., G. Gao, and L. Liao, 2005 An improved hidden Markov model for transmembrane protein
586 detection and topology prediction and its applications to complete genomes. Bioinformatics 21
587 (9):1853-1858.
588 Kersey, P.J., J.E. Allen, I. Armean, S. Boddu, B.J. Bolt et al., 2016 Ensembl Genomes 2016: more
589 genomes, more complexity. Nucleic Acids Res 44 (D1):D574-580.
26
590 Kim, S.Y., 2006 The role of ABF family bZIP class transcription factors in stress response. Physiologia
591 Plantarum 126 (4):519-527.
592 Kim, S.Y., S.G. Kim, Y.S. Kim, P.J. Seo, M. Bae et al., 2007 Exploring membrane-associated NAC
593 transcription factors in Arabidopsis: implications for membrane biology in genome regulation.
594 Nucleic Acids Res 35 (1):203-213.
595 Kondou, Y., M. Higuchi, and M. Matsui, 2010 High-throughput characterization of plant gene functions
596 by using gain-of-function technology. Annu Rev Plant Biol 61:373-393.
597 Lamesch, P., T.Z. Berardini, D. Li, D. Swarbreck, C. Wilks et al., 2012 The Arabidopsis Information
598 Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res 40 (Database
599 issue):D1202-1210.
600 Lecharny, A., N. Boudet, I. Gy, S. Aubourg, and M. Kreis, 2003 Introns in, introns out in plant gene
601 families: a genomic approach of the dynamics of gene structure. J Struct Funct Genomics 3 (1-
602 4):111-116.
603 Lin, J.R., and J. Hu, 2013 SeqNLS: nuclear localization signal prediction based on frequent pattern
604 mining and linear motif scoring. PLoS One 8 (10):e76864.
605 Malek, J.A., 2001 Abundant protein domains occur in proportion to proteome size. Genome Biol 2
606 (9):RESEARCH0039.
607 Miller, M.A., T. Schwartz, B.E. Pickett, S. He, E.B. Klem et al., 2015 A RESTful API for Access to
608 Phylogenetic Tools via the CIPRES Science Gateway. Evol Bioinform Online 11:43-48.
609 Palmer, E., and T. Freeman, 2004 Investigation into the use of C- and N-terminal GFP fusion proteins for
610 subcellular localization studies using reverse transfection microarrays. Comp Funct Genomics 5
611 (4):342-353.
612 Qu, L.J., and Y.X. Zhu, 2006 Transcription factor families in Arabidopsis: major progress and
613 outstanding issues for future research. Curr Opin Plant Biol 9 (5):544-549.
614 Roshan, U., 2014 Multiple sequence alignment using Probcons and Probalign. Methods Mol Biol
615 1079:147-153.
27
616 Sayers, E.W., T. Barrett, D.A. Benson, S.H. Bryant, K. Canese et al., 2009 Database resources of the
617 National Center for Biotechnology Information. Nucleic Acids Res 37 (Database issue):D5-15.
618 Serna, L., 2005 A simple method for discriminating between cell membrane and cytosolic proteins. New
619 Phytol 165 (3):947-952.
620 Vollrath, D., V.L. Jaramillo-Babb, M.V. Clough, I. McIntosh, K.M. Scott et al., 1998 Loss-of-function
621 mutations in the LIM-homeodomain gene, LMX1B, in nail-patella syndrome. Hum Mol Genet 7
622 (7):1091-1098.
623 Wang, Y., X. Wang, and A.H. Paterson, 2012 Genome and gene duplications and gene expression
624 divergence: a view from plants. Ann N Y Acad Sci 1256:1-14.
625 Wheeler, D.L., T. Barrett, D.A. Benson, S.H. Bryant, K. Canese et al., 2008 Database resources of the
626 National Center for Biotechnology Information. Nucleic Acids Res 36 (Database issue):D13-21.
627 Xie, Q., A.P. Sanz-Burgos, H. Guo, J.A. Garcia, and C. Gutierrez, 1999 GRAB proteins, novel members
628 of the NAC domain family, isolated by their interaction with a geminivirus protein. Plant Mol
629 Biol 39 (4):647-656.
630 Yan, H., X. Dai, K. Feng, Q. Ma, and T. Yin, 2016 IGDD: a database of intronless genes in dicots. BMC
631 Bioinformatics 17:289.
632 Yan, H., W. Zhang, Y. Lin, Q. Dong, X. Peng et al., 2014 Different evolutionary patterns among
633 intronless genes in maize genome. Biochem Biophys Res Commun 449 (1):146-150.
634 Zhang, J., R. Vankova, J. Malbeck, P.I. Dobrev, Y. Xu et al., 2009 AtSOFL1 and AtSOFL2 act
635 redundantly as positive modulators of the endogenous content of specific cytokinins in
636 Arabidopsis. PLoS One 4 (12):e8236.
637 Zhang, J., E.L. Wrage, R. Vankova, J. Malbeck, and M.M. Neff, 2006 Over-expression of SOB5 suggests
638 the involvement of a novel plant protein in cytokinin-mediated development. Plant Journal 46
639 (5):834-848.
640 Zhang, J.Z., 2003 Overexpression analysis of plant transcription factors. Curr Opin Plant Biol 6 (5):430-
641 440.
28
642 Zhao, J., D.S. Favero, J. Qiu, E.H. Roalson, and M.M. Neff, 2014 Insights into the evolution and
643 diversification of the AT-hook Motif Nuclear Localized gene family in land plants. BMC Plant
644 Biol 14 (1):266.
645 Zou, M., B. Guo, and S. He, 2011 The roles and evolutionary patterns of intronless genes in
646 deuterostomes. Comp Funct Genomics 2011:680673.
647
648
29
1 BnSOFL15 1 BnSOFL20 1 BrSOFL11 1 BnSOFL19 1 BnSOFL16 BrSOFL9 0.84 1 BnSOFL18 1 BnSOFL11 BrSOFL8 1 1 RsSOFL5 0.99 RsSOFL7 1 BnSOFL13 0.74 1 BrSOFL10 0.98 1 BnSOFL14 1 RsSOFL8 EsSOFL2 AaSOFL2 1 1 0.89 CsaSOFL9 CsaSOFL10 0.99 CsaSOFL8 1 CgSOFL3 0.79 1 1 CrSOFL1 1 AhSOFL2 0.8 AlSOFL7 1 AtSOFL2 1 BsSOFL3 ThSOFL3 1 CsaSOFL6 1 1 CsaSOFL7 1BsSOFL2 1 CgSOFL2 AhSOFL3 0.98 AlSOFL3 1 AtSOFL1 1 BnSOFL6 BnSOFL7 1 RsSOFL4 1 BnSOFL17 0.87 BrSOFL5 1 1 BrSOFL6 RsSOFL6 BnSOFL8 BrSOFL7 0.98 SlSOFL3 1 SpSOFL1 0.86 StSOFL1 1 StSOFL3 1 CanSOFL1 1 CanSOFL21 NtSOFL1 1 NsSOFL2 0.65 1 NaSOFL1 NaSOFL21 1 NtoSOFL2 1 NtoSOFL1 1 NsSOFL3 1 1 DhSOFL1 1 DhSOFL2 1 OeSOFL1 1 0.78 1 GaSOFL1 1 HiSOFL1 0.61 1 EgrSOFL1 CcaSOFL1 0.86 GarSOFL2 1 GhSOFL1 0.94 GhSOFL2 0.8 1 GrSOFL3 DzSOFL1 1 1 HuSOFL1 TcSOFL3 1 CcapSOFL2 0.88 1 CoSOFL2 0.95 CfSOFL1 CpSOFL1 FvSOFL1 1 HbSOFL1 1 MeSOFL4 0.99 1 1 PtSOFL1 0.61 1 PtSOFL3 1 1CsiSOFL1 CsiSOFL4 0.86 1 1 1 MdSOFL3 1 1MdSOFL5 1 PbSOFL5 0.88 1 PpSOFL31 1 PpSOFL5 MnSOFL2 1 1 1 GsSOFL3 0.99 GsSOFL51 1 1 1 LaSOFL2 1 LaSOFL3 Clade I 1 1 CarSOFL4 1 MeSOFL1 1 1 MeSOFL3 HbSOFL2 1 1 1 HbSOFL3 MeSOFL2 0.99 1 JcSOFL2 RcSOFL1 1 PeSOFL1 PtSOFL5 1 1 0.96 1 NtoSOFL4 1 NsSOFL1 1 SlSOFL2 1 1 StSOFL2 1 0.98 SlSOFL1 1 StSOFL4 1 NtoSOFL3 PgSOFL1 0.96 1 1 GhSOFL3 0.52 1 GrSOFL1 0.85 1 1 GarSOFL3 0.98 CcapSOFL1 1 CoSOFL1 1 1 DzSOFL2 TcSOFL2 1 GarSOFL1 0.61 1 GrSOFL2 0.65 1 CsiSOFL5 MnSOFL3 ZjSOFL2 1 0.93 GmSOFL1 0.8 1 GmSOFL6 1 GsSOFL1 0.98 GmSOFL7 1 GsSOFL4 1 1 1 VaSOFL2 1 1 VrSOFL2 0.71 1 PvuSOFL1 0.8 1 CcSOFL2 1 CarSOFL2 TsSOFL2 0.79 LaSOFL1 1 0.8 GmSOFL4 0.79 GmSOFL5 0.86 1 PvuSOFL2 CarSOFL1 1 CmSOFL1 1 CusaSOFL1 Clade II Continue with next page Continue with previous page 1 0.86 GmSOFL2 1 GsSOFL2 0.54 CcSOFL1 0.56 0.98 VaSOFL1 1 VrSOFL1 1 GmSOFL3 1 1 1 1 MtSOFL2 0.99 1 TsSOFL1 1 CarSOFL3 1 LaSOFL4 LaSOFL5 1 0.79 MdSOFL1 1 PbSOFL4 1 1 MdSOFL2 1 1 PbSOFL1 0.51 0.95 PmSOFL1 0.91 1 PpSOFL1 0.78 1 MnSOFL1 0.53 1 ZjSOFL1 0.95 1 CpSOFL2 1 VvSOFL1 1 VvSOFL2 0.98 JcSOFL1 1 1 MeSOFL5 RcSOFL21 1 1 PtSOFL2 PtSOFL4 1 MdSOFL4 1 MdSOFL7 1 PbSOFL2 0.99 1 MdSOFL6 1 PbSOFL3 PpSOFL2 0.81 PpSOFL4 0.81 1 PpSOFL6 1 PpSOFL7 1PmSOFL2 1 DcSOFL1 1 1 DcSOFL2 1 DcSOFL31 1 DcSOFL4 1 CsiSOFL2 CsiSOFL3 1 EgrSOFL2 1 1 EgrSOFL3 1 NnSOFL1 1 NnSOFL2 EguSOFL1 Clade III
1 1 BnSOFL2 1 1 BnSOFL3 1 1BrSOFL1 1 BrSOFL4 0.71 1 RsSOFL21 0.56 1BnSOFL5 1 1 BrSOFL3 RsSOFL11 0.94 1 1 BnSOFL4 BnSOFL21 0.93 0.98 0.83 BnSOFL1 1 1 BrSOFL2 BnSOFL22 1 1 RsSOFL3 1 1 AaSOFL1 1 CsaSOFL1 1 1 CsaSOFL2 CsaSOFL3 1 0.99 1 1 CsaSOFL4 1 1 BsSOFL1 1 1 CgSOFL1 1 CcSOFL3 1 1 MtSOFL1 AhSOFL1 1 CarSOFL7 1 AlSOFL1 1 SOB5 0.99 CClSOFL1 0.99 ThSOFL1 1 CsiSOFL6 ThSOFL2 1 1 JcSOFL3 0.61 1 1 MdSOFL8 1 MdSOFL9 0.8 MdSOFL10 0.99 1 MdSOFL11 1 MdSOFL12 CClSOFL2 1 0.68 1 MaSOFL5 0.61 1 MaSOFL6 1 1 MaSOFL4 1 AlSOFL6 1 AtSOFL5 1 CarSOFL5 1 CarSOFL6 0.51 AlSOFL5 1 AtSOFL6 1 1 CcarSOFL1 1 HaSOFL1 TcSOFL1 1 1 1 PviSOFL1 0.99 PviSOFL2 SiSOFL1 1 1 1ZmSOFL1 1 1 ZmSOFL2 1 OtSOFL1 1 1 OsSOFL2 1 OsSOFL3 1 1 ObSOFL1 BdSOFL2 1 AtaSOFL11 1 TuSOFL1 0.99 1PhSOFL1 1 1 1 PviSOFL3 1 1 PviSOFL4 1 SiSOFL2 SvSOFL1 1 1 ZmSOFL3 1 ZmSOFL4 1 1 1 SbSOFL1 0.53 BdSOFL1 1 BstSOFL1 1 ObSOFL2 0.98 OsSOFL1 1 1 0.94 CgSOFL4 0.99 CrSOFL2 CsaSOFL5 0.94 AlSOFL4 0.59 1 AtSOFL3 1 BnSOFL10 1 BnSOFL12 0.98 EsSOFL1 0.89 1 BnSOFL9 1 1 AlSOFL2 1 AtSOFL4 1 CgSOFL5 1 ThSOFL4 1 AcSOFL1 InSOFL1 1 1 1 EguiSOFL1 1 1 EguiSOFL4 1 EguiSOFL5 1 1 PdSOFL1 1 1 1 1 PdSOFL2 1 PdSOFL5 1 EguiSOFL2 1 0.96 1 MaSOFL1 1 PdSOFL3 0.92 PdSOFL4 EguiSOFL3 1 MaSOFL2 Clade VI MaSOFL3 S J L F SQ WP R
AtSOFL6
AtSOFL5
AtSOFL4
AtSOFL3
AtSOFL2
AtSOFL1
SOB5
UBQ
UBQ No RT A
B
C * * * * *
D * * * *
Col-0 sob5-D T21I A P61R D33H
B Col-0 35S:AtSOFL1 T23I P84R D53H A
B
C