Authors: Yong Wang • Guanqun Ding • Tingting Gu • Jing Ding • Yi Li

Bioinformatic and expression analyses on carotenoid dioxygenase genes in fruit development and abiotic stress responses in Fragaria vesca

Table S1 Databases and common names for the 10 species used in this study

Species               Common name  Version      Resources
Amborella trichopoda  Amborella    Version 1.1  Phytozome V11
Arabidopsis thaliana  Arabidopsis  TAIR 10      Phytozome V11
Brassica rapa         B. rapa      Version 2.1  Phytozome V11
Fragaria vesca        F. vesca     Version 1.1  Phytozome V11
Oryza sativa          Rice         Version 7.0  Phytozome V11
Pyrus bretschneideri  Pear         Version 1.0  ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000315295.1_Pbr_v1.0
Prunus persica        Peach        Version 2.1  Phytozome V11
Solanum lycopersicum  Tomato       Version 2.4  Phytozome V11
Vitis vinifera        Grape        Version 12X  Phytozome V11
Zea mays              Maize        Version 6a   Phytozome V11

Table S2 Protein/gene IDs of the predicted carotenoid dioxygenase genes from the other 9 species: Amborella (Am), Arabidopsis (At), B. rapa (Br), rice (Os), peach (Ppe), tomato (Sl), grape (Vv), maize (Zm) and pear (Pbr).

Gene name  Protein/Gene ID            Gene name  Protein/Gene ID            Gene name  Protein/Gene ID
AmCCD1a    scaffold00022.400          OsCCD1     LOC_Os12g44310             VvCCD1a    GSVIVT01032103001
AmCCD1b    scaffold00022.401          OsCCD4a    LOC_Os02g47510             VvCCD1b    GSVIVT01032110001
AmCCD4a    scaffold00011.172          OsCCD4b    LOC_Os12g24800             VvCCD4a    GSVIVT01036862001
AmCCD4b    scaffold00011.173          OsCCD4c    LOC_Os10g08980             VvCCD4b    GSVIVT01024318001
AmCCD8a    scaffold00050.103          OsCCD7     LOC_Os04g46470             VvCCD7     GSVIVT01018217001
AmCCD8b    scaffold00056.38           OsCCD8a    LOC_Os01g38580             VvCCD8     GSVIVT01035626001
AmCCDL1    scaffold00056.19           OsCCD8b    LOC_Os01g54270             VvCCDL1    GSVIVT01035632001
AmNCED1    scaffold00092.158          OsCCDL1    LOC_Os08g28240             VvCCDL2    GSVIVT01035633001
AmNCED6    scaffold00039.158          OsCCDL2    LOC_Os08g28410             VvCCDL3    GSVIVT01035636001
AtCCD1     AT3G63520                  OsCCDL3    LOC_Os09g15240             VvCCDL4    GSVIVT01015260001
AtCCD4     AT4G19170                  OsNCED1    LOC_Os04g04230             VvNCED2    GSVIVT01021507001
AtCCD7     AT2G44990                  OsNCED2    LOC_Os12g42280             VvNCED3    GSVIVT01038080001
AtCCD8     AT4G32810                  OsNCED3    LOC_Os07g05940             VvNCED6    GSVIVT01029057001
AtNCED2    AT4G18350                  OsNCED9    LOC_Os03g44380             ZmCCD1a    GRMZM2G057243
AtNCED3    AT3G14440                  PpeCCD1    Prupe.2G014700             ZmCCD1b    GRMZM2G376433
AtNCED5    AT1G30100                  PpeCCD4    Prupe.1G255500             ZmCCD4a    GRMZM2G110192
AtNCED6    AT3G24220                  PpeCCD7    Prupe.2G133900             ZmCCD4b    GRMZM2G150363
AtNCED9    AT1G78390                  PpeCCD8    Prupe.1G448400             ZmCCD7     GRMZM2G158657
BrCCD1a    Brara.D00005               PpeCCDL1   Prupe.1G449100             ZmCCD8     GRMZM2G446858
BrCCD1b    Brara.I04483               PpeCCDL2   Prupe.1G449300             ZmCCDL1    AC197699.3_FG002
BrCCD1c    Brara.I04484               PpeCCDL3   Prupe.1G449200             ZmCCDL2    GRMZM2G164967
BrCCD4a    Brara.H01021               PpeNCED3a  Prupe.4G150100             ZmNCED10   GRMZM2G473687
BrCCD4b    Brara.A01020               PpeNCED3b  Prupe.4G082000             ZmNCED11   AC206764.4_FG004
BrCCD7     Brara.D02722               PpeNCED6   Prupe.1G061300             ZmNCED2    GRMZM5G838285
BrCCD8     Brara.A00510               SlCCD1a    Solyc01g087250             ZmNCED3    AC194863.3_FG006
BrNCED2a   Brara.A00958               SlCCD1b    Solyc01g087260             ZmNCED4    GRMZM2G407181
BrNCED2b   Brara.K00450               SlCCD4a    Solyc08g075480             ZmNCED5    GRMZM5G858784
BrNCED3a   Brara.A03189               SlCCD4b    Solyc08g075490             ZmNCED6    GRMZM2G330848
BrNCED3b   Brara.C03527               SlCCD7     Solyc01g090660             ZmNCED7    GRMZM2G417954
BrNCED3c   Brara.E02677               SlCCD8     Solyc08g066650             ZmNCED8    GRMZM2G408158
BrNCED5    Brara.I02844               SlCCDL1    Solyc08g066720             ZmNCED9    GRMZM2G328612
BrNCED6    Brara.G00531               SlNCED2    Solyc08g016720             VP14       GRMZM2G014392
BrNCED9a   Brara.B02322               SlNCED3    Solyc07g056570             PbrCCD1a   XP_009365068/LOC103954936
BrNCED9b   Brara.G03580               SlNCED6    Solyc05g053530             PbrCCD1b   XP_009365069/LOC103954937
PbrCCD4a   XP_009363530/LOC103953513  PbrCCD7    XP_009373202/LOC103962245  PbrNCED1   XP_009354854/LOC103945979
PbrCCD4b   XP_009364932/LOC103954827  PbrCCD8a   XP_009367455/LOC103957092  PbrNCED2   XP_009360900/LOC103951286
PbrCCD4c   XP_009364935/LOC103954831  PbrCCD8b   XP_009354311/LOC103945458  PbrNCED3   XP_009367762/LOC103957334
PbrCCD4d   XP_009376527/LOC103965230  PbrCCDL1   XP_009350949/LOC103942488  PbrNCED6a  XP_009365475/LOC103955326
PbrCCD4e   XP_009346749/LOC103938475  PbrCCDL2   XP_009367500/LOC103957132  PbrNCED6b  XP_009365476/LOC103955327
PbrCCD4f   XP_009349668/LOC103941199  PbrCCDL3   XP_009368017/LOC103957555

Table S3 Primers used in the qPCR studies

Primer       Sequence (5'-3')
FveGAPDH-F   CATTCATCACCACCGACTACA
FveGAPDH-R   GAAGGGTCTTCTCATCCTTGAC
FveNCED2-F   AACACCTTGCGTGCATAGTC
FveNCED2-R   TGGAGAGGTGGTGAGGTATTT
FveNCED3-F   TAAGCGATCAACGACTGCTGTG
FveNCED3-R   ATGTGCCTGAAGTAGTAGCTGT
FveNCED6-F1  CACTACGATCTCCACCATTCTG
FveNCED6-R1  CGAGTAGTGCTGAAATGCAAAG
FveNCED6-F2  CTAATCCTATGTTTACGCCGTTG
FveNCED6-R2  GCGGGTGTACCTACAAGAATAG
FveCCD1-F    CTTCGTGACAGAGGAGCAATTA
FveCCD1-R    GTGACCATCTACGCAACATAGT
FveCCD4-F    CCTCACCATCTCCTCTGTTAGA
FveCCD4-R    AGTGTGCTTGGGTTGTAGTG
FveCCD7-F    GCTCTTACGAGTGGTTCAATTTC
FveCCD7-R    CAATGACTTGCTTCCGGTTATG
FveCCD8-F    ATTGTTGCGAGTGTGGAAGTGC
FveCCD8-R    AGGATGGTGGTATCAGCGTTGT
FveCCDL1-F   CATGCCTCGTTATGGTGATGC
FveCCDL1-R   TGGATGTGAGAGCCCTACAGC
FveCCDL2-F   CTTGTTGCCAAAGGTGGAATATC
FveCCDL2-R   CCAGCCATCATCTTCCTTGAT
FveCCDL3-F   TCGGTGTTTGGAAGGTCAAGT
FveCCDL3-R   CCGTCCATCTGCAACCATCAC
FveCCDL4-F   GTCGCCTGCAACTTGGGAGAA
FveCCDL4-R   TCCGAAGATCGAAACTGCTGCT
FveCCDL5-F   ACCTGCCACAGCAGGAGACAG
FveCCDL5-R   GCACATAGGTCTCATCATTCGC
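A primer list such as Table S3 can be sanity-checked by computing the length, GC content and an approximate melting temperature of each oligo. The sketch below is illustrative only and is not part of the original study. The function names are hypothetical, only four of the primers are included, and the Wallace rule (Tm = 2(A+T) + 4(G+C)) is assumed purely for simplicity; it is a rough estimate intended for short oligos and may overestimate Tm for primers of 20 to 23 nt.

```python
# Illustrative sketch (not the authors' method): basic statistics for a few
# of the qPCR primers listed in Table S3. All function names are hypothetical.

PRIMERS = {
    "FveGAPDH-F": "CATTCATCACCACCGACTACA",
    "FveGAPDH-R": "GAAGGGTCTTCTCATCCTTGAC",
    "FveCCD1-F": "CTTCGTGACAGAGGAGCAATTA",
    "FveCCD1-R": "GTGACCATCTACGCAACATAGT",
}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate Tm in deg C with the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: this rule is only a rough guide for short oligos.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primer_stats(primers: dict) -> dict:
    """Collect length, GC fraction and Wallace Tm for each primer."""
    return {
        name: {
            "length": len(seq),
            "gc": gc_content(seq),
            "tm_wallace": wallace_tm(seq),
        }
        for name, seq in primers.items()
    }


if __name__ == "__main__":
    for name, s in primer_stats(PRIMERS).items():
        print(f"{name}\t{s['length']} nt\tGC {s['gc']:.2f}\tTm ~{s['tm_wallace']} C")
```

For primer design in practice, a nearest-neighbour Tm model with the actual salt and primer concentrations would be a more reliable choice than the Wallace rule used here.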

Table S4 Motifs containing the conserved histidine, aspartate or glutamate residues in each group of the carotenoid dioxygenase gene family.

Conserved residues: Glu264, His347, His298, His412, Glu447, Glu530, His590

Group  Motifs containing the conserved residues
NCED   Motif 6, Motif 7, Motif 1, Motif 5, Motif 4, Motif 14, Motif 3
CCD1   Motif 6, Motif 7, Motif 1, Motif 5, Motif 16, Motif 14, Motif 3
CCD4   Motif 6, Motif 7, Motif 1, Motif 5, Motif 16, Motif 14, Motif 3
CCD7   Motif 6, Motif 1, Motif 4, Motif 14
CCD8   Motif 18, Motif 1, Motif 5, Motif 4, Motif 14, Motif 3
CCDL   Motif 6, Motif 7, Motif 1, Motif 5, Motif 4, Motif 14, Motif 3

The number after each conserved histidine or glutamate residue indicates the position of that residue in the VP14 protein shown in Fig. S1.

Table S5 Distribution of conserved motifs in the 6 groups of carotenoid dioxygenase proteins in the 10 species

Motif  1 2 3 4-5 6-7 8 9 10 11-13 14-15 16 17 18 19-20
NCED   + + + + + + + + + + + + + +
CCD4   + + + + + + + + + + + + +
CCD1   + + + + + + + + + + + +
CCDL   + + + + + + + + + +
CCD7   + + + + + +
CCD8   + + + + + + +

"+" indicates the presence of the motif in most proteins of the group. The motifs were identified from full-length protein sequences using the MEME online system. The detailed motif composition and sequence logos are shown in Fig. S2.