Supporting Information Appendix

SI Materials and Methods

Plant materials. Bread wheat (Triticum aestivum L.), its diploid ancestral species T. urartu and Ae. tauschii, and the allotetraploid T. turgidum were used in this study. The

mutants ms1d.1 and ms1e were obtained from the Wheat Genetics Resource Center at

Kansas State University. We screened ms1d.2 and ms1h–ms1q from an EMS-mutagenized population of bread wheat (variety ‘Ningchun 4’). The plants used for map-based cloning were progeny segregated from heterozygous ms1e plants. Triticum turgidum L. accession Langdon (AABB), T. urartu accession G1812 (AA) and Ae. tauschii accession AL8/78 (DD) were provided by Professor Hongqing Ling (Institute of Genetics and Developmental Biology, Chinese Academy of Sciences). All plants were grown in a greenhouse under long-day conditions (16 h of light at 22–25°C/8 h of dark at 15–20°C) at a white light intensity of 250 µmol m−2 s−1.

Preparation of the EMS-mutagenized population and isolation of ms1 alleles in the ‘Ningchun 4’ variety. The wheat variety ‘Ningchun 4’ was used to prepare the EMS-mutagenized population. In brief, about 200 kilograms of seeds were soaked in

0.5% (v/v) ethyl methane sulfonate (EMS, Sigma-Aldrich, St Louis, MO, USA) for

12 hours at room temperature (about 25°C), and were planted in a field at ,

China; 990 kilograms of M2 seeds were harvested. One hundred and thirty-five male-sterile mutants were identified by screening 50 kilograms of M2 seeds (about 1 million seeds). After allelism tests, 11 of the 135 male-sterile mutants were confirmed to be ms1 alleles.

RT-PCR, qRT-PCR and in situ hybridization. Total RNA was isolated using TRI

reagent (Takara Bio Inc.); genomic DNA was removed with DNase I (Promega). Two

micrograms of RNA per sample were used to synthesize cDNA using a First-Strand

cDNA Synthesis Kit (Thermo Fisher). RT-PCR was performed with LA Taq (Takara

Bio Inc.). qRT-PCR was performed on a cycler apparatus (Bio-Rad) using SYBR

Premix Ex Taq GC (Takara Bio Inc.) according to the manufacturer’s instructions.

Amplification was conducted as follows: 95°C for 2 min, followed by 40 cycles of

95°C for 5 s and 60°C for 35 s. ACTIN was used as an internal control. Three biological replicates, each with three technical replicates, were analyzed. The

primers used for RT-PCR and qRT-PCR are provided in SI Appendix Table S9.
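The text does not state how relative transcript levels were computed from the qRT-PCR data. A common choice with a single internal control such as ACTIN is the 2^-ΔΔCt method, and the sketch below illustrates it under that assumption. The function name and the Ct values in the usage note are hypothetical and are not taken from this study.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Return 2^-ddCt for a sample relative to a calibrator sample.

    Each argument is a list of Ct values from technical replicates:
    target gene and reference gene (e.g. ACTIN) in the sample of interest,
    then target gene and reference gene in the calibrator sample.
    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    d_ct_sample = mean(ct_target) - mean(ct_reference)
    d_ct_calibrator = mean(ct_target_cal) - mean(ct_reference_cal)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** (-dd_ct)
```

For example, with hypothetical Ct values, `relative_expression([24, 24, 24], [18, 18, 18], [26, 26, 26], [18, 18, 18])` gives a four-fold higher level in the sample than in the calibrator. Averaging the three technical replicates first and then summarizing across biological replicates is one reasonable way to handle the replicate structure described above.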

In situ hybridization was performed according to Shitsukawa et al. (1) with minor

modifications. Tissues were cut into 10-μm-thick sections and hybridization was

performed overnight at 50°C. The probe (a 971-bp Ms1 fragment) was amplified

using primers Ms1-ISH-F/R (SI Appendix Table S10) and inserted into pEASY-T1

Simple Cloning Vector (TransGen Biotech) in both forward and reverse orientations.

The vectors were linearized by digestion with HindIII and EcoRI and used as templates to generate anti-sense and sense probes with T7 RNA polymerase.

Histological analysis. For paraffin sections, tissues were prepared as for RNA in situ

hybridization. Transverse sections (10 μm thick) were cut and stained with 0.25% toluidine blue. Each section was observed under an Axio Imager M2 microscope

(Zeiss).

RNA-seq, resequencing and bioinformatics processing of the sequence data. To map Ms1, we first collected the ms1e allele (from an EMS-mutagenized line of wild-type Chris) for further analysis (2). Male-sterile and wild-type plants produced from heterozygous ms1e plants were collected for MutMap-based cloning.

To obtain RNA-seq data, RNA was extracted using RNA extraction kits

(Qiagen) from microspore-stage anthers of segregated wild-type plants, ms1e plants and their heterozygous progeny. Paired-end libraries were prepared from 10 µg of cDNA reverse-transcribed from the RNA samples (mean insert size: 250 bp). The libraries were sequenced using the Illumina HiSeq 2000 system to produce 101-bp paired-end DNA reads. Library preparation and sequencing were performed at the sequencing center of Peking University (BIOPIC sequencing platform). We obtained

25.0 Gb of data for wild type, 47.6 Gb of data for ms1e and 12.2 Gb of data for the heterozygous plants (SI Appendix Table S1). The reads were trimmed for quality control and to remove adapter sequences with Trimmomatic (3) and then aligned to the wheat genome (4) using TopHat2 (5) (parameter: --b2-mp 40). After filtering out repetitive sequences and reads that mapped to multiple regions of the genome, the clean reads were extracted with SAMtools (6) and then further processed using Perl scripts.
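The exact criteria used to discard multiply mapped reads are not given. As a minimal sketch, assuming alignments are available as SAM text lines, a read can be treated as uniquely mapped when it is mapped, is not a secondary alignment, carries no NH tag greater than 1 (the NH tag is written by TopHat2) and has a mapping quality at or above a threshold. The threshold of 20 and both function names are assumptions made for illustration.

```python
def is_unique_alignment(sam_line, min_mapq=20):
    """Return True if a SAM alignment record looks uniquely mapped."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    # 0x4 = read unmapped, 0x100 = secondary alignment
    if flag & 0x4 or flag & 0x100:
        return False
    # Optional tags start at column 12 and look like NAME:TYPE:VALUE
    tags = {}
    for tag in fields[11:]:
        name, _type, value = tag.split(":", 2)
        tags[name] = value
    # NH = number of reported alignments for this read, when present
    if int(tags.get("NH", "1")) > 1:
        return False
    return mapq >= min_mapq


def filter_unique(sam_lines, min_mapq=20):
    """Yield header lines unchanged and only uniquely mapped records."""
    for line in sam_lines:
        if line.startswith("@") or is_unique_alignment(line, min_mapq):
            yield line
```

Bowtie 2 does not write the NH tag, so for the resequencing alignments this sketch would rely on the mapping-quality threshold alone.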

To obtain resequencing data, DNA was extracted using plant DNA extraction kits

(Qiagen) from 10-day-old seedlings of segregated homozygous progeny of wild-type and ms1e plants. Paired-end libraries were prepared from 1 µg of DNA (mean insert size:

350 bp). The libraries were sequenced using the Illumina HiSeq 2500 system to

produce 150-bp paired-end reads. Library preparation and sequencing were performed

by Novogene Co. DNA resequencing generated 524.7 Gb of data for wild type and

522.1 Gb of data for ms1e (SI Appendix Table S3). The reads were trimmed for

quality control and to remove adapter sequences with Trimmomatic (3), and then

aligned to the available wheat genomes, including IWGSC (4), TGAC (7) and W7984

(8), using Bowtie 2 (9) (parameter: --mp 40). Sequence repeats and reads with

multiple mapping locations were filtered. The RNA-seq and resequencing data have

been deposited in the NCBI’s SRA database (accession no. SRP113349). All data will be publicly available after the publication of this work.

Identification of candidate SNPs between wild type and ms1e from the RNA-seq

and resequencing data by MutMap analysis. We applied the MutMap method (10)

to our RNA-seq and resequencing data to map Ms1. After filtering SNPs with low

read coverage (<6), index values for the SNPs were computed as follows: index =

Nmutant/(Nreference + Nmutant), where N represents the number of accumulated reads with

corresponding genotypes. The SNPs were mapped to the wheat chromosome (8), and

peaks with high indexMU/indexWT ratios were identified as candidate chromosomal

regions containing Ms1. To exclude bias due to index values caused by homologous

sequences outside the candidate chromosomal region, two steps were performed. First,

SNPs from homologous genes in the wheat genome were identified by comparing

sequences from the candidate region with the whole genome sequence using BLASTn,

and SNPs obtained from the BLAST analysis were filtered during index calculation.

Second, haplotypes for 200-bp regions around each candidate SNP were generated

using Haploview (11), and SNPs from the different haplotypes were removed so that

only the index ratios of the same haplotypes between wild type and ms1e were

calculated. When indexWT = 0, the indexMU/indexWT ratio was defined as 15 in the RNA-seq analysis and 30 in the resequencing analysis; these are therefore the highest ratios in the respective analyses. Because loci with indexMU/indexWT ratios lower than 2 in the RNA-seq analysis or lower than 5 in the resequencing analysis are unlikely to contain the candidate gene, only loci with ratios higher than 2 in the RNA-seq analysis and higher than 5 in the resequencing analysis are shown in Fig. 1F and G, respectively.
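The index calculation and ratio rule described above can be sketched as follows. The function names are hypothetical. The sketch assumes that read coverage at a SNP is the sum of reference and mutant reads, and it only applies the fixed value (15 or 30) when indexWT is 0, because the text does not say whether finite ratios were also clipped.

```python
def snp_index(n_reference, n_mutant, min_coverage=6):
    """index = N_mutant / (N_reference + N_mutant); None if coverage is below 6."""
    coverage = n_reference + n_mutant
    if coverage < min_coverage:
        return None  # low-coverage SNPs are filtered out
    return n_mutant / coverage


def index_ratio(index_mu, index_wt, ratio_when_wt_zero):
    """indexMU/indexWT, using the fixed value (15 or 30) when indexWT is 0."""
    if index_wt == 0:
        return ratio_when_wt_zero
    return index_mu / index_wt


def is_plotted(ratio, threshold):
    """Keep loci whose ratio exceeds the threshold (2 for RNA-seq, 5 for resequencing)."""
    return ratio > threshold
```

For instance, a SNP with 0 reference and 12 mutant reads in the mutant pool (index 1.0) and 9 reference and 3 mutant reads in the wild-type pool (index 0.25) has a ratio of 4.0, which would be shown in the RNA-seq plot (threshold 2) but not in the resequencing plot (threshold 5).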

Molecular cloning of Ms1

A traditional map-based cloning approach was adopted to clone Ms1 using SNPs

between wild-type and ms1e as markers. Using 112 male-sterile plants segregated

from ms1e heterozygotes, we initially mapped Ms1 to interval YZ5–YZ2 with the

SNP markers derived from our RNA-seq data. High-resolution markers were

developed using DNA-seq data; Ms1 was initially mapped between DYZ18 and YZ2,

and then to a 198-kb region between DYZ23 and DYZ19.
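The logic of narrowing the interval with recombinants can be illustrated with a short sketch. It assumes the genotype coding used in Fig. S2 (B: homozygous SNP; H: heterozygous SNP) and the rule that a male-sterile (ms1e/ms1e) plant must be homozygous at the causal locus, so any marker that is heterozygous in a sterile recombinant lies outside the candidate region. The marker and plant names in the usage note are hypothetical, and taking the longest all-B run is a simplification for illustration.

```python
def candidate_interval(markers, sterile_genotypes):
    """Return the (left, right) markers flanking the longest run of markers
    at which every male-sterile plant is homozygous ("B").

    markers: marker names in chromosomal order.
    sterile_genotypes: {plant: string of "B"/"H" calls, one per marker}.
    A flank is None when the run reaches the end of the marker list.
    """
    n = len(markers)
    all_b = [all(calls[i] == "B" for calls in sterile_genotypes.values())
             for i in range(n)]
    best_start, best_len, run_start = None, 0, None
    # A trailing False closes a run that extends to the last marker.
    for i, ok in enumerate(all_b + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    if best_start is None:
        return None
    end = best_start + best_len
    left = markers[best_start - 1] if best_start > 0 else None
    right = markers[end] if end < n else None
    return left, right
```

With hypothetical markers `["M1", "M2", "M3", "M4", "M5"]` and sterile recombinants `{"p1": "HHBBB", "p2": "BBBBH", "p3": "HBBBB"}`, the function returns `("M2", "M5")`: the gene lies between the closest recombinant markers on each side.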

Southern blotting. Genomic DNA was extracted from young leaves of T. aestivum L.

(Ms1/Ms1 and ms1g/ms1g), T. turgidum L. accession Langdon (AABB), T. urartu accession G1812 (AA) and Ae. tauschii accession AL8/78 (DD) by the cetyl

trimethylammonium bromide (CTAB) method. The concentration of the purified DNA

was quantified with a nucleic acid analyzer (NanoDrop 2000; Thermo Scientific).

Forty micrograms of each DNA sample was digested overnight at 37°C with HindIII

(Takara Bio Inc.), then purified and separated on a 0.8% agarose gel overnight at 4°C and 35 V. The separated genomic DNA was transferred to Amersham Hybond-N+

nylon membranes (GE Healthcare) and immobilized by UV crosslinking. The probe

DNA was labeled with digoxigenin according to the manufacturer’s guidelines (DIG

Probe Synthesis Kit; Roche); the primers for the probe are listed in SI Appendix Table

S10. A 469-bp fragment from the first intron of Ms1 was used as the probe for the result shown in the main text. Over this 469-bp probe sequence, Ms1 shares 71% identity with Ms-A1 and 76% identity with Ms-D1. The membranes were probed and then analyzed using a chemiluminescence kit (RPN2106; GE Healthcare).
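Percent identity values such as those quoted for the probe depend on how the alignment is scored. A minimal sketch, assuming two sequences that have already been aligned to equal length and counting identity over ungapped columns only (one of several conventions; the convention used for the figures above is not stated), is:

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical bases over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)
```

For the toy alignment `"ACGT-A"` versus `"ACCTTA"`, four of the five ungapped columns match, so the function returns 80.0.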

Sequence alignment and phylogenetic tree analysis. Sequences were aligned with

Clustal X 2.1 and a phylogenetic tree was constructed using Molecular Evolutionary

Genetics Analysis (MEGA) 5.2.1 software by the neighbour-joining method. A

bootstrap analysis of 1000 replicates was performed to provide confidence estimates

for the tree topologies. SI Appendix, Fig. S5 shows the amino acid sequences used for

tree construction.
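MEGA performs the bootstrap internally. Purely to illustrate what one of the 1000 replicates is, the sketch below resamples alignment columns with replacement to build a pseudo-alignment of the same length; a tree would then be built from each replicate and the support for a clade taken as the fraction of replicate trees that contain it. The function name and the toy sequences in the usage note are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: {name: aligned sequence}, all sequences of equal length.
    rng: a random.Random instance (pass a seeded one for reproducibility).
    Columns are sampled with replacement, and the same sampled columns
    are used for every sequence so that columns stay intact.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}
```

A call such as `bootstrap_alignment({"a": "ACGT", "b": "AGGT"}, random.Random(0))` returns two sequences of length 4 in which every column is one of the original columns.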


DNA methylation analysis. Genomic DNA was isolated from spikes at meiosis in each sample by the CTAB method. DNA samples (30 μg) were treated with proteinase K (AMRESCO Inc.) at 45°C for 1 h. Next, 1.8 μg of purified DNA was treated with an EZ DNA Methylation-Gold Kit (ZYMO Research). Nested PCRs were then performed with Ex Taq HS DNA Polymerase (Takara Bio Inc.). The products were purified using a HiPure Gel Pure DNA Mini Kit (Magen) and cloned into a pEASY-T1 Simple Cloning Vector (TransGen Biotech). For each amplicon, more than

24 clones were sequenced; the sequence data were analyzed using Kismeth software

(12). The primers used to amplify the promoter regions are listed in SI Appendix

Table S9.
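Kismeth carries out the methylation calling. The underlying principle can be sketched as follows, assuming each clone sequence has already been aligned to the unconverted reference on the same strand and has the same length: after bisulfite treatment an unmethylated cytosine reads as T, whereas a methylated cytosine is still read as C. The function names and the toy sequences in the usage note are hypothetical.

```python
def cytosine_context(reference, pos):
    """Classify the cytosine at pos as CG, CHG or CHH (H = A, C or T).

    Returns None when too few downstream bases are available to decide.
    """
    downstream = reference[pos + 1:pos + 3].upper()
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2:
        return None
    return "CHG" if downstream[1] == "G" else "CHH"


def methylation_levels(reference, clones):
    """Return {position: percent methylated} for each cytosine in reference.

    A clone base of C counts as methylated and T as unmethylated;
    any other base at that position is ignored.
    """
    levels = {}
    for pos, base in enumerate(reference.upper()):
        if base != "C":
            continue
        calls = [clone[pos].upper() for clone in clones
                 if clone[pos].upper() in ("C", "T")]
        if calls:
            levels[pos] = 100.0 * calls.count("C") / len(calls)
    return levels
```

With the toy reference `"ACGTC"` and clones `["ACGTT", "ATGTT", "ACGTC", "ATGTT"]`, the cytosine at position 1 (a CG site) is 50% methylated and the cytosine at position 4 is 25% methylated.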

Complementation of ms1. For functional complementation of ms1, Ms1 genomic

DNA (including 2,205 bp upstream of the ATG, the gene body region and 536 bp downstream of the TGA) was inserted into pAHC20 digested with HindIII to construct pAHC20-Ms1p::Ms1. Next, pAHC20-Ms1p::Ms1 was transformed into a callus induced from the immature embryo of a heterozygous ms1e plant via particle bombardment, and transgenic plants were selected and regenerated. Transgenic plants in an ms1e/ms1e background were identified by PCR amplification and sequencing.

To evaluate the function of Ms-A1, pAHC20-Ms1p::Ms-A1 was constructed by replacing the Ms1 gene body region with the Ms-A1 gene body region in pAHC20-Ms1p::Ms1. pAHC20-Ms1p::Ms-A1 was then transformed into a callus

induced from the immature embryo of a heterozygous ms1g plant via particle

bombardment, and transgenic plants were selected and regenerated. Transgenic plants

in the ms1g/ms1g background were identified by PCR amplification and sequencing.

The primers used to prepare the constructs are provided in SI Appendix Table S10.

Subcellular localization assay. The plasmids used in the subcellular localization

assay were constructed in p35S-GFP, a pUC-based expression vector that includes the

CaMV35S promoter, GFP reporter gene and polyA of rbcS. To create the

p35S::Ms1-GFP, p35S::Ms1SP-GFP, p35S::Ms1ΔTM-GFP or p35S::Ms1ΔSP-GFP construct, full-length or truncated Ms1 cDNA was amplified from a cDNA library prepared from wheat anthers and inserted into p35S-GFP via digestion with KpnI and

BamHI. The primer sequences used for PCR amplification are given in SI Appendix

Table S10.

The peroxisome marker PTS1-mCherry was created by adding the sequence encoding SKL to the 3' end of mCherry (13). The Golgi marker GmMan1-mCherry was created by adding the sequence encoding the first 49 amino acids of GmMan1 to the 5' end of mCherry as described previously (13). The mitochondria marker pFAγ-mCherry was created by adding the sequence encoding the first 57 amino acids of the Arabidopsis thaliana F1-ATPase γ-subunit (At2g33040) to the 5' end of mCherry (14). The plastid marker WxTP-mCherry was created by adding the sequence encoding the first 111 amino acid residues (including the transit peptide sequence of the rice waxy gene) to the 5' end of mCherry (15). The ER marker

SP-mCherry-HDEL was created by adding the sequence encoding the signal peptide

of AtWAK2 to the 5' end of mCherry and adding the sequence encoding HDEL to the

3' end of mCherry (13, 16). The primers used to amplify these marker genes are given

in SI Appendix Table S10.

To introduce plasmid DNA into onion epidermal cells, the particle bombardment

method was adopted using a helium-driven particle accelerator (PDS-1000/He;

Bio-Rad) according to the manufacturer’s recommendations. Three micrograms of

plasmid DNA (1 µg/µl) was mixed with 10 µl of a gold particle suspension (60 µg/µl; particle diameter: 0.6 µm), 10 µl of 2.5 mM CaCl2 and 4 µl of 0.1 M spermidine, and incubated for 30 min at room temperature. The plasmid-coated gold particles were rinsed sequentially with 70% and 100% ethanol and then gently suspended in 10 µl of 100% ethanol. The gold particles were bombarded twice into onion cells

using the particle delivery system with 1100 p.s.i. rupture discs. The bombarded onion

epidermal cells were cultured on MS medium at 25°C in darkness for 24 h.

To detect the GFP- or mCherry-tagged proteins, an LSM710 laser scanning

confocal microscope (Zeiss) was used. GFP was excited at 488 nm, and the

fluorescence emission was detected between 493 and 560 nm. mCherry was excited at 543 nm, and the fluorescence emission was detected between 585 and 680 nm. For the plasmolysis assay, bombarded epidermal cells were plasmolyzed by exposure to 30% sucrose for 20 min before confocal scanning with a 10×/0.3 numerical aperture objective. For the co-localization assay, bombarded epidermal cells were scanned with a 40×/1.0 numerical aperture water-immersion objective, a 63×/1.4 numerical aperture oil-immersion objective or a 100×/1.3 numerical aperture oil-immersion objective.

Preparation of Ms1-specific antibodies and immunoblotting. The peptide

CEPVVAAVDLGGGVP from Ms1 was used to raise rabbit polyclonal antibodies against Ms1 and the antisera were affinity-purified. Total proteins were extracted with

Plant Protein Extraction Buffer (50 mM Tris-Cl, pH 7.5, 150 mM NaCl, 10 mM

MgCl2, 1% NP-40, 1 mM PMSF and protease inhibitor cocktail).

The proteins were separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) then transferred to a polyvinylidene fluoride (PVDF) membrane (IPVH00010; Millipore), which was blocked with 5% (w/v) non-fat dried milk in phosphate-buffered saline (PBS) with 0.1% (v/v) Tween-20 (PBST) at room temperature for 1 h. Primary antibodies were added to the blocking solution at various dilutions as indicated below and incubation was continued overnight at 4°C. After being washed three times with PBST, the PVDF membrane was incubated with peroxidase-conjugated secondary antibodies for 1 h at room temperature followed by three washes with PBST and detection using a chemiluminescence kit (RPN2106; GE

Healthcare). The anti-Ms1 antibodies were diluted 1:600; monoclonal anti-MBP-tag antibodies (E8032; New England Biolabs) were diluted 1:2000; anti-tubulin antibodies were diluted 1:2000; peroxidase-conjugated goat anti-rabbit IgG antibodies

(A0545; Sigma) were diluted 1:10,000 and peroxidase-conjugated goat anti-mouse

IgG antibodies (A4416; Sigma) were diluted 1:10,000.


Expression and purification of MBP-Ms1-His fusion proteins in Escherichia coli.

To prepare the construct expressing MBP-Ms1-His, primers Ms1-His-F and

Ms1-His-R were used to amplify truncated Ms1 (lacking the signal peptide and

trans-membrane domain) fused with 6×His at its C-terminus. To prepare the construct expressing MBP-His, primers His-F and His-R were annealed. The sequences of the primers are given in SI Appendix Table S8. Next, the PCR fragment and the annealed product were each in-fused into pMal-c2x (New England Biolabs) digested with EcoRI and PstI.

All fusion proteins were expressed in E. coli strain BL21. An overnight preculture

of E. coli in LB medium (4 ml) was used to start a 200 ml culture in LB medium.

Protein expression was induced using 0.5 mM IPTG at an A600 of 0.6–0.8, and the

bacteria were allowed to grow at 16°C overnight. The cultures were then cooled to 4°C, and the cells were collected and resuspended in lysis buffer (50 mM NaH2PO4, pH 8.0, 300 mM NaCl and 10

mM imidazole). Next, the cells were lysed by sonication, followed by centrifugation

at 9,000 × g. The resulting supernatant was applied to a Ni-NTA agarose column

(30210; Qiagen). Proteins that bound nonspecifically to the column were washed off using lysis buffer containing 20 mM imidazole. The His-tagged protein was then eluted with lysis buffer containing 500 mM imidazole. The protein samples were concentrated and dialyzed into PBS buffer (10 mM NaH2PO4, 10 mM KH2PO4, 2.7

mM KCl and 137 mM NaCl, pH 7.4) using Amicon centrifugal filter devices

(UFC501024; Millipore).


Protein-lipid overlay assay. PIP lipid strips (P-6001) and membrane lipid strips

(P-6002) were purchased from Echelon Biosciences. A protein-lipid overlay assay was performed as described previously (17) with modifications. First, the lipid membranes were blocked in 3% (w/v) fatty acid-free ovalbumin (A5253; Sigma) in

PBS for 3 h at room temperature. Second, purified MBP-Ms1-His, MBP-Ms1N-His,

MBP-Ms1C-His, or MBP-His was added at a final concentration of 10 µg/ml and

incubated overnight at 4°C followed by three washes with PBST and three washes

with PBS. Third, lipid membranes were incubated with anti-MBP antibodies (E8032,

diluted 1:2,000; New England Biolabs) in blocking buffer for 3 h at room temperature

followed by three washes with PBST and three washes with PBS. Then, the lipid

membranes were incubated with peroxidase-conjugated goat anti-mouse IgG antibodies (A4416, diluted 1:10,000; Sigma)

in blocking buffer for 1 h at room temperature followed by three washes with PBST

and three washes with PBS. Finally, the membranes were processed for enhanced

chemiluminescence detection.

Fig. S1. Phenotypic characterization of ms1. (A) DAPI-stained microspores from Ms1 and ms1e. UM: unicellular microspore; BP: bicellular pollen; MP: mature

pollen. n = 100, bar = 50 μm. (B) Paraffin sections of anthers from Ms1 and ms1e

at different stages. E: epidermis; Ar: archesporial cell; Sp: sporogenous cell; En: endothecium; ML: middle layer; T: tapetum; MMC: microspore mother cell; MC:

meiotic cell; Tds: tetrads. Bar = 50 μm.

Fig. S2. Physical map of Ms1 using the SNP markers from the MutMap analysis. SNP markers and physical distances are shown for chromosome (Chr.) 4BS. Double-dotted lines indicate unavailable sequence (gaps) in the candidate region. Plants 18-19,

11-240, 11-262, 2-137 and 25-1 were male-sterile (ms1e/ms1e) recombinants. Plant

18-22* was a fertile (Ms1/ms1e) recombinant. B: homozygous SNP; H: heterozygous

SNP. TGACv1_scaffold_328661_4BS, TGACv1_scaffold_328309_4BS, 4BS-sc1246 and 4BS-sc963 are genomic DNA accession numbers from the IWGSC (4). The primer sequences for the mapping markers indicated above are included in SI

Appendix Table S2.

Fig. S3. Complementation of ms1e by transformation with Ms1. (A) Spikes from Ms1, ms1e and a transgenic line containing Ms1p::Ms1 in an ms1e background. Bar = 1 cm. (B) Phenotypes of floral organs from Ms1, ms1e and a transgenic line containing Ms1p::Ms1 in an ms1e background after removal of the palea and lemma. Bars = 1 mm. (C) Seeds of Ms1 and a transgenic line containing Ms1p::Ms1 in an ms1e background; no seed developed in ms1e. Bars = 1 mm. (D) Mature pollen grains stained with 1% I2-KI from Ms1, ms1e and a transgenic line containing Ms1p::Ms1 in an ms1e background. Bars = 200 µm.

Fig. S4. Sequence alignment of Ms1 and its orthologues in hexaploid wheat. The sequences are based on the genomic sequence of Chinese Spring (4).

Fig. S5. Sequence alignment of Ms1 and its orthologues in the family.

The proteins are named according to their species. These sequences were used to produce the phylogenetic tree shown in Fig. 2B.

Fig. S6. Ms1 expression in anthers of Ms1 and ms1g plants, and expression and purification of recombinant MBP-Ms1. (A) Ms1 was expressed in developing anthers during microspore meiosis. Meiosis: meiosis-stage anther; UM: unicellular microspore-stage anther; BP: bicellular pollen-stage anther; MP: mature pollen-stage anther. (B) No Ms1 was detected in ms1g. Total proteins were isolated from anthers of Ms1 (A and B) or ms1g (B) plants in microspore meiosis, followed by SDS-PAGE and immunoblot analysis using anti-Ms1 antibodies. Tubulin was used as a loading control. (C and D) MBP-Ms1-His and MBP-His (C) and MBP-Ms1N-His and MBP-Ms1C-His (D) were purified by affinity chromatography from bacteria expressing the corresponding constructs. M: marker.

Fig. S7. Complementation of ms1g by transformation with Ms-A1. (A) Spikes from Ms1, ms1g and a transgenic line containing Ms1p::Ms-A1 in an ms1g background. Bar = 1 cm. (B) Phenotypes of floral organs from Ms1, ms1g and a transgenic line containing Ms1p::Ms-A1 in an ms1g background after removal of the palea and lemma. Bars = 1 mm. (C) Seeds of Ms1 and a transgenic line containing Ms1p::Ms-A1 in an ms1g background; no seed developed in ms1g. Bars = 1 mm. (D) Mature pollen grains stained with 1% I2-KI from Ms1, ms1g and a transgenic line containing Ms1p::Ms-A1 in an ms1g background. Bars = 200 µm.

Fig. S8. The predicted structures of Ms1. (A) The signal peptide domain in Ms1 predicted by SignalP-4.1. C-score: raw cleavage site score; S-score: signal peptide score; Y-score: combined cleavage site score (18). (B) The predicted trans-membrane domain in Ms1. The trans-membrane domain was predicted using PRED-TMBB (http://bioinformatics.biol.uoa.gr/PRED-TMBB/), a web server that predicts the transmembrane strands and topology of β-barrel outer membrane proteins based on a hidden Markov model (19). (C) The structure of the eight-cysteine motif in

Ms1 from wheat and other representative members of the Poaceae. (D) Diagram of the structural domains in Ms1.

Fig. S9. Ms1 is not a secretory protein. Onion epidermal cells were transiently transformed with constructs encoding GFP and Ms1-GFP. Plasmolysis was induced by adding 30% sucrose prior to confocal scanning. The images were recorded using the GFP channel under a confocal microscope. Bar = 50 µm.

Table S1. Counts of paired-end reads from the RNA-seq data

Sample            Raw reads     Raw bases (Gbp)  Clean reads   Clean bases (Gbp)  Read length (bp)  GC content
Ms1/ms1e_lane6    30,219,193    6.1              27,106,438    5.4                101               52.6%
Ms1/ms1e_lane7    30,245,424    6.1              27,165,790    5.4                101               52.7%
ms1e/ms1e_lane6   62,097,423    12.5             55,599,082    11.1               101               52.7%
ms1e/ms1e_lane7   62,091,525    12.5             55,669,677    11.1               101               52.3%
Ms1/Ms1_lane6     117,864,027   23.8             105,372,896   21.1               101               52.7%
Ms1/Ms1_lane7     117,852,306   23.8             105,484,538   21.1               101               52.4%

Table S2. Primers used for the SNP markers

Primer name  cM position on Chr. 4B  Scaffold ID                  Sequences (5'→3')
YZ5          6.843                   TGACv1_scaffold_329667_4BS   CAGACGAAGTCGCCATCATCAATC, TTCCTTGTATATGAGCCAGGTCTG
DYZ12        6.843                   TGACv1_scaffold_328717_4BS   CTGTTCTTAGAACTCTTCTTGGTAG, GAGATCGAGGAACCAACATATAAGC
DYZ11        11.623                  TGACv1_scaffold_330253_4BS   TACCATCGTGCGAAGAGGGGAAC, ACGCTACCGAAAGAATCCTATCCAC
DYZ14        10.41                   IWGSC 4BS-sc127              TGGGAAATAATCGTGGAAACAGTTC, CGCACATGTCGGCGACTGAG
DYZ2         12.837                  TGACv1_scaffold_328808_4BS   TTTTTTGTTGTGTGGATTGATGACC, CGATGGTAAAATGGCTAAATTCTGG
DYZ10        12.837                  IWGSC 4BS-sc661              TCTGGAGGCAGCCCGGTAGCGAC, TTGTGCAGGTATTGGTGATTTGCGC
DYZ9         15.111                  IWGSC 4BS-sc2875             AAGTTTTGCGACAGCTTGAACTCTC, CTCGCGCTGTGAGTGTGCTTTCT
DYZ15        15.111                  IWGSC 4BS-sc2875             ATGCTCTTAGTACAAGTATTGTGCG, TCAACGAATAGAGAAGCTGCCATG
DYZ3         15.111                  TGACv1_scaffold_328625_4BS   CAAACAAAAAAAGTCACGGACATTA, TAGTTCGTCAAAACCTATCAACATG
DYZ13        15.111                  TGACv1_scaffold_328576_4BS   GTTAACCGTGATTGTTGTTCCTCCT, GGAAAGAAAAGATCAGCCCCTAGTG
DYZ4         18.525                  TGACv1_scaffold_328921_4BS   ATATGAAATGTCGTGTAATGGCAC, AACTCCTTCAACAAGATGACAACG
DYZ18        19.662                  TGACv1_scaffold_329888_4BS   AAAGCAAAACGAACTCATATCCAAT, TGGAGTTCTTGATGAACTAGCGACG
DYZ8         19.662                  IWGSC 4BS-sc2277             CGGAGAAGCAGAATGAAAAGTAAAC, CGATGGAGTTGACCTAAGGGACG
DYZ16        19.0935                 IWGSC 4BS-sc1948             AGGCTTCTCAAGATCAAGGGAATG, TGACTTTTCAAGAAGGCTGAAATTC
DYZ21        19.662                  TGACv1_scaffold_329160_4BS   CGAGATCCACCGGTTTTGACAC, CATGGCACATCGTTTACAATCAG
DYZ23        19.662                  TGACv1_scaffold_328661_4BS   TCGGTGAGTAATAGTTAAAGAAACGG, CAAGCGTGATCACAAGGTGTTGG
DYZ20        19.662                  TGACv1_scaffold_328661_4BS   CCGATCCGGTGCACATGTTAGTAAC, TCGTGAAAAGAGGTCGGGTCAAACC
DYZ19        19.662                  IWGSC 4BS-sc963              CTACTTCACCACCTCCTACAACTGC, CAAAAACCACATCAAGAGCAACCTT
YZ2          19.662                  TGACv1_scaffold_328309_4BS   AAGTACTCAAAGTGTACTCCCTCCC, AACCGCCGCGGTGTCCCTCT
DYZ7         19.662                  IWGSC 4BS-sc1383             ATGATCTCAAAGCTTGGATATATTC, CCCATCCACGAAGATATATTATTGG
DYZ6         21.936                  TGACv1_scaffold_330289_4BS   CACACTGTTCTCTCTATTGGTTTCC, ACCAATAAGGGCTAAAAGTTCCTCC
DYZ17        25.351                  IWGSC 4BS-sc495              TTGGAGATTAGATCCACGAAAATCC, CTCCCCACTGTGACCAGCCTTAG
DYZ5         19.0935                 TGACv1_scaffold_328611_4BS   TTGACTTTGTGCTATAAGTATATGC, AATCTGACTTGTATTGTGTTATTGC
YZ105        25.351                  TGACv1_scaffold_1126105_4BS  ATCTGAATTTGTGTTCGCTGCCAC, TTTCTTTGCGAATGGAAGTTAAAC
YZ8          44.689                  TGACv1_scaffold_4870643_4BS  CAGGTAACCACAAAATTGACATTCC, GGTAAATGAGGATATAAACCAAATTTTC

Table S3. Counts of paired-end reads from the resequencing data

Sample  Raw reads     Raw bases (Gbp)  Read length (bp)  GC content
Ms1     71,961,639    524.7            150               49%
Ms1     462,611,898                    150               48%
Ms1     419,280,872                    150               48%
Ms1     85,643,682                     150               48%
Ms1     396,226,584                    150               48%
Ms1     89,621,510                     150               48%
Ms1     113,668,901                    150               48%
Ms1     110,060,149                    150               48%
ms1e    36,727,284    522.1            150               50%
ms1e    417,756,344                    150               50%
ms1e    427,418,147                    150               50%
ms1e    429,752,054                    150               50%
ms1e    428,791,204                    150               50%

Raw bases (Gbp) are given once per sample as the total across all of its libraries.
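The raw-base totals in Table S3 can be reproduced from the read counts if each count is taken as a read pair, that is, two 150-bp mates per entry. The same relation holds for Table S1 with 101-bp reads. The sketch below shows the arithmetic; the function and variable names are hypothetical, and the counts are copied from Table S3.

```python
def raw_gigabases(read_pair_counts, read_length_bp):
    """Total raw bases in Gb, assuming each count is one read pair (two mates)."""
    return sum(read_pair_counts) * 2 * read_length_bp / 1e9


# Read counts per library, copied from Table S3
ms1_wild_type = [71_961_639, 462_611_898, 419_280_872, 85_643_682,
                 396_226_584, 89_621_510, 113_668_901, 110_060_149]
ms1e_mutant = [36_727_284, 417_756_344, 427_418_147, 429_752_054, 428_791_204]

wild_type_gb = round(raw_gigabases(ms1_wild_type, 150), 1)  # 524.7
mutant_gb = round(raw_gigabases(ms1e_mutant, 150), 1)       # 522.1
```

Both values agree with the totals reported in the table.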

Table S4. Nine candidate genes predicted within the 198-kb sequence between DYZ23 and DYZ19

Gene ID  TGAC location                              CS_NR gene location                                 UniProt ID              Annotation
C1       TGACv1_scaffold_328661_4BS: 20888-18679    Scaffolds_v3_chr4B|scaffold150014: 1375948-1378157  UniRef50_B7U6L5         Globulin 3B n=1 Tax=Triticum aestivum RepID=B7U6L5_WHEAT
C2       TGACv1_scaffold_328661_4BS: 30252-35330    Scaffolds_v3_chr4B|scaffold150014: 1365268-1361390  UniRef50_Q10IE2         Retrotransposon protein, putative, Ty1-copia subclass n=4 Tax=BEP clade RepID=Q10IE2_ORYSJ
C3       TGACv1_scaffold_328661_4BS: 38394-42366    Scaffolds_v3_chr4B|scaffold150014: 1358138-1353609  UniRef50_Q6ATL7         Putative polyprotein n=6 Tax=Oryza sativa RepID=Q6ATL7_ORYSJ
C4       TGACv1_scaffold_328661_4BS: 43063-45385    Scaffolds_v3_chr4B|scaffold150014: 1352840-1350518  UniRef50_B7U6L5         Globulin 3B n=1 Tax=Triticum aestivum RepID=B7U6L5_WHEAT
C5       TGACv1_scaffold_328661_4BS: 56052-66725    Scaffolds_v3_chr4B|scaffold150014: 1341757-1329881  UniRef50_M7YQ04         U3 small nucleolar RNA-associated protein 25 n=2 Tax=Pooideae RepID=M7YQ04_TRIUA
C6       TGACv1_scaffold_328661_4BS: 94254-92399    Scaffolds_v3_chr4B|scaffold150014: 1303196-1305051  UniRef50_F2D737         Predicted protein n=2 Tax=Triticeae RepID=F2D737_HORVD
C7       TGACv1_scaffold_328309_4BS: 34286-37738    Scaffolds_v3_chr4B|scaffold150014: 1254660-1251208  UniRef50_M7YSL0         Retrovirus-related Pol polyprotein from transposon TNT 1-94 n=1 Tax=Triticum urartu RepID=M7YSL0_TRIUA
C8       TGACv1_scaffold_328309_4BS: 65849-72658    Scaffolds_v3_chr4B|scaffold150014: 1223097-1216288  UniRef50_Q2QZQ1         Retrotransposon protein, putative, Ty3-gypsy subclass n=50 Tax=Oryza RepID=Q2QZQ1_ORYSJ
C9       TGACv1_scaffold_328309_4BS: 103288-108572  Scaffolds_v3_chr4B|scaffold150014: 1185658-1180374  UniRef50_UPI0003508A36  Uncharacterized protein LOC101783098, partial n=2 Tax=Setaria italica RepID=UPI0003508A36

Table S5. The fourteen ms1 alleles in bread wheat

Allele  Accession name  Mutagenesis type      Mutation type  Amino acid or protein change
ms1d.1  FS2             EMS mutagenesis       G329A          disruption of splice donor site for 1st intron
ms1d.2  NC41            EMS mutagenesis       G329A          disruption of splice donor site for 1st intron
ms1e    FS3             EMS mutagenesis       G1431A C1432   P124R and frameshift
ms1g    LZ              spontaneous mutation  Deletion△      None
ms1h    NC642           EMS mutagenesis       C1762T         stop after P190
ms1i    NC790           EMS mutagenesis       G1603A         disruption of splice acceptor site for 2nd intron
ms1j    NC791           EMS mutagenesis       C1775T         S195F
ms1k    NC28            EMS mutagenesis       G1397A         disruption of splice acceptor site for 1st intron
ms1l    NC130           EMS mutagenesis       C226T          stop after P75
ms1m    NC226           EMS mutagenesis       C1472T         stop after K134
ms1n    NC904           EMS mutagenesis       T164A          V55D
ms1o    NC955           EMS mutagenesis       G281A          C94Y
ms1p    NC318           EMS mutagenesis       G155A          C52Y
ms1q    NC110           EMS mutagenesis       C148T          stop after A49

The mutant site in each allele is based on the sequence in the variety from which the allele is generated.
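EMS predominantly induces G-to-A and C-to-T transitions, so the substitution labels in Table S5 can be checked against that expectation. The sketch below parses labels of the form reference base, position, new base (for example G329A) and lists alleles whose change is not one of the two typical transitions. The function names are hypothetical; the labels are copied from Table S5, using only the substitution part for ms1e and omitting the spontaneous deletion allele ms1g.

```python
import re

SUBSTITUTION = re.compile(r"^([ACGT])(\d+)([ACGT])$")
EMS_TYPICAL = {("G", "A"), ("C", "T")}  # canonical EMS-induced transitions


def parse_substitution(label):
    """Split a label such as 'G329A' into (reference base, position, new base)."""
    match = SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def is_typical_ems_change(label):
    """True if the substitution is a G-to-A or C-to-T transition."""
    ref, _pos, alt = parse_substitution(label)
    return (ref, alt) in EMS_TYPICAL


# Substitution labels copied from Table S5
substitutions = {
    "ms1d.1": "G329A", "ms1d.2": "G329A", "ms1e": "G1431A",
    "ms1h": "C1762T", "ms1i": "G1603A", "ms1j": "C1775T",
    "ms1k": "G1397A", "ms1l": "C226T", "ms1m": "C1472T",
    "ms1n": "T164A", "ms1o": "G281A", "ms1p": "G155A", "ms1q": "C148T",
}
atypical = [name for name, label in substitutions.items()
            if not is_typical_ems_change(label)]
```

Here `atypical` contains only ms1n (T164A); all other substitution alleles are G-to-A or C-to-T changes.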

Table S6. Ms1 homologs in other species

Gene ID or name                Species                    Contig information
Ms1                            Triticum aestivum          TGACv1_scaffold_328661_4BS 94254–92399
TaMs-A1                        Triticum aestivum          TGACv1_scaffold_290346_4AL 29315–27459
TaMs-D1                        Triticum aestivum          TGACv1_scaffold_361174_4DS 69146–71038
TtMS-A1*                       Triticum turgidum          N/A
TtMS-B1*                       Triticum turgidum          N/A
TuMS1                          Triticum urartu            TGAC_WGS_urartu_v1_contig_181471:4268–2767
AetMS1                         Aegilops tauschii          ctg7180000362290, whole-genome shotgun sequence (113193–115077)
LOC_Os03g46110                 Oryza sativa               Chr. 3: 26076622–26073641
ONIVA03G29620                  Oryza nivara               Chr. 3: 25338122–25340561
OPUNC03G26010                  Oryza punctata             Chr. 3: 28367645–28369999
LpMS1                          Leersia perrieri           scaffold_3_115 23556–20677
Phyllostachys heterocycla MS1  Phyllostachys heterocycla  PH01001922 245407–242952
Zoysia japonica MS1            Zoysia japonica            Contig: Zjn_sc00058.1, cultivar: Nagirizaki, 989555–991578
Zoysia pacifica MS1            Zoysia pacifica            Contig: Zpz_sc01372.1, cultivar: Zanpa, 22268–24309
Bradi1g13030                   Brachypodium distachyon    Chr. 1: 9858880–9861102
Brast02G255500                 Brachypodium stacei        Chr. 2: 18898450–18901089
Pavir.Ga01748                  Panicum virgatum           Chr. 7a: 21869965–21872338
Pavir.Gb01613                  Panicum virgatum           Chr. 7b: 21324041–21326309
Lophatherum gracile MS1†       Lophatherum gracile        N/A
Phragmites australis MS1†      Phragmites australis       N/A
Sb06g017510                    Sorghum bicolor            Chr. 6: 46903482–46905662
Hordeum vulgare MS1            Hordeum vulgare            Contig: HVVMRXALLeA0005J15: 14967–13033
Sevir.7G115900                 Setaria viridis            Chr. 7: 19782572–19784377
Si012756m.g                    Setaria italica            Chr. 7: 20903438–20903770
GRMZM2G151021                  Zea mays                   Chr. 2: 49392293–49394518
GRMZM2G166484                  Zea mays                   Chr. 10: 119825127–119826659

N/A, not applicable.
* TtMS-B1 and TtMS-A1 were cloned using primers from Ms1 and TaMs-A1 and their sequences were confirmed by sequencing.
† Lophatherum gracile and Phragmites australis MS1 were cloned by PCR using degenerate primers and genome walking.
The primers used to amplify Ms1 orthologues are listed in SI Appendix, Table S9.

Table S7. Alignment of the 5' portion of Ms1 with sequences from angiosperm lineages

Ms1-5'_portion        ------ATGGAGAGATCCCGCGGGCTGCTGCTGGTGGCGGG 35
Aco005198.1           ATGGATCCCACCCTCTCCCTCTTCCTCCTCGCTGCGGCGGCGCTCGCCGGCGCCGGCGCCGCCGCAGTGGCGGAGCCAGCGCCGTCGAGTTGCGCG 96
Cucsa.239340.1        ------ATGGC 5
Potri.001G119000.3    ------ATGGCTTCTTCTCTCAAGATTTC 23
At1g05450.2           ------ATGAACTCCAATAGTTTCTTAATCTCAGCAGCCTTAATCTTCTCTCTACTATCATC 56
PGSC0003DMT400032496  ------ATGGC 5
DCAR_004911           ------ATGGTGGGTGTGGCGGTGGC 20

Ms1-5'_portion        GCTGCTGGCGGCGCTGCTGCC-GGCGGCGGCGGCGCAGCCGGGGGCGCCGTG-CGAGCCCGCGCTGCTGGCGA-CGCAG--GTGGCGCTCTTCTGC 126
Aco005198.1           GAGGAGATCGTCGGGATCTCC-GCTTGCCTCCCCCTCGTCGTGGCGGCGACG-CCGATCACCGCCGCCGCCAA-CGCCACCGCGGCGGCGGCGGCG 189
Cucsa.239340.1        GGTGGTTGCGATGTCGCCGCCCACGGGATGCACCACTAGAGAGC-TGCTTTT-GCTCTCTCCATGTCTGCCTTTCATTTCTGCTCCGCCAAACAAT 99
Potri.001G119000.3    TATTCTGGCGATGATGGTTGTAGTTTTTTTTTCGAGCGCGACAA-CCTTAAC-GAGAGCA-CAAGACCAGTCT-ACTTCTTGTGCATCTAAGTTA- 114
At1g05450.2           AAATTCTCCAACATCGATTCTTGCTCAAATCAATA-CACCATGTTCACCATCTATGCTCTCTAGCGTTACAGGTTGCACGAGTTTTCTAACGGGAG 151
PGSC0003DMT400032496  GTTGACGGCGGCGATAATTGC-GTCAGATGCGCAA--ACAACGC-CGCCGTC-GTGTGCCTCGAAATTAGTGC-CATGT--GCGCCTTACCTTAAC 93
DCAR_004911           GTTGTTGGTGGTTATGGCAGT-GATGACGGCGGAAGGACAGGATATTCCGTC-GTGTGCATCGGGACTGGTGC-CATGC--GCGGATTATTTGAAT 111

Ms1-5'_portion GCGCCCGACATGCCGACGGCCCAGTGCTGCGAGCCCGTCGTCGCCGCCGTCGACCTCGGCGGCGGGGTGCCCTGCCTCTGCCGCGTCGCCGCCGAG 222 Aco005198.1 GCGGAGGCGGCGCCATCCGACGCGTGCTGCGACGCGTTCCTCCGTGGCCTCG---TCGGCGGTGGCGCCGCGTGCCTCTGCCACCTCTTACGGGAC 282 Cucsa.239340.1 CTTTCCGATACGGTTCCTTCTGAGTGCTGTGATGCGTTCTCCTCCGCTTACAG---TGCCGGCGGAGGGATTTGCCTTTGTTATTTTCTTCGTGAG 192 Potri.001G119000.3 GTACCATGTCAACCACCAGACAGCTGCTGCAACTCCATCAAAGAAGCGGTTG----CAA--ATGAGCTTCCTTGTCTTTGCAAACTCTATAACGAC 204 At1g05450.2 GTGGT-AGTTTTCCGACCTCAGATTGTTGTGGGGCTCTTA----AATCGTTAA--CCGGAACCGGTATGGACTGTTTGTGT----CTGATAGTAAC 236 PGSC0003DMT400032496 GCATCGAGT---CCCCCTGCGGAGTGTTGTGATCCATTGAGAGAAGCAATAA----CAA--ATGATTTAGATTGTTTGTGTAAATTGTATGAAAAT 180 DCAR_004911 GCAACCAGTAAGCCGCCGGCTTCGTGTTGTGATCCGATCAAGGAAGCTGTTA----CGA--AACAGCTTCCGTGTTTGTGTAATCTTTATAATACT 201

Ms1-5'_portion C-CGCAGCTCGTCATGG---CGGGCCTCAACGCCACCCACCTCCTCACGCTCTACAGCTCCTGCGGCGGCCTCCGCCCCGGCGG------CGCCC- 307 Aco005198.1 C-CGCTCCTCCTAGGGT---TTCCGATTAACACCTCCCGCATCGCCTCGCTCTTCTCCTCCTGCGGAGCTCCGAACCCTAGCGA------CTCCGC 368 Cucsa.239340.1 C-CTCAGATTTTAGGCT---TTCCGTTGAATCGAACGAAGTTCATGGCTCTGTCTTCGTTTTGTCCTCTTAATGGTGAAAACGGA--ATATATTTG 282 Potri.001G119000.3 C-CCAATTTGTTTCAGAGTTTGGGTATAAATGTCACTCAGGCTGTCATGCTCAGCCAGAGATGCGGTGTCACCACTAATCTCAC------TAGTTG 293 At1g05450.2 CGCAGGTGTTCCAATCAGTATTCCTATAAACCGAACTTTAGCCATCTCTCTCCCTCGTGCATGTGGCATTCCTGGTGTCCCCGTTCAATGCAAAGC 332 PGSC0003DMT400032496 C-CAACTTTGTTGCCTTCACTTGGTATTAATATTACTCAAGCACTTGCACTTCCTAAGGCTTGTAATATTTCTGGTGATCTTAA------TGCTTG 269 DCAR_004911 C-CTGGCTTGTTGAAGTCTTTTGGGATTAATGTTACTCAAGCTGTGAGGCTTCCTACTTTGTGTGGTGTTCCTGGTGATCTCTG------TCAGGG 290

Ms1-5'_portion -ACCTC-GCCGCCGCCTGC------324 Aco005198.1 GGACTC-GTCGTTCTCCGAGATGTGCAATGAATCCCAGTCGTTGCCACCGTTCCGAAGCATCACATGGAATGATACAAGCATACCAAGCCCAGCGA 463 Cucsa.239340.1 GAGAAGAATAGTTCTCTGGACTC--GGTTTGTGCTGCTTCACAAACTCTGCCTCCTCTTCAAAGCTCGAGGATTCCAAGAATCCAAGAGCCGGATA 376 Potri.001G119000.3 CAGCGC-TTCAGCTCCAACGCCAGCTGGTTCAG-CAGTTCCTGGAAACGATGGAGATAATGGTGGTAGCAGGATGTCATTGTCGACTGGACTTTCA 387 At1g05450.2 TTCTGCAGCACCTCTCCCTACTCCAGGTCCTGCGTCTTTCGGTCCGACCACTTCTCCTACAGATTCGCAAACTTCTGATCCTGAAGGGTCTGCTTC 428 PGSC0003DMT400032496 TACTACAGGTGGTGCTCCAGGTC-CAAGTTCTGAAGGCTTGCCACCCCCAGGTAACTAA------327 DCAR_004911 TAATTCCTCTCTCTCCCCCTCTCTCTATCTCTCTCTCCCCCCTCTCTCTATCTCTCTCTCTCCCTCTGTCTCGCTCTCTCCCTCCCTCTCCCTCCC 386

Ms1-5'_portion ------Aco005198.1 GTTCTG---AAGAAACCATTAATCCGTCCTCTGCAAATCTGACCCGACCTGCCCCCGTTCCGATCTGCCCCCCTGTAGCATGTCCGAAACCGTCGG 556 Cucsa.239340.1 GTCCTGCTGATGAGAACATAGAAACTCCCGACGTGGGTTTACCACCAAATGCAATTGTATCGCCCTCTGCACCTGCAGAAAAACCGCAGCCGCCTC 472 Potri.001G119000.3 GGCTTG---CTCGTATTATTGGTCGCGTCTCTCCTGCATTAG------426 At1g05450.2 TTTCCGTCCGCCCACTTCTCCGACAACTTCGCAAACTCCTAATGACAAGGATCTCAGCGGATCGGGCAACGGAGGAGATCCAATGGGGTTTGCTCC 524 PGSC0003DMT400032496 ------DCAR_004911 TCTCCC---CCCTCTCTCTCAATTAGGCATATTGATAAAATCTCTCCTCTCTCTCTCTCTCCCTCTGTCTCTCTCTTTCTCGCTCTCTCTCCCCCC 479

Ms1-5'_portion ------Aco005198.1 AGCCCGACCTCGCACCACAACCACGCCCGGATGCGTCAGCCCGATCGCTGCTGGCCAGTGCAATGTCGCTCATTTTTGCTGTCCTCGCATTCTTTA 652 Cucsa.239340.1 CGTCATCTGCTACAGCTGAACGTTTTTTATTGGCAAGAAAATGTATTGGTTTGTTCTTTTCAGGTCCACTCTTCCTTATTCACATTTTGTGA---- 564 Potri.001G119000.3 ------At1g05450.2 ACCTCCACCCTCGTCGTCGCCTTCCTCTTCGCACTCTCTCAAGCTTTCGTATCTTCTATTTGCTTTCGCCTTTACGATTATCAAATTCATCTAA-- 618 PGSC0003DMT400032496 ------DCAR_004911 CCTCTCTCCCAATTAG------495

Ms1-5'_portion ------Aco005198.1 TTTTCGAAGCGATGAGTCGTGCAGCCGACTAA 684 Cucsa.239340.1 ------Potri.001G119000.3 ------At1g05450.2 ------PGSC0003DMT400032496 ------DCAR_004911 ------

Aco005198.1 (JGI), Ananas comosus, Bromeliaceae, Poales
Cucsa.239340.1 (JGI), Cucumis sativus, Cucurbitaceae, Cucurbitales
Potri.003G113900.3 (JGI), Populus trichocarpa, Salicaceae, Malpighiales
At1g05450.2 (Araport11), Arabidopsis thaliana, Brassicaceae, Brassicales
PGSC0003DMT400032496 (JGI), Solanum tuberosum, Solanaceae, Solanales
DCAR_004911 (JGI), Daucus carota, Apiaceae, Apiales

Table S8. Sequence alignment of the 3' portion of Ms1 across angiosperm lineages

Ms1-3'_portion ------ERN03927.1 ATGGTTCCAAGCATTAGAGAACCCATTTGCTCGGCCATTATTTGGTGGTTGCTACTGGTTCTTACCGGGGGTTTTTGTGGGATAAGCGGCCATGG 95 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ------PGSC0003DMG400010093 ------DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 CTTCTCTCGCTCTGAAGCTTCATCCATAGAGCTCAGTCATGGCCATGGCCATGGCTTCTCTGCAAGTTTACACGAAGCTCTTGTTGATGG-ACCG 189 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ------ATGAAGAAGACGATTCAAATCCTCCT--CTTCTTCTTCTTCCTCATCAATCTCACCA 55 PGSC0003DMG400010093 ------DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 GAGCTCGAACCAGGCTTTGGATTCAGAAGCCATGGCTTGACCTTTGCCGAGGCTTCATCCATCAAGCACAGGCAGCTCCTTTACTACATCGATAG 284 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ACGCTCTC-TCAATCTCC---TCTGACGGCGGCGTTCTCTCCGATAACGAAGTCCGTCACATTCAACGCCGTCAATTACTCGAATTCGCCGA--- 143 PGSC0003DMG400010093 ------DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 GTATGGAGACAGAGGAGAGAGTGTATTCGTAGATCCGAGCTTCGAGTTCGAGAATTCGAGGCTCCGAGAAGCTTACATTGCTCTTCAGGCATGGA 379 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ------AC-GAAGCGTCAAAATCACCGTTGATCCTTCTCTAAACTTCGAGAATCCGAGATTGCGAAATGCTTATATAGCTCTACAAGCTTGGA 229 PGSC0003DMG400010093 ------DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 AAATGGCCATTATTTCAGATCCCATGAACATCACAGGTAACTGGATTGGCCCTATTGTCTGCAACTACACTGGAGTCTTCTGCTCAAAGAGCTTA 474 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 AACAAGCGATTCTCTCTGATCCAAACAATTTCACTTCGAATTGGATCGGATCCAATGTCTGTAACTACACCGGAGTTTTCTGTTCTCCGGCGCTT 324 PGSC0003DMG400010093 ------DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 GATGGCTTGAACCTCACTGTAGTGGCCGGAATCGACTTGAACCACTCTGATATCGCCGGTTATTTGCCTGACGAGCTCGGAAAACTCACCGATCT 569 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 GATAATCGGAAGATTCGTACCGTCGCCGGAATCGATCTCAATCACGCAGATATCGCTGGTTATTTACCTGAAGAGCTTGGTTTGTTATCAGATCT 419 PGSC0003DMG400010093 ------DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 AGCTCTTCTTCACCTGAACTCAAATCGATTTTGCGGCACCATTCCGCACACTTTCAAGAGATTGAAGCTCCTCTACGAGCTTGATCTCAGCAATA 664 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 TGCTTTGTTTCATGTTAATTCAAACCGGTTTTGTGGTACTGTACCACACCGGTTTAACCGGCTTAAGCTTTTATTCGAGCTTGATCTTAGTAACA 514 PGSC0003DMG400010093 ------DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 ATCGATTTGCTGGTCGATTCCCCACTGTAGTTCTCAAACTACCTACCCTGAAATACTTAGATCTCAGGTACAATGAGTTCGAAGGCCCCGTTCCT 759 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ACCGGTTCGCTGGGAAGTTTCCGACGGTTGTCTTGCAATTACCGTCGTTGAAGTTTTTAGATCTCCGGTTTAATGAATTTGAAGGAACTGTACCG 609 PGSC0003DMG400010093 ------DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 TCGTCTCTCTTCAATCGGCCATTAGATGCCATTTTCCTCAACCACAATCGATTCCATTTTGAAATTCCAGAGAACTTTGGGAACTCTCCTGTCTC 854 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 AAAGAGCTTTTTAGTAAAGATCTTGACGCGATTTTCATAAACCATAACCGGTTCCGGTTTGAATTACCGGAGAATTTTGGTGATTCGCCGGTTTC 704 PGSC0003DMG400010093 ------DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 TGTGGTGGTTCTGGCAAACAACAGATTCAGGGGCTGCATTCCTTCGAGCTTGGCAAAAATGGCTCCCACATTGAATGAAATCATTATTATGAATA 949 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 GGTTATTGTTTTGGCGAATAACCGGTTCCATGGTTGTGTACCATCGAGCTTGGTGGAGATGAAG---AATCTTAACGAGATCATCTTCATGAACA 796 PGSC0003DMG400010093 ------DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 ATGGATTATCTGGTTGTGTACCAGAGGAGTTTGGAGCTCTGAAAAATCTAACTGTTTTGGATGTGAGCTTCAACAAATTGGTGGGGAATTTGCCT 1044 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ATGGTCTTAATTCTTGTTTACCGTCTGATATCGGACGGTTAAAGAACGTGACGGTGTTTGACGTCAGTTTTAATGAACTTGTTGGGCCGTTACCG 891 PGSC0003DMG400010093 ------DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 CTAAATTTAGGTGGACTTGTCTCTTTGGAACAGTTGAATGTTGCTCACAACATGCTTTCAGGTCAGATTCCTCCTCAAATTTGCAGTCTTCCAAA 1139 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 GAGAGTGTTGGTGAGATGGTTTCGGTGGAGCAGCTTAATGTGGCGCATAATATGTTGTCGGGGAAGATTCCGGCGAGTATTTGTCAGTTACCGAA 986 PGSC0003DMG400010093 ------ATGGCCTCTCCAGATTCTCCCCCATCTTTTTTTCCATTCCCAAC 44 DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 TTTAGACAACTTTACTTTCTCTTATAACTTCTTTGAGGGAGAGCCCCCTGTGTGCTTGAGGCTCCCAAGCTTTGATGATAGGAGG---AATTGCA 1231 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 GCTTGAGAATTTCACTTATAGTTACAATTTCTTTACCGGAGAAGCGCCTGTGTGTTTGAGGTTGCCGGAGTTTGATGATCGGAGA---AATTGTT 1078 PGSC0003DMG400010093 TTTCCCCACAGCTACACCTTCAAATTCTACTTCTGATTCACCTCCTGCTCCTCCTCCTGACTCTTCATCTCCTCCTCCTCCACCACCTGACTCTT 139 DCAR_026603 ------

Ms1-3'_portion ------GGCGCCC 7 ERN03927.1 TTAATGGAAGGCCAAAACAGAGGCCAAGGAAACAATGTAAATCTTTTCTCTCTCATAGA---GTGGATTGTGGTAGTGTCAAGTGTGGTCGAGTT 1323 Aco000634.1 ------ATGGAAACCCTACCCTCGAGCTACAAGGAGAACGACGAGGACGAGGAGCCAATGTGGAGCGACGCCC 67 XM_008456796.2 ------ATGATGGATCCTCGAGCTTTTTTGTTGTGTTTCACTTTCATCTCCATTGCTT 52 ABK95485.1 ------ATGATGCTAAAAAAAGCTGTCATCCT---TCTTTCTTTGATCTGCATTTCGA 49 AT4G13340.1 TGCCGGGAAGACCTGCTCAGAGGTCTCCAGGGCAATGTAAAGCGTTTTTGTCTCGTCCGCCGGTAAATTGTGGATCGTTTAGTTGTGGCCGTTCT 1173 PGSC0003DMG400010093 CAGCTCCACCTCCTTCTCCACCAGCTCCAGATTCTCCTCCTCCGTCAGAATCAAAATCACCTCCACCGGCAGAATCTCCTCCACCTCCACCTCCT 234 DCAR_026603 ------ATGTTTTTTCTGAATTATCAGTGGCCACCAGCTCCTCC-----TCCAAAACCA 48

Ms1-3'_portion ---ACCTCGCCGCCGCCTGCGAAGGACCCGCTCCCCCGGCCGCCGTCGTCAGCAGCCCCCCGCCCC------CGCCTCCACCGTCCGCCGCACC 92 ERN03927.1 TCGCCATCTCCTCCACCGGTATTTTCGCCCCCGCCACCTCTGGTTTTATCACCCCCATCGCCACCAGTTTCATTGCCCCCTCCGGTGCCAATTTC 1418 Aco000634.1 TCGACCCCATCCCATCTCCTCCTTCTCCTCCTCCCTCCGCCGCCGCCGCCGCCGCCGCCGCACACT------CTCCTCCGCCGCCGCCTCCGCC 155 XM_008456796.2 TCGCCATCGCCGGAGC----TCAGTCTCCCTCCAGTCCTCCCACCGCCACCCCTTCTCCTCCCACCACCTCCGCCCCTCCTCCCGCCTCTACTCC 143 ABK95485.1 TTGCTGGTGTTTCTGG----TCAAGCACCAGCAACGTCACCAACAGCAGCACCAGCACCACCCACA------CCAACTTCTTCTCC 125 AT4G13340.1 GTGTCGCCTCGTCCTCCGGTTGTAACGCCGTTACCACCGCCTTCTTTGCCATCTCC------GCCTCCACCTGCGCCAATTTT 1250 PGSC0003DMG400010093 ACAGCAGCGGCGCCACCACCTTCGGCGCCTCCTCCAAAGCCGTCAGTGTCACCACCACCACCTTCTCCTAAGGCTCCTCCACCAGCTAATTCTCC 329 DCAR_026603 GCGCCGCCACCAAAGCCTGAACCACCTCCTGCACCTCCTCCTCCTCCGCCACCTAA------GCCGCCTCCTGCGCCTGCACC 125

Ms1-3'_portion T------CGCCGCAAGCAGCCAGCGCACGAC-GCACCACCGCCGCCACCGCCGTCGAGCG------AGAAGCCGTCGTCCCCGCCGCCGTCCCAG 174 ERN03927.1 TTCG-CCCCCACCTCCGCCAATTTCATTGCCACCACCACAGCCAATTTTGTCGCCACCGCCGCCACCAATTTTGTCGCCACCACCGGCTCCGCTA 1512 Aco000634.1 GGAGGAACACCCCGAACCCCAATCCAACGGCCACGCCTACCCAGCCGGCGTCGACGATCCTCCCCCGCCGTCCGATGACGACGCCGACGACCCCG 250 XM_008456796.2 TCCC-CCTGTTTCATCTCCCCCTCCAGCAGCAA------CTCCCCCTCCAGCTGCTACTCCTCCTCCAGCATCCCCACCACCGGCGTCTCCACCT 231 ABK95485.1 ACCG-CCAGCAACCACTCCTCCACCAGTTTCAG------CCCCACCTCC---TGTTACCCAATCTCCA------CCTCCAGCTACCCCTCCTCCA 204 AT4G13340.1 CTCA-ACACCTCCTACGCTTACTTCCCCACCACCTCCGTCACCGCCTCCGCCTGTTTATTCTCCCCCTCCTCCACCGCCACCACCTCCTCCGGTA 1344 PGSC0003DMG400010093 CCCT-CCAGCTTCATCTCCACCCCCACCATCAAAAGATTCTCCTCCTCCTGCTCCTCCTCCTTCCCCACCTCCTCCCCCTCCAGCAGTTTCACCT 423 DCAR_026603 AGCT-CCGAAGCCTACTCCAAATCCAGCACCTGCACCTCCACCAACACCACCGCCCCCTCCTCCACCGAATCCA---CCACCTCCTCCTCCACCT 216

Ms1-3'_portion --GACCACGACGGCGCCGCCCCCCGCGCCAAGGCCGCGCCCGCCCAGGCGGCCACCTCCACGCTCGCGCCCGCCGCCGCCGCCACCGCCCCGCCG 267 ERN03927.1 TATTCACCACCTCCGCCATCTCTGCCACCTCCTCTATATTCACCACCACCACCTTCGCCTCCCTCGCCATCTCCACCACCATCTCCTCTATACTC 1607 Aco000634.1 ACGACGACGACGACGACGACGACTCCGCCCCGGCGAGGAAGAAGCAGAAGCCCCTCTCCGCCTTCGCCTCCGCCGCCGCCGCCGCAGCCCCTCCT 345 XM_008456796.2 CCCGCATCCCCACCACCAGCGACTCCACCTCCGGCTTCCCCACCACCGGCATCTCCTCCTCCGGCCTCCCCACCACCGG---CTTCTCCTCCTCC 323 ABK95485.1 GTTTCAGCCCCACCACCTGCCACCCCTCCTCCCGCAACCCCACCACCAGCAACTCCTCCTCCCGCCACCCCACCACCAG---CAACCCCACCACC 296 AT4G13340.1 TATTCTCCTCCACCACCACCGCCCCCACCGCCTC------CTCCGCCAGTATATTCTCCTCCACCACCACCACCGCCCCCACCGCCTCC---TCC 1430 PGSC0003DMG400010093 TCTCCTCCACCACCAGTGAAAAATCAACCACCACCACCTGATTCTCCACCTCCTGCACCTGTTGCAAATCCGCCACAAAACTCCCCTCCACCTCC 518 DCAR_026603 AGTCCACCTCCAGCACCCCCGCCTGCGCCTGCTCCCCCTCCAAAGCCAGCTCCTGCACCACCACCAGCTCCACCTCCCCCACCAAGTCCCCCTCC 311

Ms1-3'_portion CCCCAGGCGCCGCACTCCGCCGCGCCCACGGCGCCGTCCAAGGCGGCCTTCTTCTTCGTCGCCACG---GCCATGCTCGGCCTCTACATCATCCT 359 ERN03927.1 GCCGCCTCCGCCTCCGCCTCCGCCTCCGCCACCTTCCCCTCCGCCACCTCCCCCTCCCCCTCCACCTCCACCTCCACCTCCACCATCCCCCCCAC 1702 Aco000634.1 CTCCCCGCTCCTCCCTCCGCCTCCGCCTCGGCCTCGAAGAAGCCGAAGAAGAAGAGCAACAACGTGTGGACCAAGTCCACCTCCCGCAAGGGCAA 440 XM_008456796.2 GGCATCTCCTCCACCAGCATCCCCTCCCCCAGCGATTCCACCGCCTGCACCATTGGCATCACCACCAACGGCAGTGCCAGCTCCTGCACCGAGCA 418 ABK95485.1 TGCTACTCCTCCACCAGCAACCCCTCCTCCCGCTGTTCCTCCACCAGCTCCATTGGCAGCTCCACCAGCTCTTGTTCCAGCTCCAGCTCCCAGCA 391 AT4G13340.1 GCCAGTATATTCTCCGCCACCACCATCGCCGCCTCCACCGCCTCCGCCAGTCTACTCTCCCCCACCACCACCACCGCCTCCACC---TCCTCCGC 1522 PGSC0003DMG400010093 TGCATTGGCTCCCCCTCCAGCCTCACTGCCATCTGCCCCTCCACCTAACCTCTTAACATCTCCACCCCCTTCTATTTCACCTCCTGCTCCCCCAA 613 DCAR_026603 TGCACCACCTCCGGCTCCTCCACCAAGTCCACCACCTGCTCCTGCCCCTCCACCTAATCCTCCACCACCTCCTGCACCTCCACCAAGTCCTCCAC 406

Ms1-3'_portion CTGA------363 ERN03927.1 CACCTTCGCCACCCCCGCCATCCCCACCACCACCTTCACCATCTCCCCCACCCCCTTCGCCACCCCCACCA-TCTCCCGCACCACCTTCGCCACC 1796 Aco000634.1 GAAGAAGTCCAAGCCCTCCCCCCACGCCCCCGCCCCGGAGGACACCGTCCTCATGACCCCGATGCCCCGCGGCTTCCCTGACCGCTCCGACGACT 535 XM_008456796.2 AGAAGAAGGTGAAGGCAGCAGCTCCGGGTCCGGCTCCAGTTTCGAGCCCGCC---AGCGCCGTCAGTGGAGGCTCCAGGACCTGCAGGCCCTGAT 510 ABK95485.1 AGCCTAAGTTGAAGTCTCCAGCTCCATCTC---CCCTGGCATTGAGTCCTCC---ATCTCCACCAACTGGCGCTCCTGCTCCAAGTTTGGGTGCT 480 AT4G13340.1 CGGTATACTCTCCTCCGCCTCCGCCAGTATACTCTTCTCCACCTCCTCCGCCTTCTCCAGCACCAACTCCAGTTTATTGCACC-CGTCCACCACC 1616 PGSC0003DMG400010093 ATAATACTTCTCCAGCTGGAGCTCCCCCTCCATTACCTGTGACTCGCCTTCCTACAGAGAAGCCCACTGCTATCCCTAAACCTGCTATCACTGCA 708 DCAR_026603 CACCTCCAGCTCCTCCACCGAGTCCACCACCACCTCCTG---CCCCTCCACCGAGTCCACCACCACCTCCTGCCCCTC-CACCGAGTCCACCACC 497

Ms1-3'_portion ------ERN03927.1 CCCACCATCTCCCCCACCACCATCGCCACCCCCGCCATCCCCCCCACC--ACCATC-GCCACCACCGCCAAT--T-TTGTCCCCACCACCTCCGG 1885 Aco000634.1 CCCCCGACGCCCGCATCTGCCTCTCCCGCATCTACAAGGCCGAGAAGGTCGAGCTCAGCGACGACCGCCTCGCCGCGGGGAGCACCAAAGGCTAC 630 XM_008456796.2 CAATC---TCCTACCCCATCTCAGAACGACAATAGTGGA---GTGGAGAAAGTTTG-GAGAAAGGAGAGTAT--GGTGGGGAGCATAGTGATTG- 595 ABK95485.1 TCCTC---TCCTGGACCCGCTGGAACCGATATGAGTGGA---GTAGAGAAGATGGG-GTCCGTGCAGAAGAT--GGTCCTGAGCCTGGTCTTTG- 565 AT4G13340.1 CCCACCACCTCACTCGCCGCCACCACCACAATTTTCTCCTCCACCACCTGAACCTT-ACTACTACAGCTCAC--CACCACCACCGCATTCTTCAC 1708 PGSC0003DMG400010093 GATTCAAGTGCCAGAAATGGTGGGGGAAATAAGACAGGAAGTGTGGCGGCAATTGGTGTTGTTGCTGGGTTTTTGGCCCTTAGCTTGGTCATTGT 803 DCAR_026603 CCCTCCTGCCCCTCCACCGAGTCCACCACCACCTCCTGCCCCTCCACC--GAGTCC-ACCACCACCTCCTGC--C-CCTCCACCGAGTCCTCCA- 585

Ms1-3'_portion ------ERN03927.1 ----TATATTCGCCACCACCTCCTCCGCTAAATTCACCGCCACCTCCAATTTCACCACCTTATTGTATAAGGTCTCCCCCACCACCTCCGCCAAA 1976 Aco000634.1 CG--CATGGTCCGCGCCACCCGCGGCGTGATGGACGGCGCCTGGTTCTTCGAGATCAGGATCGTGCGGCTCGGGGAGACGGGCCACACCAGGCTT 723 XM_008456796.2 ----GAATGGGATATGTATTTTTGATGCTTTAGGGAGAAGAAAATTAAAGGGTTTCCTTTTGCTGTCTAT--GTGTTTGCCTCTTTTTTTTTCTT 684 ABK95485.1 ----GATCAGCAT---TCTGGTTGCTAACTTAG------591 AT4G13340.1 ----CGCCGCCGCATTCACCTCCACCACCACATTCACCTCCCCCACCGATTTATCCATATCTGTCTCCA---CCGCCCCCACCAACACCAGTTTC 1796 PGSC0003DMG400010093 TGCTGTCTGGTTTACACGCAGGCGAAAGAAAAGAGAGAGTGCATTTAATCTCAATTACCTGGGACCCTCT--CCATTTGCTTCCTCACCAAATTC 896 DCAR_026603 ------CCACCTCCAGCTCCTCCACCGAGTCCACCACCACCTCCTGCCCCTCCACCGAGTCCTCCA---CCACCTCCAGCTCCTCCACCAAG 668

Ms1-3'_portion ------ERN03927.1 TTCACCACCACCACCACCGCCTCACTATCCACCTCCCCCTTCT-CACCATCCACCACCGCCTCCTCACTATCCACCACCGCATCACTATCCACCA 2070 Aco000634.1 GGGTGGACCACCGACAAGGGCGATCTACAGGCGCCCGTCGGGTACGACGGCCATAGCTACGGCTACAGAGATATCGATGGGACTAAGATTCACAA 818 XM_008456796.2 CATTATTTCCTTTCTTCTTGGGATGGCTCTATATTTCATTTCA-TTTTGATTATTATTATTAA------746 ABK95485.1 ------AT4G13340.1 TTCTCCACCACCCACTCCGGTCTATTCCCCTCCTCCCCCACCT-CCTTGTATAGAACCACCACCAC---CTCCACCGTGTATAGAGTATTCACCT 1887 PGSC0003DMG400010093 AGATACATCATTCCTGAGATCAAGGTCTCAACATTCCACTTAT-CTAGCTCCAACCGGCTCACAAAGCAATTTTATGTACTCTCCAGACCATGGA 990 DCAR_026603 CCCTCCCCCAGCACCTCCACCTAGTCCACCCCCGCCTCCTGCC-CCTCCACCAAGCCCTCCTCCAG---CACCTCCACCTAGTCCACCGCCGCCT 759

Ms1-3'_portion ------ERN03927.1 CCTCCTTCTCACCATCCACCA-CCGCCTCCTCACTATCCACCACCCCCACCACATGTGCATTCTCCACCACCACCGTCACCAGTGTATAGCCCAC 2164 Aco000634.1 GGCCTTGAGGGACAAGTATGGGGAGGAGGCCTACACAGAGGGGGATGTGATTGGGTGTTATATTAGCCTCCCCGATGGGGAGGCGTATGCGCCGA 913 XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 CCTCCTCCTC------CACCA-GTCGTTCATTATAGCTCTCCGCCTCCACCGCCAGTCTACTACAGCTCTCCGCCACCTCCACC----AGTCTAT 1971 PGSC0003DMG400010093 GGTATTGGAAATTCAAGATCATGGTTCACTTACGAAGAATTATCTGAGGCAACAAATGGTTTTTCTCCTGATAGTGTTTTGGGTGAAGGAGGGTT 1085 DCAR_026603 CCTGCCCCTC------CGCCAAGCCCTCCCCCANCCACCACCACCACCAGCACCTCCACCAAGCCCTCCTCCTGCACCCCCACCA---AGTCCAC 845

Ms1-3'_portion ------ERN03927.1 CGCCGCC-CATTCACCTTTCACCACCACCACCGTATTACTATGAGTCCCCACCACCACCTCAACCTGTGTATTCTCCTCCTCCACCTTGCATAGA 2258 Aco000634.1 AGCCGCCGCACCTGATTTGGTACAAAGGGCAGAGGTACGTCTACTCGGCCGACGGCAAGGATGAACCGCCCAAGGTAGTGCCTGGGAGTGAGATA 1008 XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 TACAGCT-C-TCCACCT----CCGCCACCGCCGGTTCATTACAGCTCTCCGCCACCACCAGAA---GTCCATTACCATTCTCCGCCT------2049 PGSC0003DMG400010093 TGGATGTGTTTACAAAGGTGTTCTTAATGACGGAAGAGAAGTCGCTGTCAAACAGCTGAAAAGTGGAAGTGGACAAGGGGAGCGGGA---ATTCA 1177 DCAR_026603 CACCACCACCTGCACCT----CCACCAA-GCCCTCCTCCTGCA--CCCCCACCAAGTCCACCACCACCACCTGCACCTCCACCAAGC------925

Ms1-3'_portion ------ERN03927.1 ACCACCACCTCCTCCTCAACCTTGCATTGAACCACCGCCACCACCTACTCCTAGCTATTTGCCAACCCCATCTCCATCACCACCACCGCCACCAA 2353 Aco000634.1 TCTTTCTTCAAGAACGGGATATGCCAAGGTGTCGCCTTCACGGACCTTTTC-----GGTGGACGATACTATCCCGCGGCGTCCATGTACACACTT 1098 XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ----CCATCTCCAGTACACTACAGCTCTCCACCACCGCCACCATCAGCTCC-----ATGTGAAGA----ATCTCCTCCACCAGCACCGGTA---G 2128 PGSC0003DMG400010093 GAGCAGAAGTTGAGATTATCAGCCGTGTGCACCATCGCCATTTGGTTTCACTTGTTGGTTACTGTATCTCAGAGCAGCAAAGGTTACTTGTCTAC 1272 DCAR_026603 ----CCTCCTCCTGCACCCCCACCAAGTCCACCACCACCACCTGCACCTCC------ACCAAGCCCTCCTCCTGCACCCCCACCAA-----G 1002

Ms1-3'_portion ------ERN03927.1 TCCATTATAACTCCCCTCCTCCACC--TTCACCACCGCCACCAATCTATTATAGCCCACCTCCACCACCACATTACAGTTCACCCCCACCTCCAA 2446 Aco000634.1 CCTAACCAGCCCAACTGCGAGGTTCGGTTCAACTTCGGGCCTGATTTCGAGTTTTTTCCGCAAGATTTTGGTGGCCGTCCGACCCCCCGACCGAT 1193 XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 TTCACCACAGTCCACCACCGCCCATGGTTCACCACAGCCCACCACCTCCAGTGATCCACCAAAGCC--CAC------CACCGCCATCTCCTG 2212 PGSC0003DMG400010093 GACTATGTGCCAAATGACACGCTTGACTATCACCTTCATGGTAAAGGCATGCAAACTATGGATTGGGCTACCCGAGTAAAAGTAGCTGCTGGTGC 1367 DCAR_026603 TCCACCAC----CACCACCTGCACC--TCCACCACCTGCACCTCCACCAAGCCCACCACCCGCACCTCCACCA---AGTCCACCCCCACCACCTA 1088

Ms1-3'_portion ------ERN03927.1 TTCATCATAGTCCACCACCTCCAATTCACTATAGTTCACCCCCACCACCTCCACCAGTTCCTTGCAATTCTCCACCCCCACCACCACCGATGAGC 2541 Aco000634.1 GATCGAGGTTCCTTATCACGGCTATGATTGTAAGATTGATGGGCCTGCTGAAAATGGCGTTGCAGAGAAAACTAGTTAA------1272 XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 AATATGAAGGACCACTACCACCGGTCATCGGCGTATCATACGCATCTCCTCCACCACCGCCGTTCTATTGA------2283 PGSC0003DMG400010093 AGCACGTGGACTTGCTTATCTTCATGAAGACTGTCATCCCCGCATTATCCATAGGGATATCAAAACATCAAACATTCTCTTGGATATCAATTTTG 1462 DCAR_026603 --CACCACCTCCCAAC-CCTCCACCAAACCCCCCACCTCTCCCAT-GCTTAAACCCTTTCCGATAAATCCCATGCCCTAA------1164

Ms1-3'_portion ------ERN03927.1 GAATTGCCACCTCCTTATATGGGGCCATTGCCACCAGTTACAGCGATTTCTTATAGCTCGCCACCTCCACCGCCTTATTATTGA------2625 Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ------PGSC0003DMG400010093 AGGCACAGGTTGCTGATTTTGGCCTTGCAAGGTTAGCAGGTGATGCCAGTAGTACACACGTGACAACTCGTGTGATGGGAACCTTTGGATACTTG 1557 DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 ------Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ------PGSC0003DMG400010093 GCACCAGAGTATGCATCTAGTGGAAAATTAACAGAGAAGTCTGATGTTTATTCATATGGCGTTGTGCTTTTGGAGCTTATTACGGGACGGAAACC 1652 DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 ------Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ------PGSC0003DMG400010093 TGTTGACCAGTCTCAACCCTTAGGTGATGAAAGCCTGGTTGAATGGGCTCGACCTTTGCTTGCTCAAGCACTTGAGACTGAAAATTTTGAAAATG 1747 DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 ------Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ------PGSC0003DMG400010093 TAGTAGATCCTAGGCTTGGAAACAACTTTGTTGCGGGTGAGATGTTCCGGATGATTGAAGCAGCTGCAGCTTGCGTTCGTCATTCAGGCTCTAAG 1842 DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 ------Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ------PGSC0003DMG400010093 AGGCCACGGATGAGTCAGGTGGTTAGAGCTCTAGATTCCATGGATGAGCTGTCGGATCTGTCCAATGGAGTGAAACCTGGACAAAGTGGAATTTT 1937 DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 ------Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ------PGSC0003DMG400010093 TGAGTCAAGGGAACAATCTGCACAGATAAGAATGTTTCAAAAGATGGCATTTGGAAGTCAAGAGTACAGTTCAGATTTCTTCAATTACTCCCAAG 2032 DCAR_026603 ------

Ms1-3'_portion ------ERN03927.1 ------Aco000634.1 ------XM_008456796.2 ------ABK95485.1 ------AT4G13340.1 ------PGSC0003DMG400010093 GCAGCTATAAAAGTTGA 2049 DCAR_026603 ------

ERN03927.1 (ENA), Amborella trichopoda, Amborellaceae, Amborellales
Aco000634.1 (JGI), Ananas comosus, Bromeliaceae, Poales
XM_008456796.2 (NCBI), Cucumis melo, Cucurbitaceae, Cucurbitales
ABK95485.1 (ENA), Populus trichocarpa, Salicaceae, Malpighiales
AT4G13340.1 (Araport11), Arabidopsis thaliana, Brassicaceae, Brassicales
PGSC0003DMT400026193 (JGI), Solanum tuberosum, Solanaceae, Solanales
DCAR_026603 (JGI), Daucus carota, Apiaceae, Apiales

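Table S9. The primers used for the molecular cloning of Ms1 orthologues, RT-PCR, qRT-PCR and BSP

Primer name | Sequence (5'→3') | Application
TaACTIN-F | TCAGCCATACTGTGCCAATC | RT-PCR in AABBDD, AABB and AA
TaACTIN-R | CTTCATGCTGCTTGGTGC | RT-PCR in AABBDD, AABB and AA
AetACTIN-R | CTTCATGCTGCTTGGGGC | RT-PCR in DD with TaACTIN-F
TaACTIN-QF1 | TTCCAGCCATCTTTCATTG | Q-PCR in AABBDD, AABB and AA with TaACTIN-R; Q-PCR in DD with AetACTIN-R
Ms1-QF | ACATCATCCTCTGAGTCGCG | RT-PCR and Q-PCR for Ms1 in AABBDD and AABB
Ms1-QR | GACCACGCAAACACGTACG | RT-PCR and Q-PCR for Ms1 in AABBDD and AABB
Ms-D1-QF | GCCTCTACATCATCCTCTGAGTG | RT-PCR and Q-PCR for Ms-D1 in AABBDD and DD
Ms-D1-QR | ATACTCCTGCCAACGACAG | RT-PCR and Q-PCR for Ms-D1 in AABBDD and DD
Ms-A1-QF | ACATCATCCTCTGAGTCGCC | RT-PCR and Q-PCR for Ms-A1 in AABBDD, AABB and AA
Ms-A1-QR | CTACCAGGACGCTACGATC | RT-PCR and Q-PCR for Ms-A1 in AABBDD, AABB and AA
Ms1-BSPF1 | AAAATTYGGAAAYGGAAAAG | BSP for Ms1 in AABBDD and AABB, first round
Ms1-BSPR1 | CRRRATCTCTCCATCRTCRC | BSP for Ms1 in AABBDD and AABB, first round
Ms1-BSPF2 | YTTTYTYGYATYYYGAGGY | BSP for Ms1 in AABBDD and AABB, second round
Ms1-BSPR2 | TRRACRCTAARCCAARCCC | BSP for Ms1 in AABBDD and AABB, second round
Ms-D1-BSPF1 | AGGAGAGGCGGTTAYGYG | BSP for Ms-D1 in AABBDD and DD, first round
Ms-D1-BSPR1 | RRCRRRRCRRTCTCTCCC | BSP for Ms-D1 in AABBDD and DD, first round
Ms-D1-BSPF2 | GYATYYGGGYYGTYYGAT | BSP for Ms-D1 in AABBDD and DD, second round
Ms-D1-BSPR2 | TCTCCCTCTCTCRCTCRR | BSP for Ms-D1 in AABBDD and DD, second round
Ms-A1-BSPF1 | TYAAAAATYGAAAAYGGAAAAY | BSP for Ms-A1 in AABBDD, AABB and AA, first round
Ms-A1-BSPR1 | ATCTCTCCATCRRCRRRRTC | BSP for Ms-A1 in AABBDD, AABB and AA, first round
Ms-A1-BSPF2 | ATYYYGAGGAGAGGYGGTTAG | BSP for Ms-A1 in AABBDD and AABB, second round
Ms-A1-BSPR2 | CRRRRTRRTCTCTCTCTCC | BSP for Ms-A1 in AABBDD and AABB, second round
Ms-A1-BSPF3 | ATYYYGAGGAGAGGYGGTTA | BSP for Ms-A1 in AA, second round
Ms-A1-BSPR3 | CRRRRCRRTCTCTCTCTCC | BSP for Ms-A1 in AA, second round
Ms-A1-ProF | TTCTTGAGAACCACCTTGTTCG | To confirm the TtMs1-A promoter sequence
Ms-A1-ProR | CCATGGAACACTACGTACTAGGC | To confirm the TtMs1-A promoter sequence
Ms-A1-GBF | TCCGGCATTCCATTTCCGTC | To confirm the TtMs-A1 gene body sequence
Ms-A1-GBR | CCCACCGTCTTCTTCTCAATCG | To confirm the TtMs-A1 gene body sequence
Ms-A1-TerF | CTCTACATCATCCTCTGAGTCGC | To confirm the TtMs-A1 terminator sequence
Ms-A1-TerR | CATCACATCATTAGCAGAAAC | To confirm the TtMs-A1 terminator sequence
Ms1-B-ProF | GCACTAGTTCTTTACTATACTCAAGCC | To confirm the TtMs1 promoter sequence
Ms1-B-ProR | GAGCACTTCTAGCGAGTCAAGAAGG | To confirm the TtMs1 promoter sequence
Ms1-B-GBF | CACGCCACCTCCGGCTATATAAG | To confirm the TtMs1 gene body and terminator sequence
Ms1-C-GBR | AACGCAAAACTTGATCCATTTC | To confirm the TtMs1 gene body and terminator sequence
LgMS1-DF | CWCCCACCTCCTCGCGCTCTAC | Degenerate primer for the Lophatherum gracile MS1 genomic sequence
LgMS1-DR | GCCATTTCGTGMGGGCCAAGA | Degenerate primer for the Lophatherum gracile MS1 genomic sequence
LgMS1-GWR1 | ACGATTCGAATCCCATATG | Genome walking primer for the Lophatherum gracile MS1 genomic sequence
LgMS1-GWR2 | AGAGAAAGACGAGAACCTGC | Genome walking primer for the Lophatherum gracile MS1 genomic sequence
LgMS1-GWR3 | TGTGTGAGGTACCTTGGCAG | Genome walking primer for the Lophatherum gracile MS1 genomic sequence
LgMS1-GWF1 | AGCAGAAAGCAGATGGTTGC | Genome walking primer for the Lophatherum gracile MS1 genomic sequence
LgMS1-GWF2 | CTAGCGTGGTTGGAACGC | Genome walking primer for the Lophatherum gracile MS1 genomic sequence
LgMS1-GWF3 | TTTTGAGATTCCGGCAGAAC | Genome walking primer for the Lophatherum gracile MS1 genomic sequence
PaMS1-DF | CTCGCCCGCCTCAACGCC | Degenerate primer for the Phragmites australis MS1 genomic sequence
PaMS1-DR | GCCATTTCGTGMGGGCCAAGA | Degenerate primer for the Phragmites australis MS1 genomic sequence
PaMS1-GWR1 | AAGCACACGAACGAATGATTC | Genome walking primer for the Phragmites australis MS1 genomic sequence
PaMS1-GWR2 | TGCTGCTGGATCTTGCAAAG | Genome walking primer for the Phragmites australis MS1 genomic sequence
PaMS1-GWR3 | ACAACCACTTTGAAAACCAG | Genome walking primer for the Phragmites australis MS1 genomic sequence
PaMS1-GWF1 | GTTTCATTTCGTGTATCTTCCGG | Genome walking primer for the Phragmites australis MS1 genomic sequence
PaMS1-GWF2 | AGCCGGTCGTACGTACTTG | Genome walking primer for the Phragmites australis MS1 genomic sequence
PaMS1-GWF3 | CGGTTTGGTTCTCATCGG | Genome walking primer for the Phragmites australis MS1 genomic sequence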
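
The BSP primers in Table S9 contain the IUPAC ambiguity codes Y (C or T) and R (A or G), and the degenerate primers contain W (A or T) and M (A or C). As an illustration only, the following minimal Python sketch converts such a primer into a regular expression and counts how many distinct sequences it represents. The function names primer_to_regex and degeneracy are hypothetical and are not part of the methods used in this study.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex matching every sequence it encodes."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        # Unambiguous bases stay literal; ambiguous ones become a character class.
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("".join(parts))


def degeneracy(primer):
    """Return the number of distinct sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


# Ms1-BSPF1 from Table S9 (two Y positions).
ms1_bspf1 = "AAAATTYGGAAAYGGAAAAG"
pattern = primer_to_regex(ms1_bspf1)
print(degeneracy(ms1_bspf1))  # 4
print(bool(pattern.fullmatch("AAAATTCGGAAATGGAAAAG")))  # True
```

Each Y or R position doubles the number of sequences covered, so a primer with many ambiguous positions represents a large pool of oligonucleotides.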

Table S10. Primers used to prepare the constructs in this study

Primer | Sequence | Application
Ms1-ISH-F | aagcttCGAGCGAGGGAGAGAGAGACC | In situ hybridization probe
Ms1-ISH-R | gaattcGATCACATAGCATCAGTGGTTC | In situ hybridization probe
Ms1-SB-F1 | CGACATACACGGAGCGATCTATG | Southern blotting probe
Ms1-SB-R1 | GGCAGAGGCGACGCACTG | Southern blotting probe
Ms1-SB-F2 | TCTACAGCTCCTGCGGCGGCCTCCGC | Southern blotting probe
Ms1-SB-R2 | CCAAAAGCACGGCCAGCTCTTGCCG | Southern blotting probe
Ms1 genomic DNA-F1 | acgacggccagtgcc aagctt CACCTAGTTGCATATCTAGTGAACCC | For the pAHC20-Ms1p::Ms1 construct
Ms1 genomic DNA-R1 | gcactgcaggcatgc aagctt acgcgt GGTCTCTCTCTCCCTCGCTCGC | For the pAHC20-Ms1p::Ms1 construct
Ms1 genomic DNA-F2 | AGCGAGGGCGGCGCGCCCGGGGCTTGGCTTAGCGTCCACGC | For the pAHC20-Ms1p::Ms1 construct
Ms1 genomic DNA-R2 | caggcatgc aagctt acgcgt ATAGCAGAATGGAAGCTACAAACAGC | For the pAHC20-Ms1p::Ms1 construct
Ms1p-F | gccagtgcc aagctt cctgcagg CACCTAGTTGCATATCTAGTGAACCC | For the pAHC20-Ms1p::Ms-A1 construct
Ms1p-R | CCTCAGATCTACCAT aggcct CGTCGCGGCGGGGCGGTCTC | For the pAHC20-Ms1p::Ms-A1 construct
Ms-A1 gene body-F | acgacggccagtgcc aagctt cctgcagg aggcct ATGGAGAGATCCCGCCGCC | For the pAHC20-Ms1p::Ms-A1 construct
Ms-A1-gene body-R | GCGGGGTCGGCGCGCGACTCAGAGGATGATGTAGAGGCCG | For the pAHC20-Ms1p::Ms-A1 construct
Ms1 ter-F | CGGCCTCTACATCATCCTCTGAGTCGCGCGCCGACCCCGC | For the pAHC20-Ms1p::Ms-A1 construct
Ms1ter-R | gcactgcaggcatgc aagctt ATAGCAGAATGGAAGCTACAAACAGC | For the pAHC20-Ms1p::Ms-A1 construct
Ms1 CDS-F1 | cgggggacgagctcgggtaccATGGAGAGATCCCGCGGGC | For the p35S::Ms1-GFP, p35S::Ms1SP-GFP, p35S::Ms1ΔTM-GFP or p35S::Ms1ΔSP-GFP construct
Ms1 CDS-F2 | cgggggacgagctcgggtaccATGCAGCCGGGGGCGCCGTGC | For the p35S::Ms1-GFP, p35S::Ms1SP-GFP, p35S::Ms1ΔTM-GFP or p35S::Ms1ΔSP-GFP construct
Ms1 CDS-R1 | ggtgtcgactctagaggatccGAGGATGATGTAGAGGCCGAGCAT | For the p35S::Ms1-GFP, p35S::Ms1SP-GFP, p35S::Ms1ΔTM-GFP or p35S::Ms1ΔSP-GFP construct
Ms1 CDS-R2 | ggtgtcgactctagaggatccGGCCGCCTTGGACGGCGC | For the p35S::Ms1-GFP, p35S::Ms1SP-GFP, p35S::Ms1ΔTM-GFP or p35S::Ms1ΔSP-GFP construct
Ms1 CDS-R2 | ggtgtcgactctagaggatccCGCCGCCGCCGCCGGCAG | For the p35S::Ms1-GFP, p35S::Ms1SP-GFP, p35S::Ms1ΔTM-GFP or p35S::Ms1ΔSP-GFP construct
PTS1-mCherry-F | cggggatcctctaga gtcgac ATGGTGTCCAAGGGCGAG | For the PTS1-mCherry marker construct
PTS1-mCherry-R | atacgaacgaaagct ctgcag TCA GAGTTTTGA CTTGTACAGCTCGTCCATGC | For the PTS1-mCherry marker construct
GmMan1-mCherry-F1 | cggggatcctctaga gtcgac ATGGCGAGAGGGAGCAGATC | For the GmMan1-mCherry marker construct
GmMan1-mCherry-R1 | CTCGCCCTTGGACACCATAGTTTGACGGTCCCAGAAAAC | For the GmMan1-mCherry marker construct
GmMan1-mCherry-F2 | GTTTTCTGGGACCGTCAAACTATGGTGTCCAAGGGCGAG | For the GmMan1-mCherry marker construct
GmMan1-mCherry-R2 | atacgaacgaaagct ctgcag TCACTTGTACAGCTCGTCCATGC | For the GmMan1-mCherry marker construct
pFAγ-mCherry-F1 | cggggatcctctaga gtcgac ATGGCAATGGCTGTTTTCCG | For the pFAγ-mCherry marker construct
pFAγ-mCherry-R1 | CTCGCCCTTGGACACCATGTTCTTAACACTCTTCATGCGGTTAC | For the pFAγ-mCherry marker construct
pFAγ-mCherry-F2 | GTAACCGCATGAAGAGTGTTAAGAACATGGTGTCCAAGGGCGAG | For the pFAγ-mCherry marker construct
pFAγ-mCherry-R2 | atacgaacgaaagct ctgcag TCACTTGTACAGCTCGTCCATGC | For the pFAγ-mCherry marker construct
WxTP-mCherry-F1 | cggggatcctctaga gtcgac ATGTCGGCTCTCACCACG | For the WxTP-mCherry marker construct
WxTP-mCherry-R1 | CTCGCCCTTGGACACCATGGCAGGGGGGAGGCCACC | For the WxTP-mCherry marker construct
WxTP-mCherry-F2 | GGTGGCCTCCCCCCTGCCATGGTGTCCAAGGGCGAG | For the WxTP-mCherry marker construct
WxTP-mCherry-R2 | atacgaacgaaagct ctgcag TCACTTGTACAGCTCGTCCATGC | For the WxTP-mCherry marker construct
SP-mCherry-HDEL-F1 | cggggatcctctaga gtcgac ATGAAGGTACAGGAGGGTTTGTTC | For the SP-mCherry-HDEL marker construct
SP-mCherry-HDEL-R1 | CTCGCCCTTGGACACCATGCACTCCTTGCGAGGTTGC | For the SP-mCherry-HDEL marker construct
SP-mCherry-HDEL-F2 | GCAACCTCGCAAGGAGTGCATGGTGTCCAAGGGCGAG | For the SP-mCherry-HDEL marker construct
SP-mCherry-HDEL-R2 | atacgaacgaaagct ctgcag TCACAGCTCGTCATGCTTGTACAGCTCGTCCATGC | For the SP-mCherry-HDEL marker construct
Ms1s-His-F | gagggaaggatttca gaattc CAGCCGGGGGCGCCGTGC | For the MBP-Ms1s-His construct
Ms1s-His-R | cagtgccaagcttgc ctgcag TCAGTGATGATGATGATGATGGGCCGCCTTGGACGGCG | For the MBP-Ms1s-His construct
His-F | aattc CATCATCATCATCATCACTGA ctgca | For the MBP-His construct
His-R | g TCAGTGATGATGATGATGATG g | For the MBP-His construct
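
Several marker constructs in Table S10 use an R1/F2 primer pair whose sequences appear to be reverse complements of each other, which is consistent with joining two fragments by overlapping primers. The following minimal Python sketch checks this for the GmMan1-mCherry pair. It assumes that each primer sequence is read continuously across the line wraps in the table; the helper name reverse_complement is hypothetical and is not part of the methods used in this study.

```python
# Translation table for complementing unambiguous DNA bases (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# GmMan1-mCherry-R1 and GmMan1-mCherry-F2 from Table S10,
# joined across the line wraps in the table (an assumption).
gmman1_r1 = "CTCGCCCTTGGACACCATAGTTTGACGGTCCCAGAAAAC"
gmman1_f2 = "GTTTTCTGGGACCGTCAAACTATGGTGTCCAAGGGCGAG"

# The two primers anneal to each other if one is the
# reverse complement of the other.
print(reverse_complement(gmman1_r1) == gmman1_f2)  # True
```

The same check can be applied to the pFAγ-mCherry, WxTP-mCherry and SP-mCherry-HDEL R1/F2 pairs.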

References

1. Shitsukawa N et al. (2007) Genetic and epigenetic alteration among three homoeologous genes of a class E MADS Box gene in hexaploid wheat. Plant Cell 19:1723–1737.

2. Franckowiak JD, Maan SS, Williams ND (1976) A proposal for hybrid wheat utilizing Aegilops squarrosa L. cytoplasm. Crop Science 16:725–728.

3. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.

4. International Wheat Genome Sequencing Consortium (2014) A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345:1251788.

5. Kim D et al. (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14:R36.

6. Li H et al.; 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics 25:2078–2079.

7. Clavijo BJ et al. (2017) An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations. Genome Res 27:885–896.

8. Chapman JA et al. (2015) A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome. Genome Biol 16:26. doi:10.1186/s13059-015-0582-8.

9. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nature Methods 9:357–359.

10. Abe A et al. (2012) Genome sequencing reveals agronomically important loci in rice using MutMap. Nature Biotech 30:174–178.

11. Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21:263–265.

12. Gruntman E et al. (2008) Kismeth: analyzer of plant methylation states through bisulfite sequencing. BMC Bioinformatics 9:371.

13. Nelson BK, Cai X, Nebenführ A (2007) A multicolored set of in vivo organelle markers for co-localization studies in Arabidopsis and other plants. Plant J 51:1126–1136.

14. Lee S et al. (2012) Mitochondrial targeting of the Arabidopsis F1-ATPase γ-subunit via multiple compensatory and synergistic presequence motifs. Plant Cell 24:5037–5057.

15. Kitajima A et al. (2009) The rice α-amylase glycoprotein is targeted from the Golgi apparatus through the secretory pathway to the plastids. Plant Cell 21:2844–2858.

16. Gomord V et al. (1997) The C-terminal HDEL sequence is sufficient for retention of secretory proteins in the endoplasmic reticulum (ER) but promotes vacuolar targeting of proteins that escape the ER. Plant J 11:313–325.

17. Dowler S, Kular G, Alessi DR (2002) Protein lipid overlay assay. Sci. STKE 129:pl6.

18. Petersen TN, Brunak S, von Heijne G, Nielsen H (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods 8:785–786.

19. Bagos PG, Liakopoulos TD, Spyropoulos IC, Hamodrakas SJ (2004) PRED-TMBB: a web server for predicting the topology of β-barrel outer membrane proteins. Nucleic Acids Research 32 (Web Server issue):W400–W404.