<<

Mol Breeding (2016) 36: 152 DOI 10.1007/s11032-016-0569-5

Molecular diversity of α-gliadin expressed genes in genetically contrasted (Triticum aestivum ssp. spelta) accessions and comparison with (T. aestivum ssp. aestivum) and related diploid Triticum and Aegilops species

Benjamin Dubois & Pierre Bertin & Dominique Mingeot

Received: 13 May 2016 /Accepted: 11 October 2016 /Published online: 10 November 2016 # The Author(s) 2016. This article is published with open access at Springerlink.com

Abstract The of cereals such as bread particularly high proportion of α-gliadins from the B ge- wheat (Triticum aestivum ssp. aestivum)andspelt nome and a low immunogenic content. Even if no clear (T. aestivum ssp. spelta) are responsible for celiac disease separation between spelt and bread wheat sequences was (CD). The α-gliadins constitute the most immunogenic shown, spelt α-gliadins displayed specific features class of gluten proteins as they include four main T-cell concerning e.g. the frequencies of some amino acid sub- stimulatory epitopes that affect CD patients. Spelt has been stitutions. Given this observation and the variations in less studied than bread wheat and could constitute a source toxicity revealed in the spelt accessions in this study, the of valuable diversity. The objective of this work was to high genetic diversity held in spelt germplasm collections study the genetic diversity of spelt α-gliadin transcripts and could be a valuable resource in the development of safer to compare it with those of bread wheat. Genotyping data varieties for CD patients. from 85 spelt accessions obtained with 19 simple sequence repeat (SSR) markers were used to select 11 contrasted Keywords Spelt . α-gliadin . Celiac disease . Gluten . α accessions, from which 446 full open reading frame - Genetic diversity gliadin genes were cloned and sequenced, which revealed a high allelic diversity. High variations among the acces- sions were highlighted, in terms of the proportion of α- Introduction gliadin sequences from each of the three genomes (A, B and D), and their composition in the four T-cell stimulatory Gluten is the result of denaturation of endosperm storage epitopes. An accession from Tajikistan stood out, having a proteins called during the dough kneading. These prolamins are water-insoluble proteins found in the of bread wheat (Triticum aestivum L. ssp. Electronic supplementary material The online version of this article (doi:10.1007/s11032-016-0569-5) contains supplementary aestivum), spelt [Triticum aestivum ssp. spelta (L.) material, which is available to authorized users. Thell.] and durum wheat [Triticum turgidum ssp. durum : (Desf.) Husnot]. Equivalent proteins are also found in B. Dubois (*) D. Mingeot other cereals, such as barley and . Prolamins are Centre wallon de Recherches agronomiques (CRA-W), Département Sciences du vivant, Chaussée de Charleroi, 234, composed of monomeric gliadins and polymeric 5030 Gembloux, Belgium (Shewry and Halford 2002), which can be e-mail: [email protected] divided into high-molecular-weight subunits (HMW-GS) and low-molecular-weight glutenin sub- B. Dubois : P. Bertin Earth and Life Institute – Agronomy, Université catholique de units (LMW-GS). Conventionally, gliadins are classi- Louvain (UCL), Croix du Sud, 2 bte L7.05.11, fied into α/β-, γ- and ω-gliadins according to their 1348 Louvain-la-Neuve, Belgium electrophoretic mobility. Gluten proteins mainly 152 Page 2 of 15 Mol Breeding (2016) 36: 152 determine the functional properties of ; The α-gliadins constitute the most important class of glutenins are thought to determine dough elasticity, gliadins as they represent 15–30 % of the bread wheat whereas gliadins could determine its viscosity (Shewry proteins (Gu et al. 2004). Encoded by a multigene et al. 2003). family, they possess a very high allelic variability and The ingestion of gluten peptides can lead to three are located at the Gli-2 loci (Gli-A2, Gli-B2 and Gli-D2) main types of pathologic reactions: allergic (wheat al- on the short arm of the homeologous chromosomes 6A, lergy), autoimmune (celiac disease; CD) and possibly 6B and 6D, respectively. The haploid genome includes a immune-mediated non-celiac gluten sensitivity number of α-gliadin gene copies ranging from 25 to 35 (NCGS), an unclear grouping of patients whose overall (Harberd et al. 1985) to 100 (Okita et al. 1985)oreven state of health improves when gluten is withdrawn from up to 150 (Anderson et al. 1997), depending on the their diet (Mooney et al. 2013). CD is an uncontrolled variety. inflammatory response to partially digested gluten pep- The development of new cereal varieties that lack tides and is triggered by a T-cell activation in the gas- immunogenic gluten peptides, but still display good trointestinal mucosa. It results in the flattening of intes- baking properties, constitute one of the new CD thera- tinal villi and a reduction in its absorptive capacity, peutic approaches currently being considered (Rashtak leading to such clinical symptoms as diarrhea, bowel and Murray 2012). It would therefore be relevant to pain, fatigue, weight loss, anemia, , head- make use of the high variability existing in bread wheat ache and growth retardation (Koning et al. 2005;Marsh and its related taxa. Among them, spelt could be partic- 1992; Rashtak and Murray 2012; Sapone et al. 2012). ularly interesting because of the high genetic diversity CD affects genetically predisposed individuals, with a held in spelt germplasm collections (An et al. 2005; prevalence of about 1 % in the human population Bertin et al. 2004; Caballero et al. 2004). In addition, (Rewers 2005). spelt has been less subject to selection pressure than Among the gluten proteins, α-gliadins are the most bread wheat. Selection programs, which have focused, immunogenic fraction with the strongest T-cell activa- inter alia, on the improvement of bread wheat baking tion (Arentz-Hansen et al. 2002; Camarca et al. 2009; qualities, have contributed to a decrease in genetic di- Ciccocioppo et al. 2005; Vader et al. 2003). They dis- versity, especially at the level of α-gliadins and their play four major T-cell stimulatory epitopes: the overlap- toxic epitope content (Vanden Broeck et al. 2010). Spelt ping DQ2.5-glia-α1and-α2 epitopes (P{F/ was one of the most important cereals in Europe at the Y}PQPQLPY and PQPQLPYPQ, respectively), the beginning of the twentieth century, but bread wheat DQ2.5-glia-α3 epitope (FRPQQPYPQ) and the DQ8- almost completely replaced it because of its better bak- glia-α1 epitope (QGSFQPSQQ) epitopes. Because the ing qualities, higher yields and lower processing costs DQ2.5-glia-α2 epitope can be duplicated once or twice, (Koening et al. 2015). Spelt, however, has several inter- the canonical form of the overlapping DQ2.5-glia-α1 esting features, such as high vitamin content and nutri- epitope shows two variants: DQ2.5-glia-α1a tion values, robustness, adaptability to soil and climatic (PFPQPQLPY) and DQ2.5-glia-α1b (PYPQPQLPY). conditions, resistance to diseases and nitrogen use effi- These four epitopes can be displayed in their canonical ciency (Caballero et al. 2004;Campbell1997;Kema form (shown above in brackets), as well as with 1992). For more than a decade, the popularity of spelt substituted or deleted amino acid residues. Mitea et al. products has been increasing thanks to their pleasant (2010) showed that these mutations reduce or suppress taste and healthy food reputation (Koening et al. 2015; the antigenic properties of the epitope variants. When Kozub et al. 2014). Interest in spelt as a crop for organic two duplications of the DQ2.5-glia-α2 epitope occur, farming has also increased because of its lower pesticide this leads to the full 33-mer fragment, displaying three requirements compared with bread wheat (Kohajdova copies of DQ2.5-glia-α2, which is the most immuno- and Karovicova 2008; Kozub et al. 2014). genic fragment of α-gliadin sequences (Molberg et al. Spelt and bread wheat are both allohexaploids 2005;Shanetal.2002). In addition, α-gliadins display (2n = 6× = 42; AABBDD genome), but they seem to the p31–43 peptide, which induces the innate immune have emerged from distinct hybridization events response and enhances the T-cell adaptive response (Dvorak et al. 2012). The origin of spelt is not yet fully (Gianfrani et al. 2005;Maiurietal.2003;Stepniakand understood, but it seems to have emerged in two differ- Koning 2006). ent places, one in Iran and one in Europe. Iranian spelt Mol Breeding (2016) 36: 152 Page 3 of 15 152 might have emerged, like bread wheat, through hybrid- Rosenberg 2007) was used to obtain the mean individ- ization between cultivated emmer [Triticum turgidum ual Qmatrix. The log probability [LnP(D)] values cal- ssp. dicoccum (Schrank ex Schübler) Thell., AABB culated with Structure software and the ΔK statistics genome] and Aegilops tauschii Cosson (DD genome) were used to determine the number of clusters that best whereas European spelt could be the result of a cross described the collection structure. One accession was between cultivated emmer and hexaploid bread wheat selected in each cluster (10 in total) for all the experi- (Blatter et al. 2004;Dvoraketal.2012; Kozub et al. ments described here. Based on the results reported by 2014;Salaminietal.2002). Dvorak et al. (2012) and on the membership coefficients The objective of this study was to investigate the obtained with CLUMPP software (see below), the diversity of α-gliadin expressed genes from spelt com- Iran77d accession was added to this selection. pared with bread wheat and diploid species in the Triticeae tribe based on their α-gliadin amino acid se- Plant materials quence composition. The work involved (i) cloning and sequencing full-ORF α-gliadins from genetically The 11 selected accessions (Table 1)werekindlypro- contrasted spelt accessions, (ii) studying the allelic var- vided by the United States Department of Agriculture iation in the T-cell stimulatory epitopes, (iii) evaluating (USDA, Washington, USA), the Vavilov Institute of the toxicity of spelt accessions by analyzing their ca- Plant Genetic Resources (VIR, Saint-Petersburg, nonical epitope composition and (iv) comparing spelt Russia) and the Center for Genetic Resources (CGN, sequences to α-gliadins from bread wheat and related Wageningen, The Netherlands). Among these acces- diploid Triticum and Aegilops species in order to find sions, eight were landraces (BEL08, SPA03, GER11, potential spelt specificities. GER12, TAD06, SWI23, Iran77d and IRA03), two had an uncertain improvement status (DK01 and BUL04) and the final one was breeding material (US06). They Materials and methods were grown in 2014 in field conditions in Belgium and all the immature grains from a self-pollinated ear were Genetic diversity analysis and selection of accessions harvested 20 days post-anthesis, immediately frozen in liquid nitrogen and stored at −80 °C. A working collection of 84 spelt accessions, from 23 countries and 4 continents, has been maintained at the mRNA extraction and RT-PCR Walloon Agricultural Research Center (CRA-W, Belgium, see Online Resource 1). An Iranian accession For each accession, total RNA was extracted from (CGN06533, Iran77d), thought to originate from ances- 100 mg seeds using the NucleoSpin® RNA Plant kit tors that differ from those of other spelts (Dvorak et al. (Macherey-Nagel, Germany). The RNA quality was 2012), was added to this working collection. The mi- evaluated by a 1 % agarose gel electrophoresis and the crosatellite data from 19 simple sequence repeat (SSR) RNA was quantified by spectrometry. First strand markers (Bertin et al. 2004) used on the 85 accessions cDNA was synthetized from 250 ng RNA with were subjected to the model-based clustering method oligo(dT)18 primer using the RevertAid H Minus First implemented with Structure software (v2.3.4; Pritchard Strand cDNA Synthesis Kit (Thermo Scientific) in a et al. 2000) in order to infer the optimal number of total volume of 20 μl. groups best describing the population structure. The number of clusters (K) was tested from 1 to 20 with 10 Cloning and sequencing iterations per K, each iteration consisting of 100,000 burn-in steps, followed by 100,000 Markov Chain The α-gliadin coding sequences were amplified by the Monte Carlo (MCMC) repetitions. The admixture an- specific primers GliFS: 5′-ATGAAGACCTTTCT cestry model was chosen and the allele frequencies were CATC-3′ and GliRS: 5′-GTTRGTACCGAAGA assumed to be independent. The ΔK statistics devel- TGCC-3′. The reverse primer is degenerated to work oped by Evanno et al. (2005) was calculated using with an α-gliadin panel as wide as possible. Despite this STRUCTURE HARVESTER software (Earl and precaution, it should however be noted that this could vonHoldt 2012). CLUMPP software (Jakobsson and still lead to the amplification of a subset of expressed α- 152 ae4o 15 of 4 Page

Table 1 Total number of pseudogenes and full open reading frames (ORFs) of α-gliadin sequences obtained from 11 contrasted spelt accessions, distribution of the full-ORF sequences and enumeration of the canonical forms of the four T-cell stimulatory epitopes among each genome and each accession.

Name* Accession number Total number of sequences Pseudogenes Full ORF A B D DQ2.5-glia-α1 DQ2.5-glia-α2DQ2.5-glia-α3 DQ8-glia-α1

ABDTOTABDTOTABDTOTABDTOT

BEL08 PI348315 42 2 40 11 15 14 9 0 16 25 0 0 16 16 9 0 14 23 0 0 3 3 DK01 PI361811 42 0 42 13 19 10 13 0 13 26 0 0 13 13 10 0 7 17 0 0 5 5 SPA03 PI348572 41 1 40 26 8 6 25 0 12 37 0 0 12 12 24 0 6 30 0 0 4 4 BUL04 PI295063 45 2 43 17 15 11 15 0 17 32 0 0 17 17 15 0 9 24 4 0 4 8 GER11 PI348114 41 2 39 13 17 9 13 0 11 24 0 0 11 11 12 0 7 19 1 0 3 4 GER12 PI348120 36 1 35 18 9 8 16 0 10 26 0 0 10 10 16 0 8 24 1 4 2 7 TAD06 K 52437 47 1 46 14 27 5 8 0 7 15 0 0 7 7 14 0 5 19 0 0 1 1 SWI23 PI347939 45 4 41 15 17 9 15 0 12 27 0 0 12 12 15 0 7 22 0 2 2 4 US06 PI355595 38 4 34 21 7 6 14 0 11 25 0 0 11 11 21 0 6 27 0 1 5 6 Iran77d CGN 06533 45 1 44 22 7 15 12 0 22 34 0 0 22 22 22 0 14 36 0 0 8 8 IRA03 CGN12270 42 0 42 15 11 16 10 0 24 34 0 0 26 26 15 0 10 25 0 0 11 11 464 18 446 185 152 109 150 0 155 305 0 0 157 157 173 0 93 266 6 7 48 61

The 446 full-ORF α-gliadin expressed sequences were checked for the presence of the four T-cell stimulatory epitopes in their canonical forms: P{F/Y}PQPQLPY (DQ2.5-glia-α1), PQPQLPYPQ (DQ2.5-glia-α2), FRPQQPYPQ (DQ2.5-glia-α3) and QGSFQPSQQ (DQ8-glia-α1)

*: Names used for the same accessions in Bertin et al. (2004) except for Iran77d, which was named as in Dvorak et al. (2012) 152 36: (2016) Breeding Mol Mol Breeding (2016) 36: 152 Page 5 of 15 152 gliadins and thus to an underestimation of the amount of (15 from Triticum urartu Tumanian ex Gandilyan, five expressed variants. Moreover, the magnitude of this from Aegilops speltoides Tausch and 11 from Aegilops underestimation could vary from one genome to anoth- tauschii) and 36 from T. aestivum ssp. aestivum,previ- er. The amplification was carried out in 20 μlreaction ously assigned to one of the three genomes (13 from volume containing 1.25 U Pfu DNA Polymerase chromosome 6A, 12 from 6B and 11 from 6D). These (Thermo Scientific), 1 μlcDNA,2μl10×Pfu reaction sequences are reported in Online Resource 2a. buffer (with 20 mM MgSO4), 0.2 mM dNTP, 0.25 μM Neighbor-joining trees were then constructed in of each primer and nuclease-free water to reach 20 μl. MEGA6 based on a distance method (Poisson substitu- The polymerase chain reaction (PCR) was performed as tion model), with 1000 bootstrap replications, uniform described by Mitea et al. (2010). rates among sites, homogeneous pattern among lineages The PCR products were run on a 1 % agarose gel, and pairwise deletion of gaps and missing data. In order purified with the GeneJET Gel Extraction Kit (Thermo to compare spelt and bread wheat α-gliadin sequences Scientific) and cloned in a pJET 1.2/blunt cloning vector and to see if spelt characteristics could be pointed out, an using the CloneJET PCR Cloning Kit (Thermo overall phylogenetic analysis was conducted in the same Scientific). Chemical competent cells of the E. coli way and included all the sequences from the 11 spelt DH5α strain were then transformed and colonies grown accessions, as well as 210 GenBank α-gliadin se- after an overnight incubation at 37 °C were checked by quences from bread wheat and diploid species (see colony PCR. Subsequently, the PCR products of about Online Resource 2a and b). The bread wheat sequences 50 positive clones for each spelt accession were se- included in this analysis came from 31 distinct varieties quenced using the Sanger technique (Beckman Coulter and corresponded to all bread wheat α-gliadins reported Genomics, United Kingdom). in GenBank after withdrawing pseudogenes and se- quences of poor quality. Sequence sorting and genome assignment

Given the multigenic character of α-gliadins, there was a risk to obtain chimeric products during the PCR am- Results plification. To avoid it, sequences showing a putative combination of variants were discarded. After with- Genetic diversity analysis drawing sequences of poor quality (4), other than α- gliadins (7) or thought to be chimeric (96), the α-gliadin In order to select contrasted spelt accessions, the model- sequences (deposited in GenBank with accession num- based clustering method in Structure software was ap- bers KX173847 through KX174292) were translated plied to the SSR data of an international collection of 85 into amino acid sequences using BioEdit v7.1.11 (Hall spelt accessions. The LnP(D) value calculated by 1999). The identification of nucleic and amino acid Structure was the highest for K = 10. The ΔK statistics sequences present in more than one copy, as well as (Evanno et al. 2005) showed two clear peaks at K = 2 the elaboration of clusters grouping identical sequences, and 10. The peak at K = 10 being the highest, 10 was were carried out with SeqTools v8.4.042 (http://www. therefore assumed as the number of groups that best seqtools.dk/). This software was also used to search for described the structure of the spelt collection (Fig. 1). homologies with all α-gliadin proteins in GenBank via a Among the 85 accessions, 68 (80 %) had a member- BLASTP analysis (date of analysis: 23 October 2015). ship coefficient (Q value) higher than 0.9 and the mean The amino acid sequences of the four major T-cell Q was equal to 0.91. The composition of some clusters stimulatory epitopes (DQ2.5-glia-α1, DQ2.5-glia-α2, was strongly linked to the geographical provenance of DQ2.5-glia-α3andDQ8-glia-α1) were investigated in the accessions, e.g. the orange cluster (Fig. 1)that order to assign each sequence to a genome, following grouped all the Spanish accessions, almost exclusively, VanHerpenetal.(2006). For each accession, this attri- and the green cluster that included a high proportion of bution was further confirmed by a phylogenetic analy- accessions from Eastern Europe. Nine accessions were sis. The spelt amino acid sequences were first aligned clearly admixed (Q < 0.7), including Iran77d. One ac- using ClustalW in MEGA6 (Tamura et al. 2013), togeth- cession was selected in each of the 10 groups and er with 67 GenBank sequences: 31 from diploid species Iran77d, previously assumed to originate from ancestors 152 Page 6 of 15 Mol Breeding (2016) 36: 152 BUL03 BUL01 HUN03 CZE06 BUL04 HUN01 AUS01 TAD15 BEL02 TAD07 GER21 Iran77d TAD06 AFG04 SWI31 BEL13 BEL10 CZE08 SWE02 BUL02 GER15 DK01 AZE02 GER23 SWI12 SWI11 BEL06 BEL08 BEL07 ITA01 SWI02 UKR01 SWI20 CZE07 SWE01 CZE01 GER19 ROM01 US02 GER09 SWI27 GER11 BEL14 GER16 GER10 SWI25 HUN02 BEL09 DK02 TAD01 AZE03 SWI23 SWI10 SWI18 BEL04 SAH01 SPA10 SPA02 SPA01 IRA03 GER08 SWI13 SWI28 BEL12 POL01 SPA05 SPA09 SPA08 GER22 UK01 SPA04 SPA11 MAC01 SPA13 SPA06 SPA03 SWI21 TAD22 GER17 MEX01 GER25 GER12 SWI24 CZE02 US06

Fig. 1 Structure inference in a collection of 85 spelt accessions on represents one group and each vertical strip corresponds to one the basis of SSR results using Structure software (v2.3.4), cluster- accession. The strips are divided into fragments representing the ing for K = 10. Microsatellite data from 19 SSR markers (Bertin membership proportion to each group et al. 2004) were used to perform this analysis. Each color that differs from those of the other spelts (Dvorak et al. classical structure even when one sequence showed a 2012), was added to this selection. deletion of almost the entire unique domain I (63 amino acids) and an insertion of eight amino acids at the same Cloning and molecular characterization of spelt location (accession number KX173965). Although α-gliadin genes showing the typical features, 26 sequences displayed an insertion or deletion in at least one domain. In total, 464 α-gliadin expressed sequences were ob- The typical structure of an α-gliadin also contains six tained from the 11 spelt accessions, ranging from 36 to residues: four in the unique domain I and two in 47 sequences per accession (Table 1). Among these, 446 the unique domain II (Online Resource 3). Among the showed a full open reading frame (full ORF). The 18 446 sequences, 427 displayed these . remaining sequences (3.9 %) had at least one premature Seventeen of the 19 remaining α-gliadins had a seventh stop codon (PSC) and were designated as pseudogenes, extra cysteine at the beginning of the second unique in line with previous publications (Noma et al. 2015; domain. A loss of cysteine residues was also observed VanHerpenetal.2006; Ozuna et al. 2015). in two sequences: sequence KX173978 showed five A clustering was carried out on the 446 full-ORF cysteines as the result of a C to Y substitution, whereas genes to group the identical sequences. This resulted in sequence KX173965 had only two cysteines after the 260 and 226 different nucleic and amino acid sequences deletion of almost the entire unique domain I. respectively. The 226 amino acid sequences were ana- lyzed using BLASTP for their homology with all α- gliadins from Triticeae species reported in GenBank. Genome assignment and T-cell stimulatory epitope Among them, only 26 showed 100 % homology with diversity other α-gliadins from Triticeae species. Most α-gliadins have a typical structure (Online In order to assign each sequence to its corresponding Resource 3), starting with an N-terminal signal peptide, genome of origin, we used as reference genome-specific followed by a repetitive domain where three types of amino acid motifs identified by Van Herpen et al. (2006) CD toxic epitopes are located (DQ2.5-glia-α1, -α2and in wheat diploid species (highlighted in yellow in -α3), two polyglutamine regions separated by a first Fig. 2a) in and around the four T-cell stimulatory epi- unique domain and, finally, a second unique domain at topes DQ2.5-glia-α1, -α2, -α3andDQ8-glia-α1. All the C-terminal side containing a fourth CD epitope these motifs were found in the spelt sequences in this (DQ8-glia-α1). With regard to the sequences obtained study (highlighted in yellow in Fig. 2b). Some other in this study, all the 446 α-gliadins displayed this genome-specific motifs in the same regions, however, Mol Breeding (2016) 36: 152 Page 7 of 15 152 were detected in spelt sequences and are highlighted in epitope (p299). Among all the B genome spelt se- orange in Fig. 2b. quences, 73.7 % corresponded to the first type (BYG^ type) and 25 % to the second type (BFV^ type). Only & DQ2.5-glia-α1and-α2 two α-gliadin sequences from the American accession (US06) could not be classified according to these types The alignment by genomes of spelt sequences because they displayed both a tyrosine residue at p289 allowed three specific motifs to be detected that had and a valine at p301. We also identified in the spelt not been reported by Van Herpen et al. (2006): (i) a B- sequences a high proportion (52.3 %) of α-gliadins from genome specific substitution was found at a high fre- Gli-D2 showing a serine-to-phenylalanine substitution quency (62.9 % of the sequences from the Gli-B2 locus) at p302 (QGFFQPSQQ). where the proline amino acid located just after the For each accession, a neighbor-joining tree resulting DQ2.5-glia-α2 epitope was replaced by a threonine from the alignment of the amino acid sequences with α- (Fig. 2b, amino acid at position 112); (ii) a substitution gliadins of known genomes of origin was constructed of the at the last position (Fig. 2b, p111) of the (data not shown). Three major groups, corresponding to DQ2.5-glia-α2 epitope by a histidine (PQPQLPYSH) the A, B and D genomes, were clearly displayed in each occurred in some sequences (14.6 %) of the A genome, phylogenetic tree, confirming the genome assignment of but never in the B and D genomes; and (iii) a D-genome the sequences according to the motifs described by Van specific substitution of a glutamine by a histidine resi- Herpen et al. (2006). due at p113 (Fig. 2b) in the second copy of the DQ2.5- glia-α2 epitope (resulting in PQPHLPYPQ) was ob- served in 11 of the 30 sequences displaying one dupli- Genomic distribution cation of this epitope. The A, B and D genomes were not equally represented & DQ2.5-glia-α3 among the 446 full-ORF spelt α-gliadin sequences, with 185 (42 %), 152 (34 %) and 109 (24 %) being counted Sequences expressed from the Gli-A2 and Gli-D2 for the A, B and D genomes, respectively (Table 1). The loci are generally not distinguishable from each other sequence frequencies in each genome were quite differ- because both display the canonical form of the epitope, ent from one accession to another (Fig. 3a). The Spanish but we found a substitution of the proline residue at (SPA03) and American (US06) accessions, for example, p134 (Fig. 2b) by a serine in 13 (11.9 %) sequences showed the highest proportion of A genome sequences from the D genome (FRPQQSYPQ). (more than 60 %), the Tajik accession (TAD06) the highest proportion of B genome sequences (about & DQ8-glia-α1 60 %) and the Belgian (BEL08) and both Iranian acces- sions (Iran77d and IRA03) the highest proportion of We identified two types into which spelt α- sequences from the D genome (35–40 %). gliadins from the B genome could be divided. The Genome-specific variations in the number of gluta- first type was characterized by a tyrosine or some- mine residues in the two polyglutamine regions (PQI times a leucine instead of a phenylalanine residue 11 and PQII) have been reported several times in Triticum positions before the epitope (p289 in the Fig. 2), as and Aegilops species (Li et al. 2012;Lietal.2013;Van described by Van Herpen et al. (2006). Remarkably, Herpen et al. 2006; Xie et al. 2010). In spelt, significant this mutation was associated with a proline-to-serine variations in the length of PQI and PQII regions were substitution at p305 (QGSFQSSQQ) in 89.4 % of observed (Online Resource 4). Overall, the PQI region the spelt sequences of this type. The second type had a higher average number of glutamine residues than was characterized by the glycine-to-valine substitu- the PQII region and displayed a significantly larger tion at p301, as reported by Van Herpen et al. average number of glutamine residues in the α- (2006). In spelt, this substitution was associated gliadins from the A genome than from the B and D with the replacement of the glutamine residue genomes. In contrast, the PQII region had a significantly by a leucine at p308 (QVSFQPSQL) and with a lower number of Q-residues in the A genome sequences glutamine-to-serine substitution one position before the than the B and D genome sequences. The standard 152 Page 8 of 15 Mol Breeding (2016) 36: 152

001 011 021 031 092 003 013 a T. urartu A genome Sgenome* Ae. speltoides Ae. D genome Ae. tauschii Ae. b A genome . spelta ssp B genome Triticum aestivum D genome

DQ2.5-glia-α1 and - α2 DQ2.5-glia-α3 DQ8-glia-α1 Fig. 2 Localization of genome-specific motifs displayed in α- genome-specific motifs already identified by Van Herpen et al. gliadins from (a) diploid species and from (b) spelt. a: For each of (2006) in and around the four major T-cell stimulatory epitopes. the A-, B- and D-genome ancestral species, five α-gliadin se- Orange residues are new genome-specific motifs discovered in quences selected from GenBank were aligned. b: For each of the spelt α-gliadins from this study. *: The B genome is hypothesized three genomes, five representative α-gliadin sequences from spelt to be an altered S genome (Von Buren 2001); Ae. speltoides is obtained in this study were aligned. Yellow residues correspond to therefore taken as the closest representative of the B genome

deviation of the mean number of Q-residues in the PQII In the B genome cluster, a grouping coherent with the of the B genome sequences was noticeably high. YG/FV classification was displayed. The sub-cluster at the top of the figure, predominantly composed of spelt sequences, included only YG-type α-gliadins whereas Phylogenetic analysis the middle sub-cluster displayed almost only FV-type spelt and bread wheat α-gliadins. The last sub-cluster at A phylogenetic analysis was performed involving the the bottom of the B genome cluster included some YG- 226 different spelt amino acid sequences in this study, type spelt α-gliadins but also bread wheat and Ae. 31 α-gliadins from diploid species representing the A, B speltoides α-gliadins that do not match with this YG/ and D genomes (empty triangles in Fig. 4) and 179 α- FV classification. gliadins from bread wheat (empty circles). Three main In the D genome cluster, sub-groups corresponding groups were clearly discernible, corresponding to the to the number of duplication of the DQ2.5-glia-α2 three genomes, A, B and D. epitope were observed. Both external sub-clusters on The A genome cluster was the largest. Even if no the left and the right parts of the cluster displayed mainly clear separation into spelt, bread wheat and T. urartu α- α-gliadins with one duplication. Between them, one gliadins was seen, some sub-clusters were identified, sub-cluster included α-gliadins with two duplications, including exclusively or predominantly sequences from i.e. the full 33-mer sequence, whereas the last one spelt (sub-clusters c, e, g, h, j, k, l), bread wheat (b, f, i) contained mainly α-gliadins without duplication. or T. urartu (a and d). Among this D genome cluster, spelt and bread wheat Mol Breeding (2016) 36: 152 Page 9 of 15 152

Fig. 3 Analysis of α-gliadin transcripts from 11 contrasted a 70 spelt accessions: proportion of 60 sequences from the three ge- nomes (a) and average number of 50 canonical epitopes per sequence 40 (b). a: The frequencies were cal- % A genome culated by reporting the number 30 B genome of sequences from each genome to the total number of cloned α- 20 D genome gliadins for each accession. b: 10 Contribution of the four T-cell stimulatory epitopes to the aver- 0 age number of canonical epitopes per sequence

b 2,5

2

1,5 DQ8-glia-α1 DQ2.5-glia-α3 1 DQ2.5-glia-α2 DQ2.5-glia-α1 0,5

0

α-gliadins were rather homogeneously distributed while it was still found in relatively high amounts be- Ae. tauschii sequences seemed to cluster together. cause of its duplication or triplication in some sequences. These duplications and triplications Canonical epitope screening and inventory of epitope weredisplayedin27.5%and13.8%oftheD variants genome sequences, respectively. The mean number of canonical epitopes per sequence Each of the 446 α-gliadin sequences was manually varied greatly, depending on the accession (Fig. 3b). checked for the presence of the four T-cell stimu- The spelt from Tajikistan TAD06 was clearly distinct latory epitopes DQ2.5-glia-α1, -α2, -α3 and DQ8- from the others, displaying a mean number of canonical glia-α1 in their canonical forms: P{F/ epitopes (0.91) significantly smaller than the average of Y}PQPQLPY, PQPQLPYPQ, FRPQQPYPQ and the 10 remaining ones (1.87). The highest values were QGSFQPSQQ, respectively (Table 1). DQ2.5- observed for the Spanish (SPA03) and two Iranian glia-α1and-α3 were the most frequent epitopes, (Iran77d and IRA03) spelt accessions in relation to the followedbyDQ2.5-glia-α2 and finally DQ8- large proportion of sequences from the A and D ge- glia-α1. This last-mentioned epitope was the only nome, respectively, found in these accessions. one that was present in its canonical form in each Variants of the DQ2.5-glia-α1, -α2, -α3 and DQ8- of the three genomes, whereas the three other glia-α1 epitopes were also searched in the 446 full-ORF intact epitopes were systematically absent from sequences, according to their genome of origin (Online the B genome. The DQ2.5-glia-α2 epitope was Resource 5). The DQ2.5-glia-α1, -α2and-α3epitopes always mutated in the A and B genome sequences. displayed 11, 13 and 8 variants, respectively, and the The D genome was therefore the only one to canonical form was always predominant for all of them. display the canonical DQ2.5-glia-α2 epitope, but Only mutated variants, however, were observed in 152 Page 10 of 15 Mol Breeding (2016) 36: 152

0 2 8 6 4 1 3

0 4 7

7 8 0 9

D D D 7 D 2 2 D 7 2 3 7 7 4 8 7 7 5 4 1 7 6 8 2

3 7 7 5 3 7 6 7 1 6 D 5 6 6 6 7 2 2 0 5 0 3 0 N N 7 N 7 7 5 3 0 N 0 0 N 6 3 5 G A 0 6 A 7 4 1 0 S A A D

F A 2 S 7 A 0 0 1 A 0 6 3 G S E R A 0 7 9 R 9 R N 9 9 I U R

U08287 3 E R S 1 0 I U R A T I S 1 3 8 I R 7 3 K P R I R 5 E I 0 2 5 B I I A I F A 4 0 5 7 U R 1 6 R K J R A 1 U 8 4 E I RA J A 1 D 0 E R A 0 2 2 1 D 1 I 7 1 A C 1 0 I AN 7 6 L K R 7 3 3 R 0 D 2 N 0 3 I C 6 3 7 0 1 5 I R 1 7 2 8 7 I C 7 7 S U 6 U 0 0 8 4 7 1 8 2 2 7 K 5 N 7 7 8 S 4 K S 3 6 W S 7 5 4 4 1 E A 2 1 S W 1 0 U A N 0 3 1 T 0 D 9 2 0 U R 8 3 W I 1 I 0 1 3 A 2 6 4 R X A A 3 0 G G I 2 I A 0 1 1 3 G D I 2 3 3 5 J R P 1 8 6 I A 8 E E 2 3 0 P R 1 1 2 1 B E 0 3 4 1 S P 0 3 8 7 B R R 6 3 S E 1 U R 8 S R 8 4 6 B E 1 1 1 3 K R D L 1 7 G E 2 6 K E L 1 2 3 D E 1 7 6 0 1 1 G 8 0 0 8 J J L 0 5 4 7 6 5 3 1 0 8 4 1 G X K D A X 5 J N 7 5 8 J 8 3 8 1 3 2 D A A 0 5 1 D 2 7 8 6 1 2 1 2 T R 3 7 9 9 I U 3 8 2 1 4 I 0 1 6 7 8 8 R A U S 3 3 4 F 0 5 8 3 3 K S N A J S 0 6 1 E A 1 5 4 N 1 0 6 0 0 U 7 1 8 S 3 5 R 7 7 7 3 6 3 I C 9 0 9 5 S W 7 6 7 5 K C 0 3 7 W D 0 K 5 1 3 2 8 I2 3 5 1 7 7 J B 2 U 5 2 6 8 7 7 X U I2 3 3 U 0 3 7 8 3 5 5 0 4 9 B 2 L 3 X F 0 8 D I U 8 0 3 2 E 1 5 7 1 R 3 4 7 RA 4 1 7 7 I IR IRA A L 0 4 I J 7 R 0 0 4 l K C N 3 2 A A 3 4 A 0 3 0 2 N N 03 5 K R A 0 3 0 7 3 I P A 3 1 7 7 4 8 S 2 S S 7 D 4 P I 8 5 W W D S W 0 2 29 S I 5 5 L 1 9 B P I2 2 6 S E 0 4 5 E A 3 3 k B K 0 4 9 T L 0 7 D L 0 1 K A 0 3 1 U L 8 4 P D 8 1 7 B U 0 2 1 I 4 0 1 1 B L 6 8 R IR 0 613 E 0 52 A A 5 4 B S 0 36 0 N 0 25 0 U 4 6 93 TA 77 3 2 P 0 5 78 A D D 2 K S 1 2 7 B genome B 0 1 U 7 7 9 2 C 05 2 U 8 6 8 K 4 5 94 A S 2 7 P 40 4 7 B 0 28 2 K P 1 92 AB 98 6 5 K 14 5 32 A 9 22 71 X 71 1 B 82 3 J C 29 98 2 5 j K 71592 88 U 2 41 C 98 2 BU 51 29 K B 82 78 L 30 4 A 9 22 8 IR 04 3 AB 98 29 A0 1 B 05 94 X 3 0 A 4 52 02 6 KP 0 88 E K0 54 9 P4 04 F5 30 0 K 41 2 61 75 KJ 01 1 M1 27 K 06 K 10 4 D D 6 K 0 73 i TA 6 J13 20 S0 64 6 S 72 68 U 06 29 E PA 39 S 18 7 F 03 U U0 29 561 77 E 018 97 DK 28 EU 052 BE 01 8 P4 20 L0 10 K 08 7 IR 8 2 EL 047 A03 7 B 41 13 DK 79 KJ 159 8 B 01 C7 6 4 E EL0 21 K D0 06 F56 8 37 TA 159 GE 127 KC7 3 20 R11 9 A0 D 20 SP 6 23 B K01 S0 8 UL0 18 U 01 2 6 BU 4 65 DK 644 L04 Q24 14 DK 36 D D06 HM 01 4 TA 0478 120 7 J41 7 KC7 221 K 03 4 1594 SPA 49 DQ41 1 A03 K 7343 SP 55 C715 PA03 G 950 S 11 44 ER11 GER 3 BUL 9 h R12 7 04 19 GE 3 39 TAD0 SWI2 I 6 9 A03 29 RA03 68 SP 21 HM120 BUL04 222 A03 69 K03074 SP X0 DK01 16 2538 A03 12 KJ410484 SP KJ41 DK01 7 0476 PA03 32 US06 21 S AB SWI23 15 982234 g 41493 AB982273 JX1 A genome 0 A JX82830 B982282 KP405275 AB982292 AB982242 AB982236 AB982298 SWI23 23 AB982299 GER12 13 KP405276 GER12 32 f KP 588 405280 DQ002 KP405272 AB982237 KP405 82267 273 AB9 KP405271 DQ002586 IRA 87 N77D 23 DQ0025 e IRA03 27 Q002585 SPA D 84 03 25 DQ0025 IRA03 6 R12 26 GE GE 1 R12 70 J41048 IRA03 K 5553 BU 9 EF16 0 L04 16 YG type + others + type YG 39114 BEL0 FN 47 BU 8 6 Q2464 L04 1 D 15907 GER KC7 92 G 12 1 0182 ER1 5 EU 3 41 AB 2 16 IRA0 9822 03 39 JX8 72 IRA 26 d K 2829 A03 P40 2 IR 492 KP 5282 141 1. K 4052 JX 11 2 P40 99 ER 952 KP 529 G 15 5 K 405 5 KC7 149 P4 302 X14 83 KP 052 J 104 8 K 405 93 KJ4 224 c P40 308 B98 89 BE 53 A 414 3 G L08 01 X1 68 ER 10 J 891 87 GE 12 9 GQ 612 6 B R1 71 F5 28 UL 2 4 E 28 91 DK 04 3 JX8 82 3 B 01 39 82 29 UL 79 JX 82 6 SW 04 B9 38 3 S I2 47 A 31 25 W 3 N8 2 7 TA I2 16 J 98 03 I D 3 4 AB RA 99 RA 06 0 I 82 5 TA N 2 01 07 U D 77 0 U 11 8 S 06 D E M 44 0 E 06 1 44 46 5 U F1 6 1 Q2 7D 96 S 65 9 D 7 58 4 SP 06 55 AN 1 28 G A 1 5 IR C7 8 8 E 0 9 K 82 29 7 EF R 3 X 8 1 I 5 11 31 J 82 59 6 R 6 X 1 29 1 U AN 12 34 J 7 8 5 G S 7 8 C 82 4 4 b E 06 7 3 K X 0 2 G R 7 D J L 1 99 B E 1 6 4 BU K0 2 4 U R 2 6 D 28 5 6 S L 12 3 8 59 7 G P 0 6 X 1 2 5 E A 47 40 J 7 2 8 S R 03 C 98 6 60 B P 3 K B 91 2 1 U A 11 5 A 8 5 6 G L 0 7 Q 0 2 2 B E 0 3 3 4 5 26 8 a U R 4 6 G P 40 5 9 G 1 2 1 K 0 2 4 S E L0 2 2 P 4 8 4 1 W R 4 K P 1 9 2 S 1 3 K 0 5 3 0 S P I 1 2 1 U 1 2 8 6 A 2 6 E 7 I 1 6 S P 0 3 5 C W 0 2 7 B P A 3 3 K K 5 6 3 E A 0 5 S D 0 0 6 B 0 3 4 4 S 0 S U L 3 4 5 2 9 L 0 6 1 P U 1 5 9 B W 0 8 3 K 8 G U 3 R 2 5 4 I2 4 1 E 0 2 9 6 A E L 3 2 2 G 0 0 5 9 2 A B R 0 7 6 Q 0 2 5 9 A B 98 4 4 0 2 9 1 4 2 D Q 0 0 5 9 8 A B 9 1 D 2 5 K B 9 8 22 9 Q 0 0 9 7 2 2 D Q 0 2 5 9 0 A J 9 8 8 0 2 4 1 A B 1 8 2 2 D Q 0 0 5 3 2 7 1 2 3 3 6 K B 9 2 7 D Q 0 0 1 U P 7 2 5 0 D 8 9 8 2 5 D Q 0 A 4 0 K S 4 8 2 4 7 D 1 5 K P 2 3 9 D Q R 7 0 7 0 0 2 6 I 1 6 3 K P 4 6 5 2 9 D N 7 6 2 5 K P 4 0 3 K 4 0 2 4 6 A J N 2 0 2 8 P 4 0 5 4 7 5 S 8 1 K P 1 R K A A 3 K P 4 0 5 3 9 I U 9 8 1 D 4 0 0 R R 8 48 J P 5 2 I B I 0 1 4 9 5 4 0 5 3 9 6 3 G X Q 4 0 A L 1 5 9 5 K 5 3 0 6 9 0 8 1 0 14 4 2 5 9 S E 5 3 0 5 E 2 5 1 7 K C 2 6 5 3 1 0 2 6 I W R1 0 7 B 0 2 2 4 K R C 7 8 6 2 0 4 JX X 0 8 3 7 K 0 0 8 7 K I 1 3 3 6 3 J Q 1 0 D 2 2 P A 7 2 1 0 0 0 K C 5 0 7 3 Q 0 A 7 1 1 K C 4 0 1 3 D 5 3 I C 9 6 6 Q 22 3 9 K 7 5 6 D 7 6 3 5 R P 7 0 3 U R 8 1 2 7 I 1 1 1 I 8 4 R 7 1 9 D 5 2 C A 1 5 E N 9 2 2 1 3 4 4 0 0 I 8 D 2 1 5 2 4

F 6 A N 5 A 8 3 4 7 0

0 5 2 1 1 9 0 0

B 7 3 8 1 8 5 5

E 9 8 N 1 5 9 W 0

X 0 6 9 6 6 7 0

R L 7

. 1 9 9

0 5

7 8 8 7 4

A 4 9 5

I 7 2 7 5 2 2 6

J S 4 3 3 6 8 7 7 8

9

3 4 5 8

3 1 0 2

E A 1 2 4 9 8

R 2 1

N 2 8 8 5

8 9 3 3 8

8 7 . 7 4

3 2 1 D

1 0 1 5

7

P 1 3

4 8 4 1 2 4

1 6

4 6 4

1 1

5 2 5 2 6 B 0

E 2 2 5

P 2

2 2 3

1

A 2 4 0

1 6 0 D

2

1 0

0 7 7 4

6 6

6 1 2 6

K 1 5 2 1

3 6

0 8 8 3

8 D 5

9

1

2 9 1 1

1 6

S J 8 2

8

4 7

R G 4

X 9 1

2 K 4

4

5 0 7 D 5 2

3 8

5 2 2

I X 7 2 2

9 4

K 0 J

3 1 4

1

0

X 8 6 2 1 R

A 5 0

5

52

4 7 1 0 8

J 3 0 5 X F 0 F

0

1 1 D

2 7 2

0

0 1 2

J B A

0 L

D 8 0 R L

R

1 R

0 E J

L

1 7 0

4 0

6 X L

8 A 1 8 E P E WI 1

N L A 4

A 7 2 E

J S E

4 E 40

E

L 4 J

5 U

9 9 G

P N U K R E R S 1 A 8

E R M

K B

C I P

U B

E P

G

P

I F G

G B

B

K A B

B E X

R B X

K

K B K I K

E

A

A R J J

G I 2 1 0 D genome

Fig. 4 Neighbor-joining tree of 226 spelt α-gliadin amino acid from bread wheat by empty circles and spelt α-gliadins were sequences from this study and 210 previously published sequences labeled by filled circles. Sequences from the A genome were from diploid species and bread wheat. The neighbor-joining tree is colored black. In the B genome, YG-type and FV-type sequences presented in a circular disposition where only the topology is were colored olive and blue-gray, respectively. Alpha-gliadins displayed for the sake of clarity. Sequences from diploid species matching with neither YG- nor FV-type were colored pink. Spelt and bread wheat were retrieved from GenBank, and spelt α- sequences from the D genome were marked with a turquoise, red gliadins were determined in this study. Alpha-gliadin sequences or yellow label when they displayed 0, 1 or 2 duplications of the from diploid species, arising from Triticum urartu, Aegilops DQ2.5-glia-α2 epitope, respectively speltoides and Ae. tauschii, were labeled by empty triangles, those

sequences from the B genome and the mutations Discussion consisted either of a residue deletion (DQ2.5-glia-α1 and -α2) or a residue substitution (DQ2.5-glia-α3). Clustering in the spelt collection The DQ8-glia-α1 epitope showed the lowest diversity, with seven variants that were always the result of a The main objective of this study was to investigate substitution. Its canonical form was preferentially en- the α-gliadin diversity in spelt through the cloning countered in the D genome α-gliadins, but remarkably it and sequencing of expressed sequences from was not the most frequent form as two other variants, contrasted spelt accessions. To this end, we started almost exclusively found in the A or B genome, ap- by studying the genetic diversity of a spelt collec- peared in the first and second positions, respectively. tion. The analysis with Structure software based on Mol Breeding (2016) 36: 152 Page 11 of 15 152

19 SSR markers led to a clustering of 85 acces- copy number with high allelic variation (Anderson sions in 10 groups broadly coherent with the spelt et al. 1997; Okita et al. 1985). geographic provenance. This result was consistent Among the 464 sequences, 446 displayed a full ORF, with the findings reported by Bertin et al. (2004), whereas only 18 (3.9 %) displayed at least one PSC, and where an unweighted pair-group method with ar- were therefore considered to be pseudogenes. Working ithmetic averaging (UPGMA)-based dendrogram on genomic DNA, Anderson and Greene (1997)showed was generated through the calculation of genetic that about half of the α-gliadin genes from bread wheat distances (1 – proportion of shared alleles). Based displayed at least one PSC. More recently, Ozuna et al. on the high mean membership coefficient (mean (2015) found pseudogene proportions of 39, 76 and Q = 0.91) and the low number of admixed acces- 63 % in the genomes of diploid, tetraploid and hexa- sions, we assumed that choosing one accession in ploid wheat species, respectively. Polyploidization each of the 10 clusters would provide a panel that might have contributed to this increase in PSC occur- was representative of the spelt diversity. Given that rence because the genetic redundancy created by poly- Iran77d does not clearly belong to any of the 10 ploidy can change the dynamics of coding sequence clusters (Q < 0.7) and that it might originate from evolution and lead to the accumulation of PSC in dupli- ancestors that differ from those of the other spelts cated genes (Akhunov et al. 2013; Mighell et al. 2000). (Dvorak et al. 2012), it was added to the 10 The appearance of a PSC usually results from a C-to-T selected accessions. This panel enabled us to study substitution (18 of the 19 substitutions in this study). As the diversity of expressed α-gliadin genes of ge- much as 20 % of the total DNA residues can be meth- netically contrasted spelt accessions, to investigate ylated in plants and a cytidine methylation at the 5- their allelic variations at the level of four major T- position can lead to an incorrect replication as a thymi- cell stimulatory epitopes, to evaluate the toxicity dine, which favors the C-to-T transition (Anderson and through their canonical epitope composition and to Greene 1997; Gojobori et al. 1982). In this study, clon- compare these spelt sequences to α-gliadins from ing the α-gliadin sequences from the transcriptome bread wheat and related diploid Triticum and (cDNA) enabled us to avoid cloning most pseudogenes. Aegilops species in order to find potential spelt The proportion of pseudogenes still observed in the specificities. transcript α-gliadins could be explained by the existence of non-functional sequences due to mutations in the Molecular characterization of α-gliadin expressed -coding part while the control elements are main- sequences tained, enabling the transcription of the pseudogene (Mighell et al. 2000). The low proportion of sequences The results obtained in this study revealed a high displaying PSC (3.9 %) could result from the nonsense- allelic variation among the α-gliadin expressed mediated mRNA decay (NMD), which is one of several sequences cloned from the 11 contrasted acces- post-transcriptional mechanisms controlling the quality sions. We successfully cloned and sequenced 464 of mRNA function (Maquat 2004). NMD, also known α-gliadin expressed sequences corresponding to as mRNA surveillance, eliminates mRNAs displaying 226 different complete amino acid sequences. PSC in order to prevent the production of potentially Among these, only 26 displayed 100 % homology deleterious truncated proteins (Maquat 2004;Mitrovich with other α-gliadins from Triticeae species al- and Anderson 2005). ready reported in GenBank. We therefore provided Although most of the full-ORF spelt sequences in 200 new α-gliadin sequences from spelt; only 44 this study displayed the classical α-gliadin structure, 19 spelt α-gliadins had previously been reported. sequences had an extra cysteine and two sequences had Among these 200 new sequences, a rather high only two and five cysteines. This could have a direct number are unique sequences which could mean impact on dough quality because six cysteine residues that only a subset of all expressed α-gliadins has lead to the formation of three intramolecular been amplified. This high diversity is consistent bonds which stabilize the compact globular protein fold, with the multigenic character of the α-gliadin fam- whereas additional cysteines enable intermolecular ily given that several authors have showed that the bonds to be created (Khatkar et al. 2002;Shewryand duplication of α-gliadin genes led to a great gene Tatham 1997). Anderson et al. (2001) and Kasarda 152 Page 12 of 15 Mol Breeding (2016) 36: 152

(1989) postulated that an odd number of cysteine resi- at the third position of the DQ8-glia-α1 epitope by a dues leads to a free cysteine that could participate in phenylalanine (QGFFQPSQQ). Mitea and her colleagues gluten polymers. showed that both variants displayed a 1000 times reduced T-cell stimulation compared to the canonical epitope. Such T-cell stimulatory epitope diversity epitope mutations are thus interesting with the aim of lowering the α-gliadin CD-immunogenic content. The expression of the four T-cell stimulatory epitopes Furthermore, we used the mean number of canonical DQ2.5-glia-α1, -α2, -α3 and DQ8-glia-α1inthe11 epitopes per α-gliadin sequence as an indicator of the contrasted accessions showed high diversity in terms of immunogenic content of the 11 spelt accessions and quality (canonical or mutated epitope) and quantity. there were great variations, with mean values ranging Phylogenetic analyses with α-gliadins from diploid species from 0.87 to 2.11, depending on the accession. Among enabled the spelt α-gliadins to be grouped into three them, despite a hypothesized difference in phylogenetic clusters corresponding to the three genomes, as for bread origin, the Iran77d accession did not stand out from the wheat (Van Herpen et al. 2006). We showed that the others. The Tajik accession TAD06, in relation to the genome of origin (A, B or D) greatly influenced the α- highest proportion of B-genome α-gliadin transcripts, gliadin immunogenicity in spelt, as reported by Mitea et al. displayed a low mean number of canonical epitopes. (2010) for bread wheat. The 33-mer, involving six over- Interestingly, several studies have suggested the exis- lapping copies of DQ2.5-glia-α1 and DQ2.5-glia-α2, is tence of two independent origins for spelt (one in the most immunogenic fragment of α-gliadin sequences Europe and one in Asia) and some genetic differences (Molberg et al. 2005;Shanetal.2002). Alpha-gliadins betweenthemhavebeenreported(Blatteretal.2004; with these epitope duplications are D-genome specific and Dvorak et al. 2012;Jaaska1978). Thus, the reduced the 33-mer fragment is generally found at a low frequency immunogenic content of TAD06 and its geographic in bread wheat α-gliadins (Molberg et al. 2005; Ozuna origin provide an interesting route for exploring the et al. 2015). In this study, we noticed the same trend, with genetic diversity and the α-gliadin composition of spelt only 13.8 % of the D-genome spelt α-gliadins containing accessions originating from Asia. the full 33-mer fragment. In addition, α-gliadin sequences from the D genome were the only ones to display the four Comparison between spelt and bread wheat α-gliadin canonical epitopes, as already shown in bread wheat and sequences diploid species (Van Herpen et al. 2006). In bread wheat, α-gliadins expressed from the Gli-A2 locus display two The phylogenetic analysis based on the spelt α-gliadin canonical epitopes (DQ2.5-glia-α1and-α3) and a small sequencesinthisstudyandothersfrombreadwheatand proportion of them also contain a canonical DQ8-glia-α1 diploid species in the Triticeae tribe did not show a clear (Lietal.2012). Mitea et al. (2010) found that there were no separation between them. In the A and B genome, sub- canonical HLA-DQ2.5 T-cell epitopes in B genome α- clusters showing preferential grouping were however gliadins from bread wheat and that their mutated substi- pointed out. Moreover, significant differences in length tutes did not display any T-cell stimulatory capacity. Given in the two polyglutamine regions (PQI and PQII) were that these features were reflected in the spelt α-gliadin observed. These regions play an important role in dough sequences in our study, a high proportion of expressed α- properties because large numbers of glutamine side gliadins from the B genome combined with few sequences chains can increase the visco-elasticity properties of from the D genome would be desirable in order to develop dough via intermolecular interactions, given that they new spelt varieties with a reduced CD-immunogenic con- are both good hydrogen bond donors and acceptors tent. Moreover, Mitea et al. (2010) showed that almost all (Masci et al. 2000). Spelt α-gliadins from the A genome substitutions occurring in the four major T-cell stimulatory had a PQI region with a significantly higher number of epitopes reduced or suppressed their toxicity, regardless of glutamine residues than the PQI regions in the B and D the genome. They synthetized and tested the toxicity of 14- genome sequences. We did not, however, observe spelt to 17-mer epitope peptides including two variants α-gliadin sequences from a particular genome with sig- highlighted in the Fig. 2b: the substitution of the proline nificantly larger PQII regions. This does not accord with residue by a threonine one position after the DQ2.5- the findings of previous studies (Li et al. 2012;Lietal. glia-α2 (P-PQLPYPQT) and the substitution of the serine 2013; Van Herpen et al. 2006; Xie et al. 2010)on Mol Breeding (2016) 36: 152 Page 13 of 15 152

Triticeae species other than spelt, showing that the B References genome sequences had a significantly higher number of Q-residues in the PQII region. The main reason for such a Akhunov ED, Sehgal S, Liang H, Wang S, Akhunova AR, Kaur difference not being observed in these spelt α-gliadins G, Li W, Forrest KL, See D, Simkova H, Ma Y, Hayden MJ, lies in the occurrence of two sub-groups in the B genome Luo M, Faris JD, Dolezel J, Gill BS (2013) Comparative sequences, the first one with a low number (6 or 7) and analysis of syntenic genes in grass genomes reveals acceler- ated rates of gene structure and coding sequence evolution in the second one with a large number (from 13 to 25) of Q- polyploid wheat. Plant Physiol 161:252–265. doi:10.1104 residues in the PQII region. Interestingly, we also re- /pp.112.205161 vealed two sub-groups, YG and FV, in the B genome An X, Li Q, Yan Y, Xiao Y, Hsam SLK, Zeller FJ (2005) Genetic sequences based on the amino acid patterns located be- diversity of European spelt wheat (Triticum aestivum ssp. α spelta L. Em. Thell.) revealed by glutenin subunit variations fore and in the DQ8-glia- 1 epitope. For 144 out of the at the Glu-1 and Glu-3 loci. Euphytica 146:193–201. doi: 150 B genome sequences (96 %), the YG-type classifi- 10.1007/s10681-005-9002-6 cation was systematically associated with a short PQII Anderson OD, Greene FC (1997) The a-gliadin gene family. II. region, whereas the FV-type was associated with a long DNA and protein sequence variation, subfamily structure, – PQII region. In addition, the absence of a clear YG-FV and origins of pseudogenes. Theor Appl Genet 95:59 65. α doi:10.1007/s001220050532 dichotomy in Ae. speltoides and bread wheat -gliadins Anderson OD, Litts JC, Greene FC (1997) The a-gliadin gene make them distinguishable from spelt α-gliadins. family. I. Characterization of ten new wheat a-gliadin geno- A previous study also reported that several bread mic clones, evidence for limited sequence conservation of wheat varieties had fewer α-gliadins expressed from flanking DNA, and southern analysis of the gene family. Theor Appl Genet 95:50–58. doi:10.1007/s001220050531 the B genome than from the A and D genomes Anderson OD, Hsia CC, Torres V (2001) The wheat γ- (Kawaura et al. 2005). Spelt accessions in our study gliadin genes: characterization of ten new sequences did not display this expression pattern, with the mean and further understanding of γ-gliadin gene family proportions of spelt α-gliadin transcripts from A, B and structure. Theor Appl Genet 103:323–330. D genomes being 42, 34 and 24 %, respectively, and the doi:10.1007/s00122-001-0551-3 α Arentz-Hansen EH, McAdam SN, Molberg O, Kristiansen C, B-genome -gliadin proportion even reaching 59 % in Sollid LM (2000) Production of a panel of recombinant the TAD06 accession. This higher frequency of gliadins for the characterization of T cell reactivity in coeliac expressed α-gliadins from the B genome compared with disease. Gut 46:46–51. doi:10.1136/gut.46.1.46 bread wheat suggests that it would be worthwhile pay- Arentz-Hansen H, McAdam SN, Molberg O, Fleckenstein B, Lundin KEA, Jorgensen TJD, Jung G, Roepstorff P, Sollid ing more attention to spelt in efforts to develop safer LM (2002) Celiac lesion T cells recognize epitopes that varieties for CD patients. cluster in regions of gliadins rich in proline residues. Gastroenterology 123:803–809. doi:10.1053 /gast.2002.35381 Bertin P, Grégoire D, Massart S, de Froidmont D (2004) High level of genetic diversity among spelt germplasm revealed by Acknowledgments The authors would like to thank Emanuelle microsatellite markers. Genome 47:1043–1052. doi:10.1139 Escarnot for the implementation and the monitoring of the field trials. /G04-065 The advices of Yordan Muhovski and the technical assistance of Blatter RHE, Jacomet S, Schlumbaum A (2004) About the origin Xavier Seffer, Roberte Baleux, Laurence Kutten and Agnès Melard of European spelt (Triticum spelta L.): allelic differentiation are also acknowledged. of the HMW glutenin B1-1 and A1-2 subunit genes. Theor Compliance with ethical standards Appl Genet 108:360–367. doi:10.1007/s00122-003-1441-7 Caballero L, Martin LM, Alvarez JB (2004) Variation and genetic Conflict of interest The authors declare that they have no con- diversity for gliadins in Spanish spelt wheat accessions. flict of interest. Genet Resour Crop Ev 51:679–686. doi:10.1023 /B:GRES.0000034575.97497.6e Camarca A, Anderson RP, Mamone G, Flerro O, Facchiano A, Costantini S, Zanzi D, Sidney J, Auricchio S, Sette A, Open Access This article is distributed under the terms of the Troncone R, Gianfrani C (2009) Intestinal T cell responses Creative Commons Attribution 4.0 International License (http:// to gluten peptides are largely heterogeneous: implications for creativecommons.org/licenses/by/4.0/), which permits unrestrict- a peptide-based therapy in celiac disease. J Immunol 182: ed use, distribution, and reproduction in any medium, provided 4158–4166. doi:10.4049/jimmunol.0803181 you give appropriate credit to the original author(s) and the source, Campbell KG (1997) Spelt: agronomy, genetics, and breeding. In: provide a link to the Creative Commons license, and indicate if Janick J (ed) Plant Breeding Reviews, vol 15. Wiley, New changes were made. York, pp. 187–213 152 Page 14 of 15 Mol Breeding (2016) 36: 152

Ciccocioppo R, Di Sabatino A, Corazza GR (2005) The immune Kema GHJ (1992) Resistance in spelt wheat to yellow rust. III. recognition of gluten in . Clin Exp Immunol Phylogenetical considerations. Euphytica 63:225–231 140:408–416. doi:10.1111/j.1365-2249.2005.02783.x Khatkar BS, Fido RJ, Tatham AS, Schofield JD (2002) Functional Dvorak J, Deal KR, Luo M-C, You FM, von Borstel K, Dehgani H properties of wheat gliadins. II. Effects on dynamic rheolog- (2012) The origin of spelt and free-thrishing hexaploid ical properties of wheat gluten. J Cereal Sci 35:307–313. wheat. J Hered 103:426–441. doi:10.1093/jhered/esr152 doi:10.1006/jcrs.2001.0430 Earl DA, vonHoldt BM (2012) STRUCTURE HARVESTER: a Koening A, Konitzer K, Wieser H, Koehler P (2015) website and program for visualizing STRUCTURE output Classification of spelt cultivars based on differences in stor- and implementing the Evanno method. Conserv Genet age protein compositions from wheat. Food Chem 168:176– Resour 4:359–361. doi:10.1007/s12686-011-9548-7 182. doi:10.1016/j.foodchem.2014.07.040 Evanno G, Regnaut S, Goudet J (2005) Detecting the number of Kohajdova Z, Karovicova J (2008) Nutritional value and baking clusters of individuals using the software STRUCTURE: a applications of spelt wheat. Acta Sci Pol Technol Aliment 7: simulation study. Mol Ecol 14:2611–2620. doi:10.1111 5–14 /j.1365-294X.2005.02553.x Koning F, Schuppan D, Cerf-Bensussan N, Sollid LM (2005) Garcia-Maroto F, Marana C, Garcia-Olmedo F, Carbonero P Pathomechanisms in celiac disease. Best Pract Res Cl Ga (1990) Nucleotide sequence of a cDNA encoding an α/β- 19:373–387. doi:10.1016/j.bpg.2005.02.003 type gliadin from hexaploid wheat (Triticum aestivum). Plant Kozub NA, Boguslavskii RL, Sozinov IA, Tverdokhleb YV, MolBiol14:867–868. doi:10.1007/BF00016521 Xynias IN, Blume YB, Sozinov AA (2014) Alleles at storage Gianfrani C, Auricchio S, Troncone R (2005) Adaptative and protein loci in Triticum spelta L. Accessions and their occur- innate immune responses in celiac disease. Immunol Lett rence in related . Cytol Genet 48:33–41. doi:10.3103 99:141–145. doi:10.1016/j.imlet.2005.02.017 /S0095452714010046 Gojobori T, Li W-H, Graur D (1982) Patterns of nucleotide sub- Li J, Wang S, Li S, Ge P, Li X, Ma W, Zeller FJ, Hsam SLK, Yan Y stitution in pseudogenes and functional genes. J Mol Evol 18: (2012) Variations and classification of toxic epitopes related 360–369. doi:10.1007/BF01733904 to celiac disease among α-gliadin genes from four Aegilops Gu YQ, Crossman C, Kong X, Luo M, You FM, Coleman-Derr D, genomes. Genome 55:513–521. doi:10.1139/g2012-038 Dubcovsky J, Anderson OD (2004) Genomic organization of Li J, Wang S-L, Cao M, Lv D-W, Subburaj S, Li X-H, Zeller FJ, the complex α-gliadin gene loci in wheat. Theor Appl Genet Hsam SLK, Yan Y-M (2013) Cloning, expression, and evo- 109:648–657. doi:10.1007/s00122-004-1672-2 lutionary analysis of α-gliadin genes from Triticum and Hall TA (1999) BioEdit: a user-friendly biological sequence align- Aegilops genomes. J Appl Genet 54:157–167. doi:10.1007 ment editor and analysis program for windows 95/98/NT. /s13353-013-0139-z Nucl Acids Symp Ser 41:95–98. doi:10.1021/bk-1999-0734. Li Y, Xin R, Zhang D, Li S (2014) Molecular characterization of ch008 α-gliadin genes from common wheat cultivar Zhengmai 004 Harberd NP, Bartels D, Thompson RD (1985) Analysis of the and their role in quality and celiac disease. Crop J 2:10–21. gliadin multigene loci in bread wheat using nullisomic- doi:10.1016/j.cj.2013.11.003 tetrasomic lines. Mol Gen Genet 198:234–242. doi:10.1007 Maiuri L, Ciacci C, Ricciardelli I, Vacca L, Raia V, Auricchio S, /BF00383001 Picard J, Osman M, Quaratino S, Londei M (2003) Jaaska V (1978) NADP-dependent aromatic alcohol dehydroge- Association between innate response to gliadin and activation nase in polyploid wheats and their diploid relatives. On the of pathogenic T cells in coeliac disease. Lancet 362:30–37. origin and phylogeny of polyploid wheats. Theor Appl Genet doi:10.1016/S0140-6736(03)13803-2 53:209–217. doi:10.1007/BF00277370 Maquat LE (2004) Nonsense-mediated mRNA decay: splicing, Jakobsson M, Rosenberg NA (2007) CLUMPP: a cluster translation and mRNP dynamics. Nat Rev Mol Cell Bio 5: matching and permutation program for dealing with label 89–99. doi:10.1038/nrm1310 switching and multimodality in analysis of population struc- Marsh MN (1992) Mucosal pathology in gluten sensitivity. In: ture. Bioinformatics 23:1801–1806. doi:10.1093 Marsh MN (ed) Coeliac disease. Blackwell, Oxford, pp. 136– /bioinformatics/btm233 191 Kasarda DD (1989) Glutenin structure in relation to wheat quality. Masci S, D’Ovidio R, Lafiandra D (2000) A 1B-coded low- In: Wheat is unique. Am Assoc Cereal Chem 277–302 molecular-weight glutenin subunit associated with quality Kasarda DD, Okita TW, Bernardin JE, Baecker PA, Nimmo CC, in durum wheats shows strong similarity to a subunit present Lew EJ-L, Dietler MD, Greene FC (1984) Nucleic acid in some bread wheat cultivars. Theor Appl Genet 100:396– (cDNA) and amino acid sequences of α-type gliadins from 400. doi:10.1007/s001220050052 wheat (Triticum aestivum). Proc Natl Acad Sci U S A 81: Mighell AJ, Smith RN, Robinson PA, Markham AF (2000) 4712–4716 Vertebrate pseudogenes. FEBS Lett 468:109–114. Kaur A, Bains NS, Sood A, Yadav B, Sharma P, Kaur S, Garg M, doi:10.1016/S0014-5793(00)01199-6 Midha V,Chhuneja P (2016) Molecular characterization of a- Mitea C, Salentijn EMJ, van Veelen P, Goryunova SV, van der gliadin gene sequences in Indian wheat cultivars Vis-à-Vis Meer IM et al (2010) A universal approach to eliminate celiac disease eliciting epitopes. J Plant Biochem Biot. antigenic properties of alpha-gliadin peptides in celiac dis- doi:10.1007/s13562-016-0367-5 ease. PLoS One 5:e15637. doi:10.1371/journal. Kawaura K, Mochida K, Ogihara Y (2005) Expression profile of pone.0015637 two storage-protein gene families in hexaploid wheat re- Mitrovich QM, Anderson P (2005) mRNA surveillance of vealed by large-scale analysis of expressed sequence tags. expressed pseudogenes in C. elegans. Curr Biol 15:963– Plant Physiol 139:1870–1880. doi:10.1104/pp.105.070722 967. doi:10.1016/j.cub.2005.04.055 Mol Breeding (2016) 36: 152 Page 15 of 15 152

Molberg O, Uhlen AK, Jensen T, Flaete NS, Fleckenstein B, Shewry PR, Halford GH (2002) Cereal seed storage proteins: Arentz-Hansen H, Raki M, Lundin KEA, Sollid LM (2005) structures, properties and role in grain utilization. J Exp Bot Mapping of gluten T-cell epitopes in the bread wheat ances- 53:947–958. doi:10.1093/jexbot/53.370.947 tors: implications for celiac disease. Gastroenterology 128: Shewry PR, Tatham AS (1997) Disulphide bonds in wheat gluten 393–401. doi:10.1053/j.gastro.2004.11.003 proteins. J Cereal Sci 25:207–227. doi:10.1006 Mooney PD, Aziz I, Sanders DS (2013) Non-celiac gluten sensi- /jcrs.1996.0100 tivity: clinical relevance and recommendations for future Shewry PR, Halford NG, Lafiandra D (2003) Genetics of wheat research. Neurogastroent Motil 25:864–871. doi:10.1111 gluten proteins. Adv Genet 49:111–184 /nmo.12216 Stepniak D, Koning F (2006) Celiac disease-sandwiched between Noma S, Kawaura K, Hayakawa K, Abe C, Tsuge N, Ogihara Y innate and adaptive immunity. Hum Immunol 67:460–468. α (2015) Comprehensive molecular characterization of the / doi:10.1016/j.humimm.2006.03.011 β-gliadin multigene family in hexaploid wheat. Mol Gen – Sumner-Smith M, Rafalski JA, Sugiyama T, Stoll M, Söll D Genomics 291:65 77. doi:10.1007/s00438-015-1086-7 (1985) Conservation and variability of wheat α/β-gliadin Okita TW, Cheesbrough V, Reeves CD (1985) Evolution and – α− β γ genes. Nucleic Acids Res 13:3905 3916. doi:10.1093 heterogeneity of the / -type and -type gliadin DNA /nar/13.11.3905 sequences. J Biol Chem 260:8203–8213 Tamura K, Nei M, Peterson D, Filipski A, Kumar S (2013) Ozuna CV, Iehisa JCM, Gimenez MJ, Alvarez JB, Sousa C, Barro MEGA6: molecular evolutionary genetics analysis version F (2015) Diversification of the celiac disease α-gliadin com- 6.0. Mol Biol Evol 30:2725–2729. doi:10.1093 plex in wheat: a 33-mer peptide with six overlapping epi- /molbev/mst197 topes, evolved following polyploidization. Plant J 82:794– 805. doi:10.1111/tpj.12851 Vader LW, Stepniak DT, Bunnik EM, Kooy YMC, De Haan W, Pritchard JK, Stephens M, Donnelly P (2000) Inference of popu- Wouter Drijfhout J, Van Veelen PA, Koning F (2003) lation structure using multilocus genotype data. Genetics Characterization of cereal toxicity for celiac disease patients – based on protein homology in grains. Gastroenterology 125: 155:945 959 – Rashtak S, Murray JA (2012) Review article: coeliac disease, new 1105 1113. doi:10.1053/S0016-5085(03)01204-6 approaches to therapy. Aliment Pharm Ther 35:768–781. Van den Broeck HC, de Jong HC, Salentijn EMJ, Dekking L, doi:10.1111/j.1365-2036.2012.05013.x Bosch D, Hamer RJ, Gilissen LJWJ, van der Meer IM, Rewers MJ (2005) Epidemiology of celiac disease: what are the Smulders MJM (2010) Presence of celiac disease epitopes prevalence, incidence, and progression of celiac disease? in modern and old hexaploid wheat varieties: wheat breeding Gastroenterology 128:S47–S51 may have contributed to increased prevalence of celiac dis- – Salamini F, Ozkan H, Brandolini A, Schafer-Pregl R, Martin W ease. Theor Appl Genet 121:1527 1539. doi:10.1007 (2002) Genetics and geography of wild cereal domestication /s00122-010-1408-4 in the near east. Nat Rev Genet 3:429–441. doi:10.1038 Van Herpen TWJM, Goryunova SV, van der Schoot J, Mitreva M, /nrg817 Salentijn E, Vorst O, Schenk MF, van Veelen PA, Koning F, Sander I, Rozynek P, Rihs H-P,van Kampen V,Chew FT, Lee WS, van Soest LJM, Vosman B, Bosch D, Hamer RJ, Gilissen Kotschy-Lang N, Merget R, Bruning T, Raulf-Heimsoth M LJWJ, Smulders MJM (2006) Alpha-gliadin genes from the (2011) Multiple wheat flour allergens and cross-reactive car- a, B and D genomes of wheat contain different sets of celiac bohydrate determinants bind IgE in baker’s asthma. Allergy disease epitopes. BMC Genomics 7:1. doi:10.1186/1471- 66:1208–1215. doi:10.1111/j.1398-9995.2011.02636.x 2164-7-1 Sapone A, Bai JC, Ciacci C, Dolinsek J, Green PHR, Von Buren M (2001) Polymorphisms in two homeologous γ- Hadjivassiliou M, Kaukinen K, Rostami K, Sanders DS, gliadin genes and the evolution of cultivated wheat. Genet Schumann M, Ullrich R, Villalta D, Volta U, Catassi C, Resour Crop Ev 48:205– 220. doi:10.1023 Fasano A (2012) Spectrum of gluten-related disorders: con- /A:1011213228222 sensus on new nomenclature and classification. BMC Med Xie Z, Wang C, Wang K, Wang S, Li X, Zhang Z, Ma W, Yan Y 10:13. doi:10.1186/1741-7015-10-13 (2010) Molecular characterization of the celiac disease epi- Shan L, Molberg O, Parrot I, Hausch F, Filiz F, Gray GM, Sollid tope domains in α-gliadin genes in Aegilops tauschii and LM, Khosla C (2002) Structural basis for gluten intolerance hexaploid wheats (Triticum aestivum L.). Theor Appl Genet in celiac sprue. Science 297:2275–2279. doi:10.1126 121:1239–1251. doi:10.1007/s00122-010-1384-8 /science.1074129