Posted on Authorea 3 Apr 2020 | CC BY 4.0 | https://doi.org/10.22541/au.158592560.03447648 | This a preprint and has not been peer reviewed. Data may be preliminary. fseiesi rca oaodcmrmsn eerhadco rdcin ie hti snteasy not is it that Given production. crop and research lack compromising (2) , avoid authentication Therefore, to incorrect in material. crucial (1) housed old to germplasms is with due on contamination specimens mislabeled reliance (3) be our and of may increasing parameters, 2013). banks lost, morphological seed been al., diagnostic in have of et relatives seeds However, (Jia close their banks. erosion and seed genetic conserved crops of of initially forms were rounds some wild relatives several originated close undergone which their has later and wheat, and Crescent, crops bread Fertile of example, the resources loss For a Genetic in suffered ago domestication. have years crops prolonged most 8000 following however, improvement; diversity cultivar genetic for fundamental of are resources Genetic 1964). rpvreisrqie pcfi ee rmtegn olo h rpseisado t ls eaie,such relatives, close its and/or species higher-yielding crop of has the ( Creation production of rice crop pool intensification. in in crop gene increase the and semidwarfing rapid from unit the The together per as expansion. wheat, specific yields this and requires higher for varieties maize, three-fold key crop through Rice, a been largely by have 2013). achieved supported crops, Yearbook been been staple has Statistical which (FAO’s other population, some production human with crop the in in explosion expansion an global witnessed years 50 last The biodiversity. the rice clarify support to and expected germplasms, are rice results Introduction of These that use revealed mislabeled. and were identification collections the field rice-specific in chloroplast or six aid banks a The species, seed and marker. 
rice rice R22), reliable from of accessions most and concept seed the (NP78 rps16- 53 was barcodes the rpoB-trnC, latter of DNA (psaJ-rpl33, The 17% nuclear barcodes . genome DNA rice-specific conventional DNA L chloroplast two four and super trnV-ndhC), of rice-specific D genome performance and six the rufipogon. ITS), trnK-matK, evaluated O. we and rps19-rpl22, and Importantly, psbA-trnH, japonica trnQ, and extinct. minuta rbcL, was subsp. O. type (matK, sativa granulata, genome O. barcodes O. H and DNA the and indica, and meyeriana and found O. subsp. not alta, glaberrima sativa were O. O. O. types and between and grandiglumis nivara determined O. O. were longistaminata, malampuzhaensis, relationships O. O. Conspecific and identity of and glumipatula identified. species Breeding determine O. were collected to barthii, we species are genes population. O. Here, Oryza nuclear which Twenty-one world’s hypervariable misidentifications. two species, or the seeds. and issues wild genomes rice of taxonomic chloroplast in and to half complete due cultivated 58 supporting of mislabeled both world, are the from seeds the applied numerous resources Unfortunately, in genetic banks. crops seed on important in relies maintained most cultivars the quality of and one high-yielding is Oryza) (genus Rice Abstract 2020 28, April 3 2 1 Wu e Zhang Wen super and specific, conventional, barcodes Oryza: of barcoding DNA nttt fFrni cec,Mnsr fPbi Security Public of Ministry Science, Forensic of Institute Sciences University of Forestry Academy Northeast Chinese of Institute 1 uyn Yang Xueying , nsitu in 1 uh Sun Yuzhe , nterhmlnso eryaes ihitnercaaino rbeln,mr n more and more land, arable of reclamation intense With areas. 
nearby or homelands their in 3 n h-in Zhou Shi-Liang and , 1 i Liu Jia , sd-1 1 and ) hoXu Chao , Rht1 1 and 1 1 ihiZou Xinhui , Rht2 nwet(ae&Mrhl,17;Jennings, 1973; Marshall, & (Gale wheat in 1 u Chen Xun , xsitu ex nse ak worldwide banks seed in 2 aliLiu Yanlei , 1 Ping , Posted on Authorea 3 Apr 2020 | CC BY 4.0 | https://doi.org/10.22541/au.158592560.03447648 | This a preprint and has not been peer reviewed. Data may be preliminary. Oryza olce uigorfil xeiin oce pcmn fteesmlswr eoie nteherbarium Research the in Rice deposited International were the samples these were from accessions of three accessions) and specimens source Voucher (41 particular a expedition. mostly to traced field proceeded be our not They could during accessions collected Six Philippines. 1). the (Table in Institute field of the accessions from three including accessions, seed Fifty-three using of acquisition faults Seed possible seed address in also Methods housed we and rice resources barcodes, identifying Materials DNA barcode. genetic for nuclear super barcode of some genome super use employing chloroplast genome By rice chloroplast the rice the 2005) and the banks. a al., seed of seeds of effectiveness from et of half the resources seeds Wing of demonstrate characterization than genetic we study 2003; correct Here, more precious genetic Kadowaki, the banks. feeds and represent & facilitate molecular it relatives Morishima, can the and or (Vaughan, 2010) for worldwide improvement progenitors tools genetic crops genomic wild and cereal Established Its breeding important rice most 2005). relatives. for the or (Khush, progenitors of population wild ( one their world’s rice is with Asian genetically the rice perhaps of Cultivated and subspecies Chen, two morphologically Sang, the intermingled Ge, example, are Lu, For they 2016; 1989). 
), Ge, Vaughan, & 2014; Yan, al., (Liu, et persist sativa Rougerie issues 2001; some Hong, but revisions & taxonomic ( several to types subjected genome ( haploid types known the genome eight unknown regarding includes remain and disputes However, 2005) S1). Table 1989; (Vaughan, areas between or relationship subtropical species and to tropical correct genome, genus across the chloroplast the to entire to belongs the assigned Rice as finally fragments. such be gene barcodes, chloroplast can super common super banks using of require entire seed advent seeds the The in Rice between With distinguish species interest. barcodes. haplotypes. related assemblies parts super individual of and closely or DNA genome), even species genome of of a complete the in examples seeds a regions constitute between barcodes, (or includes polymorphisms discriminate genes barcode nucleotide many DNA to single super of a combinations of DNA information instances, genomes, A such enough mitochondrial address or 2015). containing “species” To chloroplast diverged al., genome slightly conventional 2011). et only a advancements, Little, (Li or other & of species proposed identify related Graham, and was closely and (Hollingsworth, barcode these extremely event 2010), super Despite of radiation al., case recent Van the et 2012). (Lahaye, a in (Sonstebo al., work from species not biodiversity Karsholt, endangered et do assess (Howard or (Huemer, barcodes invasive 2008), mixtures species DNA traded, a Savolainen, cryptic in illegally of discover & detect resolution to Duthoit, medicinal 2009), unsatisfactory used Maurin, al., to successfully of Bank, et Due is region der (Hollingsworth Kress technique A assessed 2015). 2014; the were Mutanen, al., 2015). Nowadays, schemes & et al., combination 2005), various 2009). 
et (Dong Janzen, Yan species, al., resolution between 2011; & et high discriminating al., Weigt, its in et Zimmer, to marker Li Wurdack, owing single 2010; target (Kress, al., and barcoding 2005 et (ITS)/ITS2 a (Chen in DNA plants mention combination ribosomal land two-locus first of for the their spacer 2009, Following transcribed reliable In a internal fragments. 2009). become DNA has al., short barcoding for et on DNA solution based 2003), promising DeWaard, species a & of identify Ball, offer rapidly Cywinska, to to (Hebert, technology come 2003 has in proposed barcoding First DNA morphology, materials. similar on very solely between based discriminating seeds identify to matK a eysoteouinr itr.I iegdfrom diverged It history. evolutionary short very a has ,subsp. ), + rbcL indica a eomne sacr acd o h dnicto fln lns(Hollingsworth plants land of identification the for barcode core a as recommended was A .granulata O. n subsp. and and D C Oryza and ali eoetps hc r ocoeyrltdta hycno eresolved be cannot they that related closely so are which types, genome haploid H japonica ntefml oca.Tegnscnit faot2 pce distributed species 26 about of consists genus The Poaceae. family the in and Agra,Ba,Nni un,&Kuh 99.Tegnshsbeen has genus The 1999). Khush, & Huang, Nandi, Brar, (Aggarwal, ) .meyeriana O. r aooial norc.Ai oArcnrc ( rice African to Akin incorrect. taxonomically are , 2 n between and , Leersia A , B Leersia eeaqie rmse ak rcollected or banks seed from acquired were , psbA-trnH , C .sativa O. .schweinfurthiana O. , oe1 ilo er g Go&Ge, & (Guo ago years million 14 some E , F eepooe snwbarcodes new as proposed were Kme l,20;Tn tal., et Tang 2008; al., et (Kim , G , J ycf1 , K a lopooe as proposed also was and , and .glaberrima O. .punctata O. L n two and ) O. . Posted on Authorea 3 Apr 2020 | CC BY 4.0 | https://doi.org/10.22541/au.158592560.03447648 | This a preprint and has not been peer reviewed. Data may be preliminary. 
aae otie 8clrpatgnms ersnigalrc pce 13prseis,tgte with together species), per (1~3 species rice all representing genomes, chloroplast three 58 methods. contained seed phylogenetic with 1 and using Dataset comparison, (together datasets resolution corresponding genomes delimitation, with Species chloroplast Se-Al. performed (Katoh using were 37 mafft-win manually using identification adjusted aligned with S3), and (Table combined 2013), GenBank Standley, from were & downloaded species) genomes three of chloroplast fragments chloroplast determined newly The Biosystems). calling (Applied nucleotide some preparation Analyzer correct Dataset DNA to 3730xl necessary if ABI edited an and 5.4 on Sequencher mistakes. directions using assembled both were in sequences The sequenced and 94 PEG8000 at using cycles 40 included 1 program contained mixture 5.4. reaction barcodes Sequencher PCR DNA with The rice-specific HiSeq assembled Illumina of and an references on and known together amplification using fragments sequenced PCR The and extracted fragments were primers. chloroplast Reads specific the using platform. with amplified selected Ten mixed were were X Fragments were genes sample nuclear 2008). same single-copy al., the and et variable of (Zou highly genes two candidate species, 142 polyploid from of origins the regions. determine To these sequence and amplify to all designed from 2009) Rozas, genomes & chloroplast brado all across diversity Nucleotide Lipman, & Myers, (2013). primers design Miller, barcode Zhou using and sequencing DNA Gish, Rice-specific Sanger Lin, (Altschul, by Cheng, 2.8.10 filled Xu, were blastn Dong, gaps by by (Corperation)and reported references 5.4 Sequencher closest with the assembled sorted 1990), to were Novogene assembled reads mapped Beijing genome & were were Chloroplast Wang, at genomes Yu, contigs sample platform. Wang, Ten Total the each X Li, oven. 
HiSeq for and ( Illumina convection sequenced method out an mCTAB a and using the Beijing, in constructed Ltd, using was dried Co., leaves Technology library quickly dry A and from harvested, extracted 2013). greenhouse, was Zhou, mg) a (~30 in DNA seeds the genomic from identification, raised field were or Seedlings determination names genome their chloroplast on and Based extraction DNA species. Sciences. 25 of to Academy belonged Chinese samples Botany, rice of Institute the of hsdtstwsue odlmttecrusrpino pce oehrwt aae n ue barcode super a and 6 dataset ( with 2 together Dataset species of circumscription the lengths). branch delimit (longer to of quality low used relatively was of dataset genomes or This positions) systematic (wrong genomes mislabeled aiu parsimony. Maximum rice DNA of candidate identification species these analyses reliable Phylogenetic for of identified methods 2008). barcodes resolution phylogenetic DNA al., using the chloroplast et analyzed rice-specific test was (Zou six seeds. dataset of genes to This concatenation study. 142 the methods this by from in phylogenetic formed selected was using R22) 7 Dataset and analyzed barcodes. (N78 were genes datasets nuclear The single-copy two of concatenation The barcodes. DNA Oryza Leersia . matK pce sotrus aiu asmn nlsswr are u oietf n exclude and identify to out carried were analyses parsimony Maximum outgroups. as species ,dtst3( 3 dataset ), psbA-trnH h othpraibergoswr eetda ieseicbroe.Pieswere Primers barcodes. rice-specific as selected were regions hypervariable most The . rbcL ° o 0s 55 s, 30 for C eunei nerpe by interrupted is sequence ,dtst4( 4 dataset ), enovo de × a ue ihMg with buffer Taq ° sn Pds39(akvc ta. 02.Tegenerated The 2012). al., et (Bankevich 3.9 SPAdes using o 0s n 72 and s, 30 for C psbA-trnH 3 Oryza ,addtst5(T)rpeetdconventional represented (ITS) 5 dataset and ), 2+ rps19 . MdTs n 0n N.TePCR The DNA. ng 20 and dNTPs, mM 0.1 , ° o i.PRpout eecleaned were products PCR min. 
2 for C pce a unie sn DnaSP using quantified was species nPaee aae ersne the represented 6 Dataset Poaceae. in (Li- Posted on Authorea 3 Apr 2020 | CC BY 4.0 | https://doi.org/10.22541/au.158592560.03447648 | This a preprint and has not been peer reviewed. Data may be preliminary. .punctata The O. a with 1. same with species the species The Fig. of comprising with Haplotypes consistent The The base, species. the 2). allotetraploid at of (Fig. located origins clades was the monophyletic types clarified formed genes types the nuclear genome of on species a based that Phylogeny indicating clades, the monophyletic form to not belonged did parent chloroplast maternal complete their the suggesting in parent. clades maternal and species eight their The type The was 1). type genes. (Fig. genome R22 types and genome eight minuta NP78, the O. ITS, exactly nuclear matched the phylogeny genome as well as genomes, among relationships phylogenetic The relationships phylogenetic their and MrBayes species from Rice using split outcomes rooted of The were deviation trees burn-in. standard consensus as average the discarded Results the were and when every PhyloSuite runs al., achieved sampled by of et was were up 25% (Zhang trees Stationarity summed Bayesian first PhyloSuite and were The chains. generations the 6. <0.01. 4 2,000,000 in remained dataset x run integrated frequencies ModelFinder was 2 and 2012) process with running 1 al., Carlo generations dataset Monte et by 100 chain (Fredrik for Markov selected 3.2 2017) The MrBayes Blosum+I+G Jermiin, 2020). with and & assessed was Haeseler, GTR+I+G inference Von were Wong, us- models rooted Minh, (Kalyaanamoorthy, were substitution trees best-fit The The replicates. bootstrap 1000 inference. with Bayesian assessed was model. trees GTR+I+G ML the with the ing 2014) for (Stamatakis, RAxML support using Branch performed replicate. were trees each analyses The likelihood from Maximum [?]5 replicates. 
bootstrap scores 1000 tree with likelihood. two reconnection Maximum assessed than and was more bisection trees parsimony no using tree maximum rooted replicates, with were the 100 trees search for of multiple tree support addition saving The Branch stepwise 2003). and random (Swofford, swapping, with 4.0a150 branch strategy version PAUP heuristic using a executed used was analysis parsimony Maximum eei iegnebtenseiso h aegnm yewsrte ml,ecp between except small, rather was grandiglumis type genome O. same the of teri species between divergence Genetic a are u sn h rebidn ehd n i ihrslto ein ( regions regions high-resolution 36 six these and of evaluation method, Further building diversity. tree nucleotide the on , DnaSP, using based of picked out method were window carried sliding S2) was the (Table by regions identified were 36 genomes and chloroplast the in regions hypervariable The barcodes DNA with Rice-specific clade monophyletic a formed latter the sativa O. glumipatula rps16 Leersia and D J - altpsfre ld with clade a formed haplotypes trnQ and .coarctata O. subsp. .schweinfurthiana O. pce soutgroups. as species fthe of and K ( B , rps19 ugsigta aenlprn ihthe with parent paternal a that suggesting , indica and ) .longistaminata O. rbetween or , BC Leersia BC - .schweinfurthiana O. rpl22 osgicn hools eoedvrec a bevdbetween observed was divergence genome chloroplast significant No . n subsp. and eoetp omdacaewith clade a formed type genome eoehdidpnetoiis with origins, independent had genome pce soutgroups. as species , trnK fthe of .barthii O. japonica - ncnrs,clrpatgnm iegnewsceryntdbetween noted clearly was divergence genome chloroplast contrast, In . matK BC Oryza E eoetp omdacaewt pce fthe of species with clade a formed type genome and .alta O. and , altps niaigta the that indicating haplotypes, = h omrfre oohltccaewith clade monophyletic a formed former The . .rufipogon O. 
pce eercntutdbsdo hi opeechloroplast complete their on based reconstructed were species .glaberrima O. .punctata O. trnV C , eoetp.Seiswith Species type. genome .grandiglumis O. 4 - ndhC .punctata O. . ( B H io iegnewsdtce between detected was divergence Minor . H al )wr nlycoe srice-specific as chosen finally were 2) Table , x ) .malampuzhaensis O. eoetp a xse u hnde out. died then but existed had type genome altpsfre ld needn of independent clade a formed haplotypes H .eichingeri O. eoetp a hi aenlparent. paternal their was type genome and , niaigta pce fthe of species a that indicating , D Leersia .latifolia O. eoetp safr of form a is type genome ( HJ C psaJ .malampuzhaensis O. pce soutgroups. as species ). = and .officinalis O. - fthe of rpl33 HK F C .nivara O. and eoetypes genome eoetype, genome , CD .alta O. rpoB .schlech- O. G genome genome ( C - and , trnC and and E x ) O. B . Posted on Authorea 3 Apr 2020 | CC BY 4.0 | https://doi.org/10.22541/au.158592560.03447648 | This a preprint and has not been peer reviewed. Data may be preliminary. fsesfo edbnso edcletos ie(7)msaee ape eefud(i.S) These S6). (Fig. found were samples between mislabeled confusion (17%) some accessions Nine still 53 types identify is collections. genome to there species-rich field it from used or all we banks were species, samples seed rice all from almost seeds resolved of barcode rice-specific banks the seed that from Considering samples species mislabeled the and Surprisingly, seeds of types well. Identification genome sufficiently them of phylogenetic resolved reliable species barcode extremely though sativa super but Even sites the insensitive 145,860 identify, an of 1). to using length the species exhibited aligned (Fig. all barcode an super method resolving had The power, genome included. 
chloroplast discriminating were complete outgroups highest when the characters of parsimony-informative 5048 barcode with DNA super the Finally, sativa punctata and combination marker 7943 This of except length considered. species aligned were an all outgroups had almost when and characters resolved regions parsimony-informative chloroplast 603 hypervariable with six species sites of between consisted discriminate barcode to rice-specific The failed combination marker parsimony-informative This 722 with the sites included. of 2218 were of outgroups length aligned when an characters had detected the combination was gene that the NP78+R22 copy nuclear of suggested The ITS species ITS between one on discriminate Only subjected to based those failed amplify. data from ITS to Phylogeny slightly the difficult differed from were marker species. originated this sequences allotetraploid for the several used because in characters samples markers parsimony-informative 162 The chloroplast with sites to considered. 713 of were length aligned outgroups an when had 5.8S) (including ITS nuclear The sativa O. partial and outgroups the of The species between discriminate to failed also S2). marker This the considered. of were species between discriminate The to failed marker This included. were The molecular species, delimited incorrectly or pairs: and narrowly species of following Because the nearly 2004), between of barcode. discriminate sample phylogenies super a cannot parsimonious of the assign markers maximum presence to and the way as the on barcodes, such reliable by based markers, most specific were molecular tested the different comparisons is are using following markers methods samples the identical various Phylogenetic and the type. species genome of a per resolution to species The one than them. 
more between discriminate the to within markers types barcodes genome DNA different super The and rice-specific, conventional, of variable more powers and Discrimination diversity nucleotide higher displayed markers above than the sites While barcodes. DNA chloroplast .glumipatula O. matK .meyeriana O. rbcL psbA-trnH subsp. A subsp. .barthii O. , n mn tetraploid among and , eehda lge egho 48stswt 0prioyifraiecaatr hnoutgroups when characters parsimony-informative 50 with sites 1428 of length aligned an had gene subsp. rbcL B eehda lge egho 47stswt 0prioyifraiecaatr hnoutgroups when characters parsimony-informative 90 with sites 1417 of length aligned an had gene japonica , indica C vrl,teetoprmtr eemc ihri ula akr Tbe2). (Table markers nuclear in higher much were parameters two these overall, ; einhda lge egho 1 ie ih1 asmn-nomtv hrceswhen characters parsimony-informative 10 with sites 515 of length aligned an had region and indica , H Gn,Broe,&L,2000), Lu, & Borromeo, (Gong, iial,i the in Similarly, . F and , .glaberrima O. and , and rps19 eoetp Fg 4,afidn o upre yteohrtoncergns The genes. nuclear two other the by supported not finding a S4), (Fig. type genome Fg S3). (Fig. .sativa O. .nivara O. J eeicue.Ti akrcudscesul dniyonly identify successfully could marker This included. were eoetps(i.S5). (Fig. types genome .alta O. .punctata O. Wn ta. 2014), al., et (Wang subsp. + .rufipogon O. C Oryza .sativa O. , eoetp,teei ofso ewe ili n tetraploid and diploid between confusion is there type, genome .latifolia O. japonica and eu aegnrlydvre ucetyfrms molecular most for sufficiently diverged generally have genus A A subsp. and .minuta O. and .minuta O. and and 5 C and , .nivara O. .glumipatula O. indica .rufipogon O. C hswsntsrrsn,a nthe in as surprising, not was This . matK eoetypes. genome A .minuta O. fthe of and and eesprbeuigtesprbarcode. super the using separable were n between and , , .alta O. rbcL C .malampuzhaensis O. B . 
r eycoeyrltdaddifficult and related closely very are eoetp Fg S6). (Fig. type genome and . , psbA-trnH A and .longistaminata O. , B .grandiglumis O. and , A .glaberrima O. , B T,N7+2,rice- NP78+R22, ITS, , C and , H .brachyantha O. eoe Fg S1). (Fig. genomes .rufipogon O. , or .nivara O. C A J eoe (Fig. genomes eoetypes genome , eoetype, genome ( .granulata O. Bo&Ge, & (Bao .barthii O. and + and O. O. O. ) Posted on Authorea 3 Apr 2020 | CC BY 4.0 | https://doi.org/10.22541/au.158592560.03447648 | This a preprint and has not been peer reviewed. Data may be preliminary. eoetpsaemnseicadrltvl sltdfo te pce.Seispishv enfound been have pairs Species species. the other on from based isolated Phylogeny the relatively of incomplete. and is species monospecific species between that all are indicates of types 1) phylogeny been genome have the (Fig. efforts far, genome considerable So chloroplast Although remain. controversies barcoding. of some DNA taxonomy and for the prerequisite on a made is delimitation species Correct rice of taxonomy and delimitation Species Discussion a ieec o eakbegntcdvrec.Afis ntneo ofso novsteArcncultivated African the involves confusion morphologi- obvious of neither instance exhibit first pairs A species Brozynska, divergence. (Wambugu, Some rice event genetic radiation remarkable 2014). a nor al., by difference et period cal Zhang short 2015; a Henry, within & diverged Waters, Furtado, species the these within All common species. very between is material plant of Misidentification within it merging to compared of parent instead maternal species different a with allotetraploid an and , Guinea. the New of the that Papua species type, or among revealed exist Indonesia 2) problems Jaya, identification Irian (Fig. Major in marker somewhere N78+R22 originating nuclear species now-extinct the a on the to based belonging Phylogeny Species 2003). Ge, & the (Lu type genome the of usind u oeua hlgne n lotalpeiu tde uha htb abg tal. 
et very Wambugu is It by own. that been their of as never species such has wild closest studies status significantly the previous have taxonomic and that differ subspecies all their now range cultivated isolated, two almost clear 2006), mountain the reproductively and that Himalayan Schaal, confirmed phylogenies are the & have (2015) molecular subspecies in Chiang, Our two separately Hung, the domesticated Chiang, questioned. were Although (Londo, and China physiology, names. southern and naked morphology of in spite in rice Hence, , cultivated Asian differentiation. the niche rufipogon involves O. ecological case without confusing al., field second et the A Wang of in 2011; synonym al., side a et by considered (Li side genes be grow nuclear should on often based They results similar 2014). confirms which genomes, chloroplast their and tata species African the for contrast, between bridization the Within 426597). of instead Douady, latter the (Lefebure, between delimitation difference between species morphological observed trivial was and divergence divergence genome chloroplast molecular Little mis 1999). between 2006). Hong, correlation Gibert, & & general Lu, Gouy, a Sang, Ge, is 2004; There Ge, & (Bao results earlier the confirming that indicates Phylogeny 2). as identified was BOP022669 L n hi oseicntr a ugse y(a e 04 ae nncergns osdrn the Considering genes. nuclear on based 2004) Ge, & (Bao by suggested was nature conspecific their and .glaberrima O. .minuta O. sptra aet osdrn ninfiatmrhlgcldffrne between differences morphological insignificant Considering parent. paternal as .grandiglumis O. eoeddnteit while exist, not did genome HJ .meyeriana O. D eoetp,adbetween and type, genome BC eoetp sfudol nSuhadCnrlAeia pce,sc as such species, American Central and South in only found is type genome h sa utvtdrc a iie notosbpce,subsp. subspecies, two into divided was rice cultivated Asian The . h omrcudb eadda yoy ftelte.Gvnthat Given latter. 
the of synonym a as regarded be could former the , .sativa O. eoetp,tetoAinspecies Asian two the type, genome n t idprogenitor wild its and .latifolia O. .punctata O. with , and HJ subsp. Oryza .neocaledonica O. and .latifolia O. CCDD ev,a o xml n“h ln it (http://www.theplantlist.org/tpl1.1/record/kew- List” Plant “The on example for as Desv., .schweinfurthiana O. D smtra aetand parent maternal as HK indica osnu a o enrahdo h ubro pce ntegenus the in species of number the on reached been not has consensus , eoetp svr ieyavrato the of variant a likely very is type genome .glaberrima O. .punctata O. .coarctata O. eoetpssaeacmo aenlpoeio ihthe with progenitor paternal common a share types genome eoe.Itrsigy the Interestingly, genomes. .coarctata O. .alta O. sdmsiae from domesticated is n omdacaewith clade a formed and .barthii O. fthe of and . eogdt the to belonged rawl type. wild a or .grandiglumis O. G fthe of oovosgntcdvrec a apndbetween happened has divergence genetic obvious No . .malampuzhaensis O. , 6 .eichingeri O. eoetp,between type, genome .officinalis O. A .sativa O. , KL E B A and , ( .nivara O. .minuta O. .australiensis O. eoetp u oicretdiscrimination incorrect to due type genome eoetp and type genome .australiensis O. D HK C h omrbcmsotnasnnmof synonym a often becomes former the , evda aenlprn and parent maternal as served sptra aet(o ta. 05.In 2015). al., et (Zou parent paternal as eoetp sltdfo h sample the from isolated type genome n t idprogenitors wild its and eoetps swt the with As types. genome ahrta the than rather n that and tsol ecniee distinct a considered be should it , and E .longiglumis O. .minuta O. eoetp,i not if type, genome and ) indica fthe of .schlechteri O. .sativa O. .alta O. .schweinfurthiana O. F .malampuzhaensis O. E .alta O. n subsp. and ( KL eoetp (Fig. type genome .brachyantha O. rgntdb hy- by originated and subsp. and .nivara O. eoetype. genome .grandiglu- O. , H .latifolia O. O H fthe of .ridleyi O. .punc- O. 
japonica japonica genome, . E genome barthii itself, HK and is ) Posted on Authorea 3 Apr 2020 | CC BY 4.0 | https://doi.org/10.22541/au.158592560.03447648 | This a preprint and has not been peer reviewed. Data may be preliminary. h ula T ffre iia eouina ovninlclrpatrgos lhuhteeae10 are there Although regions. chloroplast conventional as and resolution in similar species afforded of allotetraploid insertion ITS the nuclear to due The probably is This species. with identifiable spacer one complementation. only without to with species same failed poorly the barcodes almost Both resolved barcodes resolvable. both The in >75%) because values satisfactory situation, (bootstrap the improve barely reliably not the were was of species species performance 21 between Their the discriminate analyses of phylogenetic half resolution. conducted than Fewer and species perform regions regions for these Chloroplast the suitability extracted Generally, 2005). we their al., Here, et evaluate Kress groups. to 2009; plant al., different et in (Hollingsworth plants differently higher for barcodes DNA ( regions chloroplast Three rice of barcodes DNA Conventional S1). text ing exhibited into labeled which Tateoka incorrectly cultivar, many in synonymizing new resulting the After a progenitors, wild regarding GenBank. as their in problems hybrids and deposited rice taxonomic F2 of being created fertile kinds sequences two however, produce and the hybridization, of artificially to identification Subsequent broken used correct was was vigor. subspecies rice hybrid these hybrid between considerable isolation F1 reproductive fertile The partially China. province, Hainan in sativa to accordingly should changed subspecies nivara be two the to have Therefore, , name. autonomous an as indica with renamed subspecies and detached cultivated be this in retained be from comes ee eotdpeiul.Smlry nythe only Similarly, previously. 
of reported that never to similar was them ihpyoeei ehd.Oigt h igecp aueo hools ee,mttosi chloroplast in mutations genes, even chloroplast identifiable of are nature species single-copy rice the all to and Owing them between methods. types. discriminate phylogenetic genome to with significantly same variations diverged the enough have of accumulated to species the have rice-specific between genes of as discriminate two here species could the served they Although the for why (2008) species, unlikely explaining tetraploid is al. thus in It et species, copies different Zou at multiple well. in by from well sufficiently arising tested work performed difficulties genes marker not sequencing nuclear combined do Despite 142 barcodes barcodes. from DNA DNA out nuclear conventional picked the and R22) to belonging and time those (NP78 short for a especially such level, species within accumulated has variation the in species Most barcodes DNA Rice-specific as such tetraploids formed types. newly Only tetraploids. old the of schweinfurthiana O. the whereas , psbA-trnH .schlechteri O. BC subsp. (syn. (syn. and .rufipogon O. .sativa O. .nivara O. rps19 Oryza indica matK B negncsae,oeo h otvral ein nclrpatgnms efre similarly performed genomes, chloroplast in regions variable most the of one spacer, intergenic CD eoetp defined type genome ( sequences. Oryza HK (= and ) eoetps hc spoal eas fcnetdeouino h T nrelatively in ITS the of evolution concerted of because probably is which types, genome eeoeshge eouinthan resolution higher offers gene .longistaminata O. h eune eoie nGnakicueol n ido eunefrspecies for sequence of kind one only include GenBank in deposited sequences The . subsp. .I 90 aeseieitrpcfi yrdbetween hybrid interspecific sterile male a 1970, In ). .coarctata O. .Hwvr w id fsqecswr bevdin observed were sequences of kinds two However, ). A eas h yeof type the Because . Oryza eu aea vltoayhsoyo nyafwmlinyas eylmtdgenetic limited Very years. 
million few a only of history evolutionary an have genus .sativa O. matK , .ridleyi, O. .sativa O. B indica or nyoegnm a eetdin detected was genome one only , , psbA-trnH C A subsp. ob) 1seisaenwrcgie nthe in recognized now are species 21 Roxb.), .sativa O. .Tenmso hi idprogenitors, wild their of names The ). , eoetpsaevr lsl eae,cmlt hools genomes chloroplast complete related, closely very are types genome .malampuzhaensis O. B n h te a iia ota of that to similar was other the and subsp. under and , sativa and , .sativa O. C C sativa .glumipatula O. subsp. eoetp a ofimdin confirmed was type genome eoetps oevr obnto of combination a Moreover, types. genome A rbcL (= , B 7 (syn. .sativa O. rbcL .schweinfurthiana O. n n ula ein(T)rpeetconventional represent (ITS) region nuclear one and ) rufipogon eog to belongs and , .sativa O. Both . u in but , C n including and eoetps h w otvral genes variable most two The types. genome subsp. (syn. .sativa O. B Oryza subsp. .coarctata O. and .rufipogon O. japonica C tddntprommc better. much perform not did it , .brachyantha O. japonica ananboth maintain subsp. .nivara O. eoetpswr eetdin detected were types genome otrsacoarctata Porteresia .longiglumis O. .alta O. .nivara O. a icvrda farm a at discovered was ) rps19 ( japonica KL and ) Oryza and ) and subsp. hc elcdthe replaced which , ), .ridleyi O. and .nivara O. .nivara O. B matK eu (support- genus .grandiglumis O. phenomenon a , , .sativa O. and ( indica HJ .rufipogon O. + C ,oeof one ), rbcL (Roxb.) genome ( Oryza. subsp. subsp. (= HJ must did O. ), Posted on Authorea 3 Apr 2020 | CC BY 4.0 | https://doi.org/10.22541/au.158592560.03447648 | This a preprint and has not been peer reviewed. Data may be preliminary. og . u . i . u,J,Zo . h,S,...Zo,S 21) c1 h otpoiigplastid promising most the Ycf1, (2015). S. Zhou, . . . S., Shi, Y., Zuo, plants. J., made land Sun, genomes of C., plastid barcode Li, angiosperm DNA saxifragales. 
Owing to the single-copy nature of chloroplast genes, mutations in chloroplast genomes become fixed and spread more quickly than those in nuclear genomes. Such mutations may not reflect a true phylogeny but are powerful for species discrimination. The adequate performance of the complete chloroplast genome for species identification does not imply that it should be used as a routine in plant material identifications. There are some sensible shortcuts one can take, as a very large proportion of the chloroplast genome does not contribute as much to species discrimination. The most variable regions could be an epitome of the whole genome. Here, six hypervariable regions in the genome were selected and their combination served as rice-specific DNA barcodes. This epitome worked almost as well as the entire genome in terms of species discrimination.

Identification of the rice seeds in seed banks or field collections

Although some morphological characteristics of rice seeds can be used successfully for seed identification, it is difficult even for taxonomists to apply them correctly, and there are species whose seeds are very difficult to identify by seed morphology only. This explains why wrong seeds were occasionally distributed to users. Here, we show that 17% of seeds were mislabeled, a figure high enough to deserve serious consideration. Although no algorithm has improved the assignment of specimens to species (Spouge & Marino-Ramirez, 2012), our findings suggest that phylogenetic methods offer the most reliable but also the least sensitive approach in this respect. At species level, samples in a monophyletic clade with a reasonable bootstrap support belong to the same species.

Acknowledgement

We thank Wenpan Dong for guidance on chloroplast genome assembly. This study was partly supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDA 19050303 & XDA 23080204, and the Fundamental Research Funds for the Central Public-Service Research Institute [2018JB001].

References

Aggarwal, R. K., Brar, D. S., Nandi, S., Huang, N., & Khush, G. S. (1999). Phylogenetic relationships among Oryza species revealed by AFLP markers. Theoretical and Applied Genetics, 98(8), 1320-1328. https://doi.org/10.1007/s001220051198

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403-410. https://doi.org/10.1016/S0022-2836(05)80360-2

Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., . . . Pevzner, P. A. (2012). SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology, 19(5), 455-77. https://doi.org/10.1089/cmb.2012.0021

Bao, Y., & Ge, S. (2004). Origin and phylogeny of Oryza species with the CD genome based on multiple-gene sequence data. Plant Systematics and Evolution, 249, 55-66. https://doi.org/10.1007/s00606-004-0173-8

Chen, S., Yao, H., Han, J., Liu, C., Song, J., Shi, L., . . . Leon, C. (2010). Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS ONE, 5(1), e8613. https://doi.org/10.1371/journal.pone.0008613

Dong, W., Xu, C., Cheng, T., Lin, K., & Zhou, S. (2013). Sequencing angiosperm plastid genomes made easy: A complete set of universal primers and a case study on the phylogeny of saxifragales. Genome Biology and Evolution, 5(5), 985-997. https://doi.org/10.1093/gbe/evt063

Dong, W., Xu, C., Li, C., Sun, J., Zuo, Y., Shi, S., . . . Zhou, S. (2015). Ycf1, the most promising plastid DNA barcode of land plants. Scientific Reports, 5, 8348. https://doi.org/10.1038/srep08348

Gale, M. D., & Marshall, G. A. (1973). Insensitivity to Gibberellin in Dwarf Wheats. Annals of Botany, 37(152), 729-735. https://doi.org/10.1093/oxfordjournals.aob.a084741

Ge, S., Sang, T., Lu, B. R., & Hong, D. Y. (1999). Phylogeny of rice genomes with emphasis on origins of allotetraploid species. Proceedings of the National Academy of Sciences of the United States of America, 96(25), 14400-5. https://doi.org/10.2307/121416

Gong, Y., Borromeo, T., & Lu, B. R. (2000). A biosystematic study of the Oryza meyeriana complex (Poaceae). Plant Systematics and Evolution, 224, 135-151. https://doi.org/10.1007/bf00986339

Guo, Y. L., & Ge, S. (2005). Molecular phylogeny of Oryzeae (Poaceae) based on DNA sequences from chloroplast, mitochondrial, and nuclear genomes. American Journal of Botany, 92(9), 1548-1558. https://doi.org/10.2307/4126139

Hebert, P. D., Cywinska, A., Ball, S. L., & Dewaard, J. R. (2003). Biological identifications through DNA barcodes. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270(1512), 313-321. https://doi.org/10.1098/rspb.2002.2218

Hollingsworth, M. L., Clark, A. A., Forrest, L. L., Richardson, J., Pennington, R. T., Long, D. G., . . . Hollingsworth, P. M. (2009). Selecting barcoding loci for plants: evaluation of seven candidate loci with species-level sampling in three divergent groups of land plants. Molecular Ecology Resources, 9(2), 439-457. https://doi.org/10.1111/j.1755-0998.2008.02439.x

Hollingsworth, P. M., Graham, S. W., & Little, D. P. (2011). Choosing and using a plant DNA barcode. PLoS ONE, 6(5), e19254. https://doi.org/10.1371/journal.pone.0019254

Howard, C., Socratous, E., Williams, S., Graham, E., Fowler, M. R., Scott, N. W., . . . Slater, A. (2012). PlantID - DNA-based identification of multiple medicinal plants in complex mixtures. Chinese Medicine (United Kingdom), 7(1), 18. https://doi.org/10.1186/1749-8546-7-18

Huemer, P., Karsholt, O., & Mutanen, M. (2014). DNA barcoding as a screening tool for cryptic diversity: An example from Caryocolum, with description of a new species (Lepidoptera, Gelechiidae). ZooKeys, (404), 91-101. https://doi.org/10.3897/zookeys.404.7234

Jennings, P. R. (1964). Plant Type as a Rice Breeding Objective 1. Crop Science, 4(1), 13-15. https://doi.org/10.2135/cropsci1964.0011183X000400010005x

Jia, J., Zhao, S., Kong, X., Li, Y., Zhao, G., He, W., . . . Mao, L. (2013). Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature, 496(7443). https://doi.org/10.1038/nature12028

Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., Von Haeseler, A., & Jermiin, L. S. (2017). ModelFinder: Fast model selection for accurate phylogenetic estimates. Nature Methods, 14(6), 587-589. https://doi.org/10.1038/nmeth.4285

Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution, 30(4), 772-780. https://doi.org/10.1093/molbev/mst010

Khush, G. S. (2005). What it will take to Feed 5.0 Billion Rice consumers in 2030. Plant Molecular Biology, 59(1), 1-6. https://doi.org/10.1007/s11103-005-2159-5

Kim, H., Hurwitz, B., Yu, Y., Collura, K., Gill, N., SanMiguel, P., . . . Wing, R. A. (2008). Construction, alignment and analysis of twelve framework physical maps that represent the ten genome types of the genus Oryza. Genome Biology, 9(2), R45. https://doi.org/10.1186/gb-2008-9-2-r45

Kress, W. J., Erickson, D. L., Jones, F. A., Swenson, N. G., Perez, R., Sanjur, O., & Bermingham, E. (2009). Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proceedings of the National Academy of Sciences of the United States of America, 106(44), 18621-6. https://doi.org/10.1073/pnas.0909820106

Kress, W. J., Wurdack, K. J., Zimmer, E. A., Weigt, L. A., & Janzen, D. H. (2005). Use of DNA barcodes to identify flowering plants. Proceedings of the National Academy of Sciences of the United States of America, 102(23), 8369-8374. https://doi.org/10.1073/pnas.0503123102

Lahaye, R., Van der Bank, M., Maurin, O., Duthoit, S., & Savolainen, V. (2008). A DNA barcode for the flora of the Kruger National Park (South Africa). South African Journal of Botany, 74(2), 370-1. https://doi.org/10.1016/j.sajb.2008.01.073

Lefebure, T., Douady, C. J., Gouy, M., & Gibert, J. (2006). Relationship between morphological taxonomy and molecular divergence within Crustacea: Proposal of a molecular threshold to help species delimitation. Molecular Phylogenetics and Evolution, 40(2), 435-447. https://doi.org/10.1016/j.ympev.2006.03.014

Li, D. Z., Gao, L. M., Li, H. T., Wang, H., Ge, X. J., Liu, J. Q., . . . Duan, G. W. (2011). Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proceedings of the National Academy of Sciences of the United States of America, 108(49), 19641-19646. https://doi.org/10.1073/pnas.1104551108

Li, J., Wang, S., Yu, J., Wang, L., & Zhou, S. L. (2013). A modified CTAB protocol for plant DNA extraction. Chinese Bulletin of Botany, 48(1), 72-78. https://doi.org/10.3724/SP.J.1259.2013.00072

Li, X., Yang, Y., Henry, R. J., Rossetto, M., Wang, Y., & Chen, S. (2015). Plant DNA barcoding: from gene to genome. Biological Reviews of the Cambridge Philosophical Society, 90(1), 157-66. https://doi.org/10.1111/brv.12104

Librado, P., & Rozas, J. (2009). DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics, 25(11), 1451-1452. https://doi.org/10.1093/bioinformatics/btp187

Liu, J., Yan, H. F., & Ge, X. J. (2016). The use of DNA barcoding on recently diverged species in the genus Gentiana (Gentianaceae) in China. PLoS ONE, 11(4), e0153008. https://doi.org/10.1371/journal.pone.0153008

Londo, J. P., Chiang, Y. C., Hung, K. H., Chiang, T. Y., & Schaal, B. A. (2006). Phylogeography of Asian wild rice, Oryza rufipogon, reveals multiple independent domestications of cultivated rice, Oryza sativa. Proceedings of the National Academy of Sciences of the United States of America, 103(25), 9578-83. https://doi.org/10.1073/pnas.0603152103

Lu, B. R., & Ge, S. (2003). Oryza coarctata: the name that best reflects the relationships of Porteresia coarctata (Poaceae: Oryzeae). Nordic Journal of Botany, 23(5), 555-558. https://doi.org/10.1111/j.1756-1051.2003.tb00434.x

Lu, B. R., Ge, S., Sang, T., Chen, J. K., & Hong, D. Y. (2001). The current taxonomy and perplexity of the genus Oryza (Poaceae). Journal of Systematics and Evolution, 39(4), 373-388.

Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D. L., Darling, A., Höhna, S., . . . Huelsenbeck, J. P. (2012). MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space. Systematic Biology, 61(3), 539-42. https://doi.org/10.1093/sysbio/sys029

Rougerie, R., Kitching, I. J., Haxaire, J., Miller, S. E., Hausmann, A., & Hebert, P. D. N. (2014). Australian Sphingidae - DNA barcodes challenge current species boundaries and distributions. PLoS ONE, 9(7), e101108. https://doi.org/10.1371/journal.pone.0101108

Sonstebo, J. H., Gielly, L., Brysting, A. K., Elven, R., Edwards, M., Haile, J., . . . Brochmann, C. (2010). Using next-generation sequencing for molecular reconstruction of past Arctic vegetation and climate. Molecular Ecology Resources, 10(6), 1009-18. https://doi.org/10.1111/j.1755-0998.2010.02855.x

Spouge, J. L., & Marino-Ramirez, L. (2012). The practical evaluation of DNA barcode efficacy. Methods in Molecular Biology, 858, 365-77. https://doi.org/10.1007/978-1-61779-591-6_17

Stamatakis, A. (2014). RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9), 1312-3. https://doi.org/10.1093/bioinformatics/btu033

Swofford, D. L. (2003). PAUP*: Phylogenetic Analysis Using Parsimony (and other methods). Sinauer Associates, Sunderland, Massachusetts, USA. https://doi.org/10.1002/0471650129.dob0522

Tang, L., Zou, X. H., Achoundong, G., Potgieter, C., Second, G., Zhang, D. Y., & Ge, S. (2010). Phylogeny and biogeography of the rice tribe (Oryzeae): evidence from combined analysis of 20 chloroplast fragments. Molecular Phylogenetics and Evolution, 54(1), 266-277. https://doi.org/10.1016/j.ympev.2009.08.007

Vaughan, D. A. (1989). The genus Oryza L. current status of taxonomy. IRRI Research Paper Series.

Vaughan, D. A., Morishima, H., & Kadowaki, K. (2003). Diversity in the Oryza genus. Current Opinion in Plant Biology, 6(2), 139-142. https://doi.org/10.1016/S1369-5266(03)00009-8

Wambugu, P. W., Brozynska, M., Furtado, A., Waters, D. L., & Henry, R. J. (2015). Relationships of wild and domesticated rices (Oryza AA genome species) based upon whole chloroplast genome sequences. Scientific Reports, 5, 13957. https://doi.org/10.1038/srep13957

Wang, M., Yu, Y., Haberer, G., Marri, P. R., Fan, C., Goicoechea, J. L., . . . Wing, R. A. (2014). The genome sequence of African rice (Oryza glaberrima) and evidence for independent domestication. Nature Genetics, 46(9), 982-8. https://doi.org/10.1038/ng.3044

Wing, R. A., Ammiraju, J. S. S., Luo, M., Kim, H. R., Yu, Y., Kudrna, D., . . . Jackson, S. (2005). The Oryza map alignment project: The golden path to unlocking the genetic potential of wild rice species. Plant Molecular Biology, 59(1), 53-62. https://doi.org/10.1007/s11103-004-6237-x

Yan, H. F., Liu, Y. J., Xie, X. F., Zhang, C. Y., Hu, C. M., Hao, G., & Ge, X. J. (2015). DNA barcoding evaluation and its taxonomic implications in the species-rich genus Primula L. in China. PLoS ONE, 10(4), e0122903. https://doi.org/10.1371/journal.pone.0122903

Zhang, D., Gao, F., Jakovlić, I., Zou, H., Zhang, J., Li, W. X., & Wang, G. T. (2020). PhyloSuite: An integrated and scalable desktop platform for streamlined molecular sequence data management and evolutionary phylogenetics studies. Molecular Ecology Resources, 20(1), 348-355. https://doi.org/10.1111/1755-0998.13096

Zhang, Q. J., Zhu, T., Xia, E. H., Shi, C., Liu, Y. L., Zhang, Y., . . . Gao, L. Z. (2014). Rapid diversification of five Oryza AA genomes associated with rice adaptation. Proceedings of the National Academy of Sciences of the United States of America, 111(46), E4954-62. https://doi.org/10.1073/pnas.1418307111

Zou, X. H., Du, Y. S., Tang, L., Xu, X. W., Doyle, J. J., Sang, T., & Ge, S. (2015). Multiple origins of BBCC allopolyploid species in the rice genus (Oryza). Scientific Reports, 5, 14876. https://doi.org/10.1038/srep14876

Zou, X. H., Zhang, F. M., Zhang, J. G., Zang, L. L., Tang, L., Wang, J., . . . Ge, S. (2008). Analysis of 142 genes resolves the rapid diversification of the rice genus. Genome Biology, 9(3), R49. https://doi.org/10.1186/gb-2008-9-3-r49

Author Contributions

Designed the study: SZ, XY
Provided plant materials: XZ
Performed the experiment: WZ, YS, JL, CX, XC, YL
Analyzed the data: SZ, WZ, YZ, PW
Wrote the paper: SZ, WZ, YS
All authors commented on the manuscript and approved the final version.

Data Accessibility

DNA sequences: GenBank accession numbers are awaiting submission and will be attached later. All additional information is available in the manuscript.

Tables

Table 1. Plant materials of Oryza sampled in this study with three Leersia species as outgroups.

Table 2. Primers designed to amplify six chloroplast variable regions and two nuclear genes. Information of nucleotide diversity and expected lengths, and number of sites of the eight markers together with conventional DNA barcodes and the chloroplast genome super barcode were listed.

Table S1. Generally accepted species in Oryza with chromosome number, genome type and distribution information.

Table S2. Initial variability evaluation of the chloroplast genomes of Oryza species and 36 variable regions. S: variable sites; h: number of haplotypes; Pi: nucleotide diversity; k: average number of nucleotide differences.

Table S3. Chloroplast genomes of Oryza and Leersia species downloaded from GenBank.

Figures

Figure 1. The maximum likelihood strict consensus tree based on the complete chloroplast genome sequences of all species in Oryza. The figures beside branches are bootstrap values of maximum parsimony, maximum likelihood and Bayesian analyses. Genome types are given in bold capital letters on the right side.

Figure 2. The maximum likelihood strict consensus tree based on concatenated sequences of nuclear NP78 and R22 genes of all species in Oryza. The figures beside branches are bootstrap values of maximum parsimony, maximum likelihood and Bayesian analyses. Haplotypes are given in bold capital letters on the right side.

Figure S1. The maximum parsimony strict consensus tree based on the conventional DNA barcode matK sequences of all species in Oryza, demonstrating the resolution of the marker. The figures beside branches are bootstrap values.

Figure S2. The maximum parsimony strict consensus tree based on the conventional DNA barcode rbcL sequences of all species in Oryza, demonstrating the resolution of the marker. The figures beside branches are bootstrap values.

Figure S3. The maximum parsimony strict consensus tree based on the conventional DNA barcode psbA-trnH sequences of all species in Oryza, demonstrating the resolution of the marker. The figures beside branches are bootstrap values.

Figure S4. The maximum parsimony strict consensus tree based on the conventional DNA barcode ITS sequences of all species in Oryza, demonstrating the resolution of the marker. The figures beside branches are bootstrap values.

Figure S5. The maximum parsimony strict consensus tree based on the rice-specific nuclear DNA barcode NP78 + R22 sequences of all species in Oryza, demonstrating the resolution of the marker. The figures beside branches are bootstrap values.

Figure S6. The maximum parsimony strict consensus tree based on the rice-specific chloroplast DNA barcode (concatenated six hypervariable regions) sequences of all species in Oryza, demonstrating the resolution of the marker. The figures beside branches are bootstrap values and the genome types are given in bold capital letters on the right side.
[Figure 1. Tree image; genome types labeled FF, GG, AA, BB/BBCC, CC/CCBB/CCDD, EE, HHKK and HHJJ.]

[Figure 2. Tree image; scale bar: 10 changes; haplotypes labeled A, B, C, E, F, G, H, J and K.]

Hosted file
Table 1. Materials.docx available at https://authorea.com/users/308394/articles/439407-dna-barcoding-of-oryza-conventional-specific-and-super-barcodes

Hosted file
Table 2. Primer.docx available at https://authorea.com/users/308394/articles/439407-dna-barcoding-of-oryza-conventional-specific-and-super-barcodes
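The legend of Table S2 names four variability statistics: S (variable sites), h (number of haplotypes), Pi (nucleotide diversity) and k (average number of nucleotide differences). As an illustration only, the sketch below shows one common way to compute them from a small alignment. It is not the authors' pipeline: the function name `diversity_stats`, the toy sequences, and the choice to skip every alignment column containing a gap or ambiguous base are all assumptions made for this example.

```python
from itertools import combinations


def diversity_stats(seqs):
    """Return (S, h, k, pi) for a list of aligned, equal-length sequences.

    Assumption for this sketch: any alignment column that contains a
    character other than A, C, G or T in at least one sequence is ignored.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    seqs = [s.upper() for s in seqs]
    # Keep only columns where every sequence has an unambiguous base.
    valid = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if not valid:
        raise ValueError("no gap-free columns to analyse")
    cleaned = ["".join(s[i] for i in valid) for s in seqs]
    n_sites = len(valid)

    # S: number of columns with more than one base.
    S = sum(1 for j in range(n_sites) if len({c[j] for c in cleaned}) > 1)
    # h: number of distinct sequences (haplotypes).
    h = len(set(cleaned))
    # k: mean number of differences over all sequence pairs.
    pairs = list(combinations(cleaned, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Pi: k expressed per analysed site.
    pi = k / n_sites
    return S, h, k, pi


# Hypothetical toy alignment, not data from this study.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "AC-TACGT"]
S, h, k, pi = diversity_stats(toy)
```

On real data the same statistics are normally taken from a dedicated program such as DnaSP, which is cited in the reference list; the sketch is only meant to make the four definitions concrete.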