
Posted on Authorea 11 Sep 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.159986559.90881792 | This a preprint and has not been peer reviewed. Data may be preliminary.

A high-continuity genome assembly of Chinese flowering cabbage (Brassica rapa var. parachinensis) provides new insights into Brassica genome structure evolution

Guangguang Li, Hailong Ren, Xiuchun Dai, Yansong Zheng, Ding Jiang, Yi Liao, Juntao Wang, Hua Zhang, and Changming Chen

Guangzhou Institute of Agriculture Science, Guangzhou, 510308, China
Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (South China), Ministry of Agriculture and Rural Affairs, College of Horticulture, South China Agricultural University, Guangzhou, 510642, P.R. China
Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, 92697, USA

September 11, 2020

Running title: A chromosome-level genome assembly of Chinese flowering cabbage

Correspondence should be addressed to Changming Chen, [email protected] and Hua Zhang, [email protected]

Abstract

Chinese flowering cabbage (Brassica rapa var. parachinensis) is a popular and widely cultivated leaf vegetable crop in Asia. Here, we performed a high quality de novo assembly of the 384 Mb genome of 10 chromosomes of a typical cultivar of Chinese flowering cabbage with an integrated approach using PacBio, Illumina, and Hi-C technology. We modeled 47,598 protein-coding genes in this analysis and annotated 52% (205.9/384) of its genome as repetitive sequences, including 17% in DNA elements and 22% in long terminal repeat (LTR) retrotransposons. Phylogenetic analysis reveals that the genome of the Chinese flowering cabbage has a closer evolutionary relationship with the AA diploid progenitor of the allotetraploid species, Brassica juncea. Comparative genomic analysis of Brassica species with different subgenome types (A, B and C) reveals that the pericentromeric regions on chromosomes 5 and 6 of the AA genome have been significantly expanded compared to the orthologous genomic regions in the BB and CC genomes, largely driven by LTR-retrotransposon amplification. This lineage-specific expansion may play a role in the genome divergence within the Brassica genus. Furthermore, we found that a large amount of structural variations (SVs) identified within B. rapa lines could impact coding genes, suggesting the functional significance of SVs in Brassica genome evolution. Overall, our high-quality genome assembly of the Chinese flowering cabbage provides a valuable genetic resource for deciphering the genome evolution of Brassica species, and it can potentially serve as the reference genome guiding the molecular breeding practice of B. rapa crops.

Keywords: Chinese flowering cabbage; Brassica rapa var. parachinensis; Brassica genome structure evolution; genome assembly; PacBio; Hi-C

Introduction

Brassica, which belongs to the Brassicaceae family, is among the most economically important genera, since it contains a wide range of staple and oilseed crops. Over the course of its evolution, the Brassica genus experienced an additional genome-wide triplication (WGT) event after its split from Arabidopsis from a common ancestor (Cheng et al., 2016; Lysak, Koch, Pecinka, & Schubert, 2005). Thus, species in the genus display not only great morphological and phytochemical diversity but also karyotype diversity (Cheng et al., 2016; Wang et al., 2019). Among the most agriculturally important Brassica species, there are three diploid genome types including Brassica rapa (AA), Brassica nigra (BB) and Brassica oleracea (CC), and three allopolyploid species including Brassica napus (AACC), Brassica juncea (AABB) and Brassica carinata (BBCC), which were generated by the pair combinations of the former three diploid species. These six species, their origination and their evolutionary relationship with each other are well defined in a 'triangle of U' model (Wang et al., 2019; Yang et al., 2016).

Due to the rapid recent advances in next-generation sequencing (NGS) technology, a large number of Brassica species have been sequenced, but most are only at a primitive level of quality. These sequenced genomes, especially those sequenced with Illumina/Roche 454 technology, for example Brassica rapa var. pekinensis Chiifu (Wang et al., 2011), B. oleracea 02-12 (Liu et al., 2014), B. oleracea TO1000DH (Parkin et al., 2014), B. napus (Bayer et al., 2017; Chalhoub et al., 2014; Sun et al., 2017), B. nigra YZ12151 (Yang et al., 2016) and B. juncea (Wang et al., 2019; Yang et al., 2016), had a relatively low continuity, which may impede genomic analysis, especially the assembly of complex genomic parts such as the pericentromeric and centromeric regions. Only recently has the application of long-read sequencing technologies, including Oxford Nanopore Technology (ONT) and Pacific Biosciences (PACBIO), greatly improved the continuity of the assembled contigs. At least four Brassica genomes are reported to have been sequenced with long read technology, including Brassica rapa Z1 (yellow sarson) (Belser et al., 2018), B. oleracea cultivar HDEM, B. oleracea var. botrytis (Sun et al., 2019) and B. napus (Song et al., 2020). These studies demonstrated the great success of long read technology in genome assembly, resulting in a contig N50 of up to megabase size (i.e. N50 > 5 Mb) (Belser et al., 2018). Given the great morphological and phytochemical diversity in the Brassica species, high continuity genome assemblies with long read information from a wide range of representative Brassica species will be helpful and are needed to deeply decipher the genomic variants that may contribute to the great diversity, not only in phenotype but also in karyotype, of the various cultivars of the species.
The Chinese flowering cabbage (Brassica rapa var. parachinensis), locally known as Caixin, Tsai Tai, Choy Sum, bok choy, or Tsai Hsin (Tan, Fan, Kuang, Lu, & Reiter, 2019; Xiao et al., 2019), is an important leafy and bolting stem vegetable widely grown in Asia, particularly in China and Japan (Kamran et al., 2020). This vegetable has high nutritional value and is rich in vitamins, minerals, secondary metabolites and dietary fiber, which confer human health-promoting effects (Xiao et al., 2019). Unlike other B. rapa vegetables, Chinese flowering cabbage can bolt and flower easily without strict vernalization under low temperature. Therefore, it is very important to conduct genome sequencing and assembly to further uncover the genomic information and molecular mechanisms involved in the formation of the special morphological and phytochemical characteristics of this cultivar. In this study, we report a high continuity (N50 = 7.2 Mb) and chromosome level genome assembly for Chinese flowering cabbage (Brassica rapa var. parachinensis). It was assembled with an integrated approach using Illumina sequencing, PacBio and high-throughput chromosome conformation capture (Hi-C) technology. The assembly resolved a large part of the pericentromeric regions of this species. In addition, genome comparison and evolutionary analysis of this genome and other representative Brassica species were conducted. The results provide novel insights into Brassica genome structure evolution.

Materials and methods

Sample collection

Young leaves were collected from a single plant of B. rapa var. parachinensis cv. Youlv 701 (Fig. 1), which is a highly inbred line issued by the Guangzhou Institute of Agriculture Science, Guangzhou, Guangdong, China. The collected young leaves were soon frozen in liquid nitrogen and stored at -80 °C for DNA and RNA extraction.

DNA extraction and sequencing

For Illumina sequencing, a phenol/chloroform DNA extraction protocol was used to extract DNA from 2 g of the young leaves. A paired-end (PE) Illumina sequencing library with an insertion length of 250 bp was prepared using the TruSeq Nano DNA LT Library Preparation Kit (Illumina Inc., USA). DNA purity and size range were evaluated with an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA). An Illumina sequencing library with an insertion length of 300-350 bp was constructed and sequenced using the Illumina HiSeq 2000 platform.

The genomic DNA extracted from the young leaves was also used for the PacBio library construction. 10 μg of Chinese flowering cabbage DNA was used for 30-kb template library preparation using the BluePippin Size Selection system (Sage Science, USA) according to the manufacturer's protocol (Pacific Biosciences, USA). The library was sequenced on the PacBio SEQUEL II platform. The PacBio platform was used to generate long genomic sequence reads for the construction of a reference genome for the Chinese flowering cabbage. After removing adaptor sequences, more than 113 Gb of subreads were obtained, with 219 times coverage. The sequencing data were used for the following genome assembly operations.

Genome size estimation based on NGS sequencing data

The HTQC package (Xi Yang et al., 2013) was used to filter low-quality bases and reads. Briefly, three steps were performed to clean the NGS data. First, the adapter sequences were removed from the reads; second, reads with more than 10% N bases were eliminated; and third, reads with more than 50% low-quality bases (quality <= 5) were discarded. Lastly, we obtained 42.3 Gb (~86X) of cleaned data for the Kmer-based analysis. We also randomly picked 10,000 read pairs and blasted them against the NCBI non redundant nucleotide (nt) database to check for obvious sample contamination.

De novo assembly of the Chinese flowering cabbage genome

The MECAT2 package (C.-L. Xiao et al., 2017) was used for the Chinese flowering cabbage genome assembly. Long reads had a length cutoff of 10 kb. We applied two rounds of polishing using NGS short reads with Pilon (Walker et al., 2014). TRF (tandem repeats finder) (Benson, 1999) was used to identify tandem repeats, and sequences in which repeats made up more than 60% of the sequence were removed. The completeness of the assembled genome was evaluated using BUSCO v3.0 analysis (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015).

Hi-C library preparation and data analysis

In the present study, 8 g of young leaf tissue collected from the same plant was used for Hi-C library construction. The Hi-C experiment consisted of the following steps: crosslinking, lysis, chromatin digestion, biotin marking, proximity ligations, cross linking reversal, and DNA purification (Xuefen Yang et al., 2019). The purified and enriched DNA was used for the sequencing library construction; the DNA was sequenced using the Illumina HiSeq 2000 platform (Illumina, USA). The overlapping group was hitched to scaffold level using Juicer (Durand et al., 2016) and 3D-DNA (Dudchenko et al., 2017). MCScanX (Y. Wang et al., 2012) was used to make a collinear comparison between the scaffolds and the existing B. rapa genome sequence (Cai et al., 2017). Each sequence was given a new name after being manually assembled to the chromosome. We used bwa mem (Vasimuddin, Misra, Li, & Aluru, 2019) alone with the parameters "-A1 -B4 -E50 -L0" to map the two paired reads to the chromosome level. Then the HiCExplorer kit (Wolff et al., 2018) was used to build a Hi-C contact map. Parameters for the hicCorrectMatrix step were set to "--filterThreshold -3.5 5" and the rest were kept at default settings.

Single molecule RNA sequencing (Iso-seq) experiment and data analysis

For gene annotation of the genome, transcriptome sequencing was performed with the mixed tissues of a young seedling (14 days after imbibition). The RNA was extracted with TRIzol Reagent (Invitrogen, USA); its quality was checked by a spectrophotometer (LabTech, USA) and verified by a 2100 Bioanalyzer (Agilent Technologies, USA). The RNA was reverse transcribed using a Clontech SMARTer cDNA synthesis kit for the transcriptome sequencing library construction. Briefly, two SMRTbell libraries, sized 0–3 kb and 2–6 kb, respectively, were constructed according to the manufacturer's protocol (Pacific Biosciences of California, Menlo Park, CA, USA). A BluePippin Size Selection System was used to perform size selection after cDNA amplification and purification. The libraries were sequenced on the PacBio SEQUEL II platform (Pacific Biosciences of California, Menlo Park, CA, USA). Last, we used SMRTLink 7.0 (https://www.pacb.com/support/software-downloads/) to produce all the mRNA sequences for genome annotation.

Repetitive element annotation and construction of the Circos picture

The extended de-novo TE Annotator (EDTA) (Ou et al., 2019) was used to annotate and identify the LTR and DNATE type sequences of the genome, respectively. TRF (tandem repeats finder) (Benson, 1999) was used to find the centromere sequence, with 20000 points as the threshold. Finally, the results of the repeat sequence comparison were annotated with MAKER (Cantarel et al., 2008). MCScanX (Y. Wang et al., 2012) was used to find the collinearity within the genome and generate the link files, and Circos (Krzywinski et al., 2009) was used to draw the picture. Four tracks were constructed, showing the gene density, LTR density, DNATE density and the collinearity from the outer to the inner circle; the collinearity was shown in the inner circle.

Protein coding gene prediction

The Isoseq3 pipeline (https://github.com/pacificbiosciences/isoseq) was used to process the full-length transcriptome data of Chinese flowering cabbage to obtain the transcriptome sequence. At the same time, in order to obtain a more complete gene annotation, we integrated the annotation content of B. nigra (W. Wang et al., 2019), B. napus (Chalhoub et al., 2014), B. oleracea (Liu et al., 2014), B. rapa (Zhang et al., 2018) and B. juncea (J. Yang et al., 2016) as the reference gene sequence, using CD-HIT-EST (https://github.com/weizhongli/cdhit) to remove the sequence redundancy. The repeat sequences found by EDTA (Ou et al., 2019) and TRF (Benson, 1999) were used as reference repeats to enter into MAKER (Cantarel et al., 2008) for 5 rounds of gene and repeat sequence annotation.

Phylogenetic analysis

The phylogenetic relationships between Chinese flowering cabbage and other Brassica plants were analyzed using the orthologs from single-copy genes. The Orthofinder package was used to find the orthogroups and the single-copy genes. All of the single-copy genes were run through multiple sequence alignment using the mafft program (Katoh & Toh, 2010), and the genes in one species were concatenated into a super alignment; then Easyspecietree (https://github.com/Davey1220/EasySpeciesTree) was used to generate the phylogenetic relationship between the species using the maximum likelihood method.

Structural variants analysis

Structural variations were detected using an assembly-based pipeline based on the LASTZ/CHAIN/NET/NETSYNTENY tools (Harris, 2007; Kent, Baertsch, Hinrichs, Miller, & Haussler, 2003; Liao, Zhang, Chakraborty, & Emerson, 2020; Schwartz et al., 2003), which is publicly available at https://github.com/yiliao1022/LASTZ_SV_pipeline.

Insertion times of LTR-retrotransposons were estimated by the formula T = K/2r, where T refers to the divergence time between the two LTRs of each intact element, K refers to the sequence difference between the 5'-LTR and 3'-LTR of an intact LTR element, and r refers to the average mutation rate. Here we used the neutral substitution rate of 1.5 × 10^-8 per synonymous site per generation (Koch, Haubold, & Mitchell-Olds, 2000).

Figure 1. Overview of the assembly pipeline for the Brassica rapa var. parachinensis genome. The steps include de novo assembly of PacBio reads, followed by polishing with Illumina short reads, followed by scaffolding using Hi-C, and extensive QC, repeat annotation and gene annotation using ISO-seq sequencing.

Results

A highly continuous genome assembly of Chinese flowering cabbage (B. rapa var. parachinensis) using high coverage sequencing

A highly inbred line of Chinese flowering cabbage (B. rapa var. parachinensis, Fig. 1) was used for the genome sequencing and assembly with deep coverage PacBio long reads, Illumina and Hi-C sequencing data. The assembly pipeline for the B. rapa var. parachinensis genome is shown in Fig. 1. DNA samples from a single plant were prepared for PacBio, Illumina and Hi-C sequencing to avoid potential genome variability between different plants. Overall, we obtained a total of 113 Gb of PacBio raw reads (Table S1) and 47.5 Gb of clean Illumina reads (Table 1; ~83 coverage), corresponding to 219 and 86 depth of the estimated genome size (515 Mb), respectively. A preliminary survey of the heterozygosity, GC and transposon elements (TEs) content of this inbred line was carried out using the Kmer-based method (Liu et al., 2013) with 32 GB of data. The genome size was estimated to be about 515 Mb, with an overall GC content of 38.9% and a transposon elements (TE) content of 64.1% (Table S1). Remarkably, the overall heterozygosity of the genome is very low, at only 0.16%, which would facilitate assembly.

We applied an integrated strategy to assemble the genome. Firstly, the MECAT2 package (C.-L. Xiao et al., 2017) was used for the Chinese flowering cabbage genome assembly, using long reads with a length cutoff of 10 kb. Secondly, the contigs were polished using NGS short reads with Pilon (Walker et al., 2014). Finally, we obtained a final contig assembly of 384 Mb with a contig N50 length of 7.2 Mb. The longest contig was 19.9 Mb (Table 1). The genome contained 450 contigs, and the GC content was 37.6% (Table 1). The coverage statistics for the genomic contigs obtained by SAM tools suggested that the assembly of this genome is credible (Table S2). Furthermore, we found that 97.8% and 0.8% of the total of 1,440 BUSCO genes were detected in the genome as complete and partial genes, respectively, which validated the completeness of the genome (Table S3).
Furthermore, high-throughput chromatin conformation capture (Hi-C) data were used to scaffold the contigs of the genome into a chromosome-level assembly. We obtained a total of 66 Gb of cleaned Hi-C paired-end (PE) reads, which is about 128 depth of the genome. Of these, 98.27% (434M/442M) of the PE reads were mappable to the current assembly and ~33.18% (147M/442M) were mapped to different contigs. Using the contact frequency calculated from the reads mapped to different contigs, 180 contigs, which represent 87.93% (338 Mb/384 Mb) of the total length of the assembled sequence and 40% (180/450) of the total contigs, were further scaffolded into 10 pseudo-chromosomes (Fig. 1A). The final assembly contains 69 scaffolds with a scaffold N50 of 32 Mb, and the longest scaffold is 47.5 Mb (Table 1). The Circos map of the genome shows that each position is collinear with the other two, indicating that the annotation is complete (Fig. 1B). A large number of LTR transposons and DNA transposons were identified (Fig. 1C), which indicated that there might be a large repeat region on the A05 and A06 chromosomes.

We also performed de novo gene prediction with guidance by homologs from related species, transcriptome data corrected with short reads, and full-length transcripts from ISO-seq sequencing from the present study, using the MAKER pipeline (Cantarel et al., 2008). We annotated 47,598 protein-coding genes in the Chinese flowering cabbage genome, with an average gene length of 2060 bp (Table 1). The average number of exons per gene is 6.13, with a mean length of 199 bp (Table 1). Approximately 53.2% of the genome is annotated as repetitive sequences, which is consistent with the estimation of the Kmer-based method. LTR retrotransposons (22.26%) and DNA transposons (17.62%) are the most abundant repetitive families (Table S4). In conclusion, we provide, to our knowledge, the most contiguous and the first chromosome-level genome assembly of this species so far.

Table 1. Statistics and annotated analysis of the Chinese flowering cabbage genome assembly

Item | Number | Size | Sequence coverage (X) | Percentage (%)
Estimate of genome size | - | 515 Mb | - | -
PacBio reads | 4,448,280 | 113,068 Mb | 219.31 | -
Illumina reads | 322,016,292 | 42,330 Mb | 82.10 | -
HiC reads | 441,545,786 | 66,231 Mb | 128.46 | -
Total reads | - | 221,630 Mb | 429.89 | -
Contigs | 450 | 384 Mb | - | 74.50
N50 of contigs | - | 7.2 Mb | - | -
Longest contig | - | 19.9 Mb | - | -
Scaffold | 69 | 384 Mb | - | 74.62
N50 of scaffold | - | 32.2 Mb | - | -
Longest scaffold | - | 47.5 Mb | - | -
GC content | - | 144.4 Mb | - | 37.61
Total repetitive sequences | - | 170.3 Mb | - | 44.26
Total protein-coding genes | 47598 | 47.3 Mb | - | 12.31
Average length per gene | - | 2,060 bp | - | -
Average exons per gene | 6.13 | 199 bp | - | -

Figure 2. A highly continuous genome assembly of Chinese flowering cabbage (B. rapa var. parachinensis). (A) Hi-C contact map of the Chinese flowering cabbage assembled chromosomes; the density of Hi-C contacts is highest at the diagonals, suggesting consistency between the assembly and the Hi-C map; blue squares indicate highly repetitive pericentromeric regions on the A05 and A06 chromosomes. (B) Circos diagram of sequence features on the 10 chromosomes of the assembly; A01, 02, 03, 04, 05, 06, 07, 08, 09 and 10 indicate the ten assembled chromosomes.

Gene duplication analysis across 20 eudicot genomes reveals the current B. rapa var. parachinensis genome is among the most high-quality assemblies of Brassica genomes

To assess the completeness of the genome assembly and gene models, we used Orthofinder (Emms & Kelly, 2015) to construct the ortholog groups across 20 eudicot species and separate them into three categories: groups with a single copy gene, two genes, and multiple (more than two) genes. The frequency of each ortholog group among the 20 eudicot species revealed that Brassica species (i.e. B. rapa, B. napus, B. juncea and B. nigra) harbor more duplicated orthologs than Arabidopsis species (Fig. 3A,B), which is consistent with the fact that the Brassica species experienced an extra whole genome triplication (WGT) event compared with the model plant Arabidopsis thaliana (Liu et al., 2014). Additionally, more duplicated orthologs are identified in the current B. rapa var. parachinensis genome assembly than in the two other assemblies of this species with a relatively lower N50 (Fig. 3A), suggesting that we obtained a higher quality of genome assembly and annotation than previous studies (Belser et al., 2018; Zhang et al., 2018). BUSCO analysis suggested that all 12 Brassica species have a high quality of genome assembly and that the current B. rapa var. parachinensis has the highest BUSCO value (Fig. 3B). Next, we compared the overlap of gene models among B. rapa var. parachinensis and two other B. rapa genomes (Belser et al., 2018; Zhang et al., 2018). A total of 19,042 genes are shared by all three genomes. The Chinese flowering cabbage genome (Fig. 3C) has more specific gene models, which may be caused by the difference of assembly quality among these three genomes or by a specific gene amplification history.

Figure 3. Distribution of genes of Chinese flowering cabbage and other representative plant species. (A) Distribution of ortholog groups: single copy (blue), two copies (orange), and multiple copies (grey) in 20 eudicot genomes; (B) BUSCO analysis of assemblies across 20 eudicot genomes; (C) Venn diagram showing the overlap of gene families among B. rapa var. parachinensis and two other B. rapa genomes.

Phylogenetic analysis of a collection of Brassica genomes reveals Chinese flowering cabbage has a closer evolutionary relationship with the diploid progenitor of the allotetraploid species, B. juncea

The Brassicaceae family serves as a useful model for studying polyploidy and chromosome evolution. The evolutionary relationship of six ecologically important Brassica species, including three diploid species (B. rapa, B. oleracea, and B. nigra) and three allotetraploid species (B. napus, B. juncea, and B. carinata), was well described in a classical U triangle model (Cheng et al., 2016) (Fig. 4). To elucidate the evolutionary distance of the current Chinese flowering cabbage genome to other Brassica genomes, we constructed a phylogenetic tree for 12 collected Brassica genomes and eight related Brassicaceae species using the coding sequences of 434 single-copy genes that are present in all of the species. The result shows that the three genome types are clearly separated from each other among the investigated species. The current Chinese flowering cabbage has an AA genome type which is closer to the AA genome of B. juncea than to the AA genome of another allotetraploid species, B. napus, suggesting that the Chinese flowering cabbage is evolutionarily closer to the diploid progenitor of the allotetraploid species Brassica juncea. Similarly, in the phylogenetic tree, B. rapa Z1 was clustered firstly with the B. napus AA genome and then with other AA genomes, pointing to it as being evolutionarily closer to the donor of the AA genome of the allotetraploid B. napus. Also, in the CC genome clade, B. oleracea var capitata was clustered firstly with the B. napus CC genomes and then with B. oleracea var italica, implying that B. oleracea var capitata is closer to the progenitor of the CC genome of B. napus.

Figure 4. The phylogenetic relationship of Brassica rapa var. parachinensis with other Brassicaceae plants.

Extensive chromosomal rearrangements between Brassica species

Genome-wide synteny analysis was conducted using syntenic orthologous genes (Zhang et al., 2018) for genome assemblies of different strains of this species, both within and between species. Firstly, the genome of Chinese flowering cabbage was compared to two published B. rapa lines, B. rapa Z1 (Belser et al., 2018) and B. rapa var pekinensis. The SyMAP map reveals that these three B. rapa assemblies retain a well conserved overall genome architecture, except for a translocation event between chromosome 1 and chromosome 3 that differentiates our assembly from the other two assemblies (Fig. 5A). Next, we performed the comparison between B. rapa var. parachinensis (AA genome, n=10) and B. oleracea var. capitata (CC genome, n=9) (Belser et al., 2018; Liu et al., 2014). Besides the different chromosome numbers, we observed extensive chromosomal rearrangements between these two species (Figure 5B). Only 2 chromosomes (i.e. Chr1 and Chr2) showed minimal changes since their divergence from a common ancestor. The extensive chromosomal rearrangements that occurred during the course of Brassica genome evolution are different from the observation in Oryza, one of the well-studied models in monocots, in which the karyotype of most diploid species is well-conserved, even over 15 million years of evolutionary history (Stein et al., 2018). Thus, chromosome rearrangements may be an alternative cause for the different Brassica species or genome types.

Figure 5. Genome synteny based on orthologous genes within and between species for Brassica genome assemblies. (A) Genome synteny between B. rapa var. parachinensis and two highly continuous assemblies of the B. rapa genome (B. rapa Z1 (Belser et al., 2018) and B. rapa var. pekinensis (Zhang et al., 2018)); (B) Genome synteny between B. rapa var. parachinensis and two highly continuous assemblies of the B. oleracea genome (Belser et al., 2018; Liu et al., 2014). Homologous chromosomes are labelled with the same number.

Genome structure evolution in Brassica: insight from pericentromeric regions

The pericentromeric regions of plant genomes are found to be among the most rapidly evolving genomic parts, largely driven by some major mechanisms such as LTR-retrotransposon proliferation, gene conversions, and segmental duplications (Liao et al., 2018). Comparison of the pericentromeric regions of three B. rapa genome assemblies with different assembly quality (Supplementary Fig. 1E) revealed that the current assembly resolved a larger part of the pericentromeric regions than the other two assemblies (Supplementary Fig. 1A,B,C,D). A large part of the pericentromeric regions was missed in the other two assemblies. This result shows that highly contiguous genome assemblies, especially assemblies of the pericentromeric repetitive regions, are required for comparative genomic analysis of highly repetitive regions. Thus, for interspecies comparison, we selected two other highly contiguous assemblies for two closely related species, B. oleracea (Belser et al., 2018) and B. nigra, which represent two other Brassica genome types (BB and CC), and compared the genome structure and sequence features at the pericentromeric regions among these three species or genome types. We found that the pericentromeric regions of chromosomes 5 and 6 in B. rapa experienced a lineage-specific LTR-retrotransposon amplification history. For example, comparison of chromosome 5 between B. rapa and B. nigra (Fig. 6A) showed a clear enrichment of LTR retrotransposons in B. rapa compared to the orthologous pericentromeric regions of B. nigra, although the syntenic relationship of the whole chromosome is well retained between these two species. This difference is more likely to be caused by a lineage specific LTR retrotransposon amplification history after their divergence. Meanwhile, comparison between B. rapa and B. oleracea (Fig. 6B) showed that the synteny of chromosome 5 breaks at the centromere region (see also Fig. 5B), and the break event is more likely to have occurred in the B. oleracea lineage, since B. rapa and B. nigra share the synteny block (Fig. 6A), while B. oleracea does not (Fig. 6C).

To the best of our knowledge, no work so far has identified structural variations based on high-contiguous genome assemblies. To fill this knowledge gap and to have a first glimpse of SVs differing within B. rapa, we used the genomes of Z1 and B. rapa var. parachinensis, with genome assembly contig N50 of 5.51 Mb and 7.26 Mb, respectively. As shown in Fig. 5A, these two genomes differ only in a single translocation, and large chromosomal rearrangements do not exist. Using the whole genome alignment approach, we identified a total of 27,190 insertions, 26,002 deletions, 1,374 [...] with sizes ranging from 5.2 Kb to 1,431.6 Kb, and 8,565 complex SVs with imprecise breakpoints between
SVs fully structural identify in insertions to of difficulty conducted short the forms to all and due of (SNP) explored agronomicallyidentification commonly diverse polymorphism for less nucleotide reads. responsible are short are single SVs and translocations (InDels), to genome deletions and the Compared and in (INVs) typically encoded size, inversions phenotypes/traits. genes in (DUPs), the larger important or impact duplications 50bp greatly (DELs), are SVs that deletions (TRAs). alterations (INSs), genomic as insertions defined including generally Synteny. is (SV) f: variation repeats; Structural Tandem e: retrotransposons; in LTR variants d: Structural TE; DNA-type c: Gene; b: of Chr06 of Chr06 and oleracea oleracea .nigra B. three regions among pericentromeric the 6 at genome), systeny and and 5 features chromosome sequence of on analysis Comparative 6. Figure chromosome 6D,E,F). of comparison (Fig. the pattern Similarly, analogous regions. an pericentromeric revealed the 6 in observed features structure genome .rapa B. parachinensis C eoe and genome) (CC C eoe and genome) (CC B eoe and genome) (BB .rapa B. .nigra B. .nigra B. enovo De var. parachinensis parachinensis var. B eoe.Tak ntecro ltfo ue oinrrpeet :Chromosomes; a: represent: inner to outer from plot circos the in Tracks genome). (BB eoeasmle,epcal ihhg otgiy a aiiaei-et genome-wide in-depth facilitate can contiguity, high with especially assemblies, genome B eoe and genome) (BB parachinensis .rapa B. Fg A.O h neto vns 4 n 4 r on ob el occurred newly be to found are 847 and 845 events, insertion the Of 7A). (Fig. Brassica .nigra B. .rp a.parachinensis var. rapa B. .rapa B. 1Ble ta. 08 and 2018) al., et Z1(Belser seby ,6 ulctosi Z1 in duplications 1,368 assembly, parachinensis A eoe;()Sneympo h0 of Chr03 of map Synteny (E) genome); (AA A eoe;()Sneympo h0 of Chr03 of map Synteny (F) genome); (AA B eoe.()Sneympo h0 between Chr06 of map Synteny (D) genome). (BB var. genomes parachinensis .oleracea B. 
Brassica n 1asml,rsetvl,wihaecnitn ihtheir with consistent are which respectively, assembly, Z1 and 11 eoetps hns oeigcbae(AA cabbage flowering Chinese types: genome A eoe;()Sneympo h0 between Chr05 of map Synteny (C) genome); (AA A eoe;()Sneympo h0 between Chr05 of map Synteny (B) genome); (AA C genome) (CC .rapa B. var. assembly, rsiarapa Brassica parachinensis A ytn a fCr5between Chr05 of map Synteny (A) . Brassica n 6mdu-ie inversions medium-sized 46 and .oleracea B. .oleracea B. eoe,w dnie SVs identified we genomes, ti td) ahwith each study), (this eoe.T ls this close To genomes. .nigra B. C eoe and genome) (CC C eoe and genome) (CC B genome) (BB B. B. Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986559.90881792 — This a preprint and has not been peer reviewed. Data may be preliminary. n hti elaatdt h ihtmeaueadhg uiiyciaei h ot fCia tcan It China. of south the in climate abundant humidity the high Among and temperature 2019). high al., the et to Asia(Tan well-adapted in is grown that widely one been of has types which ecological value nutritional high with ( cabbage flowering Chinese of between times Discussion inversions insertions genomic of size medium Distribution of (B) rapa Example Brassica (D) imprecise. genes. continuous are impacting breakpoints highly duplication their three indicate in SVs LTR-retrotransposons Complex versa. vice between assemblies continuous nensis highly two using between identified variations Structural 7. Figure contiguous phenotype. on highly effect these potential formation on the inversion based thus the small-size variations and in on structural SVs structure of features genomic gene glimpse on of sequence first analysis of the local provide comparative our prevail role assemblies of the features together, Additionally, genome causal cases sequence Of Taken the two repeat 7D). variations. 
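The size threshold and variant classes in the SV definition above can be illustrated with a short sketch. This is not the authors' pipeline: the `Variant` record, the `tally_svs` helper and the toy calls are hypothetical names and data introduced only for illustration, and only the 50 bp cutoff and the class abbreviations come from the text. Translocations usually have no single length, so treating every event as having one is a simplifying assumption.

```python
from collections import Counter
from dataclasses import dataclass

# 50 bp cutoff taken from the SV definition in the text; smaller events are InDels.
MIN_SV_LENGTH = 50


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one call from a whole-genome alignment."""
    kind: str    # "INS", "DEL", "DUP", "INV" or "TRA"
    length: int  # event length in bp (a simplification for translocations)


def is_structural(variant: Variant) -> bool:
    """Return True when the event meets the >= 50 bp SV size definition."""
    return variant.length >= MIN_SV_LENGTH


def tally_svs(variants):
    """Count structural variants per class, ignoring sub-threshold events."""
    return Counter(v.kind for v in variants if is_structural(v))


# Toy input, not data from the study.
calls = [
    Variant("INS", 12),    # too short: a short InDel, not an SV
    Variant("INS", 4800),
    Variant("DEL", 50),    # exactly at the threshold: counted
    Variant("DEL", 49),    # just below the threshold: not counted
    Variant("INV", 5200),
    Variant("DUP", 900),
]

counts = tally_svs(calls)
print(dict(counts))
```

In a real analysis the per-class counts would come from the alignment-based caller's output; the sketch only shows how the size definition separates SVs from short InDels before tallying.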
A large proportion of the insertions and deletions were shown to overlap with gene fragments or full genes. Tandem duplications were also found to overlap with gene regions based on the gene annotation; two cases are shown in Fig. 7C. Genomic analysis can also provide insights into the mutational mechanisms of structural variations. Of the 46 inversions identified, we found that repeat sequences, especially inverted repeats, prevail at the flanking regions, highlighting the causal role of local sequence features in the formation of inversions (Fig. 7D). Additionally, comparative analysis of gene structure at the small inversions points to their effect on the affected genes and thus their potential effect on phenotype. Taken together, our genomic analysis of SVs based on these highly contiguous genome assemblies provides the first glimpse of structural variations in Brassica rapa genomes and of their functional significance.

Figure 7. Structural variations identified between Brassica rapa var. parachinensis and Brassica rapa Z1 using two highly continuous assemblies. (A) Total number of structural variations between Brassica rapa Z1 and parachinensis. TD_Z1, tandem duplications in the Z1 assembly relative to the parachinensis assembly; TD_pare, vice versa. Complex SVs are those whose breakpoints are imprecise. (B) Distribution of insertion times of LTR-retrotransposons in three highly continuous Brassica rapa genome assemblies. (C) Examples of tandem duplications impacting genes. (D) Example of medium-sized genomic inversions between Brassica rapa Z1 and parachinensis.

Discussion

Chinese flowering cabbage (Brassica rapa var. parachinensis) is an important leafy and bolting-stem vegetable with high nutritional value, which has been widely grown in Asia (Tan et al., 2019). Among the abundant ecological types of Brassica rapa that are planted as vegetables in China, Chinese flowering cabbage is the one that is well adapted to the high-temperature and high-humidity climate of the south of China. It can be planted all year round for tender flower products without the need for a strict vernalization process. In this study, we report the first chromosome-level genome assembly of this important crop, which provides a valuable genomic data resource for ecological and evolutionary studies of B. rapa and related Brassica species. The present study is the first to report on the genome size, heterozygosity, and repeat content of the Chinese flowering cabbage genome.
cabbage flowering al., mixed Chinese and et in taicai wutacai, cabbages(Cheng rapes, choi, Chinese (pak oilseed), heading choi spring/winter pak and cycling morphotypes, rapid Japanese (sarson, sarsons The turnips), the European the of 2018). with evolution al., relationship the closest et analyze two the to with shows constructed not cabbage was but flowering tree Chinese phylogenetic the a Interestingly, study, present the In genome), 1.45Mb, of Fig.1E). N50 of a (Supplementary assembly with assembly oleracea the Sequel its PacBio in in genus by However, regions sequenced The S1E), repetitive S1). al., (Figure large 2018) et (Fig. the al., assembly(Belser present 2018) et current not al., Zhang does the 2020; et al., than Zhang et pattern 2020; Song studies, weaker 2018; al., previous relatively et but the Song similar in 2018; a example, present For assemblies technology. genome sequencing long-read by lineage be other two to Brassica contact the seems Hi-C in expansion Hi-C sparse in pattern this the clear B06 Strikingly, similar in by a a 3). C06 verified have observe and (Fig. be 6 not C05 sequences further and do repetitive can we 5 observation by chromosome since This caused of specific mostly 6). regions few is very pericentromeric Fig. that and the 2B; signal regions pericen- which pericentromeric (Fig. in the other region map of to this contact comparison most in in resolved annotated expanded genome were significantly genes cabbage be flowering to found Chinese were the the 6(A06) of of assembly the regions the of tromeric study, most present surpassed the the and In study, of the present completeness to the The similar in was S5). analysis oleracea which (Table BUSCO of related 2018) 47.4Mb, the length of al., of using N50 genome et validated size scaffold technology(Belser was The maximum (97.8%) Nanopore the chromosomes. 
genome 10 with with of onto sequenced Mb, genomes contigs genome the 32.3 Mb Z1 than reached 545 longer assembly than much more final and scaffold the to 2018), technique al., Hi-C et two the the Zhang obtained the we 2018; than sequencing, al., longer genome rapa is et this Zhang the which technology(Belser in 2018; assemble Mb, used Nanopore al., to plants 7.26 and et the reads of of Belser long length (0.16%) greatly PacBio 2019; N50 ratio used al., contig can heterozygous we et low technologies study, Wang the pre- this of sequencing 2020; Because In model al., long-read et gene 2018). recent assembly(Song al., and that genome et development of demonstrated marker continuity have genome-wide the studies improve for Enormous critical is diction. assembly genome. genome cabbage continuous flowering for Highly Chinese studies In the evolutionary of for process. content resource vernalization repeat data strict and genomic a ecological valuable for important a this need provides rapa of the which assembly without genome cabbage, chromosome-level products flowering first flower Chinese the tender report for we study, round this year all planted be .juncea B. n related and and .nigra B. var. C eoe,wihfrhrhbiiet iers otrealplpodspecies, allopolyploid three to rise give to hybridize further which genome), (CC A BadC eoe.I swrhntn htsc ag eeiiergoscnol eresolved be only can regions repetitive large such that noting worth is It genomes. CC and BB AA, .juncea B. .oleracea B. Brassica sms ieyfo h a higop(hns oeigcbae ncnrs oother to contrast in cabbage) flowering (Chinese group choi pak the from likely most is botrytis .oleracea B. Fg A.Ti ieg pcfi xaso a lyarl nteeouinr iegneof divergence evolutionary the in role a play may expansion specific lineage This 6A). (Fig. Brassica .rapa B. Brassica .rapa B. AB eoe,and genome), (AABB Sne l,21)and 2019) al., et (Sun otistrebscgenomes, basic three contains eune sn luiatcnlg(i ta. 04 age l,21) eapplied We 2019). 
al., et Wang 2014; al., et technology(Liu Illumina using sequenced .rapa B. pce.Ti rsn td stefis orpr ntegnm ie heterozygosity, size, genome the on report to first the is study present This species. eoe Cieecbaeadylo asn(i.4(esre l,21;Zhang 2018; al., et 4)(Belser sarson)(Fig. yellow and cabbage (Chinese genomes and pce eune hsfr including far, thus sequenced species pce a efrhrsbiie nosxppltos unp Cieeand (Chinese turnips populations: six into subdivided further be can species .napus B. mn hm h eietoei ein fcrmsm A5 and (A05) 5 chromosome of regions pericentromeric the them, Among . .napus B. .carinata B. Ble ta. 08 oge l,22) n hoooeB5and B05 chromosome and 2020), al., et Song 2018; al., et (Belser .rapa B. Agnm n hnohrA eoe,ipyn htit that implying genomes, AA other then and genome AA 1Ble ta. 08 TbeS5). (Table 2018) al., et Z1(Belser .rapa B. 13 BC eoe(hn ta. 06 u ta. 2019). al., et Sun 2016; al., et genome)(Cheng (BBCC A genome), (AA .rapa B. .oleracea B. .oleracea B. Brassica .napus B. eoe eune eetyb PacBio by recently sequenced genomes .rapa B. .nigra B. eoetps ..chromosome i.e. types, genome .rapa B. iial,the Similarly, . DMBle ta. 2018), al., et HDEM(Belser var. var. capitata parachinensis .juncea B. B eoe,and genome), (BB .rapa B. 1and Z1 .napus B. Brassica cbae)was (cabbages) .oleracea B. .rapa B. .napus B. Ble tal., et (Belser Agenome AA genome. (AACC .rapa B. species. .rapa B. .rapa B. strain, can AA B. B. B. B. Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986559.90881792 — This a preprint and has not been peer reviewed. Data may be preliminary. ae,P . ugbn . oiz .A,Ca,C-.K,Ya,Y,Le . dad,D (2017). D. Edwards, . . . H., Lee, genomes. Y., napus Yuan, Brassica K., related C.-K. closely Chan, two A., of A. Genome. comparison Golicz, Human B., and the Assembly Hurgobin, of Eichler, E., Alleles GeneBank P. . Variant . Bayer, . 
Numerous studies have found that structural variations can impact larger genomic regions than SNPs. Structural variant (SV) discovery would not only help our understanding of the landscape of genomic variation within and between species but also reveal the functional significance of SVs in a wide range of species, such as human (Audano et al., 2019; Huang et al., 2010), rice (Fuentes et al., 2019), maize (Mahmoud et al., 2020), and tomato and Arabidopsis (Voichek & Weigel, 2020). In comparison to SV detection methods that are based on Illumina short reads, assembly-based methods can in theory fully recover the SVs, but they still depend on the quality of the genome assemblies. Studies in plant species indicate that SVs can affect a large proportion of coding genes. In the current study, we detected SVs between two B. rapa lines (Z1 and parachinensis) and identified a total of 27,190 insertions, 26,002 deletions, 1,368 duplications, 46 medium-sized inversions with sizes from 5.2 Kb to 1,431.6 Kb, and 8,565 complex SVs with imprecise breakpoints between them (Fig. 7). This is the first report of SVs detected between Brassica genomes using high-contiguity genome assemblies. These SVs may affect coding genes, which may further contribute to phenotypic variations such as morphological and phytochemical characteristics.

In summary, we report a chromosome-level genome assembly of Chinese flowering cabbage and its accurate gene and TE annotation. The phylogenetic analysis indicates that this genome has a closer evolutionary relationship with the AA diploid progenitor of B. juncea. We also found lineage-specific pericentromeric expansion events in the genomic regions of chromosomes 5 and 6 of the AA genome compared to the orthologous regions of the BB and CC genomes. Finally, we report a large amount of structural variations (SVs) between two B. rapa lines (Z1 and parachinensis) using high-continuity genome assemblies. Overall, our high-quality genome assembly of Chinese flowering cabbage provides a valuable genetic resource for deciphering the genome evolution of Brassica species, and it would serve as the reference genome guiding the molecular breeding practice of Brassica rapa crops.

Acknowledgements

This work was funded by the Science and Technology Program of Guangzhou (202002020007), the Guangdong Basic and Applied Basic Research Foundation (2020A1515011396), the Key-Area Research and Development Program of Guangdong Province (2018B020202010), and the Science and Technology Program of Guangzhou (201804010320).

Author contributions

C.-M.C., H.Z. and Y.L. designed the project and wrote the draft manuscript. G.-G.L., H.-L.R., X.-C.D., J.-J.L., Y.L., G.-J.C., J.-T.W., D.J., B.-H.C. and Y.-S.Z. contributed to the genome assembly, genome evolution analysis, structural variants analysis and data analysis, and substantively revised the manuscript. The final manuscript has been read and approved by all the authors.

Data Accessibility

The raw genome and RNA sequencing data were deposited in the China National GeneBank DataBase (CNGBdb) under Bioproject number CNP0001121. The final chromosome assembly was submitted to CNGBdb under the same Bioproject.
References

Audano, P. A., Sulovari, A., Graves-Lindsay, T. A., Cantsilieris, S., Sorensen, M., Welch, A. E., . . . Eichler, E. E. (2019). Characterizing the Major Structural Variant Alleles of the Human Genome. Cell, 176(3), 663–675.e19.

Bayer, P. E., Hurgobin, B., Golicz, A. A., Chan, C.-K. K., Yuan, Y., Lee, H., . . . Edwards, D. (2017). Assembly and comparison of two closely related Brassica napus genomes. Plant Biotechnology Journal, 15(12), 1602–1610.

Belser, C., Istace, B., Denis, E., Dubarry, M., Baurens, F.-C., Falentin, C., . . . Aury, J.-M. (2018). Chromosome-scale assemblies of plant genomes using nanopore long reads and optical maps. Nature Plants, 4(11), 879–887.

Benson, G. (1999). Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research, 27(2), 573–580.

Cai, C., Wang, X., Liu, B., Wu, J., Liang, J., Cui, Y., . . . Wang, X. (2017). Brassica rapa Genome 2.0: A Reference Upgrade through Sequence Re-assembly and Gene Re-annotation. Molecular Plant, 10(4), 649–651.

Cantarel, B. L., Korf, I., Robb, S. M. C., Parra, G., Ross, E., Moore, B., . . . Yandell, M. (2008). MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Research, 18(1), 188–196.

Chalhoub, B., Denoeud, F., Liu, S., Parkin, I. A. P., Tang, H., Wang, X., . . . Wincker, P. (2014). Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science, 345(6199), 950–953.

Cheng, F., Sun, R., Hou, X., Zheng, H., Zhang, F., Zhang, Y., . . . Wang, X. (2016). Subgenome parallel selection is associated with morphotype diversification and convergent crop domestication in Brassica rapa and Brassica oleracea. Nature Genetics, 48(10), 1218–1224.

Dudchenko, O., Batra, S. S., Omer, A. D., Nyquist, S. K., Hoeger, M., Durand, N. C., . . . Aiden, E. L. (2017). De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science, 356(6333), 92–95.

Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems, 3(1), 95–98.

Emms, D. M., & Kelly, S. (2015). OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biology, 16, 157.

Fuentes, R. R., Chebotarov, D., Duitama, J., Smith, S., De la Hoz, J. F., Mohiyuddin, M., . . . Alexandrov, N. (2019). Structural variants in 3000 rice genomes. Genome Research, 29(5), 870–880.

Harris, R. S. (2007). Improved pairwise alignment of genomic DNA. Retrieved from https://etda.libraries.psu.edu/catalog/7971

Huang, C. R. L., Schneider, A. M., Lu, Y., Niranjan, T., Shen, P., Robinson, M. A., . . . Burns, K. H. (2010). Mobile interspersed repeats are major structural variants in the human genome. Cell, 141(7), 1171–1182.

Kamran, M., Xie, K., Sun, J., Wang, D., et al. (2020). Modulation of growth performance and coordinated induction of ascorbate-glutathione and methylglyoxal detoxification systems by salicylic acid mitigates salt toxicity. Ecotoxicology and Environmental Safety. Retrieved from https://www.sciencedirect.com/science/article/pii/S0147651319312084

Katoh, K., & Toh, H. (2010). Parallelization of the MAFFT multiple sequence alignment program. Bioinformatics, 26(15), 1899–1900.

Kent, W. J., Baertsch, R., Hinrichs, A., Miller, W., & Haussler, D. (2003). Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proceedings of the National Academy of Sciences of the United States of America, 100(20), 11484–11489.

Koch, M. A., Haubold, B., & Mitchell-Olds, T. (2000). Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Molecular Biology and Evolution, 17(10), 1483–1498.

Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., . . . Marra, M. A. (2009). Circos: an information aesthetic for comparative genomics. Genome Research, 19(9), 1639–1645.

Liao, Y., Zhang, X., Chakraborty, M., & Emerson, J. J. (2020). Topologically associating domains and their role in the evolution of genome structure and function in Drosophila (p. 2020.05.13.094516). doi: 10.1101/2020.05.13.094516

Liao, Y., Zhang, X., Li, B., Liu, T., Chen, J., Bai, Z., . . . Chen, M. (2018). Comparison of Oryza sativa and Oryza brachyantha Genomes Reveals Selection-Driven Gene Escape from the Centromeric Regions. The Plant Cell, 30(8), 1729–1744.

Liu, S., Liu, Y., Yang, X., Tong, C., Edwards, D., Parkin, I. A. P., . . . Paterson, A. H. (2014). The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes. Nature Communications, 5, 3930.

Lysak, M. A., Koch, M. A., Pecinka, A., & Schubert, I. (2005). Chromosome triplication found across the tribe Brassiceae. Genome Research, 15(4), 516–525.

Mahmoud, M., Gracz-Bernaciak, J., Żywicki, M., Karłowski, W., Twardowski, T., & Tyczewska, A. (2020). Identification of Structural Variants in Two Novel Genomes of Maize Inbred Lines Possibly Related to Glyphosate Tolerance. Plants, 9(4). doi: 10.3390/plants9040523

Ou, S., Su, W., Liao, Y., Chougule, K., Agda, J. R. A., Hellinga, A. J., . . . Hufford, M. B. (2019). Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biology, 20(1), 275.

Parkin, I. A. P., Koh, C., Tang, H., Robinson, S. J., Kagale, S., Clarke, W. E., . . . Sharpe, A. G. (2014). Transcriptome and methylome profiling reveals relics of genome dominance in the mesopolyploid Brassica oleracea. Genome Biology, 15(6), R77.

Schwartz, S., Kent, W. J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R. C., . . . Miller, W. (2003). Human-mouse alignments with BLASTZ. Genome Research, 13(1), 103–107.

Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., & Zdobnov, E. M. (2015). BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31(19), 3210–3212.

Song, J.-M., Guan, Z., Hu, J., Guo, C., Yang, Z., Wang, S., . . . Guo, L. (2020). Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus. Nature Plants, 6(1), 34–45.

Stein, J. C., Yu, Y., Copetti, D., Zwickl, D. J., Zhang, L., Zhang, C., . . . Wing, R. A. (2018). Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza. Nature Genetics, 50(2), 285–296.

Sun, D., Wang, C., Zhang, X., Zhang, W., Jiang, H., Yao, X., . . . Shan, X. (2019). Draft genome sequence of cauliflower (Brassica oleracea L. var. botrytis) provides new insights into the C genome in Brassica species. Horticulture Research, 6, 82.

Sun, F., Fan, G., Hu, Q., Zhou, Y., Guan, M., Tong, C., et al. (2017). The high-quality genome of Brassica napus cultivar "ZS11" reveals the introgression history in semi-winter morphotype. The Plant Journal. Retrieved from https://onlinelibrary.wiley.com/doi/abs/10.1111/tpj.13669

Tan, X. L., Fan, Z., Kuang, J., Lu, W., Reiter, R. J., et al. (2019). Melatonin delays leaf senescence of Chinese flowering cabbage by suppressing ABFs-mediated abscisic acid biosynthesis and chlorophyll degradation. Journal of Pineal Research. Retrieved from https://onlinelibrary.wiley.com/doi/abs/10.1111/jpi.12570

Vasimuddin, M., Misra, S., Li, H., & Aluru, S. (2019). Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). doi: 10.1109/ipdps.2019.00041

Voichek, Y., & Weigel, D. (2020). Identifying genetic variants underlying phenotypic variation in plants without complete genomes. Nature Genetics, 52(5), 534–540.

Walker, B. J., Abeel, T., Shea, T., Priest, M., Abouelliel, A., Sakthikumar, S., . . . Earl, A. M. (2014). Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PloS One, 9(11), e112963.

Wang, W., Guan, R., Liu, X., Zhang, H., Song, B., Xu, Q., . . . Wang, J. (2019). Chromosome level comparative analysis of Brassica genomes. Plant Molecular Biology, 99(3), 237–249.

Wang, X., Wang, H., Wang, J., Sun, R., Wu, J., Liu, S., . . . Brassica rapa Genome Sequencing Project Consortium. (2011). The genome of the mesopolyploid crop species Brassica rapa. Nature Genetics, 43(10), 1035–1039.

Wang, Y., Tang, H., Debarry, J. D., Tan, X., Li, J., Wang, X., . . . Paterson, A. H. (2012). MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Research, 40(7), e49.

Wolff, J., Bhardwaj, V., Nothjunge, S., Richard, G., Renschler, G., Gilsbach, R., . . . Grüning, B. A. (2018). Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization. Nucleic Acids Research, 46(W1), W11–W16.

Xiao, C.-L., Chen, Y., Xie, S.-Q., Chen, K.-N., Wang, Y., Han, Y., . . . Xie, Z. (2017). MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nature Methods, 14(11), 1072–1074.

Xiao, X.-M., Xu, Y.-M., Zeng, Z.-X., Tan, X.-L., Liu, Z.-L., Chen, J.-W., . . . Chen, J.-Y. (2019). Activation of the Transcription of BrGA20ox3 by a BrTCP21 Transcription Factor Is Associated with Gibberellin-Delayed Leaf Senescence in Chinese Flowering Cabbage during Storage. International Journal of Molecular Sciences, 20(16). doi: 10.3390/ijms20163860

Yang, J., Liu, D., Wang, X., Ji, C., Cheng, F., Liu, B., . . . Zhang, M. (2016). The genome sequence of allopolyploid Brassica juncea and analysis of differential homoeolog gene expression influencing selection. Nature Genetics, 48(10), 1225–1232.

Yang, X., Liu, D., Liu, F., Wu, J., Zou, J., Xiao, X., . . . Zhu, B. (2013). HTQC: a fast quality control toolkit for Illumina sequencing data. BMC Bioinformatics, 14, 33.

Yang, X., Liu, H., Ma, Z., Zou, Y., Zou, M., Mao, Y., . . . Yang, R. (2019). Chromosome-level genome assembly of Triplophysa tibetana, a fish adapted to the harsh high-altitude environment of the Tibetan Plateau. Molecular Ecology Resources, 19(4), 1027–1036.

Zhang, L., Cai, X., Wu, J., Liu, M., Grob, S., Cheng, F., . . . Wang, X. (2018). Improved Brassica rapa reference genome by single-molecule sequencing and chromosome conformation capture technologies. Horticulture Research, 5, 50.