Marker development for studies at micro- and macro-evolutionary time scales in Neotropical palms

Marylaure de la Harpe, Oriane Loiseau, Jaqueline Hess, Nicolas Salamin, Christian Lexer, Margot Paris

(picture: Oriane Loiseau)

POPCORN, a multidisciplinary project

Using Population Genomics, Phylogenetics and Community Ecology to understand Radiations in Neotropical mountains

Population genomics, phylogeny, community ecology

Ideal markers

• Many markers spread across the genome

• Low cost, in order to genotype thousands of samples

• Evolution rate suitable for both macro- and micro-evolution studies

[Figure: mutation rate / divergence against study level (kingdom, order, family, populations)]

• Long sequences (>600 bp) for phylogeny and selection tests

• Include candidate genes for adaptation and “neutral” non-genic markers

• Include markers already used for phylogeny in palms

• Can be applied to low-quantity and low-quality DNA from specimens

Target capture sequencing

Very flexible, as we can choose the targets:
- number
- nature (genes, non-genic regions)
- candidate genes/regions
- location in the genome
- length (in bp)
- …

http://www.arborbiosci.com/products/custom-target-capture/

Oil palm genome: very useful

• Oil palm genome is the closest reference genome

• Too divergent for proper capture design? Especially because we are not interested in conserved regions

Building Geonoma reference sequences

• Whole-genome sequencing of the species G. undata (27x coverage, Illumina PE 150 bp)

• Reference-assisted reconstruction of the G. undata genome

- 94% of the genes recovered (UTRs + exons + introns)

- Low recovery of the intergenic regions (repeats, too divergent from the oil palm, …)

Criteria for the selection of 4’051 genes

• Broad range of rates of molecular evolution

• Divergence to oil palm used as a proxy for the rate of molecular evolution
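As an illustration of divergence used as a proxy, the sketch below computes an uncorrected pairwise distance (proportion of differing sites) between two aligned sequences. The slides do not say how divergence was measured, so the choice of p-distance, the function name `p_distance` and the toy sequences are all assumptions made for this example.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped. This is an uncorrected distance, used here only as a
    simple stand-in for "divergence to oil palm".
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # not comparable at this site
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy alignment (made-up sequences, not real palm data)
geonoma = "ATGGCCATTGTAATGGGCCGC"
oil_palm = "ATGGCTATTGTA-TGGGCCGA"
print(p_distance(geonoma, oil_palm))
```

Genes with a distance close to zero would fall in the highly conserved class, and genes with a very high distance in the highly variable class.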

[Figure: histogram of divergence to oil palm (x-axis, 0.00 to 0.30) against number of genes (y-axis), for all genes and for the selected genes. Highly conserved genes: not informative at our scale. Selected genes: suitable range of evolution rates. Highly variable genes: mostly paralogues, pseudogenes or partial genes, …]

• Mostly single-copy genes (using coverage and heterozygosity (He) information)

• Average size of 1’300 bp

• Interesting functions: pathogenesis; flowering; response to UV, light; floral scents; …

• 8 genes previously used for phylogeny + 141 Heyduk et al. (2015) genes

• Even distribution in the genome (around 160 kb on average between two target genes)
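The selection criteria above can be sketched as a simple filter. Only the >600 bp length requirement comes from the slides; the `Gene` fields, the function name `is_candidate` and every other threshold are hypothetical values chosen for illustration, not the cut-offs used for the actual kit.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    divergence: float      # divergence to oil palm
    rel_coverage: float    # read depth relative to the genome-wide median
    heterozygosity: float  # fraction of heterozygous sites
    length: int            # length in bp


def is_candidate(gene: Gene,
                 min_div: float = 0.01,     # hypothetical: drop highly conserved genes
                 max_div: float = 0.15,     # hypothetical: drop highly variable genes
                 max_rel_cov: float = 1.5,  # hypothetical: excess depth suggests paralogues
                 max_het: float = 0.01,     # hypothetical: excess heterozygosity suggests paralogues
                 min_len: int = 600) -> bool:  # from the slides: long sequences (>600 bp)
    in_range = min_div <= gene.divergence <= max_div
    single_copy = (gene.rel_coverage <= max_rel_cov
                   and gene.heterozygosity <= max_het)
    return in_range and single_copy and gene.length > min_len


# Toy input (made-up genes)
genes = [
    Gene("conserved", 0.002, 1.00, 0.001, 1500),
    Gene("paralogue", 0.080, 2.10, 0.020, 1400),
    Gene("target",    0.060, 1.05, 0.002, 1300),
    Gene("short",     0.050, 0.90, 0.001, 400),
]
selected = [g.name for g in genes if is_candidate(g)]
print(selected)
```

Keeping the thresholds as keyword arguments makes it easy to see how the size of the selected set changes when a criterion is relaxed.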

Additional 133 non-genic regions

• 5 to 15 per chromosome

• 800 bp long on average

• As far as possible from genes

4’770’883 bp in total

Sampling for kit evaluation

- 5 “phylogenetic” samples from 3 palm subfamilies, up to 83 million years divergence to G. undata

- 5 “G. undata intraspecific” samples from 5 different populations

- 2 x 5 “G. undata population” samples

[Figure: tree of the evaluation samples. Tips: MK891_A to MK891_E (MK891 population samples), IO438_A to IO438_E (IO438 population samples), Quind_A, IO260_A, IO026_A, Asterogyne guianensis, Cocos nucifera, Ceroxylon alpinum, Licuala merguensis. Scale bar: 0.0070]

Sampling for kit evaluation: phylogeny samples

3 palm subfamilies represented

Up to 80 million years divergence to G. undata

[Figure: Licuala merguensis, Ceroxylon alpinum, Cocos nucifera, Asterogyne guianensis, Geonoma undata (http://www.palmweb.org)]

Protocol

DNA extraction (250-500 ng of DNA used)

“Home-made + KAPA” library preparation (dual-index sequencing)

Quantification and pooling

Mybait target capture + PCR (11 cycles)

Illumina sequencing, PE 2 x 150 bp (2 million PE reads per sample)
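A back-of-the-envelope check of what this sequencing effort means per sample, as a sketch. It assumes that "2 million PE reads" means 2 million read pairs and that the 4’770’883 bp figure is the total target size of the kit. The on-target fraction is not given in the slides, so the values looped over below are purely hypothetical.

```python
def expected_mean_coverage(read_pairs: int,
                           read_length: int,
                           target_bp: int,
                           on_target_fraction: float) -> float:
    """Mean depth over the target if a given fraction of bases is on target."""
    sequenced_bases = read_pairs * 2 * read_length  # two reads per pair
    return sequenced_bases * on_target_fraction / target_bp


READ_PAIRS = 2_000_000   # from the protocol (assumed to be read pairs)
READ_LENGTH = 150        # PE 2 x 150 bp
TARGET_BP = 4_770_883    # assumed to be the total target size of the kit

# Hypothetical on-target fractions; 1.0 gives the theoretical upper bound
for fraction in (1.0, 0.5, 0.25):
    depth = expected_mean_coverage(READ_PAIRS, READ_LENGTH, TARGET_BP, fraction)
    print(f"on-target {fraction:.2f}: about {depth:.0f}x")
```

Under these assumptions the upper bound is a little under 126x, so the realised depth depends mostly on capture efficiency.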

Total cost per sample = 80 $

High reproducibility

The whole procedure (library preparation + target capture + sequencing) was done in duplicate for each sample to test for reproducibility
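One way to quantify reproducibility between duplicates is a correlation of per-bait coverage between the two replicates of a sample. The sketch below uses Pearson's correlation coefficient. The slides do not state which coefficient was used, and the coverage values are made up, so both are assumptions.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy per-bait coverage for one sample (made-up numbers, same bait order)
replicate_1 = [120.0, 340.0, 560.0, 80.0, 900.0]
replicate_2 = [130.0, 310.0, 600.0, 95.0, 870.0]
r = pearson(replicate_1, replicate_2)
print(f"r = {r:.3f}")
```

In practice the two lists would hold the coverage of every bait in the kit, in the same order for both replicates.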

[Figure: coverage per bait in replicate 1 against coverage per bait in replicate 2, for Ceroxylon alpinum (Ceroxyloideae) and Geonoma undata (Arecoideae)]

Reproducibility is high for all samples, in all 3 subfamilies (correlation coefficient range: 0.94 – 0.98)

High efficiency of the method

[Figure: global efficiency (0 to 100) against sequencing effort (0 to 2’500’000), two panels]

Factors influencing bait efficiency

[Figure: four panels on the factors influencing bait efficiency]

SNP detection

Efficiency of the bait set for phylogeny

RAxML tree with concatenated data

High branch support, even within species

[Figure: RAxML tree. Tips: MK891_A to MK891_E (MK891 population samples), IO438_A to IO438_E (IO438 population samples), Quind_A, IO260_A, IO026_A, Asterogyne guianensis, Cocos nucifera, Ceroxylon alpinum, Licuala merguensis. The Geonoma undata clade is labelled. Branch support values: 96 to 100. Scale bar: 0.0070]

Efficiency of the bait set for population genomics

[Figure. A: scatter plot of the populations MK891, IO438, Quind, IO260 and IO026. B: cross-validation error against K (1 to 7). C: ancestry proportions (K=5) for samples Quind_A, IO260_A, IO026_A, IO438_A to IO438_E and MK891_A to MK891_E]

Ongoing work

Different sets of bait lists:

Mybait kit 3
- Full 60’000 baits = popcorn kit
- 57’000 baits = popcorn kit combined with the Heyduk et al. (2015) bait set

Other companies’ kit size
- 57’000 baits = popcorn kit
- 54’000 baits = popcorn kit combined with the Heyduk et al. (2015) bait set

Mybait kit 1
- 20’000 baits = reduced, phylogeny-informative kit

Phylogenetic informativeness: RAxML trees, branch support, PhyDesign (Heyduk et al. 2015)

Thanks for your attention

Christian Lexer, Marylaure de la Harpe

POPCORN group