About the Dryad repository for MER-20-0380 (doi:10.5061/dryad.z08kprrbj):

A multi-tiered sequence capture strategy spanning broad evolutionary scales: Application for phylogenetic and phylogeographic studies of orchids

Rod Peakall¹, Darren C. J. Wong¹, Ryan D. Phillips¹,², Monica Ruibal¹, Rodney Eyles¹, Claudia Rodriguez-Delgado¹, Celeste C. Linde¹

¹ Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT 2600, Australia.

² Department of Ecology, Environment and Evolution, La Trobe University, Vic. 3086, Australia.

This Dryad repository contains the following:

1. DNA sequences in Fasta format for each of sequence Sets 1 to 5, as provided to Roche NimbleGen for the probe design.

2. Final sequence alignment and homologous k-mer block files for all phylogenetic trees provided in the main text and supplement.

3. Tree files for all phylogenetic trees provided in the main text and supplement.

4. Functional annotation and MapMan BIN ontology of the target gene sequences.

5. Outcomes of HybPiper paralog risk analysis.

This readme file provides brief background on each of the five folders containing these data. Please also see the full publication and the online supporting information.

All raw transcriptome sequence reads have been deposited in NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) under the BioProject and SRA study accession of PRJNA661963 and SRP281918, respectively.

1. DNA sequences in Fasta format

1.1. Folder and file structure

1. Sequence Sets
    Sets1-2FinalSeq.fa
    Sets3-5FinalSeq.fa

1.2. About the target sequence sets

The final set of species for which transcriptomes were available or were generated is shown in Table 1 and Figure 2 of the paper. Table S1 provides further details of the associated sample source, location, and voucher specimens. Based on our objectives and transcriptome availability, we identified five different target sequence sets for use in the final probe design (Tables 1 and 2).

It is important to stress that while we will often refer to each of these target sequence sets individually in the paper, in practice a single DNA probe set was used in the hybridization step to simultaneously capture the target loci across all target sequence sets. Below we briefly outline the purpose of each sequence set:

Target Sequence Set 1 – Orchidaceae: Genes matching loci from Deng et al. (2015), who identified 315 putative orchid-wide single-locus orthologs, were the target of this set, drawing on sequences from our 11 ‘core’ transcriptomes (Figure 2 and Table 1). Set 1 ultimately targeted 230 loci (Table 2).

The sequences for this set are contained in the path: 1. Sequence Sets/S1-2FinalSeq.fa.

Target Sequence Set 2 – Diurideae: To supplement sequence Set 1, this set comprises additional target genes common across our 11 core transcriptomes spanning the Diurideae and outgroup (Figure 2, Table 1).

Note that this set also includes sequences for the following targets labelled S2-ACOX, S2-ADH, S2-IVD, S2-KAS, S2-RED and S2-PKS. These were included for purposes other than phylogenetic analysis (Wong et al., 2019; Wong et al., 2018; Wong et al., 2017; Xu et al., 2017), and may pose high paralog risk.

The sequences for this set are contained in the path: 1. Sequence Sets/S1-2FinalSeq.fa.
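Because the six targets noted above were included for purposes other than phylogenetic analysis, users may wish to set them aside before phylogenetic work. The following is a minimal Python sketch of one way to do so. It assumes that each target label (for example S2-ACOX) appears somewhere in the FASTA header line; the header layout of the deposited files is not described in this readme, so this assumption and the toy headers in the demonstration are hypothetical and should be checked against the actual files.

```python
from typing import Iterable, Iterator, List, Tuple

# Target labels named in this readme as included for non-phylogenetic purposes.
NON_PHYLO_TARGETS = ("S2-ACOX", "S2-ADH", "S2-IVD", "S2-KAS", "S2-RED", "S2-PKS")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_non_phylo(header: str) -> bool:
    # ASSUMPTION: the target label occurs as plain text within the header.
    return any(label in header for label in NON_PHYLO_TARGETS)


def filter_phylo_targets(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return the records whose headers carry none of the listed labels."""
    return [(h, s) for h, s in read_fasta(lines) if not is_non_phylo(h)]


if __name__ == "__main__":
    # Toy records with hypothetical headers, for demonstration only.
    demo = [">S2-ACOX_example", "ACGT", ">S2-locus_example", "GGCC", "TTAA"]
    for header, seq in filter_phylo_targets(demo):
        print(header, len(seq))
```

To apply this to a deposited file, pass an open file handle in place of the toy list, since a text file handle iterates over its lines.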

Target Sequence Set 3 – Other Diurideae: The goal of this set was to expand the number of genes targeting the Acianthinae, Diuridinae, Cryptostylidinae, Thelymitrinae, Megastylidinae, and Prasophyllinae, for which phylogenetic relationships at the tribe level were not fully resolved by Weston et al. (2014) (Figure 2, Table 1).

The sequences for this set are contained in the path: 1. Sequence Sets/S3-5FinalSeq.fa.

Target Sequence Set 4 – Drakaeninae: This sequence set was aimed at the Drakaeninae, based on six transcriptomes representing three of the 11 core transcriptomes, plus three additional transcriptomes. Collectively, the transcriptomes covered the genera Caleana, Chiloglottis and Drakaea as representatives of the subtribe, and Rimacola as the outgroup (Figure 2).

The sequences for this set are contained in the path: 1. Sequence Sets/S3-5FinalSeq.fa.

Target Sequence Set 5 – Caladeniinae: This sequence set was aimed at the Caladeniinae, based on seven transcriptomes, representing three of the 11 core transcriptomes, plus four additional transcriptomes. Collectively, the transcriptomes covered six Caladenia species, representing one species in the subgenus Phlebochilus, five species within subgenus Calonema, and Corybas as the outgroup (Figure 2 and Table 1), yielding 1052 gene loci.

The sequences for this set are contained in the path: 1. Sequence Sets/S3-5FinalSeq.fa.

2. Final sequence alignment and homologous k-mer block files

2.1. Folder and file structure

2. Alignments
    2.1 Option 1 D67 Alignment Sets 1-5
        D65S1Aln
        D67S2Aln
        D67S3Aln
        D67S4Aln
        D67S5Aln
    2.2 Option 1 C72 Alignment Set 5
        C72S5Aln
    2.3 Option 2 D67 kmer Alignment
        D67_k25c10q2w35.nex0
        D67_k25c10q2w35.nex0.sets
        D67_k25c10q2w35.nex0.treefile
    2.4 Option 2 C72 kmer Alignment
        C72_k25c11q0w35.nex0
        C72_k25c11q0w35.nex0.sets
        C72_k25c11q0w35.nex0.treefile
    2.5 Option 3 D67 HybPiper Alignment
        D67_cov50_S2.phy
        D67_cov50_S2.phy.treefile

2.2. About the alignments

The above alignments are split by the two test data sets outlined in the paper:

The D67 sample set, comprising 67 samples spanning the Diurideae and an outgroup, intended for exploratory analysis of sequence Sets 1-5.

The C72 sample set, comprising 72 samples of Caladenia from subgenus Calonema, and C. denticulata from the subgenus Phlebochilus as the outgroup, intended for exploratory analysis of sequence Set 5, targeting the Caladeniinae.

3. Tree files for all phylogenetic trees

3.1. Folder and file structure

3.1 Tree Files Main
    Fig6A D67_S2_gcf_scf.cf.tre
    Fig7A C72_S5_gcf_scf.cf.tre
    Fig8A C72_k25c18q2.nex0.tre

3.2 Tree Files Supp
    FigS10A C72_S5_Intron.tre
    FigS14A C72_S5.tre
    FigS14B C72_S1.tre
    FigS14C C72_S2.tre
    FigS5A D67_k25c10q2w35.nex0.tre
    FigS5B D67_cov50_S2.phy.tre
    FigS7A D67_S1.tre
    FigS7B D67_S2.tre
    FigS8 D67_S2_ASTRAL.tre
    FigS9 C72_S5_ASTRAL.tre

3.2. About the tree files

The tree files are in Newick format and can be read in FigTree or other tree reading programs. Please refer to the figure captions in both the main text and supplement for details.
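For a quick check of which samples a tree contains without opening a tree viewer, the tip labels can be pulled directly from the Newick text. The following is a minimal Python sketch, not a full Newick parser. It assumes unquoted tip labels without spaces or brackets, and it is demonstrated on a toy tree rather than on one of the deposited files.

```python
import re
from typing import List

# A tip label directly follows "(" or ",". Internal-node labels, such as
# support values, follow ")" and are therefore skipped.
# ASSUMPTION: labels are unquoted and contain no spaces, brackets, ":" or ";".
_TIP = re.compile(r"[(,]\s*([^(),:;\[\]\s]+)")


def tip_labels(newick: str) -> List[str]:
    """Return the tip labels of a Newick string in order of appearance."""
    return _TIP.findall(newick)


if __name__ == "__main__":
    toy_tree = "((A:0.1,B:0.2)90:0.05,C:0.3);"  # toy example only
    print(tip_labels(toy_tree))  # prints ['A', 'B', 'C']
```

For anything beyond listing tips, such as rerooting or reading support values, a dedicated tree library or FigTree is the safer choice.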

4. Functional annotation and MapMan BIN ontology

4.1. Folder and file structure

4 Annotation
    S1-5 Annotation.fa

4.2. About annotation

This fasta file lists the functional annotation and assignment of -specific MapMan BIN ontology of the target gene sequences, which was performed using Mercator4 v2.0 (Schwacke et al., 2019).

5. Paralogs

5.1. Folder and file structure

5. Paralogs
    D67.S1-S5.paralog.stat.pdf

5.2. About paralogs

We used the HybPiper pipeline (Johnson et al., 2016) to identify loci with high paralog risk, with this analysis performed across the 67 Diurideae-wide samples (D67). Under our criterion, which requires more than six of the 67 samples (> 10%) to be flagged at a locus as potentially containing paralogs, very few paralog-risk loci were detected. The percentage of loci with paralog risk was as follows: Set 1: 3 of 230 (~1.3%); Set 2: 9 of 245 (~3.7%); Set 3: 19 of 916 (~2.1%); Set 4: 12 of 1058 (~1.1%); Set 5: 4 of 1052 (~0.38%). The figure provided highlights the loci exceeding the criterion for each sequence set.
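The criterion and percentages above can be reproduced with simple arithmetic. The following Python sketch uses the per-set counts reported in this readme. The function names and the per-locus input (a count of flagged samples at a locus) are illustrative assumptions and do not correspond to any HybPiper output format.

```python
N_SAMPLES = 67   # size of the D67 sample set
MIN_FLAGGED = 7  # "more than six" flagged samples, i.e. 7 or more

# (paralog-risk loci, total loci) per sequence set, as reported in this readme.
RISK_COUNTS = {
    "Set1": (3, 230),
    "Set2": (9, 245),
    "Set3": (19, 916),
    "Set4": (12, 1058),
    "Set5": (4, 1052),
}


def is_paralog_risk(n_flagged_samples: int) -> bool:
    """Apply the criterion: more than six of the 67 samples flagged."""
    return n_flagged_samples >= MIN_FLAGGED


def percent(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Seven of 67 samples is the smallest count exceeding 10%.
    print(f"7 of {N_SAMPLES} samples = {percent(7, N_SAMPLES):.1f}%")
    for name, (n_risk, n_loci) in RISK_COUNTS.items():
        print(f"{name}: {n_risk} of {n_loci} = {percent(n_risk, n_loci):.2f}%")
```

Six flagged samples correspond to roughly 9% of 67, which is why the criterion is stated as more than six.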

6. References

Deng H, Zhang GQ, Lin M, Wang Y, Liu ZJ (2015) Mining from transcriptomes: 315 single-copy orthologous genes concatenated for the phylogenetic analyses of Orchidaceae. Ecology and Evolution 5, 3800-3807.

Schwacke R, Ponce-Soto GY, Krause K, et al. (2019) MapMan4: A refined protein classification and annotation framework applicable to multi-omics data analysis. Molecular Plant 12, 879-892.

Weston PH, Perkins AJ, Indsto JO, Clements MA (2014) Phylogeny of Orchidaceae Tribe Diurideae and Its Implications for the Evolution of Pollination Systems. In: Darwin’s Orchids Then and Now (eds. Edens-Meier R, Bernhardt P). The University of Chicago Press, Chicago.

Wong DCJ, Amarasinghe R, Falara V, Pichersky E, Peakall R (2019) Duplication and selection in β-ketoacyl-ACP synthase gene lineages in the sexually deceptive Chiloglottis (Orchidaceae). Annals of Botany 123, 1053-1066.

Wong DCJ, Amarasinghe R, Pichersky E, Peakall R (2018) Evidence for the involvement of fatty acid biosynthesis and degradation in the formation of insect sex pheromone-mimicking chiloglottones in sexually deceptive Chiloglottis orchids. Frontiers in Plant Science 9, 839.

Wong DCJ, Amarasinghe R, Rodriguez-Delgado C, et al. (2017) Tissue-specific floral transcriptome analysis of the sexually deceptive orchid Chiloglottis trapeziformis provides insights into the biosynthesis and regulation of its unique UV-B dependent floral volatile, chiloglottone 1. Frontiers in Plant Science 8, 1260.

Xu H, Bohman B, Wong DCJ, et al. (2017) Complex sexual deception in an orchid is achieved by co-opting two independent biosynthetic pathways for pollinator attraction. Current Biology 27, 1867-1877.
