SUPPLEMENTAL MATERIAL

Increased taxon sampling reveals thousands of hidden orthologs in flatworms

José M. Martín-Durán, Joseph F. Ryan, Bruno C. Vellutini, Kevin Pang, Andreas Hejnol

TABLE OF CONTENTS

• Supplemental Methods

• Supplemental Figures S1–S9

• Supplemental Tables 1–10

• Supplemental References

• Supplemental File 1 (scripts and Leapfrog code)

• Supplemental File 2 (HumRef2015 dataset and alignments)

Supplemental Methods

Macrostomum lignano transcriptome

Adults and juveniles of M. lignano were kept under laboratory conditions as described elsewhere (Rieger et al. 1988). Animals starved for four days were homogenized and used as source material to isolate total RNA with TRI Reagent (Life Technologies) following the manufacturer’s recommendations. A total of 1 µg of RNA was used for Illumina paired-end library preparation at the GeneCore facilities (EMBL) and sequencing on a HiSeq 2000 platform. Paired-end reads were cleaned of adaptors using Trimmomatic v.0.35 (Bolger et al. 2014), and the resulting reads were assembled de novo with Trinity v.r20140717 using default settings (Grabherr et al. 2011).

Data set preparation

We downloaded the Human RefSeq FASTA file from the NCBI FTP site, last updated on March 25, 2015 (ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/H_sapiens/protein/protein.fa.gz). We also downloaded the gene2accession data file from NCBI, which was last updated on July 3, 2015 (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz). We then used the reduce_refseq script (Supplemental File 1) to generate a non-redundant Human RefSeq FASTA file with the following command: (reduce_refseq --fasta=protein.fa.gz --gene2accession=gene2accession.gz > HumRef2015.fa). This script prints only the first isoform for each ID in the RefSeq FASTA file. The resulting file (available in Supplemental File 2) will be hereafter referred to as HumRef2015.

Additionally, we assembled de novo a transcriptome for M. lignano (Supplemental Methods), and downloaded the 28 RNA-seq de novo assemblies from Laumer et al. (2015) and six additional S. mediterranea datasets from PlanMine v1.0 (Brandl et al. 2016) on May 29, 2015. On July 14, 2015 we downloaded Schistosoma mansoni, Hymenolepis microstoma, and Girardia tigrina gene models from the Sanger FTP sites (S. mansoni: ftp://ftp.sanger.ac.uk/pub/project/pathogens/Schistosoma/mansoni/; H. microstoma: ftp://ftp.sanger.ac.uk/pub/project/pathogens/Hymenolepis/microstoma/; G. tigrina: https://drive.google.com/open?id=0B-oXGfLmcaXQUUhVS29IbC1CeHM). Further details on the datasets are available in Supplemental Table 1. All transcriptomes were constructed from adult tissue, except for those of S. ellipticus, P. vittatus, G. applanata and Prorhynchus sp. I, which included embryonic stages.
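The isoform filtering described above for reduce_refseq can be illustrated with a minimal Python sketch. This is not the reduce_refseq script from Supplemental File 1. It assumes that "first isoform" means the first protein record encountered in the FASTA file for a given gene ID, that gene2accession is tab-delimited with the taxon ID, gene ID and protein accession in columns 1, 2 and 6, and that records without a gene ID can be dropped. The function names are hypothetical.

```python
def read_gene2accession(lines, tax_id="9606"):
    """Map protein accession.version -> GeneID for one taxon.

    Assumed column layout (tab-delimited): 0 = tax_id, 1 = GeneID,
    5 = protein accession.version ("-" when absent).
    """
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 5 and fields[0] == tax_id and fields[5] != "-":
            mapping[fields[5]] = fields[1]
    return mapping


def accession_from_header(header):
    """Extract the accession from '>ACC desc' or '>gi|N|ref|ACC| desc'."""
    token = header[1:].split()[0]
    parts = token.split("|")
    if "ref" in parts:
        return parts[parts.index("ref") + 1]
    return token


def first_isoform_per_gene(fasta_lines, acc_to_gene):
    """Keep only the first FASTA record seen for each GeneID.

    Records whose accession has no GeneID are dropped (an assumption).
    """
    seen = set()
    keep = False
    kept_lines = []
    for line in fasta_lines:
        if line.startswith(">"):
            gene = acc_to_gene.get(accession_from_header(line))
            keep = gene is not None and gene not in seen
            if keep:
                seen.add(gene)
        if keep:
            kept_lines.append(line)
    return kept_lines
```

In practice both input files are gzip-compressed, so the line iterables would come from `gzip.open(path, "rt")`.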

CEGMA analysis, transcriptome quality assessment, and statistics

Transcriptome completeness was evaluated with CEGMA (Parra et al. 2007; Parra et al. 2009). We could not run the CEGMA pipeline on the transcriptomes of G. tigrina, Microdalyellia sp. and H. microstoma due to an untraceable error. We calculated the contig metrics for each transcriptome assembly with TransRate (Smith-Unna et al. 2015). Principal component analysis was performed in R (R Core Team 2015) and plotted using the ggplot2 package (Wickham 2009). For this analysis, branch lengths were the root-to-tip distances inferred from a previously reported maximum likelihood phylogeny (Laumer et al. 2015).
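As an illustration of the kind of contig metrics mentioned above, the following Python sketch computes a few standard assembly statistics (number of contigs, total bases, mean length, N50 and GC fraction). It is not TransRate's implementation, and the function name and the choice of metrics are assumptions made for this example.

```python
def contig_metrics(sequences):
    """Return basic length and composition statistics for a set of contigs.

    N50 is the length L such that contigs of length >= L contain at least
    half of all assembled bases.
    """
    lengths = sorted((len(seq) for seq in sequences), reverse=True)
    total = sum(lengths)

    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    gc_bases = sum(seq.upper().count("G") + seq.upper().count("C")
                   for seq in sequences)

    return {
        "n_seqs": len(lengths),
        "n_bases": total,
        "mean_len": total / len(lengths) if lengths else 0.0,
        "n50": n50,
        "gc": gc_bases / total if total else 0.0,
    }
```

A table of such per-assembly values, one row per transcriptome, is the sort of input a principal component analysis would take.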

Smed-SDCCAG8 and Smed-CEP192 primer sequences

The following gene-specific primers were used to clone Smed-SDCCAG8 and Smed-CEP192: Smed-SDCCAG8-F 5’ TTGACCAGGAACCGTACGAA 3’, Smed-SDCCAG8-R 5’ CACATTGCTCGAATCTGGCA 3’; Smed-CEP192-F 5’ CAACAGGTCCGATTCCAACC 3’, Smed-CEP192-R 5’ AACGCAACAGGAACCAGAAC 3’.

Supplemental Figures

Supplemental Figure S1. Distribution of hidden orthologs in the analyzed flatworm transcriptomes. The figure shows the total number of hidden orthologs in the analyzed transcriptomes in a phylogenetic context and with respect to their completeness (percentage of recovered core eukaryotic genes, CEGs). Transcriptome quality seems to limit the recovery of hidden orthologs in some flatworm lineages (e.g. Provortex cf. sphagnorum). However, the number of hidden orthologs is highly species-specific.

Supplemental Figure S2. Impact of a ‘bridge’ species in OrthoFinder. Number of orthogroups identified by OrthoFinder when only S. mediterranea and human are considered (top) and when a ‘bridge’ transcriptome is added (bottom). The inclusion of an intermediate species increases the number of sequences assigned to orthogroups from 29% to 46.14% of the transcriptome, and the number of orthogroups between S. mediterranea and human from 5,638 to 5,819. The inclusion of the P. vittatus ‘bridge’ transcriptome in an OrthoFinder analysis recovered only 62 out of the 75 hidden orthologs (82.7%) identified by the Leapfrog pipeline in the same S. mediterranea transcriptome with similar E-value cutoffs. For S. mediterranea, we used the transcriptome sequenced by J. Rink’s lab at the MPI in Dresden (Brandl et al. 2016).

Supplemental Figure S3. GC content in flatworm transcriptomes. GC content of each transcript plotted against its average length of G/C stretches for each flatworm species under study. The transcripts corresponding to hidden orthologs are in blue. Hidden orthologs do not differ from the majority of transcripts.
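The two per-transcript quantities plotted in Supplemental Figure S3 can be sketched in Python as follows. This sketch assumes that a "G/C stretch" is a maximal run of consecutive G or C bases. That definition and the function name are assumptions made for illustration, not necessarily the procedure used to produce the figure.

```python
import re


def gc_content_and_mean_stretch(sequence):
    """Return (GC fraction, mean length of maximal G/C runs) for a sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0, 0.0
    # Each match is one maximal run of consecutive G or C bases.
    runs = re.findall(r"[GC]+", seq)
    gc_bases = sum(len(run) for run in runs)
    mean_run = gc_bases / len(runs) if runs else 0.0
    return gc_bases / len(seq), mean_run
```

For example, the sequence ATGCGCATTGGA has two G/C runs (GCGC and GG), giving a GC fraction of 0.5 and a mean stretch length of 3.0.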

Supplemental Figure S4. Gene Ontology (GO) characterization of hidden orthologs. Distribution of GO terms for all recovered hidden orthologs (A–C) and for the hidden orthologs identified in S. mediterranea (D–F). Hidden orthologs include a great diversity of GO categories, with a large proportion related to binding and catalytic activity. The number of GO nodes in each category is indicated in parentheses.

[Figure: phylogenetic tree of CEP192 sequences from Ciona intestinalis, Capitella teleta, Dictyostelium discoideum, Mus musculus, Homo sapiens, Prostheceraeus vittatus, Stylochus ellipticus, Monocelis fusca, Hymenolepis microstoma, Clonorchis sinensis, Girardia tigrina, Schmidtea mediterranea and Stenostomum leucops, with node support values. Scale bar: 0.7.]

Supplemental Figure S5. Orthology analysis of the centrosomal CEP192 protein. CEP192 proteins do not contain any identifiable protein domain, and there is no known related protein that can help root the tree. Flatworm sequences are highlighted in red. Model of protein evolution: RtRev+I+G+F.

[Figure: phylogenetic tree of SDCCAG8 sequences from Crassostrea gigas, Biomphalaria glabrata, Capitella teleta, Lingula anatina, Danio rerio, Homo sapiens, Mus musculus, Stenostomum leucops, Girardia tigrina, Schmidtea mediterranea, Protomonotresidae sp., Prorhynchus alpinus, Xenoprorhynchus sp. I, Geocentrophora applanata, Monocelis fusca, Microstomum lineare, Macrostomum lignano, Gnosonesimida sp. IV, Prostheceraeus vittatus, Stylochus ellipticus and Rynchomesostoma rostratum, with node support values. Scale bar: 0.9.]

Supplemental Figure S6. Orthology analysis of the centrosomal SDCCAG8 protein. SDCCAG8 proteins contain an SDCCAG8 domain (PFAM: PF15964), which is exclusive to these proteins. The domain is clearly recognizable in all flatworm sequences except P. alpinus (fragment too short) and the triclads G. tigrina and S. mediterranea (too divergent). Flatworm sequences are highlighted in red. Model of protein evolution: JTT+G+F.

Supplemental Figure S7. Expression of Smed-SDCCAG8 and Smed-CEP192 in adult planarians. (A–B) Whole mount in situ hybridization of Smed-SDCCAG8 (n=6) and Smed-CEP192 (n=6) in adult intact S. mediterranea. (A) Smed-SDCCAG8 is expressed in the epidermis, anterior margin (black arrowheads), digestive system (dg) and pharynx, and faintly in the parenchyma. (B) Smed-CEP192 is expressed in a salt-and-pepper manner in isolated cells of the epidermis, posterior margin of the , and parenchyma.

[Figure: phylogenetic tree of homeodomain sequences from Homo sapiens, Branchiostoma floridae, Drosophila melanogaster and Tribolium castaneum, together with flatworm sequences from Prostheceraeus vittatus (VSX1, VSX2, DRGX, ARX, GSC, VAX, HHEX, DBX), Macrostomum lignano (DRGX, ARX, SHOX, PRRX, VAX, DBX), Schmidtea mediterranea (DRGX contigs 1 and 2, VAX, DBX) and Schmidtea polychroa (GSC), with node support values. Scale bar: 1.0.]

Supplemental Figure S8. Orthology analysis of the ANTP homeodomain class. The newly identified sequences in the macrostomid M. lignano, the polyclad P. vittatus and the triclad S. mediterranea are highlighted in red. Model of protein evolution: LG+G.

[Figure: phylogenetic tree of CUT class homeodomain sequences (CMP, CUX/CT, ACUT, SATB and ONECUT families) from Branchiostoma floridae, Drosophila melanogaster, Apis mellifera, Tribolium castaneum, Caenorhabditis elegans, Danio rerio, Xenopus laevis, Gallus gallus, Mus musculus and Homo sapiens, together with CMP sequences from Prostheceraeus vittatus, Macrostomum lignano and Schmidtea mediterranea, with node support values. Scale bar: 1.0.]

Supplemental Figure S9. Orthology analysis of the CUT homeodomain class. The newly identified sequences in the macrostomid M. lignano, the polyclad P. vittatus and the triclad S. mediterranea are highlighted in red. Model of protein evolution: LG+G.

Supplemental Tables

Supplemental Table 1. Transcriptomes analyzed in this study.

Supplemental Table 2. Recovered hidden orthologs. Hidden orthologs (as in HumRef2015) recovered in each transcriptome after running Leapfrog with the transcriptome of the polyclad P. vittatus used as the ‘bridge’.

Supplemental Table 3. List of hidden orthologs with reciprocal best BLAST hit against multiple human proteins.

Supplemental Table 4. PFAM domain composition of 130 human queries and their ortholog ‘bridge’ proteins in P. vittatus.

Supplemental Table 5. Number of hidden orthologs recovered with iterated ‘bridge’ transcriptomes.

Supplemental Table 6. Data set used for principal component analysis.

Supplemental Table 7. Length of hidden orthologs and ORFs in flatworm transcriptomes.

Supplemental Table 8. PFAM domains identified in the hidden orthologs.

Supplemental Table 9. Significantly enriched GO terms in the hidden orthologs recovered in S. mediterranea.

Supplemental References

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114-2120.

Brandl H, Moon H, Vila-Farre M, Liu SY, Henry I, Rink JC. 2016. PlanMine - a mineable resource of planarian biology and biodiversity. Nucleic Acids Res 44: D764-773.

Grabherr M, Haas BJ, Yassour M, Levin J, Thompson D, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q et al. 2011. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat Biotechnol 29: 644-652.

Laumer CE, Hejnol A, Giribet G. 2015. Nuclear genomic signals of the 'microturbellarian' roots of platyhelminth evolutionary innovation. Elife 4.

Parra G, Bradnam K, Korf I. 2007. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23: 1061-1067.

Parra G, Bradnam K, Ning Z, Keane T, Korf I. 2009. Assessing the gene space in draft genomes. Nucleic Acids Res 37: 289-297.

R Core Team. 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Rieger R, Gehlen M, Haszprunar G, Holmlund M, Legniti A, Salvenmoser W, Tyler S. 1988. Laboratory cultures of marine Macrostomida (Turbellaria). Progr Zool 36: 523.

Smith-Unna RD, Boursnell C, Patro R, Hibberd JM, Kelly S. 2015. TransRate: reference free quality assessment of de-novo transcriptome assemblies. BioRxiv 021626.

Wickham H. 2009. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York.