Comparative analysis of ciliary gene regulation in nematodes

by

Shirley Yin

B.Sc., University of British Columbia, 2014

Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science

in the
Department of Molecular Biology and Biochemistry
Faculty of Science

© Shirley Yin 2016
SIMON FRASER UNIVERSITY
Fall 2016

All rights reserved. However, in accordance with the Copyright Act of Canada, this work may be reproduced without authorization under the conditions for “Fair Dealing.” Therefore, limited reproduction of this work for the purposes of private study, research, criticism, review and news reporting is likely to be in accordance with the law, particularly if cited appropriately.

Approval

Name: Shirley Yin

Degree: Master of Science

Title: Comparative analysis of ciliary gene regulation in nematodes

Examining Committee:

Chair: Christopher Beh, Associate Professor

Jack Chen, Senior Supervisor, Professor

David Baillie, Supervisor, Professor Emeritus

Michel Leroux, Supervisor, Professor

Fiona Brinkman, Internal Examiner, Professor

Date Defended: 8 September 2016

Abstract

Cilia are highly conserved organelles ubiquitously present in metazoans and some unicellular eukaryotes. Cilia are responsible for many biological functions, including fluid movement during left-right body development, and sensory functions such as signal transduction in vision and olfaction. In humans, ciliary defects result in a plethora of serious genetic diseases termed ciliopathies. Despite their diverse morphology and function, cilia share a common microtubule-based structure and are composed of a core set of proteins, and many ciliary genes share similar but likely not identical regulation mechanisms. Our research aims to understand the variations in cis-regulatory elements in ciliary genes and the impact of such variations on transcriptional regulation. We hypothesize that cis-regulatory elements in different ciliary promoters are unique and that this uniqueness impacts the expression and function of ciliary genes. We focus on a particular cis-regulatory element, the X-box motif, which functions as the binding motif for RFX/DAF-19, a transcription factor that regulates ciliary gene expression. We identify and analyze X-box motifs for a set of 32 well-studied ciliary genes in C. elegans and their orthologs in 25 additional nematodes, including both free-living and parasitic species. This research consists of three modules. First, we curate ciliary gene orthologs using a combined approach, including homology-based gene finding and RNA-seq-based improvement. The primary goal of this step is to ensure that the 5’ ends of the genes are accurately defined in order to properly locate ciliary promoters. Second, we search for putative X-box motifs in these promoters using computational tools to identify motifs that resemble the consensus. For the promoters in which consensus X-box motifs are not found, we use frequency matrix-based search and regular expressions to search for X-box motifs that differ more from the consensus, which we call “atypical” X-box motifs. Third, we analyze the putative atypical X-box motifs, focusing on their sequence similarities, positions in promoter sequences, and flanking sequences, and compare them against the consensus X-box motifs. We find that atypical X-box motifs differ from the consensus but do not have common patterns or characteristics. Our research aims to understand the variations that can occur in X-box motifs despite the highly conserved DNA-binding domain of RFX/DAF-19.

Keywords: cilia; bioinformatics; transcriptional regulation; X-box motifs

Acknowledgements

Many people were helpful and influential during my time in graduate school. I am extremely grateful to my senior supervisor, Dr. Jack Chen, for his mentorship and for providing valuable advice and support in my research project. I have learned and benefited immensely from his dedication and patience. I am also grateful to Dr. Jiarui Li, who was always willing to provide help and advice and taught me a lot.

My gratitude extends to my committee members, Dr. David Baillie and Dr. Michel Leroux, for providing guidance and reviewing the thesis, as well as Dr. Fiona Brinkman and Dr. Christopher Beh for serving on my examining committee.

I would like to thank the members of the Chen lab: Zhaozhao Qin, Cyndi Zhao, Jun Wang, Dr. Xi Chen, Dr. Timothy Warrington, Marija Jovanovic, Justin White, Matthew Douglas, Dr. Junxiang Gao, and Dr. Jian Ling, for creating an enjoyable and productive place to work. Farnaz Bondar and Chander Siddarth were undergraduate volunteers partially under my supervision. They contributed, respectively, to the identification of X-box polymorphisms in C. elegans strains and to gene annotation and X-box search in B. xylophilus.

Finally, I am grateful to friends and family for their ongoing encouragement and support throughout these years.

Table of Contents

Approval
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Acronyms
Glossary

1 Introduction
1.1 Overview of transcriptional regulation of genes
1.2 Discovery of X-box motifs and RFX genes
1.3 More recent efforts to identify RFX genes, X-box motifs, and ciliary genes
1.4 RFX genes in nematodes
1.5 Overview of cilia and ciliary components/genes
1.6 Ciliopathies
1.7 Thesis aims and organization

2 Development of a bioinformatics pipeline for annotating ciliary genes and identifying X-box motifs
2.1 Introduction
2.2 Criteria for a high-confidence ciliary gene set
2.3 Searching for ciliary gene orthologs in species
2.4 Annotation of 5’ start sites of ciliary genes
2.5 Identification of X-box motifs
2.5.1 Identification of typical X-box motifs using HMMER
2.5.2 Identification of atypical X-box motifs using TFM-scan
2.5.3 Identification of atypical X-box motifs using regular expressions
2.5.4 Identification of atypical X-box motifs using manual inspection
2.6 Reconstructing gene models with RNA-seq (TBLASTN) analysis
2.7 Discussion

3 Curation of ciliary genes in pathogenic and non-pathogenic nematodes
3.1 Introduction
3.2 Phylogenetic analysis of nematodes
3.3 Identification and annotation of ciliary gene orthologs
3.3.1 Curation of arl-6 orthologs in nematodes
3.3.2 Curation of bbs-1 orthologs in nematodes
3.3.3 Curation of bbs-2 orthologs in nematodes
3.3.4 Curation of bbs-4 orthologs in nematodes
3.3.5 Curation of bbs-5 orthologs in nematodes
3.3.6 Curation of bbs-8 orthologs in nematodes
3.3.7 Curation of bbs-9 orthologs in nematodes
3.3.8 Curation of che-2 orthologs in nematodes
3.3.9 Curation of che-11 orthologs in nematodes
3.3.10 Curation of che-13 orthologs in nematodes
3.3.11 Curation of dyf-1 orthologs in nematodes
3.3.12 Curation of dyf-2 orthologs in nematodes
3.3.13 Curation of dyf-3 orthologs in nematodes
3.3.14 Curation of dyf-5 orthologs in nematodes
3.3.15 Curation of dyf-11 orthologs in nematodes
3.3.16 Curation of dyf-13 orthologs in nematodes
3.3.17 Curation of dyf-18 orthologs in nematodes
3.3.18 Curation of dylt-2 orthologs in nematodes
3.3.19 Curation of ift-20 orthologs in nematodes
3.3.20 Curation of ifta-1 orthologs in nematodes
3.3.21 Curation of mks-1 orthologs in nematodes
3.3.22 Curation of mks-6 orthologs in nematodes
3.3.23 Curation of mksr-1 orthologs in nematodes
3.3.24 Curation of mksr-2 orthologs in nematodes
3.3.25 Curation of nphp-2 orthologs in nematodes
3.3.26 Curation of odr-4 orthologs in nematodes
3.3.27 Curation of osm-1 orthologs in nematodes
3.3.28 Curation of osm-5 orthologs in nematodes
3.3.29 Curation of osm-6 orthologs in nematodes
3.3.30 Curation of osm-12 orthologs in nematodes
3.3.31 Curation of tub-1 orthologs in nematodes
3.3.32 Curation of xbx-1 orthologs in nematodes
3.3.33 Summary of gene annotation efforts
3.4 Tracing the missing ciliary genes through gene model reconstruction using RNA-seq analysis
3.4.1 Reconstructing C. elegans osm-5 using C. elegans RNA-seq reads
3.4.2 Reconstructing C. elegans osm-5 using a C. briggsae query
3.4.3 Case study: Reconstruction of M. incognita osm-5
3.5 Discussion

4 Identification and comparative analysis of X-box motifs in nematodes
4.1 Introduction
4.2 Identification of putative X-box motifs in nematodes
4.2.1 X-box motifs in the arl-6 promoter
4.2.2 X-box motifs in the bbs-1 promoter
4.2.3 X-box motifs in the bbs-2 promoter
4.2.4 X-box motifs in the bbs-4 promoter
4.2.5 X-box motifs in the bbs-5 promoter
4.2.6 X-box motifs in the bbs-8 promoter
4.2.7 X-box motifs in the bbs-9 promoter
4.2.8 X-box motifs in the che-2 promoter
4.2.9 X-box motifs in the che-11 promoter
4.2.10 X-box motifs in the che-13 promoter
4.2.11 X-box motifs in the dyf-1 promoter
4.2.12 X-box motifs in the dyf-2 promoter
4.2.13 X-box motifs in the dyf-3 promoter
4.2.14 X-box motifs in the dyf-5 promoter
4.2.15 X-box motifs in the dyf-11 promoter
4.2.16 X-box motifs in the dyf-13 promoter
4.2.17 X-box motifs in the dyf-18 promoter
4.2.18 X-box motifs in the dylt-2 promoter
4.2.19 X-box motifs in the ift-20 promoter
4.2.20 X-box motifs in the ifta-1 promoter
4.2.21 X-box motifs in the mks-1 promoter
4.2.22 X-box motifs in the mks-6 promoter
4.2.23 X-box motifs in the mksr-1 promoter
4.2.24 X-box motifs in the mksr-2 promoter
4.2.25 X-box motifs in the nphp-2 promoter
4.2.26 X-box motifs in the odr-4 promoter
4.2.27 X-box motifs in the osm-1 promoter
4.2.28 X-box motifs in the osm-5 promoter
4.2.29 X-box motifs in the osm-6 promoter
4.2.30 X-box motifs in the osm-12 promoter
4.2.31 X-box motifs in the tub-1 promoter
4.2.32 X-box motifs in the xbx-1 promoter
4.3 Putative X-box motifs in C. briggsae promoters
4.4 Discussion of validation procedures
4.4.1 In vitro methods
4.4.2 Transcriptional reporter gene assay
4.4.3 MosSCI assay
4.5 Polymorphisms in C. elegans strains
4.6 Discussion

5 Conclusion

Bibliography

List of Tables

Table 1.1 Ciliary genes in C. elegans
Table 1.2 Common symptoms in ciliopathies (reviewed in Hildebrandt et al. (2011); Waters and Beales (2011))
Table 1.3 The nematode species used in this study, including C. elegans and 25 additional species

Table 2.1 List of high confidence ciliary genes in C. elegans

Table 3.1 Core genes present in single copy across all 26 nematode species
Table 3.2 Curation of arl-6 orthologs
Table 3.3 Curation of bbs-1 orthologs
Table 3.4 Curation of bbs-2 orthologs
Table 3.5 Curation of bbs-4 orthologs
Table 3.6 Curation of bbs-5 orthologs
Table 3.7 Curation of bbs-8 orthologs
Table 3.8 Curation of bbs-9 orthologs
Table 3.9 Curation of che-2 orthologs
Table 3.10 Curation of che-11 orthologs
Table 3.11 Curation of che-13 orthologs
Table 3.12 Curation of dyf-1 orthologs
Table 3.13 Curation of dyf-2 orthologs
Table 3.14 Curation of dyf-3 orthologs
Table 3.15 Curation of dyf-5 orthologs
Table 3.16 Curation of dyf-11 orthologs
Table 3.17 Curation of dyf-13 orthologs
Table 3.18 Curation of dyf-18 orthologs
Table 3.19 Curation of dylt-2 orthologs
Table 3.20 Curation of ift-20 orthologs
Table 3.21 Curation of ifta-1 orthologs
Table 3.22 Curation of mks-1 orthologs
Table 3.23 Curation of mks-6 orthologs
Table 3.24 Curation of mksr-1 orthologs
Table 3.25 Curation of mksr-2 orthologs
Table 3.26 Curation of nphp-2 orthologs
Table 3.27 Curation of odr-4 orthologs
Table 3.28 Curation of osm-1 orthologs
Table 3.29 Curation of osm-5 orthologs
Table 3.30 Curation of osm-6 orthologs
Table 3.31 Curation of osm-12 orthologs
Table 3.32 Curation of tub-1 orthologs
Table 3.33 Curation of xbx-1 orthologs
Table 3.34 Summary of gene annotation of ciliary gene orthologs

Table 4.1 Position specific scoring matrix used to score X-box motifs
Table 4.2 Alignment of X-box motifs in the arl-6 promoter
Table 4.3 Alignment of X-box motifs in the bbs-1 promoter
Table 4.4 Alignment of X-box motifs in the bbs-2 promoter
Table 4.5 Alignment of X-box motifs in the bbs-4 promoter
Table 4.6 Alignment of X-box motifs in the bbs-5 promoter
Table 4.7 Alignment of X-box motifs in the bbs-8 promoter
Table 4.8 Alignment of X-box motifs in the bbs-9 promoter
Table 4.9 Alignment of X-box motifs in the che-2 promoter
Table 4.10 Alignment of X-box motifs in the che-11 promoter
Table 4.11 Alignment of X-box motifs in the che-13 promoter
Table 4.12 Alignment of X-box motifs in the dyf-1 promoter
Table 4.13 Alignment of X-box motifs in the dyf-2 promoter
Table 4.14 Alignment of X-box motifs in the dyf-3 promoter
Table 4.15 Alignment of X-box motifs in the dyf-5 promoter
Table 4.16 Alignment of X-box motifs in the dyf-11 promoter
Table 4.17 Alignment of X-box motifs in the dyf-13 promoter
Table 4.18 Alignment of X-box motifs in the dyf-18 promoter
Table 4.19 Alignment of X-box motifs in the dylt-2 promoter
Table 4.20 Alignment of X-box motifs in the ift-20 promoter
Table 4.21 Alignment of X-box motifs in the ifta-1 promoter
Table 4.22 Alignment of X-box motifs in the mks-1 promoter
Table 4.23 Alignment of X-box motifs in the mks-6 promoter
Table 4.24 Alignment of X-box motifs in the mksr-1 promoter
Table 4.25 Alignment of X-box motifs in the mksr-2 promoter
Table 4.26 Alignment of X-box motifs in the nphp-2 promoter
Table 4.27 Alignment of X-box motifs in the odr-4 promoter
Table 4.28 Alignment of X-box motifs in the osm-1 promoter
Table 4.29 Alignment of X-box motifs in the osm-5 promoter
Table 4.30 Alignment of X-box motifs in the osm-6 promoter
Table 4.31 Alignment of X-box motifs in the osm-12 promoter
Table 4.32 Alignment of X-box motifs in the tub-1 promoter
Table 4.33 Alignment of X-box motifs in the xbx-1 promoter
Table 4.34 Summary of atypical X-box motifs in C. briggsae
Table 4.35 C. elegans strains sequenced in Million Mutation Project (Thompson et al., 2013)

List of Figures

Figure 1.1 Multiple sequence alignment of DNA binding domain in DAF-19 orthologs. The arrows indicate residues that interact with the X-box motif.
Figure 1.2 Phylogenetic tree produced from DAF-19 DNA binding domain sequences in nematodes.
Figure 1.3 Molecular structure of the cilium.

Figure 2.1 LOGO of X-box motifs in the C. elegans training set
Figure 2.2 Searching for half-motifs using TFM-scan

Figure 3.1 Distribution of species represented in 1:1:...:1 ortholog clusters
Figure 3.2 Distribution of species represented in 1:1:...:1 ortholog clusters
Figure 3.3 Phylogenetic tree showing relationships between nematode species
Figure 3.4 Phylogenetic tree showing relationships between nematode species, with branch lengths included to show similarities between clades
Figure 3.5 Multiple sequence alignment of first 100 a.a. of arl-6 orthologs. Among 25 nematode genomes, 26 arl-6 orthologs are found, and none are not found. Note: H. contortus contains two arl-6 genes, and both genes have high confidence 5’ start sites.
Figure 3.6 Multiple sequence alignment of first 100 a.a. of bbs-1 orthologs. Among 25 nematode genomes, 26 bbs-1 orthologs are found, and none are not found. Note: P. redivivus contains two bbs-1 genes, and both genes have high confidence 5’ start sites.
Figure 3.7 Multiple sequence alignment of first 100 a.a. of bbs-2 orthologs. Among 25 nematode genomes, 25 bbs-2 orthologs are found, and none are not found.
Figure 3.8 Multiple sequence alignment of first 100 a.a. of bbs-4 orthologs. Among 25 nematode genomes, 25 bbs-4 orthologs are found, and 2 are not found. Note: C. briggsae and H. contortus contain two bbs-4 genes. Both of the C. briggsae bbs-4 genes have high confidence 5’ start sites, but only one of the H. contortus genes has a high confidence 5’ start site.
Figure 3.9 Multiple sequence alignment of first 100 a.a. of bbs-5 orthologs. Among 25 nematode genomes, 26 bbs-5 orthologs are found, and none are not found. Note: M. incognita contains two bbs-5 genes, and both genes have high confidence 5’ start sites.
Figure 3.10 Multiple sequence alignment of first 100 a.a. of bbs-8 orthologs. Among 25 nematode genomes, 27 bbs-8 orthologs are found, and none are not found. Note: C. brenneri and M. incognita contain two bbs-8 genes. Both C. brenneri bbs-8 genes have high confidence 5’ start sites, but only one M. incognita gene has a high confidence 5’ start site.
Figure 3.11 Multiple sequence alignment of first 100 a.a. of bbs-9 orthologs. Among 25 nematode genomes, 19 bbs-9 orthologs are found, and 6 are not found.
Figure 3.12 Multiple sequence alignment of first 100 a.a. of che-2 orthologs. Among 25 nematode genomes, 25 che-2 orthologs are found, and none are not found.
Figure 3.13 Multiple sequence alignment of first 100 a.a. of che-11 orthologs. Among 25 nematode genomes, 24 che-11 orthologs are found, and 1 is not found.
Figure 3.14 Multiple sequence alignment of first 100 a.a. of che-11 orthologs, only showing genes with high confidence 5’ start sites. This additional alignment is generated in order to show the conserved regions more clearly and remove noise caused by sequences without high confidence 5’ start sites.
Figure 3.15 Multiple sequence alignment of first 100 a.a. of che-13 orthologs. Among 25 nematode genomes, 24 che-13 orthologs are found, and 1 is not found.
Figure 3.16 Multiple sequence alignment of first 100 a.a. of dyf-1 orthologs. Among 25 nematode genomes, 26 dyf-1 orthologs are found, and none are not found. Note: C. brenneri contains two dyf-1 genes, and both genes have high confidence 5’ start sites.
Figure 3.17 Multiple sequence alignment of first 100 a.a. of dyf-2 orthologs. Among 25 nematode genomes, 24 dyf-2 orthologs are found, and 1 is not found.
Figure 3.18 Multiple sequence alignment of first 100 a.a. of dyf-2 orthologs, only showing genes with high confidence 5’ start sites. This additional alignment is generated in order to show the conserved regions more clearly and remove noise caused by sequences without high confidence 5’ start sites.
Figure 3.19 Multiple sequence alignment of first 100 a.a. of dyf-3 orthologs. Among 25 nematode genomes, 24 dyf-3 orthologs are found, and 1 is not found.
Figure 3.20 Multiple sequence alignment of first 100 a.a. of dyf-5 orthologs. Among 25 nematode genomes, 29 dyf-5 orthologs are found, and none are not found. Note: C. japonica contains two dyf-5 genes, and M. incognita contains three dyf-5 genes. One of the C. japonica dyf-5 genes has a high confidence 5’ start site, and two of the M. incognita dyf-5 genes have high confidence 5’ start sites.
Figure 3.21 Multiple sequence alignment of first 100 a.a. of dyf-11 orthologs. Among 25 nematode genomes, 24 dyf-11 orthologs are found, and 1 is not found.
Figure 3.22 Multiple sequence alignment of first 100 a.a. of dyf-13 orthologs. Among 25 nematode genomes, 24 dyf-13 orthologs are found, and 2 are not found. Note: C. brenneri contains two dyf-13 genes, and only one of those genes has a high confidence 5’ start site.
Figure 3.23 Multiple sequence alignment of first 100 a.a. of dyf-13 orthologs, only showing genes with high confidence 5’ start sites.
Figure 3.24 Multiple sequence alignment of first 100 a.a. of dyf-18 orthologs. Among 25 nematode genomes, 25 dyf-18 orthologs are found, and none are not found.
Figure 3.25 Multiple sequence alignment of first 100 a.a. of dylt-2 orthologs. Among 25 nematode genomes, 20 dylt-2 orthologs are found, and 5 are not found.
Figure 3.26 Multiple sequence alignment of first 100 a.a. of ift-20 orthologs. Among 25 nematode genomes, 23 ift-20 orthologs are found, and 3 are not found. Note: C. brenneri contains two ift-20 genes, and both genes have high confidence 5’ start sites.
Figure 3.27 Multiple sequence alignment of first 100 a.a. of ifta-1 orthologs. Among 25 nematode genomes, 21 ifta-1 orthologs are found, and 4 are not found.
Figure 3.28 Multiple sequence alignment of first 100 a.a. of mks-1 orthologs. Among 25 nematode genomes, 11 mks-1 orthologs are found, and 14 are not found.
Figure 3.29 Multiple sequence alignment of first 100 a.a. of mks-6 orthologs. Among 25 nematode genomes, 9 mks-6 orthologs are found, and 16 are not found.
Figure 3.30 Multiple sequence alignment of first 100 a.a. of mksr-1 orthologs. Among 25 nematode genomes, 23 mksr-1 orthologs are found, and 2 are not found.
Figure 3.31 Multiple sequence alignment of first 100 a.a. of mksr-2 orthologs. Among 25 nematode genomes, 24 mksr-2 orthologs are found, and 1 is not found.
Figure 3.32 Multiple sequence alignment of first 100 a.a. of nphp-2 orthologs. Among 25 nematode genomes, 23 nphp-2 orthologs are found, and 3 are not found. Note: H. contortus contains two nphp-2 genes, and neither gene has a high confidence 5’ start site.
Figure 3.33 Multiple sequence alignment of first 100 a.a. of odr-4 orthologs. Among 25 nematode genomes, 19 odr-4 orthologs are found, and 7 are not found. Note: H. contortus contains two odr-4 genes, and neither gene has a high confidence 5’ start site.
Figure 3.34 Multiple sequence alignment of first 100 a.a. of osm-1 orthologs. Among 25 nematode genomes, 27 osm-1 orthologs are found, and none are not found. Note: H. contortus and M. incognita contain two osm-1 genes, and all of these genes have high confidence 5’ start sites.
Figure 3.35 Multiple sequence alignment of first 100 a.a. of osm-5 orthologs. Among 25 nematode genomes, 24 osm-5 orthologs are found, and 1 is not found.
Figure 3.36 Multiple sequence alignment of first 100 a.a. of osm-6 orthologs. Among 25 nematode genomes, 27 osm-6 orthologs are found, and none are not found. Note: C. brenneri and M. incognita contain two osm-6 genes, and all of these genes have high confidence 5’ start sites.
Figure 3.37 Multiple sequence alignment of first 100 a.a. of osm-12 orthologs. Among 25 nematode genomes, 25 osm-12 orthologs are found, and 1 is not found. Note: H. contortus contains two osm-12 genes, and both genes have high confidence 5’ start sites.
Figure 3.38 Multiple sequence alignment of first 100 a.a. of tub-1 orthologs. Among 25 nematode genomes, 25 tub-1 orthologs are found, and 1 is not found. Note: C. angaria contains two tub-1 genes, and both genes have high confidence 5’ start sites.
Figure 3.39 Multiple sequence alignment of first 100 a.a. of tub-1 orthologs, only showing genes with high confidence 5’ start sites. This additional alignment is generated in order to show the conserved regions more clearly and remove noise caused by sequences without high confidence 5’ start sites.
Figure 3.40 Multiple sequence alignment of first 100 a.a. of xbx-1 orthologs. Among 25 nematode genomes, 26 xbx-1 orthologs are found, and 2 are not found. Note: H. contortus, M. incognita, and O. volvulus contain two xbx-1 genes. Both genes in H. contortus and O. volvulus have high confidence 5’ start sites, and neither M. incognita gene has a high confidence 5’ start site.
Figure 3.41 C. elegans osm-5 protein sequence constructed from RNA-seq reads
Figure 3.42 Pairwise alignment between C. elegans osm-5 reference sequence and sequence reconstructed from RNA-seq reads
Figure 3.43 C. elegans osm-5 protein sequence constructed from RNA-seq reads, using C. briggsae osm-5 as the query
Figure 3.44 5’ end of C. briggsae osm-5 gene model, showing minor revisions to the end of the first exon supported by RNA-seq splicing junctions
Figure 3.45 5’ excerpt of pairwise alignment between C. elegans (Y41G9A.1) and C. briggsae (CBG02013) osm-5 sequences
Figure 3.46 Pairwise alignment between C. elegans osm-5 reference sequence and reconstructed osm-5 from a C. briggsae osm-5 query, using TBLASTN e-value ≤ 1
Figure 3.47 Multiple sequence alignment between C. elegans osm-5 reference sequence, C. briggsae osm-5 reference sequence, and reconstructed osm-5 from a C. briggsae osm-5 query, using TBLASTN e-value ≤ 1
Figure 3.48 M. incognita osm-5 protein sequence constructed from RNA-seq reads, using C. elegans osm-5 as the query
Figure 3.49 Pairwise alignment between C. elegans osm-5 and the reconstructed M. incognita osm-5

Figure 4.1 Distribution of X-box motifs in the arl-6 promoter...... 144 Figure 4.2 Sequence logos of arl-6 X-box motifs...... 144 Figure 4.3 LOGO depicting aligned arl-6 X-box motifs and 30bp of flanking promoter sequence...... 144 Figure 4.4 Distribution of X-box motifs in the bbs-1 promoter...... 146 Figure 4.5 Sequence logo of bbs-1 X-box motifs...... 146 Figure 4.6 LOGO depicting aligned bbs-1 X-box motifs and 30bp of flanking promoter sequence...... 146 Figure 4.7 Distribution of X-box motifs in the bbs-2 promoter...... 148 Figure 4.8 Sequence logo of bbs-2 X-box motifs...... 148 Figure 4.9 LOGO depicting aligned bbs-2 X-box motifs and 30bp of flanking promoter sequence...... 148 Figure 4.10 Distribution of X-box motifs in the bbs-4 promoter...... 149 Figure 4.11 Sequence logo of bbs-4 typical X-box motifs (generated from 18 input se- quences, including C. elegans motif(s)) ...... 150 Figure 4.12 LOGO depicting aligned bbs-4 X-box motifs and 30bp of flanking promoter sequence...... 150 Figure 4.13 Distribution of X-box motifs in the bbs-5 promoter...... 151 Figure 4.14 Sequence logo of bbs-5 typical X-box motifs (generated from 27 input se- quences, including C. elegans motif(s))...... 151 Figure 4.15 LOGO depicting aligned bbs-5 X-box motifs and 30bp of flanking promoter sequence...... 151 Figure 4.16 Distribution of X-box motifs in the bbs-8 promoter...... 154 Figure 4.17 Sequence logo of bbs-8 X-box motifs...... 154 Figure 4.18 LOGO depicting aligned bbs-8 X-box motifs and 30bp of flanking promoter sequence...... 154 Figure 4.19 Distribution of X-box motifs in the bbs-9 promoter...... 156 Figure 4.20 Sequence logo of bbs-9 typical X-box motifs (generated from 19 input se- quences, including C. elegans motif(s))...... 156 Figure 4.21 LOGO depicting aligned bbs-9 X-box motifs and 30bp of flanking promoter sequence...... 156 Figure 4.22 Distribution of X-box motifs in the che-2 promoter...... 157 Figure 4.23 Sequence logo of che-2 X-box motifs...... 158

xvi Figure 4.24 LOGO depicting aligned che-2 X-box motifs and 30bp of flanking promoter sequence...... 158 Figure 4.25 Distribution of X-box motifs in the che-11 promoter...... 159 Figure 4.26 Sequence logo of che-11 typical X-box motifs (generated from 19 input sequences, including C. elegans motif(s))...... 159 Figure 4.27 LOGO depicting aligned che-11 X-box motifs and 30bp of flanking pro- moter sequence...... 159 Figure 4.28 Distribution of X-box motifs in the che-13 promoter...... 160 Figure 4.29 Sequence logo of che-13 X-box motifs...... 161 Figure 4.30 LOGO depicting aligned che-13 X-box motifs and 30bp of flanking pro- moter sequence...... 161 Figure 4.31 Distribution of X-box motifs in the dyf-1 promoter...... 163 Figure 4.32 Sequence logo of dyf-1 X-box motifs...... 163 Figure 4.33 LOGO depicting aligned dyf-1 X-box motifs and 30bp of flanking promoter sequence...... 163 Figure 4.34 Distribution of X-box motifs in the dyf-2 promoter...... 164 Figure 4.35 Sequence logo of dyf-2 X-box motifs...... 165 Figure 4.36 LOGO depicting aligned dyf-2 X-box motifs and 30bp of flanking promoter sequence...... 165 Figure 4.37 Distribution of X-box motifs in the dyf-3 promoter...... 166 Figure 4.38 Sequence logo of dyf-3 X-box motifs...... 167 Figure 4.39 LOGO depicting aligned dyf-3 X-box motifs and 30bp of flanking promoter sequence...... 167 Figure 4.40 Distribution of X-box motifs in the dyf-5 promoter...... 169 Figure 4.41 Sequence logo of dyf-5 X-box motifs...... 169 Figure 4.42 LOGO depicting aligned dyf-5 X-box motifs and 30bp of flanking promoter sequence...... 169 Figure 4.43 Distribution of X-box motifs in the dyf-11 promoter...... 171 Figure 4.44 Sequence logo of dyf-11 X-box motifs...... 171 Figure 4.45 LOGO depicting aligned dyf-11 X-box motifs and 30bp of flanking pro- moter sequence...... 171 Figure 4.46 Distribution of X-box motifs in the dyf-13 promoter...... 172 Figure 4.47 Sequence logo of dyf-13 X-box motifs...... 
173 Figure 4.48 LOGO depicting aligned dyf-13 X-box motifs and 30bp of flanking pro- moter sequence...... 173 Figure 4.49 Distribution of X-box motifs in the dyf-18 promoter...... 174 Figure 4.50 Sequence logo of dyf-18 X-box motifs...... 174 Figure 4.51 LOGO depicting aligned dyf-18 X-box motifs and 30bp of flanking pro- moter sequence...... 174

Figure 4.52 Distribution of X-box motifs in the dylt-2 promoter
Figure 4.53 Sequence logo of dylt-2 typical X-box motifs (generated from 21 input sequences, including C. elegans motif(s))
Figure 4.54 LOGO depicting aligned dylt-2 X-box motifs and 30bp of flanking promoter sequence
Figure 4.55 Distribution of X-box motifs in the ift-20 promoter
Figure 4.56 Sequence logo of ift-20 typical X-box motifs (generated from 21 input sequences, including C. elegans motif(s))
Figure 4.57 LOGO depicting aligned ift-20 X-box motifs and 30bp of flanking promoter sequence
Figure 4.58 Distribution of X-box motifs in the ifta-1 promoter
Figure 4.59 Sequence logo of ifta-1 typical X-box motifs (generated from 18 input sequences, including C. elegans motif(s))
Figure 4.60 LOGO depicting aligned ifta-1 X-box motifs and 30bp of flanking promoter sequence
Figure 4.61 Distribution of X-box motifs in the mks-1 promoter
Figure 4.62 Sequence logo of mks-1 X-box motifs
Figure 4.63 LOGO depicting aligned mks-1 X-box motifs and 30bp of flanking promoter sequence
Figure 4.64 Distribution of X-box motifs in the mks-6 promoter
Figure 4.65 Sequence logo of mks-6 typical X-box motifs (generated from 7 input sequences, including C. elegans motif(s))
Figure 4.66 LOGO depicting aligned mks-6 X-box motifs and 30bp of flanking promoter sequence
Figure 4.67 Distribution of X-box motifs in the mksr-1 promoter
Figure 4.68 Sequence logo of mksr-1 typical X-box motifs (generated from 21 input sequences, including C. elegans motif(s))
Figure 4.69 LOGO depicting aligned mksr-1 X-box motifs and 30bp of flanking promoter sequence
Figure 4.70 Distribution of X-box motifs in the mksr-2 promoter
Figure 4.71 Sequence logo of mksr-2 X-box motifs
Figure 4.72 LOGO depicting aligned mksr-2 X-box motifs and 30bp of flanking promoter sequence
Figure 4.73 Distribution of X-box motifs in the nphp-2 promoter
Figure 4.74 Sequence logo of nphp-2 X-box motifs
Figure 4.75 LOGO depicting aligned nphp-2 X-box motifs and 30bp of flanking promoter sequence
Figure 4.76 Distribution of X-box motifs in the odr-4 promoter
Figure 4.77 Sequence logo of odr-4 X-box motifs

Figure 4.78 LOGO depicting aligned odr-4 X-box motifs and 30bp of flanking promoter sequence
Figure 4.79 Distribution of X-box motifs in the osm-1 promoter
Figure 4.80 Sequence logo of osm-1 typical X-box motifs (generated from 28 input sequences, including C. elegans motif(s))
Figure 4.81 LOGO depicting aligned osm-1 X-box motifs and 30bp of flanking promoter sequence
Figure 4.82 Distribution of X-box motifs in the osm-5 promoter
Figure 4.83 Sequence logo of osm-5 X-box motifs
Figure 4.84 LOGO depicting aligned osm-5 X-box motifs and 30bp of flanking promoter sequence
Figure 4.85 Distribution of X-box motifs in the osm-6 promoter
Figure 4.86 Sequence logo of osm-6 X-box motifs
Figure 4.87 LOGO depicting aligned osm-6 X-box motifs and 30bp of flanking promoter sequence
Figure 4.88 Distribution of X-box motifs in the osm-12 promoter
Figure 4.89 Sequence logo of osm-12 X-box motifs
Figure 4.90 LOGO depicting aligned osm-12 X-box motifs and 30bp of flanking promoter sequence
Figure 4.91 Distribution of X-box motifs in the tub-1 promoter
Figure 4.92 Sequence logo of tub-1 typical X-box motifs (generated from 16 input sequences, including C. elegans motif(s))
Figure 4.93 LOGO depicting aligned tub-1 X-box motifs and 30bp of flanking promoter sequence
Figure 4.94 Distribution of X-box motifs in the xbx-1 promoter
Figure 4.95 Sequence logo of xbx-1 X-box motifs
Figure 4.96 LOGO depicting aligned xbx-1 X-box motifs and 30bp of flanking promoter sequence
Figure 4.97 Promoter alignment of C. elegans osm-6 and C. briggsae Cbr-osm-6
Figure 4.98 Promoter alignment of C. elegans mksr-2 and C. briggsae Cbr-mksr-2
Figure 4.99 Promoter alignment of C. elegans bbs-4 and C. briggsae CBG10029
Figure 4.100 Promoter alignment of C. elegans bbs-4 and C. briggsae CBG09893
Figure 4.101 bbs-5 X-box motif variations in the JU360 and PX174 strains

List of Acronyms

BBS Bardet-Biedl Syndrome

BRE B Recognition Element

DBD DNA-Binding Domain

DPE Downstream Promoter Element

HBV Hepatitis B Virus

HLA Human Leukocyte Antigen

HMM Hidden Markov Model

IFT Intraflagellar Transport

INR Initiator Element

MHC Major Histocompatibility Complex

MKS Meckel Syndrome

MSA Multiple Sequence Alignment

PSSM Position Specific Scoring Matrix

RFX Regulatory Factor X

TFM Transcription Factor Matrices

Glossary

BBSome Conserved complex of BBS proteins that is transported along the cilium and required for proper cilia function

BLASTP Program that uses protein input sequences to search for similar sequences in a specified protein sequence database

GeMoMa (Gene Model Mapper) Program that defines gene models in a target genome based on protein input sequences, using the amino acid sequences of individual exons

genBlastG Program that defines gene models in a target genome based on protein input sequences

HMMER Program that uses a hidden Markov model generated from input sequences to search for similar motifs in target sequences

InParanoid Program that identifies orthologs between two species based on similarities in protein sequence

MEGA6 Program that contains features for generating multiple sequence alignments and building phylogenetic trees

Orthologs Genes that share a common ancestor, and diverged due to speciation

PFAM domain Database of protein families, containing multiple sequence alignments and HMMs of protein domain sequences

RNA-seq Sequencing technology that captures information from RNA in a cell, including sequences and quantity

TATA box DNA sequence rich in T and A nucleotides that serves as a binding site for the general transcription factor TFIID

TBLASTN Program that uses protein input sequences to search for similar sequences in a specified nucleotide sequence database, translated in all six reading frames

TFM-Scan (Transcription Factor Matrices-Scan) Program that uses a position specific scoring matrix generated from input sequences to search for transcription factor binding sites in target sequences

Chapter 1

Introduction

1.1 Overview of transcriptional regulation of genes

All of the genetic information in an organism is encoded in the genome, but much of the molecular activity in a cell occurs at the protein level. In eukaryotes, RNA is synthesized from the genome as an intermediary step during the process of transcription, and the RNA molecules are used as templates to synthesize proteins. This phenomenon, stated by Francis Crick, is termed the central dogma of molecular biology (Crick, 1970).

In eukaryotic cells, transcription is facilitated by RNA polymerase I, II, and III. RNA polymerase II in particular synthesizes mRNA, which codes for proteins, and requires the assembly and coordination of a series of transcription factors. These are termed the general transcription factors and include TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. The general transcription factors are named as such because they are involved in transcription at a basal level and do not modulate the transcription of specific genes. The process is initiated when a subunit of TFIID, called the TATA binding protein, binds to the TATA box roughly 25bp upstream of the transcription start site. The TATA box is the most important DNA sequence for most promoters recognized by RNA polymerase II, but there are other notable binding sites for transcription factors, such as BRE, INR, and DPE, which are binding sites for TFIIB, TFIID, and TFIID, respectively. The DNA undergoes physical distortion as a result of the binding of TFIID, allowing other TFs and RNA polymerase II to assemble. A large role of these general transcription factors is to facilitate the necessary physical conformation of the chromatin and protein complexes, although some of the general transcription factors have specific functions. For example, TFIIH contains a helicase, which unwinds double-stranded DNA. RNA polymerase undergoes conformational changes which allow it to transcribe longer sequences. As the process of transcription is underway, the general transcription factors dissociate. (Reviewed in Alberts et al. (2014), Chapter 6.)

In addition to general transcriptional regulation, cells also have mechanisms to regulate the transcription of specific genes. Transcriptional regulation is an important process for complex organisms: it allows differentiated cells to express the specific genes needed for development and maintenance of cell type, and it enables the organism to respond to environmental stimuli. Genes contain cis-regulatory elements in their promoters, and transcription is affected when regulatory proteins bind to these sites. These binding sites are also referred to as enhancers.

Transcriptional regulators often fall into these general categories: helix-turn-helix, homeodomains, leucine zippers, and zinc fingers. These regulators are often dimers, linked by weak affinity of monomers. Most transcriptional regulators attract coactivators or corepressors that do not recognize DNA but have protein-protein interactions with the transcriptional regulators. One such coactivator that is involved in the transcription of many genes is Mediator. Gene-specific transcriptional regulation mechanisms also promote transcription by facilitating conformational changes in the chromatin structure. These conformational changes afford RNA polymerase greater access to the DNA, thereby promoting transcription of the associated gene. (Reviewed in Alberts et al. (2014), Chapter 7.)

In particular, we are interested in Regulatory Factor X (RFX) transcription factors, which have been linked to ciliogenesis. RFX transcription factors contain helix-turn-helix DNA-binding domains (DBDs), which bind to X-box motifs in the promoters of target genes.

1.2 Discovery of X-box motifs and RFX genes

X-box motifs were first identified while studying the regulation of MHC class II genes. Conserved regions were identified by aligning the human and mouse promoters of an MHC class II gene (Mathis et al., 1983; Saito et al., 1983; Kelly and Trowsdale, 1985; O’Sullivan et al., 1986). Two of these upstream sequences were later termed the X-box and Y-box, which have a stretch of sequence in between with a conserved length (19-20bp) but variable nucleotide composition (Dorn et al., 1987). The X-box motif is a conserved 13-15bp sequence, composed of two 6bp half-motifs with a 0-3bp spacer in between, where a 1 or 2bp spacer is preferred (Emery et al., 1996). Deletion of either the X or Y motifs abolishes promoter and enhancer activity in multiple cell lines and in vitro (reviewed in Benoist and Mathis (1990)). Additional cis-regulatory elements, such as the S and W motifs, are also involved in the regulation of these genes and affect gene expression when altered (reviewed in Benoist and Mathis (1990)). MHC genes are an essential component of the mammalian immune system, and deficiencies in MHC gene expression result in bare lymphocyte syndrome, a disease involving severe immunodeficiencies and recurrent bacterial and viral infections. Some cases of bare lymphocyte syndrome are a result of lack of expression of MHC class II genes (reviewed in Benoist and Mathis (1990)).

Reith et al. (1988, 1989) identified a protein that binds to the X-box in the HLA-DRA gene, and named this X-box binding protein RF-X. Cell lines established from patients with MHC class II deficiencies lack this RF-X binding and MHC class II genes are not expressed. RF-X binds to various MHC class II genes, including HLA-DRA, HLA-DRB1, HLA-DRB3, HLA-DPA, HLA-DQA, mouse Eα and Eβ with varying affinities (Reith et al., 1989). The RF-X transcript is still produced at its normal size and expressed at its normal abundance in these cells, indicating that RF-X is defective at the protein level. This gene was later mapped to chromosome 19 by Pugliatti et al. (1992) and renamed RFX1. RFX1 also shares homology with enhancer factor C (EF-C), which binds to the hepatitis B virus (HBV) enhancer, functioning as a transactivator of the HBV enhancer (Siegrist et al., 1993). RFX2 was also cloned by Pugliatti et al. (1992); it is expressed in the testis and is involved in the regulation of spermatogenesis (Horvath et al., 2004).

The structure of the DNA-binding domain of human RFX1 has been determined by X-ray crystallography (Gajiwala et al., 2000). The X-box motif is palindromic and consists of an imperfect inverted repeat. The RFX1 DBD binds to half of the X-box motif but the DBDs of the two RFX proteins that bind to the X-box motif do not interact with each other. Thus, RFX1 acts as a symmetrical dimer and can form complexes with RFX1, RFX2, or RFX3 (Gajiwala et al., 2000). Nine residues of the DBD interact with the X-box motif; these are indicated in Figure 1.1.
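The half-site structure described above can be illustrated with a short sketch. It assumes the 6bp half-motif and 0-3bp spacer layout reported by Emery et al. (1996), and it measures how far a motif departs from a perfect inverted repeat by counting mismatches between the left half-site and the reverse complement of the right half-site. The function names are hypothetical and are not part of the pipeline described in this thesis; the example sequences are the osm-5 and mksr-2 X-box motifs listed in Table 2.1.

```python
# Minimal sketch: split an X-box motif into half-sites and a spacer, then
# count mismatches against a perfect inverted repeat.
# Assumption: two 6bp half-sites flank a 0-3bp spacer (Emery et al., 1996).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def split_xbox(motif: str, half_len: int = 6):
    """Split a motif into (left half-site, spacer, right half-site)."""
    motif = motif.upper()
    spacer_len = len(motif) - 2 * half_len
    if not 0 <= spacer_len <= 3:
        raise ValueError("expected two 6bp half-sites and a 0-3bp spacer")
    left = motif[:half_len]
    spacer = motif[half_len:half_len + spacer_len]
    right = motif[half_len + spacer_len:]
    return left, spacer, right


def half_site_mismatches(motif: str) -> int:
    """Count positions where the left half-site differs from the reverse
    complement of the right half-site (0 means a perfect inverted repeat)."""
    left, _, right = split_xbox(motif)
    return sum(a != b for a, b in zip(left, reverse_complement(right)))


if __name__ == "__main__":
    for name, motif in [("osm-5", "GTTACTATGGCAAC"), ("mksr-2", "GTTGCCGTGGCAAC")]:
        print(name, split_xbox(motif), half_site_mismatches(motif))
```

Under these assumptions, the mksr-2 half-sites form a perfect inverted repeat, whereas the osm-5 half-sites differ at two positions, which is one way to quantify the "imperfect" nature of the repeat.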

1.3 More recent efforts to identify RFX genes, X-box motifs, and ciliary genes

Five additional RFX genes have been identified in humans, resulting in a total of 7 RFX genes (RFX1-7) conserved in mammalian genomes (Pugliatti et al., 1992; Reith et al., 1994; Steimle et al., 1995; Aftab et al., 2008). One RFX gene is present in yeast (sak1 in S. pombe, Crt1 in S. cerevisiae) (Wu and McLeod, 1995; Huang et al., 1998), three RFX genes are present in Drosophila (dRFX, dRFX1, dRFX2) (Dubruille et al., 2002; Otsuki et al., 2004; Chu et al., 2010), and one RFX gene is present in C. elegans (daf-19) (Swoboda et al., 2000). RFX transcription factors regulate genes from a variety of biological processes, and Swoboda et al. (2000) established the link to ciliogenesis by studying daf-19 mutations in C. elegans. In daf-19 mutants, cilia are absent and worms display phenotypes often associated with cilia. For example, the amphid and phasmid neurons in C. elegans will typically take up fluorescent dyes such as DiI or 5-fluorescein isothiocyanate (FITC) (Perkins et al., 1986). This dye uptake has been linked to ciliogenesis, and ciliary mutants (including daf-19 mutants) fail to fill with dye (Swoboda et al., 2000). In addition, daf-19 mutants display a constitutive Dauer phenotype, where development is arrested in a larval stage, emphasizing the importance of cilia in worm development (Swoboda et al., 2000).

Swoboda et al. (2000) used previously identified mammalian X-box motifs as a query to search for X-box motifs in promoters of genes implicated in sensory or ciliary functions. X-box motifs have also been identified in specific ciliary genes: for example, Haycraft et al. (2001) identified and established osm-5 as a ciliary gene, and verified by mutagenesis that a putative X-box motif in the osm-5 promoter was functional. This X-box motif was identified by aligning C. elegans and C. briggsae promoter sequences, based on the idea that regulatory elements should be more conserved than the general promoter region (Haycraft et al., 2001). Other examples include arl-6 and che-13, where putative X-box motifs were identified in promoters and the promoter region was able to drive GFP reporter gene expression (Fan et al., 2004; Haycraft et al., 2003). Since then, there have been several genome-wide efforts to identify X-box motifs and ciliary genes using sequence similarity search methods based on previously identified X-box motifs (Efimenko et al., 2005; Blacque et al., 2005; Chen et al., 2006; Phirke et al., 2011).

1.4 RFX genes in nematodes

Currently, there are 25 nematode species in addition to C. elegans with available genome sequences (Table 1.3). Using InParanoid, we were able to identify daf-19 orthologs in all nematode species. Each species has one copy of a daf-19 ortholog with the exception of C. brenneri, which has two copies of the gene. Using the PFAM RFX_DNA_binding domain and HMMER (v3.1b1), we found DNA-binding domains in all of these daf-19 orthologs except in C. japonica, where the daf-19 gene is near the end of a contig and is truncated. For all genes, we used WormBase annotations, except in cases where no InParanoid hit was found. In these cases, we used whichever gene model (GeMoMa or genBlastG) had the higher PID when aligned with C. elegans daf-19 (for more details on GeMoMa and genBlastG, see Section 2.3).
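The choice between competing gene models can be sketched as follows. This is only an illustration: it assumes that PID is computed as the percentage of identical residues over aligned columns in which neither sequence has a gap (other definitions exist), and the aligned sequences below are invented toy examples, not real DAF-19 alignments. The function names are hypothetical.

```python
# Minimal sketch: choose the gene model with the higher percent identity (PID)
# to a reference protein.
# Assumption: PID = identical residues / columns without gaps in either sequence.
from typing import Dict, Tuple


def percent_identity(aligned_ref: str, aligned_model: str) -> float:
    """PID between two rows of a pairwise alignment ('-' marks a gap)."""
    if len(aligned_ref) != len(aligned_model):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_ref, aligned_model):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def pick_gene_model(alignments: Dict[str, Tuple[str, str]]) -> str:
    """Return the name of the model whose alignment has the highest PID."""
    return max(alignments, key=lambda name: percent_identity(*alignments[name]))


if __name__ == "__main__":
    # Toy alignments: (reference row, gene model row). Hypothetical data.
    toy = {
        "GeMoMa": ("MKT-AYIAK", "MKTQAYLAK"),
        "genBlastG": ("MKTAYIAK", "MKTAYIAK"),
    }
    for name, pair in toy.items():
        print(name, percent_identity(*pair))
    print("selected:", pick_gene_model(toy))
```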

Figure 1.1: Multiple sequence alignment of DNA binding domain in DAF-19 orthologs. The arrows indicate residues that interact with the X-box motif.

Figure 1.2: Phylogenetic tree produced from DAF-19 DNA binding domain sequences in nematodes.

As expected, the DNA binding domain is highly conserved across these nematode species, confirming that ciliary genes in these species are also regulated by DAF-19 via binding to X-box motifs. The residues that interact with the X-box motif are indicated with arrows, and these residues do not vary at all across the genomes studied (Gajiwala et al., 2000). This conservation suggests similarly strong conservation in the X-box motifs.

1.5 Overview of cilia and ciliary components/genes

Cilia are highly-conserved organelles present in metazoans and some unicellular eukaryotes (Chu et al., 2010). They appear as hair-like projections on the surface of a cell, and are responsible for a diverse set of biological processes. For example, cilia conduct fluid movement during left-right body development, facilitate signal transduction in vision and olfaction, function in pathogen clearance in the airway, and are involved in fertility and reproduction (Choksi et al., 2014).

Figure 1.3: Molecular structure of the cilium. Figure reprinted from Current Biology, Vol 25/23, Avidor-Reiss, T. and Leroux, M.R., Shared and Distinct Mechanisms of Compartmentalized and Cytosolic Ciliogenesis, Copyright 2015, with permission from Elsevier.

The structure of cilia consists of three main components: the axoneme, transition zone, and basal body. The axoneme contains the microtubule core of the cilium, and the basal body nucleates the growth of the axoneme. In C. elegans, the basal body contains a circular arrangement of doublet microtubules instead of the triplet microtubules typically found in other organisms (Perkins et al., 1986). The transition zone has been defined as a separate compartment between the basal body and axoneme, and plays a role in regulating protein entry into the cilium as well as trafficking of ciliary proteins (reviewed in Szymanska and Johnson (2012)). There are two general classes of cilia: motile cilia and immotile cilia (also referred to as primary cilia). The axoneme contains a core of microtubules, arranged in a 9+2 (motile cilia) or 9+0 (immotile cilia) configuration. Motile cilia contain dynein arms that use energy from ATP hydrolysis to drive movement of axonemes, while immotile cilia lack the motility components but are specialized to sense extracellular signals such as light, signalling molecules, or odorants (reviewed in Choksi et al. (2014)).

The axoneme can be divided into the middle segment and the distal segment, and is assembled from the basal body by the intraflagellar transport (IFT) process. Kinesin-based motors are responsible for anterograde IFT, which builds the axoneme, and dynein-based motors are responsible for retrograde IFT, which disassembles it. In C. elegans, there are two types of anterograde motors: heterotrimeric kinesin-II (kap-1, klp-11, klp-20) and OSM-3 kinesin. These two motors coordinate to move along the middle segment, while OSM-3 kinesin travels along the distal segment alone (Snow et al., 2004). These two motors travel at different rates: kinesin-II transports IFT-A particles at 0.5 µm/s while OSM-3 kinesin transports IFT-B at 1.1-1.3 µm/s, and the two motors travel in a combined protein complex in the middle segment at 0.7 µm/s (Ou et al., 2005a; Pan et al., 2006).

Once the axoneme is built to its full length, it exists in a constant state of turnover, where tubulin is added to and recycled from the ends of the microtubules (Ishikawa and Marshall, 2011). The process of IFT is crucial for the trafficking of IFT cargo, such as axonemal, ciliary membrane, and signal transduction proteins along the cilium (Ishikawa and Marshall, 2011). The BBSome is a group of conserved BBS proteins that forms a complex with IFT, and is responsible for transporting ciliary cargo, such as Hedgehog signalling components and some G-protein coupled receptors (reviewed in Avidor-Reiss and Leroux (2015)). In C. elegans mutants with disrupted bbs-7 and bbs-8, IFT-A and IFT-B travel separately, suggesting that the BBSome also plays a role in stabilizing the IFT complex (Ou et al., 2005a).

Expression of ciliary genes is controlled by two regulators: the regulatory factor X (RFX) transcription factor family and the forkhead transcription factor FOXJ1. In C. elegans, there is one RFX transcription factor, DAF-19, which is responsible for the regulation of most ciliary genes (Swoboda et al., 2000). FOXJ1 has been shown to be important in motile ciliogenesis; FOXJ1 knockout mice show loss of axonemes of motile cilia in airway epithelial cells while immotile cilia were unaffected (Brody et al., 2000). C. elegans only contains immotile cilia, and does not contain a FOXJ1 ortholog. However, other transcription factors are also involved in regulating cilia in C. elegans. For example, the forkhead transcription factor fkh-2 is required to build cell-specific ciliary morphology in AWB neurons, which have branched cilia with irregular morphology (Mukhopadhyay et al., 2007). fkh-2 mutants display morphological defects in their dendrites and cilia, which contain shortened branches or are lacking one or both branches of the cilia structure. fkh-2 is also regulated by DAF-19. Thus, RFX/DAF-19 is a main regulator of ciliogenesis but there are also mechanisms to regulate cell-specific ciliary structures.

In this project, we use C. elegans as a ciliary model. The C. elegans hermaphrodite contains 302 neurons, 60 of which are ciliated sensory neurons (White et al., 1986). Some of the ciliated neurons include the amphid, phasmid, and labial neurons. In addition, C. elegans males contain 52 additional ciliated neurons, which are mostly located in the male tail rays. Cilia are responsible for sensory functions in C. elegans, and disruption of ciliary genes results in several common phenotypes. Ciliated neurons normally exhibit a dye-filling phenotype; ciliary gene mutants often fail to fill with dye, and are called dye-filling defective (Dyf) (Perkins et al., 1986). Other common phenotypes include osmotic avoidance abnormal (Osm), where worms fail to avoid regions with high concentrations of sugars and salts (Culotti and Russell, 1978), and chemotaxis abnormal (Che), where worms fail to move towards or away from a gradient of attractants or repellents (Ward, 1973). Ciliary genes encoding structural or functional components of cilia in C. elegans are listed in Table 1.1.

Table 1.1: Ciliary genes in C. elegans

Component | Genes (C. elegans)
IFT-A | che-11, daf-10, dyf-2, ifta-1
IFT-B | che-13, dyf-11, osm-1, osm-5, osm-6
Basal body/transition zone | dyf-17, dyf-19, mks-1, mks-3, mks-5, mks-6, mksr-1, mksr-2, nphp-1, nphp-2, nphp-4
BBSome | bbs-1, bbs-2, bbs-3 (arl-6), bbs-4, bbs-5, bbs-7 (osm-12), bbs-8, bbs-9
Dynein motors | dhc-1, dhc-3, dhc-4, dylt-1, dylt-2, dylt-3, xbx-1
Kinesin motors | kap-1, klp-11, klp-20, osm-3
Other IFT-related | arl-3, arl-13, dyf-3, dyf-6, dyf-13, dyf-18, ift-20, ift-74, ift-81, ifta-2
Other | asic-2, che-10, fkh-2, gasr-8, ift-139/ZK328.7, osm-9, rpi-2, xbx-3, xbx-4, xbx-5, xbx-9

1.6 Ciliopathies

Because of their diversity in function, ciliary dysfunction affects many biological processes and results in a variety of ciliopathies. Many of these ciliopathies are pleiotropic and tend to share symptoms affecting common organs or biological processes (Table 1.2).

Table 1.2: Common symptoms in ciliopathies (reviewed in Hildebrandt et al. (2011); Waters and Beales (2011))

Disease | Symmetry defects and/or polydactyly | Kidney dysfunction | Liver dysfunction | Retinal degeneration
Bardet-Biedl Syndrome | X | X | X | X
Polycystic Kidney Disease | - | X | X | -
Nephronophthisis | - | X | X | X
Meckel’s Syndrome | X | X | - | -
Joubert’s Syndrome | - | X | - | X

Ciliopathies can be caused by mutations in one or several ciliary genes. For example, Bardet-Biedl syndrome is caused by mutations in the BBS genes (reviewed in Hildebrandt et al. (2011)). Fan et al. (2004) discuss cases of patients with the homozygous missense mutations G169A and L170W in ARL6/BBS3, where a single mutation is sufficient to cause disease. Another example is polycystic kidney disease, in which the PKD1 and PKD2 genes are implicated. There are autosomal dominant and autosomal recessive variants of the disease, where the dominant variant affects 1 in 1000 in Europe and the US (reviewed in Hildebrandt et al. (2011)). A 2011 study involving exon sequencing of a cohort of 93 affected families revealed 41 novel missense, nonsense, deletion, insertion, and splice site mutations in PKD1 in addition to over 500 already known mutations (Hoefele et al., 2011).

Ciliopathies can also occur as a result of regulatory defects. For example, retinitis pigmentosa is a group of diseases that causes progressive retinal degeneration, and mutations in 40 genes are associated with disease (Veltel et al., 2008). CRX is a transcription factor heavily involved in rod and cone cell gene expression, and some patients with retinitis pigmentosa had mutations in CRX (Sohocki et al., 1998).

Defects in RFX genes can lead to severe phenotypes. Rfx3-deficient mice exhibit high rates of death during embryogenesis and birth (Bonnafe et al., 2004). Rfx3 is linked to ciliogenesis in mammals, and is a close homolog of daf-19, which regulates ciliogenesis in C. elegans. Embryos and newborns exhibit situs inversus, incomplete inversion defects (heterotaxy), and small body size (Bonnafe et al., 2004), and symmetry defects are a common symptom in human ciliopathies (see Table 1.2).

1.7 Thesis aims and organization

We have established that cilia are important organelles that are involved in many biological processes, and defects in cilia result in ciliopathies. Ciliogenesis is regulated by RFX genes (specifically, daf-19 in C. elegans), whose products bind to conserved X-box motifs in the promoters of target genes. We are interested in studying the conservation of these X-box motifs.

We posed the following research question: are all ciliary genes regulated by a DAF-19 ortholog via binding to an X-box motif? We expect that all ciliary genes should contain conserved X-box motifs in their promoter regions. We limited the scope of our analysis to 32 ciliary genes in 25 nematode species with available genome sequences (listed in Table 1.3), and developed a bioinformatics pipeline in order to identify X-box motifs. This pipeline involves identification of ciliary gene orthologs in each nematode species, improvement of gene annotation (focusing on the 5’ start sites), and an X-box motif search in the promoter of each ciliary gene.

Previous X-box motif searches were based on sequence similarity, and thus newly identified X-box motifs tend to resemble known motifs. Most known X-box motifs in C. elegans are 14bp long. Some exceptions are nphp-1, which has a 15bp X-box motif containing a 3bp spacer: GTTGCC AGG GGCAAC (Winkelbauer et al., 2005), and peli-1, which has two 15bp X-box motifs: GTCTCCAATGGCAAC and GTCCTCACAAGTAAC (Chu et al., 2012). Our main research goal is to identify X-box motifs that differ from the known consensus, and to explore the type and level of variation that can occur in X-box motifs. Doing so will allow us to gain a better understanding of the characteristics of X-box motifs. We hypothesize that all ciliary genes are regulated by DAF-19 via binding to the X-box motif; thus, all ciliary gene promoters should contain X-box motifs.
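To illustrate why similarity-based searches tend to recover only motifs that resemble known ones, the following sketch slides a window along a promoter and reports windows within a fixed number of mismatches of any known motif. It is a deliberately simplified stand-in and not the search method used in this thesis: the mismatch threshold, the forward-strand-only scan, the function names, and the toy promoter are all assumptions made for illustration. The two query motifs are the nphp-1 motif quoted above and the osm-6 motif from Table 2.1.

```python
# Minimal sketch of a similarity-based motif scan (forward strand only).
# Assumptions: Hamming distance as the similarity measure and a hypothetical
# threshold of 2 mismatches. Any motif more divergent than the threshold is
# missed, which is the limitation discussed in the text.
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_promoter(promoter: str, known_motifs: List[str],
                  max_mismatches: int = 2) -> List[Tuple[int, str, str, int]]:
    """Return (0-based start, window, matched motif, mismatches) for each hit."""
    promoter = promoter.upper()
    hits = []
    for motif in known_motifs:
        width = len(motif)
        for start in range(len(promoter) - width + 1):
            window = promoter[start:start + width]
            distance = hamming(window, motif)
            if distance <= max_mismatches:
                hits.append((start, window, motif, distance))
    return sorted(hits)


if __name__ == "__main__":
    known = ["GTTGCCAGGGGCAAC", "GTTACCATAGTAAC"]  # nphp-1 and osm-6 motifs
    toy_promoter = "AAAAGTTACCATGGTAACTTTT"        # invented sequence
    for hit in scan_promoter(toy_promoter, known):
        print(hit)
```

In this toy case, the embedded 14bp window is reported because it differs from the osm-6 motif at a single position; a window with three or more differences from every known motif would not be reported at this threshold.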

Table 1.3: The nematode species used in this study, including C. elegans and 25 additional species

Species | Pathogenicity | Genome Size (Mb) | Citation
Ancylostoma ceylanicum | Parasite of humans, dogs, cats | 313 | Schwarz et al. (2015)
Ascaris suum | Parasite of pigs, humans | 273 | Jex et al. (2011)
Brugia malayi | Parasite of humans | 94 | Ghedin et al. (2007)
Bursaphelenchus xylophilus | Parasite of pine trees | 75 | Kikuchi et al. (2011)
Caenorhabditis angaria | Non-pathogenic | 106 | Mortazavi et al. (2010)
Caenorhabditis brenneri | Non-pathogenic | 190 |
Caenorhabditis briggsae | Non-pathogenic | 108 | Stein et al. (2003)
Caenorhabditis elegans | Non-pathogenic | 100 | The C. elegans Sequencing Consortium (1998)
Caenorhabditis japonica | Non-pathogenic | 166 |
Caenorhabditis remanei | Non-pathogenic | 145 |
Caenorhabditis sinica | Non-pathogenic | 132 |
Caenorhabditis tropicalis | Non-pathogenic | 79 |
Dirofilaria immitis | Parasite of dogs | 88 | Godel et al. (2012)
Heterorhabditis bacteriophora | Parasite of insects | 77 | Bai et al. (2013)
Haemonchus contortus | Parasite of sheep, goats | 370 | Laing et al. (2013)
Loa loa | Parasite of humans | 91 | Desjardins et al. (2013)
Meloidogyne hapla | Parasite of plants | 53 | Opperman et al. (2008)
Meloidogyne incognita | Parasite of plants | 86 | Abad et al. (2008)
Necator americanus | Parasite of humans, dogs, cats | 244 | Tang et al. (2014)
Onchocerca volvulus | Parasite of humans | 96 | Unnasch and Williams (2000)
Panagrellus redivivus | Non-pathogenic | 65 | Srinivasan et al. (2013)
Pristionchus exspectatus | Non-pathogenic | 178 | Rodelsperger et al. (2014)
Pristionchus pacificus | Non-pathogenic | 172 | Dieterich et al. (2008)
Strongyloides ratti | Parasite of rats | 43 |
Trichinella spiralis | Parasite of mammals | 64 | Mitreva et al. (2011)
Trichuris suis | Parasite of pigs, humans | 71 | Jex et al. (2014)

The thesis is organized as follows. In Chapter 2, we describe our bioinformatics pipeline for identifying ciliary gene orthologs, improving gene annotation, and identifying typical and atypical X-box motifs. In Chapter 3, we present results of our ciliary gene identification and annotation in 25 nematode species. In Chapter 4, we present results of our X-box motif search, including both typical and atypical X-box motifs. In the final chapter, we will provide a conclusion and propose future directions.

Chapter 2

Development of a bioinformatics pipeline for annotating ciliary genes and identifying X-box motifs

2.1 Introduction

A bioinformatics pipeline was developed to identify candidate X-box motifs in 25 nematode species. Using C. elegans as a model, we curated a set of ciliary genes and limited our analysis to orthologs of these ciliary genes in the other nematode species. This pipeline involves identification and annotation of ciliary genes and identification of candidate X-box motifs. Genome assemblies and annotations were obtained from WormBase, and RNA-seq data was downloaded from the NCBI SRA database. Since the genomes studied in this analysis are in varying stages of completion, it is necessary to ensure the quality of results at each step. In particular, it is crucial to evaluate gene annotations, especially near the translational start site because this defines the location of the promoter and affects the X-box motif search. We applied this pipeline to each nematode species. The results of our ciliary gene annotation are discussed in Chapter 3, and the results of the X-box motif search are discussed in Chapter 4.
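The dependence of the promoter on the annotated start site can be made concrete with a small sketch. It assumes 1-based inclusive gene coordinates, and it takes the promoter to be a fixed-length window immediately upstream of the annotated start; the window length, the function names, and the toy contig are hypothetical and are not taken from the pipeline itself.

```python
# Minimal sketch: extract the upstream (promoter) window for a gene model.
# Assumptions: start/end are 1-based inclusive coordinates on the contig, and
# the promoter is the `length` bases immediately upstream of the annotated
# start. For a minus-strand gene the upstream region lies at higher
# coordinates and is returned reverse complemented.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def upstream_region(contig: str, start: int, end: int, strand: str,
                    length: int = 1000) -> str:
    """Return up to `length` bases upstream of the gene, in gene orientation.

    The window is truncated if the gene lies near the end of the contig.
    """
    if strand == "+":
        lo = max(0, start - 1 - length)
        return contig[lo:start - 1].upper()
    if strand == "-":
        hi = min(len(contig), end + length)
        return reverse_complement(contig[end:hi])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy_contig = "ACGTACGTAAATGCCC"  # invented sequence; ATG begins at position 11
    print(upstream_region(toy_contig, 11, 16, "+", length=4))  # correct start
    print(upstream_region(toy_contig, 14, 16, "+", length=4))  # mis-annotated start
    print(upstream_region(toy_contig, 1, 4, "-", length=3))    # minus-strand gene
```

Shifting the annotated start by only three bases changes the extracted window, so a motif located near the edge of the search window can be gained or lost depending on the quality of the 5’ annotation.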

2.2 Criteria for a high-confidence ciliary gene set

There may be thousands of proteins present and functioning in cilia (Gherman et al., 2006). To limit the scope of this project, we focused on a subset of genes selected based on the following criteria:

1. Gene encodes a structural or molecular component of cilia
2. Gene is expressed in some or all ciliated neurons
3. Gene is regulated by DAF-19 via binding to the X-box motif¹

These genes form our set of high-confidence ciliary genes, and these criteria limit our dataset to genes that are integral and specific to ciliary function. In C. elegans, we selected 33 high-confidence ciliary genes for our analysis, described in Table 2.1. These genes are representative of the different components of cilia and suitable for the scope of this project. Some of the genes are not as well-studied and do not fulfill all three criteria, but still show evidence of being DAF-19-regulated ciliary genes. For example, bbs-4 and bbs-9 do not have a reported expression profile, but were included because they are components of the conserved BBSome. In addition, the specific molecular characterization of che-2 is not known, but it is expressed in cilia, its disruption results in abnormal cilium morphology and phenotypes such as Dyf, Osm, and Che, and it has a validated X-box motif (Fujiwara et al., 1999; Swoboda et al., 2000). Because these genes represent core components of cilia, we expect that they will be conserved across all nematode species and will be regulated by DAF-19 via conserved X-box motifs.

¹ Ideally, X-box motifs should be validated by mutagenesis. These cases are listed in Table 2.1 as "X-box dependent". However, most X-box motifs have not been validated in this way and thus we also include genes whose expression is abolished when DAF-19 is disrupted. These cases are listed in Table 2.1 as "DAF-19 dependent".

Table 2.1: List of high confidence ciliary genes in C. elegans

Gene | Expression Profile | X-box Sequence | Molecular Component | Citation
arl-6/C38D4.8 | Some CSNs | GTTTCCATGGTTAC | BBSome component | Fan et al. (2004)
bbs-1/Y105E8A.5 | Most or all CSNs | GTTCCCATAGCAAC (Expression is DAF-19 dependent) | BBSome component | Ansley et al. (2003); Efimenko et al. (2005); Chen et al. (2006)
bbs-2/F20D12.3 | Most or all CSNs | GTATCCATGGCAAC (Expression is DAF-19 and X-box dependent) | BBSome component | Ansley et al. (2003); Efimenko et al. (2005)
bbs-4/F58A4.14 | | | BBSome component | Reviewed in Inglis et al. (2007)
bbs-5/R01H10.6 | Most or all CSNs | GTCTCCATGGCAAC | BBSome component | Li et al. (2004); Lee et al. (2011)
bbs-8/T25F10.5 | Most or all CSNs | GTACCCATGGCAAC (Expression is DAF-19 dependent) | BBSome component | Ansley et al. (2003); Efimenko et al. (2005); Blacque et al. (2004); Ou et al. (2005a)
bbs-9/C48B6.8 | | GTTTCCATGACAAC | BBSome component | Chen et al. (2006); Lee et al. (2011); Blacque et al. (2004)
che-2/F38G1.1 | Most or all CSNs | GTTGTCATGGTGAC (Expression is DAF-19 and X-box dependent) | Undergoes IFT | Fujiwara et al. (1999); Swoboda et al. (2000)
che-11/C27A7.4 | Most or all CSNs | ATCTCCATGGCAAC (Expression is DAF-19 dependent in amphids and phasmids) | IFT-A component | Qin et al. (2001); Efimenko et al. (2005)
che-13/F59C6.7 | Most or all CSNs | GTTGCTATAGCAAC (Expression is DAF-19 dependent) | IFT-B component | Haycraft et al. (2003); Schafer et al. (2003)
dyf-1/F54C1.5 | Some CSNs | GTTACCATGGATAT | IFT-B component | Blacque et al. (2005); Ou et al. (2005a)
dyf-2/ZK520.3 | Some CSNs | GTTACCAAGGCAAC (Expression is DAF-19 dependent) | Involved in IFT | Efimenko et al. (2006)
dyf-3/C04C3.5 | Some CSNs | GTTTCTATGGGAAC (Expression is DAF-19 dependent) | Involved in IFT | Ou et al. (2005b); Murayama et al. (2005)
dyf-5/M04C9.5 | Some CSNs | GTTACCATAGAAAC (Expression is DAF-19 dependent) | Regulator (kinase) | Chen et al. (2006); Burghoorn et al. (2007)
dyf-11/C02H7.1 | Most or all CSNs | GTCTTCATGACAAC (Expression is DAF-19 dependent) | IFT-B component | Ou et al. (2007); Wei et al. (2013); Kunitomo and Iino (2008)
dyf-13/C27H5.7 | Some CSNs | GTCTCCATAGCAAC | IFT component | Blacque et al. (2005)
dyf-18/H01G02.2 | Most or all CSNs | GTCTCCATGACAAC (Expression is DAF-19 dependent) | Regulator (kinase) | Phirke et al. (2011)
dylt-2/D1009.5 | Most or all CSNs | GTTGCCATGACAAC (Expression is DAF-19 and X-box dependent) | IFT motor (dynein light chain) | O’Rourke et al. (2007); Efimenko et al. (2005)
ift-20/Y110A7A.20 | Some CSNs | GTCTCTATAGCAAC (Expression is DAF-19 dependent) | IFT-B component | Follit et al. (2006); Blacque et al. (2005); Ou et al. (2007)
ifta-1/C54G7.4 | Most or all CSNs | GTTGCCATGGCAAT | Associates with IFT-A | Blacque et al. (2006)
mks-1/R148.1 | Some CSNs | GTCACCATAGGAAC (Expression is DAF-19 dependent) | Basal body/transition zone | Efimenko et al. (2005); Bialas et al. (2009)
mks-6/K07G5.3 | Some CSNs | GTTGCCATAGCGAC | Basal body/transition zone | Williams et al. (2011)
mksr-1/K03E6.4 | Most or all CSNs | GTTCCCTTGGCAAC (Expression is DAF-19 dependent) | Basal body/transition zone | Bialas et al. (2009)
mksr-2/Y38F2AL.2 | Some CSNs | GTTGCCGTGGCAAC (Expression is DAF-19 dependent) | Basal body/transition zone | Bialas et al. (2009)
nphp-2/Y32G9A.6 | Most or all CSNs | GTTGTCAGGGTAAC | Basal body/transition zone | Warburton-Pitt et al. (2012)
odr-4/Y102E9.1 | Some CSNs | ATCGTCATCGTAAC (Expression is DAF-19 and X-box dependent) | Odorant receptor, responsible for localization of some odorant receptors (e.g. ODR-10) to cilia | Dwyer et al. (1998); Efimenko et al. (2005)
osm-1/T27B1.1 | Most or all CSNs | GCTACCATGGCAAC (Expression is DAF-19 and X-box dependent) | IFT-B component | Qin et al. (2001); Cole et al. (1998)
osm-5/Y41G9A.1 | Most or all CSNs | GTTACTATGGCAAC (Expression is DAF-19 and X-box dependent) | IFT-B component | Qin et al. (2001)
osm-6/R31.3 | Most or all CSNs | GTTACCATAGTAAC (Expression is DAF-19 and X-box dependent) | IFT-B component | Qin et al. (2001); Cole et al. (1998); Swoboda et al. (2000)
osm-12/Y75B8A.12 | Most or all CSNs | GTTGCCATAGTAAC (Expression is DAF-19 dependent) | Stabilizes IFT complex, BBSome component | Blacque et al. (2004)
tub-1/F10B5.4 | Most or all CSNs | ATCTCCATGACAAC (Expression is DAF-19 dependent) | Required for localization of GPCRs to cilia in AWB, ASK neurons | Efimenko et al. (2005); Brear et al. (2014)
xbx-1/F02D8.3 | Most or all CSNs | GTTTCCATGGTAAC (Expression is DAF-19 and X-box dependent) | IFT motor (dynein light chain) | Efimenko et al. (2005); Schafer et al. (2003)

2.3 Searching for ciliary gene orthologs in nematode species

Ciliary genes were identified in each species using the pairwise ortholog-detection software InParanoid (Remm et al., 2001), with C. elegans ciliary genes as the reference. InParanoid uses a reciprocal best BLAST hit approach and creates initial ortholog groups from these BLAST ortholog pairs. The reasoning behind this approach is that orthologs should score higher with each other than with any other sequence in the genome. This also makes sense considering the time scale of speciation and gene duplication: there has been less time for paralogs to arise after speciation of these nematode species than before speciation. For example, C. elegans and C. briggsae diverged from a common ancestor approximately 100 million years ago (Stein et al., 2003), which is a short time frame considering that the first eukaryote emerged at least 2.7 billion years ago (reviewed in Cooper and Hausmann (2000), Chapter 1). Next, the algorithm adds in-paralogs (also referred to as co-orthologs) to the initial ortholog groups and resolves overlapping ortholog clusters. Since InParanoid uses protein sequences for comparisons, good quality gene annotations are required in the target species. This means that genes may be missed, as we are working with many draft genomes with incomplete gene annotations.
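The reciprocal best hit idea can be illustrated with a minimal Python sketch. This is not InParanoid's implementation: it omits in-paralog clustering and overlap resolution, and the protein identifiers and bit scores below are hypothetical toy values.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` is a dict {(query, subject): bit_score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: cel_* are reference proteins, cbr_* are target proteins.
cel_vs_cbr = {("cel_1", "cbr_1"): 310.0, ("cel_1", "cbr_2"): 120.0,
              ("cel_2", "cbr_2"): 95.0}
cbr_vs_cel = {("cbr_1", "cel_1"): 305.0, ("cbr_2", "cel_1"): 118.0,
              ("cbr_2", "cel_2"): 90.0}
pairs = reciprocal_best_hits(cel_vs_cbr, cbr_vs_cel)
```

In this toy case cel_2 is not paired, because its best hit cbr_2 scores higher against cel_1 than against cel_2.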

Many of the genes that are not found by InParanoid can be recovered using the homology-based gene-identification software genBlastG (She et al., 2011) and GeMoMa (Keilwagen et al., 2016), which use C. elegans proteins to find corresponding genes in the genomic sequence of the target species. The main difference between genBlastG and GeMoMa is that genBlastG uses the entire query protein sequence to search in the target genome, while GeMoMa uses individual exons. Each approach has its own benefits: GeMoMa theoretically produces more accurate gene models when the evolutionary distance is smaller (where the exon structures are more likely to be conserved), and genBlastG theoretically produces more accurate gene models when exons may not be conserved. Using both tools allows us to gain the benefits of both approaches and minimize the number of false negatives.

After using this combined approach, some orthologs still cannot be identified. Organisms without potential orthologs for some ciliary genes could either have undergone species-specific gene loss, or appear to lack these genes for technical reasons (e.g. incomplete genome assembly). This will be further discussed in Section 2.6.

2.4 Annotation of 5’ start sites of ciliary genes

In order to accurately study cis-regulatory elements in promoter regions, it is important to accurately define promoter regions of the genes. We define the promoter region as upstream of the translational start codon ATG, consistent with previous X-box motif analyses (Swoboda et al., 2000; Chen et al., 2006; Efimenko et al., 2005; Blacque et al., 2006). Thus, it is necessary to annotate 5’ start sites of ciliary genes. We established several criteria for high confidence 5’ start sites, as follows:

1. RNA-seq data supports the first intron. This implies that the first exon will be directly upstream and the start codon of the gene will be correct within a few base pairs.
2. Pairwise alignment does not show a large (> 50 a.a.) gap at the 5’ end of the gene.
3. First 100 a.a. of the protein sequence is well conserved among nematode species in a multiple sequence alignment.

Many of the ciliary genes have well-conserved 5’ protein sequences. An example of this is dyf-5, as shown in Figure 3.20. However, some genes do not show high sequence-level conservation at the 5’ end. An example of this is mks-1 (see Chapter 3, Section 3.3.14, Figure 3.28). These cases are challenging because it is difficult to determine from the sequence alone whether a gene is an ortholog with a divergent sequence, or whether the gene has been misclassified as an ortholog. We present comprehensive results of our ciliary gene identification and annotation in Section 3.3.
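Criteria 2 and 3 can be checked on aligned sequences with a short Python sketch. The function names are hypothetical, and the percent identity definition used here (matches divided by the number of reference residues considered, with gaps in the query counted as mismatches) is an assumption; it is only one of several common definitions and may differ from the PID values reported in Chapter 3.

```python
def leading_gap_length(aligned_seq):
    """Number of gap characters ('-') before the first residue."""
    return len(aligned_seq) - len(aligned_seq.lstrip("-"))


def has_large_5prime_gap(aligned_query, max_gap=50):
    """Criterion 2: flag alignments with a 5' gap longer than `max_gap` a.a."""
    return leading_gap_length(aligned_query) > max_gap


def percent_identity(aligned_ref, aligned_query, n_residues=100):
    """Identity over the columns covering the first `n_residues` reference residues."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    seen = matches = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char == "-":
            continue  # insertion relative to the reference; not counted
        seen += 1
        if ref_char == query_char:
            matches += 1
        if seen == n_residues:
            break
    return 100.0 * matches / seen if seen else 0.0


# Toy aligned pair (hypothetical sequences, equal aligned length).
ref = "MGFFSSLSSL"
query = "---FSSLSNL"
gap = leading_gap_length(query)
pid = percent_identity(ref, query)
```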

2.5 Identification of X-box motifs

We restricted the X-box motif search to gene models with high-confidence 5’ start sites. We defined the search space as both strands of the 2kb promoter region upstream of the translational start site (Chen et al., 2006). We used a “three-punch” strategy for identifying X-box motifs: HMMER, TFM-scan, and regular expressions. We first used HMMER to identify candidate X-box motifs, and applied the other methods for genes where no candidate X-box motifs were found. We found that not all ciliary genes with well-defined start sites have promoters that contain typical X-box motifs. Because of the substantial evidence supporting the regulation of ciliary genes by RFX transcription factors, we hypothesize that these promoters contain X-box motifs that are more divergent from the consensus. We refer to these motifs as “atypical X-box motifs”, and our main research goal is to identify these atypical X-box motifs in ciliary genes.
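As an illustration of how such a search space can be assembled, the following Python sketch extracts up to 2 kb upstream of a start codon and returns both strands. The function names and the coordinate convention (0-based index of the first base of the start codon on the gene's own strand, expressed in forward-strand coordinates) are assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (N is kept as N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def promoter_region(contig, start_codon_pos, strand, length=2000):
    """Return up to `length` bp upstream of the start codon, read 5'->3'
    on the gene's strand. The region is truncated at the contig edge."""
    contig = contig.upper()
    if strand == "+":
        return contig[max(0, start_codon_pos - length):start_codon_pos]
    if strand == "-":
        upstream = contig[start_codon_pos + 1:start_codon_pos + 1 + length]
        return reverse_complement(upstream)
    raise ValueError("strand must be '+' or '-'")


def search_space(promoter):
    """Both strands of the promoter, keyed by strand."""
    return {"+": promoter, "-": reverse_complement(promoter)}


# Toy contigs: ATG at index 6 on '+', and CAT (ATG on '-') at indices 3-5.
plus_promoter = promoter_region("AAACCCATGGGG", 6, "+", length=4)
minus_promoter = promoter_region("GGGCATTTTAAA", 5, "-", length=4)
```

Genes located near the beginning of a contig yield a promoter shorter than 2 kb under this sketch, which is worth recording when interpreting a negative motif search.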

2.5.1 Identification of typical X-box motifs using HMMER

We used the Hidden Markov Model (HMM)-based software HMMER (Eddy, 2009) to build a profile based on a training set of X-box motifs in the C. elegans high-confidence ciliary gene set. Since there are a limited number of known C. elegans X-box motifs, and all are 14bp long, this method restricts the results to sequences highly similar to the limited training set. As expected, most candidate X-box motifs found by HMMER are 14bp long. (There are a few cases of HMMER hits as short as 10bp; these were excluded from further analysis.) Using this HMM, we searched for putative X-box motifs in the promoter regions of ciliary gene orthologs. HMMs consider both the frequency of each nucleotide in each position, as well as the frequency of nucleotide transitions (e.g. the probability of transitioning from A to T between positions 2 and 3). This means that out of the three approaches used, HMMER finds motifs that are most similar to the C. elegans training set. The consensus sequence of X-box motifs in C. elegans is shown in Figure 2.1.

Figure 2.1: LOGO of X-box motifs in the C. elegans training set
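To make the notion of per-position nucleotide frequencies concrete, the sketch below tabulates them for five X-box motifs taken from Table 2.1 and derives a majority consensus. It illustrates only the position-frequency component that a sequence logo summarizes; it is not HMMER's profile HMM, and the five-motif subset is chosen purely for illustration.

```python
from collections import Counter

# Five X-box motifs from Table 2.1 (a subset of the training set).
TRAINING = [
    "GTTTCCATGGTAAC",  # xbx-1
    "GTTACTATGGCAAC",  # osm-5
    "GTTACCATAGTAAC",  # osm-6
    "GCTACCATGGCAAC",  # osm-1
    "GTTGTCATGGTGAC",  # che-2
]


def position_frequencies(motifs):
    """Return one {base: frequency} dict per motif position."""
    length = len(motifs[0])
    if any(len(motif) != length for motif in motifs):
        raise ValueError("all motifs must have the same length")
    freqs = []
    for column in zip(*motifs):
        counts = Counter(column)
        freqs.append({base: counts[base] / len(motifs) for base in "ACGT"})
    return freqs


def majority_consensus(freqs):
    """Most frequent base per position (ties resolved in A, C, G, T order)."""
    return "".join(max("ACGT", key=column.get) for column in freqs)


freqs = position_frequencies(TRAINING)
consensus = majority_consensus(freqs)
```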

2.5.2 Identification of atypical X-box motifs using TFM-scan

In order to look for atypical X-box motifs, we allow for variation in length and sequence composition. We do this by splitting the C. elegans training set into half-motifs (either 6bp or 7bp in length), and searching for pairs of half-motifs, allowing some flexible nucleotides in between (see Figure 2.2). In particular, we search for either 6bp half-motifs with 1-3 flexible nucleotides in between, or 7bp half-motifs with 0-1 flexible nucleotides in between. Like HMMER, TFM-scan considers the frequency of nucleotides at each position, but unlike HMMER, it does not consider nucleotide transition frequencies. This allows us to search for motifs that follow the generally symmetrical X-box motif pattern, while allowing some variation in sequence.

TFM-scan does not allow non-ATCG characters in DNA sequences; therefore, promoters containing gaps (Ns) were omitted from this step. These promoters were still included in HMMER and regular expression searches.
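The half-motif search can be sketched in Python as a pair of position weight matrices separated by a flexible spacer. This is an illustrative stand-in, not TFM-scan's algorithm or interface. The following are all assumptions: half-motifs are taken as the first and last `half_len` bases of each 14bp training motif, scores are log-odds against a uniform background with a pseudocount of 1, and the score threshold is arbitrary.

```python
import math

TRAINING = [  # five X-box motifs from Table 2.1, used here as a toy training set
    "GTTTCCATGGTAAC", "GTTACTATGGCAAC", "GTTACCATAGTAAC",
    "GCTACCATGGCAAC", "GTTGTCATGGTGAC",
]


def build_pwm(sites, pseudocount=1.0):
    """Log-odds matrix (uniform background) from equal-length sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for column in zip(*sites):
        pwm.append({base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
                    for base in "ACGT"})
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds scores for `window`."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan_half_motif_pairs(promoter, motifs, half_len, spacers, min_score):
    """Report (start, window, score) for left-half + spacer + right-half hits."""
    promoter = promoter.upper()
    if set(promoter) - set("ACGT"):
        return []  # promoters containing Ns or other symbols are skipped
    left = build_pwm([motif[:half_len] for motif in motifs])
    right = build_pwm([motif[-half_len:] for motif in motifs])
    hits = []
    for spacer in spacers:
        width = 2 * half_len + spacer
        for start in range(len(promoter) - width + 1):
            window = promoter[start:start + width]
            total = score(left, window[:half_len]) + score(right, window[-half_len:])
            if total >= min_score:
                hits.append((start, window, round(total, 2)))
    return hits


# Toy promoter containing a 15bp motif: 6bp half + 3bp spacer + 6bp half.
toy_promoter = "AAAAAAAA" + "GTTACC" + "ATA" + "GGCAAC" + "TTTTTTTT"
hits = scan_half_motif_pairs(toy_promoter, TRAINING, half_len=6,
                             spacers=(1, 2, 3), min_score=10.0)
```

The same function covers the 7bp case by calling it with `half_len=7` and `spacers=(0, 1)`.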

Figure 2.2: Searching for half-motifs using TFM-scan

2.5.3 Identification of atypical X-box motifs using regular expressions

We additionally use a regular expression-based method to search for atypical X-box motifs. We use the three regular expressions described in Efimenko et al. (2005): a relaxed consensus (RYYNYY WW RRNRAC), refined consensus (GTHNYY AT RRNAAC), and average consensus (RTHNYY WT RRNRAC). Regular expressions do not consider the frequency of any nucleotides at any given position or nucleotide transition frequencies. The Python re module was used for this search.
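A minimal sketch of such a search with the Python re module is shown below. The three consensus strings are those quoted above with the visual spaces removed; the helper names, the use of a lookahead to report overlapping matches, and the toy promoter are assumptions made for illustration.

```python
import re

# IUPAC nucleotide codes used by the three consensus patterns.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "H": "[ACT]", "N": "[ACGT]"}

CONSENSUS = {
    "relaxed": "RYYNYYWWRRNRAC",
    "refined": "GTHNYYATRRNAAC",
    "average": "RTHNYYWTRRNRAC",
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def iupac_to_regex(pattern):
    """Compile an IUPAC pattern; the lookahead lets overlapping hits be found."""
    body = "".join(IUPAC[symbol] for symbol in pattern)
    return re.compile("(?=(" + body + "))")


def find_xbox_candidates(promoter):
    """Return (consensus name, strand, start, matched sequence) tuples.

    Starts on the '-' strand are relative to the reverse-complemented promoter.
    """
    hits = []
    strands = (("+", promoter.upper()), ("-", reverse_complement(promoter)))
    for strand, seq in strands:
        for name, pattern in CONSENSUS.items():
            for match in iupac_to_regex(pattern).finditer(seq):
                hits.append((name, strand, match.start(), match.group(1)))
    return hits


# Toy promoter embedding the xbx-1 X-box from Table 2.1.
hits = find_xbox_candidates("AAAA" + "GTTTCCATGGTAAC" + "CCCC")
```

Because X-box motifs are roughly palindromic, the same site can be reported on both strands, so hits may need to be merged by genomic position.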

2.5.4 Identification of atypical X-box motifs using manual inspection

After applying the three previously described methods, there remain 15 genes with high-confidence 5’ start sites but with no putative X-box motifs. We manually inspected the promoters of these genes for sequences resembling X-box motifs.

2.6 Reconstructing gene models with RNA-seq (TBLASTN) analysis

In order to study the evolution of ciliary genes, we need to distinguish between real gene loss and genes that are missing due to technical reasons, such as an incomplete or defective genome assembly. For example, an osm-5 ortholog could not be found in M. incognita even though it was identified in all other nematode species studied. osm-5 is an IFT-B component and is required for ciliogenesis; osm-5 mutants have truncated cilia that lack the distal segment (Qin et al., 2001). Since osm-5 was found in all other nematode species studied and is critically important in maintaining properly formed cilia, it follows that M. incognita should also contain an osm-5 gene. If M. incognita does indeed contain osm-5, it should be expressed in ciliated cells. Following this logic, we may be able to find evidence of osm-5 by searching in RNA-seq short reads. To achieve this, we used TBLASTN with default settings to find matching sequences, aligning these reads to their corresponding C. elegans reference coordinates. From this we generated a consensus sequence by using the most common amino acid at each position of the assembly. We present the results of this analysis in Section 3.4.
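The consensus step can be sketched in Python as a per-position majority vote. The input format is hypothetical: each hit is a (reference start, translated peptide) pair with a 0-based start on the C. elegans protein, hits are treated as ungapped, and positions without coverage are marked with X. Parsing of TBLASTN output is not shown.

```python
from collections import Counter


def consensus_from_hits(hits, ref_length, uncovered="X"):
    """Majority-vote consensus over peptides placed on reference coordinates."""
    columns = [Counter() for _ in range(ref_length)]
    for ref_start, peptide in hits:
        for offset, residue in enumerate(peptide):
            position = ref_start + offset
            if 0 <= position < ref_length:
                columns[position][residue] += 1
    return "".join(column.most_common(1)[0][0] if column else uncovered
                   for column in columns)


# Toy hits (hypothetical translated read fragments).
toy_hits = [(0, "MKV"), (1, "KVL"), (1, "RVL"), (5, "QQ")]
toy_consensus = consensus_from_hits(toy_hits, ref_length=8)
```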

2.7 Discussion

In this section we developed a bioinformatics pipeline to identify candidate X-box motifs in 25 nematode species. This pipeline covers the process of ortholog identification using both reciprocal protein-protein (BLASTP) comparisons and protein-DNA (TBLASTN) comparisons. Although reciprocal BLASTP search is an effective method of finding orthologs, it relies on high-quality and complete gene annotation. Our analysis includes species that have recently been sequenced and are not completely assembled or annotated, so we also use TBLASTN-based methods (genBlastG and GeMoMa) to find orthologs that are misannotated or have not been annotated. There may also be ciliary gene orthologs that are in regions of the genome that have not been assembled. These were excluded from further analysis since we do not have the genome sequence for these genes. However, we can verify whether these genes are present in the genome by searching for matching sequences in the RNA-seq short reads. An example of this is demonstrated in Section 3.4.3.

It is also important to accurately define the promoter region, which means annotating 5’ start sites of each gene. We established some criteria for evaluating the annotation of these genes. We use RNA-seq intron junction data when available, which provides evidence to support the first intron. In addition, we require that the first 100 a.a. of a multiple sequence alignment of the protein sequence of each gene is conserved.

The next stage involves searching for X-box motifs. We exhaustively search for X-box motifs by first searching for “typical” X-box motifs, which we define as X-box motifs found by HMMER. Due to the hidden Markov model-based algorithm HMMER uses, which takes into account nucleotide frequency at each position as well as individual nucleotide transitions between positions, these X-box motifs tend to be highly similar to the input sequences used to build the HMM profile. This allows us to find X-box motifs that are similar to the known consensus. However, we are primarily interested in finding X-box motifs that differ from our expectation. We searched for these “atypical” motifs by allowing flexibility in motif length and sequence. We did this by searching for half-motifs with flexible nucleotides in between (TFM-scan) as well as using regular expressions. Both of these methods are less stringent than HMMER, since neither considers nucleotide transition frequencies, and regular expressions do not even consider nucleotide frequencies at each position. For genes that still do not have candidate X-box motifs, we manually inspected promoter regions for sequences resembling X-box motifs. In addition to these methods, other algorithms can also be used to search for motifs, such as Gibbs motif sampling, which iteratively samples sequences until it converges on a motif, or Expectation Maximization-based methods (implemented in the MEME suite) (Lawrence et al., 1993; Bailey and Elkan, 1995). Our analysis resulted in a list of both typical and atypical X-box motifs, most of which have not been previously identified.

Chapter 3

Curation of ciliary genes in pathogenic and non-pathogenic nematodes

3.1 Introduction

Since our study focuses on cis-regulatory elements in the promoter, we must accurately define the promoter region. In this section, we identify and annotate ciliary genes in 25 nematode species. We provide results of our ciliary gene identification and annotation for each gene, ordered by phylogenetic relationship determined in Section 3.2.

3.2 Phylogenetic analysis of nematodes

We determined the relationship between the 26 nematode species included in our study using “core genes” as a marker for evolutionary distance. We define core genes as orthologs that are present in a single copy in all 26 nematode species. To do this, we used InParanoid to identify pairwise orthologous relationships between C. elegans and each of the other 25 species, and filtered for the set of genes that had a 1-to-1 orthologous relationship between C. elegans and every other genome. Figure 3.1 shows a distribution of 1:1:...:1 ortholog clusters, and how many species are represented in each cluster.
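The core gene filter can be sketched in Python as follows. The input structure (a per-species mapping from each C. elegans gene to its list of orthologs) and all identifiers are hypothetical; real InParanoid output would first need to be parsed into this form.

```python
from collections import Counter


def one_to_one_genes(mapping):
    """C. elegans genes with exactly one ortholog that is not shared
    with any other C. elegans gene in this species."""
    usage = Counter(orth for orthologs in mapping.values() for orth in orthologs)
    return {gene for gene, orthologs in mapping.items()
            if len(orthologs) == 1 and usage[orthologs[0]] == 1}


def core_genes(orthologs_by_species):
    """Genes with a 1-to-1 ortholog in every species, sorted by name."""
    per_species = [one_to_one_genes(m) for m in orthologs_by_species.values()]
    if not per_species:
        return []
    return sorted(set.intersection(*per_species))


# Toy input with hypothetical ortholog identifiers.
toy = {
    "C. briggsae": {"rer-1": ["cbr_a"], "cct-2": ["cbr_b"],
                    "bbs-1": ["cbr_c", "cbr_d"]},
    "H. contortus": {"rer-1": ["hco_a"], "cct-2": ["hco_b"],
                     "arl-6": ["hco_c"]},
}
core = core_genes(toy)
```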

Figure 3.1: Distribution of species represented in 1:1:...:1 ortholog clusters

Figure 3.2: Distribution of species represented in 1:1:...:1 ortholog clusters

There were only 10 genes that had a single-copy ortholog present in all 26 species, listed in Table 3.1. Of these 10 genes, 2 had annotations that contained stop codons in some species and were omitted.

The small number of orthologs conserved across all of the nematode species in this study is likely reflective of the quality of genome assembly and annotation: many of these genomes are incomplete and have only computational gene annotation. In particular, we think the large number of genes that are only conserved across a few (<10) species is caused by poor genome quality of some species. This is also supported by the fact that no ribosomal genes were found to be present in all 26 species, given the current annotations. We expect that if species with poor genome quality were removed, the distribution would not have a bimodal appearance; instead, the peak around 20 genomes would be higher, and more genes would be conserved across all species.

From the remaining 8 genes, we generated multiple sequence alignments using MUSCLE, which were concatenated and used to build a phylogenetic tree (Figure 3.3). We used MEGA6 with the Maximum Likelihood algorithm and 100 bootstrap replications to build the tree. Although it does not make sense to show branch lengths of bootstrapped trees, we have generated a tree with branch lengths included to demonstrate similarities between clades (Figure 3.4).
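Concatenating per-gene alignments into a single supermatrix can be sketched in Python as below. The input format (one dictionary of aligned sequences per gene, keyed by species) is an assumption; running MUSCLE and MEGA6 is not shown.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments end to end, species by species.

    `alignments` is a list of {species: aligned_sequence} dicts. Every
    alignment must contain the same species and have a single length.
    """
    if not alignments:
        return {}
    species = sorted(alignments[0])
    parts = {name: [] for name in species}
    for alignment in alignments:
        if set(alignment) != set(species):
            raise ValueError("every alignment must contain the same species")
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError("sequences within an alignment must be equal length")
        for name in species:
            parts[name].append(alignment[name])
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy alignments for two genes and two species (hypothetical sequences).
gene_a = {"cel": "MK-V", "cbr": "MKAV"}
gene_b = {"cel": "LL", "cbr": "L-"}
supermatrix = concatenate_alignments([gene_a, gene_b])
```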

Table 3.1: Core genes present in single copy across all 26 nematode species

Gene Name | Gene Function
H19N07.1a/erfa-3 | Eukaryotic release factor (GSPT1/GSPT2) homolog
Y40B1B.8 | Mitochondrial carrier protein (SLC25A46) homolog
H27A22.1b | Glutaminyl-peptide cyclotransferase (QPCT) and glutaminyl-peptide cyclotransferase-like (QPCTL) homolog
F46C5.8/rer-1 | Retention in ER (RER1) homolog
C30A5.3 | Phocein (MOB4) homolog, may play role in membrane trafficking
F46H6.1/rhi-1 | Rho GDP-dissociation inhibitor
T25B9.9 | Phosphogluconate dehydrogenase (PGD) homolog. This gene was omitted because the D. immitis annotated protein sequence contains stop codons.
T21B10.7/cct-2 | Chaperonin containing TCP1 complex member (CCT2) homolog
R10E11.3a/usp-46 | Cysteine protease, part of deubiquitinating enzyme. This gene was omitted because the C. tropicalis annotated protein sequence contains stop codons.

Figure 3.3: Phylogenetic tree showing relationships between nematode species

Figure 3.4: Phylogenetic tree showing relationships between nematode species, with branch lengths included to show similarities between clades

The species tree accurately reflects the general clade relationships of the nematodes (Table 1.3), as well as closer relationships such as those between species of a common genus.

3.3 Identification and annotation of ciliary gene orthologs

3.3.1 Curation of arl-6 orthologs in nematodes

arl-6 (also named bbs-3) is a gene implicated in Bardet-Biedl Syndrome type 3 (Fan et al., 2004). It is expressed in ciliated neurons such as amphids, phasmids, and labial neurons, where it localizes to both the cytosol and the ciliary axoneme, where it undergoes intraflagellar transport (IFT) (Fan et al., 2004). arl-6 and the other bbs genes travel along the axoneme as part of the BBSome complex.

We identified arl-6 orthologs in 25 nematode species, where H. contortus contains a duplication. The protein sequence of arl-6 is well-conserved, as shown in Figure 3.5.

Table 3.2: Curation of arl-6 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE20225 | WormBase gene model | Yes | 86.0 |
C. tropicalis | C38D4.8_ortholog | genBlastG gene model | Yes | 96.0 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN25639 | WormBase gene model | Yes | 96.0 |
C. sinica | Csp5_scaffold_00221.g7353.t1 | genBlastG gene model | Yes | 98.0 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG17978 | WormBase gene model | Yes | 96.0 |
C. elegans | C38D4.8 | - | - | - |
C. japonica | CJA16169b | genBlastG gene model | Yes | 91.0 | Multiple in-frame Ms
C. angaria | Cang_2012_03_13_00011.g860.t1 | Manual gene model | Yes | 45.2 | Gap 700bp upstream
H. bacteriophora | Hba_15833 | genBlastG gene model | Yes | 68.0 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI02112500.t1 | WormBase gene model | Yes | 66.0 |
H. contortus | HCOI01745600.t1 | WormBase gene model | Yes | 66.0 |
A. ceylanicum | Acey_s0005.g2729.t1 | WormBase gene model | Yes | 68.0 |
N. americanus | NECAME_06332 | WormBase gene model | Yes | 53.5 | Gap 300bp upstream and 600bp downstream of gene; 3’ end of gene truncated, protein is 99a.a. long
P. pacificus | PPA02356 | WormBase gene model | Yes | 61.0 |
P. exspectatus | scaffold1496-EXSNAP2012.2 | WormBase gene model | Yes | 61.0 | No RNA-seq data, but first 100a.a. are conserved; gap <1kb upstream
S. ratti | SRAE_1000327800 | WormBase gene model | Yes | 61.0 |
P. redivivus | g20378.t1 | WormBase gene model | Yes | 50.5 | No RNA-seq data, but first 100a.a. are conserved
B. xylophilus | BUX.s00658.13 | WormBase gene model | Yes | 47.0 |
M. incognita | C38D4.8_ortholog | - | No | 40.4 | Beginning of contig
M. hapla | MhA1_Contig908.frz3.gene1 | genBlastG gene model | Yes | 56.0 | No RNA-seq data, but first 100a.a. are conserved
A. suum | GS_18516 | WormBase gene model | Yes | 61.0 |
D. immitis | nDi.2.2.2.t06199 | Manual gene model | Yes | 46.0 | Upstream exons suggested by RNA-seq, but may be UTR (first 100a.a. are conserved)
O. volvulus | OVOC9580 | WormBase gene model | Yes | 43.0 | Upstream exons suggested by RNA-seq, but may be UTR (first 100a.a. are conserved)
B. malayi | Bm2664 | WormBase gene model | Yes | 50.0 |
L. loa | EFO15011.1 | genBlastG gene model | No | 40.2 | Sparse RNA-seq data; first 100a.a. not well conserved; gene doesn’t begin with ATG; gap 800bp upstream
T. spiralis | EFV50599 | - | No | 13.5 | Gap in 5’ end of alignment
T. suis | C38D4.8_ortholog | Manual gene model | Yes | 46.2 |

t_spiralis_wormbase_EFV50599 1 MLCTHNVDDAIMG------LFIILIHIQNRN----LTEEACWSSQYSEII PSITSSVTDVISGER 55
c_angaria_manual_Cang_2012_03_13_00011.g860.t1 1 -MGL--TISSLFNRLFGKKQVRILMVGLDAAGKTTILYKLKLGEIVTT--I PTIGFNVETVEYKNI 61
t_suis_manual_C38D4.8 1 -MGL--TLSSLFSRLFGKRQVRILMVGLDAAGKTTILYKLKLGEIVTT--I PTIGFNVETVDYKNI 61
p_exspectatus_wormbase_scaffold1496-EXSNAP2012.2 1 -MGFFSTLSSIFG--VGKKSCNIVVVGLDNAGKSTILNALRSEDTRVSQVV PTVGMTVTTFSGTGV 63
p_pacificus_wormbase_PPA02356 1 -MGFFSTLSSIFG--VGKKSCNIVVVGLDNAGKSTILNALRSEDTRVSQVV PTVGMTVTTFSGTGV 63
c_remanei_wormbase_CRE20225 1 -MGFFSSLSSLFG--MGKKNVSIVVVGLDNSGKTTILNHLKTPDTRSQQIV PTVGHVVTHFSTQNI 63
c_japonica_genblastg_CJA16169b 1 -MGFLSSLSSLFG--MSRKDVNIVVVGLDNSGKTTVLNQLKPPETRSQQIV PTVGHVVTNFTTQNL 63
c_elegans_C38D4.8 1 -MGFFSSLSSLFG--LGKKDVNIVVVGLDNSGKTTILNQLKTPETRSQQIV PTVGHVVTNFSTQNL 63
c_sinica_genblastg_Csp5_scaffold_00221.g7353.t1 1 -MGFFSSLSNLFG--MGKKDVNIVVVGLDNSGKTTILNQLKTPETRSQQIV PTVGHVVTNFSTQNL 63
c_brenneri_wormbase_CBN25639 1 -MGFLSSLSNLFG--MGKKDVNIVVVGLDNSGKTTILNHLKTPETRSQQIV PTVGHVVTNFSTQNL 63
c_briggsae_wormbase_CBG17978 1 -MGFLSSLSNLFG--MGKKDVNIVVVGLDNSGKTTILNQLKTPETRSQQIV PTVGHVVTNFSTQNL 63
c_tropicalis_genblastg_C38D4.8_ortholog 1 -MGLLSSLSNLFG--MGKKDVNIVVVGLDNSGKTTILNQLKTPETRSQQIV PTVGHVVTNFSTQNL 63
n_americanus_wormbase_NECAME_06332 1 -MGFLSSLSQMLG--VGRRQVNVIVVGLDNSGKTTMLNYLRTPETRTSQIA PTVGYSVTNFVTENF 63
h_bacteriophora_genblastg_Hba_15833 1 -MGFFSSLSQMLG--IGKRQVSVIVVGLDNSGKTTILNFLRTPETRTSQIV PTVGYTITNFSIENF 63
h_contortus_wormbase_HCOI02112500.t1 1 -MGFLSTLSQILG--VGRKQVNVIVVGLDNSGKTTMLNYLRTPETRTSQIA PTVGYSVTNFTTESF 63
h_contortus_wormbase_HCOI01745600.t1 1 -MGFLSTLSQILG--VGRKQVNVIVVGLDNSGKTTMLNYLRTPETRTSQIA PTVGYSVTNFTTESF 63
a_ceylanicum_wormbase_Acey_s0005.g2729.t1 1 -MGFLSSLSQMLG--VGRRQVNVIVVGLDNSGKTTMLNYLRTPETRTTQIA PTVGYSVTNFVTENF 63
p_redivivus_wormbase_g20378.t1 1 -MGFVNLLQNIFG--NNPRSVQVLVLGLDNSGKTTVVNHLKNPQ-QADQAV PTVGQNVEKFVSHNL 62
l_loa_genblastg_EFO15011.1 1 ------RLLWA--VSRKQVNILMIGLDNSGKSTIINQMKPHEDQVTQVMPSIGCSIEKFIFNNT 56
d_immitis_manual_nDi.2.2.2.t06199 1 -MGLLSQISIALG--VTRRQVNILMIGLDNSGKSTIINQMKAQNDQVTQVV SSIGCTVEKFIFNNT 63
o_volvulus_wormbase_OVOC9580 1 -MGLLSQISIALG--VSRKQVNVLMIGLDNSGKSNIINEMKAKDDQVTQVL PSIGCTTEKFIFNNT 63
b_malayi_wormbase_Bm2664 1 -MGLLSQISIALG--VSRKQVNILMIGLDNSGKTTIINKMKKEEDRVTQIT PTIGYTTEKFIFNNT 63
b_xylophilus_wormbase_BUX.s00658.13 1 -MGLFSTIAKFFA--PHQHPCEVLVLGLDNSGKTSILNQLKPPDNQLSAVT PTVGFNVEKFNAANI 63
s_ratti_wormbase_SRAE_1000327800 1 -MGFFSSIGNFLG--FGKRNVNILVIGLDNSGKTTILNHLKSQEVQSMTIV PTVGYNVEKFTNANF 63
a_suum_wormbase_GS_18516 1 -MGLLNQLTSIFG--MSKKQVNILVIGLDNSGKTTILNQLKPPEAQTTQVV PTVGYNVDKFTSSNM 63
m_hapla_genblastg_MhA1_Contig908.frz3.gene1 1 -MGLLKALSSLIN--NSGKPVDILVLGLDNSGKTTILNQLKPPETQSASIT PTVGYNVEKFSAAGM 63
m_incognita_genblastg_C38D4.8_ortholog 1 ------FRQTYILVLGLDNSGKTTILNQLKPAETQSASIT PTVGYNVEKFSAAGM 49

t_spiralis_wormbase_EFV50599 56 SCIVMSVRTSSELKLHHIPTAVAKSSITSAS-TPKSTSLNFLMKSG------ 100
c_angaria_manual_Cang_2012_03_13_00011.g860.t1 62 SFTVWDVGGQDKIR------PLWRHY-FQNTQGLIFVVDSNDKERI ------ 100
t_suis_manual_C38D4.8 62 SFTVWDVGGQDKIR------PLWRHY-FQNTQGLIFVVDSNDRERV ------ 100
p_exspectatus_wormbase_scaffold1496-EXSNAP2012.2 64 NFSAFDMSGQGKYR------NLWDAY-YAKAEGIMFVVDSTDRL------ 100
p_pacificus_wormbase_PPA02356 64 NFSAFDMSGQGKYR------NLWDAY-YAKAEGIMFVVDSTDRL------ 100
c_remanei_wormbase_CRE20225 64 SFQAFDMAGQMKYR------SAWESF-FSSASGVIFVLDSSDRI------ 100
c_japonica_genblastg_CJA16169b 64 SFHAFDMAGQMKYR------STWESY-FHTSQGVIFVLDSSDRV------ 100
c_elegans_C38D4.8 64 SFHAFDMAGQMKYR------STWESY-FHSSQGVIFVLDSSDRL------ 100
c_sinica_genblastg_Csp5_scaffold_00221.g7353.t1 64 SFHAFDMAGQMKYR------STWESY-FHSSQGVIFVLDSSDRL------ 100
c_brenneri_wormbase_CBN25639 64 SFHAFDMAGQMKYR------STWESY-FHSSQGVIFVLDSSDRL------ 100
c_briggsae_wormbase_CBG17978 64 SFHAFDMAGQMKYR------STWESY-FHSSQGVIFVLDSSDRV------ 100
c_tropicalis_genblastg_C38D4.8_ortholog 64 SFHAFDMAGQMKYR------STWESY-FHSSQGVIFVLDSSDRL------ 100
n_americanus_wormbase_NECAME_06332 64 SFTAFDMAGQEKKE------LICRVHAFIKYVDIIYQMICNE------ 99
h_bacteriophora_genblastg_Hba_15833 64 CFTAFDMAGQGKYR------NLWETY-YAGSQAILFVVDSSDRL------ 100
h_contortus_wormbase_HCOI02112500.t1 64 SFTAFDMAGQGKYR------NLWETY-YLNAQAVIFVVDSADRL------ 100
h_contortus_wormbase_HCOI01745600.t1 64 SFTAFDMAGQGKYR------NLWETY-YLNAQAVIFVVDSADRL------ 100
a_ceylanicum_wormbase_Acey_s0005.g2729.t1 64 SFTAFDMAGQGKYR------NLWETY-YVNSQAVIFVVDSADRL------ 100
p_redivivus_wormbase_g20378.t1 63 TFNTYDMAGQSKYR------NLWETH-YKSMHGIIFVVDSTDRMR------ 100
l_loa_genblastg_EFO15011.1 57 TFMVHDMSGQGKYR------NLWENY-YNEVDGVAFVVDSNDRLRMAVVRD------ 100
d_immitis_manual_nDi.2.2.2.t06199 64 TFLIYDMSGQGKYR------NLWENY-YSEVDGVVFVIDSNDRL------ 100
o_volvulus_wormbase_OVOC9580 64 TFLVHDMSGKGKYR------NLWENY-YKEVDGVVFVVDSSDRL------ 100
b_malayi_wormbase_Bm2664 64 TFLVHDMSGQGKYR------NLWENY-YKEVDGVVFVIDSNDRL------ 100
b_xylophilus_wormbase_BUX.s00658.13 64 AFTAYDMSGQSKYR------TLWETQ-YKTAHAIIFVVDSTDRL------ 100
s_ratti_wormbase_SRAE_1000327800 64 TFTAFDMSGQSKYR------NLWENY-YKNVQGIIFVVDSVDRL------ 100
a_suum_wormbase_GS_18516 64 SFSAYDMSGQGKYR------NLWETY-YKEVDGIIFVVDSSDRL------ 100
m_hapla_genblastg_MhA1_Contig908.frz3.gene1 64 TFSAYDMSGQSRYR------NLWETQ-YKNVIGIIFVVDASDRL------ 100
m_incognita_genblastg_C38D4.8_ortholog 50 TFSAYDMSGQSRYR------NLWETQ-YKNVNGIIFVVDASDKLRI AVARDELWMLLD 100

Figure 3.5: Multiple sequence alignment of first 100a.a. of arl-6 orthologs. Across the 25 nematode genomes, 26 arl-6 orthologs were found, and no genome lacks an ortholog. Note: H. contortus contains two arl-6 genes, and both genes have high confidence 5’ start sites.

3.3.2 Curation of bbs-1 orthologs in nematodes

bbs-1 is a gene implicated in Bardet-Biedl Syndrome type 1. It is expressed in ciliated neurons, and is a component of the BBSome (Ansley et al., 2003; Wei et al., 2012). BBS-1 is involved in turnaround of IFT particles at the ciliary tip; in bbs-1 mutants, IFT-A and IFT-B fail to associate in retrograde IFT and IFT-B accumulates at the ciliary tip (Wei et al., 2012). We identified bbs-1 orthologs in 25 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figure 3.6).

Table 3.3: Curation of bbs-1 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE20678 | WormBase gene model | Yes | 56.0 |
C. tropicalis | Csp11.Scaffold630.g19055.t1 | WormBase gene model | Yes | 58.4 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN25891 | genBlastG gene model | Yes | 63.1 |
C. sinica | Csp5_scaffold_00639.g13883.t1 | WormBase gene model | Yes | 57.8 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG08744 | WormBase gene model | Yes | 57.0 | No RNA-seq data for first intron, but first 100a.a. are conserved
C. elegans | Y105E8A.5 | - | - | - |
C. japonica | CJA08243a+CJA10503a | WormBase gene model | Yes | 71.0 |
C. angaria | Cang_2012_03_13_00946.g15068.t1 | WormBase gene model | Yes | 59.4 |
H. bacteriophora | Hba_05402+Hba_05401 | GeMoMa gene model | Yes | 34.3 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI00065300.t1/ No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 23, GeMoMa PID: 28.1, genBlastG PID: 20.5)
A. ceylanicum | Acey_s0012.g1641.t1 | GeMoMa gene model | Yes | 32.0 |
N. americanus | NECAME_05590+NECAME_05589 | WormBase gene model | Yes | 29.2 | Gap 1.2kb upstream
P. pacificus | No ortholog found | - | - | - | Low sequence similarity (genBlastG PID: 21, other predictions not found)
P. exspectatus | scaffold95-EXSNAP2012.19 | genBlastG gene model | No | 23.7 | First 100a.a. not conserved
S. ratti | SRAE_X000204900 | WormBase gene model | Yes | 25.4 |
P. redivivus | g8979.t1 | WormBase gene model | Yes | 23.6 | Small gap 300bp upstream
P. redivivus | g16322.t1 | WormBase gene model | Yes | 25.0 |
B. xylophilus | BUX.s01254.190 | WormBase gene model | Yes | 22.9 |
M. incognita | Minc05801 | WormBase gene model | Yes | 31.1 |
M. hapla | MhA1_Contig1797.frz3.gene11 | WormBase gene model | Yes | 25.5 | No RNA-seq data, but first 100a.a. are conserved
A. suum | GS_23233 | Manual gene model | Yes | 36.4 | No RNA-seq data for first intron, but first 100a.a. are conserved
D. immitis | nDi.2.2.2.t07722 | WormBase gene model | Yes | 33.3 | No RNA-seq data for first intron, but first 100a.a. conserved
O. volvulus | OVOC11256 | WormBase gene model | Yes | 22.9 |
B. malayi | Bm7232 | WormBase gene model | Yes | 33.3 |
L. loa | EJD75430.1 | WormBase gene model | Yes | 33.3 |
T. spiralis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 20.4, genBlastG PID: 23.9)
T. suis | M514_11136/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 25.3, GeMoMa PID: 26.4, genBlastG PID: 26.7)

p_redivivus_wormbase_g16322.t1 1 ------MAFNSDERRWVRALWEPALNIET IKQNIAVGNFSS- -GEASL I VVD-SPYGAAKRLV 54
p_redivivus_wormbase_g8979.t1 1 ------MGLDDEGPKRWIRLIFEANTNLHTSKANVTTANITG-DGEFKLI VADQSPAGC--RIK 55
b_xylophilus_wormbase_BUX.s01254.190 1 --MQSSNRFGDMRSQKRWLRALYEPLAGVQTTKSCITIGDVHS-DGDYKLI LVDEKQPPP--KMV 60
c_remanei_wormbase_CRE20678 1 ------MARSIEENSRKWTAPVLMKSGEVFCPSTCVTLGPIYN-ESESKLI IGTGGHRGMNMKLR 58
c_briggsae_wormbase_CBG08744 1 ------MSKPDPAPNRKWTNPVMMTACEIHCPSTCVALGPVLT-NSDSKLI CAHGGNRGMNLRLT 58
c_sinica_wormbase_Csp5_scaffold_00639.g13883.t1 1 ------MSFKPDPHAKWTQPVSLNNCEVHCPSTCVGLGPVLITNAESKLI VAHGGTRGMNLRLT 58
c_tropicalis_wormbase_Csp11.Scaffold630.g19055.t1 1 ------MAKPEEYRSKWTSPVVMRDCEVHCPSTCVALGTVYNGDDGTKLI I AHSGHGGMNMQLR 58
c_brenneri_genblastg_CBN25891 1 ------MAKPLEVTQSKWTAPVVLQDHEIHCPSTCVTLGPLLA-DNESMLI IAHGGHRGVNMKLK 58
c_angaria_wormbase_Cang_2012_03_13_00946.g15068.t1 1 ------MNESMKAKWTAPVMINNCEIHCPITCVTLGDISL-DNDVKLI VATGGHRGLNMKLK 55
c_japonica_wormbase_CJA08243a+CJA10503a 1 ------MAKTEPAPNSKWTTPVMMKDCEIHCSSTCVALGPLFT-DGDSKLI VAQGGHRGLNMKLK 58
c_elegans_Y105E8A.5 1 ------MAKPVNVNQSKWTVPVLLKECEIYCPSTCVAFGPILS-DNDSKLI IAHGGHRGVNMKLK 58
s_ratti_wormbase_SRAE_X000204900 1 ------MSLKWVGALWDPSAKISTRASLVVLSDVLG-DGDYKLL LVDMSQNLP--KLK 49
m_hapla_wormbase_MhA1_Contig1797.frz3.gene11 1 MCPHNQTDRIIVNTSSKWTRALWEPAAGVQTSKDFVALIDLQG-RGDYQLI LVDESSLPF--KLK 62
m_incognita_wormbase_Minc05801 1 ------MNVNTSSKWTRALWEPAANVQTSKDFVVLSDLQG-RGDYQLI LVDESSLPF--KLK 53
p_exspectatus_genblastg_scaffold95-EXSNAP2012.19 1 ------DNQFILS------RLNTNANCVSFVDTQA-DGDIKLV VADLGTSRYEMKLK 44
o_volvulus_wormbase_OVOC11256 1 ------MNECSSCWVQTPNTANLRLNTIQSCVCLADFHA-NGDYKLA IGDFGTEKYGIRLK 54
b_malayi_wormbase_Bm7232 1 ------MKKSENSSSKWVSALQATSLGLNTLPSCVCLADLYG-DGDYKLV IGDFGTEKYDIQLK 57
d_immitis_wormbase_nDi.2.2.2.t07722 1 ------MVMNITSANWVSALNATSLGFNTLPSCVCLADLHG-DNDYKLV IGDFGTEKYDIRLK 56
l_loa_wormbase_EJD75430.1 1 ------MKKSTSTSKWVLALNATSLGLNTLPSCVCLADLHG-DGDYKLV IGDFGTEKYDIRLK 56
a_suum_manual_GS_23233 1 ------MSNKWVSALHDYSAGINSLPTCIGLSDLYG-DGDYKLI IGDIGTGKYNMRLK 51
h_bacteriophora_gemoma_Hba_05402+Hba_05401 1 ------MATAEMTNKWVSALTDDSAGITTFASCISLSDMYG-DGDTKLV LAHIGSSKFNMRLK 56
a_ceylanicum_gemoma_Acey_s0012.g1641.t1 1 ------MASSETTKWMSALSDDQAGLFTFFNCVCLSDMYG-DGDTKLV AAHVGSSKFNMRLK 55
n_americanus_wormbase_NECAME_05590+NECAME_05589 1 ------MSALADDQAGVFTFYNCVCLSDMYG-DGDTKLV LAHVGSSKFNMRLK 46

p_redivivus_wormbase_g16322.t1 55 HMKGLTLASDVALPSEPVGIVPFVVDNG--VAPCVAVAIQSSLLVYRN------ 100
p_redivivus_wormbase_g8979.t1 56 LFKGLTLVGDTFLSDPPVGIVTFRSEDA--MVPCVAVAVGGSLLIYR------ 100
b_xylophilus_wormbase_BUX.s01254.190 61 LYFGLTPQTQTTLSDRPAAIVTFYSESG--QSPCIAVASGNG------ 100
c_remanei_wormbase_CRE20678 59 VFKDVDQDEECTLAESPTAIMHFVNEAR--AVPTIAVAAGSSLL------ 100
c_briggsae_wormbase_CBG08744 59 VFSGLSQEMEATLADPPTQIMHFVNEKS--TMPNIAVAAGPSFT------ 100
c_sinica_wormbase_Csp5_scaffold_00639.g13883.t1 59 VFSALDQEMESSLADPPTAIMHFCNEKS--TIPNVAVAAGPSIL------ 100
c_tropicalis_wormbase_Csp11.Scaffold630.g19055.t1 59 VFEGLSQHSENTLADAPVSICHFVNELS--SVPSVAVASGPSLL------ 100
c_brenneri_genblastg_CBN25891 59 VFQGLSQHSESSLADMPTAIMHFVNELM--AKPLEVTQSKWTAP------ 100
c_angaria_wormbase_Cang_2012_03_13_00946.g15068.t1 56 VFKGVSPHSESSLADVPTAIVHFVNDFS--SMPSIAVASGPALLIYK------ 100
c_japonica_wormbase_CJA08243a+CJA10503a 59 VFQGIEPHSDSSLADMPTAIVHFMNELS--SIPSVAVAAGPSLL------100 c_elegans_Y105E8A.5 59 VFQQLEQLSESSLADMPTALVHFINDLS--SIPSIAVAAGPSLL------100 s_ratti_wormbase_SRAE_X000204900 50 MFKGINPVAESALTSHPTGLVSFYNSMTNPPTPCIGLSTGNSVLIYRSLKP ------100 m_hapla_wormbase_MhA1_Contig1797.frz3.gene11 63 LFKGLKPIVESALAECPTGIVGFACDSGNSDSTCLAVA------100 m_incognita_wormbase_Minc05801 54 LFKGLKPIVESALAECPTGIVGFACDSGNSDSTCLAVACGSSLLIYR------100 p_exspectatus_genblastg_scaffold95-EXSNAP2012.19 45 VFKALTKIGEQTMIESPISAISFNNEAT--PTNTIGVAAGSTLFIYKALKP FYKENST 100 o_volvulus_wormbase_OVOC11256 55 IFKGFQVIVDNSLNDLPSALISFNSENVKPNPSSLALACDTTILIY------100 b_malayi_wormbase_Bm7232 58 VFRGLQIIGANVLSDLPAALVSFNNENIQPSLASLAVACGSSI------100 d_immitis_wormbase_nDi.2.2.2.t07722 57 VFRGLQLIGENVLSDLPSALVSFNNENIQSSLSSLAVACGSSIL------100 l_loa_wormbase_EJD75430.1 57 VFRGLQVIGENVLSDLPSALVSFNNENVQPNLSSLAVACGPAIL------100 a_suum_manual_GS_23233 52 VFKGLTLIGESVLTDVPSAVVPFINELMQPTLPSIAIASGPSVLIYKNL------100 h_bacteriophora_gemoma_Hba_05402+Hba_05401 57 VYKGVSVIAESALADVPTAVVSFNNEKI--TLPSLAIASGAFIRIY------100 a_ceylanicum_gemoma_Acey_s0012.g1641.t1 56 VYKGVTVVGESALADMPTAVVSFYNEKV--TLPAIGVASGSYIRIYK------100 n_americanus_wormbase_NECAME_05590+NECAME_05589 47 VFKGVTVVGESALADLPTAVVSFYNEKI--SLPAIGVASGSYIRIYKNLKP FYQYN-- 100

Figure 3.6: Multiple sequence alignment of first 100 a.a. of bbs-1 orthologs. Among 25 nematode genomes, 22 bbs-1 orthologs are found in 21 species, and no ortholog is found in 4 species (H. contortus, P. pacificus, T. spiralis, and T. suis). Note: P. redivivus contains two bbs-1 genes, and both genes have high confidence 5’ start sites.

3.3.3 Curation of bbs-2 orthologs in nematodes

bbs-2 is a gene implicated in Bardet-Biedl Syndrome type 2. It is expressed in ciliated neurons and is a component of the BBSome (Wei et al., 2012). We identified bbs-2 orthologs in 25 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figure 3.7).

Table 3.4: Curation of bbs-2 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE11012 | WormBase gene model | Yes | 82.0 |
C. tropicalis | Csp11.Scaffold629.g12796.t1 | WormBase gene model | Yes | 85.0 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN30879 | WormBase gene model | Yes | 81.0 |
C. sinica | Csp5_scaffold_00067.g3198.t1 | WormBase gene model | Yes | 81.0 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG17712 | WormBase gene model | Yes | 77.0 |
C. elegans | F20D12.3 | - | - | - |
C. japonica | CJA15357 | WormBase gene model | Yes | 78.0 |
C. angaria | Cang_2012_03_13_00813.g14312.t2 | WormBase gene model | Yes | 52.9 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved
H. bacteriophora | Hba_21429+Hba_21428+Hba_21427+Hba_21426 | GeMoMa gene model | Yes | 33.3 | No RNA-seq data for first intron, but first 100a.a. are conserved
H. contortus | HCOI02034600.t1 | genBlastG gene model | Yes | 36.0 |
A. ceylanicum | Acey_s0484.g2315.t3+Acey_s0484.g2318.t1 | genBlastG gene model | Yes | 37.8 | RNA-seq suggests upstream exon but low coverage
N. americanus | NECAME_08774 | WormBase gene model | Yes | 40.9 | No RNA-seq data for first intron, but first 100a.a. are conserved
P. pacificus | F20D12.3_ortholog | GeMoMa gene model | Yes | 27.2 |
P. exspectatus | F20D12.3_ortholog | GeMoMa gene model | Yes | 27.2 | No RNA-seq data, but first 100a.a. are conserved
S. ratti | SRAE_X000085100 | WormBase gene model | Yes | 20.7 |
P. redivivus | g5059.t1 | WormBase gene model | Yes | 23.9 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved; gap ~2kb upstream
B. xylophilus | BUX.s00116.72 | WormBase gene model | Yes | 13.8 |
M. incognita | Minc17570 | WormBase gene model | Yes | 17.5 |
M. hapla | MhA1_Contig1288.frz3.gene5+MhA1_Contig1288.frz3.gene6+MhA1_Contig1288.frz3.gene7+MhA1_Contig1288.frz3.gene8 | WormBase gene model | Yes | 17.5 | No RNA-seq data, but first 100a.a. are conserved
A. suum | F20D12.3_ortholog | genBlastG gene model | Yes | 24.0 | No RNA-seq data for first intron, but first 100a.a. are conserved
D. immitis | nDi.2.2.2.t09549 | WormBase gene model | No | 2.6 | End of contig
O. volvulus | OVOC12073 | WormBase gene model | Yes | 23.4 |
B. malayi | Bm3471 | WormBase gene model | Yes | 21.0 | No RNA-seq data for first intron, but first 100a.a. are conserved
L. loa | EJD75163.1 | WormBase gene model | Yes | 22.6 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved; gap just over 2kb upstream
T. spiralis | F20D12.3_ortholog | GeMoMa gene model | No | 14.4 | RNA-seq suggests different first exon; first 100a.a. partially conserved
T. suis | M514_06651 | WormBase gene model | Yes | 13.0 | Upstream exons suggested by RNA-seq, but may be UTR (first 100a.a. are conserved)

d_immitis_wormbase_nDi.2.2.2.t09549 1 -----MDGVNCIKVGKFNTYDKLIFCGGNCAVWGLDIDGKDAFWTVTGDNV LSLCLSD--VDNDG-NNELIVGS 66
t_spiralis_gemoma_F20D12.3_ortholog 1 ------MMSVGVAFTVKLNHK------I APKAVTLGHFDDN--QCSLVAVD 37
t_suis_wormbase_M514_06651 1 ------MSMQSMSLVVSFTLKLGRR------I SPHAVALGCFDIGQ-TQSLALVD 42
s_ratti_wormbase_SRAE_X000085100 1 ------MSNNYQLVSIFDYRLNYH------T IYKCAVGGKFQENG-KQQIIIVT 41
c_angaria_wormbase_Cang_2012_03_13_00813.g14312.t2 1 MTDGDRTPEEQ----EISEVPKIDVELEVVASSTFSLNQR------I LPNCLISAILEPNG-RESVLAVS 59
c_japonica_wormbase_CJA15357 1 -MDGDQTPEEQIEISENDPMENLEANVELVSAFAYSLDQR------VMEGCVITAILEPKG-HETLVAVS 62
c_remanei_wormbase_CRE11012 1 -MDGDQTPEEQVEIGESDETSKFDDNVELTSVFSFSLDQR------I MEGCVISAVLEPDG-QETIVAVS 62
c_elegans_F20D12.3 1 -MDGDRTPEEQIEIAESDQGPQLDDNVELANVFSYSLDQR------I MEGCVISAILEPRG-LETIVAVS 62
c_tropicalis_wormbase_Csp11.Scaffold629.g12796.t1 1 -MDGDQTPEEQVEIGESDQGPQFDDQVELIDVFKYSLDQR------I MDGCVISAVLEPRG-QDTIVAVS 62
c_brenneri_wormbase_CBN30879 1 -MDGDQTPEEQIEIGDQEQNPQLDDNVELTIAFKFSLDQR------VMEGCIISAVLEPRG-QSTIVAVS 62
c_briggsae_wormbase_CBG17712 1 -MDGDETPEEQFEIGDSSQVPHFDENVELKSVFNFSLNQR------I MEGCVVSAILEPRG-VETIVAVS 62
c_sinica_wormbase_Csp5_scaffold_00067.g3198.t1 1 -MDGDQTPEEQIEIGDVSQVPQFDENVELKSVFNFSLDQR------I MEGCVVSAILEPRG-AETIVAVS 62
p_exspectatus_gemoma_F20D12.3_ortholog 1 ------MKLKPTFSYSLNSR------I LAGCIVSARIEPTATKETLIAVS 38
p_pacificus_gemoma_F20D12.3_ortholog 1 ------MKLKPTFSYSLNSR------I LAGCIVSARIEPTATKETLIAVS 38
h_bacteriophora_gemoma_Hba_21429+Hba_21428+Hba_21427+... 1 ------MSSEESNTPTKNENLQDLSPTFNFALNHR------I LPLCAVSAKVVPEE-KETLIAVT 52
n_americanus_wormbase_NECAME_08774 1 ------MSEETEEQVVVRNDSFELASVFSFSLNQR------I LPHCATSARIEPDT-RETLVVVS 52
a_ceylanicum_genblastg_Acey_s0484.g2315.t3+Acey_s0484.g... 1 ------MTEEAEEEVLGNDSFELASLFSFSLNHR------I LPRCATSARIEPDS-RETLVAVS 51
h_contortus_genblastg_HCOI02034600.t1 1 ------MTEEAELEVISNDTFDLASAFSFSLNHR------I LPNGATSARIEPDT-KETLVAVT 51
a_suum_genblastg_F20D12.3_ortholog 1 ------IESMLGLRTAFTFSLLHR------T VPKCAQAGVLDDSA-RLRLVVAT 41
o_volvulus_wormbase_OVOC12073 1 ------MLSLRPDFSYGLSHH------I VPGCARFGKIDETG-ESKLIAAT 38
b_malayi_wormbase_Bm3471 1 ------MLGLRSEFTYRISHH------I VPGCARFGIIDETG-QLQLIVAT 38
l_loa_wormbase_EJD75163.1 1 ------MLNLRSEFTYKISHH------I VPGCARFGIIDDTG-QLQLIVAT 38
b_xylophilus_wormbase_BUX.s00116.72 1 ------MKGSLKEIVNYSFGHR------L AKNATSIGYVDQSE-VEKIVVGT 39
p_redivivus_wormbase_g5059.t1 1 ------MATSSTEEPVVEQYTSKASFKNVFSFGFSGH------V LPKGASFGSFDDTR-RTQIVIAT 54
m_hapla_wormbase_MhA1_Contig1288.frz3.gene5+MhA1_... 1 ------MSFGQGNCSLKTAFGFSFAHR------L V--AASSGSFDETG-HEQLVVGT 42
m_incognita_wormbase_Minc17570 1 ------MSVGQGNCSLKTAFGFSFAHR------L V--AAASGSFDESG-HEQLVVGT 42

d_immitis_wormbase_nDi.2.2.2.t09549 67 ESYDIRIYKND-L------LLYEITEADAVTGLCD------LGDGIF 100
t_spiralis_gemoma_F20D12.3_ortholog 38 NVGKVHSLKNA-E------CMDDSTLPRFRLNQNASAMMNINRMVKIV ATGY------LKASDTFD 90
t_suis_wormbase_M514_06651 43 TAGKV-FLYSI-ETVDSRTTTDADQFAAAPGFRLNPSASTVLNVNQQVKML SSGR------PTLQD--- 100
s_ratti_wormbase_SRAE_X000085100 42 PENKI-VLQNEID------IVHNITEKINCI KELR------ITDPLKENDEGYD 82
c_angaria_wormbase_Cang_2012_03_13_00813.g14312.t2 60 ASNKV-FVKDT-E------ISLNINEPIRCMTIAP------FGAGYD 92
c_japonica_wormbase_CJA15357 63 VTNKI-IIKDT-E------TSLHITETIRCI AAAP------FGEGYD 95
c_remanei_wormbase_CRE11012 63 VTNKV-IIKDT-E------TSLNITETIRCI AAAP------FGEGYD 95
c_elegans_F20D12.3 63 VTNKI-IIKDK-E------TSLNITETIRCI AAAP------FGDGYD 95
c_tropicalis_wormbase_Csp11.Scaffold629.g12796.t1 63 VTNKI-VIKDT-E------TSLNITETIRCI AAAP------FGDGYD 95
c_brenneri_wormbase_CBN30879 63 VTNKI-IIKDT-E------ASLNITETIRCI AAAP------FGEGYD 95
c_briggsae_wormbase_CBG17712 63 MTNKV-IIKDV-E------SPLNITETIRCI AAAP------IGDGYD 95
c_sinica_wormbase_Csp5_scaffold_00067.g3198.t1 63 MTNKI-IIKDT-E------SSLNITETIRCI AAAP------FGEGYD 95
p_exspectatus_gemoma_F20D12.3_ortholog 39 ASNKV-IIQGV-E------SSLHISDRVRCL SLLP------LEDNRD 71
p_pacificus_gemoma_F20D12.3_ortholog 39 ASNKV-IIQGV-E------SSLHISDRVRCL SLLP------LEDNRD 71
h_bacteriophora_gemoma_Hba_21429+Hba_21428+Hba_21427+... 53 VSNKV-ILKDS-E------SSLHIPDKIKCI TTVP------FGNGYD 85
n_americanus_wormbase_NECAME_08774 53 ASNKI-ILRNN-E------SSLHIPEKIKCI TTVP------FGQGYD 85
a_ceylanicum_genblastg_Acey_s0484.g2315.t3+Acey_s0484.g... 52 ASNKV-ILRNN-E------SSLHIPDKIKCI TTVP------FGMGYD 84
h_contortus_genblastg_HCOI02034600.t1 52 ASNKI-ILRNN-E------STLHIPDKIKCI TTVP------FGDGYD 84
a_suum_genblastg_F20D12.3_ortholog 42 VTNKV-IIHET-D------TVLNINERIISL AVAP------LGTSYD 74
o_volvulus_wormbase_OVOC12073 39 ATNKI-IIHDN-E------TVLNINEKIRAL EVTT------LNKTHD 71
b_malayi_wormbase_Bm3471 39 TTNKV-IIHDN-E------TVLNINEKIRAL EVTT------LDKTHD 71
l_loa_wormbase_EJD75163.1 39 TTNKV-IIHDN-E------IVLNINEKIRAL EVTT------FDKTYD 71
b_xylophilus_wormbase_BUX.s00116.72 40 ASNKV-ILQNS-E------AIFHVNEPITWI DVLP------PKQRITSELNCD 78
p_redivivus_wormbase_g5059.t1 55 NTNKI-VLQSS-E------AIFHVNENINVI KAIK------FTSSDEHD 89
m_hapla_wormbase_MhA1_Contig1288.frz3.gene5+MhA1_... 43 ETGKI-CLQGS-E------SIFHVNDRINFL SVYKRNKKTKEQSTKSSDNSDYD 88
m_incognita_wormbase_Minc17570 43 ETGKI-CLQGS-E------SVFHVNDKINFL SVYKHNRKIKEQSIKSSDNSEYD 88

d_immitis_wormbase_nDi.2.2.2.t09549 ------
t_spiralis_gemoma_F20D12.3_ortholog 91 AIVIGTATHI------ 100
t_suis_wormbase_M514_06651 ------
s_ratti_wormbase_SRAE_X000085100 83 ILLIGTIEICGCGSAMWG------ 100
c_angaria_wormbase_Cang_2012_03_13_00813.g14312.t2 93 YVVIGTDA------ 100
c_japonica_wormbase_CJA15357 96 CIVIG------ 100
c_remanei_wormbase_CRE11012 96 CIVIG------ 100
c_elegans_F20D12.3 96 CIIIG------ 100
c_tropicalis_wormbase_Csp11.Scaffold629.g12796.t1 96 CILIG------ 100
c_brenneri_wormbase_CBN30879 96 CIIIG------ 100
c_briggsae_wormbase_CBG17712 96 CIVIG------ 100
c_sinica_wormbase_Csp5_scaffold_00067.g3198.t1 96 CIVIG------ 100
p_exspectatus_gemoma_F20D12.3_ortholog 72 AIVVGTDSHLIVYDAHDNVTLFQREVPDG 100
p_pacificus_gemoma_F20D12.3_ortholog 72 AIVVGTDSHLIVYDAHDNITLFQREVPDG 100
h_bacteriophora_gemoma_Hba_21429+Hba_21428+Hba_21427+... 86 YIVVGTESQIIVYDF------ 100
n_americanus_wormbase_NECAME_08774 86 YIVVGTESQIMVYDF------ 100
a_ceylanicum_genblastg_Acey_s0484.g2315.t3+Acey_s0484.g... 85 YIVVGTESQVLVYDFH------ 100
h_contortus_genblastg_HCOI02034600.t1 85 YIVVGTESQVLVYDFH------ 100
a_suum_genblastg_F20D12.3_ortholog 75 LILIGTTSSILAYDVQKNTNIFRKDI--- 100
o_volvulus_wormbase_OVOC12073 72 VIIVGTVSGVLVYDVYNNTALIQRELIDG 100
b_malayi_wormbase_Bm3471 72 AIIVGTISGLLIYDAYNNTTLIQREIIDG 100
l_loa_wormbase_EJD75163.1 72 VIIIGTVCGVLIYDAYNNTTLIQRELVDG 100
b_xylophilus_wormbase_BUX.s00116.72 79 LVVVGTDKSLIVFDVYNNKTIF------ 100
p_redivivus_wormbase_g5059.t1 90 VVVIGTTSSLL------ 100
m_hapla_wormbase_MhA1_Contig1288.frz3.gene5+MhA1_... 89 IVIIGTNVGNVD------ 100
m_incognita_wormbase_Minc17570 89 IVIIGTNNSLMA------ 100

Figure 3.7: Multiple sequence alignment of first 100 a.a. of bbs-2 orthologs. Among 25 nematode genomes, 25 bbs-2 orthologs are found, one in each genome.

3.3.4 Curation of bbs-4 orthologs in nematodes

bbs-4 is a gene implicated in Bardet-Biedl Syndrome type 4. It is expressed in ciliated neurons and is a component of the BBSome (Wei et al., 2012). We identified bbs-4 orthologs in 23 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.8).

Table 3.5: Curation of bbs-4 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE25560 | WormBase gene model | Yes | 63.8 |
C. tropicalis | Csp11.Scaffold629.g9328.t1 | GeMoMa gene model | Yes | 59.6 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN05677 | GeMoMa gene model | Yes | 65.0 |
C. sinica | Csp5_scaffold_00091.g3989.t1 | WormBase gene model | Yes | 69.6 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG09893 | WormBase gene model | Yes | 63.6 |
C. briggsae | CBG10029 | WormBase gene model | Yes | 69.2 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved
C. elegans | F58A4.14a | - | - | - |
C. japonica | CJA19013 | WormBase gene model | Yes | 68.6 |
C. angaria | Cang_2012_03_13_00614.g12780.t1 | WormBase gene model | Yes | 48.1 |
H. bacteriophora | F58A4.14a_ortholog | genBlastG gene model | Yes | 28.7 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI00399400.t1 | Manual gene model | No | 30.8 | End of contig
H. contortus | HCOI01431900.t1 | WormBase gene model | Yes | 30.8 |
A. ceylanicum | Acey_s0288.g1477.t3 | WormBase gene model | Yes | 27.1 | RNA-seq suggests upstream exon but low coverage
N. americanus | NECAME_05770 | Manual gene model | No | 30.6 | RNA-seq suggests different first exon
P. pacificus | PPA17341 | Manual gene model | Yes | 21.4 |
P. exspectatus | F58A4.14a_ortholog | GeMoMa gene model | Yes | 18.3 | No RNA-seq data, but first 100a.a. are conserved
S. ratti | SRAE_X000167600 | WormBase gene model | No | 18.7 | First 100a.a. not conserved
P. redivivus | g20191.t1 | - | No | 15.4 | Sparse RNA-seq data for this gene, and first 100a.a. not supported
B. xylophilus | BUX.s00351.430 | WormBase gene model | Yes | 20.2 |
M. incognita | Minc13696+Minc13695 | WormBase gene model | Yes | 22.6 | No RNA-seq data for first intron, but first 100a.a. are conserved
M. hapla | MhA1_Contig262.frz3.gene14/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 20.2, GeMoMa PID: 27, genBlastG PID: 19.3)
A. suum | GS_14882 | Manual gene model | Yes | 17.8 | RNA-seq suggests upstream exon but low coverage
D. immitis | nDi.2.2.2.t04966 | WormBase gene model | No | - | WormBase annotation contains stop codons
O. volvulus | OVOC4269 | WormBase gene model | Yes | 16.9 | Upstream exons suggested by RNA-seq, but may be UTR (first 100a.a. are conserved)
B. malayi | Bm4542 | WormBase gene model | Yes | 17.8 |
L. loa | EFO18218.1 | WormBase gene model | Yes | 21.2 | No RNA-seq data for first intron, but first 100a.a. are conserved
T. spiralis | EFV51566/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 9, GeMoMa PID: 20, genBlastG PID: 15.6)
T. suis | M514_04135 | - | No | 10.1 | RNA-seq suggests different first exon

p_redivivus_wormbase_g20191.t1 1 MTSTDDSVPVDEATEPLPEAP------AEET-PATAAPASPPAEE--- --TPVKDANDE--- 47
s_ratti_wormbase_SRAE_X000167600 1 ---MSESNEIEKNLQNYDDEDIKSIDEEENEEDS-EPMISEDELELSP--- --RRRISSSKKNIN 56
t_suis_wormbase_M514_04135 1 ------MLGPLA-PVKVTINQAPSIN--- --AFVHGTKSQ--- 28
b_xylophilus_wormbase_BUX.s00351.430 1 MFENLDVNDEDVGNQSTSGEM------VQKRYSSQEAFV--- --KT------ 35
m_incognita_wormbase_Minc13696+Minc13695 1 MIKEFIQVSNLFNAQKMSTTD------LTIHPAEDESET--- --RRLFEIKQK--- 42
p_exspectatus_gemoma_F58A4.14_ortholog 1 -MDDEDNSKE------PKKGNVSIPEP--- --KH------ 22
p_pacificus_manual_PPA17341 1 -MEDEDNSSLLNNVPISSREE------LKKGNISIPEP--- --KH------ 33
a_suum_manual_GS_14882 1 -MAANPKENMENADGAVSVGDMYAGTTSEPLHKE-SDLESTRATTPDD--- --EFTEKKPKK--- 55
o_volvulus_wormbase_OVOC4269 1 MSDDEEENGNNNRSDSTNSMNIGVKNKFITLENDLGQDNVERNGDTVPKVA SGKIVAKTKKR--- 62
l_loa_wormbase_EFO18218.1 1 MSDDGEKKTDSTELDS------DMAVKVARE--- --KATAKTRER--- 34
b_malayi_wormbase_Bm4542 1 MSDDGDENTTNDEDRTVSFLNIGVKNKPTSPRSD-LDALQGRTEFNGDMLL KGKVIAKSKKR--- 61
h_bacteriophora_genblastg_F58A4.14_ortholog 1 -MEISSQNKRSGQARTL------ILVIIEDAPPSG--- - -RKQPKWNSK- - - 37
h_contortus_wormbase_HCOI01431900.t1 1 -MADVEELSKEEELDDSNQNE------EIVSDRAPEP--- - - IPPSKWKAN- - - 39
h_contortus_manual_HCOI00399400.t1 1 -MADVEELSKEEELDDSNQNE------EIVSDRAPEP--- - -TPALKWKAN- - - 39
a_ceylanicum_wormbase_Acey_s0288.g1477.t3 1 -MAEPEQLTKEPEMDECNGEE------SNEAPEASEQSE--- - -QSAPKWKPN- - - 41
n_americanus_manual_NECAME_05770 1 -MTEVDSLVKEAEMDECNGEQ----SDEVIKSER-SAPYSAPKPESSE--- - -TSAPKWKSN- - - 51
c_angaria_wormbase_Cang_2012_03_13_00614.g12780.t1 1 -MEASNEDEII-GVNSVENEESADINQEQKPEDS-ADNLNSSSASASH--- --DRVSSAPRR--- 54
c_tropicalis_gemoma_Csp11.Scaffold629.g9328.t1 1 -MEESNQDEII-GVDGIPNED-----DPQPEPVQ-PENGEPTGEAAAS--- --ERKS--VKR--- 47
c_remanei_wormbase_CRE25560 1 -MEESNKDEII-GTDVLPNEK-----DDPQPEPL-ELETKEEVSVKPP--- --DRPP--PKI--- 47
c_brenneri_gemoma_CBN05677 1 -MEDSNQDEII-GVDPIPNEE------KAVEQ-PEEKSTGVQDGVV--- --TPVPSFKRV--- 46
c_elegans_F58A4.14a 1 -MEASNQDEII-GTDVIPNEQ------DNPEEVVPEPTSLDVPPPPP--- --ERAPSAPKR--- 48
c_japonica_wormbase_CJA19013 1 -MEESNKDEIIGTDGVLSKED------EAEQQ-ESVENETPEAPVV--- --ERAPSAPKR--- 47
c_sinica_wormbase_Csp5_scaffold_00091.g3989.t1 1 -MEASNQDEVI-GVVTIPNEE------DVEPEAN-GENGNAATADPVP--- --ERTP--LKR--- 46
c_briggsae_wormbase_CBG10029 1 -MEESNQDEIIGVVTVHNGED------GDRH-GVETNGVDVEPVP--- --DRPP--PKR--- 44
c_briggsae_wormbase_CBG09893 1 -MEESNQDEIIGVVTVHNGED------GEQN-GMETNGVEAEPIP--- --DRPP--PKR--- 44

p_redivivus_wormbase_g20191.t1 48 ------PPATPAAAAQSSPPTFGEEAFRSSKLRPSMGTMAITSK NH-TLYNMFVQHKI 98
s_ratti_wormbase_SRAE_X000167600 57 ELVLNNTNIQQVLKLPHIFSNNQEIYFLNCEKNTNKCK--QLIEQS------ 100
t_suis_wormbase_M514_04135 29 --T------KFATFDRCNWLLHIMYVTKDWTGCL--KLADNVLSTTNDY-CEYGHYVKGMV 78
b_xylophilus_wormbase_BUX.s00351.430 36 ------EL--PDKMNQQIHKAYINGQIEECK--MLIMELLMKSNPAVCEYALLIRALI 83
m_incognita_wormbase_Minc13696+Minc13695 43 --FT------RYKPPDRLNPTIYRLFIQQKFSQCK--QKIKEILDDTP EMLCEYPLLLRGQI 94
p_exspectatus_gemoma_F58A4.14_ortholog 23 ------LNYYNGQLHLALISGDYDDCK--AKASEMEAETGGR-NVYAKIVRGTI 67
p_pacificus_manual_PPA17341 34 ------LNYYNGQLHLALISGDYDDCK--AKASEMETETGGR-NVYAKIVRGTI 78
a_suum_manual_GS_14882 56 --SVIVDPKKKIPELTTFDRRNFLLHQYYIQRDFSSCK--ALIKEMLDE------ 100
o_volvulus_wormbase_OVOC4269 63 ------ELHGFDRRNFILHHHFIRRDFNACK--ALIKEMADECGEL------ 100
l_loa_wormbase_EFO18218.1 35 ------ELQGFDRENFILHQRFIRRDFIACK--ALIKEMADKYS EM-CEYPFCIRGKI 83
b_malayi_wormbase_Bm4542 62 ------ELQGLDRRNFILHQRFVRRDFNACK--ALVKEMADEYS EM-C------ 100
h_bacteriophora_genblastg_F58A4.14_ortholog 38 --L------ELSNFEASNSLLHNLFIQQDYIGCK--SLIG-MLEQYS NH-CEFAFFMRGLI 86
h_contortus_wormbase_HCOI01431900.t1 40 --F------ELLNFETSNALLHKLFVMGDYVGCK--SLIGEMLEQYS NQ-CEYAFYMRGLI 89
h_contortus_manual_HCOI00399400.t1 40 --F------ELLNFEASNALLHKLFVMGDYVGCK--SLIGEMLEQYS NQ-CEYAFYMRGLI 89
a_ceylanicum_wormbase_Acey_s0288.g1477.t3 42 --F------ELINFEASNALLHRLYVQGDYIGCK--SLIGEMLEQCS NQ-SEYAFYMRGVI 91
n_americanus_manual_NECAME_05770 52 --F------ELINFESSNALLYRLYVQGDYIGCK--SLIGEMLEQSS NQ-SEYAFYMRGV- 100
c_angaria_wormbase_Cang_2012_03_13_00614.g12780.t1 55 --I------ELLDCNSLNALMYHYFSQEEYAECK--SLIGEVLSKYAGR-NEFALNL---- 100
c_tropicalis_gemoma_Csp11.Scaffold629.g9328.t1 48 --V------ELIDCNAQNTLIFHYFSKGDYIECK--SLIREVQEKYK EK-NEAAHHVRGLI 97
c_remanei_wormbase_CRE25560 48 --V------EVYDCNASNALMYQYFGQRDYVECK--SLIGEIQSKYNEK-NEAALHVRGLI 97
c_brenneri_gemoma_CBN05677 47 ------ELFDCNSMNGLMFHYFAQGDYIECK--SLIGEIQSKYL DK-NETAHHVRGLI 95
c_elegans_F58A4.14a 49 --V------EILDCNSLNGLMYHYFAQGDYIECK--SIIGEIQSKYL ER-NEAAFHVRGLI 98
c_japonica_wormbase_CJA19013 48 --V------EIVDCNSLNALMFHYFAQGDYIECK--SLIGEIQSKYP EK-NETAFHVRGLI 97
c_sinica_wormbase_Csp5_scaffold_00091.g3989.t1 47 --V------EVIDCNSLNGLMYHYFAQGDYIECK--SLIGEIQGKYMER-NEAAFHVRGLI 96
c_briggsae_wormbase_CBG10029 45 --V------EILDCNSLNGLMYHYFAQGDYIECK--SLIGEIQSKYK EK-NEAAFHVRGLI 94
c_briggsae_wormbase_CBG09893 45 --V------EILDCNSLNGLMYHYFAQGDYIECK--SLIGEIQSKYK EK-NEAAFHVRGLI 94

p_redivivus_wormbase_g20191.t1 99 DE------ 100
s_ratti_wormbase_SRAE_X000167600 ------
t_suis_wormbase_M514_04135 79 NLQQGKADESIKMLEKCLQFDR------ 100
b_xylophilus_wormbase_BUX.s00351.430 84 AREEGEIKESLEWLQKV------ 100
m_incognita_wormbase_Minc13696+Minc13695 95 AREEGE------ 100
p_exspectatus_gemoma_F58A4.14_ortholog 68 AREEGNLDEALRWFTQAMQLCPSSLEFAVEVGR 100
p_pacificus_manual_PPA17341 79 AREEGNLDEAFRWFTQAMQLCP------ 100
a_suum_manual_GS_14882 ------
o_volvulus_wormbase_OVOC4269 ------
l_loa_wormbase_EFO18218.1 84 ARMEGNFREALIWFEKA------ 100
b_malayi_wormbase_Bm4542 ------
h_bacteriophora_genblastg_F58A4.14_ortholog 87 ARVEGELEEALEWF------ 100
h_contortus_wormbase_HCOI01431900.t1 90 ARIEGELEEAL------ 100
h_contortus_manual_HCOI00399400.t1 90 ARIEGELEEAL------ 100
a_ceylanicum_wormbase_Acey_s0288.g1477.t3 92 ARAEGELED------ 100
n_americanus_manual_NECAME_05770 ------
c_angaria_wormbase_Cang_2012_03_13_00614.g12780.t1 ------
c_tropicalis_gemoma_Csp11.Scaffold629.g9328.t1 98 ARN------ 100
c_remanei_wormbase_CRE25560 98 ARN------ 100
c_brenneri_gemoma_CBN05677 96 ARNEG------ 100
c_elegans_F58A4.14a 99 AR------ 100
c_japonica_wormbase_CJA19013 98 ARN------ 100
c_sinica_wormbase_Csp5_scaffold_00091.g3989.t1 97 ARNE------ 100
c_briggsae_wormbase_CBG10029 95 ARNEGE------ 100
c_briggsae_wormbase_CBG09893 95 ARNEGE------ 100

Figure 3.8: Multiple sequence alignment of first 100 a.a. of bbs-4 orthologs. Among 25 nematode genomes, 25 bbs-4 orthologs are found in 23 species, and no ortholog is found in 2 species (M. hapla and T. spiralis). Note: C. briggsae and H. contortus contain two bbs-4 genes. Both of the C. briggsae bbs-4 genes have high confidence 5’ start sites, but only one of the H. contortus genes has a high confidence 5’ start site.
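The counts quoted in the caption can be re-derived from the rows of Table 3.5. The following is a minimal sketch, not the procedure used in this work: the tuple layout, the `BBS4_ROWS` name, and the `summarize` helper are illustrative assumptions, and the rows are transcribed by hand from the table (a gene name of `None` marks "No ortholog found").

```python
from collections import Counter

# (species, gene name or None, high-confidence 5' start site), transcribed from Table 3.5.
# The layout and names are illustrative assumptions, not part of the original analysis.
BBS4_ROWS = [
    ("C. remanei", "CRE25560", True),
    ("C. tropicalis", "Csp11.Scaffold629.g9328.t1", True),
    ("C. brenneri", "CBN05677", True),
    ("C. sinica", "Csp5_scaffold_00091.g3989.t1", True),
    ("C. briggsae", "CBG09893", True),
    ("C. briggsae", "CBG10029", True),
    ("C. japonica", "CJA19013", True),
    ("C. angaria", "Cang_2012_03_13_00614.g12780.t1", True),
    ("H. bacteriophora", "F58A4.14a_ortholog", True),
    ("H. contortus", "HCOI00399400.t1", False),
    ("H. contortus", "HCOI01431900.t1", True),
    ("A. ceylanicum", "Acey_s0288.g1477.t3", True),
    ("N. americanus", "NECAME_05770", False),
    ("P. pacificus", "PPA17341", True),
    ("P. exspectatus", "F58A4.14a_ortholog", True),
    ("S. ratti", "SRAE_X000167600", False),
    ("P. redivivus", "g20191.t1", False),
    ("B. xylophilus", "BUX.s00351.430", True),
    ("M. incognita", "Minc13696+Minc13695", True),
    ("M. hapla", None, False),
    ("A. suum", "GS_14882", True),
    ("D. immitis", "nDi.2.2.2.t04966", False),
    ("O. volvulus", "OVOC4269", True),
    ("B. malayi", "Bm4542", True),
    ("L. loa", "EFO18218.1", True),
    ("T. spiralis", None, False),
    ("T. suis", "M514_04135", False),
]


def summarize(rows):
    """Tally orthologs found, species covered, and high-confidence start sites."""
    found = [row for row in rows if row[1] is not None]
    genes_per_species = Counter(row[0] for row in found)
    return {
        "orthologs_found": len(found),
        "species_with_ortholog": len(genes_per_species),
        "species_without_ortholog": sorted({row[0] for row in rows if row[1] is None}),
        "species_with_two_genes": sorted(s for s, n in genes_per_species.items() if n > 1),
        "high_confidence_starts": sum(1 for row in found if row[2]),
    }


summary = summarize(BBS4_ROWS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Keeping one record per table row, rather than one per species, is what lets the paralogs in C. briggsae and H. contortus be counted as separate orthologs while still reporting the number of species covered.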

3.3.5 Curation of bbs-5 orthologs in nematodes

bbs-5 is a gene implicated in Bardet-Biedl Syndrome type 5. It is expressed in ciliated neurons and is a component of the BBSome (Wei et al., 2012). We identified bbs-5 orthologs in 25 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figure 3.9).

Table 3.6: Curation of bbs-5 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE04780 | WormBase gene model | Yes | 73.8 |
C. tropicalis | Csp11.Scaffold628.g7413.t1 | GeMoMa gene model | Yes | 90.0 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN05214 | WormBase gene model | Yes | 87.0 |
C. sinica | Csp5_scaffold_04172.g32262.t2 | WormBase gene model | Yes | 86.0 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG23799 | WormBase gene model | Yes | 84.0 |
C. elegans | R01H10.6 | - | - | |
C. japonica | CJA14195 | genBlastG gene model | Yes | 90.0 |
C. angaria | Cang_2012_03_13_00045.g2428.t2 | WormBase gene model | No | 36.0 | Gap in 5’ end of alignment
H. bacteriophora | Hba_14597 | GeMoMa gene model | Yes | 61.2 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI01842300.t1 | WormBase gene model | Yes | 60.2 |
A. ceylanicum | Acey_s0682.g1491.t1 | Manual gene model | Yes | 60.2 |
N. americanus | NECAME_02517 | WormBase gene model | Yes | 59.2 |
P. pacificus | R01H10.6_ortholog | GeMoMa gene model | Yes | 51.9 |
P. exspectatus | scaffold178-EXSNAP2012.27 | WormBase gene model | Yes | 51.9 | No RNA-seq data, but first 100a.a. are conserved
S. ratti | SRAE_1000349000 | WormBase gene model | Yes | 46.6 |
P. redivivus | g7489.t1 | WormBase gene model | Yes | 47.3 |
B. xylophilus | BUX.s00192.2 | WormBase gene model | Yes | 51.9 | 3’ end of gene truncated due to end of contig
M. incognita | Minc05234 | WormBase gene model | Yes | 46.6 | Gap 100bp upstream
M. incognita | Minc01275 | WormBase gene model | Yes | 44.9 |
M. hapla | MhA1_Contig953.frz3.gene14 | WormBase gene model | Yes | 46.6 | No RNA-seq data, but first 100a.a. are conserved
A. suum | GS_21627 | WormBase gene model | Yes | 54.7 | RNA-seq suggests upstream exon but low coverage
D. immitis | nDi.2.2.2.t03335 | WormBase gene model | Yes | 45.3 | RNA-seq suggests upstream exon but low coverage
O. volvulus | OVOC5942 | WormBase gene model | Yes | 48.1 | RNA-seq suggests upstream exon but low coverage
B. malayi | Bm5173 | WormBase gene model | Yes | 48.1 |
L. loa | EFO18365.1 | WormBase gene model | Yes | 50.0 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved
T. spiralis | EFV62386 | Manual gene model | Yes | 45.0 | WormBase gene model has upstream exons but may be different gene
T. suis | M514_03147 | WormBase gene model | Yes | 43.3 |

t_spiralis_manual_EFV62386 1 MFKSVEKNLKHYDVIWEDRDVRFDVDSKQLKLRNGEFIVDKSHGIEDTKGNRGQKGTLIVTNLRI 65
t_suis_wormbase_M514_03147 1 ---MSRRDSPKYDVIWEDRDIQFDLDSKLLKLRNGEFIVEKSHGVEDTKGSCGQKGTLVITNLRI 62
h_bacteriophora_gemoma_Hba_14597 1 ----MAGDKVNGDDLWQDKDIRFDVDHRMLRLISGEVQIDRMDMVEDTKGNNGDRGVMRVTNLRV 61
h_contortus_wormbase_HCOI01842300.t1 1 ----MSGEKVNGDELWQDKEIRFDVDHRMLRLVPGEIQIDRVDKVEDTKGNNGDRGVMRITNLRI 61
a_ceylanicum_manual_Acey_s0682.g1491.t1 1 ----MSGDKVNGDELWQDKDIRFDVDHRMLRLVPGEIQIDRMDMVEDTKGNNGDRGVMRVTNLRL 61
n_americanus_wormbase_NECAME_02517 1 ----MAGDKVNGDELWQDKDIRFDVDHRMLRLVPGEIQIDRIDMVEDTKGNNGDRGVMRVTNLRL 61
c_angaria_wormbase_Cang_2012_03_13_00045.g2428.t2 1 ------MDHVEDTKGNNGDRGTMRVTNLRV 24
c_remanei_wormbase_CRE04780 1 ---MDRMERVNGEDIWQDREIRFDVDHKLLRMAKGEFAVAKVEHVEDTKGNNGDKGIMKVTNLRL 62
c_brenneri_wormbase_CBN05214 1 ------MERVNGEDIWQDREIRFDVDHKLLRLINGEIQIAKVEHVEDTKGNNGDKGTMRVTNLRL 59
c_elegans_R01H10.6 1 ------MERVNGEDIWQDREIRFDVDHKLLRLINGEIQVAKIEHVEDTKGNNGDRGTMRVTNLRL 59
c_briggsae_wormbase_CBG23799 1 ------MERVNGEDIWQDREIRFDVDHKLLRMINGEVQIAKVENVEDTKGNNGDKGVIRVTNLRL 59
c_sinica_wormbase_Csp5_scaffold_04172.g32262.t2 1 ------MERVNGEDIWQDREIRFDVDHKLLRMINGEVQIAKVENVEDTKGNNGDKGIMRVTNLRL 59
c_japonica_genblastg_CJA14195 1 ------MERVNGEDIWQDREIRFDVDHKLLRLINGEIQVAKVENVEDTKGNNGDRGTMRVTNLRL 59
c_tropicalis_gemoma_Csp11.Scaffold628.g7413.t1 1 ------MERVNGEDIWQDREIRFDVDHKLLRLINGEVQVAKVENVEDTKGNNGDRGTMRVTNLRL 59
p_redivivus_wormbase_g7489.t1 1 - -MGRKDRSALSD I TWHDRE IFFDMDQKAMKL IPGEVLVEK IDGVEDTKGNNGDNGTLRITNLRL 63
p_exspectatus_wormbase_scaffold178-EXSNAP2012.27 1 ------MWQDREIRFDVDTRMLRLIPGEFTLDRIENVEDTKGNNGDRGIFRITNLRL 51
p_pacificus_gemoma_R01H10.6_ortholog 1 ------MWQDREIRFDVDTRMLRLIPGEFTLDRIENVEDTKGNNGDRGIFRITNLRL 51
b_xylophilus_wormbase_BUX.s00192.2 1 MLGRKKDEMALSDGIWQDREIQFDVDQRQLKLIPGEYLVERIENVEDTKGNNGDRGILRITNIRL 65
m_hapla_wormbase_MhA1_Contig953.frz3.gene14 1 ---MPKKDMALSDTLWQYRDVLFDLDTRMLRLTAGETIVERIDNVEDTKGNNGDRGLLRITNLRL 62
m_incognita_wormbase_Minc05234 1 ---MPKKDMALSDTMWQYRDVLFDLDTRMLRLAAGESIVERIDNVEDTKGNNGDRGLLRITNLRL 62
m_incognita_wormbase_Minc01275 1 ---MPKKDMALSDTMWQYRDVLFDLDTRMLRLAAGESIVERIDNVEDTKGNNGDRGLLRITNLRL 62
o_volvulus_wormbase_OVOC5942 1 ---MSSKIKAPLLTIWQDRDIRFDMKPRLLRLIPGEYLVDCIDGVEDTKGNCGDKGVLRITNLRL 62
d_immitis_wormbase_nDi.2.2.2.t03335 1 ------MPSLVMWQDREIRFDINPRLLRLIPGEHLVDRIDGVEDTKGNCGDNGVIRITNLRL 56
l_loa_wormbase_EFO18365.1 1 ---MNSKLQIPSIAIWQDRDIRFDVNPRLLHLIAGENLVDRIDDVEDTKGNCGDKGVLRITNLRL 62
b_malayi_wormbase_Bm5173 1 ------MPLISIWQDRDIRFDINPRLLQLISGEHLVERIDGVEDTKGNCGDKGILRITNLRL 56
a_suum_wormbase_GS_21627 1 ---MSAKKEAASDMIWQDRDIRFDLDSRLLRLIAGEHLVERIDGVEDTKGNNGDKGVLRVTNIRL 62
s_ratti_wormbase_SRAE_1000349000 1 -----MKSNFVYDGIWQDKDICFDLESRLLRLIPGEVLIEKIDNVEDTKGNNGDKGVLKITNIRL 60

t_spiralis_manual_EFV62386 66 IWLSHASSKINLTIGFNSITAIRTRSTLS---KLVGKT------ 100
t_suis_wormbase_M514_03147 63 IWFAHATCKINLSIGFNAVTGIRTRSNMS---KLLGKTESL------ 100
h_bacteriophora_gemoma_Hba_14597 62 IWYASSMPRINLSIGYGNITGLQSKEVAS---KVRGSECEAL------ 100
h_contortus_wormbase_HCOI01842300.t1 62 IWYASSMPRINLSIGYSNITGLQSREVAS---KVRGTEVEAL------ 100
a_ceylanicum_manual_Acey_s0682.g1491.t1 62 IWYASSMPRINLSIGYSNITGLQSREVVS---KVRGTEVEAL------ 100
n_americanus_wormbase_NECAME_02517 62 IWYASSMPRINLSIGYSSVTALQSREVVS---KVRGTEVEAL------ 100
c_angaria_wormbase_Cang_2012_03_13_00045.g2428.t2 25 IWHASSMPR IN I T IGWNC I TGVQTRQATS -SSQRRGGPCEA I YVLAKVSTA ATKFEFIFTTTSPA 88
c_remanei_wormbase_CRE04780 63 IWYAMNMPRINISIGWNTITGTQSKTSTSLAARNRGGT------ 100
c_brenneri_wormbase_CBN05214 60 IWHAASMPRINIT IGWNT ITGTQSKSSANVATRNRGVANEA------ 100
c_elegans_R01H10.6 60 IWHAASMPRINIT IGWNA ITGVQSKQTTSLVTRNRGISNEA------ 100
c_briggsae_wormbase_CBG23799 60 IWYAMSMPRINITIGWNTITGTQSKTSTSLATRNRGGSNEA------ 100
c_sinica_wormbase_Csp5_scaffold_04172.g32262.t2 60 IWYAASMPRINITIGWNTITGTQSKTATSLVSRNRGGSNEA------ 100
c_japonica_genblastg_CJA14195 60 IWHAASMPRINIT IGWNC ITGVQTKTSTS IAQRNRGGSNEA------ 100
c_tropicalis_gemoma_Csp11.Scaffold628.g7413.t1 60 IWHAASMPRINIT IGWNT ITATQSKTSTSLATRNRGTSNEA------ 100
p_redivivus_wormbase_g7489.t1 64 LWNANQMPRINLTIGWNCVTGATTRFAKS- - -KLKGRTES------ 100
p_exspectatus_wormbase_scaffold178-EXSNAP2012.27 52 IWHAQAMPRINLSIGLNTVAGIQSKKASS---KIKGETESMYVTAKAPQTRF------ 100
p_pacificus_gemoma_R01H10.6_ortholog 52 IWHAQAMPRINLSIGLNTVAGIQSKKASS---KIKGETESMYVTAKAPQTRF------ 100
b_xylophilus_wormbase_BUX.s00192.2 66 IWHAVNSPRINLSIGYNNIHGVTTRILKS---KVRGHA------ 100
m_hapla_wormbase_MhA1_Contig953.frz3.gene14 63 IWHAVAVPRINLSIGHNTITGITTRVAKS---KIRGQAESL------ 100
m_incognita_wormbase_Minc05234 63 IWHAVAVPRINLSIGHNTITGITTRVAKS---KIRGQAESL------ 100
m_incognita_wormbase_Minc01275 63 IWHAVAVPRINLSIGHNTITGITTR------KIRGQAESLYIMS------ 100
o_volvulus_wormbase_OVOC5942 63 TWHASAIPRINLCVGYNTVDGVTTRIAKT---KLRGQAESL------ 100
d_immitis_wormbase_nDi.2.2.2.t03335 57 TWHATAISRINLSVGYNTLGGVTIRTAKS---RLRGQAESLYLLARH------ 100
l_loa_wormbase_EFO18365.1 63 TWHAIAIPRINLSLGYNTISGVTTKMTKS---RLRGQAESL------ 100
b_malayi_wormbase_Bm5173 57 TWHATAIPRINLSLGYNTINGVTTRFTKS---RLRGQTESLYILAHH------ 100
a_suum_wormbase_GS_21627 63 IWHATSMPRINLSIGYNSINGVTTRLANS---KIRGQAESL------ 100
s_ratti_wormbase_SRAE_1000349000 61 IWNASSMTRINLSIGFNSVNGVTTRIANS---RVRGQAESLYL------ 100

t_spiralis_manual_EFV62386 ------
t_suis_wormbase_M514_03147 ------
h_bacteriophora_gemoma_Hba_14597 ------
h_contortus_wormbase_HCOI01842300.t1 ------
a_ceylanicum_manual_Acey_s0682.g1491.t1 ------
n_americanus_wormbase_NECAME_02517 ------
c_angaria_wormbase_Cang_2012_03_13_00045.g2428.t2 89 AHSKLFTTILSL 100
c_remanei_wormbase_CRE04780 ------
c_brenneri_wormbase_CBN05214 ------
c_elegans_R01H10.6 ------
c_briggsae_wormbase_CBG23799 ------
c_sinica_wormbase_Csp5_scaffold_04172.g32262.t2 ------
c_japonica_genblastg_CJA14195 ------
c_tropicalis_gemoma_Csp11.Scaffold628.g7413.t1 ------
p_redivivus_wormbase_g7489.t1 ------
p_exspectatus_wormbase_scaffold178-EXSNAP2012.27 ------
p_pacificus_gemoma_R01H10.6_ortholog ------
b_xylophilus_wormbase_BUX.s00192.2 ------
m_hapla_wormbase_MhA1_Contig953.frz3.gene14 ------
m_incognita_wormbase_Minc05234 ------
m_incognita_wormbase_Minc01275 ------
o_volvulus_wormbase_OVOC5942 ------
d_immitis_wormbase_nDi.2.2.2.t03335 ------
l_loa_wormbase_EFO18365.1 ------
b_malayi_wormbase_Bm5173 ------
a_suum_wormbase_GS_21627 ------
s_ratti_wormbase_SRAE_1000349000 ------

Figure 3.9: Multiple sequence alignment of the first 100 a.a. of bbs-5 orthologs. bbs-5 orthologs are found in all 25 nematode genomes, for a total of 26 orthologs. Note: M. incognita contains two bbs-5 genes, and both genes have high confidence 5’ start sites.

3.3.6 Curation of bbs-8 orthologs in nematodes

bbs-8 is a gene implicated in Bardet-Biedl Syndrome type 8. It is expressed in ciliated neurons and is a component of the BBSome (Wei et al., 2012). bbs-8 is required along with bbs-7/osm-12 to stabilize IFT particles; in bbs-7/osm-12 and bbs-8 mutants, IFT-A and IFT-B move along the axoneme separately (Ou et al., 2005a). We identified bbs-8 orthologs in 25 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figure 3.10).
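The "PID (first 100 a.a.)" column in the curation tables reports percent identity to the C. elegans protein. As a rough illustration only, the sketch below computes such a value from a pairwise alignment. The function name `percent_identity`, the choice of denominator (reference residues inside the window), and the toy sequences are assumptions made for this example; the exact formula used to produce the values in the tables is not stated in this section.

```python
def percent_identity(ref_aln: str, query_aln: str, window: int = 100) -> float:
    """Percent identity over the first `window` residues of the reference.

    Assumption (hypothetical definition): identical columns divided by the
    number of reference residues examined. Columns where the reference has
    a gap are skipped; a gap in the query counts as a mismatch.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    ref_seen = 0
    for ref_res, query_res in zip(ref_aln, query_aln):
        if ref_res == "-":
            continue  # insertion relative to the reference: not counted
        ref_seen += 1
        if ref_res == query_res:
            matches += 1
        if ref_seen == window:
            break  # stop once the window of reference residues is covered

    if ref_seen == 0:
        raise ValueError("reference contains no residues")
    return 100.0 * matches / ref_seen


# Toy aligned sequences (not taken from the thesis data).
print(round(percent_identity("MKLK-LS", "MRLKVLS"), 1))  # 83.3
```

Under this definition, an ortholog whose 5’ end is missing or mispredicted aligns mostly against gaps and mismatches, which is consistent with the very low PID values that accompany "Gap in 5’ end of alignment" comments in the tables.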

Table 3.7: Curation of bbs-8 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100 a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE31167 | WormBase gene model | Yes | 90.0 | |
| C. tropicalis | Csp11.Scaffold601.g5450.t1 | WormBase gene model | Yes | 93.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN05971 | WormBase gene model | Yes | 88.0 | |
| C. brenneri | CBN26405 | WormBase gene model | Yes | 88.0 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| C. sinica | Csp5_scaffold_00603.g13428.t1 | WormBase gene model | Yes | 89.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG19013 | WormBase gene model | Yes | 86.0 | |
| C. elegans | T25F10.5 | - | - | - | |
| C. japonica | CJA34591 | WormBase gene model | Yes | 84.0 | No RNA-seq data for this gene, but first 100a.a. are conserved |
| C. angaria | Cang_2012_03_13_00604.g12695.t1 | WormBase gene model | Yes | 70.3 | Gap 1.7kb upstream |
| H. bacteriophora | Hba_12653 | GeMoMa gene model | Yes | 61.4 | No RNA-seq data, but first 100a.a. are conserved |
| H. contortus | HCOI00222100.t1 | WormBase gene model | Yes | 54.1 | RNA-seq suggests upstream exon but low coverage |
| A. ceylanicum | Acey_s0240.g3360.t3 | WormBase gene model | Yes | 57.1 | |
| N. americanus | NECAME_08343 | GeMoMa gene model | Yes | 58.1 | Upstream exons suggested by RNA-seq, but may be UTR (first 100a.a. are conserved) |
| P. pacificus | PPA28797+PPA28796 | GeMoMa gene model | Yes | 46.3 | |
| P. exspectatus | scaffold658-EXSNAP2012.18 | GeMoMa gene model | Yes | 56.0 | No RNA-seq data, but first 100a.a. are conserved |
| S. ratti | SRAE_2000516900 | WormBase gene model | Yes | 49.0 | Gene only contains 1 exon |
| P. redivivus | g8508.t1 | WormBase gene model | Yes | 47.5 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| B. xylophilus | BUX.s01518.102+BUX.s01518.101 | GeMoMa gene model | Yes | 38.1 | RNA-seq suggests upstream exon but low coverage |
| M. incognita | Minc03694 | WormBase gene model | Yes | 45.7 | |
| M. incognita | Minc08215 | WormBase gene model | No | 12.5 | Gap in 5’ end of alignment; gap in assembly <100bp upstream |
| M. hapla | MhA1_Contig417.frz3.fgene1 | WormBase gene model | Yes | 43.4 | No RNA-seq data, but first 100a.a. are conserved |
| A. suum | GS_14596 | GeMoMa gene model | Yes | 51.0 | |
| D. immitis | nDi.2.2.2.t08081 | WormBase gene model | Yes | 37.4 | |
| O. volvulus | OVOC6415 | GeMoMa gene model | Yes | 42.3 | |
| B. malayi | Bm6009 | WormBase gene model | Yes | 43.4 | 1.7kb from end of contig |
| L. loa | EJD75523.1 | WormBase gene model | Yes | 40.6 | |
| T. spiralis | EFV50304 | GeMoMa gene model | Yes | 29.6 | |
| T. suis | M514_11552 | Manual gene model | No | 11.9 | Gap in 5’ end of alignment |

[Figure 3.10: multiple sequence alignment]

Figure 3.10: Multiple sequence alignment of the first 100 a.a. of bbs-8 orthologs. bbs-8 orthologs are found in all 25 nematode genomes, for a total of 27 orthologs. Note: C. brenneri and M. incognita each contain two bbs-8 genes. Both C. brenneri bbs-8 genes have high confidence 5’ start sites, but only one of the two M. incognita genes does.
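The caption counts orthologs rather than genomes, so species with duplicated genes contribute more than one entry. A minimal sketch of that bookkeeping follows, using a handful of rows copied from Table 3.7. The tuple layout and the function name `summarize` are hypothetical and chosen only for this illustration; they are not part of the curation pipeline described in the thesis.

```python
from collections import Counter

# Subset of Table 3.7: (species, gene name, high-confidence 5' start site?)
rows = [
    ("C. remanei", "CRE31167", True),
    ("C. brenneri", "CBN05971", True),
    ("C. brenneri", "CBN26405", True),
    ("M. incognita", "Minc03694", True),
    ("M. incognita", "Minc08215", False),
    ("T. suis", "M514_11552", False),
]


def summarize(rows):
    """Tally orthologs per species and list low-confidence gene models."""
    per_species = Counter(species for species, _, _ in rows)
    return {
        "genomes_with_ortholog": len(per_species),
        "orthologs": len(rows),
        # Species contributing more than one ortholog (duplicated genes).
        "duplicated": sorted(s for s, n in per_species.items() if n > 1),
        # Genes whose 5' start site was not considered high confidence.
        "low_confidence": [gene for _, gene, ok in rows if not ok],
    }


summary = summarize(rows)
print(summary["genomes_with_ortholog"], summary["orthologs"])  # 4 6
print(summary["duplicated"])  # ['C. brenneri', 'M. incognita']
```

Applied to the full table, the same tally would separate the number of genomes with at least one ortholog from the total number of orthologs, which is why the two figures in the caption differ.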

3.3.7 Curation of bbs-9 orthologs in nematodes

bbs-9 is a gene implicated in Bardet-Biedl Syndrome type 9. It is expressed in ciliated neurons and is a component of the BBSome (Wei et al., 2012). We identified bbs-9 orthologs in 19 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figure 3.11).

Table 3.8: Curation of bbs-9 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100 a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE28509 | WormBase gene model | Yes | 67.0 | |
| C. tropicalis | Csp11.Scaffold555.g3737.t1 | WormBase gene model | Yes | 75.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN28750 | WormBase gene model | Yes | 75.0 | RNA-seq suggests upstream exon but low coverage |
| C. sinica | Csp5_scaffold_00667.g14214.t2 | WormBase gene model | Yes | 66.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG12732 | genBlastG gene model | Yes | 78.0 | |
| C. elegans | C48B6.8 | - | - | - | |
| C. japonica | CJA34314 | genBlastG gene model | Yes | 66.0 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| C. angaria | Cang_2012_03_13_00009.g723.t1 | GeMoMa gene model | Yes | 46.5 | |
| H. bacteriophora | Hba_10665+Hba_10664 | genBlastG gene model | Yes | 35.1 | No RNA-seq data, but first 100a.a. are conserved |
| H. contortus | HCOI01933200.t1 | genBlastG gene model | Yes | 36.0 | RNA-seq suggests upstream exon but low coverage |
| A. ceylanicum | Acey_s0012.g1826.t2 | WormBase gene model | Yes | 35.6 | |
| N. americanus | NECAME_03447 | genBlastG gene model | Yes | 35.2 | No RNA-seq data for first intron, but first 100a.a. are conserved |
| P. pacificus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 24.1, other predictions not found) |
| P. exspectatus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 21.5, genBlastG PID: 7.1) |
| S. ratti | SRAE_1000341900 | WormBase gene model | Yes | 34.3 | |
| P. redivivus | g4752.t1 | WormBase gene model | Yes | 34.3 | |
| B. xylophilus | BUX.s00594.1 | WormBase gene model | No | 5.6 | ~400bp from end of contig |
| M. incognita | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 15.2, other predictions not found) |
| M. hapla | MhA1_Contig2442.frz3.gene6 | genBlastG gene model | Yes | 29.4 | No RNA-seq data; first 100a.a. are partially conserved; 2.6kb from end of contig |
| A. suum | GS_01098+GS_06357+GS_14491/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 22, GeMoMa PID: 25.1, genBlastG PID: 15.2) |
| D. immitis | nDi.2.2.2.t08406 | WormBase gene model | Yes | 31.4 | |
| O. volvulus | OVOC11916 | WormBase gene model | Yes | 33.0 | RNA-seq suggests upstream exon but low coverage |
| B. malayi | Bm2831 | WormBase gene model | Yes | 32.4 | |
| L. loa | EFO19638.2 | WormBase gene model | Yes | 32.4 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| T. spiralis | EFV52649/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 19, GeMoMa PID: 20.7, genBlastG not found) |
| T. suis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 19.8, other predictions not found) |

[Figure 3.11: multiple sequence alignment]

Figure 3.11: Multiple sequence alignment of the first 100 a.a. of bbs-9 orthologs. bbs-9 orthologs are found in 19 of the 25 nematode genomes and are not found in the remaining 6.

3.3.8 Curation of che-2 orthologs in nematodes

che-2 is required for cilium formation; che-2 mutants form short cilia as well as abnormal posterior projections (Fujiwara et al., 1999). che-2 is expressed in ciliated neurons and undergoes IFT (Qin et al., 2001). We identified che-2 orthologs in 25 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figure 3.12).

Table 3.9: Curation of che-2 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100 a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE17245 | WormBase gene model | Yes | 87.0 | |
| C. tropicalis | Csp11.Scaffold630.g18136.t1+Csp11.Scaffold630.g18137.t1 | genBlastG gene model | Yes | 88.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN01357 | WormBase gene model | Yes | 75.5 | |
| C. sinica | Csp5_scaffold_02026.g24449.t2 | - | No | 14.8 | Gap in 5’ end of alignment; WormBase annotation 400bp from end of contig |
| C. briggsae | CBG13647 | WormBase gene model | Yes | 82.2 | RNA-seq suggests upstream exon but low coverage |
| C. elegans | F38G1.1 | - | - | - | |
| C. japonica | CJA14719 | genBlastG gene model | Yes | 70.6 | |
| C. angaria | Cang_2012_03_13_00929.g14977.t1 | WormBase gene model | Yes | 68.0 | |
| H. bacteriophora | Hba_17689 | GeMoMa gene model | Yes | 20.3 | No RNA-seq data, but first 100a.a. are conserved |
| H. contortus | HCOI00912800.t1 | genBlastG gene model | Yes | 41.0 | |
| A. ceylanicum | Acey_s0214.g2338.t1 | WormBase gene model | Yes | 18.6 | |
| N. americanus | NECAME_12179+NECAME_12180+NECAME_12181 | - | No | 8.1 | Gap near 5’ end of gene and in intron |
| P. pacificus | PPA00066 | WormBase gene model | Yes | 19.9 | |
| P. exspectatus | scaffold464-EXSNAP2012.8 | WormBase gene model | Yes | 45.1 | No RNA-seq data, but first 100a.a. are conserved |
| S. ratti | SRAE_2000426700 | WormBase gene model | Yes | 30.1 | RNA-seq suggests upstream exon but low coverage |
| P. redivivus | g21659.t1 | GeMoMa gene model | Yes | 38.1 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| B. xylophilus | BUX.s00609.141 | GeMoMa gene model | Yes | 10.1 | |
| M. incognita | F38G1.1_ortholog | - | No | 7.9 | 100bp from end of contig |
| M. hapla | MhA1_Contig2043.frz3.fgene1 | genBlastG gene model | Yes | 14.8 | No RNA-seq data, but first 100a.a. are partially conserved; 2.2kb from beginning of contig |
| A. suum | GS_11756 | WormBase gene model | Yes | 38.0 | |
| D. immitis | nDi.2.2.2.t03308 | Manual gene model | Yes | 30.0 | Multiple in-frame ATGs, used ATG that was most conserved with other species |
| O. volvulus | OVOC2818 | WormBase gene model | Yes | 27.0 | |
| B. malayi | Bm3921c | WormBase gene model | Yes | 33.0 | Upstream exons suggested by RNA-seq, but may be UTR (first 100a.a. are conserved) |
| L. loa | EFO17455.1 | WormBase gene model | Yes | 34.0 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved; small gap 700bp upstream |
| T. spiralis | EFV54821 | WormBase gene model | Yes | 20.8 | |
| T. suis | M514_05999 | Manual gene model | Yes | 23.6 | |

[Figure 3.12: multiple sequence alignment]

Figure 3.12: Multiple sequence alignment of the first 100 a.a. of che-2 orthologs. che-2 orthologs are found in all 25 nematode genomes, for a total of 25 orthologs.

3.3.9 Curation of che-11 orthologs in nematodes

che-11 is an IFT-A component and is expressed in ciliated neurons (Qin et al., 2001). che-11 mutants have slightly shortened cilia, and IFT proteins accumulate in the axoneme (Qin et al., 2001). We identified che-11 orthologs in 24 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figures 3.13 and 3.14).

Table 3.10: Curation of che-11 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100 a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE27368+CRE27367 | genBlastG gene model | Yes | 73.5 | |
| C. tropicalis | Csp11.Scaffold629.g10374.t1 | genBlastG gene model | No | 6.3 | Gap in 5’ end of alignment |
| C. brenneri | CBN28654 | WormBase gene model | Yes | 71.0 | |
| C. sinica | Csp5_scaffold_00035.g1905.t1 | WormBase gene model | Yes | 81.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG23392 | WormBase gene model | Yes | 82.0 | |
| C. elegans | C27A7.4 | - | - | - | |
| C. japonica | CJA00302 | genBlastG gene model | Yes | 71.6 | |
| C. angaria | Cang_2012_03_13_00362.g9752.t2 | GeMoMa gene model | Yes | 34.5 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| H. bacteriophora | C27A7.4_ortholog | - | No | 6.8 | End of contig |
| H. contortus | HCOI00062600.t1 | WormBase gene model | No | 2.8 | First 100a.a. not conserved |
| A. ceylanicum | Acey_s0029.g1866.t1 | WormBase gene model | Yes | 25.9 | |
| N. americanus | NECAME_12412+NECAME_12411+NECAME_12410 | GeMoMa gene model | Yes | 25.9 | No RNA-seq data for first intron, but first 100a.a. are conserved; Gap 1.2kb upstream |
| P. pacificus | PPA09129+PPA09127 | WormBase gene model | Yes | 13.8 | |
| P. exspectatus | scaffold91-EXSNAP2012.66 | WormBase gene model | Yes | 15.3 | No RNA-seq data, but first 100a.a. are conserved |
| S. ratti | SRAE_2000291300 | GeMoMa gene model | Yes | 23.2 | |
| P. redivivus | g16948.t1+g16946.t1 | WormBase gene model | Yes | 18.9 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| B. xylophilus | BUX.s01109.381 | WormBase gene model | Yes | 17.1 | |
| M. incognita | C27A7.4_ortholog | - | No | 1.1 | End of contig |
| M. hapla | MhA1_Contig96.frz3.gene16+MhA1_Contig96.frz3.gene17+MhA1_Contig96.frz3.gene18 | WormBase gene model | Yes | 18.8 | No RNA-seq data, but first 100a.a. are conserved |
| A. suum | GS_05832+GS_01984+GS_18543 | - | No | 11.0 | Gap in 5’ end of alignment |
| D. immitis | nDi.2.2.2.t10681 | - | No | 4.2 | End of contig |
| O. volvulus | OVOC10126 | WormBase gene model | Yes | 11.3 | |
| B. malayi | Bm2420 | - | No | 3.3 | Gap in 5’ end of alignment; ~100bp from end of contig |
| L. loa | EFO14850.1+EFO15803.2+EFO21495.2 | GeMoMa gene model | No | 20.8 | Sparse RNA-seq data for this gene, and first 100a.a. not conserved |
| T. spiralis | EFV54565 | WormBase gene model | No | 14.2 | First 100a.a. not conserved |
| T. suis | M514_09515/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 18.8, GeMoMa PID: 23.5, genBlastG PID: 15.9) |

[Figure 3.13: multiple sequence alignment]

Figure 3.13: Multiple sequence alignment of first 100a.a. of che-11 orthologs. Among 25 nematode genomes, 24 che-11 orthologs are found, and 1 is not found.

Figure 3.14: Multiple sequence alignment of first 100a.a. of che-11 orthologs, only showing genes with high confidence 5’ start sites. This additional alignment is generated in order to show the conserved regions more clearly and remove noise caused by sequences without high confidence 5’ start sites.

3.3.10 Curation of che-13 orthologs in nematodes

che-13 is an IFT-B component and is expressed in ciliated neurons (Haycraft et al., 2003). We identified che-13 orthologs in 24 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.15).

Table 3.11: Curation of che-13 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE29839+CRE29840 | GeMoMa gene model | Yes | 79.2 | |
| C. tropicalis | Csp11.Scaffold547.g3522.t2 | - | No | 3.0 | No RNA-seq data, and first 100a.a. not conserved |
| C. brenneri | CBN15478+CBN06022 | genBlastG gene model | Yes | 74.8 | |
| C. sinica | Csp5_scaffold_01003.g17477.t1+Csp5_scaffold_01003.g17479.t1 | genBlastG gene model | Yes | 74.8 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG02230+CBG02227 | genBlastG gene model | Yes | 73.8 | |
| C. elegans | F59C6.7 | - | - | - | |
| C. japonica | CJA31840+CJA17731 | genBlastG gene model | Yes | 68.2 | |
| C. angaria | Cang_2012_03_13_00295.g8695.t1 | GeMoMa gene model | Yes | 50.9 | |
| H. bacteriophora | Hba_01063 | GeMoMa gene model | Yes | 51.4 | No RNA-seq data, but first 100a.a. are conserved |
| H. contortus | HCOI01582600.t1 | WormBase gene model | Yes | 54.9 | |
| A. ceylanicum | Acey_s0138.g2078.t1+Acey_s0138.g2076.t1 | WormBase gene model | No | 7.9 | Gap in 5’ end of alignment |
| N. americanus | NECAME_02201 | GeMoMa gene model | Yes | 58.9 | RNA-seq suggests upstream exon but low coverage |
| P. pacificus | PPA31205 | GeMoMa gene model | Yes | 36.9 | |
| P. exspectatus | scaffold14-EXSNAP2012.31 | GeMoMa gene model | Yes | 35.1 | No RNA-seq data, but first 100a.a. are conserved |
| S. ratti | SRAE_X000110600 | WormBase gene model | Yes | 37.6 | |
| P. redivivus | g17483.t1 | WormBase gene model | Yes | 33.6 | Sparse RNA-seq data for this gene, first 100a.a. are partially conserved |
| B. xylophilus | BUX.s00972.1 | WormBase gene model | No | 12.3 | Gap in 5’ end of alignment |
| M. incognita | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 14.1, other predictions not found) |
| M. hapla | MhA1_Contig353.frz3.gene10 | WormBase gene model | Yes | 45.0 | No RNA-seq data, but first 100a.a. are conserved |
| A. suum | GS_12766 | WormBase gene model | Yes | 43.8 | RNA-seq suggests upstream exon but low coverage |
| D. immitis | nDi.2.2.2.t03235 | WormBase gene model | Yes | 38.9 | |
| O. volvulus | OVOC722 | WormBase gene model | Yes | 39.3 | Upstream exons suggested by RNA-seq, but may be UTR (first 100a.a. are conserved) |
| B. malayi | Bm4588 | WormBase gene model | Yes | 40.7 | |
| L. loa | EFO14185.2 | WormBase gene model | Yes | 37.5 | No RNA-seq data for first intron, but first 100a.a. are conserved; gap 100bp upstream |
| T. spiralis | EFV58305 | Manual gene model | Yes | 31.6 | |
| T. suis | M514_21775 | genBlastG gene model | Yes | 34.8 | RNA-seq suggests upstream exon but low coverage |

Figure 3.15: Multiple sequence alignment of first 100a.a. of che-13 orthologs. Among 25 nematode genomes, 24 che-13 orthologs are found, and 1 is not found.
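The PID column in Table 3.11 reports percent identity over the first 100 a.a. of each ortholog. As an illustration only, the sketch below computes such a value from one pair of aligned sequences. It assumes that the reference is the C. elegans protein, that alignment columns where the reference has a gap are skipped, and that the denominator is the number of reference residues examined. The function name `percent_identity` and these conventions are hypothetical; the exact PID definition used in this work may differ.

```python
def percent_identity(ref_aln: str, query_aln: str, n_residues: int = 100) -> float:
    """Percent identity over the columns covering the first n_residues
    of the reference sequence in a pairwise (or extracted MSA) alignment.

    Assumed conventions (not taken from the thesis):
      - '-' marks a gap;
      - columns where the reference has a gap are ignored;
      - a gap in the query opposite a reference residue counts as a mismatch.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    seen = 0  # number of reference residues examined so far
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_char == "-":
            continue  # insertion relative to the reference: skip column
        seen += 1
        if ref_char == query_char:
            matches += 1
        if seen == n_residues:
            break

    if seen == 0:
        raise ValueError("reference contains no residues")
    return 100.0 * matches / seen


if __name__ == "__main__":
    # Toy alignment: 8 reference residues, 7 identical in the query.
    print(percent_identity("MKPF-LIEW", "MKPSALIEW"))
```

Under these conventions, a gene model with a mispredicted 5’ end aligns mostly against gaps or unrelated residues and therefore scores a low PID, which is consistent with the low values listed for entries annotated "Gap in 5’ end of alignment".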

3.3.11 Curation of dyf-1 orthologs in nematodes

dyf-1 is an IFT-B component and is expressed in ciliated neurons (Ou et al., 2005a). dyf-1 participates in IFT and is required for OSM-3 to dock onto and move IFT particles (Ou et al., 2005a). In dyf-1 mutants, cilia are truncated and IFT particles move at the slower speed of kinesin-II, suggesting that OSM-3 is not participating in IFT (Ou et al., 2005a). We identified dyf-1 orthologs in 25 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.16).

Table 3.12: Curation of dyf-1 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE20997 | WormBase gene model | Yes | 91.0 | |
| C. tropicalis | Csp11.Scaffold629.g9792.t1+Csp11.Scaffold629.g9794.t1 | GeMoMa gene model | Yes | 83.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN29399 | WormBase gene model | Yes | 95.0 | |
| C. brenneri | CBN32059 | WormBase gene model | Yes | 95.0 | |
| C. sinica | Csp5_scaffold_00804.g15616.t3 | WormBase gene model | Yes | 84.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG10774+CBG10775 | WormBase gene model | Yes | 90.0 | |
| C. elegans | F54C1.5 | - | - | - | |
| C. japonica | CJA04666 | WormBase gene model | Yes | 84.0 | |
| C. angaria | Cang_2012_03_13_00404.g10388.t1 | genBlastG gene model | Yes | 71.0 | |
| H. bacteriophora | Hba_17034 | - | No | 12.4 | Gap in 5’ end of alignment |
| H. contortus | HCOI01728500.t1 | WormBase gene model | Yes | 61.4 | |
| A. ceylanicum | Acey_s0006.g3075.t1 | WormBase gene model | Yes | 59.4 | |
| N. americanus | NECAME_16555 | - | No | 15.7 | End of contig |
| P. pacificus | PPA06719+PPA06701 | WormBase gene model | Yes | 55.4 | |
| P. exspectatus | scaffold555-EXSNAP2012.10 | WormBase gene model | Yes | 56.4 | No RNA-seq data, but first 100a.a. are conserved |
| S. ratti | SRAE_2000141200 | WormBase gene model | Yes | 51.5 | |
| P. redivivus | g18346.t1 | WormBase gene model | Yes | 53.5 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved; 100bp from end of contig |
| B. xylophilus | BUX.s00397.128 | WormBase gene model | Yes | 48.5 | |
| M. incognita | Minc18233 | - | No | 1.1 | End of contig |
| M. hapla | MhA1_Contig695.frz3.fgene1 | WormBase gene model | Yes | 53.4 | No RNA-seq data, but first 100a.a. are conserved |
| A. suum | GS_12254 | WormBase gene model | No | 56.4 | Conflicting RNA-seq junctions |
| D. immitis | nDi.2.2.2.t05537 | WormBase gene model | Yes | 50.5 | |
| O. volvulus | OVOC11874 | WormBase gene model | Yes | 51.5 | |
| B. malayi | Bm4342 | WormBase gene model | Yes | 52.5 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| L. loa | EFO27943.1 | WormBase gene model | Yes | 52.5 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| T. spiralis | EFV56115 | GeMoMa gene model | Yes | 41.7 | |
| T. suis | M514_05864 | GeMoMa gene model | Yes | 32.4 | |

Figure 3.16: Multiple sequence alignment of first 100a.a. of dyf-1 orthologs. Across 25 nematode genomes, 26 dyf-1 orthologs are found, and no genome lacks an ortholog. Note: C. brenneri contains two dyf-1 genes, and both genes have high confidence 5’ start sites.
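Because C. brenneri contributes two genes, the number of dyf-1 orthologs (26) exceeds the number of genomes searched (25). The short sketch below, offered only as an illustration, tallies the non-C. elegans rows of Table 3.12 to make the distinction between orthologs and genomes explicit. The species names, gene names and start-site calls are copied from the table; the helper name `summarize` and the tuple layout are hypothetical.

```python
from collections import Counter

# (species, gene model, high-confidence 5' start site) from Table 3.12,
# excluding the C. elegans reference gene F54C1.5.
ROWS = [
    ("C. remanei", "CRE20997", True),
    ("C. tropicalis", "Csp11.Scaffold629.g9792.t1+Csp11.Scaffold629.g9794.t1", True),
    ("C. brenneri", "CBN29399", True),
    ("C. brenneri", "CBN32059", True),
    ("C. sinica", "Csp5_scaffold_00804.g15616.t3", True),
    ("C. briggsae", "CBG10774+CBG10775", True),
    ("C. japonica", "CJA04666", True),
    ("C. angaria", "Cang_2012_03_13_00404.g10388.t1", True),
    ("H. bacteriophora", "Hba_17034", False),
    ("H. contortus", "HCOI01728500.t1", True),
    ("A. ceylanicum", "Acey_s0006.g3075.t1", True),
    ("N. americanus", "NECAME_16555", False),
    ("P. pacificus", "PPA06719+PPA06701", True),
    ("P. exspectatus", "scaffold555-EXSNAP2012.10", True),
    ("S. ratti", "SRAE_2000141200", True),
    ("P. redivivus", "g18346.t1", True),
    ("B. xylophilus", "BUX.s00397.128", True),
    ("M. incognita", "Minc18233", False),
    ("M. hapla", "MhA1_Contig695.frz3.fgene1", True),
    ("A. suum", "GS_12254", False),
    ("D. immitis", "nDi.2.2.2.t05537", True),
    ("O. volvulus", "OVOC11874", True),
    ("B. malayi", "Bm4342", True),
    ("L. loa", "EFO27943.1", True),
    ("T. spiralis", "EFV56115", True),
    ("T. suis", "M514_05864", True),
]


def summarize(rows):
    """Count orthologs, genomes with at least one ortholog,
    high-confidence start sites, and species with more than one gene."""
    per_species = Counter(species for species, _, _ in rows)
    return {
        "orthologs": len(rows),
        "genomes_with_ortholog": len(per_species),
        "high_confidence": sum(1 for _, _, high_conf in rows if high_conf),
        "multi_copy": sorted(s for s, n in per_species.items() if n > 1),
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```

Counting rows gives the ortholog total, while counting distinct species gives the genome total, so the two figures in the caption describe different quantities.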

3.3.12 Curation of dyf-2 orthologs in nematodes

dyf-2 is involved in the assembly of IFT particles and is expressed in ciliated neurons of the amphids and phasmids (Efimenko et al., 2006; Wei et al., 2012). In dyf-2 mutants, the BBSome complex forms but accumulates at the ciliary base, and IFT turnaround at the ciliary tip is defective, resulting in accumulation of IFT-B at the ciliary tip (Wei et al., 2012). We identified dyf-2 orthologs in 24 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figures 3.17 and 3.18).

Table 3.13: Curation of dyf-2 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE04849+CRE04848 | Manual gene model | Yes | 58.8 | |
| C. tropicalis | Csp11.Scaffold628.g7258.t1+Csp11.Scaffold628.g7260.t1 | GeMoMa gene model | Yes | 71.4 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN15180+CBN13504 | WormBase gene model | Yes | 58.0 | |
| C. sinica | Csp5_scaffold_00519.g12390+Csp5_scaffold_00519.g12388 | - | No | 14.5 | 100bp from end of contig |
| C. briggsae | CBG18281 | Manual gene model | Yes | 58.0 | |
| C. elegans | ZK520.3a | - | - | - | |
| C. japonica | CJA12956 | - | No | 17.3 | Gap in 5’ end of alignment |
| C. angaria | Cang_2012_03_13_00051.g2661.t1+Cang_2012_03_13_00051.g2663.t1 | WormBase gene model | Yes | 41.2 | |
| H. bacteriophora | Hba_17797+Hba_17798+Hba_17799+Hba_17800 | - | No | 1.0 | Gap in 5’ end of alignment |
| H. contortus | HCOI00608800.t1+HCOI00608600.t1 | WormBase gene model | Yes | 18.2 | |
| A. ceylanicum | Acey_s0283.g1331.t5 | WormBase gene model | Yes | 29.8 | |
| N. americanus | NECAME_16635+NECAME_16634+NECAME_16633+NECAME_16630+NECAME_16629 | - | No | 12.8 | Gap in 5’ end of alignment |
| P. pacificus | PPA24696+PPA24694+PPA24692+PPA24691 | Manual gene model | Yes | 24.2 | |
| P. exspectatus | scaffold705-EXSNAP2012.7+scaffold705-EXSNAP2012.8+scaffold705-EXSNAP2012.9+scaffold705-EXSNAP2012.10 | WormBase gene model | No | 22.6 | No RNA-seq data, and first 100a.a. not conserved |
| S. ratti | SRAE_2000014600+SRAE_2000014400 | genBlastG gene model | Yes | 26.0 | |
| P. redivivus | g2736.t1 | WormBase gene model | Yes | 19.0 | No RNA-seq data for this gene, but first 100a.a. are conserved |
| B. xylophilus | BUX.s00422.326+BUX.s00422.325 | genBlastG gene model | Yes | 20.8 | |
| M. incognita | Minc00367 | WormBase gene model | No | 16.4 | First 100a.a. partially conserved |
| M. hapla | MhA1_Contig1040.frz3.fgene1 | - | No | 20.9 | No RNA-seq data, first 100a.a. partially conserved |
| A. suum | GS_12755 | WormBase gene model | No | 23.0 | Gap in 5’ end of alignment |
| D. immitis | nDi.2.2.2.t05972 | WormBase gene model | Yes | 32.2 | |
| O. volvulus | OVOC7996 | Manual gene model | Yes | 28.9 | |
| B. malayi | Bm7486a | GeMoMa gene model | Yes | 26.9 | Gene begins with GTG |
| L. loa | EFO25666.2 | GeMoMa gene model | Yes | 26.7 | Gene begins with GTG |
| T. spiralis | EFV58429/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 11.8, GeMoMa PID: 19, genBlastG: 13.6) |
| T. suis | M514_10553 | WormBase gene model | No | 15.3 | First 100a.a. not conserved |

Figure 3.17: Multiple sequence alignment of first 100a.a. of dyf-2 orthologs. Among 25 nematode genomes, 24 dyf-2 orthologs are found, and 1 is not found.

[Multiple sequence alignment image; see the caption of Figure 3.18 below.]

Figure 3.18: Multiple sequence alignment of first 100a.a. of dyf-2 orthologs, only showing genes with high-confidence 5’ start sites. This additional alignment is generated to show the conserved regions more clearly by removing the noise caused by sequences without high-confidence 5’ start sites.

3.3.13 Curation of dyf-3 orthologs in nematodes

dyf-3 is expressed in ciliated neurons; it is required for IFT and itself undergoes IFT (Murayama et al., 2005). In dyf-3 mutants, cilia lack distal segments and have truncated middle segments, and no IFT can be detected in the residual middle segment (Ou et al., 2005b). We identified dyf-3 orthologs in 24 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.19).
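The "PID (first 100a.a.)" column in the curation tables summarizes how similar each ortholog's N-terminus is to the C. elegans protein. The exact formula used for that column is not stated in this section, so the following is only a minimal sketch of one plausible definition: identical residues divided by the number of reference residues compared, over the first 100 reference residues of a pairwise-aligned row. The function name `percent_identity` and its parameters are hypothetical and chosen for illustration.

```python
def percent_identity(ref_aln: str, query_aln: str, n_residues: int = 100) -> float:
    """Percent identity over the first `n_residues` reference residues.

    ASSUMPTION: this denominator (reference residues, gaps in the
    reference skipped) is illustrative; the thesis may define PID differently.
    Both inputs are rows of the same alignment, so they must have equal
    length and use '-' for gaps.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have the same length")

    matches = 0
    seen = 0  # number of reference residues examined so far
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_char == "-":
            continue  # column holds no reference residue; skip it
        seen += 1
        if ref_char == query_char:
            matches += 1  # a gap in the query never equals a residue
        if seen == n_residues:
            break

    if seen == 0:
        raise ValueError("reference row contains no residues")
    return 100.0 * matches / seen


# Usage: the ungapped N-terminal segments of the C. elegans and
# C. briggsae dyf-3 proteins as they appear in the alignment.
c_elegans = "MSYRELRNLCEMTRSLRYPRLMSIENFRTPNFQLVAELLEWIVK"
c_briggsae = "MSYRELRNLCEMTRTLRYPRLMSIENFRSPNFQLVAELLEWIVK"
print(round(percent_identity(c_elegans, c_briggsae), 1))
```

Counting a query gap opposite a reference residue as a mismatch means that a 5’ truncation or an assembly gap lowers the score sharply, which is in line with the low PID values reported for entries annotated "Gap in 5’ end of alignment".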

Table 3.14: Curation of dyf-3 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE13104 | WormBase gene model | Yes | 89.0 |
C. tropicalis | Csp11.Scaffold630.g19302.t1 | - | No | 7.5 | Gap in 5’ end of alignment
C. brenneri | CBN32658 | genBlastG gene model | Yes | 89.0 |
C. sinica | Csp5_scaffold_02330.g26012.t1 | GeMoMa gene model | Yes | 86.0 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG21052 | genBlastG gene model | Yes | 88.0 |
C. elegans | C04C3.5 | - | - | - |
C. japonica | CJA14406 | genBlastG gene model | Yes | 89.0 |
C. angaria | C04C3.5b_ortholog | genBlastG gene model | Yes | 38.5 | No RNA-seq data for first intron, but first 100a.a. are conserved
H. bacteriophora | Hba_14515 | WormBase gene model | Yes | 67.6 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI00215700.t1 | WormBase gene model | Yes | 70.0 |
A. ceylanicum | Acey_s0199.g1659.t1 | WormBase gene model | Yes | 73.0 |
N. americanus | NECAME_16174 | GeMoMa gene model | Yes | 72.0 | 3’ end of gene truncated due to gap in assembly
P. pacificus | PPA26582 | WormBase gene model | Yes | 51.3 |
P. exspectatus | scaffold125-EXSNAP2012.21 | genBlastG gene model | Yes | 57.7 | No RNA-seq data, but first 100a.a. are conserved
S. ratti | SRAE_2000491500 | WormBase gene model | Yes | 52.5 |
P. redivivus | g1481.t1 | GeMoMa gene model | Yes | 54.5 |
B. xylophilus | BUX.s00036.38 | GeMoMa gene model | Yes | 57.4 |
M. incognita | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 13.9, genBlastG PID: 15.7)
M. hapla | MhA1_Contig107.frz3.gene49 | genBlastG gene model | Yes | 54.5 | No RNA-seq data, but first 100a.a. are conserved; 700bp from end of contig
A. suum | GS_00740 | WormBase gene model | Yes | 67.3 |
D. immitis | nDi.2.2.2.t07016 | WormBase gene model | Yes | 56.4 |
O. volvulus | OVOC9332 | genBlastG gene model | Yes | 44.7 |
B. malayi | Bm1945c | WormBase gene model | No | 44.0 | Different first exon suggested by RNA-seq, and first 100a.a. not conserved
L. loa | EFO18073.2+EJD75298.1 | genBlastG gene model | Yes | 58.4 |
T. spiralis | EFV60151 | genBlastG gene model | Yes | 46.7 |
T. suis | M514_05015 | genBlastG gene model | Yes | 40.6 |

[Multiple sequence alignment image; see the caption of Figure 3.19 below.]

Figure 3.19: Multiple sequence alignment of first 100a.a. of dyf-3 orthologs. Among 25 nematode genomes, 24 dyf-3 orthologs are found, and 1 is not found.

3.3.14 Curation of dyf-5 orthologs in nematodes

dyf-5 encodes a MAP kinase that regulates cilia length (Chen et al., 2006; Burghoorn et al., 2007). dyf-5 mutants in C. elegans and C. reinhardtii display elongated cilia (Burghoorn et al., 2007; Berman et al., 2003). In addition, dyf-5 mutants in C. elegans have disrupted IFT, in which kinesin-II motors enter the distal segment and IFT proteins accumulate halfway along the cilia and in the distal segment (Burghoorn et al., 2007). We identified dyf-5 orthologs in 25 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figure 3.20).

Table 3.15: Curation of dyf-5 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE20417 | WormBase gene model | Yes | 100.0 |
C. tropicalis | Csp11.Scaffold616.g6022.t2 | GeMoMa gene model | Yes | 99.0 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN19125 | genBlastG gene model | Yes | 100.0 |
C. brenneri | CBN29786 | genBlastG gene model | Yes | 100.0 |
C. sinica | Csp5_scaffold_01078.g18204.t1 | WormBase gene model | Yes | 97.0 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG22182 | WormBase gene model | Yes | 100.0 |
C. elegans | M04C9.5 | - | - | - |
C. japonica | CJA12112 | - | No | 40.4 | Gap in 5’ end of alignment
C. japonica | CJA12938 | GeMoMa gene model | Yes | 98.0 | RNA-seq suggests upstream exon but low coverage
C. angaria | Cang_2012_03_13_00481.g11353.t1 | GeMoMa gene model | Yes | 82.9 | Small gap 600bp upstream
H. bacteriophora | Hba_19137 | WormBase gene model | Yes | 85.1 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI00082700.t1 | WormBase gene model | Yes | 79.6 |
A. ceylanicum | Acey_s0032.g2589.t2 | WormBase gene model | Yes | 81.6 | Different first exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved)
N. americanus | NECAME_01617 | - | No | 14.2 | Gap in 5’ end of alignment
P. pacificus | PPA32892 | GeMoMa gene model | Yes | 83.5 | Gap 200bp upstream
P. exspectatus | M04C9.5_ortholog | GeMoMa gene model | Yes | 83.5 | No RNA-seq data, but first 100a.a. are conserved
S. ratti | SRAE_2000094800 | WormBase gene model | Yes | 56.9 |
P. redivivus | g11778.t1 | WormBase gene model | Yes | 77.4 | Different first exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved); small gap 900bp upstream
B. xylophilus | BUX.s01337.109 | WormBase gene model | Yes | 69.4 |
M. incognita | Minc07403+Minc07404 | WormBase gene model | Yes | 38.2 |
M. incognita | Minc04561 | - | No | 14.2 | Gap in 5’ end of alignment
M. incognita | Minc01307 | WormBase gene model | Yes | 35.3 | Protein sequence contains Xs in second exon
M. hapla | MhA1_Contig1189.frz3.fgene4 | WormBase gene model | Yes | 36.0 | No RNA-seq data, first 100 a.a. partially conserved
A. suum | GS_21827+GS_00312 | WormBase gene model | Yes | 87.1 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved
D. immitis | nDi.2.2.2.t05151 | - | No | 5.4 | 500bp from end of contig
O. volvulus | OVOC1429 | WormBase gene model | Yes | 80.4 | Different first exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved)
B. malayi | Bm6048 | WormBase gene model | Yes | 27.5 |
L. loa | EJD74161.1 | WormBase gene model | Yes | 72.0 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved; gap 600bp upstream
T. spiralis | EFV55018 | WormBase gene model | Yes | 65.3 | Different first exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved)
T. suis | M04C9.5_ortholog | genBlastG gene model | Yes | 63.4 | Different first exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved)

[Multiple sequence alignment image; see the caption of Figure 3.20 below.]

Figure 3.20: Multiple sequence alignment of first 100a.a. of dyf-5 orthologs. Among 25 nematode genomes, 29 dyf-5 orthologs are found, with at least one in every species. Note: C. japonica contains two dyf-5 genes, and M. incognita contains three dyf-5 genes. One of the C. japonica dyf-5 genes has a high-confidence 5’ start site, and two of the M. incognita dyf-5 genes have high-confidence 5’ start sites.
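The per-species counts quoted in this caption can be re-derived from the rows of Table 3.15. The snippet below is a small illustrative sketch, not part of the curation pipeline itself: it tallies, for the species that carry more than one dyf-5 entry in the table, how many gene models exist and how many have a high-confidence 5’ start site. The tuple layout and the helper name `tally_orthologs` are assumptions made for the example; the gene names and Yes/No flags are copied from Table 3.15.

```python
from collections import Counter

# (species, gene name, high-confidence 5' start site) copied from Table 3.15,
# restricted to species that have more than one dyf-5 entry.
ROWS = [
    ("C. brenneri", "CBN19125", True),
    ("C. brenneri", "CBN29786", True),
    ("C. japonica", "CJA12112", False),
    ("C. japonica", "CJA12938", True),
    ("M. incognita", "Minc07403+Minc07404", True),
    ("M. incognita", "Minc04561", False),
    ("M. incognita", "Minc01307", True),
]


def tally_orthologs(rows):
    """Return {species: (total gene models, high-confidence gene models)}."""
    total = Counter()
    confident = Counter()
    for species, _gene, high_confidence in rows:
        total[species] += 1
        if high_confidence:
            confident[species] += 1
    return {species: (total[species], confident[species]) for species in total}


for species, (n_total, n_confident) in tally_orthologs(ROWS).items():
    print(f"{species}: {n_total} dyf-5 gene models, {n_confident} high-confidence")
```

Only gene models flagged as high confidence would be suitable inputs for a promoter search, since a reliable 5’ end is what locates the upstream region.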

3.3.15 Curation of dyf-11 orthologs in nematodes

dyf-11 encodes an IFT-B component expressed in ciliated neurons (Kunitomo and Iino, 2008). dyf-11 mutants have truncated cilia and abnormal branching (Kunitomo and Iino, 2008). We identified dyf-11 orthologs in 24 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figure 3.21).

Table 3.16: Curation of dyf-11 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE17230 | WormBase gene model | Yes | 68.0 |
C. tropicalis | Csp11.Scaffold629.g15635.t2 | WormBase gene model | Yes | 81.0 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN00078 | WormBase gene model | Yes | 66.0 |
C. sinica | Csp5_scaffold_00693.g14473.t3 | WormBase gene model | Yes | 71.0 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG22059 | WormBase gene model | Yes | 63.1 |
C. elegans | C02H7.1 | - | - | - |
C. japonica | CJA11321 | WormBase gene model | Yes | 70.0 |
C. angaria | Cang_2012_03_13_00309.g8972.t1 | WormBase gene model | Yes | 60.2 |
H. bacteriophora | Hba_15833 | genBlastG gene model | Yes | 54.9 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI01514900.t1 | genBlastG gene model | Yes | 47.1 | 1.9kb from end of contig
A. ceylanicum | Acey_s0136.g1946.t1 | - | No | 12.6 | 400bp from end of contig
N. americanus | NECAME_04663+NECAME_04664 | genBlastG gene model | Yes | 51.0 | Gap 600bp upstream
P. pacificus | PPA23315 | WormBase gene model | Yes | 46.1 | RNA-seq suggests upstream exon but low coverage
P. exspectatus | scaffold10-EXSNAP2012.5 | WormBase gene model | Yes | 47.1 | No RNA-seq data, but first 100a.a. are conserved
S. ratti | C02H7.1_ortholog | genBlastG gene model | Yes | 43.7 | No RNA-seq data for this gene, but first 100a.a. are conserved
P. redivivus | g3517.t1 | WormBase gene model | Yes | 50.0 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved; gap 1.8kb upstream
B. xylophilus | BUX.s00116.95 | WormBase gene model | No | 47.6 | Conflicting RNA-seq junctions
M. incognita | Minc16324 | WormBase gene model | Yes | 50.5 |
M. hapla | MhA1_Contig624.frz3.gene9 | WormBase gene model | Yes | 49.5 | No RNA-seq data, but first 100a.a. are conserved
A. suum | GS_02500 | WormBase gene model | Yes | 52.0 |
D. immitis | nDi.2.2.2.t00860 | WormBase gene model | Yes | 50.0 |
O. volvulus | OVOC4606 | WormBase gene model | Yes | 47.6 |
B. malayi | Bm1925 | WormBase gene model | Yes | 51.0 | Upstream exons suggested by RNA-seq, but may be UTR (first 100a.a. are conserved)
L. loa | EFO23623.1 | genBlastG gene model | Yes | 52.0 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved
T. spiralis | EFV47407/Ortholog not found | - | - | - | Low sequence similarity (WormBase PID: 12.8, GeMoMa PID: 11.9, genBlastG PID: 8.4)
T. suis | M514_03063 | Manual gene model | No | 32.7 | First 100a.a. not conserved

[Multiple sequence alignment image; see the caption of Figure 3.21 below.]

Figure 3.21: Multiple sequence alignment of first 100a.a. of dyf-11 orthologs. Among 25 nematode genomes, 24 dyf-11 orthologs are found, and 1 is not found.

3.3.16 Curation of dyf-13 orthologs in nematodes

dyf-13 is involved in IFT and is expressed in ciliated neurons (Blacque et al., 2005). We identified dyf-13 orthologs in 23 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figures 3.22 and 3.23).

Table 3.17: Curation of dyf-13 orthologs

Species Gene Name Annotation Used High Confidence PID (first Comment 5’ Start Site 100a.a.) C. remanei CRE26172 WormBase gene model Yes 80.0 C. tropicalis Csp11.Scaffold629.g12146.t1 genBlastG gene model Yes 67.3 No RNA-seq data, but first 100a.a. are con- served; gap 300bp upstream C. brenneri CBN10651 GeMoMa gene model Yes 85.3 C. brenneri CBN25455 WormBase gene model No 56.6 First 100a.a. partially conserved (contains deletion) C. sinica Csp5_scaffold_00758.g15181.t1 WormBase gene model Yes 73.5 No RNA-seq data, but first 100a.a. are conserved C. briggsae CBG13079 WormBase gene model Yes 82.0 C. elegans C27H5.7a - - - C. japonica CJA25996 GeMoMa gene model Yes 74.0 Short contig; 5’ end of gene 200bp from end of contig 79 C. angaria Cang_2012_03_13_00375.g9968.t1 WormBase gene model Yes 36.5 Sparse RNA-seq data for this gene, but first 100a.a. are conserved H. bacteriophora Hba_20912+Hba_20913 GeMoMa gene model No 32.5 First 100a.a. partially conserved H. contortus HCOI01390400.t1 WormBase gene model Yes 33.7 A. ceylanicum Acey_s0134.g1822.t2 WormBase gene model Yes 36.5 N. americanus NECAME_01839+NECAME_01838 genBlastG gene model Yes 36.5 P. pacificus C27H5.7a_ortholog GeMoMa gene model Yes 29.1 P. exspectatus scaffold397-EXSNAP2012.13+ GeMoMa gene model Yes 29.1 No RNA-seq data, but first 100a.a. are scaffold397-EXSNAP2012.12 conserved S. ratti SRAE_0000037600 GeMoMa gene model Yes 29.6 P. redivivus g14826.t1 Manual gene model Yes 38.9 B. xylophilus BUX.s00961.74 GeMoMa gene model Yes 35.8 Upstream exons suggested by RNA-seq, but may be UTR (first 100a.a. are conserved) M. incognita C27H5.7a_ortholog - No 18.2 200bp from end of contig Curation of dyf-13 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| M. hapla | MhA1_Contig765.frz3.gene6+MhA1_Contig765.frz3.gene7 | genBlastG gene model | No | 26.2 | First 100a.a. not conserved |
| A. suum | GS_13664 | GeMoMa gene model | Yes | 31.4 | |
| D. immitis | nDi.2.2.2.t03206 | GeMoMa gene model | Yes | 34.2 | |
| O. volvulus | OVOC10272 | WormBase gene model | Yes | 33.6 | |
| B. malayi | Bm2441 | GeMoMa gene model | Yes | 31.0 | |
| L. loa | EFO20955.2 | WormBase gene model | Yes | 36.9 | |
| T. spiralis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 8.4, other predictions not found) |
| T. suis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 20.6, other predictions not found) |

[Alignment graphic: first 100 a.a. of the dyf-13 gene models from the curation table, aligned together with C. elegans C27H5.7a.]

Figure 3.22: Multiple sequence alignment of first 100a.a. of dyf-13 orthologs. Among 25 nematode genomes, 24 dyf-13 orthologs are found, and 2 are not found. Note: C. brenneri contains two dyf-13 genes, and only one of those genes has a high confidence 5’ start site.

[Alignment graphic: first 100 a.a. of the dyf-13 gene models with high confidence 5’ start sites, aligned together with C. elegans C27H5.7a.]

Figure 3.23: Multiple sequence alignment of first 100a.a. of dyf-13 orthologs, only showing genes with high confidence 5’ start sites.
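The selection behind this second alignment can be sketched in a few lines of Python: keep only the gene models flagged as having a high confidence 5’ start site before aligning them. This is a minimal illustration, not the pipeline used in this work; the `GeneModel` record and its field names are hypothetical, and only four rows of the dyf-13 curation table are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    # Hypothetical record mirroring the columns of the curation table.
    species: str
    gene: str
    high_confidence_start: bool
    pid_first_100: float


# A few rows copied from the dyf-13 curation table.
models = [
    GeneModel("C. remanei", "CRE26172", True, 80.0),
    GeneModel("C. brenneri", "CBN10651", True, 85.3),
    GeneModel("C. brenneri", "CBN25455", False, 56.6),
    GeneModel("H. bacteriophora", "Hba_20912+Hba_20913", False, 32.5),
]

# Keep only models whose 5' start site was curated with high confidence.
high_confidence = [m for m in models if m.high_confidence_start]

for m in high_confidence:
    print(m.species, m.gene, m.pid_first_100)
```

Filtering on the flag rather than on a PID threshold keeps the selection tied to the curation decision, since a low PID does not always coincide with a low confidence start site in the tables.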

3.3.17 Curation of dyf-18 orthologs in nematodes

dyf-18 is a kinase that functions in IFT (Phirke et al., 2011). Some IFT proteins show abnormal accumulations along the cilia in dyf-18 mutants; for example, OSM-3 accumulates between the middle and distal segments and OSM-5 accumulates at the base of cilia (Phirke et al., 2011). We identified dyf-18 orthologs in 25 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.24).
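To make the "PID (first 100a.a.)" column concrete, the following minimal Python sketch computes percent identity between two rows of an alignment over the first 100 residues of the reference. The function name `pid_first_n`, the toy sequences, and the choice of denominator (reference residues considered, with a gap in the ortholog counted as a mismatch) are assumptions made for illustration; the exact definition used to produce the table values may differ.

```python
def pid_first_n(ref_aln: str, query_aln: str, n: int = 100) -> float:
    """Percent identity over the first n residues of the reference row.

    Both inputs are rows of the same alignment ('-' marks a gap).
    Assumption: columns where the reference has a gap are skipped, and a
    gap in the query opposite a reference residue counts as a mismatch.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have equal length")
    seen = matches = 0
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue
        seen += 1
        if r == q:
            matches += 1
        if seen == n:
            break
    if seen == 0:
        raise ValueError("reference row contains no residues")
    return 100.0 * matches / seen


# Toy aligned rows (hypothetical, not taken from the alignments in this work).
ref = "MPSS-IYRYETIQ"
qry = "MPSSDRYRYETLQ"
print(round(pid_first_n(ref, qry), 1))  # 83.3
```

Restricting the comparison to the N-terminal residues focuses the score on the region that determines where the gene, and therefore its promoter, begins.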

Table 3.18: Curation of dyf-18 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE15525 | WormBase gene model | Yes | 77.0 | |
| C. tropicalis | Csp11.Scaffold629.g15621.t2 | GeMoMa gene model | Yes | 76.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | H01G02.2_ortholog | genBlastG gene model | Yes | 75.0 | |
| C. sinica | Csp5_scaffold_00015.g952.t1 | WormBase gene model | Yes | 75.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG06171 | WormBase gene model | Yes | 76.0 | |
| C. elegans | H01G02.2 | - | - | - | |
| C. japonica | CJA10673b | GeMoMa gene model | Yes | 74.3 | |
| C. angaria | H01G02.2_ortholog | genBlastG gene model | Yes | 59.2 | Gap 500bp upstream |
| H. bacteriophora | Hba_14472 | GeMoMa gene model | Yes | 20.0 | No RNA-seq data, first 100a.a. partially conserved |
| H. contortus | HCOI00363800.t1 | WormBase gene model | Yes | 45.2 | |
| A. ceylanicum | Acey_s0235.g3205.t1 | WormBase gene model | Yes | 40.4 | |
| N. americanus | NECAME_09786 | - | No | 24.4 | Gap 200bp upstream |
| P. pacificus | PPA01314 | GeMoMa gene model | No | 34.6 | First intron not supported; First 100a.a. not conserved; gene begins with TCG instead of ATG |
| P. exspectatus | scaffold430-EXSNAP2012.3 | - | No | 35.6 | First 100a.a. not conserved; gene begins with TCG instead of ATG |
| S. ratti | SRAE_2000245400 | WormBase gene model | Yes | 31.1 | |
| P. redivivus | g3143.t1 | WormBase gene model | Yes | 39.4 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| B. xylophilus | BUX.s00116.243 | Manual gene model | Yes | 29.5 | |
| M. incognita | Minc14870 | WormBase gene model | Yes | 32.7 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| M. hapla | MhA1_Contig1126.frz3.gene1 | genBlastG gene model | Yes | 26.9 | No RNA-seq data, but first 100a.a. are conserved |
| A. suum | GS_10272 | Manual gene model | Yes | 39.0 | Upstream exons suggested by RNA-seq, but may be UTR (first 100a.a. are conserved) |
| D. immitis | nDi.2.2.2.t11742 | - | No | 3.9 | Short contig (100bp from end of contig) |
| O. volvulus | OVOC3853 | WormBase gene model | Yes | 34.2 | |
| B. malayi | Bm4613c | WormBase gene model | Yes | 31.5 | |
| L. loa | EJD75780.1 | WormBase gene model | Yes | 32.4 | |
| T. spiralis | EFV56806 | genBlastG gene model | Yes | 27.8 | No RNA-seq data for this gene, but first 100a.a. are conserved |
| T. suis | M514_03611 | GeMoMa gene model | Yes | 23.6 | Different first exon suggested by RNA-seq, but may be UTR (first 100a.a. are partially conserved) |

[Alignment graphic: first 100 a.a. of the dyf-18 gene models from the curation table, aligned together with C. elegans H01G02.2.]

Figure 3.24: Multiple sequence alignment of first 100a.a. of dyf-18 orthologs. Among 25 nematode genomes, 25 dyf-18 orthologs are found, and none are missing.

3.3.18 Curation of dylt-2 orthologs in nematodes

dylt-2 is a dynein light chain subunit expressed in ciliated neurons (Efimenko et al., 2005). We identified dylt-2 orthologs in 20 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.25).

Table 3.19: Curation of dylt-2 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE04272 | WormBase gene model | Yes | 81.0 | |
| C. tropicalis | D1009.5_ortholog | GeMoMa gene model | Yes | 79.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN18284 | WormBase gene model | Yes | 81.0 | |
| C. sinica | Csp5_scaffold_00034.g1869.t1 | WormBase gene model | Yes | 74.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG00241 | Manual gene model | Yes | 70.0 | |
| C. elegans | D1009.5 | - | - | - | |
| C. japonica | CJA08890 | WormBase gene model | Yes | 70.0 | |
| C. angaria | D1009.5_ortholog | GeMoMa gene model | Yes | 62.4 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| H. bacteriophora | Hba_08082 | GeMoMa gene model | Yes | 41.3 | No RNA-seq data, but first 100a.a. are conserved |
| H. contortus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 31.5, other predictions not found) |
| A. ceylanicum | Acey_s0100.g3268.t1+Acey_s0100.g3270.t1 | Manual gene model | Yes | 40.6 | |
| N. americanus | D1009.5_ortholog | Manual gene model | Yes | 43.6 | |
| P. pacificus | PPA07033 | WormBase gene model | Yes | 39.4 | |
| P. exspectatus | scaffold153-EXSNAP2012.8 | WormBase gene model | Yes | 40.4 | No RNA-seq data, but first 100a.a. are conserved |
| S. ratti | SRAE_1000028800 | WormBase gene model | Yes | 32.7 | |
| P. redivivus | g22958.t1 | GeMoMa gene model | Yes | 34.3 | |
| B. xylophilus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 21.2, other predictions not found) |
| M. incognita | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 17.8, other predictions not found) |

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| M. hapla | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 18.7, other predictions not found) |
| A. suum | GS_23052 | GeMoMa gene model | Yes | 36.5 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| D. immitis | nDi.2.2.2.t09876 | WormBase gene model | No | 22.8 | 5’ end of protein sequence contains Xs |
| O. volvulus | OVOC3266 | WormBase gene model | Yes | 36.2 | |
| B. malayi | Bm8840 | GeMoMa gene model | Yes | 34.0 | |
| L. loa | EFO25471.1 | WormBase gene model | Yes | 35.5 | |
| T. spiralis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 16.3, other predictions not found) |
| T. suis | M514_06564 | GeMoMa gene model | Yes | 21.8 | Different first exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved, 5’ end of gene can’t be extended due to stop codons) |

[Alignment graphic: first 100 a.a. of the dylt-2 gene models from the curation table, aligned together with C. elegans D1009.5.]

Figure 3.25: Multiple sequence alignment of first 100a.a. of dylt-2 orthologs. Among 25 nematode genomes, 20 dylt-2 orthologs are found, and 5 are not found.

3.3.19 Curation of ift-20 orthologs in nematodes

ift-20 is an IFT-B component expressed in ciliated neurons (Ou et al., 2007). In mammalian cells, strong knockdown of the homolog IFT20 blocks ciliary assembly (Follit et al., 2006). We identified ift-20 orthologs in 22 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figure 3.26).

Table 3.20: Curation of ift-20 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE19293 | GeMoMa gene model | Yes | 83.7 | |
| C. tropicalis | Csp11.Scaffold542.g3418.t1 | WormBase gene model | Yes | 94.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN07292 | WormBase gene model | Yes | 91.0 | |
| C. brenneri | CBN16460 | WormBase gene model | Yes | 90.0 | |
| C. sinica | Csp5_scaffold_00676.g14302.t1 | WormBase gene model | Yes | 91.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG10810 | WormBase gene model | Yes | 90.0 | |
| C. elegans | Y110A7A.20 | - | - | - | |
| C. japonica | CJA11221 | WormBase gene model | Yes | 90.0 | |
| C. angaria | Cang_2012_03_13_00270.g8222.t1 | WormBase gene model | Yes | 75.0 | |
| H. bacteriophora | Hba_03885 | GeMoMa gene model | Yes | 28.2 | No RNA-seq data, first 100 a.a. partially conserved; protein length is 97a.a. |
| H. contortus | HCOI00592900.t1 | GeMoMa gene model | Yes | 27.0 | Gap in middle of gene |
| A. ceylanicum | Acey_s0002.g679.t1 | WormBase gene model | Yes | 43.1 | |
| N. americanus | NECAME_03337 | WormBase gene model | Yes | 42.2 | |
| P. pacificus | Y110A7A.20_ortholog | GeMoMa gene model | Yes | 30.4 | |
| P. exspectatus | Y110A7A.20_ortholog | genBlastG gene model | No | 30.8 | No RNA-seq data, first 100a.a. partially conserved |
| S. ratti | SRAE_2000384900 | WormBase gene model | Yes | 27.9 | |
| P. redivivus | g489.t1 | WormBase gene model | Yes | 16.4 | |
| B. xylophilus | BUX.s00333.29 | WormBase gene model | Yes | 24.8 | |
| M. incognita | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 18.6, other predictions not found) |
| M. hapla | MhA1_Contig209.frz3.gene2 | WormBase gene model | Yes | 19.8 | No RNA-seq data, first 100 a.a. partially conserved |

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| A. suum | Y110A7A.20_ortholog | Manual gene model | Yes | 37.5 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| D. immitis | nDi.2.2.2.t01895 | WormBase gene model | Yes | 35.6 | |
| O. volvulus | OVOC4508 | WormBase gene model | Yes | 37.5 | |
| B. malayi | Bm9193 | WormBase gene model | Yes | 36.5 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| L. loa | EFO18254.1 | WormBase gene model | Yes | 38.5 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| T. spiralis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 11.4, other predictions not found) |
| T. suis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 8.3, other predictions not found) |

[Alignment graphic: first 100 a.a. of the ift-20 gene models from the curation table, aligned together with C. elegans Y110A7A.20.]

Figure 3.26: Multiple sequence alignment of first 100a.a. of ift-20 orthologs. Among 25 nematode genomes, 23 ift-20 orthologs are found, and 3 are not found. Note: C. brenneri contains two ift-20 genes, and both genes have high confidence 5’ start sites.

3.3.20 Curation of ifta-1 orthologs in nematodes

ifta-1 associates with IFT-A in ciliated neurons and functions in retrograde IFT (Blacque et al., 2006). ifta-1 mutants have shortened cilia and disrupted IFT, with IFT proteins accumulating along the cilium (Blacque et al., 2006). We identified ifta-1 orthologs in 21 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figure 3.27).

Table 3.21: Curation of ifta-1 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
| C. remanei | CRE30141 | WormBase gene model | Yes | 65.7 | |
| C. tropicalis | Csp11.Scaffold626.g6613.t3 | WormBase gene model | Yes | 63.3 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN17343 | WormBase gene model | Yes | 69.6 | |
| C. sinica | Csp5_scaffold_05456.g35025.t1 | WormBase gene model | Yes | 58.9 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG14804 | genBlastG gene model | Yes | 72.0 | |
| C. elegans | C54G7.4 | - | - | - | |
| C. japonica | CJA21748 | GeMoMa gene model | Yes | 67.0 | |
| C. angaria | Cang_2012_03_13_00195.g6733.t1 | WormBase gene model | Yes | 41.7 | |
| H. bacteriophora | Hba_12179+Hba_12180+Hba_12181+Hba_12182 | WormBase gene model | Yes | 40.0 | No RNA-seq data, but first 100a.a. are conserved |
| H. contortus | HCOI00986600.t1 | - | No | 13.4 | 300bp from end of contig |
| A. ceylanicum | Acey_s0187.g1113.t2 | WormBase gene model | Yes | 37.7 | No RNA-seq data for first intron, but first 100a.a. are conserved |
| N. americanus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 18.2, genBlastG PID: 19.8) |
| P. pacificus | PPA23338 | WormBase gene model | No | 4.9 | Gap in 5’ end of alignment |
| P. exspectatus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 22.9, genBlastG PID: 18.3) |
| S. ratti | C54G7.4_ortholog | GeMoMa gene model | No | 18.4 | No RNA-seq for this gene, first 100a.a. partially conserved, and gene begins with ATT |
| P. redivivus | g15008.t1 | WormBase gene model | Yes | 24.8 | |
| B. xylophilus | BUX.s00460.498 | Manual gene model | Yes | 21.0 | |
| M. incognita | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 16.4, genBlastG PID: 4.8) |
| M. hapla | MhA1_Contig1878.frz3.fgene2 | WormBase gene model | Yes | 22.5 | No RNA-seq data, but first 100a.a. are conserved |
| A. suum | GS_11459 | - | No | 9.0 | Gap in 5’ end of alignment |
| D. immitis | nDi.2.2.2.t06430+nDi.2.2.2.t06427 | Manual gene model | No | 17.4 | Gaps in 5’ end of alignment |
| O. volvulus | OVOC1858 | WormBase gene model | Yes | 24.4 | |
| B. malayi | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 12.9, genBlastG PID: 6.5) |
| L. loa | EJD74175.1 | WormBase gene model | No | 5.6 | Gap in 5’ end of alignment |
| T. spiralis | EFV55416 | WormBase gene model | Yes | 24.2 | |
| T. suis | M514_26693 | Manual gene model | Yes | 26.2 | |

[Figure: multiple sequence alignment of ifta-1 orthologs]

Figure 3.27: Multiple sequence alignment of first 100a.a. of ifta-1 orthologs. Among 25 nematode genomes, 21 ifta-1 orthologs are found, and 4 are not found.

3.3.21 Curation of mks-1 orthologs in nematodes

MKS-1 forms a complex with MKSR-1 and MKSR-2 at the basal body (Bialas et al., 2009). We identified mks-1 orthologs in 11 nematode species, and the first 100 a.a. of these orthologs are not very well-conserved (Figure 3.28). mks-1 is poorly conserved outside of the B9 domain (Bialas et al., 2009). The low number of mks-1 orthologs with well-defined 5’ start sites may be a result of this poor conservation, since our gene annotation process depends largely on sequence similarity.
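Because the curation tables report a percent identity (PID) over the first 100 a.a., a small sketch of such a calculation may help in reading those values. This is an illustration only: the function name, the toy sequences, and the choice of denominator (every alignment column in which at least one of the two sequences has a residue, so that gaps count as mismatches) are assumptions, and the PID values in the tables may have been computed under a different definition.

```python
def percent_identity(ref_aligned: str, query_aligned: str, n_residues: int = 100) -> float:
    """Percent identity over the columns covering the first n_residues of the reference.

    Both inputs are rows taken from the same alignment, with '-' marking gaps.
    Assumption: columns where both rows are gaps are skipped; columns where
    only one row is a gap count as compared but not identical.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")

    identical = 0   # columns with the same residue in both rows
    compared = 0    # columns with a residue in at least one row
    ref_seen = 0    # reference residues consumed so far

    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_seen >= n_residues:
            break  # stop once the first n_residues of the reference are covered
        if ref_char != "-":
            ref_seen += 1
        if ref_char == "-" and query_char == "-":
            continue
        compared += 1
        if ref_char == query_char:
            identical += 1

    return 100.0 * identical / compared if compared else 0.0


# Toy rows (made up for illustration, not taken from the thesis alignments).
reference = "MAEV-HVIG"
candidate = "MAEIQHV-G"
print(round(percent_identity(reference, candidate), 1))
```

Under this definition, an ortholog whose gene model is missing its true 5’ exon aligns mostly against gaps in the first 100 columns and therefore scores a low PID, which fits the "Gap in 5’ end of alignment" comments attached to low-PID rows.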

Table 3.22: Curation of mks-1 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
| C. remanei | CRE14357 | WormBase gene model | Yes | 73.5 | |
| C. tropicalis | Csp11.Scaffold629.g9304.t2 | WormBase gene model | Yes | 73.3 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN26042 | WormBase gene model | Yes | 75.2 | |
| C. sinica | Csp5_scaffold_00692.g14468.t1 | WormBase gene model | Yes | 75.0 | No RNA-seq data, but first 100a.a. are conserved; small gaps 300bp, 500bp, 1.6kb upstream |
| C. briggsae | CBG22495 | - | No | 21.5 | Conflicting RNA-seq junctions |
| C. elegans | R148.1 | - | - | - | - |
| C. japonica | CJA20432 | genBlastG gene model | Yes | 64.4 | |
| C. angaria | R148.1a_ortholog | GeMoMa gene model | No | 22.0 | No RNA-seq data for first intron, first 100a.a. partially conserved |
| H. bacteriophora | Hba_17670+Hba_17671/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 13.8, GeMoMa PID: 24.6, genBlastG PID: 16.9) |
| H. contortus | HCOI01291500.t1/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 22.2, GeMoMa PID: 26.3, genBlastG PID: 15.6) |
| A. ceylanicum | Acey_s0002.g939.t1 | WormBase gene model | Yes | 30.0 | |
| N. americanus | NECAME_07939 | WormBase gene model | No | 15.6 | Gaps 100bp upstream and in introns |
| P. pacificus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 24.4, genBlastG PID: 14.2) |
| P. exspectatus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 17.7, genBlastG PID: 11.2) |
| S. ratti | SRAE_2000266400/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 19.2, GeMoMa PID: 15.5, genBlastG PID: 11.5) |
| P. redivivus | g18200.t1/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 22.3, GeMoMa PID: 19.7, genBlastG PID: 16.8) |
| B. xylophilus | BUX.s01142.48/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 18.7, GeMoMa PID: 21.7, genBlastG PID: 18.5) |
| M. incognita | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 22.3, other predictions not found) |
| M. hapla | MhA1_Contig1538.frz3.gene4/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 18.8, GeMoMa PID: 8.5, genBlastG not found) |
| A. suum | GS_11643 | WormBase gene model | Yes | 10.8 | No RNA-seq data for first intron, first 100a.a. partially conserved |
| D. immitis | nDi.2.2.2.t02827/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 22.9, GeMoMa PID: 23, genBlastG: 10.3) |
| O. volvulus | OVOC4983/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 20.4, GeMoMa PID: 25.5) |
| B. malayi | Bm10994 | WormBase gene model | Yes | 19.4 | |
| L. loa | EJD75716.1/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 22.4, GeMoMa PID: 21.3, genBlastG PID: 13.1) |
| T. spiralis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 12.6, other predictions not found) |
| T. suis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 13.4, other predictions not found) |

[Figure: multiple sequence alignment of mks-1 orthologs]

Figure 3.28: Multiple sequence alignment of first 100a.a. of mks-1 orthologs. Among 25 nematode genomes, 11 mks-1 orthologs are found, and 14 are not found.

3.3.22 Curation of mks-6 orthologs in nematodes

mks-6 localizes to the basal bodies/transition zones of cilia, where it plays a role in basal body/transition zone placement along with nphp-4 (Williams et al., 2011). We identified mks-6 orthologs in 9 nematode species, and the first 100 a.a. of these orthologs are somewhat well-conserved (Figure 3.29).

Table 3.23: Curation of mks-6 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
| C. remanei | CRE26811 | - | No | 70.9 | ~400bp from end of contig |
| C. tropicalis | Csp11.Scaffold629.g14766.t1 | WormBase gene model | Yes | 63.6 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN31630 | WormBase gene model | Yes | 77.1 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| C. sinica | Csp5_scaffold_01220.g19352.t2 | WormBase gene model | Yes | 79.8 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG20463 | GeMoMa gene model | Yes | 83.7 | |
| C. elegans | K07G5.3 | - | - | - | |
| C. japonica | CJA13639 | genBlastG gene model | Yes | 72.1 | |
| C. angaria | Cang_2012_03_13_00732.g13766.t1 | - | No | 7.5 | End of contig |
| H. bacteriophora | Hba_16544/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 28.1, GeMoMa PID: 25.6, genBlastG PID: 27.5) |
| H. contortus | HCOI00765200.t1+HCOI00765100.t1/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 19.1, GeMoMa PID: 21.7, genBlastG PID: 20.2) |
| A. ceylanicum | Acey_s0070.g474.t6 | WormBase gene model | No | 13.7 | Gaps in 5’ end of alignment |
| N. americanus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 13.3, genBlastG PID: 11.8) |
| P. pacificus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 20.9, genBlastG PID: 15.3) |
| P. exspectatus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 18.8, genBlastG PID: 18.3) |
| S. ratti | SRAE_2000141300/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 24.3, GeMoMa PID: 25.3, genBlastG PID: 21.9) |
| P. redivivus | g23505.t1/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 26.8, GeMoMa PID: 26.1, genBlastG PID: 26.1) |
| B. xylophilus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 22, genBlastG PID: 23.7) |
| M. incognita | Minc06771+Minc06770/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 20.2, GeMoMa PID: 14.8, genBlastG PID: 7.7) |
| M. hapla | MhA1_Contig4.frz3.gene4/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 21.7, GeMoMa PID: 15.7, genBlastG PID: 14.2) |
| A. suum | GS_03574 | WormBase gene model | No | 18.5 | Short contig |
| D. immitis | nDi.2.2.2.t02729/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 19.6, GeMoMa PID: 21.6, genBlastG PID: 16.1) |
| O. volvulus | OVOC224+OVOC225/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 24.3, GeMoMa PID: 20.1, genBlastG PID: 12.6) |
| B. malayi | Bm4888/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 12.4, GeMoMa PID: 14.9, genBlastG PID: 12) |
| L. loa | EFO17718.2/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 19.9, GeMoMa PID: 18.1, genBlastG PID: 12.5) |
| T. spiralis | EFV54423/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 18.3, GeMoMa PID: 12.6, genBlastG not found) |
| T. suis | M514_02579/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 19.6, GeMoMa PID: 11.8) |

[Figure: multiple sequence alignment of mks-6 orthologs]

Figure 3.29: Multiple sequence alignment of first 100a.a. of mks-6 orthologs. Among 25 nematode genomes, 9 mks-6 orthologs are found, and 16 are not found.
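The found/not-found tally in this caption can be re-derived from the Gene Name column of Table 3.23. The sketch below is illustrative only: the dictionary transcribes that column for the 25 non-reference species, and the rule that any entry ending in "No ortholog found" counts as not found (whether or not a candidate gene model ID precedes it) is an assumed reading of the table, not a stated curation rule.

```python
# Gene Name column of Table 3.23 (mks-6); C. elegans is the reference and is omitted.
MKS6_GENE_NAMES = {
    "C. remanei": "CRE26811",
    "C. tropicalis": "Csp11.Scaffold629.g14766.t1",
    "C. brenneri": "CBN31630",
    "C. sinica": "Csp5_scaffold_01220.g19352.t2",
    "C. briggsae": "CBG20463",
    "C. japonica": "CJA13639",
    "C. angaria": "Cang_2012_03_13_00732.g13766.t1",
    "H. bacteriophora": "Hba_16544/No ortholog found",
    "H. contortus": "HCOI00765200.t1+HCOI00765100.t1/No ortholog found",
    "A. ceylanicum": "Acey_s0070.g474.t6",
    "N. americanus": "No ortholog found",
    "P. pacificus": "No ortholog found",
    "P. exspectatus": "No ortholog found",
    "S. ratti": "SRAE_2000141300/No ortholog found",
    "P. redivivus": "g23505.t1/No ortholog found",
    "B. xylophilus": "No ortholog found",
    "M. incognita": "Minc06771+Minc06770/No ortholog found",
    "M. hapla": "MhA1_Contig4.frz3.gene4/No ortholog found",
    "A. suum": "GS_03574",
    "D. immitis": "nDi.2.2.2.t02729/No ortholog found",
    "O. volvulus": "OVOC224+OVOC225/No ortholog found",
    "B. malayi": "Bm4888/No ortholog found",
    "L. loa": "EFO17718.2/No ortholog found",
    "T. spiralis": "EFV54423/No ortholog found",
    "T. suis": "M514_02579/No ortholog found",
}

NOT_FOUND_MARKER = "No ortholog found"


def tally_orthologs(gene_names):
    """Split species into (found, not_found) lists based on the Gene Name field."""
    not_found = sorted(
        species for species, name in gene_names.items()
        if name.endswith(NOT_FOUND_MARKER)  # assumed rule, see lead-in
    )
    found = sorted(species for species in gene_names if species not in not_found)
    return found, not_found


found, not_found = tally_orthologs(MKS6_GENE_NAMES)
print(len(found), "found;", len(not_found), "not found")
```

The same function could be applied to the Gene Name column of the other curation tables, provided their entries follow the same convention.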

3.3.23 Curation of mksr-1 orthologs in nematodes

MKSR-1 forms a complex with MKS-1 and MKSR-2 at the basal body (Bialas et al., 2009). We identified mksr-1 orthologs in 23 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.30).

Table 3.24: Curation of mksr-1 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
| C. remanei | CRE17081 | WormBase gene model | Yes | 81.0 | |
| C. tropicalis | Csp11.Scaffold548.g3555.t1 | genBlastG gene model | Yes | 76.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN06405 | WormBase gene model | Yes | 75.0 | |
| C. sinica | Csp5_scaffold_00564.g12936.t1 | WormBase gene model | Yes | 81.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG11067 | WormBase gene model | Yes | 86.0 | |
| C. elegans | K03E6.4 | - | - | - | |
| C. japonica | CJA00755 | genBlastG gene model | Yes | 55.0 | |
| C. angaria | K03E6.4_ortholog | GeMoMa gene model | Yes | 56.0 | |
| H. bacteriophora | K03E6.4_ortholog | GeMoMa gene model | Yes | 30.1 | No RNA-seq data, but first 100a.a. are conserved |
| H. contortus | HCOI01707400.t1 | WormBase gene model | No | 19.7 | Gap in 5’ end of alignment |
| A. ceylanicum | Acey_s0049.g1841.t1 | WormBase gene model | Yes | 34.0 | |
| N. americanus | NECAME_12547 | GeMoMa gene model | Yes | 36.0 | Different first exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved, 5’ end of gene can’t be extended due to stop codons) |
| P. pacificus | K03E6.4_ortholog | Manual gene model | Yes | 21.6 | |
| P. exspectatus | scaffold340-EXSNAP2012.44 | - | - | - | Low sequence similarity (WormBase PID: 14.1, GeMoMa PID: 23.9, genBlastG PID: 25.8) |
| S. ratti | SRAE_0000033500 | WormBase gene model | Yes | 25.2 | |
| P. redivivus | g3324.t1 | WormBase gene model | Yes | 19.8 | |
| B. xylophilus | BUX.s00252.110 | WormBase gene model | Yes | 30.2 | |
| M. incognita | K03E6.4_ortholog | - | No | 14.5 | 200bp from end of contig |
| M. hapla | MhA1_Contig29.frz3.gene26/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 18.1, GeMoMa PID: 22.5, genBlastG PID: 20.2) |
| A. suum | GS_04113 | GeMoMa gene model | Yes | 36.3 | |
| D. immitis | nDi.2.2.2.t07590 | WormBase gene model | Yes | 29.2 | |
| O. volvulus | OVOC2661 | Manual gene model | Yes | 26.0 | |
| B. malayi | Bm4788 | GeMoMa gene model | Yes | 28.3 | No RNA-seq data for first intron, but first 100a.a. are conserved |
| L. loa | EFO17797.1 | WormBase gene model | Yes | 28.3 | |
| T. spiralis | EFV61002 | WormBase gene model | Yes | 28.2 | |
| T. suis | K03E6.4_ortholog | Manual gene model | Yes | 22.2 | |

[Figure: multiple sequence alignment of mksr-1 orthologs]

Figure 3.30: Multiple sequence alignment of first 100a.a. of mksr-1 orthologs. Among 25 nematode genomes, 23 mksr-1 orthologs are found, and 2 are not found.

3.3.24 Curation of mksr-2 orthologs in nematodes

MKSR-2 forms a complex with MKS-1 and MKSR-1 at the basal body (Bialas et al., 2009). We identified mksr-2 orthologs in 24 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.31).

Table 3.25: Curation of mksr-2 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
| C. remanei | CRE02604 | WormBase gene model | Yes | 90.0 | |
| C. tropicalis | Csp11.Scaffold629.g10774.t1 | WormBase gene model | Yes | 88.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN13357 | WormBase gene model | Yes | 93.0 | |
| C. sinica | Y38F2AL.2_ortholog | genBlastG gene model | Yes | 89.0 | No RNA-seq data, but first 100a.a. are conserved; small gap 400bp upstream; 3’ end of gene truncated due to end of contig |
| C. briggsae | CBG16827 | WormBase gene model | Yes | 92.0 | |
| C. elegans | Y38F2AL.2 | - | - | - | |
| C. japonica | CJA12789 | genBlastG gene model | Yes | 83.0 | |
| C. angaria | Y38F2AL.2_ortholog | - | No | 53.8 | No RNA-seq data for first intron, first 100a.a. partially conserved, and gene does not begin with M |
| H. bacteriophora | Y38F2AL.2_ortholog | genBlastG gene model | Yes | 66.7 | No RNA-seq data, but first 100a.a. are conserved; existing gene models at this locus but few shared exons |
| H. contortus | HCOI01212300.t1 | WormBase gene model | Yes | 74.0 | |
| A. ceylanicum | Acey_s0078.g1164.t3 | WormBase gene model | Yes | 70.0 | |
| N. americanus | NECAME_00498 | GeMoMa gene model | No | 72.0 | Upstream exons suggested by RNA-seq; gap 200bp upstream of gene |
| P. pacificus | PPA17894 | genBlastG gene model | Yes | 50.0 | |
| P. exspectatus | scaffold21-EXSNAP2012.24 | genBlastG gene model | Yes | 53.0 | No RNA-seq data, but first 100a.a. are conserved |
| S. ratti | SRAE_1000289400 | WormBase gene model | Yes | 54.5 | |
| P. redivivus | g14319.t1 | WormBase gene model | Yes | 59.4 | |
| B. xylophilus | Y38F2AL.2_ortholog | genBlastG gene model | Yes | 62.4 | Different first exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved, 5’ end of gene can’t be extended due to stop codons) |
| M. incognita | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 16, other predictions not found) |
| M. hapla | MhA1_Contig1793.frz3.gene1 | Manual gene model | Yes | 56.9 | No RNA-seq data, but first 100a.a. are conserved |
| A. suum | GS_07221 | genBlastG gene model | Yes | 61.0 | No RNA-seq data for this gene, but first 100a.a. are conserved |
| D. immitis | nDi.2.2.2.t01057 | genBlastG gene model | Yes | 57.0 | |
| O. volvulus | OVOC4679 | WormBase gene model | Yes | 57.0 | |
| B. malayi | Bm6433 | WormBase gene model | Yes | 56.0 | |
| L. loa | EFO24840.1 | WormBase gene model | Yes | 58.0 | |
| T. spiralis | EFV61254 | GeMoMa gene model | Yes | 53.0 | |
| T. suis | Y38F2AL.2_ortholog | GeMoMa gene model | Yes | 50.0 | Different first exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved) |

[Figure: multiple sequence alignment of mksr-2 orthologs]
69 T IQGWPS ISLQ IWHYDEFGRQELYGYGS I YLP - 100 l_loa_wormbase_EFO24840.1 69 T IQGWPSVSLQ IWHYDEFGRQELYGYGS I YLP - 100 o_volvulus_wormbase_OVOC4679 69 T IQGWPS ISLQ IWHYDEFGRQELYGYGS I YLP - 100 p_redivivus_wormbase_g14319.t1 68 T IQGWPR IN IEVWHHDQYGRQEVYGYGTAF IPS 100 c_japonica_genblastg_CJA12789 69 T IQGWPRLLFQVWHHDGHGRQE IAGYGTLLLP - 100 c_tropicalis_wormbase_Csp11.Scaffold629.g10774.t1 69 S IQGWPRLLLQVWHHDAYGRQE IAGYGT I LLP - 100 c_sinica_genblastg_Y38F2AL.2_ortholog 69 S IQGWPRLLLQVWHHDNYGRQE IAGYGTLLLP - 100 c_elegans_Y38F2AL.2 69 S IQGWPRLLLQ IWHHDNYGRQE IAGYGTLLLP - 100 c_remanei_wormbase_CRE02604 69 S IQGWPRLLLQVWHHDDYGRQE IAGYGTLLLP - 100 c_brenneri_wormbase_CBN13357 69 S IQGWPRLLLQVWHHDNYGRQE IAGYGTLLLP - 100 c_briggsae_wormbase_CBG16827 69 S IQGWPRLL IQ IWHHDNYGRQE IAGYGTLLLP - 100 a_suum_genblastg_GS_07221 69 TMQGWPR IN IQVWHHDEFGRQELYGYGSTF IP - 100 h_bacteriophora_genblastg_Y38F2AL.2_ortholog 68 T IQGWPR IH IQVWHHDVYGRQELYGYGSVF IPS 100 a_ceylanicum_wormbase_Acey_s0078.g1164.t3 69 TVQGWPRMQLQVWHHDVYGRQELVGYGSLF LP - 100 h_contortus_wormbase_HCOI01212300.t1 69 T IQGWPR IQLQVWHHD I YGRQELVGYGSLFLP - 100 n_americanus_gemoma_Y38F2AL.2_R0 69 T IQGWPR IQLQVWHHDVYGRQELVGYGSLFLP - 100

Figure 3.31: Multiple sequence alignment of first 100a.a. of mksr-2 orthologs. Among 25 nematode genomes, 24 mksr-2 orthologs are found, and 1 is not found.
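The PID (first 100a.a.) column in the curation tables reports a percent identity over the first 100 amino acids. As an illustration only, the sketch below computes percent identity for a pair of already-aligned sequences, counting identical residues over columns where neither sequence has a gap. The function name `percent_identity`, the gap-handling rule, and the toy sequences are assumptions made for this example; the exact definition used for the tabulated values is not restated in this section and may differ.

```python
def percent_identity(ref: str, query: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns where either sequence has a gap are ignored,
    and identity is reported relative to the remaining columns.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for r, q in zip(ref, query):
        if r == gap or q == gap:
            continue
        compared += 1
        if r == q:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy alignment (not taken from the figures).
toy_ref = "MAEVFVSG-QIL"
toy_query = "MAEVHVIGEQI-"
print(percent_identity(toy_ref, toy_query))
```

Other conventions (for example, dividing by the full alignment length or by the length of the reference) give different values for the same alignment, so the denominator should be stated whenever PID values are compared across tools.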

3.3.25 Curation of nphp-2 orthologs in nematodes

nphp-2 interacts with MKS and MKSR genes and affects the placement of transition zones of cilia (Warburton-Pitt et al., 2012). We identified nphp-2 orthologs in 22 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.32).

Table 3.26: Curation of nphp-2 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE18553 | WormBase gene model | Yes | 62.4 |
C. tropicalis | Csp11.Scaffold629.g13078.t1+Csp11.Scaffold629.g13077.t1 | WormBase gene model | Yes | 58.3 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN15509 | WormBase gene model | Yes | 52.9 |
C. sinica | Csp5_scaffold_00670.g14229.t1 | WormBase gene model | Yes | 61.4 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG23685 | GeMoMa gene model | No | 34.6 | Gaps in first intron and promoter
C. elegans | Y32G9A.6 | - | - | - |
C. japonica | CJA19273 | WormBase gene model | Yes | 48.5 |
C. angaria | Cang_2012_03_13_00038.g2131.t1 | GeMoMa gene model | Yes | 35.0 | No RNA-seq data for first intron, but first 100a.a. are conserved
H. bacteriophora | Hba_17198+Hba_17197+Hba_17196 | GeMoMa gene model | Yes | 37.1 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI00083300.t1 | - | No | 16.3 | 600bp from end of contig
H. contortus | HCOI01612000.t1 | - | No | 11.1 | 900bp from end of contig
A. ceylanicum | Acey_s0502.g2614.t4 | Manual gene model | Yes | 32.1 |
N. americanus | NECAME_00328+NECAME_00329 | Manual gene model | Yes | 41.3 |
P. pacificus | PPA32962+PPA18953 | GeMoMa gene model | Yes | 34.0 |
P. exspectatus | Y32G9A.6_ortholog | genBlastG gene model | Yes | 31.2 | No RNA-seq data, but first 100a.a. are conserved; existing gene models at this locus but few shared exons
S. ratti | SRAE_0000035300 | WormBase gene model | Yes | 19.8 | First 100a.a. partially conserved
P. redivivus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 21.5, genBlastG PID: 17.8)
B. xylophilus | BUX.s00110.56 | Manual gene model | Yes | 34.3 | Gap 600bp upstream
M. incognita | Minc08613 | WormBase gene model | Yes | 30.4 | Gap 300bp upstream
M. hapla | MhA1_Contig276.frz3.gene1 | WormBase gene model | Yes | 29.1 | No RNA-seq data, first 100 a.a. partially conserved
A. suum | GS_18064 | - | No | 15.6 | Gap in 5’ end of alignment
D. immitis | nDi.2.2.2.t07466 | WormBase gene model | Yes | 29.9 | Gap 1.2kb upstream
O. volvulus | OVOC7461 | WormBase gene model | Yes | 30.1 |
B. malayi | Bm6351 | - | No | 20.2 | Gap in 5’ end of alignment; gene doesn’t begin with ATG; short contig
L. loa | EFO15325.1+EJD73916.1 | WormBase gene model | Yes | 31.4 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved
T. spiralis | EFV61571/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 22.1, GeMoMa PID: 14.4, genBlastG PID: 12.6)
T. suis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 23.5, genBlastG PID: 15.7); existing gene models at this locus supported by RNA-seq

[Figure 3.32 image: multiple sequence alignment of the first 100a.a. of nphp-2 orthologs]

Figure 3.32: Multiple sequence alignment of first 100a.a. of nphp-2 orthologs. Among 25 nematode genomes, 23 nphp-2 orthologs are found, and 3 are not found. Note: H. contortus contains two nphp-2 genes, and neither gene has a high confidence 5’ start site.
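The counts reported for nphp-2 distinguish genomes from orthologs, because H. contortus contributes two genes. The sketch below reproduces that bookkeeping from the rows of Table 3.26, assuming each row is encoded as a (species, gene) pair with `None` standing for "No ortholog found". The function name `tally_orthologs` and the record layout are hypothetical and chosen only for illustration; the C. elegans reference row is excluded.

```python
def tally_orthologs(records):
    """Summarize ortholog counts per genome.

    records: iterable of (species, gene_id) pairs; gene_id is None
    when no ortholog was found for that species (assumed encoding).
    """
    per_species = {}
    for species, gene_id in records:
        per_species.setdefault(species, 0)
        if gene_id is not None:
            per_species[species] += 1
    return {
        "genomes": len(per_species),
        "species_with_ortholog": sum(1 for n in per_species.values() if n > 0),
        "orthologs": sum(per_species.values()),
        "species_without_ortholog": sum(1 for n in per_species.values() if n == 0),
    }


# Rows of Table 3.26 (nphp-2), excluding the C. elegans reference.
nphp2_records = [
    ("C. remanei", "CRE18553"),
    ("C. tropicalis", "Csp11.Scaffold629.g13078.t1+Csp11.Scaffold629.g13077.t1"),
    ("C. brenneri", "CBN15509"),
    ("C. sinica", "Csp5_scaffold_00670.g14229.t1"),
    ("C. briggsae", "CBG23685"),
    ("C. japonica", "CJA19273"),
    ("C. angaria", "Cang_2012_03_13_00038.g2131.t1"),
    ("H. bacteriophora", "Hba_17198+Hba_17197+Hba_17196"),
    ("H. contortus", "HCOI00083300.t1"),
    ("H. contortus", "HCOI01612000.t1"),
    ("A. ceylanicum", "Acey_s0502.g2614.t4"),
    ("N. americanus", "NECAME_00328+NECAME_00329"),
    ("P. pacificus", "PPA32962+PPA18953"),
    ("P. exspectatus", "Y32G9A.6_ortholog"),
    ("S. ratti", "SRAE_0000035300"),
    ("P. redivivus", None),
    ("B. xylophilus", "BUX.s00110.56"),
    ("M. incognita", "Minc08613"),
    ("M. hapla", "MhA1_Contig276.frz3.gene1"),
    ("A. suum", "GS_18064"),
    ("D. immitis", "nDi.2.2.2.t07466"),
    ("O. volvulus", "OVOC7461"),
    ("B. malayi", "Bm6351"),
    ("L. loa", "EFO15325.1+EJD73916.1"),
    ("T. spiralis", None),
    ("T. suis", None),
]

nphp2_summary = tally_orthologs(nphp2_records)
print(nphp2_summary)
```

Under this encoding the summary gives 25 genomes, 22 species with at least one ortholog, 23 orthologs in total, and 3 species without an ortholog, matching the text and the caption of Figure 3.32. Low-confidence gene models are still counted as found, which is the convention the caption appears to follow.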

3.3.26 Curation of odr-4 orthologs in nematodes

odr-4 is expressed in some ciliated neurons (10 amphids and 2 phasmids) and is responsible for the localization of transmembrane odorant receptors to cilia (Dwyer et al., 1998; Efimenko et al., 2005). We identified odr-4 orthologs in 18 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.33).

Table 3.27: Curation of odr-4 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE25369 | genBlastG gene model | Yes | 89.0 |
C. tropicalis | Csp11.Scaffold629.g14469.t1 | WormBase gene model | Yes | 93.0 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN21485 | WormBase gene model | Yes | 91.0 |
C. sinica | Csp5_scaffold_03070.g28966.t1 | genBlastG gene model | Yes | 92.0 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG16563 | genBlastG gene model | Yes | 91.0 |
C. elegans | Y102E9.1a | - | - | - |
C. japonica | CJA42024 | - | No | 4.0 | Gap in 5’ end of alignment
C. angaria | Cang_2012_03_13_00036.g2039.t1 | GeMoMa gene model | Yes | 78.0 | Gap 400bp upstream
H. bacteriophora | Hba_07664 | GeMoMa gene model | Yes | 47.5 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI01274200.t1 | - | No | 7.1 | Short contig
H. contortus | HCOI01911600.t1 | - | No | 13.7 | Gap in 5’ end of alignment
A. ceylanicum | Acey_s0371.g136.t1 | genBlastG gene model | Yes | 45.5 |
N. americanus | NECAME_09322 | genBlastG gene model | Yes | 46.2 |
P. pacificus | Y102E9.1a_ortholog | GeMoMa gene model | No | 27.6 | Gap in 5’ end of alignment
P. exspectatus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 19.1, genBlastG PID: 12.7)
S. ratti | SRAE_1000221300 | WormBase gene model | No | 15.9 | Gaps in 5’ end of alignment
P. redivivus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 16.1, other predictions not found)
B. xylophilus | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 15.9, other predictions not found)
M. incognita | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 15.5, other predictions not found)
M. hapla | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 12.9, other predictions not found)
A. suum | GS_10493 | WormBase gene model | Yes | 37.6 |
D. immitis | nDi.2.2.2.t09789 | Manual gene model | Yes | 34.7 |
O. volvulus | OVOC9544 | WormBase gene model | Yes | 34.7 |
B. malayi | Bm7218 | WormBase gene model | Yes | 34.8 |
L. loa | EFO20314.1 | WormBase gene model | Yes | 32.7 |
T. spiralis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 22.1, genBlastG PID: 22.1)
T. suis | M514_02139/No ortholog found | - | - | - | Low sequence similarity (WormBase PID: 20.3, GeMoMa PID: 12.3)

[Figure 3.33 image: multiple sequence alignment of the first 100a.a. of odr-4 orthologs]

Figure 3.33: Multiple sequence alignment of first 100a.a. of odr-4 orthologs. Among 25 nematode genomes, 19 odr-4 orthologs are found, and 7 are not found. Note: H. contortus contains two odr-4 genes, and neither gene has a high confidence 5’ start site.

3.3.27 Curation of osm-1 orthologs in nematodes

osm-1 is an IFT-B component and is required for proper cilium assembly; osm-1 mutants have shortened cilia (Cole et al., 1998; Qin et al., 2001). We identified osm-1 orthologs in 25 nematode species, and the first 100 a.a. of these orthologs are well-conserved (Figure 3.34).

Table 3.28: Curation of osm-1 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE27648 | genBlastG gene model | Yes | 88.0 |
C. tropicalis | Csp11.Scaffold629.g13331.t1 | WormBase gene model | Yes | 86.0 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN13813 | WormBase gene model | Yes | 75.0 |
C. sinica | Csp5_scaffold_00001.g63.t2 | WormBase gene model | Yes | 85.0 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG16355 | WormBase gene model | Yes | 87.0 |
C. elegans | T27B1.1 | - | - | - |
C. japonica | CJA16618 | genBlastG gene model | Yes | 83.0 |
C. angaria | Cang_2012_03_13_00288.g8551.t1 | GeMoMa gene model | Yes | 77.0 |
H. bacteriophora | T27B1.1_ortholog | GeMoMa gene model | Yes | 57.0 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI00970500.t1 | WormBase gene model | Yes | 53.0 |
H. contortus | HCOI00735000.t1 | WormBase gene model | Yes | 53.0 |
A. ceylanicum | Acey_s0132.g1691.t4 | WormBase gene model | Yes | 53.0 |
N. americanus | NECAME_01504+NECAME_01507 | GeMoMa gene model | Yes | 54.0 | One alternative intron suggested by RNA-seq, but first 100a.a. are conserved
P. pacificus | PPA24556+PPA24554 | WormBase gene model | Yes | 52.0 |
P. exspectatus | scaffold355-EXSNAP2012.6 | WormBase gene model | Yes | 53.0 | No RNA-seq data, but first 100a.a. are conserved
S. ratti | SRAE_2000448100 | WormBase gene model | Yes | 48.0 | No RNA-seq data for this gene, but first 100a.a. are conserved
P. redivivus | g1247.t1 | genBlastG gene model | Yes | 49.0 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved
B. xylophilus | BUX.s00460.434 | WormBase gene model | Yes | 40.0 |
M. incognita | Minc00536 | WormBase gene model | Yes | 43.0 |
M. incognita | Minc02229 | WormBase gene model | Yes | 43.0 |
M. hapla | MhA1_Contig239.frz3.gene3 | WormBase gene model | Yes | 43.0 | No RNA-seq data, but first 100a.a. are conserved
A. suum | GS_03035 | genBlastG gene model | Yes | 52.0 | No RNA-seq data for first intron, but first 100a.a. are conserved
D. immitis | nDi.2.2.2.t04646 | WormBase gene model | Yes | 53.0 |
O. volvulus | OVOC2219 | WormBase gene model | Yes | 52.0 |
B. malayi | Bm12734 | - | No | 6.9 | Short contig
L. loa | EFO14266.1+EJD74502.1 | GeMoMa gene model | Yes | 56.0 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved
T. spiralis | EFV57060 | WormBase gene model | Yes | 44.0 |
T. suis | M514_12468 | WormBase gene model | Yes | 39.0 |

[Figure 3.34 image: multiple sequence alignment of the first 100a.a. of osm-1 orthologs]

Figure 3.34: Multiple sequence alignment of first 100a.a. of osm-1 orthologs. Among 25 nematode genomes, 27 osm-1 orthologs are found, and none are missing. Note: H. contortus and M. incognita each contain two osm-1 genes, and all of these genes have high confidence 5’ start sites.

3.3.28 Curation of osm-5 orthologs in nematodes

osm-5 is an IFT-B component, and osm-5 mutants have truncated cilia and male-mating defects (Qin et al., 2001). We identified osm-5 orthologs in 24 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.35).

Table 3.29: Curation of osm-5 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE07525 | WormBase gene model | Yes | 91.1 |
C. tropicalis | Csp11.Scaffold582.g4693.t1+Csp11.Scaffold582.g4694.t1 | genBlastG gene model | Yes | 86.1 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN01720 | WormBase gene model | Yes | 78.5 |
C. sinica | Csp5_scaffold_00652.g14034.t1 | WormBase gene model | Yes | 83.3 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG02013 | genBlastG gene model | Yes | 83.2 |
C. elegans | Y41G9A.1 | - | - | - |
C. japonica | CJA09762 | genBlastG gene model | Yes | 81.2 |
C. angaria | Cang_2012_03_13_00320.g9150.t1 | GeMoMa gene model | Yes | 62.2 | Small gap in assembly at 5’ end of gene; gene doesn’t begin with ATG
H. bacteriophora | Hba_17018 | GeMoMa gene model | Yes | 46.4 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI00077900.t1 | - | No | 8.1 | 500bp from end of contig; gap in 5’ end of alignment
A. ceylanicum | Acey_s0036.g3288.t1+Acey_s0036.g3286.t1 | WormBase gene model | Yes | 53.8 |
N. americanus | NECAME_13555 | - | No | 14.1 | Gap <200bp upstream
P. pacificus | Y41G9A.1_ortholog | GeMoMa gene model | Yes | 28.5 |
P. exspectatus | Y41G9A.1_ortholog | - | No | 15.8 | 400bp from end of contig; gap in 5’ end of alignment
S. ratti | SRAE_2000236500 | WormBase gene model | Yes | 34.8 |
P. redivivus | g18596.t1 | GeMoMa gene model | Yes | 36.3 |
B. xylophilus | BUX.s01198.126 | WormBase gene model | Yes | 39.6 |
M. incognita | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 13.4, genBlastG PID: 7.3)
M. hapla | MhA1_Contig1998.frz3.gene5+MhA1_Contig1998.frz3.gene6 | WormBase gene model | Yes | 26.9 | No RNA-seq data, first 100 a.a. partially conserved
A. suum | GS_05151 | GeMoMa gene model | Yes | 29.4 | Upstream exon suggested by RNA-seq, but low coverage
D. immitis | nDi.2.2.2.t06530 | Manual gene model | Yes | 31.6 |
O. volvulus | OVOC1224 | WormBase gene model | Yes | 38.9 | Gap ~2.5kb upstream
B. malayi | Bm6526 | WormBase gene model | Yes | 33.0 |
L. loa | EFO13552.1+EJD75140.1 | WormBase gene model | Yes | 35.3 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved
T. spiralis | EFV52609 | Manual gene model | Yes | 28.0 |
T. suis | Y41G9A.1_ortholog | Manual gene model | Yes | 26.0 |

[Figure 3.35 image: multiple sequence alignment of the first 100a.a. of osm-5 orthologs]

Figure 3.35: Multiple sequence alignment of first 100a.a. of osm-5 orthologs. Among 25 nematode genomes, 24 osm-5 orthologs are found, and 1 is not found.

3.3.29 Curation of osm-6 orthologs in nematodes

osm-6 encodes an IFT-B component, and osm-6 mutants have truncated cilia and male-mating defects (Cole et al., 1998). We identified osm-6 orthologs in 25 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.36).
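As an illustration of how a value such as the PID over the first 100 a.a. could be computed, the following is a minimal Python sketch. It assumes that PID is the percentage of reference residues matched by an identical residue in the aligned ortholog, counted over the alignment columns that cover the first 100 reference residues. The exact definition behind the tabulated values is not restated in this section, so this definition, the function name `percent_identity`, and the toy sequences are all hypothetical.

```python
def percent_identity(ref_aln: str, query_aln: str, n_residues: int = 100) -> float:
    """Percent identity over the first n_residues of the reference.

    Both inputs are rows of the same alignment (equal length, '-' for gaps).
    Assumption: columns where the reference has a gap are ignored, and a gap
    in the query opposite a reference residue counts as a mismatch.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    ref_seen = 0
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_seen >= n_residues:
            break  # the window of reference residues is complete
        if ref_char == "-":
            continue  # insertion relative to the reference: not counted
        ref_seen += 1
        if ref_char == query_char:
            matches += 1

    if ref_seen == 0:
        return 0.0
    return 100.0 * matches / ref_seen


# Toy aligned pair (not taken from the thesis data).
toy_ref = "MA--NS"
toy_query = "MAQQNT"
toy_pid = percent_identity(toy_ref, toy_query)
print(f"PID over reference residues: {toy_pid:.1f}")
```

Counting against reference residues rather than alignment columns keeps the denominator fixed at 100 for every ortholog, which makes values comparable across species under this assumed definition.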

Table 3.30: Curation of osm-6 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE27341 | WormBase gene model | Yes | 88.0 | |
| C. tropicalis | Csp11.Scaffold629.g7776.t2 | WormBase gene model | Yes | 84.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN06040 | WormBase gene model | Yes | 91.0 | |
| C. brenneri | CBN29520 | WormBase gene model | Yes | 76.9 | No RNA-seq data for first intron, but first 100a.a. are conserved |
| C. sinica | Csp5_scaffold_00230.g7569.t1 | WormBase gene model | Yes | 87.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG23329 | WormBase gene model | Yes | 87.0 | |
| C. elegans | R31.3 | - | - | - | |
| C. japonica | CJA07647 | WormBase gene model | Yes | 87.0 | |
| C. angaria | Cang_2012_03_13_00330.g9306.t1 | genBlastG gene model | Yes | 75.5 | Small gap ~30bp upstream; protein sequence contains some Xs near 5’ end but is otherwise conserved |
| H. bacteriophora | Hba_19163 | GeMoMa gene model | Yes | 33.6 | No RNA-seq data, but first 100a.a. are conserved |
| H. contortus | HCOI00937900.t1 | WormBase gene model | Yes | 30.4 | Upstream exon suggested by RNA-seq, but low coverage |
| A. ceylanicum | Acey_s0046.g1421.t1 | WormBase gene model | Yes | 29.2 | |
| N. americanus | NECAME_01226 | GeMoMa gene model | Yes | 29.2 | |
| P. pacificus | PPA16989 | GeMoMa gene model | Yes | 39.4 | |
| P. exspectatus | scaffold50-EXSNAP2012.42 | GeMoMa gene model | Yes | 42.2 | No RNA-seq data, but first 100a.a. are conserved |
| S. ratti | SRAE_0000005500 | WormBase gene model | Yes | 29.1 | |
| P. redivivus | g8491.t1 | WormBase gene model | Yes | 31.1 | Upstream exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved) |
| B. xylophilus | BUX.s01254.30 | WormBase gene model | Yes | 31.2 | |
| M. incognita | Minc10419a | WormBase gene model | Yes | 28.4 | |
| M. incognita | Minc08072a | WormBase gene model | Yes | 29.3 | |
| M. hapla | MhA1_Contig183.frz3.gene3 | Manual gene model | Yes | 26.6 | No RNA-seq data, first 100 a.a. partially conserved |
| A. suum | GS_02705 | Manual gene model | Yes | 32.7 | |
| D. immitis | nDi.2.2.2.t06884 | WormBase gene model | Yes | 26.5 | |
| O. volvulus | OVOC5860 | WormBase gene model | Yes | 20.0 | RNA-seq suggests possible alternative splicing upstream |
| B. malayi | Bm5399 | Manual gene model | Yes | 26.4 | Gap 1.6kb upstream |
| L. loa | EFO21229.1 | WormBase gene model | Yes | 20.5 | No RNA-seq data for first intron, but first 100a.a. are conserved |
| T. spiralis | EFV53522 | WormBase gene model | Yes | 19.3 | Upstream exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved) |
| T. suis | M514_08751 | - | No | 5.9 | Conflicting RNA-seq junctions |

Figure 3.36: Multiple sequence alignment of first 100a.a. of osm-6 orthologs. Among 25 nematode genomes, 27 osm-6 orthologs are found, and every genome contains at least one ortholog. Note: C. brenneri and M. incognita contain two osm-6 genes, and all of these genes have high confidence 5’ start sites.
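The ortholog count can exceed the number of genomes because some species carry more than one copy. A minimal Python sketch of how such species could be flagged from a curation table is given below. The (species, gene) pairs are a subset copied from Table 3.30, while the list layout and the helper name `species_with_multiple_copies` are hypothetical and not part of the curation pipeline described here.

```python
from collections import Counter

# Subset of (species, gene name) pairs from Table 3.30.
osm6_orthologs = [
    ("C. remanei", "CRE27341"),
    ("C. brenneri", "CBN06040"),
    ("C. brenneri", "CBN29520"),
    ("C. briggsae", "CBG23329"),
    ("M. incognita", "Minc10419a"),
    ("M. incognita", "Minc08072a"),
    ("M. hapla", "MhA1_Contig183.frz3.gene3"),
]


def species_with_multiple_copies(pairs):
    """Return the sorted species names that appear with more than one gene."""
    counts = Counter(species for species, _gene in pairs)
    return sorted(name for name, n in counts.items() if n > 1)


duplicated = species_with_multiple_copies(osm6_orthologs)
print(duplicated)
```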

3.3.30 Curation of osm-12 orthologs in nematodes

osm-12 (also named bbs-7) is a component of the BBSome, and is implicated in Bardet-Biedl Syndrome type 7. bbs-7 is required along with bbs-8 to stabilize IFT particles; in bbs-7/osm-12 and bbs-8 mutants, IFT-A and IFT-B move along the axoneme separately (Ou et al., 2005a). We identified osm-12 orthologs in 24 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.37).

Table 3.31: Curation of osm-12 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE24399 | GeMoMa gene model | Yes | 90.0 | |
| C. tropicalis | Csp11.Scaffold629.g14540.t1+Csp11.Scaffold629.g14538.t1 | GeMoMa gene model | Yes | 90.0 | No RNA-seq data, but first 100a.a. are conserved |
| C. brenneri | CBN08169 | WormBase gene model | Yes | 92.0 | |
| C. sinica | Csp5_scaffold_02761.g27836.t1+Csp5_scaffold_02761.g27838.t1 | GeMoMa gene model | Yes | 87.0 | No RNA-seq data, but first 100a.a. are conserved; 800bp from end of contig |
| C. briggsae | CBG23043 | GeMoMa gene model | Yes | 81.0 | |
| C. elegans | Y75B8A.12 | - | - | - | |
| C. japonica | CJA18153a+CJA39931a | WormBase gene model | Yes | 90.0 | |
| C. angaria | Cang_2012_03_13_00227.g7406.t1 | GeMoMa gene model | Yes | 35.2 | No RNA-seq data for first intron, but first 100a.a. are conserved except for small gap in assembly |
| H. bacteriophora | Hba_10053 | genBlastG gene model | Yes | 48.2 | No RNA-seq data, but first 100a.a. are conserved |
| H. contortus | HCOI02168000.t1 | Manual gene model | Yes | 54.2 | 3.5kb from end of contig |
| H. contortus | HCOI00752400.t1 | Manual gene model | Yes | 54.2 | |
| A. ceylanicum | Acey_s0080.g1357.t1 | WormBase gene model | Yes | 52.3 | |
| N. americanus | NECAME_17284 | - | No | 1.6 | Gap in 5’ end of alignment |
| P. pacificus | PPA23445 | WormBase gene model | Yes | 42.6 | No RNA-seq data for first intron, but first 100a.a. are conserved; promoter contains Ns |
| P. exspectatus | Y75B8A.12_ortholog | genBlastG gene model | Yes | 29.9 | No RNA-seq data, first 100a.a. partially conserved |
| S. ratti | SRAE_2000165500 | WormBase gene model | Yes | 32.1 | |
| P. redivivus | g13890.t1 | WormBase gene model | Yes | 34.7 | |
| B. xylophilus | BUX.s01281.306 | WormBase gene model | Yes | 38.1 | Upstream exon suggested by RNA-seq, but low coverage |
| M. incognita | Minc15799 | WormBase gene model | No | 17.7 | First 100a.a. partially conserved |
| M. hapla | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 19, genBlastG PID: 19.8) |
| A. suum | GS_17803+GS_20345+GS_15251 | GeMoMa gene model | Yes | 40.2 | No RNA-seq data for this gene, but first 100a.a. are conserved; small gap 2kb upstream |
| D. immitis | Y75B8A.12_ortholog | GeMoMa gene model | No | 11.3 | End of contig |
| O. volvulus | OVOC7790 | WormBase gene model | Yes | 39.3 | Different first exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved, 5’ end of gene can’t be extended due to stop codons) |
| B. malayi | Bm7137 | WormBase gene model | Yes | 39.6 | |
| L. loa | EFO21233.2 | WormBase gene model | Yes | 35.8 | Sparse RNA-seq data for this gene, but first 100a.a. are conserved |
| T. spiralis | EFV59623 | Manual gene model | Yes | 38.7 | |
| T. suis | Y75B8A.12_ortholog | Manual gene model | No | 34.9 | First 100a.a. partially conserved; some conflicting RNA-seq introns |

Figure 3.37: Multiple sequence alignment of first 100a.a. of osm-12 orthologs. Among 25 nematode genomes, 25 osm-12 orthologs are found, and 1 is not found. Note: H. contortus contains two osm-12 genes, and both genes have high confidence 5’ start sites.

3.3.31 Curation of tub-1 orthologs in nematodes

tub-1 is expressed in ciliated neurons and is required for the localization of G-protein-coupled receptors to the cilia (Brear et al., 2014). We identified tub-1 orthologs in 24 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figures 3.38 and 3.39).

Table 3.32: Curation of tub-1 orthologs

| Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment |
|---|---|---|---|---|---|
| C. remanei | CRE01622 | WormBase gene model | Yes | 69.2 | |
| C. tropicalis | Csp11.Scaffold630.g18509.t1 | - | No | 14.6 | Gap in 5’ end of alignment |
| C. brenneri | CBN02834 | WormBase gene model | Yes | 57.7 | |
| C. sinica | Csp5_scaffold_00584.g13160.t1 | WormBase gene model | Yes | 65.7 | No RNA-seq data, but first 100a.a. are conserved |
| C. briggsae | CBG00741 | WormBase gene model | Yes | 63.8 | |
| C. elegans | F10B5.4 | - | - | - | |
| C. japonica | CJA16333 | WormBase gene model | Yes | 39.7 | |
| C. angaria | Cang_2012_03_13_00354.g9640.t1+Cang_2012_03_13_00354.g9639.t1 | genBlastG gene model | Yes | 38.9 | 2bp gap 200bp upstream |
| C. angaria | Cang_2012_03_13_00354.g9653.t4 | genBlastG gene model | Yes | 38.9 | |
| H. bacteriophora | Hba_21059 | WormBase gene model | Yes | 27.0 | No RNA-seq data, but first 100a.a. are conserved |
| H. contortus | HCOI02132100.t1 | WormBase gene model | Yes | 32.1 | |
| A. ceylanicum | Acey_s0544.g3235.t1 | - | No | 3.9 | End of contig |
| N. americanus | NECAME_14538 | - | No | 6.6 | Gap in 5’ end of alignment; several gaps in assembly upstream of gene |
| P. pacificus | PPA12804 | - | No | 6.9 | Gap in 5’ end of alignment; gap in assembly <100bp upstream of genBlastG prediction |
| P. exspectatus | scaffold223-EXSNAP2012.30 | - | No | 8.7 | Gap in 5’ end of alignment; gap in assembly ~200bp upstream of GeMoMa prediction |
| S. ratti | SRAE_X000013100 | WormBase gene model | Yes | 25.2 | |
| P. redivivus | g22223.t1 | WormBase gene model | Yes | 27.6 | |
| B. xylophilus | BUX.s01281.234 | Manual gene model | No | 19.8 | Gaps in 5’ end of alignment |
| M. incognita | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 22.1, genBlastG PID: 9.7) |
| M. hapla | MhA1_Contig1803.frz3.gene4 | WormBase gene model | No | 21.5 | No RNA-seq data, first 100 a.a. partially conserved |
| A. suum | GS_12735 | - | No | 1.6 | Short contig |
| D. immitis | nDi.2.2.2.t07763 | - | No | 4.7 | 100bp from end of contig |
| O. volvulus | OVOC11413 | WormBase gene model | Yes | 25.0 | Upstream exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved) |
| B. malayi | Bm3252 | - | No | 4.4 | Short contig |
| L. loa | EFO14140.1+EJD74798.1 | - | No | 20.8 | Gap ~400bp upstream of gene |
| T. spiralis | EFV51631 | WormBase gene model | Yes | 21.0 | Upstream exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved) |
| T. suis | M514_01275 | WormBase gene model | Yes | 20.5 | RNA-seq junctions on both strands |

Figure 3.38: Multiple sequence alignment of first 100a.a. of tub-1 orthologs. Among 25 nematode genomes, 25 tub-1 orthologs are found, and 1 is not found. Note: C. angaria contains two tub-1 genes, and both genes have high confidence 5’ start sites.

o_volvulus_wormbase_OVOC11413 1 ------MMSNNKEWMKQRLEKQKKIMEEHLRQKRRLSSANVRPNDTSLMLMA------MERNYNKLCSYNGPLSCTSLEDP- 69 t_spiralis_wormbase_EFV51631 1 ------MDGESGSL---FEKQRRALSEKQRMKREKSTGMVVRTDVVREEA A------HLLTGTMKAVQATQDGSHMFN--- 63 t_suis_wormbase_M514_01275 1 ------MNERPDRL---LERQRRLMAERQLMRRGRSTGMVVRTDVPRDES THLISN-SIRTLEPVYDSHLLN------62 s_ratti_wormbase_SRAE_X000013100 1 ------MSNEQKEWINHNLRKQRRLLEERQRQKRLNTGVINSDKNCFGQ------SLFNAYSFNECKDKNNKELY---- 63 p_redivivus_wormbase_g22223.t1 1 ------MESNRAWVEENLKRQRVMLEEKQRARRMQSAGLMRTTNHSA------PPIQGAGMTGMVPSSSYSFTHSAH 65 h_bacteriophora_wormbase_Hba_21059 1 ------MSDANADWVAQNLHRQRKILEEKQKQKRMASTG-IRCNLV------PTMTGRPPYPSTDPVS--FAAYPS 61 h_contortus_wormbase_HCOI02132100.t1 1 ------MSDGNADWVAQNLHRQRKILEEKQRQKRITSA-TIRSNQ------LPSSGYHTMNSSTGFG--FASPP- 59 c_angaria_genblastg_Cang_2012_03_13_00354.g9640.t1+Cang... 1 MSSSEPNSHTNAQWVERNLQRQRKMLEDKQRQKRFQVGGGVRANPTFGQSP ------SFMSDYSLYSSSSGNA------67 c_angaria_genblastg_Cang_2012_03_13_00354.g9653.t4 1 MSSSEPNSHTNAQWVERNLQRQRKMLEDKQRQKRFQVGGGVRANPTFGQSP ------SFMSDYSLYSSSSGNA------67 c_japonica_wormbase_CJA16333 1 ------MDSNAQWMSMNMQRQRKMLEEKQKQKRNQSVGSVRTTTTTTTTS LASGYSGNSMNYQPLFESALPFG--MTDSLS 73 c_brenneri_wormbase_CBN02834 1 ------MAEGNNPWIEQNLQRQRKMLEEKQKQKRHQSAGSVRTNSS--SMTM------NSMKDYPTFDTTAPSSFGFSDSLS 68 c_elegans_F10B5.4 1 ------MTDTNSQWIEMNLQRQRKMLEDKQKQKRHQSAGSVRTTST--AMSM------NSMKDYPTFDNSLPFS--ISDNSS 66 c_remanei_wormbase_CRE01622 1 ------MTDTNSQWIEQNLQRQRKMLEDKQKQKRHQSAGSVRTTST--AMS L------NSMKDYPSFDNANAFTSPSTDGVS 68 c_briggsae_wormbase_CBG00741 1 ------MSDTNSAWIEQNLQRQRKMLEDKQKQKRHQSAGSVRTTTTTSSMSM------NNMKDYPAFETSLPFS--MSDHTS 68 c_sinica_wormbase_Csp5_scaffold_00584.g13160.t1 1 ------MSDTNSAWIEQNLQRQRKMLEDKQKQKRHQSAGSVRTTTS-ASMSM------NSMKDYPTFDTSSPFS--MVDTPS 67 o_volvulus_wormbase_OVOC11413 70 ----DRISPIIP------SESPPGTVDSE----SHIPIAISDLTV-- 100 t_spiralis_wormbase_EFV51631 64 ----CYANPIVISDEE---ADEPPFCHSTSEQDGGTRQQQQYSD---- 100 t_suis_wormbase_M514_01275 
63 ----CYANPMQISDGE---DDDEPHIFGQEKRLNYHAVETGSQGA--- 100 s_ratti_wormbase_SRAE_X000013100 64 ----AYENPLTLSSDET--DAIQKTYITVKN---FTPPPVQSIPSD-- 100 p_redivivus_wormbase_g22223.t1 66 D---YVSQPFGD------PFSPASHSTSYGVVDKASMPAEPIPV--- 100 h_bacteriophora_wormbase_Hba_21059 62 A---CYDGPLAFGDP----DSTTPTVITVKGLTPTTSISETNGGED-- 100 h_contortus_wormbase_HCOI02132100.t1 60 ----CYDGPLSTYSDP---DATGPTLVTIPGIEYSQNTVTSRPTTSNK 100 c_angaria_genblastg_Cang_2012_03_13_00354.g9640.t1+Cang... 68 ----SMPFAIEP------ERITPTVVTIQ----STIPRLEPPPRHVP 100 c_angaria_genblastg_Cang_2012_03_13_00354.g9653.t4 68 ----SMPFAIEP------ERITPTVVTIQ----STIPRLEPPPRHVP 100 c_japonica_wormbase_CJA16333 74 SSTANINSPMTSA------PLNSSLN----LNSNPIL------100 c_brenneri_wormbase_CBN02834 69 --S-NMTIPLISAAASGGPPVAPPRAQSMT----RHSPP------100 c_elegans_F10B5.4 67 VSV-SMNTPLIPTQD----PIAQPRMQSMP----RQQPQQVQE----- 100 c_remanei_wormbase_CRE01622 69 ----NMNAPLIPTQA----PVAPPRVQSMT----TTRHNSIPQE---- 100 c_briggsae_wormbase_CBG00741 69 ----NMNTPLIPT------PQAPPRNHTL-----TTRQSSMPATDTL- 100 c_sinica_wormbase_Csp5_scaffold_00584.g13160.t1 68 ----NMNTPLIPA------PQAPPRSHSM-----TTRQSSLPPTETLI 100

Figure 3.39: Multiple sequence alignment of the first 100 a.a. of tub-1 orthologs, showing only genes with high-confidence 5’ start sites. This additional alignment shows the conserved regions more clearly by removing the noise caused by sequences without high-confidence 5’ start sites.

3.3.32 Curation of xbx-1 orthologs in nematodes

xbx-1 encodes a dynein light chain component that functions in retrograde IFT (Schafer et al., 2003). In xbx-1 mutants, cilia are truncated and IFT proteins such as OSM-5 accumulate in the ciliary tip (Schafer et al., 2003).

We identified xbx-1 orthologs in 23 nematode species, and the first 100 a.a. of these orthologs are fairly well-conserved (Figure 3.40).

Table 3.33: Curation of xbx-1 orthologs

Species | Gene Name | Annotation Used | High Confidence 5’ Start Site | PID (first 100a.a.) | Comment
C. remanei | CRE04951 | genBlastG gene model | Yes | 90.0 |
C. tropicalis | F02D8.3_ortholog | GeMoMa gene model | Yes | 92.0 | No RNA-seq data, but first 100a.a. are conserved
C. brenneri | CBN31685 | WormBase gene model | Yes | 91.0 |
C. sinica | Csp5_scaffold_00817.g15749.t1 | genBlastG gene model | Yes | 90.0 | No RNA-seq data, but first 100a.a. are conserved
C. briggsae | CBG11597 | WormBase gene model | Yes | 89.0 |
C. elegans | F02D8.3 | - | - | - |
C. japonica | CJA41183 | genBlastG gene model | Yes | 79.2 |
C. angaria | Cang_2012_03_13_01212.g15992.t1 | WormBase gene model | No | 0.0 | Gap 200bp upstream, 1.2kb from end of contig
H. bacteriophora | F02D8.3_ortholog | genBlastG gene model | Yes | 46.2 | No RNA-seq data, but first 100a.a. are conserved
H. contortus | HCOI00579500.t1 | WormBase gene model | Yes | 39.8 |
H. contortus | HCOI02107700.t1 | WormBase gene model | Yes | 38.5 | Gap 600bp upstream
A. ceylanicum | Acey_s0007.g3412.t1 | WormBase gene model | Yes | 39.3 |
N. americanus | F02D8.3_ortholog | Manual gene model | Yes | 37.6 |
P. pacificus | PPA21711 | WormBase gene model | Yes | 33.0 |
P. exspectatus | scaffold451-EXSNAP2012.6 | WormBase gene model | Yes | 32.1 | No RNA-seq data, first 100 a.a. partially conserved
S. ratti | SRAE_2000400300 | WormBase gene model | Yes | 33.3 |
P. redivivus | g7439.t1 | WormBase gene model | Yes | 28.2 |
B. xylophilus | BUX.s00713.1091 | WormBase gene model | Yes | 26.5 |
M. incognita | Minc16028+Minc16029 | Manual gene model | No | 24.8 | Gaps in 5’ end of alignment
M. incognita | Minc15725 | Manual gene model | No | 25.6 | Gaps in 5’ end of alignment
M. hapla | No ortholog found | - | - | | Low sequence similarity (GeMoMa PID: 13.8, genBlastG PID: 18.8)
A. suum | GS_16646 | GeMoMa gene model | Yes | 41.4 | No RNA-seq data for this gene, but first 100a.a. are conserved
D. immitis | nDi.2.2.2.t01309 | WormBase gene model | Yes | 34.3 | Upstream exon suggested by RNA-seq, but may be UTR (first 100a.a. are conserved)
O. volvulus | OVOC7272 | WormBase gene model | Yes | 31.5 | Upstream exon suggested by RNA-seq, but low coverage
O. volvulus | OVOC7345 | WormBase gene model | Yes | 31.5 | Upstream exon suggested by RNA-seq, but low coverage
B. malayi | Bm3149b | WormBase gene model | Yes | 37.7 |
L. loa | EFO16364.2 | WormBase gene model | Yes | 36.3 |
T. spiralis | EFV50050 | Manual gene model | Yes | 25.2 |
T. suis | No ortholog found | - | - | - | Low sequence similarity (GeMoMa PID: 11, other predictions not found)

c_angaria_wormbase_Cang_2012_03_13_01212.g15992.t1 1 MNCPPPAPPPPPPHVFRNAYPVAPQP--QYPIAQYPV-----APPM------SIPPNAYALPPAYPLPPRGGYAV 62
m_incognita_manual_Minc16028+Minc16029 1 ---MNQ--GSGNEK-QEEN------STDLNNTKS----IAGLD------DSTLIFCGAPNSGKSTLLS 46
m_incognita_manual_Minc15725 1 ---MNQ--GLGNEK-QEEN------STDLNNAKS----IAGLD------DSTLIFCGAPNSGKSTLLS 46
t_spiralis_manual_EFV50050 1 ----MDIFQLVEENSSKENKASSPIK--ASVRSDDEN------ESILILCGSKNSDKSTLVS 50
b_xylophilus_wormbase_BUX.s00713.1091 1 -MAFVDIFEKGIEA-LKKQKEESAQE--KDRFKDEFN------KIRTIFVCGMAKSGKTSFIN 53
s_ratti_wormbase_SRAE_2000400300 1 ----MNIWELAEQK-IQENKSKRNNN--NSKNDTYSF-----HKDP------QEPIESYIIIAGYPKTGKTTYLN 57
p_redivivus_wormbase_g7439.t1 1 ---MPDIFNLALQK-LREKDAQHQES--SDKIGSNFN------NSTGDSTIIICGSEDCGKSSLIQ 54
p_exspectatus_wormbase_scaffold451-EXSNAP2012.6 1 --MDVDLWDLAKKK-IEEKEEAKRKG--ETELAEEEE-----RTSW------KDERTLLICGGKSSGKTSTIL 57
p_pacificus_wormbase_PPA21711 1 --MDVDLWDLAKKK-IAEKEEAKRKG--ETELAEEEE-----RTSW------KDERTLLICGGKSSGKTSTIL 57
a_suum_gemoma_GS_16646 1 ---MVDIWTLAKER-LNASQRKL-----DMNVGDFLE-----ESKF------ERRHESYILICGSKNCGKSQMIL 55
b_malayi_wormbase_Bm3149b 1 ---MLDIWSLARQQYLEECEKRTVRQ--QDAINSYIN----LIGSD------LQTSDTRLIICGSRNCGKTSMIF 60
d_immitis_wormbase_nDi.2.2.2.t01309 1 ---MVDIWLLARQQYLEESQKRTARQ--YQNATSYINPAAIVAGSN------LQTSDTRLVVCGSKNCGKTSMIL 64
l_loa_wormbase_EFO16364.2 1 ---MLDIWSLARQQRLEECRKRVAQQQQQDTITPYNS---LIIGSD------LQASDTRLIICGSKNCGKTSMIL 63
o_volvulus_wormbase_OVOC7272 1 ---MLDIWSLARQQYLEESRKR------QQTATSYTN----LIGSD------SQTSDTRLVVCGSKNCGKTSVIL 56
o_volvulus_wormbase_OVOC7345 1 ---MLDIWSLARQQYLEESRKR------QQTATSYTN----LIGSD------SQTSDTRLVVCGSKNCGKTSVIL 56
c_japonica_genblastg_CJA41183 1 ----MNIWELSKQK-LAEYKQKAAEL--EQKLDDENG-----AGNDNNYLSEIRRRHESHIVIAGNRKSGKSSFQM 64
c_elegans_F02D8.3 1 ----MNIWDLAKQK-LVENKQRAAEL--KQKLDDENG-----TQSD-NYLSEVRRRHESHIIFAGNRKSGKSSFML 63
c_briggsae_wormbase_CBG11597 1 ----MNIWDLAKQK-LVENKQRAAEL--EQKLDDENG-----AVND-NYLAEVRRRHESHIIIAGNRKCGKSSFML 63
c_remanei_genblastg_CRE04951 1 ----MNIWDLAKQK-LVENKQRAAEL--EQKLDDENG-----VATD-NYLAEVRRRHESHIIIAGNRKCGKSSFML 63
c_sinica_genblastg_Csp5_scaffold_00817.g15749.t1 1 ----MNIWDLAKQK-LVENKQRAAEL--EQKLDDENG-----AATD-NYLAEVRRRHESHIIIAGNRKCGKSSFML 63
c_tropicalis_gemoma_F02D8.3_ortholog 1 ----MNIWDLAKQK-LVENKQKAAEL--EHKLDDENG-----APSD-NYLSEVRRRHESHIIIAGNRKSGKSSFML 63
c_brenneri_wormbase_CBN31685 1 ----MNIWDLAKQK-LVENKQKAAEL--EQKLDDENG-----STTD-NYLAEVRRRHESHIIIAGNRKSGKSSFML 63
h_bacteriophora_genblastg_F02D8.3_ortholog 1 ----MNIWDLAYDK-LRENETKMEKL--EESLEEVGS-----NAEM------RKRHISHIIICGSAHSGKSSLVG 57
h_contortus_wormbase_HCOI00579500.t1 1 ----MDIWSLAKEK-LRENEEQSLKS--GERSDEKSS-----QGPQ------QRSSTHILICGNSQSGKSTLVN 56
h_contortus_wormbase_HCOI02107700.t1 1 ----MDIWSLAKEK-LRENEEQSLKA--GERMDEKSS-----QGPQ------QRSSTHILICGNSQSGKSTLVN 56
a_ceylanicum_wormbase_Acey_s0007.g3412.t1 1 ---MMDMWAMAKEK-LKEKAVETATL--DGKMDERTS-----DGSP------RKTTSHVVICGVSQSGKSALVN 57
n_americanus_manual_F02D8.3_ortholog 1 ---MLDMWELAKRK-IRENDDKTAKF--NEAMDEKAD-----DGAP------RRTTTHIVMCGVSQSGKSTLVN 57

c_angaria_wormbase_Cang_2012_03_13_01212.g15992.t1 63 PPPVYVT------PPTYPTPPPTYPTPPPT--YPTPPPPRCFQNGH------ 100
m_incognita_manual_Minc16028+Minc16029 47 R---FIEFGGAPSTSFRQQSTEKSIALEYKFVNKILRGNNRQLVHCWELAGGGSMTN------ 100
m_incognita_manual_Minc15725 47 R---FIEFGGAPSTSFRQQSTEKSIALEYKFVNKILRGNNRQLVHCWELAGGGSMTN------ 100
t_spiralis_manual_EFV50050 51 K---FLE------REEQQYQICAMEYTFARWN-AGNEKKIGHIYQIEDGLDFINLFKSIR 100
b_xylophilus_wormbase_BUX.s00713.1091 54 R---YFD------RNENSEPTFGLSYRFATKT-RVNTKELVEFWELGNSALLSSLTS--- 100
s_ratti_wormbase_SRAE_2000400300 58 R---FLK------NEATINPTTILEYSFGYRH-RDSYKDIIHTWELSIDHNLS------ 100
p_redivivus_wormbase_g7439.t1 55 R---FMD------KSGPTSKTISMDYNYVIRM-RNNNKEVATVWELGGGANMAKLI---- 100
p_exspectatus_wormbase_scaffold451-EXSNAP2012.6 58 R---FVD------KNENSRPTTALEYLYARRM-RGNVSTFNFNPNLCIMTKQL------ 100
p_pacificus_wormbase_PPA21711 58 R---FVD------KNESSRPTTALEYLYARRM-RGATKQLCHVWELGGGTRLN------ 100
a_suum_gemoma_GS_16646 56 R---FLD------RNENTKPTIALEYTYGRRT-RGTVKDVAHIWELGGGISLINL----- 100
b_malayi_wormbase_Bm3149b 61 R---FLN------RNEDTKPTIALEYTYGRRN-RGKMKDVGNIWELGGGT------ 100
d_immitis_wormbase_nDi.2.2.2.t01309 65 R---FLN------RNEDIRPTIALEYTYGRRN-RGKVKDVGHIWEL------ 100
l_loa_wormbase_EFO16364.2 64 R---FLN------RNEDARSTIALEYMYGRRN-RGKVKDVGHIWELG------ 100
o_volvulus_wormbase_OVOC7272 57 R---FLN------RNDDAKSTIALEYTYGRRN-RGKVKDVGHIWELGGGTNLCN------ 100
o_volvulus_wormbase_OVOC7345 57 R---FLN------RNDDAKSTIALEYTYGRRN-RGKVKDVGHIWELGGGTNLCN------ 100
c_japonica_genblastg_CJA41183 65 N---FVE------RKEELKESVGLEYSYARRT-RGNVKDVANLWEL------ 100
c_elegans_F02D8.3 64 N---FLE------RKEDLKDSVGLEYTYARRT-RGNVKDIANLWELG------ 100
c_briggsae_wormbase_CBG11597 64 N---FLE------RKEDLRESVGLEFTYARRT-RGNVKDVANLWELG------ 100
c_remanei_genblastg_CRE04951 64 N---FLE------RKEDLKESVGLEFTYARRT-RGNVKDVANLWELG------ 100
c_sinica_genblastg_Csp5_scaffold_00817.g15749.t1 64 N---FLE------RKEDLKESVGLEFTYARRT-RGNVKDVANLWELG------ 100
c_tropicalis_gemoma_F02D8.3_ortholog 64 N---FLE------RKEDLKDSVGLEFTYARRT-RGNVKDVANLWELG------ 100
c_brenneri_wormbase_CBN31685 64 N---FLE------RKEDLKDSVGLEFTYARRT-RGNVKDVANLWELG------ 100
h_bacteriophora_genblastg_F02D8.3_ortholog 58 K---FLD------RSEEPKETIALEYTYARRT-RGRNKDVCHIWELGGAANLV------ 100
h_contortus_wormbase_HCOI00579500.t1 57 K---FLD------KNEEAKETIALEYVYARRT-RGNNKDVCHIWELASGTKLAQ------ 100
h_contortus_wormbase_HCOI02107700.t1 57 K---FLD------KNEEAKETIALEYVYARRT-RGNNKDVCHIWELASGTKLAQ------ 100
a_ceylanicum_wormbase_Acey_s0007.g3412.t1 58 K---FLD------RNEEPKETTALEYIYARRT-RGNHKDVCHIWELGGGTMFA------ 100
n_americanus_manual_F02D8.3_ortholog 58 K---FLD------RNEEPKETTALEYIYARRT-RGNHKDVCHIWELGGGTKFA------ 100

Figure 3.40: Multiple sequence alignment of the first 100 a.a. of xbx-1 orthologs. Across 25 nematode genomes, 26 xbx-1 orthologs were found, and no ortholog was found in 2 genomes (M. hapla and T. suis). Note: H. contortus, M. incognita, and O. volvulus each contain two xbx-1 genes. Both genes in H. contortus and in O. volvulus have high-confidence 5’ start sites, whereas neither M. incognita gene has a high-confidence 5’ start site.

3.3.33 Summary of gene annotation efforts

We present a summary of our gene annotation efforts for 32 ciliary genes (Table 3.34). At this stage, we have a list of genes that have high-confidence 5’ start sites but do not have typical X-box motifs in their promoters. Some genomes contain a disproportionate number of duplicated or missing ciliary genes. The T. spiralis, M. incognita, T. suis, and M. hapla genomes are missing the most ciliary genes (12, 12, 10, and 8 genes, respectively). In addition, most of the gene duplications occur in the H. contortus, M. incognita, and C. brenneri genomes, which contain 7, 6, and 5 duplicated genes, respectively. This may indicate that the sequencing or assembly quality of these genomes is lower than that of the other species studied.

Table 3.34: Summary of gene annotation of ciliary gene orthologs

Gene Name | # high quality genes | # genes with well-defined start sites | # typical motifs¹ | # promoters without typical motifs
arl-6 | 26 | 23 | 17 | 4
bbs-1 | 26 | 21 | 17 | 4
bbs-2 | 25 | 23 | 19 | 5
bbs-4 | 25 | 19 | 17 | 2
bbs-5 | 26 | 25 | 25 | 1
bbs-8 | 27 | 25 | 21 | 7
bbs-9 | 19 | 18 | 17 | 1
che-2 | 25 | 22 | 17 | 8
che-11 | 24 | 15 | 18 | 0
che-13 | 24 | 21 | 16 | 5
dyf-1 | 26 | 22 | 19 | 2
dyf-2 | 24 | 15 | 10 | 6
dyf-3 | 24 | 22 | 16 | 6
dyf-5 | 29 | 25 | 22 | 4
dyf-11 | 24 | 21 | 20 | 2
dyf-13 | 24 | 20 | 19 | 2
dyf-18 | 25 | 21 | 19 | 3
dylt-2 | 20 | 19 | 20 | 0
ift-20 | 23 | 22 | 20 | 2
ifta-1 | 21 | 15 | 17 | 1
mks-1 | 11 | 8 | 5 | 2
mks-6 | 9 | 5 | 6 | 0
mksr-1 | 23 | 21 | 20 | 1
mksr-2 | 24 | 23 | 18 | 4
nphp-2 | 23 | 18 | 4 | 11
odr-4 | 19 | 14 | 9 | 5
osm-1 | 27 | 26 | 26 | 1
osm-5 | 24 | 21 | 21 | 3
osm-6 | 27 | 26 | 18 | 7
osm-12 | 25 | 21 | 16 | 6
tub-1 | 25 | 14 | 15 | 2
xbx-1 | 26 | 23 | 22 | 3

3.4 Tracing the missing ciliary genes through gene model reconstruction using RNA-seq analysis

We failed to curate a few ciliary genes in some nematode genomes. Because of the high conservation of ciliary genes in metazoans, we hypothesize that these genes are missing because the corresponding genome assembly is incomplete. To test this hypothesis, we attempted to identify transcript information for these missing genes by searching RNA-seq libraries and constructing the gene models. First, we demonstrated that, for a gene that was successfully curated (osm-5), we are able to fully construct the gene model using RNA-seq library data.

We used TBLASTN to search for the osm-5 protein sequence in RNA-seq reads. We initially searched for the C. elegans osm-5 protein sequence in C. elegans RNA-seq reads, to establish that this method is effective despite the difficulties of searching in short sequences. The C. elegans RNA-seq library was prepared by Jun Wang, and contains 250bp paired-end reads. We used TBLASTN with e-value thresholds of 1 and 10 (the default). Next, we used the C. briggsae osm-5 protein sequence to search in C. elegans RNA-seq reads. A manually revised C. briggsae osm-5 gene model was used, based on RNA-seq splicing junction data and gene predictions from genBlastG. TopHat was used to align the short reads to the genome, so that reads identified by TBLASTN could be compared with reads aligned to the gene region in the current assembly.

¹ (Table 3.34) Total number of X-box motifs; a gene may have more than one X-box motif.

Finally, we searched for osm-5 in M. incognita RNA-seq reads using the C. elegans protein sequence as the query. We used the SRR797067 library, which contains 73-75 bp paired-end reads, and ran TBLASTN with default settings.

Simple genome assemblies were produced by aligning reads to their corresponding C. elegans reference coordinates, and a consensus sequence was constructed by using the most common amino acid at each position of the assembly. We used the EMBOSS implementation of Needleman-Wunsch for pairwise alignments, and MUSCLE for multiple sequence alignments (MSAs).
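The consensus step described above can be sketched in Python as follows. This is a minimal illustration rather than the scripts used in this study: the function name `build_consensus`, the `(query_start, peptide)` representation of a hit, and the tie-breaking rule are assumptions made for the example. Each TBLASTN hit is assumed to have been reduced to the translated read segment laid out along the query, one character per query position, with `-` where the read has a gap. Positions with no supporting read are reported as X.

```python
from collections import Counter


def build_consensus(query_length, hits, unknown="X"):
    """Majority-vote consensus along query coordinates.

    hits: iterable of (query_start, peptide) pairs (hypothetical format);
    query_start is 1-based and peptide has one character per query position.
    """
    columns = [Counter() for _ in range(query_length)]
    for query_start, peptide in hits:
        for offset, residue in enumerate(peptide):
            position = query_start - 1 + offset
            # Gaps ('-') and translated stop codons ('*') cast no vote.
            if 0 <= position < query_length and residue not in "-*":
                columns[position][residue] += 1
    # Ties are resolved in favour of the residue that was seen first.
    return "".join(
        column.most_common(1)[0][0] if column else unknown
        for column in columns
    )


if __name__ == "__main__":
    toy_hits = [(1, "MAN"), (2, "ANS"), (2, "AQS"), (6, "RE")]
    print(build_consensus(8, toy_hits))
```

Ignoring stop codons when voting is one possible way to handle reads with sequencing errors; other choices (for example, discarding such reads entirely) would give a stricter consensus.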

3.4.1 Reconstructing C. elegans osm-5 using C. elegans RNA-seq reads

Initially, we demonstrate that we can use the C. elegans osm-5 protein sequence to reconstruct a full-length osm-5 gene using C. elegans RNA-seq reads. Using an e-value threshold of 1, TBLASTN identified 178 reads, while TopHat mapped 95 reads to the osm-5 gene location of the C. elegans genome sequence. All of the 95 reads mapped to the osm-5 gene location were also identified by TBLASTN, indicating that this method has high sensitivity.

The e-values determined by BLAST depend on the length of the reads used. This particular library contains 250bp reads, for which alignments with an e-value greater than 1 have low sequence similarity. The additional 83 reads identified by TBLASTN were not mapped by TopHat. It is notable that some BLAST alignments contain stop codons; this may be due to sequencing errors in the reads.
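The comparison between reads identified by TBLASTN and reads mapped to the gene region by TopHat amounts to simple set arithmetic on read identifiers. The sketch below is illustrative only; the function name `compare_read_sets` and the read identifiers are hypothetical, and sensitivity is taken here to mean the fraction of TopHat-mapped reads that TBLASTN also recovered.

```python
def compare_read_sets(tblastn_reads, tophat_reads):
    """Summarise the overlap between two collections of read identifiers."""
    tblastn_reads = set(tblastn_reads)
    tophat_reads = set(tophat_reads)
    shared = tblastn_reads & tophat_reads
    return {
        "shared": len(shared),
        "tblastn_only": len(tblastn_reads - tophat_reads),
        "tophat_only": len(tophat_reads - tblastn_reads),
        # Fraction of TopHat-mapped reads that TBLASTN also found.
        "sensitivity": len(shared) / len(tophat_reads) if tophat_reads else float("nan"),
    }


if __name__ == "__main__":
    # Placeholder identifiers sized to match the counts reported in the text.
    tophat = [f"read_{i}" for i in range(95)]
    tblastn = [f"read_{i}" for i in range(178)]
    print(compare_read_sets(tblastn, tophat))
```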

The reconstructed osm-5 sequence (shown in Figures 3.41 and 3.42) is full-length and contains 3 mismatches.

>c_elegans_osm-5_reconstructed
MANSTFREDDDDFYGGFFSYDKAYDIQNITQNPQFQQAVARSSHGRRPTASQMGFRDASSSYGKPPGTM
MGNQSRMGGRTAMANNNEPARPMTAVRGAGYTSFANKVQAAERPLSTENSGENGEEKCRQMENKVMEML
RESMLASEKKKFKEALDKAKEAGRRERAVVKHREQQGLVEMMNLDLTFTVLFNLAQQYEANDMTNEALN
TYEIIVRNKMFPNSGRLKVNIGNIHFRKREFTKALKYYRMALDQVPSIQKDTRIKILNNIGVTFVRMGS
YDDAISTFDHCVEENPNFITALNLILVAFCIQDAEKMREAFVKMIDIPGFPDDDYMKEKDDDDVLLNQT
LNSDMLKNWEKRNKSDAEKAIITAVKIISPVIAPDYAIGYEWCLESLKQSVHAPLAIELEMTKAGELMK
NGDIEGAIEVLKVFNSQDSKTASAAANNLCMLRFLQGGRRLVDAQQYADQALSIDRYNAHAQVNQGNIA
YMNGDLDKALNNYREALNNDASCVQALFNIGLTAKAQGNLEQALEFFYKLHGILLNNVQVLVQLASIYE
SLEDSAQAIELYSQANSLVPNDPAILSKLADLYDQEGDKSQAFQCHYDSYRYFPSNLETVEWLASYYLE
TQFSEKSINYLEKAALMQPNVSKWQMMIASCLRRTGNYQRAFELYRQIHRKFPQDLDCLKFLVRIAGDL
GMTEYKEYKDKLEKAEKINQLRLQRESDSSQGKRHSANSTHSLPPSGLTGLGSGSGGSSGGGTRQYSAH
VPLLLDSGTPFTVAQRDMKAEDFSYDDPVAISSRPKTGTRKTTTDTNIDDFGDFDDSLLPD

Figure 3.41: C. elegans osm-5 protein sequence constructed from RNA-seq reads

Figure 3.42: Pairwise alignment between C. elegans osm-5 reference sequence and sequence reconstructed from RNA-seq reads

3.4.2 Reconstructing C. elegans osm-5 using a C. briggsae query

Using the C. briggsae osm-5 (CBG02013) sequence as the query, we were able to reconstruct most of the C. elegans osm-5 gene model. Note: since we are assembling the reads based on the query coordinates, this gene model is 818a.a., corresponding to the length of CBG02013, instead of 820a.a. as in the previous section.

TBLASTN identified 170 reads (e-value ≤ 1), and TopHat mapped 95 reads to the osm-5 gene region. All of the 95 reads mapped to the osm-5 gene region were found by TBLASTN. Fewer reads were identified than with the C. elegans query. The 75 reads not mapped to the osm-5 region were all unmapped by TopHat. Of the reads identified by TBLASTN, 28 contain stop codons when translated.

The reconstructed sequence is shown in Figure 3.43, and the alignment with the C. elegans osm-5 reference sequence is shown in Figure 3.46. This time, the 5’ end of the sequence could not be recovered, and the alignment contains 10 mismatches in the 3’ end of the gene as well as 2 gaps. A multiple sequence alignment of C. elegans osm-5, C. briggsae osm-5, and the reconstructed osm-5 sequence using the C. briggsae query is shown in Figure 3.47.

To investigate why the 5’ end of osm-5 could not be recovered, we explored the possibilities that the 5’ end of C. briggsae osm-5 may be defective, or the C. briggsae and C. elegans osm-5 genes may be divergent at the 5’ end. Figure 3.44 shows the 5’ gene model of C. briggsae osm-5. The RNA-seq data suggests revisions to the first exon, but the very 5’ end of the gene seems to be well-supported. (The revised gene model was used in this analysis.) In addition, a 5’ excerpt of an alignment between the C. elegans and C. briggsae osm-5 genes is shown in Figure 3.45. From this alignment, it does not appear that the first 22 amino acids are significantly less conserved compared to the rest of the protein.

>c_elegans_osm-5_reconstructed_cbg_query_evalue1
XXXXXXXXXXXXXXXXXXXXXXKAYDIQNITQNPQFQQAVARSSHGRRPTASQMGFRDASSSYGKPPGT
MMGNQSRMGGRTAMANNNEPARPMTAVRGAGYTSANKVQAAERPLSTENSGENGEEKCRQMENKVMEML
RESMLASEKKKFKEALDKAKEAGRRERAVVKHREQQGLVEMMNLDLTFTVLFNLAQQYEANDMTNEALN
TYEIIVRNKMFPNSGRLKVNIGNIHFRKREFTKALKYYRMALDQVPSIQKDTRIKILNNIGVTFVRMGS
YDDAISTFDHCVEENPNFITALNLILVAFCIQDAEKMREAFVKMIDIPGFPDDDYMKEKDDDDVLLNQT
LNSDMLKNWEKRNKSDAEKAIITAVKIISPVIAPDYAIGYEWCLESLKQSVHAPLAIELEMTKAGELMK
NGDIEGAIEVLKVFNSQDSKTASAAANNLCMLRFLQGGRRLVDAQQYADQALSIDRYNAHAQVNQGNIA
YMNGDLDKALNNYREALNNDASCVQALFNIGLTAKAQGNLEQALEFFYKLHGILLNNVQVLVQLASIYE
SLEDSAQAIELYSQANSLVPNDPAILSKLADLYDQEGDKSQAFQCHYDSYRYFPSNLETVEWLASYYLE
TQFSEKSINYLEKAALMQPNVSKWQMMIASCLRRTGNYQRAFELYRQIHRKFPQDLDCLKFLVRIAGDL
GMTEYKEYKDKLEKAEKINQLRLQRESDSSQGKRHSANSTHSLPPSGLTGLGSGSGGSSGGGRQYSAHV
PLLLDSGTPFTVAQRDMKAEDFSYDPPAAISSPPKTTRKTTTTDNIIDFFGDDDSLLPD

Figure 3.43: C. elegans osm-5 protein sequence constructed from RNA-seq reads, using C. briggsae osm-5 as the query

Figure 3.44: 5’ end of C. briggsae osm-5 gene model, showing minor revisions to the end of the first exon supported by RNA-seq splicing junctions

Figure 3.45: 5’ excerpt of pairwise alignment between C. elegans (Y41G9A.1) and C. briggsae (CBG02013) osm-5 sequences

Figure 3.46: Pairwise alignment between C. elegans osm-5 reference sequence and reconstructed osm-5 from a C. briggsae osm-5 query, using TBLASTN e-value ≤ 1

Figure 3.47: Multiple sequence alignment between C. elegans osm-5 reference sequence, C. briggsae osm-5 reference sequence, and reconstructed osm-5 from a C. briggsae osm-5 query, using TBLASTN e-value ≤ 1

Using a higher e-value threshold (≤ 10) did not recover the 5’ end of the gene, and resulted in a consensus sequence with the same number of gaps while causing a higher number of mismatches. Although this method demonstrates that we are able to recover gene models by assembling RNA-seq reads, the resulting protein sequence is not completely accurate and may not be full-length.

3.4.3 Case study: Reconstruction of M. incognita osm-5

Using the C. elegans osm-5 query sequence, we were able to reconstruct fragments of the osm-5 gene from the M. incognita RNA-seq library, as shown in Figure 3.48. 43.7% of the 818a.a. query sequence was recovered; the amount of query sequence that can be reconstructed is highly dependent on the RNA-seq library used. The resulting sequence shares 20% similarity with the C. elegans query (Figure 3.49), which is understandably low considering the large number of gaps. Nevertheless, this result provides some evidence that M. incognita does have an osm-5 gene that is not present in the genome assembly.

>m_incognita_osm-5_reconstructed
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXGLRTGMPSELARPMTAIRGAGYSSAGRKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXY
QTIVKNRSYANGHRLKINIGNIYFRKKEYTKALKYYRMALDQVPKVHQRMRAKILSNIGVAMVRLGRYE
DALSSFEXXXXXXADYSTALNLIMAAYCLGNEEKMRESFQRLVDIPXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXILLAAKIISQAIAPSFSEGYAWCVECIKHSIYATLATELEMNKXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXKLEDALQYCEQALSLDRYNSNALVNRGNIHFX
XXXXXXALQCFREALQVDSGCIQAIYNSGLXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XEHSQAIELFTQASTLAPTDPSILERLAXXXXXXXXXXXXXXXXFDSFHHFPSNISLIKWLGNYYMSAH
FSEKAVLYFEKAALMEPNNPEWPMLTAGCQRRSGNFQRALEIYKQVHRRFPENVECLKFLVQLSKELRL
TXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Figure 3.48: M. incognita osm-5 protein sequence constructed from RNA-seq reads, using C. elegans osm-5 as the query
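The proportion of the query that was recovered can be read directly off a reconstructed sequence in which X marks positions without read support. The following sketch shows one way to list the recovered fragments and compute that proportion; the function names are hypothetical, and the calculation assumes that every X is an unrecovered position rather than a genuinely ambiguous residue.

```python
import re


def recovered_fragments(consensus, unknown="X"):
    """Return (start, end, sequence) for each run of recovered residues.

    Coordinates are 1-based and inclusive, following query positions.
    """
    pattern = re.compile(f"[^{re.escape(unknown)}]+")
    return [
        (match.start() + 1, match.end(), match.group())
        for match in pattern.finditer(consensus)
    ]


def recovered_fraction(consensus, unknown="X"):
    """Fraction of positions that are not the placeholder character."""
    if not consensus:
        raise ValueError("consensus sequence is empty")
    recovered = sum(len(seq) for _, _, seq in recovered_fragments(consensus, unknown))
    return recovered / len(consensus)


if __name__ == "__main__":
    toy = "XXGLRTXXXYQT"
    print(recovered_fragments(toy))
    print(f"{recovered_fraction(toy):.1%} of positions recovered")
```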

Figure 3.49: Pairwise alignment between C. elegans osm-5 and the reconstructed M. incognita osm-5

Overall, our analysis indicates that it is possible to reconstruct protein sequences using RNA-seq libraries. This method demonstrates that orthologs we failed to identify in the current genome assembly do show evidence of being present in the genome. We were able to reconstruct a nearly full-length C. elegans osm-5 protein sequence from RNA-seq reads using C. briggsae osm-5 as the query, and we were able to partially reconstruct M. incognita osm-5 from RNA-seq reads using C. elegans osm-5 as the query.

3.5 Discussion

In this chapter, we have curated ciliary genes in 25 nematode species using the bioinformatics pipeline discussed in Chapter 2. This pipeline involves homology-based ortholog identification using InParanoid as well as the gene annotation tools genBlastG and GeMoMa. We improved and evaluated the 5’ start site annotation for each gene, requiring that 5’ protein sequences be conserved. We also made use of RNA-seq data when available, as RNA-seq intron junctions provide support for the accuracy of the first exon. We were able to identify orthologs for each ciliary gene in most species, and each gene is well-conserved at the 5’ end. Our analysis suggests that all of the nematode genomes studied contain all of these ciliary genes; where an ortholog could not be found, the absence is most likely due to technical reasons such as genome assembly defects.

Chapter 4

Identification and comparative analysis of X-box motifs in nematodes

4.1 Introduction

To study variations in the properties of X-box motifs, we aim to identify candidate X-box motifs for ciliary genes in 25 nematode species. We used the ciliary genes identified in the previous chapter, and applied our X-box motif search to genes with high-quality 5’ start site annotations, defining the promoter region as 2kb upstream of the ATG. We searched for X-box motifs based on properties of validated X-box motifs from 32 ciliary genes in C. elegans. For each gene, we first search for typical X-box motif candidates; that is, motifs that are found by HMMER. These motifs are 14bp and tend to have highly similar sequences to known X-box motifs. Genes that do not contain typical X-box motifs in their promoters are then subjected to additional search methods. These include searching for 6bp or 7bp half-motifs with flexible nucleotides in between, searching using regular expressions (“regex”) published by Efimenko et al. (2005), and if necessary, manual inspection of promoter sequences. We present results for each ciliary gene, detailing candidate X-box motif sequences, lengths, and locations. X-box motifs identified in this study are marked with a dagger (†).
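To make the idea of a half-motif search with flexible nucleotides in between more concrete, the sketch below scans a promoter sequence for two 6bp half-sites separated by 1 to 3 arbitrary nucleotides, which yields candidates of 13bp, 14bp, or 15bp. It is an illustration only: the two half-site patterns are placeholder patterns chosen for the example (they are neither the TFM-Scan settings used in this study nor the regular expressions of Efimenko et al. (2005)), only the forward strand is scanned, and reporting the location as the offset of the motif start relative to the ATG is an assumption.

```python
import re

# Illustrative half-site patterns (assumptions, not the study's parameters).
LEFT_HALF = "GT[ACT][ACGT][CT][CT]"
RIGHT_HALF = "[AG][AG][ACGT]AAC"

# The lookahead lets overlapping candidates be reported as well.
HALF_SITE_PATTERN = re.compile(
    f"(?=({LEFT_HALF}[ACGT]{{1,3}}{RIGHT_HALF}))"
)


def find_half_site_candidates(promoter):
    """Return (location, motif) pairs for a promoter that ends at the ATG.

    location is negative: the distance from the motif start to the ATG.
    """
    promoter = promoter.upper()
    hits = []
    for match in HALF_SITE_PATTERN.finditer(promoter):
        motif = match.group(1)
        location = match.start() - len(promoter)
        hits.append((location, motif))
    return hits


if __name__ == "__main__":
    toy_promoter = "A" * 10 + "GTTGCCAGTGGCAAC" + "T" * 20
    for location, motif in find_half_site_candidates(toy_promoter):
        print(location, motif, f"{len(motif)}bp")
```

Because the spacer is tried from longest to shortest, a position that could be read with more than one spacer length is reported with the longest one; a fuller search would enumerate each spacer length separately and scan the reverse strand as well.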

We evaluate X-box motifs by calculating a score based on similarity to the X-box motifs from the 32 C. elegans ciliary genes we initially chose. We generated a position specific scoring matrix (PSSM) based on these X-box motifs, shown in Table 4.1, where each column gives the nucleotide frequencies (counts across the 32 motifs) at that position. The score for each X-box motif is the sum of the scores at each position. For 13bp and 15bp X-box motifs, we use a gap penalty of 0 and the gap positions generated in multiple sequence alignments (tables in Section 4.2). The maximum possible score, obtained by adding the maximum value in each column, is 369. Most typical X-box motifs have a score higher than 300, and most atypical X-box motifs have a score over 200.

Table 4.1: Position specific scoring matrix used to score X-box motifs

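Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
A | 3 | 0 | 2 | 7 | 0 | 0 | 30 | 1 | 9 | 5 | 3 | 28 | 32 | 0
C | 0 | 1 | 9 | 3 | 28 | 28 | 0 | 0 | 1 | 0 | 20 | 0 | 0 | 30
G | 29 | 0 | 0 | 9 | 0 | 0 | 1 | 1 | 22 | 27 | 2 | 2 | 0 | 0
T | 0 | 31 | 21 | 13 | 4 | 4 | 1 | 30 | 0 | 0 | 7 | 2 | 0 | 2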
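The scoring scheme can be reproduced with a few lines of Python. The sketch below encodes Table 4.1 and scores a motif as it appears in the 15-column alignments of Section 4.2. The function names and the `insert_column` argument are assumptions made for this example: `insert_column` is the 0-based index of the alignment column that holds the extra base of a 15bp motif (and a `-` in 14bp motifs), read off the relevant alignment table. That column, like any other gap, contributes a score of 0.

```python
# Counts from Table 4.1; list index 0 corresponds to motif position 1.
PSSM = {
    "A": [3, 0, 2, 7, 0, 0, 30, 1, 9, 5, 3, 28, 32, 0],
    "C": [0, 1, 9, 3, 28, 28, 0, 0, 1, 0, 20, 0, 0, 30],
    "G": [29, 0, 0, 9, 0, 0, 1, 1, 22, 27, 2, 2, 0, 0],
    "T": [0, 31, 21, 13, 4, 4, 1, 30, 0, 0, 7, 2, 0, 2],
}
MOTIF_LENGTH = 14


def max_score():
    """Sum of the largest value in each PSSM column."""
    return sum(max(PSSM[base][i] for base in "ACGT") for i in range(MOTIF_LENGTH))


def score_aligned_motif(aligned, insert_column):
    """Score a 15-column aligned motif; gaps and the insert column score 0."""
    if len(aligned) != MOTIF_LENGTH + 1:
        raise ValueError("expected a 15-column aligned motif")
    # Drop the insert column so the remaining 14 columns line up with the PSSM.
    core = (aligned[:insert_column] + aligned[insert_column + 1:]).upper()
    return sum(PSSM[base][i] for i, base in enumerate(core) if base in PSSM)


if __name__ == "__main__":
    print(max_score())
    # Motifs taken from Table 4.2 (insert column is the 8th column there).
    for motif in ("GTTTCCA-TGGCAAC", "GTTGCCAGTGGCAAC", "GTCGCTA-TGGGAAT"):
        print(motif, score_aligned_motif(motif, insert_column=7))
```

With these inputs the sketch gives 369, 365, and 283, in agreement with the scores listed for these motifs in Table 4.2.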

4.2 Identification of putative X-box motifs in nematodes

4.2.1 X-box motifs in the arl-6 promoter

Typical X-box motif candidates were found in the promoters of 16 out of 23 arl-6 orthologs (Table 4.2). Of the remaining genes, 3 were eliminated from further X-box motif searches because their promoter sequences contained gaps (Ns), and we found atypical X-box motif candidates for the other 4 genes. The locations of the X-box motif candidates are not conserved, appearing scattered over the 2kb promoter region (Figure 4.1). The sequences of both typical and atypical X-box motif candidates are similar to the known X-box consensus, although the LOGOs of atypical X-box motif candidates are more susceptible to noise due to the limited number of sequences (Figure 4.2). In addition, 4 of the candidate atypical X-box motifs are 15bp instead of the more common 14bp length. Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.3). Although there may be other cis-regulatory elements near the X-box motif, such as C-box motifs (Burghoorn et al., 2012), these elements will not appear in the LOGO unless their location relative to the X-box motif is highly conserved, because we generated the LOGO directly from the promoter sequence.
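The reason a flanking element only shows up in a LOGO when its position is fixed relative to the X-box motif can be seen from how column heights are computed. The sketch below calculates the per-column information content (in bits) for a set of equal-length DNA sequences, which is the quantity that determines the height of each LOGO column. It is a simplified illustration: the function name is hypothetical, no small-sample correction is applied, and the LOGO software used in this study may differ in both respects.

```python
import math
from collections import Counter


def column_information(aligned_seqs):
    """Information content (bits) of each column; 2.0 means fully conserved."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    bits = []
    for col in range(length):
        # Count only unambiguous bases; gaps and Ns are ignored.
        counts = Counter(seq[col] for seq in aligned_seqs if seq[col] in "ACGT")
        total = sum(counts.values())
        if total == 0:
            bits.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        bits.append(2.0 - entropy)
    return bits


if __name__ == "__main__":
    # Toy input: columns 1 and 3 are conserved, columns 2 and 4 are not.
    toy = ["ACGT", "AGGA", "ATGC", "AAGG"]
    print(column_information(toy))
```

An element that sits at a different offset from the X-box motif in each promoter contributes a different base to each column, so its columns approach 0 bits even if the element itself is well conserved.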

Table 4.2: Alignment of X-box motifs in the arl-6 promoter

Species | Sequence | Location | Score | Type (Method) | Comment
C. remanei | GTTACCA-TAGAAAC† | -575 | 333 | typical (HMMER) |
C. tropicalis | GTTGCCA-TGGCAAT† | -639 | 337 | typical (HMMER) |
C. brenneri | GTCGTCA-TGGTTAC† | -855 | 290 | typical (HMMER) |
C. sinica | GTCACTA-TGGAAAC† | -709 | 310 | typical (HMMER) |
C. briggsae | GTTTCTA-TGGTAAC† | -1252 | 332 | typical (HMMER) |
C. elegans | GTCGCTA-TGGGAAT | -784 | 283 | typical (HMMER) |
C. elegans | GTTTCCA-TGGTTAC† | -1140 | 330 | typical (HMMER) |
C. japonica | GTCTCCA-TAACCAC† | -1975 | 294 | typical (HMMER) |
C. japonica | GTAACCA-TGGCAAC† | -526 | 344 | typical (HMMER) |
H. bacteriophora | GTTGCCA-TGACAAC† | -1091 | 343 | typical (HMMER) |
H. contortus | GTCGCTTACGTCAAC† | -436 | 243 | atypical (TFM-Scan 6bp motif) | HCOI02112500.t1
H. contortus | GTCGCTTATAGGAAC† | -54 | 269 | atypical (TFM-Scan 6bp motif) | HCOI02112500.t1
H. contortus | GTCGCTGACGTCAAC† | -792 | 243 | atypical (TFM-Scan 6bp motif) | HCOI01745600.t1
A. ceylanicum | GTGACGA-TCAGGAC† | -1976 | 227 | atypical (manual inspection) |
A. ceylanicum | GTCTGGA-TCTAAAC† | -1190 | 236 | atypical (manual inspection) |
P. pacificus | GTTGCCAGTGGCAAC† | -182 | 365 | atypical (TFM-Scan 6bp/7bp motif) |
S. ratti | GTAACCA-TAGTAAC† | -75 | 318 | typical (HMMER) |
P. redivivus | GTTTCCA-TGGCAAC† | -180 | 369 | typical (HMMER) |
B. xylophilus | GTTTCCA-TGGAAAT† | -91 | 324 | typical (HMMER) |
M. hapla | TTCACCA-TAGCAAC† | -413 | 309 | typical (HMMER) |
A. suum | GTTGCTA-TGGTGAC† | -1946 | 302 | typical (HMMER) |
D. immitis | GTTGTCA-TGGAAAC† | -656 | 324 | typical (HMMER) |
O. volvulus | GTTGTCA-TGGAAAC† | -494 | 324 | typical (HMMER) |
B. malayi | GTTGTCA-TGGCAAC† | -738 | 341 | typical (HMMER) |
T. suis | ATTACCC-TGGTAAC† | -1486 | 294 | typical (HMMER) |

Figure 4.1: Distribution of X-box motifs in the arl-6 promoter.

(a) Typical X-box motifs (generated from 19 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 2 input sequences)
(c) Atypical 15bp X-box motifs (generated from 4 input sequences)

Figure 4.2: Sequence logos of arl-6 X-box motifs.

Figure 4.3: LOGO depicting aligned arl-6 X-box motifs and 30bp of flanking promoter sequence.

4.2.2 X-box motifs in the bbs-1 promoter

Typical X-box motif candidates were found in the promoters of 16 out of 21 bbs-1 orthologs (Table 4.3). Of the remaining genes, 1 was eliminated from further X-box motif searches because its promoter sequence contained gaps (Ns), and we found atypical X-box motif candidates for the other 4 genes. The locations of the X-box motif candidates are somewhat conserved, with some clustering around the -100 to -300 region (Figure 4.4). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motif candidates show some similarity to the consensus, with more noise due to the limited number of sequences (Figure 4.5). In addition, 2 of the candidate atypical X-box motifs have a length of 15bp instead of the more common 14bp. Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.6).

Table 4.3: Alignment of X-box motifs in the bbs-1 promoter

Species | Sequence | Location | Score | Type (Method) | Comment
C. remanei | GTTGCC-ATGGAAAC† | -125 | 348 | typical (HMMER) |
C. tropicalis | GTTTCC-ATAGCAAC† | -111 | 356 | typical (HMMER) |
C. brenneri | ATCACC-ATGGCAAC† | -754 | 325 | typical (HMMER) |
C. sinica | ATCCCC-ATAGCAAC† | -877 | 308 | typical (HMMER) |
C. sinica | GTTGTC-ATAGCGAC† | -292 | 302 | typical (HMMER) |
C. briggsae | GTTGTT-ATGGTAAC | -311 | 304 | typical (HMMER) |
C. elegans | GTTCCC-ATAGCAAC | -100 | 346 | typical (HMMER) |
C. japonica | GTTTCT-ATAGCAAC† | -528 | 332 | typical (HMMER) |
C. angaria | GTAACC-ATGGCAAC† | -68 | 344 | typical (HMMER) |
H. bacteriophora | GTTGCTAAGGGTAAC† | -509 | 299 | atypical (TFM-Scan 6bp/7bp motif) |
A. ceylanicum | GTTACC-AGGGTTAC† | -1257 | 295 | typical (HMMER) |
S. ratti | GTAGTT-ATGGCAAC† | -100 | 298 | typical (HMMER) |
P. redivivus | GTTGTC-AAGGTGGT† | -269 | 213 | atypical (relaxed consensus regex) | g8979.t1
P. redivivus | GTTTGA-CTCCTAAC† | -66 | 222 | atypical (manual inspection) | g16322.t1
P. redivivus | GTTTCA-ATGATGAC† | -1565 | 280 | atypical (manual inspection) | g16322.t1
B. xylophilus | GTTGCC-ATGGAGAC† | -122 | 322 | typical (HMMER) |
M. incognita | GTTACT-ATGACTAC† | -92 | 291 | typical (HMMER) |
M. hapla | ACTACT-TTGACAAC† | -165 | 232 | atypical (relaxed consensus regex) |
M. hapla | GTTCCTACTGATGAC† | -184 | 244 | atypical (TFM-Scan 6bp/7bp motif) |
A. suum | GTTGTC-ATGGATAC† | -1149 | 298 | typical (HMMER) |
D. immitis | ATTACC-ATGGTAAC† | -344 | 324 | typical (HMMER) |
O. volvulus | ATTACC-ATGGTAAC† | -265 | 324 | typical (HMMER) |
B. malayi | ATTACC-ATGGTAAC† | -291 | 324 | typical (HMMER) |
L. loa | ATCACC-ATAGTAAC† | -293 | 299 | typical (HMMER) |

Figure 4.4: Distribution of X-box motifs in the bbs-1 promoter.

(a) Typical X-box motifs (generated from 18 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 4 input sequences)
(c) Atypical 15bp X-box motifs (generated from 2 input sequences)

Figure 4.5: Sequence logo of bbs-1 X-box motifs.

Figure 4.6: LOGO depicting aligned bbs-1 X-box motifs and 30bp of flanking promoter sequence.

4.2.3 X-box motifs in the bbs-2 promoter

Typical X-box motif candidates were found in the promoters of 18 out of 23 bbs-2 orthologs (Table 4.4). We found atypical X-box motif candidates for the other 4 genes. The locations of the X-box motif candidates are mostly conserved, with visible clustering around the -100 region and only a few outliers (4.7). The sequences of typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motif candidates show some similarity to the consensus, with

146 more noise due to the limited number of sequences (Figure 4.8). In addition, 6 of the candidate atypical X-box motifs are 15bp instead of the more common 14bp length. Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.9). Table 4.4: Alignment of X-box motifs in the bbs-2 promoter

Species Sequence Location Score Type (Method)

C. remanei  GTTGCCA-TGGTGAC†  -120  326  typical (HMMER)
C. tropicalis  GTTTCCA-TGACAAC†  -102  347  typical (HMMER)
C. brenneri  GTTTCCA-TGGAGAC†  -93  326  typical (HMMER)
C. sinica  GTCTCCA-TGACAAC†  -106  335  typical (HMMER)
C. briggsae  ATATCCA-TGGCAAC  -83  324  typical (HMMER)
C. elegans  GTATCCA-TGGCAAC  -94  350  typical (HMMER)
C. japonica  GTCTCCA-TGACAAC†  -80  335  typical (HMMER)
C. angaria  GTTTCCA-TAGCAAC†  -73  356  typical (HMMER)
H. bacteriophora  ATCACTA-AAATAAC†  -861  224  atypical (relaxed consensus regex)
H. bacteriophora  GTTGGCA-TAGCAAT†  -1462  296  atypical (TFM-Scan 6bp/7bp motif)
H. bacteriophora  GTTGTCAACGGAGAC†  -1481  268  atypical (TFM-Scan 6bp motif)
H. contortus  GTCGTCA-TGGCGAC†  -291  303  typical (HMMER)
A. ceylanicum  GTTTTCA-TAGCAAG†  -678  302  typical (HMMER)
A. ceylanicum  GTTACCA-TAGCGAC†  -1729  324  typical (HMMER)
N. americanus  GTTGCTA-TGGTAAC†  -1440  328  typical (HMMER)
P. pacificus  GTTTCCA-TAGAAAC†  -72  339  typical (HMMER)
P. exspectatus  GTTGTCA-TAGAAAC†  -71  311  typical (HMMER)
S. ratti  GTAACCA-TGACAAC†  -116  322  typical (HMMER)
P. redivivus  GCAGCCA-TGGCAAC†  -92  316  typical (HMMER)
B. xylophilus  ATTGTCA-TGACAAC†  -51  293  typical (HMMER)
M. incognita  GTTACCA-TGGAAAT†  -94  318  typical (HMMER)
M. hapla  GTTACCA-TGGAAAT†  -97  318  typical (HMMER)
A. suum  GTTGCCATTGGAAAC†  -68  348  atypical (TFM-Scan 6bp/7bp motif)
O. volvulus  ATTACTA-AAATGAC†  -645  210  atypical (relaxed consensus regex)
O. volvulus  GTTACCCTTGACAAC†  -140  311  atypical (TFM-Scan 6bp/7bp motif)
B. malayi  GTGTCCATTGGTTAC†  -73  309  atypical (TFM-Scan 6bp/7bp motif)
B. malayi  GTTACCCTTGGCAAC†  -63  333  atypical (TFM-Scan 6bp/7bp motif)
L. loa  GTTTCCCTTGACAAC†  -64  317  atypical (TFM-Scan 6bp/7bp motif)
T. suis  GTTACTA-TGACGAC†  -70  291  typical (HMMER)

Figure 4.7: Distribution of X-box motifs in the bbs-2 promoter.

(a) Typical X-box motifs (generated from 20 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 3 input sequences)
(c) Atypical 15bp X-box motifs (generated from 6 input sequences)

Figure 4.8: Sequence logo of bbs-2 X-box motifs.

Figure 4.9: LOGO depicting aligned bbs-2 X-box motifs and 30bp of flanking promoter sequence.
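
The split of the atypical bbs-2 candidates into 14bp and 15bp motifs can be checked directly from the aligned sequences in Table 4.4 by removing the alignment gap character and counting the remaining bases. The Python sketch below illustrates this; the function name `ungapped_length` and the hard-coded list are illustrative assumptions and are not part of the original analysis pipeline.

```python
from collections import Counter

# Atypical bbs-2 X-box motif candidates copied from Table 4.4 as
# (species, aligned sequence) pairs; "-" marks an alignment gap.
ATYPICAL_BBS2 = [
    ("H. bacteriophora", "ATCACTA-AAATAAC"),
    ("H. bacteriophora", "GTTGGCA-TAGCAAT"),
    ("H. bacteriophora", "GTTGTCAACGGAGAC"),
    ("A. suum", "GTTGCCATTGGAAAC"),
    ("O. volvulus", "ATTACTA-AAATGAC"),
    ("O. volvulus", "GTTACCCTTGACAAC"),
    ("B. malayi", "GTGTCCATTGGTTAC"),
    ("B. malayi", "GTTACCCTTGGCAAC"),
    ("L. loa", "GTTTCCCTTGACAAC"),
]


def ungapped_length(aligned):
    """Length of an aligned motif once gap characters are removed (hypothetical helper)."""
    return len(aligned.replace("-", ""))


# Tally how many atypical candidates fall into each motif length.
length_counts = Counter(ungapped_length(seq) for _, seq in ATYPICAL_BBS2)
```

Under these assumptions the tally agrees with the panel captions of Figure 4.8: three 14bp candidates and six 15bp candidates.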

4.2.4 X-box motifs in the bbs-4 promoter

Typical X-box motif candidates were found in the promoters of 17 out of 19 bbs-4 orthologs (Table 4.5). We found atypical X-box motif candidates for the remaining 2 genes. The locations of the X-box motif candidates are somewhat conserved, with visible clustering around the -100 region and some outliers (Figure 4.10). The sequences of the typical X-box motif candidates are similar to the known X-box consensus (Figure 4.11). C. briggsae has two bbs-4 orthologs, and we identified an atypical X-box motif candidate for each of these genes. Sequence LOGOs were not generated for the atypical X-box motif candidates because there is only one 13bp motif and one 14bp motif. Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.12).

Table 4.5: Alignment of X-box motifs in the bbs-4 promoter

Species Sequence Location Score Type (Method) Comment

C. remanei  GTTTCCATGGCAAC†  -63  369  typical (HMMER)
C. tropicalis  GTTTCCATGACAAC†  -56  347  typical (HMMER)
C. brenneri  GTTTCCATGACAAC†  -60  347  typical (HMMER)
C. sinica  GTTGTCATGACAAC†  -73  319  typical (HMMER)
C. briggsae  GTCGTCGAGG-AAC†  -352  251  atypical (TFM-Scan 6bp motif)  CBG10029
C. briggsae  GTTGGCATAGTTAC†  -74  285  atypical (TFM-Scan 7bp motif)  CBG09893
C. elegans  GTTTCCATGGCAAC†  -65  369  typical (HMMER)
C. japonica  GTCACCATAGCAAC†  -75  338  typical (HMMER)
C. angaria  GTTGCCATGGTTAT†  -75  298  typical (HMMER)
H. bacteriophora  GTTACCATAGTTAT†  -331  283  typical (HMMER)
H. contortus  GTCTCCATGACGAC†  -235  309  typical (HMMER)
A. ceylanicum  GTCACCATAGAAAC†  -1220  321  typical (HMMER)
P. pacificus  GTAACCATGGAAAC†  -81  327  typical (HMMER)
P. exspectatus  GTAACCATGGAAAC†  -85  327  typical (HMMER)
B. xylophilus  GTTGCCATGGATAA†  -154  292  typical (HMMER)
M. incognita  GTTGTCATAGCAAC†  -338  328  typical (HMMER)
A. suum  ATTGCCATGGAAAC†  -1742  322  typical (HMMER)
O. volvulus  GTATCCATGGTAAC†  -726  337  typical (HMMER)
B. malayi  GTATCCATGGTAAC†  -580  337  typical (HMMER)
L. loa  GTATCCATGGCAGC†  -673  318  typical (HMMER)

Figure 4.10: Distribution of X-box motifs in the bbs-4 promoter.

Figure 4.11: Sequence logo of bbs-4 typical X-box motifs (generated from 18 input sequences, including C. elegans motif(s)).

Figure 4.12: LOGO depicting aligned bbs-4 X-box motifs and 30bp of flanking promoter sequence.
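
A sequence logo summarizes how strongly each aligned position is conserved. As a rough illustration of the quantity such a logo displays, the Python sketch below computes per-position information content (in bits) from equal-length motifs using Shannon entropy. It is a minimal sketch only: it applies no small-sample correction, the function name `column_information` is hypothetical, and it is not the tool used to generate the figures in this chapter.

```python
import math
from collections import Counter


def column_information(motifs, alphabet="ACGT"):
    """Per-column information content in bits for equal-length motifs.

    Computed as log2(|alphabet|) minus the Shannon entropy of the
    observed base frequencies; no small-sample correction is applied.
    """
    if len({len(m) for m in motifs}) != 1:
        raise ValueError("all motifs must have the same length")
    max_bits = math.log2(len(alphabet))
    info = []
    for column in zip(*motifs):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info


# Four of the typical bbs-4 motifs from Table 4.5, used only as an example input.
example_motifs = [
    "GTTTCCATGGCAAC",  # C. remanei
    "GTTGTCATGACAAC",  # C. sinica
    "GTCACCATAGCAAC",  # C. japonica
    "ATTGCCATGGAAAC",  # A. suum
]
bits = column_information(example_motifs)
```

A fully conserved column reaches the maximum of 2 bits, while a column split evenly between two bases yields 1 bit.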

4.2.5 X-box motifs in the bbs-5 promoter

Typical X-box motif candidates were found in the promoters of 23 out of 25 bbs-5 orthologs (Table 4.6). Of the remaining 2 genes, one was eliminated from further X-box motif searches because its promoter contains gaps (Ns). We found an atypical X-box motif candidate for the other gene. The locations of the X-box motif candidates are somewhat conserved, with visible clustering around the -100 region and some outliers (Figure 4.13). The sequences of the typical X-box motif candidates are similar to the known X-box consensus (Figure 4.14). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.15).

Table 4.6: Alignment of X-box motifs in the bbs-5 promoter

Species Sequence Location Score Type (Method) Comment

C. remanei  GTTACCATGACAAC†  -83  341  typical (HMMER)
C. tropicalis  ATCTCCATGACAAC†  -65  309  typical (HMMER)
C. brenneri  GTAACCATGACAAC†  -74  322  typical (HMMER)
C. brenneri  ATTACCATTGCAAC†  -1605  315  typical (HMMER)
C. sinica  GTTTCTATGGCAAC†  -84  345  typical (HMMER)
C. briggsae  GTTACTATGGCAAC  -70  339  typical (HMMER)
C. elegans  GTCTCCATGGCAAC  -66  357  typical (HMMER)
C. elegans  ATTACCATTGCAAC†  -1748  315  typical (HMMER)
C. japonica  GTCGCCATGGCAAC†  -64  353  typical (HMMER)
H. bacteriophora  GTTTCCATGGTAAC†  -142  356  typical (HMMER)
H. contortus  GTTGCTATGGCGAT†  -83  287  typical (HMMER)
A. ceylanicum  GTTGCCATGGATAT†  -1616  294  typical (HMMER)
N. americanus  GTTGCCATAGAAAC†  -69  335  typical (HMMER)
P. pacificus  GTTTCCCTGGCAAC†  -191  339  typical (HMMER)
P. exspectatus  GTTTCCCTGGCAAC†  -195  339  typical (HMMER)
S. ratti  GTTTCTATGGCAAC†  -66  345  typical (HMMER)
P. redivivus  GGTGCCATGGCAAC†  -95  334  typical (HMMER)
B. xylophilus  ATTGCCATGACAAC†  -108  317  typical (HMMER)
M. incognita  GTTGTCTTGGCAAC†  -88  312  typical (HMMER)  Minc01275
M. hapla  GTTACCATGTCAAC†  -102  336  typical (HMMER)


A. suum  GTTACCATGGTAAC†  -1157  350  typical (HMMER)
A. suum  GTTACCATGGTAAC†  -1010  350  typical (HMMER)
D. immitis  GTTGTCATGGTAAT†  -469  300  typical (HMMER)
O. volvulus  GTTGCCATGGTAAA†  -478  322  typical (HMMER)
B. malayi  GTTGCCATGGTGAA†  -383  296  typical (HMMER)
L. loa  GTTGCCATGGTGAA†  -494  296  typical (HMMER)
T. spiralis  GTTTCTATAGCAAC†  -97  332  typical (HMMER)
T. suis  GTTACC-TAGCAAC†  -91  320  atypical (TFM-Scan 6bp motif)

Figure 4.13: Distribution of X-box motifs in the bbs-5 promoter.

Figure 4.14: Sequence logo of bbs-5 typical X-box motifs (generated from 27 input sequences, including C. elegans motif(s)).

Figure 4.15: LOGO depicting aligned bbs-5 X-box motifs and 30bp of flanking promoter sequence.

4.2.6 X-box motifs in the bbs-8 promoter

Typical X-box motif candidates were found in the promoters of 17 out of 25 bbs-8 orthologs (Table 4.7). Of the remaining 8 genes, one was eliminated from further X-box motif searches because its promoter contains gaps (Ns). We found atypical X-box motif candidates for the 7 remaining genes. The locations of the X-box motif candidates are somewhat conserved, with visible clustering around the -100 to -200 region and some outliers (Figure 4.16). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motif candidates show some similarity to the consensus (Figure 4.17). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.18).

Table 4.7: Alignment of X-box motifs in the bbs-8 promoter

Species Sequence Location Score Type (Method) Comment

C. remanei  GTCTCCA-TAGCAAC†  -82  344  typical (HMMER)
C. tropicalis  GTTTCCA-TAGCAAC†  -91  356  typical (HMMER)
C. tropicalis  GTTACTA-TGGAAGT†  -283  262  typical (HMMER)
C. brenneri  GTTTCC--TAGCAAC†  -89  326  atypical (TFM-Scan 6bp motif)  CBN05971
C. brenneri  GTTTCC--TAGCAAC†  -89  326  atypical (TFM-Scan 6bp motif)  CBN26405
C. sinica  ATCTCCA-TGACAAC†  -96  309  typical (HMMER)
C. briggsae  GTCTCTA-TGGCAAC  -74  333  typical (HMMER)
C. elegans  GTACCCA-TGGCAAC  -85  340  typical (HMMER)
C. japonica  GTCTCCA-TAGAAAC†  -223  327  typical (HMMER)
H. bacteriophora  GTTTCCA-CGGTAAC†  -70  326  typical (HMMER)
H. contortus  GTTTCTA-TAGAGAT†  -744  261  typical (HMMER)
H. contortus  GCTGCCA-TGGTAAC†  -270  322  typical (HMMER)
A. ceylanicum  GCTGCCA-TGGAAAC†  -719  318  typical (HMMER)
N. americanus  GTTGCTA-TGGAAAC†  -860  324  typical (HMMER)
P. pacificus  GTTGCCA-TGTCAAC†  -169  338  typical (HMMER)
P. pacificus  GTATCCA-TGGAGAC†  -95  307  typical (HMMER)
P. exspectatus  GTTGCCA-TGTCAAC†  -167  338  typical (HMMER)
P. exspectatus  GTATCCA-TGGACAC†  -95  305  typical (HMMER)
S. ratti  GTTACCA-TGGAAAC†  -61  346  typical (HMMER)
P. redivivus  ATCACCA-TAGCAAC†  -92  312  typical (HMMER)
B. xylophilus  GTAACTA-TGGTAAC†  -75  307  typical (HMMER)
M. incognita  GTTACCT-AGGAAAC†  -69  288  typical (HMMER)
M. hapla  GTTACCT-AGACAAC†  -172  283  atypical (TFM-Scan 6bp/7bp motif, relaxed consensus regex)
A. suum  GTTGTCA-TGGAGAA†  -96  268  typical (HMMER)
D. immitis  ATTTCC--CATCAAC†  -1460  243  atypical (TFM-Scan 6bp motif)
D. immitis  GTTGCTATTGGTAAC†  -74  328  atypical (TFM-Scan 6bp/7bp motif)
O. volvulus  GTTGCTATTGGAAAC†  -60  324  atypical (TFM-Scan 6bp/7bp motif)
B. malayi  GTCACTG-AAGCAAT†  -1452  228  atypical (TFM-Scan 6bp motif)
B. malayi  ATTGCTG-TGATAAC†  -1241  251  atypical (TFM-Scan 6bp motif)
B. malayi  GTTGCTATTGGAGAC†  -76  298  atypical (TFM-Scan 6bp/7bp motif)
L. loa  ATTACCT-TGGCAAC†  -1384  308  typical (HMMER)
T. spiralis  GTCGCTT-TAACAAC†  -1617  265  atypical (TFM-Scan 6bp motif, relaxed/average consensus regex)
T. spiralis  GTCGCTA--AGCAAC†  -50  286  atypical (TFM-Scan 6bp motif)

Figure 4.16: Distribution of X-box motifs in the bbs-8 promoter.

(a) Typical X-box motifs (generated from 22 input sequences, including C. elegans motif(s))
(b) Atypical 13bp X-box motifs (generated from 4 input sequences)
(c) Atypical 14bp X-box motifs (generated from 4 input sequences)
(d) Atypical 15bp X-box motifs (generated from 3 input sequences)

Figure 4.17: Sequence logo of bbs-8 X-box motifs.

Figure 4.18: LOGO depicting aligned bbs-8 X-box motifs and 30bp of flanking promoter sequence.

4.2.7 X-box motifs in the bbs-9 promoter

Typical X-box motif candidates were found in the promoters of 17 out of 18 bbs-9 orthologs (Table 4.8). We found an atypical X-box motif candidate for the remaining gene. The locations of the X-box motif candidates are highly conserved, with visible clustering around the -100 region (Figure 4.19). The sequences of the typical X-box motif candidates are similar to the known X-box consensus (Figure 4.20). The atypical X-box motif candidate falls within the expected region, and its sequence is relatively similar to the consensus apart from a single insertion, which makes it a 15bp motif. Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.21).

Table 4.8: Alignment of X-box motifs in the bbs-9 promoter

Species Sequence Location Score Type (Method)

C. remanei  GTTTCC-ATAGCAAC†  -86  356  typical (HMMER)
C. tropicalis  GTTTCC-ATGGCAAC†  -95  369  typical (HMMER)
C. brenneri  GTTTCC-ATGGAAAC†  -75  352  typical (HMMER)
C. sinica  GTCTCC-ATAGAAAC†  -83  327  typical (HMMER)
C. briggsae  GTCTCC-ATGGCAAC†  -85  357  typical (HMMER)
C. elegans  GTTTCC-ATGACAAC  -82  347  typical (HMMER)
C. japonica  GTTTCC-ATGGCAAC†  -63  369  typical (HMMER)
C. angaria  GTTACT-ATGGAGAT†  -72  268  typical (HMMER)
H. bacteriophora  GTTGCC-ATGGAGAT†  -70  294  typical (HMMER)
H. contortus  GTAACC-ATGACTAC†  -91  296  typical (HMMER)
A. ceylanicum  GTCTCC-ATGACGAC†  -77  309  typical (HMMER)
N. americanus  GTCTCC-ATGACGAC†  -90  309  typical (HMMER)
S. ratti  ATTACC-ATGACAAC†  -104  315  typical (HMMER)
P. redivivus  GTCTCC-ATGACGAC†  -90  309  typical (HMMER)
M. hapla  GTTACTAATGGCAAC†  -108  339  atypical (TFM-Scan 6bp motif)
D. immitis  GTTGCT-ATGGTAAT†  -118  300  typical (HMMER)
O. volvulus  GTTGCT-ATGGTAAT†  -99  300  typical (HMMER)
B. malayi  GTTGCT-ATGGTAAT†  -103  300  typical (HMMER)
L. loa  GTTGCT-ATGGTAAT†  -101  300  typical (HMMER)

Figure 4.19: Distribution of X-box motifs in the bbs-9 promoter.

Figure 4.20: Sequence logo of bbs-9 typical X-box motifs (generated from 19 input sequences, including C. elegans motif(s)).

Figure 4.21: LOGO depicting aligned bbs-9 X-box motifs and 30bp of flanking promoter sequence.
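
The positional clustering described for bbs-9 can be quantified by counting how many motif locations fall inside a window around -100. The Python sketch below uses the locations listed in Table 4.8; the window bounds of -150 to -50 and the function name `fraction_in_window` are assumptions chosen for illustration, not thresholds used in this study.

```python
# Locations of all bbs-9 X-box motif candidates, copied from Table 4.8
# in row order (typical and atypical candidates together).
BBS9_LOCATIONS = [
    -86, -95, -75, -83, -85, -82, -63, -72, -70, -91,
    -77, -90, -104, -90, -108, -118, -99, -103, -101,
]


def fraction_in_window(locations, start, end):
    """Fraction of locations lying within [start, end], bounds inclusive."""
    low, high = sorted((start, end))
    inside = [loc for loc in locations if low <= loc <= high]
    return len(inside) / len(locations)


# Assumed illustrative window of 50bp on either side of -100.
bbs9_fraction = fraction_in_window(BBS9_LOCATIONS, -150, -50)
```

With this assumed window, every bbs-9 candidate in Table 4.8 lies inside it, which is consistent with the description of the locations as highly conserved.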

4.2.8 X-box motifs in the che-2 promoter

Typical X-box motif candidates were found in the promoters of 14 out of 22 che-2 orthologs (Table 4.9). We found atypical X-box motif candidates for the remaining 8 genes. The locations of the X-box motif candidates are somewhat conserved, with some clustering around the -100 to -200 region (Figure 4.22). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and some of the atypical X-box motif candidates share similarity with the consensus (Figure 4.23). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.24).

Table 4.9: Alignment of X-box motifs in the che-2 promoter

Species Sequence Location Score Type (Method)

C. remanei  GTTACCC-GGACAAC†  -116  282  atypical (TFM-Scan 6bp/7bp motif)
C. tropicalis  GTTACCT-TGACAAC  -155  312  typical (HMMER)
C. brenneri  GTTCTTA-AAGAGGC†  -1693  194  atypical (relaxed consensus regex)
C. brenneri  GTTGTCATTAGCAAC†  -638  328  atypical (TFM-Scan 6bp/7bp motif)
C. briggsae  GTATCCA-TGGCAAC  -183  350  typical (HMMER)
C. elegans  GTTGTCA-TGGTGAC  -130  302  typical (HMMER)
C. japonica  GTAACCA-TGCCAAC†  -79  317  typical (HMMER)
C. angaria  GTTACCA-TGGATAC†  -74  320  typical (HMMER)
C. angaria  ATTACCA-TGGCAAC†  -129  337  typical (HMMER)
H. bacteriophora  GTCGTCA-TAGCAAC†  -170  316  typical (HMMER)
H. contortus  GTAACCA-TGGTGAC†  -623  305  typical (HMMER)
A. ceylanicum  GTCGCCA-TGGTGAC†  -1280  314  typical (HMMER)
P. pacificus  TTCTCCA-TGGCAAC†  -105  328  typical (HMMER)
P. exspectatus  TTCTCCA-TGGCAAC†  -136  328  typical (HMMER)
S. ratti  GTTGCTA--GGATAC†  -134  268  atypical (TFM-Scan 6bp motif)
P. redivivus  GTCTTCCATGCGGAC†  -1622  232  atypical (manual inspection)
B. xylophilus  GTTGCCA-AGGAAAC†  -176  319  typical (HMMER)
M. hapla  ATTACCA-TGACAAC†  -92  315  typical (HMMER)
A. suum  GTCGCCA-TGGTCAC†  -1935  312  typical (HMMER)
A. suum  GTTACAG-TGGCAAC†  -977  306  typical (HMMER)
A. suum  GTTACCT-TGGCAAC†  -1281  334  typical (HMMER)
D. immitis  GTTACTATTAGCAAC†  -197  326  atypical (TFM-Scan 6bp/7bp motif)
O. volvulus  GTTACTATTAGCAAC†  -176  326  atypical (TFM-Scan 6bp/7bp motif)
B. malayi  GTTGCTATTAGCAAC†  -171  328  atypical (TFM-Scan 6bp/7bp motif)
L. loa  ATCATTT-AAACAAC†  -329  184  atypical (relaxed consensus regex)
T. spiralis  GTTGCCG-TGGCAAC†  -48  336  typical (HMMER)
T. suis  GTTTCCA-TGGCAAT†  -40  341  typical (HMMER)

Figure 4.22: Distribution of X-box motifs in the che-2 promoter.

(a) Typical X-box motifs (generated from 18 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 3 input sequences)
(c) Atypical 15bp X-box motifs (generated from 5 input sequences)

Figure 4.23: Sequence logo of che-2 X-box motifs.

Figure 4.24: LOGO depicting aligned che-2 X-box motifs and 30bp of flanking promoter sequence.

4.2.9 X-box motifs in the che-11 promoter

Typical X-box motif candidates were found in the promoters of all 15 che-11 orthologs (Table 4.10). The locations of the X-box motif candidates are conserved, with clustering around the -100 to -200 region and a few outliers (Figure 4.25). The sequences of the typical X-box motif candidates are similar to the known X-box consensus (Figure 4.26). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.27).

Table 4.10: Alignment of X-box motifs in the che-11 promoter

Species Sequence Location Score Type (Method)

C. remanei  GTAACCATAGCAAC†  -109  331  typical (HMMER)
C. brenneri  GTATCCATAGCAAC†  -123  337  typical (HMMER)
C. sinica  GTTCCCATTGAAAC†  -49  320  typical (HMMER)
C. sinica  GTATCCATAGCAAC†  -112  337  typical (HMMER)
C. briggsae  GTATCCATAGCAAC  -119  337  typical (HMMER)
C. elegans  ATCTCCATGGCAAC  -86  331  typical (HMMER)
C. japonica  GTTTCCATGACAAC†  -96  347  typical (HMMER)
C. angaria  ATCACCATGGCAAC†  -156  325  typical (HMMER)
A. ceylanicum  GTGTCCATGACAAC†  -110  326  typical (HMMER)
N. americanus  GTAACCATGACTAC†  -101  296  typical (HMMER)
P. pacificus  CTCTCCATAGCAAC†  -185  315  typical (HMMER)
P. exspectatus  CTCTCCATAGCAAC†  -184  315  typical (HMMER)
S. ratti  GTTGTCATGGATAC†  -79  298  typical (HMMER)
P. redivivus  GTTTCCATGGATAC†  -87  326  typical (HMMER)
B. xylophilus  GTTCCTATGACAAC†  -85  313  typical (HMMER)
B. xylophilus  GTTTCCAAAGGAAC†  -874  309  typical (HMMER)
M. hapla  GTTGCTATGACAGC†  -117  287  typical (HMMER)
O. volvulus  GTTGCTGTGGTAAT†  -921  271  typical (HMMER)
O. volvulus  GTAACCATAGCAAC†  -993  331  typical (HMMER)

Figure 4.25: Distribution of X-box motifs in the che-11 promoter.

Figure 4.26: Sequence logo of che-11 typical X-box motifs (generated from 19 input sequences, including C. elegans motif(s)).

Figure 4.27: LOGO depicting aligned che-11 X-box motifs and 30bp of flanking promoter sequence.
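
The rows of the alignment tables in this chapter share a fixed layout (species, aligned sequence with an optional † marker, location, score, type with the detection method in parentheses, and an optional comment), so they can be parsed mechanically for downstream summaries. The Python sketch below shows one way to do this. The regular expression and the `parse_row` helper are hypothetical, inferred only from the rows as printed here, and may need adjusting for rows that deviate from this layout.

```python
import re

# Hypothetical row pattern inferred from the printed table columns:
# Species, Sequence (optionally followed by a dagger), Location, Score,
# Type (Method), and an optional single-token Comment such as a gene ID.
ROW_PATTERN = re.compile(
    r"^(?P<species>[A-Z]\. [a-z]+)\s+"
    r"(?P<sequence>[ACGT-]+)(?P<dagger>†?)\s+"
    r"(?P<location>-\d+)\s+"
    r"(?P<score>\d+)\s+"
    r"(?P<type>typical|atypical)\s+"
    r"\((?P<method>[^)]*)\)"
    r"(?:\s+(?P<comment>\S+))?$"
)


def parse_row(line):
    """Parse one table row into a dict; raise ValueError if it does not fit."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized table row: {line!r}")
    row = match.groupdict()
    row["location"] = int(row["location"])
    row["score"] = int(row["score"])
    row["dagger"] = row["dagger"] == "†"  # True when the marker is present
    return row


# A row taken from Table 4.10 (che-11 promoter).
example_row = parse_row("C. remanei GTAACCATAGCAAC† -109 331 typical (HMMER)")
```

Parsed rows can then be grouped by species or by type, for example to count how many orthologs have at least one typical candidate.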

4.2.10 X-box motifs in the che-13 promoter

Typical X-box motif candidates were found in the promoters of 16 out of 21 che-13 orthologs (Table 4.11). We found atypical X-box motif candidates for the remaining 5 genes. The locations of the X-box motif candidates are somewhat conserved, with some clustering around the -100 region and some outliers (Figure 4.28). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motif candidates share some similarity with the consensus (Figure 4.29). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.30).

Table 4.11: Alignment of X-box motifs in the che-13 promoter

Species Sequence Location Score Type (Method)

C. remanei  GTTTCC-TTGACAAC†  -98  318  typical (HMMER)
C. brenneri  GTCTCC-TTGACAAC†  -99  306  typical (HMMER)
C. sinica  GTCTCC-ATGACAAC†  -74  335  typical (HMMER)
C. briggsae  GTTTCC-TTGACAAC  -87  318  typical (HMMER)
C. elegans  GTTGCT-ATAGCAAC  -75  328  typical (HMMER)
C. japonica  GTTTCC-ATAGAAAC†  -86  339  typical (HMMER)
C. angaria  GTTTCC-ATGGAAAC†  -86  352  typical (HMMER)
H. bacteriophora  GTAGCC-ATGGAAAC†  -206  329  typical (HMMER)
H. contortus  GTCGCT-ATGGCAAC†  -235  329  typical (HMMER)
N. americanus  GTTACC-ATGGAGAC†  -1876  320  typical (HMMER)
P. pacificus  GTTACC-TTGACTAC†  -62  286  atypical (TFM-Scan 6bp/7bp motif)
P. exspectatus  GTTGCC-TTGACTAC†  -62  288  typical (HMMER)
S. ratti  ATCACC-ATAGTAAC†  -94  299  typical (HMMER)
P. redivivus  GTAACC-ATGGCAAC†  -115  344  typical (HMMER)
M. hapla  GTTGTC-ATGGAAAC†  -54  324  typical (HMMER)
A. suum  GTTTCC-ATGGCAGC†  -696  337  typical (HMMER)
D. immitis  GTTCCCAATTTATAC†  -1603  267  atypical (manual inspection)
D. immitis  GTTGTT-ATATCAAC†  -251  277  atypical (manual inspection)
O. volvulus  GTTGCT-ATGGAATC†  -309  292  typical (HMMER)
B. malayi  GTTATT-TTAGAAAT†  -669  228  atypical (relaxed consensus regex)
L. loa  GTTGCT-ATGGAGGC†  -832  266  typical (HMMER)
T. spiralis  GTTTCT--TGATGAC†  -34  254  atypical (TFM-Scan 6bp motif)
T. spiralis  GTAGCT-ACGGCAAT†  -1572  264  atypical (TFM-Scan 6bp motif)
T. suis  GTTGCT--AGGATAC†  -34  239  atypical (TFM-Scan 6bp motif)
T. suis  GCTGCTAATGGAGAC†  -948  268  atypical (TFM-Scan 7bp motif)

Figure 4.28: Distribution of X-box motifs in the che-13 promoter.

(a) Typical X-box motifs (generated from 17 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 4 input sequences)
(c) Atypical 15bp X-box motifs (generated from 2 input sequences)

Figure 4.29: Sequence logo of che-13 X-box motifs.

Figure 4.30: LOGO depicting aligned che-13 X-box motifs and 30bp of flanking promoter sequence.

4.2.11 X-box motifs in the dyf-1 promoter

Typical X-box motif candidates were found in the promoters of 19 out of 21 dyf-1 orthologs (Table 4.12). We found atypical X-box motif candidates for the remaining 2 genes. The locations of the X-box motif candidates are strongly conserved, with clustering around the -100 region (Figure 4.31). The only outliers are atypical X-box motif candidates, which is reasonable because we optimize for sensitivity and do not expect all atypical X-box motif candidates to be true motifs. The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motif candidates share some similarity with the consensus (Figure 4.32). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.33).

Table 4.12: Alignment of X-box motifs in the dyf-1 promoter

Species Sequence Location Score Type (Method) Comment

C. remanei  GTTGTC-ATGGAGAC†  -100  298  typical (HMMER)
C. tropicalis  GTTGTC-ATGGACAC†  -75  296  typical (HMMER)
C. brenneri  GTTGCT-ATGGATAT†  -85  270  typical (HMMER)  CBN29399
C. brenneri  GTTGCC-ATGGATAT†  -78  294  typical (HMMER)  CBN32059
C. sinica  GTTGTC-ATGGACAC†  -106  296  typical (HMMER)
C. briggsae  GTTTCC-ATGGGCAC†  -81  323  typical (HMMER)
C. elegans  GTTACC-ATGGATAT  -107  292  typical (HMMER)
C. japonica  GTTGCC-ATGGTTAC†  -98  326  typical (HMMER)
C. angaria  GTTACC-ATAGCAAC†  -82  350  typical (HMMER)
H. contortus  GTTGCC-TTGACAAC†  -95  314  typical (HMMER)
A. ceylanicum  GTTGTC-GTGGTAAC†  -72  299  typical (HMMER)
P. pacificus  GTTTCC-TTAGCAAC†  -93  327  typical (HMMER)
P. exspectatus  GTTTCC-TTAGCAAC†  -96  327  typical (HMMER)
S. ratti  ATTACC-ATGGTAAC†  -73  324  typical (HMMER)
B. xylophilus  GTTGCT-ATGACAAC†  -86  319  typical (HMMER)
M. hapla  GTTGTC-AAGGAAAC†  -133  295  typical (HMMER)
D. immitis  GTTACT-ATGGCAAT†  -79  311  typical (HMMER)
O. volvulus  GTTGTC-ATGGCAAT†  -102  313  typical (HMMER)
B. malayi  GTTACC-ATAGCAAT†  -80  322  typical (HMMER)
L. loa  GTTGCT-ATGGTAAT†  -85  300  typical (HMMER)
T. spiralis  GTTTTC--GAGCAAC†  -1722  273  atypical (TFM-Scan 6bp motif)
T. spiralis  GTTGCT--AAGCACC†  -67  237  atypical (TFM-Scan 6bp motif)
T. suis  ATCTTC-TTGGTGAC†  -484  239  atypical (average/relaxed consensus regex)
T. suis  ATTGTCAATGGCCAC†  -989  287  atypical (TFM-Scan 7bp motif)
T. suis  GTTGCT--TGGCAAC†  -72  311  atypical (TFM-Scan 6bp motif)

Figure 4.31: Distribution of X-box motifs in the dyf-1 promoter.

(a) Typical X-box motifs (generated from 20 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 3 input sequences)

Figure 4.32: Sequence logo of dyf-1 X-box motifs.

Figure 4.33: LOGO depicting aligned dyf-1 X-box motifs and 30bp of flanking promoter sequence.
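
The observation that the only dyf-1 location outliers are atypical candidates can be checked against Table 4.12 by flagging every candidate located further upstream than a cutoff and comparing the two groups. In the Python sketch below, the cutoff of -200 and the helper name `upstream_outliers` are assumptions made for illustration; this section does not define a formal outlier threshold.

```python
# dyf-1 motif locations copied from Table 4.12, split by candidate type.
TYPICAL_LOCATIONS = [
    -100, -75, -85, -78, -106, -81, -107, -98, -82, -95,
    -72, -93, -96, -73, -86, -133, -79, -102, -80, -85,
]
ATYPICAL_LOCATIONS = [-1722, -67, -484, -989, -72]

# Assumed cutoff: candidates further upstream than this count as outliers.
OUTLIER_CUTOFF = -200


def upstream_outliers(locations, cutoff=OUTLIER_CUTOFF):
    """Return the locations lying further upstream than the cutoff."""
    return [loc for loc in locations if loc < cutoff]


typical_outliers = upstream_outliers(TYPICAL_LOCATIONS)
atypical_outliers = upstream_outliers(ATYPICAL_LOCATIONS)
```

With this assumed cutoff, no typical candidate is flagged, whereas three atypical candidates (at -1722, -484 and -989) are.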

4.2.12 X-box motifs in the dyf-2 promoter

Typical X-box motif candidates were found in the promoters of 9 out of 15 dyf-2 orthologs (Table 4.13). We found atypical X-box motif candidates for the remaining 6 genes. The locations of the X-box motif candidates are conserved, with clustering around the -100 region (Figure 4.34). The only outliers are atypical X-box motif candidates. The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and some of the atypical X-box motif candidates share some similarity with the consensus (Figure 4.35). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.36).

Table 4.13: Alignment of X-box motifs in the dyf-2 promoter

Species Sequence Location Score Type (Method)

C. remanei  GT-TACCATGGAAAC†  -159  346  typical (HMMER)
C. tropicalis  GT-TACCATGGTGAT†  -117  296  typical (HMMER)
C. brenneri  GT-TACCATGGATAC†  -126  320  typical (HMMER)
C. briggsae  AT-TTCCATAACAAT†  -212  280  typical (HMMER)
C. briggsae  GT-TGCTATGGATAC†  -150  298  typical (HMMER)
C. elegans  GT-TACCAAGGCAAC  -154  334  typical (HMMER)
C. angaria  GT-TGTCATGGATAC†  -69  298  typical (HMMER)
H. contortus  GT-TGCTATGGCAAC†  -82  341  typical (HMMER)
A. ceylanicum  GT-AGCCATAGCAAC†  -82  333  typical (HMMER)
P. pacificus  GT-TGCTATGGGTAC†  -96  297  typical (HMMER)
S. ratti  GTTGTTCTTCGTGAC†  -654  235  atypical (TFM-Scan 7bp motif)
S. ratti  GT-TTTTTTAAAAAT†  -1625  212  atypical (relaxed consensus regex)
S. ratti  GT-TACTA-AGCAAC†  -68  296  atypical (TFM-Scan 6bp motif)
P. redivivus  AT-TCTCAAAATAAC†  -900  232  atypical (relaxed consensus regex)
P. redivivus  GTATCCCATGGCAAC†  -130  359  atypical (TFM-Scan 6bp motif)
B. xylophilus  GT-TGTCATGGAAAT  -65  296  typical (HMMER)
D. immitis  GT-TGCC-CAGCAAC†  -74  292  atypical (TFM-Scan 6bp motif)
O. volvulus  GT-TGCC-CAGCAAC†  -77  292  atypical (TFM-Scan 6bp motif)
B. malayi  GT-TGCC-CAGCAAC†  -256  292  atypical (TFM-Scan 6bp motif)
L. loa  GT-TGCC-CAGCAAC†  -278  292  atypical (TFM-Scan 6bp motif)

Figure 4.34: Distribution of X-box motifs in the dyf-2 promoter.

(a) Typical X-box motifs (generated from 11 input sequences, including C. elegans motif(s))
(b) Atypical 13bp X-box motifs (generated from 5 input sequences)
(c) Atypical 14bp X-box motifs (generated from 2 input sequences)
(d) Atypical 15bp X-box motifs (generated from 2 input sequences)

Figure 4.35: Sequence logo of dyf-2 X-box motifs.

Figure 4.36: LOGO depicting aligned dyf-2 X-box motifs and 30bp of flanking promoter sequence.

4.2.13 X-box motifs in the dyf-3 promoter

Typical X-box motif candidates were found in the promoters of 16 out of 22 dyf-3 orthologs (Table 4.14). We found atypical X-box motif candidates for the remaining 6 genes. The locations of the X-box motif candidates are conserved, with clustering around the -100 to -200 region (Figure 4.37). The only outliers are atypical X-box motif candidates. The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and some of the atypical X-box motif candidates share some similarity with the consensus (Figure 4.38). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.39).

Table 4.14: Alignment of X-box motifs in the dyf-3 promoter

Species Sequence Location Score Type (Method)

C. remanei  GTTTCTAT-GGCAAC†  -87  345  typical (HMMER)
C. brenneri  GTTGCCACGGGATAC†  -98  292  atypical (TFM-Scan 6bp motif)
C. sinica  GTTTCCGT-GGTAAC†  -81  327  typical (HMMER)
C. briggsae  GTTGCCAT-GGGAAT†  -76  319  typical (HMMER)
C. elegans  GTTTCTAT-GGGAAC  -89  327  typical (HMMER)
C. japonica  GTTTCCAT-GGATAC†  -59  326  typical (HMMER)
C. angaria  TTCACCAT-GGAAAC†  -65  305  typical (HMMER)


H. bacteriophora  GTTCCCAT-AGCAAC†  -63  346  typical (HMMER)
H. contortus  GTTGCTAT-GGTGAG†  -59  272  typical (HMMER)
A. ceylanicum  GTTGCTAT-GACGAC†  -61  293  typical (HMMER)
N. americanus  GTTGCTAT-GACGAC†  -70  293  typical (HMMER)
P. pacificus  TTCCCCAT-GGCAAC†  -202  318  typical (HMMER)
P. exspectatus  GCTGCCCATGGCAAC†  -211  276  atypical (TFM-Scan 6bp motif)
S. ratti  ATCATTTA-AATAAC†  -1589  171  atypical (relaxed consensus regex)
S. ratti  GTTGATAG-GAAAAC†  -597  245  atypical (TFM-Scan 6bp motif)
S. ratti  GTTGCT-T-AGCAAC†  -93  298  atypical (TFM-Scan 6bp motif)
P. redivivus  GTTGCCAT-GACGAC†  -141  317  typical (HMMER)
B. xylophilus  GTTGTCAT-GGTGAA†  -162  272  typical (HMMER)
M. hapla  GTTGCTAAGGGTAAC†  -261  299  atypical (TFM-Scan 6bp/7bp motif)
A. suum  ATCACCAT-GGAAAC†  -90  308  typical (HMMER)
D. immitis  GTTCCCTT-GGCAAC†  -91  330  typical (HMMER)
O. volvulus  GTTCCCTT-GGCAAC†  -80  330  typical (HMMER)
L. loa  GTTCCCTT-GGCAAC†  -78  330  typical (HMMER)
T. spiralis  ATTGCTC--GAAAAC†  -1295  216  atypical (TFM-Scan 6bp motif)
T. spiralis  GTTGATCC-GATGAC†  -1761  192  atypical (TFM-Scan 6bp motif)
T. spiralis  GTGACCA--AGCAAC†  -223  299  atypical (TFM-Scan 6bp motif)
T. spiralis  GTTACCA--GGCAAC†  -117  333  atypical (TFM-Scan 6bp motif)
T. spiralis  GTCGTCATCAGCAAC†  -21  316  atypical (TFM-Scan 6bp motif)
T. suis  ATCATCAA-GACGAC†  -228  224  atypical (relaxed consensus regex)
T. suis  GTTGCC-T-AGCAAC†  -69  322  atypical (TFM-Scan 6bp motif)

Figure 4.37: Distribution of X-box motifs in the dyf-3 promoter.

(a) Typical X-box motifs (generated from 17 input sequences, including C. elegans motif(s))
(b) Atypical 13bp X-box motifs (generated from 5 input sequences)
(c) Atypical 14bp X-box motifs (generated from 4 input sequences)
(d) Atypical 15bp X-box motifs (generated from 4 input sequences)

Figure 4.38: Sequence logo of dyf-3 X-box motifs.

Figure 4.39: LOGO depicting aligned dyf-3 X-box motifs and 30bp of flanking promoter sequence.

4.2.14 X-box motifs in the dyf-5 promoter

Typical X-box motif candidates were found in the promoters of 20 out of 25 dyf-5 orthologs (Table 4.15). Of the remaining 5 genes, one was eliminated from further X-box motif searches because its promoter contains gaps (Ns). We found atypical X-box motif candidates for the remaining 4 genes. The locations of the X-box motif candidates show some conservation, with clustering around the -200 to -600 region (Figure 4.40). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and some of the atypical X-box motif candidates share some similarity with the consensus (Figure 4.41). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.42).

Table 4.15: Alignment of X-box motifs in the dyf-5 promoter

Species Sequence Location Score Type (Method) Comment

C. remanei  GTTACCAT-AGACAC†  -288  305  typical (HMMER)
C. tropicalis  GTTGCCAT-AGACAC†  -275  307  typical (HMMER)
C. brenneri  GTTACCAT-AGACAC†  -283  305  typical (HMMER)  CBN19125
C. brenneri  GTTACCAT-AGACAC†  -303  305  typical (HMMER)  CBN29786
C. sinica  GTTACCAT-GGAGCC†  -294  288  typical (HMMER)
C. briggsae  GTGTCTAT-GGTAAC†  -1185  311  typical (HMMER)
C. briggsae  GTTACCAT-AGACAC†  -569  305  typical (HMMER)
C. elegans  GTTACCAT-AGAAAC  -285  333  typical (HMMER)
C. japonica  GTTTCTAT-GGCTAC†  -256  319  typical (HMMER)
C. angaria  GTCTTTTA-AAAAGT†  -1749  139  atypical (relaxed consensus regex)
C. angaria  GTTATTTT-GAAAAT†  -930  219  atypical (relaxed consensus regex)
C. angaria  GTTATTTT-GAAAAT†  -658  219  atypical (relaxed consensus regex)
H. bacteriophora  GTATCCTT-GACAAC†  -539  299  typical (HMMER)
H. contortus  ATCACCGT-GGCAAC†  -1397  296  typical (HMMER)
H. contortus  GTTGTCAT-GGCAGT†  -1471  281  typical (HMMER)
A. ceylanicum  GTTGTCAC-GACAAC†  -1450  289  typical (HMMER)
P. pacificus  GTATCCTT-GACAAC†  -606  299  typical (HMMER)
P. exspectatus  GTATCCTT-GACAAC†  -312  299  typical (HMMER)
S. ratti  GTTTCCAT-GCCAAC†  -513  342  typical (HMMER)
P. redivivus  GCTTTCTA-GACAAC†  -1498  235  atypical (relaxed consensus regex)
B. xylophilus  GTTCTCAG-TTACAC†  -388  212  atypical (manual inspection)
B. xylophilus  GTCTAGAT-ATAGAT†  -1404  190  atypical (manual inspection)
M. incognita  GTTTCCAT-GGTCAC†  -246  328  typical (HMMER)  Minc07403+Minc07404
M. incognita  GTTTCCAT-GGACAC†  -256  324  typical (HMMER)  Minc01307
M. hapla  GTTTCTAT-GGTCAC†  -468  304  typical (HMMER)
A. suum  GTCGTCAT-AGTGAC†  -198  277  typical (HMMER)
O. volvulus  GTCGTCAT-GGTGAC†  -625  290  typical (HMMER)
B. malayi  ATTTCTTT-AAAAAC†  -444  238  atypical (average/relaxed consensus regex)
B. malayi  GTTTCCAGAAGCTAC†  -531  301  atypical (TFM-Scan 6bp motif)
T. spiralis  GTAGCCAT-GGGAAC†  -342  328  typical (HMMER)
T. suis  GTTGCCAT-GGCGAC†  -185  339  typical (HMMER)

Figure 4.40: Distribution of X-box motifs in the dyf-5 promoter.

(a) Typical X-box motifs (generated from 23 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 7 input sequences)

Figure 4.41: Sequence logo of dyf-5 X-box motifs.

Figure 4.42: LOGO depicting aligned dyf-5 X-box motifs and 30bp of flanking promoter sequence.
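The alignment tables in this section share one row layout: species, aligned sequence, location, score, type with method in parentheses, and an optional comment. As a minimal sketch, and not part of the analysis pipeline described in this work, the Python snippet below parses one such row into named fields. The names `MotifRow`, `ROW_RE` and `parse_row` are hypothetical, and the dagger mark is kept as an opaque flag because its meaning is not restated in this section.

```python
import re
from typing import NamedTuple, Optional


class MotifRow(NamedTuple):
    species: str
    sequence: str    # aligned motif as printed; '-' is an alignment gap
    dagger: bool     # True if the sequence carries the dagger mark
    location: int    # Location column as printed (negative values)
    score: int       # Score column as printed
    motif_type: str  # 'typical' or 'atypical'
    method: str      # text inside the parentheses
    comment: str     # optional trailing comment, '' if absent


# Hypothetical pattern written for the row layout shown in the tables.
ROW_RE = re.compile(
    r"^(?P<species>[A-Z]\.\s+[a-z]+)\s+"
    r"(?P<seq>[ACGT-]+)(?P<dagger>†?)\s+"
    r"(?P<loc>-?\d+)\s+(?P<score>\d+)\s+"
    r"(?P<type>typical|atypical)\s+\((?P<method>[^)]*)\)\s*"
    r"(?P<comment>.*)$"
)


def parse_row(line: str) -> Optional[MotifRow]:
    """Parse one table row; return None if the line is not a data row."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    return MotifRow(
        species=" ".join(m["species"].split()),
        sequence=m["seq"],
        dagger=bool(m["dagger"]),
        location=int(m["loc"]),
        score=int(m["score"]),
        motif_type=m["type"],
        method=m["method"],
        comment=m["comment"].strip(),
    )


# Three rows copied from Table 4.15, plus the header line.
rows = [
    "Species Sequence Location Score Type (Method) Comment",
    "C. brenneri GTTACCAT-AGACAC† -283 305 typical (HMMER) CBN19125",
    "C. elegans GTTACCAT-AGAAAC -285 333 typical (HMMER)",
    "B. malayi GTTTCCAGAAGCTAC† -531 301 atypical (TFM-Scan 6bp motif)",
]
parsed = [parse_row(r) for r in rows]
```

Header lines and captions do not match the pattern and come back as `None`, so the same function can be run over a whole table and the results filtered.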

4.2.15 X-box motifs in the dyf-11 promoter

Typical X-box motif candidates were found in the promoters of 17 out of 21 dyf-11 orthologs (Table 4.16). Of the remaining 4 genes, 2 were eliminated from further X-box motif searches because their promoters contain gaps (Ns). We found atypical X-box motif candidates for the other 2 genes. The locations of the X-box motifs show some conservation, with clustering around the -200 to -400 region (Figure 4.43). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and some of the atypical X-box motifs share some similarity with the consensus (Figure 4.44). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.45).

Table 4.16: Alignment of X-box motifs in the dyf-11 promoter

Species  Sequence  Location  Score  Type (Method)
C. remanei  ATTTCCA-TGGCAAC†  -218  343  typical (HMMER)
C. tropicalis  GTTTCCA-TAGCTAC†  -164  330  typical (HMMER)
C. brenneri  GTTTCCA-TGACAAC†  -220  347  typical (HMMER)
C. sinica  ATCTCTA-TGGCAAC†  -222  307  typical (HMMER)
C. briggsae  GTTTCCC-TGGAAAC†  -634  322  typical (HMMER)
C. elegans  GTCTCCA-TGACAAC  -181  335  typical (HMMER)
C. japonica  GTTTCCA-TGGTAAC†  -165  356  typical (HMMER)
C. angaria  GTAACCA-CAGCAAC†  -339  301  typical (HMMER)
C. angaria  GTTTCTA-TGATGAC†  -637  284  typical (HMMER)
H. bacteriophora  GCTTCCA-TGGCAAC†  -1367  339  typical (HMMER)
H. contortus  GTCTCTT-AAGTAGT†  -994  189  atypical (relaxed consensus regex)
H. contortus  ATTTTCAGTCGTAAC†  -1911  285  atypical (TFM-Scan 7bp motif)
P. pacificus  GTTGCCC-AGGAGAC†  -257  263  typical (HMMER)
P. exspectatus  GTTGCCC-AGGAGAC†  -397  263  typical (HMMER)
S. ratti  GTTGCTA-TAGTAAC†  -122  315  typical (HMMER)
M. incognita  GTTGTCA-TGGTAAT†  -650  300  typical (HMMER)
M. hapla  GTTGTCA-TGGTAAT†  -666  300  typical (HMMER)
A. suum  GTCATCA-TCTCTAA†  -269  223  atypical (manual inspection)
A. suum  GTGCACATATGACAC†  -436  214  atypical (manual inspection)
D. immitis  GTCGCCA-TTGAAAC†  -434  314  typical (HMMER)
D. immitis  ATTACCA-TAGAAAC†  -1283  307  typical (HMMER)
O. volvulus  GTTCCTA-TTGAAAC†  -410  296  typical (HMMER)
O. volvulus  ATTACCA-TAGAAAC†  -1255  307  typical (HMMER)
B. malayi  GTTTCTA-TGGTAAT†  -1278  304  typical (HMMER)
L. loa  ATTACCA-TAGAAAC†  -1142  307  typical (HMMER)

Figure 4.43: Distribution of X-box motifs in the dyf-11 promoter.

(a) Typical X-box motifs (generated from 20 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 2 input sequences)
(c) Atypical 15bp X-box motifs (generated from 2 input sequences)

Figure 4.44: Sequence logo of dyf-11 X-box motifs.

Figure 4.45: LOGO depicting aligned dyf-11 X-box motifs and 30bp of flanking promoter sequence.

4.2.16 X-box motifs in the dyf-13 promoter

Typical X-box motif candidates were found in the promoters of 18 out of 20 dyf-13 orthologs (Table 4.17). We found atypical X-box motif candidates for the remaining 2 genes. The locations of the X-box motifs show some conservation, with clustering around the -100 region (Figure 4.46). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motifs share some similarity with the consensus (Figure 4.47). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.48).

Table 4.17: Alignment of X-box motifs in the dyf-13 promoter

Species  Sequence  Location  Score  Type (Method)
C. remanei  AT-CTCCATAGCAAC-†  -112  318  typical (HMMER)
C. tropicalis  AT-CTCCATAGCAAC-†  -124  318  typical (HMMER)
C. brenneri  AT-CTCCATAGCAAC-†  -115  318  typical (HMMER)
C. sinica  AT-CTCCATGGAAAC-†  -150  314  typical (HMMER)
C. briggsae  AT-CTCCATAGCAAC-†  -97  318  typical (HMMER)
C. elegans  GT-CTCCATAGCAAC-  -104  344  typical (HMMER)
C. japonica  AT-CTCCATAGCAAC-†  -123  318  typical (HMMER)
C. angaria  GT-TACCATGGCAAC-†  -65  363  typical (HMMER)
H. contortus  GT-TGTCATGGTAAG-†  -76  298  typical (HMMER)
H. contortus  GT-AACCATCGCAAC-†  -632  323  typical (HMMER)
A. ceylanicum  GT-TGTCATGGTAAA-†  -77  298  typical (HMMER)
N. americanus  GT-CGTCATGGCACC-†  -1853  297  typical (HMMER)
P. pacificus  GTGTATATTGATAAC-†  -1086  247  atypical (manual inspection)
P. pacificus  GT-TTTAATATCAAC-†  -1185  277  atypical (manual inspection)
P. exspectatus  GTGTATATTGATAAC-†  -1089  247  atypical (manual inspection)
P. exspectatus  GT-CGTTATGTCAAAC†  -377  248  atypical (manual inspection)
P. exspectatus  GT-CTTAATATCAAC-†  -1188  265  atypical (manual inspection)
S. ratti  GT-TACCATGGAAAT-†  -65  318  typical (HMMER)
P. redivivus  GT-TGCCATGGAAAC-†  -138  348  typical (HMMER)
B. xylophilus  AT-TGTCATGGCAAC-†  -101  315  typical (HMMER)
A. suum  GA-TGTCATGGCAAC-†  -138  310  typical (HMMER)
D. immitis  GT-CACCCTGGCAAC-†  -118  321  typical (HMMER)
O. volvulus  GT-CACCTTGGCAAC-†  -116  322  typical (HMMER)
B. malayi  GT-TGCCGTGACGAC-†  -121  288  typical (HMMER)
L. loa  AT-CGCCCTGGCAAC-†  -117  297  typical (HMMER)

Figure 4.46: Distribution of X-box motifs in the dyf-13 promoter.

(a) Typical X-box motifs (generated from 20 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 2 input sequences)
(c) Atypical 15bp X-box motifs (generated from 3 input sequences)

Figure 4.47: Sequence logo of dyf-13 X-box motifs.

Figure 4.48: LOGO depicting aligned dyf-13 X-box motifs and 30bp of flanking promoter sequence.
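The statement that dyf-13 motif locations cluster around the -100 region can be checked directly from the Location column of Table 4.17. The following is a minimal sketch, assuming an arbitrary window of -200 to 0 that is chosen here only for illustration and is not a threshold used in this work. The function name `fraction_in_window` is hypothetical.

```python
from statistics import median
from typing import Sequence


def fraction_in_window(locations: Sequence[int], lo: int, hi: int) -> float:
    """Return the fraction of motif locations with lo <= location <= hi."""
    if not locations:
        raise ValueError("no locations given")
    inside = sum(1 for x in locations if lo <= x <= hi)
    return inside / len(locations)


# Location column of Table 4.17 (dyf-13), in table order, all 25 rows.
dyf13_locations = [
    -112, -124, -115, -150, -97, -104, -123, -65, -76, -632,
    -77, -1853, -1086, -1185, -1089, -377, -1188, -65, -138, -101,
    -138, -118, -116, -121, -117,
]

# Illustrative window only; the bounds are an assumption of this sketch.
near_fraction = fraction_in_window(dyf13_locations, -200, 0)
median_location = median(dyf13_locations)
```

The median is used rather than the mean because a few distant motifs, such as the one at -1853, would otherwise dominate the summary.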

4.2.17 X-box motifs in the dyf-18 promoter

Typical X-box motif candidates were found in the promoters of 18 out of 21 dyf-18 orthologs (Table 4.18). We found atypical X-box motif candidates for the remaining 3 genes. The locations of the X-box motifs show some conservation, with clustering around the -100 to -200 region (Figure 4.49). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and some of the atypical X-box motifs share some similarity with the consensus (Figure 4.50). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.51).

Table 4.18: Alignment of X-box motifs in the dyf-18 promoter

Species  Sequence  Location  Score  Type (Method)
C. remanei  GTCTC-CATGACAAC†  -154  335  typical (HMMER)
C. tropicalis  GTTTC-CATGACAAC†  -167  347  typical (HMMER)
C. brenneri  GTCTC-CATGACAAC†  -163  335  typical (HMMER)
C. sinica  GTCTC-CATGACAAC†  -178  335  typical (HMMER)
C. briggsae  GTCTC-CATGCCAAC†  -169  330  typical (HMMER)
C. elegans  GTTAC-CGTGCCAAC†  -281  307  typical (HMMER)
C. elegans  GTCTC-CATGACAAC  -161  335  typical (HMMER)
C. japonica  GTCTC-CATGACAAC†  -128  335  typical (HMMER)
C. angaria  ATCAC-CATAGCAAC†  -109  312  typical (HMMER)
H. bacteriophora  GTTCC-CCTAGCAAC†  -141  316  typical (HMMER)
H. contortus  GTTAC-TATGGTAAC†  -1003  326  typical (HMMER)
A. ceylanicum  ATTGC-GATAGAAAT†  -536  253  atypical (TFM-Scan 7bp motif)
S. ratti  GTAAC-CA-AGCAAC†  -1109  301  atypical (TFM-Scan 6bp motif)
S. ratti  ATTAC-C-TGATGAC†  -1811  246  atypical (TFM-Scan 6bp motif)
P. redivivus  GTTGT-TATGACAAT†  -971  267  typical (HMMER)
P. redivivus  GTTAC-CATGGAGAT†  -119  292  typical (HMMER)
B. xylophilus  GTTGT-CATGGAGAT†  -70  270  typical (HMMER)
M. incognita  GTTGC-TATAGAAAC†  -43  311  typical (HMMER)
M. hapla  GTTAC-CATAGTTAC†  -61  311  typical (HMMER)
A. suum  GCTTC-CATAGCAAC†  -1659  326  typical (HMMER)
O. volvulus  GTTTC-CATGGCTAC†  -500  343  typical (HMMER)
B. malayi  GTTTC-CATGGTGAC†  -440  330  typical (HMMER)
L. loa  GTTCC-CATGGTTAC†  -428  320  typical (HMMER)
T. spiralis  GTTTT-GATAGCAAC†  -561  304  typical (HMMER)
T. suis  ATACT-TATCCCTAC†  -104  192  atypical (manual inspection)
T. suis  GTTGCGCATGAATAG†  -1446  270  atypical (manual inspection)

Figure 4.49: Distribution of X-box motifs in the dyf-18 promoter.

(a) Typical X-box motifs (generated from 21 input sequences, including C. elegans motif(s))
(b) Atypical 13bp X-box motifs (generated from 2 input sequences)
(c) Atypical 14bp X-box motifs (generated from 2 input sequences)

Figure 4.50: Sequence logo of dyf-18 X-box motifs.

Figure 4.51: LOGO depicting aligned dyf-18 X-box motifs and 30bp of flanking promoter sequence.

4.2.18 X-box motifs in the dylt-2 promoter

Typical X-box motif candidates were found in the promoters of all 19 dylt-2 orthologs (Table 4.19). The locations of the X-box motifs show strong conservation, with clustering around the -50 to -100 region (Figure 4.52). The sequences of the typical X-box motif candidates are similar to the known X-box consensus (Figure 4.53). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.54).

Table 4.19: Alignment of X-box motifs in the dylt-2 promoter

Species  Sequence  Location  Score  Type (Method)
C. remanei  GTCTCCTTGGCAAC†  -82  328  typical (HMMER)
C. tropicalis  GTTTCCATGGAGAT†  -60  298  typical (HMMER)
C. brenneri  GTTACCATAGAGAC†  -95  307  typical (HMMER)
C. sinica  GTCTCCATGACAAC†  -102  335  typical (HMMER)
C. briggsae  GTTTCCATGGCTAC  -83  343  typical (HMMER)
C. elegans  GTTGCCATGACAAC  -78  343  typical (HMMER)
C. japonica  GTCTCCATGGCAAC†  -64  357  typical (HMMER)
C. angaria  GTCACCATAGCAAC†  -99  338  typical (HMMER)
H. bacteriophora  GTATCCATGACAAC†  -72  328  typical (HMMER)
A. ceylanicum  GTTTCCTTAGCAAC†  -39  327  typical (HMMER)
N. americanus  GTCACCTTGACAAC†  -36  300  typical (HMMER)
P. pacificus  ATCGCCATGGCAAC†  -135  327  typical (HMMER)
P. exspectatus  ATCGTCATGGCAAC†  -127  303  typical (HMMER)
S. ratti  ATTACCATAGCAAC†  -65  324  typical (HMMER)
P. redivivus  GTTGCCATGGCAAT†  -68  337  typical (HMMER)
A. suum  GTTGTCATGGGAAC†  -1078  323  typical (HMMER)
O. volvulus  GTTTCCATGGATAC†  -66  326  typical (HMMER)
B. malayi  GTTTCCATGGATAC†  -67  326  typical (HMMER)
L. loa  GTCACCATGGATAC†  -70  308  typical (HMMER)
T. suis  GTTGCTATGCAAAT†  -1852  269  typical (HMMER)
T. suis  GTTTCCATGGAAAC†  -57  352  typical (HMMER)

Figure 4.52: Distribution of X-box motifs in the dylt-2 promoter.

Figure 4.53: Sequence logo of dylt-2 typical X-box motifs (generated from 21 input sequences, including C. elegans motif(s)).

Figure 4.54: LOGO depicting aligned dylt-2 X-box motifs and 30bp of flanking promoter sequence.
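A sequence logo such as Figure 4.53 summarizes, for each alignment column, how strongly one nucleotide is preferred. As a minimal sketch of the underlying calculation, the snippet below counts nucleotides per column for the 21 dylt-2 motifs of Table 4.19 and computes the information content of a column as 2 minus its Shannon entropy, in bits. This is the standard uncorrected formula for DNA; logo-drawing tools commonly add a small-sample correction that is omitted here, and the sketch makes no claim about which tool or settings produced the figures. The function names are hypothetical, and the input is assumed to be ungapped and of equal length.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_counts(seqs: Sequence[str]) -> List[Counter]:
    """Count symbols in each alignment column of equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    return [Counter(column) for column in zip(*seqs)]


def information_content(counts: Counter) -> float:
    """Uncorrected information content of one DNA column, in bits."""
    total = sum(counts.values())
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in counts.values() if n
    )
    return 2.0 - entropy


# Motif sequences from Table 4.19 (dylt-2), in table order.
dylt2_motifs = [
    "GTCTCCTTGGCAAC", "GTTTCCATGGAGAT", "GTTACCATAGAGAC",
    "GTCTCCATGACAAC", "GTTTCCATGGCTAC", "GTTGCCATGACAAC",
    "GTCTCCATGGCAAC", "GTCACCATAGCAAC", "GTATCCATGACAAC",
    "GTTTCCTTAGCAAC", "GTCACCTTGACAAC", "ATCGCCATGGCAAC",
    "ATCGTCATGGCAAC", "ATTACCATAGCAAC", "GTTGCCATGGCAAT",
    "GTTGTCATGGGAAC", "GTTTCCATGGATAC", "GTTTCCATGGATAC",
    "GTCACCATGGATAC", "GTTGCTATGCAAAT", "GTTTCCATGGAAAC",
]

counts = column_counts(dylt2_motifs)
bits = [information_content(c) for c in counts]
```

A column in which every sequence carries the same base, as at the second position of these motifs, reaches the 2-bit maximum, while a column with evenly mixed bases approaches 0 bits.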

4.2.19 X-box motifs in the ift-20 promoter

Typical X-box motif candidates were found in the promoters of 20 out of 22 ift-20 orthologs (Table 4.20). We found atypical X-box motif candidates for the remaining 2 genes. The locations of the X-box motifs show some conservation, with clustering around the -50 to -200 region (Figure 4.55). The sequences of the typical X-box motif candidates are similar to the known X-box consensus (Figure 4.56). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.57).

Table 4.20: Alignment of X-box motifs in the ift-20 promoter

Species  Sequence  Location  Score  Type (Method)  Comment
C. remanei  GTTTCC-ATGGTAAC†  -56  356  typical (HMMER)
C. tropicalis  GTTTCC-ATGACAAC†  -55  347  typical (HMMER)
C. brenneri  GTTTCT-ATGGTAAC†  -60  332  typical (HMMER)  CBN07292
C. brenneri  GTTTCT-ATGGTAAC†  -57  332  typical (HMMER)  CBN16460
C. sinica  GCTCCC-ATGACAAC†  -77  307  typical (HMMER)
C. briggsae  GTAACC-ATGACAAC†  -66  322  typical (HMMER)
C. elegans  GTCTCT-ATAGCAAC  -60  320  typical (HMMER)
C. japonica  GTATCC-ATGACAAC†  -61  328  typical (HMMER)
C. angaria  GTTTCC-ATGACAAC†  -89  347  typical (HMMER)
H. bacteriophora  GTTGAC-ATAGCAAC†  -235  324  typical (HMMER)
H. contortus  GTTGCC-ATGGAGAA†  -1106  292  typical (HMMER)
A. ceylanicum  GTTGCCAACGGCAAC†  -336  335  atypical (TFM-Scan 6bp motif)
N. americanus  GTTTTC-ATAGGAAC†  -468  314  typical (HMMER)
P. pacificus  ATCGCC-ATGACAAC†  -98  305  typical (HMMER)
S. ratti  GTTGCT-ATGGAAAG†  -65  294  typical (HMMER)
P. redivivus  GCTGCC-ATGGCAAC†  -44  335  typical (HMMER)
B. xylophilus  ATCTCC-ATGGCAAC†  -79  331  typical (HMMER)
M. hapla  GTTTCC-ATGGCTAC†  -106  343  typical (HMMER)
A. suum  GTTGCT-A-GGTAAC†  -1337  298  atypical (TFM-Scan 6bp motif)
D. immitis  GTTTCC-AAGGTAAC†  -170  327  typical (HMMER)
O. volvulus  GTTTCC-AAGGCAAC†  -178  340  typical (HMMER)
B. malayi  GTTTCC-AAGGTAAC†  -180  327  typical (HMMER)
L. loa  GTTTCC-AAGGTAAC†  -168  327  typical (HMMER)

Figure 4.55: Distribution of X-box motifs in the ift-20 promoter.

Figure 4.56: Sequence logo of ift-20 typical X-box motifs (generated from 21 input sequences, including C. elegans motif(s)).

Figure 4.57: LOGO depicting aligned ift-20 X-box motifs and 30bp of flanking promoter sequence.

4.2.20 X-box motifs in the ifta-1 promoter

Typical X-box motif candidates were found in the promoters of 14 out of 15 ifta-1 orthologs (Table 4.21). We found atypical X-box motif candidates for the remaining gene. The locations of the X-box motifs show some conservation, with clustering around the -50 to -200 region and some outliers (Figure 4.58). The sequences of the typical X-box motif candidates are similar to the known X-box consensus (Figure 4.59). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.60).

Table 4.21: Alignment of X-box motifs in the ifta-1 promoter

Species  Sequence  Location  Score  Type (Method)
C. remanei  GTCCCCT-AGGTGGC†  -948  218  atypical (relaxed consensus regex)
C. remanei  GTTGCCCTTGACAAC†  -86  313  atypical (TFM-Scan 6bp/7bp motif)
C. tropicalis  GTTGCCA-TAACAAC†  -130  330  typical (HMMER)
C. brenneri  GTTTCTA-TGACGAC†  -131  297  typical (HMMER)
C. sinica  GTTACCA-AGACAAC†  -170  312  typical (HMMER)
C. briggsae  GTCACTA-TGACAAC†  -112  305  typical (HMMER)
C. elegans  GTTGCCA-TGGCAAT  -114  337  typical (HMMER)
C. japonica  GTTGCTA-TGACAAC†  -114  319  typical (HMMER)
C. angaria  GTTGCTG-TGGTAAC†  -114  299  typical (HMMER)
H. bacteriophora  GTTACCA-TGGTGAC†  -148  324  typical (HMMER)
A. ceylanicum  GTTGCCA-TGGTGAA†  -81  296  typical (HMMER)
P. redivivus  GTTTCCA-TGACAAA†  -1065  317  typical (HMMER)
P. redivivus  GTTGCCA-TGGCTAC†  -89  339  typical (HMMER)
B. xylophilus  GTTGTCA-TAGTGAC†  -110  289  typical (HMMER)
M. hapla  GTAGCCA-TGACAAC†  -145  324  typical (HMMER)
M. hapla  ATTTCCA-TGATAAT†  -1472  280  typical (HMMER)
O. volvulus  ATTGCCA-TGGTAAC†  -133  326  typical (HMMER)
T. spiralis  GTTGCCA-AGGTGGC†  -1078  265  typical (HMMER)
T. spiralis  GTTACCA-TGACAAC†  -73  341  typical (HMMER)
T. suis  GTTGCTA-TGGAAAC†  -63  324  typical (HMMER)

Figure 4.58: Distribution of X-box motifs in the ifta-1 promoter.

Figure 4.59: Sequence logo of ifta-1 typical X-box motifs (generated from 18 input sequences, including C. elegans motif(s)).

Figure 4.60: LOGO depicting aligned ifta-1 X-box motifs and 30bp of flanking promoter sequence.
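The flanking-sequence LOGOs in this section are built from each motif together with 30bp of promoter sequence on either side. The sketch below illustrates one way such a window could be cut from a promoter string. It rests on assumptions that are not stated in this section: the promoter string is taken to end at position -1, and the Location value is taken to mark the first base of the motif. The function name `motif_with_flanks` and the synthetic promoter are hypothetical and serve only as an example.

```python
def motif_with_flanks(
    promoter: str, location: int, motif_len: int, flank: int = 30
) -> str:
    """Return the motif plus up to `flank` bases on each side.

    Assumes the last base of `promoter` is position -1 and that
    `location` (negative) is the position of the motif's first base.
    """
    if location >= 0:
        raise ValueError("location must be negative (upstream)")
    start = len(promoter) + location
    end = start + motif_len
    if start < 0 or end > len(promoter):
        raise ValueError("motif does not lie within the promoter")
    left = max(0, start - flank)            # clip at the promoter's 5' end
    right = min(len(promoter), end + flank)  # clip at the promoter's 3' end
    return promoter[left:right]


# Synthetic promoter (made up for illustration): 100 bases, the ungapped
# C. elegans ifta-1 motif from Table 4.21, then 100 more bases.
motif = "GTTGCCATGGCAAT"
promoter = "A" * 100 + motif + "C" * 100
window = motif_with_flanks(promoter, -114, len(motif))
```

Clipping at the promoter boundaries means that motifs close to either end yield shorter windows, which would need padding before they could be stacked into a single alignment.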

4.2.21 X-box motifs in the mks-1 promoter

Typical X-box motif candidates were found in the promoters of 5 out of 8 mks-1 orthologs (Table 4.22). Of the remaining 3 genes, one was eliminated from further X-box motif searches because its promoter contains gaps (Ns). We found atypical X-box motif candidates for the other two genes. The locations of the X-box motifs show some conservation, although there are fewer well-defined mks-1 orthologs compared to other genes (Figure 4.61). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motifs show some similarity to the consensus (Figure 4.62). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.63).

Table 4.22: Alignment of X-box motifs in the mks-1 promoter

Species  Sequence  Location  Score  Type (Method)
C. remanei  GTTGCTATG-GGGAT†  -230  269  typical (HMMER)
C. tropicalis  GTCCCCTTG-GTAAC†  -65  305  typical (HMMER)
C. brenneri  GTCTCCATG-GTGAC†  -75  318  typical (HMMER)
C. elegans  GTCACCATA-GGAAC  -69  320  typical (HMMER)
C. japonica  GTTACCATG-GAAAC†  -68  346  typical (HMMER)
A. ceylanicum  GTTGTCATG-ACAAC†  -307  319  typical (HMMER)
A. suum  GTTGCCAGG-TGTAC†  -403  265  atypical (manual inspection)
A. suum  GTGTTCATGCGAAAC†  -726  307  atypical (manual inspection)
B. malayi  GTTTCCAAG-ACGGT†  -566  232  atypical (relaxed consensus regex)
B. malayi  GTTTCCTTA-AAGAC†  -1576  262  atypical (relaxed/average consensus regex)
B. malayi  GTTACC-TA-GCAGC†  -117  288  atypical (TFM-Scan 6bp motif)

Figure 4.61: Distribution of X-box motifs in the mks-1 promoter.

(a) Typical X-box motifs (generated from 6 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 3 input sequences)

Figure 4.62: Sequence logo of mks-1 X-box motifs.

Figure 4.63: LOGO depicting aligned mks-1 X-box motifs and 30bp of flanking promoter sequence.

4.2.22 X-box motifs in the mks-6 promoter

Typical X-box motif candidates were found in the promoters of all 5 mks-6 orthologs (Table 4.23). The locations of the X-box motifs show some conservation, although there are fewer well-defined mks-6 orthologs compared to other genes (Figure 4.64). The sequences of the typical X-box motif candidates are similar to the known X-box consensus (Figure 4.65). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.66).

Table 4.23: Alignment of X-box motifs in the mks-6 promoter

Species  Sequence  Location  Score  Type (Method)
C. tropicalis  GTTGTTTTGAAAAC†  -1371  249  typical (HMMER)
C. tropicalis  GTTTCCATGGCAAC†  -85  369  typical (HMMER)
C. brenneri  GTTTCCATGGAGAC†  -90  326  typical (HMMER)
C. sinica  GTCTCCATGACGAC†  -83  309  typical (HMMER)
C. briggsae  GTTGCCATGGGAGC†  -68  315  typical (HMMER)
C. elegans  GTTGCCATAGCGAC  -71  326  typical (HMMER)
C. japonica  GTAGCCATGGCAAC†  -75  346  typical (HMMER)

Figure 4.64: Distribution of X-box motifs in the mks-6 promoter.

Figure 4.65: Sequence logo of mks-6 typical X-box motifs (generated from 7 input sequences, including C. elegans motif(s)).

Figure 4.66: LOGO depicting aligned mks-6 X-box motifs and 30bp of flanking promoter sequence.

4.2.23 X-box motifs in the mksr-1 promoter

Typical X-box motif candidates were found in the promoters of 20 out of 21 mksr-1 orthologs (Table 4.24). We found atypical X-box motif candidates for the remaining gene. The locations of the X-box motifs show some conservation, with clustering around the -100 region and some outliers (Figure 4.67). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motifs show some similarity to the consensus (Figure 4.68). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.69).

Table 4.24: Alignment of X-box motifs in the mksr-1 promoter

Species  Sequence  Location  Score  Type (Method)
C. remanei  GTCTCCCTGGCAAC†  -109  327  typical (HMMER)
C. tropicalis  GTTACCTTAGCAAC†  -111  321  typical (HMMER)
C. brenneri  GTCACCCTGGTAAC†  -118  308  typical (HMMER)
C. sinica  GTCACCGTGGCAAC†  -96  322  typical (HMMER)
C. briggsae  GTCACCTAGGCAAC†  -75  293  typical (HMMER)
C. elegans  GTTCCCTTGGCAAC  -85  330  typical (HMMER)
C. japonica  GTTTCCTTGACAAC†  -69  318  typical (HMMER)
C. angaria  GTTACTATGACAAC†  -104  317  typical (HMMER)
H. bacteriophora  GTTGCTATAGAAAC†  -194  311  typical (HMMER)
A. ceylanicum  GTTGTCATGACAAC†  -217  319  typical (HMMER)
N. americanus  GTTGCCATGACGAC†  -259  317  typical (HMMER)
P. pacificus  GTTTCCATGGCAAT†  -112  341  typical (HMMER)
S. ratti  GTTGTTAAAAATAT†  -369  182  atypical (average consensus regex)
S. ratti  GTTGCTA-AGCAAC†  -85  298  atypical (TFM-Scan 6bp motif)
P. redivivus  GTCTCCATGACAAC†  -91  335  typical (HMMER)
B. xylophilus  GTTGCTATAGCGAT†  -76  274  typical (HMMER)
A. suum  GTTGCCTTGGCGAC†  -1079  310  typical (HMMER)
D. immitis  GTAACCATGGTTAC†  -1227  305  typical (HMMER)
O. volvulus  GTCACCATGGTAAC†  -606  338  typical (HMMER)
B. malayi  GTCACCATGGTGAC†  -546  312  typical (HMMER)
L. loa  GTCACTATGGTGAC†  -604  288  typical (HMMER)
T. spiralis  GTTGCCATAGAAAC†  -81  335  typical (HMMER)
T. suis  GTTGTTATGGCAAC†  -79  317  typical (HMMER)

Figure 4.67: Distribution of X-box motifs in the mksr-1 promoter.

Figure 4.68: Sequence logo of mksr-1 typical X-box motifs (generated from 21 input sequences, including C. elegans motif(s)).

Figure 4.69: LOGO depicting aligned mksr-1 X-box motifs and 30bp of flanking promoter sequence.

4.2.24 X-box motifs in the mksr-2 promoter

Typical X-box motif candidates were found in the promoters of 17 out of 23 mksr-2 orthologs (Table 4.25). Of the remaining 5 genes, two were eliminated from further X-box motif searches because their promoters contain gaps (Ns). We found atypical X-box motif candidates for the remaining 4 genes. The locations of the X-box motifs show some conservation, with clustering around the -50 to -100 region and some outliers (Figure 4.70). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motifs show some similarity to the consensus (Figure 4.71). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.72).

Table 4.25: Alignment of X-box motifs in the mksr-2 promoter

Species  Sequence  Location  Score  Type (Method)
C. remanei  GTCCCCA-TGGTGAC†  -83  308  typical (HMMER)
C. tropicalis  GTTGCCTAGGGTAAC†  -86  294  atypical (TFM-Scan 6bp/7bp motif)
C. tropicalis  GTCTCCC-AGTCAAC†  -792  271  atypical (TFM-Scan 6bp motif)
C. brenneri  GTCACCA-TGGTGAC†  -62  312  typical (HMMER)
C. briggsae  ATCGCCCGCGAGAAC†  -1359  227  atypical (TFM-Scan 6bp motif)
C. briggsae  GTTGCCT-AGACGAC†  -71  259  atypical (TFM-Scan 6bp motif, relaxed consensus regex)
C. elegans  GTTGCCG-TGGCAAC  -67  336  typical (HMMER)
C. japonica  GTAACCA-TGGTAAC†  -67  331  typical (HMMER)
H. bacteriophora  GCTTCTA-TGACAAC†  -821  293  typical (HMMER)
H. contortus  GTTTCTA-TGGTGAC†  -1433  306  typical (HMMER)
A. ceylanicum  GTGACCA-TAGTAAC†  -232  316  typical (HMMER)
P. pacificus  CTCGCCA-TGGCAAC†  -211  324  typical (HMMER)
P. exspectatus  CTCGCCA-TGGCAAC†  -210  324  typical (HMMER)
S. ratti  GTTACTA-TAGAAAC†  -124  309  typical (HMMER)
P. redivivus  ATCGCCA-TGGCAAC†  -91  327  typical (HMMER)
B. xylophilus  ATTGCCA-TAGTAAC†  -56  313  typical (HMMER)
M. hapla  ATTGTCA-TGGTAAC†  -258  302  typical (HMMER)
A. suum  TTCACCA-TGGCAAC†  -73  322  typical (HMMER)
D. immitis  GTCTCCA-TGGCAAC†  -77  357  typical (HMMER)
O. volvulus  GTCACCA-TGGTGAC†  -79  312  typical (HMMER)
B. malayi  GTTTCCA-TGGTGAC†  -68  330  typical (HMMER)
L. loa  ATTGCTA-TGGACAC†  -1683  270  typical (HMMER)
L. loa  ATCTCCA-TGGCGAC†  -70  305  typical (HMMER)
T. spiralis  ATAATTT-TGAAAAC†  -537  202  atypical (average consensus regex)
T. spiralis  GTTTTCA-AAAAAAT†  -1028  236  atypical (relaxed/average consensus regex)
T. spiralis  GTATCCATTGACGAC†  -58  302  atypical (TFM-Scan 6bp/7bp motif)
T. suis  GCTTCCCCAGGCGAC†  -578  254  atypical (TFM-Scan 6bp motif)
T. suis  GTTGCCGTTGGAAAC†  -26  319  atypical (TFM-Scan 6bp/7bp motif)

Figure 4.70: Distribution of X-box motifs in the mksr-2 promoter.

(a) Typical X-box motifs (generated from 19 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 4 input sequences)
(c) Atypical 15bp X-box motifs (generated from 5 input sequences)

Figure 4.71: Sequence logo of mksr-2 X-box motifs.

Figure 4.72: LOGO depicting aligned mksr-2 X-box motifs and 30bp of flanking promoter sequence.

4.2.25 X-box motifs in the nphp-2 promoter

Typical X-box motif candidates were found in the promoters of 4 out of 18 nphp-2 orthologs (Table 4.26). Of the remaining 14 genes, 3 were eliminated from further X-box motif searches because their promoters contain gaps (Ns). We found atypical X-box motif candidates for the other 11 genes. The locations of the X-box motifs do not show any significant clustering, and are instead scattered across the 2kb region (Figure 4.73). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motifs show some similarity to the consensus (Figure 4.74). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.75).

Table 4.26: Alignment of X-box motifs in the nphp-2 promoter

Species  Sequence  Location  Score  Type (Method)
C. remanei  GTTGTCCC-GGCAAC†  -535  281  atypical (TFM-Scan 6bp/7bp motif)
C. tropicalis  GTTGTCCTGGGCAAC†  -441  311  atypical (TFM-Scan 6bp/7bp motif)
C. brenneri  ATCCTCTT-GAAGAC†  -1071  203  atypical (relaxed/average consensus regex)
C. brenneri  GTCTTCAA-GAGGAT†  -328  210  atypical (relaxed/average consensus regex)
C. brenneri  GTTGTCCTAGGCAAC†  -1443  311  atypical (TFM-Scan 6bp/7bp motif)
C. sinica  GTTGTCCT-GGCAAC†  -854  311  typical (HMMER)
C. elegans  GTTGTCAG-GGTAAC  -187  299  typical (HMMER)
C. japonica  GTTTTCCT-GGAAAC†  -391  298  typical (HMMER)
C. angaria  GTTGCTAG-GGAAAC†  -667  295  typical (HMMER)
H. bacteriophora  GTCTCCAG-GGCAAC†  -322  328  typical (HMMER)
A. ceylanicum  GTCTGAGC-TGCGAC†  -587  194  atypical (manual inspection)
A. ceylanicum  ATCATCAT-GTCGAA†  -1599  218  atypical (manual inspection)
N. americanus  GTTCCGGTAGTAGAC†  -415  232  atypical (manual inspection)
N. americanus  GTAAGCAT-AGTAAT†  -1284  262  atypical (manual inspection)
N. americanus  GTAATTCT-GCTTAC†  -1578  200  atypical (manual inspection)
P. pacificus  GTCGATATATTCAAC†  -90  252  atypical (manual inspection)
P. pacificus  GTACATACTTCTTAC†  -154  170  atypical (manual inspection)
P. exspectatus  CTTGTGAT-GGGAAG†  -273  236  atypical (manual inspection)
S. ratti  GTTGTCTT-GGTTAC†  -208  273  atypical (TFM-Scan 6bp/7bp motif)
M. hapla  GTTGCTATTAGTAAC†  -348  315  atypical (TFM-Scan 6bp/7bp motif)
O. volvulus  GTTTTATC-AGAAAC†  -549  228  atypical (manual inspection)
L. loa  ATTTTTAT-GGTGAC†  -1973  256  atypical (relaxed/average consensus regex)

Figure 4.73: Distribution of X-box motifs in the nphp-2 promoter.

(a) Typical X-box motifs (generated from 5 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 11 input sequences)
(c) Atypical 15bp X-box motifs (generated from 6 input sequences)

Figure 4.74: Sequence logo of nphp-2 X-box motifs.

Figure 4.75: LOGO depicting aligned nphp-2 X-box motifs and 30bp of flanking promoter sequence.

4.2.26 X-box motifs in the odr-4 promoter

Typical X-box motif candidates were found in the promoters of 8 out of 14 odr-4 orthologs (Table 4.27). Of the remaining 6 genes, one was eliminated from further X-box motif searches because its promoter contains gaps (Ns). We found atypical X-box motif candidates for the other 5 genes. The locations of the X-box motifs show some conservation, with clustering around the -50 to -300 region and some outliers (Figure 4.76). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motifs show some similarity to the consensus (Figure 4.77). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.78).

Table 4.27: Alignment of X-box motifs in the odr-4 promoter

Species  Sequence  Location  Score  Type (Method)
C. remanei  ATCGTCAT-GGTAAC†  -264  290  typical (HMMER)
C. tropicalis  ATCGTCAT-GGTAAC†  -230  290  typical (HMMER)
C. brenneri  ATCGTCAT-GGTAAC†  -252  290  typical (HMMER)
C. sinica  ATCGTCAT-GGTAAC†  -254  290  typical (HMMER)
C. briggsae  ATCGCCAT-GGTTAC  -262  288  typical (HMMER)
C. elegans  ATCGTCAT-GGTAAC  -200  290  typical (HMMER)
H. bacteriophora  GATGTCAT-GGCAAC†  -95  310  typical (HMMER)
H. bacteriophora  GTTACTAT-TGTAAC†  -1107  304  typical (HMMER)
A. ceylanicum  GTTACCAT-AGTAAC†  -96  337  typical (HMMER)
N. americanus  GTTACCAT-AGTAAC†  -96  337  typical (HMMER)
A. suum  GTCTTTTT-AAGGGT†  -42  141  atypical (relaxed consensus regex)
A. suum  GTCCTCTA-AAGGAC†  -713  186  atypical (relaxed consensus regex)
A. suum  GTTGTCAA-AAAAGC†  -784  228  atypical (relaxed consensus regex)
D. immitis  GTATAAATCAGCAAT†  -238  253  atypical (manual inspection)
O. volvulus  GTAGTTATTAACAAC†  -47  263  atypical (manual inspection)
B. malayi  GTTCACAT-TCGAAT†  -629  236  atypical (manual inspection)
B. malayi  GTAGCCATACATAG-†  -113  228  atypical (manual inspection)
B. malayi  GTTGTAAT-TATCAT†  -15  200  atypical (manual inspection)
L. loa  GTCAACATCCGGTAC†  -245  258  atypical (manual inspection)
L. loa  GTTAATAGCAGATAC†  -1297  226  atypical (manual inspection)
L. loa  GTACAAAT-AGTAAC†  -234  258  atypical (manual inspection)

Figure 4.76: Distribution of X-box motifs in the odr-4 promoter.

(a) Typical X-box motifs (generated from 10 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 7 input sequences)
(c) Atypical 15bp X-box motifs (generated from 4 input sequences)

Figure 4.77: Sequence logo of odr-4 X-box motifs.

Figure 4.78: LOGO depicting aligned odr-4 X-box motifs and 30bp of flanking promoter sequence.

4.2.27 X-box motifs in the osm-1 promoter

Typical X-box motif candidates were found in the promoters of 25 out of 26 osm-1 orthologs (Table 4.28). We found atypical X-box motif candidates for the remaining gene. The locations of the X-box motifs show strong conservation, with clustering around the -100 region (Figure 4.79). The only two outliers are additional motifs in genes that also contain a motif within the expected region. The sequences of the typical X-box motif candidates are similar to the known X-box consensus (Figure 4.80). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we did not observe any significantly conserved sequences outside of the X-box motif (Figure 4.81).

Table 4.28: Alignment of X-box motifs in the osm-1 promoter

Species  Sequence  Location  Score  Type (Method)  Comment
C. remanei  GTTGCCATGGACAC†  -77  320  typical (HMMER)
C. tropicalis  GTTGCCATGGACGC†  -71  288  typical (HMMER)
C. brenneri  GTTACCATGGACAC†  -75  318  typical (HMMER)
C. sinica  GTTGCCATGGTCAC†  -80  324  typical (HMMER)
C. briggsae  GTTGCCATGGACAC  -79  320  typical (HMMER)
C. elegans  GCTACCATGGCAAC  -86  333  typical (HMMER)
C. elegans  GTATCCATACCAAC†  -1320  310  typical (HMMER)
C. japonica  GTTGCTATGGACAC†  -74  296  typical (HMMER)
C. angaria  GTTGCTATGGCAAT†  -95  313  typical (HMMER)
H. bacteriophora  GTTGCCAAGACAAC†  -60  314  typical (HMMER)
H. contortus  GTGTCCATGGAAAC†  -80  331  typical (HMMER)  HCOI00970500.t1
H. contortus  GTGTCCATGGAAAC†  -81  331  typical (HMMER)  HCOI00735000.t1
A. ceylanicum  GTGTCCATGGAAAC†  -81  331  typical (HMMER)
N. americanus  GTATCCATGGAAAC†  -88  333  typical (HMMER)
P. pacificus  GTTCCCATGGTTAC†  -207  320  typical (HMMER)
P. exspectatus  GTTGCTATGGTTGC†  -192  270  typical (HMMER)
S. ratti  GTTACTATGGTAAC†  -153  326  typical (HMMER)
P. redivivus  GTCGTCATGGCAAC†  -92  329  typical (HMMER)
B. xylophilus  GTTGCTTTAATAGT†  -1866  204  atypical (relaxed consensus regex)
B. xylophilus  GTTGCC-TGGTTAC†  -87  296  atypical (TFM-Scan 6bp motif)
M. incognita  GTTTCCATGGAAAC†  -93  352  typical (HMMER)  Minc00536
M. incognita  GTTTCCATGGAAAC†  -92  352  typical (HMMER)  Minc02229
M. hapla  GTTTCCATGGAAAC†  -59  352  typical (HMMER)
A. suum  GTCGTCATGACAAC†  -79  307  typical (HMMER)
D. immitis  GTAACCATGCCAAC†  -127  317  typical (HMMER)
O. volvulus  GTAACCATGCCAAC†  -128  317  typical (HMMER)
L. loa  GTAACCATGCCAAC†  -141  317  typical (HMMER)
L. loa  GTTTCCATGTAAAT†  -84  297  typical (HMMER)
T. spiralis  GTTGCCATGGTTAC†  -34  326  typical (HMMER)
T. suis  GTTGTCATGACAAC†  -35  319  typical (HMMER)

Figure 4.79: Distribution of X-box motifs in the osm-1 promoter.

Figure 4.80: Sequence logo of osm-1 typical X-box motifs (generated from 28 input sequences, including C. elegans motif(s)).
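The height of each column in a sequence logo reflects its information content. As a minimal sketch of that calculation (not the tool used to generate the figures), the following Python computes per-position information content in bits for a few osm-1 motifs taken from Table 4.28. It assumes gap-free motifs of equal length and omits the small-sample correction that logo generators typically apply, so the values are illustrative only.

```python
import math
from collections import Counter

def information_content(motifs):
    """Per-column information content (bits) for equal-length DNA motifs."""
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    bits = []
    for i in range(length):
        counts = Counter(m[i] for m in motifs)
        total = sum(counts.values())
        # Shannon entropy of the observed base frequencies in this column
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        # 2 bits is the maximum for a four-letter alphabet
        bits.append(2.0 - entropy)
    return bits

# Subset of typical osm-1 motifs from Table 4.28 (the full logo used 28 sequences)
motifs = [
    "GTTGCCATGGACAC",  # C. remanei
    "GTTGCCATGGACGC",  # C. tropicalis
    "GTTACCATGGACAC",  # C. brenneri
    "GCTACCATGGCAAC",  # C. elegans
]
ic = information_content(motifs)
for position, value in enumerate(ic, start=1):
    print(position, round(value, 2))
```

Fully conserved columns reach the 2-bit maximum, while variable columns score lower.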

Figure 4.81: LOGO depicting aligned osm-1 X-box motifs and 30bp of flanking promoter sequence.

4.2.28 X-box motifs in the osm-5 promoter

Typical X-box motif candidates were found in the promoters of 18 out of 21 osm-5 orthologs (Table 4.29). We found atypical X-box motif candidates for the remaining 3 genes. The locations of the X-box motif show strong conservation, with clustering around the -100 region and some outliers (Figure 4.82). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motif candidates show some similarity to the consensus (Figure 4.83). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.84).

Table 4.29: Alignment of X-box motifs in the osm-5 promoter

Species | Sequence | Location | Score | Type (Method)
C. remanei | ATCACTAT-GGAAAC† | -123 | 284 | typical (HMMER)
C. tropicalis | GTTACCAT-AGAGAC† | -109 | 307 | typical (HMMER)
C. brenneri | GTTACCAA-GGAGAC† | -110 | 291 | typical (HMMER)
C. sinica | GTTACCAT-GGAAAT† | -108 | 318 | typical (HMMER)
C. briggsae | GTTGCCAG-GGAAAC | -92 | 319 | typical (HMMER)
C. elegans | GGTGCCAT-GGCAAC† | -67 | 334 | typical (HMMER)
C. elegans | GTTACTAT-GGCAAC | -115 | 339 | typical (HMMER)
C. japonica | GTGCGTAT-ATGAAT† | -611 | 200 | atypical (manual inspection)
C. japonica | GTGCGGGT-GAGTAC† | -1281 | 187 | atypical (manual inspection)
C. angaria | GTTGCCAT-GGGAGC† | -192 | 315 | typical (HMMER)
H. bacteriophora | GTTCCCAT-AGCAAC† | -64 | 346 | typical (HMMER)
A. ceylanicum | GTATCCTT-GGCAAC† | -81 | 321 | typical (HMMER)
P. pacificus | GTTTCCAT-AGCAAC† | -95 | 356 | typical (HMMER)
S. ratti | GTTGCTAT-AGTAAA† | -1282 | 285 | typical (HMMER)
S. ratti | GTAACCAT-GGTAAT† | -79 | 303 | typical (HMMER)
P. redivivus | GTATCGAT-TGTGAA† | -169 | 231 | atypical (manual inspection)
P. redivivus | GTACCTGTTTGAAAC† | -476 | 248 | atypical (manual inspection)
P. redivivus | GTGACGATTAGAAAC† | -813 | 284 | atypical (manual inspection)
B. xylophilus | GTAGCTAT-GGAAAC† | -128 | 305 | typical (HMMER)
B. xylophilus | ATTGTCAT-AACAAC† | -148 | 280 | typical (HMMER)
M. hapla | GTCAGAAT-TTACAC† | -674 | 201 | atypical (manual inspection)
M. hapla | GTTGCTATTGGTTAC† | -104 | 302 | atypical (manual inspection)
A. suum | GTTGCTAT-GGTGAA† | -87 | 272 | typical (HMMER)
A. suum | GTTGCTAT-GGTGAA† | -273 | 272 | typical (HMMER)
D. immitis | GTTGTCAT-GGTGAG† | -49 | 272 | typical (HMMER)
O. volvulus | GTTGTCAT-GGTGAA† | -79 | 272 | typical (HMMER)
B. malayi | GTTGTCAT-GGTGAA† | -80 | 272 | typical (HMMER)
L. loa | GTTGTCAT-GGTGAA† | -92 | 272 | typical (HMMER)
T. spiralis | ACTGCCAT-AGCAAC† | -76 | 296 | typical (HMMER)
T. suis | GTTTCCAT-AGTGAC† | -60 | 317 | typical (HMMER)

Figure 4.82: Distribution of X-box motifs in the osm-5 promoter.

(a) Typical X-box motifs (generated from 23 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 4 input sequences)
(c) Atypical 15bp X-box motifs (generated from 3 input sequences)

Figure 4.83: Sequence logo of osm-5 X-box motifs.

Figure 4.84: LOGO depicting aligned osm-5 X-box motifs and 30bp of flanking promoter sequence.

4.2.29 X-box motifs in the osm-6 promoter

Typical X-box motif candidates were found in the promoters of 18 out of 26 osm-6 orthologs (Table 4.30). Of the remaining 8 genes, one was eliminated from further X-box motif searches because it contains gaps (Ns) in the promoter. We found atypical X-box motif candidates for the remaining 7 genes. The locations of the X-box motif show strong conservation, with clustering around the -100 region and some outliers (Figure 4.85). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motif candidates show some similarity to the consensus (Figure 4.86). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.87).

Table 4.30: Alignment of X-box motifs in the osm-6 promoter

Species | Sequence | Location | Score | Type (Method) | Comment
C. remanei | AT-CTCCATGGCAAC-† | -106 | 331 | typical (HMMER)
C. tropicalis | GT-TTCCATGACAAC-† | -118 | 347 | typical (HMMER)
C. brenneri | GT-TTCCATGACAAC-† | -98 | 347 | typical (HMMER) | CBN06040
C. brenneri | GC-TTCCATGACAAC-† | -104 | 317 | typical (HMMER) | CBN29520
C. sinica | GT-CTCCATGACAAC-† | -92 | 335 | typical (HMMER)
C. briggsae | GTTGCGAATGGGAAT-† | -1424 | 236 | atypical (TFM-Scan 7bp motif)
C. briggsae | GTATCCCTTGACAAC-† | -86 | 308 | atypical (TFM-Scan 6bp motif)
C. elegans | GT-TACCATAGTAAC- | -101 | 337 | typical (HMMER)
C. japonica | GT-CTCCATGGCAAC-† | -96 | 357 | typical (HMMER)
C. angaria | GT-TTCCATAGCAAC-† | -85 | 356 | typical (HMMER)
H. bacteriophora | GT-TGCTATGGGTTAC† | -389 | 235 | atypical (TFM-Scan 6bp/7bp motif)
H. bacteriophora | GT-AGTTATGAATAC-† | -1712 | 233 | atypical (TFM-Scan 7bp motif)
H. contortus | GT-CGCCATGACGAC-† | -196 | 305 | typical (HMMER)
A. ceylanicum | GT-CGCCATGGCAAC-† | -218 | 353 | typical (HMMER)
N. americanus | GT-AACCATAGCGAC-† | -217 | 305 | typical (HMMER)
P. pacificus | GT-TGCTATGGGGAC-† | -92 | 297 | typical (HMMER)
P. exspectatus | GT-TAGCATACCAAA-† | -1793 | 265 | atypical (manual inspection)
P. exspectatus | GT-AGGGTTAGCATAC† | -1798 | 186 | atypical (manual inspection)
P. exspectatus | GT-ACTAATGTCAAAC† | -1473 | 231 | atypical (manual inspection)
S. ratti | GT-TACCATGGAAAC-† | -71 | 346 | typical (HMMER)
P. redivivus | GT-TACCATGACGAC-† | -239 | 315 | typical (HMMER)
B. xylophilus | GT-TGCCATGGAAAG-† | -131 | 318 | typical (HMMER)
M. incognita | AT-CTCCATGGCAAC-† | -57 | 331 | typical (HMMER) | Minc10419a
M. incognita | AT-CTCCATGACAAC-† | -58 | 309 | typical (HMMER) | Minc08072a
M. hapla | GT-TTCCATGGCAAC-† | -63 | 369 | typical (HMMER)
A. suum | GT-CACCATAGCAAC-† | -363 | 338 | typical (HMMER)
D. immitis | GTTATTACTGATAAC-† | -869 | 233 | atypical (TFM-Scan 7bp motif)
D. immitis | GTAACCCATAGCAAC-† | -351 | 327 | atypical (TFM-Scan 6bp/7bp motif)
O. volvulus | GTAACCCATGGCAAC-† | -325 | 340 | atypical (TFM-Scan 6bp/7bp motif)
L. loa | GT-CATTTTGACAAC-† | -1396 | 252 | atypical (relaxed/average consensus regex)
T. spiralis | GT-TGAAATGTTGAC-† | -891 | 243 | atypical (manual inspection)
T. spiralis | GT-TCAAATGATCAAC† | -991 | 210 | atypical (manual inspection)

Figure 4.85: Distribution of X-box motifs in the osm-6 promoter.

(a) Typical X-box motifs (generated from 19 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 2 input sequences)
(c) Atypical 15bp X-box motifs (generated from 9 input sequences)

Figure 4.86: Sequence logo of osm-6 X-box motifs.

Figure 4.87: LOGO depicting aligned osm-6 X-box motifs and 30bp of flanking promoter sequence.

4.2.30 X-box motifs in the osm-12 promoter

Typical X-box motif candidates were found in the promoters of 15 out of 21 osm-12 orthologs (Table 4.31). We found atypical X-box motif candidates for the remaining 6 genes. The locations of the X-box motif show some conservation, with clustering around the -100 to -300 region (Figure 4.88). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motif candidates show some similarity to the consensus (Figure 4.89). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.90).

Table 4.31: Alignment of X-box motifs in the osm-12 promoter

Species | Sequence | Location | Score | Type (Method) | Comment
C. remanei | GT-TGCCATGGCAAC† | -121 | 365 | typical (HMMER)
C. tropicalis | GT-TTCCATGACAAC† | -190 | 347 | typical (HMMER)
C. brenneri | GT-CACCATAGCAAC† | -181 | 338 | typical (HMMER)
C. sinica | GT-TGCCATGGTGAC† | -741 | 326 | typical (HMMER)
C. briggsae | GT-TGCCATGGTTAC | -138 | 326 | typical (HMMER)
C. elegans | GT-TGCCATAGTAAC | -108 | 339 | typical (HMMER)
C. japonica | GT-CGCCATTGAAAC† | -1830 | 314 | typical (HMMER)
C. japonica | GT-TGCTATGGTAAC† | -85 | 328 | typical (HMMER)
C. angaria | GT-TGCCATGGCAAT† | -89 | 337 | typical (HMMER)
H. bacteriophora | GT-TGTCATAACAAC† | -250 | 306 | typical (HMMER)
H. contortus | GT-TACCATAGAGAC† | -612 | 307 | typical (HMMER) | HCOI02168000.t1
H. contortus | GT-TACCATAGAGAC† | -613 | 307 | typical (HMMER) | HCOI00752400.t1
A. ceylanicum | GT-AACCATGGCAAC† | -947 | 344 | typical (HMMER)
P. pacificus | AT-CACCATGACGAC† | -245 | 277 | atypical (relaxed/average consensus regex)
P. exspectatus | AT-CACCATGGCGAC† | -248 | 299 | typical (HMMER)
S. ratti | GT-TTCCATAGAAAC† | -76 | 339 | typical (HMMER)
P. redivivus | GT-AACCATGGAAAC† | -100 | 327 | typical (HMMER)
B. xylophilus | GT-TTTTAAAATGAT† | -1819 | 190 | atypical (relaxed/average consensus regex)
B. xylophilus | GTCTTCAAAAGAAAC† | -248 | 282 | atypical (TFM-Scan 6bp motif)
A. suum | GTCTCAATTCGTTAC† | -947 | 214 | atypical (manual inspection)
A. suum | GTTATAGATGTCTAC† | -769 | 241 | atypical (manual inspection)
A. suum | GT-TAGCGTTG-GAC† | -120 | 238 | atypical (manual inspection)
O. volvulus | GT-TGCTAT-GCAAC† | -359 | 319 | atypical (TFM-Scan 6bp motif)
B. malayi | GT-TTCCTTGGAAAC† | -309 | 323 | typical (HMMER)
L. loa | GT-CACTA-GGACAC† | -1722 | 252 | atypical (TFM-Scan 6bp motif)
L. loa | GT-TGCTA-AGCAAC† | -257 | 298 | atypical (TFM-Scan 6bp motif)
T. spiralis | GT-TGCT-TGGCAAC† | -112 | 311 | atypical (TFM-Scan 6bp motif)

Figure 4.88: Distribution of X-box motifs in the osm-12 promoter.

(a) Typical X-box motifs (generated from 16 input sequences, including C. elegans motif(s))
(b) Atypical 13bp X-box motifs (generated from 5 input sequences)
(c) Atypical 14bp X-box motifs (generated from 2 input sequences)
(d) Atypical 15bp X-box motifs (generated from 3 input sequences)

Figure 4.89: Sequence logo of osm-12 X-box motifs.

Figure 4.90: LOGO depicting aligned osm-12 X-box motifs and 30bp of flanking promoter sequence.

4.2.31 X-box motifs in the tub-1 promoter

Typical X-box motif candidates were found in the promoters of 11 out of 14 tub-1 orthologs (Table 4.32). Of the remaining 3 genes, one was eliminated from further X-box motif searches because it contains gaps (Ns) in the promoter. We found atypical X-box motif candidates for the remaining 2 genes. The locations of the X-box motif show some conservation, with clustering around the -100 to -300 region (Figure 4.91). The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motif candidates show some similarity to the consensus (Figure 4.92). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.93).

Table 4.32: Alignment of X-box motifs in the tub-1 promoter

Species | Sequence | Location | Score | Type (Method)
C. remanei | ATCA-CCATGGCAAC† | -242 | 325 | typical (HMMER)
C. brenneri | GTCG-CCTTGGAGAC† | -1171 | 281 | typical (HMMER)
C. brenneri | ATCT-CCATGGCAAC† | -1099 | 331 | typical (HMMER)
C. sinica | ATCA-CCATAGCAAC† | -238 | 312 | typical (HMMER)
C. briggsae | ATCA-CCATGGCAAC | -233 | 325 | typical (HMMER)
C. elegans | ATCT-CCATGACAAC | -184 | 309 | typical (HMMER)
C. japonica | GTTG-CCATGACAAC† | -1266 | 343 | typical (HMMER)
C. angaria | ATCA-CCATGACAAC† | -953 | 303 | typical (HMMER)
C. angaria | GTAT-CTATAGAAAC† | -1008 | 296 | typical (HMMER)
H. bacteriophora | GTAT-CCATAGCAAC† | -490 | 337 | typical (HMMER)
H. contortus | GTCG-CCA-GGCAAC† | -1231 | 323 | atypical (TFM-Scan 6bp motif)
S. ratti | GTTA-CCCTAGTAAC† | -790 | 307 | typical (HMMER)
P. redivivus | GTTGCCCCTGGCAAC† | -214 | 335 | atypical (TFM-Scan 6bp/7bp motif)
O. volvulus | GTTG-CCTTGGTAAC† | -727 | 323 | typical (HMMER)
T. spiralis | GTTT-CTATGGCAAT† | -1002 | 317 | typical (HMMER)
T. spiralis | GTTT-CCCTGGAAAC† | -112 | 322 | typical (HMMER)
T. spiralis | GTTT-CCCAGGCAAC† | -1255 | 310 | typical (HMMER)
T. suis | GATG-CCATGGAAAC† | -121 | 317 | typical (HMMER)

Figure 4.91: Distribution of X-box motifs in the tub-1 promoter.

Figure 4.92: Sequence logo of tub-1 typical X-box motifs (generated from 16 input sequences, including C. elegans motif(s)).

Figure 4.93: LOGO depicting aligned tub-1 X-box motifs and 30bp of flanking promoter sequence.

4.2.32 X-box motifs in the xbx-1 promoter

Typical X-box motif candidates were found in the promoters of 19 out of 23 xbx-1 orthologs (Table 4.33). Of the remaining 4 genes, one was eliminated from further X-box motif searches because it contains gaps (Ns) in the promoter. We found atypical X-box motif candidates for the remaining 3 genes. The locations of the X-box motif show strong conservation, with clustering around the -100 region (Figure 4.94). In addition, most of the outliers are additional motifs, where the gene contains multiple candidate X-box motifs and one falls into the expected region. The sequences of the typical X-box motif candidates are similar to the known X-box consensus, and the atypical X-box motif candidates show some similarity to the consensus (Figure 4.95a). Lastly, we generated a sequence LOGO including the 30bp sequence flanking the X-box motif, and we do not observe any significantly conserved sequences outside of the X-box motif (Figure 4.96).

Table 4.33: Alignment of X-box motifs in the xbx-1 promoter

Species | Sequence | Location | Score | Type (Method) | Comment
C. remanei | GTT-TCCATG-GAGAC† | -81 | 326 | typical (HMMER)
C. tropicalis | GTT-GTCATG-GAAAC† | -109 | 324 | typical (HMMER)
C. brenneri | GTT-GACATG-GTAAC† | -118 | 324 | typical (HMMER)
C. sinica | GTT-CCCATG-ACAAC† | -96 | 337 | typical (HMMER)
C. briggsae | GTT-TCCATG-GTTAC | -94 | 330 | typical (HMMER)
C. elegans | GTT-TCCATG-GTAAC | -79 | 356 | typical (HMMER)
C. japonica | GTT-GCCATG-GAGAC† | -107 | 322 | typical (HMMER)
H. bacteriophora | GTT-GCCATG-GAAAT† | -83 | 320 | typical (HMMER)
H. contortus | GTT-GTTACG-GAAAC† | -1618 | 270 | typical (HMMER)
A. ceylanicum | GTT-TCCATG-GCAAC† | -74 | 369 | typical (HMMER)
N. americanus | GTT-CCCATA-GCAAC† | -76 | 346 | typical (HMMER)
P. pacificus | ATC-TCCATA-GCAAC† | -124 | 318 | typical (HMMER)
P. exspectatus | ATC-TCCATG-GCAAC† | -125 | 331 | typical (HMMER)
S. ratti | GTT-ACCCTG-GTAAC† | -69 | 320 | typical (HMMER)
P. redivivus | GTT-ACCATG-GAAAC† | -75 | 346 | typical (HMMER)
B. xylophilus | GTT-GCTA-G-GTGAC† | -77 | 272 | atypical (TFM-Scan 6bp motif)
A. suum | ATC-TCCATG-ACGAC† | -559 | 283 | typical (HMMER)
D. immitis | GTT-GCTATG-CAAAC† | -120 | 297 | typical (HMMER)
D. immitis | ATT-CTCATG-ACAAC† | -535 | 287 | typical (HMMER)
O. volvulus | GTC-AAATTG-ATCAC† | -613 | 203 | atypical (manual inspection) | OVOC7272
O. volvulus | GTT-GTCAGATATAAC† | -843 | 264 | atypical (manual inspection) | OVOC7272
O. volvulus | GTT-TCTGTC-CATAC† | -360 | 225 | atypical (manual inspection) | OVOC7272
O. volvulus | TTGCACCATG-GCAAC† | -99 | 313 | atypical (manual inspection) | OVOC7272
O. volvulus | GTTGATTAAA-GCTAC† | -1182 | 247 | atypical (manual inspection) | OVOC7345
O. volvulus | GTT-CGTATT-CAAAA† | -1870 | 211 | atypical (manual inspection) | OVOC7345
O. volvulus | GTT-TCTGTC-CATAC† | -360 | 225 | atypical (manual inspection) | OVOC7345
O. volvulus | GTT-TCGATATATTAC† | -516 | 267 | atypical (manual inspection) | OVOC7345
O. volvulus | TTGCACCATG-GCAAC† | -99 | 313 | atypical (manual inspection) | OVOC7345
B. malayi | ATA-ACCATG-GCAAC† | -175 | 318 | typical (HMMER)
L. loa | ATA-ACCATG-ACAAC† | -119 | 296 | typical (HMMER)
L. loa | ATT-TGCATG-GCAAC† | -653 | 315 | typical (HMMER)
T. spiralis | GTT-GCTATG-GGGAC† | -36 | 297 | typical (HMMER)
T. spiralis | GTT-TCCATG-GTAGC† | -123 | 324 | typical (HMMER)

Figure 4.94: Distribution of X-box motifs in the xbx-1 promoter.

(a) Typical X-box motifs (generated from 23 input sequences, including C. elegans motif(s))
(b) Atypical 14bp X-box motifs (generated from 4 input sequences)
(c) Atypical 15bp X-box motifs (generated from 5 input sequences)

Figure 4.95: Sequence logo of xbx-1 X-box motifs.

Figure 4.96: LOGO depicting aligned xbx-1 X-box motifs and 30bp of flanking promoter sequence.

4.3 Putative X-box motifs in C. briggsae promoters

In C. briggsae, there are 4 ciliary gene orthologs with high confidence 5’ start sites that do not contain typical X-box motifs. These genes are CBG09893 (bbs-4 ortholog), CBG10029 (bbs-4 ortholog), CBG16827 (Cbr-mksr-2), and CBG23329 (Cbr-osm-6). We identified putative atypical X-box motifs for each of these genes, shown in Table 4.34.

Table 4.34: Summary of atypical X-box motifs in C. briggsae

Species | Gene | X-box Sequence | Location | Type (Score) | Validated
C. elegans | R31.3/osm-6 | GTTACCATAGTAAC | -101 | HMMER (6.59) | Yes
C. briggsae | CBG23329/Cbr-osm-6 | GTATCCCTTGACAAC | -86 | TFM-Scan 6bp motif (97) | No data
C. briggsae | CBG23329/Cbr-osm-6 | GTTGCGAATGGGAAT | -1424 | TFM-Scan 7bp motif (96) | No data
C. elegans | Y38F2AL.2/mksr-2 | GTTGCCGTGGCAAC | -67 | HMMER (5.61) | Yes
C. briggsae | CBG16827/Cbr-mksr-2 | GTTGCCTAGACGAC | -71 | TFM-Scan 6bp motif (99)¹ | No data
C. briggsae | CBG16827/Cbr-mksr-2 | ATCGCCCGCGAGAAC | -1359 | TFM-Scan 6bp motif (66) | No data
C. elegans | F58A4.14a/bbs-4 | GTTTCCATGGCAAC | -65 | HMMER (10.05) | No data
C. briggsae | CBG09893 | GTTGGCATAGTTAC | -74 | TFM-Scan 7bp motif (60) | No data
C. briggsae | CBG10029 | GTCGTCGAGGAAC | -352 | TFM-Scan 6bp motif (80) | No data

We conducted promoter alignments between each C. elegans and C. briggsae gene to analyze these putative motifs. 2kb promoters were aligned using the EMBOSS implementation of Needleman-Wunsch, and alignments are shown in Figures 4.97 to 4.100. The promoter sequences are depicted in 5’ to 3’ orientation, with the start codon adjacent to coordinate 2000. Although we do not have definitive evidence demonstrating that these X-box motifs are valid, it is likely that some of these candidate atypical motifs are true motifs. In particular, one of the Cbr-osm-6 candidate X-box motifs overlaps the C. elegans validated motif. In other cases, the C. elegans motif is disrupted by gaps in the C. briggsae promoter, but there are candidate X-box motifs nearby.
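To illustrate the global alignment idea behind these promoter comparisons, the following is a minimal Python sketch of the Needleman-Wunsch algorithm. It is not the EMBOSS implementation: it assumes a simple match/mismatch score and a linear gap penalty (the parameter values are arbitrary choices for illustration), whereas EMBOSS uses a substitution matrix with separate gap open and extension penalties. The example aligns the C. elegans osm-6 motif with the proximal C. briggsae candidate from Table 4.34 rather than full 2kb promoters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the dynamic programming matrix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

# C. elegans osm-6 motif vs. proximal Cbr-osm-6 candidate (Table 4.34)
total, aligned_ce, aligned_cb = needleman_wunsch("GTTACCATAGTAAC", "GTATCCCTTGACAAC")
print(aligned_ce)
print(aligned_cb)
```

Because the two sequences differ in length, the resulting alignment contains at least one gap, which is the kind of indel that disrupts motif correspondence in the promoter alignments.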

In the case of osm-6 (Figure 4.97), the C. elegans and C. briggsae promoters are highly conserved, with only minor insertions and deletions. The location of the proximal motif predicted by TFM-Scan falls into the expected region and overlaps the C. elegans validated motif. There is also a putative distal motif, also located in a conserved region of the promoter. Both of these putative motifs are 15bp in length, differing from the 14bp motifs reported in C. elegans ciliary genes.

¹ This motif was also found using the "relaxed" regular expression published by Efimenko et al. (2005).

Figure 4.97: Promoter alignment of C. elegans osm-6 and C. briggsae Cbr-osm-6

The mksr-2 promoters are less conserved, containing many indels. The proximal motif predicted by TFM-Scan in C. briggsae is near, but does not overlap, the C. elegans X-box motif. Both of these motifs correspond to sequences interrupted by insertions or deletions in the other genome.

Figure 4.98: Promoter alignment of C. elegans mksr-2 and C. briggsae Cbr-mksr-2.

The C. elegans bbs-4 and C. briggsae CBG10029 promoters show significant divergence, including a long insertion in the C. elegans promoter. The reported C. elegans bbs-4 motif has not been validated via mutagenesis or DAF-19 knockdown. The C. elegans motif overlaps with an insertion in the corresponding C. briggsae promoter, and the predicted C. briggsae motif is only 13bp long.

Figure 4.99: Promoter alignment of C. elegans bbs-4 and C. briggsae CBG10029.

The other bbs-4 ortholog in C. briggsae is CBG09893, and its promoter has higher sequence similarity with C. elegans than CBG10029 does. In this case, the predicted motif in C. briggsae overlaps with the C. elegans motif, and the C. briggsae promoter contains a small insertion in this region.

Figure 4.100: Promoter alignment of C. elegans bbs-4 and C. briggsae CBG09893.

4.4 Discussion of validation procedures

In this section, we will discuss methods to validate newly identified X-box motifs, as well as unique characteristics they may exhibit. This is especially important for atypical putative X-box motifs, which we hypothesize may carry functional information regarding the expression patterns of ciliary genes. In these descriptions, we use C. briggsae osm-6 (CBG23329) as an example of a gene containing two putative atypical X-box motifs.

4.4.1 In vitro methods

In vitro methods can be used to demonstrate DAF-19 binding to the X-box motif. One such method is an electrophoretic mobility shift assay, where DNA bound to protein has lower mobility through the gel than unbound DNA, causing a band shift (Hellman and Fried, 2007). Biotinylated probes containing the sequence of the X-box motif and flanking regions can be used to visualize the band shift, and lanes with probes and cytosolic or nuclear extract will exhibit bands that are absent from lanes with free probe (Mah et al., 2007). An additional lane containing an excess of unlabelled probe in addition to labelled probe and cytosolic or nuclear extract can be used to demonstrate binding specificity, as the bands containing labelled DNA bound to DAF-19 will not be observed (Mah et al., 2007).

Another technique that can be used to validate the interaction of DAF-19 and the X-box motif is chromatin immunoprecipitation (ChIP). In ChIP, proteins are reversibly cross-linked to their DNA using formaldehyde (Carey et al., 2009). The chromatin is sheared by sonication, and immunoprecipitation is performed using antibodies specific to the protein of interest. In this case, the complex of DAF-19 bound to the X-box motif can be specifically selected for using either antibodies to DAF-19 or antibodies to GFP if DAF-19 is fused to GFP in this experiment.

4.4.2 Transcriptional reporter gene assay

A straightforward method is to use a reporter gene assay consisting of the target promoter fused to a reporter gene, typically green fluorescent protein (GFP) (Boulin et al., 2006). The two sequences can be fused together by stitching polymerase chain reaction (PCR), where gfp and the target promoter are amplified separately and the 3’ primer for the promoter sequence contains a fragment overlapping with the gfp gene sequence. Subsequent rounds of PCR facilitate the fusion of the two PCR products since there is a region of overlap (Hobert, 2002). The linear construct can be amplified using PCR and injected into the distal arm of the gonad using a process called microinjection (Evans, 2006). The injected DNA concatenates to form extrachromosomal arrays, which are incorporated into the nucleus and replicated during mitosis (Evans, 2006; Frøkjær-Jensen et al., 2008). Some of the progeny will carry the extrachromosomal array and will exhibit visible fluorescence. Since the construct is present in hundreds of copies, the progeny containing the extrachromosomal array may exhibit unexpected or undesirable phenotypes caused by overexpression of certain genes (Frøkjær-Jensen et al., 2008). In our case, we are expressing GFP, so issues of dominant-negative phenotypes or toxicity are not relevant; however, expression level cannot be accurately quantified using this method due to the copy number issue.

In this experiment, we should first demonstrate that a 500bp C. elegans osm-6 promoter can drive GFP expression in C. elegans. This serves as a positive control. Next, we should show that a 500bp C. briggsae osm-6 promoter can drive GFP expression in C. elegans. This demonstrates that the C. briggsae osm-6 promoter is functional. Next, we can mutate the putative atypical X-box motifs in the 500bp C. briggsae promoter using site-directed mutagenesis. This involves amplifying the sequence using primers that contain the desired mutation (similar to the study in Chu et al. (2012)). If GFP expression in C. elegans is abolished when one or both of the X-box motifs are mutated, this indicates that the mutated X-box motif is functional.

4.4.3 MosSCI assay

If we want to further analyze atypical X-box motifs quantitatively, using extrachromosomal arrays is unsuitable because the array contains a variable number of copies of the gene. There are techniques that can be applied to mitigate the copy number issue associated with extrachromosomal arrays. These involve integrating the injected gene constructs into C. elegans chromosomes, whereby only a single copy of the gene construct will be expressed. This is achieved using a method called Mos1-mediated single-copy insertion (MosSCI), which uses the Mos1 transposon and transposase originally from Drosophila (Frøkjær-Jensen et al., 2008). This entails using a C. elegans strain where a Mos1 element has been inserted in a specific location, and preparing the targeting vector containing the transgene (promoter::GFP fusion with homologous flanking regions). The vector is microinjected along with the transposase gene and several markers: hsp::transposase, mCherry, and twk-18(gf) (Frøkjær-Jensen et al., 2008). When the progeny undergo heat shock, the transposase gene will be activated, causing the Mos1 element to be excised and replaced with the transgene.

Next, it is necessary to distinguish between worms carrying the extrachromosomal array and worms with the transgene integrated into their genome. This is achieved using one positive selection marker and two negative selection markers. One approach is to use an unc-119(ed3) mutant strain containing the Mos1 element, and include the corresponding positive-selection marker unc-119(+) along with the transgene. This way, worms that are WT must either carry the extrachromosomal array or have integrated the vector into their genome. Next, the negative selection markers mCherry and twk-18(gf) reveal worms that are carrying the extrachromosomal array. The twk-18(gf) allele encodes a potassium channel that causes muscle paralysis at 25°C. Worms that survive at 25°C must therefore have lost the extrachromosomal array and have the transgene integrated into their genomes.

As a positive control, we should first show that the 500bp C. elegans osm-6 promoter drives GFP expression in C. elegans. Next, we should demonstrate that mutation of the X-box motif in the 500bp C. elegans osm-6 promoter causes loss of GFP expression. Lastly, we can validate the two putative C. briggsae osm-6 X-box motifs. We should show that the 500bp C. elegans osm-6 promoter with its X-box motif replaced by a C. briggsae osm-6 X-box motif restores GFP expression in C. elegans.

4.5 Polymorphisms in C. elegans strains

In addition to studying X-box motifs in different nematode species, we are also interested in finding variations in X-box motifs in organisms separated by a smaller evolutionary timescale. To do this, we examined promoter regions and X-box motifs within C. elegans strains for polymorphisms. The Million Mutation Project (Thompson et al., 2013) facilitated sequencing of 40 wild C. elegans strains (Table 4.35). Since the X-box motif is relatively conserved across species, we may find very few or no polymorphisms at the strain level.

Raw sequencing data was obtained from the SRA website. The GXW0001 strain was excluded because no sequencing information was available in the SRA database. We aligned short reads to version WS250 of the C. elegans N2 reference genome using the Bowtie2 aligner, and identified variations using SAMtools mpileup and VarScan 2 (Li et al., 2009; Koboldt et al., 2012).

Table 4.35: C. elegans strains sequenced in Million Mutation Project (Thompson et al., 2013)

Strain Name | City, Province | Country | Isolated by
AB1 | Adelaide, South Australia | Australia | Alan Bird, 1984
AB3 | Adelaide, South Australia | Australia | Alan Bird, 1984
CB4853 | Altadena, California | U.S.A. | Carl Johnson
CB4854 | Altadena, California | U.S.A. | Carl Johnson, 1974
CB4856 | Hawaii | U.S.A. | L. Hollen, 1972
ED3017 | Edinburgh | Scotland | A. Cutter/E. Dolgin, 2005
ED3021 | Edinburgh | Scotland | A. Cutter/E. Dolgin, 2005
ED3040 | Johannesburg, Gauteng | South Africa | E. Dolgin, 2006
ED3042 | Ceres, Western Cape | South Africa | E. Dolgin, 2006
ED3049 | Ceres, Western Cape | South Africa | E. Dolgin, 2006
ED3052 | Ceres, Western Cape | South Africa | E. Dolgin, 2006
ED3057 | Limuru, Kiambu West District | Kenya | E. Dolgin, 2006
ED3072 | Limuru, Kiambu West District | Kenya | E. Dolgin, 2006
GXW0001 | Wuhan | China | Guoxiu Julie Wang, 2010
JU258 | Ribeiro Frio, Madeira | Portugal | M.A. Felix
JU263 | Le Blanc, Indre | France | M.A. Felix
JU300 | Le Blanc, Indre | France | M.A. Felix, 2002
JU312 | Merlet, Lagorce (Ardeche) | France | M.A. Felix, 2002
JU322 | Merlet, Lagorce (Ardeche) | France | M.A. Felix, 2002
JU345 | Merlet, Lagorce (Ardeche) | France | M.A. Felix, 2002
JU360 | Franconville (Val d’Oise) | France | M.A. Felix, 2002
JU361 | Franconville (Val d’Oise) | France | M.A. Felix, 2002
JU394 | Hermanville (Calvados) | France | A. Barriere, 2002
JU397 | Hermanville (Calvados) | France | A. Barriere, 2002
JU533 | Primel-Trigastrel (Finistere) | France | M.A. Felix, 2004
JU642 | Le Perreux sur Marne | France | M.A. Felix, 2004
JU775 | Lisbon | Portugal | M.A. Felix, 2005
JU1088 | Kakegawa | Japan | M.A. Felix, 2007
JU1171 | Conception | Chile | M.A. Felix, 2007
JU1400 | Seville, Andalucia | Spain | M.A. Felix, 2008
JU1401 | Carmona, Andalucia | Spain | M.A. Felix, 2008
JU1652 | Montevideo | Uruguay | R. Giordano
KR314 | Vancouver, B.C. | Canada | Fred Dill
LKC34 | Madagascar | Madagascar | V. Stowell, 2005
MY1 | Lingen, Emsland | Germany | H. Schulenberg, 2002
MY2 | Roxel, Munster | Germany | H. Schulenberg, 2002
MY6 | Roxel, Munster | Germany | H. Schulenberg, 2002
MY14 | Mecklenbeck, Munster | Germany | H. Schulenberg, 2002
MY16 | Mecklenbeck, Munster | Germany | H. Schulenberg, 2002
PX174 | Lincoln City, Oregon | U.S.A. | B. White

In order to accept a variation as significant, we require the variant allele frequency (VAF) to be at least 0.3, and the p-value from VarScan to be < 0.05. This filters out variations with low read support (e.g. only 1 variant read out of 20 total reads), which are likely to result from sequencing error, as well as variations in regions with low read coverage (e.g. 4 variant reads out of 4 total reads).
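The two thresholds can be expressed as a small filter. The following Python sketch is illustrative only: the function name, argument names, and the read counts and p-values in the usage lines are hypothetical, and the p-value is assumed to be the one reported by VarScan for each call.

```python
def is_significant(variant_reads, total_reads, p_value, min_vaf=0.3, max_p=0.05):
    """Apply the VAF and p-value thresholds described above to one variant call.

    variant_reads / total_reads gives the variant allele frequency (VAF);
    p_value is assumed to come from the variant caller's output.
    """
    if total_reads <= 0:
        return False
    vaf = variant_reads / total_reads
    return vaf >= min_vaf and p_value < max_p

# Hypothetical calls (read counts and p-values invented for illustration)
low_support = is_significant(1, 20, 0.5)        # VAF 0.05: fails the VAF threshold
low_coverage = is_significant(4, 4, 0.07)       # VAF 1.0 but p-value not below 0.05
well_supported = is_significant(30, 30, 1e-10)  # passes both thresholds
print(low_support, low_coverage, well_supported)
```

Requiring both conditions means that a high VAF alone is not enough: a call with very few supporting reads can still be rejected by the p-value threshold.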

We found a total of two significant variations in the X-box motifs, both in the bbs-5 gene: in the JU360 and PX174 strains, the X-box motif is GTTGCCATAGAGAC, where the underlined A replaces the G found in the reference N2 strain (Figure 4.101). These variations had a VAF of 1 in both strains, and p-values of 7.25e-12 and 1.10e-10, respectively.

Figure 4.101: bbs-5 X-box motif variations in the JU360 and PX174 strains.

Expanding our search to the entire 2kb promoter regions, we found 1488 significant variations, consisting of 1100 single base substitutions and 388 indels. Since the promoter regions are larger, it is unclear whether variations occur at a significantly lower rate in X-box motifs compared to the promoter regions.

We conducted a Monte Carlo simulation to evaluate the significance of the X-box sequence conservation. Since all of the X-box motifs in the high confidence ciliary genes are 14bp, we randomly sample 14bp from each ciliary gene promoter and count the number of variations from the reference C. elegans sequence. After many repetitions of this random sampling, we can approximate the conservation of the “average” 14bp sequence in the promoter and determine whether the conservation of the X-box motif regions is significant. For each X-box motif, we sampled a random 14bp segment in that promoter and counted the number of variations, using the same set of random segments across strains. (We used the same thresholds of VAF ≥ 0.3 and p-value < 0.05 as earlier.) At the end of the trial, if there were no variations in any gene in any strain, we considered the trial a success. After 10,000 repetitions, we calculated a p-value as follows:

$$p = \frac{\#\,\text{successes}}{10000}$$

We obtained a p-value of 0.256, indicating that the X-box motifs are not significantly more conserved than the promoter regions in these strains. Due to the small evolutionary distances between the C. elegans strains, there may not be enough total variations to observe a significant difference in the X-box motifs.
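The sampling procedure can be sketched as follows. This Python example is a simplified illustration under stated assumptions, not the script used for the analysis: variants are represented as 0-based promoter positions pooled across strains and already filtered by the VAF and p-value thresholds, indels are treated as single positions, and the gene names and positions in the toy input are invented.

```python
import random

def monte_carlo_p(variant_positions, promoter_len=2000, motif_len=14,
                  trials=10000, seed=0):
    """Estimate the fraction of trials in which random segments carry no variants.

    variant_positions maps each gene to a set of 0-based promoter positions
    where a significant variation was found in at least one strain.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        clean = True
        for positions in variant_positions.values():
            # Draw one random motif-sized segment in this gene's promoter
            start = rng.randrange(promoter_len - motif_len + 1)
            if any(start <= p < start + motif_len for p in positions):
                clean = False
                break
        if clean:
            successes += 1
    return successes / trials

# Toy input: hypothetical genes and variant positions
toy_variants = {
    "geneA": {15, 480, 1210},
    "geneB": set(),
    "geneC": {77, 78, 1999},
}
p_estimate = monte_carlo_p(toy_variants, trials=1000)
print(p_estimate)
```

A trial counts as a success only when none of the sampled segments overlaps a variant, mirroring the definition above; the estimate depends on the random seed and the number of trials.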

4.6 Discussion

X-box motifs have been well-studied in C. elegans and other organisms, with X-box motifs identified in many ciliary genes. These reported motifs have allowed us to build a profile of characteristics of a typical X-box motif. We have curated ciliary genes in 25 nematode species and focused on promoters of genes with high confidence 5’ start sites. However, not all of these promoters contain typical X-box motifs. There are two main possible explanations: the X-box motif could reside in a different location (i.e. outside the 2kb region we examined), or the X-box motif could differ from the typical X-box profile. The value of this project lies in identifying atypical X-box motif candidates in these promoters. We have exhaustively searched for X-box motifs in these promoters, and provide a set of candidate X-box motifs that should include some of the real motifs. These candidate atypical X-box motifs differ from the consensus in motif length and sequence composition. We find that atypical X-box motifs do not cluster together in phylogenetic trees, but instead are interspersed among typical X-box motifs.

We propose a mechanism for how atypical X-box motifs may have evolved. We think it is unlikely that atypical motifs evolved from mutations of typical motifs because mutations in the X-box motifs would compromise ciliary function and should be selected against in evolution. Instead, the typical motif could be disrupted by insertions, deletions, or transposable elements and another sequence in the promoter adapted to become a usable X-box motif; if this is true, it makes sense that there is no clustering of atypical X-box motifs. If typical X-box motifs changed nucleotide by nucleotide into atypical X-box motifs, there should be a gradient of nucleotide changes, which we do not observe. This lack of clustering also shows that there are no general sequence “traits” of an atypical X-box motif, but instead a variety of divergences from consensus X-box motifs. Additionally, in our C. briggsae promoter alignments (Section 4.3), the C. elegans X-box motifs either partially correspond to deletions in the C. briggsae promoter (osm-6 and mksr-2) or partially correspond to insertions in the C. briggsae promoter (both bbs-4 orthologs). Promoter alignments do contain noise, but this does provide some evidence for our hypothesis.

Among species with gene duplications, we notice that either both copies of the gene contain typical X-box motifs or both copies contain atypical X-box motifs. Theoretically, it would make sense if one copy of the gene contained a typical X-box motif and the other contained an atypical X-box motif, since gene duplications result in less selective pressure for one of the copies. Most of the gene duplications occur in C. brenneri, H. contortus, and M. incognita. In the case of C. brenneri, this could be a result of poor genome assembly quality caused by heterozygosity; up to 40% of the C. brenneri genome has been estimated to be heterozygous (Barriere et al., 2009). Apparent gene duplications in H. contortus and M. incognita could similarly be caused by incomplete genome assemblies, where short contigs may be merged into other regions of the assembly. One method of determining whether duplicated genes are caused by assembly errors or are true gene duplications is by comparing flanking genes: if flanking genes of duplicated genes are also similar, then the gene duplication may actually be caused by misassembly and should be merged into one sequence. It is also possible to test the validity of these gene duplications using experimental assays such as PCR amplification of the sequence. Primers specific to each copy of the gene can be designed to determine if the gene can be amplified in each case. Another method that can be used is fluorescent in situ hybridization (FISH), where hybridization of fluorescent probes and the target DNA sequence can be detected. In FISH, probes complementary to the target gene sequence can be generated with modified nucleotides that either directly contain a fluorophore or a hapten that can later bind to fluorophores (reviewed in O’Connor (2008)). In this case, if the species of interest contains a gene duplication, multiple copies of the gene should be stained.
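The flanking gene comparison described above could be prototyped along the following lines. This is only a sketch: the gene identifiers, the use of Jaccard similarity on ortholog identifiers, and the 0.5 threshold are assumptions made for illustration and were not evaluated in this study.

```python
def shared_flank_fraction(flank_a, flank_b):
    """Jaccard similarity between two sets of flanking gene identifiers."""
    a, b = set(flank_a), set(flank_b)
    if not a and not b:
        return 0.0  # no flanking information for either copy
    return len(a & b) / len(a | b)


def possible_misassembly(flank_a, flank_b, threshold=0.5):
    """Flag a duplicate pair whose neighbourhoods largely coincide.

    The threshold is a hypothetical cut-off; a real analysis would need to
    calibrate it, for example against known single-copy genes.
    """
    return shared_flank_fraction(flank_a, flank_b) >= threshold


# Hypothetical ortholog identifiers of the genes flanking two copies.
copy_1 = ["orthoA", "orthoB", "orthoC", "orthoD"]
copy_2 = ["orthoA", "orthoB", "orthoC", "orthoE"]
print(shared_flank_fraction(copy_1, copy_2), possible_misassembly(copy_1, copy_2))
```

Flagged pairs would still need confirmation, for example with the PCR or FISH assays discussed above.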

We can also hypothesize that atypical X-box motifs can be caused by sequencing errors. In this case, we would expect to see atypical X-box motifs overrepresented in genomes with poor sequencing or assembly quality. However, we do not observe an unusually high number of atypical X-box motifs in any of the species included in this study. The presence of atypical X-box motifs seems to be well-distributed, with a few exceptions: C. sinica and M. incognita do not have ciliary genes with atypical X-box motifs, and C. angaria, C. japonica, and N. americanus only contain one gene each with atypical X-box motifs. In addition, since we do not observe a gradient of nucleotide changes or clustering of X-box motifs, it does not seem like atypical X-box motifs are typical X-box motif sequences with some sequencing errors.

The next stage of this project would entail validation of these atypical X-box motifs. First, we should determine if the 2kb promoter can drive reporter gene expression in C. elegans cilia. Warrington (2015) demonstrated that in many cases, promoters from other organisms (in his case, C. briggsae) can be used to drive expression in C. elegans. Failure to drive reporter gene expression would indicate that the X-box motif is located outside of the 2kb region, and we would next determine if the general promoter region can drive expression. We can achieve this using a CRISPR/Cas9 knock-in assay. Zheng et al. (2014) demonstrated the targeted replacement of a gene with EGFP gene using two cleavage sites; in a similar vein, we could replace the ciliary gene in the target genome with a GFP gene and determine if the promoter can drive GFP expression in cilia.

If the 2kb promoter does drive reporter gene expression in C. elegans cilia, then this indicates that the X-box motif is located in the 2kb region but is not an atypical motif identified in this project. In this case, we can generate constructs that have different segments deleted from the promoter, to narrow down the location of the X-box motif.

In summary, we have identified X-box motifs in ciliary gene promoters in nematodes. This includes typical X-box motifs, and when promoters do not contain typical X-box motifs, we additionally find atypical X-box motifs. We find that atypical X-box motifs comprise a variety of sequences that are dissimilar to typical X-box motifs as well as to each other. One clear difference is motif length; all C. elegans X-box motifs used in this study are 14bp long, but atypical motifs can be 13bp or 15bp as well. We also find that these X-box motifs do not relate to each other, as they do not show any clustering patterns in phylogenetic trees. We hypothesize that these atypical X-box motifs emerged de novo from promoter sequences rather than from mutations of typical X-box motifs.

Chapter 5

Conclusion

Cilia are ubiquitous in metazoans and some unicellular eukaryotes, and have important functions in development and sensation. They share a common microtubule-based structure encoded by a core set of ciliary genes, and consist of three regions: the basal body, transition zone, and axoneme. Cilia are built by the process of intraflagellar transport, which involves the IFT and BBSome complexes and motor proteins. IFT is responsible for transporting proteins along the cilium, including ciliary components such as tubulin and cargo proteins such as signalling molecules. Because they are involved in many biological processes, ciliary dysfunction results in a variety of ciliopathies with symptoms ranging from kidney and liver defects to vision and hearing loss. We focus on primary or immotile cilia, which are regulated by RFX genes via binding to the X-box motif in the promoters of target genes. In C. elegans, cilia are present in sensory neurons and are mainly regulated by a single RFX gene, DAF-19.

In this study, we have developed a pipeline for identifying and annotating ciliary genes as well as candidate X-box motifs in their promoters. Each stage of this pipeline has technical challenges, and we established criteria to evaluate results at each stage. The first stage of this project involved curating a set of “high-confidence” ciliary genes that are well-studied in C. elegans and demonstrated to be regulated by DAF-19. We identified orthologs of these genes using several methods, including a reciprocal best BLAST hit approach as well as TBLASTN-based gene annotation tools. In order to study cis-regulatory elements in the promoter, it is imperative to accurately define the 5’ start site of each gene. We improved the gene annotation of ciliary gene orthologs, focusing on 5’ ends, and evaluated the quality of annotation using two main criteria: the first intron is supported by RNA-seq splicing junctions, and the first 100 amino acids of the protein sequence are conserved when compared to orthologs in the nematode species included in the study. We think that missing or duplicated ciliary gene orthologs are likely to be caused by genome sequencing or assembly errors rather than true gene loss or duplication. This can be tested using further bioinformatics or experimental analyses. One method that can be used is aligning flanking genes to determine if flanking sequences are also similar.
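For readers unfamiliar with the reciprocal best BLAST hit criterion mentioned above, the following sketch shows the core logic. It assumes the BLAST results have already been parsed into (query, subject, bit score) tuples; the function names and the gene identifiers in the example are hypothetical and do not correspond to the scripts or annotations used in this study.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples, e.g. parsed
    from tabular BLAST output (parsing is not shown here).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}


# Hypothetical hit tables between species A and species B.
a_vs_b = [("geneA1", "geneB1", 310.0), ("geneA1", "geneB2", 95.0),
          ("geneA2", "geneB2", 120.0)]
b_vs_a = [("geneB1", "geneA1", 305.0), ("geneB2", "geneA1", 90.0),
          ("geneB2", "geneA2", 80.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input, geneA2 has no reciprocal partner because the best hit of geneB2 is geneA1; cases like this are where additional methods such as TBLASTN-based annotation become useful.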

We identified X-box motifs in the promoters of ciliary genes, using known X-box motifs in C. elegans as a model. We first used HMMER to identify candidate X-box motifs highly similar to the consensus, and referred to these as “typical” X-box motifs. However, not all ciliary genes contain typical X-box motifs, so we used additional methods to exhaustively search for X-box motifs that may be divergent from the consensus, naming these “atypical” X-box motifs. These methods include searching for half-motifs with flexible nucleotides in between, which allows us to search for 13bp and 15bp motifs, as well as using regular expressions, and manually inspecting promoter regions for sequences resembling X-box motifs. We have identified a set of candidate atypical X-box motifs, which differ in length and sequence composition. Although these atypical X-box motifs differ from the consensus, they are still regulated by DAF-19, indicating that DAF-19 can tolerate variations in the binding sequence. We have referred to X-box motifs as “typical” or “atypical” simply by whether or not they can be identified using HMMER, since motifs identified by HMMER are highly similar to the consensus. We found that these atypical X-box motifs do not show any clustering patterns in a phylogenetic tree, and that there is no nucleotide by nucleotide gradient of changes. Therefore, we think that atypical X-box motifs did not diverge from typical X-box motifs through point mutations, but instead emerged de novo from nearby promoter sequences following disruptions of the promoter such as insertions or deletions. In addition, the lack of clustering of atypical X-box motifs indicates that while these X-box motifs differ from the consensus, they do not share common patterns or characteristics.
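The half-motif search with a flexible spacer can be illustrated with regular expressions as below. This is a simplified sketch rather than the search used in this study: the half-sites `GTTNCC` and `GGNAAC` are placeholders written in IUPAC codes for the example, the function names are hypothetical, and only the forward strand is scanned.

```python
import re

# IUPAC nucleotide codes expanded to regular expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern such as 'GTTNCC' into a regex string."""
    return "".join(IUPAC[base] for base in pattern.upper())


def find_half_motif_candidates(promoter, left_half, right_half, gaps=(1, 2, 3)):
    """Return (start, sequence) for each left half + spacer + right half match.

    With 6bp half-sites, spacers of 1, 2 and 3 nucleotides give candidates of
    13bp, 14bp and 15bp. A lookahead is used so overlapping hits are reported.
    """
    seq = promoter.upper()
    left, right = iupac_to_regex(left_half), iupac_to_regex(right_half)
    found = []
    for gap in gaps:
        pattern = re.compile(f"(?=({left}[ACGT]{{{gap}}}{right}))")
        found.extend((m.start(), m.group(1)) for m in pattern.finditer(seq))
    return sorted(found)


# Made-up promoter fragment containing one 14bp candidate.
print(find_half_motif_candidates("AAAGTTGCCATGGCAACTTT", "GTTNCC", "GGNAAC"))
```

Candidates found this way are only sequence matches; they would still need the manual inspection and validation steps described in this chapter.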

In order to further study these candidate atypical X-box motifs, validation procedures are necessary. This can include in vitro assays such as electrophoretic mobility shift assays or chromatin immunoprecipitation, which can be used to demonstrate that DAF-19 binds to these atypical X-box motifs. Some in vivo methods we can use are transcriptional reporter gene assays using extrachromosomal arrays, or MosSCI if quantitative analysis of gene expression is desired. While atypical X-box motifs differ from the consensus X-box sequence, the DAF-19 DBD remains conserved in the nematode species included in this study. Therefore, it will be interesting to see if these atypical sequences affect the interaction with the DAF-19 DBD, and as a result, ciliary gene expression level. These candidate atypical X-box motifs will help our understanding of ciliary gene regulation, and we can use knowledge of these features to identify and further study X-box motifs that are divergent from known X-box motifs.

Bibliography

P. Abad, J. Gouzy, J. M. Aury, P. Castagnone-Sereno, E. G. Danchin, E. Deleury, L. Perfus-Barbeoch, V. Anthouard, F. Artiguenave, V. C. Blok, M. C. Caillaud, P. M. Coutinho, C. Dasilva, F. De Luca, F. Deau, M. Esquibet, T. Flutre, J. V. Goldstone, N. Hamamouch, T. Hewezi, O. Jaillon, C. Jubin, P. Leonetti, M. Magliano, T. R. Maier, G. V. Markov, P. McVeigh, G. Pesole, J. Poulain, M. Robinson-Rechavi, E. Sallet, B. Segurens, D. Steinbach, T. Tytgat, E. Ugarte, C. van Ghelder, P. Veronico, T. J. Baum, M. Blaxter, T. Bleve-Zacheo, E. L. Davis, J. J. Ewbank, B. Favery, E. Grenier, B. Henrissat, J. T. Jones, V. Laudet, A. G. Maule, H. Quesneville, M. N. Rosso, T. Schiex, G. Smant, J. Weissenbach, and P. Wincker. Genome sequence of the metazoan plant-parasitic nematode Meloidogyne incognita. Nat. Biotechnol., 26(8):909–915, Aug 2008.

S. Aftab, L. Semenec, J. S. Chu, and N. Chen. Identification and characterization of novel human tissue-specific RFX transcription factors. BMC Evol. Biol., 8:226, 2008.

B. Alberts, A. Johnson, J. Lewis, D. Morgan, M. Raff, K. Roberts, and P. Walter. Molecular Biology of the Cell. Garland Science, 6 edition, 2014. ISBN 9780815344322.

S. J. Ansley, J. L. Badano, O. E. Blacque, J. Hill, B. E. Hoskins, C. C. Leitch, J. C. Kim, A. J. Ross, E. R. Eichers, T. M. Teslovich, A. K. Mah, R. C. Johnsen, J. C. Cavender, R. A. Lewis, M. R. Leroux, P. L. Beales, and N. Katsanis. Basal body dysfunction is a likely cause of pleiotropic Bardet-Biedl syndrome. Nature, 425(6958):628–633, Oct 2003.

T. Avidor-Reiss and M. R. Leroux. Shared and Distinct Mechanisms of Compartmentalized and Cytosolic Ciliogenesis. Curr. Biol., 25(23):R1143–1150, Dec 2015.

X. Bai, B. J. Adams, T. A. Ciche, S. Clifton, R. Gaugler, K. S. Kim, J. Spieth, P. W. Sternberg, R. K. Wilson, and P. S. Grewal. A lover and a fighter: the genome sequence of an entomopathogenic nematode Heterorhabditis bacteriophora. PLoS ONE, 8(7):e69618, 2013.

Timothy L. Bailey and Charles Elkan. Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning, 21(1):51–80, 1995.

A. Barriere, S. P. Yang, E. Pekarek, C. G. Thomas, E. S. Haag, and I. Ruvinsky. Detecting heterozygosity in shotgun genome assemblies: Lessons from obligately outcrossing nematodes. Genome Res., 19(3):470–480, Mar 2009.

C. Benoist and D. Mathis. Regulation of major histocompatibility complex class-II genes: X, Y and other letters of the alphabet. Annu. Rev. Immunol., 8:681–715, 1990.

S. A. Berman, N. F. Wilson, N. A. Haas, and P. A. Lefebvre. A novel MAP kinase regulates flagellar length in Chlamydomonas. Curr. Biol., 13(13):1145–1149, Jul 2003.

N. J. Bialas, P. N. Inglis, C. Li, J. F. Robinson, J. D. Parker, M. P. Healey, E. E. Davis, C. D. Inglis, T. Toivonen, D. C. Cottell, O. E. Blacque, L. M. Quarmby, N. Katsanis, and M. R. Leroux. Functional interactions between the ciliopathy-associated Meckel syndrome 1 (MKS1) protein and two novel MKS1-related (MKSR) proteins. J. Cell. Sci., 122(Pt 5):611–624, Mar 2009.

O. E. Blacque, M. J. Reardon, C. Li, J. McCarthy, M. R. Mahjoub, S. J. Ansley, J. L. Badano, A. K. Mah, P. L. Beales, W. S. Davidson, R. C. Johnsen, M. Audeh, R. H. Plasterk, D. L. Baillie, N. Katsanis, L. M. Quarmby, S. R. Wicks, and M. R. Leroux. Loss of C. elegans BBS-7 and BBS-8 protein function results in cilia defects and compromised intraflagellar transport. Genes Dev., 18(13):1630–1642, Jul 2004.

O. E. Blacque, E. A. Perens, K. A. Boroevich, P. N. Inglis, C. Li, A. Warner, J. Khattra, R. A. Holt, G. Ou, A. K. Mah, S. J. McKay, P. Huang, P. Swoboda, S. J. Jones, M. A. Marra, D. L. Baillie, D. G. Moerman, S. Shaham, and M. R. Leroux. Functional genomics of the cilium, a sensory organelle. Curr. Biol., 15(10):935–941, May 2005.

O. E. Blacque, C. Li, P. N. Inglis, M. A. Esmail, G. Ou, A. K. Mah, D. L. Baillie, J. M. Scholey, and M. R. Leroux. The WD repeat-containing protein IFTA-1 is required for retrograde intraflagellar transport. Mol. Biol. Cell, 17(12):5053–5062, Dec 2006.

E. Bonnafe, M. Touka, A. AitLounis, D. Baas, E. Barras, C. Ucla, A. Moreau, F. Flamant, R. Dubruille, P. Couble, J. Collignon, B. Durand, and W. Reith. The transcription factor RFX3 directs nodal cilium development and left-right asymmetry specification. Mol. Cell. Biol., 24(10): 4417–4427, May 2004.

T. Boulin, J. F. Etchberger, and O. Hobert. Reporter gene fusions. WormBook, pages 1–23, 2006.

A. G. Brear, J. Yoon, M. Wojtyniak, and P. Sengupta. Diverse cell type-specific mechanisms localize G protein-coupled receptors to Caenorhabditis elegans sensory cilia. Genetics, 197(2):667–684, Jun 2014.

S. L. Brody, X. H. Yan, M. K. Wuerffel, S. K. Song, and S. D. Shapiro. Ciliogenesis and left-right axis defects in forkhead factor HFH-4-null mice. Am. J. Respir. Cell Mol. Biol., 23(1):45–51, Jul 2000.

J. Burghoorn, M. P. Dekkers, S. Rademakers, T. de Jong, R. Willemsen, and G. Jansen. Mutation of the MAP kinase DYF-5 affects docking and undocking of kinesin-2 motors and reduces their speed in the cilia of Caenorhabditis elegans. Proc. Natl. Acad. Sci. U.S.A., 104(17):7157–7162, Apr 2007.

J. Burghoorn, B. P. Piasecki, F. Crona, P. Phirke, K. E. Jeppsson, and P. Swoboda. The in vivo dissection of direct RFX-target gene promoters in C. elegans reveals a novel cis-regulatory element, the C-box. Dev. Biol., 368(2):415–426, Aug 2012.

M. F. Carey, C. L. Peterson, and S. T. Smale. Chromatin immunoprecipitation (ChIP). Cold Spring Harb Protoc, 2009(9):pdb.prot5279, Sep 2009.

N. Chen, A. Mah, O. E. Blacque, J. Chu, K. Phgora, M. W. Bakhoum, C. R. Newbury, J. Khattra, S. Chan, A. Go, E. Efimenko, R. Johnsen, P. Phirke, P. Swoboda, M. Marra, D. G. Moerman, M. R. Leroux, D. L. Baillie, and L. D. Stein. Identification of ciliary and ciliopathy genes in Caenorhabditis elegans through comparative genomics. Genome Biol., 7(12):R126, 2006.

S. P. Choksi, G. Lauter, P. Swoboda, and S. Roy. Switching on cilia: transcriptional networks regulating ciliogenesis. Development, 141(7):1427–1441, Apr 2014.

J. S. Chu, D. L. Baillie, and N. Chen. Convergent evolution of RFX transcription factors and ciliary genes predated the origin of metazoans. BMC Evol. Biol., 10:130, 2010.

J. S. Chu, M. Tarailo-Graovac, D. Zhang, J. Wang, B. Uyar, D. Tu, J. Trinh, D. L. Baillie, and N. Chen. Fine tuning of RFX/DAF-19-regulated target gene expression through binding to multiple sites in Caenorhabditis elegans. Nucleic Acids Res., 40(1):53–64, Jan 2012.

D. G. Cole, D. R. Diener, A. L. Himelblau, P. L. Beech, J. C. Fuster, and J. L. Rosenbaum. Chlamydomonas kinesin-II-dependent intraflagellar transport (IFT): IFT particles contain proteins required for ciliary assembly in Caenorhabditis elegans sensory neurons. J. Cell Biol., 141(4):993–1008, May 1998.

G. M. Cooper and R. E. Hausmann. The Cell: A Molecular Approach. Sinauer Associates, 2 edition, 2000. ISBN 9780878931064.

F. Crick. Central dogma of molecular biology. Nature, 227(5258):561–563, Aug 1970.

J. G. Culotti and R. L. Russell. Osmotic avoidance defective mutants of the nematode Caenorhabditis elegans. Genetics, 90(2):243–256, Oct 1978.

C. A. Desjardins, G. C. Cerqueira, J. M. Goldberg, J. C. Dunning Hotopp, B. J. Haas, J. Zucker, J. M. Ribeiro, S. Saif, J. Z. Levin, L. Fan, Q. Zeng, C. Russ, J. R. Wortman, D. L. Fink, B. W. Birren, and T. B. Nutman. Genomics of Loa loa, a Wolbachia-free filarial parasite of humans. Nat. Genet., 45(5):495–500, May 2013.

C. Dieterich, S. W. Clifton, L. N. Schuster, A. Chinwalla, K. Delehaunty, I. Dinkelacker, L. Fulton, R. Fulton, J. Godfrey, P. Minx, M. Mitreva, W. Roeseler, H. Tian, H. Witte, S. P. Yang, R. K. Wilson, and R. J. Sommer. The Pristionchus pacificus genome provides a unique perspective on nematode lifestyle and parasitism. Nat. Genet., 40(10):1193–1198, Oct 2008.

A. Dorn, B. Durand, C. Marfing, M. Le Meur, C. Benoist, and D. Mathis. Conserved major histocompatibility complex class II boxes–X and Y–are transcriptional control elements and specifically bind nuclear proteins. Proc. Natl. Acad. Sci. U.S.A., 84(17):6249–6253, Sep 1987.

R. Dubruille, A. Laurencon, C. Vandaele, E. Shishido, M. Coulon-Bublex, P. Swoboda, P. Couble, M. Kernan, and B. Durand. Drosophila regulatory factor X is necessary for ciliated sensory neuron differentiation. Development, 129(23):5487–5498, Dec 2002.

N. D. Dwyer, E. R. Troemel, P. Sengupta, and C. I. Bargmann. Odorant receptor localization to olfactory cilia is mediated by ODR-4, a novel membrane-associated protein. Cell, 93(3):455–466, May 1998.

S. R. Eddy. A new generation of homology search tools based on probabilistic inference. Genome Inform, 23(1):205–211, Oct 2009.

E. Efimenko, K. Bubb, H. Y. Mak, T. Holzman, M. R. Leroux, G. Ruvkun, J. H. Thomas, and P. Swoboda. Analysis of xbx genes in C. elegans. Development, 132(8):1923–1934, Apr 2005.

E. Efimenko, O. E. Blacque, G. Ou, C. J. Haycraft, B. K. Yoder, J. M. Scholey, M. R. Leroux, and P. Swoboda. Caenorhabditis elegans DYF-2, an orthologue of human WDR19, is a component of the intraflagellar transport machinery in sensory cilia. Mol. Biol. Cell, 17(11):4801–4811, Nov 2006.

P. Emery, M. Strubin, K. Hofmann, P. Bucher, B. Mach, and W. Reith. A consensus motif in the RFX DNA binding domain and binding domain mutants with altered specificity. Mol. Cell. Biol., 16(8):4486–4494, Aug 1996.

T. C. Evans. Transformation and microinjection. WormBook, 2006.

Y. Fan, M. A. Esmail, S. J. Ansley, O. E. Blacque, K. Boroevich, A. J. Ross, S. J. Moore, J. L. Badano, H. May-Simera, D. S. Compton, J. S. Green, R. A. Lewis, M. M. van Haelst, P. S. Parfrey, D. L. Baillie, P. L. Beales, N. Katsanis, W. S. Davidson, and M. R. Leroux. Mutations in a member of the Ras superfamily of small GTP-binding proteins causes Bardet-Biedl syndrome. Nat. Genet., 36(9):989–993, Sep 2004.

J. A. Follit, R. A. Tuft, K. E. Fogarty, and G. J. Pazour. The intraflagellar transport protein IFT20 is associated with the Golgi complex and is required for cilia assembly. Mol. Biol. Cell, 17(9): 3781–3792, Sep 2006.

C. Frøkjær-Jensen, M. W. Davis, C. E. Hopkins, B. J. Newman, J. M. Thummel, S. P. Olesen, M. Grunnet, and E. M. Jorgensen. Single-copy insertion of transgenes in Caenorhabditis elegans. Nat. Genet., 40(11):1375–1383, Nov 2008.

M. Fujiwara, T. Ishihara, and I. Katsura. A novel WD40 protein, CHE-2, acts cell-autonomously in the formation of C. elegans sensory cilia. Development, 126(21):4839–4848, Nov 1999.

K. S. Gajiwala, H. Chen, F. Cornille, B. P. Roques, W. Reith, B. Mach, and S. K. Burley. Structure of the winged-helix protein hRFX1 reveals a new mode of DNA binding. Nature, 403(6772): 916–921, Feb 2000.

E. Ghedin, S. Wang, D. Spiro, E. Caler, Q. Zhao, J. Crabtree, J. E. Allen, A. L. Delcher, D. B. Guiliano, D. Miranda-Saavedra, S. V. Angiuoli, T. Creasy, P. Amedeo, B. Haas, N. M. El-Sayed, J. R. Wortman, T. Feldblyum, L. Tallon, M. Schatz, M. Shumway, H. Koo, S. L. Salzberg, S. Schobel, M. Pertea, M. Pop, O. White, G. J. Barton, C. K. Carlow, M. J. Crawford, J. Daub, M. W. Dimmic, C. F. Estes, J. M. Foster, M. Ganatra, W. F. Gregory, N. M. Johnson, J. Jin, R. Komuniecki, I. Korf, S. Kumar, S. Laney, B. W. Li, W. Li, T. H. Lindblom, S. Lustigman, D. Ma, C. V. Maina, D. M. Martin, J. P. McCarter, L. McReynolds, M. Mitreva, T. B. Nutman, J. Parkinson, J. M. Peregrin-Alvarez, C. Poole, Q. Ren, L. Saunders, A. E. Sluder, K. Smith, M. Stanke, T. R. Unnasch, J. Ware, A. D. Wei, G. Weil, D. J. Williams, Y. Zhang, S. A. Williams, C. Fraser-Liggett, B. Slatko, M. L. Blaxter, and A. L. Scott. Draft genome of the filarial nematode parasite Brugia malayi. Science, 317(5845):1756–1760, Sep 2007.

A. Gherman, E. E. Davis, and N. Katsanis. The ciliary proteome database: an integrated community resource for the genetic and functional dissection of cilia. Nat. Genet., 38(9):961–962, Sep 2006.

C. Godel, S. Kumar, G. Koutsovoulos, P. Ludin, D. Nilsson, F. Comandatore, N. Wrobel, M. Thompson, C. D. Schmid, S. Goto, F. Bringaud, A. Wolstenholme, C. Bandi, C. Epe, R. Kaminsky, M. Blaxter, and P. Maser. The genome of the heartworm, Dirofilaria immitis, reveals drug and vaccine targets. FASEB J., 26(11):4650–4661, Nov 2012.

C. J. Haycraft, P. Swoboda, P. D. Taulman, J. H. Thomas, and B. K. Yoder. The C. elegans homolog of the murine cystic kidney disease gene Tg737 functions in a ciliogenic pathway and is disrupted in osm-5 mutant worms. Development, 128(9):1493–1505, May 2001.

C. J. Haycraft, J. C. Schafer, Q. Zhang, P. D. Taulman, and B. K. Yoder. Identification of CHE-13, a novel intraflagellar transport protein required for cilia formation. Exp. Cell Res., 284(2):251–263, Apr 2003.

L. M. Hellman and M. G. Fried. Electrophoretic mobility shift assay (EMSA) for detecting protein-nucleic acid interactions. Nat Protoc, 2(8):1849–1861, 2007.

F. Hildebrandt, T. Benzing, and N. Katsanis. Ciliopathies. N. Engl. J. Med., 364(16):1533–1543, Apr 2011.

O. Hobert. PCR fusion-based approach to create reporter gene constructs for expression analysis in transgenic C. elegans. BioTechniques, 32(4):728–730, Apr 2002.

J. Hoefele, K. Mayer, M. Scholz, and H. G. Klein. Novel PKD1 and PKD2 mutations in autosomal dominant polycystic kidney disease (ADPKD). Nephrol. Dial. Transplant., 26(7):2181–2188, Jul 2011.

G. C. Horvath, W. S. Kistler, and M. K. Kistler. RFX2 is a potential transcriptional regulatory factor for histone H1t and other genes expressed during the meiotic phase of spermatogenesis. Biol. Reprod., 71(5):1551–1559, Nov 2004.

M. Huang, Z. Zhou, and S. J. Elledge. The DNA replication and damage checkpoint pathways induce transcription by inhibition of the Crt1 repressor. Cell, 94(5):595–605, Sep 1998.

P. N. Inglis, G. Ou, M. R. Leroux, and J. M. Scholey. The sensory cilia of Caenorhabditis elegans. WormBook, pages 1–22, 2007.

H. Ishikawa and W. F. Marshall. Ciliogenesis: building the cell’s antenna. Nat. Rev. Mol. Cell Biol., 12(4):222–234, Apr 2011.

A. R. Jex, S. Liu, B. Li, N. D. Young, R. S. Hall, Y. Li, L. Yang, N. Zeng, X. Xu, Z. Xiong, F. Chen, X. Wu, G. Zhang, X. Fang, Y. Kang, G. A. Anderson, T. W. Harris, B. E. Campbell, J. Vlaminck, T. Wang, C. Cantacessi, E. M. Schwarz, S. Ranganathan, P. Geldhof, P. Nejsum, P. W. Sternberg, H. Yang, J. Wang, J. Wang, and R. B. Gasser. Ascaris suum draft genome. Nature, 479(7374): 529–533, Nov 2011.

A. R. Jex, P. Nejsum, E. M. Schwarz, L. Hu, N. D. Young, R. S. Hall, P. K. Korhonen, S. Liao, S. Thamsborg, J. Xia, P. Xu, S. Wang, J. P. Scheerlinck, A. Hofmann, P. W. Sternberg, J. Wang, and R. B. Gasser. Genome and transcriptome of the porcine whipworm Trichuris suis. Nat. Genet., 46(7):701–706, Jul 2014.

J. Keilwagen, M. Wenk, J. L. Erickson, M. H. Schattat, J. Grau, and F. Hartung. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res., Feb 2016.

A. Kelly and J. Trowsdale. Complete nucleotide sequence of a functional HLA-DP beta gene and the region between the DP beta 1 and DP alpha 1 genes: comparison of the 5’ ends of HLA class II genes. Nucleic Acids Res., 13(5):1607–1621, Mar 1985.

T. Kikuchi, J. A. Cotton, J. J. Dalzell, K. Hasegawa, N. Kanzaki, P. McVeigh, T. Takanashi, I. J. Tsai, S. A. Assefa, P. J. Cock, T. D. Otto, M. Hunt, A. J. Reid, A. Sanchez-Flores, K. Tsuchihara, T. Yokoi, M. C. Larsson, J. Miwa, A. G. Maule, N. Sahashi, J. T. Jones, and M. Berriman. Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus. PLoS Pathog., 7(9):e1002219, Sep 2011.

D. C. Koboldt, Q. Zhang, D. E. Larson, D. Shen, M. D. McLellan, L. Lin, C. A. Miller, E. R. Mardis, L. Ding, and R. K. Wilson. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res., 22(3):568–576, Mar 2012.

H. Kunitomo and Y. Iino. Caenorhabditis elegans DYF-11, an orthologue of mammalian Traf3ip1/MIP-T3, is required for sensory cilia formation. Genes Cells, 13(1):13–25, Jan 2008.

R. Laing, T. Kikuchi, A. Martinelli, I. J. Tsai, R. N. Beech, E. Redman, N. Holroyd, D. J. Bartley, H. Beasley, C. Britton, D. Curran, E. Devaney, A. Gilabert, M. Hunt, F. Jackson, S. L. Johnston, I. Kryukov, K. Li, A. A. Morrison, A. J. Reid, N. Sargison, G. I. Saunders, J. D. Wasmuth, A. Wolstenholme, M. Berriman, J. S. Gilleard, and J. A. Cotton. The genome and transcriptome of Haemonchus contortus, a key model parasite for drug and vaccine discovery. Genome Biol., 14(8):R88, 2013.

C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262(5131): 208–214, Oct 1993.

B. H. Lee, J. Liu, D. Wong, S. Srinivasan, and K. Ashrafi. Hyperactive neuroendocrine secretion causes size, feeding, and metabolic defects of C. elegans Bardet-Biedl syndrome mutants. PLoS Biol., 9(12):e1001219, Dec 2011.

H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16):2078–2079, Aug 2009.

J. B. Li, J. M. Gerdes, C. J. Haycraft, Y. Fan, T. M. Teslovich, H. May-Simera, H. Li, O. E. Blacque, L. Li, C. C. Leitch, R. A. Lewis, J. S. Green, P. S. Parfrey, M. R. Leroux, W. S. Davidson, P. L. Beales, L. M. Guay-Woodford, B. K. Yoder, G. D. Stormo, N. Katsanis, and S. K. Dutcher. Comparative genomics identifies a flagellar and basal body proteome that includes the BBS5 human disease gene. Cell, 117(4):541–552, May 2004.

A. K. Mah, K. R. Armstrong, D. S. Chew, J. S. Chu, D. K. Tu, R. C. Johnsen, N. Chen, H. M. Chamberlin, and D. L. Baillie. Transcriptional regulation of AQP-8, a Caenorhabditis elegans aquaporin exclusively expressed in the excretory system, by the POU homeobox transcription factor CEH-6. J. Biol. Chem., 282(38):28074–28086, Sep 2007.

D. J. Mathis, C. O. Benoist, V. E. Williams, M. R. Kanter, and H. O. McDevitt. The murine E alpha immune response gene. Cell, 32(3):745–754, Mar 1983.

M. Mitreva, D. P. Jasmer, D. S. Zarlenga, Z. Wang, S. Abubucker, J. Martin, C. M. Taylor, Y. Yin, L. Fulton, P. Minx, S. P. Yang, W. C. Warren, R. S. Fulton, V. Bhonagiri, X. Zhang, K. Hallsworth-Pepin, S. W. Clifton, J. P. McCarter, J. Appleton, E. R. Mardis, and R. K. Wilson. The draft genome of the parasitic nematode Trichinella spiralis. Nat. Genet., 43(3):228–235, Mar 2011.

A. Mortazavi, E. M. Schwarz, B. Williams, L. Schaeffer, I. Antoshechkin, B. J. Wold, and P. W. Sternberg. Scaffolding a Caenorhabditis nematode genome with RNA-seq. Genome Res., 20(12):1740–1747, Dec 2010.

S. Mukhopadhyay, Y. Lu, H. Qin, A. Lanjuin, S. Shaham, and P. Sengupta. Distinct IFT mechanisms contribute to the generation of ciliary structural diversity in C. elegans. EMBO J., 26(12):2966–2980, Jun 2007.

T. Murayama, Y. Toh, Y. Ohshima, and M. Koga. The dyf-3 gene encodes a novel protein required for sensory cilium formation in Caenorhabditis elegans. J. Mol. Biol., 346(3):677–687, Feb 2005.

C. O’Connor. Fluorescence in situ hybridization (FISH). Nature Education, 1(1):171, 2008.

C. H. Opperman, D. M. Bird, V. M. Williamson, D. S. Rokhsar, M. Burke, J. Cohn, J. Cromer, S. Diener, J. Gajan, S. Graham, T. D. Houfek, Q. Liu, T. Mitros, J. Schaff, R. Schaffer, E. Scholl, B. R. Sosinski, V. P. Thomas, and E. Windham. Sequence and genetic map of Meloidogyne hapla: A compact nematode genome for plant parasitism. Proc. Natl. Acad. Sci. U.S.A., 105(39):14802–14807, Sep 2008.

S. M. O’Rourke, M. D. Dorfman, J. C. Carter, and B. Bowerman. Dynein modifiers in C. elegans: light chains suppress conditional heavy chain mutants. PLoS Genet., 3(8):e128, Aug 2007.

D. M. O’Sullivan, D. Larhammar, M. C. Wilson, P. A. Peterson, and V. Quaranta. Structure of the human Ia-associated invariant (gamma)-chain gene: identification of 5’ sequences shared with major histocompatibility complex class II genes. Proc. Natl. Acad. Sci. U.S.A., 83(12):4484–4488, Jun 1986.

K. Otsuki, Y. Hayashi, M. Kato, H. Yoshida, and M. Yamaguchi. Characterization of dRFX2, a novel RFX family protein in Drosophila. Nucleic Acids Res., 32(18):5636–5648, 2004.

G. Ou, O. E. Blacque, J. J. Snow, M. R. Leroux, and J. M. Scholey. Functional coordination of intraflagellar transport motors. Nature, 436(7050):583–587, Jul 2005a.

G. Ou, H. Qin, J. L. Rosenbaum, and J. M. Scholey. The PKD protein qilin undergoes intraflagellar transport. Curr. Biol., 15(11):R410–411, Jun 2005b.

G. Ou, M. Koga, O. E. Blacque, T. Murayama, Y. Ohshima, J. C. Schafer, C. Li, B. K. Yoder, M. R. Leroux, and J. M. Scholey. Sensory ciliogenesis in Caenorhabditis elegans: assignment of IFT components into distinct modules based on transport and phenotypic profiles. Mol. Biol. Cell, 18(5):1554–1569, May 2007.

X. Pan, G. Ou, G. Civelekoglu-Scholey, O. E. Blacque, N. F. Endres, L. Tao, A. Mogilner, M. R. Leroux, R. D. Vale, and J. M. Scholey. Mechanism of transport of IFT particles in C. elegans cilia by the concerted action of kinesin-II and OSM-3 motors. J. Cell Biol., 174(7):1035–1045, Sep 2006.

L. A. Perkins, E. M. Hedgecock, J. N. Thomson, and J. G. Culotti. Mutant sensory cilia in the nematode Caenorhabditis elegans. Dev. Biol., 117(2):456–487, Oct 1986.

P. Phirke, E. Efimenko, S. Mohan, J. Burghoorn, F. Crona, M. W. Bakhoum, M. Trieb, K. Schuske, E. M. Jorgensen, B. P. Piasecki, M. R. Leroux, and P. Swoboda. Transcriptional profiling of C. elegans DAF-19 uncovers a ciliary base-associated protein and a CDK/CCRK/LF2p-related kinase required for intraflagellar transport. Dev. Biol., 357(1):235–247, Sep 2011.

L. Pugliatti, J. Derre, R. Berger, C. Ucla, W. Reith, and B. Mach. The genes for MHC class II regulatory factors RFX1 and RFX2 are located on the short arm of chromosome 19. Genomics, 13(4):1307–1310, Aug 1992.

H. Qin, J. L. Rosenbaum, and M. M. Barr. An autosomal recessive polycystic kidney disease gene homolog is involved in intraflagellar transport in C. elegans ciliated sensory neurons. Curr. Biol., 11(6):457–461, Mar 2001.

W. Reith, S. Satola, C. H. Sanchez, I. Amaldi, B. Lisowska-Grospierre, C. Griscelli, M. R. Hadam, and B. Mach. Congenital immunodeficiency with a regulatory defect in MHC class II gene expression lacks a specific HLA-DR promoter binding protein, RF-X. Cell, 53(6):897–906, Jun 1988.

W. Reith, E. Barras, S. Satola, M. Kobr, D. Reinhart, C. H. Sanchez, and B. Mach. Cloning of the major histocompatibility complex class II promoter binding protein affected in a hereditary defect in class II gene regulation. Proc. Natl. Acad. Sci. U.S.A., 86(11):4200–4204, Jun 1989.

W. Reith, C. Ucla, E. Barras, A. Gaud, B. Durand, C. Herrero-Sanchez, M. Kobr, and B. Mach. RFX1, a transactivator of hepatitis B virus enhancer I, belongs to a novel family of homodimeric and heterodimeric DNA-binding proteins. Mol. Cell. Biol., 14(2):1230–1244, Feb 1994.

M. Remm, C. E. Storm, and E. L. Sonnhammer. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol., 314(5):1041–1052, Dec 2001.

C. Rodelsperger, R. A. Neher, A. M. Weller, G. Eberhardt, H. Witte, W. E. Mayer, C. Dieterich, and R. J. Sommer. Characterization of genetic diversity in the nematode Pristionchus pacificus from population-scale resequencing data. Genetics, 196(4):1153–1165, Apr 2014.

H. Saito, R. A. Maki, L. K. Clayton, and S. Tonegawa. Complete primary structures of the E beta chain and gene of the mouse major histocompatibility complex. Proc. Natl. Acad. Sci. U.S.A., 80(18):5520–5524, Sep 1983.

J. C. Schafer, C. J. Haycraft, J. H. Thomas, B. K. Yoder, and P. Swoboda. XBX-1 encodes a dynein light intermediate chain required for retrograde intraflagellar transport and cilia assembly in Caenorhabditis elegans. Mol. Biol. Cell, 14(5):2057–2070, May 2003.

E. M. Schwarz, Y. Hu, I. Antoshechkin, M. M. Miller, P. W. Sternberg, and R. V. Aroian. The genome and transcriptome of the zoonotic hookworm Ancylostoma ceylanicum identify infection-specific gene families. Nat. Genet., 47(4):416–422, Apr 2015.

R. She, J. S. Chu, B. Uyar, J. Wang, K. Wang, and N. Chen. genBlastG: using BLAST searches to build homologous gene models. Bioinformatics, 27(15):2141–2143, Aug 2011.

C. A. Siegrist, B. Durand, P. Emery, E. David, P. Hearing, B. Mach, and W. Reith. RFX1 is identical to enhancer factor C and functions as a transactivator of the hepatitis B virus enhancer. Mol. Cell. Biol., 13(10):6375–6384, Oct 1993.

J. J. Snow, G. Ou, A. L. Gunnarson, M. R. Walker, H. M. Zhou, I. Brust-Mascher, and J. M. Scholey. Two anterograde intraflagellar transport motors cooperate to build sensory cilia on C. elegans neurons. Nat. Cell Biol., 6(11):1109–1113, Nov 2004.

M. M. Sohocki, L. S. Sullivan, H. A. Mintz-Hittner, D. Birch, J. R. Heckenlively, C. L. Freund, R. R. McInnes, and S. P. Daiger. A range of clinical phenotypes associated with mutations in CRX, a photoreceptor transcription-factor gene. Am. J. Hum. Genet., 63(5):1307–1315, Nov 1998.

J. Srinivasan, A. R. Dillman, M. G. Macchietto, L. Heikkinen, M. Lakso, K. M. Fracchia, I. Antoshechkin, A. Mortazavi, G. Wong, and P. W. Sternberg. The draft genome and transcriptome of Panagrellus redivivus are shaped by the harsh demands of a free-living lifestyle. Genetics, 193(4):1279–1295, Apr 2013.

V. Steimle, B. Durand, E. Barras, M. Zufferey, M. R. Hadam, B. Mach, and W. Reith. A novel DNA-binding regulatory factor is mutated in primary MHC class II deficiency (bare lymphocyte syndrome). Genes Dev., 9(9):1021–1032, May 1995.

L. D. Stein, Z. Bao, D. Blasiar, T. Blumenthal, M. R. Brent, N. Chen, A. Chinwalla, L. Clarke, C. Clee, A. Coghlan, A. Coulson, P. D’Eustachio, D. H. Fitch, L. A. Fulton, R. E. Fulton, S. Griffiths-Jones, T. W. Harris, L. W. Hillier, R. Kamath, P. E. Kuwabara, E. R. Mardis, M. A. Marra, T. L. Miner, P. Minx, J. C. Mullikin, R. W. Plumb, J. Rogers, J. E. Schein, M. Sohrmann, J. Spieth, J. E. Stajich, C. Wei, D. Willey, R. K. Wilson, R. Durbin, and R. H. Waterston. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol., 1(2):E45, Nov 2003.

P. Swoboda, H. T. Adler, and J. H. Thomas. The RFX-type transcription factor DAF-19 regulates sensory neuron cilium formation in C. elegans. Mol. Cell, 5(3):411–421, Mar 2000.

K. Szymanska and C. A. Johnson. The transition zone: an essential functional compartment of cilia. Cilia, 1(1):10, 2012.

Y. T. Tang, X. Gao, B. A. Rosa, S. Abubucker, K. Hallsworth-Pepin, J. Martin, R. Tyagi, E. Heizer, X. Zhang, V. Bhonagiri-Palsikar, P. Minx, W. C. Warren, Q. Wang, B. Zhan, P. J. Hotez, P. W. Sternberg, A. Dougall, S. T. Gaze, J. Mulvenna, J. Sotillo, S. Ranganathan, E. M. Rabelo, R. K. Wilson, P. L. Felgner, J. Bethony, J. M. Hawdon, R. B. Gasser, A. Loukas, and M. Mitreva. Genome of the human hookworm Necator americanus. Nat. Genet., 46(3):261–269, Mar 2014.

The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 282(5396):2012–2018, Dec 1998.

O. Thompson, M. Edgley, P. Strasbourger, S. Flibotte, B. Ewing, R. Adair, V. Au, I. Chaudhry, L. Fernando, H. Hutter, A. Kieffer, J. Lau, N. Lee, A. Miller, G. Raymant, B. Shen, J. Shendure, J. Taylor, E. H. Turner, L. W. Hillier, D. G. Moerman, and R. H. Waterston. The million mutation project: a new approach to genetics in Caenorhabditis elegans. Genome Res., 23(10):1749–1762, Oct 2013.

T. R. Unnasch and S. A. Williams. The genomes of Onchocerca volvulus. Int. J. Parasitol., 30(4):543–552, Apr 2000.

S. Veltel, R. Gasper, E. Eisenacher, and A. Wittinghofer. The retinitis pigmentosa 2 gene product is a GTPase-activating protein for Arf-like 3. Nat. Struct. Mol. Biol., 15(4):373–380, Apr 2008.

S. R. Warburton-Pitt, A. R. Jauregui, C. Li, J. Wang, M. R. Leroux, and M. M. Barr. Ciliogenesis in Caenorhabditis elegans requires genetic interactions between ciliary middle segment localized NPHP-2 (inversin) and transition zone-associated proteins. J. Cell. Sci., 125(Pt 11):2592–2603, Jun 2012.

S. Ward. Chemotaxis by the nematode Caenorhabditis elegans: identification of attractants and analysis of the response by use of mutants. Proc. Natl. Acad. Sci. U.S.A., 70(3):817–821, Mar 1973.

T. Warrington. Computational and Molecular Dissection of an X-box cis-Regulatory Module. PhD thesis, Simon Fraser University, 2015.

A. M. Waters and P. L. Beales. Ciliopathies: an expanding disease spectrum. Pediatr. Nephrol., 26(7):1039–1056, Jul 2011.

Q. Wei, Y. Zhang, Y. Li, Q. Zhang, K. Ling, and J. Hu. The BBSome controls IFT assembly and turnaround in cilia. Nat. Cell Biol., 14(9):950–957, Sep 2012.

Q. Wei, Q. Xu, Y. Zhang, Y. Li, Q. Zhang, Z. Hu, P. C. Harris, V. E. Torres, K. Ling, and J. Hu. Transition fibre protein FBF1 is required for the ciliary entry of assembled intraflagellar transport complexes. Nat. Commun., 4:2750, 2013.

J. G. White, E. Southgate, J. N. Thomson, and S. Brenner. The structure of the nervous system of the nematode Caenorhabditis elegans. Philos. Trans. R. Soc. Lond., B, Biol. Sci., 314(1165):1–340, Nov 1986.

C. L. Williams, C. Li, K. Kida, P. N. Inglis, S. Mohan, L. Semenec, N. J. Bialas, R. M. Stupay, N. Chen, O. E. Blacque, B. K. Yoder, and M. R. Leroux. MKS and NPHP modules cooperate to establish basal body/transition zone membrane associations and ciliary gate function during ciliogenesis. J. Cell Biol., 192(6):1023–1041, Mar 2011.

M. E. Winkelbauer, J. C. Schafer, C. J. Haycraft, P. Swoboda, and B. K. Yoder. The C. elegans homologs of nephrocystin-1 and nephrocystin-4 are cilia transition zone proteins involved in chemosensory perception. J. Cell. Sci., 118(Pt 23):5575–5587, Dec 2005.

S. Y. Wu and M. McLeod. The sak1+ gene of Schizosaccharomyces pombe encodes an RFX family DNA-binding protein that positively regulates cyclic AMP-dependent protein kinase-mediated exit from the mitotic cell cycle. Mol. Cell. Biol., 15(3):1479–1488, Mar 1995.

Q. Zheng, X. Cai, M. H. Tan, S. Schaffert, C. P. Arnold, X. Gong, C. Z. Chen, and S. Huang. Precise gene deletion and replacement using the CRISPR/Cas9 system in human cells. BioTechniques, 57(3):115–124, 2014.
