
Iowa State University Capstones, Theses and Dissertations
Graduate Theses and Dissertations

2019

In-silico guided identification of ciliogenesis candidate genes in a non-conventional animal model

Natalia I. Acevedo Luna
Iowa State University


Recommended Citation
Acevedo Luna, Natalia I., "In-silico guided identification of ciliogenesis candidate genes in a non-conventional animal model" (2019). Graduate Theses and Dissertations. 17382.
https://lib.dr.iastate.edu/etd/17382

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository.

In-silico guided identification of ciliogenesis candidate genes in a non-conventional animal model

by

Natalia Acevedo Luna

A dissertation submitted to the graduate faculty

in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Major: Bioinformatics and Computational Biology

Program of Study Committee:
Heike Hofmann, Co-major Professor
Geetu Tuteja, Co-major Professor
Matthew Hufford
Dennis V. Lavrov
Mohan Gupta

The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred.

Iowa State University
Ames, Iowa
2019

DEDICATION

This dissertation would never have been possible without the support and love of the wonderful people I have the honor of calling my family. They are my core and my everything.

First and foremost, I am deeply beholden to my parents Pablo Acevedo and Gladys Luna who supported me in every possible aspect, have never once doubted me, and gave me the strength to continuously push myself further in my professional and personal development.

In the same way, I owe gratitude to my siblings Boris and Luciana for their continuous encouragement, patience, and care throughout the years. They are not only my brother and sister but also my best friends. Thank you for always being there for me!

Expressing my affection, appreciation, and devotion for my husband, Jan Hoinka, in words will never do justice to how deep my feelings for him truly are. His love, his unshattered belief in me, his honesty, and his persistent moral support and care have made me a better person and given me the strength to persist, both professionally and on a personal level, every single day.

Above all, I dedicate this thesis to the memory of my beloved Mother. Her example of life, perseverance, care, dedication, love, and her overall philosophy on life are deeply rooted in me and present in every single moment, shaping who I am and who I am still to become. This thesis is dedicated to her.

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGEMENTS

ABSTRACT

CHAPTER 1. GENERAL INTRODUCTION
    Motile and Non-motile Cilia
    Cilium: A Conserved Organelle
    Multiciliated Cell Differentiation
    Transcriptional Regulation by FoxJ1 and Rfx
    Platynereis dumerilii as an Animal Model
    Structure of this Dissertation
    Bibliography

CHAPTER 2: PDUMBASE
    Abstract
    Background
    Construction and Content
    Transcriptome Assembly and Annotation Pipeline
    Expression Analysis
    Comparative Transcriptome
    Utility and Discussion
    Annotation Search
    Sequence Similarity Search
    Comparative Analysis
    Co-expression Information
    Database Implementation
    Additional Features
    Conclusions
    List of Abbreviations
    Declarations
    Availability of Data and Materials
    Competing Interests
    Funding
    Authors’ Contributions
    Acknowledgements
    Bibliography

CHAPTER 3: CILIOGENESIS
    Abstract
    Background
    Results
    Identification of Known Ciliary Genes in P. dumerilii
    Candidate Genes by β-Catenin Cell Fate Transformation
    High Confidence Ciliogenesis Candidate Genes in P. dumerilii
    Ciliogenesis Candidate Genes by Co-expression Analysis
    Known Ciliary Genes by Localization and Functional Domains
    Ciliogenesis Precursors
    Discussion
    Identification of Ciliogenesis Precursors and Transcriptional Regulators
    Potential Novel Candidate Genes Based on Co-expression Analysis
    Conclusion
    Methods
    Compilation of a Comprehensive Set of Known Ciliary Genes
    Identification of Known Ciliary Genes Conserved in P. dumerilii
    Classifying Known Ciliary Genes by Localization and Functional Domains
    P. dumerilii Culture, Azakenpaullone Inhibitor Treatment, and Sequencing
    Read Processing and Differential Expression Analysis
    Identification of Ciliogenesis Candidate Genes
    Expected Fold-change of Expression for Ciliary Genes
    Identification of Ciliogenesis Candidate Genes by Annotation
    Identification of Ciliogenesis Candidate Genes by Co-expression Analysis
    Data and Availability
    Bibliography

CHAPTER 4: DENDROSHINY
    Abstract
    Availability
    Introduction
    Materials and Methods
    Data Input
    Data Preprocessing
    Cluster Generation and Classification
    The Web Interface
    Case Study
    Effect of Tree Cutoff in Clustering Results
    Effect of Timepoint Selection in Clustering Results
    Discussion and Conclusion
    Bibliography

CHAPTER 5. GENERAL CONCLUSIONS
    Overview
    Future Directions
    PdumBase Expansion
    Ciliogenesis Candidate Genes Outlook
    Towards a Ciliogenesis Regulatory Network
    Concluding Remarks

APPENDIX A. PDUMBASE
    Supplementary Figures
    PdumBase Manual

APPENDIX B. CILIOGENESIS
    Supplementary Figures
    Supplementary Tables
    Bibliography

LIST OF FIGURES

Figure 1.1 General structure of a eukaryotic motile cilium
Figure 1.2 Overview of computational analysis pipeline

Figure 2.1 Schematic illustration of the PdumBase sitemap
Figure 2.2 PdumBase search interface and search result options
Figure 2.3 PdumBase expandable results option

Figure 3.1 Candidate genes by β-Catenin cell fate transformation
Figure 3.2 Predicted effect of cell fate transformation on treated
Figure 3.3 Distribution of the ratio of change in
Figure 3.4 Co-expression of highest IC gene
Figure 3.5 Co-expression of high confidence ciliogenesis candidate genes
Figure 3.6 Candidate genes classified into structural components
Figure 3.7 Candidate genes classified into functional components
Figure 3.8 Potential ciliogenesis precursors identified by co-expression

Figure 4.1 Data flow overview of DendroShiny
Figure 4.2 Screen capture of the web interface of DendroShiny
Figure 4.3 Impact of tree cut parameter c in co-expression results
Figure 4.4 Exploration of the clustering neighbourhood with DendroShiny
Figure 4.5 Impact of time point selection in co-expression results

Figure A.1.1 Results from PdumBase Result Interface: Co-expression
Figure A.1.2 Results from PdumBase Result Interface: Gene models
Figure A.2.3 PdumBase Search result interface
Figure A.2.4 PdumBase Expression data tab interface
Figure A.2.5 PdumBase Search results interface: Uniprot
Figure A.2.6 PdumBase Search results interface: Extension
Figure A.2.7 PdumBase Annotation tab interface
Figure A.2.8 PdumBase Search result interface: Expression profile
Figure A.2.9 PdumBase Plot tab interface
Figure A.2.10 Heat map of 13,160 expressed genes
Figure A.2.11 PdumBase Coexpression information interface
Figure A.2.12 PdumBase Ortholog expression profile interface
Figure A.2.13 PdumBase List tab interface
Figure A.2.14 PdumBase Ortholog groups interface
Figure A.2.15 PdumBase Alignment tab interface
Figure A.2.16 PdumBase Search interface
Figure A.2.17 PdumBase Search interface
Figure A.2.18 PdumBase Search result interface
Figure A.2.19 PdumBase Plot tab
Figure A.2.20 PdumBase Expression data tab
Figure A.2.21 PdumBase Annotation tab
Figure A.2.22 PdumBase Search results interface
Figure A.2.23 PdumBase Search results interface
Figure A.2.24 PdumBase Search results interface
Figure A.2.25 PdumBase Search results interface: Show other info
Figure A.2.26 PdumBase Search results interface: Show other info selected
Figure A.2.27 PdumBase Search results interface
Figure A.2.28 PdumBase Search results interface
Figure A.2.29 PdumBase Search interface
Figure A.2.30 PdumBase Search results interface

Figure B.1.1 Algorithmic Pipeline to Identify by Sequence Similarity
Figure B.1.2 Pipeline to Identify Ciliogenesis Candidate Genes
Figure B.1.3 Cover Art for Manuscript

LIST OF TABLES

Table 3.1 Description of sources of known ciliary genes
Table 3.2 Known ciliary genes classified into strict and inclusive core of high confidence genes
Table 3.3 Summary of known ciliary genes discriminated into sub-sets
Table 3.4 Guilt by Association by gene name
Table 3.5 Ciliogenesis candidate genes by co-expression analysis
Table 3.6 Classification of known ciliary genes into functional and structural components

Table A.2.1 Time points from Early Stages data set
Table A.2.2 Late Stages included in data set
Table A.2.3 and number of sequences for comparative analysis
Table A.2.4 Number of ortholog genes between the 6 species
Table A.2.5 Species and number of genes used to find ortholog groups

Table B.2.1 Compilation of Known Ciliary Genes
Table B.2.2 P. dumerilii Transcripts Aligning to Multiple Known Ciliary Genes
Table B.2.3 Strict Core of Known Ciliary Genes

ACKNOWLEDGEMENTS

I would like to thank my major professor, Dr. Heike Hofmann, for her consistent support and guidance and for deciding to take on the role of my major professor under unusual circumstances. Her outstanding advising skills, allowing me independence to pursue my own approaches and to draw my own conclusions while guiding me through the process, have enabled me to grow as a researcher and person.

I would like to express my sincere gratitude to Dr. Stephan Q. Schneider for providing me the opportunity to join his research group and work under his guidance. His contribution to this thesis through countless and fruitful discussions not only substantially enriched my scientific expertise but also taught me the value of objective argumentation, raising my overall scientific thinking to a completely new level. In addition, Dr. Schneider provided me with the required skills to understand the biological aspects of this work. Without these, the quality and extent of this dissertation would not have been the same.

I would also like to thank Dr. Geetu Tuteja, my co-major professor, for her scientific and professional advice. Her office door was always open for me, and every conversation with her was encouraging and a truly enriching experience. In addition, I was always welcome to join her lab meetings, which provided me with insights into novel and diverse research topics.

This dissertation would not have been possible without my committee members, Dr. Matthew Hufford, Dr. Dennis V. Lavrov, and Dr. Mohan Gupta, whom I would like to thank for their insightful scientific advice and their continued support in every interaction we had over the time of my PhD studies.

In addition, I am grateful to our collaborator Dr. Detlev Arendt at the European Molecular Biology Laboratory (EMBL) for providing me the opportunity to intern with his group and for training me in performing microinjections of Platynereis dumerilii embryos. I would also like to extend this acknowledgment to all the members of his group during my visit, as they all contributed to making my time at EMBL a wonderful scientific and personal experience.

I would also like to give a special note of appreciation to Dr. Drena Doobs for her professional advice and for providing me an office space with a nice view of central campus. Her company in the office always made me feel welcomed and cared for.

In addition, the support, professionalism, and collaboration of my colleagues in Dr. Schneider's lab was, and remains, truly extraordinary in every aspect. Thank you for your diverse character and teamwork.

Furthermore, I owe gratitude to Trish Stauble, former Bioinformatics and Computational Biology (BCB) program coordinator, for her support and for keeping every interaction personal, hence maintaining my sense of membership in the program and the university.

Finally, to the multitude of people I have gotten to know during my time at Iowa State University and am proud to call my friends: thank you! You, my friends, know who you are, and you have made this dissertation possible in ways that are impossible to convey in writing.

ABSTRACT

The annelid Platynereis dumerilii is increasingly used as a model for developmental comparative studies due to its phylogenetic position and the accessibility of embryos that exhibit a stereotypic cleavage pattern and invariant cell lineages with predictable cell fates. To develop this unconventional model, we established PdumBase, a comprehensive database and intuitive online user interface based on stage-specific transcriptomic data that allows genome-wide identification of gene families contributing to particular biological processes during early developmental stages. One such important biological process is ciliogenesis, the formation of cilia, which are associated with a variety of cellular roles such as motility, signaling, and sensory functions. However, knowledge of the multiple molecular components and regulatory mechanisms governing this dynamic process lags behind the functional understanding of cilia.

To close this gap and to highlight the versatility of the data in PdumBase, we have developed an in silico guided identification pipeline for genes that contribute to the generation of a multiciliated cell type (MCC). Based on sequence similarity, we identified P. dumerilii genes orthologous to the majority of the known ciliary genes described in other species. In addition, we validated their potential contribution to ciliogenesis through a differential expression analysis comparing wild-type embryos with experimentally manipulated, hyperciliated embryos. Our study revealed over 600 known ciliary genes to be significantly upregulated in treated embryos. These included genes encoding well-known ciliogenesis candidates such as kinesins, organizers, associated proteins, signaling proteins, and transcription factors, which were summarized into a high confidence core set of ciliogenesis candidate genes.

To further associate genes that lack any annotation with ciliary activity, we developed DendroShiny, a computational approach to implicate potentially novel ciliary genes among poorly characterized transcripts based on rigorous statistical analysis and clustering of their co-expression patterns. DendroShiny achieves these goals by (1) clustering expression patterns of known genes and (2) using machine learning to determine expression features that allow for the classification of poorly characterized genes. Finally, our approach interactively displays the relationship of the identified clusters and their corresponding expression patterns, consequently facilitating the downstream analysis of transcriptomic data sets.

Taken together, our approach enables the identification of candidate and potentially novel ciliary genes despite the lack of an annotated genome and lays the groundwork for the elucidation of regulatory interactions between these candidate genes. This work hence represents a first step towards the generation of a comprehensive survey of ciliogenesis genes in Platynereis dumerilii.

CHAPTER 1. GENERAL INTRODUCTION

The research compiled in this dissertation concerns the computational identification of genes involved in the formation of cilia, microtubule-based organelles present across most eukaryotic phyla. Cilia emanate from the cell surface and have diverse sensory and motility roles critical for development, cell signaling, and tissue homeostasis, among other physiological functions. It is therefore not surprising that the inability of cells to produce functional cilia has been linked to a number of developmental and genetic diseases, collectively known as ciliopathies. However, given the significance of cilia in biological systems and their impact on associated diseases, our understanding of ciliogenesis, the biological mechanism governing their formation, has lagged behind.

Obtaining a deeper insight into the underlying molecular mechanisms resulting in such ciliopathies involves the identification of the molecular components required to assemble functional cilia. This endeavor, however, can only be achieved with tailored computational tools capable of processing next-generation biological data in a meaningful and timely manner. The resulting knowledge then has the potential to guide the elucidation of how the coordinated interaction of these ciliary components is regulated to produce functional cilia. The latter in particular represents a crucial step towards a broader understanding of, and possible treatment approaches for, cilia-related diseases.

In this work, we are interested in the formation of motile cilia in the context of multiciliated cells. This type of cilia is known to have multiple functions during embryonic development as well as in adult tissues, making this organelle critical for survival. The crucial role of motile cilia is additionally reflected in their conservation across a wide array of organisms at both the functional and structural levels. Taking advantage of such levels of conservation, we use transcriptomic data for the identification of known, as well as novel, ciliary genes expressed in early development of a non-conventional animal model, the marine annelid Platynereis dumerilii. Towards this goal, we have developed a unique computational data pipeline capable of integrating, analyzing, and visualizing various high-throughput ciliogenesis data sources.

This introductory chapter covers the generalities of motile cilia structure and function, describing the principal steps of ciliogenesis, as well as providing an overview of the regulatory mechanisms involved in multiciliated cell (MCC) differentiation. In addition, we highlight the advantages of using Platynereis dumerilii as an animal model to study ciliogenesis and briefly touch upon the computational efforts guiding the results presented here. Altogether, the content of this chapter lays the foundation underlying our motivation to identify ciliogenesis candidate genes through the integration of computational approaches for the analysis of transcriptomic data in the absence of an annotated genome.

Motile and Non-motile Cilia

Cilia are conventionally classified into two types: motile and non-motile. In general, cilia are anchored to the cell surface through the basal body, which is formed by microtubules arranged in nine triplets. From the basal body, the axoneme extends from the cell surface, surrounded by a ciliary membrane that is continuous with the plasma membrane [1]. The axoneme is structurally formed by nine doublets of microtubules, either around a central pair or lacking the central pair, arrangements respectively known as 9 + 2 and 9 + 0. Motile 9 + 2 cilia usually exist as multiple cilia in multiciliated cells bearing many dozens of them [2], whereas motile and non-motile 9 + 0 cilia are present as a single cilium on the cell [3]. Almost all cells produce non-motile cilia, also called primary or sensory cilia, whereas only specialized cells, namely multiciliated cells and sperm, produce motile cilia (or flagella) with the 9 + 2 arrangement [4]. A general overview of the structure of a cilium is provided in Figure 1.1.

Figure 1.1: General structure of a eukaryotic motile cilium. Shown is a cilium anchored to the cell surface through the basal body, which is formed by microtubules arranged in triplets. From the basal body, the axoneme extends from the cell surface while surrounded by a ciliary membrane that is continuous with the plasma membrane. Image modified from [5].

In this study, we focus on motile cilia present in multiciliated cells. These cells bear from 30 to 300 cilia depending on the tissue [6]. In specialized adult tissues with multiciliated cells, cilia generate a flow to clear airways, move the ovum in the mammalian oviduct, and circulate fluid in specific brain regions [7]. Motile cilia can also transduce signals, a function that is usually attributed to primary cilia. On airway epithelial tissues, for instance, motile cilia can change their beat frequency according to external stimuli [8]. Besides these functions, mostly described in vertebrates, motile cilia can also drive locomotion in many unicellular protists, such as ciliates (e.g., Paramecium) [9] and flagellated green algae (e.g., Chlamydomonas), as well as in marine larvae [10].

As expected from the variety of vital ciliary roles, defects in cilia structure or function are associated with a wide range of human diseases [11]. More specifically, defects in motile cilia can lead to diseases affecting the respiratory system (bronchiectasis) and the brain (hydrocephalus), cause infertility, and produce developmental abnormalities compromising left-right patterning, as in situs inversus [3, 12], among others. Since motile and non-motile cilia share the majority of their structural components, a mutation in a given ciliary gene can halt the functionality of both types of cilia. Therefore, a wide spectrum of phenotypes related to such dysfunctions exists. In a recent review, Reiter et al. [12] provide a survey of 187 genes implicated in 35 ciliopathies, exemplifying the complexity of these diseases and highlighting the importance of elucidating the molecular mechanisms of ciliary assembly and function.

Cilium: A Conserved Organelle

Motile cilium structure appears to be conserved among very distantly related eukaryotes. Single-cell protists, including amoeboids, euglenoids, dinoflagellates, as well as green and red algae and others, exhibit motile cilia with the same general structure and microtubule arrangement as motile cilia in multicellular organisms in distal branches of the eukaryotic tree. For instance, some fungi produce flagellated gametes with the 9 + 2 structure, and most animal groups, with the exception of ecdysozoa, have cilia with this same arrangement.

Phylogenetic analysis comparing cilia in the green alga Chlamydomonas reinhardtii (a bikont) with cilia in animals and fungi (unikonts) suggests that every branch of the eukaryotic tree includes, or once included, organisms with motile cilia in the 9 + 2 configuration. In addition, proteins associated with the central pair have been found to be conserved between humans and Chlamydomonas [13]. Furthermore, intraflagellar transport (IFT) proteins, which are involved in ciliary assembly and motility, are also conserved between distant eukaryotic organisms [14].

The high level of conservation suggests that motile cilia appeared before the divergence of the eukaryotes, indicating that the last eukaryotic common ancestor (LECA) was an organism with motile cilia with the 9 + 2 microtubule arrangement [15]. In addition, this ciliated organism required IFT proteins for its ciliary assembly, suggesting that the ciliary assembly process involves conserved components as well.

The conservation of ciliary function and structure highlights an evolutionary advantage for cilia-bearing eukaryotes [16] and allows the study of motile cilia and ciliogenesis across distantly related organisms. On an even broader scope, comparative studies should enable an improved understanding of ciliary dysfunction and the elucidation of its connections to ciliopathies, potentially leading to novel treatments and preventive approaches.

Multiciliated Cell Differentiation

Ciliary assembly in the context of a single motile or non-motile cilium is connected to progression of the cell cycle. The solitary cilium is assembled while the cell is arrested in G1 and is shed as the cell enters mitosis [17]. In multiciliated cells, however, dozens of cilia are assembled in a cell that is terminally differentiating and therefore will no longer divide. Despite this major difference, the main steps of early ciliogenesis remain very similar, as both processes involve: 1) formation of a ciliary vesicle in the proximity of the centriole, 2) migration of the centriole towards the plasma membrane, 3) docking of the ciliary vesicle with the centriole and fusion with the plasma membrane, and 4) elongation of the axoneme mediated by IFT [17, 18].

This further implies the expression of common molecular components during ciliary assembly in the context of a mono-ciliated or multiciliated cell. By extension, the regulatory mechanisms governing these expression patterns are also expected to be shared. While the ciliary components of both types of assembly overlap, the regulatory mechanisms in multiciliated cells appear to involve additional elements [7]. The assembly of a large number of cilia requires centriole amplification for the production of multiple basal bodies. This process is in itself tightly coordinated and requires its own regulatory components [19]. In vertebrates, multiciliated cell (MCC) differentiation is initiated by the repression of Notch signaling through the microRNA miR-449 [20]. Notch inhibition triggers the activation of a regulatory mechanism that involves two Geminin-related proteins, Gemc1 and Mcidas, the latter also called multicilin [7].

Mcidas not only triggers centriole amplification by regulating the expression of the specialized cyclin ccno, but also induces foxj1 expression, which in turn induces the expression of genes involved in ciliary motility [21]. Therefore, Mcidas and Gemc1 (upstream of mcidas) are considered to be sufficient to initiate MCC differentiation. Both Mcidas and Gemc1 are Geminin-related proteins [18]. Geminin is described to have a dual function, regulating cell cycle progression and cell differentiation [22]. Geminin has been found to be conserved from C. elegans to vertebrates, and current data suggest that mcidas and gemc1 originated from an ancestral bilaterian geminin gene [23].

In general, Mcidas activates a regulatory cascade that involves Rfx2/3, FoxJ1, and Myb transcription factors. The activation of these factors triggers cell cycle exit, basal body amplification, remodeling, and ciliogenesis [7, 18].

Transcriptional Regulation by FoxJ1 and Rfx

The transcription factors Rfx and FoxJ1 are known to be central to the transcriptional control of the ciliogenesis process. In a recent review, Choksi et al. [24] state that Rfx transcription factors have been found in the five major eukaryotic branches, suggesting that this factor was present in the last eukaryotic common ancestor (LECA). Moreover, Rfx appears to be highly conserved across the animal kingdom. There are eight Rfx transcription factors described in mammals (nine in fish, due to an extra genome duplication in this lineage) that are found in all vertebrates. This number is reduced in invertebrates, where three Rfx transcription factors have been described in Drosophila and only one in C. elegans (known as daf-19). Rfx proteins bind to target genes via the X-box, a highly conserved DNA motif found in many organisms.

Similarly, studies in mouse, Xenopus, and zebrafish suggest that the role of FoxJ1 in motile cilia assembly is also conserved across vertebrates. In those studies, knockdown of foxj1 caused loss of all motile cilia [25, 24].

Because both transcription factors are described to regulate the expression of ciliary genes, it is important to determine how these two transcriptional programs relate. To that end, a comparison of the potential target genes of both transcription factors could be useful. Previous studies by Chung et al. [26] reported a set of Rfx target genes in Xenopus. Likewise, targets of FoxJ1 have been identified in zebrafish by Choksi et al. [27]. Comparison of both sets shows an intersection of genes, suggesting that the two transcription factors act in cooperation and/or redundantly to regulate the expression of ciliary genes during ciliogenesis.
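The kind of target-set comparison described above can be illustrated with a minimal sketch. It assumes that both target lists have already been mapped to a shared ortholog naming scheme, which a cross-species comparison (Xenopus versus zebrafish) would require. The gene identifiers and the helper `compare_target_sets` are hypothetical placeholders, not the published target lists from [26] or [27].

```python
# Hypothetical, already ortholog-mapped identifiers (placeholders only;
# the real target lists are those reported in refs [26] and [27]).
rfx_targets = {"gene_a", "gene_b", "gene_c", "gene_d"}
foxj1_targets = {"gene_c", "gene_d", "gene_e"}


def compare_target_sets(first, second):
    """Summarize the overlap between two sets of target gene names."""
    shared = first & second   # genes targeted by both factors
    union = first | second    # genes targeted by at least one factor
    return {
        "shared": sorted(shared),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
        # Jaccard index: shared fraction of all targets (0.0 if both sets are empty)
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


summary = compare_target_sets(rfx_targets, foxj1_targets)
print(summary)
```

A non-empty `shared` list corresponds to the intersection of genes mentioned in the text. The Jaccard index is added here only as one simple way to quantify the overlap and is not a measure taken from the cited studies.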

Furthermore, studies in zebrafish, mouse, and cultured human cells suggest that FoxJ1 can induce the expression of Rfx genes during ciliogenesis [24]. Similarly, Rfx3 has been shown to regulate the expression of foxj1 in mouse cultured cells [28]. Besides this cross-regulation of both factors, other scenarios of cooperation are possible. One such example concerns proteins encoded by the Rfx target genes that could enhance transcriptional activation of foxj1. In addition, both transcription factors could act in parallel to regulate the same set of genes in a given cell type. Despite the abundance of studies characterizing these transcription factors, there are significant gaps in the understanding of how these two ciliary transcriptional modules cooperate.

In Platynereis dumerilii, both genes, rfx and foxj1, are found, and preliminary phylogenetic analysis suggests that several homologs of vertebrate genes can be identified (data not shown). However, further in silico guided analysis is required to establish their involvement and to determine the contributions of these gene families to the regulation of ciliogenesis in Platynereis dumerilii.

Platynereis dumerilii as an Animal Model

Here we propose the marine polychaete annelid Platynereis dumerilii, a non-conventional animal model, as an ideal organism to study motile cilia and ciliary assembly in the context of multiciliated cell (MCC) differentiation.

As an annelid, P. dumerilii is a lophotrochozoan/spiralian, considered to be a "slow-evolving" organism, as its gene and protein sequences appear to have accumulated fewer changes over time compared to other invertebrates [29]. Therefore, it has retained a more ancestral gene repertoire than others. In addition, the phylogenetic position of P. dumerilii enables evolutionary comparisons between bilaterians (ecdysozoans, lophotrochozoans, and deuterostomes). This in turn allows the prediction of possible ancestral genes and functions that were present in the common ancestor of protostomes and deuterostomes.

Besides the above advantages, P. dumerilii is also considered a suitable animal model for comparative studies that allow insights into general lophotrochozoan/spiralian features, including an invariant stereotypical spiral cleavage pattern and indirect development via the formation of a free-swimming ciliated larva, the trochophore.

A defining feature of the trochophore larva is a ciliated ring, the prototroch, located in the equatorial plane. This structure consists of two rows of multiciliated cells and starts to be visible at around 12 hours post fertilization (hpf) in P. dumerilii. Shortly after the formation of the prototroch, a ciliated apical tuft is formed as part of the apical organ, which also contains sensory cells, followed by the formation of a posterior ciliated structure, the telotroch [30]. A similar ciliated larva is characteristic of the development of several spiralian taxa, including mollusks, sipunculid worms, and most polychaete annelids [31].

Furthermore, and unlike other spiralians, Platynereis dumerilii embryos are transparent. In combination with their invariant stereotypical cleavage mode, this allows each cell in the developing embryo to be identified. More importantly, the fate of each cell during early development is known and can be identified based on the size and position of the cell within the embryo. This represents a major advantage for studying various dynamic processes during cleavage and development.

With regard to the development of the ciliated structures, P. dumerilii embryos at the 16-cell stage generate four embryonic cells called trochoblasts, the cell lineages that contribute entirely to the ciliated ring. These four trochoblasts divide twice and terminally differentiate into 16 of the 24 multiciliated cells that form the prototroch. Cilia start to be visible at around 12 hpf in the protrochophore, a pre-larva that rotates slowly inside its thick jelly coat. This rotation is known to be driven by cilia, indicating that fully functional cilia are present at this early stage of development. At a later stage (approx. 24 hpf), the fully hatched trochophore larva swims freely due to coordinated ciliary motion.

Given the phylogenetic position and developmental features of Platynereis dumerilii, including transparent embryos and ciliated structures, this marine annelid represents a highly suited model system to study the underlying mechanisms of ciliary assembly.

The first step towards this end consists of identifying conserved ciliary genes. Notably, the majority of ciliary genes have been identified either in vertebrates (deuterostomes) or in invertebrates from the ecdysozoan superphylum (protostomes), C. elegans and D. melanogaster, which lack motile cilia. Therefore, characterizing ciliary components in the P. dumerilii model is of critical importance for gaining insights into the functional diversity of ciliated cells in metazoans. To achieve this goal, we developed several computational tools to retrieve and analyze data on ciliogenesis genes from in-house generated Platynereis dumerilii transcriptome sources.

Despite all the advantages of P. dumerilii as an animal model, a major obstacle when working with this non-conventional animal system is the lack of an annotated genome.

P. dumerilii has a large genome (1 Gbp) in comparison to other animals in its taxonomic group [32]. The difficulty of assembling the P. dumerilii genome is related to an expansion of repetitive sequences. It is estimated that the repeat content of Platynereis genomes ranges between 10 and 30% [32]. While newer sequencing technologies should soon overcome this assembly limitation, to date no official genome for P. dumerilii has been published. This is important because it precludes the use of the majority of existing bioinformatics solutions, which often rely on the availability of an annotated reference genome.

In this study, we overcome the lack of an annotated genome using an in-house de novo assembled transcriptome comprising expression data from the first fourteen hours of development as well as from experimentally manipulated embryos that make more ciliated cells. Furthermore, in Chapter 2 of this thesis, we describe PdumBase, the Platynereis dumerilii transcriptome database and user interface we developed to facilitate the exploration, discovery, and use of P. dumerilii data by the broader research community.

Structure of this Dissertation

This dissertation is organized into five chapters. Chapter 1 provides the reader with a general introduction to the topic at hand. Chapters 2 to 4 correspond to manuscripts that have either been published in peer reviewed journals or are currently in submission. In Chapter 5, we provide a general conclusion that highlights the significance and future directions of the research presented in this work. A graphical overview of the complete data pipeline described in this thesis and corresponding to the individual chapters can be found in Figure 1.2.

More specifically, Chapter 2 focuses on data accessibility and features PdumBase, the first comprehensive transcriptome database for Platynereis dumerilii during early stages of development. PdumBase includes additional stages over the P. dumerilii life cycle and provides access to the expression data of 17,213 genes (31,806 transcripts) from our de novo assembled Platynereis transcriptome (Figure 1.2A). Expression data for each gene includes the normalized FPKM, the raw read counts, and information that can be leveraged for statistical analyses of differential gene expression and the construction of genome-wide co-expression networks. In addition, PdumBase includes early stage transcriptome expression data from five further species as a valuable resource for comparative analysis of early development in different organisms. PdumBase represents the first online resource for the early developmental transcriptome of Platynereis dumerilii. It serves as a research platform for discovery and exploration of gene expression during early stages and throughout the P. dumerilii life cycle, and enables comparison to other model organisms.

Figure 1.2: Graphical overview of the computational analysis pipeline constituting this work. A Depiction of the assembly process of the transcriptomic data obtained by RNA-seq during the early stages of normal development in P. dumerilii. B Description of the perturbation assay aimed at the identification of ciliogenesis-related genes using differential analysis between expression data from hyperciliated and normal cells. C Functional workflow of DendroShiny.

Chapter 3 reports the identification of ciliogenesis candidate genes in P. dumerilii employing our computational multi-layer approach, which consists of the following major units: 1) Identifying orthologs of genes from an extensive survey of known ciliary genes described in other species, yielding a curated annotation of ciliary genes in the P. dumerilii transcriptome. 2) Integrating the differential expression analysis between wild-type and hyperciliated P. dumerilii larvae obtained by ectopically inducing the Wnt/β-catenin pathway in early development. This enabled the identification and validation of known ciliary genes that are also upregulated in the hyperciliated larvae. 3) Co-expression analysis using normal development expression data extracted from PdumBase to implicate potential ciliogenesis candidate genes among the non-annotated transcripts, based on expression profile similarities (Figure 1.2B). The results summarized in this chapter highlight possible differences between the components necessary to initiate ciliogenesis in multiciliated cells (MCCs) during development in P. dumerilii and those described in vertebrate MCC differentiation. This work hence represents a first step towards the generation of a comprehensive set of ciliogenesis candidate genes that bridges the gap in our current understanding of phenotypic and genomic relationships of ciliogenesis between Platynereis dumerilii and other species.

Chapter 4 focuses on data visualization and introduces DendroShiny, our in silico tool for the analysis of genome-wide expression data. DendroShiny allows users to interactively examine gene expression clusters as the clustering parameters are adjusted. DendroShiny (1) clusters the genes based on the expression profiles of a set of well characterized genes and uses the features of that set to classify non-annotated genes, (2) displays an interactive gene tree representing the similarity among the gene expression patterns, (3) allows the user to define the final number of clusters and visualizes the resulting gene sets, (4) interactively displays the expression profiles and metadata of genes in each cluster, and (5) allows the user to browse the resulting clusters by various attributes such as gene name. The use of DendroShiny is exemplified using a dataset consisting of P. dumerilii expression data. In this context, DendroShiny enabled the identification of candidate ciliary genes despite the lack of an annotated genome (Figure 1.2C). Overall, DendroShiny allows the user to intuitively explore clustered gene expression data and has the potential to facilitate the downstream analysis of transcriptomic data sets.
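The effect of letting a user choose the final number of clusters can be illustrated with a minimal Python sketch. This is not DendroShiny's implementation: the function names, the Euclidean distance, the complete-linkage rule, and the toy expression profiles are all assumptions made purely for illustration.

```python
def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def complete_linkage_clusters(profiles, k):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    (complete linkage) until only k clusters remain.

    Returns a list of clusters, each a sorted list of row indices.
    """
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance of the farthest pair of members.
                d = max(euclidean(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy profiles (hypothetical values): two tight pairs and one outlier.
toy_profiles = [[0, 0], [0, 1], [10, 10], [10, 11], [50, 50]]
for k in (3, 2):
    print(k, complete_linkage_clusters(toy_profiles, k))
```

Re-running the function with a different `k` mimics adjusting the number of clusters and inspecting how gene sets merge or split.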

Finally, the concluding chapter provides an overview highlighting the significance of the findings and work reported in this dissertation. Furthermore, it outlines key future directions based on the research conducted within this dissertation.

Taken together, the work presented here lays the foundation for further studies aiming at the elucidation of ciliary assembly in P. dumerilii. In addition, it presents a sophisticated computational approach for the integration of transcriptomic data in the absence of an annotated genome that can be incorporated in similar studies involving other non-conventional organisms.

Bibliography

[1] Zita Carvalho-Santos, Juliette Azimzadeh, José B. Pereira-Leal, and Mónica Bettencourt-Dias. Tracing the origins of centrioles, cilia, and flagella. Journal of Cell Biology, 2011.

[2] Eric R. Brooks and John B. Wallingford. Multiciliated Cells. Current Biology, 2014.

[3] Hiroaki Ishikawa and Wallace F. Marshall. Ciliogenesis: Building the cell’s antenna. Nature Reviews Molecular Cell Biology, 2011.

[4] Hannah M. Mitchison and Enza Maria Valente. Motile and non-motile cilia in human pathology: from function to phenotypes. Journal of Pathology, 2017.

[5] Wikimedia Commons. File:eukaryotic cilium diagram en.svg. Wikimedia Commons, the free media repository, 2017. [Online; accessed 6-June-2019].

[6] Nathalie Spassky and Alice Meunier. The development and functions of multiciliated epithelia. Nature Reviews Molecular Cell Biology, 2017.

[7] Eric R. Brooks and John B. Wallingford. Multiciliated Cells. Current Biology, 2014.

[8] R. A. Bloodgood. Sensory reception is an attribute of both primary cilia and motile cilia. Journal of Cell Science, 2010.

[9] Sidney L. Tamm. Ciliary motion in paramecium: A scanning electron microscope study. Journal of Cell Biology, 1972.

[10] Fu-Shiang Chia, John Buckland-Nicks, and Craig M. Young. Locomotion of marine invertebrate larvae: a review. Canadian Journal of Zoology, 2009.

[11] Aoife M. Waters and Philip L. Beales. Ciliopathies: An expanding spectrum. Pediatric Nephrology, 2011.

[12] Jeremy F. Reiter and Michel R. Leroux. Genes and molecular pathways underpinning ciliopathies. Nature Reviews Molecular Cell Biology, 2017.

[13] Gregory J. Pazour, Nathan Agrin, John Leszyk, and George B. Witman. Proteomic analysis of a eukaryotic cilium. Journal of Cell Biology, 170(1):103–113, 2005.

[14] David R. Mitchell. The evolution of eukaryotic cilia and flagella as motile and sensory organelles. Advances in Experimental Medicine and Biology, 2007.

[15] Tomer Avidor-Reiss, Andreia M. Maer, Edmund Koundakjian, Andrey Polyanovsky, Thomas Keil, Shankar Subramaniam, and Charles S. Zuker. Decoding cilia function: Defining specialized genes required for compartmentalized cilia biogenesis. Cell, 117(4):527–539, 2004.

[16] Gáspár Jékely and Detlev Arendt. Evolution of intraflagellar transport from coated vesicles and autogenous origin of the eukaryotic cilium. BioEssays, 2006.

[17] Lotte B. Pedersen, Iben R. Veland, Jacob M. Schrøder, and Søren T. Christensen. Assembly of primary cilia. Developmental Dynamics, 2008.

[18] Alice Meunier and Juliette Azimzadeh. Multiciliated cells in animals. Cold Spring Harbor Perspectives in Biology, 2016.

[19] Lina Ma, Ian Quigley, Heymut Omran, and Chris Kintner. Multicilin drives centriole biogenesis via E2f proteins. Genes and Development, 2014.

[20] Brice Marcet, Benoît Chevalier, Guillaume Luxardi, Christelle Coraux, Laure-Emmanuelle Zaragosi, Marie Cibois, Karine Robbe-Sermesant, Thomas Jolly, Bruno Cardinaud, Chimène Moreilhon, Lisa Giovannini-Chami, Béatrice Nawrocki-Raby, Philippe Birembaut, Rainer Waldmann, Laurent Kodjabachian, and Pascal Barbry. Control of vertebrate multiciliogenesis by miR-449 through direct repression of the Delta/Notch pathway. Nature Cell Biology, 2011.

[21] Alexandre Benmerah, Bénédicte Durand, Rachel H. Giles, Tess Harris, Linda Kohl, Christine Laclef, Sigolène M. Meilhac, Hannah M. Mitchison, Lotte B. Pedersen, Ronald Roepman, Peter Swoboda, Marius Ueffing, and Philippe Bastin. The more we know, the more we have to discover: An exciting future for understanding cilia and ciliopathies. Cilia, 2015.

[22] Marina Arbi, Dafni Eleftheria Pefani, Stavros Taraviras, and Zoi Lygerou. Controlling centriole numbers: Geminin family members as master regulators of centriole amplification and multiciliogenesis. Chromosoma, 2018.

[23] Dafni Eleutheria Pefani, Maria Dimaki, Magda Spella, Nickolas Karantzelis, Eirini Mitsiki, Christina Kyrousi, Ioanna Eleni Symeonidou, Anastassis Perrakis, Stavros Taraviras, and Zoi Lygerou. Idas, a novel phylogenetically conserved geminin-related protein, binds to geminin and is required for cell cycle progression. Journal of Biological Chemistry, 2011.

[24] Semil P. Choksi, Gilbert Lauter, Peter Swoboda, and Sudipto Roy. Switching on cilia: transcriptional networks regulating ciliogenesis. Development (Cambridge, England), 141(7):1427–41, 2014.

[25] S. P. Choksi, D. Babu, D. Lau, X. Yu, and S. Roy. Systematic discovery of novel ciliary genes through functional genomics in the zebrafish. Development, 141(17):3410–3419, 2014.

[26] Mei I. Chung, Taejoon Kwon, Fan Tu, Eric R. Brooks, Rakhi Gupta, Matthew Meyer, Julie C. Baker, Edward M. Marcotte, and John B. Wallingford. Coordinated genomic control of ciliogenesis and cell movement by RFX2. eLife, 3:e01439, 2014.

[27] Semil P. Choksi, Deepak Babu, Doreen Lau, Xianwen Yu, and Sudipto Roy. Systematic discovery of novel ciliary genes through functional genomics in the zebrafish. Development (Cambridge, England), 141(17):3410–9, 2014.

[28] Loubna El Zein, Aouatef Ait-Lounis, Laurette Morlé, Joëlle Thomas, Brigitte Chhin, Nathalie Spassky, Walter Reith, and Bénédicte Durand. RFX3 governs growth and beating efficiency of motile cilia in mouse and controls the expression of genes involved in human ciliopathies. Journal of Cell Science, 122(Pt 17):3180–9, 2009.

[29] Antje H. L. Fischer, Thorsten Henrich, and Detlev Arendt. The normal development of Platynereis dumerilii (Nereididae, Annelida). Frontiers in Zoology, 7(1):31, 2010.

[30] Rudolf A. Raff. Origins of the other metazoan body plans: The evolution of larval forms. Philosophical Transactions of the Royal Society B: Biological Sciences, 2008.

[31] Adriaan Dorresteijn. Cell lineage and gene expression in the development of polychaetes. Hydrobiologia, 2005.

[32] Juliane Zantke, Stephanie Bannister, Vinoth Babu Veedin Rajan, Florian Raible, and Kristin Tessmar-Raible. Genetic and genomic tools for the marine annelid Platynereis dumerilii. Genetics, 197(1):19–31, 2014.

PDUMBASE: A TRANSCRIPTOME DATABASE AND RESEARCH TOOL FOR PLATYNEREIS DUMERILII AND EARLY DEVELOPMENT OF OTHER METAZOANS

Modified from a paper published in BMC Genomics 2018 19:618

https://doi.org/10.1186/s12864-018-4987-0

HSIEN-CHAO CHOU1,3#, NATALIA ACEVEDO-LUNA1,2#, JULIE A. KUHLMAN1, AND STEPHAN Q. SCHNEIDER1*

1 Department of Genetics, Developmental and Cell Biology, Iowa State University, Ames, IA 50011 USA.

2 Statistics Department, Iowa State University, Ames, IA 50011 USA.

3 Current address: Center for Research, National Institutes of Health, MD 20894 USA

# These authors contributed equally to this manuscript

* Corresponding author

Abstract

Background The marine polychaete annelid Platynereis dumerilii has recently emerged as a prominent organism for the study of development, evolution, stem cells, regeneration, marine ecology, chronobiology and neurobiology within metazoans. Its phylogenetic position within the spiralian/lophotrochozoan clade, the comparatively high conservation of ancestral features in the Platynereis genome, and experimental access to any stage within its life cycle make Platynereis an important model for elucidating the complex regulatory and functional molecular mechanisms governing early development, later organogenesis, and various features of its larval and adult life. High resolution RNA-seq gene expression data obtained from specific developmental stages can be used to dissect early developmental mechanisms. However, the potential for discovery of these mechanisms relies on tools to search, retrieve, and compare genome-wide information within Platynereis and across other metazoan taxa.

Results To facilitate exploration and discovery by the broader scientific community, we have developed a web-based, searchable online research tool, PdumBase, featuring the first comprehensive transcriptome database for Platynereis dumerilii during early stages of development (2 h to 14 h). Our database also includes additional stages over the P. dumerilii life cycle and provides access to the expression data of 17,213 genes (31,806 transcripts) along with annotation information sourced from Swiss-Prot, Gene Ontology, KEGG pathways, Pfam domains, TmHMM, SignalP, and EggNOG orthology. Expression data for each gene includes the stage, the normalized FPKM, the raw read counts, and information that can be leveraged for statistical analyses of differential gene expression and the construction of genome-wide co-expression networks. In addition, PdumBase offers early stage transcriptome expression data from five further species as a valuable resource for investigators interested in comparing early development in different organisms. To understand conservation of Platynereis gene models and to validate gene annotation, most Platynereis gene models include a comprehensive phylogenetic analysis across 18 species representing diverse metazoan taxa.

Conclusion PdumBase represents the first online resource for the early developmental transcriptome of Platynereis dumerilii. It serves as a research platform for discovery and exploration of gene expression during early stages and throughout the Platynereis life cycle, and enables comparison to other model organisms. PdumBase is freely available at http://pdumbase.gdcb.iastate.edu

Keywords Platynereis dumerilii, Early development, Spiralian, Life cycle, Expression profile, Database, Comparative genomics, Evo-Devo

Background

The annelid Platynereis dumerilii is an ideal organism for developmental and comparative studies due to its phylogenetic position as a spiralian/lophotrochozoan, a species-rich but less well known third branch of bilaterally symmetrical organisms, whose members are instrumental for inferring ancestral bilaterian or urbilaterian features [33, 34, 35, 36, 37]. However, in comparison to the other two major bilaterian branches, the deuterostomes, which include vertebrates and humans, and the ecdysozoans, which include nematodes and arthropods like C. elegans and the fruit fly Drosophila, respectively, substantially fewer genetic and molecular studies have been performed in spiralian species. As a spiralian, Platynereis dumerilii exhibits intriguing developmental features that include a stereotypic cleavage pattern and invariant cell lineages with predictable cell fates [38, 39, 40]. Having invariant cell lineages makes it possible to link gene expression in distinct embryonic cells to later cell progeny and organs in larval and adult stages. This knowledge makes it possible to form predictions and test hypotheses about how molecular processes regulate the cellular origin and composition of organs and body parts [41, 42]. To date, this powerful property has only been exploited in a limited number of studies [43, 44] but, as better descriptions of later cell lineages and access to stage-specific and cell-specific gene expression data become available, its use is expected to increase substantially [45, 46, 47, 48]. Additionally, the morphological and genomic attributes exhibited by Platynereis have been useful for inferring ancestral gene structures and ancient cell types representative of ancestral bilaterian species [49, 50]. These inferences are based on the findings that Platynereis shares many common features with vertebrates, including similar gene expression profiles during the development of the brain, the central nervous system, the eye, appendages, and muscles. These features have been instrumental in developing and testing hypotheses about the urbilaterian body plan and the origin of complex organ systems during animal evolution [35, 36, 51, 52, 53, 54]. For these reasons, over the last decade, Platynereis has emerged as a powerful spiralian model for comparative genomic analyses, evolutionary developmental (evo-devo) studies, stem cells, regeneration, marine ecology, chronobiology and neurobiology [55, 56, 57, 58, 59]. Platynereis as a laboratory organism boasts an ever-increasing experimental toolkit, expanding the usefulness of this species in identifying and dissecting gene function during early and late development, during organogenesis, in various cell types, and in neuronal circuits that dictate the circadian and lunar rhythm-controlled swimming behaviors [59, 60, 61, 62, 63, 64, 65, 66, 67].

The life cycle of Platynereis dumerilii, on average 3 months long and including sexual maturation and mating synchronized by a monthly lunar cycle, is well established under laboratory conditions and contributes to the attractiveness of this annelid spiralian species as a powerful experimental organism [68]. Mating leads to ‘instant’ external fertilization of thousands of eggs, allowing experimental access to thousands of highly synchronously developing embryonic, larval, juvenile, and adult stages that can be used for large scale biochemical studies and a variety of omics studies [59].

Early development in Platynereis is characterized by a spiral cleavage mode, a series of invariant, stereotypic asymmetric cell divisions generating a spiral arrangement of embryonic cells characterized by vastly different cell sizes and distinct cell fates [38, 39, 40, 41]. This early embryonic spiral cleavage phase transitions after six rounds of cell divisions [about 7 h post fertilization (hpf) at 18 °C] to bilaterally symmetrical, oriented asymmetric cell divisions. The time of development depends strongly on temperature; features described at each stage refer to embryos developing at a controlled temperature of 18 °C. By 14 hpf, the continued rapid cell divisions have given rise to a distinctly organized embryo of 330 cells, a pre-trochophore larval stage that emerges from the jelly coat that surrounds the egg and begins to rotate via a ring of differentiated multi-ciliated cells, the prototroch, and a developing apical organ, a sensory organ at the animal pole [69].

Within the next few days of development, Platynereis transitions through several free-swimming larval stages (a primitive trochophore at 24 hpf, a more elaborate metatrochophore at 48 hpf, and a nectochaete larva at 72 hpf) [69]. These stages are morphologically characterized by the addition of distinct ciliary structures, an elaboration of the head and trunk region including the establishment of a complex nervous system, larval and adult eyes, and the formation of trunk segments that bear the first pairs of appendages. After 1 week, the now juvenile three-segmented young worm switches from a free-swimming to a benthic lifestyle. A growth zone at the posterior end continues to add segments and serial appendages throughout juvenile and adult stages, while the worm grows to its adult size of 2 in. in length and reaches sexual maturation within 3 months.

The above attributes make Platynereis a favorable subject for various high-throughput sequencing (HTS) techniques, including but not limited to gene expression profiling through the quantitative analysis of transcriptomes (RNA-seq). RNA-seq provides an unbiased approach to determine the transcriptional inventory for a process and, by capturing the dynamic temporal expression profiles through the identification of differentially expressed genes between developmental states, enables investigators to gain system-level insights into organismal development [42].

The massive amount of data produced by modern HTS experiments is both a challenge and an opportunity for countless biological discoveries. However, it can only be effectively utilized as a scientific resource in conjunction with dedicated computational pipelines to preprocess, analyze, store, and visualize this information. Well-established algorithms for preprocessing of raw sequencing reads in terms of quality control [70, 71] and de-multiplexing [72, 73], as well as robust approaches for transcriptome assembly [74, 75, 76], differential expression analysis [77, 78, 79] and functional enrichment studies, have been developed. However, because the generated HTS datasets capture a genome-wide transcriptional response, the richness and complexity of the data often exceed the focused interest of the target study by orders of magnitude. Consequently, much of the data remains unexplored or is not easily accessible to the scientific community. Thus, computational and bioinformatics strategies have to be developed that enable broad, fast, and long-term accessibility of the data, as well as the ability to intuitively search, extract, and graphically visualize molecular processes, pathways, and structural composition.

Ideally, such a research tool opens the discovery space and should be capable of efficiently retrieving the information of interest while scaling well with large data sizes. This in turn requires the processed data to be stored in a standardized, queryable, and platform-independent manner, such that an application-specific subset of the data can efficiently be served to an algorithmic solution built on top of such a system. Notably, centralized information systems, such as relational databases, fulfill all of the above requirements by providing storage solutions combined with a powerful query language such as Structured Query Language (SQL) [80]. These systems can be further combined with modern web technologies to form a graphical user interface capable of representing vast amounts of data in a concise, interactive and platform-agnostic manner. Indeed, web-based database solutions have previously been leveraged to represent a large array of biological data over a wide range of organisms [81, 82, 83], as well as metadata such as functional annotations [84] and pathway information [85].
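To make the idea of serving an application-specific subset of data from a relational store concrete, the following Python sketch uses the standard-library sqlite3 module. The two-table schema, the table and column names, the helper `profile_by_keyword`, and the inserted values are hypothetical and are not taken from the PdumBase implementation; they only illustrate querying expression data by annotation rather than by a known transcript identifier.

```python
import sqlite3

# Hypothetical schema for illustration only (not the PdumBase schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transcript (
    transcript_id TEXT PRIMARY KEY,
    gene_id       TEXT NOT NULL,
    annotation    TEXT
);
CREATE TABLE expression (
    transcript_id TEXT NOT NULL REFERENCES transcript(transcript_id),
    stage_hpf     INTEGER NOT NULL,
    fpkm          REAL NOT NULL
);
""")

# Made-up toy rows.
conn.executemany(
    "INSERT INTO transcript VALUES (?, ?, ?)",
    [("t1", "g1", "tubulin beta chain"), ("t2", "g2", "histone H3")],
)
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [("t1", 2, 0.4), ("t1", 12, 35.0), ("t2", 2, 80.0), ("t2", 12, 60.0)],
)


def profile_by_keyword(connection, keyword):
    """Return (transcript_id, stage_hpf, fpkm) rows for every transcript
    whose annotation text contains the given keyword."""
    query = """
        SELECT t.transcript_id, e.stage_hpf, e.fpkm
        FROM transcript AS t
        JOIN expression AS e ON e.transcript_id = t.transcript_id
        WHERE t.annotation LIKE ?
        ORDER BY t.transcript_id, e.stage_hpf
    """
    # Parameter binding keeps the user-supplied keyword out of the SQL text.
    return connection.execute(query, (f"%{keyword}%",)).fetchall()


rows = profile_by_keyword(conn, "tubulin")
print(rows)
```

A web front end would issue the same kind of parameterized query on behalf of the user and render the returned rows as a table or plot.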

For Platynereis, several transcriptomic datasets have recently been reported [47, 66, 86], covering various developmental, larval, juvenile, and adult stages during its 3 month long life cycle. However, public accessibility and ease of data analysis are currently limited, and a priori knowledge of a transcript or gene is required to retrieve sequence and expression data. This prior knowledge is often not available, motivating the development of databases that enable investigators to query transcriptome and expression data based on functional annotation as well as other features. In addition, only a limited number of attempts have been made to link data to other important developmental model species in order to facilitate comparisons of developmental and cell biological modules, a prerequisite for building testable hypotheses in the field of evolutionary and developmental biology.

To close this gap, we developed PdumBase, a comprehensive, stand-alone, and user-friendly web-based resource, offering researchers a sophisticated platform to study molecular function, expression patterns, and relationships between genes at early embryonic stages in Platynereis dumerilii. PdumBase comprises transcriptomic data from the first 14 h of development of Platynereis, capturing the transition from a single fertilized egg to a hatched early protrochophore stage of 330 cells. The data in PdumBase represent the first Platynereis transcriptome obtained from seven time points of early embryonic development, recently reported by our group [87]. PdumBase also displays data from a novel analysis that constructs genome-wide co-expression networks to identify hub genes and genes sharing similar expression patterns in early embryonic stages. Furthermore, PdumBase incorporates transcriptomic data from Conzelmann et al. [66] that include selected time points of later developmental stages, including larval, juvenile, and adult stages covering the entire life cycle, as well as sexually mature male and female worms. To facilitate comparative and evo-devo studies, PdumBase includes searchable expression data from distinct developmental stages of six other species that were available at the time of database construction: Strongylocentrotus purpuratus [88], Xenopus tropicalis [89], Danio rerio [90], Ascaris suum [91], Nematostella vectensis [92], and human [93]. The selection of species for comparative analysis of early development includes a deuterostome invertebrate and vertebrates, an ecdysozoan, a cnidarian, and transcriptional ‘germ layer’ states of early differentiated human stem cells, respectively, an arbitrary yet informative choice for this ‘pilot’ feature of PdumBase that will be expanded in future releases. Finally, to understand how the Platynereis gene models are related to genes in other species, and to provide additional validation for the previous gene annotations, a comprehensive phylogenetic analysis that includes 18 selected species, representing different metazoan taxa of various evolutionary distances and including key spiralian species like the annelid Capitella teleta and the mollusk Crassostrea gigas, is available for each Platynereis gene model. Thus, this research tool enables exploration, discovery and hypothesis building by the community at large by providing intuitive and interactive access to genome-wide developmental expression data of early development in Platynereis.

Construction and Content

PdumBase is a user-friendly database that provides a platform to easily access and compare expression profiles of Platynereis transcripts during early stages of development (2 to 14 hpf), with the option of displaying later stages of development (24 hpf to 3 months, and sexually mature male and female). For each transcript, PdumBase contains detailed annotation, a preliminary phylogenetic tree based on orthologous groups, and co-expression networks. Co-expression data can be obtained based on differential gene expression between consecutive time points throughout early development. In addition, the database integrates transcriptomic information of six additional species, giving the option of examining expression data of orthologous genes. Figure 2.1 summarizes the content and the organization of data available in PdumBase.

Transcriptome Assembly and Annotation Pipeline

To create the early reference transcriptome for Platynereis dumerilii, we combined the mRNA data of all sequenced libraries (∼785 million paired-end reads) at 2, 4, 6, 8, 10, 12, and 14 hpf and performed de novo transcriptome assembly using Trinity [94], resulting in a total of 273,087 non-redundant contigs (N50 size: 1466 bp). Detailed methods, characterization and validation of the de novo transcriptome assembly, and the annotation pipeline for gene models have been reported previously [87].
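For readers unfamiliar with the N50 statistic quoted above, the following Python sketch shows one common way to compute it from a list of contig lengths: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions; assemblers and reporting scripts may differ in details such as tie handling.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths): total = 32, half = 16,
# and the two longest contigs (8 + 8) already reach 16.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```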

To annotate the assembly, we first identified 28,580 genes (51,260 transcripts) with a predicted open reading frame (ORF) of more than 100 amino acids. These potential protein coding genes were subsequently aligned to the Swiss-Prot database [95] and retained only if the E-value of the alignment was 10e-10 or lower. Using this procedure, a total of 17,213 genes (31,806 transcripts) were matched [87]. Pfam domains, signal peptide cleavage sites and transmembrane helices were also identified and integrated into the database. These were predicted using HMMER [96], SignalP [97] and tmHMM [98], respectively, with default parameters. Finally, Gene Ontology (GO) and eggNOG information were obtained from the Swiss-Prot database [95], whereas associated KEGG pathways were identified by a BLASTP search with an E-value cutoff of 10e−10.
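The two retention criteria described above (ORF length and alignment E-value) amount to a simple filter, sketched below in Python. The `Candidate` record, its field names, and the toy values are hypothetical; the cutoff is written as `10e-10` exactly as in the text, which in floating-point notation equals 1e-9.

```python
from dataclasses import dataclass
from typing import Optional

MIN_ORF_AA = 100        # ORF must be longer than this many amino acids
EVALUE_CUTOFF = 10e-10  # as written in the text; numerically equal to 1e-9


@dataclass
class Candidate:
    """Hypothetical record for one predicted protein-coding transcript."""
    transcript_id: str
    orf_length_aa: int
    best_hit_evalue: Optional[float]  # None when no Swiss-Prot hit was found


def is_retained(candidate):
    """Apply both criteria: ORF > 100 aa and best-hit E-value <= cutoff."""
    return (candidate.orf_length_aa > MIN_ORF_AA
            and candidate.best_hit_evalue is not None
            and candidate.best_hit_evalue <= EVALUE_CUTOFF)


# Made-up toy candidates.
candidates = [
    Candidate("t1", 250, 1e-40),  # long ORF, strong hit: retained
    Candidate("t2", 250, 1e-3),   # long ORF, weak hit: dropped
    Candidate("t3", 80, 1e-40),   # ORF too short: dropped
    Candidate("t4", 300, None),   # no hit at all: dropped
]
retained_ids = [c.transcript_id for c in candidates if is_retained(c)]
print(retained_ids)
```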

Figure 2.1: Schematic illustration of the PdumBase sitemap. The content for each tab of the PdumBase home interface is summarized in “Tab Boxes” (blue) at the top of the figure. “Search Results” (pink) and “Expanded Results” (dark green) outline data retrieved from different search options. The result interface includes the “Show other info” option; a summary of its content is shown in the “Other info selection” box (light green). Expandable results options for Orthologous groups and Co-expression are also shown (light blue). Optional output summarizes downloadable content as well as file format options (light yellow).

Expression Analysis

To build a comprehensive transcriptional profile of early Platynereis development, we extracted mRNA from Platynereis dumerilii at 2, 4, 6, 8, 10, 12, and 14 h post fertilization (hpf), each with two biological replicates, as described [87]. The resulting reads from these samples were subsequently mapped to the assembled transcripts using Bowtie [99]. The read count for each transcript was estimated by RSEM [100]. The trimmed mean of M-values (TMM) [74] normalized FPKM was calculated using the R package “edgeR” [77]. A transcript was considered “expressed” if its FPKM was larger than 1. This represents an empirically chosen inclusionary cutoff based on our current methods of validation: (1) whole mount in situ hybridization, which is able to detect transcripts above a level of 5 FPKM, and (2) successful amplification of transcripts by PCR from stages with an FPKM of 1 or higher. Whole mount in situ hybridization was chosen as a method of validation because it enables us to determine the spatial expression of transcripts and is sensitive enough to visualize the onset of gene transcription in a single cell, and on two genomic loci within each nucleus [87].
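As a worked illustration of the expression cutoff, the sketch below computes FPKM from a fragment count, a transcript length, and a library size, and then applies the "larger than 1" rule. It does not reproduce TMM normalization: the `library_size` argument is assumed to be an effective library size that has already been scaled, and the function and stage names are hypothetical.

```python
from typing import Dict


def fpkm(read_count: float, transcript_length_bp: int, library_size: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    library_size is assumed to be the (already normalized) number of
    mapped fragments in the sample.
    """
    if transcript_length_bp <= 0 or library_size <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * library_size)


def is_expressed(fpkm_by_stage: Dict[str, float], cutoff: float = 1.0) -> bool:
    """True if the transcript exceeds the cutoff in at least one stage."""
    return any(value > cutoff for value in fpkm_by_stage.values())
```

For example, 100 fragments on a 1,000 bp transcript in a library of one million mapped fragments corresponds to an FPKM of 100.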

Thus, for expression analysis, 20,977 transcripts were included that fulfilled the criterion of being expressed in at least one stage at an FPKM larger than 1. These transcripts were clustered based on their expression pattern. We obtained the expression dendrogram using the hierarchical clustering algorithm (R function hclust). Since the determination of the number of clusters is context-dependent, we first used an automatic tree-cutting algorithm to classify all genes into 32 groups (R function cutree, k = 32). Pairs of groups were then merged moving up the hierarchy. Each merge was validated by visual inspection of the average expression patterns of the two groups across all stages. We repeated this step until no qualified merges were found. This process led to 15 clusters with distinct expression patterns. For successive time stages, the differentially expressed (DE) genes were determined using edgeR; genes were considered differentially expressed if the adjusted p-value was smaller than or equal to 0.001. A co-expression network was also constructed using WGCNA [101], an R package for weighted correlation network analysis. The co-expression network can provide not only the correlation between two genes, but also their topological similarity. This similarity captures how closely related two genes are by examining whether they share similar connectivity with other “third party” genes (Supplementary Figure A.1.1). In addition to early stage expression data (2 to 14 hpf), we also incorporated 310 million reads of later stage RNA sequencing datasets ranging from 24 h to 3 months, and sexually mature male and female, provided by Conzelmann et al. [66]; these were first mapped to our gene models and are displayed within the database. The inclusion of this information provides a more comprehensive time series of gene expression profiles at various times throughout the Platynereis dumerilii life cycle.
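The automatic part of the clustering step (build a hierarchy, then cut it into k groups) can be sketched as follows. This is not the R code used for PdumBase: the text does not state which distance or linkage was passed to hclust, so the Pearson correlation distance and average linkage used here are assumptions, and the subsequent merging by visual inspection is not automated. For average linkage, stopping the agglomeration at k clusters gives the same partition as cutting the full dendrogram into k groups.

```python
import math
from typing import List, Sequence


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - Pearson correlation; flat profiles are treated as uncorrelated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0
    return 1.0 - cov / (sx * sy)


def cluster_profiles(profiles: List[Sequence[float]], k: int) -> List[List[int]]:
    """Agglomerative average-linkage clustering, stopped at k clusters.

    Returns lists of row indices into `profiles`. The naive pair search
    is fine for a small illustration but too slow for ~21,000 transcripts.
    """
    clusters = [[i] for i in range(len(profiles))]

    def linkage(a: List[int], b: List[int]) -> float:
        total = sum(pearson_distance(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters
```

In this sketch, rising and falling profiles end up in separate groups regardless of their absolute expression level, because the correlation distance compares the shape of each profile and ignores its magnitude.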

Comparative Transcriptome

To enable comparative studies of gene expression between model organisms during early embryogenesis, we incorporated the expression data from six additional species: Danio rerio [90], Xenopus tropicalis [89], Homo sapiens [93], Strongylocentrotus purpuratus [88], Nematostella vectensis [92] and Ascaris suum [91]. Orthologous groups were identified using OrthoMCL [102] with a reciprocal BLASTP search at the protein level.

To further assist comparative analyses and to provide annotation validation, we constructed a large-scale evolutionary comparison between our gene models and those of 17 other species in phylogenetically informative positions, including Capitella teleta [103], Helobdella robusta [103], Lottia gigantea [103], Crassostrea gigas [104], Daphnia pulex [103], Tribolium castaneum [103], Drosophila melanogaster [82], Strongylocentrotus purpuratus [83], Saccoglossus kowalevskii [103], Branchiostoma floridae [103], Danio rerio [105], Xenopus tropicalis [103], Homo sapiens [105], Nematostella vectensis [103], Amphimedon queenslandica [103], Trichoplax adhaerens [103], and Monosiga brevicollis [103]. A custom OrthoMCL pipeline was used to identify a total of 40,206 homologous groups, of which 32,482 groups contain at least two species. A pairwise BLASTP search with an E-value cutoff of 1e-05 and 50% identity was performed. An orthologous relationship was established only if genes were the reciprocal best hits for any two species. For each Platynereis dumerilii gene having orthologous genes, a phylogenetic tree was created using RAxML [106]. Both maximum likelihood and parsimony trees can be accessed in PdumBase. The multiple alignment for each orthologous group was constructed using CLUSTALW [107].
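The reciprocal best hit criterion can be made concrete with a short sketch. It assumes BLASTP results are available as (query, subject, E-value, percent identity) tuples and that the "best" hit is the one with the lowest E-value; ranking by bit score is an equally common choice, and the text does not say which was used. The function names and the gene identifiers in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float, float]  # (query, subject, evalue, percent_identity)


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5,
              min_identity: float = 50.0) -> Dict[str, str]:
    """Best subject per query among hits that pass both thresholds."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, identity in hits:
        if evalue > max_evalue or identity < min_identity:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit],
                         **thresholds: float) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, **thresholds)
    reverse = best_hits(b_vs_a, **thresholds)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

With this rule, a Platynereis gene whose best match in a second species has a better match elsewhere in Platynereis is not paired, which is what keeps the resulting relationships one-to-one between any two species.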

Utility and Discussion

PdumBase provides a comprehensive, versatile online tool to investigate stage-specific transcriptional inputs during embryogenesis and throughout the life cycle of the annelid Platynereis dumerilii and other selected species. Users can search for genes of interest and examine their functional annotation, expression profiles, and co-expression networks, and use the genes for comparative analyses. In addition to the search module, PdumBase also offers the option of downloading the queried data and searching the transcriptome using BLAST. Here we describe the interface and functions of these modules.

Annotation Search

To make PdumBase useful for a broad range of research questions, users need to be able to mine data using a variety of search parameters. This versatile access to genes of interest is provided through a search engine that allows users to query specific keywords covering a large array of functional annotation categories. These include gene ID, protein name, gene ontology, Pfam, KEGG pathways, eggNOG, SignalP and TMHMM information. The search results are displayed as an expression profile of the matched genes together with their functional annotation information. By default, the results page displays the Swiss-Prot annotation based on a protein-level BLAST search, and the expression data for early stages (2 to 14 hpf). Users can expand this table to include expression profiles of later stages (by checking the “Show later stages” box), detailed annotation (by selecting the “Show detailed annotation” box) and other advanced information by selecting “Show plots” and “Show other info” in the interface (Figure 2.2). More detailed annotation can be obtained from the interface with links to the gene ontology, Pfam, KEGG pathways, eggNOG, SignalP and TMHMM information of the corresponding genes. Links to external data sources are also provided. Where available, links to additional pages are shown, including the expression of orthologous genes, whereas phylogenetic analysis and co-expression network information can be obtained by clicking “Show other info” (Figure 2.2).

Sequence Similarity Search

In addition to searching using keywords, PdumBase also supports sequence-based queries via the BLAST search page. Users can input either a nucleotide or a protein sequence. The search options for BLAST, including the E-value, the number of alignments, and the type of search, are customizable. The search results return the expression as well as the functional annotation information for the matched sequence.

Additional properties of each gene such as (i) isoform level data and corresponding plots, (ii) replicate information including all biological and technical replicates, (iii) the raw read count information as estimated by RSEM, and (iv) detailed annotation information (Supplementary Figure A.1.2) can also be accessed from the gene ID page. While the main results page displays information on the best hit for genes in Platynereis, the detailed annotation information provides information regarding the source of the annotation.

Comparative Analysis

For investigators interested in comparative studies, PdumBase also provides a searchable interface for an additional six species including Danio rerio, Xenopus tropicalis, Homo sapiens, Nematostella vectensis, Strongylocentrotus purpuratus, and Ascaris suum.

Figure 2.2: PdumBase search interface and expandable search result options. a PdumBase Search interface. An example under Functional Annotation on the “Keyword search” field is shown. b The PdumBase Results interface displays the expression data at different stages of development for each identified gene. c Screen capture of the interface obtained when the option “Show detailed annotation” is selected from the Results screen. Detailed annotation includes Gene Ontology, KEGG, EggNOG, and more. d The results interface when the option “Show plots” is selected. This option displays the expression data (in FPKM) plotted over time of development for each gene found. e Result interface with the “Show other info” option selected. Clicking on the green checkmark icon will redirect to a new tab with expanded search results for: “Ortholog Expression”, “Ortholog groups”, and “Co-expression”.

Through “Show other info” in the search result page, users can link to comparative information provided under the “Ortholog expressions” column if orthologous genes were identified through our automated approach (see above). The developmental expression profiles for all the orthologous genes are also provided in a single window to facilitate easy comparison of orthologous expression patterns. To further facilitate comparative studies and to enable additional validation for the annotations, PdumBase displays the phylogenetic analyses for each P. dumerilii gene with homologous counterparts across 18 species. The “Orthologous groups” page includes four tabs: (i) a list of information related to all orthologous genes including the protein name, protein ID (accession number), species, species class and the links for downloading their protein or cDNA sequences, (ii) a tree plot based on maximum likelihood, (iii) a tree plot based on maximum parsimony, and (iv) a CLUSTALW multiple alignment report. The tree files, alignment report and all the protein/cDNA sequences are downloadable (Figure 2.3).

Co-expression Information

Expression patterns shared by multiple genes may indicate function in a similar biological process. When the user clicks on the corresponding entry in the “Co-expression Info” column, PdumBase provides co-expression information that may reveal potential interactions among a set of genes. This page is organized into three tabs, namely “The same cluster”, “All”, and “DE genes”. Genes found within the same cluster by clustering analysis are shown in the first tab, whereas the second tab includes every developmental gene expression profile independent of the cluster analysis. The co-expression network also provides the topological overlap information, which can be used to examine the similarity of interaction with all other genes in the network. The “DE genes” tab shows the co-expressed genes between each pair of adjacent developmental stages for both up-regulated and down-regulated genes.
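To make the notion of topological overlap concrete, the sketch below computes the standard unsigned topological overlap measure from a symmetric adjacency matrix with entries in [0, 1]. This is the textbook definition associated with WGCNA; which variant and which adjacency (soft-thresholding power) were used for PdumBase is not stated in the text, so the example should be read as an illustration only.

```python
from typing import List

Matrix = List[List[float]]


def topological_overlap(adj: Matrix) -> Matrix:
    """Unsigned topological overlap for a symmetric adjacency matrix.

    TOM[i][j] = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    where u runs over all genes other than i and j, and k_i is the
    connectivity of gene i (sum of its adjacencies to all other genes).
    """
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # diagonal is 1 by convention
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom
```

Two genes with no direct connection can still receive a non-zero overlap when they share neighbors, which is the “third party” effect described in the Expression Analysis section.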

Figure 2.3: PdumBase expandable results option. a PdumBase results interface displaying the options under “Show Other Info”. Clicking on the green check mark opens a new tab. b Interface retrieved when the green check mark icon under “Ortholog group” is selected. The resulting interface displays, for the particular Platynereis gene under search, all the species in which an orthologous gene was found. The protein and/or cDNA sequences can be downloaded individually for each orthologous gene, or for the complete set shown in the result table. c and d display phylogenetic trees of the orthologous groups: maximum likelihood tree (c) and maximum parsimony tree (d).

Database Implementation

All data content is stored in a MySQL database. The web interface was implemented using PHP and the Smarty template engine. The business logic and presentation are separated by employing the model-view-controller design pattern. A centralized controller is used to coordinate the client requests and generate the corresponding views. All plots are generated either by server-side R scripts or by client-side JavaScript. In addition, a BLAST database was constructed for the transcriptome assembly using NCBI BLAST.
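The role of the centralized controller can be sketched in a few lines. The actual implementation is in PHP with Smarty templates and is not shown in this chapter; the Python stand-in below only illustrates the dispatch idea. The `action` request parameter mirrors the one visible in the PdumBase manual URL, while the handler names, the `search` action, and the returned strings are hypothetical.

```python
from typing import Callable, Dict

Params = Dict[str, str]


def view_manual(params: Params) -> str:
    """Hypothetical view: would render the manual template."""
    return "manual page"


def view_search(params: Params) -> str:
    """Hypothetical view: would query the model and render results."""
    return "results for " + params.get("keyword", "")


# Single routing table: every request passes through one controller.
ROUTES: Dict[str, Callable[[Params], str]] = {
    "manual": view_manual,
    "search": view_search,
}


def dispatch(params: Params) -> str:
    """Select the view for the requested action, or report an unknown one."""
    handler = ROUTES.get(params.get("action", ""))
    if handler is None:
        return "unknown action"
    return handler(params)
```

Routing every request through one function keeps the database access (model) and the templates (view) independent of how URLs are structured.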

Additional Features

Data Export The annotation and BLAST search results are all downloadable in text format (CSV) or as an Excel sheet. In addition, all assembled transcripts, annotation, expression data and differentially expressed genes are also available for download on our website. The raw Illumina reads are available upon request.

Manual A manual providing a detailed description of all the features and how to access them is available on the PdumBase web page http://pdumbase.gdcb.iastate.edu/platynereis/controller.php?action=manual, as well as in Appendix A.

Conclusions

PdumBase offers a user-friendly platform for researchers to study the regulatory landscape of a spiral-cleaving embryo with an emphasis on early developmental stages of Platynereis dumerilii, and selected later stages throughout its life cycle. It provides researchers with an online tool for fast, dynamic retrieval of expression patterns during early stages of development and across the life cycle. Furthermore, the large-scale data sets and comparative analyses offer valuable information for studies of the molecular dynamics and evolutionary aspects of the P. dumerilii transcriptome in comparison to other model systems. It represents one of the first attempts to integrate and harness developmental expression profiles from various species in combination with phylogeny-based annotations, offering a versatile online research tool to discover and investigate various aspects of animal evolution and development.

Future expansions for PdumBase may include genome-wide expression profiles after experimental manipulations, single cell transcriptomic data for early stages of development, and genomic information of regulatory regions to provide further entry points for and network analysis. Further database subdivisions could display more detailed and/or manually curated aspects of early development, e.g. asymmetric cell division, distinct pathways, or the emergence of distinct cell lineages and cell types. Additionally, we consider the inclusion of images of gene expression data as a possible and valuable future expansion. Further PdumBase expansions may include an interactive visualization of clustered genes based on expression profile. Such a tool would enable the user to fine-tune the cluster parameters while simultaneously visualizing the change in cluster composition, allowing the user to examine different sets of genes at a time.

As such, PdumBase can be seen as a prototype for an online research tool that makes any large-scale genome-wide data set quickly accessible to researchers without requiring prior expertise in bioinformatics, showcasing how a valuable and extensive transcriptional data set can be made accessible for community-wide data mining. The versatility and variety of search options enable a wider range of research questions to be investigated both within a single laboratory and across the scientific community. The creation of such a database opens the genome-wide discovery space of data sets to the entire scientific community.

List of Abbreviations

BLAST, Basic Local Alignment Search Tool; FPKM, Fragments Per Kilobase per Million mapped reads; GO, gene ontology; hpf, hours post fertilization; KEGG, Kyoto Encyclopedia of Genes and Genomes; ORF, open reading frame; TMM, Trimmed Mean of M-values.

Declarations

Availability of Data and Materials

PdumBase is freely available at http://pdumbase.gdcb.iastate.edu. The transcriptome assembly and functional annotation are available in the main menu download tab. Raw reads, preprocessed reads, and BAM files are available upon request.

Competing Interests

The authors declare they have no competing interests.

Funding

Funding for this work was provided by the Roy J. Carver Charitable Trust and the National Science Foundation (Award ID 1455185) to SQS. The funding bodies had no role in the design of the study or the collection, analysis or interpretation of the data or the writing of the manuscript.

Authors’ Contributions

SQS and HCC conceived the study. HCC processed the data and developed the database. NAL curated the public version of the database, and prepared the manual and figures. HCC and NAL wrote the manuscript. SQS (invertebrate data sets) and JAK (vertebrate data sets) designed and coordinated the study, and revised the manuscript. All authors read, corrected and approved the final manuscript.

Acknowledgements

The authors wish to thank Levi Baber for setting up and maintaining the production server of PdumBase.

Bibliography

[33] Florian Raible and Kristin Tessmar-Raible. Platynereis dumerilii. Current Biology, 24(15), 2014.

[34] Antonella Lauri, Thibaut Brunet, Mette Handberg-Thorsager, Antje H.L. Fischer, Oleg Simakov, Patrick R.H. Steinmetz, Raju Tomer, Philipp J. Keller, and Detlev Arendt. Development of the Annelid Axochord: Insights into notochord evolution. Science, 345(6202):1365–1368, 2014.

[35] Kristin Tessmar-Raible, Florian Raible, Foteini Christodoulou, Keren Guy, Martina Rembold, Harald Hausen, and Detlev Arendt. Conserved Sensory-Neurosecretory Cell Types in Annelid and Fish Forebrain: Insights into Hypothalamus Evolution. Cell, 129(7):1389–1400, 2007.

[36] Alexandru S. Denes, Gáspár Jékely, Patrick R H Steinmetz, Florian Raible, Heidi Snyman, Benjamin Prud’homme, David E K Ferrier, Guillaume Balavoine, and Detlev Arendt. Molecular Architecture of Annelid Nerve Cord Supports Common Origin of Nervous System Centralization in Bilateria. Cell, 129(2):277–288, 2007.

[37] Detlev Arendt, Ulrich Technau, and Joachim Wittbrodt. Evolution of the bilaterian larval foregut. Nature, 409(6816):81–85, 2001.

[38] Edmund B. Wilson. The cell‐lineage of Nereis. A contribution to the cytogeny of the annelid body. Journal of Morphology, 6(3):361–480, 1892.

[39] A W C Dorresteijn. Quantitative analysis of cellular differentiation during early embryogenesis of Platynereis dumerilii. Roux’s Archives of Developmental Biology, 199:14–30, 1990.

[40] Stephan Schneider, Albrecht Fischer, and Adriaan W.C. Dorresteijn. A morphometric comparison of dissimilar early development in sibling species of Platynereis (Annelida, Polychaeta). Roux’s Archives of Developmental Biology, 201(4):243–256, 1992.

[41] Christian Ackermann, Adriaan Dorresteijn, and Albrecht Fischer. Clonal domains in postlarval Platynereis dumerilii (Annelida: Polychaeta). Journal of Morphology, 266(3):258–280, 2005.

[42] Elizabeth A. Williams and Gáspár Jékely. Towards a systems-level understanding of development in the marine annelid Platynereis dumerilii. Current Opinion in Genetics and Development, 39:175–181, 2016.

[43] Stephan Q. Schneider and Bruce Bowerman. β-Catenin Asymmetries after All Animal/Vegetal-Oriented Cell Divisions in Platynereis dumerilii Embryos Mediate Binary Cell-Fate Specification. Developmental Cell, 13(1):73–86, 2007.

[44] Kaia Achim, Nils Eling, Hernando Martinez Vergara, Paola Yanina Bertucci, Jacob Musser, Pavel Vopalensky, Thibaut Brunet, Paul Collier, Vladimir Benes, John C Marioni, and Detlev Arendt. Whole-body single-cell sequencing reveals transcriptional domains in the annelid larval body. Molecular Biology and Evolution, 2018.

[45] B. Duygu Özpolat, Mette Handberg-Thorsager, Michel Vervoort, and Guillaume Balavoine. Cell lineage and cell cycling analyses of the 4d micromere using live imaging in the marine annelid Platynereis dumerilii. eLife, 6, 2017.

[46] Hernando Martínez Vergara, Paola Yanina Bertucci, Peter Hantz, Maria Antonietta Tosches, Kaia Achim, Pavel Vopalensky, and Detlev Arendt. Whole-organism cellular gene-expression atlas reveals conserved cell types in the ventral nerve cord of Platynereis dumerilii. Proceedings of the National Academy of Sciences, 114(23):5878–5885, 2017.

[47] Kaia Achim, Jean-Baptiste Pettit, Luis R Saraiva, Daria Gavriouchkina, Tomas Larsson, Detlev Arendt, and John C Marioni. High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nature Biotechnology, 33(5):503–509, 2015.

[48] Antje H.L. Fischer and Detlev Arendt. Mesoteloblast-like mesodermal stem cells in the polychaete annelid Platynereis dumerilii (Nereididae). Journal of Experimental Zoology Part B: Molecular and Developmental Evolution, 320(2):94–104, 2013.

[49] Florian Raible. Vertebrate-Type Intron-Rich Genes in the Marine Annelid Platynereis dumerilii. Science, 310(5752):1325–1326, 2005.

[50] Detlev Arendt, Kristin Tessmar-Raible, Heidi Snyman, Adriaan W. Dorresteijn, and Joachim Wittbrodt. Ciliary photoreceptors with a vertebrate-type opsin in an invertebrate brain. Science, 306(5697):869–871, 2004.

[51] Detlev Arendt, Kristin Tessmar, Maria-Ines Medeiros de Campos-Baptista, Adriaan Dorresteijn, and Joachim Wittbrodt. Development of pigment-cup eyes in the polychaete Platynereis dumerilii and evolutionary conservation of larval eyes in Bilateria. Development (Cambridge, England), 129(5):1143–54, 2002.

[52] Nicolas Dray, Kristin Tessmar-Raible, Martine Le Gouar, Laura Vibert, Foteini Christodoulou, Katharina Schipany, Aurélien Guillou, Juliane Zantke, Heidi Snyman, Julien Béhague, Michel Vervoort, Detlev Arendt, and Guillaume Balavoine. Hedgehog signaling regulates segment formation in the annelid Platynereis. Science, 329(5989):339–342, 2010.

[53] Thibaut Brunet, Antje HL Fischer, Patrick RH Steinmetz, Antonella Lauri, Paola Bertucci, and Detlev Arendt. The evolutionary origin of bilaterian smooth and striated myocytes. eLife, 5:e19607, 2016.

[54] Jan Grimmel, Adriaan W.C. Dorresteijn, and Andreas C. Fröbius. Formation of body appendages during caudal regeneration in Platynereis dumerilii: Adaptation of conserved molecular toolsets. EvoDevo, 7(1), 2016.

[55] Eve Gazave, Julien Béhague, Lucie Laplane, Aurélien Guillou, Laetitia Préau, Adrien Demilly, Guillaume Balavoine, and Michel Vervoort. Posterior elongation in the annelid Platynereis dumerilii involves stem cells molecularly related to primordial germ cells. Developmental Biology, 382(1):246–267, 2013.

[56] Nicole Rebscher, Anika Kristin Lidke, and Christian Friedrich Ackermann. Hidden in the crowd: Primordial germ cells and somatic stem cells in the mesodermal posterior growth zone of the polychaete Platynereis dumerillii are two distinct cell populations. EvoDevo, 3(1), 2012.

[57] Kathrin Pfeifer, Adriaan W.C. Dorresteijn, and Andreas C. Fröbius. Activation of Hox genes during caudal regeneration of the polychaete annelid Platynereis dumerilii. Development Genes and Evolution, 222(3):165–179, 2012.

[58] Mengting Yang, Jiaqi Liu, Xiangru Zhang, and Susan D. Richardson. Comparative Toxicity of Chlorinated Saline and Freshwater Wastewater Effluents to Marine Organisms. Environmental Science and Technology, 49(24):14475–14483, 2015.

[59] Juliane Zantke, Tomoko Ishikawa-Fujiwara, Enrique Arboleda, Claudia Lohs, Katharina Schipany, Natalia Hallay, Andrew D. Straw, Takeshi Todo, and Kristin Tessmar-Raible. Circadian and Circalunar Clock Interactions in a Marine Annelid. Cell Reports, 5(1):99–113, 2013.

[60] Benjamin Backfisch, Vitaly V. Kozin, Stephan Kirchmaier, Kristin Tessmar-Raible, and Florian Raible. Tools for gene-regulatory analyses in the marine annelid Platynereis dumerilii. PLoS ONE, 9(4), 2014.

[61] Stephanie Bannister, Olga Antonova, Alessandra Polo, Claudia Lohs, Natalia Hallay, Agne Valinciute, Florian Raible, and Kristin Tessmar-Raible. TALENs mediate efficient and heritable mutation of endogenous genes in the marine annelid Platynereis dumerilii. Genetics, 197(1):77–89, 2014.

[62] Juliane Zantke, Stephanie Bannister, Vinoth Babu Veedin Rajan, Florian Raible, and Kristin Tessmar-Raible. Genetic and genomic tools for the marine annelid Platynereis dumerilii. Genetics, 197(1):19–31, 2014.

[63] Nadine Randel, Albina Asadulina, Luis A. Bezares-Calderón, Csaba Verasztó, Elizabeth A. Williams, Markus Conzelmann, Réza Shahidi, and Gáspár Jékely. Neuronal connectome of a sensory-motor circuit for visual navigation. eLife, 3, 2014.

[64] Maria Antonietta Tosches, Daniel Bucher, Pavel Vopalensky, and Detlev Arendt. Melatonin signaling controls circadian swimming behavior in marine zooplankton. Cell, 159(1):46–57, 2014.

[65] Vinoth Babu Veedin-Rajan, Ruth M. Fischer, Florian Raible, and Kristin Tessmar-Raible. Conditional and Specific Cell Ablation in the Marine Annelid Platynereis dumerilii. PLoS ONE, 8(9), 2013.

[66] Markus Conzelmann, Elizabeth A. Williams, Karsten Krug, Mirita Franz-Wachtel, Boris Macek, and Gáspár Jékely. The neuropeptide complement of the marine annelid Platynereis dumerilii. BMC Genomics, 14(1), 2013.

[67] M. Conzelmann, S.-L. Offenburger, A. Asadulina, T. Keller, T. A. Munch, and G. Jekely. Neuropeptides regulate swimming depth of Platynereis larvae. Proceedings of the National Academy of Sciences, 108(46):E1174–E1183, 2011.

[68] Albrecht Fischer and Adriaan Dorresteijn. The polychaete Platynereis dumerilii (Annelida): A laboratory animal with spiralian cleavage, lifelong segment proliferation and a mixed benthic/pelagic life cycle. BioEssays, 26(3):314–325, 2004.

[69] Antje Hl Fischer, Thorsten Henrich, and Detlev Arendt. The normal development of Platynereis dumerilii (Nereididae, Annelida). Frontiers in zoology, 7(1):31, 2010.

[70] Jared O’Connell, Ole Schulz-Trieglaff, Emma Carlson, Matthew M. Hims, Niall A. Gormley, and Anthony J. Cox. NxTrim: Optimized trimming of Illumina mate pair reads. Bioinformatics, 31(12):2035–2037, 2015.

[71] Anthony M. Bolger, Marc Lohse, and Bjoern Usadel. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics, 30(15):2114–2120, 2014.

[72] Marcel Martin. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1):10, 2011.

[73] Mikkel Schubert, Stinus Lindgreen, and Ludovic Orlando. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9(1):88, 2016.

[74] Brian J Haas, Alexie Papanicolaou, Moran Yassour, Manfred Grabherr, Philip D Blood, Joshua Bowden, Matthew Brian Couger, David Eccles, Bo Li, Matthias Lieber, Matthew D Macmanes, Michael Ott, Joshua Orvis, Nathalie Pochet, Francesco Strozzi, Nathan Weeks, Rick Westerman, Thomas William, Colin N Dewey, Robert Henschel, Richard D Leduc, Nir Friedman, and Aviv Regev. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature protocols, 8(8):1494–1512, 2013.

[75] Bastien Chevreux, Thomas Pfisterer, Bernd Drescher, Albert J Driesel, Werner EG Müller, Thomas Wetter, and Sándor Suhai. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome research, 14(6):1147–59, 2004.

[76] Bo Li, Nathanael Fillmore, Yongsheng Bai, Mike Collins, James A Thomson, Ron Stewart, and Colin N Dewey. Evaluation of de novo transcriptome assemblies from RNA-Seq data. Genome biology, 15(12):553, 2014.

[77] M D Robinson and A Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol, 11(3):R25, 2010.

[78] Cole Trapnell, Lior Pachter, and Steven L. Salzberg. TopHat: Discovering splice junctions with RNA-Seq. Bioinformatics, 25(9):1105–1111, 2009.

[79] Likun Wang, Zhixing Feng, Xi Wang, Xiaowo Wang, and Xuegong Zhang. DEGseq: An R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics, 26(1):136–138, 2009.

[80] E. F. Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377–387, 1970.

[81] Sheila A Kitchen, Camerron M Crowder, Angela Z Poole, Virginia M Weis, and Eli Meyer. De Novo Assembly and Characterization of Four Anthozoan (Phylum Cnidaria) Transcriptomes. G3 (Bethesda, Md.), 5(11):2441–52, 2015.

[82] Helen Attrill, Kathleen Falls, Joshua L Goodman, Gillian H Millburn, Giulia Antonazzo, Alix J Rey, Steven J Marygold, and the FlyBase Consortium. FlyBase: establishing a Gene Group resource for Drosophila melanogaster. Nucleic acids research, 44(D1):786–92, 2016.

[83] R. Andrew Cameron, Manoj Samanta, Autumn Yuan, Dong He, and Eric Davidson. SpBase: The sea urchin genome database and web site. Nucleic Acids Research, 37(SUPPL. 1), 2009.

[84] Val Curwen, Eduardo Eyras, T. Daniel Andrews, Laura Clarke, Emmanuel Mongin, Steven M J Searle, and Michele Clamp. The Ensembl automatic gene annotation system. Genome Research, 14(5):942–950, 2004.

[85] Shujiro Okuda, Takuji Yamada, Masami Hamajima, Masumi Itoh, Toshiaki Katayama, Peer Bork, Susumu Goto, and Minoru Kanehisa. KEGG Atlas mapping for global analysis of metabolic pathways. Nucleic acids research, 36(Web Server issue), 2008.

[86] Michal Levin, Leon Anavy, Alison G. Cole, Eitan Winter, Natalia Mostov, Sally Khair, Naftalie Senderovich, Ekaterina Kovalev, David H. Silver, Martin Feder, Selene L. Fernandez-Valverde, Nagayasu Nakanishi, David Simmons, Oleg Simakov, Tomas Larsson, Shang-Yun Liu, Ayelet Jerafi-Vider, Karina Yaniv, Joseph F. Ryan, Mark Q. Martindale, Jochen C. Rink, Detlev Arendt, Sandie M. Degnan, Bernard M. Degnan, Tamar Hashimshony, and Itai Yanai. The mid-developmental transition and the evolution of animal body plans. Nature, 531(7596):637–641, 2016.

[87] Hsien-Chao Chou, Margaret M Pruitt, Benjamin R Bastin, and Stephan Q Schneider. A transcriptional blueprint for a spiral-cleaving embryo. BMC genomics, 17:552, 2016.

[88] Qiang Tu, R. Andrew Cameron, and Eric H. Davidson. Quantitative developmental transcriptomes of the sea urchin Strongylocentrotus purpuratus. Developmental Biology, 2014.

[89] Meng How Tan, Kin Fai Au, Arielle L. Yablonovitch, Andrea E. Wills, Jason Chuang, Julie C. Baker, Wing Hung Wong, and Jin Billy Li. RNA sequencing reveals a diverse and dynamic repertoire of the Xenopus tropicalis transcriptome over development. Genome Research, 2013.

[90] Steven A. Harvey, Ian Sealy, Ross Kettleborough, Fruzsina Fenyes, Richard White, Derek Stemple, and James C. Smith. Identification of the zebrafish maternal and paternal transcriptomes. Development, 2013.

[91] Jianbin Wang, Julianne Garrey, and Richard E. Davis. Transcription in pronuclei and one- to four-cell embryos drives early development in a nematode. Current Biology, 2014.

[92] Rebecca Rae Helm, Stefan Siebert, Sarah Tulin, Joel Smith, and Casey William Dunn. Characterization of differential transcript abundance through time during Nematostella vectensis development. BMC Genomics, 2013.

[93] Casey A. Gifford, Michael J. Ziller, Hongcang Gu, Cole Trapnell, Julie Donaghey, Alexander Tsankov, Alex K. Shalek, David R. Kelley, Alexander A. Shishkin, Robbyn Issner, Xiaolan Zhang, Michael Coyne, Jennifer L. Fostel, Laurie Holmes, Jim Meldrim, Mitchell Guttman, Charles Epstein, Hongkun Park, Oliver Kohlbacher, John Rinn, Andreas Gnirke, Eric S. Lander, Bradley E. Bernstein, and Alexander Meissner. Transcriptional and epigenetic dynamics during specification of human embryonic stem cells. Cell, 2013.

[94] Sarah Tulin, Derek Aguiar, Sorin Istrail, and Joel Smith. A quantitative reference transcriptome for Nematostella vectensis early embryonic development: a pipeline for de novo assembly in emerging model systems. EvoDevo, 4(1):16, 2013.

[95] E Jung, A L Veuthey, E Gasteiger, and A Bairoch. Annotation of glycoproteins in the SWISS-PROT database. Proteomics, 1(2):262–8, 2001.

[96] Dong Su Yu, Dae Hee Lee, Seong Keun Kim, Choong Hoon Lee, Ju Yeon Song, Eun Bae Kong, and Jihyun F. Kim. Algorithm for predicting functionally equivalent proteins from BLAST and HMMER searches. Journal of Microbiology and Biotechnology, 22(8):1054–1058, 2012.

[97] Thomas Nordahl Petersen, Søren Brunak, Gunnar von Heijne, and Henrik Nielsen. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods, 8(10):785–786, 2011.

[98] a Krogh, B Larsson, G von Heijne, and E Sonnhammer. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of molecular biology, 305(3):567–580, 2001.

[99] B Langmead, C Trapnell, M Pop, and S L Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the . Genome Biol, pages 1–10, 2009.

[100] Bo Li, Colin N Dewey, Z Wang, M Gerstein, M Snyder, Y Katz, ET Wang, EM Airoldi, CB Burge, M Nicolae, S Mangul, I Măndoiu, A Zelikovsky, H Jiang, WH Wong, C Trap- nell, B Williams, G Pertea, A Mortazavi, G Kwan, M van Baren, S Salzberg, B Wold, L Pachter, B Li, V Ruotti, RM Stewart, JA Thomson, CN Dewey, S Anders, W Huber, MD Robinson, DJ McCarthy, GK Smyth, M Guttman, M Garber, JZ Levin, J Don- aghey, J Robinson, X Adiconis, L Fan, MJ Koziol, A Gnirke, C Nusbaum, JL Rinn, ES Lander, A Regev, G Robertson, J Schein, R Chiu, R Corbett, M Field, SD Jackman, K Mungall, S Lee, HM Okada, JQ Qian, M Griffith, A Raymond, N Thiessen, T Cezard, YS Butterfield, R Newsome, SK Chan, R She, R Varhol, B Kamoh, AL Prabhu, ATam, Y Zhao, RA Moore, M Hirst, MA Marra, SJM Jones, PA Hoodless, I Birol, MG Grabherr, BJ Haas, M Yassour, JZ Levin, Da Thompson, I Amit, X Adiconis, L Fan, R Raychowd- hury, Q Zeng, Z Chen, E Mauceli, N Hacohen, A Gnirke, N Rhind, F di Palma, BW Bir- ren, C Nusbaum, K Lindblad-Toh, N Friedman, A Regev, U Nagalakshmi, Z Wang, K Waern, C Shou, D Raha, M Gerstein, M Snyder, JC Marioni, CE Mason, SM Mane, M Stephens, Y Gilad, R Morin, M Bainbridge, A Fejes, M Hirst, M Krzywinski, T Pugh, H McDonald, R Varhol, S Jones, M Marra, X Wang, Z Wu, X Zhang, GJ Faulkner, ARR Forrest, AM Chalk, K Schroder, Y Hayashizaki, P Carninci, DA Hume, SM Grimmond, A Mortazavi, BA Williams, K McCue, L Schaeffer, B Wold, J Feng, W Li, T Jiang, B Paşaniuc, N Zaitlen, E Halperin, H Richard, MH Schulz, M Sultan, A Nürnberger, S Schrinner, D Balzereit, E Dagand, A Rasche, H Lehrach, M Vingron, SA Haas, ML Yaspo, M Taub, D Lipson, TP Speed, F De Bona, S Ossowski, K Schneeberger, G Ratsch, C Trapnell, L Pachter, SL Salzberg, KF Au, H Jiang, L Lin, Y Xing, WH Wong, A Roberts, H Pimentel, C Trapnell, L Pachter, B Langmead, C Trapnell, M Pop, SL Salzberg, H Li, B Handsaker, A Wysoker, T Fennell, J Ruan, N Homer, G Marth, G Abecasis, R Durbin, WJ Kent, CW Sugnet, TS Furey, KM Roskin, TH 
Pringle, AM Zahler, null Haussler, null David, J Li, H Jiang, WH Wong, SA Bustin, L Shi, LH Reid, WD Jones, R Shippy, JA Warrington, SC Baker, PJ Collins, F de Longueville, ES Kawasaki, KY Lee, Y Luo, YA Sun, JC Willey, RA Setterquist, GM Fischer, W Tong, YP Dragan, DJ Dix, FW Frueh, FM Goodsaid, D Herman, RV Jensen, CD Johnson, EK Lobenhofer, RK Puri, U Schrf, J Thierry-Mieg, C Wang, M Wilson, PK Wolber, L Zhang, S Amur, W Bao, CC Barbacioru, AB Lucas, V Bertholet, C Boysen, B Brom- ley, D Brown, A Brunner, R Canales, XM Cao, TA Cebula, JJ Chen, J Cheng, TM Chu, E Chudin, J Corson, JC Corton, LJ Croner, C Davies, TS Davison, G Delenstarr, X Deng, D Dorris, AC Eklund, Xh Fan, H Fang, S Fulmer-Smentek, JC Fuscoe, K Gal- lagher, W Ge, L Guo, X Guo, J Hager, PK Haje, J Han, T Han, HC Harbottle, SC Harris, E Hatchwell, CA Hauser, S Hester, H Hong, P Hurban, SA Jackson, H Ji, CR Knight, 43

WP Kuo, JE LeClerc, S Levy, QZ Li, C Liu, Y Liu, MJ Lombardi, Y Ma, SR Magnuson, B Maqsodi, T McDaniel, N Mei, O Myklebost, B Ning, N Novoradovskaya, MS Orr, TW Osborn, A Papallo, T Patterson, JH Bullard, E Purdom, KD Hansen, S Dudoit, A Roberts, C Trapnell, J Donaghey, JL Rinn, L Pachter, ET Wang, R Sandberg, S Luo, I Khrebtukova, L Zhang, C Mayr, SF Kingsmore, GP Schroth, CB Burge, KD Hansen, SE Brenner, S Dudoit, Z Wu, X Wang, X Zhang, and JS Liu. RSEM: accurate tran- script quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics, 12(1):323, 2011.

[101] Peter Langfelder and Steve Horvath. WGCNA: an R package for weighted correla- tion network analysis. BMC bioinformatics, 9:559, 2008.

[102] Li Li, Christian J Jr Stoeckert, and David S Roos. OrthoMCL: Identification of Or- tholog Groups for Eukaryotic Genomes – Li et al. 13 (9): 2178 – Genome Research. Genome Research, 13(9):2178–2189, 2003.

[103] Henrik Nordberg, Michael Cantor, Serge Dusheyko, Susan Hua, Alexander Poliakov, Igor Shabalov, Tatyana Smirnova, Igor V. Grigoriev, and Inna Dubchak. The genome portal of the Department of Energy Joint Genome Institute: 2014 updates. Nucleic Acids Research, 42(D1), 2014.

[104] Guofan Zhang, Xiaodong Fang, Ximing Guo, Li Li, Ruibang Luo, Fei Xu, Pengcheng Yang, Linlin Zhang, Xiaotong Wang, Haigang Qi, Zhiqiang Xiong, Huayong Que, Yin- long Xie, Peter W H Holland, Jordi Paps, Yabing Zhu, Fucun Wu, Yuanxin Chen, Ji- afeng Wang, Chunfang Peng, Jie Meng, Lan Yang, Jun Liu, Bo Wen, Na Zhang, Zhiy- ong Huang, Qihui Zhu, Yue Feng, Andrew Mount, Dennis Hedgecock, Zhe Xu, Yunjie Liu, Tomislav Domazet-Lošo, Yishuai Du, Xiaoqing Sun, Shoudu Zhang, Binghang Liu, Peizhou Cheng, Xuanting Jiang, Juan Li, Dingding Fan, Wei Wang, Wenjing Fu, Tong Wang, Bo Wang, Jibiao Zhang, Zhiyu Peng, Yingxiang Li, Na Li, Jinpeng Wang, Maoshan Chen, Yan He, Fengji Tan, Xiaorui Song, Qiumei Zheng, Ronglian Huang, Hailong Yang, Xuedi Du, Li Chen, Mei Yang, Patrick M Gaffney, Shan Wang, Longhai Luo, Zhicai She, Yao Ming, Wen Huang, Shu Zhang, Baoyu Huang, Yong Zhang, Tao Qu, Peixiang Ni, Guoying Miao, Junyi Wang, Qiang Wang, Christian E W Steinberg, Haiyan Wang, Ning Li, Lumin Qian, Guojie Zhang, Yingrui Li, Huanming Yang, Xiao Liu, Jian Wang, Ye Yin, and Jun Wang. The oyster genome reveals stress adaptation and complexity of shell formation. Nature, 490(7418):49–54, 2012.

[105] Paul Flicek, M. Ridwan Amode, Daniel Barrell, Kathryn Beal, Konstantinos Billis, Si- mon Brent, Denise Carvalho-Silva, Peter Clapham, Guy Coates, Stephen Fitzgerald, Laurent Gil, Carlos García Girón, Leo Gordon, Thibaut Hourlier, Sarah Hunt, Nathan Johnson, Thomas Juettemann, Andreas K. Kähäri, Stephen Keenan, Eugene Kule- sha, Fergal J. Martin, Thomas Maurel, William M. McLaren, Daniel N. Murphy, Rishi Nag, Bert Overduin, Miguel Pignatelli, Bethan Pritchard, Emily Pritchard, Harpreet S. Riat, Magali Ruffier, Daniel Sheppard, Kieron Taylor, Anja Thormann, Stephen J.Tre- vanion, Alessandro Vullo, Steven P. Wilder, Mark Wilson, Amonida Zadissa, Bron- wen L. Aken, Ewan Birney, Fiona Cunningham, Jennifer Harrow, Javier Herrero, Tim J P Hubbard, Rhoda Kinsella, Matthieu Muffato, Anne Parker, Giulietta Spudich, 44

Andy Yates, Daniel R. Zerbino, and Stephen M J Searle. Ensembl 2014. Nucleic Acids Research, 42(D1), 2014.

[106] Alexandros Stamatakis. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9):1312–1313, 2014.

[107] J D Thompson, T J Gibson, and D G Higgins. Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics, Chapter 2:Unit 2 3, 2002. 45

IDENTIFICATION OF CILIOGENESIS CANDIDATE GENES IN PLATYNEREIS DUMERILII BY β-CATENIN INDUCED CELL FATE TRANSFORMATION

Manuscript to be published in a modified form in BMC Genome Biology

NATALIA ACEVEDO-LUNA1,2, BENJAMIN R. BASTIN1, MARGARET PRUITT1,3, HEIKE HOFMANN2, AND STEPHAN Q. SCHNEIDER1*+

1 Department of Genetics, Developmental and Cell Biology, Iowa State University, Ames, IA 50011 USA.

2 Department of Statistics, Iowa State University, Ames, IA 50011 USA.

3 BioTek Instruments, 100 Tigan Street, Winooski, VT 05404 USA.

+ Current address: Institute of Cellular and Organismic Biology, Academia Sinica. Taipei, Taiwan

* Corresponding author

Abstract

Background The annelid Platynereis dumerilii is increasingly used as a model organism for comparative developmental studies due to its phylogenetic position as a lophotrochozoan/spiralian and the accessibility of embryos that exhibit a stereotypic cleavage pattern and invariant cell lineages with predictable cell fates. One important biological process during development in most multicellular species is ciliogenesis, the formation of cilia, organelles associated with a variety of cellular roles such as motility, signaling, and sensory functions. However, knowledge of the molecular components involved in the assembly of motile cilia in multiciliated cells (MCCs) in P. dumerilii is limited. To close this gap, we developed a computational approach that combines two strategies: the identification of high confidence ciliogenesis candidate genes through ortholog discovery of known ciliary genes described in other species, and guilt-by-association guided discovery of novel ciliary genes based on co-expression patterns between annotated and non-annotated genes. Our study is informed by expression data obtained from wild-type and hyperciliated P. dumerilii larvae.

Results Our results reveal over 600 known ciliary genes, described in other species, to be conserved in P. dumerilii and significantly upregulated (p-value ≤ 0.05) in hyperciliated embryos. These include genes encoding well-known ciliogenesis candidates such as dyneins, microtubule organizers, intraflagellar transport proteins, basal body associated proteins, signaling proteins, and transcription factors. Our approach further allowed for the expansion of the known ciliary gene set, and enabled the identification of possible novel ciliogenesis candidate genes among poorly characterized (non-annotated) P. dumerilii transcripts.

Conclusion This study highlights possible differences between the components necessary to initiate ciliogenesis in MCCs during development in P. dumerilii and those described in vertebrate MCC differentiation. This work hence represents a first step towards the generation of a comprehensive set of ciliogenesis candidate genes that bridges the gap between our current understanding of phenotypic and genomic relationships of ciliogenesis in Platynereis dumerilii and other species.

Background

Cilia are small microtubule-based projections from the cell surface, found in a large proportion of eukaryotic cells [108]. This conserved organelle appears to have evolved prior to the divergence of the last eukaryotic common ancestor [109] and is known to be involved in diverse physiological functions including motility, displacement of fluids, sensory perception, and the establishment of left/right patterning during development [109, 110, 111]. More specifically, motile cilia present in multiciliated epithelial cells have an important role in regulating the development and function of many organs including the brain [112], respiratory passages [113], and fallopian tubes [114]. In addition, cilia biosynthesis is tightly coordinated with the cell cycle [115].

Given the diversity of roles performed by cilia, ciliary dysfunction can have profound physiological and developmental effects on the organism. In fact, several human diseases, collectively known as ciliopathies, have been linked to cilia defects [116]. A key step towards understanding ciliary biology, and consequently towards gaining deeper insight into the causes of ciliary dysfunction, is the identification of the components involved in the process of ciliary assembly.

Previous surveys have reported hundreds of proteins and genes to be involved in the assembly, structure, and function of cilia. For instance, Inglis et al. [108] compiled a ciliary proteome database reporting 157 proteins characterized in either Drosophila melanogaster or C. elegans, for which orthologs in five species including human and mouse were identified. Similarly, Gherman et al. [117] published the now defunct ciliary proteome database that integrated and curated ciliary proteomic data from ten different studies. Based on 75 proteins with experimental validation, the authors reported 1162 human ciliary proteins using computational predictions. In a more recent publication, Arnaiz et al. [118] reported the compilation of Cildb. This database contains predicted ciliary proteins from 44 species, including their ortholog relations and links to human diseases referenced in the Online Mendelian Inheritance in Man (OMIM) database. Finally, a gene set published by the SysCilia Consortium [119] includes 303 curated ciliary genes and is considered the gold standard of known ciliary genes: all included genes have experimental evidence for ciliary localization, function in ciliogenesis, and/or involvement in ciliopathies.

These ciliary gene surveys not only highlight the complexity of cilia but also that of ciliogenesis, implying that the expression of such a large number of genes must be tightly coordinated to assure the proper progression of ciliogenesis. This regulation of gene expression is achieved in part at the transcriptional level. In particular, previous studies have independently found RFX (regulatory factor X) [120, 121] and FOXJ1 (forkhead box J1) [110, 122] transcription factors to be required for the expression of ciliary components across different taxa.

Cilia have been lost independently many times during evolution [109]. However, the structural organization of cilia is generally conserved across different taxa. In general, a cilium consists of an axoneme, a membrane-covered projection of the microtubule cytoskeleton extending from the basal body that anchors it to the cell surface. Within the axoneme, the arrangement of microtubules varies in different types of cilia. In motile cilia, the characteristic microtubule arrangement consists of 9 doublets surrounding a central pair, whereas in non-motile cilia the central microtubule pair is absent [123, 124].

In this study we identified conserved and candidate novel ciliary genes in the invertebrate Platynereis dumerilii, commonly known as the "bristle worm" or "ragworm". P. dumerilii, a protostome belonging to the superphylum Spiralia/Lophotrochozoa and to the phylum Annelida, is in a key phylogenetic position for comparative studies among bilaterians [125]. In addition, this annelid appears to have retained ancestral morphological and genomic features, suggesting that its genes and cell types could be informative for gaining insights into ancestral bilaterian features [126].

The genome of P. dumerilii is about 1 Gbp in size [127], which is considered large in comparison with other invertebrates [126]. Interestingly, the number and position of introns in the P. dumerilii genome is more similar to vertebrates than to some other protostomes (i.e., the ecdysozoans D. melanogaster and C. elegans) [128]. In addition, the P. dumerilii genome not only contains orthologs of many protein coding genes of vertebrates, but its protein sequences also show a lower divergence from vertebrates than those of fly and nematode [126, 128]. Taken together, these properties strongly support the view that P. dumerilii exhibits a slower rate of evolution than other protostomes. This slower rate of divergence is of particular interest because it allows for stronger inferences to be made regarding ancestral genes and gene functions.

The species' phylogenetic position and slow evolutionary rate make P. dumerilii a suitable species for studies in comparative genomics, evolution, and development. In addition, P. dumerilii shows indirect development with invariant stereotypical cleavage patterns that lead to the formation of the trochophore, a free-swimming ciliated larva. A defining feature of the trochophore larva is a ciliated ring, the prototroch, located in the equatorial plane. This structure consists of two rows of multiciliated cells that originate from the trochoblasts, and it starts to be visible at around 12 hours post fertilization (hpf). Shortly after the formation of the prototroch a ciliated apical tuft is formed, followed by the formation of a posterior ciliated structure, the telotroch [129]. Hence, here we use P. dumerilii as a model organism to elucidate the formation of cilia, a crucial biological process during development.

Despite the important role of cilia and ciliogenesis, very little is known about this process and its transcriptional regulation in P. dumerilii. Current studies in invertebrates are mostly limited to Drosophila and C. elegans, both highly diverged species that lack motile cilia. This makes it challenging to establish homologies between their candidate ciliary genes and the genes of other groups of organisms, and limits such homologies to non-motile cilia.

To close this gap, we identified conserved and candidate ciliogenesis genes in Platynereis dumerilii, for which no survey of ciliary genes has been published to date. To that end, we first compiled a set of known ciliary genes from different data sets reporting genes and proteins involved in ciliary assembly in different taxa, and we determined the extent to which known ciliary genes are conserved in P. dumerilii.

Next, we experimentally induced the formation of hyperciliated larvae in P. dumerilii through the pharmacological activation of the Wnt/β-catenin signaling pathway. Differential analysis of expression data from the hyperciliated larvae compared to the wild-type larvae then allowed for the definition of a high confidence ciliary gene set in P. dumerilii. Based on this gene set, we were able to obtain additional novel ciliary candidate genes using co-expression analysis between the annotated genes of the high confidence set and non-annotated but significantly differentially expressed genes from our hyperciliated expression data. Our results enrich the understanding not only of developmental processes in P. dumerilii but also of ciliary biology in a broader scope. They also provide insights into the evolution and diversity of cilia, and into the process of ciliogenesis as a whole, especially its transcriptional regulation.

Results

Identification of Known Ciliary Genes in P. dumerilii

We compiled a comprehensive set of 2359 known ciliary genes by combining genes reported in seven different ciliary studies (Table 3.1). The set includes all ciliary genes (303) from SysCilia [119], which provides a curated gene list with experimental evidence for ciliary localization, function in ciliogenesis, and/or involvement in ciliopathies. In addition, SysCilia includes experimentally validated genes with ciliary annotation from the Gene Ontology Consortium (GO). Similarly, we included 308 genes from Cildb [130], a database that compiles information from 53 high throughput ciliary and centriolar studies reporting ciliary genes in 44 species. Only human ciliary genes from Cildb with 8 or more referenced studies were included. In addition, we integrated all genes reported by Sigg et al. [131], in which a list of 436 ciliary genes isolated from a sea urchin (Strongylocentrotus purpuratus), a sea anemone (Nematostella vectensis), and a choanoflagellate (Salpingoeca rosetta) was compiled. Genes reported in this study are conserved in at least two of these three species. We refer to this subset of known ciliary genes as the shared ciliary proteome (Table 3.1).

Our compilation of known ciliary genes also includes all genes with confirmed ciliopathy involvement, as well as genes identified as candidate genes, reported in the study of Reiter et al. [132]. This extensive review reports 426 confirmed ciliary genes, of which 187 have an established ciliopathy association and 241 are reported as candidates (potentially involved in a ciliopathic phenotype).

Table 3.1: Description of sources of known ciliary genes, showing the number of genes obtained from each source and the number of genes found conserved in P. dumerilii.

| Dataset [ref] | Description | Number of genes | Genes found in P. dumerilii |
|---|---|---|---|
| SysCilia [119] | High confidence gene set with experimental validation | 302 | 296 |
| Cildb [130] | Genes from 55 high throughput ciliary studies; comparative genomics with 44 species | 308 | 260 |
| Shared Ciliary Proteome [131] | Ciliary proteins isolated and conserved in at least two of the following: S. purpuratus, N. vectensis, and S. rosetta | 436 | 429 |
| Ciliopathy review [132] | Genes with confirmed ciliopathy involvement and candidate ciliopathy genes | 426 | 419 |
| RFX targets by Chung et al. [121] | Genes induced by Rfx2 in X. laevis; high throughput analysis | 911 | 858 |
| FoxJ1 targets by Choksi et al. [110] | Genes induced by FoxJ1 in D. rerio; high throughput analysis | 596 | 543 |
| RFX targets by Quigley et al. [133] | Rfx2 targets identified in X. laevis; 473 overlap with the Chung et al. set | 852 | 769 |
| FoxJ1 targets by Quigley et al. [133] | FoxJ1 targets identified in X. laevis; 224 overlap with the Choksi et al. set | 786 | 764 |
| Non-redundant total | | 2359 | 2187 |

Furthermore, we included genes from three studies identifying targets of FoxJ1 and/or Rfx factors in the context of ciliary studies. These two transcription factors are known as master regulators of ciliary assembly. By including genes from these three studies, we were able to 1) add transcriptional information to genes already in our known ciliary gene set by labeling them as targets of Rfx2 and/or FoxJ1, and 2) enrich our set of known genes with additional candidates. This yielded 573 genes identified as FoxJ1 targets in zebrafish (Danio rerio) by Choksi et al. [110]. Out of these genes, 336 had previously been listed as ciliary genes in other sources [108, 117]. Similarly, we aggregated 676 genes reported by Chung et al. [121] as targets of Rfx2 in frog (Xenopus laevis). Finally, in a more recent study in X. laevis, Quigley et al. [133] reported 952 genes induced by Rfx2 and 787 by FoxJ1; 429 of these genes are regulated by both transcription factors. This resulted in a total of 2359 known ciliary genes, the detailed list of which is available in Supplementary Table B.2.1.
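The two effects of adding the transcription factor target studies, labeling genes already in the set and enriching the set with new candidates, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in this study: the function name `merge_sources` and the placeholder gene identifiers are hypothetical, and real inputs would be the gene lists summarized in Table 3.1.

```python
from collections import defaultdict


def merge_sources(evidence_sources, target_sources):
    """Build a non-redundant gene set and label transcription factor targets.

    evidence_sources: dict mapping a source name to a set of gene identifiers.
    target_sources:   dict mapping a transcription factor name to the set of
                      genes reported as its targets.
    Returns (all_genes, tf_labels); tf_labels maps a gene to the sorted list
    of factors reported to regulate it.
    """
    all_genes = set()
    for genes in evidence_sources.values():
        all_genes |= genes  # set union removes redundancy across sources

    tf_labels = defaultdict(list)
    for factor, genes in target_sources.items():
        all_genes |= genes  # (2) enrich the set with additional candidates
        for gene in genes:
            tf_labels[gene].append(factor)  # (1) add transcriptional labels

    return all_genes, {gene: sorted(f) for gene, f in tf_labels.items()}


# Toy input with placeholder identifiers (hypothetical, not real data).
evidence = {"source_1": {"geneA", "geneB"}, "source_2": {"geneA", "geneC"}}
targets = {"Rfx2": {"geneA", "geneD"}, "FoxJ1": {"geneA", "geneC"}}
known_genes, tf_labels = merge_sources(evidence, targets)
```

In this toy input, `geneD` enters the set only through the Rfx2 target list, while `geneA` was already present and simply gains both labels.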

Next, we used the BLAST alignment tool [134] to identify the corresponding orthologous genes in P. dumerilii based on sequence similarity (see Methods). Table 3.1 summarizes the sources of known ciliary genes compiled in this study, as well as the number of known ciliary genes found to be conserved in P. dumerilii. We identified potential orthologous genes for 2187 of the 2359 genes in the known set. These 2187 known ciliary genes correspond to 1610 P. dumerilii transcripts, of which 1240 have a one-to-one match to a single known ciliary gene and 370 have significant alignments to more than one known gene. Such multiple alignments usually correspond to genes of the same gene family. However, assigning a single P. dumerilii gene to a particular gene in a gene family would require a deeper phylogenetic analysis for each individual gene. Hence, we retained all significant alignments as potential annotations, and we assigned a weighted score (see Methods) to rank such potential annotations based on the quality of their alignments. The complete list of genes with multiple significant matches is detailed in Supplementary Table B.2.2.
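The distinction between transcripts with a single match and transcripts with several significant matches can be illustrated with a short Python sketch. It assumes that the significant alignments have already been filtered and are available as (transcript, known gene) pairs. The function name and the identifiers are hypothetical, and the weighted score described in Methods is deliberately not reproduced here.

```python
from collections import defaultdict


def classify_transcripts(hits):
    """Split transcripts by the number of distinct known genes they match.

    hits: iterable of (transcript_id, known_gene) pairs, one per significant
          alignment (filtering by significance is assumed to be done already).
    Returns (one_to_one, multiple):
      one_to_one maps a transcript to its single matching known gene;
      multiple maps a transcript to the sorted list of all matching genes.
    """
    genes_per_transcript = defaultdict(set)
    for transcript, gene in hits:
        genes_per_transcript[transcript].add(gene)

    one_to_one = {}
    multiple = {}
    for transcript, genes in genes_per_transcript.items():
        if len(genes) == 1:
            one_to_one[transcript] = next(iter(genes))
        else:
            # Retain every match as a potential annotation.
            multiple[transcript] = sorted(genes)
    return one_to_one, multiple


# Toy input with placeholder identifiers (hypothetical, not real data).
example_hits = [
    ("t1", "geneA"),
    ("t2", "geneB"),
    ("t2", "geneC"),
    ("t1", "geneA"),  # a repeated pair does not create a second match
]
single, multi = classify_transcripts(example_hits)
```

Keeping all matches for the second group mirrors the decision above to retain every significant alignment as a potential annotation instead of forcing one assignment per transcript.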

High Confidence Known Ciliary Genes

We categorized the genes in the known ciliary set based on the strength of the support linking each gene to cilia. The more sources report a given gene as ciliary, the greater the confidence in considering it a known ciliary gene. This allowed us to identify two subsets: a strict core of known ciliary genes, and an inclusive core (Table 3.2). The strict core contains a set of 57 high confidence ciliary genes, independently reported in SysCilia [119], Cildb [130], the shared ciliary proteome [131], and the ciliopathy review [132]. Therefore, all genes in the strict core have been experimentally validated and their localized expression and/or their function in cilia is well documented. Fifty-three of the genes in this core are reported to be targets of FoxJ1 and/or Rfx2; 37 have confirmed involvement in a ciliopathic phenotype, while the remaining 20 are candidate ciliopathic genes. Based on sequence similarity we identified 55 of the 57 high confidence known ciliary genes in P. dumerilii (Table 3.2).

The other subset represents a more inclusive core corresponding to genes reported in two out of the four sources with experimental validation. This core contains 325 genes, of which 284 genes have reported localization/function in the cilium, 221 have transcriptional input information, 141 are confirmed ciliopathic genes, and 144 are candidate genes. From this core of known ciliary genes, we found 291 to be conserved in P. dumerilii (Table 3.2).
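The definition of the two cores amounts to counting, for each gene, how many of the four experimentally validated sources report it. The Python sketch below illustrates this under one explicit assumption: "two out of the four sources" is read as "at least two", so that the inclusive core contains the strict core. The function name and gene identifiers are hypothetical.

```python
from collections import Counter


def build_cores(validated_sources, min_inclusive=2):
    """Derive strict and inclusive cores from source membership counts.

    validated_sources: dict mapping a source name to a collection of genes.
    min_inclusive:     minimum number of supporting sources for the inclusive
                       core (assumption: "two out of four" means at least 2).
    Returns (strict, inclusive) as sets of gene identifiers.
    """
    # Convert to sets so that a gene is counted at most once per source.
    counts = Counter(
        gene for genes in validated_sources.values() for gene in set(genes)
    )
    n_sources = len(validated_sources)
    strict = {gene for gene, c in counts.items() if c == n_sources}
    inclusive = {gene for gene, c in counts.items() if c >= min_inclusive}
    return strict, inclusive


# Toy input with placeholder identifiers (hypothetical, not real data).
sources = {
    "SysCilia": {"g1", "g2", "g3"},
    "Cildb": {"g1", "g2"},
    "shared ciliary proteome": {"g1", "g4"},
    "ciliopathy review": {"g1", "g3"},
}
strict_core, inclusive_core = build_cores(sources)
```

Here `g1` is reported by all four sources and lands in both cores, `g2` and `g3` have two supporting sources each and enter only the inclusive core, and `g4` is excluded from both.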

In addition, known ciliary genes were also classified according to the information provided in the sources. For instance, two of the seven sources of known ciliary genes report localization and/or functional information [119, 132], resulting in a total of 469 genes with localization information. We extracted all localization and functional terms and grouped similar terms into 10 modules. These modules are part of three ciliary components: 1) structural component, which includes terms related to axonemal, transition zone, basal body, ciliary membrane, and central pair, 2) functional component, including motility, transport, signaling, and regulation terms, and 3) ciliogenesis precursor, encompassing centrosome and ciliary vesicle related terms.

Table 3.2: Known ciliary genes classified into strict and inclusive cores of high confidence genes. For each subset of known ciliary genes, the table gives the number of known genes, the number found in P. dumerilii, and the number differentially expressed (DE) after β-catenin induced cell fate transformation. All counts other than "Known" are given as genes / transcripts.

| Core | Information | Subset | Known | Found | Upregulated | No change | Downregulated |
|---|---|---|---|---|---|---|---|
| Strict | Transcriptional | RFX targets | 0 | - | - | - | - |
| Strict | Transcriptional | FOXJ1 targets | 1 | 1 / 1 | 1 / 1 | - | - |
| Strict | Transcriptional | RFX and FOXJ1 targets | 52 | 52 / 50 | 46 / 44 | 4 / 4 | 2 / 2 |
| Strict | Ciliopathy | confirmed | 37 | 37 / 35 | 33 / 31 | 4 / 4 | - |
| Strict | Ciliopathy | candidate | 20 | 20 / 20 | 15 / 15 | 3 / 3 | 2 / 2 |
| Strict | Localization | structural components | 55 | 55 / 53 | 46 / 44 | 7 / 7 | 2 / 2 |
| Strict | Localization | functional components | 34 | 34 / 34 | 29 / 29 | 4 / 4 | 1 / 1 |
| Strict | Localization | ciliogenesis precursors | 35 | 35 / 65 | 30 / 30 | 4 / 4 | 1 / 1 |
| Strict | | Total number of genes | 57 | 57 / 55 | 48 / 46 | 7 / 7 | 2 / 2 |
| Inclusive | Transcriptional | RFX targets | 10 | 10 / 10 | 7 / 7 | 3 / 3 | 0 / 0 |
| Inclusive | Transcriptional | FOXJ1 targets | 19 | 19 / 18 | 12 / 11 | 3 / 3 | 3 / 3 |
| Inclusive | Transcriptional | RFX and FOXJ1 targets | 192 | 191 / 185 | 151 / 145 | 30 / 30 | 7 / 7 |
| Inclusive | Ciliopathy | confirmed | 141 | 140 / 135 | 87 / 85 | 38 / 35 | 11 / 11 |
| Inclusive | Ciliopathy | candidate | 144 | 141 / 129 | 85 / 75 | 34 / 32 | 16 / 16 |
| Inclusive | Localization | structural components | 259 | 256 / 233 | 157 / 140 | 64 / 58 | 25 / 24 |
| Inclusive | Localization | functional components | 116 | 115 / 108 | 71 / 68 | 31 / 26 | 10 / 10 |
| Inclusive | Localization | ciliogenesis precursors | 171 | 169 / 160 | 104 / 97 | 45 / 42 | 17 / 17 |
| Inclusive | | Total number of genes | 325 | 321 / 291 | 202 / 181 | 76 / 68 | 32 / 31 |

The majority of the genes with localization and/or functional information fall into more than one module. For instance, the gene encoding the protein Cep290 is part of the structural module, as this protein is present in the basal body; however, the gene is also annotated as part of the centrosome [132]. Table 3.3 summarizes all known ciliary genes along with the number of genes found in P. dumerilii, as well as the number of genes based on their associated information: 1) potential ciliopathic involvement, 2) transcriptional input, and 3) localization. Supplementary Table B.2.3 lists high confidence genes in the strict and inclusive core, including their localization/function, ciliopathic involvement, and transcriptional information when available.
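The grouping of localization and functional terms into modules and components, and the fact that one gene can belong to several of them, can be sketched as a simple lookup in Python. The module names follow the terms listed in the text. Placing the centrosome under ciliogenesis precursors is an assumption made for this sketch, and the function name is hypothetical.

```python
# Module -> component lookup; module names follow the terms used in the text.
MODULE_TO_COMPONENT = {
    "axonemal": "structural",
    "transition zone": "structural",
    "basal body": "structural",
    "ciliary membrane": "structural",
    "central pair": "structural",
    "motility": "functional",
    "transport": "functional",
    "signaling": "functional",
    "regulation": "functional",
    "centrosome": "ciliogenesis precursor",  # assumption made for this sketch
    "ciliary vesicle": "ciliogenesis precursor",
}


def components_for(modules):
    """Return the sorted, non-redundant components covered by a gene's modules."""
    return sorted({MODULE_TO_COMPONENT[module] for module in modules})


# Cep290 as described in the text: present in the basal body and also
# annotated as part of the centrosome.
cep290_components = components_for(["basal body", "centrosome"])
```

Because a gene is counted once in every component it touches, the per-component counts in Tables 3.2 and 3.3 do not need to add up to the total number of genes.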

Table 3.3: Summary of all known ciliary genes divided into subsets based on transcriptional, ciliopathy, and localization information. For each subset of known ciliary genes, the table gives the number of known genes, the number found in P. dumerilii, and the number differentially expressed (DE) after β-catenin induced cell fate transformation. All counts other than "Known" are given as genes / transcripts.

| Information | Subset | Known | Found | Upregulated | No change | Downregulated |
|---|---|---|---|---|---|---|
| Transcriptional | RFX targets | 713 | 646 / 599 | 172 / 152 | 249 / 222 | 173 / 159 |
| Transcriptional | FOXJ1 targets | 320 | 295 / 281 | 86 / 81 | 87 / 85 | 70 / 67 |
| Transcriptional | RFX and FOXJ1 targets | 817 | 793 / 715 | 438 / 392 | 190 / 175 | 118 / 106 |
| Ciliopathy | confirmed | 186 | 184 / 174 | 107 / 103 | 52 / 47 | 17 / 17 |
| Ciliopathy | candidate | 240 | 235 / 212 | 135 / 121 | 54 / 49 | 36 / 32 |
| Localization | structural components | 406 | 397 / 348 | 219 / 190 | 95 / 82 | 58 / 55 |
| Localization | functional components | 147 | 145 / 134 | 79 / 74 | 42 / 35 | 18 / 17 |
| Localization | ciliogenesis precursors | 229 | 225 / 212 | 126 / 116 | 59 / 56 | 34 / 33 |
| | Total number of genes | 2359 | 2187 / 1610 | 826 / 611 | 678 / 511 | 511 / 353 |

Based on sequence similarity we identified potential P. dumerilii homologs for the majority of the genes in the set of known ciliary genes compiled from different sources. However, sequence similarity alone does not provide certainty that these genes are indeed involved in ciliary assembly and/or ciliary function in the annelid worm. To test this further, we induced the formation of hyperciliated embryos and compared the expression of known ciliary genes in such embryos to that in normally developing embryos. In the next section we describe the results of this approach.

Candidate Genes by β-Catenin Cell Fate Transformation

Normal embryonic development of Platynereis dumerilii involves the formation of the trochophore, a larval stage consisting of a free-swimming larva with a ciliated ring on the equatorial plane. A prelude to the trochophore, the protrochophore, is reached in P. dumerilii at about 12 hours post fertilization (hpf) (Figure 3.1A). At this early stage, cilia are starting to be visible in the pre-hatching larva. We hypothesize that most of the genes required for ciliary assembly should be expressed at 12 hpf, in preparation for forming the ciliated ring.

Prior to the formation of the ciliary ring, early P. dumerilii development is characterized by asymmetric cell divisions oriented along the animal-vegetal axis. One interesting feature of P. dumerilii development, only observed in few other organisms [135], is that in every cell division along this axis, there is a reiterative pattern of asymmetric distribution of β-catenin protein between the two daughter cells [136]. This leads to a higher concentration of β-catenin in the vegetal-pole daughter cell and a lower concentration in the animal-pole daughter cell (Figure 3.1B). Interestingly, this asymmetric distribution of β-catenin mediates binary cell fate decisions, enabling cell fate diversification between daughter cell pairs in the P. dumerilii embryo [135].

Furthermore, it has been previously shown that pharmacological activation of the Wnt/β-catenin signaling pathway with 1-azakenpaullone (Az) alters the asymmetric distribution of β-catenin, causing distinct cell fate transformations [136]. More specifically, treatment with Az during the transition from the 8-cell to the 16-cell stage in P. dumerilii embryos (~3:45 hpf to 4:30 hpf when developing at 18 °C) caused a cell fate transformation resulting in larvae with an increased number of cilia-bearing cells (Figure 3.1C).

Figure 3.1: A. Schematic representation of the first 14 hours of P. dumerilii development; ciliary structures (depicted in dark gray) start to be visible at 12 hpf. Cells committed to become multiciliated are shown in light and dark gray. B. Asymmetric distribution of β-catenin concentration mediating binary cell fate decisions. A high concentration of β-catenin (shown in red) favors vegetal pole fate. C. Schematic representation of the cell fate transformation expected when treating embryos with 1-azakenpaullone (Az), a GSK-3β inhibitor causing accumulation of β-catenin. Expansion of ciliated structures is shown in dark gray. D. Distribution of known ciliary genes, annotated transcripts, and non-annotated transcripts in the sets of genes differentially expressed upon Az treatment. E. Experimental setup for ectopic β-catenin activation in three replicates. Embryos were Az treated (5 µM Az) for 45 minutes during the transition from 8 to 16 cells; the control group was treated in the same time frame with 0.1% DMSO. RNA samples for sequencing were collected at 12 hpf; phenotypic screening was performed at 12 and 24 hpf. F. In situ hybridization of three genes at 12 and 24 hpf in control and Az-treated conditions. All embryos are shown in ventral view with anterior at the top. Schematics on the left show ciliated structures in gray.

Hence, we triggered the formation of hyperciliated larvae by treating P. dumerilii embryos with Az during the cell division from the 8- to the 16-cell stage. Under normal conditions, this cell division produces vegetal-pole cells that differentiate into ciliated cells, while their animal-pole sister cells generate anterior structures. In the treated embryos, however, Az elevates the β-catenin concentration in the animal-pole cells, where it is normally low. This elevated level of β-catenin promotes a cell fate transformation from animal-pole to vegetal-pole daughter cell fate. As a result, larvae from treated embryos have an increased number of ciliated cells, and ciliated cells are no longer confined to the equatorial plane but expand toward the animal pole (see Figure 3.1E,F).

Both treatments, 0.1% DMSO (control) and 5 µM 1-azakenpaullone (treatment), were administered to the offspring of a single mating event. Our analysis therefore takes into account not only the differences between control and treatment, but also the differences between the independent mating events (three biological replicates). This constitutes a paired design in which each mating event receives both the control and the treatment. To account for this, our analysis compares treatment and control within each mating event separately, which removes the differences between mating events. After treatment, at 12 hpf, a portion of ∼500 embryos from both the control and the treated set of each batch was taken for RNA extraction and RNA sequencing (see Methods). Phenotypic screening at 24 and 48 hpf showed a hyperciliated phenotype in > 90% of the treated embryos and a normal trochophore phenotype in the control set (Figure 3.1E).
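The logic of the paired design can be illustrated with a minimal sketch. It assumes hypothetical normalized expression values for a single gene in three mating events and computes the treated-to-control log2 ratio within each event before averaging, so that batch-specific baseline differences cancel out. The function name and all numbers are illustrative assumptions; this is not the differential expression procedure used in this work, which is described in Methods.

```python
import math

def paired_log2_ratios(control, treated):
    """Return log2(treated / control) for each batch of one gene.

    Position i in both lists must come from the same mating event (batch).
    """
    if len(control) != len(treated):
        raise ValueError("each batch needs one control and one treated value")
    return [math.log2(t / c) for c, t in zip(control, treated)]

# Hypothetical expression values (e.g. CPM) for one gene in three batches.
# Baselines differ strongly between batches, the within-batch ratio does not.
control = [10.0, 40.0, 20.0]
treated = [15.0, 60.0, 30.0]

ratios = paired_log2_ratios(control, treated)
mean_log2_ratio = sum(ratios) / len(ratios)
mean_ratio = 2 ** mean_log2_ratio  # back-transform to a fold change
```

Comparing within each batch first means that the large difference in baseline expression between mating events does not inflate the variance of the estimated treatment effect.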

Quantification of gene expression by RNA-seq, followed by differential expression analysis between Az-treated and control embryos, identified 4517 genes upregulated by the treatment (Figure 3.1D). Of these significantly upregulated genes, 2380 are annotated and the remaining 2137 are non-annotated transcripts.

Interval of Expected Gene Expression Change based on Cell Lineage Transformation

In order to identify the precise subset of upregulated ciliary genes that can be attributed to the activation of the Wnt/β-catenin pathway, we hypothesized that such genes should be upregulated by a fold change that reflects the lineage transformation triggered by the treatment, which results in an increase of cells bearing cilia. To define this fold change, we take into consideration the number of ciliated cells in the normally developing larva compared to that of the hyperciliated treated larva. More precisely, in the normal trochophore stage the ciliated ring consists of 24 ciliated cells (see Figure 3.1A at 10 hpf), while in the treated embryos we estimate the number of ciliated cells to increase to at least 32, an increase by a factor of ∼1.3 (Figure 3.2).

[Figure 3.2 schematic. Panel labels: 8-cell stage, 16-cell stage; animal pole, vegetal pole. Wild type: 2 ciliated cells x4 and 4 ciliated cells x4 (24 in total); neuroectoderm + trunk; endoderm + mesoderm. Treated: 4 ciliated cells x4 and 4 ciliated cells x4 (32 in total); endoderm + mesoderm; endoderm + mesoderm. Time axis: 3:45 to 4:30 hpf (treatment time), 12 hpf (RNA-seq sample).]

Figure 3.2: Predicted effect of cell fate transformation on treated embryos: expected increase of ciliated cells in treated embryos. Ciliary and endo-mesoderm genes are expected to be upregulated by the treatment. Cells bearing cilia in treated embryos increase by a factor of 1.3, from 24 cells with cilia in control embryos to ∼32 in treated embryos.

The increase of ciliated cells in the treated embryos consequently allows us to define an interval of expected fold change of gene expression from 1.3 to 2. Here, the lower bound of 1.3 reflects the increase from 24 to 32 ciliated cells in the treated embryos, whereas the upper bound of 2 was chosen to be more inclusive, in order to account for possible biological and experimental variation in expression.
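The two bounds can be written out explicitly. The symbols $N_{\text{control}}$, $N_{\text{treated}}$ and $r$ are notation introduced only for this illustration, and the calculation assumes that the bulk expression of a cilia-specific gene scales with the number of ciliated cells:

```latex
\[
  r_{\min} = \frac{N_{\text{treated}}}{N_{\text{control}}}
           = \frac{32}{24} \approx 1.33,
  \qquad
  1.3 \le r \le 2 .
\]
```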

To better define this interval of fold change, we identified eight different sets of genes among the 9374 annotated and differentially expressed genes from the above analysis. Some of these gene sets are known to be related to cilia assembly, while others are associated with non-ciliary functions, and we examined their distribution with regard to the ratio between Az-treated and control expression values. Of those eight gene sets, six contain annotations related to cilia, such as axonemal, dynein, flagellar, intraflagellar transport, and tubulin, whereas the other two sets consist of genes with non-ciliary annotations, namely ribosomal and myosin. The distribution of the ratios between the gene expression levels in treated and control embryos shows that, for the majority of the genes with ciliary annotations, the expression level increases by a factor between 1.3 and 2 in the Az-treated condition. One noteworthy exception is the set of genes associated with tubulin, where the majority of genes are found in a lower ratio interval (1 to 1.7) (Figure 3.3F). In contrast, the distribution of the ratio of expression levels for the gene sets not related to cilia is significantly lower (4% for the ribosomal and 20% for the myosin gene set) compared to that of the ciliary annotated genes (Figure 3.3).

Among all genes included in the DE analysis (17378 genes), 4004 genes exhibit an increase in expression within the ratio interval in the treated condition. Not surprisingly, the majority of these 4004 genes (2866) were found to be significantly upregulated in our analysis. Consequently, we leverage the identified ratio interval as an added source of information, in combination with the gene annotation and the DE analysis, to aid the identification of ciliogenesis candidate genes and potential Wnt/β-catenin targets downstream in our analysis pipeline.
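As an illustration of how the ratio interval can act as a filter, the sketch below keeps genes whose treated-to-control ratio lies between 1.3 and 2 and then restricts them to significantly upregulated genes. Gene identifiers, ratios and significance flags are hypothetical, and treating both bounds as inclusive is an assumption made for this example.

```python
RATIO_MIN, RATIO_MAX = 1.3, 2.0  # interval defined in the text

def in_ratio_interval(ratio, low=RATIO_MIN, high=RATIO_MAX):
    """True if the treated/control ratio lies in [low, high] (inclusive, assumed)."""
    return low <= ratio <= high

# Hypothetical records: (gene id, treated/control ratio, significantly upregulated?)
genes = [
    ("gene_a", 1.45, True),
    ("gene_b", 1.10, False),
    ("gene_c", 1.90, False),
    ("gene_d", 2.60, True),
]

# Genes whose change in expression matches the expected lineage effect.
within = [g for g in genes if in_ratio_interval(g[1])]

# Genes that are both within the interval and significantly upregulated.
within_and_up = [g for g in within if g[2]]
```

In this toy input, gene_d is significantly upregulated but changes more than the lineage transformation alone would predict, so the interval filter sets it aside.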

High Confidence Ciliogenesis Candidate Genes in P. dumerilii

Studies in other organisms suggest that ∼600 to ∼1000 genes are involved in assembling and maintaining functional cilia [131, 137, 138]. Our perturbation assay resulted in 4517 significantly upregulated genes in the hyperciliated larvae. However, the extent of information available varies between upregulated genes. For instance, 2380 of the 4517 upregulated transcripts have gene annotations, while the remaining 2137 lack annotation (Figure 3.1D). In addition, 2866 genes are upregulated within the expected ratio interval. Therefore, the level of confidence with which the upregulated genes can be implicated in ciliary function in P. dumerilii varies from set to set.

Figure 3.3: Distribution of the ratio of change in gene expression in treated embryos. Shown is the relative abundance of genes per ratio of change in expression between the Az-treated and the normal condition for different sets of genes. A-F. Sets of genes with ciliary-related annotations. G, H. Sets of genes with non-ciliary-related annotations. The dotted line indicates the defined ratio interval of interest; note that for ciliary annotated genes (A-F) the majority of the genes are within this interval. The number of genes per set is shown in parentheses.

The first step towards identifying ciliogenesis candidate genes among the 4517 transcripts upregulated by the Az-treatment consists of examining the extent of their overlap with the known ciliary genes (1610 transcripts). Among the latter set, we found 611 upregulated genes, 484 of which fall within the defined ratio interval expected for Wnt/β-catenin targets. Since both the expression pattern in the treated embryos and evidence from other species implicate the genes in this set in ciliogenesis, we consider this group of 484 to consist mostly of ciliogenesis candidate genes and of potential direct or indirect downstream targets of the Wnt/β-catenin pathway. Importantly, all of the previously defined strict core genes (55 known ciliary genes with the most evidence of being related to cilia in other species) have a change in expression in the hyperciliated larvae that falls within the expected ratio interval, and the majority (48) are significantly upregulated. In addition, 46 of these genes are reported as targets of both FoxJ1 and Rfx transcription factors (Table 3.2). Consequently, we consider these 48 genes to be high confidence (HC) ciliogenesis candidate genes, as they are 1) reported by four independent sources as ciliary genes, 2) significantly upregulated by the Az-treatment, and 3) within the expected ratio interval.
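The three criteria amount to an intersection of three gene sets. The sketch below shows this with hypothetical gene identifiers; in practice the sets would come from the compilation of known ciliary genes, the DE analysis and the ratio-interval filter, and the variable names are assumptions made for illustration.

```python
# Hypothetical gene identifiers standing in for the real sets.
strict_core = {"g1", "g2", "g3", "g4"}      # 1) known ciliary genes with the most evidence
upregulated = {"g1", "g2", "g4", "g7"}      # 2) significantly upregulated by Az-treatment
within_interval = {"g1", "g2", "g3", "g4"}  # 3) ratio within the expected interval

# A gene is high confidence only if it satisfies all three criteria.
high_confidence = strict_core & upregulated & within_interval

# Strict core genes that changed as expected but did not reach significance.
core_not_significant = (strict_core & within_interval) - upregulated
```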

Similarly, among the known ciliary genes in the more inclusive core, with 242 homologous genes in P. dumerilii, a total of 180 are significantly upregulated, of which 157 are within the ratio interval. Filtering these genes for ciliary function allowed us to further identify potential targets of FoxJ1- and Rfx-dependent transcription, i.e. genes potentially related to a ciliopathic phenotype and genes based on functional modules (see Table 3.2 for additional details).

Finally, to assess the potential ciliary function of genes in the high confidence set, we selected the tektin gene family and performed whole mount in situ hybridization to confirm whether or not these genes are expressed in the trochophore (ciliary band) in the P. dumerilii larva (Figure 3.1A). The tektin family consists of coiled-coil domain containing proteins that are thought to form filaments composed of two heterodimers (Tektin-2 and Tektin-4) and one homodimer (Tektin-1) [139, 140]. Transcripts annotated as tektin2, tektin4, and tektin1 are upregulated in the treated larvae. Figure 3.1F shows localized expression of these three genes at 12 and 24 hpf during normal development as well as in Az-treated P. dumerilii larvae. As expected, the spatio-temporal expression pattern of these three tektin genes corresponds to that of the ciliated structures in the P. dumerilii larva during normal development, and expression expands towards the animal pole in Az-treated embryos that will develop into hyperciliated larvae (Figure 3.1A,C,F). These findings confirm that, for this particular gene family, the upregulated tektin-annotated transcripts indeed warrant addition to the ciliogenesis candidate gene set.

Guilt-by-association Based on Gene Family

In order to expand the high confidence ciliogenesis candidate gene set, we searched the annotated upregulated subset (2380 genes) for genes known to be members of a gene family represented in the high confidence ciliary candidate set. We refer to this approach as guilt-by-association based on gene family.

The 48 genes in the high confidence set belong to 18 gene families as follows (the number of genes per family is shown in parentheses): dyneins (10), intraflagellar transport (10), radial spoke head (4), coiled-coil domain containing (4), tetratricopeptide repeat domain (3), WD repeat domain containing (3), nucleoside diphosphate kinase (2), tektin filament forming (1), Bardet-Biedl syndrome (1), small GTPase adenosine diphosphate ribosylation (1), sperm flagellar protein (1), serine/threonine-protein kinase (1), adenylate kinase (1), rhophilin associated tail protein (1), clusterin associated protein (1), growth arrest specific (1), TNF receptor-associated factor (1), EF-hand domain containing protein (1), and hydrocephalus inducing protein (1). We found upregulated genes belonging to 13 of the 18 gene families in the HC set, allowing us to identify 118 ciliogenesis candidate genes through our guilt-by-association (GbA) approach (Table 3.4).
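A guilt-by-association search based on gene family names could be sketched as follows. The example assumes that family membership can be read from the gene symbol as a family prefix followed by a number, which is a simplification; the helper `family_of`, the prefix list and the gene lists are hypothetical and only illustrate the idea.

```python
import re

# A few family prefixes from the HC set, used here for illustration only.
hc_family_prefixes = ["ccdc", "wdr", "ttc", "tekt"]

def family_of(gene_name, prefixes=hc_family_prefixes):
    """Return the family prefix if the symbol looks like '<prefix><digits>...', else None."""
    for prefix in prefixes:
        if re.fullmatch(rf"{re.escape(prefix)}\d+\w*", gene_name.lower()):
            return prefix
    return None

# Hypothetical inputs: genes already in the HC set and annotated upregulated genes.
hc_genes = {"ccdc40", "wdr35", "tekt2"}
upregulated_annotated = ["ccdc66", "wdr65", "tekt4", "gpn1", "ccdc40"]

# Collect upregulated genes that share a family with the HC set
# but are not themselves part of it.
gba_candidates = {}
for name in upregulated_annotated:
    fam = family_of(name)
    if fam is not None and name not in hc_genes:
        gba_candidates.setdefault(fam, []).append(name)
```

Matching on the symbol alone is fragile for families whose members do not share a prefix, so a curated family table would be a more robust choice in practice.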

Of note, within the dynein gene family we identified dynein axonemal heavy (dnah), intermediate (dnai), and light chain (dnal) genes, as well as dynein axonemal assembly factors (dnaaf). The candidate gene set in this gene family was expanded by 11 genes, as seven upregulated genes annotated as part of the dnah gene family and four annotated as part of the dnaaf family were found. In addition, one P. dumerilii transcript annotated as dynein axonemal light chain was identified in the set of non-DE genes. This gene (as well as additional genes from other gene families) could either be a false negative or not be regulated by the β-catenin signaling pathway (see Discussion).

Based on gene annotation, the gene family in which most candidate genes were found is the coiled-coil domain containing (ccdc) family. In total, 47 P. dumerilii transcripts upregulated in the hyperciliated larvae are part of the ccdc gene family. We were therefore able to expand the candidate gene set in this gene family from 4 to 47.

Similarly, in addition to the three WD repeat domain containing (wdr) genes already included in the high confidence ciliary gene set, we identified a total of 76 P. dumerilii transcripts with WD repeat annotation. Of these, 26 genes are already part of the known ciliary set, with 15 being significantly upregulated. Interestingly, of the remaining 50 wdr genes that are not part of the known ciliary genes, only 8 are significantly upregulated, expanding the candidate set of wdr genes by the same number. Most importantly, these 8 wdr genes had not been reported as ciliary genes in other species in any of the seven sources included in the compilation of known ciliary genes (Table 3.1).

In summary, a total of 26 wdr genes (16 known ciliary genes and 8 not part of the known set) were added to the candidate gene set for the wdr family (Table 3.4).

Finally, for the tetratricopeptide repeat domain (ttc) family, 2 additional upregulated ttc genes not part of the known set were identified and added to the ciliogenesis candidate gene set in P. dumerilii. For this gene family, 37 genes are annotated as ttc, and 14 of those are upregulated and therefore part of the candidate gene set.

In summary, of the 18 gene families in the high confidence ciliogenesis candidate gene set, five had no transcripts added to the candidate gene set. Three of these five gene families had no other transcripts annotated in their family (dynein axonemal intermediate chain, clusterin associated protein, and hydrocephalus inducing protein). For the other two gene families (growth arrest specific and serine/threonine-protein kinase), other transcripts were annotated within these families, but those were not upregulated and not part of the known ciliary gene set (Table 3.4).

Table 3.4: GbA by gene name. Ciliogenesis candidate genes identified based on the gene family name of genes in the high confidence (HC) set. (*) denotes non-upregulated genes; (++) denotes genes not present in the known set, which are shown in gray.

Gene family name | Description | Genes in HC | Gene names in HC set / Other P. dumerilii genes in this gene family | Added candidate genes (GbA) | Total
dnah/dyn | Dynein axonemal heavy chain | 6 | , dnah2, , dnah3, dnah7c, dnah8, dnah6, dnah10, , , dnah12, dnah17 | 7 | 13
dnai | Dynein axonemal intermediate chain | 2 | , dyi3 - | | 2
dnal | Dynein axonemal light chain | 2 | , dnali1 * | 0 | 2
dnaaf | Dynein axonemal assembly factor | 0 | - dnaaf1, , dnaaf3, dnaaf5 | 4 | 4
ift | Intraflagellar transport | 10 | ift27, ift81, ift122, ift20, ift57*, ift46*, ift43* ift74, ift52, ift88, ift172, ift140, | 0 | 10
ropn | Rhophilin associated tail protein | 1 | ropn1l ropn1 | 1 | 2
rsph | Radial spoke head | 4 | , , rsph14, , ;rsph3b, rsph10b2; | 3 | 7
ccdc | Coiled-coil domain-containing | 4 | ccdc40, ccdc37, ccdc83, ccdc66, ccdc18, ccdc164, ccdc39 ccdc34, ccdc13, ccdc27, ccdc87, ccdc11, ccdc19, ccdc60 ...+33++ | 43 | 47
tekt | Tektin filament forming | 1 | tekt2 , tekt4, tekt3*, tekt5* | 2 | 3
ttc | Tetratricopeptide repeat domain | 3 | ttc29, ttc26, ttc30b ttc29, ttc26, ttc16 ttc25 ,ttc12, ttc23, ttc130, , ttc40, ttc18’, ttc17, ttc6, ttc27, ttc28 ...+23*++ | 14 | 17
efhc | EF-hand domain containing | 1 | efhc1 efhc2 | 1 | 2
gas | Growth arrest specific | 1 | gas8 gas1*, gas2* | 0 | 1
spef | Sperm flagellar protein | 1 | spef2 spef1 | 1 | 2
mak | Serine/threonine-protein kinase | 1 | mak7 mak16* | 0 | 1
nme | Nucleoside diphosphate kinase | 2 | nme7 , nme5 , nme9, nme6* | 2 | 4
ak | Adenylate kinase | 1 | ak7 ak8, ak1, ak5, ak2* | 3 | 4
bbs | Bardet-Biedl Syndrome | 1 | , , , , *, * | 4 | 5
arl | Small GTPase adenosine diphosphate ribosylation | 1 | arl3 arl2bp, , arl9, arl6ip1*, arl15*, arl2*, arl11*, arl4c*, * | 3 | 4
hydin | Hydrocephalus inducing protein | 1 | hydin - | - | 1
cluap | Clusterin associated protein | 1 | cluap1 - | - | 1
wdr | WD repeat domain containing | 3 | wdr35, wdr19, wdr78 wdr65, wdr96, wdr16, wdr31, wdr97, wdr27, wdr60, wdr61, wdr66, wdr67, wdr92, wdr52, wdr90, wdr63, wdr93, wdr88 ...+8, +42* +11*++ | 26 | 31
traf | TNF receptor-associated factor | 1 | traf3ip1 traf6, traf7, traf2*, traf3*, traf4* | 2 | 3
Total | | 48 | | 118 | 166

In addition to the 118 P. dumerilii upregulated genes added to the ciliogenesis candidate gene set, a total of 56 non-differentially expressed genes had annotations belonging to the same gene families as the HC set of known ciliary genes. Of those, only one (Wdr59) is within the expected ratio interval. Since the remaining 55 genes are not significantly upregulated, and their change in expression in the hyperciliated larvae does not fall within the ratio interval, we refrained from including these in the set of ciliogenesis candidate genes.

Ciliogenesis Candidate Genes by Co-expression Analysis

Expanding the high confidence ciliogenesis candidate genes through GbA based on gene family allowed us to identify potential ciliogenesis candidate genes among the 2380 upregulated and annotated genes. However, this approach relies on gene ontology and hence does not include the subset of 2137 upregulated transcripts that lack annotation. Any approach to overcome this limitation requires additional sources of information, besides the annotation vocabulary, that reveal the expression properties of genes throughout cilia assembly. To obtain such information, we examined P. dumerilii expression data during normal development. Such data enable the identification of genes with similar expression patterns, which in turn may indicate governance by the same regulatory machinery and, more importantly, involvement in the same functional domain [141, 142].

To identify (potentially novel) ciliogenesis candidate genes among all the genes upregulated by the Az-treatment, we therefore examined the expression profiles of all P. dumerilii transcripts during normal development. Briefly, we used PdumBase [143] to extract expression data for both known ciliary transcripts and unknown but upregulated transcripts in P. dumerilii, consisting of 7 time points that cover the first 14 hours of development (2, 4, 6, 8, 10, 12, and 14 hpf; Figure 3.1A). Next, we aimed at identifying co-expression patterns between the known ciliary genes and the remaining upregulated genes. Note that the latter correspond either to annotated genes whose annotation is not related to ciliary function in any other species or, more importantly, to non-annotated upregulated transcripts, which may contain novel ciliary genes. We will continue to refer to these two groups of genes as unknown annotated and non-annotated genes.

Specifically, our approach identifies ciliogenesis candidate genes by co-expression in two main steps. First, we grouped the 611 upregulated known ciliary genes into clusters of genes related to each other by common patterns of expression during early developmental stages (2 to 14 hpf). Here, a hierarchical clustering approach with stringent clustering parameters enforcing high similarity between expression patterns was used (see Methods) to ensure proper separation between distinct functional modules. Next, the unknown upregulated genes were classified into the clusters of the known ciliary genes. For this, we used supervised learning: we trained a random forest on the expression patterns of the known genes, with their corresponding cluster id as label, and used this model to classify the unknown upregulated genes. By doing so, our approach allowed for the identification of upregulated genes with very similar patterns of expression to those of the known ciliary genes. Under this scheme, we clustered the known ciliary upregulated genes into a total of 486 clusters, and then classified the remaining 3906 upregulated genes into the clusters of known genes (R² = 0.97 and adjusted R² = 0.78).
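The two-step scheme (cluster the known genes, then assign unknown genes to those clusters) can be illustrated with a deliberately simplified, dependency-free sketch. It is not the implementation used in this work: it assumes a correlation-based distance, replaces the hierarchical clustering by single-linkage grouping at a fixed distance cutoff, and replaces the random forest by a nearest-centroid assignment. All function names, the cutoff and the expression profiles are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant assumed)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def distance(x, y):
    """Correlation distance: 0 for identical shapes, up to 2 for opposite shapes."""
    return 1.0 - pearson(x, y)

def single_linkage_clusters(profiles, cutoff):
    """Group genes so that any two genes closer than `cutoff` share a cluster."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if distance(profiles[a], profiles[b]) < cutoff:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())

def centroid(members, profiles):
    """Mean expression profile of the genes in one cluster."""
    return [mean(vals) for vals in zip(*(profiles[m] for m in members))]

def assign(profile, clusters, profiles):
    """Index of the cluster whose centroid is closest to `profile`."""
    dists = [distance(profile, centroid(c, profiles)) for c in clusters]
    return dists.index(min(dists))

# Hypothetical expression profiles at 2, 4, 6, 8, 10, 12 and 14 hpf.
known = {
    "known_1": [1, 1, 2, 4, 8, 9, 9],
    "known_2": [1, 2, 2, 5, 8, 9, 10],
    "known_3": [9, 8, 6, 4, 2, 1, 1],
}
unknown = {"unknown_1": [0, 1, 1, 3, 6, 8, 8]}

clusters = single_linkage_clusters(known, cutoff=0.05)
labels = {name: assign(p, clusters, known) for name, p in unknown.items()}
```

A small cutoff plays the role of the stringent clustering parameters: only genes with nearly identical expression shapes end up together, which naturally produces many small clusters.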

To explore the content of such a large number of clusters, we first ranked them by their information content (IC), which enabled us to identify clusters in order of their importance based on expression profile features (see Methods).

The known gene in the cluster with the highest IC is rpl10ps3 (Figure 3.4). Rpl10ps3 is included in the known ciliary gene set because it is reported by Sigg et al. (referenced as Shared Ciliary Proteome in Table 3.1) as a ciliary protein in S. purpuratus and N. vectensis, with a homologous gene found in mouse [131]. The ciliary proteomic analysis by Sigg et al. reports rpl10ps3 in whole cilia from early gastrula and in the axoneme and ciliary membrane of cilia from late gastrula in S. purpuratus, and in whole cilia in N. vectensis. Prior to Sigg et al., rpl10ps3 had not been reported as a ciliary protein by any other published database. In P. dumerilii, our gene expression approach identified two unknown genes, rs8 and rla2, as co-expressed with the P. dumerilii homolog of rpl10ps3. Interestingly, all the genes in this cluster encode ribosomal proteins (Figure 3.4), providing evidence that our strategy, based on the expression profiles of the known ciliary genes, does indeed identify genes with similar function from the unknown set. We also note that rs8 and rla2 are annotated, but their annotation has not yet been reported to be related to ciliary function, making them potential ciliogenesis candidate genes in P. dumerilii.

Co-expression of High Confidence Ciliogenesis Candidate Genes

Our method also allows for the identification of potentially novel ciliogenesis candidate genes that correspond to non-annotated, unknown genes with very similar expression patterns to those of the well-characterized known genes. To identify such potentially novel genes, we first explored the co-expression of the 48 previously identified high confidence (HC) ciliary genes and their corresponding cluster compositions. These HC genes are significantly upregulated in P. dumerilii Az-treated embryos and homologous to genes that have been reported as ciliary genes in other species by four independent sources with experimental validation (first four sources in Table 3.1).

Figure 3.4: Co-expression of highest IC gene. Expression levels over early developmental stages (2 to 14 hpf) of the genes in the cluster containing the gene with the highest information criterion: Rpl10ps3 (shown in black). All genes in this cluster encode ribosomal proteins.

Of the 48 HC ciliogenesis candidate genes, 45 are found in 38 clusters that also contain 41 additional known ciliary genes, 125 annotated unknown genes (genes with annotation not known to be related to ciliary function in other species), and 70 non-annotated unknown genes (Figure 3.5, Table 3.5). Hence, the co-expression approach identifies 125 potential ciliogenesis candidate genes among the annotated unknown genes, and 70 potential novel ciliary candidate genes among the P. dumerilii non-annotated transcripts found in these clusters. The complete list of genes and transcripts in the HC clusters can be found in Table 3.5.

Table 3.5: Ciliogenesis candidate genes by co-expression analysis. Contains the gene name(s) (for annotated) and the P. dumerilii transcript id(s) (for non-annotated) of the unknown genes that were classified into the clusters of HC known ciliary genes. Genes that were also identified as potential candidate genes through the GbA by gene name are shown in gray.

Cluster | HC genes in cluster / Other known in cluster / Annotated unknown | Non-annotated unknown | Total
1 | dnai1;dnaic1, lrrc61 al14e, trfr, ak7, ift172 poly;yg31b, clax, co1a2, b4gt3, kc1a, hme1a;hme2a | comp186810_c1, comp190365_c0, comp202482_c0, comp206761_c0, comp213520_c0, comp215572_c0, comp215655_c4, comp216881_c6, comp217039_c1, comp217376_c15, comp217647_c0, comp219143_c8, comp219805_c8, comp220525_c4, comp221237_c0, comp221573_c6, comp221663_c0, comp224112_c2 | 30
2 | dnai2;dnaic2 enkur tsn33 | comp217193_c2, comp226147_c0 | 5
3 | dnal1 ppp2r3c; , gpn1 | comp211212_c0 |
4 | dnali1 pifo skp1, rsf1, zc3h1 | - | 5
5 | dnah5;dnah8 dnaaf1, gsto1, mmp16, vigln, dnah11;dnah9; kcrm, lama4, tm9s2, dnah17, eif4g1 my18a, aplp | comp226251_c0, comp226251_c1 | 15
6 | dnah1;dnah6 - - | comp221808_c0, comp225473_c0 | 3
7 | dnah2, dnah10 fgd3 pkhl1 | - | 4
8 | dync2h1 - prof, spat6 | - | 3
9 | ift122, nme7 morn2, , dydc1, sam15, lrc23, txnd3, dyi3 pol4, ce350 | - | 11
10 | ift140 st5 star3, frmd5, cl16a | - | 5
11 | ift20 ccd4 riad1, ptgr2 | - | 4
12 | ift27 dyx1c1 plsl, pif1 | comp212783_c5, comp224899_c1 | 6
13 | ift52 - ppct, clcn7 | comp223944_c0 | 4
14 | ift74 - uckl1, mpu1, a13cb | - | 4
15 | ift80 - - | comp211901_c0 | 2
16 | ift81 ccdc83 spg16 | - | 3
17 | ift88 spata17; - | comp224588_c3 | 3
18 | ropn1l;ropn1 rsph14 - | comp216541_c1 | 3
19 | rsph3;rsph3b rsph10b2;rsph10b, chste , glna, mrp1 cfap57;wdr65 | comp225031_c2 | 7
20 | rsph9 c4orf22, dpcd, tmem9, ndk7, tbb4b, tmem53 lrc34, ptprm, pgpsq, cp6d1, rtjk | - | 12
21 | ccdc164 tekt4, ak8, cfap61 bre1b | comp216883_c0 | 6
22 | ccdc37;cfap100 alg6, iqub ct111 | - | 4
23 | ccdc39, rsph1 morn3, sar1b asx | - | 5
24 | ccdc40 - csp | - | 2
25 | tekt2 ccdc81, spag6, reep5, yb145, nac1, apc1 | comp219894_c1, comp220759_c6 | 10
26 | ttc26 - copg2 | - | 2
27 | ttc29 c21orf59 lrrx | comp218081_c0 | 4
28 | ttc30a; ttc30b - srbp1 | - | 2
29 | efhc1 ppp4r4 trpv4, trpv5, cc148, kcrm, ugpa | - | 7
30 | gas8 ccdc147 - | comp224934_c0 | 3
31 | spef2 lpcat4 - | - | 2
32 | bbs9 bgn, zbbx rna103, bur3, cd63, rag1, h13, zfp26, dnli, a4, zn383, zn664, zbt44, zn814, star, uhmk1, fem1b, dtx3, lix1l, zn462, pkcb1, ycf3, rtjk, hnf4b, zc12a | comp213093_c1, comp214499_c2, comp216066_c3, comp217586_c2, comp220670_c1, comp220812_c0, comp221229_c1, comp222302_c0, comp222935_c2, comp225846_c0 | 33
33 | arl3; arl9 mdh1b litaf, tsn1, ca074, hemk2 | - | 6
34 | hydin dydc2;rgs22 odf3a, ksha, vmp1, | - | 6
35 | cluap1 ttll4, wdr88 cc74b | - | 4
36 | wdr19 ttc25 celf2 | - | 3
37 | wdr35 - catl | - | 2
38 | wdr78 lrriq3, iqcd prd10, vldlr, ube2w, kld10, hecd2, sphm, tppp2, mum1, 4cll7, sc6a9, entp7, galt5, sl9a9, moxd1, rgpa1, med15, rtf21 | comp209300_c0, comp212196_c1, comp214051_c1, comp214087_c1, comp214882_c3, comp215635_c0, comp216839_c0, comp217204_c0, comp217806_c0, comp217861_c2, comp218532_c0, comp218667_c5, comp219135_c0, comp219218_c0, comp219806_c0, comp219821_c0, comp220965_c1, comp221019_c2, comp222052_c0, comp222284_c0, comp223210_c1, comp223977_c0, comp225099_c0 | 41
Total | 45 41 125 | 70 | 280

We further identified the clusters among the 38 above that are most informative for implicating potential ciliary function. For instance, clusters 1, 7, 9, and 23 each contain more than one HC ciliogenesis candidate gene (Table 3.5, Figure 3.5). Cluster 1 in particular comprises a total of 30 genes, four of which are known ciliary genes in other species, and three of these are part of the HC gene set. The HC genes in this cluster are a dynein (dnai1), an intraflagellar transport gene (ift172), and a gene encoding a metabolic enzyme (ak7). The other known gene in this cluster is lrrc61, a leucine rich repeat containing gene. Lrrc61 is included in the known ciliary gene set because it is reported by Quigley et al. [144] and Chung et al. [121] as a target of the Rfx transcription factor in Xenopus laevis. These four known genes are clustered together with 8 annotated unknown and 18 non-annotated unknown P. dumerilii transcripts. To further examine the potential ciliary function of the unknown genes in this cluster, we verified whether their fold change of expression in the treated embryos is within the ratio interval expected based on the cell fate transformation triggered by the Az-treatment. Two of the annotated unknowns (kc1a and clax) and five of the non-annotated unknowns (comp190365_c0, comp202482_c0, comp215655_c4, comp221237_c0, comp224112_c2) exhibit an increase of expression within the ratio interval. Therefore, these seven upregulated unknown genes, with a pattern of expression similar to that of four known genes, can be considered potential ciliary genes in P. dumerilii, with five of them being potentially novel ciliary genes.

In contrast, cluster 7 is a smaller cluster containing four genes: one unknown annotated gene and three known ciliary genes, two of which are HC genes. More importantly, both of the HC genes in this cluster belong to the same gene family: the dynein heavy chain genes dnah2 and dnah10. The fact that this one unknown annotated gene (pkhl1) has a pattern of expression similar to that of three known genes suggests that it belongs to the ciliogenesis candidate genes in Platynereis (see Discussion).

Similar to cluster 7, cluster 23 contains four known genes, two of which are HC genes (ccdc39 and rsph1). Even though the HC genes in this cluster do not belong to the same gene family, we believe this cluster implicates the one unknown annotated gene (asx) as a potential new ciliary candidate gene, since it behaves very similarly to four known genes (morn3, sar1b, ccdc39, rsph1) (Figure 3.5).

In the case of cluster 9, there are five known genes, two of which are HC genes: an intraflagellar transport gene (ift122) and a nucleoside diphosphate kinase (nme7). The other three known genes in this cluster are a dynein light chain (dynll2), a DPY30 domain containing gene (dydc1) known to be involved in spermiogenesis [145], and a gene (morn2) encoding a MORN repeat containing protein that is known to be expressed in the testis and might also be involved in spermiogenesis [146]. Together with these five known ciliary genes, there are six unknown annotated genes in this cluster, five of which are within the expected ratio interval for the fold change between Az-treated and non-treated embryos. This cluster thus contains potentially five new ciliogenesis candidate genes among the unknown annotated transcripts.

Interestingly, there is another cluster (cluster 34; Table 3.5, Figure 3.5) with a DPY30 domain containing gene (dydc2) among the known ciliary genes. In this case, dydc2 is grouped with the HC ciliary candidate gene hydin as well as with four unknown annotated genes (odf3, ksha, vmp1, ube2h). Of these, the outer dense fiber of sperm tail 3 (odf3) gene encodes a filamentous protein located on the outside of the axoneme in sperm flagella [147]. Like the dydc genes, odf3 is involved in spermiogenesis, which again indicates that the co-expression approach is able to detect genes with similar function based on their expression profiles. This particular cluster, containing two known genes (hydin and dydc2) and four unknown annotated genes (odf3, ksha, vmp1, ube2h), again supports including the annotated unknown genes as potential new ciliogenesis candidate genes in P. dumerilii.

One additional cluster with multiple genes of the same family is cluster 5. This cluster contains one dynein from the HC known genes and three from the ciliogenesis candidate genes identified through GbA by gene name. The fact that four dyneins are present in this cluster indicates that the co-expression approach is indeed identifying groups of genes that have similar function (Figure 3.5, Table 3.5). Therefore, the unknown genes in this cluster are likely to have a function related to that of the co-expressed known genes. Cluster 5 contains a total of 15 genes, comprising 10 unknown and five known genes: the four dyneins and eif4g1 (eukaryotic initiation factor 4 gamma 1 gene), reported as a ciliary gene target of Rfx by Chung et al. [121]. Among the 10 unknown genes in this cluster are 8 annotated and 2 non-annotated transcripts. Six of the annotated unknown genes (gsto1, vigln, kcrt, lama4, tm9s2, and my18a) have a fold change of expression between the treated and the normal condition that falls within the expected ratio interval. This further indicates their potential involvement in ciliogenesis as possible targets of the Wnt/β-catenin pathway. Overall, the over-representation of dyneins among the known genes of this cluster supports the potential ciliary function of the unknown P. dumerilii transcripts in this cluster.

In general, the analysis of co-expression of the HC ciliogenesis genes in P. dumerilii allowed us to identify 125 potential new ciliogenesis candidate genes among annotated unknown P. dumerilii genes, and 70 potential novel ciliogenesis candidate genes among the non-annotated unknown transcripts. In the latter set in particular, the co-expression analysis allowed us to filter from 2137 non-annotated upregulated transcripts down to 70 transcripts that have highly similar expression profiles to the known ciliary genes. Among those, 46 transcripts not only behave similarly to known ciliary genes but are also affected by the treatment in the expected ratio interval. Hence, these 46 P. dumerilii non-annotated transcripts constitute reasonable candidates for experimental validation of their potential ciliary function, by testing whether their spatiotemporal expression correlates with that of the ciliated structures during P. dumerilii development.

Figure 3.5: Co-expression of high confidence ciliogenesis candidate genes. Expression levels (logCPM) over the first 14 hours of P. dumerilii development are shown for the 38 clusters of the 45 HC genes. The name of the HC gene in the cluster is shown above the plot; known genes in each cluster are shown in black. The complete list of genes is given in Table 3.5.

Known Ciliary Genes by Localization and Functional Domains

Under the premise that homologous genes have conserved functions, P. dumerilii genes identified as conserved/homologous to known ciliary genes from other species would be expected to have localization patterns similar to those described for their counterpart known ciliary genes. By extension, we suggest that P. dumerilii genes co-expressing with those of known localization would participate in a similar process and in some instances even co-localize. Hence, by examining the co-expression clusters of genes according to functional categories and/or structural ciliary subdivisions, we aimed at identifying co-localizing ciliogenesis candidate genes related to these categories.

Known ciliary genes compiled in this study were classified based on the localization and/or functional information provided by two of the seven sources included in our ciliary gene compilation: SysCilia [119] and the Ciliopathy review [132] (Table 3.1). Of the 469 genes with localization information, 371 are found conserved in P. dumerilii, and 213 are significantly upregulated by the Az-treatment, indicating their potential role in ciliary assembly. Furthermore, 176 are affected by the Az-treatment within the expected ratio interval that reflects the increase of ciliated cells in the treated embryos, adding supporting evidence to their potential involvement in ciliogenesis during P. dumerilii development.

Based on the reported localization and/or functional information, known ciliary genes were classified into structural components, functional components, and potential ciliogenesis precursors. The structural components include terms related to the axoneme, transition zone, basal body, central pair, and ciliary membrane, whereas the functional components include terms associated with ciliary motility, signaling, transport, and regulation. Lastly, the ciliogenesis precursors include centrosome, distal appendage, and ciliary vesicle related terms. Table 3.6 shows the number of genes found in P. dumerilii for each component as well as the number of genes significantly upregulated and within the expected fold change.
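The term-based grouping described above can be sketched as a simple keyword lookup. This is only an illustration: the component terms are taken from the text, while the function name, the substring matching rule, and the free-text annotation format are assumptions and not necessarily how the classification was carried out.

```python
# Component terms as listed in the text; the matching logic is an
# illustrative assumption, not the procedure used in the study.
COMPONENT_TERMS = {
    "structural": [
        "axoneme", "transition zone", "basal body",
        "central pair", "ciliary membrane",
    ],
    "functional": ["motility", "signaling", "transport", "regulation"],
    "ciliogenesis precursor": [
        "centrosome", "distal appendage", "ciliary vesicle",
    ],
}


def classify_annotation(annotation):
    """Return the sorted component classes whose terms occur in the annotation."""
    text = annotation.lower()
    return sorted(
        component
        for component, terms in COMPONENT_TERMS.items()
        if any(term in text for term in terms)
    )


# A gene can fall into more than one class, as noted for Cep19 in the text.
print(classify_annotation("Basal body; centrosome"))
```

Returning a list rather than a single label keeps genes with several reported localizations in every class they belong to.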

Table 3.6: Classification of known ciliary genes into functional and structural components based on localization data reported in other species. The table shows the number of known genes, the number found conserved in P. dumerilii, the number significantly upregulated, and the number within the expected ratio interval.

Component                  Known   Found   Upregulated   Ratio interval

Structural
  Basal body                 197     180            96               71
  Axoneme                    146     126           106               69
  Transition zone             42      42            26               19
  Central pair                23      17            16               16
  Ciliary membrane            27      24            11               12

Functional
  Transport                   56      46            32               27
  Regulation                  23      18             8                7
  Motility                     9       8             4                4
  Signaling                   12      10             5                5

Ciliogenesis precursors       79      67            43               33
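As a reading aid for Table 3.6, the counts can be converted into per-component fractions. The sketch below transcribes the counts from the table as printed; the dictionary name and the `summarize` helper are hypothetical and introduced only for illustration.

```python
# Counts transcribed from Table 3.6:
# (known, found in P. dumerilii, upregulated, within ratio interval).
TABLE_3_6 = {
    "Basal Body": (197, 180, 96, 71),
    "Axoneme": (146, 126, 106, 69),
    "Transition Zone": (42, 42, 26, 19),
    "Central Pair": (23, 17, 16, 16),
    "Ciliary membrane": (27, 24, 11, 12),
    "Transport": (56, 46, 32, 27),
    "Regulation": (23, 18, 8, 7),
    "Motility": (9, 8, 4, 4),
    "Signaling": (12, 10, 5, 5),
    "Ciliogenesis precursors": (79, 67, 43, 33),
}


def summarize(counts):
    """Express the counts of one component as fractions."""
    known, found, upregulated, in_interval = counts
    return {
        "found_fraction": found / known,
        "upregulated_fraction": upregulated / found,
        "in_interval_fraction": in_interval / found,
    }


summary = {name: summarize(counts) for name, counts in TABLE_3_6.items()}

for name, fractions in summary.items():
    print(f"{name:25s} found {fractions['found_fraction']:.2f}  "
          f"upregulated {fractions['upregulated_fraction']:.2f}")
```

Fractions of upregulated genes are taken relative to the genes found in P. dumerilii rather than to all known genes; this choice of denominator is an assumption of the sketch.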

We examined the co-expression clusters with known ciliary genes in the structural and functional components. About 170 clusters contain genes with known localization. Here, for each functional domain, we selected three genes frequently assessed in the literature to identify potential new and novel ciliogenesis candidate genes in P. dumerilii that are likely to be involved in the function of their co-expressing group.

Figure 3.6: Ciliogenesis candidate genes classified into structural components. Selection of three co-expression clusters per structural component. Co-expression plots show gene expression (logCPM) over the first 14 hours of P. dumerilii development. The schematic representation of the cilium indicates the expected localization of the gene products for each component. Known genes (shown in black) in each cluster have experimental validation for their ciliary localization/function in other species. For the current selection of genes, the number of known genes, new candidates (annotated), and potential novel (not annotated) ciliogenesis candidate genes is shown.

Identification of Structural Components

Among the genes known to be expressed in the cilium central pair of microtubules are hydin, spag17, and dnah1 (Figure 3.6A). The hydin gene encodes an axonemal central pair apparatus protein that is involved in cilia motility, and its function is associated with several ciliopathies related to sperm motility. As mentioned before, hydin co-expresses with the known ciliary gene dydc2 and four P. dumerilii annotated transcripts (odf3, ksha, vmp1, and ube2h). Of these, odf3 (outer dense fiber of sperm tails 3) is also involved in sperm motility and is known to localize on the outside of the central pair in sperm. Altogether, this cluster seems to capture a group of genes that are functionally related to ciliary motility and co-localize in the central pair. However, none of the non-annotated P. dumerilii transcripts are part of this co-expressing group.

Closely associated with genes in the central pair, we identified radial spoke genes, four of which are high confidence known ciliary genes (rsph9, rsph1, rsph3, and rsph4a). The co-expression patterns of three of the radial spoke genes are shown in Figure 3.6B. Of these, rsph9 co-expresses with three other known ciliary genes (c4orf22, dpcd, and tmem53) (Table 3.5, cluster 20). Among the annotated P. dumerilii genes co-expressing with rsph9 are tmem9, tbb4b, and lrc34, which also belong to gene families of known ciliary genes, suggesting that all genes in this co-expressing cluster are indeed functionally related. Similarly, rsph14 co-expresses with another known gene, ropn1, reported to be involved in motility of sperm flagella. This cluster also contains one unknown non-annotated transcript with an expression profile similar to that of the two known genes in the group, making it a potential novel ciliogenesis candidate gene in P. dumerilii.

Of the known ciliary genes with a reported role in the ciliary membrane (Figure 3.6C), 24 are found in P. dumerilii and 11 are significantly upregulated in the Az-treated embryos. Figure 3.6C shows the expression patterns of three (slc47a2, rab8, and tulp1) of the 11 known ciliary membrane genes. The co-expression clusters of these implicate 5 non-annotated and 9 annotated P. dumerilii transcripts as potentially co-localizing and/or having a ciliary membrane related function. Note that the expression profiles of slc47a2 and rab8 are highly correlated, as their expression values are higher in earlier stages of development (2 hpf) and gradually decrease as time progresses. This pattern is roughly opposite (although with a more subtle change) to that of the tulp1 co-expression cluster, which shows lower expression values during the earlier stages and a steady increase between 4 and 10 hpf (Figure 3.6C).
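One way to quantify statements such as "highly correlated" or "opposite" expression profiles is a correlation coefficient between two time courses. The sketch below uses the Pearson correlation as a stand-in similarity measure; this is an assumption, since the clustering procedure itself is not restated in this section. The logCPM values are invented placeholders and are not data from this study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Placeholder logCPM profiles over seven time points (illustrative only).
early_high_a = [8.0, 7.4, 6.9, 6.1, 5.8, 5.2, 5.0]  # high early, decreasing
early_high_b = [7.5, 7.1, 6.4, 5.9, 5.3, 5.1, 4.7]  # similar shape
late_rising = [4.0, 4.2, 4.9, 5.6, 6.0, 6.1, 6.2]   # low early, increasing

print(pearson(early_high_a, early_high_b))  # close to +1
print(pearson(early_high_a, late_rising))   # close to -1
```

Profiles with the same trend give a coefficient near +1, while a decreasing and an increasing profile give a value near -1, which mirrors the contrast drawn in the text.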

In regard to the axonemal component, 146 known genes are reported to localize in the axoneme; of these, 126 were identified in P. dumerilii, and 106 are found upregulated by the Az-treatment and form 51 co-expressing clusters. Figure 3.6D shows co-expression profiles of the genes evc2, arm4, and dnal1, which localize in the axoneme. They co-express with twelve annotated upregulated P. dumerilii transcripts and with 3 non-annotated transcripts. These fifteen transcripts are potential ciliogenesis candidates which, based on their co-expressing partners, would be expected to co-localize in the axoneme.

A smaller set of 42 known ciliary genes is reported to be functionally related to the transition zone of the cilium, all of which are found in P. dumerilii and 26 of which are upregulated by the Az-treatment (Table 3.6). All of the upregulated transition zone genes are part of 17 co-expression clusters. Figure 3.6E shows the expression plots for three of the transition zone genes: c5orf42, tmem138, and ephx2. C5orf42 encodes a ciliogenesis precursor, known as the ciliogenesis and planar polarity effector 1, and co-expresses with c2cd3, a known ciliary gene involved in centriole elongation. Along with these two transition zone genes there are five non-annotated and seven annotated P. dumerilii transcripts whose annotations have not been directly linked with ciliary function. Similarly, tmem138 has nine annotated and four non-annotated co-expressing partners, whereas ephx2 co-expresses with only one annotated gene (Figure 3.6E). This indicates that the unknown genes in this set may be related to the transition zone and may even co-localize with its components, since the c5orf42 and c2cd3 cluster contains multiple transition zone genes. However, further study is required to validate this prediction.

From the basal body structural component, for instance, we observed the co-expression pattern of the well characterized known ciliary genes encoding the centrosomal proteins Cep76 and Cep19, and the centrosome and spindle pole associated protein 1 (Cspp1). Of this group of known ciliary genes, Cep76 regulates centrosomal duplication [148] and is involved in the docking of the basal body to the cell membrane. Similarly, Cep19, which is also part of the ciliogenesis precursors domain, is involved in basal body assembly as it organizes and anchors microtubules to the centrosome. Cep19 is also known to initiate early stages of ciliary assembly by recruiting ciliary vesicles to the distal appendages of the centrioles [149]. Finally, the third known gene in this set, cspp1, has a role in regulation of cell cycle progression, spindle organization, and basal body formation [132]. Among the P. dumerilii annotated transcripts that co-express with these basal body genes are sir3, s17a5, ycf3, faf2, musk, csk, and others. Cep19 is the known gene in this set that co-expresses with 5 unknown non-annotated transcripts, which are potentially novel ciliogenesis candidates with functions related to the basal body (Figure 3.6F).

Since one of the sources for ciliary function/localization of the known ciliary genes compiled in this study is the ciliopathy review by Reiter et al. [132], most of the genes with this information are also genes associated with a ciliopathy. Of the genes in the set described above (basal body), cep19 and cspp1 have been associated with ciliopathies: cep19 with spermatogenic failure, Bardet-Biedl Syndrome, and morbid obesity [150], and cspp1 with Joubert Syndrome 21 and Meckel Syndrome, Type 1 [132].

Identification of Functional Components

Among the 371 known ciliary genes with localization/functional information, only 12 have functional terms related to signaling. Ten of these known ciliary genes were found conserved in P. dumerilii and five are upregulated in the treated embryos. Figure 3.7A shows the co-expression patterns of three of these genes. All of the genes selected in this functional component belong to the kinesin family (kif9, kif27, and kif17). In general, we identified 39 kinesin genes in P. dumerilii, of which 19 are upregulated and within the expected fold change of expression. However, no genes annotated as kinesin appear to co-express with any of the three selected kif genes with known signaling function. Among the co-expression partners of kif9, kif27, and kif17 are 18 annotated and 12 non-annotated P. dumerilii transcripts.

Only nine of the known ciliary genes are functionally annotated with motility related terms (Table 3.6). Upregulated genes in this functional component include drc3, gas8, stk36, and dpcd, which are present in four co-expression clusters, three of which are shown in Figure 3.7B. Among these is drc3 (also called lrrc48), a gene encoding a dynein regulatory complex and a key regulator of ciliary motility by maintaining the alignment of the distal and regulating microtubule sliding [151]. P. dumerilii drc3 co-expresses with two known ciliary genes of the coiled-coil domain-containing gene family, ccdc19 (also known as cfap45) and ccdc166, and with two unknown annotated P. dumerilii transcripts known as hecd1 and chic2. A different co-expression cluster contains gas8, a gene that also encodes a dynein regulatory complex (drc4). Gas8 co-expresses with another ccdc gene and with one non-annotated transcript. Finally, of this set of known genes involved in ciliary motility, stk36, a kinase gene also known to be involved in the Hedgehog pathway, is part of the largest co-expression cluster, with seven unknown genes (no reported ciliary function): five annotated and two non-annotated transcripts.

Figure 3.7: Ciliogenesis candidate genes classified into functional components. A selection of three co-expression clusters per functional component. Co-expression plots show gene expression (logCPM) over the first 14 hours of P. dumerilii development. The schematic representation of the cilium indicates the expected localization of the gene products for the signaling and transport components. Most of the genes in the motility and regulation components are not localized in a particular region of the cilium, as their expression affects the whole cilium. Known genes (shown in black) in each cluster have experimental validation for their ciliary localization/function in other species. For the current selection of genes, the number of known genes, new candidates (annotated), and potential novel (not annotated) ciliogenesis candidate genes is shown.

Among the functional components, the transport domain contains the most known genes (56 in total, Table 3.6). Of the 56 known ciliary genes involved in transport, we identified 46 homologous genes in P. dumerilii. In addition, among the known ciliary genes involved in transport we further identified intraflagellar transport (IFT) and lipidated protein intraflagellar targeting (LIFT) trafficking machinery. Figure 3.7D shows the co-expression pattern for three representative IFT genes (ift40, ift20, and ift88), also shown in clusters 10, 11, and 17 of the HC known ciliary genes (Table 3.5, see Discussion). Figure 3.7D additionally depicts three genes involved in trafficking machinery (pde6d, arl3, and arl13b). The pde6d gene encodes a protein known to modulate cilia localization of target proteins, and in P. dumerilii it co-expresses with a known ciliary gene that encodes a kinesin involved in microtubule anterograde transport and centriole cohesion to the distal appendage during initiation of ciliogenesis. These two known ciliary genes co-express with the annotated genes trpm2, y1869, cacl1, dhb12, bsdc1, pkh4, smbt, and pmgt1, and with one P. dumerilii non-annotated transcript.

The other two clusters of the trafficking machinery involve two members of the ADP-ribosylation factor like GTPase family, arl3 and arl13b, both with known function in ciliary maintenance and cargo trafficking to the periciliary membrane. In addition, both arl genes are associated with ciliopathies [152]. Arl3 co-expresses with the known ciliary gene mdh1b and four annotated transcripts not previously related to ciliary function. Arl13b is contained in a smaller cluster, co-expressing with only one annotated gene (cpt1). Despite the fact that both arl genes are involved in similar functions, their expression profiles over early P. dumerilii development are noticeably different. Arl3 exhibits a constant increase of expression, whereas arl13b is highly expressed throughout all measured time points (Figure 3.7D).

The regulation domain contains 23 known ciliary genes, 18 of which are found conserved in P. dumerilii and 8 of which are upregulated in the Az-treated embryos. Three of these encode the transcription factors FoxJ1, Rfx2, and Mxi1. With our current clustering parameters, the P. dumerilii homologue to rfx1/2/3 does not co-express with any other upregulated gene. Besides transcription factors, other known ciliary genes with regulatory function include syne2 and cetn2, which are known targets of Rfx2 and FoxJ1, respectively, and both are known to be involved in maintaining cytoskeleton organization. Figure 3.7C shows the co-expression patterns of four of the known ciliary genes with regulatory function (foxj1, syne2, cetn2, and mxi1) and two ciliary genes known to be targets of FoxJ1 and Rfx2.

Among all known ciliary genes compiled in this study, 817 genes are reported as targets of both Rfx2 and FoxJ1; of those, 715 are found conserved in Platynereis dumerilii, whereas 392 are upregulated in the Az-treatment. Based on high throughput studies in X. laevis [121, 133], a similar number of genes (713) is reported as targets of Rfx2 only; of those, 599 are found in P. dumerilii and a smaller than expected subset of 152 is upregulated in the treated embryos. Similarly, most of the genes reported as targets of FoxJ1 based on studies in D. rerio and X. laevis [110, 133] were found in P. dumerilii and 82 are significantly upregulated by the treatment (Table 3.3). Figure 3.7C shows three selected targets of Rfx and FoxJ1.

The transcription factor Mxi1, known to inhibit transcriptional activity, is one of the Rfx2 targets [121] shown in Figure 3.7C. In P. dumerilii, mxi1 co-expresses with 8 annotated and 4 non-annotated transcripts, suggesting their potential involvement in a similar function. Other Rfx targets are syne2, a known ciliary gene with reported regulatory function, and usp25. The latter co-expresses with seven annotated and 2 non-annotated transcripts, whereas syne2 has only 2 co-expression partners, both among the annotated genes.

Of the FoxJ1 targets, cetn2 encodes a microtubule binding protein with a role in the microtubule-organizing center. It co-expresses with another known ciliary gene, c20orf85, and with one annotated and one non-annotated P. dumerilii transcript. In a larger cluster, a FoxJ1 target, tekt2, co-expresses with three known ciliary genes (Table 3.5, cluster 25). Finally, homologous foxj1 in P. dumerilii co-expresses with three annotated genes not previously associated with ciliary function and one non-annotated transcript.

Ciliogenesis Precursors

Based on the functional information of the known ciliary genes in other species, we identified genes involved in three stages of early ciliogenesis: 1) centriole maturation into a basal body, 2) association of centriole distal appendage with ciliary vesicles, and 3) formation of centriolar satellites (see Figure 3.8). A total of 79 known ciliary genes have been functionally related to these stages of ciliogenesis, which together form the ciliogenesis precursors functional domain.

Figure 3.8: Potential ciliogenesis precursors identified by co-expression. Schematic representation of three early ciliogenesis events along with the co-expression plots of genes with known functions in each ciliogenesis stage. Plots show gene expression (logCPM) over the first 14 hours of P. dumerilii development. Known genes (shown in black) in each cluster have experimental validation for their ciliary localization/function in other species. For the current selection of genes, the number of known genes, new candidates (annotated), and potential novel (not annotated) ciliogenesis candidate genes is shown.

Of the genes involved in centriole maturation to form the basal body, Figure 3.8A shows the co-expression profiles of three known ciliary genes: cep63, cep152, and ttll5. Cep63 is known to initiate centrosome duplication by recruiting cep152. The expression profile of cep152 shows a significant increase in expression level between 4 and 8 hpf that peaks at the 8 hpf time point. In contrast, cep63 appears to be constantly highly expressed during the first 14 hours of P. dumerilii development. The expression pattern of ttll5, which encodes the tubulin tyrosine ligase like 5 protein, shows levels of expression similar to those of cep63. The co-expression patterns of these three genes allow us to potentially implicate 6 annotated and 3 non-annotated P. dumerilii transcripts as novel ciliogenesis candidate genes (see Discussion).

In regard to ciliogenesis precursors related to centriole distal appendage docking, five genes (cep164, cep89, cep83, fbf1, and sclt1) are known to have an essential role in this stage of ciliogenesis [153, 154, 155]. We identified homologous genes in P. dumerilii for all five of these genes. Figure 3.8B shows the co-expression profiles of cep164, cep89, and fbf1. Altogether, cep164 and cep89 co-express with five annotated and two non-annotated transcripts. In addition, cep89 co-expresses with another known gene, stk38, which encodes a serine/threonine kinase involved in the Hedgehog signaling pathway. In contrast, fbf1 was not found to co-express with any other gene, despite its expression profile exhibiting a clear onset of expression between 6 and 10 hpf, similar to what is observed for cep89 and cep164 (see Figure 3.8B for details).

Lastly, among the known ciliary genes involved in the formation of centriolar satellites, structures that act as an intermediate compartment for ciliogenic proteins, are ofd1, c2cd3, cep290, and cep76. The co-expression profiles of ofd1, cep290, and cep76 are shown in Figure 3.8C. Ofd1 has a rather constant expression level with little variability, and it co-expresses with five annotated and one non-annotated P. dumerilii transcripts. In contrast, cep290 and cep76 have lower levels of expression earlier in development, with increasing values as development progresses. Cep290 exhibits a larger change between 6 and 10 hpf compared to that of cep76. Cep290 co-expresses with the known ciliary gene map7, a gene that encodes a microtubule organization protein. Along with these known genes there are one non-annotated and two annotated P. dumerilii transcripts. In the co-expression cluster with cep76, five annotated and two non-annotated transcripts were identified as potential ciliogenesis candidate genes. Finally, c2cd3 is also reported to localize in the axoneme, and its co-expression in P. dumerilii is described under the axonemal structural component (Figure 3.6E).

Discussion

In this study, we present the first comprehensive survey of ciliary genes in P. dumerilii. To this end, we developed a rigorous, computational, multi-stage strategy for the identification of genes involved in ciliogenesis of P. dumerilii. Our approach not only includes the identification of genes homologous to known ciliary genes described in other species, but further expands the set of known ciliary genes by utilizing a guilt-by-association approach based on gene family name, and by integrating expression data from treated embryos resulting in hyperciliated P. dumerilii larvae. In addition, we integrated a co-expression analysis based on expression data from the first fourteen hours of normal P. dumerilii development.

Despite the fact that most of the source studies included in this compilation of known ciliary genes report genes in vertebrate model organisms such as mouse, frog, zebrafish, and human, we successfully identified the majority of these genes in the annelid P. dumerilii. Only 172 genes out of the 2359 known ciliary genes were found to have no homolog among the early development P. dumerilii transcripts we obtained from PdumBase [143]. For instance, based on sequence similarity alone, of the 302 genes of the 'Ciliary gold standard' provided by the SysCilia consortium [119], only six genes were not found in P. dumerilii (including c2orf71, cldn2, mal, sstr3, and tubgcp5), two of which are associated with ciliopathies. Further genomic and phylogenetic analysis will be required to determine whether these, as well as all the 172 known genes not yet found in P. dumerilii, have homologs or are actually not present in P. dumerilii.

In addition, as expected, we identified in P. dumerilii the majority of the ciliary genes reported by Nevers et al. [156] as conserved across all metazoans. For instance, Nevers et al. identified six transition zone genes (including cc2dc2a, b9d1, b9d2, and tmem231) as conserved in all ciliated organisms. Of those, all but one (cc2dc2a) are found conserved in P. dumerilii, three of which are upregulated in the treated embryos.

Interestingly, among all homologous known ciliary genes identified in P. dumerilii, less than half (611) were found to be significantly upregulated in the Az-treatment, which is conducive to forming hyperciliated larvae. Hence, we focused our downstream analysis on those significantly upregulated genes, allowing us to identify about 160 potential new and novel ciliogenesis candidate genes by integrating information such as gene family name and expression profiles into our pipeline.

It should be noted that among the downregulated genes, and the genes showing no significant change of expression in the Az-treated embryos, there are genes known to be involved in ciliogenesis in other species. However, our analysis suggests that their expression is not affected by the Az-treatment (Figure 3.1D) in P. dumerilii. It is possible for these genes to still be involved in ciliogenesis in P. dumerilii but regulated by a mechanism different from the one triggered by the activation of the β-catenin pathway. Under this scenario, such a set of genes could have a delayed onset of expression, and hence their expression was not observed at the time we took the sample to assess relative mRNA abundances. Alternatively, these could be ciliary genes with additional functions, and therefore under diverse regulatory control.

Included in the known ciliary genes that are downregulated in the treated embryos are 53 genes associated with ciliopathies, some of which are well characterized ciliogenesis precursors. One example is the P. dumerilii homologue to the known ciliary gene ccno, which encodes centriole duplication cyclin O. Ccno is one of the genes involved in the initial stages of ciliogenesis in multiciliated cells (MCC), and its dysfunction has been linked to ciliopathies related to a reduced number of cilia in MCC [157]. Given its known function, ccno was expected to be upregulated in the hyperciliated larvae. Similarly, sclt1 is also known to be required to initiate ciliogenesis [154, 155], yet it is found to be downregulated by the treatment in P. dumerilii.

The above results hint that the onset of expression of these genes occurs at a prior or later stage in development and is hence not captured in the 12 hpf mRNA sample used for differential expression in our experimental setup. However, since these genes are known to be involved in the earlier stages of ciliogenesis, and because the P. dumerilii larva starts to develop the protrochophore (ciliated ring, Figure 3.1A) at around 12 hpf, we are inclined to suggest that such genes, if involved in ciliogenesis, could be highly expressed at an earlier time point in development, potentially maternally and/or ubiquitously expressed. Alternatively, a different set of molecular components, not described in other organisms, could be involved in the initiation of ciliogenesis in MCC in P. dumerilii. Further study is required to elucidate whether or not ccno and sclt1 (among the other downregulated genes) are involved in ciliary assembly in P. dumerilii.

Identification of Ciliogenesis Precursors and Transcriptional Regulators

Multiciliated cell (MCC) differentiation involves a massive production of basal bodies. This process is well characterized in different vertebrates (chicken, Xenopus, and mouse) and is known to involve two pathways: centriole duplication and centriole independent de novo basal body formation, also called the "deuterosome pathway" [158, 159]. The centriole duplication pathway produces two to six basal bodies and is initiated by cep63, a protein required for centriole duplication that recruits cep152, a key regulator of basal body assembly [158]. The homologous genes to cep63 and cep152 in P. dumerilii (Figure 3.8A) are significantly upregulated by the Az-treatment. Cep63 co-expresses with vash1, which regulates microtubule dynamics [160], and with med9, which is involved in transcriptional regulation and encodes a protein subunit of the mediator complex that is a co-activator of polymerase II. Cep152 co-expresses with two genes (limk2 and wasp1) known to be involved in regulation of the cytoskeleton.

On the other hand, the "deuterosome pathway" produces tens of basal bodies and is initiated by ccdc67 (also called deup1), a paralog of cep63. In addition to ccdc67, ccdc78 is also involved in the initiation of this pathway in Xenopus [161]. We identified homologous genes to ccdc67 and ccdc78 in P. dumerilii. However, in contradiction with the expected upregulation of basal body biogenesis regulators involved in MCC differentiation, P. dumerilii ccdc67 is found downregulated and ccdc78 has no significant change of expression in the Az-treated embryos. This result suggests that de novo basal body formation in P. dumerilii could be initiated by different molecular components.

The above results are in agreement with basal body biogenesis in other protostomes. Studies in the planarian S. mediterranea indicate that basal body assembly depends on the same components for centriole duplication as those described in vertebrates, yet the initial steps of the "deuterosome pathway" are likely to be different, as depletion of ccdc67 (deup1) in S. mediterranea has been shown to have no effect on MCC differentiation [162, 163].

Alternatively, finding these well-characterized ciliogenesis precursor genes (ccdc67 and ccdc78) not upregulated in the hyperciliated embryos could indicate that these genes are maternally provided or highly expressed at an earlier stage of development (prior to our Az-treatment), and that they could have additional general functions in earlier cell divisions and therefore be regulated differently. Under such a scenario, these genes could still be involved in basal body duplication without being upregulated in the hyperciliated larvae.

In regard to the transcriptional regulation of ciliogenesis, studies in vertebrates suggest that MCC differentiation is initiated by the inhibition of the Notch pathway, which in turn triggers the activation of a regulatory cascade that includes two geminin-related genes, gemc1 and mcidas (also called multicilin), and the transcription factors E2f4, Rfx2, FoxJ1, and Myb. Prior to the beginning of ciliary assembly, differentiating into a MCC requires the committed cell to exit the cell cycle, form numerous basal bodies, and remodel its cytoskeleton [158, 159]. Expression of gemc1 and mcidas is known to be enough to trigger the initiation of such events, hence initiating MCC differentiation [158].

We identified P. dumerilii genes homologous to geminin, mcidas, e2f4, rfx1/2/3 (one P. dumerilii transcript is homologous to three rfx genes), foxj1, and myb. Of those, geminin, e2f4, and myb are not significantly upregulated, while mcidas, rfx, and foxj1 are upregulated by the β-catenin activation treatment. In addition, under our current clustering parameters for the early development co-expression analysis, we found no genes to co-express with mcidas and rfx. Of the three upregulated genes, the expression levels of rfx, in comparison to those of the other two genes, are constantly high during the first 14 hours of development. In contrast, both foxj1 and mcidas show a gradual and constant increase of expression as development progresses. Identifying these components in P. dumerilii is the first step towards the characterization of the regulatory mechanism driving ciliogenesis in Platynereis. However, functional assays are required to elucidate the interplay of these six (including geminin, e2f4, and myb) molecular components in the regulatory landscape of MCC differentiation during P. dumerilii development.

Potential Novel Ciliogenesis Candidate Genes Based on Co-expression Analysis

The co-expression analysis makes it possible to identify ciliogenesis candidates in an annotation-independent manner and extends the search space for ciliogenesis candidate genes by an additional 2137 non-annotated upregulated transcripts found in P. dumerilii. These transcripts represent almost half of the total genes upregulated by the Az-treatment. For this set of non-annotated transcripts, however, very limited information exists about their potential involvement in ciliogenesis. A comparison based on sequence similarity revealed that these transcripts do not appear to be homologous to any of the 2359 known ciliary genes compiled in this study. By including the expression data, however, we are able to identify not only potentially novel ciliogenesis candidate genes but also potential targets of the activation of the wnt/β-catenin pathway, by comparing their expression in the Az-treated embryos with that in non-treated embryos.

The clusters of genes co-expressing with the high confidence (HC) ciliogenesis candidate genes allowed us to identify 70 non-annotated transcripts as potential novel candidate ciliary genes in P. dumerilii. Of these 70 transcripts, 46 exhibit a change of expression in the treated embryos that is within the expected ratio interval based on the cell fate transformation. These 46 non-annotated transcripts, with patterns of expression similar to those of the HC ciliogenesis candidate genes, are therefore considered potential wnt/β-catenin targets.

Based on their co-expression profile, 6 of the 46 potential novel ciliogenesis candidate genes are predicted to have a function similar to that of the WD repeat domain-containing gene wdr78, known to contribute to ATP-dependent microtubule motor activity by binding to dyneins [164]. Other known ciliary genes co-expressing with wdr78 are iqcd and lrriq3. Interestingly, iqcd encodes an IQ domain-containing protein that is a component of the dynein regulatory complex (DRC10) [165], further indicating that genes in this cluster might have a role related to ciliary motor activity. This cluster (cluster 38, Table 3.5) is the largest of the HC ciliogenesis candidate gene co-expression clusters. It not only implicates the above-mentioned 6 P. dumerilii non-annotated transcripts as potential novel ciliary candidate genes, but also includes 17 annotated transcripts whose annotations have not yet been associated with ciliary function in other organisms. Our results, however, suggest that 15 of these annotated transcripts are potentially involved in ciliogenesis, as they are significantly upregulated by the Az-treatment, their change of expression in the treated P. dumerilii embryos falls within the expected ratio interval, and their expression pattern during normal development correlates with that of three known ciliary genes involved in microtubule motor activity.

Similarly, cluster 1 contains 5 of the 46 potential novel ciliogenesis candidate genes. These candidates co-express with four known genes (dnai1, ak7, ift172, lrrc6) and 8 annotated genes, two of which (clax and kac1a) are affected by the treatment within the expected fold change. To elaborate on the potential ciliary activity of the unknown genes in this cluster, we examined the function of the known ciliary genes they co-express with. Among those, there are three high confidence ciliary genes: a dynein axonemal intermediate light chain (dnai1), an intraflagellar transport gene (ift172), and a gene encoding a metabolic enzyme, adenylate kinase 7 (ak7), which has been reported to be involved in maintaining ciliary structure and function and is known to be expressed along the full length of the sperm flagellum, co-localizing with alpha-tubulin [166]. In addition to these three HC ciliary genes, lrrc6, a known gene encoding a leucine-rich repeat and IQ domain-containing protein, is also present in this cluster. This protein is known to be involved in axonemal assembly and has been linked with ciliopathies associated with respiratory infections and male infertility [167]. Interestingly, mutations in the ak7 gene have also been linked with male infertility [166]. Therefore, clax and kac1a, genes that have not been linked with ciliary function in other species, constitute potential new ciliogenesis candidate genes in P. dumerilii. We further hypothesize that the 8 non-annotated P. dumerilii transcripts found in this cluster are potential novel ciliogenesis genes that might be involved in maintaining ciliary structure and/or in axoneme assembly-related functions.

Three of the identified potential novel ciliogenesis candidate genes in P. dumerilii are possibly related to the coiled-coil domain-containing gene family. The first of these potential novel genes (cluster 21, Table 3.5) co-expresses with the HC known ciliary gene ccdc164 and with three known ciliary genes: cfap61, ak8, and tekt4. Of these, ak8 and tekt4 were identified here as ciliogenesis candidate genes through the guilt-by-association (GbA) by gene name approach (Table 3.4). Both ccdc164 and tekt4 encode coiled-coil domain-containing proteins. Ccdc164 is known to be a key regulator of ciliary motility, as it is part of the nexin-dynein regulatory complex (N-DRC), which maintains the alignment of the distal axoneme and regulates microtubule sliding to allow movement [151]. Similarly, genes of the tektin family encode proteins that form filaments known to associate with tubulin in the axonemal and centriolar microtubules. In the axoneme, the tektin filaments are involved in the organization of the dyneins and radial spokes. In addition, reduced expression of tektins has been linked with shortening of cilia and flagella [168, 169]. The non-annotated P. dumerilii transcript (comp216883_c0) co-expresses not only with the above-mentioned coiled-coil domain-containing genes, but also with the known ciliary gene cfap61, which encodes cilia and flagella associated protein 61 [170], adding supporting evidence for its potential involvement in ciliary function.

In a very similar co-expressing group of genes (Figure 3.5, cluster 25), also involving two coiled-coil domain-containing genes, we identified two additional potential novel ciliary candidate genes. This co-expressing set contains tekt2, from the HC known ciliary gene set, and ccdc81, which was found through guilt-by-association by gene name (Table 3.4) and encodes a centrosome-associated protein [171]. Other known ciliary genes in this cluster include spag6 (sperm associated antigen 6), encoding a protein that associates with microtubules in the sperm flagellum [172], and znf346, which is included in the set of known ciliary genes because it is reported by Chung et al. [121] to be a target of Rfx2 in Xenopus laevis. In addition, this cluster implicates four annotated P. dumerilii transcripts (reep5, yb145, nac1, apc1) whose annotations have not previously been related to ciliary function. Of those four annotated genes, three (reep5, yb145, apc1) exhibit the expected fold change of expression between Az-treated and control samples. We therefore consider these three genes potential new ciliogenesis candidate genes, along with the two potential novel ciliogenesis candidate non-annotated P. dumerilii transcripts.

In the set of 48 HC ciliogenesis candidate genes, there are 10 intraflagellar transport genes (see Table 3.4). With 10 genes each in the set, intraflagellar transport and the dyneins represent the largest gene families among the HC genes. To our surprise, none of the ift genes in the HC set appear to have similar patterns of expression among themselves, as they all belong to different co-expression clusters (clusters 1 and 9 to 17; Table 3.5, Figure 3.5). However, five non-annotated transcripts that could potentially be involved in ciliary function co-express with ift genes. Three of these P. dumerilii non-annotated transcripts each co-express independently with an ift gene (clusters 13, 15, and 17), and two jointly co-express with the P. dumerilii homolog of ift27 in cluster 12. This cluster contains one more known ciliary gene, dyx1c1, which is part of the inclusive core of the known ciliary genes, as it is reported by SysCilia [119] and by the ciliopathy review by Reiter et al. [132] as a gene confirmed to be involved in a ciliopathic phenotype. In addition, dyx1c1 is included in our compilation of known ciliary genes as a target of Rfx and FoxJ1 [133, 111]. Dyx1c1 (also known as dnaaf4) is known to be required for the assembly of axonemal dynein arms, and mutations in this gene lead to defects in cilium motility [173]; it has also been linked to primary ciliary dyskinesia and other ciliopathies in humans [174]. P. dumerilii dyx1c1 co-expresses with ift27, two non-annotated transcripts, and two annotated transcripts (pls1 and pif1) whose annotations have not been related to ciliary function in other species. Pls1, part of the plastin gene family (also known as fimbrin), encodes an actin-binding protein involved in regulation of the actin cytoskeleton and in the formation of actin bundles in microvilli and stereocilia [175]; however, it is not known to be involved in ciliary functions. Similarly, pif1 encodes a metabolic enzyme that functions as a DNA helicase. Therefore, the supporting evidence to implicate the non-annotated genes in this cluster is inconclusive. On the one hand, the two non-annotated transcripts were found to co-express with one high confidence known ciliary gene, ift27, and with dyx1c1, a well-supported known ciliary gene that is part of the inclusive core. On the other hand, two annotated transcripts in this cluster are affected by the Az-treatment within the expected fold change (pls1: r = 1.71, and pif1: r = 1.93) but have annotations not known to be associated with ciliary function. Such a scenario requires further study, as the non-annotated transcripts could be functionally similar to either set of genes in this cluster.

In contrast, the co-expressing partners of the other ift genes in clusters 13, 15, and 17 could potentially implicate novel ciliary candidates. For instance, cluster 17 contains ift88 together with one other known ciliary gene, spata17, a spermatogenesis-associated gene [165], and one non-annotated P. dumerilii transcript (comp224588_c3, Table 3.5). The fact that this unknown gene has an expression profile very similar to that of two known genes (Figure 3.5, cluster 17) provides evidence for its potential involvement in ciliary function during P. dumerilii development. In addition, this gene is upregulated by the Az-treatment with a fold change consistent with the phenotypic change triggered by the treatment, making it a potential novel ciliogenesis candidate gene.

Similar to ift88, the expression profile of ift80 allows us to implicate another potential novel ciliary gene. In this case, the P. dumerilii homolog of ift80 co-expresses with a non-annotated transcript, and the two show highly similar expression profiles (Figure 3.5, cluster 15) while being the only two genes in this cluster. Cases like this, in which a cluster consists of one homolog of a well-characterized known ciliary gene and one unknown non-annotated transcript, represent an unprecedented opportunity to explore the potential nature of the non-annotated transcript. In this particular example, the unknown gene in the cluster could be a splice variant of its co-expressing partner (ift80) or a potential ift paralog in P. dumerilii. Resolving this is outside the scope of our study; however, the ability to identify such potential candidates among all the non-annotated (∼9000) P. dumerilii transcripts makes it possible to improve the current transcriptome annotation while identifying potential ciliogenesis candidate genes.

Interestingly, the 38 HC clusters include 15 genes that were identified as ciliogenesis candidate genes through the guilt-by-association approach by gene family name (shown in gray in Table 3.4). For these fifteen genes, we therefore have two independent approaches implicating them in ciliary function in P. dumerilii. First, the co-expression analysis for these genes adds confidence in considering them ciliogenesis candidate genes. Second, finding co-expression among genes that were already identified as potential candidates provides a strong indication of the potential ciliary function of the unknown genes classified into their clusters.

One such case occurs in cluster 5 (Table 3.5, Figure 3.5), which contains five known genes, four of which are dynein genes (see Results). The 10 unknown genes in this cluster are considered potential ciliogenesis candidates, and among those, six are potentially regulated by the activation of the wnt/β-catenin pathway, as their fold change of expression in the treated embryos falls within the expected ratio interval that reflects the increase of ciliated cells due to the cell fate transformation.

Similarly, co-expression clusters with more than one HC gene also provide insight into potential new ciliogenesis candidates and potentially novel ciliary genes. For instance, among the HC clusters, cluster 7 (Figure 3.5, Table 3.5) has two HC genes of the same gene family, encoding the ciliary motor dynein proteins Dnah2 and Dnah10, which are grouped together with another known ciliary gene (fgd3) and one annotated unknown gene (pkhl1). The fact that this cluster has three known ciliary genes, two of which are HC genes belonging to the dynein axonemal heavy chain gene family and the third being a known ciliary gene whose function is related to the control of actin cytoskeletal dynamics [176], allows us to implicate the unknown gene pkhl1 (homologous to mammalian polycystic kidney and hepatic disease 1-like 1, also known as pkhd1l1) as a new ciliogenesis candidate gene in P. dumerilii. In addition, the sub-cellular localization reported for pkhl1 in mouse includes the cilium [177], even though pkhl1 is not referenced as a ciliary gene in any of the sources used for our known ciliary gene compilation (Table 3.1).

Conclusion

We present here the first comprehensive survey of ciliogenesis candidate genes and potential novel ciliary genes in Platynereis dumerilii. In addition, based on functional descriptions of their homologous counterparts in other species, a probable function and/or localization is proposed for some of the candidate genes.

Despite the absence of a P. dumerilii reference genome, our approach makes it possible to identify potential new and novel ciliogenesis candidate genes among all the significantly upregulated P. dumerilii transcripts, with and without annotation. The importance of this identification lies in the inclusion of poorly characterized transcripts that have been shown not to be homologous (by sequence) to any annotated gene in other species. With our current approach, we are nevertheless able to suggest the potential ciliary activity of 70 such non-annotated P. dumerilii transcripts. Of those, 46 are potential targets of the β-catenin pathway activated by the treatment.

In addition, our approach identified 125 potential new ciliogenesis candidate genes in P. dumerilii among annotated transcripts whose annotations have not previously been related to ciliary function. This not only improves the current status of the early P. dumerilii transcriptome annotations, but also contributes to the elucidation of ciliogenesis by expanding the set of known ciliary genes.

Further studies will be required to functionally validate the new and novel ciliogenesis candidates compiled here. Nevertheless, this represents the first compilation of ciliogenesis candidate genes of this kind in a protostome organism. This compendium of candidate genes therefore represents a valuable resource for future comparative studies of ciliary function and diversity in invertebrates.

Methods

Compilation of a Comprehensive Set of Known Ciliary Genes

A comprehensive list of known ciliary genes was obtained by combining information from seven different ciliary studies, including two ciliary databases: SysCilia [119] and Cildb [130]. The remaining studies consisted of a survey of conserved ciliary genes across different taxa by Sigg and collaborators [131]; a ciliopathy review by Reiter and Leroux, in which confirmed and candidate ciliopathy genes are reported [132]; and three independent studies identifying ciliary genes that are targets of the FoxJ1 and/or RFX3 transcription factors [133, 110, 178]. To consolidate duplicate entries, we mapped all gene identifiers to Ensembl gene stable IDs from the human genome annotations (version GRCh38.p12) [179]. This gene ID was subsequently used as the unique identifier to aggregate multiple instances of the same gene into one representative record in the set. However, we kept track of each source from which a gene was added to the known ciliary gene set, to enable the identification of genes listed by multiple sources. For only a small number of genes (<5), the gene ID was not found in the human annotations, in which case the Ensembl mouse gene stable ID was used as a surrogate.
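The consolidation step described above can be sketched as follows. This is a minimal illustration and not the code used in this work: the function name `consolidate`, the gene identifiers, and the source labels are hypothetical, and the sketch assumes every entry has already been mapped to an Ensembl gene stable ID.

```python
from collections import defaultdict


def consolidate(records):
    """Merge (ensembl_id, source) pairs into one record per gene.

    Each gene keeps the sorted list of every source that reported it,
    so genes listed by multiple sources remain identifiable.
    """
    merged = defaultdict(set)
    for ensembl_id, source in records:
        merged[ensembl_id].add(source)
    return {gene: sorted(sources) for gene, sources in merged.items()}


# Hypothetical identifiers and source labels, for illustration only.
records = [
    ("ENSG_A", "SysCilia"),
    ("ENSG_A", "Cildb"),
    ("ENSG_B", "Reiter"),
    ("ENSG_A", "SysCilia"),  # duplicate entry from the same source
]

known = consolidate(records)
multi_source = [gene for gene, sources in known.items() if len(sources) > 1]
```

Genes whose record lists more than one source (here collected in `multi_source`) are the ones reported by multiple studies.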

Identification of Known Ciliary Genes Conserved in P. dumerilii

Once the set of known ciliary genes had been established, we obtained protein sequences from the Ensembl database [180] for each gene in the set. Next, we used BlastP [134] (see Additional Figure 1 for the general method outline) to align protein sequences of known ciliary genes to the P. dumerilii predicted protein sequences of all transcripts obtained from PdumBase [143] (a detailed description of the methods for P. dumerilii protein prediction can be found in Chou et al. 2016 [181]). We retained all BlastP alignments with e-values smaller than or equal to 0.001. Under this criterion, some transcripts had significant alignments to multiple genes. Since such multiple matches typically require case-by-case revision, all significant matches were kept as potential annotations for that particular transcript, and a weighted score was assigned to each match. This score was calculated as the Blast bit score for gene x aligned to transcript i divided by the sum of the bit scores of all significant alignments for transcript i. A weighted score of 1 represents a unique match, whereas a score < 1 indicates that multiple matches are present, with higher scores representing better alignments. The weighted score therefore allows the quality of each alignment to be categorized when multiple annotations are present.
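As an illustration of the weighted score, the sketch below filters alignments by e-value and divides each bit score by the sum of bit scores of the retained alignments of the same transcript. The tuple layout, the function name `weighted_scores`, and the example hits are assumptions made for this sketch; only the 0.001 e-value cutoff and the definition of the score come from the text.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-3  # cutoff stated in the text


def weighted_scores(hits):
    """Compute weighted scores for BlastP hits.

    hits: iterable of (transcript, gene, bit_score, e_value) tuples
          (a hypothetical layout chosen for this sketch).
    Returns {(transcript, gene): bit_score / sum of the bit scores of all
    retained hits for that transcript}.
    """
    kept = [h for h in hits if h[3] <= E_VALUE_CUTOFF]
    totals = defaultdict(float)
    for transcript, _gene, bit_score, _e_value in kept:
        totals[transcript] += bit_score
    return {
        (transcript, gene): bit_score / totals[transcript]
        for transcript, gene, bit_score, _e_value in kept
    }


# Hypothetical hits: t1 matches two genes, t2 has one significant match.
hits = [
    ("t1", "geneA", 300.0, 1e-50),
    ("t1", "geneB", 100.0, 1e-10),
    ("t2", "geneC", 80.0, 1e-5),
    ("t2", "geneD", 40.0, 0.5),  # discarded: e-value above the cutoff
]
scores = weighted_scores(hits)
```

In this example the two matches of `t1` receive scores of 0.75 and 0.25, while the single retained match of `t2` receives a score of 1.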

Classifying Known Ciliary Genes by Localization and Functional Domains

The sources of known ciliary genes constitute a mixture of curated datasets and high-throughput (HT) studies. As curated datasets have a lower false discovery rate (FDR) than HT studies, we leveraged this information to find a subset of high confidence ciliary genes among all the genes in the known set. For this, we calculated the overlap of genes among the seven sources and defined two subsets of known genes: strict core genes and inclusive core genes. The strict core corresponds to the intersection of genes reported in all four curated datasets (SysCilia, Cildb, the shared ciliary proteome by Sigg et al. [131], and the ciliopathy survey by Reiter et al. [132]). Less restrictive, the inclusive core contains genes present in two of the four sources of the strict core.
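The two core definitions can be expressed with simple set counting, as in the sketch below. It assumes that "present in two of the four sources" means at least two, which is an interpretation rather than something stated explicitly; the function name `core_sets` and the gene labels are hypothetical.

```python
from collections import Counter


def core_sets(curated):
    """Return (strict_core, inclusive_core) from curated gene sets.

    curated: dict mapping source name -> set of gene identifiers.
    strict core    = genes reported by every curated source.
    inclusive core = genes reported by at least two curated sources
                     (assumption: "two of the four" read as "two or more").
    """
    counts = Counter(gene for genes in curated.values() for gene in genes)
    n_sources = len(curated)
    strict = {gene for gene, c in counts.items() if c == n_sources}
    inclusive = {gene for gene, c in counts.items() if c >= 2}
    return strict, inclusive


# Hypothetical gene labels, for illustration only.
curated = {
    "SysCilia": {"g1", "g2", "g3"},
    "Cildb": {"g1", "g2"},
    "Sigg": {"g1", "g4"},
    "Reiter": {"g1", "g3"},
}
strict_core, inclusive_core = core_sets(curated)
```

With this reading, the strict core is always a subset of the inclusive core.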

In addition to the source, the metadata in the known ciliary set includes the following information when available: localization and/or function, ciliopathy involvement, and transcriptional input. We extracted all terms used to describe localization and functional information from the original source for each gene. Similar terms were then grouped into 10 modules as follows: regulation, motility, signaling, transport, axonemal, basal body, central pair, centrosome, ciliary membrane, and transition zone. These modules make up two ciliary components: structural and functional.

Platynereis dumerilii Culture, Azakenpaullone Inhibitor Treatment, and Sequencing

P. dumerilii embryos were collected from the breeding culture maintained at Iowa State University according to the protocols described by Zantle et al. [126]. Fertilized eggs and embryos were kept at 18 °C to maintain consistent rates of development.

Embryos were treated with Azakenpaullone following a paired experimental setup with three replicates as follows: embryos from three mating events were separately split into two groups of ∼1200 embryos. One group was treated with 5 µM Azakenpaullone (Az) in 0.1% DMSO from ∼3:45 hpf to 4:30 hpf (8-cell to 16-cell stage) according to the protocols in Schneider and Bowerman, 2007 [136]. The second group (control) was treated with DMSO over the same time interval. A portion of ∼500 embryos from the treated and control groups was collected at 12 hpf and homogenized in Trizol for subsequent RNA extraction and sequencing, or fixed at 12 hpf or 24 hpf for in situ hybridization. For each group in each batch, ∼100-200 embryos were kept for morphological screening at 24 and 48 hpf (Figure 3.1E).

Extracted RNA was treated with the RNase-free DNase set (QIAGEN), purified with the RNeasy Mini Kit (QIAGEN), and checked for RNA degradation on a 1% agarose gel. Total RNA quality was additionally checked using the Bioanalyzer system (Agilent). Preparation of each barcoded Illumina mRNA-seq library and Illumina deep sequencing with 75 bp–100 bp paired-end reads (4 samples per lane) were performed by the Genome Sequencing and Analysis Core Resource at the Duke Institute for Genome Sciences and Policy using an Illumina HiSeq sequencing system.

Read Processing and Differential Expression Analysis

Raw reads were processed with Trimmomatic [182] to remove adapter sequences and low quality reads. Quality of pre-processed reads was analyzed and visualized with FastX toolkit [183]. Bowtie was used to align the reads from each sample to the P. dumerilii transcriptome previously assembled in our group [181]. We then used SAMTools [184] to estimate the number of reads mapped to each transcript, and EdgeR [185] was used for the subsequent analysis of differential expression.

Read counts were normalized based on the library size of each sample. Lowly expressed genes were filtered out by keeping only genes that are expressed in 2 or more samples. To decide whether a given gene is expressed in a particular sample, a threshold of 1 fragment per kilobase of transcript per million mapped reads (FPKM) was established. This inclusive threshold is below the lowest expression value observable through in situ hybridization in Platynereis embryos using our current laboratory protocols. According to this threshold, any gene with an FPKM greater than 1 is considered expressed. Using this expression threshold, 17378 transcripts were included in the differential expression (DE) analysis.
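The expression filter can be sketched as below. The thresholds (FPKM greater than 1, in 2 or more samples) come from the text; the function name `keep_transcript`, the transcript labels, and the FPKM values are hypothetical.

```python
FPKM_THRESHOLD = 1.0  # a transcript is "expressed" in a sample if FPKM > 1
MIN_SAMPLES = 2       # keep transcripts expressed in 2 or more samples


def keep_transcript(fpkm_by_sample):
    """Return True if the transcript passes the low-expression filter."""
    n_expressed = sum(value > FPKM_THRESHOLD for value in fpkm_by_sample)
    return n_expressed >= MIN_SAMPLES


# Hypothetical FPKM values for six samples (3 mating events x 2 conditions).
fpkm = {
    "tx_a": [0.2, 0.5, 1.4, 0.0, 2.1, 0.9],  # expressed in 2 samples: kept
    "tx_b": [0.0, 0.3, 1.2, 0.1, 0.4, 0.8],  # expressed in 1 sample: dropped
    "tx_c": [5.0, 4.2, 6.1, 3.9, 5.5, 4.8],  # expressed in all samples: kept
}
retained = [name for name, values in fpkm.items() if keep_transcript(values)]
```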

As both treatments, 0.1% DMSO (control) and 5 µM 1-azakenpaullone (Az-treatment), were administered to offspring of the same mating event, our analysis takes into account not only the differences between control and treatment, but also the differences between the independent mating events (replicates). This constitutes a paired design that compares treatment and control within each mating event separately, so that baseline differences between mating events are subtracted out.

Given our experimental design, we fit a generalized linear model for each gene, assuming a negative binomial distribution for the expression levels, and evaluated it with a quasi-likelihood test [186].

This model allows us to 1) test for significant differences in expression between the treatment and control at a significance level α of 0.05, and 2) assess variability between the three independent mating events (replicates). Additional Figure 2 depicts the in silico pipeline containing the relevant steps included in the DE analysis. With this pipeline, we identified 4517 significantly upregulated genes, 4425 significantly downregulated genes, and 8436 genes with no significant change in their expression.
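One way to write such a per-gene model for a paired design is sketched below. The notation is an assumption introduced here for illustration and does not appear in the original analysis: $y_{gi}$ is the read count of gene $g$ in sample $i$, $N_i$ the library size, $\phi_g$ the dispersion, $m(i)$ the mating event of sample $i$, and $x_i$ an indicator equal to 1 for Az-treated samples and 0 for controls.

```latex
y_{gi} \sim \mathrm{NB}(\mu_{gi}, \phi_g), \qquad
\log \mu_{gi} = \log N_i + \beta_{g0} + \beta_{g,m(i)} + \beta_{g,\mathrm{Az}}\, x_i
```

Under this parameterization, the mating-event terms absorb differences between replicates, and differential expression of gene $g$ corresponds to testing whether $\beta_{g,\mathrm{Az}} = 0$.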

Identification of Ciliogenesis Candidate Genes

Among the total set of 17378 transcripts included in the DE analysis, a subset of 8004 transcripts lack annotation. We aimed to identify ciliogenesis candidate genes among both the annotated (9374) and the non-annotated (8004) transcripts. For this, we designed two annotation-independent approaches. The first approach takes into account the ratio between expression values in Az-treated and control embryos, while the second approach evaluates co-expression patterns during normal development. In addition, we examined the annotated set and identified genes with ciliary-related annotations. Finally, we identified the genes that accumulate the most evidence for being considered ciliogenesis candidate genes by finding the overlap between the three approaches. In the following sections we describe the methods to identify ciliogenesis candidate genes using 1) expression values from treated vs. non-treated P. dumerilii larvae, 2) annotation, and 3) co-expression patterns based on expression values from the first 14 hours of P. dumerilii development under normal conditions.

Expected Fold-change of Expression for Ciliary Genes

In order to identify ciliary genes among the P. dumerilii transcripts affected by the Az-treatment, we defined an interval of expected fold change of gene expression between treated and non-treated embryos based on the predicted cell fate transformation triggered by the Az-treatment.

Under normal development, P. dumerilii larvae form a ciliated ring composed of two rows of twelve cells. Based on the cell fate transformation, we predict that treated larvae will show an increase in ciliated cells from 24 to ∼32. Hence, we hypothesized that ciliary genes would exhibit a change in expression between treated and control embryos that reflects this increase in the number of ciliated cells.

To account for biological variability, we define an interval for the expected change of expression, or ratio (r), between 1.3 and 2 (1.3 ≤ r ≤ 2). This interval reflects the change of gene expression between treated and control samples and is defined based on the predicted increase of ciliated cells in the treated embryos and on empirical evidence from the observed increase in expression of ciliary annotated genes in treated samples. More specifically, to better define this ratio interval, we examined the distribution of fold change in genes retrieved from PdumBase [143] with annotations related to cilia (Axonemal, Dynein, Flagellar, Intraflagellar transport (IFT), Kinesin, Tubulin) and compared it with the fold change in genes with non-ciliary-related annotations (Ribosomal and Myosin).

For the comparison of fold change distributions (Figure 3.3), we took the log-fold change (logFC) of the normalized count values per gene estimated during the DE analysis and, based on this value, calculated a ratio r. The ratio for a given gene j is equal to $r_j = 2^{\mathrm{logFC}_j}$. Thus, $r_j$ represents the ratio of the normalized expression value of gene j in the treatment over the expression value of j in the control.

The distribution of the ratio between the expression levels in treated and normal conditions observed in the pre-defined gene sets allowed us to better define the upper end of the interval, which was determined to have a value of 2. This ratio interval indicates that ciliary genes are expected to be from slightly upregulated (1.3) to two-fold upregulated in the Az-treatment. We then use each gene's ratio as an added source of information to potentially appoint genes as candidate ciliary genes affected by the activation of the β-catenin pathway in the treated embryos.
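The conversion from log-fold change to ratio and the interval check can be sketched as follows. The interval bounds and the relation r = 2^logFC come from the text; the function names and the example logFC values are hypothetical. Note that the predicted increase from 24 to ∼32 ciliated cells alone corresponds to a ratio of 32/24 ≈ 1.33, which lies just above the lower bound of the interval.

```python
R_MIN, R_MAX = 1.3, 2.0  # expected ratio interval stated in the text


def ratio_from_logfc(log_fc):
    """Convert a base-2 log-fold change into a treated/control ratio."""
    return 2.0 ** log_fc


def within_expected_interval(log_fc):
    """Return True if the ratio implied by log_fc satisfies 1.3 <= r <= 2."""
    return R_MIN <= ratio_from_logfc(log_fc) <= R_MAX


# Ratio implied by the predicted change in ciliated cell number (24 to ~32).
expected_from_cells = 32 / 24

# Hypothetical logFC values for four transcripts.
examples = {"tx_1": 0.2, "tx_2": 0.5, "tx_3": 1.0, "tx_4": 1.5}
flags = {name: within_expected_interval(lfc) for name, lfc in examples.items()}
```

Here `tx_2` (r ≈ 1.41) and `tx_3` (r = 2) fall inside the interval, whereas `tx_1` (r ≈ 1.15) and `tx_4` (r ≈ 2.83) fall outside it.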

Identification of Ciliogenesis Candidate Genes by Annotation

Guilt-by-association by Gene Family

We first identified the intersection of known ciliary genes found in P. dumerilii (1610) and genes included in the DE analysis (17378). A total of 1475 known ciliary genes were found in this intersection. Next, we used EdgeR to identify, among these known ciliary genes, those that were significantly upregulated (611), significantly downregulated (353), or showed no significant change in their expression (511). We then classified DE genes according to their membership in the known ciliary gene subsets (results shown in Table 3.3).

Subsequently, we identified gene names matching the gene families of genes in the known ciliary genes high confidence (HC) set among the gene names of all P. dumerilii annotated transcripts in PdumBase. True matches were filtered by the DE results (i.e., upregulated, no change in expression, or downregulated). Upregulated transcripts with matching gene names were added to the candidate gene set. This allowed for the expansion of the candidate set by 118 genes (Table 3.4).

Identification of Ciliogenesis Candidate Genes by Co-expression Analysis

In order to identify ciliogenesis candidate genes from the group of unknown but upregulated genes, we adopted a multi-stage approach. First, we clustered known ciliary genes into groups related to each other by expression similarity. Next, we trained a random forest classifier using the clustering result as training data and subsequently used the trained model to predict cluster labels for the unknown genes.

For the clustering stage, we used expression data obtained during the first 14 hours of normally developing P. dumerilii embryos. RNA was collected for two biological and two technical replicates at intervals of 2 hours from 2 to 14 hpf, for a total of seven time points: 2, 4, 6, 8, 10, 12, and 14 hpf. Expression data of the 611 known ciliary genes upregulated by the Az-treatment (previously identified) were clustered using hierarchical clustering with Pearson correlation as the distance function and Ward's minimum variance method for the linkage procedure. The number of clusters was defined empirically using DendroShiny (manuscript in preparation), an in-house visualization tool that makes it possible to explore the resulting cluster content and size as clustering parameters, such as the tree cutoff c, are modified. Cluster content and size were evaluated for all c ∈ [0, 1] in increments of 0.001. We observed only minor changes in the clustering results for c ≤ 0.006. Therefore, we selected c = 0.0055, resulting in 486 clusters, which were used as training data for the random forest model. We parameterized the random forest model using the cluster IDs of the known genes as the response vector and their corresponding expression patterns as predictors (ntree = 3000). This model was found to have an R² of 0.97, with an adjusted R² of 0.78, and was subsequently used to predict the cluster labels for the 3906 unknown genes.
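The sketch below illustrates only the distance underlying the clustering stage, not the full pipeline. It assumes the common conversion d = 1 − r from a Pearson correlation r to a distance; the text does not state which conversion was used, and the function names and expression profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def correlation_distance(x, y):
    """Assumed conversion: distance = 1 - Pearson correlation."""
    return 1.0 - pearson(x, y)


# Hypothetical profiles over the seven time points (2 to 14 hpf).
gene_a = [1, 2, 3, 4, 5, 6, 7]
gene_b = [2, 4, 6, 8, 10, 12, 14]  # same shape as gene_a, different scale
gene_c = [7, 6, 5, 4, 3, 2, 1]     # opposite trend

d_ab = correlation_distance(gene_a, gene_b)
d_ac = correlation_distance(gene_a, gene_c)
```

Under this conversion, profiles with the same shape are close regardless of absolute expression level (`d_ab` is approximately 0), while opposite trends are maximally distant (`d_ac` is approximately 2). A matrix of such pairwise distances is the kind of input a hierarchical clustering routine with Ward's linkage would take.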

Cluster’s Information Criterion

In order to guide the exploration of the large number of clusters, we leverage their information content, under the rationale that a higher information content of a cluster correlates with higher biological relevance and a more meaningful biological interpretation. For this, after classifying the unknown genes and assigning them to a known ciliary gene cluster, we explore the coherence of the resulting clusters by ranking them according to their predictive power based on the information criterion. Let $p_{ij}$ be the probability of gene j being assigned to cluster i. The information criterion $IC_j$ for gene j is calculated as the sum of the squared probabilities of j belonging to cluster i over all clusters $i \in [1 \ldots n]$:

$$IC_j = \sum_{i=1}^{n} (p_{ij})^2$$

Hence, the information criterion values range between 0 and 1, where larger values indicate a higher predicted probability of that gene being assigned to a particular cluster.
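A minimal sketch of the information criterion for a single gene is given below. It assumes that the probabilities are the predicted cluster-membership probabilities for that gene and that they sum to 1; the function name and the example probability vectors are hypothetical.

```python
def information_criterion(probabilities):
    """Sum of squared cluster-membership probabilities for one gene."""
    return sum(p * p for p in probabilities)


# Hypothetical probability vectors over four clusters.
confident = information_criterion([1.0, 0.0, 0.0, 0.0])  # one clear cluster
uniform = information_criterion([0.25, 0.25, 0.25, 0.25])  # no preference
mixed = information_criterion([0.7, 0.2, 0.1, 0.0])
```

A gene assigned to one cluster with certainty reaches the maximum value of 1, whereas a gene spread evenly over n clusters reaches the minimum of 1/n (0.25 in this example with four clusters).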

Data and Availability

The data, as well as the source code for the computational approach described in this work, have been deposited at https://github.com/NataliaAcevedoLuna/Ciliogenesis.git.

Bibliography

[108] Peter N. Inglis, Keith A. Boroevich, and Michel R. Leroux. Piecing together a ciliome. Trends in Genetics, 22(9):491–500, 2006.

[109] Peter Satir, David R. Mitchell, and Gaspar Jekely. Chapter 3 How Did the Cilium Evolve? Current Topics in Developmental Biology, 85:63–82, 2008.

[110] S. P.Choksi, D. Babu, D. Lau, X. Yu, and S. Roy. Systematic discovery of novel ciliary genes through functional genomics in the zebrafish. Development, 141(17):3410– 3419, 2014.

[111] Semil P. Choksi, Gilbert Lauter, Peter Swoboda, and Sudipto Roy. Switching on cilia: transcriptional networks regulating ciliogenesis. Development (Cambridge, England), 141(7):1427–41, 2014.

[112] Shinya Ohata and Arturo Alvarez-Buylla. Planar Organization of Multiciliated Ependymal (E1) Cells in the Brain Ventricular . Trends in Neurosciences, 2016.

[113] Ann E. Tilley, Matthew S. Walters, Renat Shaykhiev, and Ronald G. Crystal. Cilia Dysfunction in Disease. Annual Review of , 2015.

[114] Johanna Raidt, Claudius Werner, Tabea Menchen, Gerard W. Dougherty, Heike Ol- brich, Niki T. Loges, Ralf Schmitz, Petra Pennekamp, and Heymut Omran. Ciliary function and motor protein composition of human fallopian tubes. Human Repro- duction, 2015.

[115] Laura E. Yee and Jeremy F. Reiter. Ciliary vesicle formation: A prelude to ciliogen- esis. Developmental Cell, 32(6):665–666, 2015. 106

[116] Friedhelm Hildebrandt, Thomas Benzing, and Nicholas Katsanis. Ciliopathies. The New England journal of medicine, 364(16):1533–43, 4 2011.

[117] Adrian Gherman, Erica E Davis, and Nicholas Katsanis. The ciliary proteome database: an integrated community resource for the genetic and functional dis- section of cilia. Nature genetics, 38(9):961–2, 9 2006.

[118] Olivier Arnaiz, Agata Malinowska, Catherine Klotz, Linda Sperling, Michal Dadlez, France Koll, and Jean Cohen. Cildb: a knowledgebase for and cilia. Database : the journal of biological databases and curation, 2009(0):bap022, 1 2009.

[119] Teunis J.P. Van Dam, Gabrielle Wheway, Gisela G. Slaats, Martijn A. Huynen, and Rachel H. Giles. The SYSCILIA gold standard (SCGSv1) of known ciliary compo- nents and its applications within a systems biology consortium. Cilia, 2013.

[120] Jeffrey S C Chu, David L Baillie, and Nansheng Chen. Convergent evolution of RFX transcription factors and ciliary genes predated the origin of metazoans. BMC evolutionary biology, 10:130, 2010.

[121] Mei I. Chung, Taejoon Kwon, Fan Tu, Eric R. Brooks, Rakhi Gupta, Matthew Meyer, Julie C. Baker, Edward M. Marcotte, and John B. Wallingford. Coordinated genomic control of ciliogenesis and cell movement by RFX2. eLife, 3:e01439, 2014.

[122] Shubha Vij, Jochen C. Rink, Hao Kee Ho, Deepak Babu, Michael Eitel, Vi- jayashankaranarayanan Narasimhan, Varnesh Tiku, Jody Westbrook, Bernd Schier- water, and Sudipto Roy. Evolutionarily Ancient Association of the FoxJ1 Transcrip- tion Factor with the Motile Ciliogenic Program. PLoS Genetics, 8(11), 2012.

[123] Zita Carvalho-Santos, Juliette Azimzadeh, José B. Pereira-Leal, and Mónica Bettencourt-Dias. Tracing the origins of centrioles, cilia, and flagella. Journal of Cell Biology, 2011.

[124] Francesc R. Garcia-Gonzalo and Jeremy F. Reiter. Scoring a backstage pass: Mechanisms of ciliogenesis and ciliary access. Journal of Cell Biology, 2012.

[125] Florian Raible and Kristin Tessmar-Raible. Platynereis dumerilii. Current Biology, 24(15), 2014.

[126] Juliane Zantke, Stephanie Bannister, Vinoth Babu Veedin Rajan, Florian Raible, and Kristin Tessmar-Raible. Genetic and genomic tools for the marine annelid Platynereis dumerilii. Genetics, 197(1):19–31, 2014.

[127] Oleg Simakov, Tomas A. Larsson, and Detlev Arendt. Linking micro-and macro- evolution at the cell type level: A view from the lophotrochozoan Platynereis dumer- ilii. Briefings in Functional Genomics, 12(5):430–439, 2013.

[128] Florian Raible, Kristin Tessmar-Raible, Kazutoyo Osoegawa, Patrick Wincker, Claire Jubin, Guillaume Balavoine, David Ferrier, Vladimir Benes, Pieter de Jong, Jean 107

Weissenbach, Peer Bork, and Detlev Arendt. Vertebrate-type intron-rich genes in the marine annelid Platynereis dumerilii. Science (New York, N.Y.), 310(5752):1325–6, 11 2005.

[129] Rudolf A. Raff. Origins of the other metazoan body plans: The evolution of larval forms. Philosophical Transactions of the Royal Society B: Biological Sciences, 2008.

[130] O. Arnaiz, J. Cohen, A. M. Tassin, and F. Koll. Remodeling Cildb, a popular database for cilia and links for ciliopathies. Cilia, (SUPPLEMENT 1), 2015.

[131] Monika Abedin Sigg, Tabea Menchen, Chanjae Lee, Jeffery Johnson, Melissa K. Jungnickel, Semil P.Choksi, Galo Garcia, Henriette Busengdal, Gerard W. Dougherty, Petra Pennekamp, Claudius Werner, Fabian Rentzsch, Harvey M. Florman, Nevan Krogan, John B. Wallingford, Heymut Omran, and Jeremy F. Reiter. Evolutionary Proteomics Uncovers Ancient Associations of Cilia with Signaling Pathways. De- velopmental Cell, 2017.

[132] Jeremy F. Reiter and Michel R. Leroux. Genes and molecular pathways underpin- ning ciliopathies. Nature Reviews Molecular Cell Biology, 2017.

[133] Ian K. Quigley and Chris Kintner. Rfx2 Stabilizes Foxj1 Binding at Chromatin Loops to Enable Multiciliated Cell Gene Expression. PLoS Genetics, 13(1), 2017.

[134] Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lip- man. Basic local alignment search tool. Journal of Molecular Biology, 1990.

[135] Vincent Bertrand. β-catenin-driven binary cell fate decisions in animal develop- ment. Wiley Interdisciplinary Reviews: Developmental Biology, 2016.

[136] Stephan Q. Schneider and Bruce Bowerman. β-Catenin Asymmetries after All An- imal/Vegetal- Oriented Cell Divisions in Platynereis dumerilii Embryos Mediate Bi- nary Cell-Fate Specification. Developmental Cell, 13(1):73–86, 2007.

[137] Karen R. Christie and Judith A. Blake. Sensing the cilium, digital capture of ciliary data for comparative genomics investigations. Cilia, 7(1), 2018.

[138] Alexander E. Ivliev, Peter A.C. ’t Hoen, Willeke M.C. van Roon-Mom, Dorien J.M. Peters, and Marina G. Sergeeva. Exploring the transcriptome of ciliated cells using in silico dissection of human tissues. PloS one, 2012.

[139] Mark A. Pirner and Richard W. Linck. Tektins are heterodimeric polymers in flag- ellar microtubules with axial periodicities matching the tubulin lattice. Journal of Biological Chemistry, 1994.

[140] Jan M. Norrander, Catherine A. Perrone, Linda A. Amos, and Richard W. Linck. Structural comparison of tektins and evidence for their determination of complex spacings in flagellar microtubules. Journal of Molecular Biology, 1996. 108

[141] Yasunobu Okamura, Takeshi Obayashi, and Kengo Kinoshita. Comparison of gene coexpression profiles and construction of conserved gene networks to find func- tional modules. PLoS ONE, 2015.

[142] Joshua M. Stuart, Eran Segal, Daphne Koller, and Stuart K. Kim. A gene- coexpression network for global discovery of conserved genetic modules. Science, 2003.

[143] Hsien Chao Chou, Natalia Acevedo-Luna, Julie A. Kuhlman, and Stephan Q. Schnei- der. PdumBase: A transcriptome database and research tool for Platynereis dumer- ilii and early development of other metazoans. BMC Genomics, 19(1):1–11, 2018.

[144] Ian K. Quigley and Chris Kintner. Rfx2 Stabilizes Foxj1 Binding at Chromatin Loops to Enable Multiciliated Cell Gene Expression. PLoS Genetics, 13(1), 2017.

[145] Shuchun Li, Yuan Qiao, Qian Di, Xiuning Le, Lei Zhang, Xiaosong Zhang, Changy- ong Zhang, Jie Cheng, Shudong Zong, Samuel S. Koide, Shiying Miao, and Ling- fang Wang. Interaction of SH3P13 and DYDC1 protein: a germ cell component that regulates acrosome biogenesis during spermiogenesis. European Journal of Cell Biology, 2009.

[146] Alex Bateman, Maria Jesus Martin, Claire O’Donovan, Michele Magrane, Rolf Apweiler, Emanuele Alpi, Ricardo Antunes, Joanna Arganiska, Benoit Bely, Mark Bingley, Carlos Bonilla, Ramona Britto, Borisas Bursteinas, Gayatri Chavali, Elena Cibrian-Uhalte, Alan Da Silva, Maurizio De Giorgi, Tunca Dogan, Francesco Fazz- ini, Paul Gane, Leyla Garcia Castro, Penelope Garmiri, Emma Hatton-Ellis, Reija Hi- eta, Rachael Huntley, Duncan Legge, Wudong Liu, Jie Luo, Alistair Macdougall, Pru- dence Mutowo, Andrew Nightingale, Sandra Orchard, Klemens Pichler, Diego Poggi- oli, Sangya Pundir, Luis Pureza, Guoying Qi, Steven Rosanoff, Rabie Saidi, Tony Saw- ford, Aleksandra Shypitsyna, Edward Turner, Vladimir Volynkin, Tony Wardell, Xavier Watkins, Hermann Zellner, Andrew Cowley, Luis Figueira, Weizhong Li, Hamish McWilliam, Rodrigo Lopez, Ioannis Xenarios, Lydie Bougueleret, Alan Bridge, Syl- vain Poux, Nicole Redaschi, Lucila Aimo, Ghislaine Argoud-Puy, Andrea Auchin- closs, Kristian Axelsen, Parit Bansal, Delphine Baratin, Marie Claude Blatter, Brigitte Boeckmann, Jerven Bolleman, Emmanuel Boutet, Lionel Breuza, Cristina Casal- Casas, Edouard De Castro, Elisabeth Coudert, Beatrice Cuche, Mikael Doche, Dol- nide Dornevil, Severine Duvaud, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Elisabeth Gasteiger, Sebastien Gehant, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Florence Jungo, Guillaume Keller, Vi- cente Lara, Philippe Lemercier, Damien Lieberherr, Thierry Lombardot, Xavier Mar- tin, Patrick Masson, Anne Morgat, Teresa Neto, Nevila Nouspikel, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Monica Pozzato, Manuela Pruess, Catherine Rivoire, Bernd Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson, Sylvie Staehli, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue, Anne Lise Veuthey, Cathy H. Wu, Cecilia N. Arighi, Leslie Arminski, Chuming Chen, Yongx- ing Chen, John S. 
Garavelli, Hongzhan Huang, Kati Laiho, Peter McGarvey, Dar- ren A. Natale, Baris E. Suzek, C. R. Vinayaka, Qinghua Wang, Yuqi Wang, Lai Su Yeh, 109

Meher Shruti Yerramalla, and Jian Zhang. UniProt: A hub for protein information. Nucleic Acids Research, 43(D1):D204–D212, 2015.

[147] Christoph Petersen, Gerhard Aumüller, Masoud Bahrami, and Sigrid Hoyer-Fender. Molecular Cloning of Odf3 Encoding a Novel Coiled-Coil Protein of Sperm Tail Outer Dense Fibers. Molecular Reproduction and Development, 2002.

[148] William Y. Tsang, Alexander Spektor, Sangeetha Vijayakumar, Bigyan R. Bista, Ji Li, Irma Sanchez, Stefan Duensing, and Brian D. Dynlacht. Cep76, a Centrosomal Pro- tein that Specifically Restrains Centriole Reduplication. Developmental Cell, 2009.

[149] Bahareh A. Mojarad, Gagan D. Gupta, Monica Hasegan, Oumou Goudiam, Renata Basto, Anne Claude Gingras, and Laurence Pelletier. CEP19 cooperates with FOP and CEP350 to drive early steps in the ciliogenesis programme. Open biology, 2017.

[150] Esra E. Ylldlz Bölükbasi, Sara Mumtaz, Muhammad Afzal, Ute Woehlbier, Sajid Ma- lik, and Asllhan Tolun. Homozygous mutation in CEP19, a gene mutated in morbid obesity, in Bardet-Biedl syndrome with predominant postaxial polydactyly. Journal of Medical Genetics, 2018.

[151] Maureen Wirschell, Heike Olbrich, Claudius Werner, Douglas Tritschler, Raqual Bower, Winfield S. Sale, Niki T. Loges, Petra Pennekamp, Sven Lindberg, Unne Sten- ram, Birgitta Carlén, Elisabeth Horak, Gabriele Köhler, Peter Nürnberg, Gudrun Nürn- berg, Mary E. Porter, and Heymut Omran. The nexin-dynein regulatory complex subunit DRC1 is essential for motile cilia function in algae and humans. Nature Genetics, 2013.

[152] Yujie Li, Qing Wei, Yuxia Zhang, Kun Ling, and Jinghua Hu. The small GTPases ARL-13 and ARL-3 coordinate intraflagellar transport and ciliogenesis. Journal of Cell Biology, 2010.

[153] Susanne Graser, York Dieter Stierhof, Sébastien B. Lavoie, Oliver S. Gassner, Stefan Lamla, Mikael Le Clech, and Erich A. Nigg. Cep164, a novel centriole appendage protein required for primary cilium formation. Journal of Cell Biology, 2007.

[154] K. Joo, C. G. Kim, M.-S. Lee, H.-Y. Moon, S.-H. Lee, M. J. Kim, H.-S. Kweon, W.-Y. Park, C.-H. Kim, J. G. Gleeson, and J. Kim. CCDC41 is required for ciliary vesicle docking to the mother centriole. Proceedings of the National Academy of Sciences, 2013.

[155] Barbara E. Tanos, Hui Ju Yang, Rajesh Soni, Won Jing Wang, Frank P. Macaluso, John M. Asara, and Meng Fu Bryan Tsou. Centriole distal appendages promote membrane docking, leading to cilia initiation. Genes and Development, 2013.

[156] Yannis Nevers, Megana K. Prasad, Laetitia Poidevin, Kirsley Chennen, Alexis Allot, Arnaud Kress, Raymond Ripp, Julie D. Thompson, Hélene Dollfus, Olivier Poch, and Odile Lecompte. Insights into ciliary genes and evolution from multi-level phyloge- netic profiling. Molecular Biology and Evolution, 34(8):2016–2034, 2017. 110

[157] Julia Wallmeier, Dalal A. Al-Mutairi, Chun Ting Chen, Niki Tomas Loges, Petra Pen- nekamp, Tabea Menchen, Lina Ma, Hanan E. Shamseldin, Heike Olbrich, Gerard W. Dougherty, Claudius Werner, Basel H. Alsabah, Gabriele Köhler, Martine Jaspers, Mieke Boon, Matthias Griese, Sabina Schmitt-Grohé, Theodor Zimmermann, Cor- dula Koerner-Rettberg, Elisabeth Horak, Chris Kintner, Fowzan S. Alkuraya, and Hey- mut Omran. Mutations in CCNO result in congenital mucociliary clearance disorder with reduced generation of multiple motile cilia. Nature Genetics, 2014.

[158] Alice Meunier and Juliette Azimzadeh. Multiciliated cells in animals. Cold Spring Harbor Perspectives in Biology, 2016.

[159] Eric R. Brooks and John B. Wallingford. Multiciliated Cells. Current Biology, 2014.

[160] Joppe Nieuwenhuis, Athanassios Adamopoulos, Onno B. Bleijerveld, Abdelghani Mazouzi, Elmer Stickel, Patrick Celie, Maarten Altelaar, Puck Knipscheer, Anastassis Perrakis, Vincent A. Blomen, and Thijn R. Brummelkamp. Vasohibins encode tubulin detyrosinating activity. Science, 2017.

[161] Huijie Zhao, Lei Zhu, Yunlu Zhu, Jingli Cao, Shanshan Li, Qiongping Huang, Tao Xu, Xiao Huang, Xiumin Yan, and Xueliang Zhu. The cep63 paralogue deup1 enables massive de novo centriole biogenesis for vertebrate multiciliogenesis. Nature Cell Biology, 2013.

[162] Juliette Azimzadeh, Mei Lie Wong, Diane Miller Downhour, Alejandro Sánchez Al- varado, and Wallace F. Marshall. Centrosome loss in the evolution of planarians. Science, 2012.

[163] Juliette Azimzadeh and Cyril Basquin. Basal bodies across eukaryotes series: Basal bodies in the freshwater planarian Schmidtea mediterranea. Cilia, 2016.

[164] Pascale Gaudet, Michael S. Livstone, Suzanna E. Lewis, and Paul D. Thomas. Phylogenetic-based propagation of functional annotations within the Gene Ontol- ogy consortium. Briefings in Bioinformatics, 2011.

[165] Linn Fagerberg, Björn M Hallström, Per Oksvold, Caroline Kampf, Dijana Djureinovic, Jacob Odeberg, Masato Habuka, Simin Tahmasebpoor, Angelika Danielsson, Karolina Edlund, Anna Asplund, Evelina Sjöstedt, Emma Lundberg, Cristina Al-Khalili Szigyarto, Marie Skogs, Jenny Ottosson Takanen, Holger Berling, Hanna Tegel, Jan Mulder, Peter Nilsson, Jochen M Schwenk, Cecilia Lindskog, Frida Danielsson, Adil Mardinoglu, Asa Sivertsson, Kalle von Feilitzen, Mattias Forsberg, Martin Zwahlen, IngMarie Olsson, Sanjay Navani, Mikael Huss, Jens Nielsen, Fredrik Ponten, and Mathias Uhlén. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Molecular & cellular proteomics : MCP, 2014.

[166] Patrick Lorès, Charles Coutton, Elma El Khouri, Laurence Stouvenel, Maëlle Givelet, Lucie Thomas, Baptiste Rode, Alain Schmitt, Bruno Louis, Zeinab Sakheli, Marhaba Chaudhry, Angeles Fernandez-Gonzales, Alex Mitsialis, Denis Dacheux, 111

Jean Philippe , Jean François Papon, Gérard Gacon, Estelle Escudier, Christophe Arnoult, Mélanie Bonhivers, Sergey N. Savinov, Serge Amselem, Pierre F. Ray, Emmanuel Dulioust, and Aminata Touré. Homozygous missense mutation L673P in adenylate kinase 7 (AK7) leads to primary male infertility and multiple mor- phological anomalies of the flagella but not to primary ciliary dyskinesia. Human Molecular Genetics, 2018.

[167] Esther Kott, Philippe Duquesnoy, Bruno Copin, Marie Legendre, Florence Dastot- Le Moal, Guy Montantin, Ludovic Jeanson, Aline Tamalet, Jean François Papon, Jean Pierre Siffroi, Nathalie Rives, Valérie Mitchell, Jacques De Blic, André Coste, Annick Clement, Denise Escalier, Aminata Touré, Estelle Escudier, and Serge Amse- lem. Loss-of-function mutations in LRRC6, a gene essential for proper axonemal assembly of inner and outer dynein arms, cause primary ciliary dyskinesia. Ameri- can Journal of Human Genetics, 2012.

[168] Linda A. Amos. The tektin family of microtubule-stabilizing proteins. Genome Biology, 2008.

[169] Rebecca Ryan, Marion Failler, Madeline Louise Reilly, Meriem Garfa-Traore, Mar- ion Delous, Emilie Filhol, Thérèse Reboul, Christine Bole-Feysot, Patrick Nitschké, Véronique Baudouin, Serge Amselem, Estelle Escudier, Marie Legendre, Alexandre Benmerah, and Sophie Saunier. Functional characterization of tektin-1 in motile cilia and evidence for TEKT1 as a new candidate gene for motile ciliopathies. Human Molecular Genetics, 2018.

[170] Erin E. Dymek and Elizabeth F. Smith. A conserved CaM- and radial spoke- associated complex mediates regulation of flagellar dynein activity. Journal of Cell Biology, 2007.

[171] E. N. Firat-Karalar, J. Sante, S. Elliott, and T. Stearns. Proteomic analysis of mam- malian sperm cells identifies new components of the centrosome. Journal of Cell Science, 2014.

[172] Lorna I. Neilson, Patrick A. Schneider, Peter G. Van Deerlin, Marianthi Kiriakidou, Deborah A. Driscoll, Maria C. Pellegrini, Shawn Millinder, Karen K. Yamamoto, Cyn- thia K. French, and Jerome F. Strauss. cDNA cloning and characterization of a hu- man sperm antigen (SPAG6) with to the product of the Chlamydomonas PF16 . Genomics, 1999.

[173] Hannah M. Mitchison, Miriam Schmidts, Niki T. Loges, Judy Freshour, Athina Drit- soula, Rob A. Hirst, Christopher O’Callaghan, Hannah Blau, Maha Al Dabbagh, Heike Olbrich, Philip L. Beales, Toshiki Yagi, Huda Mussaffi, Eddie M.K. Chung, Heymut Omran, and David R. Mitchell. Mutations in axonemal dynein assembly factor DNAAF3 cause primary ciliary dyskinesia. Nature Genetics, 2012.

[174] Michael R. Knowles, Maimoona Zariwala, and Margaret Leigh. Primary Ciliary Dyskinesia. Clinics in Chest Medicine, 2016. 112

[175] Hiroto Shinomiya. Plastin Family of Actin-Bundling Proteins: Its Functions in Leukocytes, Neurons, Intestines, and Cancer. International Journal of Cell Biology, 2012.

[176] N.German Pasteris, Koh-ichi Nagata, Alan Hall, and Jerome L. Gorski. Isolation, characterization, and mapping of the mouse Fgd3 gene, a new Faciogenital Dys- plasia (FGD1; Aarskog Syndrome) gene homologue. Gene, 2002.

[177] Joseph E. Tym, Costas Mitsopoulos, Elizabeth A. Coker, Parisa Razaz, Amanda C. Schierz, Albert A. Antolin, and Bissan Al-Lazikani. canSAR: An updated cancer re- search and drug discovery knowledgebase. Nucleic Acids Research, 2016.

[178] Mei I. Chung, Sara M. Peyrot, Sarah LeBoeuf, Tae Joo Park, Kriston L. McGary, Ed- ward M. Marcotte, and John B. Wallingford. RFX2 is broadly required for ciliogenesis during vertebrate development. Developmental Biology, 363(1):155–165, 2012.

[179] Javier Herrero, Matthieu Muffato, Kathryn Beal, Stephen Fitzgerald, Leo Gordon, Miguel Pignatelli, Albert J. Vilella, Stephen M.J. Searle, Ridwan Amode, Simon Brent, William Spooner, Eugene Kulesha, Andrew Yates, and Paul Flicek. Ensembl com- parative genomics resources. Database, 2016.

[180] Fiona Cunningham, Premanand Achuthan, Wasiu Akanni, James Allen, M Rid- wan Amode, Irina M Armean, Ruth Bennett, Jyothish Bhai, Konstantinos Billis, Sanjay Boddu, Carla Cummins, Claire Davidson, Kamalkumar Jayantilal Dodiya, Astrid Gall, Carlos García Girón, Laurent Gil, Tiago Grego, Leanne Haggerty, Erin Haskell, Thibaut Hourlier, Osagie G Izuogu, Sophie H Janacek, Thomas Juettemann, Mike Kay, Matthew R Laird, Ilias Lavidas, Zhicheng Liu, Jane E Loveland, José C Marugán, Thomas Maurel, Aoife C McMahon, Benjamin Moore, Joannella Morales, Jonathan M Mudge, Michael Nuhn, Denye Ogeh, Anne Parker, Andrew Parton, Ma- teus Patricio, Ahamed Imran Abdul Salam, Bianca M Schmitt, Helen Schuilenburg, Dan Sheppard, Helen Sparrow, Eloise Stapleton, Marek Szuba, Kieron Taylor, Glen Threadgold, Anja Thormann, Alessandro Vullo, Brandon Walts, Andrea Winterbot- tom, Amonida Zadissa, Marc Chakiachvili, Adam Frankish, Sarah E Hunt, Myrto Kostadima, Nick Langridge, Fergal J Martin, Matthieu Muffato, Emily Perry, Mag- ali Ruffier, Daniel M Staines, Stephen J Trevanion, Bronwen L Aken, Andrew DYates, Daniel R Zerbino, and Paul Flicek. Ensembl 2019. Nucleic Acids Research, 2018.

[181] Hsien-Chao Chou, Margaret M Pruitt, Benjamin R Bastin, and Stephan Q Schneider. A transcriptional blueprint for a spiral-cleaving embryo. BMC genomics, 17:552, 2016.

[182] Anthony M. Bolger, Marc Lohse, and Bjoern Usadel. Trimmomatic: A flexible read trimming tool for Illumina NGS data. Bioinformatics, 2014.

[183] Assaf Gordon and G J Hannon. Fastx-toolkit. FASTQ/A short-reads pre-processing tools. Unpublished http://hannonlab. cshl. edu/fastx_ toolkit, 2010. 113

[184] Bo Li, Victor Ruotti, Ron M. Stewart, James A. Thomson, and Colin N. Dewey. RNA- Seq gene expression estimation with read mapping uncertainty. Bioinformatics, 2009.

[185] M D Robinson and A Oshlack. A scaling normalization method for differential ex- pression analysis of RNA-seq data. Genome Biol, 11(3):R25, 2010.

[186] Steven P.Lund, Dan Nettleton, Davis J. McCarthy, and Gordon K. Smyth. Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology, 2012. 114

DENDROSHINY: A DYNAMIC VISUALIZATION TOOL FOR THE ANALYSIS OF GENOME WIDE GENE EXPRESSION DATA

Manuscript to be published in a modified form as an Application note in Oxford Bioinformatics

NATALIA ACEVEDO-LUNA1,2, STEPHAN Q. SCHNEIDER1+, AND HEIKE HOFMANN2*

1 Department of Genetics, Developmental and Cell Biology, Iowa State University, Ames, IA 50011 USA.

2 Statistics Department, Iowa State University, Ames, IA 50011 USA.

+ Current address: Institute of Cellular and Organismic Biology, Academia Sinica. Taipei, Taiwan

* Corresponding author

Abstract

Modern genome-wide gene expression analysis approaches routinely yield prohibitively large sets of potential genes of interest, which require further post-processing in order to distill useful information from the data. One such approach attempts to assign potential functions to poorly characterized genes by clustering annotated genes with non-annotated candidates based on their expression patterns.

The success of these methods is highly dependent on a sensible choice of numerous clustering parameters. However, making an informed decision regarding these parameters and interpreting the clustering results is hampered by the current lack of user-friendly, graphical, and interactive visualization tools.

To close this gap, we have developed DendroShiny, a tool to interactively examine gene expression clusters as the clustering parameters are adjusted. DendroShiny (1) clusters the genes based on the expression profiles of a set of well characterized genes and uses the features of that set to classify non-annotated genes, (2) displays an interactive gene tree representing the similarity among the gene expression patterns, (3) allows the user to define the final number of clusters and visualizes the resulting gene sets, (4) interactively displays the expression profiles and metadata of genes in each cluster, and (5) allows the user to browse the clusters by gene name.

We exemplify the use of DendroShiny through the analysis of RNA-seq data from the early embryonic development of Platynereis dumerilii, an annelid animal model. Here, DendroShiny allowed for the identification of candidate ciliary genes despite the lack of an annotated genome. Overall, DendroShiny allows the user to intuitively explore clustered gene expression data and has the potential to facilitate the downstream analysis of transcriptomic data sets.

Availability

A reference implementation of DendroShiny is currently available at http://sqs-lab.gdcb.iastate.edu:3838/dendroshiny, while the source code has been deposited at https://github.com/NataliaAcevedoLuna/DendroShiny.

Introduction

The advent of efficient sequencing technologies provides access to genome and transcriptome sequences on an unprecedented scale, exceeding our technical capabilities to interpret the information they encode. Despite the efforts of several groups, including but not limited to Ensembl [187], RefSeq [188], GENCODE [189], and the Gene Ontology Resource [190], functional gene annotation remains a challenging computational task. For instance, transcriptomic analysis throughout early developmental stages of Drosophila melanogaster, a well-established model organism, has shown that about 30% of transcribed regions map to non-annotated intronic and intergenic genome regions. The biological function of the majority of these non-annotated transcribed regions remains unknown [191]. In humans, about half of protein coding regions and their corresponding genes remain non-annotated [192, 193].

Functional annotation represents an even greater challenge when working with non-conventional model organisms due to the frequent absence of a reference genome. In such circumstances, researchers rely on de novo transcriptome assemblies in which RNA-seq reads are compiled into gene transcripts in a genome-independent manner [194, 195, 196]. Functional annotation of such transcripts is frequently carried out by automated computational pipelines. These approaches typically rely on homology queries that search for similar sequences in other annotated model organisms to infer a function for each assembled transcript. This, however, introduces a bias towards traditional genetic models [197], as a greater body of research exists for these typically mammalian models. Due to its automated nature, this process is also prone to high levels of noise and false predictions which, in combination with the complexity of the assembly, its sensitivity to sequencing errors, and the frequently encountered chimeric transcripts (transcripts with assembled regions belonging to different genes) [194], can lead to incomplete transcriptome annotation. However, with the continuous development of improved and affordable sequencing technologies, and the prospect of read lengths exceeding those of the transcripts, a number of current challenges have the potential to be resolved soon. Until then, functional annotation still requires a great deal of manual curation, a process that is time consuming and often prohibitively expensive in a large number of studies.

When functional annotation is applied at the cellular and/or organismal level, one is frequently concerned with the functional discovery of sets of genes that jointly perform complex cellular functions. In this scope, analysis of co-expression networks has shown that genes with similar expression features have an increased likelihood of being associated with a common function [198]. A frequent functional annotation approach for genes therefore relies on identifying groups of genes that correlate in their expression patterns [199]. Furthermore, the analysis of such co-expression networks has led to the discovery of genes related to a specific function in a particular cell and/or tissue type [200], or of genes involved in a distinctive pathological phenotype [201, 202].

However, identifying groups of genes related to each other through a common expression pattern is not a trivial task. A major reason for this is the need to choose specific parameters for the clustering approaches, which must reflect the underlying biology of the data. Therefore, manual adjustment is usually required for parameter selection during gene clustering. The variability in experimental design, biological question, and resulting RNA-seq data is reflected in the large number of analysis pipelines published to date (see Ballouz, S. et al. [203] and van Dam, U. et al. [204] for a detailed overview), each of which offers parameter choices tailored to the particular setup at hand. Efficient data visualization at strategic stages of these pipelines can dramatically aid this process but remains challenging given the size of a typical high-throughput sequencing data set. To the best of our knowledge, however, none of the currently available solutions allows for an interactive and intuitive approach to optimal parameter selection.

In order to close this gap, we present DendroShiny, a dynamic visualization tool for the exploration of genome-wide co-expression networks. Our tool allows for the visual inspection of the co-expression groups as the cluster parameters are modified. DendroShiny facilitates the selection of the clustering cutoff based on the biological question at hand, as it provides the user with an interactive visualization that is updated as the clustering settings are adjusted. In addition, DendroShiny enables efficient interactive exploration of the co-expression clusters by allowing the user to select gene groups based on size or annotated gene content (genes with known function), or by searching for a gene name or transcript id of interest.

DendroShiny is implemented in R [205] and Shiny [206] and takes as input expression data over different conditions or time points, as well as the associated metadata, which can include different types of information such as functional annotation, gene name, phenotypic association, and more. Our tool then produces a queryable dynamic visualization of the corresponding co-expression clusters, using information from the functionally annotated genes to derive aspects of the poorly characterized ones. At its core, DendroShiny takes a subset of feature genes corresponding to the known or well characterized genes and clusters their expression profiles based on a pre-determined similarity measure. DendroShiny then trains a supervised learning model based on these clusters and uses it to derive similarities between the expression patterns of the known and unknown genes (see Figure 4.1).

We have applied DendroShiny to identify previously non-annotated genes potentially involved in ciliary assembly in a non-conventional animal model, the annelid Platynereis dumerilii. In this case study, DendroShiny allowed us to infer gene function for unknown (non-annotated) genes from genes with known function that show similar expression profiles during early developmental stages.


Figure 4.1: Data flow overview of DendroShiny. DendroShiny requires two input files: a metadata file and an expression file. One column of the metadata file defines the subset of genes used to cluster and to train the model. The user can select the tree-cut height (cutoff) c, which determines the number of clusters obtained. Once the tree is calculated, the user can select the tree granularity, which changes the display of the dendrogram. Cluster content can be accessed by clicking on a particular cluster, represented by a circle, or by searching the cluster content by gene name(s) or transcript ID(s).

Materials and Methods

Data Input

DendroShiny uses R for its server-side computations and takes as input expression data time series for a set of genes (or transcripts), as well as corresponding metadata including but not limited to gene annotation, differential expression information, and more (see Figure 4.1). For a detailed description of the input format, we refer the reader to the manual integrated within DendroShiny.

Data Preprocessing

On startup, our approach first filters out any genes which have not previously been identified as differentially expressed. The remaining genes are subsequently separated into two subsets corresponding to annotated and non-annotated genes according to their metadata information.
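The filtering and splitting step can be sketched as follows. DendroShiny itself runs in R; this is a minimal Python illustration only, and the record fields `differentially_expressed` and `annotation` are hypothetical names, not DendroShiny's actual input format.

```python
# Hypothetical metadata records; the field names are illustrative assumptions,
# not the input format documented in the DendroShiny manual.
def split_genes(metadata):
    """Drop genes not flagged as differentially expressed, then split the
    remaining gene ids into annotated and non-annotated lists."""
    annotated, non_annotated = [], []
    for gene_id, record in metadata.items():
        if not record["differentially_expressed"]:
            continue  # filtered out before any clustering takes place
        if record["annotation"]:
            annotated.append(gene_id)
        else:
            non_annotated.append(gene_id)
    return annotated, non_annotated


metadata = {
    "gene_a": {"differentially_expressed": True, "annotation": "ciliary protein"},
    "gene_b": {"differentially_expressed": True, "annotation": ""},
    "gene_c": {"differentially_expressed": False, "annotation": "kinase"},
}
annotated, non_annotated = split_genes(metadata)
print(annotated, non_annotated)
```

Only the annotated subset enters the clustering step; the non-annotated subset is held back for classification.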

Next, a similarity matrix between all pairs of annotated genes is computed. The measure used is the Pearson correlation between the log-transformed expression vectors of every gene pair. This matrix then forms the input to an agglomerative clustering algorithm using Ward's clustering criterion in order to generate a hierarchical representation of the proximity between the input genes in the form of a dendrogram.
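As a minimal sketch of this step, the Pearson correlation between log-transformed profiles can be computed as below. The conversion to a dissimilarity as 1 - r, the natural logarithm, and the `pseudocount` parameter are assumptions made for illustration; the text does not state how the correlation is turned into a clustering input.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)  # undefined for constant profiles


def correlation_distance(profiles, pseudocount=1.0):
    """Pairwise dissimilarity 1 - r between log-transformed profiles.
    Both the 1 - r conversion and the pseudocount are assumptions."""
    logged = {g: [math.log(v + pseudocount) for v in p] for g, p in profiles.items()}
    genes = sorted(logged)
    return {
        (a, b): 1.0 - pearson(logged[a], logged[b])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }


profiles = {
    "gene_a": [1.0, 10.0, 100.0],
    "gene_b": [10.0, 100.0, 1000.0],  # same shape as gene_a on the log scale
    "gene_c": [100.0, 10.0, 1.0],     # mirror image of gene_a
}
distances = correlation_distance(profiles, pseudocount=0.0)
```

With these toy profiles, `gene_a` and `gene_b` differ only by a constant offset on the log scale and therefore have a dissimilarity of essentially zero, while the mirrored `gene_c` sits at the maximum of 2.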

Cluster Generation and Classification

Given a user-defined similarity threshold value c, a set of clusters is computed from the previously performed clustering process by cutting the dendrogram at height c and designating the resulting sub-trees as individual clusters with their leaves (genes) as members. Each cluster is subsequently assigned a unique identifier in no particular order.
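The effect of the cutoff c can be illustrated with a small, self-contained sketch. It merges clusters using the Lance-Williams update for Ward's criterion and stops as soon as the next merge height would exceed c, which for a monotone linkage gives the same groups as cutting the finished dendrogram at height c. The update is applied to the dissimilarities as given; whether DendroShiny squares them first is not stated in the text, so this is an assumption, as are the function names and the toy distances.

```python
def pair_key(a, b):
    """Order a pair so that each unordered pair has exactly one dictionary key."""
    return (a, b) if a < b else (b, a)


def cut_ward_tree(genes, dist, cutoff):
    """Agglomerate with the Ward (Lance-Williams) update and stop once the next
    merge height exceeds `cutoff`; the remaining groups are the clusters."""
    clusters = {i: [g] for i, g in enumerate(genes)}
    d = {
        (i, j): dist[pair_key(genes[i], genes[j])]
        for i in range(len(genes))
        for j in range(i + 1, len(genes))
    }
    next_id = len(genes)
    while len(clusters) > 1:
        (i, j), height = min(d.items(), key=lambda item: item[1])
        if height > cutoff:
            break
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = {}
        for k, members in clusters.items():
            if k in (i, j):
                continue
            nk = len(members)
            merged[pair_key(k, next_id)] = (
                (ni + nk) * d[pair_key(i, k)]
                + (nj + nk) * d[pair_key(j, k)]
                - nk * height
            ) / (ni + nj + nk)
        # Discard distances to the two merged clusters, then add the new ones.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(merged)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


genes = ["g1", "g2", "g3", "g4"]
dist = {
    ("g1", "g2"): 0.01, ("g3", "g4"): 0.02,
    ("g1", "g3"): 1.0, ("g1", "g4"): 1.0,
    ("g2", "g3"): 1.0, ("g2", "g4"): 1.0,
}
low_cut = cut_ward_tree(genes, dist, cutoff=0.005)
mid_cut = cut_ward_tree(genes, dist, cutoff=0.015)
high_cut = cut_ward_tree(genes, dist, cutoff=0.05)
```

Raising the cutoff turns singletons into progressively larger clusters, which is the behaviour the interactive cutoff control exposes.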

To classify the set of non-annotated genes, DendroShiny relies on training a random forest [207] with cluster information from the annotated genes. Specifically, our method trains a random forest with 3000 trees using the gene expression profiles together with their corresponding cluster ids as predictors and response variables, respectively. The trained model is then used to derive similarities between poorly characterized genes and annotated genes in a computationally feasible manner.
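To make the classification step concrete, the following is a deliberately small random forest written in plain Python: bootstrap samples, a random subset of time points per split, Gini impurity, and majority voting. It is a sketch under stated assumptions rather than DendroShiny's implementation, which runs in R with 3000 trees; the function names, the tiny forest size, and the toy profiles are all illustrative. Cluster ids must not be tuples here, because tuples mark internal tree nodes.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of cluster ids."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (feature, threshold) minimising weighted Gini, or None."""
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return None if best is None else best[1:]


def grow_tree(rows, labels, rng, n_try):
    """Grow one unpruned tree; leaves are cluster ids, nodes are tuples."""
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, threshold = split
    left = [i for i, row in enumerate(rows) if row[f] <= threshold]
    right = [i for i, row in enumerate(rows) if row[f] > threshold]
    return (
        f, threshold,
        grow_tree([rows[i] for i in left], [labels[i] for i in left], rng, n_try),
        grow_tree([rows[i] for i in right], [labels[i] for i in right], rng, n_try),
    )


def train_forest(rows, labels, n_trees, seed):
    """Train n_trees trees, each on a bootstrap sample of the annotated genes."""
    rng = random.Random(seed)
    n_try = max(1, int(math.sqrt(len(rows[0]))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try))
    return forest


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Toy annotated genes: cluster 1 rises over time, cluster 2 falls.
rows = [[1.0, 2.0, 8.0, 9.0], [1.2, 2.1, 7.8, 9.1], [0.9, 1.8, 8.3, 8.7],
        [9.0, 8.0, 2.0, 1.0], [8.8, 8.2, 2.2, 1.1], [9.3, 7.9, 1.9, 0.8]]
cluster_ids = [1, 1, 1, 2, 2, 2]
forest = train_forest(rows, cluster_ids, n_trees=25, seed=1)
rising = predict_forest(forest, [1.1, 2.0, 8.1, 9.0])
falling = predict_forest(forest, [9.1, 8.1, 2.0, 1.0])
```

Each non-annotated profile is assigned to the cluster that receives the most votes, so it inherits the neighbourhood of the annotated genes it most resembles.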

We note that the use of a random forest to classify the non-annotated genes into the clusters of annotated genes is computationally more efficient than the naive approach of jointly clustering the full dataset (annotated and non-annotated genes). Joint clustering has at least quadratic time complexity, so its cost grows rapidly with the number of data points. Instead, DendroShiny performs clustering on only a small subset of genes (annotated only) and combines this result with efficient training and prediction routines for the random forest. In addition, our approach requires significantly less memory for its operation, making it well suited to real-time web interface usage.
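A short arithmetic sketch shows the size of this saving for the pairwise similarity matrix alone. The gene counts are those reported in the case study of this chapter (4517 genes in total, 611 of them known ciliary genes); the cost of training the random forest is not included in this comparison.

```python
def pair_count(n):
    """Number of distinct gene pairs, i.e. entries in a pairwise similarity matrix."""
    return n * (n - 1) // 2


all_genes, known_genes = 4517, 611  # counts taken from the case study
full = pair_count(all_genes)
subset = pair_count(known_genes)
print(full, subset, round(full / subset, 1))
```

Clustering only the known ciliary genes requires 186,355 pairwise comparisons instead of 10,199,386, roughly a 55-fold reduction.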

The Web Interface

The client-side interface of DendroShiny is written in Shiny [208] and is composed of two vertically separated sections, housing the configuration widgets of the various user-definable parameters for clustering and visualization on the left, and the interactive visualization of the results on the right.

In the parameter panel, users can choose the cutoff c as described in Materials and Methods, which defines the number of clusters that will be generated from the precomputed dendrogram. Note that due to the potential size of the input data, this might result in a large number of clusters, making any attempt at visualizing their hierarchical relationship challenging. To tackle this issue, DendroShiny features a tree granularity option which collapses branches of the dendrogram below a user-defined height threshold (selected in percent) in order to improve visibility and navigability of the data (see Figure 4.2 A). In addition, our approach allows the user to dynamically select which time points of the expression data should be included in the analysis.
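The granularity option can be sketched on a toy tree as follows. The nested-tuple tree representation and the interpretation of the percentage as a fraction of the root height are assumptions made for illustration; the text only states that branches below a percentage threshold are collapsed.

```python
def leaves(node):
    """All cluster ids below a node; a bare string is a single cluster."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def collapse(node, min_height):
    """Replace every branch lower than min_height by one leaf listing its clusters."""
    if isinstance(node, str):
        return [node]
    height, left, right = node
    if height < min_height:
        return leaves(node)
    return (height, collapse(left, min_height), collapse(right, min_height))


def collapse_by_percent(tree, percent):
    """Assumption: the percentage is taken relative to the height of the root."""
    root_height = tree[0]
    return collapse(tree, root_height * percent / 100.0)


# Internal nodes are (height, left, right); the cluster ids are hypothetical.
tree = (1.0, (0.01, "c1", "c2"), (0.5, "c3", (0.015, "c4", "c5")))
display_tree = collapse_by_percent(tree, 2)
```

In the collapsed tree each displayed leaf holds one or more clusters, so the hierarchy above the threshold is preserved while the crowded lower branches are merged for display.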

Figure 4.2: Screen capture of the web interface of DendroShiny. A. Parameter panel where cutoff c, tree granularity, and time points can be adjusted. B. Dendrogram of the gene clustering. Each cluster is represented as a circle; size is proportional to the number of genes in the cluster and color represents the number of annotated/known genes. C. Search dataset: the user can browse the cluster content by typing a gene name or transcript id. Selected cluster content: displays the names of the genes in the selected cluster. Genes from this display can be selected and their corresponding expression profiles will be highlighted. D. Parallel coordinate plots showing the expression patterns of selected clusters. Clusters can be selected by clicking on their circle representation in panel B, or by searching their content by gene name(s) in panel C.

Based on the combination of the above described parameters, DendroShiny generates a tree representation of the clustering results in which each leaf consists of one or more clusters and whose height is trimmed at the desired tree granularity. Clusters are styled as individual circles whose size is proportional to their number of genes and which are color coded according to the number of annotated genes they contain (see Figure 4.2 B). Additional cluster information including the cluster id, the cluster size, and the number of annotated and non-annotated genes is displayed in a text box when hovering over the cluster of interest.

Upon selecting one or more clusters, DendroShiny proceeds to query the expression and annotation data of the contained genes, and to display expression as parallel coordinate plots for each cluster as follows: the expression patterns of annotated genes are shown in green, whereas red lines correspond to non-annotated genes predicted by the random forest to belong to this cluster. Furthermore, the average expression of the annotated genes is highlighted in blue. Hovering over any of the expression patterns will display additional information such as the unique gene identifier and, if available, annotation data (see Figure 4.2 D). As an alternative to manually selecting individual clusters, the dataset can also be searched for gene names and alternative annotations using the "Search Dataset" function (see Figure 4.2 C). Here, an arbitrary number of search terms can be selected from the widget; the clusters containing the matching genes are then highlighted in the dendrogram and their corresponding expression plots are displayed. The expression patterns of the queried genes are highlighted in black.

Each of the individual expression plots can further be selected to allow for analysis in added detail. Specifically, an exhaustive list of all genes comprising the selected cluster is made available to the user under "Selected Cluster Content". One or more genes can then be selected from this list and highlighted in the corresponding expression plot to allow for easy comparison of their expression patterns.

Case Study

We applied DendroShiny to identify genes potentially involved in ciliogenesis in a non-conventional animal model, the marine annelid Platynereis dumerilii. Ciliogenesis is the biological process in which cilia, small projections from the cell surface, are assembled. Cilia are involved in a variety of functions, both at the cellular and organismal level. Particular to P. dumerilii, multiciliated cells are formed during development to produce a free-swimming larva that propels itself through coordinated cilia movement.

We obtained RNA-seq expression data for seven time points comprising the early developmental stages of P. dumerilii from 2 hours to 14 hours post fertilization (hpf) in intervals of 2 hours (i.e. 2, 4, 6, 8, 10, 12, and 14 hpf) and utilized DendroShiny to elucidate potential ciliary genes among 4517 genes that were identified as upregulated in a treatment that triggered the formation of a hyperciliated larva (see Chapter 3). This set includes 611 genes whose annotation is related to a ciliary function in other species, which we refer to as known ciliary genes in P. dumerilii. Among the remaining 3906 genes, 1769 are annotated but have not previously been associated with ciliary function in other species, while the remaining 2137 genes lack any annotation.

Using DendroShiny's interactive capabilities, we were able to efficiently identify the optimal clustering parameters (here, c = 0.005, see Methods) which best reflected the underlying biological properties of the data on a global level. Specifically, a tree cutoff value of 0.005 resulted in a clustering landscape in which unknown genes were assigned to the annotated genes with highest expression similarity on average. For this, the clustering step of DendroShiny's pipeline was performed using the expression profiles of the 611 known ciliary genes, while the remaining 3906 genes were classified into the corresponding ciliary gene clusters using the random forest approach (see Methods).

To demonstrate how DendroShiny can facilitate the identification of genes potentially involved in ciliary assembly, two known ciliary genes, fbf1 and mcidas, were selected for visual exploration of their co-expression clusters. These two genes were selected because they have been reported to have critical functions during the initiation of ciliogenesis in other species [209, 210].

The first gene, fbf1, is a ciliogenesis precursor known to be involved in microtubule organization and in the docking of the centriole distal appendage to a ciliary vesicle [209], an important step in the initial stages of ciliogenesis. The second gene, mcidas, is a transcription factor involved in the initiation of multiciliated cell differentiation, triggering basal body formation [210]. Hence, the function of both fbf1 and mcidas is of major importance to the initiation of ciliogenesis, and identification of their co-expressing genes could aid in better characterizing this biological process in P. dumerilii. However, the homologs of these genes in P. dumerilii were observed to have no co-expressing partners when using the globally optimal clustering parameters.

To tackle this challenge, we leveraged DendroShiny's ability to explore the co-expression clusters of these genes (or any gene) as the cluster parameter c is modified. This allows the exploration of locally optimal clustering solutions, which enable the interactive identification of genes with similar behavior to a particular gene of interest.
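The idea of a locally optimal cutoff can be sketched as a simple lookup over the dendrogram's merge events. The merge list format, the gene names, and the heights below are hypothetical and chosen only for illustration; the sketch also assumes that merge heights increase towards the root.

```python
def first_partner_cutoff(merges, gene):
    """Smallest cut height at which `gene` stops being a singleton.
    `merges` lists one (height, members) pair per internal node of the tree."""
    heights = [height for height, members in merges if gene in members]
    return min(heights) if heights else None


def cluster_at(merges, gene, cutoff):
    """Members of the cluster containing `gene` when the tree is cut at `cutoff`."""
    below = [(height, members) for height, members in merges
             if gene in members and height <= cutoff]
    if not below:
        return {gene}  # no merge below the cut: the gene is a singleton
    return set(max(below, key=lambda item: item[0])[1])


# Hypothetical merge events (not values from the case study).
merges = [
    (0.004, {"gene_a", "gene_b"}),
    (0.012, {"target", "gene_c"}),
    (0.030, {"target", "gene_c", "gene_a", "gene_b"}),
]
threshold = first_partner_cutoff(merges, "target")
strict = cluster_at(merges, "target", cutoff=0.005)
relaxed = cluster_at(merges, "target", cutoff=0.015)
```

In the interface the same question is answered visually, by raising c step by step and watching the cluster of the gene of interest gain members.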

Appropriate Selection of Clustering Threshold Aids the Identification of Candidate Genes

When increasing the tree cutoff from 0.005 to 0.015, the cluster containing fbf1 changes from a singleton cluster to one with four additional co-expressing genes: the known ciliary gene cast (also called hspg2), two annotated genes (zswm6 and zbt20), and one non-annotated transcript (comp226246_c0). Interestingly, the two known genes in this co-expression group have related functions. While fbf1 is involved in cell polarization and ciliary basal body docking to the plasma membrane, cast is known to regulate membrane fusion events [211]. Along with these two well characterized known genes are two genes with annotations not previously related to ciliary function. Both of these genes, zswm6 and zbt20, encode proteins containing a zinc finger domain, and both are associated with functions related to neurogenesis [212, 213]. In addition to these three annotated genes co-expressing with fbf1, the remaining non-annotated transcript appears to have an expression pattern highly similar to that of cast (see Figure 4.3).

Note that the data processing involved a filtering step: by user choice, only the transcripts identified as upregulated in a treatment that causes hyperciliation in P. dumerilii larvae were included in DendroShiny for the analysis. By design, this increases the probability that the genes in our dataset are involved in ciliary assembly. The co-expression analysis using DendroShiny therefore facilitates the identification of potential novel candidate ciliary genes among both the annotated and the non-annotated upregulated genes.

In this particular example, out of a total of 2137 non-annotated transcripts, DendroShiny allowed us to identify the transcript comp226246_c0 as a gene potentially involved in the initial steps of ciliogenesis, based on co-expression with two genes with known function in the initiation of ciliogenesis. More specifically, its potential function is related to the docking of the ciliary vesicle to the distal appendage of the centriole, and fusion to the ciliary membrane. Experimental validation is required to verify this function; however, this example highlights the benefit of DendroShiny's approach in guiding the selection of potential transcripts to validate by providing the user with meaningful data to make an informed decision.

In addition to implicating a potential novel ciliary gene, two annotated genes, zswm6 and zbt20, whose annotations have not been related to ciliary function, were identified as potential new ciliogenesis candidates. As with the non-annotated transcript, further in-vitro analysis is required to determine the extent of their ciliary function.

Figure 4.3: Impact of tree cut parameter c on co-expression results. Both panels plot expression (log scale) against hours post fertilization (2 to 14 hpf). A. Parallel coordinate plot of fbf1 at c = 0.005. At this low tree cut, fbf1 represents a single-gene cluster. B. Parallel coordinate plot of the fbf1 co-expression cluster at c = 0.015. The expression of another known ciliary gene (cast), two annotated genes (zswm6, zbt20), and one non-annotated transcript are depicted in green, red, and with a red dotted line, respectively.

Another advantage of DendroShiny's visualization techniques relates to the exploration of the hierarchical similarity structure between clusters. Specifically, our approach makes it easy to examine the clusters neighboring a cluster of interest. Figure 4.4 shows the selection of three clusters belonging to the same branch as the cluster containing the known gene fbf1 (Figure 4.4 B). In this particular example, the remaining clusters exhibit distinct patterns of expression compared to the fbf1 cluster, suggesting that the chosen clustering threshold meaningfully partitions the data into biologically relevant groups. This preliminary exploration therefore highlights the utility of DendroShiny in making an informed decision for a purposeful clustering threshold.

Appropriate Time Point Selection Can Dramatically Impact Co-expression Results

One important feature of DendroShiny is that it allows the user to easily select the expression data to be included in the analysis. In this case study, the expression data includes 7 time points corresponding to 2, 4, 6, 8, 10, 12, and 14 hpf. This information is displayed in the left panel, from which the user can select the samples to include in the analysis.

Figure 4.4: Exploration of the clustering neighbourhood with DendroShiny. Panels B to E plot expression (log scale) against hours post fertilization. A) The resulting dendrogram with c = 0.0015 and with collapsed branches at a 2% granularity for improved visualization (legend: number of known genes). Selected clusters are depicted in red. B) Co-expression cluster containing gene fbf1 (Cluster 318). C-E) Three example clusters in the vicinity of B) (Clusters 315, 316, and 325). Note that all of these clusters exhibit distinct expression patterns compared to B).

To exemplify this feature we selected mcidas, a gene that encodes a transcription factor known to initiate the transcriptional cascade that triggers ciliogenesis [210]. Co-expression results for mcidas at c = 0.005 including all time points (2 to 14 hpf) failed to identify any co-expression partners. However, when excluding 2 hpf from the analysis, DendroShiny successfully identifies three genes co-expressing with mcidas. Visual exploration of the expression patterns shows that, indeed, genes in the mcidas cluster behave very similarly from 4 hpf onwards (see Figure 4.5).

Under these new clustering settings, the mcidas co-expression cluster includes the known ciliary gene wdr97, one annotated transcript (tc1d1) not yet related to a ciliary function, and one non-annotated transcript. This result points towards these two unknown genes (with no known function in ciliogenesis in other species) being involved in the formation of cilia in P. dumerilii. It furthermore highlights the utility of DendroShiny in identifying a small set of potential candidates out of 3906 initial genes. However, further experimental validation is required to verify our finding.

This type of analysis highlights how selecting specific time points can be used to deduce certain properties, biases, and noise in the underlying data. In this particular example, the clustering results without 2 hpf suggest that this particular time point contains a higher degree of expression variability compared to the others. It is, however, worth mentioning that techniques such as this should be used only as an exploratory step in the analysis pipeline in order to fine-tune the required parameters.
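Excluding time points amounts to dropping columns from every expression profile before the analysis is rerun. A minimal sketch, in which the function names and the toy profile are illustrative assumptions, could look like this:

```python
def select_time_points(profiles, time_points, excluded):
    """Return the kept time points and the profiles restricted to them."""
    keep = [i for i, t in enumerate(time_points) if t not in excluded]
    kept_times = [time_points[i] for i in keep]
    subset = {gene: [values[i] for i in keep] for gene, values in profiles.items()}
    return kept_times, subset


def leave_one_out(profiles, time_points):
    """Yield (dropped time point, reduced profiles) for every time point in turn."""
    for t in time_points:
        yield t, select_time_points(profiles, time_points, {t})[1]


time_points = [2, 4, 6, 8, 10, 12, 14]             # hpf, as in the case study
profiles = {"gene_a": [50, 10, 20, 30, 40, 50, 60]}  # toy expression values
kept_times, reduced = select_time_points(profiles, time_points, excluded={2})
runs = list(leave_one_out(profiles, time_points))
```

Each reduced dataset would then be passed through the same clustering and classification steps, so that changes in cluster composition can be attributed to the excluded time point.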

Discussion and Conclusion

Given the ever increasing complexity and size of modern high throughput sequencing experiments, effective adjustment of analysis parameters in computational processing pipelines is paramount to ensuring that the underlying biology of the datasets is appropriately represented in silico. This, however, cannot always be achieved in an automated manner and requires tailored, algorithmic solutions that allow for fast, interactive, and meaningful visualization of the relevant data aspects affected by the choice of these parameters.

Figure 4.5: Impact of time point selection on co-expression results. Both panels plot expression (log scale) against hours post fertilization. A. Parallel coordinate plot of mcidas at c = 0.005, including all time points from 2 to 14 hpf. With these settings, mcidas represents a single-gene cluster. B. Parallel coordinate plot of the mcidas co-expression cluster with the same c as in plot A (c = 0.005) but excluding 2 hpf from the clustering. These settings result in four genes in the mcidas co-expression cluster. The expression of another known ciliary gene (wdr97), an annotated gene (tc1d1), and one non-annotated transcript are depicted in green, red, and with a red dotted line, respectively.

Our solution, DendroShiny, tackles this challenging issue by providing the user with an intuitive, web-based interface for the exploration of genome-wide co-expression networks. Most importantly, our tool enables efficient selection of clustering parameters in real time on large datasets, allowing the user to interactively inspect the resulting co-expression groups for their biological relevance. This analysis is aided by a large array of auxiliary functionalities including but not limited to the ability to search the dataset by annotation terms, visualizing the expression profiles of the clusters, and representing large volumes of clusters in a novel way without loss of information regarding their hierarchical relationship.

DendroShiny additionally features the ability to exclude an arbitrary number of time points from the analysis. This is important as it allows not only for fast identification of possible noise in the data due to experimental or technological reasons, but also to highlight which time points might contain relevant expression information. Such analysis can for instance be achieved by iteratively removing one time point at a time and observing how the composition of the resulting clusters changes, given the premise that removal of the above-described data would result in a significantly altered cluster landscape.

Our case study highlights the effectiveness and necessity of tools such as DendroShiny, as we were able to infer the potential function of poorly characterized transcripts and guide the selection of non-annotated transcripts for experimental validation. Note that without the ability to interactively explore the dataset under specific clustering parameters, automated pipelines typically rely on user-provided defaults or compute clusters based on a mathematically optimal value derived from measurable properties of the data such as density [214] or posterior probability [215]. The resulting clusters, however, rarely reflect the true underlying biology of the data and therefore require user expertise and manual intervention in order to generate meaningful results.

Overall, our examples show the power of exploring how the cluster content changes as the clustering parameters are modified to yield important insights into the underlying biology of the data. In this context, DendroShiny can be used to aid experimental design. In the above examples, DendroShiny facilitated the identification of poorly characterized genes and hinted at their potential function. A subsequent step would involve the functional validation of this prediction. In a wet-lab setting, DendroShiny has the potential to facilitate the selection of these candidates in an intuitive, graphical, interactive, and user friendly manner depending on the researcher's expertise and interest. Furthermore, the results can be experimentally verified, hence accelerating the discovery of novel genes in a data-driven manner.

DendroShiny's potential is not limited to aiding the identification of the potential function of non-annotated or poorly characterized transcripts but is applicable to a wide array of additional scenarios. As an example, our method could be leveraged to identify co-expressing partners of non-coding RNAs in order to relate their function with that of their co-expressing coding genes.

In all, DendroShiny represents a versatile, interactive, and easy to use tool which we hope will facilitate the analysis of current and future high-throughput sequencing datasets by closing the gap between manual, command line based analysis of datasets, and fully automated but frequently imprecise data processing pipelines.

Bibliography

[187] Paul Flicek, M. Ridwan Amode, Daniel Barrell, Kathryn Beal, Konstantinos Billis, Simon Brent, Denise Carvalho-Silva, Peter Clapham, Guy Coates, Stephen Fitzgerald, Laurent Gil, Carlos García Girón, Leo Gordon, Thibaut Hourlier, Sarah Hunt, Nathan Johnson, Thomas Juettemann, Andreas K. Kähäri, Stephen Keenan, Eugene Kulesha, Fergal J. Martin, Thomas Maurel, William M. McLaren, Daniel N. Murphy, Rishi Nag, Bert Overduin, Miguel Pignatelli, Bethan Pritchard, Emily Pritchard, Harpreet S. Riat, Magali Ruffier, Daniel Sheppard, Kieron Taylor, Anja Thormann, Stephen J. Trevanion, Alessandro Vullo, Steven P. Wilder, Mark Wilson, Amonida Zadissa, Bronwen L. Aken, Ewan Birney, Fiona Cunningham, Jennifer Harrow, Javier Herrero, Tim J P Hubbard, Rhoda Kinsella, Matthieu Muffato, Anne Parker, Giulietta Spudich, Andy Yates, Daniel R. Zerbino, and Stephen M J Searle. Ensembl 2014. Nucleic Acids Research, 42(D1), 2014.

[188] Kim D. Pruitt, Tatiana Tatusova, Garth R. Brown, and Donna R. Maglott. NCBI Reference Sequences (RefSeq): Current status, new features and genome annotation policy. Nucleic Acids Research, 2012.

[189] Jennifer Harrow, Adam Frankish, Jose M. Gonzalez, Electra Tapanari, Mark Diekhans, Felix Kokocinski, Bronwen L. Aken, Daniel Barrell, Amonida Zadissa, Stephen Searle, If Barnes, Alexandra Bignell, Veronika Boychenko, Toby Hunt, Mike Kay, Gaurab Mukherjee, Jeena Rajan, Gloria Despacio-Reyes, Gary Saunders, Charles Steward, Rachel Harte, Michael Lin, Cédric Howald, Andrea Tanzer, Thomas Derrien, Jacqueline Chrast, Nathalie Walters, Suganthi Balasubramanian, Baikang Pei, Michael Tress, Jose Manuel Rodriguez, Iakes Ezkurdia, Jeltje Van Baren, Michael Brent, David Haussler, Manolis Kellis, Alfonso Valencia, Alexandre Reymond, Mark Gerstein, Roderic Guigó, and Tim J. Hubbard. GENCODE: The reference human genome annotation for the ENCODE project. Genome Research, 2012.

[190] Huaiyu Mi, Xiaosong Huang, Anushya Muruganujan, Haiming Tang, Caitlin Mills, Diane Kang, and Paul D. Thomas. PANTHER version 11: Expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Research, 2017.

[191] J. Robert Manak, Sujit Dike, Victor Sementchenko, Philipp Kapranov, Frederic Biemar, Jeff Long, Jill Cheng, Ian Bell, Srinka Ghosh, Antonio Piccolboni, and Thomas R. Gingeras. Biological function of unannotated transcription during the early development of Drosophila melanogaster. Nature Genetics, 2006.

[192] Martin C. Frith, Michael Pheasant, and John S. Mattick. The amazing complexity of the human transcriptome. European Journal of Human Genetics, 2005.

[193] Yasunobu Okamura, Takeshi Obayashi, and Kengo Kinoshita. Comparison of gene coexpression profiles and construction of conserved gene networks to find functional modules. PLoS ONE, 2015.

[194] Jeffrey A. Martin and Zhong Wang. Next-generation transcriptome assembly. Nature Reviews Genetics, 2011.

[195] Brian J Haas, Alexie Papanicolaou, Moran Yassour, Manfred Grabherr, Philip D Blood, Joshua Bowden, Matthew Brian Couger, David Eccles, Bo Li, Matthias Lieber, Matthew D Macmanes, Michael Ott, Joshua Orvis, Nathalie Pochet, Francesco Strozzi, Nathan Weeks, Rick Westerman, Thomas William, Colin N Dewey, Robert Henschel, Richard D Leduc, Nir Friedman, and Aviv Regev. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols, 8(8):1494–512, 2013.

[196] Chon Kit Kenneth Chan, Nedeljka Rosic, Michał T. Lorenc, Paul Visendi, Meng Lin, Paulina Kaniewska, Brett J. Ferguson, Peter M. Gresshoff, Jacqueline Batley, and David Edwards. A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference. Functional and Integrative Genomics, 2019.

[197] C. R. Primmer, S. Papakostas, E. H. Leder, M. J. Davis, and M. A. Ragan. Annotated genes and nonannotated genomes: Cross-species use of Gene Ontology in ecology and evolution research. Molecular Ecology, 2013.

[198] Cecily J. Wolfe, Isaac S. Kohane, and Atul J. Butte. Systematic survey reveals general applicability of "guilt-by-association" within gene coexpression networks. BMC Bioinformatics, 2005.

[199] Bjoern O. Hansen, Neha Vaid, Magdalena Musialak-Lange, Marcin Janowski, and Marek Mutwil. Elucidating gene function and function evolution through comparison of co-expression networks of plants. Frontiers in Plant Science, 2014.

[200] Neil A. Mabbott, J. K. Baillie, Helen Brown, Tom C. Freeman, and David A. Hume. An expression atlas of human primary cells: Inference of gene function from coexpression networks. BMC Genomics, 2013.

[201] Yang Yang, Leng Han, Yuan Yuan, Jun Li, Nainan Hei, and Han Liang. Gene co-expression network analysis reveals common system-level properties of prognostic genes across cancer types. Nature Communications, 2014.

[202] Hyeongmin Kim and Yong Min Kim. Pan-cancer analysis of somatic mutations and transcriptomes reveals common functional gene clusters shared by multiple cancer types. Scientific Reports, 2018.

[203] S. Ballouz, W. Verleyen, and J. Gillis. Guidance for RNA-seq co-expression network construction and analysis: Safety in numbers. Bioinformatics, 2015.

[204] Sipko van Dam, Urmo Võsa, Adriaan van der Graaf, Lude Franke, and João Pedro de Magalhães. Gene co-expression analysis for functional classification and gene-disease predictions. Briefings in Bioinformatics, 2018.

[205] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2018.

[206] Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. shiny: Web Application Framework for R, 2019. R package version 1.3.2.

[207] Xi Chen and Hemant Ishwaran. Random forests for genomic data analysis. Genomics, 2012.

[208] RStudio Inc. Shiny: Easy web applications in R. http://shiny.rstudio.com/, 2014.

[209] Barbara E. Tanos, Hui Ju Yang, Rajesh Soni, Won Jing Wang, Frank P. Macaluso, John M. Asara, and Meng Fu Bryan Tsou. Centriole distal appendages promote membrane docking, leading to cilia initiation. Genes and Development, 2013.

[210] Alice Meunier and Juliette Azimzadeh. Multiciliated cells in animals. Cold Spring Harbor Perspectives in Biology, 2016.

[211] Joshua L. Hood, Barbara B. Logan, Anthony P. Sinai, William H. Brooks, and Thomas L. Roszman. Association of the calpain/calpastatin network with subcellular organelles. Biochemical and Biophysical Research Communications, 2003.

[212] Pascale Gaudet, Michael S. Livstone, Suzanna E. Lewis, and Paul D. Thomas. Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Briefings in Bioinformatics, 2011.

[213] Weiping Zhang, Jing Mi, Nan Li, Lili Sui, Tao Wan, Jia Zhang, Taoyong Chen, and Xuetao Cao. Identification and characterization of DPZF, a novel human BTB/POZ zinc finger protein sharing homology to BCL-6. Biochemical and Biophysical Research Communications, 2001.

[214] João C Marques and Michael B Orger. Clusterdv: a simple density-based clustering method that is robust, general and automatic. Bioinformatics, 2018.

[215] Manon Ragonnet-Cronin, Emma Hodcroft, Stéphane Hué, Esther Fearnhill, Valerie Delpech, Andrew J.L. Brown, and Samantha Lycett. Automated analysis of phylogenetic clusters. BMC Bioinformatics, 2013.

CHAPTER 5. GENERAL CONCLUSIONS

Overview

Even though the primary focus of this dissertation is the analysis of next-generation sequencing data for the identification of ciliogenesis candidate genes, the overarching goal has been the development of in silico tools for data accessibility, analysis, and visualization.

The ability to access and interpret the massive amount of biological data produced by constantly improving sequencing technologies has become a bottleneck in biological research. This is even more relevant when working with non-conventional model organisms, as the majority of tools and databases are focused on the conventional models with annotated genomes. The development of PdumBase, a transcriptome database and research tool for the marine annelid Platynereis dumerilii, therefore represents a substantial contribution to the field of evolution and developmental biology. PdumBase provides a versatile online tool to investigate stage specific transcriptional inputs during embryogenesis and throughout the life cycle of the annelid Platynereis dumerilii and other selected species. In addition, PdumBase provides a searchable interface for an additional six species including Danio rerio, Xenopus tropicalis, Homo sapiens, Nematostella vectensis, Strongylocentrotus purpuratus, and Ascaris suum.

More importantly, PdumBase facilitated the identification of ciliogenesis candidate genes, providing initial annotation to ∼17000 transcripts upon which the curated ciliary annotation was performed. Our systematic approach for the identification of ciliogenesis candidate genes allowed us to identify the majority of genes with known ciliary function in P. dumerilii previously described in other species. A total of 1610 ciliary genes were found conserved in P. dumerilii. This represents the first comprehensive survey of ciliary genes in this invertebrate.

Among the known conserved ciliary genes found in P. dumerilii, we further identified a set of 48 high confidence ciliogenesis genes. Based on co-expression analysis using expression data from normal development, we identified 125 potential new candidate genes. These are genes which have annotation but have not been associated with ciliary function in other species. In addition, we elucidated 70 potential novel ciliary genes among the non-annotated transcripts.

The significance of our findings lies in the fact that our approach allowed us to identify potential new and novel ciliogenesis candidate genes among P. dumerilii transcripts with and without annotation, despite the absence of a reference genome. This in turn enabled the inclusion of poorly characterized transcripts that showed no primary-structure homology to any annotated gene in other species.

The co-expression analysis implemented to identify ciliogenesis candidate genes drew our attention to the need for a visualization tool that would allow us to define the co-expression cluster parameters through visual inspection and to further explore the cluster content interactively. This need motivated the development of DendroShiny, a dynamic visualization tool for the exploration of genome-wide expression data.

DendroShiny enables efficient selection of clustering parameters in real time on large datasets, allowing the user to interactively inspect the resulting co-expression groups for their biological relevance. DendroShiny is a versatile, interactive, and easy to use tool which we hope will facilitate the analysis of current and future high-throughput sequencing datasets by closing the gap between manual, command line based analysis of datasets and fully automated but frequently imprecise data processing pipelines.

The work presented here lays the foundation for further studies to elucidate the molecular processes of ciliary assembly in P. dumerilii. In addition, it represents an approach for the integration of transcriptomic data in the absence of an annotated genome, an approach that can be incorporated in similar studies of other non-conventional organisms.

Future Directions

PdumBase Expansion

Future expansions for PdumBase may include genome-wide expression profiles after experimental manipulations, single cell transcriptomic data for early stages of development, and genomic information of regulatory regions to provide further entry points for promoter and network analysis. Further database subdivisions could display more detailed and/or manually curated aspects of early development such as asymmetric cell division, distinct pathways, or the emergence of distinct cell lineages and cell types. Additionally, we consider the inclusion of images of gene expression data a possible and valuable future expansion. Further PdumBase extensions may include an interactive visualization of clustered genes based on expression profile by integrating components of DendroShiny into the user interface of PdumBase. Such a tool would enable the user to fine tune the cluster parameters while simultaneously visualizing the change in cluster composition, allowing the user to examine different sets of genes at a time.

As such, PdumBase can be seen as a prototype for an online research tool that makes any large-scale genome-wide data set quickly accessible to researchers without requiring prior expertise in bioinformatics, showcasing how a valuable and extensive transcriptional data set can be made accessible for community wide data mining. The versatility and variety of search options hence enable a wider range of research questions to be investigated, both within a single laboratory and across the scientific community.

Ciliogenesis Candidate Genes Outlook

We developed a systematic approach for finding and classifying candidate ciliary genes in Platynereis dumerilii by combining different sources of information. Based on sequence similarity we found that the majority of the known ciliary genes described in other species are also conserved in P. dumerilii. Among those orthologous genes potentially involved in ciliogenesis, we further identified 611 potential targets of the β-catenin signal transduction pathway based on their change of expression in hyperciliated embryos compared to wild-type. In addition, among the potential targets we categorized 46 Platynereis transcripts as high confidence ciliogenesis candidates, as those genes fulfill several stringent criteria.

Based on co-expression patterns using expression data from normal development, we were able to implicate potential novel ciliary candidate genes, 125 among the annotated transcripts and 70 among non-annotated transcripts. However, experimental validation to confirm their involvement in ciliary assembly and function is currently missing. Therefore, part of the future work in relation to our ciliogenesis approach consists of cloning the most promising candidates, whose selection has been guided using DendroShiny, and checking their spatio-temporal expression pattern at different stages of Platynereis dumerilii development. A gene that is involved in ciliogenesis should be expressed in the developing ciliary structures, such as the prototroch, apical tuft and telotroch multiciliated organs of the trochophore larva.

Furthermore, some of the expected ciliogenesis candidate genes that have been well characterized in other species were found not to be affected by the treatment that triggered hyperciliation in P. dumerilii. This result could indicate that these genes are not targeted by the β-catenin pathway but could still contribute to ciliogenesis via alternative regulatory mechanisms in P. dumerilii. For example, these genes could be maternally provided at high levels and/or be ubiquitously highly expressed. However, further work is required to discern between these possibilities.

Towards a Ciliogenesis Gene Regulatory Network

Our results from the identification of ciliogenesis candidate genes show that in Platynereis dumerilii both transcription factors, Rfx and FoxJ1, are likely to be involved in ciliogenesis, as both genes are found in the known candidate ciliary gene set and both are significantly upregulated in the hyperciliated larva and within the ratio interval defined based on the cell lineage affected by the activation of the Wnt/β-catenin pathway. FoxJ1 is expressed early in the ciliary cell lineages that form every ciliary structure in Platynereis larvae (data not shown).

In addition, among the ciliogenesis candidate genes identified here, 392 known ciliary genes found to be conserved in P. dumerilii are reported as targets of Rfx and FoxJ1. Therefore, promoter analysis on those genes constitutes a promising starting point for the study of the regulatory mechanisms driving ciliogenesis in P. dumerilii.

In general, a potential approach to characterize the promoter regions of these candidate genes would include 1) mapping the candidate transcripts to the P. dumerilii draft genome consisting of 147,821 contigs and extracting the upstream region of these genes, 2) developing appropriate in silico pipelines for the identification of potential transcription factor binding sites, and 3) experimental validation through a promoter construct approach to identify functional binding sites.
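
The upstream-extraction part of step 1 can be illustrated with a minimal Python sketch. It assumes that a transcript has already been mapped to a contig and that its coordinates and strand are known. The function names, the 0-based half-open coordinate convention, and the 1,000 bp window are illustrative assumptions, not values or code from this work.

```python
# Hypothetical helpers for illustration only.
# Assumptions: coordinates are 0-based and half-open (start inclusive,
# end exclusive); the default 1,000 bp window is an arbitrary choice.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(contig_seq, gene_start, gene_end, strand, window=1000):
    """Return up to `window` bp upstream of a mapped transcript.

    The result is reported 5'->3' on the strand of the gene. The region
    is truncated when the contig ends before the window is filled, which
    is likely to happen often on short draft contigs.
    """
    if strand == "+":
        begin = max(0, gene_start - window)
        return contig_seq[begin:gene_start]
    if strand == "-":
        end = min(len(contig_seq), gene_end + window)
        return reverse_complement(contig_seq[gene_end:end])
    raise ValueError("strand must be '+' or '-'")
```

The truncation at contig boundaries is the practical reason why upstream sequences may only be available for a reduced number of genes in a fragmented draft assembly.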

To this date, promoter prediction represents a computational challenge. This is especially true in our particular context, as the availability of upstream sequences might be limited to a reduced number of genes due to the lack of a published reference genome for P. dumerilii. Furthermore, no characterized transcription factor binding motifs are currently known for P. dumerilii. Therefore, approaches that rely on genome wide analysis and/or require transcription factor binding site information might not be applicable to our current data. However, as sequencing technology and annotation approaches improve, we expect these limitations to soon be overcome.

Single Cell Transcriptomics

An alternative approach for the study of ciliogenesis in Platynereis dumerilii involves single cell transcriptomics.

Our current data set was obtained from whole embryos and therefore the expression data, obtained after inducing hyperciliation, represent the combined effect of the treatment over a mixture of cells with different cell fates. Furthermore, the cell fate transformation induced by the treatment described in Chapter 3 also affects the cells with endomesodermal fate (Figure 3.2). Therefore the set of upregulated genes in our current dataset represents a mixture of at least two gene sets, ciliogenesis and endomesodermal.

In addition, the cell fate transformation triggered by the treatment caused a significant morphological change in the embryo, not necessarily limited to ciliary assembly. As a result, the majority of the genes are affected by the treatment, which introduces high levels of expression noise and hinders the identification of significantly differentially expressed genes.

Hence, identifying the correct set of genes affected by the treatment, and discerning between ciliary and endomesodermal genes among those genes in the absence of an annotated genome, represents an even greater challenge. We believe we have overcome most of those limitations with our computational approach. However, single cell transcriptomics has the potential to eliminate several sources of biological noise, including the presence of a mixture of cell fates among the genes upregulated in the hyperciliated larva. This in turn can noticeably increase the prediction accuracy of the downstream data analysis pipeline.

The single cell approach would take advantage of the developmental features of Platynereis dumerilii, which presents stereotypical cleavage with predictable cell fates. This in turn enables the isolation of individual trochoblasts, cells committed to differentiate into multiciliated cells. These cells would be isolated over a time series covering development from 4 hpf, when four trochoblasts are present in the animal region of the P. dumerilii embryo, to 24 hpf, when the trochophore is fully developed.

Single cell RNA-seq analysis of the time series described above would allow us to identify genes that distinguish cell states, and to cluster and classify the genes according to their kinetic trends, identifying differentially expressed genes and their expression profiles. In addition, computational analysis of this type of time series single cell transcriptomic data would make it possible to trace the trajectory of a cell as it differentiates. In this particular case, identifying the trajectory of a trochoblast as it differentiates into a multiciliated cell (MCC) would enable the identification of the genes that drive the transition towards a multiciliated cell fate.

This approach, however, requires the sequencing of a large number of cells in a time series with numerous sampling points. With current sequencing technology and its associated cost, this undertaking could prove prohibitively expensive at this time. However, continuous improvement and cost reduction in this biotechnological sector could soon make single cell analysis at this scope a viable option.

Concluding Remarks

Overall, the computational and biological efforts outlined within this work represent a first and crucial step towards the generation of a comprehensive survey of ciliogenesis genes in a non-conventional animal model and towards a comprehensive molecular inventory of a multiciliated cell type. This data set bridges the gap between our current understanding of phenotypic and genomic relationships of ciliogenesis in Platynereis dumerilii and other species. We further believe that the tools developed here are applicable not only in the context of ciliogenesis but also to the study of other biological processes in P. dumerilii and other organisms.

APPENDIX A. PDUMBASE

SUPPLEMENTARY FIGURES

Supplementary Figure A.1.1

Figure A.1.1: Expandable results from PdumBase Result Interface: Co-expression A. The Result Interface with the option “Show other info” selected. B. Co-expression information interface under the tab “The same cluster” displays all the transcripts/genes in the same cluster of a given component. Shown are protein name, correlation, topology overlap, and expression data. C. Co-expression information interface under the tab “All” displays all genes sorted by the ranking according to correlation score. D. Co-expression information interface under the tab “DE genes” displays differentially expressed genes between consecutive time points.

Supplementary Figure A.1.2

Figure A.1.2: Expandable results from PdumBase Result Interface: Gene models A. The Result Interface allows the user to click on each gene found, expanding its expression and annotation information. B. The interface obtained when clicking on a Gene ID from the result table. Plots of expression data over developmental time are shown (early stages and all stages are plotted). The expression plot includes data for all isoforms found for a gene. C. Expression Data tab. This tab displays FPKM and raw counts for a given gene and its possible isoforms. D. Annotation tab. This interface retrieves detailed annotation information.

PDUMBASE MANUAL

PdumBase, the Schneider lab online database for Platynereis dumerilii, provides a comprehensive, versatile online tool to investigate stage specific transcriptional inputs during embryogenesis and during the life cycle of the annelid Platynereis dumerilii and other selected species (e.g. Danio rerio, Xenopus tropicalis, Nematostella vectensis, Strongylocentrotus purpuratus).

This document provides a brief description of the database content and a detailed guide on how to browse its data through exemplary searches. The tutorial is intended as a motivational introduction for exploring and trying out the features PdumBase has to offer as an online resource to integrate and visualize our data and findings.

Database Content

In the following, the database content and its structure are explained. First, the details of the raw RNA-Seq data sets are highlighted, followed by an introduction to their corresponding expression data and associated annotation profiles. Furthermore, the gene expression profiling features of this software are introduced, followed by an introduction to Platynereis specific coexpression networks and the comparative transcriptome data.

1. RNA-Seq Data Sets

Platynereis dumerilii Normal Development Data Set

The Normal Development Data Set comprises two sources of information, each corresponding to different stages of Platynereis dumerilii development at specific time points.

• Early stages data set: RNA-seq data generated by Schneider lab

Description: These data correspond to the first comprehensive transcriptome draft of early development in Platynereis dumerilii, obtained using a de novo assembly strategy. We performed mRNA deep sequencing of distinct stages using the Illumina HiSeq sequencing system with read lengths of 75 bp to 100 bp.

Time points: 0, 2, 4, 6, 8, 10, 12, 14 hours post fertilization (hpf). Each stage has two biological replicates. The depth of these libraries ranges from 40 to 120 million paired-end reads (see Table A.2.1).

Table A.2.1: Time points from Early Stages data set

Time (hpf)   Description        Time (hpf)   Description
0            Unfertilized egg   8            ∼80-cell
2            Zygote             10           ∼140-cell
4            ∼8-cell            12           ∼220-cell
6            ∼30-cell           14           ∼330-cell

Assembly: All the biological replicates, which contain about 1.5 billion reads, were assembled into 357,961 transcripts in a genome independent manner. Due to events, out of the total transcripts assembled, 193,310 belong to genes.

Time points: The late stages data set consists of 10 time points from 24 hours post fertilization to 3 month old adults. This set also includes female and male RNA-seq samples. There are no biological replicates (Table A.2.2).

Table A.2.2: Late Stages included in data set. Time points are shown in hours post fertilization (hpf), days (d) and months (M).

Time       Description
24 hpf     Early trochophore larvae
36 hpf     Mid trochophore larvae
48 hpf     Early metatrochophore larvae
72 hpf     Early nectochaete larvae
4 d        Mid nectochaete larvae
10 d       Errant juvenile
15 d       3-segmented errant juvenile
1 Mpre
1 Mpost
3 M        Adult
Male       Sexually mature adult
Female     Sexually mature adult

2. Expression Data

The PdumBase web interface displays the mean FPKM (fragments per kilobase per million reads mapped) as the default measurement of gene expression. The FPKM for each replicate was obtained by normalizing the total number of mappable reads with the corresponding transcript length. A transcript or gene is considered expressed if its FPKM is ≥ 1. Furthermore, the FPKM for each stage was obtained by combining the replicates into a single set.
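
As an illustration of the measure described above, the following Python sketch computes FPKM from its standard definition and applies the FPKM ≥ 1 expression threshold. The function names are hypothetical, and averaging the replicates is an assumption made here to mirror the "mean FPKM" shown in the interface; it is not necessarily how the replicates were combined in the pipeline behind PdumBase.

```python
def fpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Fragments per kilobase of transcript per million mapped reads.

    Standard definition: count / (length in kb) / (mapped reads in millions),
    which simplifies to count * 1e9 / (length in bp * mapped reads).
    """
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def mean_fpkm(replicate_values):
    """Average the FPKM values of the replicates of one stage (assumption)."""
    return sum(replicate_values) / len(replicate_values)


def is_expressed(fpkm_value, threshold=1.0):
    """Apply the expression threshold used in the text (FPKM >= 1)."""
    return fpkm_value >= threshold


# Hypothetical numbers: 500 fragments on a 2,000 bp transcript in a
# library with 50 million mapped reads.
example = fpkm(500, 2000, 50_000_000)
```

With these hypothetical numbers the transcript has an FPKM of 5 and would count as expressed.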

The search result page displays the mean FPKM values as the default measurement of gene expression (see Figure A.2.3). However, FPKM values from individual samples, as well as the raw counts of each transcript, can also be retrieved by clicking on the “Expression data” tab after selecting a particular transcript of interest (Figure A.2.4).

For more information we refer the reader to the Tutorial Example Section.

3. Annotation

This section describes the different annotations and how they were sourced from external databases for convenient browsing and data exploration specific to Platynereis dumerilii.

Figure A.2.3: PdumBase Search result interface displays mean FPKM as measurement of absolute expression

Uniprot Annotation

The PdumBase search results interface retrieves the Uniprot annotation data, displaying the Uniprot accession number, gene name, protein name, the species of annotation origin, and the E-value (see Figure A.2.5). The annotation was performed using BLASTP by aligning the transcripts with predicted open reading frames (ORF) against non-redundant SwissProt databases. A total of 31,806 transcripts (17,213 genes) retrieved at least one hit using an E-value cutoff of 10^-10. Among the annotated transcripts, 26% aligned to human and 19% to mouse proteins.
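
The E-value filtering step can be sketched in Python as follows. The sketch assumes that the BLASTP results are available in the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). Both that storage format and the function name are assumptions made for illustration; the text only states that BLASTP was used with an E-value cutoff of 10^-10.

```python
def best_hits(blast_lines, max_evalue=1e-10):
    """Keep, for each query, the hit with the lowest E-value.

    Hits with an E-value above `max_evalue` are discarded. Treating the
    cutoff as inclusive is an assumption of this sketch.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])  # column 11 of the tabular layout
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best
```

A transcript with no entry in the returned dictionary would remain unannotated at this cutoff.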

Pfam Annotation

We also annotated potential protein domains by aligning all transcripts against the Pfam database. The Pfam annotation can be accessed in the database web interface by selecting the option “Show detailed annotation” on the search results page, or by clicking on the tab “Annotation” after having selected a particular transcript from the result interface (see Figure A.2.6). Annotation was performed using HMMER. We were able to assign Pfam domains to 32,464 transcripts (18,146 genes), identifying a total of 431,701 Pfam domains. Furthermore, out of the transcripts with domain annotations, 28,326 (15,690) were also present in the Uniprot BLASTP annotation.

Figure A.2.4: PdumBase Expression data tab interface: (a) The upper frame displays mean FPKM and raw counts data from samples as a pool. (b) The lower frame displays expression data from individual replicates.

Figure A.2.5: PdumBase Search results interface displays Uniprot annotation data on the rightmost panel. Annotation data includes accession number, gene name, protein name, species and E-value. Clicking on the accession number will redirect to the UniProt page for that particular protein.

KEGG Pathways Annotation

Identifying the active biological pathways in early stages is crucial to decipher the mechanisms involved in the diversification of embryonic cells. The Kyoto Encyclopedia of Genes and Genomes (KEGG) provides well-annotated pathway databases including metabolism, genetic and cellular processing. Our assembled transcripts were mapped to KEGG pathways. In total, 18,532 transcripts (10,132 genes) are associated with the known KEGG pathways.

Figure A.2.6: PdumBase Search results interface. (a) The search result page allows the user to customize the information displayed by checking one or more options in the top left corner. (b) Selecting the option “Show detailed annotation” will show detailed gene ontology, KEGG Pathways and annotation.

In our database, the KEGG annotation is accessible by selecting the option “Show detailed annotation” as seen in Figure A.2.6.

Gene Ontology Annotation

The assembled transcripts were also annotated with Gene Ontology (GO) terms of homologous genes. A total of 30,287 transcripts (16,498 genes) could be associated with at least one annotated GO term. The GO annotation shows high enrichment in the function associated with transcription and regulation activities in the biological process and molecular functions. The GO terms related to cell differentiation such as “cell transduction”, “”, “cell division” and “cell cycle” are also enriched.

All annotation information for a given transcript is summarized and displayed in the annotation tab interface (see Figure A.2.7).

One important feature of our database is that the search interface allows for the submission of searches by Keyword, Pfam, Gene Ontology, and KEGG Pathway, making it possible to narrow down a request by a particular annotation of interest.

4. Gene Expression Profiling

Our Platynereis dumerilii database, PdumBase, includes detailed gene expression profiling of the early developmental stages (2 to 14 hpf). An expression profile can be interpreted as the changes in the abundance of a transcript over time.

Plots depicting these fluctuations of transcript abundance (FPKM) are shown for each transcript. These data are available via the option “Show Plots” on the search result page (Figure A.2.8), or by clicking on the transcript of interest and selecting the tab labeled “Plots” (Figure A.2.9).

Figure A.2.7: PdumBase Annotation tab interface. This tab is available once an entry from the result page has been selected. It is accessible via result page → clicking on a gene or transcript of interest → clicking on the Annotation tab

For the purpose of the expression profiling analysis we filtered out low expression transcripts. Among the assembled transcripts with predicted ORF, 18,940 transcripts and 13,160 genes were found to be expressed in at least one of the 7 stages. After clustering the genes according to their expression profile, we found a total of 15 distinct clusters (see Figure A.2.10).

Figure A.2.8: PdumBase Search result interface. Expression profile plots are displayed when the option “Show plots” is selected.

Clusters 1-4 show the obvious maternal signature, with a total of 4,302 genes belonging to this group. Clusters 10-15 (5,827 genes) correspond to zygotic genes with slightly different activation time points. Clusters 3 and 11 are the major maternal and zygotic groups, respectively, showing slowly decreasing and slowly increasing expression patterns. Clusters 6, 7, and 8 contain genes whose RNAs were mainly expressed at 4, 6 and 8 hours and degraded after these stages. Cluster 9 is a less dynamic group, showing stable expression throughout all stages.

Figure A.2.9: PdumBase Plot tab interface. Shows the expression profile plot for a given transcript.

Access to the cluster information is available by selecting the option “Show other info” on the search result page and then, for a selected transcript/gene, clicking on the icon under “Coexpression info”. The first tab of the new results page will display all the genes in the same cluster, along with other expression data (see Figure A.2.11).

5. Coexpression Networks

A coexpression network is a correlation network that describes the pairwise correlation patterns of expression data. When a set of genes is highly correlated, the genes may share similar biological function or be involved in the same biological pathway. A coexpression network can also be used for identifying hub genes, which have high connectivity to other genes in a cluster. We used weighted correlation network analysis (WGCNA) to analyze Platynereis dumerilii expression profiling data.

Figure A.2.10: Heat map of 13,160 expressed genes clustered into 15 groups according to the time series patterns.

For this analysis, we included a total of 13,192 genes whose FPKM was ≥ 1 for at least one sample. Correlation values and topology overlap for the coexpression networks can be found in the database on the Coexpression information interface. This page can be reached from the search result interface by selecting the option “Show other info” and by clicking on the icon under the column “Coexpression info” in the results table. The Coexpression information interface is shown in Figure A.2.11.
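
To make the two reported quantities concrete, the following Python sketch computes pairwise Pearson correlations and the topological overlap measure commonly used in weighted correlation network analysis. It is not the WGCNA R package and not the code used for PdumBase. The unsigned adjacency |cor|^β, the soft-threshold power β = 6, and all function names are assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles.

    Profiles with zero variance are not handled (division by zero).
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression(profiles, beta=6):
    """Return gene order, correlation matrix and topological overlap matrix.

    `profiles` maps a gene ID to its expression values across stages.
    Adjacency is a_ij = |cor_ij| ** beta with a zero diagonal (assumption:
    unsigned network). Topological overlap is
    (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    where k_i is the connectivity (row sum of the adjacency) of gene i.
    """
    genes = list(profiles)
    n = len(genes)
    corr = [[1.0 if i == j else pearson(profiles[genes[i]], profiles[genes[j]])
             for j in range(n)] for i in range(n)]
    adj = [[0.0 if i == j else abs(corr[i][j]) ** beta
            for j in range(n)] for i in range(n)]
    k = [sum(row) for row in adj]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return genes, corr, tom
```

Two genes obtain a high topological overlap when they are strongly correlated with each other and also share strongly correlated neighbors, which is why this measure is less sensitive to single spurious correlations than the raw correlation value.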

Figure A.2.11: PdumBase Coexpression information interface. Displays all the transcripts/genes in the same cluster of a given component, shows protein name, correlation and topology overlap.

6. Comparative Transcriptome Data

Ortholog Expression

With the aim of identifying conserved stages of development, we gathered publicly available expression data from five species, for which we then identified orthologs with respect to Platynereis dumerilii (see Tables A.2.3 and A.2.4), and proceeded to establish global comparison expression profiles among the ortholog groups.

The ortholog expression data for a particular Platynereis dumerilii transcript can be found in our database by selecting the option “Show other info” and by clicking on the icon under the column “Ortholog Expressions” for the specific transcript of interest. The resulting interface will display the ID number and expression data for the orthologs found for that transcript/gene in the other 5 species (see Figure A.2.12).

Figure A.2.12: PdumBase Ortholog expression profile interface. Displays the expression data from the selected Platynereis dumerilii gene and the ortholog genes found in the other species along with their expression and annotation data (when available).

Table A.2.3: Species and number of protein sequences for comparative analysis

Species                   Number of sequences
Platynereis dumerilii     28,580
Danio rerio               26,241
Xenopus tropicalis        18,442
Homo sapiens              23,393
Nematostella vectensis    27,273
Ascaris suum              15,446

Table A.2.4: Number of ortholog genes between the 6 species

Species                   Platynereis   Danio    Xenopus      Homo      Nematostella   Ascaris
                          dumerilii     rerio    tropicalis   sapiens   vectensis      suum
Platynereis dumerilii                   5635     5402         5051      5840           3654
Danio rerio                                      10784        10246     6731           4307
Xenopus tropicalis                                            10284     6415           4140
Homo sapiens                                                            6094           3941
Nematostella vectensis                                                                 4245
Ascaris suum

Ortholog Groups

We also identified ortholog genes for 18 selected species (Table A.2.5) using the program OrthoMCL. This program runs all versus all BLASTP queries among all the protein sequences from these 18 species and selects the best reciprocal hits. Once the ortholog genes were identified, phylogenetic trees were assembled using RAxML.
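
The notion of a best reciprocal hit can be illustrated with a short Python sketch. It covers only the reciprocal best hit selection described above; OrthoMCL additionally normalizes scores and groups proteins by Markov clustering, which is not shown. The input format (dictionaries of bit scores keyed by query and subject) and the function names are assumptions made for illustration.

```python
def best_hit(scores):
    """Map each query to its highest-scoring subject.

    `scores` is a dictionary {(query_id, subject_id): bit_score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return the pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hit(a_vs_b)
    best_ba = best_hit(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

A protein whose best hit in the other species points back to a different protein is not paired by this criterion.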

To access the ortholog genes for a given Platynereis dumerilii transcript/gene, select the option “Show other info”. If ortholog groups are found for that particular transcript, a check-mark will appear under the field “Ortholog groups”. Clicking on this icon will open a new interface with four tabs: “List”, “Tree-ML”, “Tree-Parsimony”, and “Alignment” (see Figures A.2.13, A.2.14, and A.2.15 respectively).

Tutorial Example: Searching By Keyword

This section shows some of the features of the Platynereis dumerilii web database through exemplary searches using the Blast Info search function.

Table A.2.5: Species and number of genes used to find ortholog groups

Class            Code   Species                          Number of genes
Lophotrochozoa   pdu    Platynereis dumerilii            28,580
Lophotrochozoa   cte    Capitella teleta                 32,415
Lophotrochozoa   hro    Helobdella robusta               23,423
Lophotrochozoa   lgi    Lottia gigantea                  23,851
Lophotrochozoa   cgi    Crassostrea gigas                26,089
Ecdysozoa        dpu    Daphnia pulex                    30,907
Ecdysozoa        tca    Tribolium castaneum              16,524
Ecdysozoa        dme    Drosophila melanogaster          13,937
Deuterostomia    spu    Strongylocentrotus purpuratus    20,759
Deuterostomia    sko    Saccoglossus kowalevskii         34,239
Deuterostomia    bfo    Branchiostoma floridae           50,817
Deuterostomia    dre    Danio rerio                      26,459
Deuterostomia    xtr    Xenopus tropicalis               18,442
Deuterostomia    hsa    Homo sapiens                     23,393
Prebilateria     nve    Nematostella vectensis           27,273
Prebilateria     aqu    Amphimedon queenslandica         29,883
Prebilateria     tad    Trichoplax adhaerens             11,520
Preanimalia      mbr    Monosiga brevicollis             9,196

Search

The search interface allows the user to submit searches under different criteria: by Keyword, Pfam, SignalP, TMHMM, EggNog, Gene Ontology, and KEGG Pathway (Figure A.2.16). By searching under different or combined fields, the search can be customized according to the user's needs.

In addition, the search interface offers the option of selecting a sorting criterion to retrieve the results according to the expression values from any stage (0 to 14 hpf) (Figure A.2.17). This feature can be particularly convenient when searching with terms that might result in a multitude of hits, such as “cell cycle”, which retrieves more than 1000 genes, or “membrane”, with around 500 hits. Searching for such general terms might result in a request that takes more than 60 seconds to load. Please allow time for those general searches to load.

Figure A.2.13: PdumBase List tab interface under Ortholog groups. Shows the species list, code, name, ortholog protein ID and contains links to access/download the protein and cDNA sequences in Fasta format.

On the other hand, when searching for a particular gene name, for instance the transcription factor FoxA2 in the field Blast Info, the most likely outcome will be one single hit displaying the Platynereis dumerilii transcript/gene with that particular annotation.


Figure A.2.14: PdumBase Ortholog groups interface: (a) Phylogenetic tree among ortholog genes displayed under Tree-ML tab (b) Phylogenetic tree displayed under Tree-Parsimony tab. Both trees show the species code and the transcript/gene ID.

Figure A.2.15: PdumBase Alignment tab interface under Ortholog groups. Displays CLUSTAL 2.1 multiple sequence alignment.

Search Results

The resulting search results interface displays by default the transcript or gene model ID, protein name, expression data as mean FPKM from early stages (0 to 14 hpf), expression data in inhibitor experiment, and annotation information (Figure A.2.18).

Figure A.2.16: PdumBase Search interface.

Figure A.2.17: PdumBase Search interface. Searching for FoxA2

In addition, the results interface allows the user to expand the results displayed by selecting from the options in the upper left corner. The user can select one or more options according to their particular research needs (see also the Expanded Search Result Options section).

Figure A.2.18: PdumBase Search result interface. Shows Gene ID, expression data from early stages and from inhibitor experiment, and annotation information. The data retrieval options are found in the upper left corner.

Access to Detailed Information

Clicking on the gene model for FoxA2, “comp221418_co”, will give access to the detailed data results interface. The detailed data result page has three tabs: Plot, Expression data and Annotation, from which different information can be accessed.

The Plot Tab

Clicking on the Plot tab will display expression profile data (FPKM values against stages) for early and late stages (Figure A.2.19).

The Expression Data Tab

The Expression data tab will show mean and individual sample FPKM values as well as raw counts (Figure A.2.20).

Figure A.2.19: PdumBase Plot tab from Detailed data results interface. Displaying expression profile plots for FoxA2.

The Annotation Tab

Clicking the Annotation tab will retrieve a summary of all annotation related information including: species from which the annotation was obtained, GO extended annotation, KEGG pathways, EggNog, and Pfam domains (see Figure A.2.21).

Expanded Search Result Options

The search result default data output can be expanded by selecting the options provided in the search results interface (Figure A.2.22).

Selecting “Show plots”

Selecting the “Show plots” option will retrieve a visual representation of the early and late stage expression profiles for all the Gene IDs displayed in the search result interface (Figure A.2.23).

Figure A.2.20: PdumBase Expression data tab from Detailed data results interface. (a) Displays expression data (mean FPKM and raw counts) from pooled samples from early stages of development. (b) Shows individual replicate expression data for early stages.

Selecting ”Show later stages”

To display the mean expression data (FPKM) from later stages of development (24hpf to 3M), select the option ”Show later stages”, as shown in Figure A.2.24.

Selecting ”Show other info”

Clicking ”Show other info” provides access to additional data on comparative transcriptomics (see Figure A.2.25):

• Ortholog Expressions - if available a green check-mark icon will be displayed.

• Ortholog groups - if available a green check-mark icon will be displayed.

• Coexpression info - if available a blue icon will be displayed.

Figure A.2.21: PdumBase Annotation tab from Detailed data results interface. Displaying detailed annotation information for FoxA2.

Note that the additional data are not available for all gene models, but only for those transcripts for which orthologous genes were identified. See Table A.2.4 for the estimated numbers of orthologs found.

Figure A.2.22: PdumBase Search results interface. Checking the boxes in the search result options on the left expands the results displayed.

Figure A.2.23: PdumBase Search results interface with the option ”Show plots” selected. Expression plots for both early and late stages are shown for the gene under search, FoxA2.

Coexpression Link

Selecting the coexpression link gives access to data about expression profiling and coexpression. The ”The same cluster” tab of this interface displays the Gene IDs of all genes belonging to the cluster of the gene under search (see Figure A.2.26).

Figure A.2.24: PdumBase Search results interface with the option ”Show later stages” selected. Here the later-stage expression data for FoxA2 are displayed.

Figure A.2.25: PdumBase Search results interface with the ”Show other info” option selected. Additional information links are displayed.

Figure A.2.26: PdumBase Search results interface with the ”Show other info” option selected. The Coexpression info link displays the list of genes clustered with the gene under study.

Orthologs Groups Link

Clicking the Orthologs groups link gives access to an interface with three tabs: List, Tree-ML and Tree-Parsimony. As mentioned in the section ”Comparative transcriptome data”, 18 species were selected to assess the ortholog groups. The first tab shows the list of species in which orthologs were found for the searched gene. This interface also allows the user to download the protein and cDNA sequences of the orthologs in FASTA format (see Figure A.2.27).

The second and third tabs under the Orthologs groups link display phylogenetic trees based on maximum likelihood (ML) and parsimony analysis, respectively. Figure A.2.28 shows the Tree-ML for the FoxA2 orthologous genes.

Figure A.2.27: PdumBase Search results interface with the ”Show other info” option selected. The Ortholog groups link displays the list of species in which orthologs were found. For FoxA2, orthologs were found in all of the 18 selected species.

Example Search for ”Homeobox genes”

This final example shows a sample search with multiple results and indicates the options that our web database offers for downloading the data in case further analysis is required.

Finding Homeobox Genes that are Highest Expressed at 8hpf

Searching for the term homeobox in the blast field of the search interface retrieves 114 hits. To find the most highly expressed homeobox genes at 8 hpf, the hits must be sorted by expression value at 8 hpf in descending order (see Figure A.2.29).
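The same sorting step can also be reproduced offline on a downloaded result file. The following is a minimal sketch, assuming the results are available as CSV text with one expression column per stage; the column names (gene_id, annotation, 8hpf) and all values are hypothetical placeholders, not the actual PdumBase export layout.

```python
import csv
import io


def top_hits(csv_text, stage_column, n=10):
    """Return the n rows with the highest expression in the given stage column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Expression values are read as text, so convert to float before sorting.
    rows.sort(key=lambda row: float(row[stage_column]), reverse=True)
    return rows[:n]


# Hypothetical export; the column names and values are made up for illustration.
example = """gene_id,annotation,8hpf
geneA,homeobox,12.5
geneB,homeobox,98.1
geneC,homeobox,0.0
"""

for row in top_hits(example, "8hpf", n=2):
    print(row["gene_id"], row["8hpf"])
```

Converting the values to numbers before sorting matters: sorting the raw strings would place "98.1" after "12.5" only by coincidence of their first characters and would fail for values such as "100.0".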

Figure A.2.28: PdumBase Search results interface with the ”Show other info” option selected. The Ortholog groups Tree-ML tab displays the phylogenetic tree constructed with the ortholog protein sequences. Tree-ML for FoxA2 orthologs among the 18 species.

Downloading Results from Platynereis dumerilii Web Database

One important feature of our web database is that it allows the search results to be downloaded in different formats. The search results can be downloaded both as a comma-separated values (CSV) file and as an Excel file. Furthermore, the protein sequences of the genes displayed in the results can be downloaded in FASTA format. Download links are found in the upper frame of the search result interface (see Figure ??).
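For further analysis, a downloaded FASTA file can be read with a few lines of code. The sketch below is one possible way to do this with the Python standard library only; the record identifiers and sequences are hypothetical placeholders and do not come from PdumBase.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines belonging to the most recent header.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record file; identifiers and sequences are placeholders.
example = """>gene_model_1
MKLV
AAGT
>gene_model_2
MSTP
""".splitlines()

proteins = read_fasta(example)
for name, seq in proteins.items():
    print(name, len(seq))
```

For a real download, the same function can be applied to an open file handle, since it only iterates over lines.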

Figure A.2.29: PdumBase Search interface. Search required to find the homeobox genes that are most highly expressed at 8hpf.

Figure A.2.30: PdumBase Search results interface. Here the result page displays the top ten hits, sorted by expression level at 8 hpf. Links to download data are shown with a floppy disk icon and are found in the upper frame.

Concluding Remarks

Given the features presented here and the ease of use that our Platynereis dumerilii database offers, we are confident that this work will provide a reliable resource to the community for transcriptome studies, owing to its extensive content and user-friendly design.

APPENDIX B: CILIOGENESIS

SUPPLEMENTARY FIGURES

Supplementary Figure B.1.1

Figure B.1.1: Algorithmic pipeline to identify, by sequence similarity, P. dumerilii homologs of known ciliary genes characterized in other species.

Supplementary Figure B.1.2

Figure B.1.2: Pipeline to identify ciliogenesis candidate genes in P. dumerilii. Includes major components for: 1) differential expression analysis in hyperciliated larvae obtained by β-catenin induced cell fate transformation (major steps shown in blue; the number of transcripts included in the analysis is shown in the schematic on the left), 2) information from well-characterized known ciliary genes in other species, and 3) co-expression analysis with expression data from normal development (the number in the box indicates the number of genes identified in P. dumerilii in each category).

Supplementary Figure: Cover Art

Figure B.1.3: Cover art of the ciliogenesis paper to be submitted in conjunction with the manuscript.

SUPPLEMENTARY TABLES

Supplementary Table B.2.1

Table B.2.1: Known ciliary genes (2359) compiled from 7 sources (see Table 3.1 for detailed information). The columns represent, from left to right: Ensembl ID, the Ensembl ID as unique identifier. Source, of known genes, sorted alphabetically by author and shown as: 1 = Cildb by Arnaiz [216], 2 = FoxJ1 targets by Choksi [217], 3 = Rfx targets by Chung [218], 4 = FoxJ1 targets by Quigley [219], 5 = Rfx targets by Quigley [219], 6 = Ciliopathy by Reiter [220], 7 = Ciliary proteome by Sigg [221], and 8 = SysCilia by Van Dam [222]. Gene Name. Localization, containing functional terms from the original source. Known Ciliary database, shown if/when a gene has also been reported in a cilia related database. TF interaction, indicating whether a gene is a target of FoxJ1 and/or Rfx transcription factors. Ciliopathy, indicating if a known gene is confirmed to be involved in a ciliopathic phenotype (green check mark), or if it is a candidate ciliopathic gene (orange check mark). Found, where × indicates if the known gene was found conserved in P. dumerilii. Component ID, referring to the transcript id in PdumBase (see chapter ). Local Gene Name, providing the local annotation in PdumBase. DE Info, indicating if the known ciliary gene was upregulated (upward arrow ↑), down regulated (downward arrow ↓), neutral or with no significant change of expression (rightward arrow →), or not included in the DE analysis (marked as ×). The columns Percent Identity, EValue, and BitScore refer to the metrics of the sequence alignment with BlastP, while a Weighted Score smaller than 1 indicates multiple matches (see Methods in chapter 3). Category indicates if the gene belongs to one or more functional or structural components (see Methods in chapter 3).
Body Found Category Score EValue BitScore Weighted 40 2.61e-119 369 1 × tity Per- cent Iden- 85.088 1.65e-7197.03 209 9.65e-6495.683 6.58e-90 204 189.24794.268 5.39e-118 279 0.181 ×35.593 5.29e-9759.031 5.95e-37 × 336 0.248 Axonemal, Basal Body, 57.326 308 Transport, Axonemal, × 1.15e-145 Basal 138 0.298 0 0.273 × 44270.275 545 × 143.478 0.344 2.27e-24 0 × 27.559 ×43.846 1.04e-07 171.884 91.7 5.15e-36 842 Motility ×75.942 54.782.474 126 0.656 0 Signaling, Axonemal,77.662 Basal 122.573 0 ×38.114 1 × 3.1e-15 0 540 1 0 Other44.375 Organells × 569 1.54e-39 79.3 0.487 × 0 67056.209 617 9.45e-61 0.513 × 135 0.521 64671.141 0.479 1 × 4.16e-151 194 × 0.173 0.827 × × 427 × ×31.356 Other Organells 1 3.27e-2570.94 Other Organells × 1 1.81e-119 102 × 340 1 1 × × DE Info ↑ ↓ ↓ ↓ ↓ → ↓ ↓ ↓ → → → ↓ ↓ ↑ ↑ ↑ → → ↓ ↓ ↑ ↓ ↑ Name Component ID Local Gene (continued) comp213943_c0comp213953_c0 DYLT1 comp213953_c0 RAP1B RAP1B comp213968_c0comp214048_c0 SUFU 2AAA comp214048_c0 2AAA comp214110_c0 GMPR2 comp214157_c0comp214179_c4 GSK3B NPHP3 comp214289_c1 PLS2 comp214336_c0 TTL10 comp214409_c0 VGFR1 × 37.882 2.67e-86 290 0.51 × X X X X X X X thy Cil- iopa- action RFX FOXJ1 FoxJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 FoxJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 FoxJ1FoxJ1 comp214353_c5FOXJ1 TPPP2 comp214382_c0 × 53.165 CD022 4.56e-48 154 1 × Table B.2.1: SysCilia CilDB Ciliopathy Survey Cilia Proteome Ciliome Survey Ciliome2 Cilia Proteome Ciliome Ciliome SysCilia Ciliopathy Survey CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Ciliopathy Survey Ciliome2 FoxJ1SysCilia Ciliopathy Survey comp214057_c0 F183A Ciliome CilDB Ciliome2 Cilia Proteome Ciliome Localization Known Ciliarytransition zone axoneme tz TF inter- tip (dynein regulatory complex) could not find) compartment Name axoneme c4orf22 fam183b ift22, rabl5 cilium ift ift-b SysCilia CilDB Ciliopathy 6, 8 6, 7, 8 Ensembl IDENSG00000146425 Source 1, 2, 4, 5, ENSG00000128581 1, 2, 4, 5, Gene ENSG00000117477 2, 4, 5, 7ENSG00000127314ENSG00000131149 
7ENSG00000187151 ccdc181 3ENSG00000107882 2 6, 8ENSG00000114473 1, 2, 6, 7 kiaa0182 rap1bENSG00000105568 angptl5 6ENSG00000186973 sufu Ciliome2 Cilia iqcg Proteome 2, 7 ciliary tipENSG00000148841 ciliary motile axoneme ENSG00000116741 3ENSG00000137198 3 ppp2r1a 4, 5ENSG00000100938 fam183a, Ciliome2 role inENSG00000082701 cilia? (i 3ENSG00000105723 8ENSG00000162598 3, 6ENSG00000113971 itprip 7 rgs2 6, gmpr 8ENSG00000188716ENSG00000030419 2 gmpr2ENSG00000196542 3 gsk3b gsk3a 3, gm12695 4, 5 cytosolicENSG00000215021 Rfx2ENSG00000162571 1 inversin FoxJ1 4, 5 dupd1ENSG00000159713 sptssb Ciliopathy 1, Survey 2ENSG00000204950ENSG00000126895 comp213953_c0 2ENSG00000197826 SysCilia 3 comp213953_c0 phb2 1, 2, ttll10 comp213956_c2 7ENSG00000157404 RAP1B Rfx2 tppp3 4, 5ENSG00000134853 RAP1B 1700007g11rik, ENSG00000077274 FBCD1 3 lrrc10b 2 avpr2 Rfx2 Rfx2 RFX CilDB Rfx2 kit pdgfra CilDB Cilia capn6 Proteome comp214104_c2 comp214108_c0 FoxJ1 MB212 RFX comp214110_c0 comp214157_c0 Rfx2 RGS20 GMPR2 GSK3B comp214178_c0 Cilia Proteome RFX comp214179_c4 CA087 comp214244_c2 FoxJ1 DUS3 Rfx2 ZN37A FoxJ1 comp214304_c0 × RFX 30.851 Rfx2 PHB2 comp214364_c4 3.38e-23 comp214374_c12 comp214427_c1 100 LRC40 ANR CAN5 comp214409_c0 × 1 × 41.46 38.4 × VGFR1 4.96e-154 1.4e-77 × 36.325 459 1.32e-81 246 279 1 1 × 0.49 × × 187 Ciliary Membrane, Transition Zone, Other Organells Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted 80 2.48e-66 224 0.116 × Axonemal, Basal Body, tity Per- cent Iden- 72.52743.925 1.4e-84 1.12e-2141.975 24739.13 1.08e-67 87 2.97e-7538.395 1.43e-71 211 1 241 163.29154.326 2.61e-27 × 226 1.59e-178 × 1 1 115 Basal Body × 511 1 × 0.18487.944 0.816 ×81.818 3.51e-98 × × 69.388 Central Pair 034.717 3.61e-9762.069 2.35e-83 309 2.47e-89 1400 29855.992 275 0.16 0.724 8.78e-153 283 × 56.522 × 1 444 0.20466.463 5.29e-152 153.186 ×41.833 1e-63 1.63e-147 × 0.321 × 441 4.75e-70 Basal Body, Transition Zone × 421 216 0.319 220 
× 0.156 156.122 × 1 × 1.8e-66 × 51.11128.571 1.46e-84 20439.531 2.22e-09 7.71e-108 278 58.9 352 140.789 0.825 9.1e-78 0.175 × × × 1 Axonemal, Basal 243 Body, × Axonemal, Ciliary 1 × DE Info → ↑ ↓ ↑ ↑ ↓ ↓ → → → ↓ ↑ ↓ ↓ ↓ ↓ → ↑ ↓ ↑ ↑ ↓ ↓ Name Component ID Local Gene (continued) comp214437_c1comp214495_c2 LRRC7 × 32.704 U740 comp214566_c0 1.27e-18comp214643_c0 CA177 85.9 RTDR1 comp214656_c5 1 × TERA comp214656_c5 TERA comp214706_c0comp214706_c0 MEIS2 MEIS2 comp214739_c0comp214768_c0 PIHD1 comp214768_c0 HYEP × HYEP 50.442 3.77e-145 × 48.101comp214810_c0 432 1.02e-43comp214814_c0 0.613 ANKS6 151 × ASPG 0.214 × × 57.895 5.48e-40 142 1 × X X X X thy Cil- iopa- action FOXJ1 FOXJ1 FoxJ1 RFX FOXJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 Rfx2 FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 Table B.2.1: Ciliome2 Cilia Proteome Ciliome SysCilia Ciliopathy Survey Cilia Proteome Ciliome SysCilia comp214773_c0 RANG Localization Known Ciliaryaround bb TF inter- Ciliopathy Survey Ciliome2radial spoke? RFX CilDB Ciliopathy Survey cytoplasm axonemal dynein complex assembly cytosol lexm rtdr1, Name rsph14 ccdc103 axoneme cfap126, c1orf192 7 6, 7 6, 8 Ensembl IDENSG00000165383 Source 4, 5ENSG00000123728ENSG00000188931 3 2, 4, 5, 6, Gene ENSG00000158457ENSG00000162398 3 lrrc18 4, 5, 7ENSG00000100218 rap2c 1, 2, 4, 5, ENSG00000260916 c1orf177, ENSG00000240038 tspan33 3ENSG00000167131 2 2, 3, 4, 5, ENSG00000165280 ccpg1ENSG00000187288 1 amy2b 4, 5ENSG00000213024ENSG00000187510 8ENSG00000135709 3 3, 4, 5ENSG00000134138 cidec vcp 3, RFX 4, 5 kiaa0513 nup62ENSG00000143995 plekhg7ENSG00000125740 Rfx2 3 transition zoneENSG00000153936 2ENSG00000104872 3 SysCilia Rfx2 4, 5ENSG00000143819 3, 4, 5 CilDB meis1ENSG00000164114 comp214493_c0 4, 5 hs2st1ENSG00000150556 pih1d1 Rfx2ENSG00000099901 comp214544_c0 3 RAP2C FoxJ1 ephx1 8ENSG00000115459 TSN33 ENSG00000179387 3 map9ENSG00000165138 3 6 lypd6bENSG00000185055 comp214648_c0 comp214648_c0 RFX ranbp1 4, 5, 7ENSG00000240891 basal body RFX Rfx2 elmod3 
2 comp214661_c0 AMYP AMYP elmod2 efcab10 anks6 RFX NUP62 comp214656_c5 axoneme plcxd2 comp214681_c0 Rfx2 TERA Ciliopathy Survey FoxJ1 Rfx2 RFX PKHG7 RFX Ciliome2 comp214706_c0 RFX comp214706_c0 Cilia Proteome comp214729_c1 Rfx2 MEIS2 HS2ST HTH RFX Rfx2 Rfx2 FoxJ1 comp214768_c0 comp214782_c7 HYEP comp214816_c0 comp214782_c7 × ELMD3 45.736 PLCX3 ELMD3 1.11e-32 122 0.173 × 188 Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 41.781 3.47e-2635.40438.168 9.54e-16 9969.93 6.61e-144 77.4 2e-51 434 129.851 0.151 2.68e-07 0.849 16762.312 × × 44.934 2.62e-87 × 33.929 53.1 2.67e-125 Basal Body, Centrosome 0.00026336.704 1 267 39252.273 4.15e-62 38.9 1.47e-74 1 × 56.12267.347 5.69e-124 200 1 × 136.567 231 165.247 1.83e-19 × 357 × 0 × 1 93.2 1 046.411 × 705 191.453 1.45e-148 × 0.11764.474 2.69e-71 108573.096 × 8.8e-28 0.883 464 × 45.089 2.69e-108 2.63e-65 225 × 48.498 1 110 30976.291 5.54e-74 1 216 × 72.444 1 × 233 0 1 0.481 1 × 39.286 0 0.519 1.27e-24 × × × 820 × 112646.965 92 1.77e-9742.47 1 1 2.3e-9741.322 295 × 1 5.47e-2446.467 × 1.17e-110 293 × 95.9 Axonemal, Basal 1 Body, 357 × 1 1 × 1 × × DE Info → ↑ ↑ ↓ ↓ ↑ → ↑ ↓ ↓ ↓ → → ↑ ↑ → ↓ ↑ → → → ↓ ↑ ↓ → ↑ ↓ Name Component ID Local Gene (continued) comp214858_c1 CC28A comp214918_c0comp214982_c2 EST2 × ROBO2 comp215014_c0 34.463 1.36e-94comp215029_c1 CI116 300 HACD2 comp215074_c0 1 × CC147 comp215232_c0comp215257_c0 PRDM9 TCPD comp215277_c2comp215291_c3 CT085 comp215310_c0 PAR12comp215313_c3 × SCPDL comp215318_c0 31.59 KLD8A 3.93e-55 F177A 198 1 × X X thy Cil- iopa- action FOXJ1 FoxJ1 FOXJ1 RFX FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 RFX FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey CilDB Ciliopathy Survey Ciliome2 Cilia Proteome SysCilia comp215267_c2 CTNB Localization Known Ciliarycentriolar satellite? 
TF inter- chlamy- domonas flagellarprotein cytosol ptplb Name cfap58, ccdc147 c15orf65 1700021f07rik 7 Ensembl IDENSG00000160050 Source 6, 8ENSG00000264364ENSG00000206113 1, 7ENSG00000163132 Gene 7ENSG00000198848 ccdc28b 3 2, 4, 5 bbs-associated ENSG00000166250 dynll2 4, 5ENSG00000134970 cfap99ENSG00000159399 3ENSG00000261652 ces1 3 4, 5ENSG00000198203ENSG00000206527 2 clmp 4, 5ENSG00000063854 tmed7ENSG00000198055 flj27352, 7 CilDB Ciliome2ENSG00000263155 3 hk2ENSG00000120051 3 sult1c2 1, 2, 3, 6, hacd2, ENSG00000119669ENSG00000155313 3ENSG00000154582 hagh 3ENSG00000128016 7 grk6 gcom1ENSG00000152700 3ENSG00000125812 7 4, 5ENSG00000161551 irf2bplENSG00000115484 3 usp25 4, 5 Rfx2ENSG00000168036 RFX tceb1 comp214879_c0 zfp36 8 Ciliome2 sar1b gzf1ENSG00000124237 RFX 4, 5, znf577 7ENSG00000059378 NA Rfx2 cct4 4, 5ENSG00000143653 comp214888_c0 comp214879_c0 Rfx2 ctnnb1 4, c20orf85, 5ENSG00000162873 FoxJ1 centrosome Ciliome2 HOX7P 4, 5ENSG00000151327 Ciliome2 NA parp12 4, 5 comp214989_c1ENSG00000100429 sccpdh 3 Rfx2 Rfx2 comp214996_c2 klhdc8a comp215028_c0 TMED3 fam177a1 HXK2 comp215036_c1 Rfx2 S1C2A Rfx2 hdac10 comp215054_c0 comp215054_c0 Rfx2 GLO2 RFX GRK5 GRK5 Rfx2 comp215086_c0 comp215137_c1 comp215105_c1 RFX comp215209_c8 comp215150_c1 I2BPL UBP28 ELOC × SAR1B comp215232_c0 51.945 RFX TISB 2.58e-119 RFX PRDM9 RFX 372 RFX 1 Rfx2 × comp215332_c1 HDAC6 189 Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Ciliary Membrane, Transition Zone Ciliary Membrane, Transition Zone Centrosome, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted 50 3.41e-62 213 1 × tity Per- cent Iden- 34.03 8.85e-83 29936.47139.211 3.58e-32 7.87e-9343.966 1 1.72e-25 13044.444 28436.768 2.06e-31 × 0.314 2.13e-70 103 0.686 Transport, Axonemal, Basal 46.279 × 110 6.73e-143 × 244 1 41692.557 1 × 79.195 183.962 8.58e-80 0.322 × 3.33e-55 0 × 75.887 ×86.648 4.39e-143 263 179 613 Axonemal 434 0.20486.364 0 0.474 × 186.275 0 × 620 1 1.51e-128 × 
82.323 Basal Body × 0.266 375 619 1.27e-12083.732 × 3.88e-121 0.161 0.266 348 Axonemal, Basal Body, × 369 ×91.787 0.14953.883 1.7e-13855.533 9.04e-72 Axonemal, Basal 0.158 Body, × 388 × 33.333 216 035.354 6.71e-120 0.642 5.83e-72 0.358 38862.021 557 × 2.39e-139 × 23744.156 395 1 1 1.9e-54 1 × × 191 × 1 Axonemal, Basal Body, Axonemal, Ciliary × 1 × DE Info ↑ ↑ ↑ → → ↑ → → → → → → → → → → → → → ↓ ↑ ↑ ↑ ↑ Name Component ID Local Gene (continued) comp215393_c0comp215409_c0 CERS6 comp215446_c0 Z658B comp215466_c0 CA173 ZMY10 comp215466_c0comp215468_c4 PP2AB comp215472_c0 MAD comp215472_c0 KAPCA comp215472_c0 KAPCA comp215472_c0 KAPCA comp215492_c0 KAPCA SPHK2 comp215513_c1 GLT11 comp215574_c0comp215579_c0 TTC29 CU059 X X X X X X thy Cil- iopa- action Rfx2FoxJ1 comp215392_c0FOXJ1 LOXH1 FOXJ1 FoxJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 Rfx2 RFX FOXJ1 FoxJ1 RFX FOXJ1 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey Ciliome2 Ciliome2Ciliopathy Survey RFX RFX SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Localization Known Ciliaryconnecting cilium tz (connecting cilium) TF inter- dynein assembly axonemal dynein ttc29 axoneme Name c1orf173 zmynd10 axonemal 6 6, 7, 8 Ensembl IDENSG00000104237 Source 3, 6, 7, 8ENSG00000143850 Gene ENSG00000154227 3 2, 3ENSG00000179456 cilium 4, 5ENSG00000143320ENSG00000178965 3 plekha6 2, 4, 5, 7 cers3ENSG00000004838 zbtb18 2, 3, 4, 5, erich3, crabp2ENSG00000113575ENSG00000105290 6ENSG00000171121 3 4, 5ENSG00000170365ENSG00000072062 3 1, 6 ppp2caENSG00000142875 kcnmb3 bb aplp1 6ENSG00000144395 smad1 prkaca 3, 4, 5 bbENSG00000153237 axoneme Rfx2 4, 5 prkacb CiliopathyENSG00000145526 Survey Rfx2 ccdc150ENSG00000176170 CilDB bb Ciliopathy 2 axoneme Survey RFX 3, 4, 5ENSG00000138069 ccdc148 Rfx2 CiliopathyENSG00000111737 Survey 7ENSG00000178234 7 comp215393_c0 6 sphk1 cdh18ENSG00000173402 CERS6 ENSG00000137473 3 1, 2, comp215439_c0 4, rab1a 5, ENSG00000159079 rab35 galnt11 RFX Rfx2 1, 2, 4, 5 golgi MYP2 ENSG00000162639 Rfx2 
dag1 2 c21orf59 Ciliopathy Survey comp215466_c0 Ciliome2 RFX Ciliome2 henmt1 comp215468_c5 PP2AB RFX SMAD5 CilDB Cilia Proteome RFX FoxJ1 comp215472_c0 Rfx2 comp215508_c0 KAPC comp215508_c0 RAB1A RAB1A FoxJ1 comp215542_c0 DAG1 comp215581_c6 HENMT 190 Body, Central Pair, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted 50 2.45e-22 84.7 1 × tity Per- cent Iden- 56.707 5.59e-6267.633 9.74e-85 188 26864.90465.926 2.5e-77 1 0.51849.827 1.86e-5935.54 7.35e-101 × 44.792 × 9.74e-54 24940.669 2.02e-53 194 301 5.32e-83 Transport, Axonemal, Basal 0.482 179 18233.273 280 1 × 1 6.14e-77 × 1 × 146.667 251 1 × 1.9e-25 × × 62.813 5.39e-148 1 11148.913 431 × 28.78 2.17e-16579.814 6.96e-28 185.538 489 188.05 × 0 106 0 5.1e-9996.84 × Regulation 1 536 1 603 × 30997.291 0 × 0.661 0.339 1 912 0 × 93.431 × × 0.1397.065 916 098.423 ×97.072 0.13 0 79692.774 Axonemal, Basal Body, 093.939 × 0.113 0 909 Axonemal, 0 Basal 925 Body, 0 × 0.129 912 0.132 819 Axonemal, Basal Body, × 831 × 0.13 0.117 0.118 × × × Axonemal DE Info ↓ → ↑ ↑ → ↓ ↓ → ↑ → ↑ → ↑ → ↓ ↓ ↓ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ Name Component ID Local Gene (continued) comp215635_c1 RSH4A comp215733_c0 CASC1 comp215741_c1 MIEAP comp215787_c1comp215795_c0 TTLL3 TBC19 comp215825_c0 CSN5 comp215880_c0comp215880_c0 TBB TBB comp215880_c0comp215880_c0 TBB TBB X X X X X thy Cil- iopa- action RFX FOXJ1 FoxJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 RFX FOXJ1 FOXJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey Ciliome2 Ciliome Ciliopathy Survey RFX SysCilia CilDB Ciliopathy Survey SysCilia CilDB Ciliopathy Survey SysCiliaCilDB Ciliopathy Survey Ciliome2 comp215880_c0 TBB Localization Known Ciliarycentral pair TF inter- epithelial cell differentiation centriole cytosol microtubule cytoskeleton centriole cytosol microtubule cytoskeleton centriole 
cytosol cytoskeleton casc1 CilDB Ciliome2 Ciliome RFX Name rsph4a cilium axoneme tubb2b basal body 6, 7, 8 5, 7 8 Ensembl IDENSG00000127445 SourceENSG00000171530 7ENSG00000111834 7 1, 2, 4, 5, Gene ENSG00000104941ENSG00000171843 1 pin1ENSG00000127838 3 tbcaENSG00000118298 3ENSG00000185305 2ENSG00000118307 2 1, 2, 3, 4, rsph6aENSG00000163071 mllt3 pnkd 2, 3, 4, Ciliome2 5 ca14 arl15ENSG00000234602 spata18 4, 5, 6ENSG00000109680 CilDB 2, 4, 5ENSG00000120658 mcidasENSG00000101695 3 Cilia multi-ciliated ENSG00000048392 Proteome 2 tbc1d19ENSG00000121022 2 4, 5ENSG00000137266ENSG00000180011 3ENSG00000137267 3 enox1 FoxJ1 1, 6, rnf125 8 comp215586_c0 rrm2b cops5ENSG00000137285 Rfx2 slc22a23 Cilia Proteome 1, Rfx2 Ciliome 4, 5, 6, PIN1 comp215623_c3 tubb2a zadh2 FoxJ1 comp215682_c4ENSG00000198211 basal body RFX 8 RFX comp215635_c1 TBCA ENSG00000104833 comp215668_c2 CAH1 ENSG00000188229 Ciliome 1 comp215682_c0 RSH4A 1, 4, 5ENSG00000183311 comp215725_c0ENSG00000176014 1 ENL ENSG00000258947 tubb3 PNKD 1 ARL15 1, 6, basal 7 body tubb4b tubb4a FoxJ1 Rfx2 tubb FoxJ1 tubb3 tubb6 RFX microtubule Rfx2 CilDB comp215817_c0 CilDB Rfx2 comp215797_c0 comp215807_c0 CilDB RIR2 ENOX1 CilDB RN166 comp215825_c0 comp215830_c0 CSN5 RFX ZADH2 × 52.601 1.83e-122 358 comp215880_c0 1 comp215880_c0 TBB comp215880_c0 × TBB TBB 191 Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Basal Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Transition Zone, Other Organells Found Category Score EValue BitScore Weighted 50 5.49e-45 144 1 × Transport, Axonemal, Basal tity Per- cent Iden- 34.234 2.04e-5643.697 2.34e-31 18961.259 1.99e-174 116 1 52157.025 × 80.23 1.5e-150 1 × 28.497 1 430 2.19e-68 0 × 232 903 1 Transport, Axonemal,41.606 Basal 90.11 1.72e-33 × 47.327 1.41e-60 1 1 11445.576 0 183 × × 2.7e-111 Regulation, Axonemal, Transport, Axonemal, 1 Basal 786 345 147.126 
× 72.47746.64 × 7.8e-53 1 1 1.47e-64 044.04145.741 5.86e-115 × 179 × 82.011 2.03e-82 206 549 1.14e-109 342 254 334 174.497 142.612 3.49e-168 1 126.641 × 0.41625.179 3.58e-05 × × 1 46886.538 2.18e-25 × 0 × 3.44e-134 × 46.630.185 0.58464.331 5.89e-87 114 1447 375 7.03e-67 × 0.029 0.071 281 × 0.9 216 1 × × × 1 1 × × DE Info → → ↑ ↓ ↑ → → ↓ ↓ → → → → → ↓ → ↓ ↓ ↓ ↓ ↓ → → ↓ Name Component ID Local Gene (continued) comp215883_c0comp215888_c0 NXRD1 comp215927_c2 ANR52 TTLL9 comp215953_c2comp215956_c0 TTC26 MKKS comp216013_c0comp216049_c4 CAV1 × HSB11 37.778 5.05e-38comp216061_c1 129 PURA comp216162_c0 1 CDK1 × comp216377_c1 STOM × 58.209 1.45e-103 304 1 × Axonemal, Basal Body, X X X X X thy Cil- iopa- action FOXJ1 FoxJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 RFX FOXJ1 FoxJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome SysCilia Ciliopathy Survey Survey Cilia Proteome Ciliome SysCilia Ciliopathy Survey Localization Known Ciliary TF inter- modification (polyglutamyla- tion) cilium ift-b cytosol bbs - bb assembly/fold- ing factor axoneme ttll9 cilium axoneme ttc26 basal body lrrc34 CilDB Ciliome2 RFX Name hspb11 cilium ift ift-b SysCilia CilDB Ciliopathy 5, 6, 8 6, 7, 8 5, 6, 8 5, 7 Ensembl IDENSG00000165555 Source 2, 4, 5ENSG00000230062 4, 5ENSG00000107719 Gene ENSG00000131044 noxred1 2 1, 2, 3, 4, ankrd66ENSG00000139637ENSG00000105948 7 1, 2, 4, 5, pald1ENSG00000125863 6, 8 myg1ENSG00000187713ENSG00000130520 3ENSG00000113231 3 mkksENSG00000105974 7 basal 4, body 5ENSG00000085465ENSG00000081870 2 tmem203 1, 2, 3, 4, RFX lsm4 pde8bENSG00000071127 cav1ENSG00000164985 7 RFX ENSG00000172733 3 ovgp1 4, 5ENSG00000148154ENSG00000064961 FoxJ1 3ENSG00000171757 3 1, 2, 3, 4, wdr1 psip1 Ciliome2 purgENSG00000170312ENSG00000143341 1 comp215898_c0ENSG00000114455 hmg20b ugcg 7ENSG00000139915 3ENSG00000075785 2ENSG00000140022 7 PALDENSG00000138152 3ENSG00000172818 × Rfx2 2, Ciliome2 
7ENSG00000133115 cdk1 3 hmcn1 comp215946_c0 39.099 6, 8 hhla2 Rfx2 mdga2 RFX MYG1 rab7 0 btbd16 ston2 FoxJ1 comp215965_c1 stoml3 ovol1 551 comp216001_c2 basal body comp215978_c0 CilDB TM203 PDE8B Cilia 1 Rfx2 Proteome comp216020_c0 RFX LSM4 Ciliome2 × Ciliome2 Rfx2 Rfx2 CHIA comp216050_c0 FoxJ1 comp216054_c0 WDR1 comp216147_c0 comp216094_c5 FoxJ1 Rfx2 CYC comp216166_c2 HM20A Rfx2 CEGT HMCN1 comp216162_c0 Rfx2 comp216317_c0 comp216166_c2 comp216166_c2 comp216264_c0 CDK1 BTBDG HMCN1 comp216279_c2 HMCN1 RAB7A comp216372_c1 STNB × OVO 32.277 1.68e-86 290 1 × 192 Ciliary Membrane, Transition Zone Centrosome, Other Organells Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 34.34346.631 2.66e-3140.526 2.27e-110 1.37e-9536.441 115 32755.34 5.53e-30 290 1.28e-112 0.5378.042 124 1 340 0.47 × × 76.803 × 0 132.558 1 6.58e-77 0 × 567 ×25.731 256 0.52173.504 521 Axonemal, Basal 4.4e-54 Body, ×84.571 0.479 2.51e-96 1 0 19788.083 Axonemal, Basal Body, × 60.635 5.45e-116 × 1.54e-141 286 535 336 137.455 402 1.71e-43 0.46 × 0.54 1 0.728 ×29.941 150 × 3.6e-63 × × Basal Body, Ciliary 0.272 Central Pair 22765.541 × 1.31e-5667.85776.19 6.32e-173 139.076 200 1.58e-69 8.9e-36 486 × 0.29244.892 20878.684 0.708 Axonemal, Basal 149 Body, ×61.491 × 0 6.94e-69 Basal 0.165 Body 0 135.63 3.41e-95 × 754 × 219 637 0.835 300 0.256 0.744 × × 0.396 × × Axonemal Axonemal DE Info ↑ ↓ ↓ → → → → ↑ ↑ ↓ ↑ ↑ ↑ ↑ ↑ → → → ↑ ↑ → → ↑ Name Component ID Local Gene (continued) comp216399_c0comp216417_c5 WNT5A comp216433_c2 FLCN comp216433_c2 BBS5 comp216462_c0 BBS5 CERKL comp216504_c0comp216517_c0 ODPB comp216525_c0 TBA2 comp216525_c0 DJB13 comp216555_c5 TMM53 comp216571_c0 KLHL6 × OFD1 29.084comp216572_c0 1.96e-62comp216609_c0 224 ALF COBA1 comp216629_c1 1comp216629_c1 × GNAS comp216634_c1 GNAS TXND3 X X X X X X X thy Cil- iopa- action FoxJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 
FOXJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey CilDBCiliopathy Survey RFX Ciliome2 SysCilia Ciliopathy Survey Ciliopathy Survey Ciliome2 SysCilia Ciliopathy Survey RFX Localization Known Ciliary TF inter- ift-associated cytosplasm nucleus bb distal end centriolar satellite signalling axonemal dynein assembly (oda) ift80, nme8 Name dnajb13 radial spoke CilDB Ciliopathy Survey rp11-724o16.1 6, 7 8 Ensembl IDENSG00000184988 SourceENSG00000108379 2ENSG00000115596 3 2, 3ENSG00000145216ENSG00000154803 Gene 2 tmem106a 6ENSG00000163093 wnt3 1, 6, wnt6 8ENSG00000251569 fip1l1 1, 4, 5ENSG00000188452 flcn 2, 3, 4, 5 bbs5 axonemebb basalENSG00000198870 body bbs - ENSG00000168291 bbs5, 7 Ciliopathy Survey 4, cerkl 5ENSG00000118197 6ENSG00000137692ENSG00000187726 3 1, 2, 4, stkld1 5, pdhbENSG00000126106 FoxJ1 4, 5 ddx59ENSG00000163376 dcun1d5 4, nucleus 5ENSG00000046651 Rfx2 Rfx2 6, 8 tmem53 FoxJ1 comp216391_c1ENSG00000106012 kbtbd8 4, 5, 6ENSG00000109107 T106B comp216399_c0 ofd1ENSG00000117505 7ENSG00000189157 3 basal body comp216413_c0 2, WNT5A 4, RFX 5ENSG00000204248 iqceENSG00000087460 2 FIP1 bb 6, 7ENSG00000100583 fam47e aldoc 4, 5, RFX 7 dr1ENSG00000086288 4, 5, 6, 7, col11a2 Rfx2 gnas Ciliopathy samd15 Survey axoneme - RFX Ciliome2 comp216495_c3 RFX RFX comp216517_c0 SGK71 TBA2 RFX Rfx2 FoxJ1 comp216572_c0 RFX comp216582_c3 ALF comp216609_c0 NC2B COBA1 193 Membrane, Transition Zone Found Category Score EValue BitScore Weighted 54 3.22e-151 434 1 × Axonemal, Ciliary tity Per- cent Iden- 48.372 8.63e-5347.93755.838 5.5e-88 9.57e-79 18566.667 1.04e-48 273 0.244 23634.737 6.26e-45 × 166 0.3654.634 3.87e-158 137.795 × 158 9.54e-37 1 × 45459.091 4.82e-103 × 140 150.794 308 1 4.72e-33 ×60.36 × 0.448 141.689 Ciliary Membrane 118 2.92e-86 1.81e-177 × × 0.17261.905 262 551 1.69e-10154.167 × 45.082 1.07e-89 0.381 312 1.21e-2954.472 1 × 1.05e-49 26954.118 120 × 170.807 3.35e-29 3.43e-85 156 × 128.355 102 1 0.60528.191 5.48e-49 248 × 2.94e-36 × 0.39587.574 ×70.902 2.11e-108 
17941.905 × 1.35e-121 147 Transport 157.921 6.91e-85 30639.273 9.1e-88 × 347 4.54e-34 156.994 259 139.706 × 258 1 4.93e-76 129 × 177.154 0 × 1 × 270 1 1 × 0 825 × × Basal Body, Transition Zone 1 829 1 × × 1 × DE Info ↑ ↑ ↑ ↓ → ↑ ↑ ↑ ↑ ↑ → ↓ → ↑ ↑ ↑ ↓ ↑ ↑ → → ↓ ↓ ↑ ↓ → ↓ ↑ Name Component ID Local Gene (continued) comp216634_c1comp216637_c4 TXND3 comp216644_c2 IF4E3 KLF6 comp216684_c0comp216693_c0 OFUT2 comp216707_c0 ZN474 comp216707_c0 CA228 CA228 comp216710_c0comp216738_c0 WDR49 comp216758_c0 PAX3 comp216772_c0 LRC27 comp216774_c2 TC1D2 MORN5 comp216801_c0 CC142 comp216835_c0 CCD34 comp216880_c0 TEKT1 X X thy Cil- iopa- action FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FoxJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 RFX FOXJ1 FoxJ1 Table B.2.1: SysCiliaCiliome2 RFX comp216654_c1 SHH Ciliome Ciliome2 Ciliome Localization Known Ciliary TF inter- membrane tekt1 axoneme CilDB Ciliopathy Survey Name gm1661 6, 7 Ensembl IDENSG00000165084 Source 4, 5ENSG00000181322ENSG00000163412 2 4, 5ENSG00000127528 Gene 3, 4, c8orf34 5ENSG00000164690 8ENSG00000186866 nme9 eif4e3 4, 5ENSG00000164185 4, 5ENSG00000198520 2, 4, 5, 7 shhENSG00000197889 pofut2 2, ciliary 4, 5 c1orf228, znf474 Cilia Proteome CiliomeENSG00000204220ENSG00000174776 3 3, 4, 5 FoxJ1 meig1ENSG00000009709 4, 5ENSG00000121310ENSG00000148814 2 wdr49 RFX pfdn6 4, 5ENSG00000213123 comp216634_c1 1, 4, 5,ENSG00000272741 6 RFX pax7ENSG00000185681 1 TXND3 2, echdc2 4, RFX 5, 7 tctex1d2 lrrc27ENSG00000163645 ift-dyneinENSG00000135637 7 rp11-447l10.1 4, 5 morn5ENSG00000241553 RFX ENSG00000164924 7ENSG00000075188 RFX CilDB Ciliopathy 1 SurveyENSG00000182132 8ENSG00000109881 ccdc142 2 erich6 4, 5 RFX ENSG00000165671 RFX ENSG00000175356 CilDB 3 arpc4ENSG00000173276 3 ywhazENSG00000167658 Ciliome2 Cilia 3 Proteome nup37ENSG00000167858 1 RFX kcnip1 ccdc34 Rfx2 transition 1, zone 2, 4, 5, SysCilia nsd1 scube2 RFX FoxJ1 RFX eef2 comp216707_c0 CilDB CA228 comp216752_c0 RFX comp216772_c0 AUHM CilDB TC1D2 
comp216814_c0 FoxJ1 RFX comp216783_c5 NUP37 Rfx2 Rfx2 comp216811_c2 Rfx2 comp216810_c0 F194A comp216819_c0 14332 ARPC4 KCIP4 comp216848_c0 comp216857_c1 comp216871_c0 comp216873_c1 MATN2 NSD2 TRIO × EF2 39.216 7.58e-23 105 1 × 194 Membrane, Transition Zone Found Category Score EValue BitScore Weighted 50 8.52e-15 70.9 1 × tity Per- 84.5 1.82e-120 35262.5 1 3.39e-61 × 185 1 × cent Iden- 42.13263.448 1.39e-9156.818 1.16e-6055.474 1.45e-58 29040.972 20234.239 2.09e-61 200 0 2.29e-109 0.502 1 0.49827.16 199 341 65047.312 × 2.26e-20 × 38.745 × 1.33e-50 2.78e-93 1 90.9 1 1 16230.435 291 × 9.34e-29 × × 45.798 1 Basal Body, Transition Zone 4.7e-65 1 Axonemal, Ciliary 108 1 × × 211 × 64.17444.848 8.1e-149 1 4.02e-82 × 157.398 430 3.47e-132 256 Basal57.037 Body × 89.577 1.7e-134 38847.015 1 1.08e-38 165.294 393 0.497 0 × 4.81e-80 × 128 × 0.50354.042 590 Axonemal 83.721 1.3e-120 25029.508 × 5.76e-40 4.06e-18 1 35851.858 1 149 1 × 78.2 0.706 × × 0.29448.485 0 4.23e-107 × Transport × 138.166 318 685 7.44e-79 × 36.842 9.25e-28 244 1 1 115 × 0.68 × 0.32 × × DE Info → → → ↑ ↓ ↑ → → → → → ↑ → ↓ ↓ ↑ ↓ ↓ → ↓ → ↓ ↓ → ↑ ↑ → → Name Component ID Local Gene (continued) comp216907_c2comp216914_c0 TTLL4 S47A1 comp216938_c1comp216957_c0 ZNT1 comp216961_c4comp216972_c5 SIX3 CJ107 FBX16 comp216986_c0comp217004_c1 SPAG1 CEL3A comp217034_c3 MOK comp217043_c2comp217062_c1 M1I1A comp217135_c0 WDR16 comp217181_c1 ZMY12 WDR54 X X X X thy Cil- iopa- action FOXJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FoxJ1 FOXJ1 RFX FOXJ1 Rfx2 FOXJ1 Rfx2 RFX FOXJ1 FoxJ1 RFX FOXJ1 FoxJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey Ciliopathy Survey Ciliome RFX Ciliome2 CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Proteome Ciliome Localization Known Ciliary TF inter- membrane axoneme bb Ciliopathy Survey dynein assembly unclear localisation mok regulation - ift CilDB Ciliopathy Survey Name wdr16 cfap52, zmynd12 CilDB Ciliome2 Cilia 1700040l02rik 6, 7 6, 7 7 Ensembl IDENSG00000164031 SourceENSG00000131143 
3ENSG00000133030 7ENSG00000135912 3 4, 5ENSG00000163002 Gene ENSG00000180638 8 dnajb14 6, 8 cox4i1ENSG00000169288ENSG00000254858 mprip 3ENSG00000170385 ttll4 3 4, 5 slc47a2 nup35ENSG00000138083 ciliary 4, transition 5 zoneENSG00000183346 6, 7ENSG00000214050 SysCilia mpv17l2 mrpl1 slc30a1 2, 4, 5ENSG00000164107 c10orf107, ENSG00000133398 3ENSG00000168439 3ENSG00000104450 fbxo16 3 2, 4, 5, 6ENSG00000161082 4, 5 hand2ENSG00000159409 Rfx2 med10ENSG00000113712 spag1 3ENSG00000035141 stip1 7 axonemal Rfx2ENSG00000164318 3ENSG00000080823 RFX 3 1, 3, celf5 4, 5, comp216913_c0ENSG00000137203 comp216881_c4ENSG00000119703 csnk1a1 celf3 3 fam136aENSG00000165175 2 NUP53 Rfx2 Rfx2 comp216891_c1 comp216891_c1 3, DJB12 RFX egflam 4, 5ENSG00000166596 ADCK3 1, ADCK3 2, 4, RFX 5, tfap2aENSG00000066185 mid1ip1 zc2hc1c RFX 1, 2, 4, comp216937_c1 comp216915_c1 5, Ciliome2ENSG00000005448 4, 5ENSG00000010327 M17L2 RM01 Rfx2 2 Rfx2 Rfx2 wdr54 RFX comp216976_c0 stab1 comp216977_c4 Rfx2 Rfx2 comp216983_c0 TWST2 Rfx2 MED10 STIP1 comp217005_c0 Rfx2 RFX FoxJ1 comp217004_c1 comp217005_c2 KC1A comp217017_c0 CEL3A F136A EGFLA comp217035_c1 comp217035_c1 × RFX AP2A AP2E FoxJ141.654 comp217181_c1 WDR54 0 571 1 × 195 Ciliary Membrane, Transition Zone Membrane Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Centrosome, Other Organells Found Category Score EValue BitScore Weighted 50 3.24e-61 18964 1.2e-84 1 249 × 1 × tity Per- cent Iden- 39.602 1.72e-115 35381.23 180.583 2.84e-125 0 × 352 52427.1742.179 2.15e-37 8.47e-9167.609 1 1 15055.06 277 × ×42.246 3.67e-118 045.161 7.09e-5130.884 Axonemal, 1.11e-59 Basal Body, 162.556 8.54e-84 351 1 583 16262.138 × 186 × 34.524 291 0 0.466 1 2.7e-06 0.534 1 0 × × 592 × × 1 52.4 584 0.503 × 0.497 1 × 44.898 1.36e-61 × × 44.08134.085 3.75e-95 194 1.5e-47 28926.891 1 169 7.26e-24 × 0.626 145.415 101 1.08e-112 × × 94.565 0.374 351 Transport, 7.16e-61 Axonemal, Basal × 182 1 0.167 × × Axonemal, Basal Body, DE Info ↑ ↓ ↓ ↓ → → ↑ ↓ ↓ → → → ↓ ↓ ↑ ↑ 
↓ ↑ ↑ → → Name Component ID Local Gene (continued) comp217213_c0 MDH1B comp217216_c1comp217220_c0 NDK6 KAPR comp217276_c0 FAIM1 comp217357_c0comp217367_c0 FYN SSR2comp217367_c0 × 30.275 SSR2comp217368_c3 1.15e-45 × 35.946 ODF3A 162comp217381_c4 1.05e-71 0.315 DCDC2 228comp217381_c4 × 0.444 Axonemal, Ciliary DCDC2 comp217389_c0 × SAS6 comp217312_c11 WDR31 X X X thy Cil- iopa- action FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 Rfx2 RFX FOXJ1 Rfx2 FOXJ1 Rfx2 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey FoxJ1 SysCilia SysCilia Ciliopathy Survey Ciliome2 SysCilia Ciliopathy Survey Localization Known Ciliary TF inter- membrane axoneme ciliary signalling cilium cytosol ciliary membrane axoneme centriole bb (centriole) odf3 CilDB Ciliome2 RFX Name dcdc2, mdh1b Ciliome2 RFX dcdc2a 7 7 7, 8 Ensembl IDENSG00000138400 Source 2, 3, 4, 5, ENSG00000172113 Gene 4, 5ENSG00000108946 6, 7ENSG00000277791ENSG00000158234 7 2, nme6 3, 4, 5 prkar1aENSG00000172175 bb axonemeENSG00000148225 3 4, psmb3 5 faimENSG00000130707 CiliopathyENSG00000134253 Survey Ciliome2 7ENSG00000123159 3ENSG00000163154 7ENSG00000183578 3 malt1ENSG00000187231 wdr31 3ENSG00000176105 3 4, 5ENSG00000197122 ass1 trim45 Ciliome2 CiliaENSG00000184863 Proteome Ciliome 3 tnfaip8l2ENSG00000128285 gipc1 2 tnfaip8l3 2, 6, 8 sestd1 RFX yes1ENSG00000162009 RFX 3, 4, 5 mchr1 rbm33ENSG00000126353 src Ciliome2ENSG00000177947 ciliary 2 1, 3, 4, 5, ENSG00000162761 sstr5ENSG00000146038 2 3, 4, 5, 6, ccr7ENSG00000222046 Rfx2 comp217238_c0 RFX 2, 3, 4, 5 Rfx2ENSG00000156876 PSB3 Rfx2 6, 8 dcdc2b Rfx2ENSG00000158373 comp217292_c5 Rfx2 1 comp217320_c0 RFX MALT1 comp217325_c0 FoxJ1 comp217334_c0 Rfx2 ASSY basal comp217334_c0 body hist1h2bd comp217329_c1 TRIM2 comp217345_c4 TFIP8 × RFX TFIP8 GIPC1 25.105 comp217365_c0 1.78e-45 comp217357_c0 FoxJ1 172 CilDB NA FYN FoxJ1 1 RFX comp217367_c0 × comp217373_c0 SSR2 × LMX1B 29.664 1.71e-31 124 comp217393_c2 0.241 × H2B 196 Body Centrosome, Ciliary Membrane, Other 
Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 94.56594.565 6.78e-6195.652 1.35e-6094.565 4.56e-6195.652 1.28e-60 18247.044 5.21e-61 18143.384 4.94e-126 182 0.167 182 0.166 182 379 0.16755.462 × 0 0.16763.443 × 1.69e-32 0.16736.364 × × 2.28e-49 721 1 × 136 035.78 × 164 0.841 0.159 3.62e-6928.652 796 × × 4.2e-2699.27 1 23169.186 8.03e-96 1 111 ×35.931 1.47e-99 × 0 272 1 Signaling, Axonemal, Basal 25.529 1 7.31e-89 × 32241.245 508 1 7.29e-138 × 32.857 320 3.52e-98 × 1 417 1 ×68.471 308 × 130.842 4.46e-165 Centrosome 139.333 1.61e-150 × 31.025 1.48e-114 462 1.86e-54 × 1 475 348 ×41.228 196 1 2.09e-1937.673 Axonemal, 1 Basal Body, 0.698 1 7.22e-74 × 84.7 × 26.543 × × 6.74e-08 233 0.30244.9163.125 4.6e-32 0.814 × 53.1 6.24e-154 × 0.186 441 123 × 1 1 × × Basal Body, Centrosome DE Info → → → → → → ↑ ↑ ↓ ↑ → → → → ↑ ↑ ↓ ↑ ↓ ↑ ↓ ↑ ↑ ↑ ↑ → ↑ Name Component ID Local Gene (continued) comp217463_c1 ARMC3 comp217479_c0 SPEF1 comp217492_c2comp217503_c0 PK1IP comp217510_c2 CC105 comp217524_c1 CDK20 comp217557_c0 CCD13 K1875 comp217679_c0 DAAF3 comp217699_c0 ENO4 comp217699_c0comp217720_c0 ANR53 comp217720_c0 SPT17 comp217776_c0 CB073 TBE X X X X thy Cil- iopa- action FOXJ1 Rfx2 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FoxJ1 RFX FOXJ1 FOXJ1 FoxJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 Table B.2.1: Ciliome2 Cilia Proteome Ciliome Ciliopathy Survey RFX SysCilia Ciliopathy Survey RFX Localization Known Ciliary TF inter- satellite axonemal dynein complex assembly eno4 Ciliome2 RFX spef1 rootlet ciliary tip CilDB Ciliopathy Survey Name wdr97 dnaaf3 cytosol 5, 6, 7 8 7 Ensembl IDENSG00000180596 SourceENSG00000124635 1, 7ENSG00000184678 1, 7ENSG00000273703 1ENSG00000276410 7ENSG00000127837 Gene 7 hist1h2bcENSG00000165309 3 hist1h2bj 3, 4, 5, 7 hist2h2beENSG00000168743 hist1h2bhENSG00000125851 2 hist1h2bbENSG00000101222 2 armc3 1, 2, 3, aamp 4, ENSG00000111845 CilDB Ciliome2 CilDB 4, Ciliome2 5ENSG00000160994 npnt CilDB pcsk2 2, 3, Ciliome2 
7ENSG00000274267 Ciliome2ENSG00000156345 7 4, pak1ip1 5, 7ENSG00000244607 ccdc105 2, 4, 5, 6ENSG00000179698 hist1h3b 4, 5 cdk20ENSG00000164930 ccdc13ENSG00000261701 3ENSG00000167646 centriolar 2 comp217393_c2 2, 4, 5, kiaa1875, 6, comp217393_c2 Ciliome2ENSG00000086189 RFX ENSG00000161638 Rfx2 3 H2B ENSG00000089195 Ciliome2 3 fzd6ENSG00000188316 H2B Ciliome2 comp217393_c2 comp217393_c2 3 hpr comp217393_c2 2, 3, 4, 5, FoxJ1 FoxJ1ENSG00000144031 H2B H2B Rfx2 dimt1 comp217396_c0 H2B 4, 5ENSG00000162814 itga5 RFX trmt6 2, 4, 5, 7 RFX AAMP ENSG00000177994 comp217463_c1 comp217464_c1 4, ankrd53 5ENSG00000186834 spata17ENSG00000074935 ARMC3 3 NEC2 6, 8 c2orf73 comp217507_c0 hexim1 Rfx2 Ciliome2 Ciliome centriole bb H3 FoxJ1 SysCilia Ciliopathy Survey Rfx2 RFX Rfx2 comp217560_c0 Rfx2 comp217575_c0 FZD1 RFX TMPS3 comp217682_c1 × comp217686_c0 25.984 comp217697_c0 1.41e-24 RFX DIM1 ITAV TRM6 102 Rfx2 1 × comp217755_c0 HEXIM 197 Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 46.22439.143 071.875 9.87e-69 1.1e-152 125762.36 253 440 1.56e-70 0.832 0.16876.048 0.31177.528 2.32e-90 × 23885.714 4.73e-162 × 45.062 3.09e-76 × Basal Body, Transition Zone 7.12e-32 286 0.168 452 Transport, Axonemal, Basal 52.619 249 × 0.20243.459 1.08e-151 131 0.31937.288 Basal Body × 5.06e-63 × 436 1 026.429 134.448 4.32e-27 205 × 30.921 6.72e-54 1062 × 149.878 6.43e-1140.952 7.31e-138 116 × 2.53e-53 181 144.66 65.9 1 416 8.13e-17 × 40.943 173 1 × 58.865 6.78e-134 1 1.77e-48 1 85.5 × 1 × 399 × 169.412 × 16474.125 7.01e-35 × 172.18 0.564 1 6.08e-56 × 127 063.291 × × 9.41e-28 173 0.43666.993 Transport, 1248 Axonemal, Basal 27.88939.623 × 3.32e-53 111 2.31e-98 0 1 1 0.16387.879 200 4.26e-59 × 307 568 × 55.717 × Transport, 
Axonemal, Basal 190 1 0.837 1 0 × × × 1 1000 Axonemal, Basal Body, × 1 × Axonemal, Basal Body, DE Info ↑ ↑ → → → → ↓ → → ↑ ↓ → ↑ ↑ ↓ ↑ ↑ ↓ ↓ ↓ ↓ ↑ → → ↑ ↓ ↓ ↓ Name Component ID Local Gene (continued) comp217778_c2 C2D2A comp217812_c0 GNPI2 comp217830_c0 MAP2 comp217874_c0 DHB2 comp217959_c0comp217986_c0 AAED1 comp217993_c0 CC177 AGM1 comp218002_c0comp218016_c3 IFT20 SEPT2 comp218042_c0comp218043_c0 FUZZY SOX14 X X X X X thy Cil- iopa- action RFX FOXJ1 Rfx2 FOXJ1 Rfx2 FOXJ1 Rfx2 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey SysCilia Ciliopathy Survey SysCilia CilDB Ciliopathy Survey Ciliome2 SysCilia comp218048_c0 GCP2 Localization Known Ciliarytz bb subdistal appendages TF inter- (connecting cilium) cilium golgi ift ift-b centriole cytosol Name transition zone 6, 8 Ensembl IDENSG00000048342 Source 1, 3, 4, 5, ENSG00000014164ENSG00000102981 3 Gene 8ENSG00000100578 3, 4, 5, 6ENSG00000081014ENSG00000113552 zc3h3 3 pard6a kiaa0586ENSG00000169554 2ENSG00000078018 cilium 3 bb 3, 4, 5ENSG00000178802ENSG00000140443 7 ap4e1ENSG00000161267 gnpda1 3 SysCilia 3, 4, map2 5 Ciliopathy SurveyENSG00000168140ENSG00000163914 3ENSG00000185448 2ENSG00000096060 7 mpiENSG00000158122 bdh1 igf1r 2 RFX 4, 5ENSG00000267909 4, 5ENSG00000116957 vasnENSG00000042317 fam47c 2 rho 6, 8 Rfx2 fkbp5 aaed1 ccdc177ENSG00000006555 Ciliome2ENSG00000069329 3ENSG00000109083 comp217812_c0 7 spata7 1, tbce 6, 7, 8 cilium Rfx2 comp217778_c2 tz ENSG00000183644 FoxJ1 GNPI2 Cilia 4, Proteome 5 RFX ENSG00000164402 Rfx2ENSG00000183977 ttc22 C2D2A 2 ift20 vps35ENSG00000010361 7 basal 4, body 5, 6, 8 c11orf88 comp217812_c0 comp217812_c0ENSG00000125285 RFX Rfx2 FoxJ1 Ciliome 4, 5ENSG00000130640 comp217825_c2 sept8 GNPI2 GNPI2 8 pp2d1 fuz Rfx2 comp217859_c0 UBCD2 cytosol bb Ciliome2 FoxJ1 comp217890_c0 RFX comp217867_c3 RFX tubgcp2 MPI FoxJ1 SysCilia Ciliopathy Survey basal body OPSP comp217886_c0 ILPR Cilia Proteome RFX comp217942_c1 comp217930_c1 LIGO2 comp217989_c0 FKBP4 
Rfx2 FA47E FoxJ1 TBCE RFX comp218000_c4 comp217993_c0 comp218016_c3 VPS35 AGM1 RFX SEP8B comp218033_c2 PP2D1 198 Body, Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted tity Per- 33.5 8.26e-22 101 1 × cent Iden- 48.90848.846 040.729 5.24e-83 2.3e-5338.71 547 250 6.82e-21 17760.0438.014 87.4 2.37e-51 1 1 152.091 0 × × 3.71e-58 193 1 ×81.513 63266.163 × 8.67e-140 197 Regulation 88.933 1.16e-157 195.146 2.01e-155 410 0.325 4.75e-62 1 × 451 45948.701 × 0.675 × 4.28e-141 21244.628 Transport, 0.684 Axonemal, × Basal 153.382 1.11e-77 414 0.31660.938 × 38.509 3.28e-50 × × 4.46e-25 236 052.128 143.46 175 2.1e-28 100 3.38e-53 59765.373 × 1 0.22735.774 2.06e-148 110 6.42e-65 0.773 × 17939.649 × 1 42735.849 8.55e-54 × Basal Body 2.04e-62 221 × 161.882 139.778 197 0.349 1 × 4.05e-117 21670.995 × 0 0.31166.616 × × 379 0.34154.589 × 2.12e-72 0 82954.181 × 0 5.78e-113 1 235 924 332 932 1 × 0.414 0.498 × 0.586 0.502 × × × × Axonemal, Ciliary DE Info ↑ → → → ↓ ↓ → → ↓ → → → → → → ↑ → ↓ ↓ ↓ ↓ ↓ ↑ → ↑ → → ↑ ↑ Name Component ID Local Gene (continued) comp218052_c1 CC135 comp218070_c0comp218075_c0 OTX2B comp218099_c0 NA comp218102_c0 XRRA1 ODO2 comp218142_c0comp218166_c1 RS3 comp218188_c6 UBP3 comp218229_c0 RSG1 comp218242_c0 TEPP comp218254_c1 HES4B MPIP comp218339_c4 TTC16 comp218356_c0comp218356_c0 NUDT9 NUDT9 X X X X thy Cil- iopa- action FOXJ1 Rfx2 RFX FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Table B.2.1: CilDB Ciliome2Ciliopathy Survey RFX SysCilia Ciliopathy Survey RFX Localization Known Ciliaryfactor TF inter- cilium ift ift-a drc7 Name c3orf84 ccdc135, bc048562, 7 Ensembl IDENSG00000159625 Source 1, 3, 4, 5, ENSG00000116096ENSG00000105392 7 Gene 6ENSG00000236980 4, 5, 7ENSG00000101557ENSG00000166435 7 rp11-694i15.6, 3, 4, 5, spr 7ENSG00000119650 4, 5, transcription 6, 8ENSG00000119689 xrra1 usp14ENSG00000183508 7ENSG00000149273 3ENSG00000164118 1 ift43 2, 4, 5 basal Ciliome2 body 
ENSG00000140455 4, 5ENSG00000132881 fam46c dlstENSG00000111679 6 cep44ENSG00000092841 Ciliome2 3 rps3ENSG00000159648 3 4, 5, 7ENSG00000185272 usp3ENSG00000114315 2 4, 5ENSG00000196361 rsg1 ptpn6ENSG00000164045 3 bb tepp 4, 5 CiliaENSG00000158402 Proteome Ciliome CilDBENSG00000101224 3 rbm11ENSG00000106100 RFX 3 hes1ENSG00000163931 comp218062_c1 3 RFX ENSG00000167094 7 cdc25a elavl3 4, 5, Ciliopathy 7 Survey ENSG00000132361ENSG00000124782 SPRE cdc25c 3ENSG00000179921 comp218085_c0 cdc25b 3 Ciliome2 6ENSG00000170502 Rfx2 nod1 ttc16 4, UBP14 5 tkt kiaa0664 rreb1 RFX gpbar1 RFX comp218129_c0 axoneme nudt9 Rfx2 comp218142_c0 comp218102_c0 Rfx2 FA46C Ciliome2 Ciliopathy Survey FoxJ1 ODO2 RS3 RFX RFX Rfx2 comp218209_c4 comp218209_c4 Rfx2 comp218239_c0 PTN11 Rfx2 PTN11 Rfx2 RFX RBM7 comp218244_c2 Rfx2 comp218254_c1 Rfx2 comp218254_c1 ELAV3 comp218316_c3 RFX comp218319_c0 MPIP MPIP comp218345_c0 CN16B TKTL2 comp218345_c0 CLU CLU 199 Other Organells Centrosome, Transition Zone, Other Organells Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted 52 2.15e-1740 76.6 1.28e-23 91.7 1 × 1 × tity Per- 37.5 2.42e-44 162 1 × cent Iden- 81.667 4.51e-107 31460.082 6.54e-8574.535 1 27250.413 ×40.789 0 0.169 1.05e-18 Transport, Centrosome, 044.149 × 133740.606 89.7 4.56e-105 3.11e-77 Axonemal, Basal Body, 85137.028 0.831 31837.025 5.22e-8539.911 1.12e-69 1 × 25337.229 3.99e-9775.909 1 4.53e-50 278 × 0.476 133.665 5.74e-125 234 2.42e-84 303 × 0.524 × × 163 35755.344 × 5.17e-96 270 1 169.173 1 × 1 1.86e-67 283 × 1 × 51.408 × 0.362 9.23e-96 212 × Basal Body × 0.27154.672 286 Axonemal, Basal Body, × 0.36646.667 0 1.22e-104 Axonemal 57.576 × 53.259 9.81e-15 308 78224.04234.457 5.95e-13 75.9 1.72e-42 055.896 1 69.7 1 0.117 147 57331.466 × × 0 × 7.41e-64 1 Basal 0.883 Body, Transport, Transition Axonemal, Zone Basal 1 1064 213 × × × 1 1 × × Axonemal DE Info → ↓ ↓ → → ↑ ↑ ↑ → → → ↑ → → ↑ ↓ ↓ ↓ → → → → → → 
→ ↑ → Name Component ID Local Gene (continued) comp218357_c2 ARF1 comp218378_c1 PSMD2 comp218438_c1comp218451_c1 CB050 CN16B comp218467_c1comp218481_c9 NUP1B comp218486_c0 UBX11 comp218486_c0 MARE3 comp218486_c0 MARE3 comp218502_c1 MARE3 ANR60 comp218531_c1 TM231 comp218544_c4comp218546_c3 LRC72 ARMC4 X X X X X X thy Cil- iopa- action FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey Cilia Proteome SysCilia Ciliopathy Survey RFX Ciliome2 RFX SysCilia CilDB Ciliopathy Survey Ciliopathy Survey Ciliome2 RFX SysCiliaSysCilia CilDB Ciliopathy Survey Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome comp218522_c0 EXOC5 Localization Known Ciliarynetwork vesicle trafficking rhodopsin transport tz TF inter- cytosol golgi bb axoneme (tip) dynein membrane cilium tz dynein assembly dnal4 axonemal Name lrrc74b 7 Ensembl IDENSG00000168374 Source 2, 6, 8ENSG00000187049 Gene 4, 5, 6, 8 arf4ENSG00000175166ENSG00000189060 trans-golgi 1 tmem216ENSG00000197275 3ENSG00000122432 basal 3 body golgi ENSG00000150873 7 4, 5ENSG00000183921ENSG00000187905 2 psmd2 4, 5, 7ENSG00000100565 h1f0 rad54bENSG00000095370 7 spata1 c2orf50ENSG00000172840 3 ac002472.13, ENSG00000162377 3ENSG00000103274 sdr42e2 3ENSG00000158062 6 2, 4, 5 lrrc74a CilDBENSG00000101367 sh2d3c 1, 6, 8 pdp2ENSG00000100246 selrc1 ubxn11 nubp1 3, 4, 5, 6, bbENSG00000084764 mapre1 4, 5 centrosome ENSG00000124227 Ciliome2 4, 5ENSG00000070367 8 Ciliopathy Survey Cilia mapre3 ProteomeENSG00000205084 Ciliome Rfx2 Rfx2 1, ankrd60 6, RFX 8ENSG00000141569ENSG00000140848 3 RFX ENSG00000123684 FoxJ1 2 exoc5ENSG00000205858 comp218378_c1 tmem231 3 4, cytosol 5 plasma transitionENSG00000169126 zone comp218384_c1 comp218396_c0 PSMD2 2, Rfx2 3, 6, 7 trim65ENSG00000198001 comp218412_c2 Rfx2 comp218442_c0 cpne2 RA54B Rfx2 3 lpgat1 H5 lrrc72 armc4 comp218451_c1 SPAT1 D42E1 axonemal comp218454_c1 CN16B comp218458_c0 comp218462_c0 irak4 BCAR3 RFX PDP1 SELR1 RFX Rfx2 FoxJ1 Rfx2 RFX 
comp218533_c0 comp218533_c0 Rfx2 comp218537_c0 CPNE3 CPNE3 PLCE comp218549_c0 IRAK4 200 Body, Centrosome, Ciliary Membrane, Other Organells Ciliary Membrane, Transition Zone Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted tity Per- cent Iden- 57.937 2.37e-163 46678.53976.256 4.66e-12847.727 4.2e-12668.359 2.11e-55 361 1 355 177 × 0.504 057.882 0.49651.148 Basal × Body, Centrosome 1700 × 1 0 067.453 × 51.297 1.21e-101 1 978 5.05e-12950.884 767 294 × 39023.037 1 Transport, Axonemal, Basal 047.315 1 3.2e-15 143.145 4.55e-136 × 1 1.84e-59 × 634 × 77.8 Basal 407 Body,58.011 Transition Zone × 9.56e-75 21259.204 3.34e-78 1 1 1 223 × 154.976 × 233 × 53.561 5.19e-171 Axonemal, × Central83.654 Pair, 1.57e-131 1 6.97e-129 486 1 × 396 362 ×73.936 1 0.36280.769 Transport, 1 Axonemal, 1.3e-98 Basal 89.163 3.81e-125 × ×89.776 × 285 352 Transport, Axonemal, Basal 086.016 078.956 0.285 0.35283.361 112584.452 × × 0 594 0 0.137 0 1089 0.072 0 × 1038 1061 × 0.132 1082 0.126 Axonemal, 0.129 Ciliary × 0.132 × × × DE Info → ↓ ↓ ↓ ↑ ↓ ↑ → ↓ ↑ ↑ → → → ↓ ↓ → ↑ ↑ ↑ ↓ ↓ ↓ ↓ ↓ ↓ Name Component ID Local Gene (continued) comp218553_c1 WRP73 comp218580_c2 WDR35 comp218593_c0 KIF9 comp218632_c0comp218643_c0 DMP4 CC164 comp218673_c2comp218677_c0 LRC63 comp218683_c0 CHAC2 SNP25 comp218728_c0 RAB8A comp218728_c3 X X X X thy Cil- iopa- action FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 RFX FOXJ1 FoxJ1 FOXJ1 Rfx2 FOXJ1 Rfx2 FoxJ1 Table B.2.1: Ciliopathy Survey RFX Survey Ciliome2 Cilia Proteome Ciliome Ciliome SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey Ciliome2 SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Localization Known Ciliarysatellite bb (distal or proximal end) TF inter- axoneme nexin bridge axoneme membrane cilium vesicle trafficking rhodopsin 
transport axoneme kif9 Ciliome2 Cilia Proteome drc1, pacrg axoneme Name wdr35 cilium ift ift-a SysCilia CilDB Ciliopathy ccdc164 5, 6, 7, 8 7 6, 7, 8 7, 8 Ensembl IDENSG00000116213 Source 4, 5, 6ENSG00000152932ENSG00000105649 Gene 7ENSG00000159200 wrap73 3ENSG00000118965 2 centriolar 1, 2, 3, 4, ENSG00000102900ENSG00000088727 rab3c 8 rab3a 2, 3, 4, 5, rcan1ENSG00000041353ENSG00000177706 3 4, 5 nup93ENSG00000157856 1, transition 2, zone 4, 5, ENSG00000197093ENSG00000183773 SysCilia 3 Ciliome rab27b fam20cENSG00000173988 3, 7 3, 4, 5, 7ENSG00000143942 4, 5ENSG00000132639 gal3st4 lrrc63 6, aifm3 8 FoxJ1ENSG00000115947ENSG00000182511 Rfx2 chac2 3ENSG00000167461 3 snap25 6, 7, 8 cilium cytosol comp218574_c1 comp218590_c2ENSG00000084733 comp218557_c0 orc4 rab8aENSG00000166128 comp218557_c0 RCAN3 7ENSG00000109971 fes 7 ciliary ENSG00000112530 NUP93 1, 8 Rfx2 RFX RAB3 1, 2, RAB3 3, 6, ENSG00000126803ENSG00000173110 1ENSG00000206383 rab10 1ENSG00000235941 rab8b 1 Rfx2 RFX 1 Rfx2 comp218596_c0 RB27A RFX comp218661_c0 Ciliome2 comp218662_c0 Ciliome2 SysCilia CilDB G3ST3 AIFM3 Rfx2 Rfx2 CilDB CilDB CilDB CilDB comp218702_c0 comp218706_c1 ORC4 comp218728_c3 comp218728_c0 FPS comp218728_c0 HSP70 RAB8A RAB8A comp218728_c3 comp218728_c3 comp218728_c3 comp218728_c3 HSP70 HSP70 HSP70 HSP70 201 Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Body, Ciliary Membrane Found Category Score EValue BitScore Weighted 55 3.89e-52 179 0.058 × 25 0.000858 40.8 1 × tity Per- cent Iden- 84.45270.5289.08 7.58e-7471.508 1.63e-101 0 2.86e-7574.757 25152.615 2.37e-97 333 1078 265 0.03171.569 0.131 309 0.04 0 0.032 2.65e-42 × × × 0.03868.377 × Basal 673 Body 167 × 62.252 0 0.05464.846 1 2.24e-125 × 1746 0 × 38.938 Basal Body, 400 Centrosome, 50.953 0.567 2.46e-22 58975.701 3.11e-120 × 0.1345.299 0.191 0 92 1.94e-58 Transport, Axonemal, 350 Basal × ×27.917 600 4.31e-18 220 1 Central56.688 Pair 1 0.081 90.5 × 56.688 1 × 
0 × 0.03354.649 × 69.076 Axonemal, Basal 0 Body, × 74045.965 1.36e-12771.115 3.31e-87 078.431 Signaling, Axonemal, Basal 0.272 367 739 1.05e-5471.023 267 0 × 56833.01 7.76e-83 0.135 0.271 171 4.66e-15 0.164 0.208 × × 917 271 0.10547.45 × 77.8 × 0.564 0.16735.666 × Basal 1.69e-66 Body × 40.201 × 0 1 7.38e-47 238 × 542 153 1 1 1 × × × Basal Body, Centrosome Regulation DE Info ↓ ↓ ↓ ↓ ↓ ↑ ↑ ↑ ↑ ↑ ↑ → ↓ → ↑ ↑ ↑ ↑ ↑ ↑ ↓ ↓ ↓ ↓ → ↓ ↑ ↑ ↑ Name Component ID Local Gene (continued) comp218728_c3comp218728_c3 HSP70 comp218730_c0 HSP70 comp218760_c0 IQUB comp218760_c0 IF122 comp218760_c0 IF122 IF122 comp218942_c3 CF165 comp218942_c3 TAPT1 comp218963_c1comp218972_c2 ENY2 comp218986_c1 CJ067 comp218997_c1comp219008_c1 NA SRC8 comp219028_c1 CEP89 C070A X X X X X X thy Cil- iopa- action FOXJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 RFX FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 Table B.2.1: Cilia Proteome Ciliome SysCilia CilDB Ciliopathy Survey Ciliome2 SysCilia CilDB Ciliopathy Survey Ciliome2 SysCiliaCilDB Cilia ProteomeCilDB Ciliome2 RFX comp218942_c3 PDZD7 Ciliopathy Survey SysCilia comp218942_c3 Ciliopathy Survey RFX CF165 Localization Known Ciliary TF inter- cilium ift ift-a component body inhibition appendage iqub undetermined Ciliopathy Survey Ciliome2 nme7 radial spoke ift122 basal body Name c6orf165 c6orf165 1700001c02rik 7 7, 8 7, 8 Ensembl IDENSG00000212866 SourceENSG00000088970 1ENSG00000105254 3, 6ENSG00000105072 7 4, 5ENSG00000077458 Gene ENSG00000164675 3 plk1s1, kiz 2, 4, 5, 6, bbENSG00000162946 c19orf44 6, tbcb 8ENSG00000163913 fam76b 1, 4, 5, 6, ENSG00000143156 Ciliopathy Survey 1, 4, CilDB 5,ENSG00000152795 6, ENSG00000143977 centrosome 3 bbENSG00000215045 3 Ciliome2ENSG00000106541 2 SysCilia CiliopathyENSG00000162068 Rfx2 Survey 3ENSG00000132589 3ENSG00000064012 7ENSG00000186862 2 hnrpdl 8 snrpg grid2ipENSG00000006611 agr2 8ENSG00000213204 ntn3 RFX 1, flot2 2, 4, casp8 5ENSG00000272514 pdzd7 Rfx2 rp3-382i10.7, 1, nucleus 7 basal ENSG00000169762 ush1cENSG00000211460 
comp218728_c3 6ENSG00000185900 2 stereociliumENSG00000138363 comp218728_c3 8ENSG00000120533 cfap206, HSP70 7 4, 5 SysCilia Ciliome2ENSG00000126456 comp218728_c3 HSP70 ENSG00000179133 3 tapt1 4, 5ENSG00000160813 sgk196 HSP70 bb tsn 4, 5ENSG00000085733 Rfx2 eny2 atic 6 Rfx2ENSG00000121289 c10orf67 FoxJ1 4, 5, 6, irf3ENSG00000173557 8 Rfx2 ppp1r35 Rfx2 4, Ciliopathy 5, Survey 7 FoxJ1 comp218760_c0 SysCilia cep89 Cilia Proteome cttn comp218760_c0 comp218798_c3 centriole c2orf70, distal ciliogenesis comp218801_c0 IF122 comp218868_c5 comp218942_c3 comp218802_c1 GRD2I IF122 comp218890_c0 × FoxJ1 TXD12 FLOT2 PDZD7 39.623 NET1 CASP8 3.41e-78 × 29.314 275 comp218942_c3 RFX 6e-59 RFX 1 CF165 Rfx2 comp218963_c1 RFX 206 × SG196 1 comp218963_c1 × comp218963_c1 PUR9 PUR9 202 Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted 70 1.4e-160 473 1 × tity Per- 44.7 9.88e-179 516 1 × cent Iden- 49.013 9.26e-8227.358 6.07e-11 259 61.632.258 8.76e-77 149.315 4.31e-16 1 × 24674.099 × 77.834.848 6.04e-06 1 065.97764.371 1 × 50.4 684 × 035.206 0 3.77e-50 Regulation, Basal25.608 Body, 1 584 1 1.96e-43 55829.243 169 × 2.56e-40 × 0.511 171 0.489 × 57.047 160 1 × 0.517 8.21e-120 × 0.483 × 347 × Basal Body 65.854 4.78e-167 136.261 487 1.76e-34 × 134 1 × 30.335 1 1.97e-84 × 286 1 × DE Info ↑ ↓ → ↑ ↓ ↑ → ↓ ↓ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ Name Component ID Local Gene (continued) comp219039_c1 CQ105 × 38.514comp219110_c0 3.52e-29comp219113_c0 TBCC1 105comp219147_c1 SGPP1 comp219163_c2 1 VAX2A × RUVB2 comp219204_c0comp219211_c4 FXL20 comp219214_c3 CP093 comp219214_c3 DZIP1 DZIP1 comp219219_c1comp219230_c0 F221A comp219243_c0 DLRB2 ×comp219268_c0 CDKL2 59.574comp219277_c1 6.5e-40 SX17A comp219277_c1 ADCY5 128 × 0.522 ADCY5 53.282 4.25e-81 × × 46.956 281 0 0.272 × 752 Transport, Axonemal, Basal 0.728 × Axonemal, Ciliary X X X X thy Cil- iopa- action FOXJ1 FoxJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 FOXJ1 Rfx2 
FoxJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 FOXJ1 Rfx2 Table B.2.1: Ciliome2 RFX SysCilia Ciliopathy Survey Ciliopathy Survey SysCilia CilDB Ciliopathy Survey Localization Known Ciliary TF inter- transcription factor appendage cilium ift ift-b ift46 basal body Name c16orf93 1700006e09rik 8 Ensembl IDENSG00000231256 Source 2, 4, 5, 7ENSG00000076067ENSG00000109927 c17orf105, 3ENSG00000124688 Gene 2ENSG00000134533 2ENSG00000113838 2 3, 4, 5ENSG00000163082 rbms2 4, mad2l1bp 5ENSG00000214513 tecta 6, tbccd1 8 rergENSG00000183207 1, 4, 5ENSG00000158615 sgpp2ENSG00000173826 2ENSG00000108306 2 notoENSG00000153558 7 Ciliome 4, nucleus ruvbl2 5ENSG00000236104ENSG00000196118 2 ppp1r15b 4, 5ENSG00000134874 kcnh6 6 fbxl20ENSG00000158163 fbxl2 2, 3, 4, 5 ccdc189, zbtb22 FoxJ1ENSG00000188732 CilDB Rfx2 2, 4, 5 dzip1l dzip1 FoxJ1ENSG00000168589 RFX bb distal Ciliome 1, 4, FoxJ1 5,ENSG00000125971 7 comp219070_c1 Ciliome2 fam221aENSG00000138769 1 RFX 4, 5 comp219041_c3ENSG00000153002 MD2BP dynlrb2 comp219044_c2ENSG00000171056 2 RFX 4, 5ENSG00000118096 comp219108_c0 SHEP MUC5A 1, 4, 5, 6, FoxJ1 × cdkl2 FoxJ1ENSG00000174233 RERG 24.242 3, 4, 5, 6 cpb1 1.78e-41 × sox7ENSG00000162949 RFX CilDBENSG00000148584 Ciliome2 3 60.101 2 168 1.85e-77 FoxJ1 comp219189_c2 adcy6 comp219179_c0 axoneme CilDB comp219204_c0 231 KCNH7 RFX 1 VF71 capn13 RFX × × FXL20 Ciliopathy comp219205_c0 Survey 78.543 a1cf RFX 1 × ZN135 0 × RFX 45.57 829 RFX 3.22e-19 FoxJ1 1 84.3 RFX × comp219230_c0 1 DLRB2 comp219261_c0 × × Rfx2 52.128 FoxJ1 CBPO 1.52e-35 × 45.725 117 1.81e-80 comp219286_c6 0.478 comp219301_c0 251 × CANB A1CF 1 × 203 Membrane, Transition Zone Found Category Score EValue BitScore Weighted tity Per- cent Iden- 33.755 2.05e-8442.243 1.54e-116 27357.233 8.09e-66 34737.072 1 2.93e-74 199 × 33.186 144.109 2.02e-36 23435.624 × 144.444 5.55e-100 1.73e-13 132 0 × 50.321 1 317 2.03e-16231.288 65.9 × 59230.109 1.61e-41 1 46623.207 4.39e-37 1 7.63e-10 × 43.609 1 15437.778 1 × 4.23e-28 151 127.424 65.1 7.82e-25 
× 9.06e-27 × × 121 135.958 109 140.845 2.31e-126 1 114 × 0.526 1.41e-33 × 0.47473.401 × 393 × 2.98e-15873.401 × 117 1 Axonemal, Ciliary 8.15e-158 484 × 173.401 48850.794 7.07e-151 1 0.33435.714 3.18e-107 × 9.39e-135 0.337 × × 478 32059.477 × 42229.412 0.3353.211 2.67e-07 2.35e-31 0 1 × 1 54.346.939 × 125 88486.897 7.03e-42 × 7.91e-9082.881 158.873 139 1 1 5.98e-130 282 × 0 × × 380 0.247 1 860 × 0.469 × Centrosome 0.753 × × DE Info ↑ ↓ → → ↓ → ↓ ↑ ↓ ↓ ↑ → → → ↑ ↓ → ↑ ↑ ↑ ↓ ↑ ↓ → → ↑ → → → Name Component ID Local Gene (continued) comp219305_c3comp219317_c0 PCAT2 comp219330_c0 CECR5 comp219340_c0 EFCB2 MPRG comp219360_c2 CT201 comp219408_c2 DCLK1 comp219424_c1 ZBBX comp219468_c0comp219468_c0 CDKL5 CDKL5 comp219498_c3 FXL13 comp219517_c0comp219522_c2 FOXD3 comp219526_c0 AKIB1 × PP2BA 54.665 0 850 1 × X X X thy Cil- iopa- action FOXJ1 Rfx2 FOXJ1 FOXJ1 Rfx2 FOXJ1 Rfx2 RFX FOXJ1 FOXJ1 FoxJ1 Rfx2 FoxJ1 FOXJ1 Rfx2 FOXJ1 Rfx2 FOXJ1 FOXJ1 Table B.2.1: Ciliopathy Survey Proteome Ciliome Ciliopathy Survey RFX Ciliopathy Survey Localization Known Ciliary TF inter- cilium) axoneme domonas flagellarprotein satellite Name c20orf201 Ensembl IDENSG00000176454 Source 3, 4, 5ENSG00000069998 4, 5ENSG00000203666 Gene 3, 4, lpcat4 5, 7ENSG00000137819 3, 4, 5 cecr5 efcab2ENSG00000184530ENSG00000180104 2ENSG00000157227 8ENSG00000171695 3 paqr5 4, 5ENSG00000124217ENSG00000105507 3ENSG00000164176 c6orf58 2, 7ENSG00000257315 7 exoc3ENSG00000183638 Ciliome2 mmp14 lkaaear1, 3 6ENSG00000186212ENSG00000198944 3 mocs3ENSG00000169064 cabp5 3 2, 4, 5 edil3 zbed6ENSG00000165105 RFX ENSG00000197223 rp1l1 3 RFX ENSG00000197375 SysCilia sowahb 3 tzENSG00000010626 (connecting sowaha 3 zbbx RFX 1, 2, 3,ENSG00000008086 7 3, Ciliome2 4, 5, 6ENSG00000011347 rasef RFX ENSG00000104626 lrrc23 slc22a5 3 c1dENSG00000161040 3 cdkl5 3, 4, 5, 7 FoxJ1ENSG00000084754 chlamy- ENSG00000100918 Rfx2 3 FoxJ1ENSG00000164379 3 fbxl13 4, syt7 5ENSG00000001629 Rfx2 eri1 4, CilDB comp219344_c0 5 Ciliome2 Cilia ENSG00000275489 
comp219357_c1ENSG00000149179 7 Rfx2 hadha comp219358_c0 comp219363_c1 6ENSG00000138814 CF058 foxq1 rec8 EXOC3 ENSG00000129990 Rfx2 3, 7 ankib1 MMP17 Rfx2 3 comp219362_c1 CALM RFX 1700001p01rik × c11orf49 MOCS3 comp219372_c0 43.919 centriolar ppp3ca comp219367_c0 Rfx2 3.3e-37 comp219418_c3 Rfx2 Rfx2 ZC11A comp219418_c3 syt5 MFGM 126 SWAHC SWAHC comp219455_c1 RFX comp219459_c0 1 Rfx2 comp219457_c1 Ciliome2 Rfx2 × RASEF ORCT Rfx2 C1D × RFX Rfx2 35.674 comp219468_c0 RFX 1.63e-98 comp219490_c1 Rfx2 CDKL5 310 comp219501_c0 ERI1 comp219514_c0 Rfx2 ECHA 1 comp219525_c0 comp219526_c0 RAD21 × CQ098 PP2BA comp219534_c1 SY65 204 Membrane, Transition Zone Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted 59 0 631 1 × tity Per- 31.2 9.63e-09 55.8 1 × cent Iden- 65.09284.861 1.43e-149 2.51e-162 43185.924 46886.184 1.72e-94 0.53134.646 0 0.34 × 52.778 5e-24 28246.865 9.94e-96 × 628 6.35e-90 0.205 94.4 291 0.45673.404 × 281 0.245 6.56e-87 × 0.75547.679 × 42.202 1.68e-136 × 263 1 × 412 0 1 2204 ×39.283 1 Axonemal, Ciliary × 46.97 0 1 6.92e-5852.988 × 927 8.91e-96 20778.771 Axonemal, Basal Body, 79.823 28147.321 1 0 156.79 4.1e-84 021.866 × 7.35e-99 × 1 1.1e-19 1087 Axonemal, 281 Basal 75445.228 Body, × 288 0.59 94 0.41 0 × 144.061 132.192 × 6.87e-14 × 1 595 × 0 × 74.7 833 Basal Body, 1 Other Organells 1 × 1 × × DE Info → ↓ ↓ ↓ ↑ ↑ ↓ ↓ ↑ → ↑ ↑ → → ↓ → → ↓ ↓ → ↑ → ↑ Name Component ID Local Gene (continued) comp219537_c3comp219537_c3 GBB comp219546_c0 GBB AKA14 comp219557_c0comp219559_c2 TRIM2 DNAL1 comp219572_c0comp219576_c2 HD TTLL7 comp219624_c0 BROMI comp219642_c3comp219670_c0 YPEL2comp219671_c1 × STX6 95.614 8.41e-79 HS90A comp219690_c3 228comp219700_c0 TRIM2 1 PP4R4 comp219717_c2 × K1430 X X X thy Cil- iopa- action FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 RFX FOXJ1 Rfx2 FOXJ1 Rfx2 FoxJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 Rfx2 Table B.2.1: SysCilia CilDB Ciliopathy Survey 
Ciliome2 SysCilia Ciliopathy Survey Rfx2 SysCilia Ciliopathy Survey RFX Ciliome2 RFX Localization Known Ciliary TF inter- axonemal dynein complex basal body bb (centrosome) cytosol Ciliopathy Survey RFX filaments bbs nuclear? dnal1 axoneme Name cfap97 c6orf170 6, 7, 8 Ensembl IDENSG00000143858 SourceENSG00000166220 2 2, 4, 5, 7ENSG00000172354ENSG00000182362 Gene 7 4, 5ENSG00000186471 tbata 4, syt2 5ENSG00000123815ENSG00000115661 3ENSG00000237330 3 gnb2 3, 4, ybey 5 akap14ENSG00000119661 1, 3, 4, 5, Ciliome2ENSG00000153560 adck4 rnf223ENSG00000197386 stk16 3 3, 6, 8ENSG00000137941 Ciliome2 2, 3, 4, 5ENSG00000146350 RFX ubp1 htt 2, 4, 5, 6 centrosome ENSG00000140265 ttll7ENSG00000100027 3 FoxJ1 tbc1d32, 4, 5ENSG00000104915 4, 5ENSG00000096384 RFX ENSG00000080824 1 RFX zscan29 1, 4, 5,ENSG00000196132 7 comp219534_c1 ypel1ENSG00000169738 Cilia 3 ProteomeENSG00000119401 Rfx2 2 stx10 RFX hsp90aa1 comp219537_c3 Rfx2 4, 5, 6, 8 SY65 ENSG00000119698 2, 3, 4, 5 GBB RFX myt1ENSG00000071189 comp219546_c0 dcxr intermediate ENSG00000164323 comp219556_c3 3 Rfx2 ppp4r4 3, 4, 5, CilDB 7 Ciliome2 ADCK3 CilDB STK16 kiaa1430, snx13 comp219563_c3 RFX Ciliome Rfx2 Cilia Proteome Ciliome UBIP1 RFX RFX RFX comp219628_c0 FoxJ1 Rfx2 PRDM9 comp219671_c1 comp219685_c1 HS90A comp219678_c5 Rfx2 DCXR MYT1 comp219710_c0 SNX13 205 Axonemal, Basal Body, Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted 40 2.28e-65 222 1 × tity Per- cent Iden- 35.59 9.62e-168 54232.21649.133 3.97e-12152.857 1.23e-45 9.15e-18 1 39942.342 1.31e-47 171 × 85.1 0.393 1 Signaling,32.974 Transport, 179 0.19624.841 5.53e-16029.817 × × 4.83e-36 0.411 × 8.23e-13 49386.06687.958 × 4.18e-74 14885.841 71.6 1.52e-124 3.47e-67 186.243 233 36082.278 9.42e-124 186.607 × 7.92e-44 1 216 0.15237.209 5.15e-69 0.235 × 350 7.71e-56 × 0.141 × 151 × 52.916 219 0.229 × 191 0.09971.084 × 0.143 0 × 29.467 × 1 0 1.12e-22 992 × 
45.58635.63 776 3.24e-131 10384.639 5.25e-100 1 409 0.20149.156 5.93e-139 1 324 × 047.543 × 0.79930.82 × 44350.847 Axonemal, Basal 3.34e-113 × Body, 582 2.09e-29 1 0 Axonemal, Ciliary 39.116 0.2764.032 2.14e-133 367 × 66.992 5.61e-118 122 83144.775 × 1 405 0.224 341 × 0.506 0 × 1 0 Basal × Body 1 × 699 1 1063 × × 1 1 × × Motility, Basal Body DE Info → ↓ ↓ ↓ ↓ ↓ ↑ ↑ → ↓ ↓ ↓ ↓ ↓ ↓ ↑ ↓ ↑ ↓ ↓ → → ↓ ↓ ↓ → ↓ ↓ ↓ → Name Component ID Local Gene (continued) comp219726_c4 PARD3 comp219734_c3comp219745_c2 DAB MPRB comp219822_c0 SLAI1 comp219829_c0 NCAH comp219832_c3comp219833_c0 PI5L1 comp219835_c3 AASS DYI3 comp219863_c0comp219869_c0 KC1D PDE11 comp219874_c1 IRF4 comp219918_c0 ULK4 X X X thy Cil- iopa- action RFX FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 RFX FOXJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey SysCilia CilDB Ciliopathy Survey Ciliome2 SysCiliaCiliome2 SysCilia Ciliopathy Survey comp219841_c2 RFX TRAK1 Localization Known Ciliarylocalisation required for centrosome positioning TF inter- axoneme axonemal dynein complex basal body ciliary motility ulk4 bb orientation dnai2 Name dnaic2, spata45 7, 8 8 Ensembl IDENSG00000148498 Source 6, 8ENSG00000168615 Gene ENSG00000182950 3ENSG00000185523 1 pard3 4, 5ENSG00000173406 cilium apical ENSG00000054598 2ENSG00000170915 3 4, adam9 5ENSG00000184465 c1orf227, odf3l1ENSG00000197057 2ENSG00000109171 2 4, 5ENSG00000080511 dab1ENSG00000115756 foxc1 paqr8 2, 7ENSG00000116717 7 4, 5 wdr27ENSG00000104490 dthd1ENSG00000104228 slain2 3ENSG00000105048 CilDB 2ENSG00000167103 2 rdh8 gadd45a 3, hpcal1 4, 5ENSG00000008311 4, 5ENSG00000171595 ncald trim35 pip5kl1 1, 4, 5, 6, tnnt1ENSG00000173805 Rfx2 8 Ciliome2 Cilia Proteome aassENSG00000182606 Ciliome2ENSG00000197050 2ENSG00000141551 3 FoxJ1 FoxJ1 1, 6, 7ENSG00000137098 Rfx2 RFX hap1 4, comp219727_c0 5, 7ENSG00000128655 comp219734_c3 FoxJ1ENSG00000095464 centrosome 7 FoxJ1ENSG00000137265 trak1 csnk1d RFX 2 znf420 ADA12 comp219829_c0 4, comp219734_c3 5 
bbENSG00000079616 and golgi spag8 DAB ENSG00000109079 comp219735_c0 RFX 3ENSG00000031698 2 NCAH ENSG00000168038 comp219781_c0 pde11a 7 CilDB Ciliopathy DAB Survey FXC1A comp219814_c0 2, 4, pde6c 5, 6, Rfx2 FoxJ1 × RFX WDR27 comp219829_c0 FoxJ1 DTHD1 67.76 tnfaip1 1.08e-67 sars RFX NCAH comp219829_c0 comp219829_c0 229 Ciliome2 comp219829_c0 Cilia Proteome NCAH NCAH 1 NCAH × FoxJ1 Rfx2 Ciliome2 FoxJ1 RFX comp219841_c2 comp219843_c0 comp219869_c0 RFX TRAK1 Rfx2 FoxJ1 comp219869_c0 ZFP26 PDE11 PDE11 comp219892_c0 comp219890_c0 comp219895_c1 BACD3 KIF22 SYSC 206 Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted 25 4.6e-2332 1.24e-87 104 293 1 × 1 × tity Per- 68.5 046.4 2.22e-28 536 125 0.572 × 1 Centrosome × cent Iden- 76.92361.111 5.51e-116 2.57e-101 340 29830.736 5.29e-31 0.71839.481 1 6.76e-66 117 × × 0.282 Transport,39.308 Axonemal, 234 Basal × 45.992 047.699 3.69e-138 1 1.01e-171 × 408 581 49234.48755.777 1.32e-128 1 4.27e-107 1 1 40141.156 × × 31070.943 2.02e-70 × 0.42858.348 224 × 0 125.287 × 0 736 0.000738 140.431 0.543 × 40.436.747 620 4.35e-10367.273 1.61e-36 ×71.756 0.457 315 2.36e-60 1 Transport, 125 Axonemal, Basal 0 ×67.72764.808 × 204 166.898 Transport, 4.92e-130 Axonemal, Basal 633 1 Axonemal, 0 Ciliary × 0.11 39345.714 0.341 × 0 5.44e-31 × 62537.981 0.212 ×30.303 1.17e-38 989 115 8.2e-10 0.337 × Basal Body 143 × 58.9 1 1 × 1 × 1 Transport, × Axonemal, Basal × DE Info ↓ → → ↓ → ↑ ↑ → ↑ ↓ ↓ ↓ ↑ ↑ ↑ ↑ → ↓ ↑ ↓ ↓ ↓ ↓ ↑ → ↓ → Name Component ID Local Gene (continued) comp219969_c0 STX comp219992_c0comp220009_c0 NSUN5 comp220010_c0 CC170 PCDP1 comp220044_c0 FIGL1 comp220067_c7comp220075_c1 CCDCX comp220075_c1 KRP95 comp220085_c1 KRP95 CC067 comp220132_c0comp220132_c0 GDIB GDIB comp220143_c4 BBS2 X X X X X X thy Cil- iopa- action FOXJ1 
RFX FOXJ1 FoxJ1 FOXJ1 FoxJ1 FoxJ1 comp220038_c6FOXJ1 RFX KAD8 FOXJ1 RFX FOXJ1 FoxJ1 FOXJ1 FoxJ1 RFX FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey Ciliome Ciliome2Cilia Proteome Ciliome Ciliopathy Survey RFX Ciliome2 Ciliome2 Cilia Proteome Ciliome SysCilia Ciliopathy Survey Ciliome2 Ciliome2 FoxJ1 comp220151_c1 CK001 Localization Known Ciliaryaxoneme - photoreceptor outer segment TF inter- (mother centriole) axoneme CilDB Ciliopathy Survey cilium bbs - ift-associated cilium ift-kinesin SysCilia Ciliopathy Survey bbs2 basal body Name pcdp1 cfap20, c16orf80 1110032a03rik 8 6, 7 8 Ensembl IDENSG00000111640 SourceENSG00000166900 1, 7 6, 8ENSG00000178750 Gene ENSG00000179299 2 gapdh 4, 5ENSG00000084676ENSG00000120262 stx3 3 2, 4, cilium 5, cytosol 7ENSG00000163075 stx19 nsun7 2, 4, 5, 7 ccdc170ENSG00000117394 ncoa1ENSG00000165695 3 CilDB Ciliome2 cfap221, 1, 2, 7,ENSG00000132436 8 6ENSG00000182263ENSG00000159377 3 slc2a1ENSG00000124194 ak8 7 Ciliome2 Cilia Proteome 4, 5ENSG00000162496ENSG00000101350 fignl1 3 4, 5, basal 6, body 7, ENSG00000084731 psmb4 gdap1l1 fign 6, 8 comp219957_c3ENSG00000070761 SysCilia CilDB Ciliome2 dhrs3 1, 2, 4, 5, FoxJ1 RFX ENSG00000140740ENSG00000257727 G3P 7 Rfx2ENSG00000057608 3ENSG00000001460 6 cilium ift-kinesin Ciliome2 2, 4, 5, 7 SysCilia Ciliopathy comp219969_c0 Survey ENSG00000203879ENSG00000049618 7 uqcrc2ENSG00000125124 3 Rfx2 cnpy2 comp220002_c1 stpg1 4, 5, 6, STX 7, gdi2ENSG00000137720 NCOA2 bb 2, 7ENSG00000101670ENSG00000156575 3 gdi1 2 RFX Rfx2 comp220011_c0 Ciliome2 c11orf1, Ciliome2 Ciliopathy Rfx2 Survey GTR1 comp220052_c1 lipg prg3 comp220044_c0 Ciliome2 PSB4 comp220068_c0 FIGL1 RFX S16C6 Rfx2 comp220086_c0 Rfx2 comp220119_c3 QCR2 CNPY2 comp220132_c0 Rfx2 comp220132_c0 FoxJ1 GDIB GDIB comp220166_c0 comp220182_c3 SVEP1 LIPP 207 Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted 50 6.06e-112 330 
0.418 × tity Per- 30.1 1.45e-84 277 0.395 × cent Iden- 54.286 1.2e-9896.403 293 096.154 1 851 093.961 × 0.1 843 095.923 × 0.09996.87596.875 794 Axonemal, Basal Body, 0 ×95.43392.857 0.094 095.536 Axonemal, Basal Body, 0 816 3.81e-150 0 ×93.687 0 851 0.096 427 84896.575 Axonemal, Basal Body, 80481.746 4.88e-98 0 × 76544.728 1.39e-147 0.1 0.05 0.1 0.09549.089 310 × 419 760 0.0952.661 × × 0 × 33.945 6.36e-124 0.037 1.91e-40 × 0.049 0 0.09 362 808 × × × 140 80832.321 0.19955.897 1 3.27e-88 1 1.54e-61 1 × × × 55.963 28540.503 1.14e-73 213 × 49.881 6.72e-114 0.40665.789 247 0.27 34157.639 × 0 5.96e-38 × 0 0.31369.409 808 1 × 140 544 × 0 0.205 0.795 1 × × 548 × Centrosome 1 × DE Info ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ → → ↑ ↑ ↑ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ Name Component ID Local Gene (continued) comp220183_c1comp220192_c1 ZDH15 TBA1C comp220192_c1 TBA1C comp220192_c1 TBA1C comp220192_c1 TBA1C comp220192_c1comp220192_c1 TBA1 TBA1C comp220194_c0 CG063 comp220214_c0 ANR45 comp220214_c0comp220221_c0 KLH20 IPYR comp220257_c1comp220257_c1 MCRS1 MCRS1 X X X X thy Cil- iopa- action FOXJ1 RFX FOXJ1 RFX FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 Table B.2.1: SysCilia CilDB Ciliopathy Survey SysCilia CilDB Ciliopathy Survey SysCilia CilDB Ciliopathy Survey CilDB Ciliome2 RFX Ciliopathy Survey Localization Known Ciliarycentriole cytosol microtubule cytoskeleton TF inter- centriole cytosol microtubule cytoskeleton centriole cytosol microtubule cytoskeleton satellite Name basal body cfap69 basal body ankrd45 Ciliome2 RFX 8 8 7 Ensembl IDENSG00000104219 Source 4, 5ENSG00000167552 1, 4, 5, 6, Gene ENSG00000167553 zdhhc2 1, 4, 5, 6, ENSG00000127824 1, 6, 8ENSG00000123416 1, 4, 5ENSG00000198033 tuba4aENSG00000075886 1ENSG00000152086 basal body 1, 2ENSG00000183785 1ENSG00000164287 1 4, 5ENSG00000170522 4, tuba3c 5 tuba3dENSG00000186994ENSG00000121577 3 tuba3eENSG00000105792 3 cdc20b 1, 4, 5, RFX ENSG00000070610 7ENSG00000132912 elovl6 3 CilDBENSG00000183831 3 c7orf63, 2, 3, 4, kank3 
5, popdc2 CilDB CilDBENSG00000116679 CilDB 4, 5ENSG00000183655 gba2 CilDBENSG00000159588 dctn4 3 4, 5ENSG00000138777 RFX ENSG00000155158 ivns1abp 3ENSG00000167693 3ENSG00000100109 3 FoxJ1ENSG00000187778 3 ccdc17 klhl25 6ENSG00000175664 3, 4, 5 ppa2 RFX ttc39bENSG00000197905 comp220192_c1 3 nxn tfip11 RFX mcrs1 comp220192_c1 tex26 centriolar TBA1C Rfx2 comp220192_c1 Rfx2 TBA1C comp220192_c1 TBA1C Rfx2 Rfx2 TBA1C comp220192_c1 comp220192_c1 RFX TBA1C TBA1 comp220200_c0 RFX Rfx2 comp220208_c0 GBA2 Rfx2 DCTN4 Rfx2 Rfx2 Rfx2 comp220214_c0 RFX comp220221_c0 KLH20 comp220221_c0 Rfx2 comp220223_c0 comp220224_c0 IPYR IPYR TFP11 NXN comp220278_c0 TEAD1 208 Centrosome, Other Organells Zone, Other Organells Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 44.56143.488 0 1.15e-10856.925 370 77448.367 0 9.36e-17354.737 1 145.194 496 75167.456 2.96e-158 × ×49.306 2.93e-63 0 5.12e-36 0.333 Axonemal, 459 Basal41.224 Body, 41.429 1 223 × 53340.767 137 0.30833.808 6.29e-54 × 046.259 9.42e-71 0.358 × 0 8.09e-27 1 Basal Body, Transition 36.853 201 1 × 54523.894 1.04e-78 235 × 62670.408 2.36e-12 119 ×25.102 6.48e-42 0.465 251 1 8.4e-39 0.535 Centrosome 38.947 71.2 1 × 152 1 × × 5e-15 × 14870.64 1 × 1 8.95e-172 1 × 7977.632 × 180.62 1.18e-31 487 × 83.333 2.38e-68 1.62e-167 × 0.58884.681 1 12656.514 2.15e-142 215 47346.075 × × 0.15273.397 4.16e-94 415 Basal 0.533 Body, 0 Centrosome, 0.2647.48 × 36.301 280 0.467 1.62e-165 × 0 × 27.841 1.3e-38 719 8.53e-22 × 32.039 484 667 4.34e-05 1 14432.477 8947.17 1 5.31e-55 × 45.4 1 9.65e-62 1 × 0.662 1 206 × 0.338 × × 214 × × 0.351 1 × × DE Info → → ↓ ↓ ↓ ↓ ↑ → ↓ ↓ → ↑ ↑ ↑ ↑ → ↑ ↑ ↓ ↓ ↓ ↑ ↑ ↓ ↑ ↑ ↓ → ↑ ↑ ↑ ↓ Name Component ID Local Gene (continued) comp220285_c0 EXOC4 comp220309_c0comp220310_c2 HIPL2comp220312_c0 × NEK8 43.74 1.4e-175 LAT2 comp220332_c0 520 FR1OP 1comp220372_c0 × 4ET comp220397_c1comp220424_c1 BTBDH comp220431_c0 CCD60 SEPT2 comp220434_c2 P2R3C comp220459_c2 TTLL1 
comp220480_c1comp220480_c1 CC153 comp220483_c3 ANO8 KHDR2 X X X X thy Cil- iopa- action FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey FoxJ1 Ciliopathy Survey SysCilia Ciliopathy Survey Localization Known Ciliary(centrosome) TF inter- inversin compartment satellite ciliary base Name Ensembl IDENSG00000131558 Source 6, 8ENSG00000113645ENSG00000182218 3 Gene 2, 3ENSG00000160602 exoc4 2, 6, 8ENSG00000155465 basal body bb 4, 5ENSG00000103257 wwc1 hhipl1ENSG00000151012 3ENSG00000127241 2ENSG00000213066 2 transition zone 6ENSG00000148180 slc7a7ENSG00000135407 2, 7ENSG00000011007 7 slc7a5ENSG00000163116 slc7a11 3ENSG00000184708 7 masp1 fgfr1op 4, 5ENSG00000166068ENSG00000139044 centriolar 3ENSG00000096746 gsn 2ENSG00000204347 3 avil tceb3 eif4enif1 4, 5ENSG00000183273 stpg2 3, 4, 5 spred1 b4galnt3ENSG00000168385 hnrnph3 6, 8 Rfx2 btbd17 Rfx2 ENSG00000169247 ccdc60 Ciliome2 Cilia ProteomeENSG00000172115 3ENSG00000092020 Ciliome2 3 4, 5 FoxJ1 RFX ENSG00000070087 sept2ENSG00000111144 3 comp220286_c0ENSG00000175287 centrosome 7 Rfx2ENSG00000100271 FoxJ1 3 sh3tc2 FoxJ1 4, ppp2r3c 5ENSG00000160803 WWC2 cycsENSG00000120370 comp220339_c0 3ENSG00000248712 3 pfn2 4, 5, 7ENSG00000176381 lta4h phyhd1 comp220312_c0 comp220312_c0 Rfx2 RFX 4, VILI 5 comp220331_c1ENSG00000129646 ttll1ENSG00000130695 7 ubqln4 ccdc153 4, LAT2 Rfx2 LAT2 5 FoxJ1 gorab DYH8 Rfx2 comp220339_c0 RFX prr18 comp220351_c0 RFX Ciliome2 qrich2 cep85 comp220363_c0 VILI comp220376_c0 ELOA1 comp220373_c0 comp220382_c1 STPG2 CGAT2 SPRE2 Rfx2 HNRPF RFX Rfx2 Rfx2 Rfx2 comp220431_c0 RFX comp220431_c0 Rfx2 RFX SEPT2 Rfx2 comp220440_c3 comp220434_c2 SEPT2 comp220449_c0 RFX LKHA4 P2R3C RFX comp220462_c0 PHYD1 comp220479_c0 UBQL1 GORAB comp220481_c0 QRIC2 209 Ciliary Membrane, Other Organells Membrane, Transition Zone Ciliary Membrane, Other Organells Ciliary Membrane, Other Organells Membrane, Transition Zone Found Category Score EValue 
BitScore Weighted 50 1.73e-78 246 0.404 × tity Per- 37.5 1.53e-73 241 1 × cent Iden- 52.663 7.09e-3946.61247.794 2.87e-11740.923 6.98e-82 149 1.78e-85 348 0.245 262 273 0.57 × 0.43 × 1 × × 40.123 3.49e-6339.93131.579 2.65e-4152.886 2.85e-10 217 14655.939 63.2 0 1 1 0 × 837 1 × 40.336 × 809 5.63e-26 161.268 9.31e-123 103 × 1 362 × 1 Axonemal,63.514 Ciliary ×63.592 1 6.62e-8266.772 1.53e-77 Basal Body, × Centrosome, 42.117 263 253 Basal 0 Body, Centrosome, 59.01661.619 4.04e-94 0.51 0 0.49 3390 × 302 × 0 667 1 0.187 975 × × 1 0.605 Axonemal, Ciliary × × DE Info ↓ ↓ → → ↑ → ↑ ↓ ↑ ↑ ↑ ↓ ↑ ↓ ↓ ↓ ↑ → → Name Component ID Local Gene (continued) comp220492_c7comp220534_c0 RABEK SC5A8comp220541_c1 × 49.701 RIMS2 4.79e-49comp220541_c1 × 59.524comp220545_c2 174 RIMS2 1.09e-66 × 0.235 RNF25 61.345comp220572_c1 244 × 4.06e-67 0.301comp220600_c0 242 KIF6 × 0.298comp220608_c3 DYI2 Basal Body, Centrosome, comp220609_c0 × CYLD SNX11 × 45.809comp220636_c1 5.75e-148 459 LRC71 comp220645_c2 1 USP9X × comp220666_c7 K6PF X X thy Cil- iopa- action FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 Rfx2 Table B.2.1: SysCilia RFX SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Cilia Proteome Ciliome Localization Known Ciliary TF inter- plasma membrane centrosome axoneme axonemal dynein complex lrrc71 Ciliome2 RFX dnai1, Name dnaic1 6, 7, 8 7 7 gm1305 comp220647_c0 KLP6 Ensembl IDENSG00000123094 SourceENSG00000131773 3ENSG00000107798 2ENSG00000111276 3ENSG00000136933 3 Gene 3, 4, 5ENSG00000124762 rassf8 khdrbs3 4, 5ENSG00000124429ENSG00000138074 3 rabepk cdkn1b lipaENSG00000182040 3 4, 5, 8 cdkn1aENSG00000166262 4, 5ENSG00000176406 pof1b ush1g slc5a6ENSG00000163481 3 cytoplasmic 4, 5ENSG00000134490ENSG00000146072 fam227b 3ENSG00000164627 3 2, 4, 5ENSG00000122735 rims2 rnf25 1, tmem241 2, 4, 5, tnfrsf21 Rfx2ENSG00000083799 FoxJ1 4, 5ENSG00000086300 RFX Rfx2 Rfx2 6, 8ENSG00000078668 RFX 8 comp220483_c3 comp220483_c3ENSG00000160838 Rfx2 
snx10 2, 3, comp220487_c2 4, Rfx2 KHDR2 5, comp220487_c2 Ciliome KHDR2 centrosome bbENSG00000075891ENSG00000125618 vdac3 RFX 3 SysCilia Ciliopathy LIPG Survey ENSG00000124486 LIPG 2 centrosome 6ENS- comp220534_c0MUSG00000087236 comp220534_c0 Rfx2 RFX ENSG00000110958 SysCilia Rfx2ENSG00000067057 RFX 1 SC5A8 SC5A8 Rfx2 3, 4, 5 × usp9x × 52.688 axoneme 41.76 1.64e-26 comp220541_c1 1.42e-154 ptges3 comp220555_c1 pfkp 105 comp220557_c0 RIMS2 Ciliopathy 460 Survey TM241 RFX × 0.142 0.622 TNR16 58.824 × 2.17e-95 × CilDB 325 comp220614_c1 0.401 VDAC2 × Rfx2 FoxJ1 RFX comp220641_c3 comp220641_c3 PAX2A PAX2A comp220666_c7 K6PF 210 Ciliary Membrane, Transition Zone Centrosome, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Centrosome, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 74.76257.792 1.25e-112 4.09e-5950.867 334 1.48e-58 20651.034 0.207 1.23e-44 20150.394 × 54.589 5.07e-38 1 0.27267.797 2.19e-75 15977.184 3.7e-20 × × 140 0.215 23949.717 91.3 0 0.189 × 0.323 × 038.436 689 × 1 1.2e-9554.462 690 × 43.529 1 30853.551 2.67e-79 0 × 125.301 268 1249 0 Axonemal, 1 Basal 5.04e-10 Body, ×51.316 2.05e-148 × Signaling 555 58.9 1 134.865 43157.018 3.13e-69 0.904 0.096 × × 34.904 1.69e-145 1.06e-124 × × 224 42943.993 1 399 8.08e-135 Motility ×56.566 1 40558.186 1 Axonemal, Basal Body, 61.538 1 3.97e-180 × 52.381 2.11e-140 × 0 1.08e-52 × 533 131.954 401 9.18e-67 Transport, Axonemal, Basal 56239.793 × 189 0.487 0.6864.246 222 0.513 × 4.92e-158 0.32 0 × × × 501 1 576 × 1 1 × × Axonemal, Basal Body, DE Info → → ↓ ↓ ↓ ↓ ↓ → ↑ ↑ → ↑ ↑ ↑ ↓ → ↑ ↑ ↑ ↓ ↓ ↑ ↑ ↓ ↑ ↑ Name Component ID Local Gene (continued) comp220672_c0comp220674_c5 K1456 comp220674_c5 CATB CATB comp220682_c0comp220683_c1 LIS1 KAD7 comp220692_c0comp220696_c1 CCD61 comp220701_c1 DIP2C LRC48 comp220706_c0 NEK2 comp220747_c1comp220764_c1 WDR60 CCD37 comp220788_c1comp220791_c0 FA49B comp220793_c2 MOT12 comp220800_c0 ATS7 TTBK1 X X X X X X thy Cil- iopa- action FOXJ1 FOXJ1 
Rfx2 FOXJ1 RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 Rfx2 Table B.2.1: SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Cilia Proteome Ciliome Ciliome2 Cilia Proteome Ciliome SysCilia Ciliopathy Survey RFX SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome SysCilia Ciliopathy Survey RFX Localization Known Ciliary TF inter- localisation motile cilium CilDB Ciliopathy Survey centriole bb proximal end chlamy- domonas flagellarprotein appendage ak7 apical cell drc3 ttbk2 cytoplasm distal Name lrrc48, wdr60 cilium ift-dynein SysCilia Ciliopathy Survey RFX ccdc37 cfap100, 6, 7, 8 6, 7 7 lrrc518 6, 7, 8 Ciliome28 comp220701_c1 LRC48 Ensembl IDENSG00000100142 SourceENSG00000250305 3 4, 5ENSG00000183137 3, 4, 5 Gene ENSG00000165325 kiaa1456 4, 5ENSG00000169991 polr2fENSG00000164733 cep57l1 3ENSG00000106511 2ENSG00000007168 2 6, 8 ccdc67ENSG00000140057 1, 2, 4, 5, iffo2 pafah1b1 ctsb meox2ENSG00000104983 axoneme bb 4, 5ENSG00000160305 4, 5ENSG00000149201 SysCilia Ciliopathy Survey ENSG00000171962 7 1, 2, 4, ccdc61 5, ENS- MUSG00000064307 Cilia Proteome dip2aENSG00000117650 RFX Rfx2 4, ccdc81 5, 6, 8 RFX ENSG00000163536ENSG00000136052 3ENSG00000126870 3 FoxJ1 RFX 3, nek2 4, 5, 6, ENSG00000163885 comp220666_c7 basal body 1, Rfx2 2, 4, 5, serpini1 FoxJ1ENSG00000000938 slc41a2ENSG00000101336 comp220674_c5 K6PF 3ENSG00000153310 3ENSG00000151838 7 4, 5 CATB ENSG00000152779 comp220674_c5 comp220681_c0 4, 5 RFX ENSG00000151388 2, 3ENSG00000128881 fgr fam49b ccdc175 CATB HXA2B RFX hck 3, 4, 5, 6, slc16a12 adamts12 Ciliome2 comp220700_c2 Rfx2 Rfx2 CCD81 comp220708_c0 comp220736_c4 Rfx2 RFX Rfx2 SPB8 S41A1 RFX Rfx2 comp220788_c1 comp220765_c0 comp220765_c0 FA49B SRC42 SRC42 211 Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Centrosome, Other Organells Centrosome, Ciliary Membrane, Other 
Organells Body, Centrosome, Ciliary Membrane, Other Organells Organells Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted 35 1.44e-94 320 0.144 × Transport, Axonemal, Basal 25 1.02e-44 166 1 × tity 100100 3.39e-67100100 2e-68 1.11e-68100 206100 2e-68 4.08e-69100 6.53e-69 204 200 0.145 2e-68 203 200 0.144 0.141 × 206 0.143 0.141 × × 200 0.145 × × 0.141 × × Per- cent Iden- 55.1251.562 0 0 69063.458 545 0.31154.415 × 0.246 050.849 Transport, 0 Axonemal, × Basal 48.601 663 Transport, Axonemal, 1301 Basal 0 0.299 0 0.572 × 97261.889 × 554 1.51e-132 Axonemal 0.428 Axonemal, Basal72.414 Body, 392 × 2.12e-158 129.983 451 4.9e-47 × 134.018 Axonemal, Basal Body, × 1.9e-41 177 1 Transport, Axonemal, Basal × 157 0.53 × 0.4732.584 Centrosome, Other × 3.52e-0752.41958.273 1.62e-87 48.132.379 1.4e-5146.092 6.31e-99 4.23e-98 270 177 1 31428.427 298 1.45e-128 × 1 1 1 × 442 1 × × × 1 Transport, Axonemal, Basal × DE Info → → → → ↑ ↑ ↓ → ↑ ↑ ↓ ↓ → → → → → → → ↓ ↑ → → ↑ ↓ Name Component ID Local Gene (continued) comp220806_c4comp220806_c4 CNGA2 comp220806_c4 CNGA2 comp220806_c4 CNGA2 CNGA2 comp220813_c0comp220821_c0 FLNA comp220825_c0 GCP4 comp220828_c3 TT39C CLUA1 comp220837_c0comp220860_c0 NEK8comp220860_c0 RFP4A × 39.688 RFP4A 8.88e-135 433 1 ×comp220938_c0 Basal Body DYXC1 comp220950_c1 NRG X X X X X X X thy Cil- iopa- action FOXJ1 FOXJ1 FOXJ1 Rfx2 RFX FOXJ1 FOXJ1 RFX FOXJ1 FoxJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey FoxJ1 SysCilia Ciliopathy Survey Ciliopathy Survey SysCiliaSysCilia CilDB Ciliopathy Survey Ciliome2 RFX Ciliopathy Survey SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey Cilia Proteome Localization Known Ciliary- ciliary signalling - ciliary signalling TF inter- - ciliary signalling ciliary signalling centriole cytosol ift-b subcomplex localisation bb? 
network vesicle trafficking cilium axonemal dynein complex assembly Name cluap1 centriole cilium dyx1c1 centrosome 7, 8 8 Ensembl IDENSG00000183862 Source 6, 8ENSG00000132259 2, 6, 8 Gene ENSG00000070729 cnga2 6, 8ENSG00000198515 cilium axoneme cnga4 6ENSG00000196924 cilium axoneme 3, 8 cngb1ENSG00000128591 cilium axoneme 4, 5ENSG00000137822 cnga1 4, 5, 8 axoneme - ENSG00000085831 3, 4, 5 basal bodyENSG00000103351 tubgcp4 1, 4, basal 5, body 6, SysCiliaENSG00000177453 ttc39aENSG00000119638 7 6ENSG00000090565 6, 8ENSG00000131242 4, 5 nim1kENSG00000197061ENSG00000197238 rab11fip3 1 nek9ENSG00000270276 1trans-golgi ENSG00000197837 unclear 1ENSG00000270882 rab11fip4 1ENSG00000158406 1ENSG00000278637 1 hist1h4cENSG00000107984 7 Rfx2 comp220813_c0ENSG00000167972 hist1h4j 3 hist1h4aENSG00000100344 3ENSG00000176531 RFX hist4h4 3 hist1h4aENSG00000135686 FLNA 3 hist1h4hENSG00000256061 3 hist1h4a RFX 2, 4, 5, 6, dkk1 abca3ENSG00000134121 CilDB pnpla3 CilDB 4, 5 phldb3 CilDB klhl36 CilDB CilDB CilDB Ciliome2 chl1 RFX comp220831_c0 NIM1 comp220871_c0 comp220871_c0 Rfx2 comp220871_c0 Rfx2 Rfx2 comp220871_c0 comp220871_c0 Rfx2 H4 comp220871_c0 comp220871_c0 Rfx2 H4 H4 H4 comp220902_c0 H4 comp220903_c3 H4 H4 comp220911_c0 RFX comp220920_c0 comp220922_c0 DKK4 ABCA3 PLPL2 × PHLB2 KLHL9 43.772 0 1379 1 × 212 Body Transition Zone, Other Organells Axonemal, Basal Body, Ciliary Membrane, Transition Zone Body, Central Pair, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Found Category Score EValue BitScore Weighted 65 3.16e-138 402 1 × Transport, Axonemal, Basal tity Per- 45.5 7.51e-94 300 1 × cent Iden- 48.06449.77 3.02e-15344.356 3.77e-13045.491 44365.132 420 041.549 2.03e-61 061.017 4.84e-25 142.978 1.02e-36 0.12 151754.303 7.81e-74 186 1551 97.1 × × 0.435 142 0.657 0.445 231 0.343 0 × 33.969 × × × 71.825 1 1.1e-34 548 3.55e-137 1 × × 139 405 149.569 0.31 1.55e-144 × 1 × 42252.342 × 53.791 2.22e-166 Ciliary Membrane, 28.03 1.03e-156 0.32330.973 479 1.3e-2949.458 
1.05e-43 × 45542.857 3.83e-104 0.367 Signaling, Transport, 114 160 307 × 0 1 0.343 0.657 Basal57.99 Body × 158.258 674 × 4.14e-148 × 64.904 1.01e-116 × 61.421 2.77e-85 470 344 1 046.94592.966 4.26e-97 268 × 92.097 1 600 180.921 0.309 9.81e-178 315 × 098.462 × 0.691 035.03 × 5.57e-38 497 1.11e-48 × 626 1 638 137 0.262 × 167 0.33 0.336 × 0.072 × × × 1 × DE Info ↓ ↑ ↑ ↑ → ↓ ↓ ↓ → → ↑ ↑ ↑ ↑ → ↓ ↓ ↓ → ↑ → ↑ → → → ↓ ↓ ↓ ↓ ↑ Name Component ID Local Gene (continued) comp221016_c1comp221019_c0 UAP1 GLIS3comp221040_c5 × 61.111 8.76e-51 TUB comp221040_c5 175comp221040_c5 0.423 TUB × TUB Signaling, Axonemal, Basal comp221083_c1 RSPH3 comp221104_c0 PEPD comp221112_c0comp221123_c4 PP1A CCD89 X X X X thy Cil- iopa- action FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey Rfx2 SysCilia CilDB Ciliopathy Survey Ciliome2 Ciliome Localization Known Ciliary TF inter- axoneme (tip) membrane various - inner segment tz (connecting cilium) synapse tip ift-a interactor cilium axoneme central pair radial spoke rsph3 Name rsph3b, 5, 6, 7, 8 Ensembl IDENSG00000138678 SourceENSG00000126759 2ENSG00000183873 3ENSG00000136531 3ENSG00000105245 2ENSG00000142168 Gene 3ENSG00000109610 7ENSG00000105997 3 agpat9ENSG00000177606 3ENSG00000197355 3 scn5a cfp 4, 5 scn2aENSG00000111087 numbl 6, 8ENSG00000107249 sod1ENSG00000105499 sod3 2 hoxa3ENSG00000112041 2 uap1l1 6, 8 Ciliome jun gli1ENSG00000078246 ciliary tip pla2g4c glis3 3, 6, Ciliome2 tulp1 8 plasma ENSG00000166402 FoxJ1ENSG00000135537 6ENSG00000183060 2ENSG00000071282 tulp3 3ENSG00000183386 3 axoneme ciliary ENSG00000134780 3 Rfx2 Rfx2ENSG00000130363 3 FoxJ1 comp220968_c2 1, 2, Rfx2 3, 4, tub lace1 lysmd4 Rfx2 bb GPAT4 rootlet lmcd1ENSG00000186185 Rfx2ENSG00000029364 RFX 2 comp220994_c0 comp220994_c0ENSG00000124299 Rfx2 dagla comp220994_c0 3ENSG00000124074 7 comp221005_c1 comp220999_c0 Ciliopathy Survey 2, 4, 5 SCNA SCNA ENSG00000127947 SCNA FoxJ1 
comp221005_c1 FoxJ1ENSG00000186298 comp221007_c0 NUMB 3 SODC kif18bENSG00000172531 slc39a9 1ENSG00000166352 comp221013_c0 1, 7 enkd1 SODC HXB3A pepd 4, 5ENSG00000164266ENSG00000179071 2 comp221036_c6 comp221019_c0 JUN 2, ptpn12 4, 5 ppp1cc ppp1ca c11orf74 PA24A GLIS3 ccdc89 × spink1 Cilia FoxJ1 Proteome Rfx2 Ciliome 80.882 Rfx2 4.06e-75 Rfx2 RFX Rfx2 CilDB CilDB 239 comp221050_c0 comp221051_c4 0.577 comp221071_c0 FoxJ1 LACE1 × Rfx2 comp221071_c0 LYSM3 comp221080_c1 FHL2 FHL2 DGLA Rfx2 comp221084_c0 RFX comp221092_c3 KI18A RFX S39A9 FoxJ1 comp221104_c0 comp221109_c0 comp221112_c0 comp221112_c0 PEPD PTN12 PP1A comp221112_c0 PP1A PP1A 213 Body, Central Pair, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Found Category Score EValue BitScore Weighted tity Per- 63.5 4.66e-77 239 0.298 × cent Iden- 64.789 2.89e-2347.327 101 1.2e-14537.75 0.191 4.49e-96 427 × 305 0.80953.96 2.67e-153 × 66.561 1 4.69e-123 44181.034 × 1.8e-57 37037.931 Axonemal 179.644 4.18e-37 0.462 19246.853 × 8.2e-83 × 35.23 144 0 0.24 4.07e-79 26672.101 × 85751.799 7.2e-148 1 252 9.4e-109 × 1 415 1 31654.023 1 × 4.09e-83 × × 1 1 264 Axonemal × ×61.90562.08162.205 Transport, Axonemal, Basal 163.551 042.506 0 × 27.915 1.77e-93 047.719 1.49e-37 1072 0 6.83e-79 108268.86 320 1099 0.246 135 1102 6.25e-10161.477 0.248 24052.381 0.252 × 62.887 0.253 1.54e-122 × 348 1 3.78e-80 × 1 0 × 1 × 388 0.353 × 266 × 638 × 1 0.647 1 × × × DE Info ↓ ↓ ↑ ↓ ↓ ↓ ↓ ↓ ↓ ↑ ↑ ↓ ↑ ↑ ↓ ↓ ↓ ↓ → ↓ → → → → ↓ Name Component ID Local Gene (continued) comp221126_c1 CK5P3 comp221129_c1 CC151 comp221131_c0comp221141_c0 LGMN comp221141_c0 SRPK1 SRPK1 comp221178_c5comp221192_c0 MORN1 PPR32 comp221232_c1 RSPH9 comp221238_c6comp221243_c0 PPR42 FHDC1 × 37.5 2.79e-82comp221262_c2comp221265_c0 296 TOLIP ACK1 1 × X thy Cil- iopa- action FOXJ1 Rfx2 FoxJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 RFX FOXJ1 FoxJ1 FOXJ1 FoxJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 Table B.2.1: CilDB Ciliopathy Survey Ciliome2 Ciliome Ciliopathy 
SurveySysCilia CilDB Ciliopathy Survey Ciliome2 Cilia RFX Proteome Ciliome Localization Known Ciliary TF inter- dynein assembly (motile cilia) radial spoke rsph9 cilium axoneme Name lrp2bp Ciliome2 RFX ppp1r32 axoneme ccdc151 axonemal 7 5, 6, 7 7 6, 7, 8 Ensembl IDENSG00000109771 Source 2, 3, 4, 5, ENSG00000108465ENSG00000151704 Gene 3ENSG00000198003 2 1, 2, 3, 4, ENSG00000158286 cdk5rap3ENSG00000100600 2 4, 5ENSG00000119782 kcnj1ENSG00000135250 7 4, 5ENSG00000119862 2, 4, rnf207 5 lgmnENSG00000126787ENSG00000111642 3 fkbp1bENSG00000116151 2 srpk2 4, Ciliome 5ENSG00000162148 lgalsl 3, 4, 5, 6, ENSG00000136930 dlgap5ENSG00000172426 7 morn1 chd4 1, 2, 4, 5, Ciliome2 FoxJ1 Rfx2ENSG00000178125 2, 4, psmb7 5ENSG00000137460 3, 4, 5 ppp1r42ENSG00000070961 comp221128_c1 FoxJ1 comp221126_c1ENSG00000058668 1 RFX ENSG00000157087 1ENSG00000067842 1 fhdc1ENSG00000047648 IRK5 CK5P3 1 RFX Ciliome2ENSG00000095209 3 ×ENSG00000078902 RFX 3 comp221130_c2 4, atp2b1 44.86 5ENSG00000100242 atp2b4 Ciliome 1.24e-96 comp221141_c0 4, atp2b2 5 Rfx2ENSG00000061938 atp2b3 RN207ENSG00000102974 arhgap6 RFX FoxJ1 3 tmem38bENSG00000136535 298 × 3 SRPK1 3 tollip 32.979 2.45e-58 sun2 1 comp221158_c6 CilDB CilDB comp221173_c3 RFX 211 tnk2 × CilDB CilDB DLGP5 CHD5 comp221208_c0 1 × RFX PSB7 Rfx2 Rfx2 RFX comp221250_c1 comp221250_c1 RFX comp221250_c1 comp221250_c1 AT2B2 comp221257_c2 Rfx2 comp221260_c2 AT2B2 AT2B2 Rfx2 Rfx2 AT2B2 RHG06 T38BA comp221265_c0 comp221269_c0 comp221291_c0 ACK1 CTCF TBR1 214 Centrosome, Ciliary Membrane, Other Organells Centrosome, Other Organells Centrosome, Transition Zone, Other Organells Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted 67 5.86e-152 432 0.88 × Basal Body tity Per- cent Iden- 32.343 3.31e-3932.203 149 7.66e-37 0.51230.323 142 9.5e-11 ×51.659 0.48886.158 1.67e-59 Axonemal, 58.9 Basal Body, 85.882 × 8.48e-4842.239 209 0.12 0 1.59e-91 163 × 50.678 75159.758 288 1 0.178 
0.822 0 × 37.603 × 0 2.78e-44 1 × 37.013 774 2.98e-65 × 44.04 827 151 7.93e-7352.033 208 5.42e-87 1 1 1 228 0.29833.065 × 34.118 1.27e-35 262 × × 41.682 × 9.65e-40 0.327 1.71e-149 Axonemal, 0.375 Basal Body, 144 × 44.597 153 442 × 1.93e-141 0.485 0.515 41751.772 × 1 1.1e-172 × 0.45741.27 ×51.391 496 5.82e-135 ×40.61 Axonemal, Basal Body, 31.895 0.543 Transport, Axonemal, 2.51e-62 427 Basal 030.357 × 0 1.51e-1562.35 219 689 Axonemal, 1 Ciliary 1089 80.5 × 0 1 1 1 1 × 1045 × × × 1 × DE Info ↑ ↑ ↓ ↓ → → → → ↓ → ↑ ↑ ↑ ↑ ↓ ↓ → → → → → → ↑ ↑ ↑ Name Component ID Local Gene (continued) comp221295_c0 F161B comp221295_c0comp221299_c3comp221299_c3 F161B AURKB CS024 comp221306_c0comp221310_c1 ARP3 CAYP2 comp221333_c0 PMGT1 comp221359_c0 CCD42 comp221362_c0 GR101comp221365_c1 × 34.763comp221372_c2 MKS1 6.24e-72 TEKT3 comp221372_c2 248 TEKT3 1comp221395_c1 × comp221398_c0 LRIQ3 comp221407_c0 VWA3B AT2C1 X X X X X X thy Cil- iopa- action FOXJ1 Rfx2 FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FoxJ1FOXJ1 comp221359_c0FOXJ1 CCD42 FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey RFX Ciliome2CilDBProteome Ciliome FoxJ1 RFX comp221354_c1Survey YI011 Localization Known Ciliarycilium basal body centrosome bb TF inter- tekt3 axoneme CilDB Ciliopathy Survey RFX Name cfap77 ccdc42b fam161a connecting 8 6 Ensembl IDENSG00000170264 Source 3, 4, 5, 6, ENSG00000156050 Gene 4, 5, 7ENSG00000087586ENSG00000174132 1, 6 3, 4, 5 fam161bENSG00000151065ENSG00000115091 3ENSG00000106686 2, fam174a 7 4, aurka 5ENSG00000180881 bb (centrosome) 2, 4, 5ENSG00000120694 CilDB Ciliopathy Survey dcp1bENSG00000085998 3 actr3 spata6l 6 caps2ENSG00000188523 2, 7ENSG00000186710 1, 4, 5 pomgnt1 hsph1ENSG00000161973 1, golgi 2, 7ENSG00000184117 c9orf171, ENSG00000171509 Ciliome2 Cilia Proteome 2 cfap73, 4, 5ENSG00000106070 RFX ENSG00000141738 ccdc42 FoxJ1 3ENSG00000011143 Ciliopathy 3 Survey 4, RFX 5, nipsnap1 6, 8ENSG00000153060 rxfp1 1, 6, 8 comp221306_c0 grb10 Rfx2ENSG00000125409 grb7 RFX basal body 
tz 1, CilDB 3, Ciliome2 4, Cilia 5, ARP3 RFX ENSG00000109805 tekt5ENSG00000197912 Cilia SysCilia Proteome 3 Ciliopathy Survey ciliumENSG00000130702 axoneme 3ENSG00000162620 3 SysCilia Rfx2 comp221301_c0 RFX CilDB Ciliopathy 2, 3, 7ENSG00000176029 4, 5ENSG00000064270 DCP1B FoxJ1 ncapg 4, 5 spg7 lama5 lrriq3 comp221314_c1 c11orf16 atp2c2 comp221359_c0 HS74L RFX Rfx2 NIPSN Rfx2 Ciliome2 comp221363_c4 comp221363_c4 RAPH1 Rfx2 Rfx2 RAPH1 Rfx2 Rfx2 RFX RFX comp221378_c2 comp221391_c0 comp221393_c0 CND3 SPG7 LAMA 215 Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted tity Per- cent Iden- 70.43 1.53e-10165.33364.523 29356.68865.868 7.17e-49 064.375 6.96e-163 0 3.63e-158 1 182 461 71684.793 × 449 649 6.11e-133 0.118 0.507 0.463 0.493 37286.047 0.42 × × × 1.36e-132 × 39.485 × 1.06e-53 0.5 372 Basal Body 40.385 ×33.186 8.2e-133 186 1.15e-26 0.5 Axonemal, Basal Body, 40.288 417 ×43.35 5.98e-107 115 1 8.06e-39 Centrosome 40.162 × 32449.074 1.5e-103 154.658 8.27e-90 1 155 × 333 × 144.664 274 0.318 067.416 1.17e-75 0.682 4.01e-123 × × 47.287 783 × 6.04e-81 234 1 35368.451 3.89e-177 × 250 0.58536.559 1 1 501 × 0.415 × × 045.322 × 46.038 1 4.65e-94 2.87e-152 589 ×48.471 312 45265.517 Motility, Axonemal, Ciliary 7.79e-87 1 0 1 1 254 × × 597 × 1 Axonemal 1 × × DE Info ↑ ↓ ↓ ↓ ↑ ↑ ↓ ↓ ↑ → ↓ ↑ ↓ ↓ → ↑ ↑ ↓ ↓ ↑ ↑ ↑ ↑ → ↓ Name Component ID Local Gene (continued) comp221410_c5 RBL2A comp221437_c1comp221442_c1 KPYM comp221455_c0 CDKL1 RB11B comp221455_c0comp221467_c0 RB11B F92AA comp221476_c0 FBX5 comp221486_c2 APBB2 comp221502_c0 MAAT1 comp221521_c0comp221531_c4 MEC2 comp221554_c0 GAS8 F184A X X X thy Cil- iopa- action FOXJ1 Rfx2 FOXJ1 Rfx2 FOXJ1 Rfx2 FOXJ1 Rfx2 FOXJ1 RFX FOXJ1 FoxJ1 FoxJ1FOXJ1 RFX FOXJ1 FoxJ1 comp221521_c0FOXJ1 Rfx2 FoxJ1 MEC2 Table B.2.1: SysCilia Ciliopathy Survey Ciliopathy Survey Ciliome2 Proteome Ciliome Proteome Ciliome SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Ciliopathy 
Survey Ciliome2 comp221572_c0 TTC25 Localization Known Ciliary TF inter- basal body vesicle trafficking trafficking regulatory complex dynein assembly gas8 axoneme dynein Name maats1 CilDB Ciliome2 Cilia 7 6, 7, 8 Ensembl IDENSG00000079974 Source 1, 3, 4, 5ENSG00000067225ENSG00000095906 1ENSG00000101447 Gene 3, 6ENSG00000100490 rabl2b 3ENSG00000205111 7 3, 4, 5ENSG00000103769 nubp2 6, pkm 8 fam83d bbENSG00000185236 cdkl4 cdkl1 CilDB 6, 7ENSG00000153789 rab11a 3, 4, 5 centrosome ENSG00000116747 Ciliopathy SurveyENSG00000156509 3 CilDB rab11b 3, 4, 5 fam92b vesicle ENSG00000101194 Ciliome2ENSG00000141434 3 RFX Rfx2 4, 5ENSG00000163697 fbxo43 trove2ENSG00000163428 2ENSG00000183833 3 1, 2, 4, 5, slc17a9ENSG00000213533 mep1bENSG00000148175 3 Rfx2 1, 2, apbb2 7ENSG00000116218 lrrc58 RFX 4, 5ENSG00000141013 1, tmem110 2, 4, 5, ENSG00000111879 stom comp221437_c1 2, comp221437_c1 3, 4, comp221442_c1 5 nphs2ENSG00000250151 KPYM RFX CiliomeENSG00000204815 fam184a KPYM CDKL1 2 6, 7ENSG00000118777 RFX Rfx2ENSG00000124767 2 CilDB Ciliome2 Cilia ENSG00000136542 3 arpc4-ttll3 3 Rfx2 ttc25 RFX FoxJ1 axonemal abcg2 comp221475_c2 Rfx2 glo1 galnt5 comp221485_c4 comp221486_c2 Rfx2 RO60 S17A9 APBB2 RFX comp221497_c1 RFX comp221503_c0 LRC58 TM110 FoxJ1 FoxJ1 comp221556_c1 Rfx2 Rfx2 TTLL3 comp221595_c1 comp221606_c2 comp221607_c0 ABCG2 GALT5 LGUL × 43.13 2.83e-138 430 1 × 216 Centrosome, Other Organells Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted 70 0 1001 0.446 × tity Per- 67.5 1.22e-89 288 0.249 × Centrosome cent Iden- 45.706 2.45e-9585.271 3.65e-69 29070.997 237 1 0.106 0 × × 100636.163 Axonemal, Basal Body, 67.672 4.75e-100 0.448 8.53e-98 321 × 32564.508 0.28145.52 168.84 2.38e-18027.66 × × 0 3.67e-0855.14 520 Basal Body 54.474 0 54462.609 50.854.478 1.27e-35 128858.146 0 6.18e-31 1 0 0.4765.508 0.01248.625 1.35e-174 127 × 204031.466 × 125 × 1985 0 1 525 
0.50461.983 0.501 0 0.496 0.48735.857 × 0 783 × 54.521 3.17e-34 × × 53.96 2.2e-120 × 0 551 157.627 1.14e-130 584 140 1 × 423 58850.249 450 035.977 1 2.2e-118 × 0.485 1 1 0.51541.352 × 725 1 × 2.96e-79 372 × × 031.95 × × Basal 8.46e-60 0.661 Body, 0.339 Ciliary 256 579 Basal Body × × 209 Transport, Axonemal, 1 Basal 1 0.514 × × × Transport, Axonemal, Basal DE Info ↑ ↑ ↑ ↑ → → → → → → → → → ↓ ↓ ↑ ↓ ↑ ↓ ↑ → ↑ ↑ ↑ ↑ ↓ ↑ ↑ Name Component ID Local Gene (continued) comp221632_c2 KPC1 comp221634_c0 FBN2 ×comp221645_c1 43.602comp221645_c1 1.04e-36 FLOT1 150 FLOT1 comp221653_c6 1 × PTPRF comp221689_c0comp221694_c0 ZN423 MARK3 comp221709_c0 IFT81 comp221724_c0comp221727_c0 LRCC1 comp221728_c0 TCHP CC114 X X X X X X thy Cil- iopa- action FoxJ1FOXJ1 comp221617_c0FOXJ1 CC113 FOXJ1 FOXJ1 Rfx2 FoxJ1 RFX FOXJ1 RFX FOXJ1 Rfx2 FOXJ1 FOXJ1 FoxJ1 Table B.2.1: Proteome Ciliome SysCilia Ciliopathy Survey RFX Ciliopathy Survey RFX SysCilia CilDB Ciliopathy Survey Ciliome2 SysCilia CilDB Ciliopathy Survey Cilia Proteome Ciliome Localization Known Ciliarybb assem- bly/folding factor TF inter- satellite cilium ift ift-b dynein complex assembly ift81 basal body Name ssx2ip centriolar c19orf26 6 6, 7, 8 Ensembl IDENSG00000103021 Source 1, 2, 7ENSG00000181004 4, 5, 6, 8 Gene ccdc113ENSG00000166501ENSG00000154229 3ENSG00000136122 bbs12 3 4, 5 basal bodyENSG00000166147 bbs - ENSG00000049323 5ENSG00000119681 3ENSG00000130038 2ENSG00000135837 prkcb 2 prkca CilDB Ciliome2 4, Cilia 5, bora 6ENSG00000117155 2, 3, 4, 5, fbn1 ltbp1ENSG00000137312 efcab4b ltbp2ENSG00000235676 7 bbENSG00000077522 (centrosome) 3ENSG00000099625 2 Ciliopathy Survey 4, 5ENSG00000153707ENSG00000105426 3ENSG00000108813 2 abhd16aENSG00000064195 flot1 3ENSG00000108395 RFX 3 Cilia Proteome actn2ENSG00000082397 cbarp, 3ENSG00000140284 2ENSG00000102935 3 ptprd 6, 8ENSG00000007047 ptprs Rfx2ENSG00000167984 6 Rfx2 FoxJ1ENSG00000162105 5 RFX trim37ENSG00000251322 epb41l3 3 Ciliome2ENSG00000122970 slc27a2 2 RFX 1, 3, 4, Rfx2 5, nucleus 
nucleus FoxJ1ENSG00000137504 mark4 comp221632_c2ENSG00000133739 comp221634_c1 3 SysCilia Ciliopathy comp221632_c2 Survey nlrc3 Ciliome bb 4, shank2 5ENSG00000139437 shank3 4, KPC1 5ENSG00000105479 FBN2 comp221634_c1 Cilia Proteome KPC1 comp221634_c1 comp221642_c0 1, 2, 6, × 8 Rfx2 crebzf lrrcc1 32.722 Ciliopathy FBN2 Survey 6.77e-117 EFC4A FoxJ1 FBN2 ccdc114 × tchp FoxJ1 × cilium FoxJ1 axonemal 403 50.879 Rfx2 34.701 comp221645_c1 3.77e-150 comp221646_c0 0.189 Rfx2 comp221648_c1 0 Rfx2 Rfx2 490 FLOT1 × comp221653_c6 ABHGA Rfx2 comp221670_c0 1239 comp221653_c6 ACTN 0.23 PTPRD 0.581 E41L1 × comp221655_c0 RFX PTPRD Rfx2 comp221655_c0 comp221657_c4 × FoxJ1 comp221674_c2 DLLH TRI37 DLLH S27A2 Rfx2 RFX comp221702_c1 comp221707_c0 comp221707_c0 RFX SHAN3 CN16B SHAN3 comp221709_c0 IFT81 217 Centrosome, Other Organells Found Category Score EValue BitScore Weighted tity Per- 43.2 6.95e-23 98.6 0.115 × cent Iden- 32.747 9.1e-5744.048 2.15e-67 19842.92523.026 3.99e-23 239 0.48629.73 027.778 1.14e-12 × 0.309 7.33e-09 10443.939 Axonemal 534 × 5.67e-57 73.238.614 60.555.102 5.21e-167 Basal 1 Body 0.691 6.55e-75 207 × 1 × 51255.963 153.433 255 4.4e-76 × 34.595 5.95e-132 1 × 1.36e-27 145.411 0.28 × 259 Centrosome 39650.079 1.45e-4746.854 × 115 ×47.692 0.285 0.43554.925 4.09e-116 157 0 Basal Body,76.837 Centrosome 1.38e-125 × × 0 1 365 397 607 1 × 0 79846.05539.007 1.13e-135 × 134.363 2.58e-90 714 1 1 4.02e-13 45073.333 1 × 74.429 1.07e-148 × 306 × 73.9 1.47e-111 × 0.527 1 417 0.35875.49 × 34776.238 × 2.16e-10942.609 × 6.39e-108 1 0.29159.921 1.25e-36 0.242 1.41e-107 335 × × 33236.364 ×54.278 7.33e-56 128 Centrosome 315 0.234 8.03e-137 0.232 Axonemal,73.203 Basal Body, 39.186 × 5.33e-71 191 × 409 6.87e-113 1 1 241 × 380 × 1 1 × × 1 1 × × DE Info ↑ ↓ ↓ ↑ ↑ ↑ → → ↓ ↓ ↓ → ↓ ↓ ↑ ↓ ↓ ↑ ↓ ↓ ↓ ↑ ↓ ↓ ↓ ↓ → → → → ↑ ↑ Name Component ID Local Gene (continued) comp221728_c0comp221738_c2 CC114 comp221740_c3 PLK2 comp221787_c0 EFC12 CCD14 comp221854_c1 NOTUM comp221860_c0 ILFT1 comp221929_c0 EGFR 
comp221942_c2comp221947_c0 CCD66 PHB comp221957_c0comp221976_c0 FBX8 ADRO X X X X X X thy Cil- iopa- action RFX FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 Table B.2.1: CilDB Ciliopathy Survey Ciliome2 Ciliopathy SurveySysCilia Ciliopathy Survey RFX RFX Ciliome2Ciliopathy Survey RFX RFX Localization Known Ciliarydynein assembly TF inter- satellite centriole bb distal appendage satellite Name cep83 lmntd1 ccdc63 axonemal 1700007k13rik 6, 7 Ensembl IDENSG00000173093 Source 1, 3, 4, 5, ENSG00000142731 4, 5, 6ENSG00000173846 Gene ENSG00000172771 3 4, 5ENSG00000116786ENSG00000175455 3 plk4 4, 5, 6ENSG00000176769ENSG00000123358 bb (centrosome) 2 efcab12ENSG00000162843 3 plk3 CiliopathyENSG00000173588 Survey 2, 7 plekhm2 ccdc14 4, 5, 6, 8 centriolar ENSG00000186654ENSG00000185269 3 tcerg1lENSG00000152936 RFX 2 ccdc41, nr4a1 wdr64 4, 5, 7ENSG00000133027ENSG00000140451 3ENSG00000133816 3ENSG00000012211 3ENSG00000106346 3 prr5 notum ifltd1, ENSG00000066027 3ENSG00000160345 7 2, 4, 5, 7 pemt Ciliome2ENSG00000146648 mical2 pif1ENSG00000147439 prickle3 3 c9orf116, ENSG00000180376 3 usp42 ppp2r5a RFX 4, 5, 6ENSG00000167085 Rfx2ENSG00000173531 1 Rfx2 6 FoxJ1ENSG00000177981 ccdc66 egfrENSG00000170017 FoxJ1 7 bin3 centriolar ENSG00000107372 3 Rfx2ENSG00000164117 comp221738_c2 3 4, 5 phb comp221773_c1ENSG00000166881 mst1ENSG00000161513 comp221843_c0 3 basal body PLK2 4, 5 comp221801_c0ENSG00000205143 SNX29 asb8 Rfx2 FoxJ1 alcamENSG00000166444 WDR64 3 zfand5 comp221812_c1 fbxo8 3 Ciliopathy Survey TCRG1 tmem194a Rfx2 Rfx2 Rfx2 RARA CilDB fdxr Rfx2 comp221854_c1 × comp221854_c1 Rfx2 arid3c 36.982 Ciliome2 NOTUM st5 3.33e-61 NOTUM comp221866_c0 comp221878_c0 comp221868_c0 comp221889_c0 213 Rfx2 Rfx2 PEM2 comp221910_c2 MICA3 comp221914_c0 PRIC2 PIF1 1 UBP36 × comp221929_c0 2A5A comp221929_c0 Rfx2 Rfx2 RFX Rfx2 comp221947_c0 EGFR EGFR comp221947_c0 RFX PHB Rfx2 comp221947_c0 comp221952_c3 Rfx2 PHB comp221959_c1 ZFAN6 PHB T194A comp221980_c1 comp221990_c4 
DRI ST5 218 Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Other Organells Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Found Category Score EValue BitScore Weighted 50 1.75e-65 218 1 × Transport, Axonemal, Basal tity Per- cent Iden- 47.458 1.43e-53 19232.9434.22834.584 1.61e-102 130.588 0 3.52e-07 352 ×44.809 0 9.43e-46 811 Transport, 51.2 Axonemal, 0.169 Basal 32.338 91977.143 153 0.39 × 38.095 6.18e-1228.283 4.99e-17 1 0.441 0 × 0.15242.276 2.91e-10 67.8 1.29e-24 × × 71.6 × 856 60.1 10864.483 1 0.84855.446 2.4e-122 1 8.52e-72 1 × × × 152.799 385 × 56.891 241 × 36.364 8.02e-114 1.8e-76 0 0.30665.035 1 33448.404 2.96e-132 × 1.16e-43 × 278 546 41431.469 Transport, Axonemal, Basal 1 155 8.8e-13 0.69434.877 1 5.54e-62 × × 1 70.925.501 × 135.674 5.56e-47 216 × 30.501 1.12e-104 0.09 × 52.493 1.2e-61 0.27662.39 2.41e-123 173 324 × × 236 369 0.221 0.413 0 × × 1 5610 1 × × 1 × Transport, Axonemal, Basal DE Info ↑ → → → ↑ ↓ ↓ ↑ ↑ ↓ ↓ ↑ → ↑ ↑ ↓ ↑ ↑ ↑ ↓ ↓ ↓ ↓ ↓ ↓ ↑ Name Component ID Local Gene (continued) comp221993_c0 MIPT3 comp222009_c0 ABCA5 comp222018_c0comp222032_c2 UBX10 FBX36 comp222062_c0 AR13B comp222070_c4 TBC12 comp222084_c0comp222121_c1 CNTLN ZC21A comp222126_c2 S43A3 X X thy Cil- iopa- action RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1FOXJ1 FoxJ1 comp222126_c2FoxJ1 S43A3 comp222143_c1 DYHC2 Table B.2.1: SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome SysCilia Ciliopathy Survey Ciliopathy Survey Ciliome SysCilia CilDB Ciliopathy Survey Ciliome2 Localization Known Ciliarycilium ift cytoplasm cytoskeleton ift-b TF inter- cytosol lipidated protein transport ift-associated ift retrograde transport Name traf3ip1 basal body dync2h1 golgi axoneme 5, 6, 7, 8 8 Ensembl IDENSG00000204104 Source 1, 2, 3, 4, ENSG00000141338 Gene 4, 5ENSG00000160752ENSG00000154265 3ENSG00000162543 3 4, 5ENSG00000153832 abca8 2, 4, 5, 
7ENSG00000183090ENSG00000122378 3 fdps ubxn10 abca5ENSG00000088986 2 fbxo36ENSG00000197361 1ENSG00000083635 3ENSG00000169379 3 6, 8 fam213a frem3ENSG00000159733ENSG00000117280 3 fbxl22 6 nufip1 Ciliome2 arl13bENSG00000132405 basalENSG00000133740 body 2ENSG00000044459 2 zfyve28 4, 5ENSG00000143772 Cilia Proteome CiliomeENSG00000104427 rab29 3 RFX CilDB 3, 4, golgi 5 cilia base tbc1d14 RFX FoxJ1ENSG00000137691 1, Rfx2 2 cntln RFX Rfx2ENSG00000119242 zc2hc1a 2, 4, 5 itpkbENSG00000167703ENSG00000134802 c11orf70 3 comp222036_c0ENSG00000072195 2 Rfx2ENSG00000135605 ccdc92 3 comp222009_c0 comp222009_c0ENSG00000187240 2 1, 2, Rfx2 6, E75 7, ABCA5 Rfx2 ABCA5 slc43a2 slc43a3 comp222032_c2 speg CilDB Cilia Proteome Rfx2 comp222041_c3 tec comp222042_c0 FREM1 comp222050_c0 DNAL4 FoxJ1 FXL14 NUFP1 FoxJ1 RFX comp222067_c0 RFX Rfx2 comp222070_c4 LST2 comp222080_c6 RFX TBC12 comp222089_c1 E2F5 Rfx2 FoxJ1 IP3KB Rfx2 FoxJ1 comp222126_c2 comp222126_c2 comp222136_c0 S43A3 S43A3 comp222141_c2 OBSCN BTKL 219 Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Transition Zone, Other Organells Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted 50 5.25e-41 149 0.338 × Transport, Axonemal, Basal tity Per- cent Iden- 49.415 1.66e-141 45449.77 7.71e-14649.251 0.49981.622 1.81e-102 456 × 034.444 301 0.501 1.55e-11957.209 967 × 375 5.47e-162 1 49159.864 1 × 1 3.35e-127 0.411 × Transport, Basal52.143 Body × 38475.46 5.57e-105 × 1.99e-89 Basal Body, Centrosome, 0.321 320 Axonemal, Basal Body, × 27048.726 0.26885.286 5.95e-96 × 135.548 292 047.841 × 0.66238.525 1.22e-91 0 66236.082 6.08e-1740.789 × 7.24e-1228.816 5.27e-149 275 743 80.9 6.13e-45 65.9 1 45345.591 0.55139.669 9.92e-140 178 × 1 0.44954.185 1 1.49e-18 × 49.574 4.96e-83 Transport, × 432 Axonemal, × Basal 155.639 × 80.9 3.95e-52 1 × 247 030.103 × 1 166 1.2e-28 1 Basal58.148 738 Body × 1 3.43e-108 × 124 1 × 343 1 
× 1 × 1 × × Basal Body DE Info → → → ↓ ↑ → → → ↓ → → ↑ ↓ → ↓ ↓ ↑ ↑ → → ↓ → → → ↑ Name Component ID Local Gene (continued) comp222146_c0 CBPC2 comp222146_c0 CBPC2 comp222187_c0comp222191_c0 TTC12 NEK1 comp222191_c0 NEK1 comp222216_c0 TAD2B comp222219_c1comp222228_c0 KIF3A CT194 comp222258_c1 CSPP1 comp222300_c0comp222311_c0 MSRB2 comp222315_c0 K1407 PTPC1 comp222173_c11 ARL2 X X X X X X thy Cil- iopa- action FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 RFX FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FoxJ1 RFX FOXJ1 FoxJ1 FoxJ1 Table B.2.1: Ciliopathy Survey RFX SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Localization Known Ciliary TF inter- transport bb (centrosome) pericentriolar matrix axoneme nucleus bb subdistal appendages kif3a cilium ift-kinesin Name kiaa1407 8 Ensembl IDENSG00000165923 Source 2, 3, 4, 5ENSG00000146856 Gene 4, 5ENSG00000066933 agbl2ENSG00000213465 2 4, 5, 6ENSG00000149292 2, agbl3 4, 5, 8ENSG00000137601 myo9a 4, arl2 5, 6, 8 lipidated ttc12 protein ENSG00000197168 centrosome 3, 4, 5 nek1ENSG00000136098ENSG00000171497 basal 2 SysCilia body CiliomeENSG00000178053 7 6, 8 nek5ENSG00000173011ENSG00000131437 3 RFX RFX nek3 4, 5, 6, 7, ppid mlf1ENSG00000088854 4, cilium 5 axonemeENSG00000162461 RFX ENSG00000136158 3 tada2b SysCilia CiliopathyENSG00000164056 Survey 3 FoxJ1ENSG00000175764 3 c20orf194ENSG00000104218 2 2, 4, 5, 6 slc25a34 Ciliome2ENSG00000257594ENSG00000187556 3ENSG00000118939 spry2 comp222161_c0 2ENSG00000106397 cspp1 spry1 7ENSG00000148450 ttll11 3 bb MYO9A 2, 4, 5 RFX ENSG00000163617 galnt4 nanos3 2, 4, 5ENSG00000158079 uchl3 FoxJ1 msrb2 plod3 2, Ciliopathy 6, Survey 8 ccdc191, Rfx2 ptpdc1 comp222191_c0 comp222199_c1 RFX RFX bb Cilia Proteome NEK1 Rfx2 PPID comp222216_c0 Rfx2 Rfx2 SysCilia Ciliopathy Survey FoxJ1 TAD2B RFX comp222230_c0 Rfx2 FoxJ1 comp222237_c0 comp222237_c0 S2535 comp222254_c0 Rfx2 SPY2 SPY2 TTL11 comp222276_c1 comp222278_c0 NANO1 comp222298_c0 GALT5 comp222291_c0 PLOD3 
UCHL 220 Centrosome, Ciliary Membrane, Transition Zone, Other Organells Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Membrane Found Category Score EValue BitScore Weighted tity Per- 45.6 5.96e-101 325 0.181 × cent Iden- 77.326 1.29e-8568.595 290 1.96e-4967.14 0.118 18251.433 × 0 0.074 Axonemal, Basal Body, 065.051 × 67438.125 Ciliary Membrane 652 1.63e-25 0.275 078.68953.518 × 6.6e-67 0.266 1.46e-132 108 652 Regulation, Basal × Body, 211 38839.545 0.26641.39 3.85e-88 Regulation 153.258 6.98e-89 × 70.106 × 1 1 286 0 28223.81 × × 0 1.39e-15 1 Transport, Axonemal, Basal 89446.278 1 572 5.91e-150 78.2 × 0.499 × 0.319 50238.562 × 3.31e-61 1 × 33.314 1 × 218 051.768 × 1.57e-13550.464 1 Central Pair, Centrosome 62.667 939 8.96e-106 41749.569 2.89e-30 × 55.378 4.07e-133 337 1 110 394 1 0.754 054.173 × 0.246 × × 776 1 × 0 × 0.519 719 × 0.481 Axonemal, Ciliary × Signaling DE Info ↑ ↑ ↑ ↑ ↑ → ↑ ↑ ↑ → → → → ↑ ↑ ↓ ↑ ↑ → → → ↑ ↑ Name Component ID Local Gene (continued) comp222326_c0 RFX3 comp222326_c0comp222326_c0 RFX3 RFX3 comp222326_c0comp222336_c1 RFX3 comp222360_c1 MIIP TEKT2 comp222364_c1comp222365_c3 KCRM comp222379_c0 RBM19 SPEF2 comp222414_c0comp222422_c0 DEP1B DLEC1 comp222431_c0 F188B comp222458_c1 PKD2 comp222458_c1 PKD2 X X X X X X X thy Cil- iopa- action RFX FOXJ1 RFX FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 Rfx2 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey SysCiliaSysCilia CilDB Ciliopathy Survey CilDB Ciliopathy Survey Cilia Proteome Survey Ciliome2 Cilia Proteome Ciliome comp222326_c0 RFX3 Survey Ciliome2 Cilia Proteome Ciliome SysCilia Ciliopathy Survey Localization Known Ciliarycentriole cilia bb axoneme membrane TF inter- transcription factor factor membrane axoneme - signalling rfx3 nucleus rfx2 transcription tekt2 cilium axoneme SysCilia CilDB Ciliopathy dlec1 Ciliome2 RFX spef2 central pair SysCilia CilDB Ciliopathy 
Name cep41 basal body 8 8 5, 6 5, 6, 7, 8 5, 6, 7, 8 7 Ensembl IDENSG00000106477 Source 1, 4, 5, 6, ENSG00000064300 Gene 8ENSG00000080298 1, 4, 5, 6, ENSG00000087903 1, 2, 3, 4, ngfrENSG00000132005 4, ciliary 5ENSG00000116691 4, 5ENSG00000113761ENSG00000092850 3 1, 2, 3, 4, rfx1ENSG00000259207ENSG00000115221 miip 3ENSG00000066629 3 znf346ENSG00000166165 1 1, 4, 5ENSG00000103018ENSG00000204052 3 3, 4, 5 itgb3ENSG00000152582 itgb6 eml1 1, ckb 2, 3, 4, lrrc73ENSG00000035499 cyb5b 4, 5ENSG00000008226 2, 3, 4, 5, ENSG00000106125 depdc1b CilDB CilDB RFX 4, 5ENSG00000145390ENSG00000213145 RFX 3ENSG00000120647 2 Rfx2ENSG00000118762 5 fam188b 6, 8ENSG00000107593 usp53 Rfx2 6 RFX crip1 Rfx2 ccdc77 comp222354_c0 pkd2 ciliary RFX PACRG Rfx2 pkd2l1 comp222361_c1 ciliary comp222361_c2 signaling comp222364_c1 Ciliopathy Survey RFX ITB1 comp222364_c1 PAT3 EMAL1 EMAL1 RFX Rfx2 FoxJ1 RFX comp222438_c2 comp222438_c2 comp222456_c0 UBP54 RASF2 CCD77 221 Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted tity Per- cent Iden- 39.91448.649 1.52e-43 1.83e-1867.11129.863 1.87e-86 149 80.1 1.51e-2640.089 255 117 167.179 1 039.45 7.42e-98 × 42.938 × 1 6.17e-1758.586 3.32e-37 1 55852.486 1.45e-30 282 × 2.48e-64 × 79 126 122 Centrosome 71.059 1 196 173.381 1 × 1 × 1 056.923 1 × Basal 0 × Body, Centrosome, 9.9e-128 × 52.444 × 58627.054 6.25e-63 66072.857 4.47e-63 36838.462 6.38e-6352.389 4.64e-75 218 0.628 1 225 1 197 0.372 × 249 × 048.297 × × 1 174.131 621 1 0 × 1.86e-135 × 0.393 × 40.418 38329.457 960 3.82e-70 × 1.71e-09 0.60742.034 Transport, 220 Axonemal, Basal 64.646 1 59.7 2.34e-146 × 54.717 × 48393.909 1.65e-11 0.11 1 054.118 2.21e-140 Transport, Axonemal, Basal 38.725 1.31e-94 × × 0.89 67.464.762 3.48e-29 391 66232.224 × 6e-46 2.46e-100 298 117 1 331 0.718 1 1 146 0.282 × × × × × 1 Axonemal, 1 Ciliary × × DE Info ↓ 
↑ → → ↓ → → ↓ ↓ ↑ → ↑ ↑ ↑ → ↑ → ↓ ↓ ↑ ↑ ↑ ↑ → ↑ ↓ ↑ ↑ ↓ ↑ Name Component ID Local Gene (continued) comp222516_c2comp222517_c0 K0753 ORC1 comp222569_c3 TC1DB comp222602_c0comp222610_c1 WDR69 GHC1 comp222626_c0 RBL2 comp222626_c0comp222643_c0 RBL2 comp222658_c2 IDLC RGS22 comp222660_c2 STK39 X X X X X thy Cil- iopa- action FOXJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 RFX FOXJ1 FoxJ1 FOXJ1 FoxJ1 Table B.2.1: Ciliopathy SurveySysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Localization Known Ciliarysatellite TF inter- nucleus centriole cilium connecting cilium tz dynein dnali1 cilium axonemal Name tctex1d1 Ciliome2 Cilia Proteome RFX 7 6, 7, 8 Ensembl IDENSG00000181781 SourceENSG00000170788 1ENSG00000273079 7ENSG00000101421 2ENSG00000198920 7 Gene 4, 5, 6ENSG00000085840 6, 8 odf3l2 dydc1ENSG00000157916 kiaa0753 grin2bENSG00000171060 chmp4b 3ENSG00000133112 centriolar 2ENSG00000117036 7ENSG00000152760 3 orc1 2, 3, 4, 5, centrosome ENSG00000152684 CilDB c8orf74 rer1ENSG00000123977 3 Ciliome2 2, 4, tpt1 5 etv3ENSG00000177542 4, 5ENSG00000178104ENSG00000186952 3 daw1ENSG00000132819 2 peloENSG00000102904 2ENSG00000092200 slc25a22 2, 7 6, 8 pde4dip tmem232 FoxJ1ENSG00000103479 3, tsnaxip1 4, rbm38 5ENSG00000163879 1, comp222474_c0 2, basal 4, body 5, comp222502_c2ENSG00000162241 comp222478_c0ENSG00000133665 FoxJ1 3 rbl2 comp222477_c2 Rfx2 ODF3A 2, 4, CHM4B 5 NMDA1ENSG00000132554 Rfx2 Ciliome2 DYDC1 ENSG00000198648 × 3 6 29.784 slc25a45ENSG00000159239 dydc2 comp222533_c2 1.28e-88ENSG00000104388 7 comp222519_c0 RFX ENSG00000162032 7 Rfx2ENSG00000161692 3 311ENSG00000163528 RFX 3 MLF1 comp222562_c0 rgs22ENSG00000158220 comp222555_c0 3 RER1 stk39 1700003e16rik FoxJ1 3 Rfx2 FoxJ1 axoneme 1 ERG rab2a TCTP comp222570_c0 FoxJ1 spsb3 Ciliome × dbf4b chchd4 Ciliopathy Survey comp222625_c3 esyt3 PELO comp222617_c3 comp222610_c1 RFX comp222621_c0 TXIP1 TM232 GHC1 Ciliome2 RFX RB24A Rfx2 Rfx2 comp222653_c0 
Rfx2 MCATL Rfx2 Rfx2 comp222658_c2 Rfx2 comp222661_c1 comp222666_c0 RGS22 CB081 comp222670_c2 comp222670_c2 RAB2 comp222679_c0 comp222680_c0 SPSB3 SPSB3 MIA40 ESYT2 222 Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 27.688 3.29e-5051.95497.29729.252 2.66e-75 187 1.13e-12 0 21847.143 63.9 1 618 6.94e-65 ×39.674 1 5.82e-107 201 1 Basal86.869 Body 1 × 2.76e-109 × 37074.143 × 42.484 1 355 Transport, Axonemal, Basal 0 × 38.049 1 0.266 050.165 2.29e-15843.478 8.2e-103 × × 982 3.93e-33 480 61644.984 305 0.73469.811 1.5e-161 12063.136 4.18e-50 × 155.748 2.38e-98 132.039 498 1 × 171 × 1 3e-29 320 × 0 0.504 × 0.17356.566 0.324 × 4.55e-110 119 Transport,56.075 725 × Axonemal, Basal × 35968.47365.473 1 049.608 3.12e-171 1 0 × 1 × 491 657 035.068 × 516 0.48871.977 650 1 0.512 0 × × × 0 724 1 1332 × 1 Axonemal, Ciliary × 1 Axonemal, Basal × Body, DE Info ↑ ↓ ↓ → → ↑ ↑ ↑ ↑ → → ↑ → → → ↑ ↓ ↑ ↓ ↑ ↑ ↑ ↓ ↓ Name Component ID Local Gene (continued) comp222681_c1comp222685_c1 LRP1 × CEP63 comp222714_c2 42.131 0comp222720_c3 VHL comp222744_c0 598 EFCB1 comp222751_c0 0.175 AKP13 comp222764_c0 × AP3B2 CA222 comp222779_c0 SP17 comp222789_c0comp222790_c0 LAT2 comp222817_c0 M3K19 RCBT1 comp222825_c2 EFHC1 X X X X thy Cil- iopa- action FOXJ1 FOXJ1 FoxJ1 FoxJ1 FOXJ1 FOXJ1 FoxJ1 RFX FOXJ1 FoxJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey Ciliome2 CiliomeSysCilia Ciliopathy Survey RFX Ciliome2 Cilia Proteome Ciliome SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome SysCilia comp222829_c1 TPC10 Localization Known Ciliary TF inter- nucleus axoneme fibrous sheath axoneme centrosome efhc1 axoneme Name spa17 cilium sperm cfap74 7, 8 5, 6, 7, 8 Ensembl IDENSG00000165474 Source 4, 5ENSG00000168702ENSG00000182923 2ENSG00000057657 
6ENSG00000113163 Gene 3ENSG00000100410 3ENSG00000134086 3 gjb2 6, 8 lrp1b cep63ENSG00000034239 col4a3bp 2, bb 4, 5, 7ENSG00000104880 phf5a 2, vhl 3ENSG00000109794 efcab1 cytosol 4, cilium 5ENSG00000103723ENSG00000142609 Ciliopathy 3 Survey arhgef18 2, 4, 5, 7ENSG00000092295 fam149aENSG00000130958 2 c1orf222, ENSG00000064199 3 2, 4, ap3b2 5, 6, Ciliome2ENSG00000115504ENSG00000086232 3 RFX ENSG00000106351 3ENSG00000198162 tgm1 slc35d2 2ENSG00000140481 FoxJ1 3 2, 3, Rfx2 Rfx2 4, 5 RFX ehbp1ENSG00000176601 Rfx2 eif2ak1 4, 5 ccdc33ENSG00000136161 agfg2 man1a2 comp222681_c1 4, 5ENSG00000086061 comp222691_c0 comp222709_c0ENSG00000140403 1ENSG00000096093 map3k19 Rfx2 1 LRP1 comp222713_c0 1, 2, 3, PRDM1 4, RFX C43BP × rcbtb2 ×ENSG00000160218 37.69 PHF5A 56.18 8 Rfx2 dnaja4ENSG00000122729 5.84e-78 0 7 269 FoxJ1 Rfx2 trappc10 2816 basal body comp222751_c0 0.825 1 Rfx2 CilDB aco1 Rfx2 × CilDB AP3B2 × comp222773_c2 RFX FoxJ1 Rfx2 comp222775_c1 TGM1 S35D2 RFX comp222784_c0 comp222784_c0 comp222784_c0 RFX comp222788_c2 EHBP1 EHBP1 EHBP1 MA1A2 comp222823_c0 comp222823_c0 DNJA1 DNJA1 comp222849_c0 ACOC 223 Membrane Body, Centrosome, Ciliary Membrane, Other Organells Ciliary Membrane, Transition Zone Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- 41.2 9.08e-55 192 0.139 × cent Iden- 51.0146.15437.813 8.39e-68 0 1.79e-9061.136 207 562 288 0.26946.981 0.731 040.28 × × 128.729 0 855 4.11e-39 Transport, × Axonemal, Basal 063.175 80849.07 6.44e-138 150 1 788 5.78e-12439.216 401 2.89e-49 ×43.465 1 372 1 2.29e-86 1 Basal Body,39.423 Centrosome 184 × × 0.269 143.787 1.11e-16 × 40.288 1.27e-39 285 0.133 Axonemal, × Basal × 4.41e-27 Body, 84.3 0.20646.988 × 156 2.36e-18 112 0.061 × 0.11338.022 × 0.081 85.160.269 9e-98 × 40.351 × 0.21862.455 5.02e-56 306 0 × 208 079.476 0.782 Transition Zone 73629.966 3.29e-139 2.23e-32 × 105034.278 1 39358.144 5.17e-71 1 128 × 0.521 1 233 × 0 0.17 × × 0.309 × 618 Transport, Axonemal, Basal × 1 × DE 
Info ↑ ↑ ↑ ↑ ↓ ↑ ↑ → → → → → → → → ↑ ↑ ↓ ↓ ↑ ↑ ↑ ↑ ↑ Name Component ID Local Gene (continued) comp222857_c0comp222868_c0 SMO × 47.258 IFT74 comp222870_c0comp222876_c0 0 CI117 CEP76 587comp222883_c1comp222886_c1 HEAT6 1 F154B ×comp222899_c2 Axonemal, Ciliary comp222899_c2 ANXA7 ANXA7 comp222905_c1comp222921_c1 PCX4comp222921_c1 × T11L1 39.032 T11L1 comp222956_c0 0 IFT88 comp222961_c3 809comp222964_c1 TTC23 1 KATL2 × X X X X X X thy Cil- iopa- action RFX FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 RFX FOXJ1 FOXJ1 FoxJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome CiliomeCiliopathy Survey RFX RFX SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Localization Known Ciliarymembrane axoneme - ciliary signalling cilium ift ift-b TF inter- centriole duplication bb axoneme Ciliopathy Survey Ciliome2 RFX cilium ift ift-b ift74 basal body ift88 basal body saxo1 Name pcnx4 c9orf117 7, 8 5, 6, 7, 8 Ensembl IDENSG00000128602 Source 6, 8ENSG00000096872 Gene 1, 4, 5, 6, ENSG00000172339ENSG00000160401 smo 3 2, 4, ciliary 5ENSG00000101624 4, 5, 6 cfap157, ENSG00000171444ENSG00000068097 alg14 3 4, 5ENSG00000155875 cep76 4, 5, 6, 7 bb (centrosome) ENSG00000085491ENSG00000138279 3 fam154a, ENSG00000185043 heatr6 7 mccENSG00000174628 7 4, 5, 7ENSG00000182718 2, 3 slc25a24ENSG00000170234ENSG00000175105 3 anxa7ENSG00000166922 3ENSG00000126773 iqck 2 cib1 4, 5ENSG00000179029 anxa2 2, 4, 5, 6ENSG00000166046 znf654 Rfx2 4, 5 tmem107ENSG00000197763 pcnxl4, scg5ENSG00000136169 Ciliome2 7 tzENSG00000032742 3 1, 2, 3, 4, Cilia tcp11l2 ProteomeENSG00000140395 RFX comp222868_c0 Rfx2ENSG00000205838 3, 7 txnrd3 Ciliopathy 4, Survey 5 setdb2ENSG00000103852 ALG14 Ciliome CiliomeENSG00000167216 2 Rfx2 Rfx2 2, 3, 4, 5 wdr61 comp222877_c0 RFX RFX ttc23l katnal2 FoxJ1 CRCM Ciliome2 ttc23 comp222898_c4 Rfx2 comp222899_c2 Rfx2 SCMC2 ANXA7 comp222899_c2 comp222899_c2 comp222899_c2 ANXA7 
RFX ANXA7 comp222899_c2 ANXA7 Rfx2 ANXA7 Rfx2 comp222931_c0 RFX RFX comp222942_c1 TRXR3 FoxJ1 SETB1 comp222961_c3 WDR61 comp222961_c3 TTC23 224 Body, Centrosome, Ciliary Membrane, Other Organells Ciliary Membrane, Transition Zone Ciliary Membrane, Transition Zone Centrosome, Ciliary Membrane, Transition Zone, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 64.0549.289 3.96e-118 1.54e-5844.10332.946 2.83e-149 343 2.05e-71 213 44941.489 0.32243.077 1.08e-156 244 1 0.678 2.1e-3655.398 × × 495 × 1 12252.659 043.292 × 157.182 2.55e-132 0 1 66750.215 × 0 3.15e-84 42756.471 × 0.505 65532.138 4.91e-26 601 1.64e-93 252 ×31.198 0.495 1 1.27e-54 103 Transport, Axonemal, 310 0.71 Basal × 47.592 1 × 54.25 1.02e-118 197 0.29 × 0.611 × 2.09e-142 × 351 Regulation, 0.38953.171 Other × Organells 1.87e-143 411 × 55.268 414 0.498 1 × × 0.502 048.71836.741 5.58e-131 Axonemal, Basal × Body, 73.333 4.58e-125 556 3.42e-143 392 Axonemal, Basal Body, 384 41274.865 158.713 1 0.281 1 1.15e-163 × 050.313 × × 1.77e-171 × Axonemal, 464 Basal54.451 Body, 591 Transport, Axonemal, Basal 52337.445 0.316 1.58e-31 0.403 0 0.484 × × 130 × 557 Centrosome 0.516 1 × × DE Info ↓ ↓ ↓ → → ↑ → → → ↑ → → ↑ ↑ ↓ ↑ ↑ → ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↑ Name Component ID Local Gene (continued) comp223001_c1comp223004_c0 S2611 LRC43 comp223024_c0 PROF4 comp223042_c6comp223044_c0 CBPC5 comp223047_c6 FA58B comp223047_c6 LRRD1 LRRD1 comp223075_c0comp223075_c0 KAPR comp223078_c0 KAPR JADE3 comp223086_c0 ADHL comp223086_c0comp223092_c1 ADHL comp223092_c1 MABP1 MABP1 X X X X X X thy Cil- iopa- action FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FoxJ1FOXJ1 comp223086_c0FOXJ1 ADHL Table B.2.1: Ciliopathy Survey Ciliome2 SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey Ciliome Ciliopathy Survey Localization Known Ciliary TF inter- cytosolic? 
nucleus transition zone tz nucleus modification (polyglutamyla- tion) (centrosome) Name Ensembl IDENSG00000158104 SourceENSG00000177414 7 4, 5ENSG00000181045ENSG00000158113 3 Gene 3, 4, 5, 7ENSG00000154914ENSG00000176732 ube2u 3 hpd 4, 5 slc26a11ENSG00000092964 lrrc43 8ENSG00000072832ENSG00000178904 2 usp43ENSG00000084693 3 pfn4 4, 5ENSG00000262919 dpysl2 Ciliome2 6ENSG00000196526 ciliumENSG00000240720 3 crmp1 dpy19l3 4, 5ENSG00000188306 agbl5 2, 4, 5 fam58a SysCiliaENSG00000110721ENSG00000114302 cilia regulation 3 afap1 1, 6, 7 lrrd1ENSG00000005249 lrriq4 RFX Cilia Proteome 1, 6 Rfx2 RFX ENSG00000077684 prkar2a 6, chka 8 bb axoneme comp222970_c0 prkar2bENSG00000105229 Rfx2 FoxJ1ENSG00000125772 RFX CilDB 2 bb Ciliopathy axoneme Survey ENSG00000214021 comp223001_c1 3 HPPD phf17 6, 8 CilDB Ciliopathy basal Survey body S2611 ENSG00000197894 Rfx2 comp223032_c1 comp223018_c0 comp223032_c1 2, 7 RFX ENSG00000196616 pias4 gpcpd1 4, 5ENSG00000075702 ttll3 DPYS UBP31 DPYS 6 ciliumENSG00000137802 axoneme Rfx2 RFX 4, comp223039_c1 5 adh5ENSG00000074706 RFX adh1b 2 D19L3 wdr62 mapkbp1 Rfx2 comp223044_c0 basal body ipcef1 FA58B Ciliome2 Cilia Proteome comp223050_c4 FoxJ1 CHKA Rfx2 comp223082_c0 RFX comp223084_c0 RFX PIAS2 GPCP1 FoxJ1 comp223093_c3 CNKR2 225 Membrane, Other Organells Membrane, Transition Zone Body Body Found Category Score EValue BitScore Weighted 55 4.9e-83 265 1 × tity Per- cent Iden- 38.182 8.73e-1054.54546.802 60.145.098 3.24e-106 2.38e-81 0 0.06536.42 325 3.41e-171 × 61.035 256 54639.929 0.34936.376 52332.234 0.586 × 031.077 5.25e-76 1 0 7.86e-87 0 × 51.471 1 × 940 4.99e-136 274 593 291 × 76559.694 413 0.26444.915 9.26e-80 Basal41.406 Body, 1 Ciliary 0.73649.047 1 × 2.07e-29 160.751 236 × 0 × 141.121 2.34e-132 × × 53.429 4.19e-13 105 0 × 379 70759.552 1 0.081 73.6 Axonemal, 1196 Ciliary 038.558 × × 1.37e-168 0 139.819 1 0.919 101844.863 3.31e-59 1 505 × 8.63e-82 × × 787 × 187 144.619 25829.814 1 1.83e-114 × 63.315 1.38e-10 130.868 1.3e-175 × 1 341 2.62e-86 × 
150.303 60.1 × 3.14e-47 493 × 48.214 297 1 2.49e-4649.673 1 179 2.8e-116 × 162.299 177 × 1 0.25247.089 1.43e-167 × 52.568 355 × 0.249 × 1.85e-123 493 0 0.499 × Signaling, Axonemal, Basal 371 × Signaling, 1435 Axonemal, Basal 1 Regulation 1 × 1 × × DE Info → → → ↓ ↑ ↓ ↓ → → ↑ ↑ → ↑ → → → ↑ ↑ → ↓ → ↓ → ↓ ↓ → → → → → → ↑ ↑ Name Component ID Local Gene (continued) comp223095_c1 TTLL3 comp223097_c1comp223103_c5 ZN708 INTU comp223127_c0comp223132_c0 CC157 INP5E comp223273_c0comp223278_c0 S26A5 RNF32 comp223287_c0comp223301_c5 TMC5 comp223301_c5comp223301_c5 ZIC4 ZIC4 comp223362_c0 ZIC4 K0895 X X X X X thy Cil- iopa- action RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 FoxJ1 FOXJ1 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy SurveySysCilia Ciliopathy Survey Rfx2 Ciliopathy Survey Localization Known Ciliary TF inter- membrane bb axoneme - signalling axoneme (tip) axoneme (tip) factor Name cldn34 2 ankrd44 FoxJ1 comp223245_c0 ANR28 Ensembl IDENSG00000234469 Source 2, 4, 5ENSG00000135677ENSG00000118596 3 ac002365.1, ENSG00000196152 Gene 2 4, 5ENSG00000164066 4, 5, 6,ENSG00000110619 8ENSG00000130164 3ENSG00000198677 slc16a7 3 gnsENSG00000225663 znf79 3ENSG00000187860 3 intu 4, 5ENSG00000148384 plasma 4, 5, 6, 8 carsENSG00000131165 fam195bENSG00000131584 ldlr ttc37 3 ccdc157ENSG00000142684 3 inpp5eENSG00000170456 3ENSG00000198355 3 axoneme ENSG00000179023 3ENS- 3MUSG00000052331 chmp1aENSG00000105650 acap3ENSG00000131374 3 dennd5bENSG00000170615 3 4, 5ENSG00000214063 pim3 klhdc7aENSG00000105982 3 FoxJ1 2, Rfx2 3, 4, 5 RFX pde4c tbc1d5 slc26a5ENSG00000239264ENSG00000163646 3ENSG00000241685 rnf32 2 tspan4ENSG00000103534 Rfx2 3 comp223095_c1 2, comp223095_c1 3 Rfx2 Rfx2ENSG00000074047 Rfx2 RFX 3, 6, 8ENSG00000106571 txndc5 GNS 6, 8 GNS ENSG00000043355 clrn1 arpc1a comp223108_c0 6 tmc5ENSG00000137817 Rfx2 Ciliome comp223126_c0 comp223113_c0ENSG00000145147 comp223126_c0 2 Rfx2ENSG00000164542 3 Rfx2 Rfx2 SYCC 
ciliary 2, tip 4, 5 TTC37 VLDLR TTC37 Rfx2 Rfx2 ciliary tip zic2 comp223136_c1 kiaa0895 parp6 transcription comp223144_c4 Rfx2 comp223150_c0 Rfx2 RFX slit2 RFX comp223150_c0 CHM1A ACAP2 comp223200_c0 comp223203_c1 Rfx2 ZN593 DEN5B comp223249_c0 KELC PIM3 comp223257_c0 Rfx2 PDE4D FoxJ1 Rfx2 comp223276_c2 TBCD5 Rfx2 TSN9 comp223280_c2 comp223282_c0 comp223283_c0 TXND5 RFX FoxJ1 CLRN1 ARC1A Rfx2 comp223335_c0 comp223350_c0 PARP6 SLIT 226 Body, Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted 50 6.05e-67 223 0.215 × tity Per- cent Iden- 46.57457.89561.749 3.21e-11576.351 062.594 34365.997 0 1223 0 0 0 727 136.517 69067.626 1.75e-20 1 55346.814 3.68e-126 × 171158.238 3.22e-117 × 96.366.355 1 38944.231 1 4.28e-46 345 1 × 0 127.301 2e-04 × 26.923 3.32e-40 1 168 × × 158.073 1.19e-47 1536 1 43.5 7.8e-135 × Transport, × 155 Axonemal, Basal 54.957 × 175 142.857 1.07e-152 453 137.531 0.47 1 × 42.382 4.87e-173 0.53 503 × 0.474 2.43e-165 × × 042.382 × 51961.345 × 1.06e-163 0.526 49448.328 60262.529 1.07e-103 Regulation, 0.463 Centrosome × 49056.907 0.502 046.875 0.537 × 337 0.498 × 048.463 0 × 788 2.87e-111 × 049.901 577 1 6.96e-157 347 56727.446 602 1 × 6.07e-29 0.504 46547.881 0.335 0.496 × × 119 0.449 × × 1 059.778 × 59.645 × 29.176 1 833 9.95e-107 052.37 0 × 29.565 3.31e-144 35449.817 1.81e-28 565 1 567 465 117 × 0.499 0 1 0.501 Axonemal, × Ciliary × 1 × 878 1 × × 1 × DE Info → ↓ → → → ↑ → ↑ ↓ ↓ ↓ → ↑ ↑ → → → → ↑ ↑ ↓ ↑ → → → → → → → ↑ → → → → → → Name Component ID Local Gene (continued) comp223399_c0 IF172 comp223430_c0 ESPN comp223439_c0 TUT7 comp223451_c0 R10B2 comp223474_c0comp223475_c0 XYLT WEE1 comp223485_c0comp223499_c0 CF118 CCD39 comp223501_c4 HEAT4 X thy Cil- iopa- action RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 RFX FOXJ1 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey Ciliome2 Ciliome Ciliopathy Survey RFX SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome 
Ciliome Localization Known Ciliary TFcilium inter- ift ift-b factor centriole duplication axonemal dynein complex assembly ift172 basal body Name ccdc39 axoneme rsph10b2 5, 6, 7, 8 5, 6, 7, 8 Ensembl IDENSG00000074054 SourceENSG00000160917 1ENSG00000065833 3ENSG00000100412 2ENSG00000010219 3, 7ENSG00000138002 Gene 7 1, 2, 3, 4, clasp1ENSG00000163040 cpsf4ENSG00000070759 2 me1 aco2ENSG00000156110 2ENSG00000130816 dyrk4 7ENSG00000152785 3ENSG00000156381 3 4, 5 ccdc74aENSG00000166828 CilDBENSG00000213199 3ENSG00000118513 tesk2 2 4, 5, dnmt1 Cilia adk 6 Proteome Ciliome2ENSG00000134744 bmp3 ankrd9 Ciliome2ENSG00000137216 3ENSG00000165548 scnn1g 3ENSG00000169402 2 asic3 myb 4, 5, 7ENSG00000155026 FoxJ1 transcription ENSG00000106105 3ENSG00000069956 zcchc11 Rfx2 7 tmem63bENSG00000170525 3 tmem63c rsph10b, Rfx2ENSG00000123836 3ENSG00000015532 2 4, comp223391_c0 5ENSG00000263042 rsph10b 4, 5ENSG00000166483 comp223367_c1 comp223394_c1 gars mapk6ENSG00000171643 MAOX 3 FoxJ1 pfkfb3 comp223368_c1ENSG00000112539 2 pfkfb2 comp223395_c0 4, CLAP1 5 xylt2 FoxJ1 ACON ENSG00000145075 CPSF4 1, Rfx2 wee2 2, 3, 4, DYRK4 Rfx2 RFX ENSG00000151474 comp223401_c1 wee1 c6orf118ENSG00000138823 s100z Rfx2 3 Ciliome2ENSG00000187105 comp223404_c0 2 FoxJ1 4, 5, SPICE 7ENSG00000075651 Cilia Proteome comp223422_c0ENSG00000145911 3 TESK2 comp223428_c0 Rfx2ENSG00000125207 comp223412_c0 Rfx2 3 frmd4a FoxJ1 7 DNMT1 comp223433_c1 heatr4 BMP3B mttp comp223433_c1 ADK2 FoxJ1 Rfx2 ASIC5 pld1 n4bp3 comp223439_c0 ASIC5 comp223449_c0 Rfx2 comp223449_c0 piwil1 Rfx2 TM63B RFX TUT7 comp223470_c0 TM63B comp223451_c0 RFX comp223464_c0 comp223467_c1 Rfx2 RFX R10B2 F26 comp223470_c0 FoxJ1 SYG MK04 F26 comp223475_c0 comp223475_c0 Rfx2 RFX FoxJ1 WEE1 WEE1 Rfx2 Rfx2 comp223500_c1 comp223500_c1 FRM4A FRM4A comp223518_c2 comp223539_c1 comp223547_c2 LZTS2 PLD1 PIWL1 227 Membrane, Other Organells Organells Centrosome, Ciliary Membrane, Transition Zone, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Found 
Category Score EValue BitScore Weighted tity Per- cent Iden- 78.448 4.15e-5469.67442.02942.029 19342.13 0 5.04e-54 0 0.254 041.623 567 × 4.86e-152 172 65385.253 654 Basal Body, 0.746 Ciliary 45345.238 7.1e-134 0.104 4.8e-88 0.5 × 41.637 0.274 0.5 × 9.89e-62 379 × × 293 × Ciliary93.243 Membrane, Other 0.22944.444 1.66e-38 213 0.177 × 0.129 142 × 026.733 4.48e-05 × 0.08662.857 545 3.99e-99 45.4 × 80.874 1.14e-91 287 137.363 130.556 2.27e-12 265 × 1 × 1e-39 Axonemal, 0.81245.21 Basal 61.2 Body, × 1.66e-170 × 151 0.18863.81646.499 515 Transport, Axonemal, Basal × 1.36e-127 1 0 38581.25 1 × 7.25e-9274.486 558 × 80.392 Centrosome 52.324 1 3.95e-118 28435.379 029.718 3.33e-48 1 × 364 5.48e-105 0 0.2 × 771 163 0.257 349 × 611 0.543 × Basal Body 1 × 1 1 × × × DE Info ↓ ↓ ↓ ↓ ↑ ↑ ↑ ↑ ↑ ↑ → ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↓ ↓ ↓ ↑ ↓ → Name Component ID Local Gene (continued) comp223558_c1 Y1354 comp223573_c0comp223573_c0 RAB5C EFHB comp223573_c0comp223573_c0 EFHB EFHB comp223586_c0 FRITZ comp223593_c0comp223601_c1 NA comp223601_c2 CB5D1 ARL3 comp223603_c0comp223615_c2 CCD11 LRGUK comp223634_c2 CCD19 comp223645_c0 6PGD X X X X thy Cil- iopa- action FOXJ1 FoxJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 Rfx2 FOXJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 RFX FOXJ1 FoxJ1 FOXJ1 FoxJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey RFX Proteome Ciliome SysCilia CilDB Ciliopathy Survey Ciliome2 CilDB Ciliopathy Survey Ciliome2 CilDB Ciliome2 Cilia Proteome Localization Known Ciliary TFmembrane inter- endosomes membrane axoneme bbs bb axoneme protein transport centriolar satellite arl3 cytosol lipidated efhb CilDB Ciliome2 Ciliome RFX Name wdpcp cytosol plasma cfap53 cyb5d1 CilDB Ciliome2 Cilia ccdc19 cfap45, ccdc11, 7 8 7 7, 8 6, 7 5, 7 Ensembl IDENSG00000126603 Source 6, 8ENSG00000138430ENSG00000082684 7ENSG00000111785 3ENSG00000124839 Gene 2 6, 8 glis2ENSG00000163576 1, nucleus 2, 4, 5, sema5b ola1ENSG00000108774ENSG00000162755 ric8b 7 rab17 4, 5ENSG00000150773 plasma SysCilia 
Ciliopathy Survey 2, 4, 5ENSG00000068489ENSG00000143951 3 rab5c klhdc9 3, 4, 5, Ciliome2 6, pih1d2ENSG00000082497 4, 5ENSG00000182224 prr11 1, 2, 4, 5, ENSG00000138175 1, Ciliome2 4, sertad4 5, 6, ENSG00000196503ENSG00000172361 Rfx2 2 1, 2, 4, 5, FoxJ1ENSG00000155530 2, 4, 5ENSG00000204231 comp223558_c1ENSG00000213085 2 comp223570_c1 arl9 1, 2, 3, comp223570_c1 4, RFX Y1354 lrguk RFX SEM5A ENSG00000163611 SEM5A 4, 5, 6ENSG00000142657ENSG00000164188 7 rxrb comp223573_c0ENSG00000155506 Rfx2 3ENSG00000153904 3ENSG00000187609 2 spice1 RAB5C 3 RFX bb (centrosome) Ciliome Ciliopathy Survey ranbp3l pgd comp223573_c0 larp1 ddah1 Cilia Proteome exd3 RAB5C RFX RFX FoxJ1 FoxJ1 Ciliome2 Cilia Proteome Ciliome comp223601_c2 FoxJ1 comp223626_c0 ARL3 RXR Rfx2 comp223663_c4 Rfx2 Rfx2 DDAH1 comp223645_c0 comp223645_c0 comp223661_c0 6PGD 6PGD comp223664_c1 LARP1 MUT7 228 Centrosome, Other Organells Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Centrosome Found Category Score EValue BitScore Weighted tity Per- 42.5 4.12e-2852.4 120 2.3e-85 1 264 × 0.338 × cent Iden- 42.788 3.67e-5049.302 5.82e-66 16135.123 2.37e-67 227 123.20730.087 244 5.2e-09 × 1.94e-162 151.05452.375 Axonemal, Basal Body, 60.1 × 54977.545 1 087.126 0 × 1 0 1 1551 Axonemal,52.316 Basal Body, 1637 0 × × 538 0.48759.912 0.513 0 601 × 0.472 × 0.528 0 × 53.418 739 × 42.969 596 0 5.26e-169 165.015 0.46962.791 × 505 67549.476 6.94e-81 ×29.16 061.947 0.531 2.43e-12934.67 Transport, 1.28e-97 237 Axonemal, Basal 0 1 4.67e-52 × 98635.402 44145.841 × 1.05e-121 293 848 3.77e-159 1 197 Axonemal, Basal Body, 39058.491 1 1 × 466 0.336 1 1 × 0.664 × × × 046.122 × × 1 1.04e-7637.485 51643.211 1.79e-159 × 33.028 237 4.95e-08 0.662 487 0 × 56.2 1 699 1 × × 1 1 × × DE Info → ↑ → ↑ ↑ → → ↓ ↓ → ↑ ↑ ↑ → → → ↑ ↑ ↓ ↓ ↑ ↓ → → ↓ → → ↓ Name Component ID Local Gene (continued) comp223669_c1comp223674_c0 CH037 comp223678_c0 LRC56 TOPRS comp223693_c5comp223693_c5 WDR5 WDR5 
comp223699_c0comp223708_c0 NRDE2 × TTLL6 comp223708_c0 31.256 1.2e-135comp223718_c0 TTLL6 444 CE104 1comp223740_c0 × comp223750_c1 CYTSA comp223758_c0 YPC2 GMPAA comp223769_c0 TPGS2 X X X X thy Cil- iopa- action FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 RFX FOXJ1 Rfx2 RFX FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey FoxJ1 SysCilia CilDB Ciliopathy Survey Localization Known Ciliaryciliary root bbs bb TF inter- centrosome bb modification (polyglutamyla- tion) axoneme (tip) ttll13 Name centriole bb 7 rp23281f13.18 comp223688_c1 MYS2 Ensembl IDENSG00000156172 Source 4, 5, 6, 8ENSG00000161328 2, 4, 5 Gene c8orf37ENSG00000197579 basal body 6, 8ENSG00000189108 lrrc56ENS- 2MUSG00000110266 ENSG00000108846ENSG00000103222 topors 3ENSG00000196981 3 basal body 4, 5ENSG00000196363 il1rapl2 3, 4, 5ENSG00000140564 Cilia Proteome Ciliome abcc3ENSG00000119720 3 abcc1 wdr5b 4, 5ENSG00000170703 RFX wdr5 2, 6, 8ENSG00000213471 3, nrde2 furin 4, 5ENSG00000116198 ttll6 1, 4, 5,ENSG00000224455 6, cilium axoneme ENSG00000112245 3 ttll13p, ENSG00000142002 3ENSG00000188039 3ENSG00000142279 3, 7ENSG00000102804 3 FoxJ1 4, 5ENSG00000100014 vps52ENSG00000181085 3 ptp4a1 4, 5 Rfx2ENSG00000132530 dpp9 nwd1 Rfx2ENSG00000144591 RFX 2 tsc22d1 3, wtip 4, 5 comp223679_c0 RFX ENSG00000163689 specc1l mapk15ENSG00000134779 2 FGFR3 4, 5 comp223690_c0ENSG00000189337 gmppa RFX Rfx2 comp223690_c0ENSG00000070882 3 xaf1ENSG00000198948 2 3 MRP1 c3orf67 MRP1 tpgs2 comp223697_c1 kazn osbpl3 mfap3l FURIN Rfx2 Rfx2 Rfx2 Rfx2 RFX Rfx2 Rfx2 RFX comp223720_c1 comp223723_c0 comp223729_c1 RFX comp223733_c0 VPS52 FoxJ1 TP4A2 comp223736_c1 K1239 DPP9 FoxJ1 comp223740_c0 RFX AJUBA CYTSA comp223753_c2 Rfx2 FoxJ1 Rfx2 comp223758_c0 TRAD1 GMPAA comp223801_c0 comp223786_c1 comp223815_c2 OSBL6 KAZRN NRG 229 Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells 
Found Category Score EValue BitScore Weighted 44 8.31e-38 149 1 × tity Per- cent Iden- 32.36849.153 2.49e-8172.857 1.3e-1242.657 1.61e-11668.657 1.75e-97 275 6.82e-143 62.4 33448.25765.796 342 40257.041 144.138 1 0 1 3.12e-39 × 0 1 0 × 1 × 53275.885 143 × 539 × 135774.38 074.409 1 1 2.45e-5233.888 1 167.308 1.86e-112 × × 722 1.1e-177 × 0 183 × 35.944 359 Axonemal, Basal Body, 0.443 515 0.112 724 0 ×47.111 × 49.219 1 1.85e-50 0.444 Axonemal, 1.19e-26 Basal Body, 123.423 572 × × 9.27e-23 192 × 59.524 96.7 0.749 5.09e-56 0.25153.093 103 0.484 × 6.92e-7660.59 × 199 0.516 ×50.955 Axonemal, Basal Body, 33.951 240 0.097 × 2.18e-58 Basal Body, Centrosome 83.422 0 0 0.11739.657 × 5.27e-11040.669 211 1054 × 1.79e-175 320 565 0 0.512 54745.256 1 0.275 × 698 135.764 × × 4.46e-102 0 1 × 44.293 359 1.61e-168 1 × 772 × 513 1 1 × 1 × × Basal Body, Transition Zone DE Info ↑ → ↓ ↑ ↑ → ↑ ↓ ↑ → → → → → → → ↑ ↑ ↑ ↑ ↑ ↑ → ↓ ↑ → → ↑ ↑ ↑ Name Component ID Local Gene (continued) comp223850_c0 SURF4 comp223874_c1comp223876_c0 KTU TRC comp223881_c1comp223883_c0 PANK1 AZI1 comp223886_c0comp223886_c0 CBY1 comp223887_c2 F227B comp223887_c2 SSBP3 SSBP3 comp223894_c1 MTRAA comp223914_c0comp223923_c1 CCD15 comp223939_c1 NPHP4 FGD4 X X X X X thy Cil- iopa- action FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 Rfx2 FoxJ1 FOXJ1 Rfx2 Table B.2.1: SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey RFX SysCilia CilDB Ciliopathy Survey Cilia Proteome Localization Known Ciliary TF inter- axonemal dynein complex assembly basal body centrosome centriolar satellite appendage tz Name transition zone dnaaf2 cytosol cep131 8 8 Ensembl IDENSG00000196338 SourceENSG00000124145 2ENSG00000125656 3ENSG00000248905 3ENSG00000148248 3 Gene 4, 5ENSG00000071073ENSG00000170145 3ENSG00000173575 nlgn3 3ENSG00000165506 sdc4 3 2, clpp 4, 5, 6, fmn1 surf4ENSG00000211455 mgat4a 6, 8 sik2ENSG00000189409 chd2ENSG00000112079 Cilia 2 ProteomeENSG00000196967 
2ENSG00000125779 3 stk38l 4, 5ENSG00000141577 cytosol unclear 4, 5, mmp23b 6, 8 FoxJ1 SysCilia Ciliopathy Survey ENSG00000147689 stk38 znf585aENSG00000100211 pank2 3 4, 5, azi1, 6,ENSG00000184949 8 4, 5ENSG00000164123 Rfx2 comp223825_c0 4, Rfx2 5ENSG00000145687 Rfx2 fam83a cby1 RFX 4, 5 EST1C ENSG00000075711 centriole fam227a Rfx2 distal ENSG00000132535 3ENSG00000014914 2 c4orf45 Rfx2 comp223834_c0 Rfx2 4, 5ENSG00000071859 comp223835_c1ENSG00000156299 comp223836_c0 ssbp2 3ENSG00000153930 3ENSG00000149548 comp223859_c4 3 SDC CLPP 3, dlg1 mtmr11 4, FMN2 5 comp223862_c0 FoxJ1ENSG00000131697 comp223870_c4 MGT4B fam50a 1, 2, 3,ENSG00000127084 6, FoxJ1 Rfx2 tiam1 ccdc15 RFX 3, 4, ankfn1 5 SIK2 CHD2 ENSG00000181652 comp223876_c0 3 comp223876_c0 Cilia Proteome comp223878_c1 Rfx2 fgd3 TRC RFX ZN208 TRC atg9b RFX FoxJ1 RFX comp223883_c0 Rfx2 RFX AZI1 comp223887_c2 Rfx2 Rfx2 RFX Rfx2 DLG1 comp223887_c2 comp223897_c0 RFX DLG1 comp223899_c0 comp223905_c0 FAM50 Rfx2 ANKF1 SIF2 comp223945_c0 ATG9A 230 Membrane, Transition Zone Body, Centrosome, Ciliary Membrane, Other Organells Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 42.64535.512 054.21953.683 053.253 55052.431 027.789 0 558 1.68e-100 037.956 1041 036.879 1 1.26e-18 1068 0.209 350 5.86e-147 624 × 0.39 × 60259.333 0.801 87 484 0.466.187 1.87e-52 0.509 × Axonemal,45.545 1.24e-126 Ciliary 0.491 × 3.06e-84 × 0.199 × 76.735 176 372 × 151.266 × 1.15e-146 268 0.321 × 0.679 0 42732.357 × 0.24 × 50.754 847 × 1.28e-57 0 152.011 0.76 196 × 104654.839 × 62.611 8.77e-87 Transport, 0 Axonemal, Basal 45.273 7.9e-159 1 1 263 679 459 0 ×63.69 × 45.963 Regulation, Basal Body, 139.05 845 8.24e-97 1 162.71 0 2.74e-50 × 37.627 × × 55.385 321 1 639 186 0 055.214 × 067.062 1 151660.667 4.17e-161 1 1 739 2.39e-49 × 0 68842.084 0.672 × 45440.111 × 2.35e-163 0.328 4.82e-147 176 0.343 × 687 0.226 531 × 487 × 0.088 0.343 × 0.522 Transport, × 
Axonemal, Basal 0.478 × × × DE Info ↑ ↑ ↑ ↑ → → ↑ ↑ ↑ → → ↓ ↓ ↑ ↑ ↑ → → → ↑ → ↑ → ↑ ↑ → → → → → → Name Component ID Local Gene (continued) comp223949_c0comp223950_c0 LRC45 S4A10 comp223964_c0comp223965_c0 SFI1 EFCB5 comp223984_c0comp223993_c0 SYT7 TILB comp224014_c0comp224016_c1 FOXJ1 POMT2 comp224029_c1 WDR66 comp224036_c0 KTNA1 comp224051_c0 BBS1 comp224051_c0 MDHC X X X X thy Cil- iopa- action FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FoxJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 RFX FOXJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey Cilia Proteome Ciliome SysCilia Ciliopathy Survey RFX Proteome Ciliome SysCilia CilDB Ciliopathy Survey Localization Known Ciliary TF inter- dynein complex assembly transcription factor cilium bbs - ift-associated lrrc6 cilium axonemal foxj1 nucleus bbs1 basal body Name wdr66 CilDB Ciliome2 Cilia 6, 8 78 gm290965, 7 8 comp224009_c4 CC162 Ensembl IDENSG00000169683 Source 4, 5ENSG00000004939 3, 6ENSG00000013375ENSG00000033867 Gene 7ENSG00000148842 3ENSG00000158158 lrrc45 3ENSG00000198089 3 slc4a1 4, 5ENSG00000233198 axonemeENSG00000176927 3 pgm3 3, 4, slc4a7 5, 7 cnnm2ENSG00000143420 cnnm4ENSG00000151923 Ciliopathy Survey 7ENSG00000110975 3 efcab5 4, 5 rnf224ENSG00000166226ENSG00000129295 3 2, Rfx2 3, 4, 5, Ciliome2 ensaENS- syt10 tial1MUSG00000075225 ENSG00000129654 2, cct2 4, 5, 6, ENSG00000009830 RFX 4, 5ENSG00000129521ENSG00000152683 3ENSG00000158023 3 1, 2, Rfx2 3, 4, Rfx2 pomt2 Rfx2ENSG00000102781 RFX 4, 5 RFX ENSG00000129595 egln3 slc30a6 comp223950_c0 Rfx2ENSG00000167491 3ENSG00000101333 3 comp223950_c0ENSG00000149782 3 comp223952_c0 S4A10 ENSG00000174483 3 comp223952_c0 katnal1 1, 4, 5, 6, S4A10 RFX CNNM2 Rfx2 epb41l4aENSG00000256349 CNNM2 comp223964_c0ENSG00000014641 gatad2a 1ENSG00000055332 7 Rfx2 plcb4 4, 5ENSG00000151276 plcb3ENSG00000081026 SFI1 3 ctd-3074o7.11 2 comp223979_c1 comp223979_c1 eif2ak2 mdh1 comp223984_c0 TIAR RFX TIAR magi1 magi3 TCPB Rfx2 CilDB Rfx2 Ciliome2 RFX Rfx2 comp224019_c3 comp224023_c0 Rfx2 
Rfx2 EGLN1 Rfx2 ZNT6 comp224046_c1 comp224048_c2 RFX comp224049_c2 E41L5 comp224049_c2 comp224051_c0 Rfx2 P66A FoxJ1 PLCB4 comp224051_c0 PLCB4 BBS1 MDHC comp224054_c0 comp224054_c0 MAGI2 MAGI2 231 Body, Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted 80 2.56e-88 301 0.152 × tity Per- 68.5 9.47e-98 286 0.517 × 40.5 1.48e-71 254 1 × cent Iden- 37.33851.867 4.97e-64 4.18e-7946.995 5.25e-57 20724.351 255 2.53e-11 179 1 67.8 167.845 × 1 × 65.263 1 0 × 1.1e-9083.944 ×29.891 1681 5.27e-54 267 Transport, Axonemal, Basal 055.801 0.84835.714 194 0.483 3.35e-52 × 63257.343 × 0 0.236 4.22e-6865.789 201 Transport 51.744 4.18e-36 × 629 4.25e-75 1 203 0.381 123 × 0.764 0.385 × 24224.444 × 0.23343.603 × Centrosome 1.5e-13 3.41e-15843.168 × 127.778 75.1 471 1.66e-19 ×63.776 0 5.14e-91 Axonemal, 94.4 Ciliary 131.787 1 987 4.75e-62 266 × 61.73 × 1 0.548 219 Basal Body 1 ×62.338 0 × 0.45256.911 × Basal40.149 Body, 1.98e-42 Transition Zone 1041 7.25e-67 × 0 157 0.461 215 847 × 0.069 0.095 0.375 Transport, × Axonemal, Basal × × DE Info ↓ ↑ ↓ ↑ → → ↑ ↑ ↓ → → ↑ ↑ ↑ → ↑ → → ↓ ↑ ↑ ↑ ↑ ↑ ↑ ↑ Name Component ID Local Gene (continued) comp224075_c0comp224076_c0 COMD1 comp224090_c0 MDM1 comp224091_c0 STRUM comp224091_c0 U119B comp224096_c0 U119B MYCT comp224099_c3comp224099_c3 CNTRB comp224106_c1 TPPC1 MNS1 comp224122_c0comp224137_c0 CEP78 comp224142_c1 TECT2 comp224142_c1 KAD1 comp224146_c1 KAD1 IFT80 X X X X X X thy Cil- iopa- action FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey RFX CilDB Ciliopathy Survey Rfx2 Ciliopathy SurveySysCilia CilDB Ciliopathy RFX Survey Ciliome2 Cilia Proteome Ciliome SysCilia Ciliopathy Survey RFX SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Localization Known Ciliary TFcilium inter- bb and axoneme transport 
centriole axoneme (flagella) tz cilium ift ift-b ift80 basal body mns1 axoneme Name mdm1 centrosome 8 5, 6, 7, 8 5, 6, 7, 8 Ensembl IDENSG00000116455 SourceENSG00000115468 3ENSG00000166689 3ENSG00000173163 3 4, 5ENSG00000111554 Gene 3, 4, 5, 6, wdr77ENSG00000169583 efhd1 commd1 plekha7 4, 5ENSG00000164961ENSG00000175970 3 1, 4, 5ENSG00000109103 1, 3, 6ENSG00000087258 clic3ENSG00000197496 7 unc119b kiaa0196 2, 4, 5ENSG00000151229 unc119ENSG00000170037 3 lipidated protein 4, 5, 6ENSG00000170043 slc2a10 gnao1 4, 5ENSG00000186187ENSG00000138587 3 CilDB slc2a13 1, 2, 3, 4, daughter trappc1 Rfx2ENSG00000165891ENSG00000148602 Rfx2 RFX Rfx2 3ENSG00000148019 2 Ciliome2 znrf1 4, 5, 6ENSG00000052749ENSG00000168778 3 4, 5, RFX 6, comp224061_c4ENSG00000106992 8 RFX comp224065_c0 comp224068_c0 3, e2f7 4, Rfx2 5, 7 lrit1ENSG00000154027 MEP50 bb 4, tctn2 5 PKHA7 DYH7 ENSG00000068885 rrp12 transition zone × 1, 2, 3, ak1 4, RFX 41.606 comp224090_c0ENSG00000248710 2.81e-58ENSG00000115875 Ciliopathy Survey 1ENSG00000165182 ak5 Rfx2 3 STRUM 202 2 comp224092_c0 RFX rp11-432b6.3 RFX Ciliome2 1 GNAO Rfx2 srsf7 comp224096_c0 cxorf58 × MYCT Rfx2 CilDB FoxJ1 comp224099_c3 RFX Rfx2 TPPC1 comp224108_c0 comp224115_c1 RFX comp224123_c0 E2F8 GP125 RRP12 Rfx2 FoxJ1 comp224146_c1 IFT80 comp224146_c1 comp224146_c1 CX058 IFT80 232 Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted tity Per- 66.9 0 600 0.598 × Transport, Axonemal, Basal cent Iden- 31.69 4.22e-14652.48 46557.818 2.11e-12533.582 395 2.1e-14 1 052.515 × 34.093 3.8e-162 67.4 65442.986 1.82e-167 1 4.04e-49 470 × 51637.482 143.636 7.93e-145 1 160 3.05e-21 ×37.931 × 1 451 1 2.01e-8687.755 85.1 Axonemal, Basal 1 Body, × × 283 × 1 0 1 × 66.786 1 × 41.919 731 3.94e-13644.926 × 81.285 404 036.524 1 0 6.06e-91 0.402 030.981 × 177726.461 3.64e-86 1938 × 55.422 7.09e-58 283 1477 0.478 0.522 281 0.16126.098 0.839 209 × 0 × 1.6e-29 0.57358.667 × × 0.427 
1.51e-2942.921 681 × Axonemal, 124 Ciliary × 11355.835 032.231 1 0.102 1 1.06e-6848.288 × 0 99154.795 1.25e-170 × × 245 Basal Body 0.898 529 773 0 0.317 × 0.683 × 816 Basal 1 Body, Centrosome × × 1 × DE Info ↑ ↓ → ↑ ↓ → → → → ↑ → ↑ ↑ → → ↓ ↓ → → ↑ ↑ ↑ ↑ ↓ ↓ ↓ ↓ Name Component ID Local Gene (continued) comp224155_c0 AXDN1 comp224159_c0comp224172_c0 SPTC2 PIHD3 comp224187_c1 EVG1 comp224195_c1comp224199_c0 CC138 BBS4 comp224204_c2 FBXW9 comp224229_c0comp224231_c1 CBPC1 comp224244_c0 MTUS2 comp224244_c0 DCTN1 DCTN1 comp224260_c1 CTND2 X X X X thy Cil- iopa- action FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FoxJ1 FOXJ1 RFX FOXJ1 Rfx2 FOXJ1 FOXJ1 Rfx2 FOXJ1 FoxJ1 Table B.2.1: Ciliome2SysCilia CilDB Ciliopathy Survey FoxJ1 comp224194_c0 CE049 Ciliopathy Survey Ciliopathy Survey Localization Known Ciliary TF inter- cilium bbs - ift-associated centriolar satellite ift-associated? appendage basal foot bbs4 basal body Name axdnd1 Ciliome2 RFX 1700001l19rik 7 6, 8 Ensembl IDENSG00000162779 Source 2, 3, 4, 5, ENSG00000117859ENSG00000172296 Gene 3 4, 5ENSG00000080572 2, 4, 5, 6ENSG00000002549ENSG00000167202 7 osbpl9ENSG00000128346 sptlc3 3 pih1d3 2, 4, 5 cytoplasmENSG00000012779ENSG00000215217 7 2, c22orf23 7ENSG00000163006 tbc1d2b Ciliopathy Survey lap3 Ciliome 4, 5ENSG00000138071ENSG00000140463 RFX 7 1, c5orf49, 3, 4, 5, alox5 ccdc138ENSG00000163754ENSG00000198691 2 Ciliome2ENSG00000165029 actr2 7ENSG00000083312 3 8ENSG00000132004 4, 5 Rfx2ENSG00000099910 RFX ENSG00000185271 gyg1 3 abca4ENSG00000135049 2 abca1 3, 4, 5 tnpo1 Ciliome2ENSG00000132938 fbxw9 RFX axoneme Rfx2 4, 5 comp224157_c2ENSG00000214413 agtpbp1 klhl22 6ENSG00000204843 klhl33 SysCilia OSB11 6 Ciliome2ENSG00000144535 comp224173_c0 mtus2 RFX ENSG00000057294 comp224183_c1 3 2, 3ENSG00000169862 bbip1 AMPL ENSG00000111667 2 TBD2B bbs - 7 subdistal comp224190_c1 dis3l2 pkp2 comp224198_c0 FoxJ1 ctnnd2 AOSL Rfx2 usp5 ARP2 RFX comp224204_c2 comp224201_c4 comp224199_c0 RFX Rfx2 FoxJ1 Cilia Proteome TNPO1 comp224201_c4 
ABCA1 Cilia Proteome Ciliome BBS4 RFX Ciliome2 ABCA1 FoxJ1 comp224209_c0 comp224209_c0 Rfx2 KLHL9 KLHL9 Rfx2 comp224260_c1 CTND2 comp224257_c1 comp224275_c1 DI3L2 UBP5 233 Membrane Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 50.259 9.13e-5150.50540.359 2.79e-15633.552 18035.58 46479.904 2.79e-125 054.669 4.25e-117 0.28 0 0.72 433 1325 × 33446.261 1035 032.423 × 6.36e-161 Basal 0.155 Body,46.348 0.474 Ciliary 1.18e-4627.027 0.371 475 587 1 × 2.33e-08 × 56.102 165 × 057.477 × 0.447 0.55351.831 54.7 6.89e-158 3.29e-126 × 0 × 949 1 499 366 1 × 579 1 × 160.309 1 × 1.62e-77 1 × ×65.052 × 236 Transport, Axonemal, Basal 43.182 0 7.88e-159 172.34 1847 486 ×73.323 Transport, 0 Axonemal, Basal 1 122.013 070.406 1003 × 6.68e-09 × Transport, Axonemal, 1010 0.498 Basal 76.821 58.9 0 1.9e-83 × 0.50264.286 593 Transport, 1 Axonemal, 9.5e-18 × Basal 265 × 0.628 Transport, Axonemal, Basal 0.281 85.9 × × 0.091 Basal Body, Centrosome, × DE Info → → ↓ ↓ ↓ → ↑ ↑ → ↑ → → ↑ ↑ → ↑ ↑ ↑ ↑ ↓ → → → Name Component ID Local Gene (continued) comp224277_c5 OSBL1 comp224286_c0 BMAL1 comp224328_c1 LRAD2 comp224332_c1 XRP2 comp224355_c4comp224358_c1 LZTL1 comp224360_c0 WDR19 comp224361_c0 ARMC2 comp224361_c0 TT30A TT30A comp224381_c0comp224381_c0 SEPT7 SEPT7 X X X X X X thy Cil- iopa- action FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 Rfx2 FOXJ1 FoxJ1 RFX FOXJ1 FOXJ1 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey RFX Survey Ciliome2 Survey Survey Ciliome2 SysCilia Ciliopathy Survey Localization Known Ciliarymembrane bb TF inter- lipidated protein transport bb axoneme pericentriolar membrane cilium bbs - ift-associated 
ciliary base Name wdr19 cilium ift ift-a SysCilia CilDB Ciliopathy ttc30b cilium ift ift-b SysCilia CilDB Ciliopathy 6, 7, 8 7, 8 Ensembl IDENSG00000130545 Source 6, 8ENSG00000130703ENSG00000130827 3ENSG00000198753 3ENSG00000136040 Gene 3ENSG00000233087 2ENSG00000133794 7 crb3 3, 4, 5 ciliary osbpl2ENSG00000029153 plxna3ENSG00000138166 3 plxnb3ENSG00000163491 3 plxnc1ENSG00000187942 3 rab6a 4, 5ENSG00000197299ENSG00000107263 3ENSG00000102218 3 4, 5, dusp5 6, 8 nek10 ldlrad2ENSG00000079308 Ciliome2 rapgef1 blmENSG00000116044 rp2 3ENSG00000163818 3 basal body 4, 5, 6, 8ENSG00000157796 1, 3, 4, 5, Rfx2ENSG00000118690 tns1 lztfl1 Rfx2 2, 4, Rfx2 5 basal body FoxJ1ENSG00000197557 RFX 1, 6, 8ENSG00000196659 armc2 comp224277_c5 1, 4, comp224279_c0 5, 6, Rfx2 comp224279_c0ENSG00000198723 Rfx2 comp224279_c0 ttc30a comp224285_c2ENSG00000122545 Rfx2 OSBL1 RFX 7 PLXA4 cilium 6, ift 8 ift-b PLXA4 PLXA4 ENSG00000136918 RAB6A Rfx2 Rfx2 2, 4, SysCilia 5 CilDB Ciliopathy 1700019b03rik comp224286_c0ENSG00000120913 comp224301_c2 comp224316_c0 2 sept7 BMAL1 centrosome DUS10 wdr38 NEK10 comp224331_c3 comp224329_c0 Rfx2 Rfx2 Ciliome2 pdlim2 RPGF1 BLM RFX comp224341_c4 comp224341_c4 TENS TENS × × 60.204 60.204 9.22e-130 2.65e-134 412 408 comp224376_c0 RFX 0.502 0.498 CS045 FoxJ1 × × comp224381_c0 SEPT7 234 Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 42.857 7.17e-156 48045.67456.846 7.99e-11549.819 3.69e-9135.38 352 1 4.3e-4656.078 271 0 6.61e-75 × 1 157 102632.766 238 1 1.76e-131 × × 80.795 1 1 44426.407 5.78e-96 161.077 8.37e-08 × × 50.175 × 277 1 57.4 046.722 Centrosome 052.381 × 45.353 1 56633.981 4.63e-78 1 0 858 1.43e-52 × 0 × 66.667 237 822 1 9.63e-07 181 66940.86 147.709 × 4.35e-3152.55 51.6 × 1 142.694 1 1 Transport, × 0 134 Axonemal, Basal × × 42.461 0 1 × 034.589 5.42e-127 0.127 Basal Body 925 6.07e-88 × 1332 × 408 700 0.87342.623 30462.162 0.368 0.632 × 46.212 1.57e-50 148.296 0 × × 47.598 × 141.875 181 
043.65 1.32e-35 0 × 94637.113 0 2.03e-11 60334.831 132 1 660 0 2.56e-28 623 1 62.8 0.233 × 0.051 0.255 × 112 572 0.241 0.359 × × × × 0.641 × 0.221 × × DE Info ↑ ↓ ↓ → ↓ ↑ ↑ ↑ ↓ ↑ ↑ ↑ ↓ → ↑ → ↓ ↓ → ↑ ↑ ↑ ↑ ↑ ↓ ↓ ↓ ↓ ↓ ↑ ↑ Name Component ID Local Gene (continued) comp224402_c0 VWA3A comp224436_c1 TEX9 comp224437_c0 CI174 comp224469_c0comp224473_c1 PTHB1 UBP20 comp224496_c0 IQCC comp224506_c0 CE112 comp224511_c1 MYBPP comp224524_c0 BAIP2 X X thy Cil- iopa- action FOXJ1 Rfx2 FoxJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FoxJ1 RFX FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 Rfx2 FoxJ1 FOXJ1 Table B.2.1: Ciliopathy Survey Cilia Proteome Ciliome SysCilia CilDB Ciliopathy Survey Ciliome2 Localization Known Ciliary TF inter- satellite ift-associated tex9 centriolar bbs9 cilium bbs - Name vwa3a Ciliome2 RFX mycbpap Ciliome2 Ciliome RFX 7 6 7, 8 7 Ensembl IDENSG00000175267 Source 2, 3, 4, 5, ENSG00000135736ENSG00000091164 Gene 3ENSG00000009335 7ENSG00000002834 3ENSG00000105767 7ENSG00000151575 3 2, ccdc102a 3, 4, 5, ENSG00000197816 txnl1 ube3c 2, 4, 5, 7 lasp1 cadm4ENSG00000135930ENSG00000173517 3ENSG00000122870 ccdc180 3ENSG00000122507 2 1, 4, 5, 6, ENSG00000077254 Ciliome2ENSG00000003987 6 eif4e2ENSG00000168748 2 peak1ENSG00000143412 3ENSG00000182871 bicc1 3ENSG00000166183 Ciliome2 3 CiliomeENSG00000160051 2 4, 5 usp33ENSG00000198286 mtmr7ENSG00000198399 Rfx2 bb 3ENSG00000083067 2 ca7 anxa9ENSG00000154240 RFX col18a1 7 3, 4, Rfx2 5 aspgENSG00000157168 iqcc Rfx2ENSG00000136449 3 card11 Ciliopathy Survey comp224405_c0 2, 3, 4, 5, cep112 itsn2 comp224405_c1 trpm3ENSG00000090863 comp224413_c1 C102A ENSG00000120690 1ENSG00000162909 TXNL1 3 comp224435_c0ENSG00000135773 1 nrg1 Ciliome Rfx2 comp224430_c0ENSG00000014216 UBE3C 1, 7 Rfx2ENSG00000075142 1 HMCN2 FoxJ1ENSG00000203697 7ENSG00000166845 LASP1 × 2 Cilia Proteome Ciliome glg1 4, 5 24.916ENSG00000175866 capn2 1.63e-18 capn9 FoxJ1 2 comp224451_c0 FoxJ1 comp224454_c0 capn1 comp224462_c1 FoxJ1 Rfx2 Rfx2 88.2 Rfx2 c18orf54 capn8 sri IF4E2 PINK1 RFX BIC1B 
baiap2 comp224481_c0 1 Rfx2 CilDB comp224497_c1 comp224494_c0 CilDB RFX × CilDB comp224482_c0 Ciliome2 comp224485_c0 MTMR8 comp224486_c0 CilDB ITSN1 ASPG ANX12 CAH1 CO4A1 Ciliome2 Rfx2 × comp224497_c1 × 44.425 37.653 2.74e-154 1.83e-58 ITSN1 comp224499_c0 459 Rfx2 224 comp224506_c0 TRPM3 1 RFX comp224522_c1 FoxJ1 CE112 1 comp224519_c2 × × comp224522_c1 FoxJ1 comp224520_c0 CANB GSLG1 comp224522_c1 comp224522_c1 E74EB CANB comp224522_c1 CANB comp224524_c0 CANB CANB BAIP2 235 Body, Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Membrane, Transition Zone Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted tity Per- cent Iden- 70.71772.803 068.38 6.87e-129 1.64e-17460.241 391 796 5.19e-16239.147 58435.012 1.63e-45 0.329 0.671 527 9.23e-69 0.526 × ×41.457 174 0.47438.632 1.28e-160 × 246 Transport, 3.55e-107 Axonemal, Basal ×45.113 Axonemal, Ciliary 501 1 339 Axonemal, Ciliary 1 0 × × 37.097 1 1 832 × × 034.914 7.44e-3638.769 1 580 142 ×55.195 0.803 0 5.89e-108 Transport, 0.19755.056 Centrosome, × 1.83e-58 31771.795 846 × 46.339 189 0.626 058.276 1 × 0 0.374 3.48e-112 × 543 × 36150.172 850 Motility, Central Pair 34.146 1.79e-12 1 030.172 1 129.775 5.33e-26 × 28.108 70.5 2.39e-16 × 1031 × 49.244 2.35e-11 114 Axonemal, Ciliary 86.7 68.6 1 1 0 0.558 1 × 0.442 × × 661 × × 1 × DE Info ↑ ↑ → → → ↓ → ↓ → ↑ ↑ ↑ ↑ ↑ → ↑ ↑ → → → ↑ ↑ ↑ Name Component ID Local Gene (continued) comp224526_c0 TTC8 comp224530_c1comp224530_c1 WNK1 comp224535_c2 WNK1 comp224542_c0 RGS comp224553_c1 ZY11B ASAP1 comp224554_c0 ANMY1 comp224554_c0comp224558_c0 ANMY1 comp224567_c0 STK36 comp224567_c0 CD047 comp224586_c0 CD047 comp224587_c2 IQCH DAAF1 comp224589_c0comp224596_c0 SGSM1 GGT7 comp224608_c2 GIT2 X X X X X X thy Cil- iopa- action RFX FOXJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey SysCilia Ciliopathy Survey SysCilia 
Ciliopathy Survey RFX Ciliome2Ciliome SysCilia CilDB Ciliopathy RFX Survey Cilia Proteome Ciliome Localization Known Ciliarycilium ift bbs - ift-associated TF inter- network vesicle trafficking rhodopsin transport formation cilia orientation axonemal dynein complex assembly ttc8 basal body Name dnaaf1 axoneme c4orf47 8 5, 6, 8 Ensembl IDENSG00000165533 Source 1, 4, 5, 6, ENSG00000184305ENSG00000060237 3 Gene 3, 6ENSG00000126562 6ENSG00000137747ENSG00000159788 2 fam190a 3, 4, 5 wnk1ENSG00000008853ENSG00000162378 axoneme 3 wnk4 2, tmprss13 3ENSG00000153317 rgs12 axoneme 6, 8 Ciliopathy Survey rhobtb2ENSG00000144504 zyg11b Ciliopathy Survey 2, 3, 4, 5 asap1 Rfx2 ENSG00000139445 trans-golgi 4, ankmy1 5ENSG00000163482 4, 5, 6, 8ENSG00000205129 4, 5, 7ENSG00000197980 Rfx2 foxn4 stk36 4, 5ENSG00000112699 1700029j07rik, centralENSG00000103599 pair 7 2, 4, FoxJ1 5, 7ENSG00000154099 RFX comp224526_c0 1, 2, lekr1 3, 4, Rfx2 iqchENSG00000167037 gmds Rfx2 TTC8 comp224532_c0 4, 5ENSG00000167741 4, 5ENSG00000144730ENSG00000138771 PLMN 3 RFX ENSG00000164403 3 comp224536_c3ENSG00000139436 sgsm1 3 4, 5 Ciliome2 Cilia Ciliome2 Proteome ggt6 RHBT2 RFX shroom3 il17rd shroom1 git2 RFX comp224568_c0 RFX GMDS RFX Rfx2 Rfx2 Rfx2 RFX comp224599_c1 comp224597_c0 comp224599_c1 SHRM4 I17RD SHRM4 236 Ciliary Membrane, Transition Zone Centrosome, Transition Zone, Other Organells Body, Central Pair, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Found Category Score EValue BitScore Weighted 36 3.56e-143 446 1 × tity Per- cent Iden- 52.586 1.27e-3761.036 12758.077 9.93e-10628.571 0 8.67e-05 1 32333.088 1.56e-52 × 54141.087 44.3 0.87962.295 8.4e-118 Basal28.862 Body, 183 Centrosome 7.3e-23 × 0.121 3.71e-2456.031 1 379 × 92.4 102 × 147.852 0 1 × 1 1 1566 0 × 68.65543.86 × × 0.553 1264 5.46e-6728.686 0 8.72e-68 × 0.447 219 1383 Transport, Axonemal, 61.187 237 × 5.3e-89 Transport 1 1 1 27165.072 × × 8.39e-105 ×56.522 301 1 6.28e-70 Axonemal, Basal Body, 67.308 3.56e-131 × 0.33954.671 
21549.842 Transport, × Axonemal, 373 Basal 0.242 0 0 0.4230.986 × 26.421 7.4e-46 115044.963 × 4.45e-30 951 0.547 179 131 0.453 0 × × 1091 1 1 × × 1 Basal Body × DE Info ↑ ↓ → → → → ↓ ↑ → → ↓ → ↑ ↑ ↑ ↑ ↑ ↑ ↑ → ↑ → → Name Component ID Local Gene (continued) comp224611_c1 FOPNL comp224627_c0comp224627_c0 GLYG comp224633_c1 CC160 ATX10 comp224641_c1comp224645_c0 CC166 comp224645_c0 TT21B TT21B comp224652_c1comp224654_c0 NCKXH SDCG8 comp224656_c0 RSPH1 comp224657_c0 CAPSL comp224657_c0 MTAP comp224690_c0 MYOME X X X X X thy Cil- iopa- action RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 FoxJ1 FoxJ1FOXJ1 comp224657_c0 CAPSL Table B.2.1: SysCilia CilDB Ciliopathy Survey SysCilia Ciliopathy Survey Survey Ciliome2 CilDB Ciliopathy Survey RFX SysCilia Ciliopathy Survey SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Proteome Ciliome Ciliome2 FoxJ1 comp224669_c0 GNN × 30.221 2.03e-70 249 1 × Localization Known Ciliarysatellite bb centriolar satellite TF inter- localisation unstudied) centriole transition zone bb tz central pair radial spoke fopnl centriolar capsl CilDB Ciliome2 Cilia rsph1 cilium axoneme Name ttc21a ift-a (presumed bc030307 8 5, 6 6, 7, 8 7 2, 7 ttc41, Ensembl IDENSG00000133393 Source 1, 4, 5, 6, ENSG00000143870ENSG00000158423 Gene 3 1, 2, 3ENSG00000203952 4, 5ENSG00000130638 6, 8ENSG00000154447ENSG00000110987 ribc1 pdia6 3ENSG00000255181 ccdc160 3 4, 5ENSG00000123607 atxn10 1, 6, 7, 8 unclear ENSG00000168026 sh3rf1 1, ccdc166 2, 3, 4, bcl7a ttc21b CilDB Cilia ProteomeENSG00000074370 axoneme ift-aENSG00000185052 3 4, 5ENSG00000054282 SysCilia CilDB Ciliopathy Rfx2 6, 8ENSG00000160188 atp2a3 slc24a3 1, 2, 4, 5, basal Rfx2 body ENSG00000152611 RFX 1, 2, 4, 5, ENSG00000105519 1, 2ENSG00000099810 Rfx2 comp224625_c1 4, 5 RFX ENSG00000117139 Rfx2ENSG00000117614 3ENS- 3MUSG00000044937 PDIA6 capsENSG00000080854ENSG00000115207 3 mtapENSG00000136861 comp224638_c1 3ENSG00000114770 comp224639_c0 6 kdm5bENSG00000134243 3 
SH3R3 3 syf2 Rfx2 RFX BCL7A igsf9b CilDB Cilia Proteome bb (centrosome) abcc5 Ciliopathy sort1 Survey comp224647_c0 ATC1 RFX Rfx2 Rfx2 Rfx2 Rfx2 comp224661_c0 Rfx2 Rfx2 comp224661_c0 KDM5A comp224686_c0 KDM5A comp224687_c0 comp224704_c0 TUTL TF3C2 comp224709_c0 × MRP5 34.42 SORT 3.72e-158 496 1 × 237 Body Axonemal, Basal Body Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Centrosome, Ciliary Membrane, Transition Zone Found Category Score EValue BitScore Weighted 75 1.92e-179 505 0.534 × Basal Body, Centrosome tity Per- cent Iden- 47.30652.713 042.965 034.816 745 044.118 3.18e-67 1.52e-48 0.536 64661.798 87845.019 3.32e-74 238 × 157 0.46445.182 Signaling, 233 Axonemal, Basal 0 × 37.669 5.33e-113 1 129.808 1.64e-61 135.424 2.43e-40 × × 386 601 3.01e-40 1 × 43.448 22645.205 160 × 34.89 157 0.585 138.148 1 4.93e-77 0 0.41553.752 0 0.115 × 56.953 × × × 65.208 0 246 580 × 0 625 0 0.426 0 942 0.459 1 89760.843 × 95277.934 5.26e-153 × 61041.775 × 4.14e-127 0.48528.612 1 3.99e-104 0.515 441 1.38e-41 369 × × 335 × 1 0.46665.409 157 4.12e-160 × ×36.701 130.645 1 4.93e-99 Signaling, 462 Transport, Basal37.759 Body 1.18e-26 × 178.319 8.86e-67 × 63.324 2.65e-130 313 × 109 1 244 380 Transport,62.677 Axonemal, Basal 063.674 × 1 0.391 1 0.609 Basal 1819 Body, × 0 Transition Zone × × × 0 0.158 1387 1870 × 0.12 0.162 Axonemal, Central Pair, × × DE Info ↑ ↑ ↑ → → → ↑ ↓ → → → → → → ↓ → → ↑ ↑ ↑ ↓ → ↓ ↓ ↓ ↓ → → → → → Name Component ID Local Gene (continued) comp224720_c2 KIF19 comp224723_c0 DOP1 comp224730_c0comp224737_c1 ARF3 ANKR5 comp224768_c1 CTL4 comp224790_c0comp224798_c0 KIF17 comp224798_c0 POC1A comp224818_c0 POC1A GP161 comp224819_c0 PLK1 comp224838_c0 ANK2 comp224838_c0 ANK2 X X X X X X X thy Cil- iopa- action FOXJ1 FoxJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FoxJ1 FOXJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 FOXJ1 Rfx2 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey Ciliome2 Cilia Proteome Survey SysCilia Ciliopathy Survey SysCilia CilDB Ciliopathy 
Survey Ciliome2 Localization Known Ciliaryaxoneme (tip) TF inter- ift-kinesin bb central pair kif19 ciliary tip kif17 ciliary tip Name poc1a centriole bb SysCilia CilDB Ciliopathy akd1, ak9 Ciliome2 RFX 8 7, 8 8 7 Ensembl IDENSG00000196169 Source 2, 4, 5, 6, ENSG00000265681ENSG00000083097 7 Gene 3, 4, 5ENSG00000118217ENSG00000152213 3 4, 5ENSG00000242247 dopey1ENSG00000132623 rpl17 3 2, 4, 5ENSG00000198862ENSG00000126016 3 arl11ENSG00000114019 3ENSG00000213347 arfgap3 ankef1 3 4, 5ENSG00000129353ENSG00000204385 2 Ciliome2ENSG00000174226 2ENSG00000144554 3 ltn1ENSG00000138190 amot 3 amotl2ENSG00000144036 8ENSG00000117245 8 slc44a2 2, 4, 5, 6, slc44a4ENSG00000164087 snx31 1, fancd2 4, 5,ENSG00000139323 6, exoc6ENSG00000104522 RFX 3, exoc6b 6ENSG00000125637 7ENSG00000143147 3 6, 8 Rfx2 RFX ENSG00000166851 poc1b comp224720_c2 Rfx2 RFX 6, 8 bbENSG00000110328 tsta3ENSG00000101197 gpr161 SysCilia psd4 3 SysCiliaENSG00000139746 KIF19 3 cilium axonemeENSG00000188313 3ENSG00000144451 2 comp224729_c1 Rfx2 SysCilia Ciliopathy Rfx2 Survey Rfx2 1, 6, 7, plk1 8 comp224736_c4 RFX Ciliopathy SurveyENSG00000146386 transition galntl4 zone ATF6A ENSG00000155085 FoxJ1 7 FoxJ1 birc7 2, ARFG3 spag16 3, rbm26 4, 5, Rfx2 plscr1 axoneme Rfx2 comp224750_c2 comp224766_c1 Rfx2 comp224766_c1 comp224768_c1 abracl LTN1 comp224768_c1 AMOT AMOT comp224772_c0 CTL4 comp224781_c0 comp224783_c0 comp224783_c0 CTL4 Cilia Proteome SNX17 FACD2 EXC6B Rfx2 EXC6B FoxJ1 Rfx2 comp224801_c0 comp224799_c0 Rfx2 Rfx2 PSD3 comp224835_c1 FCL comp224822_c0 ALDH2 comp224832_c1 comp224835_c1 GALT5 BIRC2 RBM26 comp224838_c0 ANK2 238 Body, Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted 69 9.04e-40 157 0.014 × 25 3.07e-12 73.6 1 × tity Per- cent Iden- 60.59659.841 053.886 2.38e-61 1895 063.08177.037 0.164 219 192573.783 6.17e-59 1.89e-123 × 0 0.01933.891 0.16735.652 3.75e-29 221 40844.46 × 2.92e-41 × 
152456.034 0.01932.996 2.82e-37 118 0.035 0.13237.828 2.09e-66 152 × 46.972 0 1.41e-103 × 24.597 × 3.7e-162 138 0.01 9.56e-25 241 34661.212 545 1 × 0.202 4.26e-68 481 0.411 111 0.589 × 0.79854.941 × × 7.95e-79 224 × 47.093 × 154.93 1 0.212 257 × 60.87 0 × × 0.244 0 574 × 061.979 67653.333 1.26e-89 0.54440.502 3.65e-100 66043.223 2.9e-132 × 3.14e-76 261 1 29835.484 1.53e-141 405 1 × 24955.652 1 446 × 164.758 6.72e-4061.824 1 × 58.306 Transport, × Axonemal, Basal 1 6.26e-129 158 0 × 1 0 × 34.286 370 × 2.4e-64 910 1 701 Axonemal, Basal Body, × 241 1 1 1 × × 1 × × Basal Body, Centrosome, DE Info → → → → → → → → → ↓ ↓ ↓ ↓ → ↑ ↑ ↑ ↑ ↑ → ↓ → ↑ → ↑ → ↑ ↓ ↓ ↑ ↑ Name Component ID Local Gene (continued) comp224838_c0comp224838_c0 ANK2 comp224838_c0 ANK2 ANK2 comp224864_c0comp224866_c0 CD158 comp224866_c0 AMFR comp224872_c0 AMFR comp224877_c2 ZDH17 comp224890_c1 BBS7 TRIM9 comp224911_c0 BRE4 comp224935_c0comp224939_c0 TECR STIL X thy Cil- iopa- action FOXJ1 Rfx2 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome SysCilia comp224917_c2 TPPC9 Localization Known Ciliary TF inter- cilium bbs - ift-associated centrosome basal body Name 6, 7, 8 Ensembl IDENSG00000151150 Source 3, 4, 5ENSG00000145362 4, 5ENSG00000159214 Gene 2, 4, 5 ank3ENSG00000149294ENSG00000163701 3ENSG00000149532 3 ank2ENSG00000163348 ccdc24 3ENSG00000183475 3ENSG00000159212 3ENSG00000221955 2ENSG00000188038 3 ncam1ENSG00000187764 2 il17reENSG00000092421 3 cpsf7ENSG00000109586 3 pygo2ENSG00000163749 3 asb7 4, 5ENSG00000153347 slc12a8 clic6 3, 4, 5 sema4d nrn1lENSG00000154114 sema6a 4, ccdc158 5 galnt7ENSG00000159461ENSG00000186908 fam81b 2 RFX 4, 5ENSG00000138686 1, 2, Cilia 4, Proteome 5, RFX tbcelENSG00000164309 RFX 4, zdhhc17 5ENSG00000188042 amfrENSG00000050130 3ENSG00000138079 Rfx2 3ENSG00000158850 FoxJ1 2 Rfx2 4, 5 Rfx2ENSG00000167632 cmya5 Rfx2 8 Rfx2ENSG00000099204 Rfx2 arl4c 
jkampENSG00000013364 3 comp224838_c0 comp224848_c0 Ciliome slc3a1 b4galt3ENSG00000092820 Rfx2 FoxJ1 3, 7 comp224838_c0ENSG00000099797 Rfx2 2 comp224838_c0 trappc9 RFX 2, 4, comp224838_c0 Rfx2 5 ANK2 CLIC4 ENSG00000123473 basal RFX body comp224838_c0 ANK2 comp224849_c0 4, ablim1 5, ANK2 6, 8 ANK2 comp224849_c0 mvp comp224855_c2 ANK2 comp224855_c2 tecr FoxJ1 S12A8 RFX ezr comp224860_c0 S12A8 RFX SEM1A stil SEM1A centrosome bb GALT7 comp224866_c0 SysCilia Ciliopathy Survey RFX Cilia RFX Proteome Cilia Proteome AMFR Ciliome Rfx2 Rfx2 FoxJ1 RFX FoxJ1 RFX comp224893_c0 Rfx2 comp224906_c0 comp224908_c0 Rfx2 comp224933_c2 ARL4A JKAMP SLC31 RADI comp224925_c0 comp224927_c0 ABLM1 MVP 239 Centrosome, Ciliary Membrane, Other Organells Centrosome, Other Organells Found Category Score EValue BitScore Weighted 44 1e-44 14750 2.81e-43 1 × 171 1 × tity Per- 62.5 1.59e-32 123 0.139 × 46.5 8.99e-61 189 1 × cent Iden- 50.49530.495 5.93e-32 035.836 9.3e-37 13429.385 76442.446 7.1e-50 14246.869 1.42e-141 0.861 9.46e-163 163.394 184 426 × 48.414 × 471 152.041 4.85e-58 065.449 × 0 1 1 149.722 202 883 × × 0 57763.851 × 0.127 0 1394 1 × 052.326 1 1071 0.873 7.25e-45 × Basal Body, Centrosome 36.646 × 112833.735 4.46e-73 × 175 161.856 262 1 0 × 6.21e-8244.67 1 × Centrosome 1360 2.89e-5058.873 256 1 × 0.842 183 × 0.15838.747 0 × × 0 1 59027.85 × 2.56e-4336.513 548 1 0.763 170 × 0 × 0.237 Axonemal, Basal Body, 628 Axonemal, Basal Body, × 1 × DE Info ↑ ↑ ↑ ↑ ↑ ↑ ↑ → ↑ ↑ ↑ → ↑ ↓ ↓ ↑ → → ↑ ↑ → ↓ ↑ ↑ ↑ Name Component ID Local Gene (continued) comp224941_c0comp224941_c0 RBGP1 comp224945_c0 RBGP1 comp224946_c0 CCD87 BICR1 comp224952_c0 ALG6 comp224969_c0 NNTM comp224979_c1comp224994_c0 WDR67 comp224999_c1 MCM4B PSF3 comp225021_c1comp225021_c1 TTC40 comp225037_c1 PACRL NF2L1 comp225041_c0 ODFP2 comp225041_c0comp225043_c1 ODFP2 CC108 X X X thy Cil- iopa- action FOXJ1 FoxJ1 FoxJ1 RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 Table B.2.1: SysCilia 
Ciliopathy Survey RFX Ciliopathy Survey RFX Ciliome2SysCilia RFX SysCilia Ciliopathy Survey CilDB Ciliome2 Cilia Proteome Ciliome comp225039_c0 GCP3 Localization Known Ciliary TF inter- centriole bb distal end centriolar satellite centriole cytosol subdistal appendage and distal appendage ttc40 bicdl2 Name cfap65 tbc1d31 ccdc108, 5, 7 Ensembl IDENSG00000159708 Source 4, 5ENSG00000152061 2, 3ENSG00000182791 Gene 2, 3, 7ENSG00000162069 2, lrrc36 4, 5 rabgap1lENSG00000162510ENSG00000187239 ccdc87 2ENSG00000088035 3 ccdc64b, 4, 5ENSG00000099308ENSG00000106278 3ENSG00000103540 3 4, 5, matn1 6,ENSG00000112992 8 fnbp1ENSG00000162222 7 alg6ENSG00000156787 3 Ciliome2 4, 5, mast3 cp110, 6ENSG00000104738 ptprz1 4, 5ENSG00000181938 4, 5ENSG00000140320 wdr67, nntENSG00000170011 ttc9c 3ENSG00000119139 2 Rfx2 ENSG00000171811 mcm4 RFX 2 Rfx2 2, 4, 5, 7 gins3ENSG00000163138 4, 5 bahd1ENSG00000050344 cfap46, myrip 4, 5ENSG00000126216 Ciliome2 tjp2 8 FoxJ1ENSG00000136811 pacrgl Rfx2 6, 8 RFX Rfx2 tubgcp3 Rfx2ENSG00000122417 basal body comp224947_c0 4, 5ENSG00000181378 Cilia Proteome odf2 comp224948_c0 1, 2, 3, 4, Rfx2 CO6A6 basal body bb comp224955_c0 RFX FNBP1 comp224968_c0 odf2l RFX FoxJ1 MAST2 comp224969_c0 PTP99 Rfx2 comp224975_c3 FoxJ1 NNTM TTC9C comp225018_c0 RFX comp225001_c1 RFX comp225002_c0 ZO1 BAHD1 MYRIP RFX 240 Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Body Found Category Score EValue BitScore Weighted 40 1.15e-07 54.7 0.21 × 56 2.61e-138 418 0.561 × tity Per- cent Iden- 32.827 4.85e-39 15036.278 4.94e-49 0.45635.793 × 179 3.62e-9642.213 Basal Body, Centrosome 0.54439.08 9.68e-5651.522 306 5.07e-103 1.91e-142 ×43.651 181 8.64e-58 323 Axonemal, Basal Body, 419 148.461 206 1 × 1 1 × 073.529 0.79 × 52.128 4.41e-60 × 54.292 × 70239.557 7e-34 22941.208 0 1.01e-129 117 031.298 126.933 8.86e-99 1 436 954 0.10932.682 3.85e-44 × 95663.571 2.45e-98 × 42.484 × 326 0.313 0.891 Axonemal, Ciliary 41.324 5.62e-138 166 0.68763.462 4.67e-107 325 × 0 × 
0.39944.624 7.08e-163 × 406 0.203 2.24e-98 Signaling, 330 0.398 Axonemal, × 67.092 Basal 1302 472 Basal × 46.574 Body × 296 1 0 1 1 0 0.34345.125 1 × 1.15e-103 × × 568 × × 34.134 680 327 1.33e-83 0.65761.619 0.439 × 47.826 1 292 4.31e-132 × × 0 384 1 870 × 1 1 × × DE Info ↑ ↑ → ↓ ↑ ↑ → → ↑ ↑ ↓ ↓ ↑ ↑ → → → → → ↓ ↓ → → ↑ ↑ ↑ ↑ → ↑ Name Component ID Local Gene (continued) comp225047_c1 NINL comp225047_c1comp225051_c5 NINL NEK11 comp225057_c0comp225058_c0 TCTE1 comp225059_c0 RN219 EFHC2 comp225081_c0comp225081_c0 KIF27 KIF27 comp225104_c0comp225106_c0 HOME2 LRRC9 comp225122_c1comp225129_c0 UBP2 CCD57 X X X X thy Cil- iopa- action RFX FOXJ1 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FoxJ1 Table B.2.1: SysCilia Ciliopathy Survey Ciliopathy Survey SysCilia Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome SysCilia Ciliopathy Survey Localization Known Ciliarysubdistal appendage subdistal appendage centrosome TF inter- vesicle trafficking axoneme (tip) efhc2 axoneme CilDB Ciliopathy Survey Name 5, 6, 7 Ensembl IDENSG00000100503 Source 4, 5, 6, 8ENSG00000101004 Gene 6, 8 ninENSG00000114670 centriole bb 4, 5ENSG00000169926ENSG00000049283 3ENSG00000146221 ninl 2 4, 5, 7 basalENSG00000152193 body nek11 4, 5ENSG00000189050ENSG00000183690 3 1, 2, 3, tcte1 4, epn3ENSG00000079432 rnf219ENSG00000023572 3ENSG00000109436 7ENSG00000165115 rnft1 3 4, 5, 6,ENSG00000166813 8 6, 8ENSG00000090530 Ciliome2ENSG00000141696 3ENSG00000110811 cic glrx2 3 kif27 tbc1d9ENSG00000090861 3ENSG00000146233 bb 7ENSG00000114796 2ENSG00000147596 kif7 2ENSG00000103942 3 leprel1 ciliary tip RFX 4, leprel4 5 RFX ENSG00000143437 leprel2ENSG00000131951 3 SysCilia Ciliopathy cyp39a1 Rfx2 Survey aars 2, Ciliome2 3, 4, 5 FoxJ1 klhl24 prdm14 RFX homer2ENSG00000143258 RFX 4, 5ENSG00000036672 lrrc9ENSG00000176155 Rfx2 3 arnt comp225054_c0 2, comp225055_c0 4, 5ENSG00000163655 Ciliome2ENSG00000105963 usp21 7 KLF13 7 EPN2 Rfx2 ccdc57 Rfx2 comp225058_c0 usp2 Ciliome RN219 gmps Rfx2 
comp225070_c0 adap1 Rfx2 comp225061_c0 comp225070_c0 Rfx2 FoxJ1 GLRX2 TBCD9 FoxJ1 Rfx2 CIC RFX RFX comp225083_c0 comp225083_c0 Ciliome2 comp225083_c0 comp225092_c1 comp225097_c0 Rfx2 P3H1 comp225100_c0 P3H1 comp225102_c1 P3H1 CP39A SYAC RFX KLH24 RFX PRD14 Rfx2 comp225104_c0 ARNT comp225122_c1 comp225132_c1 UBP2 GUAA comp225133_c0 ADAP1 241 Centrosome, Ciliary Membrane, Transition Zone, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 46.21745.38370.35423.735 3.95e-118 068.603 0.000897 049.883 33754.268 1.08e-120 43.1 62251.718 6.05e-54 0 62442.131 36733.582 7.88e-107 0.499 1 3.08e-19 198 0.501 0 827 136.749 0.264 × 321 × × 0.142 93.2 × 0.594 × 57344.054 044.608 × × 140.51642.857 1 0 74735.024 1 × 0 1.2e-81 × 0 × 0 70254.902 71427.429 3.6e-127 1 287 635 5.98e-62 0.255 69728.372 × 0.26 1.78e-67 372 0.231 × 46.043 238 0.254 1 6.49e-32 × × 44.485 256 × 0.48244.299 1.4e-126 × 156.569 1.6e-126 119 0.51835.088 × 2.48e-94 × 409 1.1e-42 × 414 Basal Body 43.413 317 1 0.497 6.02e-93 Basal Body 153 0.503 × 47.801 × 50.299 × 302 153.791 1 0 × 0.177 0 × 076.386 × 69641.065 709 1460 0.408 045.37 0.415 0 × 1730 1 × 997 0 × 1 Axonemal, 983 Basal Body, 1 × × 1 × DE Info → → → ↑ → → → → ↓ → ↑ ↑ ↑ ↑ ↑ ↑ → ↑ ↑ ↑ ↑ ↑ ↑ → ↓ ↓ ↓ → ↑ ↑ ↑ Name Component ID Local Gene (continued) comp225170_c2 VWA3B comp225177_c1 YC006 comp225194_c1comp225194_c1 PCNT comp225201_c0 PCNT IQCD comp225216_c0comp225224_c0 B3GT1 ACSL1 comp225228_c0 PI3R4 comp225230_c0 ANKAR X X X thy Cil- iopa- action FOXJ1 Rfx2 FOXJ1 FoxJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 Rfx2 FoxJ1 FOXJ1 FoxJ1 Table B.2.1: Proteome Ciliome Cilia Proteome Ciliome Localization Known Ciliary TF inter- Name 2 plk1s1 FoxJ1 comp225165_c0 KIZ Ensembl IDENSG00000148341 SourceENSG00000142687 3ENSG00000004455 3ENSG00000149506 7ENSG00000105220 2ENSG00000197381 Gene 7ENSG00000060491 3ENSG00000198719 sh3glb2 3 kiaa0319lENSG00000113369 3ENS- 3MUSG00000074749 ak2ENSG00000168658 zp1 3, gpi1 adarb1 4, 5, 7ENSG00000154928 
ogfrENSG00000116106 3 arrdc3 dll1ENSG00000142627 3 vwa3bENSG00000182580 3ENSG00000187695 3 2, 4, 5 Ciliome2ENSG00000100811 Ciliome2 ephb1ENSG00000127914 3 epha4 rp11-723o4.6 2, 6 epha2ENSG00000160299 ephb3 4, 5, 6ENSG00000166578 4, 5, 7ENSG00000101577 Rfx2 Rfx2ENSG00000134324 akap9 3, 5 yy1ENSG00000143514 2 bbENSG00000176597 (centrosome) 2 3, FoxJ1 4, Ciliopathy 5 bb Survey Cilia iqcd Rfx2ENSG00000184154 comp225136_c0 comp225136_c0 1, Rfx2 2, lpin2 3 Rfx2 Rfx2ENSG00000164398 tp53bp2 comp225137_c0 b3gnt5 lpin1 RFX ENSG00000151726 1 K319L comp225158_c0 K319L ENSG00000196455 comp225149_c0 3 Ciliopathy Survey 6 comp225158_c0 lrtomt KAD2 comp225158_c0 Ciliome2 Rfx2 NETR ENSG00000005007 RFX G6PI comp225163_c1 comp225161_c0 Rfx2ENSG00000151687 RED1 3 Rfx2 acsl6 RFX 2, 4, 5, Rfx2 7 acsl1 Cilia Proteome G6PI ENSG00000115183 pik3r4 ARRD3 DLL1 3 axoneme golgi comp225174_c0 CilDB ankar Cilia Proteome Rfx2 comp225174_c0 Ciliopathy RFX Survey upf1 comp225174_c0 comp225174_c0 FoxJ1 EPA4A EPA4A CilDB tanc1 EPA4A EPA4A RFX Rfx2 comp225180_c0 RFX FoxJ1 comp225214_c0 Ciliome2 TYY1 comp225202_c0 ASPP1 comp225202_c0 LPIN2 Rfx2 RFX LPIN2 Rfx2 comp225224_c0 comp225224_c0 Rfx2 ACSL1 ACSL1 comp225229_c1 RENT1 comp225235_c0 TANC2 242 Centrosome, Transition Zone, Other Organells Membrane, Transition Zone Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 43.865 6.53e-7631.636 25171.979 1.62e-82 299 1 059.53347.333 × 4003 1 041.398 081.897 1.36e-34 0.553 × 1.66e-71 324033.19 × 610 143 4.8e-12045.538 0.447 211 × 37229.004 1 1 0 3.06e-63 1 × × 61.389 × 1 643 2.75e-137 238 Axonemal, Basal62.319 Body, 30.999 Transport 3.62e-20 × 423 3.1e-9339.362 1 1 94.4 1.81e-2852.296 × 312 × 132.549 1.35e-137 2.24e-97 124 Basal Body,36.731 Centrosome × 1 397 128.311 332 Axonemal, × Ciliary 52.279 0.000331 1 0 × 138.095 × 45.4 3.4e-35 1 0 683 × 0.062 × 34.168 0.938 14331.009 776 × 4.12e-94 ×81.18 0.073 081.127 Regulation, 1 324 
Motility ×83.944 1827 0 × 0 Transport, Axonemal, Basal 0.927 1 0 Basal Body 606 616 × × 631 0.327 0.332 Centrosome × 0.341 × Motility × Motility DE Info ↑ ↑ ↑ ↑ ↑ ↓ ↑ ↑ ↑ ↑ ↓ → → ↑ → ↑ ↓ ↓ ↑ ↑ ↑ ↑ ↓ ↓ ↓ Name Component ID Local Gene (continued) comp225237_c0 CCD96 comp225248_c0 DYH7 comp225268_c0comp225272_c2 AHI1 comp225275_c0 PDE6D comp225280_c0 CHFR comp225285_c0 PIBF1 comp225293_c0 LRIQ1 comp225295_c0 TBX3 comp225298_c0 RHG12 comp225306_c1 DLGP2 comp225307_c0 CC171 comp225308_c0 PTK7 comp225310_c0 K0556 LCA5 comp225315_c0comp225321_c0comp225321_c0 CE128 GNAI GNAI X X X X X X X X X thy Cil- iopa- action RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 FOXJ1 Table B.2.1: Ciliome CilDB Ciliome2 Ciliome RFX SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey RFX Ciliopathy Survey SysCilia Ciliopathy Survey Ciliome2 Ciliopathy SurveyCiliopathy Survey RFX Localization Known Ciliary TF inter- transport matrix bb (centrosome) cilium motility cilium axoneme tz (connecting cilium) appendage orientation basal body Name ccdc96 Ciliome2 Cilia Proteome dnah7c, dnah7b, 7 5, 7 8 Ensembl IDENSG00000173013 Source 2, 3, 4, 5, ENSG00000186635ENSG00000118997 Gene 2 1, 2, 3, 4, ENSG00000174844ENSG00000135541 7 3, 6, 8 arap1ENSG00000162607ENSG00000156973 3 6, 8ENSG00000072609 dnah12 ahi1 4, 5ENSG00000083535 basal body 4, tz 5, 6, 8ENSG00000133640 pde6d usp1 SysCilia 2, Ciliopathy 4, Survey 5 lipidated protein ENSG00000135111 chfr Rfx2 6ENSG00000165030 pericentriolar ENSG00000159314 3 lrriq1 4, 5ENSG00000080845 4, 5ENSG00000141985ENSG00000164989 3 arhgap27 FoxJ1 4, 5ENSG00000112655 axoneme 6ENSG00000156860 dlgap4ENSG00000047578 3 ccdc171 4, Ciliopathy 5, sh3gl1 Survey 6ENSG00000135338 comp225239_c0 4, 5, 6, 7, Rfx2 ptk7 kiaa0556ENSG00000165164 ARAP1 ENSG00000100629 cell bb 7 polarity fbrs RFX 4, 5, 6ENSG00000114353 comp225248_c0ENSG00000065135 6 6ENSG00000127955 RFX comp225270_c0 DYH7 gm7173 cep128 7 Ciliopathy Survey subdistal UBP1 RFX 
gnai2 gnai3 Rfx2 ciliary motility RFX ciliary RFX gnai1 Ciliopathy Survey RFX Rfx2 comp225294_c0 Rfx2 NFIL3 Ciliome2 comp225304_c0 SH3G3 comp225307_c0 AUTS2 comp225310_c0 CX022 comp225321_c0 GNAI 243 Centrosome, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 67.614 3.85e-9241.74 26673.521 0 138.976 825 032.419 8.95e-177 ×41.86 4.06e-58 0.597 3.25e-28 553 556 Basal Body, Transition Zone 35.42 218 × 26.745 0.403 102 9.17e-79 160.819 × 59.804 0 2.2e-63 142.793 × 289 1 × 672 0 22645.72 × 0 1 5.95e-150 0.165 114170.486 1310 1 × 1.02e-140 470 × 0.835 × 34.254 Centrosome 44158.101 1.09e-18 × 1 1.99e-114 144.44430.464 89.4 × 1.12e-08 × 358 143.654 1.73e-1458.075 54.7 ×51.792 76.6 2.52e-131 1 0 1 4.63e-98 Axonemal, 0.417 Basal Body, × 0.583 37836.709 × × 947 2.48e-21 297 × 35.96541.578 1 87.4 1.9e-2155.637 1.13e-125 1 133.661 8.42e-143 × 2.24e-65 × 98.2 × 37533.079 1 47344.781 1.02e-67 2.83e-75 218 × 1 1 224 0.493 1 242 × × 0.507 × × × 1 × DE Info → ↑ ↑ → ↑ ↑ ↑ ↑ → → ↑ → ↑ → ↑ ↓ ↓ ↑ → → ↑ ↓ → → ↓ ↓ ↑ Name Component ID Local Gene (continued) comp225322_c0comp225327_c0 B9D2 TTC18 comp225327_c0 WDR92 comp225333_c0 CA194 comp225337_c0 CCD18 comp225344_c0comp225348_c0 WDR52 comp225350_c0 ARHG4 NEK4 comp225353_c3 ARMC9 comp225359_c0comp225366_c0 WDR90 comp225375_c3 NIPA2 CA158 comp225399_c0comp225401_c0 5HT1R WASL X X X thy Cil- iopa- action RFX FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 Rfx2 FOXJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 FoxJ1 FOXJ1 FOXJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey CilDB Ciliome2 Cilia Proteome Ciliome Ciliome2Ciliopathy SurveyCiliome2 RFX RFX SysCilia Ciliopathy Survey RFX RFX Ciliome2 RFX Localization Known Ciliarytz TF inter- satellite ciliary rootlet bb rootlet b9d2 transition zone ttc18, Name cfap70 cfap44 c1orf158 1700013f07rik 8 5, 7 Ensembl IDENSG00000123810 Source 1, 4, 5,ENSG00000156042 6, 1, 2, 3, 4, Gene ENSG00000243667 3, 4, 5ENSG00000180448ENSG00000005483 3ENSG00000179902 3 2, 
wdr92 4, 5, 7ENSG00000068650ENSG00000122483 3 c1orf194, 4, hmha1 5, 6ENSG00000141522ENSG00000141279 7 mll5ENSG00000206530 3, 7 3, 4, 5, 7 ccdc18 atp11aENSG00000182957 centriolar 3, 4, 5 arhgdia npepps wdr52, ENSG00000114904 4, 5, 6, 8ENSG00000170091 spata13ENSG00000135931 3 4, 5ENSG00000165188 nek4ENSG00000122203 3ENSG00000161996 basal body 3 RFX Ciliome2 Ciliome2 4, 5ENSG00000137106 hmp19ENSG00000163293 armc9 7 3, Rfx2 4, 5 rnf183 kiaa1191ENSG00000157330 Rfx2 wdr90 2, 4, 5, 7ENSG00000101216 Rfx2 Rfx2 nipal1ENSG00000111684 grhpr 1700012p22rik, 3ENSG00000124313 3 comp225331_c0ENSG00000150594 3 4, comp225332_c0 5ENSG00000184160 RHG29 ENSG00000015285 3 RFX 4, gmeb2 comp225334_c0 5 comp225340_c0 MLL5 lpcat3 comp225340_c0 iqsec2 adra2a Ciliome2 AT8B2 PSA adra2c PSA was Rfx2 RFX Rfx2 Rfx2 RFX comp225352_c0 RFX comp225356_c1 comp225356_c1 CEP57 comp225365_c1 TRIM3 TRIM3 Rfx2 Rfx2 Rfx2 GRHPR RFX Rfx2 RFX comp225378_c0 comp225383_c0 comp225393_c1 GMEB1 MBOA5 comp225399_c0 IQEC1 5HT1R 244 Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted tity Per- 50.2 1.52e-167 485 1 × cent Iden- 43.53841.55145.619 4.87e-71 0 9.24e-8761.527 226 624 27159.228 0 126.531 1 147.998 5.6e-82 5890 0 × 47.798 × ×39.024 0.504 3.24e-156 5791 0 282 Axonemal, Basal Body, 066.129 × 49564.853 0.496 1718 1722 Axonemal, Ciliary 1 × 067.558 0.499 0 158.117 × 0.501 1.44e-128 × 1297 × × 1261 036.585 389 0.507 1.8e-12 Axonemal, Basal Body, 41.057 0.493 748 × 48.862 × 65.929.201 1 028.088 1.19e-14851.082 3.59e-60 1 × 0.05 048.495 493 581 × Axonemal, Ciliary × 229 0 680 0.683 0.438 054.639 Basal Body, Transition Zone 0.31760.521 7.3e-28 0.512 × × 59528.986 81067.059 × 2.75e-30 × 58.904 8.07e-34 0 114 1.43e-127 0.87738.76 1 117 122 0.123 368 × 662 2.8e-97 × 44.04 × 0.249 1 0.751 1.14e-7955.333 311 1 × 3.53e-111 × × 47.005 244 1.13e-113 × 325 1 0.429 401 0.571 × × × 1 × DE 
Info → → → ↑ ↑ ↑ ↓ ↓ ↑ ↓ ↓ ↑ ↑ ↑ → → → ↑ ↑ ↑ ↑ ↑ → → → → ↑ ↑ ↑ → Name Component ID Local Gene (continued) comp225417_c0comp225426_c0 R3GEF comp225426_c0 DYH5 DYH5 comp225446_c0 TTC17 comp225456_c1comp225469_c1 ANR42 comp225471_c1 WDR78 comp225471_c1 ELMO1 ELMO1 comp225484_c0 WDR63 comp225506_c0comp225514_c0 DZAN1 comp225514_c0 ELOV4 ELOV4 X X X X thy Cil- iopa- action Rfx2 FoxJ1 Rfx2 FoxJ1 FOXJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Proteome Ciliopathy Survey SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome SysCilia CilDB Ciliopathy Survey Proteome Ciliome Localization Known Ciliarycentrosome bb TF inter- axonemal dynein complex membrane? axonemal dynein tz Name dnah5 axoneme wdr78 axoneme wdr63 CilDB Ciliome2 Cilia 7 wdr957, 8 6, 7, 8 5, 7 comp225406_c0 WDR49 Ensembl IDENS- SourceMUSG00000029658 ENSG00000205476ENSG00000127328 3 6, 8 Gene ENSG00000039139 1, 2, 3, 6, ENSG00000124721 ccdc85c rab3ip 1, 2, 3,ENSG00000007062 7 basalENSG00000135636 body 1ENSG00000138119 7ENSG00000052841 3 dnah8 6ENSG00000105953ENSG00000197444 7 prom1ENSG00000137494 2 4, dysf 5ENSG00000127334 myofENSG00000152763 ttc17 7 1, 2, 4, cytosol 5, plasma CilDB Ciliome2 Cilia ENSG00000144061 ogdh ankrd42 ogdhl 1, 5, 6,ENSG00000102890 8 CilDB 4, 5ENSG00000155849 dyrk2 Ciliome2ENSG00000175899 Rfx2 3 nphp1ENSG00000125730 3ENSG00000165895 2 transition zone ENSG00000162643 3 elmo3 1, Ciliome2 2, 3, 4, Cilia Proteome elmo1ENSG00000155368 comp225414_c0ENSG00000159423 3 Ciliome2 a2m arhgap42ENSG00000171155 3ENSG00000141294 c3 CC85C 3ENSG00000147687 3 FoxJ1ENSG00000154309 Rfx2 2ENSG00000089091 3 2, 4, 5 aldh4a1 c1galt1c1 dbiENSG00000197977 comp225430_c0 RFX comp225441_c0 4, 5ENSG00000118402 lrrc46 comp225450_c0 tatdn1 comp225441_c0 dzank1 4, PRM1A 5ENSG00000187605 disp1 MYOF 3 comp225450_c0 ODO1 MYOF elovl2 RFX elovl4 ODO1 comp225460_c0 Rfx2 Cilia tet3 
Proteome DYRK2 Rfx2 Rfx2 FoxJ1 comp225471_c1 RFX Rfx2 comp225478_c0 Rfx2 comp225480_c0 Rfx2 comp225478_c0 ELMO1 Rfx2 FoxJ1 CD109 RHG26 CD109 Rfx2 comp225487_c2 comp225492_c0 comp225484_c0 RFX comp225504_c1 comp225504_c1 RFX AL4A1 C1GLT WDR63 comp225505_c0 TATD1 TATD1 Rfx2 DISP1 × 41.383 comp225519_c0 0 TET2 803 1 × 245 Membrane, Transition Zone Centrosome, Other Organells Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Found Category Score EValue BitScore Weighted 75 4.03e-85 273 1 × tity Per- cent Iden- 49.73947.629 031.2451.376 039.638 2.97e-27 81743.011 0 752 3.77e-173 116 0.521 042.67836.313 922 0.479 510 × 0.112 1.55e-73 663 0 × 34.496 0.888 Basal × Body 42.805 2.94e-41 240 136.478 6.55e-163 × 927 1 2.4e-22 0.62749.597 143 × 527 × × 0.37354.34 1 94 Axonemal, 0 Ciliary 51.985 Basal 6.14e-72 Body 165.506 × × 68.285 59858.696 × 1 0 25428.094 2.24e-49 033.567 2.57e-19 0.398 × 0 0.169 649 178 × 884 90.5 × 0 914 0.43239.822 Axonemal, Basal 0.09 Body, 0.44729.032 9.08e-178 0.463 × 581 4.15e-66 1 × × 50.137 549 × 1.24e-118 × 236 1 36526.376 1 5.58e-43 × 145.75 × 1 166 3.44e-88 × ×38.889 Transition Zone 31738.534 0.000301 1 Transport, Axonemal, Basal 4.33e-82 44.3 × 1 300 × 1 Centrosome 1 × × Basal Body, Transition Zone DE Info ↑ ↑ ↑ ↑ → → ↑ ↑ ↑ ↑ → → → → ↓ ↓ ↓ ↑ ↑ → ↑ ↑ → → ↑ → ↓ Name Component ID Local Gene (continued) comp225521_c1comp225521_c1 CE120 CE120 comp225538_c0comp225547_c0 ANO4 comp225554_c1 RHPN2 EFCB7 comp225561_c0comp225562_c0 SPAS1 DVL3 comp225580_c0comp225590_c0 ADGB comp225600_c0 SOCS5 comp225614_c0 CE042 RPGR comp225618_c0comp225621_c1 FBX15 PCM1 X X X X X X X thy Cil- iopa- action FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey RFX Localization Known Ciliary TF inter- centriole cilium tz satellite centriolar satellite Name Ensembl IDENSG00000168944 Source 4, 5, 6ENSG00000185264 4, 5ENSG00000142798ENSG00000153113 Gene 3ENSG00000047617 
3 6 bbENSG00000158106 (centrosome) 4, tex33 5 Ciliopathy SurveyENSG00000127249ENSG00000203965 3 hspg2 3, 4, 5, 6 castENSG00000197766 ano2ENSG00000065883 RFX rhpn1 2 axonemeENSG00000249481 3 efcab7 atp13a4 4, 5ENSG00000107404 bb 6, 8 Ciliopathy Survey ENSG00000150764ENSG00000161202 3 cdk13 cfdENSG00000124253 spats1 3ENSG00000100889 7ENSG00000103254 Ciliopathy Survey 7 dvl1ENSG00000182492 3ENSG00000118492 3 basal body bb 2, dixdc1 4, 5, 7ENSG00000171150 RFX dvl3 SysCilia RFX Ciliopathy Survey pck1 4, 5 fam173aENSG00000159842 pck2 Rfx2ENSG00000197603 3 adgb 4, Rfx2 5, bgn 6ENSG00000156313 RFX 6, 8 socs5 Rfx2ENSG00000141665 c5orf42 comp225531_c0 2, 4, tz 5 abr comp225531_c0ENSG00000078674 Ciliome2 rpgr 4, PGBM 5, 6, Rfx2 FoxJ1 8 RFX basal body PGBM comp225550_c0 fbxo15ENSG00000176058ENSG00000126883 3 Ciliopathy Survey 8 AT133 Rfx2 centriolar comp225554_c1 comp225559_c0 RFX Rfx2 Rfx2 RFX PLMN CDK12 nup214 tprn Rfx2 transition zone comp225562_c0 RFX SysCilia comp225562_c0 comp225568_c0 DVL3 comp225568_c0 Rfx2 comp225568_c0 DVL3 comp225579_c0 PCKGM PCKGM PCKGM RFX LRIG3 comp225592_c0 BCR Rfx2 comp225632_c0 NU214 comp225625_c0 NA 246 Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Axonemal, Basal Body, Centrosome, Ciliary Membrane, Other Organells Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 29.938 1.77e-3463.043 134 1.98e-91 266 124.737 8.12e-13 × 0.787 × 72 Transport, Axonemal, Basal 33.186 0.21342.094 4.56e-134 × 448 075.38555.43536.058 673 1 2.41e-61 036.33 7.06e-8322.519 × 1.72e-47 0.486 2.09e-24 218 713 265 × 161 112 0.51444.042 1 1 × × 1 1 0 × 60.471 × ×67.816 584 043.026 3.89e-34 Axonemal, Basal Body, 36.203 3.58e-61 550 140 1 062.264 7.88e-14 208 × 181538.462 1 1 0.74968.367 69.7 2.81e-100 2.05e-81 × × 75.586 × 0.251 1 31552.535 
Transport, Axonemal, Basal 263 Transport, Axonemal, Basal × × 0 0 1 Transport, 0.25 Axonemal, Basal 788 × × 648 0.75 1 × × DE Info → → → → ↑ ↑ ↑ ↑ → ↑ ↑ ↑ ↑ ↑ → → ↓ ↑ ↑ ↑ Name Component ID Local Gene (continued) comp225642_c0 CCD78 comp225643_c0 ARL6 comp225659_c0 CC146 comp225659_c0 RUVB1 comp225675_c0 LBN comp225675_c1 TTC6 comp225679_c0 IFT52 comp225719_c0comp225719_c0 RIPL1 RIPL1 comp225732_c1 IMA3 X X X thy Cil- iopa- action FOXJ1 Rfx2 FoxJ1 FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 Rfx2 FoxJ1 RFX FOXJ1 RFX FOXJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey Ciliome2 SysCilia Cilia Proteome FoxJ1 comp225647_c0 USH2A ×SysCilia Ciliopathy Survey 37.444 0 3496SysCilia CilDB Ciliopathy Survey Ciliome2 1SysCilia × Ciliopathy Survey Signaling, Transport, SysCilia Ciliopathy Survey FoxJ1 Localization Known Ciliarycilium cytosol transition TF zone inter- bbs - ift-associated centrosome stereocilium ciliary membrane cytosol nucleus axoneme (proximal) - inv compartment cilium ift ift-b body centriole bb body bb ttc6 RFX ift52 basal body Name ccdc146 CilDB Ciliome2 Ciliome RFX fzp686j19100 5, 7 7 7, 8 Ensembl IDENSG00000162004 Source 2, 3, 4, 5ENSG00000113966 Gene 1, 6, ccdc78 7, 8ENSG00000183347ENSG00000042781 2 arl6 2, 8 basal body ENSG00000074964ENSG00000135205 gbp6 ush2a 3 1, 2, 3, 4, cilium ENSG00000175792 4, 5 arhgef10lENSG00000182095ENSG00000172594 3ENSG00000196814 3ENSG00000173040 3 RFX 6, 8 ruvbl1 tnrc18ENSG00000139865 smpdl3a fam125b 2, 3, 4, 5, evc2ENSG00000101052 cilia membrane 1, 4, 5, 6, ENSG00000120915 FoxJ1ENSG00000073910 3ENSG00000188026 3 6, 8ENSG00000150977 Rfx2 2, 6, 8 comp225643_c0 ephx2ENSG00000164306ENSG00000166473 3 rilpl1 fry 4, ATLA2 5ENSG00000186432 RFX cilium basal ENSG00000110422 rilpl2 3 comp225658_c0 3 cilium Rfx2 basal Rfx2 pkd1l2, dk- ccdc111 Rfx2 ARHGA kpna4 hipk3 comp225663_c0 comp225667_c0 comp225669_c0 TNC18 ASM3B F125B Rfx2 Rfx2 Rfx2 comp225697_c0 comp225707_c2 RASF8 Rfx2 Rfx2 FRYL comp225729_c1 CC111 comp225732_c1 comp225737_c1 IMA3 HIPK2 247 
Centrosome, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Centrosome Found Category Score EValue BitScore Weighted tity Per- cent Iden- 51.891 2e-16828.008 485 1.25e-106 38242.93 1 0.109 × ×38.954 0 Axonemal 1.77e-12332.619 Axonemal, Basal Body, 41.195 1334 1.21e-163 43465.314 0.381 547 0.124 037.427 0 ×53.091 0.156 × 802 Basal Body 1031 × 0 055.056 0.229 5.96e-6729.123 710 1 × 23.762 852 6.31e-60 8.18e-14 20857.143 ×27.903 5.23e-44 1 227 75.9 1.06e-56 1 Transport 47.579 × 1 0.749 161 × 0.251 202 × × 045.589 × 1 1 117536.615 0 × 62.63 6.81e-109 × 3.57e-12660.772 1162 381 150.157 375 3.8e-9184.795 × 0 2.82e-100 1 0.326 1 Transport, Axonemal, Basal 32381.287 286 775 ×58.192 4.83e-98 × × 1.35e-66 Central Pair 0.504 0.674 1 281 227 × × × 0.496 0.444 Regulation, Basal Body, × × Basal Body, Centrosome DE Info → ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↓ ↑ ↑ ↑ → ↑ ↑ ↓ ↓ ↓ ↑ ↑ ↑ → Name Component ID Local Gene (continued) comp225739_c2 CCD65 comp225741_c0 CROCC comp225741_c0 CROCC comp225741_c0 CROCC comp225751_c0 KIFA3 comp225762_c0 IQCA1 comp225765_c0comp225767_c0 ALKB7 comp225782_c0 FHAD1 comp225786_c1 CLHC1 IF140 comp225788_c0 CT026 comp225793_c1comp225800_c0 SAE2 comp225803_c2 CAS comp225803_c2 CETN2 CETN2 X X X X X X thy Cil- iopa- action RFX FOXJ1 Rfx2 FoxJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 FoxJ1 Table B.2.1: CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome SysCilia Ciliopathy Survey SysCilia CilDB Ciliopathy Survey Cilia Proteome Cilia Proteome Ciliome Ciliome Cilia Proteome Ciliome SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Ciliome2 Cilia Proteome Ciliome CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Localization Known Ciliarydynein assembly centriole centriole- TF inter- centriole cohesion proximal end cilium ift ift-a radial spoke CilDB Ciliopathy Survey ciliogenesis crocc ciliary root bb ift140 basal body Name kifap3 ift-kinesin 
CilDB Ciliopathy Survey cfap61 ccdc65 axonemal c20orf26, iqca1, iqca Ciliome2 Cilia Proteome 5, 6, 7 5, 6, 8 6 7 5, 6, 7, 8 6, 7 Ensembl IDENSG00000139537 Source 1, 2, 3, 4, ENSG00000126001 Gene 6, 8ENSG00000058453 1, 2, 3, 4, basal body ENSG00000008277 4, 5ENSG00000226321ENSG00000004700 2ENSG00000075945 2 1, 2, 4, 5, adam22ENSG00000075420ENSG00000132321 ac104809.3 3 2, 3, 4, 5, recqlENSG00000125652 4, 5ENSG00000142621 fndc3bENSG00000162592 2, 7 4, 5ENSG00000007545ENSG00000162994 3 alkbh7 4, 5ENSG00000187535 fhad1 1, 2, 3, ccdc27 4, ENSG00000089101 cramp1l 1, 2, clhc1 4, 5, ENSG00000162804 RFX ENSG00000119640 2 FoxJ1 4, 5ENSG00000126261 Ciliome2ENSG00000130940 3 4, FoxJ1 5ENSG00000147400 1, 2, 6, sned1 7 acyp1 comp225741_c0 Rfx2ENSG00000177143ENSG00000177030 6 casz1 3 CROCC comp225741_c0 cetn2 FoxJ1 RFX centriole CROCC comp225756_c0 RFX cetn1 deaf1 Rfx2 centriole comp225767_c0 FND3A RFX FHAD1 Ciliopathy Survey comp225777_c0 FoxJ1 CRML RFX Rfx2 RFX comp225791_c0 NOTCH Rfx2 comp225793_c1 SAE2 comp225803_c3 RN150 248 Transition Zone Body, Centrosome, Other Organells Centrosome, Ciliary Membrane, Other Organells Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 45.50948.081 6.1e-87 9.08e-12239.62740.541 284 41542.334 5.42e-13942.23 6.89e-156 046.667 0.556 2.8e-153 434 2.16e-05 472 1 × 590 0.479 47731.919 48.5 × 0.521 6.85e-109 × Basal Body, × 0.111 Centrosome 69.479 1 388 161.86 × × 49.598 × 0.889 3.59e-87 046.058 1.52e-99 Axonemal, Basal Body, 43.33 2.07e-47 ×31.616 277 608 1.19e-71 340 Transport,28.814 Axonemal, 184 Basal 37.809 0 8.64e-09 0.313 0.687 0.26736.653 1.52e-50 262 0.145 2.88e-115 × × × 59.3 748 × 196 38562.682 1 0.232 0.588 0.76859.456 × × 43.523 × 0 1 5.18e-51 × Basal Body, Centrosome 069.149 × 2203 9.21e-85 169 2108 Regulation, Basal49.805 Body, 0.511 8.08e-69 30544.755 0.489 × 143.779 5.03e-13542.464 1.38e-93 260 × × 7.59e-156 1 459 0.36241.206 286 × 
467 1.58e-27 0.638 × Basal Body, Centrosome, × 50.814 118 1 1 1.04e-100 × × 320 1 0.212 × × Centrosome Transport, Axonemal, Basal DE Info → ↑ ↑ ↑ ↑ ↑ ↓ ↓ ↓ ↓ ↑ ↑ ↑ ↑ ↑ ↑ → ↑ ↑ → ↑ ↑ ↑ → ↑ → → Name Component ID Local Gene (continued) comp225806_c0 KIF24 comp225815_c0 TRIPB comp225815_c0comp225816_c1 TRIPB SYWC comp225821_c0 FBF1 comp225827_c0 CEP97 comp225832_c0comp225848_c1 EMAL6 comp225854_c0 SPAT4 comp225858_c0 CENPJ FGD6 comp225867_c0comp225868_c0 K1841 CEP72 comp225870_c0 XPP1 X X X X X X X thy Cil- iopa- action FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 RFX FOXJ1 FOXJ1 Rfx2 RFX FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey SysCilia CilDB Ciliopathy Survey Ciliome2 Localization Known Ciliarydistal end TF inter- axoneme (proximal) - inv compartment ift-associated appendages distal end inhibition of ciliogenesis satellite centriolar satellite cilium ift ift-b ift57 basal body Name kiaa2012 7, 8 Ensembl IDENSG00000170153 SourceENSG00000186638 2 4, 5, 6,ENSG00000075391 8ENSG00000171777 3ENSG00000068831 Gene 3ENSG00000079691 2ENSG00000072840 3 rnf150 6, 8 centriole bb rasal2ENSG00000100815 rasgrp4 rasgrp2 6, 8 lrrc16aENSG00000140105 evc 4, 5ENSG00000163291ENSG00000100348 transition zone 2ENSG00000106355 trip11 7ENSG00000114867 7ENSG00000188878 golgi golgi 3 4, 5, wars 6,ENSG00000160219 8ENSG00000185950 3 paqr3ENSG00000182504 3 4, txn2 5, 6, 8 lsm5 fbf1 eif4g1ENSG00000214595 centriole FoxJ1 distal 4, 5, 7ENSG00000165521 gab3ENSG00000150628 2 centrosome irs2 Rfx2 bb Rfx2 3, 4, 5, 7 FoxJ1ENSG00000151849 Rfx2 eml6 comp225803_c3 4, 5, 6, 8 spata4ENSG00000182329 eml5 4, RN150 5ENSG00000180263 comp225807_c1 comp225810_c0 comp225810_c0ENSG00000179262 3ENSG00000162929 7 comp225811_c0 centrosome bb DAB2P 3, ac079354.1, 4, RFX 5 GRP3 GRP3 ENSG00000112877 SysCilia Ciliome2 Ciliopathy Survey FoxJ1 LR16A 6, 8 kiaa1841 RFX fgd6 
rad23a CiliaENSG00000114446 Proteome Rfx2 1, 4, 5, 6, Rfx2 comp225816_c1 Rfx2 centriolar RFX FoxJ1 comp225817_c0 SYWC comp225817_c0 comp225817_c0 RFX comp225823_c0 IF4G3 IF4G3 IF4G3 comp225823_c0 comp225832_c0 PKHA7 IRS2B EMAL6 RFX Rfx2 comp225858_c0 comp225859_c0 FGD6 RD23B 249 Centrosome, Ciliary Membrane, Other Organells Centrosome, Ciliary Membrane, Other Organells Centrosome, Other Organells Body, Centrosome, Ciliary Membrane, Transition Zone, Other Organells Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 56.2740.55342.947 3.98e-149 2.7e-76 0 44537.20932.632 1.57e-47 276 2.19e-06 743 0.29564.894 163 1.9e-38 × 0.493 44.3 167.77853.772 × 15852.122 1e-35 × 151.232 142.84 1.11e-57 0.515 Axonemal, Basal 0 × Body, 149 × 0 × 196 809 0.485 0 Basal 798 Body, Centrosome × 0.50350.181 697 1 0.497 6e-171 × ×31.571 × 1.04e-86 1 521 Basal Body 39.13728.188 × 9.46e-147 28742.264 3.92e-4133.396 1.19e-104 1 Axonemal, Basal Body, 441 0.394 1.13e-7742.608 162 × 371 × 0.606 280 Axonemal, Basal Body, Basal Body, × 0 Centrosome 0.5746.407 1 7.16e-83 0.43 Basal Body × × 41.786 692 × 28663.836 0.41648.693 0 0.172 2.84e-97 ×24.006 031.25 3.37e-52 × 685 Transport, 294 Axonemal, Basal 1.59e-35 Basal Body, Centrosome, 514 193 0.412 0.364 127 0.636 × 0.603 × 0.397 × Axonemal × × DE Info → → ↓ → → ↑ ↑ → → ↑ ↑ ↑ → → → ↑ ↑ ↑ ↑ ↑ → → ↑ ↑ Name Component ID Local Gene (continued) comp225876_c0 GCP6 comp225886_c0comp225890_c0 CE048 CE164 comp225904_c0comp225907_c0 SAV1 HEAT2 comp225916_c0comp225916_c0 SCLT1 comp225930_c0 CCD22 comp225931_c0 F179B OCRL comp225931_c0comp225935_c0 OCRL comp225937_c1 DCUP PX11B X X X X X X X thy Cil- iopa- action RFX FOXJ1 FOXJ1 Rfx2 RFX FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey Ciliopathy Survey SysCilia Ciliopathy Survey RFX SysCilia Ciliopathy Survey Ciliopathy Survey Localization Known Ciliarycentriole cytosol bb TF 
inter- distal appendage cytosol axonemal dynein complex assembly axonemal dynein assembly distal appendage axoneme signalling sclt1 centriole bb tex43 Name dnaaf5 cep164 centriole bb 8 8 Ensembl IDENSG00000108039 SourceENSG00000081800 7ENSG00000128159 2 6, 8ENSG00000164122 Gene ENSG00000196900 2 xpnpep1 4, 5ENSG00000110274 slc13a1 tubgcp6 3, 4, 5, 6, basal body ENSG00000100095ENSG00000145087 2 c5orf48, ENSG00000147155 asb5 3ENSG00000151748 3ENSG00000164818 6 Ciliome2 4, 5, 6, 8 sez6l stxbp5l heatr2, ebpENSG00000111490 sav1 8 bbENSG00000151466 3, 4, 5, 6, ENSG00000101997ENSG00000174611 tbc1d30 6 FoxJ1ENSG00000198718 7 basalENSG00000189350 Ciliopathy body Survey 7 4, 5ENSG00000122126 comp225870_c0 6, FoxJ1 8 SysCilia ccdc22 comp225870_c0 fam179bENSG00000112742 bb fam179a (centrosome) XPP1 ky 8 Ciliopathy Survey S13A3 ENSG00000204084 FoxJ1 Rfx2 comp225877_c0 ocrl 6ENSG00000126088 ciliumENSG00000132122 Rfx2 bb 7 ASB13 4, 5ENSG00000140527ENSG00000166821 ttk Ciliome2 3, 7 comp225890_c0 4, inpp5b 5 comp225903_c0 centrosome axoneme - spata6 comp225903_c0 CE164 urod STXB5 SysCilia wdr93 comp225908_c0 pex11a STXB5 TBC30 RFX comp225930_c0 F179B comp225925_c0 comp225931_c0 RFX KY TTK Rfx2 RFX comp225935_c0 comp225937_c1 WDR93 DCUP 250 Central Pair, Centrosome, Ciliary Membrane, Other Organells Centrosome, Other Organells Centrosome, Ciliary Membrane, Transition Zone, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted tity Per- 46.4 2.57e-101 305 0.093 × cent Iden- 44.869 1.31e-128 42537.71 4.46e-122 0.3437.172 2.67e-124 × 41151.365 Axonemal, Basal 414 Body, 0.32966.29333.181 × 0.331 0 039.939 × 0 1.51e-74 726 361539.749 793 238 1 0 150.27833.936 6.81e-119 1 × 1 × 1.76e-13628.723 739 × 372 × 2.61e-11 454 Axonemal,58.091 Basal Body, 67.8 1.43e-90 1 0.8754.084 157.525 8.04e-168 × × 0.13 267 × 27.123 Axonemal, 502 Basal Body, 5.06e-22 Basal × Body 038.779 2.01e-115 123.016 99.8 801 164.423 2.13e-10 × 358 0.07564.886 × 0.606 Axonemal,22.66 63.2 
Ciliary 0.271 0 × 1.68e-30 × × 0.048 0 82265.217 Basal Body 131 × 47.753 1.59e-99 324661.223 1 335 0 1 1 × 0 0.102 × 1134 × × 1516 0.345 0.461 × × DE Info ↑ ↑ ↑ ↓ ↑ ↑ ↑ ↑ → ↑ ↑ → → ↓ ↓ ↓ ↓ → ↑ → ↑ ↑ ↑ ↑ Name Component ID Local Gene (continued) comp225942_c0 CP135 comp225942_c0comp225942_c0 CP135 CP135 comp225946_c0comp225957_c0 C2CD3 comp225959_c0 WDR88 MKS3 comp225965_c0comp225965_c0 QN1 comp225966_c2 K1377 comp225974_c0 RAB23 comp225974_c0 KIF2A comp225974_c0 LMNA comp225981_c1 LMNA DGKD comp226028_c0 RIBC2 comp226028_c0 SMC1A X X X X X X thy Cil- iopa- action RFX FOXJ1 FOXJ1 Rfx2 FOXJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 RFX FOXJ1 Rfx2 FOXJ1 RFX FOXJ1 FoxJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey SysCilia Ciliopathy Survey RFX SysCilia CilDB Ciliopathy Survey Localization Known Ciliarycentriole basal body central pair bb centriole duplication TF inter- distal centriole body tz bb distal end Ciliopathy Survey RFX ribc2 CilDB Ciliome2 RFX Name centrosome cep162 tmem67 axoneme basal kiaa1377 8 6, 8 7 Ensembl IDENSG00000174799 Source 1, 4, 5, 6, ENSG00000135926 Gene 3, 4, 5ENSG00000135951 2, 4, 5ENSG00000177200 tmbim1ENSG00000158486 3ENSG00000168014 1, 3, 7 tsga10 4, 5, 6, 8ENSG00000166359 3, 4, 5 dnah3ENSG00000164953 c2cd3 chd9 1, 3, basal 4, body 5, bb wdr88ENSG00000099904ENSG00000135315 2 4, 5, 6ENSG00000110318 2, 4, 5 CilDB Ciliome2ENSG00000112210 kiaa1009, 6, zdhhc8 8ENSG00000140859ENSG00000068796 cep126, RFX 7 4, 5, 6,ENSG00000126337 7 Rfx2 4, RFX 5ENSG00000160789 rab23 2, 3ENSG00000171431 axoneme kif2aENSG00000077044 2 Cilia Proteome bb Ciliome Rfx2 4, 5ENSG00000197694 comp225944_c0ENSG00000178971 SysCilia 3 Ciliopathy Survey ENSG00000128408 3 FoxJ1 RFX 1, 3, 4, 5, DYH3 ENSG00000112977 krt20 dgkd CiliopathyENSG00000077935 Survey Ciliome2 7 comp225943_c0 4, 5ENSG00000072501 RFX comp225964_c0 2 ctc1 CHD7 ZDHC8 smc1b dap smc1a RFX Rfx2 Cilia Proteome Ciliome FoxJ1 RFX comp225968_c0 Rfx2 FoxJ1 Rfx2 KIFC3 comp225974_c0 RFX 
comp226028_c0 comp226010_c0 LMNA comp226025_c0 SMC1A SPTCA CTC1 comp226028_c0 SMC1A 251 Centrosome, Ciliary Membrane, Transition Zone, Other Organells Centrosome, Transition Zone, Other Organells Centrosome, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 28.447 7.05e-8047.12649.545 2.82e-14144.189 1.76e-55 293 41678.571 189 032.461 4.55e-61 132.507 5.34e-18 174.444 2.2e-84 × 748 1.47e-39 186 152.044 88.6 × Basal53.902 Body × 30044.079 152 0.22854.179 1 1 1.2e-32 039.085 0.772 0 × 0.064 × × 2236 × 0 135 × Basal 0 Body 77656.0932.537 0.936 140867.188 1.31e-73 1585 1 × 32.59 0 1 × 249 4.32e-62 0 161.671 × 135.874 3.71e-172 721 × 23.34 1.93e-79 235 652 × 1 4.31e-37 51047.76 Axonemal, 279 Basal 1 Body, × 155 1 1 × 35.616 1 1.17e-101 0 × × 161.179 × 131.33 Basal Body × 319 823 1.28e-3627.822 × 064.341 5.88e-3628.287 7.73e-127 Basal Body 130 1 7.65e-160 1 530 134 36130.317 × 0.492 534 × 4.19e-22 0.508 Basal × Body 62.141 Axonemal, 1 Basal Body, 66.854 1 1.35e-166 × 105 1 2.27e-85 × 42.815 × 491 2.02e-141 × 249 1 Axonemal, Basal 449 Body, × 1 1 × 1 × × DE Info ↑ ↑ → ↑ → ↑ ↑ ↓ ↓ → ↑ → ↑ → ↑ ↓ ↑ → ↓ ↑ ↑ ↑ → → → → ↑ ↑ → → → Name Component ID Local Gene (continued) comp226029_c0comp226040_c0 KZ TTLL5 comp226046_c0 SPTCB comp226066_c0 CE290 comp226077_c0comp226080_c1 PAK1 ANR26 comp226090_c0comp226091_c1 CNTRL comp226092_c0 FTM POC5 comp226113_c0comp226116_c0 RTTN comp226129_c0 NA UBE2C X X X X X X X X thy Cil- iopa- action FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey RFX Ciliome2 FoxJ1 comp226107_c0 CG062 Localization Known Ciliary TF inter- centrosome tz centriolar satellite Name basal body tz SysCilia Ciliopathy Survey RFX c7orf62 8 Ensembl IDENSG00000103995 Source 4, 5, 6ENSG00000162877ENSG00000180694 3ENSG00000119685 3 Gene 3, 4, 5, 6 cep152ENSG00000143183 bbENSG00000166145 (centrosome) 3ENSG00000187720 pm20d1 3 Ciliopathy SurveyENSG00000186409 tmem64 
2 ttll5 4, 5ENSG00000173898 bbENSG00000172534 3ENSG00000120896 3 tmco1ENSG00000085563 RFX 2 spint1ENSG00000198707 7 ccdc30 thsd4 4, 5, 6, 8 Ciliopathy Survey sptbn2ENSG00000145780ENSG00000075240 hcfc1 2 cep290 sorbs3ENSG00000149269 3 abcb1a basal 4, body 5ENSG00000107890 RFX 4, 5, 6ENSG00000163681ENSG00000181409 3 fem1cENSG00000119397 gramd4 3 3, 4, pak1 5, ankrd26 Rfx2 6 Rfx2 CiliaENSG00000103494 Proteome bb Ciliome2 3, 4, 5, 6, slmapENSG00000152359 4, aatk 5, Rfx2 6ENSG00000197329 bb Rfx2ENSG00000164645 comp226034_c1 Ciliome 2 FoxJ1 RFX FoxJ1 comp226039_c0 Ciliopathy 2, Survey 7ENSG00000232312ENSG00000115137 3 P2012 Rfx2ENSG00000176225 TMM64 poc5 3 Rfx2 4, 4921511h03rik, 5, comp226041_c0 bb 6, distal 8 Ciliopathy end Survey comp226044_c0ENSG00000110427 comp226057_c0 RFX peli1 comp226044_c0 2, 4, TMCO1 5 FoxJ1 Ciliopathy Survey gpank1ENSG00000141576 dnajc27 SRBS1 comp226046_c0 PPN1 ENSG00000175063 rttn PPN1 3 RFX comp226062_c0 comp226054_c1 Rfx2 kiaa1549l 4, 5 basal bodyENSG00000083290 bb RFX SPTCB 3 RFX HCFC1 comp226068_c2 MDR1 SysCilia Ciliopathy Survey Ciliome rnf157 Rfx2 RFX FEM1C comp226071_c0 Rfx2 ulk2 Cilia Proteome GRAM4 comp226081_c2 FoxJ1 comp226084_c1 SLMAP RFX LMTK1 Rfx2 Rfx2 comp226095_c0 PELI1 comp226107_c0 comp226112_c1 Rfx2 RFX GPAN1 DJC27 Rfx2 comp226127_c1 RN157 comp226130_c0 ULK2 252 Membrane, Transition Zone Body, Ciliary Membrane Membrane, Transition Zone Body, Ciliary Membrane Body, Ciliary Membrane Body, Ciliary Membrane Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- cent Iden- 33.855 2.73e-12733.136 4.18e-59 41870.918 225 0.65 069.34231.663 × 0.35 3872 × 037.401 059.928 0.681 1810 × 1030 066.056 0 0.319 Axonemal, Ciliary 66.592 188127.381 × 5612 0 1 1.15e-17 043.811 × 0.311 6197 1 8457.264 Signaling, × Axonemal, 6219 Basal × 41.656 0.344 0 Axonemal, Ciliary 64.14 0.345 × 0 1 0 71739.93 × Axonemal 30.728 × 2.2e-176 68428.165 0 2.92e-95 0.512 627 1.63e-100 0.488 586 × 276039.122 342 0.145 
36341.854 × 2.43e-174 0.136 × 0.07963.852 0.64 0.113 0 2.1e-165 × 52051.542 Signaling, × Axonemal, × Basal × 4.54e-173 2852 468 Signaling, Axonemal, Basal Signaling, Axonemal, Basal 49746.418 1 0.88764.767 × × 0.287 1 1.43e-91 067.647 Centrosome × × 64.549 8.79e-9765.873 292 627 Axonemal,57.785 Basal Body, 6.42e-107 318 0 0.168 0.362 0 318 0.183 × × 857 1145 × 0.428 1 0.572 × × × DE Info ↑ ↑ ↑ ↑ ↑ → ↑ ↑ ↑ → → → → → → → ↓ ↓ ↑ ↓ → → → → → → → Name Component ID Local Gene (continued) comp226133_c0comp226133_c0 FBW10 comp226135_c0 PTN13 DYH2 comp226148_c0comp226148_c0 DYHC comp226149_c0 DYHC comp226154_c0 F169B WDR47 comp226177_c3comp226181_c0 LRC49 comp226182_c2 OSCP1 XPP3 comp226182_c2comp226182_c2 KTNB1 KTNB1 X X X X X thy Cil- iopa- action FOXJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 FoxJ1 RFX FOXJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey Ciliome2 SysCiliaSysCilia CilDB Ciliopathy Survey Cilia Proteome Ciliome CilDB Ciliopathy Survey Ciliome2 SysCilia comp226139_c0SysCiliaSysCilia DS Ciliopathy SurveySysCilia Ciliopathy Survey RFX comp226160_c0 comp226160_c0 MYO7A MYO7A comp226167_c0 FAT Localization Known Ciliaryaxonemal dynein TF inter- cilium stereocilium axonemal dynein complex dynein stereocilium stereocilium cilium stereocilium satellite nucleus basal body mitochondria Name dnah2 axoneme dnah9 axonemal 5, 6, 7, 8 7 Ensembl IDENSG00000171931 Source 4, 5ENSG00000170324 4, 5ENSG00000183914 Gene 1, 2, 3, 4, fbxw10ENSG00000167978 frmpd2ENSG00000107736 3 8ENSG00000105429ENSG00000105877 3 1, 2, 6, 8ENSG00000007174 srrm2 cdh23 1, 4, 5,ENSG00000187775 6, dnah11 connecting ENSG00000198780 1, 2, megf8 7 axoneme 3, 4, 5ENSG00000085433 4, 5ENSG00000132394 dnah17 fam169aENSG00000091536 3 8ENSG00000137474 RFX 8ENSG00000128833 wdr47 RFX ENSG00000266714 3ENSG00000150275 2 eefsec 8 CilDB Ciliome2 ciliaryENSG00000196159 tip myo7aENSG00000137821 7 ciliary 4, tip myo5c 5, 6ENSG00000116885 Rfx2 myo15b pcdh15 1, 4, 5ENSG00000196236 FoxJ1 
connecting 6, 8 Rfx2 lrrc49 fat4ENSG00000140854 centriolar oscp1 comp226135_c0 4, 5ENSG00000150433 xpnpep3 RFX 4, comp226148_c0 5ENSG00000149084 mitochondrium DYH2 comp226142_c0ENSG00000129250 3ENSG00000054523 3ENSG00000124171 katnb1 DYHC RFX 3 tmem218 MEGF8 3 Rfx2 CilDB hsd17b12 kif1c pard6b Rfx2 FoxJ1 comp226154_c0 RFX SELB comp226160_c0 comp226160_c0 MYO7A MYO7A RFX RFX comp226167_c0 Rfx2 FAT Rfx2 Rfx2 Rfx2 comp226182_c2 comp226187_c1 KTNB1 comp226187_c1 comp226191_c0 KIF1A PAR6B KIF1A 253 Membrane, Transition Zone Centrosome, Ciliary Membrane, Transition Zone Membrane, Transition Zone Found Category Score EValue BitScore Weighted tity Per- cent Iden- 40.221 8.36e-6829.701 6.13e-4042.955 215 4.58e-6539.855 165 220 1 056.16 1 × 1 2747 × 56.236 0 × 41.071 0.246 1.04e-107 0 4204 ×41.327 338 4206 0.377 Axonemal, Ciliary 045.122 × 0.37740.071 1 2.43e-3271.32 Axonemal, Central × Pair, 64528.035 × 1.29e-16 119 033.032 0 80.5 1 105528.652 122.905 7.21e-06 578 × 021.131 3.03e-20 × 22.162 7.93e-15 1 Axonemal, 1 Ciliary 50.429.25 8.15e-07 1349 99.8 × 1 2.92e-16 × 79.7 0.03151.985 52.8 0.827 0.061 Basal × Body,32.529 Transition Basal Zone Body 82.8 0.049 × × 0.032 × Basal Body 0 × 31.067 0.05 0 × 29.421 1.04e-141 2.1e-47 1576 × 25.989 465 692 1.66e-161 0.95 18729.705 0.402 0.598 529 5.55e-155 × × × 541 1 1 × × 1 Transport, Basal Body × Basal Body, Centrosome DE Info ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ → ↑ ↓ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ → → → → ↑ Name Component ID Local Gene (continued) comp226194_c1comp226196_c0 DNAS1 comp226199_c2 K1731 comp226201_c0 ZDHC1 DYH6 comp226201_c0 DYH6 comp226202_c0comp226203_c0 CC176 CCD40 comp226216_c0comp226220_c0comp226221_c0 GRDN STK4 CC104 comp226224_c0 FSIP1 comp226229_c0 TRPM2 comp226231_c0comp226236_c0 CENPF comp226239_c0 EFCB6 CE192 X X X X X thy Cil- iopa- action FoxJ1 RFX FOXJ1 FOXJ1 RFX FOXJ1 Rfx2 FoxJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 FoxJ1 FOXJ1 Rfx2 FOXJ1 Rfx2 FOXJ1 FoxJ1 FOXJ1 FoxJ1 Table B.2.1: SysCilia CilDB Ciliopathy Survey Ciliome2 Ciliome 
SysCilia CilDB Ciliopathy Survey Ciliome2 Ciliome CilDB Ciliome2 Cilia Proteome Ciliome Cilia Proteome Ciliome SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliome Ciliopathy Survey Ciliopathy Survey Localization Known Ciliary TF inter- axonemal dynein complex assembly central pair axonemal dynein complex assembly transition zone Ciliopathy Survey RFX ift-associated? centriole duplication bbof1 Name dnah1 axoneme dnah6 axoneme ccdc40 axoneme ccdc104 ccdc176, kiaa1731 5, 6, 7, 8 6, 7, 8 27 6, 7, 8 dnah67 gm9195 FoxJ1 comp226201_c0 DYH6 comp226222_c0 GOGA4 Ensembl IDENSG00000163687 Source 2, 3ENSG00000166004 4, 5ENSG00000159714 Gene 4, 5ENSG00000114841 dnase1l3 1, 2, 3, 4, cep295, ENSG00000115423 zdhhc1 1, 2, 4, 5, ENS- MUSG00000052861 ENSG00000119636 1, 2, 4, 5, ENSG00000141519 1, 2, 4, 5, ENSG00000135750ENSG00000115355 3ENSG00000104375 6ENSG00000163001 6 3, 4, 5, 6 Rfx2 ENS- MUSG00000109446 ENSG00000135525 kcnk1 ccdc88a cfap36, ENSG00000167880 3 bbENSG00000162614 RFX 3 stk3ENSG00000162520 3ENSG00000150667 bb 2 3, 4, 5ENSG00000096433 map7ENSG00000142185 Ciliopathy 3 Survey 2, evpl 4, 5 nexnENSG00000130529 Ciliopathy fsip1 Survey syncENSG00000117724 3 6ENSG00000186976 trpm2 2, itpr3 4, 5, 7ENSG00000101639 6 trpm4 efcab6 cenpf Rfx2 bb bb (centrosome) comp226215_c0 Rfx2 Ciliome2 Rfx2 KCNK1 Rfx2 RFX FoxJ1 RFX comp226222_c0 Rfx2 RFX comp226222_c0 comp226222_c0 comp226222_c0 GOGA4 Rfx2 GOGA4 GOGA4 GOGA4 comp226224_c0 comp226229_c0 ITPR1 TRPM2 254 Membrane, Transition Zone Centrosome, Ciliary Membrane, Transition Zone, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Body, Centrosome, Ciliary Membrane, Other Organells Found Category Score EValue BitScore Weighted tity Per- 38.5 9.29e-113 376 0.324 × cent Iden- 35.019 1.67e-7443.411 6.73e-18 24726.203 0.74629.626 84 066.61 × 0.254 038.517 733 × 0 94744.189 0.436 0 5715 0.564 × 022.974 × 78652.747 Regulation, Transport 3.2e-50 5.02e-26 1 134938.248 0.676 2.15e-86 × 200 103 × Axonemal, 1 Ciliary 
290 Central Pair, Centrosome 138.594 × 146.226 3.26e-120 × Central Pair, 1 Centrosome × 63.682 042.652 × 39050.199 1.54e-131 1.88e-83 0 554 Centrosome 455 1 250 771 1 × 1 × Transport 1 1 × × × × 0 0× 0 0 0 () 0 Axonemal, Basal Body, () Transport, Axonemal, Basal DE Info ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ → → → → ↑ ↑ ↑ Name Component ID Local Gene (continued) comp226241_c1comp226241_c1 CC173 comp226242_c0 CC173 comp226242_c0 SYNE1 comp226243_c2 SYNE1 comp226244_c0 DYH10 comp226247_c0 CL055 HYDIN comp226257_c0comp226260_c0 THIO comp226263_c0 CC112 DYHC1comp226266_c0 × 69.298 PCX1 comp228549_c0 9.2e-96comp278575_c0 ENKUR 316 0.042 DOPR2 × × 38.482 4.3e-71 234 1 × X X X X X X X thy Cil- iopa- action FOXJ1 FoxJ1 FOXJ1 FoxJ1 RFX FOXJ1 FoxJ1 RFX FOXJ1 Rfx2 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 FOXJ1 RFX FOXJ1 FoxJ1 FOXJ1 Table B.2.1: SysCilia Ciliopathy Survey Rfx2 SysCilia CilDB Ciliopathy Survey Ciliome2 SysCilia CilDB Ciliopathy Survey Ciliome2 Cilia Proteome Ciliopathy SurveyCiliome2 Ciliome RFX CilDB Ciliopathy Survey RFX RFX Proteome Ciliome SysCilia Ciliopathy Survey SysCiliaSysCilia Ciliopathy Survey × 0 0 0 () Transport, Axonemal, Basal Localization Known Ciliaryremodelling TF inter- axonemal dynein central pair function satellite transport axoneme cilium membrane hydin central pair enkur CilDB Ciliome2 Cilia Name dnah10 axoneme fam154b 6, 7, 8 5, 6, 7, 8 7 Ensembl IDENSG00000154479 Source 2, 4, 5, 7ENSG00000155749 4, 5ENSG00000054654 ccdc173 Gene 3, 6, 8ENSG00000131018 2, 3ENSG00000197653 als2cr12 1, 2, 4, 5, syne2ENSG00000188596ENSG00000197119 6, trafficking 7 actin ENSG00000157423 syne1 3 Ciliome2 1, 2, 3, 4, ENSG00000084674ENSG00000136810 cfap54 2 slc25a29 4, 5 central pairENSG00000164221 4, 5, 6ENSG00000197102 RFX Cilia ProteomeENSG00000188659 Ciliome 1 Ciliopathy Survey 2, 4, 5, 7 apob ccdc112ENSG00000068724 txn Rfx2 ENSG00000138036 3 centriolar 1, 4, 5, saxo2, ENSG00000135749 6 RFX ENSG00000132676 2ENSG00000151023 3 dync2li1 1, 2, 4, 5, iftENSG00000043591 retrograde ttc7a 4, 
5ENSG00000184811 pcnxl2ENSG00000143032 3ENSG00000104938 dap3 CilDB 2ENSG00000179270 2 Rfx2 6, 8 adrb1ENSG00000165376 tusc5 8 clec4m c2orf71ENSG00000108753 FoxJ1 comp226244_c0ENSG00000172005 RFX 8 basal body 6, 8 CL055 cldn2 comp226252_c0 basal body hnf1b mal Rfx2 APLP comp226263_c0 cilium periciliary FoxJ1 Rfx2 DYHC1 × comp226265_c0 RFX SysCilia 73.369 comp226266_c1 Rfx2 TTC7B FoxJ1 comp226272_c0 FoxJ1 0 PCX1 SYNE1 7181 comp687803_c0 0.958 comp7365_c0 comp8847_c0 × SYNG1 BARH1 × CLC4E × × 37.5 62.59 34.752 3.68e-08 2.59e-51 6.56e-15 50.1 169 75.1 1 1 1 × × × × 0 0 0 () 255 Centrosome, Ciliary Membrane, Other Organells Membrane, Transition Zone Found Category Score EValue BitScore Weighted tity Per- cent Iden- × 0 0× 0 0 () 0 0 () Axonemal, Ciliary DE Info Name Component ID Local Gene (continued) X X thy Cil- iopa- action RFX FOXJ1 FoxJ1 Table B.2.1: SysCiliaSysCiliaCilDB Ciliopathy Survey Ciliome2 Cilia Proteome × × 0 0 0 0 0 0 () () Ciliary Membrane Axonemal, Basal Body, Localization Known Ciliarymembrane centriole cytosol TF inter- chlamy- domonas flagellarprotein Name cfap161, c15orf26 6, 7 Ensembl IDENSG00000183473 Source 8ENSG00000153575 8ENSG00000235067 Gene ENSG00000269265 1ENSG00000183941 1ENSG00000124529 sstr3 1ENSG00000227739 tubgcp5 1 ciliary ENSG00000197697 1 basal body ENSG00000188987 1ENSG00000269278 1ENSG00000268135 tubb 1 hist2h4aENSG00000224501 ribc1 1, 5 hist1h4bENSG00000204388 1ENSG00000271801 hist1h2be 1ENSG00000231555 tubb 1 hist1h4dENSG00000156206 1 1, 2, 4, 5, lrrc23 ift46ENSG00000263148 hspa1bENSG00000198558 hspa1b 1 CilDBENSG00000182217 CilDB CilDB 1 morn3ENSG00000267807 CilDB hspa1b 1ENSG00000229684 1 CilDBENSG00000215328 1 CilDBENSG00000196230 CilDB 1 1, atp2b3 6ENSG00000232421 hist1h4l CilDB CilDB hist2h4bENSG00000262866 1 CilDBENSG00000196176 1, CilDB 5 actg1ENSG00000224156 1 CilDBENSG00000265594 hspa1a tubb 1 CilDBENSG00000198518 1 tubbENSG00000234475 1ENSG00000269671 axoneme 1ENSG00000232575 tubb 1 CilDB ift20 hist1h4aENSG00000226704 
CilDB 1ENSG00000168242 CilDB RFX 1ENSG00000197914 tubb 1 CilDB cep41 Ciliopathy hist1h4e Survey ENSG00000198339 CilDB 1ENSG00000234258 hspa1a 1 CilDB CilDBENSG00000232804 1 rabl2bENSG00000264719 1ENSG00000269666 tubb 1 hspa1l hist1h2biENSG00000269079 CilDB 1 CilDB hist1h4k CilDBENSG00000204390 1ENSG00000268512 hist1h4i 1ENSG00000237724 CilDB 1, hspa1l 5 CilDB CilDB hspa1b hist2h2be 1 CilDB CilDB pgk1 × phb2 CilDB × hspa1l CilDB × CilDB cetn2 × CilDB RFX hspa1a × 0 CilDB 0 0 × × 0 CilDB CilDB × CilDB 0 × 0 0 0 0 × 0 CilDB 0 0 × CilDB CilDB 0 × 0 CilDB × 0 0 0 CilDB 0 0 0 0 0 0 0 0 0 0 × 0 0 () × 0 () () 0 × 0 () 0 0 0 0 × () 0 0 () 0 × × () RFX 0 () 0 0 0 0 () × 0 0 0 () 0 × () × () 0 0 () 0 × 0 0 0 0 × × 0 0 × 0 0 0 × () 0 0 () 0 0 0 0 () × 0 × × 0 × () 0 0 0 × 0 0 0 0 () () 0 0 × 0 0 × × 0 0 () 0 0 0 0 () 0 0 0 × () 0 0 0 × × 0 × 0 0 () () () × 0 0 0 0 0 0 0 0 0 () 0 0 () 0 0 0 () 0 () () 0 0 0 0 0 () () 0 0 () 0 0 () () 0 0 () () () () () 256 Found Category Score EValue BitScore Weighted tity Per- cent Iden- ×× 0× 0 0 0× 0 0 0× 0 0× 0 () 0 0 0 () ()× 0 Axonemal 0 0 0 Basal Body × Axonemal 0 () 0 0 0 () × () 0 0 0× 0 () 0× 0× () 0 0× 0 0 0 0 () 0 0 0 0 () 0 0 () () () DE Info Name Component ID Local Gene (continued) X X X thy Cil- iopa- action RFX FOXJ1 FOXJ1 FOXJ1 Rfx2 FOXJ1 FoxJ1 FOXJ1 FoxJ1 FOXJ1 FoxJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 Rfx2 Table B.2.1: Ciliopathy Survey Ciliopathy Survey Ciliome RFX Localization Known Ciliary TF inter- dynein assembly signalling Name spaca9 c10orf53 Ensembl IDENSG00000187990 SourceENSG00000204389 1ENSG00000197846 1ENSG00000198327 1ENSG00000260842 1ENSG00000236251 Gene 1, 5ENSG00000265590 hist1h2bg 1 6 hspa1a hist1h2bfENSG00000169641ENSG00000163870 hist1h4f 6 6 tekt4ENSG00000167414 hspa1l c21orf59ENSG00000257767 7ENSG00000234828 7 axonemal ENSG00000108578 7 CilDBENSG00000178645 7 luzp1 4, CilDB 5, 7 CilDBENSG00000206044 tpra1 bb (centrosome)ENSG00000144843 CilDB 5 axoneme - 1700007b14rik gng8 4, Ciliopathy 5 Survey CilDB 1700024g13rik, 
ENSG00000162373 aldh2 CilDB 3, 4, 5 blmhENSG00000270032 ac005841.1ENSG00000214556 5ENSG00000160298 5 adprh 2, bend5 4, 5ENSG00000260489ENSG00000165698 Ciliome2 5 RFX c12orf55 2, c21orf58 4, c17orf98 5ENSG00000205081ENSG00000262277 5ENSG00000230989 5 c9orf96 c9orf9, 2, 4, 5ENSG00000262673ENSG00000264872 5ENSG00000186326 cxorf30 3, 5 4, hsbp1 5 dixdc1ENSG00000249428ENSG00000091482 5 4, 5 RFXENSG00000230873 mapk4 4, mest 5ENSG00000163762 rgs9bp RFX × rp11-503n18.3 3, RFX 4, 5 × × smpx 0 Cilia Proteome × × stmnd1 RFX RFX RFX tm4sf18 0 0 × 0 0 0 0 0 RFX 0 RFX 0 0 0 × 0 RFX 0 0 × RFX 0 0 () × 0 0 () 0 () × RFX × 0 () () RFX RFX Rfx2 RFX 0 0 0 () 0 0 RFX × RFX 0 0 0 × RFX 0 0 0 0 () 0 () × 0 () 0 0 0 () × () × 0 0 0 0 0 × × () × () 0 0 0 0 0 0 0 () 0 0 0 0 () 0 0 () 0 () () () 257 Found Category Score EValue BitScore Weighted tity Per- cent Iden- ×× 0× 0 0× 0 0× 0 0 0× 0 0 0 () × 0 0 0 () 0 0 0 () × 0× 0 () 0 0× 0 () 0 0 () × 0 0× () 0 0 0 0 0× 0 () 0 0 () 0 0 () 0 0 () () 0 () DE Info Name Component ID Local Gene (continued) thy Cil- iopa- action FOXJ1 Rfx2 FoxJ1 FOXJ1 Rfx2 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FOXJ1 FoxJ1 Table B.2.1: Localization Known Ciliary TF inter- Name Ensembl IDENSG00000233493 Source 3, 4, 5ENSG00000260703 2, 5ENSG00000214376 Gene tmem238 3, 4, 5ENSG00000119411 4, 5ENSG00000262176 ttc25ENSG00000140104 5 vstm5 4, 5ENSG00000166455 4, 5ENSG00000262841 bspryENSG00000186567 5 c14orf79 c11orf1 4, 5ENSG00000263046ENSG00000121083 c16orf46 5 CiliaENSG00000268103 Proteome 5ENSG00000176182 5 ceacam19 4, 5ENSG00000204899 cdk7 4, 5ENSG00000270122 cep95ENSG00000171136 RFX 5 RFX dynll2 4, 5ENSG00000262263 mypop fmr1ENSG00000178307 5 4, mzt1 5ENSG00000215912 4, RFX 5ENSG00000264528 ptprnENSG00000114654 rln3 3, 5ENSG00000132139 tmem11 3 styxl1ENSG00000011523 RFX 3 2, 3ENSG00000266766 RFX RFX ttc34ENSG00000167744 3ENSG00000164326 RFX tube1 3ENSG00000261264 ccdc48 3ENSG00000236183 3 RFX gas2l2ENSG00000118242 3ENSG00000260383 RFX 3ENSG00000260119 3 
notch2ENSG00000261694 3 RFXENSG00000204131 3 ntf4 RFX cartptENSG00000188522 3 RFX ENSG00000264524 RFX trpv5 3 ly6g6c 3 mreg heatr7a RFX Cilia Proteome trpv6 RFX adck5 RFX nhsl2 fam83g RFX mtmr11 RFX × Rfx2 RFX RFX Rfx2 0 Rfx2 Rfx2 × 0 Rfx2 × 0 Rfx2 × Rfx2 × Rfx2 0 Rfx2 0 0 Rfx2 Rfx2 0 0 Rfx2 () × 0 Rfx2 0 Rfx2 Rfx2 0 Rfx2 0 × 0 0 0 × () 0 0 0 × () 0 × () 0 () 0 0 × 0 0 × 0 × 0 0 () × × 0 0 0 0 × × () 0 0 0 0 × × 0 0 () 0 0 × 0 × × 0 0 () 0 0 0 () 0 0 0 0 0 0 0 0 0 () 0 0 0 0 0 0 () () 0 0 () () 0 0 () 0 () 0 () () () () () 258 Found Category Score EValue BitScore Weighted tity Per- cent Iden- DE Info Name Component ID Local Gene (continued) thy Cil- iopa- action Table B.2.1: Localization Known Ciliary TF inter- Name 2 dnah7c FoxJ1 × 0 0 0 () Ensembl IDENSG00000135052 SourceENSG00000132744 3ENSG00000168209 3ENSG00000265231 3ENSG00000174083 3ENSG00000107562 Gene 3ENSG00000088340 3ENSG00000261230 3 golm1ENSG00000152409 3ENSG00000164620 acy3 3ENSG00000143367 ddit4 3 akr1c2ENSG00000112787 3 pik3r6ENSG00000174100 3 cxcl12ENSG00000165259 3 fer1l4ENSG00000263773 3 cpsf1ENSG00000080200 3ENSG00000159445 3 jmyENSG00000228006 rell2 3ENSG00000174374 tuft1 3 fbrsl1ENSG00000155984 mrpl45 3ENSG00000261132 3ENSG00000263005 ankrd35 3 hdxENSG00000138041 3 crybg3ENSG00000260181 ac004017.1 3 them4ENSG00000005955 3 wbscr16ENSG00000151353 tmem185a 3ENSG00000164405 3ENSG00000143971 3ENSG00000147669 Rfx2 3 rassf1ENSG00000104147 3 smek2 Rfx2ENSG00000263151 3 Rfx2ENSG00000153923 Rfx2 surf4 ggnbp2 3 tmem18ENS- Rfx2 3MUSG00000095861 Rfx2 uqcrqENSG00000243587 Rfx2 etaa1ENSG00000272573 2 Rfx2 polr2kENSG00000137441 2ENSG00000100053 Rfx2 2 oip5 Rfx2ENSG00000102854 bcl7b 2 clca3p Rfx2ENSG00000109944 2 Rfx2 Rfx2ENSG00000184599 2 c6orf183ENSG00000142694 2 Rfx2ENSG00000205835 Rfx2 mustn1 2ENSG00000047457 Rfx2 Rfx2 2 fgfbp2ENSG00000189171 Rfx2 2 crybb3ENSG00000203756 Rfx2 2 Rfx2 c11orf63ENSG00000163993 msln 2 fam19a3ENSG00000113302 2ENSG00000178878 2 × eva1b Rfx2ENSG00000197747 Rfx2 2 gmnc × Rfx2 2 
s100a13 × × tmem244 0 Rfx2 Rfx2 cp Rfx2 × 0 × Cilia s100p Proteome 0 Rfx2 0 × il12b Rfx2 apold1 0 × 0 s100a10 Rfx2 0 0 × 0 Rfx2 0 × Rfx2 0 0 × Rfx2 × 0 0 FoxJ1 × 0 0 FoxJ1 0 0 × 0 0 0 × 0 FoxJ1 0 0 × 0 × () 0 FoxJ1 × Ciliome 0 0 0 () 0 0 × × 0 FoxJ1 0 () 0 FoxJ1 0 () 0 0 FoxJ1 0 0 () 0 () 0 0 0 × FoxJ1 × 0 0 () × 0 FoxJ1 0 0 0 () FoxJ1 0 0 × FoxJ1 × 0 FoxJ1 0 × () 0 0 0 FoxJ1 0 () 0 FoxJ1 () 0 × 0 0 () 0 () × FoxJ1 0 0 FoxJ1 0 × 0 0 0 () 0 () 0 × () × 0 () 0 0 × × 0 () 0 0 0 × () () 0 0 0 0 × 0 0 0 0 0 × 0 0 0 () () 0 0 () × 0 0 × 0 × 0 0 0 () () 0 () 0 × 0 0 0 0 0 × () 0 0 × 0 0 () × () 0 × 0 × 0 0 () 0 0 × 0 0 () () 0 () 0 × 0 × 0 0 () 0 0 0 0 () 0 0 0 0 () 0 0 0 0 0 () 0 () () 0 0 0 0 0 () 0 0 () () 0 () 0 () () () () () 259 Found Category Score EValue BitScore Weighted tity Per- cent Iden- DE Info Name Component ID Local Gene (continued) thy Cil- iopa- action Table B.2.1: Localization Known Ciliary TF inter- Name Ensembl IDENSG00000131096 SourceENSG00000130203 2ENSG00000105550 2ENSG00000141428 2ENSG00000132874 2ENSG00000197629 Gene 2ENSG00000133477 2NONE 2 pyy apoe c18orf21 fgf21 slc14a2 mpeg1 2 fam83f Cilia Proteome n/a FoxJ1 FoxJ1 FoxJ1 FoxJ1 FoxJ1 FoxJ1 FoxJ1 FoxJ1 × 0 × × × × 0 0 × 0 0 × 0 0 0 0 0 0 0 0 () 0 0 × 0 0 0 0 0 0 () 0 () () () () 0 () 0 () 260

Supplementary Table B.2.2

Table B.2.2: P. dumerilii transcripts (370) with significant alignments to more than one gene from the known ciliary gene set. For each P. dumerilii transcript with multiple matches, this table lists all known ciliary genes with a significant alignment, the current local annotation in PdumBase, and the differential expression (DE) result (upregulated ↑, downregulated ↓, no significant change in expression →, or not included in the DE analysis ×). The following BlastP alignment metrics are also shown: percent identity, E-value, and BitScore, as well as the Weighted Score, calculated taking into account all matches per transcript (see the Methods chapter).
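The Weighted Score is defined in the Methods chapter. As a reading aid only, the tabulated values are consistent with each match's BitScore divided by the sum of BitScores over all matches of the same transcript. The sketch below rests on that assumption; the function name `weighted_scores` is hypothetical and not part of the original analysis code.

```python
def weighted_scores(bitscores):
    """Return each match's share of the transcript's total BitScore.

    Assumption: this normalization reproduces the tabulated Weighted Score;
    the authoritative definition is the one given in the Methods chapter.
    """
    total = sum(bitscores.values())
    return {gene: round(score / total, 3) for gene, score in bitscores.items()}


# BitScores of the two matches listed for transcript comp225202_c0
matches = {"lpin2": 409.0, "lpin1": 414.0}
print(weighted_scores(matches))
```

For this transcript the sketch yields 0.497 for lpin2 and 0.503 for lpin1, which agrees with the Weighted Score column.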

Component ID    Local Annotation    DE Info
    Gene Name    Percent Identity    EValue    BitScore    Weighted Score

comp225202_c0    LPIN2_HUMAN    ↑
    lpin2    44.485    1.4E-126    409.0    0.497
    lpin1    44.299    1.6E-126    414.0    0.503
comp210301_c0    EF1A1_RAT    ↓
    eef1a2    83.478    0.0    783.0    0.418
    eef1a1    85.435    0.0    795.0    0.425
    tacc1    95.862    2.78E-88    293.0    0.157
comp225450_c0    ODO1_MACFA    ↓
    ogdh    66.129    0.0    1297.0    0.507
    ogdhl    64.853    0.0    1261.0    0.493
comp216162_c0    CDK1_CARAU    ↓
    lrrc34    82.011    1.14E-109    334.0    0.416
    cdk1    74.497    3.49E-168    468.0    0.584
comp225568_c0    PCKGM_HUMAN    ↓
    pck1    65.506    0.0    884.0    0.447
    pck2    68.285    0.0    914.0    0.463
    fam173a    58.696    2.24E-49    178.0    0.090
comp225931_c0    OCRL_HUMAN    ↑
    ocrl    42.608    0.0    692.0    0.416
    ttk    46.407    7.16E-83    286.0    0.172
    inpp5b    41.786    0.0    685.0    0.412
comp211543_c0    CCNA_PATVU    ↓
    ccno    31.658    7.44E-29    116.0    0.264
    ccna1    53.169    2.35E-106    323.0    0.736
comp220075_c1    KRP95_STRPU    ↑
    kif3b    70.943    0.0    736.0    0.543
    kif3c    58.348    0.0    620.0    0.457
comp222438_c2    UBP54_MOUSE    →
    usp53    50.464    8.96E-106    337.0    0.754
    crip1    62.667    2.89E-30    110.0    0.246
comp215880_c0    TBB_PARLI    ↑
    tubb2a    96.84    0.0    912.0    0.130
    tubb2b    97.291    0.0    916.0    0.130
    tubb3    93.431    0.0    796.0    0.113
    tubb4a    97.065    0.0    909.0    0.129
    tubb4b    98.423    0.0    925.0    0.132
    tubb    97.072    0.0    912.0    0.130
    tubb6    92.774    0.0    819.0    0.117
    tubb3    93.939    0.0    831.0    0.118
comp225942_c0    CP135_DANRE    ↑
    cep135    44.869    1.31E-128    425.0    0.340
    tmbim1    37.71    4.46E-122    411.0    0.329
    tsga10    37.172    2.67E-124    414.0    0.331
comp225514_c0    ELOV4_MACMU    ↑
    elovl2    44.04    1.14E-79    244.0    0.429
    elovl4    55.333    3.53E-111    325.0    0.571
comp225194_c1    PCNT_HUMAN    ↑
    akap9    27.429    5.98E-62    238.0    0.482
    pcnt    28.372    1.78E-67    256.0    0.518
comp224260_c1    CTND2_HUMAN    ↓
    pkp2    32.231    1.06E-68    245.0    0.317
    ctnnd2    48.288    1.25E-170    529.0    0.683
comp217367_c0    SSR2_HUMAN    ×
    mchr1    30.275    1.15E-45    162.0    0.315
    sstr5    35.946    1.05E-71    228.0    0.444
    ccr7    29.664    1.71E-31    124.0    0.241
comp220806_c4    CNGA2_HUMAN    →
    cnga2    55.12    0.0    690.0    0.311
    cnga4    51.562    0.0    545.0    0.246
    cngb1    35.0    1.44E-94    320.0    0.144
    cnga1    63.458    0.0    663.0    0.299
comp221645_c1    FLOT1_DROME    →
    cep350    67.672    8.53E-98    325.0    0.281
    ssx2ip    67.5    1.22E-89    288.0    0.249
    flot1    64.508    0.0    544.0    0.470
comp218378_c1    PSMD2_BOVIN    ↓
    [gene name missing]    60.082    6.54E-85    272.0    0.169
    psmd2    74.535    0.0    1337.0    0.831
comp214706_c0    MEIS2_HUMAN    ↓
    kiaa0513    62.069    2.47E-89    283.0    0.204
    meis2    55.992    8.78E-153    444.0    0.321
    [gene name missing]    56.522    5.29E-152    441.0    0.319
    fosb    66.463    1.0E-63    216.0    0.156
comp224244_c0    DCTN1_HUMAN    ↑
    bbip1    58.667    1.51E-29    113.0    0.102
    dctn1    42.921    0.0    991.0    0.898
comp220214_c0    ANR45_MOUSE    ↑
    ankrd45    33.945    1.91E-40    140.0    0.199
    ivns1abp    30.1    1.45E-84    277.0    0.395
    klhl25    32.321    3.27E-88    285.0    0.406
comp223047_c6    LRRD1_HUMAN    ↑
    lrrd1    32.138    1.64E-93    310.0    0.611
    lrriq4    31.198    1.27E-54    197.0    0.389
comp215232_c0    PRDM9_MOUSE    →
    gzf1    45.089    2.63E-65    216.0    0.481
    znf577    48.498    5.54E-74    233.0    0.519
comp223126_c0    TTC37_XENLA    →
    ttc37    36.376    0.0    765.0    0.736
    fam195b    32.234    5.25E-76    274.0    0.264
comp224049_c2    PLCB4_HUMAN    ↑
    plcb4    62.71    0.0    1516.0    0.672
    plcb3    37.627    0.0    739.0    0.328
comp224091_c0    U119B_DANRE    ↑
    unc119b    68.5    9.47E-98    286.0    0.517
    unc119    65.263    1.1E-90    267.0    0.483
comp223150_c0    ZN593_XENTR    →
    znf593    41.406    2.07E-29    105.0    0.081
    dennd5b    49.047    0.0    1196.0    0.919
comp218557_c0    RAB3_DROME    ↓
    rab3c    78.539    4.66E-128    361.0    0.504
    rab3a    76.256    4.2E-126    355.0    0.496

Table B.2.2: (continued)

Component ID    Local Annotation    DE Info
    Gene Name    Percent Identity    EValue    BitScore    Weighted Score

comp220221_c0    IPYR_DROME    ↓
    ccdc17    55.897    1.54E-61    213.0    0.270
    ppa2    50.0    6.06E-112    330.0    0.418
    ttc39b    55.963    1.14E-73    247.0    0.313
comp221653_c6    PTPRF_DANRE    →
    c19orf26, cbarp    27.66    3.67E-8    50.8    0.012
    ptprd    55.14    0.0    2040.0    0.501
    ptprs    54.474    0.0    1985.0    0.487
comp218533_c0    CPNE3_HUMAN    →
    trim65    57.576    9.81E-15    75.9    0.117
    cpne2    53.259    0.0    573.0    0.883
comp221738_c2    PLK2_RAT    ↓
    plk4    44.048    2.15E-67    239.0    0.309
    plk3    42.925    0.0    534.0    0.691
comp220044_c0    FIGL1_HUMAN    ↓
    fignl1    68.5    0.0    536.0    0.572
    fign    34.487    1.32E-128    401.0    0.428
comp219468_c0    CDKL5_MOUSE    ↑
    lrrc23    73.401    2.98E-158    484.0    0.334
    cdkl5    73.401    8.15E-158    488.0    0.337
    syt7    73.401    7.07E-151    478.0    0.330
comp218254_c1    MPIP_DROME    ↓
    cdc25a    35.774    6.42E-65    221.0    0.349
    cdc25c    39.649    8.55E-54    197.0    0.311
    cdc25b    35.849    2.04E-62    216.0    0.341
comp216517_c0    TBA2_PATVU    ↑
    ddx59    84.571    2.51E-96    286.0    0.460
    dcun1d5    88.083    5.45E-116    336.0    0.540
comp225793_c1    SAE2_XENTR    ↓
    acyp1    62.63    3.57E-126    375.0    0.326
    uba2    60.772    0.0    775.0    0.674
comp225930_c0    F179B_MOUSE    ↑
    fam179b    42.264    1.19E-104    371.0    0.570
    fam179a    33.396    1.13E-77    280.0    0.430
comp213518_c3    APC10_HUMAN    ↓
    anapc10    66.286    3.79E-87    260.0    0.518
    phospho2    71.812    1.51E-77    242.0    0.482
comp215635_c1    RSH4A_HUMAN    ↑
    rsph4a    67.633    9.74E-85    268.0    0.518
    rsph6a    64.904    2.5E-77    249.0    0.482
comp218102_c0    ODO2_BOVIN    →
    ift43    52.091    3.71E-58    197.0    0.325
    dlst    81.513    8.67E-140    410.0    0.675
comp221437_c1    KPYM_HUMAN    ↓
    pkm    65.333    0.0    716.0    0.463
    nubp2    64.523    0.0    649.0    0.420
    fam83d    56.688    7.17E-49    182.0    0.118
comp225158_c0    G6PI_CRIGR    →
    gpi1    68.603    0.0    827.0    0.594
    adarb1    49.883    1.08E-120    367.0    0.264
    ogfr    54.268    6.05E-54    198.0    0.142
comp208104_c0    FGFR2_DANRE    ↑
    styk1    32.812    1.8E-44    166.0    0.182
    fgfr1    45.527    0.0    608.0    0.668
    ppargc1a    47.143    1.87E-32    136.0    0.149
comp225504_c1    TATD1_XENLA    →
    lrrc46    67.059    8.07E-34    122.0    0.249
    tatdn1    58.904    1.43E-127    368.0    0.751
comp226263_c0    DYHC1_MOUSE    ×
    dync1h1    73.369    0.0    7181.0    0.958
    saxo2, fam154b    69.298    9.2E-96    316.0    0.042
comp205476_c0    MXI1_RAT    ↑
    mxi1    46.154    1.62E-48    158.0    0.506
    mxd1    47.115    8.69E-47    154.0    0.494
comp212071_c1    CALM_LOCMI    ↓
    calm3    98.0    1.25E-103    301.0    0.334
    calm1    98.0    4.51E-104    294.0    0.326
    calm2    97.987    7.1E-106    306.0    0.340
comp225070_c0    GLRX2_MOUSE    ↓
    glrx2    52.128    7.0E-34    117.0    0.109
    tbc1d9    54.292    0.0    954.0    0.891
comp222070_c4    TBC12_MOUSE    ↑
    rab29    55.446    8.52E-72    241.0    0.306
    tbc1d14    52.799    0.0    546.0    0.694
comp221947_c0    PHB_CHICK    ↓
    phb    73.333    1.07E-148    417.0    0.291
    mst1    74.429    1.47E-111    347.0    0.242
    asb8    75.49    2.16E-109    335.0    0.234
    alcam    76.238    6.39E-108    332.0    0.232
comp215825_c0    CSN5_DANRE    ↓
    cops5    85.538    0.0    603.0    0.661
    slc22a23    88.05    5.1E-99    309.0    0.339
comp224142_c1    KAD1_CHICK    ↑
    ak1    63.776    5.14E-91    266.0    0.548
    ak5    31.787    4.75E-62    219.0    0.452
comp223032_c1    DPYS_RAT    →
    dpysl2    55.398    0.0    667.0    0.505
    crmp1    52.659    0.0    655.0    0.495
comp217357_c0    FYN_XIPHE    ↓
    yes1    62.556    0.0    592.0    0.503
    src    62.138    0.0    584.0    0.497
comp205339_c1    SESN1_XENLA    →
    sesn3    53.617    2.84E-161    477.0    0.470
    sesn1    50.633    0.0    537.0    0.530
comp220480_c1    CC153_RAT    ↑
    ccdc153    27.841    8.53E-22    89.0    0.662
    prr18    32.039    4.34E-5    45.4    0.338
comp213251_c0    RHO1_DROME    ↓
    rhoa    93.37    3.81E-126    354.0    0.639
    rnd3    50.562    7.89E-65    200.0    0.361
comp223876_c0    TRC_DROPS    →
    stk38l    75.885    0.0    722.0    0.443
    mmp23b    74.38    2.45E-52    183.0    0.112
    stk38    74.409    0.0    724.0    0.444
comp218760_c0    IF122_XENTR    ↑
    disc1    71.569    2.65E-42    167.0    0.054
    ift122    68.377    0.0    1746.0    0.567
    nme7    62.252    0.0    589.0    0.191
    hnrpdl    64.846    2.24E-125    400.0    0.130
    snrpg    55.0    3.89E-52    179.0    0.058
comp198929_c1    MAK_HUMAN    ↑
    mak    44.7    8.76E-166    494.0    0.509
    ick    69.255    2.49E-162    477.0    0.491
comp222921_c1    T11L1_HUMAN    ↑
    tmem107    46.988    2.36E-18    85.1    0.218
    tcp11l2    38.022    9.0E-98    306.0    0.782
comp219969_c0    STX_APLCA    →
    stx3    61.111    2.57E-101    298.0    0.718
    stx19    30.736    5.29E-31    117.0    0.282
comp223573_c0    RAB5C_PONAB    ↑
    rab17    42.13    5.04E-54    172.0    0.104
    efhb    41.623    4.86E-152    453.0    0.274
    rab5c    85.253    7.1E-134    379.0    0.229
    klhdc9    45.238    4.8E-88    293.0    0.177
    pih1d2    41.637    9.89E-62    213.0    0.129

Table B.2.2: (continued)

Component ID    Local Annotation    DE Info
    Gene Name    Percent Identity    EValue    BitScore    Weighted Score

comp223573_c0    RAB5C_PONAB    ↑    (continued)
    prr11    93.243    1.66E-38    142.0    0.086
comp224054_c0    MAGI2_RAT    →
    magi1    42.084    2.35E-163    531.0    0.522
    magi3    40.111    4.82E-147    487.0    0.478
comp222191_c0    NEK1_MOUSE    →
    nek1    57.209    5.47E-162    491.0    0.411
    nek5    59.864    3.35E-127    384.0    0.321
    nek3    52.143    5.57E-105    320.0    0.268
comp224627_c0    GLYG_RAT    →
    ribc1    58.077    9.93E-106    323.0    0.879
    ccdc160    28.571    8.67E-5    44.3    0.121
comp222032_c2    FBX36_MOUSE    ↓
    fbxo36    44.809    9.43E-46    153.0    0.152
    frem3    32.338    0.0    856.0    0.848
comp226133_c0    FBW10_HUMAN    ↑
    fbxw10    33.855    2.73E-127    418.0    0.650
    frmpd2    33.136    4.18E-59    225.0    0.350
comp223570_c1    SEM5A_RAT    ↓
    sema5b    42.029    0.0    653.0    0.500
    ric8b    42.029    0.0    654.0    0.500
comp224199_c0    BBS4_HUMAN    ↑
    bbs4    66.9    0.0    600.0    0.598
    gyg1    66.786    3.94E-136    404.0    0.402
comp221005_c1    SODC_DROVI    ↓
    sod1    65.132    2.03E-61    186.0    0.657
    sod3    41.549    4.84E-25    97.1    0.343
comp223887_c2    SSBP3_MOUSE    ↑
    c4orf45    59.524    5.09E-56    199.0    0.097
    ssbp2    53.093    6.92E-76    240.0    0.117
    dlg1    60.59    0.0    1054.0    0.512
    dlg4    50.955    0.0    565.0    0.275
comp222146_c0    CBPC2_DANRE    →
    agbl2    49.415    1.66E-141    454.0    0.499
    agbl3    49.77    7.71E-146    456.0    0.501
comp222237_c0    SPY2_MOUSE    ↓
    spry2    38.525    6.08E-17    80.9    0.551
    spry1    36.082    7.24E-12    65.9    0.449
comp220257_c1    MCRS1_HUMAN    ↓
    mcrs1    65.789    0.0    544.0    0.795
    tex26    57.639    5.96E-38    140.0    0.205
comp220813_c0    FLNA_MOUSE    ↑
    flna    54.415    0.0    1301.0    0.572
    flnc    50.849    0.0    972.0    0.428
comp221359_c0    CCD42_NEMVE    ↑
    ccdc42b, cfap73    37.013    2.98E-65    208.0    0.298
    ccdc42    44.04    7.93E-73    228.0    0.327
    nipsnap1    52.033    5.42E-87    262.0    0.375
comp224720_c2    KIF19_XENLA    ↑
    kif19    47.306    0.0    745.0    0.536
    rpl17    52.713    0.0    646.0    0.464
comp224090_c0    STRUM_HUMAN    →
    clic3    80.0    2.56E-88    301.0    0.152
    kiaa0196    67.845    0.0    1681.0    0.848
comp225916_c0    SCLT1_HUMAN    →
    sclt1    31.571    1.04E-86    287.0    0.394
    ccdc22    39.137    9.46E-147    441.0    0.606
comp221455_c0    RB11B_DIPOM    ↓
    rab11a    84.793    6.11E-133    372.0    0.500
    rab11b    86.047    1.36E-132    372.0    0.500
comp218356_c0    NUDT9_HUMAN    ↑
    gpbar1    54.589    2.12E-72    235.0    0.414
    nudt9    54.181    5.78E-113    332.0    0.586
comp226028_c0    RIBC2_HUMAN    ↑
    ribc2    46.4    2.57E-101    305.0    0.093
    dap    65.217    1.59E-99    335.0    0.102
    smc1b    47.753    0.0    1134.0    0.345
    smc1a    61.223    0.0    1516.0    0.461
comp218942_c3    PDZD7_PONAB    ↑
    pdzd7    45.299    1.94E-58    220.0    0.081
    ush1c    27.917    4.31E-18    90.5    0.033
    c6orf165, rp3-382i10.7    56.688    0.0    740.0    0.272
    c6orf165, cfap206    56.688    0.0    739.0    0.271
    tapt1    54.649    0.0    568.0    0.208
    tsn    69.076    1.36E-127    367.0    0.135
comp223301_c5    ZIC4_XENLA    →
    gli2    50.303    3.14E-47    179.0    0.252
    gli3    48.214    2.49E-46    177.0    0.249
    zic2    49.673    2.8E-116    355.0    0.499
comp197812_c0    RL17_PHLPP    →
    hyls1    78.986    7.83E-76    240.0    0.246
    hnf1b    77.439    4.04E-90    281.0    0.289
    fam228b    70.536    9.78E-55    171.0    0.176
    lrrc69    77.439    2.36E-93    282.0    0.290
comp224798_c0    POC1A_XENTR    ↑
    poc1a    75.0    1.92E-179    505.0    0.534
    poc1b    60.843    5.26E-153    441.0    0.466
comp222326_c0    RFX3_HUMAN    ↑
    cep41    77.326    1.29E-85    290.0    0.118
    ngfr    68.595    1.96E-49    182.0    0.074
    rfx3    67.14    0.0    674.0    0.275
    rfx2    51.433    0.0    652.0    0.266
    rfx1    65.051    0.0    652.0    0.266
comp219230_c0    DLRB2_MOUSE    ×
    dynlrb2    59.574    6.5E-40    128.0    0.522
    dynlrb1    52.128    1.52E-35    117.0    0.478
comp223439_c0    TUT7_HUMAN    →
    myb    58.073    7.8E-135    453.0    0.474
    zcchc11    54.957    1.07E-152    503.0    0.526
comp221634_c1    FBN2_MOUSE    ×
    fbn1    50.879    0.0    1239.0    0.581
    ltbp1    34.701    3.77E-150    490.0    0.230
    ltbp2    32.722    6.77E-117    403.0    0.189
comp225965_c0    QN1_MOUSE    ↑
    kiaa1009, cep162    33.936    1.76E-136    454.0    0.870
    kiaa1377, cep126    28.723    2.61E-11    67.8    0.130
comp206273_c0    ACT2_CAEEL    ↑
    acta2    94.709    0.0    754.0    0.112
    actc1    94.974    0.0    753.0    0.112
    [gene name missing]    97.6    0.0    775.0    0.115
    [gene name missing]    94.43    0.0    754.0    0.112
    actbl2    90.716    0.0    732.0    0.109
    acta1    94.444    0.0    753.0    0.112
    actb    98.667    0.0    775.0    0.115
    cep170    91.057    2.21E-78    254.0    0.038
    ube2e3    93.6    2.28E-175    492.0    0.073
    elmod1    99.02    8.73E-66    228.0    0.034
    atf5    90.517    6.19E-154    443.0    0.066
comp216634_c1    TXND3_CIOIN    ↑
    ift80, nme8    35.63    3.41E-95    300.0    0.396

Table B.2.2: (continued)

Component ID    Local Annotation    DE Info
    Gene Name    Percent Identity    EValue    BitScore    Weighted Score

comp216634_c1    TXND3_CIOIN    ↑    (continued)
    c8orf34    48.372    8.63E-53    185.0    0.244
    nme9    47.937    5.5E-88    273.0    0.360
comp215472_c0    KAPCA_MOUSE    →
    prkaca    86.648    0.0    620.0    0.266
    prkacb    86.364    0.0    619.0    0.266
    ccdc150    86.275    1.51E-128    375.0    0.161
    ccdc148    82.323    1.27E-120    348.0    0.149
    cdh18    83.732    3.88E-121    369.0    0.158
comp225021_c1    TTC40_HUMAN    ↑
    cfap46, ttc40    33.735    0.0    1360.0    0.842
    pacrgl    61.856    6.21E-82    256.0    0.158
comp225321_c0    GNAI_LYMST    ↓
    gnai2    81.18    0.0    606.0    0.327
    gnai3    81.127    0.0    616.0    0.332
    gnai1    83.944    0.0    631.0    0.341
comp226046_c0    SPTCB_DROME    ↓
    ccdc30    74.444    1.47E-39    152.0    0.064
    [gene name missing]    52.044    0.0    2236.0    0.936
comp212287_c5    ACT2_CAEEL    ↑
    b9d1    92.614    4.08E-112    342.0    0.564
    ofcc1    81.41    5.17E-80    264.0    0.436
comp222751_c0    AP3B2_HUMAN    ↑
    fam149a    86.869    2.76E-109    355.0    0.266
    ap3b2    74.143    0.0    982.0    0.734
comp223690_c0    MRP1_CHICK    →
    abcc3    51.054    0.0    1551.0    0.487
    abcc1    52.375    0.0    1637.0    0.513
comp225832_c0    EMAL6_HUMAN    ↑
    eml6    62.682    0.0    2203.0    0.511
    eml5    59.456    0.0    2108.0    0.489
comp218451_c1    CN16B_RAT    →
    ac002472.13, lrrc74b    40.606    3.11E-77    253.0    0.476
    lrrc74a    37.028    5.22E-85    278.0    0.524
comp217393_c2    H2B_PLADU    →
    hist1h2bd    94.565    7.16E-61    182.0    0.167
    hist1h2bc    94.565    6.78E-61    182.0    0.167
    hist1h2bj    94.565    1.35E-60    181.0    0.166
    hist2h2be    95.652    4.56E-61    182.0    0.167
    hist1h2bh    94.565    1.28E-60    182.0    0.167
    hist1h2bb    95.652    5.21E-61    182.0    0.167
comp226244_c0    CL055_HUMAN    ↑
    cfap54    38.517    0.0    786.0    0.676
    slc25a29    38.5    9.29E-113    376.0    0.324
comp223601_c2    ARL3_XENLA    ↑
    arl3    80.874    1.14E-91    265.0    0.812
    arl9    37.363    2.27E-12    61.2    0.188
comp220192_c1    TBA1C_MOUSE    ↑
    tuba1a    96.403    0.0    851.0    0.100
    tuba1c    96.154    0.0    843.0    0.099
    [gene name missing]    93.961    0.0    794.0    0.094
    tuba1b    95.923    0.0    816.0    0.096
    tuba3c    96.875    0.0    851.0    0.100
    tuba3d    96.875    0.0    848.0    0.100
    tuba3e    95.433    0.0    804.0    0.095
    tuba8    92.857    0.0    765.0    0.090
    cdc20b    95.536    3.81E-150    427.0    0.050
    elovl6    93.687    0.0    760.0    0.090
    kank3    96.575    4.88E-98    310.0    0.037
    popdc2    81.746    1.39E-147    419.0    0.049
comp218142_c0    RS3_PIG    →
    rps3    88.933    2.01E-155    459.0    0.684
    cep44    95.146    4.75E-62    212.0    0.316
comp225643_c0    ARL6_MOUSE    →
    arl6    63.043    1.98E-91    266.0    0.787
    gbp6    24.737    8.12E-13    72.0    0.213
comp222216_c0    TAD2B_XENLA    →
    mlf1    50.0    5.25E-41    149.0    0.338
    tada2b    48.726    5.95E-96    292.0    0.662
comp207478_c0    KCNAW_DROME    ×
    kcnc1    49.024    1.01E-138    418.0    0.503
    kcnc4    48.517    1.88E-130    413.0    0.497
comp217181_c1    WDR54_MOUSE    →
    wdr54    38.166    7.44E-79    244.0    0.680
    stab1    36.842    9.25E-28    115.0    0.320
comp224783_c0    EXC6B_HUMAN    →
    exoc6    53.752    0.0    897.0    0.485
    exoc6b    56.953    0.0    952.0    0.515
comp225307_c0    PTK7_CHICK    ↓
    ptk7    36.731    0.0    683.0    0.938
    fbrs    28.311    3.31E-4    45.4    0.062
comp217334_c0    TFIP8_BOVIN    →
    tnfaip8l2    42.246    7.09E-51    162.0    0.466
    tnfaip8l3    45.161    1.11E-59    186.0    0.534
comp222626_c0    RBL2_HUMAN    ↓
    rpgrip1    52.389    0.0    621.0    0.393
    rbl2    48.297    0.0    960.0    0.607
comp217004_c1    CEL3A_XENLA    ↓
    celf5    57.398    3.47E-132    388.0    0.497
    celf3    57.037    1.7E-134    393.0    0.503
comp225810_c0    GRP3_HUMAN    ↑
    rasgrp4    40.541    5.42E-139    434.0    0.479
    rasgrp2    42.334    6.89E-156    472.0    0.521
comp224855_c2    SEM1A_SCHAM    ↓
    sema4d    32.996    2.09E-66    241.0    0.411
    sema6a    37.828    1.41E-103    346.0    0.589
comp221854_c1    NOTUM_MOUSE    ↓
    ccdc41, cep83    55.102    6.55E-75    255.0    0.280
    prr5    55.963    4.4E-76    259.0    0.285
    notum    53.433    5.95E-132    396.0    0.435
comp220666_c7    K6PF_DROME    →
    ptges3    59.016    4.04E-94    302.0    0.187
    pfkp    61.619    0.0    975.0    0.605
    polr2f    74.762    1.25E-112    334.0    0.207
comp211886_c2    ANKS3_HUMAN    ×
    anks3    46.881    6.34E-146    437.0    0.574
    proser3    39.908    3.43E-24    106.0    0.139
    slc19a1    63.253    3.68E-64    218.0    0.286
comp225327_c0    TTC18_HUMAN    ↑
    cfap70, ttc18    41.74    0.0    825.0    0.597
    wdr92    73.521    0.0    556.0    0.403
comp225767_c0    FHAD1_MOUSE    ↑
    fhad1    29.123    6.31E-60    227.0    0.749
    ccdc27    23.762    8.18E-14    75.9    0.251
comp212818_c0    RSSA_XENLA    →
    tppp2    72.587    1.94E-119    348.0    0.492
    rpsa    66.238    2.98E-125    360.0    0.508
comp214782_c7    ELMD3_BOVIN    ↑
    elmod3    51.111    1.46E-84    278.0    0.825
    elmod2    28.571    2.22E-9    58.9    0.175
comp224645_c0    TT21B_HUMAN    →
    ttc21b    56.031    0.0    1566.0    0.553
    ttc21a    47.852    0.0    1264.0    0.447

Table B.2.2: (continued)

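Component ID | Gene Name | Local Annotation | DE Info | Percent Identity | EValue | BitScore | Weighted Score
comp225732_c1 | dkfzp686j19100, pkd1l2 | IMA3_MOUSE | ↑ | 68.367 | 2.05E-81 | 263.0 | 0.250
comp225732_c1 | kpna4 | IMA3_MOUSE | ↑ | 75.586 | 0.0 | 788.0 | 0.750
comp224506_c0 | cep112 | CE112_HUMAN | ↑ | 42.694 | 0.0 | 700.0 | 0.632
comp224506_c0 | nrg1 | CE112_HUMAN | ↑ | 42.461 | 5.42E-127 | 408.0 | 0.368
comp220641_c3 | pax2 | PAX2A_DANRE | ↓ | 63.514 | 6.62E-82 | 263.0 | 0.510
comp220641_c3 | pax8 | PAX2A_DANRE | ↓ | 63.592 | 1.53E-77 | 253.0 | 0.490
comp224497_c1 | card11 | ITSN1_MOUSE | ↓ | 40.86 | 4.35E-31 | 134.0 | 0.127
comp224497_c1 | itsn2 | ITSN1_MOUSE | ↓ | 47.709 | 0.0 | 925.0 | 0.873
comp203998_c0 | ran | RAN_CHICK | ↓ | 89.45 | 3.62E-146 | 407.0 | 0.513
comp203998_c0 | spg20 | RAN_CHICK | ↓ | 90.769 | 5.19E-127 | 386.0 | 0.487
comp225399_c0 | adra2a | 5HT1R_DROME | ↓ | 33.661 | 2.24E-65 | 218.0 | 0.493
comp225399_c0 | adra2c | 5HT1R_DROME | ↓ | 33.079 | 1.02E-67 | 224.0 | 0.507
comp225741_c0 | cep250 | CROCC_MOUSE | ↑ | 28.008 | 1.25E-106 | 382.0 | 0.109
comp225741_c0 | crocc | CROCC_MOUSE | ↑ | 42.93 | 0.0 | 1334.0 | 0.381
comp225741_c0 | adam22 | CROCC_MOUSE | ↑ | 38.954 | 1.77E-123 | 434.0 | 0.124
comp225741_c0 | ac104809.3 | CROCC_MOUSE | ↑ | 32.619 | 1.21E-163 | 547.0 | 0.156
comp225741_c0 | recql | CROCC_MOUSE | ↑ | 41.195 | 0.0 | 802.0 | 0.229
comp214879_c0 | dynll2 | NA | ↑ | 35.404 | 9.54E-16 | 77.4 | 0.151
comp214879_c0 | cfap99 | NA | ↑ | 38.168 | 6.61E-144 | 434.0 | 0.849
comp223883_c0 | cep131, azi1 | AZI1_HUMAN | → | 35.944 | 0.0 | 572.0 | 0.749
comp223883_c0 | fam83a | AZI1_HUMAN | → | 47.111 | 1.85E-50 | 192.0 | 0.251
comp224096_c0 | slc2a10 | MYCT_HUMAN | → | 29.891 | 5.27E-54 | 194.0 | 0.236
comp224096_c0 | slc2a13 | MYCT_HUMAN | → | 55.801 | 0.0 | 629.0 | 0.764
comp208958_c0 | ift27 | IFT27_HUMAN | ↑ | 50.27 | 1.55E-65 | 205.0 | 0.569
comp208958_c0 | rbbp8 | IFT27_HUMAN | ↑ | 50.746 | 4.8E-43 | 155.0 | 0.431
comp221265_c0 | sun2 | ACK1_MOUSE | → | 68.86 | 6.25E-101 | 348.0 | 0.353
comp221265_c0 | tnk2 | ACK1_MOUSE | → | 61.477 | 0.0 | 638.0 | 0.647
comp213941_c1 | bbs10 | TCPE_RAT | ↑ | 19.676 | 2.05E-9 | 61.2 | 0.067
comp213941_c1 | cct5 | TCPE_RAT | ↑ | 74.535 | 0.0 | 857.0 | 0.933
comp204489_c0 | akr1a1 | AK1A1_PIG | × | 52.778 | 6.49E-101 | 309.0 | 0.504
comp204489_c0 | akr1b1 | AK1A1_PIG | × | 56.631 | 4.96E-103 | 304.0 | 0.496
comp219537_c3 | tbata | GBB_PINFU | ↓ | 84.861 | 2.51E-162 | 468.0 | 0.340
comp219537_c3 | gnb2 | GBB_PINFU | ↓ | 85.924 | 0.0 | 628.0 | 0.456
comp219537_c3 | ybey | GBB_PINFU | ↓ | 86.184 | 1.72E-94 | 282.0 | 0.205
comp224941_c0 | lrrc36 | RBGP1_MOUSE | ↑ | 62.5 | 1.59E-32 | 123.0 | 0.139
comp224941_c0 | rabgap1l | RBGP1_MOUSE | ↑ | 50.495 | 0.0 | 764.0 | 0.861
comp223044_c0 | fam58a | FA58B_MOUSE | → | 50.215 | 3.15E-84 | 252.0 | 0.710
comp223044_c0 | afap1 | FA58B_MOUSE | → | 56.471 | 4.91E-26 | 103.0 | 0.290
comp222009_c0 | abca8 | ABCA5_HUMAN | → | 32.94 | 0.0 | 811.0 | 0.390
comp222009_c0 | fdps | ABCA5_HUMAN | → | 34.228 | 1.61E-102 | 352.0 | 0.169
comp222009_c0 | abca5 | ABCA5_HUMAN | → | 34.584 | 0.0 | 919.0 | 0.441
comp224838_c0 | spag16 | ANK2_MOUSE | → | 63.324 | 0.0 | 1819.0 | 0.158
comp224838_c0 | abracl | ANK2_MOUSE | → | 62.677 | 0.0 | 1387.0 | 0.120
comp224838_c0 | ak9, akd1 | ANK2_MOUSE | → | 63.674 | 0.0 | 1870.0 | 0.162
comp224838_c0 | | ANK2_MOUSE | → | 60.596 | 0.0 | 1895.0 | 0.164
comp224838_c0 | | ANK2_MOUSE | → | 59.841 | 0.0 | 1925.0 | 0.167
comp224838_c0 | ccdc24 | ANK2_MOUSE | → | 53.886 | 2.38E-61 | 219.0 | 0.019
comp224838_c0 | ncam1 | ANK2_MOUSE | → | 63.081 | 0.0 | 1524.0 | 0.132
comp224838_c0 | il17re | ANK2_MOUSE | → | 77.037 | 6.17E-59 | 221.0 | 0.019
comp224838_c0 | cpsf7 | ANK2_MOUSE | → | 73.783 | 1.89E-123 | 408.0 | 0.035
comp224838_c0 | pygo2 | ANK2_MOUSE | → | 69.0 | 9.04E-40 | 157.0 | 0.014
comp224838_c0 | asb7 | ANK2_MOUSE | → | 33.891 | 3.75E-29 | 118.0 | 0.010
comp217035_c1 | | AP2E_XENLA | ↓ | 54.042 | 1.3E-120 | 358.0 | 0.706
comp217035_c1 | zc2hc1c | AP2E_XENLA | ↓ | 83.721 | 5.76E-40 | 149.0 | 0.294
comp225083_c0 | leprel1 | P3H1_MOUSE | → | 31.298 | 8.86E-99 | 326.0 | 0.399
comp225083_c0 | leprel4 | P3H1_MOUSE | → | 26.933 | 3.85E-44 | 166.0 | 0.203
comp225083_c0 | leprel2 | P3H1_MOUSE | → | 32.682 | 2.45E-98 | 325.0 | 0.398
comp225816_c1 | wars | SYWC_BOVIN | ↓ | 69.479 | 0.0 | 608.0 | 0.687
comp225816_c1 | paqr3 | SYWC_BOVIN | ↓ | 61.86 | 3.59E-87 | 277.0 | 0.313
comp225815_c0 | evc | TRIPB_HUMAN | ↓ | 46.667 | 2.16E-5 | 48.5 | 0.111
comp225815_c0 | trip11 | TRIPB_HUMAN | ↓ | 31.919 | 6.85E-109 | 388.0 | 0.889
comp219526_c0 | c11orf49 | PP2BA_RAT | → | 86.897 | 7.91E-90 | 282.0 | 0.247
comp219526_c0 | ppp3ca | PP2BA_RAT | → | 82.881 | 0.0 | 860.0 | 0.753
comp215393_c0 | plekha6 | CERS6_MOUSE | ↑ | 36.471 | 3.58E-32 | 130.0 | 0.314
comp215393_c0 | cers3 | CERS6_MOUSE | ↑ | 39.211 | 7.87E-93 | 284.0 | 0.686
comp218345_c0 | kiaa0664 | CLU_AEDAE | → | 70.995 | 0.0 | 924.0 | 0.498
comp218345_c0 | rreb1 | CLU_AEDAE | → | 66.616 | 0.0 | 932.0 | 0.502
comp223470_c0 | pfkfb3 | F26_LITCT | → | 62.529 | 0.0 | 577.0 | 0.504
comp223470_c0 | pfkfb2 | F26_LITCT | → | 56.907 | 0.0 | 567.0 | 0.496
comp225224_c0 | lrtomt | ACSL1_CAVPO | ↓ | 43.413 | 6.02E-93 | 302.0 | 0.177
comp225224_c0 | acsl6 | ACSL1_CAVPO | ↓ | 47.801 | 0.0 | 696.0 | 0.408
comp225224_c0 | acsl1 | ACSL1_CAVPO | ↓ | 50.299 | 0.0 | 709.0 | 0.415
comp224768_c1 | mxd3 | CTL4_DANRE | → | 35.424 | 3.01E-40 | 157.0 | 0.115
comp224768_c1 | slc44a2 | CTL4_DANRE | → | 43.448 | 0.0 | 580.0 | 0.426
comp224768_c1 | slc44a4 | CTL4_DANRE | → | 45.205 | 0.0 | 625.0 | 0.459
comp221299_c3 | aurka | AURKB_BOVIN | ↓ | 67.0 | 5.86E-152 | 432.0 | 0.880
comp221299_c3 | fam174a | AURKB_BOVIN | ↓ | 30.323 | 9.5E-11 | 58.9 | 0.120
comp214110_c0 | gmpr | GMPR2_HUMAN | ↓ | 71.884 | 0.0 | 540.0 | 0.487
comp214110_c0 | gmpr2 | GMPR2_HUMAN | ↓ | 75.942 | 0.0 | 569.0 | 0.513
comp225858_c0 | kiaa2012, ac079354.1 | FGD6_MOUSE | ↑ | 49.805 | 8.08E-69 | 260.0 | 0.362
comp225858_c0 | fgd6 | FGD6_MOUSE | ↑ | 44.755 | 5.03E-135 | 459.0 | 0.638
comp217812_c0 | pard6a | GNPI2_XENTR | → | 71.875 | 1.1E-152 | 440.0 | 0.311
comp217812_c0 | kiaa0586 | GNPI2_XENTR | → | 62.36 | 1.56E-70 | 238.0 | 0.168
comp217812_c0 | ap4e1 | GNPI2_XENTR | → | 76.048 | 2.32E-90 | 286.0 | 0.202
comp217812_c0 | gnpda1 | GNPI2_XENTR | → | 77.528 | 4.73E-162 | 452.0 | 0.319
comp219546_c0 | akap14 | AKA14_RAT | ↑ | 34.646 | 5.0E-24 | 94.4 | 0.245
comp219546_c0 | adck4 | AKA14_RAT | ↑ | 52.778 | 9.94E-96 | 291.0 | 0.755
comp216433_c2 | bbs5 | BBS5_MOUSE | → | 78.042 | 0.0 | 567.0 | 0.521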

Table B.2.2: (continued)

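Component ID | Gene Name | Local Annotation | DE Info | Percent Identity | EValue | BitScore | Weighted Score
comp216433_c2 | rp11-724o16.1, bbs5 | BBS5_MOUSE | → | 76.803 | 0.0 | 521.0 | 0.479
comp217993_c0 | spata7 | AGM1_HUMAN | ↓ | 58.865 | 1.77E-48 | 164.0 | 0.564
comp217993_c0 | ttc22 | AGM1_HUMAN | ↓ | 69.412 | 7.01E-35 | 127.0 | 0.436
comp225340_c0 | arhgdia | PSA_MOUSE | → | 60.819 | 2.2E-63 | 226.0 | 0.165
comp225340_c0 | npepps | PSA_MOUSE | → | 59.804 | 0.0 | 1141.0 | 0.835
comp223645_c0 | spice1 | 6PGD_HUMAN | ↓ | 81.25 | 7.25E-92 | 284.0 | 0.200
comp223645_c0 | pgd | 6PGD_HUMAN | ↓ | 74.486 | 0.0 | 771.0 | 0.543
comp223645_c0 | ranbp3l | 6PGD_HUMAN | ↓ | 80.392 | 3.95E-118 | 364.0 | 0.257
comp215466_c0 | zmynd10 | ZMY10_HUMAN | → | 46.279 | 6.73E-143 | 416.0 | 0.322
comp215466_c0 | ppp2ca | ZMY10_HUMAN | → | 92.557 | 0.0 | 613.0 | 0.474
comp215466_c0 | aplp1 | ZMY10_HUMAN | → | 79.195 | 8.58E-80 | 263.0 | 0.204
comp223433_c1 | scnn1g | ASIC5_MOUSE | ↑ | 27.301 | 3.32E-40 | 155.0 | 0.470
comp223433_c1 | asic3 | ASIC5_MOUSE | ↑ | 26.923 | 1.19E-47 | 175.0 | 0.530
comp164374_c0 | fank1 | FANK1_HUMAN | ↑ | 49.851 | 1.27E-116 | 342.0 | 0.753
comp164374_c0 | layn | FANK1_HUMAN | ↑ | 42.254 | 5.08E-27 | 112.0 | 0.247
comp222126_c2 | c11orf70 | S43A3_HUMAN | ↓ | 31.469 | 8.8E-13 | 70.9 | 0.090
comp222126_c2 | ccdc92 | S43A3_HUMAN | ↓ | 34.877 | 5.54E-62 | 216.0 | 0.276
comp222126_c2 | slc43a2 | S43A3_HUMAN | ↓ | 25.501 | 5.56E-47 | 173.0 | 0.221
comp222126_c2 | slc43a3 | S43A3_HUMAN | ↓ | 35.674 | 1.12E-104 | 324.0 | 0.413
comp221707_c0 | shank2 | SHAN3_HUMAN | ↑ | 54.521 | 2.2E-120 | 423.0 | 0.485
comp221707_c0 | shank3 | SHAN3_HUMAN | ↑ | 53.96 | 1.14E-130 | 450.0 | 0.515
comp223740_c0 | tsc22d1 | CYTSA_CHICK | ↓ | 34.67 | 4.67E-52 | 197.0 | 0.336
comp223740_c0 | specc1l | CYTSA_CHICK | ↓ | 35.402 | 1.05E-121 | 390.0 | 0.664
comp220994_c0 | cfp | SCNA_DROME | ↑ | 49.77 | 3.77E-130 | 420.0 | 0.120
comp220994_c0 | scn5a | SCNA_DROME | ↑ | 44.356 | 0.0 | 1517.0 | 0.435
comp220994_c0 | scn2a | SCNA_DROME | ↑ | 45.491 | 0.0 | 1551.0 | 0.445
comp211709_c1 | opn4 | OPSD_ENTDO | × | 39.542 | 4.25E-77 | 253.0 | 0.505
comp211709_c1 | aftph | OPSD_ENTDO | × | 38.17 | 7.62E-71 | 248.0 | 0.495
comp221071_c0 | lmcd1 | FHL2_HUMAN | ↓ | 30.973 | 1.05E-43 | 160.0 | 0.343
comp221071_c0 | fhl3 | FHL2_HUMAN | ↓ | 49.458 | 3.83E-104 | 307.0 | 0.657
comp223086_c0 | ttll3 | ADHL_GADMO | ↓ | 73.333 | 3.42E-143 | 412.0 | 0.281
comp223086_c0 | adh5 | ADHL_GADMO | ↓ | 74.865 | 0.0 | 591.0 | 0.403
comp223086_c0 | adh1b | ADHL_GADMO | ↓ | 58.713 | 1.15E-163 | 464.0 | 0.316
comp225823_c0 | gab3 | PKHA7_DANRE | ↑ | 28.814 | 8.64E-9 | 59.3 | 0.232
comp225823_c0 | irs2 | PKHA7_DANRE | ↑ | 37.809 | 1.52E-50 | 196.0 | 0.768
comp225554_c1 | efcab7 | EFCB7_XENLA | ↑ | 36.313 | 1.55E-73 | 240.0 | 0.627
comp225554_c1 | cfd | EFCB7_XENLA | ↑ | 34.496 | 2.94E-41 | 143.0 | 0.373
comp224099_c3 | cntrob | CNTRB_HUMAN | ↑ | 35.714 | 3.35E-52 | 201.0 | 0.381
comp224099_c3 | trappc1 | CNTRB_HUMAN | ↑ | 57.343 | 4.22E-68 | 203.0 | 0.385
comp224099_c3 | znrf1 | CNTRB_HUMAN | ↑ | 65.789 | 4.18E-36 | 123.0 | 0.233
comp221295_c0 | fam161a | F161B_HUMAN | ↑ | 32.343 | 3.31E-39 | 149.0 | 0.512
comp221295_c0 | fam161b | F161B_HUMAN | ↑ | 32.203 | 7.66E-37 | 142.0 | 0.488
comp211156_c2 | cacnb2 | CACB2_HUMAN | × | 57.245 | 0.0 | 592.0 | 0.515
comp211156_c2 | cacnb1 | CACB2_HUMAN | × | 65.158 | 0.0 | 558.0 | 0.485
comp213064_c0 | ropn1l | ROP1L_XENLA | ↑ | 52.093 | 8.26E-77 | 231.0 | 0.561
comp213064_c0 | ropn1 | ROP1L_XENLA | ↑ | 41.837 | 6.09E-56 | 181.0 | 0.439
comp223475_c0 | wee2 | WEE1_RAT | → | 48.463 | 2.87E-111 | 347.0 | 0.335
comp223475_c0 | wee1 | WEE1_RAT | → | 49.901 | 6.96E-157 | 465.0 | 0.449
comp223475_c0 | s100z | WEE1_RAT | → | 50.0 | 6.05E-67 | 223.0 | 0.215
comp222681_c1 | gjb2 | LRP1_MOUSE | × | 42.131 | 0.0 | 598.0 | 0.175
comp222681_c1 | lrp1b | LRP1_MOUSE | × | 37.69 | 0.0 | 2816.0 | 0.825
comp210310_c1 | prdx4 | PRDX4_MOUSE | ↓ | 80.176 | 2.06E-136 | 385.0 | 0.587
comp210310_c1 | palld | PRDX4_MOUSE | ↓ | 70.225 | 6.89E-83 | 271.0 | 0.413
comp224661_c0 | kdm5b | KDM5A_HUMAN | ↑ | 54.671 | 0.0 | 1150.0 | 0.547
comp224661_c0 | syf2 | KDM5A_HUMAN | ↑ | 49.842 | 0.0 | 951.0 | 0.453
comp225041_c0 | odf2 | ODFP2_CHICK | ↑ | 38.747 | 0.0 | 548.0 | 0.763
comp225041_c0 | odf2l | ODFP2_CHICK | ↑ | 27.85 | 2.56E-43 | 170.0 | 0.237
comp214179_c4 | nphp3 | NPHP3_XENTR | → | 38.114 | 0.0 | 646.0 | 0.827
comp214179_c4 | dupd1 | NPHP3_XENTR | → | 44.375 | 1.54E-39 | 135.0 | 0.173
comp225122_c1 | usp21 | UBP2_CHICK | ↑ | 45.125 | 1.15E-103 | 327.0 | 0.439
comp225122_c1 | usp2 | UBP2_CHICK | ↑ | 56.0 | 2.61E-138 | 418.0 | 0.561
comp214409_c0 | kit | VGFR1_RAT | × | 37.882 | 2.67E-86 | 290.0 | 0.510
comp214409_c0 | pdgfra | VGFR1_RAT | × | 36.325 | 1.32E-81 | 279.0 | 0.490
comp210608_c4 | pkd1 | PK1L1_HUMAN | → | 25.168 | 1.32E-67 | 257.0 | 0.390
comp210608_c4 | pkd1l1 | PK1L1_HUMAN | → | 25.311 | 3.33E-112 | 402.0 | 0.610
comp223984_c0 | syt10 | SYT7_HUMAN | ↓ | 45.545 | 3.06E-84 | 268.0 | 0.240
comp223984_c0 | cct2 | SYT7_HUMAN | ↓ | 76.735 | 0.0 | 847.0 | 0.760
comp214656_c5 | ccdc103 | TERA_XENLA | → | 80.0 | 2.48E-66 | 224.0 | 0.116
comp214656_c5 | vcp | TERA_XENLA | → | 87.944 | 0.0 | 1400.0 | 0.724
comp214656_c5 | cidec | TERA_XENLA | → | 81.818 | 3.51E-98 | 309.0 | 0.160
comp213122_c7 | alms1 | HERC1_HUMAN | → | 49.425 | 3.57E-19 | 97.4 | 0.222
comp213122_c7 | gstz1 | HERC1_HUMAN | → | 55.519 | 6.93E-98 | 342.0 | 0.778
comp224835_c1 | rbm26 | RBM26_XENLA | → | 37.759 | 8.86E-67 | 244.0 | 0.391
comp224835_c1 | plscr1 | RBM26_XENLA | → | 78.319 | 2.65E-130 | 380.0 | 0.609
comp225531_c0 | hspg2 | PGBM_HUMAN | ↑ | 31.24 | 0.0 | 922.0 | 0.888
comp225531_c0 | cast | PGBM_HUMAN | ↑ | 51.376 | 2.97E-27 | 116.0 | 0.112
comp208709_c1 | prdx3 | PRDX3_PONAB | ↓ | 66.822 | 3.5E-106 | 308.0 | 0.337
comp208709_c1 | prdx1 | PRDX3_PONAB | ↓ | 71.795 | 3.73E-106 | 305.0 | 0.334
comp208709_c1 | prdx2 | PRDX3_PONAB | ↓ | 71.649 | 1.78E-104 | 301.0 | 0.329
comp223964_c0 | sfi1 | SFI1_HUMAN | ↑ | 27.789 | 1.68E-100 | 350.0 | 0.801
comp223964_c0 | rnf224 | SFI1_HUMAN | ↑ | 37.956 | 1.26E-18 | 87.0 | 0.199
comp215054_c0 | grk6 | GRK5_MOUSE | → | 67.347 | 0.0 | 705.0 | 0.883
comp215054_c0 | gcom1 | GRK5_MOUSE | → | 36.567 | 1.83E-19 | 93.2 | 0.117
comp216166_c2 | hmcn1 | HMCN1_HUMAN | ↓ | 42.612 | 0.0 | 1447.0 | 0.900
comp216166_c2 | hhla2 | HMCN1_HUMAN | ↓ | 26.641 | 3.58E-5 | 46.6 | 0.029
comp216166_c2 | mdga2 | HMCN1_HUMAN | ↓ | 25.179 | 2.18E-25 | 114.0 | 0.071
comp224279_c0 | plxna3 | PLXA4_HUMAN | ↓ | 40.359 | 0.0 | 1325.0 | 0.474
comp224279_c0 | plxnb3 | PLXA4_HUMAN | ↓ | 33.552 | 0.0 | 1035.0 | 0.371
comp224279_c0 | plxnc1 | PLXA4_HUMAN | ↓ | 35.58 | 2.79E-125 | 433.0 | 0.155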

Table B.2.2: (continued)

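Component ID | Gene Name | Local Annotation | DE Info | Percent Identity | EValue | BitScore | Weighted Score
comp220339_c0 | gsn | VILI_CHICK | ↓ | 41.224 | 0.0 | 545.0 | 0.465
comp220339_c0 | avil | VILI_CHICK | ↓ | 41.429 | 0.0 | 626.0 | 0.535
comp225870_c0 | ift57 | XPP1_MOUSE | → | 50.814 | 1.04E-100 | 320.0 | 0.212
comp225870_c0 | xpnpep1 | XPP1_MOUSE | → | 56.27 | 0.0 | 743.0 | 0.493
comp225870_c0 | slc13a1 | XPP1_MOUSE | → | 40.553 | 3.98E-149 | 445.0 | 0.295
comp224051_c0 | bbs1 | BBS1_HUMAN | → | 55.385 | 0.0 | 688.0 | 0.343
comp224051_c0 | ctd-3074o7.11 | BBS1_HUMAN | → | 55.214 | 0.0 | 687.0 | 0.343
comp224051_c0 | mdh1 | BBS1_HUMAN | → | 67.062 | 4.17E-161 | 454.0 | 0.226
comp224051_c0 | eif2ak2 | BBS1_HUMAN | → | 60.667 | 2.39E-49 | 176.0 | 0.088
comp221486_c2 | mep1b | APBB2_HUMAN | ↓ | 43.35 | 8.06E-39 | 155.0 | 0.318
comp221486_c2 | apbb2 | APBB2_HUMAN | ↓ | 40.162 | 1.5E-103 | 333.0 | 0.682
comp226167_c0 | pcdh15 | FAT_DROME | ↓ | 28.165 | 1.63E-100 | 363.0 | 0.113
comp226167_c0 | fat4 | FAT_DROME | ↓ | 39.122 | 0.0 | 2852.0 | 0.887
comp210605_c0 | drd1 | DOPR1_DROME | × | 41.265 | 3.33E-77 | 247.0 | 0.247
comp210605_c0 | drd5 | DOPR1_DROME | × | 37.795 | 1.14E-70 | 231.0 | 0.231
comp210605_c0 | htr6 | DOPR1_DROME | × | 36.686 | 2.95E-67 | 221.0 | 0.221
comp210605_c0 | lpar3 | DOPR1_DROME | × | 23.442 | 1.35E-15 | 79.7 | 0.080
comp210605_c0 | s1pr2 | DOPR1_DROME | × | 24.756 | 1.77E-12 | 69.3 | 0.069
comp210605_c0 | adora2a | DOPR1_DROME | × | 34.768 | 2.42E-41 | 152.0 | 0.152
comp210449_c1 | krt222 | NF70_DORPE | × | 32.667 | 7.25E-4 | 41.6 | 0.365
comp210449_c1 | synm | NF70_DORPE | × | 30.11 | 2.03E-12 | 72.4 | 0.635
comp225937_c1 | wdr93 | WDR93_HUMAN | ↑ | 24.006 | 3.37E-52 | 193.0 | 0.603
comp225937_c1 | pex11a | WDR93_HUMAN | ↑ | 31.25 | 1.59E-35 | 127.0 | 0.397
comp214048_c0 | iqcg | 2AAA_PIG | ↓ | 57.326 | 1.15E-145 | 442.0 | 0.344
comp214048_c0 | ppp2r1a | 2AAA_PIG | ↓ | 70.275 | 0.0 | 842.0 | 0.656
comp226182_c2 | xpnpep3 | XPP3_RAT | → | 51.542 | 4.54E-173 | 497.0 | 0.287
comp226182_c2 | katnb1 | XPP3_RAT | → | 46.418 | 0.0 | 627.0 | 0.362
comp226182_c2 | tmem218 | XPP3_RAT | → | 64.767 | 1.43E-91 | 292.0 | 0.168
comp226182_c2 | hsd17b12 | XPP3_RAT | → | 67.647 | 8.79E-97 | 318.0 | 0.183
comp224286_c0 | arntl | BMAL1_CHICK | ↑ | 54.669 | 0.0 | 587.0 | 0.553
comp224286_c0 | arntl2 | BMAL1_CHICK | ↑ | 46.261 | 6.36E-161 | 475.0 | 0.447
comp224381_c0 | sept7 | SEPT7_PONAB | → | 70.406 | 0.0 | 593.0 | 0.628
comp224381_c0 | wdr38 | SEPT7_PONAB | → | 76.821 | 1.9E-83 | 265.0 | 0.281
comp224381_c0 | pdlim2 | SEPT7_PONAB | → | 64.286 | 9.5E-18 | 85.9 | 0.091
comp214157_c0 | gsk3b | GSK3B_SPECI | ↑ | 82.474 | 0.0 | 670.0 | 0.521
comp214157_c0 | gsk3a | GSK3B_SPECI | ↑ | 77.662 | 0.0 | 617.0 | 0.479
comp212876_c0 | sstr3 | SSR4_RAT | × | 39.228 | 5.6E-68 | 222.0 | 0.542
comp212876_c0 | ffar2 | SSR4_RAT | × | 24.54 | 3.38E-12 | 67.4 | 0.165
comp212876_c0 | ccr6 | SSR4_RAT | × | 27.645 | 1.84E-29 | 120.0 | 0.293
comp223500_c1 | frmd4a | FRM4A_HUMAN | → | 59.778 | 0.0 | 565.0 | 0.499
comp223500_c1 | mttp | FRM4A_HUMAN | → | 59.645 | 0.0 | 567.0 | 0.501
comp225471_c1 | | ELMO1_HUMAN | → | 36.585 | 1.8E-12 | 65.9 | 0.050
comp225471_c1 | elmo3 | ELMO1_HUMAN | → | 41.057 | 0.0 | 581.0 | 0.438
comp225471_c1 | elmo1 | ELMO1_HUMAN | → | 48.862 | 0.0 | 680.0 | 0.512
comp223001_c1 | ube2u | S2611_HUMAN | ↓ | 49.289 | 1.54E-58 | 213.0 | 0.322
comp223001_c1 | slc26a11 | S2611_HUMAN | ↓ | 44.103 | 2.83E-149 | 449.0 | 0.678
comp220483_c3 | cep85 | KHDR2_XENTR | ↓ | 47.17 | 9.65E-62 | 214.0 | 0.351
comp220483_c3 | rassf8 | KHDR2_XENTR | ↓ | 52.663 | 7.09E-39 | 149.0 | 0.245
comp220483_c3 | khdrbs3 | KHDR2_XENTR | ↓ | 50.0 | 1.73E-78 | 246.0 | 0.404
comp218209_c4 | ptpn6 | PTN11_RAT | → | 53.382 | 0.0 | 597.0 | 0.773
comp218209_c4 | myl6 | PTN11_RAT | → | 60.938 | 3.28E-50 | 175.0 | 0.227
comp220788_c1 | fam49b | FA49B_MOUSE | ↑ | 61.538 | 2.11E-140 | 401.0 | 0.680
comp220788_c1 | ccdc175 | FA49B_MOUSE | ↑ | 52.381 | 1.08E-52 | 189.0 | 0.320
comp213727_c0 | tbc1d7 | TBCD7_MOUSE | → | 46.735 | 1.73E-87 | 264.0 | 0.342
comp213727_c0 | rita1, c12orf52 | TBCD7_MOUSE | → | 54.167 | 3.06E-32 | 125.0 | 0.162
comp213727_c0 | hesx1 | TBCD7_MOUSE | → | 40.0 | 8.12E-58 | 191.0 | 0.247
comp213727_c0 | sybu | TBCD7_MOUSE | → | 40.0 | 5.24E-55 | 193.0 | 0.250
comp224526_c0 | ttc8 | TTC8_MOUSE | ↑ | 70.717 | 0.0 | 796.0 | 0.671
comp224526_c0 | fam190a | TTC8_MOUSE | ↑ | 72.803 | 6.87E-129 | 391.0 | 0.329
comp214648_c0 | ccpg1 | AMYP_PIG | ↓ | 63.291 | 2.61E-27 | 115.0 | 0.184
comp214648_c0 | amy2b | AMYP_PIG | ↓ | 54.326 | 1.59E-178 | 511.0 | 0.816
comp224599_c1 | shroom3 | SHRM4_HUMAN | ↑ | 29.775 | 2.39E-16 | 86.7 | 0.558
comp224599_c1 | shroom1 | SHRM4_HUMAN | ↑ | 28.108 | 2.35E-11 | 68.6 | 0.442
comp225817_c0 | txn2 | IF4G3_MOUSE | ↑ | 49.598 | 1.52E-99 | 340.0 | 0.267
comp225817_c0 | lsm5 | IF4G3_MOUSE | ↑ | 46.058 | 2.07E-47 | 184.0 | 0.145
comp225817_c0 | eif4g1 | IF4G3_MOUSE | ↑ | 43.33 | 0.0 | 748.0 | 0.588
comp226160_c0 | myo15a | MYO7A_AEDAE | → | 41.656 | 0.0 | 627.0 | 0.145
comp226160_c0 | | MYO7A_AEDAE | → | 64.14 | 0.0 | 2760.0 | 0.640
comp226160_c0 | myo5c | MYO7A_AEDAE | → | 39.93 | 2.2E-176 | 586.0 | 0.136
comp226160_c0 | myo15b | MYO7A_AEDAE | → | 30.728 | 2.92E-95 | 342.0 | 0.079
comp216629_c1 | gnas | GNAS_LYMST | → | 78.684 | 0.0 | 637.0 | 0.744
comp216629_c1 | samd15 | GNAS_LYMST | → | 61.491 | 6.94E-69 | 219.0 | 0.256
comp224866_c0 | fam81b | AMFR_HUMAN | ↑ | 61.212 | 4.26E-68 | 224.0 | 0.212
comp224866_c0 | tbcel | AMFR_HUMAN | ↑ | 54.941 | 7.95E-79 | 257.0 | 0.244
comp224866_c0 | amfr | AMFR_HUMAN | ↑ | 47.093 | 0.0 | 574.0 | 0.544
comp219829_c0 | rdh8 | NCAH_DROME | ↓ | 86.066 | 4.18E-74 | 233.0 | 0.152
comp219829_c0 | hpcal1 | NCAH_DROME | ↓ | 87.958 | 1.52E-124 | 360.0 | 0.235
comp219829_c0 | gadd45a | NCAH_DROME | ↓ | 85.841 | 3.47E-67 | 216.0 | 0.141
comp219829_c0 | ncald | NCAH_DROME | ↓ | 86.243 | 9.42E-124 | 350.0 | 0.229
comp219829_c0 | trim35 | NCAH_DROME | ↓ | 82.278 | 7.92E-44 | 151.0 | 0.099
comp219829_c0 | | NCAH_DROME | ↓ | 86.607 | 5.15E-69 | 219.0 | 0.143
comp216772_c0 | tctex1d2 | TC1D2_HUMAN | ↑ | 54.472 | 1.05E-49 | 156.0 | 0.605
comp216772_c0 | rp11-447l10.1 | TC1D2_HUMAN | ↑ | 54.118 | 3.35E-29 | 102.0 | 0.395
comp219671_c1 | hsp90ab1 | HS90A_RABIT | → | 78.771 | 0.0 | 1087.0 | 0.590
comp219671_c1 | hsp90aa1 | HS90A_RABIT | → | 79.823 | 0.0 | 754.0 | 0.410
comp226229_c0 | trpm2 | TRPM2_MOUSE | → | 32.529 | 0.0 | 692.0 | 0.598
comp226229_c0 | trpm4 | TRPM2_MOUSE | → | 31.067 | 1.04E-141 | 465.0 | 0.402
comp220132_c0 | gdi2 | GDIB_PIG | ↓ | 67.273 | 0.0 | 633.0 | 0.341
comp220132_c0 | stpg1 | GDIB_PIG | ↓ | 71.756 | 2.36E-60 | 204.0 | 0.110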

Table B.2.2: (continued)

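Component ID | Gene Name | Local Annotation | DE Info | Percent Identity | EValue | BitScore | Weighted Score
comp220132_c0 | gdi1 | GDIB_PIG | ↓ | 67.727 | 0.0 | 625.0 | 0.337
comp220132_c0 | arid1b | GDIB_PIG | ↓ | 64.808 | 4.92E-130 | 393.0 | 0.212
comp217720_c0 | spata17 | SPT17_HUMAN | ↑ | 37.673 | 7.22E-74 | 233.0 | 0.814
comp217720_c0 | c2orf73 | SPT17_HUMAN | ↑ | 26.543 | 6.74E-8 | 53.1 | 0.186
comp221141_c0 | fkbp1b | SRPK1_HUMAN | ↓ | 63.5 | 4.66E-77 | 239.0 | 0.298
comp221141_c0 | srpk2 | SRPK1_HUMAN | ↓ | 66.561 | 4.69E-123 | 370.0 | 0.462
comp221141_c0 | lgalsl | SRPK1_HUMAN | ↓ | 81.034 | 1.8E-57 | 192.0 | 0.240
comp225935_c0 | urod | DCUP_DANRE | → | 63.836 | 0.0 | 514.0 | 0.636
comp225935_c0 | spata6 | DCUP_DANRE | → | 48.693 | 2.84E-97 | 294.0 | 0.364
comp224554_c0 | ankmy1 | ANMY1_MOUSE | ↑ | 37.097 | 0.0 | 580.0 | 0.803
comp224554_c0 | foxn4 | ANMY1_MOUSE | ↑ | 34.914 | 7.44E-36 | 142.0 | 0.197
comp225441_c0 | dysf | MYOF_MOUSE | ↓ | 47.998 | 0.0 | 1718.0 | 0.499
comp225441_c0 | myof | MYOF_MOUSE | ↓ | 47.798 | 0.0 | 1722.0 | 0.501
comp225562_c0 | dvl1 | DVL3_XENTR | → | 49.597 | 0.0 | 598.0 | 0.398
comp225562_c0 | dixdc1 | DVL3_XENTR | → | 54.34 | 6.14E-72 | 254.0 | 0.169
comp225562_c0 | dvl3 | DVL3_XENTR | → | 51.985 | 0.0 | 649.0 | 0.432
comp224522_c1 | capn2 | CANB_DROME | ↓ | 46.212 | 0.0 | 603.0 | 0.233
comp224522_c1 | capn9 | CANB_DROME | ↓ | 48.296 | 0.0 | 660.0 | 0.255
comp224522_c1 | capn1 | CANB_DROME | ↓ | 47.598 | 0.0 | 623.0 | 0.241
comp224522_c1 | sri | CANB_DROME | ↓ | 41.875 | 1.32E-35 | 132.0 | 0.051
comp224522_c1 | capn8 | CANB_DROME | ↓ | 43.65 | 0.0 | 572.0 | 0.221
comp220541_c1 | ush1g | RIMS2_RAT | × | 59.524 | 1.09E-66 | 244.0 | 0.301
comp220541_c1 | fam227b | RIMS2_RAT | × | 61.345 | 4.06E-67 | 242.0 | 0.298
comp220541_c1 | rims2 | RIMS2_RAT | × | 58.824 | 2.17E-95 | 325.0 | 0.401
comp224657_c0 | capsl | CAPSL_MOUSE | ↑ | 65.072 | 8.39E-105 | 301.0 | 0.339
comp224657_c0 | caps | CAPSL_MOUSE | ↑ | 56.522 | 6.28E-70 | 215.0 | 0.242
comp224657_c0 | mtap | CAPSL_MOUSE | ↑ | 67.308 | 3.56E-131 | 373.0 | 0.420
comp220701_c1 | drc3, lrrc48 | LRC48_MOUSE | ↑ | 53.551 | 0.0 | 555.0 | 0.904
comp220701_c1 | lrrc51 | LRC48_MOUSE | ↑ | 25.301 | 5.04E-10 | 58.9 | 0.096
comp222364_c1 | eml1 | EMAL1_HUMAN | → | 53.258 | 0.0 | 894.0 | 0.499
comp222364_c1 | ckb | EMAL1_HUMAN | → | 70.106 | 0.0 | 572.0 | 0.319
comp222364_c1 | cyb5b | EMAL1_HUMAN | → | 45.6 | 5.96E-101 | 325.0 | 0.181
comp220434_c2 | ppp2r3c | P2R3C_BOVIN | ↑ | 83.333 | 1.62E-167 | 473.0 | 0.533
comp220434_c2 | pfn2 | P2R3C_BOVIN | ↑ | 84.681 | 2.15E-142 | 415.0 | 0.467
comp222670_c2 | spsb3 | SPSB3_MOUSE | ↑ | 54.118 | 1.31E-94 | 298.0 | 0.718
comp222670_c2 | dbf4b | SPSB3_MOUSE | ↑ | 38.725 | 3.48E-29 | 117.0 | 0.282
comp225136_c0 | sh3glb2 | K319L_MOUSE | → | 46.217 | 0.0 | 622.0 | 0.499
comp225136_c0 | kiaa0319l | K319L_MOUSE | → | 45.383 | 0.0 | 624.0 | 0.501
comp221306_c0 | actr3 | ARP3_PONAB | → | 86.158 | 0.0 | 751.0 | 0.822
comp221306_c0 | spata6l | ARP3_PONAB | → | 85.882 | 8.48E-48 | 163.0 | 0.178
comp216891_c1 | cox4i1 | ADCK3_MOUSE | → | 63.448 | 1.16E-60 | 202.0 | 0.502
comp216891_c1 | mprip | ADCK3_MOUSE | → | 56.818 | 1.45E-58 | 200.0 | 0.498
comp224524_c0 | c18orf54 | BAIP2_CRIGR | ↑ | 37.113 | 2.03E-11 | 62.8 | 0.359
comp224524_c0 | baiap2 | BAIP2_CRIGR | ↑ | 34.831 | 2.56E-28 | 112.0 | 0.641
comp164124_c0 | spag6 | SPAG6_MOUSE | ↑ | 84.783 | 0.0 | 895.0 | 0.497
comp164124_c0 | spag6l | SPAG6_MOUSE | ↑ | 86.166 | 0.0 | 907.0 | 0.503
comp225521_c1 | cep120 | CE120_HUMAN | ↑ | 49.739 | 0.0 | 817.0 | 0.521
comp225521_c1 | tex33 | CE120_HUMAN | ↑ | 47.629 | 0.0 | 752.0 | 0.479
comp223979_c1 | ensa | TIAR_HUMAN | → | 59.333 | 1.87E-52 | 176.0 | 0.321
comp223979_c1 | tial1 | TIAR_HUMAN | → | 66.187 | 1.24E-126 | 372.0 | 0.679
comp221709_c0 | ift81 | IFT81_HUMAN | ↑ | 57.627 | 0.0 | 725.0 | 0.661
comp221709_c0 | crebzf | IFT81_HUMAN | ↑ | 50.249 | 2.2E-118 | 372.0 | 0.339
comp222899_c2 | anxa7 | ANXA7_BOVIN | → | 49.07 | 5.78E-124 | 372.0 | 0.269
comp222899_c2 | cib1 | ANXA7_BOVIN | → | 41.2 | 9.08E-55 | 192.0 | 0.139
comp222899_c2 | iqck | ANXA7_BOVIN | → | 39.216 | 2.89E-49 | 184.0 | 0.133
comp222899_c2 | anxa2 | ANXA7_BOVIN | → | 43.465 | 2.29E-86 | 285.0 | 0.206
comp222899_c2 | pwwp2a | ANXA7_BOVIN | → | 39.423 | 1.11E-16 | 84.3 | 0.061
comp222899_c2 | znf654 | ANXA7_BOVIN | → | 43.787 | 1.27E-39 | 156.0 | 0.113
comp222899_c2 | scg5 | ANXA7_BOVIN | → | 40.288 | 4.41E-27 | 112.0 | 0.081
comp219869_c0 | spag8 | PDE11_TAKRU | ↓ | 49.156 | 5.93E-139 | 443.0 | 0.270
comp219869_c0 | pde11a | PDE11_TAKRU | ↓ | 47.543 | 0.0 | 831.0 | 0.506
comp219869_c0 | pde6c | PDE11_TAKRU | ↓ | 30.82 | 3.34E-113 | 367.0 | 0.224
comp216525_c0 | dnajb13 | DJB13_MOUSE | ↑ | 60.635 | 1.54E-141 | 402.0 | 0.728
comp216525_c0 | tmem53 | DJB13_MOUSE | ↑ | 37.455 | 1.71E-43 | 150.0 | 0.272
comp221728_c0 | ccdc114 | CC114_HUMAN | ↑ | 31.95 | 8.46E-60 | 209.0 | 0.514
comp221728_c0 | ccdc63 | CC114_HUMAN | ↑ | 32.747 | 9.1E-57 | 198.0 | 0.486
comp226044_c0 | spint1 | PPN1_CAEEL | ↑ | 32.461 | 5.34E-18 | 88.6 | 0.228
comp226044_c0 | thsd4 | PPN1_CAEEL | ↑ | 32.507 | 2.2E-84 | 300.0 | 0.772
comp213953_c0 | rabl5, ift22 | RAP1B_DANRE | ↓ | 97.03 | 9.65E-64 | 204.0 | 0.181
comp213953_c0 | ccdc181 | RAP1B_DANRE | ↓ | 95.683 | 6.58E-90 | 279.0 | 0.248
comp213953_c0 | rap1b | RAP1B_DANRE | ↓ | 89.247 | 5.39E-118 | 336.0 | 0.298
comp213953_c0 | kiaa0182 | RAP1B_DANRE | ↓ | 94.268 | 5.29E-97 | 308.0 | 0.273
comp205854_c0 | pkhd1 | PKHL1_HUMAN | × | 24.006 | 1.82E-169 | 593.0 | 0.183
comp205854_c0 | pkhd1l1 | PKHL1_HUMAN | × | 37.773 | 0.0 | 2641.0 | 0.817
comp218486_c0 | mapre1 | MARE3_HUMAN | ↓ | 55.344 | 5.17E-96 | 283.0 | 0.362
comp218486_c0 | dnal4 | MARE3_HUMAN | ↓ | 69.173 | 1.86E-67 | 212.0 | 0.271
comp218486_c0 | mapre3 | MARE3_HUMAN | ↓ | 51.408 | 9.23E-96 | 286.0 | 0.366
comp219277_c1 | ift46 | ADCY5_RABIT | × | 53.282 | 4.25E-81 | 281.0 | 0.272
comp219277_c1 | adcy6 | ADCY5_RABIT | × | 46.956 | 0.0 | 752.0 | 0.728
comp225890_c0 | cep164 | CE164_HUMAN | ↑ | 64.894 | 1.9E-38 | 158.0 | 0.515
comp225890_c0 | sez6l | CE164_HUMAN | ↑ | 67.778 | 1.0E-35 | 149.0 | 0.485
comp225310_c0 | lca5 | LCA5_HUMAN | ↑ | 38.095 | 3.4E-35 | 143.0 | 0.073
comp225310_c0 | gm7173 | LCA5_HUMAN | ↑ | 34.168 | 0.0 | 1827.0 | 0.927
comp225974_c0 | | KIF2A_HUMAN | ↓ | 57.525 | 0.0 | 801.0 | 0.606
comp225974_c0 | krt36 | KIF2A_HUMAN | ↓ | 27.123 | 5.06E-22 | 99.8 | 0.075
comp225974_c0 | lmna | KIF2A_HUMAN | ↓ | 38.779 | 2.01E-115 | 358.0 | 0.271
comp225974_c0 | krt20 | KIF2A_HUMAN | ↓ | 23.016 | 2.13E-10 | 63.2 | 0.048
comp225478_c0 | a2m | CD109_HUMAN | ↑ | 29.201 | 1.19E-148 | 493.0 | 0.683
comp225478_c0 | c3 | CD109_HUMAN | ↑ | 28.088 | 3.59E-60 | 229.0 | 0.317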

Table B.2.2: (continued)

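Component ID | Gene Name | Local Annotation | DE Info | Percent Identity | EValue | BitScore | Weighted Score
comp224567_c0 | 1700029j07rik, c4orf47 | CD047_XENLA | ↑ | 55.195 | 5.89E-108 | 317.0 | 0.626
comp224567_c0 | lekr1 | CD047_XENLA | ↑ | 55.056 | 1.83E-58 | 189.0 | 0.374
comp223708_c0 | ttll6 | TTLL6_DANRE | ↑ | 59.912 | 0.0 | 596.0 | 0.469
comp223708_c0 | ttll13p, ttll13 | TTLL6_DANRE | ↑ | 53.418 | 0.0 | 675.0 | 0.531
comp208188_c0 | kiaa1217 | PLST_RAT | ↑ | 45.318 | 1.06E-59 | 218.0 | 0.150
comp208188_c0 | pls1 | PLST_RAT | ↑ | 52.913 | 0.0 | 621.0 | 0.426
comp208188_c0 | lcp1 | PLST_RAT | ↑ | 53.312 | 0.0 | 619.0 | 0.425
comp224201_c4 | abca4 | ABCA1_HUMAN | → | 41.919 | 0.0 | 1777.0 | 0.478
comp224201_c4 | abca1 | ABCA1_HUMAN | → | 44.926 | 0.0 | 1938.0 | 0.522
comp222610_c1 | slc25a22 | GHC1_BOVIN | ↑ | 56.923 | 9.9E-128 | 368.0 | 0.628
comp222610_c1 | pde4dip | GHC1_BOVIN | ↑ | 52.444 | 6.25E-63 | 218.0 | 0.372
comp219204_c0 | fbxl20 | FXL20_HUMAN | ↓ | 65.977 | 0.0 | 584.0 | 0.511
comp219204_c0 | fbxl2 | FXL20_HUMAN | ↓ | 64.371 | 0.0 | 558.0 | 0.489
comp220534_c0 | cdkn1a | SC5A8_MOUSE | × | 49.701 | 4.79E-49 | 174.0 | 0.235
comp220534_c0 | pof1b | SC5A8_MOUSE | × | 52.688 | 1.64E-26 | 105.0 | 0.142
comp220534_c0 | slc5a6 | SC5A8_MOUSE | × | 41.76 | 1.42E-154 | 460.0 | 0.622
comp214768_c0 | ephx1 | HYEP_HUMAN | × | 50.442 | 3.77E-145 | 432.0 | 0.613
comp214768_c0 | map9 | HYEP_HUMAN | × | 48.101 | 1.02E-43 | 151.0 | 0.214
comp214768_c0 | lypd6b | HYEP_HUMAN | × | 45.736 | 1.11E-32 | 122.0 | 0.173
comp224849_c0 | slc12a8 | S12A8_XENLA | ↓ | 44.46 | 0.0 | 545.0 | 0.798
comp224849_c0 | nrn1l | S12A8_XENLA | ↓ | 56.034 | 2.82E-37 | 138.0 | 0.202
comp225484_c0 | wdr63 | WDR63_HUMAN | ↑ | 48.495 | 0.0 | 810.0 | 0.877
comp225484_c0 | dbi | WDR63_HUMAN | ↑ | 54.639 | 7.3E-28 | 114.0 | 0.123
comp222458_c1 | pkd2 | PKD2_BOVIN | ↑ | 55.378 | 0.0 | 776.0 | 0.519
comp222458_c1 | pkd2l1 | PKD2_BOVIN | ↑ | 54.173 | 0.0 | 719.0 | 0.481
comp216609_c0 | fam47e | COBA1_MOUSE | ↑ | 39.076 | 8.9E-36 | 149.0 | 0.165
comp216609_c0 | col11a2 | COBA1_MOUSE | ↑ | 44.892 | 0.0 | 754.0 | 0.835
comp221929_c0 | c9orf116, 1700007k13rik | EGFR_APIME | ↓ | 43.2 | 6.95E-23 | 98.6 | 0.115
comp221929_c0 | egfr | EGFR_APIME | ↓ | 46.055 | 1.13E-135 | 450.0 | 0.527
comp221929_c0 | bin3 | EGFR_APIME | ↓ | 39.007 | 2.58E-90 | 306.0 | 0.358
comp224204_c2 | tnpo1 | TNPO1_MOUSE | ↓ | 81.285 | 0.0 | 1477.0 | 0.839
comp224204_c2 | fbxw9 | TNPO1_MOUSE | ↓ | 36.524 | 6.06E-91 | 283.0 | 0.161
comp223075_c0 | prkar2a | KAPR_STRPU | ↑ | 54.25 | 2.09E-142 | 411.0 | 0.498
comp223075_c0 | prkar2b | KAPR_STRPU | ↑ | 53.171 | 1.87E-143 | 414.0 | 0.502
comp220312_c0 | slc7a7 | LAT2_RABIT | ↓ | 48.367 | 9.36E-173 | 496.0 | 0.333
comp220312_c0 | slc7a5 | LAT2_RABIT | ↓ | 54.737 | 0.0 | 533.0 | 0.358
comp220312_c0 | slc7a11 | LAT2_RABIT | ↓ | 45.194 | 2.96E-158 | 459.0 | 0.308
comp218728_c3 | hspa8 | HSP70_ONCMY | ↓ | 89.163 | 0.0 | 1125.0 | 0.137
comp218728_c3 | pacrg | HSP70_ONCMY | ↓ | 89.776 | 0.0 | 594.0 | 0.072
comp218728_c3 | hspa2 | HSP70_ONCMY | ↓ | 86.016 | 0.0 | 1089.0 | 0.132
comp218728_c3 | hspa6 | HSP70_ONCMY | ↓ | 78.956 | 0.0 | 1038.0 | 0.126
comp218728_c3 | hspa1l | HSP70_ONCMY | ↓ | 83.361 | 0.0 | 1061.0 | 0.129
comp218728_c3 | hspa1a | HSP70_ONCMY | ↓ | 84.452 | 0.0 | 1082.0 | 0.132
comp218728_c3 | hspa1b | HSP70_ONCMY | ↓ | 84.452 | 0.0 | 1078.0 | 0.131
comp218728_c3 | plk1s1, kiz | HSP70_ONCMY | ↓ | 70.52 | 7.58E-74 | 251.0 | 0.031
comp218728_c3 | tbcb | HSP70_ONCMY | ↓ | 89.08 | 1.63E-101 | 333.0 | 0.040
comp218728_c3 | c19orf44 | HSP70_ONCMY | ↓ | 71.508 | 2.86E-75 | 265.0 | 0.032
comp218728_c3 | fam76b | HSP70_ONCMY | ↓ | 74.757 | 2.37E-97 | 309.0 | 0.038
comp225719_c0 | rilpl1 | RIPL1_RAT | → | 36.203 | 3.58E-61 | 208.0 | 0.749
comp225719_c0 | rilpl2 | RIPL1_RAT | → | 62.264 | 7.88E-14 | 69.7 | 0.251
comp222961_c3 | wdr61 | WDR61_DANRE | ↑ | 79.476 | 3.29E-139 | 393.0 | 0.521
comp222961_c3 | ttc23l | WDR61_DANRE | ↑ | 29.966 | 2.23E-32 | 128.0 | 0.170
comp222961_c3 | ttc23 | WDR61_DANRE | ↑ | 34.278 | 5.17E-71 | 233.0 | 0.309
comp222823_c0 | dnaja1 | DNJA1_HUMAN | ↑ | 68.473 | 0.0 | 516.0 | 0.512
comp222823_c0 | dnaja4 | DNJA1_HUMAN | ↑ | 65.473 | 3.12E-171 | 491.0 | 0.488
comp225356_c1 | rnf183 | TRIM3_HUMAN | ↓ | 44.444 | 1.12E-8 | 54.7 | 0.417
comp225356_c1 | kiaa1191 | TRIM3_HUMAN | ↓ | 30.464 | 1.73E-14 | 76.6 | 0.583
comp224766_c1 | amot | AMOT_MOUSE | → | 37.669 | 1.64E-61 | 226.0 | 0.585
comp224766_c1 | amotl2 | AMOT_MOUSE | → | 29.808 | 2.43E-40 | 160.0 | 0.415
comp226148_c0 | dnah11 | DYHC_HELCR | ↑ | 59.928 | 0.0 | 5612.0 | 0.311
comp226148_c0 | dnah9 | DYHC_HELCR | ↑ | 66.056 | 0.0 | 6197.0 | 0.344
comp226148_c0 | dnah17 | DYHC_HELCR | ↑ | 66.592 | 0.0 | 6219.0 | 0.345
comp225426_c0 | dnah5 | DYH5_HUMAN | ↑ | 61.527 | 0.0 | 5890.0 | 0.504
comp225426_c0 | dnah8 | DYH5_HUMAN | ↑ | 59.228 | 0.0 | 5791.0 | 0.496
comp225058_c0 | rnf219 | RN219_PONAB | → | 43.651 | 8.64E-58 | 206.0 | 0.790
comp225058_c0 | rnft1 | RN219_PONAB | → | 40.0 | 1.15E-7 | 54.7 | 0.210
comp225903_c0 | stxbp5l | STXB5_HUMAN | → | 53.772 | 0.0 | 809.0 | 0.503
comp225903_c0 | ebp | STXB5_HUMAN | → | 52.122 | 0.0 | 798.0 | 0.497
comp217778_c2 | cc2d2a | C2D2A_HUMAN | ↑ | 46.224 | 0.0 | 1257.0 | 0.832
comp217778_c2 | zc3h3 | C2D2A_HUMAN | ↑ | 39.143 | 9.87E-69 | 253.0 | 0.168
comp220765_c0 | fgr | SRC42_DROME | ↓ | 56.566 | 0.0 | 562.0 | 0.513
comp220765_c0 | hck | SRC42_DROME | ↓ | 58.186 | 3.97E-180 | 533.0 | 0.487
comp225248_c0 | dnah7c, dnah7b, dnah7 | DYH7_HUMAN | ↑ | 71.979 | 0.0 | 4003.0 | 0.553
comp225248_c0 | dnah12 | DYH7_HUMAN | ↑ | 59.533 | 0.0 | 3240.0 | 0.447
comp220431_c0 | sept2 | SEPT2_BOVIN | ↓ | 70.64 | 8.95E-172 | 487.0 | 0.588
comp220431_c0 | sh3tc2 | SEPT2_BOVIN | ↓ | 77.632 | 1.18E-31 | 126.0 | 0.152
comp220431_c0 | cycs | SEPT2_BOVIN | ↓ | 80.62 | 2.38E-68 | 215.0 | 0.260
comp226154_c0 | wdr47 | WDR47_HUMAN | → | 43.811 | 0.0 | 717.0 | 0.512
comp226154_c0 | eefsec | WDR47_HUMAN | → | 57.264 | 0.0 | 684.0 | 0.488
comp221655_c0 | dlx4 | DLLH_BRAFL | ↓ | 62.609 | 1.27E-35 | 127.0 | 0.504
comp221655_c0 | dlx3 | DLLH_BRAFL | ↓ | 54.478 | 6.18E-31 | 125.0 | 0.496
comp211304_c1 | ece2 | ECE1_MOUSE | ↓ | 47.582 | 0.0 | 731.0 | 0.572
comp211304_c1 | ecel1 | ECE1_MOUSE | ↓ | 37.599 | 0.0 | 548.0 | 0.428
comp156697_c0 | tctn1 | RY44_DROME | × | 39.189 | 2.59E-59 | 206.0 | 0.669
comp156697_c0 | ccdc169 | RY44_DROME | × | 44.144 | 6.82E-24 | 102.0 | 0.331
comp220487_c2 | lipa | LIPG_RAT | → | 46.612 | 2.87E-117 | 348.0 | 0.570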

Table B.2.2: (continued)

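Component ID | Gene Name | Local Annotation | DE Info | Percent Identity | EValue | BitScore | Weighted Score
comp220487_c2 | cdkn1b | LIPG_RAT | → | 47.794 | 6.98E-82 | 262.0 | 0.430
comp226222_c0 | gm9195 | GOGA4_HUMAN | ↑ | 33.032 | 0.0 | 1349.0 | 0.827
comp226222_c0 | map7 | GOGA4_HUMAN | ↑ | 28.652 | 7.21E-6 | 50.4 | 0.031
comp226222_c0 | evpl | GOGA4_HUMAN | ↑ | 22.905 | 3.03E-20 | 99.8 | 0.061
comp226222_c0 | nexn | GOGA4_HUMAN | ↑ | 21.131 | 7.93E-15 | 79.7 | 0.049
comp226222_c0 | sync | GOGA4_HUMAN | ↑ | 22.162 | 8.15E-7 | 52.8 | 0.032
comp226107_c0 | c7orf62, 4921511h03rik | CG062_HUMAN | → | 31.33 | 1.28E-36 | 130.0 | 0.492
comp226107_c0 | gpank1 | CG062_HUMAN | → | 27.822 | 5.88E-36 | 134.0 | 0.508
comp225659_c0 | ccdc146 | CC146_HUMAN | ↑ | 42.094 | 0.0 | 673.0 | 0.486
comp225659_c0 | ruvbl1 | CC146_HUMAN | ↑ | 75.385 | 0.0 | 713.0 | 0.514
comp215508_c0 | rab1a | RAB1A_LYMST | → | 91.787 | 1.7E-138 | 388.0 | 0.642
comp215508_c0 | rab35 | RAB1A_LYMST | → | 53.883 | 9.04E-72 | 216.0 | 0.358
comp219534_c1 | syt5 | SY65_APLCA | → | 58.873 | 5.98E-130 | 380.0 | 0.469
comp219534_c1 | syt2 | SY65_APLCA | → | 65.092 | 1.43E-149 | 431.0 | 0.531
comp220860_c0 | rab11fip3 | RFP4A_DANRE | ↓ | 29.983 | 4.9E-47 | 177.0 | 0.530
comp220860_c0 | rab11fip4 | RFP4A_DANRE | ↓ | 34.018 | 1.9E-41 | 157.0 | 0.470
comp223558_c1 | | Y1354_DROME | ↓ | 78.448 | 4.15E-54 | 193.0 | 0.254
comp223558_c1 | ola1 | Y1354_DROME | ↓ | 69.674 | 0.0 | 567.0 | 0.746
comp217699_c0 | eno4 | ENO4_XENLA | ↑ | 31.025 | 1.86E-54 | 196.0 | 0.698
comp217699_c0 | ankrd53 | ENO4_XENLA | ↑ | 41.228 | 2.09E-19 | 84.7 | 0.302
comp224277_c5 | crb3 | OSBL1_RAT | → | 50.259 | 9.13E-51 | 180.0 | 0.280
comp224277_c5 | osbpl2 | OSBL1_RAT | → | 50.505 | 2.79E-156 | 464.0 | 0.720
comp225803_c3 | deaf1 | RN150_DANRE | → | 58.192 | 1.35E-66 | 227.0 | 0.444
comp225803_c3 | rnf150 | RN150_DANRE | → | 45.509 | 6.1E-87 | 284.0 | 0.556
comp223952_c0 | cnnm2 | CNNM2_RAT | → | 53.253 | 0.0 | 624.0 | 0.509
comp223952_c0 | cnnm4 | CNNM2_RAT | → | 52.431 | 0.0 | 602.0 | 0.491
comp223449_c0 | tmem63b | TM63B_HUMAN | → | 42.857 | 0.0 | 602.0 | 0.537
comp223449_c0 | tmem63c | TM63B_HUMAN | → | 37.531 | 4.87E-173 | 519.0 | 0.463
comp224146_c1 | ift80 | IFT80_HUMAN | ↑ | 61.73 | 0.0 | 1041.0 | 0.461
comp224146_c1 | rp11-432b6.3 | IFT80_HUMAN | ↑ | 62.338 | 0.0 | 847.0 | 0.375
comp224146_c1 | srsf7 | IFT80_HUMAN | ↑ | 56.911 | 1.98E-42 | 157.0 | 0.069
comp224146_c1 | cxorf58 | IFT80_HUMAN | ↑ | 40.149 | 7.25E-67 | 215.0 | 0.095
comp211932_c1 | fam102a | F102A_XENLA | → | 62.069 | 1.3E-48 | 160.0 | 0.497
comp211932_c1 | fam102b | F102A_XENLA | → | 65.487 | 2.26E-49 | 162.0 | 0.503
comp226135_c0 | dnah2 | DYH2_MOUSE | ↑ | 70.918 | 0.0 | 3872.0 | 0.681
comp226135_c0 | srrm2 | DYH2_MOUSE | ↑ | 69.342 | 0.0 | 1810.0 | 0.319
comp224209_c0 | klhl22 | KLHL9_MOUSE | → | 30.981 | 3.64E-86 | 281.0 | 0.573
comp224209_c0 | klhl33 | KLHL9_MOUSE | → | 26.461 | 7.09E-58 | 209.0 | 0.427
comp224361_c0 | ttc30a | TT30A_XENLA | ↑ | 72.34 | 0.0 | 1003.0 | 0.498
comp224361_c0 | ttc30b | TT30A_XENLA | ↑ | 73.323 | 0.0 | 1010.0 | 0.502
comp225081_c0 | kif27 | KIF27_HUMAN | ↑ | 39.557 | 0.0 | 956.0 | 0.687
comp225081_c0 | kif7 | KIF27_HUMAN | ↑ | 41.208 | 1.01E-129 | 436.0 | 0.313
comp221442_c1 | cdkl1 | CDKL1_MOUSE | ↑ | 65.868 | 6.96E-163 | 461.0 | 0.507
comp221442_c1 | cdkl4 | CDKL1_MOUSE | ↑ | 64.375 | 3.63E-158 | 449.0 | 0.493
comp223092_c1 | wdr62 | MABP1_XENLA | ↓ | 50.313 | 1.77E-171 | 523.0 | 0.484
comp223092_c1 | mapkbp1 | MABP1_XENLA | ↓ | 54.451 | 0.0 | 557.0 | 0.516
comp221372_c2 | tekt5 | TEKT3_BOVIN | → | 44.597 | 1.93E-141 | 417.0 | 0.457
comp221372_c2 | tekt3 | TEKT3_BOVIN | → | 51.772 | 1.1E-172 | 496.0 | 0.543
comp225047_c1 | nin | NINL_HUMAN | ↑ | 32.827 | 4.85E-39 | 150.0 | 0.456
comp225047_c1 | ninl | NINL_HUMAN | ↑ | 36.278 | 4.94E-49 | 179.0 | 0.544
comp223095_c1 | ac002365.1, cldn34 | TTLL3_DANRE | → | 38.182 | 8.73E-10 | 60.1 | 0.065
comp223095_c1 | gns | TTLL3_DANRE | → | 54.545 | 0.0 | 546.0 | 0.586
comp223095_c1 | slc16a7 | TTLL3_DANRE | → | 46.802 | 3.24E-106 | 325.0 | 0.349
comp218963_c1 | sgk196 | SG196_CHICK | ↓ | 45.965 | 3.31E-87 | 267.0 | 0.164
comp218963_c1 | atic | SG196_CHICK | ↓ | 71.115 | 0.0 | 917.0 | 0.564
comp218963_c1 | eny2 | SG196_CHICK | ↓ | 78.431 | 1.05E-54 | 171.0 | 0.105
comp218963_c1 | | SG196_CHICK | ↓ | 71.023 | 7.76E-83 | 271.0 | 0.167
comp221521_c0 | stom | MEC2_CAEEL | ↓ | 67.416 | 4.01E-123 | 353.0 | 0.585
comp221521_c0 | nphs2 | MEC2_CAEEL | ↓ | 47.287 | 6.04E-81 | 250.0 | 0.415
comp221112_c0 | ppp1cc | PP1A_RAT | ↓ | 92.966 | 0.0 | 626.0 | 0.330
comp221112_c0 | ppp1ca | PP1A_RAT | ↓ | 92.097 | 0.0 | 638.0 | 0.336
comp221112_c0 | c11orf74 | PP1A_RAT | ↓ | 80.921 | 9.81E-178 | 497.0 | 0.262
comp221112_c0 | spink1 | PP1A_RAT | ↓ | 98.462 | 5.57E-38 | 137.0 | 0.072
comp223950_c0 | slc4a1 | S4A10_HUMAN | ↑ | 35.512 | 0.0 | 558.0 | 0.209
comp223950_c0 | pgm3 | S4A10_HUMAN | ↑ | 54.219 | 0.0 | 1041.0 | 0.390
comp223950_c0 | slc4a7 | S4A10_HUMAN | ↑ | 53.683 | 0.0 | 1068.0 | 0.400
comp223693_c5 | wdr5b | WDR5_RAT | ↓ | 77.545 | 0.0 | 538.0 | 0.472
comp223693_c5 | wdr5 | WDR5_RAT | ↓ | 87.126 | 0.0 | 601.0 | 0.528
comp219841_c2 | hap1 | TRAK1_HUMAN | ↓ | 29.467 | 1.12E-22 | 103.0 | 0.201
comp219841_c2 | trak1 | TRAK1_HUMAN | ↓ | 45.586 | 3.24E-131 | 409.0 | 0.799
comp216399_c0 | wnt3 | WNT5A_AMBME | ↓ | 46.631 | 2.27E-110 | 327.0 | 0.530
comp216399_c0 | wnt6 | WNT5A_AMBME | ↓ | 40.526 | 1.37E-95 | 290.0 | 0.470
comp220674_c5 | cep57l1 | CATB_RAT | ↓ | 50.867 | 1.48E-58 | 201.0 | 0.272
comp220674_c5 | ccdc67 | CATB_RAT | ↓ | 51.034 | 1.23E-44 | 159.0 | 0.215
comp220674_c5 | iffo2 | CATB_RAT | ↓ | 50.394 | 5.07E-38 | 140.0 | 0.189
comp220674_c5 | ctsb | CATB_RAT | ↓ | 54.589 | 2.19E-75 | 239.0 | 0.323
comp221632_c2 | bbs12 | KPC1_APLCA | ↑ | 85.271 | 3.65E-69 | 237.0 | 0.106
comp221632_c2 | prkcb | KPC1_APLCA | ↑ | 70.997 | 0.0 | 1006.0 | 0.448
comp221632_c2 | prkca | KPC1_APLCA | ↑ | 70.0 | 0.0 | 1001.0 | 0.446
comp210911_c0 | got1 | AATC_CHICK | ↓ | 56.863 | 1.72E-168 | 478.0 | 0.633
comp210911_c0 | got1l1 | AATC_CHICK | ↓ | 36.504 | 2.86E-89 | 277.0 | 0.367
comp223886_c0 | cby1 | CBY1_BOVIN | ↑ | 49.219 | 1.19E-26 | 96.7 | 0.484
comp223886_c0 | fam227a | CBY1_BOVIN | ↑ | 23.423 | 9.27E-23 | 103.0 | 0.516
comp221104_c0 | pepd | PEPD_MOUSE | → | 64.904 | 0.0 | 600.0 | 0.691
comp221104_c0 | enkd1 | PEPD_MOUSE | → | 61.421 | 2.77E-85 | 268.0 | 0.309
comp221250_c1 | atp2b1 | AT2B2_RAT | ↓ | 61.905 | 0.0 | 1072.0 | 0.246
comp221250_c1 | atp2b4 | AT2B2_RAT | ↓ | 62.081 | 0.0 | 1082.0 | 0.248
comp221250_c1 | atp2b2 | AT2B2_RAT | ↓ | 62.205 | 0.0 | 1099.0 | 0.252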

Table B.2.2: (continued)

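Component ID | Gene Name | Local Annotation | DE Info | Percent Identity | EValue | BitScore | Weighted Score
comp221250_c1 | atp2b3 | AT2B2_RAT | ↓ | 63.551 | 0.0 | 1102.0 | 0.253
comp222868_c0 | ift74 | IFT74_MOUSE | ↑ | 51.01 | 0.0 | 562.0 | 0.731
comp222868_c0 | alg14 | IFT74_MOUSE | ↑ | 46.154 | 8.39E-68 | 207.0 | 0.269
comp226242_c0 | syne2 | SYNE1_HUMAN | ↑ | 26.203 | 0.0 | 733.0 | 0.436
comp226242_c0 | syne1 | SYNE1_HUMAN | ↑ | 29.626 | 0.0 | 947.0 | 0.564
comp222784_c0 | ehbp1 | EHBP1_HUMAN | → | 44.984 | 1.5E-161 | 498.0 | 0.504
comp222784_c0 | eif2ak1 | EHBP1_HUMAN | → | 69.811 | 4.18E-50 | 171.0 | 0.173
comp222784_c0 | agfg2 | EHBP1_HUMAN | → | 63.136 | 2.38E-98 | 320.0 | 0.324
comp224530_c1 | wnk1 | WNK1_MOUSE | → | 68.38 | 1.64E-174 | 584.0 | 0.526
comp224530_c1 | wnk4 | WNK1_MOUSE | → | 60.241 | 5.19E-162 | 527.0 | 0.474
comp224341_c4 | tns1 | TENS_CHICK | × | 60.204 | 9.22E-130 | 412.0 | 0.502
comp224341_c4 | nfe2l2 | TENS_CHICK | × | 60.204 | 2.65E-134 | 408.0 | 0.498
comp212519_c0 | ptch1 | PTC1_MOUSE | × | 46.202 | 0.0 | 1042.0 | 0.405
comp212519_c0 | tmem17 | PTC1_MOUSE | × | 45.994 | 2.5E-93 | 311.0 | 0.121
comp212519_c0 | ube2v2 | PTC1_MOUSE | × | 37.968 | 3.71E-34 | 133.0 | 0.052
comp212519_c0 | cdhr4 | PTC1_MOUSE | × | 36.923 | 2.96E-32 | 132.0 | 0.051
comp212519_c0 | ptch2 | PTC1_MOUSE | × | 47.085 | 0.0 | 952.0 | 0.370
comp226224_c0 | fsip1 | FSIP1_XENLA | ↑ | 29.25 | 2.92E-16 | 82.8 | 0.050
comp226224_c0 | itpr3 | FSIP1_XENLA | ↑ | 51.985 | 0.0 | 1576.0 | 0.950
comp225803_c2 | cetn2 | CETN2_MOUSE | ↑ | 84.795 | 2.82E-100 | 286.0 | 0.504
comp225803_c2 | cetn1 | CETN2_MOUSE | ↑ | 81.287 | 4.83E-98 | 281.0 | 0.496
comp221363_c4 | grb10 | RAPH1_HUMAN | ↓ | 33.065 | 1.27E-35 | 144.0 | 0.485
comp221363_c4 | grb7 | RAPH1_HUMAN | ↓ | 34.118 | 9.65E-40 | 153.0 | 0.515
comp225174_c0 | ephb1 | EPA4A_XENLA | ↑ | 44.054 | 0.0 | 702.0 | 0.255
comp225174_c0 | epha4 | EPA4A_XENLA | ↑ | 44.608 | 0.0 | 714.0 | 0.260
comp225174_c0 | epha2 | EPA4A_XENLA | ↑ | 40.516 | 0.0 | 635.0 | 0.231
comp225174_c0 | ephb3 | EPA4A_XENLA | ↑ | 42.857 | 0.0 | 697.0 | 0.254
comp217381_c4 | dcdc2, dcdc2a | DCDC2_RAT | ↑ | 34.085 | 1.5E-47 | 169.0 | 0.626
comp217381_c4 | dcdc2b | DCDC2_RAT | ↑ | 26.891 | 7.26E-24 | 101.0 | 0.374
comp218016_c3 | c11orf88 | SEPT2_DROME | → | 63.291 | 9.41E-28 | 111.0 | 0.163
comp218016_c3 | sept8 | SEPT2_DROME | → | 66.993 | 0.0 | 568.0 | 0.837
comp220871_c0 | hist1h4c | H4_XENTR | → | 100.0 | 3.39E-67 | 206.0 | 0.145
comp220871_c0 | hist1h4j | H4_XENTR | → | 100.0 | 2.0E-68 | 200.0 | 0.141
comp220871_c0 | hist1h4a | H4_XENTR | → | 100.0 | 1.11E-68 | 204.0 | 0.144
comp220871_c0 | hist4h4 | H4_XENTR | → | 100.0 | 2.0E-68 | 200.0 | 0.141
comp220871_c0 | hist1h4a | H4_XENTR | → | 100.0 | 4.08E-69 | 203.0 | 0.143
comp220871_c0 | hist1h4h | H4_XENTR | → | 100.0 | 6.53E-69 | 206.0 | 0.145
comp220871_c0 | hist1h4a | H4_XENTR | → | 100.0 | 2.0E-68 | 200.0 | 0.141
comp226187_c1 | kif1c | KIF1A_AEDAE | → | 64.549 | 0.0 | 857.0 | 0.428
comp226187_c1 | kif1b | KIF1A_AEDAE | → | 65.873 | 0.0 | 1145.0 | 0.572
comp225104_c0 | homer2 | HOME2_MOUSE | → | 44.624 | 2.24E-98 | 296.0 | 0.343
comp225104_c0 | arnt | HOME2_MOUSE | → | 67.092 | 0.0 | 568.0 | 0.657
comp221126_c1 | lrp2bp | CK5P3_RAT | ↓ | 64.789 | 2.89E-23 | 101.0 | 0.191
comp221126_c1 | cdk5rap3 | CK5P3_RAT | ↓ | 47.327 | 1.2E-145 | 427.0 | 0.809
comp224969_c0 | cp110, ccp110 | NNTM_HUMAN | ↑ | 52.041 | 4.85E-58 | 202.0 | 0.127
comp224969_c0 | nnt | NNTM_HUMAN | ↑ | 65.449 | 0.0 | 1394.0 | 0.873
comp221019_c0 | | GLIS3_MOUSE | × | 61.111 | 8.76E-51 | 175.0 | 0.423
comp221019_c0 | glis3 | GLIS3_MOUSE | × | 80.882 | 4.06E-75 | 239.0 | 0.577
comp218728_c0 | rab8a | RAB8A_RAT | ↑ | 83.654 | 6.97E-129 | 362.0 | 0.362
comp218728_c0 | rab10 | RAB8A_RAT | ↑ | 73.936 | 1.3E-98 | 285.0 | 0.285
comp218728_c0 | rab8b | RAB8A_RAT | ↑ | 80.769 | 3.81E-125 | 352.0 | 0.352
comp219418_c3 | sowahb | SWAHC_MOUSE | → | 43.609 | 4.23E-28 | 121.0 | 0.526
comp219418_c3 | sowaha | SWAHC_MOUSE | → | 37.778 | 7.82E-25 | 109.0 | 0.474
comp226201_c0 | dnah1 | DYH6_HUMAN | ↑ | 39.855 | 0.0 | 2747.0 | 0.246
comp226201_c0 | dnah6 | DYH6_HUMAN | ↑ | 56.16 | 0.0 | 4204.0 | 0.377
comp226201_c0 | dnah6 | DYH6_HUMAN | ↑ | 56.236 | 0.0 | 4206.0 | 0.377
comp222658_c2 | dydc2 | RGS22_HUMAN | ↑ | 29.457 | 1.71E-9 | 59.7 | 0.110
comp222658_c2 | rgs22 | RGS22_HUMAN | ↑ | 42.034 | 2.34E-146 | 483.0 | 0.890
comp223451_c0 | rsph10b2, rsph10b | R10B2_HUMAN | ↑ | 42.382 | 2.43E-165 | 494.0 | 0.502
comp223451_c0 | rsph10b | R10B2_HUMAN | ↑ | 42.382 | 1.06E-163 | 490.0 | 0.498
comp216572_c0 | iqce | ALF_SCHMA | → | 65.541 | 1.31E-56 | 200.0 | 0.292
comp216572_c0 | aldoc | ALF_SCHMA | → | 67.857 | 6.32E-173 | 486.0 | 0.708
comp219734_c3 | odf3l1 | DAB_DROME | ↓ | 49.133 | 1.23E-45 | 171.0 | 0.393
comp219734_c3 | spata45, c1orf227 | DAB_DROME | ↓ | 52.857 | 9.15E-18 | 85.1 | 0.196
comp219734_c3 | dab1 | DAB_DROME | ↓ | 42.342 | 1.31E-47 | 179.0 | 0.411
comp226241_c1 | ccdc173 | CC173_HUMAN | ↑ | 35.019 | 1.67E-74 | 247.0 | 0.746
comp226241_c1 | als2cr12 | CC173_HUMAN | ↑ | 43.411 | 6.73E-18 | 84.0 | 0.254
comp223758_c0 | gmppa | GMPAA_DANRE | → | 58.491 | 0.0 | 516.0 | 0.662
comp223758_c0 | c3orf67 | GMPAA_DANRE | → | 52.4 | 2.3E-85 | 264.0 | 0.338
comp221040_c5 | tulp1 | TUB_HUMAN | ↑ | 71.825 | 3.55E-137 | 405.0 | 0.310
comp221040_c5 | tulp3 | TUB_HUMAN | ↑ | 49.569 | 1.55E-144 | 422.0 | 0.323
comp221040_c5 | tub | TUB_HUMAN | ↑ | 52.342 | 2.22E-166 | 479.0 | 0.367
comp216707_c0 | gm1661, c1orf228 | CA228_BOVIN | ↑ | 59.091 | 4.82E-103 | 308.0 | 0.448
comp216707_c0 | meig1 | CA228_BOVIN | ↑ | 50.794 | 4.72E-33 | 118.0 | 0.172
comp216707_c0 | pfdn6 | CA228_BOVIN | ↑ | 60.36 | 2.92E-86 | 262.0 | 0.381
comp217463_c1 | armc3 | ARMC3_HUMAN | ↑ | 43.384 | 0.0 | 721.0 | 0.841
comp217463_c1 | npnt | ARMC3_HUMAN | ↑ | 55.462 | 1.69E-32 | 136.0 | 0.159
comp219214_c3 | dzip1 | DZIP1_MOUSE | ↑ | 25.608 | 1.96E-43 | 171.0 | 0.517
comp219214_c3 | dzip1l | DZIP1_MOUSE | ↑ | 29.243 | 2.56E-40 | 160.0 | 0.483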
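
The Weighted Score values in Table B.2.2 appear consistent with each hit's BitScore divided by the summed BitScore of all hits reported for the same component (for example, 265.0 / (265.0 + 61.2) ≈ 0.812 for arl3 in comp223601_c2). The Python sketch below reproduces that normalization under this assumption. The function name `weighted_scores` and the tuple layout are illustrative choices, not taken from the dissertation's pipeline.

```python
from collections import defaultdict

# (component_id, gene_name, bit_score) rows copied from Table B.2.2
hits = [
    ("comp223601_c2", "arl3", 265.0),
    ("comp223601_c2", "arl9", 61.2),
    ("comp225321_c0", "gnai2", 606.0),
    ("comp225321_c0", "gnai3", 616.0),
    ("comp225321_c0", "gnai1", 631.0),
]


def weighted_scores(rows):
    """Assumed rule: bit score of a hit / total bit score of its component."""
    totals = defaultdict(float)
    for component, _gene, bits in rows:
        totals[component] += bits
    return {
        (component, gene): round(bits / totals[component], 3)
        for component, gene, bits in rows
    }


scores = weighted_scores(hits)
for (component, gene), score in scores.items():
    print(component, gene, score)
```

Under this reading, the scores of all hits for one component sum to 1 up to rounding, which is also how rows can be regrouped by component when the table layout is lost.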

Supplementary Table B.2.3

Table B.2.3: Strict core of known ciliary genes. From left to right: Gene name and Localization terms as reported in the original source; Transcription factor (TF) interaction, indicating whether the gene is reported as a target of FoxJ1 and/or Rfx; Ciliopathy, indicating confirmed (X) or candidate (X) involvement in a ciliopathic phenotype; P. dumerilii Transcript ID (Component ID) and Local annotation (based on PdumBase); and differential expression status in the hyperciliated embryos: upregulated (↑), downregulated (↓), and neutral or with no significant change of expression (→). The 48 Strict Core known ciliary genes that are upregulated represent the High Confidence (HC) ciliary genes in P. dumerilii.

Gene Name | Localization | TF interaction | Ciliopathy | Component ID | Local annotation | DE Info
mak | cilium, cilia regulation - length, axoneme | foxj1, rfx | X | comp198929_c1 | MAK | ↑
ssna1 | bb | foxj1, rfx | X | comp206699_c0 | SSNA1 | →
ift27 | basal body, cilium, ift, ift-b | foxj1, rfx | X | comp208958_c0 | IFT27 | ↑
nme5 | radial spoke component | foxj1, rfx | X | comp212949_c0 | NDK5 | ↑
ropn1l | cilium, cytosol, sperm fibrous sheath | foxj1, rfx | X | comp213064_c0 | ROP1L | ↑
ttc29 | axoneme, axonemal dynein | foxj1, rfx | X | comp215574_c0 | TTC29 | ↑
rsph4a | cilium, axoneme, central pair | foxj1, rfx | X | comp215635_c1 | RSH4A | ↑
ttc26 | basal body, cilium, ift-b | foxj1, rfx | X | comp215953_c2 | TTC26 | ↑
ift20 | cilium, basal body, golgi, ift, ift-b | | X | comp218002_c0 | IFT20 | ↑
wdr35 | cilium, ift, ift-a | foxj1, rfx | X | comp218580_c2 | WDR35 | ↑
ccdc164, drc1 | axoneme, nexin bridge | foxj1, rfx | X | comp218643_c0 | CC164 | ↑
ift122 | basal body, cilium, ift, ift-a | foxj1, rfx | X | comp218760_c0 | IF122 | ↑
dnal1 | axoneme, axonemal dynein complex | foxj1, rfx | X | comp219559_c2 | DNAL1 | ↑
dnai2, dnaic2 | axoneme, axonemal dynein complex | foxj1, rfx | X | comp219835_c3 | DYI3 | ↑
dnai1, dnaic1 | axoneme, axonemal dynein complex | foxj1, rfx | X | comp220600_c0 | DYI2 | ↑
ak7 | apical cell localisation | foxj1, rfx | X | comp220683_c1 | KAD7 | ↑
ccdc37, cfap100 | chlamydomonas flagellar protein | foxj1, rfx | X | comp220764_c1 | CCD37 | ↑
cluap1 | centriole, cilium, ift-b subcomplex | foxj1, rfx | X | comp220828_c3 | CLUA1 | ↑
rsph3, rsph3b | cilium, axoneme, central pair, radial spoke | foxj1, rfx | X | comp221083_c1 | RSPH3 | ↑
rsph9 | cilium, axoneme, radial spoke | foxj1, rfx | X | comp221232_c1 | RSPH9 | ↑
gas8 | axoneme, dynein regulatory complex | foxj1, rfx | X | comp221531_c4 | GAS8 | ↑
ift81 | basal body, cilium, ift, ift-b | foxj1, rfx | X | comp221709_c0 | IFT81 | ↑
traf3ip1 | cilium, basal body, cytoplasm, cytoskeleton, ift, ift-b | foxj1, rfx | X | comp221993_c0 | MIPT3 | ↑
dync2h1 | golgi, axoneme, ift retrograde transport | foxj1 | X | comp222143_c1 | DYHC2 | ↑
tekt2 | cilium, axoneme | foxj1, rfx | X | comp222360_c1 | TEKT2 | ↑
spef2 | central pair | foxj1, rfx | X | comp222379_c0 | SPEF2 | ↑
dnali1 | cilium, axonemal dynein | foxj1, rfx | X | comp222643_c0 | IDLC | ↑
efhc1 | axoneme | foxj1, rfx | X | comp222825_c2 | EFHC1 | ↑
ift74 | basal body, cilium, ift, ift-b | foxj1, rfx | X | comp222868_c0 | IFT74 | ↑
ift88 | basal body, cilium, ift, ift-b | foxj1, rfx | X | comp222956_c0 | IFT88 | ↑
ift172 | basal body, cilium, ift, ift-b | foxj1, rfx | X | comp223399_c0 | IF172 | ↑
ccdc39 | axoneme, axonemal dynein complex assembly | foxj1, rfx | X | comp223499_c0 | CCD39 | ↑
arl3 | cytosol, lipidated protein transport | foxj1, rfx | X | comp223601_c2 | ARL3 | ↑
mns1 | axoneme, axoneme (flagella) | foxj1, rfx | X | comp224106_c1 | MNS1 | →
ift80 | basal body, cilium, ift, ift-b | foxj1, rfx | X | comp224146_c1 | IFT80 | ↑
wdr19 | cilium, ift, ift-a | foxj1, rfx | X | comp224358_c1 | WDR19 | ↑
ttc30b | cilium, ift, ift-b | foxj1, rfx | X | comp224361_c0 | TT30A | ↑
bbs9 | cilium, bbs - ift-associated | foxj1, rfx | X | comp224469_c0 | PTHB1 | ↑
ttc21b | axoneme, ift-a | | X | comp224645_c0 | TT21B | →
rsph1 | cilium, axoneme, central pair, radial spoke | foxj1, rfx | X | comp224656_c0 | RSPH1 | ↑

Table B.2.3: (continued) Gene Localization TF inter- Cil- Transcript ID Local DE Name action iopa- annotation Info thy bbs7 basal body, cilium, bbs - ift-associated foxj1, rfx X comp224877_c2 BBS7 → dnah5 axoneme, axonemal dynein complex foxj1 X comp225426_c0 DYH5 ↑ wdr78 axoneme, axonemal dynein foxj1, rfx X comp225469_c1 WDR78 ↑ arl6 cilium, basal body, bbs - ift-associated, X comp225643_c0 ARL6 → transition zone, cytosol ift52 basal body, cilium, ift, ift-b foxj1, rfx X comp225679_c0 IFT52 ↑ ift140 basal body, cilium, ift, ift-a foxj1, rfx X comp225786_c1 IF140 ↑ dnah2 axoneme, axonemal dynein foxj1, rfx X comp226135_c0 DYH2 ↑ ccdc40 axoneme, axonemal dynein complex foxj1, rfx X comp226203_c0 CCD40 ↑ assembly dnah10 axoneme, axonemal dynein foxj1, rfx X comp226243_c2 DYH10 ↑ hydin central pair, central pair function foxj1, rfx X comp226247_c0 HYDIN ↑
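
The selection rule stated in the caption of Table B.2.3 (a strict-core gene counts as High Confidence when it is upregulated in the hyperciliated embryos) can be sketched in a few lines of Python. This is only a minimal illustration and not the pipeline used in this work: the tuple layout, the status labels "up" and "neutral", and the function name `high_confidence` are assumptions introduced here, and only five rows of the table are included.

```python
# Five rows copied from Table B.2.3: (gene name, transcript ID, DE status).
# The tuple layout and the "up"/"neutral" labels are illustrative assumptions;
# they stand in for the arrows used in the DE Info column.
STRICT_CORE_SAMPLE = [
    ("mak", "comp198929_c1", "up"),
    ("ssna1", "comp206699_c0", "neutral"),
    ("ift27", "comp208958_c0", "up"),
    ("mns1", "comp224106_c1", "neutral"),
    ("hydin", "comp226247_c0", "up"),
]


def high_confidence(rows):
    """Return the gene names of strict-core rows whose DE status is 'up'."""
    return [gene for gene, _transcript_id, status in rows if status == "up"]


hc_genes = high_confidence(STRICT_CORE_SAMPLE)
print(hc_genes)
```

Applied to the full table, the same filter would keep every row marked ↑ and drop the rows marked →.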

Bibliography

[216] O. Arnaiz, J. Cohen, A. M. Tassin, and F. Koll. Remodeling Cildb, a popular database for cilia and links for ciliopathies. Cilia, (SUPPLEMENT 1), 2015.

[217] S. P. Choksi, D. Babu, D. Lau, X. Yu, and S. Roy. Systematic discovery of novel ciliary genes through functional genomics in the zebrafish. Development, 141(17):3410–3419, 2014.

[218] Mei I. Chung, Taejoon Kwon, Fan Tu, Eric R. Brooks, Rakhi Gupta, Matthew Meyer, Julie C. Baker, Edward M. Marcotte, and John B. Wallingford. Coordinated genomic control of ciliogenesis and cell movement by RFX2. eLife, 3:e01439, 2014.

[219] Ian K. Quigley and Chris Kintner. Rfx2 Stabilizes Foxj1 Binding at Chromatin Loops to Enable Multiciliated Cell Gene Expression. PLoS Genetics, 13(1), 2017.

[220] Jeremy F. Reiter and Michel R. Leroux. Genes and molecular pathways underpinning ciliopathies. Nature Reviews Molecular Cell Biology, 2017.

[221] Monika Abedin Sigg, Tabea Menchen, Chanjae Lee, Jeffery Johnson, Melissa K. Jungnickel, Semil P. Choksi, Galo Garcia, Henriette Busengdal, Gerard W. Dougherty, Petra Pennekamp, Claudius Werner, Fabian Rentzsch, Harvey M. Florman, Nevan Krogan, John B. Wallingford, Heymut Omran, and Jeremy F. Reiter. Evolutionary Proteomics Uncovers Ancient Associations of Cilia with Signaling Pathways. Developmental Cell, 2017.

[222] Teunis J.P. Van Dam, Gabrielle Wheway, Gisela G. Slaats, Martijn A. Huynen, and Rachel H. Giles. The SYSCILIA gold standard (SCGSv1) of known ciliary components and its applications within a systems biology consortium. Cilia, 2013.