<<

Exploring the bZIP regulatory network in the Amphimedon queenslandica: Insights into the ancestral regulatory genome

Katia Jindrich B.Sc, M.Sc

A thesis submitted for the degree of Doctor of Philosophy at The University of Queensland in 2016 School of Biological Sciences

I Abstract

What genomic innovations supported the emergence of multicellular , over 600 million years ago, remains one of the most fundamental questions in evolutionary biology. Increasing evidence suggests that the elaboration of the regulatory mechanisms controlling , rather than gene innovation, underlies this transition. Since transcriptional regulation is largely achieved through the binding of specific transcription factors to specific cis-regulatory DNA, understanding the early evolution of these master orchestrators is key to retracing the origin of animals (metazoans). Basic (bZIP) transcription factors constitute one of the most ancient and conserved families of transcriptional regulators. They play a pivotal role in multiple pathways that regulate cell decisions and behaviours in all kingdoms of life. Here, I explore the early evolution and putative roles of bZIPs in a representative of one of the oldest surviving animal phyla, the marine sponge Amphimedon queenslandica.

Using recently sequenced genomes from across the Eukaryota, and in particular the Metazoa, I first reconstruct the evolution of bZIPs, and document that this superfamily has undergone multiple independent kingdom-level expansions. I present here a thorough analysis of the whole superfamily of bZIP transcription factors across the main eukaryotic and the putative bZIP network that was in place in the ancestor of all extant animals. To then explore the putative role of bZIP transcription factors in the metazoan ancestor, I identify the bZIP complement in the Amphimedon queenslandica (phylum Porifera). As expected for regulatory molecules, a majority of the 17 A. queenslandica bZIPs display high temporal specificity, cell-type specific localisation and are dynamically expressed throughout development. Along with bZIPs high expression across developmental stages, these characteristics point towards bZIPs having a fundamental role in this sponge. Given the consistent central role of these transcription factors in the functioning of extant animals, I infer that were already integrated in developmental and cell specific regulatory networks in the ancestral regulatory genome and supported the emergence of multicellularity.

The PAR and ATF4 orthologues particularly stand out: they are highly expressed throughout Amphimedon life cycle and peak in expression at key developmental transitions. A. queenslandica PAR-bZIP AqPARa is part of the sponge circadian network. In this thesis, I establish that AqPARa is both spatially and temporally co-expressed with A. queenslandica cryptochrome AqCRY2, and that expression of both genes is entrained by light. I also explore the involvement of circadian genes that are conserved in other phyla in A. queenslandica response to light. This work provides insights

II on the animal ancestral circadian regulatory network, and the evolution of PAR-bZIPs implication in the regulation of environmentally cued behaviours.

A. queenslandica ATF4 orthologue, AqATF4, is highly expressed in larval archeocytes. Using chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) of AqATF4 and five histone H3 post-translational modifications (H3K4me3, H3K4me1, H3K27ac, H3K36me3 and H3K27me3), I explore AqATF4 putative target genes and binding sites in A. queenslandica larva, and the complexity of A. queenslandica regulatory landscape. I conclude that many of the roles bZIPs play in bilaterians have a more ancient origin, and were present in the last common ancestor of all contemporary animals.

III Declaration by author

This thesis is composed of my original work, and contains no material previously published or written by another person except where due reference has been made in the text. I have clearly stated the contribution by others to jointly-authored works that I have included in my thesis.

I have clearly stated the contribution of others to my thesis as a whole, including statistical assistance, survey design, data analysis, significant technical procedures, professional editorial advice, and any other original research work used or reported in my thesis. The content of my thesis is the result of work I have carried out since the commencement of my research higher degree candidature and does not include a substantial part of work that has been submitted to qualify for the award of any other degree or diploma in any university or other tertiary institution. I have clearly stated which parts of my thesis, if any, have been submitted to qualify for another award.

I acknowledge that an electronic copy of my thesis must be lodged with the University Library and, subject to the policy and procedures of The University of Queensland, the thesis be made available for research and study in accordance with the Copyright Act 1968 unless a period of embargo has been approved by the Dean of the Graduate School.

I acknowledge that copyright of all material contained in my thesis resides with the copyright holder(s) of that material. Where appropriate I have obtained copyright permission from the copyright holder to reproduce material in this thesis.

IV Publications during candidature Peer-reviewed papers Jindrich, K. and Degnan, B.M. 2016. The diversification of the basic leucine zipper family in eukaryotes correlates with the evolution of multicellularity. BMC Evolutionary Biology 16: 28.

Conference abstracts Jindrich K, Roper KE, Lemon S, Degnan S and Degnan BM. (2016). Functional diversity of bZIP transcription factors in the sponge Amphimedon queenslandica: insights into the ancestral animal regulatory genome (Oral presentation). The European Society of Evolutionary Developmental Biology, Sixth Meeting, Uppsala, Sweden.

Jindrich K, Roper KE, Lemon S, Degnan BM and Degnan S. (2016). An ancient role for bZIP trancsrition factors? PAR-bZIPs and the sponge circadian (Poster presentation). Society for Molecular Biology and Evolution 2016, Gold Coast, Australia.

Jindrich K, Roper KE, Lemon S, Reitzel AM, Degnan BM and Degnan S. (2015). Rhythmic gene expression in the demosponge Amphimedon queenslandica: how old is the animal circadian clock? (Oral presentation). 2015 EMBL Australia PhD Symposium, Melbourne, Australia

V Publications included in this thesis Jindrich, K. and Degnan, B.M. 2016. The diversification of the basic leucine zipper family in eukaryotes correlates with the evolution of multicellularity. BMC Evolutionary Biology 16: 28.

This work was incorporated as Chapter 2.

Contributor Statement of contribution Katia Jindrich (Candidate) Designed study (80%) Conducted analysis (100%) Wrote and edited the paper (75%) Bernard Degnan Designed study (20%) Wrote and edited paper (25%)

VI Contributions by others to the thesis

Bernard M. Degnan contributed to the conception and design of this research, advised on methods and analyses, and provided critical comments on the thesis and all the associated publications.

Milos Tanurdzic advised on methods and analyses in Chapter 3 and 5.

Sandie Degnan contributed to the conception and design of Chapter 5 and provided critical comments on the associated manuscript.

Sussan Lemon contributed with adult sample collection in Chapter 4.

Kerry E. Roper contributed with qPCR analyses on adult samples and contributed to experiment design in Chapter 4.

Selene L. Fernandez-Valverde produced several of the datasets analysed across this thesis (these contributions are specifically acknowledged in the relevant chapters).

Federico Gaiti advised on methods and analyses in Chapter 5, and provided critical comments on the associated manuscript.

Chromatin immunoprecipitation sequencing was conducted by Central Analytical Research Facility (CARF), Australia.

Statement of parts of the thesis submitted to qualify for the award of another degree

None.

VII Acknowledgements

I would like to start by thanking my remarkable and tireless supervisor, Bernie Degnan. There are quite a lot of things I would like to thank you for, so I’ll try and stay “focused and concise”, as you so relentlessly taught me. Thank you for your tireless enthusiasm, your guidance through this project and meeting my natural scepticism with a never-ending supply of optimism. I am deeply grateful for all the opportunities and independence you have given me to learn, and grow as a scientist. The passion you have for science and teaching is truly inspiring. And last but not least, thank you for the understanding and compassion you have never failed to show when I needed.

All my thanks to my co-supervisor, Milos (I won’t write your Surname, since I’m bound to mess up the accents!). Thanks for making time for me, often on short notice, and trying to translate my nonsense into questions you could actually answer. And for your “You won’t know until you do it” attitude that helped me move forward more than once.

I also wish to thank Sandie Degnan, not only for her contribution to this thesis, but for her infinite kindness and enthusiasm. You’ve always had a way of asking just the right question to make everything fall in place in my mind. On a more personal note, your genuine Ðand quite motherly, if I may say- interest in the details of our lives greatly contributes to that warm and welcoming atmosphere that you and Bernie have created in the lab.

All my thanks to Christine Beveridge and Paul Ebert for their time, enthusiasm and helpful comments at each of my milestones. Many thanks are also due to Gail Walter, postgraduate administrator, for her endless patience.

This project was funded by a grant from the Australian Research Council (ARC) awarded to Bernie Degnan. I thank the University of Queensland (UQ) and the ARC for awarding me the UQ International and an ARC Laureate Postgraduate Scholar scholarships, respectively. I also wish to thank the UQ Graduate School, the School of Biological Sciences and the Marine Biological Laboratory (MBL) for their generous funding, which allowed me to attend the MBL Gene Regulatory Networks course in Woods Hole, USA and the European Society for Evolutionary Developmental Biology 2016 meeting in Uppsala, Sweden. Finally, I wish to thank Bernie Degnan for financially contributing to these trips and supporting my attendance at several other conferences, as well as number of field trips to Heron Island. It was a great pleasure Ðand great fun, too- to get to

VIII meet and share my research with fellow researchers and leaders in this field, and greatly contributed to my growth as a scientist.

I was lucky enough to go on a few field trips to Heron Island. I will fail to find the words to say just how beautiful and special this place is, and, coming from a non-marine biology background, how lucky I’ve always felt to get to call it “my office” -even just for a few weeks-. I wish to thank the research staff of the Heron Island Research Station (HIRS), in particular Liz Perkins and Maureen Roberts, who goes above and beyond to help out, always with a smile.

All my thanks to all of the past and present members of the Degnan lab (in no particular order): Andrew, Jabin, Carmel, Kerry, Jaret, Nobuo, Felipe, Aude, Shun, Selene, Maely, Ben, Daniel, Bec, Fede, Yasu, Kevin, Will, Tahsha, Simone, Claire, Emily, Eunice, Arun, Gemma and Markus. Thanks for your help, advice and friendship that have made this PhD such a wonderful time for me. I am particularly grateful to Andrew and Jabin who made Fede and I so very welcome during our early days in the lab and in Brisbane, and to Shun, Kevin and Aude, for this memorable Heron trip and countless other memories.

To Laura Grice, for all those things that you make look like nothing but that saved me countless hours of pulling my hair out. Your support, patience and humour through it all have been more help to me than I can say.

I am greatly indebted to Bernie Degnan, Sandie Degnan, Laura Grice, Federico Gaiti and Oliver Edmonds for their time and thoughtful comments on the different chapters of this manuscript.

I owe a great deal of my sanity to Kerry Roper, Carmel McDougall and Federico Gaiti. Kerry and Carmel, I would like to do what countless “Degnies” have done before me and thank you for your invaluable help, advice and reassurance, but I am most deeply grateful for your friendship, your kindness and the person you both inspire me to be. And for the many, many times you infected me with some Australian-branded Calm so I wouldn’t just go all Frenchie on people. Fede, all my thanks goes to you for being the “better-me” and thereby leading the way, always one step ahead but holding a hand out to drag me along.

Of all of the great people I’ve met outside the lab, there is one above all who deserves to be named here. All my thanks to you, Jake, for “drop-dancing” into my life that day 3+ years ago and putting me back on my feet when I needed. So many memories that cannot really be put in words, so just

IX thank you for being THERE for the fun, the not-so-fun and the truly boring, and never saying no to yet another rollercoaster (metaphorical or not). You made a lot of this, well, just possible.

To my wonderful mother, who has always taught me never to look back but was always there if I ever did, and who has spent 4 years getting excited about “stuff” in , though she has never actually seen one. Tu as suivi pas à pas ce PhD, comme tu l’as fait pour le reste de ma vie mouvementée. Du fond du cÏur, merci.

And finally, to the poor guy who deserves this degree maybe more than I do, my Ð very patient - partner, Oli. Like in all areas in our life, you always reminded me of the things that mattered, but never once failed to care about the ones that didn’t. Thank you for “all those little things”, your unshakable love, and bringing me “there” Ðsometimes (often?) a little despite myself-, where all the rest begins.

X Keywords transcription factor, bZIP, gene regulatory networks, evolution, gene regulation, complexity, multicellularity, histones, chromatin, sponge

Australian and New Zealand Standard Research Classifications (ANZSRC) 060405 Gene Expression (50%) 060409 Molecular Evolution (40%) 060404 Epigenetics (10%)

Fields of Research (FoR) Classification 0603 Evolutionary Biology (40%) 0604 (40%) 0601 Biochemistry and Cell Biology (20%)

XI TABLE OF CONTENTS

Chapter 1 – GENERAL INTRODUCTION ...... 1 1.1 Evolutionary origins of the animal kingdom ...... 1 1.2 Evolution of animal transcription factors ...... 2 1.3 A. queenslandica transcription factor repertoire ...... 3 1.4 The basic leucine zipper (bZIP) family of transcription factors ...... 4 1.4.1 A large family of highly conserved TFs ...... 5 1.4.2 bZIP binding to DNA ...... 6 1.4.3 bZIP families ...... 6 1.4.4 bZIP functions ...... 6 1.5 The Amphimedon queenslandica model system ...... 8 1.6 Aims ...... 9

Chapter 2 – THE DIVERSIFICATION OF THE BASIC LEUCINE ZIPPER FAMILY IN EUKARYOTES CORRELATES WITH THE EVOLUTION OF MULTICELLULARITY ...... 18 2.1 Abstract ...... 18 2.2 Introduction ...... 18 2.3 Methods ...... 20 2.3.1 Taxonomic sampling and retrieval of bZIPs ...... 20 2.3.2 Phylogenetic analyses ...... 21 2.3.3 Search for other motifs and domains in bZIP-containing proteins ...... 22 2.4 Results ...... 22 2.4.1 Accrual of the animal bZIP repertoire over the course of holozoan evolution ...... 22 2.4.2 Emergence of novel domain combinations in holozoan bZIP families ...... 24 2.4.3 Conservation of eukaryotic bZIP DNA-binding motif ...... 24 2.4.4 Independent expansion of bZIPs in different eukaryotic lineages ...... 25 2.5 Discussion ...... 26 2.5.1 The metazoan bZIP network is comprised of members of differing age ...... 26 2.5.2 The diversification of bZIPs in eukaryotes support a role in the evolution of multicellularity ...... 27 2.5.3 Reconstruction of the ancestral bZIP ...... 28 2.5.4 Conclusions ...... 29

Chapter 3 – CHARACTERISATION OF AMPHIMEDON QUEENSLANDICA BZIP TRANSCRIPTION FACTORS ...... 36 3.1 Abstract ...... 36 3.2 Introduction ...... 36 3.2.1 Normal development of Amphimedon queenslandica ...... 37

XII 3.2.2 Predicting bZIP dimers ...... 38 3.3 Methods ...... 40 3.3.1 AqbZIPs genomic features ...... 40 3.3.2 Phylogenetic analyses ...... 40 3.3.3 A. queenslandica transcriptomics using CEL-Seq ...... 41 3.3.4 Differentially expressed genes ...... 41 3.3.5 Correlation of gene expression profile analysis ...... 41 3.3.6 Gene ontology enrichment analysis ...... 42 3.3.7 Whole mount in situ hybridisation ...... 42 3.3.8 Immunohistochemistry ...... 42 3.3.9 Dimer prediction ...... 43 3.3.10 Promoter searches ...... 44 3.4 Results ...... 44 3.4.1 Genomic organisation of A. queenslandica 17 bZIPs ...... 44 3.4.2 Phylogenetic analysis of A. queenslandica bZIP domains ...... 45 3.4.3 bZIPs are dynamically expressed across A. queenslandica life cycle ...... 45 3.4.4 bZIPs show cell-specific and broad patterns of expression during development .. 46 3.4.4.1. Localised expression of bZIPs during embryogenesis ...... 47 3.4.4.2. Localised expression of bZIPs in larvae ...... 47 3.4.4.3. Localised expression of bZIPs in juvenile sponge (oscula stage) ...... 48 3.4.5 Co-expression of bZIPs with other genes ...... 49 3.4.6 A. queenslandica bZIP dimers prediction ...... 50 3.4.7 Sequence analysis of A. queenslandica bZIPs promoters ...... 51 3.5 Discussion ...... 52 3.5.1 A. queenslandica bZIPs are expressed in a variety of cellular contexts ...... 52 3.5.1.1. Embryonic expression of A. queenslandica bZIPs ...... 52 3.5.1.2. bZIP expression during larval development ...... 53 3.5.1.3. bZIP expression during metamorphosis ...... 54 3.5.1.4. bZIP expression in the 3 day old juvenile ...... 56 3.5.1.5. Functional implications of AqbZIPs expression ...... 57 3.5.2 AqbZIPs dimerization network ...... 58 3.5.3 bZIPs in the metazoan last common ancestor ...... 59

Chapter 4 - LIGHT-ENTRAINED GENE EXPRESSION IN THE DEMOSPONGE AMPHIMEDON QUEENSLANDICA ...... 92 4.1 Abstract ...... 92 4.2 Introduction ...... 92 4.3 Material and Methods ...... 94 4.3.1 Gene identification and phylogenetic analyses ...... 94 XIII 4.3.2 Developmental expression of circadian genes ...... 94 4.3.3 Expression of circadian genes in adults ...... 95 4.3.4 Expression of circadian genes in larvae ...... 95 4.3.5 cDNA synthesis and quantitative qPCR ...... 95 4.3.6 In situ hybridisation ...... 96 4.4 Results ...... 96 4.4.1 The A. queenslandica genome encodes several homologues of circadian genes ..... 96 4.4.2 AqCRY2 and AqPARa spatial and temporal expression are correlated throughout A. queenslandica development ...... 97 4.4.3 Diel patterns of gene expression in field adults ...... 98 4.4.4 Light dependent gene expression in adult sponges ...... 98 4.4.5 Light dependent gene expression in larvae ...... 99 4.5 Discussion ...... 100 4.5.1 Cryptochrome AqCRY2 appears to be light sensor or effector in the negative feedback loop ...... 100 4.5.2 A. queenslandica has orthologues to the positive elements of bilaterian circadian regulatory loops ...... 102 4.5.3 Putative conserved role of AqPARa in the circadian positive feedback loop ...... 103 4.5.4 A circadian clock or a free-running circadian clock? ...... 104

Chapter 5 – AQATF4 DNA-BINDING IN RELATION TO THE HISTONE MODIFICATION LANDSCAPE IN THE AMPHIMEDON QUEENSLANDICA LARVA ...... 113 5.1 Abstract ...... 113 5.2 Introduction ...... 113 5.2.1 ATF4 regulatory role ...... 115 5.3 Material and methods ...... 116 5.3.1 Animal collection ...... 116 5.3.2 Antibodies ...... 116 5.3.3 Chromatin Immunoprecipitation (ChIP) assays ...... 116 5.3.4 ChIP-Seq data analyses ...... 118 5.3.5 Histone H3 PTMs tracks and peaks analyses ...... 119 5.3.6 AqATF4 tracks and peaks analyses ...... 120 5.3.7 De novo motif analyses ...... 120 5.3.8 Functional enrichment analyses ...... 120 5.3.9 Read coverage profiles ...... 120 5.3.10 AqATF4 expression profile and patterns ...... 121 5.3.11 Data acesss ...... 121 5.4 Results ...... 121 XIV 5.4.1 Nine chromatin states were identified in A. queenslandica ...... 121 5.4.2 Histone PTMs correlate with different gene expression groups ...... 122 5.4.3 Enhancer-like elements in A. queenslandica ...... 123 5.4.4 AqATF4 localises to a subset of nuclei in the larval inner cell mass ...... 124 5.4.5 AqATF4 occupancy is not promoter-proximal ...... 125 5.4.6 AqATF4 is associated with poised genomic regions ...... 125 5.4.7 AqATF4 putative targets ...... 126 5.5 Discussion ...... 127 5.5.1 Histone H3 PTMs reveal distinct chromatin states in A. queenslandica larvae ..... 127 5.5.2 AqATF4 binding is enriched around open chromatin ...... 128 5.5.3 AqATF4 appears to have multiple regulatory functions ...... 129 5.5.4 Conclusions ...... 131

Chapter 6 – GENERAL DISCUSSION ...... 144 6.1 What can the evolutionary trajectory of the bZIP family tell us? ...... 144 6.2 Developmental expression and putative roles of bZIPs in A. queenslandica ..... 145 6.3 Putative function of bZIPs in the animal last common ancestor ...... 146 6.4 The animal LCA had complex regulatory mechanisms ...... 147 6.5 Future directives ...... 148 6.6 Concluding remarks ...... 149 REFERENCES………………………………………………………………………………...... 150 APPENDICES…………………………………………………………………………………....169

XV LIST OF FIGURES

Figure 1.1. Phylogenetic relationship between holozoans ...... 10 Figure 1.2. Reconstruction of bHLH evolution in the opisthokonts...... 11 Figure 1.3. Origin and early evolution of metazoan TFs ...... 12 Figure 1.4 Developmental expression of A. queenslandica transcription factors ...... 13 Figure 1.5. bZIP basics ...... 14 Figure 1.6. Amphimedon queenslandica ...... 15 Figure 1.7. A. queenslandica life cycle ...... 16 Figure 2.1. Phylogenetic analysis of the metazoan bZIP superfamily...... 30 Figure 2.2. Evolution of recognised bilaterian bZIP families...... 30 Figure 2.3. Conservation and evolution of bZIPs in eukaryotes...... 32 Figure 2.4. Phylogenetic analysis of eukaryotic bZIPs...... 33 Figure 2.5. A model for the eukaryotic bZIP evolution...... 34 Figure 2.6. Origin and diversification of holozoan bZIPs...... 35 Figure 3.1. Embryonic and larval stages of A. queenslandica...... 61 Figure 3.2. Morphological stages during A. queenslandica metamorphosis ...... 62 Figure 3.3. Predicting bZIP dimers...... 63 Figure 3.4. A. queenslandica bZIP introns and splice variants...... 65 Figure 3.5. AqbZIPs phylogeny...... 65 Figure 3.6. A. queenslandica bZIPs ...... 67 Figure 3.7. AqbZIP expression levels relative to other A. queenslandica genes...... 68 Figure 3.8. Developmental expression profiles of the AqbZIPs...... 68 Figure 3.9. Differentially expressed AqbZIPs across development ...... 72 Figure 3.10. AqbZIPs embryonic expression pattern...... 72 Figure 3.11. AqbZIPs gene expression patterns observed in embryos...... 74 Figure 3.12. AqbZIPs expression in larvae ...... 75 Figure 3.13. AqbZIPs gene expression patterns observed in larvae ...... 76 Figure 3.14. AqATF4 labelling in A. queenslandica larvae ...... 77 Figure 3.15. AqbZIPs expression in a range of cell types in juvenile ...... 78 Figure 3.16. AqbZIPs expression in a range of cell types in juvenile ...... 80 Figure 3.17. AqJUN labelling in A. queenslandica juvenile ...... 81 Figure 3.18. AqATF4 labelling in A. queenslandica juvenile ...... 82 Figure 3.19. Modules of co-expression in A. queenslandica ...... 84 Figure 3.20. Genes correlated with the AqbZIPs ...... 84

XVI Figure 3.21. AqbZIPs leucine zipper features and dimerization interface...... 87 Figure 3.22. Prediction of AqbZIPs dimerization network ...... 88 Figure 3.23. Features of AqbZIPs promoters ...... 89 Figure 4.1. Circadian gene network in bilaterians ...... 105 Figure 4.2. Phylogenetic analysis of A. queenslandica orthologues to circadian genes...... 105 Figure 4.3. Developmental expression of AqPARa...... 108 Figure 4.4. Temporal gene expression of putative circadian genes AqCRY2, AqPARa and AqClock in adults...... 109 Figure 4.5. Residual oscillations in AqCRY2 and AqPARa expression in adults under constant dark regime...... 110 Figure 4.6. Temporal gene expression of putative circadian genes AqCRY2, AqPARa and AqClock in larvae...... 111 Figure 4.7. Residual oscillations in AqCRY2 and AqPARa expression in larvae under constant light conditions...... 112 Figure 5.1. Chromatin states in larva ...... 133 Figure 5.2. Histone PTMs are correlated with gene expression variations during development ... 135 Figure 5.3. H3 PTMs in the vicinity of AqbZIP genes ...... 135 Figure 5.4. Distal enhancer-like elements in A. queenslandica larva ...... 137 Figure 5.5. AqATF4 expression in larva ...... 139 Figure 5.6. Preferential distribution of AqATF4 peaks location across genomic features...... 139 Figure 5.7. Correlation of AqATF4 occupancy sites with H3 PTMs ...... 141 Figure 5.8. AqATF4 putative targets ...... 143

XVII LIST OF TABLES

Table 1.1. Human bZIP families ...... 17 Table 3.1. Genes correlated with the AqbZIPs ...... 90 Table 3.2. Selected genes correlated with the AqbZIPs ...... 91

XVIII LIST OF ABBREVIATIONS General abbreviations AARNet Australian academic and research network AP Anterior-posterior Ar Archeocyte ARC Australian Research Council ARE AP-1 recognition element ATF2 Activating transcription factor 2 ATF3 Activating transcription factor 3 ATF4 Activating transcription factor 4 ATF5 Activating transcription factor 5 ATF6 Activating transcription factor 6 ATF7 Activating transcription factor 7 BATF Basic leucine zipper transcription factor, ATF-like BED Browser extensible data bHLH Basic helix-loop-helix BLAST Basic local alignment search tool BLAT BLAST-like alignment tool BLIND Basic linear index determination of transcriptomes bp Base pair BPP Bayesian posterior probability BR basic region BSA Bovine serum albumin bZIP basic leucine zipper CARE C/ebp-Atf response element CARF Central analytical research facility CBP CREB binding protein Cc Choanocyte chamber cDNA Complementary DNA CEBP CCAAT/enhancer-binding protein CEL-seq Single-cell RNA-seq by multiplexed linear amplification ChIP-seq Chromatin immunoprecipitation sequencing CNC Cap'n'collar CpG 5'—C—phosphate—G—3' CRE cAMP response element

XIX CREB cAMP responsive element-binding protein CTCF CCCTC-binding factor DAPI 4',6-Diamidino-2-Phenylindole, Dihydrochloride db Database DBD DNA-binding domain DE Differentially expressed DIG Digoxigerin DNA Deoxyribonucleic acid DNase Deoxyribonuclease dNTP Deoxyribonucleotide triphosphate DOC Sodium deoxycholate DTT Dithiothreitol E-value Expect value EDTA Ethylenediaminetetraacetic acid EL Epithelial layer ENCODE Encyclopedia of DNA elements FDR False discovery rate FSW Filtered seawater GO Gene ontology GPCR G protein-coupled GRN Gene regulatory network h Hour H3K27ac Histone H3 lysine 27 acetylation H3K27me3 Histone H3 lysine 27 trimethylation H3K36me3 Histone H3 lysine 36 trimethylation H3K4me1 Histone H3 lysine 4 monomethylation H3K4me3 Histone H3 lysine 4 trimethylation HMM Hidden Markov model hpe Hours post emergence hps Hours post settlement ICM Inner cell mass IDR Irreproducible discovery rate Irx Iroquois ISH In situ hybridisation JGI Joint Genome Institute

XX JUN Jun proto-oncogene, AP-1 transcription factor subunit Kb Kilobase KEGG Kyoto encyclopaedia of genes and genomes LCA Last common ancestor LiCl Lithium chloride lncRNA Long non-coding RNA LZ leucine zipper Mb Megabase Min Minute mL Milliliter ML Maximum likelihood mM Millimolar modENCODE encyclopaedia of DNA elements mRNA Messenger RNA n Number NaCl Sodium chloride NaOAc Sodium acetate NCBI National center for biotechnology information NCBI National Center for Biotechnology Information NFE2 Nuclear factor (erythroid-derived 2) NFE2L2 Nuclear factor (erythroid-derived 2)-like 2 NFIL3 Nuclear factor Interleukin 3 ng Nanogram NGS Next-generation sequencing NJ Neighbour joining NP-40 Nonyl phenoxypolyethoxylethanol NSC Normalised strand cross-correlation coefficient nt Nucleotide ORF Open reading frame P-adj Adjusted P-value P-value Calculated probability p300 E1A binding protein p300 PAR proline and acidic amino acid-rich PCR Polymerase chain reaction pp Posterior pole

XXI PS Post-settlement postlarva PTM Post-translation modification RNA Ribonucleic acid RNA-seq RNA sequencing RNAPII RNA polymerase II RNaseA Ribonuclease A rRNA ribosomal RNA RT Room temperature SDS Sodium dodecyl sulfate Sec Second SEL Sub-epithelial layer SMBE Society for molecular biology and evolution SOX Sry-box TE-SDS Tris EDTA buffer with SDS TES Transcription end site TF Transcription factor TGF-b Transforming growth factor beta TRE TPA response element Tris-HCl Tris hydrochloride TSS Transcription start site UniPROBE Universal PBM resource for oligonucleotide binding evaluation UniProt Universal protein resource UPR Unfolded protein response UQ University of Queensland UTR Untranslated region XBP1 X-box binding protein 1 µL Microliter µm Micrometer # Number

Abbreviations of names

Ac Acanthamoeba castellanii Ad Acropora digitifera Al Antonospora locustae

XXII An Aspergillus nidulans Aq Amphimedon queenslandica At Arabidopsis thaliana Bf Branchiostoma floridae Ca Candida albicans Cc Chondrus Crispus Ci Ciona intestinalis Cn Cryptococcus neoformans Co Capsaspora owczarzaki Cr Chlamydomonas reinhardtii Ct Capitella telata Dd Dictyostelium discoideum Dm Drosophila melanogaster Dp Daphnia pulex Ec Encephalitozoon cuniculi Eh Entamoeba histolica Es Ectocarpus sliculosus Gl Giardia lamblia Gs Galdieria sulphuraria Ha Hyaloperonospora arabidopsidis Hm Hydra magnipapillata Hs Homo sapiens Hv Hartmannella vermiformis Lg Lottia gigantea Mb Monosiga brevicollis Mc Mucor circinelloides Mg Magnaporthe grisea Ml Mnemiopsis leydi Ng Naeglaria gruberi Nv Nematostella vectensis Oc Oscarella carmela Pb Pleurobrachia bachei Pi Phytophthora infectans Pp Physarum polycephalum Ps Phytophthora sojae

XXIII Pt Phaeodactylum tricornutum Pu Pythium ultimum Ri Rhizophagus irregularis Sci Sycon ciliatum Sc Saccharomyces cerevisiae Sp Strongylocentrotus purpuratus Spo Schizosaccharomyces pombe Ta Trichoplax adherans Tc Tribolium castaneum Tp Thalassiosira pseudonana Um Ustilago maydis

XXIV CHAPTER 1 Ð GENERAL INTRODUCTION

1.1 Evolutionary origins of the animal kingdom

Over 600 million years ago, the evolution of a multicellular progenitor evolved from a protist ancestor, marking the beginning of the animal kingdom and leading to the tremendous diversity of forms and lifestyles displayed by extant animals (King, 2004). Although multicellularity evolved independently multiple times- notably in animal, fungi and plants- the origin of animals constitutes a pivotal event in the history of life on Earth (Szathmary and Smith, 1995). The reconstruction of the animal ancestor is instrumental in identifying the innovations that supported the transition to animal multicellularity and the subsequent evolution of animals.

Multiple phylogenomic investigations have shown that metazoans are closely related to three unicellular lineages, choanoflagellates (e.g. Monosiga brevicollis), filastereans (e.g. Capsaspora owczarzaki) and ichthyosporeans (e.g. Creolima fragrantissima) (Figure 1.1). (de Mendoza et al., 2015; King et al., 2008; Lang et al., 2002; Ruiz-Trillo et al., 2008; Suga et al., 2013)}. Together these four groups of organisms form the holozoan (Figure 1.1).

The metazoan clade comprises bilaterians (including a majority of animal phyla), cnidarians (hydras, jellyfish, corals, and sea anemones), placozoans, ctenophores and sponges (Figure 1.1). Metazoans form a monophyletic group (Muller, 1995) but the base of the animal tree remains contentious. Notably, whether the divergence of ctenophores (comb jellies) (Dunn et al., 2008; Hejnol et al., 2009; Whelan et al., 2015) or sponges (Philippe et al., 2009; Srivastava et al., 2010) represents the basal metazoan node is still under investigation. The resolution of this phylogenetic conflict bears significant consequence on the inference of ancestral morphological features. Indeed, sponges have a very simple body plan that lacks nerve cells, musculature and centralised gut (Simpson, 1984), while ctenophores have complex nervous and muscular systems (Moroz et al., 2014). This suggests one of two scenarios: (1) the last common ancestor (LCA) possessed those complex morphological features that were lost on the sponge lineage, or (2) the LCA was a simple organism with grade of organisation similar to sponges and, either ctenophores independently evolved complexity Ð there is evidence that the ctenophore nervous system and mesoderm evolved convergently (Moroz et al., 2014; Ryan et al., 2013) - or ctenophores diverged more recently than sponges from the main metazoan stem.

1 Another point of contention is whether sponges form a monophyletic (e.g. Philippe et al., 2009) or paraphyletic (e.g. Sperling et al., 2009) group. Phylum Porifera comprises four classes: Demospongiae -the largest and most speciose class-, Calcarea, Homoscleromorpha and Hexactinellida (Degnan et al., 2009). Increasing evidence support the existence of two sister classes comprised of and hexactinellids on one hand and calcisponges and homoscleromorphs on the other (e.g. Philippe et al., 2009). Although the resolution of sponge phylogeny is critical to our understanding of poriferan evolution and biology, commonalities between sponges suggest that the LCA of all animals Ðor excluding the ctenophores- likely bore a simple body plan and a benthic lifestyle with a pelagic larva form (Leys and Hill, 2012). If sponges are paraphyletic in relation to eumetazoans, this would argue that the LCA of sponges and eumetazoans was sponge-like.

1.2 Evolution of animal transcription factors

The transition from a unicellular to a multicellular lifestyle must have required the evolution of complex gene regulatory networks (GRNs) that coordinate spatiotemporal and cell type-specific gene transcription. In metazoans, these networks are populated by diffusible regulatory molecules that recognise and bind specific sequences of DNA, transcription factors (TFs). Upon activation, TFs move into the nucleus to find their target site and modulate transcription by promoting or preventing the recruiting of the transcriptional machinery to adjacent genes. Interaction of TFs with DNA is mediated by the DNA binding domains (DBDs), whose structural characteristics are traditionally used to group TFs into broad families. TF genes represent only a small proportion of the total number of protein-coding genes in the animal genome (Weirauch and Hughes, 2011). TF conservation and evolution in animals have been the focus of many studies over the last decades (e.g. Duboule and Dolle, 1989; Finnerty et al., 2009; Finnerty et al., 2003; Jager et al., 2006; Kusserow et al., 2005; Ryan et al., 2006) but the recent focus on early-branching metazoans revealed that the complexity of most TF families appears to have increased three times during eumetazoan evolution (Larroux et al., 2008b; reviewed in Degnan et al., 2009). A first expansion phase preceded -and most likely supported- the emergence of animal multicellularity; a second led to the cnidarian-bilaterian TF repertoire, which is typically two to three times larger than that of sponges or placozoans; and finally, a third expansion occurred during the bilaterian stem (e.g. bHLH evolution, Figure 1.2). Those expansions correlate with morphological and cellular transitions, and support the traditional view that a larger TF repertoire contributes towards organismal complexity.

2 Tracking the expansion of gene repertoire through metazoan evolution brings the question of the origin of particular TF subfamilies. Although some TF families present in metazoans evolved after the divergence of metazoan and choanoflagellate lineages (e.g. in Degnan et al., 2009; Larroux et al., 2008b; Richards, 2010), many others have an older origin and are found in unicellular and colonial holozoans, or in all eukaryotes (e.g. in Best et al., 2004; de Mendoza et al., 2013; Gamboa- Melendez et al., 2013; King, 2004; Sebe-Pedros et al., 2011). The sequencing of the Capsaspora owczarzaki genome (Suga et al., 2013) revealed an extensive repertoire of TF domains in this filasterian, including bHLH, bZIP, HMG box, MADS-box, , Forkhead, CP2, , STAT DNA binding and Churchill, T-box, Runt, and the Rel homology domains (Sebe-Pedros et al., 2011). The four latter domains were subsequently lost in Monosiga brevicollis, or in choanoflagellates altogether (Sebe-Pedros et al., 2011). In the metazoan LCA most of these TF families expanded and diversified into conserved subclasses unique to animals (Degnan et al., 2009; Larroux et al., 2008b). Other important metazoan TF families originated along to metazoan stem and are unique to animals, including nuclear receptors, Ets and Smad families (Sebe-Pedros et al., 2011)

These broad TF families are comprised of members of differing evolutionary age; for instance, ANTP or POU are metazoan-specific TFs of the eukaryotic homeobox family (Figure 1.3) (Degnan et al., 2009; Larroux et al., 2008b). The metazoan TF toolkit, which evolved during the transition between unicellular protists and the first multicellular animals, comprise metazoan- specific nuclear receptors, Smad and ETS families along with specific members of TF families of more ancient origin (Figure 1.3).

1.3 A. queenslandica transcription factor repertoire

A. queenslandica TF repertoire is similar to that of eumetazoans and comprises most of the main TF classes, including bHLH (n=16), bZIP (n=17), non-Sox HMG (n=13), homeobox (n=31), T-box (n=7) and ETS (n=9) (Conaco et al., 2012; Degnan et al., 2015; Larroux, 2007; Simionato et al., 2007; Srivastava et al., 2010) (Figure 1.4A). TFs from these classes are expressed in all developmental stages (Figure 1.4) (Anavy et al., 2014; Conaco et al., 2012; Levin et al., 2016). A number of TF families classically involved in metazoan development are also present in A. queenslandica, including Sox and nuclear genes, as well as homeobox families NK, paired- like, Pax, POU, LIM-HD, and Six (Conaco et al., 2012; Srivastava et al., 2010). A. queenslandica however lacks TFs involved in mesoderm development (Fkh, Gsc, Twist, Snail) (Srivastava et al., 2010) consistent with sponges lacking a mesoderm (Leys and Hill, 2012). Despite the absence of a neuromuscular system in sponges (Leys and Hill, 2012; Simpson, 1984), A. 3 queenslandica possesses a number of TFs involved in determination of bilaterian muscle and nerve cells, including PaxB, Lhx genes, SoxB, Msx, , Irx and bHLH neurogenic factors (Srivastava et al., 2010).

A number of these TFs shows dynamic and localised patterns of expression in A. queenslandica (reviewed in Degnan et al., 2015). For instance, expression of homeobox TF AmqBsh is restricted to a subset of micromeres early in embryogenesis and sclerocytes in the larva, suggesting a role for this TF in the early specification of this cell type (Degnan et al., 2015; Larroux, 2007). Similarly, Sox TF AmqSOX is specifically expressed in an uncharacterised cell lineage throughout embryogenesis and larva stages (Degnan et al., 2015; Larroux, 2007). Transcripts of GATA TF AqGATA are localised in the inner layer of late-stage embryos, larvae and juveniles and AqGATA is thought to be involved in the establishment of cell layers in A. queenslandica (Degnan et al., 2015; Nakanishi et al., 2014).

A. queenslandica genome also encodes a number of TFs, ligands and receptors involved in developmental signalling cascades -such as the Wnt, TGF-β, Notch and Hedgehog pathways- that also display localised developmental expression patterns in embryos and larvae (Adamska et al., 2007a; Adamska et al., 2010; Adamska et al., 2007b; Degnan et al., 2015; Richards, 2010; Richards and Degnan, 2012; Srivastava et al., 2010). For instance, the Notch receptor AmqNotch, its ligand AmqDelta1, and one of the 16 sponge bHLH genes AmqbHLH1 are expressed in globular cells in the larva and their putative progenitors in embryos (Richards, 2010). Interestingly, this cell lineage presumably gives rise to sensory cells of the larva epithelium (Degnan et al., 2015; Richards, 2010)

1.4 The basic leucine zipper (bZIP) family of transcription factors

The first members of the bZIP family were discovered over 30 years ago by concurrent work on oncogenic viruses and yeast transcriptional regulation (Curran et al., 1982; van Straaten et al., 1983). CGN4 was identified as a positive regulator of amino acid biosynthesis in the yeast S. cerevisiae (Hinnebusch and Fink, 1983). FOS, JUN and CEBPα were amongst the first mammalian transcription factors to be purified and cloned (Landschulz et al., 1989; Maki et al., 1987). Within a few years, molecular work on those four TFs collectively revealed that bZIPs bind palindromic DNA sites as dimers (Hope and Struhl, 1987), and that those interactions were mediated by a 60 amino acid region (Hope and Struhl, 1986) containing a leucine zipper domain, which mediates dimerization (Sassonecorsi et al., 1988). By 1995, eighty-two bZIP proteins from a variety of eukaryotic species had been cloned and characterised (reviewed by (Hurst, 1994)

4 For over three decades, all aspects of bZIPs functions have been studied. A wealth of publications describing the role of a specific bZIP -or bZIP subfamily- in relation to specific biological contexts is available. What follows is thus by no mean an exhaustive survey of the current state of knowledge on bZIPs but rather an overview of notions that will be used in the following chapters. In particular, I will not address the regulation and activation of bZIPs in great detail. Those topics have been described in (Miller (2009) and references therein).

1.4.1 A large family of highly conserved TFs bZIP TFs have been identified in all eukaryotic kingdoms (de Mendoza et al., 2013). Typically, the bZIP repertoire comprises between 20 and 50 TFs (Amoutzias et al., 2007; Sebe- Pedros et al., 2011; Tian et al., 2011; Zhukovskaya et al., 2006). Plant genomes comprise expanded bZIP repertoires, with for instance 75 and 125 bZIPs identified in Arabidopsis thaliana and Zea mays, respectively (Correa et al., 2008). In eumetzoans, the number of bZIP genes in a species genome generally correlates with morphological and cell type complexity (Amoutzias et al., 2007).

bZIP transcription factors are named after their highly conserved bipartite functional domain. The bZIP domain is 60 to 80 amino acids long (Vinson et al., 2002) and consists of an N- terminal basic region and a leucine zipper (Figure 1.5A). The basic region is 30 amino acids long and contains clusters of similar basic residues that dictate DNA binding preferences. Specifically, five residues make direct contact with specific bases in the major groove of double-stranded DNA (Figure 1.5A) (Fujii et al., 2000). The basic region also contains the nuclear localisation signal (NLS, which in human consists of two clusters of lysine or arginine residues 10-12 residues apart)(Hurst, 1994). In several metazoan and plant species, nuclear import and export of bZIPs is regulated by phosphorylation of family-specific residue (Harreman et al., 2004)(e.g. S239 in human CEBP-bZIPs (Buck et al., 2001). The basic region also contains several family-specific function sites (e.g. Bannister et al., 1991; Miller, 2006, 2009).

The leucine zipper is an alpha-helical coiled coil comprised of four to eight heptad repeats of leucine residues (LxxxxxxLxxxxxxL) and mediates the formation of parallel bZIP dimers (Figure 1.5A) (Vinson et al., 2002). Four heptads seem to be minimally required to ensure stable dimerization (Tsukada et al., 2011). Only a subset of all possible bZIP homodimeric (between two of the same protein) or heterodimeric combinations actually forms in vivo. Extensive mutagenic and biophysical studies have established that dimerization specificity is largely determined by amino acids in positions a, d, e and f of the (gabcdef)n leucine zipper (reviewed in Deppmann et al., 2006; Reinke, 2012; Vinson et al., 2002). The contribution of each position of the leucine zipper to

5 dimerization specificity is covered in greater detail in Chapter 3 (3.1.2). Simplistically, charged residues at a and d (usually a leucine) positions form the hydrophobic core of the complex and determines whether a dimer will form; interactions between polar residues at analogous e and g positions of each monomer determines the strength of the dimer.

Dimerization is fundamental for bZIP biological function in several ways, since it can lead to different synergistic interactions with other regulatory proteins (Chinenov and Kerppola, 2001; Miller, 2009) or a change in transcriptional activity (Chinenov and Kerppola, 2001; Motohashi et al., 2002). For instance, the switch of dimerization partner can change the transcriptional activity of small MAFs from being an activator to a repressor (Motohashi et al., 2002).

1.4.2 bZIP binding to DNA Since chimera of leucine zippers that cannot dimerize fail to bind DNA, dimerization is considered a prerequisite for bZIPs to form stable interactions with DNA (Eychene et al., 2008). However, it is still unclear whether dimerization precedes DNA binding, and compelling evidence has been provided for either of two mechanisms: (1) formation of a bZIP dimer that subsequently binds DNA, and (2) formation of a DNAC bZIP monomer complex, followed by the binding of a second bZIP monomer (Metallo and Schepartz, 1994). The final bZIP dimer-DNA complex forms a “chopsticks-like” structure (Miller, 2009) over the DNA double helix, where each basic region contacts one-half of the binding motif (Figure 1.5B). Therefore, bZIPs canonically tend to bind palindromic sites (Figure 1.5C). Dimerization thus expands the repertoire of potential binding sites, and thereby the number of genes that can be regulated.

1.4.3 bZIP families Based on primary binding sites and protein sequence similarity, bZIPs are grouped into families. I will only briefly review the metazoan families here. bZIPs were first studied in humans and therefore, bZIP subfamilies are defined by homology to the human proteins. The human repertoire comprises 53 bZIPs, grouped into 12 subfamilies (Vinson et al., 2002) that canonically recognise five palindromic binding sites- the CEBP, TRE, CRE, CRE-like and PAR sites- (Table 1.1) (Miller, 2009). Orthologues of all of these subfamilies have been found in bilaterian and cnidarian species (Amoutzias et al., 2007) (see also Chapter 2, 2.1).

1.4.4 bZIP functions bZIPs sit at the base of number of major signalling pathways that regulate basic cell-fate decisions (e.g. Shaulian and Karin, 2002; Shaywitz and Greenberg, 1999; Thompson et al., 2009;

6 Tsukada et al., 2011; Yang and Cvekl, 2007). They are thus involved in an eclectic collection of biological processes such as embryogenesis, metabolism or circadian rhythm, and are of prime importance in cancer development (e.g. Kaloulis et al., 2004; Ma, 2013; Reitzel et al., 2013). Members of different bZIP subfamilies have similar but often distinct roles that can be tissue- specific (e.g. MAF role in lens cell differentiation (Yang and Cvekl, 2007). An extensive review on the role of bZIPs in metazoans is beyond the scope of this work. Specific examples are discussed in the following chapters when relevant. What follows is thus a brief overview highlighting the importance of bZIPs in the regulation of core cellular processes in animals.

Oncogenes JUN and FOS are probably the best-characterised bZIPs in bilaterians. Their role as regulator of cell proliferation and differentiation has been the topic of numerous reviews (e.g. Jochum et al., 2001; Shaulian and Karin, 2002; van Dam and Castellazzi, 2001; Zenz and Wagner, 2006). JUN and FOS also promote -and occasionally prevent -apoptosis.

The role of CREB in regulating cell survival and proliferation (reviewed in Mayr and Montminy, 2001) and its implication in cancer have been extensively studied (reviewed in Shaywitz and Greenberg, 1999; Steven and Seliger, 2016). Cnidarian CREB is involved in the reactivation of developmental processes, notably in the regeneration of the Hydra head (Chera et al., 2007; Kaloulis et al., 2004).

Bilaterian members of the MAF family display highly specific roles in a given tissue in relation to terminal differentiation, notably in the brain, eye, kidney and pancreas development (Eychene et al., 2008; Yang and Cvekl, 2007). MAF-bZIPs also regulate cell-cell interactions, e.g. in Drosophila where large MAF knockout induces a loss of communication between somatic cells and germline cells, resulting in a defect in germ cell differentiation (Eychene et al., 2008).

Members of the ATF4, ATF6 and ATF2 families appear to be involved particularly in cellular responses to a wide array of stress stimuli (e.g. in Calfon et al., 2002; Hai and Hartman, 2001; Tsai and Weissman, 2010; Yoshida et al., 2001). Similarly, NFE2-bZIPs canonically mediate the cytoprotective response to oxidative stress (Hayes and Dinkova-Kostova, 2014; Ma, 2013).

PAR-bZIPs have mostly been studied in for their role in the regulation of apoptosis (Cowell, 2002) and circadian oscillatory mechanisms in several bilaterians and cnidarians (e.g. Ben-Moshe et al., 2010; Brady et al., 2011; Cyran et al., 2003; Mitsui et al., 2001; Reitzel et al., 2013)

7 1.5 The Amphimedon queenslandica model system

The research I present throughout this thesis predominantly focuses on the bZIP complement of the demosponge Amphimedon queenslandica (Porifera, Demospongiae, Haplosclerida, Niphatidae) (Figure 1.6)(Hooper and Soest, 2006). As most marine invertebrates, A. queenslandica has a biphasic life cycle (Figures 1.6 and 1.7), comprised of a pelagic larval phase and benthic juvenile/adult phases (Degnan et al., 2008). A. queenslandica is a hermaphrodite and reproduces by spermcast spawning (Degnan et al., 2008). Eggs are fertilised internally, and embryos are brooded in large chambers (Figure 1.6D) until they hatch (Degnan et al., 2008). Larvae (Figure 1.6B) emerge daily from these brood chambers and swim in the water column for typically 4 to 24 hours (Degnan and Degnan, 2010) in search for a suitable habitat to settle and initiate metamorphosis. Approximately 3 days after settlement, the postlarva is able to feed by filtering the surrounding seawater and bears all the morphological hallmarks of the adult body plan, except for reproductive structure and cells, and is thus referred to as a juvenile (Leys and Degnan, 2002).

A. queenslandica cells and tissues can easily be obtained at various developmental stages. A. queenslandica adults live on the reef flats of Indo-Pacific coral reefs. On the Great Barrier Reef, Australia, adult A. queenslandica can easily be collected from Shark Bay, Heron Island Reef by snorkelling at low tide (Leys et al., 2008). Adults brood embryos year round, allowing easy access to developmental material (Degnan et al., 2008; Leys et al., 2008) and methods to release and collect larvae are well established (Leys et al., 2008).

As a representative of one of the earliest-diverging metazoan phyletic lineages, A. queenslandica is an attractive model to study the origin and evolution of gene families. Genomic (Srivastava et al., 2010) and transcriptomic (Anavy et al., 2014; Conaco et al., 2012; Fernandez- Valverde et al., 2015; Levin et al., 2016) resources, as well as a thorough description of the cytological events that accompany development and metamorphosis (reviewed in Degnan et al., 2015, described in Chapter 3 (3.1.1)), are available. The recent sequencing of the genome and transcriptome of several sponge species from different classes -e.g the calcareous sponge Sycon ciliatum (Fortunato et al., 2014) and the homoscleromorph Oscarella Carmela (Nichols et al., 2012)- will also enable comparative studies within Porifera, and reveal the ancestral features common to all sponge lineages, as well as demosponges innovations. By comparing the biology and genomics of sponges with that of other animals and unicellular holozoans, we can reconstruct the events that lead to the origin and early diversification of the animal kingdom.

8 1.6 Aims

Using whole genome data from basal metazoan lineages Ðe.g. A. queenslandica and ctenophores- and unicellular holozoan relatives- e.g. M. brevicollis- the evolutionary trajectory of metazaon TF families can be reconstructed. This approach has revealed the evolutionary origin of specific metazoan subfamilies (classes) from TF families that evolved early in opisthokont or eukaryotic evolution (e.g. homeobox, Sox, T-box and Fox genes (Larroux et al., 2008b); bHLH (Simionato et al., 2007)). However, to my knowledge, the origin and genesis of bZIPs subfamilies beyond the cnidarians have not been investigated (Amoutzias et al., 2007). Further, the full repertoire of bZIPs in a non-bilaterian animal has not been characterised.

The overall goal of this work is to identify and characterise the complete repertoire of bZIPs in the sponge A. queenslandica, and infer their possible roles in this sponge. Ultimately, by placing these results in a comparative evolutionary framework, I set out to infer what may have been the contribution of bZIPs to the origin of animal multicellularity and the evolution of animal complexity.

In Chapter 2, to investigate a possible correlation between bZIP expansion with increased morphological and cellular complexity, I retrace the emergence of new bZIP subfamilies in the five main eukaryotic kingdoms, with a particular emphasis of bZIP evolution in holozoans and metazoans. In Chapter 3, I characterise the temporal and spatial expression of bZIPs in the main phases of A. queenslandica development. Together these two chapters highlight the likely involvement of bZIPs in basic cellular processes in the animal LCA. Chapters 4 and 5 are then case studies that allow me to further investigate the function of particular bZIPs in A. queenslandica. In Chapter 4, I explore the role of a PAR bZIP in the regulation of this sponge response to light and discuss the possibility of an ancestral animal circadian clock. In Chapter 5, I examine the interplay between AqATF4 DNA-binding and the existence of distinct chromatin states in A. queenslandica. Overall, this work provides the first in-depth analysis of bZIP TFs in a non-bilaterian animal and shows that bZIPs were integrated into complex regulatory mechanisms, comprising TF networks and dynamic chromatin remodelling, in the animal LCA.

9 Figures

Figure 1.1. Phylogenetic relationship between holozoans Relationships among major holozoan lineages based on (King et al., 2008; Sebe- Pedros et al., 2011; Srivastava et al., 2010; Suga et al., 2013; Whelan et al., 2015). A grey oval indicates the multicellular common ancestor of all animals. The base of the metazoan tree is still unresolved (e.g. Srivastava et al., 2010; Whelan et al., 2015).

10

Figure 1.2. Reconstruction of bHLH evolution in the opisthokonts. The bHLH family of transcription factors (TFs) underwent three major expansions in the opisthokonts -prior the emergence of the metazoans, during the protoeumetazoan stem. The number of TFs in each bHLH group (key) is reported in the coloured ovals; the estimated numbers of bHLH genes in each LCA (black polygons) is reported in grey boxes. LCA: last common ancestor. From (Degnan et al., 2009).

11

Figure 1.3. Origin and early evolution of metazoan TFs Cladogram representing the origins and evolution of the main TF classes present in extant metazoans. Coloured dots represent the hypothesized origin of a domain; colours are unique for each domain class. A cross indicates the loss of the domain or specific protein family in a lineage. Metazoan apomorphies are shown as black dots. From (Sebe-Pedros et al., 2011)

12

Figure 1.4 Developmental expression of A. queenslandica transcription factors (A) Average normalised read counts of A. queenslandica members of eight of the main transcription factor classes present in all animals. Expression is shown at four stages of A. queenslandica development. PRE: precompetent larva, COMP: competent larva, POST: postlarva. From (Conaco et al., 2012). (B) (Left) Percent of genes in each TF class that are expressed in four developmental stages. The number of TFs in each class is reported on top of the bars. (Right) Percentage of genes in each TF class that are found within the top 25% of their expression range across four developmental stages. From (Conaco et al., 2012)

13

Figure 1.5. bZIP basics (A) Multiple alignment of the bZIP domain (basic region and leucine zipper) of bZIPs from five species representing the main metazoan lineages: Porifera (Aq, the sponge A. queenslandica), Cnidaria (Nv, the sea anemone N. vectensis), Protostomia (Dm, the fruit fly D. melanogaster), Echinodermata (Sp, the sea urchin S. purpuratus) and Chordata (Hs, H. sapiens). Residues conserved in over 50% of the sequences are highlighted. The consensus sequence is presented as a logo at the top. Numbers indicate amino acid positions from the start of the bZIP domain. (B) Crystal structure of a quaternary complex of JUN and FOS bound to double-stranded DNA (PDB: 1FOS) (Glover and Harrison, 1995). (C) Consensus sequences recognised by bZIP transcription factors. TRE and CRE (green) are the canonical palindromic bZIP recognition sites. BZIPs from the MAF and NFE2 families recognise extended binding sites (T-MARE or C-MARE and NF-E2 or ARE, respectively), comprised of a CRE or TRE core sequence (green) and flanking sequences (red). (C) From (Motohashi et al., 2002).

14

Figure 1.6. Amphimedon queenslandica (A) Adult in situ on the southern Great Barrier Reef. (B) Larva with posterior pigment ring ( pp ) to the left . (C) A 3-day-old juvenile settled on crustose coralline alga (ca) with osculum (os) pointed upwards. (D) A brood chamber, with small white and translucent oocytes are located on the outer edge of the brood chamber (white arrows), which is surrounded by brown somatic cells. White ( w ), brown ( b ), cloud ( c ), and ring ( r ) stage embryos and unhatched larvae ( l ) are mixed within the chamber. Scale bar: 1 mm. Adapted from (Degnan et al., 2015)

15

Figure 1.7. A. queenslandica life cycle Adult A. queenslandica brood embryos in large internal chambers. Larvae are released daily and swim in the water column for at least 4 hours before they develop the competence to settle and initiate metamorphosis. Approximately three days after settlement, the postlarva displays most of the hallmarks of the adult body plan, including an aquiferous system with canals and oscula. Image by William Hatleberg.

16 Tables

Primary DNA site Secondary DNA sites Bzip families Main family members Signature sequence for DNA recognition Name Consensus sequence c-JUN JUN JunB NxxAAxxCR TRE (AP1) TGACTCA JunD c-FOS FosB Fra1 FOS Fra2 FOS B-ATF JDP1 JDP2 NxxAAxxCR CRE TGACGTCA ATF3 ATF3 ATF2 ATF2 ATF7 CREB CREB ATF1 CREM ATF6 ATF6 NxxSAxxSR XBP1 XBP1 ATF4 ATF4 NxxAAxxYR ATF5 CRE-like TTACGTAA Oasis OASIS CREB3 NxxSAxxSR CREB-H C/EBPa C/EBPb CEBP C/EBPd NxxAVxxSR CCAAT ATTGCGCAAT CRE, PAR C/EBPg C/EBPe MafK small MAFs MafG MafF MAF c-Maf NxxYAxxCR MARE TGCTGACTCAGCA MafB Large MAFs MafB NRL NFE2L1 NRF2 NRF3 CNC NxxAAxxCR NF-ET TGCTGACTCAT MARE, CRE BACH1 BACH2 P45 PAR HLF PAR NxxAAxxSR PAR ATTACGTAAT NFIL3 NFIL3

Table 1.1. Human bZIP families The human bZIP repertoire comprises 53 bZIPs that are grouped in 12 main bZIP families based on sequence similarity, DNA-binding properties and dimerization preferences (e.g. Vinson et al., 2002); see 1.4). Three of these families are further split into subfamilies (MAF: small MAFs and large MAFs; PAR: PAR and NFIL3; FOS: FOS and ATF3). The signature sequence refers to the five DNA-contacting residues that determine DNA-binding specificity (see 1.4.1 and 1.4.2) (Fujii et al., 2000). Known binding sites are also reported (Miller, 2009; Vinson et al., 2002).

17 CHAPTER 2 Ð THE DIVERSIFICATION OF THE BASIC LEUCINE ZIPPER FAMILY IN EUKARYOTES CORRELATES WITH THE EVOLUTION OF MULTICELLULARITY

2.1 Abstract

Multicellularity evolved independently in diverse eukaryotic clades. In all cases, this transition required an elaboration of the regulatory mechanisms controlling gene expression. Amongst the conserved eukaryotic transcription factor families, the basic leucine zipper (bZIP) superfamily is one of the most ancient and best characterised. This gene family plays a diversity of roles in the specification, differentiation and maintenance of cell types in plants and animals. Recent evidence has shown that they are similarly involved in stress response and regulation of cell proliferation in fungi, amoebozoans and heterokonts. Using 49 sequenced genomes from across the Eukaryota, I demonstrate that the bZIP superfamily has evolved from a single ancestral eukaryotic gene and undergone multiple independent expansions. bZIP family diversification is largely restricted to multicellular lineages, consistent with bZIPs contributing to the complex regulatory networks underlying differential and cell type-specific gene expression in these lineages. Analyses focused on the Metazoa suggest an elaborate bZIP network was in place in the most recent shared ancestor of all extant animals that was comprised of 11 of the 12 previously recognised families present in modern taxa. In addition this analysis identifies three bZIP families that appear to have been lost in mammals. Thus the ancestral metazoan and eumetazoan bZIP repertoire consists of 12 and 16 bZIPs, respectively. These diversified from 7 founder genes present in the holozoan ancestor. Our results reveal the ancestral metazoan, holozoan and opisthokont bZIP repertoire and provide insights into the early evolution and diversification of bZIPs in the five main eukaryotic kingdoms. It suggests that early diversification of bZIPs in multiple eukaryotic lineages was a prerequisite for the evolution of complex multicellular organisms.

2.2 Introduction

Increasing evidence suggests that the evolution of complex multicellular organisms arose from the expansion and diversification of gene regulatory networks (reviewed in Levine and Tjian, 2003). In eukaryotes, the precise control of gene expression, often in response to physiological and environmental stimuli, largely depends on the binding of specific transcription factor proteins to specific DNA sequences (Weirauch and Hughes, 2011). This ancient mode of gene regulation has been co-opted into and is instrumental in the ontogeny of multicellular eukaryotes, sitting at nodes in developmental gene regulatory networks (GRNs) that underlie spatiotemporal and cell type- 18 specific gene transcription (Davidson and Erwin, 2006). Analysis of GRNs, largely in bilaterian animals, reveals they are populated by transcription factors of differing evolutionary age, with most being either unique to metazoans (e.g. nuclear receptors) or of an older evolutionary origin (e.g. basic helix-loop-helix transcription factors) (Degnan et al., 2009).

The basic leucine zipper (bZIP) superfamily of transcription factors appears to have originated early in eukaryotic evolution (Weirauch and Hughes, 2011). bZIPs sit at the heart of key pathways regulating cellular decisions across this domain of life (Deppmann et al., 2006). They have been consistently implicated in a wide range of core eukaryotic cellular processes, including cell proliferation and differentiation, stress response and homeostasis (Miller, 2009). However, the ancestral role of bZIPs in eukaryotes has been difficult to infer because a single conserved function has not been identified amongst living eukaryotes.

bZIP transcription factors take their name from a highly conserved 60-80 amino acid bZIP domain, which has a bipartite organisation consisting of an N-terminal basic region, responsible for DNA binding, and a leucine zipper, which mediates homo- and hetero-dimerization between bZIPs. As dimers, they regulate transcription by binding short DNA target sites, often in the form of 8 base pair palindromes. In recent years, the bZIP network of several eukaryotes has been described, and their evolution has been, to some extent, investigated in animals (Amoutzias et al., 2007; Sebe- Pedros et al., 2011), fungi (Tian et al., 2011) and plants (Correa et al., 2008). The recent sequencing of disparate eukaryotic genomes now allows the search for the primordial bZIP and the reconstruction of the evolutionary trajectories this family has taken in different higher eukaryotic lineages, including phyla, kingdoms and superkingdoms.

Here, I analysed the bZIP gene repertoires from a wide range of eukaryote genomes, focusing on the lineage with the widest coverage of draft genomes, the holozoans. I traced back metazoan bZIP families to their origin in the holozoan last common ancestor, opisthokont last common ancestor and beyond. By comparing bZIPs diversification in the main eukaryotic clades, I demonstrate that bZIPs originated from a single protein and then evolved independently in each major eukaryotic lineage. bZIP family complexity appears to increase incrementally over long evolutionary periods, prior to evolutionary transitions into a complex multicellular condition. Early in eukaryotic evolution, a first expansion phase occurred independently in each of the four main eukaryotic lineages - Opistokonta, Amoebozoa, Planta and Heterokonta- yielding to 3 or 4 bZIP families. These families constitute the core of the bZIP complement in extant fungi, heterokonts and amoebozoans. A second expansion phase occurred in holozoans and early plants, prior to the

19 emergence of complex multicellularity in either lineage. In holozoans, it occurred in two main steps: one before the divergence of unicellular holozoans, which display numerous examples of colonial forms with life cycles comprised of more than one cell type (e.g. in Salpingoeca rosetta (Fairclough et al., 2013) and Capsaspora owczarzaki (Suga et al., 2013), and one during the early evolution of metazoans, prior to the emergence of the crown group.

2.3 Methods

2.3.1 Taxonomic sampling and retrieval of bZIPs The sequences of the full complement of bZIP genes were retrieved from the fully sequenced genomes of 31 non-metazoan eukaryotes and 18 metazoan representatives. Metazoan opisthokonts include: Homo sapiens, Branchiostoma floridae and Ciona intestinalis (Chordata); Strongylocentrotus purpuratus (Echinodermata); Drosophila melanogaster, Tribolium castaneum and Daphnia pulex (Arthropoda): Lottia gigantea (Mollusca), Capitella telata (Polychaeta); Nematostella vectensis, Acropora digitifera and Hydra magnipapillata (Cnidaria); Trichoplax adherans (); Mnemiopsis leydi and Pleurobrachia bachei (Ctenophora); and Amphimedon queenslandica, Oscarella carmela and Sycon ciliatum (Porifera). Non-metazoan opisthokonts include: Monosiga brevicollis (Choanoflagellata); Capsaspora owczarzaki (Filasterea); and Saccharomyces cerevisiae, Schizosaccharomyces pombe, Candida albicans, Aspergillus nidulans, Magnaporthe grisea, Ustilago maydis, Cryptococcus neoformans, Mucor circinelloides, Rhizophagus irregularis, Antonospora locustae and Encephalitozoon cuniculi (Fungi). Plants include: Arabidopsis thaliana (Planta); Chlamydomonas reinhardtii (green alga); and Galdieria sulphuraria and Chondrus Crispus (red alga). Amoebozoans include: Entamoeba histolica, Dictyostelium discoideum, Acanthamoeba castellanii, Hartmannella vermiformis and Physarum polycephalum. Heterokonts include: Thalassiosira pseudonana, Phytophthora sojae, Pythium ultimum, Phytophthora infectans, Ectocarpus sliculosus (brown alga), Hyaloperonospora arabidopsidis, and Phaeodactylum tricornutum; Excavates include: Giardia lamblia and Naeglaria gruberi. As there are few published analyses of heterokont bZIPs, I sampled several species in this lineage. Subsequent phylogenetic analyses in this study were restricted to Phytophthora infectans (oocmycete), Thalassiosira pseudonana (diatom) and Ectocarpus siliculosus (brown alga). Species and data sources information can be found in Appendix 2.1. All databases were publically available and no animal work requiring ethics approval was conducted.

The complete bZIP set for each species was obtained through an iterative process including (i) building an initial set of bZIP proteins using , (ii) interrogating the proteome by BlastP in several databases (NCBI, JGI, Ensemble and species specific genome browsers (Aurrecoechea et 20 al., 2009; Cunningham et al., 2015; Finn et al., 2014; Gaudet et al., 2011; Hemmrich and Bosch, 2008; Shinzato et al., 2011; Yilmaz et al., 2011) and then (iii) re-interrogating each of the proteomes with a HMMER genome-wide scanning using custom Hidden Markov Models generated for each phylogenetic clade. When searching early branching holozoans, additional models were generated for each phylum. A general cutoff value of 10-4 was used but proteins with higher e-value were also manually selected against the identification criteria listed below. When possible, I interrogated our dataset with previously published data to assess completeness (see Appendix 2.1 for details, (Amoutzias et al., 2007; Cock et al., 2010; Correa et al., 2008; Gamboa-Melendez et al., 2013; Nunez-Corcuera et al., 2012; Reinke et al., 2013; Tian et al., 2011; Vinson et al., 2002; Ye et al., 2013). Putative bZIPs were then manually inspected for the following features: (1) a basic domain BR, as defined by (Vinson et al., 2002), and (2) a leucine zipper, within the two heptads located C-terminally of the BR and presenting a coil-coiled structure of two heptads minimum.

2.3.2 Phylogenetic analyses I defined the N-terminal domain boundary of the bZIP domain as the N-terminal end of the crystal structure of in complex with DNA (Ellenberger et al., 1992) and to avoid artefacts in the tree building algorithm, I set the length of the basic domain (basic region and leucine zipper) at 60 amino acids. Protein sequences were trimmed to their bZIP domain and aligned using the MAFFT v7 algorithm (Katoh et al., 2002) in Geneious and then manually inspected. Maximum likelihood (ML) analyses were carried out by RaxML (Stamatakis, 2006). The LG+G substitution model was scored as the best-fit model for each alignment, using ProtTest (Abascal et al.). Branch support was estimated by performing 1,000 bootstrap replicates. Bayesian analyses were carried out with MrBayes 3.2 (Ronquist et al., 2012), using the LGG substitution matrix, with 2 parallel runs, four chains and a resampling frequency of 100. Different temperatures were used when convergence was not achieved. I considered that I had reached convergence when the average standard deviation of split frequencies fell below 0.05. The analysis was terminated if convergence was not reached after 12,000,000 generations.

To determine orthologue assignments, I looked at each species independently. For holozoan bZIPs, I constructed three bZIP reference sets including the sequences from Homo, Branchiostoma and poriferans (bZIPs from a representative demosponge, calcisponge and homoscleromorph), respectively. I made multiple alignments comprising the putative bZIPs of each species and one of the reference set; the bZIPs identified in each phylum; and the bZIPs identified in each clade. Phylogenetic trees were inferred by maximum likelihood and Bayesian analyses. I considered two sequences as orthologues if their grouping was supported by bootstrap values and posterior

21 marginal probabilities superior to 80% and 90%, respectively. I then manually inspected each protein for conserved amino acids peculiar to each bZIP family (described in (Miller et al., 2003) (CEBP); (Kurokawa et al., 2009) (MAF); (Fujii et al., 2000)(JUN and FOS); (Haas et al., 1995) (PAR); (Luo et al., 2012) (CREB) and (Donald and Shakhnovich, 2005) (all) and Appendix 2.4). Protein sequences, names and family assignment can be found in Appendix 2.1.

Using a similar method, I sought to identify groups of orthologues within and between six eukaryotic kingdoms: animals, fungi, plants, amoebozans, heterokonts and excavates. Our classification was compared with previous studies that have focused on a specific clade (in fungi (Tian et al., 2011) and plants (Correa et al., 2008)), to confirm our orthologue assignments and limited to taxa for which there is an available draft genome. Thus the eukaryotic lineages with wider and deeper coverage are likely to be more accurate. To reduce the effect of potentially unstable sequences, I ran multiple permutations of the same analysis, with only one or two species representative of each kingdom or lineage, changing the lineage representatives each time.

2.3.3 Search for other motifs and domains in bZIP-containing proteins The complete set of full-length bZIP sequences from the following representative holozoan species was searched for the presence of conserved motifs using the MEME suite (Bailey and Elkan, 1994): Homo sapiens; Drosophila melanogaster; Hydra magnipapillata; Amphimedon queenslandica; Mnemiopsis leydii; Monosiga brevicollis; and Capsaspora owczarzaki. The minimal and maximal width for a motif was set to 6 and 50 residues, respectively. The motifs found to be conserved between orthologues were investigated further by building HMM for these motifs/domains and searching the entire derived bZIP proteome of all holozoan species.

2.4 Results

2.4.1 Accrual of the animal bZIP repertoire over the course of holozoan evolution 446 bZIP genes were identified from 18 metazoan (nine bilaterian, three cnidarian, one placozoan, two ctenophore, three sponge) and two unicellular holozoan (a choanoflagellate and a filasterean) genomes, using an iterative process that included (i) screening predicted proteomes using PFAM, (ii) interrogating the coding sequences in the genome by blastP and (iii) then using hidden Markov models (HMMs) constructed using the bZIP sequences identified in each clade to re-interrogate each of the proteomes (Appendix 2.1). By comparing and undertaking phylogenetic analyses of the bZIPs from each holozoan against three reference sets of bZIPs (bZIPs identified in Homo, Branchiostoma and poriferans, which is a composite of the bZIPs found in the genomes of a representative demosponge, calcisponge and homoscleromorph), I identified 12 bZIP families that 22 were comprised of at least one human orthologue, and one family (REPTOR) and two sub-families (PAR-Like and OASIS-Like) that appear to have been lost in humans (Figure 2.1, Appendices 2.2, 2.3, 2.4 and 2.5).

Phylogenetic analyses suggest that PAR-L and OASIS-L are related to PAR and OASIS (Appendices 2.2 and 2.3). PAR-L bZIPs possess the two Asn residues at positions 13 and 14, characteristic of the PAR and CEPB families (Appendix 2.2). OASIS-L bZIPs include the OASIS- specific Tyr at position 27 but exhibit a unique combination of amino acids at positions 13 to 21: NAIxAxxNR (x represents variable residues), instead of the typical NKxSAxxSR OASIS combination (Appendix 2.2). REPTOR however does not appear to be related to any other bZIP family and is characterised by an astonishingly conserved basic domain, with 33 consecutive residues (positions 7-35, 37-39 and 41) identical in nearly all species, and a very short coiled-coil region (Figure 2.1 and Appendix 2.2). I named this family after its Drosophila orthologue, which has been recently identified as a downstream factor of the TORC1 signalling pathway (Tiebe et al., 2015). The origin of the PAR-L and OASIS-L sub-families, and REPTOR family can be traced back to being present in the last common metazoan and eumetazoan ancestor, respectively.

Nearly all bZIP families and subfamilies have been maintained in metazoan taxa included in this survey, although there are few cases of gene loss. Notably, ctenophores, which are deemed to be the earliest branching metazoan phyletic lineage (Whelan et al., 2015), exhibit a higher level of bZIP gene loss than any other metazoan lineage (Figure 2.2 and Appendix 2.5). As a number of bZIP genes missing in the ctenophores surveyed are present in non-metazoan holozoans Ð PAR, CREB and CEBP in Pleurobrachia and PAR in Mnemioposis - the ctenophore bZIP repertoire is not likely to reflect the ancestral metazoan condition. I conclude that at least eleven of bilaterian orthology groups were present in the last common ancestor to extant metazoans, and seven in the holozoan LCA.

Notably, this approach identified a putative member of the JUN family in the holozoan Capsaspora, a bZIP gene family previously regarded as metazoan-specific (Amoutzias et al., 2007). This finding is consistent with our observation that members of the fungal GCN4 family and the metazoan JUN family appear to be distant orthologues (Figure 2.4 and Appendix 2.5).

Thus, our approach identified (i) the foundational bZIP family present prior to the evolution of metazoan multicellularity, (ii) the origin of metazoan bZIP families (OASIS A-B, MAF-NFE2) and bilaterian subfamilies (sMAF, NFIL3, CEBPg, ATF3), (iii) the expansion and loss of specific

23 family members in particular lineages, and (iv) uncovered three uncharacterised bZIP families (Figures 2.2 and 2.3 and Appendix 2.2 and 2.3).

2.4.2 Emergence of novel domain combinations in holozoan bZIP families bZIPs appear to have increased their connectivity throughout metazoan evolution through the linking to other protein domains (Miller, 2009). The kinase inducible activation domain (KID) is associated with CREB in several metazoans and mediates the interaction of CREB and p300/CBP. Although the Capsaspora genome encodes p300 (Sebe-Pedros et al., 2011), its CREB gene does not encode a KID. In contrast both sponge and ctenophore include a KID domain, consistent with a domain-shuffling event early in metazoan evolution to bring together these domains.

A detailed analysis of domains associated with bZIP domains in other families identified a number of conserved domain combinations, including an ATF2-specific and a JUN-specific transactivation domain, previously described in human ATF2 (Nagadoi et al., 1999) and JUN (Sutherland et al., 1992), and an ATF6-specific domain, which is present in unicellular holozoan orthologues (Appendix 2.6). I also recovered highly conserved regions located N-terminally of the PAR, MAF and OASIS basic domain, the latter being present in unicellular holozoan OASISes (Appendix 2.6). Interestingly, these regions were not conserved in the PAR-L and OASIS-L bZIPs. None of these domains were detected in fungal orthologues or related bZIPs.

2.4.3 Conservation of eukaryotic bZIP DNA-binding motif Based on our observation that many bilaterian and indeed metazoan bZIPs originated early in holozoan evolution, well prior to the diversification of the Metazoa, I attempted to retrace bZIP evolution beyond the opisthokont last common ancestor. A total of 896 bZIPs were identified from 49 eukaryotic genomes, including opisthokont (metazoans, choanoflagellates, and fungi), amoebozoan, plant (Viridiplantae - land plants and green algae, and red algae), heterokont (oomycetes, diatoms and brown algae) and excavate representatives. bZIP genes were recovered in all species surveyed (Appendix 2.1), except two amoebozoans (Hartmannella vermiformis and Physarum polycephalum) and a microsporidial fungus (Antonospora locustae), consistent with bZIPs being present in the most recent shared ancestor of extant eukaryotes.

Amongst the eukaryotic bZIPs, the leucine residues in the leucine zipper region (residues 31, 38, 45, 52 and 59 in Figure 2.3) and a nine amino acid region in the N-terminal of the basic DNA-binding domain (residues 13-21 in Figure 2.3) are the most conserved. Five very highly

24 conserved amino acids within the basic region - Asn13, Ala16, Ala17, Ser/Cys20 and Arg21 - have been shown to be instrumental in determining sequence-specific DNA binding in bilaterians and fungi (Fujii et al., 2000; Gamboa-Melendez et al., 2013). The first and last of these amino acids are the most conserved amongst eukaryotes, with only three differences at these positions found across all eukaryotes surveyed: the excavate Giardia has a solitary bZIP factor with a Gly at position 13; a few plants have Lys at position 21; and in oomycete heterokonts have a Cys at position 13 in several proteins that appear to be functional (Gamboa-Melendez et al., 2013). The level of conservation of the other three residues varies between eukaryotic lineages (Figure 2.3). For instance, nonpolar Ala is the most common residue in position 16 in opisthokonts and ameobozoans, while in plants, heterokonts and excavates this site is most often populated by Ser, which has an uncharged polar side chain (Figure 2.3). Amongst the most conserved residues, position 20 is the most variable in terms of residue diversity and disparity, with Cys only observed in this position in opisthokonts.

2.4.4 Independent expansion of bZIPs in different eukaryotic lineages As eukaryotic bZIPs appear to have originated from a single ancestral protein (Weirauch and Hughes, 2011), I sought to identify any orthologues present in extant representatives of the six main lineages (Figure 2.3). Given the large number of sequences, each analysis included only one or two representative species from each of these lineages, although I ran multiple permutations, changing the lineage representatives each time. In general, the topology of the resulting trees remained constant regardless of the representative used. As with previous studies that included bZIPs from different eukaryotic groups (Amoutzias et al., 2007; Derelle et al., 2007; Sebe-Pedros et al., 2011), the alignments do not contain enough informative positions for meaningful bootstrap analysis, which impedes the inference of phylogenetic relationships. Nonetheless, I consistently recovered one cluster, which includes bZIPs from metazoans (OASIS-ATF6 family), fungi (HAC1 family), plants (proto-B group) and heterokonts (Figure 2.4), suggesting an ancestral bZIP existed prior to the divergence of these eukaryotic lineages. Interestingly, those families are among the founder bZIP families of each kingdom. This grouping is further supported by the high conservation of five unique residues (Figure 2.4).

In most cases, bZIP family expansions are restricted to individual organismal lineages (Figure 2.4), with orthologues only identifiable within a given eukaryotic lineage (Figure 2.5). The bZIP superfamily has been divided into orthology groups in both metazoans (Deppmann et al., 2006) and plants (Correa et al., 2008). These usually coincide with DNA-binding and dimerization preferences (Fujii et al., 2000; Vinson et al., 2002). Consistent with previous studies (Deppmann et

25 al., 2006; Tian et al., 2011), I identified 13 bZIP families in metazoans and five in fungi. I also recovered 13 clusters in plants, which correspond to the 13 families described in (Correa et al., 2008), and five in the green alga Chlamydomonas. In heterokonts, both maximum likelihood and Bayesian analyses support four ancestral orthologue groups. The limited number of genomes available in Amoebozoa and the debatable phylogenetic relationship between amoebozoan species greatly limit our analysis. I tentatively identified 3 groups of orthologues in this kingdom.

2.5 Discussion

Using 49 sequenced genomes representing disparate eukaryotic lineages, including 18 representative metazoan genomes, I have reconstructed the evolution of one of the most ancient transcription factor families employed in animal development and disease, the bZIPs. I demonstrate that the evolution of multicellularity is correlated with the expansion and diversification of bZIPs in different families. However, an apparent increase in morphological and behavioural complexity (e.g. along the bilaterian or eumetazoan stem) is not always accompanied with an increase in gene family number. Indeed, bZIP family complexity appears to increase incrementally over long evolutionary periods, possibly being one of a number of prerequisites for the evolution of networks underlying complex gene regulation.

2.5.1 The metazoan bZIP network is comprised of members of differing age The phylogenetic analysis of metazoan bZIPs highlights three major periods in the evolution of bZIPs (Figure 2.6). The first diversification of the metazoan bZIP complement occurred prior to the divergence of holozoan lineages. There were at least three identifiable ancestral opisthokont bZIPs families, ATF6, ATF2-sko1 and Jun-CGN4, that expanded into 7 holozoan families. A second round of expansion and diversification occurred prior to the divergence of extant metazoan lineages, with all of the 13 metazoan bZIP families evolving prior the divergence of eumetazoan and sponge lineages. Based on the bZIPs present in early branching metazoans, I infer that the last common ancestor to all animals possessed minimally 12 bZIPs (Figure 2.6). This complement has remained remarkably stable over the course of metazoan evolution, with very little evidence of gene loss. Four bZIP families (MAF, PAR, CEBP and FOS) duplicated and underwent further diversification prior to bilaterian cladogenesis; three families underwent another round of duplication in stem chordates (NFE2) and vertebrates (ATF4 and FOS) (Figure 2.6).

The OASIS family of bZIPs form three orthology groups (Appendix 2.3) that emerged either prior to metazoan cladogenesis or before the divergence of sponge and eumetazoan lineages. The OASIS-L group is distinct from the other OASIS family members and lacks the OASIS- 26 specific extended basic domain. PAR, CEBP and FOS duplicated and diverged along the bilaterian stem to give rise to three new pairs of families, PAR-NFIL3, CEBP-CEBPg and FOS-ATF3. Cnidarians also possess pairs of CEBP and FOS genes that clade with one of the bilaterian orthologues, suggesting that these duplicated prior to the divergence of cnidarians and bilaterians; one of the duplicates subsequently diverged to form a new subfamily in bilaterians. The sponge- eumetazoan common ancestor possessed two PAR orthologues: PAR and PAR-L; PAR further duplicated and diversified in stem bilaterians to give rise to PAR and NFIL3 orthology groups. Independent expansion of this family in bilaterians, cnidarians and poriferans has yielded PAR- members that are unique to these taxa (Appendix 2.3). Similarly, although all eumetazoans possess at least two MAF-bZIPs, the emergence of large MAF and small MAF orthology groups did not occur until after the divergence of cnidarians and bilaterians (Appendix 2.3). This study also identified a new bZIP family, called REPTOR after its Drosophila member (Tiebe et al., 2015), which emerged from an unidentified bZIP ancestor along the poriferan-eumetazoan stem. All REPTORs display an almost identical basic region and a very short leucine zipper, which may nonetheless be functional for dimerization (Reinke et al., 2013; Tiebe et al., 2015).

ATF2-Sko1, ATF6-HAC1 and JUN-CGN4 families are common between metazoans and fungi, and most likely reflect the bZIP network that was in place in the common ancestor of extant opisthokonts. Members of those families preferentially recognise a palindromic sequence that is conserved in extant animals and fungi (Miller, 2009). I found that the transactivation of ATF2, CREB and JUN are conserved throughout metazoans, and that of ATF6 throughout holozoans (Appendix 2.6). The conservation of both co-factors and binding sites potentially reflects the primordial role of these factors in the opisthokont ancestor and points towards a conserved function across animal and fungal orthologues. Consistent with this idea, metazoan ATF6 and yeast HAC1 hold a similar role in the activation of the seemingly conserved unfolded protein response (Yoshida et al., 2001). Similarly, ATF2 plays a key role in the control of homeostasis in animals (reviewed in Hai and Hartman, 2001) while SKO1 is central in the yeast response to different stress stimuli (Rep et al., 2001). This reasoning may explain the general housekeeping role of CREB/ATF factors in animals, in contrast to the specific roles of metazoan-specific families (e.g. PAR and the control of circadian rhythm (Reitzel et al., 2013)).

2.5.2 The diversification of bZIPs in eukaryotes support a role in the evolution of multicellularity Multicellularity evolved several times in eukaryotes (Parfrey and Lahr, 2013) and arose from a diversification of gene regulatory networks (Levine and Tjian, 2003). Given the bZIPs are

27 part of the ancestral eukaryotic transcription factor repertoire, a relationship between bZIP expansion, and diversification and evolution of complex multicellularity appears plausible. Our analysis supports the inference that the eukaryotic LCA possessed a solitary bZIP, which underwent an early expansion and diversification in the lineages leading to crown animals and fungi (stem opisthokont lineage), and Viridiplantae (land plants and green algae) (Figure 2.5). This is consistent with an early increase in bZIP membership being a prerequisite for the evolution of the complex multicellularity displayed in animals, plants and fungi. The bZIP superfamily underwent a further duplication and divergence in each of these lineages, as detailed above for animals, with some of the expansions occurring prior to the emergence of the stem metazoan lineage (Figures 2.5 and 2.6).

The differential expansion of the bZIP superfamily in opisthokont and plant lineages is consistent with the hypothesis that the independent diversification of bZIP families contributed to evolution of complex multicellularity on more than one occasion. Specifically a marked difference in ancestral metazoan and fungal, and land plant and algal bZIP repertoires, with morphologically more complex metazoans and land plants having more bZIP families (12 and 13, respectively) than simpler fungi and unicellular algae (4 and 5, respectively; Figure 2.5).

The bZIP superfamily has also diversified in other eukaryotic lineages that display colonial or simple multicellularity. For instance, with a more limited dataset I identified 3 and 4 families in amoebozoans and heterokonts, respectively. In these cases again it appears that the expansion and diversification of bZIPs into different families occurs in lineages that include organisms with complex life cycles and more than one cell type (Figure 2.5). The establishment of the core of the bZIP network early in evolution is consistent with bZIPs central role in basic cellular processes (Deppmann et al., 2006).

2.5.3 Reconstruction of the ancestral bZIP The giardial bZIP has been proposed as a model for the precursor of all bZIPs [7]. However, it is consistently recovered in a clade that lacks most eukaryotic lineages; only metazoan and fungal bZIPs clade with this bZIP. Although in this study I could not confidently identify the ancestral bZIP of each kingdom, the similarity between sequences from ancient families of plant (proto-B), metazoan (ATF6-OASIS), fungal (HAC1), amoebozoan and heterokont bZIPs suggests they share features of the ancestral eukaryotic bZIP (Figure 2.3), including the deeply conserved NxxSAxxSR (residues 13-21) signature motif.

28 This five-residue motif is involved in sequence-specific DNA binding and has not varied greatly over the entire course of the bZIP superfamily evolution. Indeed this may explain the restricted number bZIP families and the similarity of bZIP DNA-binding sequences throughout Eukaryota. Although each monomer possesses its own transactivation activity (reviewed in Llorca et al., 2014), bZIPs can nonetheless regulate a wide range of cellular processes because they bind DNA as dimers, where each basic domain contributes individually to the recognition of one half binding site (Fujii et al., 2000). Pairing of bZIPs generates an extensive array of dimeric regulators, which in combination determines the transcriptional activity. Thus the independent diversification of bZIPs in multiple eukaryotic lineages allows for a marked and lineage-specific expansion in potential combinations. As dimerization is key to the functional diversification of bZIP transcription factors, offering the potential for flexible and complex transcriptional programs, the expansion of this gene family potentially contributed to the foundations underlying the evolution of complex multicellular organisms.

2.5.4 Conclusions I compiled a dataset of 896 bZIPs from 49 sequenced genomes from across the Eukaryota. The depth of this dataset permits an assessment of the evolution of this family of transcription factors in relation to the timing of the major transitions that occurred during eukaryotic evolution, including the evolution of multicellularity. I demonstrated that bZIPs underwent an early expansion and diversification, independently in each of the five main eukaryotic lineages, and was likely a contributing prerequisite for the evolution of organisms with complex life cycles and multiple cell types. Focusing on metazoans, I reconstructed the duplication events that shaped bZIP sub-families and identified three previously uncharacterised bZIP sub-families that appear to have been lost in mammals. Our analysis identified the ancestral metazoan and holozan bZIP repertoire, which comprise 7 and 12 founder genes, respectively.

29 Figures

97/100 4/- REPTOR

67/100 ATF4 81/100 CREB

51/100 ATF6

20/58 OASIS

27/100 XBP1 58/100 OASIS-L 57/100 NFE2

JUN 24/100

14/100 MAF

58/100 ATF2

FOS 12/52

33/99 CEBP Chordates Bilaterians Eumetazoans PAR-L Metazoans Holozoans

4/50 PAR Opisthokonts

Figure 2.1. Phylogenetic analysis of the metazoan bZIP superfamily. Mid-point rooted maximum likelihood tree of unambiguously identified bZIPs. Branches are collapsed into families. Statistical support was obtained with 1,000 bootstrap replicates and Bayesian posterior probabilities; both values are shown on key branches. Coloured circles to the right indicate the presence of members of that family in taxa listed to the right

Figure 2.2. Evolution of recognised bilaterian bZIP families. (See next page) Listed in the first column are bZIP families (left) and subfamilies (right). Along the top are the species included in this survey and their phylogenetic relationship based on (Srivastava et al., 2010; Whelan et al., 2015). The number of genes per bZIP family and subfamily are listed below each species. Colours reflect the level of evidence behind this tally (decreasing from red to orange). For each species the total number of bZIPs identified and the number of bZIPs retrieved are reported at the bottom of the table.

30

Figure 2.2. Evolution of recognised bilaterian bZIP families. (legend on previous page)

31 Holozoa& (392/16))

Fungi& (106/9))

Amoebozoa& (27/3))

Plantae& (96/4))

Heterokonta& (128/6))

Excavata& (4/2))

Figure 2.3. Conservation and evolution of bZIPs in eukaryotes. The consensus bZIP domain for each of the six main eukaryotic lineages is depicted with a logo (Schuster-Bockler et al., 2004). The number of sequences/species used in this analysis is indicated in parentheses. Black circles indicate five highly conserved residues of the basic domain throughout eukaryote, which have been shown to be involved in DNA-binding site recognition in animals and yeast (Fujii et al., 2000).

32

Figure 2.4. Phylogenetic analysis of eukaryotic bZIPs. Unrooted Maximum Likelihood tree of bZIPs from two species representing each of the five main eukaryotic lineages: metazoans (blue); fungi (green); amoebozoans (red); plants (brown); and heterokonts (black). To avoid crowding the tree, the Arabidopsis bZIP set was restricted to two genes from each bZIP family (as described in (Correa et al., 2008). Branches are collapsed into kingdom-specific clades and the name is indicated when possible. Statistical support was obtained with 1,000 bootstrap replicates and an asterisk indicates well-supported branches (>50%). A unique cluster containing orthologues from the metazoan OASIS-ATF6 family, the fungal HAC1 family and the plant group proto-B group is highlighted in yellow. bZIPs from this clade share five additional conserved positions, indicated by white circles (top logo, yellow background), compared to the whole eukaryotic set (bottom logo). Black circles indicate five highly conserved positions of the bZIP domain discussed earlier.

33 PAR CEBP OASIS ATF6 Metazoa LCA CREB 12 Holozoan LCA 7 ATF2 Animals (9 -53) JUN Holozoa ATF6/HAC1 Unicellular Opisthokont LCA 3 ATF2/SKO1 holozoans (14- 20) JUN-CGN4 HAC1 SKO1 Ascomycetes (6-14) Fungi LCA 4 CGN4 YAP Basidiomycetes (6-10) Fungi Zygomycetes (12-27) Microsporidia (1) Entamoeba (3) Amoebozoa LCA 3 Dictyostelium (18) Amoeboza Eukaryote Acanthamoeba (6) LCA 1 Land Plant LCA 13 Proto-B Plants (75) H-like Plant LCA 4 Proto-C G/J-like Algae LCA 5 Plantae Green algae (9) Red algae (9-12)

Brown algae (21) Heterokont LCA 4 Diatoms (20) Heterokonta Oomycetes (37-45)

Excavate LCA 1 Giardia (1) Excavata Naeglaria (3)

Figure 2.5. A model for the eukaryotic bZIP evolution. A simplified phylogenetic tree of the eukaryotes is depicted. Phylogenetic relationships are based on the most commonly accepted view of the phylogeny of the Eukaryotes (King, 2004). A white square indicates multicellular lineages; a black square indicates unicellular lineages; a black and white square indicate lineages comprising both multicellular and unicellular species. In parentheses, I reported the range of bZIPs I identified in the species sampled in this study. The estimated number of bZIPs in each last common ancestor (LCA) is indicated in a yellow box next to its name. When possible, I indicate the inferred bZIP families by a circle. Two circles have the same colour when they are thought to have originated from the same protein.

34

Figure 2.6. Origin and diversification of holozoan bZIPs. Each row represents the putative bZIP complement present in the corresponding LCA (last common ancestor). An hexagon represents a bZIP protein, and the family it belongs to is indicated underneath. Main families are indicated by different colours while subfamilies are indicated with different shades. bZIP subfamilies that had not been reported prior to this study are marked with stripes. Vertical lines represent families or subfamilies that have been conserved throughout metazoan evolution. A double line indicates a duplication event; a fork indicates the divergence of one of the copies. A dotted line reflects an uncertain relationship. In the case of CEBP-PAR and MAF-NFE2, I ascertained that the two subfamilies originated from a single protein but could not identify this ancestor (depicted as an hexagon bearing a question mark). At the bottom, subfamilies are gathered into the traditional bilaterian main families.

35 CHAPTER 3 Ð CHARACTERISATION OF AMPHIMEDON QUEENSLANDICA BZIP TRANSCRIPTION FACTORS

3.1 Abstract

The animal ancestor possessed a minimal repertoire of 12 bZIP transcription factors, each of which gave rise to a bZIP subfamily present in extant animals. Although the function of bZIPs have been characterised to varying extents in bilaterians, there is little understanding of their role in early branching metazoans and thus the ancestor common to themselves and bilaterians. Here, I systematically characterised the complete set of bZIPs in a representative of one of the first branching animal phyletic lineages, the sponge Amphimedon queenslandica. Using a genome-wide gene expression dataset, I examined the quantitative expression of bZIPs across A. queenslandica development, from the early cleavage-stage embryo to the adult, and compared it to in situ hybridisation expression patterns. Since bZIPs bind DNA as dimers, I also predicted which AqbZIP-AqbZIP interactions are more stable in A. queenslandica. I then sought to identify potential bZIP targets by identifying genes whose expression profiles was statistically correlated to those of one or more bZIP genes. This study revealed that A. queenslandica bZIPs display high temporal specificity and distinct cell type-specific localisation, and contains correlative evidence for the involvement of bZIPs in conserved cellular or developmental contexts. I infer that bZIPs may have been recruited in developmental and cell specific regulatory networks at the dawn of the animal kingdom.

3.2 Introduction

Across the animal kingdom, bZIP transcription factors sit at the core of regulatory pathways that control every aspect of cell decision making (reviewed in Miller, 2009). Since all of the metazoan bZIP classes pre-date the divergence of sponge and eumetazoan lineages (Chapter 2), they are likely to have been part of gene regulatory networks (GRNs) in their common ancestor. Although bZIPs have been extensively studied in bilaterians, notably in the context of cancer development (Hai and Hartman, 2001; Mayr and Montminy, 2001; Miller, 2009; Shaulian and Karin, 2002; Thompson et al., 2009; van Dam and Castellazzi, 2001), few published results are available in non-bilaterian metazoans (Galliot et al., 2009; Kaloulis et al., 2004).

The sponge Amphimedon queenslandica provides a useful point of comparison to the bilaterian and eumetazoan situation. In addition to being the sister groups to the eumetazoans,

36 sponges are markedly different in their overall body plan and composition. Further, sponges have a biphasic life cycle comprised of embryonic and metamorphic phases and Ðunlike a majority of eumetzoan- appear to be in a constant state of morphogenesis, with the juvenile/adult body plan constantly reshaping to shifting environmental conditions (Degnan et al., 2015). This provides multiple unique opportunities to study the morphogenetic and cellular processes in this animal (Degnan et al., 2015). It is thus an ideal model to examine the regulation of cellular processes outside eumetazoans. The developmental toolkit that regulates cell decisions and behaviours during A. queenslandica embryogenesis and metamorphosis has been described and is strikingly similar to eumetazoans (Adamska et al., 2007a; Adamska et al., 2010; Conaco et al., 2012; Degnan et al., 2015; Larroux et al., 2006; Nakanishi et al., 2014; Richards and Degnan, 2012; Srivastava et al., 2010) but the involvement of the bZIPs in this process has not been addressed.

3.2.1 Normal development of Amphimedon queenslandica

The A. queenslandica biphasic life cycle consists of a pelagic larva and a sessile benthic adult (Degnan et al., 2015; Degnan and Degnan, 2010). Larvae emerge from internal brood chambers in the adult, and swim in the water column Ð typically for a few hours Ð in search of a suitable habitat to settle and metamorphose into a juvenile sponge (Conaco et al. (2012) and references therein). The cellular and transcriptional events occurring across the A. queenslandica life cycle have been extensively characterised (Degnan et al., 2015).

Embryological development occurs in internal chambers containing 50 to 150 asynchronously developing embryos (Degnan et al., 2015). Embryological states are identified and named after the pattern of pigment cells at their posterior pole (Figure 3.1). Following fertilisation, cleaving embryos grow by the fusion of nutritive maternal nurse cells, which eventually undergo programmed cell death (Degnan et al., 2015). This period of asymmetric cleavages leads to the formation of a blastula. Blastula stage, or brown stage, is characterised by cellular specialisation and determination: six cell types are identifiable, including pigment cells that are spread throughout the embryo and give the embryo its brown colour. Cells are also sorted into an inner and an outer layer. This is also when the first asymmetry is observed in the embryos; Wnt is expressed in cells on the future posterior side of the embryo (Adamska et al., 2007a). From cloud to spot stages, the embryo anteroposterior axis becomes obvious, as pigment cells migrate to the future posterior pole of the larva and come together to form a dark spot. The pigment spot then develops into a ring, apparently by migration of pigment cells into this area in response to molecular signals (Richards, 2010). At this stage, the epithelial cell types and patterns form. The ICM is comprised of various

37 cell types, probably originating from the differentiation of local precursor cells. The embryo grows and elongates, and the sub-epithelial layer appears between the inner and outer layers. The late trilayered embryo morphologically resembles a larva. Throughout embryological development, a layer of squamous follicle cells -probably of maternal origin- surrounds embryos (Degnan et al., 2015).

When the larva hatches, it has three cell layers made up of different and various cell types: the ciliated epithelial layer, which consists of epithelial, globular, flask and anterior cuboidal cells, and specialised long-ciliated and pigmented cells associated with the pigment ring; the sub- epithelial layer, composed of a bed of collagen fibres and spherulous cells; and the inner cell mass, which contains several cell types, including archeocytes and spicules producing sclerocytes (Degnan et al., 2015; Richards, 2010). At the posterior pole, the pigment cells and posterior ciliated cells form the locomotive and photosensory organ (Leys and Degnan, 2001). At this stage the larva is negatively phototactic (Leys and Degnan, 2001). After approximately 4 hours, the larva becomes responsive to chemical cues and able to initiate settlement and metamorphosis (Degnan and Degnan, 2010). Settlement can notably be induced by the coralline alga Amphiroa sp.(Nakanishi et al., 2014).

Metamorphosis begins with the larva anterior region flattening onto the settlement substrate (Figure 3.2). By approximately 6 hours post-settlement (hps), the postlarva is spread across the settlement surface. The aquiferous system first appears at the chamber stage, approximately 48 hps. Next, tent-pole-like structures formed of siliceous spicule skeleton appear. Approximately 3 dps, the osculum is formed. At this stage, the postlarva displays most of the adult morphological characteristics and cell types, and is able to feed by filtering surrounding seawater (Degnan et al., 2015 and references therein). The oscula stage and the adult are composed of several readily identifiable cell types, including choanocytes Ð gathered in choanocytes chambers Ð archeocytes, sclerocytes, spherulous cells and exopinacocytes forming the outer layer (Fieth et al, in prep). The putative functions of each cell type are discussed in section 5.4, as relevant.

3.2.2 Predicting bZIP dimers bZIP transcription factors form homodimers or heterodimers with other bZIPs, and each member of the pair binds one half of the DNA target binding site (Vinson et al., 2002). Only a subset of all possible bZIP-bZIP combinations actually forms in vivo. Dimerization is largely determined by the sequence of the leucine zipper domain (Vinson et al., 2006; Vinson et al., 2002).

38 The leucine zipper domain is an alpha-helix composed of repeats of seven amino acids, conventionally labelled a, b, c, d, e, f and g (McLachlan and Stewart, 1975). Based on structural and thermodynamic studies, a set of rules has emerged that can predict bZIP dimerization specificity on a genome-wide basis from the amino acid sequences of the leucine zippers (reviewed in Vinson et al., 2002) and (Grigoryan and Keating, 2008); used in (Deppmann et al., 2004; Fassler et al., 2002; Vinson et al., 2002). Briefly, interactions between hydrophobic residues in positions a and d from one monomer with residues in analogous positions a’ and d’ from the second monomer (’ refers to the second bZIP monomer) create a hydrophobic core (Figure 3.3 A), which stabilises the bZIP dimer (Vinson et al., 2006; Vinson et al., 2002). e-g’ and e’-g interactions between charged residues in analogous positions e and g of each monomer can either be attractive or repulsive and thus determine the overall stability of the dimer (Figure 3.3A). The numbers of favourable and unfavourable interactions between pairs of a-d’ and e-g’ residues from two leucine zippers can be used to infer the stability of the resulting dimer (Deppmann et al., 2004; Fassler et al., 2002; Vinson et al., 2002).

Building on these simple rules, several computational models have been developed to quantify bZIP dimer stability (Reinke, 2012). Notably, using a machine learning approach that integrates coupling energies and known interactions from a bZIP database, Fong et al. predicted 70% of interactions between human bZIPs that are observed in vitro, with an 8% false negative rate (Fong et al., 2004) (Figure 3.3B). This algorithm is most accurate at predicting strong interactions and non-binders, but cannot accurately determine the affinity of interactions. Use of this tool aids inference of bZIP partnering selectivity and has been used for genome-wide prediction of bZIP dimerization networks (Ali et al., 2015; Fong et al., 2004; Patro and Kingsford, 2013). Critically, this algorithm does not predict bZIP interactions based on similarity with orthologues, but rather computes the strength of each residue-residue interaction between two hypothetical leucine zippers, and derives a score for the corresponding dimer. This tool is thus suitable for use in non-model species or distant relatives of well-studied vertebrates, such as sponges.

In this chapter, I provide a detailed view of the complexity and dynamics of A. queenslandica bZIPs expression, which suggest that bZIPs play a central role in this sponge. Specifically, I document the AqbZIPs’ temporal and spatial expression patterns throughout development and metamorphosis. I also identify other genes that share a similar expression profile, and predict stable bZIP-bZIP interactions. Since the level of cellular organisation in the sponge is significantly simpler than that of bilaterians, many contexts in which bZIPs are involved in other animals (e.g. lens development or heart regeneration) are not applicable in sponges, which lack

39 neurons and complex organs. Nonetheless, I show that AqbZIPs exhibit dynamic and complex expression patterns and are broadly expressed in a range of cell types, and provide correlative evidence for the involvement of bZIPs in the regulation of fundamental cellular processes in A. queenslandica. In light of these results, I propose that bZIPs were already integrated in GRNs regulating cell decisions in the last common ancestor of sponges and eumetazoans.

3.3 Methods

3.3.1 AqbZIPs genomic features AqbZIP genomic coordinates and sequences were obtained from in-house gene models (S. Fernandez-Valverde, S. Degnan and B. Degnan, unpublished). Genome version Aqu2.1 was used for all analyses, except the splice variants that are based on version Aqu2 (Fernandez-Valverde et al., 2015), which is publicly available (http://amphimedon.qcloud.qcif.edu.au/index.html). A full list of AqbZIPs genomic features and accession numbers across multiple databases is provided in Appendix 3.1. AqbZIP cDNA sequences are listed in Appendix 3.2. Protein sequences were provided in Chapter 2 (Appendix 2.1).

Intron phase values of all of AqbZIPs were extracted from a publicly-available GFF3 file (http://amphimedon.qcloud.qcif.edu.au/index.html) (Fernandez-Valverde et al., 2015). In this file format, phase values represent the position of the first nucleotide of an exon, relative to the first nucleotide of the next corresponding codon in the open reading frame. Simply put, a value of “1” in the GFF3 file corresponds to a phase 2 intron, and vice versa. Erroneous phase values associated with the first exon were removed, since it is by definition not preceded by an intron. Gene stick figures were made with the Wormweb online Graphic Maker tool (available at http://wormweb.org/exonintron).

3.3.2 Phylogenetic analyses Phylogenetic analyses were conducted as described in Chapter 2 (Jindrich and Degnan, 2016). Briefly, AqbZIP domains were aligned using the MAFFT v7 plugin in Geneious (Kearse et al., 2012). Maximum likelihood analyses were carried out using RaxML (LG+G substitution matrix and 1000 bootstraps replicates) (Stamatakis, 2006). Bayesian analyses were conducted using MrBayes 3.2 (Ronquist et al., 2012) with the LGG substitution matrix and based on two parallel runs, four chains and a resampling frequency of 100. Convergence was reached when the average standard deviation fell below 0.05.

40 3.3.3 A. queenslandica transcriptomics using CEL-Seq Expression profiles of A. queenslandica genes, including the AqbZIPs, were obtained from a genome-wide expression dataset (Anavy et al., 2014: NCBI GEO (GSE54364) and S. Fernandez- Valverde, N. Nakanishi, K. Roper, B. Degnan, S. Degnan, unpublished data). Briefly, total RNA was extracted from 82 developmental samples spanning A. queenslandica development and quantified using CEL-Seq (Hashimshony et al., 2012). Samples were ordered chronologically using the BLIND method (Anavy et al., 2014). Biological replicates were clustered into 17 broad developmental stages, spanning early cleavage to adult stages. Since only one replicate sample was available for each individual larval time point (0-2 hours post emergence (hpe), 2-4 hpe, 3-5 hpe, 4- 6 hpe, 6-8 hpe, 8-10 hpe, 10-12 hpe, 24-26 hpe and 48-50 hpe for larvae 6-50 hpe stage), I combined them into two groups: all larvae between 0 Ð 6 hpe and all larvae between 6 Ð 50 hpe. The mean and standard deviation were calculated for the expression at each developmental stage for each of the 17AqbZIPs. A heatmap scaled by row of AqbZIPs expression trends was generated using the R function heatmap.2 from the gplots package (http://www.cran.r- project.org/web/packages/gplots/index.html).

3.3.4 Differentially expressed genes Differential gene expression analysis on the genome-wide CEL-Seq dataset described above was performed using DESeq2 (p-adj cutoff < 0.05). Seven biological replicates were used for each morphological stage, except for the two larval stages (i.e. larvae 0 Ð 6 and 6 Ð 50 hpe), which were a mixture of time-points. The stages in which the AqbZIPs were statistically significantly up or down regulated were extracted from the resulting gene lists.

3.3.5 Correlation of gene expression profile analysis The list of all A. queenslandica genes that were differentially expressed in at least one of the 17 developmental stages was used as input for this analysis. To reduce noise, genes with an overall expression of less than 500 normalised counts throughout the whole developmental time course were discarded. A Pearson correlation matrix comparing the expression profiles of those genes was built using R through the Rstudio interface. The statistical significance of each correlation was assessed using a Fisher test. I extracted from the resulting list the genes that were significantly (p ≤ 0.05) correlated (cor ≥ 0.90) with each of the 17 AqbZIPs.

To identify genes whose expression profiles correlated with that of AqbZIPs during either embryogenesis or metamorphosis Ð but not across the entire developmental timecourse Ð the same analyses were performed using only the relevant developmental stages as input (cor ≥ 0.95). The

41 script used to perform this analysis was generated by S. Fernandez-Valverde and is provided in Appendix 3.3.

BLASTP hits for all correlated genes were extracted from a genome-wide annotation performed by S. Fernandez-Valverde using Blast2GO version 2.8 (Conesa and Gotz, 2008).

3.3.6 Gene ontology enrichment analysis Gene ontology terms overrepresented in the list of genes correlated to any and each AqbZIPs were identified using BINGO within Cytoscape 3.3.0 (Maere et al., 2005) with custom annotations (S. Fernandez-Valverde, unpublished) and a FDR adjusted P-value cut-off of 0.05. Redundant GO terms were clustered using REVIGO (SimRel measure, redundancy >0.7) (Supek et al., 2011) and a treemap was generated using R. Venn diagrams were generated using the Venny (available at: http://bioinfogp.cnb.csic.es/tools/venny/index.html).

3.3.7 Whole mount in situ hybridisation A. queenslandica developmental material was collected from Heron Island Reef, Great Barrier Reef, Queensland, Australia, and fixed as previously described (Leys et al., 2008). Gene fragments from the predicted coding sequence of 11 highly expressed AqbZIPs (Srivastava et al., 2010) were amplified from cDNA with gene specific primers (Appendix 3.4), cloned into pGEM-T Easy (Promega) following manufacturer instructions and verified by sequencing using M13F and M13R primers. Digoxigenin (DIG)-labelled antisense and sense RNA probes were transcribed from polymerase chain reaction (PCR) products with T7 or SP6 Polymerase (Promega), using DIG RNA Labelling Mix (Roche). DIG-labelled probes were used at a final concentration of 1 ng/µL. Probes length and domain coverage are reported in Appendix 3.4.

Whole mount in situ hybridisation was performed as described in (Larroux et al., 2008a). Whole mount and sectioned samples were photographed with a Nikon Sight DS-U1 camera (Nikon Australia Pty Ltd, Lidcombe, Australia) on either an Olympus SZX7 or a Nikon Eclipse Ti microscope with Nomarski optics (Olympus Australia Pty Ltd, Mt Waverly, VIC, Australia).

3.3.8 Immunohistochemistry Larvae and juveniles were collected and fixed as described in (Leys et al., 2008). Immunohistochemistry was carried out as described in Appendix 3.5 and (Nakanishi et al., 2014), using custom-made primary antibodies raised by GenScript Biotechnology (NJ, USA) against a peptide located outside the bZIP domain. Out of seven antibodies raised against five AqbZIPs, two

42 Ðrabbit Anti-ATF4 and rabbit Anti-JUN- produced a strong and specific signal. Antigen peptides were CSKVKILKDHNKELN (AqJUN) and CTPKRTPQGRGGRKS (AqATF4). To control for non-specific binding, the primary antibody was incubated overnight at 4¼C with a 1:10 molar excess of antigen peptide. The pre-absorbed antibody was then used in place of the primary at the same concentration.

For secondary antibodies I used AlexaFluor 488 anti-rabbit and AlexaFluor 568 anti-chicken (Molecular Probes). Nuclei were labelled with the fluorescent dye 4',6-diamidino-2-phenylindole (DAPI, Molecular Probes). Specimens were mounted in ProLong Gold antifade reagent (Molecular Probes) and observed using the Zeiss LSM 510 META confocal microscope. Because of the thickness of the specimen, imaging all samples with constant parameters -notably the same gain- is not always possible. As such, all negative controls were imaged with a higher gain, to ensure that the loss of signal observed in the controls relative to the tested sample was not a technical artefact. Image analysis was performed using the Fiji distribution of ImageJ.

3.3.9 Dimer prediction The N-terminal end of the leucine zipper (LZ) was determined according to previous work (Newman and Keating, 2003; Vinson et al., 2002) and the C-terminal end by the point at which the coiled-coil probability determined by Paircoil2 (available at http://paircoil2.csail.mit.edu) drops below 10% (McDonnell et al., 2006). To predict the stability of each bZIP pairing, I manually annoted the properties of the residues contained in the a, d, e and g positions as originally described in (Vinson et al., 2002). Although bZIP orthologues do not necessarily display the same partner preferences as one another, I also compared each predicted LZ to that of its H. sapiens and N. vectensis orthologues (Chapter 2) and reported residue substitutions in AqbZIPs that may drastically alter partner selectivity (Newman and Keating, 2003; Reinke, 2012; Vinson et al., 2002).

I then scored every possible AqbZIP dimer using the “bZIP Coiled-Coil scoring form” server (available at http://compbio.cs.princeton.edu/bzip/), which provides a front-end for a machine-learning algorithm developed by Fong et al. (Fong et al., 2004). This model integrates the computation of electrostatic interactions and quantitative coupling energies between two LZs, with a Support Vector Machine trained on a database of known interacting coiled-coils. It derives a score for the resulting dimer, where a higher score indicates a more stable pairing. Two score functions are available that use different training databases. By comparing scores obtained for H. sapiens bZIP pairings -using the “base-optimized weights” score function- (Figure 3.3B) to documented human bZIP dimers (Fong et al., 2004), two thresholds were determined: strong interactions

43 (scoring above 35) and non-binders (scoring below 10). Pairs scoring between 30 and 35 were considered as potential secondary interactions. The alternative score function “base-optimized weights + human bZIP” was used to confirm strong pairings. All computational predictions were assessed against the manual annotations of the leucine zippers described above.

3.3.10 Promoter searches Promoters were defined as previously described (Fernandez-Valverde et al., 2015), i.e. a 500 bp region spanning from 400 bp upstream to 100 bp downstream of the transcription start site (TSS). Motifs enriched in the AqbZIP promoter sequences were identified using MEME (using settings -mod zoops -nmotifs 10 -minw 6 -maxw 15 Ðrevcomp) and analysed with TOMTOM (using default settings). Known DNA motifs from the jolma2013, the JASPAR_CORE vertebrates and the uniprobe_mouse databases that were enriched in AqbZIP promoter sequences were identified using AME (p-value < 0.05). Finally, promoter sequences were scanned for three canonical bZIP binding sites Ð the ATF4, CRE and TRE motifs from the Jolma2013 and Jaspar databases - using FIMO (default settings). The tools used in this analysis are part of the MEME suite (Bailey et al., 2009).

3.4 Results

3.4.1 Genomic organisation of A. queenslandica 17 bZIPs Using sequence similarity and phylogenetic analyses described in Chapter 2 (Jindrich and Degnan, 2016), I identified 17 bZIPs in A. queenslandica. All, except four, of these are located on different scaffolds. The four co-located genes, AqPARb, AqNFE2b, AqATF2 and AqPARa, are at least 10 kb away from one another on the same scaffold (Appendices 3.1 and 3.6).

Shared or divergent intron position and splicing phase can reflect phylogenetic relationships in a gene family (Patthy, 1987). The A. queenslandica bZIP genes have between 0 and 12 introns, which range in size from 17 bp to 1.7 kb with an average size of 215 bp. While A. queenslandica genes have an average of 5 exons (Fernandez-Valverde et al., 2015), most bZIPs have less than 3 introns (Figure 3.4A and Appendix 3.1). Five genes, AqATF6, AqCREB, AqOASIS A, AqOASIS B and AqXBP1, have introns located in the region encoding the bZIP domain (Figure 3.4A). Two of these genes, AqATF6 and AqCREB, have isoforms (Fernandez-Valverde et al., 2015); alternative splicing only affects the integrity of the bZIP domain for one of the 4 AqCREB isoforms (Figure 3.4 B).

44 3.4.2 Phylogenetic analysis of A. queenslandica bZIP domains The phylogenetic relationship between A. queenslandica bZIPs has been described in Chapter 2 (Jindrich and Degnan, 2016) and is therefore only briefly related here. I identified at least one representative of the main 12 bZIP subfamilies present in metazoans (Figure 3.5 and Figure 3.6). 16 of the 17 AqbZIPs could be traced back to the sponge-eumetazoan LCA (Figure 3.5). A. queenslandica encodes two NFE2 genes, while the genomes of 2 other sponges Ð one calcareous and one homoscleromorph sponge Ð each contained only a single NFE2 copy. This indicates that the two A. queenslandica NFE2 genes probably originated from a duplication event after divergence of the demosponge lineage from other sponge classes, or that other sponges have lost additional NFE2 copies.

A. queenslandica bZIP domains are highly similar from each other (Appendix 2.7). Aside from family-specific domains, reported in Chapter 2, AqbZIPs do not appear to possess species- specific domains.

3.4.3 bZIPs are dynamically expressed across A. queenslandica life cycle Across sponge development, A. queenslandica bZIP steady-state expression levels are higher than most other genes, with expression values consistently above the 75th percentile of genome-wide transcript abundance (Figures 3.7 and 3.8). AqATF4 is the most highly expressed of all the AqbZIPs at any given developmental stages (Figure 3.8). AqATF2, AqOASISA and AqATF6 CEL-Seq read counts remain low Ð relative to other AqbZIPs- even in developmental stages where they are up regulated (Figure 3.8).

Each AqbZIP exhibits at least one significant change in expression between successive developmental stages (padj < 0.05; Figure 3.9). These changes overlap with important transitions in A. queenslandica life cycle: when embryonic cell movements commence at the transition from cleavage to brown stages, at the emergence of the free-swimming larva (from ring to late ring/larva), during metamorphosis of the larvae into a post-larva (larva 7-50h to PS1_1h) and at the formation of the functionally feeding juvenile (tent pole to oscula/adult). I also observed clusters of bZIPs, whose expression are up/down regulated in concert, corresponding to different morphological stages- embryo, larva, young post-larva and feeding post-larva/adult (Figure 3.9), and that are described below. Throughout this section, expression levels of each AqbZIPs are described relative to that of other AqbZIPs at the same developmental stage.

45 In cleavage embryos, all but three bZIPs -AqPARa, AqOASIS and AqCREB- are highly expressed (Figure 3.9). During cloud and spot stages of embryogenesis, coalescence of pigment cells into a “spot”, at the future posterior pole of the larvae, marks the establishment of the anterior- posterior axis (Adamska et al., 2007a). AqCREB, AqFOS and AqJUN are highly expressed during these stages. As the pigment spot turns into a ring, AqFOS, AqJUN and AqOASISB reach their highest expression level (Figures 3.8 and 3.9).

In late ring and larval stages, transcript abundance of seven bZIPs - AqMAF, AqPARa, AqOASIS, AqATF4, AqOASISB, AqCREB and AqCEBP Ð is markedly higher relative to preceding embryonic stages. Most of these genes are differentially expressed in the developing or newly hatched larva, while expression levels of AqATF4, AqOASIS and AqCEBP are higher in the mature larvae.

As the larva initiates metamorphoses into a juvenile sponge, specific bZIPs are differentially expressed. In the first hour following larval settlement, AqMAF is the only bZIP that is significantly upregulated and transcript abundance of AqPARa, AqOASIS, AqATF4, AqOASISB and AqCEBP remains high following larval stages. By 7 hps, as the postlarva flattens on the settlement substrate, expression of AqXBP1, AqATF2 and AqATF6 spikes. During later stages of metamorphosis, the aquiferous system and the tent-pole-like structures that lift the postlarva epithelial layer appear. Expression of bZIPs remains stable during this period.

Finally, the osculum appears and the postlarva become a feeding juvenile sponge. Seven bZIPs -AqMAF, AqJUN, AqFOS, AqXBP1, AqATF2, AqOASISA and AqNFE2B- are highly expressed from this stage onto adulthood (Figures 3.8 and 3.9). AqPARa is the only bZIP significantly up regulated between oscula and adult stages. AqPARa, AqATF4 and AqFOS are the most highly expressed AqbZIPs during these stages (Figure 3.9).

3.4.4 bZIPs show cell-specific and broad patterns of expression during development Using in situ hybridisation, I analysed patterns of mRNA localisation for the 12 most highly and dynamically expressed bZIPs in 5 developmental stages: cleavage; spot; ring; newly hatched larvae (0-2 h after release); and oscula stage (juvenile). Spatial expression patterns of the other 5 AqbZIPs were not explored here due to time constraints. Since the expression of AqPARa is discussed in detail in Chapter 4, the expression patterns of this gene is only briefly described here.

46 3.4.4.1. Localised expression of bZIPs during embryogenesis Each of the AqbZIPs has a different developmental spatial expression profile and pattern (Figures 3.10 and 3.11). In late cleavage embryos, cells expressing AqCREB, AqFOS, AqPARa and AqMAF are distributed throughout the embryo (Figures 3.10A1-F1 and 3.11). Specifically, AqFOS and AqMAF transcripts are localised in a subset of micromeres and macromeres (Figures 3.10A1- A2, B1-B2 and 3.11). In contrast, expression of AqCEBP and AqJUN is detected in the most central part of the embryo. Both genes are also expressed in the maternal follicular cell layer (Figures 3.10C1, E1-E2 and 3.11).

In spot stage, the embryo has two distinct cell layers (Degnan et al., 2015). Cells expressing AqFOS, AqJUN and AqMAF are restricted to the inner cell mass (ICM) (Figures 3.10 A.3, B3, C2 and 3.11). In contrast, AqCREB transcripts are more abundant in the outer layer (Figure 3.10D2). Expression of AqCEBP is strong throughout the embryo, but predominantly in the ICM (Figures 3.10 E3 and 3.11).

Later in development, in ring stage embryos, AqPARa-expressing cells are located around the pigment ring (Figures 3.10 F2 and 3.11). Transcripts from all other analysed AqbZIP genes were localised in the ICM (Figures 3.10 and 3.11). Cells expressing AqJUN expression appear to be particularly concentrated in the inner part of ICM (Figures 3.10C3 and 3.11).

3.4.4.2. Localised expression of bZIPs in larvae In newly hatched larvae, AqbZIPs are predominantly expressed in the ICM or the sub-epithelial layer (SEL) (Figures 3.12 and 3.13). Specifically, AqCEBP and AqATF4 expression was detected in cells of the ICM (Figure 3.12C and D-D1). AqFOS, AqJUN and AqMAF transcripts are abundant in SEL cells (Figure 3.12A-B2, F). Expression of these genes is also diffuse throughout the ICM, and markedly weaker at the anterior pole (Figure 3.12A, B, F). AqCREB expressing cells were present both in the ICM and SEL (Figure 3.12 E-E1). Only AqPARa and AqATF4 transcripts were detected in the epithelial layer, though epithelial expression of AqATF4 was faint relative to ICM expression. (Figures 3.12 D, G-G2 and 3.13). AqPARa and AqATF4 expression profile and pattern in larval stages are covered in more depth in Chapters 4 and 5, respectively.

I then sought to compare the localisation of AqbZIP proteins to that of their mRNAs. Custom-made polyclonal antibodies were raised by Genscript against five selected AqbZIPs; only two Ðanti- AqATF4 and anti-AqJUN- were deemed specific enough to be included in this work. Anti-AqJUN was however not used in larval stages. 47 Fixed larvae were labelled with an anti-ATF4 antibody, raised against a peptide located outside AqATF4 bZIP domain. Larvae incubated with the pre-immune serum or a pre-absorbed antibody, which was pre-incubated with the antigen peptide, served as negative controls. Anti-ATF4 specifically labelled a subset of large nuclei in the ICM, featuring a large nucleolus characteristic of archeocytes (Nakanishi et al., 2014) (Figure 3.14B), and also a subset of nuclei scattered in the epithelial layer, which appear to be flask cells (Figure 3.14C). No staining occurred with the blocked antibody (Figure 3.14D and Appendix 3.8) and non-specific cytoplasmic staining occurred with the pre-immune serum (Appendix 3.8)

3.4.4.3. Localised expression of bZIPs in juvenile sponge (oscula stage) Transcripts from all AqbZIP genes were detected in the juvenile sponge (Figures 3.15 and 3.16). AqNFE2a-, AqNFE2b- and AqXBP1-expressing cells form line-shaped structures on the periphery of the juvenile sponge (Figure 3.16A1, B1, D1).

At this stage, six cell types are morphologically discernible (Fieth et al., in prep). All but one of the AqbZIPs investigated -AqNFE2b- were expressed in choanocytes (Figures 3.15 and 3.16); however some of the genes were only expressed in a subset of chambers (e.g. AqOASIS, Figure 3.16E2-E3), or in a subset of choanocytes within a chamber (e.g. AqPARa, Figure 3.16J2). Expression of several AqbZIPs was also observed in archeocytes (Figures 3.15 and 3.16). In both choanocytes and archeocytes transcripts were predominantly localised in or around what appears to be the nucleus (e.g. AqATF4 and AqOASIS, Figure 3.16C2 and E2; AqXBP1 and AqMAF Figure 3.16A2 and I3). Notably, AqJUN and AqCREB transcripts were particularly abundant in the nucleus of migrating archeocytes (Figure 3.16F3 and G3), which exhibit an elongated cell body and form tracts in the mesohyl (Degnan lab, unpublished).

Other cell types comparatively expressed fewer AqbZIPs. Spongocytes, most concentrated in the cortical layers, expressed only three bZIPs (AqJUN, AqFOS and AqATF4) (Figures 3.15 and 3.16C3, G4, H4). Finally, AqCREB, AqNFE2a and AqNFE2b transcripts were particularly abundant in amoeboid cells in the exopinacoderm (the external layer of the sponge), most likely a type of pinacocyte (Figure 3.16 A3, C3, F4). Spherulous cells and spicules-producing sclerocytes did not express any of the AqbZIPs investigated here (e.g. Figure 3.16 C4).

I sought to confirm the expression pattern of AqJUN and AqATF4 by immunostaining, using custom made antibodies against a peptide located outside the bZIP domain. Juveniles incubated with a pre-absorbed antibody served as negative control. Anti-JUN labelled choanocytes and 48 archeocytes distributed throughout the juvenile (Figure 3.17). Specifically, the nuclei of both cell types stained positively for JUN (Figure 3.17B-C), as well as vesicles in archeocytes cell body and surrounding nuclei of choanocytes (Figure 3.17 B-C). Pre-absorption of the antibody abolished all nuclear staining (Appendix 3.9). However, cytoplasmic labelling resembling “agglomerates” occurred with the blocked antibody (Appendix 3.9). Therefore, unspecific binding of the antibody could account for the strong cytoplasmic JUN labelling of choanocytes cell body (Figure 3.17B).

Anti-ATF4 specifically labelled the nucleus of a subset of cells, characterised by a compact nucleus and almost no cell body (Figure 3.18A). These characteristics apply to pinacocytes. Vesicles in the cytoplasm of archeocytes were also labelled with ATF4 (Figure 3.18A). Other cell types, e.g. spherulous cells were not labelled with ATF4 (Figure 3.18B).

3.4.5 Co-expression of bZIPs with other genes First, I compared AqbZIPs general trend of expression to one another and to that of the rest of A. queenslandica genes. S. Fernandez-Valverde defined 21 eigengene categories, comprising genes whose differential expression is correlated across all developmental stages (Figure 3.19, S. Fernandez-Valverde, unpublished). The 17 AqbZIPs fall into 11 eigengene categories and only one co-expression module included more than 2 bZIPs (Figure 3.19). Those 11 modules represent six of the eight major trends of expression observed in A. queenslandica. Members of the same bZIP family belonged to different co-expression modules (Figure 3.19).

To identify putative bZIP targets, I performed a genome-wide correlation analysis to identify genes whose development-wide expression trends were statistically correlated with that of the AqbZIPs. Using this approach, the expression profiles of three AqbZIPs correlated with that of less than 10 genes -AqPARa (n=1), AqATF4 (n=7) and AqCEBP (n=4)-, while other AqbZIPs had up to 125 correlated genes (Table 3.1). Interestingly, the sole gene correlated with AqPARa expression is one of the sponge cryptochromes (Aqu2.1.32109_001). The relationship between PAR-bZIPs and cryptochrome genes is developed in Chapter 4. AqbZIPs that belong to the same eigengene category shared most, but not all, correlated genes with one another and AqbZIPs from different categories did not share any genes with the other bZIPs (Figure 3.20 A,B and Appendix 3.10). Expression profiles of six AqbZIPs Ð AqFOS, AqJUN, AqOASISB, AqXBP1, AqATF2 and AqCREB- were not statistically correlated with that of any other genes (Appendix 3.10).

To investigate the putative association of the AqbZIPs with specific biological processes, I identified GO terms that were overrepresented in the list of genes correlated with any and each of

49 the AqbZIPs, relative to the rest of the genome (Figure 3.20C, Appendices 3.10 and 3.11). Overall, genes associated with cellular response to various stimuli and cell signalling were abundant amongst the genes correlated with the AqbZIPs (Appendices 3.10 and 3.11). For instance, AqNFE2a and AqOASIS gene suites included many stress and cGMP (Cyclic guanosine monophosphate) response genes, respectively (Figure 3.20C and Appendix 3.10).

Since most bZIPs are upregulated in multiple developmental stages, and are putatively regulating different targets in each of them, I also searched for genes whose trend of expression was statistically correlated to that of AqbZIPs during either embryogenesis or metamorphosis but not across the entire developmental timecourse. Selected genes, which include a range of signalling molecules and transcription factors, are reported in Table 3.2, and the full list is provided in Appendix 3.11.

It should be noted that this analysis does not take into account that the transcription of a regulator precedes that of its target by approximately 3 hours for A. queenslandica at 15 ¼C (Bolouri and Davidson, 2003). This caveat becomes more important during early metamorphosis, when the time between consecutive developmental stages is shorter.

3.4.6 A. queenslandica bZIP dimers prediction Since dimerization ultimately determines bZIPs DNA binding preferences, and thus gene targets, I sought to build a network of putative interactions between A. queenslandica bZIPs.

bZIP dimerization is mediated by the leucine zipper, which is composed of repeats of seven amino acids conventionally labelled (gabcdef)n . AqbZIP leucine zippers are 3 to 5 heptads long (Figure 3.21B), similar to human bZIPs (Vinson et al., 2002). The a position contains mainly non- polar non-charged residues, while the g and e positions contains a majority of polar charged residues; by definition, the d position is occupied almost exclusively by leucine residues. The proportions and nature of the amino acids occupying these positions are similar between A. queenslandica and human (Figure 3.21A). Since members of the same bZIP subfamily generally share similar leucine zippers, I compared each A. queenslandica leucine zipper to that of its orthologues (Chapter 2) and reported residue substitutions at positions a, d, e and f, that could alter dimerization preferences (Appendix 3.12). Only AqJUN and AqFOS leucine zippers markedly differ from their bilaterian counterparts, exhibiting similar substitutions to their cnidarian orthologues (Appendix 3.12).

50 I predicted interactions in two ways. First, I manually annotated AqbZIP leucine zipper sequences using established dimerization determinants (Vinson et al., 2002) (Figure 3.21C). Next, I used the bZIP Coiled-Coil Scoring Form developed by Fong et al. (available at http://compbio.cs.princeton.edu/bzip/) to evaluate the stability to each AqbZIP-AqbZIP interaction. This model has been used in other bZIP studies (Ali et al., 2015; Deppmann et al., 2006; Patro and Kingsford, 2013) and provides an acceptable understanding of bZIP partnering (Reinke, 2012). The bZIP scoring form predicted high-confidence strong interactions and non-binders for all of A. queenslandica bZIPs, exept AqNFE2a, AqNFE2b, AqMAF, and AqFOS (Figure 3.21D and Appendix 3.13). Human bZIPs from the NFE2 and MAF family form well-documented dimers in vivo that here received low scores between 10 and 25 (See 3.2.2, Figure 3.3B), probably due to the presence of basic residues in a position (Motohashi et al., 2002; Newman and Keating, 2003; Reinke et al., 2013) and (Fong et al., 2004). AqNFE2s and AqMAF have several basic residues in a positions, and differ from all other AqbZIPs in 2 ways: (i) they do not have a polar neutral Asn residue in the a position of the 2nd heptad; and (ii) they usually contain a polar positive residue in e positions and a polar negative residue in g positions. AqFOS leucine zipper contains a bulky Tyr residue in d position in heptad 1, instead of the usual Leu (Figure 3.21C). AqFOS did form 3 pairings that scored between 23.5 and 29.7, well above other scores (<15). Its strongest interaction was with AqJUN.

To build a putative AqbZIPs dimerization network, I classified pairings in 3 categories: strong interactions, which scored above 35; secondary interactions, which scored above 33 or scored above 30 and were supported by alternative methods (see methods for details); and non- binders, which scored below 10 (Figure 3.22A). I predicted 11 homodimers and 9 heterodimers. AqCREB, AqATF6, AqOASIS and AqXBP1 preferentially form homodimers; AqCEBP and AqPARb preferentially form heterodimers; AqMAF and AqFOS do not form homodimers (Figure 3.22).

3.4.7 Sequence analysis of A. queenslandica bZIPs promoters To investigate the possible regulation of AqbZIPs by known regulatory pathways, I searched for conserved motifs in the promoter regions (as previously defined in (Fernandez-Valverde et al., 2015), i.e. 400 bp upstream to 100 bp downstream of the transcription start site (TSS) of AqbZIP genes, using a range of tools from the MEME Suite (Bailey et al., 2009). Only one motif was discovered de novo; it was present in all promoters and matched to a canonical TRE site, namely the JUN binding site from the JASPAR CORE database (MA0488.1) (Figure 3.23A). No enriched motif was found in a control set of 17 randomly selected promoters (10 iterations). I then searched

51 for known eukaryotic DNA motifs in the AqbZIP promoters using a range of bioinformatics tools from the MEME suite (Bailey et al., 2009). Only two known motifs were enriched in AqbZIPs promoters: the ATF4 and the CRE-like JUND binding sites from the Jolma2013 and JASPAR CORE databases, respectively (Figure 3.23B). The ATF4 binding site comprises a CRE/TRE half site (TGA) followed by a CAAT site, and is the putative primary binding site of the human ATF4- CEBP dimer (Jolma et al., 2013). Scanning the sequences for these motifs, and the canonical TRE and CRE binding sites, uncovered additional putative binding sites (Figure 3.23B). In addition to the CRE-like motif discovered de novo, AqOASIS, AqMAF and AqATF4 have an ATF4-like binding motif; AqFOS and AqCREB have a CRE motif; and AqOASISA and AqNFE2b have a TRE motif (Figure 3.23B). Predicted bindings sites lie at varying distances from each other and the TSS (Figure 3.23B). The same set of searches on a 2 kb window upstream of the TSSs did not identify any significant motifs.

3.5 Discussion

3.5.1 A. queenslandica bZIPs are expressed in a variety of cellular contexts 3.5.1.1. Embryonic expression of A. queenslandica bZIPs Cleavage is a period of extensive cell division, which in sponges appears to be largely asymmetric (Degnan et al., 2015). During early cleavage, maternal nurse cells migrate into the embryo and undergo programmed cell death, depositing their cytoplasmic content into the embryo via an unknown mechanism (Degnan et al., 2015). The transcripts of many transcription factor and other genes are localised to different subsets of micromeres during cleavage (Adamska et al., 2010; Fahey, 2011; Gauthier; Larroux et al., 2006; Rivera et al., 2012), suggesting that cell specification may be occurring (Degnan et al., 2015). All of the AqbZIPs but one -AqCREB- are highly expressed in cleaving embryos, consistent with them playing a role in A. queenslandica early development. It is however unclear if these bZIP mRNAs are maternally or zygotically derived. The in situ hybridisation patterns from most of the AqbZIPs analysed here -AqFOS, AqMAF, AqPARa and AqJUN- were localised to micromeres and macromeres; AqCEBP mRNA was broadly expressed throughout the cleaving embryo. The early expression of bZIPs in A. queenslandica differs from the general pattern in other metazoans, where they are not usually expressed until later embryonic development (Adryan and Teichmann, 2010; Schep and Adryan, 2013) but is consistent with cell speciation being part of early embryogenesis in A. queenslandica (Degnan et al., 2015).

From brown to spot stage, only AqCREB is significantly upregulated. Brown stage is characterised by cellular specification, determination and in some case differentiation of pigment

52 cells, sclerocytes and ciliated epithelium (Degnan et al., 2015; Leys and Degnan, 2001). It is at this stage that the larval anterior-posterior axis appears to be first established (Adamska et al., 2007a). In cloud to spot stages, anterior-posterior axial polarity has been established with pigment cells migrating to the future posterior pole of the larva. Canonically, CREB genes are not primarily involved in embryogenesis in other taxa (Lonze and Ginty, 2002). AqCREB transcripts are localised to the outer cell layer of the embryo, in contrast to AqFOS, AqMAF, AqCEBP and AqJUN, which appear to be more enriched in the inner layer. AqCREB in situ hybridisation on sectioned embryos could provide insights on the role of this gene in spot embryos.

The pigment spot develops into a ring apparently by migration of pigment cells into this area (Richards, 2010). At the same time, epithelial and inner cell mass cells are proliferating and undergoing morphogenesis to form a functional larval body plan (Degnan et al., 2015). AqJUN and AqFOS are upregulated and broadly expressed in the inner cell mass (ICM) of this stage. The ICM comprises various cell types, presumably from the differentiation of local precursor cells, and active cell division is still occurring in this layer (Nakanishi et al., 2014). Developmental roles of JUN and FOS genes in other animals include the regulation of cell proliferation and differentiation (Mechta- Grigoriou et al., 2001; Mechta-Grigoriou et al., 2003). Through the JNK signalling pathway, JUN and FOS are also instrumental in cell motility and tissue morphogenesis in Drosophila and mouse (Xia and Karin, 2004). The involvement of JUN and FOS genes in the regulation of any cellular processes in bilaterians is consistent with their broad expression patterns in A. queenslandica but makes it difficult to infer a particular function for these genes in this sponge.

AqPARa expression pattern in and around the pigment ring matches closely that of A. queenslandica light-responsive cryptochrome AqCRY2 (Rivera et al., 2012) and their expression profile are significantly correlated. The involvement of AqPARa, and bZIPs more generally, in light-responsive and circadian processes is explored in Chapter 4.

3.5.1.2. bZIP expression during larval development Larval stages are marked by a set of differentiated cells on the surface epithelium, and cells in the ICM, which are proliferating and potentially undifferentiated; this cell layer has archeocyte- like pluripotent cells (Degnan et al., 2015; Nakanishi et al., 2014). A range of transcription factors, signalling pathway components, and structural genes are expressed in a cell-type specific manner at this stage, in both the ICM and various epithelial cell types (Adamska et al., 2010; Larroux et al., 2006; Richards and Degnan, 2012). A number of bZIPs are expressed in the ICM, including AqCEBP and AqATF4. AqATF4 transcripts appear to be localised around the nuclei of 53 mesenchymal archeocytes, which are pluripotent stem cells (Funayama, 2013). Interestingly, the human ATF4:CEBP heterodimer is a key regulator of cell differentiation from mesenchymal stem cells (Cohen et al., 2015). However, since AqATF4 and AqCEBP do not form a strong dimer in A. queenslandica, this function may not be conserved; their interaction is deemed as secondary. Further work would be required to refine our understanding of the CEBP expression pattern, ideally using an anti-CEBP antibody to determine whether both proteins are co-expressed in the same nuclei. The putative role of AqATF4 in A. queenslandica larva is discussed in greater depth in Chapter 5. AqCREB is also differentially expressed in the newly hatched larva and broadly expressed in large cells of the ICM. However, I have not characterised AqCREB expression in this stage in enough detail to take these observations further.

Cells of the sub-epithelial layer (SEL) predominantly express three AqbZIPs, including AqMAF, which is significantly upregulated just before the larva emerges from the brood chamber. AqJUN and AqFOS transcripts are also markedly more abundant in this layer, relative to ICM expression. The SEL is populated by large spherulous cells and smaller unidentified cells, but its precise function is still unclear (Degnan et al., 2015; Leys and Degnan, 2001; Richards, 2010). Resolving the expression patterns of these AqbZIPs at a cellular level would be necessary to formulate hypothesis on their possible role in the larva SEL.

Amongst the AqbZIPs analysed here, only AqPARa and AqATF4 were expressed in cells of the epithelial layer. Putative function of AqPARa is explored in Chapter 4. AqATF4 epithelial expression seems restricted to the nucleus of flask cells, which are putative sensory/secretory cells (Degnan et al., 2015; Richards, 2010). As previously stated, the putative role of AqATF4 in A. queenslandica larva is discussed in greater depth in Chapter 5.

3.5.1.3. bZIP expression during metamorphosis Early metamorphosis (the first 24 hps) is a period of extensive transdifferentiation and programmed cell death, notably of larval epithelial cells (Degnan et al., 2015; Nakanishi et al., 2014). Surprisingly, only three bZIPs ÐAqMAF, AqXBP1 and AqATF6- are significantly upregulated during this period and do not include metazoan master regulators JUN and FOS (van Dam and Castellazzi, 2001); other bZIPs are upregulated during this period, though not to an extent that is statistically significantly greater than the previous stage. Spatial expression of these genes during early metamorphosis was not investigated.

54 Canonically, MAF genes are primarily involved in differentiation processes in other animals (reviewed in Eychene et al., 2008), including tissue-specific terminal differentiation in mammals and insects (Blank and Andrews, 1997; Eychene et al., 2008; Motohashi et al., 2002). Since expression patterns in metamorphosing postlarvae were not conclusive, AqMAF’s precise function during metamorphosis remains uncertain. It is worth noting that AqMAF expression level remains low relative to other AqbZIPs, despite being significantly upregulated. Although expression level of a transcription factor does not necessarily reflect its activity (Vaquerizas et al., 2009), this could suggest that AqMAF is only active in a small subset of cells. Another alternative is that, akin to its bilaterian orthologues, AqMAF is acting as transcriptional or target switch. Indeed, since MAFs recognise a MAF-specific binding site (MARE), MAF heterodimerization with other various bZIPs lead to different targets being activated (Eychene et al., 2008; Motohashi et al., 2002).

The dimer predictive model used in this study does not perform well on MAF bZIPs and even strong dimers formed by human MAF bZIPs score low (Fong et al., 2004). Part of the issue lies in the unusual composition of MAF leucine zippers, which generate residue-residue interactions that current computational models, including the one used here, are not able to optimise (Fong et al., 2004; E. Kane, personal communication). Consistently, AqMAF was not predicted to form any stable dimer. As more experimental measurements of bZIP interactions become available (Potapov et al., 2015; Reinke et al., 2013), new computational models have been developed (e.g. Potapov et al., 2015) that could inform on AqMAF possible bZIP partners. Those predicted interactions could then be verified in vitro (see discussion in 3.3.2). Resolving the expression pattern of AqMAF during early metamorphosis, notably its protein localisation could also provide valuable insight on its possible involvement in A. queenslandica cell differentiation. If the AqMAF protein is abundant enough in early postlarvae, the ChIP-Seq method described in Chapter 5 could identify putative MAF targets, especially since MAF generally bind a longer and more conserved binding site relative to other bZIPs (Motohashi et al., 2002).

At about 6-7 hps, AqXBP1 and AqATF6 expression levels spike. In bilaterians, transcription of XBP1 is induced by ATF6 following the activation of the unfolded protein response (UPR) pathway (Calfon et al., 2002; Tsai and Weissman, 2010; Yoshida et al., 2001). The yeast putative ATF6 orthologue, HAC1 (Chapter 2), is also a key component of the UPR and performs very similar functions to XBP1 in this context (Shamu, 1998). Consistent with this idea, AqXBP1 promoter region contains an ATF6 binding site (CRE); since CRE sites were found in most AqbZIPs promoters, it is however unclear whether this reflects a particular relationship or a general inter-connectivity of AqbZIPs.

55 The 6-7 hps stage is marked by extensive apoptosis in the juvenile epithelium (Nakanishi et al., 2014). The UPR is the cells response to restore homeostasis under conditions of stress and can lead to apoptosis if the stress persists. In this context, activation of XBP1 has been shown to promote cell survival (He et al., 2010; Jiang et al., 2015). Interestingly, the A. queenslandica AqXBP1 expression profile is statistically correlated only to two genes containing ring finger domains -including one putative E3 ubiquitin-protein ligase- whose bilaterian orthologues are known to play a prominent role in the regulation of the UPR pathway. The precise function of AqXBP1 and its relation to A. queenslandica metamorphosis is still unclear. It is conceivable that AqXBP1 plays a role in cell survival at a stage marked by a large degree of apoptosis. AqXBP1- expressing cells are particularly abundant in the epithelial layer of the 3 d old juvenile but its localisation in the 6 h old postlarva is currently unknown. The use of a counterstain labelling apoptotic bodies would be particularly informative. In any case, it seems that the link between ATF6 and XBP1 may predate the bilaterians.

3.5.1.4. bZIP expression in the 3 day old juvenile The first indications of morphogenesis starts at about 36 hps and by 3 d the functional juvenile body plan is formed (Degnan et al., 2015; Sogabe et al., 2016). At this stage, multiple cells types are identifiable, including two cell types that can transdifferentiate -choanocytes and archeocytes (Sogabe et al., 2016). The expression of multiple A. queenslandica bZIPS in these two sponge stem cells (Funayama, 2013) is consistent with putative conserved roles as master regulators of cellular differentiation and proliferation in animals (Miller, 2009; Shaulian and Karin, 2002; Weirauch and Hughes, 2011). AqJUN and AqCREB particularly stand out as they are highly expressed in migrating archeocytes. AqJUN mRNA and protein were localised in choanocytes and archeocytes. Anti-JUN labelled both the nuclei and cytoplasm of the choanocytes. Cytoplasmic signal could result from background staining or reflect the protein’s ubiquitous expression in cells that are in a constant state of fate determination, consistent with the strong expression of the mRNA in motile archeocytes. The role of JUN genes in the regulation of cell differentiation is well established in bilaterians (Shaulian and Karin, 2002; Zenz and Wagner, 2006 and references therein). Similarly, vertebrate and cnidarian CREB genes play a key role in neuron differentiation (Barco and Marie, 2011; Galliot et al., 2009; Kaloulis et al., 2004). In the cnidarian Hydra, CREB is also involved in stem cell self-renewal and dedifferentiation processes during head regeneration (Chera et al., 2007). The putative role of AqCREB and AqJUN in sponge archeocytes dedifferentiation suggests that these genes may have regulated differentiation processes early in metazoan evolution.

56 Other AqbZIPs that could be involved in conserved regulatory pathways include AqNFE2a and AqNFE2b. Canonically, members of the NFE2 family are activated through the NFR2 pathway and serve a cytoprotective role during time of oxidative stress (Hayes and Dinkova-Kostova, 2014; Ma, 2013). Sponges rely on the phagocytosis of a variety of particles, notably free-living bacteria, by choanocytes for feeding and they maintain a bacteria symbiont. Recognition of non-symbiotic bacteria is of prime importance and is thought to be achieved though the NRF2 pathway (Yuen, 2016). Notably, differential expression of AqNFE2s in juvenile under bacterial infection strongly supports this hypothesis (Yuen, 2016). Consistent with this idea, the AqNFE2a expression profile correlates with that of a suite of genes associated with cellular response to stress.

In juvenile, AqATF4 transcripts were most abundant in choanocytes, while AqATF4 was predominantly localised in archeocytes. Both AqATF4 transcripts and protein were also abundant in a previously uncharacterized cell type of the endoderm that resembles a spongocyte. Since choanocytes transdifferentiate into other cell types including epithelial pinacocytes, via an archeocyte intermediate (Nakanishi et al., 2014), the lack of cell identity permanency may explain discrepancies.

3.5.1.5. Functional implications of AqbZIPs expression The dynamic expression of bZIPs through A. queenslandica development is consistent with these transcription factors contributing to a variety of developmental and cellular processes. This may explain the overall low numbers of genes identified as potential AqbZIP targets in the correlation analyses reported. If AqbZIPs gene targets vary in different cell-types and developmental stages, and have a wide spectrum of functions, it would not be reflected in whole- animal expression profiles, spanning several morphological phases (i.e. embryogenesis or metamorphosis). Additionally, expression of all AqbZIPs was detected with in situ hybridisation in each developmental stage investigated here, and expression levels of all of the AqbZIPs are above the 75th percentile of genome-wide expression through the A. queenslandica life cycle. This suggests that bZIPs may be active in all developmental stages, even those where they are not significantly upregulated relative to previous stages. qPCR or CEL-Seq on individual cell-types may provide a better understanding on the localisation and timing of AqbZIPs expression. Alternatively, examining the changes in AqbZIPs expression in response to natural stimuli or manipulated environment, such as that adopted in Chapter 4, gives us functional insights on AqbZIPs role in A. queenslandica.

57 3.5.2 AqbZIPs dimerization network Reinke et al. (2013) proposed that the complexity of the bZIP network increased through opisthokont evolution. Supporting this hypothesis, yeast bZIPs and metazoan members of ancient bZIP families act strictly or preferably as homodimers (Reinke et al., 2013; Yin et al., 2013). In A. queenslandica, 50% of the 22 predicted pairings form homodimers. The choanoflagellate Monosiga brevicollis interactome show similar proportions, while that of eumetazoans is usually composed of only 20-30% homodimers (Reinke, 2012). Note that since a greater number of heterodimeric combinations exist relative to homodimers, these proportions do not truly reflect the propensity of bZIPs to preferentially homodimerize. However, in non-human eumetazoans and choanoflagellate, 65% of all possible homodimers occur, while most heterodimers do not (Reinke, 2012). In contrast, yeast bZIPs form over 90% of all possible homodimers (Reinke, 2012). A. queenslandica predicted interactomes show similar proportions top that of bilaterian. Thus, it seems plausible that the propensity of bZIPs to form heterodimers increased before the emergence of the holozoans, probably to enhance bZIP connectivity.

Using parsimony, Reinke et al. (2013) proposed a eumetazoan ancestral bZIP network from the interactomes of 4 bilaterians and 1 cnidarian (Reinke et al., 2013). This approach is seemingly in contradiction with results from the same paper showing that sequence conservation does not imply conserved interactions, and that a single amino acid substitution can have a dramatic effect on dimer stability. Nonetheless, the A. queenslandica predicted network is strikingly similar to Reinke et al. prediction. I predicted 2 interactions (JUN and AT4 homodimers) that are not in the proposed ancestral network, which in turn included 3 additional interactions (SMAF homodimer, FOS-ATF2 and PAR-LMAF heterodimers). Since the LMAF family originated in eumetazoans (Chapter 2) and AqJUN and AqFOS leucine zippers markedly differ from their eumetazoan orthologues, these discrepancies may be expected. I also predicted two non-binders, the FOS homodimer and MAF-PAR heterodimer, that are included as ambiguous interactions in the proposed ancestral network. Bearing in mind the caveats of both approaches, the concordance of these two predictions may reflect a core of conserved high-affinity interactions that maintained key residues in the leucine zipper domain through animal evolution.

The approach I have taken to construct this dimerization network is well documented but carries important caveats. For instance, amino acids preceding the leucine zipper domain -and referred to as the fork in the literature- strongly influence dimer stability (Reinke, 2012) but are not part of the predictive algorithm. Modelling A. queenslandica bZIPs secondary structure would be highly informative. The predictions I have made remain to be experimentally verified. A.

58 queenslandica small dimerization network provides a great opportunity for experimental testing. I did not have the time to address this question systematically. I sought to estimate the affinity of a subset of A. queenslandica bZIP dimers using an in-vitro pull down assay but this experiment did not succeed in the time allocated for this PhD. Checking a subset of the strongest bZIP pairings- and non-binders could inform on the robustness of the computational approach. It could also be used as correlative evidence along with protein expression pattern to accurately predict AqbZIPs DNA targets.

Finally, these predictions only reflect the strongest interactions. Less stable dimers can form, especially in the absence of high affinity partners (Grigoryan et al., 2009; Reinke, 2012). The JUN:FOS dimer is one of the best documented and most conserved bZIP dimer across the animal kingdom (Mechta-Grigoriou et al., 2001; Mechta-Grigoriou et al., 2003; Reinke et al., 2013; Shaulian and Karin, 2002). In A. queenslandica, pairing between AqJUN and AqFOS does not appear to be strong. However, both genes share highly correlated expression profiles and are expressed in the same cell types in several developmental stages, which suggest that their regulation may be connected. Since AqFOS homodimer is highly unstable, AqJUN:AqFOS dimer formation may be a case where cellular context rather than dictates partner specificity. I caution that only localisation of AqFOS mRNA is presented here. Co-localisation of AqJUN and AqFOS mRNA does not necessarily reflect that of their protein product. Nuclear co-expression of AqJUN and AqFOS has been investigated but the AqFOS antibody was not deemed specific enough to warrant reliable conclusion (data not shown).

3.5.3 bZIPs in the metazoan last common ancestor Metazoan bZIPs serve multiple roles in numerous cell types and pathways (cite review articles). Since A. queenslandica cellular organisation is far simpler than that of most other metazoans, inter-species functional comparisons in relation to specific cell types or tissues may not be meaningful. This analysis of developmental expression of A. queenslandica bZIPs demonstrates that this gene family is differentially expressed during embryogenesis, larval development and metamorphosis, consistent with this transcription factor family functioning in different developmental and cellular contexts. It also suggests that AqbZIPs work together, either by selective dimerization or within networks where one AqbZIP regulates another. Although functional confirmation of bZIPs roles was beyond the scope of this study, the detailed survey presented here establishes that those transcription factors are likely to be central to the functioning of A. queenslandica. Taken together these observations suggest that regulatory networks that

59 include bZIPs were already in place in the ancestor of all contemporary animals, and most likely coordinated a range of basic cellular processes within the context of the first multicellular animals.

Two AqbZIPs particularly stood out from this work. First, AqPARa expression profile statistically correlated to that of one of A. queenslandica cryptochrome AqCRY2, and both genes show strikingly similar in situ patterns. PAR-bZIPs and cryptochrome genes are both involved in the circadian response in other eumetazoans. The entrainment of AqPARAa and AqCRY2 expression by light is explored in Chapter 4. Second, AqATF4 is continually amongst the most expressed AqbZIPs and its protein is specifically localised to archeocytes and epithelial cells nuclei in the larva. The abundance of AqATF4 protein and the availability of a specific antibody against it make it an ideal candidate to functionally investigate AqATF4 gene targets using chromatin immunoprecipitation sequencing (ChIP-Seq). This will be the focus of Chapter 5 and will allow me to explore the interplay between transcription factor binding and chromatin features in A. queenslandica.

60 Figures

Figure 3.1. Embryonic and larval stages of A. queenslandica. Whole mount light micrographs of fixed and cleared embryos and larvae. Stage is indicated in the bottom left corner. Posterior pole is on top in (D-I). Scale bar: 100 µm. Image by (Richards, 2010).

61

Figure 3.2. Morphological stages during A. queenslandica metamorphosis (A) Free-swimming larvae. Posterior pole up. (B-F) Post-larva viewed from the top. (B) 1 h post-settlement (hps). (C) Mat formation, 6hps. (D) Chamber stage, 48hps. (E) Tent- pole stage, 48-72 hps. Arrowhead indicates tent pole-like structures. (F) Oscula stage juvenile. Inset shows the osculum. Scale bars: A and F-inset: 100 µm; B Ð F: 1 mm. Image by (Degnan et al., 2015)

62

Figure 3.3. Predicting bZIP dimers. (A - B) Cartoon of two leucine zippers interacting in a dimer shown in (A) side view and (B) top view. The dimerization interface between the two α-helices (cylinders) is formed by residues positions a, d, e and g (highlighted). Typically, a leucine is found at the d position. Positions in the two helices are distinguished by the prime notation (e.g. a and a' are analogous positions in the two helices). (C) H. Sapiens bZIP dimer interaction scores. Each column represents one bZIP, and each dot represents the predicted interaction score of hypothetical dimers formed between that bZIP and another bZIP, as computed by the bZIP Coiled-Coil Scoring Form (See methods). The colour of the dot indicates the strength of the putative interaction as measured in vitro: strong dimers are shown in red, non-binders in blue. Blue horizontal lines demark recommended thresholds for high-confidence predictions (above 32.8 and below 27.8 for strong dimer and non-binders, respectively) as determined by Fong et al. (2004). Adapted from (Fong et al., 2004).

63

Figure 3.4. A. queenslandica bZIP introns and splice variants. (legend over page)

64 Figure 3.4. A. queenslandica bZIP introns and splice variants. (see previous page) (A) Schematic representation of each of the 17 AqbZIPs indicating the location of introns and bZIP domain (blue). Intron phase is indicated on the right for each intron, listed from left to right. The number of splice variants (Fernandez-Valverde et al., 2015) is indicated in the last column. Scale bar: 100 bp. (B) Only two of the seventeen AqbZIPs (AqATF6 and AqCREB) have introns that interrupt the bZIP domain coding region and also exist as multiple splicing isoforms. Of these isoforms, only one (one of four AqCREB isoforms) alters the integrity of the bZIP domain as it is part of the 3’-UTR.

Figure 3.5. AqbZIPs phylogeny. (see next page) (A) Multiple alignment of AqbZIPs bZIP domain (basic region and leucine zipper). Residues conserved in over 50% of the sequences are highlighted. The consensus sequence is presented as a logo at the top. Numbers indicate amino acid positions from the start of the bZIP domain. (B) Mid-point rooted maximum likelihood tree of bZIPs from A. queenslandica (Aq; red), and species representing the vertebrates (H sapiens, Hs; black), insects (D. melanogaster, Dm; blue) and cnidarians (N. vectensis, Nv; green). Statistical support was obtained with 1,000 bootstrap replicates and Bayesian posterior probabilities; both values are shown on key branches.

65

Figure 3.5. AqbZIPs phylogeny. (legend on previous page)

66

Figure 3.6. A. queenslandica bZIPs List of A. queenslandica bZIPs, depicted as a hexagon, and the bZIP families they belong to.

67

Figure 3.7. AqbZIP expression levels relative to other A. queenslandica genes. Black data points represent the log 10 normalised CEL-Seq counts of one of the 17 AqbZIPs in each developmental stage, indicated on the x-axis. Lines indicate the genome-wide percentile (75th in green and 90th in red) of the same values for all A. queenslandica genes.

Figure 3.8. Developmental expression profiles of the AqbZIPs. (see next page) AqbZIP average gene expression levels across seventeen developmental stages, indicated along the x-axis. Expression values are CEL-Seq normalised read counts. Asterisks indicate those developmental stages where the AqbZIP is differentially expressed relative to the previous stage (p ≤ 0.05). Error bars are standard deviations. Note that the scales on y-axes are not consistent between graphs, as AqbZIPs are expressed at different levels.

68

Figure 3.8. Developmental expression profiles of the AqbZIPs. (Part 1 of 2)

69

Figure 3.8. Developmental expression profiles of the AqbZIPs. (Part 2 of 2) (legend on previous page)

70

Figure 3.9. Differentially expressed AqbZIPs across development. (legend over page)

71 Figure 3.9. Differentially expressed AqbZIPs across development (see previous page) Heatmap showing expression profiles of the 17 AqbZIPs across the developmental stages indicated along the x-axis. Expression levels were measured by CEL-Seq and are rescaled by row. Each row represents one AqbZIP and rows are clustered by expression profile similarity. Colours represent expression levels: high to low (red to white to blue). Asterisks indicate those developmental stages where the AqbZIP is differentially expressed relative to the previous stage (p ≤ 0.05). Boxes highlight the genes that are significantly upregulated during four morphological phases (embryo, larvae, young post-larva and feeding juvenile/adult).

Figure 3.10. AqbZIPs embryonic expression pattern. (see next page) Localisation of AqFOS (A1-A4), AqMAF (B1-B4), AqJUN (C1-C3), AqCREB (D1-D3), AqCEBP (E1-E4), AqPARa (F1-F2) transcripts by in situ hybridisation in three embryonic stages indicated in the left bottom corner. All plates show cleared whole mounts, except for E1, F1 and F2, which are whole mounts. Red arrowheads point to AqbZIP-expressing cells region; black arrowheads indicate negative cells. Dashed line marks the inner/outer cell layer boundary in (D2). Posterior pole to the bottom (A3, D2, E3), facing up (C2, F2) and right (A4, B3-B4, C3,D3, E4). Scale bars: 100 µm, except for D1 and E4 (75 µm), and A2 and B2 (10 µm). Mac: macromeres, Mic: micromeres.

72

Figure 3.10. AqbZIPs embryonic expression patterns. (legend on previous page)

73

Figure 3.11. AqbZIPs gene expression patterns observed in embryos. Diagrams summarising the gene expression patterns observed in cleavage, spot and ring embryos. Coloured areas indicate regions of expression and colour intensity reflects transcripts abundance. AqbZIPs sharing the same expression pattern during embryogenesis are depicted on the same diagram.

74

Figure 3.12. AqbZIPs expression in larvae Transcripts of AqFOS (A), AqJUN (B-B2), AqMAF (F) are particularly abundant in the SEL but not at the posterior pole (black arrowhead). AqCEBP (C), AqATF4 (D-D1) and ACREB (E) show strong expression in cells of the ICM. AqATF4 is also expressed in epithelial cells (red arrowhead in D) and AqCREB in cells of the SEL (E1). AqPARa is strongly expressed in the epithelial layer (G-G2). In all panels, red arrowheads point to domains of AqbZIP expression. Dashed line marks the ICM boundary in (D). (H) Larvae cell layers, negative control. (A-H, D1-E1, H) Cleared whole mounts, (B1, B2, G2) median sections through the middle of the ring. Scale bars: 100 µm, except for B2 (15 µm), and D1-E1 (5 µm). ICM: inner cell mass, SEL: sub-epithelial layer, EL: epithelial layer. (H) Image by G. Richards.

75

Figure 3.13. AqbZIPs gene expression patterns observed in larvae Diagrams summarising the gene expression patterns observed in larvae. Coloured areas indicate regions of expression and colour intensity reflects transcripts abundance. AqbZIPs sharing the same expression pattern are depicted on the same diagram.

76

Figure 3.14. AqATF4 labelling in A. queenslandica larvae (A) whole mount micrograph; (B-D) confocal sections of larvae labelled with anti-ATF4, DAPI, DiI or the blocked ATF4 antibody, as indicated in the left bottom corner. (B2) and (C4) are a merge of (B-B1) and (C-C2), respectively. AqATF4 is localised in cells of the ICM (B-B2), with large nucleus with nucleoli (inset in B1-B2), and in cells of the epithelial layer (C1-C4). All nuclear staining is abolished by pre-absorption (Pa) of the antibody (D). Red arrows and arrowheads indicate ATF4-labelled cells, white arrows point to negative cells. The dashed line indicates the ICM boundary. Scale bars: (A) 100µm, (C1-C4) 30µm, (B-B2, D) 30µm, inset of (B) 5µm. ICM: inner cell mass, SEL: sub-epithelial layer, EL: epithelial layer. (A) Image by G. Richards.

77

Figure 3.15. AqbZIPs expression in a range of cell types in juvenile Summary of the cellular localisation of AqbZIPs transcripts based on the in situ hybridisation patterns presented in Figure 3.16. Each cell type is associated with one colour; the same colour code is used in Figure 3.16.

78

Figure 3.16. AqbZIPs expression in a range of cell types in juvenile (Part 1 of 2)

79

Figure 3.16. AqbZIPs expression in a range of cell types in juvenile (Part 2 of 2) Expression of AqNFE2a (A), AqNFE2b (B), AqATF4 (C), AqXBP1 (D), AqOASIS (E), AqCREB (F), AqJUN (G), AqFOS (H), AqMAF (I) and AqPARa (J) was detected in all juveniles (oscula stage). Choanocytes (Ch) and archecoytes (Ar) expressed many bZIPs (C2-J2, A2, D3, F3-I3). Pinacocytes (Pin) expressed AqNFE2a (A3), AqNFE2b (B3) and AqCREB (F4). Spongocytes (Sp) expressed AqJUN (G4), AqFOS (H4) and AqATF4 (C3). In all panels, coloured arrowheads point to AqbZIP-expressing cells, and each colour corresponds to a specific cell-type (Figure 3.15). Black arrowheads point to negative cells, such as spherulous cells (Sph) and sclerocytes (Sc). Figure 3.15 summarises in which cell types AqbZIPs transcripts were detected. Scale bars: (A1-J1) 100µm, (F4) 50µm, (A2, B2, D2, F2-J2, A3-D3, F3-I3, C4, E4) 20µm, (C2, E2, E3, G4, H4) 10µm.

80

Figure 3.17. AqJUN labelling in A. queenslandica juvenile Juveniles (oscula stage) labelled with anti-JUN or DAPI as indicated in the left bottom corner. (B2) and (C2) are a merge of (B-B1) and (C-C1), respectively. Choanocyte chambers (Cc) and archecoytes (Ar) are strongly labelled with AqJUN. Specifically, AqJUN signal is detected in the nuclei (Nu) of those cells, and vesicles (Ves) in archecoytes (C-C2). Red arrows indicate JUN-labelled nuclei, white arrows point to negative nuclei. Scale bars: (A) 50µm, (B-B2) 20µm, (C-C2) 10µm.

81

Figure 3.18. AqATF4 labelling in A. queenslandica juvenile Juveniles (oscula stage) labelled with anti-ATF4 or DAPI as indicated in the left bottom corner. (A5), (A6) and (B3) are a merge of (A, A3), (A2,A3) and (B-B2), respectively. AqATF4 specifically labels the nucleus of putative pinacocytes (Pin). No signal was detected in other cell types, e.g. spherulous cells (Sph) in (B, B3). AqATF4 signal was also detected in archecocytes (Ar) vesicles (A-A6). Scale bars: (A) 10µm, (B-B3) 5µm.

82

Figure 3.19. Modules of co-expression in A. queenslandica (legend over page)

83 Figure 3.19. Modules of co-expression in A. queenslandica (see previous page) Normalised expression profile of all A. queenslandica genes across the seventeen developmental stages indicated on the x-axis. Eigengene categories were defined by S. Fernandez-Valverde (unpublished) and their characteristic expression profile is shown on the right hand side. AqbZIPs are indicated by the markers and are identified below the heatmap. Image by S. Fernandez-Valverde.

Figure 3.20. Genes correlated with the AqbZIPs (see next page) (A-B) Venn diagrams showing the overlap between the lists of genes correlated with (A) three AqbZIPs from the same module of expression (Figure 3.15) -AqPARb, AqNFE2b and AqOASISA- and (B) four AqbZIPs from different modules -AqATF6, AqMAF, AqOASISA and AqATF4-. AqbZIPs from the same module of expression shared most but not all correlated genes (A), while AqbZIPs from different modules shared none (B). (C) Treemaps of the statistically enriched GO terms (Biological Processes) in the list of genes correlated with AqNFE2a and AqOASIS. See also Appendices 3.10 and 3.11 for a complete list of genes correlated with the AqbZIPs and their GO terms.

84

Figure 3.20. Genes correlated with the AqbZIPs (legend on previous page)

85

Figure 3.21. AqbZIPs leucine zipper features and dimerization interface (legend over page)

86 Figure 3.21. AqbZIPs leucine zipper features and dimerization interface. (see previous page)

(A) Frequency of amino acid at the g, d, e, and a positions of A. queenslandica (in red) and H. sapiens (in blue) leucine zippers. Only heptads participating in the dimerization interface were included. (B) Number of heptads constituting the dimerization interface of AqbZIPs. (C) Amino acid sequence of all 17 A. queenslandica leucine zippers (LZ). Residues of the leucine zipper are grouped by heptads (gabcdef). An asterisk marks the natural C terminus. Within an heptad, if both the g and e positions contain charged amino acids, the heptad is coloured in: green for the attractive basicÐacidic pairs (R<- >E and K<->E, K<->D), orange for the attractive acidicÐbasic pairs (E<->R, E<->K, D<- >K, and D<->R), red for the repulsive acidic pairs (E<->E, D<->E, and E<->D), and blue for the repulsive basic pairs (K<->K, R<->K, K<->R, and R<->R). If only one of the residues is charged, it is colored in red if it is acidic and blue if it is basic, indicating a possible salt bridge formation with another leucine zipper. The a and d positions are highlighted in black if it contains polar amino acids and purple if it contains charged amino acid (K, R, H). Yellow indicates the presence of a proline or a pair of glycines, either of which disrupts the α - helical structure (Vinson et al., 2002). (D) Interaction scores of the AqbZIPs. Each column shows the score of all possible dimers using base- optimized weights for one AqbZIP sequence. The red and blue lines mark the thresholds for high-confidence prediction of strong interactions (>35) and non-binders (<10), respectively. Dashed lines mark the thresholds adopted in the original study (Fong et al., 2004). Note that heterodimers appear twice on the plot. Aq: A. queenslandica, Hs: H. sapiens.

87

Figure 3.22. Prediction of AqbZIPs dimerization network (A) AqbZIP dimers were classified into 3 categories: strong interaction (yellow), secondary interaction (orange) and non-binder (grey). White indicates that no reliable prediction was made. Homodimeric interactions are boxed. AqbZIPs that preferentially form homodimers or heterodimers are labelled in red or blue, respectively. (B) Putative Aqbzip dimerization network. AqbZIP proteins (black dots) are grouped by families (blue circle). Predicted interactions are represented as lines, with thickness indicating the strength of the predicted dimer. When all members of a family form a dimer with the same AqbZIP, the line joins the family circle to that AqbZIP. AqFOS: AqNFE2a dimer is represented as a dashed line, because it is less supported than other dimers.

88 MA0488.1 A 2

1 bits G CA C T TT A A A G A C G C C CGG ATG T 0 G 2 3 4 5 6 7 8 9 1 T T C TCA 10 11 12 13 2

1 bits A T A

A G 0 T CGGGGT G 2 3 4 5 6 7 8 9 1 C T10 1 T TCA Tomtom (no SSC) 13.08.2016 20:08

B

Figure 3.23. Features of AqbZIPs promoters (A) Only one motif was enriched in AqbZIPs promoter sequences that matches to the JUN binding site of the JASPAR CORE database (MA0488.1). (B) Known-binding sites found in AqbZIPs promoter sequences.

89 Tables

Number of correlated genes AqbZIP Development-wide During embryogenesis During metmorphosis Aqu2.14504_001 OASIS 53 26 90 Aqu2.18822_001 MAFG 14 0 2 Aqu2.20356_001 FOS 0 1 0 Aqu2.21825_001 JUN 0 1 2 Aqu2.24970_001 OASIS A 95 159 0 Aqu2.30050_001 OASIS B 0 3 5 Aqu2.36410_001 NFE2a 124 172 0 Aqu2.36583_001 XBP1 0 57 2 Aqu2.37170_001 CEBPb 75 123 0 Aqu2.37217_001 ATF6 48 0 0 Aqu2.37521_001 PARb 125 166 2 Aqu2.37524_001 NFE2b 117 156 2 Aqu2.37529_001 ATF2 0 0 0 Aqu2.37571_001 PARa 1 62 2 Aqu2.38530_001 ATF4 7 0 18 Aqu2.43743_001 CREB 0 23 9 Aqu2.44103_001 CEBP 4 167 0

Table 3.1. Genes correlated with the AqbZIPs The number of genes whose expression profiles during development, embryogenesis or metamorphosis significantly correlated to that of one or more of the AqbZIPs is indicated.

90 Transcription factors Aqu2.1.17692_001 grainyhead-like protein 2 homolog Aqu2.1.20114_001 t-box transcription factor Aqu2.1.28989_001 transcription factor mitochondrial-like isoform x1 Aqu2.1.29052_001 t-box transcription factor Aqu2.1.30486_001 transcription factor ap-4 Aqu2.1.33570_001 transcription initiation factor iia subunit 2 Aqu2.1.35570_001 transcription factor isoform x1 Protein Kinases Aqu2.1.16697_001 phosphatidylinositol- -bisphosphate 3-kinase catalytic subunit delta isoform Aqu2.1.31587_001 focal adhesion kinase 1-like Aqu2.1.32890_001 rac-beta serine threonine-protein kinase isoform x2 Aqu2.1.33525_001 serine threonine protein kinase 3 Aqu2.1.35176_001 mob kinase activator 2-like isoform x2 Aqu2.1.36128_001 diacylglycerol kinase theta-like Aqu2.1.37149_001 skeletal receptor tyrosine protein kinase-like Aqu2.1.38661_001 phosphatidylinositol-4-phosphate 5-kinase type-1 alpha Aqu2.1.42041_001 dual specificity protein kinase shkc-like Aqu2.1.43665_001 cyclin-dependent kinase 5 activator 2 Aqu2.1.43953_001 serine threonine-protein kinase plk1 Ras small GTPases Aqu2.1.16423_001 ras-related protein rab-23 Aqu2.1.16423_001 ras-related protein rab-23 Aqu2.1.22051_001 ras gtpase-activating-like protein iqgap1 Aqu2.1.23692_001 ras-related protein rab-11b Aqu2.1.26973_001 ras-related protein rab-14 Aqu2.1.27475_001 ras-related protein rab-35-like Aqu2.1.30867_001 ras-related c3 botulinum toxin substrate 1 Aqu2.1.34138_001 ras-like gtp-binding protein rho1 Cellular processes Aqu2.1.32964_001 apoptosis-inducing factor 3-like isoform x2 Aqu2.1.36566_001 growth arrest-specific protein 2 Aqu2.1.38165_001 death-associated protein 1-like Aqu2.1.38711_001 fas apoptotic inhibitory molecule 1 Aqu2.1.38721_001 wilms tumor protein 1-interacting protein homolog Aqu2.1.42112_001 tumor suppressor candidate 2 Aqu2.1.43253_001 tumor suppressor candidate 5 homolog Signalling pathways Aqu2.1.18919_001 receptor-type tyrosine-protein phosphatase alpha Aqu2.1.25033_001 camp-specific 3 -cyclic phosphodiesterase 4d isoform x2 Aqu2.1.28793_001 g-protein coupled receptor 64 Aqu2.1.32109_001 cryptochrome-2 Aqu2.1.36125_001 signal transducing adapter molecule 1 Aqu2.1.36694_001 low quality protein: cadherin egf lag seven-pass g-type receptor 1 Aqu2.1.37629_001 low-density lipoprotein receptor-related protein 5 Aqu2.1.41568_001 tgf-beta receptor type-1 Aqu2.1.41660_001 fibroblast growth factor receptor substrate 2-like Aqu2.1.42680_001 neurogenic locus notch homolog protein 1- partial

Table 3.2. Selected genes correlated with the AqbZIPs

91 CHAPTER 4 - LIGHT-ENTRAINED GENE EXPRESSION IN THE DEMOSPONGE AMPHIMEDON QUEENSLANDICA

4.1 Abstract

The circadian clock is a molecular network that coordinates organismal adaptations, including behaviour and physiology, with daily environmental changes in the day-night cycle. In animals, recent studies suggest that the topology of the circadian general circuitry may be conserved between bilaterians and cnidarians. However, our understanding of the molecular pathways in place at the base of the animal kingdom, especially those inhabiting marine environments, is limited. The demosponge Amphimedon queenslandica is a marine sponge and a representative of one of the earliest metazoan phyletic lineages, making it an attractive comparative model to understand the origin, and early evolution and function of circadian pathways in animals. Here, we identified homologues of circadian genes in A. queenslandica genome, including two cryptochromes Ð flavoproteins sensitive to blue light, three Timeout-like genes, two PAR-bZIPs and seven bHLH- PAS transcription factors. Real-time PCR (qPCR) analyses, in both adult sponges and larvae, show that expression of AqPARa and AqCRY2 oscillate with the daily light cycle and that rhythmic expression of these is partially lost when the animals are exposed to constant light. Surprisingly the expression of other circadian genes, in particular AqClock, is not light-dependent. I also showed that AqPARa is co-expressed with the cryptochrome AqCRY2 in the larval photosensory organ, and temporally throughout A. queenslandica development. These results provide the first molecular evidence of a circadian clock in two developmental stages of the marine sponge A. queenslandica, and are consistent with key molecular and functional features of the circadian clock being in place in the ancestor of all animals.

4.2 Introduction

Circadian clocks have evolved multiple times during eukaryotic evolution, allowing organisms to coordinate their physiology and behaviour with predictable environmental rhythms (Bell-Pedersen et al., 2005; Foster and Kreitzman, 2005). These molecular timekeeping mechanisms are entrained by several environmental signals, most often light, temperature, and feeding cycles (Harmer et al., 2001; Majercak et al., 1999). They drive physiological and behavioural rhythms by producing oscillations in gene or protein activation. Broadly, circadian clocks comprise three main modules: (1) sensors that perceive the entrainment signals from the environment; (2) molecular oscillators of transcriptional-translational feedback loops that maintain 92 the clock pacing and transmit rhythmic signals to downstream components; and (3) clock-controlled genes (CCGs) that coordinate circadian responses within cells (Bass, 2012; Bell-Pedersen et al., 2005; Harmer et al., 2001; Reitzel et al., 2013; Yu and Hardin, 2006). Our understanding of this clock’s general circuitry largely draws from work on mammal and insect model species, which has revealed that the topology of the molecular oscillator is conserved across these animal lineages (Bell-Pedersen et al., 2005; Harmer et al., 2001; Reitzel et al., 2013; Reppert and Weaver, 2002; Yu and Hardin, 2006). Specifically, two interconnected regulatory loops are organised around a pair of bHLH-PAS transcription factors, Clock and Cycle (or Bmal1 in mammals), which activate the transcription of other clock elements (Figure 4.1). A positive feedback loop, comprising transcription factors of the (NR) and PAR-bZIP families, activates the transcription of Clock and Cycle. Combinations of Period, Timeless and cryptochrome genes form a negative feedback loop that regulates the degradation of Clock and Cycle (Figure 4.1). Within each feedback loop, key roles are performed by different genes depending on the phylum and sometimes even the species (Bell-Pedersen et al., 2005; Brady et al., 2011; Harmer et al., 2001; Reitzel et al., 2010; Reitzel et al., 2013; Reppert and Weaver, 2002; Yu and Hardin, 2006).

Recent studies on representatives of animal lineages basal to the bilaterians, notably the sea anemone Nematostella vectensis and the stony coral Acropora millepora, have identified orthologues to most of the circadian components and reported their transcriptional response to light:dark cycles (Brady et al., 2011; Reitzel et al., 2010; Vize, 2009). Based on these studies, it appears that the topology of the circadian clock may pre-date the bilaterian-cnidarian split, which suggests that some components of the animal circadian clock may have been conserved since the last common animal ancestor.

The marine sponge Amphimedon queenslandica is a representative of one of the oldest extant animal phyletic lineages. Adults live attached to hard substrates on intertidal reef flats of the Great Barrier Reef and release larvae daily. These swim in the water column for 2 to 12 h before settling and metamorphosing into a benthic juvenile. The rhythmic release of larvae is entrained by light in A. queenslandica (Jindrich, Roper et al., in prep) and in another demosponge Callyspongia ramose (Amano, 1988) and, A. queenslandica adults and larvae are responsive to light (Leys et al., 2002). A. queenslandica has an extensive repertoire of regulatory and signalling genes that are shared with modern bilaterians (Degnan et al., 2015; Srivastava et al., 2010). Notably, the genome encodes a functional cryptochrome gene that seems to drive the photic behaviours of the larvae (Rivera et al., 2012), and a suite of bHLH-PAS and bZIP transcription factors bearing high similarity to their bilaterian orthologues (Jindrich and Degnan, 2016; Simionato et al., 2007).

93 Together, these molecular and behavioural data suggest that sponges may have a light-entrained circadian time keeping mechanism similar to that described in other animals. Those components of a sponge circadian clock that are shared with eumetazoans are likely to reflect a core clock traceable back to the ancestor of all animals.

We found that the A. queenslandica genome has orthologues to most of the bilaterian circadian genes. Here, I characterise conserved molecular components of a sponge circadian clock by quantifying their mRNA expression in response to light:dark cycles in two developmental stages of A. queenslandica, adults and free-swimming larvae. Two of these genes, AqCRY2 and AqPARa, display light-entrained and cyclic patterns of transcription that are consistent with them having a role in the regulation of circadian processes. I discuss the implications of a molecular clock in A. queenslandica and its developmental components.

4.3 Material and Methods

4.3.1 Gene identification and phylogenetic analyses Amphimedon queenslandica candidate circadian genes were identified through BLAST searches of the JGI, Ensembl, and newly annotated (Aqu2.1) in-house A. queenslandica databases using publically available bilaterian and cnidarian protein models of the Timeless-Timeout, bHLH- PAS, cryptochrome and PAR-bZIP families. Accession numbers of candidate circadian genes identified are listed in Appendix 4.1. Sequences were aligned with MUSCLE and trimmed to conserved domains. Maximum likelihood (ML) analyses were carried out using RaxML (Stamatakis, 2006). The best-fit substitution matrix was determined using ProtTest (Abascal et al., 2005). Branch support was estimated by performing 1000 bootstrap replicates. Phylogenetic trees were visualised with FigTree v1.4.

4.3.2 Developmental expression of circadian genes Messenger RNA expression of circadian genes through A. queenslandica development was determined used CEL-Seq data comprising 82 A. queenslandica developmental samples from early cleavage to adult (Hashimshony et al., 2012; Anavy et al., 2014: NCBI GEO (GSE54364)). I averaged the biological replicates for each developmental stage. I combined larval samples into two groups: 0Ð7 and 6Ð50 h post-emergence. I plotted the mean normalised expression (counts per million mapped) with the Standard Error across the sponge developmental stages using GraphPad software.

94 4.3.3 Expression of circadian genes in adults Field adults were tagged and maintained in situ in Shark Bay and 1 cm3 biopsies were removed from the outer layer of tissue with a scalpel at regular intervals. Biopsies were immediately placed in RNAlaterTM and frozen. Three adults were sampled in September 2010 (LD cycle approximately 12 h:12 h, sunrise 5:34 am and sunset 5:50 pm on first day of experiment) and three in January 2011 (LD cycle approximately 10:14, sunrise 5:12 am and sunset 6:43 pm on first day of experiment) by S. Lemon.

To determine the effect of different light-treatment on gene expression, adults were collected from the field and transferred and reared in aquaria for a week at The University of Queensland under an artificial 12:12 light:dark cycle (7 am to 7 pm). Animals were then exposed to either 12:12 light:dark or constant dark for 3 days prior to sampling. Samples were collected every 3 h over 36 h, as described above.

4.3.4 Expression of circadian genes in larvae Twelve adult sponges were collected and reared in individual tanks at HIRS with water flowing directly from the reef, for no longer than one week. Larvae were collected from individual sponges at hourly intervals from 1-4 pm using larval traps fitted to the aquaria outflow lines. Two to five individuals were sampled from each cohort every 30 min to 1 h. Larvae were immediately submerged in RNAlaterTM and stored until processing. The intensity of the light experienced by those larvae throughout the day is reported in Appendix 4.3.

To follow gene expression under different light treatments, larvae were collected between 1- 2 pm and immediately transferred to 6-well plates. The plates were semi-submerged in flowing water from the reef, to mimic natural temperature fluctuations. Larvae were exposed to either constant dark or constant light (full spectrum LUMILUX T5 HE tube) for at least one hour before the sampling started at 3 pm, as described above. The light intensity under water was 120 Lux, as measured with a Li-COR LI - 250A light meter, which corresponds to what animals exposed to natural light experienced at 5.30 pm (Appendix 4.3).

4.3.5 cDNA synthesis and quantitative qPCR RNA was purified from larvae and adults using Trizol (Sigma) following the manufacturer’s recommendations. Residual genomic DNA contamination was removed with DNase I (Invitrogen) treatment and quantified using QuBit (Invitrogen) technology. cDNA was transcribed from RNA

95 using Superscript III (Invitrogen) and pentadecamers for adult tissue or a combination of pentadecamers and Oligo dT-15 (Stangegaard et al., 2006) in larval samples.

Primers were designed using Primer3 for amplicons ranging from 100-250 bp with an optimal Tm of 60¡C. Primers were blasted back to the reference genome to ensure specificity, and then optimised for annealing temperatures and primer concentrations. qPCR primers are described in Appendix 4.3. I performed Real-time qPCR on a LightCycler 480 (Roche) platform using Lightcycler 480 SYBR green master mix I (Roche). Each sample was run in triplicate. An inter-run calibrator was used on each plate and was incorporated into the final analyses. Raw Cq values were imported into qBASEPlus software (Biogazelle) for analysis. The GeNorm analysis function was performed within qBASEPlus for both the adult and larval samples to select stable reference genes (Appendix 4.3). Quality control on qPCR data included visual analyses of melt curves and default quality control settings in qBASEPlus. Calibrated and normalised relative quantities (CNRQ) and their associated standard error were exported from qBASEPlus and visualised using GraphPad software. K. Roper performed qPCR analyses on field adult samples and contributed larval samples.

4.3.6 In situ hybridisation I amplified a 704 bp fragment from the predicted coding sequence of AqPARa from cDNA using the primers (5’- CACAACACCGAGACTCCTCC -3’) and (5’- TCCACAGGGAGAGCGTCATA -3’). The fragment was cloned into pGEM TEasy (Promega) using the manufacturer’s protocol and verified by sequencing using M13F and M13R primers. Digoxygenin (DIG)-labelled antisense RNA probes were transcribed from polymerase chain reaction (PCR) products using DIG RNA Labelling Mix (Roche) and T7 or SP6 Polymerase (Promega) following the manufacturer’s instructions. I performed whole mount in situ hybridisation, as previously described (Larroux et al., 2008a). I observed whole-mount and sectioned samples under an Olympus SZX7 or a Nikon eclipse Ti microscope with Nomarski optics (Olympus Australia Pty Ltd, Mt Waverly, VIC, Australia) and photographed samples with a Nikon Sight DS-U1 camera (Nikon Australia Pty Ltd, Lidcombe, Australia).

4.4 Results

4.4.1 The A. queenslandica genome encodes several homologues of circadian genes Through similarity and phylogenetic analyses, we identified A. queenslandica orthologues to metazoan circadian genes. A. queenslandica has two PAR-bZIPs (Figure 4.2A): AqPARa is more closely related to PAR-bZIPs involved in circadian rhythms in other species, while AqPARb

96 belongs to the PAR-like subfamily of bZIPs, the role of which is yet to be investigated (Jindrich and Degnan, 2016).

From the bHLH-PAS family, A. queenslandica has seven orthologues, four of which have been reported elsewhere (Simionato et al., 2007). Notably, we found one orthologue to the core circadian bHLH transcription factor Clock, but no orthologue to Cycle/BMAL1 (Figure 4.2B). We did identify three distant orthologues to ARNT, based on high sequence similarity and an identical PAS domain. Two of these ÐAqARNT2 and AqARNT3 Ð share very high sequence similarity and are located close together on the same scaffold, suggesting a recent duplication event. Finally, A. queenslandica has one well-supported orthologue to NPAS1/3 as well as two genes most similar to NPAS1/3 and AHR (Figure 4.2B).

From the Timeless/timeout family, we identified three genes with high similarity to Timeout but no orthologues to Period in A. queenslandica (Figure 4.2C), consistent with that latter subfamily having originated in bilaterians (Brady et al., 2011; Reitzel et al., 2010). Completing the circadian genetic toolkit, A. queenslandica also has two documented cryptochromes (Rivera et al., 2012).

4.4.2 AqCRY2 and AqPARa spatial and temporal expression are correlated throughout A. queenslandica development AqCRY2 is expressed in the larval photo sensory organ, the pigment ring (Rivera et al., 2012). I performed in situ hybridisation to determine whether expression of AqPARa is consistent with involvement in photic or circadian processes. Early in A. queenslandica embryogenesis (white/cleavage stage), AqPARa is expressed in a subset of micromeres evenly distributed throughout the embryo (Figure 4.3A). After gastrulation, pigment cells combine at the posterior pole of the embryo to form the pigment spot. At this stage, AqPARa expression is strongest in cells around and beneath the spot and in a subset of cells in the inner cell mass (Figure 4.3B-D). As the pigment spot transforms into the pigment ring, AqPARa transcripts become enriched in cells around and beneath the ring. Expression at this stage is also detected at the anterior pole (Figure 4.3 E-H). After hatching, in the pre-competent larva, AqPARa expression appears in epithelial cells surrounding the larval body, including around the pigment ring (Figure 4.3 I-L). This expression patterns mimics closely the expression of AqCRY2 (Rivera et al., 2012). CEL-Seq data across A. queenslandica life cycle show that both genes are dynamically expressed in the same developmental pattern across embryonic, larval and metamorphic development (Figure 4.3 M).

97 4.4.3 Diel patterns of gene expression in field adults Analysis of gene expression from biopsies collected from sponges living in situ on the reef flat reveals that expression of both AqCRY2 and AqPARa increases towards dawn and peaks in the mid-morning and late morning, respectively. Transcript levels then steadily decrease through the afternoon and troughs during the night (Figure 4.4A). Due to safety concerns, sponges were not sampled in the field throughout the night. However, previous experiments showed low AqCRY2 and AqPARa transcript levels during the night (S. Lemon, unpublished data). Both genes showed slight but corresponding variations between individuals (Appendix 4.4). Both during summer and spring, the timing of AqCRY2 and AqPARa peak expression varies between 8 am to 2 pm, depending on individuals and day. In the summer however, we observed a second peak of smaller amplitude in the mid afternoon (Appendix 4.4).

Transcription of AqClock did not correlate with the day-night cycle. Adults displayed unsynchronised increases in expression, both during day and night (Figure 4.4A and Appendix 4.4). Of the other candidate circadian genes we identified, only a subset showed expression levels that allowed reliable qPCR to be performed (Appendix 4.5). However, expression of those genes did not show a consistent response to the diurnal light cycle (Appendix 4.6).

4.4.4 Light dependent gene expression in adult sponges To examine the light-dependence of AqCRY2 and AqPARa diurnal expression, I measured transcription in A. queenslandica maintained in aquaria and exposed to either light:dark (12 h:12 h) or constant dark. Patterns of gene expression under a light:dark cycle (natural or artificial) were consistent with patterns seen in biopsies collected from adult sponges in the wild, confirming that gene expression in sponges reared in aquaria reflects that of sponges in their natural environment (Figure 4.4). Expression of both genes increases dramatically after the onset of light and slowly decreases during the afternoon despite the light level being constant throughout the subjective day. Removal of light significantly decreases AqPARa transcript level; AqCRY2 mRNA abundance decreases to a lesser extent

In adults reared under constant darkness for 3 d, expression of AqCRY2 and AqPARa were both significantly lower than under natural light (Figures 4.4B ad 4.5). 24 h-period oscillations of significantly smaller amplitudes were observed for both genes in sponges kept in constant dark. AqCRy2 and AqPARa expression was generally lower during the subjective night than during the subjective day (Figures 4.4 and 4.5). In both genes, maximal expression was reached later than

98 under 12 h:12 h light cycle (Figure 4.4B). Prior experiments have indicated that oscillations of significant amplitude are sustained into the first day of darkness (S. Lemon, unpublished).

4.4.5 Light dependent gene expression in larvae Next, I examined whether similar cycles of transcription were observed in larvae. Specifically, gene expression levels were assessed from the point when they emerged from the maternal sponge (between 1 and 4 pm, depending on the release period). In larvae released between 1 and 2 pm, AqCRY2 and AqPARa transcript levels correlate with light intensity, following closely the pattern observed in adults. Gene expression starts declining from 4.30 pm and drops drastically after sunset at 6 pm (Figure 4.6; see Appendix 4.2 for light intensity report). Variation in transcription levels between individuals was not correlated with maternal origin or the age of the larvae (i.e. cohorts from different mothers and larvae of different age, defined as the hours post emergence from maternal sponge, show similar relative expression at any given time point (Appendix 4.7). However, larvae show the greatest variability in expression when they emerge from the maternal sponge, regardless of the release period; transcript levels become more similar as larvae age. Transcription of AqClock in larvae remains relatively constant, with very little variations between individuals (Figure 4.6 and Appendix 4.7).

To further investigate whether the expression of AqCRY2 and AqPARa in young larvae responds to environmental cues or is entrained by the brooding mother, I exposed larvae after release to constant light or constant dark. In larvae exposed to constant light, AqCRY2 expression gradually decreased throughout the subjective day and was relatively constant during the subjective night (Figure 4.6). During the subjective day, AqCRY2 and AqPARa expression was lower under constant light than under natural light; the light intensity I used (120 Lux) was low relative to light levels larvae could experience naturally (300-600 Lux at 2 pm under natural light; 120 Lux corresponds to the intensity of the natural light at ~5.30 pm; Appendix 4.2). Under constant dark, gene expression levels were significantly lower than those under the natural and constant light regimes but similar in expression to when larvae exposed to the natural light regime experienced darkness (Figure 4.6). Gene expression of AqCRY2 and AqPARa was significantly higher during the subjective day than during the subjective night (Figure 4.7).

I then monitored the transcriptional response to a sudden change of light treatment. Larvae were exposed to constant light for 5 h after their emergence from the brood chamber, and then transferred to constant dark. AqCRY2 and AqPARa expression responded to the change in light intensity within 30 and 90 min, respectively (Figure 4.7).

99 4.5 Discussion

Behavioural and molecular studies on cnidarians have proposed that the circadian gene network may be conserved in all eumetazoans (Reitzel et al., 2013) and references therein). Though gene orthologues may play different roles in different species, the molecular clock is built around two interconnected feedback loops, which modulate the transcription of a limited number of core genes (Figure 4.1). Here, I discuss molecular and behavioural evidence of a similar circadian network being in place in the marine sponge Amphimedon queenslandica.

4.5.1 Cryptochrome AqCRY2 appears to be light sensor or effector in the negative feedback loop Marine sponges display light-induced behaviours, despite lacking opsin-based eyes and nervous system (Leys et al., 2002; Rivera et al., 2012). Sponge larvae can detect different light wavelengths and intensities through a posterior ring-like structure of pigmented cells, which mediate larval phototactic swimming (Leys et al., 2002; Rivera et al., 2012). Rivera et al. (2012) established the involvement of AqCRY2, one of A. queenslandica two cryptochromes, in the sponge ability to respond to light but could not determine whether AqCRY2 was an effector or a cue activating downstream targets responsible for the light response. Cryptochromes belong to a large family of conserved flavoproteins involved in photoreception and the entrainment of circadian rhythms. The exact function of AqCRY2 in the sponge is currently unknown (Rivera et al., 2012). Here, we provide evidence for a possible role of AqCRY2 in circadian entrainment.

The two cryptochrome genes in A. queenslandica originated from gene duplication within the sponge lineage; they are type I cryptochromes but appear to have different functions: AqCRY2 is expressed in the sponge photosensory organ and AqCRY2 protein absorbs light; AqCRY1 display a complementary spatial expression pattern and does not appear to be light responsive (Rivera et al., 2012). Consistent with these findings, transcription of AqCRY1 did not correlate with light exposure in either developmental stage, whereas AqCRY2 showed a strong diurnal pattern, both in larvae and adults. Since I have no indication that the same regulatory mechanisms are in place in these two developmental stages, I will first discuss them separately.

In adults, AqCRY2 transcription seems to be entrained by a burst of light and regulated by a mechanism independent of the light response, since transcription decreases before the removal of the light cue. Whether AqCRY2 transcription levels reflect its protein level or activity is unknown. In Drosophila for instance, mRNA cycling does not control CRY protein activity, as the protein is only stable in the dark (Ashmore and Sehgal, 2003). Since AqCRY2 is not degraded by the light 100 (Rivera et al., 2012), a simpler regulatory mechanism could be in place in A. queenslandica, where AqCRY2 indirectly regulates its own expression, possibly through a transcriptional repressor or via activation of the degradation of its mRNA. Adult sponges reared under constant dark for 3 days showed some maintenance of differential expression during periods of subjective light and dark, despite all other environmental parameters being constant. Though the amplitude of these oscillations appears negligible compared to those under natural light, it supports a mode of regulation beyond the light response.

In larvae, different timing of removal of the light cue generates a similar transcriptional response. Light-entrained larvae transferred to constant dark at 7.30 pm show a sudden drop in AqCRY2 transcription within 1.5 h, which mimics that observed at nightfall in larvae reared under natural light. A similar timing is observed in larvae reared under constant dark, after their exposure to a burst of light while they emerged from the brood chamber at 1-2 pm. Such response to light removal, regardless of the time of the day, suggests that AqCRY2 transcription in larvae is largely light responsive rather than driven by an endogenous signal. However, AqCRY2 is also differentially expressed between subjective day and night in larvae exposed to constant light, which could indicate the existence of an additional regulation, independent of the strong regulation of AqCRY2 by direct absorption of light.

In other animals, cryptochromes are part of the negative feedback loop, with a species- dependent combination of genes from the period and timeless family (Reitzel et al., 2013). Together, they inhibit the molecular clock and thereby their own expression. A. queenslandica does not have a Period orthologue. Since several cnidarians also lack Period genes (Brady et al., 2011; Reitzel et al., 2010; Shoguchi et al., 2013), it most likely was not a component of the ancestral animal circadian clock. A. queenslandica does however have several timeless domain containing genes. In bilaterians, the timeless family comprises two paralogues, timeless1 and timeless2/timeout, seemingly resulting from a duplication of the timeout gene after the divergence of the cnidarians (Reitzel et al., 2010; Rubin et al., 2006). Accordingly, we identified two reliable gene models containing a timeless domain, both more closely related to timeout. Since A. queenslandica TO genes cluster together to the exclusion of other sponge and cnidarian TO orthologues and only AqTO1 is significantly expressed during A. queenslandica life cycle, I propose that A. queenslandica TO genes originated from a species-specific duplication. In cnidarians, timeout orthologues behave differently in different species. In the sea anemone Nematostella vectensis timeout expression is not light responsive (Reitzel et al., 2010) whereas the coral Acropora millepora timeout displays a diel expression, which is lost under constant dark (Brady et

101 al., 2011; Brady et al., 2016). In A. queenslandica, expression of AqTO1 was not consistently correlated with the day-night cycle. In both cnidarians and A. queenslandica, the involvement of timeout genes in circadian regulation is currently unclear.

4.5.2 A. queenslandica has orthologues to the positive elements of bilaterian circadian regulatory loops In bilaterians and cnidarians, the negative feedback loop paces the circadian clock by inhibiting the Clock:Cycle heterodimer. Clock and cycle are two basic helix-loop-helix (bHLH) transcription factors containing a PAS domain, and appear to be the core components in all bilaterian clocks (Bell-Pedersen et al., 2005; Harmer et al., 2001; Reitzel et al., 2013). The expression of one of these transcription factors oscillates with a 24 h periodicity, whereas the other shows little to no oscillation. Which gene oscillates varies between species (Bell-Pedersen et al., 2005; Harmer et al., 2001).

A. queenslandica genome harbors one Clock gene, which transcription remained stable across the day-night cycle in both developmental stages. We confirmed the absence of a Cycle orthologue in A. queenslandica (Simionato et al., 2007) and identified three ARNT orthologues, most likely resulting from a recent duplication along the sponge lineage of a single ARNT/Cycle- like gene. Only AqARNT1 is significantly expressed during A. queenslandica life cycle, but shows asynchronous and inconsistent response to the natural light cycle.

Orthologues to Clock and Cycle have been identified in several cnidarians (Brady et al., 2011; Hoadley et al., 2011; Reitzel et al., 2010) and are able to dimerize in at least two species, the sea anemone (Reitzel et al., 2010) and a coral (Shemesh et al., unpublished). Since preliminary evidence suggests that Cycle may not be present in other sponge species (K. Jindrich, unpublished), Cycle could have been lost in the sponge lineage or not part of the ancestral metazoan circadian regulatory network. This suggests one of three scenarios: (1) the sponge molecular clock is organised around the same building blocks as bilaterian and putatively cnidarian clocks, and another transcription factor is substituting for Cycle. Since AqClock did not show differential expression between day and night, the expression of this other putative factor would oscillate; (2) AqClock is not involved in A. queenslandica molecular clock, and this portion of the diurnal clock bears no resemblance to that of other metazoans; (3) AqClock is involved in A. queenslandica molecular clock but its function does not depend on transcriptional oscillations. Several studies have suggested that protein oscillation does not always require mRNA cycling (Harms et al., 2004 and references therein). Our data only reflect the changes in transcript levels in whole animals and

102 thus are inherently limited. Protein expression profiles and AqClock cellular localisation may provide further insights on the involvement of AqClock in the sponge putative clock.

A. queenslandica bHLH repertoire also comprises two NPAS-like and one AHR-like genes. However, all three gene models sit closely on a single genomic contig and only one of them - AqNPAS-like1 - clusters with bilaterian bHLH genes (specifically, NPAS1/3, HIF and SIM). This may indicate a recent duplication event, either within A. queenslandica or the sponge lineage evolution, of an ancestral NPAS gene. The involvement of NPAS1/3, AHR or HIF genes in circadian processes has not been extensively studied in other animals.

4.5.3 Putative conserved role of AqPARa in the circadian positive feedback loop The positive feedback loop comprises transcription factors from the nuclear receptors (NR) and PAR-bZIP families and balances the negative loop by modulating the transcription of either Clock or Cycle. While the organisation of the loop is conserved between mammals and insects, different genes performed the key roles in different species. In mouse, nuclear receptors ROR and Rev activate and repress, respectively, the expression of Cycle (BMAL1), which in turn drives their transcription (Reppert and Weaver, 2002). PAR-bZIPs are indirectly involved in the regulation of Clock and Cycle (Mitsui et al., 2001; Yamaguchi et al., 2000) but act primarily as first order clock control genes, mediating the circadian info to downstream targets. In the fly Drosophila melanogaster, the antagonistic regulation of Clock is performed by two PAR bZIPs, which activate and repress, respectively, the transcription of DmClock through competitive binding to V/P/D box (Cyran et al., 2003). Cnidarians do not have Rev/ROR orthologues; the expression of two PAR- bZIPs oscillate in opposite fashion in response to a light:dark cycle, and appears to regulate the expression of NvClock (Reitzel et al., 2010; Reitzel et al., 2013). Oscillations are lost under constant dark conditions (Reitzel et al., 2013).

A. queenslandica only has two nuclear receptors, none of which belongs to the ROR/Rev subfamily (Bridgham et al., 2010). On the other hand, A. queenslandica possesses two PAR-bZIPs of which AqPARa is a direct orthologue to the bilaterian circadian PAR-bZIPs and shows entrained diel patterns of gene expression. AqPARb is thought to belong to another subfamily of PARbZIPs and its expression does not suggest an involvement in circadian or photic processes (Jindrich and Degnan, 2016; this study).

The timing of AqPARa peak expression is highly correlated to that of A. queenslandica cryptochrome AqCRY2, both throughout development and across the day-night cycle. Both genes

103 show rhythmic expression in periods of light, or subjective day when the light cue is absent (see discussion on AqCRY2 transcription regulation above). AqCRY2 and AqPARa are also spatially co- expressed during development, in pigment cells located in the sponge ‘eye’. The co-expression of both genes suggests that their regulation is connected and since there is no phase delay, it is unlikely that one directly drives the transcription of the other. A possible scenario would be that transcription of both genes is controlled via feedback from their own protein products, possibly through a repressor degraded by light. One of the A. queenslandica cryptochromes could also act as an inhibitor, following a light-induced conformation change as observed in Drosophila for instance (Lin et al., 2001). Our data only reflects changes at the RNA level and protein expression profiles may have a different phase and pattern (Yu and Hardin, 2006). Post-translational modifications control the timely subcellular localisation of clock proteins (Harms et al., 2004) but was not considered in this study.

4.5.4 A circadian clock or a free-running circadian clock? One of the defining features of the classical circadian clock is to be free-running (Bell- Pedersen et al., 2005). Mammal and insect clocks maintain cyclic outputs for extended periods of time after the removal of the entraining cue. In cnidarians however, current data suggest that light- entrained gene oscillation patterns and rhythmic behaviour dissipate within one to three days, when animals are exposed to constant darkness (Brady et al., 2011; Hoadley et al., 2011; Reitzel et al., 2010). In A. queenslandica, I observed some maintenance of gene oscillations of small amplitude after three days under constant light conditions. This could result from an unidentified endogenous pacemaker or from a failure to reset the clock. For instance, if the light-dependent degradation of an inhibitor is resetting the proposed clock, remnants of oscillations could be carried on from previous days even under constant darkness. Since the maintenance of transcriptional oscillations is driven by the central nervous system in eumetazoans (Emery et al., 2000; Reppert and Weaver, 2002), the maintenance of oscillations in A. queenslandica, which lacks a nervous system, would require another, yet to be defined, mechanism.

104 Figures

Figure 4.1. Circadian gene network in bilaterians Diagrams of the simplified gene networks composing the human and fly (D. melanogaster) circadian clock. The conserved bilaterian molecular clock is composed of two interconnected regulatory loops that modulate the expression of two bHLH transcription factors -Clock and Cycle-. Clock and Cycle dimerize and act as positive elements by upregulating transcription of clock-controlled genes, including their own regulators.

Figure 4.2. Phylogenetic analysis of A. queenslandica orthologues to circadian genes. (See next page) Maximum likelihood analysis of (A) PAR-bZIPs, (B) bHLH-PAS and (C) Timeless/Timeout genes from A. queenslandica (in red) and representatives of the main metazoan lineages. Statistical support is indicated on each branch (1,000 bootstraps).

105

Figure 4.2. Phylogenetic analysis of A. queenslandica orthologues to circadian genes. (legend on previous page)

106

Figure 4.3. Developmental expression of AqPARa. (legend over page)

107 Figure 4.3. Developmental expression of AqPARa. Localisation of AqPARa transcripts by whole-mount in situ hybridisation. (A, B, E, H) Whole-mounts; (C, G, I, K) cleared whole mounts; (D, F, J) sections; (B) posterior pole facing up; (C-K) posterior pole to right. (A) During cleavage, AqPARa is expressed throughout the embryo. (B-D) At the spot and early ring stages, AqPARa expression is strongest in cells around and beneath the spot, and in a subset of cells in the inner cell mass. (E-H) As the ring develops, AqPARa transcripts remain enriched in cells around and beneath the ring, although expression is also detected at the anterior pole. Higher magnifications of the embryo in F show the strong expression of AqPARa around the ring. (I-K) As the embryo develops into a swimming larva, AqPARa expression appears in sub-epithelial cells surrounding the larval body. Larva sectioned through the middle (J) of the ring or the edge (K) of the ring. (L) In older larvae (10 h post-emergence), AqPARa expression becomes more evident in sub-epithelial cells. (M) CEL-Seq expression profiles of AqPARa and AqCRY2 during A. queenslandica development.

108

Figure 4.4. Temporal gene expression of putative circadian genes AqCRY2, AqPARa and AqClock in adults. (A) Three adult sponges were sampled in the field throughout 5 days. AqPARa and AqCRY2 expression correlates with periods of daylight. (B) Adults sponges were subjected to a 12:12 light:dark treatment (green line) or constant darkness (black line) and sampled every 3 hours over 2 days (n=3). Asterisks indicate significant differences among treatments (**< 10-2; ***<10-3, ****<10-4) and error bars are +/- sem. Shaded area indicates a period of subjective night.

109

Figure 4.5. Residual oscillations in AqCRY2 and AqPARa expression in adults under constant dark regime. Expression levels at time points corresponding to the subjective day (7 am to 7 pm) and night (7 pm to 7 am) were averaged into “Day” and “Night”, respectively. Other environmental inputs (e.g. temperature) were maintained constant. Under both natural light (green) and constant darkness (black) regimes, average expression levels of AqCRY2 and AqPARa in adults were lower during the subjective night, relative to the subjective day.

110

Figure 4.6. Temporal gene expression of putative circadian genes AqCRY2, AqPARa and AqClock in larvae. Larvae of different maternal origin were reared separately from each other and subjected to three light regimes: constant light (orange), constant darkness (black) or natural light (green). Time points were 30 min to 1 h apart and n=6-10. Error bars are +/- sem. Shaded area indicates a period of subjective night. A light grey shading indicates the hours before sunset, when the light intensity had already significantly decreased (Appendix 4.2)

111

Figure 4.7. Residual oscillations in AqCRY2 and AqPARa expression in larvae under constant light conditions. (A) Expression levels at time points corresponding to the subjective day (3 to 6 pm) and night (7 to 11 pm) were averaged into “Day” and “Night”, respectively. Other environmental inputs (e.g. temperature) were maintained constant. Under all light regimes (natural light in green; constant darkness in black; constant light in orange), average expression levels of AqCRY2 and AqPARa in larvae were lower during the subjective night, relative to the subjective day. (B) Temporal gene expression of AqCRY2 and AqPARa in larvae reared under constant light until 7.30 pm and then under constant dark (red line). Gene expression in larvae reared under constant dark (black line) and constant light (orange line) are also plotted as reference. Asterisks indicate significant difference between treatments. Gene expression matches the light condition after 30 min and 1.5 h for AqCRY2 and AqPARa, respectively. Error bars are +/- sem. Asterisks indicate significant difference between subjective day and subjective night (** p< 10-3; *** p < 10-5). 112 CHAPTER 5 Ð AQATF4 DNA-BINDING IN RELATION TO THE HISTONE MODIFICATION LANDSCAPE IN THE AMPHIMEDON QUEENSLANDICA LARVA

5.1 Abstract

Sequence-specific transcription factors regulate transcription by controlling the recruitment of the transcriptional machinery to the promoter of their target genes. In bilaterians, this is achieved in collaboration with co-regulators that remodel the chromatin by covalently modifying histone tails, and thus promote Ð or restrict Ð access to DNA. It is now well established that all animals share an extensive repertoire of conserved transcription factors, including several members of the basic leucine zipper (bZIP) family, but the role of chromatin architecture in gene regulation in non- eumetazoan animals (sponges and ctenophores) is currently unknown. In this chapter, I explore the relationship between the binding of a bZIP transcription factor Ð AqATF4 Ð to DNA and the histone post-translational modification landscape in the larva of a representative of one of the first- branching animals, the demosponge Amphimedon queenslandica. Using chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq), I provide genome-wide maps of four modifications of the histone H3 tail (H3K4me3, H3K4me1, H3K27ac and H3K27me3), and define distinct chromatin states akin to those found in bilaterians. Then, I show that ATF4 occupancy in this sponge is associated with active distal cis-regulatory sites that present similar characteristics to bilaterian enhancers. Finally, I discuss the putative role of ATF4 in the regulation of cellular processes fundamental to the functioning of the mature A. queenslandica larvae. Collectively, I infer that the interplay between transcription factor networks, including the bZIPs, and dynamic chromatin states was likely part of the complex transcriptional regulatory mechanisms present in the last common ancestor of all animals.

5.2 Introduction

Cells are continuously engaged in the process of decision-making, in which, in response to changes in extracellular conditions, a cell must decide whether to proliferate, differentiate or die. In metazoans, coordination of cellular processes is governed by dynamic and cell-specific patterns of gene expression (Peter and Davidson, 2015). Canonically, gene expression regulation depends largely on the binding of transcription factors to specific sequences of DNA. These transcription factors modulate transcription by promoting or inhibiting the recruitment of RNA polymerase to gene promoters (Ptashne and Gann, 2002; Weirauch and Hughes, 2011). The expanding field of epigenetic research suggests that covalent modifications of histone proteins are another critical factor in the regulation of gene expression (Cheng et al., 2012; Turner, 2007). Specifically,

113 transcription factors access to DNA rely on the remodelling of the chromatin structure by post- translational modifications (PTMs) of histones, which are in turn carried out by histone modifier proteins recruited by transcription factors (Cheng et al., 2012; Li et al., 2007).

The association of various histone PTMs with transcriptional output has been well characterised in mammals (Consortium, 2012; Roadmap Epigenomics et al., 2015; Turner, 2007; Zhou et al., 2011). For instance, trimethylation of lysine residues at positions 4 and 27 of the histone H3 tail (H3K4me3 and H3K27me3) are associated with expressed genes and silenced genomic regions, respectively. Other PTMs correlate with the activity of regulatory features, such as cis-regulatory elements, which are non-coding regions that contains multiple transcription factor binding sites (Davidson, 2006). Specifically, modifications of the histone H3 tail, such as monomethylation of H3K4 residues (H3K4me1) and acetylation of H3K27 residues (H3K27ac) mark active distal cis- regulatory elements (enhancers).

The conservation of such a complex regulatory landscape beyond bilaterians has only recently been investigated. One of the closest unicellular relatives of animals, Capsaspora, lacks chromatin repressive marks and distal enhancers (Sebe-Pedros et al., 2016). The combination of chromatin regulatory patterns in cnidarians appears, however, to be as complex as that of bilaterians (Schwaiger et al., 2014). Amphimedon queenslandica possesses a toolkit of transcriptional regulators comparable to that of bilaterians (Adamska et al., 2007a; Adamska et al., 2010; Conaco et al., 2012; Degnan et al., 2015; Gaiti et al., 2015; Larroux et al., 2006; Nakanishi et al., 2014; Srivastava et al., 2010) but little is known about chromatin states and dynamics. Given the phylogenetic position of sponges as sister group to the eumetazoans and the relative simplicity of their body plan (they lack a true gut, nerves and muscles (Simpson, 1984), A. queenslandica is an ideal model to investigate the origin of complex gene regulation. In other words, is the complex regulatory architecture found in bilaterians and cnidarians a eumetazoan innovation and associated with complex body plans or did it evolve earlier with the evolution of animal multicellularity?

A. queenslandica chromatin architecture and its implications on the evolution of multicellular animals has recently been described by F. Gaiti (PhD thesis under review) and Gaiti, Jindrich, Fernandez-Valverde, Degnan and Tanurdžić (under review). In this chapter, I explore transcriptional regulation in A. queenslandica at two levels: through the binding of transcription factors and through dynamic changes in histone modifications. Using chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) (Robertson et al., 2007) in A.

114 queenslandica mature larvae, I profile four PTMs of the histone H3 tail (H3K4me3, H3K4me1, H3K27ac, H3K27me3) and one of the most highly expressed bZIP transcription factors, AqATF4.

5.2.1 ATF4 regulatory role The ATF4 bZIP family is metazoan specific (Jindrich and Degnan, 2016). In bilaterians, ATF4 expression is induced by a variety of stress stimuli, including, but not limited to, endoplasmic reticulum (ER) stress, oxidative stress and viral infection (reviewed in Ron, 2002; Rutkowski and Kaufman, 2003). One of ATF4’s best-characterised roles is in the regulation of cellular redox state in response to ER stress. Through the PERK-ATF4 branch of the unfolded protein pathway (UPR), ATF4 targets genes involved in antioxidant metabolism and amino acid synthesis and transport, putatively to help the cells cope with oxidative stress (Han et al., 2013; Tsai and Weissman, 2010). When the UPR fails to restore homeostasis, ATF4 initiates one of the main ER stress-induced cell death mechanisms, through activation and dimerization with CHOP, a human CEBP-like bZIP (Han et al., 2013; Tsai and Weissman, 2010).

ATF4 has also been involved in the regulation of developmental processes, including eye development in mice and larva metamorphosis in Drosophila (Hai and Hartman, 2001). However, its particular function in these contexts is still unclear.

ATF4 can act as an activator or a repressor depending on the promoter and cellular context (Ameri and Harris, 2008; Hai and Hartman, 2001). The ATF4 canonical binding sites in vertebrates are TGACTCA, the so-called cAMP-response element (CRE) site (Hai and Hartman, 2001; Lee et al., 2015) and TGATGCAA, a hybrid binding site comprising a half CRE (TGA) and a half CEBP- like sites (Jolma et al., 2013). However, mammalian ATF4 forms heterodimers in vivo with a range of bZIPs (reviewed in Hai and Hartman, 2001), which bind a range of hybrid binding sites such as the antioxidant response element (ARE, TGACxxxGC) Ð a composite between CRE and NFE2 recognition sites(Ameri and Harris, 2008). ATF4-bZIPs in other metazoans also form numerous bZIP dimers in vitro with unknown binding specificity (Reinke et al., 2013). A. queenslandica AqATF4 is remarkably similar to its metazoan orthologues and notably bears an almost identical DNA-contacting region to the human ATF4 (Appendix 5.1) (Jindrich and Degnan, 2016). AqATF4 predicted dimerization partners are AqPARa and AqCEBP (Chapter 3).

In this chapter, I show the association of combinatorial chromatin signatures with different genomic features and expression patterns in A. queenslandica. Then, I use an anti-AqATF4 antibody to determine AqATF4 binding sites using ChIP-Seq, and show that AqATF4 binding

115 regions correlated with regions of open chromatin. Collectively, these results suggest that the interplay between chromatin modification dynamics and transcription factor networks was already in place in the animal last common ancestor.

5.3 Material and methods

5.3.1 Animal collection Amphimedon queenslandica larvae were released and collected from Heron Island Reef, Great Barrier Reef, Queensland, Australia, as previously described (Leys et al., 2008). Specifically, larvae were collected between noon and 2 pm from a pool of 10-12 adult sponges and reared in tanks for 10 hours before being processed for ChIP-Seq, as described below. Given the number of individual larvae required to perform successful ChIP-Seq experiments, harvesting larvae from only one maternal adult was not possible. This time point corresponds to larvae that are competent to settle and undergo metamorphosis (10-12 hours post emergence from the brood chamber) (Degnan and Degnan, 2010).

5.3.2 Antibodies I used a mouse monoclonal antibody against the unphosphorylated C-terminal repeat of RNA polymerase II (clone 8WG16, #05-952, Merck Millipore, Billerica, MA), a rabbit polyclonal antibody against H3K4me3 (#07-473, Merck Millipore), a rabbit polyclonal antibody against H3K27me3 (#07-449, Merck Millipore), a mouse monoclonal antibody against H3K4me1 (#17- 676, Merck Millipore), a rabbit polyclonal antibody against H3K27ac (#07-360, Merck Millipore), a rabbit monoclonal antibody against H3K36me3 (#17-10032, Merck Millipore) (Appendix 5.2). The entire amino acid sequence of histone H3 is perfectly conserved between A. queenslandica and other eukaryotes where these antibodies have been used successfully (Barraza et al., 2015; Eckalbar et al., 2016; Harmeyer et al., 2015; Liu et al., 2007; Sebe-Pedros et al., 2016) (Appendix 5.3). I also used a custom-made affinity-purified rabbit polyclonal antibody, raised by GenScript Biotechnology (NJ, USA), against a peptide located outside the bZIP domain of AqATF4 (CTPKRTPQGRGGRKS).

5.3.3 Chromatin Immunoprecipitation (ChIP) assays The ChIP protocol was adapted from Gaiti et al. (under review). Approximately 350 larvae were pooled, homogenised and crosslinked in 2% formaldehyde for 5 min at room temperature (RT). Crosslinking was quenched with 125 mM glycine for 5 min at RT. Cells were washed twice in filtered seawater (0.22 µm) and centrifuged at 500 x g for 5 min. Pelleted cells were lysed in SDS

116 Lysis buffer (10 mM EDTA, 50 mM Tris-HCl at pH 8.0, 1% SDS, plus cOmpleteTM Mini Protease Inhibitor Cocktail as per the manufacturer’s recommendation), incubated for at least 10 min on ice, and sonicated for 24 min (24 cycles, each one 30 sec “ON”, 30 sec “OFF”) in a Bioruptor Sonicator (Diagenode, Seraing, Belgium) to generate 200-300 bp fragments. Optimal sonication conditions were previously set up by testing a range of sonication cycles (from 6 to 30), determining that 24 cycles were optimal. Non-soluble material was removed from the lysate by centrifugation at 12,000 x g for 10 min at 4¡C. An aliquot of the soluble material was removed for input DNA and stored at - 20¡C. To reduce the SDS concentration to 0.1%, the remaining soluble material was diluted 10 fold in ChIP dilution buffer (1.2 mM EDTA, 16.7 mM Tris-HCl at pH 8.0, 167 mM NaCl, 1.1% Triton X-100, 0.01% SDS, plus cOmpleteTM Mini Protease Inhibitor Cocktail as per the manufacturer’s recommendation). To reduce non-specific background, the diluted soluble material was pre-cleared with Dynabeads protein G beads (#10003D, ThermoFisher, Waltham, MA), and, at the same time, the antibodies were linked to Dynabeads protein G beads (#10003D, ThermoFisher), rotating for one hour at 4¡C. At this point, the pre-cleared diluted soluble material was incubated with the antibody-bead mixtures, rotating at 4¡C overnight. Immunoprecipitated material was washed three times with Low Salt Wash Buffer (2 mM EDTA, 20 mM Tris-HCl at pH 8.0, 150 mM NaCl, 1% Triton X-100, 0.1% SDS), three times with High Salt Wash Buffer (2 mM EDTA, 20 mM Tris-HCl at pH 8.0, 500 mM NaCl, 1% Triton X-100, 0.1% SDS), three times with LiCl Wash Buffer (1 mM EDTA, 1 mM Tris-HCl at pH 8.0, 1% DOC (deoxycholic acid), 1% NP-40, 250 mM LiCl), and three times with TE buffer. DNA complexes were eluted for 30 min at 65¡C with TE-SDS (1X) and de-crosslinked overnight at 65¡C, along with input DNA, with the addition of 125 mM NaCl. De- crosslinked DNA complexes and input DNA were treated with 1 µg/µL RNaseA for 1 hour, and subsequently with 0.1 µg/µL proteinase K for 1 hour. Finally, immunoprecipitated and input DNA were purified with phenol:chloroform:isoamyl extraction (25:24:1), recovered by precipitation with ethanol in the presence of 300 mM NaOAc pH 5.2 and 2 µl of glycogen carrier (10 mg/ml), and resuspended in UltraPureª DNase/RNase-Free Distilled Water (ThermoFisher) for later use. Libraries of immunoprecipitated DNA and input DNA were prepared using the NEBNext ChIP-Seq Library Prep Master Mix Set for Illumina (#E6240, New England Biolabs, Ipswich, MA) according to the manufacturer’s protocol. The quality and profile of the libraries was analyzed using Agilent High Sensitivity DNA Kit (#5067-4626, Agilent, Santa Clara, CA) and quantified using KAPA Library Quantification Kits (#KK4824, Kapa Biosystems, Wilmington, MA). Deep sequencing (paired-end 40bp) of the libraries Ð H3K4me3, H3K4me1, H3K27me3, H3K27ac, H3K36me3, RNAPII, AqATF4 and input DNA Ð was performed by the Central Analytical Research facility (CARF), Brisbane, Queensland, Australia, on Illumina NextSeq500 instrument (Illumina, San Diego, CA, United States).

117 5.3.4 ChIP-Seq data analyses Raw Illumina sequencing reads were checked using FastQC v0.52 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adapter contamination prior to read quality filtering was removed using Cutadapt (Martin, 2011). Reads were then quality filtered using Trimmomatic v1.0.0 (SLIDINGWINDOW: 4:15, LEADING: 3, TRAILING: 3, HEADCROP: 3, MINLEN: 20) (Bolger et al., 2014). Quality filtered paired-end Illumina sequencing reads were then aligned to the Amphimedon genome (Srivastava et al., 2010) using Bowtie v1.1.2 (-m 1, -n 2, - X 500, --best) (Langmead et al., 2009). Non-aligned reads were removed using SAMtools v0.1.19 (Li et al., 2009). For all the ChIP-Seq data sets, strand cross-correlation measures were used to estimate signal-to-noise ratios using SPP v1.11.0, as described in the modENCODE and ENCODE guidelines (Consortium, 2012; Kellis et al., 2014; Kharchenko et al., 2008; Landt et al., 2012). ChIP-Seq data sets for each mark were flagged if the scores were below a normalised strand cross- correlation coefficient (NSC) threshold of 1.05. These analyses were performed on Galaxy-qld server (galaxy-qld.genome.edu.au) developed within the GVL project (Afgan et al., 2016; Afgan et al., 2015) and maintained by the Research Computing Centre, University of Queensland, Australia. Datasets description and statistics are listed in Appendix 5.4.

Histone PTM and RNAPII regions of enrichment relative to sequenced input DNA controls were determined using MACS2 v2.1.0 (Zhang et al., 2008) according to ENCODE and Roadmap Epigenomics consortium guidelines (Consortium, 2012; Kellis et al., 2014; Landt et al., 2012). Specifically, MACS2 was used to identify broad domains that pass a broad-peak Poisson P-value of 0.1 and gapped peaks, which are broad domains (P <0.1) that include at least one narrow peak (-q 0.05, --broad, --nomodel, --extsize 146, -g 1.45e8). I used the gappedPeak representation for the histone PTMs with relatively compact enrichment patterns, including H3K4me3, H3K27ac and H3K4me1. For the diffused histone PTMs Ð H3K36me3 and H3K27me3 Ð I used the broadPeak representation. AqATF4 peaks were detected using MACS2 and a p-value threshold of 1e-05 (-p 0.00001, --call-summits, --nomodel, --extsize 146, -g 1.45e8). Choice of peak calling algorithm and parameters were based on published similar work (Cohen et al., 2015; Han et al., 2013; Huggins et al., 2016; Li et al., 2011; Sebe-Pedros et al., 2016). AqATF4 peaks were also detected using the Peakzilla software (Bardet et al., 2013) relative to input DNA (-c 2 Ðs 3).

For every pair of aligned ChIP and matching input DNA data sets, I also used MACS2 (Zhang et al., 2008) to generate genome-wide signal coverage tracks for every position in the Amphimedon genome (Srivastava et al., 2010). The two types of signal score statistics computed per base are as follows: (i) Fold-enrichment ratio of ChIP-Seq counts relative to expected background

118 counts λlocal and (ii) Negative log10 of the Poisson P-value of ChIP-Seq counts relative to expected background counts λlocal.

5.3.5 Histone H3 PTMs tracks and peaks analyses Chromatin states across the genome were defined using ChromHMM v1.10 (Ernst and Kellis, 2012) with default parameters. For each H3 PTMs ChIP-Seq data set, read counts were computed in non-overlapping 200 bp bins across the Amphimedon genome (Srivastava et al., 2010). ChromHMM was tested with different a priori defined states (from 5 to 15) and a 9-state model was chosen as the best model to cover all possible genomic locations (promoter, enhancer, gene body) that I expected to resolve given the selection of histone PTMs used. To assign biologically meaningful mnemonics to the states, ChromHMM was used to compute the overlap and neighbourhood enrichments of each state relative to various types of functional annotations. The functional annotations used were as follows: (1) CpG islands obtained using hidden Markov models as described in Wu et al. (Wu et al., 2010); (2) exons, genes, introns, transcription start sites (TSSs) and transcription end sites (TESs), 200 bp windows around TSSs and 200 bp windows around TESs based on Aqu2.1 and lincRNA gene model annotations (Fernandez-Valverde et al., 2015; Gaiti et al., 2015); (3) expressed and repressed genes, their TSSs and TESs. Genes were classified into expressed and repressed class based on their CEL-seq expression levels in the mature larva (Anavy et al., 2014; Hashimshony et al., 2012; Levin et al., 2016).

Regions of enrichment of the various histone PTMs and RNAPII were overlapped with protein-coding genes and lincRNAs; the Bioconductor R package GeneOverlap v1.14.0 (https://www.bioconductor.org/packages/release/bioc/html/GeneOverlap.html) was used to test and visualise their association with lists of various gene expression groups (R Development Core team). Protein-coding genes and lincRNAs were classified into ‘high’, ‘mid’, ‘low’ and ‘non- expressed’ based on their CEL-seq expression levels in the mature larva (Anavy et al., 2014; Hashimshony et al., 2012; Levin et al., 2016).

Enhancer elements were predicted as reliable H3K4me1 regions of enrichment, which did not overlap TSSs (no intersection with 200 bp upstream or 200 bp downstream of the TSSs of protein-coding genes and lncRNAs), but overlapped with regions designated as being in an enhancer chromatin state (‘TxEnhA1’ or ‘TxEnhA2’ or ‘EnhWk’ or ‘EnhP’ state) based on the ChromHMM analysis. The activated enhancer elements were predicted intersecting enhancer elements with H3K27ac significant peaks, requiring a 50% minimal overlap fraction. BEDTools v2.23.0 (Quinlan and Hall, 2010) was used to calculate overlaps between regions of enrichment and 119 chromatin states with the different genomic features, as well as to identify the nearest TSS for each of the activated enhancer elements.

5.3.6 AqATF4 tracks and peaks analyses Regions of enrichment of AqATF4 were overlapped with the following A. queenslandica genomic regions, defined as above: exons; introns; 200 bp windows around TSSs; 200 bp windows around TESs; and 200 to 1000 bp upstream of TSSs. Genome-wide coverage of these regions was determined using a custom Perl script (Appendix 5.5).

GeneOverlap was used to test the association of AqATF4 regions of enrichment with several gene expression groups (“high”, “mid”, “low”, “non-expressed”), as described above. BEDTools (Quinlan and Hall, 2010) was used to compute the overlap between AqATF4 peaks and each of the five H3 PTMs peaks and predicted enhancer-like elements, requiring a 50% minimal overlap fraction. A set consisting of randomly selected regions matching AqATF4 peaks features (number of peaks and peak length) was used as control (1,000 iterations).

5.3.7 De novo motif analyses De novo motif enrichment analyses in H3 PTMs and AqATF4 peaks were performed using MEME-ChIP against JASPAR CORE and UniPROBE Mouse databases (-meme-minw 6, -meme- maxw 15, -meme-nmotifs 20, -dreme-e 0.05, -meme-mod zoops) and against all vertebrate DNA motifs (-meme-minw 4, -meme-maxw 30, -meme-nmotifs 10, -dreme-e 0.05, -meme-mod zoops), respectively (Machanick and Bailey, 2011). Each motif was renamed according to their most similar motif in the TOMTOM database or literature, if any. Motif mining in AqATF4 peaks was also performed with HOMER (Heinz et al., 2010) using the Ðsize given option.

5.3.8 Functional enrichment analyses Gene Ontology (GO) functional enrichment analyses were performed using the Cytoscape plugin BiNGO (Maere et al., 2005; Shannon et al., 2003) with custom annotation (Fernandez- Valverde, unpublished) and a FDR adjusted P-value cut-off of 0.01.

5.3.9 Read coverage profiles Transcription Start Site (TSS) input DNA-normalised coverage profiles and heat maps were calculated using ngs.plot v2.61 (Shen et al., 2014). Only lincRNAs found in scaffolds larger than 10 kb were used for all the analyses and, given the compact genome of Amphimedon (Fernandez- Valverde and Degnan, 2016), all the TSS profile analyses were restricted to non-overlapping genes

120 with an intergenic distance >1 kb that were found in scaffolds larger than 10 kb. All genome browser figures were generated using a local instance of the UCSC genome browser (Kuhn et al., 2013).

5.3.10 AqATF4 expression profile and patterns Analysis of AqATF4 profiles and pattern of expression were performed as described in Chapter 3 (3.1.4, 3.1.8 and 3.1.9).

5.3.11 Data acesss ChIP-seq data sets have been deposited to the NCBI Gene Expression Omnibus (GEO) (Edgar et al., 2002) under accession number GSE79645. CEL-seq data sets can be downloaded from NCBI GEO (GSE54364) (Anavy et al., 2014). The Aqu2.1 gene model was used for all analyses in this study (http://amphimedon.qcloud.qcif.edu.au/) (Fernandez-Valverde et al., 2015).

5.4 Results

5.4.1 Nine chromatin states were identified in A. queenslandica To explore chromatin accessibility in A. queenslandica larvae, I performed ChIP-Seq on five histone H3 modifications that have been extensively used to define different chromatins states in animals (Ho et al., 2014; Zhou et al., 2011): (i) monomethylated lysine 4 (H3K4me1), associated with distal cis-regulatory elements such as enhancers; (ii) trimethylated lysine 4 (H3K4me3), enriched in active promoters; (iii) trimethylated lysine 36 (H3K36me3), marking actively transcribed regions; (iv) trimethylated lysine 27 (H3K27me3), found in silenced regions; and (v) acetylated lysine 27 (H3K27ac), which occurs around activated regulatory regions. Antibodies against these modifications of the histone H3 tail have also successfully been used in a range of eukaryotic species (Barraza et al., 2015; Eckalbar et al., 2016; Harmeyer et al., 2015; Liu et al., 2007; Sebe-Pedros et al., 2016). Since the amino acid sequence of A. queenslandica histone H3 is identical to its bilaterian orthologues (Appendices 5.3, 5.6) and A. queenslandica also have the relevant histone methyltransferases and acetyltransferases (Appendix 5.7), these antibodies are prone to recognise the correct epitopes in this sponge. An antibody against unphosphorylated Ser2 residues of RNA polymerase II (RNAPII 8WG16) C-terminal domain also was included (Appendix 5.2).

ChIP-Seq experiments were carried out on several hundred individual larvae, which are comprised of at least 10 morphologically distinct cell types (Leys and Degnan, 2002). Despite the inherent cellular heterogeneity of the starting material, distinct chromatin states were identified. A 121 nine-states model was chosen that best represented the combinatorial distribution of H3 PTMs in larvae and adults (F. Gaiti, PhD thesis under review and Gaiti et al., under review) and their association with genomic features (promoter, enhancer, gene body) and different transcriptional states. Active promoters (‘TssA’) and enhancers (‘TxEnhA1’, ‘TxEnhA2’ and ‘EnhWk’) correlated with actively transcribed genes, while bivalent or poised regulatory (‘BivTx’, ‘EnhP’), repressed Polycomb (‘ReprPC’, ‘ReprPCWk’), and quiescent (‘Quies’) states correlated with lowly or non- expressed genes (Figure 5.1). Specifically, the active promoter state (‘TssA’), characterised by the presence of H3K4me3, was enriched around transcription start sites (TSSs) of expressed genes. ‘TxEnhA1’ and ‘TxEnhA2’ states Ð defined by H3K4me1 and H3K27ac enrichment, which typically mark enhancers in eumetazoans (Heinz et al., 2015; Ho et al., 2014; Zhou et al., 2011) Ð were predominantly localised in non-first exons and introns. In contrast, the ‘ReprPC’ states (defined by H3K27me3 enrichment) were distributed along the gene body, consistent with the association of H3K27me3 with transcriptional silencing (Ho et al., 2014; Zhou et al., 2011) (Figure 5.1).

Regions of H3K36me3 enrichment coincided with H3K27me3 and were associated with non-transcribed genes (Appendix 5.8). This was most likely the result of technical difficulties, possibly due to weak antibody binding, which were also reported in adult (Gaiti et al, under review). H3K36me3 was therefore excluded from all further analyses.

A remarkably similar set of chromatin states was identified in adult A. queenslandica that associated with the same genomic features as in larvae, and changed between these two developmental stages in correlation with changes in gene expression (Gaiti et al, under review). The consistent combinatorial pattern of chromatins states, genomic features and transcriptional states in two markedly different A. queenslandica stages, characterised by different cell types and transcriptomes (Conaco et al., 2012; Degnan et al., 2015; Fernandez-Valverde et al., 2015), provides corroborating evidence that this sponge possesses complex and dynamic chromatin states, akin to eumetazoans.

5.4.2 Histone PTMs correlate with different gene expression groups To investigate the putative role of histone H3 PTMs in transcriptional regulation, I computed the association of each mark and RNAPII with groups of low, medium, high or non- expressed genes (Fisher’s exact test, FDR adjusted P-value <0.05). Genes were sorted based on their expression level in mature larva measured by CEL-seq (Anavy et al., 2014; Hashimshony et al., 2012; Levin et al., 2016).

122

RNAPII and all of A. queenslandica H3 PTMs except H3K27me3 were highly correlated with expressed protein-coding genes. H3K27me3, which marked silenced genomic regions in eumetazoans (Schwaiger et al., 2014; Zhou et al., 2011), was correlated with non-expressed protein- coding genes (Figure 5.2A). Surprisingly, long intergenic non-coding RNAs (lincRNAs) did not show patterns of significant correlation (Appendix 5.9). That is in contrast to the adult dataset, where H3K4me3 was significantly associated with the TSSs of highly expressed lincRNAs (Gaiti et al., under review).

Input-normalised ChIP-Seq read coverage of H3K4me3 revealed a strong peak positioned immediately after the TSS of expressed protein-coding genes, with peak height correlating with expression states (Figure 5.2B). This is consistent with H3K4me3 being promoter-proximal and positioned on the +1 nucleosome, as observed in eumetazoans (Ho et al., 2014; Zhou et al., 2011) and adult A. queenslandica (Gaiti et al., under review). Additional enrichment in H3K4me3 was also observed slightly upstream of the TSS of mid-expressed genes, which has not been reported in non-mammalian species (Schwaiger et al., 2014) and was not observed in the A. queenslandica adult dataset (Gaiti et al., under review). This profile might result from genes being transcribed in the opposite direction from nearby TSSs (Schwaiger et al., 2014); however, since only non- overlapping genes were included in this analysis, it is unclear why such a profile is observed in A. queenslandica larvae.

Importantly, regions of H3K4me3 enrichment specifically marked the TSS of genes in the developmental stage where they are actively transcribed (Figure 5.2C-D). H3K4me3 strongly marked the TSS of a number of transcription factor genes, including many of the AqbZIPs (Figure 5.3). While these genes are not all up-regulated in mature larvae, they are still amongst the most highly expressed genes (Chapter 3). For instance, larva-specific AqOASIS and AqATF4 TSSs correlated with a strong enrichment in H3K4me3 relative to AqATF2, which is more lowly expressed at this stage (Figure 5.3).

5.4.3 Enhancer-like elements in A. queenslandica To identify putative A. queenslandica enhancer elements in silico, I selected distal H3K4me1 regions of enrichment (more than 200 bp upstream or downstream from TSSs of protein- coding genes and lncRNAs) that overlapped with any of the enhancer chromatin states (‘TxEnhA1’,‘TxEnhA2’, ‘EnhWk’ or ‘EnhP’ states) (Figure 5.4A-B). A subset of these regions was also marked by H3K27ac, and therefore likely to be active (Figure 5.4A-B). 41% of the predicted

123 activated enhancer-like elements were occupied by RNAPII, which suggests that enhancer RNAs are being transcribed from these regions (Li et al., 2016; Natoli and Andrau, 2012) (Figure 5.4A-B).

Consistent with the enrichment of enhancer chromatin states at introns (Figure 5.1A), predicted activated enhancer-like elements were predominantly intragenic, with only 20% found in intergenic regions. These enhancer-like regions are significantly enriched with consensus binding motifs of key developmental transcription factor families, including Irx and SOX (Figure 5.4 C). Similar overall genomic distribution and motifs were identified in activated enhancer-like elements predicted in adult A. queenslandica (F. Gaiti, PhD thesis under review and Gaiti et al., under review).

To get insights on the putative target genes under the regulation of these activated enhancer- like elements, I examined the closest located TSSs to each of the predicted activated enhancer-like elements. Since A. queenslandica lacks the CCCTC-binding factor (CTCF) (Heger et al., 2012), which mediates enhancer-promoter long-range interactions in bilaterians (Merkenschlager and Odom, 2013; Seitan et al., 2013), it is likely that A. queenslandica enhancers interactions are limited to nearby promoters. These nearest neighbour genes were significantly enriched for gene ontology (GO) terms associated with basic cellular and developmental processes and transcription factor activity (Hypergeometric test, FDR adjusted P-value <0.01) (Figure 5.4.D; Appendix 5.10). They notably included a number of transcription factors, such as and NFKB2 (Appendix 5.11), but none of the AqbZIPs.

5.4.4 AqATF4 localises to a subset of nuclei in the larval inner cell mass Through A. queenslandica development, AqATF4 is expressed in the top quartile of overall and AqbZIP gene expression (Figure 5.5 A). AqATF4 is significantly upregulated in late embryogenesis and all larval stages (late ring to larvae 6-50 h post emergence). AqATF4 transcripts are particularly abundant in larval cells of the inner cell mass (ICM), with some expression in the epithelial layer (Figure 5.5 B). The custom-made antibody raised against AqATF4 almost exclusively labelled the nucleus of a subset of cells in the ICM using immunofluorescence. These cells have structural hallmarks characteristic of archeocytes; i.e., large amoeboid cells with large nuclei (Figure 5.5 C). Pre-absorption of the antibody with the AqATF4 antigen peptide abolished all nuclear staining (Figure 5.5 C).

124 Chromatin in the vicinity of the AqATF4 gene in larvae was marked by a strong enrichment of H3K4me3 and H3K27ac, consistent with the high level of expression of this gene in mature larvae, in contrast to adults (Figure 5.5 D).

5.4.5 AqATF4 occupancy is not promoter-proximal Using the widely used peak-calling algorithm MACS2 (Zhang et al., 2008) with stringent parameters (Cohen et al., 2015; Han et al., 2013; Landt et al., 2012; Li et al., 2011; Sebe-Pedros et al., 2016), 1,756 discrete AqATF4 peaks were identified (Appendix 5.4). Another peak-caller, Peakzilla (Bardet et al., 2013), identified 814 peaks, 80% of which belonged to the MACS2 peak sets. Manual inspection of the read coverage tracks of the Peakzilla peaks that were not identified using MACS2 showed a relatively poor enrichment for AqATF4 in those regions and suggested that the peakzilla parameters chosen were not optimal. From hereon, only analyses of the MACS2 peak sets are presented.

23% of AqATF4 peak summits were located in exons, representing a 1.7-fold enrichment over a random distribution of peaks. Likewise, peak summits at TES represented a 1.5-fold enrichment over a random distribution (Figure 5.6A and Appendix 5.12). Other genic regions, including TSSs, showed no significant enrichment, although individual examples of ATF4 peaks in introns, TSSs and intergenic regions exist (Figure 5.6 A). Interestingly, the position of AqATF4- bound regions varies with transcriptional state. Input-normalised read coverage reveals an enrichment of AqATF4 immediately before the TES in expressed genes but not before that of non- expressed genes (Figure 5.6B). Given the compactness of A. queenslandica genome (Fernandez- Valverde and Degnan, 2016), only non-overlapping genes with an intergenic distance >1 kb (~ 36% of all A. queenslandica genes) were included in these analysis. The estimation of AqATF4 peak genomic location may thus not be fully accurate due to the overlap of A. queenslandica genomic features.

5.4.6 AqATF4 is associated with poised genomic regions AqATF4 occupancy sites are highly enriched for RNAPII and all H3 PTMs except H3K4me3 (Figure 5.7 A, B and Appendix 5.12), consistent with AqATF4 not being enriched at TSSs (Figure 5.6 A). Specifically, 76% of AqATF4 peaks coincided with H3K4me1 enrichment, corresponding to a 20-fold increase over random. Consistent with the co-localisation of AqATF4 with H3K4me1 and H3K27ac, 23% of AqATF4peaks coincided with predicted enhancer-like elements and more specifically active enhancer-like elements, representing a 15 and 27-fold enrichment over random, respectively (Figure 5.7 A, B and Appendix 5.12).

125 Accordingly, AqATF4 enrichment coincides with genomic regions defined as bivalent Ð ‘BivTx’ and ‘EnhP’- and marked both by permissive and repressive H3 PTMs (Figure 5.7C). Specifically, regions with little PTM (‘Quies’ or ‘EnhWk’) were depleted in AqATF4, consistent with these regions not being accessible to transcription factor binding (Heinz et al., 2015). Note that regions marked with H3K4me1 and H3K27ac but not H3K27me3 (‘TxEnhA1’ and‘TxEnhA2’) were also markedly depleted with AqATF4, suggesting that the association of AqATF4 enrichment with bivalent chromatin states is not the result of technical or biological bias.

5.4.7 AqATF4 putative targets Three motifs were significantly enriched in AqATF4 putative binding sequences but only present in a subset of the peaks, including one that matches the JUND-bZIP canonical binding site (Figure 5.8A). This bZIP-like motif comprised of TCA-repeats and is markedly longer than a canonical bZIP-binding motif (e.g., the 8 bp-long CRE). Other significantly enriched motifs included a and an IRX binding motifs (Figure 5.8A). Since the motifs discovered by MEME were only present in a small, but significant, subset of AqATF4 peaks, I completed this analysis using HOMER (Heinz et al., 2010), another widely used motif-mining tool (Cohen et al., 2015; Heinz et al., 2010; Sebe-Pedros et al., 2016). HOMER identified a wider range of motifs, including binding motifs of JUND and several other transcription factors such as RUNX, Deaf1 and GATA5 (Figure 5.8 A and Appendix 5.13). The JUN-like motif identified by HOMER was shorter than that identified by MEME but bore similar TCA-repeats; however, none of the motifs resembled a typical bZIP binding site comprising of two highly conserved half sites occasionally forming a palindrome.

Next, I nominated putative AqATF4 gene targets (n= 1,530) by searching for the closest TSS of protein-coding genes to each peak summit. As explained earlier (see 5.4.3), long-range transcriptional regulation is probably limited in A. queenslandica. Thus, I selected putative target genes whose TSSs were within 5 kb (n= 1,306) and 1 kb (n= 673) of an AqATF4 peak summit. Gene Ontology (GO) terms enriched in these three gene lists include transmembrane signalling and protein transport (Figure 5.8B and Appendix 5.14). Genomic occupancy of AqATF4 within 1 kb of TSSs is also markedly associated with protein degradation and DNA modifications. The full list of AqATF4 putative gene targets is provided in Appendix 5.15. and notably includes several histone and protein modifiers (e.g., histone deacetylase 11, several acetyl- and methyl- transferase proteins; e.g., several e3 ubiquitin-protein ligases). Interestingly, a majority of these genes markedly fluctuate in expression in the early stages of metamorphosis (Appendix 5.16).

126 5.5 Discussion

Here, I undertook the first transcription factor ChIP-Seq analysis in an early-branching non- eumetazoan. Building on the H3 PTMs ChIP-Seq analyses on adult A. queenslandica (Gaiti et al., under review), the present analysis of combinatorial histone H3 PTMs patterns in A. queenslandica larvae allows the first glimpse at the interaction of a transcription factor with cis-regulatory DNA and chromatin in a sponge, and, thus, inferences into transcriptional regulation in the last common ancestor of all metazoans.

The ChIP-Seq analysis presented in this chapter was carried out on an admixture of all larval somatic cells, comprised of at least 10 morphologically distinct cell types (Leys and Degnan, 2002). In contrast to ChIP-Seq analyses performed on unicellular organisms, cell lines or more homogenous tissue samples (e.g. Han et al., 2013; Li et al., 2011; Pham et al., 2013; Sebe-Pedros et al., 2016), my analysis measured a diversity of gene transcriptional states simultaneously. While this sampling strategy provides an understanding of all the chromatin states present at this developmental stage, it might dilute cell type-specific signals. Additionally, larval samples were comprised of several hundred of individuals of different maternal origin. Since A. queenslandica larvae age asynchronously (Degnan and Degnan, 2010) and express different transcripts at different stages of the larval phase (Anavy et al., 2014; Conaco et al., 2012; Levin et al., 2016), multiple individuals might further contribute to the heterogeneity of the cell population analysed here.

5.5.1 Histone H3 PTMs reveal distinct chromatin states in A. queenslandica larvae Despite the great heterogeneity of the cell population used here, distinct chromatin states characterised by clear and significant H3 PTM marks were identified in A. queenslandica larvae, similar to bilaterians and cnidarians (Ho et al., 2014; Schwaiger et al., 2014; Turner, 2007; Zhou et al., 2011). For instance, the genome-wide promoter analysis of H3K4me3 Ð the canonical and widespread eukaryotic histone PTM of active transcription Ð reveals a complex correlation between H3K4me3-containing nucleosome occupancy and gene expression in A. queenslandica larvae. Importantly, despite being comprised of different cell types and having a distinct gene expression profile from the larva (Conaco et al., 2012; Degnan et al., 2015; Fernandez-Valverde et al., 2015; Gaiti et al., 2015), the adult genome possesses a remarkably similar set of chromatin states (Gaiti et al., under review). Obtaining consistent chromatin states based on histone PTMs ChIP-Seq data from two markedly different stages of the A. queenslandica life cycle provides corroborating evidence that this sponge possesses the same regulatory states as present in eumetazoans and confirmed the dynamics of histone PTMs in genes regulated throughout A. queenslandica development (Gaiti et al., under review). The implication of A. queenslandica H3 PTM regulatory 127 landscape in regards to the evolution of multicellular animals is thoroughly discussed in Gaiti et al. (under review).

These distinct regulatory states also enabled the prediction of enhancer elements in A. queenslandica larvae, which were characterised by the same combination of histone H3 PTMs as in eumetazoans (Heinz et al., 2015; Ho et al., 2014; Zhou et al., 2011). In bilaterians, transcriptional regulation by distal enhancer elements, which typically contains multiple recognition motifs for sequence-specific transcription factors, contributes to the spatiotemporal coordination of cell differentiation processes (Heinz et al., 2015; Levine et al., 2014). Consistent with this premise, A. queenslandica predicted enhancers were associated with several developmental and transcriptional regulators (e.g., POU, SOX). The genomic localisation of predicted active enhancers to intronic regions supports the hypothesis that, due to the lack of large intergenic regions in A. queenslandica (Fernandez-Valverde and Degnan, 2016), cis-regulatory elements may be preferentially located in distal introns (Gaiti et al, under review). Enhancer elements bearing the same characteristics are also observed in adults that are active in a stage-specific manner between these two stages (Gaiti et al, under review). This observation supports the hypothesis that enhancer elements are an integral component of this sponge cis-regulatory landscape, as they are in eumetazoans (Heinz et al., 2015; Schwaiger et al., 2014; Shlyueva et al., 2014).

5.5.2 AqATF4 binding is enriched around open chromatin Overall, AqATF4 occupancy strongly coincided with several histone H3 PTMs. This is consistent with transcription factors only binding those of their recognition sites that are accessible, i.e. in regions of remodelled chromatin (John et al., 2011; Li et al., 2007).

AqATF4 occupancy was not associated with promoter regions. Numerous human promoters harbour CRE and CCAAT-sites, which are both bZIP recognition sites that can be bound by ATF4 dimers (Weirauch and Hughes, 2011). Accordingly, ATF4 binding to the promoters of many genes results in transcriptional regulation (reviewed in Ameri et al., 2011; Ameri and Harris, 2008; Hai and Hartman, 2001; e.g. in Lee et al., 2015; Luo et al., 2003). Consistent with this idea, while the majority of ChIP-Seq peaks identified for bZIPs in bilaterians are not in promoter regions, bZIP occupancy is markedly enriched (up to 5 fold enrichment over random) in the promoter regions of the gene they regulate (Consortium, 2012; Han et al., 2013; Huggins et al., 2016; Li et al., 2011; Pham et al., 2013; van den Beucken et al., 2009; Yu et al., 2014). However, often only TSS- proximal peaks are analysed (Huggins et al., 2016; van den Beucken et al., 2009). It is worth noting that some of the aforementioned studies defined TSS proximal regions as a window significantly

128 larger than bilaterian promoters (spanning up to -5 kb to +2 kb across the TSS), which in A. queenslandica would inevitably include numerous overlapping genomic features (Fernandez- Valverde and Degnan, 2016). Since only non-overlapping A. queenslandica TSSs were included in the present analysis, the overall proportion of AqATF4 peaks overlapping TSS-proximal regions may be higher. However, the marked depletion of H3K4me3 signal in AqATF4-bound regions suggests that AqATF4 does not typically bind the promoter of the genes under its control.

In contrast, AqATF4 sites strongly correlated with predicted A. queenslandica enhancer-like elements, and were notably enriched for H3K4me1 and H3K27ac, which typically mark active distal regulatory regions (Turner, 2007). Interestingly, distal binding of three bZIPs Ð ATF4 in human mesenchymal stem cells (Cohen et al., 2015), JUN and CEBP in murine macrophages (Heinz et al., 2015) Ð to enhancer elements has been shown to be critical for terminal cell differentiation. bZIP general occupancy in extended TSS-proximal regions that also contain enhancer chromatin signatures in bilaterians (Peter and Davidson, 2015) may, thus, reflect ATF4 enrichment both in promoter and putative enhancer regions.

Finally, recent evidence also suggests that ATF4 can bind silent chromatin and act as a pioneer transcription factor (Zaret and Carroll, 2011) by promoting the transcription of histone- modifying enzymes (Gjymishka et al., 2008; Shan et al., 2012). Shan et al. speculated that the specificity of ATF4 transcriptional response to different stress-signals is modulated by chromatin modifications rather than binding of ATF4 to different sites (Shan et al., 2012). Since AqATF4 putative targets include several A. queenslandica histone modifiers, a similar role for AqATF4 in A. queenslandica is plausible.

5.5.3 AqATF4 appears to have multiple regulatory functions Binding of AqATF4 in bivalent regions, bearing both permissive and repressive histone PTMs, is consistent with its association with both expressed and repressed genes and suggests three alternative, but not mutually exclusive, scenarios. First, AqATF4 could be acting as both a repressor and an activator in A. queenslandica. Supporting this hypothesis, the ambiguous role of ATF4 has been suggested in a number of studies in bilaterians (Ameri et al., 2011; Ameri and Harris, 2008; Chen et al., 2003; Hai and Hartman, 2001; Karpinski et al., 1992).

Alternatively, AqATF4 could be acting as a transcriptional enhancer in a cell-type specific manner. Whole animal ChIP-Seq, as used here, does not provide sensitivity towards cell-type specificity. AqATF4 may thus be binding regulatory elements that are transcriptionally active in

129 some cells and silenced in others. The combination of both acetylation and tri-methylation modifications on H3K27 at AqATF4 enrichment sites is consistent with this notion, since bilaterian poised enhancers usually bear only one of those marks (Heinz et al., 2015). The localisation of AqATF4 to only a small subset of nuclei in A. queenslandica larvae lends strong support to this mixed-signal hypothesis.

Finally, human poised enhancers are most commonly found in embryonic stem cells and usually associated with the ability to rapidly switch transcriptional output in response to environmental stimuli (Heinz et al., 2015). Interestingly, AqATF4 is predominantly expressed in archeocytes, which are the primary stem cell in sponges (Funayama, 2013). The induction of metamorphosis induces these and other cells to markedly change their behaviours, locations and eventually their state of differentiation (Nakanishi et al., 2014).

AqATF4 putative targets include a number of GPCR signaling and membrane trafficking genes. This is consistent with the requirements of the larval benthic life style. At this developmental stage, the larva is competent and needs to respond rapidly to the environment to find a suitable habitat to settle and initiate metamorphosis (Degnan and Degnan, 2010). AqATF4 occupancy was also associated with protein degradation processes, akin to ATF4 involvement in the mammalian ubiquitin-proteasome pathway (Ameri and Harris, 2008; Lassot et al., 2001). This could suggest a role for AqATF4 in the high protein turnover that may accompany the transition to the metamorphosing post-settlement stages.

The absence of strong canonical motif in AqATF4-bound regions could be due to AqATF4 having different dimerization partners in different cell types. Bilaterian ATF4s form multiple heterodimers with different bZIPs that collectively recognise different DNA sites (Ameri and Harris, 2008; Cohen et al., 2015; Hai and Hartman, 2001; Han et al., 2013; Shan et al., 2012). As a result, ATF4 ChIP-Seq peaks can produce composite or low-complexity motifs without clear homology to canonical palindromic bZIP binding sites (Cohen et al., 2015; Huggins et al., 2016), which are also found in a subset of peaks (e.g. 15% of genome-wide peaks (Huggins et al., 2016), although usually in a larger proportion (Cohen et al., 2015; Han et al., 2013). A. queenslandica AqATF4 form strong predicted interactions with AqPARa and AqCEBP, which are both upregulated in the larva (see Chapter 3, notably Figures 3.9 and 3.21). AqATF4 and AqCEBP transcripts are also predominantly localised in the larva ICM (Chapter 3, notably Figures 3.13). The ATF4:CEBP heterodimer recognises the hybrid site TGATGCAA in human and Drosophila (Huggins et al., 2016; Jolma et al., 2013); however, the ATF4:PAR heterodimer is not formed in

130 vertebrates and its DNA binding preferences are therefore unknown. PAR homodimers tend to preferentially bind PAR-sites, TTAYRTAA, as well as composites of PAR and other bZIPs half sites (Haas et al., 1995). Although A. queenslandica AqATF4 dimerization partners remain to be experimentally confirmed (Chapter 3), it is plausible that AqATF4 heterodimers could bind a collection of hybrid sites comprising one CRE half site (TGA) and one dimer-specific half site. This diversity would be even greater since the ChIP-Seq dataset presented here was performed on a mixture of cell types. A better optimisation of peak calling parameters could also improve the detection of centrally-enriched (Lesluyes et al., 2014) motifs in AqATF4 ChIP-Seq peaks. Notably, overlapping regions of enrichment from different peak callers could refine the location of AqATF4 binding sites.

5.5.4 Conclusions AqATF4 is highly expressed in the ICM of A. queenslandica larvae and immunostaining suggests that AqATF4 moves into the nuclei of a subset of archeocyte-like cells at this stage, where it presumably regulates gene expression. ChIP-Seq analysis shows that AqATF4 occupancy is enriched in regions of chromatin assigned as being active or poised enhancers, suggesting it is acting in the regulation of a subset of genes in the genome. Genes in the vicinity of these enhancers are not only associated with protein trafficking and degradation, cell signalling and DNA modification, but also dynamically expressed in larvae and during early metamorphosis (on-going investigation). DNA motifs enriched in AqATF4 binding sites show hallmarks of non-canonical bZIP binding and suggest multiple bZIP dimerization partners. Predicted dimerization of AqATF4 with AqPARa and AqCEBP, which are also highly expressed in the competent larva, supports this hypothesis. Collectively, these observations suggest that AqATF4 play a role in the functioning of A. queenslandica competent larva and most likely in its preparation for metamorphosis. It also constitutes the first study, to my knowledge, of the interplay between transcription factors binding and chromatin dynamics in a non-eumetazoan.

131 Figures

Figure 5.1. Chromatin states in larva (legend over page)

132 Figure 5.1. Chromatin states in larva (see previous page) (A) Definition and enrichments for a 9-state hidden Markov model based on four histone PTMs (H3K4me3, H3K27ac, H3K4me1, H3K27me3) in Amphimedon larva. From left to right: chromatin state definitions, abbreviations, histone PTM probabilities, genomic coverage, protein-coding gene functional annotation enrichments, expressed (Expr.; CEL-seq normalised counts >0.5 in larva) and repressed (Repr.; CEL-seq normalised counts <0.5 in larva) protein-coding gene enrichments. Blue shading indicates intensity, scaled by column. (B) Chromatin state annotations on a gene rich highly transcribed (active) scaffold (contig13500). For the definition of chromatin states see panel (A). Coding genes (purple) and long non-coding RNAs (blue) are shown, along with signal coverage tracks showing CEL-seq expression in larva. A grey scale indicates CEL-seq expression level: white (no-expression); black (highest expression). (C) Chromatin state annotations on a predominantly silenced scaffold (contig13522 from 500,000 to 1,500,000 bp). For the definition of chromatin states see panel (A). Coding genes (purple) and long non-coding RNAs (blue) are shown, along with signal coverage tracks showing CEL-seq expression in adult. A grey scale indicates CEL-seq expression level: white (no-expression); black (highest expression). (D) Positional enrichments in 100 bp genomic bins around the TSS and TES (+/- 1 kb) of expressed (Expr.; CEL-seq normalised counts >0.5 in larva) protein-coding genes in larva. (E) Same as (D) for repressed protein-coding genes in larva (Repr.; CEL-seq normalised counts <0.5 in larva). Blue shading indicates intensity. This plate was prepared in collaboration with F. Gaiti.

133

Figure 5.2. Histone PTMs are correlated with gene expression variations during development (legend over page)

134 Figure 5.2. Histone PTMs are correlated with gene expression variations during development (A) Larva regions of enrichment of four histone PTMs (H3K4me3, H3K27ac, H3K4me1, and H3K27me3) and RNAPII were overlapped with protein-coding genes and their association with lists of various gene expression groups was tested. Only non- overlapping protein-coding genes in scaffolds larger than 10 kb were used in this analysis. The colour key represents the log2(odds ratio) and the significant adjusted P- values (Fisher’s exact test) are superimposed on the grids. A P-value of zero means the overlap is highly significant. N.S.: not significant. (B) TSS-centred average input DNA normalised read coverage plot of H3K4me3 across Amphimedon protein-coding genes. The x-axis spans +/- 3 kb around TSSs and represents the position within the gene relative to TSS. The y-axis represents the input DNA normalised enrichment for H3K4me3 ChIP-seq reads in larva. Protein-coding genes with TSSs closer than 1 kb to each other were excluded from the analysis and only non-overlapping protein-coding genes in scaffolds larger than 10 kb were plotted. Blue line: Non-expressed genes. Orange line: second top 500 expressed genes. Green line: first top 500 expressed genes. Expressed genes: CEL-seq read counts >0 in larva. The shaded grey area represents the average size of Amphimedon coding sequences. (C) Coding gene with larva-specific expression marked by H3K4me3. The genomic window shows signal coverage tracks of H3K4me3 counts relative to input DNA and RNA-Seq expression in both larva and adult. (D) Same as (C) for coding genes with adult-specific expression. This plate was prepared in collaboration with F. Gaiti.

Figure 5.3. H3 PTMs in the vicinity of AqbZIP genes (see next page) Three genomic windows showing signal coverage tracks of four histone H3 PTMs (H3K4me3, H3K27ac, H3K4me1, and H3K27me3) and RNAPII normalised to input DNA in the vicinity of two highly (A) and one lowly expressed (B) AqbZIPs. RNA-Seq larval expression is shown below the tracks. Note that the scale of RNA-Seq tracks is different between genes.

135

Figure 5.3. H3 PTMs in the vicinity of AqbZIP genes (legend on previous page)

136

Figure 5.4. Distal enhancer-like elements in A. queenslandica larva (A) Overview of the computational filtering pipeline adopted to predict the putative larva A. queenslandica activated enhancer-like elements. See main text and Methods for details. (B) Example of an intragenic activated enhancer-like element, co-occupied by RNAPII (highlighted in grey). Gene models are shown in purple, along with signal coverage tracks of ChIP-seq counts relative to input DNA and RNA-Seq expression in larva. (C) DNA motifs enriched in the larva predicted activated enhancer-like sequences. For each motif, the best match to a motif in the JASPAR CORE and UniPROBE mouse databases, the E-value and the number of sites contributing to the construction of the motif are shown. The matched motif is shown on the top and the query motif is shown on the bottom. (D) Enriched gene ontology (GO) terms among the nearest neighbour protein-coding genes of the larva predicted activated enhancer-like elements. Bar length indicates the significance of the enrichment (Hypergeometric test; -log10(adjusted P-value).

137

Figure 5.5. AqATF4 expression in larva (legend over page)

138 Figure 5.5. AqATF4 expression in larva (see previous page) (A) Log10 normalised expression values of AqATF4 (red line) across Amphimedon development, from early-cleavage to adult. Green and black dotted lines indicate the 75th percentile of the same values for all of the AqbZIPs and A. queenslandica genes, respectively. (B) Localisation of AqATF4 transcripts in cells of the inner cell mass (ICM) and epithelial layer (red arrowheads). Scale bars: 100µm, 5µm (inset). (C) Localisation of AqATF4 in a subset of large nuclei (A-B) in the ICM. Pre-absorption (Pa) of the antibody abolished nuclear staining (C). Nuclei are stained with DAPI (A’, B, B’). Red arrows and arrowheads point to ATF4-labeled cells. White arrows and arrowheads point to negative cells. Scale bars: 30µm (A, C), 5µm (B). (D) Genomic window showing the signal coverage tracks of H3K4me3 and H3K27ac counts relative to input DNA and RNA-Seq expression in both larva and adult, in the vicinity of AqATF4. Corresponding coverage tracks and RNA-Seq have the same scale between adult and larva.

Figure 5.6. Preferential distribution of AqATF4 peaks location across genomic features. (see next page) (A) Distribution of AqATF4 peaks from ChIP-Seq data across the genome. Peaks are classified as within introns (Introns), within coding sequences (Exons), within 1kb upstream of TSS or more than 1kb upstream of TSS. Proportions of peaks within 200 bp upstream or downstream of TSSs and TESs are also shown. Distribution is reported as fold enrichment over random (See also Appendix 5.12). Illustrative examples of AqATF4 site locations are shown in different genomic windows showing coverage for AqATF4 normalised to input DNA. (B) Gene-centred average input DNA normalised read coverage plot of AqATF4 across A. queenslandica protein-coding genes. The x- axis spans 1 kb upstream of TSSs and downstream of TESs. The y-axis represents the input DNA normalised enrichment for AqATF4 ChIP-seq reads in larva. Protein-coding genes with TSSs closer than 1 kb to each other were excluded from the analysis and only non-overlapping protein-coding genes in scaffolds larger than 10 kb were plotted. Blue line: Non-expressed genes. Orange line: second top 500 expressed genes. Green line: first top 500 expressed genes. Expressed genes: CEL-seq read counts >0 in larva. The heatmap on the right-hand side of the plot shows the AqATF4 ChIP-Seq reads along the same y-axis, that were used to build the plot.

139

Figure 5.6. Preferential distribution of AqATF4 peaks location across genomic features (legend over page)

140

Figure 5.7. Correlation of AqATF4 occupancy sites with H3 PTMs (A) Overlap of AqATF4 peaks with regions of H3 PTMs and RNAPII enrichment, and predicted enhancer-like elements, plotted as log2(fold enrichment) over random. The percentage of AqATF4 peaks contained in each category is indicated in brackets. (B) Illustrative case example of AqATF4 occupancy in a predicted active enhancer-like element. The genomic window shows signal coverage tracks of ChIP-seq counts relative to input DNA and RNA-Seq expression in larva. (C) Preferential distribution of AqATF4 sites across chromatin states (defined in figure 5.1), plotted as log2(fold enrichment) over random.

141

Figure 5.8. AqATF4 putative targets (legend over page)

142 Figure 5.8. AqATF4 putative targets (see previous page) (A) DNA motifs enriched in AqATF4-bound sequences, identified by MEME-ChIP. For each motif, the best match to a motif in the MEME vertebrate DNA motifs database, the E- value and the number of sites contributing to the construction of the motif are shown. The matched motif is shown on the top and the query motif is shown on the bottom. HOMER also identified a motif matching the JUND binding site that is shown underneath the MEME-CHIP motif. See Appendix 5.13 for all motifs identified by HOMER. (B) Enriched gene ontology (GO) terms among AqATF4 putative gene targets. Only TSSs within 1 kb of an AqATF4 ChIP-Seq peak summit were included in this analysis. See Appendices 5.14 and 5.15 for a full analysis of AqATF4 putative targets. Bar length indicates the significance of the enrichment (Hypergeometric test; -log10(adjusted P-value)).

143 CHAPTER 6 Ð GENERAL DISCUSSION

The basic leucine zipper (bZIP) family is amongst the largest and oldest class of eukaryotic transcription factors (TFs) (Weirauch and Hughes, 2011). Over three decades of research have established the fundamental role of bZIPs in a myriad of pathways regulating cell-fate decisions in animals (e.g. Davudian et al., 2016; van Straaten et al., 1983), most notably in mammalian models in the context of cancer development and therapies (e.g. Ameri and Harris, 2008; Thompson et al., 2009; Tsukada et al., 2011; van Dam and Castellazzi, 2001). However, the function(s) of bZIPs outside the Bilateria remain largely unknown. Here, I have used the sponge Amphimedon queenslandica to explore the putative role of bZIPs in early metazoan evolution and development.

First, I traced the emergence of new bZIP proteins during eukaryotic Ðand especially metazoan- evolution and identified the full bZIP repertoire in A. queenslandica (Chapter 2). Further characterisation of AqbZIPs in the context of A. queenslandica development revealed that these TFs are highly and dynamically expressed in a cell- and developmental stage-specific manner, and are likely to have specific interactions with each other (Chapter 3). Taken together, these observations suggest the involvement of AqbZIPs in multiple basic cellular processes in this sponge. Using A. queenslandica transcriptional changes in experimentally manipulated light conditions as a case study, I confirmed that at least one AqbZIP -AqPARa - is integrated in a regulatory network controlling circadian-like processes in this sponge (Chapter 4). Finally, I demonstrated the binding of another AqbZIP ÐAqATF4- to specific DNA regions marked by several chromatin modifications (Chapter 5). Comparing this work to what is known of bZIP TFs in bilaterians and cnidarians provides insights into ancient homologies that are shared in animals, and genomic innovations that accompanied the evolution of animal complexity.

6.1 What can the evolutionary trajectory of the bZIP family tell us?

bZIPs arose from a single ancestral protein early in eukaryotic evolution and subsequently underwent independent periods of gene duplication and divergence in numerous lineages. Although reconstructing the primordial bZIP was beyond the scope of this work, tracking bZIPs evolution in the five main eukaryotic kingdoms highlighted the striking conservation of the bZIP domain through hundreds of millions of years of independent evolution (Chapter 2). Conservation of protein sequence over long evolutionary periods reflects strong evolutionary pressures that coincide with a central regulatory role in physiology and development (Koonin and Galperin, 2003). Further,

144 five amino acids are particularly conserved in the DNA-contacting basic region in all eukaryotic bZIPs, reflecting strong functional and structural constraints on these sites (Lee et al., 2007; Zhang and Yang, 2015). In the case of the bZIPs, this is likely to reflect the conservation of the mechanisms that underlie DNA recognition.

In recent years, the full complement of bZIP TFs has been characterised in a number of plants (e.g. Baloglu et al., 2014; Jakoby et al., 2002; Li et al., 2015; Liu et al., 2014; Nijhawan et al., 2008; Pourabed et al., 2015; Wang et al., 2015; Wei et al., 2012), fungi (e.g. Tian et al., 2011) and unicellular eukaryotes (e.g. Gamboa-Melendez et al., 2013; Ye et al., 2013). However, genesis and expansion of bZIP genes in animals had not been thoroughly described. bZIP early innovations, leading to the sponge-eumetazoans and eumetazoan LCAs, correlate with an increase in morphological and cellular complexity in a similar manner to those of other ancient TF families (e.g. the bHLHs (Degnan et al., 2009; Larroux et al., 2008b; Simionato et al., 2007)). However, in contrast to the bHLHs, the bZIP complement remained relatively stable since the sponge- eumetazoan split; they underwent further expansions early in vertebrate evolution, and then in humans. The late expansion of bZIPs may reflect the co-option of these TFs in the regulation of vertebrate- and human-specific processes. For instance, the multiple CREB and CEBP paralogues found in vertebrates are expressed in different tissues and organs, and play distinct roles supporting mammalian synaptic plasticity and memory formation (Alberini, 2009; Ramji and Foka, 2002).

6.2 Developmental expression and putative roles of bZIPs in A. queenslandica

Sponges are amongst the morphologically simplest multicellular animals, lacking true tissues, a centralised gut and nerve and muscle cells (Degnan et al., 2008). They however have at least 12 distinct cell types (Leys and Degnan, 2002;Fieth et al, in prep), some of which are specific to one developmental phase (e.g. sperulous cells are only found in the juvenile sponge), which includes pluripotent stem cells (Funayama, 2013). The basal phylogenetic position and morphological simplicity of the sponge A. queenslandica makes it an attractive model to explore the evolution and regulation of cell-fate determination in animals. Further, A. queenslandica appears to be in a constant state of morphogenesis through embryogenesis, metamorphosis and adulthood, with the juvenile/adult body plan constantly reshaping to shifting environmental conditions (Degnan et al., 2015). This provides multiple opportunities to explore the interplay between cellular processes such as proliferation, transdifferentation and apoptosis (e.g. Nakanishi et al., 2014).

As expected for regulatory molecules, AqbZIPs are expressed throughout A. queenslandica life, but different AqbZIPs are differentially expressed at each developmental stage (Chapter 3). 145 Localised expression patterns of some AqbZIPs are broad (e.g. AqCEBP in embryos), while others transcripts are restricted to subset of cells (e.g. AqPARa around the embryo pigment ring). Distinct and dynamic profiles and pattern of expressions suggest that these TFs play a number of roles in multiple cell decisions in this sponge. The association of a majority of the AqbZIPs with the sponge’s primary stem cells, choanocytes and archeocytes (Funayama, 2013), suggest that AqbZIPs may be contributing to the regulation of fate decisions in these cells, including transdifferentiation in A. queenslandica larvae, juveniles and adults. The early embryonic expression of multiple bZIPs in A. queenslandica is consistent with a role in cell type specification and differentiation, both of which happen earlier in A. queenslandica compared to most other animals (Degnan et al., 2015).

Gene silencing and transgenic techniques are yet to be developed in A. queenslandica and thus, functional characterisation of TFs is challenging in this sponge. However, the putative involvement of AqPARa in a light-sensitive process allowed me to experimentally infer a role for this bZIP, and identify its possible partners, by manipulating the light regime (Chapter 4). Ongoing analyses of genome-wide transcriptional changes -measured by CEL-Seq- in response to manipulated light stimuli could identify other players in this network (T. Say, unpublished). ChIP- Seq experiments on another bZIP, AqATF4 confirm that this TF binds specific regions of A. queenslandica DNA (Chapter 5). Association of AqATF4 occupancy with poised cis-regulatory elements and putative gene targets involved in several cellular processes further supports the hypothesis that bZIPs are integrated in several A. queenslandica developmental regulatory networks. These two case studies confirm that bZIPs are functional and play a critical role in this sponge.

6.3 Putative function of bZIPs in the animal last common ancestor

Processes that are conserved in all animals likely reflect core regulatory components of the ancestral regulatory mechanisms. Although A. queenslandica cannot be considered a representative of all sponges, genomic features and processes shared between this sponge and cnidarian and bilaterian model species were most likely present in the LCA. Cnidarian bZIPs are primarily investigated for their involvement in three biological contexts: tissue regeneration through differentiation (CREB) (Chera et al., 2007), circadian response (PAR) (Reitzel et al., 2013) and cytoprotective response to stress (NFE2) (Reitzel et al., 2008). Both the NFE2 and PAR bZIP families are metazoan-specific (Chapter 2). Interestingly, A. queenslandica NFE2 bZIPs and canonical members of the Nrf2 pathway are upregulated in juveniles under bacterial infection (Yuen, 2016), suggesting that the involvement of NFE2 bZIPs in immune response dates back to the animal last common ancestor (LCA). Similarly, one of A. queenslandica PAR-bZIP is involved 146 in this sponge circadian-like response to light (Chapter 4). Although circadian oscillators are not a metazoan novelty (Bell-Pedersen et al., 2005), the involvement of the cryptochrome family in the regulation of circadian processes in animals appears to have originated along the metazoan stem (Oliveri et al., 2014; Rivera et al., 2012). CRY genes were presumably co-opted after the duplication of an ancestral photolyase involved in DNA-repair (Oliveri et al., 2014; Rivera et al., 2012). Since some of PAR targets also include genes involved in DNA-repair in mammals (Gachon, 2007), it is possible that the PAR-CRY sub-circuit was co-opted early in metazoan evolution. Finally, the precise function(s) of CREB in A. queenslandica still remain to be explored. The upregulation of AqCREB mRNA levels in the larva could suggest a role for AqCREB in the early hours of metamorphosis, a period characterised by extensive transdifferentiation and apoptosis (Chapter 3; Nakanishi et al., 2014).

In the A. queenslandica competent larva, putative targets of AqATF4 were associated with membrane trafficking, DNA modification and protein degradation processes (Chapter 5). This could reflect the general requirements of this developmental stage, when the larva is getting ready to settle and initiate metamorphosis. Genome-wide profiling of A. queenslandica transcriptome suggest that the larva may be poised, to cope with the rapid and widespread transcriptional changes required upon settlement (Conaco et al., 2012). Given the integration of mammalian ATF4s in the ubiquitin- proteasome pathway (Ameri and Harris, 2008; Lassot et al., 2001), a role of AqATF4 in protein degradation in A. queenslandica could also suggest a conserved function for ATF4 bZIPs in metazoans.

6.4 The animal LCA had complex regulatory mechanisms

Retracing the metazoan evolution of bZIPs also allowed for the reconstruction of the ancestral bZIP repertoire (Chapter 2). All the twelve bZIP families present in modern bilaterians can be traced back to the animal LCA, suggesting that the core of bZIPs regulatory functions were already integrated to the first metazoan regulatory program. Seven of these families are also found in unicellular holozoans. Similarly, dynamic chromatin states and putative distal enhancers elements that are central to gene regulation in eumetazoans (Schwaiger et al., 2014) are present in A. queenslandica (Chapter 5). This sponge also has bilaterian-like DNA regulatory elements in promoter regions, putatively to allow for context- and cell type-specific gene expression (Fernandez-Valverde and Degnan, 2016). The comparison with representatives of unicellular sister groups, such as Capsaspora, suggests that distal enhancers and complex promoters are metazoan-specific innovations (Gaiti et al, under review; Sebe-Pedros et al., 2016). The enriched binding of AqATF4 to putative distal enhancers, 147 identified by combinations of histone H3 PTMs, provides the first experimental evidence that these regions are likely to function in a similar manner to that observed in bilaterians. Taken together, these observations confirm that the regulatory mechanisms in place in the animal LCA were more complex than originally thought. The transition to multicellularity correlated with several innovations, including distal cis-regulatory elements and 5 new bZIP families (MAF, NFE2, FOS, XBP1 and ATF4).

6.5 Future directives

Developing a ChIP-Seq protocol has been an important first step towards functional characterisation of TFs in A. queenslandica. For instance, ChIP-Seq experiments on AqPARa could reveal intermediate players that are missing for the circadian-like network discussed in Chapter 4. Indeed, since cryptochromes are not transcription factors, AqPARa light-sensitive expression must involve at least one mediator, which could also be regulated by AqPARa (feedback loop). Further, differential binding of AqPARa between larvae reared under different light regimes, possibly coupled with the transcriptome analyses mentioned above, could even distinguish between AqPARa light-responsive targets and those involved in other processes. This avenue was explored during the course of this work, but the anti-AqPARa antibody was not deemed specific enough to pursue this further.

Similarly, differential binding of TFs between developmental stages -for which the ChIP- Seq protocol has been optimised (i.e. precompetent larvae, competent larvae and adult)- could provide further insights into stage-specific targets. For instance, targets specifically activated in competent larvae relative to precompetent larvae likely reflect genes that are related with the initiation of metamorphosis. This is consistent with the poised state of the competent larvae suggested by genome-wide transcriptome analyses (Conaco et al., 2012) and the present work (chapter 5). For instance, most of AqATF4 putative targets markedly change expression within the first hours of metamorphosis (Chapter 5). Further characterisation of these targets could highlight gene families that underlie this morphological transition.

Finally, genomic and transcriptomic data from an increasing number of sponge species will shed light on the putative role of bZIPs in sponges other than A. queenslandica (e.g Guzman and Conaco, 2016). A comparative approach within the Porifera will likely support the pivotal role of bZIPs in the LCA of all animals and uncover sponge novelties.

148 6.6 Concluding remarks

In summary, this body of work confirms the integration of bZIPs in several functional regulatory networks in A. queenslandica. It also provides a foundation upon which to base and direct subsequent research on bZIPs in this sponge. Ultimately, there is a need to functionally confirm several of the findings discussed here, and relevant experiments are proposed throughout the document. It is now well established that the expansion of TF repertoires supports an increase in morphological and regulatory complexity. In the case of the bZIPs, this appears to be achieved through the emergence of new bZIP families, which recognise similar but distinct DNA sites, but also through a diversification of bZIP-bZIP dimeric interactions. The presence of active distal enhancers and complex promoters in A. queenslandica - but not in unicellular holozoans - suggests that innovations in the non-coding regulatory genome may also have contributed to an increase in regulatory capacities. Together, these two regulatory mechanisms allowed for the fine-tuning of gene expression required by complex multicellularity. The recent development of new molecular tools to study TF-DNA and TF-TF interactions, and the increasing numbers of genomes and transcriptomes available, especially those of early branching metazoan (sponges and ctenophores) and unicellular holozoan phyla, hold the potential to deepen our understanding of the first metazoan GRNs, and thus, the regulatory innovations underlying the evolution of metazoan complexity.

149 REFERENCES

Abascal, F., Zardoya, R., and Posada, D. (2005). ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21, 2104-2105.

Adamska, M., Degnan, S.M., Green, K.M., Adamski, M., Craigie, A., Larroux, C., and Degnan, B.M. (2007a). Wnt and TGF-beta expression in the sponge Amphimedon queenslandica and the origin of metazoan embryonic patterning. PLoS One 2, e1031.

Adamska, M., Larroux, C., Adamski, M., Green, K., Lovas, E., Koop, D., Richards, G.S., Zwafink, C., and Degnan, B.M. (2010). Structure and expression of conserved Wnt pathway components in the demosponge Amphimedon queenslandica. Evol Dev 12, 494-518.

Adamska, M., Matus, D.Q., Adamski, M., Green, K., Rokhsar, D.S., Martindale, M.Q., and Degnan, B.M. (2007b). The evolutionary origin of hedgehog proteins. Curr Biol 17, R836-837.

Adryan, B., and Teichmann, S.A. (2010). The developmental expression dynamics of Drosophila melanogaster transcription factors. Genome Biol 11, R40.

Afgan, E., Baker, D., van den Beek, M., Blankenberg, D., Bouvier, D., Cech, M., Chilton, J., Clements, D., Coraor, N., Eberhard, C., et al. (2016). The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 44, W3-W10.

Afgan, E., Sloggett, C., Goonasekera, N., Makunin, I., Benson, D., Crowe, M., Gladman, S., Kowsar, Y., Pheasant, M., Horst, R., et al. (2015). Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud. PLoS One 10, e0140829.

Alberini, C.M. (2009). Transcription factors in long-term memory and synaptic plasticity. Physiol Rev 89, 121-145.

Ali, A.M., Reis, J.M., Xia, Y., Rashid, A.J., Mercaldo, V., Walters, B.J., Brechun, K.E., Borisenko, V., Josselyn, S.A., Karanicolas, J., et al. (2015). Optogenetic Inhibitor of the Transcription Factor CREB. Chem Biol 22, 1531-1539.

Amano, S. (1988). Morning Release of Larvae Controlled by the Light in an Intertidal Sponge, Callyspongia ramosa. The Biological Bulletin 175, 181-184.

Ameri, K., Harris, A., and Singleton, D. (2011). Atf4. AfCS-Nature Molecule Pages.

Ameri, K., and Harris, A.L. (2008). Activating transcription factor 4. Int J Biochem Cell Biol 40, 14-21.

Amoutzias, G.D., Veron, A.S., Weiner, J., 3rd, Robinson-Rechavi, M., Bornberg-Bauer, E., Oliver, S.G., and Robertson, D.L. (2007). One billion years of bZIP transcription factor evolution: conservation and change in dimerization and DNA-binding site specificity. Mol Biol Evol 24, 827- 835.

Anavy, L., Levin, M., Khair, S., Nakanishi, N., Fernandez-Valverde, S.L., Degnan, B.M., and Yanai, I. (2014). BLIND ordering of large-scale transcriptomic developmental timecourses. Development 141, 1161-1166.

Ashmore, L.J., and Sehgal, A. (2003). A fly's eye view of circadian entrainment. J Biol Rhythms 18, 206-216. 150 Aurrecoechea, C., Brestelli, J., Brunk, B.P., Carlton, J.M., Dommer, J., Fischer, S., Gajria, B., Gao, X., Gingle, A., Grant, G., et al. (2009). GiardiaDB and TrichDB: integrated genomic resources for the eukaryotic protist pathogens Giardia lamblia and Trichomonas vaginalis. Nucleic Acids Res 37, D526-530.

Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., and Noble, W.S. (2009). MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37, W202-208.

Bailey, T.L., and Elkan, C. (1994). Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2, 28-36.

Baloglu, M.C., Eldem, V., Hajyzadeh, M., and Unver, T. (2014). Genome-wide analysis of the bZIP transcription factors in cucumber. PLoS One 9, e96014.

Bannister, A., Cook, A., and Kouzarides, T. (1991). In vitro DNA binding activity of Fos/Jun and BZLF1 but not C/EBP is affected by redox changes. Oncogene 6, 1243-1250.

Barco, A., and Marie, H. (2011). Genetic approaches to investigate the role of CREB in neuronal plasticity and memory. Mol Neurobiol 44, 330-349.

Bardet, A.F., Steinmann, J., Bafna, S., Knoblich, J.A., Zeitlinger, J., and Stark, A. (2013). Identification of transcription factor binding sites from ChIP-seq data at high resolution. Bioinformatics 29, 2705-2713.

Barraza, A., Cabrera-Ponce, J.L., Gamboa-Becerra, R., Luna-Martinez, F., Winkler, R., and Alvarez-Venegas, R. (2015). The Phaseolus vulgaris PvTRX1h gene regulates plant hormone biosynthesis in embryogenic callus from common bean. Front Plant Sci 6, 577.

Bass, J. (2012). Circadian topology of metabolism. Nature 491, 348-356.

Bell-Pedersen, D., Cassone, V.M., Earnest, D.J., Golden, S.S., Hardin, P.E., Thomas, T.L., and Zoran, M.J. (2005). Circadian rhythms from multiple oscillators: lessons from diverse organisms. Nat Rev Genet 6, 544-556.

Ben-Moshe, Z., Vatine, G., Alon, S., Tovin, A., Mracek, P., Foulkes, N.S., and Gothilf, Y. (2010). Multiple PAR and E4BP4 bZIP transcription factors in zebrafish: diverse spatial and temporal expression patterns. Chronobiol Int 27, 1509-1531.

Best, A.A., Morrison, H.G., McArthur, A.G., Sogin, M.L., and Olsen, G.J. (2004). Evolution of eukaryotic transcription: insights from the genome of Giardia lamblia. Genome Res 14, 1537-1547.

Blank, V., and Andrews, N.C. (1997). The Maf transcription factors: regulators of differentiation. Trends Biochem Sci 22, 437-441.

Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.

Bolouri, H., and Davidson, E.H. (2003). Transcriptional regulatory cascades in development: initial rates, not steady state, determine network kinetics. Proc Natl Acad Sci U S A 100, 9371-9376.

Brady, A.K., Snyder, K.A., and Vize, P.D. (2011). Circadian cycles of gene expression in the coral, Acropora millepora. PLoS One 6, e25072.

151 Brady, A.K., Willis, B.L., Harder, L.D., and Vize, P.D. (2016). Lunar Phase Modulates Circadian Gene Expression Cycles in the Broadcast Spawning Coral Acropora millepora. Biol Bull 230, 130- 142.

Bridgham, J.T., Eick, G.N., Larroux, C., Deshpande, K., Harms, M.J., Gauthier, M.E., Ortlund, E.A., Degnan, B.M., and Thornton, J.W. (2010). Protein evolution by molecular tinkering: diversification of the nuclear receptor superfamily from a ligand-dependent ancestor. PLoS Biol 8.

Buck, M., Zhang, L., Halasz, N.A., Hunter, T., and Chojkier, M. (2001). Nuclear export of phosphorylated C/EBPbeta mediates the inhibition of albumin expression by TNF-alpha. EMBO J 20, 6712-6723.

Calfon, M., Zeng, H., Urano, F., Till, J.H., Hubbard, S.R., Harding, H.P., Clark, S.G., and Ron, D. (2002). IRE1 couples endoplasmic reticulum load to secretory capacity by processing the XBP-1 mRNA. Nature 415, 92-96.

Chen, A., Muzzio, I.A., Malleret, G., Bartsch, D., Verbitsky, M., Pavlidis, P., Yonan, A.L., Vronskaya, S., Grody, M.B., Cepeda, I., et al. (2003). Inducible enhancement of memory storage and synaptic plasticity in transgenic mice expressing an inhibitor of ATF4 (CREB-2) and C/EBP proteins. Neuron 39, 655-669.

Cheng, C., Alexander, R., Min, R., Leng, J., Yip, K.Y., Rozowsky, J., Yan, K.K., Dong, X., Djebali, S., Ruan, Y., et al. (2012). Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome Res 22, 1658-1667.

Chera, S., Kaloulis, K., and Galliot, B. (2007). The cAMP response element binding protein (CREB) as an integrative HUB selector in metazoans: Clues from the hydra model system. Biosystems 87, 191-203.

Chinenov, Y., and Kerppola, T.K. (2001). Close encounters of many kinds: Fos-Jun interactions that mediate transcription regulatory specificity. Oncogene 20, 2438-2452.

Cock, J.M., Sterck, L., Rouze, P., Scornet, D., Allen, A.E., Amoutzias, G., Anthouard, V., Artiguenave, F., Aury, J.M., Badger, J.H., et al. (2010). The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature 465, 617-621.

Cohen, D.M., Won, K.J., Nguyen, N., Lazar, M.A., Chen, C.S., and Steger, D.J. (2015). ATF4 licenses C/EBPβ activity in human mesenchymal stem cells primed for adipogenesis. Elife 4, e06821.

Conaco, C., Neveu, P., Zhou, H.J., Arcila, M.L., Degnan, S.M., Degnan, B.M., and Kosik, K.S. (2012). Transcriptome profiling of the demosponge Amphimedon queenslandica reveals genome- wide events that accompany major life cycle transitions. Bmc Genomics 13.

Conesa, A., and Gotz, S. (2008). Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics 2008, 619832.

Consortium, E.P. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74.

Correa, L.G., Riano-Pachon, D.M., Schrago, C.G., dos Santos, R.V., Mueller-Roeber, B., and Vincentz, M. (2008). The role of bZIP transcription factors in green plant evolution: adaptive features emerging from four founder genes. PLoS One 3, e2944.

152 Cowell, I.G. (2002). E4BP4/NFIL3, a PAR-related bZIP factor with many roles. BioEssays 24, 1023-1029.

Cunningham, F., Amode, M.R., Barrell, D., Beal, K., Billis, K., Brent, S., Carvalho-Silva, D., Clapham, P., Coates, G., Fitzgerald, S., et al. (2015). Ensembl 2015. Nucleic Acids Res 43, D662- 669.

Curran, T., Peters, G., Van Beveren, C., Teich, N.M., and Verma, I.M. (1982). FBJ murine osteosarcoma virus: identification and molecular cloning of biologically active proviral DNA. J Virol 44, 674-682.

Cyran, S.A., Buchsbaum, A.M., Reddy, K.L., Lin, M.C., Glossop, N.R., Hardin, P.E., Young, M.W., Storti, R.V., and Blau, J. (2003). vrille, Pdp1, and dClock form a second feedback loop in the Drosophila circadian clock. Cell 112, 329-341.

Davidson, E.H. (2006). cis-Regulatory Modules, and the Structure/Function Basis of Regulatory Logic. In The Regulatory Genome: Gene Regulatory Networks in Development and Evolution (Academic Press), pp. 31-86.

Davidson, E.H., and Erwin, D.H. (2006). Gene regulatory networks and the evolution of animal body plans. Science 311, 796-800.

Davudian, S., Mansoori, B., Shajari, N., Mohammadi, A., and Baradaran, B. (2016). BACH1, the master regulator gene: A novel candidate target for cancer therapy. Gene 588, 30-37. de Mendoza, A., Sebe-Pedros, A., Sestak, M.S., Matejcic, M., Torruella, G., Domazet-Loso, T., and Ruiz-Trillo, I. (2013). Transcription factor evolution in eukaryotes and the assembly of the regulatory toolkit in multicellular lineages. Proc Natl Acad Sci U S A 110, E4858-4866. de Mendoza, A., Suga, H., Permanyer, J., Irimia, M., and Ruiz-Trillo, I. (2015). Complex transcriptional regulation and independent evolution of fungal-like traits in a relative of animals. Elife 4, e08904.

Degnan, B.M., Adamska, M., Craigie, A., Degnan, S.M., Fahey, B., Gauthier, M., Hooper, J.N., Larroux, C., Leys, S.P., Lovas, E., et al. (2008). The Demosponge Amphimedon queenslandica: Reconstructing the Ancestral Metazoan Genome and Deciphering the Origin of Animal Multicellularity. CSH Protoc 2008, pdb emo108.

Degnan, B.M., Adamska, M., Richards, G.S., Larroux, C., Leininger, S., Bergum, B., Calcino, A., Taylor, K., Nakanishi, N., and Degnan, S.M. (2015). Porifera. In Evolutionary Developmental Biology of Invertebrates 1: Introduction, Non-Bilateria, Acoelomorpha, Xenoturbellida, Chaetognatha, A. Wanninger, ed. (Springer-Verlag).

Degnan, B.M., Vervoort, M., Larroux, C., and Richards, G.S. (2009). Early evolution of metazoan transcription factors. Curr Opin Genet Dev 19, 591-599.

Degnan, S.M., and Degnan, B.M. (2010). The initiation of metamorphosis as an ancient polyphenic trait and its role in metazoan life-cycle evolution. Philos Trans R Soc Lond B Biol Sci 365, 641- 651.

Deppmann, C.D., Acharya, A., Rishi, V., Wobbes, B., Smeekens, S., Taparowsky, E.J., and Vinson, C. (2004). Dimerization specificity of all 67 B-ZIP motifs in Arabidopsis thaliana: a comparison to Homo sapiens B-ZIP motifs. Nucleic Acids Res 32, 3435-3445.

153 Deppmann, C.D., Alvania, R.S., and Taparowsky, E.J. (2006). Cross-species annotation of basic leucine zipper factor interactions: Insight into the evolution of closed interaction networks. Molecular Biology and Evolution 23, 1480-1492.

Derelle, R., Lopez, P., Le Guyader, H., and Manuel, M. (2007). Homeodomain proteins belong to the ancestral molecular toolkit of eukaryotes. Evolution & Development 9, 212-219.

Donald, J.E., and Shakhnovich, E.I. (2005). Predicting specificity-determining residues in two large eukaryotic transcription factor families. Nucleic Acids Res 33, 4455-4465.

Duboule, D., and Dolle, P. (1989). The Structural and Functional-Organization of the Murine Hox Gene Family Resembles That of Drosophila Homeotic Genes. Embo Journal 8, 1497-1505.

Dunn, C.W., Hejnol, A., Matus, D.Q., Pang, K., Browne, W.E., Smith, S.A., Seaver, E., Rouse, G.W., Obst, M., Edgecombe, G.D., et al. (2008). Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452, 745-749.

Eckalbar, W.L., Schlebusch, S.A., Mason, M.K., Gill, Z., Parker, A.V., Booker, B.M., Nishizaki, S., Muswamba-Nday, C., Terhune, E., Nevonen, K.A., et al. (2016). Transcriptomic and epigenomic characterization of the developing bat wing. Nat Genet 48, 528-536.

Edgar, R., Domrachev, M., and Lash, A.E. (2002). Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30, 207-210.

Ellenberger, T.E., Brandl, C.J., Struhl, K., and Harrison, S.C. (1992). The Gcn4 Basic Region Leucine Zipper Binds DNA as a Dimer of Uninterrupted Alpha-Helices - Crystal-Structure of the Protein-DNA Complex. Cell 71, 1223-1237.

Emery, P., Stanewsky, R., Helfrich-Förster, C., Emery-Le, M., Hall, J.C., and Rosbash, M. (2000). Drosophila CRY Is a Deep Brain Circadian Photoreceptor. Neuron 26, 493-504.

Ernst, J., and Kellis, M. (2012). ChromHMM: automating chromatin-state discovery and characterization. Nat Methods 9, 215-216.

Eychene, A., Rocques, N., and Pouponnot, C. (2008). A new MAFia in cancer. Nat Rev Cancer 8, 683-693.

Fahey, B. (2011). Origin of animal epithelia: insights from the genome of the demosponge Amphimedon queenslandica. In School of Biological Sciences (The University of Queensland).

Fairclough, S.R., Chen, Z., Kramer, E., Zeng, Q., Young, S., Robertson, H.M., Begovic, E., Richter, D.J., Russ, C., Westbrook, M.J., et al. (2013). Premetazoan genome evolution and the regulation of cell differentiation in the choanoflagellate Salpingoeca rosetta. Genome Biol 14, R15.

Fassler, J., Landsman, D., Acharya, A., Moll, J.R., Bonovich, M., and Vinson, C. (2002). B-ZIP proteins encoded by the Drosophila genome: evaluation of potential dimerization partners. Genome Res 12, 1190-1200.

Fernandez-Valverde, S.L., Calcino, A.D., and Degnan, B.M. (2015). Deep developmental transcriptome sequencing uncovers numerous new genes and enhances gene annotation in the sponge Amphimedon queenslandica. BMC Genomics 16, 387.

Fernandez-Valverde, S.L., and Degnan, B.M. (2016). Bilaterian-like promoters in the highly compact Amphimedon queenslandica genome. Sci Rep 6, 22496.

154 Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A., Hetherington, K., Holm, L., Mistry, J., et al. (2014). Pfam: the protein families database. Nucleic Acids Research 42, D222-D230.

Finnerty, J.R., Mazza, M.E., and Jezewski, P.A. (2009). Domain duplication, divergence, and loss events in vertebrate Msx paralogs reveal phylogenomically informed disease markers. BMC Evol Biol 9, 18.

Finnerty, J.R., Paulson, D., Burton, P., Pang, K., and Martindale, M.Q. (2003). Early evolution of a homeobox gene: the gene in the Cnidaria and the Bilateria. Evol Dev 5, 331-345.

Fong, J.H., Keating, A.E., and Singh, M. (2004). Predicting specificity in bZIP coiled-coil protein interactions. Genome Biol 5, R11.

Fortunato, S.A., Adamski, M., Ramos, O.M., Leininger, S., Liu, J., Ferrier, D.E., and Adamska, M. (2014). Calcisponges have a ParaHox gene and dynamic expression of dispersed NK homeobox genes. Nature 514, 620-623

Foster, R., and Kreitzman, L. (2005). Rhythms of Life: The Biological Clocks that Control the Daily Lives of Every Living Thing (Yale University Press).

Fujii, Y., Shimizu, T., Toda, T., Yanagida, M., and Hakoshima, T. (2000). Structural basis for the diversity of DNA recognition by bZIP transcription factors. Nat Struct Biol 7, 889-893.

Funayama, N. (2013). The stem cell system in demosponges: suggested involvement of two types of cells: archeocytes (active stem cells) and choanocytes (food-entrapping flagellated cells). Dev Genes Evol 223, 23-38.

Gachon, F. (2007). Physiological function of PARbZip circadian clock-controlled transcription factors. Ann Med 39, 562-571.

Gaiti, F., Fernandez-Valverde, S.L., Nakanishi, N., Calcino, A.D., Yanai, I., Tanurdzic, M., and Degnan, B.M. (2015). Dynamic and Widespread lncRNA Expression in a Sponge and the Origin of Animal Complexity. Molecular Biology and Evolution 32, 2367-2382.

Galliot, B., Quiquand, M., Ghila, L., de Rosa, R., Miljkovic-Licina, M., and Chera, S. (2009). Origins of neurogenesis, a cnidarian view. Dev Biol 332, 2-24.

Gamboa-Melendez, H., Huerta, A.I., and Judelson, H.S. (2013). bZIP transcription factors in the oomycete phytophthora infestans with novel DNA-binding domains are involved in defense against oxidative stress. Eukaryot Cell 12, 1403-1412.

Gaudet, P., Fey, P., Basu, S., Bushmanova, Y.A., Dodson, R., Sheppard, K.A., Just, E.M., Kibbe, W.A., and Chisholm, R.L. (2011). dictyBase update 2011: web 2.0 functionality and the initial steps towards a genome portal for the Amoebozoa. Nucleic Acids Res 39, D620-624.

Gauthier, M. (2009). Developing a sense of self: exploring the evolution of immune and allorecognition mechanisms in metazoans using the demosponge Amphimedon queenslandica. In School of Biological Sciences (The University of Queensland).

Gjymishka, A., Palii, S.S., Shan, J., and Kilberg, M.S. (2008). Despite increased ATF4 binding at the C/EBP-ATF composite site following activation of the unfolded protein response, system A transporter 2 (SNAT2) transcription activity is repressed in HepG2 cells. Journal of Biological Chemistry 283, 27736-27747. 155 Glover, J.N., and Harrison, S.C. (1995). Crystal structure of the heterodimeric bZIP transcription factor c-Fos-c-Jun bound to DNA. Nature 373, 257-261.

Grigoryan, G., and Keating, A.E. (2008). Structural specificity in coiled-coil interactions. Curr Opin Struct Biol 18, 477-483.

Grigoryan, G., Reinke, A.W., and Keating, A.E. (2009). Design of protein-interaction specificity gives selective bZIP-binding peptides. Nature 458, 859-864.

Guzman, C., and Conaco, C. (2016). Gene Expression Dynamics Accompanying the Sponge Thermal Stress Response. PLoS One 11, e0165368

Haas, N.B., Cantwell, C.A., Johnson, P.F., and Burch, J.B. (1995). DNA-binding specificity of the PAR basic leucine zipper protein VBP partially overlaps those of the C/EBP and CREB/ATF families and is influenced by domains that flank the core basic region. Mol Cell Biol 15, 1923- 1932.

Hai, T., and Hartman, M.G. (2001). The molecular biology and nomenclature of the activating transcription factor/cAMP responsive element binding family of transcription factors: activating transcription factor proteins and homeostasis. Gene 273, 1-11.

Han, J., Back, S.H., Hur, J., Lin, Y.H., Gildersleeve, R., Shan, J., Yuan, C.L., Krokowski, D., Wang, S., Hatzoglou, M., et al. (2013). ER-stress-induced transcriptional regulation increases protein synthesis leading to cell death. Nat Cell Biol 15, 481-490.

Harmer, S.L., Panda, S., and Kay, S.A. (2001). Molecular bases of circadian rhythms. Annu Rev Cell Dev Biol 17, 215-253.

Harmeyer, K.M., South, P.F., Bishop, B., Ogas, J., and Briggs, S.D. (2015). Immediate chromatin immunoprecipitation and on-bead quantitative PCR analysis: a versatile and rapid ChIP procedure. Nucleic Acids Res 43, e38.

Harms, E., Kivimae, S., Young, M.W., and Saez, L. (2004). Posttranscriptional and posttranslational regulation of clock genes. J Biol Rhythms 19, 361-373.

Harreman, M.T., Kline, T.M., Milford, H.G., Harben, M.B., Hodel, A.E., and Corbett, A.H. (2004). Regulation of nuclear import by phosphorylation adjacent to nuclear localization signals. J Biol Chem 279, 20613-20621.

Hashimshony, T., Wagner, F., Sher, N., and Yanai, I. (2012). CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification. Cell Reports 2, 666-673.

Hayes, J.D., and Dinkova-Kostova, A.T. (2014). The Nrf2 regulatory network provides an interface between redox and intermediary metabolism. Trends in Biochemical Sciences 39, 199-218.

He, Y., Sun, S., Sha, H., Liu, Z., Yang, L., Xue, Z., Chen, H., and Qi, L. (2010). Emerging roles for XBP1, a sUPeR transcription factor. Gene Expr 15, 13-25.

Heger, P., Marin, B., Bartkuhn, M., Schierenberg, E., and Wiehe, T. (2012). The chromatin insulator CTCF and the emergence of metazoan diversity. Proc Natl Acad Sci U S A 109, 17507- 17512.

156 Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576-589.

Heinz, S., Romanoski, C.E., Benner, C., and Glass, C.K. (2015). The selection and function of cell type-specific enhancers. Nat Rev Mol Cell Biol 16, 144-154.

Hejnol, A., Obst, M., Stamatakis, A., Ott, M., Rouse, G.W., Edgecombe, G.D., Martinez, P., Baguna, J., Bailly, X., Jondelius, U., et al. (2009). Assessing the root of bilaterian animals with scalable phylogenomic methods. Proceedings of the Royal Society B-Biological Sciences 276, 4261-4270.

Hemmrich, G., and Bosch, T.C. (2008). Compagen, a comparative genomics platform for early branching metazoan animals, reveals early origins of genes regulating stem-cell differentiation. Bioessays 30, 1010-1018.

Hinnebusch, A.G., and Fink, G.R. (1983). Positive regulation in the general amino acid control of Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 80, 5374-5378.

Ho, J.W., Jung, Y.L., Liu, T., Alver, B.H., Lee, S., Ikegami, K., Sohn, K.A., Minoda, A., Tolstorukov, M.Y., Appert, A., et al. (2014). Comparative analysis of metazoan chromatin organization. Nature 512, 449-452.

Hoadley, K.D., Szmant, A.M., and Pyott, S.J. (2011). Circadian clock gene expression in the coral Favia fragum over diel and lunar reproductive cycles. PLoS One 6, e19755.

Hooper, J.N.A., and Soest, R.W.M.v. (2006). A new species of Amphimedon (Porifera, Demospongiae, Haplosclerida, Niphatidae) from the Capricorn-Bunker Group of Islands, Great Barrier Reef, Australia: target species for the "sponge genome project". Zootaxa 1314, 31-39.

Hope, I.A., and Struhl, K. (1986). Functional dissection of a eukaryotic transcriptional activator protein, GCN4 of yeast. Cell 46, 885-894.

Hope, I.A., and Struhl, K. (1987). Gcn4, a Eukaryotic Transcriptional Activator Protein, Binds as a Dimer to Target DNA. Embo Journal 6, 2781-2784.

Huggins, C.J., Mayekar, M.K., Martin, N., Saylor, K.L., Gonit, M., Jailwala, P., Kasoji, M., Haines, D.C., Quinones, O.A., and Johnson, P.F. (2016). C/EBPgamma Is a Critical Regulator of Cellular Stress Response Networks through Heterodimerization with ATF4. Mol Cell Biol 36, 693-713.

Hurst, H.C. (1994). Transcription factors. 1: bZIP proteins. Protein Profile 1, 123-168.

Jager, M., Queinnec, E., Houliston, E., and Manuel, M. (2006). Expansion of the SOX gene family predated the emergence of the Bilateria. Mol Phylogenet Evol 39, 468-477.

Jakoby, M., Weisshaar, B., Dröge-Laser, W., Vicente-Carbajosa, J., Tiedemann, J., Kroj, T., and Parcy, F. (2002). bZIP transcription factors in Arabidopsis. Trends in Plant Science 7, 106-111.

Jiang, D., Niwa, M., and Koong, A.C. (2015). Targeting the IRE1alpha-XBP1 branch of the unfolded protein response in human diseases. Semin Cancer Biol 33, 48-56.

Jindrich, K., and Degnan, B.M. (2016). The diversification of the basic leucine zipper family in eukaryotes correlates with the evolution of multicellularity. BMC Evol Biol 16, 28.

157 Jochum, W., Passegue, E., and Wagner, E.F. (2001). AP-1 in mouse development and tumorigenesis. Oncogene 20, 2401-2412.

John, S., Sabo, P.J., Thurman, R.E., Sung, M.H., Biddie, S.C., Johnson, T.A., Hager, G.L., and Stamatoyannopoulos, J.A. (2011). Chromatin accessibility pre-determines binding patterns. Nat Genet 43, 264-268.

Jolma, A., Yan, J., Whitington, T., Toivonen, J., Nitta, K.R., Rastas, P., Morgunova, E., Enge, M., Taipale, M., Wei, G., et al. (2013). DNA-binding specificities of human transcription factors. Cell 152, 327-339.

Kaloulis, K., Chera, S., Hassel, M., Gauchat, D., and Galliot, B. (2004). Reactivation of developmental programs: The cAMP-response element-binding protein pathway is involved in hydra head regeneration. Proceedings of the National Academy of Sciences of the United States of America 101, 2363-2368.

Karpinski, B.A., Morle, G.D., Huggenvik, J., Uhler, M.D., and Leiden, J.M. (1992). Molecular- Cloning of Human Creb-2 - an Atf/Creb Transcription Factor That Can Negatively Regulate Transcription from the Camp Response Element. Proceedings of the National Academy of Sciences of the United States of America 89, 4820-4824.

Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30, 3059-3066.

Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., et al. (2012). Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647- 1649.

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G.E., Dekker, J., et al. (2014). Defining functional DNA elements in the human genome. Proc Natl Acad Sci U S A 111, 6131-6138.

Kharchenko, P.V., Tolstorukov, M.Y., and Park, P.J. (2008). Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26, 1351-1359.

King, N. (2004). The unicellular ancestry of animal development. Dev Cell 7, 313-325.

King, N., Westbrook, M.J., Young, S.L., Kuo, A., Abedin, M., Chapman, J., Fairclough, S., Hellsten, U., Isogai, Y., Letunic, I., et al. (2008). The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451, 783-788.

Koonin, E.V., and Galperin, M.Y. (2003). Sequence - Evolution - Function: Computational Approaches in Comparative Genomics. In Sequence - Evolution - Function: Computational Approaches in Comparative Genomics (Boston).

Kuhn, R.M., Haussler, D., and Kent, W.J. (2013). The UCSC genome browser and associated tools. Brief Bioinform 14, 144-161.

Kurokawa, H., Motohashi, H., Sueno, S., Kimura, M., Takagawa, H., Kanno, Y., Yamamoto, M., and Tanaka, T. (2009). Structural basis of alternative DNA recognition by Maf transcription factors. Mol Cell Biol 29, 6232-6244.

158 Kusserow, A., Pang, K., Sturm, C., Hrouda, M., Lentfer, J., Schmidt, H.A., Technau, U., von Haeseler, A., Hobmayer, B., Martindale, M.Q., et al. (2005). Unexpected complexity of the Wnt gene family in a sea anemone. Nature 433, 156-160.

Landschulz, W.H., Johnson, P.F., and Mcknight, S.L. (1989). The DNA-Binding Domain of the Rat-Liver Nuclear Protein-C/Ebp Is Bipartite. Science 243, 1681-1688.

Landt, S.G., Marinov, G.K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B.E., Bickel, P., Brown, J.B., Cayting, P., et al. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res 22, 1813-1831.

Lang, B.F., O'Kelly, C., Nerad, T., Gray, M.W., and Burger, G. (2002). The closest unicellular relatives of animals. Curr Biol 12, 1773-1778.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25.

Larroux, C. (2007). Genome content and developmental expression of transcription factor genes in the demosponge Amphimedon queenslandica : insights into the Ancestral Metazoan Developmental Program. In School of Biological Sciences (The University of Queensland).

Larroux, C., Fahey, B., Adamska, M., Richards, G.S., Gauthier, M., Green, K., Lovas, E., and Degnan, B.M. (2008a). Whole-mount in situ hybridization in amphimedon. CSH Protoc 2008, pdb prot5096.

Larroux, C., Fahey, B., Liubicich, D., Hinman, V.F., Gauthier, M., Gongora, M., Green, K., Worheide, G., Leys, S.P., and Degnan, B.M. (2006). Developmental expression of transcription factor genes in a demosponge: insights into the origin of metazoan multicellularity. Evol Dev 8, 150-173.

Larroux, C., Luke, G.N., Koopman, P., Rokhsar, D.S., Shimeld, S.M., and Degnan, B.M. (2008b). Genesis and expansion of metazoan transcription factor gene classes. Mol Biol Evol 25, 980-996.

Lassot, I., Segeral, E., Berlioz-Torrent, C., Durand, H., Groussin, L., Hai, T., Benarous, R., and Margottin-Goguet, F. (2001). ATF4 degradation relies on a phosphorylation-dependent interaction with the SCF(betaTrCP) ubiquitin ligase. Mol Cell Biol 21, 2192-2202.

Lee, D., Redfern, O., and Orengo, C. (2007). Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol 8, 995-1005.

Lee, J.E., Oney, M., Frizzell, K., Phadnis, N., and Hollien, J. (2015). Drosophila melanogaster activating transcription factor 4 regulates glycolysis during endoplasmic reticulum stress. G3 (Bethesda) 5, 667-675.

Lesluyes, T., Johnson, J., Machanick, P., and Bailey, T.L. (2014). Differential motif enrichment analysis of paired ChIP-seq experiments. BMC Genomics 15, 752.

Levin, M., Anavy, L., Cole, A.G., Winter, E., Mostov, N., Khair, S., Senderovich, N., Kovalev, E., Silver, D.H., Feder, M., et al. (2016). The mid-developmental transition and the evolution of animal body plans. Nature 531, 637-641.

Levine, M., Cattoglio, C., and Tjian, R. (2014). Looping back to leap forward: transcription enters a new era. Cell 157, 13-25.

159 Levine, M., and Tjian, R. (2003). Transcription regulation and animal diversity. Nature 424, 147- 151.

Leys, S.P., Cronin, T.W., Degnan, B.M., and Marshall, J.N. (2002). Spectral sensitivity in a sponge larva. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 188, 199-202.

Leys, S.P., and Degnan, B.M. (2001). Cytological basis of photoresponsive behavior in a sponge larva. Biol Bull 201, 323-338.

Leys, S.P., and Degnan, B.M. (2002). Embryogenesis and metamorphosis in a haplosclerid demosponge: gastrulation and transdifferentiation of larval ciliated cells to choanocytes. Invertebrate Biology 121, 171-189.

Leys, S.P., and Hill, A. (2012). The physiology and molecular biology of sponge tissues. Adv Mar Biol 62, 1-56.

Leys, S.P., Larroux, C., Gauthier, M., Adamska, M., Fahey, B., Richards, G.S., Degnan, S.M., and Degnan, B.M. (2008). Isolation of amphimedon developmental material. CSH Protoc 2008, pdb prot5095.

Li, B., Carey, M., and Workman, J.L. (2007). The role of chromatin during transcription. Cell 128, 707-719.

Li, D., Fu, F., Zhang, H., and Song, F. (2015). Genome-wide systematic characterization of the bZIP transcriptional factor family in tomato (Solanum lycopersicum L.). BMC Genomics 16, 771.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and Genome Project Data Processing, S. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.

Li, M., Ge, Q., Wang, W., Wang, J., and Lu, Z. (2011). c-Jun binding site identification in K562 cells. J Genet Genomics 38, 235-242.

Li, W., Notani, D., and Rosenfeld, M.G. (2016). Enhancers as non-coding RNA transcription units: recent insights and future perspectives. Nat Rev Genet 17, 207-223.

Lin, F.J., Song, W., Meyer-Bernstein, E., Naidoo, N., and Sehgal, A. (2001). Photic signaling by cryptochrome in the Drosophila circadian system. Mol Cell Biol 21, 7287-7294.

Liu, J., Chen, N., Chen, F., Cai, B., Dal Santo, S., Tornielli, G.B., Pezzotti, M., and Cheng, Z.M. (2014). Genome-wide analysis and expression profile of the bZIP transcription factor gene family in grapevine (Vitis vinifera). BMC Genomics 15, 281.

Liu, Y., Taverna, S.D., Muratore, T.L., Shabanowitz, J., Hunt, D.F., and Allis, C.D. (2007). RNAi- dependent H3K27 methylation is required for heterochromatin formation and DNA elimination in Tetrahymena. Genes & Development 21, 1530-1545.

Llorca, C.M., Potschin, M., and Zentgraf, U. (2014). bZIPs and WRKYs: two large transcription factor families executing two different functional strategies. Front Plant Sci 5, 169.

Lonze, B.E., and Ginty, D.D. (2002). Function and regulation of CREB family transcription factors in the nervous system. Neuron 35, 605-623.

160 Luo, Q., Viste, K., Urday-Zaa, J.C., Senthil Kumar, G., Tsai, W.W., Talai, A., Mayo, K.E., Montminy, M., and Radhakrishnan, I. (2012). Mechanism of CREB recognition and coactivation by the CREB-regulated transcriptional coactivator CRTC2. Proc Natl Acad Sci U S A 109, 20865- 20870.

Luo, S.Z., Baumeister, P., Yang, S.J., Abcouwer, S.F., and Lee, A.S. (2003). Induction of Grp78/BiP by translational block - Activation of the Grp78 promoter by ATF4 through an upstream ATF/CRE site independent of the endoplasmic reticulum stress elements. Journal of Biological Chemistry 278, 37375-37385.

Ma, Q. (2013). Role of nrf2 in oxidative stress and toxicity. Annu Rev Pharmacol Toxicol 53, 401- 426.

Machanick, P., and Bailey, T.L. (2011). MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696-1697.

Maere, S., Heymans, K., and Kuiper, M. (2005). BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21, 3448- 3449.

Majercak, J., Sidote, D., Hardin, P.E., and Edery, I. (1999). How a circadian clock adapts to seasonal decreases in temperature and day length. Neuron 24, 219-230.

Maki, Y., Bos, T.J., Davis, C., Starbuck, M., and Vogt, P.K. (1987). Avian sarcoma virus 17 carries the jun oncogene. Proc Natl Acad Sci U S A 84, 2848-2852.

Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnetjournal 17, 10.

Mayr, B., and Montminy, M. (2001). Transcriptional regulation by the phosphorylation-dependent factor CREB. Nat Rev Mol Cell Biol 2, 599-609.

McDonnell, A.V., Jiang, T., Keating, A.E., and Berger, B. (2006). Paircoil2: improved prediction of coiled coils from sequence. Bioinformatics 22, 356-358.

McLachlan, A.D., and Stewart, M. (1975). Tropomyosin coiled-coil interactions: Evidence for an unstaggered structure. Journal of Molecular Biology 98, 293-304.

Mechta-Grigoriou, F., Gerald, D., and Yaniv, M. (2001). The mammalian Jun proteins: redundancy and specificity. Oncogene 20, 2378-2389.

Mechta-Grigoriou, F., Giudicelli, F., Pujades, C., Charnay, P., and Yaniv, M. (2003). c-jun regulation and function in the developing hindbrain. Dev Biol 258, 419-431.

Merkenschlager, M., and Odom, D.T. (2013). CTCF and cohesin: linking gene regulatory elements with their targets. Cell 152, 1285-1297.

Metallo, S.J., and Schepartz, A. (1994). Distribution of labor among bZIP segments in the control of DNA affinity and specificity. Chemistry & Biology 1, 143-151.

Miller, M. (2006). Phospho-dependent protein recognition motifs contained in C/EBP family of transcription factors - in silico studies. Cell Cycle 5, 2501-2508.

161 Miller, M. (2009). The importance of being flexible: the case of basic region leucine zipper transcriptional regulators. Curr Protein Pept Sci 10, 244-269.

Miller, M., Shuman, J.D., Sebastian, T., Dauter, Z., and Johnson, P.F. (2003). Structural basis for DNA recognition by the basic region leucine zipper transcription factor CCAAT/enhancer-binding protein alpha. J Biol Chem 278, 15178-15184.

Mitsui, S., Yamaguchi, S., Matsuo, T., Ishida, Y., and Okamura, H. (2001). Antagonistic role of E4BP4 and PAR proteins in the circadian oscillatory mechanism. Genes Dev 15, 995-1006.

Moroz, L.L., Kocot, K.M., Citarella, M.R., Dosung, S., Norekian, T.P., Povolotskaya, I.S., Grigorenko, A.P., Dailey, C., Berezikov, E., Buckley, K.M., et al. (2014). The ctenophore genome and the evolutionary origins of neural systems. Nature 510, 109-114.

Motohashi, H., O'Connor, T., Katsuoka, F., Engel, J.D., and Yamamoto, M. (2002). Integration and diversity of the regulatory network composed of Maf and CNC families of transcription factors. Gene 294, 1-12.

Muller, W.E. (1995). Molecular phylogeny of Metazoa (animals): monophyletic origin. Naturwissenschaften 82, 321-329.

Nagadoi, A., Nakazawa, K., Uda, H., Okuno, K., Maekawa, T., Ishii, S., and Nishimura, Y. (1999). Solution structure of the transactivation domain of ATF-2 comprising a zinc finger-like subdomain and a flexible subdomain. J Mol Biol 287, 593-607.

Nakanishi, N., Sogabe, S., and Degnan, B.M. (2014). Evolutionary origin of gastrulation: insights from sponge development. BMC Biol 12, 26.

Natoli, G., and Andrau, J.C. (2012). Noncoding transcription at enhancers: general principles and functional models. Annu Rev Genet 46, 1-19.

Newman, J.R., and Keating, A.E. (2003). Comprehensive identification of human bZIP interactions with coiled-coil arrays. Science 300, 2097-2101.

Nichols, S.A., Roberts, B.W., Richter, D.J., Fairclough, S.R., and King, N. (2012). Origin of metazoan cadherin diversity and the antiquity of the classical cadherin/beta-catenin complex. Proc Natl Acad Sci U S A 109, 13046-13051.

Nijhawan, A., Jain, M., Tyagi, A.K., and Khurana, J.P. (2008). Genomic survey and gene expression analysis of the basic leucine zipper transcription factor family in rice. Plant Physiol 146, 333-350.

Nunez-Corcuera, B., Birch, J.L., Yamada, Y., and Williams, J.G. (2012). Transcriptional repression by a bZIP protein regulates Dictyostelium prespore differentiation. PLoS One 7, e29895.

Oliveri, P., Fortunato, A.E., Petrone, L., Ishikawa-Fujiwara, T., Kobayashi, Y., Todo, T., Antonova, O., Arboleda, E., Zantke, J., Tessmar-Raible, K., et al. (2014). The Cryptochrome/Photolyase Family in aquatic organisms. Mar Genomics 14, 23-37.

Parfrey, L.W., and Lahr, D.J.G. (2013). Multicellularity arose several times in the evolution of eukaryotes (Response to DOI 10.1002/bies.201100187). BioEssays 35, 339-347.

Patro, R., and Kingsford, C. (2013). Predicting protein interactions via parsimonious network history inference. Bioinformatics 29, i237-246.

162 Patthy, L. (1987). Intron-dependent evolution: preferred types of exons and introns. FEBS Lett 214, 1-7.

Peter, I.S., and Davidson, E.H. (2015). Genomic Control Process: Development and Evolution (London: Academic Press).

Pham, T.H., Minderjahn, J., Schmidl, C., Hoffmeister, H., Schmidhofer, S., Chen, W., Langst, G., Benner, C., and Rehli, M. (2013). Mechanisms of in vivo binding site selection of the hematopoietic master transcription factor PU.1. Nucleic Acids Res 41, 6391-6402.

Philippe, H., Derelle, R., Lopez, P., Pick, K., Borchiellini, C., Boury-Esnault, N., Vacelet, J., Renard, E., Houliston, E., Queinnec, E., et al. (2009). Phylogenomics revives traditional views on deep animal relationships. Curr Biol 19, 706-712.

Potapov, V., Kaplan, J.B., and Keating, A.E. (2015). Data-driven prediction and design of bZIP coiled-coil interactions. PLoS Comput Biol 11, e1004046.

Pourabed, E., Ghane Golmohamadi, F., Soleymani Monfared, P., Razavi, S.M., and Shobbar, Z.S. (2015). Basic leucine zipper family in barley: genome-wide characterization of members and expression analysis. Mol Biotechnol 57, 12-26.

Ptashne, M., and Gann, A. (2002). Genes and Signals (Cold Spring Harbor, NY: Cold Spring Harbor Lab Press).

Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842.

R Development Core team. R: A language and environment for statistical computing (Vienna, Austria: R Foundation for Statistical Computing).

Ramji, D.P., and Foka, P. (2002). CCAAT/enhancer-binding proteins: structure, function and regulation. Biochem J 365, 561-575.

Reinke, A.W. (2012). Determining Protein Interaction Specificity of Native and Designed bZIP Family Transcription Factor. In DEPARTMENT OF BIOLOGY (MASSACHUSETTS INSTITUTE OF TECHNOLOGY).

Reinke, A.W., Baek, J., Ashenberg, O., and Keating, A.E. (2013). Networks of bZIP protein-protein interactions diversified over a billion years of evolution. Science 340, 730-734.

Reitzel, A.M., Behrendt, L., and Tarrant, A.M. (2010). Light entrained rhythmic gene expression in the sea anemone Nematostella vectensis: the evolution of the animal circadian clock. PLoS One 5, e12805.

Reitzel, A.M., Sullivan, J.C., Traylor-Knowles, N., and Finnerty, J.R. (2008). Genomic survey of candidate stress-response genes in the estuarine anemone Nematostella vectensis. Biol Bull 214, 233-254.

Reitzel, A.M., Tarrant, A.M., and Levy, O. (2013). Circadian clocks in the cnidaria: environmental entrainment, molecular regulation, and organismal outputs. Integr Comp Biol 53, 118-130.

Rep, M., Proft, M., Remize, F., Tamas, M., Serrano, R., Thevelein, J.M., and Hohmann, S. (2001). The Saccharomyces cerevisiae Sko1p transcription factor mediates HOG pathway-dependent

163 osmotic regulation of a set of genes encoding enzymes implicated in protection from oxidative damage. Mol Microbiol 40, 1067-1083.

Reppert, S.M., and Weaver, D.R. (2002). Coordination of circadian timing in mammals. Nature 418, 935-941.

Richards, G.S. (2010). The origins of cell communication in the animal kingdom: Notch signalling during embryogenesis and metamorphosis of the demosponge Amphimedon queenslandica. In School of Biological Sciences (The University of Queensland).

Richards, G.S., and Degnan, B.M. (2012). The expression of Delta ligands in the sponge Amphimedon queenslandica suggests an ancient role for Notch signaling in metazoan development. Evodevo 3, 15.

Rivera, A.S., Ozturk, N., Fahey, B., Plachetzki, D.C., Degnan, B.M., Sancar, A., and Oakley, T.H. (2012). Blue-light-receptive cryptochrome is expressed in a sponge eye lacking neurons and opsin. J Exp Biol 215, 1278-1286.

Roadmap Epigenomics, C., Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi- Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317-330.

Robertson, G., Hirst, M., Bainbridge, M., Bilenky, M., Zhao, Y.J., Zeng, T., Euskirchen, G., Bernier, B., Varhol, R., Delaney, A., et al. (2007). Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature Methods 4, 651-657.

Ron, D. (2002). Translational control in the endoplasmic reticulum stress response. Journal of Clinical Investigation 110, 1383-1388.

Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D.L., Darling, A., Hohna, S., Larget, B., Liu, L., Suchard, M.A., and Huelsenbeck, J.P. (2012). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61, 539-542.

Rubin, E.B., Shemesh, Y., Cohen, M., Elgavish, S., Robertson, H.M., and Bloch, G. (2006). Molecular and phylogenetic analyses reveal mammalian-like clockwork in the honey bee (Apis mellifera) and shed new light on the molecular evolution of the circadian clock. Genome Res 16, 1352-1365.

Ruiz-Trillo, I., Roger, A.J., Burger, G., Gray, M.W., and Lang, B.F. (2008). A phylogenomic investigation into the origin of metazoa. Molecular Biology and Evolution 25, 664-672.

Rutkowski, D.T., and Kaufman, R.J. (2003). All roads lead to ATF4. Developmental Cell 4, 442- 444.

Ryan, J.F., Burton, P.M., Mazza, M.E., Kwong, G.K., Mullikin, J.C., and Finnerty, J.R. (2006). The cnidarian-bilaterian ancestor possessed at least 56 : evidence from the starlet sea anemone, Nematostella vectensis. Genome Biology 7.

Ryan, J.F., Pang, K., Schnitzler, C.E., Nguyen, A.D., Moreland, R.T., Simmons, D.K., Koch, B.J., Francis, W.R., Havlak, P., Program, N.C.S., et al. (2013). The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science 342, 1242592.

164 Sassonecorsi, P., Ransone, L.J., Lamph, W.W., and Verma, I.M. (1988). Direct Interaction between Fos and Jun Nuclear Oncoproteins - Role of the Leucine Zipper Domain. Nature 336, 692-695.

Schep, A.N., and Adryan, B. (2013). A comparative analysis of transcription factor expression during metazoan embryonic development. PLoS One 8, e66826.

Schuster-Bockler, B., Schultz, J., and Rahmann, S. (2004). HMM Logos for visualization of protein families. BMC Bioinformatics 5, 7.

Schwaiger, M., Schonauer, A., Rendeiro, A.F., Pribitzer, C., Schauer, A., Gilles, A.F., Schinko, J.B., Renfer, E., Fredman, D., and Technau, U. (2014). Evolutionary conservation of the eumetazoan gene regulatory landscape. Genome Res 24, 639-650.

Sebe-Pedros, A., Ballare, C., Parra-Acero, H., Chiva, C., Tena, J.J., Sabido, E., Gomez-Skarmeta, J.L., Di Croce, L., and Ruiz-Trillo, I. (2016). The Dynamic Regulatory Genome of Capsaspora and the Origin of Animal Multicellularity. Cell 165, 1224-1237.

Sebe-Pedros, A., de Mendoza, A., Lang, B.F., Degnan, B.M., and Ruiz-Trillo, I. (2011). Unexpected repertoire of metazoan transcription factors in the unicellular holozoan Capsaspora owczarzaki. Mol Biol Evol 28, 1241-1254.

Seitan, V.C., Faure, A.J., Zhan, Y., McCord, R.P., Lajoie, B.R., Ing-Simmons, E., Lenhard, B., Giorgetti, L., Heard, E., Fisher, A.G., et al. (2013). Cohesin-based chromatin interactions enable regulated gene expression within preexisting architectural compartments. Genome Res 23, 2066- 2077.

Shamu, C.E. (1998). Splicing: HACking into the unfolded-protein response. Current Biology 8, R121-R123.

Shan, J.X., Fu, L.C., Balasubramanian, M.N., Anthony, T., and Kilberg, M.S. (2012). ATF4- dependent Regulation of the JMJD3 Gene during Amino Acid Deprivation Can Be Rescued in Atf4-deficient Cells by Inhibition of Deacetylation. Journal of Biological Chemistry 287, 36393- 36403.

Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498-2504.

Shaulian, E., and Karin, M. (2002). AP-1 as a regulator of cell life and death. Nature Cell Biology 4, E131-E136.

Shaywitz, A.J., and Greenberg, M.E. (1999). CREB: a stimulus-induced transcription factor activated by a diverse array of extracellular signals. Annu Rev Biochem 68, 821-861.

Shen, L., Shao, N., Liu, X., and Nestler, E. (2014). ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genomics 15, 284.

Shinzato, C., Shoguchi, E., Kawashima, T., Hamada, M., Hisata, K., Tanaka, M., Fujie, M., Fujiwara, M., Koyanagi, R., Ikuta, T., et al. (2011). Using the Acropora digitifera genome to understand coral responses to environmental change. Nature 476, 320-U382.

Shlyueva, D., Stampfel, G., and Stark, A. (2014). Transcriptional enhancers: from properties to genome-wide predictions. Nat Rev Genet 15, 272-286.

165 Shoguchi, E., Tanaka, M., Shinzato, C., Kawashima, T., and Satoh, N. (2013). A genome-wide survey of photoreceptor and circadian genes in the coral, Acropora digitifera. Gene 515, 426-431.

Simionato, E., Ledent, V., Richards, G., Thomas-Chollier, M., Kerner, P., Coornaert, D., Degnan, B.M., and Vervoort, M. (2007). Origin and diversification of the basic helix-loop-helix gene family in metazoans: insights from comparative genomics. BMC Evol Biol 7, 33.

Simpson, T.L. (1984). The cell biology of sponges (New York: Springer).

Sogabe, S., Nakanishi, N., and Degnan, B.M. (2016). The ontogeny of choanocyte chambers during metamorphosis in the demosponge Amphimedon queenslandica. Evodevo 7.

Sperling, E.A., Peterson, K.J., and Pisani, D. (2009). Phylogenetic-signal dissection of nuclear housekeeping genes supports the paraphyly of sponges and the monophyly of . Mol Biol Evol 26, 2261-2274.

Srivastava, M., Simakov, O., Chapman, J., Fahey, B., Gauthier, M.E., Mitros, T., Richards, G.S., Conaco, C., Dacre, M., Hellsten, U., et al. (2010). The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466, 720-726.

Stamatakis, A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688-2690.

Stangegaard, M., Dufva, I.H., and Dufva, M. (2006). Reverse transcription using random pentadecamer primers increases yield and quality of resulting cDNA. In BioTechniques, pp. 649Ð 657.

Steven, A., and Seliger, B. (2016). Control of CREB expression in tumors: from molecular mechanisms and signal transduction pathways to therapeutic target. Oncotarget 7, 35454-35465.

Suga, H., Chen, Z., de Mendoza, A., Sebe-Pedros, A., Brown, M.W., Kramer, E., Carr, M., Kerner, P., Vervoort, M., Sanchez-Pons, N., et al. (2013). The Capsaspora genome reveals a complex unicellular prehistory of animals. Nat Commun 4, 2325.

Supek, F., Bosnjak, M., Skunca, N., and Smuc, T. (2011). REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One 6, e21800.

Sutherland, J.A., Cook, A., Bannister, A.J., and Kouzarides, T. (1992). Conserved motifs in Fos and Jun define a new class of activation domain. Genes & Development 6, 1810-1819.

Szathmary, E., and Smith, J.M. (1995). The Major Evolutionary Transitions. Nature 374, 227-232.

Thompson, M.R., Xu, D.K., and Williams, B.R.G. (2009). ATF3 transcription factor and its emerging roles in immunity and cancer. Journal of Molecular Medicine-Jmm 87, 1053-1060.

Tian, C., Li, J., and Glass, N.L. (2011). Exploring the bZIP transcription factor regulatory network in Neurospora crassa. Microbiology 157, 747-759.

Tiebe, M., Lutz, M., De La Garza, A., Buechling, T., Boutros, M., and Teleman, A.A. (2015). REPTOR and REPTOR-BP Regulate Organismal Metabolism and Transcription Downstream of TORC1. Dev Cell 33, 272-284.

Tsai, Y.C., and Weissman, A.M. (2010). The Unfolded Protein Response, Degradation from Endoplasmic Reticulum and Cancer. Genes Cancer 1, 764-778.

166 Tsukada, J., Yoshida, Y., Kominato, Y., and Auron, P.E. (2011). The CCAAT/enhancer (C/EBP) family of basic-leucine zipper (bZIP) transcription factors is a multifaceted highly-regulated system for gene regulation. Cytokine 54, 6-19.

Turner, B.M. (2007). Defining an epigenetic code. Nat Cell Biol 9, 2-6. van Dam, H., and Castellazzi, M. (2001). Distinct roles of Jun : Fos and Jun : ATF dimers in oncogenesis. Oncogene 20, 2453-2464. van den Beucken, T., Koritzinsky, M., Niessen, H., Dubois, L., Savelkouls, K., Mujcic, H., Jutten, B., Kopacek, J., Pastorekova, S., van der Kogel, A.J., et al. (2009). Hypoxia-induced Expression of Carbonic Anhydrase 9 Is Dependent on the Unfolded Protein Response. Journal of Biological Chemistry 284, 24204-24212. van Straaten, F., Muller, R., Curran, T., Van Beveren, C., and Verma, I.M. (1983). Complete nucleotide sequence of a human c-onc gene: deduced amino acid sequence of the human c-fos protein. Proc Natl Acad Sci U S A 80, 3183-3187.

Vaquerizas, J.M., Kummerfeld, S.K., Teichmann, S.A., and Luscombe, N.M. (2009). A census of human transcription factors: function, expression and evolution. Nat Rev Genet 10, 252-263.

Vinson, C., Acharya, A., and Taparowsky, E.J. (2006). Deciphering B-ZIP transcription factor interactions in vitro and in vivo. Biochim Biophys Acta 1759, 4-12.

Vinson, C., Myakishev, M., Acharya, A., Mir, A.A., Moll, J.R., and Bonovich, M. (2002). Classification of human B-ZIP proteins based on dimerization properties. Mol Cell Biol 22, 6321- 6335.

Vize, P.D. (2009). Transcriptome analysis of the circadian regulatory network in the coral Acropora millepora. Biol Bull 216, 131-137.

Wang, Z., Cheng, K., Wan, L., Yan, L., Jiang, H., Liu, S., Lei, Y., and Liao, B. (2015). Genome- wide analysis of the basic leucine zipper (bZIP) transcription factor gene family in six legume genomes. BMC Genomics 16, 1053.

Wei, K., Chen, J., Wang, Y., Chen, Y., Chen, S., Lin, Y., Pan, S., Zhong, X., and Xie, D. (2012). Genome-wide analysis of bZIP-encoding genes in maize. DNA Res 19, 463-476.

Weirauch, M.T., and Hughes, T.R. (2011). A catalogue of eukaryotic transcription factor types, their evolutionary origin, and species distribution. In A handbook of transcriptions factors (Springer), pp. 25-73.

Whelan, N.V., Kocot, K.M., Moroz, L.L., and Halanych, K.M. (2015). Error, signal, and the placement of Ctenophora sister to all other animals. Proc Natl Acad Sci U S A.

Wu, H., Caffo, B., Jaffee, H.A., Irizarry, R.A., and Feinberg, A.P. (2010). Redefining CpG islands using hidden Markov models. Biostatistics 11, 499-514.

Xia, Y., and Karin, M. (2004). The control of cell motility and epithelial morphogenesis by Jun kinases. Trends Cell Biol 14, 94-101.

Yamaguchi, S., Mitsui, S., Yan, L., Yagita, K., Miyake, S., and Okamura, H. (2000). Role of DBP in the circadian oscillatory mechanism. Mol Cell Biol 20, 4773-4781.

167 Yang, Y., and Cvekl, A. (2007). Large Maf Transcription Factors: Cousins of AP-1 Proteins and Important Regulators of Cellular Differentiation. Einstein J Biol Med 23, 2-11.

Ye, W., Wang, Y., Dong, S., Tyler, B.M., and Wang, Y. (2013). Phylogenetic and transcriptional analysis of an expanded bZIP transcription factor family in Phytophthora sojae. BMC Genomics 14, 839.

Yilmaz, A., Mejia-Guerra, M.K., Kurz, K., Liang, X., Welch, L., and Grotewold, E. (2011). AGRIS: the Arabidopsis Gene Regulatory Information Server, an update. Nucleic Acids Res 39, D1118-1122.

Yin, W., Reinke, A.W., Szilagyi, M., Emri, T., Chiang, Y.-M., Keating, A.E., Pocsi, I., Wang, C.C.C., and Keller, N.P. (2013). bZIP transcription factors affecting secondary metabolism, sexual development and stress responses in Aspergillus nidulans.

Yoshida, H., Matsui, T., Yamamoto, A., Okada, T., and Mori, K. (2001). XBP1 mRNA is induced by ATF6 and spliced by IRE1 in response to ER stress to produce a highly active transcription factor. Cell 107, 881-891.

Yu, K., Mo, D., Wu, M., Chen, H., Chen, L., Li, M., and Chen, Y. (2014). Activating transcription factor 4 regulates adipocyte differentiation via altering the coordinate expression of CCATT/enhancer binding protein beta and peroxisome proliferator-activated receptor gamma. FEBS J 281, 2399-2409.

Yu, W., and Hardin, P.E. (2006). Circadian oscillators of Drosophila and mammals. J Cell Sci 119, 4793-4795.

Yuen, B. (2016). Deciphering the genomic tool-kit underlying animal-bacteria interactions: Insights through the demonsponge Amphimedon queenslandica In School of Biological Sciences (The University of Queensland).

Zaret, K.S., and Carroll, J.S. (2011). Pioneer transcription factors: establishing competence for gene expression. Genes Dev 25, 2227-2241.

Zenz, R., and Wagner, E.F. (2006). Jun signalling in the epidermis: From developmental defects to psoriasis and skin tumors. International Journal of Biochemistry & Cell Biology 38, 1043-1049.

Zhang, J., and Yang, J.R. (2015). Determinants of the rate of protein sequence evolution. Nat Rev Genet 16, 409-420.

Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., Myers, R.M., Brown, M., Li, W., et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137.

Zhou, V.W., Goren, A., and Bernstein, B.E. (2011). Charting histone modifications and the functional organization of mammalian genomes. Nat Rev Genet 12, 7-18.

Zhukovskaya, N.V., Fukuzawa, M., Yamada, Y., Araki, T., and Williams, J.G. (2006). The Dictyostelium bZIP transcription factor DimB regulates prestalk-specific gene expression. Development 133, 439-448.

168 APPENDICES

Note: Additional appendices Appendices that are too large to be included in the following section are available for download via Cloudstor+, a cloud storage and sharing web service run by AARNet (Australian Academic and Research Network). These appendices are referenced in-text, and their titles and descriptions are listed in this Appendices section, in the order in which they would normally occur. These online- only files are marked with an asterisk here.

Download information for these files is as follows: Link: https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r Password: bZIP Link expiry date: None These files are also available upon request to the author (Katia Jindrich) at [email protected] or [email protected].

Appendix 2.1. bZIP dataset. * First tab contains databases information, method of protein retrieval and references. Following tabs include the full list of the protein sequences used in this study, accession information and, when relevant, family assignment. * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 2.2. Phylogenetic analysis of the bZIP complement in Metazoa.* (A) Mid-point rooted maximum likelihood tree of unambiguously identified bZIPs from 6 representative species (3 bilaterians (H. sapiens, B. floridae and D. pulex), 1 sponge (A. queenslandica), 1 filasterean (C. owczarzaki) and 1 fungus (S. cerevisiae). Poriferan branches are shown in orange, filasterean branches in pink and fungal branches in green. bZIP families are shaded in grey. (B) Alignments of PAR-L, OASIS-L and REPTOR subfamilies. Shading indicates residue conservation at a given position, decreasing from black (100%) to light grey (60%). Typical PAR- and OASIS- bZIPs are included at the top of the alignment for reference; triangles indicate the positions discussed in the text. * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

169 Appendix 2.3. Diversification of bZIP subfamilies. * Mid-point rooted Bayesian inference trees of the bZIP members of five families: OASIS, PAR, FOS-ATF3, CEBP and MAF. Posterior probabilities are displayed on the branches. Hsap: Homo Sapiens; Brfl: Branchiostoma floridae; Spur: Strongylocentrotus purpuratus; Dmeg: Drosophila melanogaster; Tcas: Tribolium castaneum ; Dapu: Daphnia pulex; Lotg: Lottia gigantean; Ctel: Capitella telata; Nmev: Nematostella vectensis; Hmag: Hydra magnipapillata; Adi: Acropora digitifera ;Tadh: Trichoplax adherans; Amq: Amphimedon queenslandica; Mnle: Mnemiopsis leidyi. * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 2.4. Family-specific amino acid analysis. Based on sequence similarity, we built a consensus sequence for each family. The number of sequences used to build this consensus is indicated in parentheses. As a reference, the canonical bZIP sequence is shown in the box above the alignment and black circles indicate five highly conserved residues of the basic domain discussed in the text. We only display residues that are specific to a bZIP family; residues that are conserved in all bZIPs appear as a dot in the alignment. Yellow highlighted residues are positions that are most specific to each family and that were used to confirm family assignment in this study.

Canonical bZIP sequence

Basic region Leucine zipper

170 Appendix 2.5. Statistical support for family assignment.* We reported the boostrap values (NJ: neighbor-joining and ML: maximum likelihood) and posterior probabilities (BI: Bayesian inference) that supported the assignment of each protein to a bZIP family, listed on the left. Taxa are listed at the top. * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 2.6. Emergence of new domain combinations in metazoan bZIP families. A bar represents a typical bZIP family member, on which we indicated the positions of the bZIP domain (BRLZ, in yellow) and a conserved motif (in grey). Three of these motifs Ð KID in CREB, TAD in ATF2 and HOB 1-2 in JUN- have been previously described in bilaterian bZIPs. Each motif is displayed as a logo. Coloured circles to the right indicate the presence of this motif in the taxa listed on the top right. Hs: Homo sapiens; Dm: Drosophila melanogaster; Hm: Hydra magnipapillata; Aq: Amphimedon queenslandica; Mb: Monosiga brevicollis; Co: Capsaspora owczarzaki; Sc: Saccharomyces cerevisiae.

171 Appendix 3.1. Genomic features of AqbZIPs * First tab contains the accession numbers and gene coordinates of AqbZIPs; Second tab contains the list and genomic locations AqbZIPs introns. * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 3.2. AqbZIPs cDNA sequences* * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 3.3. Script use for correlation analysis* * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 3.4. Gene-specific primers and description of RNA probes used for in-situ hybridisation

Gene targeted Probe size (bp) Primers Forward ACACTCCCTCTGTCTCCTCT AqJUN 749 Reverse AGGAAGTGGTGCTTTCGGAG Forward CCTTTGCGAGTCACCATCCT AqMAF 904 Reverse ACGGTTCTTCAGTGTCCGTC Forward TCCTTGCTCGACTCCAGAAT AqFOS 712 Reverse CAAGTGCTTCGTTGGACTCA Forward TCACTTCAAGGTGCAAGCCT AqCEBP 839 Reverse AGCTCTTGTACTCTTTTCTCTGTCT Forward CACAACACCGAGACTCCTCC AqPARa 940 Reverse TCCACAGGGAGAGCGTCATA Forward TCCCAGATGAGGAGGACCAG AqATF4 784 Reverse AGTACGCTTTGGAGTGGGTG Forward TTGCTTGGAGAGGCGAGAAG AqOASIS 898 Reverse TGTCCCTCTAGACCTCTGCA Forward CCCTCACTGGTCTAGAGCCT AqNFE2a 882 Reverse GAGGGCGATGGAGTATGACG Forward GGCTAGAAGACCCCGGACTA AqNFE2b 936 Reverse AGCTTGTCCCGAGTGATTGG Forward AAACCAGCCAAACCACTCCA AqXBP1 869 Reverse TGGGTTCCAGCCAGTTTGTT Forward ACCACCTGAGAAGCAATCGG AqCREB 876 Reverse AGGAGAGCCTACGAAACCCT

172 Appendix 3.5. Immunohistochemitry protocol* * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 3.6. AqbZIPs and other genes located on Contig 13482

Appendix 3.7. Protein sequence conservation within the AqbZIPs* Multiple alignments of the full-length protein sequences of the AqbZIPs. Conserved residues (50%) are highlighted. * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

173 Appendix 3.8. Control of Anti-ATF4 antibody specificity in larva (A-B2) No nuclear staining was detected in larvae labelled with a pre-absorbed (Pa) anti- ATF4 antibody (A-A2) or the pre-immune serum (B-B1). Nuclei are stained with DAPI in (A1-B1). (A2) and (B2) are a merge of (A-A1) and (B-B1), respectively. (C) Whole mount micrograph showing the larva cell layers. Nuclei are labelled with DAPI as indicated. Scale bars: (A-B2) 30µm, (C) 100µm. (C) image by G. Richards.

174 Appendix 3.9. Control of Anti-JUN antibody specificity in juvenile No nuclear staining was detected in juvenile (oscula stage) labelled with a pre-absorbed (Pa) anti-JUN antibody. (C) is a merge of (A-B) and C2 is an enlargement of (C). Nuclei are labelled with DAPI as indicated. Scale bars: (A-C) 20µm, (C2) 5µm

175 Appendix 3.10. Genes correlated with the AqbZIPs * * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

176

Appendix 3.11. GO terms enriched in the list of genes correlated with the AqbZIPs

177 Appendix 3.12. Annotation of AqbZIP leucine zippers relative to their human and cnidarian orthologs.* * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 3.13. Interactions scores for all AqbZIPs possible dimers, using the two scoring functions of the bZIPs scoring form.* * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 4.1. Accession numbers of A. queenslandica candidate circadian genes

Accession number Family Gene Aqu1 Aqu2 Aqu2.1 Ensembl NCBI AqCLOCK Aqu1.220834 AQU2.28537_001 Aqu2.1.31785_001 PAC:15719362 XM_003386608.1 AqpARNT3 Aqu1.212589 AQU2.17581_001 Aqu2.1.19264_001 PAC:15711117 XM_003389652.1 XM_003389856.1//XM AqpARNT1 Aqu1.211840 AQU2.16593_001 Aqu2.1.18085_001 PAC:15710368 _003389651.1 bHLH-PAS AqNPAS-like 1 Aqu1.222845 AQU2.31028_001 Aqu2.1.34613_001 PAC:15721373 AqAHR Aqu1.222846 AQU2.31029_001 Aqu2.1.34614_001 PAC:15721374 AqNPAS-like 2 Aqu1.222843 AQU2.31026_001 Aqu2.1.34611_001 PAC:15721371 AqpARNT2 Aqu1.212588 AQU2.17588_001 Aqu2.1.19262_001 PAC:15711116 AqTO2 Aqu1.215545 AQU2.21803_001 Aqu2.1.24064_001 Timeless AqTO3 Aqu1.203589 AQU2.04592_001 Aqu2.1.05286_001 XM_003388630.1 AqTO1 Aqu1.217007 AQU2.23783_001 Aqu2.1.26309_001 XM_003388112.1 AqCry2 Aqu1.221089 AQU2.28820_001 Aqu2.1.32109_001 XM_003386521.1 Cryptochromes AqCry1 Aqu1.221016 AQU2.28726_001 Aqu2.1.32002_001 XM_003386534.1 AqPARa Aqu1.224916 AQU2.33675_001 Aqu2.1.37571_001 PAC:15723444 XM_003384841.1 Par-bZIP AqPARb Aqu1.224866 Aqu2.33631_001 Aqu2.1.37521_001 PAC:15723394

178 Appendix 4.2. Light intensity report Light intensity was measured across several days in the tanks where the experiences were conducted. At the bottom of the table, the light intensity experienced by larvae reared under constant light is indicated for reference.

Date time Light intensity (µmol) 4.30pm 14.4 4.45pm 13.3 5pm 12.85 5.25pm 11.8 5.30pm 9.58 5/03/2015 5.45pm 7.45 6pm 4.6 6.10pm 2.4 6.15pm 1.4 6.30pm 0.12 6.45pm 0.03 12pm 200 6/03/2015 2pm 290 3.30pm 34 13/03/2015 4pm 23 5.45pm 0.002 2.30pm 57.2 3pm 34 4pm 23.8 5pm 15.2 15/03/2015 7pm dark 8pm dark 9pm dark 11pm dark Constant light 12.5

Appendix 4.3. qPCR primers

Annealing mRNA product Name Aqu2.1 Model Forward Primer Reverse Primer temperature length (bp) AqClock Aqu2.1.31785_001 GGATCTCTATCGGGTATCGGG TGAGCTACAGGAATCACCCATT 61 255 AqpARNT3 Aqu2.1.19264_001 GCTCCACCACGAAGCAATGAAG CTTGAGCCTGCTCTGACTTCTCC 57 203 AqpARNT1 Aqu2.1.18085_001 TGCCGTGCTACATTGTACTGGT CAAAATATTAGAGACACGTGGGTC 57 244 AqTO1 Aqu2.1.26309_001 GCAACTACCAACTGAAGACCAG CATCAGGCCATACTTCACGAGC 57 316 AqPARa Aqu2.1.37571_001 GGACTGTTTGCATCATTCCAG GTCCTCGTAGTCCTCTGTATCAGT 61 225 AqCry2 Aqu2.1.32109_001 CTTGCTAAGAGAATCGATTGCTAC CGAGATTGGATTATAGTGCTC 58 187 AqCry1 Aqu2.1.32002_001 GGTATCCATGGATCGATGCAG CAGACTGTAGAGACCGGTAGCTC 57 181 AqYWAHZ*/** Aqu2.1.29077_001 AGCGGAGGGCAAAAAGAA CCTCGGCAAGATAGCGATAGTAGT 63 185 AqTetraspanin** Aqu2.1.39600_001 GCCACTATCATGACTCTTGCTG CAGCCAACGCTACAATCACT 63 139 Aq FK506 binding protein** Aqu2.1.31343_001 GCTGTGTCAAGGCAAATCCT AGGACACCACTGCTTGTGAA 63 225 AqECH*/** Aqu2.1.32897_001 GGTGAACGTATTGGTGAGTTC GTTTCTCAAGGAAGGCAGTC 61 172 AqGAPDH*/** Aqu2.1.29717_001 GCACCTTCTGCTGATGCT ACGACCATCACGCCATTT 61 237 Aqu1.228535* Aqu2.1.42249_001 GCTCTGTCTCATCAGCCACTT CATTGGCTTGATTCCCATTT 61 164 Aqu1.225621** Aqu2.1.38453_001 GCATTGAGAGGAAAGCCACT TGCATCAGGCTCAATAGCAA 63 179 AqSDHA* AGGGGAGTGGTAGCTATGAA TGAAACTGTACAAACTCCATGTCT 58 194

179 Appendix 4.4. Temporal gene expression of circadian-like genes in adult Amphimedon queenslandica. 3 adult sponges were sampled in the field throughout 5 days, during spring (September) and summer (January). Shaded area indicates period of night. Error bars are ± SEM.

Spring Summer

Sponge1 Sponge2 Sponge3 Sponge 4 Sponge 5 Sponge 6 8 8

6 6 AqCRY2 4 4

2 2 Relative expression

2a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 4p 4a 6a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 6p 4a 10a12p 10a 10a12p 10a

15 15

10 10 AqPARa

5 5 Relative expression

2a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 4p 4a 6a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 6p 4a 10a12p 10a 10a12p 10a

20 20

15 15

AqClock 10 10

5 5 Relative expression 0 0

2a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 4p 4a 6a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 6p 4a 10a12p 10a 10a12p 10a Time Time

180 Appendix 4.5. Developmental expression profiles of candidate circadian genes. Average gene expression levels of the candidate circadian genes identified in this study across seventeen developmental stages, indicated along the x-axis. Expression values are CEL-Seq normalised read counts. Error bars are standard deviations. Shadings indicate three morphological phases: embryonic stages inside the maternal brood chamber (brown), larval stages (white) and fully metamorphosed and feeding adult stages (blue). Error bars are +SD. p.s = post settlement.

800 2500 400 AqCRY2 AqPARa AqClock 2000 600 300 1500 400 200 1000 200 100 500

0 600 150 150 AqARNT3 AqNPAS-like1 AqpARNT2

400 100 100

200 50 50

150 6000 500 AqNPAS-like2 AqpARNT1 AqCRY1 400 100 4000 300

200 50 2000 100

150 150 150 AqTO1 AqTO2 AqTO3

100 100 100

50 50 50 Normalised expression (counts per million mapped)

Spot Ring 150 AqAHR 800 AqPARb Cloud adult Brown Larvae1 hr p.s oscula Cleavage Late spotLate ring 6-7 hr p.s chambertent-pole 11-1223-24 hr p.s hr p.s 600 100

400

50 200

0

Spot Ring Spot Ring Cloud adult Cloud adult Brown Larvae1 hr p.s oscula Brown Larvae1 hr p.s oscula Cleavage Late spotLate ring 6-7 hr p.s chambertent-pole Cleavage Late spotLate ring 6-7 hr p.s chambertent-pole 11-1223-24 hr p.s hr p.s 11-1223-24 hr p.s hr p.s

181 Appendix 4.6. Temporal gene expression of circadian-like genes in adult Amphimedon queenslandica. 3 adult sponges were sampled in the field throughout 5 days, during spring (September) and summer (January). Shaded area indicates period of night. Error bars are ± SEM.

Spring Summer

Sponge 1 Sponge 2 Sponge 3 Sponge 4 Sponge 5 Sponge 6 10 10

8 8

6 6

AqCRY1 4 4

2 2 Relative expression

2a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 4p 4a 6a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 6p 4a 10a12p 10a 10a12p 10a

5 5

4 4

AqpARNT3 3 3 2 2

1 1 Relative expression

2a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 4p 4a 6a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 6p 4a 10a12p 10a 10a12p 10a

5 5

4 4

AqTO1 3 3

2 2

1 1 Relative expression

2a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 4p 4a 6a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 6p 4a 10a12p 10a 10a12p 10a

5 25

4 15 5 3 AqpARNT1 4 2 3 2 1 1 Relative expression

2a 4a 6a 8a 2p 4p 6p 4a 6a 4a 6a 2p 4p 4a 6a 4a 6a 8a 2p 4p 6p 4a 6a 10a12p 10a 10a12p Time Time

182 Appendix 4.7. Expression of AqCRY2, AqPARa and AqClock orthologues in individual larvae throughout a day. Larval cohorts were naturally released by maternal sponges and collected in flow traps in one-hour intervals: 12-1pm (light blue triangles), 1- 2pm (black triangles), 2-3pm (purple triangles) and 3-4pm (green triangles). Four to five individual larvae were collected from each cohort at hourly intervals post release (as numbers permitted) and processed for qPCR analysis of circadian gene orthologues AqCRY2 (A), AqPARa (B) and AqClock (C). Shaded area = natural darkness.

Aqcry2 AqPARa AqClock 1.0 A 1.0 B 1.0 C 0.5 0.5 0.5

0.0 0.0 0.0

-0.5 -0.5 -0.5

-1.0 -1.0 -1.0 Relative expression

1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10 11 Time (pm) Time (pm) Time (pm)

Appendix 5.1. Conservation of ATF4 protein sequence across metazoans Multiple alignment of ATF4 orthologues in representative metazoan species (Chapter 2), trimmed to their bZIP domain. Conserved residues (50%) are highlighted. The portion of the bZIP domain that mediates DNA-binding is boxed, and DNA-contacting residues are marked by a red dot. Hp: Homo Sapiens; Bf: Branchiostoma floridae; Dm: Drosophila melanogaster; Tc: Tribolium castaneum ; Dp: Daphnia pulex; Lg: Lottia gigantean; Ct: Capitella telata; Hm: Hydra magnipapillata; Ad: Acropora digitifera ;Ta: Trichoplax adherans; Aq: Amphimedon queenslandica.

183 Appendix 5.2. Histone covalent post-translation modifications (PTMs) and RNA Pol II used in this study and their typical genomic localisation relative to coding genes and regulatory regions in bilaterian model organisms (Ho et al., 2014).

Histone PTM Genomic Localization monomethylated histone H3 at lysine 4 (H3K4me1) Distal cis-regulatory sites trimethylated histone H3 at lysine 4 (H3K4me3) Active proximal promoters trimethylated histone H3 at lysine 36 (H3K36me3) Actively transcribed regions trimethylated histone H3 at lysine 27 (H3K27me3) Polycomb-repressed regions acetylated histone H3 at lysine 27 (H3K27ac) Activated regulatory regions RNAPII (8WG16) Active transcription

Appendix 5.3. Conservation of Histone H3 proteins euqnce in eukaryotes Multiple sequence alignment of various eukaryotic histone H3 proteins (1-136 amino acids). Note that the entire amino acids sequence of histone H3 is highly conserved across eukaryotes. A. queenslandica sequence is highlighted. The amino acids sequences used to generate the alignment are also provided in Appendix 5.6

Appendix 5.4. Summary statistics on the ChIP-Seq datasets* * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 5.5. Custom Perl script used to compute the genome coverage selected genomic features * * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 5.6. Histone H3 protein sequences* * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

184 Appendix 5.7. BLASTp search outcome of the relevant histone methyltransferases and acetyltransferases against Amphimedon queenslandica proteins (NCBI nr database; E- value <1e-09).* * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 5.8. Association of H3K36me3 with different chromatin states and transcriptional states (see next page) (A) Definition and enrichments for a 9-state hidden Markov model based on five histone PTMs (H3K4me3, H3K27ac, H3K4me1, H3K27me3 and H3K36me3) in A. queenslandica larva. From left to right: chromatin state definitions, abbreviations, histone PTM probabilities, genomic coverage, protein-coding gene functional annotation enrichments, expressed (Expr.; CEL-seq normalised counts >0.5 in larva) and repressed (Repr.; CEL- seq normalised counts <0.5 in larva) protein-coding gene enrichments. Blue shading indicates intensity, scaled by column. (B) Larva regions of enrichment of five histone PTMs (H3K4me3, H3K27ac, H3K4me1, H3K27me3 and H3K36me3) and RNAPII were overlapped with protein-coding genes and their association with lists of various gene expression groups was tested. Only non-overlapping protein-coding genes in scaffolds larger than 10 kb were used in this analysis. The color key represents the log2(odds ratio) and the significant adjusted P-values (Fisher’s exact test) are superimposed on the grids. A P-value of zero means the overlap is highly significant. N.S.: not significant. Note that H3K36me3 (highlighted in yellow in A and B) usually marks actively transcribed regions. In A. queenslandica, H3K36me3 associated with silenced regions and non- transcribed genes. This was most likely the result of technical difficulties, possibly due to antibody unspecific or weak binding, which were also reported in adult (Gaiti et al., under review). H3K36me3 was therefore excluded from all further analyses.

185 Appendix 5.8. Association of H3K36me3 with different chromatin states and transcriptional states (legend on previous page)

186 Appendix 5.9. Association of H3 PTMs with LincRNAs Larva regions of enrichment of four histone PTMs (H3K4me3, H3K27ac, H3K4me1, H3K27me3 ) and RNAPII were overlapped with LincRNA and their association with lists of various gene expression groups was tested. Only non-overlapping LincRNA in scaffolds larger than 10 kb were used in this analysis. The color key represents the log2(odds ratio) and the significant adjusted P-values (Fisher’s exact test) are superimposed on the grids. A P-value of zero means the overlap is highly significant. N.S.: not significant.

187 Appendix 5.10. Functional annotation of nearest neighbouring genes of the larva predicted activated enhancer-like elements.* * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

Appendix 5.11. GO term enrichment outcome for the nearest neighbouring genes of the larva predicted activated enhancer-like elements (Hypergeometric test, FDR <0.01).* * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

188 Appendix 5.12. Association of AqATF4 ChIP-Seq peaks with specific genomic features, histone H3 PTMs and chromatin states

# AqATF4 % AqATF4 Genomic location % Genome Fold Enrichemnt peaks peaks Exons 225 23 13.2 1.7 Introns 97 10 11.7 0.9 <1kb upstream TSS 199 21 23.4 0.9 >1kb upstream TSS 446 46 51.8 0.9 TSS_400bp 57 6 5.5 1.1 TES_400bp 75 8 5.5 1.5

# AqATF4 % AqATF4 Random H3 PTM peaks peaks # peaks sd H3K4me3 54 3% 52 7 H3K4me1 1329 76% 65 8 H3K27ac 1164 66% 94 9 RNAPII 641 37% 137 11 H3K27me3 1130 64% 120 10 Enhancer 638 36% 41 6 Enhancer_active 408 23% 15 4

# AqATF4 % AqATF4 Random Chromatin state peaks peaks # peaks sd TssA 8 0% 48 7 TxEnhA1 6 0% 334 16 TxEnhA2 26 1% 74 8 EnhWk 9 1% 155 12 BivTx 495 28% 64 8 EnhP 950 54% 144 11 ReprPC 26 1% 153 12 ReprPCWk 27 2% 231 14 Quies 194 11% 550 20

189 Appendix 5.13. Motifs enriched in AqATF4 peaks, identified by HOMER

Appendix 5.14. Enriched GO (gene ontology) terms amongst AqTAF4 putative targets (see next page) (A) GO terms enriched in all nearest neighbour protein-coding genes of the AqATF4 ChIP- Seq peak summits. (B) Same analysis in a subset of those genes, located with 5 kb of the peak summits.

190

191 Appendix 5.15. List of all AqATF4 putative gene targets and their annotations* * Available online via CloudStor+

(https://cloudstor.aarnet.edu.au/plus/index.php/s/NyQh92NKvbMUa1r; pw= bZIP)

192 Appendix 5.16. Expression of all AqATF4 putative targets during larval stages and early metamorphosis Heatmap showing expression profiles of all AqATF4 putative targets across the developmental stages indicated along the x-axis. Expression levels were measured by CEL-Seq and are rescaled by row. Each row represents one gene and rows are clustered by expression profile similarity. Colours represent expression levels: high to low (red to white to blue). hpe: hours post emergence (from the brood chamber); hps: hours post settlement.

193