Quick viewing(Text Mode)

Sequencing and Characterization of Hox Gene Clusters in Japanese Lamprey (Lethenteron Japonicum)

Sequencing and Characterization of Hox Gene Clusters in Japanese Lamprey (Lethenteron Japonicum)

SEQUENCING AND CHARACTERIZATION OF HOX GENE CLUSTERS IN JAPANESE (LETHENTERON JAPONICUM)

TARANG KUMAR MEHTA

BSc. Biomedical Science (Honours)

Nottingham Trent University, 2004

MRes. Molecular and Cellular Biology

Nottingham Trent University, 2007

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

INSTITUTE OF MOLECULAR AND CELL BIOLOGY

&

DEPARTMENT OF PAEDIATRICS, YONG LOO LIN SCHOOL OF MEDICINE

NATIONAL UNIVERSITY OF SINGAPORE

2013

DECLARATION

I hereby declare that this thesis is my original work and it has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis.

This thesis has also not been submitted for any degree in any university previously.

……………………………………….. Tarang Kumar Mehta 23rd August 2013 (Revised 14th February 2014)

Acknowledgements

There are several people I would like to acknowledge:

First and foremost I would like to thank my supervisor, Professor Byrappa Venkatesh for his patience, superb mentorship, and the amount of time he devoted into improving my scientific writing and presentation skills;

I would also like to thank my thesis-advisory committee (TAC) members, Dr Paul Robson, Dr Samuel Chong, and Dr Sydney Brenner for their guidance and suggestions throughout the course of my PhD;

Dr Alice Tay for her advice on presentations and at conferences. Additionally, I would also like to thank the DNA Sequencing Facility, Chew Ah-Keng and other lab members, who were excellent in handling and processing my samples both efficiently and effectively;

Dr Ravi Vydianathan for his all-round mentoring throughout my PhD and work towards my first manuscript. He showed me the ropes in all scientific disciplines and without which, I would not have been able to progress as a scientist;

Tay Boon-Hui for her constant guidance and support in scientific techniques, and her immaculate lab-managing skills allowing for a safe lab working environment;

Sumanty Tohari for maintaining a healthy balance between social and work life activities and making the lab an all round enjoyable place to not only grow as a scientist, but as a person too;

Dr Alison Lee for her valuable comments and work on the manuscript, and with Michelle Lian, the both of them have done a fantastic job in assembling the Japanese lamprey genome and have helped me with various aspects of

i bioinformatics. In the same breath I must also thank the collaborative effort between Dr. Sydney Brenner’s lab at Okinawa Institute of Science and Technology (OIST, Japan), and our lab in IMCB, for generating the sequencing data for the Japanese lamprey genome;

Lim Zhi-Wei for teaching me zebrafish transgenics and being an all-round approachable individual in the lab;

The Zebrafish facility for doing an excellent job in maintaining my fish according to all standard regulations;

All present and former members of the Comparative and Medical Genomics Laboratory and DNA Sequencing Facility in IMCB, Singapore for making the laboratory a pleasant and beneficial place to work and grow as a scientist;

The staff of A*Star, Singapore International Graduate Award (SINGA) and Yong Loo Lin School of Medicine, NUS for handing me this fantastic opportunity to work in a great environment, among great scientists. I would also like to thank them both the efficient handling of administrative affairs;

All of the interesting and diverse group of friends that I have made here in Singapore who have made my experiences both here and in South-East Asia both exciting and enjoyable;

Finally, I would like to thank my late father and mother, and my sister who without their efforts and support, I would not have progressed so far in life.

ii

Table of contents

Acknowledgements ...... i List of tables ...... vii List of figures ...... viii List of abbreviations ...... x Chapter 1: Introduction ...... 1 1.1 Hox proteins ...... 1 1.2 Hox gene clusters ...... 2 1.3 Function of Hox proteins in metazoans...... 6 1.3.1 Cnidaria ...... 6 1.3.2 Protostomes ...... 7 1.3.3 Deuterostomes – Ambulacraria ...... 8 1.3.4 Deuterostomes – Vertebrates ...... 8 1.4 Expression and regulation of Hox genes ...... 11 1.5 Cis-regulatory elements ...... 19 1.5.1 Methods to predict cis-regulatory elements ...... 20 1.5.2 Comparative genomics approach to predict cis-regulatory elements ...... 24 1.5.3 Testing the function of predicted cis-regulatory elements ...... 24 1.6 Genome duplication in the stem vertebrate lineage ...... 27 1.7 Jawless vertebrates (cyclostomes) ...... 29 1.8 Hox genes in jawless vertebrates (cyclostomes) ...... 32 1.9 Objectives of my work ...... 35 Chapter 2: Materials and methods ...... 38 2.1 Japanese lamprey BAC libraries ...... 38 2.2 Screening of BAC libraries ...... 38 2.3 Screening potential positive clones ...... 39 2.4 Sequencing ends of BAC inserts ...... 40 2.5 Shotgun-sequencing of BAC clones ...... 40 2.5.1 Large scale isolation of plasmid DNA ...... 41 2.5.2 Sequencing plasmid DNA ...... 41 2.6 Assembling BAC shotgun reads ...... 42

iii

2.7 Japanese lamprey whole genome sequence ...... 43 2.8 Annotation and analysis of genes ...... 43 2.9 Predicting conserved noncoding elements (CNEs) ...... 45 2.10 Functional assay of CNEs in transgenic zebrafish ...... 46 Chapter 3: Results – Hox clusters in the Japanese lamprey ...... 49 3.1 Screening of lamprey BAC libraries ...... 49 3.2 Sequencing and assembly of BAC clones ...... 50 3.3 Mining the Japanese lamprey genome assembly for Hox genes ...... 53 3.4 Determining orthology of lamprey Hox gene clusters ...... 62 3.4.1 Phylogenetic analysis ...... 62 3.4.2 Analysis of gene synteny ...... 71 3.5 Sizes of Japanese lamprey Hox clusters ...... 73 3.6 Absence of Hox12 gene in lamprey ...... 78 3.7 microRNA in Japanese lamprey Hox clusters ...... 79 Chapter 4: Results – Conserved noncoding elements (CNEs) in the Hox gene clusters ...... 82 4.1 Introduction ...... 82 4.2 CNEs in lamprey and representative gnathostome Hox loci ...... 82 4.3 Functional assay of CNEs ...... 97 4.3.1 Functional assay of αCNE2 ...... 98 4.3.2 Functional assay of αCNE5 ...... 101 4.3.3 Functional assay of βCNE2 and βCNE3 ...... 103 4.3.4 Summary of expression patterns driven by selected lamprey CNEs ...... 105 Chapter 5: Results – Genome duplications in the lamprey lineage ...... 107 5.1 Japanese lamprey Hox clusters and their duplication ...... 107 5.2 Comparative analysis of non-Hox genes in lamprey and representative gnathostomes ...... 108 5.3 Lamprey and gnathostome genome duplication history ...... 110 Chapter 6: Discussion ...... 117 Bibliography ...... 128 Appendix ...... 141 List of publications ...... 155

iv

Summary

Cyclostomes (comprising lampreys and hagfishes) are the sister group of

living jawed vertebrates (gnathostomes) and are therefore an important group for understanding the origin and diversity of vertebrates. In vertebrates and

other metazoans, Hox genes determine positional identities along the developing embryo and are implicated in driving morphological diversity.

Invertebrates typically contain a single Hox cluster (intact or fragmented)

whereas elephant shark, coelacanth and tetrapods contain four Hox clusters

owing to two rounds (‘1R’ and ‘2R’) of whole-genome duplication during

early vertebrate evolution. By contrast, most teleost fishes contain up to eight

Hox clusters due to an additional genome duplication event (‘3R’) in the ray-

finned fish lineage. In my project, using a combination of sequences from

BAC clones and a draft genome assembly, I provide evidence for at least six

Hox clusters in the Japanese lamprey (Lethenteron japonicum). Unlike the

compact gnathostome Hox clusters, lamprey Hox clusters are large and highly

repetitive and are therefore organized more like the single invertebrate Hox

cluster. Cis-regulatory elements conserved in lamprey and gnathostomes

represent elements that were present in the common ancestor of all vertebrates.

Such elements must be playing a fundamental role in the regulation of

vertebrate Hox cluster genes and can be identified as conserved noncoding

elements (CNEs) by comparing lamprey and gnathostome Hox clusters. By

aligning the lamprey Hox clusters with the four Hox clusters of elephant shark

and human, I identified 13 CNEs. Transgenic zebrafish assays indicated the

potential of selected lamprey and human/mouse CNEs to function as

v

enhancers (cis-regulatory elements), driving reporter gene expression

resembling the expression pattern of certain Hox genes in the cluster.

The presence of more than four lamprey Hox clusters suggests that its lineage

has experienced an additional round of genome duplication compared to

tetrapods. Further support for this is provided by the presence of additional

non-Hox gene/gene family paralogs in the Japanese lamprey genome

compared to the human genome. If my inference of an additional round of

genome duplication in the lamprey lineage is correct, previous inferences

stating that the lamprey lineage has experienced only 1R and 2R need to be reexamined. Because of the GC-bias of the lamprey genome, which affects codon usage patterns and amino acid composition, phylogenetic analysis was not informative for timing the 1R and 2R events relative to lamprey and gnathostome divergence. Alternatively, I sought clues about the timings of 1R and 2R by analyzing the Hox clusters of lamprey and gnathostomes. First, the synteny of genes linked to each lamprey Hox cluster is different to those linked to gnathostome Hox clusters. Secondly, individual lamprey Hox clusters share CNEs across four paralogous elephant shark and human Hox clusters suggesting a many-to-many orthology relationship between lamprey and gnathostome Hox clusters. These independent lines of evidence suggest that the lamprey and gnathostome lineages may not have shared the first two rounds (1R and 2R) of genome duplication, implying independent genome duplication histories for the two lineages followed by an additional whole- genome duplication event in the lamprey lineage.

vi

List of tables

Table 1 – Repeat element content (%) in the four complete Hox clusters of the Japanese lamprey...... 76 Table 2 - Conserved noncoding elements (CNEs) between the Japanese lamprey Hox-α, -β, -γ, and -δ clusters and the four Hox clusters (HoxA, B, C and D) of elephant shark (C. milii) and human...... 85 Table 3 – CNEs selected for functional assay in transgenic zebrafish...... 97

vii

List of figures

Figure 1 – Hox gene clusters in ...... 5 Figure 2 – Mouse (Mus musculus) Hox gene clusters (A) and their collinear expression along the embryonic anterior-posterior axis (B)...... 14 Figure 3 – Hox genes previously identified in jawless vertebrates...... 33 Figure 4 – Tol2 transgenesis in zebrafish...... 48 Figure 5 – Schematic diagram of Hox contigs derived from sequencing 32 BAC clones...... 52 Figure 6 - Japanese lamprey Hox gene loci obtained from the combined data set of 32 BAC clones and draft genome assembly...... 55 Figure 7 - Hox clusters in the Japanese lamprey...... 59 Figure 8 – The unique exon-intron structure of Japanese lamprey Hox- η4 and Hox-θ4...... 60 Figure 9 – Alignment of protein sequences of Japanese lamprey Hox-δ4, Hox- η4, and Hox–θ4 genes...... 61 Figure 10 – Phylogenetic analyses of Japanese lamprey Hox4 genes using second exon coding (nucleotide) sequences and protein sequences...... 65 Figure 11 – Phylogenetic analyses of Japanese lamprey Hox13 genes using only the second exon sequence...... 66 Figure 12 – Phylogenetic analyses of Japanese lamprey Hox9 genes using full- length coding (nucleotide) sequences and protein sequences...... 67 Figure 13 – Phylogenetic analyses of Japanese lamprey Hox11 genes using full-length coding (nucleotide) sequences and protein sequences...... 68 Figure 14 – Phylogenetic analyses of Japanese lamprey Hox8 genes using full- length coding (nucleotide) sequences and protein sequences...... 69 Figure 15 – Comparison of Japanese lamprey, human, elephant shark and coelacanth Hox loci...... 72 Figure 16 – Comparison of four Hox clusters (HOXA, B, C, and D) in human and anole lizard with the lamprey Hox clusters (Hox-α, -β, -γ, and –δ) and the single amphioxus Hox cluster...... 75 Figure 17 – Average content of repetitive elements in Hox clusters of various chordates...... 78 Figure 18 - VISTA plot of the MLAGAN alignment of Japanese lamprey Hox- α cluster with the four Hox clusters (A to D) of elephant shark and human. .. 86 Figure 19 – Alignment of partial exon1 and intron of mouse, zebrafish, and elephant shark HoxB4 and lamprey Hox-α4 genes...... 87 Figure 20 - VISTA plot of the MLAGAN alignment of Japanese lamprey Hox- β loci with the four Hox clusters (A to D) of elephant shark and human...... 89 Figure 21 – Alignment of part of the human Hs246_enhancer element and orthologous CNE regions from elephant shark (Callorhinchus milii), zebrafish (Danio rerio), and the Japanese lamprey...... 91

viii

Figure 22 - VISTA plot of the MLAGAN alignment of Japanese lamprey Hox- γ cluster with the four Hox clusters (A to D) of elephant shark and human. ... 92 Figure 23 - VISTA plot of the MLAGAN alignment of Japanese lamprey Hox- δ cluster with the four Hox clusters (HoxA to D) of elephant shark and human...... 93 Figure 24 - CNEs shared between each of the four lamprey Hox clusters and human Hox clusters...... 95 Figure 25 - 3D graph of the number of CNEs shared between human, elephant shark (C. milii), and fugu Hox clusters...... 96 Figure 26 – Expression pattern driven by lamprey αCNE2 and its human homolog in 3 dpf F1 generation zebrafish embryos...... 99 Figure 27 – Expression pattern driven by lamprey αCNE5 and its mouse homolog in 3 dpf F1 generation zebrafish embryos...... 101 Figure 28 – Expression pattern driven by lamprey Hox-βCNE2 and Hox- βCNE3 and its human homolog in 3 dpf F1 generation zebrafish embryos. . 104 Figure 29 – Three models for WGD histories in lamprey and gnathostome lineages and their expected phylogenetic topologies...... 111 Figure 30 - Schematic diagram of Japanese lamprey Hox clusters and a single flanking gene (if available)...... 116 Figure 31 - Graphical representation of spatial (left of the cluster) and temporal (right of the cluster) expression patterns of the Japanese lamprey Hox-α, -β, and –ε cluster genes in Japanese lamprey embryos...... 121

ix

List of abbreviations

1R first round of whole genome duplication

2R second round of whole genome duplication

3R third round of whole genome duplication

BAC bacterial artificial chromosome

BI Bayesian Inference bp base-pair

CNE conserved noncoding element

CNS central nervous system

DNA deoxyribonucleic acid

DNase deoxyribonuclease

ELCR early limb control region

EtBr ethidium bromide

Gb gigabase

GCR global control region

GFP green fluorescent protein kb kilobase lincRNA large intergenic noncoding RNA

LINE long interspersed nuclear element lncRNA long noncoding RNA

LTR long terminal repeat

Ma million years ago

Mb megabase

McFos mouse cFos basal promoter

x miRNA microRNA

ML Maximum-Likelihood ncRNA noncoding RNA

NJ Neighbor-Joining nt nucleotide

PCR polymerase chain reaction

PG paralogous group

PTU 1-phenyl 2-thiourea

RNA ribonucleic acid

RT-PCR reverse transcription polymerase chain reaction

SINE short interspersed nuclear element

TAE Tris-acetate-EDTA

TF transcription factor

TFBS transcription factor binding site

TSGD teleost-specific genome duplication

TSS transcription start site

UTR untranslated region

WGD whole genome duplication

YFP yellow fluorescent protein

xi

Chapter 1: Introduction

1.1 Hox proteins

Hox proteins are transcription factors that play a crucial role in developmental

patterning by establishing positional identities along the anterior-posterior

embryonic axis in most metazoans (see section 1.3). Hox proteins are also

responsible for patterning the embryonic axis of metazoans that lack an

anterior-posterior axis and are not bilaterally symmetrical, like Cnidaria (Ryan

et al. 2007). Furthermore, Hox protein are also crucial for patterning certain

structures independent of an anterior-posterior axis, like those of cephalopods

(Lee P. N. et al. 2003) and tetrapod limbs. Along with defining external

morphological features, Hox proteins also control the accurate development of

the nervous system and internal organs, as well as the vertebrate axial skeleton

(see section 1.3).

Hox proteins are characterized by the presence of a highly conserved 60 amino acid, helix-turn-helix motif known as the homeodomain. The triple helical

nature of the homeodomain allows for DNA binding at each helix for

regulation of gene transcription. Upstream of the homeodomain, Hox proteins

have a pentameric region binding the three amino acid loop extension (TALE) proteins at the N-terminus, and downstream, an acidic tail at the C-terminus.

The binding of Hox cofactors such as Exd in Drosophila and Meis or Pbx in

mammals to a motif N-terminal to the homeodomain, increases the stability of

Hox protein binding to DNA. Hox proteins function as activators as well as

repressors of downstream genes by specifically binding to DNA sequences in

1

their target’s regulatory regions, known as Hox-response enhancers.

Information about the function of Hox proteins has been largely obtained

through the study of Hox mutations. Mutations in Hox proteins can induce

dramatic phenotypic changes such as homeotic transformations, whereby

segment identities are morphologically altered and one body part develops into

another. One classical example is the Antennapedia (Antp) mutant of the fruit

fly (Drosophila melanogaster) (Schneuwly et al. 1987). This is a dominant

gain-of-function mutant where ectopic expression of Antp in a head segment

causes legs to develop in place of antenna on the fly’s head. Mutations of human Hox proteins also lead to severe phenotypes; nonsense and missense mutations in HoxA13 (Mortlock and Innis 1997; Goodman et al. 2000) and

HoxD13 (Muragaki et al. 1996; Johnson et al. 2003) cause genetic disorders of

limb formation such as hand–foot–genital syndrome (HFGS), synpolydactyly

(SPD), and brachydactyly. Because of the crucial role of Hox proteins in

patterning the embryonic axis and internal organs of diverse organisms, it is

important to characterize Hox genes and proteins in different metazoan

lineages, which would enable a better understanding of their contributions to

the morphological diversity of metazoans.

1.2 Hox gene clusters

In all metazoans, Hox genes are either arranged in intact or broken clusters on

chromosomes in the genome. Invertebrates typically have a single Hox cluster.

While the single Hox cluster is intact in some invertebrates, it is split into two

or more fragments in some or totally atomized in others resulting in singleton

Hox genes dispersed across the genome. Protostomes are the earliest branching

2

clade of Bilateria. They can be subdivided into two major groups,

Lophotrochozoa which include the widely studied annelid, Caenorhabditis

elegans (roundworm), and Ecdysozoa which include the well studied

arthropod, Drosophila (fruit fly). In C. elegans there is an atomized Hox cluster

consisting of six Hox genes that include one anterior Hox gene, ceh-13, two

linked middle-group paralogs, lin-39 and mab-5, and three linked posterior

paralogs egl-5, nob-1, and php-3 (Burglin and Ruvkun 1993). In the traditional

genetic model, the fruit fly, the Hox cluster is split into two complexes, known

as the antennapedia and bithorax complexes (Lewis 1978; Kaufman et al.

1980). Among chordates, amphioxus (Branchiostoma floridae) representing the

cephalochordates and the most basally branching clade, possesses a single

intact Hox cluster with 15 Hox genes (Fig. 1) (Amemiya et al. 2008; Holland et

al. 2008). In stark contrast, the single Hox cluster in urochordates, the sister

group of vertebrates (Delsuc et al. 2006) is highly disintegrated. For example,

the ascidian Ciona intestinalis has a highly disintegrated cluster of nine Hox

genes with several rearrangements (Seo et al. 2004) whereas the larvacean

Oikopleura dioica has a completely atomized Hox cluster with only two

duplicated Hox9 genes remaining linked (Fig. 1) (Ikuta et al. 2004). In contrast

to invertebrates, all examined vertebrate taxa to date possess variable numbers

of multiple Hox clusters. The numbers of Hox clusters in vertebrates generally

reflect the evolutionary history of genome duplications in the respective

lineages. All tetrapods possess four Hox clusters (HoxA, HoxB, HoxC, and

HoxD) (Fig. 1) (Krumlauf 1994) which have been attributed to two rounds of whole-genome duplication (WGD) events known as 1R and 2R during the early evolution of vertebrates (Dehal and Boore 2005; Putnam et al. 2008).

3

Likewise, the elephant shark (Callorhinchus milii), a Holocephalan cartilaginous fish (Ravi et al. 2009) and the two coelacanth species, Latimeria menadoensis and Latimeria chalumnae (Amemiya et al. 2010; Higasa et al.

2012) also possess four Hox clusters (Fig. 1). An exception here are the

Elasmobranch cartilaginous fishes such as the little skate (Leucoraja erinacea) and small-spotted catshark (Scyliorhinus canicula) which have only three Hox clusters each (Fig. 1), with the HoxC cluster being completely lost presumably due to a genomic deletion after the divergence of holocephalan and elasmobranch lineages (King et al. 2011). There is now incontrovertible evidence that the ancestor of teleost fishes experienced an additional round of genome duplication (“teleost-specific genome duplication” or TSGD) after it diverged from the tetrapod ancestor (Fig. 1) (Christoffels et al. 2004; Jaillon et al. 2004). This has resulted in teleost fishes possessing almost twice the number of Hox clusters as tetrapods. Basally branching teleost lineages such as

Elopomorpha (e.g., European eel, Anguilla anguilla) and Hiodontiformes (e.g., goldeye, Hiodon alosoides) have retained all eight clusters (HoxAa, -Ab, -Ba, -

Bb, -Ca, -Cb, -Da and -Db) (Chambers et al. 2009; Henkel et al. 2012). In contrast, acanthopterygians such as fugu, medaka and stickleback have lost a duplicate HoxC cluster (Málaga-Trillo and Meyer 2001) whereas cyprinids such as the zebrafish have lost a duplicate HoxD cluster (Fig. 1) (Amores et al.

1998). Among teleosts, the salmonid lineage has undergone a more recent tetraploidization event after the TSGD (Fig. 1)(Allendorf and Thorgaard 1984).

Thus it is no surprise that 13 Hox clusters were identified in the Atlantic salmon (Salmo salar) (Fig. 1) (Mungpakdee et al. 2008).

4

Figure 1 – Hox gene clusters in chordates. Stars indicate whole genome duplication events. Figure modified from (Ravi et al. 2009)

In contrast to the detailed information available for Hox genes and clusters in jawed vertebrates (gnathostomes such as cartilaginous fishes, lobe-finned fishes, tetrapods and teleosts), the organization of Hox clusters is yet to be definitively elucidated in jawless vertebrates (cyclostomes such as lampreys

5 and hagfishes). A draft genome assembly of the (Petromyzon marinus) was recently generated based on DNA isolated from the liver (Smith et al. 2013). Detailed analysis of this assembly provided evidence for two Hox clusters and an additional eight Hox genes that could not be assigned to any cluster. Nevertheless, the identification of four Hox genes from paralogous group (PG) 9 in the sea lamprey genome suggests that its genome contains at least four Hox clusters (Smith et al. 2013).

1.3 Function of Hox proteins in metazoans

1.3.1 Cnidaria

Cnidaria, the sister group to Bilateria are the only non-bilaterian phyla with identifiable Hox proteins (Larroux et al. 2007). Compared to the twin body axis of Bilateria, Cnidaria have a single body axis referred to as the oral-aboral axis

(Technau and Steele 2011). They exhibit various stages of life cycle comprising a larval (planula) stage followed by a pelagic (polyp) or a benthic

(medusa) form (Technau and Steele 2011). Contrary to certain arguments against the presence of true Hox proteins in Cnidaria (Kamm et al. 2006), the majority of studies support that a group of Hox proteins are involved in patterning the oral-aboral axis of Cnidaria (Gauchat et al. 2000; Masuda-

Nakagawa et al. 2000; Yanze et al. 2001; Finnerty et al. 2004; Ryan et al.

2007). The evolutionary history of Hox function in Cnidaria is made complex as a result secondary gene losses, variable expression patterns along the oral- aboral axis and at different stages of development among different cnidarian taxa (Chiori et al. 2009). For example, the Hox9-14 group A gene from the hydrozoan Clytia hemisphaerica is expressed at the oral pole of the planula

6 whereas its ortholog in another hydrozoan, Eleutheria dichotoma is expressed only at the medusa stage (Kamm et al. 2006). In addition, certain orthologs are expressed at the same stage but located at opposite poles along the oral-aboral axis. For example the Hox9-14 group B gene of the hydrozoan Podocoryne carnea is expressed at the oral pole (Yanze et al. 2001) whereas its ortholog in the anthozoan, Nematostella vectensis displays expression at the aboral pole

(Finnerty et al. 2004; Ryan et al. 2007).

1.3.2 Protostomes

Bilateria comprise two groups, protostomes and deuterostomes. Protostomes, the earliest branching clade of Bilateria include two well-studied organisms with a distinct anterior-posterior axis, namely Caenorhabditis elegans

(roundworm) and Drosophila (fruit fly). A number of famous studies in

Drosophila found that a series of recessive and dominant mutations could induce homeotic transformations of the fly body plan. In a well-known example, the loss-of-function Ultrabithorax (Ubx) mutant converts the

‘wingless’ third thoracic (T3) segment into a second thoracic (T2) segment with wings, ultimately producing a mutant four-winged fly (Bender et al.

1983). In Drosophila (and most bilaterians), Hox proteins play critical roles in patterning the embryonic central nervous system (CNS). For example, the Hox gene products of labial (lab) and Deformed (Dfd) are important for regionalizing neuronal identity in the Drosophila brain (Hirth et al. 1998).

Studies in C. elegans on the other hand found that of the six Hox genes in its genome, only products of the anterior gene, ceh-13 and the two most posterior

Hox genes, nob-1 and php-3 are required for proper embryonic patterning as

7 triple loss-of-function mutants with defective lin-39, mab-5, and egl-5 genes can still develop into fertile adults (Kenyon et al. 1997; Wrischnik and Kenyon

1997).

1.3.3 Deuterostomes – Ambulacraria

Within deuterostomes, ambulacrarians are the most basally branching clade comprising Xenoturbellida, Hemichordata and Echinodermata.

Characterization of Hox genes and function in Xenoturbellida is incomplete having identified only five partial Hox genes in Xenoturbella bocki (Fritzsch et al. 2008). The echinoderm sea urchin, Stronglyocentrotus purpuratus has an unusual penta-radial (and not bilateral) symmetric body-plan and its genome contains a rearranged cluster of 11 Hox genes (Cameron et al. 2006). Five of these Hox gene products are important for specifying the five points of the penta-radial body plan during adult stages of embryonic development (Arenas-

Mena et al. 2000). As opposed to the peculiar penta-radial body plan of echinoderms, their sister group, hemichordates are bilaterally symmetrical. In the hemichordate, Saccoglossus kowalevskii, the complete complement of 12

Hox genes pattern the anterior-posterior axis (Aronowicz and Lowe 2006).

Modifications of the two-most posterior Hox genes (with reverse orientation) in the hemichordate clusters are thought to have altered the function of their protein product, driving posterior axial innovations such as tails, stalks, and holdfasts (Freeman et al. 2012).

1.3.4 Deuterostomes – Vertebrates

8

Within deuterostomes, chordates are the sister group to ambulacrarians.

Vertebrates are the largest subphylum of chordates and morphologically diverse. Numerous studies have looked into the evolution of Hox protein function in controlling axial morphology and patterning various organs, tissues, and cell types within this morphologically distinct group of organisms. Data on

Hox protein function in the earliest branching clade of extant vertebrates, jawless vertebrates is non-existent however, certain studies have managed to follow the expression pattern of some Hox genes in Japanese lamprey embryos

(Takio et al. 2004; Takio et al. 2007); see section 1.8. In jawed vertebrates on the other hand, a number of studies have gained useful insights into Hox function within major groups, namely tetrapods (studies in mice, chick, snakes, and frogs) and teleost fishes (studies in zebrafish). Key information regarding

Hox protein function in vertebrates is predominantly derived from the manipulation of Hox activity in chick embryos or gain/loss-of-function studies in mice. The focus of this section will be related to the functions of each of the

39 Hox proteins in mice as they have been established by a series of targeted mutations. Such mutations have demonstrated that the inactivation or overexpression of certain Hox genes products result in the transformation of anatomical regions. More specifically, mice lacking Hoxa1 exhibit defects in hindbrain segmentation (Lufkin et al. 1991; Chisaka et al. 1992) whereas

Hoxa2/Hoxb2 compound mutants completely lack interrhombomeric boundaries between rhombomere 1 (r1) and rhombomere 4 (r4) (Davenne et al.

1999). Such studies demonstrate that products of the earliest expressed Hox1 and Hox2 genes are important for the initial stages of hindbrain development and compartmentalization. More posteriorally, the generation of Hox PG4 to

9

PG11 mutants display abnormal spinal cord development; this includes a

reduced and disorganized phrenic motor column region in the cervical-brachial

spinal cord junction of Hoxa5/Hoxc5 mutants (Philippidou et al. 2012),

degeneration of the second spinal ganglion in Hoxb8 mutants (van den Akker

et al. 1999; Holstege et al. 2008), and the loss of lumbar motor neurons in

Hoxc10 mutants (Hostikka et al. 2009). Additionally, Hoxb13 mutants display

a caudally extended spinal cord and defective sensory innervations of the tail

(Economides et al. 2003). Such studies highlight the critical role that Hox

proteins play in neuronal specification of the CNS. Meanwhile, other studies

have concentrated on the role of Hox proteins in axial skeleton development of

mice. For example, the loss of Hoxd3 induces the formation of defective first and second cervical vertebrae (Condie and Capecchi 1993). On the other hand, overexpression of HoxPG6 induces ectopic rib formation at the cervical and lumbar regions, highlighting the importance of this paralog group for rib morphogenesis (Vinagre et al. 2010). In contrast, the expression of Hox PG10

is able to inhibit rib formation as demonstrated by the complete absence of ribs

after the activation of Hoxa10 in the presomitic mesoderm (Wellik and

Capecchi 2003). The production of floating ribs (as opposed to those attached

to the sternum) is under the control of Hox PG9 as shown by a higher than

expected number of ribs attached to the sternum upon complete inactivation of

all members of Hox PG9 (McIntyre et al. 2007). Meanwhile, the loss of

Hoxd11 results in changes associated with sacral patterning (Davis and

Capecchi 1994). Such mutation studies of members from Hox PG3 to PG11

suggest that these genes are also vital for patterning the axial skeleton.

10

The function of vertebrate Hox proteins are not only limited to patterning the primary body axis, CNS, and the axial mesoderm, but also involved in the development of other systems independent of an anterior-posterior axis, like for example the urogenital system (Taylor et al. 1997), vertebrate digestive system

(Sekimoto et al. 1998), paired appendages (Ruvinsky and Gibson-Brown 2000) and heart looping (Soshnikova et al. 2013). For example, specific compound deletion of the HoxA and HoxB clusters result in mouse embryos with deficient heart looping (Soshnikova et al. 2013). Additionally, the inactivation of both HoxA and HoxD clusters in mice induce a severe reduction in limb size, highlighting the combined role of both HoxA and HoxD cluster genes in limb development (Kmita et al. 2005). Such studies are all the more interesting when comparisons are made with the function of Hox genes in fellow tetrapods with an atypical body plan, like for example the snake. The elongated body plan of snakes is characterized by a loss of limbs, an increased number of vertebrae, and reduced skeletal regionalization along the primary body axis.

Such striking morphological differences are thought to have occurred by sequence changes in particular snake Hox genes that altered their expected axial boundaries and certain body regions compared to other tetrapods e.g. the significantly extended thorax (Woltering et al. 2009; Di-Poi et al. 2010). Such variations in vertebrate morphology between taxa are fascinating and in part, derived from a diversification of mechanisms regulating Hox gene expression and protein function during embryonic development.

1.4 Expression and regulation of Hox genes

11

The presence of Hox genes in all studied metazoans and their specific expression patterns along the embryonic axis indicate a conserved role for this feature. An interesting aspect that emerged from the genetic analysis of

Drosophila Hox genes was the idea of ‘spatial collinearity’ whereby the ordered arrangement of Hox genes in a cluster affects body patterning from the anterior to the posterior of the organism (Lewis 1978). Most invertebrate Hox genes are expressed in an anterior to posterior manner according to their ordered arrangement within the cluster (spatial collinearity). In C. elegans, the majority of Hox genes display a spatial collinear order of expression along the anterior-posterior axis (Tihanyi et al. 2010). However, the anterior Hox gene, ceh-13, extends it expression domain from anterior to posterior regions

(Tihanyi et al. 2010). Such breaks in collinearity may have facilitated the evolutionary innovations of the ceh-13 gene which, unlike other C. elegans

Hox genes, is involved in multiple developmental roles including cell adhesion, cell fusion, cell migration, growth rate, and fertility (Brunschwig et al. 1999;

Tihanyi et al. 2010). In the archetypal single Hox cluster of the cephalochordate amphioxus, Branchiostoma lanceolatum, Hox1, Hox3, Hox4,

Hox7, and Hox10 genes all follow a spatial collinear order of expression along the CNS and mesoderm (Pascual-Anaya et al. 2012). However, a break in spatial collinearity was observed for both Hox6 which was expressed more anteriorally than Hox4, and Hox14, which was expressed in both anterior

(cerebral vesicle) and posterior (mid-hindgut, posterior notochord, and tail bud) structures (Pascual-Anaya et al. 2012). Other invertebrate Hox genes display a limited order of spatial collinearity due to their atomized organization. Among urochordates, the ascidian Ciona intestinalis display a limited collinear order of

12

Hox gene expression within the larval CNS and the juvenile gut (Ikuta et al.

2004). On the other hand, Hox gene expression in the larvacean urochordate

Oikopleura dioica was mainly restricted to the tail with strong domains of spatial collinear expression in the nerve cord, notochord, and muscle (Seo et al.

2004). Despite the disintegrated or atomized Hox clusters of urochordates, the ability to maintain a certain degree of spatial collinearity is remarkable and apparently sufficiently deployed for patterning the anterior-posterior axis of the developing CNS (Ikuta et al. 2004). Certain breaks in collinearity and cluster disruption appear to have increased the tissue-specific role of Hox genes in urochordates. This could have permitted the separation of expression domains and the advanced development of certain anatomical features e.g. a tail with far more sophisticated functions (Seo et al. 2004).

Data on Hox gene expression in the earliest branching clade of extant vertebrates, jawless vertebrates is limited but certain studies have managed to follow the spatial expression pattern of some Hox genes along the CNS in

Japanese lamprey embryos (Takio et al. 2004; Takio et al. 2007); see section

1.8. This phenomenon of ‘spatial collinearity’ has been widely studied in mouse Hox clusters whereby genes in the 3’-end of the cluster are expressed in anterior segments of the embryo, and the 5’-genes are expressed in the posterior segments (Gaunt et al. 1988; Duboule D. and Dolle 1989; Graham et al. 1989) (Fig. 2). Overall, the ‘spatial collinearity’ of Hox genes is conserved in most invertebrate and all vertebrate Hox clusters. In addition, Hox clusters in gnathostomes (jawed vertebrates) display ‘temporal collinearity’, wherein the

3’-end genes of the cluster are expressed earlier than the 5’-end genes during

13 development (Izpisua-Belmonte et al. 1991) (Fig. 2). Besides spatial and temporal collinearity, Hox cluster genes also show ‘quantitative collinearity’, i.e. when several Hox genes are co-activated at a particular position along the anterior-posterior axis, the most posterior gene in the cluster is the most strongly expressed (Dolle et al. 1991; Kmita et al. 2002). The organization of vertebrate Hox genes as tight clusters is thought to be an evolutionary constraint due to these precise spatial, temporal and quantitative expression patterns of Hox genes along the developing embryo reflecting their positions within the cluster. Although much work has been done on the regulation of

Hox genes and clusters, the relationship between the clustered organization of

Hox genes and their spatial and temporal collinear expression is poorly understood.

A) Mouse Hox clusters

HoxA HoxB HoxC HoxD

B) Hox expression along mouse embryo

Posterior Anterior

Late Tempor al collinearity Early

Figure 2 – Mouse (Mus musculus) Hox gene clusters (A) and their collinear expression along the embryonic anterior-posterior axis (B). Paralogous Hox genes (A) and their respective spatially collinear expression boundaries in the mouse embryo (B) are shown in the same color. Figure modified from (Pearson et al. 2005).

14

The spatial collinearity of Hox gene expression pattern appears to be regulated

by a number of proximal regulatory elements distributed across the clusters

(Spitz et al. 2001; Tumpel et al. 2009). For example, the independent

expression of Hox1-4 genes in the hindbrain (Abzhanov and Kaufman 1999;

Tumpel et al. 2009) and the more posteriorly located spinal cord (Carpenter

2002) is regulated by multiple cis-acting elements flanking the individual

Hox1-4 genes (Sham et al. 1993; Frasch et al. 1995; Popperl et al. 1995). In a recent analysis, the coordinated expression of certain HoxD genes along the neural tube was pinpointed to be regulated by two ‘enhancer mini-hubs’, one located between HoxD3-4 and the other between HoxD9-10 (Tschopp et al.

2012). The first ‘regulatory hub’ controls transcription of HoxD3 and HoxD4

along the spinal cord, whereas the second ‘hub’ directs HoxD9 and HoxD11

expression to more posterior regions of the spinal cord (Tschopp et al. 2012).

The regulation of HoxD genes is not only confined to regulatory elements

within the cluster, but also multiple long-range enhancers acting from outside

the cluster (Spitz et al. 2003; Zakany et al. 2004; Spitz et al. 2005; Montavon et

al. 2011; Andrey et al. 2013). These long range enhancers are found flanking

the HoxD cluster as part of two global control regions (GCRs) – regions of

DNA responsible for directing expression patterns of multiple genes in a

manner independent of their local enhancers, and over large genomic distances.

A 5’-GCR (located towards the centromeric end in a gene desert) mapping

~240 kb upstream of HoxD13 regulates the expression of Lnp, Evx2, and the

posterior HoxD genes (HoxD13 to HoxD10) in the distal limb bud (giving rise

to prospective digits) and the central nervous system (Spitz et al. 2003). A 3’-

GCR referred to as the ‘early limb control region’ (ELCR) in the telomeric gene

15 desert is located downstream of HoxD1 and regulates the early collinear order of

HoxD gene expression along the anterior-posterior axis of developing limb buds

(Zakany et al. 2004). Interestingly, the 5’ enhancer shares a common core-motif with a long-range enhancer located upstream of the HoxA cluster in the fourth intron of Hibadh (Lehoczky et al. 2004). This 5’-HoxA enhancer has been implicated in the regulation and expression of HoxA13 and four upstream genes

(Evx1, Hibadh, Tax1bp1, and Jazf1) in the distal limb and genital bud

(Lehoczky et al. 2004). Currently, it is unknown whether the 3’ region of the

HoxA cluster or the flanking regions of both HoxB and HoxC clusters possess regulatory GCRs similar to those found in the HoxD loci.

The HoxD cluster genes of mouse are activated in two ‘transcriptional phases’ associated with the early and late stages of limb development (Nelson et al.

1996). Recent studies have shown that during the early stages of limb development, HoxD9-11 genes exhibit the first phase of expression by contacting the centromeric gene desert, whereas during late stage limb development, the same genes are involved in the second phase of expression and now interact with the telomeric gene desert (Andrey et al. 2013).

Interestingly, enhancers of the centromeric desert are shut down almost immediately when the switch occurs, even if the telomeric desert region is deleted, indicating that the two desert enhancer regions are functionally independent of each other when patterning the vertebrate limb (Andrey et al.

2013). This example of vertebrate limb patterning demonstrates the importance of maintaining a collinear organization of Hox genes in tight clusters.

16

The higher-order chromatin structure of Hox loci adds to the complexity of

regulation over the clustered set of genes, contributing to the coordinated

transcriptional control of Hox genes (Soshnikova and Duboule 2009). Changes

in chromatin state include open, closed, or poised for activation, allowing for

the transcription of Hox genes at various developmental stages whilst

maintaining cellular identity throughout cell divisions (Soshnikova 2013). The

epigenetic regulation of Hox gene clusters seem to mainly rely on the histone

modifying activities and histone mark binding of protein complexes encoded by

Polycomb (PcG) and Trithorax (TrxG)/Mll group genes (Simon and Kingston

2013). Two PcG complexes, namely Polycomb Repressive Complex 1 and 2

(PRC1 and PRC2) are responsible for silencing the Hox clusters (Simon and

Kingston 2013). More specifically, a component of PRC2, Ezh2 methylates

histone H3 at lysine 27 (H3K27me3), which dynamically coats the Hox

clusters leading to a transcriptionally inactive state of all Hox genes in

embryonic stem (ES) cells or forebrain (Lee T. I. et al. 2006; Noordermeer et

al. 2011). On the other hand, TrxG complexes tri-methylate histone H3 at

lysine 4 (H3K4me3) and overlay extended actively transcribed Hox cluster

regions (Montavon et al. 2011). A well-documented example is the temporal

collinear activation of Hox genes in the mouse tail bud that is associated with a

progressive gain in H3K4me3 marks (transcriptionally active chromatin state)

and a loss of H3K27me3 marks (repressed chromatin state) over the HoxD

cluster (Soshnikova and Duboule 2009). However, splitting of the HoxD

cluster forcing the isolation of the HoxD11-D13 region from the rest of the

cluster (Spitz et al. 2005) showed that the active H3K4me3 marks pre-label

future transcription sites but are unable to activate Hox genes in the absence of

17 remote enhancers. This therefore indicates the strict requirement of clustered

Hox genes for the precise temporal activation of Hox genes.

The regulation of Hox gene expression is also controlled by the transcription of overlapping noncoding RNAs (ncRNA) located within Hox clusters, mostly originating from the antisense strand (Rinn et al. 2007). Certain long noncoding

RNAs (lncRNAs) regulate Hox genes by modifying chromatin state through

PRC and Mll complex binding (Rinn and Chang 2012). One of these, a 2.2 kb ncRNA, named HOTAIR (HOX Antisense Intergenic RNA) was found in the

HoxC locus of mammals and shown to interact with PRC2 to repress transcription in trans across a 40 kb region (including HoxD8-11 and several ncRNAs) of the HoxD cluster (Rinn et al. 2007). Conversely, a large intergenic noncoding RNA (lincRNA) transcribed from the 5’ tip of the HoxA locus, termed HOTTIP (HoxA transcript at the distal tip) recruits the Mll complex to chromatin in order to coordinate the activation of multiple 5’ HoxA genes in vivo by looping itself over to the target gene (Wang K. C. et al. 2011). On the

3’ end of the HoxA cluster exists the lincRNA HOTAIRM1 (HOX Antisense

Intergenic RNA Myeloid 1) implicated in myelopoeisis (Zhang X. et al. 2009).

Another form of non-coding RNA enforcing regulatory control upon Hox genes are the microRNAs (miRNA) found embedded within the noncoding regions of Hox clusters (Mansfield and McGlinn 2012), and shown to preferentially target and repress mRNAs of Hox genes at the 3’ end of the cluster, reinforcing an increase in posterior Hox gene function over anterior

(i.e. posterior prevalence) (Yekta et al. 2008).

18

1.5 Cis-regulatory elements

The differential expression of genes at various stages of development and in different cell types is important for a variety of biological processes such as morphogenesis, cell differentiation and proliferation. Cis-regulatory elements are noncoding DNA sequences that regulate the precise spatial and temporal expression of the target gene. They comprise core promoters, proximal promoters, enhancers, insulators, silencers, and locus control regions. A core promoter, usually containing the transcription start site (TSS) and ~35 bp flanking sequence either side is considered as the minimal region required to successfully initiate gene transcription by the RNA polymerase II machinery

(Butler and Kadonaga 2002). The other classes of cis-regulatory elements generally interact with the core-promoter of the target gene to enhance or suppress the level of expression of the target gene. Such elements contain multiple sequence-specific transcription factor binding sites (TFBSs). The locus control regions (LCRs) or global control regions (GCRs) are a distinct class of regulatory element. They are composed of multiple types of cis- regulatory elements including enhancers, silencers, and insulators and enhance the tissue-specific expression patterns of single or multiple genes independent of the proximal enhancers of the genes.

Transcriptional enhancers are modular in nature and thus different enhancers can act independently of each other on the same promoter at different times, in various cell types and in response to different external stimuli. An example of this is seen for the regulated expression of even skipped (Eve) in seven thin stripes of early-stage Drosophila embryos by five distinct enhancers, each

19

contributing to the specific expression in either one or two stripes (Andrioli et

al. 2002). The modular organization of transcriptional enhancers implies that if

a single module is mutated, the functions of other modules are unaffected.

Transcriptional enhancers contain multiple binding sites for transcription

factors and the binding sites for each transcription factor normally ranges from

4-12 bases. Transcriptional enhancers can be located in the flanking regions or

within the intron of a gene and can act on target genes located hundreds of

kilobases away. A classic example is the Sonic hedgehog (Shh) enhancer, located ~1 Mb away in the intron of a neighboring gene, Lmbr1 (Lettice et al.

2003). The long range interaction between transcription factors at enhancer

regions and transcriptional units at the promoter occurs by a ‘looping’

mechanism whereby the enhancer is brought close to the promoter by a

‘looping out’ of the intervening DNA sequence to form an active transcription

complex (Li et al. 2002). Transcriptional enhancers can act independently of

their orientation with respect to their target genes. This is illustrated by the

SV40 viral enhancer sequence that could successfully enhance globin gene

expression when cloned either 1.4 kb upstream or 3.3 kb downstream of the

rabbit β-globin gene (Banerji et al. 1981). The identification of cis-regulatory elements in the large genomes of vertebrates is a challenging task as they lack a well-defined structure. As mentioned above, they are typically composed of clusters of various TFBSs that are normally 4 – 12 bp long and arranged in no specific order. They can be located far away from their target gene and can be located on the positive or negative strand of DNA in relation to the target gene.

1.5.1 Methods to predict cis-regulatory elements

20

Several methods are used to predict cis-regulatory elements in genomic

sequence and these methods can be broadly classified as ‘traditional methods’

and ‘genomic strategies’. The traditional methods include the DNA

footprinting assay (Galas and Schmitz 1978), electrophoretic mobility shift

assay (EMSA), chromatin immunoprecipitation (ChIP) and the DNase I

hypersensitivity assay (Keene et al. 1981; McGhee et al. 1981). The DNA

footprinting assay (Galas and Schmitz 1978) and EMSA were two of the first

biochemical methods applied to identify DNA binding sites of a particular

protein of interest. They are both similar in that the binding of a particular

protein to an end-labeled DNA is detected by comparing mobility shift against

control DNA (with no protein-bound) along a polyacrylamide gel. In EMSA,

lack of a cleavage step (applied in DNA footprinting) results in the observation

of a single band and an increase in protein concentration (slowing the mobility

shift of bound regions) enables the detection of more binding sites compared to

DNA footprinting. The disadvantage of both methods is that prior knowledge

of the putative cis-regulatory element and a potential DNA-binding protein is required, and as the assays are carried out in vitro they may not be reflective of

events occurring in vivo. On the other hand, the DNase I hypersensitivity assay is able to capture the chromatin state of DNA sequences in vivo (Keene et al.

1981; McGhee et al. 1981). In the eukaryotic nucleus, DNA coils around

histone complexes to form nucleosomes that serve to pack the large eukaryotic

genome into the nucleus. The DNA’s affinity to nucleosomes is lowered by

certain modifications to histone proteins such as trimethylation of histone H3’s

lysine 4 and acetylation of histone H3. This causes nucleosome displacement

resulting in an open chromatin state making the affected DNA susceptible to

21

DNase I cleavage and accessible to the binding of transcription factors. Hence,

DNase I hypersensitivity sites mark functional noncoding elements like cis- regulatory elements and origins of replication (Cereghini et al. 1984; Gross and

Garrard 1988). The fleeting nature of DNase I hypersensitivity marks on DNA sequences is advantageous for the discovery of potential cis-regulatory elements interacting with transcription factors acting in either a spatial or temporal manner. On the other hand, chromatin immunoprecipitation (ChIP) is a targeted approach whereby the sites of transcription factor-DNA interaction can be identified in vivo. In this technique, living cells are fixed with formaldehyde to crosslink proteins to genomic DNA. The genomic DNA is then extracted and digested into 150 – 900 bp fragments. Antibodies specific to the transcription factor of interest are used to co-precipitate DNA fragments to which the protein is bound. After the cross-linking reaction is reversed, DNA is purified and prepared into a ChIP library by cloning into a vector or by adding adaptors, and then sequenced using vector or adaptor primers. Advantages of this method include the detection of specific protein-DNA interactions with no bias in any given cell type. Disadvantages of this method include problems in raising antibodies specific to the transcription factor of interest, especially for those belonging to large protein families. Other disadvantages include in-depth analysis to map direct protein-DNA interactions and the occurrence of unwanted indirect interactions.

With the availability of whole genome sequences of human and other vertebrates, several genomic strategies have been developed to identify and characterize cis-regulatory elements across the genome. Some of these

22

strategies are an extension of ‘traditional methods’ like the DNase I

hypersensitivity assay and ChIP assay, but on a whole-genome scale to map

certain chromatin signatures characteristic of cis-regulatory elements and

identify occupancy sites of sequence-specific transcription factors. A powerful

strategy that was made feasible by the availability of multiple related genomes

is that of comparative genomics. In this method, there is no sequence-specific

bias to genomic regions and does not require any prior information of

transcription factor interactions. Comparisons of whole genome sequences through multi-species alignments highlight functional elements (coding and noncoding) that have remained highly conserved over evolutionary time scales

(Bejerano et al. 2004; Sandelin et al. 2004; Shin et al. 2005; Woolfe et al. 2005;

Pennacchio et al. 2006). Functional assays of conserved noncoding elements,

hereafter referred to as CNEs, have indicated that a majority of them function

as cis-regulatory elements driving tissue-specific expression of associated

genes during early development (Woolfe et al. 2005; Pennacchio et al. 2006;

Visel et al. 2007). Thus, comparative genomics has become a method of choice

for predicting cis-regulatory elements in the human genome. Whole-genome

comparisons are usually conducted by local alignment programs like

MegaBLAST (Zhang Z. et al. 2000) and BLASTZ (Schwartz et al. 2003b) to

rapidly align homologous regions. When such methods are used for the

alignment of distantly related genomes like human and fish, some orthologous

regions may be missed due to the stringent parameters of local alignment

algorithms. This is where locus-by-locus global alignments can be utilized to

identify all the CNEs of a given region by using programs like

23

PipMaker/MultiPipMaker (Schwartz et al. 2003a), LAGAN/MLAGAN

(Brudno et al. 2003) and AVID (Bray et al. 2003).

1.5.2 Comparative genomics approach to predict cis-regulatory elements

The comparative genomics approach has been used effectively for predicting

functional cis-regulatory elements in the Hox clusters of vertebrates. A

pioneering work was carried out by (Aparicio et al. 1995) who identified a

neural enhancer in the intron of HoxB4 in mouse by comparing the noncoding

region surrounding the HoxB4 gene of mouse and fugu. In another study, a

multiple alignment of vertebrate HoxA cluster sequences from horn shark,

mouse, human and teleost fishes such as tilapia, pufferfish, striped bass, and

zebrafish was able to identify several known regulatory elements as well as

many new putative regulatory elements in the HoxA cluster (Santini et al.

2003). Indeed in a more recent study comparing the orthologous Hox clusters

between 19 vertebrate species (excluding teleost fish), several highly conserved

CNEs were identified within the four Hox clusters, with more CNEs located in anterior regions than posterior regions (Matsunami et al. 2010). Some of the

CNEs contained motifs for elements contributing to Hox gene expression, for

example retinoic acid response elements (RARE). However, none of these

were functionally verified. Nonetheless, such studies have shown that by

comparing DNA sequences from evolutionary distant vertebrate species, cis-

regulatory elements that were present in the common ancestor of these

vertebrates can be predicted.

1.5.3 Testing the function of predicted cis-regulatory elements

24

Predicting CNEs is the first step in prioritizing candidate cis-regulatory elements for functional assays. The function of predicted cis-regulatory elements can be studied by examining their ability to direct reporter gene expression in cell lines or transgenic . The latter is the method of choice for testing candidate cis-regulatory elements associated with developmental genes that exhibit tissue-specific expression. In transgenic reporter gene assays, the putative cis-regulatory element is cloned upstream of a reporter gene linked to a core promoter. Examples of reporter genes used in such systems include the gene encoding green fluorescent protein (GFP) and β-galactosidase. The linearized plasmid construct is injected into mouse or zebrafish embryos, upon which it randomly integrates into the genome. The reporter gene expression is then monitored during development. The advantage of transgenic reporter gene assays is that they are an effective way to test the function of putative cis- regulatory elements in vivo. However, a major disadvantage of this technique is the occurrence of random transgene integration causing a phenomenon referred to as ‘positional effect’, whereby the reporter gene may show expression driven by an endogenous cis-regulatory element that is closer to its integration site. To eliminate such positional effect, similar patterns of expression are required from independent transgenic lines before any conclusions are made about the putative cis-regulatory element being tested. It is also important that transgene microinjections into mouse or zebrafish embryos be carried out at the one-cell stage so that all resulting cells in the organism contain a transgene copy in their genome. While these are some of the limitations that can be overcome in transgenic assays, there are certain advantages and disadvantages of using mice and zebrafish as transgenic systems for studying reporter gene expression. The

25 main advantage of using mice for transgenic studies is that they are mammals and are therefore similar both physiologically and genetically to humans. Also, when genetically manipulated and homozygous transgenic lines are established, the highly stable germline integration can be maintained by breeding programs to a high number of generations (Aigner et al. 1999). On the other hand, mice are expensive to maintain compared to fish models and microinjection into mouse eggs requires elaborate techniques. The advantage of using zebrafish in transgenic systems is that they are inexpensive to grow and maintain than mice. However, zebrafish require well-maintained water systems and they are not as closely related to humans as mice. Nonetheless, the use of zebrafish for transgenic studies is well-suited as they lay a large number of eggs and the embryos are transparent allowing for visualization of reporter gene expression in live embryos. The generation time is also relatively short (~

3 months). The ability to detect reporter gene expression in injected fish shortly after transgene injection allows for the screening of large number of fishes and hence, increases the possibility of observing positive transgenics. However, the resulting founders will be mosaic in expression in their soma and germline as transgene integration into the genome usually occurs after the first cell division.

Transient expression in the F0 population can be confirmed as the real expression either by observing a large number of founders or by generating stable F1 transgenic lines. In my project I looked at both transient expression

(in F0) as well as stable transgenic expression (after germline transmission in

F1). This could be carried out as a result of the increasingly efficient transposon-based gene transfer strategies with high cargo capacities of a

26

maximum of ~11 kb (Kawakami et al. 2004; Urasaki et al. 2006; Kawakami

2007).

1.6 Genome duplication in the stem vertebrate lineage

Susumu Ohno, in his famous book, Evolution by Gene Duplication, proposed

that two or more whole genome duplications occurred in the early stages of

vertebrate evolution (Ohno 1970). This hypothesis at the time was solely based

upon the genome sizes of various chordates and the complement of

chromosomal arms. Ohno also proposed that the large number of duplicate

genes generated through genome duplications provided the genetic material for evolutionary innovations in the vertebrate lineage (Ohno 1970). Clearly, whole genome duplications (WGDs) are a major evolutionary force that can dramatically change the genomic content of organisms. They create a large number of duplicated genes providing raw material for evolution of novel genes and new genetic networks that can ultimately lead to evolutionary phenotypic innovations. The evolutionary innovation of many vertebrate- specific features such as the neural crest, a complex segmented brain, various signaling transduction pathways, and an endoskeleton to name but a few, have been hypothesized to be fuelled by whole-genome duplication events in the vertebrate stem.

Recent comparisons of the human genome with that of basal chordates such as urochordates (Ciona) and cephalochordates (Amphioxus) have provided support to Ohno’s hypothesis and shown that two rounds of genome duplication, called 1R and 2R, occurred in the stem vertebrate lineage (Dehal

27

and Boore 2005; Putnam et al. 2008). Because of these genome duplications,

human and other tetrapod genomes contain four paralogous regions for a single

region in basally branching non-vertebrate chordates. This is also reflected in

the four Hox clusters found in tetrapods compared to a single Hox cluster in

Amphioxus. In addition to 1R and 2R in the stem lineage of vertebrates, a more

recent WGD event, referred to as the teleost-specific genome duplication

(TSGD) or 3R, was identified in the ray-finned fish lineage (Christoffels et al.

2004; Jaillon et al. 2004). This genome duplication is estimated to have

occurred before the radiation of teleosts around 350 million years ago

(Christoffels et al. 2004; Jaillon et al. 2004). Teleosts are the largest and most diverse group of vertebrates. They exhibit wide diversity in their morphology, behavior and adaptation. With approximately 28,000 living species, teleosts

represent nearly half of all living vertebrates. Although the diversity and rapid

speciation of teleosts has been attributed to the additional TSGD (Ravi and

Venkatesh 2008), strong evidence linking the two events is lacking. The

supernumerary Hox clusters in teleosts fishes, as discussed in section 1.2, also

reflect the additional genome duplication event in the ray-finned fish lineage.

Although there is convincing evidence for 1R and 2R in the vertebrate stem

lineage, the timings of these events in relation to the branching of early

vertebrate lineages have not been satisfactorily resolved. The presence of four

Hox clusters in the elephant shark (Ravi et al. 2009) similar to that in tetrapods

(Fig. 1), indicates that both 1R and 2R occurred before the divergence of

gnathostomes. However, it is not clear if 1R and 2R occurred before or after the

divergence of the jawless cyclostomes (lampreys and hagfish) from the

28

ancestor of gnathostomes. The key to this question lies in the genomes of

cyclostomes; and in particular the gene complement and genomic organization

of cyclostome genomes and their relationship to their homologs in gnathostome

genomes. This issue was addressed when the draft genome of the sea lamprey

was analyzed recently (Smith et al. 2013). Based on the conserved patterns of

duplication frequencies between lamprey, human, and chicken protein-coding

genes, and the fact that single lamprey scaffolds contained regions of

interdigitated homology to two distinct regions of a gnathostome genome, it

was inferred that both 1R and 2R may have occurred before the lamprey

lineage diverged from the gnathostome lineage. However, the sea lamprey

genome assembly was generated from DNA extracted from a somatic tissue

(liver). Since it is known that sea lamprey looses about 20% of genomic DNA

from somatic tissue due to developmentally programmed genome

rearrangements during early development, the assembly represents less than

80% of the genome. Furthermore, this assembly contained only two Hox clusters in addition to eight Hox genes that were not part of any cluster (Smith et al. 2013). Thus the analysis of the somatic DNA based draft genome assembly of the sea lamprey is considered inconclusive regarding the timing of

1R and 2R.

1.7 Jawless vertebrates (cyclostomes)

The living vertebrates are divided into two broad groups, the jawless vertebrates (also known as cyclostomes) and jawed vertebrates (gnathostomes).

The jawless cyclostomes are thus a valuable outgroup to gnathostomes and are an important reference group for understanding the origin and diversity of

29 vertebrates. The cyclostomes include two lineages, the lampreys and hagfishes represented by approximately 38 and 60 species, respectively (Hardisty 2011).

By contrast, the gnathostomes are the most diverse and ‘successful’ group of vertebrates. They comprise approximately 60,000 species that are split into two major lineages, the cartilaginous fishes (sharks, skates, rays and holocephalans) and bony vertebrates (ray-finned fishes, lobe-finned fishes and tetrapods).

Cyclostomes and gnathostomes differ significantly in their morphological traits and physiological systems. Cyclostomes contain a single, medially- located dorsal nostril as opposed to two ventrally located nostrils in gnathostomes. The notochord persists in adult cyclostomes whereas it is present only during embryogenesis in gnathostomes. Cyclostomes lack hinged jaws, paired appendages, pancreas, and a spleen (Richardson et al. 2010), features that are characteristic of gnathostomes. Cyclostomes are devoid of mineralized tissues unlike gnathostomes that exhibit a wide range of mineralized tissues in their exoskeleton and endoskeleton. Although cyclostomes possess an adaptive immune system, it is not based on recombination-activating gene (RAG)-mediated rearranged immunoglobulin genes. Instead, they use diversified antigen receptors containing leucine-rich repeats, called variable lymphocyte receptors (VLRs) (Pancer et al. 2004).

These contrasting features of cyclostomes and gnathostomes combined with the unique phylogenetic position of cyclostomes, makes them a critical group for understanding the evolution and morphological diversity of vertebrates.

30

Recent studies have uncovered an interesting and intriguing feature unique to

the lamprey genome. The germline genome of the sea lamprey was shown to

undergo massive programmed genome rearrangements during early

embryogenesis, resulting in the deletion of ~20% of germline DNA from

somatic tissues (Smith et al. 2009; Smith et al. 2012). While most of the DNA

lost was noncoding, certain genes functioning as part of transcriptional

programs regulating germline and somatic cell fates were also lost (Smith et al.

2012). This finding lead the authors to suggest that genes with germline-

specific functions associated with pluripotency e.g. recombination and genetic

reprogramming are likely excluded from the somatic lineage so as to eliminate

genes which when misregulated, could contribute to oncogenesis or other fatal

disease states (Smith et al. 2012). Although no detailed studies have been

conducted in other lampreys or in hagfish, previous studies have identified the

deletion of certain germline repeats in the somatic tissues of Pacific hagfish

(Nakai et al. 1991). Therefore it is likely that the programmed rearrangement of

genomic DNA in somatic tissues is a shared feature of all cyclostomes. This makes it imperative that efforts to generate whole genome sequence assemblies of cyclostomes should use germline DNA for sequencing.

The jawless vertebrates are proposed to have diverged from gnathostomes about 535 – 462 Ma (million years ago) (Janvier 2006), with hagfish and lamprey lineages diverging shortly after the split, around 470 – 390 Ma

(Kuraku and Kuratani 2006). Although it is widely accepted that gnathostomes are monophyletic, the precise relationship between lampreys, hagfish, and

gnathostomes was controversial for a long time. Morphological datasets have

31 supported a sister group relationship between lamprey and gnathostomes to the exclusion of hagfish, implying that cyclostomes are paraphyletic (Lovtrup

1977; Hardisty 2011). However, molecular data support ‘cyclostome monophyly’ i.e., hagfish and lamprey are sister groups and together form an outgroup to the gnathostomes (Stock and Whitt 1992; Takezaki et al. 2003). In recent years, the latter phylogenetic relationship has received more support and this includes analysis based on the presence or absence of miRNA families in the three lineages (Heimberg et al. 2010). Thus, it is now widely accepted that lamprey and hagfish form a monophyletic group.

1.8 Hox genes in jawless vertebrates (cyclostomes)

Initial studies of Hox genes in cyclostomes mainly used PCR to amplify homeodomain regions encoded by the second exon and assigned them to various paralogous groups (PGs) based on their homology to known Hox genes in other vertebrates. Using this strategy 33 Hox genes were identified in the

Pacific hagfish (Eptatretus stoutii) (Stadler et al. 2004) and were assigned to 13 different Hox PGs (Fig. 3). Interestingly these genes include seven members of

PG9 which suggests that hagfish contains up to seven Hox clusters. This finding lead the authors to suggest that the hagfish lineage may have undergone an additional Hox cluster or gene duplication event independent of 1R and 2R in the vertebrate stem lineage (Stadler et al. 2004).

32

Figure 3 – Hox genes previously identified in jawless vertebrates. These Hox genes were reported by (Sharman and Holland 1998; Stadler et al. 2004; Takio et al. 2004; Takio et al. 2007; Kuraku et al. 2008; Smith et al. 2013). The cluster organization is known only for the sea lamprey for which a draft genome sequence assembly has been generated (Smith et al. 2013).

The Hox genes in various lamprey species were initially characterized by sequencing PCR products as well as genomic cosmid clones (insert size ~40 kb). PCR survey of Hox genes in the ( planeri)

provided evidence for 18 distinct Hox genes including three members each for

PG1 and PG9 (Sharman and Holland 1998) (Fig. 3). This indicated the

presence of at least three Hox clusters in the brook lamprey. Investigation of

Hox genes in the Japanese lamprey (Lethenteron japonicum) identified a total

of 18 Hox gene fragments representing either one or two members for PG1 to

11, 13, and 14 (Takio et al. 2004; Takio et al. 2007; Kuraku et al. 2008) (Fig.

3). Interestingly, these genes include a Hox14 gene which has been previously identified in the single amphioxus Hox cluster (Amemiya et al. 2008), the

HoxA cluster of coelacanth (Amemiya et al. 2010), and the HoxD cluster of

elephant shark (Ravi et al. 2009) and horn shark (Powers and Amemiya 2004).

Phylogenetic analysis of the lamprey Hox14 gene with Hox14 genes from

33 amphioxus, coelacanth and sharks showed that amphioxus Hox14 evolved independently in the amphioxus lineage and is not a direct ortholog of vertebrate Hox14s (Kuraku et al. 2008). A more detailed analysis of Hox genes and Hox clusters has been carried out in the sea lamprey (Petromyzon marinus), initially by sequencing PCR fragments or genomic clones from phage, cosmid and P1 artificial chromosome (PAC) libraries (Pendleton et al.

1993; Force et al. 2002; Irvine et al. 2002) and later by analyzing the draft assembly of the somatic genome of the sea lamprey (Smith et al. 2013). These studies together provide evidence for two complete Hox clusters, a fragment containing a Hox4 and Hox1 gene and six additional Hox genes that could not be assigned to any cluster (Fig. 3). These studies also showed that the sea lamprey contains four members from PG9 (Fig. 3) and hence the sea lamprey should contain at least four Hox clusters. Interestingly, phylogenetic analysis of sea lamprey Hox genes from eight PGs suggested that the Hox genes of sea lamprey and gnathostomes arose as a result of independent duplication events in each lineage (Fried et al. 2003). This suggestion implies that whole genome duplications that occurred early during the evolution of vertebrates (1R and 2R) occurred independently in the lamprey and gnathostome lineages. However, analysis of the whole genome sequence of the sea lamprey has suggested otherwise, i.e. 1R and 2R occurred in the common ancestor of lamprey and gnathostomes.

The spatial and temporal expression pattern of 18 Japanese lamprey Hox sequences have been analyzed by in situ hybridization (Takio et al. 2004; Takio et al. 2007; Kuraku et al. 2008). Hox genes from PG2-8 displayed a spatially

34 collinear order of expression along the neural tube, showing nested patterns of expression from anterior to posterior regions as PG number increased (Takio et al. 2004; Takio et al. 2007), confirming that the Hox genes in lamprey also follow spatial collinearity. Interestingly, lamprey Hox14 as well as the shark

HoxD14 genes are expressed in restricted cell populations surrounding the hindgut but not in the neural tube and somites where Hox genes are normally expressed. This suggests that Hox14 genes are decoupled from the spatial colinearity code of Hox genes (Kuraku et al. 2008). Analysis of the embryonic expression patterns of three Japanese lamprey Hox genes (known as LjHox4w,

6w, and Q8 in literature) whose orthologs in sea lamprey are found in a cluster, showed no sign of temporal collinearity with LjHoxQ8 having the earliest onset of neural tube expression out of the three (Takio et al. 2007). This suggested that the temporal collinearity of Hox genes may be disrupted in lampreys.

However, further work involving determining the actual cluster position of the

Japanese lamprey Hox genes and the temporal expression patterns of all the genes in a cluster is required to conclude about the conservation or disruption of temporal collinearity in lamprey Hox genes.

1.9 Objectives of my work

Cyclostomes (comprising lampreys and hagfishes) are the sister group of the only other living lineage of vertebrates, the gnathostomes. Thus, cyclostomes are an important outgroup to gnathostomes and an important reference group for understanding the origin and diversity of vertebrates. Cyclostomes exhibit distinct morphological features that are considered ‘primitive’ compared to gnathostomes. For example, lampreys possess only a median fin and lack

35 paired appendages (i.e. pelvic and pectoral fins). The Hox genes, which are organized into multiple clusters in vertebrates, play a key role in defining body segment identities along the anterior-posterior axis of embryos as well as the appendages and early genital development of vertebrates. Thus, Hox genes are implicated in the morphological diversity of all vertebrates. The number of Hox clusters in vertebrates also reflects the genome duplication history of the lineages. Thus it is important to characterize Hox gene clusters in diverse taxa of vertebrates. The main aim of my project is to characterize Hox gene clusters in lamprey (a cyclostome) by sequencing and comparative analysis of all Hox genes and gene clusters. This should help to understand the role of Hox genes in the unique morphological features of lampreys in addition to providing insights into the genome duplication events at the root of vertebrates.

Among lampreys, our laboratory has chosen to characterize the genome of the

Japanese lamprey because of its relatively small size (~1.6 Gb compared to the

2.3 Gb genome of sea lamprey). A smaller genome implies a lower content of repetitive sequences which should make it relatively easier to sequence and assemble long contiguous regions. Our preliminary study has indeed indicated that the Japanese lamprey contains a lower percentage of repetitive sequences

(~21%) compared to the sea lamprey genome (~35%), (Smith et al. 2013).

Both the Japanese lamprey (Lethenteron japonicum) and sea lamprey

(Petromyzon marinus) are northern hemisphere lampreys grouped under

Petromyzontidae family. The two lineages diverged ~30-10 million years ago

(Ma) (Kuraku and Kuratani 2006). The Japanese lamprey is a parasitic, anadromous species distributed in Japanese rivers such as the Miomote River

36 in Niigata, and the Ishikari River in Hokkaido, that join the Sea of Japan.

Japanese lampreys have a diploid chromosome number (2n) of 159-165

(Suzuki 1999) which is comparable to the chromosome number of the sea lamprey (2n = 164-198) (Smith et al. 2010) but higher than those in the majority of gnathostomes (2n = 28-104) (Gregory T. R. et al. 2007).

The specific objectives of my project are to (1) sequence and characterize Hox gene clusters in the Japanese lamprey by probing BAC libraries and sequencing

BAC clones completely; (2) Perform comparative analysis of lamprey Hox gene clusters with gnathostome Hox gene clusters; (3) Use a comparative genomics approach to predict cis-regulatory elements conserved in lamprey and other gnathostome Hox clusters through the identification of conserved noncoding elements (CNEs). Such elements represent ancient cis-regulatory elements that were present in the common ancestor of all vertebrates and therefore must be playing a fundamental role in the regulation of vertebrate

Hox genes; and finally, (4) to assay the function of selected CNEs in transgenic zebrafish to confirm if they indeed function as cis-regulatory elements. During the course of my PhD program, a draft genome of the sea lamprey was generated and published (Smith et al. 2013). This would allow me to carry out a detailed comparison between the Hox gene clusters in the Japanese lamprey and the sea lamprey.

37

Chapter 2: Materials and methods

2.1 Japanese lamprey BAC libraries

To identify Hox genes I probed three Japanese lamprey BAC libraries:

IMCB_Testis1 (EcoRI; 92,160 clones, average insert size 100 kb),

IMCB_Testis2 (HindIII; 165,888 clones, average insert size 115 kb) and

IMCB_Blood (HindIII; 119,808 clones, average insert size 115 kb).

Considering a haploid genome size of 1.6 Gb for Japanese lamprey, the three libraries provide a fold coverage of 5.7, 11.9 and 8.6, respectively.

2.2 Screening of BAC libraries

Probes of 200-250 bp in length were designed from non-repetitive regions

(checked by BLAST searches against the sea lamprey genome assembly v7) of

18 Japanese lamprey Hox gene fragments available in Genbank (refer to

Appendix Table 1). Hox-containing BAC clones were identified using standard radioactive probing methods (protocol given below). It is known that lamprey somatic tissues lose ~20% germline DNA due to extensive rearrangements

(Smith et al. 2009; Smith et al. 2012) and therefore I based my analyses and results mainly on BACs obtained from testis libraries. The blood library was used only to ensure complete coverage of the lamprey Hox gene repertoire.

BAC library filters were prepared for hybridization by washing at 65°C in

“hybe solution” (1 mM EDTA, 0.5 M Na2HPO4, 7% SDS, sterile water) in a

Hybaid Maxi-14 hybridization oven (Thermo Scientific, USA) for a minimum

of 1 hour. The regions to be used as probes were amplified by PCR using

38

Japanese lamprey testis genomic DNA as a template. The PCR product was run on a 1.5% TAE-EtBr agarose gel and extracted with the Geneclean® II Kit

(Qbiogene, USA) according to manufacturer’s protocols. DNA was quantified on the Qubit® 2.0 fluorometer (Invitrogen, USA) and the DNA was diluted with sterile water to a final working concentration of 25 ng/µl. For the purpose of labeling, 1 µl of PCR product (25 ng) was made up to 9 µl with sterile water and then denatured at 100°C for 10 min. The amplified product was labeled with dCTP, α-32P (PerkinElmer, USA) using the Random Primed DNA labeling kit (Roche, Switzerland) to make the probe. Unincorporated dNTPs were removed from the mix by filtering through the illustra™ ProbeQuant G-

50 micro column (GE Healthcare, UK). The purified radioactive probe was denatured at 100°C for 10 min and then immediately placed on ice. The BAC library filters were hybridized with the purified radioactive probe by adding to the hybe solution and incubating overnight at 60-65°C with constant shaking in a Hybaid Maxi-14 hybridization oven (Thermo Scientific, USA). The next day, filters were washed in “washing solution” (0.1% SDS, 0.1× SSC, sterile water) at 65°C for 2-3 times (10-15 min each). After washing, each filter was carefully placed between thin plastic films and then exposed to Super RX-N 100NIF

(35×43) medical X-ray films (Fujifilm, Japan) at -80°C. The films were developed after 2-3 days according to the signal intensity using the X-OMAT

2000 processor (Kodak, USA).

2.3 Screening potential positive clones

In order to eliminate false positives, potential positive BAC clones were screened by PCR using primers used for making the probes. Positive BAC

39

clones were grown in 10 ml 1×LB + chloramphenicol (15 mg/ml) and isolated

using small scale BAC miniprep using the standard alkaline lysis method.

2.4 Sequencing ends of BAC inserts

The ends of BAC clones were sequenced using universal vector primers

specific to the BAC clones. IMCB_Testis1 BAC clones were sequenced using

bT7.b and bT3.g primers, whereas IMCB_Testis2 and IMCB_Blood clones

were sequenced with M13FL.b and pccBACR1.g primers. BAC-end

sequencing was carried out using the BigDye® Terminator v3.1 cycle

sequencing kit (Applied Biosystems, USA) and analyzed on a 3730XL DNA

Analyzer (Applied Biosystems, USA).

2.5 Shotgun-sequencing of BAC clones

Representative ‘seed’ BACs were completely sequenced using standard

shotgun sequencing methods. BAC clones were grown in 250 ml 1×LB +

chloramphenicol (15 mg/ml) and isolated using the Large-Construct kit

(QIAGEN, Germany) following manufacturer’s protocol. DNA was randomly

sheared into ~3 kb or ~5 kb fragments using a HYDROSHEAR

(GeneMachines, USA). The ~3 kb or ~5 kb fragments were end repaired using

T4 DNA polymerase (Roche, Germany) and then ligated into pBluescript KS+ vector (Stratagene, USA) digested with EcoRV. The blunt-end ligation

reaction was set up using a molar ratio of 1:3 (vector:insert) with 1.5 units of

T4 DNA ligase (Fermentas, Lithuania). Ligation was carried out at 16°C for

~18-20 hrs. The ligation mixture was transformed using E. coli ElectroMax

DH10B™ (Invitrogen, USA) electrocompetent cells. The quality of the

40

libraries was ascertained by picking 12 white colonies and 1 blue colony and

growing them in 1ml 1×LB + Ampicillin (50 mg/ml). DNA was prepared by

small scale plasmid miniprep using the standard alkaline lysis method. A library with ≥80% inserts of required size was considered a suitable library for shotgun sequencing.

2.5.1 Large scale isolation of plasmid DNA

Libraries were spread on square Nunc trays (Nalgene Nunc International,

Denmark) containing 250 ml 1×LB agar with 50 mg/ml ampicillin (800 μl),

4% X-gal (600 μl) and 100mM IPTG (400 μl) using sterile glass beads to

guarantee an even distribution. The plates were then incubated overnight at

37°C.

The colonies were grown in 96-well blocks with each well containing 1ml

2×LB + Ampicillin (50 mg/ml). For 3 kb libraries about 1500 colonies were

picked, whereas for 5 kb libraries, about 800 colonies were picked. The

colonies were picked either manually using sterile toothpicks or by using the

automated robotic colony picker (QPix2, Genetix, UK). The plates were

incubated overnight at 37°C with constant shaking at 250 rpm. The next day,

plasmid DNA isolation was carried out using a 96-well plasmid preparation kit

(Geneaid, UK). DNA was dissolved in 70 μl elution buffer (10 mM Tris-HCl,

pH 8.5) and sequencing reactions were set up on the same day or the plates

were frozen at -20°C.

2.5.2 Sequencing plasmid DNA

41

Both ends of plasmid inserts were sequenced using the BigDye® Terminator

Cycle Sequencing Kit (Applied Biosystems, USA) using universal vector primers T3L.b and T7L.g. The following PCR conditions were used: initial denaturation at 96°C for 2 min; 35 cycles of denaturation at 96°C for 30 sec, annealing at 50°C for 15 sec, and extension at 60°C for 4 min. The extension products were separated and analyzed on a 3730XL DNA Analyzer (Applied

Biosystems, USA).

2.6 Assembling BAC shotgun reads

Raw chromatogram files were processed and assembled using Phred, Phrap and Consed (http://www.phrap.org/phredphrapconsed.html) (Ewing and Green

1998; Ewing et al. 1998; Gordon 2003) package. The Phred-Phrap package includes Phred which carries out base calling for raw sequence data and then assigns a quality value to each base (Ewing and Green 1998; Ewing et al.

1998). A good quality base will have a Phred score of 30 or above (Ewing and

Green 1998; Ewing et al. 1998). Phrap then uses all the data generated by

Phred to assemble the sequences into contig sequences of highest quality. The assembled sequences were viewed and edited on Consed (Gordon 2003). Each

BAC clone was sequenced to a coverage of approximately 8-10×.

Sequencing gaps (i.e. gaps between the two plasmid end sequences) in the assembly were filled by designing walking primers in high quality read regions close to the end of each contig and sequencing using appropriate plasmid clones as a template. In some instances sequencing gap regions were flanked by highly repetitive elements, and hence primers could not be

42

designed for the conventional primer walking technique. For such clones,

transposon technology was applied using the Template Generation Kit

(Finnzymes, Finland) for getting overlapping shotgun reads of the plasmid

insert. Physical gaps are gaps between contigs with no linking plasmid ends.

Such gaps were filled by designing PCR primers close to the contig ends and amplifying BAC DNA using all possible pairs of PCR primers. The PCR product was either sequenced directly or cloned into pGEM® -T easy vector

(Promega, USA) and then sequenced.

2.7 Japanese lamprey whole genome sequence

A draft sequence of the Japanese lamprey genome was generated jointly by

Dr. Sydney Brenner’s lab at Okinawa Institute of Science and Technology

(OIST, Japan) and our lab in IMCB using the 454 GS FLX Titanium and GS

FLX+ systems (Roche, Germany). A combination of the following libraries

was used to produce a 20.5 coverage genome assembly: six shotgun libraries

(15.6), two 3 kb paired-end libraries (1.8), one 8 kb paired-end library

(~1.0), one 12 kb paired-end library (1.2), one 16 kb paired-end library

(0.9), and one BAC library (IMCB_Testis2; 38,210 BAC ends). The

combined dataset was assembled using the Newbler assembler (ver. 2.7,

Roche/454 Life Sciences). I was provided access to the assembly for filling

gaps in the Hox clusters. I am grateful to the two labs for this generosity.

2.8 Annotation and analysis of genes

A combination of ab initio and homology-based methods were used to predict

protein-coding genes and miRNA genes in the Hox loci. Exon-intron structure

43

of some genes was confirmed by RT-PCR and 5’RACE using RNA from

various Japanese lamprey tissues.

Phylogenetic analyses were carried out for Hox genes or genes flanking the

Hox clusters together with orthologs from representative chordates such as

amphioxus (Branchiostoma lanceolatum or Branchiostoma floridae), Ciona

intestinalis, elephant shark (Callorhinchus milii), leopard shark (Triakis

semifasciata), smaller-spotted catshark (Scyliorhinus canicula), fugu (Takifugu

rubripes), zebrafish (Danio rerio), coelacanth (Latimeria menadoensis or

Latimeria chalumnae), Xenopus tropicalis, chicken (Gallus gallus), lizard

(Anolis carolinensis) and human (Homo sapiens). Sequences for the

representative chordates were extracted from GenBank and Ensembl. Multiple

alignments were generated using MUSCLE

(http://www.ebi.ac.uk/Tools/msa/muscle/) (Edgar 2004). Codon-based alignments of corresponding nucleotide coding sequences were generated based on the amino acid alignment using PAL2NAL

(http://coot.embl.de/pal2nal/) (Suyama et al. 2006). Alignments were refined

by manual inspection and trimmed using BioEdit sequence alignment editor

(http://www.mbio.ncsu.edu/BioEdit/BioEdit.html) (Hall 1999). The best-fit

substitution models for the alignments were deduced using MEGA-CC

(Kumar et al. 2012). Two different phylogenetic methods, Maximum

Likelihood (ML) and Bayesian Inference (BI) methods were used for

phylogenetic analyses using the best-fit substitution model. In these two

model-based methods, probabilistic models of molecular evolution are used for

the alignment before generating the tree, making them largely unbiased and

44 statistically consistent when the correct model is used (Yang and Rannala

2012). These methods are considered more robust than the distance-based method of Neighbour-Joining (NJ) and less prone to Long-Branch attraction.

An advantage to both BI analysis and the ML method is that all substitution model assumptions are clear, can be evaluated based on the tree, and subsequently improved. A key difference between ML and BI lies in the parameters used for the selected evolutionary model, in Bayesian methods the parameters are considered to be random variables with user-defined statistical distributions (prior distribution), allowing for robustness against relative branch-length differences, whereas ML uses the data to estimate specific values for these parameters (Yang and Rannala 2012).

For ML analyses, MEGA-CC (http://www.megasoftware.net/) (Kumar et al.

2012) was used with 100 bootstrap replicates were used for node support whereas MrBayes 3.2.1 (http://mrbayes.csit.fsu.edu/) (Ronquist et al. 2012) was used to generate BI phylogenetic trees. Two independent runs starting from different random trees were run for 1,000,000 generations with sampling every 100 generations. A consensus tree was built from all sampled trees excluding the first 2,500 (“burnin”). Samples not reaching “stationarity” after

1,000,000 generations were run for 5,000,000 generations and the “burnin” was adjusted accordingly.

2.9 Predicting conserved noncoding elements (CNEs)

Repetitive sequences were identified and masked using CENSOR with default settings using Lethenteron Repbase repeats

45

(http://www.girinst.org/censor/index.php) (Kohany et al. 2006). Sequences for

fugu loci were extracted from the University of California Santa Cruz (UCSC)

Genome Browser (http://genome.ucsc.edu/). Elephant shark and human Hox

cluster sequences were extracted from GenBank and Ensembl, respectively.

Multiple alignments of Japanese lamprey, elephant shark and human Hox

cluster loci sequences were generated using the global alignment program

MLAGAN (Brudno et al. 2003) with lamprey as the reference sequence.

CNEs were predicted using a cut-off of ≥65% identity across >50 bp windows

and visualized using VISTA (Frazer et al. 2004).

2.10 Functional assay of CNEs in transgenic zebrafish

To test the biological function of CNEs, selected CNEs were amplified by

PCR and cloned into a miniTol2 transposon donor plasmid (modified from

Balciunas et al. 2006) linked to the mouse cFos basal promoter (McFos)

(Dorsky et al. 2002), and coding sequence of green fluorescent protein (GFP).

The constructs were tested for reporter gene expression in transgenic

zebrafish; the method is summarized in Figure 4. CNEs were amplified by

PCR using lamprey BAC clones or genomic DNA from lamprey (testis),

human, mouse, or zebrafish as a template. About 50 bp of sequence on either

side of CNEs were included in the PCR products. The PCR was performed

using the DyNAzyme I DNA polymerase (Thermo Scientific, USA) (25 µl reaction, 10 mM primers, 2.5 mM dNTPs) with the following conditions: initial denaturation at 95°C for 2 min; 35 cycles of denaturation at 95°C for 30 sec, annealing at 60°C for 1 min, extension at 72°C for 1 min; and final extension at

72°C for 5 min. The PCR product was ran on a 1.2 – 1.5% TAE-EtBr gel

46

(according to expected product size). The single band product was then excised

and purified using the QIAquick® gel extraction kit (QIAGEN, Germany) and

eluted in 25-35 µl buffer EB (10mM Tris-HCl pH8.5). The extracted product

was digested with appropriate enzymes and ligated into the modified miniTol2

vector. The CNE-miniTol2 constructs were prepared for microinjection using standard phenol-chloroform extraction methods. The concentration was

estimated with NanoDrop® and a working concentration of 125 ng/µl was

prepared for each CNE construct. Transposase mRNA was prepared from

cDNA in vitro using the mMESSAGE mMACHINE T7 kit (Ambion, USA)

and cleaned up using the RNeasy Mini Kit (QIAGEN, Germany). The

concentration was estimated with NanoDrop® and a working concentration of

175 ng/µl was prepared for microinjection. Zebrafish (AB strain) were reared and mated following standard methods (Westerfield 2000). CNE-miniTol2 constructs and transposase mRNA were mixed with 0.5% phenol red to a final concentration of 25 ng/μl and 35 ng/μl, respectively, and approximately 1 nl of the mixture was injected into the yolk (just below the blastomeres) of zebrafish embryos at the late one-cell or early two-cell stage. For each construct, around

300-400 embryos were injected and GFP expression was observed at 24, 48 and 72 hours post-fertilization (hpf). The embryos were reared at 28°C and

GFP expression was observed using a stereomicroscope fitted with an epifluorescence filter (MVX10, Olympus, Japan) and photographed using an attached digital camera (DP70, Olympus, Japan). The fish were temporarily immobilized for imaging by the use of buffered (pH 6.5-7.5) 1× Tricaine methanesulfonate (MS-222 168 mg/l). Zebrafish embryos were maintained from 6 hpf onwards in 0.003% N-phenylthiourea (PTU) (Sigma-Aldrich,

47

Sweden), a tyrosinase inhibitor that blocks pigmentation and allows for GFP visualization in developing embryos (Karlsson et al. 2001). The GFP expression pattern observed in at least 15% of transgenics was considered as

CNE-driven expression. Embryos with transient GFP expression were reared until maturity (3+ months) and out-crossed with wild type fish (AB strain) to analyze stable GFP expression of F1 progeny.

CNE PCR amplified

CNE Transposase mRNA + Tol2 Tol2

Co‐injection to fertilized egg CNE McFos_prom-GFP

Tol2 Tol2

Genomic DNA

Excision

Grow to adult stage Integration into genome via (2 to 3 months) transposition

F0 x Founder (injected fish) Wild type

Transgenic F1 GFP reporter gene expression: 24 –72 hrs

Figure 4 – Tol2 transgenesis in zebrafish. Figure adapted from (Kawakami 2007).

48

Chapter 3: Results – Hox clusters in the Japanese lamprey

3.1 Screening of lamprey BAC libraries

For designing appropriate probes for screening the BAC libraries I searched for

known Japanese lamprey Hox genes in GenBank and identified fragments of

18 Japanese lamprey Hox genes (Appendix Table 1). Hox genes are typically

made up of two exons, with the second exon coding for a highly conserved 60

amino acid motif known as the homeodomain. The general strategy I used for

BAC library screening was to initially design probes for unique regions such as

the 3’UTR, 5’UTR or the first exon. Once these probes were exhausted, I

designed probes for the Hox domain which should hybridize to all Hox genes

from a particular paralogous group or related paralogous groups (e.g., anterior

Hox genes, posterior Hox genes). The probes for UTRs and first exon were

generally 200 bp to 300 bp long while those for second exon were 120 bp to

150 bp long. To start with, probes were designed for each of the 18 known

Japanese lamprey Hox genes. The DNA was amplified from testis genomic

DNA by PCR and then radioactively labeled with dCTP, α-32P. Altogether

three BAC libraries were probed: IMCB_Testis1 library (average insert size

100 kb) prepared with EcoRI digestion with a coverage of 5.7×; IMCB_Testis2

library (average insert size 115 kb) prepared with HindIII digestion with a coverage of 11.9× and IMCB_Blood library prepared with HindIII digestion

with a coverage of 8.6×. Two different enzymes were used to overcome any bias that a particular restriction site may have in the Japanese lamprey genome.

Thus I expected that the three libraries to provide a good coverage of the entire genome and contain all the Hox genes in the genome. Among the three BAC

49

libraries, IMCB_Testis1 library was screened first followed by IMCB_Testis2

and IMCB_Blood libraries if required. Using this strategy, I identified in

excess of 600 potential positive BAC clones. I screened the potential positive

BAC clones by PCR using the primer pair used for amplifying the probe or by

Southern hybridization. The ends of all identified BACs were sequenced using

Sanger sequencing method. The end sequences were useful for identifying

overlapping BACs that could extend the sequence of a completely sequenced

BAC. In spite of the precautions taken in designing specific probes, my

screening identified many ‘false positive’ BAC clones that either contained

Hox related genes such as Gbx, Lhx, and Mnx or genes that contained repetitive

sequences with some similarity to the probe sequences.

3.2 Sequencing and assembly of BAC clones

Initially one positive clone per probe was sequenced completely by the

standard shotgun-sequencing method and assembled using Phred, Phrap and

Consed (http://www.phrap.org/phredphrapconsed.html) (Ewing and Green

1998; Ewing et al. 1998; Gordon 2003) package. These BAC clones are

referred to as ‘seed’ BACs. To identify overlapping BACs, I designed primers

for the ends of the ‘seed’ BACs and probed the library. Positive BACs

identified were first characterized by sequencing their ends and an overlapping

BAC that gave the longest overlap (i.e., had the shortest overlap to the seed

BAC) was selected for complete sequencing. The probing and sequencing were repeated until I hit a non-Hox gene which indicated I have reached the end of the Hox cluster. In total, I identified 638 potential positive BAC clones out of

which 32 were sequenced completely. This resulted in 13 contigs (Fig. 5) with

50

a combined length of 1.2 Mb. The contigs were annotated for Hox or Hox-

linked genes by a combination of BlastX search against the non-redundant

protein sequence (nr) database at http://blast.ncbi.nlm.nih.gov/Blast.cgi (NCBI)

and de novo prediction such as similar protein-based gene prediction using

SoftBerry FGENESH+ at http://linux1.softberry.com/berry.phtml. The predicted exon-intron boundaries were further refined by manual annotation.

The 13 contigs code for 31 Hox genes (including five partial genes) and some non-Hox genes (Fig. 5). The largest contig (#1) is 204 kb long and contains six

Hox genes. Another contig (#5) also contains 6 Hox genes but is only 152 kb long (Fig. 5). On the other hand, contig #6 contains only three Hox genes and is similar in length (155 kb) to contig #6 (Fig. 5). Some contigs contained only

one Hox gene by itself (contig 9) or linked to a non-Hox gene (contigs 12 and

13). Thus there is a wide variation in the density of Hox genes between different contigs.

51

10 987 54 Mtx2 Contig 1 (204kb) 3 2

e1 Contig 2 (94kb) 9 5 4 Contig 3 (96kb) 3 2 Contig 4 (83kb) 9 8 7 6 5 4

e2+3 Contig 5 (152kb) 14 13 11 Contig 6 (155kb) 10 9 Contig 7 Jaz Evx e2 (73kb) 13

Contig 8 (72kb) 8 Contig 9 e2 (40kb) 41

Evx e2 Contig 10 (45kb) 11 Wnt Contig 11 (32kb) 4 Contig 12(55kb) Zfp Ψ 4 Contig 13(113kb)

Figure 5 – Schematic diagram of Hox contigs derived from sequencing 32 BAC clones. Not drawn to scale. Contigs are numbered in the order in which they were sequenced and assembled. Hox genes are shown as blue boxes while non-Hox genes are shown as white boxes. Some Hox genes are partial containing either exon 1 (e1), exon 2 (e2), or exon 2 and 3 (e2+3). The Hox4 gene in contig 13 is a pseudogene (Ψ) as it contains a premature stop coding in the last exon.

At this point, I had exhaustively searched the three BAC libraries and presumably identified all the Hox genes present in these libraries. Yet they do

not represent complete Hox clusters and included only 15 out of the 18

previously known Japanese lamprey Hox genes. This indicated that the three

BAC libraries together do not completely represent the whole genome of the

Japanese lamprey and highlights the difficulty in making a well-represented

52 library for the Japanese lamprey. This seems to be related to the extensive repetitive sequences found in Hox loci as well as the entire genome of the

Japanese lamprey (see section 3.5 for details of repetitive sequences).

3.3 Mining the Japanese lamprey genome assembly for Hox genes

At this phase of my research, our lab in collaboration with Dr. Sydney

Brenner’s lab at the Okinawa Institute of Science and Technology (OIST,

Japan) generated a draft assembly of the Japanese lamprey genome. The draft assembly was based on 20.5× coverage and has an N50 scaffold size of 1.05

Mb and spans ~1.03 Gb of the estimated 1.6 Gb germline genome of Japanese lamprey (see Materials and methods). N50 scaffold is a weighted median statistic such that 50% of the entire assembly is contained in scaffolds equal to or larger than this value. The larger the N50 scaffold size, the better the assembly quality. I was provided access to the Japanese lamprey draft genome assembly and given the task of characterizing all the Hox genes in the assembly. I used the 31 Hox protein sequences from the thirteen contigs (Fig.

5) and searched the genome assembly by TBLASTN to identify scaffolds that contained Hox genes. I extracted sequences of such scaffolds and annotated

Hox and Hox-linked genes using the strategy mentioned in section 3.2. After which, I combined Hox containing scaffolds with 32 contigs generated by BAC sequencing and carried out detailed analysis of the resulting Hox loci.

Altogether I could identify 43 Hox genes that could be assigned to 12 loci (Fig.

6). Notably, these 43 genes include all previously known Japanese lamprey

Hox genes (Appendix Table 1) and orthologs of all 31 previously known sea

53 lamprey Hox genes (Appendix Table 2). This indicates that I have identified nearly all the Hox genes present in the Japanese lamprey.

54

Figure 6 - Japanese lamprey Hox gene loci obtained from the combined data set of 32 BAC clones and draft genome assembly. Drawn to scale. Genes are represented as boxes with pseudogenes denoted by the Ψ symbol. Genes labeled with ‘(e2)’ indicate that only the second exon could be identified. BAC clones that were completely sequenced are indicated below. Brackets denote deletions within the BAC identified by alignments to contiguous genome assembly scaffold sequences. 55

A significant feature of these loci is that they contain eight Hox-4 genes which

indicate that the Japanese lamprey potentially contains eight Hox clusters. On

this assumption I have arranged the 12 loci into eight putative Hox clusters as

shown in Fig. 7 and named them tentatively as Hox-α, Hox-β, Hox –γ, Hox-δ,

Hox-ε, Hox-ζ, Hox-η, and Hox-θ loci (Fig. 7A and B). These loci include four

‘complete Hox clusters’ (Hox-α, Hox-β, Hox –γ and Hox-δ), i.e. a cluster of

Hox genes flanked by genes known to be linked to Hox clusters in other

vertebrates. Hox-α cluster contains 10 Hox genes, flanked 5’ by Tax1bp and 3’

by Mtx (Fig. 7A and B). Hox-β cluster has 8 Hox genes flanked by Evx (5’) and

Snx (3’) (Fig. 7A and B). Hox-γ cluster has only four Hox genes flanked by

Atp5g on its 5’-end and Cbx on its 3’-end (Fig. 7). The fourth complete cluster,

Hox-δ contains only six Hox genes linked to Lnp (5’) and Smug (3’) (Fig. 7).

Besides the four complete Hox clusters, there are four fragments of Hox clusters (Hox- and Hox-ζ), i.e. two or more Hox genes that are either linked or not linked to a non-Hox gene (Fig. 7A and B). One of these Hox cluster fragment (Hox-ε14 to –ε9) contains 5 Hox genes, including a Hox14 gene linked to an Evx pseudogene (Fig. 7A and B). In addition, there are four

‘singleton’ Hox genes that are either on their own (Hox-3) or linked to a non-

Hox gene (Hox-13, Evx and Jazf). However, it is not known if the Hox cluster fragments and singletons form a cluster as shown in Fig. 7 or are a cluster by

themselves. To resolve this, I made several attempts to determine if the

fragments/singletons are linked as shown in Fig. 7 or in any other manner by

carrying out long-PCR using primer pairs for all possible combinations of

fragments and singletons using lamprey testis genomic DNA as a template.

However, I failed to get any PCR bands linking the fragments and singletons.

56

Likewise I also carried out Southern blotting to see if any of the fragments/singletons are localized to a single restriction fragment. I used 6- base cutters such as BstXI, HindIII, SacI, XbaI and XmnI as well as 8-base cutters such as NotI, PacI, SbfI, and SwaI. However, none of the digests identified a restriction fragment that contained more than one Hox cluster fragment or singleton. In fact the Japanese lamprey was nearly recalcitrant to

NotI enzyme, which is sensitive to methylation, presumably due to the extensive methylation of DNA in the testis. Thus I could not establish the linkages of Hox cluster fragments and singletons. Generating further deep coverage genomic sequences of the Japanese lamprey might help to determine the complete cluster organization of these Hox loci.

57

(A)

(B)

58

Figure 7 - Hox clusters in the Japanese lamprey. (A) Hox clusters and singleton Hox genes shown in Fig. 6 are rearranged to represent eight putative Hox clusters. (B) Schematic representation of the putative Hox clusters shown in A. Hox-linked genes are shown in white color. Arrows denote the direction of transcription; pseudogenes are marked with a Ψ symbol.

Among the singletons, Hox-η4 and Hox-θ4 are unique in that they have a

different exon-intron structure compared to all other Hox genes. Most Hox

genes comprise two exons, with the second exon coding for the homeodomain

(Fig. 8). However, certain Hox genes such as Hox14 genes of coelacanth,

elephant shark, horn shark, amphioxus and the Japanese lamprey (designated in

this study as Hox-ε14) and Hox13 of the Japanese lamprey (designated in this

study as Hox-γ13) have been shown to contain three exons (two introns), with the additional intron located within the homeodomain coding region (Amemiya et al. 2008; Kuraku et al. 2008; Ravi et al. 2009; Amemiya et al. 2010). In contrast to these Hox genes, Hox-η4 comprises four coding exons and three

introns (Fig. 8). This was verified by conducting RT-PCR using several

Japanese lamprey tissues which resulted in a product from the kidney. I then

carried out 5’ RACE using kidney cDNA to obtain full-length cDNA which was then mapped to the genomic sequence. This is the first report of a vertebrate Hox gene containing four exons, with the additional introns splitting the first exon into three exons (Fig. 8 and Fig. 9). Interestingly, the coding sequence of Hox-θ4 gene is highly similar to that of Hox-η4 (Fig. 9), but contains a premature stop codon in the last exon (Fig. 8), and its first two exons are not identifiable, indicating that it has become a pseudogene.

59

123 4 LjHox-η4

34* ΨLjHox-θ4

Exon 1 Exon 2

LjHox-δ4 Hd

LjHox-γ13, 123 LjHox-ε14, LmHoxA14, CmHoxD14, HfHoxD14

Figure 8 – The unique exon-intron structure of Japanese lamprey Hox- η4 and Hox-θ4. Exon-Intron structure of a typical Hox gene (LjHox-δ4) and Hox 13 and Hox14 genes of various vertebrates that contain an additional intron in the homeodomain are shown below for comparison. Exons are shown as boxes and the numbers are given above. Homeodomain is marked in green. The pseudogene (Ψ) LjHox-θ4 has a premature stop codon (marked by asterisk) in its homeodomain and its first two exons are not identifiable (shown as dotted line). Lj, Lethenteron japonicum; Lm, Latimeria menadoensis; Cm, Callorhinchus milii; Hf, Heterodontus franscisci.

60

LjHox-δ4 M S S YLMNRSYVEPKFPPCEEYSQSSY I SEGPAVDFFASERRVLECAPGGP LjHox-η4 M P S QGQ I YNNAG------LjHox-θ4 ------

LjHox-δ4 GSYQSCDG VVDSSPP RSGVMPPL V P P PPTSRGPSLT P N L DSTPSAPASAA LjHox-η4 ------G LRNKGFP QRWQKDNV V H P ------T R H L P------LjHox-θ4 ------

LjHox-δ4 A V VASTAAAAA A L G V S S S PRGRESGAGDE TRGAMRAREGGGGGGGGGGGG LjHox-η4 - V R------A P L G T A S S S S ------LjHox-θ4 ------A P L G T A N S S S ------

Homeodomain

LjHox-δ4 GRRRDSPERMA T P VVYPWMKK I HVGGGGA V D S E S Q G H E TKRARTAYTRQQ LjHox-η4 ------V N T P ------A I D S E S Q G H E ------Q LjHox-θ4 ------V N T P ------A I D S E S Q G H E ------Q

Homeodomain

LjHox-δ4 V L E L E K E F H F N R Y L T R R R R - L E I A H S L A L S E R Q V K I W F Q N R R M K W K K D H K LjHox-η4 V L E L E K E F H F N G Y L T R R R R H L E I A H S L A R S E R Q V N I W F Q N R R M K W K K D H K LjHox-θ4 V L E L E K E F H F N G Y L T R R R R H L E I A Q S L A R S E R Q V N I W F Q N R R M K *

LjHox-δ4 L P N T K M R P G P P G L P - Q G L S H R P D C S R T LjHox-η4 L P N I K M R P G P P G L P P Q G L S D R P D C S R T LjHox-θ4

Figure 9 – Alignment of protein sequences of Japanese lamprey Hox-δ4, Hox-η4, and Hox–θ4 genes. The Japanese lamprey Hox-δ4 is included in the alignment as representative typical Hox gene for comparison. Amino acid residues that are conserved in all sequences are shaded in black. Positions of introns are marked by red arrow heads.

The singleton Hox-η4 gene is linked in its 3’-end to a Wnt gene (Fig. 7A and

B). Although no Wnt gene is linked immediately to a Hox cluster gene in

mammals and other vertebrates, members of the Wnt gene family are located

on the same human chromosomes as the four Hox clusters. For example, on

human chromosome 7 Wnt2 is located ~90 Mb upstream of the HoxA cluster.

Two Wnt genes, Wnt3 and Wnt9B are found ~32 kb apart on chromosome 17 at

~2 Mb downstream of the HoxB cluster. Likewise, Wnt10B/Wnt1 (~7 kb apart),

and Wnt10a/Wnt6 (~34 kb apart) are found ~5 Mb upstream and ~43 Mb

downstream of the HoxC and HoxD clusters respectively. Thus it seems

possible that a Wnt gene was distantly linked to a Hox cluster in the common

61

ancestor of lamprey and gnathostomes and in the Japanese lamprey the

distantly linked Wnt might have been brought to the proximity of the Hox-η4 gene due to the secondary loss or translocation of genes located between Wnt and Hox-η4. The other singleton Hox4 gene, Hox–θ4 is linked in its 5’-end to a

Zfp gene (Fig. 7A and B). One member of the Zfp gene family, Zfp12, is

located ~20 Mb upstream of the human HoxA cluster on chromosome 7. In the

Japanese lamprey, gene loss, transposition, or chromosome rearrangements

could have brought Zfp and Hox–θ4 genes to close proximity.

Overall my analysis has provided evidence for at least six Hox clusters in the

Japanese lamprey; this includes four complete Hox clusters (Hox-α, Hox-β,

Hox –γ and Hox-δ) and two fragmented Hox clusters (Hox-ε and Hox-ζ) (Fig.

7A and B). My next task was to determine the orthology relationships of these

supernumerary lamprey Hox clusters to the four Hox clusters (HoxA, HoxB,

HoxC and HoxD) of gnathostomes.

3.4 Determining orthology of lamprey Hox gene clusters

3.4.1 Phylogenetic analysis

Phylogenetic analysis of molecular sequences is a powerful approach for

establishing orthology relationships between genes and gene families in related

genomes. This typically involves reconstruction of a phylogenetic tree using

diverse algorithms to infer the evolutionary relationships among the sequences.

To determine orthology relationships between the four gnathostome Hox

clusters (HoxA, B, C and D) and the supernumerary lamprey Hox clusters and

singletons, I carried out phylogenetic analysis using two different algorithms.

62

Multiple alignments were generated for protein sequences, and then codon- based alignments of corresponding nucleotide coding sequences were generated based on the amino acid alignment. Alignments were refined by manual inspection and alignment gaps were trimmed. Phylogenetic analysis was carried out using Maximum Likelihood (ML) and Bayesian Inference (BI) methods with the best-fit substitution model. Consistent relationships inferred using two independent methods provide strong support for the relationships.

Both ML and BI are model-based methods. However, in BI, the parameters in the model are random variables with statistical distributions and all possible parameters are considered for the final result. On the other hand in ML, the parameters are unknown fixed constants that are estimated from the data (Yang and Rannala 2012). The two methods have their own strengths and limitations.

An advantage of the two methods is the availability of a number of sophisticated evolutionary models for phylogenetic analysis, either selected manually or by model predictions based off the alignment. A drawback to both model-based methods is the poor statistical output obtained if the model is mis- specified. The distance-based method of Neighbor Joining (NJ) was not used as it performs poorly for highly divergent sequences from distantly related species leading to large sampling errors and the clustering of incorrect relatives (Long-

Branch attraction) (Holder and Lewis 2003).

Since the Japanese lamprey Hox4 paralogous group (PG) contains eight members (seven full-length and one partial), phylogenetic analysis of Hox4 genes should be most informative for assigning orthology. Thus, I first carried out phylogenetic analysis of Japanese lamprey Hox4 genes together with Hox4

63 genes from other representative vertebrates. I attempted phylogenetic analysis using sequences of either both exons or just the second exon; full-length coding

(nt) sequence or protein sequence; and just 1st and 2nd codon positions.

However, in most of these phylogenetic trees, the lamprey genes generally cluster together outside of the gnathostome genes (Fig. 10, Appendix Figures 1 to 2). This topology was relatively consistent for both ML and BI analyses with different degrees of statistical support – Fig. 10, Appendix Figures 1 to 2).

I next attempted phylogenetic analysis of Hox PGs 13, 11, 9, and 8 which have either five or four members each in the Japanese lamprey. However, a similar exclusive clustering of lamprey genes was observed for members of these groups (Fig. 11 to 14 and Appendix Figures 3 to 4). In certain instances, like the phylogenetic analysis for four full-length lamprey Hox9 genes, the nodes for exclusive clustering of lamprey protein and CDS sequences were highly supported by both ML (bootstrap value of 95) and BI (posterior probability of

0.995 and 0.998) (Fig. 12).

64

Figure 10 – Phylogenetic analyses of Japanese lamprey Hox4 genes using second exon coding (nucleotide) sequences and protein sequences. The second exon of eight Hox4 protein-coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome Hox4 homologs with amphioxus Hox4 as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. BI trees are shown as cladograms due to markedly varying branch lengths. Japanese lamprey branches and genes are highlighted in red. Sequences for other chordates were obtained from GenBank and Ensembl. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Ac, Anolis carolinensis; Xt, Xenopus tropicalis; Lm, Latimeria menadoensis; Cm, Callorhinchus milii; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; AmphiHox, amphioxus Hox.

65

Figure 11 – Phylogenetic analyses of Japanese lamprey Hox13 genes using only the second exon sequence. Nucleotide (cds) and protein (pro) sequences encoded by the second exons of five Japanese lamprey Hox13 genes were aligned with homologous sequences from selected gnathostomes with amphioxus Hox13 second exon sequence as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. BI trees are shown as cladograms due to markedly varying branch lengths. Japanese lamprey branches and genes are highlighted in red. Sequences for other chordates were obtained from GenBank and Ensembl. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Xt, Xenopus tropicalis; Lm, Latimeria menadoensis; Cm, Callorhinchus milli; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; AmphiHox, amphioxus Hox.

66

Figure 12 – Phylogenetic analyses of Japanese lamprey Hox9 genes using full-length coding (nucleotide) sequences and protein sequences. Four full-length Hox9 protein-coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome Hox9 homologs with amphioxus Hox9 as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. BI trees are shown as cladograms due to markedly varying branch lengths. Japanese lamprey branches and genes are highlighted in red. Sequences for other chordates were obtained from GenBank and Ensembl. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Xt, Xenopus tropicalis; Lm, Latimeria menadoensis; Cm, Callorhinchus milli; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; AmphiHox, amphioxus Hox.

67

Figure 13 – Phylogenetic analyses of Japanese lamprey Hox11 genes using full-length coding (nucleotide) sequences and protein sequences. Four full-length Hox11 protein-coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome Hox11 homologs with amphioxus Hox11 as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. BI trees are shown as cladograms due to markedly varying branch lengths. Japanese lamprey branches and genes are highlighted in red. Sequences for other chordates were obtained from GenBank and Ensembl. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Xt, Xenopus tropicalis; Lm, Latimeria menadoensis; Cm, Callorhinchus milli; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; AmphiHox, amphioxus Hox.

68

Figure 14 – Phylogenetic analyses of Japanese lamprey Hox8 genes using full-length coding (nucleotide) sequences and protein sequences. Four full-length Hox8 protein-coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome Hox8 homologs with amphioxus Hox8 as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. BI trees are shown as cladograms due to markedly varying branch lengths. Japanese lamprey branches and genes are highlighted in red. Sequences for other chordates were obtained from GenBank and Ensembl. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Ac, Anolis carolinensis; Xt, Xenopus tropicalis; Lm, Latimeria menadoensis; Cm, Callorhinchus milli; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; AmphiHox, amphioxus Hox.

69

This pattern of exclusive clustering of lamprey Hox genes to the exclusion of

gnathostome Hox genes suggests that the duplication of the lamprey and

gnathostome Hox genes occurred independently after the divergence of the

lamprey and gnathostome lineages, and thus they are not one-to-one orthologs

but group-to-group orthologs. If the lamprey and gnathostome Hox genes were

the result of shared duplication histories, each of the lamprey Hox gene would

have clustered with its single ortholog in gnathostomes (i.e. either with HoxA,

HoxB, HoxC or HoxD genes). The clustering of lamprey Hox genes outside the

gnathostome Hox genes therefore suggests that the lamprey Hox genes

duplicated independently of the gnathostome Hox clusters. This pattern of

clustering and similar interpretation of the relationships of lamprey and

gnathostome Hox genes have been previously made based on analysis of a

handful of lamprey Hox genes (Fried et al. 2003). However, recent analysis of

a large number of sea lamprey genes (Qiu et al. 2011) and the draft sequence of

the sea lamprey (Smith et al. 2013) have led to the interpretation of this pattern

of lamprey gene clustering as an artifact due to the GC-bias of the sea lamprey

genome affecting codon usage pattern and the amino-acid composition of

protein sequences (Qiu et al. 2011; Smith et al. 2013). The GC-bias of the

lamprey paralogous genes has been suggested to render them more similar to

each other than to their orthologs in other gnathostomes. Since the Japanese

lamprey genome is also GC-rich (average GC content of ~48%) like the sea lamprey (average GC content ~46%), it is possible that the exclusive clustering of the Japanese lamprey Hox genes could be an artifact due the GC- and amino acid composition bias. Thus I concluded that phylogenetic analysis is

70

inconclusive in assigning orthology to the Japanese lamprey Hox clusters and

genes.

3.4.2 Analysis of gene synteny

I next attempted to use the synteny of genes flanking Hox clusters to infer

orthology of lamprey and gnathostome Hox clusters. Vertebrate Hox clusters

sharing common whole genome duplication histories are expected to show a

high conservation of genes flanking them. I chose the human, coelacanth and

elephant shark Hox loci as representative gnathostome species. The flanking

gene information for the human (assembly hg19) and coelacanth (assembly latCha1) Hox loci was obtained from the UCSC genome browser and elephant shark data was obtained from (Ravi et al. 2009). Each of the human, elephant shark, and coelacanth orthologous Hox clusters are flanked by certain common genes. However, these set of common genes are not seen in any of the lamprey

Hox clusters (Fig. 15). For example, the human, elephant shark, and coelacanth

HoxD clusters are flanked by Lnp and Evx2 on their 5’-end and Mtx2 in their

3’-end whereas none of the lamprey Hox clusters is flanked by these sets of

genes (Fig. 15). As another example, Calcoco2 and Ttll6 genes are found in the

5’-end of HoxB cluster in human, elephant shark and coelacanth. A conserved

block of Skap1-Snx11-Cbx1-Nfe2L1 genes are found in the 3’-end of the

human and coelacanth Hox B clusters (Fig. 15). However, this complement of

genes cannot be found in any lamprey Hox cluster. Likewise, a conserved

block of Smug1-Cbx5-Hnrnpa1-Nfe2 genes flanking the 3’-end of the human

and coelacanth HoxC clusters is not found linked to any lamprey Hox cluster.

71

Figure 15 – Comparison of Japanese lamprey, human, elephant shark and coelacanth Hox loci. Arrows denote the direction of gene transcription; pseudogenes are marked with a Ψ symbol. The ends of scaffolds are represented by black circles.

In addition to the different complement of genes flanking the gnathostome and lamprey Hox clusters, some genes flanking the lamprey Hox clusters have also

72 undergone rearrangements. For example, homologs of Atf and Agps genes found upstream and downstream of the human HoxC and D clusters, are located in the opposite ends of lamprey Hox- cluster (Fig. 15). Also, genes like Skap2 and Skap1 which are closely linked to human and coelacanth HoxA and HoxB clusters are not found at similar positions in the lamprey Hox-α, -β, and –γ cluster loci (Fig. 15). Thus, the synteny of genes flanking the lamprey

Hox clusters was also not informative for inferring orthology relationships between the lamprey and gnathostome Hox clusters. Thus I decided to retain the nomenclature of lamprey Hox clusters and genes as Hox-, Hox-, Hox-γ,

Hox-δ, Hox-ε, Hox-ζ, Hox-η, and Hox-θ, instead of HoxA, HoxB, HoxC and

HoxD.

3.5 Sizes of Japanese lamprey Hox clusters

A striking feature of the Japanese lamprey Hox clusters is their large size (Fig.

16) compared to gnathostome Hox clusters. Gnathostome Hox clusters are generally compact ranging from ~100 kb to 210 kb (see Fig. 16 for the sizes of human clusters) and contain very little repetitive sequences (~1% to 8%) (Di-

Poi, N. et al. 2010). By contrast, Hox clusters of invertebrates are generally large. The two separated complexes (antennapedia and bithorax) of the

Drosophila Hox cluster span around 400 kb and 350 kb respectively. The single intact Amphioxus is ~470 kb (Amemiya et al. 2008). Indeed the lamprey Hox clusters are more like invertebrate Hox clusters rather than gnathostome Hox clusters. The lamprey Hox-δ cluster, which is the largest, is ~526 kb long (Fig.

16). In fact, just the intergenic region between Hox-δ4 and Hox-δ3 is as large

(168 kb) as a complete gnathostome Hox cluster (Fig. 16). The lamprey Hox-α

73 and Hox-β clusters are approximately ~320 kb (Fig. 16). The shortest Hox cluster, Hox-γ cluster is only ~145 kb long, probably due to the fact that it contains only four Hox genes (Fig. 16). Some introns of lamprey Hox genes are also remarkably large. For instance, the first intron of the lamprey Hox-ε14 is

~56 kb while its homologous introns in elephant shark HoxD14 and coelacanth

HoxA14 genes are a mere 0.7 kb and 3.3 kb, respectively (Ravi et al. 2009;

Amemiya et al. 2010).

74

Figure 16 – Comparison of four Hox clusters (HOXA, B, C, and D) in human and anole lizard with the lamprey Hox clusters (Hox-α, -β, -γ, and –δ) and the single amphioxus Hox cluster. The size of each Hox cluster indicated at the right.

75

The enlarged sizes of the Japanese lamprey Hox clusters are mainly due to a high content of repetitive sequences in their intergenic and intronic sequences.

The level of interspersed repetitive sequences of the lamprey Hox cluster range from 20% in the Hox-α cluster to 32% in the Hox-δ cluster. In fact the overall content of repetitive sequences in the lamprey Hox clusters is the highest among all vertebrate Hox clusters (Fig. 17). The repeat content of most of the

Japanese lamprey Hox clusters is also higher than the average repeat content

(~21%) of the whole genome. This suggests that unlike the gnathostome Hox clusters which are impervious to repetitive sequences, the Japanese lamprey

Hox clusters have been invaded by a number of repeat elements. I analyzed the different types of repetitive sequences in the lamprey Hox clusters and found that the main categories include SINEs (16 to 18%), small RNA (16 to 18%), simple repeats (13-17%), LINEs (8-10%), lamprey-specific repeats (6-8%), and low complexity repeats (2-4%) (Table 1). Thus there is no dominance of any single type of repetitive element, suggesting that the lamprey Hox clusters are as prone to repetitive sequence invasion as any other region in the genome.

Table 1 – Repeat element content (%) in the four complete Hox clusters of the Japanese lamprey. Repeat elements were predicted using RepeatMasker-3.2.9 (http://www.repeatmasker.org/) and Lethenteron Repbase repeats.

Repeat element Lamprey Hox clusters Hox-α Hox-β Hox-γ Hox-δ SINEs 16% 18% 17% 16% LINEs 8% 10% 9% 9% Small RNA (Ribosomal RNA, small cytoplasmic RNA, small 16% 18% 17% 16% nuclear RNA, and tRNA) Simple repeats 16% 13% 17% 15% Lamprey-specific repeats 8% 6% 7% 8% Low complexity repeats 4% 2% 3% 3%

76

Previous studies have shown that interspersed repetitive DNA sequences are

typically excluded from gnathostome Hox clusters (Fried et al. 2004). Such

selection against the invasion of repeats are thought to reduce the chances of

cluster rearrangement, maintain compact intergenic distances and avoid

disrupting cis-regulatory elements which would all affect the precise spatial

and temporal expression patterns of vertebrate Hox genes (Duboule Denis

2007). There are however certain exceptions to this in certain taxa of

gnathostomes. For example, some Hox clusters in squamate reptiles (lizards

and snakes) (Di-Poi et al. 2009; Di-Poi et al. 2010) and amphibians (Voss et al.

2013) contain an unusually higher level of repeats compared to other gnathostomes. Hox clusters of the green anole lizard contain an average repeat content of ~8% (excluding simple repeats) and are larger (~175 to 310 kb) than the typical gnathostome Hox clusters (Fig. 16) (Di-Poi et al. 2009). The invasion of variable repeat elements in the Hox clusters of lizards and snakes are proposed to have altered coding and noncoding regulatory Hox regions (Di-

Poi et al. 2010). Such alterations are believed to have facilitated Hox adaptations, possibly contributing to the differential regulation and expression of certain Hox genes and the diverse body plans of lizards and snakes (Di-Poi

et al. 2010). It is therefore possible that the invasion of the Japanese lamprey

Hox clusters by a large number of repeat elements might have influenced the

spatial and temporal expression patterns of certain lamprey Hox genes. This

hypothesis needs to be verified by analyzing the expression patterns of Hox

genes at various stages of development of the Japanese lamprey.

77

25%

20%

15%

10%

5% Average content of repeats (%)

0%

Organism

Figure 17 – Average content of repetitive elements in Hox clusters of various chordates. Average repeat content of Hox clusters of Homo sapiens, Mus musculus, Xenopus tropicalis and Danio rerio were obtained from (Di-Poi et al. 2009), Callorhinchus milii from (Ravi et al. 2009), and Amphioxus from (Amemiya et al. 2008). Repetitive sequences in four Lethenteron japonicum Hox clusters (Hox-α, -β, -γ, and –δ) were identified using CENSOR (Kohany et al. 2006) (see Materials and methods).

3.6 Absence of Hox12 gene in lamprey

The Hox genes identified by me in the Japanese lamprey genome include members of all paralogous groups except 12. Hox12 is also absent in the sea lamprey genome. However, the Hox12 gene is present the Hox clusters of basally branching chordates such as amphioxus and Ciona that contain a single intact or fragmented Hox cluster (Ikuta et al. 2004). At least two Hox clusters of most gnathostomes (HoxC and HoxD) contain Hox12 genes (Fig. 16). Thus, the absence of Hox12 seems to be unique to the lamprey lineage. Hox12 has been lost either very early during the duplication history of lamprey Hox

78 clusters or independently in different lamprey Hox clusters after the duplication of Hox clusters. Interestingly, Hox12 genes retained in gnathostomes exhibit expression in some organs that are absent in lampreys. For example, zebrafish

Hoxd12a is expressed in pectoral fins (Thisse and Thisse 2005), and mouse

HoxC12 is expressed during hair follicle development (Shang et al. 2002).

More importantly, mice with a defective HoxD12 gene have minor autopodal defects of the forelimb which include a reduced length of metacarpals and phalanges indicating that this Hox12 gene plays an important role in the development of forelimb (Capecchi 1989). It therefore seems possible that

Hox12 genes retained in gnathostomes may have been co-opted for the development of gnathostome-specific organs such as paired limbs.

3.7 microRNA in Japanese lamprey Hox clusters

At least two microRNA (miRNA) families are embedded within the Hox clusters of metazoans. The evolutionary history of Hox-embedded miRNAs often parallel the evolution of Hox clusters themselves (Mansfield and

McGlinn 2012). One miRNA gene family, miR-10 is found in all chordates and has been traced to early bilaterians (Tanzer et al. 2005; Lemons and McGinnis

2006). Amphioxus, a cephalochordate, was shown to have three miR-10 copies in its single Hox cluster, one between Hox4 and Hox5 (like all other chordates), and another two arising from amphioxus-specific gene duplications located between Hox5 and Hox6, and Hox9 and Hox10 (Campo-Paysaa et al. 2011). In gnathostome Hox clusters there are two miRNA genes, miR-10 and miR-196 located between Hox4 and Hox5, and Hox9 and Hox10, respectively. In elephant shark, the HoxB, HoxC, and HoxD clusters each contain a miR-10

79 whereas the HoxA, HoxB, and HoxC clusters each contain a miR-196 (Ravi et al. 2009). The miR-10 in the HoxC cluster of elephant shark is lost in the HoxC cluster of human, thus there are only two miR-10 genes and three miR-196 genes in human. Among the seven zebrafish Hox clusters, miR-10 gene is found in the HoxBa, HoxBb, HoxCa, and HoxDa clusters (Tanzer et al. 2005).

Interestingly, while all the Hox genes have been lost in the eighth HoxDb cluster, a single miR-10 is still intact (Woltering and Durston 2008). The zebrafish HoxAa, HoxCa, and HoxCb clusters each contain a miR-196 gene

(Tanzer et al. 2005). This pattern of distribution of miR-196 and miR-10 genes in gnathostome Hox clusters suggests that both microRNA gene families were present in the Hox clusters of the common ancestor of vertebrates. Previously a miR-196 was identified in one of the two Hox clusters (Pm2Hox) of the sea lamprey (Smith et al. 2013). To determine the complement of microRNA in the

Hox clusters of the Japanese lamprey, I predicted miRNAs either by searching for mature miR genes using miRBase (http://www.mirbase.org/ ), or by

BLASTN using known miR-10 and miR-196 as seed sequences. My analysis identified two miRNAs, one miR-10 in the Hox-α cluster (between Hox-α4 to –

α5 ) and one miR-196 in the Hox-β cluster (between Hox-β9 to –β10) (Fig. 7).

The miR-196 gene shows 100% orthology to its gnathostome orthologs.

However, the miR-10 sequence in the Hox-α cluster is divergent, showing only

87% identity to gnathostome miR-10 genes but 100% identity to sea lamprey miR-10a (Heimberg et al. 2010). Additional copies of miR-196 or miR-10 were not found between expected regions in the Hox-β, Hox-γ, and Hox-ε clusters, for example between Hoxε9-ε10, Hoxβ4-β5, and Hoxγ4-γ5 for which complete sequence were available (Fig. 6). It therefore seems that lamprey has retained

80 fewer miRNA genes in its Hox clusters than gnathostomes after Hox cluster duplication. This either suggests that the expression pattern of only a few Hox genes in the lamprey clusters are post-transcriptionally regulated by microRNAs or, the two embedded microRNAs could be acting in trans to regulate the Hox genes of other clusters. In this respect, it would be interesting to analyze the expression patterns of mir-10 (from Hox-α cluster) and mir-196

(from Hox-β cluster) in lamprey embryos.

81

Chapter 4: Results – Conserved noncoding elements (CNEs) in the Hox gene clusters

4.1 Introduction

The first step in studying gene regulation is the identification and

characterization of cis-regulatory elements (enhancers) that mediate tissue-

specific expression of genes. A powerful approach for identifying cis-

regulatory elements is through comparative genomics. Since functional

noncoding elements evolve slower than flanking sequences, cis-regulatory

elements can be identified as conserved noncoding elements (CNEs) through

the comparison of related genomes. Indeed, the comparative genomics approach has been used to predict cis-regulatory elements in the Hox clusters of gnathostomes by comparison of human, teleost fish and cartilaginous fish

Hox clusters (Nolte et al. 2003; Santini et al. 2003; Prohaska et al. 2004; Wang

W. C. et al. 2004). However, it is unclear how many of the gnathostome Hox cis-regulatory elements are conserved in jawless vertebrates. Cis-elements conserved in gnathostomes and jawless vertebrates represent elements that were present in the common ancestor of vertebrates. Such elements must be playing a fundamental role in the regulation of vertebrate Hox cluster genes.

The four complete Japanese lamprey Hox clusters sequenced by me allowed me to investigate such ancient vertebrate cis-regulatory elements in Hox clusters.

4.2 CNEs in lamprey and representative gnathostome Hox loci

In order to identify ancient vertebrate cis-regulatory elements in Hox clusters, I predicted CNEs in the four complete lamprey Hox clusters (Hox-α, Hox-β,

82

Hox-γ, and Hox-δ). Since the exact orthology of these Hox clusters to the four gnathostome Hox clusters (HoxA, B, C and D) could not be established, I aligned each of the four lamprey Hox clusters with all the four Hox clusters of elephant shark and human, representing two distantly related lineages of gnathostomes. Global alignments of the Hox cluster sequences were carried out using the program MLAGAN (Brudno et al. 2003) and the CNEs were predicted using VISTA (Frazer, K. A. et al. 2004) (see Materials and methods).

I defined CNEs as sequences displaying a minimum of 65% identity across a length of 50-bp or more. CNEs identified previously between human and mouse (divergence time ~60 Ma) using threshold values of 70% identity over

100 bp sequences have been shown to function as cis-regulatory elements

(Loots et al. 2000). Considering the much larger evolutionary distance between human and lamprey (~500 Ma), I chose to use lower threshold values of 65% identity over 50 bp. In fact similar lower cutoff values of 65% identity over

40-bp have been used previously to identify CNEs between sea lamprey and gnathostome non-Hox loci and transgenic assay of such CNEs showed them to have the potential to function as cis-regulatory elements (McEwen et al. 2009).

In addition to the conservation criteria, I also required that CNEs be conserved in both elephant shark and human Hox clusters. This eliminates any elements that are conserved between lamprey and just one gnathostome cluster by chance.

The alignment of lamprey Hox-α cluster with the four human and elephant shark Hox clusters indicated that the lamprey cluster shares seven CNEs with gnathostome Hox clusters (Fig. 18 and Table 2). The first CNE, αCNE1, is

83 conserved in the elephant shark and human HoxA and HoxB clusters in the intergenic region of Hox8 and Hox7. The second CNE, αCNE2, is located in the intergenic region of Hox7 and Hox6 and conserved across all four elephant shark and human Hox clusters. Three CNEs, αCNE3, αCNE5, and αCNE6 are conserved in the elephant shark and human HoxA and HoxB clusters like

αCNE1 but located in the intergenic region of Hox6 and Hox5, Hox4 intron, and the Hox4 and Hox3 intergenic region, respectively. Another CNE, αCNE4 is found in the intergenic region of Hox5 and Hox4 and conserved in HoxC and

HoxD clusters of elephant shark and human. The final CNE, αCNE7 is only conserved in the HoxA cluster of elephant shark and human.

84

Table 2 - Conserved noncoding elements (CNEs) between the Japanese lamprey Hox-α, -β, -γ, and -δ clusters and the four Hox clusters (HoxA, B, C and D) of elephant shark (C. milii) and human. Each lamprey Hox cluster sequence was aligned with all the four Hox clusters of elephant shark and human using MLAGAN and CNEs were predicted using VISTA.

CNE length (bp) and % identity CNEs HoxA cluster HoxB cluster HoxC cluster HoxD cluster Location C. milii Human C. milii Human C. milii Human C. milii Human 112 bp 105 bp 108 bp 92 bp Hox8-7 Hox-αCNE1 - - - - (71%) (67%) (69%) (74%) intergenic 184 bp 159 bp 160 bp 160 bp 145 bp 50 bp 96 bp Hox7-6 Hox-αCNE2 52bp (81%) (72%) (74%) (79%) (78%) (68%) (78%) (74%) intergenic 109 bp 81 bp 101 bp 61 bp Hox6-5 Hox-αCNE3 - - - - (69%) (73%) (72%) (77%) intergenic 75 bp 73 bp 71 bp 55 bp Hox5-4 Hox-αCNE4 - - - - (59%) (59%) (69%) (71%) intergenic 91 bp 91 bp 83 bp 62 bp Hox-αCNE5 - - - - Hox4 intron (66%) (64%) (64%) (71%) 52 bp 58 bp 52 bp 53 bp Hox4-3 Hox-αCNE6 - - - - (69%) (72%) (70%) (68%) intergenic 71 bp 71 bp Hox3-2 Hox-αCNE7 ------(83%) (85%) intergenic 110 bp 132 bp Tax1bp-Evx βCNE1 ------(73%) (73%) intergenic 60 bp 76 bp Tax1bp-Evx βCNE2 - - - - (80%) (80%) intergenic 179 bp 182 bp Tax1bp-Evx βCNE3 - - - - (74%) (73%) intergenic 158 bp 129 bp 150 bp 144 bp 65 bp 53 bp Hox7-5 Hox-βCNE4 - - (67%) (71%) (71%) (65%) (69%) (68%) intergenic 73 bp 93 bp 93 bp 98 bp 56 bp 51 bp 58 bp 64 bp Hox9-5 Hox-γCNE1 (77%) (73%) (69%) (66%) (75%) (67%) (71%) (64%) intergenic 52 bp 52 bp Hox8-4 Hox-δCNE1 ------(69%) (69%) intergenic

85

Lamprey-α Tax1bp Hox13 Hox11Hox9 8765 Hox4 Hox3 2 Mtx

Shark HoxA αCNE2

Human HoxA

Shark HoxB %

Human HoxB identity

Shark HoxC

Human HoxC

Shark HoxD

Human HoxD

Hox7 Hox6 Hox5Hox4 Hox3 Hox2

Shark HoxA

Human HoxA

Shark HoxB αCNE7

Human HoxB

αCNE1 αCNE3 αCNE5 αCNE6 Shark HoxC

Human HoxC

Shark HoxD

Human HoxD αCNE4 Figure 18 - VISTA plot of the MLAGAN alignment of Japanese lamprey Hox-α cluster with the four Hox clusters (A to D) of elephant shark and human. The lamprey Hox-α sequence was used as the base (x-axis) and CNEs were predicted using a criterion of ≥65% identity across ≥50-bp windows. The y- axis represents percentage sequence identity. Blue peaks represent exons and pink peaks represent CNEs. CNEs are marked with green boxes.

Interestingly, the lamprey Hox-αCNE5, located in the 512 bp intron of Hox-α4 overlaps a 92 bp region of a previously characterized HoxB4 enhancer (referred

86 to as ‘CR1’) that is conserved in mouse, human, zebrafish, and elephant shark

(Aparicio et al. 1995) (Fig. 19).

Exon 1 Mm HoxB4 A A AGA G C C C G T C G T C T A C C C C T G G A T G CGCA A AG T T C A CG T G A G C A CGG G Dr_HoxB4a A A AGA T C C C G T G G T A T A C C C C T G G A T G AAAA A AG T C C A CG T A A A C A TCG G Es HoxB4 A A AGA G C C C G T G G T T T A T C C A T G G A T G AAGA A AG T A C A TAT A A A C A TCG G Lj Hox-alpha4 A A GCA G C C G G T C G T G T A C C C G T G G A T G AAAA A GAT C C A CG T G A G C A CAG G

Mm HoxB4 T G A G TG------CG T G G G C A CCCC TTCCCCTCCC A C C Dr_HoxB4a T A A G TGC A G C T G A A C T T C T TT TT TT A C T A G T G T G T-TTCC GTGCACCGAC A G C Es HoxB4 T G A G T------AG T G T G C-CTTC GCTTTTTACC A C C Lj Hox-alpha4 T G A G GA------AG C G G G C-GACC GAGTGTTTAC A G C

Mm HoxB4 CCCG GCGCTTTACAG T G A AGGGAC C T C CGAGG CA T G T G GGGG AGGG AGC G Dr_HoxB4a TCGG CCACTTT------A GAGAGC C T------AATG TCC G Es HoxB4 T------GGAAGC C T------Lj Hox-alpha4 CGCG TGGAGACGGGG G G A GGCAGC C G C TTTTG GTT G T G AAAG GCG-----

CR1 region Mm HoxB4 AGG GGAAGCC A A T T- - - -G TCC CCGCT A T A A A C T CGCCA TTGCCAGA G A T Dr_HoxB4a GCG TTAAATC T A T T- - - -G CTC CTCTT A T G A A G T TGCCCT TGG TCGA G A T Es HoxB4 ----GGGATC T A T TGCA A G CAC CCA TT G T A A ---TAGTCATGCTTTA G A T Lj Hox-alpha4 ----GAGGAC C A T CAAA G G GGC TCCTT T T ------TCCCCGCA G A T

Mm HoxB4 T T A CG-GTCTCC TG T T T TCAG AGCCA CA T A A T T A C A T C GCC C A T A A A T T T Dr_HoxB4a T T A CG - ACCG TC TG T T T GCAG GGCAA TAT A A T T A C A C C CTC C A T A A A T T T Es HoxB4 T T A CA - ATCGCC TATCT GGGG GTTA- -GT A A T T A C A T C CTC C A T A A A T T T Lj Hox-alpha4 T T A TGC GGAGCC GGCG T GG TG GCCG- -GT A A T T A C A C C CCC C A T A A A A T T

Mm HoxB4 T T A T G G C C T A G T G G G C C C Dr_HoxB4a T T A T T G ------C Es HoxB4 T T A T T G ------C Lj Hox-alpha4 T T A T T G ------C

Figure 19 – Alignment of partial exon1 and intron of mouse, zebrafish, and elephant shark HoxB4 and lamprey Hox-α4 genes. Partial HoxB4 gene sequences for Mus musculus (Mm), Danio rerio (Dr), and Elephant shark (Es) are shown aligned with partial Lethenteron japonicum (Lj) Hox-α4 sequence. Conserved regions are shown with black shading and alignment gaps are given as a dash (-). Exon1 boundary is shaded in blue and CR1 region of the HoxB4 enhancer (Aparicio et al. 1995) is shaded in red.

The CR1 region has been shown to be important for correct spatiotemporal

HoxB4 expression in the mesoderm, central nervous system, and peripheral nervous system (Aparicio et al. 1995). Identification of this element in lamprey suggests that this is an ancient vertebrate enhancer conserved even in jawless vertebrates. This element in lamprey may mediate expression of Hox-α4 expression in the lamprey nervous system.

87

The alignment of lamprey Hox-β cluster with the elephant shark and human

Hox clusters identified four CNEs (Fig. 20 and Table 2).

Lamprey-β A upstream

Tax1bp Evx

Shark HoxA

Human HoxA

Shark HoxB %

Human HoxB identity

Shark HoxC

Human HoxC

Shark HoxD βCNE1,2 ,3

Human HoxD

88

B Lamprey-β Evx Hox11 Hox10Hox9 8 7 5 Hox4 Hox1 Snx

Shark HoxA

Human HoxA

Shark HoxB % Human HoxB

identity

Shark HoxC

Human HoxC

Shark HoxD

Human HoxD

Hox7 Hox5

βCNE4 Shark HoxA

Human HoxA

Shark HoxB

Human HoxB

Shark HoxC

Human HoxC

Shark HoxD Human HoxD Figure 20 - VISTA plot of the MLAGAN alignment of Japanese lamprey Hox-β loci with the four Hox clusters (A to D) of elephant shark and human. (A) Region from Tax1bp to Evx flanking 5’ region of lamprey Hox-β cluster and, (B) the lamprey Hox-β cluster. In both alignments lamprey sequence was used as the base (x-axis). CNEs were predicted using a criterion of ≥65% identity across ≥50-bp windows. The y-axis represents percentage sequence identity. Blue peaks represent exons and pink peaks represent CNEs. Identified CNEs are marked with green boxes.

Three of the CNEs (βCNE1, βCNE2, and βCNE3) are located in the intergenic region of Tax1bp and Evx genes, and conserved in the elephant shark and human HoxD cluster (Fig. 20A and Table 2). The fourth CNE (βCNE4) is

89

found in the intergenic region of Hox7 and Hox5 and conserved in the HoxA,

HoxB, and HoxD clusters of elephant shark and human (Fig. 20B).

Interestingly, both βCNE2 and βCNE3 partially overlap a 1340 bp previously

tested enhancer element (named Hs246) located in the intergenic region of

Lnp-Evx2 in human, elephant shark, and zebrafish HoxD loci (Pennacchio et al.

2006) (Fig. 21). However, the lamprey βCNE2 and βCNE3 elements are ~7 kb

apart in the lamprey whereas the corresponding human enhancer regions

(Hs246 element) are only ~170 bp apart (Fig. 21). Functional assay of Hs246

element had previously indicated that it drives tissue-specific expression in the

hindbrain (rhombencephalon) of developing mouse embryos (Pennacchio et al.

2006). Therefore it is likely that the CNEs identified in lamprey are core

elements of this enhancer that were present in the common ancestor of

vertebrates. The two elements in lamprey are separated by 7 kb as a result

repetitive element insertions which make up nearly half of the sequence.

90

βCNE2 Hs246 enh. C A G T C A G C T A A A T G A A A C A T TATTCTA AACATAT G C A TCG T A A T C A G TTC CmHoxA ------T G C A CCG T A A T C A G CTT CmHoxD ------G C A TCG T A A T C A G TTC DrHoxDa ------T TATTCTA AACATAT G C A TCG T A A T C A G TTC LjHox-Beta ------TT ATCCTAA GCATATT G C A TTG T A A T C C G CGC

Hs246 enh. G G T C A C A C T T A C A A G A A C ACG CG T T A A T A AGG C A A T C A A T C A CCCTGGA A CmHoxA G G T C A C A C T T A C A A G A A C AAG CG T T A A T A AGG C A A T C A A T C A AAGCAAA A CmHoxD G G T C A C A C T T A C A A G A A C ACG CG T T A A T A AGG C A A T C A A T C A CC------DrHoxDa G G T C A C A C T T A C A A G A A C ACG CG T T A A T A AGG C A A T C A A T C A C------LjHox-Beta G G T C A C A C T T A C A A G A G C GCG ACGT A A T A GCG C T A T C A A T C A CC------

Hs246 enh. C AAAGCAAGTTGTTCTGTAACAGCTCATAAACAGTGTGTAATGAAGAATTGA G C A A G T T G T T C T G T A A C A G C T C A T A A A C A G T G T G T A A T G A A G A A T T G CmHoxA C ------CmHoxD ------DrHoxDa ------LjHox-Beta ------

Hs246 enh. GAGGTTACCGTGACATGCGTTGATCAGATAATCAATGTCAAAGATGCGATG A G G T T A C C G T G A C A T G C G T T G A T C A G A T A A T C A A T G T C A A A G A T G C G A T CmHoxA ------CmHoxD ------DrHoxDa ------LjHox-Beta ------

Hs246 enh. GGAATGTCAGTAAATGTAGTTTTCATGTCGTTTCTATAAAATCTTCAATTTA A T G T C A G T A A A T G T A G T T T T C A T G T C G T T T C T A T A A A A T C T T C A A T T T CmHoxA ------CmHoxD ------DrHoxDa ------LjHox-Beta ------βCNE3 Hs246 enh. ACAACAAGCAGTA C A A C A A G C A G T T C A A T T A C C C A G A A A A T A C A G T C A A T T A A A TAGGGG T G CmHoxA ------T C A A T T A C C C T G C A A A C A C A G T C A A T T A A A GAATGTT G CmHoxD ------T C A A T T A C C C A G A A A A T A C A G T C A A T T A A A TAGGGG T G DrHoxDa ------T C A A T T A C C C A G A A A A T A C A G T C A A T T A A A CGGGCG T G LjHox-Beta ------T C A A T T A C C C A G A A A A T A T A G T C A A T T A A A GGCAAC T G

Hs246 enh. A TTG G A C A G TAGGG G G GTG G A T C A T C G A T C T T T G C ATTT CTATCTCGCTA CmHoxA A TTG G A C G G GAGG - - - ATG G A T C A T C G A T C T T T G C -ATT TCAGTATCACT CmHoxD A TTG G A C A G TAGG- - -GTG G A T C A T C G A T C T T T G C -ATT TCTATTTCGCG DrHoxDa A TTG G A C A G TAGCG G -GAG G A T C A T C G A T C T T T G C -ATT TCTATCTTGCT LjHox-Beta A AAG G A C A G CGAG - - - A CG G G T C A T C G A T C T T T G C CAGT CTAATTTCGTA

Hs246 enh. GTGGACATT TAA T T T GG T T T T TCC ATTAGCGACG AAATAA AGA AAA T A TA CmHoxA TTCAGACA T CTA A T T TTGCT T TTC CTA------CmHoxD AGTGGACA T TTA A T T TTGTT T TTC TATT - - AGCG CTGAAA TAA AGA A A AT DrHoxDa GG TGGACA T TTA A T T TGGG T T TTC TATT--A------LjHox-Beta GCCCGACA T CTA A T T TTGTT T CGC CTATTAACGG GCGAGA TCA GGA C A AT

Hs246 enh. A A A TATATTA AACA A C CTAC AGA T T TA T T T T C T T G T CAAA CmHoxA ------CmHoxD A T A AAATATA TTAA A C AACC TACAGATT T A T T T T C T TG TC DrHoxDa ------LjHox-Beta A T A ACACGCA TTAA A C AAGC GATGAAGT T A T T T T C T TG TC

Figure 21 – Alignment of part of the human Hs246_enhancer element and orthologous CNE regions from elephant shark (Callorhinchus milii), zebrafish (Danio rerio), and the Japanese lamprey. Orthologous CNE regions are from the intergenic region of Lnp and Evx2 genes in the HoxA and HoxD loci of the Callorhinchus milii (Cm), HoxDa locus of Danio rerio (Dr), and Hox-β locus of Japanese lamprey Lethenteron japonicum (Lj). Conserved regions are shown with black shading and alignment gaps are shown as a dash (-). The predicted lamprey βCNE2 region is shaded in green and the lamprey βCNE3 region in blue.

91

The alignment of lamprey Hox-γ cluster showed that it contains only one CNE located in the intergenic region of Hox9 and Hox5 (Fig. 22 and Table 2). This

CNE is conserved in all the four Hox clusters (HoxA, HoxB, HoxC, and

HoxD) of elephant shark and human (Table 2).

Lamprey-γ Atp5g 13 Hox9Hox5 Hox4 Cbx

Shark HoxA γCNE1

Human HoxA

Shark HoxB

Human HoxB %

Shark HoxC identity

Human HoxC

Shark HoxD

Human HoxD

Figure 22 - VISTA plot of the MLAGAN alignment of Japanese lamprey Hox-γ cluster with the four Hox clusters (A to D) of elephant shark and human. The lamprey Hox-γ sequence was used as the base (x-axis), and CNEs were predicted using a criterion of ≥65% identity across ≥50-bp windows. The y- axis represents percentage sequence identity. Blue peaks represent exons and pink peaks represent CNEs. The single identified CNE is marked with a green box.

The lamprey Hox-δ cluster also has only one CNE. It is located in the intergenic region of Hox8 and Hox4 (Fig. 23 and Table 2). This CNE is conserved only in the elephant shark and human HoxB clusters (Fig. 23 and

Table 2).

92

Lamprey-δ Lnp Hox13Hox11 Hox8 Hox4 Hox3 2 Smug

Shark HoxA

Human HoxA

Shark HoxB %

Human HoxB identity

Shark HoxC

Human HoxC

Shark HoxD

Human HoxD

Hox8

Shark HoxA

Human HoxA δCNE1

Shark HoxB

Human HoxB

Shark HoxC

Human HoxC

Shark HoxD Human HoxD Figure 23 - VISTA plot of the MLAGAN alignment of Japanese lamprey Hox-δ cluster with the four Hox clusters (HoxA to D) of elephant shark and human. The lamprey Hox-δ sequence was used as the base (x-axis), and CNEs were predicted using a criterion of ≥65% identity across ≥50-bp windows. The y- axis represents percentage sequence identity. Blue peaks represent exons and pink peaks represent CNEs. The single identified CNE is marked with a green box.

Overall, I identified a total of 13 CNEs between the four complete lamprey

Hox clusters (Hox-α, Hox-β, Hox-γ, and Hox-δ) and the HoxA, HoxB, HoxC,

93

and HoxD clusters of elephant shark and human (Table 2). Previously 179

CNEs were identified between the Hox clusters of human and elephant shark

(Ravi et al. 2009). Considering these two gnathostomes lineages diverged ~450

million years ago (Ma) (Inoue et al. 2010) and that lamprey and human

lineages diverged ~50 million years earlier (~500 million years ago) (Janvier

2006), the 13 CNEs identified in the lamprey Hox clusters is considerably

lower than that expected in relation to the divergence times. This low number is

also consistent with the overall low number of CNEs (~300) identified between

the whole genome sequences of the sea lamprey and human (Smith et al. 2013).

By contrast, ~2100 CNEs have been identified between human and fugu

(Venkatesh et al. 2006) that diverged from one another ~ 420 Ma. One possible explanation for the identification of a limited number of CNEs in the lamprey

Hox clusters (and the whole genome) could be due to a faster rate of evolution

of noncoding sequences in the lamprey genome compared to gnathostomes.

Another explanation could be that a large number of noncoding elements were brought under constraint after the divergence of lamprey and gnathostome lineages, which implies a large scale regulatory invention in the gnathostome

Hox clusters.

An interesting observation is that although lamprey Hox clusters contain few

CNEs, each lamprey Hox cluster (except Hox-δ cluster) shares CNEs with three or more elephant shark and human paralogous Hox clusters. For example, lamprey CNE2 and γCNE1 are conserved in all the four Hox clusters of elephant shark and human (Table 2). On the other hand, lamprey βCNE4 is conserved in three gnathostome Hox clusters while lamprey αCNE1, αCNE3,

94

αCNE4, αCNE5, and αCNE6 are conserved in two gnathostome Hox clusters

(Table 2 and Fig. 24).

Figure 24 - CNEs shared between each of the four lamprey Hox clusters and human Hox clusters. CNEs shared (z-axis) between each of the human HoxA, B, C, and D clusters (y-axis) and four complete lamprey Hox clusters.

This pattern of CNEs between individual lamprey Hox clusters and multiple paralogous gnathostome Hox clusters suggests that each lamprey Hox cluster is a common ortholog of all the four gnathostome Hox clusters rather than an ortholog of a single gnathostome Hox cluster. If it were the latter case, each lamprey Hox cluster would have shared CNEs predominantly with a single gnathostome Hox cluster. For example, when each of the human Hox clusters is aligned with the four elephant shark Hox clusters, each human cluster shares

CNEs predominantly with its one orthologous Hox cluster in elephant shark as shown in Fig. 25A. Similarly, when human Hox clusters were aligned with the seven Hox clusters (HoxAa, HoxAb, HoxBa, HoxBb, HoxCa, HoxDa, and

95

HoxDb) of fugu, which has experienced an additional round of WGD, the same

pattern (i.e. human HoxA shares most CNEs with fugu HoxAa or Ab, human

HoxB with fugu HoxBa or Bb, and so on) of CNE numbers being highest

between orthologous human and fugu Hox clusters is observed (Fig. 25B).

Figure 25 - 3D graph of the number of CNEs shared between human, elephant shark (C. milii), and fugu Hox clusters. (A) CNEs shared (z-axis) between each of the four human Hox clusters (y- axis) with elephant shark HoxA, B, C, and D clusters. (B) CNEs shared (z- axis) between each of the four human Hox clusters (y-axis) fugu HoxAa, Ba, Ca, and Da clusters (above), and duplicate fugu HoxAb, Bb, and Db clusters (below). Fugu has lost duplicated HoxCb cluster.

By contrast, the lamprey Hox clusters do not predominantly share CNEs with

just a single gnathostome Hox cluster, but with up to all the four Hox clusters

(HoxA, HoxB, HoxC, and HoxD) human and elephant shark. One explanation

for this is that the lampreys and gnathostome lineage diverged before the

duplication of Hox clusters and thus each lamprey Hox cluster is an ortholog of

all the four Hox clusters in human and elephant shark. This view contradicts the suggestion that the first two rounds of genome duplications in the vertebrate 96

stem lineage (1R and 2R) occurred before the divergence of lamprey and

gnathostome lineages (Smith et al. 2013).

4.3 Functional assay of CNEs

One way to determine whether the identified CNEs function as cis-regulatory

elements is to test their ability to up-regulate reporter gene expression in a

transgenic assay system. Zebrafish is ideal for this purpose due to its

transparent embryos, ease of manipulation, high fecundity, short generation

time, and inexpensive maintenance compared to mice. Recent advances in

transposon-based gene transfer strategies (Kawakami 2007) have enabled

testing expression in stable transgenic lines. To determine if the CNEs

identified in the lamprey Hox clusters have the potential to function as cis-

regulatory elements, I selected four lamprey CNEs and their human or mouse orthologs for testing in transgenic zebrafish (Table 3).

Table 3 – CNEs selected for functional assay in transgenic zebrafish. Selected Corresponding human CNE tested with length (bp) and % CNEs identity Human Human Human Human HoxA HoxB HoxC HoxD αCNE2 - 160 bp (78%) - - αCNE5 - 62 bp (71%) - - βCNE2 - - - 76 bp (80%) βCNE3 - - - 182 bp (73%)

Among the four CNEs, αCNE2 was selected due to its conservation in all four

human and elephant shark Hox clusters, suggesting that it may mediate

expression of neighboring Hox genes in all the four gnathostome Hox clusters.

The other three CNEs, αCNE5, βCNE2, and βCNE3 were selected as they

overlap known enhancer elements (Fig. 19 and 21) that were tested previously

97 in transgenic mice. It would be interesting to see if the element conserved in the lamprey drives expression pattern(s) similar to that in gnathostomes. I assayed the lamprey CNEs together with their corresponding human CNEs

(Table 3) to see whether sequence variations of the same CNE in different organisms contribute to differing expression patterns. Also, as I am conducting a trans-species assay (that is CNE from lamprey and human in transgenic zebrafish), the expression pattern driven by orthologous lamprey and human

CNEs can be assessed when both are taken out of context and tested in a different model organism.

For the functional assay, CNEs were first amplified from genomic DNA and cloned into a mini-Tol2 construct (modified from (Balciunas et al. 2006)) upstream of a GFP reporter gene. The CNE-containing mini-Tol2 vector and the transposase mRNA were co-injected into early stage fertilized zebrafish embryos and screened for GFP up to 72 hrs post fertilization (hpf) (see Fig. 4 in

Materials and methods). Zebrafish embryos showing transient GFP expression were grown to adult stage (3 months) and out-crossed with wild-type to generate stable transgenic lines to avoid possible phenotypic variation associated with transient transgenic assays. A CNE was considered a transcriptional enhancer if it directed reproducible reporter gene expression in the same anatomical structures in two or more lines of F1 fishes.

4.3.1 Functional assay of αCNE2

The lamprey αCNE2 is located in the intergenic region of Hox-α7 and Hox-α6.

98

LjHox‐α7LjHox‐α6

mb hb sc hb oc fb fb mb CNE2  oc

Lamprey ACB pf

hb sc mb hb fb fb CNE2  Human DE F

Figure 26 – Expression pattern driven by lamprey αCNE2 and its human homolog in 3 dpf F1 generation zebrafish embryos. A, B, D, and E: lateral view. C and F: dorsal view. oc, otic capsule; fb, forebrain; mb, midbrain; hb, hindbrain; pf, pectoral fin; sc, spinal cord.

In the F0 population, both lamprey and human CNEs drove weak expression to the spinal cord region. In the F1 population, both lamprey and human CNEs drove strong reproducible expression in the spinal cord, with the human CNE showing more pronounced expression throughout the spinal cord (Fig. 26).

This expression pattern resembles a subset of expression domains of Hox6 and

Hox7 genes in mouse and zebrafish. For example mouse HoxA6, HoxB6, and

HoxC6 are expressed in the spinal cord starting from behind the rhombencephalon and extending up to the caudal region (Nolte and Krumlauf

2000). Similarly all paralogs of zebrafish Hox6 are expressed in the spinal cord with an anterior limit adjacent to the somite1/2 boundary and extending posteriorly (Prince et al. 1998; Thisse and Thisse 2004, 2005). The zebrafish

Hoxb7a gene is expressed in the spinal cord beginning from the somite 2/3 boundary (Prince et al. 1998). In situ hybridization studies in the Japanese lamprey have shown that, Hox-α6 gene (known in literature as LjHox6W) is

99 expressed in the neural tube from the adjacent mid-somite 6 to 8 region (Takio et al. 2007). Thus it is possible that the CNE predicted by me is responsible for mediating expression of its flanking Hox genes in the neural tube.

The lamprey and human CNEs also drove expression in the forebrain, midbrain and hindbrain (Fig. 26). This is unexpected considering that Hox genes in gnathostomes are known to be expressed predominantly in the region posterior to hindbrain. However, in the Japanese lamprey, the anterior Hox-α2 gene

(known in literature as LjHox2) is expressed in the neural tube starting from the hindbrain (rhombomere 2) region (Takio et al. 2007). Thus the lamprey CNE might posses the potential to drive expression in the hindbrain region in the lamprey. Nevertheless, the expression patterns observed in the forebrain and midbrain of zebrafish are clearly not the expression domains of Hox genes.

This ectopic reporter gene expression could be due to the fact that the CNEs have been taken out of their original genomic context and may lack certain silencer elements that exclude their ectopic expression.

In addition to the shared expression domains of the lamprey and human CNEs described above, lamprey CNE drove reporter gene expression in the otic capsule and pectoral fin (Fig. 26) of transgenic zebrafish. These domains resemble the expression patterns of some Hox genes present in the vicinity of the CNE: a mouse Hox7 transcript has been shown to be expressed in the external ear (MacKenzie et al. 1991), while zebrafish Hoxc6a and Hoxa5a are expressed in the pectoral fin (Jarinova et al. 2008). An interesting point is that while lampreys possess otic capsules, they lack pectoral fins and hence the

100 lamprey CNE driving expression in this domain is unexpected. An explanation for this expression domain is that the lamprey CNE may have TFBS that have the potential to drive expression to paired appendages, and that the zebrafish may possess TFs that bind to these sequences.

4.3.2 Functional assay of αCNE5

The lamprey αCNE5 is located in the 512 bp intron of Hox-α4. Because this

CNE overlaps a conserved region (referred to as ‘CR1’ in literature) of the previously characterized mouse HoxB4 enhancer (Aparicio et al. 1995), I chose to test the mouse orthologs of this element in zebrafish. A total of three stable transgenic founder lines each were generated for lamprey and mouse elements.

Hox‐α4

e1 e2 512 bp

mn

CNE5 mn 

Lamprey Lamprey ACB

sc r

CNE5 at 

L r pd Mouse DE F

Figure 27 – Expression pattern driven by lamprey αCNE5 and its mouse homolog in 3 dpf F1 generation zebrafish embryos. A, B, D, and E: lateral view. C and F: dorsal view. mn, motor neurons; L, lens; r, retina; pd, pronephric duct; at, axonal tracts; sc, spinal cord.

101

Both lamprey and mouse CNEs drove strong spinal cord expression, albeit

transient, in both sets of the F0 population. In the F1 population, both lamprey

and mouse CNEs also drove expression in the spinal cord with an anterior

border at the hindbrain region (Fig. 27). The lamprey CNE specifically drove

GFP reporter gene expression in small dot-like structures which appear to be

individual motor neuron populations of the spinal cord, whereas the mouse

CNE directed pronounced GFP expression throughout the spinal cord and in individual axonal tracts along the spinal cord (Fig. 27). The pattern of mouse

CNE expression is consistent with the neural CR1 element shown to drive a

subset of Hox4 expression in the central nervous system (Aparicio et al. 1995)

with an anterior limit of expression at the hindbrain/spinal cord boundary of

developing mouse embryos (Brend et al. 2003). The corresponding zebrafish

element (referred to as b4-7 in literature) has also previously been shown to

direct reporter gene expression to the entire spinal cord of zebrafish embryos

(Punnamoottil et al. 2010). The lamprey CNE seems to recapitulate a subset of

the expression pattern of its flanking gene, Hox-α4 (known in literature as

LjHox4W) which has been shown to be expressed in the neural tube of lamprey embryos (Takio et al. 2007). The differences in the expression patterns driven by lamprey and mouse elements is probably due to the somewhat divergent sequences of the lamprey elements at its 5’ and 3’ regions compared to the highly conserved mouse and zebrafish sequence (Fig.

19). Such sequence divergence could result in the loss of binding sites for some transcription factors, thereby altering the expression patterns and/or the

levels of expression.

102

The mouse αCNE5 also drove reporter gene expression in some additional structures in transgenic zebrafish including the eye and pronephric duct (Fig.

27 D, E, F). These are clearly ectopic expression as Hox genes are not known to express in these domains. The mouse CNE also drove strong reporter gene expression to the pronephric duct (Fig. 27 D, E, F). This domain resembles the expression patterns of zebrafish Hox genes present in the vicinity of the region homologous to the mouse CNE, such as Hoxb5b and Hoxb6b. Both genes are expressed in the pronephric duct (Prince et al. 1998; Jarinova et al. 2008). The mouse CNE could be interacting with transcription factors of zebrafish thereby recapitulating the expression pattern of zebrafish genes.

4.3.3 Functional assay of βCNE2 and βCNE3

The lamprey βCNE2 and βCNE3 are located in the intergenic region of

Tax1bp and Evx, upstream of the Hox-β cluster. In human and zebrafish, the corresponding CNE is flanked by Lnp at the 5’ end instead of Tax1bp. The lamprey βCNE2 and βCNE3 regions are ~7 kb apart whereas the corresponding human CNE regions in the HoxD cluster are only ~170 bp apart

(Fig. 20). I tested the lamprey βCNE2 and βCNE3s separately as they are located far apart whereas human βCNE2 and βCNE3 were tested as a single element including 167 bp of their intergenic region. A total of two stable transgenic founder lines each for the two lamprey elements and the combined human element were generated.

103

βCNE2 βCNE3 Lj‐βTax1bp Lj‐βEvx

fp fp CNE2 β

fp

Lamprey ACB

ns CNE3 β ns

Lamprey DE F

oc fp pf j ba CNE2-3 β ba j oc oc Human GHI J

Figure 28 – Expression pattern driven by lamprey Hox-βCNE2 and Hox- βCNE3 and its human homolog in 3 dpf F1 generation zebrafish embryos. A, B, D, E, G, and H: lateral view. C, F, and I: dorsal view. J: ventral view. fp, floor plate; ns, notochord sheath; pf, pectoral fin; j, jaw; oc, otic capsule; ba, branchial arches.

In the F0 population, both lamprey CNEs (βCNE2 and βCNE3) and the

combined human CNE (βCNE2-3) drove expression to the lens and muscle

region. By contrast, in the F1 population the lamprey βCNE2 drove reporter

gene expression exclusively to the floor plate whereas βCNE3 drove to the

notochord sheath (also known as the notochord walls) (Fig. 28). On the other

hand, the combined human βCNE2-3 drove expression to the floor plate and

some additional structures such as the jaw, branchial arches, otic capsule, and

pectoral fin (Fig. 28). The expression of lamprey and human CNEs in the floor

plate is ectopic considering that neither Hox genes nor the two flanking non-

Hox genes in the locus (Tax1bp or Lnp and Evx) are normally expressed in this region. The notochord expression driven by lamprey βCNE3 and the lens, jaw,

104 and branchial arch expression driven by human βCNE2-3 are also considered as ectopic expression. Interestingly, human βCNE2-3 drove reporter gene expression to the pectoral fin (Fig. 28) where zebrafish orthologs of Lnp and

Evx2 are known to express (Ahn and Ho 2008). This suggests that the transcription factors in zebrafish could be reading the human CNEs as they read the cis-regulatory elements of Lnp and Evx2 genes and recapitulating their expression patterns. It should be also noted that the expression pattern driven by the lamprey and human CNEs are not consistent with hindbrain expression of human Hs246 enhancer element that shows partial overlap with the lamprey and human CNEs (Pennacchio et al. 2006). This is probably because the tested

CNEs are short and represent only a part of the 1340 bp Hs246 hindbrain enhancer (Fig. 20) and therefore may not include all TFBSs necessary to drive expression to the hindbrain.

4.3.4 Summary of expression patterns driven by selected lamprey CNEs

In summary, I assayed the function of four selected CNEs (from lamprey and human or mouse) out of the 13 CNEs predicted between the lamprey and gnathostome Hox clusters. Among these, two elements from lamprey and human (αCNE2 and αCNE5) drove expression patterns resembling some Hox genes in the cluster. This provides support to the view that CNEs are enriched for cis-regulatory elements. Since these elements are conserved in lamprey and gnathostomes like human and elephant shark, they must have originated in the common ancestor of jawless vertebrates and gnathostomes. Some of the CNEs tested by me also showed expression in domains in which Hox genes are not known to be expressed. Such ectopic expression are generally observed in

105 trans-species transgenic assays (de la Calle-Mustienes et al. 2005; Woolfe et al.

2005; McEwen et al. 2009), presumably due to a lack of other cis-elements such as enhancers or silencers that are required for the proper restricted expression, or due to the variations in the complement of transcription factors between the CNE bearing species (lamprey, human or mouse) and zebrafish in which the CNEs were tested.

106

Chapter 5: Results – Genome duplications in the lamprey lineage

5.1 Japanese lamprey Hox clusters and their duplication

I managed to identify 43 Hox genes in the Japanese lamprey genome and these

could be assembled into four complete Hox clusters (designated Hox-α, -β, -γ, and –δ), two fragmented clusters (Hox-ε and –ζ) and two singleton Hox4 genes

(Hox-η4 and –θ4) both linked to a single non-Hox gene (Fig. 7). This suggests that the Japanese lamprey has at least six Hox clusters in its genome. The identification of eight Hox4 genes (of which at least six are confirmed to be linked to other Hox genes) suggests the presence of at least eight Hox loci in

Japanese lamprey. This is an interesting finding as the presence of more than four Hox clusters in the Japanese lamprey would suggest that its lineage has

experienced an additional round of genome duplication. The presence of more

than four Hox clusters have only so far been noted in ray-finned fishes that are

the result of an additional genome duplication event (3R) in the lineage

(Christoffels et al. 2004; Jaillon et al. 2004). It is also possible that the

additional lamprey Hox clusters (Hox-ε and -ζ) are the result of gene or

segmental duplications. However, this possibility is less likely given that some

of the additional lamprey Hox clusters comprise multiple linked Hox genes

(Hox-ε14 to ε9 and Hox-ζ9 to ζ4) or Hox genes linked to known Hox cluster-

linked genes, like Hox-ε14 linked to Evx and Hox-ζ13 linked to Jazf and Evx

(Fig. 7). To date there is no evidence for such large-scale segmental duplication

of Hox loci in any other vertebrate and therefore this possibility in lamprey

appears unlikely. Given that the Japanese lamprey contains at least six Hox

clusters and this suggests that the lamprey lineage underwent an additional

107 round of genome duplication, I looked for additional evidence to support this hypothesis. If there was an additional round of genome duplication in the lamprey lineage, then its genome should contain genes with more paralogs than human, a genome having not undergone any additional rounds of whole- genome duplication. For this, I mined the Japanese lamprey genome assembly for gene families that have been studied previously for phylogenetic analyses of duplicated genes in sea lamprey and other vertebrates (Neidert et al. 2001;

Kuraku et al. 2009; Tank et al. 2009; Qiu et al. 2011; Crow et al. 2012).

5.2 Comparative analysis of non-Hox genes in lamprey and

representative gnathostomes

I extracted sequences of 179 members of 70 human genes/gene families (and some sea lamprey orthologs) from GenBank and Ensembl, and used this dataset to search for homologous gene containing scaffolds in the Japanese lamprey genome assembly. The gene-containing scaffolds were extracted and protein sequences predicted using homology-based and de novo-based methods

(like that done for Hox genes). Of the 179 members of 70 human genes/gene families searched, a total of 214 homologs were identified in the Japanese lamprey with 30 genes/gene families containing more members in Japanese lamprey than human and only 16 families containing more members in human than lamprey (Appendix Table 3).This finding suggests that the lamprey lineage underwent an additional round of genome duplication. Among the analyzed genes, the standout case supporting this hypothesis is KCNA. The

Japanese lamprey genome contains 15 full-length KCNA genes (Serial No.1 in

Appendix Table 3) located on different scaffolds whereas the human genome

108 contains eight KCNA genes (Hoegg and Meyer 2007). In the other analyzed genes, one member of the transducin family (GNA) is particularly interesting; in the human genome, a tandem duplication of an ancestral GNA gene gave rise to both GNAT and GNAI genes linked on the same chromosome, and three such pairs of genes on different chromosomes arose by chromosome duplications

(Nordstrom et al. 2004) however, the Japanese lamprey genome has three

GNAT genes like human, but seven GNAI genes (Serial No. 35 and 2 in

Appendix Table 3). In lamprey, there are two instances of GNAT genes found adjacent to a GNAI gene with presumably a third pair also in the genome like that shown for human. However this scenario leaves four extra unlinked copies of GNAI found on different scaffolds in the Japanese lamprey genome. These extra copies could be the result of tandem gene duplications if they are later found (through further genome sequencing) to be located on the same chromosome or, the extra copies of GNAI could have arosen as a result of the additional duplication event in the lamprey lineage with the secondary loss of their linked GNAT gene after duplication. Members of other gene families like

Fzd1/2/7 (three copies in human) have four copies in lamprey, and the presence of duplicate members of this gene family have been used previously to support independent genome duplications in the paddlefish lineage of ray-finned fishes

(Crow et al. 2012). An interesting finding is the identification of five Wnt7 genes in lamprey, a gene with only two copies in human (Wnt7a and Wnt7b) but four copies in zebrafish; two paralogs each of the human Wnt7a and Wnt7b genes that arose as a result of the teleost-specific genome duplication (3R) event (Beretta et al. 2011). The additional copies of Wnt7 in the lamprey genome therefore appear to be the product of an additional round of genome

109

duplication in the lamprey lineage. Furthermore, the multiple gene members in

lamprey are more often than not found on different scaffolds of the genome

assembly, suggesting that they are less likely to be copies resulting from

tandem duplications. Indeed, further sequencing of the genome could find that

some of these additional gene copies in the lamprey genome are found on the

same chromosome, and therefore possibly the result of tandem or segmental

duplications in the lineage. However, given that I used a diverse set of so many

(70) gene/gene families for analysis, it is highly unlikely that all 30 genes with

additional copies in the lamprey genome (compared to human) are the result of

such large-scale tandem or segmental duplications. Therefore, these findings of

additional copy numbers of more genes/gene families in the lamprey genome

compared to the human genome strongly suggest that the lamprey lineage

underwent an additional round of genome duplication.

5.3 Lamprey and gnathostome genome duplication history

The evidence for 1R and 2R in the vertebrate stem is well-supported (Dehal

and Boore 2005; Putnam et al. 2008) however the timing of these events in

relation to cyclostome and gnathostome divergence is yet to be resolved. As

Hox clusters are reflective of genome duplication history, the presence of four

Hox gene clusters in the basally branching cartilaginous fish (Ravi et al. 2009)

suggests that both 1R and 2R occurred before gnathostome divergence. In this

respect, by completely sequencing the Hox clusters of a cyclostome (Japanese

lamprey), I showed that the Japanese genome contains at least six Hox clusters

(or up to eight Hox loci) that likely arose as a result of three rounds of genome

duplication. Given that I have provided evidence that suggests for an additional

110

round of genome duplication in the lamprey lineage I was now interested to see

if I could infer the timings of 1R and 2R relative to cyclostome and

gnathostome divergence. Such inferences can be made by phylogenetic

analysis which can be used to assign orthology of lamprey genes to their

gnathostome relatives that underwent both 1R and 2R WGD. As an example,

by using the eight lamprey Hox loci and four representative gnathostome Hox

clusters (HoxA, HoxB, HoxC, HoxD) having undergone 1R and 2R only, there

are three hypothetical models for the relative timings of WGD events. These

relative timings can be inferred by the topology of lamprey and gnathostome

Hox genes and clusters in a phylogenetic tree (Fig. 29).

Figure 29 – Three models for WGD histories in lamprey and gnathostome lineages and their expected phylogenetic topologies. The pattern of Hox gene clustering with respect to the model of genome duplication history is shown below the respective model. G1R and G2R represent one and two rounds of independent genome duplication events in the gnathostome lineage whereas L1R, L2R, and L3R represent one, two, and three rounds of independent duplication events in the lamprey lineage.

The first model is for both 1R and 2R before the lamprey-gnathostome split

and then a third round (3R – not to be confused with the teleost-specific genome duplication) in the lamprey lineage (Fig. 29). If this were the case, in

111 the phylogenetic tree, we would expect to see genes from two of the eight lamprey Hox loci group with each of the four gnathostome Hox clusters (Fig.

29). The second model hypothesizes for 1R before the lamprey-gnathostome split, 2R only in the gnathostome lineage, and two additional independent rounds of WGD in the lamprey lineage (Fig. 29). In a phylogenetic tree, this would give a topology where genes from four of the eight lamprey Hox loci group with two of the four gnathostome Hox clusters (Fig. 29). The final model postulates 1R and 2R specifically in the gnathostome lineage (named G1R and

G2R) and for three independent rounds of WGD in the lamprey lineage, referred to as L1R, L2R, and L3R (Fig. 29). The expected topology for this would provide exclusive clustering of gnathostome genes together and lamprey genes together outside as they are derived from their own independent duplication events (Fig. 29).

In chapter 3 I showed that lamprey Hox genes cluster exclusively out of their gnathostome relatives (Fig. 10 to 14 and Appendix Fig. 1 to 4). This topology would lend support for model 3 where both 1R and 2R occurred independently in the lamprey and gnathostome lineages (Fig. 29). Given that I have now identified more genes/gene families with more copies in lamprey than human, I was interested to see whether members of these genes/gene families supported the same duplication model as that for Hox genes, or a different model of whole genome duplication history given in the top panels of Fig. 29. To carry this out, I extracted sequences of the selected gene families from representative vertebrate and invertebrate organisms available in GenBank and Ensembl.

These included sequences from human, chicken, lizard, zebrafish, fugu,

112

Xenopus, coelacanth, shark, sea lamprey, amphioxus and/or Ciona. Protein and

nucleotide coding sequences were aligned then trimmed and phylogenetic trees

were reconstructed using ML and BI methods (see Materials and methods);

some of these trees are given in (Appendix Figures 5 to 10). Phylogenetic

analysis of the 15 full-length Japanese lamprey KCNA genes with gnathostome

genes from the eight KCNA groups showed independent clustering of lamprey genes together outside of gnathostomes (Appendix Figure 5) and this topology

is observed for other gene families too (Appendix Figures 6 to 10). The

topology provided by phylogenetic analysis of non-Hox gene families with

more copies in lamprey than human support for both 1R and 2R occurring

independently in the lamprey and gnathostome lineages (model 3, top right

panel in Fig. 29). In previous studies of sea lamprey genes, including Hox

(Fried et al. 2003) and non-Hox genes (Neidert et al. 2001; Tank et al. 2009;

Qiu et al. 2011), phylogeny reconstruction also found that the lamprey

sequences clustered together outside of all gnathostome genes. However, such

topologies that imply independent duplication histories in the two lineages

(lampreys and gnathostomes) has been interpreted as an artifact due to the GC-

bias of the sea lamprey genome affecting codon usage pattern and the amino-

acid composition of protein sequences (Smith et al. 2013). Considering the GC- content of the Japanese lamprey genome (48%) is in fact higher than that of the

sea lamprey genome (46%) (Smith et al. 2013) it is likely that the Japanese

lamprey genome also has a GC-bias which might be causing the exclusive

clustering of the Japanese lamprey genes. Thus, phylogenetic analysis

involving any set of lamprey genes is not informative for timing the 1R and 2R

events relative to lamprey and gnathostome divergence.

113

As an alternative, data from my previous two results chapters can provide some clues to the timing of 1R and 2R in lamprey and gnathostome Hox clusters. For one, the synteny of genes linked to lamprey and gnathostome Hox clusters are different. While this could have occurred by the two lineages sharing 1R and

2R and then an independent loss of genes in the two lineages, it can also be explained by independent 1R and 2R events followed by independent secondary gene loss in the two lineages. The latter scenario is in fact well supported by the observation that single lamprey Hox clusters share CNEs across the four paralogous Hox clusters (HoxA, HoxB, HoxC, HoxD) of both elephant shark and human. This pattern of CNE conservation suggests that a single lamprey Hox cluster is a common ortholog of all the four representative gnathostome Hox clusters and this is likely if the lamprey and gnathostome

Hox clusters were duplicated after the two lineages diverged from one another.

These two independent lines of evidence which are unaffected by the GC-bias of the lamprey genome suggest that both 1R and 2R occurred independently in the lamprey and gnathostome lineages. If this inference is correct it contradicts findings from the comparative analysis of sea lamprey and tetrapod genomes that demonstrated patterns of homology and similar duplication frequencies of protein-coding genes consistent with 1R and 2R occurring before the lamprey lineage diverged from gnathostomes (Smith et al. 2013). However, considering the draft assembly was generated using somatic DNA which has been shown to lose ~20% of genomic DNA present in the germline (Smith et al. 2009; Smith et al. 2012), the analysis regarding the timing of 1R and 2R WGD events is considered inconclusive. Moreover, only two complete Hox clusters and an additional two linked and six unlinked Hox genes were found in the sea

114

lamprey genome assembly (Smith et al. 2013). This is in contrast to the

presence of at least six Hox clusters that I identified in the Japanese lamprey

genome. Considering that this finding suggests that there was an additional

round of genome duplication in the lamprey lineage, I was interested to see

whether the sea lamprey Hox genes support for the additional duplication event

in the lamprey lineage as it was not reported previously (Smith et al. 2013). For

this, I extracted sequences for 31 distinct sea lamprey Hox genes found in

GenBank (given in Appendix Table 2) and analyzed their homology to the

Japanese lamprey Hox genes. This analysis showed that the set of 43 Japanese

lamprey Hox genes included 1-to-1 orthologs of all 31 sea lamprey

(Petromyzon marinus) Hox genes available in GenBank (Marked with asterisk

in Fig. 30 and Appendix Table 2). Interestingly, Japanese lamprey orthologs of

the 31 Hox genes known in the sea lamprey are distributed across six Hox clusters (Hox-α to –ζ) (Fig. 30). This includes five paralogous group Hox9

genes each linked to multiple orthologous Hox genes and indicates that the sea

lamprey also contains more than four Hox clusters (Fig. 30).

115

Figure 30 - Schematic diagram of Japanese lamprey Hox clusters and a single flanking gene (if available). Non-Hox genes are shown in white color. Arrows denote the direction of gene transcription; pseudogenes are marked with a Ψ symbol. Asterisks indicate Japanese lamprey Hox genes that have 1-to-1 orthologs in the sea lamprey. Black lines represent scaffolds from the genome assembly whereas blue lines represent sequences obtained by sequencing BAC clones. Scaffold ends are represented as black circles and deletions in BAC clones are shown as brackets.

This finding implies that the proposed additional genome duplication event

predated the divergence of the Japanese lamprey and sea lamprey lineages.

Whether or not the additional duplication event predated cyclostome

divergence is unclear and inferences could be made if the Hox genes and

clusters were completely sequenced in hagfish. Partial sequences of 35 hagfish

Hox genes (Stadler et al. 2004) found in GenBank were too short (<40 amino acids) to definitively assign orthology to Japanese lamprey Hox genes.

However, the identification of seven Hox9 genes in Pacific hagfish suggests the presence of at least seven Hox clusters and implies that the hagfish lineage underwent an additional Hox cluster (and/or genome) duplication event

(Stadler et al. 2004) like lamprey.

116

Chapter 6: Discussion

The evolutionary innovations of many vertebrate-specific features are believed to be related to the two rounds (1R and 2R) of whole genome duplication

(WGD) events during early vertebrate evolution (Ohno 1970; Dehal and Boore

2005; Putnam et al. 2008). A major unresolved issue in this respect is the timing of these events in relation to cyclostome and gnathostome divergence and whether these events contributed to the complexity of both of these vertebrate lineages. The Hox transcription factors play a key role in defining body segment identities along the anterior-posterior axis of developing

embryos (Fig. 2) and are therefore implicated in the morphological diversity of

all vertebrates. In vertebrates, the numbers of Hox gene clusters serve as an

indicator for the genome duplication history of the lineage. The four Hox

clusters of elephant shark, coelacanth and tetrapods are believed to be the result

of both 1R and 2R WGD events. However, very little is known about the Hox

cluster organization in cyclostomes (comprising lampreys and hagfishes).

Characterization of the Hox clusters in cyclostomes would provide clues to the

genome duplication events at the root of vertebrates. Hence, the main aim of

my project was to sequence and characterize the Hox genes and clusters in

lamprey (a cyclostome) and perform comparative analysis with the Hox genes

and clusters of other chordates.

Our lab had identified Japanese lamprey as a good candidate to characterize a

cyclostome genome (including the Hox genes and clusters) due to its relatively

small genome size (~1.6 Gb) in comparison to sea lamprey (~2.3 Gb) (Smith et

117

al. 2013) and pacific hagfish (~2.6 Gb) (Gregory T.R. 2005). To this end, by

using a combination of BAC sequencing data and scaffold sequences from a draft germline genome assembly I identified a total of 43 Hox genes which could be assembled into six Hox clusters (four complete clusters: Hox-α, -β, -γ,

-δ and two fragmented clusters: Hox-ε and –ζ) and two singletons (Hox-η4 and

–θ4) each linked to a non-Hox gene (Fig. 7). This shows that the Japanese

lamprey genome contains at least six Hox clusters. A striking feature of the

Japanese lamprey Hox clusters is their large size and high repeat content

compared to gnathostome Hox clusters (Fig. 16 and 17). In the lamprey Hox

clusters, the invasion of various repeat elements may have influenced the

spatial and temporal expression patterns of certain Hox genes. Indeed, the

expansion of squamate Hox cluster regions through repeat element insertion is

thought to have facilitated Hox adaptations in the differential body plans

observed between the members of this diverse group (Di-Poi et al. 2010). In

this way, the invasion of a higher number of repeat elements in the lamprey

Hox clusters may have significantly impacted the arrangement and splicing of

coding regions, like for example in lamprey Hox-η4, which is the first reported

case of three introns in any Hox gene (Fig. 8). Additionally, repeat element

insertion may have also impacted the arrangement of functional noncoding

regions, affecting the direct regulation of Hox gene transcription and thus, the

spatial and temporal expression patterns of certain lamprey Hox genes.

An interesting feature of Hox cluster genes is their position within the cluster

reflecting their nested and time-coordinated expression pattern along the

anterior-posterior axis of the developing embryo, otherwise referred to as

118

‘spatial’ and ‘temporal’ collinearity (Gaunt et al. 1988; Izpisua-Belmonte et al.

1991). This ‘Hox code’ is influential in conferring different structural

identities along the developing embryo. Changes to the Hox code are thought

to drive evolutionary novelties, some examples of which include the transition

of fins to limbs through the action of posterior HoxD genes (Ahn and Ho

2008) and evolution of the snake body plan through the variation of protein sequences and regulation of posterior Hox genes compared to other squamates

(Di-Poi et al. 2010). The absence of Hox12 in lampreys, which is otherwise retained in gnathostome Hox clusters and possibly recruited for the development of gnathostome-specific organs, like paired-limbs (Capecchi

1989), suggests that there could be differences between the lamprey and

gnathostome Hox code. Considering spatial collinearity exists in most studied

invertebrates and Japanese lamprey (Takio et al. 2007) whereas gnathostomes

exhibit both spatial and temporal collinearity, this then raises the question of

whether or not temporal collinearity of Hox genes (the vertebrate Hox code)

was already established in the last common ancestor of cyclostomes and

gnathostomes.

To this end, the temporal expression patterns of 18 Japanese lamprey Hox

sequences were previously studied in lamprey embryos by in situ

hybridization (Takio et al. 2004; Takio et al. 2007). Analysis of the embryonic expression patterns of three Japanese lamprey Hox genes, whose orthologs in

the sea lamprey are found in a cluster, showed no sign of temporal collinearity

(Takio et al. 2007). Considering this expression data was generated without prior knowledge of the organization of Hox genes in clusters, it was

119 inconclusive in providing evidence for a distinct lack of temporal collinearity of Hox genes in Japanese lamprey. Since the relative positions of many more

Japanese lamprey Hox genes are now known from my study, I analyzed both their spatial and temporal collinearity by mapping previous known expression data to the lamprey Hox genes and clusters generated in my study (Takio et al.

2004; Takio et al. 2007) (Fig. 31). As expected, the collinearity of a spatial order of expression in the rhombomeres, pharyngeal ectomesenchyme and neural tube was conserved in the Japanese lamprey (Fig. 31). However, the overall temporal order of the onset of expression in the neural tube was not collinear with the positions of the Hox genes in the clusters (Fig. 31). Among the Hox-α cluster genes, the posterior Hox-α8 gene starts expressing in the neural tube earlier (stage 20) than the anterior Hox-α2 and -α3 genes (stage

21), while Hox-α6 expression is initiated two stages before Hox-α5 expression

(stage 24) (upper left panel in Fig. 31). Likewise, among the Hox-β cluster genes, Hox-β1 and Hox-β7 start expressing at stage 23 whereas expression of the more posterior Hox-β9 gene is initiated earlier at stage 22 (upper right panel in Fig. 31). In the Hox-ε cluster, the onset of Hox-ε8 expression is seen at stage 19, which is the earliest known stage of Hox gene expression (lower panel in Fig. 31). Thus, it is unlikely that the complete complement of genes in the Hox-ε cluster exhibit temporal collinearity.

120

Figure 31 - Graphical representation of spatial (left of the cluster) and temporal (right of the cluster) expression patterns of the Japanese lamprey Hox-α, -β, and –ε cluster genes in Japanese lamprey embryos. Expression patterns of Japanese lamprey Hox genes were obtained from previous studies (Takio et al. 2004; Takio et al. 2007). Expression pattern of only one gene in Hox-cluster and none in Hox- cluster is known and hence these clusters are not shown. Spatial expression domains are shown for rhombomeres (r2-6: peach bars), neural tube (nt: blue bars), and arches of pharyngeal pouches (pp1-8: brown bars). Temporal expression is shown as the earliest onset of neural tube expression (blue bars) at developmental stages 19- 30.

However it must be noted that certain genes from each of the three lamprey

Hox clusters shown in Fig. 31 follow a limited temporal collinear order of expression. For example, PG3 to 5 in the Hox-α cluster, PG9 to 10 in the Hox-

121

β cluster, and PG8 to 11 in the Hox-ε cluster all follow temporal collinearity.

The data conferring the lack of temporal collinearity of lamprey Hox clusters is therefore incomplete and requires further attention. Future work may be able to resolve this issue by conducting in situ hybridization on lamprey embryos using probes for all the Hox genes and studying their expression pattern along the lamprey neural tube at different stages of development. Overall, the lamprey Hox cluster genes exhibit spatial collinearity and possibly lack temporal collinearity, similar to invertebrate Hox cluster genes. However, the four Hox clusters of gnathostomes exhibit both kinds of collinearity. If there is a lack of temporal collinearity in lamprey Hox gene clusters, this would be an interesting finding, and in fact the first among studied vertebrates. One probable explanation for this scenario is that lamprey and gnathostome lineages diverged before the duplication of the Hox clusters. This was then followed by evolution of structural and regulatory changes that underlie temporal collinearity within the ancestral gnathostome Hox cluster, and finally duplications of the clusters occurred to give the four Hox clusters of gnathostomes. The alternative scenario, in which lamprey and gnathostomes shared the Hox cluster duplication history and then the duplicated lamprey Hox clusters independently lost the regulatory mechanisms involved in temporal collinearity, is less parsimonious.

Compared to the compact gnathostome Hox clusters that are less prone to repeat element insertion, I have shown that the lamprey Hox clusters are large like invertebrate Hox clusters (Fig. 16) and have been invaded by a high number of repeat elements (Fig. 17). I have also shown (in conjunction with

122 previous data) that the lamprey Hox clusters display spatial collinearity but data conferring the lack of temporal collinearity is inconclusive at this stage and requires further examination (Fig. 31). Nonetheless, given the organizational differences between lamprey and gnathostome Hox clusters, I was at this point interested in studying the regulation of lamprey Hox genes.

The first step in studying the regulation of genes is the identification of certain cis-regulatory elements (such as enhancers) that mediate the tissue-specific expression of genes. Comparative analysis of the lamprey and gnathostome

Hox clusters offer the maximum evolutionary distance in which to characterize ancient vertebrate cis-regulatory elements and since they are evolving slower than flanking sequences, potential cis-regulatory elements can be identified as conserved noncoding elements (CNEs). By aligning the four complete (Hox-α,

Hox-β, Hox-γ, and Hox-δ) lamprey Hox clusters with all the four clusters of representative gnathostomes (elephant shark and human) I predicted a total of

13 CNEs (Table 2). In the set of 13 identified CNEs, three (αCNE5, βCNE2, and βCNE3) overlapped known enhancer elements (Fig. 19 and 21) and were therefore recognized as ancient enhancer elements driving the expression patterns of certain flanking genes. For the other ten identified CNEs, certain ones, like Hox-αCNE2, were highly conserved in all the four Hox clusters of human and elephant shark (Fig. 18 and Table 2) and on this basis were selected for assaying the function in transgenic zebrafish. In total I tested the function of four identified CNEs (Table 3) and most notably, two, αCNE2 and αCNE5, each drove overlapping domains of reporter gene expression for both lamprey and human/mouse elements (Fig. 26 and 27). Overall, these two CNEs each drove expression patterns in certain domains (like the hindbrain and spinal

123 cord) resembling some Hox genes in the cluster Fig. 26 and 27). This suggests that these CNEs are transcriptional enhancers mediating the expression pattern of its flanking gene. Future experiments involving transposon-mediated BAC transgenesis (Suster et al. 2011) can provide a better understanding to the role of certain identified lampreys CNEs in Hox gene regulation. For example, a single BAC (BAC#Ljt343N18) contains five Hox genes (Fig. 6) with a CNE

(αCNE2) located in the intergenic region of Hox-α7 and Hox-α6. By introducing different reporter genes e.g. GFP, YFP, and mCherry into the transcription start site of different Hox genes in the BAC, the specific reporter gene expression pattern driven by the CNE can be analyzed in transgenic zebrafish.

The low number of CNEs identified in the lamprey Hox clusters is consistent with the low number of CNEs identified in the sea lamprey genome (Smith et al. 2013). The 13 CNEs identified in the lamprey Hox clusters are in fact drastically lower than the number of CNEs (179 in total) identified between the

Hox clusters of human and elephant shark (Ravi et al. 2009). Considering elephant shark and human diverged ~450 million years ago (Ma) (Inoue et al.

2010) and that the lamprey and human lineages diverged ~50 million years earlier (Janvier 2006), the 13 CNEs identified in the lamprey Hox clusters is considerably lower than that expected in relation to the divergence times. In this respect, the CNEs in the lamprey Hox clusters (and the genome) could be evolving at a faster rate in the lamprey genome compared to gnathostomes.

One other explanation is that a number of CNEs were functionally constrained after the divergence of the two lineages (lampreys and gnathostomes) which

124

was followed by the large scale expansion of regulatory elements in the

gnathostome Hox clusters.

Identification of more than four Hox clusters in Japanese lamprey suggests that

its lineage has experienced an additional round of whole genome duplication.

The possibility of the additional lamprey Hox clusters (Hox-ε and Hox-ζ)

arising from gene or segmental duplications is highly unlikely given that these

additional clusters comprise multiple linked Hox genes and such large-scale

segmental duplication events of Hox loci have not been reported in any other

vertebrate previously. Additionally, analysis for the number of 70 human

gene/gene family members in the lamprey genome recovered a significantly

higher number of gene/gene families with more members in lamprey than

human, consistent with an additional duplication event in the lamprey lineage

(Appendix Table 3). Furthermore, the majority of genes/gene families present

with more members in lamprey than human are found on separate scaffolds,

suggesting that they are unlikely to be copies resulting from tandem

duplications. Also, analysis of many gene/gene families makes it less likely

that a significant number of genes with more copies in lamprey than human

were generated by large-scale tandem or segmental duplications in the lamprey

lineage. Therefore, the identification of many more gene/gene family members

in lamprey further supports for an additional round of whole genome

duplication in the lamprey lineage. If my inference of an additional genome duplication event in the lamprey lineage is correct, then it raises questions about previous inferences of the timings of 1R and 2R based on the assumption that the lamprey lineage has experienced only 1R and 2R similar to

125 gnathostomes. For example, based on the analysis of frequency of retained duplicate genes and their synteny pattern in the sea lamprey and gnathostome genomes, it was concluded that both 1R and 2R occurred before the divergence of the lamprey and gnathostome lineages (Smith et al. 2013). This conclusion needs to be reexamined in light of the possibility that lamprey may have experienced an additional round of genome duplication. Because of the codon usage and amino acid composition bias of lamprey protein-coding genes (Qiu et al. 2011; Smith et al. 2013) and the consequent exclusive clustering of lamprey sequences outside gnathostome sequences in the phylogenetic trees

(Fig. 10 to 14 and Appendix Figure 1 to 10), phylogenetic analysis is not informative for timing 1R and 2R events in relation to the divergence of lamprey and gnathostome lineages. As an alternative, I looked for clues about the timings of 1R and 2R in the Hox clusters of lamprey and gnathostomes.

First, the complement of syntenic genes linked to each lamprey Hox cluster is different from those linked to gnathostome Hox clusters (Fig. 15). Although this pattern of synteny of duplicated genes can be explained by shared duplication history followed by independent gene losses, this scenario can also be due to independent histories of duplication and independent secondary loss of genes in the lamprey and gnathostome lineages. Second, individual lamprey

Hox clusters share CNEs across the four paralogous Hox clusters of elephant shark and human, rather than predominantly with one gnathostome cluster (Fig.

24). This pattern of CNE distribution suggests a many-to-many orthology relationship between lamprey and representative gnathostome Hox clusters, which can be the case if the lamprey and gnathostome Hox clusters duplicated after the divergence of the two lineages. Thirdly, the possible lack of temporal

126 collinearity of the lamprey Hox clusters (Fig. 31) suggests that the recruitment of regulatory elements conferring temporal collinearity to the gnathostome Hox clusters could have occurred in the gnathostome lineage prior to the duplication of gnathostome Hox clusters, but after the lamprey-gnathostome split. These independent lines of evidence gained from sequencing the lamprey Hox clusters suggest that Hox cluster duplication occurred independently in the lamprey and gnathostome lineages (Model 3 in Fig. 29). Further studies are required to verify this inference through detailed comparative analysis of germ line genomes of the sea lamprey, Japanese lamprey and representative gnathostome genomes. Nonetheless, if the inference of independent Hox cluster duplications in the lamprey and gnathostome lineages is correct, it implies that 1R and 2R occurred independently in the lamprey and gnathostome lineages followed by an additional whole genome duplication event in the lamprey lineage (Model 3 in Fig. 29).

127

Bibliography

Abzhanov A, Kaufman TC. 1999. Homeotic genes and the arthropod head: expression patterns of the labial, proboscipedia, and Deformed genes in crustaceans and insects. Proc Natl Acad Sci U S A. 96:10224-10229. Ahn D, Ho RK. 2008. Tri-phasic expression of posterior Hox genes during development of pectoral fins in zebrafish: implications for the evolution of vertebrate paired appendages. Dev Biol. 322:220-233. Aigner B, Fleischmann M, Muller M, Brem G. 1999. Stable long-term germ- line transmission of transgene integration sites in mice. Transgenic Res. 8:1-8. Allendorf F, Thorgaard G. 1984. Tetraploidy and the Evolution of Salmonid Fishes. In: Turner B, editor. Evolutionary Genetics of Fishes: Springer US. p. 1-53. Amemiya CT, Powers TP, Prohaska SJ, et al. 2010. Complete HOX cluster characterization of the coelacanth provides further evidence for slow evolution of its genome. Proc Natl Acad Sci U S A. 107:3622-3627. Amemiya CT, Prohaska SJ, Hill-Force A, Cook A, Wasserscheid J, Ferrier DE, Pascual-Anaya J, Garcia-Fernandez J, Dewar K, Stadler PF. 2008. The amphioxus Hox cluster: characterization, comparative genomics, and evolution. J Exp Zoolog B Mol Dev Evol. 310:465-477. Amores A, Force A, Yan YL, et al. 1998. Zebrafish hox clusters and vertebrate genome evolution. Science. 282:1711-1714. Andrey G, Montavon T, Mascrez B, Gonzalez F, Noordermeer D, Leleu M, Trono D, Spitz F, Duboule D. 2013. A switch between topological domains underlies HoxD genes collinearity in mouse limbs. Science. 340:1234167. Andrioli LP, Vasisht V, Theodosopoulou E, Oberstein A, Small S. 2002. Anterior repression of a Drosophila stripe enhancer requires three position-specific mechanisms. Development. 129:4931-4940. Aparicio S, Morrison A, Gould A, Gilthorpe J, Chaudhuri C, Rigby P, Krumlauf R, Brenner S. 1995. Detecting conserved regulatory elements with the model genome of the Japanese puffer fish, Fugu rubripes. Proc Natl Acad Sci U S A. 92:1684-1688. Arenas-Mena C, Cameron AR, Davidson EH. 2000. Spatial expression of Hox cluster genes in the ontogeny of a sea urchin. Development. 127:4631- 4643. Aronowicz J, Lowe CJ. 2006. Hox gene expression in the hemichordate Saccoglossus kowalevskii and the evolution of deuterostome nervous systems. Integr Comp Biol. 46:890-901. Balciunas D, Wangensteen KJ, Wilber A, et al. 2006. Harnessing a high cargo-capacity transposon for genetic applications in vertebrates. PLoS Genet. 2:e169. Banerji J, Rusconi S, Schaffner W. 1981. Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences. Cell. 27:299-308. Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D. 2004. Ultraconserved elements in the human genome. Science. 304:1321-1325.

128

Bender W, Akam M, Karch F, Beachy PA, Peifer M, Spierer P, Lewis EB, Hogness DS. 1983. Molecular Genetics of the Bithorax Complex in Drosophila melanogaster. Science. 221:23-29. Beretta CA, Brinkmann I, Carl M. 2011. All four zebrafish Wnt7 genes are expressed during early brain development. Gene Expr Patterns. 11:277-284. Bray N, Dubchak I, Pachter L. 2003. AVID: A global alignment program. Genome Res. 13:97-102. Brend T, Gilthorpe J, Summerbell D, Rigby PW. 2003. Multiple levels of transcriptional and post-transcriptional regulation are required to define the domain of Hoxb4 expression. Development. 130:2717-2728. Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S. 2003. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13:721- 731. Brunschwig K, Wittmann C, Schnabel R, Burglin TR, Tobler H, Muller F. 1999. Anterior organization of the Caenorhabditis elegans embryo by the labial-like Hox gene ceh-13. Development. 126:1537-1546. Burglin TR, Ruvkun G. 1993. The Caenorhabditis elegans homeobox gene cluster. Curr Opin Genet Dev. 3:615-620. Butler JE, Kadonaga JT. 2002. The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev. 16:2583- 2592. Cameron RA, Rowen L, Nesbitt R, et al. 2006. Unusual gene order and organization of the sea urchin hox cluster. J Exp Zool B Mol Dev Evol. 306:45-58. Campo-Paysaa F, Semon M, Cameron RA, Peterson KJ, Schubert M. 2011. microRNA complements in deuterostomes: origin and evolution of microRNAs. Evol Dev. 13:15-27. Capecchi MR. 1989. Altering the genome by homologous recombination. Science. 244:1288-1292. Carpenter EM. 2002. Hox genes and spinal cord development. Dev Neurosci. 24:24-34. Cereghini S, Saragosti S, Yaniv M, Hamer DH. 1984. SV40-alpha-globulin hybrid minichromosomes. Differences in DNase I hypersensitivity of promoter and enhancer sequences. Eur J Biochem. 144:545-553. Chambers KE, McDaniell R, Raincrow JD, Deshmukh M, Stadler PF, Chiu CH. 2009. Hox cluster duplication in the basal teleost Hiodon alosoides (Osteoglossomorpha). Theory Biosci. 128:109-120. Chiori R, Jager M, Denker E, Wincker P, Da Silva C, Le Guyader H, Manuel M, Queinnec E. 2009. Are Hox genes ancestrally involved in axial patterning? Evidence from the hydrozoan Clytia hemisphaerica (Cnidaria). PLoS One. 4:e4231. Chisaka O, Musci TS, Capecchi MR. 1992. Developmental defects of the ear, cranial nerves and hindbrain resulting from targeted disruption of the mouse homeobox gene Hox-1.6. Nature. 355:516-520. Christoffels A, Koh EG, Chia JM, Brenner S, Aparicio S, Venkatesh B. 2004. Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol Biol Evol. 21:1146-1151.

129

Condie BG, Capecchi MR. 1993. Mice homozygous for a targeted disruption of Hoxd-3 (Hox-4.1) exhibit anterior transformations of the first and second cervical vertebrae, the atlas and the axis. Development. 119:579-595. Crow KD, Smith CD, Cheng JF, Wagner GP, Amemiya CT. 2012. An independent genome duplication inferred from Hox paralogs in the American paddlefish--a representative basal ray-finned fish and important comparative reference. Genome Biol Evol. 4:937-953. Davenne M, Maconochie MK, Neun R, Pattyn A, Chambon P, Krumlauf R, Rijli FM. 1999. Hoxa2 and Hoxb2 control dorsoventral patterns of neuronal development in the rostral hindbrain. Neuron. 22:677-691. Davis AP, Capecchi MR. 1994. Axial homeosis and appendicular skeleton defects in mice with a targeted disruption of hoxd-11. Development. 120:2187-2198. de la Calle-Mustienes E, Feijoo CG, Manzanares M, Tena JJ, Rodriguez- Seguel E, Letizia A, Allende ML, Gomez-Skarmeta JL. 2005. A functional survey of the enhancer activity of conserved non-coding sequences from vertebrate Iroquois cluster gene deserts. Genome Res. 15:1061-1072. Dehal P, Boore JL. 2005. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 3:e314. Delsuc F, Brinkmann H, Chourrout D, Philippe H. 2006. Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature. 439:965-968. Di-Poi N, Montoya-Burgos JI, Duboule D. 2009. Atypical relaxation of structural constraints in Hox gene clusters of the green anole lizard. Genome Res. 19:602-610. Di-Poi N, Montoya-Burgos JI, Miller H, Pourquie O, Milinkovitch MC, Duboule D. 2010. Changes in Hox genes' structure and function during the evolution of the squamate body plan. Nature. 464:99-103. Dolle P, Izpisua-Belmonte JC, Brown JM, Tickle C, Duboule D. 1991. HOX-4 genes and the morphogenesis of mammalian genitalia. Genes Dev. 5:1767-1767. Duboule D. 2007. The rise and fall of Hox gene clusters. Development. 134:2549-2560. Duboule D, Dolle P. 1989. The structural and functional organization of the murine HOX gene family resembles that of Drosophila homeotic genes. EMBO J. 8:1497-1505. Economides KD, Zeltser L, Capecchi MR. 2003. Hoxb13 mutations cause overgrowth of caudal spinal cord and tail vertebrae. Dev Biol. 256:317- 330. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792-1797. Ewing B, Green P. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome research. 8:186-194. Ewing B, Hillier L, Wendl MC, Green P. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome research. 8:175-185.

130

Finnerty JR, Pang K, Burton P, Paulson D, Martindale MQ. 2004. Origins of bilateral symmetry: Hox and dpp expression in a sea anemone. Science. 304:1335-1337. Force A, Amores A, Postlethwait JH. 2002. Hox cluster organization in the jawless vertebrate Petromyzon marinus. J Exp Zool. 294:30-46. Frasch M, Chen X, Lufkin T. 1995. Evolutionary-conserved enhancers direct region-specific expression of the murine Hoxa-1 and Hoxa-2 loci in both mice and Drosophila. Development. 121:957-974. Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I. 2004. VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32:W273-279. Freeman R, Ikuta T, Wu M, et al. 2012. Identical genomic organization of two hemichordate hox clusters. Curr Biol. 22:2053-2058. Fried C, Prohaska SJ, Stadler PF. 2003. Independent Hox-cluster duplications in lampreys. J Exp Zoolog B Mol Dev Evol. 299:18-25. Fried C, Prohaska SJ, Stadler PF. 2004. Exclusion of repetitive DNA elements from gnathostome Hox clusters. J Exp Zool B Mol Dev Evol. 302:165- 173. Fritzsch G, Bohme MU, Thorndyke M, Nakano H, Israelsson O, Stach T, Schlegel M, Hankeln T, Stadler PF. 2008. PCR survey of Xenoturbella bocki Hox genes. J Exp Zool B Mol Dev Evol. 310:278-284. Galas DJ, Schmitz A. 1978. DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res. 5:3157-3170. Gauchat D, Mazet F, Berney C, Schummer M, Kreger S, Pawlowski J, Galliot B. 2000. Evolution of Antp-class genes and differential expression of Hydra Hox/paraHox genes in anterior patterning. Proc Natl Acad Sci U S A. 97:4493-4498. Gaunt SJ, Sharpe PT, Duboule D. 1988. Spatially restricted domains of homeo-gene transcripts in mouse embryos: relation to a segmented body plan. Development. 104:169-179. Goodman FR, Bacchelli C, Brady AF, et al. 2000. Novel HOXA13 mutations and the phenotypic spectrum of hand-foot-genital syndrome. Am J Hum Genet. 67:197-202. Gordon D. 2003. Viewing and editing assembled sequences using Consed. Current protocols in bioinformatics / editoral board, Andreas D. Baxevanis ... [et al.]. Chapter 11:Unit11 12. Graham A, Papalopulu N, Krumlauf R. 1989. The murine and Drosophila homeobox gene complexes have common features of organization and expression. Cell. 57:367-378. Gregory TR. 2005. Animal Genome Size Database. http://www.genomesize.com. Gregory TR, Nicol JA, Tamm H, Kullman B, Kullman K, Leitch IJ, Murray BG, Kapraun DF, Greilhuber J, Bennett MD. 2007. Eukaryotic genome size databases. Nucleic Acids Res. 35:D332-338. Gross DS, Garrard WT. 1988. Nuclease hypersensitive sites in chromatin. Annu Rev Biochem. 57:159-197. Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41:95-98.

131

Hardisty MW. 2011. Lampreys : life without jaws. Tresaith, Cardigan, Ceregidion: Forrest. Heimberg AM, Cowper-Sal·lari R, Sémon M, Donoghue PCJ, Peterson KJ. 2010. microRNAs reveal the interrelationships of hagfish, lampreys, and gnathostomes and the nature of the ancestral vertebrate. Proceedings of the National Academy of Sciences. Henkel CV, Burgerhout E, de Wijze DL, et al. 2012. Primitive Duplicate Hox Clusters in the European Eel's Genome. PLoS One. 7:e32231. Higasa K, Nikaido M, Saito TL, et al. 2012. Extremely slow rate of evolution in the HOX cluster revealed by comparison between Tanzanian and Indonesian coelacanths. Gene. 505:324-332. Hirth F, Hartmann B, Reichert H. 1998. Homeotic gene action in embryonic brain development of Drosophila. Development. 125:1579-1589. Hoegg S, Meyer A. 2007. Phylogenomic analyses of KCNA gene clusters in vertebrates: why do gene clusters stay intact? BMC Evol Biol. 7:139. Holder M, Lewis PO. 2003. Phylogeny estimation: traditional and Bayesian approaches. Nat Rev Genet. 4:275-284. Holland LZ, Albalat R, Azumi K, et al. 2008. The amphioxus genome illuminates vertebrate origins and cephalochordate biology. Genome Res. 18:1100-1111. Holstege JC, de Graaff W, Hossaini M, Cardona Cano S, Jaarsma D, van den Akker E, Deschamps J. 2008. Loss of Hoxb8 alters spinal dorsal laminae and sensory responses in mice. Proc Natl Acad Sci U S A. 105:6338-6343. Hostikka SL, Gong J, Carpenter EM. 2009. Axial and appendicular skeletal transformations, ligament alterations, and motor neuron loss in Hoxc10 mutants. Int J Biol Sci. 5:397-410. Ikuta T, Yoshida N, Satoh N, Saiga H. 2004. Ciona intestinalis Hox gene cluster: Its dispersed structure and residual colinear expression in development. Proc Natl Acad Sci U S A. 101:15118-15123. Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576-2586. Irvine SQ, Carr JL, Bailey WJ, Kawasaki K, Shimizu N, Amemiya CT, Ruddle FH. 2002. Genomic analysis of Hox clusters in the sea lamprey Petromyzon marinus. J Exp Zool. 294:47-62. Izpisua-Belmonte JC, Falkenstein H, Dolle P, Renucci A, Duboule D. 1991. Murine genes related to the Drosophila AbdB homeotic genes are sequentially expressed during development of the posterior part of the body. EMBO J. 10:2279-2289. Jaillon O, Aury JM, Brunet F, et al. 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto- karyotype. Nature. 431:946-957. Janvier P. 2006. Palaeontology: modern look for ancient lamprey. Nature. 443:921-924. Jarinova O, Hatch G, Poitras L, Prudhomme C, Grzyb M, Aubin J, Berube- Simard FA, Jeannotte L, Ekker M. 2008. Functional resolution of duplicated hoxb5 genes in teleosts. Development. 135:3543-3553.

132

Johnson D, Kan SH, Oldridge M, Trembath RC, Roche P, Esnouf RM, Giele H, Wilkie AO. 2003. Missense mutations in the homeodomain of HOXD13 are associated with brachydactyly types D and E. Am J Hum Genet. 72:984-997. Kamm K, Schierwater B, Jakob W, Dellaporta SL, Miller DJ. 2006. Axial patterning and diversification in the cnidaria predate the Hox system. Curr Biol. 16:920-926. Karlsson J, von Hofsten J, Olsson PE. 2001. Generating transparent zebrafish: a refined method to improve detection of gene expression during embryonic development. Mar Biotechnol (NY). 3:522-527. Kaufman TC, Lewis R, Wakimoto B. 1980. Cytogenetic Analysis of Chromosome 3 in DROSOPHILA MELANOGASTER: The Homoeotic Gene Complex in Polytene Chromosome Interval 84a-B. Genetics. 94:115-133. Kawakami K. 2007. Tol2: a versatile gene transfer vector in vertebrates. Genome Biol. 8 Suppl 1:S7. Kawakami K, Takeda H, Kawakami N, Kobayashi M, Matsuda N, Mishina M. 2004. A transposon-mediated gene trap approach identifies developmentally regulated genes in zebrafish. Dev Cell. 7:133-144. Keene MA, Corces V, Lowenhaupt K, Elgin SC. 1981. DNase I hypersensitive sites in Drosophila chromatin occur at the 5' ends of regions of transcription. Proc Natl Acad Sci U S A. 78:143-146. Kenyon CJ, Austin J, Costa M, et al. 1997. The dance of the Hox genes: patterning the anteroposterior body axis of Caenorhabditis elegans. Cold Spring Harb Symp Quant Biol. 62:293-305. King BL, Gillis JA, Carlisle HR, Dahn RD. 2011. A natural deletion of the HoxC cluster in elasmobranch fishes. Science. 334:1517. Kmita M, Fraudeau N, Herault Y, Duboule D. 2002. Serial deletions and duplications suggest a mechanism for the collinearity of Hoxd genes in limbs. Nature. 420:145-150. Kmita M, Tarchini B, Zakany J, Logan M, Tabin CJ, Duboule D. 2005. Early developmental arrest of mammalian limbs lacking HoxA/HoxD gene function. Nature. 435:1113-1116. Kohany O, Gentles AJ, Hankus L, Jurka J. 2006. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC bioinformatics. 7:474. Krumlauf R. 1994. Hox genes in vertebrate development. Cell. 78:191-201. Kumar S, Stecher G, Peterson D, Tamura K. 2012. MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis. Bioinformatics. 28:2685-2686. Kuraku S, Kuratani S. 2006. Time scale for cyclostome evolution inferred with a phylogenetic diagnosis of hagfish and lamprey cDNA sequences. Zoolog Sci. 23:1053-1064. Kuraku S, Meyer A, Kuratani S. 2009. Timing of Genome Duplications Relative to the Origin of the Vertebrates: Did Cyclostomes Diverge before or after? Mol Biol Evol. 26:713-. Kuraku S, Takio Y, Tamura K, Aono H, Meyer A, Kuratani S. 2008. Noncanonical role of Hox14 revealed by its expression patterns in lamprey and shark. Proc Natl Acad Sci U S A. 105:6679-6683.

133

Larroux C, Fahey B, Degnan SM, Adamski M, Rokhsar DS, Degnan BM. 2007. The NK homeobox gene cluster predates the origin of Hox genes. Curr Biol. 17:706-710. Lee PN, Callaerts P, De Couet HG, Martindale MQ. 2003. Cephalopod Hox genes and the origin of morphological novelties. Nature. 424:1061- 1065. Lee TI, Jenner RG, Boyer LA, et al. 2006. Control of developmental regulators by Polycomb in human embryonic stem cells. Cell. 125:301- 313. Lehoczky JA, Williams ME, Innis JW. 2004. Conserved expression domains for genes upstream and within the HoxA and HoxD clusters suggests a long-range enhancer existed before cluster duplication. Evol Dev. 6:423-430. Lemons D, McGinnis W. 2006. Genomic evolution of Hox gene clusters. Science. 313:1918-1922. Lettice LA, Heaney SJ, Purdie LA, Li L, de Beer P, Oostra BA, Goode D, Elgar G, Hill RE, de Graaff E. 2003. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum Mol Genet. 12:1725-1735. Lewis EB. 1978. A gene complex controlling segmentation in Drosophila. Nature. 276:565-570. Li Q, Peterson KR, Fang X, Stamatoyannopoulos G. 2002. Locus control regions. Blood. 100:3077-3086. Loots GG, Locksley RM, Blankespoor CM, Wang ZE, Miller W, Rubin EM, Frazer KA. 2000. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science. 288:136-140. Lovtrup S. 1977. The phylogeny of vertebrata / Soren Lovtrup. London ; New York: Wiley. Lufkin T, Dierich A, LeMeur M, Mark M, Chambon P. 1991. Disruption of the Hox-1.6 homeobox gene results in defects in a region corresponding to its rostral domain of expression. Cell. 66:1105-1119. MacKenzie A, Ferguson MW, Sharpe PT. 1991. Hox-7 expression during murine craniofacial development. Development. 113:601-611. Málaga-Trillo E, Meyer A. 2001. Genome Duplications and Accelerated Evolution of Hox Genes and Cluster Architecture in Teleost Fishes. American Zoologist. 41:676-686. Mansfield JH, McGlinn E. 2012. Evolution, expression, and developmental function of Hox-embedded miRNAs. Curr Top Dev Biol. 99:31-57. Masuda-Nakagawa LM, Groer H, Aerne BL, Schmid V. 2000. The HOX-like gene Cnox2-Pc is expressed at the anterior region in all life cycle stages of the jellyfish Podocoryne carnea. Dev Genes Evol. 210:151- 156. Matsunami M, Sumiyama K, Saitou N. 2010. Evolution of conserved non- coding sequences within the vertebrate Hox clusters through the two- round whole genome duplications revealed by phylogenetic footprinting analysis. J Mol Evol. 71:427-436. McEwen GK, Goode DK, Parker HJ, Woolfe A, Callaway H, Elgar G. 2009. Early evolution of conserved regulatory sequences associated with development in vertebrates. PLoS Genet. 5:e1000762.

134

McGhee JD, Wood WI, Dolan M, Engel JD, Felsenfeld G. 1981. A 200 base pair region at the 5' end of the chicken adult beta-globin gene is accessible to nuclease digestion. Cell. 27:45-55. McIntyre DC, Rakshit S, Yallowitz AR, Loken L, Jeannotte L, Capecchi MR, Wellik DM. 2007. Hox patterning of the vertebrate rib cage. Development. 134:2981-2989. Montavon T, Soshnikova N, Mascrez B, Joye E, Thevenet L, Splinter E, de Laat W, Spitz F, Duboule D. 2011. A regulatory archipelago controls Hox genes transcription in digits. Cell. 147:1132-1145. Mortlock DP, Innis JW. 1997. Mutation of HOXA13 in hand-foot-genital syndrome. Nat Genet. 15:179-180. Mungpakdee S, Seo HC, Angotzi AR, Dong X, Akalin A, Chourrout D. 2008. Differential evolution of the 13 Atlantic salmon Hox clusters. Mol Biol Evol. 25:1333-1343. Muragaki Y, Mundlos S, Upton J, Olsen BR. 1996. Altered growth and branching patterns in synpolydactyly caused by mutations in HOXD13. Science. 272:548-551. Nakai Y, Kubota S, Kohno S. 1991. Chromatin diminution and chromosome elimination in four Japanese hagfish species. Cytogenet Cell Genet. 56:196-198. Neidert AH, Virupannavar V, Hooker GW, Langeland JA. 2001. Lamprey Dlx genes and early vertebrate evolution. Proc Natl Acad Sci U S A. 98:1665-1670. Nelson CE, Morgan BA, Burke AC, Laufer E, DiMambro E, Murtaugh LC, Gonzales E, Tessarollo L, Parada LF, Tabin C. 1996. Analysis of Hox gene expression in the chick limb bud. Development. 122:1449-1466. Nolte C, Amores A, Nagy Kovacs E, Postlethwait J, Featherstone M. 2003. The role of a retinoic acid response element in establishing the anterior neural expression border of Hoxd4 transgenes. Mech Dev. 120:325- 335. Nolte C, Krumlauf R. 2000. Expression of Hox Genes in the Nervous System of Vertebrates. Madame Curie Bioscience Database [Internet]. Noordermeer D, Leleu M, Splinter E, Rougemont J, De Laat W, Duboule D. 2011. The dynamic architecture of Hox gene clusters. Science. 334:222-225. Nordstrom K, Larsson TA, Larhammar D. 2004. Extensive duplications of phototransduction genes in early vertebrate evolution correlate with block (chromosome) duplications. Genomics. 83:852-872. Ohno S. 1970. Evolution by Gene Duplication. Springer-Verlag, Heidelberg. Pancer Z, Amemiya CT, Ehrhardt GR, Ceitlin J, Gartland GL, Cooper MD. 2004. Somatic diversification of variable lymphocyte receptors in the agnathan sea lamprey. Nature. 430:174-180. Pascual-Anaya J, Adachi N, Alvarez S, Kuratani S, D'Aniello S, Garcia- Fernandez J. 2012. Broken colinearity of the amphioxus Hox cluster. Evodevo. 3:28. Pearson JC, Lemons D, McGinnis W. 2005. Modulating Hox gene functions during animal body patterning. Nat Rev Genet. 6:893-904. Pendleton JW, Nagai BK, Murtha MT, Ruddle FH. 1993. Expansion of the Hox gene family and the evolution of chordates. Proc Natl Acad Sci U S A. 90:6300-6304.

135

Pennacchio LA, Ahituv N, Moses AM, et al. 2006. In vivo enhancer analysis of human conserved non-coding sequences. Nature. 444:499-502. Philippidou P, Walsh CM, Aubin J, Jeannotte L, Dasen JS. 2012. Sustained Hox5 gene activity is required for respiratory motor neuron development. Nat Neurosci. 15:1636-1644. Popperl H, Bienz M, Studer M, Chan SK, Aparicio S, Brenner S, Mann RS, Krumlauf R. 1995. Segmental expression of Hoxb-1 is controlled by a highly conserved autoregulatory loop dependent upon exd/pbx. Cell. 81:1031-1042. Powers TP, Amemiya CT. 2004. Evidence for a Hox14 paralog group in vertebrates. Curr Biol. 14:R183-184. Prince VE, Joly L, Ekker M, Ho RK. 1998. Zebrafish hox genes: genomic organization and modified colinear expression patterns in the trunk. Development. 125:407-420. Prohaska SJ, Fried C, Amemiya CT, Ruddle FH, Wagner GP, Stadler PF. 2004. The shark HoxN cluster is homologous to the human HoxD cluster. J Mol Evol. 58:212-217. Punnamoottil B, Herrmann C, Pascual-Anaya J, D'Aniello S, Garcia- Fernandez J, Akalin A, Becker TS, Rinkwitz S. 2010. Cis-regulatory characterization of sequence conservation surrounding the Hox4 genes. Dev Biol. 340:269-282. Putnam NH, Butts T, Ferrier DE, et al. 2008. The amphioxus genome and the evolution of the karyotype. Nature. 453:1064-1071. Qiu H, Hildebrand F, Kuraku S, Meyer A. 2011. Unresolved orthology and peculiar coding sequence properties of lamprey genes: the KCNA gene family as test case. BMC Genomics. 12:325. Ravi V, Lam K, Tay B-H, Tay A, Brenner S, Venkatesh B. 2009. Elephant shark (Callorhinchus milii) provides insights into the evolution of Hox gene clusters in gnathostomes. Proceedings of the National Academy of Sciences. 106:16327-16332. Ravi V, Venkatesh B. 2008. Rapidly evolving fish genomes and teleost diversity. Curr Opin Genet Dev. 18:544-550. Richardson MK, Admiraal J, Wright GM. 2010. Developmental anatomy of lampreys. Biol Rev Camb Philos Soc. 85:1-33. Rinn JL, Chang HY. 2012. Genome regulation by long noncoding RNAs. Annu Rev Biochem. 81:145-166. Rinn JL, Kertesz M, Wang JK, et al. 2007. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell. 129:1311-1323. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Hohna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic biology. 61:539-542. Ruvinsky I, Gibson-Brown JJ. 2000. Genetic and developmental bases of serial homology in vertebrate limb evolution. Development. 127:5233- 5244. Ryan JF, Mazza ME, Pang K, Matus DQ, Baxevanis AD, Martindale MQ, Finnerty JR. 2007. Pre-bilaterian origins of the Hox cluster and the Hox code: evidence from the sea anemone, Nematostella vectensis. PLoS ONE. 2:e153.

136

Sandelin A, Bailey P, Bruce S, Engstrom PG, Klos JM, Wasserman WW, Ericson J, Lenhard B. 2004. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics. 5:99. Santini S, Boore JL, Meyer A. 2003. Evolutionary conservation of regulatory elements in vertebrate Hox gene clusters. Genome Res. 13:1111-1122. Schneuwly S, Klemenz R, Gehring WJ. 1987. Redesigning the body plan of Drosophila by ectopic expression of the homoeotic gene Antennapedia. Nature. 325:816-818. Schwartz S, Elnitski L, Li M, Weirauch M, Riemer C, Smit A, Program NCS, Green ED, Hardison RC, Miller W. 2003a. MultiPipMaker and supporting tools: Alignments and analysis of multiple genomic DNA sequences. Nucleic Acids Res. 31:3518-3524. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. 2003b. Human-mouse alignments with BLASTZ. Genome Res. 13:103-107. Sekimoto T, Yoshinobu K, Yoshida M, Kuratani S, Fujimoto S, Araki M, Tajima N, Araki K, Yamamura K. 1998. Region-specific expression of murine Hox genes implies the Hox code-mediated patterning of the digestive tract. Genes Cells. 3:51-64. Seo HC, Edvardsen RB, Maeland AD, et al. 2004. Hox cluster disintegration with persistent anteroposterior order of expression in Oikopleura dioica. Nature. 431:67-71. Sham MH, Vesque C, Nonchev S, Marshall H, Frain M, Gupta RD, Whiting J, Wilkinson D, Charnay P, Krumlauf R. 1993. The zinc finger gene Krox20 regulates HoxB2 (Hox2.8) during hindbrain segmentation. Cell. 72:183-196. Shang L, Pruett ND, Awgulewitsch A. 2002. Hoxc12 expression pattern in developing and cycling murine hair follicles. Mech Dev. 113:207-210. Sharman AC, Holland PW. 1998. Estimation of Hox gene cluster number in lampreys. Int J Dev Biol. 42:617-620. Shin JT, Priest JR, Ovcharenko I, Ronco A, Moore RK, Burns CG, MacRae CA. 2005. Human-zebrafish non-coding conserved elements act in vivo to regulate transcription. Nucleic Acids Res. 33:5437-5445. Simon JA, Kingston RE. 2013. Occupying chromatin: Polycomb mechanisms for getting to genomic targets, stopping transcriptional traffic, and staying put. Mol Cell. 49:808-824. Smith JJ, Antonacci F, Eichler EE, Amemiya CT. 2009. Programmed loss of millions of base pairs from a vertebrate genome. Proceedings of the National Academy of Sciences of the United States of America. 106:11212-11217. Smith JJ, Baker C, Eichler EE, Amemiya CT. 2012. Genetic consequences of programmed genome rearrangement. Curr Biol. 22:1524-1529. Smith JJ, Kuraku S, Holt C, et al. 2013. Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution. Nat Genet. 45:415-421. Smith JJ, Stuart AB, Sauka-Spengler T, Clifton SW, Amemiya CT. 2010. Development and analysis of a germline BAC resource for the sea lamprey, a vertebrate that undergoes substantial chromatin diminution. Chromosoma. 119:381-389.

137

Soshnikova N. 2013. Hox genes regulation in vertebrates. Dev Dyn. Soshnikova N, Dewaele R, Janvier P, Krumlauf R, Duboule D. 2013. Duplications of hox gene clusters and the emergence of vertebrates. Dev Biol. Soshnikova N, Duboule D. 2009. Epigenetic temporal control of mouse Hox genes in vivo. Science. 324:1320-1323. Spitz F, Gonzalez F, Duboule D. 2003. A global control region defines a chromosomal regulatory landscape containing the HoxD cluster. Cell. 113:405-417. Spitz F, Gonzalez F, Peichel C, Vogt TF, Duboule D, Zakany J. 2001. Large scale transgenic and cluster deletion analysis of the HoxD complex separate an ancestral regulatory module from evolutionary innovations. Genes Dev. 15:2209-2214. Spitz F, Herkenne C, Morris MA, Duboule D. 2005. Inversion-induced disruption of the Hoxd cluster leads to the partition of regulatory landscapes. Nat Genet. 37:889-893. Stadler PF, Fried C, Prohaska SJ, Bailey WJ, Misof BY, Ruddle FH, Wagner GP. 2004. Evidence for independent Hox gene duplications in the hagfish lineage: a PCR-based gene inventory of Eptatretus stoutii. Mol Phylogenet Evol. 32:686-694. Stock DW, Whitt GS. 1992. Evidence from 18S ribosomal RNA sequences that lampreys and hagfishes form a natural group. Science. 257:787- 789. Suster ML, Abe G, Schouw A, Kawakami K. 2011. Transposon-mediated BAC transgenesis in zebrafish. Nat Protoc. 6:1998-2021. Suyama M, Torrents D, Bork P. 2006. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic acids research. 34:W609-612. Suzuki A, Ikeda, Y, Nakayama, K. 1999. Chromosome and Ag-NORs of three species of Lampetra (Petromyzontiformes). Chromosome Science. 3:150. Takezaki N, Figueroa F, Zaleska-Rutczynska Z, Klein J. 2003. Molecular phylogeny of early vertebrates: monophyly of the agnathans as revealed by sequences of 35 genes. Mol Biol Evol. 20:287-292. Takio Y, Kuraku S, Murakami Y, Pasqualetti M, Rijli FM, Narita Y, Kuratani S, Kusakabe R. 2007. Hox gene expression patterns in Lethenteron japonicum embryos--insights into the evolution of the vertebrate Hox code. Dev Biol. 308:606-620. Takio Y, Pasqualetti M, Kuraku S, Hirano S, Rijli FM, Kuratani S. 2004. Evolutionary biology: lamprey Hox genes and the evolution of jaws. Nature. 429:1 p following 262. Tank EM, Dekker RG, Beauchamp K, Wilson KA, Boehmke AE, Langeland JA. 2009. Patterns and consequences of vertebrate Emx gene duplications. Evol Dev. 11:343-353. Tanzer A, Amemiya CT, Kim CB, Stadler PF. 2005. Evolution of microRNAs located within Hox gene clusters. J Exp Zoolog B Mol Dev Evol. 304:75-85. Taylor HS, Vanden Heuvel GB, Igarashi P. 1997. A conserved Hox axis in the mouse and human female reproductive system: late establishment and

138

persistent adult expression of the Hoxa cluster genes. Biol Reprod. 57:1338-1345. Technau U, Steele RE. 2011. Evolutionary crossroads in developmental biology: Cnidaria. Development. 138:1447-1458. Thisse B, Thisse C. 2004. Fast Release Clones: A High Throughput Expression Analysis. ZFIN Direct Data Submission (http://zfin.org). Thisse B, Thisse C. 2005. High Throughput Expression Analysis of ZF- Models Consortium Clones. ZFIN Direct Data Submission (http://zfin.org). Tihanyi B, Vellai T, Regos A, Ari E, Muller F, Takacs-Vellai K. 2010. The C. elegans Hox gene ceh-13 regulates cell migration and fusion in a non- colinear way. Implications for the early evolution of Hox clusters. BMC Dev Biol. 10:78. Tschopp P, Christen AJ, Duboule D. 2012. Bimodal control of Hoxd gene transcription in the spinal cord defines two regulatory subclusters. Development. 139:929-939. Tumpel S, Wiedemann LM, Krumlauf R. 2009. Hox genes and segmentation of the vertebrate hindbrain. Curr Top Dev Biol. 88:103-137. Urasaki A, Morvan G, Kawakami K. 2006. Functional dissection of the Tol2 transposable element identified the minimal cis-sequence and a highly repetitive sequence in the subterminal region essential for transposition. Genetics. 174:639-649. van den Akker E, Reijnen M, Korving J, Brouwer A, Meijlink F, Deschamps J. 1999. Targeted inactivation of Hoxb8 affects survival of a spinal ganglion and causes aberrant limb reflexes. Mech Dev. 89:103-114. Venkatesh B, Kirkness EF, Loh YH, et al. 2006. Ancient noncoding elements conserved in the human genome. Science. 314:1892. Vinagre T, Moncaut N, Carapuco M, Novoa A, Bom J, Mallo M. 2010. Evidence for a myotomal Hox/Myf cascade governing nonautonomous control of rib specification within global vertebral domains. Dev Cell. 18:655-661. Visel A, Minovitsky S, Dubchak I, Pennacchio LA. 2007. VISTA Enhancer Browser--a database of tissue-specific human enhancers. Nucleic Acids Res. 35:D88-92. Voss SR, Putta S, Walker JA, Smith JJ, Maki N, Tsonis PA. 2013. Salamander Hox clusters contain repetitive DNA and expanded non-coding regions: a typical Hox structure for non-mammalian tetrapod vertebrates? Hum Genomics. 7:9. Wang KC, Yang YW, Liu B, et al. 2011. A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature. 472:120-124. Wang WC, Anand S, Powell DR, Pawashe AB, Amemiya CT, Shashikant CS. 2004. Comparative cis-regulatory analyses identify new elements of the mouse Hoxc8 early enhancer. J Exp Zool B Mol Dev Evol. 302:436-445. Wellik DM, Capecchi MR. 2003. Hox10 and Hox11 genes are required to globally pattern the mammalian skeleton. Science. 301:363-367. Westerfield M. 2000. A guide for the laboratory use of zebrafish (Danio rerio). University of Oregon Press.

139

Woltering JM, Durston AJ. 2008. MiR-10 represses HoxB1a and HoxB3a in zebrafish. PLoS One. 3:e1396. Woltering JM, Vonk FJ, Muller H, Bardine N, Tuduce IL, de Bakker MA, Knochel W, Sirbu IO, Durston AJ, Richardson MK. 2009. Axial patterning in snakes and caecilians: evidence for an alternative interpretation of the Hox code. Dev Biol. 332:82-89. Woolfe A, Goodson M, Goode DK, et al. 2005. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 3:e7. Wrischnik LA, Kenyon CJ. 1997. The role of lin-22, a hairy/enhancer of split homolog, in patterning the peripheral nervous system of C. elegans. Development. 124:2875-2888. Yang Z, Rannala B. 2012. Molecular phylogenetics: principles and practice. Nat Rev Genet. 13:303-314. Yanze N, Spring J, Schmidli C, Schmid V. 2001. Conservation of Hox/ParaHox-related genes in the early development of a cnidarian. Dev Biol. 236:89-98. Yekta S, Tabin CJ, Bartel DP. 2008. MicroRNAs in the Hox network: an apparent link to posterior prevalence. Nat Rev Genet. 9:789-796. Zakany J, Kmita M, Duboule D. 2004. A dual role for Hox genes in limb anterior-posterior asymmetry. Science. 304:1669-1672. Zhang X, Lian Z, Padden C, Gerstein MB, Rozowsky J, Snyder M, Gingeras TR, Kapranov P, Weissman SM, Newburger PE. 2009. A myelopoiesis-associated regulatory intergenic noncoding RNA transcript within the human HOXA cluster. Blood. 113:2526-2534. Zhang Z, Schwartz S, Wagner L, Miller W. 2000. A greedy algorithm for aligning DNA sequences. J Comput Biol. 7:203-214.

140

Appendix

Appendix Table 1 - New gene names given in this project for Japanese lamprey Hox genes submitted previously to GenBank by other studies (Takio et al. 2004; Takio et al. 2007; Kuraku et al. 2008). Genes identified by sequencing representative BAC clones and assembling into thirteen Hox contigs (Fig. 5) are labeled with an asterisk.

Previous GenBank submissions New names given in this Name Accession # project LjHoxQ8* AB125274.1 Hox-α8 LjHox6w* AB125275.1 Hox-α6 LjHox5i* AB125276.1 Hox-α5 LjHox4w* AB125269.1 Hox-α4 LjHox3d* AB125270.1 Hox-α3 Hox2* AY497314.1 Hox-α2 LjHoxW10a* AB286672.1 Hox-β10 LjHox9r* AB125271.1 Hox-β9 LjHox7m* AB125272.1 Hox-β7 LjHox1w AB286671.1 Hox-β1 LjHox13-beta AB293598.1 Hox-γ13 LjHox5w* AB125277.1 Hox-γ5 LjHox4x* AB125278.1 Hox-γ4 LjHox13-alpha AB293597.1 Hox-δ13 LjHox14-alpha* AB293599.1 Hox-ε14 LjHox11t* AB286674.1 Hox-ε11 LjHox10s* AB286673.1 Hox-ε10 LjHox8p* AB125273.1 Hox-ε8

141

Appendix Table 2. Known sea lamprey Hox genes and their Japanese lamprey orthologs. The sea lamprey scaffolds are from the Pmarinus_7.0 assembly (http://www.ensembl.org).

Sea lamprey Hox genes Japanese Name GenBank Accession/ Scaffold No. lamprey orthologs identified in this study Hox11 AFZ94995.1/AAM19482.1 Hox-α11 Hox9 AFZ94994.1 Hox-α9 Hox8 AFZ94993.1/AAC04332.1 Hox-α8 Hox7 AFZ94992.1/AAM19475.1 Hox-α7 Hox6 AFZ94991.1/AAM19474.1/AAD15930.1 Hox-α6 Hox5 AFZ94990.1/AAM19473.1 Hox-α5 Hox4 AFZ94989.1/AAL61642.1 Hox-α4 Hox3 AFZ94988.1/AAM19467.1 Hox-α3 Hox2 AFZ94987.1/AAM19466.1 Hox-α2 HoxA11, partial scaffold_1553 Hox-β11 HoxW10a, partial AAM19478.1 Hox-β10 Hox9-like, partial AFZ94999.1/AAM19476.1 Hox-β9 Hox8-like, partial AFZ94998.1/AAC04331.1 Hox-β8 HoxK6, partial AAM19471.1 Hox-β7 Hox5-like, partial AFZ94997.1/AAM19472.1 Hox-β5 Hox4-like AFZ94996.1/AAM19469.1 Hox-β4 Hox1w AAL61641.1 Hox-β1 HoxB13 scaffold_2687 Hox-γ13 HoxA9, partial AAM19477.1/scaffold_6175 Hox-γ9 Hoxw5 AAD15929.1/AAM19470.1 Hox-γ5 Hox4x AAL17914.1 Hox-γ4 HoxZ11b, partial AAM19483.1 Hox-δ11 HoxB8 AAC04330.1/scaffold_6993 Hox-δ8 HoxB3, partial scaffold_13473 Hox-δ3 HoxY11, partial AAM19481.1 Hox-ε11 HoxW10b, partial AAM19479.1 Hox-ε10 Hox9x, partial AAA02545.1 Hox-ε9 HoxA7, partial AAM19468.1/scaffold_6616 Hox-ε7 Hox4-like, partial AFZ95000.1 Hox-ε4 HoxA1, partial scaffold_10557 Hox-ε1 HoxA9, partial AAM19480.1/scaffold_16685 Hox-ζ9

142

Appendix Table 3 - Number of gene family members in Japanese lamprey and human. Homology of these genes was verified by phylogenetic analysis. Phylogenetic trees for the gene families can be obtained from authors. These gene families are either linked to vertebrate Hox clusters or were previously used for phylogenetic analysis of lamprey gene families (Neidert et al. 2001; Kuraku et al. 2009; Tank et al. 2009; Crow et al. 2012)

Serial Gene family Number of members No. Human Japanese lamprey 1 KCNA 8 15 2 GNAI 1/2/3 3 7 3 AhR/AhR repressor 2 6 4 FST/FSTL 5 6 5 Wnt7 2 5 6 PPAR α/β(δ)/γ 3 5 7 FGFR 1/2/3/4 4 5 8 GNB 1/2/3/4/5 5 5 9 P2ry1 1 4 10 CK M/B 2 4 11 ITSN 1/2 2 4 12 PRICKLE 1/2 2 4 13 THR A/B 2 4 14 TNNC 1/2 (slow/fast) 2 4 15 Amph/Bin1/Bin2 3 4 16 Fzd 1/2/7 3 4 17 Otx1/2(5)/Crx 3 4 18 BRD T/2/3/4 4 4 (RING3/RING3L/HUNKI/BRDT) 19 plexin A 1/2/3/4 4 4 20 MAGI 1/2/3 3 4 21 Dlx 2/3/5 3 3 22 Gas1 1 3 23 Rxfp3 1 3 24 Bmp 2/4 2 3 25 Evx1/2 (linked to Hox cluster) 2 3 26 NEO/DCC 2 3 27 cPKC α/β/γ 3 3 28 enolase α/β/γ 3 3 29 SLC4A 1/2/3 3 3 30 RAR α/β/γ 3 3 31 HMG 1/2/3 3 3 32 Lunatic/radical/manic fringe 3 3

143

33 Nuclear receptor family NR1I 3 3 1/2/3 (VDR/PXR/CAR) 34 AMP activated protein kinase γ 3 3 1/2/3 35 GNAT 1/2/3 3 3 36 Pax 1/9 2 3 37 Dlx 1/4/6 3 3 38 Fzd 5/8 2 3 39 Sox 4/11/12 3 3 40 Creb5 (linked to Hox cluster) 1 2 41 Tax1bp1 (linked to Hox cluster) 1 2 42 Hexim1 1 2 43 Mgat2 1 2 44 Pskh1 1 2 45 PurA 1 2 46 Ache/BChe 2 2 47 LDH A/B 2 2 48 Patched 1/2 2 2 49 Pax3/7 2 2 50 Phosphoinositide 3 kinase B-cell 2 2 adaptor protein 51 SPARC / SPARC-like 1 2 2 52 Complement component 3 2 C3/C4/C5 53 KIF1 A/B/C 3 2 54 adrenergic receptor β 1/2/3 3 2 55 RXR α/β/γ 3 2 56 Synapsin I/II/III 3 2 57 steroid hormone receptor 4 2 (AR/PR/MR/GR) 58 PACSIN 1/2 3 2 59 GNGT1/2 / GNG11 3 2 60 Enc 1/2 2 2 61 Sox 1/2/3 3 2 62 Sox 14/21 2 2 63 Emx 1/3 2 2 64 HNF1 α/β 2 1 65 Synaptojanin 1/2 2 1 66 Proteasome subunit Y / LMP2 2 1 67 ER α/β 2 1 68 LMP7 / LMPX (PSMB5/PSMB8) 2 1 69 Pax 2/5/8 3 1 70 SH3GL 1/2/3 3 1

144

Appendix Figure 1 - Phylogenetic analyses of Japanese lamprey Hox4 genes using full-length coding (nucleotide) sequences and protein sequences. Seven full-length Hox4 protein-coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome Hox4 homologs with amphioxus Hox4 as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. BI trees are shown as cladograms due to markedly varying branch lengths. Japanese lamprey branches and genes are highlighted in red. Sequences for other chordates were obtained from GenBank and Ensembl. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Ac, Anolis carolinensis; Xt, Xenopus tropicalis; Lm, Latimeria menadoensis; Cm, Callorhinchus milii; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; AmphiHox, amphioxus Hox.

145

Appendix Figure 2 – Phylogenetic analyses of Japanese lamprey Hox4 genes using the first two codon positions of full-length protein-coding sequences. The first and second codon positions of seven full-length, Hox4 protein-coding sequences from Japanese lamprey were aligned with selected gnathostome Hox4 homologs with amphioxus Hox4 as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for the alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. The BI tree is shown as a cladogram due to markedly varying branch lengths. Japanese lamprey branches and genes are highlighted in red. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Ac, Anolis carolinensis; Xt, Xenopus tropicalis; Lm, Latimeria menadoensis; Cm, Callorhinchus milii; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; AmphiHox, amphioxus Hox.

146

Appendix Figure 3 – Phylogenetic analyses of Japanese lamprey Hox13 genes using full-length coding (nucleotide) sequences and protein sequences. Four full-length Hox13 protein-coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome Hox13 homologs with amphioxus Hox13 as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. All trees are shown as cladograms due to markedly varying branch lengths. Japanese lamprey branches and genes are highlighted in red. Sequences for other chordates were obtained from GenBank and Ensembl. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Xt, Xenopus tropicalis; Lm, Latimeria menadoensis; Cm, Callorhinchus milli; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; AmphiHox, amphioxus Hox.

147

Appendix Figure 4 – Phylogenetic analyses of Japanese lamprey Hox9 genes using only the second exon sequence. Nucleotide (cds) and protein (pro) sequences encoded by the second exons of five Japanese lamprey Hox9 genes were aligned with homologous sequences from selected gnathostomes with amphioxus Hox9 second exon sequence as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. BI trees are shown as cladograms due to markedly varying branch lengths. Japanese lamprey branches and genes are highlighted in red. Sequences for other chordates were obtained from GenBank and Ensembl. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Xt, Xenopus tropicalis; Lm, Latimeria menadoensis; Cm, Callorhinchus milli; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; AmphiHox, amphioxus Hox. 148

Appendix Figure 5 – Phylogenetic analyses of Japanese lamprey KCNA genes using full-length coding (nucleotide) sequences and protein sequences. Fifteen full-length KCNA protein-coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome KCNA homologs and Ciona KCNA was used as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. The BI tree is shown as a cladogram due to markedly varying branch lengths. The eight members of the KCNA family of genes are marked. Lamprey branches and genes are highlighted in red. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Ac, Anolis carolinensis; Xt, Xenopus tropicalis; Cm, Callorhinchus milli; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; Ci, Ciona intestinalis

149

Appendix Figure 6 – Phylogenetic analyses of Japanese lamprey Wnt5 and Wnt7 genes using full-length coding (nucleotide) sequences and protein sequences. Three full-length Wnt5 and five full-length Wnt7 protein- coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome Wnt5 and Wnt7 homologs and amphioxus Wnt1 was used as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. The BI tree is shown as a cladogram due to markedly varying branch lengths. Lamprey branches and genes are highlighted in red. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Lc, Latimeria chalumnae; Cm, Callorhinchus milli; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; Bf, Branchiostoma floridae. 150

Appendix Figure 7 – Phylogenetic analyses of Japanese lamprey THR genes using full-length coding (nucleotide) sequences and protein sequences. Four full-length THR protein-coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome THR homologs and amphioxus THR was used as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. The BI tree is shown as a cladogram due to markedly varying branch lengths. Lamprey branches and genes are highlighted in red. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Lc, Latimeria chalumnae; Sc, Scyliorhinus canicula; Cm, Callorhinchus milli; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; Bl, Branchiostoma lanceolatum.

151

Appendix Figure 8 – Phylogenetic analyses of Japanese lamprey Otx genes using full-length coding (nucleotide) sequences and protein sequences. Four full-length Otx protein-coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome Otx and Crx homologs and amphioxus Otx was used as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. All trees are shown as a cladograms due to markedly varying branch lengths. Lamprey branches and genes are highlighted in red. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Lc, Latimeria chalumnae; Sc, Scyliorhinus canicula; Cm, Callorhinchus milli; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; Bf, Branchiostoma floridae; Bl, Branchiostoma lanceolatum.

152

Appendix Figure 9 – Phylogenetic analyses of Japanese lamprey Dlx genes using full-length coding (nucleotide) sequences and protein sequences. Six full-length Dlx protein-coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome Dlx homologs and amphioxus Dll was used as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. The BI tree is shown as a cladogram due to markedly varying branch lengths. Lamprey branches and genes are highlighted in red. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Ac, Anolis carolinensis; Lc, Latimeria chalumnae; Ts, Triakis semifasciata; Pm, Petromyzon marinus; Lj, Lethenteron japonicum; Bf, Branchiostoma floridae.

153

Appendix Figure 10 – Phylogenetic analyses of Japanese lamprey Fzd genes using full-length coding (nucleotide) sequences and protein sequences. Eleven full-length Fzd protein-coding sequences (cds – nucleotide coding sequence and pro - protein) from Japanese lamprey were aligned with selected gnathostome Fzd homologs and amphioxus frizzled was used as an outgroup. Bayesian Inference (BI) and Maximum Likelihood (ML) trees were generated for these alignments. Statistical support values for the nodes are shown as either Bayesian posterior probability values or ML bootstrap percentages. The BI tree is shown as a cladogram due to markedly varying branch lengths. Lamprey branches and genes are highlighted in red. Dr, Danio rerio; Tr, Takifugu rubripes; Hs, Homo sapiens; Gg, Gallus gallus; Lc, Latimeria chalumnae; Cm, Callorhinchus milli; Lj, Lethenteron japonicum; Bf, Branchiostoma floridae.

154

List of publications

Mehta TK*, Vydianathan R*, Yamasaki S, Lee AP, Lian MM, Tay BH, Tohari S, Yanai S, Tay A, Brenner S, Venkatesh B. 2013. Evidence for at least six Hox clusters in the Japanese lamprey (Lethenteron japonicum). Proc Natl Acad Sci USA. 2013; 110: 16044-16049.

155