A GENOMICS PERSPECTIVE OF SPECIES AND SPECIATION IN AN ATYID SHRIMP (PARATYA AUSTRALIENSIS)

Kimberley Rogl BAppSc BSc (Hons)

Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

School of Biology and Environmental Sciences Science and Engineering Faculty University of Technology 2020

Keywords

Atyid, cryptic species, freshwater shrimp, hybridisation, next-generation sequencing, phylogenomics, reproductive isolation, SNP analysis, speciation, speciation genes, speciation genomics, species complex, transcriptomics

Abstract

The speciation debate in evolutionary science is long and protracted, as evidenced by the multitude of concepts regarding species and speciation, but understanding the nature of species is of central importance in the study of biology and ecology. Traditional approaches for investigating speciation (e.g. behavioural or morphological) have not always provided sufficient insight into the processes involved. While modern molecular techniques provide different perspectives to help address these biological questions, it is often difficult to find natural populations outside the laboratory in which speciation can be recognised and to which these techniques can be applied. A human-induced translocation event in the 1990s of the atyid shrimp, Paratya australiensis, inadvertently brought together two very divergent lineages of the species, resulting in extreme non-random mating. While the species status of P. australiensis has previously been debated, it is currently considered monotypic. Other research, however, suggests that this single species may indeed be a complex of cryptic species (up to nine species) or is at least in the final stages of speciating. Consequently, this system provides an excellent opportunity to investigate speciation from a genetics/genomics perspective.

In the Brisbane River, SE Queensland, in 1993, 10,000 shrimp from one lineage (Kilcoy Creek) were translocated to another site (Branch Creek) where a divergent lineage (~6% for mtDNA COI) existed. The majority of resulting F1 juveniles were hybrids resulting from Branch Creek males mating with Kilcoy Creek females (there were no hybrids found from the reverse cross). However, when the same site was sampled after these F1 reached adult stage some months later, the hybrids had significantly declined in number relative to the number of individuals from matings within lineages, indicating reduced fitness in these hybrids. The purpose of this study was to investigate the genetic architecture of individuals from both pure lineages and of individuals across the hybrid zone to gain insight into the process of speciation at the molecular level.

First, a comparative transcriptomic approach was used on the two highly divergent lineages of P. australiensis to identify genes of interest that may be associated with adaptation to local environments and that may contribute to the observed breakdown in mate recognition. Differential gene expression (DGE) analysis showed 660 highly differentially expressed transcripts between the two populations (e.g. genes associated with temperature tolerance, osmoregulation, egg size control and other life history characteristics). Hard variant calling revealed 2,554 filtered single nucleotide polymorphisms (SNPs); overall, this high number is consistent with neutral divergence shown in previous studies. However, SNPs at some of these genes indicated that local adaptation associated with energy production, temperature tolerance, etc. has significantly contributed to how these lineages have evolved and diverged while adapting to their respective habitats.

The second research chapter incorporated the two pure lineages along with individuals from the known hybrid zone to identify genes directly involved in speciation or reproductive isolation. Using transcriptomics, five speciation genes identified in the literature (adenylate cyclase, arylsulfatase, heat shock 70, tcP-1, and triosephosphate isomerase) appeared to be under purifying selection. Multiple genes involved in the reproductive process were identified through gene ontology, but only a single gene (takeout protein, identified as part of the enzymatic pathway associated with courtship behaviour in Drosophila) showed a signature of positive selection. Furthermore, the DGE analysis revealed a cytonuclear interaction relating to temperature and oxygen transport that had not previously been seen in P. australiensis. This pattern fits the Dobzhansky-Muller model where speciation is underpinned by a few genes interacting between the mitochondrial and nuclear genomes.

The third component of research was to investigate whether the asymmetrical hybridisation was still ongoing in the known hybrid zone in Branch Creek. Of particular interest was whether, after 25 generations of interbreeding (i.e. hybridisation, backcrossing etc.), the two lineages had become homogenised or whether there were signatures of continued deviation from random mating in both mitochondrial and nuclear genomes. A SNP approach was used and, after variant calling, 35,704 SNPs were found across the two pure lineages and the nine hybrid zone individuals. Multiple analyses (identity by state, principal component, and relatedness) of these SNPs showed that hybridisation is still ongoing, as evidenced by the detection of early generation hybrids. While variation in nuclear genomes of the hybrid zone individuals suggested that hybridisation was still occurring, eight out of these nine individuals had the introduced mtDNA (i.e. from Kilcoy Creek), suggesting strong selection was still present, even when large parts of their nuclear genomes had returned to a profile similar to that of an original Branch Creek resident. There is a strong indication that, given sufficient time, the introduced mtDNA (either through purifying or sexual selection) may drive the Branch Creek lineage to extinction.

Finally, the biogeographic history of the nine divergent P. australiensis lineages was investigated using the mitochondrial cytochrome oxidase I (COI) gene; previous problems with polytomies were resolved with whole mitogenome analysis of the two lineages used in this study. From this, the COI phylogeny can more confidently be relied upon to provide an accurate portrayal of the diversification/divergence history of the P. australiensis species complex. The phylogeny showed three distinct range expansions that have occurred in a south to north direction. It is interesting to note that after these range expansions, none of the currently known lineages have ever been found in sympatry. This suggests that when two lineages have come together in the past, non-random mating in every case has led to the fixation of one lineage and the extinction of the other.

Overall, this project provides an extensive genomic resource to investigate the nature of species and speciation in P. australiensis. From the results, it can be inferred that P. australiensis sits along the speciation continuum, towards becoming a species complex (cryptic or not). There is evidence that a single lineage is dominant over others that it comes into contact with (i.e. through hybridisation, one lineage goes extinct, consistent with Paterson’s recognition species concept) before positive reinforcement can occur (sensu the Dobzhansky/Mayr biological species concept). The cytonuclear incompatibilities are a clear example of a few interacting genes resulting in lower fitness in the hybrid offspring (i.e. the genic species concept).

Table of Contents

Keywords ...... i
Abstract ...... ii
List of Figures ...... vii
List of Tables ...... ix
List of Abbreviations ...... x
Statement of Original Authorship ...... xi
Acknowledgements ...... xii
Chapter 1: Introduction ...... 1
1.1 Background ...... 2
1.2 Aims of the current project ...... 23
1.3 Thesis Outline ...... 24
Chapter 2: A transcriptome-wide assessment of differentially expressed genes among two highly divergent, yet sympatric, lineages of the freshwater atyid shrimp, Paratya australiensis ...... 27
2.1 Introduction ...... 28
2.2 Methods ...... 30
2.3 Results ...... 32
2.4 Discussion ...... 36
Chapter 3: The identification and expression of speciation genes in a Paratya australiensis hybrid zone ...... 39
3.1 Introduction ...... 40
3.2 Methods ...... 44
3.3 Results ...... 47
3.4 Discussion ...... 61
Chapter 4: Using SNPs to understand levels of hybridisation in the Australian glass shrimp (Paratya australiensis) ...... 67
4.1 Introduction ...... 68
4.2 Methods ...... 70
4.3 Results ...... 72
4.4 Discussion ...... 81
Chapter 5: Resolving the polytomy of Paratya australiensis ...... 84
5.1 Introduction ...... 85
5.2 Methods ...... 86
5.3 Results ...... 88
5.4 Discussion ...... 93
Chapter 6: General Discussion ...... 96
6.1 Overview of results ...... 96
6.2 Consequences and relevance of this research ...... 98
6.3 Limitations and future directions ...... 102
6.4 Concluding remarks ...... 104
References ...... 105
Appendices ...... 127
Appendix A ...... 127
Appendix B ...... 143
Appendix C ...... 147
Appendix D ...... 154

List of Figures

Figure 1.1 From a panmictic population to two reproductively isolated species along the speciation continuum (from Seehausen et al., 2014) ...... 11
Figure 1.2 A simplified representation of a single lineage/species splitting and forming two lineages/species. Here the shades of grey represent daughter lineages diverging through time and the horizontal lines SC (species criterion) 1-9 represent the time they acquire different properties (from de Queiroz, 2007) ...... 12
Figure 1.3 Map of Paratya australiensis distribution ...... 16
Figure 1.4 Distribution of the five species and subspecies as described by Reik (1953) ...... 18
Figure 1.5 From Fawcett et al., 2010. Frequency of the translocated alleles across years in Branch Creek, where B-0 is the site of the initial translocation in 1995 and + is pools above and – is pools below the translocation site ...... 19
Figure 1.6 (A) Neighbour-joining gene tree (mtDNA COI & nDNA 28S) for Paratya australiensis (B) Map of eastern Australia presenting the P. australiensis lineages found at each sampled river (Cook et al., 2006) ...... 20
Figure 3.1 (A) Map of eastern Australia presenting the P. australiensis lineages found at each sampled river (B) Neighbour-joining gene tree (mtDNA COI & nDNA 28S) for Paratya australiensis (from Cook et al., 2006) ...... 43
Figure 3.2 Map detailing the sampling locations of Kilcoy Creek (lineage 4), Stony Creek (lineage 6) and Branch Creek (hybrid zone) ...... 44
Figure 3.3 Expression pattern of the 50 most differentially expressed transcripts within each pairwise comparison at e-3 with a log fold change 2. The yellow coloured transcripts are upregulated or highly expressed and the purple transcripts are downregulated or lowly expressed ...... 60
Figure 4.1 Workflow used in the R package SNPRelate to create a clustering tree ...... 71
Figure 4.2 Log10 WEGO plot of most frequent GO terms in the 6,307 SNP transcript ...... 75
Figure 4.3 Matrix of genome wide average identity by state pairwise identities where green is more different to pink more similar. BC = Branch Creek, KC = Kilcoy Creek, SC = Stony Creek ...... 76
Figure 4.4 Multidimensional scaling analysis on identity by state distance. Red = Branch Creek hybrid zone, green = Kilcoy Creek, blue = Stony Creek ...... 77
Figure 4.5 Clustering dendrogram based on hierarchical clustering of IBS fractional values ...... 78
Figure 4.6 Principal component analysis plot on SNP genotypes with eigenvectors one and two. BC = Branch Creek; KC = Kilcoy Creek; SC = Stony Creek ...... 79
Figure 4.7 Dendrogram built on the Φ statistic from Manichaikul et al. (2010) where the further from 0, the less related individuals are ...... 80
Figure 5.1 Maximum likelihood unrooted phylogeny based on whole mitogenome of Paratya australiensis. Bootstrap values for the analysis are presented on branches ...... 91
Figure 5.2 Maximum likelihood phylogeny based on COI and rooted with Paratya howensis as an outgroup. Bootstrap values are presented on each branch ...... 92
Figure A1 Top hit species distribution chart ...... 127
Figure A2 Expression pattern of the differentially expressed transcripts for individuals collected from Stony Creek (SC) and Kilcoy Creek (KC) (the yellow coloured transcripts are upregulated or highly expressed and purple transcripts are lowly expressed). Graph A is gene expression at e-3 with a log fold change 2. Graph B is gene expression at e-10 with a log fold change 2 ...... 129
Figure C1 Principal Component analysis plot of the first four eigenvectors based on SNPs ...... 147

List of Tables

Table 1.1 Genes identified to play a role in the process of speciation ...... 14
Table 2.1 Assembly, annotation and transcriptome completeness statistics for P. australiensis ...... 33
Table 2.2 Top 20 transcripts/genes expressed in each of two lineages of P. australiensis ...... 34
Table 2.3 SNP table for the two populations of P. australiensis ...... 35
Table 3.1 Statistics for Illumina sequencing, de novo assembly and annotation ...... 48
Table 3.2 Genes identified from the literature to play a role in the process of speciation ...... 49
Table 3.3 Genes identified from Table 3.2 and their pairwise selection results where dN = nonsynonymous substitution, dS = synonymous substitution ...... 50
Table 3.4 Genes identified based on gene ontology and their pairwise selection results where dN = nonsynonymous substitution, dS = synonymous substitution ...... 53
Table 4.1 Homozygosity and heterozygosity for each sampled individual ...... 73
Table A1 Abundance estimation (TPM) of identified genes potentially involved in reproduction, temperature tolerance, osmoregulation and egg size control in P. australiensis based on GO terms from blast hits ...... 130
Table A2 Abundance estimation (TPM) of differentially expressed genes associated with SNPs in Kilcoy Creek and Stony Creek ...... 135
Table B1 List of differentially expressed transcripts and their annotation ...... 143
Table C1 Output of Manichaikul et al. (2010) relatedness test ...... 148

List of Abbreviations

bp      Base pairs
BSC     Biological species concept
BUSCO   Benchmarking universal single-copy orthologs
CSC     Cohesion species concept
DGE     Differential gene expression
DM      Dobzhansky-Muller
EcSC    Ecological species concept
EvSC    Evolutionary species concept
FDR     False discovery rate
GATK    Genome Analysis Toolkit
GO      Gene ontology
GSC     Genic species concept
IBS     Identity by state
MDS     Multidimensional scaling
mtDNA   Mitochondrial DNA
NCBI    National Center for Biotechnology Information
NGS     Next generation sequencing
ORF     Open reading frame
PCA     Principal component analysis
PhSC    Phenetic species concept
PSC     Phylogenetic species concept
QC      Quality control
RI      Reproductive isolation
rRNA    Ribosomal RNA
RSC     Recognition species concept
SMRS    Specific mate recognition system
SNP     Single nucleotide polymorphism
TPM     Transcript abundance per million reads
tRNA    Transfer RNA
WEGO    Web gene ontology

Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

QUT Verified Signature

Acknowledgements

This thesis is dedicated to Barbra Winch.

First, my sincere thanks and gratitude go to my principal supervisor, Dr David Hurwood; your kindness in offering me this opportunity is greatly appreciated. Your guidance, advice and understanding throughout my candidature have been second to none. I am so glad I came and spoke to you all those years ago when I was an unsure 20-year-old; you have undoubtedly helped me grow as a person. My thanks also go to my associate supervisor, Dr Peter Prentis, for your comments and advice on many aspects of this research. Without their supervision this thesis would not be complete. I would like to acknowledge the funds provided by an Australian Government Research Training Program (RTP) scholarship that facilitated this research. To my colleagues, thank you for being such a joy to be around every day. From morning coffee to various field trips, you have all made this time so much more enjoyable. Special mention to all of my lab mates, Shengjie Ren, Mitch Irvine, Pia Schoenefuss, Liam Bartlett, Lifat Rahi, Dania Aziz and Azam Moshtaghi. To my family, who have been a pillar of strength not only throughout my candidature but throughout my life, thank you for always supporting me in everything I choose. Special thanks to my mum; whether you were driving to or from volleyball training or just being someone to talk to, thank you, I could not have done this without you. To my sisters, thank you for being there for me whenever I needed you; we have been through a lot together. Finally, to my partner Jack, thank you for pushing me to keep going when times were tough, for putting up with my range of moods and for just making me a better person. Also, a huge thank you for showing me the basics of bioinformatics; your help was invaluable.

Chapter 1: Introduction

Defining a species is one of the most difficult, yet one of the most critically important, tasks in biology. In evolutionary biology, understanding the mechanisms that lead to the formation of new species is just as important as defining a species. More traditional approaches to investigating speciation, such as morphological identification or behavioural experiments, have not always provided sufficient insight into the process of speciation itself and can often underestimate biodiversity (Abebe, Mekete & Thomas, 2011). The morphological approach specifically can often be misleading due to cryptic species and species complexes. Modern molecular technologies (i.e. next generation sequencing, NGS) offer greater resolution to resolve a range of biological questions from a genomics perspective (Pareek, Smoczynski & Tretyn, 2011). High-throughput sequencing allows for the efficient detection of key genes involved with particular traits or phenotypes and of differential gene expression to understand functional roles, and provides ways to identify the functional genomic regions that are involved with significant evolutionary changes (Blank et al., 2014).

Crustaceans are an excellent group in which to explore many questions relating to speciation and evolution in general, due to the many cryptic species in this subphylum. As a primary freshwater inhabitant, the decapod crustacean Paratya australiensis (an endemic Australian atyid shrimp) can provide an ideal starting point to understand speciation in this group. This is due to the many divergent lineages (inhabiting different geographic locations within Australia) within the one species (Cook et al., 2006) and apparent non-random mating when in sympatry (Fawcett et al., 2010). Therefore, the functional genomic characterisation (genomic divergence) of different lineages of P. australiensis can provide novel insights into the ongoing speciation processes in decapod crustaceans. This will further help in resolving the existing taxonomic difficulty in this species.

It is thought that adaptation occurs via a set of interacting candidate genes rather than a response from the whole genome (Radwan & Babik, 2012). The same may be true of the speciation process. One approach to understanding the speciation process is to characterise pre- and postzygotic isolation mechanisms. A transcriptomic approach on different lineages of P. australiensis in both sympatry and allopatry will be able to provide a suite of candidate genes that facilitate and/or influence the process of speciation.

1.1 BACKGROUND

1.1.1 Species concepts and speciation

Defining a species is both a difficult and an extremely important element within all fields of biological science. The ‘species’ is claimed to be one of the most important units (comparable to genes and cells) in biology (de Queiroz, 2005a) and is certainly the most important systematic concept, as it is the only systematic level that has any real biological meaning; higher systematic levels are essentially human constructs (Sokal & Crovello, 1970; de Queiroz, 1997). Confusion surrounding species and species concepts arises as a function of the many meanings that the term ‘species’ can take on, if not differentiated and defined clearly. Generally, species represent a taxonomic category where organisms are discovered, described and ordered into the binomial Linnaean classification system (Mayden, 1997). The discipline of taxonomy is dedicated specifically to the delimitation of species, which in turn requires a universally acceptable set of rules and conditions that work across all living organisms; i.e. a species concept. Over the years there have been many attempts to achieve this. While much controversy and debate has arisen around the various proposals, what they have in common is that each species concept not only tries to define what a species is but, more broadly, makes inferences about the process of speciation itself. Not surprisingly, this divisive topic has subsequently led to many definitions of what constitutes a species; at least 26 by a recent count (Frankham et al., 2012). And of course, as many species concepts have been proposed, the various disciplines of biology, which view evolutionary processes differently, support different concepts (de Queiroz, 2005a). Some of the more influential species concepts are outlined briefly below.

1.1.1.1 Biological Species Concept

The Biological Species Concept (BSC), originally known as the Polytypic Species Concept, which later evolved into the BSC (Cracraft, 2000), is the most influential species concept to date (Hausdorf, 2011) due to its utilitarian and seemingly intuitive nature. Dobzhansky (1937) first defined the BSC as “species are systems of populations: the gene exchange between these systems is limited or prevented by a reproductive isolating mechanism or perhaps by a combination of several such mechanisms”. In other words, species are defined as populations of interbreeding individuals that are prevented from breeding with other populations. A major issue some had with this particular definition of the BSC is the use of the word ‘mechanism’, believing the term to be misleading as it implies that selection is acting in such a way that reproductive isolation (RI) is selected for while in allopatry to keep species discrete (Coyne & Orr, 2004). Ernst Mayr was the major champion of the BSC and later transformed Dobzhansky’s definition into the famous delineation of the BSC as “groups of actually or potentially interbreeding natural populations, which are reproductively isolated from other such groups” (Mayr, 1942). As a famous advocate, Mayr defended the BSC against critics but also fell victim to the wording of his own definition. The use of the word ‘potentially’ caused difficulty when determining the species status of taxa in allopatry. As such, Mayr eventually removed ‘potentially’ and reworded the definition as “Species are groups of interbreeding natural populations that are reproductively isolated from other such groups” (Mayr, 1995). Both Dobzhansky and Mayr define species status at the population level rather than that of a single individual, i.e. a single hybrid individual does not necessarily constitute a new species. Following on from this, species status has little to do with the degree of phenotypic difference exhibited between species, as was previously true with typology-based species concepts (i.e. the Morphological Species Concept (see Shull, 1923)). The application of the BSC is based on local populations that are in reproductive condition and also in contact with each other. Thus, determining what are ‘good species’ is based not on their differences but on their absence of interbreeding (Mayr, 2000a).

Although the BSC is the most influential and popular species concept, it is not without faults. A major issue with the BSC is that it is not applicable to asexual organisms. However, Dobzhansky (1970) and Mayr (1963) have noted that the BSC is strictly for sexually reproducing organisms; Dobzhansky went so far as to call asexual species pseudospecies. While the BSC is easily applied to species in sympatry, it is somewhat difficult to apply this concept to allopatric species. Mayr (1969) acknowledges this particular problem, attributing it to the difficulty of testing their status as species in relation to each other due to the distance between them. More recently it has been discovered that not only are reproductive barriers semipermeable to gene flow through zones of introgression, but species have the potential to differentiate despite ongoing interbreeding (Rieseberg et al., 2003; Coyne & Orr, 2004). The BSC is also non-dimensional in both time and space, as this concept is a single snapshot of the here and now and does not consider any form of ancestry. These aspects indicate that both Dobzhansky’s and Mayr’s versions of the BSC do not correlate with how species are currently defined (Hausdorf, 2011).

1.1.1.2 Evolutionary Species Concept

One of the first attempts to address the difficulties of the BSC was the Evolutionary Species Concept (EvSC) proposed by Simpson (1951). It is defined as “a lineage (an ancestral-descendant sequence of populations) evolving separately from others and with its own unitary evolutionary role and tendencies” (Simpson, 1961). Wiley (1978), a key supporter of the EvSC, revised Simpson’s version to become “a single lineage of ancestor-descendant populations which maintains its identity from other such lineages and which has its own evolutionary tendencies and historical fate”. He argued for this concept to be applied generally to all biological systems (Mayden, 1997). This lineage-based concept did not become popular until the 1990s when it was further reviewed and developed (Frost & Hillis, 1990; Frost & Kluge, 1994; Wiley & Mayden, 2000). Apart from species existing as separately evolving lineages, another property required under the EvSC is that each species must form a diagnosable group, i.e. each species has its own evolutionary role, tendencies, and historical fate (de Queiroz, 2005a; 2007). Mayden (1997) advocated that while the EvSC is not an operational concept, it is a lineage-based concept that is non-relational and, due to this, the unique descent of every species can be interpreted from attributes and patterns. Unlike the BSC, the EvSC is inclusive of asexual species as well as species formed by hybridisation of ancestral species. Furthermore, in contrast to the BSC, RI and, more broadly, reproductive success are relatively uninformative when applying the EvSC.

1.1.1.3 Phenetic Species Concept

After the EvSC, there was a break in the debate, which was not picked up again until the Phenetic Species Concept (PhSC) was introduced by Sokal and Crovello (1970). Sneath and Sokal (1973) considered a phenetic species to be a set of organisms that are phenotypically similar to each other and look different from other sets of organisms. Sneath (1976) later wrote “the species level is that at which distinct phenetic clusters can be observed”. This operational concept is like the Typological Species Concept in that lower variation in morphology within than between groups is all that is required for groups to be considered distinct species. Sokal and Crovello (1970) regarded the PhSC as superior to the BSC as it is simpler to apply to natural populations and at the time was the only concept that could be associated with the taxonomic category “species”. Although this concept is relatively simple and appears to be a regression from the developing complexity of species concepts, its modern application can be seen through the use of DNA barcoding (Hebert et al., 2003). The purpose of DNA barcoding is to use short standardised sequences (generally the mitochondrial gene cytochrome oxidase I (COI) for animals and internal transcribed spacer (ITS) rDNA for plants) to distinguish unknown organisms and also enhance the discovery of new species (in addition to using traditional taxonomy methods) (Moritz & Cicero, 2004; Hajibabaei et al., 2007). It is essential that these DNA sequences within species be more similar to each other than to sequences between species (Ward et al., 2005). A predefined benchmark for the percentage of divergence to indicate different species can create uncertainty when divergence estimates are far higher than expected within populations. Significant taxonomic uncertainty can also be created, especially with the use of cytoplasmic markers, when hybridisation occurs, as the hybrid offspring carry only the maternal species’ cytoplasmic DNA (Ward et al., 2005).
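
The barcoding requirement described above, that sequences within a species be more similar to each other than to sequences from other species, can be illustrated with a minimal Python sketch. This is an illustration only and not the method used in this thesis: the four 20 bp sequences and the group labels "A" and "B" are hypothetical, and the uncorrected p-distance (the proportion of differing sites) is assumed here for simplicity, whereas barcoding studies commonly use model-corrected distances.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of differing sites between two
    aligned sequences, ignoring sites with a gap ('-') or ambiguity ('N')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq_a, seq_b)
             if a not in "-N" and b not in "-N"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)

def barcoding_gap(samples):
    """samples maps a sample name to (group label, aligned sequence).
    Returns (largest within-group distance, smallest between-group distance)."""
    within, between = [], []
    for (g1, s1), (g2, s2) in combinations(samples.values(), 2):
        (within if g1 == g2 else between).append(p_distance(s1, s2))
    return max(within), min(between)

# Hypothetical toy alignment (not real COI data): two groups, two samples each.
toy = {
    "A1": ("A", "ATGCATGCATGCATGCATGC"),
    "A2": ("A", "ATGCATGCATGCATGCATGT"),
    "B1": ("B", "ATGGATGCTTGCATGAATGC"),
    "B2": ("B", "ATGGATGCTTGCATGAATGT"),
}

max_within, min_between = barcoding_gap(toy)
print(f"max within-group distance:  {max_within:.2f}")   # 0.05
print(f"min between-group distance: {min_between:.2f}")  # 0.15
print("barcoding gap present:", min_between > max_within)
```

In this toy case the smallest between-group distance exceeds the largest within-group distance, so the two groups would be separable by barcode. When within-group divergence approaches or exceeds between-group divergence, as discussed above, a fixed divergence threshold becomes unreliable.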

1.1.1.4 Ecological Species Concept

While the PhSC was relatively neglected over time, the Ecological Species Concept (EcSC) was a major competitor to the BSC. Van Valen (1976) conceived the EcSC as “a lineage which occupies an adaptive zone minimally different from that of any other lineage in its range and which evolves separately from all lineages outside its range”. The EcSC views species as populations that occupy niches different to those of other populations. This is an operational concept in which differences in the ecology of organisms are what define them as independently evolving species. The EcSC caters to both sexually and asexually reproducing organisms as well as species that have evolved through hybridisation. However, ecological distinction must be preserved in the lineage (Mayden, 1997). Wiley (1978), a proponent of the EvSC, does not believe that species must occupy minimally different niches from other species within a range to be considered different species, as species are required to be adapted to the environment in which they live. Situations where different species occupy the same niche are common both in nature and following the introduction of species to a new environment. Coyne and Orr (2004) suggested that it may be more meaningful to regard ecological distinctness as a criterion for species to persist rather than as a basis for judging species status. Comparisons have been made between the EcSC and the EvSC (Stuessy, 1990; Minelli, 1993; Coyne & Orr, 2004), in which the EcSC bears a resemblance to the EvSC but independently evolving lineages are also characterised as inhabiting “minimally differential adaptive zones”. Van Valen (1976) states that an adaptive zone “…is some part of the resource space together with whatever predation and parasitism occurs on the group considered. It is a part of the environment, as distinct from the way of life of a taxon that may occupy it, and exists independently of any inhabitants it may have”. Van Valen’s definition is purely theoretical and conflicts with the view that ecological niches cannot be defined independently of the organisms which inhabit these niches (Lewontin, 1983).

1.1.1.5 Phylogenetic Species Concept

Phylogenetic Species Concepts (PSCs) are distinctly different from other concepts, as they are intended to identify historically related groups and are concerned with reconstructing the history of life (Coyne & Orr, 2004). Advocates of PSCs are generally systematists and are extremely critical of the BSC. Proponents of PSCs accept that RI is important to a degree but regard it as mostly irrelevant when reconstructing history, whereas supporters of the BSC believe that historical relationships are unimportant for understanding the discreteness of nature (Coyne & Orr, 2004).

There are two main concepts that identify as phylogenetic:

1. PSC1 – “a diagnosable cluster of individuals within which there is a parental pattern of ancestry and descent, beyond which there is not, and which exhibits a pattern of phylogenetic ancestry and descent among units of like kind” (Eldredge & Cracraft, 1980)

The PSC1 is another typological species concept in that species are diagnosed on fixed trait differences, which can make it problematic to apply. Although it accounts for asexually reproducing organisms, it is not clear which trait or traits should be used to diagnose species, and thus an asexual lineage carrying a single mutation has the potential to be classified as a separate species (Mayr, 2000b). This problem can lead to a large overestimate of the actual number of species (McKitrick & Zink, 1988). The major challenge with this concept is that its application can distort evolutionary history, the exact problem it was proposed to solve, because species are identified on simple diagnostic traits (Coyne & Orr, 2004).

Nixon & Wheeler (1990), supporters of the Phylogenetic Species Concept, reworked the definition to “the smallest aggregation of populations (sexual) or lineages (asexual) diagnosable by a unique combination of character states in comparable individuals (semaphoronts)”. They later synonymised the term “character-fixation” with speciation, irrespective of lineage branching events (Nixon & Wheeler, 1992). Under this definition, speciation in unbranching lineages occurs each time the last organism bearing an ancestral trait dies, and in branching lineages it occurs as soon as two lineages become diagnosably distinct (Baum & Donoghue, 1995). This can occur through the fixation of a diagnostic character in the speciating population, also known as divergent evolution (Baum & Donoghue, 1995; Coyne & Orr, 2004).

2. PSC2 – “a geographically constrained group of individuals with some unique apomorphous character, is the unit of evolutionary significance” (Rosen, 1978)

The PSC2 recognises apomorphies (evolutionary traits that are novel to a particular species and all its descendants), rather than diagnostic characters (PSC1), as the only necessary evidence for a phylogenetic species (Mishler & Theriot, 2000). Taxa are considered species under this concept if cladistic analysis shows they are monophyletic and do not include other exclusive groups (Coyne & Orr, 2004). Problems with this concept arise in population genetics, where the intention is to find out whether populations are exclusive groups that share a common ancestor. The ancestry of populations must be inferred from the ancestry of genes; however, gene trees do not necessarily correspond to species trees (Coyne & Orr, 2004). Thus, the PSC2 is challenging as it ignores the distinction between monophyly of genes and monophyly of species (Coyne & Orr, 2004).
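
The distinction between monophyly of genes and monophyly of populations can be illustrated with a small sketch. The Python below is a hypothetical illustration and is not taken from any of the cited works: rooted trees are written as nested tuples, the function names `tips` and `is_monophyletic` are introduced here, and the two toy gene trees are invented. It shows how the same population can appear as an exclusive group in one gene tree but not in another.

```python
def tips(tree):
    """Return the set of tip labels below a node (a tip is a string)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= tips(child)
    return result


def is_monophyletic(tree, group):
    """True if some node of the rooted tree has exactly `group` as its tips."""
    group = set(group)
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Invented toy gene trees: a1, a2 sampled from population A; b1, b2 from B.
gene_tree_1 = (("a1", "a2"), ("b1", "b2"))
gene_tree_2 = (("a1", "b1"), ("a2", "b2"))

pop_a = {"a1", "a2"}
print(is_monophyletic(gene_tree_1, pop_a))  # True
print(is_monophyletic(gene_tree_2, pop_a))  # False
```

Under these assumptions, a verdict on exclusivity depends on which gene is examined, which is the difficulty raised above.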

1.1.1.6 Recognition Species Concept

One of the most controversial species concepts was proposed by Paterson (1985), who defined the Recognition Species Concept (RSC) as “A species is that most inclusive population of individual, biparental organisms which share a common fertilization system” (Paterson, 1993). The RSC primarily focuses on the processes that act to preserve a common fertilisation system rather than those that isolate them (Coyne & Orr, 2004). Specific-Mate-Recognition-Systems (SMRS) comprise both active (e.g. courtship behaviour) and passive (e.g. gamete recognition and fusion) features by which organisms recognise each other as mates (Coyne, Orr & Futuyma, 1988; Mayden, 1997). The SMRS is maintained by strong stabilising selection when the organism is in a relatively constant environment. However, if a population becomes isolated and the new environment is vastly different, the SMRS may no longer be effective and may become altered in the new habitat through directional selection (Mayden, 1997). If the SMRS characters of the daughter population are sufficiently different from the parental form that the two no longer recognise each other as mates, speciation has occurred under the RSC (Raubenheimer & Crowe, 1987). The RSC can be classed as a part of, rather than an alternative to, the BSC, as SMRS characters are a class of premating isolation mechanisms which had previously been identified by Dobzhansky (1937) and Mayr (1942) (Raubenheimer & Crowe, 1987; Coyne & Orr, 2004).

1.1.1.7 Cohesion Species Concept

The Cohesion Species Concept (CSC) was developed by Templeton (1989) using the positive aspects of the BSC, EvSC and RSC. The CSC is defined as “the most inclusive population of individuals having the potential for phenotypic cohesion through intrinsic cohesion mechanisms” (Templeton, 1989). Here a cohesion mechanism enforces similarity within clusters that remain genetically and phenotypically similar through genetic exchangeability and/or demographic exchangeability (Templeton, 1989). The CSC fails to diagnose species when measures of genetic and demographic exchangeability conflict (Coyne & Orr, 2004). If groups of sexually reproducing organisms are reproductively isolated but demographically exchangeable, under the CSC they would be considered the same species as they occupy the same fundamental niche (Hausdorf, 2011). If the CSC were accepted, many genetically nonexchangeable species would be grouped together, as many are demographically exchangeable (Hausdorf, 2011).

1.1.1.8 Genic Species Concept

A more recent species concept is the Genic Species Concept (GSC). Wu (2001) developed this novel concept based on the genetic processes that occur during speciation and defines species as “groups that are differentially adapted and, upon contact, are not able to share genes controlling these adaptive characters, by direct exchanges or through intermediate hybrid populations”. Specifically, the process of speciation is dependent on the genes that are responsible for differential adaptation to both natural and sexual environments, i.e. “speciation genes”. Throughout the process of speciation, these speciation genes may account for only a small fraction of the genome, but gene exchange is highly restricted at these loci (Hausdorf, 2011). A criticism directed towards the GSC is its restricted focus on differential adaptation caused by mutations (Orr, 2001; Noor, 2002). However, Wu (2001) acknowledges that other genetic features can result in RI (e.g. chromosomal changes) and names these as “special cases” of speciation.

1.1.1.9 Speciation with gene flow and the speciation continuum

A more recently proposed theory is speciation with gene flow, which was originally thought to be difficult because gene flow limits population differentiation and thus prevents the evolution of RI (Nosil, 2008). Speciation with gene flow is possible because divergence can occur at some genes while other genes are still being exchanged (Hey, 2006). Backcross hybrids are an excellent example of this: they do not carry the full set of genes from each population, but it is still possible for some genes to pass between populations depending on the fitness of the backcross hybrids (Hey, 2006). Selection therefore plays an active role in divergence, as it acts differently in the two diverging populations (Hey, 2006).

Early on, speciation with gene flow was considered theoretically possible under specific conditions (Bolnick & Fitzpatrick, 2007) but was difficult to demonstrate empirically (Nosil, 2008). There are now, however, multiple empirical examples of speciation with gene flow across a range of taxa, including: cave salamanders (Niemiller et al., 2008); endemic plants on Lord Howe Island (Papadopulos et al., 2011); species of lake whitefish (Gagnaire et al., 2013); Heliconius butterflies (Martin et al., 2013); equids (horses, zebras etc.) (Jónsson et al., 2014); and Rhagoletis pomonella flies (Egan et al., 2015).

Sympatric speciation is the most extreme case of speciation with gene flow, as there are no physical barriers to gene flow (Nosil, 2008). The most well-known example of sympatric speciation is the host races of hawthorn flies (Rhagoletis pomonella) across the north-eastern and midwestern USA (Bush, 1966). Endemic hawthorn flies began to differentiate phenotypically and genetically when they began feeding on introduced apples (Feder et al., 1988), which are found in clusters in old fields and at the edges of woodlands (Mallet et al., 2009). Host plant choice was the critical ecological adaptation that affected gene flow in these flies, and thus host choice is directly linked with mate choice (Mallet et al., 2009).

The speciation continuum, as described by Shaw & Mullen (2014), is the series of continual genetic changes that arise as two lineages diverge from one another and become reproductively isolated (Figure 1.1). This theory is an important conceptual shift in recognising that speciation is a process rather than an endpoint (Hendry et al., 2009). Hendry (2009) describes four states along this continuum from an ecological speciation standpoint. State 1 is continuous adaptive variation without reproductive isolation, state 2 is discontinuous adaptive variation with minor RI, state 3 is adaptive differences with reversible reproductive isolation and state 4 is adaptive differences with permanent reproductive isolation (Hendry, 2009). These states are merely a construct and are not definitive, and the junctions between them are not always clear along the continuum.

Figure 1.1 From a panmictic population to two reproductively isolated species along the speciation continuum (from Seehausen et al., 2014).

1.1.1.10 Where are we now?

While more recently there has been a call for a workable generalised species concept (Hausdorf, 2011), this provocative topic is still not completely resolved. Figure 1.2 below illustrates where some of the confusion surrounding species and speciation comes from. The ‘grey zone’ is where disagreement between concepts and theories arises, prompting the question: at which point in this ‘grey zone’ is a single species considered two? de Queiroz (1998, 2005b) introduced the General Lineage Species Concept and the Unified Species Concept, which are ultimately the same concept and follow a hierarchical approach similar to the EvSC. These concepts are based on what de Queiroz believes all modern species concepts have in common, and explain species simply as separately evolving metapopulation lineages (de Queiroz 1998, 2007). The species criteria described in the species concepts discussed above, such as reproductive isolation (BSC), niche specialisation (EcSC) or diagnosability (PSC1), are secondary criteria indicative of speciation, but confusion arises due to the unknown and unpredictable timing of these criteria in the process of lineage splitting (Lega et al., 2012).

Figure 1.2 A simplified representation of a single lineage/species splitting and forming two lineages/species. Here the shades of grey represent daughter lineages diverging through time and the horizontal lines SC (species criterion) 1-9 represent the time they acquire different properties (from de Queiroz, 2007).

In comparison to the species concepts from de Queiroz, the GSC and speciation with gene flow concepts look at speciation in terms of the genes underlying the process. This is now more relevant than ever: with continually advancing molecular techniques in the post-genomic era, we are able to gain insight into not only the genetic architecture of the speciation process but also the genes that are involved. We are now in a position to more fully identify genes that have a direct impact on the speciation process.

1.1.2 Speciation genes

While the debate over defining a species is still ongoing, there has also been a shift away from the more traditional view of identifying species based on morphological identification and geographic patterns, towards the process of divergence and, more recently, a greater focus on speciation with gene flow (Schlüter, 2018). Understanding the genetic basis of speciation has long been a main objective in evolutionary biology, and in the post-genomic era it is now possible to identify genes contributing to speciation, known as speciation genes. The term ‘speciation gene’ can first be found in the literature in Fry & Salsar (1977). Although there has been some scrutiny as to what a speciation gene is, Dion-Cote & Barbash (2017) describe this term simply as a reproductive isolating gene. While this is simple to understand, it is not specific enough. A speciation gene can therefore be defined as any gene whose divergence has had a significant effect on the evolution of RI (Nosil & Schluter, 2011). The roles of speciation genes can be categorised into pre- and post-mating isolation, sexual isolation, inviability, RI and courtship behaviour, to name a few. Table 1.1 lists candidate genes that have previously been identified as speciation genes across a range of taxa. The in-depth scan of the literature here resulted in the detection of 18 such genes representing most of the speciation gene categories listed above.

Sea urchins, for example, reproduce through broadcast spawning and thus require highly conserved RI from other closely related species. This is achieved through the sperm protein Bindin, which binds the sperm to the egg (Zigler et al., 2005). Metz & Palumbi (1996) found that closely related species of sea urchins have fixed species-specific differences in the Bindin protein. They showed that a single mutation at one of these fixed nucleotide sites in the sperm was enough for it to be unrecognisable by the egg (Metz & Palumbi, 1996). These species-specific fixed differences, where a single mutation results in RI, are the result of extreme stabilising selection on the Bindin gene within species and are consistent with the SMRS as part of the RSC. Another example is the male hybrid sterility gene Overdrive in subspecies of Drosophila pseudoobscura, which causes sterility and segregation distortion in F1 hybrids (Phadnis & Orr, 2009). There are seven non-synonymous fixed differences between the two subspecies, and a comparison with out-groups concluded that all fixed differences occurred in the D. pseudoobscura bogotana lineage (Phadnis & Orr, 2009).
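
The notion of a fixed difference used in the Bindin and Overdrive examples can be made concrete with a short sketch. The Python below is illustrative only: the function name `fixed_differences` is hypothetical, the toy alignment is invented (it is not real Bindin or Overdrive sequence), and it works on nucleotides without distinguishing synonymous from non-synonymous changes. A site counts as a fixed difference when each group is invariant at that site but the two groups carry different states.

```python
def fixed_differences(group_a, group_b):
    """Return alignment positions where each group is invariant but the
    two groups carry different states (a 'fixed difference')."""
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        states_a = {seq[i] for seq in group_a}
        states_b = {seq[i] for seq in group_b}
        if len(states_a) == 1 and len(states_b) == 1 and states_a != states_b:
            positions.append(i)
    return positions


# Invented toy alignment: three individuals sampled per species.
species_1 = ["ATGCCA", "ATGCCA", "ATGTCA"]
species_2 = ["ATACCG", "ATACCG", "ATACCG"]

print(fixed_differences(species_1, species_2))  # [2, 5]
```

Position 3 (zero-based) is excluded because it is polymorphic within the first species, so it is a shared or segregating site rather than a fixed difference.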

Table 1.1 Genes identified to play a role in the process of speciation.

| Gene name | Role | Species | Reference |
|---|---|---|---|
| Adenylate cyclase | Courtship behaviour | Omocestus viridulus & Chorthippus biguttulus | Heinrich et al., 2001 |
| Arylsulfatase | Sperm-zona pellucida binding | Pig | Carmona et al., 2002 |
| Bindin | Gametic compatibility | Echinometra oblonga | Palumbi & Metz, 1991 |
| Ca2+ dependent protein kinase (CDPK) | Inviability | Helianthus paradoxus | Lexer, Lai & Rieseberg, 2004 |
| Ectodysplasin (Eda) | Hybrid inviability | Gasterosteus aculeatus | Colosimo et al., 2005 |
| Glycosyl hydrolase | Penetration of zona pellucida, male courtship behaviour | Leucophaea maderae | Cornette et al., 2003 |
| Heat shock 70 kDa | Sperm zona-pellucida binding, zona pellucida receptor complex | Mus sp. | Calvert et al., 2003 |
| Hybrid male rescue (Hmr) | Post mating isolation | Drosophila sp. | Barbash, Roote & Ashburner, 2000 |
| Lethal hybrid rescue (Lhr) | Post mating isolation | Drosophila sp. | Brideau et al., 2006 |
| Nucleoporin96 (Nup96) | Post mating isolation | Drosophila sp. | Presgraves et al., 2003 |
| Nucleoporin160 (Nup160) | Post mating isolation | Drosophila sp. | Shanwu & Presgraves, 2009 |
| Odysseus (OdsH) | Post mating isolation | Drosophila sp. | Ting, Tsaur & Wu, 2000 |
| Overdrive | Hybrid sterility between species | Drosophila sp. | Phadnis & Orr, 2009 |
| Pentatricopeptide repeat (PPR) | Sexual isolation | Mimulus sp. | Sweigart, Fishman & Willis, 2006 |
| PR domain containing 9 (Prdm9) | Reproductive isolation | Mus sp. | Mihola et al., 2009 |
| tcP-1 CPN60 Chaperonin | Acrosomal reaction, sperm zona-pellucida binding | Apis mellifera | Barchuck et al., 2007 |
| Triosephosphate isomerase (TPi) | Sexual isolation | Ostrinia nubilalis | Dopman et al., 2005 |
| ZP3-binding protein | Sperm zona-pellucida binding | Mus sp. | Bleil & Wassarman, 1990 |

1.1.3 Suitability of crustaceans for studies on speciation

An important distinction to make when viewing these ‘speciation genes’ is that the majority have been found in model species and taken from organisms reared in laboratory conditions as opposed to wild populations. It is also important to note that no speciation genes have yet been identified in crustaceans, and therefore we are unsure if these genes have the same role in this subphylum. Crustaceans are excellent model organisms for evolutionary and taxonomic studies, as both species complexes and cryptic species are commonly found within these taxa (e.g. Belyaeva & Taylor, 2009; Machordom & Macpherson, 2004; Mathews, 2006). At present there are six classes of crustacean comprising approximately 67,000 extant species; the class Malacostraca (which includes the decapods) is the most speciose and the most morphologically and ecologically diverse group (Richter & Scholtz, 2001). There are approximately 14,700 extant species of decapods and the family Atyidae is the most species-rich group of shrimp. Atyids are found in freshwater habitats on all continents except Antarctica (von Rintelen et al., 2012). The family Atyidae comprises 42 extant genera, with approximately 470 species (De Grave & Fransen, 2011). Five of the atyid genera are widespread in coastal streams of Australia, particularly in the tropics and subtropics. The genus Paratya has a wide but disjointed distribution in the Pacific region (Japan, Australia, New Zealand, New Caledonia, Lord Howe Island, Norfolk Island etc.) and its members are ancient inhabitants of freshwater environments as well as lakes and estuaries (Carpenter, 1977). The endemic Australian species, Paratya australiensis, is the most widely distributed shrimp, ubiquitous across the eastern mainland, including the Murray-Darling system, and Tasmanian drainages (Figure 1.3). P. australiensis exists at high density in headwater streams, but is also common in lowland river channels. This species is the most conspicuous and abundant macroinvertebrate of upland subtropical rainforest streams in southeast Queensland (Hancock & Bunn, 1997).

1.1.3.1 Ecology of Paratya australiensis

Paratya australiensis has a life span of two years and the females generally breed in their second summer (Hancock, 1995). The species can be distinguished from other atyids by the presence of a supra-orbital spine on each side of the carapace and by the exopods on all pereiopods (Williams, 1980). Sexing this species can be difficult, but Smith & Williams (1980) describe the males as having an appendix masculina on the second pleopod and a pointed sternite between the 5th pereiopods. Many characteristics, such as the rostrum shape and cheliped proportions, vary greatly across the species' range, leading to debate over its taxonomy. In all locations where P. australiensis is found, it prefers well-vegetated areas but can also be found in other protected areas such as under logs and rocks (Williams, 1977; Morris, 1991). In this habitat, P. australiensis is described as a browser and filter feeder (Gemmel, 1979), scraping food from the substratum as well as scavenging on detritus, dead , algae and plants (Walker, 1972).

Figure 1.3 Map of Paratya australiensis distribution

1.1.3.2 Paratya australiensis species status

The species status of P. australiensis has previously been debated, but the species is currently considered taxonomically monotypic. It was first described a century ago (Kemp, 1917), but a taxonomic review by Riek (1953) recognised five distinct taxa at both species and subspecies level (Figure 1.4). However, Williams & Smith (1979) redescribed Paratya, with all species re-synonymised with P. australiensis. Recent molecular research, particularly on population structure (Hurwood et al., 2003; Baker et al., 2004), has re-ignited the taxonomic uncertainty surrounding this species. In 1992, approximately 10,000 P. australiensis individuals were reciprocally translocated between two streams of the Brisbane River, Kilcoy Creek and Branch Creek (a tributary of Stony Creek) (Hancock, 1995). The aim was to measure instream dispersal using three allozyme loci that had been shown to have fixed differences between these two creek systems (Hughes et al., 1995). It was assumed that the introduced shrimps would admix with the local shrimps, resulting in complete introgression, and that the movement of introduced alleles could be monitored through subsequent sampling of later generations. It was later found using mitochondrial DNA (COI) that these two populations (Kilcoy and Stony) showed high levels of sequence divergence (approximately 6%) and belonged to two reciprocally monophyletic clades (Hurwood et al., 2003). Extreme non-random mating was also found (100% of F1 hybrids in Branch Creek were from matings between resident males and introduced females), along with highly variable fitness of the F1 hybrids compared to offspring from matings within lineages (Hughes et al., 2003). Later work by Fawcett et al. (2010) confirmed this, as can be seen from the ratcheting effect in Figure 1.5: the translocated genotypes had far higher reproductive success but an overall lower fitness than the resident genotypes. The reciprocal translocation in Kilcoy Creek resulted in no detection of the translocated alleles. This may be due to three main effects: (i) Kilcoy Creek is at a higher elevation than Branch Creek and thus the translocated individuals could not successfully acclimate to the cooler environment; (ii) the hybrid offspring between the two lineages were not viable within the Kilcoy Creek environment; or (iii) the two lineages failed to recognise each other as the same species in the Kilcoy Creek environment. However, under scenarios (ii) and (iii), we would still have expected to see offspring from matings among Branch Creek (lineage 6) individuals, which was not the case. Conversely, it is known that the Kilcoy type (lineage 4) has not only survived in Branch Creek but has almost totally displaced the Stony Creek type (lineage 6) to the lower reaches of Branch Creek (Fawcett et al., 2010).
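
The COI divergence of approximately 6% mentioned above is a pairwise sequence distance. As a minimal sketch of how such a figure can be obtained, the Python below computes an uncorrected p-distance (the proportion of differing sites) between two aligned sequences. The function name `p_distance` is hypothetical and the two 50 bp fragments are invented so that they differ at exactly three sites; they are not real COI data. Published divergence estimates are often corrected with a substitution model, which this sketch does not attempt.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    skipping sites with a gap or ambiguous base in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Invented 50 bp fragments differing at 3 sites (not real COI data).
lineage_x = "ATGGCACTATTAGGAGATGACCAAATTTATAATGTTATTGTTACAGCTCA"
lineage_y = "ATGGCACTGTTAGGAGATGATCAAATTTATAATGTTATCGTTACAGCTCA"
print(f"{p_distance(lineage_x, lineage_y):.2%}")  # 6.00%
```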

Figure 1.4 Distribution of the five species and subspecies as described by Riek (1953).

Figure 1.5 Frequency of the translocated alleles across years in Branch Creek, where B-0 is the site of the initial translocation in 1995, + denotes pools above and – denotes pools below the translocation site (from Fawcett et al., 2010).

Since the study in Branch Creek, further work has identified a total of nine equally divergent clades across the range of the species (Baker et al., 2004; Cook et al., 2006). Figure 1.6A shows the equal divergence of these lineages and the lack of phylogenetic resolution among them. Many lineages are co-distributed, as can be seen in the Brisbane River. Several of these lineages have vast, yet overlapping, geographic distributions (Figure 1.6B), reflecting complex biogeographic histories involving multiple range expansions and possible life history transitions (Cook et al., 2006). Some work has documented the ongoing fate of the field hybridisation (Fawcett et al., 2010; Wilson et al., 2016) and, while the translocated lineage is persisting in the receiving stream (i.e. Branch Creek), there is still significant non-random mating, even after 25 generations of contact. Non-random mating and low hybrid viability point to P. australiensis being a species complex or approaching that status. This system therefore provides an ideal scenario for a genomic investigation into speciation genes and the speciation process.

Figure 1.6 (A) Neighbour-joining gene tree (mtDNA COI & nDNA 28S) for Paratya australiensis (B) Map of eastern Australia presenting the P. australiensis lineages found at each sampled river (Cook et al., 2006).

1.1.4 Genomics as a tool to study species and speciation

Genomics originated in the mid-1980s as an extension of DNA sequencing with an emphasis on whole genome functions (Hieter & Boguski, 1997). This field of science can be divided into structural genomics and functional genomics. Whereas structural genomics deals with the initial phase of genome analysis, functional genomics concerns the expression and function of genes (Upadhyaya, Pereira & Watson, 2010). The ultimate goal of functional genomics is to expand the scope of biological investigations from single gene and/or protein studies to studying a large portion of the genes and proteins of the genome at once. Next generation sequencing (NGS) technologies are an exceptional resource for answering various ecological and evolutionary questions (Morozova & Marra, 2008). Genotyping by sequencing (RAD-seq or DArT-seq) is a popular approach for genotyping thousands of single nucleotide polymorphisms (SNPs), providing the opportunity to perform genomic scans (to detect signatures of selection and adaptive mutations) (Ekblom & Galindo, 2011). This technique has been applied to species of lake whitefish in North America (Gagnaire et al., 2013). RAD-seq was used in conjunction with a population genetics approach and identified multiple influences on the size of genomic islands of differentiation between the lakes. These influences include linkage disequilibrium maintained by selection on numerous loci, niche divergence and demographic characteristics unique to each location (Gagnaire et al., 2013). A RAD-seq only approach has also been used in angelfish species to estimate the timing and mode of speciation events (Tariel, Longo & Bernardi, 2016). Modes of speciation were identified as peripatric in three species found in the tropical eastern Pacific, with possible sympatric speciation in sister species found in the tropical western Atlantic (Tariel, Longo & Bernardi, 2016).
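
A genomic scan of the kind described above typically computes a differentiation statistic at each SNP and looks for loci that stand out from the genomic background. The Python below is a minimal sketch under simplifying assumptions and is not the method of any study cited here: it uses the textbook definition of F_ST for a biallelic SNP in two equally weighted populations, (H_T - H_S) / H_T, whereas empirical studies generally use estimators that correct for sample size. The function name `fst_two_pops` and the allele frequencies are invented for illustration.

```python
def fst_two_pops(p1, p2):
    """F_ST for one biallelic SNP from the allele frequencies in two
    equally weighted populations: (H_T - H_S) / H_T."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # pooled expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_t == 0:
        return 0.0  # monomorphic site: no differentiation to measure
    return (h_t - h_s) / h_t


# Invented allele frequencies for four SNPs in two populations.
snps = {
    "snp1": (0.5, 0.5),    # identical frequencies
    "snp2": (0.25, 0.75),  # moderate differentiation
    "snp3": (0.0, 1.0),    # fixed difference
    "snp4": (1.0, 1.0),    # monomorphic
}
for name, (p1, p2) in snps.items():
    print(name, round(fst_two_pops(p1, p2), 3))
```

In this toy example only `snp3` reaches an F_ST of 1; in a real scan, clusters of such high values along a chromosome would be candidates for genomic islands of differentiation.
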
Another approach to investigating speciation is functional genomics (transcriptomics), as it reduces the portion of the genome to be analysed in non-model animals while still providing a significant amount of relevant functional information. It also offers an effective way to identify candidate genes affecting a phenotype (biological condition), differential gene expression patterns that help explain the functional role of genes, and functional sites containing important evolutionary information (Blank et al., 2014). Henning et al. (2013) used a transcriptomic approach, more specifically a differential expression analysis, to compare colour transformation in Midas cichlids. Using skin tissue specifically, 46 genes were identified as differentially expressed between the different colour morphs. As this lineage is considered extremely young (due to strong assortative mating), the identified candidate genes are important for understanding speciation in these fish (Henning et al., 2013). While next generation sequencing is well established for studying speciation in many species, it has not been used extensively in crustaceans. Generally, crustacean studies rely on a mitochondrial approach rather than next generation sequencing to investigate speciation. The mitochondrial approach has been applied to a range of species including the marbled crayfish (Procambarus fallax) (Vogt et al., 2015), reef hermit crab (Calcinus sp.) (Malay & Paulay, 2009), Agar River prawn (Macrobrachium jelskii) (Vera-Silva et al., 2016) and the stygofaunal family Bathynellidae (Perina et al., 2019).
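
The core quantity behind a differential expression analysis such as the one mentioned above is the fold change in expression between two groups. The Python below sketches only that effect-size step; the function name `log2_fold_change`, the pseudocount of 1 and the normalised counts are all assumptions made for illustration. A real analysis would additionally model count dispersion across replicates and correct for multiple testing before calling a gene differentially expressed.

```python
import math


def log2_fold_change(counts_a, counts_b, pseudocount=1.0):
    """log2 ratio of mean expression in group B over group A.
    A pseudocount avoids division by zero for unexpressed genes."""
    mean_a = sum(counts_a) / len(counts_a)
    mean_b = sum(counts_b) / len(counts_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Invented normalised counts for three replicates per group.
expression = {
    "gene_up": ([7, 7, 7], [31, 31, 31]),
    "gene_flat": ([15, 15, 15], [15, 15, 15]),
    "gene_down": ([63, 63, 63], [15, 15, 15]),
}
for gene, (group_a, group_b) in expression.items():
    print(gene, log2_fold_change(group_a, group_b))
```

With these toy values the three genes give log2 fold changes of 2, 0 and -2, corresponding to a four-fold increase, no change and a four-fold decrease.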

1.1.5 Phylogenomics

Phylogenomics encompasses several areas of research between molecular biology and evolution (Philippe et al., 2005). This discipline primarily deals with using molecular data to help understand the relationships between species and to gain insight into the mechanisms of molecular evolution using information from species' evolutionary history (Philippe et al., 2005). The use of transcriptomic data in phylogenomics has gained momentum in recent times, as this method includes many genes and can be a powerful tool for resolving taxonomic ambiguity (Simon et al., 2012; Egger et al., 2015) of the kind seen in P. australiensis (see Figure 1.6A). Phylogenomic trees inferred from transcriptomes have a potential advantage over other methods, as these phylogenies reduce the risk of homoplasy by convergence because they are inferred from features such as gene content and order, intron positions or protein structure (Philippe et al., 2005). A transcriptomic phylogenomics approach would significantly reduce the stochastic error seen in standard phylogenetic studies, as the number of genes added to the analysis provides a greater amount of phylogenetic information, leading to more fully resolved trees (Philippe et al., 2005). To reduce incongruence, phylogenomic studies generally apply a data filtering step to reduce missing data, or use slowly evolving genes to improve the signal quality of the data (Chen, Liang & Zhang, 2015). A transcriptomic approach has proven to be an effective method for phylogenomics in resolving the relationships of the eight major lineages of molluscs (Kocot et al., 2011). Previously, morphological traits and traditional molecular phylogenetics were used as the key identifiers of the lineages, but this led to multiple conflicting phylogenies and hypotheses. Kocot et al. (2011) used transcriptome and genome data to reconstruct a well-supported Mollusca phylogeny, supporting the Aculifera hypothesis.
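
The data filtering step mentioned above can take several forms. One simple version, sketched below in Python, keeps only genes that were recovered for a minimum proportion of the taxa in the study. This is an illustrative assumption rather than the procedure of any cited study: the function name `filter_by_occupancy`, the 75% threshold and the toy alignments are all invented, and a sequence consisting only of gap or missing-data characters is treated as absent.

```python
def filter_by_occupancy(alignments, n_taxa, min_occupancy=0.75):
    """Keep genes whose alignment includes at least `min_occupancy`
    of the taxa in the study (a simple missing-data filter)."""
    kept = {}
    for gene, seqs_by_taxon in alignments.items():
        # A taxon counts as present only if its sequence is not all gaps/missing.
        present = sum(1 for seq in seqs_by_taxon.values() if seq.strip("-?N"))
        if present / n_taxa >= min_occupancy:
            kept[gene] = seqs_by_taxon
    return kept


# Invented toy data: four taxa, three genes.
alignments = {
    "geneA": {"t1": "ATGC", "t2": "ATGC", "t3": "ATGA", "t4": "ATGG"},
    "geneB": {"t1": "ATGC", "t2": "----", "t3": "ATGA"},
    "geneC": {"t1": "ATGC", "t2": "ATGC", "t3": "ATGA", "t4": "----"},
}
print(sorted(filter_by_occupancy(alignments, n_taxa=4)))  # ['geneA', 'geneC']
```

Raising the threshold reduces missing data at the cost of fewer genes, so the choice is a trade-off between matrix completeness and the amount of phylogenetic information retained.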

1.2 AIMS OF THE CURRENT PROJECT

Paratya australiensis is the most common freshwater shrimp in eastern Australia, occurring along the east coast of the mainland from northern Queensland southwards, and is also found in Tasmania. This extensive distribution is quite puzzling, as the species is considered a primary freshwater species. While the genetics, dispersal and systematics of this species have been studied extensively, Cook et al. (2006) concluded that their results suggested a complex of cryptic species. Furthermore, Hughes et al. (2003) and Fawcett et al. (2010) showed non-random mating between clades 4 and 6. Given the apparent taxonomic difficulty of this species and the associated lack of phylogenetic resolution, the broad research aims and questions for the current study include:

1. Can genes that contribute to the process of speciation be identified in P. australiensis?
2. Is hybridisation still ongoing in Branch Creek after 25 generations since the translocation event?
3. Can the polytomy seen in P. australiensis be resolved using whole mitochondrial genomes?
4. Is Paratya australiensis a species complex comprised of cryptic species or still a single species in the process of becoming multiple species?

1.3 THESIS OUTLINE

This thesis is comprised of four data chapters that aim to answer the research questions stated above. All chapters use a transcriptomic approach and Chapter 5 also incorporates mitochondrial DNA analysis. Chapter 2 involves a comparison of differentially expressed genes between lineages 4 and 6 of P. australiensis found in southeast Queensland, Australia. This chapter forms a published article in Hydrobiologia (2018). Chapter 3 investigates speciation genes in lineages 4 and 6 as well as in individuals from a known hybrid zone. These include speciation genes previously identified in the literature, as well as genes involved in a range of reproductive processes based on their annotated gene ontology. Differential gene expression analysis was also performed on the three populations to investigate the hybrid population. Chapter 4 employs a single nucleotide polymorphism (SNP) approach to determine whether hybridisation is still ongoing in Branch Creek (the known hybrid zone). Identity by state analysis, principal component analyses and relatedness tests are used to investigate the Branch Creek individuals in comparison to the Kilcoy Creek and Stony Creek populations. Chapter 5 uses mtDNA along with whole mitogenomes to resolve the phylogeny presented by Cook et al. (2006). The resultant phylogeny aids in further understanding the diversification and divergence history of the species complex. Finally, Chapter 6 draws together the main findings from each of the data chapters to address all research questions. This chapter also provides future directions for further research in the species complex and final conclusions on the research performed within this thesis.


Chapter 2: A transcriptome-wide assessment of differentially expressed genes among two highly divergent, yet sympatric, lineages of the freshwater atyid shrimp, Paratya australiensis

*This chapter has been published as: Rogl, K. A., Rahi, M. L., Royle, J. W. L., Prentis, P. J. & Hurwood, D. A. (2018). A transcriptome-wide assessment of differentially expressed genes among two highly divergent, yet sympatric, lineages of the freshwater atyid shrimp, Paratya australiensis. Hydrobiologia, 825(1), 183-196.


2.1 INTRODUCTION

Next-generation sequencing (NGS) technology has allowed many previously unanswered biological questions to be resolved (Yue & Wang, 2017). Multiple NGS techniques have been useful for investigating biological phenomena in aquatic species, including crustaceans; however, genomic data remain relatively sparse, even for commercially important species exploited in the aquaculture industry (Robledo et al., 2017; Pawlowski et al., 2014). While there are now more available genomes than ever before, and despite the economic importance of decapods, there are no decapod whole genome sequences publicly available (although some are being prepared at the time of writing). While whole genomes are useful, they are not necessarily critical in NGS studies where the goal may be to identify specific genes (candidate genes) associated with particular phenotypes (e.g. growth traits in prawn/shrimp aquaculture), or to discover unknown yet phenotypically important genes (novel/orphan genes). Using a functional genomics (transcriptomics) approach to investigate genes relating to specific traits has streamlined the way genes are selected for future generations in aquaculture. Some examples include genes relating to growth (Jung et al., 2011), disease resistance (Ghaffari et al., 2014), osmoregulation (Moshtaghi et al., 2016; Rahi et al., 2017), sexual determination (Liu et al., 2015) and developmental biology (Wei et al., 2014). While this approach has facilitated many practical research outcomes, the vast majority of data to date still reflect economically important groups. Little work has been done on freshwater decapods apart from commercially important Macrobrachium species, and no attention has been paid to primary freshwater taxa.

The primary freshwater genus Paratya, of the family Atyidae, has a wide but disjunct distribution in the Pacific region (Japan, Australia, New Zealand, New Caledonia, Lord Howe Island, Norfolk Island etc.). Its members are ancient inhabitants of many freshwater environments, both lotic and lentic, with some species also found in estuaries (Carpenter, 1977). The endemic species, Paratya australiensis (Kemp, 1917), is the most widely distributed shrimp in Australia, ubiquitous across the eastern mainland and Tasmanian drainages. P. australiensis exists at high density in headwater streams, but is also common in lowland river channels (Walsh & Mitchell, 1995). This species is the most conspicuous and abundant macroinvertebrate of upland subtropical rainforest streams in southeast Queensland (Hancock & Bunn, 1997).


P. australiensis is particularly interesting because of its enigmatic biogeographic history (Hughes et al., 1995; Hurwood et al., 2003; Cook et al., 2006). The study by Cook et al. (2006) showed nine equally divergent clades across the range on the Australian mainland, with three of the lineages being widespread. The relationship among these lineages, however, is largely unresolved. Furthermore, many of these lineages have distributions that overlap with at least one other clade within single river systems. Rather than allopatric isolation of a single amphidromous population (where larval development occurs in estuarine habitats and post larvae migrate back into freshwater (Novak et al., 2017)) resulting in a number of divergent clades, they hypothesised that the observed pattern was a result of repeated instances of independent transitions from amphidromy to a freshwater lifestyle with subsequent dispersal among river drainages. Because the Cook et al. (2006) study relied primarily on the neutral mtDNA COI marker, their hypothesis could not be further explored. That is, the question of whether neutral divergence correlates with divergence at genes likely to be important during an amphidromy/freshwater transition (e.g. genes associated with temperature tolerance, reproduction, larval development, osmoregulation etc.) was not addressed. Now, with NGS technology far more accessible, we can begin to investigate functional divergence across divergent lineages. A first step in addressing this question is the generation of a detailed transcriptome for this species and the identification of candidate genes involved in freshwater invasion.
The overall aim of this research was to generate transcriptomic data from two divergent lineages of Paratya found in southeast Queensland (lineages 4 and 6, sensu Cook et al., 2006) to identify candidate genes/regions that may be important for adaptation across local environmental gradients and to produce a suite of single nucleotide polymorphisms (SNPs) that could be used to quantify genome-wide levels of divergence.


2.2 METHODS

2.2.1 Sample collection

P. australiensis samples were collected from Stony Creek (lineage 6) in Bellthorpe National Park (26°50’52.7”S 152°42’40.4”E) and Kilcoy Creek (lineage 4) in Conondale National Park (26°45’21.2”S 152°32’46.9”E), South East Queensland. A seine net was used to collect individuals that were subsequently preserved in situ in liquid N2. Samples were returned to the CARF Genomics research facility at the Queensland University of Technology (QUT) and stored at -80°C until further analysis. Total genomic DNA was extracted from sampled individuals and screened for an mtDNA COI fragment using universal primers (Folmer et al., 1994) to confirm species identification. All individuals were confirmed to be Paratya australiensis.

2.2.2 RNA extraction, cDNA library preparation and Illumina sequencing

Three individuals from each of Stony and Kilcoy Creeks were used for Illumina sequencing. Total RNA was extracted from 10 individuals from each site using a TRIzol/chloroform extraction method (Chomczynski & Mackey, 1995). Prior to this, whole individuals were crushed into a fine powder with liquid nitrogen. Extracted samples were then purified using an Isolate II Mini Kit, according to the manufacturer’s instructions (Bioline, Alexandria, Australia). Purified RNA was then stored at -80°C until required for cDNA synthesis. Total RNA yield and quality were checked on a NanoDrop 2000 Spectrophotometer (Thermo Scientific), by 2% agarose gel electrophoresis and finally using a Bioanalyzer with an RNA nano-chip (Agilent 2100, version 6). The three highest-quality samples from each site were used in the subsequent steps. cDNA synthesis was performed using the Illumina NeoPrep and all libraries were then sequenced on an Illumina NextSeq™ 500 Platform (Illumina, San Diego, USA) at MGRF, QUT for 35 bp paired end sequencing.

2.2.3 Quality filtering and de novo assembly

Sequence data quality was first checked using FastQC (Andrews, 2010) and the Illumina paired end reads were then de novo assembled in Trinity with quality trimming performed by the included QC software Trimmomatic set on default settings to retain only high quality reads (Q > 20, N <1%). All quality filtered cDNA libraries were pooled together for de novo assembly to generate a reference transcriptome. BUSCO (Simão et al., 2015) was used to assess assembly completeness by mapping against genes that are highly conserved in Eukaryotes.

2.2.4 Bioinformatic analysis

De novo assembled contigs were blasted against the NCBI non-redundant database using BLAST+ (version 2.3.0), applying an e-value stringency of 1e-5 to identify significant BLAST hits. The blasted contig files were then loaded into Blast2GO Pro (version 4.1) for mapping and annotation of sequences to describe the potential function of each contig based on gene ontology (GO). The data set was then checked for genes potentially involved in temperature tolerance, osmoregulation and reproduction based on GO terms. These categories were selected because they were hypothesised to be important for adaptation to the local environment.
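The e-value cut-off described above can be illustrated with a minimal Python sketch. It assumes BLAST+ tabular output (-outfmt 6), in which the query ID, subject ID and e-value are the 1st, 2nd and 11th tab-separated fields; the function name and the example rows are hypothetical and are not part of the pipeline used in this study.

```python
import io

E_VALUE_CUTOFF = 1e-5  # stringency stated in the text


def best_significant_hits(handle, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)} for the lowest e-value hit per query.

    Assumes BLAST+ tabular output (-outfmt 6): 12 tab-separated fields with
    the query ID first, the subject ID second and the e-value eleventh.
    """
    best = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > cutoff:
            continue  # not significant at the chosen stringency
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical example rows (illustrative only, not results from this study)
example = io.StringIO(
    "contig_1\tsubj_A\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-50\t370\n"
    "contig_1\tsubj_B\t80.0\t150\t30\t0\t1\t150\t1\t150\t1e-08\t90\n"
    "contig_2\tsubj_C\t60.0\t50\t20\t0\t1\t50\t1\t50\t0.5\t20\n"
)
hits = best_significant_hits(example)
```

In this sketch contig_1 retains its strongest hit, while contig_2 is dropped because its only hit exceeds the cut-off.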

Differential gene expression

Differential gene expression (DGE) analysis was performed between the two populations by estimating the transcript abundance for each individual using RSEM with default settings (Li & Dewey, 2011). The transcript abundance data were then loaded into the edgeR Bioconductor package for DGE analysis (using false discovery rate-corrected p-value cut-offs of 1e-3 and 1e-10 with a log fold change of 2), with the output generated in the form of a heatmap.
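The two significance thresholds can be sketched as a simple filter over a table of test results. This is an illustration only: it assumes the fold-change threshold is applied to the absolute log2 fold change, and the transcript names and values are hypothetical rather than output from edgeR.

```python
def is_differentially_expressed(log2_fc, fdr, fdr_cutoff=1e-3, min_abs_log2_fc=2.0):
    """True when a transcript passes both the FDR and the fold-change thresholds."""
    return fdr <= fdr_cutoff and abs(log2_fc) >= min_abs_log2_fc


# Hypothetical (transcript, log2 fold change, FDR) records
results = [
    ("transcript_a", 3.1, 1e-12),
    ("transcript_b", -2.4, 5e-4),
    ("transcript_c", 1.2, 1e-15),   # significant, but fold change too small
    ("transcript_d", -5.0, 0.02),   # large fold change, but not significant
]

# Relaxed (1e-3) and strict (1e-10) cut-offs, as named in the text
relaxed = [t for t, lfc, fdr in results if is_differentially_expressed(lfc, fdr, 1e-3)]
strict = [t for t, lfc, fdr in results if is_differentially_expressed(lfc, fdr, 1e-10)]
```

Tightening the cut-off from 1e-3 to 1e-10 can only shrink the retained set, which is why the stricter threshold is useful for reporting a conservative list of transcripts.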

Single nucleotide polymorphism (SNP) detection

Reads from each individual were first mapped individually to the reference transcriptome in BOWTIE2 (Langmead & Salzberg, 2012) using default parameters. Following Berdan et al. (2015), PICARD was then used to mark and remove duplicated reads, and de-duplicated reads were then realigned to the reference using the GENOME ANALYSIS TOOLKIT (GATK) (van der Auwera et al., 2013; DePristo et al., 2011). SNPs were then called using the GATK module UnifiedGenotyper applying default settings. After this first step of SNP calling, several layers of filters were applied. At first, GATK used a threshold of 30 to filter all SNPs based on Phred-scaled quality as well as to filter out variants where more than 10% of the reads had a mapping quality of 0. Variants were further removed if: (i) quality scores, normalised on the amount of coverage, were <2; (ii) Phred-scaled p-values for the Fisher’s exact test were >60; and (iii) the root mean square for mapping quality across all genotypes was <40 (Berdan et al., 2015; Moshtaghi et al., 2017). We then used VCFTools (Danecek et al., 2011) to: (i) filter genotypes called below 50% across all individuals; (ii) filter SNPs that have a minor allele count <3; (iii) recode genotypes with less than 3 reads, apply a genotype call rate of 95% across all individuals and filter by mean depth of genotypes >20; and (iv) use a maximum mean depth of 77 to filter poor quality loci. It is known that hard filtering, as described here, can overlook rare alleles (Huang & Knowles, 2016). To account for this, we also applied the more conservative default settings in GATK for: (i) Phred-scaled quality score (30); (ii) quality score normalised by the amount of coverage (QD <2.0); (iii) Phred-scaled p-values for Fisher’s exact test (FS >60.0); (iv) haplotype score (13); (v) the root mean square of the mapping quality across all samples (MQ<40.0); (vi) depth of coverage (DP>10); (vii) u-based z-approximation from the Mann-Whitney rank-sum test for mapping qualities (MQRankSum<-12.5); and (viii) u-based z-approximation from the Mann-Whitney rank-sum test for read position (ReadPosRankSum<-8.0). Similarly, we used VCFTools default settings to: (i) filter all contigs that contain variants with heterozygosity >80%; (ii) filter all variants with a minor allele frequency <0.1; and (iii) filter all SNPs with genotypes missing for more than 15% of individuals (Berdan et al., 2015). Based on this alternative scenario, we detected a <5% increase in the number of SNPs and concluded that any benefit from an increased number of SNPs was unlikely to outweigh the probability of including false positives in the data set. As such, we have only presented data gathered from the hard filtering protocols described above.
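The first layer of hard filters described above can be summarised in a short Python sketch. It covers only the four per-variant thresholds stated in the text (quality, QD, FS and MQ); the mapping-quality-zero filter and the VCFTools steps are omitted. The dictionary keys mirror GATK annotation names, but the records and the function name are hypothetical and this is not the code used in the study.

```python
# Thresholds taken from the text
MIN_QUAL = 30.0  # Phred-scaled variant quality
MIN_QD = 2.0     # quality normalised by coverage
MAX_FS = 60.0    # Phred-scaled p-value of Fisher's exact test
MIN_MQ = 40.0    # root mean square mapping quality


def passes_hard_filters(variant):
    """Return True if a variant record survives all four hard filters."""
    return (
        variant["QUAL"] >= MIN_QUAL
        and variant["QD"] >= MIN_QD
        and variant["FS"] <= MAX_FS
        and variant["MQ"] >= MIN_MQ
    )


# Hypothetical variant records, each failing at most one filter
variants = [
    {"id": "snp_1", "QUAL": 85.0, "QD": 12.4, "FS": 3.1, "MQ": 58.0},   # passes
    {"id": "snp_2", "QUAL": 22.0, "QD": 9.0, "FS": 1.0, "MQ": 60.0},    # low QUAL
    {"id": "snp_3", "QUAL": 60.0, "QD": 1.5, "FS": 2.0, "MQ": 55.0},    # low QD
    {"id": "snp_4", "QUAL": 70.0, "QD": 8.0, "FS": 75.0, "MQ": 59.0},   # high FS
    {"id": "snp_5", "QUAL": 90.0, "QD": 10.0, "FS": 0.5, "MQ": 35.0},   # low MQ
]
retained = [v["id"] for v in variants if passes_hard_filters(v)]
```

Because every condition must hold, a variant is discarded as soon as any single annotation falls outside its threshold, which is what makes this style of filtering conservative.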

2.3 RESULTS

2.3.1 Illumina sequencing, de novo assembly and annotation

The Illumina NextSeq 500 platform yielded 206,461,330 high quality 35 bp paired end raw reads. De novo assembly of the raw reads resulted in 95,315 contigs (>200 bp) and of these 30,991 showed significant BLAST hits. Table 2.1 describes the features of the assembly and annotation statistics for the cDNA libraries of P. australiensis. The BUSCO transcriptome completeness shows 97.40% complete and fragmented mapping, suggesting a relatively high quality of the transcriptome data set. The top hit species distribution chart (Online Resource 1) shows the highest number of top hit matches with Daphnia pulex (Leydig, 1860), a micro-crustacean species. In total, 11,699 GO terms (across the three categories: biological process, molecular function and cellular component) were identified in 21,344 annotated transcripts.

Table 2.1 Assembly, annotation and transcriptome completeness statistics for P. australiensis

Parameters                           Results
Total number of Illumina reads       206,461,330
Total number of assembled contigs    95,315
Total assembled bases                57,701,345
Average contig length                605 bp
Median contig length                 355 bp
Contig range                         201-17,505 bp
N50                                  847
Transcriptome completeness           97.40% (BUSCO)
Number of contigs blasted            30,991
Number of contigs mapped             28,605
Number of contigs annotated          21,344
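For readers unfamiliar with the N50 statistic reported in Table 2.1, the following minimal Python sketch shows how it is conventionally calculated: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembled bases. The contig lengths below are hypothetical and the function is illustrative, not the routine used by the assembler.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Hypothetical contig lengths: total 32 bp, so half is 16 bp,
# which is reached after the two 8 bp contigs.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

An N50 well above the median contig length, as in Table 2.1, indicates that a relatively small number of long contigs account for much of the assembled sequence.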

2.3.2 Candidate gene identification and DGE analysis

A total of 50 candidate genes were identified in the P. australiensis data set that have a potential role in reproduction, temperature tolerance, osmoregulation and egg size control. Appendix Table A2.1 details the transcript abundance per population (in transcripts per million, TPM) for these 50 potential candidate genes. The top 20 expressed genes of each population (Table 2.2) were highly expressed within populations and differentially expressed between the populations. DGE analysis using a p-value of 1e-10 revealed 660 differentially expressed transcripts (Appendix A, Figure A2).
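The TPM unit used for these abundance values can be illustrated with a short Python sketch of its standard definition: read counts are divided by transcript length, and the resulting rates are rescaled so that they sum to one million. RSEM estimates abundances with a statistical model rather than from raw counts alone, so this is only a sketch of what the unit means; the counts and lengths are hypothetical.

```python
def tpm(counts, effective_lengths):
    """Convert per-transcript read counts to transcripts per million (TPM)."""
    rates = [count / length for count, length in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all rates are zero; TPM is undefined")
    return [rate / total * 1e6 for rate in rates]


# Hypothetical counts and effective lengths (bp) for three transcripts
values = tpm([100, 200, 300], [1000, 1000, 2000])
```

Because TPM values within a sample always sum to one million, they describe relative abundance within that sample, which should be kept in mind when comparing values between populations.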


Table 2.2 Top 20 transcripts/genes expressed in each of two lineages of P. australiensis

Kilcoy population

Transcript ID    Annotation                                         TPM
DN34189_c1_g1    Hypothetical protein                               57090.59
DN30264_c0_g1    trypsin-like serine ase 2                          13623.37
DN34848_c0_g1    chymotrypsin ase                                   11003.72
DN34311_c0_g1    myosin light chain                                 8956.26
DN34766_c1_g1    sarcoplasmic calcium-binding                       8625.76
DN34766_c0_g2    sarcoplasmic calcium-binding                       7892.42
DN28803_c0_g1    myosin light chain 2                               7791.84
DN34528_c2_g1    Arginine kinase                                    5270.22
DN15414_c0_g1    Uncharacterised                                    4728.89
DN34655_c0_g3    troponin I                                         4711.72
DN19257_c0_g1    cytochrome c oxidase subunit III                   4219.36
DN35289_c1_g1    Uncharacterised                                    3342.22
DN35479_c2_g1    hemocyanin                                         3283.89
DN9304_c0_g1     Uncharacterised                                    3115.54
DN35479_c9_g1    hemocyanin subunit                                 3024.57
DN9935_c0_g1     Proteoliaisin                                      2985.19
DN38574_c0_g1    Uncharacterised                                    2309.32
DN33615_c2_g1    ATP synthase subunit 6 (mitochondrion)             2075.02
DN39199_c0_g1    Uncharacterised                                    2002.05
DN27792_c0_g1    low-density lipo receptor-related 1B-like          1807.72

Stony population

Transcript ID    Annotation                                         TPM
DN34189_c1_g1    Hypothetical protein                               77522.73
DN34848_c0_g1    chymotrypsin ase                                   48048.1
DN30264_c0_g1    trypsin-like serine ase 2                          24790.8
DN34189_c0_g1    Uncharacterised                                    18328.13
DN34766_c1_g1    sarcoplasmic calcium-binding                       16314.29
DN35479_c2_g1    hemocyanin                                         11346.23
DN34311_c0_g1    myosin light chain                                 7730.48
DN34528_c2_g1    Arginine kinase                                    7480.59
DN15414_c0_g1    Uncharacterised                                    7390
DN34824_c9_g1    cytochrome c oxidase subunit II (mitochondrion)    7152.22
DN28803_c0_g1    myosin light chain 2                               6871.51
DN9935_c0_g1     Proteoliaisin                                      5361.5
DN19257_c0_g1    cytochrome c oxidase subunit III                   4777.45
DN34655_c0_g3    troponin I                                         4444.45
DN34864_c2_g1    ferritin                                           4339.87
DN35289_c1_g1    Uncharacterised                                    3736.3
DN35417_c4_g2    cytochrome b (mitochondrion)                       3733.7
DN35519_c1_g1    Uncharacterised                                    3459.76
DN27792_c0_g1    low-density lipo receptor-related 1B-like          3378.05
DN34824_c0_g1    cytochrome c oxidase subunit (mitochondrion)       3273.92


2.3.3 SNP detection

Transcriptome-wide SNP calling revealed 5973 raw SNPs and 2554 filtered SNPs across the two sampled populations. Table 2.3 details the number of raw and filtered SNPs for each population. A comparison of the SNPs between populations revealed that they shared 111 variable loci. While the number of filtered SNPs is quite high, no SNPs were identified in any of the candidate genes (Appendix A, Table A1) or the 20 highly expressed genes (Table 2.2). Of the 977 filtered Kilcoy SNPs, 74 were identified in the DGE analysis, and 5 were identified from the Stony population (see Appendix A, Table A2). Of these highly differentially expressed genes containing SNPs, several fall into the targeted categories; these include heat shock 70, mitochondrial ATP synthase gamma chain and hematopoietic prostaglandin D synthase. There are also multiple uncharacterised transcripts that are possibly novel genes important for multiple processes in P. australiensis. The SNPs identified in these genes are found in both the open reading frames and the untranslated regions of the genes.

Table 2.3 SNP table for the two populations of P. australiensis

                 Population
                 Kilcoy    Stony
Raw SNPs         2332      3641
Filtered SNPs    977       1577


2.4 DISCUSSION

At least nine distinct lineages exist across the natural range of P. australiensis in eastern Australia, and levels of divergence at neutral loci suggest that, on average, the lineages shared a common ancestor several million years ago. How each lineage has attained its current distribution, however, remains unclear; a hypothesis of multiple independent transitions from an amphidromous to a pure freshwater lifestyle continues to be speculative (Cook et al., 2006). The comparative transcriptomic data set generated here for two of the divergent lineages of P. australiensis (lineages 4 and 6, sensu Cook et al., 2006) provides an important first step to further understanding the divergence of this species at a functional level. While both of these lineages have overlapping distributions in numerous river systems, to our knowledge they do not actually co-exist in the same pool/stream (e.g. lineage 4 was restricted to the Kilcoy Creek subcatchments and lineage 6 was restricted to the Stony Creek subcatchments in the Brisbane River). Notwithstanding the possible independent freshwater transitions, it has been suggested that the level of neutral divergence seen in this case correlates with adaptive differences in their respective ecologies (Fawcett, Hurwood & Hughes, 2010), reflecting slight, but significant, differences in their respective local habitats (e.g. altitudinal temperature differences, conductivity etc.). In this study, the 50 candidate genes identified from BLAST hits (Appendix A) were targeted for their roles in reproduction and development, temperature tolerance, and osmoregulation; these genes are likely to play a role in adaptation to the local aquatic environments (Moshtaghi et al., 2016; Rahi et al., 2017). Osmoregulation specifically has been well studied in crustaceans, where key genes that play major roles in cell volume regulation, water channel regulation and body fluid maintenance (Havird et al., 2014; McNamara & Faria, 2012) have been identified.
Genes described in this study (Appendix A) that show a difference in expression pattern include Arginine kinase and sodium potassium ATPase alpha subunit. The functions of both genes include ATP binding, while Arginine kinase is also important for phosphorylation and salinity regulation. The functions of sodium potassium ATPase alpha subunit include ion binding, transport and exchange for osmoregulation.


It has also been previously recognised that temperature affects the time of breeding and release of larvae (Hancock & Bunn, 1997) in this species at the locations sampled in this study. The genes relating to temperature tolerance, such as 10 kDa chaperonin, heat shock 86 and heat shock cognate 71 kDa, showed far higher expression levels in lineage 4 (Kilcoy) compared to lineage 6 (Stony). The same is true for genes relating to reproduction; expression levels are higher in lineage 4 compared to lineage 6. There are a few exceptions, however: Ankyrin repeat-containing and non-receptor type 4-like, whose expression levels were far higher in lineage 6 than in lineage 4. Having few, large eggs, correlated with direct development, is a trait that has often been associated with freshwater crustaceans (Vogt, 2013). Egg size has been measured for both Kilcoy (lineage 4) and Stony Creek (lineage 6) populations, with lineage 4 having relatively larger and fewer eggs than lineage 6. Hancock (1998) suggested that this may relate to altitudinal differences associated with a steeper upstream gradient selecting for more direct development. However, this is the first time that the genomic basis for such egg size variation has been scrutinised in this species. The only gene found to have any difference in gene expression pattern was vitellogenin, whose functions involve nutrient reserve and lipid transport activity, as well as oogenesis. While Mothers against DPP and Mothers against DPP 3 are important for immune system development, ovarian follicle cell development and heart looping, these genes showed little to no difference in gene expression pattern. Of the 50 genes identified a priori within multiple GO categories, few showed a difference in gene expression based on DGE analysis and no SNPs were found in any of these genes after filtering.
This is likely because differential expression is affected by both the environment and genetics, as well as by the position of the SNP (e.g. in a regulatory region). To further understand this in P. australiensis, future studies may be able to use quantitative genetics to identify whether differential expression at particular points of the genome is genetically determined (Hill, 2010; Moore & Hu, 2015). Of the differentially expressed genes that contained SNPs (see Appendix A), several were involved in the targeted gene categories. Most though, based on GO terms, were involved with the cellular components. Many of these genes were also uncharacterised, yet they had stark differences in gene expression. The top 20 expressed genes for each population were similar; however, their relative expression pattern differed, consistent with local adaptation to differing environments. The high number (2,554) of SNP differences identified among lineages in this study is consistent with the differences seen in the neutral markers used by Cook et al. (2006) and Hurwood et al. (2003) that show 6% divergence between these two lineages. In this case, however, it is apparent that divergence is not entirely due to neutral evolution (i.e. random mutations sorted by genetic drift) but is rather a possible consequence of multiple independent transitions from amphidromy to a freshwater lifestyle (random mutations sorted by natural selection), as initially suggested by Cook et al. (2006). This is further supported by the fact that the DGE analysis examines the functional genome, where certain SNPs (whether in the ORF or UTR) have been selectively fixed due to adaptation to their respective local environments. This conclusion is supported by multiple studies that have recognised that differences in gene expression pattern play a meaningful role in population-specific adaptation and differentiation (King & Wilson, 1975; Leder et al., 2015). Future directions for the study of this species include studying the other lineages described by Cook et al. (2006) and identifying the genes found in this study in these other lineages to compare their expression patterns. It has been stated on several occasions (Cook et al., 2006; Gan et al., 2015; Hurwood et al., 2003; Page et al., 2005; von Rintelen et al., 2012) that P. australiensis may indeed be a complex of multiple cryptic species. Therefore, identifying fixed SNP differences and differential gene expression patterns in these lineages will aid in determining if these divergent lineages represent a single species or multiple cryptic species.
This study, comparing two lineages found in the same river system of southeast Queensland, is the first step towards answering these questions, which are currently being pursued.


Chapter 3: The identification and expression of speciation genes in a Paratya australiensis hybrid zone

*This chapter has been prepared as a manuscript for submission for review. As such there is some repetition in both the Introduction and Methods from Chapter 1 and Chapter 2.


3.1 INTRODUCTION

Darwin (1859) first recognised speciation as the ongoing accrual of differences between populations in small steps. However, this view leaves a ‘speciation continuum’ of populations that vary in degrees of differentiation (Nosil et al., 2017), the very issue that brings forth the plethora of speciation concepts. Different speciation concepts emerged because of how various disciplines within biology view the evolutionary process. These different concepts result from the confusion that still surrounds the term ‘species’. Therefore, to help understand species and speciation, it is useful to define what is meant when using these words. The definition of speciation followed here is that of Dion-Côté & Barbash (2017), which closely follows the biological species concept (Mayr, 1995): the process by which populations diverge and become reproductively isolated from one another.

Understanding and developing methods to investigate speciation and the speciation process has long been a difficult area in evolutionary biology. Traditional species identification methods, such as morphological identification, are often challenging, especially when species complexes and cryptic species are involved. Morphological delimitation techniques are particularly problematic in many species, including decapods, due to the variability in many diagnostic characters as a result of phenotypic plasticity (de Carvalho et al., 2013). Molecular techniques are a way to overcome the challenges faced using a purely morphological approach, and have been used successfully in a wide range of crustacean species (e.g. Meusel & Schwentner, 2017; Schön et al., 2017; Miranda et al., 2018; Van Der Wal et al., 2019). While many of the molecular studies that aimed to delineate crustacean species have used mitochondrial DNA (mtDNA), they have all relied on measures of divergence as an indirect indicator of reproductive isolation (RI). The mtDNA method of identifying species (also known as DNA barcoding (Hebert et al., 2003)) has been useful in a range of taxa including birds, fish, insects, plants and crustaceans. DNA barcoding has been undoubtedly successful in the discovery of cryptic species that have previously been overlooked by traditional taxonomic methods (Dasmahapatra et al., 2010). However, there are still some shortcomings when applying DNA barcoding for species discovery and specimen identification. Collins & Cruickshank (2013) describe “seven deadly sins” of DNA barcoding: i) a failure to test clear hypotheses; ii) an insufficient a priori identification of specimens; iii) the use of the term ‘species identification’; iv) an inappropriate use of neighbour-joining trees; v) an inappropriate use of bootstrap resampling; vi) the use of fixed distance thresholds; and vii) incorrectly interpreting the barcoding gap. Although still somewhat problematic, DNA barcoding started to bridge the gap between morphological identification methods and genomic identification methods. Now, with modern molecular techniques and technology, we can use next-generation sequencing (NGS) to identify gene regions that are directly associated with, or influence, speciation - i.e. ‘speciation genes’. A functional genomic (transcriptomics) approach is a useful tool for speciation gene identification as not only does it reduce the portion of the genome that must be examined, it also provides the expression levels for all contigs along with gene functions, when identified. As defined by Nosil and Schluter (2011), a speciation gene is any gene whose divergence has had a significant effect on the evolution of RI. While in most animals these speciation genes are involved in postzygotic isolation by generating unfit hybrids, prezygotic isolation or barriers can also have a great effect in impeding species from interbreeding. Interestingly, among the speciation genes identified to date, it is uncommon to see diverging populations carrying differentially fixed alleles of genes involved in postzygotic RI, as continued divergence and evolution seem to occur mainly through soft sweeps from standing genetic variation, with RI often under polygenic influence (Dion-Côté & Barbash, 2017). The term ‘speciation genes’ is not a new concept and can be found in the literature as early as the late 1970s (Fry & Salser, 1977).
However, much of the research has predominantly been performed on model species, with a particular focus on Drosophila (Barbash, Roote & Ashburner, 2000; Ting, Tsaur & Wu, 2000; Presgraves et al., 2003; Brideau et al., 2006; Phadnis & Orr, 2009; Shanwu & Presgraves, 2009). Although this may seem disadvantageous for non-model organisms, specifically crustaceans, which have no currently identified speciation genes, it gives a starting point to look at homologues of these genes in different organisms and to investigate what role they may be playing in different taxa. An interesting system in which to identify potential speciation genes in crustaceans is the endemic Australian atyid shrimp, Paratya australiensis. This is for a number of reasons, the first being its broad geographic distribution. Paratya australiensis is found in high densities in headwater streams and lowland river channels along the eastern Australian mainland (including the Murray-Darling system) as well as Tasmanian drainages. Population genetic studies on this species have discovered nine equally divergent clades (Cook et al., 2006) across the mainland, with numerous clades having vast as well as overlapping distributions (see Figure 3.1). While P. australiensis is currently classified as a single species, a previous translocation experiment has shown that two lineages found in the Brisbane River, Queensland (highlighted river in Figure 3.1B), show non-random mating with highly variable fitness of F1 hybrids (Hughes et al., 2003). Many studies have commented that P. australiensis is likely a complex of cryptic species (Cook et al., 2006; Wilson, Schmidt & Hughes, 2016). From these observations, it could be expected that their global genetic diversity would be low while genes involved in the speciation process would show higher levels of differentiation (Nosil & Schluter, 2011; Nosil & Feder 2012; Weber et al., 2017).

This study first aimed to identify, from the literature, genes proposed to be involved in the speciation process and to use a transcriptomic approach to identify them in P. australiensis. A secondary aim was to identify genes based on their gene ontology and, along with the genes from the literature search, to examine how selection is acting upon them. Identifying previously known speciation genes was an important aspect of this study, as there are currently no known speciation genes in decapod crustaceans. It was hypothesised that if a signature of positive selection was found to be acting upon any of the identified genes, that gene could be involved in reproductive isolation. The final aim was to examine the expression patterns of the pure lineages (Kilcoy Creek, lineage 4 and Stony Creek, lineage 6) and see how they compared to individuals from the known hybrid zone (Branch Creek).

Figure 3.1 (A) Map of eastern Australia presenting the P. australiensis lineages found at each sampled river. (B) Neighbour-joining gene tree (mtDNA COI & nDNA 28S) for Paratya australiensis (from Cook et al., 2006).

3.2 METHODS

3.2.1 Sample collection

P. australiensis samples were collected in December 2016 from Kilcoy Creek (26°45’21.2”S 152°32’46.9”E) (lineage 4 sensu Cook et al., 2006), Stony Creek (26°50’52.7”S 152°42’40.4”E) (lineage 6 sensu Cook et al., 2006) and Branch Creek (26°51’56.8”S 152°41’57.2”E) (a known hybrid zone, Wilson et al., 2016) (Figure 3.2). A seine net was used to collect individuals, which were preserved in situ in liquid N2. Samples were then returned and stored at -80°C in the CARF Genomics research facility at the Queensland University of Technology. Total gDNA was extracted from the sampled individuals to confirm species and lineage identification using a fragment of mtDNA COI (sensu Hurwood et al., 2003), amplified with the Folmer et al. (1994) universal primers LCO1490 5’-GGTCAACAAATCATAAAGATATTGG-3’ and HCO2198 5’-TAAACTTCAGGGTGACCAAAAAATCA-3’. All individuals were successfully identified as P. australiensis.

Figure 3.2. Map detailing the sampling locations of Kilcoy Creek (lineage 4), Stony Creek (lineage 6) and Branch Creek (hybrid zone).

3.2.2 RNA extraction, cDNA library preparation and Illumina sequencing

Three individuals from each of lineages 4 and 6 and nine individuals from the hybrid zone were used for Illumina sequencing. Total RNA was first extracted from 10 individuals from both lineages 4 and 6 and 15 individuals from a pool in the hybrid zone using a TRIzol/chloroform extraction method (Chomczynski & Mackey, 1995). All samples were then purified using an Isolate II Mini Kit, according to the manufacturer’s instructions (Bioline, Alexandria, Australia), and preserved at -80°C until required for cDNA synthesis. Total RNA quality and yield were checked using a NanoDrop 2000 Spectrophotometer (Thermo Scientific), 2% agarose gel electrophoresis and a Bioanalyzer with an RNA nano-chip (Agilent 2100, version 6). Based on these assessments, the three highest-quality samples from each lineage and the nine highest-quality samples from the hybrid zone were used in the subsequent steps. cDNA synthesis was performed using the Illumina NeoPrep™, and all libraries were then sequenced on an Illumina NextSeq™ 500 Platform for 35 bp paired-end sequencing (Illumina, San Diego, USA) at the CARF Genomics lab, Queensland University of Technology (QUT).

3.2.3 Quality filtering and de novo assembly

FastQC (Andrews, 2010) was first used to check sequence data quality. Illumina paired-end reads were then quality trimmed with Trimmomatic, the QC software incorporated in Trinity, using default settings to retain only high-quality reads (Q >20, N <1%). All quality-filtered cDNA libraries were then pooled for de novo assembly in Trinity to generate a reference transcriptome. Sequence clustering (97% or higher sequence similarity) was performed using CD-HIT to remove redundant and chimeric sequences (Fu et al., 2012). Assembly completeness was assessed with BUSCO (Simão et al., 2015), which maps the assembly against genes that are highly conserved in eukaryotes.
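
The retention thresholds quoted above (Q >20, N <1%) can be illustrated with a simple per-read check. The following Python sketch is a simplification and not a re-implementation of Trimmomatic: it assumes the thresholds apply to the mean Phred score of a read and to the fraction of ambiguous (N) bases, and that qualities use Phred+33 encoding. The function names are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Return the mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    scores = [ord(char) - offset for char in quality_string]
    return sum(scores) / len(scores)


def passes_filter(sequence, quality_string, min_quality=20.0, max_n_fraction=0.01):
    """Keep a read when mean quality > min_quality and fraction of N bases < max_n_fraction."""
    if not sequence or len(sequence) != len(quality_string):
        return False  # empty or malformed record
    n_fraction = sequence.upper().count("N") / len(sequence)
    return mean_phred(quality_string) > min_quality and n_fraction < max_n_fraction
```

In practice, trimming tools usually work on sliding windows of bases rather than whole-read means, so this check is only a conceptual guide to what the thresholds exclude.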

3.2.4 Bioinformatic analysis

Annotation

To generate an annotated assembly, we used the Trinotate pipeline (Bryant et al., 2017) with default settings. First, the longest open reading frames (ORFs) were identified using TransDecoder. These were then used as input for searching the UniProt protein database with Blastx and Blastp, together with Pfam searches, to provide the necessary files for the Trinotate annotation report.

Identification of candidate speciation genes

Candidate speciation genes were identified through an in-depth search of the literature. The genes identified have previously been found to play a role in forms of RI including pre- and post-mating isolation, sexual isolation, courtship behaviour and zygote inviability. These genes were then searched for in the annotation report from Trinotate. Once a gene was found in the annotation report, its sequence identity had to be confirmed; to do this, sequences were blasted against the NCBI (National Center for Biotechnology Information) non-redundant database. If sequences were matched, they were then submitted to the online program ORF Finder to identify ORFs. Genes that may be involved in RI were also identified from the annotation report based on their gene ontology (GO). The GO terms selected for further investigation were: reproduction, fertilisation, regulation of fertilisation, gamete recognition, mating behaviour, courtship behaviour, binding of sperm to pellucida, negative regulation of binding of sperm to pellucida, sperm entry and egg activation. All 10 of these GO terms are found under the ‘biological process’ category. The same process was used to confirm sequence identity and identify the ORFs for these transcripts.
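
The GO-based search described above can be sketched as a filter over a tab-delimited annotation report. This Python sketch is illustrative only. It assumes a report with a transcript identifier column and a GO column in which entries are separated by a backtick and the identifier, category and term name are separated by '^'; the column names `transcript_id` and `gene_ontology`, the example report and the placeholder GO identifiers are hypothetical. Term names are written as they appear in the text, and the official GO names may differ in spelling.

```python
import csv
import io

# GO term names as listed in the text; official GO names may differ in spelling.
TARGET_TERMS = {
    "reproduction",
    "fertilisation",
    "regulation of fertilisation",
    "gamete recognition",
    "mating behaviour",
    "courtship behaviour",
    "binding of sperm to pellucida",
    "negative regulation of binding of sperm to pellucida",
    "sperm entry",
    "egg activation",
}


def parse_go_field(field):
    """Split one GO field into (go_id, category, name) tuples.

    Assumed layout: entries separated by a backtick, fields by '^'.
    """
    entries = []
    for entry in field.split("`"):
        parts = entry.split("^")
        if len(parts) == 3:
            entries.append(tuple(part.strip() for part in parts))
    return entries


def find_candidates(handle, id_column="transcript_id", go_column="gene_ontology"):
    """Map transcript IDs to the target biological-process terms they carry."""
    candidates = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        matched = sorted(
            name.lower()
            for _, category, name in parse_go_field(row.get(go_column) or "")
            if category == "biological_process" and name.lower() in TARGET_TERMS
        )
        if matched:
            candidates[row[id_column]] = matched
    return candidates


# Hypothetical three-row report; the GO identifiers are placeholders.
EXAMPLE_REPORT = (
    "transcript_id\tgene_ontology\n"
    "contig_1\tGO:xxxxxxx^biological_process^courtship behaviour"
    "`GO:xxxxxxx^cellular_component^nucleus\n"
    "contig_2\tGO:xxxxxxx^molecular_function^ATP binding\n"
    "contig_3\t.\n"
)
example_hits = find_candidates(io.StringIO(EXAMPLE_REPORT))
```

Matching on exact term names, rather than on substrings, avoids a broad term such as reproduction also capturing every more specific term that contains the same word.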

Signatures of selection on identified speciation genes

To test for selection on the speciation genes identified from the literature and those identified from the transcriptome based on gene ontology, a local BLAST database was created using BLASTN version 2.6.0+ (Zhang et al., 2000) to identify these genes in the lineage 4, lineage 6 and Branch Creek populations. Genes with significant hits were then aligned by eye using BioEdit (Hall, 1999). In the Phylogenetic Analysis by Maximum Likelihood (PAML) v4.9 software package (Yang, 2007), the Yang and Nielsen (2000) counting method was used to estimate the ratio of the number of nonsynonymous substitutions per nonsynonymous site (dN) to the number of synonymous substitutions per synonymous site (dS), the dN/dS ratio. A ratio of <1 is indicative of purifying selection and a ratio of >1 indicates positive selection, while a ratio of 1 is consistent with neutral evolution (Dunning et al., 2016). The mean dN/dS values were then calculated for each of the lineage 4-lineage 6, lineage 4-Branch and lineage 6-Branch pairwise comparisons. The fraction of nonsynonymous substitutions (fN) was also calculated using the equation $f_N = d_N/(d_N + d_S)$, where fN >0.5 indicates positive selection and fN <0.5 indicates purifying selection (Xie et al., 2011). This calculation measures the rate of mutation rather than the absolute number of mutations. This is particularly useful as the absolute value (dN/dS) can be large and uninformative if dS is zero or close to zero (Xie et al., 2011).
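
As an illustration of the two measures described above, the following Python sketch computes dN/dS and fN from a pair of dN and dS estimates and labels each against its neutral expectation (1 for dN/dS, 0.5 for fN). It is a minimal sketch and not part of the PAML workflow; the function names are hypothetical, and treating a ratio with a zero denominator as undefined is an assumption of this example.

```python
def selection_metrics(dn, ds):
    """Return (dN/dS, fN) for one pairwise comparison, with fN = dN / (dN + dS).

    A value is None when its denominator is zero and the ratio is undefined.
    """
    ratio = dn / ds if ds > 0 else None
    total = dn + ds
    f_n = dn / total if total > 0 else None
    return ratio, f_n


def classify(value, neutral_point):
    """Label a metric relative to its neutral expectation (1 for dN/dS, 0.5 for fN)."""
    if value is None:
        return "undefined"
    if value > neutral_point:
        return "positive"
    if value < neutral_point:
        return "purifying"
    return "neutral"


# Takeout, lineage 4 v lineage 6 (dN and dS taken from Table 3.4).
takeout_ratio, takeout_fn = selection_metrics(0.0054, 0.0052)
takeout_calls = (classify(takeout_ratio, 1.0), classify(takeout_fn, 0.5))
```

The sketch also shows why fN is useful when dS is zero: dN/dS is undefined in that case, whereas fN remains bounded between 0 and 1.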

Differential gene expression and gene ontology enrichment

To visualise the differences in expression patterns among the three populations, differential gene expression (DGE) analysis was performed by first estimating transcript abundance using RSEM with default settings (Li & Dewey, 2011). Abundance data were then analysed with the edgeR Bioconductor package, using a false discovery rate (FDR) cut-off of 0.001 and a log fold change of 2. This identified a large number of differentially expressed features. A flag was therefore included to extract only the top differentially expressed features within each pairwise comparison (max_DE_genes_per_comparison 50), which produced a much more manageable number of differentially expressed transcripts. The heatmap output of DE genes versus samples clusters transcripts according to their differential expression across the 15 samples. GO enrichment analysis was performed to find GO terms that are enriched or depleted in the pairwise differential expression results. To do this, GO term categories were first extracted from the Trinotate annotation report using a Perl script within Trinotate. The GOSeq package was then used to obtain the enrichment results via the differential expression Perl script in Trinity. This yielded enriched and depleted gene sets at FDR 0.01.
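
The filtering logic described above (an FDR cut-off, a minimum log fold change and a cap on the number of features per pairwise comparison) can be sketched in Python as follows. This is an illustrative sketch and not the Trinity or edgeR implementation: the record keys `id`, `logFC` and `FDR` are hypothetical, and ranking features by FDR and then by absolute fold change is an assumption about how the top features are chosen.

```python
def top_de_features(results, fdr_cutoff=0.001, min_abs_log_fc=2.0, max_per_comparison=50):
    """Filter each pairwise comparison and pool the retained feature IDs.

    results maps a comparison name to a list of dicts with the (hypothetical)
    keys 'id', 'logFC' and 'FDR'.
    """
    retained = {}
    for comparison, rows in results.items():
        passing = [
            row for row in rows
            if row["FDR"] <= fdr_cutoff and abs(row["logFC"]) >= min_abs_log_fc
        ]
        # Assumed ranking: smallest FDR first, then largest absolute fold change.
        passing.sort(key=lambda row: (row["FDR"], -abs(row["logFC"])))
        retained[comparison] = [row["id"] for row in passing[:max_per_comparison]]
    pooled = set().union(*retained.values()) if retained else set()
    return retained, pooled
```

Under this logic, three pairwise comparisons capped at 50 features each yield at most 150 unique features, and fewer whenever the same feature ranks among the top 50 in more than one comparison.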

3.3 RESULTS

3.3.1 Illumina sequencing, de novo assembly and annotation

High-throughput sequencing yielded approximately 518 million high-quality 35 bp paired-end raw reads. From these, Trinity assembled a total of 130,145 transcripts (> 200 bp). Table 3.1 describes the features and statistics of the assembly and annotation. Of these transcripts, approximately 46,000 and 28,000 resulted in significant BlastX and BlastP hits, respectively, with an overlap of approximately 27,000 transcripts. BUSCO transcriptome completeness showed 95.7% mapping completeness and 4.0% fragmented mapping, indicative of a high-quality data set. The only Branch Creek individual to have the Stony Creek type (lineage 6) mtDNA was identified as Branch 6; all other Branch Creek individuals had the Kilcoy type (lineage 4) mtDNA.

Table 3.1 Statistics for Illumina sequencing, de novo assembly and annotation

| Sequencing parameters | Results |
| --- | --- |
| Total number of Illumina reads | 517,835,637 |
| Total assembled bases | 83,588,789 |
| Total Trinity genes | 130,049 |
| Total Trinity transcripts | 130,145 |
| Median contig length | 373 |
| Mean contig length | 642.27 |
| Contig N50 value | 935 |
| Contig range | 201-18,909 |
| Transcriptome completeness (BUSCO) | 95.7% |
| BlastX hits | 46,010 |
| BlastP hits | 28,209 |
| Pfam hits | 24,998 |
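
The contig statistics in Table 3.1 follow standard definitions: the mean contig length is the total assembled bases divided by the number of transcripts (83,588,789 / 130,145 ≈ 642.27), and the N50 is the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembled bases. The Python sketch below shows these calculations for an arbitrary list of contig lengths; the function names are hypothetical and this is not the script used to produce the table.

```python
from statistics import mean, median


def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths supplied")
    half_total = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_total:
            return length


def assembly_summary(lengths):
    """Summarise an assembly from its contig lengths."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
        "range": (min(lengths), max(lengths)),
    }
```

An N50 (935) well above the median contig length (373) is expected for a transcriptome assembly, because the many short contigs contribute little to the total number of assembled bases.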

3.3.2 Candidate gene identification and signatures of selection

From the literature, 18 candidate genes were identified as being involved in the process of speciation in a range of species and at a range of stages in the reproductive cycle, both pre- and post-mating. Table 3.2 details the name and role of each gene as well as the species in which it was found to play that role. Of these 18 genes, five were identified in the transcriptome data: adenylate cyclase (calmodulin-responsive adenylate cyclase), arylsulfatase (arylsulfatase B), heat shock 70 kDa (heat shock 70 kDa protein), Tcp1 (subunits gamma, theta, alpha and beta) and Tpi (triosephosphate isomerase A) (see Table 3.3). Estimates of synonymous versus nonsynonymous substitutions in the identified genes showed that all genes had synonymous substitutions, but not all genes had nonsynonymous substitutions. Where it was possible to obtain the dN/dS ratio, the genes showed signatures of purifying selection (dN/dS <1) (Table 3.3). The signature of purifying selection is also seen in the rate of mutation (fN) for all genes across the three pairwise comparisons.

Table 3.2 Genes identified from the literature to play a role in the process of speciation

| Gene name | Role | Species | Reference |
| --- | --- | --- | --- |
| Adenylate cyclase | Courtship behaviour | Omocestus viridulus & Chorthippus biguttulus | Heinrich et al., 2001 |
| Arylsulfatase | Sperm-zona pellucida binding | Pig | Carmona et al., 2002 |
| Bindin | Gametic compatibility | Echinometra oblonga | Palumbi & Metz, 1991 |
| Ca2+-dependent protein kinase (CDPK) | Inviability | Helianthus paradoxus | Lexer, Lai & Rieseberg, 2004 |
| Ectodysplasin (Eda) | Hybrid inviability | Gasterosteus aculeatus | Colosimo et al., 2005 |
| Glycosyl hydrolase | Penetration of zona pellucida, male courtship behaviour | Leucophaea maderae | Cornette et al., 2003 |
| Heat shock 70 kDa | Sperm-zona pellucida binding, zona pellucida receptor complex | Mus sp. | Calvert et al., 2003 |
| Hybrid male rescue (Hmr) | Post mating isolation | Drosophila sp. | Barbash, Roote & Ashburner, 2000 |
| Lethal hybrid rescue (Lhr) | Post mating isolation | Drosophila sp. | Brideau et al., 2006 |
| Nucleoporin96 (Nup96) | Post mating isolation | Drosophila sp. | Presgraves et al., 2003 |
| Nucleoporin160 (Nup160) | Post mating isolation | Drosophila sp. | Shanwu & Presgraves, 2009 |
| Odysseus (OdsH) | Post mating isolation | Drosophila sp. | Ting, Tsaur & Wu, 2000 |
| Overdrive | Hybrid sterility between species | Drosophila sp. | Phadnis & Orr, 2009 |
| Pentatricopeptide repeat (PPR) | Sexual isolation | Mimulus sp. | Sweigart, Fishman & Willis, 2006 |
| PR domain containing 9 (Prdm9) | Reproductive isolation | Mus sp. | Mihola et al., 2009 |
| tcP-1 CPN60 Chaperonin | Acrosomal reaction, sperm-zona pellucida binding | Apis mellifera | Barchuck et al., 2007 |
| Triosephosphate isomerase (TPi) | Sexual isolation | Ostrinia nubilalis | Dopman et al., 2005 |
| ZP3-binding protein | Sperm-zona pellucida binding | Mus sp. | Bleil & Wassarman, 1990 |

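Table 3.3 Genes identified from Table 3.2 and their pairwise selection results, where dN = nonsynonymous substitution and dS = synonymous substitution

| Population pair | Gene name | dN | dS | dN/dS | fN |
| --- | --- | --- | --- | --- | --- |
| Lineage 4 v Lineage 6 | Ca(2+)/calmodulin-responsive adenylate cyclase | 0 | 0.0201 | 0 | 0 |
| Lineage 4 v Lineage 6 | Arylsulfatase B | 0.0067 | 0.0323 | 0.208 | 0.1718 |
| Lineage 4 v Lineage 6 | Heat shock 70 kDa protein | 0.0059 | 0.0229 | 0.2577 | 0.2049 |
| Lineage 4 v Lineage 6 | T-complex protein 1 subunit gamma | 0 | 0.122 | 0 | 0 |
| Lineage 4 v Lineage 6 | T-complex protein 1 subunit theta | 0.0016 | 0.0099 | 0.1632 | 0.1391 |
| Lineage 4 v Lineage 6 | T-complex protein 1 subunit alpha | 0 | 0.0141 | 0 | 0 |
| Lineage 4 v Lineage 6 | T-complex protein 1 subunit beta | 0 | 0.0104 | 0 | 0 |
| Lineage 4 v Lineage 6 | Triosephosphate isomerase A | 0.0018 | 0.0053 | 0.3427 | 0.2535 |
| Lineage 4 v Branch Creek | Heat shock 70 kDa protein | 0 | 0.0204 | 0 | 0 |
| Lineage 4 v Branch Creek | T-complex protein 1 subunit gamma | 0 | 0.0114 | 0 | 0 |
| Lineage 4 v Branch Creek | T-complex protein 1 subunit theta | 0.0012 | 0.0041 | 0.3802 | 0.2237 |
| Lineage 4 v Branch Creek | T-complex protein 1 subunit alpha | 0 | 0.0073 | 0 | 0 |
| Lineage 4 v Branch Creek | T-complex protein 1 subunit beta | 0.0002 | 0.0041 | 0.0422 | 0.0512 |
| Lineage 4 v Branch Creek | Triosephosphate isomerase A | 0 | 0.0029 | 0 | 0 |
| Lineage 6 v Branch Creek | Heat shock 70 kDa protein | 0.005 | 0.0208 | 0.2486 | 0.194 |
| Lineage 6 v Branch Creek | T-complex protein 1 subunit gamma | 0 | 0.0064 | 0 | 0 |
| Lineage 6 v Branch Creek | T-complex protein 1 subunit theta | 0.0005 | 0.0073 | 0.1412 | 0.059 |
| Lineage 6 v Branch Creek | T-complex protein 1 subunit alpha | 0 | 0.0099 | 0 | 0 |
| Lineage 6 v Branch Creek | T-complex protein 1 subunit beta | 0.0002 | 0.0113 | 0.028 | 0.0193 |
| Lineage 6 v Branch Creek | Triosephosphate isomerase A | 0.0018 | 0.0024 | 0.1523 | 0.4332 |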

Across the chosen GO terms, 22 genes were identified that were annotated with at least one of the 10 GO terms (Table 3.4). However, genes found under the GO term ‘egg activation’ did not have enough sequence overlap to test for selection. For the estimates of synonymous versus nonsynonymous substitutions in the lineage 4-lineage 6 pairwise comparison, 11 of the 22 genes had synonymous substitutions only and four had only nonsynonymous substitutions. The gene takeout was the only gene that appeared to be indicative of positive selection (dN/dS >1 and fN >0.5) in the lineage 4-lineage 6 comparison; in the lineage 6-Branch comparison, fN was suggestive of positive selection while the dN/dS ratio was indicative of purifying selection. All other genes in all three pairwise comparisons showed signatures of purifying selection (Table 3.4).

Table 3.4 Genes identified based on gene ontology and their pairwise selection results, where dN = nonsynonymous substitution and dS = synonymous substitution

| Population pair | Gene name | GO term/s | dN | dS | dN/dS | fN |
| --- | --- | --- | --- | --- | --- | --- |
| Lineage 4 v Lineage 6 | Beta-1,4-galactosyltransferase 1 | Binding of sperm to pellucida | 0.0026 | 0.0185 | 0.1405 | 0.1232 |
| Lineage 4 v Lineage 6 | Zonadhesin | Binding of sperm to pellucida | 0.0048 | 0.0067 | 0.7164 | 0.4174 |
| Lineage 4 v Lineage 6 | Zonadhesin | Binding of sperm to pellucida | 0.011 | 0.0496 | 0.2217 | 0.1815 |
| Lineage 4 v Lineage 6 | Calcium/calmodulin-dependent protein kinase type II alpha chain | Courtship behaviour | 0 | 0.0047 | 0 | 0 |
| Lineage 4 v Lineage 6 | Doublesex- and mab-3-related transcription factor 1 | Courtship behaviour | 0 | 0.0169 | 0 | 0 |
| Lineage 4 v Lineage 6 | Potassium voltage-gated channel protein Shaker | Courtship behaviour | 0 | 0.0075 | 0 | 0 |
| Lineage 4 v Lineage 6 | Protein spinster | Courtship behaviour | 0.0041 | 0.0118 | 0.3474 | 0.2579 |
| Lineage 4 v Lineage 6 | Protein spinster | Courtship behaviour | 0.0019 | 0.0136 | 0.1397 | 0.1226 |
| Lineage 4 v Lineage 6 | cAMP-specific 3’,5’-cyclic phosphodiesterase | Courtship behaviour, mating behaviour, reproduction | 0 | 0.012 | 0 | 0 |
| Lineage 4 v Lineage 6 | Bcl-2-like protein 1 | Fertilisation | 0.0031 | 0 | 0 | 0 |
| Lineage 4 v Lineage 6 | Histone-lysine N-methyltransferase EHMT2 | Fertilisation | 0 | 0.0106 | 0 | 0 |
| Lineage 4 v Lineage 6 | Tubulin polyglutamylase TTLL5 | Fertilisation | 0 | 0.0149 | 0 | 0 |
| Lineage 4 v Lineage 6 | Astacin | Fertilisation, negative regulation of binding sperm to pellucida | 0 | 0.014 | 0 | 0 |
| Lineage 4 v Lineage 6 | Astacin | Fertilisation, negative regulation of binding sperm to pellucida | 0 | 0.0245 | 0 | 0 |
| Lineage 4 v Lineage 6 | Ubiquitin-conjugating enzyme E2 Q1 | Fertilisation, mating behaviour | 0 | 0.003 | 0 | 0 |
| Lineage 4 v Lineage 6 | DNA helicase MCM9 | Gamete recognition | 0.0075 | 0.0197 | 0.3807 | 0.2757 |
| Lineage 4 v Lineage 6 | E3 ubiquitin-protein ligase FANCL | Gamete recognition | 0.0106 | 0 | 0 | 0 |
| Lineage 4 v Lineage 6 | Importin subunit alpha-3 | Gamete recognition, reproduction | 0.0064 | 0 | 0 | 0 |
| Lineage 4 v Lineage 6 | Takeout | Mating behaviour | 0.0054 | 0.0052 | 1.0384 | 0.5094 |
| Lineage 4 v Lineage 6 | Integrator complex subunit 13 (ECO:0000250, UniProtKB:Q9NVM9) | Regulation of fertilisation | 0.0021 | 0 | 0 | 0 |
| Lineage 4 v Lineage 6 | Integrator complex subunit 13 (ECO:0000312, MGI:MGI:1918427) | Regulation of fertilisation | 0 | 0.0066 | 0 | 0 |
| Lineage 4 v Lineage 6 | Ubiquitin-protein ligase E3A | Sperm entry | 0 | 0.0100 | 0 | 0 |
| Lineage 4 v Branch Creek | Beta-1,4-galactosyltransferase 1 | Binding of sperm to pellucida | 0.0014 | 0.0064 | 0.1741 | 0.1749 |
| Lineage 4 v Branch Creek | Zonadhesin | Binding of sperm to pellucida | 0.0032 | 0.0060 | 0.6080 | 0.3489 |
| Lineage 4 v Branch Creek | Zonadhesin | Binding of sperm to pellucida | 0.0056 | 0.0225 | 0.1246 | 0.1993 |
| Lineage 4 v Branch Creek | Calcium/calmodulin-dependent protein kinase type II alpha chain | Courtship behaviour | 0 | 0.0079 | 0 | 0 |
| Lineage 4 v Branch Creek | Doublesex- and mab-3-related transcription factor 1 | Courtship behaviour | 0 | 0.0142 | 0 | 0 |
| Lineage 4 v Branch Creek | Protein spinster | Courtship behaviour | 0 | 0.0065 | 0 | 0 |
| Lineage 4 v Branch Creek | Protein spinster | Courtship behaviour | 0.0016 | 0.0041 | 0.1714 | 0.2753 |
| Lineage 4 v Branch Creek | cAMP-specific 3’,5’-cyclic phosphodiesterase | Courtship behaviour, mating behaviour, reproduction | 0 | 0.0033 | 0 | 0 |
| Lineage 4 v Branch Creek | Bcl-2-like protein 1 | Fertilisation | 0.0011 | 0 | 0 | 1 |
| Lineage 4 v Branch Creek | Astacin | Fertilisation, negative regulation of binding sperm to pellucida | 0.0006 | 0.0041 | 0.0565 | 0.1326 |
| Lineage 4 v Branch Creek | Astacin | Fertilisation, negative regulation of binding sperm to pellucida | 0 | 0.0082 | 0 | 0 |
| Lineage 4 v Branch Creek | Ubiquitin-conjugating enzyme E2 Q1 | Fertilisation, mating behaviour | 0 | 0.0020 | 0 | 0 |
| Lineage 4 v Branch Creek | Takeout | Mating behaviour | 0.0010 | 0.0012 | 0.1024 | 0.4524 |
| Lineage 4 v Branch Creek | Ubiquitin-protein ligase E3A | Sperm entry | 0 | 0.0038 | 0 | 0 |
| Lineage 6 v Branch Creek | Beta-1,4-galactosyltransferase 1 | Binding of sperm to pellucida | 0.0019 | 0.0132 | 0.3309 | 0.1261 |
| Lineage 6 v Branch Creek | Zonadhesin | Binding of sperm to pellucida | 0.0037 | 0.0057 | 0.4549 | 0.3926 |
| Lineage 6 v Branch Creek | Zonadhesin | Binding of sperm to pellucida | 0.0079 | 0.0315 | 0.2497 | 0.1997 |
| Lineage 6 v Branch Creek | Calcium/calmodulin-dependent protein kinase type II alpha chain | Courtship behaviour | 0 | 0.0090 | 0 | 0 |
| Lineage 6 v Branch Creek | Doublesex- and mab-3-related transcription factor 1 | Courtship behaviour | 0 | 0.0070 | 0 | 0 |
| Lineage 6 v Branch Creek | Protein spinster | Courtship behaviour | 0 | 0.0036 | 0 | 0 |
| Lineage 6 v Branch Creek | Protein spinster | Courtship behaviour | 0.0028 | 0.0088 | 0.2777 | 0.2430 |
| Lineage 6 v Branch Creek | cAMP-specific 3’,5’-cyclic phosphodiesterase | Courtship behaviour, mating behaviour, reproduction | 0 | 0.0090 | 0 | 0 |
| Lineage 6 v Branch Creek | Bcl-2-like protein 1 | Fertilisation | 0.0021 | 0 | 0 | 1 |
| Lineage 6 v Branch Creek | Astacin | Fertilisation, negative regulation of binding sperm to pellucida | 0.0006 | 0.0130 | 0.1003 | 0.0466 |
| Lineage 6 v Branch Creek | Astacin | Fertilisation, negative regulation of binding sperm to pellucida | 0 | 0.0163 | 0 | 0 |
| Lineage 6 v Branch Creek | Ubiquitin-conjugating enzyme E2 Q1 | Fertilisation, mating behaviour | 0 | 0.0030 | 0 | 0 |
| Lineage 6 v Branch Creek | Takeout | Mating behaviour | 0.0048 | 0.0035 | 0.8205 | 0.5793 |
| Lineage 6 v Branch Creek | Ubiquitin-protein ligase E3A | Sperm entry | 0 | 0.0034 | 0 | 0 |

3.3.3 Differential gene expression and GO enrichment analysis

In total, 13,359 transcripts were differentially expressed between the lineage 4, lineage 6 and Branch Creek populations. Adding the flag max_DE_genes_per_comparison reduced this number to 146 differentially expressed transcripts. The heatmap in Figure 3.3 highlights the stark differences between the lineage 4 and lineage 6 populations as well as the areas where the Branch Creek individuals are similar to the other populations. Of the 146 transcripts, 71 were unannotated and generally clustered in groups of two or more throughout the heatmap. The remaining 75 transcripts were annotated and many were found to be isoforms or subunits of genes (see Appendix B, Table B1 for the list of contigs and their annotation). Functional GO enrichment analysis showed that 53, 910 and 95 GO terms were significantly enriched (at P<0.01) between Branch-lineage 4, Branch-lineage 6 and lineage 4-lineage 6, respectively. Interestingly, a single GO term, GO:0032993 (protein-DNA complex, part of the cellular component category), was enriched in all three pairwise comparisons.
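
Identifying GO terms that are enriched in every pairwise comparison amounts to intersecting the sets of enriched terms. The Python sketch below illustrates the operation; the function name and the input layout (a mapping from comparison name to a list of GO identifiers) are hypothetical, and it is not the script used in this study.

```python
def shared_terms(enriched):
    """Return the GO terms enriched in every pairwise comparison.

    enriched maps a comparison name to an iterable of GO identifiers.
    """
    term_sets = [set(terms) for terms in enriched.values()]
    return set.intersection(*term_sets) if term_sets else set()
```

Applied to the three lists of enriched terms from the Branch-lineage 4, Branch-lineage 6 and lineage 4-lineage 6 comparisons, a function of this kind would return the terms common to all three.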

Figure 3.3 Expression pattern of the 50 most differentially expressed transcripts within each pairwise comparison at an FDR cut-off of 1e-3 with a log fold change of 2. Transcripts coloured yellow are upregulated (highly expressed) and transcripts coloured purple are downregulated (lowly expressed).

3.4 DISCUSSION

This research represents the first time speciation genes have been searched for in a decapod crustacean. Using a transcriptomic approach has allowed us to explore the functional genome of Paratya australiensis to help understand whether this currently recognised single species is actually a series of cryptic species or a species complex (as referred to by Cook et al. (2006) and Wilson et al. (2016)), or is in the advanced stages of becoming multiple ‘species’. This investigation started by looking at “speciation genes”, or genes previously identified as being associated with RI. The RI process can be split into three categories: 1) pre-mating, e.g. courtship behaviour; 2) during mating, e.g. sperm binding; and 3) post-mating, e.g. hybrid inviability of any kind. The pre-identified speciation genes from the literature fell into all three categories; however, only genes relating to ‘pre’ and ‘during’ could be identified in the transcriptome. Previously identified homology between invertebrates and mammals (~70% of the genome) (Prachumwat & Li, 2008) would suggest that some of the pre-identified genes would be found in the transcriptome of P. australiensis; however, it was not hypothesised that any of those genes would act in the same way as in the organism in which they were first identified. It was important to include these genes, even though they have generally been identified in model organisms and laboratory-reared populations, as at the time of writing they are the only genes identified as speciation genes. Currently, speciation studies focusing on crustaceans have generally used mitochondrial genes to delimit species (e.g. Miranda et al., 2019). The selection results indicated that the genes identified in our transcriptomes that returned a dN/dS ratio are possibly being subjected to purifying selection. The genes identified from the GO terms had a combination of synonymous substitutions only and nonsynonymous substitutions only.
However, genes identified based on GO terms again showed a general pattern towards purifying selection. These genes are known from other organisms to be involved in the reproductive process, and thus it is not unusual for them to show extremely strong signatures of purifying selection. Any divergence from the population mean for any component of the mate recognition or zygote viability system would reduce the fitness of individuals with respect to their ability to pass these genes to the next generation, and hence would be subject to strong purifying selection. One example of extreme purifying selection can be seen in sea urchins, where point substitutions in the sperm binding protein bindin lead to gamete incompatibility as the egg surface receptors are no longer able to recognise the sperm (Metz & Palumbi, 1996). Although sea urchins are broadcast spawners with external fertilisation, bindin demonstrates how sensitive fertilisation systems can be, as the rise of a single mutation can lead to gametic incompatibility. Generally, most genes across all pairwise comparisons, in both the pre-identified genes and the genes based on GO terms, had synonymous substitutions. This divergence through synonymous substitutions could be due to SNPs (single nucleotide polymorphisms) being found in the regulatory region due to the plasticity of P. australiensis.

Of all the genes investigated, a single gene, takeout, showed a signature of positive selection in the Kilcoy-Stony comparison based on both the dN/dS ratio and the rate of mutation (fN), and in the Stony-Branch comparison based on the fN value only. The takeout gene is part of a larger family of secreted factors that bind small lipophiles and has also been identified as a circadian-regulated gene (Dauwalder et al., 2002). The takeout gene has been shown to have male-specific expression in Drosophila, and mutations in the gene result in a reduction in male courtship behaviour (Dauwalder et al., 2002). As this gene is indicating positive selection between the two pure lineages, it could be a gene to investigate further in P. australiensis.

The DGE analysis (Figure 3.3) illustrates the immense differences between the two pure lineages (lineages 4 and 6) in terms of gene expression. Although these two lineages are considered a single species, their expression profiles contrast strongly with one another. The Branch Creek individuals are the most interesting in terms of their expression profiles. It was expected that there would be some introgression and that, after 25 generations, the expression pattern would in general regress back to the original lineage 6 type. This expectation of a return to the resident type expression is based on environmental factors and on lineage 6 already being well adapted to the Branch Creek environment. What is seen in the heatmap (Figure 3.3), however, is expression leaning towards a more introgressed pattern. It is therefore believed that, while the introduced type (lineage 4) is known to prefer a higher altitude with lower temperatures (Hughes et al., 2003), these shrimp are able to adapt quickly to the local environment that surrounds them, and that hybridisation between the two lineages is still ongoing. It is important to note here that as the introduced lineage moves downstream it is coming into contact with pure lineage 6 individuals, while lineage 4 is unlikely to be pure; it will be some form of cross, backcross or hybrid.

Looking specifically at Branch Creek individual 6 and the area of higher expression values (bright yellow), the genes here are NADH ubiquinone oxidoreductase (the largest enzyme of the mitochondrial respiratory chain), hemocyanin chains (used in oxygen transportation throughout the body) and cytochrome b (a mitochondrial protein used for electron transport). All these genes in conjunction with one another could be associated with oxygen intake, as the lower altitude of Branch Creek is linked to warmer water temperature and therefore less dissolved oxygen compared to the higher altitude and cooler water of Kilcoy Creek, which has more dissolved oxygen. Consequently, the respiratory systems of the Kilcoy Creek population do not have to work as hard. This difference in water temperature may also have a greater impact on the uneven mating seen between the two lineages. The translocated females could be becoming sexually active earlier than the resident type females, and either the resident males are dominant in their own environment or the water temperature is too high for the translocated males, hence the uneven mating displayed when the translocation first occurred. If this is the case, temperature and altitude may be what is separating these lineages rather than pre- or post-mating isolation genes.

Knowing specifically that cytochrome b and NADH are some of the genes that are highly differentially expressed could indicate that there is some type of cytonuclear interaction (e.g. discordance, disequilibrium) occurring. We know from population genetics theory (Barton & Jones, 1983) that cytoplasmically inherited genes (i.e. mitochondrial genes) are more likely to cross the species boundary compared to their counterparts (i.e. autosomal genes), as mitochondrial genes are unlinked to the nuclear genes that are responsible for interspecific incompatibility (Di Candia & Routman, 2007). The Dobzhansky-Muller (DM) model proposes that hybrid sterility, inviability or incompatibilities (low fitness) can be caused by the interactions among heterospecific alleles (Bateson, 1909; Dobzhansky, 1937; Muller, 1942; Brideau et al., 2006; Barr & Fishman, 2010). The generality of the model is its key strength, as the alleles (minimum of two) that interact and cause low hybrid fitness have functionally diverged in their native genetic backgrounds (Barr & Fishman, 2010). While no general pattern has been identified among the genes involved in the DM model, there have been some instances (Burke & Arnold, 2001; Burton & Barreto, 2012; Greiner et al., 2011) where hybrid breakdown has been associated with mitochondrial function (Burton, Pereira & Barreto, 2013). Therefore, the mitochondrial-nuclear gene interaction between cytochrome b, NADH and the hemocyanin chains could be a candidate for the DM model in P. australiensis, as it is known from previous research that the hybrid offspring (lineage 4 female and lineage 6 male) have a lower fitness than the resident type (Fawcett et al., 2010).

The differences seen between the populations in the DGE analysis could also be due to environmental factors. Previous studies have shown that clutch size varies according to the environment in which P. australiensis is found (Hancock, Hughes & Bunn, 1998). One generation after the translocation event, the translocated females (Kilcoy, lineage 4) reflected the resident type (Stony, lineage 6) clutch size (larger clutch size), suggesting this trait is influenced by the environment (Hancock, Hughes & Bunn, 1998), while maintaining a larger egg size (genetically determined). Furthermore, DGE differences may be due to individuals potentially being at different stages of the reproductive cycle. For example, hemocyanin, which was identified as being differentially expressed, has been shown to be upregulated during pre- and intermoult in the blue swimmer crab (Portunus pelagicus) (Kuballa & Elizur, 2008). The moulting cycle is an essential process that all crustaceans go through, not only for growth but also for reproduction and metamorphosis. The differential expression of this specific gene in P. australiensis could therefore be due to the different stages of moulting individuals were in.
Finally, looking at the GO enrichment, the Branch-Stony comparison has the highest number of enriched GO terms, while the Branch-Kilcoy comparison has the lowest number and is therefore the most dissimilar. The higher the number of enriched terms, the more similar the two populations are to one another. This is particularly interesting as we know, based on COI, that only one of the Branch Creek individuals has the Stony mitochondria and the remaining five have the Kilcoy mtDNA, and in general their expression looks closer to the Kilcoy type. However, based on GO enrichment they are still more similar to the original Stony type and least similar to the Kilcoy type.

Overall, the research here is a step towards identifying genes that are playing a role in speciation in P. australiensis and possibly more widely in decapod crustaceans. While we understand that pre-identified homologous genes from other taxa will not necessarily play the same functional role in a crustacean species, we found it important to look for these genes regardless, to give us a starting point for identifying particular genes in the future. Looking first at GO terms associated with reproduction also gave us a platform to start looking for specific genes for this species. Although we only found a single gene that indicated positive selection, future studies can now target this gene to see how it is being expressed in the Branch Creek population through behavioural tank experiments. Further work needs to be done to identify other genes that could be playing a role, and to do this we need to look further than just reproductive genes, as we know that the evidence of RI provided by any single given gene varies greatly.

Chapter 4: Using SNPs to understand levels of hybridisation in the Australian glass shrimp (Paratya australiensis)

*This chapter has been written to easily convert into a manuscript for publication.

4.1 INTRODUCTION

Hybridisation is a common occurrence in natural populations and these regions of species mixing have been termed hybrid zones. Here, hybridisation is defined as the mating of two genetically distinguishable populations (Todesco et al., 2016). Hybridisation can lead to a variety of evolutionary outcomes, both positive and negative. A hybrid zone can maintain or increase diversity in multiple ways: through genetic rescue of small inbred populations, origin and transfer of adaptations, reinforcement of reproductive isolation (RI) and the formation of hybrid lineages (Anderson 1949; Ellstrand & Schierenbeck 2000; Mallet 2007; Abbott et al. 2013; Frankham 2015; Todesco et al., 2016). However, hybridisation can also decrease diversity through the breakdown of reproductive barriers, the merging of previously distinctive lineages and the extinction of populations or species (Rieseberg et al. 1989; Ellstrand 1992; Levin et al. 1996; Rhymer & Simberloff 1996; Allendorf et al. 2001; Buerkle et al. 2003; Vuillaume et al. 2015). Extinction of a population or species is the most extreme negative outcome of hybridisation and can be facilitated through two mechanisms, demographic swamping or genetic swamping. As described by Wolf et al. (2001), demographic swamping occurs when outbreeding depression is severe and the population growth rate of one or both parental lineages declines below the replacement rate due to wasted reproductive effort, while genetic swamping occurs when outbreeding depression is less severe, population growth rates exceed replacement rates and one or both parental lineages are replaced by the hybrids. A hybrid zone can be seen as a place to view how mutations that have accrued between species are tested (Payseur, 2010). These mutations fit into three categories: (i) mutations that are equally fit in both the hybrids and pure species (i.e. they flow freely across the hybrid zone); (ii) mutations that thrive on hybrid backgrounds (i.e. move easily and quickly across secondary contact); and (iii) variants that interrupt reproduction and/or development in hybrids or cause conspecific mating choice (i.e. positive assortative mating), thus inhibiting gene flow between species (i.e. post- and prezygotic isolation) (Payseur, 2010).

Secondary contact between species generally arises in a natural setting; however, in the case of Paratya australiensis, a hybrid zone was formed after approximately 10,000 individuals were translocated between two streams in south east Queensland, Kilcoy Creek and Branch Creek, in 1993 to measure instream dispersal (Hancock, 1995). Each population had been identified as having fixed differences at multiple allozyme loci (Hughes et al., 1995). It was not until much later that these two populations were discovered to represent different mitochondrial lineages (Hughes et al., 2003; Hurwood et al., 2003). Further work has since been performed to investigate hybridisation between the resident and translocated lineages in Branch Creek (Fawcett et al., 2010). By 2001, the translocated and hybrid type shrimp had almost totally displaced the resident type to the lower pools of Branch Creek (Fawcett, 2010). A study by Wilson et al. (2013) revealed that the hybrid zone had moved a further 510m downstream since 2002. They attributed this to higher than average rainfall in 2010, as Branch Creek would have been flowing for most of the year and not isolating any of the lower altitude pools.

The aims of this study were to use a genome-wide single nucleotide polymorphism (SNP) approach to identify patterns of hybridisation in the nuclear genome and to investigate the effect of 25 years of introgression in the hybrid zone. Due to the fitness differences observed between the Kilcoy (introduced) and Branch (resident) lineages while in sympatry, it was expected that much of the nuclear genome would revert to the resident type. Therefore, it was hypothesised that the closer a Branch Creek individual was to an F1 hybrid cross, the more equidistant it would be to the pure lineages, and the more backcrossed an individual was, the more likely it would be to resemble the resident SNPs. This was anticipated due to the expectation that environmental conditions would be more suitable for the resident lineage compared to the introduced lineage.

4.2 METHODS

For this study, the reference transcriptome from Chapter 3 was used. It was assembled from the nine Branch Creek individuals (hybrid zone), three Kilcoy Creek individuals (lineage 4) and three Stony Creek individuals (lineage 6).

4.2.1 Variant calling

Variants were called differently to Chapter 2, as a more streamlined approach had been released in the Trinity package. First, the reference transcriptome was used to create SuperTranscripts, which are useful as they provide a genome-like reference for de novo transcriptome assemblies (Davidson, Hawkins & Oshlack, 2017). The outputs generated from the SuperTranscripts were then used to call variants in Trinity with the python script run_variant_calling.py. This script automatically runs the Genome Analysis Toolkit (GATK) pipeline for variant calling. The script was customised each time it was run to use each individual's paired-end raw read sequences. The first step in the GATK pipeline was to index and create a dictionary of the SuperTranscript using Samtools and Picard, respectively. The paired-end raw reads were then aligned to the SuperTranscript file using STAR v2.7.1a. Variants were then called in GATK using the Haplotype Caller with default settings. The final step in the pipeline is basic variant filtering based on recommended values, which filters on: (i) clusters of at least 3 SNPs within a 35bp window; (ii) Fisher Strand value (FS > 30.0); and (iii) Quality by Depth value (QD < 2.0). The individual outputs from the pipeline then underwent further hard filtering using GATK to filter out variants based on the following: (i) Fisher Strand value (FS > 60.0); (ii) HaplotypeScore (> 13.0); (iii) root mean square for mapping quality across all genotypes (MQ < 40.0); (iv) filter out variants where more than 10% of the reads had a mapping quality of 0 (DP > 10); (v) Rank Sum Test for mapping qualities of reference versus alternate read (MQRankSum < -12.5); and (vi) Rank Sum Test for relative positioning of reference versus alternate alleles within reads (ReadPosRankSum < -8.0). All outputs from this step of filtering were then merged into a single vcf file using the GATK CombineVariants command. The final filtering step used VCFTools to first include only sites with a Minor Allele Frequency greater than or equal to 0.05 (--maf 0.05) and then to include only sites with mean depth values greater than or equal to 20 (--min-meanDP 20). This final output was then used for further steps of analysis.
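As an illustration of the final filtering step only, the two site-level thresholds can be sketched in Python. This is a minimal sketch and not the VCFTools implementation: the function names, the genotype coding (alternate-allele counts of 0, 1 or 2 per diploid individual) and the example values are assumptions introduced here.

```python
from statistics import mean

MIN_MAF = 0.05      # threshold quoted in the text (--maf 0.05)
MIN_MEAN_DP = 20    # threshold quoted in the text (--min-meanDP 20)


def minor_allele_frequency(genotypes):
    """Minor allele frequency of one biallelic site.

    `genotypes` holds alternate-allele counts (0, 1 or 2) for each diploid
    individual; None marks a missing call and is skipped (an assumption).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def keep_site(genotypes, depths):
    """Keep a site only if MAF >= 0.05 and mean read depth >= 20."""
    return (minor_allele_frequency(genotypes) >= MIN_MAF
            and mean(depths) >= MIN_MEAN_DP)


# Hypothetical sites for four individuals (illustrative values only).
print(keep_site([0, 0, 1, 2], [25, 30, 18, 40]))   # variable and well covered
print(keep_site([0, 0, 0, 0], [50, 50, 50, 50]))   # monomorphic site
print(keep_site([0, 1, 1, 2], [10, 12, 25, 20]))   # low mean depth
```

Both conditions are applied to each site across all individuals, which is why these filters were run on the merged vcf file rather than on the per-individual outputs.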

4.2.2 SNP analysis

The SNPRelate package in R was used to perform identity by state (IBS) analysis on the final output from variant calling. IBS is generally used to describe segments that are not identical by descent, and therefore it does not assume the sharing of a recent common ancestor. The command snpgdsIBS, which calculates the fraction of identity by state for each pair of samples, was used to create a matrix and also for the multidimensional scaling analysis (distance scatter plot). A clustering tree was created using the work flow seen in Figure 4.1. A principal component analysis was also performed in the SNPRelate package using the command snpgdsPCA and was plotted using the package ggplots to create the scatter plot. The PCA, along with the phylogenies built based on SNPs, can be used as an indicator of heterozygosity in the Branch Creek population that was sampled and, from this, the current state of hybridisation can be inferred.
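To make the pairwise IBS fraction concrete, the following Python sketch computes an average identity by state for two samples. It assumes the common definition in which a locus sharing two, one or zero alleles scores 1.0, 0.5 or 0.0, averaged over loci genotyped in both samples. It is an illustration under that assumption, not the SNPRelate code, and the function names and sample genotypes are hypothetical.

```python
def ibs_fraction(sample_a, sample_b):
    """Average identity by state between two diploid samples.

    Genotypes are alternate-allele counts (0, 1 or 2); None marks a
    missing call. Loci missing in either sample are skipped.
    """
    shared = 0.0
    compared = 0
    for g_a, g_b in zip(sample_a, sample_b):
        if g_a is None or g_b is None:
            continue
        # |difference| of 0, 1 or 2 maps to a score of 1.0, 0.5 or 0.0
        shared += 1.0 - abs(g_a - g_b) / 2.0
        compared += 1
    if compared == 0:
        raise ValueError("no loci genotyped in both samples")
    return shared / compared


def ibs_matrix(samples):
    """Pairwise IBS fractions for a dict of {sample name: genotype list}."""
    names = list(samples)
    return {a: {b: ibs_fraction(samples[a], samples[b]) for b in names}
            for a in names}


# Hypothetical genotypes at five SNPs (illustrative values only).
samples = {
    "BC1": [0, 1, 2, 1, 0],
    "KC1": [0, 0, 2, 2, 0],
    "SC1": [2, 2, 0, 0, 1],
}
for name, row in ibs_matrix(samples).items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

One minus this fraction gives a pairwise distance of the kind that a multidimensional scaling or hierarchical clustering step can take as input.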

Figure 4.1 Workflow used in the R package SNPRelate to create a clustering tree

Finally, in the VCFTools program, the relatedness2 output statistic was calculated; this statistic (Φ) uses the Manichaikul et al. (2010) method to determine relatedness. Φ can be interpreted as the probability of identifying identical alleles when randomly sampling one allele from each heterozygous individual. As described by Manichaikul et al. (2010), the kinship coefficient, Φ, has expected ranges for duplicate samples or monozygotic twins (>0.354), 1st degree relatives (0.177-0.354), 2nd degree relatives (0.0884-0.177), 3rd degree relatives (0.0442-0.0884) and for unrelated samples (<0.0442). A dendrogram was then created following the script relatedness.R (https://github.com/davemcg/R_play/blob/master/relatedness.R) in R.
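The expected ranges quoted above can be written as a small lookup, which may help when reading the pairwise values. This is a sketch only: the function name is hypothetical, and the treatment of values that fall exactly on a boundary is an assumption, since the text gives ranges without stating which side is inclusive.

```python
def classify_kinship(phi):
    """Map a kinship coefficient to the relationship classes quoted from
    Manichaikul et al. (2010). Boundary handling is an assumption."""
    if phi > 0.354:
        return "duplicate or monozygotic twin"
    if phi > 0.177:
        return "1st degree"
    if phi > 0.0884:
        return "2nd degree"
    if phi > 0.0442:
        return "3rd degree"
    return "unrelated"


# Hypothetical coefficients (illustrative values only).
for phi in (0.40, 0.25, 0.10, 0.05, 0.0):
    print(phi, classify_kinship(phi))
```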

4.3 RESULTS

4.3.1 Variant calling

Transcriptome-wide variant calling identified 850,884 raw SNPs across all individuals. After the first round of filtering, 49,317 SNPs remained and after the final filtering round, 35,704 SNPs remained across all individuals. These 35,704 SNPs were located across 6,307 unique transcripts. Figure 4.2 displays the most common GO terms from each category: cellular component, molecular function and biological process, with 2,994, 3,023 and 2,769 terms, respectively. Generally, across the Branch Creek individuals, the observed heterozygosity of SNP loci was well above the expected proportion, and approximately double it in individuals 2, 3, 5, 7 and 8 (with the exception of individuals 1 and 9, whose observed and expected values were similar), and therefore their observed homozygosity was low (Table 4.1). The Kilcoy population also had higher than expected heterozygosity of SNP loci, but only by <10%. The Stony Creek individuals, however, had higher than expected homozygosity of SNP loci, but also by <10%.

Table 4.1 Homozygosity and heterozygosity for each sampled individual. O(HOM) = observed homozygosity, E(HOM) = expected homozygosity, O(HET) = observed heterozygosity and E(HET) = expected heterozygosity.

Individual  N_SITES         F  O(HOM)   E(HOM)  Prop.O(HOM)  Prop.E(HOM)  O(HET)  E(HET)  Prop.O(HET)  Prop.E(HET)
Branch 1      18871   -0.0798   10259  10895.4         0.54         0.58    8612  7975.6         0.46         0.42
Branch 2      21489  -1.02699    3432  12580.7         0.16         0.59   18057  8908.3         0.84         0.41
Branch 3      19371  -0.85008    4540  11354.6         0.23         0.59   14831  8016.4         0.77         0.41
Branch 4      16897  -0.50809    6200   9803.9         0.37         0.58   10697  7093.1         0.63         0.42
Branch 5      24599   -1.2926    1724  14621.2         0.07         0.59   22875  9977.8         0.93         0.41
Branch 6      19873  -0.26757    9226  11473.5         0.46         0.58   10647  8399.5         0.54         0.42
Branch 7      20082  -0.88824    4527  11844.2         0.23         0.59   15555  8237.8         0.77         0.41
Branch 8      21676  -1.08733    3178  12813.9         0.15         0.59   18498  8862.1         0.85         0.41
Branch 9      14892  -0.16043    7829   8805.4         0.53         0.59    7063  6086.6         0.47         0.41
Kilcoy 1      14382   -0.1144    7749   8429.9         0.54         0.59    6633  5952.1         0.46         0.41
Kilcoy 2      14946  -0.13412    7825   8667.1         0.52         0.58    7121  6278.9         0.48         0.42
Kilcoy 3      14672  -0.09949    7926   8536.4         0.54         0.58    6746  6135.6         0.46         0.42
Stony 1       17690   0.10865   11111  10309.1         0.63         0.58    6579  7380.9         0.37         0.42
Stony 2       17382   0.14423   11208  10167.4         0.64         0.58    6174  7214.6         0.36         0.42
Stony 3       17408   0.10213   10915  10176.4         0.63         0.58    6493  7231.6         0.37         0.42

F = inbreeding coefficient; Prop. = proportion of N_SITES.
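The F values in Table 4.1 are numerically consistent with the method-of-moments estimator F = (O(HOM) - E(HOM)) / (N_SITES - E(HOM)). The short Python sketch below recomputes F and the observed heterozygosity proportion from three rows of the table. It assumes this estimator was the one used, which the text does not state, and the function names are hypothetical.

```python
def inbreeding_coefficient(n_sites, observed_hom, expected_hom):
    """Method-of-moments F: excess homozygosity relative to expectation.

    Negative values indicate more heterozygous sites than expected,
    positive values indicate fewer.
    """
    return (observed_hom - expected_hom) / (n_sites - expected_hom)


def observed_het_proportion(n_sites, observed_hom):
    """Proportion of sites at which the individual is heterozygous."""
    return (n_sites - observed_hom) / n_sites


# Rows taken from Table 4.1: (N_SITES, O(HOM), E(HOM)).
rows = {
    "Branch 1": (18871, 10259, 10895.4),
    "Branch 2": (21489, 3432, 12580.7),
    "Stony 1": (17690, 11111, 10309.1),
}
for name, (n_sites, o_hom, e_hom) in rows.items():
    f_value = inbreeding_coefficient(n_sites, o_hom, e_hom)
    het = observed_het_proportion(n_sites, o_hom)
    print(f"{name}: F = {f_value:.5f}, observed het = {het:.2f}")
```

Read this way, the strongly negative F values in most Branch Creek individuals reflect a large excess of heterozygous sites.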

Figure 4.2 Log10 WEGO plot of most frequent GO terms in the 6,307 SNP transcripts.

4.3.2 SNP analysis

The matrix created from the IBS analysis (Figure 4.3) provides an insight into how similar the SNPs are, not necessarily through direct descent. The closer to 1.00 a pairwise comparison is, the more similar the two individuals are to one another. Generally, the Branch Creek individuals are more similar to each other than they are to the pure lineages, 4 (Kilcoy Creek) and 6 (Stony Creek). The lineage 6 individuals appear to be the most dissimilar to the other samples; however, taking into account the scaling, they still have approximately 75% similarity. This possibly relates to the Stony population being large and outbred, where sampled individuals are less likely to be close relatives. By comparison, the lineage 4 individuals are between 85-90% similar. These individuals were sampled from a smaller headwater stream, which would facilitate a higher rate of inbreeding.

Figure 4.3 Matrix of genome-wide average identity by state pairwise identities, where green indicates more different and pink more similar. BC = Branch Creek, KC = Kilcoy Creek, SC = Stony Creek.

The multidimensional scaling analysis scatter plot (Figure 4.4) clearly plots lineage 4 clustered together and lineage 6 clustered together while the Branch Creek individuals are dispersed throughout the plot, suggesting differing levels of hybridisation. The distribution of the Branch Creek individuals in Figure 4.4 in terms of hybridisation can be interpreted as the more equidistant from both parental lineages an individual is, the more likely it is to be an early generation hybrid (i.e. F1 hybrid).

Figure 4.4 Multidimensional scaling analysis on identity by state distance. Red = Branch Creek hybrid zone, green = Kilcoy Creek, blue = Stony Creek.

The clustering tree (Figure 4.5), built on IBS fractional values, much like Figure 4.4, groups individuals of the pure lineages, 4 and 6, together in their own separate clades. Knowing that the Branch 6 individual has the lineage 6 mtDNA, it is likely that this individual is an original Branch Creek type. By comparison, the Branch 9 individual is more likely to have backcrossed only with lineage 4, given how strongly it sits within the Kilcoy individuals. The Branch Creek clade can be interpreted as showing varying levels of hybridisation, with individuals 2, 3, 5, 7 and 8 being closer to early generation hybrids.

Figure 4.5 Clustering dendrogram based on hierarchical clustering of IBS fractional values.

The principal component analysis (Figure 4.6) also shows varying levels of hybridisation occurring within the Branch Creek individuals. Once again, the lineage 4 and 6 individuals cluster tightly within themselves but far away from each other as populations. Eigenvectors one and two hold close to 50% of the variation among the populations (see Appendix C, Figure C1 for a plot of the first 4 eigenvectors and their percentage of variation).

Figure 4.6 Principal component analysis plot on SNP genotypes with eigenvectors one and two. BC = Branch Creek; KC = Kilcoy Creek; SC = Stony Creek.

The relatedness tree (Figure 4.7), built on the Manichaikul et al. (2010) statistic Φ, is almost identical to the IBS clustering tree (Figure 4.5). For each pairwise comparison (see Appendix C, Table C1), Φ appeared to indicate only first-degree relatives (i.e. Φ was in the range of 0.177-0.354); however, it is known that each population was caught in a different creek and that these creeks come to a confluence where P. australiensis is not found. From this it can be inferred that it is not possible for individuals from different populations to be first-degree relatives.

Figure 4.7 Dendrogram built on the Φ statistic from Manichaikul et al. (2010) where the further from 0, the less related individuals are.

4.4 DISCUSSION

Subsequent to the translocation of approximately 10,000 P. australiensis individuals from Kilcoy Creek to Branch Creek in 1993 (Hancock, 1995), it was discovered that non-random mating (Hughes et al., 1995) was occurring between what is now known after further studies (Hurwood et al., 2003; Baker et al., 2004; Cook et al., 2006) as two distinct and widespread lineages, 4 and 6 (sensu Cook et al. 2006). Given the periodic focus and debate on varying concepts of species and underlying processes of speciation reviewed earlier, there is potentially much to be learned from investigating the fate of the lineages and their respective genomes some 20 generations on. This study used genome-wide SNPs, rather than mtDNA or other nuclear markers, as a tool to examine the nature of hybridisation that is occurring at the current zone of introgression of the Kilcoy genome (lineage 4) in Branch Creek (lineage 6). The identity by state analysis on the SNPs (Figure 4.3) revealed that, generally, individuals of the Branch Creek population had a relatively high degree of similarity, but they appear to fall into two distinct groups. This is interesting given the variable distribution of the Branch Creek individuals in the MDS and PCA plots (Figures 4.4 and 4.6), and given that the Φ analysis clearly shows the Branch Creek individuals are variable and intermediate to the other two populations (Figure 4.7). This higher degree of similarity seen within the Branch Creek hybrids could be due to a range of reasons including, but not limited to, the relatively short amount of time that introgressed individuals dispersing downstream have spent in the sampled pool, and thus less time for recombination to break up co-inherited allelic patterns. As there has been less time for recombination, allele linkage disequilibrium is likely to be high (Kim et al., 2007). By contrast, because the pure lineages have been in contact within their respective populations for a significantly longer period of time, recombination has had more time to act and in turn allele linkage disequilibrium is low. An additional explanation for the high similarity seen in the Branch Creek individuals could be that only a small number of related individuals are producing offspring in the sampled pool. Previous studies (Fawcett et al., 2010) have shown the lower fitness of the hybrid offspring, which often do not reach maturity to reproduce the next season; this in turn creates a smaller reproducing population.

As it is known that all individuals sequenced from Branch Creek have the introduced type (lineage 4) mitochondrial DNA, with the exception of individual 6, the higher levels of observed heterozygosity seen in the filtered SNPs suggest that these individuals are indeed hybrids to varying degrees. This is further supported by the clustering tree built on the IBS proportion. On this tree (Figure 4.5) it can be inferred that Branch Creek 6 (resident type lineage 6 mtDNA) is likely a descendant of a local lineage 6 female and either a Branch Creek male or a male that was heavily backcrossed with lineage 4; the heatmap in Figure 3.3 suggests that this individual’s greater similarity in expression patterns to that seen in lineage 4 may thus be due to a similar genetic background. The Branch Creek 9 individual, by contrast, is likely to be an introduced Kilcoy type that has had little to no crossing with any resident type shrimp, given its tight clustering within the Kilcoy Creek samples. Although the Branch 4 individual still clusters within the Kilcoy population, it is more plausible that this shrimp is the result of backcrossing with both the resident Branch Creek type and the Kilcoy type. The remaining Branch Creek individuals are prospective earlier generation hybrids due to their clustering together and their place in the phylogeny. The PCA further supports this, as the two pure lineage populations each group tightly but are separated from one another across both eigenvectors, while the Branch Creek individuals are spread throughout the plot. It was hypothesised that the closer an individual is to the point equidistant between the Stony and Kilcoy clusters, the earlier generation the hybrid would be, as it would share half its SNPs with the lineage 4 type and the other half with lineage 6. The relatedness tree is congruent with the clustering tree, placing the Branch individuals 1 and 6 within the Stony clade and Branch 4 and 9 within the Kilcoy clade. Currently, there have been no verified reports of P. australiensis lineages found in sympatry (within the same sampling pool) in the sampling locations; as such, the Manichaikul et al. (2010) statistic Φ can largely be ignored for comparisons between the pure populations. This is because there is no way that the sampled individuals from the pure locations can be related to one another; however, using this statistic can aid in determining the effect of mating in Branch Creek. As such, recombination must be low, as Branch Creek individuals are still falling within the same clade as the pure populations.

Overall, this study provides a genetic insight into hybridisation in P. australiensis and shows that the sampling site of Branch Creek was on the edge of the invasion front of the translocated mtDNA. As the translocated genotype moves further downstream, only backcrossed individuals of lineage 4 will be coming into contact with the resident Branch Creek individuals, lineage 6. The visualisation of the SNPs clearly shows varying degrees of hybridisation and thus it can be confidently said that hybridisation is still ongoing between the translocated genotype and resident genotype. Comment can also be made on mate choice based on these data. It appears that, after more than 20 generations, non-random mating is still occurring; this is evident through seven of the nine Branch Creek samples presenting as recent hybrids. The high frequency of the lineage 4 mtDNA also suggests that non-random mating is still occurring, as under random mating it would be expected to find similar numbers of lineage 4 and lineage 6 mitochondrial types.

Chapter 5: Resolving the polytomy of Paratya australiensis

5.1 INTRODUCTION

Mitochondrial DNA (mtDNA) is universally studied and widely popular in molecular phylogenetics, population genetics and as a molecular genetic marker (Peregrino-Uriarte et al., 2009). The typical mitogenome is a double-stranded circular molecule ranging in size from 15-20 kb and is composed of 37 genes encoding 13 protein subunits, 22 transfer RNAs (tRNA) and two ribosomal RNAs (rRNA), as well as a non-coding region known as the control region that contains signals for transcription and replication. The mitochondria are involved in an array of processes including metabolism, apoptosis and the electron transport chain, and are responsible for the majority of cellular ATP production. Currently there are 24 atyid whole mitochondrial genomes sequenced and one of these is Paratya australiensis (Gan et al., 2015). The mitogenome of P. australiensis is made up of 15,990 base pairs (bp) comprising the typical mitochondrial genes, 13 protein coding, 22 tRNAs and two rRNAs, as well as the AT-rich control region.

As mentioned above, mtDNA has been a highly popular marker in molecular phylogenetic studies, and this is also true for P. australiensis. Multiple studies on P. australiensis looking at gene flow and population structure have relied on the mitochondrial cytochrome oxidase I (COI) gene (Hurwood et al., 2003; Baker et al., 2004; Cook et al., 2006). In particular, the study from Cook et al. (2006) found that while P. australiensis has an extensive distribution across the eastern coast of the Australian mainland, its phylogeny could not be resolved. This resulted in a large polytomy consisting of nine equally divergent lineages. Their distribution and lack of phylogenetic resolution was attributed to a number of independent amphidromy-freshwater life history transitions, and this study was also the first to recognise that P. australiensis could in fact be multiple cryptic species or a species complex.

Therefore, the aims of this study are to (i) resolve the polytomy put forward by Cook et al. (2006) through a meta-analysis approach using all publicly available COI sequences and (ii) see if any further resolution can be obtained from using the whole mitogenome of two distinct lineages.

5.2 METHODS

For this study, the transcriptomes sequenced from Chapter 2 (three Kilcoy individuals and three Stony Creek individuals) and the nine Branch Creek individuals (Chapter 3) were used throughout to construct the mitogenome. The mtDNA COI gene was also sequenced from nine individuals from Kilcoy, Stony and Branch Creeks.

5.2.1 DNA extraction and COI sequencing

DNA was isolated from nine individuals from Kilcoy, Stony and Branch Creeks. A modified salt extraction method (Miller et al. 1998) was used to isolate the genomic DNA, which was eluted to a final volume of 100µL in sterile H2O (see Appendix D for extraction details). The Folmer et al. (1994) universal primers LCO1490 5’-GGTCAACAAATCATAAAGATATTG-3’ and HCO2198 5’-TAAACTTCAGGGTGACCAAAAAATCA-3’ were used to amplify a 710bp region of COI. PCR reactions contained 12.5µL MyFi mix (Bioline, Alexandria, Australia), 1µL each of forward and reverse primers at concentration 10pM, 2µL DNA template and 8.5µL dH2O for a final reaction volume of 25µL. The PCR cycling conditions were as follows: initial denaturing at 95°C for 5 minutes, 35 cycles of denaturing at 92°C for 30 seconds, annealing at 58°C for 30 seconds and extension at 72°C for 1 minute, and a final extension at 15°C for 2 minutes. The amplified products were then cleaned using an Isolate PCR and Gel Kit II (Bioline, Alexandria, Australia) and eluted into 30µL of elution buffer. The sequencing reaction contained 1µL BigDye Terminator, 3.5µL 5x BigDye sequencing buffer, 1µL LCO1490 primer (at concentration 3.2pM), 1µL purified PCR product and 13.5µL dH2O. The sequencing protocol was as follows: 96°C for 1 minute, 30 cycles of 96°C for 10 seconds, 50°C for 5 seconds and 60°C for 4 minutes, with a final extension step of 15°C. To prepare the samples for Sanger sequencing, an EDTA/ethanol precipitation protocol was followed. Finally, samples were sequenced using a BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City) and were analysed on an Applied Biosystems (ABI) 3500 sequencing platform (Carlsbad, CA, USA).

5.2.2 Bioinformatics

Mitogenome construction and phylogeny

The complete mitogenome of P. australiensis (Gan et al., 2015) was acquired from GenBank (Accession Number KM978917) and split into the 37 genes that make up the mitogenome. From this a local nucleotide database was created and each individual transcriptome was blasted against the local database using a stringency of 1e-5. The output was then searched for significant hits and these were aligned by eye in BioEdit (Hall, 1999). An additional mitogenome was constructed in the same way from a transcriptome from Bain et al. (2016) (CSIRO Data Access Portal: http://doi.org/10.4225/08/58046aaadaeda). A fasta file containing all 17 mitogenomes was used in IQ-TREE (version 1.6.9) (Nguyen et al., 2015). A nexus file was created partitioning the 37 mitochondrial genes and the option to use ModelFinder Plus (MFP+MERGE) was used in the running script. MFP uses ModelFinder (Kalyaanamoorthy et al., 2017), which computes the log-likelihoods of a parsimony tree for many different models along with the Akaike information criterion (AIC), the corrected AIC and the Bayesian information criterion (BIC). ModelFinder then chooses the model that minimizes the BIC score. The +MERGE option lets ModelFinder implement a greedy strategy (Lanfear et al., 2012) that starts with the full partition model and subsequently merges two genes at a time until the model fit does not increase any further. The option to use ultrafast bootstrapping (Hoang et al., 2017) (-bb 1000) was also used in the running script for 1000 bootstrap replicates. The .treefile output from IQ-TREE was then used in FigTree (version 1.4.4) to visualise the phylogeny.

The time of any inferred population divergence was estimated by using the value of mutational units of time (τ) calculated in DnaSP to estimate the number of generations since expansion, assuming the relationship τ = 2ut, where u = μ (mutation rate of the gene) × number of base pairs in the fragment, and t is time in generations. For this analysis, the estimate for mutation rate for COI in crustaceans was 0.7% per million years (Knowlton et al., 1993).
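Rearranging the relationship above gives t = τ / (2u), which can be sketched in Python as follows. This is a worked illustration only: the function name and the τ value are hypothetical, and treating 0.7% per million years as a per-site rate of 7e-9 per year with one generation per year is an assumption that the text does not state.

```python
def generations_since_expansion(tau, rate_per_site_per_generation, n_sites):
    """Solve tau = 2ut for t, where u = per-site rate x fragment length."""
    u = rate_per_site_per_generation * n_sites
    if u <= 0:
        raise ValueError("mutation rate and fragment length must be positive")
    return tau / (2.0 * u)


# 0.7% per million years expressed per site per year (assumed conversion).
RATE_PER_SITE_PER_YEAR = 0.007 / 1_000_000

# Hypothetical tau for a 435 bp fragment (illustrative value, not a result).
example_tau = 10.0
t = generations_since_expansion(example_tau, RATE_PER_SITE_PER_YEAR, 435)
print(f"t = {t:,.0f} generations")
```

Because u scales with fragment length, the same τ implies a more recent expansion for a longer fragment, so τ values from the COI fragment and the mitogenome alignment are not directly comparable without this conversion.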

COI phylogeny

All Paratya australiensis COI sequences were gathered from GenBank (https://www.ncbi.nlm.nih.gov/genbank/) with accession numbers AY308106-AY308175, AY641767-AY641791 (Cook et al., 2006); AY308077-AY308105 (Baker et al., 2004); and AF534894-AF534904 (Hurwood et al., 2003). A Paratya howensis COI sequence was also collected (accession number DQ478482; Page et al., 2007) to use as the outgroup and root the phylogeny. All sequences were placed in a fasta file and aligned by eye in BioEdit (Hall, 1999). IQ-TREE was used with the same settings as for the mitogenome phylogeny, and FigTree was again used to visualise the output. Population divergence was also calculated for just COI using the same method as above. The two pure lineages (4 & 6) were also analysed separately to get an estimate of their divergence times.

5.3 RESULTS

5.3.1 Mitogenome construction and phylogeny

For all individuals, three tRNAs and the whole control region could not be identified through the BLAST searches. In addition, not every mitochondrial gene could be recovered for every individual, and such genes were not included in the phylogeny construction. In total, 11,598 bp of the full mtDNA sequence was used to create the phylogeny. The best-fit model found by ModelFinder was TIM2+F+I. The resulting phylogeny can be seen in Figure 5.1. The Gan et al. (2015) mitogenome clusters with the lineage 4 individuals, while the Bain et al. (2016) mitogenome clusters with the lineage 6 individuals. Because the Gan et al. (2015) individual was caught in the Loddon River and the Bain et al. (2016) individuals were bought from Aquarium Industries in Victoria, the phylogeny offers a within-lineage as well as a between-lineage perspective. Tau (τ) was used to estimate how many generations ago expansion occurred, i.e. the divergence time. Using the equation τ = 2ut, expansion is estimated to have occurred approximately 25,000 years ago. However, due to sampling bias this is likely to be a lower limit on when expansion occurred.

5.3.2 COI phylogeny

In total there were 172 COI sequences, each 435 bp long. The alignment had to be trimmed to this length because each group of sequences obtained from GenBank was generated with different primers, and therefore different regions of the gene were sequenced. The best-fit model found by ModelFinder was TIM2+F+I+G4, and the phylogeny can be seen in Figure 5.2. The divergence time estimated using τ for the whole phylogeny was approximately 2.4 million years ago (mya), while lineage 4 and lineage 6 are estimated to have undergone population expansions approximately 618,000 and 452,000 years ago, respectively.
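Trimming an alignment to the region shared by sequences amplified with different primers can be sketched as below. This is an illustrative assumption about how such trimming could be done programmatically, not a description of the procedure used here (the alignment in this study was handled by eye in BioEdit); the sequence names and toy sequences are hypothetical.

```python
from typing import Dict, Tuple


def shared_region(aligned: Dict[str, str], gap: str = "-") -> Tuple[int, int]:
    """Return the (start, end) slice covered by every sequence.

    Leading and trailing gaps are treated as unsequenced ends; the shared
    region runs from the latest start to the earliest end across sequences.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    start, end = 0, lengths.pop()
    for seq in aligned.values():
        first_base = len(seq) - len(seq.lstrip(gap))  # index of first non-gap
        last_base = len(seq.rstrip(gap))              # index after last non-gap
        start = max(start, first_base)
        end = min(end, last_base)
    if start >= end:
        raise ValueError("sequences share no common region")
    return start, end


def trim_alignment(aligned: Dict[str, str]) -> Dict[str, str]:
    """Cut every sequence down to the shared region."""
    start, end = shared_region(aligned)
    return {name: seq[start:end] for name, seq in aligned.items()}


# Toy alignment: three sequences covering overlapping parts of the same gene.
example_alignment = {
    "seqA": "--ACGTACGT--",
    "seqB": "ACGTACGTAC--",
    "seqC": "----GTACGTAC",
}
trimmed = trim_alignment(example_alignment)
for name, seq in trimmed.items():
    print(name, seq)
```

Internal gaps are left untouched, so the trimmed sequences remain aligned column for column.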

Figure 5.1 Maximum likelihood unrooted phylogeny based on the whole mitogenome of Paratya australiensis. Bootstrap values for the analysis are presented on branches.

Figure 5.2 Maximum likelihood phylogeny based on COI and rooted with Paratya howensis as an outgroup. Bootstrap values are presented on each branch.

5.4 DISCUSSION

This study aimed to resolve the current polytomy seen in P. australiensis using the same mitochondrial marker as a previous study (Cook et al., 2006), and to identify whether a whole mitochondrial genome approach can give any further resolution to the phylogeny. The whole mitogenome phylogeny has strong bootstrap support and is highly congruent with the high level of divergence (6%) between lineages 4 and 6 first identified by Hurwood et al. (2003). It is interesting to see high levels of divergence between individuals within lineages based on their geography. The Bain et al. (2016) sample, while still clustering within its respective lineage, appears to have diverged before the northern samples. This, however, could be a by-product of the samples being bought from a pet store, where there is a higher chance of inbreeding compared to naturally caught samples. As such, the differences seen within the lineage 4 samples are likely to be more representative of the most parsimonious phylogeny. The divergence estimate of 25,000 years from the mitogenome phylogeny may be a large underestimate for two main reasons: (i) there is an extreme sampling bias, as most individuals were sampled from the same location, which reduces the amount of variation captured relative to that in natural populations; and (ii) the mutation rate of 0.7% per million years is based on COI in crustaceans rather than on the whole mitogenome.

The COI phylogeny (Figure 5.2) places the same individuals within the same lineages and sublineages as the Cook et al. (2006) phylogeny. The obvious major difference between the two phylogenies is the absence of the polytomy, and because of this a rearrangement of the lineages has occurred. It appears that lineage 4 is the most ancestral type. In accordance with coalescent theory this is an expected outcome, as lineage 4 is also the most widespread throughout the sampled distribution, with lineages 6 and 8 having the next largest distributions. The resolution of the new phylogeny suggests that there have been sequential range expansions, likely in the order of lineage 4, lineage 6 and finally lineage 8, based on the topology of the tree. From this it can be further inferred that the expansions likely happened in a south to north direction, based on the distributions of the lineages. Cook et al. (2006) attribute the P. australiensis distribution and divergence to a number of independent amphidromy-freshwater life history transitions; this may be the case for some of the population expansions but is unlikely to account for them all. Freshwater species are able to move throughout drainages in two main ways: passively or actively. Passively, populations may be subject to drainage rearrangements, which simultaneously create a vicariant event for the captured populations and facilitate dispersal from one drainage to another. The freshwater fish species Mogurnda adspersa has been hypothesised to be genetically structured through drainage rearrangement in north-eastern Queensland (Hurwood & Hughes, 1998). Alternatively, vicariance could arise through climate change. An example of this can be seen in Euastacus species in Queensland: during the Pliocene, when the continent was warming, the ancestor of all current species retreated to the tops of the mountains, which led to speciation through isolation of the populations (Ponniah & Hughes, 2006). Population expansion in lineages 4 and 6 was estimated at approximately 618,000 and 452,000 years ago, respectively. These times fall at the beginning and end of the mid Pleistocene, when the earth was in an interglacial period, allowing dispersal potential for the various lineages. Whether through passive or active means of dispersal, after the first population expansion (likely lineage 4) there would have been ample opportunity for these lineages to come into contact with one another. While different lineages are found in the same rivers, there has been no evidence of P. australiensis lineages occurring in true sympatry (sampled at the same time from the same pool). Thus, something is driving one lineage to have higher fitness than the other lineage that it comes into contact with. This may be the result of competitive exclusion (Hardin, 1960) or, following Paterson’s recognition species concept, of one population becoming extinct before positive reinforcement can occur (Paterson, 1985).
This can be seen in real time in Branch Creek, where the introduced shrimp have excluded the resident Branch Creek shrimp from the higher elevation pools and are slowly invading the lower elevation pools. The hybrid breakdown between the two lineages is further driving the introduced type to reproduce among themselves, while the resident males potentially show a breakdown in mate recognition. Thus, according to the recognition species concept (RSC) these two lineages are likely to be two different species; according to the biological species concept (BSC), however, they would still be considered one, as there is still some mate recognition.

Overall, this study resolved the polytomy of Paratya australiensis, which in turn allows the dispersal pattern of the lineages to be inferred (i.e. a south to north pattern). This study also showed that while further resolution is seen using the whole mitogenome, it is not necessary, as the COI gene provided enough resolution. However, a future extension of this study could be to obtain whole mitogenomes from different clades within lineage 4 so that their positions within the phylogeny can be called with confidence.

Chapter 6: General Discussion

6.1 OVERVIEW OF RESULTS

The overarching aim of this project was to use a functional genomic approach to gain an insight into species and speciation in Paratya australiensis. While still a relatively contentious subject, the speciation field has moved away from single species concepts and towards a more general consensus that speciation occurs along a continuum (Seehausen et al., 2014). The research chapters presented here have provided evidence to comment on the current species status of P. australiensis and the potential processes underlying speciation in this species. The results have also increased our understanding of the biogeographic history of P. australiensis. In this chapter, I summarise the results from each research chapter independently, consider what these results mean for our current understanding of species and the speciation process, and discuss how they relate to multiple species, species complexes and cryptic species with regard to P. australiensis.

The first study (Chapter 2) identified a number of genes involved in reproduction, temperature tolerance and osmoregulation that are differentially expressed between the two pure lineages found in southeast Queensland (4 – Kilcoy Creek and 6 – Stony Creek). The stark difference between the expression patterns of the two lineages is likely to be a function of each lineage’s adaptation to its respective environment, as Kilcoy Creek is at a higher altitude and therefore has cooler waters with a higher level of dissolved oxygen. The large number of SNPs identified between these two lineages is consistent with differences seen in neutral markers (Cook et al., 2006; Hurwood et al., 2003) but is likely due to natural selection on randomly occurring mutations rather than neutral evolution. Overall, this study provides the first comparative analysis at the functional genome level between two highly divergent lineages of P. australiensis and further supports the high level of divergence seen in ‘neutral’ mitochondrial markers.

As described in Chapter 3, speciation genes are responsible for reproductive isolation (Wang et al., 2018). Here, four previously identified speciation genes were found in the transcriptomes of lineage 4, lineage 6 and the Branch Creek hybrid individuals. All of these pre-identified genes appeared to be subject to purifying selection. As these genes are involved in reproductive isolation, the signature of purifying selection was not surprising, as divergence from the standard population through random mutation would be detrimental to the mutant individuals. The male mating behaviour gene takeout was the only gene identified as showing a signature of positive selection. Previous research has shown that mutations in this gene in Drosophila result in reduced courtship behaviour in males, although the gene is expressed in the antennae of both sexes (Dauwalder et al., 2002).

The differential gene expression (DGE) analysis again illustrated the contrast between the two pure lineages, while the Branch Creek individuals presented an intermediate expression pattern, which can be interpreted as a form of introgression between the two lineages. The identification of mitochondrial genes along with oxygen transportation genes indicated that a cytonuclear interaction is likely to be occurring. This is the first time a cytonuclear interaction has been presented for P. australiensis, and previous research has shown that cytonuclear incompatibilities can lead to speciation (Won et al., 2003; Chou & Leu, 2010). These mito-nuclear gene interactions are also a candidate for the Dobzhansky-Muller (DM) model, as the incompatibilities occur between loci rather than within a locus (Sweigart & Willis, 2012).

A SNP approach (Chapter 4) was used for the first time to examine hybridisation in the known zone of introgression in P. australiensis. This study was important for determining whether mating between lineage 6 and the crossed and backcrossed lineage 4 individuals is still ongoing, and whether extreme non-random mating among these lineages continues. Multiple analyses, including identity by state (IBS), principal component analysis (PCA) and multidimensional scaling (MDS), show varying levels of hybridisation within the sampled Branch Creek individuals. From this it can be concluded that hybridisation between the two mitochondrial genotypes is still ongoing even after two decades of introgression, as evidenced by the presence of early generation hybrids. However, with eight of the nine individuals carrying the introduced lineage 4 mitogenome, it can be inferred that non-random mating is still occurring.
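As an illustration of what an IBS comparison measures, the sketch below computes pairwise identity-by-state similarity from SNP genotypes. It is a minimal example under stated assumptions, not the analysis performed in Chapter 4: genotypes are assumed to be coded as counts of one allele (0, 1 or 2) with None for missing calls, and the individual names and genotypes are hypothetical.

```python
from typing import List, Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of an allele; None = missing call


def ibs_similarity(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Mean proportion of alleles shared identical by state across loci.

    Per locus the score is 1 - |a - b| / 2, so identical genotypes score 1,
    opposite homozygotes score 0, and loci with a missing call are skipped.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must cover the same loci")
    total, compared = 0.0, 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue
        total += 1.0 - abs(ga - gb) / 2.0
        compared += 1
    if compared == 0:
        raise ValueError("no loci with calls in both individuals")
    return total / compared


def ibs_matrix(genotypes: List[Sequence[Genotype]]) -> List[List[float]]:
    """Symmetric matrix of pairwise IBS similarities."""
    n = len(genotypes)
    return [[ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical individuals scored at four diagnostic SNPs.
example = {
    "pure_lineage_A": [0, 0, 0, 0],
    "pure_lineage_B": [2, 2, 2, 2],
    "F1_like": [1, 1, 1, None],
}
names = list(example)
matrix = ibs_matrix([example[name] for name in names])
for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

In this toy case an F1-like individual sits midway between the two pure types, which is the kind of intermediate signal that ordination methods such as PCA or MDS then display graphically.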

Aspects of the Cook et al. (2006) nine-lineage polytomy were resolved (Chapter 5) through an analysis of COI from the nine lineages and the mitogenome of lineages 4 (Kilcoy Creek) and 6 (Branch Creek). It was revealed that lineage 4 (found in Kilcoy Creek) is the most ancestral and likely the first to expand north from a southern source population. There appear to be two major expansions that followed lineage 4 in this south to north dispersal: lineage 6, which was closely followed by lineage 8. Using the whole mitogenome of lineages 4 and 6 confirmed what was seen with the COI gene alone. While in this case mitogenomes were used from only two lineages, the mitogenomic approach has advantages over the COI analysis, such as providing higher support levels as well as a much greater range of time scales of phylogenetically informative divergence (Qin et al., 2019). For this particular analysis, the mitogenomic phylogeny supports the COI phylogeny, as the ratio of the branch lengths between the two lineages was consistent between the two. Knowing that the expansion of the lineages likely occurred sequentially gives a greater insight into why only a single lineage is found within any one creek system.

These four studies provide significant resources to aid in the understanding of P. australiensis and of the factors that may be facilitating divergence. The combination of differential gene expression, the large number of SNPs and the new COI phylogeny suggests that lineages 4 and 6 are likely members of a species complex. These results also confirm that non-random mating is still ongoing even after 20 years. This research is a positive step for two of the nine known lineages and provides a pathway for further investigating speciation in the other lineages.

6.2 CONSEQUENCES AND RELEVANCE OF THIS RESEARCH

A genomic perspective on delimiting species has not always been a popular method (Lee, 2003; Will et al., 2005). But now that we are in the post-genomic era, genomic technology has opened avenues for species identification that were not previously possible. While genomics is greatly advantageous for identifying reproductive isolation or speciation genes, it does not reduce the confusion surrounding the definitions of species and speciation. This is because a multitude of species concepts exist, and each concept differs (some more than others) in how it defines a species (Hundsdoerfer et al., 2019). This is evident in P. australiensis, firstly when taking into account a single mitochondrial gene (COI) and the new phylogeny (Figure 5.2). Under the phylogenetic species concept, these lineages would represent different species, as they are separately evolving. Yet under the ecological species concept, for example, P. australiensis would be considered a single species, as each lineage occupies the same ecological niche. It would be inappropriate to identify P. australiensis as a single species based purely on the fact that the lineages occupy the same niche when much more evidence suggests there is reason to suspect multiple cryptic species: non-random mating on initial introduction (Hughes et al., 2003), large mitochondrial divergence between lineages (Hurwood et al., 2003), lineage sorting based on SNPs and a probable cytonuclear interaction occurring in the hybrid offspring. Each one of these lines of evidence has individually led to species delimitation in other systems (Pinho, Harris & Ferrand, 2007; Servido et al., 2011; Unmack et al., 2017), and thus in conjunction with one another they would place P. australiensis as multiple species or as a species complex.

While currently a species complex, given enough evolutionary time it is likely that each lineage will become a distinct species, as each lineage appears to be separately evolving (at least for lineages 4 and 6). Paratya australiensis is a good example for recognising that speciation is not a finite endpoint but rather a continuum (Hendry et al., 2009). The shift towards the speciation continuum theory is important, as it recognises that complete reproductive isolation (RI) does not arise instantly but through the accumulation of differences in RI genes (Hendry et al., 2009; Nosil et al., 2009). Hendry et al. (2009) describe four states along this continuum to be used as a guide (refer to 1.1.1.9 for a description of these states). According to these states, lineages 4 and 6 of P. australiensis would be considered somewhere between states 2 and 3, as there is partial discontinuous variation and there are adaptive differences between the lineages but, more importantly, RI is present. The occurrence of RI can be seen in the Branch Creek hybrids: even after 25 generations together, reproduction between the two lineages is still occurring in only one direction (resident males and introduced females). If the biological species concept (BSC) is taken into consideration, lineages 4 and 6 would be considered a single species, as reproduction still occurs between the lineages (although in a single direction) and the breakdown in mate recognition is therefore incomplete.

Reproductive isolation has almost exclusively been the focus of speciation research, likely because each species concept defines and describes species on the basis of RI (Rabosky, 2016). Because of this, RI has been the main biological process used to discern species throughout this research. It is known that RI can occur through a range of mechanisms and at different points of the reproductive cycle (Nosil et al., 2005). A suite of genes can contribute to RI, and these have been termed speciation genes. The research here (Chapter 3) was the first time that speciation genes have been searched for in decapod crustaceans, as all currently known speciation genes have been identified in a small number of model species. It is important to restate that these pre-identified speciation genes are unlikely to contribute to RI in the same manner as they do in the species in which they were first identified. Previous research on Drosophila species describes that low levels of genetic variation could be the result of recent fixation during speciation, or of purifying selection within species along with periodic fixation of alleles by natural selection between species (Civetta & Singh, 1995). Civetta & Singh (1995) state that if purifying selection is maintaining low genetic variation in reproductive traits, then the same should be true for more distantly related species. While not currently described as different species, lineages 4 and 6 of P. australiensis can be considered distantly related based on their mitochondrial divergence, and thus the signature of purifying selection on the majority of identified speciation and reproductive genes is not unusual. Purifying selection on these genes would suggest that they are conserved from the most recent common ancestor (Schlüter et al., 2011).

The single gene that was identified as having a signature of positive selection, takeout, can alter the way mate choice is viewed in this species. Previous research on Drosophila identified that the takeout gene is involved in male courtship behaviour and is associated with detecting female pheromones (Lazareva et al., 2007). It was suggested, based on the translocation study (Hughes et al., 2003), that the translocated females were preferentially mating with the resident males, while the reverse cross was not seen at all (not a single F1 hybrid resulting from a cross between an introduced male and a resident female has ever been detected). As the takeout gene shows a signature of positive selection, the skewed mating seen in the first generation of the translocation may have been a function of the translocated males being unable to detect the resident female pheromone. Many decapods use waterborne cues to gain information about other individuals around them (Chak, Bauer & Thiel, 2015). The use of female sex pheromones to manipulate male competition has been identified in blue crabs (Gleeson, 1980; Jivoff & Hines, 1998) and hermit crabs (Imafuku, 1986; Yamanoi et al., 2006). It has also been reported in the caridean shrimp Heptacarpus paludicola that females that are ready for mating release a waterborne substance, likely in the urine, that increases male activity (Bauer, 1979). A possible explanation of the extreme non-random mating may therefore be that the translocated males are unable to detect or recognise the resident female pheromone, i.e. they do not recognise resident females as potential mates. Another explanation could simply be male-male competition, with the resident males ‘winning’ over the translocated males. However, caridean shrimp do not have a prolonged and elaborate behavioural interaction before copulation. Because they are primarily a swimming species rather than having a benthic lifestyle, such competition may be an unnecessary use of energy (Asakura, 2016).

While the non-random mating may be a function of pheromone detection, the reduced hybrid fitness may be due to a cytonuclear interaction, in part involving oxygen transport genes. As the discordance occurs between loci, the cytonuclear incompatibility in P. australiensis aligns with a DM model scenario. A DM incompatibility acts post-zygotically, but the isolation arises in allopatry as populations adapt to their local environments (Orr & Turelli, 2001). Divergence from the common ancestor and fixation of alleles must not significantly impact viability or fertility within the respective populations, but when the populations are brought back together the divergent alleles may not function properly together in the hybrid offspring (Orr & Turelli, 2001). As such, RI can be driven by loci involved in DM incompatibilities, and cytonuclear incompatibilities are considerable contributors to RI and ultimately speciation (Barnard-Kubow, So & Galloway, 2016).

The genic species concept (GSC) is an important concept to mention when speaking about these incompatible loci, as the GSC fundamentally considers that few genes are responsible for RI and adaptation to natural environments. Under the GSC, lineages 4 and 6 of P. australiensis would be considered different species, as the exchange of genes associated with oxygen transport is not advantageous for the hybrid offspring. This is because lineage 4 (Kilcoy Creek) shrimp are well adapted to cooler water temperatures and a higher dissolved oxygen content. The translocation to the lower altitude and warmer waters of Branch Creek could also contribute to the non-random mating. As the females come into season in the spring when the water is warming, the translocated females could be coming into the breeding season earlier than the resident females because the water is already warmer than in Kilcoy Creek.

Finally, the results from Chapter 5 have revealed that there is an extremely high probability that throughout each lineage’s range expansion, from south to north, there would have been ample opportunity for lineages to encounter one another. Currently, however, there is no evidence of any two lineages occurring within the same pool, and a process such as competitive exclusion may be driving a single lineage to be dominant within a creek system. Competitive exclusion is arguably what has occurred, as the expanding lineage occupies the same fundamental niche as the resident type (Young, 2004). Generally, competitive exclusion occurs between species and is seen when an invasive species is introduced to a niche that is occupied by an ecologically similar species (e.g. Maldonado-Coelho et al., 2017). If competitive exclusion is the cause of a single lineage occurring within a stream, each lineage could be considered a distinct species. However, research on freshwater crayfish in the USA reports that although the local and introduced species occupy the same habitats, there was little evidence suggesting that competitive exclusion was the reason for species replacement (Westhoff & Rabeni, 2013). This may also be the case for P. australiensis, where something else may be driving a single lineage to have higher fitness than another in a single creek.

If competitive exclusion is not causing this scenario, it may be that one lineage is going extinct before positive reinforcement occurs. If this is what is occurring and the arguments of the recognition species concept (RSC) are followed, lineages 4 and 6 should be considered two species due to the substantial breakdown in mate recognition. This conflicts entirely with the biological species concept, as there is some form of mate recognition occurring between the lineages. This situation exemplifies why the species debate is so contentious and why a universally agreed-upon definition, concept or theory of species and speciation is necessary. This is slowly occurring with the acceptance of the speciation continuum theory. However, it is likely that different disciplines of biology will continue to use different theories and concepts, and this debate will remain unresolved for the foreseeable future.

6.3 LIMITATIONS AND FUTURE DIRECTIONS

Perspectives of speciation vary dramatically among biological disciplines, which makes it difficult to identify populations that are in the process of becoming multiple species. Paratya australiensis is a system that, from previous research, appears to be multiple species. The findings presented throughout this thesis are highly significant with regard to determining if P. australiensis is a single species or species complex.

The results presented throughout can largely only comment on the two lineages from the same river system that were the focus of the project; thus a major area of future research that will aid in species delimitation is to sequence the transcriptomes of the remaining seven lineages. The information provided by the transcriptomes will further inform research regarding speciation genes as well as the cytonuclear incompatibilities in P. australiensis. The use of different genomic approaches such as DArTseq or whole genome sequencing was outside the scope of this study, but these would be valuable additions to this research avenue. Currently, the model species for crustaceans is Daphnia pulex, an extremely distantly related crustacean belonging to the order Cladocera. A genome from a decapod crustacean would greatly advance all avenues of research, including speciation, and expand our knowledge of speciation genes within this order of crustaceans.

Another way to strengthen the research from this project will be to perform behavioural studies. Under controlled laboratory conditions, which remove the effect of local environmental adaptation, mate choice experiments would aid in determining how mate choice occurs in the species. These behavioural experiments will be beneficial for detecting the possible female pheromone and differences in it between the two lineages, for measuring the expression of the takeout gene in male shrimp when presented with a female of each lineage, and for determining which male ‘wins’ in the face of competition for the female.

As a final point for the future directions of this research, a taxonomic revision of the species should be undertaken. The first step in this process would be to identify RI genes as well as any differences in morphological and ecological traits between the lineages. While potentially not all lineages can be considered species, it would be valuable to revisit the multiple species and subspecies described by Riek (1953). The five species described in Riek’s (1953) taxonomic revision may align with five of the nine known lineages. Through the use of morphology, ecology, genetics and genomics, Paratya australiensis could be revised to a total of nine species, and lineage 4, given its geographic distribution and placement within the phylogeny, would be considered the ancestral type to all other revised species. The distribution of some of the southern lineages is narrow, with many found within a single river, and these lineages are therefore important for the conservation of the species.

6.4 CONCLUDING REMARKS

Gaining an insight into speciation has been a difficult task within biological science, but genetics and genomics now provide a different perspective on the speciation process. Overall, the results presented indicate that Paratya australiensis should no longer be considered monotypic. Given the use of genomics to identify potential speciation genes, the identification of SNPs between pure lineages and within a zone of introgression, and the resolution of the previous nine-lineage polytomy through mtDNA analysis, there is little doubt that P. australiensis should undergo a taxonomic revision.

References

Abbott, R., Albach, D., Ansell, S., Arntzen, J. W., Baird, S. J. E., Bierne, N., Boughman, J., Brelsford, A., Buerkle, C. A., Buggs, R., Butlin, R. K., Dieckmann, U., Eroukhmanoff, F., Grill, A., Cahan, S. H., Hermansen, J. S., Hewitt, G., Hudson, A. G., Jiggins, C., Jones, J., Keller, B., Marczewski, T., Mallet, J., Martinez-Rodriguez, P., Möst, M., Mullen, S., Nicholas, R., Nolte, A. W., Parisod, C., Pfennig, K., Rice, A. M., Ritchie, M. G., Seifert, B., Smadja, C. M., Stelkens, R., Szymura, J. M., Väinölä, R., Wolf, J. B. W. & Zinner, D. (2013). Hybridization and speciation. Journal of Evolutionary Biology, 26(2), 229-246.

Abebe, E., Mekete, T., & Thomas, W. K. (2011). A critique of current methods in nematode taxonomy. African Journal of Biotechnology, 10(3), 312-323.

Anderson, E. (1949). Introgressive Hybridization. New York: John Wiley & Sons.

Andrews, S. (2010). FastQC: A quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc

Asakura, A. (2016). The evolution of mating systems in decapod crustaceans. In Decapod crustacean phylogenetics (pp. 133-194). CRC Press.

Bain, P. A., Gregg, A. L. & Kumar, A. (2016). De novo assembly and analysis of changes in the protein-coding transcriptome of the freshwater shrimp Paratya australiensis (Decapoda: Atyidae) in response to acid sulfate drainage water. BMC Genomics, 17(1), 890. https://doi.org/10.1186/s12864-016-3208-y

Baker, A. M., Hurwood, D. A., Krogh, M., & Hughes, J. M. (2004). Mitochondrial DNA signatures of restricted gene flow within divergent lineages of an atyid shrimp (Paratya australiensis). Heredity, 93(2), 196–207. http://doi.org/10.1038/sj.hdy.6800493

Barbash, D. A., Roote, J., & Ashburner, M. (2000). The Drosophila melanogaster hybrid male rescue gene causes inviability in male and female species hybrids. Genetics, 154(4), 1747–1771.

Barchuk, A. R., Cristino, A. S., Kucharski, R., Costa, L. F., Simões, Z. L. P., & Maleszka, R. (2007). Molecular determinants of caste differentiation in the highly eusocial honeybee Apis mellifera. BMC Developmental Biology, 7, 70. http://doi.org/10.1186/1471-213X-7-70

Barnard-Kubow, K. B., So, N., & Galloway, L. F. (2016). Cytonuclear incompatibility contributes to the early stages of speciation. Evolution, 70(12), 2752–2766. https://doi.org/10.1111/evo.13075

Barr, C. M. & Fishman, L. (2010). The nuclear component of a cytonuclear hybrid incompatibility in Mimulus maps to a cluster of pentatricopeptide repeat genes. Genetics, 184, 455–465.

Barton, N.H., & Jones, J.S. (1983). Mitochondrial DNA: new clues about evolution. Nature, 306, 317-318.

Bateson, W. (1909). Heredity and variation in modern lights. In: Seward, A.C. (Ed.), Darwin and Modern Science. Cambridge: Cambridge University Press, 85-101.

Bauer, R. T. (1979). Sex attraction and recognition in the caridean shrimp Heptacarpus paludicola Holmes (Decapoda: Hippolytidae). Marine & Freshwater Behaviour & Physiology, 6(3), 157-174.

Baum, D. A., & Donoghue, M. J. (1995). Choosing among alternative “Phylogenetic” species concepts. Systematic Botany, 20(4), 560–573.

Belyaeva, M. & Taylor, D. (2009). Cryptic species within the Chydorus sphaericus species complex (Crustacea: Cladocera) revealed by molecular markers and sexual stage morphology. Molecular Phylogenetics and Evolution, 50(3), 534-546. http://dx.doi.org/10.1016/j.ympev.2008.11.007

Berdan, E. L., Mazzoni, C. J., Waurick, I., Roehr, J. T., & Mayer, F. (2015). A population genomic scan in Chorthippus grasshoppers unveils previously unknown phenotypic divergence. Molecular Ecology, 24(15), 3918–3930. https://doi.org/10.1111/mec.13276

Blank, D., Wolf, L., Ackerman, M. & Silander, O. K. (2014). The predictability of molecular evolution during functional innovation. Proceedings of the National Academy of Sciences of the United States of America, 111(8), 3044-3049.

Bleil, J. D., & Wassarman, P. M. (1990). Identification of a ZP3-binding protein on acrosome-intact mouse sperm by photoaffinity crosslinking. Proceedings of the National Academy of Sciences of the United States of America, 87(14), 5563–5567. http://doi.org/10.1073/pnas.87.14.5563

Bolnick, D. I., & Fitzpatrick, B. M. (2007). Sympatric speciation: models and empirical evidence. Annual Review of Ecology, Evolution, and Systematics, 38(928), 459–487. http://doi.org/10.1146/annurev.ecolsys.38.091206.095804

Brideau, N. J., Flores, H. A., Wang, J., Maheshwari, S., Wang, X., & Barbash, D. A. (2006). Two Dobzhansky-Muller genes interact to cause hybrid lethality in Drosophila. Science, 314(5803), 1292–5. http://doi.org/10.1126/science.1133953

Buerkle, C. A., Wolf, D. E., & Rieseberg, L. H. (2003). The origin and extinction of species through hybridization. In Population Viability in Plants (pp. 117-141). Springer, Berlin, Heidelberg.

Burke, J. M., & Arnold, M. L. (2001). Genetics and the fitness of hybrids. Annual Review of Genetics, 35(1), 31-52.

Burton, R. S., & Barreto, F. S. (2012). A disproportionate role for mtDNA in Dobzhansky–Muller incompatibilities? Molecular Ecology, 21(20), 4942-4957.

Bush, G. L. (1966). The taxonomy, cytology, and evolution of the genus Rhagoletis in North America (Diptera: Tephritidae). Bulletin of the Museum of Comparative Zoology, 134, 431–562.

Calvert, M. E., Digilio, L. C., Herr, J. C., & Coonrod, S. A. (2003). Oolemmal proteomics - identification of highly abundant heat shock proteins and molecular chaperones in the mature mouse egg and their localization on the plasma membrane. Reproductive Biology and Endocrinology, 1, 27. http://doi.org/10.1186/1477-7827-1-27

Carmona, E., Weerachatyanukul, W., Soboloff, T., Fluharty, A. L., White, D., Promdee, L., Ekker, M., Berger, T., Buhr, M. & Tanphaichitr, N. (2002). Arylsulfatase A is present on the pig sperm surface and is involved in sperm-zona pellucida binding. Developmental Biology, 247, 182–196. http://doi.org/10.1006/dbio.2002.0690

Carpenter, A. (1977) Zoogeography of the New Zealand freshwater Decapoda: a review. Tuatara, 23, 41–48.

Chak, S. T., Bauer, R., & Thiel, M. (2015). Social behaviour and recognition in decapod shrimps, with emphasis on the Caridea. In Social Recognition in Invertebrates (pp. 57-84). Springer, Cham.

Chen, M. Y., Liang, D., & Zhang, P. (2015). Selecting question-specific genes to reduce incongruence in phylogenomics: A case study of jawed vertebrate backbone phylogeny. Systematic Biology, 64(6), 1104–1120. http://doi.org/10.1093/sysbio/syv059

Chomczynski, P. & Mackey, K. (1995). Modification of the TRI reagent procedure for isolation of RNA from polysaccharide- and proteoglycan-rich sources. Biotechniques, 19, 942-945.

Chou, J. Y., & Leu, J. Y. (2010). Speciation through cytonuclear incompatibility: Insights from yeast and implications for higher eukaryotes. BioEssays, 32(5), 401–411. https://doi.org/10.1002/bies.200900162

Civetta, A., & Singh, R. S. (1995). High divergence of reproductive tract proteins and their association with postzygotic reproductive isolation in Drosophila melanogaster and Drosophila virilis group species. Journal of Molecular Evolution, 41(6), 1085–1095. https://doi.org/10.1007/BF00173190

Collins, R. A., & Cruickshank, R. H. (2013). The seven deadly sins of DNA barcoding. Molecular Ecology Resources, 13(6), 969–975. https://doi.org/10.1111/1755-0998.12046

Colosimo, P. F. (2005). Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science, 307(5717), 1928–1933. http://doi.org/10.1126/science.1107239

Cook, B. D., Baker, A. M., Page, T. J., Grant, S. C., Fawcett, J. H., Hurwood, D. A., & Hughes, J. M. (2006). Biogeographic history of an Australian freshwater shrimp, Paratya australiensis (Atyidae): The role of life history transition in phylogeographic diversification. Molecular Ecology, 15(4), 1083–1093. http://doi.org/10.1111/j.1365-294X.2006.02852.x

Cornette, R., Farine, J.-P., Abed-Viellard, D., Quennedey, B., & Brossut, R. (2003). Molecular characterization of a male-specific glycosyl hydrolase, Lma-p72, secreted on to the abdominal surface of the Madeira cockroach Leucophaea maderae (Blaberidae, Oxyhaloinae). The Biochemical Journal, 372(2), 535–541. http://doi.org/10.1042/BJ20030025

Coyne, J. A. & Orr, H. A. (2004). Speciation. Sinauer Associates, Sunderland, MA.

Coyne, J. A., Orr, H. A., & Futuyma, D. J. (1988). Do we need a new species concept? Systematic Zoology, 37(2), 190–200.

Cracraft, J. (2000). Species concepts in theoretical and applied biology: A systematic debate with consequences. In Q. Wheeler & R. Meier (Eds.), Species Concepts and Phylogenetic Theory (pp. 4-14). New York: Columbia University Press.

Cui, Z., Hui, M., Liu, Y., Song, C., Li, X., Li, Y., Liu, L., Shi, G., Wang, S., Li, F., Zhang, X., Liu, C., Xiang, J. & Chu, K. H. (2015). High-density linkage mapping aided by transcriptomics documents ZW sex determination system in the Chinese mitten crab Eriocheir sinensis. Heredity, 115(3), 206–215. https://doi.org/10.1038/hdy.2015.26

Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., Handsaker, R. E., Lunter, G., Marth, G. T., Sherry, S. T., McVean, G. & Durbin, R. (2011). The variant call format and VCFtools. Bioinformatics, 27(15), 2156–2158. https://doi.org/10.1093/bioinformatics/btr330

Darwin, C. (1859). On the Origin of Species by Means of Natural Selection (J. Murray, London).

Dasmahapatra, K. K., Elias, M., Hill, R. I., Hoffman, J. I., & Mallet, J. (2010). Mitochondrial DNA barcoding detects some species that are real, and some that are not. Molecular Ecology Resources, 10(2), 264–273. https://doi.org/10.1111/j.1755-0998.2009.02763.x

Dauwalder, B., Tsujimoto, S., Moss, J., & Mattox, W. (2002). The Drosophila takeout gene is regulated by the somatic sex-determination pathway and affects male courtship behaviour. Genes & Development, 16, 2879–2892. https://doi.org/10.1101/gad.1010302

Davidson, N. M., Hawkins, A. D., & Oshlack, A. (2017). SuperTranscripts: a data driven reference for analysis and visualisation of transcriptomes. Genome Biology, 18(1), 148.

de Carvalho, F. L., Pileggi, L. G., & Mantelatto, F. L. (2013). Molecular data raise the possibility of cryptic species in the Brazilian endemic prawn Macrobrachium potiuna (Decapoda, Palaemonidae). Latin American Journal of Aquatic Research, 41(4), 707–717. https://doi.org/10.3856/vol41-issue4-fulltext-7

De Grave, S. & Fransen, C. H. J. M. (2011). Carideorum catalogus: the recent species of the dendrobranchiate, stenopodidean, procarididean and caridean shrimps (Crustacea: Decapoda). Zoologische Mededelingen, 85, 195.

DePristo, M. A., Banks, E., Poplin, R. E., Garimella, K. V., Maguire, J. R., Hartl, C., Philippakis, A. A., del Angel, G., Rivas, M. A., Hanna, M., McKenna, A., Fennell, T. J., Kernytsky, A. M., Sivachenko, A. Y., Cibulskis, K., Gabriel, S. B., Altshuler, D., & Daly, M. J. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5), 491–498. https://doi.org/10.1038/ng.806

de Queiroz, K. (1997). The Linnaean hierarchy and the evolutionization of taxonomy, with emphasis on the problem of nomenclature. Aliso, 15(2), 125-144.

de Queiroz, K. (1998). The general lineage concept of species, species criteria, and the process of speciation. Endless forms: species and speciation (pp 57-75). Oxford University Press.

de Queiroz, K. (2005a). Ernst Mayr and the modern concept of species. Proceedings of the National Academy of Sciences, 102(Supplement 1), 6600-6607. http://dx.doi.org/10.1073/pnas.0502030102

de Queiroz, K. (2005b). A unified concept of species and its consequences for the future of taxonomy. Proceedings of the California Academy of Sciences, 56(Supplement 1), 196-215.

de Queiroz, K. (2007). Species concepts and species delimitation. Systematic Biology, 56(6), 879–886. http://doi.org/10.1080/10635150701701083

Di Candia, M. R., & Routman, E. J. (2007). Cytonuclear discordance across a leopard frog contact zone. Molecular Phylogenetics and Evolution, 45(2), 564–575. https://doi.org/10.1016/j.ympev.2007.06.014

Dion-Côté, A.-M., & Barbash, D. A. (2017). Beyond speciation genes: an overview of genome stability in evolution and speciation. Current Opinion in Genetics and Development, 47, 17–23. https://doi.org/10.1016/j.gde.2017.07.014

Dobzhansky, T. (1937). Genetics and the Origin of Species (No. 11). Columbia University Press.

Dobzhansky, T. (1970). Genetics of the Evolutionary Process. New York: Columbia University Press.

Dopman, E. B., Pérez, L., Bogdanowicz, S. M., & Harrison, R. G. (2005). Consequences of reproductive barriers for genealogical discordance in the European corn borer. Proceedings of the National Academy of Sciences of the United States of America, 102(41), 14706–11. http://doi.org/10.1073/pnas.0502054102

Egan, S. P., Ragland, G. J., Assour, L., Powell, T. H. Q., Hood, G. R., Emrich, S., Nosil, P. & Feder, J. L. (2015). Experimental evidence of genome-wide impact of ecological selection during early stages of speciation-with-gene-flow. Ecology Letters, 18(8), 817–825. http://doi.org/10.1111/ele.12460

Egger, B., Lapraz, F., Tomiczek, B., Müller, S., Dessimoz, C., Girstmair, J., Škunca, N., Rawlinson, K. A., Cameron, C. B., Beli, E., Todaro, M. A., Gammoudi, M., Noreña, C. & Telford, M. J. (2015). A transcriptomic-phylogenomic analysis of the evolutionary relationships of flatworms. Current Biology, 25(10), 1347-1353. http://dx.doi.org/10.1016/j.cub.2015.03.034

Ekblom, R. & Galindo, J. (2011). Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity, 107(1), 1-15. http://dx.doi.org/10.1038/hdy.2010.152

Eldredge, N., & Cracraft, J. (1980). Phylogenetic patterns and the evolutionary process. Method and theory in comparative biology (pp. 1239-40). Columbia University Press, New York.

Ellstrand, N. C. (1992). Gene flow by pollen: implications for plant conservation genetics. Oikos, 63, 77-86.

Ellstrand, N. C., & Schierenbeck, K. A. (2000). Hybridization as a stimulus for the evolution of invasiveness in plants? Proceedings of the National Academy of Sciences, 97(13), 7043–7050.

Fawcett, J.H., Hurwood, D.A. & Hughes, J.M. (2010). Consequences of a translocation between two divergent lineages of the Paratya australiensis (Decapoda:Atyidae) complex: reproductive success and relative fitness. Journal of the North American Benthological Society, 29(3), 1170-1180.

Feder, J. L., Chilcote, C. A. & Bush, G. L. (1988). Genetic differentiation between sympatric host races of Rhagoletis pomonella. Nature, 336, 61–64.

Folmer, O., Black, M., Hoeh, W., Lutz, R., & Vrijenhoek, R. C. (1994). DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology, 3(5), 294–299.

Frankham, R., Ballou, J., Dudash, M., Eldridge, M., Fenster, C., Lacy, R., Mendelson III, J., Porton, I., Ralls, K. & Ryder, O. (2012). Implications of different species concepts for conserving biodiversity. Biological Conservation, 153, 25-31. http://dx.doi.org/10.1016/j.biocon.2012.04.034

Frankham, R. (2015). Genetic rescue of small inbred populations: meta-analysis reveals large and consistent benefits of gene flow. Molecular Ecology, 24, 2610-2618.

Frost, D. R., & Hillis, D. M. (1990). Species in concept and practice: herpetological applications. Herpetologica, 46(1), 87–104.

Frost, D. R., & Kluge, A. G. (1994). A consideration of epistemology in systematic biology, with special reference to species. Cladistics, 10, 259–294.

Gagnaire, P. A., Pavey, S. A., Normandeau, E., & Bernatchez, L. (2013). The genetic architecture of reproductive isolation during speciation-with-gene-flow in lake whitefish species pairs assessed by RAD sequencing. Evolution, 67(9), 2483–2497. http://doi.org/10.1111/evo.12075

Fry, K., & Salser, W. (1977). Nucleotide sequences of HS-α satellite DNA from kangaroo rat Dipodomys ordii and characterization of similar sequences in other rodents. Cell, 12(4), 1069–1084. https://doi.org/10.1016/0092-8674(77)90170-2

Fu, L., Niu, B., Zhu, Z., Wu, S., & Li, W. (2012). CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics, 28(23), 3150–3152. https://doi.org/10.1093/bioinformatics/bts565

Gan, H. Y., Gan, H. M., Lee, Y. P., & Austin, C. M. (2015). The complete mitogenome of the Australian freshwater shrimp Paratya australiensis Kemp, 1917 (Crustacea: Decapoda: Atyidae). Mitochondrial DNA, 1736, 1–3. https://doi.org/10.3109/19401736.2015.1007312

Gemmel, P. (1979). Feeding habits and structure of the gut of the Australian freshwater prawn Paratya australiensis Kemp (Crustacea, Caridea, Atyidae). In Proceedings of the Linnean Society of New South Wales.

Ghaffari, N., Sanchez-Flores, A., Doan, R., Garcia-Orozco, K. D., Chen, P. L., Ochoa-Leyva, A., Lopez-Zavala, A. A., Carrasco, J. S., Hong, C., Brieba, L. G., Rudiño-Piñera, E., Blood, P. D., Sawyer, J. E., Johnson, C. D., Dindot, S. V., Sotelo-Mundo, R. R. & Criscitiello, M. F. (2015). Novel transcriptome assembly and improved annotation of the whiteleg shrimp (Litopenaeus vannamei), a dominant crustacean in global seafood mariculture. Scientific Reports, 4(1), 7081. https://doi.org/10.1038/srep07081

Gleeson, R. A. (1980). Pheromone communication in the reproductive behavior of the blue crab, Callinectes sapidus. Marine & Freshwater Behaviour & Physiology, 7(2), 119-134.

Greiner, S., Rauwolf, U., Meurer, J., & Herrmann, R. G. (2011). The role of plastids in plant speciation. Molecular Ecology, 20(4), 671-691.

Hajibabaei, M., Singer, G. A. C., Hebert, P. D. N., & Hickey, D. A. (2007). DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. Trends in Genetics, 23(4), 167–172. http://doi.org/10.1016/j.tig.2007.02.001

Hall, T. A. (1999). BioEdit: a user friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series, 41, 95–98.

Hancock, M. A. (1995). Population dynamics and life history of Paratya australiensis Kemp, 1917 (Decapoda: Atyidae) in upland rainforest streams, south eastern Queensland, Australia. PhD thesis. Griffith University, Nathan, Queensland.

Hancock, M. & Bunn, S. (1997). Population dynamics and life history of Paratya australiensis Kemp, 1917 (Decapoda: Atyidae) in upland rainforest streams, south-eastern Queensland, Australia. Marine and Freshwater Research, 48(4), 361. http://dx.doi.org/10.1071/mf97003

Hardin, G. (1960). The competitive exclusion principle. Science, 131(3409), 1292-1297.

Hausdorf, B. (2011). Progress toward a general species concept. Evolution, 65(4), 923-931. http://dx.doi.org/10.1111/j.1558-5646.2011.01231.x

Havird, J. C., & Santos, S. R. (2016). Here we are, but where do we go? A systematic review of crustacean transcriptomic studies from 2014–2015. Integrative and Comparative Biology, 1–12. https://doi.org/10.1093/icb/icw061

Hebert, P. D. N., Cywinska, A., Ball, S. L., & DeWaard, J. R. (2003). Biological identifications through DNA barcodes. Proceedings of the Royal Society of London B: Biological Sciences, 270(1512), 313–321. https://doi.org/10.1098/rspb.2002.2218

Heinrich, R., Wenzel, B., & Elsner, N. (2001). A role for muscarinic excitation: control of specific singing behavior by activation of the adenylate cyclase pathway in the brain of grasshoppers. Proceedings of the National Academy of Sciences of the United States of America, 98(17), 9919–9923. http://doi.org/10.1073/pnas.151131998

Hendry, A. P., Bolnick, D. I., Berner, D., & Peichel, C. L. (2009). Along the speciation continuum in sticklebacks. Journal of Fish Biology, 75(8), 2000–2036. https://doi.org/10.1111/j.1095-8649.2009.02419.x

Hendry, A. P. (2009). Ecological speciation! Or the lack thereof? Canadian Journal of Fisheries and Aquatic Sciences, 66(8), 1383–1398. https://doi.org/10.1139/F09-074

Hey, J. (2006). Recent advances in assessing gene flow between diverging populations and species. Current Opinion in Genetics and Development, 16(6), 592–596. http://doi.org/10.1016/j.gde.2006.10.005

Hieter, P. & Boguski, M. (1997). Functional genomics: it's all how you read it. Science, 278(5338), 601-602. http://dx.doi.org/10.1126/science.278.5338.601

Hill, W. G. (2010). Understanding and using quantitative genetic variation. Philosophical Transactions of the Royal Society B: Biological Sciences, 365(1537), 73–85. https://doi.org/10.1098/rstb.2009.0203

Hoang, D. T., Chernomor, O., Von Haeseler, A., Minh, B. Q., & Vinh, L. S. (2017). UFBoot2: improving the ultrafast bootstrap approximation. Molecular Biology and Evolution, 35(2), 518-522.

Huang, H., & Knowles, L. L. (2016). Unforeseen consequences of excluding missing data from next-generation sequences: simulation study of RAD sequences. Systematic Biology, 65, 357-365.

Hughes, J., Goudkamp, K., Hurwood, D., & Hancock, M. (2003). Translocation Causes Extinction of a Local Population of the Freshwater Shrimp Paratya australiensis. Conservation Biology, 17(4), 1007–1012.

Hughes, J. M., Schmidt, D. J., & Finn, D. S. (2009). Genes in streams: using DNA to understand the movement of freshwater fauna and their riverine habitat. BioScience, 59(7), 573–583.

Hundsdoerfer, A. K., Lee, K. M., Kitching, I. J., & Mutanen, M. (2019). Genome-wide SNP data reveal an overestimation of species diversity in a group of hawkmoths. Genome Biology and Evolution, 11(8), 2136–2150. https://doi.org/10.1093/gbe/evz113

Hurwood, D. A., & Hughes, J. M. (1998). Phylogeography of the freshwater fish, Mogurnda adspersa, in streams of northeastern Queensland, Australia: evidence for altered drainage patterns. Molecular Ecology, 7(11), 1507-1517.

Hurwood, D. A., Hughes, J. M., Bunn, S. E., & Cleary, C. (2003). Population structure in the freshwater shrimp (Paratya australiensis) inferred from allozymes and mitochondrial DNA. Heredity, 90(1), 64–70. http://doi.org/10.1038/sj.hdy.6800179

Imafuku, M. (1986). Sexual discrimination in the hermit crab Pagurus geminus. Journal of Ethology, 4(1), 39-47.

Jivoff, P., & Hines, A. H. (1998). Female behaviour, sexual competition and mate guarding in the blue crab, Callinectes sapidus. Animal Behaviour, 55(3), 589-603.

Jónsson, H., Schubert, M., Seguin-Orlando, A., Ginolhac, A., Petersen, L., Fumagalli, M., Albrechtsen, A., Petersen, B., Korneliussen, T. S., Vilstrup, J. T., Lear, T., Myka, J. L., Lundquist, J., Miller, D. C., Alfarhan, A. H., Alquraishi, S. A., Al-Rasheid, K. A. S., Stagegaard, J., Strauss, G., Bertelsen, M. F., Sicheritz-Ponten, T., Antczak, F. F., Bailey, E., Nielsen, R., Willerslev, E. & Orlando, L. (2014). Speciation with gene flow in equids despite extensive chromosomal plasticity. Proceedings of the National Academy of Sciences of the United States of America, 111(52), 18655–60. http://doi.org/10.1073/pnas.1412627111

Jung, H., Lyons, R. E., Dinh, H., Hurwood, D. A., McWilliam, S., & Mather, P. B. (2011). Transcriptomics of a giant freshwater prawn (Macrobrachium rosenbergii): De novo assembly, annotation and marker discovery. PLoS ONE, 6(12). https://doi.org/10.1371/journal.pone.0027938

Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K., von Haeseler, A., & Jermiin, L. S. (2017). ModelFinder: fast model selection for accurate phylogenetic estimates. Nature methods, 14(6), 587.

Kim, S., Plagnol, V., Hu, T. T., Toomajian, C., Clark, R. M., Ossowski, S., Ecker, J. R., Weigel, D. & Nordborg, M. (2007). Recombination and linkage disequilibrium in Arabidopsis thaliana. Nature genetics, 39(9), 1151.

King, M., & Wilson, A. C. (1975). Evolution at two levels in humans and chimpanzees. Science, 188(4184), 107–116. https://doi.org/10.1126/science.1090005

Knowlton, N., Weigt, L. A., Solórzano, L. A., Mills, D. K. & Bermingham, E. (1993). Divergence in proteins, mitochondrial DNA, and reproductive compatibility across the Isthmus of Panama. Science, 260, 1629-1632.

Kuballa, A. V., & Elizur, A. (2008). Differential expression profiling of components associated with exoskeletal hardening in crustaceans. BMC Genomics, 9(1), 575.

Lanfear, R., Calcott, B., Ho, S. Y., & Guindon, S. (2012). PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution, 29(6), 1695-1701.

Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357–359. https://doi.org/10.1038/nmeth.1923

Lazareva, A. A., Roman, G., Mattox, W., Hardin, P. E., & Dauwalder, B. (2007). A role for the adult fat body in Drosophila male courtship behavior. PLoS Genetics, 3(1), 0115–0122. https://doi.org/10.1371/journal.pgen.0030016

Leder, E. H., McCairns, R. J. S., Leinonen, T., Cano, J. M., Viitaniemi, H. M., Nikinmaa, M., Primmer, C. R. & Merila, J. (2015). The evolution and adaptive potential of transcriptional variation in sticklebacks - Signatures of selection and widespread heritability. Molecular Biology and Evolution, 32(3), 674–689. https://doi.org/10.1093/molbev/msu328

Lee, M. S. Y. (2003). Species concepts and species reality: Salvaging a Linnaean rank. Journal of Evolutionary Biology, 16(2), 179–188. https://doi.org/10.1046/j.1420-9101.2003.00520.x

Lega, M., Fior, S., Prosser, F., Bertolli, A., Li, M., & Varotto, C. (2012). Application of the unified species concept reveals distinct lineages for disjunct endemics of the Brassica repanda (Brassicaceae) complex. Biological Journal of the Linnean Society, 106(3), 482-497.

Levin, D. A., Francisco-Ortega, J., & Jansen, R. K. (1996). Hybridization and the extinction of rare plant species. Conservation Biology, 10(1), 10–16. https://doi.org/10.1046/j.1523-1739.1996.10010010.x

Lewontin, R. C. (1983). The organism as the subject and object of evolution. Scientia (Milan) 118, 65-82.

Lexer, C., Lai, Z., & Rieseberg, L. H. (2004). Candidate gene polymorphisms associated with salt tolerance in wild sunflower hybrids: Implications for the origin of Helianthus paradoxus, a diploid hybrid species. New Phytologist, 161(1), 225–233. http://doi.org/10.1046/j.1469-8137.2003.00925.x

Li, B., & Dewey, C. N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics, 12(1), 323. https://doi.org/10.1186/1471-2105-12-323

Machordom, A. & Macpherson, E. (2004). Rapid radiation and cryptic speciation in squat lobsters of the genus Munida (Crustacea, Decapoda) and related genera in the South West Pacific: molecular and morphological evidence. Molecular Phylogenetics and Evolution, 33(2), 259-279. http://dx.doi.org/10.1016/j.ympev.2004.06.001

Malay, M. C. M. D., & Paulay, G. (2010). Peripatric speciation drives diversification and distributional pattern of reef hermit crabs (Decapoda: Diogenidae: Calcinus). Evolution: International Journal of Organic Evolution, 64(3), 634-662.

Maldonado-Coelho, M., Marini, M. Â., do Amaral, F. R., & Ribon, R. (2017). The invasive species rules: competitive exclusion in forest avian mixed-species flocks in a fragmented landscape. Revista Brasileira de Ornitologia, 25(1), 54-59.

Mallet, J. (2007). Hybrid speciation. Nature, 446, 279–283. https://doi.org/10.1038/nature05706

Mallet, J., Meyer, A., Nosil, P., & Feder, J. L. (2009). Space, sympatry and speciation. Journal of Evolutionary Biology, 22(11), 2332–2341. http://doi.org/10.1111/j.1420-9101.2009.01816.x

Manichaikul, A., Mychaleckyj, J. C., Rich, S. S., Daly, K., Sale, M., & Chen, W. M. (2010). Robust relationship inference in genome-wide association studies. Bioinformatics, 26(22), 2867-2873.

Martin, S. H., Dasmahapatra, K. K., Nadeau, N. J., Salazar, C., Walters, J. R., Simpson, F., Blaxter, M., Manica, A., Mallet, J. & Jiggins, C. D. (2013). Genome-wide evidence for speciation with gene flow in Heliconius butterflies. Genome Research, 23, 1817–1828. http://doi.org/10.1101/gr.159426.113

Mathews, L. (2006). Cryptic biodiversity and phylogeographical patterns in a snapping shrimp species complex. Molecular Ecology, 15(13), 4049-4063. http://dx.doi.org/10.1111/j.1365-294x.2006.03077.x

Mayden, R. L. (1997). A hierarchy of species concepts: the denouement in the saga of the species problem. In M. F. Claridge, H. A. Dawah, & M. R. Wilson (Eds.), The units of biodiversity (pp. 381–423). London: Chapman and Hall.

Mayr, E. (1942). Systematics and the origin of species, from the viewpoint of a zoologist. Harvard University Press.

Mayr, E. (1995). Species, classification, and evolution. Biodiversity and Evolution. National Science Museum Foundation, Tokyo, 3-122.

Mayr, E. (2000a). The Biological Species Concept. In Q. Wheeler & R. Meier (Eds.), Species Concepts and Phylogenetic Theory (pp. 17-29). New York: Columbia University Press.

Mayr, E. (2000b). A critique from the biological species concept perspective: what is a species, and what is not? In Q. Wheeler & R. Meier (Eds.), Species Concepts and Phylogenetic Theory (pp. 93-100). New York: Columbia University Press.

Metz, E. C., & Palumbi, S. R. (1996). Positive selection and sequence rearrangements generate extensive polymorphism in the gamete recognition protein bindin. Molecular Biology and Evolution, 13(2), 397-406.

Meusel, F., & Schwentner, M. (2017). Molecular and morphological delimitation of Australian Triops species (Crustacea: Branchiopoda: Notostraca)—large diversity and little morphological differentiation. Organisms Diversity and Evolution, 17(1), 137– 156. https://doi.org/10.1007/s13127-016-0306-2

Minelli, A. (1993). Biological Systematics: The State of the Art. Chapman & Hall, New York.

Miranda, I., Gomes, K. M., Ribeiro, F. B., Araujo, P. B., Souty-Grosset, C., & Schubart, C. D. (2018). Molecular systematics reveals multiple lineages and cryptic speciation in the freshwater crayfish Parastacus brasiliensis (von Martens, 1869) (Crustacea : Decapoda : Parastacidae). Invertebrate Systematics, 32(6), 1265. https://doi.org/10.1071/is18012

Mishler, B. D. & Theriot, E. C. (2000). The phylogenetic species concept (sensu Mishler and Theriot): monophyly, apomorphy, and phylogenetic species concepts. In Q. Wheeler & R. Meier (Eds.), Species Concepts and Phylogenetic Theory (pp. 44-54). New York: Columbia University Press.

McKitrick, M. C., & Zink, R. M. (1988). Species concepts in ornithology. The Condor, 90(1), 1–14.

McNamara, J. C., & Faria, S. C. (2012). Evolution of osmoregulatory patterns and gill ion transport mechanisms in the decapod Crustacea: A review. Journal of Comparative Physiology B: Biochemical, Systemic, and Environmental Physiology, 182(8), 997–1014. https://doi.org/10.1007/s00360-012-0665-8

Mihola, O., Trachtulec, Z., Vlcek, C., Schimenti, J. C., & Forejt, J. (2009). A mouse speciation gene encodes a meiotic histone H3 methyltransferase. Science, 323(5912), 373–5. http://doi.org/10.1126/science.1163601

Miranda, I., Gomes, K. M., Ribeiro, F. B., Araujo, P. B., Souty-Grosset, C., & Schubart, C. D. (2019). Molecular systematics reveals multiple lineages and cryptic speciation in the freshwater crayfish Parastacus brasiliensis (von Martens, 1869) (Crustacea: Decapoda: Parastacidae). Invertebrate Systematics, 32(6), 1265-1281.

Moore, J. H. & Hu, T. (2015). Epistasis analysis using information theory. In: Moore J. & S. Williams (eds) Epistasis. Methods in molecular biology (methods and protocols), vol 1253. Humana Press, New York, NY.

Moritz, C., & Cicero, C. (2004). DNA barcoding: Promise and pitfalls. PLoS Biology, 2(10), 1529–1531. http://doi.org/10.1371/journal.pbio.0020354

Morozova, O. & Marra, M. (2008). Applications of next-generation sequencing technologies in functional genomics. Genomics, 92(5), 255-264. http://dx.doi.org/10.1016/j.ygeno.2008.07.001

Morris, T. (1991). A comparison of the distribution and abundance of the freshwater shrimp, Paratya australiensis and Caridina mccullochi (Decapoda: Atyidae), in the Lower River Murray. B. Sc (Hons.) Thesis, Department of Zoology, University of Adelaide, Adelaide.

Moshtaghi, A., Rahi, M. L., Nguyen, V. T., Mather, P. B., & Hurwood, D. A. (2016). A transcriptomic scan for potential candidate genes involved in osmoregulation in an obligate freshwater palaemonid prawn (Macrobrachium australiense). PeerJ, 4, e2520. https://doi.org/10.7717/peerj.2520

Muller, H. J. (1942). Isolating mechanisms, evolution and temperature. Biological Symposia, 6, 71-125.

Nguyen, L. T., Schmidt, H. A., von Haeseler, A., & Minh, B. Q. (2014). IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution, 32(1), 268-274.

Niemiller, M. L., Fitzpatrick, B. M., & Miller, B. T. (2008). Recent divergence with gene flow in Tennessee cave salamanders (Plethodontidae: Gyrinophilus) inferred from gene genealogies. Molecular Ecology, 17(9), 2258–2275. http://doi.org/10.1111/j.1365-294X.2008.03750.x

Nixon, K. C., & Wheeler, Q. D. (1990). An amplification of the phylogenetic species concept. Cladistics, 6(3), 211–223. http://doi.org/10.1111/j.1096-0031.1990.tb00541.x

Nixon, K. C., & Wheeler, Q. D. (1992). Extinction and the origin of species. In M. J. Novacek and Q. D. Wheeler (eds.) Extinction and Phylogeny (119-143). New York: Columbia University Press.

Noor, M. A. F. (2002). Is the biological species concept showing its age? Trends in Ecology and Evolution, 17(4), 153–154. http://doi.org/10.1016/S0169-5347(02)02452-7

Nosil, P., & Feder, J. L. (2012). Genomic divergence during speciation: Causes and consequences. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1587), 332–342. https://doi.org/10.1098/rstb.2011.0263

Nosil, P., Feder, J. L., Flaxman, S. M., & Gompert, Z. (2017). Tipping points in the dynamics of speciation. Nature Ecology and Evolution, 1(2), 1–8. https://doi.org/10.1038/s41559-016-0001

Nosil, P., & Schluter, D. (2011). The genes underlying the process of speciation. Trends in Ecology and Evolution, 26(4), 160–167. https://doi.org/10.1016/j.tree.2011.01.001

Nosil, P., Harmon, L. J., & Seehausen, O. (2009). Ecological explanations for (incomplete) speciation. Trends in Ecology and Evolution, 24(3), 145–156. https://doi.org/10.1016/j.tree.2008.10.011

Nosil, P., Vines, T. H., & Funk, D. J. (2005). Perspective: Reproductive isolation caused by natural selection against immigrants from divergent habitats. Evolution, 59, 705–719. https://doi.org/10.1111/j.0014-3820.2005.tb01747.x

Novak, P. A., Bayliss, P., Crook, D. A., Garcia, E. A., Pusey, B. J., & Douglas, M. M. (2017). Do upstream migrating, juvenile amphidromous shrimps, provide a marine subsidy to river ecosystems? Freshwater Biology, 62(5), 880–893. https://doi.org/10.1111/fwb.12907

Orr, H. A. (2001). Some doubts about (yet another) view of species. Journal of Evolutionary Biology, 14, 870–871. http://doi.org/10.1046/j.1420-9101.2001.00340.x

Orr, H. A., & Turelli, M. (2001). The evolution of postzygotic isolation: accumulating Dobzhansky‐Muller incompatibilities. Evolution, 55(6), 1085-1094.

Page, T. J., Baker, A. M., Cook, B. D., & Hughes, J. M. (2005). Historical transoceanic dispersal of a freshwater shrimp: The colonization of the South Pacific by the genus Paratya (Atyidae). Journal of Biogeography, 32(4), 581–593. https://doi.org/10.1111/j.1365-2699.2004.01226.x

Page, T. J., Von Rintelen, K., & Hughes, J. M. (2007). An island in the stream: Australia’s place in the cosmopolitan world of Indo-West Pacific freshwater shrimp (Decapoda: Atyidae: Caridina). Molecular Phylogenetics and Evolution, 43(2), 645-659.

Palumbi, S. R., & Metz, E. C. (1991). Strong reproductive isolation between closely related tropical sea urchins (genus Echinometra). Molecular Biology and Evolution, 8(2), 227-239.

Papadopulos, A. S. T., & Baker, W. J. (2011). Speciation with gene flow on Lord Howe Island. Proceedings of the National Academy of Sciences of the United States of America, 108(32), 1–6. http://doi.org/10.1073/pnas.1106085108

Pareek, C. S., Smoczynski, R., & Tretyn, A. (2011). Sequencing technologies and genome sequencing. Journal of Applied Genetics, 52(4), 413–435. http://doi.org/10.1007/s13353-011-0057-x

Paterson, H. E. H. (1985). The recognition concept of species. In E. S. Vrba (ed.), Species and Speciation (21-29). Transvaal Museum Monograph No. 4, Pretoria.

Paterson, H. E. H. (1993). Evolution and the recognition concept of species. Collected Writings of H. E. H. Paterson (ed. S. F. McEvey). Baltimore, Maryland: Johns Hopkins University Press.

Pawlowski, J., Esling, P., Lejzerowicz, F., Cedhagen, T., & Wilding, T. A. (2014). Environmental monitoring through protist next-generation sequencing metabarcoding: assessing the impact of fish farming on benthic foraminifera communities. Molecular Ecology Resources, 14(6), 1129–1140. https://doi.org/10.1111/1755-0998.12261

Payseur, B. A. (2010). Using differential introgression in hybrid zones to identify genomic regions involved in speciation. Molecular Ecology Resources, 10(5), 806–820. https://doi.org/10.1111/j.1755-0998.2010.02883.x

Peregrino-Uriarte, A. B., Varela-Romero, A., Muhlia-Almazán, A., Anduro-Corona, I., Vega-Heredia, S., Gutiérrez-Millán, L. E., De la Rosa-Vélez, J. & Yepiz-Plascencia, G. (2009). The complete mitochondrial genomes of the yellowleg shrimp Farfantepenaeus californiensis and the blue shrimp Litopenaeus stylirostris (Crustacea: Decapoda). Comparative Biochemistry and Physiology Part D: Genomics and Proteomics, 4(1), 45-53.

Perina, G., Camacho, A. I., Huey, J., Horwitz, P., & Koenders, A. (2019). The role of allopatric speciation and ancient origins of Bathynellidae (Crustacea) in the Pilbara (Western Australia): two new genera from the De Grey River catchment. Contributions to Zoology, 88(4), 452-497.

Phadnis, N., & Orr, H. A. (2009). A single gene causes both male sterility and segregation distortion in Drosophila hybrids. Science, 323(5912), 376–379. http://doi.org/10.1126/science.1163934

Philippe, H., Delsuc, F., Brinkmann, H., & Lartillot, N. (2005). Phylogenomics. Annual Review of Ecology, Evolution, and Systematics, 36, 541–562. http://doi.org/10.1146/annurev.ecolsys.35.112202.130205

Pinho, C., Harris, D. J., & Ferrand, N. (2007). Comparing patterns of nuclear and mitochondrial divergence in a cryptic species complex: The case of Iberian and North African wall lizards (Podarcis, Lacertidae). Biological Journal of the Linnean Society, 91(1), 121–133. https://doi.org/10.1111/j.1095-8312.2007.00774.x

Ponniah, M., & Hughes, J. M. (2006). The evolution of Queensland spiny mountain crayfish of the genus Euastacus. II. Investigating simultaneous vicariance with intraspecific genetic data. Marine and Freshwater Research, 57(3), 349-362.

Prachumwat, A., & Li, W. H. (2008). Gene number expansion and contraction in vertebrate genomes with respect to invertebrate genomes. Genome Research, 18(2), 221-232.

Presgraves, D. C., Balagopalan, L., Abmayr, S. M., & Orr, H. A. (2003). Adaptive evolution drives divergence of a hybrid inviability gene between two species of Drosophila. Nature, 423(6941), 715–719. http://doi.org/10.1038/nature01679

Qin, J., Li, J., Gao, Q., Wilson, J. J., & Zhang, A. B. (2019). Mitochondrial phylogeny and comparative mitogenomics of closely related pine moth pests (Lepidoptera: Dendrolimus). PeerJ, 7, e7317.

Rabosky, D. L. (2016). Reproductive isolation and the causes of speciation rate variation in nature. Biological Journal of the Linnean Society, 118(1), 13–25. https://doi.org/10.1111/bij.12703

Radwan, J., & Babik, W. (2012). The genomics of adaptation. Proceedings of the Royal Society B: Biological Sciences. https://doi.org/10.1098/rspb.2012.2322

Rahi, M. L., Amin, S., Mather, P. B., & Hurwood, D. A. (2017). Candidate genes that have facilitated freshwater adaptation by palaemonid prawns in the genus Macrobrachium: identification and expression validation in a model species (M. koombooloomba). PeerJ, 5, e2977. https://doi.org/10.7717/peerj.2977

Rahi, M. L., Mather, P. B., Ezaz, T., & Hurwood, D. A. (2019). The molecular basis of freshwater adaptation in prawns: Insights from comparative transcriptomics of three Macrobrachium species. Genome Biology and Evolution, 11(4), 1002–1018. https://doi.org/10.1093/gbe/evz045

Raubenheimer, D., & Crowe, T. M. (1987). The recognition concept of species: is it really an alternative? South African Journal of Science, 83(September), 530–534.

Riek, E. F. (1953). The Australian freshwater prawns of the family Atyidae. Records of the Australian Museum, 23, 111-121.

Rieseberg, L. H., Zona, S., Aberbom, L., & Martin, T. D. (1989). Hybridization in the island endemic, Catalina Mahogany. Conservation Biology, 3(1), 52–58.

Rieseberg, L. H., Raymond, O., Rosenthal, D. M., Lai, Z., Livingstone, K., Nakazato, T., Durphy, J. L., Schwarzbach, A. E., Donovan, L. A. & Lexer, C. (2003). Major ecological transitions in wild sunflowers facilitated by hybridization. Science, 301(5637), 1211–1216. http://doi.org/10.1126/science.1086949

Richter, S., & Scholtz, G. (2001). Phylogenetic analysis of the Malacostraca (Crustacea). Journal of Zoological Systematics and Evolutionary Research, 39(3), 113-136. http://dx.doi.org/10.1046/j.1439-0469.2001.00164.x

Robledo, D., Palaiokostas, C., Bargelloni, L., Martínez, P., & Houston, R. (2017). Applications of genotyping by sequencing in aquaculture breeding and genetics. Reviews in Aquaculture, 1–13. https://doi.org/10.1111/raq.12193

Rosen, D. E. (1978). Vicariant patterns and historical explanation in biogeography. Systematic Zoology, 27(2), 159–188.

Schön, I., Pieri, V., Sherbakov, D. Y., & Martens, K. (2017). Cryptic diversity and speciation in endemic Cytherissa (Ostracoda, Crustacea) from Lake Baikal. Hydrobiologia, 800(1), 61–79. https://doi.org/10.1007/s10750-017-3259-3

Schumer, M., Cui, R., Powell, D. L., Rosenthal, G. G., & Andolfatto, P. (2016). Ancient hybridization and genomic stabilization in a swordtail fish. Molecular Ecology, 2661–2679. https://doi.org/10.1111/mec.13602

Schlüter, P. M. (2018). The magic of flowers or: speciation genes and where to find them. American Journal of Botany, 105(12), 1957–1961. https://doi.org/10.1002/ajb2.1193

Schlüter, P. M., Xu, S., Gagliardini, V., Whittle, E., Shanklin, J., Grossniklaus, U., & Schiestl, F. P. (2011). Stearoyl-acyl carrier protein desaturases are associated with floral isolation in sexually deceptive orchids. Proceedings of the National Academy of Sciences of the United States of America, 108(14), 5696–5701. https://doi.org/10.1073/pnas.1013313108

Seehausen, O., Butlin, R. K., Keller, I., Wagner, C. E., Boughman, J. W., Hohenlohe, P. A., Peichel, C.L., Saetre, G., Bank, C., Brannstrom, A., Brelsford, A., Clarkson, C.S., Eroukhmanoff, F., Feder, J.L., Fischer, M.C., Foote, A.D., Franchini, P., Jiggins, C.D., Jones, F.C., Lindholm, A.K., Lucek, K., Maan, M.E., Marques, D.A., Martin, S.H., Matthews, B., Meier, J.I., Most, M., Nachman, M.W., Nonaka, E., Rennison, D.J., Schwarzer, J., Watson, E.T., Westram, A.M., Widmer, A. (2014). Genomics and the origin of species. Nature Reviews Genetics, 15(3), 176–192. https://doi.org/10.1038/nrg3644

Servedio, M. R., Doorn, G. S. Van, Kopp, M., Frame, A. M., & Nosil, P. (2011). Magic traits in speciation: “magic” but not rare? Trends in Ecology and Evolution, 26(8), 389–397. https://doi.org/10.1016/j.tree.2011.04.005

Shanwu, T., & Presgraves, D. C. (2009). Evolution of the Drosophila nuclear pore complex results in multiple hybrid incompatibilities. Science, 323(5915), 779–782. http://doi.org/10.1126/science.1169123

Shaw, K. L., & Mullen, S. P. (2014). Speciation continuum. Journal of Heredity, 105, 741–742. https://doi.org/10.1016/B978-0-12-800049-6.00080-9

Shull, G. H. (1923). The species concept from the point of view of a geneticist. American Journal of Botany, 10(5), 221–228. http://doi.org/10.2307/2435375

Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., & Zdobnov, E. M. (2015). BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31(19), 3210–3212. http://doi.org/10.1093/bioinformatics/btv351

Simon, S., Narechania, A., DeSalle, R., & Hadrys, H. (2012). Insect phylogenomics: exploring the source of incongruence using new transcriptomic data. Genome Biology and Evolution, 4(12), 1295-1309. http://dx.doi.org/10.1093/gbe/evs104

Simpson, G. G. (1951). The species concept. Evolution, 5(4), 285-298.

Simpson, G. G. (1961). Principles of Animal Taxonomy. New York: Columbia University Press.

Smith, M. J., & Williams, W. D. (1980). Infraspecific variation within the Atyidae: A study of morphological variation within a population of Paratya australiensis (Crustacea: Decapoda). Australian Journal of Marine and Freshwater Research, 31, 397-407.

Sneath, P. H. (1976). Phenetic taxonomy at the species level and above. Taxon, 25, 437-450.

Sneath, P. H., & Sokal, R. R. (1973). Numerical taxonomy. The principles and practice of numerical classification.

Sokal, R. R., & Crovello, T. J. (1970). The biological species concept: a critical evaluation. The American Naturalist, 104(936), 127–153. http://dx.doi.org/10.1086/282646

Stuessy, T. F. (1990). Plant Taxonomy. The Systematic Evaluation of Comparative Data. New York: Columbia University Press.

Sweigart, A. L., Fishman, L., & Willis, J. H. (2006). A simple genetic incompatibility causes hybrid male sterility in Mimulus. Genetics, 172(4), 2465–2479. http://doi.org/10.1534/genetics.105.053686

Templeton, A. R. (1989). The meaning of species and speciation: a genetic perspective. In D. Otte & J. A. Endler (eds.), Speciation and its Consequences (pp. 3-27). Sinauer Associates, Sunderland, MA.

Ting, C. T., Tsaur, S. C., & Wu, C. I. (2000). The phylogeny of closely related species as revealed by the genealogy of a speciation gene, Odysseus. Proceedings of the National Academy of Sciences of the United States of America, 97(10), 5313–5316. http://doi.org/10.1073/pnas.090541597

Todesco, M., Pascual, M. A., Owens, G. L., Ostevik, K. L., Moyers, B. T., Hübner, S., Heredia, A. M., Hahn, M. A., Caseys, C., Bock, D. G. & Rieseberg, L. H. (2016). Hybridization and extinction. Evolutionary Applications, 9(7), 892–908. https://doi.org/10.1111/eva.12367

Unmack, P. J., Sandoval-Castillo, J., Hammer, M. P., Adams, M., Raadik, T. A., & Beheregaray, L. B. (2017). Genome-wide SNPs resolve a key conflict between sequence and allozyme data to confirm another threatened candidate species of river blackfishes (Teleostei: Percichthyidae: Gadopsis). Molecular Phylogenetics and Evolution, 109, 415–420. https://doi.org/10.1016/j.ympev.2017.02.013

Upadhyaya, N. M., Pereira, A., & Watson, J. M. (2010). Biotech crops and functional genomics. In Transgenic Crop Plants (pp. 359-390). Springer Berlin Heidelberg.

Van Der Auwera, G. A., Carneiro, M. O., Hartl, C., Poplin, R., Levy-Moonshine, A., Jordan, T., Shakir, K., Roazen, D., Thibault, J., Banks, E., Garimella, K. V., Altshuler, D., Gabriel, S., & DePristo, M. A. (2014). From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. In Current Protocols in Bioinformatics (Vol. 11). https://doi.org/10.1002/0471250953.bi1110s43

Van Der Wal, C., Ahyong, S. T., Ho, S. Y. W., Lins, L. S. F., & Lo, N. (2019). Combining morphological and molecular data resolves the phylogeny of Squilloidea (Crustacea : Malacostraca). Invertebrate Systematics, (March). https://doi.org/10.1071/is18035

Van Valen, L. (1976). Ecological species, multispecies, and oaks. Taxon, 25(2/3), 233–239. http://doi.org/10.2307/1219444

Vera-Silva, A. L., Carvalho, F. L., & Mantelatto, F. L. (2016). Distribution and Genetic Differentiation of Macrobrachium Jelskii (Natantia: Palaemonidae) in Brazil Reveal Evidence of Non-Natural Introduction and Cryptic Allopatric Speciation. Journal of Crustacean Biology, 36(3), 373-383.

Vogt, G. (2013). Abbreviation of larval development and extension of brood care as key features of the evolution of freshwater Decapoda. Biological Reviews, 88, 81-116.

Vogt, G., Falckenhayn, C., Schrimpf, A., Schmid, K., Hanna, K., Panteleit, J., Helm, M., Schulz, R., & Lyko, F. (2015). The marbled crayfish as a paradigm for saltational speciation by autopolyploidy and parthenogenesis in animals. Biology Open, 4(11), 1583-1594.

von Rintelen, K., Page, T., Cai, Y., Roe, K., Stelbrink, B., Kuhajda, B., Iliffe, T. M., Hughes, J., & von Rintelen, T. (2012). Drawn to the dark side: a molecular phylogeny of freshwater shrimps (Crustacea: Decapoda: Caridea: Atyidae) reveals frequent cave invasions and challenges current taxonomic hypotheses. Molecular Phylogenetics and Evolution, 63(1), 82-96. http://dx.doi.org/10.1016/j.ympev.2011.12.015

Vuillaume, B., Valette, V., Lepais, O., Grandjean, F., & Breuil, M. (2015). Genetic evidence of hybridization between the endangered native species Iguana delicatissima and the invasive Iguana iguana (Reptilia, Iguanidae) in the Lesser Antilles: Management implications. PLoS ONE, 10(6), 1–20. https://doi.org/10.1371/journal.pone.0127575

Walker, T.M. (1972). A study of the morphology, taxonomy, biology and some aspects of the ecology of Paratya australiensis Kemp from Tasmania. B. Sc (Hons.) Thesis, Zoology Department, University of Tasmania, Hobart.

Walsh, C. J., & Mitchell, B. D. (1995). The freshwater shrimp Paratya australiensis (Kemp, 1917) (Decapoda: Atyidae) in estuaries of south-western Victoria, Australia. Marine and Freshwater Research, 46, 959-965.

Wang, R. J., & Hahn, M. W. (2018). Speciation genes are more likely to have discordant gene trees. Evolution Letters, 2(4), 281–296. https://doi.org/10.1002/evl3.77

Ward, R. D., Zemlak, T. S., Innes, B. H., Last, P. R., & Hebert, P. D. N. (2005). DNA barcoding Australia’s fish species. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 360, 1847–1857. http://doi.org/10.1098/rstb.2005.1716

Weber, A. A.-T., Abi-Rached, L., Galtier, N., Bernard, A., Montoya-Burgos, J. I., & Chenuil, A. (2017). Positive selection on sperm ion channels in a brooding brittle star: consequence of life-history traits evolution. Molecular Ecology, 26, 3744–3759. https://doi.org/10.1111/mec.14024

Wei, J., Zhang, X., Yu, Y., Huang, H., Li, F., & Xiang, J. (2014). Comparative transcriptomic characterization of the early development in Pacific white shrimp Litopenaeus vannamei. PLoS One, 9, 1-13.

Westhoff, J. T., & Rabeni, C. F. (2013). Resource selection and space use of a native and an invasive crayfish: evidence for competitive exclusion? Freshwater Science, 32(4), 1383–1397. https://doi.org/10.1899/13-036.1

Wiley, E. O. (1978). The evolutionary species concept reconsidered. Systematic Zoology, 27(1), 17–26. http://doi.org/10.2307/2412809

Wiley, E. O. & Mayden, R. L. (2000). The Evolutionary Species Concept. In Q. Wheeler & R. Meier (Eds.), Species Concepts and Phylogenetic Theory (pp. 55-69). New York: Columbia University Press.

Will, K. W., Mishler, B. D., & Wheeler, Q. D. (2005). The perils of DNA barcoding and the need for integrative taxonomy. Systematic Biology, 54(5), 844–851. https://doi.org/10.1080/10635150500354878

Williams, W.D. (1977). Some aspects of the ecology of Paratya australiensis (Crustacea: Decapoda: Atyidae). Australian Journal of Marine and Freshwater Research, 30, 815-832.

Williams, W. D. (1980). Australian Freshwater Life. Melbourne: Macmillan.

Williams, W. D., & Smith, M. J. (1979). A taxonomic revision of Australian species of Paratya (Crustacea: Atyidae). Australian Journal of Marine and Freshwater Research, 30, 815–832.

Wilson, J. D., Schmidt, D. J., & Hughes, J. M. (2016). Movement of a hybrid zone between lineages of the Australian glass shrimp (Paratya australiensis). Journal of Heredity, 107, 413–422. http://doi.org/10.1093/jhered/esw033

Wolf, D. E., Takebayashi, N., & Rieseberg, L. H. (2001). Predicting the risk of extinction through hybridization. Conservation Biology, 15, 1039–1053.

Won, Y., Hallam, S. J., O’Mullan, G. D., & Vrijenhoek, R. C. (2003). Cytonuclear disequilibrium in a hybrid zone involving deep-sea hydrothermal vent mussels of the genus Bathymodiolus. Molecular Ecology, 12(11), 3185–3190. https://doi.org/10.1046/j.1365-294X.2003.01974.x

Wu, C. I. (2001). The genic view of the process of speciation. Journal of Evolutionary Biology, 14(6), 851–865. http://doi.org/10.1046/j.1420-9101.2001.00335.x

Xie, X., Qiu, W. G., & Lipke, P. N. (2011). Accelerated and adaptive evolution of yeast sexual adhesins. Molecular Biology and Evolution, 28(11), 3127–3137. https://doi.org/10.1093/molbev/msr145

Yamanoi, T., Yoshino, K., Kon, K., & Goshima, S. (2006). Delayed copulation as a means of female choice by the hermit crab Pagurus filholi. Journal of Ethology, 24(3), 213-218.

Yang, Z. (2007). PAML 4: a program package for phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution, 24, 1586-1591.

Yang, Z., & Nielsen, R. (2000). Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution, 17(1), 32–43.

Young, K. A. (2004). Asymmetric competition, habitat selection, and niche overlap in juvenile salmonids. Ecology, 85(1), 134-149.

Yue, G. H., & Wang, L. (2017). Current status of genome sequencing and its application in aquaculture. Aquaculture, 468, 337-347.

Zhang, Z., Schwartz, S., Wagner, L., & Miller, W. (2000). A greedy algorithm for aligning DNA sequences. Journal of Computational Biology, 7(1-2), 203-214.

Zigler, K. S., McCartney, M. A., Levitan, D. R., & Lessios, H. A. (2005). Sea urchin bindin divergence predicts gamete compatibility. Evolution, 59(11), 2399. http://doi.org/10.1554/05-098.1

Appendices

Appendix A

Figure A1 Top hit species distribution chart

Figure A2 Expression pattern of the differentially expressed transcripts for individuals collected from Stony Creek (SC) and Kilcoy Creek (KC). Yellow transcripts are upregulated (highly expressed) and purple transcripts are lowly expressed. Graph A shows gene expression at e-3 with a log fold change of 2. Graph B shows gene expression at e-10 with a log fold change of 2.

Table A1 Abundance estimation (TPM) of identified genes potentially involved in reproduction, temperature tolerance, osmoregulation and egg size control in P. australiensis based on GO terms from blast hits.

Transcript ID | Gene name | GO terms | Kilcoy | Stony
DN19103_c1_g1_i1 | 26S protease subunit regulatory subunit 6a | proteasome regulatory particle, base subcomplex; determination of adult lifespan; embryo development ending in birth or egg hatching; embryo sac development; reproduction | 3.15 | 0.77
DN18230_c2_g1_i1 | abnormal spindle-like microcephaly-associated homolog | gamete generation; cell division; developmental process involved in reproduction; forebrain development | 1.4 | 0.99
DN35441_c16_g1_i1 | ankyrin repeat-containing | reproduction | 5.64 | 36.33
DN32119_c1_g1_i1 | ATPase family AAA domain-containing 3-like | multicellular organism development; reproduction; mitochondrion organisation | 2.30 | 0
DN25402_c0_g1_i1 | Dihydrolipoyl dehydrogenase | mitochondrial matrix; locomotion; determination of adult lifespan; embryo development ending in birth or egg hatching; reproduction | 1.33 | 0.51
DN34779_c13_g1_i1 | EMB- isoform a | hatching; locomotion; embryo development ending in birth or egg hatching; reproduction | 4.19 | 3.23
DN35586_c3_g1_i1 | high-density lipo -binding | hermaphrodite genitalia development; locomotion; embryo development ending in birth or egg hatching; body morphogenesis; reproduction | 5.1 | 0.02
DN35152_c0_g1_i1 | LET- isoform a | hatching; embryo development ending in birth or egg hatching; reproduction; positive regulation of multicellular organism growth | 14.14 | 0.01
DN35152_c12_g1_i1 | LET- isoform b | hatching; embryo development ending in birth or egg hatching; reproduction; positive regulation of multicellular organism growth | 3.48 | 0
DN14719_c1_g1_i1 | LSM14 homolog A-like | cell cycle process; anatomical structure development; reproduction; single-organism developmental process | 36.05 | 24.81
DN34673_c3_g1_i1 | major facilitator superfamily transporter | receptor-mediated endocytosis; reproduction | 4.65 | 8.43
DN16911_c0_g1_i1 | MAP kinase 5 | developmental process involved in reproduction; multicellular organism development; response to stress | 1.64 | 0.31
DN28047_c0_g1_i1 | maternal embryonic leucine zipper kinase | developmental process involved in reproduction | 0.21 | 0
DN19991_c2_g1_i1 | osteoclast-stimulating factor | reproduction | 6.37 | 8.66
DN35581_c0_g1_i1 | peritrophin A | extracellular space; chitin binding; multicellular organism reproduction | 255.69 | 20.46
DN35485_c7_g1_i1 | Protein Y47G6A.14 | zinc ion binding; reproduction | 2.93 | 6.12
DN32172_c0_g1_i1 | SCY1 2 | ATP binding; multicellular organism reproduction; protein kinase activity | 1.45 | 0.18
DN12518_c1_g1_i1 | t1 st2 receptor binding | regulation of post-mating oviposition; neurogenesis; reproduction | 1.07 | 0.26
DN14787_c0_g2_i1 | trypsin Inhibitor like cysteine rich domain | hermaphrodite genitalia development; locomotion; embryo development ending in birth or egg hatching; reproduction | 2.35 | 0
DN24177_c1_g1_i1 | Tyrosine- kinase Src42A | reproduction; innate immune response | 2.7 | 0
DN35080_c15_g1_i1 | tyrosine- phosphatase non-receptor type 4-like | embryonic body morphogenesis; reproduction | 12.05 | 19.44
DN35532_c1_g1_i1 | ubiquitin-conjugating enzyme E2-17 kDa-like isoform 2 | DNA repair; reproduction | 24.93 | 21.53
DN25381_c3_g1_i1 | WD repeat-containing 43 | anatomical structure morphogenesis; reproduction | 2.81 | 0
DN37590_c0_g1_i1 | 10 kDa chaperonin | response to heat; chaperone binding | 9.61 | 0
DN30157_c0_g1_i1 | A Chain Tricyclic Series Of Hsp90 Inhibitors | ATPase activity; response to estrogen; chaperone binding; response to salt stress; response to cold; response to heat | 1.57 | 0.79
DN35514_c28_g1_i1 | aconitate cytoplasmic | response to salt stress; response to temperature stimulus | 2.84 | 0
DN34770_c34_g1_i1 | Biotin-- ligase | response to heat; histone biotinylation | 0.7 | 3.79
DN28188_c0_g1_i1 | endoplasmin homolog | ATPase activity; response to salt stress; response to cadmium ion; response to water deprivation; response to cold; response to heat | 1 | 0
DN25306_c1_g1_i1 | family chaperone | ATP binding; heat shock protein binding; response to heat | 5.26 | 0
DN30970_c5_g1_i1 | heat shock | positive regulation of response to oxidative stress; response to oxidative stress; response to heat | 2.06 | 0
DN32372_c3_g1_i1 | heat shock 70 kDa | response to heat; response to virus | 1.08 | 0.98
DN35343_c16_g1_i1 | heat shock 86 | response to heat | 8.09 | 0
DN35167_c22_g1_i1 | heat shock 90 | chaperone binding; receptor complex; response to salt stress; regulation of cellular response to heat; response to cold | 1.48 | 0
DN32372_c0_g1_i1 | Heat shock cognate 71 kDa | heat shock protein binding; male meiosis I; response to cold; response to heat | 9.96 | 3.42
DN27610_c2_g1_i1 | mitogen-activated kinase 10 isoform X5 | inflammatory response; response to osmotic stress; response to heat | 1.87 | 0
DN25237_c0_g1_i1 | serine threonine- phosphatase 2A 56 kDa regulatory subunit alpha isoform-like | response to heat; dauer larval development; determination of adult lifespan | 1.56 | 0
DN30845_c2_g1_i1 | transient receptor potential channel pyrexia-like | potassium ion transport; response to heat; cation channel activity; calcium ion transport; cation channel complex | 0.65 | 0.29
DN33567_c0_g1_i1 | V-type proton ATPase catalytic subunit A-like | water transport; response to temperature stimulus; cell morphogenesis | 2.05 | 0.25
DN34847_c14_g1_i1 | aquaporin | integral component of plasma membrane; glycerol channel activity; cellular water homeostasis; ion transmembrane transport; water channel activity | 37.31 | 34.79
DN34528_c2_g1_i1 | Arginine kinase | ATP binding; phosphorylation; kinase activity | 5270.22 | 7480.59
DN25466_c0_g1_i1 | calreticulin | calcium ion binding; unfolded protein binding; protein folding | 388.21 | 427.93
DN35846_c0_g1_i1 | carbonic anhydrase 1 | zinc ion binding; carbonate dehydratase activity; one-carbon metabolic process | 109.7 | 58.3
DN33253_c2_g1_i1 | carbonic anhydrase 15-like | zinc ion binding; carbonate dehydratase activity; one-carbon metabolic process | 36.76 | 35.31
DN39338_c0_g1_i1 | H+-ATPase V-type subunit | vacuolar proton-transporting V-type ATPase, V0 domain; ATP hydrolysis coupled proton transport | 24.46 | 15.23
DN35183_c6_g1_i1 | sodium potassium ATPase alpha subunit | sodium ion export from cell; sodium:potassium-exchanging ATPase activity involved in regulation of cardiac muscle cell membrane potential; sodium:potassium-exchanging ATPase complex | 51.69 | 28.97
DN12636_c1_g1_i1 | Sodium potassium calcium exchanger Nckx30C | sodium ion transport; transmembrane transport; calcium, potassium:sodium antiporter activity; calcium ion transport | 1.29 | 0.64
DN35350_c10_g1_i1 | water-specific aquaporin | glycerol transmembrane transporter activity; channel activity; ion transport; transmembrane transport; water transmembrane transporter activity | 3.17 | 1.86
DN35168_c11_g1_i1 | vitellogenin | lipid transporter activity; oogenesis | 12.76 | 8.27
DN10421_c1_g1_i1 | Mothers against DPP | ovarian follicle cell development | 6.54 | 7.49
DN35616_c4_g1_i1 | Mothers against DPP 3 | post-embryonic development; immune system development; response to hypoxia; immune response; in utero embryonic development; heart looping | 4.96 | 7.2

Table A2 Abundance estimation (TPM) of differentially expressed genes associated with SNPs in Kilcoy Creek and Stony Creek. Transcript ID Gene name GO term Kilcoy Stony

DN14761_c0_g1 Uncharacterised 17.89 0.01 DN37214_c0_g1 60S ribosomal L37-A structural constituent of ribosome; metal ion binding; cytosolic large ribosomal subunit; 48.37 0.03 rRNA binding; translation DN14526_c1_g1 delta 4 regulation of neural retina development; multicellular organism development; cell 11.02 0.01 communication; negative regulation of cell proliferation; cell adhesion; negative regulation of blood vessel endothelial cell proliferation involved in sprouting angiogenesis; Notch binding; homophilic cell adhesion via plasma membrane adhesion molecules; negative regulation of Notch signaling pathway; positive regulation of Notch signaling pathway; Notch signaling involved in heart development; negative regulation of cell migration involved in sprouting angiogenesis; negative regulation of transcription from RNA polymerase II promoter; cardiac ventricle morphogenesis; angiogenesis; branching involved in blood vessel morphogenesis; cardiac atrium morphogenesis; calcium ion binding; positive regulation of neural precursor cell proliferation; dorsal aorta morphogenesis; negative regulation of endothelial cell migration; negative regulation of blood vessel endothelial cell migration; zinc ion binding; cellular response to fibroblast growth factor stimulus; ventricular trabecula myocardium morphogenesis; pericardium morphogenesis; plasma membrane; ventral spinal cord interneuron fate commitment; T cell differentiation; negative regulation of gene expression; blood vessel remodeling; Notch signaling pathway; blood vessel lumenization; positive regulation of gene expression; membrane; integral component of membrane; cellular response to vascular endothelial growth factor stimulus; regulation of neurogenesis

DN34691_c1_g1 Uncharacterised 7.63 0.01 DN37041_c0_g1 Hematopoietic magnesium ion binding; prostaglandin-D synthase activity; protein homodimerization 27.01 0.00 prostaglandin D synthase activity; calcium ion binding; negative regulation of male germ cell proliferation; prostaglandin metabolic process; transferase activity

DN18716_c0_g1 urate oxidase single-organism metabolic process 10.71 0.00

DN25388_c0_g1 Uncharacterised 8.33 0.00

DN39296_c0_g1 Uncharacterised 25.58 0.02

Appendices 135

DN28673_c1_g1 Sushi domain-containing cell-matrix adhesion; scavenger receptor activity; cell adhesion; homophilic cell 7.45 0.00 2 adhesion via plasma membrane adhesion molecules; receptor-mediated endocytosis; plasma membrane; negative regulation of cell cycle G1/S phase transition; protein binding; immune response; extracellular exosome; negative regulation of cell division; calcium ion binding; membrane; integral component of membrane; polysaccharide binding DN39578_c0_g1 Uncharacterised 27.03 0.01 DN38477_c0_g1 tektin A1 intracellular organelle; sperm midpiece; cilium movement involved in cell motility; 9.99 0.00 regulation of fertilization; sperm principal piece DN19087_c0_g1 lysyl-tRNA synthetase ATP binding; microtubule cytoskeleton; lysine-tRNA ligase activity; mitochondrion; 10.30 0.00 aminoacyl-tRNA synthetase multienzyme complex; lysyl-tRNA aminoacylation; tRNA binding; amino acid binding; diadenosine tetraphosphate biosynthetic process

DN15195_c1_g1 delta 4 chemotaxis; heart development; protein binding; movement of cell or subcellular 35.60 0.05 component; cell communication; neuron differentiation; negative regulation of multicellular organismal process; cellular response to stimulus; anatomical structure morphogenesis; negative regulation of cellular process; regulation of multicellular organismal development DN36008_c0_g1 Uncharacterised 16.34 0.00

DN37192_c0_g1 Uncharacterised 52.29 0.01 DN38138_c0_g1 ribosomal L11 chordate embryonic development; structural constituent of ribosome; nematode larval 27.31 0.00 development; apoptotic process; glucose homeostasis; hemoglobin biosynthetic process; erythrocyte development; cytosolic large ribosomal subunit; rRNA binding; ribosomal large subunit assembly; reproduction; translation DN39540_c0_g1 mitochondrial ATP ATPase activity, coupled to transmembrane movement of ions, rotational mechanism; 12.27 0.00 synthase gamma chain proton transport; hydrogen ion transmembrane transporter activity; cation-transporting ATPase activity; proton-transporting ATP synthase complex; ATP biosynthetic process; mitochondrial inner membrane DN5775_c0_g1 dynein intermediate chain outer dynein arm; microtubule motor activity; protein binding; outer dynein arm 4.70 0.00 axonemal assembly; cilium movement; sperm flagellum; determination of left/right symmetry DN34691_c3_g1 SCO-spondin-like cell differentiation; nervous system development 10.75 0.01

DN8515_c0_g1 | plasma membrane proteolipid 3 | integral component of membrane | 42.35 | 0.11
DN38855_c0_g1 | translationally-controlled tumor -like isoform 1 | extracellular space; stem cell population maintenance; negative regulation of ectoderm development; cytoplasm; vesicle; intracellular non-membrane-bounded organelle; transcription factor binding; single-organism cellular process; nucleoplasm | 66.92 | 0.02
DN35128_c1_g1 | Uncharacterised | | 303.88 | 0.10
DN37433_c0_g1 | protocadherin Fat 4-like | multicellular organism development; collagen trimer; cell-cell signaling; cell adhesion; peptidase activity; homophilic cell adhesion via plasma membrane adhesion molecules; extracellular region; plasma membrane; extracellular space; calcium ion binding; membrane; proteolysis; integral component of membrane; hydrolase activity | 16.27 | 0.01
DN8068_c0_g1 | extracellular protease | hydrolase activity | 122.54 | 0.07
DN6977_c0_g1 | Uncharacterised | | 42.43 | 0.02
DN15708_c1_g1 | 60S ribosomal L3-like | peroxidase activity; response to oxidative stress; structural constituent of ribosome; calcium ion binding; integral component of membrane; cellular oxidant detoxification; oxidation-reduction process; ribosome; heme binding; translation | 26.36 | 0.02
DN34888_c12_g1 | Uncharacterised | | 21.99 | 0.00
DN26243_c0_g1 | heat shock 70 kDa 4 | ion binding | 5.60 | 0.00
DN34691_c8_g1 | Uncharacterised | | 11.93 | 0.00
DN34892_c0_g1 | Uncharacterised | | 8.85 | 0.00
DN36441_c0_g1 | serine protease | hydrolase activity | 35.41 | 0.00
DN25141_c2_g1 | glutamine--fructose-6-phosphate aminotransferase [isomerizing] 1-like isoform X1 | skeletal system development; glutamine-fructose-6-phosphate transaminase (isomerizing) activity; fructose 6-phosphate metabolic process; cell differentiation; UDP-N-acetylglucosamine metabolic process; animal organ development; tissue development | 13.50 | 0.00
DN36630_c0_g1 | hypothetical protein CGI_10022443 | not annotated | 21.09 | 0.00

DN35879_c1_g1 | S-adenosylmethionine synthase isoform type-1 | ATP binding; selenomethionine adenosyltransferase activity; sulfur amino acid metabolic process; selenium compound metabolic process; metal ion binding; cytosol; S-adenosylmethionine biosynthetic process; methionine adenosyltransferase activity; one-carbon metabolic process; methylation | 14.48 | 0.00
DN38770_c0_g1 | Uncharacterised | | 40.05 | 0.00
DN36828_c0_g1 | elongation factor-2 kinase | not annotated | 11.83 | 0.00
DN36313_c0_g1 | serine--tRNA cytoplasmic-like | selenocysteinyl-tRNA(Sec) biosynthetic process; ATP binding; extracellular exosome; cytoplasm; branching involved in blood vessel morphogenesis; serine-tRNA ligase activity; seryl-tRNA aminoacylation | 10.95 | 0.00
DN23311_c1_g1 | mitochondrial ATP synthase subunit | proton-transporting ATP synthase complex, catalytic core F(1); ATP binding; ATP hydrolysis coupled proton transport; ATP synthesis coupled proton transport; proton-transporting ATP synthase activity, rotational mechanism | 11.01 | 0.00
DN38634_c0_g1 | elongation factor 1-beta | response to ethanol; protein binding; translation elongation factor activity; endoplasmic reticulum; translational elongation; eukaryotic translation elongation factor 1 complex | 29.58 | 0.03
DN36035_c0_g1 | hsc70-interacting isoform X2 | cytoplasm | 15.30 | 0.00
DN18961_c0_g1 | Uncharacterised | | 10.32 | 0.00
DN5768_c0_g1 | heat shock 60 | ATP binding; mitochondrial matrix; fin regeneration; protein refolding | 11.25 | 0.01
DN38891_c0_g1 | predicted protein | carbohydrate metabolic process; extracellular region; hydrolase activity, hydrolyzing O-glycosyl compounds; cellulose binding | 40.54 | 0.02
DN25026_c1_g1 | Brain tumor | cytoplasm; negative regulation of ribosome biogenesis | 16.78 | 0.01
DN30329_c3_g1 | EGF-like domain-containing | cell-matrix adhesion; chitin binding; calcium ion binding; chitin metabolic process; membrane; integral component of membrane; cell adhesion; extracellular region; extracellular matrix structural constituent; proteinaceous extracellular matrix | 13.49 | 0.00
DN35566_c10_g1 | Uncharacterised | | 7.22 | 0.06
DN13441_c0_g1 | Uncharacterised | | 24.18 | 0.00

DN19845_c1_g1 | argonaute family member | nucleic acid binding; response to other organism; regulation of gene expression; intracellular part; single-organism process; cellular process | 6.93 | 0.00
DN30544_c0_g1 | chymotrypsinogen 2 | hydrolase activity | 30.30 | 0.03
DN38656_c0_g1 | Uncharacterised | | 17.72 | 0.02
DN39208_c0_g1 | Uncharacterised | | 45.71 | 0.03
DN37323_c0_g1 | CCAAT-enhancer binding delta | DNA binding; single-organism process; cellular process | 9.32 | 0.00
DN34079_c3_g1 | eukaryotic initiation factor 4A-III-like | translation initiation factor activity; ATP binding; RNA splicing; catalytic step 2 spliceosome; regulation of gene expression; RNA secondary structure unwinding; translational initiation; ATP-dependent RNA helicase activity | 7.27 | 0.00
DN10559_c0_g1 | SET | negative regulation of histone acetylation; heterocyclic compound binding; signal transduction; DNA replication; endoplasmic reticulum; regulation of mRNA stability; protein complex; protein phosphorylation; nucleosome disassembly; nucleosome assembly; nucleoplasm; negative regulation of neuron apoptotic process; protein phosphatase inhibitor activity; nucleocytoplasmic transport; perinuclear region of cytoplasm; organic cyclic compound binding; negative regulation of transcription, DNA-templated; histone binding; regulation of catalytic activity | 13.44 | 0.00
DN35601_c12_g1 | Uncharacterised | | 20.62 | 0.01
DN35141_c31_g1 | heat shock 70 | ATP binding; 2-alkenal reductase [NAD(P)] activity; oxidation-reduction process | 32.64 | 0.02
DN32479_c1_g1 | NADH dehydrogenase subunit 5 (mitochondrion) | oxidoreductase activity; mitochondrion; integral component of membrane; oxidation-reduction process | 37.41 | 0.02
DN36616_c0_g1 | cathepsin D | anatomical structure development; peptidase activity; single-organism developmental process | 9.14 | 0.00
DN7215_c0_g1 | Uncharacterised | | 122.84 | 0.08
DN36077_c0_g1 | hypothetical protein VICG_01136 | membrane; integral component of membrane | 13.90 | 0.00

DN33501_c0_g1 | ATP synthase subunit mitochondrial-like | myelin sheath; ATP hydrolysis coupled proton transport; MHC class I protein binding; proton-transporting ATPase activity, rotational mechanism; plasma membrane; mitochondrial proton-transporting ATP synthase complex; proton-transporting ATP synthase activity, rotational mechanism; proton-transporting ATP synthase complex, catalytic core F(1); ATP binding; extracellular exosome; lipid metabolic process; negative regulation of endothelial cell proliferation; COP9 signalosome; ATP synthesis coupled proton transport | 17.09 | 0.01
DN19522_c0_g1 | Uncharacterised | | 10.02 | 0.00
DN35531_c2_g1 | Uncharacterised | | 11.03 | 0.00
DN36042_c0_g1 | 60 kDa neurofilament | intracellular part | 12.94 | 0.01
DN10163_c0_g1 | Uncharacterised | | 18.17 | 0.00
DN8026_c0_g1 | EF-hand domain-containing family member C2 | biological_process; calcium ion binding; cellular_component | 6.53 | 0.00
DN38590_c0_g1 | stress-associated endoplasmic reticulum 2-like | protein binding; endoplasmic reticulum unfolded protein response; integral component of membrane; protein glycosylation; endoplasmic reticulum; transport | 23.05 | 0.00
DN34492_c7_g1 | unknown | not annotated | 15.12 | 0.00
DN37115_c0_g1 | inactive pancreatic lipase-related 1 | extracellular space; triglyceride lipase activity | 15.65 | 0.00
DN2460_c0_g1 | Uncharacterised | | 80.88 | 0.06
DN1724_c0_g1 | carboxypeptidase A2 | peptidase activity, acting on L-amino acid peptides | 17.22 | 0.01
DN35298_c1_g1 | myosin heavy chain | myosin complex; ATP binding; actin binding; cytoplasm; motor activity | 16.60 | 0.01
DN38764_c0_g1 | Uncharacterised | | 28.97 | 0.01
DN36553_c0_g1 | Uncharacterised | | 27.94 | 0.04
DN35260_c22_g1 | Uncharacterised | | 0.00 | 12.70
DN35318_c5_g1 | Uncharacterised | | 0.10 | 12.82
DN18783_c0_g1 | Uncharacterised | | 0.00 | 36.69
DN736_c0_g1 | serine protease 27-like | peptidase activity | 0.02 | 45.45

DN35318_c6_g1 | Uncharacterised | | 0.15 | 16.72

Appendix B

Table B1 List of differentially expressed transcripts and their annotation

Trinity ID | Gene Name
TRINITY_DN44167_c0_g1_i1 | undescribed
TRINITY_DN19164_c1_g1_i1 | Pyrophosphate-energized vacuolar membrane proton pump
TRINITY_DN12711_c0_g1_i1 | Tenascin-X
TRINITY_DN43393_c0_g1_i1 | 60S ribosomal protein L4-B
TRINITY_DN43064_c0_g1_i1 | undescribed
TRINITY_DN41360_c0_g1_i1 | 60S ribosomal protein L7
TRINITY_DN37959_c2_g1_i1 | undescribed
TRINITY_DN43534_c0_g1_i1 | Elongation factor 2
TRINITY_DN44512_c0_g1_i1 | 60S acidic ribosomal protein P0
TRINITY_DN40376_c0_g1_i1 | 60S ribosomal protein
TRINITY_DN5253_c0_g2_i1 | undescribed
TRINITY_DN38719_c8_g1_i1 | tubulin beta chain
TRINITY_DN40343_c0_g1_i1 | undescribed
TRINITY_DN42224_c0_g1_i1 | 60S ribosomal L18
TRINITY_DN4212_c0_g1_i1 | 40S ribosomal protein S6-A
TRINITY_DN42508_c0_g1_i1 | 60S ribosomal protein L10
TRINITY_DN40027_c0_g1_i1 | 40S ribosomal protein S4
TRINITY_DN41253_c0_g1_i1 | 60S ribosomal protein L8-3
TRINITY_DN42801_c0_g1_i1 | Ribosomal protein L3
TRINITY_DN43778_c0_g1_i1 | 60S ribosomal protein L15
TRINITY_DN43654_c0_g1_i1 | 40S ribosomal protein
TRINITY_DN39715_c8_g1_i1 | undescribed
TRINITY_DN38958_c25_g1_i1 | undescribed
TRINITY_DN38676_c37_g1_i1 | undescribed
TRINITY_DN32669_c0_g1_i1 | undescribed
TRINITY_DN6337_c0_g1_i1 | undescribed
TRINITY_DN37954_c3_g1_i1 | undescribed
TRINITY_DN37959_c4_g1_i1 | undescribed
TRINITY_DN37954_c0_g1_i1 | undescribed
TRINITY_DN7805_c0_g1_i1 | undescribed
TRINITY_DN34406_c0_g1_i1 | undescribed
TRINITY_DN36801_c1_g1_i1 | 60S ribosomal protein L27a
TRINITY_DN41820_c0_g1_i1 | 40S ribosomal protein
TRINITY_DN25785_c0_g1_i1 | undescribed
TRINITY_DN39351_c3_g1_i1 | Tubulin alpha chain
TRINITY_DN42848_c0_g1_i1 | 60S ribosomal protein L10a

TRINITY_DN43110_c0_g1_i1 | 60S ribosomal protein L19
TRINITY_DN44623_c0_g1_i1 | Zinc finger protein GIS2
TRINITY_DN42865_c0_g1_i1 | 60S ribosomal protein L35
TRINITY_DN24632_c0_g1_i1 | Zonadhesin
TRINITY_DN38624_c6_g1_i1 | undescribed
TRINITY_DN39965_c4_g1_i1 | undescribed
TRINITY_DN7504_c1_g1_i1 | undescribed
TRINITY_DN4342_c1_g1_i1 | Trypsin-1
TRINITY_DN38242_c10_g1_i1 | undescribed
TRINITY_DN39678_c0_g1_i1 | undescribed
TRINITY_DN38213_c7_g2_i1 | undescribed
TRINITY_DN38641_c11_g1_i1 | undescribed
TRINITY_DN39705_c44_g1_i1 | undescribed
TRINITY_DN37930_c3_g1_i1 | CD209 antigen-like protein
TRINITY_DN11596_c1_g1_i1 | undescribed
TRINITY_DN11596_c0_g1_i1 | undescribed
TRINITY_DN38847_c23_g1_i1 | undescribed
TRINITY_DN39715_c17_g1_i1 | undescribed
TRINITY_DN21014_c0_g1_i1 | Histone H2B.2
TRINITY_DN14706_c0_g1_i1 | undescribed
TRINITY_DN42303_c0_g1_i1 | undescribed
TRINITY_DN37773_c5_g1_i1 | Histone H2AX
TRINITY_DN35116_c3_g1_i1 | Histone H4
TRINITY_DN38019_c3_g1_i1 | Histone H3
TRINITY_DN5253_c0_g1_i1 | undescribed
TRINITY_DN39680_c10_g1_i1 | Hemocyanin C chain
TRINITY_DN39965_c20_g1_i1 | Hemocyanin B chain
TRINITY_DN39680_c1_g1_i1 | Hemocyanin A chain
TRINITY_DN39056_c6_g1_i1 | Lipase 3
TRINITY_DN38847_c23_g2_i1 | undescribed
TRINITY_DN39612_c11_g1_i1 | undescribed
TRINITY_DN39612_c2_g1_i1 | C-type lectin domain family member D3
TRINITY_DN38767_c41_g1_i1 | undescribed
TRINITY_DN32397_c0_g1_i1 | undescribed
TRINITY_DN39294_c7_g1_i1 | Cytochrome b
TRINITY_DN39294_c3_g1_i1 | NADH-ubiquinone oxidoreductase chain 4
TRINITY_DN39931_c6_g1_i1 | NADH-ubiquinone oxidoreductase chain 4
TRINITY_DN39380_c6_g1_i1 | undescribed
TRINITY_DN38085_c2_g1_i1 | Hepatic lectin
TRINITY_DN39965_c9_g1_i1 | undescribed
TRINITY_DN38217_c36_g1_i1 | undescribed
TRINITY_DN39086_c28_g1_i1 | Heat shock protein 90

TRINITY_DN5819_c0_g1_i1 | Heat shock 70 kDa protein
TRINITY_DN38441_c4_g1_i1 | undescribed
TRINITY_DN13572_c1_g1_i1 | heat shock protein homolog
TRINITY_DN42608_c0_g1_i1 | 40S ribosomal protein S16
TRINITY_DN42051_c0_g1_i1 | Tubulin alpha chain
TRINITY_DN29916_c2_g1_i1 | undescribed
TRINITY_DN13572_c0_g1_i1 | heat shock protein homolog
TRINITY_DN32250_c0_g1_i1 | heat shock protein 90
TRINITY_DN24137_c0_g1_i1 | Actin nonmuscle
TRINITY_DN42969_c0_g1_i1 | Endoplasmic reticulum chaperone BiP
TRINITY_DN37826_c0_g1_i1 | undescribed
TRINITY_DN29916_c0_g1_i1 | undescribed
TRINITY_DN24926_c0_g1_i1 | Protein argonaute-2
TRINITY_DN38245_c1_g1_i1 | undescribed
TRINITY_DN42788_c0_g1_i1 | undescribed
TRINITY_DN44314_c0_g1_i1 | undescribed
TRINITY_DN41263_c0_g1_i1 | undescribed
TRINITY_DN40617_c0_g1_i1 | 40S ribosomal protein S17
TRINITY_DN42537_c0_g1_i1 | Elongation factor 1-alpha
TRINITY_DN42620_c0_g1_i1 | Polar tube protein 2
TRINITY_DN37044_c0_g1_i1 | undescribed
TRINITY_DN5468_c0_g1_i1 | undescribed
TRINITY_DN38360_c11_g1_i1 | undescribed
TRINITY_DN29165_c1_g1_i1 | undescribed
TRINITY_DN29165_c0_g1_i1 | undescribed
TRINITY_DN44455_c0_g1_i1 | undescribed
TRINITY_DN41189_c0_g1_i1 | Histone H4
TRINITY_DN40414_c0_g1_i1 | undescribed
TRINITY_DN41975_c0_g1_i1 | undescribed
TRINITY_DN40247_c0_g1_i1 | Histone H2B
TRINITY_DN42241_c0_g1_i1 | Histone H2A
TRINITY_DN42344_c0_g1_i1 | Histone H3.1
TRINITY_DN41753_c0_g1_i1 | undescribed
TRINITY_DN43764_c0_g1_i1 | undescribed
TRINITY_DN43823_c0_g1_i1 | undescribed
TRINITY_DN38332_c14_g1_i1 | Polyubiquitin
TRINITY_DN38884_c33_g1_i1 | undescribed
TRINITY_DN39867_c0_g1_i1 | Retrovirus-related Pol polyprotein from type-1 retrotransposable element R2DM
TRINITY_DN38972_c18_g1_i1 | Retrovirus-related Pol polyprotein from type-1 retrotransposable element R2

TRINITY_DN43387_c0_g1_i1 | undescribed
TRINITY_DN36482_c4_g1_i1 | undescribed
TRINITY_DN38103_c0_g1_i1 | Endoplasmic reticulum chaperone BiP
TRINITY_DN42959_c0_g1_i1 | undescribed
TRINITY_DN38234_c5_g1_i1 | Heat shock 70 kDa protein
TRINITY_DN38682_c15_g1_i1 | undescribed
TRINITY_DN5035_c0_g1_i1 | undescribed
TRINITY_DN38217_c34_g1_i1 | undescribed
TRINITY_DN38217_c19_g1_i1 | undescribed
TRINITY_DN14127_c0_g1_i1 | Cathepsin B
TRINITY_DN44261_c0_g1_i1 | Transmembrane protease serine 3
TRINITY_DN42025_c0_g1_i1 | 40S ribosomal protein SA
TRINITY_DN5161_c0_g1_i1 | 60S ribosomal protein L23a
TRINITY_DN43446_c0_g1_i1 | 60S ribosomal protein L5-B
TRINITY_DN5495_c0_g1_i1 | undescribed
TRINITY_DN39951_c18_g1_i1 | Protein HEG homolog 1
TRINITY_DN39382_c10_g1_i1 | Endoplasmic reticulum chaperone BiP
TRINITY_DN44127_c0_g1_i1 | Protein disulfide-isomerase
TRINITY_DN38865_c2_g1_i1 | undescribed
TRINITY_DN39400_c0_g1_i1 | CD109 antigen
TRINITY_DN39650_c6_g1_i1 | Coagulation factor VII
TRINITY_DN41017_c0_g1_i1 | Prostasin
TRINITY_DN3355_c0_g1_i1 | extracellular protease
TRINITY_DN40293_c0_g1_i1 | Ganglioside GM2 activator
TRINITY_DN25994_c0_g1_i1 | undescribed
TRINITY_DN27629_c0_g1_i1 | undescribed
TRINITY_DN43574_c0_g1_i1 | undescribed
TRINITY_DN42154_c0_g1_i1 | undescribed
TRINITY_DN39165_c19_g1_i1 | Ferritin

Appendix C

Figure C1 Principal Component analysis plot of the first four eigenvectors based on SNPs.

Table C1 Output of Manichaikul et al. (2010) relatedness test.

INDV1 | INDV2 | N_AaAa | N_AAaa | N1_Aa | N2_Aa | RELATEDNESS_PHI
Branch 1 | Branch 1 | 8612 | 0 | 8612 | 8612 | 0.5
Branch 1 | Kilcoy 1 | 3023 | 0 | 8612 | 6633 | 0.198295
Branch 1 | Kilcoy 2 | 2834 | 0 | 8612 | 7121 | 0.180131
Branch 1 | Kilcoy 3 | 2803 | 0 | 8612 | 6746 | 0.182511
Branch 1 | Stony 1 | 3700 | 0 | 8612 | 6579 | 0.243565
Branch 1 | Stony 2 | 3615 | 0 | 8612 | 6174 | 0.244488
Branch 1 | Stony 3 | 3669 | 0 | 8612 | 6493 | 0.2429
Branch 1 | Branch 2 | 4730 | 0 | 8612 | 18057 | 0.177359
Branch 1 | Branch 3 | 4140 | 0 | 8612 | 14831 | 0.176599
Branch 1 | Branch 4 | 3398 | 0 | 8612 | 10697 | 0.17598
Branch 1 | Branch 5 | 4958 | 0 | 8612 | 22875 | 0.157462
Branch 1 | Branch 6 | 5285 | 0 | 8612 | 10647 | 0.274417
Branch 1 | Branch 7 | 5040 | 0 | 8612 | 15555 | 0.208549
Branch 1 | Branch 8 | 4786 | 0 | 8612 | 18498 | 0.17654
Branch 1 | Branch 9 | 3057 | 0 | 8612 | 7063 | 0.195024
Kilcoy 1 | Branch 1 | 3023 | 0 | 6633 | 8612 | 0.198295
Kilcoy 1 | Kilcoy 1 | 6633 | 0 | 6633 | 6633 | 0.5
Kilcoy 1 | Kilcoy 2 | 4228 | 0 | 6633 | 7121 | 0.307401
Kilcoy 1 | Kilcoy 3 | 3755 | 0 | 6633 | 6746 | 0.280664
Kilcoy 1 | Stony 1 | 2281 | 0 | 6633 | 6579 | 0.172646
Kilcoy 1 | Stony 2 | 2294 | 0 | 6633 | 6174 | 0.179121
Kilcoy 1 | Stony 3 | 2358 | 0 | 6633 | 6493 | 0.179643
Kilcoy 1 | Branch 2 | 4163 | 0 | 6633 | 18057 | 0.168611
Kilcoy 1 | Branch 3 | 4398 | 0 | 6633 | 14831 | 0.204901
Kilcoy 1 | Branch 4 | 4092 | 0 | 6633 | 10697 | 0.236122
Kilcoy 1 | Branch 5 | 4457 | 0 | 6633 | 22875 | 0.151044
Kilcoy 1 | Branch 6 | 3034 | 0 | 6633 | 10647 | 0.175579
Kilcoy 1 | Branch 7 | 4182 | 0 | 6633 | 15555 | 0.18848
Kilcoy 1 | Branch 8 | 4080 | 0 | 6633 | 18498 | 0.162349
Kilcoy 1 | Branch 9 | 3601 | 0 | 6633 | 7063 | 0.262923
Kilcoy 2 | Branch 1 | 2834 | 0 | 7121 | 8612 | 0.180131
Kilcoy 2 | Kilcoy 1 | 4228 | 0 | 7121 | 6633 | 0.307401
Kilcoy 2 | Kilcoy 2 | 7121 | 0 | 7121 | 7121 | 0.5
Kilcoy 2 | Kilcoy 3 | 3744 | 0 | 7121 | 6746 | 0.269994
Kilcoy 2 | Stony 1 | 2373 | 0 | 7121 | 6579 | 0.173212
Kilcoy 2 | Stony 2 | 2305 | 0 | 7121 | 6174 | 0.173373

Kilcoy 2 | Stony 3 | 2334 | 0 | 7121 | 6493 | 0.171441
Kilcoy 2 | Branch 2 | 4659 | 0 | 7121 | 18057 | 0.185042
Kilcoy 2 | Branch 3 | 4375 | 0 | 7121 | 14831 | 0.199298
Kilcoy 2 | Branch 4 | 4068 | 0 | 7121 | 10697 | 0.228308
Kilcoy 2 | Branch 5 | 4732 | 0 | 7121 | 22875 | 0.157754
Kilcoy 2 | Branch 6 | 3064 | 0 | 7121 | 10647 | 0.172445
Kilcoy 2 | Branch 7 | 4321 | 0 | 7121 | 15555 | 0.190554
Kilcoy 2 | Branch 8 | 4116 | 0 | 7121 | 18498 | 0.160662
Kilcoy 2 | Branch 9 | 3548 | 0 | 7121 | 7063 | 0.250141
Kilcoy 3 | Branch 1 | 2803 | 0 | 6746 | 8612 | 0.182511
Kilcoy 3 | Kilcoy 1 | 3755 | 0 | 6746 | 6633 | 0.280664
Kilcoy 3 | Kilcoy 2 | 3744 | 0 | 6746 | 7121 | 0.269994
Kilcoy 3 | Kilcoy 3 | 6746 | 0 | 6746 | 6746 | 0.5
Kilcoy 3 | Stony 1 | 2059 | 0 | 6746 | 6579 | 0.154522
Kilcoy 3 | Stony 2 | 2150 | 0 | 6746 | 6174 | 0.166409
Kilcoy 3 | Stony 3 | 2145 | 0 | 6746 | 6493 | 0.162021
Kilcoy 3 | Branch 2 | 4237 | 0 | 6746 | 18057 | 0.170826
Kilcoy 3 | Branch 3 | 4627 | 0 | 6746 | 14831 | 0.214441
Kilcoy 3 | Branch 4 | 4927 | 0 | 6746 | 10697 | 0.282463
Kilcoy 3 | Branch 5 | 4246 | 0 | 6746 | 22875 | 0.143344
Kilcoy 3 | Branch 6 | 2814 | 0 | 6746 | 10647 | 0.161789
Kilcoy 3 | Branch 7 | 4363 | 0 | 6746 | 15555 | 0.195641
Kilcoy 3 | Branch 8 | 4032 | 0 | 6746 | 18498 | 0.159721
Kilcoy 3 | Branch 9 | 3975 | 0 | 6746 | 7063 | 0.287856
Stony 1 | Branch 1 | 3700 | 0 | 6579 | 8612 | 0.243565
Stony 1 | Kilcoy 1 | 2281 | 0 | 6579 | 6633 | 0.172646
Stony 1 | Kilcoy 2 | 2373 | 0 | 6579 | 7121 | 0.173212
Stony 1 | Kilcoy 3 | 2059 | 0 | 6579 | 6746 | 0.154522
Stony 1 | Stony 1 | 6579 | 0 | 6579 | 6579 | 0.5
Stony 1 | Stony 2 | 4061 | 0 | 6579 | 6174 | 0.318435
Stony 1 | Stony 3 | 4221 | 0 | 6579 | 6493 | 0.322904
Stony 1 | Branch 2 | 3729 | 0 | 6579 | 18057 | 0.151364
Stony 1 | Branch 3 | 3218 | 0 | 6579 | 14831 | 0.150304
Stony 1 | Branch 4 | 2613 | 0 | 6579 | 10697 | 0.15125
Stony 1 | Branch 5 | 4054 | 0 | 6579 | 22875 | 0.137638
Stony 1 | Branch 6 | 3814 | 0 | 6579 | 10647 | 0.221409
Stony 1 | Branch 7 | 3336 | 0 | 6579 | 15555 | 0.150718
Stony 1 | Branch 8 | 3583 | 0 | 6579 | 18498 | 0.14288
Stony 1 | Branch 9 | 2299 | 0 | 6579 | 7063 | 0.168524

Stony 2 | Branch 1 | 3615 | 0 | 6174 | 8612 | 0.244488
Stony 2 | Kilcoy 1 | 2294 | 0 | 6174 | 6633 | 0.179121
Stony 2 | Kilcoy 2 | 2305 | 0 | 6174 | 7121 | 0.173373
Stony 2 | Kilcoy 3 | 2150 | 0 | 6174 | 6746 | 0.166409
Stony 2 | Stony 1 | 4061 | 0 | 6174 | 6579 | 0.318435
Stony 2 | Stony 2 | 6174 | 0 | 6174 | 6174 | 0.5
Stony 2 | Stony 3 | 4181 | 0 | 6174 | 6493 | 0.33007
Stony 2 | Branch 2 | 3459 | 0 | 6174 | 18057 | 0.142751
Stony 2 | Branch 3 | 3096 | 0 | 6174 | 14831 | 0.147393
Stony 2 | Branch 4 | 2630 | 0 | 6174 | 10697 | 0.155889
Stony 2 | Branch 5 | 3843 | 0 | 6174 | 22875 | 0.132294
Stony 2 | Branch 6 | 3431 | 0 | 6174 | 10647 | 0.203971
Stony 2 | Branch 7 | 3202 | 0 | 6174 | 15555 | 0.147361
Stony 2 | Branch 8 | 3493 | 0 | 6174 | 18498 | 0.141577
Stony 2 | Branch 9 | 2300 | 0 | 6174 | 7063 | 0.173755
Stony 3 | Branch 1 | 3669 | 0 | 6493 | 8612 | 0.2429
Stony 3 | Kilcoy 1 | 2358 | 0 | 6493 | 6633 | 0.179643
Stony 3 | Kilcoy 2 | 2334 | 0 | 6493 | 7121 | 0.171441
Stony 3 | Kilcoy 3 | 2145 | 0 | 6493 | 6746 | 0.162021
Stony 3 | Stony 1 | 4221 | 0 | 6493 | 6579 | 0.322904
Stony 3 | Stony 2 | 4181 | 0 | 6493 | 6174 | 0.33007
Stony 3 | Stony 3 | 6493 | 0 | 6493 | 6493 | 0.5
Stony 3 | Branch 2 | 3645 | 0 | 6493 | 18057 | 0.148473
Stony 3 | Branch 3 | 3261 | 0 | 6493 | 14831 | 0.152926
Stony 3 | Branch 4 | 2679 | 0 | 6493 | 10697 | 0.155846
Stony 3 | Branch 5 | 4031 | 0 | 6493 | 22875 | 0.137258
Stony 3 | Branch 6 | 3608 | 0 | 6493 | 10647 | 0.210502
Stony 3 | Branch 7 | 3496 | 0 | 6493 | 15555 | 0.158563
Stony 3 | Branch 8 | 3559 | 0 | 6493 | 18498 | 0.142411
Stony 3 | Branch 9 | 2366 | 0 | 6493 | 7063 | 0.174535
Branch 2 | Branch 1 | 4730 | 0 | 18057 | 8612 | 0.177359
Branch 2 | Kilcoy 1 | 4163 | 0 | 18057 | 6633 | 0.168611
Branch 2 | Kilcoy 2 | 4659 | 0 | 18057 | 7121 | 0.185042
Branch 2 | Kilcoy 3 | 4237 | 0 | 18057 | 6746 | 0.170826
Branch 2 | Stony 1 | 3729 | 0 | 18057 | 6579 | 0.151364
Branch 2 | Stony 2 | 3459 | 0 | 18057 | 6174 | 0.142751
Branch 2 | Stony 3 | 3645 | 0 | 18057 | 6493 | 0.148473
Branch 2 | Branch 2 | 18057 | 0 | 18057 | 18057 | 0.5
Branch 2 | Branch 3 | 9961 | 0 | 18057 | 14831 | 0.302876

Branch 2 | Branch 4 | 7238 | 0 | 18057 | 10697 | 0.251721
Branch 2 | Branch 5 | 13544 | 0 | 18057 | 22875 | 0.33089
Branch 2 | Branch 6 | 5674 | 0 | 18057 | 10647 | 0.197673
Branch 2 | Branch 7 | 8809 | 0 | 18057 | 15555 | 0.262079
Branch 2 | Branch 8 | 11187 | 0 | 18057 | 18498 | 0.306032
Branch 2 | Branch 9 | 4398 | 0 | 18057 | 7063 | 0.17508
Branch 3 | Branch 1 | 4140 | 0 | 14831 | 8612 | 0.176599
Branch 3 | Kilcoy 1 | 4398 | 0 | 14831 | 6633 | 0.204901
Branch 3 | Kilcoy 2 | 4375 | 0 | 14831 | 7121 | 0.199298
Branch 3 | Kilcoy 3 | 4627 | 0 | 14831 | 6746 | 0.214441
Branch 3 | Stony 1 | 3218 | 0 | 14831 | 6579 | 0.150304
Branch 3 | Stony 2 | 3096 | 0 | 14831 | 6174 | 0.147393
Branch 3 | Stony 3 | 3261 | 0 | 14831 | 6493 | 0.152926
Branch 3 | Branch 2 | 9961 | 0 | 14831 | 18057 | 0.302876
Branch 3 | Branch 3 | 14831 | 0 | 14831 | 14831 | 0.5
Branch 3 | Branch 4 | 7179 | 0 | 14831 | 10697 | 0.281221
Branch 3 | Branch 5 | 10968 | 0 | 14831 | 22875 | 0.290882
Branch 3 | Branch 6 | 4719 | 0 | 14831 | 10647 | 0.185219
Branch 3 | Branch 7 | 8124 | 0 | 14831 | 15555 | 0.26736
Branch 3 | Branch 8 | 9149 | 0 | 14831 | 18498 | 0.274506
Branch 3 | Branch 9 | 4661 | 0 | 14831 | 7063 | 0.212889
Branch 4 | Branch 1 | 3398 | 0 | 10697 | 8612 | 0.17598
Branch 4 | Kilcoy 1 | 4092 | 0 | 10697 | 6633 | 0.236122
Branch 4 | Kilcoy 2 | 4068 | 0 | 10697 | 7121 | 0.228308
Branch 4 | Kilcoy 3 | 4927 | 0 | 10697 | 6746 | 0.282463
Branch 4 | Stony 1 | 2613 | 0 | 10697 | 6579 | 0.15125
Branch 4 | Stony 2 | 2630 | 0 | 10697 | 6174 | 0.155889
Branch 4 | Stony 3 | 2679 | 0 | 10697 | 6493 | 0.155846
Branch 4 | Branch 2 | 7238 | 0 | 10697 | 18057 | 0.251721
Branch 4 | Branch 3 | 7179 | 0 | 10697 | 14831 | 0.281221
Branch 4 | Branch 4 | 10697 | 0 | 10697 | 10697 | 0.5
Branch 4 | Branch 5 | 6897 | 0 | 10697 | 22875 | 0.205439
Branch 4 | Branch 6 | 3611 | 0 | 10697 | 10647 | 0.169181
Branch 4 | Branch 7 | 5319 | 0 | 10697 | 15555 | 0.202613
Branch 4 | Branch 8 | 6878 | 0 | 10697 | 18498 | 0.235588
Branch 4 | Branch 9 | 4207 | 0 | 10697 | 7063 | 0.236881
Branch 5 | Branch 1 | 4958 | 0 | 22875 | 8612 | 0.157462
Branch 5 | Kilcoy 1 | 4457 | 0 | 22875 | 6633 | 0.151044
Branch 5 | Kilcoy 2 | 4732 | 0 | 22875 | 7121 | 0.157754

Branch 5 | Kilcoy 3 | 4246 | 0 | 22875 | 6746 | 0.143344
Branch 5 | Stony 1 | 4054 | 0 | 22875 | 6579 | 0.137638
Branch 5 | Stony 2 | 3843 | 0 | 22875 | 6174 | 0.132294
Branch 5 | Stony 3 | 4031 | 0 | 22875 | 6493 | 0.137258
Branch 5 | Branch 2 | 13544 | 0 | 22875 | 18057 | 0.33089
Branch 5 | Branch 3 | 10968 | 0 | 22875 | 14831 | 0.290882
Branch 5 | Branch 4 | 6897 | 0 | 22875 | 10697 | 0.205439
Branch 5 | Branch 5 | 22875 | 0 | 22875 | 22875 | 0.5
Branch 5 | Branch 6 | 7270 | 0 | 22875 | 10647 | 0.216873
Branch 5 | Branch 7 | 11684 | 0 | 22875 | 15555 | 0.304033
Branch 5 | Branch 8 | 14227 | 0 | 22875 | 18498 | 0.343872
Branch 5 | Branch 9 | 5306 | 0 | 22875 | 7063 | 0.177233
Branch 6 | Branch 1 | 5285 | 0 | 10647 | 8612 | 0.274417
Branch 6 | Kilcoy 1 | 3034 | 0 | 10647 | 6633 | 0.175579
Branch 6 | Kilcoy 2 | 3064 | 0 | 10647 | 7121 | 0.172445
Branch 6 | Kilcoy 3 | 2814 | 0 | 10647 | 6746 | 0.161789
Branch 6 | Stony 1 | 3814 | 0 | 10647 | 6579 | 0.221409
Branch 6 | Stony 2 | 3431 | 0 | 10647 | 6174 | 0.203971
Branch 6 | Stony 3 | 3608 | 0 | 10647 | 6493 | 0.210502
Branch 6 | Branch 2 | 5674 | 0 | 10647 | 18057 | 0.197673
Branch 6 | Branch 3 | 4719 | 0 | 10647 | 14831 | 0.185219
Branch 6 | Branch 4 | 3611 | 0 | 10647 | 10697 | 0.169181
Branch 6 | Branch 5 | 7270 | 0 | 10647 | 22875 | 0.216873
Branch 6 | Branch 6 | 10647 | 0 | 10647 | 10647 | 0.5
Branch 6 | Branch 7 | 5854 | 0 | 10647 | 15555 | 0.223418
Branch 6 | Branch 8 | 6464 | 0 | 10647 | 18498 | 0.221788
Branch 6 | Branch 9 | 3074 | 0 | 10647 | 7063 | 0.173574
Branch 7 | Branch 1 | 5040 | 0 | 15555 | 8612 | 0.208549
Branch 7 | Kilcoy 1 | 4182 | 0 | 15555 | 6633 | 0.18848
Branch 7 | Kilcoy 2 | 4321 | 0 | 15555 | 7121 | 0.190554
Branch 7 | Kilcoy 3 | 4363 | 0 | 15555 | 6746 | 0.195641
Branch 7 | Stony 1 | 3336 | 0 | 15555 | 6579 | 0.150718
Branch 7 | Stony 2 | 3202 | 0 | 15555 | 6174 | 0.147361
Branch 7 | Stony 3 | 3496 | 0 | 15555 | 6493 | 0.158563
Branch 7 | Branch 2 | 8809 | 0 | 15555 | 18057 | 0.262079
Branch 7 | Branch 3 | 8124 | 0 | 15555 | 14831 | 0.26736
Branch 7 | Branch 4 | 5319 | 0 | 15555 | 10697 | 0.202613
Branch 7 | Branch 5 | 11684 | 0 | 15555 | 22875 | 0.304033
Branch 7 | Branch 6 | 5854 | 0 | 15555 | 10647 | 0.223418

Branch 7 | Branch 7 | 15555 | 0 | 15555 | 15555 | 0.5
Branch 7 | Branch 8 | 9400 | 0 | 15555 | 18498 | 0.27604
Branch 7 | Branch 9 | 4659 | 0 | 15555 | 7063 | 0.205986
Branch 8 | Branch 1 | 4786 | 0 | 18498 | 8612 | 0.17654
Branch 8 | Kilcoy 1 | 4080 | 0 | 18498 | 6633 | 0.162349
Branch 8 | Kilcoy 2 | 4116 | 0 | 18498 | 7121 | 0.160662
Branch 8 | Kilcoy 3 | 4032 | 0 | 18498 | 6746 | 0.159721
Branch 8 | Stony 1 | 3583 | 0 | 18498 | 6579 | 0.14288
Branch 8 | Stony 2 | 3493 | 0 | 18498 | 6174 | 0.141577
Branch 8 | Stony 3 | 3559 | 0 | 18498 | 6493 | 0.142411
Branch 8 | Branch 2 | 11187 | 0 | 18498 | 18057 | 0.306032
Branch 8 | Branch 3 | 9149 | 0 | 18498 | 14831 | 0.274506
Branch 8 | Branch 4 | 6878 | 0 | 18498 | 10697 | 0.235588
Branch 8 | Branch 5 | 14227 | 0 | 18498 | 22875 | 0.343872
Branch 8 | Branch 6 | 6464 | 0 | 18498 | 10647 | 0.221788
Branch 8 | Branch 7 | 9400 | 0 | 18498 | 15555 | 0.27604
Branch 8 | Branch 8 | 18498 | 0 | 18498 | 18498 | 0.5
Branch 8 | Branch 9 | 4909 | 0 | 18498 | 7063 | 0.19205
Branch 9 | Branch 1 | 3057 | 0 | 7063 | 8612 | 0.195024
Branch 9 | Kilcoy 1 | 3601 | 0 | 7063 | 6633 | 0.262923
Branch 9 | Kilcoy 2 | 3548 | 0 | 7063 | 7121 | 0.250141
Branch 9 | Kilcoy 3 | 3975 | 0 | 7063 | 6746 | 0.287856
Branch 9 | Stony 1 | 2299 | 0 | 7063 | 6579 | 0.168524
Branch 9 | Stony 2 | 2300 | 0 | 7063 | 6174 | 0.173755
Branch 9 | Stony 3 | 2366 | 0 | 7063 | 6493 | 0.174535
Branch 9 | Branch 2 | 4398 | 0 | 7063 | 18057 | 0.17508
Branch 9 | Branch 3 | 4661 | 0 | 7063 | 14831 | 0.212889
Branch 9 | Branch 4 | 4207 | 0 | 7063 | 10697 | 0.236881
Branch 9 | Branch 5 | 5306 | 0 | 7063 | 22875 | 0.177233
Branch 9 | Branch 6 | 3074 | 0 | 7063 | 10647 | 0.173574
Branch 9 | Branch 7 | 4659 | 0 | 7063 | 15555 | 0.205986
Branch 9 | Branch 8 | 4909 | 0 | 7063 | 18498 | 0.19205
Branch 9 | Branch 9 | 7063 | 0 | 7063 | 7063 | 0.5
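
The RELATEDNESS_PHI values in Table C1 can be recomputed from the count columns. The sketch below assumes the kinship coefficient takes the form phi = (N_AaAa - 2 x N_AAaa) / (N1_Aa + N2_Aa), which reproduces every value spot-checked from the table; because N_AAaa is 0 in all rows, the table itself cannot confirm the weight on that term, so it is taken here as an assumption about the Manichaikul et al. (2010) estimator. The function name and the choice of example rows are illustrative only.

```python
# Sketch: recompute RELATEDNESS_PHI for selected rows of Table C1.
# Assumed formula (not stated in the thesis text):
#   phi = (N_AaAa - 2 * N_AAaa) / (N1_Aa + N2_Aa)

def relatedness_phi(n_het_het, n_opp_hom, n1_het, n2_het):
    """Kinship estimate from shared-heterozygote and opposite-homozygote counts."""
    return (n_het_het - 2 * n_opp_hom) / (n1_het + n2_het)

# (INDV1, INDV2, N_AaAa, N_AAaa, N1_Aa, N2_Aa, RELATEDNESS_PHI) copied from Table C1
rows = [
    ("Branch 1", "Branch 1", 8612, 0, 8612, 8612, 0.5),
    ("Branch 1", "Kilcoy 1", 3023, 0, 8612, 6633, 0.198295),
    ("Branch 1", "Branch 2", 4730, 0, 8612, 18057, 0.177359),
    ("Branch 5", "Branch 8", 14227, 0, 22875, 18498, 0.343872),
]

for indv1, indv2, n_hh, n_oh, n1, n2, reported in rows:
    phi = relatedness_phi(n_hh, n_oh, n1, n2)
    print(f"{indv1} vs {indv2}: computed {phi:.6f}, reported {reported}")
```

A self-comparison always returns 0.5 under this formula, which matches the diagonal entries of the table.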

Appendix D

DNA salt extraction method, adapted from Miller et al. (1988)

1. Place tissue sample in 1.5mL tube and add 500µL solution 1
2. Add 5µL proteinase K (20mg/mL) – vortex and spin down
3. Incubate at 37°C overnight
4. Chill on ice for 10 minutes
5. Add 250µL of solution 2 and invert several times to mix
6. Chill on ice for 5 minutes
7. Spin at 13,000rpm for 15 minutes
8. Collect 500µL of clear supernatant into a new 1.5mL tube
9. Add twice the volume of 100% cold ethanol to precipitate DNA
10. Freeze at -4°C for at least 2 hours or overnight
11. Spin for 20 minutes at 13,000rpm
12. Remove supernatant
13. Rinse DNA pellet in ~500µL of cold 70% ethanol
14. Spin at 11,000rpm for 5 minutes
15. Remove supernatant, pipette off excess liquid and partially dry with lid off at 55°C on heating block for 5 minutes
16. Resuspend dried DNA in 50-200µL of sterile H2O by gently pipetting sample or gently vortexing sample

Solution 1: 50mM Tris HCl pH8, 20mM EDTA pH8, 2% SDS
Solution 2: 6M NaCl solution
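
As an illustration of preparing the two solutions, the sketch below applies the dilution relation C1V1 = C2V2 to Solution 1 and a mass calculation to Solution 2. The stock concentrations (1 M Tris HCl, 0.5 M EDTA, 10% SDS) and the 100 mL batch size are assumptions chosen for the example and are not specified in the protocol; the molar mass of NaCl is taken as 58.44 g/mol.

```python
# Sketch: reagent amounts for a 100 mL batch of Solution 1 and Solution 2.
# Stock concentrations and batch volume are hypothetical, for illustration only.

def stock_volume(final_conc, stock_conc, final_volume):
    """Volume of stock needed (C1V1 = C2V2); concentrations share one unit."""
    return final_conc * final_volume / stock_conc

BATCH_ML = 100.0  # assumed batch size in mL

# Solution 1: 50 mM Tris HCl pH 8, 20 mM EDTA pH 8, 2% SDS
tris_ml = stock_volume(50.0, 1000.0, BATCH_ML)  # from assumed 1 M (1000 mM) stock
edta_ml = stock_volume(20.0, 500.0, BATCH_ML)   # from assumed 0.5 M (500 mM) stock
sds_ml = stock_volume(2.0, 10.0, BATCH_ML)      # from assumed 10% stock
water_ml = BATCH_ML - (tris_ml + edta_ml + sds_ml)

# Solution 2: 6 M NaCl; mass = molarity x volume (L) x molar mass
NACL_G_PER_MOL = 58.44
nacl_g = 6.0 * (BATCH_ML / 1000.0) * NACL_G_PER_MOL

print(f"Solution 1: {tris_ml} mL Tris, {edta_ml} mL EDTA, "
      f"{sds_ml} mL SDS, {water_ml} mL water")
print(f"Solution 2: {nacl_g:.2f} g NaCl made up to {BATCH_ML} mL")
```

Scaling to another batch size only requires changing BATCH_ML; the stock values should be replaced with those actually on hand.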
