Roskilde Universitet / Roskilde University
Den Naturvidenskabelige Bacheloruddannelse / The Natural Science Bachelor Programme

The study of evolutionary origin of the antifreeze gene in Rhagium mordax

Group members: Saudat Alishayeva, Nicolaj Stelzner Grønvall, Alexander Varnich Hansen, Philip Kruse, Nina Štrancar

Project supervisor: Peter Kamp Busk

Semester: 5th Semester

Date: 18/12/2018

Abstract

This project report details the attempt to isolate and sequence candidate sequences for the evolution of the antifreeze protein gene present in Rhagium mordax. Larvae of R. mordax were collected, and their DNA was extracted and subjected to PCR amplification using specially designed primers that targeted sequences similar to those coding for antifreeze genes. In addition, we applied the DNA barcoding method to validate the species. PCR products were analysed by gel electrophoresis. Several PCR products were observed and sent for direct Sanger sequencing. Due to a problem with sequencing, we were unable to identify the sequences or draw conclusions about the origin of the antifreeze gene. Although the sequencing failed, results from the previous steps indicate that the methods used in this project are feasible for the study of R. mordax genetics.


Table of contents

Abstract
Table of contents
1. Introduction
1.1 Hypothesis
2. Theory
2.1 Historical Overview of the Discovery of the Antifreeze Proteins
2.2 Convergent Evolution and R. mordax
2.3 Mutations - Gene Duplication
2.4 Antifreeze Proteins and Their Function
2.5 Degenerate primer design
3. Experimental methods
3.1 Collection of R. mordax
3.2 DNA barcoding and CO1 gene
3.3 DNA Extraction and Purification
3.4 Analysis of the DNA Quality and Concentration
3.5 Primers Design
3.5.1 Forward Primer
3.5.2 First Reverse Primer
3.5.3 Second Reverse Primer
3.5.4 Primer Modification
3.5.5 CO1 primers
3.6 Polymerase Chain Reaction
3.7 Gel Electrophoresis
3.8 Gel Samples Extraction and Purification
3.9 Sequencing
3.10 Bioinformatics analysis
4. Results
4.1 The first agarose gel
4.2 The second agarose gel
4.3 The third agarose gel
4.4 Sequencing
4.5 Bioinformatics analysis
5. Discussion
5.1 Purification of the larvae DNA with QIAamp mini kit
5.2 PCR settings
5.3 Gel-1 examination
5.4 Primers specificity
5.5 Gel-2 examination
5.6 Gel DNA extraction
5.7 Sequencing
5.8 Interpretation of bioinformatics analysis
6. Conclusion
7. Perspective
8. References
9. Appendix


1. Introduction

Organisms around the world have evolved mechanisms to optimise their chances of survival in different climates. One example is organisms found in colder climates, where there is a chance of freezing, which may cause damage to an organism. A discovery in the late 1960s by Arthur L. DeVries and his team shed light on how some Antarctic fish species keep their blood in the liquid state despite living in ice-cold water (DeVries et al., 1970). The team found glycoproteins that were able to depress the freezing point of water. These were the first of a group of proteins that came to be commonly known as antifreeze proteins (AFPs), also labelled ice structuring proteins (ISPs) or, more generally, ice-binding proteins (IBPs). In this project, we refer to them as AFPs (Davies, 2014).

The proteins are not exclusive to fish but have been identified in a range of organisms, including vertebrates and fungi (Bar Dolev et al., 2016). While they share a common function, their structures vary greatly, which hints that they have evolved independently in different species (Davies, 2014). The mechanism behind the AFPs' function is their binding to ice crystals. The binding restricts crystal growth, and that causes the freezing point to be depressed (Bar Dolev et al., 2016).

The research on AFPs is not limited to their structure or function, but also extends to the analysis of their underlying elements, such as AFP-coding DNA sequences. With the DNA sequence that codes for an AFP, it is possible to analyse possible evolutionary origins across species that exhibit AFPs, to observe the effects on function when mutations are introduced, or to produce modified AFPs for use in industry.


In this project, we designed primers based on AFP protein amino acid sequences originating from Rhagium mordax and performed PCR followed by direct sequencing.

1.1 Hypothesis

Our study hypothesizes that the gene responsible for the antifreeze protein appeared in Rhagium mordax after a gene duplication event, as was previously shown to be the case for the arctic fish Notothenia, where similar antifreeze genes evolved by gene duplication. If a gene duplication event has taken place, then we expect to see a gene similar to rmAFP in the genome of R. mordax. The goal is thus to find sequences similar to the gene coding for AFPs, which could provide insight into the sequences from which rmAFP originated.


2. Theory

2.1 Historical Overview of the Discovery of the Antifreeze Proteins

Antifreeze activity was first observed in the 1950s, when the Norwegian scientist P.F. Scholander set out on an expedition to uncover the mechanism by which the Arctic fish Notothenia can survive in water colder than the freezing point of their blood. He concluded from his experiments that there was "antifreeze" in the blood of Arctic fish. Only later, in the late 1960s, was the actual antifreeze protein isolated from Antarctic fish by Arthur DeVries. These proteins were later named antifreeze glycoproteins (AFGPs) or antifreeze glycopeptides, to distinguish them from the newly discovered non-glycoprotein biological antifreeze agents (AFPs). There are several possible explanations as to what led to the evolution of antifreeze genes. One hypothesis (Hudait, 2018) states that changes in climate during the glacial period could have been a source of the selective forces that drove convergent evolution of these novel mechanisms as a means of adaptation to cold conditions.

2.2 Convergent Evolution and R. mordax

Convergent evolution is the phenomenon of species developing the same traits without transfer of genes. This can occur in response to environmental pressures of a similar character (in our case, freezing of water) and leads to various molecular mechanisms that compensate for this specific environmental hazard. Only mechanisms that are alike are referred to as convergent evolution.

According to Deng et al. (2010), the antifreeze gene that was discovered in arctic fish evolved from a mutated copy of the gene that codes for the enzyme sialic acid synthase (SAS). A sequence alignment of the genes that evolved in Rhagium mordax and in Notothenia fish shows no significant similarities. Therefore it is fair to assume that the antifreeze gene in R. mordax has evolved convergently with the Notothenia gene.

Carl Linnaeus classified all beetles in the order Coleoptera in 1758 (Linnaeus, 1758, Systema Naturae). According to Duman et al. (2004), antifreeze genes evolved convergently within the order Coleoptera in 14 species:

Figure 1: Antifreeze proteins of Coleoptera order (Duman et al. 2004).

The publication does not consider Rhagium mordax, so in total there are about 15 species of beetles with an adaptation to similar conditions. In the case of Rhagium mordax and Rhagium inquisitor, the gene is inherited from a common ancestor. Even though the amino acid sequences of their antifreeze proteins differ by 30% (Kristiansen, 2012), they share enough similarity to say that their most recent common ancestor had this gene. Coleoptera is the largest of all orders and comprises about 400 000 species of beetles. Since only 15 species, belonging to unrelated suborders, have evolved the antifreeze gene, it is reasonable to assume that most of them acquired this gene independently of each other. The structure of all antifreeze proteins is made of repetitive domains, forming because of very high content of Tyrosine amino acids. Moreover, the convergent evolution of those genes allowed some species to have more efficient antifreeze proteins than others, and hence to survive in even harsher conditions. For example, the beetle of our primary interest, R. mordax, can survive temperatures down to -20°C, while an interesting beetle from Alaska, Cucujus clavipes puniceus, is able to survive −58°C, and its larvae are capable of surviving down to −100°C (Sformo et al., 2010). In most cases, evolution led not only to adaptation through the emergence of convergent antifreeze genes in the beetles, but also to alternative mechanisms that suppress the melting point, such as a high sugar and glycerol content in the blood. In the case of the previously mentioned Cucujus clavipes puniceus, the species prevents ice crystal formation through a combination of deliberate dehydration and antifreeze activity.

In conclusion, adaptation to freezing environments is a convergent evolutionary process that appears to have favoured antifreeze proteins in combination with several alternative factors.

2.3 Mutations - Gene Duplication

Mutations are changes in the genetic code in an individual cell or in an entire organism. Mutations come in many different variants. One variant, the spontaneous mutation, can be caused by errors in DNA replication as well as spontaneous lesions, among other things. Spontaneous mutations are themselves divided into categories. An example is tautomerism, in which a base is converted into an isomer through the repositioning of a hydrogen atom. The base then bonds differently, leading to replication errors due to incorrect base pairing (Griffiths et al., 2000). Another primary variant is induced mutation. Put simply, an induced mutation occurs when a substance or effect (chemical compounds, ions or radiation) induces a mutation by altering the chemical processes that would otherwise produce error-free sequences (Griffiths et al., 2000). One often-cited example is ionizing radiation, which can enter a cell and cause a variety of damage. Due to its ionizing nature, the radiation can create free radicals that, on their own, can cause mutations because they are highly reactive (Breimer, 1988).

A type of mutation that is particularly relevant to this project is gene duplication. Gene duplication is, in simple terms, when a genetic sequence is copied from one chromosome and inserted into another, likely multiple times. This may alter the function of the protein encoded by the gene that received the repeats (Reams & Roth, 2015). The mechanisms behind gene duplication are varied. One mechanism, which happens at the chromosomal level, is the phenomenon known as unequal crossing over. Unequal crossing over, also known as illegitimate recombination, occurs during meiosis, when two misaligned chromosomes have a crossover event (L. Silver, 2001). The crossover may depend on a stretch of homologous sequence at the two sites where the crossover is initiated. The resulting sequences may have new functions.

Another way for gene duplication to occur is via replication slippage. Replication slippage is an error during DNA replication that can, not unlike unequal crossing over, duplicate short genetic sequences. Specifically, the error occurs when DNA polymerase stalls during replication and dissociates from the DNA strand (Viguera et al., 2001). DNA polymerase then reattaches to the DNA strand, but does so with the replicating strand misaligned, and replicates the sequence more than once, resulting in repeat sequences (Viguera et al., 2001). Gene duplication events are categorized as neofunctionalization when they occur as evolutionary events that result in proteins with new functions potentially useful to the organism in some way (Magadum et al., 2013).


Figure 2: Gene Duplication. (Reference: Talking Glossary of Genetics. The illustration was declared freely available for use without special permission.)

Other ways of gene duplication do exist, such as aneuploidy (an abnormal number of chromosomes), which is generally harmful, and whole-genome duplication, which can lead to speciation, but these are either not applicable to R. mordax or not relevant to neofunctionalization (Griffiths et al., 2000).

2.4 Antifreeze Proteins and Their Function

Antifreeze proteins are proteins produced by organisms in cold environments to avoid lethal freezing of body fluids. AFPs make it possible for the organism to adapt to temperatures below the melting point of its body fluids (Friis et al., 2014). Antifreeze proteins can slow down and prevent ice growth by binding to the growing ice crystals (Kristiansen & Zachariassen, 2005; Bar Dolev et al., 2016). When bound to the ice crystals, the AFPs slow down ice growth by an adsorption-inhibition mechanism (Bar Dolev et al., 2016). This adsorption of an AFP onto an ice crystal causes thermal hysteresis: a separation of the ice crystal's melting temperature and freezing temperature (Kristiansen & Zachariassen, 2005). The AFP lowers the freezing temperature below the melting point, which means that the temperature at which the ice crystals can grow is lower (Bar Dolev et al., 2016; Kristiansen & Zachariassen, 2005).

2.5 Degenerate primer design

Amino acids

Generally, by amino acids we mean the 20 standard ones that are specified by the genetic code and form proteins. All proteins are made up of chains of amino acids, assembled according to the genomic sequence. Since we have the amino acid sequences of the AFPs of R. mordax, we can reverse-translate them into DNA nucleotide sequences and design the primers.


Nucleotide triplets

Three consecutive non-overlapping nucleotides are referred to as a nucleotide triplet or a codon. Each nucleotide triplet eventually becomes one amino acid. The reason the genetic code uses a triplet rather than a doublet is that DNA consists of only 4 different nucleotides, while the genome has to code for at least 20 different amino acids. Had each single nucleotide coded for one amino acid, there would be room for only 4 different amino acids, and a doublet would allow only 16. A triplet of 4 different nucleotides gives 64 different combinations, which is more than enough for the 20 amino acids (Griffiths et al., 2000).
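The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch. The function name `count_words` is our own and does not come from any tool used in the project.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # the four DNA nucleotides


def count_words(length):
    """Count all distinct nucleotide 'words' of the given length."""
    return sum(1 for _ in product(NUCLEOTIDES, repeat=length))


# Singlets, doublets and triplets: only triplets can cover 20 amino acids.
for length in (1, 2, 3):
    n = count_words(length)
    print(length, n, "enough for 20 amino acids" if n >= 20 else "too few")
```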

Reading Frame

The reading frame of single-stranded DNA or RNA consists of consecutive non-overlapping codons, and a strand can be read in three different ways. This is because codons consist of three nucleotides, so reading can start at any of the first three nucleotides. Effectively, the reading frame determines which amino acid will be produced from a given stretch of sequence. Shifting from one reading frame to another may change the amino acids the sequence codes for (Griffiths et al., 2000).
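As a small illustration of the three reading frames of one strand, the sketch below splits a sequence into codons starting at offsets 0, 1 and 2. The function name and the example sequence are arbitrary choices for demonstration and are not data from this project.

```python
def reading_frames(seq):
    """Return the three forward reading frames of seq as lists of codons.

    Incomplete codons at the end of a frame are dropped.
    """
    seq = seq.upper()
    frames = []
    for offset in range(3):
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames.append(codons)
    return frames


# Arbitrary example sequence: the same nucleotides give three different codon sets.
for offset, codons in enumerate(reading_frames("ACTTGCCACGCTAAAGC")):
    print("frame", offset + 1, ":", " ".join(codons))
```

Note that the third frame of this example contains TAA, a stop codon, which shows how strongly the choice of frame affects the translated product.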

Primers

Primers are fragments of single-stranded DNA or RNA that can pair with a sequence, allowing polymerases to attach and start the process of duplication. Without the primer, the polymerase cannot start a reaction at a specific point. To pair with the DNA or RNA strand, the primers have to be almost exactly complementary to the sequence. Some variation is possible, as long as the nucleotides around the mismatch pair strongly. It is necessary to identify AFP sequences that are conserved between R. inquisitor and R. mordax to make good non-redundant primers. The primer sequences should be unique and not be repeated anywhere else in the coding sequence. Otherwise, the primers may attach to the wrong parts of the genomic sequence. Furthermore, there should be around 8 to 9 guanines or cytosines per primer. This requirement can be explained by the fact that guanine and cytosine pair through three hydrogen bonds, rather than two as in the case of adenine and thymine. Overall, the resulting primers had to be at least 18 nucleotides long (Griffiths et al., 2000).
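The length and G/C guidelines above are easy to check programmatically. The following is a minimal sketch with a helper name of our own choosing. It simply counts bases and does not estimate a melting temperature. The example sequence is the CO1 forward primer listed in section 3.5.

```python
def primer_summary(primer, min_length=18):
    """Return (length, G+C count, G+C fraction, long_enough) for a primer."""
    primer = primer.upper().replace(" ", "")
    gc = primer.count("G") + primer.count("C")
    return len(primer), gc, gc / len(primer), len(primer) >= min_length


length, gc, fraction, long_enough = primer_summary("GATATGATCAGAATTAGGAAATCCAGGATC")
print(length, gc, round(fraction, 2), long_enough)
```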

Reverse translation

Reverse translation converts an amino acid sequence back into a DNA sequence. It provides the genetic code for a protein of interest, but it is inexact and cannot by itself give the exact nucleotide sequence: several triplets can code for the same amino acid, which makes a DNA sequence inferred by reverse translation redundant. Triplets that code for the same amino acid most often differ in their last nucleotide. Such degenerate sequences are generally not desired in primer design. For such a coding sequence, one may choose to include all nucleotide combinations in the primer sequence. This requires using several primers, of which only one is fully complementary to the sequence, so the method effectively decreases the concentration of the correct primer. The list below (see Figure 3) shows that some amino acids have a large variety of codons. It is therefore recommended to avoid amino acids such as serine, leucine and arginine in primer sequences, as they contain many interchangeable nucleotides (Griffiths et al., 2000).
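To make the cost of degeneracy concrete, the sketch below multiplies the number of codons per amino acid in the standard genetic code to count how many different DNA sequences could encode a short peptide. The function name is our own. The peptides TCHAKA, PTQTQT and NAIGQG are the ones used for primer design in section 3.5, and SLR is included only to show why serine, leucine and arginine are best avoided.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def coding_sequences(peptide):
    """Count the distinct DNA sequences that encode the given peptide."""
    total = 1
    for amino_acid in peptide.upper():
        total *= CODON_COUNTS[amino_acid]
    return total


for peptide in ("TCHAKA", "PTQTQT", "NAIGQG", "SLR"):
    print(peptide, coding_sequences(peptide))
```

Three residues of serine, leucine and arginine alone already allow 216 coding sequences, which is why such residues quickly dilute the fraction of primer molecules that match the template.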


Figure 3: Triplets and their corresponding amino acids. (Baskauf, 2017)

Distance between primers and its effect on PCR product

PCR amplifies relatively short sequences best. However, the primers need to be at least 200 bp apart from each other for possible amplified genes to be identifiable.

Primer tails

Primer tails are additional nucleotides attached to the 5' end of the primer. They can be used to raise the melting point. This is relevant in our case because it raises the melting point without making the primers more specific, which matters since we are searching for similar genes. An alternative way of increasing the melting point would be to extend the primer further into the coding region, but extending the primer length would dilute our primers with ones that might have no affinity. A rule of thumb for well-designed primers is to include a primer tail. A primer tail is made of randomly selected nucleotides. An optimal primer tail is about 5-10 nucleotides long, because if it gets too long it might bend enough to pair with itself, which would prevent extension. When using primer tails for PCR there are a few things to keep in mind. Often the experiment may not yield any results if primers with tails are used in PCR at a normal 60°C annealing temperature. This could be because the tails need to be forcibly attached to the DNA strand to recruit the polymerase. To do that, several cycles at a lower temperature are recommended before the actual reaction. We were recommended a temperature of 37°C, although there is no concrete evidence for why this temperature should be ideal (Busk & Lange, 2013).

Bioinformatics analysis

The bioinformatics tools relevant for the project include two variants of the BLAST algorithm: tBLASTn and BLASTn. BLAST, which stands for Basic Local Alignment Search Tool, is one of the most widely used bioinformatics algorithms. We apply it to align genes for gene identification purposes. The method is also widely used to find relatedness and genetic similarities across the species databases that are maintained and continuously updated at the NCBI website. The input for BLAST is a nucleic acid or amino acid sequence in FASTA format, and the output gives the closely resembling aligned sequences with their accession numbers and scores, ranked according to their degree of identity and sequence coverage. BLASTn is the tool designed for searching nucleic acid sequences when the input is also given in nucleotides. According to Hahn et al. (2007), tBLASTn can be even more comprehensive than BLASTn for analysing gene duplication events. tBLASTn is a similar search tool that aligns a given protein sequence against a database of nucleic acids. The algorithm dynamically translates the database sequences and compares them in all six possible reading frames.


3. Experimental methods

The goal was to sequence DNA fragments that are highly similar to the antifreeze protein gene sequence, in order to attempt to ascertain its evolutionary origin.

3.1 Collection of R. mordax

To collect specimens of Rhagium mordax, members of the group went to Nordskoven forest on 12.10.2018 with Hans Ramløv to search for R. mordax larvae. Hans Ramløv offered his expertise in identifying the specimens and helped transport them to Roskilde University. The larvae were mostly found on oak trees. We collected 11 larvae and 2 adult beetles. The location of the forest is shown below:

Figure 4: The Nordskoven forest location (skoven kirke, 2018).

The equipment used in the search for the insects was a knife and an axe, for removing the bark of trees under which the larvae and beetles were found. Larvae of R. mordax look identical to those of R. inquisitor. Generally speaking, the two species tend to be found in different habitats, but according to Bílý & Mehl (1989), R. inquisitor may also inhabit Denmark, especially in areas with older conifer trees. Consequently, we decided to identify our samples using a method called DNA barcoding.

3.2 DNA barcoding and CO1 gene

DNA barcoding is a method related to molecular phylogenetics and is very useful in modern studies. For almost all multicellular organisms except fungi, the method relies on amplifying a mitochondrial gene that they all share: cytochrome oxidase I, or CO1. After the gene is amplified and sequenced, it is compared with previously sequenced CO1 genes (Puillandre, 2007; Rulik, 2017). The Barcode of Life Data (BoLD) is the preferred database for molecular phylogenetic studies. It includes sequenced CO1 genes for both R. inquisitor and R. mordax.

3.3 DNA Extraction and Purification

To isolate and purify DNA from the insects, the larvae and beetles were collected, transported to the lab and kept in the fridge for 2 weeks. Two chosen larvae were placed in a mortar and crushed. 20 μL of proteinase K was added to a 1.5 mL microcentrifuge tube. 200 μL of sample, in the form of liquid and tissue, was collected and placed in the microcentrifuge tube. Next, 200 μL of buffer AL was added to the tube, which was then vortex-mixed for 15 seconds. The tube was placed in an incubator at 56°C for 10 minutes. Once done, the tube was centrifuged briefly to remove drops from the inside of the lid. Then, 200 μL of ethanol (96% to 100%) was added to the sample, which was vortex-mixed and centrifuged briefly again to remove droplets. The mixture was carefully applied to a QIAamp mini spin column seated in a 2 mL collection tube, and the tube and column were centrifuged at 8000 rpm for 1 minute. The collection tube was removed and a new one was attached.


Next, 500 μL of wash buffer AW1 was added to the QIAamp mini spin column with its collection tube attached. The column and tube were centrifuged again for 60 seconds at 8000 rpm. The collection tube was again discarded and a new one was attached. 500 μL of wash buffer AW2 was added to the spin column, which was centrifuged for 3 minutes at 14000 rpm. Once more, the collection tube was discarded and a new one was attached, and the column and tube were centrifuged once more as a precaution, this time for one minute at the highest speed the centrifuge would reach. Next, a new collection tube was attached and X μL of elution buffer AE was added to the spin column. The spin column and collection tube were left for 10 minutes and then centrifuged for one minute at 8000 rpm. The QIAamp mini spin column was discarded and the collection tube kept. The collection tube now contained the solution with the purified DNA from the larvae.

3.4 Analysis of the DNA Quality and Concentration

The extracted DNA was analysed with a NanoDrop. The NanoDrop is a spectrophotometer used in molecular biology labs to verify the purity of samples. The amount needed per sample is about 1-2 µL. Samples are compared to a blank, which in our case was elution buffer. The instrument reports the concentration of DNA in the sample and the ratio of the absorbances at 260 nm and 280 nm. Pure DNA samples have a 260/280 ratio between 1.8 and 2.2.
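The purity criterion can be written as a simple check. The sketch below is illustrative only: the function name and the example absorbance values are hypothetical, and the 1.8 to 2.2 window is the one stated above rather than a value read from the instrument's software.

```python
def purity_check(a260, a280, low=1.8, high=2.2):
    """Return the 260/280 ratio and whether it lies in the accepted window."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    ratio = a260 / a280
    return ratio, low <= ratio <= high


# Hypothetical absorbance readings, not measurements from this project.
ratio, is_pure = purity_check(a260=0.95, a280=0.50)
print(round(ratio, 2), "pure" if is_pure else "possibly contaminated")
```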


3.5 Primers Design

Our goal was to search for DNA fragments which are similar to the antifreeze protein sequence and determine possible ancestral genes of the antifreeze gene.

The approach we chose was to create primers for the antifreeze protein sequences with the right specificity to bind both antifreeze DNA sequences and other similar sequences. This would create PCR fragments that could then be sequenced, yielding candidates for possible ancestors of the antifreeze protein in R. mordax.

We based the primer design on the amino acid sequences of the different antifreeze proteins (Appendix 1). Since we only had access to the amino acid sequences, not the DNA sequences, we had to choose primer sequences based on the least redundant parts of the AFP sequences. For this purpose we applied the MUSCLE alignment tool. The compiled sequences and their comparison can be found in Figure 5.

Figure 5: Lineup of the amino acid sequences from different antifreeze protein coding regions in R. mordax (first 8 sequences) and in R. inquisitor (the last sequence, in black). The highlighted regions were the chosen primers - pink is the Forward Primer, orange is the Reverse Primer 2 and red is Reverse Primer 1.

In finding an appropriate sequence in the AFP coding region, we followed the general guidelines described in the Theory section.

3.5.1 Forward Primer

The sequence marked in pink in Figure 5 was chosen as the forward primer because it is not repeated anywhere else in the AFP coding sequence and does not contain amino acids with too many nucleotide combinations. The chosen region reverse-translates to the following nucleotides:

5' ACI Tg(CT) CA(CT) gCI AA(gA) gC

The inosine at the very end of the sequence is not included in the reverse-translated sequence, as that position can be any one of the four nucleotides, making it insignificant when positioned at the 3' end.

3.5.2 First Reverse Primer

To increase the chance of amplifying something other than the AFP coding sequence, we chose to design two reverse primers. The sequence highlighted in red in Figure 5 represents the first reverse primer. Although the amino acid at the 5th position varies, there are no repeats of this sequence in the AFP coding sequences. For the primer sequence, we chose Q at the 5th position, although K would also have been an appropriate candidate. To increase the chances of amplifying a sequence, the forward and the reverse primer should not be spaced too far from each other. The resulting fragment between the forward and the first reverse primer would be 203 bp.

This primer amino acid sequence translates into: 5' CCI ACI CA(gA) ACI CA(gA) AC

3.5.3 Second Reverse Primer

The orange highlighted area of Figure 5 shows the region on which we based our second reverse primer. There are some varying amino acids at the 4th and the 5th position, but overall the region seems sufficiently conserved. The distance between the forward primer and the second reverse primer is 299 bp.

This amino acid code translates into the following nucleotides: 5' AA(CT) gCI AT(TCA) ggI CA(gA) gg


3.5.4 Primer Modification

During PCR our forward primer will bind to one of the DNA strands while the reverse primer will bind to the opposite strand, and the two are extended in opposite directions, always from the 5' to the 3' end. Therefore, it is necessary to convert the reverse primers into the complementary strand.

Reverse Primer 1 (PTQTQT): 5' CCI ACI CA(gA) ACI CA(gA) AC 3' > 3' ggI TgI gT(CT) TgI gT(CT) Tg 5'

Reverse Primer 2 (NAIGQG): 5' AA(CT) gCI AT(TCA) ggI CA(gA) gg 3' > 3' TT(gA) CgI TA(AgT) CCI gT(CT) CC 5'

Converting the reverse primers into the complementary strand changes the written direction to 3'-5', which needs to be normalised back to 5'-3'.

Reverse Primer 1 (PTQTQT): 3' ggI TgI gT(CT) TgI gT(CT) Tg 5' > 5' gT (CT)Tg IgT (CT)Tg IgT Igg 3'

Reverse Primer 2 (NAIGQG): 3' TT(gA) CgI TA(AgT) CCI gT(CT) CC 5' > 5' CC (CT)Tg ICC (AgT)AT IgC (gA)TT 3'
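The two conversion steps (complementing, then reversing) can be reproduced with a short script. This is a minimal sketch using helper names of our own. It assumes the notation used in this report, where parentheses denote a mixed position and I denotes inosine, and it follows the report's convention of leaving inosine unchanged when complementing.

```python
import re

# Inosine (I) is kept as I, following the convention used in this report.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "I": "I"}


def tokenize(primer):
    """Split a primer into single bases and parenthesised mixed positions."""
    compact = primer.upper().replace(" ", "")
    tokens = re.findall(r"\([ACGTI]+\)|[ACGTI]", compact)
    if "".join(tokens) != compact:
        raise ValueError("unexpected character in primer: " + primer)
    return tokens


def complement_token(token):
    """Complement one position; a mixed position is complemented base by base."""
    if token.startswith("("):
        return "(" + "".join(COMPLEMENT[base] for base in token[1:-1]) + ")"
    return COMPLEMENT[token]


def reverse_complement(primer):
    """Return the reverse complement, written 5' to 3', without spaces."""
    return "".join(complement_token(t) for t in reversed(tokenize(primer)))


print(reverse_complement("CCI ACI CA(gA) ACI CA(gA) AC"))
print(reverse_complement("AA(CT) gCI AT(TCA) ggI CA(gA) gg"))
```

The printed sequences correspond, apart from spacing and letter case, to the 5'-3' versions of Reverse Primer 1 and Reverse Primer 2 given above.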

The last thing to consider when making primers is their melting point. The standard melting point is 60°C. Considering that our primer sequences were relatively short, their melting temperatures were assumed to be low. To increase the melting temperature and ensure stability, a tail was added. The tail we designed was a chain approximately 10 nucleotides long with a random combination of nucleotides. Restriction sites were included in the random sequence for possible fragment identification. The following are the tails, with their restriction sites, that we chose to make part of the primer sequences.

Forward Primer tail (TCHAKA) -> 4 nucleotides [gACT] + BglII = gACTAgATCT
Reverse Primer 1 tail (PTQTQT) -> 4 nucleotides [gACT] + SalI = gACTCAgCTg
Reverse Primer 2 tail (NAIGQG) -> 4 nucleotides [CTgA] + PstI = CTgACTgCAg

Together with the tails, the finished primer sequences were as follows:

Forward Primer (TCHAKA): 5' gACTAgATCT ACI Tg(CT) CA(CT) CgI AA(gA) Cg

Reverse Primer 1 (PTQTQT): 5' gACTCAgCT ggT (CT)Tg IgT (CT)Tg IgT Igg

Reverse Primer 2 (NAIGQG): 5' CTgACTgCA gCC (CT)TG ICC (AgT)AT IgC (gA)TT

3.5.5 CO1 primers

The primers were based on regions of the CO1 gene that are conserved in both species, R. inquisitor and R. mordax. Therefore the primers should be able to amplify the CO1 gene of either species.

CO1 Forward Primer (with the tail): 5'-GATATGATCAGAATTAGGAAATCCAGGATC Tm = 57.5°C

CO1 Reverse Primer (complemented, reversed and with the tail): 5'-GTCACAACAGATGATCCTCTATGAGC Tm = 58.4°C

CO1 gene sequences can be found in Appendix [CO1 gene sequence].

3.6 Polymerase Chain Reaction

Polymerase chain reaction was performed with the lid temperature of the machine set to 99°C. Samples were inserted and the reaction started with 95°C for 5 minutes. The samples were then exposed to either 0, 1 or 2 repetitions (depending on the sample; for more details refer to Table 6) of 95°C for 30 seconds, an annealing temperature of 37°C for 30 seconds and 72°C for 30 seconds. This was followed by 15 or 40 repetitions (refer to Table 6) of 95°C for 30 seconds, 56°C for 30 seconds and 72°C for 30 seconds. Lastly, the samples were exposed to 72°C for 5 minutes. The amplified samples were then stored in a freezer.
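The cycling scheme above can be written down as a small data structure, which also makes it easy to estimate the programmed run time. This is an illustrative sketch with function names of our own. It ignores the time the thermocycler needs to ramp between temperatures, so real runs take longer.

```python
def pcr_program(low_temp_cycles, main_cycles):
    """Build the program as (label, [(temperature in °C, seconds), ...], repeats)."""
    def cycle(annealing_temp):
        return [(95, 30), (annealing_temp, 30), (72, 30)]

    return [
        ("initial denaturation", [(95, 300)], 1),
        ("low-temperature annealing cycles", cycle(37), low_temp_cycles),
        ("main cycles", cycle(56), main_cycles),
        ("final extension", [(72, 300)], 1),
    ]


def total_seconds(program):
    """Sum the programmed hold times over all stages and repeats."""
    return sum(
        sum(seconds for _, seconds in steps) * repeats
        for _, steps, repeats in program
    )


# The combinations of repetitions described above (0-2 low-temperature, 15 or 40 main).
for low in (0, 1, 2):
    for main in (15, 40):
        minutes = total_seconds(pcr_program(low, main)) / 60
        print(low, main, round(minutes, 1), "min")
```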

3.7 Gel Electrophoresis

We prepared a 2% and a 1.5% agarose gel for gel electrophoresis. The 2% gel contained 20 μL wells and its running time was approximately 1 hour. The 1.5% agarose gel contained 40 μL wells and its running time was approximately 45 minutes. GeneRuler 1kb Plus DNA Ladder was used in all experimental trials for fragment size comparison. After sufficient running time had passed, the gel was taken to a dark room and analysed under UV light.

3.8 Gel Samples Extraction and Purification

To extract the DNA from the 2% agarose gel, the gel was first moved into a dark room with a UV light so that the bands could be observed clearly. Disposable pipette tips were then stabbed into the bands to extract the DNA directly, and the tips were placed in tubes. Back in the laboratory, each tip was attached to a pipette and the extracted gel was pushed out as if it were regular pipette content. Afterwards, the extracted gel with DNA could be melted and amplified by PCR.


Extraction of the DNA from the 1.5% agarose gel for sequencing was done differently. The gel was again placed on the UV light, but instead of disposable pipette tips, a scalpel was used to cut out the entirety of the desired bands. As in the previous extraction, these were placed in tubes for later use in preparation for sequencing. In both cases, gloves were used due to the presence of ethidium bromide in the gel.

Agarose gel samples were placed in a sterile E-tube prior to the purification procedure, and the mass of each gel sample was determined. 300 μL of Binding Buffer was added to the E-tube for every 100 mg of agarose gel slice. The E-tube was vortexed for 15-30 seconds to resuspend the gel slice in the Binding Buffer, and the suspension was incubated for 10 minutes at 56°C in a shaking water bath to dissolve the gel and release the DNA. After the gel was completely dissolved, 150 μL of isopropanol was added for every 100 mg of agarose gel and the tube was vortexed. A High Pure filter tube was then inserted into a collection tube and the mixture from the E-tube was transferred into the filter tube. The filter tube was centrifuged for 60 seconds at maximum speed, the flow-through was discarded and the filter tube was reconnected with the collection tube. From this step on, the procedure followed the purification method described below.

Purification was performed using the HighPure PCR product purification kit according to the following instructions. 500 μL of Binding Buffer (containing 3 M guanidine-thiocyanate, 10 mM Tris-HCl pH 6.6, 5% EtOH) was added to 10 μL of PCR product. The solution was mixed well, applied to a High Pure filter tube and centrifuged at maximum speed for 1 minute. The flow-through was then discarded. 500 μL of Wash Buffer (containing 20 mM NaCl, 2 mM Tris-HCl pH 7.5, 80% EtOH) was added and the tube was again centrifuged at 13 000 x g for 1 minute. The flow-through was discarded and Wash Buffer was added again, this time 200 μL, and centrifuged at 13 000 x g for 1 minute. The flow-through was again discarded and 50 μL (25 μL in the case of lower volumes of PCR product) of Elution Buffer (containing 10 mM Tris-HCl, 1 mM EDTA, pH 8.5) was added to the tube. The tube was centrifuged for the last time at 13 000 x g for 1 minute, after which it contained the purified PCR product.
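The reagent volumes for dissolving a gel slice scale linearly with the mass of the slice (300 μL of Binding Buffer and 150 μL of isopropanol per 100 mg of gel). A minimal sketch of that arithmetic is given below. The function name and the example mass of 150 mg are hypothetical and are not measurements from this project.

```python
def gel_dissolution_volumes(gel_mass_mg: float) -> dict:
    """Reagent volumes (μL) for dissolving an agarose gel slice of the given mass (mg)."""
    if gel_mass_mg <= 0:
        raise ValueError("gel mass must be positive")
    per_100_mg = gel_mass_mg / 100.0
    return {
        "binding_buffer_ul": 300.0 * per_100_mg,  # 300 μL per 100 mg gel
        "isopropanol_ul": 150.0 * per_100_mg,     # 150 μL per 100 mg gel
    }


if __name__ == "__main__":
    # Hypothetical slice of 150 mg, for illustration only.
    print(gel_dissolution_volumes(150.0))
```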

3.9 Sequencing

Preliminary steps had to be taken before the samples were sent for sequencing. We followed the instructions for sample submission set by Eurofins Genomics, which can be found on their website under the Mix2Seq section. Templates with 15 μL of purified DNA at a concentration of 1 ng/μL were prepared by diluting the samples with water to the desired volume and concentration. Each sample was made into two templates.

Sample #    Purified DNA (μL)    Water (μL)
8.1b        11.0                 4.0
7p          1.0                  14.0
10p         1.56                 13.44

Table 1: Content of the templates.

2 μL of forward or reverse primer was added to each template, so that one of the two templates of each sample received the forward primer and the other received the reverse primer. Samples 8.1b(F) and 7p(F) received the Forward Primer, Sample 8.1b(R) received the Second Reverse Primer and Sample 7p(R) received the First Reverse Primer. Samples 10p(F) and 10p(R) received the CO1 Forward and CO1 Reverse Primers, respectively. The total volume was 17 μL. The tubes were marked and sent to Eurofins Genomics, which performed the sequencing for us.
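The dilution behind Table 1 is a fixed mass of DNA made up to a fixed volume with water. The sketch below shows that calculation with the stock concentrations measured after purification (Table 4). The default target of 30 ng per 15 μL template is an assumption: it is the value that approximately reproduces the volumes in Table 1 and is not stated in the text. The function name is hypothetical.

```python
def template_volumes(stock_ng_per_ul: float,
                     target_ng: float = 30.0,        # assumed target mass, inferred from Table 1
                     final_volume_ul: float = 15.0):
    """Return (DNA volume, water volume) in μL for one sequencing template."""
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    dna_ul = target_ng / stock_ng_per_ul
    if dna_ul > final_volume_ul:
        raise ValueError("stock is too dilute to reach the target mass in this volume")
    return dna_ul, final_volume_ul - dna_ul


if __name__ == "__main__":
    # Stock concentrations (ng/μL) as listed in Table 4.
    for sample, conc in [("8.1b", 2.7), ("7p", 32.7), ("10p", 19.2)]:
        dna, water = template_volumes(conc)
        print(f"{sample}: {dna:.2f} μL DNA + {water:.2f} μL water")
```

With this assumed target, a stock below 2 ng/μL cannot supply 30 ng within 15 μL, so the function raises an error for such samples.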


3.10 Bioinformatics analysis

About 15 antifreeze genes have been identified in beetles (Duman et al. 2004). The NCBI (National Center for Biotechnology Information) website lists 17 beetle species with available sequenced genomes. We made a phylogenetic tree in the NCBI Taxonomy browser to illustrate their relatedness (Figure 12; colour code: yellow for species that have antifreeze, white for the others).

We applied two algorithms: tBLASTn and nBLAST. For the first analysis we used the AFP protein sequences from Rhagium mordax that we were provided (Appendix 1). We changed the default BLOSUM62 alignment matrix to PAM720 so that distantly related sequences would be identified as well. The results were restricted to sequences from the order Coleoptera (taxid:7041).

For the second analysis, with nBLAST, we used the partial cds sequence of the AFP gene available from Trautsch et al. (2011) (accession number: HQ54031). This sequence belongs to R. inquisitor, because no nucleotide sequence for R. mordax was available. The 82% similarity between the AFP proteins of the two species seems high enough to justify searching with the nucleotide sequence of this closely related taxon.

BLAST provides user-friendly options for this type of search. We chose the discontiguous megablast option because it helps to find dissimilar sequences. The alternative, ordinary megablast, which targets sequences with higher similarity, did not yield any significant results. The results were restricted to sequences from the order Coleoptera (taxid:7041).


4. Results

            Mass [g]
Sample 1    14.3
Sample 2    18.7

Table 2: Larvae mass.

            ng/μL    A260/A280
Sample 1    69.7     1.80
Sample 2    153.9    2.00

Table 3: Larvae DNA quality and concentration.


4.1 The first agarose gel

Figure 6: Results from gel electrophoresis 1. Sample contents may be found in Table 6.

Prior to the first gel electrophoresis we calculated the expected length of the bands on the gel. For the antifreeze primers, this was done by adding the primer tails on each side to the length of the amplicon. For the CO1 primers, the lengths of the two primers were added to the distance between them.

Forward Primer/Reverse Primer 1: 279 bp + 2*10 bp = 299 bp

Forward Primer/Reverse Primer 2: 183 bp + 2*10 bp = 203 bp

CO1 Forward and Reverse Primer: 24 bp + 22 bp + 272 bp = 318 bp
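The same arithmetic can be kept as a small script so that the expected band sizes are easy to recompute if a primer or tail changes. This is a minimal sketch with hypothetical function names. The amplicon lengths, the tail length and the primer lengths are the values given above.

```python
TAIL_BP = 10  # length of the restriction-site tail on each antifreeze primer


def tailed_product_bp(amplicon_bp: int, tail_bp: int = TAIL_BP) -> int:
    """Expected PCR product length when both primers carry a tail."""
    return amplicon_bp + 2 * tail_bp


def flanked_product_bp(forward_bp: int, reverse_bp: int, between_bp: int) -> int:
    """Expected product length from the two primer lengths plus the distance between them."""
    return forward_bp + reverse_bp + between_bp


if __name__ == "__main__":
    print("Forward/Reverse Primer 1:", tailed_product_bp(279), "bp")
    print("Forward/Reverse Primer 2:", tailed_product_bp(183), "bp")
    print("CO1 primers:", flanked_product_bp(24, 22, 272), "bp")
```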

Comparing these calculations to the result, we observed no significant bands in Samples 1 to 6, 9 and 11. The only band in those samples, at approximately 75 bp, was assumed to be primer dimers. Samples 7 and 8 both showed a band at 200 bp. Sample 7 contained Reverse Primer 1 and was expected to give a band at 300 bp instead, assuming the antifreeze protein sequence was amplified. Sample 8 contained Reverse Primer 2 and was expected to produce a band at 200 bp had the antifreeze sequence been amplified; that band was found on the gel. Sample 8 showed two additional bands, one at 300 bp and the other at approximately 500 bp. Those two bands were of great interest and we based our further experiments on that finding. Another band was found in Sample 10 at approximately 300 bp. This result fits the expected band length calculation and supports the sample being the targeted species and containing the CO1 gene.

4.2 The second agarose gel

Figure 7: Results from gel electrophoresis 2.

With the second amplification we had three aims. The first was to retest the DNA in the bands of interest seen on the first gel (Samples 7g to 8.2g). The second was to reamplify the species determinant with more PCR cycles, 40 instead of 30 (Sample 10 with 40 cycles), to confirm that it was the right sample and to save it for later sequencing. Lastly, we used different numbers of cycles at the annealing temperature of 37°C along with different combinations of primers on Sample 2, in case more bands would appear, such as the ones in Samples 7 and 8.

With regard to the bands from the first amplification, the second amplification produced some unexpected results. 7g showed the same band at approximately 200 bp; we assumed this to be the antifreeze protein sequence, as many other samples exhibited this band. 8.0g did not reproduce the strong band seen in Sample 8. 8.1g produced two distinct bands, one at 200 bp and another at the expected 300 bp (the band that was originally taken and amplified). Lastly, 8.2g produced three less visible bands at 200 bp, 300 bp and the targeted 500 bp.

The amplification of CO1 gene at 40 cycles of PCR produced the desired product - the band was again found at approximately 300 bp. Some of the PCR product was used on the gel, while most of it was saved for sequencing.

Combining different primers and numbers of cycles at the annealing temperature of 37°C produced the already observed bands at approximately 200 bp. No additional bands of interest were observed. However, since two of the samples received Reverse Primer 1 and the other two received Reverse Primer 2, we would expect distinct bands based on our calculations: a band at approximately 300 bp for the Reverse Primer 1 samples and a band at approximately 200 bp for the Reverse Primer 2 samples.


4.3 The third agarose gel

Figure 8: The third gel electrophoresis.

The aim of the third gel electrophoresis was to isolate the bands of interest and obtain enough DNA to send them for sequencing. Sample 8.1 produced two bands, one at 200 bp and another at approximately 300 bp. Sample 8.2 produced three bands, one at 200 bp, another at approximately 300 bp and the third at approximately 500 bp. All of the bands were cut out. After purification, the samples were analysed using NanoDrop. The following table shows the results of that analysis.

Sample #    DNA concentration (ng/μL)    A260/A280
8.1a        7.1                          1.65
8.1b        2.7                          1.27
8.2a        1.5                          1.19
8.2b        1.4                          1.14
8.2c        0.7                          0.9
7p          32.7                         1.7
10p         19.2                         1.6

Table 4: Results from DNA quality and concentration analysis after purification.

Samples 8.2a, 8.2b and 8.2c were of insufficient concentration and therefore could not be used for sequencing. The best results can be seen in Samples 7p and 10p, because they are direct PCR products and were not purified from the gel. We sent Samples 8.1b, 7p and 10p for analysis. 8.1b should produce the sequence of the band of interest. 7p was sent because it contained only the 200 bp band and was of better quality than Sample 8.1a; it was assumed that the two 200 bp bands contained the same product. Its sequence should confirm that the antifreeze protein sequence was amplified and observed. 10p was sent to confirm that the species we were working with is R. mordax.

4.4 Sequencing

The following are the results we received from sequencing.

Sample      Sequence
8.2 (F)     NNNNN
8.2 (R)     NNNNN
7p (F)      NNNNN
7p (R)      NNNNN
10p (F)     NNNNN
10p (R)     NNNNN

Table 5: Results from sequencing.

The results did not provide any significant information about the samples. The only sequences obtained were chains of 'N', which could be any nucleotide.
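A returned read can be screened for this kind of failure by measuring how much of it consists of the ambiguity code N. The following is a minimal sketch and not part of the analysis performed in this project. The function names and the cut-off of 0.5 are hypothetical choices for illustration.

```python
def ambiguous_fraction(read: str) -> float:
    """Fraction of positions in a read that are the ambiguity code N."""
    bases = read.strip().upper()
    if not bases:
        return 0.0
    return bases.count("N") / len(bases)


def is_uninformative(read: str, max_n_fraction: float = 0.5) -> bool:
    """True if the read is empty or mostly N (hypothetical cut-off)."""
    bases = read.strip()
    return not bases or ambiguous_fraction(bases) > max_n_fraction


if __name__ == "__main__":
    for label, read in [("7p (F)", "NNNNN"), ("illustrative good read", "ACGTTGCA")]:
        print(label, ambiguous_fraction(read), is_uninformative(read))
```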

4.5 Bioinformatics analysis

The tBLASTn query did not find as many sequences as nBLAST: only one sequence was found. The identified sequence is 52 amino acids long and belongs to an mRNA that codes for the titin protein of Aethina tumida.


Figure 9: Summary of the tBLASTn results.

nBLAST: After running nBLAST of R. mordax against the beetle genomes, we saw some interesting matches with 2 species (underlined in red in Figure 12). The highest number of matches was to different genes from Agrilus planipennis. There are 5 main matches to the genome of that beetle. Some of them are uncharacterized, while others belong to ankyrin protein and cuticle protein.

Figure 10: BLAST taxonomy report shows highest number of hits with Agrilus planipennis species.


Figure 11: Coverage of the nBLAST new matches.


Figure 12: Phylogenetic map (made with NCBI Taxonomy browser).


5. Discussion

5.1 Purification of the larvae DNA with QIAamp mini kit

We were successful in obtaining DNA from the R. mordax larvae using a method normally used for human blood. The method is easier and faster than techniques previously reported to be successful for purifying larval DNA. Those conventional techniques require liquid nitrogen or a CTAB-based procedure (Huanca-Mamani et al., 2015; Calderón-Cortés et al., 2010). Methods that require handling samples with liquid nitrogen need to be performed with safety precautions according to standard laboratory guidelines. Therefore, the genomic kit may be a promising alternative when dealing with insect larvae or other organisms whose tissue has a lipid and carbohydrate content similar to that of R. mordax.

As DNA was measured in the samples and appeared on the gels, the purification appears to have been successful. However, gel 1 should have shown equally clear bands in columns 5 and 6 as in columns 7 and 8. This was not the case, which indicates that Sample 1 was either contaminated or contained DNA of insufficient quality. Alternatively, the DNA concentration of Sample 1 may have been insufficient for detection. This is supported by the measurements of DNA quality and concentration, which showed that Sample 2 was generally better than Sample 1.

5.2 PCR settings

For the first amplification we chose to run 30 cycles at 56°C, because the melting point of our primers was simulated to be 56°C and we estimated that 30 cycles would be sufficient to make enough PCR product. Furthermore, we made duplicates of our samples and ran 2 additional cycles at 37°C on them. This was done because degenerate primers such as ours sometimes have difficulty binding well enough for the polymerase to be recruited, and the lower annealing temperature helps the binding between primer and DNA. Due to the lack of product, we later increased the number of cycles to 40 for the second gel.

5.3 Gel-1 examination

The first amplification was performed to see whether anything would be amplified, and it gave some positive results. Although the reaction ran for only 30 PCR cycles and used degenerate primers, strong bands could be observed in some of the wells. The DNA isolation and purification seemed to be partially successful, as we could observe bands at approximately 200 bp in Samples 7 and 8, as well as a band at approximately 300 bp for the CO1 gene amplification (Sample 10). An interesting result came from Sample 8, which produced not one but three bands, at approximately 200 bp, 300 bp and 500 bp. The calculated band length for the amplified antifreeze protein sequence in this sample was approximately 200 bp. We therefore assumed that the 200 bp band contained that sequence. Sample 7 should have produced a band at 300 bp had the antifreeze portion been amplified. However, only the 200 bp band could be observed. The reason for this is unknown, unless a sequence other than the targeted antifreeze sequence was amplified. This sequence could be a part of the antifreeze protein coding sequence, a completely random sequence, or a sequence similar to the antifreeze protein and hence of interest.

The band at 300 bp in Sample 10 is important, as it agrees with our expected band length for the amplified CO1 gene. The finding suggests that the CO1 gene was successfully amplified and that the species we were working with is either R. mordax or R. inquisitor. Final species determination would have been done by sequencing. Overall, our results suggest that the primers were functional.

5.4 Primers specificity

In gel 1 we see a single band in column 7, which suggests that Reverse Primer 1 might be too specific for our purpose, as we attempted to find more than only the antifreeze protein gene. Reverse Primer 2, in contrast, gave two additional bands, which suggests that it is still specific enough and yet able to find similar sequences.

5.5 Gel-2 examination

Another amplification was performed based on our previous findings and produced unexpected results. The new agarose gel, seen in Figure 7, revealed that one of the bands that we attempted to amplify ended up blank (see Sample 8.0g). Furthermore, what was previously one band ended up as multiple bands after amplification. One possible reason is contamination of the excised agarose, which may have contained traces of the contents of other bands. To avoid this cross-contamination, fewer amplification cycles were performed (15 PCR cycles instead of 30 or 40, and 1 cycle at the annealing temperature of 37°C instead of 2). However, the attempt was futile, as we could still see three different bands where we had only taken a sample from one band.

An additional significant observation was made in Samples A, B, C and D. Those samples had similar conditions but different combinations of reverse primers. We expected a band at 300 bp in samples with Reverse Primer 1 and a 200 bp band in samples with Reverse Primer 2. Despite this, only 200 bp bands could be found. This suggests that Reverse Primer 1 might have bound more effectively close to the position where Reverse Primer 2 binds than at its intended place. A shorter sequence would then be produced, giving 200 bp bands in both cases. This hypothesis could only be confirmed by sequencing the observed bands. Overall, our findings point to the conclusion that Reverse Primer 1 was generally less successful in amplifying target sequences than Reverse Primer 2 (which amplified 3 different sequences altogether).

5.6 Gel DNA extraction

Two methods were used for DNA extraction from agarose gel in this project. The first method, introduced to the group by Peter Kamp Busk, uses disposable pipette tips: the tip is stabbed into the relevant band and picks up gel material. The second, more common method uses a scalpel to cut the entire band out of the gel.

Each method has its benefits and drawbacks. The pipette method has the benefit of picking up far less gel, and therefore less agarose, but it has the drawback that the extracted material is hard to spot in the tip, so an empty tip may pass inspection as containing the desired material.

The more conventional method has the notable advantage that the DNA is easily visible and therefore extracted with greater ease. A notable disadvantage is that the extraction includes a much larger amount of gel and therefore agarose, which inhibits the reaction if the sample is used for reamplification.

5.7 Sequencing

What went wrong during the sequencing is not immediately apparent. The error or errors could have been a mixing error or impurities. There are many possibilities: failed sequencing is often attributed to the use of wrong primers, loss of sequencing reaction products during cleanup (specifically via ethanol precipitation), poor quality DNA caused by remaining SDS detergent, or primer synthesis failure. We were positive that DNA was present in our samples. This is known both from the sequence data we received and from the NanoDrop One measurements of the DNA concentration in each sample before preparation for sequencing. We also know from the sequencing data that primers were present in our prepared samples. This indicates that the errors were not a result of DNA or primers being absent.

One possible cause of the negative result is residual ethanol in the tubes after the PCR clean-up. It is worth noting that we performed an extra step, not included in the general protocol, to remove the ethanol: we centrifuged our samples in new tubes at maximum speed. Most likely this was not enough and traces of ethanol remained in all of the samples. An alternative solution to this ethanol problem could be to heat the samples to 65°C with the cap open, so that the ethanol evaporates from the tubes. The time the tubes must spend heating depends on the volume. For example, a 5 mL volume would require 5 minutes, while a 50 mL sample can take about 10 minutes.

5.8 Interpretation of bioinformatics analysis

The identified matches did not seem to explain the basis of a possible gene duplication event. Interestingly, the gene identified in the first analysis did not show any significant similarity in the results of the second analysis, and vice versa. Moreover, it is unclear why genes from species that are not closely related showed those matches. Although we observed almost identical matches in the nBLAST analysis, it is worth noting that the matches covered only approximately 30 nucleotides on average. Since many of those short similar sequences are interspersed with novel gene parts, it is unclear whether the similarities appeared by chance or are due to a mutated gene copy. Furthermore, a limitation of this study is that we are restricted to the genomes of only 17 beetles, none of which belong to the genus Rhagium, and this makes the findings not reliable enough to infer the evolutionary origin of the RmAFP gene.


6. Conclusion

The QIAamp mini kit and the associated method for extraction of DNA from human blood were, to our surprise, very successful in purifying the DNA from the larvae of either Rhagium mordax or Rhagium inquisitor. We may assume that this method could be applied to DNA purification from other larvae with a similar composition. It is, however, unclear how error-prone this method is.

The primers that the group designed appear to have been mostly successful, with the notable exception of Reverse Primer 1, as it appears to have been too specific for our purposes. This is in comparison to Reverse Primer 2, which produced more bands that were not of the expected size of the antifreeze protein gene.

As for the PCR and the gels, it would appear that these efforts worked successfully, as bands were produced in the agarose gels. This would indicate that the PCR program we utilized was suited for our primers.

Gel extraction, whether by pipette tip or scalpel, appeared to be suitable for the needs of the experiment, although both methods have their drawbacks. Given enough training and a slight improvement in the methods of detection, the group judges the pipette method to be superior.

The sequencing was unsuccessful and did not yield any useful results. This might be the result of some contamination in all our samples. However, the most probable explanation is traces of ethanol remaining in the samples.

7. Perspective

To conclude this project, we have several suggestions for how future research could proceed in order to strengthen or reject our hypothesis and our interpretation of the results.

Firstly, the observed bands of interest should be successfully sequenced. This could be achieved by preparing the samples for sequencing again while being cautious of possible errors during the procedure. With results from sequencing, it would be possible to perform the appropriate analysis and identify the sequences amplified by the designed primers. This would not only provide information about the targeted antifreeze sequences, but also point in the direction of possible ancestors of the antifreeze protein. Results from sequencing would also confirm whether the species we have been working with was R. mordax and shed light on whether or not the primers amplified the intended sequences.

An alternative way to study the gene duplication events would be to apply Next Generation Sequencing (NGS) to sequence the whole genome of the beetle. This would not only provide significant insight for our project, but may also provide information about the genetics of closely related species. Not a single species of the genus Rhagium has been sequenced yet. This is not surprising if we consider that out of all species in the order Coleoptera (40 000) only 17 have been sequenced with NGS. This is partially explained by the cost of whole genome sequencing, which, despite the exponential decrease of NGS costs in the last decade, is still a main issue for prospective research (Park & Kim, 2016).

There are several reasons why we think NGS is a promising direction for solving current challenges in molecular evolution research similar to our study of the origin of the antifreeze gene. Firstly, NGS will give us all possible sequences from the genome of R. mordax. It is important to have the sequences of all genes from that beetle because it will increase the statistical confidence in the identified similar genes. Secondly, degenerate primers target only specific parts of the antifreeze gene. It is possible that the targeted sequences were not present in the original gene and arose by mutation only after the gene duplication event. If we apply NGS to this species, we can eliminate potential mismatch errors and find the genes with the highest similarity.


A minor improvement to the method would be the use of an ultraviolet flashlight to determine whether gel with DNA has been successfully extracted with the pipette tip method.


8. References

Baskauf, S. Research Guides: BSCI 1510L: The genetic code and the Central Dogma of Molecular Biology. (2017). http://researchguides.library.vanderbilt.edu/c.php?g=69346&p=816436

Bar Dolev, M., Braslavsky, I., & Davies, P. L. (2016). Ice-Binding Proteins and Their Function. Annual Review of Biochemistry, 85(1), 515–542. https://doi.org/10.1146/annurev-biochem-060815-014546

Breimer, L. H. (1988). Ionizing radiation-induced mutagenesis. British Journal of Cancer, 57(1), 6–18. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/3279995

Busk, P. K., & Lange, L. (2013). Cellulolytic Potential of Thermophilic Species from Four Fungal Orders. AMB Express. https://doi.org/10.1186/2191-0855-3-47

Calderón-Cortés, N., Quesada, M., Cano-Camacho, H., & Zavala-Páramo, G. (2010). A simple and rapid method for DNA isolation from xylophagous insects. International Journal of Molecular Sciences, 11(12), 5056–5064. https://doi.org/10.3390/ijms11125056

Davies, P. L. (2014). Ice-binding proteins: A remarkable diversity of structures for stopping and starting ice growth. Trends in Biochemical Sciences, 39(11), 548–555. https://doi.org/10.1016/j.tibs.2014.09.005

DeVries, A. L., Komatsu, S. K., & Feeney, R. E. (1970). Chemical and physical properties of freezing point-depressing glycoproteins from Antarctic fishes. Journal of Biological Chemistry, 245(11), 2901–2908. https://www.jbc.org/content/245/11/2901.long

Duman, J. G., Bennett, V., Sformo, T., Hochstrasser, R., & Barnes, B. M. (2004). Antifreeze proteins in Alaskan insects and spiders. Journal of Insect Physiology. https://doi.org/10.1016/j.jinsphys.2003.12.003

Griffiths, A. J., Miller, J. H., Suzuki, D. T., Lewontin, R. C., & Gelbart, W. M. (2000). Aneuploidy. Retrieved from https://www.ncbi.nlm.nih.gov/books/NBK21870/

Griffiths, A. J., Miller, J. H., Suzuki, D. T., Lewontin, R. C., & Gelbart, W. M. (2000). Induced mutations. Retrieved from https://www.ncbi.nlm.nih.gov/books/NBK21936/


Griffiths, A. J., Miller, J. H., Suzuki, D. T., Lewontin, R. C., & Gelbart, W. M. (2000). Spontaneous mutations. Retrieved from https://www.ncbi.nlm.nih.gov/books/NBK21897/

Huanca-Mamani, W., Rivera-Cabello, D., & Maita-Maita, J. (2015). A simple, fast, and inexpensive CTAB-PVP-silica based method for genomic DNA isolation from single, small insect larvae and pupae. Genetics and Molecular Research. https://doi.org/10.4238/2015.July.17.8

Hudait, A., Moberg, D. R., Qiu, Y., Odendahl, N., Paesani, F., & Molinero, V. (2018). Preordering of water is not needed for ice recognition by hyperactive antifreeze proteins. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1806996115

Kristiansen, E., & Zachariassen, K. E. (2005). The mechanism by which fish antifreeze proteins cause thermal hysteresis. Cryobiology. https://doi.org/10.1016/j.cryobiol.2005.07.007

Kristiansen, E., Wilkens, C., Vincents, B., Friis, D., Lorentzen, A. B., Jenssen, H., … Ramløv, H. (2012). Hyperactive antifreeze proteins from longhorn beetles: Some structural insights. Journal of Insect Physiology, 58(11), 1502–1510. https://doi.org/10.1016/j.jinsphys.2012.09.004

Linnaeus, C. (1758). Systema Naturae. DOI: 10.1063/1.2193967

Silver, L. (n.d.). Unequal crossing over - an overview | ScienceDirect Topics. Retrieved November 14, 2018, from https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/unequal-crossing-over

Magadum, S., Banerjee, U., Murugan, P., Gangapur, D., & Ravikesavan, R. (2013). Gene duplication as a major force in evolution. J. Genet (Vol. 92). Retrieved from https://www.ias.ac.in/article/fulltext/jgen/092/01/0155-0161

Park, S. T., & Kim, J. (2016). Trends in next-generation sequencing and a new era for whole genome sequencing. International Neurourology Journal. https://doi.org/10.5213/inj.1632742.371

Reams, A. B., & Roth, J. R. (2015). Mechanisms of gene duplication and amplification. Cold Spring Harbor Perspectives in Biology, 7(2), a016592. https://doi.org/10.1101/cshperspect.a016592


Rulik, B., Eberle, J., von der Mark, L., Thormann, J., Jung, M., Köhler, F., … Ahrens, D. (2017). Using taxonomic consistency with semi-automated data pre-processing for high quality DNA barcodes. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.12824

Sformo, T., Walters, K., Jeannet, K., Wowk, B., Fahy, G. M., Barnes, B. M., & Duman, J. G. (2010). Deep supercooling, vitrification and limited survival to -100°C in the Alaskan beetle Cucujus clavipes puniceus (Coleoptera: Cucujidae) larvae. Journal of Experimental Biology. https://doi.org/10.1242/jeb.035758

skoven kirke – Google Maps. (n.d.). Retrieved December 17, 2018, from https://www.google.com/maps/search/skoven+kirke/@55.9010773,11.9413805,14.65z

Bílý, S., & Mehl, O. (1989). Longhorn beetles (Coleoptera, Cerambycidae) of Fennoscandia and Denmark.

Trautsch, J., Rosseland, B. O., Pedersen, S. A., Kristiansen, E., & Zachariassen, K. E. (2011). Do ice nucleating lipoproteins protect frozen insects against toxic chemical agents? Journal of Insect Physiology. https://doi.org/10.1016/j.jinsphys.2011.03.025

Viguera, E., Canceill, D., & Ehrlich, S. D. (2001). Replication slippage involves DNA polymerase pausing and dissociation. The EMBO Journal, 20(10), 2587–95. https://doi.org/10.1093/emboj/20.10.2587


9. Appendix

1. AFP protein sequences

>AFP1 [Rhagium mordax]
MLTSPAIAHAYSCRAVGVDASTVTDVQGTCHAKATGPGAVASGTSVDGSTSTATATGSGATATSTSTGTGTATTTATSNAAATS
NAIGQGTATSTATGTAAARAIGSSTTSASATEPTQTKTVSGPGAQTATAIAIDTATTTVTAS

>AFP2 [Rhagium mordax]
MLTSPAIAHAYSCRAVGVDASTVTDVQGTCHAKATGPGAVASGTSVDGSTSTATATGSGATATSTSTGTGTATTTATSNAAATS
NAIGQGTATSTATGTAAARAIGSSTTSASATEPTQTKTVSGPGAQTATAIAIDTATTTVTAS

>AFP3 [Rhagium mordax]
MSMKMIQTFAFACLVITLTSPAIAHAYSCRAVGVDGPAVTDIQGTCHAKATGYGAVASGTSEDGSTSTATATGSGATATSTSTGT
GTATTTATSNAEATSNAIGQGTATTTATGNGGARAIGASTTSASASEPTQTRTITGPGSQTATAFARDTATTTVTAS

>AFP4 [Rhagium mordax]
MHTPCRAVGVDGPVVTDVQGTCHAKATGVGAVASGTSVDGSTSTATATGSGASATSTSTGSGTATTTATSNASATSNAIDQGT
ATSTATGTAAARAIGASTTSASASEPTQTQTISGVGTQTATAFATDTATTTVTAS

>AFP5 [Rhagium mordax]
MSMKMIQRFAFACLVITLTSPAIAHAYSCRAVGVDGPVVTDVQGTCHAKATGVGAVASGTSVDGSTSTATATGSGASATSTSTG
SGTATTTATSNASATSNAIDQGTATSTATGTAAARAIGASTTSASASEPTQTQTISGVGTQTATAFATDTATTTVTAS

>AFP6 [Rhagium mordax]
MMLTSPAIAHAYSCRAVGVDGQAVTDIHGTCHAKATGSGAVASGTSEDGSRSTATATGSGAIATSTSSGSGTATTTATGNAAAT
SNAIGRGTATTTATGTGGRATGTSTISASASEPTQTSTVTGPGSQTGTAFARDTATTTVTSS

>AFP7 [Rhagium mordax]
MMLTSPAIAHAYSCRAVGVDASTVTDVQGTCHAKATGPGAVASGTSVDGSTSTATATGSGATATSTSTGTGTATTTATSNAAAT
SNAIGQGTATSTATGTAAARAIGSSTTSASATEPTQTKTVSGPGAQTATAIAIDTATTTVTAS

>AFP8 [Rhagium mordax]
MIQAFAFACLVMMLTSPAIAHAYSCRAVGVDASTVTDVQGTCHAKATGPGAVASGTSVDGSTSTATATGSGATATSTSTGTGTA
TTTATSNAAATSNAIGQGTATSTATGTAAARAIGSSTTSASATEPTQTKTVSGPGAQTATAIAIDTATTTVTAS

>ADR80610.1 antifreeze protein, partial [Rhagium inquisitor]
CRAVGVDGRAVTDIQGTCHAKATGAGAMASGTSEPGSTSTATATGRGATARSTSTGRGTATTTATGTASATSNAIGQGTATTTA
TGSAGGRATGSATTSSSASQPTQTQTITGPGFQTAKSFARNTATTTVTAS

2. CO1 Gene sequence

>MF776961.1 Rhagium mordax isolate RT19 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
GAACATCTTTAAGTTTATTAATTCGATCAGAATTAGGAAATCCAGGATCATTAATTGGTGATGATCAAATTTATAATGTAATTG
TTACCGCTCATGCTTTCATTATAATTTTCTTTATAGTTATACCTATTATAATTGGAGGATTTGGAAATTGATTAGTTCCATTAAT
ACTTGGAGCTCCTGATATAGCATTCCCTCGAATGAATAATATAAGATTTTGACTTTTACCTCCTTCACTAACTTTATTAATTAT
AAGAAGAGTAGTAGAAAGTGGTGCTGGAACTGGTTGAACAGTTTATCCCCCTCTTTCATCAAATATTGCTCATAGAGGATCA
TCTGTTGATTTAGCAATTTTTAGATTACATCTAGCAGGAATTTCTTCAATTCTTGGAGCTGTAAATTTTATTACAACAGTTATTA
ATATACGACCTGTAGGTATAACTCCAGACCGTGTTCCTTTATTTGTTTGAGCAGTTGTAATTACAGCAATTCTTCTTCTTCTTT
CTTTACCTGTTTTAGCAGGTGCTATCACAATATTATTAACAGATCGAAATTTAAATACTTCATTTTTTGATCCTGCAGGAGGAG
GAGATCCTATTCTTTATCAACATTTATTTTGATTTTTTG

>MF115598.1 Rhagium inquisitor cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
CATCCTTAAGATTGTTAATTCGAACAGAACTAGGAAATCCAGGATCTTTAATTGGAGATGATCAAATCTATAATGTTATTGTTA
CTGCCCATGCTTTTATTATAATTTTTTTCATAGTTATACCAATTATAATTGGAGGATTTGGAAACTGATTAGTTCCCCTAATATT
GGGGGCTCCCGATATAGCTTTTCCTCGAATAAATAATATAAGATTTTGACTTTTACCTCCTTCTTTAACTTTATTAATCATAAG
AAGAGTGGTAGAAAATGGGGCTGGTACCGGATGAACCGTCTATCCCCCTCTTTCTTCAAATATTGCTCATAGAGGATCTTCA
GTTGATTTAGCAATTTTTAGGTTACATTTAGCAGGTATCTCTTCAATTCTTGGTGCAGTAAATTTTATCACTACAGTAATTAAT
ATACGACCAATTGGAATAACTCCTGATCGTGTTCCTTTATTTGTTTGGGCAGTAGTAATTACTGCAATTCTTCTTCTTCTTT
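
The sequences in sections 1 and 2 are FASTA records: a header line starting with ">" followed by the sequence, which may be wrapped over several lines. A minimal Python sketch for reading such records back into memory could look as follows. The function name parse_fasta and the two toy records are illustrative assumptions and were not part of the project's workflow.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # wrapped sequence lines are collected and joined afterwards
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Toy input (hypothetical records, not data from this report).
example = """>toy1
MLTS
PAIA
>toy2
ACGT
"""

sequences = parse_fasta(example)
lengths = {name: len(seq) for name, seq in sequences.items()}
```

Pasting the appendix records into a string and passing it to parse_fasta would give one entry per header, from which sequence lengths can be compared.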

3. Content of the samples

| Sample | B | FP | RP1 | RP2 | CO1FP | CO1BP | DNA S1 | DNA S2 | H2O | 37°C Cycles | PCR Cycles |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 10 | 2 | 2 | / | / | / | 2.86 | / | 3.14 | / | 30 |
| 2 | 10 | 2 | / | 2 | / | / | 2.86 | / | 3.14 | / | 30 |
| 3 | 10 | 2 | 2 | / | / | / | / | 1.3 | 4.7 | / | 30 |
| 4 | 10 | 2 | / | 2 | / | / | / | 1.3 | 4.7 | / | 30 |
| 5 | 10 | 2 | 2 | / | / | / | 2.86 | / | 3.14 | 2 | 30 |
| 6 | 10 | 2 | / | 2 | / | / | 2.86 | / | 3.14 | 2 | 30 |
| 7 | 10 | 2 | 2 | / | / | / | / | 1.3 | 4.7 | 2 | 30 |
| 7g | 25 | 5 | 5 | / | / | / | band from sample 7 | | 15 | / | 15 |
| 7p | 25 | 5 | 5 | / | / | / | / | 1.3 | 15 | / | 15 |
| 8 | 10 | 2 | / | 2 | / | / | / | 1.3 | 4.7 | 2 | 30 |
| 8.0g | 25 | 5 | / | 5 | | | 1st band from sample 8 | | 15 | / | 15 |
| 8.1g | 25 | 5 | / | 5 | / | / | 2nd band from sample 8 | | 15 | / | 15 |
| 8.2g | 25 | 5 | / | 5 | / | / | 3rd band from sample 8 | | 15 | / | 15 |
| 9 | 10 | / | / | / | 2 | 2 | 2.86 | / | 3.14 | / | 30 |
| 10 | 10 | / | / | / | 2 | 2 | / | 1.3 | 4.7 | / | 30 |
| 1040 | 25 | / | / | / | 5 | 5 | band from sample 10 | | 8.5 | / | 40 |
| 11 | 10 | / | / | / | 2 | 2 | / | / | 10 | / | 30 |
| A | 10 | 2 | 2 | / | / | / | / | | 4.7 | 1 | 40 |
| B | 10 | 2 | / | 2 | / | / | / | | 4.7 | 1 | 40 |
| C | 10 | 2 | 2 | / | / | / | / | | 4.7 | 2 | 40 |
| D | 10 | 2 | / | 2 | / | / | / | | 4.7 | 2 | 40 |

Table 6: All samples and their contents in the PCR reactions. All volumes are in μL. B designates the buffer, FP is the forward primer, RP1 is the first reverse primer, RP2 is the second reverse primer, CO1FP is the CO1 gene forward primer, CO1BP is the CO1 gene reverse primer, DNA S1 is the purified DNA from Sample 1, and DNA S2 is the purified DNA from Sample 2. Where the template was a band from an earlier sample, the band is named in place of a DNA volume. 37°C Cycles refers to the number of repetitions at the annealing temperature of 37°C during PCR. PCR Cycles refers to the number of repetitions at the standard annealing temperature of 56°C during PCR. The colouring refers to the different PCR trials: red was the first one performed, yellow the second and blue the third.
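
As a quick consistency check on Table 6, the component volumes of a reaction can be added up. The Python sketch below does this for samples 1, 3 and 9, taking the values directly from the table and omitting "/" entries. The dictionary layout and the function name total_volume are illustrative assumptions, not part of the protocol.

```python
# Volumes in μL copied from Table 6; components marked "/" are left out.
rows = {
    "1": {"B": 10, "FP": 2, "RP1": 2, "DNA S1": 2.86, "H2O": 3.14},
    "3": {"B": 10, "FP": 2, "RP1": 2, "DNA S2": 1.3, "H2O": 4.7},
    "9": {"B": 10, "CO1FP": 2, "CO1BP": 2, "DNA S1": 2.86, "H2O": 3.14},
}


def total_volume(components):
    """Sum the component volumes of one reaction, rounded to 0.01 μL."""
    return round(sum(components.values()), 2)


totals = {sample: total_volume(parts) for sample, parts in rows.items()}
print(totals)
```

For these three samples the listed volumes add up to 20 μL each: the smaller DNA volume of Sample 2 (1.3 μL) is offset by a larger water volume (4.7 μL).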
